Academic Paper

Implementation of an extended Fellegi-Sunter probabilistic record linkage method using the Jaro-Winkler string comparator
Document Type
Conference
Source
2014 IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI), pp. 375-379, June 2014
Subject
Bioengineering
Communication, Networking and Broadcast Technologies
Signal Processing and Analysis
Couplings
Probabilistic logic
Estimation
Educational institutions
Databases
Accuracy
Computational modeling
ISSN
2168-2194
2168-2208
Abstract
Record linkage is the task of identifying which records from one or more data sources refer to the same person. Often, records do not share a common key and may contain typographical variations in identifier fields; in such cases, Fellegi-Sunter probabilistic record linkage is a commonly used method. In this method, a weight is assigned to each pair of records, and record pairs with weights above a given threshold are considered matches. Winkler introduced an extension of the Fellegi-Sunter method that takes field similarity into account when calculating the weight, and showed that it outperforms the original method. Implementations of the Fellegi-Sunter method are frequently presented in the literature; however, applications of the Winkler method are rarely mentioned. This paper gives a brief background on these two record linkage methods and describes in detail how to implement the Winkler method. We formalized the parameters required by the Winkler method and estimated them using the expectation-maximization (EM) algorithm. Simulated data sets, for which the true match status is known, were used to assess parameter estimation and to compare the Winkler and Fellegi-Sunter methods in terms of their ability to reduce the rates of false matches and false non-matches.
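The weighting idea summarized in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the helper names (`jaro`, `jaro_winkler`, `field_weights`, `partial_weight`, `record_weight`), the Jaro-Winkler prefix scale `p = 0.1` with a common prefix of at most four characters, and the linear interpolation between the agreement and disagreement weights above a hypothetical `cutoff` are all assumptions; Winkler's own similarity adjustment may take a different form. The per-field probabilities m (agreement given a true match) and u (agreement given a non-match) are simply taken as given here, whereas the paper estimates its parameters with the EM algorithm.

```python
import math


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity: 1.0 for identical strings, 0.0 for no common characters."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1, matched2 = [False] * len1, [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that appear in a different order; each swap counts twice.
    half_transpositions, k = 0, 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                half_transpositions += 1
            k += 1
    t = half_transpositions / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for a common prefix of up to max_prefix characters."""
    sim = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return sim + prefix * p * (1.0 - sim)


def field_weights(m: float, u: float) -> tuple:
    """Fellegi-Sunter agreement and disagreement weights (log base 2)."""
    return math.log2(m / u), math.log2((1 - m) / (1 - u))


def partial_weight(sim: float, m: float, u: float, cutoff: float = 0.8) -> float:
    """Hypothetical similarity-scaled weight: agreement weight at sim == 1,
    disagreement weight below `cutoff`, linear interpolation in between."""
    w_agree, w_disagree = field_weights(m, u)
    if sim >= 1.0:
        return w_agree
    if sim < cutoff:
        return w_disagree
    frac = (sim - cutoff) / (1.0 - cutoff)
    return w_disagree + frac * (w_agree - w_disagree)


def record_weight(rec_a: dict, rec_b: dict, params: dict, cutoff: float = 0.8) -> float:
    """Sum per-field weights for a record pair; params maps field -> (m, u)."""
    total = 0.0
    for field, (m, u) in params.items():
        sim = jaro_winkler(rec_a[field], rec_b[field])
        total += partial_weight(sim, m, u, cutoff)
    return total


if __name__ == "__main__":
    # Hypothetical m/u values; in practice they would be estimated, e.g. by EM.
    params = {"first": (0.9, 0.01), "last": (0.95, 0.005)}
    a = {"first": "MARTHA", "last": "DIXON"}
    b = {"first": "MARHTA", "last": "DICKSONX"}
    threshold = 5.0  # hypothetical decision threshold
    w = record_weight(a, b, params)
    print(f"weight={w:.3f} match={w > threshold}")
```

With exact (binary) comparison, the pair MARTHA/MARHTA and DIXON/DICKSONX would receive the disagreement weight on both fields; the similarity-scaled weight instead gives partial credit to near-matches, which is the motivation for letting field similarity enter the weight.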