학술논문

Quality of record linkage in a highly automated cancer registry that relies on encrypted identity data
Document Type
article
Source
GMS Medizinische Informatik, Biometrie und Epidemiologie, Vol 12, Iss 1, p Doc02 (2016)
Subject
record linkage
cancer registry
evaluation
encrypted data
data quality
Computer applications to medicine. Medical informatics
R858-859.7
Infectious and parasitic diseases
RC109-216
Language
German
English
ISSN
1860-9171
Abstract
Objectives: In the absence of unique ID numbers, cancer and other registries in Germany and elsewhere rely on identity data to link records pertaining to the same patient. These data are often encrypted to ensure privacy. Some record linkage errors unavoidably occur. These errors were quantified for the cancer registry of North Rhine Westphalia which uses encrypted identity data. Methods: A sample of records was drawn from the registry, record linkage information was included. In parallel, plain text data for these records were retrieved to generate a gold standard. Record linkage error frequencies in the cancer registry were determined by comparison of the results of the routine linkage with the gold standard. Error rates were projected to larger registries.Results: In the sample studied, the homonym error rate was 0.015%; the synonym error rate was 0.2%. The F-measure was 0.9921. Projection to larger databases indicated that for a realistic development the homonym error rate will be around 1%, the synonym error rate around 2%.Conclusion: Observed error rates are low. This shows that effective methods to standardize and improve the quality of the input data have been implemented. This is crucial to keep error rates low when the registry’s database grows. The planned inclusion of unique health insurance numbers is likely to further improve record linkage quality. Cancer registration entirely based on electronic notification of records can process large amounts of data with high quality of record linkage.