Academic Paper

Secure Approximate String Matching for Privacy-Preserving Record Linkage
Document Type
Periodical
Author
Source
IEEE Transactions on Information Forensics and Security (IEEE Trans. Inform. Forensic Secur.), 14(10):2623-2632, Oct. 2019
Subject
Signal Processing and Analysis
Computing and Processing
Communication, Networking and Broadcast Technologies
Couplings
Protocols
Encoding
Encryption
Public key
Databases
Homomorphic encryption
secure computation
approximate string matching
privacy-preserving record linkage
Language
ISSN
1556-6013
1556-6021
Abstract
Real-world applications of record linkage often require matching to be robust in spite of small variations in string fields. For example, two health care providers should be able to detect a patient in common, even if one record contains a typo or transcription error. In the privacy-preserving setting, however, the problem of approximate string matching has been cast as a trade-off between security and practicality, and the literature has mainly focused on Bloom filter encodings, an approach which can leak significant information about the underlying records. We present a novel public-key construction for secure two-party evaluation of threshold functions in restricted domains based on embeddings found in the message spaces of additively homomorphic encryption schemes. We use this to construct an efficient two-party protocol for privately computing the threshold Dice coefficient. Relative to the approach of Bloom filter encodings, our proposal offers formal security guarantees and greater matching accuracy. We implement the protocol and demonstrate the feasibility of this approach in linking medium-sized patient databases with tens of thousands of records.
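To make the matching criterion concrete, the plaintext sketch below computes the Dice coefficient between two strings and a threshold test on it. It is not the secure two-party protocol described above: nothing is encrypted, and the choice of padded character bigrams with set semantics is an illustrative assumption rather than the paper's stated encoding. The function names `bigrams`, `dice`, and `dice_at_least` are hypothetical. The threshold test cross-multiplies to avoid division, so that it reduces to a comparison of integers, which is the general shape of the quantity a threshold function must decide.

```python
def bigrams(s: str) -> set[str]:
    """Set of character bigrams of s, lower-cased and padded with one space
    on each side (padding and set semantics are illustrative assumptions)."""
    padded = f" {s.lower()} "
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def dice(a: str, b: str) -> float:
    """Dice coefficient 2|X ∩ Y| / (|X| + |Y|) over the bigram sets X, Y.

    Padding guarantees both sets are non-empty, so the denominator is > 0.
    """
    x, y = bigrams(a), bigrams(b)
    return 2 * len(x & y) / (len(x) + len(y))


def dice_at_least(a: str, b: str, num: int, den: int) -> bool:
    """Decide Dice(a, b) >= num/den using integer arithmetic only.

    Cross-multiplying removes the division:
        2 * |X ∩ Y| * den >= num * (|X| + |Y|)
    """
    x, y = bigrams(a), bigrams(b)
    return 2 * len(x & y) * den >= num * (len(x) + len(y))


if __name__ == "__main__":
    # "smith" vs. "smyth": 4 shared bigrams out of 6 + 6, so Dice = 8/12.
    print(dice("smith", "smyth"))
    print(dice_at_least("smith", "smyth", 3, 5))  # threshold 0.6
    print(dice_at_least("smith", "smyth", 4, 5))  # threshold 0.8
```

With a threshold of 0.6 the single-character typo in "smyth" still counts as a match, while a stricter threshold of 0.8 rejects it; choosing the threshold is exactly the robustness trade-off the abstract describes for matching records with small variations.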