Academic Paper

Entity Resolution in Dissimilarity Spaces
Document Type
Conference
Source
25th Pan-Hellenic Conference on Informatics, pp. 413-418
Language
English
Abstract
In this paper we propose a dissimilarity-based entity resolution framework that introduces a new, efficient object representation scheme. This representation relies on embedding the dissimilarity space of pairs of objects into the space of distances of objects from a set of prototypes. These prototypes are selected from the input objects as the centers of clusters identified through an efficient clustering technique. To overcome the curse of dimensionality, we use an accurate object similarity metric that takes into account the rank correlation of the distances from the prototypes. We also propose using the generalized Hausdorff distance to handle cases where only partially ranked data is available in the object representation domain. Finally, a locality-sensitive hashing (LSH) approach for partially ranked data is applied to reduce the high cost of the similarity search for approximate duplicates.
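
The record does not specify the dissimilarity measure, the clustering technique, or the rank distance. The Python sketch below therefore fills each gap with a labeled assumption: Levenshtein edit distance on strings, a greedy farthest-first traversal (a k-center heuristic) standing in for the paper's clustering step, and Kendall tau as the base distance between full rankings. It shows three steps. First, each object becomes a vector of distances to the prototypes. Second, that vector is turned into a ranking of the prototypes, where equal distances become ties and so give partially ranked data. Third, two such partial rankings are compared with a generalized Hausdorff distance, computed by brute force over all full rankings that refine them. All function names are hypothetical, and the LSH step is not shown.

```python
from itertools import permutations, product


def levenshtein(a, b):
    """Edit distance between two strings (ASSUMED dissimilarity measure)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def select_prototypes(objects, k, dist):
    """ASSUMPTION: greedy farthest-first traversal stands in for the paper's
    unspecified clustering technique; chosen objects act as cluster centers."""
    protos = [objects[0]]
    while len(protos) < k:
        # Pick the object farthest from its nearest already-chosen prototype.
        protos.append(max(objects, key=lambda o: min(dist(o, p) for p in protos)))
    return protos


def embed(obj, protos, dist):
    """Represent an object by its vector of distances to the prototypes."""
    return [dist(obj, p) for p in protos]


def to_partial_ranking(vec):
    """Bucket order of prototype indices, nearest first; equal distances tie."""
    buckets = {}
    for idx, d in enumerate(vec):
        buckets.setdefault(d, []).append(idx)
    return [buckets[d] for d in sorted(buckets)]


def refinements(partial):
    """All full rankings consistent with a bucket order (ties broken every way)."""
    for parts in product(*(permutations(b) for b in partial)):
        yield [i for part in parts for i in part]


def kendall_tau(r1, r2):
    """Number of item pairs that two full rankings order differently."""
    pos = {item: i for i, item in enumerate(r2)}
    seq = [pos[item] for item in r1]
    return sum(1 for i in range(len(seq)) for j in range(i + 1, len(seq))
               if seq[i] > seq[j])


def hausdorff_kendall(p1, p2):
    """Generalized Hausdorff distance between two partial rankings: the
    Hausdorff distance, under Kendall tau, between their sets of refinements."""
    A, B = list(refinements(p1)), list(refinements(p2))
    d_ab = max(min(kendall_tau(a, b) for b in B) for a in A)
    d_ba = max(min(kendall_tau(a, b) for a in A) for b in B)
    return max(d_ab, d_ba)


if __name__ == "__main__":
    # Hypothetical toy records; not data from the paper.
    records = ["john smith", "jon smith", "john smyth",
               "mary jones", "marie jones", "peter brown"]
    protos = select_prototypes(records, k=3, dist=levenshtein)
    ranks = {r: to_partial_ranking(embed(r, protos, levenshtein)) for r in records}
    for a, b in [("john smith", "jon smith"), ("john smith", "mary jones")]:
        print(a, "|", b, "->", hausdorff_kendall(ranks[a], ranks[b]))
```

The number of refinements grows factorially with bucket size, so brute-force enumeration only suits a handful of prototypes. A practical system would need a more efficient way to compute the Hausdorff distance, and it would use LSH to avoid comparing all pairs.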
