Academic Article

Entity Matching Based on Attribute-Aware and Multi-Perspective Similarity Measurement
Document Type
Article
Author
Source
Journal of Information Science and Engineering. Vol. 39 Issue 2, p423-438. 16 p.
Subject
entity matching
similarity measurement
data integration
deep learning
natural language processing
Language
English
ISSN
1016-2364
Abstract
Entity matching (EM) identifies tuples from different data sources that refer to the same real-world entity. One of the main challenges of EM is attribute heterogeneity: an entity has many different types of attributes. Existing research focuses on using rules or neural networks to select similarity measures for different types of attributes. However, these methods select only one specific similarity measure for each attribute and ignore matching information from other perspectives. In addition, existing methods neglect the fact that different attributes contribute differently to the final matching decision, and they do not consider the influence of dirty data on matching results. In this paper, we propose an entity matching method based on attribute-aware and multi-perspective similarity measurement. First, we propose a multi-perspective similarity measurement framework based on the pre-trained language model DeBERTa, which captures matching information from multiple perspectives such as literal, size, and semantics. Second, we introduce an attribute attention mechanism to aggregate matching evidence from all aligned attributes according to the importance of each attribute to the final matching decision. Finally, we use cross-attribute comparison to handle dirty data problems such as swap errors, and we further improve our model's matching capability by injecting external entity knowledge. Experimental results show that our framework outperforms state-of-the-art methods on multiple real-world data sets.
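
The ideas in the abstract can be illustrated with a small toy sketch. This is not the authors' model: the paper builds on DeBERTa and learned attention, whereas the code below swaps in hand-written similarity functions and fixed importance scores so that it runs with the Python standard library alone. All function names, the `logits` importance scores, and the example records are hypothetical. Token overlap is used as a crude stand-in for the semantic perspective, and taking the best match across all attributes of the other record is only one possible reading of cross-attribute comparison.

```python
import math
from difflib import SequenceMatcher
from typing import Dict, Optional


def literal_sim(a: str, b: str) -> Optional[float]:
    """Literal perspective: character-level similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def size_sim(a: str, b: str) -> Optional[float]:
    """Size perspective: numeric closeness in [0, 1].

    Returns None when either value is not a finite number, so the
    perspective is skipped for non-numeric attributes.
    """
    try:
        x, y = float(a), float(b)
    except ValueError:
        return None
    if not (math.isfinite(x) and math.isfinite(y)):
        return None
    if x == y:
        return 1.0
    return max(0.0, 1.0 - abs(x - y) / max(abs(x), abs(y)))


def token_sim(a: str, b: str) -> Optional[float]:
    """Stand-in for the semantic perspective: Jaccard overlap of tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


PERSPECTIVES = (literal_sim, size_sim, token_sim)


def value_sim(a: str, b: str) -> float:
    """Average the perspectives that apply to this pair of values."""
    scores = [s for f in PERSPECTIVES if (s := f(a, b)) is not None]
    return sum(scores) / len(scores)


def softmax(logits: Dict[str, float]) -> Dict[str, float]:
    """Turn per-attribute importance scores into weights that sum to 1."""
    m = max(logits.values())
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


def match_score(
    left: Dict[str, str],
    right: Dict[str, str],
    logits: Dict[str, float],
    cross_attribute: bool = False,
) -> float:
    """Attention-style weighted sum of per-attribute similarities.

    With cross_attribute=True, each left value is compared against every
    right value and the best score is kept, which tolerates swapped fields.
    """
    score = 0.0
    for attr, weight in softmax(logits).items():
        if cross_attribute:
            sim = max(value_sim(left[attr], v) for v in right.values())
        else:
            sim = value_sim(left[attr], right[attr])
        score += weight * sim
    return score
```

For example, with `left = {"title": "iPhone 12", "brand": "Apple", "price": "699"}` and a copy of the record whose `title` and `brand` values are swapped, `match_score(left, swapped, logits, cross_attribute=True)` scores higher than the aligned comparison, because each value still finds its counterpart in the other record. In a learned model the importance scores would be trained rather than fixed by hand.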