Academic Article

LTACL: long-tail awareness contrastive learning for distantly supervised relation extraction
Document Type
Article
Source
Complex & Intelligent Systems, Vol 10, Iss 1, Pp 1551-1563 (2023)
Subject
Distantly supervised learning
Information extraction
Relation extraction
Contrastive learning
Electronic computers. Computer science
QA75.5-76.95
Information technology
T58.5-58.64
Language
English
ISSN
2199-4536
2198-6053
Abstract
Distantly supervised relation extraction is an automatic annotation method for large corpora that labels a bag of sentences sharing the same entity pair with a relation. Recent works achieve sound performance by adopting contrastive learning to efficiently obtain instance representations under the multi-instance learning framework. Although these methods weaken the impact of noisy labels, they ignore the long-tail distribution problem in distantly supervised datasets and fail to capture the mutual information between different parts. We are thus motivated to tackle these issues and establish a long-tail awareness contrastive learning method for efficiently utilizing long-tail data. Our model treats the major and tail parts differently by adopting hyper-augmentation strategies. Moreover, the model provides various views by constructing novel positive and negative pairs in contrastive learning to gain better representations across different parts. The experimental results on the NYT10 dataset demonstrate that our model surpasses the existing SOTA by more than 2.61% AUC score on relation extraction. On the manually evaluated datasets NYT10m and Wiki20m, our method obtains competitive results, achieving 59.42% and 79.19% AUC scores on relation extraction, respectively. Extensive discussions further confirm the effectiveness of our approach.
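The contrastive objective described in the abstract, pulling an anchor representation toward a positive view (e.g. an augmented version of the same instance) and pushing it away from negatives, can be sketched with a standard InfoNCE-style loss. This is a generic illustration, not the paper's actual implementation; the function name, embeddings, and temperature value are illustrative assumptions.

```python
import numpy as np

def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE-style contrastive loss for one anchor embedding.

    anchor, positive: 1-D embedding vectors; negatives: 2-D array with
    one negative embedding per row. Illustrative only, not the paper's code.
    """
    def cos(a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    pos_sim = cos(anchor, positive) / temperature
    neg_sims = np.array([cos(anchor, n) for n in negatives]) / temperature
    logits = np.concatenate(([pos_sim], neg_sims))
    # Cross-entropy with the positive pair as the target class:
    # -log( exp(pos_sim) / sum_j exp(logit_j) )
    return -pos_sim + np.log(np.sum(np.exp(logits)))

rng = np.random.default_rng(0)
anchor = rng.normal(size=8)
positive = anchor + 0.05 * rng.normal(size=8)   # augmented view: close to anchor
negatives = rng.normal(size=(4, 8))             # unrelated instances
loss_good = info_nce_loss(anchor, positive, negatives)
loss_bad = info_nce_loss(anchor, negatives[0], negatives)
print(loss_good, loss_bad)
```

Minimizing this loss drives the anchor's similarity to its positive view above its similarity to the negatives, which is the mechanism the paper's hyper-augmentation strategies build on to give tail-part instances useful training views.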