Academic Journal Article

Improving Distantly-Supervised Relation Extraction Through BERT-Based Label and Instance Embeddings
Document Type
Periodical
Source
IEEE Access, 9:62574-62582, 2021
Subject
Bit error rate
Encoding
Shape
Data mining
Task analysis
Noise measurement
Training
Relation extraction
distant supervision
BERT
label embeddings
relation attention
entity information
Language
English
ISSN
2169-3536
Abstract
Distantly-supervised relation extraction (RE) is an effective method for scaling RE to large corpora, but it suffers from noisy labels. Existing approaches attempt to alleviate this noise through multi-instance learning and by providing additional information, yet they mainly recognize the most frequent relations, neglecting those in the long tail. We propose REDSandT (Relation Extraction with Distant Supervision and Transformers), a novel distantly-supervised transformer-based RE method that captures a wider set of relations through highly informative instance and label embeddings, derived respectively from BERT's pre-trained model and from the relationship between labels and entities. We guide REDSandT to focus solely on relational tokens by fine-tuning BERT on a structured input comprising the sub-tree that connects an entity pair and the entities' types. From the extracted informative vectors we form label embeddings, which we also use as an attention mechanism over instances to further reduce noise. Finally, we represent each sentence by concatenating its relation and instance embeddings. Experiments on the two benchmark datasets for distantly-supervised RE, NYT-10 and GDS, show that REDSandT captures a broader set of relations with higher confidence, achieving state-of-the-art AUC (0.424) on NYT-10 and an excellent AUC (0.862) on GDS.
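To make the bag-level mechanism in the abstract concrete, the following minimal PyTorch sketch shows one way label embeddings can act as an attention mechanism over a bag of BERT-derived instance vectors, with the final representation formed by concatenating relation and instance embeddings. It is a sketch under assumptions, not the authors' released REDSandT implementation: the class name LabelAttentionBag, the max-pooled attention scoring, and all dimensions are hypothetical.

import torch
import torch.nn as nn
import torch.nn.functional as F

class LabelAttentionBag(nn.Module):
    """Bag-level relation scoring with label-embedding attention (sketch)."""

    def __init__(self, hidden_dim: int, num_relations: int):
        super().__init__()
        # One learned vector per relation label: the "label embeddings".
        self.label_emb = nn.Embedding(num_relations, hidden_dim)
        # Classifier over the concatenated [relation ; instance] vector.
        self.classifier = nn.Linear(2 * hidden_dim, num_relations)

    def forward(self, instances: torch.Tensor) -> torch.Tensor:
        # instances: (bag_size, hidden_dim), e.g. BERT vectors of the
        # structured input (entity-pair sub-tree plus entity types).
        labels = self.label_emb.weight                   # (R, H)
        sim = instances @ labels.t()                     # (B, R) label/instance affinity
        # Attention over the bag: weight each instance by its best label
        # match, so instances supporting no relation are down-weighted.
        attn = F.softmax(sim.max(dim=1).values, dim=0)   # (B,)
        inst_vec = attn @ instances                      # (H,) instance embedding
        # Relation embedding: mixture of label vectors under the bag's
        # attention-weighted label scores.
        rel_vec = F.softmax(attn @ sim, dim=0) @ labels  # (H,)
        # Final representation: concatenation of relation and instance
        # embeddings, as the abstract describes.
        return self.classifier(torch.cat([rel_vec, inst_vec], dim=0))

if __name__ == "__main__":
    bag = torch.randn(5, 768)  # a bag of 5 sentence vectors (BERT-base size)
    model = LabelAttentionBag(hidden_dim=768, num_relations=53)
    print(model(bag).shape)    # torch.Size([53]), one score per relation

Max-pooling over labels is only one plausible scoring choice; the essential idea is that instances with weak affinity to every label receive small attention weights, which is how label-aware attention suppresses distant-supervision noise.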