Academic paper

Dynamic Graph Construction Framework for Multimodal Named Entity Recognition in Social Media
Document Type
Periodical
Source
IEEE Transactions on Computational Social Systems (IEEE Trans. Comput. Soc. Syst.). 11(2):2513-2522 Apr, 2024
Subject
Computing and Processing
Communication, Networking and Broadcast Technologies
General Topics for Engineers
Visualization
Semantics
Task analysis
Social networking (online)
Machine translation
Computational modeling
Representation learning
Dynamic cross-modal graph
dynamic graph construction
multimodal named entity recognition (MNER)
text-image matching
Language
ISSN
2329-924X
2373-7476
Abstract
Multimodal named entity recognition (MNER) aims to detect named entities and identify their types from text and attached images, and its output serves as input for downstream tasks such as multimodal machine translation, visual dialog, and multimodal sentiment analysis. Existing studies have limitations in text-image matching and in reducing the semantic disparity between modalities. First, current methods fail to resolve both overall and local text-image matching issues in a self-guided way. Second, the static graphs constructed in MNER models struggle to bridge the semantic gap between modalities. In this work, a dynamic graph construction framework (DGCF) is proposed to address these limitations. A similarity vector-based text-image matching inference strategy is designed to obtain the overall and local matching relations between text and image, where the overall matching determines the proportion of visual information retained. Then, a multimodal dynamic graph interaction module is developed. Within each layer of the module, the local matching relations and part-of-speech (POS)-based multihead attention are integrated to construct a dynamic cross-modal graph and a semantic graph. Lastly, a CRF layer is used to predict entity labels. Extensive experiments are performed on two benchmark datasets. The experimental results show that the model is a competitive alternative and achieves state-of-the-art performance.
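
The matching step described in the abstract can be illustrated with a minimal sketch. This is not the DGCF implementation: the abstract does not give the formulas, so the sketch assumes cosine similarity as the similarity measure, mean pooling for the overall score, a clamp to [0, 1] as the retention gate, and a fixed threshold for local edges. All names (`cosine`, `mean_pool`, `match_text_image`, `edge_threshold`) are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_pool(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def match_text_image(tokens, regions, edge_threshold=0.5):
    """Illustrative overall and local text-image matching.

    tokens:  one feature vector per text token
    regions: one feature vector per image region
    Assumption: cosine similarity stands in for the paper's similarity vectors.
    """
    # Overall matching: one score for the whole text-image pair, clamped to [0, 1].
    overall = max(0.0, cosine(mean_pool(tokens), mean_pool(regions)))

    # The overall score scales how much visual information is retained.
    gated_regions = [[overall * x for x in region] for region in regions]

    # Local matching: token-to-region similarities.
    local = [[cosine(token, region) for region in regions] for token in tokens]

    # Cross-modal edges (token index, region index) where local similarity is high.
    edges = [
        (i, j)
        for i, row in enumerate(local)
        for j, score in enumerate(row)
        if score >= edge_threshold
    ]
    return overall, gated_regions, local, edges
```

In a dynamic-graph setting, a function like this would be re-evaluated at each layer on updated features so that the edge set changes between layers, whereas a static graph would compute the edges once.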