Academic Article

Learning Multitask Commonness and Uniqueness for Multimodal Sarcasm Detection and Sentiment Analysis in Conversation
Document Type
Periodical
Source
IEEE Transactions on Artificial Intelligence, 5(3):1349-1361, Mar. 2024
Subject
Computing and Processing
Task analysis
Multitasking
Sentiment analysis
Oral communication
Correlation
Artificial intelligence
Electronic mail
Multimodal representation
multimodal sentiment analysis
multitask uniqueness
sarcasm detection
Language
English
ISSN
2691-4581
Abstract
Sarcasm is a figurative language device for expressing inner feelings, in which the author writes a sentence that is positive on the surface while actually conveying negative sentiment, or vice versa. Sentiment is therefore closely related to sarcasm, which has led to the recent popularity of joint multimodal sarcasm and sentiment detection in conversation (dialogue). The key challenges involve multimodal fusion and multitask interaction. Most existing studies have focused on building multimodal fused representations, while the commonness and uniqueness across related tasks have received little attention. To fill this gap, we propose a multimodal multitask interaction learning framework, termed MIL, for the joint detection of sarcasm and sentiment. Specifically, a cross-modal target attention mechanism is proposed to automatically learn the alignment between texts and images/speech. In addition, a multimodal interaction learning paradigm, consisting of a dual-gating network and three separate fully connected layers, is designed to simultaneously capture the commonness and uniqueness across tasks. Comprehensive experiments on two benchmark datasets (i.e., Memotion and MUStARD) show the effectiveness of the proposed model over state-of-the-art baselines, with significant improvements of 1.9% and 2.4% in F1, respectively.
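For intuition, the following is a minimal PyTorch sketch of the two components the abstract names: a cross-modal target attention in which text queries attend over image or speech features, and a dual-gating multitask head with one shared and two task-specific fully connected layers. All module names, dimensions, and the gated mixing rule are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class CrossModalTargetAttention(nn.Module):
    """Text tokens (queries) attend over image-region or speech-frame
    features (keys/values) to learn a soft cross-modal alignment.
    All dimensions are illustrative assumptions, not the paper's."""

    def __init__(self, text_dim=768, other_dim=512, hidden_dim=256):
        super().__init__()
        self.q = nn.Linear(text_dim, hidden_dim)
        self.k = nn.Linear(other_dim, hidden_dim)
        self.v = nn.Linear(other_dim, hidden_dim)
        self.scale = hidden_dim ** 0.5

    def forward(self, text, other):
        # text:  (batch, n_tokens, text_dim)
        # other: (batch, n_regions_or_frames, other_dim)
        scores = self.q(text) @ self.k(other).transpose(1, 2) / self.scale
        weights = torch.softmax(scores, dim=-1)  # alignment weights
        return weights @ self.v(other)           # (batch, n_tokens, hidden_dim)


class DualGatingMultitaskHead(nn.Module):
    """One shared FC branch (task commonness) and two task-specific FC
    branches (sarcasm/sentiment uniqueness); a sigmoid gate per task
    mixes shared and task-specific features before classification.
    The mixing rule here is an assumed, generic gating scheme."""

    def __init__(self, in_dim=256, n_sarcasm=2, n_sentiment=3):
        super().__init__()
        self.shared_fc = nn.Linear(in_dim, in_dim)     # commonness
        self.sarcasm_fc = nn.Linear(in_dim, in_dim)    # sarcasm uniqueness
        self.sentiment_fc = nn.Linear(in_dim, in_dim)  # sentiment uniqueness
        self.sarcasm_gate = nn.Linear(2 * in_dim, in_dim)
        self.sentiment_gate = nn.Linear(2 * in_dim, in_dim)
        self.sarcasm_out = nn.Linear(in_dim, n_sarcasm)
        self.sentiment_out = nn.Linear(in_dim, n_sentiment)

    def forward(self, x):
        # x: (batch, in_dim) fused multimodal representation
        common = torch.relu(self.shared_fc(x))
        sar = torch.relu(self.sarcasm_fc(x))
        sen = torch.relu(self.sentiment_fc(x))
        g_sar = torch.sigmoid(self.sarcasm_gate(torch.cat([common, sar], dim=-1)))
        g_sen = torch.sigmoid(self.sentiment_gate(torch.cat([common, sen], dim=-1)))
        sarcasm_logits = self.sarcasm_out(g_sar * common + (1 - g_sar) * sar)
        sentiment_logits = self.sentiment_out(g_sen * common + (1 - g_sen) * sen)
        return sarcasm_logits, sentiment_logits


if __name__ == "__main__":
    attn = CrossModalTargetAttention()
    head = DualGatingMultitaskHead()
    text = torch.randn(4, 20, 768)   # e.g., token features from a text encoder
    image = torch.randn(4, 36, 512)  # e.g., image-region features
    aligned = attn(text, image)      # (4, 20, 256)
    fused = aligned.mean(dim=1)      # simple pooling, for the demo only
    sarcasm_logits, sentiment_logits = head(fused)
    print(sarcasm_logits.shape, sentiment_logits.shape)  # (4, 2) (4, 3)
```

The sketch only illustrates the division of labor the abstract describes: the shared layer carries what the two tasks have in common, the task-specific layers carry what is unique to each, and the gates decide per task how much of each to use.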