Academic Article

Holistic Multi-Modal Memory Network for Movie Question Answering
Document Type
Periodical
Source
IEEE Transactions on Image Processing, 29:489-499, 2020
Subject
Signal Processing and Analysis
Communication, Networking and Broadcast Technologies
Computing and Processing
Knowledge discovery
Visualization
Videos
Hidden Markov models
Task analysis
Motion pictures
Semantics
Question answering
multi-modal learning
MovieQA
Language
English
ISSN
1057-7149 (print)
1941-0042 (online)
Abstract
Answering questions using multi-modal context is a challenging problem, as it requires a deep integration of diverse data sources. Existing approaches consider only a subset of all possible interactions among data sources during one attention hop. In this paper, we present a holistic multi-modal memory network (HMMN) framework that fully considers interactions between different input sources (multi-modal context and question) at each hop. In addition, to home in on relevant information, our framework takes answer choices into account during the context retrieval stage. Our HMMN framework effectively integrates information from the multi-modal context, question, and answer choices, enabling more informative context to be retrieved for question answering. Experimental results on the MovieQA and TVQA datasets validate the effectiveness of our HMMN framework. Extensive ablation studies show the importance of holistic reasoning and reveal the contributions of different attention strategies to model performance.
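The abstract describes the architecture only at a high level. The following is a minimal NumPy sketch of what one such "holistic" attention hop could look like: the query jointly encodes the question and an answer choice, attends over video and subtitle memories, and fuses the retrieved summaries. The shapes, the additive fusion, and the scoring rule are illustrative assumptions, not the paper's actual equations.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def holistic_hop(video_ctx, subtitle_ctx, question, answer):
    """One illustrative attention hop: the query mixes the question with an
    answer choice, then attends over both context modalities and fuses the
    retrieved summaries. (Hypothetical simplification, not the authors'
    exact formulation.)"""
    query = question + answer                # joint question/answer query
    attn_v = softmax(video_ctx @ query)      # attention weights over video features
    attn_s = softmax(subtitle_ctx @ query)   # attention weights over subtitle features
    v_summary = attn_v @ video_ctx           # weighted video summary
    s_summary = attn_s @ subtitle_ctx        # weighted subtitle summary
    return query + v_summary + s_summary     # updated query for the next hop

# Toy example: score each of five answer choices for one question.
d = 64
rng = np.random.default_rng(0)
video_ctx = rng.standard_normal((20, d))     # 20 video-clip features
subtitle_ctx = rng.standard_normal((30, d))  # 30 subtitle features
question = rng.standard_normal(d)
answers = rng.standard_normal((5, d))

scores = []
for a in answers:
    state = holistic_hop(video_ctx, subtitle_ctx, question, a)
    state = holistic_hop(video_ctx, subtitle_ctx, question + state, a)  # second hop
    scores.append(float(state @ a))          # match retrieved context to the answer
print("predicted answer:", int(np.argmax(scores)))
```

Because each hop's query already contains the answer choice, the context retrieval is conditioned on all three input sources at once, which is the "holistic" interaction the abstract contrasts with approaches that attend with the question alone.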