Academic Article

Contrastive Video Question Answering via Video Graph Transformer

Document Type

Periodical

Author

Xiao, J.; Zhou, P.; Yao, A.; Li, Y.; Hong, R.; Yan, S.; Chua, T.

Source

IEEE Transactions on Pattern Analysis and Machine Intelligence (IEEE Trans. Pattern Anal. Mach. Intell.), 45(11):13265-13280, Nov. 2023

Subject

Computing and Processing
Bioengineering
Transformers
Cognition
Visualization
Task analysis
Question answering (information retrieval)
Data models
Benchmark testing
VideoQA
cross-modal visual reasoning
video-language
dynamic visual graphs
contrastive learning
transformer

Language

ISSN

0162-8828
2160-9292
1939-3539

Abstract

We propose to perform video question answering (VideoQA) in a Contrastive manner via a Video Graph Transformer model (CoVGT). CoVGT's uniqueness and superiority are three-fold: 1) It proposes a dynamic graph transformer module which encodes video by explicitly capturing the visual objects, their relations and dynamics, for complex spatio-temporal reasoning. 2) It designs separate video and text transformers for contrastive learning between the video and text to perform QA, instead of multi-modal transformer for answer classification. Fine-grained video-text communication is done by additional cross-modal interaction modules. 3) It is optimized by the joint fully- and self-supervised contrastive objectives between the correct and incorrect answers, as well as the relevant and irrelevant questions respectively. With superior video encoding and QA solution, we show that CoVGT can achieve much better performances than previous arts on video reasoning tasks. Its performances even surpass those models that are pretrained with millions of external data. We further show that CoVGT can also benefit from cross-modal pretraining, yet with orders of magnitude smaller data. The results demonstrate the effectiveness and superiority of CoVGT, and additionally reveal its potential for more data-efficient pretraining.
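The contrastive QA idea in the abstract (scoring candidate answers by similarity to a video-question representation rather than classifying over a fixed answer set) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the dot-product similarity, and the toy vectors are assumptions made for illustration, and the encoders, cross-modal interaction modules, and the self-supervised term over relevant and irrelevant questions are omitted.

```python
import math


def dot(u, v):
    """Dot-product similarity between two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def contrastive_qa_loss(query, candidates, correct_idx):
    """Hypothetical helper: softmax cross-entropy over similarity scores.

    query       -- embedding of the video fused with the question (assumed given)
    candidates  -- one embedding per candidate answer (assumed given)
    correct_idx -- index of the correct answer among the candidates

    Returns (loss, predicted_index). The loss pulls the query toward the
    correct answer and pushes it away from the incorrect ones.
    """
    scores = [dot(query, c) for c in candidates]
    # Numerically stable log-sum-exp over all candidate scores.
    top = max(scores)
    log_z = top + math.log(sum(math.exp(s - top) for s in scores))
    loss = log_z - scores[correct_idx]
    # At inference time the answer with the highest similarity is selected.
    predicted = max(range(len(scores)), key=scores.__getitem__)
    return loss, predicted


# Toy example with made-up 2-D embeddings.
query = [1.0, 0.0]
candidates = [[0.0, 1.0], [2.0, 0.0], [-1.0, 0.0]]
loss, predicted = contrastive_qa_loss(query, candidates, correct_idx=1)
```

Because answers are compared by similarity, the same scoring works for any set of candidate answers without retraining a fixed-size classification head, which is one plausible reading of why the abstract contrasts this design with answer classification.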
