Journal Article

Text-Based Localization of Moments in a Video Corpus
Document Type
Periodical
Source
IEEE Transactions on Image Processing, 30:8886-8899, 2021
Subject
Signal Processing and Analysis
Communication, Networking and Broadcast Technologies
Computing and Processing
Task analysis
Location awareness
Semantics
Visualization
Image coding
Feature extraction
Annotations
Temporal localization
video moment retrieval
video corpus
Language
English
ISSN
1057-7149 (Print)
1941-0042 (Electronic)
Abstract
Prior works on text-based video moment localization focus on temporally grounding a textual query in a single untrimmed video. These works assume that the relevant video is already known and attempt to localize the moment in that video only. Unlike such works, we relax this assumption and address the task of localizing moments in a corpus of videos for a given sentence query. This task poses a unique challenge, as the system is required to perform: 1) retrieval of the relevant video, where only a segment of the video corresponds to the queried sentence, and 2) temporal localization of the moment in the relevant video based on the sentence query. To overcome this challenge, we propose the Hierarchical Moment Alignment Network (HMAN), which learns an effective joint embedding space for moments and sentences. In addition to learning subtle differences between intra-video moments, HMAN focuses on distinguishing inter-video global semantic concepts based on sentence queries. Qualitative and quantitative results on three benchmark text-based video moment retrieval datasets - Charades-STA, DiDeMo, and ActivityNet Captions - demonstrate that our method achieves promising performance on the proposed task of temporal localization of moments in a corpus of videos.
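
The central mechanism the abstract describes is a joint embedding space in which candidate moments from every video in the corpus are ranked directly against a sentence query, so video retrieval and temporal localization reduce to a single cross-modal similarity ranking. Below is a minimal, hypothetical Python/PyTorch sketch of that idea only; the class name JointEmbedding, the linear projection layers, the feature dimensions, and the rank_corpus helper are illustrative assumptions and not the authors' HMAN architecture, which additionally models intra-video moment differences and inter-video semantic concepts hierarchically.

import torch
import torch.nn as nn
import torch.nn.functional as F

class JointEmbedding(nn.Module):
    """Hypothetical joint embedding: maps moments and sentences into one space."""
    def __init__(self, video_dim=500, text_dim=300, embed_dim=256):
        super().__init__()
        # Separate projections bring each modality into the shared space.
        self.moment_proj = nn.Linear(video_dim, embed_dim)
        self.sentence_proj = nn.Linear(text_dim, embed_dim)

    def embed_moments(self, moment_feats):
        # moment_feats: (num_moments, video_dim) pooled clip features.
        return F.normalize(self.moment_proj(moment_feats), dim=-1)

    def embed_sentence(self, sentence_feat):
        # sentence_feat: (text_dim,) pooled query representation.
        return F.normalize(self.sentence_proj(sentence_feat), dim=-1)

def rank_corpus(model, corpus_moments, sentence_feat, top_k=5):
    """Rank candidate moments from ALL videos against one sentence query.

    corpus_moments: list of (video_id, (start, end), feature) tuples drawn
    from every video in the corpus, so retrieving the right video and
    localizing the moment are handled by a single similarity ranking.
    """
    feats = torch.stack([f for _, _, f in corpus_moments])
    moment_emb = model.embed_moments(feats)          # (N, embed_dim)
    query_emb = model.embed_sentence(sentence_feat)  # (embed_dim,)
    scores = moment_emb @ query_emb                  # cosine similarities
    best = scores.topk(min(top_k, len(corpus_moments))).indices.tolist()
    return [(corpus_moments[i][0], corpus_moments[i][1], scores[i].item())
            for i in best]

if __name__ == "__main__":
    torch.manual_seed(0)
    model = JointEmbedding()
    # Toy corpus: three 5-second candidate moments from each of three videos.
    corpus = [(f"video_{v}", (s, s + 5.0), torch.randn(500))
              for v in range(3) for s in (0.0, 5.0, 10.0)]
    query = torch.randn(300)  # stand-in for an encoded sentence query
    for vid, span, score in rank_corpus(model, corpus, query):
        print(f"{vid} {span}: {score:.3f}")

In practice such a model would be trained with a ranking loss so that, as the abstract notes, a moment scores above other moments within the same video and above moments from semantically unrelated videos; the untrained random weights here only illustrate the corpus-wide ranking interface.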