Academic Paper

One-Shot Example Videos Localization Network for Weakly-Supervised Temporal Action Localization
Document Type
Conference
Source
2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR), pp. 125-130, Sep. 2021
Subject
Components, Circuits, Devices and Systems
Computing and Processing
Engineering Profession
Robotics and Control Systems
Signal Processing and Analysis
Location awareness
Training
Conferences
Information processing
Streaming media
Image reconstruction
Videos
Video Analysis
Weakly Supervised Learning
Action Localization
Untrimmed Video
Language
English
Abstract
This paper tackles the problem of example-driven weakly-supervised temporal action localization. We propose the One-shot Example Videos Localization Network (OSEVLNet) for precisely localizing action instances in untrimmed videos given only one trimmed example video. Since frame-level ground truth is unavailable under weakly-supervised settings, our approach automatically trains a self-attention module with reconstruction and feature discrepancy restrictions. Specifically, the reconstruction restriction minimizes the discrepancy between the original input features and the features reconstructed by a Variational AutoEncoder (VAE) module. The feature discrepancy restriction maximizes the distance between the weighted features of highly-responsive regions and those of slightly-responsive regions. Our approach achieves comparable or better results on the THUMOS'14 dataset than other weakly-supervised methods while being trained on far fewer videos. Moreover, it is especially well suited to expanding to newly emerging action categories, meeting the requirements of different application scenarios.
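To make the two restrictions described in the abstract concrete, here is a minimal PyTorch-style sketch. The feature dimensions, the hinge margin, and names such as FeatureVAE and osevl_losses are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureVAE(nn.Module):
    """Hypothetical VAE over per-snippet video features of dimension d."""
    def __init__(self, d=1024, z=128):
        super().__init__()
        self.enc = nn.Linear(d, 2 * z)   # outputs mean and log-variance
        self.dec = nn.Linear(z, d)

    def forward(self, x):                # x: (T, d) snippet features
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        return self.dec(z), mu, logvar

def osevl_losses(x, attn, vae, margin=1.0):
    """x: (T, d) features; attn: (T,) self-attention weights in [0, 1]."""
    recon, mu, logvar = vae(x)
    # Reconstruction restriction: reconstructed features should match inputs.
    l_recon = F.mse_loss(recon, x)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    # Feature discrepancy restriction: push apart the attention-weighted
    # aggregate of highly-responsive (foreground) regions and that of
    # slightly-responsive (background) regions, here via a hinge loss on
    # their L2 distance (the specific loss form is an assumption).
    fg = (attn.unsqueeze(-1) * x).sum(0) / (attn.sum() + 1e-6)
    bg = ((1 - attn).unsqueeze(-1) * x).sum(0) / ((1 - attn).sum() + 1e-6)
    l_disc = F.relu(margin - torch.norm(fg - bg, p=2))
    return l_recon + kl + l_disc
```

Both terms require no frame-level labels: the VAE term regularizes the feature space, while the discrepancy term sharpens the self-attention weights by forcing foreground and background aggregates apart.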