Academic Paper

Random Walks for Temporal Action Segmentation with Timestamp Supervision
Document Type
Conference
Source
2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 6600-6610, Jan. 2024
Subject
Computing and Processing
Training
Computer vision
Adaptation models
Uncertainty
Smoothing methods
Image annotation
Predictive models
Algorithms
Video recognition and understanding
Machine learning architectures, formulations, and algorithms
Language
English
ISSN
2642-9381
Abstract
Temporal action segmentation is a high-level video understanding task, commonly formulated as frame-wise classification of untrimmed videos into predefined actions. Fully-supervised deep-learning approaches require dense video annotations, which are time-consuming and costly to obtain. Furthermore, the temporal boundaries between consecutive actions are typically not well defined, leading to inherent ambiguity and inter-rater disagreement. A promising approach to remedy these limitations is timestamp supervision, which requires only one labeled frame per action instance in a training video. In this work, we reformulate the task of temporal action segmentation as a graph segmentation problem with weakly-labeled vertices. We introduce an efficient segmentation method based on random walks on graphs, obtained by solving a sparse system of linear equations. Furthermore, the proposed technique can be employed in any one, or any combination, of the following forms: (1) as a standalone solution for generating dense pseudo-labels from timestamps; (2) as a training loss; (3) as a smoothing mechanism for intermediate predictions. Extensive experiments on three datasets (50Salads, Breakfast, GTEA) show that our method competes with the state of the art and enables the identification of regions of uncertainty around action boundaries.
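To make the formulation in the abstract concrete, the sketch below propagates timestamp labels to all frames by solving the sparse Dirichlet (random-walker) linear system on a temporal chain graph, where vertices are frames and edge weights encode similarity between consecutive frames. This is a minimal sketch under stated assumptions, not the authors' implementation: the Gaussian similarity kernel, the beta parameter, the chain-graph structure, and all function and variable names are illustrative choices.

import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

def random_walk_pseudo_labels(features, seed_frames, seed_labels, n_classes, beta=10.0):
    """Propagate timestamp labels to every frame via the random-walker
    (Dirichlet) formulation on a chain graph over consecutive frames.

    features:    (N, d) array of per-frame embeddings
    seed_frames: indices of the timestamp-annotated frames
    seed_labels: action class of each seed frame
    Returns (pseudo_labels, probs): dense labels and per-class probabilities.
    """
    n = features.shape[0]
    # Gaussian similarity between consecutive frames: high weight means
    # similar frames, so the walker rarely crosses dissimilar (boundary)
    # transitions. The kernel and beta are illustrative assumptions.
    diffs = np.sum((features[1:] - features[:-1]) ** 2, axis=1)
    w = np.exp(-beta * diffs / (diffs.mean() + 1e-8))
    # Sparse combinatorial Laplacian L = D - W of the chain graph.
    W = sp.diags([w, w], offsets=[-1, 1], shape=(n, n), format="csr")
    L = (sp.diags(np.asarray(W.sum(axis=1)).ravel()) - W).tocsr()
    seed_frames = np.asarray(seed_frames)
    # Boolean masks enumerate vertices in ascending order, so sort the
    # labels to match.
    seed_labels = np.asarray(seed_labels)[np.argsort(seed_frames)]
    seeded = np.zeros(n, dtype=bool)
    seeded[seed_frames] = True
    # One-hot label matrix for the seeded vertices.
    M = np.zeros((seeded.sum(), n_classes))
    M[np.arange(seeded.sum()), seed_labels] = 1.0
    # Dirichlet problem: solve L_U x = -B M for the unseeded vertices,
    # one right-hand side per action class.
    L_U = L[~seeded][:, ~seeded].tocsc()
    B = L[~seeded][:, seeded]
    probs = np.zeros((n, n_classes))
    probs[seeded] = M
    probs[~seeded] = splu(L_U).solve(-(B @ M))
    return probs.argmax(axis=1), probs

For each class, the solve yields the probability that a random walker started at a given frame first reaches a seed of that class; the argmax gives dense pseudo-labels, while near-uniform per-frame probabilities flag the uncertain regions around action boundaries that the abstract refers to.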