Academic Paper

Unified Spatio-Temporal Dynamic Routing for Efficient Video Object Segmentation
Document Type
Periodical
Source
IEEE Transactions on Intelligent Transportation Systems (IEEE Trans. Intell. Transport. Syst.), 25(5):4512-4526, May 2024
Subject
Transportation
Aerospace
Communication, Networking and Broadcast Technologies
Computing and Processing
Robotics and Control Systems
Signal Processing and Analysis
Semantics
Object segmentation
Task analysis
Memory management
Visualization
Vehicle dynamics
Routing
Video object segmentation
spatio-temporal dynamic routing
progressive contextual memory enhancement
spatial constraint
temporal consistency
Language
ISSN
1524-9050
1558-0016
Abstract
Existing methods for video object segmentation (VOS) have achieved significant success by exploiting semantic guidance, spatial constraints, or temporal consistency. However, VOS remains highly challenging because it is difficult to leverage all three cues jointly while reducing redundant information. In this paper, we propose an efficient unified spatio-temporal dynamic routing (STDR) framework that addresses VOS by achieving a better spatio-temporal balance while avoiding redundancy. Specifically, our unified spatio-temporal modeling contains three paths: 1) a short-term spatial path mines spatial constraints from the previous frame; 2) a long-term semantic path captures semantic cues from the first reference frame, which has ground-truth labels; 3) a memory queue path efficiently exploits the temporal consistency of the intermediate frames using a compact memory bank of constant size. To enhance the input of each path, we introduce a progressive contextual memory enhancement module that, for each memory frame, progressively aggregates spatial contextual information from adjacent frames, yielding contextualized memory with growing receptive fields. Furthermore, we design a dynamic memory-routed module that globally refines the outputs of the three paths for unified modeling. Enhanced by the proposed modules, STDR achieves state-of-the-art performance at fast inference speed on the DAVIS 2016, DAVIS 2017 Val/Test, YouTube-VOS 2018/2019, and real-world long-video benchmarks.
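The three memory sources named in the abstract (first reference frame, previous frame, and a constant-size bank of intermediate frames) can be illustrated with a minimal bookkeeping sketch. This is not the authors' implementation: the class name `ThreePathMemory`, the `capacity` parameter, and the first-in-first-out eviction rule are assumptions made for illustration, since the abstract does not state how the memory bank is updated.

```python
from collections import deque


class ThreePathMemory:
    """Hypothetical container for the three memory sources in the abstract.

    Assumption: the constant-size bank evicts its oldest entry first (FIFO).
    The abstract only says the bank is compact and of constant size.
    """

    def __init__(self, capacity=3):
        self.reference = None                 # first frame (long-term semantic path)
        self.previous = None                  # most recent frame (short-term spatial path)
        self.queue = deque(maxlen=capacity)   # intermediate frames (memory queue path)

    def update(self, frame_index, features):
        """Register a newly segmented frame and shift the older entries."""
        entry = (frame_index, features)
        if self.reference is None:
            # The first frame serves as the reference and, initially, as "previous".
            self.reference = entry
            self.previous = entry
            return
        # The old "previous" frame becomes an intermediate frame,
        # unless it is the reference frame, which is stored separately.
        if self.previous is not self.reference:
            self.queue.append(self.previous)  # deque drops the oldest when full
        self.previous = entry


# Usage: feed ten frames with placeholder features.
memory = ThreePathMemory(capacity=3)
for i in range(10):
    memory.update(i, f"features_{i}")

print(memory.reference[0])             # index of the reference frame
print(memory.previous[0])              # index of the previous frame
print([i for i, _ in memory.queue])    # indices held in the constant-size bank
```

Whatever the video length, the number of stored frames never exceeds `capacity + 2`, which is the property the abstract attributes to the compact memory bank.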