Academic Paper

Video Instance Segmentation via Spatial Feature Enhancement and Temporal Fusion
Document Type
Conference
Source
2024 4th Asia Conference on Information Engineering (ACIE), pp. 58-64, Jan. 2024
Subject
Computing and Processing
Instance segmentation
Tracking
Motion segmentation
Asia
Feature extraction
Multitasking
Data mining
video instance segmentation
temporal fusion
spatial attention
Language
English
Abstract
Video instance segmentation (VIS) has recently received enormous attention due to its potential in areas such as autonomous driving and video understanding. As a challenging multi-task problem, its key difficulties are instance tracking failure and incomplete segmentation caused by the occlusion and motion blur that are widespread in videos. To address this, we propose a novel online video instance segmentation algorithm that exploits both the spatial and temporal information in the video to improve segmentation quality. First, a spatial feature enhancement strategy is designed to extract spatial information from video frames, which helps locate instances and achieve highly accurate segmentation results. Then, a temporal fusion module is proposed to capture temporal information and generate instance offsets for tracking. The temporal information obtained helps mitigate the adverse effects of mutual occlusion and motion blur. In addition, the temporal fusion module performs tracking almost simultaneously with segmentation, which avoids the computational overhead of introducing a separate tracking module. Extensive experiments on the challenging YouTube-VIS-2019 dataset demonstrate that the proposed method outperforms other leading algorithms.
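The abstract describes two components: a spatial feature enhancement block that re-weights per-frame features with spatial attention, and a temporal fusion block that combines adjacent-frame features and regresses instance offsets for tracking. The following PyTorch sketch illustrates one plausible reading of those ideas; the module names, channel sizes, and offset head are illustrative assumptions and not the authors' actual implementation.

```python
# Minimal sketch (assumptions, not the paper's code):
# (1) spatial attention that enhances a single frame's feature map, and
# (2) temporal fusion of current and previous frame features that also
#     predicts per-pixel (dx, dy) instance offsets for tracking.
import torch
import torch.nn as nn


class SpatialFeatureEnhancement(nn.Module):
    """Spatial attention over one frame's feature map (hypothetical design)."""

    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # Pool along the channel axis, then predict a per-pixel weight map.
        avg_pool = feat.mean(dim=1, keepdim=True)
        max_pool = feat.max(dim=1, keepdim=True).values
        attn = torch.sigmoid(self.conv(torch.cat([avg_pool, max_pool], dim=1)))
        return feat * attn  # spatially enhanced features


class TemporalFusion(nn.Module):
    """Fuses adjacent-frame features and regresses tracking offsets."""

    def __init__(self, channels: int = 256):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
        )
        # 2-channel map: per-pixel (dx, dy) offset between frames.
        self.offset_head = nn.Conv2d(channels, 2, 3, padding=1)

    def forward(self, feat_t: torch.Tensor, feat_prev: torch.Tensor):
        fused = self.fuse(torch.cat([feat_t, feat_prev], dim=1))
        offsets = self.offset_head(fused)
        return fused, offsets


if __name__ == "__main__":
    enhance, fuse = SpatialFeatureEnhancement(), TemporalFusion(256)
    f_t, f_prev = torch.randn(1, 256, 48, 80), torch.randn(1, 256, 48, 80)
    fused, offsets = fuse(enhance(f_t), enhance(f_prev))
    print(fused.shape, offsets.shape)  # (1, 256, 48, 80), (1, 2, 48, 80)
```

Because the offsets are produced from the same fused features used for segmentation, tracking adds only one small head on top of the shared computation, which is consistent with the abstract's claim that tracking runs almost simultaneously with segmentation.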