Academic Journal Article
Guest Editorial Introduction to the Special Issue on Video Transformers
Document Type
Periodical
Author
Source
IEEE Transactions on Circuits and Systems for Video Technology (IEEE Trans. Circuits Syst. Video Technol.), 33(9):4448-4451, Sep. 2023
Subject
Language
ISSN
1051-8215
1558-2205
Abstract
Transformers are now widely used in natural language and image processing, where they have achieved excellent results. Thanks to self-attention and global interaction, the Transformer has demonstrated more powerful spatiotemporal modeling capabilities than traditional convolutional and recurrent neural networks. However, research on video Transformers is still in its infancy. With the development of internet technology, video has become a commonly used medium that plays a critical role in areas such as entertainment, education, healthcare, and security. Unlike static data such as images and text, video consists of a series of image frames and places greater emphasis on temporal and motion information, so capturing discriminative features requires adaptations and well-designed network architectures. In addition, the multi-modal information attached to video data further increases the difficulty of applying Transformers to video.