Academic Journal Article

Guest Editorial Introduction to the Special Issue on Video Transformers
Document Type
Periodical
Source
IEEE Transactions on Circuits and Systems for Video Technology (IEEE Trans. Circuits Syst. Video Technol.), 33(9):4448-4451, Sep. 2023
Subject
Components, Circuits, Devices and Systems
Communication, Networking and Broadcast Technologies
Computing and Processing
Signal Processing and Analysis
Special issues and sections
Transformers
Image processing
Natural language processing
Video equipment
Spatiotemporal phenomena
Videos
Language
ISSN
1051-8215
1558-2205
Abstract
The Transformer is now widely used in natural language and image processing, where it has achieved excellent results. Owing to self-attention and global interaction, it has demonstrated more powerful spatiotemporal modeling capabilities than traditional convolutional and recurrent neural networks. However, research on video Transformers is still in its infancy. With the development of internet technology, video has become a commonly used medium that plays a critical role in areas such as entertainment, education, healthcare, and security. Unlike static data such as images and text, video consists of a sequence of image frames and places more weight on temporal and motion information, so capturing discriminative features requires adaptations and well-designed network architectures. In addition, the multi-modal information attached to video data further increases the difficulty of applying Transformers to video.