Academic Paper

Action-Transformer for Action Recognition in Short Videos
Document Type
Conference
Source
2021 11th International Conference on Intelligent Control and Information Processing (ICICIP), pp. 278-283, Dec. 2021
Subject
Aerospace
Bioengineering
Communication, Networking and Broadcast Technologies
Computing and Processing
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Transportation
Three-dimensional displays
Convolution
Memory management
Transforms
Information processing
Feature extraction
Data models
Keywords
action-transformer
convolution-free
residual attention
action recognition
Language
English
Abstract
Action recognition methods are mostly based on 3-Dimensional (3D) Convolution Networks, which have some limitations in practice, e.g., redundant parameters, high memory consumption, and low performance. In this paper, a new convolution-free model called the action-transformer is proposed to address these problems. The proposed model is mainly composed of three modules: a spatial-temporal transformation module, a hybrid feature attention module, and a residual-transformer module. The spatial-temporal transformation module maps the split short video into spatial and temporal features. The hybrid feature attention module extracts fine-grained features from the spatial and temporal features and produces the hybrid features. The residual-transformer module combines attention, a feed-forward network, and the residual mechanism to extract local and global features from the hybrid features. The model is tested on the HMDB51 and UCF101 datasets, and the results show that the proposed model uses less memory and fewer parameters than the models mentioned in the literature while also achieving better performance.
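The record does not include the paper's implementation details, so the following is only a minimal sketch of the residual-transformer idea named in the abstract (attention plus a feed-forward network, each wrapped in a residual connection), written in PyTorch. The class name, token layout, dimensions, and the use of nn.MultiheadAttention are illustrative assumptions, not the authors' actual code.

# Hypothetical sketch of the residual-transformer module described in the
# abstract: attention + feed-forward network + residual mechanism, applied
# to "hybrid features". All names and sizes are assumptions for illustration.
import torch
import torch.nn as nn

class ResidualTransformerBlock(nn.Module):
    def __init__(self, dim=256, heads=8, ffn_mult=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(
            nn.Linear(dim, dim * ffn_mult),
            nn.GELU(),
            nn.Linear(dim * ffn_mult, dim),
        )

    def forward(self, x):  # x: (batch, tokens, dim) hybrid features
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h)   # global context via self-attention
        x = x + attn_out                   # residual connection
        x = x + self.ffn(self.norm2(x))    # local refinement + residual
        return x

# Usage example: a clip split into 16 frames of 49 spatial tokens each.
tokens = torch.randn(2, 16 * 49, 256)
out = ResidualTransformerBlock()(tokens)
print(out.shape)  # torch.Size([2, 784, 256])

Because every layer here is a linear map, normalization, or attention, the block is convolution-free in the sense the abstract describes; the residual paths let it mix the attention's global view with the feed-forward network's per-token (local) processing.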