Academic Paper

GEA-net: Global embedded attention neural network for image classification
Document Type
Conference
Source
2021 IEEE 20th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), pp. 1300-1305, Oct. 2021
Subject
Communication, Networking and Broadcast Technologies
Computing and Processing
Visualization
Privacy
Three-dimensional displays
Tensors
Neuroscience
Fuses
Neurons
Attention
Image classification
Global context information
ISSN
2324-9013
Abstract
In the neuroscience community it is generally acknowledged that the receptive field size of visual cortical neurons is regulated by the stimulus. Thus, once a global receptive field is obtained, network performance can be greatly improved. Unfortunately, enlarging the receptive field has rarely been considered in constructing CNNs. Recent studies on network design have shown that channel attention is key to improving model performance, but such methods usually neglect location information, making it difficult to capture long-range dependencies on location. In particular, location information is important for generating spatially selective attention maps. In this paper we therefore propose a novel attention neural network, termed “GEA”, that embeds global context information into channel attention. First, instead of channel attention that collapses the feature tensor into a single feature vector via 2D pooling, our method decomposes channel attention into two 1D feature-encoding processes that aggregate features along the two spatial directions. Long-range dependencies can thus be captured along one spatial direction while accurate location information is preserved along the other. We then concatenate the results of the two directions and apply batch normalization followed by a ReLU activation. In addition, cross-channel soft attention is used to adaptively select information at different spatial scales. Finally, our method is simple and can be flexibly inserted into classic networks such as ResNet and EfficientNet with limited computational overhead. Extensive experiments show that our method not only benefits classification on COCO and ImageNet, but also demonstrates promising performance on 3D features such as Magnetic Resonance Imaging (MRI).
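The directional decomposition described in the abstract (two 1D poolings along the spatial axes, concatenation, normalization and activation, then direction-wise attention maps) can be sketched as follows in NumPy. This is a minimal illustration, not the paper's implementation: the learned convolutions, batch normalization, and the cross-channel soft attention are simplified away, and the function name `gea_attention` is illustrative.

```python
import numpy as np

def gea_attention(x):
    """Sketch of a direction-decomposed attention block for a (C, H, W) feature map."""
    C, H, W = x.shape
    # 1D average pooling along each spatial direction
    pool_h = x.mean(axis=2)                       # (C, H): aggregate over width
    pool_w = x.mean(axis=1)                       # (C, W): aggregate over height
    # concatenate the two 1D encodings along the spatial axis
    y = np.concatenate([pool_h, pool_w], axis=1)  # (C, H + W)
    # the paper applies conv + batch norm + ReLU here; a plain ReLU stands in
    y = np.maximum(y, 0.0)
    # split back into the two directions and squash into [0, 1] attention maps
    a_h = 1.0 / (1.0 + np.exp(-y[:, :H]))         # (C, H)
    a_w = 1.0 / (1.0 + np.exp(-y[:, H:]))         # (C, W)
    # reweight each location by its row attention and column attention
    return x * a_h[:, :, None] * a_w[:, None, :]
```

Because the attention factors broadcast over the opposite axis, each spatial location is modulated jointly by its row and column statistics, which is how location information is preserved while long-range context is aggregated.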