Academic Paper

Cross-Layer Aggregation with Transformers for Multi-Label Image Classification
Document Type
Conference
Source
ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3448-3452, May 2022
Subject
Bioengineering
Communication, Networking and Broadcast Technologies
Computing and Processing
Signal Processing and Analysis
Training
Costs
Benchmark testing
Transformers
Feature extraction
Computational efficiency
Task analysis
Cross-layer aggregation
transformers
multi-label image classification
Language
English
ISSN
2379-190X
Abstract
The multi-label image classification task aims to predict multiple object labels in a given image and faces the challenge of variable-sized objects. Limited by the size of convolution kernels, existing CNN-based methods have difficulty capturing global dependencies and effectively fusing features from multiple layers, both of which are critical for this task. Recently, transformers have used multi-head attention to extract features with long-range dependencies. Inspired by this, this paper proposes a Cross-layer Aggregation with Transformers (CAT) framework, which leverages transformers to capture the long-range dependencies of CNN-based features with a Long Range Dependencies module and to aggregate the features layer by layer with a Cross-Layer Fusion module. To make the framework efficient, a multi-head pre-max attention is designed to reduce the computation cost when fusing the high-resolution features of lower layers. On two widely used benchmarks (i.e., VOC2007 and MS-COCO), CAT provides a stable improvement over the baseline and achieves competitive performance.
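
The abstract gives no implementation details, but the efficiency idea it names can be illustrated. The sketch below is a minimal Python/PyTorch interpretation, not the authors' code: it assumes "pre-max" means max-pooling the keys and values of a standard multi-head attention before fusing a high-resolution lower-layer feature map, and all names here (PreMaxAttention, pool_size) are hypothetical.

# Illustrative sketch of a "pre-max attention" block, assuming the saving
# comes from max-pooling keys/values before multi-head attention.
import torch
import torch.nn as nn

class PreMaxAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8, pool_size: int = 2):
        super().__init__()
        # Spatial max pooling shrinks the key/value map by pool_size per side.
        self.pool = nn.MaxPool2d(kernel_size=pool_size, stride=pool_size)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: CNN feature map of shape (B, C, H, W).
        b, c, h, w = x.shape
        # Queries stay at full resolution: (B, H*W, C).
        q = x.flatten(2).transpose(1, 2)
        # Keys/values are pooled first: (B, H*W / pool_size^2, C).
        kv = self.pool(x).flatten(2).transpose(1, 2)
        out, _ = self.attn(q, kv, kv)
        return out.transpose(1, 2).reshape(b, c, h, w)

if __name__ == "__main__":
    feat = torch.randn(2, 256, 32, 32)  # e.g. a lower-layer CNN feature map
    print(PreMaxAttention(dim=256)(feat).shape)  # torch.Size([2, 256, 32, 32])

Under this assumption, with pooling factor p the attention matrix shrinks from (HW)^2 to (HW)(HW/p^2) entries, which is where the reduced computation cost on high-resolution lower-layer features would come from.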