Academic Paper

Efficient Large-Scale Audio Tagging Via Transformer-to-CNN Knowledge Distillation
Document Type
Conference
Source
ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1-5, Jun. 2023
Subject
Bioengineering
Communication, Networking and Broadcast Technologies
Computing and Processing
Signal Processing and Analysis
Training
Computational modeling
Tagging
Predictive models
Transformers
Computational efficiency
Convolutional neural networks
Audio Tagging
AudioSet
Patchout Audio Transformer
MobileNetV3
Knowledge Distillation
Language
English
ISSN
2379-190X
Abstract
Audio Spectrogram Transformer models currently dominate the field of Audio Tagging, outperforming the previously dominant Convolutional Neural Networks (CNNs). Their superiority rests on the ability to scale up and exploit large-scale datasets such as AudioSet. However, Transformers are demanding in terms of model size and computational requirements compared to CNNs. We propose a training procedure for efficient CNNs based on offline Knowledge Distillation (KD) from high-performing yet complex Transformers. The proposed training scheme and the efficient CNN design based on MobileNetV3 result in models that outperform previous solutions in terms of both parameter and computational efficiency and prediction performance. We provide models at different complexity levels, scaling from low-complexity models up to a new state-of-the-art performance of .483 mAP on AudioSet.
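To make the offline KD training procedure described in the abstract concrete, the sketch below shows one plausible training step in PyTorch: a MobileNetV3 student is trained on AudioSet's 527 multi-label targets with a loss that mixes binary cross-entropy on the ground-truth labels with binary cross-entropy against precomputed Transformer (teacher) predictions. "Offline" means the teacher's logits are computed once and stored, so the large Transformer never needs to be held in memory during student training. This is a minimal sketch under stated assumptions: the mixing weight `lam`, the plain torchvision `mobilenet_v3_small` student, and all tensor shapes are illustrative, not the authors' exact configuration.

```python
# Minimal sketch of one offline Knowledge Distillation step.
# Assumptions: teacher logits were precomputed and stored offline;
# `lam`, the stock torchvision MobileNetV3, and the input shapes are
# illustrative, not the paper's exact setup.
import torch
import torch.nn.functional as F
from torchvision.models import mobilenet_v3_small

NUM_CLASSES = 527  # size of the AudioSet ontology
lam = 0.1          # hypothetical weight on the hard-label loss

student = mobilenet_v3_small(num_classes=NUM_CLASSES)

def kd_loss(student_logits, labels, teacher_logits, lam=lam):
    """Multi-label KD loss: BCE on ground-truth labels mixed with
    BCE against the fixed, precomputed teacher probabilities."""
    label_loss = F.binary_cross_entropy_with_logits(student_logits, labels)
    # Teacher probabilities serve as per-class soft targets.
    distill_loss = F.binary_cross_entropy_with_logits(
        student_logits, torch.sigmoid(teacher_logits))
    return lam * label_loss + (1.0 - lam) * distill_loss

# One training step on a dummy batch of log-mel spectrograms.
spectrograms = torch.randn(8, 1, 128, 1000)             # (batch, ch, mels, frames)
labels = torch.randint(0, 2, (8, NUM_CLASSES)).float()  # multi-hot targets
teacher_logits = torch.randn(8, NUM_CLASSES)            # loaded from disk in practice

# Stock MobileNetV3 expects 3 input channels; replicate the mono channel.
logits = student(spectrograms.repeat(1, 3, 1, 1))
loss = kd_loss(logits, labels, teacher_logits)
loss.backward()
```

Because the soft targets come from a file rather than a live teacher forward pass, each training step costs only the student's forward and backward passes, which is what makes distilling from a large Transformer ensemble into an efficient CNN practical at AudioSet scale.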