Academic Paper

Device-Robust Acoustic Scene Classification via Impulse Response Augmentation

Document Type

Conference

Author

Morocutti, Tobias; Schmid, Florian; Koutini, Khaled; Widmer, Gerhard

Source

2023 31st European Signal Processing Conference (EUSIPCO), pp. 176-180, Sep. 2023

Subject

Signal Processing and Analysis
Training
Performance evaluation
Scene classification
Europe
Transformers
Acoustics
Recording
Recording Device Generalization
Impulse Response Augmentation
Freq-MixStyle
Acoustic Scene Classification

Language

ISSN

2076-1465

Abstract

The ability to generalize to a wide range of recording devices is a crucial performance factor for audio classification models. The characteristics of different types of microphones introduce distributional shifts in the digitized audio signals due to their varying frequency responses. If this domain shift is not taken into account during training, the model's performance could degrade severely when it is applied to signals recorded by unseen devices. In particular, training a model on audio signals recorded with a small number of different microphones can make generalization to unseen devices difficult. To tackle this problem, we convolve audio signals in the training set with pre-recorded device impulse responses (DIRs) to artificially increase the diversity of recording devices. We systematically study the effect of DIR augmentation on the task of Acoustic Scene Classification using CNNs and Audio Spectrogram Transformers. The results show that DIR augmentation in isolation performs similarly to the state-of-the-art method Freq-MixStyle. However, we also show that DIR augmentation and Freq-MixStyle are complementary, achieving a new state-of-the-art performance on signals recorded by devices unseen during training.
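The augmentation step described in the abstract, convolving a training signal with a pre-recorded device impulse response, could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `apply_dir` and `dir_augment`, the application probability `p`, and the peak-matching rescale are all assumptions introduced here, and the record does not state how the paper handles levels, lengths, or sampling of DIRs.

```python
import numpy as np


def apply_dir(waveform, impulse_response):
    """Convolve a mono waveform with one device impulse response (DIR).

    Hypothetical helper: the output is trimmed to the input length and
    rescaled to the input peak, both of which are assumptions.
    """
    # Full linear convolution, then keep only the first len(waveform) samples.
    wet = np.convolve(waveform, impulse_response, mode="full")[: len(waveform)]

    # Rescale so the output peak matches the input peak (assumed choice).
    peak_in = np.max(np.abs(waveform))
    peak_out = np.max(np.abs(wet))
    if peak_out > 0:
        wet = wet * (peak_in / peak_out)
    return wet


def dir_augment(waveform, dir_bank, p=0.5, rng=None):
    """With probability p, filter the waveform with a randomly chosen DIR.

    dir_bank is a list of 1-D arrays; p and the uniform choice of DIR
    are illustrative assumptions.
    """
    rng = rng or np.random.default_rng()
    if len(dir_bank) == 0 or rng.random() >= p:
        return waveform  # leave the sample unchanged
    idx = rng.integers(len(dir_bank))
    return apply_dir(waveform, dir_bank[idx])
```

In a training pipeline, a function like `dir_augment` would typically be called on each raw waveform before the spectrogram is computed, so that the simulated device coloration affects all downstream features.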
