Academic Paper

Towards an Efficient Deep Learning Model for Emotion and Theme Recognition in Music
Document Type
Conference
Source
2021 IEEE 23rd International Workshop on Multimedia Signal Processing (MMSP), pp. 1-5, Oct. 2021
Subject
Communication, Networking and Broadcast Technologies
Signal Processing and Analysis
Deep learning
Geometry
Emotion recognition
Visualization
Conferences
Signal processing
Hardware
automatic music tagging
music emotion recognition
VGG
multi-label classification
self-attention
Language
English
ISSN
2473-3628
Abstract
Emotion and theme recognition in music plays a vital role in music information retrieval and recommendation systems, and deep learning based techniques have shown great promise in this regard. Realising optimal network configurations with the fewest floating point operations per second (FLOPS) and model parameters is of paramount importance for obtaining efficient, deployable models, especially on resource-constrained hardware. We propose a novel integration of stand-alone self-attention into a Visual Geometry Group (VGG)-like network for the task of multi-label emotion and theme recognition in music. Through extensive experimental evaluation, we identify the optimal integration of stand-alone self-attention, which substantially reduces the number of parameters and FLOPS while yielding better performance. We benchmark our results on the autotagging-moodtheme subset of the MTG-Jamendo dataset. Using the mel-spectrogram as input, we demonstrate that our proposed SA-VGG network requires 55% fewer parameters and 60% fewer FLOPS while improving on the baseline ROC-AUC and PR-AUC.
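The abstract describes replacing parts of a VGG-like convolutional network with stand-alone self-attention applied to mel-spectrogram feature maps. The following is a minimal NumPy sketch of global 2D self-attention over a C x H x W feature map; the projection matrices `Wq`, `Wk`, `Wv`, all shapes, and the global (rather than windowed) attention pattern are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_2d(feat, Wq, Wk, Wv):
    """Single-head stand-alone self-attention over a (C, H, W) feature map.

    Every spatial position attends to every other position (global
    attention); the paper's exact attention variant is not specified
    in the abstract, so this is only a sketch of the mechanism.
    """
    C, H, W = feat.shape
    x = feat.reshape(C, H * W).T           # (N, C), N = H*W spatial positions
    q, k, v = x @ Wq, x @ Wk, x @ Wv       # learned linear projections (here random)
    d = Wq.shape[1]
    attn = softmax(q @ k.T / np.sqrt(d))   # (N, N) attention weights
    out = attn @ v                         # aggregate values per position
    return out.T.reshape(-1, H, W), attn

# Toy dimensions, purely for illustration.
rng = np.random.default_rng(0)
C, H, W, d = 8, 4, 4, 8
feat = rng.standard_normal((C, H, W))
Wq, Wk, Wv = (rng.standard_normal((C, d)) for _ in range(3))
out, attn = self_attention_2d(feat, Wq, Wk, Wv)
```

A self-attention layer like this has parameter cost 3·C·d regardless of spatial extent, whereas a k x k convolution costs k²·C·Cout, which is one intuition for why swapping convolutions for attention can shrink parameter and FLOP counts as the abstract reports.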