Academic Paper

Improving Automatic Singing Skill Evaluation with Timbral Features, Attention, and Singing Voice Separation
Document Type
Conference
Source
2023 IEEE International Conference on Multimedia and Expo (ICME), pp. 612-617, Jul. 2023
Subject
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Signal Processing and Analysis
Correlation coefficient
Instruments
Manuals
Data augmentation
Data models
Videos
Timbral features
attention
singing voice separation
accompaniment removal
Language
ISSN
1945-788X
Abstract
Most automatic singing skill evaluation (ASSE) models handle only solo singing, which limits their application scope because singing in music is usually mixed with instrumental accompaniment. In this paper, we propose a more general ASSE model that applies to both solo singing and singing with accompaniment. For this purpose, we employ an existing singing voice separation tool for accompaniment removal and compare ASSE models trained with and without accompaniment. Results show that accompaniment removal improves performance. Furthermore, we explore different features and model architectures and conclude that adding timbral features, an attention mechanism, and a dense layer further improves performance. Finally, we show that our proposed model achieves a Pearson correlation coefficient of 0.562, a 62.4% relative improvement over the baseline model's 0.346.
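The reported metric and the relative-improvement figure can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not taken from the paper: the helper names `pearson` and `relative_improvement` are hypothetical, and the small score lists are made-up placeholders, not the paper's data. Only the values 0.562 and 0.346 come from the abstract.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations (numerator) and sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def relative_improvement(new, baseline):
    """Fractional gain of `new` over `baseline`."""
    return (new - baseline) / baseline


# Values quoted in the abstract: proposed model vs. baseline.
gain = relative_improvement(0.562, 0.346)
print(f"relative improvement: {gain * 100:.1f}%")

# Placeholder data (hypothetical): human ratings vs. model predictions.
human_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
model_scores = [1.5, 1.8, 3.2, 3.9, 4.6]
print(f"Pearson r on placeholder data: {pearson(human_scores, model_scores):.3f}")
```

The relative improvement here is computed as (0.562 - 0.346) / 0.346, which matches the 62.4% stated in the abstract.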