Academic Paper

Quantitative Analysis of a Common Audio Similarity Measure
Document Type
Periodical
Source
IEEE Transactions on Audio, Speech, and Language Processing (IEEE Trans. Audio Speech Lang. Process.), 17(4):693-703, May 2009
Subject
Signal Processing and Analysis
Communication, Networking and Broadcast Technologies
Instruments
Music information retrieval
Cepstral analysis
Timbre
Bit rate
Speech processing
Frequency
Councils
Nearest neighbor searches
Source separation
Melody
musical instrument classification
timbre recognition
ISSN
1558-7916 (print)
1558-7924 (online)
Abstract
For music information retrieval tasks, a nearest neighbor classifier using the Kullback–Leibler divergence between Gaussian mixture models of songs' mel-frequency cepstral coefficients is commonly used to match songs by timbre. In this paper, we analyze this distance measure analytically and experimentally using synthesized MIDI files, and we find that it is highly sensitive to different instrument realizations. Despite the lack of theoretical foundation, it handles the multipitch case quite well when all pitches originate from the same instrument, but it has some weaknesses when different instruments play simultaneously. As a proof of concept, we demonstrate that a source separation frontend can improve performance. Furthermore, we have evaluated the robustness to changes in key, sample rate, and bitrate.
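To make the distance measure described in the abstract concrete, here is a minimal sketch of KL-divergence nearest neighbor matching. It makes a simplifying assumption that differs from the paper's setup: each song is modeled by a single diagonal-covariance Gaussian rather than a Gaussian mixture, because the KL divergence between two Gaussians has a closed form while the KL divergence between two mixtures does not (it is usually approximated, for example by Monte Carlo sampling). MFCC extraction is not shown; the input frames are assumed to be precomputed feature vectors. All function names (`fit_diag_gaussian`, `kl_diag`, `symmetric_kl`, `nearest_neighbor`) are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical song model: a single diagonal Gaussian (mean vector, variance vector).
# The paper uses Gaussian mixture models; this is a simplification.
Gaussian = Tuple[List[float], List[float]]


def fit_diag_gaussian(frames: Sequence[Sequence[float]]) -> Gaussian:
    """Fit one diagonal Gaussian to a song's feature frames (assumed MFCC vectors)."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Floor the variance so near-constant dimensions do not cause division by zero.
    var = [max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, 1e-9) for d in range(dim)]
    return mean, var


def kl_diag(p: Gaussian, q: Gaussian) -> float:
    """Closed-form KL(p || q) for two diagonal Gaussians."""
    mp, vp = p
    mq, vq = q
    total = 0.0
    for m1, v1, m2, v2 in zip(mp, vp, mq, vq):
        # Per-dimension term: ln(v2/v1) + (v1 + (m1 - m2)^2) / v2 - 1
        total += math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0
    return 0.5 * total


def symmetric_kl(p: Gaussian, q: Gaussian) -> float:
    """KL divergence is asymmetric; summing both directions gives a symmetric distance."""
    return kl_diag(p, q) + kl_diag(q, p)


def nearest_neighbor(query: Gaussian, library: Dict[str, Gaussian]) -> str:
    """Return the name of the library song whose model is closest to the query."""
    return min(library, key=lambda name: symmetric_kl(query, library[name]))


if __name__ == "__main__":
    # Toy 2-D "feature" frames standing in for real MFCCs.
    song_a = fit_diag_gaussian([[0.0, 0.0], [1.0, 1.0], [-1.0, -1.0], [0.5, -0.5]])
    song_b = fit_diag_gaussian([[5.0, 5.0], [6.0, 6.0], [4.0, 4.0], [5.5, 4.5]])
    query = fit_diag_gaussian([[0.2, 0.1], [0.9, 0.8], [-0.8, -1.1], [0.4, -0.3]])
    print(nearest_neighbor(query, {"song_a": song_a, "song_b": song_b}))
```

Symmetrizing the divergence is a design choice that lets it behave like a distance in the nearest neighbor search; a full reproduction of the measure studied here would replace the single Gaussian with a mixture model and an approximate KL estimate.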