Journal Article

Bayesian Singing Transcription Based on a Hierarchical Generative Model of Keys, Musical Notes, and F0 Trajectories
Document Type
Periodical
Source
IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 28, pp. 1678–1691, 2020
Subject
Signal Processing and Analysis
Computing and Processing
Communication, Networking and Broadcast Technologies
General Topics for Engineers
Trajectory
Hidden Markov models
Rhythm
Estimation
Multiple signal classification
Bayes methods
Automatic singing transcription
hierarchical hidden semi-Markov model
Language
English
ISSN
2329-9290
2329-9304
Abstract
This article describes automatic singing transcription (AST), which estimates a human-readable musical score of a sung melody, represented with quantized pitches and durations, from a given music audio signal. To achieve this goal, we propose a statistical method that estimates the musical score by quantizing a trajectory of vocal fundamental frequencies (F0s) in both the time and frequency directions. Since vocal F0 trajectories deviate considerably from the pitches and onset times of the musical notes specified in musical scores, the local keys and rhythms of musical notes should be taken into account. To this end, we propose a Bayesian hierarchical hidden semi-Markov model (HHSMM) that integrates a musical score model describing the local keys and rhythms of musical notes with an F0 trajectory model describing the temporal and frequency deviations of an F0 trajectory. Given an F0 trajectory, a sequence of musical notes, a sequence of local keys, and the temporal and frequency deviations can be estimated jointly with a Markov chain Monte Carlo (MCMC) method. We investigated the effect of each component of the proposed model and showed that the musical score model improves the performance of AST.
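The abstract contrasts the proposed Bayesian HHSMM with the naive alternative of quantizing an F0 trajectory frame by frame, which ignores local keys, rhythms, and deviations. As a point of reference, such a naive baseline can be sketched as follows; the function and parameter names (`hz_to_midi`, `naive_transcribe`, the 10 ms hop, the 50 ms minimum note length) are illustrative assumptions, not details from the paper.

```python
import math

def hz_to_midi(f0_hz: float) -> float:
    """Convert a fundamental frequency in Hz to a (fractional) MIDI pitch,
    with A4 = 440 Hz mapped to MIDI note 69."""
    return 69.0 + 12.0 * math.log2(f0_hz / 440.0)

def naive_transcribe(f0_track, frame_hop_s=0.01, min_note_s=0.05):
    """Naive baseline (NOT the paper's method): round each voiced frame
    (f0 > 0; 0 marks unvoiced frames) to the nearest semitone and merge
    consecutive equal pitches into (midi_pitch, onset_s, duration_s) notes,
    discarding notes shorter than min_note_s."""
    notes = []
    cur_pitch, cur_onset, cur_frames = None, 0.0, 0
    for i, f0 in enumerate(f0_track):
        pitch = round(hz_to_midi(f0)) if f0 > 0 else None
        if pitch != cur_pitch:
            if cur_pitch is not None and cur_frames * frame_hop_s >= min_note_s:
                notes.append((cur_pitch, cur_onset, cur_frames * frame_hop_s))
            cur_pitch, cur_onset, cur_frames = pitch, i * frame_hop_s, 0
        cur_frames += 1
    if cur_pitch is not None and cur_frames * frame_hop_s >= min_note_s:
        notes.append((cur_pitch, cur_onset, cur_frames * frame_hop_s))
    return notes

# Synthetic trajectory: 10 frames near A4 (441 Hz), then 10 near B4 (493.9 Hz)
track = [441.0] * 10 + [493.9] * 10
print(naive_transcribe(track))
```

Because real vocal F0 curves drift, overshoot at note onsets, and carry vibrato, this kind of frame-wise rounding fragments and mislabels notes; modeling keys, rhythms, and deviations jointly, as the proposed HHSMM does, is what makes the quantization robust.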