Academic Article

GMM-SVM Kernel With a Bhattacharyya-Based Distance for Speaker Recognition
Document Type
Periodical
Source
IEEE Transactions on Audio, Speech, and Language Processing (IEEE Trans. Audio Speech Lang. Process.). 18(6):1300-1312, Aug. 2010
Subject
Signal Processing and Analysis
Communication, Networking and Broadcast Technologies
Kernel
Speaker recognition
Support vector machines
Covariance matrix
Statistical distributions
NIST
Support vector machine classification
Scalability
Speech recognition
Speech processing
Gaussian mixture model (GMM)
speaker recognition
supervector
support vector machine (SVM)
Language
ISSN
1558-7916
1558-7924
Abstract
Among conventional methods for text-independent speaker recognition, the Gaussian mixture model (GMM) is known for its effectiveness and scalability in modeling the spectral distribution of speech. A GMM-supervector characterizes a speaker's voice by GMM parameters such as the mean vectors, covariance matrices, and mixture weights. Besides the first-order statistics, it is generally believed that speaker cues are partly conveyed by the second-order statistics. In this paper, we introduce a Bhattacharyya-based GMM-distance to measure the distance between two GMM distributions. Subsequently, the GMM-UBM mean interval (GUMI) concept is introduced to derive a GUMI kernel, which can be used in conjunction with a support vector machine (SVM) for speaker recognition. The GUMI kernel allows us to exploit the speaker's information not only from the mean vectors of the GMM but also from the covariance matrices. Moreover, by analyzing the Bhattacharyya-based GMM-distance measure, we extend the Bhattacharyya-based kernel by involving both the mean and covariance statistical dissimilarities. We demonstrate the effectiveness of the new kernel on the National Institute of Standards and Technology (NIST) speaker recognition evaluation (SRE) 2006 dataset.
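
This record does not reproduce the paper's formulas, so the following is only a minimal sketch of the textbook Bhattacharyya distance between two single Gaussians with diagonal covariances, not the paper's GMM-distance or GUMI kernel. It is included to illustrate how such a distance separates into a mean-dissimilarity term and a covariance-dissimilarity term, which is the distinction the abstract draws. The function name `bhattacharyya_terms` and its arguments are hypothetical, chosen for illustration.

```python
import math

def bhattacharyya_terms(mean_a, var_a, mean_b, var_b):
    """Return (mean_term, cov_term) of the Bhattacharyya distance between
    two Gaussians with diagonal covariances (variances given per dimension).

    This is the standard single-Gaussian formula, used here as an
    illustrative assumption; it is not the kernel defined in the paper.
    """
    mean_sum = 0.0
    cov_sum = 0.0
    for ma, va, mb, vb in zip(mean_a, var_a, mean_b, var_b):
        v_avg = 0.5 * (va + vb)                 # averaged variance for this dimension
        mean_sum += (ma - mb) ** 2 / v_avg      # Mahalanobis-like mean separation
        cov_sum += math.log(v_avg) - 0.5 * (math.log(va) + math.log(vb))
    return 0.125 * mean_sum, 0.5 * cov_sum

# Usage: the total distance is the sum of the two terms.
mean_term, cov_term = bhattacharyya_terms([0.0, 0.0], [1.0, 1.0],
                                          [2.0, 0.0], [1.0, 4.0])
distance = mean_term + cov_term
print(mean_term, cov_term, distance)
```

The first term vanishes when the two means coincide and the second vanishes when the two covariances coincide, so each isolates one kind of dissimilarity.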