Academic Paper
GMM-SVM Kernel With a Bhattacharyya-Based Distance for Speaker Recognition
Document Type
Periodical
Author
Source
IEEE Transactions on Audio, Speech, and Language Processing (IEEE Trans. Audio Speech Lang. Process.), 18(6):1300-1312, Aug. 2010
Subject
Language
ISSN
1558-7916
1558-7924
Abstract
Among conventional methods for text-independent speaker recognition, the Gaussian mixture model (GMM) is known for its effectiveness and scalability in modeling the spectral distribution of speech. A GMM-supervector characterizes a speaker's voice by GMM parameters such as the mean vectors, covariance matrices, and mixture weights. Besides the first-order statistics, it is generally believed that speaker cues are partly conveyed by the second-order statistics. In this paper, we introduce a Bhattacharyya-based GMM-distance to measure the distance between two GMM distributions. Subsequently, the GMM-UBM mean interval (GUMI) concept is introduced to derive a GUMI kernel, which can be used in conjunction with a support vector machine (SVM) for speaker recognition. The GUMI kernel allows us to exploit speaker information not only from the mean vectors of the GMM but also from the covariance matrices. Moreover, by analyzing the Bhattacharyya-based GMM-distance measure, we extend the Bhattacharyya-based kernel to incorporate both the mean and covariance statistical dissimilarities. We demonstrate the effectiveness of the new kernel on the National Institute of Standards and Technology (NIST) speaker recognition evaluation (SRE) 2006 dataset.
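The split between mean and covariance dissimilarity mentioned in the abstract can be illustrated with the textbook Bhattacharyya distance between two single Gaussians. The sketch below is not the paper's GMM-distance or the GUMI kernel, neither of which is specified in the abstract. It assumes diagonal covariances, and the function name `bhattacharyya_gaussian` is hypothetical.

```python
import numpy as np

def bhattacharyya_gaussian(mu1, var1, mu2, var2):
    """Textbook Bhattacharyya distance between two Gaussians.

    Assumption (not from the abstract): covariances are diagonal and are
    passed as vectors of per-dimension variances.
    Returns (mean_term, cov_term, total).
    """
    mu1, var1, mu2, var2 = (
        np.asarray(a, dtype=float) for a in (mu1, var1, mu2, var2)
    )
    # Average of the two covariance matrices (diagonal case).
    var_avg = 0.5 * (var1 + var2)
    # First term: dissimilarity of the means, scaled by the averaged covariance.
    mean_term = 0.125 * np.sum((mu1 - mu2) ** 2 / var_avg)
    # Second term: dissimilarity of the covariances,
    # 0.5 * ln(det(var_avg) / sqrt(det(var1) * det(var2))).
    cov_term = 0.5 * np.sum(
        np.log(var_avg) - 0.5 * (np.log(var1) + np.log(var2))
    )
    return mean_term, cov_term, mean_term + cov_term

# Same covariance, different means: only the mean term is nonzero.
m, c, total = bhattacharyya_gaussian([0.0], [1.0], [2.0], [1.0])
print(m, c, total)
```

The first returned value depends on the mean vectors and the second only on the covariances, which is the kind of decomposition the abstract refers to. How per-component values would be combined across mixture components, or turned into an SVM kernel, is left open here.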