Academic Paper

A frequency warping approach to speaker normalization
Document Type
Periodical
Source
IEEE Transactions on Speech and Audio Processing (IEEE Trans. Speech Audio Process.), vol. 6, no. 1, pp. 49-60, Jan. 1998
Subject
Signal Processing and Analysis
Communication, Networking and Broadcast Technologies
Computing and Processing
Frequency estimation
Degradation
Speech recognition
Shape
Maximum likelihood estimation
Telephony
Filter bank
Cepstrum
Cepstral analysis
Error analysis
ISSN
1063-6676
1558-2353
Abstract
In an effort to reduce the degradation in speech recognition performance caused by variation in vocal tract shape among speakers, a frequency warping approach to speaker normalization is investigated. A set of low-complexity, maximum-likelihood-based frequency warping procedures has been applied to speaker normalization for a telephone-based connected digit recognition task. This paper presents an efficient means for estimating a linear frequency warping factor and a simple mechanism for implementing frequency warping by modifying the filterbank in mel-frequency cepstrum feature analysis. An experimental study comparing these techniques to other well-known techniques for reducing variability is described. The results show that frequency warping consistently reduces word error rate by 20%, even for very short utterances.
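The abstract names two mechanisms: choosing a warping factor by maximum likelihood, and realizing the warp by modifying the mel filterbank rather than resampling the signal. The Python sketch below illustrates both ideas under stated assumptions. It is not the authors' implementation. The mel formula, the piecewise-linear warp with its 0.8 knee, the 8 kHz / 512-point FFT / 24-filter setup, the 0.88 to 1.12 grid, and the `log_likelihood` callback are all hypothetical choices made for illustration.

```python
import math

# Standard mel-scale mapping; assumed here, since this record does not give the paper's formula.
def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def warp_frequency(f, alpha, f_high, knee=0.8):
    """Piecewise-linear warp (an assumed form): scale by alpha below
    knee * f_high, then interpolate linearly so that f_high maps to f_high."""
    f0 = knee * f_high
    if f <= f0:
        return alpha * f
    return alpha * f0 + (f - f0) * (f_high - alpha * f0) / (f_high - f0)

def warped_filter_edges(alpha, n_filters, f_low, f_high):
    """Mel-spaced edge frequencies (Hz) for n_filters triangular filters,
    each passed through warp_frequency. Only the filterbank changes; the
    FFT and the rest of the cepstral front end are left alone."""
    m_low, m_high = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (m_high - m_low) / (n_filters + 1)
    edges = [mel_to_hz(m_low + i * step) for i in range(n_filters + 2)]
    return [warp_frequency(f, alpha, f_high) for f in edges]

def triangular_filterbank(edges, n_fft, sample_rate):
    """Triangular weights, one row per filter, over FFT bins 0..n_fft // 2."""
    bin_freqs = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for j in range(1, len(edges) - 1):
        left, center, right = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for f in bin_freqs:
            if left < f <= center:
                row.append((f - left) / (center - left))   # rising slope
            elif center < f < right:
                row.append((right - f) / (right - center))  # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def select_warp_factor(log_likelihood, alphas):
    """Grid search: return the alpha whose warped features score highest.
    log_likelihood(alpha) is a hypothetical callback that would score the
    utterance's warped features against the recognizer's models."""
    return max(alphas, key=log_likelihood)

# Hypothetical grid of 13 warping factors, 0.88 to 1.12 in steps of 0.02.
ALPHAS = [round(0.88 + 0.02 * i, 2) for i in range(13)]

if __name__ == "__main__":
    # Assumed telephone-band setup: 8 kHz sampling, 512-point FFT, 24 filters.
    edges = warped_filter_edges(alpha=0.94, n_filters=24, f_low=0.0, f_high=4000.0)
    bank = triangular_filterbank(edges, n_fft=512, sample_rate=8000)
    # A toy score peaked at 0.96 stands in for the model likelihood.
    best = select_warp_factor(lambda a: -(a - 0.96) ** 2, ALPHAS)
    print(len(bank), best)  # prints: 24 0.96
```

In this sketch only the filter edges depend on alpha, so the power spectrum of an utterance can be computed once and reused for every candidate factor. The piecewise-linear form is one common way to keep the warped filters inside the analysis band. The paper may handle the band edge differently.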