Academic paper

Robust automatic speech recognition using acoustic model adaptation prior to missing feature reconstruction
Document Type
Conference
Source
2009 17th European Signal Processing Conference, pp. 535-539, Aug. 2009
Subject
Signal Processing and Analysis
Speech
Speech recognition
Noise
Noise measurement
Hidden Markov models
Acoustics
Adaptation models
Abstract
When speech recognition is used in real-world environments, simultaneous speaker and environmental adaptation and compensation for time-varying noise effects are needed. Noise compensation methods such as missing feature reconstruction should be combined with adaptation methods such as constrained maximum likelihood linear regression (CMLLR). Combining them is straightforward only if reconstruction is applied before CMLLR. In this work, reconstruction is modified so that CMLLR transformations can be estimated prior to reconstruction. The new approach is evaluated on large-vocabulary speech data recorded in noisy public and car environments and compared with using reconstruction prior to CMLLR estimation. The results suggest that the noise environment determines which approach is better. Using adaptation prior to reconstruction performs better on data from public environments. The relative reductions in letter error rate were 47–50% compared to the baseline and 13–19% compared to using either adaptation or reconstruction alone.