Academic Paper

Hypothesis-driven adaptation (Hydra): a flexible eigenvoice architecture
Document Type
Conference
Author
Source
2001 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '01). Proceedings (Cat. No.01CH37221), vol. 1, pp. 349-352, 2001
Subject
Signal Processing and Analysis
Components, Circuits, Devices and Systems
Adaptation model
Maximum likelihood linear regression
Automatic speech recognition
Training data
Speech recognition
Speech processing
Humans
Maximum likelihood estimation
Probability density function
Gaussian processes
Language
ISSN
1520-6149
Abstract
In this article, a new architecture for speech recognition is introduced. As with many existing speech systems, this new approach involves multi-pass processing. In the present case, however, second-pass models are constructed on-line for each active hypothesis. Models for each hypothesized segment of the current utterance are constructed from linear combinations of "data cluster models" that have been trained on low-variability clusters of the training corpus. The data cluster weights are determined using an "eigenvoice" mechanism that operates on low-complexity, low-definition models. Once determined, the same weights are used to construct high-complexity, high-definition second-pass models generated over the same data clusters. Results from a simple recognition task are reported to demonstrate the interesting properties of the new architecture. The limitations, trade-offs, and some possible extensions of the proposed approach are discussed.
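The two-stage idea in the abstract (estimate cluster weights on cheap models, then reuse them to assemble a detailed model) can be illustrated with a minimal sketch. It is not the paper's implementation. The function names, the toy data, and the representation of each model as plain Gaussian mean vectors are all assumptions made for illustration. In particular, the weight estimation below is a simple normalized-likelihood stand-in, not the eigenvoice mechanism itself, whose details the abstract does not give.

```python
import math

def cluster_weights(frames, low_def_means):
    """Weight each data cluster by how well its low-definition model fits the frames.

    Assumption: each low-definition cluster model is a single unit-variance
    Gaussian, and the weights are normalized likelihoods. This is a stand-in
    for the eigenvoice weight estimation, not a reproduction of it.
    """
    log_liks = []
    for mean in low_def_means:
        ll = 0.0
        for frame in frames:
            ll += -0.5 * sum((x - m) ** 2 for x, m in zip(frame, mean))
        log_liks.append(ll)
    peak = max(log_liks)  # subtract the maximum for numerical stability
    exps = [math.exp(ll - peak) for ll in log_liks]
    total = sum(exps)
    return [e / total for e in exps]

def combine_models(weights, high_def_models):
    """Build a second-pass model as a linear combination of high-definition cluster models.

    Assumption: each high-definition model is a list of Gaussian mean vectors,
    with the same number of components and the same dimension in every cluster.
    """
    n_components = len(high_def_models[0])
    dim = len(high_def_models[0][0])
    combined = []
    for c in range(n_components):
        combined.append([
            sum(w * model[c][d] for w, model in zip(weights, high_def_models))
            for d in range(dim)
        ])
    return combined

# Hypothetical toy data: two data clusters, 2-dimensional features.
low_def_means = [[0.0, 0.0], [4.0, 4.0]]   # one Gaussian per cluster
high_def_models = [                         # two Gaussians per cluster
    [[-1.0, 0.0], [1.0, 0.0]],
    [[3.0, 4.0], [5.0, 4.0]],
]
frames = [[0.1, -0.2], [0.0, 0.3], [-0.1, 0.1]]  # one hypothesized segment

weights = cluster_weights(frames, low_def_means)
second_pass = combine_models(weights, high_def_models)
```

In this sketch the weights are computed once on the small models and then applied unchanged to the larger ones, which mirrors the reuse of weights described in the abstract. In the described architecture this step would run for each hypothesized segment of each active hypothesis.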