Academic Paper

Recent innovations in speech-to-text transcription at SRI-ICSI-UW

Document Type

Periodical

Author

Stolcke, A.; Barry Chen; Franco, H.; Venkata Ramana Rao Gadde; Graciarena, M.; Mei-Yuh Hwang; Kirchhoff, K.; Mandal, A.; Morgan, N.; Xin Lei; Ng, T.; Ostendorf, M.; Sonmez, K.; Venkataraman, A.; Vergyri, D.; Wen Wang; Jing Zheng; Qifeng Zhu

Source

IEEE Transactions on Audio, Speech, and Language Processing (IEEE Trans. Audio Speech Lang. Process.), 14(5):1729-1744, Sep. 2006

Subject

Signal Processing and Analysis
Communication, Networking and Broadcast Technologies
Technological innovation
Natural languages
Computer science
Laboratories
Speech recognition
Adaptation model
Acoustic measurements
Multilayer perceptrons
Cepstral analysis
Telephony
Broadcast news (BN)
Conversational telephone speech (CTS)
Speech-to-text (STT)

ISSN

1558-7916
1558-7924

Abstract

We summarize recent progress in automatic speech-to-text transcription at SRI, ICSI, and the University of Washington. The work encompasses all components of speech modeling found in a state-of-the-art recognition system, from acoustic features, to acoustic modeling and adaptation, to language modeling. In the front end, we experimented with nonstandard features, including various measures of voicing, discriminative phone posterior features estimated by multilayer perceptrons, and a novel phone-level macro-averaging for cepstral normalization. Acoustic modeling was improved with combinations of front ends operating at multiple frame rates, as well as by modifications to the standard methods for discriminative Gaussian estimation. We show that acoustic adaptation can be improved by predicting the optimal regression class complexity for a given speaker. Language modeling innovations include the use of a syntax-motivated almost-parsing language model, as well as principled vocabulary-selection techniques. Finally, we address portability issues, such as the use of imperfect training transcripts, and language-specific adjustments required for recognition of Arabic and Mandarin.
