Academic Paper

Global Normalization for Streaming Speech Recognition in a Modular Framework

Document Type

Working Paper

Author

Variani, Ehsan; Wu, Ke; Riley, Michael; Rybach, David; Shannon, Matt; Allauzen, Cyril

Subject

Computer Science - Machine Learning
Computer Science - Artificial Intelligence
Computer Science - Computation and Language

Abstract

We introduce the Globally Normalized Autoregressive Transducer (GNAT) for addressing the label bias problem in streaming speech recognition. Our solution admits a tractable exact computation of the denominator for the sequence-level normalization. Through theoretical and empirical results, we demonstrate that by switching to a globally normalized model, the word error rate gap between streaming and non-streaming speech-recognition models can be greatly reduced (by more than 50% on the LibriSpeech dataset). This model is developed in a modular framework which encompasses all the common neural speech recognition models. The modularity of this framework enables controlled comparison of modelling choices and creation of new models.

Online Access

Open Access (arXiv)
