Academic paper

Building High-Accuracy Multilingual ASR With Gated Language Experts and Curriculum Training
Document Type
Conference
Source
2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pp. 1-7, Dec. 2023
Subject
Signal Processing and Analysis
Training
Transducers
Conferences
Buildings
Logic gates
Transformers
Acoustics
Multilingual automatic speech recognition
transformer transducer
language ID
expert
Language
Abstract
We propose gated language experts and curriculum training to enhance multilingual transformer transducer models without requiring user input for language identification (LID) during inference. Our method incorporates a gating mechanism and an LID loss to enable transformer experts to learn language-specific information. Linear experts are applied to the joint network to stabilize training. The curriculum training scheme leverages LID to guide the gated experts in improving their respective language-specific performance. Experimental results on an English and Spanish bilingual task show significant average relative word error rate reductions of 12.5% and 7.3% compared to the baseline bilingual and monolingual models, respectively. Our models even perform similarly to upper-bound models with oracle LID. Extending our approach to trilingual, quadrilingual, and pentalingual models reveals advantages similar to those seen in the bilingual models, highlighting its ease of extension to multiple languages.
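The gating mechanism the abstract describes can be illustrated as a soft mixture of per-language linear experts weighted by a frame-level LID posterior. The sketch below is a minimal numpy illustration under assumed shapes and names (`gated_experts`, `gate_weight`, etc. are hypothetical), not the paper's actual architecture or API:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def gated_experts(hidden, expert_weights, gate_weight):
    """Mix per-language linear experts via a learned LID gate (illustrative).

    hidden:         (T, d) frame-level features
    expert_weights: list of L matrices, each (d, d), one expert per language
    gate_weight:    (d, L) projection producing per-frame LID logits
    Returns the gated expert output (T, d) and the LID logits (T, L),
    the latter of which could also feed an auxiliary LID loss.
    """
    lid_logits = hidden @ gate_weight                # (T, L) per-frame LID scores
    gate = softmax(lid_logits)                       # soft language posterior
    # Stack each expert's output along a new language axis: (T, d, L).
    expert_out = np.stack([hidden @ W for W in expert_weights], axis=-1)
    # Weight each expert by its language posterior and sum over languages.
    mixed = (expert_out * gate[:, None, :]).sum(axis=-1)  # (T, d)
    return mixed, lid_logits
```

In a curriculum setting, the LID logits could first be supervised directly (hard routing to the correct language expert), then the gate relaxed to the learned soft posterior so no language label is needed at inference.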