Academic Paper

Modular Conformer Training for Flexible End-to-End ASR
Document Type
Conference
Source
ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1-5, Jun. 2023
Subject
Bioengineering
Communication, Networking and Broadcast Technologies
Computing and Processing
Signal Processing and Analysis
Training
Convolution
Error analysis
Acoustics
Decoding
Speech processing
Standards
Automatic speech recognition
self-attention
submodels
Language
English
ISSN
2379-190X
Abstract
The state-of-the-art Conformer used in automatic speech recognition combines feed-forward, convolution, and multi-headed self-attention layers in a single model that is trained end-to-end with a decoder network. While this end-to-end training is simple and beneficial for word error rate (WER), it restricts the ability to run inference with the model at different operating points of WER and latency. Existing approaches to overcome this limitation include cascaded encoders and variable attention context models. We propose an alternative approach, called Modular Conformer training, which splits the Conformer model into a convolutional backbone model and attention submodels that are added at each layer. We conduct experiments with several training techniques on the LibriSpeech and Libri-Light corpora. We show that dropping out the attention layers during training of the backbone model allows for the largest WER improvements upon adding fine-tuned attention submodels, without impacting the WER of the backbone model itself.
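
To make the layer split described in the abstract concrete, below is a minimal PyTorch sketch of one block with a convolutional backbone and an optional attention submodel that is randomly dropped during backbone training. This is a hedged illustration, not the authors' code: the class name ModularConformerBlock, the parameter p_attn_drop, the pre-norm residual layout, and all layer sizes are assumptions made for the example.

```python
import torch
import torch.nn as nn

class ModularConformerBlock(nn.Module):
    """Hypothetical sketch of one modular Conformer block:
    a backbone (feed-forward + depthwise convolution) that always runs,
    plus an attention submodel that can be skipped at inference time."""

    def __init__(self, dim: int, num_heads: int = 4,
                 kernel_size: int = 15, p_attn_drop: float = 0.5):
        super().__init__()
        # Backbone components (always active).
        self.ffn = nn.Sequential(
            nn.LayerNorm(dim), nn.Linear(dim, 4 * dim),
            nn.SiLU(), nn.Linear(4 * dim, dim))
        self.conv_norm = nn.LayerNorm(dim)
        self.conv = nn.Conv1d(dim, dim, kernel_size,
                              padding=kernel_size // 2, groups=dim)
        # Attention submodel, added on top of the backbone at each layer.
        self.attn_norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.p_attn_drop = p_attn_drop  # assumed layer-dropout probability

    def forward(self, x: torch.Tensor, use_attention: bool = True) -> torch.Tensor:
        # x: (batch, time, dim)
        x = x + self.ffn(x)
        y = self.conv_norm(x).transpose(1, 2)      # (batch, dim, time)
        x = x + self.conv(y).transpose(1, 2)
        # Layer dropout during backbone training: randomly skip the
        # attention submodel so the backbone remains usable on its own.
        drop_attn = self.training and torch.rand(1).item() < self.p_attn_drop
        if use_attention and not drop_attn:
            a = self.attn_norm(x)
            x = x + self.attn(a, a, a, need_weights=False)[0]
        return x
```

Under these assumptions, calling `block(x, use_attention=False)` would run the backbone alone for a low-latency operating point, while `block(x)` adds the (fine-tuned) attention submodel for better WER, matching the flexible-inference idea the abstract describes.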