Academic Paper

Developing Real-Time Streaming Transformer Transducer for Speech Recognition on Large-Scale Dataset

Document Type

Conference

Author

Chen, Xie; Wu, Yu; Wang, Zhenghao; Liu, Shujie; Li, Jinyu

Source

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5904-5908, Jun. 2021

Subject

Bioengineering
Communication, Networking and Broadcast Technologies
Computing and Processing
Signal Processing and Analysis
Transducers
Runtime
Computational modeling
Conferences
Speech recognition
Computer architecture
Signal processing
Transformer
Transducer
Real-time decoding
Speech Recognition

Language

ISSN

2379-190X

Abstract

Recently, Transformer-based end-to-end models have achieved great success in many areas, including speech recognition. However, compared to LSTM models, the heavy computational cost of the Transformer during inference is a key issue that prevents its application. In this work, we explore the potential of Transformer Transducer (T-T) models for first-pass decoding with low latency and fast speed on a large-scale dataset. We combine the ideas of Transformer-XL and chunk-wise streaming processing to design a streamable Transformer Transducer model. We demonstrate that T-T outperforms the hybrid model, RNN Transducer (RNN-T), and the streamable Transformer attention-based encoder-decoder model in the streaming scenario. Furthermore, the runtime cost and latency can be optimized with a relatively small look-ahead.
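
Illustrative sketch (not from the paper): this record does not describe the actual masking scheme, so the following is only one common way to realize chunk-wise streaming attention with a bounded history. The function name `chunk_attention_mask` and the parameters `chunk_size` and `left_chunks` are hypothetical. Each frame may attend to every frame in its own chunk, which gives a look-ahead of at most `chunk_size - 1` frames, plus a fixed number of preceding chunks, which loosely mirrors the reuse of earlier segments in Transformer-XL.

```python
def chunk_attention_mask(num_frames, chunk_size, left_chunks):
    """Build a boolean mask where mask[i][j] is True if frame i may attend to frame j.

    Assumed scheme (not taken from the paper): frame i sees its whole chunk
    and up to `left_chunks` chunks before it, and nothing after its chunk.
    """
    mask = []
    for i in range(num_frames):
        chunk = i // chunk_size
        # First visible frame: start of the oldest allowed history chunk.
        start = max(0, (chunk - left_chunks) * chunk_size)
        # One past the last visible frame: end of the current chunk.
        end = min(num_frames, (chunk + 1) * chunk_size)
        mask.append([start <= j < end for j in range(num_frames)])
    return mask


if __name__ == "__main__":
    # 6 frames, chunks of 2 frames, 1 chunk of left history.
    for row in chunk_attention_mask(6, 2, 1):
        print("".join("x" if allowed else "." for allowed in row))
```

Under these assumptions, a smaller `chunk_size` lowers latency because the model waits for fewer future frames, while a smaller `left_chunks` lowers the runtime cost of each attention step.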
