Academic Paper

Context-Aware End-to-End ASR Using Self-Attentive Embedding and Tensor Fusion

Document Type

Conference

Author

Chang, Shuo-Yiin; Zhang, Chao; Sainath, Tara N.; Li, Bo; Strohman, Trevor

Source

ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1-5, June 2023

Subject

Bioengineering
Communication, Networking and Broadcast Technologies
Computing and Processing
Signal Processing and Analysis
Tensors
Video on demand
Transducers
Fuses
Signal processing
Acoustics
Decoding
longform ASR
end-to-end ASR

ISSN

2379-190X

Abstract

Typical automatic speech recognition (ASR) systems are built to recognize each utterance independently, without using cross-utterance context. However, the context spanning multiple utterances often provides useful information for the ASR task. In this work, we propose a context-aware end-to-end ASR model that injects a self-attentive context embedding into the decoder of a recurrent neural network transducer (RNN-T). We also propose a factorised 3-way tensor fusion approach that fuses the context embedding with the acoustic representations extracted by the acoustic encoder and the text representations that the prediction network computes from the previous subword units. Experimental results on a long-form YouTube ASR task show that the proposed approach achieves a 10.8% relative word error rate reduction.
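The abstract does not give the model's equations, so the following is only a minimal pure-Python sketch, under stated assumptions, of the two generic techniques it names: self-attentive pooling of several context vectors into one embedding, and a low-rank (factorized) 3-way tensor fusion of acoustic, text, and context vectors. The function names (`self_attentive_pool`, `low_rank_fusion`), the single attention head, the constant 1 appended to each input, the rank, and all dimensions are illustrative assumptions, not details taken from the paper.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attentive_pool(H, W1, w2):
    """Single-head self-attentive pooling (assumed form, not the paper's exact one).

    H:  list of T context vectors of size d (e.g. one per previous utterance).
    W1: d_att x d projection matrix; w2: scoring vector of length d_att.
    Returns the attention-weighted sum of the vectors in H and the weights.
    """
    scores = []
    for h in H:
        hidden = [math.tanh(u) for u in matvec(W1, h)]
        scores.append(sum(v * z for v, z in zip(w2, hidden)))
    alpha = softmax(scores)
    pooled = [sum(a * h[j] for a, h in zip(alpha, H)) for j in range(len(H[0]))]
    return pooled, alpha


def low_rank_fusion(acoustic, text, context, factors):
    """Factorized 3-way tensor fusion (generic low-rank form, assumed here).

    Each input is padded with a constant 1 so that unimodal and bimodal terms
    survive the product. factors is a list of R triples (Wa, Wt, Wc), each
    matrix mapping its padded input to the output size. The result is
    sum_r (Wa_r [a;1]) * (Wt_r [t;1]) * (Wc_r [c;1]) computed elementwise.
    """
    a, t, c = acoustic + [1.0], text + [1.0], context + [1.0]
    out = None
    for Wa, Wt, Wc in factors:
        pa, pt, pc = matvec(Wa, a), matvec(Wt, t), matvec(Wc, c)
        term = [x * y * z for x, y, z in zip(pa, pt, pc)]
        out = term if out is None else [o + v for o, v in zip(out, term)]
    return out


def rand_matrix(rows, cols, rng):
    """Small random matrix standing in for learned weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    d_a, d_t, d_c, d_att, d_out, rank = 4, 3, 5, 8, 6, 2  # hypothetical sizes
    # Hypothetical embeddings of three previous utterances.
    history = [[rng.uniform(-1, 1) for _ in range(d_c)] for _ in range(3)]
    ctx, weights = self_attentive_pool(
        history, rand_matrix(d_att, d_c, rng), [rng.uniform(-1, 1) for _ in range(d_att)]
    )
    factors = [
        (rand_matrix(d_out, d_a + 1, rng),
         rand_matrix(d_out, d_t + 1, rng),
         rand_matrix(d_out, d_c + 1, rng))
        for _ in range(rank)
    ]
    acoustic = [rng.uniform(-1, 1) for _ in range(d_a)]  # stand-in encoder output
    text = [rng.uniform(-1, 1) for _ in range(d_t)]      # stand-in prediction-network output
    fused = low_rank_fusion(acoustic, text, ctx, factors)
    print(len(fused))  # 6
```

Factorizing the fusion into R rank-1 terms avoids materializing the full weight tensor of size d_out x (d_a+1) x (d_t+1) x (d_c+1); each term needs only three small projections and an elementwise product. The abstract does not state how the fused vector enters the RNN-T joint network, so that step is left out of the sketch.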
