Academic Paper

Attention-Based Speech Recognition Using Gaze Information
Document Type
Conference
Source
2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pp. 465-470, Dec. 2019
Subject
Signal Processing and Analysis
Speech recognition
Acoustics
Task analysis
Decoding
Hidden Markov models
Lips
Feature extraction
end-to-end speech recognition
attention
multi-modal
gaze-point
Language
English
Abstract
We assume that there is a correlation between an utterance and the object being gazed at, and propose a new paradigm of multi-modal end-to-end speech recognition that uses two sources of information: utterances and the corresponding gaze points. In our method, the system extracts acoustic features together with images of the regions around the gaze points, and feeds them into the proposed attention-based multiple encoder-decoder network. This makes it possible to integrate the two modalities and improves speech recognition performance. To evaluate the proposed method, we prepared a simulated power-line control operation task and built a corpus containing the utterances and the corresponding gaze points recorded during the operations. An experimental evaluation on this corpus showed a reduction in the character error rate (CER), suggesting the effectiveness of the proposed method, in which acoustic features and gaze information are integrated.
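To make the described architecture concrete, the following is a minimal PyTorch sketch of one plausible reading of an attention-based multiple encoder-decoder: one encoder over acoustic frames, one over features of images cropped around gaze points, and a shared decoder that attends over each encoder separately at every step, concatenating the two context vectors. All layer sizes, the LSTM encoders, the CNN-feature input for gaze images, and fusion by concatenation are assumptions for illustration, not the paper's exact model.

import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    # Bahdanau-style additive attention over one encoder's output sequence.
    def __init__(self, enc_dim, dec_dim, attn_dim):
        super().__init__()
        self.w_enc = nn.Linear(enc_dim, attn_dim)
        self.w_dec = nn.Linear(dec_dim, attn_dim)
        self.v = nn.Linear(attn_dim, 1)

    def forward(self, dec_state, enc_out):
        # dec_state: (B, dec_dim); enc_out: (B, T, enc_dim)
        scores = self.v(torch.tanh(self.w_enc(enc_out)
                                   + self.w_dec(dec_state).unsqueeze(1)))
        weights = torch.softmax(scores, dim=1)       # (B, T, 1)
        return (weights * enc_out).sum(dim=1)        # context: (B, enc_dim)

class MultiModalASR(nn.Module):
    # Hypothetical sizes: 80-dim filterbank frames, 512-dim CNN features of
    # images cropped around each gaze point, character vocabulary of 60.
    def __init__(self, n_acoustic=80, n_gaze=512,
                 enc_dim=256, dec_dim=256, vocab=60):
        super().__init__()
        self.dec_dim = dec_dim
        self.speech_enc = nn.LSTM(n_acoustic, enc_dim, batch_first=True)
        self.gaze_enc = nn.LSTM(n_gaze, enc_dim, batch_first=True)
        self.attn_speech = AdditiveAttention(enc_dim, dec_dim, 128)
        self.attn_gaze = AdditiveAttention(enc_dim, dec_dim, 128)
        self.embed = nn.Embedding(vocab, dec_dim)
        # The decoder consumes the previous token plus one context per modality.
        self.dec_cell = nn.LSTMCell(dec_dim + 2 * enc_dim, dec_dim)
        self.out = nn.Linear(dec_dim, vocab)

    def forward(self, speech, gaze_feats, targets):
        # speech: (B, T_s, n_acoustic); gaze_feats: (B, T_g, n_gaze);
        # targets: (B, T_out) token ids, used here for teacher forcing.
        enc_s, _ = self.speech_enc(speech)
        enc_g, _ = self.gaze_enc(gaze_feats)
        h = speech.new_zeros(speech.size(0), self.dec_dim)
        c = torch.zeros_like(h)
        logits = []
        for t in range(targets.size(1)):
            ctx_s = self.attn_speech(h, enc_s)   # attend over the speech encoder
            ctx_g = self.attn_gaze(h, enc_g)     # attend over the gaze encoder
            step_in = torch.cat([self.embed(targets[:, t]), ctx_s, ctx_g], dim=-1)
            h, c = self.dec_cell(step_in, (h, c))
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)        # (B, T_out, vocab)

Concatenating the two attention contexts is only one possible fusion strategy; a hierarchical attention over the modalities themselves would be another, and the abstract does not specify which the authors use.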