Academic Paper

Lip reading for robust speech recognition on embedded devices
Document Type
Conference
Source
Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05), 2005, Vol. 1, pp. I/473–I/476
Subject
Signal Processing and Analysis
Components, Circuits, Devices and Systems
Robustness
Speech recognition
Noise reduction
Automatic speech recognition
Acoustic noise
Acoustic devices
Mouth
Feature extraction
Signal processing algorithms
Working environment noise
Language
English
ISSN
Print ISSN: 1520-6149
Electronic ISSN: 2379-190X
Abstract
In this article, a complete audio-visual speech recognition system suitable for embedded devices is presented. Active shape models (ASM) and the discrete cosine transform (DCT) were investigated and discussed as visual feature extraction algorithms for an embedded implementation. The audio-visual information integration was also designed with device limitations in mind. It is well known that the use of visual cues improves recognition results, especially in scenarios with high levels of acoustic noise. We wanted to compare the performance of lip reading with that of conventional noise reduction systems in these degraded scenarios, as well as the combination of both kinds of solutions. Important improvements are obtained, especially for nonstationary background noise such as voice interference, car acceleration, or indicator clicks. For this kind of noise, lip reading outperforms conventional noise reduction technologies.
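As a rough illustration of the DCT-based visual feature extraction the abstract mentions, the Python sketch below applies an orthonormal 2-D DCT to a grayscale mouth-region image and keeps the low-frequency coefficients in zig-zag order as a compact feature vector. This is a minimal sketch, not the authors' implementation: the 16x16 crop size and the 15 retained coefficients are assumptions made for the example.

# Minimal sketch of DCT-based visual feature extraction (illustrative only;
# crop size and coefficient count are assumptions, not the paper's settings).
import numpy as np
from scipy.fftpack import dct

def dct_features(mouth_roi: np.ndarray, n_coeffs: int = 15) -> np.ndarray:
    """Return low-frequency 2-D DCT coefficients of a grayscale mouth image."""
    # Orthonormal 2-D DCT: apply a 1-D DCT along rows, then along columns.
    coeffs = dct(dct(mouth_roi, norm="ortho", axis=0), norm="ortho", axis=1)
    # Keep the low-frequency coefficients in zig-zag order; these carry
    # most of the coarse mouth-shape information.
    h, w = coeffs.shape
    order = sorted(((i, j) for i in range(h) for j in range(w)),
                   key=lambda ij: (ij[0] + ij[1], ij[0]))
    return np.array([coeffs[i, j] for i, j in order[:n_coeffs]])

# Example: features from a dummy 16x16 mouth-region crop.
roi = np.random.rand(16, 16).astype(np.float32)
print(dct_features(roi).shape)  # (15,)

Truncating the zig-zag-ordered coefficients is a common way to obtain a low-dimensional visual descriptor, which fits the embedded-device constraints the paper emphasizes.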