Academic Paper

EMA2S: An End-to-End Multimodal Articulatory-to-Speech System
Document Type
Conference
Source
2021 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 1-5, May 2021
Subject
Components, Circuits, Devices and Systems
Training
Measurement
Vocoders
System performance
Sensor systems
Sensors
Spectrogram
articulatory movement
end-to-end
multimodal learning
neural network
speech synthesis
Language
English
ISSN
2158-1525
Abstract
Speech synthesized from articulatory movements has real-world uses for patients with vocal cord disorders, in situations requiring silent speech, and in high-noise environments. In this work, we present EMA2S, an end-to-end multimodal articulatory-to-speech system that directly converts articulatory movements to speech signals. We use a neural-network-based vocoder combined with multimodal joint training, incorporating spectrogram, mel-spectrogram, and deep features. The experimental results confirm that the multimodal approach of EMA2S outperforms the baseline system on both objective and subjective evaluation metrics. Moreover, the results demonstrate that joint mel-spectrogram and deep-feature loss training can effectively improve system performance.
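
The abstract names the key training idea, a joint objective over spectrogram, mel-spectrogram, and deep features, without giving implementation detail. The following is a minimal PyTorch sketch of what such a joint multimodal loss could look like. It is illustrative only: the class name JointMultimodalLoss, the use of L1 distances, the frozen toy feature extractor, and the equal loss weights are all assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class JointMultimodalLoss(nn.Module):
    """Sketch of a joint loss over spectrogram, mel-spectrogram,
    and deep features (weights and distances are assumptions)."""

    def __init__(self, feature_extractor: nn.Module,
                 w_spec: float = 1.0, w_mel: float = 1.0, w_deep: float = 1.0):
        super().__init__()
        # A pretrained network whose embeddings define the deep-feature term.
        self.feature_extractor = feature_extractor
        self.w_spec, self.w_mel, self.w_deep = w_spec, w_mel, w_deep
        self.l1 = nn.L1Loss()

    def forward(self, pred_spec, true_spec, pred_mel, true_mel):
        # Reconstruction terms on the two spectral representations.
        spec_loss = self.l1(pred_spec, true_spec)
        mel_loss = self.l1(pred_mel, true_mel)
        # Deep-feature term: match embeddings of a frozen extractor;
        # gradients flow only through the predicted mel-spectrogram.
        with torch.no_grad():
            true_feat = self.feature_extractor(true_mel)
        pred_feat = self.feature_extractor(pred_mel)
        deep_loss = self.l1(pred_feat, true_feat)
        return (self.w_spec * spec_loss
                + self.w_mel * mel_loss
                + self.w_deep * deep_loss)

if __name__ == "__main__":
    # Toy frozen encoder standing in for the deep-feature network.
    extractor = nn.Sequential(nn.Conv1d(80, 64, 3, padding=1), nn.ReLU())
    for p in extractor.parameters():
        p.requires_grad = False

    loss_fn = JointMultimodalLoss(extractor)
    pred_spec, true_spec = torch.randn(2, 513, 100), torch.randn(2, 513, 100)
    pred_mel, true_mel = torch.randn(2, 80, 100), torch.randn(2, 80, 100)
    print(loss_fn(pred_spec, true_spec, pred_mel, true_mel))

In a training loop of this kind, the vocoder's predicted spectrogram and mel-spectrogram would replace the random tensors above, and the weighted sum would be backpropagated through the synthesis network.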