Academic Paper

Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech and Text Representations

Document Type

Conference

Author

Koizumi, Yuma; Zen, Heiga; Karita, Shigeki; Ding, Yifan; Yatabe, Kohei; Morioka, Nobuyuki; Zhang, Yu; Han, Wei; Bapna, Ankur; Bacchiani, Michiel

Source

2023 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pp. 1-5, Oct. 2023

Subject

Signal Processing and Analysis
Degradation
Training
Training data
Speech enhancement
Linguistics
Signal processing
Feature extraction
Speech restoration
speech enhancement
text-to-speech
self-supervised learning

Language

ISSN

1947-1629

Abstract

Speech restoration (SR) is the task of converting degraded speech signals into high-quality ones. In this study, we propose a robust SR model called Miipher and apply it to a new SR application: increasing the amount of high-quality training data for speech generation by converting speech samples collected from the Web to studio quality. To make our SR model robust against various types of degradation, we use (i) a speech representation extracted from w2v-BERT as the input feature, and (ii) a text representation extracted from transcripts via PnG-BERT as a linguistic conditioning feature. Experiments show that Miipher (i) is robust against various types of audio degradation and (ii) enables us to train a high-quality text-to-speech (TTS) model from restored speech samples collected from the Web. Audio samples are available at our demo page: google.github.io/df-conformer/miipher/.
