Academic Paper

Ensemble Prosody Prediction For Expressive Speech Synthesis

Document Type

Conference

Author

Teh, Tian Huey; Hu, Vivian; Ram Mohan, Devang S; Hodari, Zack; Wallis, Christopher G. R.; Gomez Ibarrondo, Tomas; Torresquintero, Alexandra; Leoni, James; Gales, Mark; King, Simon

Source

ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1-5, Jun. 2023

Subject

Bioengineering
Communication, Networking and Broadcast Technologies
Computing and Processing
Signal Processing and Analysis
Computational modeling
Computer architecture
Predictive models
Signal processing
Data models
Acoustics
Speech synthesis
Text-to-Speech
prosody prediction
ensemble methods

Language

ISSN

2379-190X

Abstract

Generating expressive speech with rich and varied prosody continues to be a challenge for Text-to-Speech. Most efforts have focused on sophisticated neural architectures intended to better model the data distribution. Yet, in evaluations it is generally found that no single model is preferred for all input texts. This suggests an approach that has rarely been used before for Text-to-Speech: an ensemble of models. We apply ensemble learning to prosody prediction. We construct simple ensembles of prosody predictors by varying either model architecture or model parameter values. To automatically select amongst the models in the ensemble when performing Text-to-Speech, we propose a novel, and computationally trivial, variance-based criterion. We demonstrate that even a small ensemble of prosody predictors yields useful diversity, which, combined with the proposed selection criterion, outperforms any individual model from the ensemble.
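
The abstract does not say which quantity the variance is computed over, or whether the highest or lowest value is selected. As a rough illustration only, the sketch below assumes each ensemble member outputs a per-phone F0 contour for the same input text and that the candidate with the largest variance is chosen. The function name `select_by_variance` and the sample values are hypothetical and are not taken from the paper.

```python
from statistics import pvariance


def select_by_variance(predictions):
    """Return (index, contour) of the candidate with the largest variance.

    Assumption: each item in `predictions` is one ensemble member's
    predicted prosody contour (here, F0 in Hz) for the same input text,
    and "most varied" is used as the selection rule.
    """
    if not predictions:
        raise ValueError("need at least one candidate prediction")
    # Population variance of each candidate contour.
    scores = [pvariance(contour) for contour in predictions]
    # Index of the highest-scoring candidate.
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, predictions[best]


# Hypothetical F0 predictions from three ensemble members (illustrative values).
candidates = [
    [120.0, 121.0, 119.0, 120.0],  # nearly flat contour
    [110.0, 140.0, 100.0, 135.0],  # most varied contour
    [118.0, 125.0, 115.0, 122.0],  # moderately varied contour
]
best_index, best_contour = select_by_variance(candidates)
```

Under these assumptions the selection costs one variance computation per candidate, which is in line with the abstract's description of the criterion as computationally trivial.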
