Academic Paper

Automatic detection of unnatural word-level segments in unit-selection speech synthesis
Document Type
Conference
Source
2011 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU), pp. 289-294, Dec. 2011
Subject
Signal Processing and Analysis
Communication, Networking and Broadcast Technologies
Computing and Processing
Training
Speech
Feature extraction
Humans
Speech synthesis
Acoustics
Testing
Language
Abstract
We investigate the problem of automatically detecting unnatural word-level segments in unit-selection speech synthesis. We use a large set of features: target and join costs, language models, prosodic cues, energy and spectrum, and Delta Term Frequency-Inverse Document Frequency (Delta TF-IDF). We report comparative results for the individual feature types and for their combinations. We also compare three modeling methods based on Support Vector Machines (SVMs), Random Forests, and Conditional Random Fields (CRFs). We then discuss our results and present a comprehensive error analysis.
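The abstract names Delta TF-IDF as one feature type but does not say how it is computed. As a rough illustration only, the sketch below uses one common formulation: a term's count in a document multiplied by the difference between its smoothed inverse document frequencies in two labeled corpora. The function names, the add-one smoothing, the sign convention (positive means the term is relatively more common in the "unnatural" corpus), and the idea of representing each segment as a list of tokens are all assumptions made for this example, not details taken from the paper.

```python
import math
from collections import Counter


def _doc_freq(docs):
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    return df


def delta_tfidf(doc_tokens, natural_docs, unnatural_docs):
    """Return {term: score} for one document (hypothetical helper).

    score = tf * (idf_natural - idf_unnatural), with add-one smoothing,
    so terms seen mostly in unnatural documents get positive scores.
    """
    df_nat = _doc_freq(natural_docs)
    df_unnat = _doc_freq(unnatural_docs)
    n_nat = len(natural_docs)
    n_unnat = len(unnatural_docs)

    scores = {}
    for term, tf in Counter(doc_tokens).items():
        # Smoothed IDF in each labeled corpus; Counter returns 0 for unseen terms.
        idf_nat = math.log2((n_nat + 1) / (df_nat[term] + 1))
        idf_unnat = math.log2((n_unnat + 1) / (df_unnat[term] + 1))
        scores[term] = tf * (idf_nat - idf_unnat)
    return scores
```

Scores of this kind could then be included in a feature vector alongside the other feature types listed in the abstract; how the paper actually combines them is not stated here.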