Academic Paper
Automatic detection of unnatural word-level segments in unit-selection speech synthesis
Document Type
Conference
Source
2011 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU), pp. 289-294, Dec. 2011
Abstract
We investigate the problem of automatically detecting unnatural word-level segments in unit-selection speech synthesis. We use a large set of features, namely target and join costs, language models, prosodic cues, energy and spectrum, and Delta Term Frequency-Inverse Document Frequency (Delta TF-IDF), and we report comparative results for the individual feature types and their combinations. We also compare three modeling methods based on Support Vector Machines (SVMs), Random Forests, and Conditional Random Fields (CRFs). We then discuss our results and present a comprehensive error analysis.
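Illustrative note (not part of the abstract): the abstract does not say how the Delta TF-IDF feature is computed. The sketch below shows the commonly used definition, in which a term's count is weighted by the difference of its inverse document frequencies in two labeled corpora. The function name `delta_tfidf`, the corpus names `natural_docs` and `unnatural_docs`, the add-one smoothing, and the sign convention are all assumptions made for illustration and may differ from what the paper does.

```python
import math
from collections import Counter


def delta_tfidf(doc_tokens, natural_docs, unnatural_docs):
    """Return a Delta TF-IDF score for each distinct term in doc_tokens.

    Each corpus is a list of token lists. All names here are hypothetical.
    """
    # Document frequency: in how many documents of each corpus a term occurs.
    natural_df = Counter(t for d in natural_docs for t in set(d))
    unnatural_df = Counter(t for d in unnatural_docs for t in set(d))
    n_nat, n_unnat = len(natural_docs), len(unnatural_docs)

    scores = {}
    for term, count in Counter(doc_tokens).items():
        # Add-one smoothing (an assumption) keeps unseen terms finite.
        idf_nat = math.log2((n_nat + 1) / (natural_df[term] + 1))
        idf_unnat = math.log2((n_unnat + 1) / (unnatural_df[term] + 1))
        # Sign convention chosen here: positive means the term is
        # relatively more frequent in the "unnatural" corpus.
        scores[term] = count * (idf_nat - idf_unnat)
    return scores


natural = [["the", "cat"], ["the", "dog"]]
unnatural = [["the", "zyx"], ["zyx", "dog"]]
scores = delta_tfidf(["the", "zyx", "zyx"], natural, unnatural)
```

In this toy example `zyx` occurs only in the second corpus and receives a positive score, while `the` is more common in the first corpus and receives a negative one.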