Academic paper

Statistical methods in data-driven modeling of Spanish prosody for text to speech
Document Type
Conference
Source
Proceedings of the Fourth International Conference on Spoken Language Processing (ICSLP '96), vol. 3, pp. 1377-1380, 1996.
Subject
Signal Processing and Analysis
Communication, Networking and Broadcast Technologies
Computing and Processing
Statistical analysis
Speech synthesis
Frequency conversion
Spatial databases
Feature extraction
Telecommunications
Natural languages
Electronic mail
Speech recognition
Contracts
Abstract
In (Lopez-Gonzalo et al., 1995), we proposed an automatic, data-driven methodology for modeling both fundamental frequency and segmental duration in text-to-speech (TTS) converters from a single-speaker recorded corpus. Because the models are learned from data, the methodology could be adapted to a specific corpus or to a particular speaker. Its main disadvantage was the size of the resulting prosodic database. In this paper, we propose statistical methods for reducing the size of the prosodic database that this methodology requires. The database can be reduced by 50% without degrading the naturalness of the synthetic speech relative to that produced by our previous methodology from the same prosodic corpus. The trade-off between the variability of the prosodic contours and the degree of database reduction is also discussed.