Academic paper

A novel quasi-diphone inventory approach to Text-To-Speech synthesis
Document Type
Conference
Source
MELECON 2008 - The 14th IEEE Mediterranean Electrotechnical Conference, pp. 799-804, May 2008
Subject
Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Engineered Materials, Dielectrics and Plasmas
Engineering Profession
Fields, Waves and Electromagnetics
General Topics for Engineers
Speech
Speech synthesis
Text analysis
Digital signal processing
Speech processing
Optimization
Fading
TTS
concatenative synthesis
mixed-rank inventory
quasi-diphone
Macedonian
Language
ISSN
2158-8473
2158-8481
Abstract
The paper presents a novel approach to concatenative Text-To-Speech synthesis. The system uses a unique optimized mixed-rank inventory based on a modification of the classical diphone concept. A new unit type, dubbed the quasi-diphone unit, is introduced in this work. A set of these units is designed to cover all the critical transitions between phones while remaining compatible with phone-length units for concatenation purposes. This allows the inventory to be optimized with respect to both its size and the quality of the generated speech. The system includes elementary pitch, duration and amplitude modeling, implemented with the standard PSOLA algorithm. The presented results show that full intelligibility and reasonable naturalness were achieved while maintaining a rather small inventory. The system was developed specifically for the synthesis of Macedonian and is the first high-quality (HQ) TTS system for this language. Owing to the standardized interface developed between its modules, the system is also applicable to any of the world's languages.
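To make the mixed-rank idea more concrete, the Python sketch below shows one possible way a synthesizer could cover a phone sequence with a mix of quasi-diphone and phone-length units. It is not the authors' algorithm. It assumes, hypothetically, that a quasi-diphone unit spans exactly two phones with both edges on phone boundaries, so it can be joined directly to phone-length units. It also assumes that every phone has its own phone-length unit. Under those assumptions, minimizing the number of units with dynamic programming is the same as keeping as many inventory transitions as possible inside recorded units rather than at concatenation joins. The function name `select_units`, the single-letter phone labels, and the toy inventory are illustrative only.

```python
from typing import List, Sequence, Set, Tuple

# A unit is a tuple of phone labels: ("a",) is a phone-length unit,
# ("a", "b") is a (hypothetical) quasi-diphone unit covering the a->b transition.
Unit = Tuple[str, ...]


def select_units(phones: Sequence[str],
                 quasi_diphones: Set[Tuple[str, str]]) -> List[Unit]:
    """Cover `phones` with the fewest units, i.e. the fewest concatenation joins.

    Hypothetical assumptions (not taken from the paper):
      * every phone has a phone-length unit in the inventory;
      * a quasi-diphone unit (a, b) spans phones a and b with both edges on
        phone boundaries, so it joins directly to phone-length units.
    """
    n = len(phones)
    # cost[i] = minimum number of units needed to cover phones[i:]
    # step[i] = length (1 or 2 phones) of the unit chosen at position i
    cost = [0] * (n + 1)
    step = [1] * (n + 1)
    for i in range(n - 1, -1, -1):
        # Option 1: a phone-length unit is always available.
        cost[i], step[i] = cost[i + 1] + 1, 1
        # Option 2: a quasi-diphone, if the inventory has this transition.
        if i + 1 < n and (phones[i], phones[i + 1]) in quasi_diphones:
            if cost[i + 2] + 1 < cost[i]:
                cost[i], step[i] = cost[i + 2] + 1, 2

    # Walk forward along the chosen steps to recover the unit sequence.
    units: List[Unit] = []
    i = 0
    while i < n:
        units.append(tuple(phones[i:i + step[i]]))
        i += step[i]
    return units


if __name__ == "__main__":
    inventory = {("d", "r"), ("r", "a"), ("a", "v")}
    print(select_units(list("zdravo"), inventory))
```

For the phone string z d r a v o (Macedonian здраво) and the toy inventory above, the call returns `[('z',), ('d', 'r'), ('a', 'v'), ('o',)]`. The d-r and a-v transitions fall inside units. The r-a transition overlaps both of those units, so it ends up at a join. A real inventory design would need to handle such conflicts for transitions that are truly critical, and the sketch does not model that.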