Academic Paper

Pitch transformation in neural network based voice conversion
Document Type
Conference
Source
The 9th International Symposium on Chinese Spoken Language Processing (ISCSLP), 2014, pp. 197-200, Sep 2014
Subject
Computing and Processing
General Topics for Engineers
Robotics and Control Systems
Signal Processing and Analysis
Artificial neural networks
Speech
Wavelet transforms
Context
Training
Vectors
voice conversion
pitch
neural network
Language
Abstract
In voice conversion, prosody conversion, and pitch conversion in particular, is a challenging research topic because pitch contours are discontinuous. Conventionally, pitch conversion is achieved by adjusting the mean and variance of the source pitch distribution to match those of the target pitch distribution. This method removes most of the detailed information in the speaker's prosody and preserves only the global F0 contour. In this paper, we propose a neural network based pitch conversion system that converts F0 and spectral features jointly, frame by frame. Experimental results show that neural network based pitch conversion can significantly reduce the unvoiced/voiced (U/V) error and the RMSE of F0 between the converted pitch and the target pitch, compared with the conventional Gaussian normalized transformation method. Wavelet decomposition of F0 can further improve voice conversion performance.
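
To make the conventional baseline and the two reported metrics concrete, the following is a minimal sketch, not the authors' implementation. It assumes the mean and variance adjustment is applied to log-F0 over voiced frames (a common choice that the abstract does not confirm), that unvoiced frames are encoded as F0 = 0, and that the converted and target contours are already time-aligned frame by frame. All function names are hypothetical.

```python
import math

def log_f0_stats(f0):
    """Mean and standard deviation of log-F0 over voiced frames (F0 > 0)."""
    logs = [math.log(v) for v in f0 if v > 0]
    mean = sum(logs) / len(logs)
    var = sum((x - mean) ** 2 for x in logs) / len(logs)
    return mean, math.sqrt(var)

def gaussian_normalized_f0(src_f0, src_stats, tgt_stats):
    """Shift and scale source log-F0 to the target mean and variance.

    Assumption: unvoiced frames (F0 == 0) are passed through unchanged,
    so the source voicing decisions are kept as they are.
    """
    mu_s, sd_s = src_stats
    mu_t, sd_t = tgt_stats
    return [
        math.exp((math.log(v) - mu_s) / sd_s * sd_t + mu_t) if v > 0 else 0.0
        for v in src_f0
    ]

def uv_error_rate(conv_f0, tgt_f0):
    """Fraction of aligned frames whose voiced/unvoiced decision differs."""
    mismatches = sum((c > 0) != (t > 0) for c, t in zip(conv_f0, tgt_f0))
    return mismatches / len(tgt_f0)

def f0_rmse(conv_f0, tgt_f0):
    """RMSE in Hz over frames that are voiced in both contours."""
    sq = [(c - t) ** 2 for c, t in zip(conv_f0, tgt_f0) if c > 0 and t > 0]
    return math.sqrt(sum(sq) / len(sq))

# Toy, made-up contours (Hz); 0 marks an unvoiced frame.
src = [0.0, 100.0, 110.0, 120.0, 0.0]
tgt = [0.0, 200.0, 220.0, 240.0, 210.0]
converted = gaussian_normalized_f0(src, log_f0_stats(src), log_f0_stats(tgt))
```

Because this baseline copies the source voicing pattern and only rescales the contour globally, it cannot correct U/V mismatches or local contour shape, which is the limitation the frame-by-frame neural network approach is intended to address.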