Academic Paper

Pitch transformation in neural network based voice conversion

Document Type

Conference

Author

Xie, Feng-Long; Qian, Yao; Soong, Frank K.; Li, Haifeng

Source

2014 9th International Symposium on Chinese Spoken Language Processing (ISCSLP), pp. 197-200, Sep. 2014

Subject

Computing and Processing
General Topics for Engineers
Robotics and Control Systems
Signal Processing and Analysis
Artificial neural networks
Speech
Wavelet transforms
Context
Training
Vectors
voice conversion
pitch
neural network

Abstract

In voice conversion tasks, prosody conversion, and pitch conversion in particular, is a very challenging research topic because pitch is discontinuous (F0 is defined only in voiced frames). Conventionally, pitch conversion is achieved by adjusting the mean and variance of the source pitch distribution to match those of the target pitch distribution. This method removes most of the detailed information in the speaker's prosody and preserves only the global F0 contour. In this paper, we propose a neural network based pitch conversion system that converts F0 and spectral features jointly, frame by frame. Experimental results show that, compared with the conventional Gaussian normalized transformation method, neural network based pitch conversion significantly reduces both the unvoiced/voiced (U/V) error and the root mean square error (RMSE) of F0 between the converted and target pitch. Wavelet decomposition of F0 can further improve voice conversion performance.
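The conventional baseline named in the abstract, Gaussian normalized transformation, is commonly implemented as a mean-variance mapping of F0 in the log domain. The abstract does not say whether the authors work in the log or linear domain, so the log domain here is an assumption. The sketch below also computes the two objective measures the abstract reports, U/V error and F0 RMSE, under assumed conventions: F0 = 0 marks unvoiced frames, the sequences are already time-aligned, and RMSE is taken over frames voiced in both. All function names are hypothetical. This is not the authors' code, and it does not implement the proposed neural network or wavelet method.

```python
import math
from typing import List, Tuple

# Assumed convention (not from the paper): F0 is given per frame in Hz,
# and unvoiced frames are marked with 0.0.
UNVOICED = 0.0


def log_f0_stats(f0: List[float]) -> Tuple[float, float]:
    """Mean and (population) standard deviation of log F0 over voiced frames."""
    logs = [math.log(v) for v in f0 if v > UNVOICED]
    mean = sum(logs) / len(logs)
    std = math.sqrt(sum((x - mean) ** 2 for x in logs) / len(logs))
    return mean, std


def gaussian_normalized_transform(
    src_f0: List[float],
    src_stats: Tuple[float, float],
    tgt_stats: Tuple[float, float],
) -> List[float]:
    """Map source log F0 onto the target mean/std; voicing is copied from the source."""
    mu_s, sd_s = src_stats  # assumes sd_s > 0 (at least two distinct voiced values)
    mu_t, sd_t = tgt_stats
    converted = []
    for v in src_f0:
        if v > UNVOICED:
            z = (math.log(v) - mu_s) / sd_s  # standardize in the log domain
            converted.append(math.exp(mu_t + sd_t * z))  # rescale to target statistics
        else:
            converted.append(UNVOICED)  # this method never changes the U/V decision
    return converted


def uv_error_rate(converted: List[float], target: List[float]) -> float:
    """Fraction of aligned frames whose voiced/unvoiced decision differs from the target."""
    mismatches = sum(
        (c > UNVOICED) != (t > UNVOICED) for c, t in zip(converted, target)
    )
    return mismatches / len(target)


def f0_rmse(converted: List[float], target: List[float]) -> float:
    """RMSE in Hz over frames voiced in both sequences (NaN if there are none)."""
    diffs = [
        (c - t) ** 2
        for c, t in zip(converted, target)
        if c > UNVOICED and t > UNVOICED
    ]
    if not diffs:
        return float("nan")
    return math.sqrt(sum(diffs) / len(diffs))


if __name__ == "__main__":
    # Toy frame sequences, invented purely for illustration.
    src = [0.0, 100.0, 120.0, 110.0, 0.0]
    tgt = [0.0, 200.0, 230.0, 0.0, 0.0]
    conv = gaussian_normalized_transform(src, log_f0_stats(src), log_f0_stats(tgt))
    print("U/V error:", uv_error_rate(conv, tgt))
    print("F0 RMSE (Hz):", f0_rmse(conv, tgt))
```

In practice the source and target statistics would be estimated from each speaker's training data rather than from a single utterance. Because the mapping copies the source voicing decisions unchanged, any U/V mismatch with the target persists. This is one way to read the abstract's observation that frame-by-frame neural conversion can reduce the U/V error relative to this baseline.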
