Academic Paper

Sensitivity Analysis of MaskCycleGAN based Voice Conversion for Enhancing Cleft Lip and Palate Speech Recognition
Document Type
Conference
Source
2022 IEEE International Conference on Signal Processing and Communications (SPCOM), pp. 1-5, Jul. 2022
Subject
Communication, Networking and Broadcast Technologies
Computing and Processing
Signal Processing and Analysis
Time-frequency analysis
Sensitivity analysis
Lips
Conferences
Speech recognition
Speech enhancement
Assistive technologies
Cleft lip and palate speech
automatic speech recognition
MaskCycleGAN-VC
Language
ISSN
2474-915X
Abstract
Cleft lip and palate (CLP) is a congenital disorder that distorts an individual's speech. As a result, CLP speech is not well suited to speech recognition systems. Existing work on CLP speech enhancement uses the CycleGAN-VC non-parallel voice conversion method. However, CycleGAN-VC cannot capture time-frequency structures; MaskCycleGAN-VC can, by applying a module called time-frequency adaptive normalization. It also has the added advantage of mel-spectrogram conversion rather than mel-spectrum conversion. Converting CLP speech to normal speech increases intelligibility and thereby allows automatic speech recognition systems to predict the uttered sentences, which matters in day-to-day life as speech recognition devices automate everyday tasks on a large scale. To develop an assistive technology, however, it is essential to study the sensitivity of automatic speech recognizers. This work focuses on the sensitivity analysis of a MaskCycleGAN-based voice conversion system under varying acoustic and gender mismatch.