Academic Paper

Hyperkinetic Dysarthria voice abnormalities: a neural network solution for text translation.
Document Type
Article
Source
International Journal of Speech Technology. Mar2024, Vol. 27 Issue 1, p255-265. 11p.
Subject
*Speech perception
*Speech disorders
*Dysarthria
*Automatic speech recognition
Data augmentation
Web-based user interfaces
Error rates
Language
ISSN
1381-2416
Abstract
A defective speech recognition (DSR) system has the potential to significantly improve the quality of life of people with speech disorders. In this paper, we developed a novel ConvGRUSpeechNet model for recognizing and understanding the speech of people with hyperkinetic dysarthria disorder (HDD). The proposed model combines convolutional layers, recurrent layers (GRU and BiGRU), and dense layers with a LogSoftmax output function to recognize HDD speech and translate it into text. To prevent overfitting and handle data imbalance, we employed data augmentation and dataset-splitting functions during training. Mel-frequency cepstral coefficients (MFCCs) were also employed to reduce the vanishing-gradient problem. In addition, we created a Russian speech dataset comprising 2,000 recordings of HDD speech. The primary objective of this research is to improve speech recognition for individuals with HDD by employing the ConvGRUSpeechNet model. The proposed DSR system achieved a character error rate (CER) of 12.35% on the test dataset. Under the same conditions, the experimental findings show that the proposed solution outperforms the existing state-of-the-art CBNs and TDNN-F LF-MMI models. Furthermore, we deployed the TensorFlow model on a Flask server, making it accessible through a web application. [ABSTRACT FROM AUTHOR]
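The abstract reports its result as a character error rate but does not define the metric. The usual definition is CER = (S + D + I) / N, where S, D and I are the character substitutions, deletions and insertions needed to turn the recognized text into the reference transcript, and N is the number of reference characters. A minimal Python sketch follows. It assumes corpus-level pooling (total edits divided by total reference characters) and no text normalization; the paper does not say how it aggregated CER, and the function names are hypothetical, not taken from the authors' code.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Character-level edit distance (substitutions + deletions + insertions)."""
    # prev[j] holds the distance between the first i-1 chars of ref and the first j chars of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # delete r from the reference
                curr[j - 1] + 1,     # insert h into the hypothesis
                prev[j - 1] + cost,  # substitute (or match when cost == 0)
            )
        prev = curr
    return prev[-1]


def character_error_rate(references: list[str], hypotheses: list[str]) -> float:
    """Corpus-level CER: total edit distance / total reference characters (assumed pooling)."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(levenshtein(r, h) for r, h in zip(references, hypotheses))
    return total_edits / total_chars


if __name__ == "__main__":
    # One substituted character ("и" -> "е") out of 10 reference characters.
    print(character_error_rate(["привет мир"], ["превет мир"]))
```

Under this definition, a CER of 12.35% means roughly 12 character edits per 100 reference characters. Averaging per-utterance CERs instead of pooling would weight short recordings more heavily and can give a different figure.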