Academic Paper

Harmonizing Voices: Enhancing Speech Recognition Through Integrated Phonological Features in Bengali
Document Type
Conference
Source
2023 3rd International Conference on Smart Generation Computing, Communication and Networking (SMART GENCON), pp. 1-8, Dec. 2023
Subject
Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Engineered Materials, Dielectrics and Plasmas
Engineering Profession
Fields, Waves and Electromagnetics
General Topics for Engineers
Nuclear Engineering
Photonics and Electrooptics
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Adaptation models
Speech recognition
Speech enhancement
Feature extraction
Usability
Task analysis
Stress
Robustness
inclusive recognition
place and manner of articulation
speech attributes
phonological features
Abstract
This study explores a way to improve speech recognition accuracy by integrating phonological features. Despite remarkable progress, speech recognition systems still struggle with varying accents and speech patterns. The proposed approach incorporates phonological attributes such as stress patterns, phoneme duration, and intonation into the recognition process, so that the system can capture the fine-grained speech nuances needed for precise understanding. Extensive experiments on diverse datasets demonstrate that this phonology-enriched approach improves recognition accuracy across different speech styles and variations. The phoneme detection model is built with a deep neural network, and the classification model is based on a stacked denoising autoencoder. The outcomes underscore the potential of phonological integration for building adaptable and inclusive speech recognition systems, and hold promise for improved communication technology in real-world multilingual scenarios. The proposed system achieved an overall accuracy of 86.19%. Classification by place and manner of articulation was also performed: the system achieved 98.9% accuracy for manner of articulation and 50.2% for place of articulation.
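The abstract reports accuracy separately for manner and place of articulation. As a rough illustration of how the two attributes can be scored independently, here is a minimal sketch. It assumes a small hypothetical table of IPA consonants and an illustrative helper named `attribute_accuracy`; neither comes from the paper, and the table is not the paper's Bengali phoneme inventory.

```python
# Hypothetical illustration only: a tiny IPA consonant table mapping each
# phoneme to (place, manner). This is NOT the inventory used in the paper.
ATTRIBUTES = {
    "p": ("bilabial", "plosive"),
    "b": ("bilabial", "plosive"),
    "m": ("bilabial", "nasal"),
    "t": ("alveolar", "plosive"),
    "s": ("alveolar", "fricative"),
    "n": ("alveolar", "nasal"),
    "k": ("velar", "plosive"),
    "g": ("velar", "plosive"),
}


def attribute_accuracy(reference, predicted):
    """Return (place_accuracy, manner_accuracy) for two aligned phoneme lists.

    Assumes both sequences are already aligned one-to-one and that every
    phoneme appears in ATTRIBUTES.
    """
    if not reference or len(reference) != len(predicted):
        raise ValueError("sequences must be non-empty and of equal length")

    place_hits = 0
    manner_hits = 0
    for ref, hyp in zip(reference, predicted):
        ref_place, ref_manner = ATTRIBUTES[ref]
        hyp_place, hyp_manner = ATTRIBUTES[hyp]
        # A prediction can match on one attribute while missing the other.
        place_hits += ref_place == hyp_place
        manner_hits += ref_manner == hyp_manner

    total = len(reference)
    return place_hits / total, manner_hits / total


reference = ["p", "t", "k", "m"]
predicted = ["t", "t", "g", "n"]
place_acc, manner_acc = attribute_accuracy(reference, predicted)
print(place_acc, manner_acc)
```

In this toy input every predicted phoneme has the correct manner, but only two of the four have the correct place, which shows how the two scores can diverge.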