Academic Paper

Comparison of different Acoustic Models for Kannada language using Kaldi Toolkit
Document Type
Conference
Source
2018 International Conference on Advances in Computing, Communications and Informatics (ICACCI), pp. 2415-2420, Sep. 2018
Subject
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Fields, Waves and Electromagnetics
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Training
Speech recognition
Data models
Hidden Markov models
Mel frequency cepstral coefficient
Decoding
KALDI
word error rate (WER)
ASR
GMM
Language
Abstract
This paper describes a speech recognition system for the South Indian language Kannada, built with the Kaldi toolkit. Kaldi is an open-source toolkit based on Finite State Transducers (FSTs). Two speech data sets have been collected from 10 different speakers (5 male and 5 female). The first data set is a Kannada digit corpus in which each speaker utters a number ten times, and the second data set consists of simple Kannada phrases. Most of the noise has been filtered manually, and the data has been segmented using the software application Audacity (v2.2.2). The main objective is to compare the word error rate (WER) on the two data sets across different acoustic models based on Gaussian Mixture Models (GMM) and Subspace Gaussian Mixture Models (SGMM). For the first data set, the WER is 4.54% with GMM and 4.27% with SGMM; for the second data set, it is 12.27% with GMM and 13% with SGMM.
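
Since the abstract compares systems by WER, a minimal sketch of how that metric is commonly computed may help. It assumes the standard definition (substitutions + deletions + insertions, divided by the number of reference words) obtained from a word-level edit distance. The function names `edit_distance`, `word_error_rate`, and `corpus_wer` are hypothetical, and the transliterated Kannada digits in the usage note are illustrative only; none of this is taken from the paper or from Kaldi's own scoring code.

```python
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance between a reference and a hypothesis."""
    # prev[j] holds the distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev: List[int] = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER for one utterance, as a fraction of the reference length."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference transcript must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def corpus_wer(pairs: Sequence[Tuple[str, str]]) -> float:
    """Corpus-level WER: total errors divided by total reference words."""
    total_errors = 0
    total_words = 0
    for reference, hypothesis in pairs:
        ref_words = reference.split()
        total_errors += edit_distance(ref_words, hypothesis.split())
        total_words += len(ref_words)
    if total_words == 0:
        raise ValueError("references must contain at least one word in total")
    return total_errors / total_words
```

For example, `word_error_rate("ondu eradu mooru naalku", "ondu mooru naalku")` counts one deleted word out of four reference words, giving 0.25 (25%). Pooling errors over the whole test set, as `corpus_wer` does, weights long utterances more heavily than averaging per-utterance rates would.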