Academic Paper

Phoneme based Domain Prediction for Language Model Adaptation
Document Type
Conference
Source
2020 International Joint Conference on Neural Networks (IJCNN), pp. 1-6, Jul. 2020
Subject
Bioengineering
Computing and Processing
General Topics for Engineers
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Erbium
Decoding
Lattices
Predictive models
Training
Mel frequency cepstral coefficient
Language Adaptation
Phoneme Classification
Multistage CNN
Domain specific LM
Language
ISSN
2161-4407
Abstract
Automatic Speech Recognizer (ASR) and Natural Language Understanding (NLU) are the two key components of any voice assistant. ASR converts the input audio signal to text using an acoustic model (AM), a language model (LM), and a decoder. NLU further processes this text for sub-tasks such as predicting domain, intent, and slots. Since the input to NLU is text, any error in the ASR module propagates to the NLU sub-tasks. ASR generally processes speech in short-duration windows, first generating phonemes with the AM and then word lattices with the decoder, dictionary, and LM. Training and maintaining a generic LM that fits the data distribution of multiple domains is a difficult task, so our proposed architecture uses multiple domain-specific LMs to rescore the word lattice and provides a way to select which LMs to use for rescoring. In this paper, we propose a novel multistage CNN architecture that classifies the domain from a partial phoneme sequence and uses it to select the top-K domain LMs. The multistage classification model based on phoneme input achieves state-of-the-art top-three-domain accuracy on two open datasets: 97.76% on ATIS and 99.57% on Snips.
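The selection-and-rescoring step described in the abstract can be illustrated with a minimal sketch: take the domain probabilities produced by the phoneme classifier, keep the top-K domains, and rescore ASR hypotheses with the selected domain LMs. All names, the toy "LMs", and the log-linear weighting below are illustrative assumptions, not the paper's implementation.

```python
# Hedged sketch of top-K domain-LM selection and hypothesis rescoring.
# The classifier, LMs, and scoring scheme here are stand-ins, not the
# paper's actual multistage CNN pipeline.

def top_k_domains(domain_probs, k=3):
    """Return the k most probable domains from the classifier output."""
    return sorted(domain_probs, key=domain_probs.get, reverse=True)[:k]

def rescore(hypotheses, domain_lms, selected, lm_weight=0.5):
    """Combine each hypothesis's acoustic score with the best score
    among the selected domain LMs (simple log-linear interpolation)."""
    rescored = []
    for text, acoustic_score in hypotheses:
        best_lm = max(domain_lms[d](text) for d in selected)
        rescored.append((text, acoustic_score + lm_weight * best_lm))
    return max(rescored, key=lambda t: t[1])[0]

# Toy example: hypothetical keyword-count "LMs" and classifier output.
lms = {
    "travel":  lambda s: sum(w in {"flight", "book"} for w in s.split()),
    "music":   lambda s: sum(w in {"play", "song"} for w in s.split()),
    "weather": lambda s: 0.0,
}
probs = {"travel": 0.7, "music": 0.2, "weather": 0.1}
selected = top_k_domains(probs, k=2)
best = rescore([("book a flight", -1.0), ("brook a fright", -0.9)],
               lms, selected)
print(selected, best)  # ['travel', 'music'] book a flight
```

In the full system, `domain_probs` would come from the multistage CNN applied to the partial phoneme sequence, and the LM scores would come from rescoring the decoder's word lattice rather than a flat n-best list.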