Academic Paper

A Linguistic-based Transfer Learning Approach for Low-resource Bahnar Text-to-Speech

Document Type

Conference

Author

Nguyen, Tan Dat; Lam, Quang Tuong; Do, Duc Hao; Cai, Huu Thuc; Nguyen, Hoang Suong; Vo, Thanh Hung; Nguyen, Duc Dung

Source

2022 9th NAFOSTED Conference on Information and Computer Science (NICS), pp. 148-153, Oct. 2022

Subject

Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Robotics and Control Systems
Signal Processing and Analysis
Computer science
Vocoders
Synthesizers
Transfer learning
Phonetics
Acoustics
Recording

Abstract

Text-to-Speech (TTS) models typically require a large number of good-quality recorded utterances to synthesize high-fidelity speech. For low-resource languages, the lack of data is a major challenge. In this work, we address this problem for Bahnar Kriem, a rare language spoken by the Bahnar people living in Binh Dinh Province, Vietnam. We propose a linguistic approach, along with several preprocessing steps, to process a poor-quality dataset of 720 Bahnar Kriem utterances. We also analyze the Bahnar Kriem language and find that it mixes Bahnar and Vietnamese elements, a result of the shared historical development of the two peoples. We therefore propose a transfer learning approach that integrates Vietnamese pronunciation into the Bahnar TTS synthesizer. Experiments show a significant improvement in TTS performance for this low-resource language. Our model can also generate long Bahnar sentences with a short inference time. Subjective and objective evaluations suggest promising results and point to potential improvements based on our approach. We also provide audio samples generated by our model.

