Academic Article
Optimize Wav2vec2's Architecture for Small Training Set Through Analyzing its Pre-Trained Model's Attention Pattern
Document Type
Conference
Author
Source
ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7112-7116, May 2022
Subject
Language
ISSN
2379-190X
Abstract
Transformer-based automatic speech recognition (ASR) systems have shown their success in the presence of large datasets. But in medical research, we have to create ASR for a non-typical population, i.e., pre-school children with speech disorders, with a small training dataset. To increase training efficiency on small datasets, we optimize the architecture of Wav2Vec 2.0, a variation of the Transformer, by analyzing its pre-trained model's block-level attention pattern. We show that block-level patterns can serve as an indicator for narrowing down the optimization direction. To ensure the reproducibility of our experiments, we use Librispeech-100-clean as training data to simulate the limited-data condition. We apply two techniques, a local attention mechanism and cross-block parameter sharing, with counter-intuitive configurations. Our optimized architecture outperforms the vanilla architecture by about 1.8% absolute word error rate (WER) on dev-clean and 1.4% on test-clean.
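The two modifications named in the abstract can be sketched as follows. This is a minimal PyTorch illustration, not the authors' implementation: the block layout, window size, number of shared block groups, and all dimensions are assumptions chosen only to make the idea concrete.

    import torch
    import torch.nn as nn

    def local_attention_mask(seq_len: int, window: int) -> torch.Tensor:
        # True where attention is blocked: positions more than `window` frames apart.
        idx = torch.arange(seq_len)
        return (idx[None, :] - idx[:, None]).abs() > window

    class LocalTransformerBlock(nn.Module):
        # One Transformer encoder block whose self-attention is restricted to a local window.
        def __init__(self, dim: int = 768, heads: int = 8, window: int = 64):
            super().__init__()
            self.window = window
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.ff = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            mask = local_attention_mask(x.size(1), self.window).to(x.device)
            q = self.norm1(x)
            h, _ = self.attn(q, q, q, attn_mask=mask)
            x = x + h
            return x + self.ff(self.norm2(x))

    class SharedBlockEncoder(nn.Module):
        # Cross-block parameter sharing: only a few distinct blocks are created,
        # and each one is applied repeatedly to stand in for several layers.
        def __init__(self, dim: int = 768, layers: int = 12, share_groups: int = 3, window: int = 64):
            super().__init__()
            assert layers % share_groups == 0
            self.blocks = nn.ModuleList(
                LocalTransformerBlock(dim, window=window) for _ in range(share_groups)
            )
            self.repeats = layers // share_groups

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            for block in self.blocks:
                for _ in range(self.repeats):
                    x = block(x)
            return x

    if __name__ == "__main__":
        enc = SharedBlockEncoder()
        feats = torch.randn(2, 200, 768)   # (batch, frames, feature dim), e.g. CNN front-end output
        print(enc(feats).shape)            # torch.Size([2, 200, 768])

In this sketch, local attention limits each frame to attending within a fixed window of neighbouring frames, and parameter sharing reuses each block's weights across several layers, reducing the encoder's parameter count; the specific window size and sharing pattern used in the paper are determined by its attention-pattern analysis and are not reproduced here.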