Academic Paper

Sequence Distribution Matching for Unsupervised Domain Adaptation in ASR
Document Type
Conference
Source
2022 13th International Symposium on Chinese Spoken Language Processing (ISCSLP), pp. 21-25, Dec. 2022
Subject
Computing and Processing
Signal Processing and Analysis
Training
Adaptation models
Data models
Speech processing
Automatic speech recognition
speech recognition
unsupervised domain adaptation
distribution matching
transfer learning
Language
English
Abstract
Unsupervised domain adaptation (UDA) aims to improve cross-domain model performance without labeled target-domain data. Distribution matching is a widely used UDA approach for automatic speech recognition (ASR) that learns domain-invariant yet class-discriminative representations. Most previous distribution matching approaches simply treat all frames in a sequence as independent features and match them between domains. Although intuitive and effective, neglecting the sequential nature of speech can be sub-optimal for ASR. In this work, we propose to explicitly capture and match sequence-level statistics with sequence pooling, leading to a sequence distribution matching approach. We examine the effectiveness of sequence pooling on top of maximum mean discrepancy (MMD) based and domain adversarial training (DAT) based distribution matching. Experimental results demonstrate that sequence pooling effectively boosts the performance of distribution matching, especially for the MMD-based approach. By combining sequence-pooled features with the original frame-level features, the MMD-based and DAT-based approaches reduce WER relatively by 12.08% and 14.72%, respectively, over the source domain model.
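
For readers who want a concrete picture of the technique the abstract describes, the following is a minimal PyTorch sketch of sequence-level MMD matching. The mean-plus-standard-deviation pooling, tensor shapes, and RBF kernel bandwidths are illustrative assumptions, not the authors' released implementation; a DAT variant would instead pass the pooled features through a gradient-reversal layer into a domain classifier.

    # Minimal sketch (assumed details, not the paper's exact code):
    # pool frame features into one sequence-level statistic per
    # utterance, then match source/target pools with an RBF-kernel MMD.
    import torch

    def sequence_pool(frames: torch.Tensor) -> torch.Tensor:
        """Pool frame features (batch, time, dim) into a sequence-level
        vector by concatenating the temporal mean and standard
        deviation, giving (batch, 2 * dim)."""
        mean = frames.mean(dim=1)
        std = frames.std(dim=1)
        return torch.cat([mean, std], dim=-1)

    def mmd_loss(source: torch.Tensor, target: torch.Tensor,
                 bandwidths=(1.0, 2.0, 4.0)) -> torch.Tensor:
        """Squared maximum mean discrepancy between two feature sets
        under a mixture of RBF kernels."""
        def kernel(x, y):
            # Pairwise squared Euclidean distances, shape (n_x, n_y).
            dists = torch.cdist(x, y) ** 2
            return sum(torch.exp(-dists / (2 * b ** 2)) for b in bandwidths)
        return (kernel(source, source).mean()
                + kernel(target, target).mean()
                - 2 * kernel(source, target).mean())

    # Usage: pool encoder outputs of a labeled source batch and an
    # unlabeled target batch, then add the MMD term to the supervised
    # ASR loss (shapes below are placeholders).
    src = sequence_pool(torch.randn(8, 200, 256))   # source-domain batch
    tgt = sequence_pool(torch.randn(8, 180, 256))   # target-domain batch
    adaptation_loss = mmd_loss(src, tgt)

Because the pooled statistics summarize an entire utterance, this loss matches distributions at the sequence level rather than treating each frame as an independent sample, which is the distinction the abstract draws from prior frame-level matching.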