학술논문

Stargan-vc Based Cross-Domain Data Augmentation for Speaker Verification
Document Type
Conference
Source
ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) Acoustics, Speech and Signal Processing (ICASSP), ICASSP 2023 - 2023 IEEE International Conference on. :1-5 Jun, 2023
Subject
Bioengineering
Communication, Networking and Broadcast Technologies
Computing and Processing
Signal Processing and Analysis
Training
Performance evaluation
Degradation
Adaptation models
Data models
Acoustics
Recording
StarGAN
Domain Adaptation
Data Augmentation
Speaker Verification
Language
ISSN
2379-190X
Abstract
Automatic speaker verification (ASV) faces domain shift caused by the mismatch of intrinsic and extrinsic factors, such as recording device and speaking style, in real-world applications, which leads to severe performance degradation. Since single-speaker multi-condition (SSMC) data is difficult to collect in practice, existing domain adaptation methods are hard to ensure the feature consistency of the same class but different domains. To this end, we propose a cross-domain data generation method to obtain a domain-invariant ASV system. Inspired by voice conversion (VC) task, a StarGAN based generative model first learns cross-domain mappings from SSMC data, and then generates missing domain data for all speakers, thus increasing the intra-class diversity of the training set. Considering the difference between ASV and VC task, we renovate the corresponding training objectives and network structure to make the adaptation task-specific. Evaluations on achieve a relative performance improvement of about 5-8% over the baseline in terms of minDCF and EER, outperforming the CNSRC winner’s system of the equivalent scale.