Academic Paper

Augmentation Adversarial Training for Self-Supervised Speaker Representation Learning

Document Type

Periodical

Author

Kang, J.; Huh, J.; Heo, H.S.; Chung, J.S.

Source

IEEE Journal of Selected Topics in Signal Processing (IEEE J. Sel. Top. Signal Process.), vol. 16, no. 6, pp. 1253-1262, Oct. 2022

Subject

Signal Processing and Analysis
Training
Speaker recognition
Self-supervised learning
Semisupervised learning
Representation learning

Language

ISSN

1932-4553 (Print)
1941-0484 (Electronic)

Abstract

The goal of this work is to train robust speaker recognition models using self-supervised representation learning. Recent work on self-supervised speaker representations is based on contrastive learning, which encourages within-utterance embeddings to be similar and across-utterance embeddings to be dissimilar. However, since within-utterance segments share the same acoustic characteristics, it is difficult to separate the speaker information from the channel information. To this end, we propose an augmentation adversarial training strategy that trains the network to be discriminative for speaker information while remaining invariant to the augmentation applied. Since augmentation simulates acoustic characteristics, training the network to be invariant to augmentation also encourages it to be invariant to channel information in general. Extensive experiments on the VoxCeleb and VOiCES datasets show significant improvements over previous self-supervised methods, and the performance of our self-supervised models far exceeds that of humans. We also conduct semi-supervised learning experiments to show that augmentation adversarial training improves performance in the presence of speaker labels.
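The two ingredients named in the abstract, a contrastive loss over within-utterance segment pairs and an adversarial augmentation classifier, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the InfoNCE-style form of the loss, the scale value of 10.0, and the names `contrastive_loss`, `GradientReversal`, and `adversarial_objectives` are hypothetical. The gradient reversal layer is the generic one from domain-adversarial training, used here as one plausible way to push the encoder toward augmentation invariance.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity, guarded against zero-norm vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / max(na * nb, 1e-12)


def log_sum_exp(values: Vector) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def contrastive_loss(anchors: List[Vector], positives: List[Vector],
                     scale: float = 10.0) -> float:
    """Hypothetical InfoNCE-style loss.

    anchors[i] and positives[i] are two segments of the same utterance
    (positive pair); positives[j] for j != i come from other utterances
    in the batch and serve as negatives.
    """
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [scale * cosine(anchors[i], positives[j]) for j in range(n)]
        total += log_sum_exp(logits) - logits[i]
    return total / n


def cross_entropy(logits: Vector, target: int) -> float:
    """Loss of the augmentation classifier predicting the augmentation type."""
    return log_sum_exp(logits) - logits[target]


class GradientReversal:
    """Identity in the forward pass; scales the gradient by -lam backward."""

    def __init__(self, lam: float) -> None:
        self.lam = lam

    def forward(self, x: Vector) -> List[float]:
        return list(x)

    def backward(self, grad: Vector) -> List[float]:
        return [-self.lam * g for g in grad]


def adversarial_objectives(l_contrastive: float, l_aug: float,
                           lam: float) -> Tuple[float, float]:
    """Return (encoder objective, classifier objective).

    The classifier minimises l_aug. Because the classifier's gradient reaches
    the encoder through the reversal layer, the encoder effectively
    minimises l_contrastive - lam * l_aug, i.e. it tries to make the
    augmentation hard to predict from the embedding.
    """
    return l_contrastive - lam * l_aug, l_aug
```

A usage example with toy two-dimensional embeddings:

```python
anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[0.9, 0.1], [0.1, 0.9]]
l_con = contrastive_loss(anchors, positives)
l_aug = cross_entropy([2.0, 0.5], target=0)
encoder_obj, classifier_obj = adversarial_objectives(l_con, l_aug, lam=0.1)
```

The reversal layer lets both players share one backward pass: the classifier receives ordinary gradients, while the encoder receives negated ones, so no separate alternating optimisation loop is needed.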
