학술논문

Self-Supervised Learning-Based Source Separation for Meeting Data

Document Type

Conference

Author

Li, Yuang; Zheng, Xianrui; Woodland, Philip C.

Source

ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) Acoustics, Speech and Signal Processing (ICASSP), ICASSP 2023 - 2023 IEEE International Conference on. :1-5 Jun, 2023

Subject

Bioengineering
Communication, Networking and Broadcast Technologies
Computing and Processing
Signal Processing and Analysis
Training
Adaptation models
Time-frequency analysis
Source separation
Self-supervised learning
Feature extraction
Data models
source separation
self-supervised learning
automatic speech recognition

Language

ISSN

2379-190X

Abstract

Source separation can improve automatic speech recognition (ASR) under multi-party meeting scenarios by extracting single-speaker signals from overlapped speech. Despite the success of self-supervised learning models in single-channel source separation, most studies have focused on simulated setups. In this paper, seven SSL models were compared on both simulated and real-world corpora. Then, we propose to integrate the best-performing model WavLM into an automatic transcription system through a novel iterative source selection method. To improve real-world performance, time-domain unsupervised mixture invariant training was adapted to the time-frequency domain. Experiments showed that in the transcription system when source separation was inserted before an ASR model fine-tuned on separated speech, absolute reductions of 1.9% and 1.5% in concatenated minimum-permutation word error rate for an unknown number of speakers (cpWER-us) were observed on the AMI dev and test sets.

Online Access

Full Text (IEEE) Find it@PNU

이메일

부산대학교 도서관

Online Access

메일 발송