Academic Paper

Temporal-Frequency-Spatial Features Fusion for Multi-channel Informed Target Speech Separation
Document Type
Conference
Source
2022 5th International Conference on Information Communication and Signal Processing (ICICSP), pp. 168-174, Nov. 2022
Subject
Communication, Networking and Broadcast Technologies
Signal Processing and Analysis
Measurement
Training
Knowledge engineering
Time-frequency analysis
Neural networks
Signal processing
Predictive models
multi-channel
informed target speech separation
features fusion
end-to-end
deep learning
Language
English
ISSN
2770-792X
Abstract
To make full use of the time-frequency and spatial features of the multi-channel speech signal, we propose an end-to-end multi-channel target speech separation method based on temporal-frequency-spatial feature fusion, called the cTFS model. For the target speech separation task, the cTFS model takes the angle feature of the target speech signal as prior knowledge and predicts a complex ideal ratio mask target with a complex U-shaped network; the target speech signal is then reconstructed by signal approximation. Furthermore, a multi-channel target speaker separation dataset is constructed from the WSJ0-2mix dataset using a signal reverberation model. The performance of each target speaker separation model is evaluated on this dataset using the metrics SDR, SI-SNR, PESQ, and STOI. Experimental results show the effectiveness of the proposed method and the benefit of incorporating angle feature information into multi-channel speech separation.
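For context, the complex ideal ratio mask training target and the SI-SNR evaluation metric named in the abstract have standard definitions in the speech separation literature. The sketch below is a minimal numpy illustration of both, assuming the usual per-bin definition of the cIRM (M = S / Y, so that applying M to the mixture spectrogram recovers the target) and the conventional scale-invariant SNR; the function and variable names are illustrative and are not taken from the paper.

import numpy as np

def complex_ideal_ratio_mask(mix_stft, target_stft, eps=1e-8):
    # cIRM per time-frequency bin: M = S / Y = S * conj(Y) / |Y|^2,
    # so that M * mix_stft approximates target_stft.
    denom = np.abs(mix_stft) ** 2 + eps
    return target_stft * np.conj(mix_stft) / denom

def si_snr(estimate, reference, eps=1e-8):
    # Scale-invariant SNR in dB between estimated and reference waveforms.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference to get the target component.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    s_target = scale * reference
    e_noise = estimate - s_target
    return 10.0 * np.log10(
        (np.dot(s_target, s_target) + eps) / (np.dot(e_noise, e_noise) + eps)
    )

In this formulation the network is trained to predict the real and imaginary parts of the mask, and the separated waveform is obtained by applying the predicted mask to the mixture spectrogram and inverting the STFT, which matches the signal-approximation reconstruction the abstract describes.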