Academic Paper

Temporal-Frequency-Spatial Features Fusion for Multi-channel Informed Target Speech Separation
Document Type
Conference
Source
2022 5th International Conference on Information Communication and Signal Processing (ICICSP), pp. 168-174, Nov. 2022
Subject
Communication, Networking and Broadcast Technologies
Signal Processing and Analysis
Measurement
Training
Knowledge engineering
Time-frequency analysis
Neural networks
Signal processing
Predictive models
multi-channel
informed target speech separation
features fusion
end-to-end
deep learning
Language
English
ISSN
2770-792X
Abstract
To make full use of the time-frequency and spatial features of the multi-channel speech signal, we propose an end-to-end multi-channel target speech separation method based on temporal-frequency-spatial feature fusion, called the cTFS model. For the target speech separation task, the cTFS model takes the angle feature of the target speech signal as prior knowledge and predicts a complex ideal ratio mask target with a complex U-shaped network; the target speech signal is then reconstructed by signal approximation. Furthermore, a multi-channel target speaker separation dataset is constructed from the WSJ0-2mix dataset using a signal reverberation model. The performance of each target speaker separation model is evaluated on this dataset using the metrics SDR, SI-SNR, PESQ, and STOI. Experimental results show the effectiveness of the proposed method and the benefit of incorporating angle feature information into multi-channel speech separation.
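For context, the complex ideal ratio mask training target and the SI-SNR evaluation metric named in the abstract have standard definitions in the speech separation literature. The sketch below is a minimal numpy illustration of both, assuming the usual per-bin definition of the cIRM (M = S / Y, so that applying M to the mixture spectrogram recovers the target) and the conventional scale-invariant SNR; the function and variable names are illustrative and are not taken from the paper.

import numpy as np

def complex_ideal_ratio_mask(mix_stft, target_stft, eps=1e-8):
    # cIRM per time-frequency bin: M = S / Y = S * conj(Y) / |Y|^2,
    # so that M * mix_stft approximates target_stft.
    denom = np.abs(mix_stft) ** 2 + eps
    return target_stft * np.conj(mix_stft) / denom

def si_snr(estimate, reference, eps=1e-8):
    # Scale-invariant SNR in dB between estimated and reference waveforms.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference to get the target component.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    s_target = scale * reference
    e_noise = estimate - s_target
    return 10.0 * np.log10(
        (np.dot(s_target, s_target) + eps) / (np.dot(e_noise, e_noise) + eps)
    )

In this formulation the network is trained to predict the real and imaginary parts of the mask, and the separated waveform is obtained by applying the predicted mask to the mixture spectrogram and inverting the STFT, which matches the signal-approximation reconstruction the abstract describes.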