Academic Paper

End-to-End Brain-Driven Speech Enhancement in Multi-Talker Conditions
Document Type
Periodical
Source
IEEE/ACM Transactions on Audio, Speech, and Language Processing (IEEE/ACM Trans. Audio Speech Lang. Process.), 30:1718-1733, 2022
Subject
Signal Processing and Analysis
Computing and Processing
Communication, Networking and Broadcast Technologies
General Topics for Engineers
Speech enhancement
Feature extraction
Brain
Electroencephalography
Noise measurement
Data mining
Convolution
Deep learning
EEG signals
Language
ISSN
2329-9290
2329-9304
Abstract
Single-channel speech enhancement algorithms have improved greatly over the past few years. Despite these improvements, they still lack the efficiency of the auditory system in extracting attended auditory information in the presence of competing speakers. Recently, it has been shown that the attended auditory information can be decoded from the brain activity of the listener. In this paper, we propose two novel end-to-end deep learning methods, the Brain Enhanced Speech Denoiser (BESD) and the U-shaped Brain Enhanced Speech Denoiser (U-BESD), which take advantage of this fact to denoise a multi-talker speech mixture; further background noise and reverberation are not considered. We apply Feature-wise Linear Modulation (FiLM) between the brain activity and the sound mixture to better extract the features of the attended speaker for speech enhancement. We show, using electroencephalography (EEG) signals recorded from the listener, that both BESD and U-BESD successfully extract the attended speaker without any prior information about this speaker. Moreover, U-BESD outperforms a current state-of-the-art approach that also uses brain activity to perform enhancement. The proposed neural network-based methods would thus be strong candidates for realistic applications where no prior information about the attended speaker is available, such as hearing aids, cellphones, or noise-cancelling headphones.
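The FiLM conditioning named in the abstract can be illustrated with a minimal NumPy sketch. This is not the BESD or U-BESD architecture: the abstract does not give layer sizes or say how the EEG is embedded, so the linear projections, the array shapes, and every name here (`film`, `eeg_embedding`, `audio_features`, `w_gamma`, `w_beta`) are assumptions chosen only to show the general idea of a per-channel scale and shift predicted from a conditioning signal.

```python
import numpy as np


def film(features, conditioning, w_gamma, b_gamma, w_beta, b_beta):
    """Feature-wise linear modulation: scale and shift each feature channel.

    features:     (channels, time) feature map of the sound mixture (assumed shape)
    conditioning: (cond_dim,) embedding of the listener's EEG (hypothetical)
    """
    # Predict one scale (gamma) and one shift (beta) per feature channel.
    gamma = w_gamma @ conditioning + b_gamma  # shape: (channels,)
    beta = w_beta @ conditioning + b_beta     # shape: (channels,)
    # Broadcast the per-channel parameters along the time axis.
    return gamma[:, None] * features + beta[:, None]


# Toy dimensions, chosen only for illustration.
rng = np.random.default_rng(0)
channels, time_steps, cond_dim = 4, 10, 3

audio_features = rng.standard_normal((channels, time_steps))
eeg_embedding = rng.standard_normal(cond_dim)

# Randomly initialised projection weights; in a trained network these are learned.
w_gamma = rng.standard_normal((channels, cond_dim))
b_gamma = np.ones(channels)
w_beta = rng.standard_normal((channels, cond_dim))
b_beta = np.zeros(channels)

modulated = film(audio_features, eeg_embedding, w_gamma, b_gamma, w_beta, b_beta)
print(modulated.shape)
```

With zero projection weights, a unit `b_gamma`, and a zero `b_beta`, the function returns the audio features unchanged, so the conditioning signal only alters the features to the extent that the projections respond to it.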