Academic Article

Reverberant Source Separation Using NTF With Delayed Subsources and Spatial Priors
Document Type
Periodical
Source
IEEE/ACM Transactions on Audio, Speech, and Language Processing (IEEE/ACM Trans. Audio Speech Lang. Process.), vol. 32, pp. 1954-1967, 2024
Subject
Signal Processing and Analysis
Computing and Processing
Communication, Networking and Broadcast Technologies
General Topics for Engineers
Reverberation
Microphones
Source separation
Location awareness
Time-frequency analysis
Analytical models
Vectors
array signal processing
convolutive nonnegative matrix factorization
room reverberation
Language
ISSN
2329-9290
2329-9304
Abstract
Speech signals recorded by distant microphones are often contaminated with room reverberation and signals of interfering speakers. This article addresses the problem of joint source separation and dereverberation using multichannel nonnegative tensor factorization (NTF) in which late reverberant components are modeled using so-called delayed subsources. The article formulates two distinct signal models of the time-frequency spectrum of the multichannel microphone mixture, in which reverberation is modeled either independently for each source using delayed source variances or jointly using delayed microphone signals. In addition, it defines computationally efficient variants of these two methods with a simplified spatial model in which the spatial properties of the late reverberant components are estimated jointly for all delays. For each of the four distinct algorithms, the article first formulates a maximum a posteriori (MAP) estimator based on the NTF model with the localization prior over the mixing matrix, which is suitable for estimating the early reverberation (primarily direct-path) signals in a reverberant environment. Next, it derives update equations for the four resulting expectation-maximization algorithms, which are thoroughly evaluated and shown to outperform similar state-of-the-art approaches. The results of experimental evaluations, performed using real and simulated data for determined, over-determined, and under-determined scenarios, indicate superior performance of the proposed processing over the state of the art in terms of standard source separation and dereverberation metrics.
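
As a rough illustration of the "delayed source variances" idea named in the abstract, the NumPy sketch below builds a reverberant power spectrogram as a weighted sum of frame-delayed copies of a low-rank, NMF-style source variance. The function name `delayed_variance`, the per-delay gains `g`, and the array shapes are all assumptions made for illustration. They are not taken from the article, whose multichannel model, spatial terms, and update equations are not reproduced here.

```python
import numpy as np

def delayed_variance(W, H, g):
    """Hypothetical sketch: sum of delayed copies of an NMF source variance.

    W: (F, K) nonnegative spectral basis
    H: (K, N) nonnegative activations
    g: (F, L) nonnegative gains per frequency and frame delay (assumed)
    Returns an (F, N) array: sum over l of g[:, l] * v[:, n - l], v = W @ H.
    """
    W = np.asarray(W, dtype=float)
    H = np.asarray(H, dtype=float)
    g = np.asarray(g, dtype=float)
    v = W @ H                      # (F, N) variance of the undelayed source
    n_frames = v.shape[1]
    out = np.zeros(v.shape, dtype=float)
    for l in range(min(g.shape[1], n_frames)):
        # Shift by l frames; frames before the signal start contribute nothing.
        out[:, l:] += g[:, [l]] * v[:, : n_frames - l]
    return out
```

With a single unit-energy frame as input and gains of 1 and 0.5 for delays 0 and 1, the output holds the original frame followed by a half-energy copy one frame later, which is the decaying tail that such a delay model is meant to capture.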