Academic Paper

Neural Full-Rank Spatial Covariance Analysis for Blind Source Separation
Document Type
Periodical
Source
IEEE Signal Processing Letters, vol. 28, pp. 1670-1674, 2021
Subject
Signal Processing and Analysis
Computing and Processing
Communication, Networking and Broadcast Technologies
Training
Predictive models
Decoding
Reverberation
Neural networks
Computational modeling
Analytical models
Neural source separation
unsupervised training
deep generative models
variational autoencoders
Language
ISSN
1070-9908
1558-2361
Abstract
This paper describes a neural blind source separation (BSS) method based on amortized variational inference (AVI) of a non-linear generative model of mixture signals. A classical statistical approach to BSS is to fit a linear generative model that consists of spatial and source models representing the inter-channel covariances and the power spectral densities of sources, respectively. Although the variational autoencoder (VAE) has been used successfully as a non-linear source model with latent features, it must be pretrained on a sufficient amount of isolated source signals. Our method, in contrast, enables the VAE-based source model to be trained from mixture signals alone. Specifically, we introduce a neural mixture-to-feature inference model that directly infers the latent features from the observed mixture, and we integrate it with a neural feature-to-mixture generative model consisting of a full-rank spatial model and a VAE-based source model. All the models are optimized jointly so that the likelihood of the training mixtures is maximized in the framework of AVI. Once the inference model has been optimized, it can be used to estimate the latent features of the sources contained in unseen mixture signals. Experimental results show that the proposed method outperformed state-of-the-art BSS methods based on linear generative models and was comparable to a method based on supervised learning of the VAE-based source model.
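The "full-rank spatial model" combined with source power spectral densities is commonly written as a local complex Gaussian model, in which each multichannel STFT bin follows x_ft ~ N_C(0, sum_n lambda_nft G_nf), where lambda_nft is the PSD of source n and G_nf is its full-rank spatial covariance matrix. The sketch below evaluates the log-likelihood of a mixture under that standard formulation, assuming the usual circular complex Gaussian density. It is only an illustration of the kind of mixture likelihood the abstract refers to, not the authors' implementation; the function name `full_rank_log_likelihood` and the array layouts are hypothetical. In a VAE-based variant, the PSDs would be produced by a neural decoder from latent features rather than given directly.

```python
import numpy as np


def full_rank_log_likelihood(X, psd, scm):
    """Log-likelihood of a multichannel mixture under a full-rank spatial model.

    Assumed (hypothetical) layouts:
      X   : (F, T, M) complex STFT of an M-channel mixture
      psd : (N, F, T) nonnegative source PSDs lambda_{nft}
      scm : (N, F, M, M) Hermitian positive-definite spatial covariances G_{nf}
    Each bin is modeled as x_ft ~ N_C(0, Y_ft) with Y_ft = sum_n lambda_nft G_nf.
    """
    F, T, M = X.shape
    # Mixture covariance for every time-frequency bin: shape (F, T, M, M).
    Y = np.einsum("nft,nfij->ftij", psd, scm)
    # log|det Y_ft| for every bin (slogdet handles complex Hermitian input).
    _, logdet = np.linalg.slogdet(Y)
    # Quadratic form x^H Y^{-1} x, computed via a batched linear solve.
    Yinv_x = np.linalg.solve(Y, X[..., None])[..., 0]
    quad = np.einsum("ftm,ftm->ft", X.conj(), Yinv_x).real
    # Circular complex Gaussian: log p = -M log(pi) - log det Y - x^H Y^{-1} x.
    return float(np.sum(-M * np.log(np.pi) - logdet - quad))


# Usage: one 2-channel bin, one source with PSD 2 and identity spatial covariance.
X = np.ones((1, 1, 2), dtype=complex)
psd = np.full((1, 1, 1), 2.0)
scm = np.eye(2, dtype=complex)[None, None]
print(full_rank_log_likelihood(X, psd, scm))
```

Using `slogdet` and `solve` rather than an explicit matrix inverse keeps the evaluation numerically stable when some mixture covariances are close to singular.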