Academic paper

A Robust and Flexible EM Algorithm for Mixtures of Elliptical Distributions with Missing Data
Document Type
Periodical
Source
IEEE Transactions on Signal Processing (IEEE Trans. Signal Process.), 71:1669-1682, 2023
Subject
Signal Processing and Analysis
Communication, Networking and Broadcast Technologies
Computing and Processing
Signal processing algorithms
Clustering algorithms
Finite element analysis
Estimation
Generators
Classification algorithms
Task analysis
Angular Gaussian distributions
EM algorithm
elliptical distributions
imputation
missing data
mixture models
Language
ISSN
1053-587X
1941-0476
Abstract
This article tackles the problem of missing data imputation for noisy and non-Gaussian data. A classical imputation method, the Expectation-Maximization (EM) algorithm for Gaussian mixture models, has shown interesting properties when compared to other popular approaches such as those based on $k$-nearest neighbors or on multiple imputation by chained equations. However, Gaussian mixture models are known to be non-robust to heterogeneous data, which can lead to poor estimation performance when the data are contaminated by outliers or follow non-Gaussian distributions. To overcome this issue, a new EM algorithm that can handle missing data is investigated for mixtures of elliptical distributions. This paper shows that this problem reduces to the estimation of a mixture of angular Gaussian distributions under generic assumptions (i.e., each sample is drawn from a mixture of elliptical distributions, which may differ from one sample to another). In that case, the complete-data likelihood associated with mixtures of elliptical distributions is well adapted to the EM framework with missing data thanks to its conditional distribution, which is shown to be a multivariate $t$-distribution. Experimental results on synthetic data demonstrate that the proposed algorithm is robust to outliers and can be used with non-Gaussian data. Furthermore, experiments conducted on real-world datasets show that this algorithm is very competitive when compared to other classical imputation methods.
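For orientation, the classical Gaussian baseline mentioned in the abstract imputes missing entries with their conditional expectation given the observed ones. The sketch below illustrates only that step for a single Gaussian component with known parameters. It is not the paper's algorithm, which works with mixtures of elliptical distributions and a multivariate $t$ conditional distribution. The function name `conditional_mean_impute` and the convention of marking missing entries with NaN are assumptions made for this illustration.

```python
import numpy as np


def conditional_mean_impute(x, mu, sigma):
    """Fill NaN entries of x with E[x_mis | x_obs] under N(mu, sigma).

    Hypothetical helper for illustration: one Gaussian component with
    known mean `mu` and covariance `sigma`, missing values marked as NaN.
    """
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)

    mis = np.isnan(x)   # mask of missing coordinates
    obs = ~mis          # mask of observed coordinates

    if not mis.any() or not obs.any():
        # Nothing to impute, or nothing observed: fall back to the mean.
        return np.where(mis, mu, x)

    s_oo = sigma[np.ix_(obs, obs)]  # covariance of the observed block
    s_mo = sigma[np.ix_(mis, obs)]  # cross-covariance missing/observed

    filled = x.copy()
    # Conditional mean: mu_m + S_mo S_oo^{-1} (x_o - mu_o)
    filled[mis] = mu[mis] + s_mo @ np.linalg.solve(s_oo, x[obs] - mu[obs])
    return filled


if __name__ == "__main__":
    mu = [0.0, 0.0]
    sigma = [[1.0, 0.5], [0.5, 1.0]]
    print(conditional_mean_impute([2.0, np.nan], mu, sigma))
```

In an EM setting, a step of this kind would be applied per mixture component and weighted by the component responsibilities, and the parameters would be re-estimated from the completed data at each iteration.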