Journal Article

A Flexible EM-Like Clustering Algorithm for Noisy Data
Document Type
Periodical
Source
IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(5):2709-2721, May 2024
Subject
Computing and Processing
Bioengineering
Clustering algorithms
Estimation
Covariance matrices
Data models
Shape
Gaussian distribution
Task analysis
Clustering
high-dimensional data
mixture models
robust estimation
semi-parametric model
Language
English
ISSN
0162-8828
2160-9292
1939-3539
Abstract
Though very popular, the Expectation-Maximisation (EM) algorithm for the Gaussian mixture model is well known to perform poorly for non-Gaussian distributions or in the presence of outliers or noise. In this paper, we propose a Flexible EM-like Clustering Algorithm (FEMCA): a new clustering algorithm designed around an EM procedure. It is based on the estimation of both cluster centers and covariances. In addition, using a semi-parametric paradigm, the method estimates an unknown scale parameter per data point. This allows the algorithm to accommodate heavier-tailed distributions, noise, and outliers without significant loss of efficiency in various classical scenarios. We first present the general underlying model for independent, but not necessarily identically distributed, samples of elliptical distributions. We then derive and analyze the proposed algorithm in this context, showing in particular important distribution-free properties of the underlying data distributions. The convergence and accuracy of the algorithm are first analyzed on synthetic data. Finally, we show that FEMCA outperforms other classical unsupervised methods from the literature, such as k-means, EM for Gaussian mixture models and its recent modifications, and spectral clustering, when applied to real data sets such as MNIST, NORB, and 20newsgroups.
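
Note: the abstract describes FEMCA only at a high level. The Python sketch below is an illustrative rendering of that general idea, an EM-like loop alternating cluster assignments with estimates of centers, covariances, and one scale per data point. The specific update rules (Mahalanobis-based per-point scales, 1/tau-weighted moments) and all names (femca_like, tau, etc.) are assumptions made for illustration, not the authors' actual FEMCA formulas or code.

import numpy as np

def femca_like(X, K, n_iter=50, rng=None, eps=1e-8):
    """Illustrative EM-like clustering with a per-point scale estimate.

    A sketch of the idea in the abstract (centers, covariances, and one
    unknown scale per data point), not the authors' FEMCA implementation.
    """
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    # Initialise: centers from random points, identity covariances, unit scales.
    mu = X[rng.choice(n, K, replace=False)]
    Sigma = np.stack([np.eye(d) for _ in range(K)])
    tau = np.ones(n)

    for _ in range(n_iter):
        # E-like step: squared Mahalanobis distance of every point to every cluster,
        # then hard assignment to the closest cluster.
        dist = np.empty((n, K))
        for k in range(K):
            diff = X - mu[k]
            inv = np.linalg.inv(Sigma[k] + eps * np.eye(d))
            dist[:, k] = np.einsum('ij,jk,ik->i', diff, inv, diff)
        z = dist.argmin(axis=1)

        # Per-point scale: Mahalanobis distance to the assigned cluster, per dimension.
        tau = np.maximum(dist[np.arange(n), z] / d, eps)

        # M-like step: 1/tau-weighted centers and scatter matrices, so that
        # outliers and heavy-tailed points are down-weighted.
        w = 1.0 / tau
        for k in range(K):
            idx = np.where(z == k)[0]
            if idx.size == 0:
                continue
            wk = w[idx][:, None]
            mu[k] = (wk * X[idx]).sum(axis=0) / wk.sum()
            diff = X[idx] - mu[k]
            Sigma[k] = (wk * diff).T @ diff / idx.size  # weighted scatter, normalised by cluster size
    return z, mu, Sigma, tau

On heavy-tailed or contaminated synthetic data, down-weighting points with large tau typically makes the estimated centers far less sensitive to outliers than plain k-means, which is the qualitative behaviour the abstract claims for FEMCA.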