Academic Paper

An efficient validity index method for datasets with complex-shaped clusters
Document Type
Conference
Source
2016 International Conference on Machine Learning and Cybernetics (ICMLC). 2:558-563 Jul, 2016
Subject
Computing and Processing
Robotics and Control Systems
Signal Processing and Analysis
Dispersion
Indexes
Shape
Clustering algorithms
Partitioning algorithms
Cybernetics
Estimation
Cluster validity index
GKFCM
Dispersion measure
Overlap measure
Concave shape
Language
ISSN
2160-1348
Abstract
This paper presents VDOGK, a validity index that is a variation of the VDO index, for estimating the optimal number of clusters in datasets with concave or elongated clusters. The new index partitions the dataset with Gustafson-Kessel FCM (GKFCM), which reduces the sensitivity of FCM to geometric cluster shape. It is based on both a dispersion measure and an overlap measure: the dispersion measure estimates the overall cluster compactness, and the overlap measure estimates the total degree of ambiguity of data belonging to any pair of clusters in the dataset. A good clustering result is expected to have small values of both measures. Examples of synthetic datasets comprising concave, elongated, spherical, and/or elliptical clusters are presented. Experimental results on synthetic datasets and real datasets from the UCI Machine Learning Repository show that the proposed VDOGK correctly estimated the number of clusters for all nine tested datasets, whereas VDO was correct only for three real datasets.
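
As a rough illustration of the partitioning step the abstract describes, the following is a minimal Python/NumPy sketch of Gustafson-Kessel fuzzy c-means. It is not the authors' implementation: the function names `gk_norm_matrix` and `gk_fcm`, the fuzzifier m = 2, unit cluster volumes, the random initialization, and the small regularization term are all assumptions made for the example. The VDOGK dispersion and overlap formulas are not given in this record, so they are not reproduced here.

```python
import numpy as np


def gk_norm_matrix(F, eps=1e-9):
    """Norm-inducing matrix A = det(F)^(1/d) * F^-1 for one cluster.

    F is the fuzzy covariance matrix. Cluster volume is fixed to 1
    (an assumption), so det(A) == 1 and only the cluster shape adapts.
    """
    d = F.shape[0]
    F = F + eps * np.eye(d)  # small ridge (assumed) to keep F invertible
    return np.linalg.det(F) ** (1.0 / d) * np.linalg.inv(F)


def gk_fcm(X, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Gustafson-Kessel fuzzy c-means sketch.

    X: (n, d) data, c: number of clusters, m: fuzzifier (> 1).
    Returns memberships U of shape (c, n) and centers V of shape (c, d).
    """
    n, d = X.shape
    rng = np.random.default_rng(seed)
    U = rng.random((c, n))
    U /= U.sum(axis=0)  # each point's memberships sum to 1

    for _ in range(iters):
        W = U ** m
        V = (W @ X) / W.sum(axis=1, keepdims=True)  # fuzzy-weighted centers

        D2 = np.empty((c, n))
        for i in range(c):
            diff = X - V[i]
            # Fuzzy covariance of cluster i
            F = (W[i][:, None] * diff).T @ diff / W[i].sum()
            A = gk_norm_matrix(F)
            # Squared distance (x - v)^T A (x - v) for every point
            D2[i] = np.einsum("nd,de,ne->n", diff, A, diff)

        D2 = np.fmax(D2, 1e-12)  # avoid division by zero
        inv = D2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0)  # standard FCM membership update

        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break

    return U, V
```

In a model-selection loop, one would run such a partitioner for each candidate number of clusters and evaluate a validity index on the resulting memberships; per the abstract, VDOGK prefers partitions where both the dispersion and overlap measures are small.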