Academic Paper

Combination Evaluation Method of Fuzzy C-Mean Clustering Validity Based on Hybrid Weighted Strategy
Document Type
Periodical
Source
IEEE Access. 9:27239-27261, 2021
Subject
Aerospace
Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Engineered Materials, Dielectrics and Plasmas
Engineering Profession
Fields, Waves and Electromagnetics
General Topics for Engineers
Geoscience
Nuclear Engineering
Photonics and Electrooptics
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Transportation
Clustering algorithms
Indexes
Partitioning algorithms
Information entropy
Entropy
Classification algorithms
Fuzzy C-means clustering algorithm
clustering validity function
weighted combination
expert weighting
information entropy weighting
Language
ISSN
2169-3536
Abstract
A clustering validity function is an index used to judge the accuracy of clustering results. At present, most studies on clustering validity rely on a single clustering validity function. Research shows that no clustering validity function can handle every data set and always outperform the other indexes. Therefore, a hybrid weighted combination evaluation method based on fuzzy C-means (FCM) clustering validity functions is proposed. The weighting method combines expert weighting with information entropy weighting, reducing the influence of subjective factors in expert weighting and compensating for the weakness of information entropy weighting in judging the value of each clustering validity function. Four methods of combining clustering validity functions are studied: linear, exponential, logarithmic, and proportional. Finally, the proposed fuzzy clustering validity evaluation method is verified by experiments on artificial data sets and UCI data sets. The experimental results show that the proposed method overcomes the shortcomings of a single clustering validity function and determines the optimal number of clusters more accurately for different data sets.
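The hybrid weighting and the linear combination described in the abstract can be illustrated with a minimal sketch. It is not the paper's formulation: the abstract does not give the formulas, so the code assumes the standard entropy weight method, a convex mix of expert and entropy weights controlled by a hypothetical parameter `alpha`, and index values already normalised to [0, 1] with larger meaning better. All function names are illustrative.

```python
import math


def entropy_weights(scores):
    """Entropy weight method (assumed standard form, not taken from the paper).

    scores: one row per candidate cluster number, one column per validity
    index, normalised to [0, 1] with larger values meaning better.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two candidate cluster numbers")
    m = len(scores[0])
    divergences = []
    for j in range(m):
        column = [row[j] for row in scores]
        total = sum(column)
        if total == 0:
            # An all-zero column carries no information.
            divergences.append(0.0)
            continue
        entropy = 0.0
        for value in column:
            p = value / total
            if p > 0:
                entropy -= p * math.log(p)
        entropy /= math.log(n)  # scale entropy to [0, 1]
        divergences.append(1.0 - entropy)
    norm = sum(divergences)
    if norm == 0:
        # Every index is equally uninformative: fall back to equal weights.
        return [1.0 / m] * m
    return [d / norm for d in divergences]


def hybrid_weights(expert, entropy, alpha=0.5):
    """Convex mix of expert and entropy weights (alpha is an assumption)."""
    mixed = [alpha * e + (1 - alpha) * h for e, h in zip(expert, entropy)]
    total = sum(mixed)
    return [w / total for w in mixed]


def linear_combination(scores, weights):
    """Weighted sum of the validity indexes for each candidate cluster number."""
    return [sum(w * x for w, x in zip(weights, row)) for row in scores]


if __name__ == "__main__":
    # Made-up normalised index values for candidate cluster numbers 2..5.
    candidates = [2, 3, 4, 5]
    scores = [
        [0.0, 0.2, 0.1],
        [1.0, 0.9, 0.4],
        [0.3, 1.0, 1.0],
        [0.1, 0.0, 0.0],
    ]
    expert = [0.5, 0.3, 0.2]  # hypothetical expert weights
    weights = hybrid_weights(expert, entropy_weights(scores))
    combined = linear_combination(scores, weights)
    best = candidates[combined.index(max(combined))]
    print("hybrid weights:", weights)
    print("suggested cluster number:", best)
```

Under these assumptions, an index whose values barely change across candidate cluster numbers receives a small entropy weight, while the expert weights keep a prior judgment of each index in play. The exponential, logarithmic, and proportional combinations named in the abstract would replace `linear_combination`; they are not sketched here because the abstract does not define them.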