Academic Paper

Unveiling Subpopulations and Patterns within the Pima Indians Diabetes Dataset through FCM Clustering Analysis
Document Type
Conference
Source
2023 2nd International Conference on Automation, Computing and Renewable Systems (ICACRS), pp. 984-988, Dec. 2023
Subject
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Renewable energy sources
Clustering algorithms
Data visualization
Medical services
Prediction algorithms
Diabetes
Glucose
Fuzzy
Membership function
Clustering
Fuzzy C-Means Clustering
Abstract
This study applies Fuzzy C-Means (FCM) clustering to the Pima Indians Diabetes dataset, which is widely used in medical data analysis to predict diabetes outcomes. FCM is a soft (fuzzy) clustering algorithm designed to identify inherent patterns and structure in complex datasets. The primary aim was to uncover potential groupings within the dataset based on key features associated with diabetes risk factors. The dataset was loaded from a local file and preprocessed so that all values were numeric: non-numeric values were converted to NaN (Not a Number), and rows containing missing values were excluded from the analysis. This step prevented errors caused by non-numeric data during clustering. The FCM algorithm was then applied. Rather than assigning each data point to a single cluster, FCM gives every point a degree of membership in each cluster. The number of clusters and the fuzziness parameter were specified, and stopping criteria were set to control convergence. The clustering results were visualized in a scatter plot in which each point represents one record and is color-coded by its cluster membership, with cluster centers marked as red X symbols. Applying FCM to the Pima Indians Diabetes dataset showed that it can reveal latent structure in medical data. Identifying subpopulations within this dataset could help tailor diabetes interventions and treatments to specific patient profiles.
While FCM clustering provides insight into data patterns, further analysis and domain expertise are needed to establish the clinical relevance of the identified clusters for diabetes diagnosis and management.
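The abstract mentions membership degrees, a fuzziness parameter, and stopping criteria without stating their form or values. For reference, the standard FCM formulation (Bezdek) is given below. The symbols are generic assumptions, not the paper's settings: $N$ data points $x_i$, $c$ clusters with centers $v_j$, membership degrees $u_{ij}$, fuzziness exponent $m > 1$, and tolerance $\varepsilon$. FCM minimizes the weighted within-cluster error by alternating the two updates until memberships stop changing.

```latex
J_m(U, V) = \sum_{i=1}^{N} \sum_{j=1}^{c} u_{ij}^{m} \,\lVert x_i - v_j \rVert^{2},
\qquad \sum_{j=1}^{c} u_{ij} = 1,\quad u_{ij} \in [0, 1]

v_j = \frac{\sum_{i=1}^{N} u_{ij}^{m}\, x_i}{\sum_{i=1}^{N} u_{ij}^{m}}

u_{ij} = \left[ \sum_{k=1}^{c} \left( \frac{\lVert x_i - v_j \rVert}{\lVert x_i - v_k \rVert} \right)^{\frac{2}{m-1}} \right]^{-1}

\text{stop when } \max_{i,j} \bigl| u_{ij}^{(t+1)} - u_{ij}^{(t)} \bigr| < \varepsilon
\text{ or } t \text{ reaches a maximum iteration count}
```

As $m \to 1$ the memberships approach hard 0/1 assignments. Larger $m$ spreads membership more evenly across clusters.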
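A minimal sketch of the pipeline the abstract describes, written in Python with NumPy: coerce values to numbers (non-numeric becomes NaN), drop incomplete rows, then run FCM. Several details are assumptions. The function names `coerce_numeric` and `fuzzy_c_means` are hypothetical. The paper does not state its cluster count, fuzziness value, tolerance, iteration cap, or initialization, so the defaults below (2 clusters, m = 2.0, tol = 1e-5, 300 iterations, random memberships) are illustrative only. The toy rows are made-up values, not Pima records. With pandas, `pd.to_numeric(column, errors="coerce")` followed by `DataFrame.dropna()` performs the same preprocessing.

```python
import numpy as np


def coerce_numeric(rows):
    """Convert raw records to floats. Non-numeric entries become NaN, and
    any row containing a NaN is dropped (the preprocessing in the abstract)."""
    def to_float(value):
        try:
            return float(value)
        except (TypeError, ValueError):
            return np.nan

    data = np.array([[to_float(v) for v in row] for row in rows], dtype=float)
    return data[~np.isnan(data).any(axis=1)]


def fuzzy_c_means(X, n_clusters=2, m=2.0, tol=1e-5, max_iter=300, seed=0):
    """Standard FCM: alternate center and membership updates until the largest
    membership change falls below `tol` or `max_iter` is reached.
    Default parameter values are illustrative assumptions, not the paper's."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Assumed initialization: random memberships, each row normalized to sum to 1.
    U = rng.random((n, n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        Um = U ** m
        # Centers are membership-weighted means of the data, shape (c, d).
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # Distance from every point to every center, shape (n, c).
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, np.finfo(float).eps)  # guard against division by zero
        # u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1))
        ratio = (dist[:, :, None] / dist[:, None, :]) ** (2.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=2)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


if __name__ == "__main__":
    # Hypothetical toy records (not Pima data). "n/a" and "" become NaN and are dropped.
    rows = [["1.0", "2.0"], ["1.2", "1.9"], ["8.0", "8.1"],
            ["7.9", "8.3"], ["n/a", "3.0"], ["0.9", ""]]
    X = coerce_numeric(rows)
    centers, U = fuzzy_c_means(X, n_clusters=2)
    labels = U.argmax(axis=1)  # hard labels, e.g. for color-coding a scatter plot
    print(labels)
    print(centers)
```

Taking `argmax` over each membership row yields the single label used for color-coding. To draw a plot like the one in the abstract, one could pass `labels` as the `c=` argument of matplotlib's `scatter` for two chosen features, then plot `centers` with `marker="x"` and `color="red"`.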