Academic Paper

Breast Cancer Prediction using Feature Selection and Ensemble Voting
Document Type
Conference
Source
2019 International Conference on System Science and Engineering (ICSSE), pp. 250-254, Jul. 2019
Subject
Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Engineered Materials, Dielectrics and Plasmas
Engineering Profession
General Topics for Engineers
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Breast cancer
Analytical models
STEM
Principal component analysis
Training
Support vector machines
Data models
Breast Cancer
Ensemble Voting Classification
SVM
Random Forest
Perceptron
Logistic Regression
KNN
Stochastic Gradient Descent
XGBoost
Extremely Randomised Trees
AdaBoost
ISSN
2325-0925
Abstract
Breast cancer is the most common cancer among women worldwide. This paper analyses the performance of supervised and unsupervised models for breast cancer classification, using data from the Wisconsin Breast Cancer Dataset. The raw data contains 569 cases of breast cancer and is split into training and testing sets in a 70:30 ratio. A benchmark model is first created using the Random Forest method. Feature selection is then performed through feature scaling and principal component analysis (PCA), after which various models are trained and tested on the transformed data. Cross-validation shows that our model is stable. Of all the evaluated models, only four (the Ensemble Voting Classifier, Logistic Regression, SVM with tuning, and AdaBoost) achieve an accuracy of at least 98%. Based on precision and recall, ROC-AUC, F1-measure and computational time, the Ensemble Voting Classifier shows the most potential for breast cancer classification on the given dataset, and the final results indicate that the Ensemble Voting approach is ideal as a predictive model for breast cancer.
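The abstract names the pipeline stages (70:30 split, a Random Forest benchmark, scaling and PCA, a voting ensemble, cross-validation, and precision/recall, F1, ROC-AUC and timing) but not their settings. The following is a minimal sketch of such a pipeline, assuming scikit-learn and its bundled copy of the Wisconsin Diagnostic Breast Cancer data (569 samples, matching the case count above). The PCA dimension (10), the ensemble members (logistic regression, an RBF SVM and AdaBoost), soft voting, 10-fold cross-validation, treating malignant as the positive class, `random_state=42`, and the helper names `evaluate` and `run` are all illustrative assumptions, not the authors' configuration.

```python
# Illustrative sketch only: every setting below is an assumption, not the paper's.
import time

from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score, roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def evaluate(model, X_train, X_test, y_train, y_test):
    """Hypothetical helper: fit one model and report the metrics named in the abstract."""
    start = time.perf_counter()
    model.fit(X_train, y_train)
    elapsed = time.perf_counter() - start  # training time as a rough cost measure
    y_pred = model.predict(X_test)
    y_score = model.predict_proba(X_test)[:, 1]  # probability of the positive class
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
        "roc_auc": roc_auc_score(y_test, y_score),
        "fit_seconds": elapsed,
    }


def run(random_state=42, n_components=10):
    """Hypothetical driver: benchmark RF vs. a scaled-PCA soft-voting ensemble."""
    X, y = load_breast_cancer(return_X_y=True)  # 569 samples, 30 features
    y = 1 - y  # scikit-learn encodes malignant as 0; flip so malignant is positive

    # 70:30 train/test split, stratified so both sets keep the class ratio.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=random_state
    )

    def scaled_pca(clf):
        # Scaler and PCA live inside the pipeline, so they are fitted on training data only.
        return make_pipeline(StandardScaler(), PCA(n_components=n_components), clf)

    # Benchmark: Random Forest on the raw (unscaled, untransformed) features.
    benchmark = RandomForestClassifier(n_estimators=100, random_state=random_state)

    # Ensemble: soft voting averages the members' predicted probabilities.
    ensemble = scaled_pca(
        VotingClassifier(
            estimators=[
                ("lr", LogisticRegression(max_iter=1000)),
                ("svm", SVC(probability=True, random_state=random_state)),
                ("ada", AdaBoostClassifier(random_state=random_state)),
            ],
            voting="soft",
        )
    )

    results = {
        "benchmark_rf": evaluate(benchmark, X_train, X_test, y_train, y_test),
        "voting": evaluate(ensemble, X_train, X_test, y_train, y_test),
    }

    # 10-fold (stratified) cross-validation on the training set as a stability check.
    cv_scores = cross_val_score(ensemble, X_train, y_train, cv=10)
    results["voting_cv_mean"] = cv_scores.mean()
    results["voting_cv_std"] = cv_scores.std()
    results["split_sizes"] = (len(y_train), len(y_test))
    return results
```

Calling `run()` returns a dictionary of per-model metrics; a small `voting_cv_std` relative to `voting_cv_mean` is one simple way to read "stable" in the cross-validation sense. Soft voting was chosen here because ROC-AUC needs a continuous score, which scikit-learn's hard-voting classifier does not expose through `predict_proba`.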