Academic Paper

Combining PCA and SMOTE for Software Defect Prediction with Visual Analytics Approach
Document Type
Conference
Source
2022 10th International Conference on Cyber and IT Service Management (CITSM), pp. 1-7, Sep. 2022
Subject
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Engineering Profession
General Topics for Engineers
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Support vector machines
Analytical models
Visual analytics
Software algorithms
NASA
Buildings
Predictive models
Software Defect Prediction
PCA
Imbalance Handling
ML
Visual Analytics
ISSN
2770-159X
Abstract
Software defect prediction (SDP) enables efficient allocation of time and resources toward improving software quality, so research on improving the performance of SDP models remains active. However, SDP datasets often have a large number of attributes and an imbalance between defective and non-defective samples, both of which reduce classification performance. In this study, we propose combining principal component analysis (PCA) with the synthetic minority oversampling technique (SMOTE) to produce better-performing models. We also propose a visual analytics approach that represents the resulting models to support understanding and analysis in future modeling. Support vector machine (SVM), random forest (RF), naive Bayes (NB), and neural network (NN) classifiers, each with tuned hyperparameters, are evaluated on five NASA datasets using Recall, AUC, and G-Mean. The proposed models are then compared with PCA-only models (without SMOTE) to determine whether performance improves. Visual analytics covering every stage of model building were built after modeling; they give users confidence in the resulting models and help them understand and gain insights from them. The findings indicate that the proposed method outperforms PCA alone by an average of 60%, 47%, and 16% in Recall, AUC, and G-Mean, respectively. SMOTE is shown to mitigate the effect of class imbalance, increasing the G-Mean score for every model, and NN achieves the best average score among the proposed models.
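The abstract names SMOTE as the imbalance-handling step but gives no implementation details. What follows is a minimal, dependency-free Python sketch of SMOTE-style oversampling. It picks a random minority sample, selects one of its k nearest minority-class neighbours, and interpolates a synthetic point on the segment between them. This is an illustration under our own assumptions, not the authors' code. The function names `smote` and `euclidean`, the `seed` parameter, and the default `k=5` are hypothetical choices.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_synthetic, k=5, seed=None):
    """Generate n_synthetic minority-class samples by SMOTE-style interpolation.

    Hypothetical helper for illustration: each synthetic point lies on the
    segment between a randomly chosen minority sample and one of its k nearest
    minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than other points

    # Precompute the k nearest minority neighbours of every minority sample.
    neighbours = []
    for i, base in enumerate(minority):
        others = [m for j, m in enumerate(minority) if j != i]
        others.sort(key=lambda m: euclidean(base, m))
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        nn = rng.choice(neighbours[i])
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nn)])
    return synthetic


if __name__ == "__main__":
    minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    majority_count = 7
    # Oversample until both classes have the same number of samples.
    new_points = smote(minority, n_synthetic=majority_count - len(minority), k=2, seed=42)
    print(len(new_points))  # → 4
```

To fully balance a dataset, `n_synthetic` would typically be the majority count minus the minority count. In an evaluation pipeline, oversampling is normally applied to the training folds only, so synthetic points do not leak into the test data.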
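Recall, AUC, and G-Mean are the evaluation criteria named in the abstract. Below is a minimal standard-library Python sketch of how these metrics are commonly computed for a binary labelling, with defective as 1 and non-defective as 0. The helper names are hypothetical and nothing here is taken from the paper. AUC is computed with the pairwise (Mann-Whitney) formulation rather than by integrating an ROC curve. The demo shows why G-Mean suits imbalanced SDP data: a model that never flags a defect can reach high accuracy while scoring zero Recall and zero G-Mean.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fn, tn, fp) for a binary labelling."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, tn, fp


def recall(y_true, y_pred, positive=1):
    """Sensitivity: share of truly defective modules that are flagged."""
    tp, fn, _, _ = confusion_counts(y_true, y_pred, positive)
    return tp / (tp + fn) if tp + fn else 0.0


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity (recall) and specificity."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


def auc(y_true, scores, positive=1):
    """Probability that a random defective module outscores a random clean one.

    Ties count as half a win (Mann-Whitney formulation of ROC AUC).
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs both classes present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # One defective module among ten; a model that always predicts "clean".
    y_true = [1] + [0] * 9
    y_pred = [0] * 10
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    print(accuracy, recall(y_true, y_pred), g_mean(y_true, y_pred))  # → 0.9 0.0 0.0
    print(auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # → 0.75
```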