Academic Paper

The Effectiveness of Resampling Method for Handling Class Imbalances in Software Defect Prediction
Document Type
Conference
Source
2023 International Conference on Information Technology Research and Innovation (ICITRI), pp. 22-27, Aug. 2023
Subject
Computing and Processing
Robotics and Control Systems
Signal Processing and Analysis
Logistic regression
Technological innovation
Predictive models
Software
Computer crashes
Software reliability
Classification algorithms
Resampling Technique
Class Imbalance
Software Defect Prediction
Language
Abstract
Defect prediction is crucial for building high-quality, reliable software products. Class imbalance, in which one class greatly outnumbers the other, poses a significant challenge to defect prediction models. The imbalance often biases a model toward the majority class, so it performs poorly at identifying defects, which belong to the minority class. Resampling changes the distribution of the dataset by undersampling the majority class, oversampling the minority class, or combining the two. This study aims to develop a robust and accurate model that overcomes the limitations of class imbalance and improves overall defect prediction performance. Logistic regression, a widely used classification algorithm, offers interpretability and flexibility, making it suitable for defect prediction. This research investigates the effectiveness of resampling in conjunction with logistic regression for handling class imbalance in software defect prediction. Accuracy and AUC results, compared with a t-test across 12 MDP datasets, show that Logistic Regression+Sample (Bootstrapping) performs much better than Logistic Regression alone, with an average accuracy of 90.78% and an average AUC of 0.81.
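The abstract describes resampling only in general terms, so the following is a minimal sketch of two of the strategies it names, not the authors' procedure. It assumes a binary label list; the function name `resample`, its `strategy` parameter, and the toy data are hypothetical. The "over" branch draws minority samples with replacement, one common reading of bootstrapping, which may differ from the "Sample (Bootstrapping)" step used in the paper.

```python
import random
from collections import Counter


def resample(samples, labels, strategy="over", seed=0):
    """Balance a binary dataset by random resampling (illustrative only).

    strategy="over":  add minority samples drawn with replacement until
                      the minority class matches the majority class size.
    strategy="under": keep a random subset of the majority class, drawn
                      without replacement, equal in size to the minority class.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    if len(counts) != 2:
        raise ValueError("expected exactly two classes")
    (majority, n_major), (minority, n_minor) = counts.most_common()
    major_idx = [i for i, y in enumerate(labels) if y == majority]
    minor_idx = [i for i, y in enumerate(labels) if y == minority]

    if strategy == "over":
        # Bootstrap-style draw: the same minority sample may repeat.
        extra = rng.choices(minor_idx, k=n_major - n_minor)
        keep = major_idx + minor_idx + extra
    elif strategy == "under":
        keep = rng.sample(major_idx, n_minor) + minor_idx
    else:
        raise ValueError("strategy must be 'over' or 'under'")

    rng.shuffle(keep)
    return [samples[i] for i in keep], [labels[i] for i in keep]


# Toy data: 8 non-defective modules (label 0) and 2 defective ones (label 1).
X = [[i, i * 2] for i in range(10)]
y = [0] * 8 + [1] * 2

X_over, y_over = resample(X, y, "over")
X_under, y_under = resample(X, y, "under")
print(Counter(y_over))   # both classes have 8 samples
print(Counter(y_under))  # both classes have 2 samples
```

In practice, resampling of this kind is usually applied to the training split only, so that accuracy and AUC are measured on data that keeps its original class distribution.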