Academic Paper

An Innovative Software Bug Prediction System using Random Forest Algorithm for Enhanced Accuracy in Comparison with Logistic Regression Algorithm
Document Type
Conference
Source
2023 Intelligent Computing and Control for Engineering and Business Systems (ICCEBS), pp. 1-6, Dec. 2023
Subject
Bioengineering
Computing and Processing
Engineering Profession
General Topics for Engineers
Robotics and Control Systems
Signal Processing and Analysis
Logistic regression
Software algorithms
Computer bugs
Prediction algorithms
Software
Security
Random forests
Novel Random Forest
Decision Tree
Bug prediction
Machine learning
Software bugs
Security threats
Language
Abstract
This research compares the accuracy of the novel Random Forest (RF) and Logistic Regression (LR) methods for predicting software bugs, and thereby helping to avoid security threats, based on a variety of software metrics. The research used a dataset taken from Kaggle, which comprised several software metrics such as lines of code, code churn, and code complexity. Two machine learning techniques were studied: Logistic Regression and the novel Random Forest. In the first group, 10 samples were trained using Logistic Regression, while in the second group, 10 samples were trained using the novel Random Forest approach. Both models were implemented with the Python scikit-learn package. Each model was trained on the training data, and its accuracy was then evaluated on the testing data. Based on a statistical power (G-power) of 80%, a significance level (alpha) of 0.05, and a type II error rate (beta) of 0.2, a sample size of 10 per group was judged to be appropriate. The findings indicate that Random Forest achieved a higher accuracy of 78.59%, compared with 76.54% for Logistic Regression. In addition, a 95% confidence interval was calculated, and the results were statistically significant, with a value of 0.000 (p
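
The comparison described in the abstract can be illustrated with a minimal scikit-learn sketch. It is not the authors' code: the Kaggle dataset is not identified here, so synthetic data from make_classification stands in for it, and the function name compare_models, the 70/30 split, the feature counts, and the hyperparameters are all assumptions chosen for illustration. The sketch trains both classifiers on 10 different train/test splits (mirroring the 10 samples per group) and reports mean test accuracy for each.

```python
from statistics import mean

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def compare_models(n_runs=10, n_samples=500, seed=0):
    """Return per-run test accuracies for LR and RF (hypothetical helper)."""
    # Assumption: synthetic binary-labelled data stands in for the
    # software-metrics dataset (buggy vs. not buggy) used in the paper.
    X, y = make_classification(
        n_samples=n_samples,
        n_features=10,
        n_informative=5,
        random_state=seed,
    )

    lr_scores, rf_scores = [], []
    for run in range(n_runs):
        # A different split per run gives one accuracy sample per group.
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.3, random_state=run, stratify=y
        )

        lr = LogisticRegression(max_iter=1000)
        rf = RandomForestClassifier(n_estimators=50, random_state=run)

        # Train on the training data, then score on the held-out test data.
        lr.fit(X_train, y_train)
        rf.fit(X_train, y_train)
        lr_scores.append(accuracy_score(y_test, lr.predict(X_test)))
        rf_scores.append(accuracy_score(y_test, rf.predict(X_test)))

    return lr_scores, rf_scores


lr_scores, rf_scores = compare_models()
print(f"Logistic Regression mean accuracy: {mean(lr_scores):.4f}")
print(f"Random Forest mean accuracy:       {mean(rf_scores):.4f}")
```

The accuracies printed by this sketch depend on the synthetic data and will not match the 78.59% and 76.54% reported in the abstract. The two lists of per-run accuracies are the kind of input a two-group significance test would take.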