Academic Paper

Using machine learning methods to identify significant variables for the prediction of first-year Informatics Engineering students dropout
Document Type
Conference
Source
2020 39th International Conference of the Chilean Computer Science Society (SCCC), pp. 1-5, Nov. 2020
Subject
Computing and Processing
Engineering Profession
Training
Forestry
Predictive models
Decision trees
Feeds
Informatics
Random forests
first-year student dropout
decision trees
random forest
Language
Abstract
Student dropout is a phenomenon that affects all higher education institutions in Chile, with costs for people, institutions and the State. The reported retention rate of first-year students across all Chilean universities was 75%. Despite extensive research and the implementation of various models to identify dropout causes and risk groups, few of these studies have been carried out in the Chilean higher education context. Our work attempts to identify, using machine learning methods, the variables with the highest predictive value for student dropout by the end of the first year of study, within a 6-year Informatics Engineering programme with a rather high dropout rate of 21.9% reported in 2018. To that end, we use the data of 4 cohorts of students (2012-2016) enrolled in the programme to feed a random forest feature selection process. We then build a decision tree using the identified relevant features, which we test using data from the 2017-2018 cohorts of students. Although the decision tree is over-fitted (97.21% training accuracy against 81.01% test accuracy), the process sheds light on the nature of the variables that determine whether or not a student remains at the end of their first year of study at the University. Six of the identified factors are academic, and the remaining one is socio-cultural.
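
The two-stage process described in the abstract (random forest feature ranking followed by a decision tree on the selected features) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn, uses synthetic data in place of the student records, and replaces the cohort-based train/test split with a random split. The helper name `select_and_fit`, the hyperparameters, and the 20 candidate variables are all hypothetical. Only the count of seven selected features (six academic plus one socio-cultural) comes from the abstract.

```python
# Illustrative sketch only: synthetic data stands in for the student records,
# which are not available in this record.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def select_and_fit(X_train, y_train, n_features=7, seed=0):
    """Rank features with a random forest, then fit a decision tree on the top ones."""
    forest = RandomForestClassifier(n_estimators=200, random_state=seed)
    forest.fit(X_train, y_train)
    # Indices of the n_features most important features, highest importance first.
    top = np.argsort(forest.feature_importances_)[::-1][:n_features]
    # An unpruned tree (assumed here) tends to memorise the training set.
    tree = DecisionTreeClassifier(random_state=seed)
    tree.fit(X_train[:, top], y_train)
    return top, tree


# Hypothetical data: 600 "students", 20 candidate variables, about 22% dropouts.
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=7,
    weights=[0.78, 0.22], random_state=0,
)
# Random split as a stand-in; the paper instead trains and tests on different cohorts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0,
)

top, tree = select_and_fit(X_train, y_train)
train_acc = tree.score(X_train[:, top], y_train)
test_acc = tree.score(X_test[:, top], y_test)
print(f"selected feature indices: {sorted(top.tolist())}")
print(f"train accuracy: {train_acc:.4f}, test accuracy: {test_acc:.4f}")
```

A gap between training and test accuracy in such a sketch is the same symptom of over-fitting that the abstract reports; limiting tree depth or pruning would be one common way to narrow it.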