
Application of Neighborhood Components Analysis to Process and Survey Data to Predict Student Learning of Statistics
2022 International Conference on Advanced Learning Technologies (ICALT), pp. 147-151, Jul 2022
Communication, Networking and Broadcast Technologies
Computing and Processing
Linear regression
Machine learning
Feature extraction
Market research
Predictive analytics
predictive analytics
machine learning
neighborhood components analysis
Advanced Placement
Machine learning methods for predictive analytics have great potential for uncovering trends in educational data. However, simple linear models still appear to be the most widely used, in part because of their interpretability. This study addresses the limited interpretability of complex machine learning classifiers by performing feature extraction with neighborhood components analysis (NCA). Our dataset comprises 287 features drawn from process data indicators (i.e., derived from the log data of an online statistics learning platform) and self-report data from high school students enrolled in Advanced Placement (AP) Statistics (N=733). As the prediction label, we use students’ scores on the AP Statistics exam. We evaluate the performance of machine learning classifiers under each feature extraction method using evaluation criteria that include F1 scores, the area under the receiver operating characteristic curve (AUC), and Cohen’s kappa. We find that NCA effectively reduces the dimensionality of the training data, stabilizes machine learning predictions, and produces interpretable scores. However, interpreting the NCA feature weights, while feasible, is less straightforward than interpreting linear regression coefficients. Future research should consider developing guidelines for interpreting NCA weights.
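For readers unfamiliar with the three evaluation criteria named in the abstract, the following is a minimal sketch of how F1, AUC, and Cohen's kappa can be computed for a binary label. It is illustrative only: the abstract does not say how the AP exam scores were encoded or how the metrics were implemented, so the binary pass/fail labels, the 0.5 decision threshold, the toy data, and the function names here are all assumptions.

```python
from collections import Counter


def f1_score(y_true, y_pred, positive=1):
    """F1 = 2*TP / (2*TP + FP + FN) for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true, scores, positive=1):
    """AUC as the probability that a random positive outscores a random
    negative (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cohen_kappa(y_true, y_pred):
    """Kappa = (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(y_true)
    observed = sum(1 for t, p in zip(y_true, y_pred) if t == p) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / n ** 2
    if expected == 1.0:
        return 0.0  # degenerate single-class case; a convention chosen for this sketch
    return (observed - expected) / (1 - expected)


# Hypothetical toy data: 1 = "passed the exam", 0 = "did not pass".
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]   # made-up classifier scores
y_pred = [1 if s >= 0.5 else 0 for s in scores]      # assumed 0.5 threshold

print(f1_score(y_true, y_pred))    # 0.75
print(roc_auc(y_true, scores))     # 0.9375
print(cohen_kappa(y_true, y_pred)) # 0.5
```

F1 and kappa depend on the thresholded predictions, whereas AUC is computed from the raw scores, which is one reason the three criteria can rank classifiers differently.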