Academic Article

BELLATREX: Building Explanations Through a LocaLly AccuraTe Rule EXtractor
Document Type
Periodical
Source
IEEE Access, vol. 11, pp. 41348-41367, 2023
Subject
Aerospace
Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Engineered Materials, Dielectrics and Plasmas
Engineering Profession
Fields, Waves and Electromagnetics
General Topics for Engineers
Geoscience
Nuclear Engineering
Photonics and Electrooptics
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Transportation
Radio frequency
Random forests
Task analysis
Forestry
Predictive models
Solid modeling
Regression tree analysis
Explainable AI
interpretable ML
multi-label classification
multi-target regression
random forest
random survival forest
Language
English
ISSN
2169-3536
Abstract
Random forests are machine learning methods characterised by high performance and robustness to overfitting. However, since multiple learners are combined, they are not as interpretable as a single decision tree. In this work we propose a novel method, Building Explanations through a LocaLly AccuraTe Rule EXtractor (Bellatrex), which explains the forest prediction for a given test instance with only a few diverse rules. Starting from the decision trees generated by a random forest, our method: 1) pre-selects a subset of the rules used to make the prediction; 2) creates a vector representation of such rules; 3) projects them to a low-dimensional space; 4) clusters such representations and picks a rule from each cluster to explain the instance prediction. We test the effectiveness of Bellatrex on 89 real-world datasets and demonstrate the validity of our method for binary classification, regression, multi-label classification and time-to-event tasks. To the best of our knowledge, this is the first interpretability toolbox that can handle all these tasks within the same framework. We also show that Bellatrex is able to approximate the performance of the corresponding ensemble model in all considered tasks, and it does so while selecting at most three rules from the whole forest. Finally, a comparison with similar methods in the literature shows that our proposed approach substantially outperforms other explainable toolboxes in terms of predictive performance.
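The four steps in the abstract can be sketched with scikit-learn. This is a simplified illustration under stated assumptions, not the authors' implementation: the pre-selection criterion (trees closest to the forest prediction), the rule vectorisation (split counts per feature along the root-to-leaf path), and the cluster count are all hypothetical choices made here for concreteness.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
x_test = X[:1]  # the instance to explain

# 1) Pre-select a subset of rules: here, the 20 trees whose individual
#    prediction is closest to the forest's prediction (assumed criterion).
forest_pred = forest.predict_proba(x_test)[0, 1]
trees = sorted(forest.estimators_,
               key=lambda t: abs(t.predict_proba(x_test)[0, 1] - forest_pred))[:20]

# 2) Represent each rule (the root-to-leaf path followed by x_test) as a
#    vector: the number of splits on each feature along the path.
def rule_vector(tree, x):
    path = tree.decision_path(x).indices   # node ids visited by x
    feats = tree.tree_.feature[path]       # feature tested at each node
    vec = np.zeros(x.shape[1])
    for f in feats:
        if f >= 0:                         # negative values mark leaf nodes
            vec[f] += 1
    return vec

R = np.array([rule_vector(t, x_test) for t in trees])

# 3) Project the rule vectors to a low-dimensional space.
Z = PCA(n_components=2, random_state=0).fit_transform(R)

# 4) Cluster the projections and pick one representative rule per cluster:
#    the rule closest to each cluster centre.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(Z)
chosen = [int(np.argmin(np.linalg.norm(Z - c, axis=1)))
          for c in km.cluster_centers_]
explanation_trees = [trees[i] for i in chosen]
print(f"{len(explanation_trees)} rules selected to explain the prediction")
```

With three clusters, at most three rules are returned, matching the abstract's claim that the whole forest can be summarised by a handful of diverse, locally accurate rules.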