Academic Paper

Explainable Fault Analysis in Mobile Networks: A SHAP-Based Supervised Clustering Approach
Document Type
Conference
Source
2023 16th International Conference on Signal Processing and Communication System (ICSPCS), pp. 1-9, Sep. 2023
Subject
Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Fields, Waves and Electromagnetics
Signal Processing and Analysis
Fault diagnosis
Degradation
Root cause analysis
Fault detection
Key performance indicator
Signal processing algorithms
Signal processing
Mobile Networks
Boosting Models
SHAP
Supervised Clustering
Abstract
The advent of the Fifth Generation (5G) of cellular technology has made human-centric monitoring of mobile networks obsolete. The increasing demand for, and complexity of, the monitoring services handled by Network Operations Center (NOC) engineers have forced Mobile Network Operators (MNOs) to shift their focus towards automated solutions for network fault detection and diagnosis. To address this need, numerous Root Cause Analysis (RCA) systems based on Machine Learning (ML) have been developed. However, these systems often cannot explain the results they present, since ML models typically behave as black boxes. This paper aims to overcome this lack of explainability by presenting a supervised clustering methodology based on the SHapley Additive exPlanations (SHAP) method. The work is divided into a fault detection phase for the User Downlink (DL) Average Throughput Key Performance Indicator (KPI) using Boosting models, and a subsequent diagnosis stage using the Tree-SHAP method and a clustering algorithm. Analyzing the resulting clusters made it possible to identify the different root causes of the faults in the User DL Average Throughput KPI. Specifically, 13.23% of the faults occurred due to radio condition problems, 37.48% occurred in areas with extremely low network usage where the performance degradation of a specific user group affects the average site performance, 26.37% were caused by low network capacity problems, and the remaining 22.76% experienced severe mobility issues in addition to capacity problems. Finally, the paper outlines mitigation strategies for each of the identified clusters of faults.
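The idea behind the diagnosis stage, grouping faults by their per-feature attributions and not by raw KPI values, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not taken from the paper: the feature names, thresholds and baseline are hypothetical, `fault_score` is a hand-written stand-in for a trained Boosting model, Shapley values are computed by brute-force enumeration where the paper uses Tree-SHAP, and a simple attribution signature replaces the clustering algorithm, which the abstract does not name.

```python
from itertools import combinations
from math import factorial

# Hypothetical KPI features; the names are illustrative, not from the paper.
FEATURES = ["cqi", "prb_utilization", "handover_failure_rate"]

# Assumed reference values standing in for a healthy cell.
BASELINE = {"cqi": 10.0, "prb_utilization": 0.4, "handover_failure_rate": 0.01}


def fault_score(x):
    """Toy stand-in for a trained boosting model: higher means more fault-like."""
    score = 0.0
    if x["cqi"] < 7.0:
        score += 0.5  # poor radio conditions
    if x["prb_utilization"] > 0.8:
        score += 0.4  # capacity pressure
        if x["handover_failure_rate"] > 0.05:
            score += 0.3  # mobility issues on top of capacity pressure
    return score


def shapley_values(model, x, baseline, features):
    """Exact Shapley values by enumerating every coalition of the other features."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                # Features outside the coalition are held at their baseline value.
                with_f = {g: x[g] if (g in subset or g == f) else baseline[g]
                          for g in features}
                without_f = {g: x[g] if g in subset else baseline[g]
                             for g in features}
                total += weight * (model(with_f) - model(without_f))
        phi[f] = total
    return phi


def attribution_signature(phi, threshold=0.05):
    """Crude stand-in for clustering: the features with a non-negligible attribution."""
    return tuple(f for f in phi if phi[f] > threshold)


# Three hypothetical faulty samples.
faults = [
    {"cqi": 4.0, "prb_utilization": 0.3, "handover_failure_rate": 0.01},
    {"cqi": 10.0, "prb_utilization": 0.9, "handover_failure_rate": 0.01},
    {"cqi": 10.0, "prb_utilization": 0.9, "handover_failure_rate": 0.10},
]

# Group sample indices by which features drive the fault score.
clusters = {}
for i, x in enumerate(faults):
    phi = shapley_values(fault_score, x, BASELINE, FEATURES)
    clusters.setdefault(attribution_signature(phi), []).append(i)
```

Grouping in attribution space keeps faults with the same drivers together even when their raw KPI values differ. Brute-force enumeration grows exponentially with the number of features, so it is only practical for toy cases like this one; tree-specific methods such as Tree-SHAP exist to avoid that cost on real models.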