Academic Paper

Risk-driven Online Testing and Test Case Diversity Analysis for ML-enabled Critical Systems
Document Type
Conference
Source
2023 IEEE 34th International Symposium on Software Reliability Engineering (ISSRE), pp. 344-354, Oct. 2023
Subject
Aerospace
Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Engineering Profession
General Topics for Engineers
Nuclear Engineering
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Transportation
Measurement
Metaheuristics
Collaboration
Clustering algorithms
Diversity methods
Search problems
Hazards
Search-based testing
ML-enabled systems
Risk
Diversity analysis
Simulation
Language
ISSN
2332-6549
Abstract
Machine Learning (ML)-enabled systems that run in safety-critical settings expose humans to risks. Hence, it is important to build such systems with strong assurances for domain-specific safety requirements. Simulation and metaheuristic optimizing search have proven to be valuable tools for online testing of ML-enabled systems and for early detection of hazards. However, the efficient generation of effective test cases remains a challenging issue. In particular, the testing process should produce as many failures as possible but also unveil diverse sets of failure scenarios. To study this phenomenon, we introduce a risk-driven test case generation and diversity analysis method tailored to ML-enabled systems. Our approach uses an online testing technique based on metaheuristic optimizing search to falsify domain-specific safety requirements. All test cases leading to hazards are then analyzed to assess their diversity by using clustering and interpretable ML. We evaluated our approach in a collaborative robotics case study, showing that generating tests guided by risk metrics is an effective strategy. Furthermore, we compare alternative optimizing search algorithms and rank them based on the overall diversity of the test cases, ultimately showing that selecting the testing strategy based only on the number of failures may be misleading.
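The two-stage idea in the abstract (search for test cases that violate a safety requirement, then assess how diverse the failing cases are) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the toy `risk` function stands in for a simulator, hill climbing stands in for the metaheuristic search algorithms compared in the paper, and greedy leader clustering stands in for the clustering and interpretable ML analysis. All names, bounds, and thresholds are hypothetical.

```python
import math
import random

# Hypothetical toy "simulator": a test case is (robot_speed, human_distance).
# Assumed risk model: risk grows with speed and shrinks with distance.
def risk(test_case):
    speed, distance = test_case
    return speed / (1.0 + distance)

RISK_THRESHOLD = 1.0  # assumed: risk above this value counts as a hazard

def random_test_case(rng):
    # Assumed input bounds: speed in [0, 2], distance in [0, 3].
    return (rng.uniform(0.0, 2.0), rng.uniform(0.0, 3.0))

def mutate(test_case, rng, step=0.2):
    # Perturb both parameters and clamp them to the assumed bounds.
    speed, distance = test_case
    speed = min(2.0, max(0.0, speed + rng.uniform(-step, step)))
    distance = min(3.0, max(0.0, distance + rng.uniform(-step, step)))
    return (speed, distance)

def hill_climb(rng, budget=200):
    """Risk-driven search: accept mutations that do not lower risk,
    and record every evaluated candidate that exceeds the threshold."""
    current = random_test_case(rng)
    failures = []
    for _ in range(budget):
        candidate = mutate(current, rng)
        if risk(candidate) >= risk(current):
            current = candidate
        if risk(candidate) > RISK_THRESHOLD:
            failures.append(candidate)
    return failures

def count_clusters(points, radius=0.5):
    """Greedy leader clustering: a point starts a new cluster when it is
    farther than `radius` from every existing cluster leader."""
    leaders = []
    for p in points:
        if all(math.dist(p, q) > radius for q in leaders):
            leaders.append(p)
    return len(leaders)

if __name__ == "__main__":
    rng = random.Random(0)
    failures = hill_climb(rng)
    print("failures:", len(failures))
    print("failure clusters:", count_clusters(failures))
```

Reporting the cluster count alongside the raw failure count reflects the abstract's closing point: a search strategy can produce many failures that all fall into one small region of the input space, so the number of failures alone may not indicate how many distinct failure scenarios were found.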