Academic Journal Article

Fault Injection Analytics: A Novel Approach to Discover Failure Modes in Cloud-Computing Systems
Document Type
Periodical
Source
IEEE Transactions on Dependable and Secure Computing (IEEE Trans. Dependable and Secure Comput.), 19(3):1476-1491, Jun. 2022
Subject
Computing and Processing
Anomaly detection
Cloud computing
Machine learning
Instruments
Training
Hardware
Software
Fault injection
failure mode analysis
cloud computing
OpenStack
unsupervised machine learning
Language
ISSN
1545-5971
1941-0018
2160-9209
Abstract
Cloud computing systems fail in complex and unexpected ways due to unexpected combinations of events and interactions between hardware and software components. Fault injection is an effective means to bring out these failures in a controlled environment. However, fault injection experiments produce massive amounts of data, and manually analyzing these data is inefficient and error-prone, as the analyst can miss severe failure modes that are yet unknown. This article introduces a new paradigm (fault injection analytics) that applies unsupervised machine learning on execution traces of the injected system, to ease the discovery and interpretation of failure modes. We evaluated the proposed approach in the context of fault injection experiments on the OpenStack cloud computing platform, where we show that the approach can accurately identify failure modes with a low computational cost.
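As a rough illustration of the general idea of grouping execution traces without labels, the sketch below clusters traces by the similarity of their event counts. It is not the method from the article: it assumes each trace is a plain list of event names, uses cosine distance over event-count vectors, and applies a simple greedy "leader" clustering with a hypothetical threshold of 0.3. The function names and the event names in the usage example are invented for illustration.

```python
import math
from collections import Counter


def trace_vector(trace, vocabulary):
    """Turn a trace (list of event names) into an event-count vector."""
    counts = Counter(trace)
    return [counts[event] for event in vocabulary]


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two count vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Two empty traces are identical; an empty and a non-empty one are not.
        return 0.0 if norm_a == norm_b else 1.0
    return 1.0 - dot / (norm_a * norm_b)


def cluster_traces(traces, threshold=0.3):
    """Greedy leader clustering (an assumption of this sketch).

    Each trace joins the first cluster whose first member (the "leader")
    is within `threshold` cosine distance; otherwise it starts a new cluster.
    Returns a list of clusters, each a list of trace indices.
    """
    vocabulary = sorted({event for trace in traces for event in trace})
    vectors = [trace_vector(trace, vocabulary) for trace in traces]
    clusters = []
    for index, vector in enumerate(vectors):
        for cluster in clusters:
            leader = vectors[cluster[0]]
            if cosine_distance(vector, leader) <= threshold:
                cluster.append(index)
                break
        else:
            clusters.append([index])
    return clusters


if __name__ == "__main__":
    # Hypothetical traces from five fault injection experiments.
    example_traces = [
        ["api.create", "db.write", "net.attach", "ok"],
        ["api.create", "db.write", "net.attach", "ok"],
        ["api.create", "db.error", "db.error", "rollback"],
        ["api.create", "db.error", "rollback"],
        ["api.create", "timeout", "timeout", "timeout"],
    ]
    for cluster in cluster_traces(example_traces):
        print(cluster)
```

In this toy setting each resulting cluster would be a candidate failure mode for an analyst to inspect. Real traces would likely need richer features (ordering, timing, call relationships) and a more robust clustering algorithm.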