Academic Paper

Trace-based Multi-Dimensional Root Cause Localization of Performance Issues in Microservice Systems
Document Type
Conference
Source
2024 IEEE/ACM 46th International Conference on Software Engineering (ICSE), pp. 1347-1358, Apr. 2024
Subject
Computing and Processing
Location awareness
Runtime environment
Accuracy
Microservice architectures
Benchmark testing
Data mining
Spectral analysis
Microservice
Root Cause Analysis
Tracing
Language
ISSN
1558-1225
Abstract
Modern microservice systems have become increasingly complicated due to dynamic and complex interactions and runtime environments. This makes them vulnerable to performance issues arising from a variety of causes, such as the runtime environment, communication, coordination, or the implementation of services. Traces record the detailed execution process of a request through the system and have been widely used to diagnose performance issues in microservice systems. By identifying the execution processes and attribute value combinations that are common in anomalous traces but rare in normal traces, engineers may narrow the root cause of a performance issue down to a smaller scope. However, because of the complex structure of traces and the large number of attribute combinations, finding the root cause in the huge search space is challenging. In this paper, we propose TraceContrast, a trace-based multidimensional root cause localization approach. TraceContrast uses a sequence representation to describe the complex structure of a trace together with the attributes of each span. Based on this representation, it combines contrast sequential pattern mining and spectrum analysis to localize multidimensional root causes efficiently. Experimental studies on a widely used microservice benchmark show that TraceContrast outperforms existing approaches in both multidimensional and instance-dimensional root cause localization, with significant accuracy advantages. Moreover, TraceContrast is efficient, and its efficiency can be further improved by parallel execution.
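The abstract only outlines how contrast sequential pattern mining and spectrum analysis are combined; the exact trace representation, mining thresholds, and scoring formula are not given in this record. As an illustration only, the sketch below assumes each trace is flattened into an ordered list of span items (hypothetical strings such as `cart:add@n2` joining service, operation, and host). It enumerates sequential patterns of length one and two, keeps those that are frequent in anomalous traces but rare in normal ones, and ranks them with the Ochiai formula, a common spectrum-based fault localization score used here as a stand-in. `rank_patterns`, `min_anom`, and `max_norm` are hypothetical names, not TraceContrast's API.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

# Assumption: a trace is an ordered list of span items; each item is a
# hypothetical "service:operation@host" string combining span attributes.
Trace = list[str]


def patterns_of(trace: Trace) -> set[tuple[str, ...]]:
    """Sequential patterns of length 1 and 2 (ordered, not necessarily contiguous)."""
    pats: set[tuple[str, ...]] = {(item,) for item in trace}
    # combinations() keeps input order, so (a, b) means a occurs before b.
    pats |= set(combinations(trace, 2))
    return pats


def ochiai(ef: int, nf: int, ep: int) -> float:
    """Ochiai suspiciousness: ef = anomalous traces with the pattern,
    nf = anomalous traces without it, ep = normal traces with it."""
    denom = sqrt((ef + nf) * (ef + ep))
    return ef / denom if denom else 0.0


def rank_patterns(anomalous: list[Trace], normal: list[Trace],
                  min_anom: float = 0.5, max_norm: float = 0.2,
                  top_k: int = 5) -> list[tuple[tuple[str, ...], float]]:
    """Keep contrast patterns (frequent in anomalous, rare in normal traces)
    and rank them by Ochiai score."""
    fail_counts: Counter = Counter()
    pass_counts: Counter = Counter()
    for t in anomalous:
        fail_counts.update(patterns_of(t))  # each trace counted once per pattern
    for t in normal:
        pass_counts.update(patterns_of(t))

    n_fail, n_pass = len(anomalous), max(len(normal), 1)
    scores = {}
    for pat, ef in fail_counts.items():
        ep = pass_counts[pat]
        # Contrast filter: support in anomalous traces high, in normal traces low.
        if ef / n_fail >= min_anom and ep / n_pass <= max_norm:
            scores[pat] = ochiai(ef, n_fail - ef, ep)
    # Highest score first; shorter (more general) patterns break ties.
    return sorted(scores.items(), key=lambda kv: (-kv[1], len(kv[0]), kv[0]))[:top_k]


anomalous = [
    ["frontend:get@n1", "cart:add@n2", "db:query@n2"],
    ["frontend:get@n1", "cart:add@n2", "db:query@n3"],
]
normal = [
    ["frontend:get@n1", "cart:add@n1", "db:query@n3"],
    ["frontend:get@n1", "cart:add@n3", "db:query@n3"],
]

for pattern, score in rank_patterns(anomalous, normal):
    print(f"{score:.3f}  {' -> '.join(pattern)}")
```

In this toy data `cart:add@n2` ranks first because it appears in every anomalous trace and in no normal trace, while `frontend:get@n1`, present in all traces, is removed by the contrast filter. Longer patterns would need a proper sequential pattern miner (a PrefixSpan-style search, for example); the sketch enumerates pairs exhaustively only to stay short.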