Academic Paper

Demystifying deep learning in predictive monitoring for cloud-native SLOs

Document Type

Conference

Author

Morichetta, Andrea; Pusztai, Thomas; Vij, Deepak; Pujol, Victor Casamayor; Raith, Philipp; Xiong, Ying; Nastic, Stefan; Dustdar, Schahram; Zhang, Zhaobo

Source

2023 IEEE 16th International Conference on Cloud Computing (CLOUD), pp. 1-11, Jul. 2023

Subject

Computing and Processing
Measurement
Cloud computing
Analytical models
Computational modeling
Neural networks
Predictive models
Transformers
workload prediction
neural networks
cloud
LSTM

Language

ISSN

2159-6190

Abstract

The complexity inherent in managing cloud computing systems calls for novel solutions that can effectively enforce high-level Service Level Objectives (SLOs) promptly. Unfortunately, most of the current SLO management solutions rely on reactive approaches, i.e., correcting SLO violations only after they have occurred. Further, the few methods that explore predictive techniques to prevent SLO violations focus solely on forecasting low-level system metrics, such as CPU and memory utilization. Although valid in some cases, these metrics do not necessarily provide clear and actionable insights into application behavior. This paper presents a novel approach that directly predicts high-level SLOs using low-level system metrics. We target this goal by training and optimizing two state-of-the-art neural network models: a Long Short-Term Memory (LSTM) model and a Transformer-based model. Our models provide actionable insights into application behavior by establishing proper connections between the evolution of low-level workload-related metrics and the high-level SLOs. We demonstrate our approach to selecting and preparing the data. We show in practice how to optimize the LSTM and the Transformer by targeting efficiency as a high-level SLO metric and performing a comparative analysis. We show how these models behave when the input workloads come from different distributions. Consequently, we demonstrate their ability to generalize in heterogeneous systems. Finally, we operationalize our two models by integrating them into the Polaris framework we have been developing to enable a performance-driven SLO-native approach to Cloud computing.
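The abstract's core idea, predicting a future high-level SLO value from a recent history of low-level metrics, is commonly framed as a sliding-window supervised learning problem. The sketch below illustrates only that framing and is not the authors' data pipeline: the function name `make_windows`, the `window` and `horizon` parameters, and the sample values are all hypothetical.

```python
from typing import List, Sequence, Tuple

Window = List[List[float]]


def make_windows(
    metrics: Sequence[Sequence[float]],
    slo: Sequence[float],
    window: int,
    horizon: int,
) -> List[Tuple[Window, float]]:
    """Pair each run of `window` metric rows with the SLO value `horizon` steps later.

    Hypothetical helper: `metrics[t]` holds the low-level readings at time t
    (e.g. CPU and memory utilization) and `slo[t]` the high-level SLO value
    (e.g. efficiency) at the same time step.
    """
    if len(metrics) != len(slo):
        raise ValueError("metrics and slo must have the same length")
    if window < 1 or horizon < 1:
        raise ValueError("window and horizon must be positive")

    samples: List[Tuple[Window, float]] = []
    # The last usable window must leave `horizon` steps of SLO data after it.
    last_start = len(metrics) - window - horizon
    for start in range(last_start + 1):
        end = start + window  # exclusive end of the input window
        x = [list(row) for row in metrics[start:end]]
        y = slo[end + horizon - 1]  # target lies `horizon` steps past the window
        samples.append((x, y))
    return samples


# Illustrative values only: six time steps of (cpu, memory) and an SLO series.
example_metrics = [(1, 10), (2, 20), (3, 30), (4, 40), (5, 50), (6, 60)]
example_slo = [100, 101, 102, 103, 104, 105]
pairs = make_windows(example_metrics, example_slo, window=3, horizon=2)
```

Each resulting pair could then be fed to a sequence model such as an LSTM or a Transformer, with the window as input and the later SLO value as the regression target.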
