Academic Paper

Prediction of Cloud API Performance Using Uncertainty-Based Fusion of Predictive and Analytical Modeling
Document Type
Conference
Source
2023 IEEE International Conference on High Performance Computing & Communications, Data Science & Systems, Smart City & Dependability in Sensor, Cloud & Big Data Systems & Application (HPCC/DSS/SmartCity/DependSys), pp. 515-522, Dec. 2023
Subject
Communication, Networking and Broadcast Technologies
Computing and Processing
Training
Analytical models
Uncertainty
Distributed databases
Machine learning
Predictive models
Data models
performance modeling
hybrid models
predictive uncertainty
analytical models
machine learning models
Language
Abstract
Performance prediction for cloud applications is important for several reasons, including meeting service level agreements, optimizing cost, and provisioning resources. However, accurate prediction is challenging because of characteristics of performance data such as noise, which can arise from measurement errors or application-specific behavior, and imbalance (i.e., regions of missing data), which can arise from varying user access patterns. Existing prediction methods, whether analytical, machine learning, or hybrid, assume uniformly distributed performance data, and they show high uncertainty and low accuracy when interpolating or generalizing into regions that contain no reference data. In this paper, we introduce two novel approaches, Uncertainty-based Sub-modeling and Uncertainty-based Iterative Sampling, that combine analytical and machine learning approaches based on the uncertainty of predictions (i.e., predictive uncertainty). We focus on improving predictions in under-represented regions by using the analytical model based on an inference threshold, and by incorporating data generated by analytical models into machine learning model training. Our approaches outperform their baseline models in 8 and 12 out of 15 cases with respect to RMSE and MAE, respectively. Further analysis shows that our approaches excel on imbalanced data in performance modeling, outperforming previously proposed machine learning and hybrid models.
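The two ideas named in the abstract can be illustrated with a minimal Python sketch. It is one possible reading of the abstract, not the authors' implementation. Everything concrete in it is an assumption made for illustration: the M/M/1-style `analytical_model` with a service rate of 100 requests/s, the bootstrap ensemble of least-squares lines standing in for the machine learning model, the use of ensemble standard deviation as predictive uncertainty, and all function names and parameters.

```python
import random
import statistics


def analytical_model(load):
    """Hypothetical analytical model: M/M/1-style response time in seconds.

    Assumes a service rate of 100 requests/s; valid only for load < 100.
    """
    service_rate = 100.0
    return 1.0 / (service_rate - load)


def fit_line(points):
    """Ordinary least squares for y = a + b*x over (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx if sxx > 0 else 0.0  # degenerate resample: flat line
    return mean_y - slope * mean_x, slope


def fit_ensemble(points, n_models=30, seed=0):
    """Stand-in ML model: lines fitted on bootstrap resamples of the data."""
    rng = random.Random(seed)
    return [fit_line([rng.choice(points) for _ in points])
            for _ in range(n_models)]


def predict_with_uncertainty(ensemble, x):
    """Return (mean prediction, standard deviation across the ensemble)."""
    preds = [a + b * x for a, b in ensemble]
    return statistics.mean(preds), statistics.stdev(preds)


def sub_model_predict(ensemble, x, threshold):
    """Sub-modeling sketch: use the analytical model when the ML
    prediction's uncertainty exceeds the inference threshold."""
    mean, std = predict_with_uncertainty(ensemble, x)
    if std > threshold:
        return analytical_model(x), "analytical"
    return mean, "ml"


def iterative_sampling(points, candidates, threshold, rounds=5, seed=0):
    """Iterative sampling sketch: repeatedly add an analytically generated
    point at the most uncertain candidate input, then refit."""
    points = list(points)
    for _ in range(rounds):
        ensemble = fit_ensemble(points, seed=seed)
        worst = max(candidates,
                    key=lambda x: predict_with_uncertainty(ensemble, x)[1])
        worst_std = predict_with_uncertainty(ensemble, worst)[1]
        if worst_std <= threshold:
            break  # every candidate is already below the threshold
        points.append((worst, analytical_model(worst)))
    return points
```

As a usage illustration, measurements that cover only low loads (for example, `[(10.0, 0.0112), (20.0, 0.0124), (30.0, 0.0144), (40.0, 0.0165)]`) can be passed to `fit_ensemble`, and `sub_model_predict` can then be queried at a high load such as 90. Whether the analytical branch is taken depends entirely on the chosen threshold, which this sketch leaves as a free parameter.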