Scholarly Article

On Performance Modeling and Prediction for Spark-HBase Applications in Big Data Systems

Document Type

Conference

Author

Source

ICC 2022 - IEEE International Conference on Communications, pp. 3685-3690, May 2022

Subject

Communication, Networking and Broadcast Technologies
Machine learning algorithms
Machine learning
Predictive models
Big Data
Parallel processing
Prediction algorithms
Data models
Spark
HBase
big data
machine learning
representation learning
performance modeling and prediction

Language

ISSN

1938-1883

Abstract

Many large-scale applications in various business and scientific domains require both parallel computing and distributed data management for big data processing. One typical scenario is the use of the Spark computing engine to process a large amount of data managed by HBase in Hadoop. Such computing workflows provide an opportunity to optimize application performance through strategic resource allocation with suitable parameter settings. This requires accurate modeling and prediction of application performance, so that optimal system configurations can be effectively recommended to end users. However, this is a challenging problem for multiple reasons, mainly the large parameter space and the dynamic interactions between the different technology layers of big data systems. In this paper, we propose a class of regression-based machine learning models to predict the execution performance of Spark-HBase applications in Hadoop. We first explore and identify an exhaustive set of system parameters across multiple layers, including Spark and HBase, and then conduct an in-depth exploratory analysis of their effects on the execution time of Spark-HBase applications. Based on these analysis results, we design a performance predictor using regression-based machine learning algorithms. Experimental results show that the resulting predictor achieves high accuracy across the different algorithms compared. The proposed approach can facilitate automatic system configuration and has the potential to be applied to other similar systems for big data processing.
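The abstract describes predicting the execution time of Spark-HBase applications from system parameters with regression models, but it does not list the parameters or the specific algorithms. As a minimal sketch under stated assumptions, the Python below fits an ordinary least-squares regressor (one simple member of the regression family, not necessarily one the authors used) and scores it with mean absolute percentage error (MAPE), an assumed accuracy metric. The helpers `solve`, `fit_linear`, `predict`, and `mape` are hypothetical, and so are the three features (executor memory in GB, total executor cores, an HBase block-cache fraction) and the noise-free runtime rule that generates the toy data; they were chosen only so the fit can be checked and are not the paper's parameters, data, or results.

```python
"""Toy regression-based execution-time predictor (an illustrative sketch, not the paper's method).

Assumption: each run is described by numeric configuration parameters and the
target is its measured execution time in seconds.
"""
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_linear(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares via the normal equations; returns [intercept, w_1, ..., w_d]."""
    rows = [[1.0] + list(x) for x in X]  # prepend a constant column for the intercept
    d = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(d)]
    return solve(xtx, xty)


def predict(w: Sequence[float], x: Sequence[float]) -> float:
    """Predicted execution time for one configuration x."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error (in percent)."""
    return 100.0 * sum(abs(t - p) / t for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical, noise-free toy data: (executor memory in GB, total executor cores,
# HBase block-cache fraction). The runtime rule is invented purely for illustration.
configs = [(m, c, f) for m in (2, 4, 8) for c in (4, 8, 16) for f in (0.2, 0.4)]
runtime = [200.0 - 3.0 * m - 2.0 * c - 50.0 * f for m, c, f in configs]

# Hold out every third configuration to measure prediction accuracy on unseen settings.
train = [i for i in range(len(configs)) if i % 3 != 0]
test = [i for i in range(len(configs)) if i % 3 == 0]

w = fit_linear([configs[i] for i in train], [runtime[i] for i in train])
preds = [predict(w, configs[i]) for i in test]
error = mape([runtime[i] for i in test], preds)

print("learned weights:", [round(v, 3) for v in w])
print(f"held-out MAPE: {error:.4f}%")
```

Because the toy runtimes are exactly linear in the features, the fit recovers the generating weights and the held-out error is essentially zero. With measured runs, execution time may respond nonlinearly to resource settings, so one would likely compare several regressors (for example, tree ensembles) on the same held-out split rather than rely on a single linear model.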

Online Access

Full Text (IEEE)
