Academic Paper

QHB+: Accelerated Configuration Optimization for Automated Performance Tuning of Spark SQL Applications
Document Type
Periodical
Source
IEEE Access. 12:60138-60148 2024
Subject
Aerospace
Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Engineered Materials, Dielectrics and Plasmas
Engineering Profession
Fields, Waves and Electromagnetics
General Topics for Engineers
Geoscience
Nuclear Engineering
Photonics and Electrooptics
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Transportation
Sparks
Optimization methods
Machine learning
Tuning
Bayes methods
Upper bound
Big Data
Configuration management
Structured Query Language
Big data
configuration optimization
spark SQL
hyperband
Language
ISSN
2169-3536
Abstract
Apache Spark is a well-known solution for big data processing because of its efficiency and speed. One of its modules, Spark SQL, serves as a prominent big data query engine. However, executing Spark SQL applications on massive data can be time-intensive, and the execution time can vary significantly depending on the configuration. Recent studies try to reduce application execution times by searching for optimal configurations. While Bayesian optimization is recognized in recent studies as a powerful method for configuration optimization, it suffers from high computational cost and long search times, especially when dealing with large search spaces. To address these challenges, we propose QHB+, which is designed to search for optimal configurations rapidly. QHB+ applies Successive Halving Algorithm-based optimization methods, which perform well in hyperparameter optimization of machine learning models, to the configuration optimization of Spark SQL applications. Through empirical evaluations against established benchmarks, we show the efficiency of QHB+, highlighting it as a swift alternative to conventional optimization methods for optimizing Spark SQL configurations.
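The abstract names the Successive Halving Algorithm as the basis of QHB+ without describing it. The following is a minimal sketch of generic successive halving, not the QHB+ method itself. It assumes a lower-is-better cost and a budget that multiplies by eta each round. The function names `successive_halving` and `toy_cost`, the candidate list, and the made-up optimum of 200 partitions are all hypothetical; only the key `spark.sql.shuffle.partitions` is a real Spark SQL setting.

```python
def successive_halving(configs, evaluate, min_budget=1, eta=3):
    """Return the lowest-cost config after repeated elimination rounds."""
    survivors = list(configs)
    budget = min_budget
    while len(survivors) > 1:
        # Score every surviving config at the current budget (lower is better).
        scored = sorted(survivors, key=lambda c: evaluate(c, budget))
        # Keep the best 1/eta fraction, but always at least one config.
        keep = max(1, len(scored) // eta)
        survivors = scored[:keep]
        # Survivors get eta times more budget in the next round.
        budget *= eta
    return survivors[0]


def toy_cost(config, budget):
    """Hypothetical stand-in for a measured Spark SQL execution time.

    Assumption: cost grows with the distance from a made-up optimum of
    200 shuffle partitions, scaled by the budget (e.g. data volume).
    """
    return abs(config["spark.sql.shuffle.partitions"] - 200) * budget


# Nine hypothetical candidate configurations.
candidates = [
    {"spark.sql.shuffle.partitions": p}
    for p in (50, 100, 200, 400, 800, 1600, 3200, 6400, 12800)
]

best = successive_halving(candidates, toy_cost)
print(best)
```

In a real tuning loop, `evaluate` would run the Spark SQL application (or a cheaper proxy, such as a subset of queries or data) under the given configuration and return the measured time. Poor configurations are discarded after cheap evaluations, so most of the budget goes to the promising ones.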