Academic Article

Toward Network-Aware Query Execution Systems in Large Datacenters
Document Type
Periodical
Source
IEEE Transactions on Network and Service Management (IEEE Trans. Netw. Serv. Manage.), 20(4):4494-4504, Dec. 2023
Subject
Communication, Networking and Broadcast Technologies
Computing and Processing
Distributed databases
Processor scheduling
Optimization
Telecommunication traffic
Costs
Task analysis
Schedules
Query data operator
coflow scheduling
network communication
performance optimizations
datacenters
Language
ISSN
1932-4537
2373-7379
Abstract
Efficiently processing concurrent data tasks, such as online analytical queries, in datacenter environments remains a major challenge for current computing techniques. One fundamental reason is that executing these tasks normally involves large numbers of distributed data operators, which are always expensive in terms of communication time. To improve overall performance, various advanced approaches to optimizing the execution of data operators have been proposed in recent years. However, most of them focus on application-level optimization, such as using data-locality scheduling to reduce network traffic. Moreover, few of them have considered the optimization opportunities in the concurrent execution of multiple data operators. In this paper, we propose a novel coflow-based scheduling system called CoFlop, which aims to reduce network communication time for multiple distributed operators at the query level and, on that basis, to lay a solid foundation for the development of a network-aware query execution system in datacenter networks. We present the detailed system design of CoFlop and conduct a simulation-based evaluation with large concurrent distributed join operations. The experimental results show that CoFlop can perform better than existing methods under different large workloads.