Academic Paper

Optimal Rack-Coordinated Updates in Erasure-Coded Data Centers: Design and Analysis
Document Type
Periodical
Source
IEEE Transactions on Computers (IEEE Trans. Comput.), 72(7):1871-1885, Jul. 2023
Subject
Computing and Processing
Encoding
Data centers
Codes
Reliability
Bandwidth
Computer architecture
Redundancy
Cross-rack update traffic
erasure codes
rack-coordinated updates
Language
ISSN
0018-9340
1557-9956
2326-3814
Abstract
Erasure coding has been extensively deployed in today's data centers to tackle prevalent failures, yet it is prone to substantial cross-rack traffic for parity updates. In this article, we propose a new rack-coordinated update mechanism to suppress the cross-rack update traffic. It comprises two successive phases: a delta-collecting phase that collects data delta chunks, and a selective parity update phase that renews the parity chunks based on the update pattern and parity layout. We further design RackCU, an optimal rack-coordinated update solution that achieves the theoretical lower bound of the cross-rack update traffic. We also perform a reliability analysis, demonstrating that RackCU attains a lower data loss probability by shortening the update procedure. We conduct extensive evaluations through large-scale simulation and real-world data center experiments, and show that RackCU reduces cross-rack update traffic by 16.5-77.1% and hence improves update throughput by 24.9-772.0%.
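
The abstract refers to data delta chunks and to renewing parity chunks from them. The Python sketch below illustrates that general idea only; it is not the RackCU algorithm, whose details are not given in this record. It assumes a single XOR parity chunk (real deployments typically use Reed-Solomon codes with several parities), and every name in it (`xor_bytes`, `encode_parity`, `data_delta`, `apply_delta`, `cross_rack_chunks`) is hypothetical. The `cross_rack_chunks` helper shows one plausible reading of why a "selective" choice can matter for a linear code: a rack can either forward each data delta as-is or first combine them into one delta per remote parity chunk, and it would ship whichever is fewer.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length chunks."""
    if len(a) != len(b):
        raise ValueError("chunks must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def encode_parity(data_chunks: list[bytes]) -> bytes:
    """Single XOR parity over all data chunks (assumed toy code, not Reed-Solomon)."""
    parity = bytes(len(data_chunks[0]))  # all-zero chunk
    for chunk in data_chunks:
        parity = xor_bytes(parity, chunk)
    return parity


def data_delta(old_chunk: bytes, new_chunk: bytes) -> bytes:
    """Data delta chunk: the difference between old and new contents."""
    return xor_bytes(old_chunk, new_chunk)


def apply_delta(parity: bytes, delta: bytes) -> bytes:
    """Renew a parity chunk from a delta, without re-reading the other data chunks."""
    return xor_bytes(parity, delta)


def cross_rack_chunks(updated_in_rack: int, parities_in_remote_rack: int) -> int:
    """Hypothetical traffic count for one (data rack, parity rack) pair.

    Assumption: the data rack either ships every data delta (updated_in_rack chunks)
    or aggregates them locally into one delta per remote parity chunk
    (parities_in_remote_rack chunks), and picks the cheaper option.
    """
    return min(updated_in_rack, parities_in_remote_rack)


# Usage: update one chunk and patch the parity from the delta alone.
data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = encode_parity(data)

new_chunk = b"\x0a\x0b"
delta = data_delta(data[1], new_chunk)
patched_parity = apply_delta(parity, delta)
data[1] = new_chunk

# The patched parity matches a full re-encode of the updated stripe.
assert patched_parity == encode_parity(data)
```

Under these assumptions, the delta is the only chunk that has to travel to the node holding the parity, which is why the number and placement of deltas and parities determine the cross-rack update traffic.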