Academic paper

Is it Time to Revisit Erasure Coding in Data-Intensive Clusters?
Document Type
Conference
Source
2019 IEEE 27th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS), pp. 165-178, Oct. 2019
Subject
Communication, Networking and Broadcast Technologies
Computing and Processing
Encoding
Layout
Task analysis
Performance evaluation
Data analysis
Sparks
Decoding
Erasure codes, Hadoop, MapReduce, Experimental evaluation, Data-intensive clusters
Language
English
ISSN
2375-0227
Abstract
Data-intensive clusters rely heavily on distributed storage systems to accommodate the unprecedented growth of data. The Hadoop Distributed File System (HDFS) is the primary storage for data analytics frameworks such as Spark and Hadoop. Traditionally, HDFS operates under replication to ensure data availability and to allow locality-aware task execution of data-intensive applications. Recently, erasure coding (EC) has been emerging as an alternative to replication in storage systems due to the continuous reduction in its computation overhead. In this work, we conduct an extensive experimental study to understand the performance of data-intensive applications under replication and EC. We use representative benchmarks on the Grid'5000 testbed to evaluate how analytics workloads, data persistence, failures, back-end storage devices, and network configuration impact application performance. Our study sheds light not only on the potential benefits of erasure coding in data-intensive clusters but also on the aspects that may help to realize them effectively.
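For context on the replication-versus-EC choice the abstract refers to: in Hadoop 3.x, erasure coding is applied per directory by attaching an EC policy, while directories without a policy keep the default replicated layout. The sketch below is only an illustration of that mechanism, not the paper's experimental setup; the NameNode address and directory paths are hypothetical, RS-6-3-1024k is one of Hadoop's built-in Reed-Solomon policies, and the policy is assumed to be already enabled on the cluster (e.g. via the hdfs ec CLI).

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class EcPolicyExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical cluster address; adjust for the actual deployment.
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020");

        FileSystem fs = FileSystem.get(conf);
        DistributedFileSystem dfs = (DistributedFileSystem) fs;

        Path replicatedDir = new Path("/data/replicated");
        Path ecDir = new Path("/data/erasure-coded");
        dfs.mkdirs(replicatedDir);   // inherits the default replicated layout
        dfs.mkdirs(ecDir);

        // Files written under ecDir are stored as RS(6,3) stripes instead of 3 replicas.
        // The named policy must already be enabled cluster-wide.
        dfs.setErasureCodingPolicy(ecDir, "RS-6-3-1024k");

        System.out.println("EC policy on " + ecDir + ": "
                + dfs.getErasureCodingPolicy(ecDir));

        dfs.close();
    }
}

With this kind of setup, the same workload can be pointed at the replicated or the erasure-coded directory, which is the basic knob behind the comparisons the study describes.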