Academic Paper

The benefits of prefetching for large-scale cloud-based neuroimaging analysis workflows

Document Type

Conference

Author

Hayot-Sasson, Valerie; Glatard, Tristan; Rokem, Ariel

Source

2021 IEEE Workshop on Workflows in Support of Large-Scale Science (WORKS), pp. 42-49, Nov. 2021

Subject

Computing and Processing
Neuroimaging
Cloud computing
Costs
Processor scheduling
Prefetching
Loading
Pipelines

Abstract

To support the growing demands of neuroscience applications, researchers are transitioning to cloud computing for its scalable, robust and elastic infrastructure. Nevertheless, large datasets residing in object stores may result in significant data transfer overheads during workflow execution. Prefetching, a method to mitigate the cost of reading in mixed workloads, masks data transfer costs within the processing time of prior tasks. We present an implementation of “Rolling Prefetch”, a Python library that implements a particular form of prefetching from the AWS S3 object store, and we quantify its benefits. Rolling Prefetch extends S3Fs, a Python library exposing AWS S3 functionality via a file object, to add prefetch capabilities. In measurements of analysis performance on a 500 GB brain connectivity dataset stored on S3, we found that prefetching provides significant speed-ups of up to 1.86×, even in applications consisting entirely of data loading. The observed speed-up values are consistent with our theoretical analysis. Our results demonstrate the usefulness of prefetching for scientific data processing on cloud infrastructures and provide an implementation applicable to various application domains.
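
The general idea of masking transfer time behind the processing of earlier blocks can be illustrated with a minimal sketch. This is not the Rolling Prefetch implementation or its API: the names `prefetching_reader`, `simulated_fetch`, and `process` are hypothetical, the remote read is simulated with `time.sleep` rather than an S3 call, and the latency values are arbitrary assumptions.

```python
import queue
import threading
import time


def prefetching_reader(fetch, keys, depth=2):
    """Yield fetch(key) for each key, in order, while a background thread
    fetches up to `depth` blocks ahead of the consumer (hypothetical helper)."""
    buf = queue.Queue(maxsize=depth)  # bounded, so prefetched data stays small

    def worker():
        try:
            for key in keys:
                buf.put((True, fetch(key)))  # blocks while the buffer is full
        except Exception as exc:
            buf.put((False, exc))   # forward fetch errors to the consumer
        else:
            buf.put((False, None))  # end-of-stream marker

    # Daemon thread: if the consumer stops early, the worker may remain
    # blocked on put() and will not keep the interpreter alive.
    threading.Thread(target=worker, daemon=True).start()

    while True:
        ok, item = buf.get()
        if ok:
            yield item
        elif item is None:
            return
        else:
            raise item


def simulated_fetch(key, latency=0.05):
    """Stand-in for a remote object-store read (assumed latency)."""
    time.sleep(latency)
    return f"data-{key}"


def process(block, compute=0.05):
    """Stand-in for per-block computation (assumed duration)."""
    time.sleep(compute)
    return block.upper()


def run_sequential(keys):
    # Each block is fetched, then processed, with no overlap.
    return [process(simulated_fetch(k)) for k in keys]


def run_prefetched(keys):
    # The fetch of the next block overlaps with processing of the current one.
    return [process(b) for b in prefetching_reader(simulated_fetch, keys)]
```

Under these assumptions, a sequential run over n blocks takes roughly n(t + c) for transfer time t and compute time c, while the overlapped run takes roughly t + n·max(t, c), so the ideal speed-up of this simple scheme approaches (t + c)/max(t, c), which is at most 2× when t and c are equal.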
