Academic Paper

Combining Static and Dynamic Storage Management for Data Intensive Scientific Workflows
Document Type
Periodical
Source
IEEE Transactions on Parallel and Distributed Systems (IEEE Trans. Parallel Distrib. Syst.), 29(2):338-350, Feb. 2018
Subject
Computing and Processing
Communication, Networking and Broadcast Technologies
System recovery
Storage management
Algorithm design and analysis
Heuristic algorithms
Runtime
Static analysis
Workflow management
storage
workflow analysis
storage management
Language
ISSN
1045-9219
1558-2183
2161-9883
Abstract
Workflow management systems are widely used to express and execute highly parallel applications. For data-intensive workflows, storage can be the constraining resource: the number of tasks running at once must be artificially limited so as not to overflow the space available in the filesystem. It is all too easy for a user to dispatch a workflow that consumes all available storage and disrupts all users of the system. To address these issues, we present a three-tiered approach to workflow storage management: (1) a static analysis algorithm that analyzes the storage needs of a workflow before execution, giving a realistic prediction of success or failure; (2) an online storage management algorithm that accounts for the storage needed by future tasks to avoid deadlock at runtime; and (3) a task containment system that limits the storage consumption of individual tasks, enabling the strong guarantees of the static analysis and dynamic management algorithms. We demonstrate the application of these techniques on three complex workflows.
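The idea behind the first tier, estimating a workflow's storage needs before it runs, can be illustrated with a minimal sketch. This is not the algorithm from the paper. It assumes that tasks run one at a time in a dependency-respecting order, that every file size is known in advance, and that a file is deleted as soon as its last consumer finishes. The function name `peak_storage`, the task names, and the file sizes are all hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def peak_storage(tasks, sizes):
    """Estimate the peak storage used by a serial run of a workflow.

    tasks: {task_name: (input_files, output_files)}  (hypothetical format)
    sizes: {file_name: size}
    Assumes files are deleted once no remaining task reads them.
    """
    # Map each produced file to the task that writes it.
    producer = {f: t for t, (_, outs) in tasks.items() for f in outs}
    # A task depends on the producers of its inputs.
    deps = {t: {producer[f] for f in ins if f in producer}
            for t, (ins, _) in tasks.items()}
    order = list(TopologicalSorter(deps).static_order())

    # Number of tasks that still need to read each file.
    remaining = {}
    for ins, _ in tasks.values():
        for f in ins:
            remaining[f] = remaining.get(f, 0) + 1

    # Source files (no producer) occupy space from the start.
    sources = {f for ins, _ in tasks.values() for f in ins if f not in producer}
    used = sum(sizes[f] for f in sources)
    peak = used

    for t in order:
        ins, outs = tasks[t]
        # Outputs are written while the inputs are still on disk.
        used += sum(sizes[f] for f in outs)
        peak = max(peak, used)
        # Reclaim inputs that no later task reads.
        for f in ins:
            remaining[f] -= 1
            if remaining[f] == 0:
                used -= sizes[f]
    return peak


# Hypothetical four-task workflow: split, two processing steps, merge.
workflow = {
    "split": (["raw"], ["a", "b"]),
    "proc_a": (["a"], ["a2"]),
    "proc_b": (["b"], ["b2"]),
    "merge": (["a2", "b2"], ["out"]),
}
file_sizes = {"raw": 10, "a": 6, "b": 6, "a2": 3, "b2": 3, "out": 2}

capacity = 20  # assumed available space, same units as file_sizes
peak = peak_storage(workflow, file_sizes)
fits = peak <= capacity
```

Comparing `peak` with `capacity` gives a prediction of success or failure before execution for this one serial schedule. A parallel schedule can hold more files at once, so a realistic analysis would also have to account for concurrency, which this sketch does not attempt.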