Academic Paper

Fault-Tolerance in Dataflow-Based Scientific Workflow Management
Document Type
Conference
Source
2010 6th World Congress on Services (SERVICES-1), pp. 336-343, Jul. 2010
Subject
Computing and Processing
Communication, Networking and Broadcast Technologies
General Topics for Engineers
Fault tolerance
Fault tolerant systems
Pipelines
Data models
Monitoring
Data structures
Biological system modeling
Scientific Workflow Patterns Kepler
Language
ISSN
2378-3818
Abstract
This paper addresses the challenges of providing fault tolerance in scientific workflow management. The specification and handling of faults in scientific workflows should be defined precisely to ensure consistent execution with respect to process-specific requirements. We identified a number of typical failure patterns that occur in real-life scientific workflow executions. Following the intuitive recovery strategies that correspond to these patterns, we developed methodologies that integrate recovery fragments into fault-prone scientific workflow models. Compared to existing fault-tolerance mechanisms, the proposed approach reduces the effort required of workflow designers by defining recovery fragments automatically. Furthermore, the developed framework implements the mechanisms needed to capture faults from the different layers of a scientific workflow management architecture. Our experience indicates that the framework can be employed effectively to model, capture, and tolerate the identified failure patterns.