Academic Paper

A Provenance Approach to Trace Scientific Experiments on a Grid Infrastructure
Document Type
Conference
Source
2011 IEEE Seventh International Conference on eScience (e-Science), pp. 134-141, Dec 2011
Subject
Components, Circuits, Devices and Systems
Geoscience
General Topics for Engineers
Knowledge based systems
Concrete
Computer architecture
Software
Distributed databases
Biomedical imaging
Scientific workflow
workflow systems
provenance
metadata
data management
e-infrastructure
distributed systems
DCI
grid computing
e-science
bioscience
Language
Abstract
Large experiments on distributed infrastructures are becoming increasingly complex to manage; in particular, it is difficult to trace all the computations that gave rise to a piece of data or to an event such as an error. This paper describes the design and implementation of an architecture to support experiment provenance, and its deployment on a particular e-infrastructure for biosciences. The proposed solution consists of: (a) a data provenance repository to capture scientific experiments and their execution path, (b) a software tool (crawler) that gathers, classifies, links, and stores the information collected from various sources, and (c) a set of user interfaces through which the end-user can access the provenance data, interpret the results, and trace the sources of failure. The approach is based on PLIER, an OPM-compliant API that is flexible enough to support future extensions and that facilitates interoperability among heterogeneous application systems.
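
To illustrate the kind of tracing the abstract describes, the following is a minimal sketch of a provenance graph in the style of the Open Provenance Model (OPM), which links artifacts and processes through "used" and "wasGeneratedBy" edges. It is not the PLIER API: the class `ProvenanceGraph`, its methods, and the job and file names are all hypothetical, chosen only to show how the computations behind a piece of data might be recovered from stored provenance.

```python
from collections import defaultdict


class ProvenanceGraph:
    """Hypothetical in-memory provenance store (an assumption, not PLIER)."""

    def __init__(self):
        # Maps each node to the (edge label, cause) pairs it depends on.
        self.depends_on = defaultdict(list)

    def used(self, process, artifact):
        """Record that a process consumed an artifact (OPM 'used' edge)."""
        self.depends_on[process].append(("used", artifact))

    def was_generated_by(self, artifact, process):
        """Record that a process produced an artifact (OPM 'wasGeneratedBy' edge)."""
        self.depends_on[artifact].append(("wasGeneratedBy", process))

    def trace(self, node):
        """Return every upstream node reachable from `node`, depth first."""
        seen, order, stack = set(), [], [node]
        while stack:
            current = stack.pop()
            for _label, cause in self.depends_on.get(current, []):
                if cause not in seen:
                    seen.add(cause)
                    order.append(cause)
                    stack.append(cause)
        return order


# Hypothetical two-step experiment: align a scan, then compute statistics.
g = ProvenanceGraph()
g.used("align_job", "raw_scan.nii")
g.was_generated_by("aligned.nii", "align_job")
g.used("stats_job", "aligned.nii")
g.was_generated_by("report.csv", "stats_job")

# Everything that contributed to the final report, nearest cause first.
lineage = g.trace("report.csv")
print(lineage)
```

In this sketch, tracing `report.csv` walks back through both jobs to the original input, which is the same question a user would ask when looking for the source of a failed or suspicious result. A real deployment would persist the edges in a repository and populate them from a crawler rather than by hand.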