Characterizing workflow-based activity on a production e-infrastructure using provenance data.

Authors
  • S. Madougou
  • S. Shahand
  • M. Santcroos
  • B. van Schaik
Publication date 2013
Journal Future Generation Computer Systems
Volume | Issue number 29 | 8
Pages (from-to) 1931-1942
Organisations
  • Faculty of Science (FNWI) - Swammerdam Institute for Life Sciences (SILS)
  • Faculty of Medicine (AMC-UvA)
Abstract
Grid computing and workflow management systems emerged as solutions to the challenges arising from the processing and storage of the sheer volumes of data generated by modern simulations and data acquisition devices. Workflow management systems usually document the process of the workflow execution either as structured provenance information or as log files. Provenance is recognized as an important feature in workflow management systems; however, there are still few reports on its usage in practical cases. In this paper we present the provenance system implemented in our platform, and then use the information captured by this system during 8 months of platform operation to analyze the platform usage and to perform multilevel error pattern analysis. We exploit the large amount of structured data, using the explanatory potential of statistical approaches, to find properties of workflows, jobs and resources that are related to workflow failure. Such an analysis enables us to characterize workflow executions on the infrastructure and understand workflow failures. The approach is generic and applicable to other e-infrastructures to gain insight into operational incidents.
Document type Article
Language English
Published at https://doi.org/10.1016/j.future.2013.04.019