Methods for federating and transferring data to eScience applications

Open Access
Award date 18-10-2016
ISBN
  • 9789402803662
Number of pages 201
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
The main objective of this research is to investigate efficient, scalable, flexible and transparent methods for moving large volumes of data between workflow tasks and distributed, heterogeneous and independent storage resources.
We address the challenges arising from the exchange of large data volumes between web services in scientific applications by proposing a data pipeline model that reduces workflow execution times and the demand for storage and network resources.
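The pipeline idea can be illustrated with a minimal sketch. It uses Python generators as a stand-in for streaming between web services and compares two variants. In the staged variant, the downstream task starts only after the whole intermediate result has been materialized. In the pipelined variant, each chunk is handed over as soon as it is produced. The functions `producer`, `consumer`, `staged` and `pipelined` are hypothetical illustrations, not the thesis's implementation, and in-memory bytes stand in for network transfers.

```python
from typing import Iterable, Iterator, List, Tuple


def producer(n_chunks: int, chunk_size: int, log: List[str]) -> Iterator[bytes]:
    """Hypothetical upstream task: emits its output incrementally, one chunk at a time."""
    for i in range(n_chunks):
        log.append(f"produce {i}")
        yield bytes([i % 256]) * chunk_size


def consumer(chunks: Iterable[bytes], log: List[str]) -> int:
    """Hypothetical downstream task: processes each chunk as soon as it receives it."""
    total = 0
    for i, chunk in enumerate(chunks):
        log.append(f"consume {i}")
        total += len(chunk)
    return total


def staged(n_chunks: int, chunk_size: int) -> Tuple[int, int, List[str]]:
    """Store-and-forward: the whole intermediate result is materialized first."""
    log: List[str] = []
    buffer = list(producer(n_chunks, chunk_size, log))
    peak_bytes = sum(len(c) for c in buffer)  # every chunk is held at once
    return consumer(buffer, log), peak_bytes, log


def pipelined(n_chunks: int, chunk_size: int) -> Tuple[int, int, List[str]]:
    """Streaming: each chunk flows to the consumer as soon as it is produced."""
    log: List[str] = []
    peak_bytes = 0

    def tracked() -> Iterator[bytes]:
        nonlocal peak_bytes
        for chunk in producer(n_chunks, chunk_size, log):
            # In this model at most one chunk is buffered between the two tasks
            peak_bytes = max(peak_bytes, len(chunk))
            yield chunk

    total = consumer(tracked(), log)
    return total, peak_bytes, log


print(staged(3, 4))
print(pipelined(3, 4))
```

With three 4-byte chunks, both variants deliver 12 bytes. The staged run buffers all 12 bytes, and its log shows every `produce` step before any `consume` step. The pipelined run buffers only 4 bytes, and its log alternates `produce 0`, `consume 0`, `produce 1`, and so on. This shows the overlap that lets downstream work start before upstream work finishes.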
We also propose a technology-agnostic data resource federation architecture that offers a unified view of the data resources, providing an abstraction layer under which independent data storage resources are coordinated. To address scalability and performance issues, we extended our initial data management architecture with modules that can be deployed on multiple heterogeneous infrastructures. Next, we investigated how programmable networks can reduce the execution time of data- and I/O-intensive workflows.
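A federation layer of this kind can be sketched in two parts. The first is a common interface that each storage technology implements. The second is a router that maps one logical namespace onto the independent backends. This is a minimal sketch under stated assumptions, not the thesis's architecture. `StorageBackend`, `InMemoryBackend`, `FederatedStore` and the mount names `hpc` and `cloud` are hypothetical. The in-memory backend stands in for real resources such as object stores or file servers.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple


class StorageBackend(ABC):
    """Hypothetical technology-agnostic interface that every storage adapter implements."""

    @abstractmethod
    def read(self, path: str) -> bytes: ...

    @abstractmethod
    def write(self, path: str, data: bytes) -> None: ...

    @abstractmethod
    def list_paths(self) -> List[str]: ...


class InMemoryBackend(StorageBackend):
    """Hypothetical stand-in for one independent storage resource."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def read(self, path: str) -> bytes:
        try:
            return self._objects[path]
        except KeyError:
            raise FileNotFoundError(path) from None

    def write(self, path: str, data: bytes) -> None:
        self._objects[path] = data

    def list_paths(self) -> List[str]:
        return sorted(self._objects)


class FederatedStore:
    """Unified view: the first path component selects the backend (its "mount")."""

    def __init__(self) -> None:
        self._mounts: Dict[str, StorageBackend] = {}

    def mount(self, name: str, backend: StorageBackend) -> None:
        self._mounts[name] = backend

    def _resolve(self, logical_path: str) -> Tuple[StorageBackend, str]:
        # "/hpc/run1/input.dat" -> backend "hpc", backend-local path "run1/input.dat"
        name, _, rest = logical_path.lstrip("/").partition("/")
        if name not in self._mounts or not rest:
            raise FileNotFoundError(logical_path)
        return self._mounts[name], rest

    def read(self, logical_path: str) -> bytes:
        backend, path = self._resolve(logical_path)
        return backend.read(path)

    def write(self, logical_path: str, data: bytes) -> None:
        backend, path = self._resolve(logical_path)
        backend.write(path, data)

    def list_all(self) -> List[str]:
        # Merge the listings of all backends into one logical namespace
        return [
            f"/{name}/{path}"
            for name in sorted(self._mounts)
            for path in self._mounts[name].list_paths()
        ]


fed = FederatedStore()
fed.mount("hpc", InMemoryBackend())
fed.mount("cloud", InMemoryBackend())
fed.write("/hpc/run1/input.dat", b"raw")
fed.write("/cloud/results/output.dat", b"processed")
print(fed.read("/cloud/results/output.dat"))  # b'processed'
print(fed.list_all())  # ['/cloud/results/output.dat', '/hpc/run1/input.dat']
```

Clients use only logical paths, so a new storage technology can be added by writing one more `StorageBackend` adapter without changing client code.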
To demonstrate the use of the proposed methods and tools, we applied them to real-world applications. We evaluated the performance and usability of our data pipeline model for web services with two workflows. Next, we applied our storage federation approach to a well-known data-intensive workflow based on Montage. Finally, we analyzed usage data for our storage federation approach collected from the VPH-Share project infrastructure, which is used to execute medical applications.
Document type PhD thesis
Note Research conducted at: Universiteit van Amsterdam
Language English