Learning Workflow Scheduling on Multi-Resource Clusters

Open Access
Publication date 2019
Book title 2019 IEEE International Conference on Networking, Architecture and Storage (NAS)
Book subtitle Proceedings: Enshi, China, 15-17 August 2019
ISBN
  • 9781728144108
ISBN (electronic)
  • 9781728144092
Event 14th IEEE International Conference on Networking, Architecture and Storage, NAS 2019
Pages (from-to) 17-24
Number of pages 8
Publisher Piscataway, NJ: IEEE
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract

Workflow scheduling is one of the key issues in managing workflow execution. A workflow application can typically be modeled as a Directed Acyclic Graph (DAG). In this paper, we present GoDAG, an approach that learns to schedule workflows well on multi-resource clusters. GoDAG learns the scheduling policy directly from experience through deep reinforcement learning. To apply deep reinforcement learning methods, we propose a novel state representation, a practical action space, and a corresponding reward definition for the workflow scheduling problem. We implement a GoDAG prototype and a simulator that simulates task execution on multi-resource clusters. In the evaluation, we compare GoDAG with three state-of-the-art heuristics. The results show that GoDAG outperforms the baseline heuristics, achieving a lower average makespan across different workflow structures.
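To make the terms "DAG" and "makespan" concrete, the sketch below models a small workflow as a DAG and measures makespan in two ways: the critical-path length (the makespan achievable with unlimited resources) and the makespan of a simple greedy list scheduler on a fixed number of identical slots. This is not GoDAG and not any of the paper's baseline heuristics. It assumes a single resource type and known task durations, whereas the paper targets multi-resource clusters. All function names, task names, and durations are hypothetical.

```python
import heapq
from graphlib import TopologicalSorter


def critical_path_length(durations, deps):
    """Longest path through the DAG: the makespan with unlimited resources.

    durations: task -> run time; deps: task -> set of predecessor tasks.
    """
    finish = {}
    for task in TopologicalSorter(deps).static_order():
        # A task can start only after all of its predecessors have finished.
        start = max((finish[p] for p in deps.get(task, ())), default=0)
        finish[task] = start + durations[task]
    return max(finish.values(), default=0)


def list_schedule_makespan(durations, deps, slots, priority):
    """Greedy list scheduling on `slots` identical slots (hypothetical baseline).

    Whenever a slot is free, start the ready task with the highest priority.
    Assumes slots >= 1.
    """
    preds = {t: set(deps.get(t, ())) for t in durations}
    succs = {t: [] for t in durations}
    for t, ps in preds.items():
        for p in ps:
            succs[p].append(t)
    remaining = {t: len(ps) for t, ps in preds.items()}  # unfinished predecessors
    ready = [t for t, n in remaining.items() if n == 0]
    running = []  # min-heap of (finish_time, task)
    now, free = 0, slots
    while ready or running:
        ready.sort(key=priority, reverse=True)
        while ready and free > 0:
            task = ready.pop(0)
            heapq.heappush(running, (now + durations[task], task))
            free -= 1
        # Advance the clock to the next task completion and release its slot.
        now, done = heapq.heappop(running)
        free += 1
        for s in succs[done]:
            remaining[s] -= 1
            if remaining[s] == 0:
                ready.append(s)
    return now  # time of the last completion, i.e. the makespan


# Hypothetical four-task workflow: a -> {b, c} -> d
durations = {"a": 2, "b": 3, "c": 1, "d": 4}
deps = {"b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
print(critical_path_length(durations, deps))                     # 9
print(list_schedule_makespan(durations, deps, 1, durations.get))  # 10
print(list_schedule_makespan(durations, deps, 2, durations.get))  # 9
```

With one slot the tasks run serially (2 + 3 + 1 + 4 = 10). With two slots, `b` and `c` run in parallel and the schedule reaches the critical-path bound of 9. A learned policy such as GoDAG's replaces the fixed `priority` rule with decisions learned from experience.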

Document type Conference contribution
Note This research has received funding from the European Union’s Horizon 2020 research and innovation program under grant agreements 643963 (SWITCH project), 654182 (ENVRIplus project), 676247 (VRE4EIC project), 824068 (ENVRI-FAIR project) and 825134 (ARTICONF project). The research is also supported by Chinese Scholarship Council.
Language English
Published at https://doi.org/10.1109/NAS.2019.8834720
Published at https://zenodo.org/record/3466676
Other links https://www.proceedings.com/50412.html https://www.scopus.com/pages/publications/85073191082
Downloads
2019.8.conference.nas.camera (Accepted author manuscript)