Towards Interactively Improving ML Data Preparation Code via “Shadow Pipelines”

S. Grafberger; P. Groth; S. Schelter

doi:https://doi.org/10.1145/3650203.3663327

Authors	S. Grafberger, P. Groth, S. Schelter
Publication date	2024
Book title	Proceedings of the Eighth Workshop on Data Management for End-to-End Machine Learning (DEEM)
Book subtitle	in conjunction with the 2024 ACM SIGMOD/PODS Conference, Santiago, Chile
ISBN (electronic)	9798400706110
Event	8th Workshop on Data Management for End-to-End Machine Learning
Pages (from-to)	7–11
Publisher	New York, New York: The Association for Computing Machinery
Organisations	Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract	Data scientists develop ML pipelines in an iterative manner: they repeatedly screen a pipeline for potential issues, debug it, and then revise and improve its code according to their findings. However, this manual process is tedious and error-prone. Therefore, we propose to support data scientists during this development cycle with automatically derived interactive suggestions for pipeline improvements. We discuss our vision to generate these suggestions with so-called shadow pipelines, hidden variants of the original pipeline that modify it to auto-detect potential issues, try out modifications for improvements, and suggest and explain these modifications to the user. We envision to apply incremental view maintenance-based optimisations to ensure low-latency computation and maintenance of the shadow pipelines. We conduct preliminary experiments to showcase the feasibility of our envisioned approach and the potential benefits of our proposed optimisations.
Document type	Conference contribution
Language	English
Published at	https://doi.org/10.1145/3650203.3663327 (Final published version)

UvA-DARE

Digital Academic Repository
