Towards Interactively Improving ML Data Preparation Code via “Shadow Pipelines”
| Authors | |
|---|---|
| Publication date | 2024 |
| Book title | Proceedings of the Eighth Workshop on Data Management for End-to-End Machine Learning (DEEM) |
| Book subtitle | in conjunction with the 2024 ACM SIGMOD/PODS Conference, Santiago, Chile |
| ISBN (electronic) | |
| Event | 8th Workshop on Data Management for End-to-End Machine Learning |
| Pages (from-to) | 7–11 |
| Publisher | New York, New York: The Association for Computing Machinery |
| Organisations | |
| Abstract | Data scientists develop ML pipelines in an iterative manner: they repeatedly screen a pipeline for potential issues, debug it, and then revise and improve its code according to their findings. However, this manual process is tedious and error-prone. Therefore, we propose to support data scientists during this development cycle with automatically derived interactive suggestions for pipeline improvements. We discuss our vision of generating these suggestions with so-called shadow pipelines: hidden variants of the original pipeline that modify it to auto-detect potential issues, try out modifications for improvements, and suggest and explain these modifications to the user. We envision applying optimisations based on incremental view maintenance to ensure low-latency computation and maintenance of the shadow pipelines. We conduct preliminary experiments to showcase the feasibility of our envisioned approach and the potential benefits of our proposed optimisations. |
| Document type | Conference contribution |
| Language | English |
| Published at | https://doi.org/10.1145/3650203.3663327 |
| Downloads | 3650203.3663327 (Final published version) |
