mlwhatif: What If You Could Stop Re-Implementing Your Machine Learning Pipeline Analyses over and Over?

Open Access
Publication date August 2023
Journal Proceedings of the VLDB Endowment
Volume 16, Issue 12
Pages 4002–4005
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
Software systems that learn from data with machine learning (ML) are used in critical decision-making processes. Unfortunately, real-world experience shows that the pipelines for data preparation, feature encoding and model training in ML systems are often brittle with respect to their input data. As a consequence, data scientists have to run different kinds of data-centric what-if analyses to evaluate the robustness and reliability of such pipelines, e.g., with respect to data errors or preprocessing techniques. These what-if analyses follow a common pattern: they take an existing ML pipeline, create a pipeline variant by introducing a small change, and execute this variant to see how the change impacts the pipeline's output score.

We recently proposed mlwhatif, a library that enables data scientists to declaratively specify what-if analyses for an ML pipeline, and to automatically generate, optimize and execute the required pipeline variants. We demonstrate how data scientists can leverage mlwhatif for a variety of pipelines and three different what-if analyses, focusing on the robustness of a pipeline against data errors, the impact of data cleaning operations, and the impact of data preprocessing operations on fairness. In particular, we demonstrate step by step how mlwhatif generates and optimizes the required execution plans for the pipeline analyses. Our library is publicly available at https://github.com/stefan-grafberger/mlwhatif.
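The what-if pattern the abstract describes can be illustrated with a minimal sketch in plain Python and scikit-learn. This is not mlwhatif's actual API (mlwhatif derives and optimizes such variants automatically from a declarative specification); the pipeline, the missing-value corruption helper, and the fractions used here are illustrative assumptions.

```python
# Illustrative sketch of a data-centric what-if analysis (NOT mlwhatif's API):
# run an original pipeline, create variants with a small data change
# (here: injecting missing values into the training data), and compare scores.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

def make_pipeline_variant():
    # Data preparation, feature encoding, and model training as one pipeline.
    return make_pipeline(SimpleImputer(strategy="mean"),
                         StandardScaler(),
                         LogisticRegression(max_iter=1000))

def corrupt_with_missing_values(X, fraction, seed=0):
    # Hypothetical data-error injection: blank out a fraction of all cells.
    rng = np.random.default_rng(seed)
    X = X.copy()
    X[rng.random(X.shape) < fraction] = np.nan
    return X

# Baseline: the unchanged pipeline's output score.
baseline = make_pipeline_variant().fit(X_train, y_train).score(X_test, y_test)

# Variants: the same pipeline, each executed on slightly corrupted input.
for fraction in (0.1, 0.3, 0.5):
    X_corrupt = corrupt_with_missing_values(X_train, fraction)
    variant_score = make_pipeline_variant().fit(X_corrupt, y_train).score(X_test, y_test)
    print(f"{fraction:.0%} missing cells: score delta {variant_score - baseline:+.3f}")
```

Executed naively, each variant re-runs the full pipeline from scratch; the paper's contribution is generating all such variants from one declarative specification and sharing work across them via an optimized execution plan.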
Document type Article
Language English
Published at https://doi.org/10.14778/3611540.3611606