MLINSPECT: A Data Distribution Debugger for Machine Learning Pipelines

S. Grafberger; S. Guha; J. Stoyanovich; S. Schelter

doi:https://doi.org/10.1145/3448016.3452759

Authors	S. Grafberger; S. Guha; J. Stoyanovich; S. Schelter
Publication date	2021
Book title	SIGMOD '21
Book subtitle	proceedings of the 2021 International Conference on the Management of Data : June 20-25, 2021, virtual event, China
ISBN (electronic)	9781450383431
Event	2021 International Conference on the Management of Data
Pages (from-to)	2736–2739
Publisher	New York, NY: Association for Computing Machinery
Organisations	Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract	Machine Learning (ML) is increasingly used to automate impactful decisions, and the risks arising from this wide-spread use are garnering attention from policymakers, scientists, and the media. ML applications are often very brittle with respect to their input data, which leads to concerns about their reliability, accountability, and fairness. While bias detection cannot be fully automated, computational tools can help pinpoint particular types of data issues. We recently proposed mlinspect, a library that enables lightweight lineage-based inspection of ML preprocessing pipelines. In this demonstration, we show how mlinspect can be used to detect data distribution bugs in a representative pipeline. In contrast to existing work, mlinspect operates on declarative abstractions of popular data science libraries like estimator/transformer pipelines, can handle both relational and matrix data, and does not require manual code instrumentation. The library is publicly available at https://github.com/stefan-grafberger/mlinspect.
Document type	Conference contribution
Language	English
Published at	https://doi.org/10.1145/3448016.3452759
Other links	https://github.com/stefan-grafberger/mlinspect
Downloads	3448016.3452759 (Final published version)
Supplementary materials	3448016.3452759
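
Illustration (not part of the published record): the abstract refers to "data distribution bugs" in preprocessing pipelines. The sketch below shows, under stated assumptions, what such a bug can look like: a filter that silently shifts the share of a demographic group. It uses only the Python standard library and does not use the mlinspect API. The function names, the toy records, and the 0.2 threshold are all hypothetical.

```python
from collections import Counter


def group_ratios(rows, column):
    """Return the share of rows belonging to each value of `column`."""
    counts = Counter(row[column] for row in rows)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


def ratio_changes(before, after, column):
    """Change in each group's share between two pipeline stages."""
    ratios_before = group_ratios(before, column)
    ratios_after = group_ratios(after, column)
    return {
        group: ratios_after.get(group, 0.0) - ratio
        for group, ratio in ratios_before.items()
    }


# Hypothetical toy records; missing income values cluster in one group.
patients = [
    {"age_group": "young", "income": 30000},
    {"age_group": "young", "income": 42000},
    {"age_group": "young", "income": 38000},
    {"age_group": "senior", "income": None},
    {"age_group": "senior", "income": None},
    {"age_group": "senior", "income": 51000},
]

# A seemingly harmless preprocessing step: drop rows with missing income.
filtered = [row for row in patients if row["income"] is not None]

# Compare group shares before and after the filter.
changes = ratio_changes(patients, filtered, "age_group")

# Flag groups whose share moved by more than a hypothetical threshold of 0.2.
flagged = {group: delta for group, delta in changes.items() if abs(delta) > 0.2}
print(flagged)
```

In this toy example the filter reduces the "senior" share from one half to one quarter. A lineage-based tool such as the one described in the abstract aims to surface this kind of shift per pipeline operator without manual instrumentation; the sketch only makes the underlying comparison concrete.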