Proactively Screening Machine Learning Pipelines with ArgusEyes

Open Access
Authors
  • C. Zhang
Publication date 2023
Book title SIGMOD '23 Companion
Book subtitle Companion of the 2023 ACM/SIGMOD International Conference on Management of Data : June 18-23, 2023, Seattle, WA, USA
ISBN (electronic)
  • 9781450395076
Event 2023 ACM/SIGMOD International Conference on Management of Data
Pages (from-to) 91–94
Publisher New York, NY: Association for Computing Machinery
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
Software systems that learn from data with machine learning (ML) are ubiquitous. ML pipelines in these applications often suffer from a variety of data-related issues, such as data leakage, label errors or fairness violations, which require reasoning about complex dependencies between their inputs and outputs. These issues are usually detected only in hindsight, after deployment, once they have already caused harm in production. We demonstrate ArgusEyes, a system which enables data scientists to proactively screen their ML pipelines for data-related issues as part of continuous integration. ArgusEyes instruments, executes and screens ML pipelines for declaratively specified pipeline issues, and analyzes data artifacts and their provenance to catch potential problems early, before deployment to production. We demonstrate our system for three scenarios: detecting mislabeled images in a computer vision pipeline, spotting data leakage in a price prediction pipeline, and addressing fairness violations in a credit scoring pipeline.
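As a rough illustration of one of the three scenarios, data leakage, the following Python sketch flags test records that also appear verbatim in the training data. It is a hypothetical example, not the ArgusEyes API: the function names `row_fingerprint` and `find_leaked_rows` are illustrative, and matching exact duplicates by hashing is a simplifying assumption. The abstract states that ArgusEyes analyzes data artifacts and their provenance, which can catch more than exact duplicates.

```python
import hashlib
from typing import Iterable, List, Sequence


def row_fingerprint(row: Sequence[object]) -> str:
    # Hypothetical helper: hash a canonical string form of the row so that
    # identical records yield identical fingerprints.
    canonical = "\x1f".join(repr(value) for value in row)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def find_leaked_rows(
    train_rows: Iterable[Sequence[object]],
    test_rows: Sequence[Sequence[object]],
) -> List[int]:
    # Hypothetical check: return the indices of test rows that also occur
    # (as exact duplicates) in the training data.
    train_hashes = {row_fingerprint(row) for row in train_rows}
    return [i for i, row in enumerate(test_rows) if row_fingerprint(row) in train_hashes]


if __name__ == "__main__":
    # Toy price-prediction records: (city, rooms, price). Illustrative data only.
    train = [("Amsterdam", 3, 450000), ("Utrecht", 2, 320000)]
    test = [("Rotterdam", 4, 380000), ("Amsterdam", 3, 450000)]
    leaked = find_leaked_rows(train, test)
    print(leaked)  # [1]
    # In a continuous-integration setting, a non-empty result could fail the build.
    if leaked:
        raise SystemExit(1)
```

Running this screen before deployment, rather than after, mirrors the proactive stance described in the abstract: the check is cheap and fails fast, so a leaking split is caught during integration instead of in production.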
Document type Conference contribution
Language English
Published at https://doi.org/10.1145/3555041.3589682