Comparing correction methods to reduce misclassification bias

Open Access
Authors
Publication date 2021
Host editors
  • M. Baratchi
  • L. Cao
  • W.A. Kosters
  • J. Lijffijt
  • J.N. van Rijn
  • F.W. Takes
Book title Artificial Intelligence and Machine Learning
Book subtitle 32nd Benelux Conference, BNAIC/Benelearn 2020, Leiden, The Netherlands, November 19–20, 2020: revised selected papers
ISBN
  • 9783030766399
ISBN (electronic)
  • 9783030766405
Series Communications in Computer and Information Science
Event 32nd Benelux Conference on Artificial Intelligence and Belgian-Dutch Conference on Machine Learning, BNAIC/Benelearn 2020
Pages (from-to) 64-90
Publisher Cham: Springer
Organisations
  • Faculty of Economics and Business (FEB) - Amsterdam School of Economics Research Institute (ASE-RI)
  • Faculty of Economics and Business (FEB)
Abstract
When applying supervised machine learning algorithms to classification, the classical goal is to reconstruct the true labels as accurately as possible. However, if the predictions of an accurate algorithm are aggregated, for example by counting the predictions of a single class label, the result is often still statistically biased. This bias, referred to as misclassification bias, impedes the implementation of machine learning algorithms in the context of official statistics. In this paper, we focus on reducing the misclassification bias of binary classification algorithms by employing five existing estimation techniques, or estimators. As reducing bias might increase variance, the estimators are evaluated by their mean squared error (MSE). For three of the estimators, we are the first to derive an expression for the MSE in finite samples, complementing the existing asymptotic results in the literature. These expressions are then used to compute decision boundaries numerically, indicating under which conditions each estimator is optimal, i.e., has the lowest MSE. Our main conclusion is that the calibration estimator performs best in most applications. Moreover, the calibration estimator is unbiased and significantly reduces the MSE compared to that of the uncorrected aggregated predictions, supporting the use of machine learning in the context of official statistics.
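The calibration approach highlighted in the abstract can be sketched as follows. This is a minimal illustration, not the paper's derivation: it assumes the calibration estimator reweights the aggregated predictions by calibration probabilities P(true = 1 | predicted = j) estimated on a labelled test set; all function and variable names are illustrative.

```python
import numpy as np

def calibration_estimate(pred_labels, test_pred, test_true):
    """Estimate the population fraction of positives from binary
    predictions, correcting for misclassification bias.

    Each predicted class j is weighted by the fraction of truly
    positive items among test-set items that received prediction j.
    All inputs are 0/1 NumPy arrays.
    """
    # Calibration probabilities P(true = 1 | predicted = j), j in {0, 1},
    # estimated on the labelled test set.
    p1_given = np.array([test_true[test_pred == j].mean() for j in (0, 1)])
    # Share of population items predicted as class 0 and class 1.
    alpha = np.array([(pred_labels == j).mean() for j in (0, 1)])
    # Corrected estimate of the fraction of true positives.
    return float(alpha @ p1_given)
```

With a perfect classifier the calibration probabilities are 0 and 1, so the estimate reduces to the raw predicted share; with an imperfect classifier the reweighting removes the systematic over- or under-count that raw aggregation would produce.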
Document type Conference contribution
Language English
Related publication Comparing correction methods to reduce misclassification bias
Published at https://doi.org/10.1007/978-3-030-76640-5_5
Published at http://bnaic.liacs.leidenuniv.nl/bnaic2020proceedings.pdf
Downloads
Meertens (Accepted author manuscript)