Comparing correction methods to reduce misclassification bias

Open Access
Authors
Publication date 2020
Host editors
  • L. Cao
  • W. Kosters
  • J. Lijffijt
Book title BNAIC/BeNeLearn 2020
Book subtitle proceedings : Leiden, the Netherlands, November 19-20, 2020
Event 32nd Benelux Conference on Artificial Intelligence and Belgian-Dutch Conference on Machine Learning, BNAIC/Benelearn 2020
Pages (from-to) 103-129
Publisher Leiden: Universiteit Leiden
Organisations
  • Faculty of Economics and Business (FEB) - Amsterdam School of Economics Research Institute (ASE-RI)
  • Faculty of Economics and Business (FEB)
Abstract
When applying supervised machine learning algorithms to classification, the classical goal is to reconstruct the true labels as accurately as possible. However, if the predictions of an accurate algorithm are aggregated, for example by counting the predictions of a single class label, the result is often still statistically biased. Implementing machine learning algorithms in the context of official statistics is therefore impeded. The statistical bias that occurs when aggregating the predictions of a machine learning algorithm is referred to as misclassification bias. In this paper, we focus on reducing the misclassification bias of binary classification algorithms by employing five existing estimation techniques, or estimators. As reducing bias might increase variance, the estimators are evaluated by their mean squared error (MSE). For three of the estimators, we are the first to derive an expression for the MSE in finite samples, complementing the existing asymptotic results in the literature. The expressions are then used to compute decision boundaries numerically, indicating under which conditions each of the estimators is optimal, i.e., has the lowest MSE. Our main conclusion is that the calibration estimator performs best in most applications. Moreover, the calibration estimator is unbiased, and it significantly reduces the MSE compared to that of the uncorrected aggregated predictions, supporting the use of machine learning in the context of official statistics.
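The effect described in the abstract can be illustrated with a small sketch. It is not taken from the paper: the function names, the parameter values (true share 0.2, sensitivity and specificity 0.9) and the particular form of the calibration-style correction are assumptions chosen for illustration. The first function uses only the standard identity that the expected share of predicted positives equals alpha * sensitivity + (1 - alpha) * (1 - specificity); the second estimates P(true = 1 | predicted = 1) and P(true = 1 | predicted = 0) on a labelled test sample and applies them to the predicted shares of the target set.

```python
import random


def expected_predicted_fraction(alpha, sensitivity, specificity):
    """Expected share of items predicted as class 1 when the true share is alpha."""
    return alpha * sensitivity + (1 - alpha) * (1 - specificity)


def calibration_estimate(pred_target, pred_test, true_test):
    """Calibration-style correction (assumed form, not the paper's definition).

    c11 and c01 estimate P(true=1 | pred=1) and P(true=1 | pred=0)
    from a labelled test sample drawn from the same population.
    """
    n1 = sum(pred_test)
    n0 = len(pred_test) - n1
    pos_given_pred1 = sum(t for p, t in zip(pred_test, true_test) if p == 1)
    pos_given_pred0 = sum(t for p, t in zip(pred_test, true_test) if p == 0)
    # Guard against an empty cell in a small test sample.
    c11 = pos_given_pred1 / n1 if n1 else 0.0
    c01 = pos_given_pred0 / n0 if n0 else 0.0
    share1 = sum(pred_target) / len(pred_target)
    return share1 * c11 + (1 - share1) * c01


def simulate(alpha=0.2, sensitivity=0.9, specificity=0.9,
             n_target=20000, n_test=1000, seed=0):
    """Return (uncorrected share, calibration-style estimate) for one draw."""
    rng = random.Random(seed)

    def draw(n):
        true = [1 if rng.random() < alpha else 0 for _ in range(n)]
        pred = []
        for t in true:
            if t == 1:
                pred.append(1 if rng.random() < sensitivity else 0)
            else:
                pred.append(0 if rng.random() < specificity else 1)
        return true, pred

    _, pred_target = draw(n_target)      # true labels of the target set are unused
    true_test, pred_test = draw(n_test)  # small labelled sample
    uncorrected = sum(pred_target) / n_target
    corrected = calibration_estimate(pred_target, pred_test, true_test)
    return uncorrected, corrected


if __name__ == "__main__":
    print(expected_predicted_fraction(0.2, 0.9, 0.9))  # about 0.26, not 0.2
    print(simulate())
```

Under these assumed values, a classifier that is 90% accurate on both classes is expected to report a share of about 0.26 for a class whose true share is 0.2, so simply counting predictions overstates the class by roughly 0.06. The correction trades that bias for extra variance from the small test sample, which is why the abstract compares estimators by MSE.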
Document type Conference contribution
Language English
Related publication Comparing correction methods to reduce misclassification bias
Published at http://bnaic.liacs.leidenuniv.nl/bnaic2020proceedings.pdf
Downloads
Meertens (Accepted author manuscript)