Comparing correction methods to reduce misclassification bias

Open Access
Authors
Publication date 2021
Host editors
  • M. Baratchi
  • L. Cao
  • W.A. Kosters
  • J. Lijffijt
  • J.N. van Rijn
  • F.W. Takes
Book title Artificial Intelligence and Machine Learning
Book subtitle 32nd Benelux Conference, BNAIC/Benelearn 2020, Leiden, The Netherlands, November 19–20, 2020: revised selected papers
ISBN
  • 9783030766399
ISBN (electronic)
  • 9783030766405
Series Communications in Computer and Information Science
Event 32nd Benelux Conference on Artificial Intelligence and Belgian-Dutch Conference on Machine Learning, BNAIC/Benelearn 2020
Pages (from-to) 64-90
Publisher Cham: Springer
Organisations
  • Faculty of Economics and Business (FEB) - Amsterdam School of Economics Research Institute (ASE-RI)
  • Faculty of Economics and Business (FEB)
Abstract
When applying supervised machine learning algorithms to classification, the classical goal is to reconstruct the true labels as accurately as possible. However, if the predictions of an accurate algorithm are aggregated, for example by counting the predictions of a single class label, the result is often still statistically biased. This bias, referred to as misclassification bias, impedes the implementation of machine learning algorithms in the context of official statistics. In this paper, we focus on reducing the misclassification bias of binary classification algorithms by employing five existing estimation techniques, or estimators. As reducing bias might increase variance, the estimators are evaluated by their mean squared error (MSE). For three of the estimators, we are the first to derive an expression for the MSE in finite samples, complementing the existing asymptotic results in the literature. These expressions are then used to compute decision boundaries numerically, indicating under which conditions each estimator is optimal, i.e., has the lowest MSE. Our main conclusion is that the calibration estimator performs best in most applications. Moreover, the calibration estimator is unbiased and significantly reduces the MSE compared to that of the uncorrected aggregated predictions, supporting the use of machine learning in the context of official statistics.
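The calibration approach highlighted in the abstract can be sketched as follows. This is a minimal illustration, not the paper's derivation: it assumes the calibration estimator reweights the aggregated predictions by calibration probabilities P(true = 1 | predicted = j) estimated on a labelled test set; all function and variable names are illustrative.

```python
import numpy as np

def calibration_estimate(pred_labels, test_pred, test_true):
    """Estimate the population fraction of positives from binary
    predictions, correcting for misclassification bias.

    Each predicted class j is weighted by the fraction of truly
    positive items among test-set items that received prediction j.
    All inputs are 0/1 NumPy arrays.
    """
    # Calibration probabilities P(true = 1 | predicted = j), j in {0, 1},
    # estimated on the labelled test set.
    p1_given = np.array([test_true[test_pred == j].mean() for j in (0, 1)])
    # Share of population items predicted as class 0 and class 1.
    alpha = np.array([(pred_labels == j).mean() for j in (0, 1)])
    # Corrected estimate of the fraction of true positives.
    return float(alpha @ p1_given)
```

With a perfect classifier the calibration probabilities are 0 and 1, so the estimate reduces to the raw predicted share; with an imperfect classifier the reweighting removes the systematic over- or under-count that raw aggregation would produce.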
Document type Conference contribution
Language English
Related publication Comparing correction methods to reduce misclassification bias
Published at https://doi.org/10.1007/978-3-030-76640-5_5
Published at http://bnaic.liacs.leidenuniv.nl/bnaic2020proceedings.pdf
Downloads
Meertens (Accepted author manuscript)