Comparing correction methods to reduce misclassification bias


Authors	K. Kloos Q. Meertens S. Scholtus J. Karch
Publication date	2020
Host editors	L. Cao W. Kosters J. Lijffijt
Book title	BNAIC/BeNeLearn 2020
Book subtitle	proceedings : Leiden, the Netherlands, November 19-20, 2020
Event	32nd Benelux Conference on Artificial Intelligence and Belgian-Dutch Conference on Machine Learning, BNAIC/Benelearn 2020
Pages (from-to)	103-129
Publisher	Leiden: Universiteit Leiden
Organisations	Faculty of Economics and Business (FEB) - Amsterdam School of Economics Research Institute (ASE-RI) Faculty of Economics and Business (FEB)
Abstract	When applying supervised machine learning algorithms to classification, the classical goal is to reconstruct the true labels as accurately as possible. However, if the predictions of an accurate algorithm are aggregated, for example by counting the predictions of a single class label, the result is often still statistically biased. Implementing machine learning algorithms in the context of official statistics is therefore impeded. The statistical bias that occurs when aggregating the predictions of a machine learning algorithm is referred to as misclassification bias. In this paper, we focus on reducing the misclassification bias of binary classification algorithms by employing five existing estimation techniques, or estimators. As reducing bias might increase variance, the estimators are evaluated by their mean squared error (MSE). For three of the estimators, we are the first to derive an expression for the MSE in finite samples, complementing the existing asymptotic results in the literature. The expressions are then used to compute decision boundaries numerically, indicating under which conditions each of the estimators is optimal, i.e., has the lowest MSE. Our main conclusion is that the calibration estimator performs best in most applications. Moreover, the calibration estimator is unbiased and it significantly reduces the MSE compared to that of the uncorrected aggregated predictions, supporting the use of machine learning in the context of official statistics.
Document type	Conference contribution
Language	English
Related publication	Comparing correction methods to reduce misclassification bias
Published at	http://bnaic.liacs.leidenuniv.nl/bnaic2020proceedings.pdf
Downloads	Meertens (Accepted author manuscript) Comparing correction methods to reduce misclassification bias (Final published version)
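
The misclassification bias described in the abstract can be illustrated with a small worked example. The sketch below is not taken from the paper: the function names, the confusion-count arguments, and the numbers are hypothetical, and the calibration formula shown is one common form (predicted fractions reweighted by the share of true positives among predicted positives and among predicted negatives, both estimated on a labelled test set). It assumes a binary classifier with known sensitivity and specificity.

```python
def expected_classify_and_count(alpha, sensitivity, specificity):
    """Expected fraction of units predicted as class 1 when the true
    fraction is alpha (uncorrected aggregation of predictions)."""
    return alpha * sensitivity + (1 - alpha) * (1 - specificity)


def calibration_estimate(pred_fraction, n11, n01, n10, n00):
    """Calibration-style correction (assumed form, see lead-in).

    n_ij is the number of labelled test units with true class i and
    predicted class j. pred_fraction is the fraction of the target
    population that the classifier predicts as class 1.
    """
    # Share of true class 1 among units predicted as class 1.
    share_true_given_pred1 = n11 / (n11 + n01)
    # Share of true class 1 among units predicted as class 0.
    share_true_given_pred0 = n10 / (n10 + n00)
    return (pred_fraction * share_true_given_pred1
            + (1 - pred_fraction) * share_true_given_pred0)


if __name__ == "__main__":
    alpha = 0.10  # hypothetical true fraction of class 1
    uncorrected = expected_classify_and_count(alpha, 0.90, 0.90)
    # Idealised test set of 1000 units that mirrors the population.
    corrected = calibration_estimate(uncorrected, n11=90, n01=90, n10=10, n00=810)
    print(round(uncorrected, 4), round(corrected, 4))
```

With these hypothetical numbers, a classifier that is 90% accurate on both classes yields an expected uncorrected fraction of about 0.18 against a true fraction of 0.10, while the correction recovers about 0.10. In practice the test-set counts are random, which is why the abstract evaluates the estimators by their MSE rather than by bias alone.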

