ChatGPT-4 Outperforms Experts and Crowd Workers in Annotating Political Twitter Messages with Zero-Shot Learning

Open Access
Authors
Publication date 13-04-2023
Edition v1
Number of pages 5
Publisher ArXiv
Organisations
  • Faculty of Social and Behavioural Sciences (FMG) - Amsterdam Institute for Social Science Research (AISSR)
  • Interfacultary Research - Institute for Logic, Language and Computation (ILLC)
Abstract
This paper assesses the accuracy, reliability, and bias of the Large Language Model (LLM) ChatGPT-4 on the text analysis task of classifying the political affiliation of a Twitter poster based on the content of a tweet. The LLM is compared against manual annotation by both expert classifiers and crowd workers, which are generally considered the gold standard for such tasks. We use Twitter messages from United States politicians during the 2020 election, providing a ground truth against which to measure accuracy. The paper finds that ChatGPT-4 achieves higher accuracy, higher reliability, and equal or lower bias than the human classifiers. The LLM is able to correctly annotate messages that require reasoning on the basis of contextual knowledge and inferences about the author’s intentions—traditionally seen as uniquely human abilities. These findings suggest that LLMs will have a substantial impact on the use of textual data in the social sciences, by enabling interpretive research at scale.
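The zero-shot setup described in the abstract—prompting a chat model with a single tweet and asking for the author's likely party, with no labeled examples in the prompt—can be sketched as follows. The prompt wording and the `parse_label` helper are illustrative assumptions, not the authors' exact prompt or parsing procedure.

```python
def build_messages(tweet: str) -> list[dict]:
    """Build a zero-shot chat prompt: the task is described in natural
    language and no labeled examples are provided."""
    return [
        {"role": "system",
         "content": "You classify the political affiliation of tweet authors."},
        {"role": "user",
         "content": ("Based only on this tweet from a US politician during the "
                     "2020 election, is the author a Democrat or a Republican? "
                     f"Answer with one word.\n\nTweet: {tweet}")},
    ]

def parse_label(answer: str) -> str:
    """Map a free-text model reply onto one of the two target labels."""
    text = answer.lower()
    if "democrat" in text:
        return "Democrat"
    if "republican" in text:
        return "Republican"
    return "Unknown"

# In an actual annotation run, the messages would be sent to a chat model
# (e.g. via an API client) and each reply passed through parse_label; the
# resulting labels can then be scored against the politicians' known parties.
```

Accuracy, reliability, and bias as studied in the paper would then be computed by comparing these parsed labels against the ground-truth party of each politician, and against the expert and crowd-worker annotations of the same tweets.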
Document type Preprint
Language English
Published at https://doi.org/10.48550/arXiv.2304.06588