Can we use automated approaches to measure the quality of online political discussion? How to (not) measure interactivity, diversity, rationality, and incivility in online comments to the news

Open Access
Publication date 2026
Journal Communication Methods and Measures
Volume 20
Pages 1-25
Organisations
  • Faculty of Social and Behavioural Sciences (FMG) - Amsterdam School of Communication Research (ASCoR)
Abstract
This article explores the (in)ability of automated tools to measure the deliberative quality of online user comments against the standards set out by Habermas: interactivity, diversity, rationality, and (in)civility. Using a stratified sample of manually coded comments (n = 3,862) responding to news videos on YouTube and Twitter, we examined the performance of rule-based measures (i.e., dictionaries), machine-learning classifiers (conventional and transformer-based), and measurements by generative AI (Llama 3.1, GPT-4o, GPT-4T). We present results for over 50 metrics side by side to judge the opportunity costs of choosing one method over another. The results reveal substantial variation across groups of models. Overall, our expectation that more modern methods (transformers and generative AI) outperform older, simpler ones was confirmed. However, the absolute differences between these model groups depended strongly on the measured concept, and we observed considerable variance in performance among models within the same group. We provide recommendations for future research that balance ease of use against the performance of automated measurements, along with important cautions to consider.
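
As an illustration of the benchmarking approach described in the abstract, the sketch below shows one way automated labels could be scored against manually coded gold labels. It is a minimal Python example, assuming scikit-learn is installed; the variable names, labels, and data are hypothetical and not taken from the article.

# Minimal sketch (not from the article): scoring automated incivility
# labels against manual codes. Data and names are hypothetical.
from sklearn.metrics import classification_report, cohen_kappa_score

gold = ["civil", "uncivil", "civil", "civil", "uncivil"]    # manual coding
pred = ["civil", "uncivil", "uncivil", "civil", "uncivil"]  # model output

# Per-class precision, recall, and F1, plus overall accuracy.
print(classification_report(gold, pred, digits=3))

# Chance-corrected agreement between the model and the human coder.
print("Cohen's kappa:", round(cohen_kappa_score(gold, pred), 3))
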
Document type Article
Language English
DOI https://doi.org/10.1080/19312458.2025.2553300
Other links https://www.scopus.com/pages/publications/105016719006