URLs can facilitate machine learning classification of news stories across languages and contexts

E. de León; S. Vermeer; D. Trilling

doi:https://doi.org/10.5117/CCR2023.2.4.DELE

Authors	E. de León; S. Vermeer; D. Trilling
Publication date	2023
Journal	Computational Communication Research
Volume | Issue number	5 | 2
Number of pages	27
Organisations	Faculty of Social and Behavioural Sciences (FMG) - Amsterdam School of Communication Research (ASCoR)
Abstract	Comparative scholars studying political news content at scale face the challenge of addressing multiple languages. While many train individual supervised machine learning classifiers for each language, this is a costly and time-consuming process. We propose that instead of relying on thematic labels generated by manual coding, researchers can use ‘distant’ labels created by cues in article URLs. Sections reflected in URLs (e.g., nytimes.com/politics/) can therefore help create training material for supervised machine learning classifiers. Using cues provided by news media organizations, such an approach allows for efficient political news identification at scale while facilitating implementation across languages. Using a dataset of approximately 870,000 URLs of news-related content from four countries (Italy, Germany, Netherlands, and Poland), we test this method by providing a comparison to ‘classical’ supervised machine learning and a multilingual BERT model, across four news topics. Our results suggest that the use of URL section cues to distantly annotate texts provides a cheap and easy-to-implement way of classifying large volumes of news texts that can save researchers many valuable resources without having to sacrifice quality.
Document type	Article
Language	English
Published at	https://doi.org/10.5117/CCR2023.2.4.DELE (Final published version)
Other links	https://www.scopus.com/pages/publications/85173004156
Downloads	CCR2023.2.4.DELE (Final published version)
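
The distant-labelling idea described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `SECTION_TO_TOPIC` mapping, the topic names, the function `distant_label`, and the example URLs under example.org are all assumptions introduced here, since the record does not list the four topics or the section cues that were used.

```python
from urllib.parse import urlparse

# Hypothetical section cues mapped to topic labels (assumed for illustration;
# the actual cues and topics are not given in this record).
SECTION_TO_TOPIC = {
    "politics": "politics",
    "politik": "politics",
    "politica": "politics",
    "politiek": "politics",
    "polityka": "politics",
    "sport": "sports",
    "sports": "sports",
}


def distant_label(url):
    """Return a topic label if a path segment matches a known cue, else None."""
    path = urlparse(url).path.lower()
    for segment in path.split("/"):
        if segment in SECTION_TO_TOPIC:
            return SECTION_TO_TOPIC[segment]
    return None


# Example URLs (made up) standing in for a scraped URL list.
urls = [
    "https://nytimes.com/politics/some-story",
    "https://example.org/sport/match-report",
    "https://example.org/2023/05/01/untitled",
]
labels = [distant_label(u) for u in urls]
```

Articles whose URLs yield a label could then serve as training material for a supervised classifier, while URLs without a recognisable section cue (here, the third one) would remain unlabelled.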