No Longer Lost in Translation: Evidence that Google Translate Works for Comparative Bag-of-Words Text Applications

E. de Vries; M. Schoonvelde; G. Schumacher

DOI: https://doi.org/10.31219/osf.io/cuxha

Authors	E. de Vries; M. Schoonvelde; G. Schumacher
Publication date	10-2018
Journal	Political Analysis
Volume | Issue number	26 | 4
Pages (from-to)	417-430
Organisations	Faculty of Social and Behavioural Sciences (FMG) - Amsterdam Institute for Social Science Research (AISSR)
Abstract	Automated text analysis allows researchers to analyze large quantities of text. Yet comparative researchers face a big challenge: people in different countries speak different languages. To address this issue, some analysts have suggested using Google Translate to convert all texts into English before starting the analysis (Lucas et al. 2015). But in doing so, do we get lost in translation? This paper evaluates the usefulness of machine translation for bag-of-words models, such as topic models. We use the Europarl dataset and compare term-document matrices (TDMs) as well as topic model results from gold-standard translated text and machine-translated text. We evaluate results at both the document and the corpus level. We first find TDMs for both text corpora to be highly similar, with minor differences across languages. What is more, we find considerable overlap in the set of features generated from human-translated and machine-translated texts. With regard to LDA topic models, we find topical prevalence and topical content to be highly similar, again with only small differences across languages. We conclude that Google Translate is a useful tool for comparative researchers when using bag-of-words text models.
Document type	Article
Note	With supplemental materials
Language	English
Related dataset	Replication Data for: No Longer Lost in Translation: Evidence that Google Translate Works for Comparative Bag-of-Words Text Applications
Published at	https://doi.org/10.31219/osf.io/cuxha (Submitted manuscript); https://doi.org/10.1017/pan.2018.26 (Final published version)
Downloads	Translation_paper (Submitted manuscript); no-longer-lost-in-translation-evidence-that-google-translate-works-for-comparative-bag-of-words-text-applications (Final published version)
Supplementary materials	S1047198718000268sup001
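
The abstract describes two comparisons between human-translated and machine-translated corpora: document-level similarity of the term-document matrices and overlap in the generated feature sets. A minimal sketch of that idea follows. It assumes cosine similarity between matching document rows and Jaccard overlap of the two vocabularies; the abstract does not state the paper's exact metrics or preprocessing, so these are illustrative choices, not the authors' method. The function names (`tokenize`, `term_document_matrix`, `compare_corpora`) are hypothetical, and the example sentences are invented toy data, not Europarl text.

```python
from collections import Counter
import math
import re


def tokenize(text):
    # Lowercase and keep alphabetic tokens only (a simplifying assumption;
    # real pipelines usually also remove stopwords and may stem).
    return re.findall(r"[a-z]+", text.lower())


def term_document_matrix(docs, vocab):
    # One row per document, one column per vocabulary term (raw counts).
    rows = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        rows.append([counts[term] for term in vocab])
    return rows


def cosine(u, v):
    # Cosine similarity between two count vectors; 0.0 if either is empty.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def compare_corpora(human_docs, machine_docs):
    """Hypothetical helper: per-document cosine similarity plus vocabulary overlap.

    Assumes human_docs[i] and machine_docs[i] are translations of the same source.
    """
    human_vocab = {t for d in human_docs for t in tokenize(d)}
    machine_vocab = {t for d in machine_docs for t in tokenize(d)}
    # A shared column space makes matching rows directly comparable.
    vocab = sorted(human_vocab | machine_vocab)
    h = term_document_matrix(human_docs, vocab)
    m = term_document_matrix(machine_docs, vocab)
    doc_sims = [cosine(a, b) for a, b in zip(h, m)]
    # Jaccard overlap: shared features divided by all features.
    overlap = len(human_vocab & machine_vocab) / len(human_vocab | machine_vocab)
    return doc_sims, overlap


# Toy example: each machine translation differs from its reference by one word.
human = ["The committee approved the budget.", "Fishing quotas remain a concern."]
machine = ["The committee adopted the budget.", "Fishing quotas remain a worry."]
sims, overlap = compare_corpora(human, machine)
print([round(s, 3) for s in sims], round(overlap, 3))  # [0.857, 0.8] 0.636
```

Building both matrices over the union vocabulary is the design choice that lets a single row-wise comparison capture document-level similarity, while the set overlap summarizes agreement at the corpus level.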