Neural Language Models for Nineteenth-Century English (dataset; language model zoo)

Contributors	Kasra Hosseini; Kaspar Beelen; Giovanni Colavizza; Mariona Coll Ardanuy
Publication date	23-05-2021
Description	This dataset contains four types of neural language models trained on a large historical dataset of English-language books published between 1760 and 1900, comprising ~5.1 billion tokens. The language model architectures include static models (word2vec and fastText) and contextualized models (BERT and Flair). For each architecture, we trained a model instance using the whole dataset. Additionally, we trained separate instances on text published before 1850 for the two static models, and four instances covering different time slices for BERT. GitHub repository: https://github.com/Living-with-machines/histLM
Publisher	Zenodo
Organisations	Interfacultary Research - Institute for Logic, Language and Computation (ILLC)
Document type	Dataset
Related publication	Neural Language Models for Nineteenth-Century English
DOI	https://doi.org/10.5281/zenodo.4782245
Other links	https://zenodo.org/record/4782245

UvA-DARE