A Comprehensive Dataset of Citations with Identifiers from English Wikipedia (2023)

Creators
Publication date 22-05-2023
Description This is a dataset of 40,664,485 citations extracted from the February 2023 English Wikipedia dump (https://dumps.wikimedia.org/enwiki/20230220/). The dataset is based purely on information from Wikipedia; labelled and annotated datasets will be added in follow-up versions. The source code used to extract the citations is available at https://github.com/albatros13/wikicite. The code is a fork of an earlier Wikipedia citation extraction project: https://github.com/Harshdeep1996/cite-classifications-wiki.
Publisher Zenodo
Organisations
  • Interfacultary Research - Institute for Logic, Language and Computation (ILLC)
Document type Dataset
Related publication Wikipedia citations: A comprehensive data set of citations with identifiers extracted from English Wikipedia
DOI https://doi.org/10.5281/zenodo.7958486
Other links https://zenodo.org/record/7958486