WN-Salience: A Corpus of News Articles with Entity Salience Annotations

Open Access
Authors
Publication date 2020
Host editors
  • N. Calzolari
  • F. Béchet
  • P. Blache
  • K. Choukri
  • C. Cieri
  • T. Declerck
  • S. Goggi
  • H. Isahara
  • B. Maegaard
  • J. Mariani
  • H. Mazo
  • A. Moreno
  • J. Odijk
  • S. Piperidis
Book title LREC 2020
Book subtitle Twelfth International Conference on Language Resources and Evaluation : May 11-16, 2020, Palais du Pharo, Marseille, France : conference proceedings
ISBN (electronic)
  • 9791095546344
Event 12th International Conference on Language Resources and Evaluation, LREC 2020
Pages (from-to) 2095-2102
Number of pages 8
Publisher Paris: The European Language Resources Association
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract

Entities can be found in various text genres, ranging from tweets and web pages to user queries submitted to web search engines. Existing research either treats all entities in a text as equally important or relies on heuristics to measure their salience. We believe that a key reason for the relatively limited work on entity salience is the lack of appropriate datasets. To support research on entity salience, we present a new dataset, the WikiNews Salience dataset (WN-Salience), which can be used to benchmark tasks such as entity salience detection and salient entity linking. WN-Salience is built on top of Wikinews, a Wikimedia project whose mission is to present reliable news articles. Entities in Wikinews articles are identified by the authors of the articles and are linked to Wikinews categories when they are salient, or to Wikipedia pages otherwise. The dataset is built automatically and consists of approximately 7,000 news articles and 90,000 in-text entity annotations. We compare WN-Salience against existing datasets for the task and analyze their differences. Furthermore, we conduct experiments on entity salience detection; the results demonstrate that WN-Salience is a challenging testbed that is complementary to existing ones.
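
The labeling rule described in the abstract (a link to a Wikinews category marks a salient entity, a link to a Wikipedia page marks a non-salient one) can be illustrated with a minimal sketch. This is not the authors' extraction pipeline. It assumes, for illustration only, that article text is available as wiki markup in which category links look like `[[:Category:Title|anchor]]` and Wikipedia links look like `[[w:Title|anchor]]`; the names `EntityMention` and `label_mentions` are hypothetical.

```python
import re
from typing import List, NamedTuple


class EntityMention(NamedTuple):
    anchor: str    # text shown in the article
    title: str     # linked category or page title, prefix removed
    salient: bool  # True for category links, False for Wikipedia links


# Assumed link syntax: [[target]] or [[target|anchor]].
LINK_RE = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")


def label_mentions(wikitext: str) -> List[EntityMention]:
    """Label each linked entity as salient or not from its link target."""
    mentions = []
    for match in LINK_RE.finditer(wikitext):
        # Drop the leading colon used for inline category links.
        target = match.group(1).strip().lstrip(":")
        lowered = target.lower()
        if lowered.startswith("category:"):
            title, salient = target[len("category:"):].strip(), True
        elif lowered.startswith("w:"):
            title, salient = target[len("w:"):].strip(), False
        else:
            # Other link types carry no salience signal in this sketch.
            continue
        anchor = (match.group(2) or title).strip()
        mentions.append(EntityMention(anchor, title, salient))
    return mentions


sample = "[[:Category:Barack Obama|Obama]] met officials in [[w:Marseille|Marseille]]."
labels = label_mentions(sample)
```

In this sketch, `labels` holds one salient mention ("Obama") and one non-salient mention ("Marseille"). The actual markup conventions on Wikinews may differ from the assumed ones.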

Document type Conference contribution
Language English
Published at https://www.aclweb.org/anthology/2020.lrec-1.257
Other links http://www.lrec-conf.org/proceedings/lrec2020/index.html https://www.scopus.com/pages/publications/85096565032
Downloads
2020.lrec-1.257 (Final published version)