Entity models for trigger-reaction documents

Open Access
Authors
Publication date 2008
Book title Proceedings of the 8th Dutch-Belgian Information Retrieval Workshop (DIR 2008)
ISBN
  • 9789056812829
Event 8th Dutch-Belgian Information Retrieval Workshop (DIR 2008), Maastricht, the Netherlands
Pages (from-to) 1-6
Publisher Maastricht: University of Maastricht
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
We define the notion of an entity model for a special kind of document popular on the web: an article followed by a list of reactions to that article, usually by many authors and usually in reverse chronological order. We call these documents trigger-reaction pairs. The entity model describes which named entities (persons, organizations, locations, products, URLs) are mentioned, their type, how often and where they are mentioned, and it lists all variants referring to the same entity. These models find applications in media analysis, trend watching, entity tracking and marketing.
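As an illustration of the description above, an entity model can be sketched as a small data structure. This is a minimal sketch and not the authors' implementation: the class names, the field names and the section labels ("trigger", "reaction:<index>") are assumptions made for this example, as are the invented person names in the usage note.

```python
from dataclasses import dataclass, field


@dataclass
class Mention:
    surface: str  # the variant exactly as written in the text
    section: str  # where it occurs: "trigger" or "reaction:<index>" (assumed labels)
    offset: int   # character offset within that section


@dataclass
class Entity:
    canonical: str  # normalized form shared by all variants
    etype: str      # "person", "organization", "location", "product" or "url"
    mentions: list = field(default_factory=list)

    @property
    def count(self) -> int:
        """How often the entity is mentioned in the whole document."""
        return len(self.mentions)

    @property
    def variants(self) -> set:
        """All surface forms that refer to this entity."""
        return {m.surface for m in self.mentions}


class EntityModel:
    """Entity model of one trigger-reaction document (hypothetical layout)."""

    def __init__(self):
        self.entities = {}  # canonical form -> Entity

    def add(self, canonical, etype, surface, section, offset):
        """Record one mention under its canonical entity and return the entity."""
        entity = self.entities.setdefault(canonical, Entity(canonical, etype))
        entity.mentions.append(Mention(surface, section, offset))
        return entity
```

Recording a full name found in the trigger and a surname found in the first reaction under the same canonical form, for example `model.add("Jan Jansen", "person", "Jan Jansen", "trigger", 10)` followed by `model.add("Jan Jansen", "person", "Jansen", "reaction:0", 3)`, yields one entity with two mentions and two variants.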
The two main challenges in creating entity models are 1) detecting the entities and 2) normalizing all variants to the same correct canonical form. This task is particularly hard for user-generated content on the web, of which our reactions are an example.
We use an algorithm for named entity recognition and normalization (NEN) tailor-made for trigger-reaction documents. It achieves high recall and reasonable precision by exploiting two simple facts: 1) incomplete entities in reactions often occur in complete form in the trigger and 2) entities mentioned in news articles on the web often have a Wikipedia page.
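The two facts can be illustrated with a short normalization sketch. It is not the NEN algorithm itself, whose details are not given here: the function name, the token-subset matching rule and the use of a local set of titles in place of a real Wikipedia lookup are all assumptions made for this example.

```python
def normalize_mention(mention, trigger_entities, wiki_titles):
    """Return a canonical form for `mention`, or None if neither heuristic applies.

    trigger_entities: complete entity names found in the trigger article.
    wiki_titles: a local set of page titles standing in for a Wikipedia lookup
                 (an assumption; no network access is made).
    """
    tokens = mention.lower().split()
    if not tokens:
        return None

    # Fact 1: an incomplete entity in a reaction often occurs complete in the
    # trigger. Accept the trigger entity only if exactly one contains all tokens.
    candidates = [
        full for full in trigger_entities
        if all(tok in full.lower().split() for tok in tokens)
    ]
    if len(candidates) == 1:
        return candidates[0]

    # Fact 2: entities in news articles often have a Wikipedia page. Fall back
    # to a case-insensitive match against the known page titles.
    for title in wiki_titles:
        if title.lower() == mention.lower():
            return title

    return None
```

With a trigger that mentions "Jan Jansen" (an invented name), the reaction mention "Jansen" resolves to the full name. If the trigger also mentioned a second person with the same surname, the sketch would decline to choose between them and fall through to the title lookup.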
This paper describes our experience in creating and using entity models on a corpus of 56,449 Dutch trigger-reaction documents, with a total of 616,715 reactions, collected from the web between November 11, 2006 and February 5, 2008. It accompanies an earlier article from our group that focused on a system evaluation of the NEN algorithm.
Document type Conference contribution
Published at http://ilps.science.uva.nl/PoliticalMashup/wildersdata/dir2008-mamama.pdf