- Entity models for trigger-reaction documents
- 8th Dutch-Belgian Information Retrieval Workshop (DIR 2008), Maastricht, the Netherlands
- Book/source title: Proceedings of the 8th Dutch-Belgian Information Retrieval Workshop (DIR 2008)
- Publisher: Maastricht: University of Maastricht
- Document type: Conference contribution
- Faculty of Science (FNWI)
- Informatics Institute (IVI)
We define the notion of an entity model for a special kind of document popular on the web: an article followed by a list of reactions to that article, usually written by many authors and usually in reverse chronological order. We call these documents trigger-reaction pairs. The entity model describes which named entities (persons, organizations, locations, products, URLs) are mentioned, their type, how often and where they are mentioned, and it lists all variants that refer to the same entity. These models find applications in media analysis, trend watching, entity tracking and marketing.
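To make the definition concrete, here is a minimal Python sketch of one possible in-memory representation of an entity model. The class names `EntityEntry` and `EntityModel`, the `add_mention` method, and the coarse "trigger"/"reaction" location labels are hypothetical choices for illustration, not the authors' data format; the example mentions are likewise only illustrative.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class EntityEntry:
    """One named entity in a trigger-reaction document (hypothetical layout)."""
    canonical: str  # normalized (canonical) name of the entity
    etype: str      # "person", "organization", "location", "product" or "url"
    variants: set[str] = field(default_factory=set)           # all surface forms seen
    locations: Counter[str] = field(default_factory=Counter)  # mentions per document part

    @property
    def count(self) -> int:
        # Total number of mentions over the trigger and all reactions.
        return sum(self.locations.values())


class EntityModel:
    """Maps each canonical entity name to its EntityEntry."""

    def __init__(self) -> None:
        self.entries: dict[str, EntityEntry] = {}

    def add_mention(self, surface: str, canonical: str, etype: str, where: str) -> None:
        # `where` is "trigger" or "reaction" in this sketch (an assumption).
        entry = self.entries.setdefault(canonical, EntityEntry(canonical, etype))
        entry.variants.add(surface)
        entry.locations[where] += 1


# Illustrative usage: a full name in the trigger, a surname-only variant in a reaction.
model = EntityModel()
model.add_mention("Jan Peter Balkenende", "Jan Peter Balkenende", "person", "trigger")
model.add_mention("Balkenende", "Jan Peter Balkenende", "person", "reaction")
entry = model.entries["Jan Peter Balkenende"]
print(entry.count, sorted(entry.variants))  # 2 ['Balkenende', 'Jan Peter Balkenende']
```

Keying the model on the canonical form is what lets all variants of one entity accumulate in a single entry, so the mention count and the variant list come out of the same record.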
The two main challenges in creating entity models are 1) detecting the entities and 2) normalizing all variants of an entity to the same correct canonical form. This task is particularly hard for user-generated content on the web, such as the reactions in our documents.
We use an algorithm for named entity recognition and normalization (NEN) tailor-made for trigger-reaction documents. It achieves high recall and reasonable precision by exploiting two simple facts: 1) incomplete entities in reactions often occur in complete form in the trigger, and 2) entities mentioned in news articles on the web often have a Wikipedia page.
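The two facts suggest a simple normalization cascade. The Python sketch below is one hedged reading of it, not the published NEN algorithm: a mention from a reaction is first matched exactly against the entities found in the trigger, then completed to the unique trigger entity whose tokens contain it as a contiguous run (fact 1), and finally accepted if it is a Wikipedia page title (fact 2). The function names `contains_run` and `normalize`, and the `wiki_titles` set that stands in for a real Wikipedia title lookup, are hypothetical.

```python
from __future__ import annotations


def contains_run(short: list[str], long: list[str]) -> bool:
    """True if `short` occurs as a contiguous token run inside `long`."""
    n = len(short)
    return any(long[i:i + n] == short for i in range(len(long) - n + 1))


def normalize(mention: str, trigger_entities: list[str], wiki_titles: set[str]) -> str | None:
    """Map a reaction mention to a canonical form, or None if unresolved.

    Matching is case-sensitive and whitespace-tokenized in this sketch.
    """
    tokens = mention.split()
    if not tokens:
        return None
    # The mention already appears verbatim among the trigger's entities.
    if mention in trigger_entities:
        return mention
    # Fact 1: complete an incomplete mention from the trigger,
    # but only when exactly one distinct trigger entity contains it.
    candidates = {e for e in trigger_entities if contains_run(tokens, e.split())}
    if len(candidates) == 1:
        return next(iter(candidates))
    # Fact 2: otherwise, accept the mention if it names a Wikipedia page.
    if mention in wiki_titles:
        return mention
    return None


trigger = ["Jan Peter Balkenende", "Wouter Bos"]
wiki = {"Ajax", "Philips"}  # hypothetical stand-in for Wikipedia page titles
for m in ["Balkenende", "Bos", "Ajax", "Pietje"]:
    print(m, "->", normalize(m, trigger, wiki))
# Balkenende -> Jan Peter Balkenende
# Bos -> Wouter Bos
# Ajax -> Ajax
# Pietje -> None
```

Requiring a unique trigger candidate trades some recall for precision: an ambiguous first name such as "Peter" falls through to the Wikipedia check instead of being guessed.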
This article describes our experience in creating and using entity models on a corpus of 56,449 Dutch trigger-reaction documents, containing a total of 616,715 reactions, collected from the web between November 11, 2006 and February 5, 2008. This paper accompanies an earlier article from our group that focused on a system evaluation of the NEN algorithm.