- Personal name resolution of web people search
- WWW2008 workshop NLP Challenges in the Information Explosion Era (NLPIX 2008), Beijing, China
- Book/source title: NLPIX 2008: WWW2008 workshop NLP Challenges in the Information Explosion Era: Proceedings
- Pages (from-to)
- Document type: Conference contribution
- Faculty of Science (FNWI)
- Informatics Institute (IVI)
Disambiguating personal names in a set of documents (such as a set of web pages returned in response to a person name) is a challenging task. In this paper, we explore the extent to which the "cluster hypothesis" holds for this task (i.e., that similar documents tend to represent the same person). We explore two clustering techniques, which use either (1) term-based matching (single pass clustering) or (2) semantic-based matching (Probabilistic Latent Semantic Analysis). We compare and contrast these strategies and provide strong evidence to suggest that the hypothesis holds for the former. In fact, on the new evaluation platform of the SemEval 2007 Web People Search task, we show that single pass clustering with standard IR document representations fits well with the assumptions about the data and the task, and yields state-of-the-art performance.
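The abstract names single pass clustering with term-based matching but gives no implementation details, so the following is only a minimal sketch of the general technique, not the authors' system. The TF-IDF weighting, cosine similarity, summed-vector centroids, the `threshold` value, and the function names `tfidf_vectors`, `cosine`, and `single_pass_cluster` are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build L2-normalised TF-IDF vectors (as dicts) from whitespace-tokenised docs."""
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        tf = Counter(tokens)
        # Smoothed IDF (an assumption): terms found in every document keep weight 1.
        vec = {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_pass_cluster(docs, threshold=0.3):
    """Assign each document, in order, to the most similar existing cluster,
    or start a new cluster when no similarity reaches the threshold."""
    centroids = []  # one summed member vector per cluster
    labels = []
    for vec in tfidf_vectors(docs):
        best, best_sim = None, threshold
        for idx, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim >= best_sim:
                best, best_sim = idx, sim
        if best is None:
            # No cluster is similar enough: this document seeds a new one.
            centroids.append(dict(vec))
            labels.append(len(centroids) - 1)
        else:
            # Fold the document into the winning cluster's centroid.
            for t, w in vec.items():
                centroids[best][t] = centroids[best].get(t, 0.0) + w
            labels.append(best)
    return labels
```

Calling `single_pass_cluster(pages)` on a list of page texts retrieved for one name returns one cluster label per page, where each label is meant to stand for a distinct person. Because documents are processed once and in order, the result depends on both the input order and the chosen threshold.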