Personal name resolution of web people search

Open Access
Authors
Publication date 2008
Event WWW2008 workshop NLP Challenges in the Information Explosion Era (NLPIX 2008), Beijing, China
Number of pages 10
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
Disambiguating personal names in a set of documents (such as a set of web pages returned in response to a person name query) is a difficult task. In this paper, we explore the extent to which the "cluster hypothesis" holds for this task (i.e., that similar documents tend to represent the same person). We examine two clustering techniques that use either (1) term-based matching (single pass clustering) or (2) semantic-based matching (Probabilistic Latent Semantic Analysis). We compare and contrast these strategies and provide strong evidence to suggest that the hypothesis holds for the former. In fact, on the new evaluation platform of the SemEval 2007 Web People Search task, we show that single pass clustering with a standard IR document representation fits well with the assumptions about the data and the task, and yields state-of-the-art performance.
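The term-based strategy named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the raw term-frequency vectors, the cosine similarity measure, the summed-centroid update, the 0.5 threshold, and the names `tf_vector`, `cosine`, and `single_pass_cluster` are all assumptions chosen for illustration. Each document is visited once and either joins the most similar existing cluster or starts a new one.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector (assumed representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_pass_cluster(docs, threshold=0.5):
    """Assign each document, in order, to the most similar cluster.

    A document joins the best-matching cluster if its similarity to that
    cluster's centroid reaches `threshold` (a hypothetical value);
    otherwise it starts a new cluster. Returns lists of document indices.
    """
    clusters = []  # each entry: {"centroid": Counter, "members": [int]}
    for i, text in enumerate(docs):
        vec = tf_vector(text)
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(i)
            best["centroid"].update(vec)  # centroid kept as summed term counts
        else:
            clusters.append({"centroid": Counter(vec), "members": [i]})
    return [cluster["members"] for cluster in clusters]


pages = [
    "john smith professor of physics at university",
    "physics professor john smith university lecture",
    "john smith plumber plumbing services london",
]
groups = single_pass_cluster(pages, threshold=0.5)
```

In this toy input the two pages about a physics professor share most of their terms and fall into one cluster, while the plumber page shares only the name tokens and starts its own. The result depends on document order and on the threshold, which is typical of single pass approaches.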
Document type Paper
Language English
Published at http://staff.science.uva.nl/~mdr/Publications/Files/nlpix2008-webpeople.pdf
Other links http://www.ra.ethz.ch/cdstore/www2008/www.slis.tsukuba.ac.jp/~fujii/NLPIX2008/index.htm