Link detection in XML documents: What about repeated links?

J. Zhang; J. Kamps

Authors	J. Zhang; J. Kamps
Publication date	2008
Host editors	A. Trotman; S. Geva; J. Kamps
Book title	Proceedings of the SIGIR 2008 Workshop on Focused Retrieval: Held in Singapore, 24 July 2008
ISBN	9780473134686
Event	SIGIR 2008 Workshop on Focused Retrieval (Singapore)
Pages (from-to)	59-66
Publisher	Dunedin: University of Otago, Department of Computer Science
Organisations	Interfacultary Research - Institute for Logic, Language and Computation (ILLC)
Abstract	Link detection is a special case of focused retrieval where potential links between documents have to be detected automatically. The use case, as studied at INEX's Link the Wiki track, is that of a new, orphaned page (here, a structured XML document) for which we need to detect relevant incoming and outgoing links to other pages (here, the INEX Wikipedia collection). We focus on outgoing links and investigate link density, and especially repeated occurrences of links with the same anchor text and destination. We provide an extensive analysis of link density and repetition, and look at parameters like the document's length, the distance between anchor text occurrences, and the frequency of the anchor text within an article. We also conduct experiments trying to determine what should be done with links that are repeated. We describe alternative approaches and compare them against two baselines: the first baseline is to link only once, and the second is to link all candidates. The performance is measured with precision and recall in terms of the total set of discovered links. Our main finding is that, although the overall impact of link repetition is modest, performance can increase by taking an informed approach to link repetition.
Document type	Conference contribution
Published at	http://www.cs.otago.ac.nz/sigirfocus2008/paper_14.pdf
Downloads	297428.pdf