Generating pseudo test collections for learning to rank scientific articles

Authors
Publication date 2012
Host editors
  • T. Catarci
  • P. Forner
  • D. Hiemstra
  • A. Peñas
  • G. Santucci
Book title Information Access Evaluation : Multilinguality, Multimodality, and Visual Analytics
Book subtitle third international conference of the CLEF Initiative, CLEF 2012: Rome, Italy, September 17-20, 2012: proceedings
ISBN
  • 9783642332463
ISBN (electronic)
  • 9783642332470
Series Lecture Notes in Computer Science
Event Third International Conference of the CLEF Initiative, CLEF 2012, Rome, Italy, September 17-20, 2012
Pages (from-to) 42-53
Publisher Heidelberg: Springer
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
Pseudo test collections are automatically generated to provide training material for learning to rank methods. We propose a method for generating pseudo test collections in the domain of digital libraries, where data is relatively sparse but comes with rich annotations. Our intuition is that documents are annotated to make them easier to find for certain information needs. We therefore use these annotations and their associated documents as a source of query-document pairs in which the document is relevant to the query. We investigate how learning to rank performance varies when we use different methods for sampling annotations, and show how our pseudo test collection ranks systems compared to editorial topics with editorial judgements. Our results demonstrate that it is possible to train a learning to rank algorithm on generated pseudo judgements; in some cases, performance is on par with learning on manually obtained ground truth.
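The core idea in the abstract — treating an annotation as a pseudo query and the documents carrying it as that query's relevant set — can be sketched in a few lines. This is a minimal illustration, not the paper's implementation; the names `Doc` and `build_pseudo_judgments` are hypothetical, and the real method additionally samples which annotations to keep.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Doc:
    """A library record: an id plus its human-assigned annotations."""
    doc_id: str
    annotations: list

def build_pseudo_judgments(docs):
    """Treat each annotation as a pseudo query; the documents that
    carry the annotation form its set of pseudo-relevant documents."""
    judgments = defaultdict(set)
    for doc in docs:
        for annotation in doc.annotations:
            judgments[annotation].add(doc.doc_id)
    return judgments

docs = [
    Doc("d1", ["information retrieval", "learning to rank"]),
    Doc("d2", ["learning to rank"]),
]
pairs = build_pseudo_judgments(docs)
# pairs["learning to rank"] == {"d1", "d2"}
```

The resulting query/relevant-document pairs can then stand in for editorial topics and judgements when training a learning to rank model.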
Document type Conference contribution
Language English
Published at https://doi.org/10.1007/978-3-642-33247-0_6