Generating pseudo test collections for learning to rank scientific articles

Authors
Publication date 2012
Host editors
  • T. Catarci
  • P. Forner
  • D. Hiemstra
  • A. Peñas
  • G. Santucci
Book title Information Access Evaluation : Multilinguality, Multimodality, and Visual Analytics
Book subtitle third international conference of the CLEF Initiative, CLEF 2012: Rome, Italy, September 17-20, 2012: proceedings
ISBN
  • 9783642332463
ISBN (electronic)
  • 9783642332470
Series Lecture Notes in Computer Science
Event Third International Conference of the CLEF Initiative, CLEF 2012, Rome, Italy, September 17-20, 2012
Pages (from-to) 42-53
Publisher Heidelberg: Springer
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
Pseudo test collections are automatically generated to provide training material for learning to rank methods. We propose a method for generating pseudo test collections in the domain of digital libraries, where data is relatively sparse but comes with rich annotations. Our intuition is that documents are annotated to make them easier to find for certain information needs. We therefore use these annotations and their associated documents as a source of query-document pairs in which the document is relevant to the query. We investigate how learning to rank performance varies when we use different methods for sampling annotations, and show how our pseudo test collection ranks systems compared to editorial topics with editorial judgements. Our results demonstrate that it is possible to train a learning to rank algorithm on generated pseudo judgements; in some cases, performance is on par with learning on manually obtained ground truth.
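The core idea in the abstract — treating an annotation as a pseudo query and the documents carrying it as that query's relevant set — can be sketched in a few lines. This is a minimal illustration, not the paper's implementation; the names `Doc` and `build_pseudo_judgments` are hypothetical, and the real method additionally samples which annotations to keep.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Doc:
    """A library record: an id plus its human-assigned annotations."""
    doc_id: str
    annotations: list

def build_pseudo_judgments(docs):
    """Treat each annotation as a pseudo query; the documents that
    carry the annotation form its set of pseudo-relevant documents."""
    judgments = defaultdict(set)
    for doc in docs:
        for annotation in doc.annotations:
            judgments[annotation].add(doc.doc_id)
    return judgments

docs = [
    Doc("d1", ["information retrieval", "learning to rank"]),
    Doc("d2", ["learning to rank"]),
]
pairs = build_pseudo_judgments(docs)
# pairs["learning to rank"] == {"d1", "d2"}
```

The resulting query/relevant-document pairs can then stand in for editorial topics and judgements when training a learning to rank model.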
Document type Conference contribution
Language English
Published at https://doi.org/10.1007/978-3-642-33247-0_6