Pseudo Test Collections for Training and Tuning Microblog Rankers

Authors
Publication date 2013
Book title SIGIR '13
Book subtitle Proceedings of the 36th International ACM SIGIR Conference on Research & Development in Information Retrieval: July 28-August 1, 2013, Dublin, Ireland
ISBN (electronic)
  • 9781450320344
  • 9781450324533
Event SIGIR '13
Pages (from-to) 53-62
Publisher New York: ACM
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
Recent years have witnessed a persistent interest in generating pseudo test collections, both for training and evaluation purposes. We describe a method for generating queries and relevance judgments for microblog search in an unsupervised way. Our starting point is this intuition: tweets with a hashtag are relevant to the topic covered by the hashtag, and hence to a suitable query derived from the hashtag. Our baseline method selects all commonly used hashtags and takes all associated tweets as relevance judgments; we then generate a query from these tweets. Next, we generate a timestamp for each query, allowing us to use temporal information in the training process. We then enrich the generation process with knowledge derived from an editorial test collection for microblog search.

We use our pseudo test collections in two ways. First, we tune parameters of a variety of well-known retrieval methods on them. Correlations with parameter sweeps on an editorial test collection are high on average, with a large variance over retrieval algorithms. Second, we use the pseudo test collections as training sets in a learning-to-rank scenario. In many cases, performance comes close to that achieved by training on an editorial test collection. Our results demonstrate the utility of tuning and training microblog search algorithms on automatically generated training material.
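The baseline generation step described in the abstract can be sketched as follows. This is an illustrative reading only, not the authors' implementation: the tweet record shape, the `min_tweets` frequency threshold, the choice of the hashtag text itself as the query, and the earliest tweet time as the query timestamp are all assumptions made for the sketch.

```python
from collections import defaultdict

def build_pseudo_collection(tweets, min_tweets=5):
    """Sketch of hashtag-based pseudo test collection generation:
    group tweets by hashtag; frequently used hashtags become pseudo
    queries, and their tweets become pseudo relevance judgments."""
    by_tag = defaultdict(list)
    for tweet in tweets:
        for tag in tweet["hashtags"]:
            by_tag[tag.lower()].append(tweet)

    collection = []
    for tag, tagged in by_tag.items():
        if len(tagged) < min_tweets:  # keep only commonly used hashtags
            continue
        collection.append({
            # Naive query derivation: the hashtag text itself (assumption;
            # the paper derives queries from the associated tweets).
            "query": tag.lstrip("#"),
            # Illustrative timestamp choice: earliest associated tweet.
            "timestamp": min(t["time"] for t in tagged),
            "relevant_ids": [t["id"] for t in tagged],
        })
    return collection
```

Any downstream tuning or learning-to-rank step would then treat each entry's `query` and `relevant_ids` exactly like an editorial topic with its qrels.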
Document type Conference contribution
Language English
Published at https://doi.org/10.1145/2484028.2484063