Recall Aspects of Transformers for Text Ranking

D. Rau; J. Kamps

Authors	D. Rau; J. Kamps
Publication date	2022
Host editors	I. Soboroff; A. Ellis
Book title	The Thirtieth Text REtrieval Conference (TREC 2021) Proceedings
Series	NIST Special Publication, SP 500-335
Event	30th Text REtrieval Conference
Number of pages	6
Publisher	Gaithersburg, MD: National Institute of Standards and Technology
Organisations	Interfacultary Research - Institute for Logic, Language and Computation (ILLC)
Abstract	This paper documents the University of Amsterdam's participation in the TREC 2021 Deep Learning Track. In addition to providing labeled training data at scale, the other major contribution of the TREC DL track is to avoid the pool bias exhibited by all earlier ad hoc search test collections, which were created by pooling only runs from traditional sparse retrieval systems. However, even in the TREC Deep Learning Track the pools are shallow, and runs have varying but high fractions of unjudged documents. This prompts a deeper analysis of pool coverage over ranks for both a representative traditional approach (i.e., BM25) and a representative neural approach (i.e., the BERT cross-encoder for the passage retrieval task). Our main conclusions are the following. First, we submitted a neural run that specifically looks beyond the documents easily found by traditional models, highlighting the potential of neural models to address recall aspects in addition to the precision aspects prioritized in the TREC Deep Learning Track up to now. Second, we observe high fractions of unjudged documents after the initial ranks for both the 2020 and 2021 data, which may hinder the evaluation of recall-oriented aspects and the reusability of the judgments for runs that did not contribute to the pooling. Third, we observe a gradual decline in the fraction of relevant documents among judged documents for 2020, which is a positive sign against pooling bias, but almost no decrease for 2021. Our general conclusion is that coverage below the guaranteed pooling horizon is far from complete and that analysis of recall aspects must be done with care, but that there is great potential to study these aspects in future editions of the track.
Document type	Conference contribution
Language	English
Published at	https://trec.nist.gov/pubs/trec30/papers/UAmsterdam-DL.pdf
Other links	https://trec.nist.gov/pubs/trec30/trec2021.html
Downloads	UAmsterdam-DL (Final published version)
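
The abstract describes two quantities measured over ranks: the fraction of unjudged documents and the fraction of relevant documents among the judged ones. The following is a minimal Python sketch of how such numbers could be computed from a ranked run and a set of relevance judgments. It is not the authors' code: the function name `coverage_by_rank`, the in-memory dictionary formats, the cutoffs, and the `rel_threshold` parameter are all assumptions made for illustration.

```python
def coverage_by_rank(run, qrels, cutoffs=(10, 100), rel_threshold=1):
    """Summarize pool coverage of a run at several rank cutoffs.

    run:   dict mapping query id -> list of doc ids, best first (assumed format)
    qrels: dict mapping query id -> dict of doc id -> relevance grade (assumed format)
    rel_threshold: lowest grade counted as relevant (an assumption, not from the paper)
    """
    result = {}
    for k in cutoffs:
        total = judged = relevant = 0
        for query_id, ranking in run.items():
            judgments = qrels.get(query_id, {})
            for doc_id in ranking[:k]:
                total += 1
                if doc_id in judgments:
                    judged += 1
                    if judgments[doc_id] >= rel_threshold:
                        relevant += 1
        result[k] = {
            # Share of retrieved documents in the top k that have no judgment.
            "unjudged_fraction": 1 - judged / total if total else 0.0,
            # Share of judged documents in the top k that are relevant.
            "relevant_over_judged": relevant / judged if judged else 0.0,
        }
    return result


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    toy_run = {"q1": ["d1", "d2", "d3", "d4"]}
    toy_qrels = {"q1": {"d1": 2, "d2": 0}}
    print(coverage_by_rank(toy_run, toy_qrels, cutoffs=(2, 4)))
```

Comparing these two values across increasing cutoffs for a traditional run and a neural run is one way to see how quickly judgments thin out below the pooling depth.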