Impact of Tokenization, Pretraining Task, and Transformer Depth on Text Ranking

Open Access
Authors
Publication date 2021
Host editors
  • E.M. Voorhees
  • A. Ellis
Book title The Twenty-Ninth Text REtrieval Conference (TREC 2020) Proceedings
Series NIST Special Publication, SP 1266
Event 29th Text REtrieval Conference, TREC 2020
Number of pages 8
Publisher Gaithersburg, MD: National Institute of Standards and Technology
Organisations
  • Interfacultary Research - Institute for Logic, Language and Computation (ILLC)
  • Faculty of Humanities (FGw) - Amsterdam Institute for Humanities Research (AIHR)
Abstract
This paper documents the University of Amsterdam's participation in the TREC 2020 Deep Learning Track. Rather than aiming to engineer the best-scoring system, our work is motivated by our interest in analysis, informing our understanding of the opportunities and challenges of transformers for text ranking. Specifically, we focus on the passage retrieval task, where we try to answer three sets of questions.
First, transformers use different tokenization than traditional IR approaches such as stemming and lemmatizing, leading to different document representations. What is the effect of modern preprocessing techniques on traditional retrieval algorithms? Our main observation is that the limited vocabulary of the BERT tokenizer affects many long-tail tokens, which leads to large gains in efficiency at the cost of a small decrease in effectiveness.
Second, the effectiveness of transformers results from a self-supervised pre-training task that promotes general language understanding, ignorant of the specific demands of ranking tasks. Can we further correlate queries and relevant passages in the pre-training task? Our main observation is that there is a whole continuum between the original self-supervised training task of BERT and the final interaction ranker, and that isolating ranking-aware pre-training tasks may lead to gains in efficiency (as these pretrained models can be reused for many tasks) as well as gains in effectiveness (in particular when limited data on the target task is available).
Third, transformers combine large sequence lengths with many layers, and it is unclear what this deep semantic modeling adds in the context of ranking. How complex do the models need to be in order to perform well on this task? Our main observation is that the deep layers of BERT lead to some, but relatively modest, gains in performance, and that the exact role of the presumed superior language understanding for search is far from clear.
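The following is a minimal illustrative sketch (not from the paper) of the tokenization contrast discussed in the first question: classical stemming-based preprocessing versus BERT's WordPiece tokenizer, whose fixed vocabulary breaks long-tail terms into subword pieces. The example passage, and the use of the Hugging Face transformers and NLTK libraries, are assumptions for illustration only.

```python
# Sketch (not from the paper): contrast BERT WordPiece tokenization
# with classical stemming-based preprocessing for retrieval.
# Assumes the `transformers` and `nltk` packages are installed.
from transformers import BertTokenizer
from nltk.stem import PorterStemmer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
stemmer = PorterStemmer()

# Hypothetical example passage, chosen to contain a long-tail term.
passage = "Telescoping retrieval pipelines rerank candidate passages efficiently."

# Classical IR-style preprocessing: lowercase, split on whitespace, stem.
stemmed = [stemmer.stem(w.strip(".,").lower()) for w in passage.split()]

# BERT preprocessing: WordPiece tokenization over a fixed ~30k vocabulary;
# rare ("long-tail") terms are split into '##'-prefixed subword pieces.
wordpieces = tokenizer.tokenize(passage)

print("Stemmed terms:   ", stemmed)
print("WordPiece tokens:", wordpieces)
```

Running this shows how a term outside BERT's vocabulary is represented by several subword pieces rather than a single stemmed token, which is the representational difference the paper's first set of questions examines.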
Document type Conference contribution
Language English
Published at https://trec.nist.gov/pubs/trec29/papers/UAmsterdam.DL.pdf
Other links https://trec.nist.gov/pubs/trec29/trec2020.html
Downloads
UAmsterdam.DL (Final published version)