Impact of Tokenization, Pretraining Task, and Transformer Depth on Text Ranking
| Authors | |
|---|---|
| Publication date | 2021 |
| Host editors | |
| Book title | The Twenty-Ninth Text REtrieval Conference (TREC 2020) Proceedings |
| Series | NIST Special Publication, SP 1266 |
| Event | 29th Text REtrieval Conference, TREC 2020 |
| Number of pages | 8 |
| Publisher | Gaithersburg, MD: National Institute of Standards and Technology |
| Organisations | |
| Abstract |
This paper documents the University of Amsterdam’s participation in the TREC 2020 Deep Learning Track. Rather than being motivated by engineering the best-scoring system, our work is motivated by our interest in analysis, informing our understanding of the opportunities and challenges of transformers for text ranking. Specifically, we focus on the passage retrieval task, where we try to answer three sets of questions. First, transformers use different tokenization than traditional IR approaches such as stemming and lemmatization, leading to different document representations. What is the effect of modern preprocessing techniques on traditional retrieval algorithms? Our main observation is that the limited vocabulary of the BERT tokenizer affects many long-tail tokens, which leads to large gains in efficiency at the cost of a small decrease in effectiveness. Second, the effectiveness of transformers is a result of the self-supervised pre-training task promoting general language understanding, ignorant of the specific demands of ranking tasks. Can we further correlate queries and relevant passages in the pre-training task? Our main observation is that there is a whole continuum between the original self-supervised training task of BERT and the final interaction ranker, and that isolating ranking-aware pre-training tasks may lead to gains in efficiency (as these pretrained models can be reused for many tasks) as well as to gains in effectiveness (in particular when limited data on the target task is available).
Third, transformers combine large sequence lengths with many layers, and it is unclear what this deep semantic processing adds in the context of ranking. How complex do the models need to be in order to perform well on this task? Our main observation is that the deep layers of BERT lead to some, but relatively modest, gains in performance, and that the exact role of the presumed superior language understanding for search is far from clear. |
| Document type | Conference contribution |
| Language | English |
| Published at | https://trec.nist.gov/pubs/trec29/papers/UAmsterdam.DL.pdf |
| Other links | https://trec.nist.gov/pubs/trec29/trec2020.html |
| Downloads | UAmsterdam.DL (Final published version) |
| Permalink to this page | |
