CEQE to SQET: A study of contextualized embeddings for query expansion

Open Access
Publication date June 2022
Journal Information Retrieval Journal
Event EUROPEAN CONFERENCE ON INFORMATION RETRIEVAL (ECIR) 2021
Volume 25, Issue 2
Pages (from-to) 184–208
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
In this work, we study recent advances in context-sensitive language models for the task of query expansion. We examine existing and new approaches for lexical, word-based expansion in both unsupervised and supervised settings. For unsupervised models, we study the behavior of the Contextualized Embeddings for Query Expansion (CEQE) model. We introduce a new model, Supervised Contextualized Query Expansion with Transformers (SQET), that performs expansion as a supervised classification task and leverages context in pseudo-relevant results. We study the behavior of these expansion approaches for ad-hoc document and passage retrieval, conducting experiments that combine expansion with both probabilistic retrieval models and neural document ranking models. We evaluate expansion effectiveness on three standard TREC collections: Robust, Complex Answer Retrieval, and Deep Learning. We analyze extrinsic retrieval effectiveness and the intrinsic ability to rank expansion terms, and perform a qualitative analysis of the differences between the methods. We find that CEQE statistically significantly outperforms static embedding-based methods across all three datasets in Recall@1000. Moreover, CEQE outperforms static embedding-based expansion methods on multiple collections (by up to 18% on Robust and 31% on Deep Learning in average precision) and also improves over proven probabilistic pseudo-relevance feedback (PRF) models. SQET outperforms CEQE by 6% in P@20 on the intrinsic term ranking evaluation and is approximately as effective in retrieval performance. Models incorporating both neural and CEQE-based expansion scores achieve gains of up to 5% in P@20 and 2% in AP on Robust over the state-of-the-art transformer-based re-ranking model, Birch.
Document type Article
Note In Special Issue on ECIR 2021.
Language English
Published at https://doi.org/10.1007/s10791-022-09405-y