DiffIR: Exploring Differences in Ranking Models' Behavior

K.M. Jose; T. Nguyen; S. MacAvaney; J. Dalton; A. Yates

doi:https://doi.org/10.1145/3404835.3462784


Authors	K.M. Jose; T. Nguyen; S. MacAvaney; J. Dalton; A. Yates
Publication date	2021
Book title	SIGIR '21
Book subtitle	proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval : July 11-15, 2021, virtual event, Canada
ISBN (electronic)	9781450380379
Event	44th International ACM SIGIR Conference on Research and Development in Information Retrieval
Pages (from-to)	2595-2599
Publisher	New York, NY: Association for Computing Machinery
Organisations	Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract	Understanding and comparing the behavior of retrieval models is a fundamental challenge that requires going beyond examining average effectiveness and per-query metrics, because these do not reveal key differences in how ranking models' behavior impacts individual results. DiffIR is a new open-source web tool to assist with qualitative ranking analysis by visually 'diffing' system rankings at the individual result level for queries where behavior significantly diverges. Using one of several configurable similarity measures, it identifies queries for which the compared models' rankings differ in important ways at the level of individual results, and it provides a visual web interface to compare the rankings side by side. DiffIR additionally supports a model-specific visualization approach based on custom term importance weight files. These support studying the behavior of interpretable models, such as neural retrieval methods that produce document scores based on a similarity matrix or based on a single document passage. Observations from this tool can complement neural probing approaches like ABNIRML to generate quantitative tests. We provide an illustrative use case of DiffIR by studying the qualitative differences between recently developed neural ranking models on a standard TREC benchmark dataset.
Document type	Conference contribution
Language	English
Published at	https://doi.org/10.1145/3404835.3462784
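
The query-selection step described in the abstract (score each query by how similar two systems' rankings are, then surface the least similar ones) can be illustrated with a minimal sketch. This is not DiffIR's code or API: the function names, the in-memory run format, and the use of top-k Jaccard overlap as the similarity measure are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical run format: query id -> document ids ranked best first.
Run = Dict[str, List[str]]


def topk_jaccard(a: List[str], b: List[str], k: int = 10) -> float:
    """Jaccard overlap of the top-k documents of two rankings (1.0 = same set)."""
    top_a, top_b = set(a[:k]), set(b[:k])
    union = top_a | top_b
    if not union:
        return 1.0  # two empty rankings are treated as identical
    return len(top_a & top_b) / len(union)


def most_divergent_queries(
    run_a: Run, run_b: Run, k: int = 10, n: int = 5
) -> List[Tuple[str, float]]:
    """Return the n queries whose rankings are least similar, lowest score first."""
    shared = sorted(set(run_a) & set(run_b))  # only queries present in both runs
    scored = [(qid, topk_jaccard(run_a[qid], run_b[qid], k)) for qid in shared]
    scored.sort(key=lambda pair: pair[1])  # stable sort: ties keep query-id order
    return scored[:n]


# Toy data (invented for this example).
run_a = {"q1": ["d1", "d2", "d3"], "q2": ["d1", "d2", "d3"], "q3": ["d4", "d5", "d6"]}
run_b = {"q1": ["d1", "d2", "d3"], "q2": ["d3", "d7", "d8"], "q3": ["d7", "d8", "d9"]}

print(most_divergent_queries(run_a, run_b, k=3, n=2))
# [('q3', 0.0), ('q2', 0.2)]
```

Set overlap ignores the order of results within the top k; a rank-correlation measure would be the natural substitute when the order of the shared results matters.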