University of Amsterdam at the CLEF 2022 SimpleText Track

Open Access
Authors
  • F. Mostert
  • A. Sampatsing
  • M. Spronk
  • D. Rau
Publication date 2022
Host editors
  • G. Faggioli
  • N. Ferro
  • A. Hanbury
  • M. Potthast
Book title Proceedings of the Working Notes of CLEF 2022 - Conference and Labs of the Evaluation Forum
Book subtitle Bologna, Italy, September 5th to 8th, 2022
Series CEUR Workshop Proceedings
Event 2022 Conference and Labs of the Evaluation Forum, CLEF 2022
Pages (from-to) 2832-2844
Number of pages 13
Publisher Aachen: CEUR-WS
Organisations
  • Interfacultary Research - Institute for Logic, Language and Computation (ILLC)
Abstract

This paper reports on the University of Amsterdam’s participation in the CLEF 2022 SimpleText track. The track’s overall goal, removing barriers that prevent the general public from accessing scientific literature, is of great importance in helping users make sense of a world of misinformation and shallow opinions. We perform preliminary studies within the track’s setup, analyzing text complexity when searching a large set of academic abstracts in the context of popular science topics emerging in the news, with a specific focus on the relation between the topical relevance and the text complexity of the retrieved information. Our main findings are the following. First, we analyzed a large corpus of scientific abstracts and confirmed that these are highly complex on average, but that the variation is large and many abstracts with accessible readability levels exist. Second, we ran retrieval experiments and found that standard search ignores readability, yet filtering on the desired reading level still retains competitive performance while avoiding the retrieval of relevant but incomprehensible results. Third, we ran complexity spotting experiments and found that straightforward lexical complexity or term frequency measures are strong indicators, but have to be combined with the importance of the concept in the broader context of the information request. Fourth, we ran a GPT-2 based text simplification model in a zero-shot way, resulting in conservative rewriting of abstracts that significantly reduces the text complexity. More generally, our results demonstrate that text complexity is an essential aspect to consider for improving non-expert access to scientific information, and open up new routes to develop effective scientific information access technology tailored to the needs of the general public.
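
The second finding, filtering retrieved results on a reading level, can be illustrated with a minimal sketch. This record does not state which readability measure or threshold the authors used, so the sketch assumes the standard Flesch-Kincaid grade level formula with a rough vowel-group syllable heuristic. The function names, the sample documents, and the cut-off of 12 are all hypothetical and chosen only for illustration.

```python
import re


def count_syllables(word):
    # Rough heuristic (an assumption): one syllable per group of
    # consecutive vowels, and at least one syllable per word.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def fkgl(text):
    # Flesch-Kincaid grade level:
    # 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


def filter_by_grade(ranked, max_grade):
    # Keep (doc_id, text) pairs at or below max_grade; the original
    # ranking order is preserved, so relevance ordering is untouched.
    return [(doc_id, text) for doc_id, text in ranked
            if fkgl(text) <= max_grade]


# Hypothetical ranked list, most relevant first.
ranked = [
    ("d1", "We investigate multidimensional heterogeneous representations "
           "utilizing sophisticated probabilistic methodologies."),
    ("d2", "We test how well a search tool finds short texts. "
           "The texts are easy to read."),
]

kept = filter_by_grade(ranked, max_grade=12.0)
print([doc_id for doc_id, _ in kept])
```

Applying the filter after ranking, rather than folding readability into the ranking score, keeps the two signals separate, which makes it easy to see how much relevance is lost at a given reading level.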

Document type Conference contribution
Language English
Published at http://ceur-ws.org/Vol-3180/paper-242.pdf
Other links http://ceur-ws.org/Vol-3180/ https://www.scopus.com/pages/publications/85136996447
Downloads
paper-242 (Final published version)