University of Amsterdam at the CLEF 2022 SimpleText Track

F. Mostert; A. Sampatsing; M. Spronk; D. Rau; J. Kamps

Authors	F. Mostert A. Sampatsing M. Spronk D. Rau J. Kamps
Publication date	2022
Host editors	G. Faggioli N. Ferro A. Hanbury M. Potthast
Book title	Proceedings of the Working Notes of CLEF 2022 - Conference and Labs of the Evaluation Forum
Book subtitle	Bologna, Italy, September 5th to 8th, 2022
Series	CEUR Workshop Proceedings
Event	2022 Conference and Labs of the Evaluation Forum, CLEF 2022
Pages (from-to)	2832-2844
Number of pages	13
Publisher	Aachen: CEUR-WS
Organisations	Interfacultary Research - Institute for Logic, Language and Computation (ILLC)
Abstract	This paper reports on the University of Amsterdam’s participation in the CLEF 2022 SimpleText track. The overall goal of removing barriers that prevent the general public from accessing scientific literature is of great importance to help users make sense of a world of misinformation and shallow opinions. We perform preliminary studies within the track’s setup, analyzing the text complexity of searching a large set of academic abstracts in the context of popular science topics emerging in the news, with a specific focus on the relation between the topical relevance and the text complexity of the retrieved information. Our main findings are the following. First, we analyzed a large corpus of scientific abstracts and confirmed that these are highly complex on average, but that the variation is large and many abstracts with accessible readability levels exist. Second, we ran retrieval experiments and found that standard search ignores readability, yet filtering on the desired reading level still retains competitive performance while avoiding retrieving relevant but incomprehensible results. Third, we ran complexity spotting experiments and found that straightforward lexical complexity or term frequency measures are strong indicators, but have to be combined with the importance of the concept in the broader context of the information request. Fourth, we ran a GPT-2-based text simplification model in a zero-shot way, which yields conservative rewrites of abstracts that significantly reduce the text complexity. More generally, our results demonstrate that text complexity is an essential aspect to consider for improving non-expert access to scientific information, and open up new routes to develop effective scientific information access technology tailored to the needs of the general public.
Document type	Conference contribution
Language	English
Published at	http://ceur-ws.org/Vol-3180/paper-242.pdf (Final published version)
Other links	http://ceur-ws.org/Vol-3180/ https://www.scopus.com/pages/publications/85136996447
Downloads	paper-242 (Final published version)