UBOnlp Report at the SimpleText lab of CLEF 2025

Open Access
Authors
  • Benjamin Vendeville
  • Liana Ermakova
  • Pierre De Loor
  • Jaap Kamps
Publication date 2025
Host editors
  • G. Faggioli
  • N. Ferro
  • P. Rosso
  • D. Spina
Book title Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2025)
Book subtitle Madrid, Spain, 9-12 September 2025
Series CEUR Workshop Proceedings
Event 26th Working Notes of the Conference and Labs of the Evaluation Forum, CLEF 2025
Pages (from-to) 4363-4375
Number of pages 13
Publisher Aachen: CEUR-WS
Organisations
  • Interfacultary Research - Institute for Logic, Language and Computation (ILLC)
Abstract

This paper presents the UBOnlp team’s participation in the SimpleText lab at CLEF 2025, focusing on scientific text simplification and controlled creativity tasks. We evaluate the performance of GPT-4o using simple prompt-based approaches across multiple subtasks without specialized training or fine-tuning. For Task 1 (Text Simplification), we applied GPT-4o to both sentence-level and document-level simplification of scientific abstracts from the Cochrane-Auto corpus. Our system achieved competitive SARI scores (42.20 for sentence-level, 43.37 for document-level) while maintaining low complexity metrics, demonstrating effective simplification through content reduction rather than lexical substitution. For Task 2 (Controlled Creativity), we addressed spurious generation detection and error classification in simplified texts. Our approach showed strong performance in fluency error detection (F1 = 0.322, ranking first) and alignment error detection (F1 = 0.381, ranking third), but struggled with general spurious content detection, particularly in post-hoc scenarios without source documents. These results highlight both the potential and limitations of large language models for specialized text simplification tasks. While GPT-4o demonstrates capabilities in linguistic quality assessment, task-specific architectures remain superior for comprehensive error detection and generation control. Our findings contribute to understanding the practical applicability of general-purpose language models in scientific text processing workflows.

Document type Conference contribution
Language English
Published at https://ceur-ws.org/Vol-4038/paper_360.pdf
Other links https://ceur-ws.org/Vol-4038/ https://www.scopus.com/pages/publications/105019055759