MIRAGE: A Metrics lIbrary for Rating hAllucinations in Generated tExt
| Authors | |
|---|---|
| Publication date | 2025 |
| Book title | CIKM'25 |
| Book subtitle | Proceedings of the 34th ACM International Conference on Information and Knowledge Management: November 10-14, 2025, Seoul, Republic of Korea |
| ISBN (electronic) | |
| Event | 34th ACM International Conference on Information and Knowledge Management, CIKM 2025 |
| Pages (from-to) | 6539-6543 |
| Number of pages | 5 |
| Publisher | New York, NY: Association for Computing Machinery |
| Organisations | |
| Abstract | Errors in natural language generation, so-called hallucinations, remain a critical challenge, particularly in high-stakes domains such as healthcare or science communication. While several automatic metrics have been proposed to detect and quantify hallucinations, such as FactCC, QAGS, FEQA, and FactAcc, these metrics are often unavailable, difficult to reproduce, or incompatible with modern development workflows. We introduce MIRAGE, an open-source Python library designed to address these limitations. MIRAGE re-implements key hallucination evaluation metrics in a unified library built on the Hugging Face framework, offering modularity, reproducibility, and standardized inputs and outputs. By adhering to FAIR principles, MIRAGE promotes reproducibility, accelerates experimentation, and supports the development of future hallucination metrics. We validate MIRAGE by re-evaluating existing metrics on benchmark datasets, demonstrating comparable performance while significantly improving usability and transparency. |
| Document type | Conference contribution |
| Language | English |
| Published at | https://doi.org/10.1145/3746252.3761644 |
| Other links | https://www.scopus.com/pages/publications/105023153112 |
| Downloads | 3746252.3761644 (Final published version) |
| Permalink to this page | |
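
The abstract describes a unified interface with standardized inputs and outputs across several hallucination metrics. A minimal sketch of what such a design could look like is given below; every class, function, and metric name here is a hypothetical illustration invented for this sketch, not MIRAGE's actual API, and the toy token-overlap score stands in for the real metrics (FactCC, QAGS, FEQA, FactAcc) only to show the shared interface.

```python
from dataclasses import dataclass
from typing import Iterable, Protocol

# Hypothetical sketch of a unified hallucination-metric interface with
# standardized inputs (source text, generated text) and outputs (named
# scores). These names are illustrative assumptions, not MIRAGE's API.


@dataclass
class MetricResult:
    name: str
    score: float  # convention assumed here: higher = more faithful


class HallucinationMetric(Protocol):
    def score(self, source: str, generated: str) -> MetricResult: ...


def _tokens(text: str) -> list[str]:
    # Crude normalization: lowercase and strip common punctuation.
    return [t.strip(".,;:!?").lower() for t in text.split()]


class TokenOverlapMetric:
    """Toy stand-in metric: fraction of generated tokens found in the source."""

    def score(self, source: str, generated: str) -> MetricResult:
        src = set(_tokens(source))
        gen = _tokens(generated)
        supported = sum(1 for t in gen if t in src)
        return MetricResult("token_overlap", supported / max(len(gen), 1))


def evaluate(metrics: Iterable[HallucinationMetric],
             source: str, generated: str) -> dict[str, float]:
    # Run every registered metric over the same (source, generated) pair,
    # returning a uniform name -> score mapping.
    results = [m.score(source, generated) for m in metrics]
    return {r.name: r.score for r in results}


if __name__ == "__main__":
    src = "The trial enrolled 120 patients and lasted six months."
    gen = "The trial enrolled 120 patients."
    print(evaluate([TokenOverlapMetric()], src, gen))
```

The point of the sketch is the shared `score(source, generated) -> MetricResult` contract: once every metric exposes the same inputs and outputs, new metrics can be added and compared without per-metric glue code.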
