MIRAGE: A Metrics lIbrary for Rating hAllucinations in Generated tExt
| Authors | |
|---|---|
| Publication date | 2025 |
| Book title | CIKM'25 |
| Book subtitle | Proceedings of the 34th ACM International Conference on Information and Knowledge Management: November 10-14, 2025, Seoul, Republic of Korea |
| ISBN (electronic) | |
| Event | 34th ACM International Conference on Information and Knowledge Management, CIKM 2025 |
| Pages (from-to) | 6539-6543 |
| Number of pages | 5 |
| Publisher | New York, NY: Association for Computing Machinery |
| Organisations | |
| Abstract | Errors in natural language generation, so-called hallucinations, remain a critical challenge, particularly in high-stakes domains such as healthcare or science communication. While several automatic metrics have been proposed to detect and quantify hallucinations, such as FactCC, QAGS, FEQA, and FactAcc, these metrics are often unavailable, difficult to reproduce, or incompatible with modern development workflows. We introduce MIRAGE, an open-source Python library designed to address these limitations. MIRAGE re-implements key hallucination evaluation metrics in a unified library built on the Hugging Face framework, offering modularity, reproducibility, and standardized inputs and outputs. By adhering to FAIR principles, MIRAGE promotes reproducibility, accelerates experimentation, and supports the development of future hallucination metrics. We validate MIRAGE by re-evaluating existing metrics on benchmark datasets, demonstrating comparable performance while significantly improving usability and transparency. |
| Document type | Conference contribution |
| Language | English |
| Published at | https://doi.org/10.1145/3746252.3761644 |
| Other links | https://www.scopus.com/pages/publications/105023153112 |
| Downloads | 3746252.3761644 (Final published version) |
| Permalink to this page | |
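
The abstract describes a unified interface with standardized inputs and outputs across several hallucination metrics. A minimal sketch of what such a design could look like is given below; every class, function, and metric name here is a hypothetical illustration invented for this sketch, not MIRAGE's actual API, and the toy token-overlap score stands in for the real metrics (FactCC, QAGS, FEQA, FactAcc) only to show the shared interface.

```python
from dataclasses import dataclass
from typing import Iterable, Protocol

# Hypothetical sketch of a unified hallucination-metric interface with
# standardized inputs (source text, generated text) and outputs (named
# scores). These names are illustrative assumptions, not MIRAGE's API.


@dataclass
class MetricResult:
    name: str
    score: float  # convention assumed here: higher = more faithful


class HallucinationMetric(Protocol):
    def score(self, source: str, generated: str) -> MetricResult: ...


def _tokens(text: str) -> list[str]:
    # Crude normalization: lowercase and strip common punctuation.
    return [t.strip(".,;:!?").lower() for t in text.split()]


class TokenOverlapMetric:
    """Toy stand-in metric: fraction of generated tokens found in the source."""

    def score(self, source: str, generated: str) -> MetricResult:
        src = set(_tokens(source))
        gen = _tokens(generated)
        supported = sum(1 for t in gen if t in src)
        return MetricResult("token_overlap", supported / max(len(gen), 1))


def evaluate(metrics: Iterable[HallucinationMetric],
             source: str, generated: str) -> dict[str, float]:
    # Run every registered metric over the same (source, generated) pair,
    # returning a uniform name -> score mapping.
    results = [m.score(source, generated) for m in metrics]
    return {r.name: r.score for r in results}


if __name__ == "__main__":
    src = "The trial enrolled 120 patients and lasted six months."
    gen = "The trial enrolled 120 patients."
    print(evaluate([TokenOverlapMetric()], src, gen))
```

The point of the sketch is the shared `score(source, generated) -> MetricResult` contract: once every metric exposes the same inputs and outputs, new metrics can be added and compared without per-metric glue code.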
