Model Internals-based Answer Attribution for Trustworthy Retrieval-Augmented Generation

J. Qi; G. Sarti; R. Fernández; A. Bisazza

doi:https://doi.org/10.18653/v1/2024.emnlp-main.347

Authors	J. Qi, G. Sarti, R. Fernández, A. Bisazza
Publication date	2024
Host editors	Y. Al-Onaizan, M. Bansal, Y.-N. Chen
Book title	The 2024 Conference on Empirical Methods in Natural Language Processing : Proceedings of the Conference
Book subtitle	EMNLP 2024 : November 12-16, 2024
ISBN (electronic)	9798891761643
Event	2024 Conference on Empirical Methods in Natural Language Processing
Pages (from-to)	6037-6053
Publisher	Kerrville, TX: Association for Computational Linguistics
Organisations	Interfacultary Research - Institute for Logic, Language and Computation (ILLC)
Abstract	Ensuring the verifiability of model answers is a fundamental challenge for retrieval-augmented generation (RAG) in the question answering (QA) domain. Recently, self-citation prompting was proposed to make large language models (LLMs) generate citations to supporting documents along with their answers. However, self-citing LLMs often struggle to match the required format, refer to non-existent sources, and fail to faithfully reflect LLMs' context usage throughout the generation. In this work, we present MIRAGE (Model Internals-based RAG Explanations), a plug-and-play approach using model internals for faithful answer attribution in RAG applications. MIRAGE detects context-sensitive answer tokens and pairs them with retrieved documents contributing to their prediction via saliency methods. We evaluate our proposed approach on a multilingual extractive QA dataset, finding high agreement with human answer attribution. On open-ended QA, MIRAGE achieves citation quality and efficiency comparable to self-citation while also allowing for finer-grained control of attribution parameters. Our qualitative evaluation highlights the faithfulness of MIRAGE's attributions and underscores the promising application of model internals for RAG answer attribution.
Document type	Conference contribution
Note	With supplementary software
Language	English
Published at	https://doi.org/10.18653/v1/2024.emnlp-main.347 (Final published version)
Other links	https://github.com/Betswish/MIRAGE
Downloads	Model Internals-based Answer Attribution for Trustworthy Retrieval-Augmented Generation (Final published version)
Supplementary materials	2024.emnlp-main.347.software
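The abstract above describes MIRAGE as two steps: detect answer tokens that are sensitive to the retrieved context, then pair each one with the retrieved documents that contributed to its prediction, using saliency scores. The toy Python sketch below illustrates that flow under stated assumptions and is not the authors' implementation (that is in the linked GitHub repository). Assumptions: context sensitivity is measured here as the KL divergence between the model's next-token distributions with and without the retrieved documents; saliency scores for context tokens are taken as precomputed inputs rather than computed from a real model; and documents are ranked by summed saliency, which is a simplification. All function names, the threshold and top_k parameters, and the example numbers are hypothetical.

```python
import math

# Toy sketch only: function names, the KL-based sensitivity test, and the
# summed-saliency document ranking are illustrative assumptions, not MIRAGE's code.


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same vocabulary.
    Assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def context_sensitive_tokens(with_ctx, without_ctx, threshold):
    """Indices of answer tokens whose predicted distribution shifts by more than
    `threshold` (in KL) when the retrieved documents are removed from the prompt."""
    return [
        i
        for i, (p, q) in enumerate(zip(with_ctx, without_ctx))
        if kl_divergence(p, q) > threshold
    ]


def attribute_to_documents(saliency, doc_ids, top_k=1):
    """Sum per-context-token saliency by source document; return the top_k documents."""
    totals = {}
    for score, doc in zip(saliency, doc_ids):
        totals[doc] = totals.get(doc, 0.0) + score
    return sorted(totals, key=totals.get, reverse=True)[:top_k]


def attribute_answer(with_ctx, without_ctx, saliency_per_token, doc_ids,
                     threshold=0.1, top_k=1):
    """Map each context-sensitive answer token index to its top-ranked documents."""
    return {
        i: attribute_to_documents(saliency_per_token[i], doc_ids, top_k)
        for i in context_sensitive_tokens(with_ctx, without_ctx, threshold)
    }


# Made-up example: two answer tokens, a 3-word vocabulary, four context tokens.
with_ctx = [[0.9, 0.05, 0.05], [0.34, 0.33, 0.33]]
without_ctx = [[0.2, 0.4, 0.4], [0.33, 0.34, 0.33]]
saliency_per_token = [[0.1, 0.7, 0.2, 0.05], [0.25, 0.25, 0.25, 0.25]]
doc_ids = ["doc1", "doc2", "doc2", "doc3"]

print(attribute_answer(with_ctx, without_ctx, saliency_per_token, doc_ids))
# {0: ['doc2']}
```

In this made-up example only the first answer token changes its prediction substantially once the context is removed, so only it receives a citation, pointing to the document whose tokens carry most of the saliency. The two hypothetical knobs, threshold and top_k, loosely stand in for the attribution parameters the abstract says MIRAGE lets users control.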