The Silent Saboteur: Imperceptible Adversarial Attacks against Black-Box Retrieval-Augmented Generation Systems

Open Access
Authors
Publication date 2025
Host editors
  • Wanxiang Che
  • Joyce Nabende
  • Ekaterina Shutova
  • Mohammad Taher Pilehvar
Book title The 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025) : Findings of the Association for Computational Linguistics: ACL 2025
Book subtitle ACL 2025 : July 27-August 1, 2025
ISBN (electronic)
  • 9798891762565
Event 63rd Annual Meeting of the Association for Computational Linguistics, ACL 2025
Pages (from-to) 13935-13952
Number of pages 18
Publisher Kerrville, TX: Association for Computational Linguistics
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract

We explore adversarial attacks against retrieval-augmented generation (RAG) systems to identify their vulnerabilities. We focus on generating human-imperceptible adversarial examples and introduce a novel imperceptible retrieve-to-generate attack against RAG. The task is to find imperceptible perturbations that cause a target document, originally excluded from the initial top-k candidate set, to be retrieved, thereby influencing the final answer generation. To address this task, we propose ReGENT, a reinforcement learning-based framework that tracks interactions between the attacker and the target RAG system and continuously refines attack strategies based on relevance-generation-naturalness rewards. Experiments on newly constructed factual and non-factual question-answering benchmarks demonstrate that ReGENT significantly outperforms existing attack methods in misleading RAG systems with small imperceptible text perturbations.
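As a rough illustration of the reward design the abstract names, the three signals (relevance to the retriever, influence on answer generation, and naturalness of the perturbed text) could be combined into a single scalar for the RL policy update. The function below is a minimal hypothetical sketch: the linear combination, the weights, and the assumption that each component is normalized to [0, 1] are illustrative choices, not the paper's actual reward formulation.

```python
# Hypothetical sketch of a relevance-generation-naturalness reward.
# All names and weights here are assumptions for illustration only.

def combined_reward(relevance: float, generation: float, naturalness: float,
                    weights: tuple = (0.4, 0.4, 0.2)) -> float:
    """Weighted sum of three reward components, each assumed in [0, 1].

    - relevance:   how highly the retriever now ranks the target document
    - generation:  how strongly the retrieved document sways the final answer
    - naturalness: how human-imperceptible the perturbation remains
    """
    w_rel, w_gen, w_nat = weights
    return w_rel * relevance + w_gen * generation + w_nat * naturalness

# A perturbation that promotes the target document and sways the answer
# while staying natural should score higher than one that does neither.
strong = combined_reward(0.9, 0.8, 0.95)
weak = combined_reward(0.2, 0.1, 0.9)
assert strong > weak
```

In such a scheme the naturalness term acts as a regularizer: raising its weight penalizes perturbations that improve retrieval rank at the cost of perceptibility.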

Document type Conference contribution
Language English
DOI https://doi.org/10.18653/v1/2025.findings-acl.717
Other links https://www.scopus.com/pages/publications/105028560721
Downloads
2025.findings-acl.717 (Final published version)