Interactive Image Retrieval Meets Query Rewriting with Large Language and Vision Language Models
| Authors | |
|---|---|
| Publication date | 10-2025 |
| Journal | ACM Transactions on Multimedia Computing Communications and Applications |
| Article number | 286 |
| Volume / Issue number | 21 / 10 |
| Number of pages | 23 |
| Abstract | Image search is a pivotal task in multi-media and computer vision, finding applications across diverse domains, ranging from internet search to medical diagnostics. Conventional image search systems operate by accepting textual or visual queries and retrieving the top-relevant candidate results from the database. However, prevalent methods often rely on single-turn procedures, introducing potential inaccuracies and limited recall. These methods also face challenges, such as vocabulary mismatch and the semantic gap, constraining their overall effectiveness. To address these issues, we propose an interactive image retrieval system capable of refining queries based on user relevance feedback in a multi-turn setting. This system incorporates an image captioner based on a vision-language model (VLM) to enhance the quality of text-based queries, resulting in more informative queries with each iteration. Moreover, we introduce a denoiser based on a large language model (LLM) to refine text-based query expansions, mitigating inaccuracies in image descriptions generated by captioning models. To evaluate our system, we curate a new dataset by adapting the MSR-VTT and MSVD video retrieval datasets to the image retrieval task, offering multiple relevant ground-truth images for each query. Through comprehensive experiments, we validate the effectiveness of our proposed system against baseline methods, achieving state-of-the-art performance with a notable 10% [...] the integration of an LLM-based denoiser, the curation of a meticulously designed evaluation dataset, and thorough experimental validation. |
| Document type | Article |
| Language | English |
| Published at | https://doi.org/10.1145/3744910 |
| Downloads | Interactive Image Retrieval Meets Query Rewriting (Final published version) |
