A Test Collection of Synthetic Documents for Training Rankers: ChatGPT vs. Human Experts
| Authors | |
|---|---|
| Publication date | 2023 |
| Book title | CIKM '23 |
| Book subtitle | Proceedings of the 32nd ACM International Conference on Information and Knowledge Management : October 21-25, 2023, Birmingham, England |
| ISBN (electronic) | |
| Event | 32nd ACM International Conference on Information and Knowledge Management, CIKM 2023 |
| Pages (from-to) | 5311-5315 |
| Number of pages | 5 |
| Publisher | New York, NY: Association for Computing Machinery |
| Organisations | |
| Abstract | We investigate the usefulness of generative large language models (LLMs) in generating training data for cross-encoder re-rankers in a novel direction: generating synthetic documents instead of synthetic queries. We introduce a new dataset, ChatGPT-RetrievalQA, and compare the effectiveness of strong models fine-tuned on both LLM-generated and human-generated data. We build ChatGPT-RetrievalQA based on an existing dataset, the human ChatGPT comparison corpus (HC3), consisting of multiple public question collections featuring both human- and ChatGPT-generated responses. We fine-tune a range of cross-encoder re-rankers on either human-generated or ChatGPT-generated data. Our evaluation on MS MARCO DEV, TREC DL'19, and TREC DL'20 demonstrates that cross-encoder re-ranking models trained on LLM-generated responses are significantly more effective for out-of-domain re-ranking than those trained on human responses. For in-domain re-ranking, however, the human-trained re-rankers outperform the LLM-trained re-rankers. Our novel findings suggest that generative LLMs have high potential in generating training data for neural retrieval models and can be used to augment training data, especially in domains with less labeled data. ChatGPT-RetrievalQA presents various opportunities for analyzing and improving rankers with both human- and LLM-generated data. Our data, code, and model checkpoints are publicly available. |
| Document type | Conference contribution |
| Language | English |
| Published at | https://doi.org/10.1145/3583780.3615111 |
| Other links | https://github.com/arian-askari/ChatGPT-RetrievalQA https://www.scopus.com/pages/publications/85178122401 |
| Downloads | 3583780.3615111 (Final published version) |
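The comparison described in the abstract, fine-tuning one set of re-rankers on human responses and another on ChatGPT responses, starts from (query, document, label) training pairs drawn from HC3-style records. The sketch below illustrates that pairing step only; the record shape, field names, and the naive cross-question negative sampling are assumptions for illustration, not the actual ChatGPT-RetrievalQA format (which follows MS MARCO-style collection/qrels files).

```python
# Sketch: turning HC3-style records (a question with both human- and
# ChatGPT-written answers) into (query, document, label) pairs suitable
# for cross-encoder re-ranker fine-tuning. Field names are hypothetical.

records = [
    {
        "question": "What causes seasons on Earth?",
        "human_answers": ["Seasons come from the tilt of Earth's axis."],
        "chatgpt_answers": ["Earth's axial tilt relative to its orbit causes seasons."],
    },
    {
        "question": "Why is the sky blue?",
        "human_answers": ["Air scatters blue light more than red light."],
        "chatgpt_answers": ["Rayleigh scattering favors shorter (blue) wavelengths."],
    },
]

def make_pairs(records, source):
    """Build (query, document, label) pairs for one training condition.

    source selects which responses act as positives: "human_answers" for
    the human-trained re-ranker, "chatgpt_answers" for the LLM-trained one.
    Negatives are sampled naively from the next record's answers; a real
    setup would typically use BM25- or model-mined hard negatives.
    """
    pairs = []
    for i, rec in enumerate(records):
        for ans in rec[source]:
            pairs.append((rec["question"], ans, 1))  # relevant pair
        other = records[(i + 1) % len(records)]
        if other is not rec:
            # cross-question negative: same query, unrelated answer
            pairs.append((rec["question"], other[source][0], 0))
    return pairs

human_pairs = make_pairs(records, "human_answers")
chatgpt_pairs = make_pairs(records, "chatgpt_answers")
```

Either list of pairs could then be fed to a standard cross-encoder trainer; the only variable between the two experimental conditions is which response field supplies the positives.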
