LLM4Eval: Large Language Model for Evaluation in IR
| Authors | |
|---|---|
| Publication date | 2024 |
| Book title | SIGIR '24 |
| Book subtitle | Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval : July 14-18, 2024, Washington, DC, USA |
| ISBN (electronic) | |
| Event | 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2024 |
| Pages (from-to) | 3040-3043 |
| Publisher | New York, NY: Association for Computing Machinery |
| Organisations | |
| Abstract |
Large language models (LLMs) have demonstrated task-solving abilities
not present in smaller models. Utilizing the capabilities of LLMs for
automated evaluation (LLM4Eval) has recently attracted considerable
attention in multiple research communities. For instance, LLM4Eval
models have been studied in the context of automated judgments, natural
language generation, and retrieval-augmented generation systems. We
believe that the information retrieval community can significantly
contribute to this growing research area by designing, implementing,
analyzing, and evaluating various aspects of LLMs with applications to
LLM4Eval tasks. The main goal of the LLM4Eval workshop is to bring
together researchers from industry and academia to discuss various
aspects of LLMs for evaluation in information retrieval, including
automated judgments, retrieval-augmented generation pipeline
evaluation, altering human evaluation, and the robustness and
trustworthiness of LLMs for evaluation, as well as their impact on
real-world applications. We also plan to run an automated judgment
challenge prior to the workshop, in which participants will be asked to
generate labels for a given dataset while maximising correlation with
human judgments. The format of the workshop is interactive, including
roundtable and keynote sessions, and aims to avoid the one-sided
dialogue of a mini-conference.
|
| Document type | Conference contribution |
| Language | English |
| Published at | https://doi.org/10.1145/3626772.3657992 |
| Downloads | 3626772.3657992 (Final published version) |
