LLM4Eval: Large Language Model for Evaluation in IR
| Authors | |
|---|---|
| Publication date | 2024 |
| Book title | SIGIR '24 |
| Book subtitle | Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval : July 14-18, 2024, Washington, DC, USA |
| ISBN (electronic) | |
| Event | 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2024 |
| Pages (from-to) | 3040-3043 |
| Publisher | New York, NY: Association for Computing Machinery |
| Organisations | |
| Abstract |
Large language models (LLMs) have demonstrated task-solving abilities
not present in smaller models. Utilizing the capabilities of LLMs for
automated evaluation (LLM4Eval) has recently attracted considerable
attention in multiple research communities. For instance, LLM4Eval
models have been studied in the context of automated judgments, natural
language generation, and retrieval-augmented generation systems. We
believe that the information retrieval community can significantly
contribute to this growing research area by designing, implementing,
analyzing, and evaluating various aspects of LLMs with applications to
LLM4Eval tasks. The main goal of the LLM4Eval workshop is to bring
together researchers from industry and academia to discuss various
aspects of LLMs for evaluation in information retrieval, including
automated judgments, retrieval-augmented generation pipeline
evaluation, altering human evaluation, and the robustness and
trustworthiness of LLMs for evaluation, as well as their impact on
real-world applications. We also plan to run an automated judgment
challenge prior to the workshop, in which participants will be asked to
generate labels for a given dataset while maximising correlation with
human judgments. The format of the workshop is interactive, including
roundtable and keynote sessions, and aims to avoid the one-sided
dialogue of a mini-conference.
|
| Document type | Conference contribution |
| Language | English |
| Published at | https://doi.org/10.1145/3626772.3657992 |
| Downloads | 3626772.3657992 (Final published version) |
