CrowdGP: A Gaussian process model for inferring relevance from crowd annotations

doi:https://doi.org/10.1145/3442381.3450047

Authors	D. Li, Z. Ren, E. Kanoulas
Publication date	2021
Book title	The Web Conference 2021
Book subtitle	proceedings of the World Wide Web Conference WWW 2021 : April 19-23, 2021, Ljubljana, Slovenia
ISBN (electronic)	9781450383127
Event	2021 World Wide Web Conference, WWW 2021
Pages (from-to)	1821-1832
Number of pages	12
Publisher	New York, NY: Association for Computing Machinery
Organisations	Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract	Test collections have been a crucial factor in developing information retrieval systems. Constructing a test collection requires annotators to assess the relevance of a massive number of query-document pairs. Relevance annotations acquired through crowdsourcing platforms alleviate the enormous cost of this process, but they are often noisy. Existing models for denoising crowd annotations mostly assume that annotations are generated independently, and design a probabilistic graphical model of the annotation generation process on that basis. In reality, however, tasks are often correlated with each other. Whether and how task correlation helps in denoising crowd annotations is an understudied problem. In this paper, we relax the independence assumption to model task correlation in terms of relevance. We propose a new crowd annotation generation model named CrowdGP, in which true relevance labels are modelled through a Gaussian process, while annotator competence, annotator's bias towards relevancy, task difficulty, and task's bias towards relevancy are modelled through multiple Gaussian variables. The CrowdGP model shows better performance in terms of inferring true relevance labels compared with state-of-the-art baselines on two crowdsourcing datasets on relevance. The experiments also demonstrate its effectiveness in selecting new tasks for future crowd annotation, which is a new functionality of CrowdGP. Ablation studies indicate that the effectiveness is attributable to the modelling of task correlation based on the auxiliary information of tasks and the prior relevance information of documents to queries.
Document type	Conference contribution
Note	This research was supported by the NWO Innovational Research Incentives Scheme Vidi (016.Vidi.189.039), the NWO Smart Culture - Big Data / Digital Humanities (314-99-301), the H2020-EU.3.4. - SOCIETAL CHALLENGES - Smart, Green And Integrated Transport (814961), the National Key R&D Program of China with grant No. 2020YFB1406704, the Natural Science Foundation of China (61972234, 61902219), and the China Scholarship Council.
Language	English
Published at	https://doi.org/10.1145/3442381.3450047
Other links	https://www.scopus.com/pages/publications/85107938795
Downloads	3442381.3450047 (Final published version)