MassiveClicks: A Massively-Parallel Framework for Efficient Click Models Training

Open Access
Authors
Publication date 2024
Host editors
  • D. Zeinalipour
  • D. Blanco Heras
  • G. Pallis
  • H. Herodotou
  • D. Trihinas
  • D. Balouek
  • P. Diehl
  • T. Cojean
  • K. Fürlinger
  • M.H. Kirkeby
  • M. Nardelli
  • P. Di Sanzo
Book title Euro-Par 2023: Parallel Processing Workshops
Book subtitle Euro-Par 2023 International Workshops, Limassol, Cyprus, August 28 - September 1, 2023: Revised Selected Papers
ISBN
  • 9783031506833
ISBN (electronic)
  • 9783031506840
Series Lecture Notes in Computer Science
Event Euro-Par 2023: Parallel Processing Workshops
Volume | Issue number I
Pages (from-to) 232–245
Publisher Cham: Springer
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract

Click logs record user interactions with information retrieval systems (e.g., search engines). Clicks therefore serve as implicit feedback for such systems, and are further used to train click models, which in turn improve the quality of search and recommendation results. Click models based on expectation maximization (EM) are known to be effective and robust against various biases.

Training EM-based models is challenging due to the size of click logs, and can take many hours when using sequential tools like PyClick. Alternatives, such as ParClick, employ parallelism and show significant speed-ups. However, ParClick only works on single-node multi-core systems. To further scale up and out, in this work we introduce MassiveClicks, the first massively parallel, distributed, multi-GPU framework for training EM-based click models. MassiveClicks relies on efficient GPU kernels, balanced data-partitioning policies, and distributed computing to improve the performance of EM-based model training, outperforming ParClick by orders of magnitude when using GPUs and/or multiple nodes. Additionally, the framework supports heterogeneous GPU architectures and variable numbers of GPUs per node, and allows for multi-node multi-core CPU-based training when no GPUs are available.
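For context, the per-session EM updates that tools like PyClick, ParClick, and MassiveClicks parallelize can be sketched for the simplest EM-based click model, the position-based model (PBM). The sketch below is a minimal, simplified illustration of one EM iteration, not MassiveClicks' actual implementation; the session tuple layout and function name are assumptions for the example.

```python
from collections import defaultdict

def pbm_em_iteration(sessions, alpha, gamma):
    """One EM iteration for the position-based model (PBM).

    sessions: list of (query, [doc ids by rank], [0/1 clicks by rank])
    alpha:    dict (query, doc) -> attractiveness estimate
    gamma:    list of examination probabilities, one per rank
    Returns updated (alpha, gamma). Layout is illustrative only.
    """
    attr_sum, attr_cnt = defaultdict(float), defaultdict(int)
    exam_sum = [0.0] * len(gamma)
    exam_cnt = [0] * len(gamma)

    # E-step: posterior over latent attractiveness/examination per impression
    for query, docs, clicks in sessions:
        for r, (d, c) in enumerate(zip(docs, clicks)):
            a, g = alpha[(query, d)], gamma[r]
            if c:
                # A click implies the document was examined and attractive
                p_attr, p_exam = 1.0, 1.0
            else:
                # Posteriors given no click: P(A=1|C=0), P(E=1|C=0)
                denom = 1.0 - a * g
                p_attr = a * (1.0 - g) / denom
                p_exam = g * (1.0 - a) / denom
            attr_sum[(query, d)] += p_attr
            attr_cnt[(query, d)] += 1
            exam_sum[r] += p_exam
            exam_cnt[r] += 1

    # M-step: re-estimate parameters as means of the posteriors
    new_alpha = {k: attr_sum[k] / attr_cnt[k] for k in attr_cnt}
    new_gamma = [exam_sum[r] / exam_cnt[r] if exam_cnt[r] else gamma[r]
                 for r in range(len(gamma))]
    return new_alpha, new_gamma
```

The E-step is independent across sessions, which is what makes GPU kernels and distributed data partitioning effective here: each node or GPU accumulates the posterior sums for its partition, and only the compact sum/count statistics need to be combined before the M-step.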

Document type Conference contribution
Language English
Published at https://doi.org/10.1007/978-3-031-50684-0_18
Downloads
978-3-031-50684-0_18 (Final published version)