Multimodal machine learning for information retrieval

Multimodal machine learning for information retrieval: A vision and language perspective

Authors	M.Y. Hendriksen
Supervisors	M. de Rijke
Cosupervisors	P.T. Groth
Award date	13-12-2024
ISBN	9789465064383
Number of pages	168
Organisations	Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract	In this thesis, we investigate multimodal machine learning for information retrieval, focusing on vision and language. Our research is organized into three main areas: (i) dense and sparse cross-modal retrieval, where we examine reproducibility issues and propose methods for learned sparse retrieval; (ii) representation learning and evaluation, where we study the limitations of vision-language contrastive learning and advocate for more robust evaluation frameworks; and (iii) product retrieval, where we analyze user behavior and leverage multimodal data to improve retrieval performance across categories of varying granularity.
Document type	PhD thesis
Language	English