Multimodal machine learning for information retrieval: A vision and language perspective

Open Access
Award date 13-12-2024
ISBN
  • 9789465064383
Number of pages 168
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
In this thesis, we investigate multimodal machine learning for information retrieval, focusing on vision and language. Our research is organized into three main areas: (i) dense and sparse cross-modal retrieval, where we examine reproducibility issues and propose methods for learned sparse retrieval; (ii) representation learning and evaluation, where we study the limitations of vision-language contrastive learning and advocate for more robust evaluation frameworks; and (iii) product retrieval, where we analyze user behavior and leverage multimodal data to improve retrieval performance across categories of varying granularity.
Document type PhD thesis
Language English