Multimodal machine learning for information retrieval: A vision and language perspective
| Authors | |
|---|---|
| Supervisors | |
| Cosupervisors | |
| Award date | 13-12-2024 |
| ISBN | |
| Number of pages | 168 |
| Organisations | |
| Abstract | In this thesis, we investigate multimodal machine learning for information retrieval, focusing on vision and language. Our research is organized into three main areas: (i) dense and sparse cross-modal retrieval, where we examine reproducibility issues and propose methods for learned sparse retrieval; (ii) representation learning and evaluation, where we study the limitations of vision-language contrastive learning and advocate for more robust evaluation frameworks; and (iii) product retrieval, where we analyze user behavior and leverage multimodal data to improve retrieval performance across categories of varying granularity. |
| Document type | PhD thesis |
| Language | English |
