Multi-modal learning algorithms for sequence modeling and representation learning

Authors	M.J.R. Bleeker
Supervisors	M. de Rijke
Cosupervisors	A.C. Yates
Award date	14-06-2024
ISBN	9789464961300
Number of pages	182
Organisations	Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract	In this thesis, we work on multi-modal learning problems and algorithms. We center our investigations on three modalities: (i) audio, (ii) images, and (iii) text. We provide novel methods and insights in two directions: multi-modal sequence modeling and multi-modal representation learning. In the first part of the thesis, we introduce two novel methods for multi-modal sequence modeling: one for contextual automatic speech recognition and one for scene text recognition. In the second part of the thesis, we turn to multi-modal representation learning for two modalities: images and text. The primary focus is contrastive image-text representation learning, where we provide new insights into how contrastive image-text methods work and how they can be improved.
Document type	PhD thesis
Language	English

UvA-DARE