Multi-modal learning algorithms for sequence modeling and representation learning

Open Access
Award date 14-06-2024
ISBN
  • 9789464961300
Number of pages 182
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
In this thesis, we study multi-modal learning problems and algorithms. Our investigations center on three modalities: (i) audio, (ii) images, and (iii) text. We provide novel methods and insights in two directions: multi-modal sequence modeling and multi-modal representation learning. In the first part of the thesis, we introduce two novel methods for multi-modal sequence modeling: one for contextual automatic speech recognition and one for scene text recognition. In the second part, we focus on multi-modal representation learning for two modalities: images and text. The primary focus is on contrastive image-text representation learning, where we provide new insights into how contrastive image-text methods work and how they can be improved.
Document type PhD thesis
Language English