Multi-modal learning algorithms for sequence modeling and representation learning
| Field | Value |
|---|---|
| Award date | 14-06-2024 |
| Number of pages | 182 |
| Document type | PhD thesis |
| Language | English |

Abstract

In this thesis, we study multi-modal learning problems and algorithms, centering our investigations on three modalities: (i) audio, (ii) images, and (iii) text. We contribute novel methods and insights in two directions: multi-modal sequence modeling and multi-modal representation learning. In the first part of the thesis, we introduce two methods for multi-modal sequence modeling: one for contextual automatic speech recognition and one for scene text recognition. In the second part, we turn to multi-modal representation learning for two modalities, images and text. The primary focus is contrastive image-text representation learning, where we provide new insights into how contrastive image-text methods can be understood and improved.
