Unveiling the unknown: Learning and generalizing beyond predefined visual boundaries
| Authors | |
|---|---|
| Supervisors | |
| Cosupervisors | |
| Award date | 20-05-2026 |
| Number of pages | 172 |
| Organisations | |
| Abstract | Machine learning models are typically developed under the assumptions that training and test data follow the same distribution and share a fixed set of categories. However, real-world environments violate both assumptions through distribution shifts and the emergence of novel categories. This thesis addresses the question: "How can we prepare visual models to function in an unknown world?" In the first part, we study distribution shifts in video data, where both spatial and temporal variations affect model behavior. We propose a causal intervention framework that reduces background bias and encourages models to focus on action-relevant features, leading to improved generalization to unseen domains. In the second part, we move beyond fixed category assumptions and address the challenge of novel categories. We first redefine the notion of a category as the solution to an optimization problem, enabling flexible and data-driven categorization. Building on this, we introduce hierarchical representation learning to capture multi-level semantic structures and improve fine-grained discrimination. Finally, we propose a frequency-based framework that leverages complementary low- and high-frequency information to enhance generalized category discovery, particularly in fine-grained settings. Together, these contributions advance the development of adaptive, self-supervised visual models capable of generalizing across domains and discovering novel categories, and enable models to operate effectively in open-world environments. |
| Document type | PhD thesis |
| Language | English |
