Unveiling the unknown: Learning and generalizing beyond predefined visual boundaries

Open Access
Authors
  • S. Rastegar
Supervisors
Cosupervisors
Award date 20-05-2026
Number of pages 172
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
Machine learning models are typically developed under the assumptions that training and test data follow the same distribution and share a fixed set of categories. However, real-world environments violate both assumptions through distribution shifts and the emergence of novel categories. This thesis addresses the question: "How can we prepare visual models to function in an unknown world?" In the first part, we study distribution shifts in video data, where both spatial and temporal variations affect model behavior. We propose a causal intervention framework that reduces background bias and encourages models to focus on action-relevant features, leading to improved generalization to unseen domains. In the second part, we move beyond fixed category assumptions and address the challenge of novel categories. We first redefine the notion of a category as the solution to an optimization problem, enabling flexible and data-driven categorization. Building on this, we introduce hierarchical representation learning to capture multi-level semantic structures and improve fine-grained discrimination. Finally, we propose a frequency-based framework that leverages complementary low- and high-frequency information to enhance generalized category discovery, particularly in fine-grained settings. Together, these contributions advance the development of adaptive, self-supervised visual models capable of generalizing across domains and discovering novel categories, and enable models to operate effectively in open-world environments.
Document type PhD thesis
Language English