From language models to cosmic structures: A geometric perspective
| Authors | |
|---|---|
| Supervisors | |
| Cosupervisors | |
| Award date | 13-03-2026 |
| Number of pages | 225 |
| Organisations | |
| Document type | PhD thesis |
| Language | English |

Abstract

Complex systems often organize themselves into geometric and topological patterns that encode information beyond what traditional statistical methods can capture. This thesis develops a unified geometric perspective for analyzing two domains: the large-scale distribution of matter in the universe and the internal representations of large language models.

We demonstrate that persistent homology, a tool from topological data analysis that tracks the birth and death of topological features (connected components, loops, and voids) across scales, provides information complementary to conventional approaches in both settings. In cosmology, topological summaries of the cosmic web capture higher-order spatial correlations that Fourier-space statistics miss, thereby tightening constraints on fundamental cosmological parameters and breaking degeneracies among them. In machine learning, tracking how topology evolves across transformer layers reveals universal phases of information processing and enables principled model compression. Beyond topology, we show that geometric quantities such as intrinsic dimension and statistical cumulants connect the geometry of internal representations to model behavior. The intrinsic dimension of token representations correlates with predictive uncertainty, while cumulant expansions reveal how models progressively learn higher-order structure during training. The central finding is one of complementarity: geometric and topological methods do not replace existing analytical frameworks but enrich them, providing interpretable signatures of structure that would otherwise remain hidden. This cross-domain success suggests that geometry and topology offer general tools for understanding complex systems, from the cosmos to artificial intelligence.
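The abstract describes persistent homology as tracking the birth and death of features across scales. As a minimal illustration, not code from the thesis, the sketch below computes only the 0-dimensional part (connected components) of a Vietoris-Rips filtration on a small point cloud, assuming the convention that an edge enters the filtration at its Euclidean length. The function name `h0_persistence` is hypothetical; dedicated libraries such as GUDHI or Ripser also compute the higher-dimensional features (loops, voids).

```python
import itertools
import math


def h0_persistence(points):
    """Return (birth, death) pairs for connected components of a Vietoris-Rips filtration.

    Assumes an edge between two points appears at their Euclidean distance.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving to keep trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Every pairwise distance is a candidate edge; process them from shortest to longest.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(n), 2)
    )
    pairs = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            # Two components merge at this scale, so one of them dies here.
            # All components are born at 0, so which root survives does not change the pairs.
            parent[rj] = ri
            pairs.append((0.0, length))
    # The single component that remains at the end never dies.
    pairs.append((0.0, math.inf))
    return pairs
```

For the points (0, 0), (1, 0), and (5, 0), the sketch returns the bars (0, 1), (0, 4), and (0, ∞). The bar of length 4 records the gap separating the isolated point from the nearby pair, which is the kind of scale-dependent structure a persistence diagram summarizes.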
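The abstract also links the intrinsic dimension of token representations to predictive uncertainty, but it does not name an estimator. As an assumed example only, the sketch below implements the TwoNN estimator, which uses the ratio μ = r₂/r₁ of each point's second- to first-nearest-neighbor distance. Under the TwoNN model μ follows a Pareto law with exponent d, so the maximum-likelihood estimate is d = N / Σ log μᵢ. The function name `twonn_dimension` is hypothetical, and the brute-force neighbor search is only practical for small samples.

```python
import heapq
import math


def twonn_dimension(points):
    """Estimate intrinsic dimension with the TwoNN maximum-likelihood formula (illustrative sketch)."""
    log_ratios = []
    for i, p in enumerate(points):
        # Brute-force search for the two nearest neighbors of p (O(N^2) overall).
        r1, r2 = heapq.nsmallest(
            2, (math.dist(p, q) for j, q in enumerate(points) if j != i)
        )
        # Skip duplicate points (r1 == 0), for which mu = r2 / r1 is undefined.
        if r1 > 0:
            log_ratios.append(math.log(r2 / r1))
    # mu is Pareto-distributed with exponent d, so the MLE is N / sum(log mu).
    return len(log_ratios) / sum(log_ratios)
```

Applied to a few hundred points sampled uniformly on a flat square embedded in three-dimensional space, the estimate should land close to 2 rather than the ambient dimension 3, which is the distinction between intrinsic and embedding dimension that the abstract relies on.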