From patient to pattern: Probabilistic models for synthetic data in digital health twin analytics
| Authors | |
|---|---|
| Supervisors | |
| Cosupervisors | |
| Award date | 25-06-2026 |
| Number of pages | 233 |
| Organisations | |
| Abstract |
This thesis develops probabilistic generative methods for Digital Health Twin frameworks, where the central bottleneck is not only prediction but access to the data-generating process itself. Clinical data are distributed across institutions, governed by strict privacy constraints, and heterogeneous in both statistical structure and administrative availability. The thesis therefore studies synthetic data as a learned probabilistic surrogate: a controllable approximation to the joint distribution of patient data that can be sampled, inspected, shared, and used for downstream model development without repeated access to the original records.
The work addresses four technical problems that determine whether such a surrogate is scientifically and operationally useful. First, it introduces tail-adaptive normalizing flows for mixed-tail healthcare data, allowing bounded, light-tailed, and heavy-tailed variables to be modeled within a single likelihood-based framework. Second, it develops semantic regularization through a learned validator, using latent density as a proxy for whether generated records lie in plausible regions of the empirical data manifold. Third, it proposes a differentially private synthetic-data mechanism based on normalizing flows, aiming to control disclosure risk while retaining useful distributional information. Fourth, it analyzes the effect of institutional distribution shift on privacy-preserving distributed learning, with particular attention to utility and fairness. The thesis evaluates these methods in a proof-of-concept Digital Health Twin framework for clinical pathways. Its main argument is that synthetic healthcare data should be understood not as anonymized replicas, but as probabilistic objects whose value depends on likelihood calibration, tail fidelity, semantic validity, privacy control, and robustness under distribution shift. |
| Document type | PhD thesis |
| Language | English |
| Downloads | |
| Permalink to this page | |
