Ontology- and LLM-based data harmonization for federated learning in healthcare
| Authors | |
|---|---|
| Publication date | 18-03-2026 |
| Journal | Frontiers in Digital Health |
| Article number | 1756555 |
| Volume | 8 |
| Number of pages | 12 |
| Organisations | |
| Abstract | Introduction: Semantic heterogeneity across electronic health records (EHRs) limits scalable and privacy-preserving analytics in healthcare. While federated learning (FL) enables collaborative modeling without sharing raw data, it requires consistent, ontology-aligned representations. We present an ontology- and large language model (LLM)-based data harmonization approach to support secure, interoperable FL workflows. Methods: We propose a general two-step pipeline for converting or annotating clinical text into a predefined target ontology format. First, candidate concepts are retrieved from the target vocabulary using embedding-based similarity search or ontology cross-references. Second, an LLM acts as a semantic validator, accepting or rejecting candidates based on explicit equivalence or subsumption criteria. The approach is ontology-agnostic and configurable; mapping to MONDO and HPO is demonstrated as a real-world use case. Final accepted mappings were evaluated against independent human expert assessment. Results: Across two clinical datasets, expert-LLM agreement reached up to 92%, with overall performance ranging from 78% to 91% depending on candidate-generation strategy. Retrieval alone was insufficient for reliable mapping, whereas LLM-based validation substantially improved precision while complementary retrieval strategies improved recall. Discussion: The proposed pipeline transforms ontology-based harmonization from a manual expert task into a reusable and configurable workflow suitable for federated healthcare research. By combining high-recall retrieval with LLM-based semantic adjudication, the approach enables scalable, privacy-preserving conversion of heterogeneous clinical text into standardized representations across domains. |
| Document type | Article |
| Note | With supplementary material. |
| Language | English |
| Published at | https://doi.org/10.3389/fdgth.2026.1756555 |
| Other links | https://zenodo.org/records/15411810 https://www.scopus.com/pages/publications/105038112885 |
| Downloads | fdgth-8-1756555 (Final published version) |
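The two-step pipeline summarized in the abstract (high-recall candidate retrieval, then acceptance or rejection against equivalence or subsumption criteria) can be illustrated with a minimal sketch. Everything here is an assumption for illustration and not the authors' implementation: the vocabulary IDs (`EX:...`) are invented rather than real MONDO or HPO entries, character-trigram count vectors stand in for a learned embedding model, and `validate` is a hypothetical rule-based placeholder for the LLM validator. It only mimics the decision interface (equivalent, subsumption, reject). The `agreement` helper shows how expert-LLM agreement can be computed as a simple proportion of matching decisions.

```python
import math
from collections import Counter

# Hypothetical target vocabulary: IDs and labels are illustrative only,
# not real MONDO/HPO concepts.
TARGET_VOCAB = {
    "EX:0001": "seizure",
    "EX:0002": "focal seizure",
    "EX:0003": "type 2 diabetes mellitus",
    "EX:0004": "diabetes mellitus",
    "EX:0005": "hypertension",
}


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: character-trigram counts."""
    padded = f" {text.lower().strip()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_candidates(term: str, k: int = 3) -> list[tuple[str, float]]:
    """Step 1: rank all target concepts by similarity, keep the top k."""
    query = embed(term)
    scored = [(cid, cosine(query, embed(label))) for cid, label in TARGET_VOCAB.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


def validate(term: str, label: str) -> str:
    """Step 2: hypothetical placeholder for the LLM validator.

    A real system would prompt an LLM with explicit equivalence and
    subsumption criteria; this stub compares token sets instead.
    """
    term_tokens, label_tokens = term.lower().split(), label.lower().split()
    if term_tokens == label_tokens:
        return "equivalent"
    if set(label_tokens) < set(term_tokens):
        # Candidate is broader than the source term, i.e. it subsumes it.
        return "subsumption"
    return "reject"


def harmonize(term: str, k: int = 3) -> list[tuple[str, str]]:
    """Retrieve candidates, then keep only those the validator accepts."""
    accepted = []
    for cid, _score in retrieve_candidates(term, k):
        decision = validate(term, TARGET_VOCAB[cid])
        if decision != "reject":
            accepted.append((cid, decision))
    return accepted


def agreement(model: list[str], expert: list[str]) -> float:
    """Share of items on which validator and expert decisions match."""
    return sum(m == e for m, e in zip(model, expert)) / len(expert)


if __name__ == "__main__":
    print(harmonize("focal seizure"))
    print(harmonize("type 2 diabetes mellitus"))
```

For "focal seizure", retrieval also returns an unrelated concept as the third candidate, and the validator step removes it. This mirrors, in toy form, the abstract's observation that retrieval alone is insufficient and that a validation step raises precision.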
