From fine-tuning to prompting: A paradigm shift in knowledge graph construction

Open Access
Award date 04-02-2026
Series SIKS Dissertation Series, 2026-12
Number of pages 192
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
Knowledge graphs (KGs) provide structured, machine-actionable representations of information that support search, reasoning, and decision-making. Constructing them, however, remains challenging in complex domains such as organizational conversations, where data is noisy, evolving, and context-dependent. This thesis examines how knowledge graph construction (KGC) can adapt to these conditions through two complementary perspectives: (i) analyzing the limitations of the pretrain-then-finetune (PTFT) paradigm when applied to conversational data, and (ii) exploring how the emerging pretrain, prompt, and predict (PPP) paradigm can provide more flexible and cost-efficient workflows.
In the first part, we investigate the fragility of PTFT-based information extraction models under real-world variation. We show that distribution shifts in named entity recognition lead to large and predictable performance drops; that static topic models, though semantically coherent, struggle to detect the emergence of new topics; and that cross-document coreference in multi-party email exposes persistent weaknesses in current methods. These findings highlight the limits of task-specific models in domains shaped by input shifts, temporal change, and long conversational structure.
In the second part, we turn to PPP-based workflows that leverage large language models through prompting rather than fine-tuning. We demonstrate that instruction-tuned LLMs can achieve competitive results in relation extraction, provided schema knowledge is carefully encoded. We introduce knowledge-centric prompt composition to guide in-context learning for knowledge base construction, showing that prompts enriched with schema constraints and examples substantially improve extraction quality. Finally, we propose a hybrid system for data preparation, TableSwift, which routes tasks between LLM-generated code and deterministic fallbacks to reduce costs while maintaining accuracy on transformation, error detection, and entity matching.
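As an illustration of what composing a prompt from schema constraints and examples might look like, the following is a minimal sketch and not the implementation used in the thesis. The names `RelationSchema`, `Example`, and `compose_prompt`, the prompt wording, and the sample relation are all hypothetical, chosen only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationSchema:
    """Hypothetical schema entry: a relation with typed subject and object."""
    name: str
    subject_type: str
    object_type: str


@dataclass(frozen=True)
class Example:
    """Hypothetical in-context example: a text and its gold triples."""
    text: str
    triples: tuple  # tuple of (subject, relation, object) tuples


def compose_prompt(schema, examples, text):
    """Build one prompt string: task, schema constraints, examples, input."""
    lines = [
        "Extract (subject, relation, object) triples from the text.",
        "Use only these relations:",
    ]
    # Schema constraints: one line per allowed relation, with its type signature.
    for rel in schema:
        lines.append(f"- {rel.name}: {rel.subject_type} -> {rel.object_type}")
    # In-context examples, each formatted the same way as the final query.
    for ex in examples:
        rendered = "; ".join(f"({s}, {r}, {o})" for s, r, o in ex.triples)
        lines.extend(["", f"Text: {ex.text}", f"Triples: {rendered}"])
    # The query itself, left open for the model to complete.
    lines.extend(["", f"Text: {text}", "Triples:"])
    return "\n".join(lines)


# Illustrative data only; not taken from the thesis.
schema = [RelationSchema("works_for", "Person", "Organization")]
examples = [
    Example("Ada joined Acme last year.", (("Ada", "works_for", "Acme"),)),
]
prompt = compose_prompt(schema, examples, "Bob is an engineer at Initech.")
print(prompt)
```

The resulting string would be sent to an instruction-tuned LLM by whatever client is in use; that call is deliberately left out here because the abstract does not specify one.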
Taken together, these contributions trace a paradigm shift in KGC, from PTFT pipelines reliant on specialized models toward PPP workflows that are promptable, adaptable, and cost-aware. By diagnosing the weaknesses of PTFT and designing PPP-based solutions, the thesis offers both empirical insights and practical architectures for building reliable knowledge graphs in complex, real-world domains.
Document type PhD thesis
Language English