Beyond boundaries: Towards generalizable information extraction frameworks

Open Access
Authors
Supervisors
Cosupervisors
  • Z. Ren
Award date 11-12-2024
ISBN
  • 9789465066813
Number of pages 114
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
Information Extraction (IE) is a core area of natural language processing concerned with extracting structured information, such as named entities and the relationships between them, from plain text. It is essential for downstream applications, including question answering, knowledge graph construction, reasoning, and information retrieval. Traditional IE frameworks are trained under the i.i.d. (independent and identically distributed) data assumption, so their performance often drops when they face the domain gaps found in real-world scenarios, such as variations in text genres and entity types. Moreover, collecting labeled data for every domain is costly and often impractical, which leads to data scarcity. This thesis addresses these challenges by exploring generalizable IE frameworks across three themes: (i) transferring IE models from data-rich to sparsely labeled domains, (ii) adapting IE models to new, unseen domains, and (iii) generalizing IE models to strict zero-shot settings on unlabeled corpora.
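As a small, hypothetical illustration of what "structured information" means for named entities, the sketch below decodes token-level BIO tags into typed entity spans. BIO tagging is one common labeling scheme for named entity recognition; the abstract does not say which scheme the thesis uses, and the function name `bio_to_spans` and the example tags are assumptions made for illustration only.

```python
from typing import List, Optional, Tuple

Span = Tuple[str, int, int, str]  # (entity type, start index, end index exclusive, surface text)


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Span]:
    """Hypothetical helper: convert BIO tags into typed entity spans."""
    spans: List[Span] = []
    start: Optional[int] = None
    etype: Optional[str] = None
    # A sentinel "O" at the end flushes any span that is still open.
    for i, tag in enumerate(tags + ["O"]):
        opens_new = tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != etype)
        if tag == "O" or opens_new:
            # Close the currently open span, if any.
            if start is not None:
                spans.append((etype, start, i, " ".join(tokens[start:i])))
                start, etype = None, None
            # Lenient decoding: an I- tag with no matching open span starts a new span.
            if opens_new:
                start, etype = i, tag[2:]
    return spans


tokens = ["Alice", "works", "at", "Acme", "Corp", "in", "Paris", "."]
tags = ["B-PER", "O", "O", "B-ORG", "I-ORG", "O", "B-LOC", "O"]
print(bio_to_spans(tokens, tags))
# [('PER', 0, 1, 'Alice'), ('ORG', 3, 5, 'Acme Corp'), ('LOC', 6, 7, 'Paris')]
```

Relation extraction would then operate on top of such spans, for example by classifying pairs of them; that step is not shown here.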
Specifically, we begin by developing adaptable IE frameworks for real-world cross-domain transfer scenarios, formulating a practical task focused on transferring a legal element extractor across domains. To mitigate data sparsity and label discrepancies across domains, we propose a graph-enhanced prompt learning framework. Next, given the limited availability of labeled data, we investigate few-shot cross-domain named entity recognition and design a prompt learning framework that incorporates type-related features. Finally, we explore whether IE models can generalize from unannotated corpora in strict zero-shot settings, proposing a cooperative multi-agent system for zero-shot IE tasks that leverages the collective intelligence and specialized abilities of agents built on large language models.
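To make the idea of injecting "type-related features" into a prompt more concrete, the sketch below builds a plain-text few-shot NER prompt that lists a natural-language description for each entity type before the demonstrations. This is only a hand-written template under stated assumptions, not the thesis's framework: the abstract describes *prompt learning*, which typically involves training, whereas this code merely assembles text. The `Demo` class, `build_ner_prompt` function, the legal-domain type names, and the output format are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Demo:
    """Hypothetical few-shot demonstration: a sentence and its (mention, type) pairs."""
    sentence: str
    entities: List[Tuple[str, str]]


def build_ner_prompt(type_descriptions: Dict[str, str], demos: List[Demo], query: str) -> str:
    """Assemble a text prompt: type descriptions, then demonstrations, then the query."""
    lines = ["Entity types:"]
    # Type-related features, here simply a short description per entity type.
    for etype, desc in type_descriptions.items():
        lines.append(f"- {etype}: {desc}")
    lines.append("")
    # Few-shot demonstrations drawn from the (sparsely labeled) target domain.
    for demo in demos:
        ents = "; ".join(f"{m} [{t}]" for m, t in demo.entities) or "none"
        lines.append(f"Sentence: {demo.sentence}")
        lines.append(f"Entities: {ents}")
        lines.append("")
    # The model would be asked to complete the final "Entities:" line.
    lines.append(f"Sentence: {query}")
    lines.append("Entities:")
    return "\n".join(lines)


types = {"COURT": "a court or tribunal", "STATUTE": "a law, act, or article cited by name"}
demos = [Demo("The Supreme Court applied Article 6.", [("Supreme Court", "COURT"), ("Article 6", "STATUTE")])]
print(build_ner_prompt(types, demos, "The district court cited the Civil Code."))
```

Putting the type descriptions ahead of the demonstrations is one design choice among many; a learned (soft) prompt would replace this fixed text with trainable parameters.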
Document type PhD thesis
Language English