Beyond boundaries: Towards generalizable information extraction frameworks
| Authors | |
|---|---|
| Supervisors | |
| Cosupervisors | |
| Award date | 11-12-2024 |
| ISBN | |
| Number of pages | 114 |
| Organisations | |
| Abstract | Information Extraction (IE) is a core area of natural language processing focused on identifying structured information, such as named entities and relationships, within plain text. It is essential for downstream applications, including question answering, knowledge graph construction, reasoning, and information retrieval. Traditional IE frameworks, trained under the i.i.d. (independent and identically distributed) data assumption, often suffer performance drops due to domain gaps in real-world scenarios, such as variations in text genres and entity types. Moreover, collecting data across all domains is costly and often impractical, leading to data scarcity. This thesis addresses these challenges by exploring generalizable IE frameworks across three themes: (i) transferring IE models from data-rich to sparsely labeled domains, (ii) adapting IE models to new, unseen domains, and (iii) generalizing IE for strict zero-shot settings on unlabeled corpora.<br><br>Specifically, we begin by developing adaptable IE frameworks for real-world cross-domain transfer scenarios, formulating a practical task focused on transferring a legal element extractor across domains. To mitigate data sparsity and label discrepancies across domains, we propose a graph-enhanced prompt learning framework. Next, given the limited availability of labeled data, we investigate few-shot cross-domain named entity recognition, designing a prompt learning framework that incorporates type-related features. Finally, we explore whether IE models can generalize from unannotated corpora in strict zero-shot settings, proposing a cooperative multi-agent system for zero-shot IE tasks that uses the collective intelligence and specialized abilities of large language model-based agents. |
| Document type | PhD thesis |
| Language | English |
