Entity-centric document understanding: Entity aspects and salience
| Authors | |
|---|---|
| Supervisors | |
| Cosupervisors | |
| Award date | 03-11-2020 |
| ISBN | |
| Number of pages | 104 |
| Organisations | |
| Abstract | The amount of information available for consumption has been overwhelming since the end of the 20th century, leading to information overload. Automated information processing techniques make it possible to process and organize large volumes of information. Textual documents are one widely available category of information, and entities play a key role in automatically understanding their semantics. In this thesis, we aim to enhance document understanding by using entity aspects and entity salience information. First, we hypothesize that entities have multiple aspects and that different documents may discuss different aspects of a given entity. Based on this hypothesis, we propose to learn entity-centric document representations for entity-associated documents. We then study entity aspect linking, which links text fragments (entity mentions) to particular aspects of entities. We view entity aspect linking as a pairwise semantic matching problem and propose a neural network-based approach to solve the task. We also investigate how to enhance document understanding using entity salience information. We assume that in long textual documents not all entities are equally important: some are salient and others are not. We propose a novel entity topic model that takes salient entities into consideration in the document generation process. Finally, we present a new dataset to support research on entity salience-related tasks such as entity salience detection and salient entity linking. |
| Document type | PhD thesis |
| Language | English |
