Assisted authoring for avoiding inadequate claims in scientific reporting
| Authors | |
|---|---|
| Supervisors | |
| Award date | 22-01-2020 |
| ISBN | |
| Number of pages | 215 |
| Organisations | |
| Abstract | In this thesis, we report on the development of Natural Language Processing (NLP) algorithms to aid readers and authors of biomedical articles in detecting spin (distorted presentation of research results). We focused on spin in abstracts of articles reporting Randomized Controlled Trials (RCTs). We conducted a linguistic study of spin and created a description of its textual features. We annotated a set of corpora for the key tasks of our spin detection pipeline: extraction of declared (primary) and reported outcomes, assessment of semantic similarity of pairs of trial outcomes, and extraction of relations between reported outcomes and their statistical significance levels. We also annotated two smaller corpora for identifying statements of similarity between treatments and within-group comparisons. We developed a number of rule-based and machine learning algorithms for the key tasks (outcome extraction, outcome similarity assessment, and outcome-significance relation extraction). The best performance was achieved by a deep learning approach that consists of fine-tuning deep pre-trained domain-specific language representations (BioBERT and SciBERT models) for downstream tasks. This approach was implemented in our spin detection prototype system, DeSpin, released as open source code. The prototype also includes other algorithms, such as text structure analysis (identification of the abstract of an article and of sections within the abstract), detection of statements of similarity between treatments and of within-group comparisons, and extraction of data from trial registries. It also provides a simple annotation and visualization interface. |
| Document type | PhD thesis |
| Language | English |
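
The abstract mentions rule-based algorithms for relating reported outcomes to their statistical significance levels. As a rough illustration of what a rule of that kind might look like, the sketch below finds p-value mentions in a sentence and flags them against a threshold. It is not taken from DeSpin: the pattern `P_VALUE`, the function `p_value_mentions`, and the default `alpha=0.05` are all assumptions made for this example, and it does not link a p-value to a specific outcome.

```python
import re

# Hypothetical pattern (not from DeSpin): matches forms such as
# "p < 0.05", "P=0.3" or "p ≤ .001".
P_VALUE = re.compile(r"\bp\s*(<=|>=|<|>|=|≤|≥)\s*(0?\.\d+)", re.IGNORECASE)


def p_value_mentions(sentence, alpha=0.05):
    """Return (operator, value, significant) for each p-value in the sentence."""
    mentions = []
    for op, raw in P_VALUE.findall(sentence):
        value = float(raw)
        if op in ("<", "<=", "≤"):
            # Upper bound on p: significant if the bound is at or below alpha.
            significant = value <= alpha
        elif op == "=":
            # Exact value: significant if strictly below alpha.
            significant = value < alpha
        else:
            # ">" or ">=": only a lower bound is known, so treat as not significant.
            significant = False
        mentions.append((op, value, significant))
    return mentions


if __name__ == "__main__":
    text = "Pain decreased (p < 0.01) but sleep did not improve (P=0.3)."
    for mention in p_value_mentions(text):
        print(mention)
```

A rule like this only covers explicit numeric p-values. Significance stated in words ("did not differ significantly") and the link between each value and its outcome would need further rules or a learned model, such as the fine-tuned language models the abstract reports as performing best.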