A distantly supervised dataset for automated data extraction from diagnostic studies

Open Access
Authors
  • A. Névéol
Publication date 2019
Host editors
  • D. Demner-Fushman
  • K.B. Cohen
  • S. Ananiadou
  • J. Tsujii
Book title SIGBioMed Workshop on Biomedical Natural Language Processing
Book subtitle BioNLP 2019 : Proceedings of the 18th BioNLP Workshop and Shared Task : August 1, 2019, Florence, Italy
ISBN (electronic)
  • 9781950737284
Event 18th SIGBioMed Workshop on Biomedical Natural Language Processing, BioNLP 2019
Pages (from-to) 105-114
Number of pages 10
Publisher Stroudsburg, PA: The Association for Computational Linguistics
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract

Systematic reviews are important in evidence-based medicine, but are expensive to produce. Automating or semi-automating the extraction of the index test, target condition, and reference standard from articles has the potential to decrease the cost of conducting systematic reviews of diagnostic test accuracy, but relevant training data is not available. We create a distantly supervised dataset of approximately 90,000 sentences, and have two experts manually annotate a small subset of around 1,000 sentences for evaluation. We evaluate the performance of BioBERT and logistic regression for ranking the sentences, and compare the performance under distant and direct supervision. Our results suggest that distant supervision can work as well as, or better than, direct supervision on this problem, and that distantly trained models can perform as well as, or better than, human annotators.
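The pipeline the abstract describes — noisy labels derived automatically, a classifier trained on them, and sentences ranked by predicted relevance — can be sketched in miniature. This is a hedged illustration, not the paper's implementation: the sentences, the `KNOWN_TESTS` lexicon, and the plain bag-of-words logistic regression are all illustrative stand-ins (the paper uses BioBERT and logistic regression on a far larger corpus).

```python
import math

# Hypothetical distant supervision signal: sentences mentioning a known
# index test name (e.g., drawn from an existing review's data tables) are
# treated as noisy positives; everything else as negatives.
KNOWN_TESTS = {"elisa", "mri"}

def distant_label(sentence):
    """Return 1 if the sentence mentions a known index test, else 0."""
    return 1 if any(t in sentence.lower().split() for t in KNOWN_TESTS) else 0

def featurize(sentence, vocab):
    """Bag-of-words count vector over a fixed vocabulary."""
    vec = [0.0] * len(vocab)
    for tok in sentence.lower().split():
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    return vec

def train_logreg(X, y, epochs=200, lr=0.5):
    """Plain SGD logistic regression; returns weights and bias."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - yi  # gradient of log loss w.r.t. z
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b

def score(sentence, vocab, w, b):
    """Predicted probability that the sentence is relevant."""
    x = featurize(sentence, vocab)
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

# Toy corpus standing in for sentences extracted from diagnostic studies.
sentences = [
    "The index test was ELISA for antibody detection.",
    "Patients were recruited from three hospitals.",
    "Diagnostic accuracy of MRI was assessed against biopsy.",
    "Funding was provided by a national grant.",
]
vocab = {t: i for i, t in enumerate(
    sorted({w for s in sentences for w in s.lower().split()}))}
X = [featurize(s, vocab) for s in sentences]
y = [distant_label(s) for s in sentences]   # noisy labels, no manual annotation
w, b = train_logreg(X, y)

# Rank sentences by predicted relevance, highest first.
ranked = sorted(sentences, key=lambda s: score(s, vocab, w, b), reverse=True)
```

The point of the sketch is the division of labor: labels come for free from the lexicon, so only a small expert-annotated subset (here it would be a held-out sample) is needed for evaluation rather than training.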

Document type Conference contribution
Language English
Published at https://doi.org/10.18653/v1/W19-5012
Other links https://www.scopus.com/pages/publications/85094732207