Benchmarking Named Entity Recognition Approaches for Extracting Research Infrastructure Information from Text

G. Cheirmpos; S.A. Tabatabaei; E. Kanoulas; G. Tsatsaronis

DOI: https://doi.org/10.1007/978-3-031-53969-5_11

Authors	G. Cheirmpos, S.A. Tabatabaei, E. Kanoulas, G. Tsatsaronis
Publication date	2024
Host editors	G. Nicosia, V. Ojha, E. La Malfa, G. La Malfa, P.M. Pardalos, R. Umeton
Book title	Machine Learning, Optimization, and Data Science
Book subtitle	9th International Conference, LOD 2023, Grasmere, UK, September 22–26, 2023 : revised selected papers
ISBN	9783031539688
ISBN (electronic)	9783031539695
Series	Lecture Notes in Computer Science
Event	9th International Conference on Machine Learning, Optimization, and Data Science
Volume | Issue number	I
Pages (from-to)	131–141
Publisher	Cham: Springer
Organisations	Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract	Named entity recognition (NER) is an important component of many information extraction and linking pipelines. The task is especially challenging in low-resource scenarios, where only a very limited amount of high-quality annotated data is available. In this paper we benchmark machine learning approaches for NER that may be very effective in such cases, and compare their performance in a novel application: extracting information about research infrastructure from scientific manuscripts. We explore approaches such as incorporating Contrastive Learning (CL), as well as Conditional Random Fields (CRF) weights, in BERT-based architectures, and demonstrate experimentally that such combinations are very efficient in few-shot learning set-ups, verifying similar findings reported in other areas of NLP and in Computer Vision. More specifically, we show that using CRF weights in BERT-based architectures improves overall NER performance by approximately 12%, and that in few-shot set-ups CRF weights are much more effective when the training set is smaller.
Document type	Conference contribution
Language	English
Published at	https://doi.org/10.1007/978-3-031-53969-5_11
Downloads	Benchmarking Named Entity Recognition Approaches (Final published version)
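The abstract reports gains from adding CRF weights on top of BERT-based taggers. As a rough illustration of what a CRF layer contributes at decoding time, the following is a minimal pure-Python sketch, not the authors' implementation: it assumes per-token emission scores (standing in for the logits a BERT-based encoder might produce) and a tag-to-tag transition matrix, and finds the best tag sequence with Viterbi decoding. The tag set (`O`, `B-INFRA`, `I-INFRA`), the function name `viterbi_decode`, and all scores are hypothetical toy values.

```python
from typing import List, Sequence, Tuple

# Hypothetical BIO tag set for a "research infrastructure" entity type.
TAGS = ["O", "B-INFRA", "I-INFRA"]


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
) -> Tuple[List[int], float]:
    """Return the highest-scoring tag sequence and its total score.

    emissions[t][j]   score of tag j at token t (e.g. encoder logits)
    transitions[i][j] CRF weight for moving from tag i to tag j
    """
    num_tags = len(transitions)
    # best[j]: score of the best path ending in tag j at the current token
    best = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best: List[float] = []
        pointers: List[int] = []
        for j in range(num_tags):
            # Pick the previous tag i maximising path score + transition i -> j
            i_best = max(range(num_tags), key=lambda i: best[i] + transitions[i][j])
            new_best.append(best[i_best] + transitions[i_best][j] + emissions[t][j])
            pointers.append(i_best)
        best = new_best
        backpointers.append(pointers)
    # Trace back from the best final tag
    last = max(range(num_tags), key=lambda j: best[j])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]


# Toy emission scores for three tokens; a per-token argmax would give
# O, I-INFRA, O, which is an invalid BIO sequence.
emissions = [
    [2.0, 0.5, 0.0],
    [0.1, 0.4, 0.5],
    [1.0, 0.0, 0.2],
]
# Hypothetical transition weights: all neutral except a large penalty on O -> I-INFRA.
transitions = [
    [0.0, 0.0, -10.0],
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]

path, score = viterbi_decode(emissions, transitions)
print([TAGS[i] for i in path])  # ['O', 'B-INFRA', 'O']
```

Here the transition penalty steers decoding away from the invalid `O -> I-INFRA` step that per-token argmax would choose. In a trained model these transition weights would be learned jointly with the encoder rather than set by hand.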