Benchmarking Named Entity Recognition Approaches for Extracting Research Infrastructure Information from Text
| Authors | |
|---|---|
| Publication date | 2024 |
| Host editors | |
| Book title | Machine Learning, Optimization, and Data Science |
| Book subtitle | 9th International Conference, LOD 2023, Grasmere, UK, September 22–26, 2023 : revised selected papers |
| ISBN | |
| ISBN (electronic) | |
| Series | Lecture Notes in Computer Science |
| Event | 9th International Conference on Machine Learning, Optimization, and Data Science |
| Volume | |
| Issue number | I |
| Pages (from-to) | 131–141 |
| Publisher | Cham: Springer |
| Organisations | |
| Abstract | Named entity recognition (NER) is an important component of many information extraction and linking pipelines. The task is especially challenging in low-resource scenarios, where only a very limited amount of high-quality annotated data is available. In this paper we benchmark machine learning approaches for NER that may be particularly effective in such cases, and compare their performance in a novel application: extracting research infrastructure information from scientific manuscripts. We explore approaches such as incorporating Contrastive Learning (CL) and Conditional Random Field (CRF) weights into BERT-based architectures, and demonstrate experimentally that such combinations are very effective in few-shot learning setups, consistent with similar findings reported in other areas of NLP and in Computer Vision. More specifically, we show that using CRF weights in BERT-based architectures improves performance on the overall NER task by approximately 12%, and that in few-shot setups the benefit of CRF weights is much greater for smaller training sets. |
| Document type | Conference contribution |
| Language | English |
| Published at | https://doi.org/10.1007/978-3-031-53969-5_11 |
| Downloads | Benchmarking Named Entity Recognition Approaches (Final published version) |
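The abstract's central claim concerns adding CRF weights on top of a BERT encoder. What follows is a minimal, generic linear-chain CRF in plain Python, assuming that BERT supplies per-token emission scores and that the "CRF weights" are a tag-transition matrix plus start scores; it is a sketch, not the authors' implementation. The BIO tag set with a single hypothetical `RI` (research infrastructure) type, the example tokens, and all scores are invented for illustration. The transition weights are set by hand here, whereas a real model would learn them jointly with BERT by minimising a negative log-likelihood such as the one computed by the hypothetical helper `crf_nll`.

```python
import math
from typing import List, Sequence

# Hypothetical BIO tag set with one entity type, "RI" (research infrastructure).
TAGS = ["O", "B-RI", "I-RI"]
NEG = -1e4  # large penalty used to (softly) forbid a transition

Matrix = Sequence[Sequence[float]]


def greedy_decode(emissions: Matrix) -> List[int]:
    """Token-wise argmax, i.e. a plain classification head without a CRF."""
    return [max(range(len(row)), key=lambda j: row[j]) for row in emissions]


def viterbi_decode(emissions: Matrix, transitions: Matrix, start: Sequence[float]) -> List[int]:
    """Highest-scoring tag sequence under a linear-chain CRF.

    emissions[t][j]: score of tag j at token t (assumed to be BERT logits).
    transitions[i][j]: score of moving from tag i to tag j.
    start[j]: score of starting the sequence with tag j.
    """
    n_tags = len(start)
    # score[j]: best score of any path ending in tag j at the current token
    score = [start[j] + emissions[0][j] for j in range(n_tags)]
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, ptrs = [], []
        for j in range(n_tags):
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            ptrs.append(best_i)
        score = new_score
        backpointers.append(ptrs)
    # Trace back from the best final tag
    best = max(range(n_tags), key=lambda j: score[j])
    path = [best]
    for ptrs in reversed(backpointers):
        best = ptrs[best]
        path.append(best)
    return path[::-1]


def _logsumexp(xs: Sequence[float]) -> float:
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def crf_nll(emissions: Matrix, transitions: Matrix, start: Sequence[float], tags: Sequence[int]) -> float:
    """Negative log-likelihood of a gold tag sequence (the usual CRF training loss)."""
    n_tags = len(start)
    # Unnormalised score of the gold path
    gold = start[tags[0]] + emissions[0][tags[0]]
    for t in range(1, len(tags)):
        gold += transitions[tags[t - 1]][tags[t]] + emissions[t][tags[t]]
    # Forward algorithm: log of the summed exponentiated scores of all paths
    alpha = [start[j] + emissions[0][j] for j in range(n_tags)]
    for t in range(1, len(emissions)):
        alpha = [
            _logsumexp([alpha[i] + transitions[i][j] for i in range(n_tags)]) + emissions[t][j]
            for j in range(n_tags)
        ]
    return _logsumexp(alpha) - gold


# Invented emission scores for the tokens "data", "from", "ALMA", "telescope"
# (columns follow TAGS: O, B-RI, I-RI). In practice these would be BERT logits.
EMISSIONS = [
    [2.0, 0.1, 0.0],
    [2.0, 0.0, 0.1],
    [0.2, 1.0, 1.2],
    [0.1, 0.0, 1.5],
]
# Hand-set transitions: only O -> I-RI is penalised; a real CRF learns these.
TRANSITIONS = [
    [0.0, 0.0, NEG],  # from O
    [0.0, 0.0, 0.0],  # from B-RI
    [0.0, 0.0, 0.0],  # from I-RI
]
START = [0.0, 0.0, NEG]  # a sequence may not start inside an entity

if __name__ == "__main__":
    print("greedy: ", [TAGS[j] for j in greedy_decode(EMISSIONS)])
    # greedy:  ['O', 'O', 'I-RI', 'I-RI']
    best = viterbi_decode(EMISSIONS, TRANSITIONS, START)
    print("viterbi:", [TAGS[j] for j in best])
    # viterbi: ['O', 'O', 'B-RI', 'I-RI']
    print("NLL of best path:", round(crf_nll(EMISSIONS, TRANSITIONS, START, best), 3))
```

Greedy per-token argmax emits `I-RI` directly after `O`, which is invalid under the BIO scheme; the transition penalty lets Viterbi decoding recover `B-RI I-RI` instead. This illustrates how transition weights can enforce label-sequence structure that a token-wise classifier would otherwise have to learn from annotated data.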