Een corpus waar alle constructies in gevonden zouden moeten kunnen worden?

J. Bloem

DOI: https://doi.org/10.5117/NEDTAA2020.1.003.BLOE

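Een corpus waar alle constructies in gevonden zouden moeten kunnen worden? Corpusonderzoek met behulp van automatisch gegenereerde syntactische annotatie [English: A corpus in which all constructions should be able to be found? Corpus research using automatically generated syntactic annotation]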

Authors	J. Bloem
Publication date	04-2020
Journal	Nederlandse Taalkunde
Event	Dag van de Nederlandse Zinsbouw 12
Volume | Issue number	25 | 1
Pages (from-to)	39-71
Organisations	Interfacultary Research - Institute for Logic, Language and Computation (ILLC)
Abstract	In this contribution, I discuss the use of automatic syntactic annotation in Dutch corpus research, using a case study of five-verb clusters. Large amounts of text can be annotated automatically, but the parser makes mistakes, whereas correct annotation is essential in linguistic research. How much of a problem is this, and how can we determine the extent of these parsing mistakes? There are several approaches to evaluating the quality of automatic annotation for a specific research question. I demonstrate these approaches for the case study at hand; they allow us to make claims based on automatically annotated corpus data with greater confidence.
Document type	Article
Language	Dutch
Published at	https://doi.org/10.5117/NEDTAA2020.1.003.BLOE (Final published version)

UvA-DARE

Digital Academic Repository