Evaluating automatically annotated treebanks for linguistic research
| Authors | |
|---|---|
| Publication date | 2016 |
| Host editors | |
| Book title | 4th Workshop on Challenges in the Management of Large Corpora |
| Book subtitle | Workshop Programme : 28 May 2016 |
| Event | 4th Workshop on the Challenges in the Management of Large Corpora (CMLC-4) |
| Pages (from-to) | 8-14 |
| Publisher | Mannheim: Institut für Deutsche Sprache |
| Organisations | |
| Abstract | This study discusses evaluation methods for linguists to use when employing an automatically annotated treebank as a source of linguistic evidence. While treebanks are usually evaluated with a general measure over all the data, linguistic studies often focus on a particular construction or a group of structures. To judge the quality of linguistic evidence in this case, it would be beneficial to estimate annotation quality over all instances of a particular construction. I discuss the relative advantages and disadvantages of four approaches to this type of evaluation: manual evaluation of the results, manual evaluation of the text, falling back to simpler annotation and searching for particular instances of the construction. Furthermore, I illustrate the approaches using an example from Dutch linguistics, two-verb cluster constructions, and estimate precision and recall for this construction on a large automatically annotated treebank of Dutch. From this, I conclude that a combination of approaches on samples from the treebank can be used to estimate the accuracy of the annotation for the construction of interest. This allows researchers to make more definite linguistic claims on the basis of data from automatically annotated treebanks. |
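The abstract describes estimating precision and recall for a single construction from manually checked samples. The sketch below illustrates how such estimates are typically computed; the function names and all counts are invented for illustration and do not come from the paper.

```python
# Illustrative sketch (not the paper's method or figures): estimating
# precision and recall of an automatically annotated construction,
# e.g. two-verb clusters, from two manually evaluated samples.

def precision(true_positives: int, sample_size: int) -> float:
    """Share of sampled automatic hits confirmed as genuine instances."""
    return true_positives / sample_size

def recall(found_by_parser: int, total_in_text_sample: int) -> float:
    """Share of manually identified instances the annotation also found."""
    return found_by_parser / total_in_text_sample

# Hypothetical sample 1 (manual evaluation of the results):
# 200 retrieved clusters, 188 confirmed by hand.
p = precision(188, 200)   # 0.94
# Hypothetical sample 2 (manual evaluation of the text):
# close reading finds 150 clusters, of which 132 are in the treebank.
r = recall(132, 150)      # 0.88
f1 = 2 * p * r / (p + r)  # harmonic mean of the two estimates
```

Because both figures are extrapolated from samples, a linguistic study would normally report them with the sample sizes so readers can judge the uncertainty of the estimates.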
| Document type | Conference contribution |
| Language | English |
| Published at | http://www.lrec-conf.org/proceedings/lrec2016/workshops/LREC2016Workshop-CMLC_Proceedings.pdf |
| Downloads | Evaluating automatically annotated treebanks (Final published version) |
| Permalink to this page | |
