Evaluating automatically annotated treebanks for linguistic research
| Authors | |
|---|---|
| Publication date | 2016 |
| Host editors | |
| Book title | 4th Workshop on Challenges in the Management of Large Corpora |
| Book subtitle | Workshop Programme : 28 May 2016 |
| Event | 4th Workshop on the Challenges in the Management of Large Corpora (CMLC-4) |
| Pages (from-to) | 8-14 |
| Publisher | Mannheim: Institut für Deutsche Sprache |
| Organisations | |
| Abstract | This study discusses evaluation methods for linguists to use when employing an automatically annotated treebank as a source of linguistic evidence. While treebanks are usually evaluated with a general measure over all the data, linguistic studies often focus on a particular construction or a group of structures. To judge the quality of linguistic evidence in this case, it would be beneficial to estimate annotation quality over all instances of a particular construction. I discuss the relative advantages and disadvantages of four approaches to this type of evaluation: manual evaluation of the results, manual evaluation of the text, falling back to simpler annotation and searching for particular instances of the construction. Furthermore, I illustrate the approaches using an example from Dutch linguistics, two-verb cluster constructions, and estimate precision and recall for this construction on a large automatically annotated treebank of Dutch. From this, I conclude that a combination of approaches on samples from the treebank can be used to estimate the accuracy of the annotation for the construction of interest. This allows researchers to make more definite linguistic claims on the basis of data from automatically annotated treebanks. |
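The abstract describes estimating precision and recall for a single construction from manually checked samples. The sketch below illustrates how such estimates are typically computed; the function names and all counts are invented for illustration and do not come from the paper.

```python
# Illustrative sketch (not the paper's method or figures): estimating
# precision and recall of an automatically annotated construction,
# e.g. two-verb clusters, from two manually evaluated samples.

def precision(true_positives: int, sample_size: int) -> float:
    """Share of sampled automatic hits confirmed as genuine instances."""
    return true_positives / sample_size

def recall(found_by_parser: int, total_in_text_sample: int) -> float:
    """Share of manually identified instances the annotation also found."""
    return found_by_parser / total_in_text_sample

# Hypothetical sample 1 (manual evaluation of the results):
# 200 retrieved clusters, 188 confirmed by hand.
p = precision(188, 200)   # 0.94
# Hypothetical sample 2 (manual evaluation of the text):
# close reading finds 150 clusters, of which 132 are in the treebank.
r = recall(132, 150)      # 0.88
f1 = 2 * p * r / (p + r)  # harmonic mean of the two estimates
```

Because both figures are extrapolated from samples, a linguistic study would normally report them with the sample sizes so readers can judge the uncertainty of the estimates.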
| Document type | Conference contribution |
| Language | English |
| Published at | http://www.lrec-conf.org/proceedings/lrec2016/workshops/LREC2016Workshop-CMLC_Proceedings.pdf |
| Downloads | Evaluating automatically annotated treebanks (Final published version) |
| Permalink to this page | |
