Evaluating automatically annotated treebanks for linguistic research

Open Access
Authors
Publication date 2016
Host editors
  • P. Bański
  • M. Kupietz
  • H. Lüngen
  • A. Witt
  • A. Barbaresi
  • H. Biber
  • E. Breiteneder
  • S. Clematide
Book title 4th Workshop on Challenges in the Management of Large Corpora
Book subtitle Workshop Programme : 28 May 2016
Event 4th Workshop on the Challenges in the Management of Large Corpora (CMLC-4)
Pages (from-to) 8-14
Publisher Mannheim: Institut für Deutsche Sprache
Organisations
  • Faculty of Humanities (FGw) - Amsterdam Institute for Humanities Research (AIHR) - Amsterdam Center for Language and Communication (ACLC)
Abstract
This study discusses evaluation methods for linguists to use when employing an automatically annotated treebank as a source of linguistic evidence. While treebanks are usually evaluated with a general measure over all the data, linguistic studies often focus on a particular construction or a group of structures. To judge the quality of linguistic evidence in this case, it would be beneficial to estimate annotation quality over all instances of a particular construction. I discuss the relative advantages and disadvantages of four approaches to this type of evaluation: manual evaluation of the results, manual evaluation of the text, falling back to simpler annotation, and searching for particular instances of the construction. Furthermore, I illustrate the approaches using an example from Dutch linguistics, two-verb cluster constructions, and estimate precision and recall for this construction on a large automatically annotated treebank of Dutch. From this, I conclude that a combination of approaches on samples from the treebank can be used to estimate the accuracy of the annotation for the construction of interest. This allows researchers to make more definite linguistic claims on the basis of data from automatically annotated treebanks.
Document type Conference contribution
Language English
Published at http://www.lrec-conf.org/proceedings/lrec2016/workshops/LREC2016Workshop-CMLC_Proceedings.pdf