- Applying automatically parsed corpora to the study of language variation
- COLING 2014
- Book/source title
- COLING 2014: the 25th International Conference on Computational Linguistics
- Book/source subtitle
- proceedings of COLING 2014: technical papers: August 23-29, 2014, Dublin, Ireland
- Pages (from-to)
- Stroudsburg, PA: Association for Computational Linguistics
- Document type
- Conference contribution
- Faculty of Humanities (FGw)
- Amsterdam Center for Language and Communication (ACLC)
In this work, we discuss the benefits of using automatically parsed corpora to study language variation. The study of language variation is an area of linguistics in which quantitative methods have been particularly successful. We argue that the large datasets that can be obtained using automatic annotation can help drive further research in this direction, providing sufficient data for the increasingly complex models used to describe variation. We demonstrate this by replicating and extending a previous quantitative variation study that used manually and semi-automatically annotated data.
We show that while the study cannot be replicated completely due to limitations of the existing automatic annotation, we can draw at least the same conclusions as the original study. In addition, we demonstrate the flexibility of this method by extending the findings to related linguistic constructions and to another domain of text, using additional data.
- Final publisher version