Automatic animacy classification for Dutch

Open Access
Authors
Publication date 2013
Journal Computational Linguistics in the Netherlands Journal
Volume | Issue number 3
Pages (from-to) 82-102
Number of pages 21
Organisations
  • Faculty of Humanities (FGw) - Amsterdam Institute for Humanities Research (AIHR) - Amsterdam Center for Language and Communication (ACLC)
Abstract
We present an automatic animacy classifier for Dutch that can determine the animacy status of nouns, that is, how alive the noun's referent is (human, inanimate, etc.). Animacy is a semantic property that has been shown to play a role in human sentence processing, felicity and grammaticality. Although animacy is not marked explicitly in Dutch, we expect knowledge about animacy to be helpful for parsing, translation and other NLP tasks. Only a few animacy classifiers and animacy-annotated corpora exist internationally. For Dutch, animacy information is only available in the Cornetto lexical-semantic database. We augment this lexical information with context information from the Dutch Lassy Large treebank, to create training data for an animacy classifier that uses a novel kind of context features.
We use the k-nearest neighbour algorithm with distributional lexical features, e.g. how frequently the noun occurs as a subject of the verb 'to think' in a corpus, to decide on the (predominant) animacy class. The size of the Lassy Large corpus makes this possible, and the high level of detail that these word association features provide results in accurate Dutch-language animacy classification.
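The idea of k-nearest-neighbour classification over distributional features can be illustrated with a minimal sketch. It is not the authors' implementation: the nouns, counts, feature names, the cosine similarity measure and the value of k are all assumptions chosen for illustration, and the counts are invented rather than taken from Cornetto or Lassy Large.

```python
import math
from collections import Counter

# Invented toy data: each noun has a known animacy label and a sparse
# vector of (grammatical relation, verb) co-occurrence counts.
TRAINING = [
    ("vrouw", "human", {("subj", "denken"): 30, ("subj", "zeggen"): 50, ("obj", "kopen"): 1}),
    ("leraar", "human", {("subj", "denken"): 12, ("subj", "zeggen"): 20}),
    ("kind", "human", {("subj", "denken"): 8, ("subj", "zeggen"): 15}),
    ("tafel", "inanimate", {("obj", "kopen"): 25, ("subj", "staan"): 40}),
    ("boek", "inanimate", {("obj", "kopen"): 60, ("subj", "staan"): 5, ("obj", "lezen"): 80}),
    ("fiets", "inanimate", {("obj", "kopen"): 35, ("subj", "staan"): 10}),
]


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (dicts)."""
    dot = sum(value * b.get(key, 0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(features, training=TRAINING, k=3):
    """Return the majority animacy label among the k most similar nouns."""
    ranked = sorted(training, key=lambda item: cosine(features, item[2]), reverse=True)
    votes = Counter(label for _, label, _ in ranked[:k])
    return votes.most_common(1)[0][0]


# An unseen noun that mostly occurs as the subject of 'denken' and 'zeggen'.
dokter = {("subj", "denken"): 5, ("subj", "zeggen"): 9}
# An unseen noun that mostly occurs as the object of 'kopen' or subject of 'staan'.
stoel = {("obj", "kopen"): 7, ("subj", "staan"): 9}

print(classify(dokter))
print(classify(stoel))
```

In this toy setting, a noun that frequently occurs as the subject of verbs such as 'denken' (to think) ends up closest to the human-labelled nouns, which is the intuition behind using word association counts as features.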
Document type Article
Language English
Published at http://www.clinjournal.org/sites/clinjournal.org/files/06-Bloem-Bouma-CLIN2013.pdf
Downloads
06-Bloem-Bouma-CLIN2013 (Final published version)