- Automatic animacy classification for Dutch
- Computational Linguistics in the Netherlands Journal
- Faculty of Humanities (FGw)
- Amsterdam Center for Language and Communication (ACLC)
We present an automatic animacy classifier for Dutch that can determine the animacy status of nouns, i.e. how alive the noun's referent is (human, inanimate, etc.). Animacy is a semantic property that has been shown to play a role in human sentence processing, felicity and grammaticality. Although animacy is not marked explicitly in Dutch, we expect knowledge about animacy to be helpful for parsing, translation and other NLP tasks. Only a few animacy classifiers and animacy-annotated corpora exist internationally. For Dutch, animacy information is only available in the Cornetto lexical-semantic database. We augment this lexical information with context information from the Dutch Lassy Large treebank, to create training data for an animacy classifier that uses a novel kind of context features.
We use the k-nearest neighbour algorithm with distributional lexical features, e.g. how frequently the noun occurs as a subject of the verb 'to think' in a corpus, to decide on the (predominant) animacy class. The size of the Lassy Large corpus makes this possible, and the high level of detail that these word association features provide results in accurate Dutch-language animacy classification.
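The idea of classifying a noun by its nearest neighbours in a space of word association counts can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the distance measure, the value of k, or the exact feature inventory, so cosine similarity, k = 3, the (relation, verb) feature keys, and all counts below are assumptions invented for illustration.

```python
import math
from collections import Counter

# Toy training data, invented for illustration (not taken from Cornetto or
# Lassy Large). Each noun maps to counts of (dependency relation, verb)
# contexts plus an assumed animacy label.
TRAIN = {
    "vrouw": ({("subj", "denken"): 8, ("subj", "zeggen"): 6, ("obj", "zien"): 2}, "human"),
    "leraar": ({("subj", "denken"): 5, ("subj", "zeggen"): 7, ("obj", "bellen"): 1}, "human"),
    "jongen": ({("subj", "zeggen"): 4, ("subj", "lopen"): 5, ("obj", "zien"): 3}, "human"),
    "tafel": ({("obj", "kopen"): 6, ("obj", "zien"): 3, ("subj", "staan"): 5}, "inanimate"),
    "boek": ({("obj", "lezen"): 9, ("obj", "kopen"): 4, ("subj", "liggen"): 3}, "inanimate"),
    "stoel": ({("obj", "kopen"): 3, ("subj", "staan"): 6, ("obj", "zien"): 1}, "inanimate"),
}


def cosine(a, b):
    """Cosine similarity between two sparse count vectors stored as dicts."""
    dot = sum(value * b.get(key, 0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(features, train=TRAIN, k=3):
    """Return the majority animacy label among the k most similar nouns."""
    ranked = sorted(
        train.values(),
        key=lambda item: cosine(features, item[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical unseen nouns with invented context counts.
dokter = {("subj", "denken"): 3, ("subj", "zeggen"): 2}
kast = {("obj", "kopen"): 2, ("subj", "staan"): 4}
print(classify(dokter), classify(kast))
```

In this toy setting, a noun that often appears as the subject of 'denken' (to think) ends up closest to the human nouns, which mirrors the intuition behind the features described above. A realistic setup would draw the counts from a parsed corpus and would likely need weighting or normalisation of raw frequencies.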