Automatic Animacy Classification for Romanian Nouns

Open Access
Authors
Publication date 2024
Host editors
  • N. Calzolari
  • M.-Y. Kan
  • V. Hoste
  • A. Lenci
  • S. Sakti
  • N. Xue
Book title The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Book subtitle main conference proceedings : 20-25 May, 2024, Torino, Italia
ISBN (electronic)
  • 9782493814104
Series COLING
Event 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Pages (from-to) 1825–1831
Publisher ELRA Language Resources Association
Organisations
  • Interfacultary Research - Institute for Logic, Language and Computation (ILLC)
Abstract
We introduce the first Romanian animacy classifier, specifically a type-based binary classifier of Romanian nouns into the classes human/non-human, using pre-trained word embeddings and animacy information derived from Romanian WordNet. By obtaining a seed set of labeled nouns and their embeddings, we are able to train classifiers that generalize to unseen nouns. We compare three different architectures and observe good performance on classifying word types. In addition, we manually annotate a small corpus for animacy to perform a token-based evaluation of Romanian animacy classification in a naturalistic setting, which reveals limitations of the type-based classification approach.
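The type-based setup described in the abstract (a seed set of labeled nouns, their embeddings, and a classifier that generalizes to unseen nouns) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the embeddings are synthetic Gaussian vectors rather than real pre-trained Romanian embeddings, the labels are generated rather than derived from Romanian WordNet, and plain logistic regression stands in for a classifier without any claim that it is one of the three architectures the paper compares. The names `toy_embedding`, `make_set`, `train`, and `predict_human` are hypothetical.

```python
import math
import random

random.seed(0)
DIM = 8  # toy dimensionality; real pre-trained embeddings are much larger


def toy_embedding(human):
    # Hypothetical stand-in for a pre-trained embedding lookup:
    # human nouns cluster around +1, non-human nouns around -1.
    centre = 1.0 if human else -1.0
    return [centre + random.gauss(0.0, 1.0) for _ in range(DIM)]


def make_set(n_per_class):
    # Each item is (embedding, label) with label 1 = human, 0 = non-human.
    data = [(toy_embedding(True), 1) for _ in range(n_per_class)]
    data += [(toy_embedding(False), 0) for _ in range(n_per_class)]
    return data


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train(data, lr=0.1, epochs=100):
    # Batch gradient descent on the logistic loss.
    w = [0.0] * DIM
    b = 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = [0.0] * DIM
        grad_b = 0.0
        for x, y in data:
            err = sigmoid(score(w, b, x)) - y
            for i in range(DIM):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_human(w, b, x):
    # One label per word type, independent of sentence context.
    return score(w, b, x) >= 0.0


seed_set = make_set(100)  # plays the role of the labeled seed nouns
unseen = make_set(50)     # plays the role of nouns outside the seed set
w, b = train(seed_set)
accuracy = sum(predict_human(w, b, x) == bool(y) for x, y in unseen) / len(unseen)
print(f"accuracy on unseen toy nouns: {accuracy:.2f}")
```

Because `predict_human` assigns a single label to each word type, a noun whose animacy depends on context receives the same label in every sentence, which is the kind of limitation a token-based evaluation can expose.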
Document type Conference contribution
Language English
Published at https://aclanthology.org/2024.lrec-main.163
Other links https://github.com/mariatepei/RO-animacy
Downloads
2024.lrec-main.163 (Final published version)