- Learning structural dependencies of words in the Zipfian Tail
- Journal of Logic and Computation
- Volume 24, Issue 2
- Interfacultary Research Institutes
Faculty of Science (FNWI)
- Institute for Logic, Language and Computation (ILLC)
This article uses semi-supervised Expectation Maximization (EM) to learn lexico-syntactic dependencies, i.e. associations between words and the structures that occur with them. Because word frequencies in language follow Zipfian distributions, such dependencies are extremely sparse in labelled data, and unlabelled data are the only source for learning them. Specifically, we learn sparse lexical parameters of a generative parsing model (a Probabilistic Context-Free Grammar, PCFG) that is initially estimated over the Penn Treebank. Our lexical parameters are similar to supertags: they are fine-grained and encode complex structural information at the pre-terminal level. Our goal is to use unlabelled data to learn these parameters for words that are rare or unseen in the labelled data. We obtain large error reductions (up to 17.5%) in parsing ambiguous structures associated with unseen verbs, the most important case of learning lexico-structural dependencies, resulting in a statistically significant improvement in the labelled bracketing score of the treebank PCFG. Our semi-supervised method incorporates structural and lexical priors from the labelled data to guide estimation from unlabelled data, and is the first successful use of semi-supervised EM to improve a generative structured model already trained over large labelled data. The method scales well to larger amounts of unlabelled data, and also gives substantial error reductions (up to 11.5%) for models trained on smaller amounts of labelled data, making it relevant to low-resource languages with small treebanks as well.
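The general idea of semi-supervised EM over lexical parameters can be illustrated with a toy sketch. This is not the article's model or implementation: it assumes a flat mixture in which each unlabelled token is a (word, context) pair, the tag prior and the context distribution are held fixed as a stand-in for structural priors from labelled data, and labelled counts act as pseudo-counts when re-estimating P(word | tag). The function name `semi_supervised_em`, the tags `TRANS`/`INTRANS`, the context labels, and all numbers are hypothetical.

```python
from collections import defaultdict


def semi_supervised_em(labelled_counts, tag_prior, ctx_given_tag,
                       unlabelled, vocab, iterations=20, smoothing=0.1):
    """Re-estimate P(word | tag) from unlabelled (word, context) pairs.

    labelled_counts: {(word, tag): count} from labelled data (lexical prior)
    tag_prior:       {tag: P(tag)}, held fixed (assumed structural prior)
    ctx_given_tag:   {(ctx, tag): P(ctx | tag)}, held fixed
    unlabelled:      list of (word, ctx) observations without tags
    """
    tags = list(tag_prior)

    def m_step(expected):
        # Labelled counts, expected counts, and smoothing are summed,
        # then normalised per tag so each P(. | tag) sums to one.
        params = {}
        for t in tags:
            raw = {w: labelled_counts.get((w, t), 0.0)
                      + expected.get((w, t), 0.0) + smoothing
                   for w in vocab}
            total = sum(raw.values())
            for w in vocab:
                params[(w, t)] = raw[w] / total
        return params

    # Initialise from labelled data alone.
    word_given_tag = m_step({})

    for _ in range(iterations):
        # E-step: posterior over tags for every unlabelled token.
        expected = defaultdict(float)
        for w, ctx in unlabelled:
            scores = {t: tag_prior[t] * word_given_tag[(w, t)]
                         * ctx_given_tag[(ctx, t)]
                      for t in tags}
            z = sum(scores.values())
            for t in tags:
                expected[(w, t)] += scores[t] / z
        # M-step: combine labelled and expected counts.
        word_given_tag = m_step(expected)

    return word_given_tag


# Hypothetical toy data: "devour" never occurs in the labelled counts.
vocab = ["eat", "sleep", "devour"]
labelled_counts = {("eat", "TRANS"): 5, ("eat", "INTRANS"): 3,
                   ("sleep", "INTRANS"): 8}
tag_prior = {"TRANS": 0.5, "INTRANS": 0.5}
ctx_given_tag = {("obj", "TRANS"): 0.9, ("no_obj", "TRANS"): 0.1,
                 ("obj", "INTRANS"): 0.05, ("no_obj", "INTRANS"): 0.95}
unlabelled = [("devour", "obj")] * 6 + [("devour", "no_obj")]

learned = semi_supervised_em(labelled_counts, tag_prior, ctx_given_tag,
                             unlabelled, vocab)
print(learned[("devour", "TRANS")], learned[("devour", "INTRANS")])
```

In this toy setting the unseen verb appears mostly with an object, so EM shifts its probability mass toward the transitive tag even though the labelled counts contain no evidence for it. A real PCFG would replace the per-token posterior with inside-outside expectations over parse trees.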