- Accurate parsing with compact tree-substitution grammars: double-DOP
- EMNLP 2011: Conference on Empirical Methods in Natural Language Processing
- Book/source title
- Conference on Empirical Methods in Natural Language Processing: Proceedings of the Conference
- Pages (from-to)
- Stroudsburg: The Association for Computational Linguistics
- Document type
- Conference contribution
- Interfacultary Research Institutes
- Institute for Logic, Language and Computation (ILLC)
We present a novel approach to Data-Oriented Parsing (DOP). Like other DOP models, our parser utilizes syntactic fragments of arbitrary size from a treebank to analyze new sentences, but, crucially, it uses only those which are encountered at least twice. This criterion allows us to work with a relatively small but representative set of fragments, which can be employed as the symbolic backbone of several probabilistic generative models. For parsing we define a transform-backtransform approach that allows us to use standard PCFG technology, making our results easily replicable. According to standard Parseval metrics, our best model is on par with many state-ofthe-
art parsers, while offering some complementary benefits: a simple generative probability model, and an explicit representation of the larger units of grammar.
If you believe that digital publication of certain material infringes any of your rights or (privacy) interests, please let the Library know, stating your reasons. In case of a legitimate complaint, the Library will make the material inaccessible and/or remove it from the website. Please Ask the Library, or send a letter to: Library of the University of Amsterdam, Secretariat, Singel 425, 1012 WP Amsterdam, The Netherlands. You will be contacted as soon as possible.