Accurate parsing with compact tree-substitution grammars: double-DOP

Authors	F. Sangati, W.H. Zuidema
Publication date	2011
Book title	Conference on Empirical Methods in Natural Language Processing: Proceedings of the Conference
ISBN	9781937284114
Event	EMNLP 2011: Conference on Empirical Methods in Natural Language Processing
Pages (from-to)	84-95
Publisher	Stroudsburg: The Association for Computational Linguistics
Organisations	Interfacultary Research - Institute for Logic, Language and Computation (ILLC)
Abstract	We present a novel approach to Data-Oriented Parsing (DOP). Like other DOP models, our parser utilizes syntactic fragments of arbitrary size from a treebank to analyze new sentences, but, crucially, it uses only those which are encountered at least twice. This criterion allows us to work with a relatively small but representative set of fragments, which can be employed as the symbolic backbone of several probabilistic generative models. For parsing we define a transform-backtransform approach that allows us to use standard PCFG technology, making our results easily replicable. According to standard Parseval metrics, our best model is on par with many state-of-the-art parsers, while offering some complementary benefits: a simple generative probability model, and an explicit representation of the larger units of grammar.
Document type	Conference contribution
Language	English
Published at	http://homepages.inf.ed.ac.uk/fsangati/Sangati_Zuidema_EMNLP11.pdf