Sources of evidence for automatic indexing of political texts

M. Dehghani; H. Azarbonyad; M. Marx; J. Kamps

doi:https://doi.org/10.1007/978-3-319-16354-3_63

Authors	M. Dehghani; H. Azarbonyad; M. Marx; J. Kamps
Publication date	2015
Host editors	A. Hanbury; G. Kazai; A. Rauber; N. Fuhr
Book title	Advances in Information Retrieval
Book subtitle	37th European Conference on IR Research, ECIR 2015, Vienna, Austria, March 29-April 2, 2015 : proceedings
ISBN	9783319163536
ISBN (electronic)	9783319163543
Series	Lecture Notes in Computer Science
Event	ECIR 2015: 37th European Conference on Information Retrieval
Pages (from-to)	568-573
Publisher	Cham: Springer
Organisations	Faculty of Science (FNWI) Faculty of Humanities (FGw) - Amsterdam Institute for Humanities Research (AIHR) Other Faculty of Humanities (FGw) Interfacultary Research - Institute for Logic, Language and Computation (ILLC) Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract	Political texts on the Web, documenting laws and policies and the process leading to them, are of key importance to government, industry, and every individual citizen. Yet access to such texts is difficult due to the ever-increasing volume and complexity of the content, prompting the need for indexing or annotating them with a common controlled vocabulary or ontology. In this paper, we investigate the effectiveness of different sources of evidence (such as the labeled training data, textual glosses of descriptor terms, and the thesaurus structure) for automatically indexing political texts. Our main findings are the following. First, using a learning-to-rank (LTR) approach integrating all features, we observe significantly better performance than previous systems. Second, the analysis of feature weights reveals the relative importance of various sources of evidence, also giving insight into the underlying classification problem. Third, a lean-and-mean system using only four features (text, title, descriptor glosses, descriptor term popularity) is able to perform at 97% of the large LTR model.
Document type	Conference contribution
Language	English
Published at	https://doi.org/10.1007/978-3-319-16354-3_63