On Horizontal and Vertical Separation in Hierarchical Text Classification

doi:https://doi.org/10.1145/2970398.2970408

Authors	M. Dehghani, H. Azarbonyad, J. Kamps, M. Marx
Publication date	2016
Book title	ICTIR'16
Book subtitle	proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval : September 12-16, 2016, Newark, Delaware, USA
ISBN (electronic)	9781450344975
Event	ICTIR '16 ACM SIGIR International Conference on the Theory of Information Retrieval
Pages (from-to)	185-194
Publisher	New York, NY: Association for Computing Machinery
Organisations	Faculty of Science (FNWI) - Informatics Institute (IVI); Faculty of Science (FNWI); Interfacultary Research - Institute for Logic, Language and Computation (ILLC); Faculty of Humanities (FGw)
Abstract	Hierarchy is an effective and common way of organizing data and representing their relationships at different levels of abstraction. However, hierarchical data dependencies cause difficulties in the estimation of "separable" models that can distinguish between the entities in the hierarchy. Extracting separable models of hierarchical entities requires us to take their relative position into account and to consider the different types of dependencies in the hierarchy. In this paper, we present an investigation of the effect of separability in text-based entity classification and argue that in hierarchical classification, a separation property should be established between entities not only in the same layer, but also in different layers. Our main findings are the following. First, we analyse the importance of separability on the data representation in the task of classification and, based on that, we introduce the "Strong Separation Principle" for optimizing the expected effectiveness of classifier decisions based on the separation property. Second, we present Significant Words Language Models (SWLM), which capture all, and only, the essential features of hierarchical entities according to their relative position in the hierarchy, resulting in horizontally and vertically separable models. Third, we validate our claims on real-world data and demonstrate how SWLM improves the accuracy of classification and how it provides transferable models over time. Although discussions in this paper focus on the classification problem, the models are applicable to any information access task on data that has, or can be mapped to, a hierarchical structure.
Document type	Conference contribution
Language	English
Related publication	Hierarchical Text Classification Based on Separation in the Data or Feature Space
Published at	https://doi.org/10.1145/2970398.2970408
Downloads	p185-dehghani (Final published version)
