A Multiple-Label Guided Clustering Algorithm for Historical Document Dating and Localization

Authors
Publication date 24-08-2016
Journal IEEE Transactions on Image Processing
Volume | Issue number 25 | 11
Pages (from-to) 5252-5265
Organisations
  • Faculty of Humanities (FGw) - Amsterdam Institute for Humanities Research (AIHR) - Amsterdam School of Historical Studies (ASH)
Abstract
It is of essential importance for historians to know the date and place of origin of the documents they study. It would be a huge advancement for historical scholars if it would be possible to automatically estimate the geographical and temporal provenance of a handwritten document by inferring them from the handwriting style of such a document. We propose a multiple-label guided clustering algorithm to discover the correlations between the concrete low-level visual elements in historical documents and abstract labels, such as date and location. First, a novel descriptor, called histogram of orientations of handwritten strokes, is proposed to extract and describe the visual elements, which is built on a scale-invariant polar-feature space. In addition, the multi-label self-organizing map (MLSOM) is proposed to discover the correlations between the low-level visual elements and their labels in a single framework. Our proposed MLSOM can be used to predict the labels directly. Moreover, the MLSOM can also be considered as a pre-structured clustering method to build a codebook, which contains more discriminative information on date and geography. The experimental results on the medieval paleographic scale data set demonstrate that our method achieves state-of-the-art results.
Document type Article
Language English
Published at https://doi.org/10.1109/TIP.2016.2602078
Permalink to this page
Back