@PhilosTEI: Building Corpora for Philosophers

Open Access
Authors
Publication date 2017
Host editors
  • J. Odijk
  • A. van Hessen
Book title CLARIN in the Low Countries
ISBN
  • 9781911529248
ISBN (electronic)
  • 9781911529255
  • 9781911529262
  • 9781911529279
Pages (from-to) 379-392
Publisher London: Ubiquity Press
Organisations
  • Interfacultary Research - Institute for Logic, Language and Computation (ILLC)
Abstract
For philosophers to take a computational turn in their field, especially when that field relies heavily on historical material, they must be able to build high-quality, easily and freely accessible corpora in a sustainable format, composed of multi-language, multi-script books from different historical periods. At the moment, corpora matching these needs are virtually non-existent. Within the CLARIN-NL project @PhilosTEI, we have addressed the problem of building corpora of this kind by developing an open-source, web-based, user-friendly workflow from textual images to TEI, based on the state-of-the-art open-source OCR software Tesseract and a multi-language version of TICCL, a powerful OCR post-correction tool. We have demonstrated the utility of the @PhilosTEI tool by applying it to a multilingual, multi-script corpus of important 18th- to 20th-century European philosophical texts.
Document type Chapter
Language English
Published at https://doi.org/10.5334/bbi.32
Published at http://www.jstor.org/stable/10.2307/j.ctv3t5qjk http://www.oapen.org/search?identifier=641502