Constructing a Recipe Web from Historical Newspapers

Authors
Publication date 2018
Host editors
  • D. Vrandečić
  • K. Bontcheva
  • M.C. Suárez-Figueroa
  • V. Presutti
  • I. Celino
  • M. Sabou
  • L.-A. Kaffee
  • E. Simperl
Book title The Semantic Web – ISWC 2018
Book subtitle 17th International Semantic Web Conference, Monterey, CA, USA, October 8–12, 2018 : proceedings
ISBN
  • 9783030006709
ISBN (electronic)
  • 9783030006716
Series Lecture Notes in Computer Science
Event 17th International Semantic Web Conference, ISWC 2018
Volume | Issue number I
Pages (from-to) 217-232
Number of pages 16
Publisher Cham: Springer
Organisations
  • Interfacultary Research - Institute for Logic, Language and Computation (ILLC)
Abstract

Historical newspapers provide a lens on customs and habits of the past. For example, recipes published in newspapers highlight what and how we ate and thought about food. The challenge here is that newspaper data is often unstructured and highly varied. Digitised historical newspapers add an additional challenge, namely that of fluctuations in OCR quality. Therefore, it is difficult to locate and extract recipes from them. We present our approach based on distant supervision and automatically extracted lexicons to identify recipes in digitised historical newspapers, to generate recipe tags, and to extract ingredient information. We provide OCR quality indicators and their impact on the extraction process. We enrich the recipes with links to information on the ingredients. Our research shows how natural language processing, machine learning, and semantic web can be combined to construct a rich dataset from heterogeneous newspapers for the historical analysis of food culture.

Document type Conference contribution
Language English
Published at https://doi.org/10.1007/978-3-030-00671-6_13
Other links https://www.scopus.com/pages/publications/85054854478