Adding Domain Knowledge to Improve Entity Resolution in 17th and 18th Century Amsterdam Archival Records

Open Access
Authors
  • A.J. Feelders
Publication date 2022
Host editors
  • A. Dimou
  • S. Neumaier
  • T. Pellegrini
  • S. Vahdati
Book title Towards a Knowledge-Aware AI
Book subtitle SEMANTiCS 2022 — Proceedings of the 18th International Conference on Semantic Systems, 13–15 September 2022, Vienna, Austria
ISBN
  • 9781643683201
  • 9783898387675
ISBN (electronic)
  • 9781643683218
Series Studies on the Semantic Web
Event SEMANTiCS 2022
Pages (from-to) 90-104
Number of pages 15
Publisher Amsterdam: IOS Press
Organisations
  • Faculty of Humanities (FGw) - Amsterdam Institute for Humanities Research (AIHR) - Amsterdam School for Heritage, Memory and Material Culture (AHM)
Abstract
The problem of entity resolution is central in the field of Digital Humanities. It is also one of the major issues in the Golden Agents project, which aims to create an infrastructure that enables researchers to search for patterns that span decentralised knowledge graphs from cultural heritage institutes. To this end, we created a method to perform entity resolution on complex historical knowledge graphs. In previous work, we encoded and embedded the relevant (duplicate) entities in a vector space to derive similarities between them based on their sharing a similar context in RDF graphs. In some cases, however, available domain knowledge or rational axioms can be applied to improve entity resolution performance. We show how domain knowledge and rational axioms relevant to the task at hand can be expressed as (probabilistic) rules, and how the information derived from rule application can be combined with quantitative information from the embedding. In this work, we apply our entity resolution method to two data sets. First, we apply it to a data set for which we have a detailed ground truth for validation. This experiment shows that combining the embedding with the application of domain knowledge and rational axioms leads to improved resolution performance. Second, we perform a case study by applying our method to a larger data set for which there is no ground truth and where the outcome is subsequently validated by a domain expert. The results of this case study demonstrate that our method achieves very high precision.
Document type Conference contribution
Language English
Published at https://doi.org/10.3233/SSW220012
Downloads
SSW-55-SSW220012 (Final published version)