- On entity resolution in probabilistic data
- Award date: 30 September 2014
- Number of pages
- Document type: PhD thesis
- Faculty of Science (FNWI)
- Informatics Institute (IVI)
Entity resolution (ER) is the problem of identifying duplicate tuples, that is, tuples that represent the same real-world entity. The ER problem arises in many real-life applications. These range from news aggregation websites, which must identify news items that cover the same story so that no story is presented to the user several times, to the integration of two companies' customer databases after a merger, where identifying the tuples that refer to the same customer is crucial.
Because of its diverse applications, the ER problem has been formulated in different ways in the literature. The two main ER-related problem formulations are 1) identity resolution and 2) deduplication. In identity resolution, the aim is to find the duplicate(s) of a given tuple in a given database. In deduplication, the aim is to find groups of duplicate tuples in a given database and merge them in order to increase the quality of the database itself.
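To make the difference between the two formulations concrete, the following is a minimal sketch on an ordinary (deterministic) database. It is not the method developed in the thesis: the string-similarity measure, the 0.8 threshold, the function names and the sample data are all assumptions chosen for illustration, and the merge step of deduplication is omitted.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Average string similarity over the attributes of two equal-length tuples."""
    scores = [SequenceMatcher(None, x.lower(), y.lower()).ratio()
              for x, y in zip(a, b)]
    return sum(scores) / len(scores)


def identity_resolution(query, database, threshold=0.8):
    """Identity resolution: find the likely duplicates of ONE given tuple."""
    return [t for t in database if similarity(query, t) >= threshold]


def deduplication(database, threshold=0.8):
    """Deduplication: group ALL tuples into clusters of likely duplicates.

    Uses union-find, so groups are the connected components of the
    'similar enough' relation. Merging each group into one tuple is left out.
    """
    parent = list(range(len(database)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(database)):
        for j in range(i + 1, len(database)):
            if similarity(database[i], database[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i, t in enumerate(database):
        groups.setdefault(find(i), []).append(t)
    return list(groups.values())


# Hypothetical sample data, for illustration only.
database = [
    ("John Smith", "Amsterdam"),
    ("Jon Smith", "Amsterdam"),
    ("Mary Jones", "Utrecht"),
]
matches = identity_resolution(("John Smith", "Amsterdam"), database)
groups = deduplication(database)
```

Here `identity_resolution` answers a query about a single tuple, while `deduplication` partitions the whole database. The pairwise comparison is quadratic in the number of tuples, which is acceptable for a sketch but not for large databases.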
The ER problem is, however, not limited to deterministic (ordinary) databases. It also arises in applications that deal with probabilistic databases, i.e. databases in which each tuple or attribute value is associated with a probability value that indicates, for instance, its confidence level.
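As an illustration of what such data might look like, the sketch below models a probabilistic tuple with a tuple-level confidence and, per attribute, a set of alternative values with probabilities. The `ProbTuple` class, the `match_probability` function, the independence assumption and the sample values are hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass


@dataclass
class ProbTuple:
    # attribute name -> {alternative value: probability}
    # (assumed: probabilities of one attribute's alternatives sum to at most 1)
    attributes: dict
    # tuple-level probability that the tuple belongs to the database
    confidence: float = 1.0


def match_probability(a, b, attr):
    """Probability that a and b agree on attr, assuming the two tuples'
    attribute values are independent of each other."""
    return sum(p * b.attributes[attr].get(value, 0.0)
               for value, p in a.attributes[attr].items())


# Hypothetical sample data, for illustration only.
t1 = ProbTuple({"name": {"John Smith": 0.7, "Jon Smith": 0.3}}, confidence=0.9)
t2 = ProbTuple({"name": {"John Smith": 0.5, "Joan Smith": 0.5}})
p_same_name = match_probability(t1, t2, "name")  # 0.7 * 0.5
```

In this setting a comparison between two tuples no longer yields a single similarity score computed from fixed values: every alternative value, weighted by its probability, contributes to the outcome.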
In this thesis, we study the ER problem in probabilistic databases. More specifically, we study the following problems: 1) identity resolution in probabilistic data, 2) identity resolution in distributed probabilistic data, 3) deduplication in probabilistic data, and 4) schema matching in a fully automated setting.
- Research conducted at: Universiteit van Amsterdam
Series: SIKS dissertation series 2014-32