- Uncertain data integration using functional dependencies
- Amsterdam: Informatics Institute, University of Amsterdam
- Faculty of Science (FNWI)
- Informatics Institute (IVI)
Data integration systems are crucial for applications that need to provide a uniform interface to a set of autonomous and heterogeneous data sources. However, setting up a full data integration system for many application contexts, e.g. web and scientific data management, requires significant human effort, which prevents it from scaling.
In this paper, we propose IFD (Integration based on Functional Dependencies), a pay-as-you-go data integration system that allows integrating a given set of data sources, as well as incrementally integrating additional sources. IFD takes advantage of the background knowledge implied by functional dependencies to match the source schemas. Our system is built on a probabilistic data model that captures the uncertainty in data integration systems. Our performance evaluation results show significant gains in recall and precision compared to the baseline approaches. They confirm the importance of functional dependencies and the contribution of a probabilistic data model in improving the quality of schema matching. The analytical study and experiments show that IFD scales well.
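
For readers unfamiliar with the term, a functional dependency X -> Y holds in a relation when any two rows that agree on the attributes X also agree on the attributes Y. The following minimal Python sketch checks this property on a toy table. It is an illustration of the concept only, not the IFD matching algorithm; the function name `fd_holds` and the sample data are hypothetical.

```python
from typing import Dict, Hashable, Sequence, Tuple

Row = Dict[str, Hashable]


def fd_holds(rows: Sequence[Row], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen: Dict[Tuple, Tuple] = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The first row seen for a key fixes the expected right-hand side;
        # any later row with the same key and a different value violates the FD.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical toy source, invented for illustration.
people = [
    {"ssn": "1", "name": "Ann", "city": "Amsterdam"},
    {"ssn": "2", "name": "Bob", "city": "Utrecht"},
    {"ssn": "3", "name": "Ann", "city": "Delft"},
]

ssn_determines_name = fd_holds(people, ["ssn"], ["name"])
name_determines_city = fd_holds(people, ["name"], ["city"])
```

Here `ssn -> name` holds because every `ssn` value appears once, while `name -> city` fails because the two rows named "Ann" have different cities. Dependencies of this kind carry information about which attributes act as identifiers, which is the sort of background knowledge the abstract refers to.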