Quantifying and Propagating Uncertainty in Automated Linked Data Integration

  • Klitos Christodoulou
  • Fernando Rene Sanchez Serrano
  • Alvaro A. A. Fernandes
  • Norman W. Paton
  • Abdelkader Hameurlain (Editor)
  • Roland Wagner (Editor)

    Research output: Contribution to journal, Article, peer-review

    Abstract

    The Web of Data consists of numerous Linked Data (LD) sources from many largely independent publishers, giving rise to the need for data integration at scale. At this scale, automation can provide candidate integrations that underpin a pay-as-you-go approach. However, automated approaches need: (i) to operate across several data integration steps; (ii) to build on diverse sources of evidence; and (iii) to contend with uncertainty. This paper describes the construction of probabilistic models that yield degrees of belief both on the equivalence of real-world concepts and on the ability of mapping expressions to return correct results. The paper shows how such models can underpin a Bayesian approach to assimilating different forms of evidence: syntactic (in the form of similarity scores derived by string-based matchers), semantic (in the form of semantic annotations stemming from LD vocabularies), and internal (in the form of fitness values for candidate mappings). The paper presents an empirical evaluation of the described methodology against equivalence and correctness judgements made by human experts. The experimental evaluation confirms that the proposed Bayesian methodology is suitable as a generic, principled approach for quantifying and assimilating different pieces of evidence throughout the various phases of an automated data integration process.
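
The Bayesian assimilation of evidence mentioned in the abstract can be illustrated with a minimal sketch of sequential Bayesian updating. This is not the paper's model: the function names, the uniform prior, and the likelihood values below are hypothetical, and the sketch assumes that the pieces of evidence are conditionally independent given the hypothesis that two concepts are equivalent.

```python
def bayes_update(prior, p_e_given_h, p_e_given_not_h):
    """Return P(H | E) from P(H) and the two likelihoods, via Bayes' theorem."""
    numerator = p_e_given_h * prior
    evidence = numerator + p_e_given_not_h * (1.0 - prior)
    if evidence == 0.0:
        raise ValueError("evidence has zero probability under both hypotheses")
    return numerator / evidence


def assimilate(prior, likelihood_pairs):
    """Fold several pieces of evidence into one degree of belief.

    Each pair is (P(E | equivalent), P(E | not equivalent)). Updating
    sequentially like this assumes the pieces of evidence are conditionally
    independent given the hypothesis (an assumption of this sketch).
    """
    belief = prior
    for p_h, p_not_h in likelihood_pairs:
        belief = bayes_update(belief, p_h, p_not_h)
    return belief


# Hypothetical likelihoods, chosen only for illustration.
syntactic = (0.8, 0.2)  # e.g. a high string-similarity score was observed
semantic = (0.9, 0.3)   # e.g. a supporting vocabulary annotation was observed

posterior = assimilate(0.5, [syntactic, semantic])
print(round(posterior, 4))
```

Starting from a prior of 0.5, the syntactic evidence alone raises the belief to 0.8, and the semantic evidence then raises it to 12/13 (about 0.92). In practice the likelihoods would have to be estimated from data rather than fixed by hand.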

    Original language: English
    Pages (from-to): 81-112
    Number of pages: 32
    Journal: Transactions on Large-Scale Data- and Knowledge-Centered Systems XXXVII
    Publication status: Published - 2018

    Keywords

    • Bayesian updating
    • Data integration
    • Linked data
    • Probabilistic modelling
