Ontology-based data integration uses one or more ontologies to combine data or information from multiple heterogeneous sources. It is one of several approaches to data integration and may be classified as Global-As-View (GAV). The effectiveness of ontology-based data integration is closely tied to the consistency and expressivity of the ontology used in the integration process.
Background
Data from multiple sources are characterized by multiple types of heterogeneity. The following hierarchy is often used:
Ontologies, as formal models of representation with explicitly defined concepts and named relationships linking them, are used to address semantic heterogeneity in data sources. In domains such as bioinformatics and biomedicine, the rapid development, adoption and public availability of ontologies [1] have made it possible for the data integration community to use them for semantic integration of data and information.
The role of ontologies
Ontologies enable the unambiguous identification of entities in heterogeneous information systems and assertion of applicable named relationships that connect these entities together. Specifically, ontologies play the following roles:
The ontology enables accurate interpretation of data from multiple sources through the explicit definition of terms and relationships in the ontology.
In some systems, such as SIMS, queries are formulated using the ontology as a global query schema.
The ontology verifies the mappings used to integrate data from multiple sources. These mappings may either be user specified or generated by a system.
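The three roles above can be illustrated with a minimal Python sketch. It is not drawn from SIMS or any particular system: the ontology, the two sources, and the names `Mapping`, `verify` and `query` are all hypothetical, and the ontology is reduced to a dictionary of concepts and their properties. Each mapping defines a global concept as a view over one source, in the Global-As-View style. The `verify` function checks mappings against the ontology, and `query` answers a request posed purely in ontology terms.

```python
from dataclasses import dataclass

# Hypothetical global ontology: each concept lists the properties it defines.
ONTOLOGY = {"Gene": {"symbol", "organism"}}

# Two hypothetical sources that describe the same kind of entity
# with different local field names.
SOURCE_A = [{"gene_sym": "TP53", "species": "Homo sapiens"}]
SOURCE_B = [{"name": "Trp53", "org": "Mus musculus"}]


@dataclass
class Mapping:
    """Defines an ontology concept as a view over one source (GAV style)."""
    concept: str   # ontology concept the source rows are instances of
    rows: list     # the source data, as a list of dicts
    rename: dict   # local field name -> ontology property


MAPPINGS = [
    Mapping("Gene", SOURCE_A, {"gene_sym": "symbol", "species": "organism"}),
    Mapping("Gene", SOURCE_B, {"name": "symbol", "org": "organism"}),
]


def verify(mappings, ontology):
    """Return a list of problems: mappings to unknown concepts or properties."""
    problems = []
    for m in mappings:
        if m.concept not in ontology:
            problems.append(f"unknown concept: {m.concept}")
            continue
        for prop in m.rename.values():
            if prop not in ontology[m.concept]:
                problems.append(f"unknown property: {m.concept}.{prop}")
    return problems


def query(concept, mappings, **filters):
    """Answer a query posed in ontology terms by unfolding it over each source."""
    results = []
    for m in mappings:
        if m.concept != concept:
            continue
        for row in m.rows:
            # Translate the local record into ontology vocabulary.
            record = {prop: row[field] for field, prop in m.rename.items()}
            if all(record.get(k) == v for k, v in filters.items()):
                results.append(record)
    return results
```

With these definitions, `query("Gene", MAPPINGS, organism="Mus musculus")` returns records from both sources expressed in the shared vocabulary, and `verify` flags any mapping that refers to a concept or property the ontology does not define. A real system would use a formal ontology language and a reasoner rather than set membership.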
Approaches using ontologies for data integration
There are three main architectures that are implemented in ontology-based data integration applications, namely,