Co-typing of a data exchange setting and co-typing graph

Get Complete Project Material File(s) Now! »

Relational schema and dependencies

We assume an infinite set of attribute names A Va. A relational schema is a tuple R = (R; attrs; fd; ind) where R is a finite set of relation names, each relation name with a given arity; attrs is a function attrs:R ! P(A) that assigns to each relation name R 2 R a finite set of attribute names whose cardinality is equal to the arity of R; fd is a set of functional dependencies (fds) of the form R : X ! Y , where R 2 R is a relation name, and X; Y attrs(R), and ind is a set of inclusion dependencies of the form R[X] P [Y ], where R; P 2 R are relation names, X attrs(R) and Y attrs(P ) and jXj = jY j. Also, we assume that the set of attribute names of every relation have a fixed order. Intuitively, a fd R : X ! Y means that values of attribute names in X determine the values of attribute names in Y in every tuple; and an inclusion dependency between two relations R and P , R[X] P [Y ], means that every value of an attribute in X is the same value of an attribute in Y . We point out that key dependencies (kds) are a special case of functional dependencies where Y = attrs(R). Also, foreign key constraints are a special case of inclusion dependencies where Y is a primary key in P .
Given a finite set A A of attribute names, a tuple over A is a function that maps from an attribute name to a value, i.e., t : A ! Dom. We view a tuple as a record, consequently we write t:a for t(a). Let X A be a set of attributes, by t:X we denote a tuple over X that consists of precisely the value of t on attributes in X, i.e., t : X ! Dom and t:X(a) = t(a) for a 2 X. An instance I of a relational schema R is a function I that maps every relation name R 2 R to a finite set I(R) of tuples over attributes of R. The size of an instance I of R, denoted by jIj is the sum of number of tuples by relation mapped by I. An instance I of a relational schema R satisfies a fd R : X ! Y , denoted by I j= R : X ! Y , if for any pair of tuples in I(R) that agree on X also agree on Y . Formally, I j= R : X ! Y iff 8t1; t2 2 I(R): t1:X = t2:X ) t1:Y = t2:Y . An instance I of a relational schema R satisfies an inclusion dependency R[X] P [Y ], denoted by I j= R[X] P [Y ], if for any tuple t 2 I(R), there is a tuple t0 2 I(P ) such that the tuple t over X agrees with t0 over Y . Formally, I j= R[X] P [Y ] iff 8t 2 I(R): 9t0 2 I(P ): ran(t) ran(t0) ^ t:X = t0:Y where ran returns the range of t. Finally, we say that an instance I is consistent if I satisfies the set of dependencies fd [ ind. The active domain adom(I) of the instance I is the set of values from Dom used in I. For ease of use, when writing a functional dependency or inclusion dependency, we write a set of attributes A = fa1; : : : ; ang as A = a1 : : : an. We denote the set of instances of a relational schema R by Inst(R). Example 1.2.2 (cont. Example 1.2.1). Take the conference database, the relational schema R0 = (R0; attrs; fd; ind) has the following relation names R0 = fResearcher; UniTeam; Conference; Registration; Talk; Collaboratorg.

Relational data exchange

We recall the classical data management problem called data exchange. Relational data exchange is the transformation from a source relational database to a target relational database. Such a transformation is defined by a set of mappings from the source rela-tional schema to the target schema. These elements constitute the data exchange setting. The result of a transformation of a source instance is an instance of the target schema, and if it satisfies the constraints (of target schema), it is called a solution. We illustrate the above notions with the help of the following example. Example 1.3.1 (cont. Example 1.2.1 and 1.2.2). Suppose that the conference database needs to be transformed into a database that follows a different relational schema. Let this target schema contain the relations: Author, Publication and Conference; and the constraints be the following dependencies:
Author : IdAuthor ! IdAuthor Name Country.
Conference : Name Year ! Name Year BookTitle Place.
Publication : IdAuthor Title ! IdAuthor Title BookTitle.

Universal solution

Given a data exchange setting and a source instance, there might be no solution or possibly infinite number of solutions, and a considerable amount of work has focused on finding universal solutions that represent the entire space of solutions. Intuitively, a universal solution is the most general solution that is included in all specific solutions. As we shall see later on, universal solutions are useful for computing answers to a given query over all solutions. We illustrate the significance of being a particular and general solution by their comparison in the following example. Example 1.3.6. Recall the solution in Figure 1.9 denoted by J and the solution in Figure 1.4 denoted by J1. We make the following observations. J1 contains data values not present in the source instance e.g., author Lucia and the conference name AMW. Also this solution contains facts that cannot be de-rived from source instance using st-tgds. For instance, the publication with title On algebraic connectivity of graphs is in the conference ISWC. In J1, all authors are from the same country. On the other hand, J contains only data values that are present in the source instance and null values invented to satisfy target constraints. Also, J contains only facts that are derived from source instance. Comparing these two solutions, we observe that J1 has been constructed with some additional information that makes it more specific solution than J. Furthermore, we see all the information of J can be found in J1, but not all the information of J1 can be found in J. We use homomorphisms to compare solutions: we say that J subsumes J0 if there is a homomorphism from J to J0.

Table of contents :

1 Preliminaries
1.1 Logic
1.2 Relational databases
1.2.1 Relational schema and dependencies
1.2.2 Logic formalization
1.2.3 Database queries
1.3 Relational data exchange
1.3.1 Data exchange setting
1.3.2 Chase procedure
1.3.3 Universal solution
1.3.4 Certain query answering
1.3.5 Consistency
1.4 Resource description framework
1.4.1 RDF graph
1.4.2 Logic formalization
1.5 Schemas for RDF graphs
1.5.1 Typed RDF graph
1.5.2 Shape constraints language
1.5.3 Logic formalization
1.5.4 Shape constraints as dependencies
2 Relational to RDF data exchange
2.1 Relational to RDF data exchange setting
2.2 R2RML: proof of concept
2.3 Problems of interest
2.3.1 Checking consistency
2.3.2 Computing certain answers
2.3.3 Visual mapping language
2.3.4 Schema elicitation
2.4 Related work
3 Consistency
3.1 The opposite side of consistency: inconsistency
3.1.1 Sources of inconsistency
3.1.2 Importance of core pre-solution
3.2 Value consistency
3.2.1 Testing value consistency
3.3 Node kind consistency
3.3.1 Co-typing of a data exchange setting and co-typing graph
3.3.2 Co-typing of a graph
3.3.3 Formalization
3.3.4 Necessary condition
3.3.5 Algorithm for testing node kind consistency
3.4 Deciding consistency
3.4.1 Decidability
3.5 Conclusion
3.6 Related work
4 Certain query answering
4.1 Motivation and problems
4.2 Results from existing approaches
4.2.1 Super-weakly acyclic tgds
4.2.2 Guarded tgds
4.3 Simulation-based approach
4.3.1 Preliminar notions
4.3.2 Forward NRE-based Boolean query language
4.3.3 Robust query classes
4.3.4 Universal simulation solution
4.4 Conclusion
4.5 Related work
5 Visual mapping language
5.1 Motivation and use case
5.2 Preliminary notions
5.3 The intermediary language
5.4 The visual mapping language
5.5 ShERML
5.5.1 Architecture
5.5.2 VML Editor
5.5.3 Materializer
5.5.4 Converter
5.5.5 Consistency checking
5.5.6 Additional features
5.6 Evaluation
5.6.1 Methodology
5.6.2 Results
5.7 Discussion and conclusion
5.8 Related work
6 Shapes schema elicitation
6.1 Motivation
6.2 Problem statement
6.2.1 Schema-less data exchange setting
6.2.2 Elicitation Problem
6.3 M3 Elicitation algorithm
6.3.1 Preliminary notions
6.3.2 Algorithm
6.4 Soundness
6.5 Completeness
6.6 Negative results
6.7 Conclusion
6.8 Related work
Bibliography