Ontology-Based Query Answering Problem
We call fact a closed conjunction of atoms. A conjunctive query is a formula of F OL(^; 9), in case the formula is closed, we called it Boolean conjunctive query. Free variables of the query are called the answer variables.
Example 2.5. If we consider the classes P rof essor, T eacher and Reviewer, the property reviews and the constant alice, we can define the following fact D and queries q1 and q2:
• D = P rof essor(alice) ^ Reviewer(alice)
• q1 = 9v9wT eacher(v) ^ reviews(v; w)
• q2(x) = P rof essor(x)
q1 is a Boolean conjunctive query and q2 conjunctive query, which has a single answer variable x.
Given a fact D, an answer to a query q with free variables fx1; x2; : : : ; xng is the restriction to fx1; x2; : : : ; xng of a homomorphism h from q to D. In particular, we have D j= h(q).
The behavior in this example is not fully satisfying, as it does not take the additional knowledge that we have on professors and teachers into account: indeed, we know that all professors are actually teachers. We tackle this by introducing existential rules [BLMS11], which are a formalism to represent formal definitions of domain knowledge, also known ontologies. Note that we could have alternatively used Description Logics [BCM+03].
Definition 2.6 (Existential Rule). An existential rule R is a first-order formula with the following form:
R = 8x8y B[x; y] ! 9zH[y; z]
where x; y; z are sets of variables and B; H are conjunctions of atoms whose variables belong to respectively x [y and y [z. We call B the body of the rule R, denoted body(R), H the head of R, denoted head(R). The frontier of R, denoted fr(R), is the set of variables y and z are call the existential variables of R.
Example 2.7 (Existential Rules). Using the same vocabulary as in Example 2.5, we can define the following existential rules:
• R1 = 8x Reviewer(x) ) 9y reviews(x; y);
• R2 = 8x P rof essor(x) ) T eacher(x):
R1 says that each reviewer reviews something. R2 says that each professor is also a teacher. We notice that in rule R1 there are no existential variables.
We call a set of rules R an ontology. A pair (D; R), where D is a fact and R is an ontology, is called a knowledge base, also denoted by K . Ontology-based query answering is the problem of finding the answers of a query on a knowledge base. Given a knowledge base and a (Boolean) query, an important reasoning task is to determine whether the query is entailed by the knowledge base. Several methods have been designed to this purpose. To ease the understanding, we first introduce saturation, which intuitively adds atoms that are entailed by the knowledge base to the data. We focus in the next section on an alternative approach called query rewriting, which is at the core of our approach.
Definition 2.8 (Rule Application). A rule R is applicable to a fact D if there exists a homomorphism h from the body of R to D. The result of the application of R with respect to h is (D; R; hsafe) = F [ hsafe(head(R)), where hsafe is a substitution on variables of head(R) that replaces a variable x by h(x) if x 2 fr(R) and by a fresh variable otherwise.
One can apply a rule on a fact that results from a rule application, hence leading to the notion of derivation sequence.
Definition 2.9 (Derivation Sequence). Let K = (D; R) be a knowledge base. An R -derivation se-quence of D is a finite sequence (D0 = D); D1; : : : ; Dn such as for 0 i n, there is Ri 2 R and a homomorphism hi from body(Ri) to Di such as Fi+1 = (Di; Ri; hi).
Example 2.10. We can apply the rule R1 = 8x Reviewer(x) ) 9y reviews(x; y) to D = P rof essor(alice) ^ Reviewer(alice) and get the fact D1 = P rof essor(alice) ^ Reviewer(alice) ^ 9x reviews(alice; x). We can apply the rule R2 = 8x P rof essor(x) ) T eacher(x) to D1 and we get the fact D2 = P rof essor(alice)^Reviewer(alice)^9x reviews(alice; x)^T eacher(alice). Let R = fR1; R2g, the sequence D; D1; D2 is an R-derivation sequence of D. We notice that D2 satifies q2, when D does not.
The following result states that entailment can be characterized through derivation sequences.
Theorem 2.11 ([BLMS11]). Let K = (D; R) be a knowledge base and q a Boolean conjunctive query, we know that K j= q if and only if there exists an R -derivation (D0 = D); D1; : : : ; Dk such that Dk j= q.
Rewriting Approach with Existential Rules
While saturation is a very intuitive approach to OBQA, an alternative approach that we will exploit in our approach is the so-called query rewriting approach. Instead of modifying the data in such a way that queries can be evaluated directly over the modified data, the query is modified in such a way that the modified query can be evaluated directly over the original data. We need a few technical tools to define this approach. For the ease of understanding, we only present the rewriting method on Boolean conjunctive queries, but it can be generalised for conjunctive queries.
Definition 2.12 (Separating Variables). Let q be a Boolean conjunctive query and be q0 a subset of q, the set of separating variables of q0 is denoted by sepq(q0) and is defined as:
sepq(q0) = vars(q n q0) \ vars(q0)
Definition 2.13 (Piece Unifier). Let q be a Boolean conjunctive query and R be an existential rule. A piece unifier = (q0; u) of q with R is such as q0 q and u is a function from vars(q0) [ fr(R) to const(q0) [ terms(head(R)) such as:
• for all y 2 fr(R); u(y) 2 fr(R) or u(y) is a constant;
• for all x 2 sepq(q0); u(x) 2 fr(R) or u(x) is a constant;
• u(q0) u(head(R)).
Using piece unifiers, the notion of direct rewriting of a query can be defined.
Definition 2.14 (Direct Rewriting). Given q a Boolean conjunctive query, R an existential rule and a piece unifier = (q0; u) of q with R, the direct rewriting of q with , denoted (q; R; ) is u(body(R) [ (q n q0)).
Example 2.15 (Direct Rewriting). If we use the notation from Example 2.5 and 2.7, we can find two direct rewritings of q1 by applying either R1 or R2. Considering R1, we find the following piece unifier 1 1 1 where 1 f g, 1 f 7! g and the direct rewriting of 1 with is = (q ; u ) q0 = T eacher(v) u = v x q q3 = 9x9w P rof essor(x) ^ reviews(x; w). Considering R2 on q3, we find the following piece unifier 2 3 2 , where 3 f g, this time the separing variables is not the empty set, but = (q0 ; u ) q0 = reviews(x; w) contains x. The direct rewriting of q3 by 2 is the query q4 = 9x9y; P rof essor(x) ^ Reviewer(x). We notice that D entails q4 from the Example 2.5 – hence this is an example of how rewriting the query enables to find more results than the original query.
As in the saturation case, it is possible to apply a rewriting step on a query that has been generated by a first rewriting step. This leads to the notion of rewriting sequence.
Definition 2.16 (Rewriting Sequence). A sequence of queries q0; q1; : : : ; qk is a R-rewriting sequence of q0 if for each 1 i k, it exists a rule Ri 2 R and a piece of unifier i of qi 1 with R such as
qi = (qi 1; Ri; i).
Definition 2.17 (R-rewriting). Let q be a Boolean conjunctive query and R be a set of existential rules, qk is a R-rewriting of q if there exists q0 = q; q1; : : : ; qk a R-rewriting sequence of q.
We recall the following theorem, proving that the above notion of query rewriting is adequate.
Theorem 2.18 ([BLMS11]). Let D be a fact, R a set of existential rules and q be a Boolean conjunctive query, D; R j= q if and only if there exists q0 a R-rewriting of q such that D j= q0.
In our approach, we will often exploit the following corollary of Theorem 2.18.
Corollary 2.19. Let q be a Boolean conjunctive query, R a set of existential rules and q0 a R-rewriting of q, then q0; R j= q.
Proof. We knows that q0 is a R-rewriting of q such as q0 j= q0, using Theorem 2.18 and since q0 can be seen as a fact, we know that q0; R j= q.
Let us notice that the set of rewritings of a query with respect to a set of rules R is not necessarily finite, as witnessed by the following example.
Example 2.20. Let us consider the query q = r(a; b) and the rule 8x 8y r(x; y) ^ r(y; z) ! r(x; z). Any query of the shape r(a; x1) ^ r(x1; x2) ^ : : : ^ r(xn; b), for any n, is a rewriting of q. There exists thus an infinity of (non-equivalent) R-rewriting of q.
Of special interest to our approach are sets of rules whose set of rewritings admits a finite covering. In other words, there exists a finite set of query Q such that for any fact D , it holds that D; R j= q if and only if there exists q0 2 Q such that D j= q0. Sets of rules ensuring that there exists such Q are called finite unification sets [BLMS11], and prominent exemple of such sets of rules are aGRD, linear [CGK13] or sticky rules [CGP10].
Given a query q and a knowledge base (D; R), we want to find interesting semantically defined subsets of the answers of q. We formalize this through the notion of labellisation of a query. For the sake of generality, we define this notion for first-order queries and not only formulas of F OL(^; 9).
Definition 3.1. A labellisation L of a query q with respect to a set of rules R is a set of queries such as each q0 2 L semantically implies q, formally: 0; R j= q
We expect that a labellisation of a query has at least two properties: covering and simplicity. Covering states that any answer of the original query is an answer of some query of the labellisation. Simplicity states that the labellisation does not contain redundant queries.
Definition 3.2 (Covering). A labellisation L of a query q with respect to a set of rules R is called covering if : _ q; = q0: R j q0 2L
Definition 3.3 (Simplicity). A labellisation L of a query q with respect to a set of rules R is called simple if : 8q1; q2 2 L ; q1; R 6j= q2:
Let us notice that this notion of labellisation defines a set acceptable objects. However, there exists trivial labellisations, such as fqg. Our goal is to find more interesting ones – and we will especially consider the cardinality of answers of the queries appearing in the labellisation.
As we will from now on exclusively consider conjunctive queries, we simplify the terminology and use the term “query” for conjunctive queries. We introduce the notion of a rewriting graph, which is at the core of the method presented in the next section, which returns a labellisation. The main idea behind the rewriting graph is to have a graph such as its vertices are the rewritings of the original query q and the edges contain semantic structure (entailment) relations between rewriting queries. We choose to find a labellisation in the set of rewriting queries of q, because they naturally are more precise queries than q by Proposition 2.19. We also use the way rewriting queries are build using rules which provide their semantic structure. The first Subsection 4.1 defines the general notion of the rewriting graph for existential rules. After a quick recall of RDFS and of a simpler rewriting procedure for that ontology language, Subsection 4.3 defines product graphs which can be viewed as an approximation of the rewriting graph for RDFS rules.
Definition & Properties
In this part, we define a graph of rewritings of the initial query q which contains in its edges information about entailments which exist between rewritings. In the first part, we define theoretically this graph and some of this properties. We then define the same graph using a more constructive approach.
Definition of the Rewriting Graph
In order to get a semantic definition of subset of answers, we define the set of all the rewritings of a query.
Notation 1. Let q be a query and R a set of existential rules, we denote the set of R-rewritings of q by Nq;R. Whenever there is no ambiguity, we will abusively simplify this notation in Nq.
As it is convenient to abstract away from the syntax of rewritings, we will consider classes of queries up to semantic equivalence, as formalized in the next definition.
Definition 4.1 (Class of Semantic Equivalence). Let R be a set of existential rules and S a set of q, the set of classes of semantic equivalence of S with is:
fq~R;S=q 2 Sg;
where q~R;S = fq0 2 S=q0; R j= q and q0; R j= qg.
Whenever there is no ambiguity, we will denote q~R;S by q~. We define a graph that contains all rewritings of a query with the respect of a set of existential rules and also all implication links between rewritings.
Definition 4.2 (Saturated Graph). The saturated graph of R-rewriting classes of q is the graph Gq = (V; E) defined by:
• V is the set of classes of semantic equivalence of Nq;
• E is the set of edges (q~1; q~2) such as q~1 6= q~2 and the class q~2 semantically implies the class q~1, which means there exists q1 2 q~1 and q2 2 q~2 such as q2; R j= q1.
Proposition 4.3. Let q be a query and R be a set of existential rules, the saturated graph G of R-rewriting classes of q is an acyclic graph.
Proof. Suppose there is a cycle q~0; q~2; : : : ; q~k in G, we know that for 1 i k, qi 1; R j= qi and q~0 = q~k, So finally for 1 i; j k; q~i = q~j and there is not cycle.
Let us notice that the saturated graph of R-rewriting classes of q may be infinite, but is finite whenever R is a finite unification set. In the remaining of this paper, we assume R to be a finite unification set. We can now define the rewriting graph of a query q, which is the version of the graph we use to extract from a labellisation. Note that this notion is well-defined, as the transitive reduction of an acyclic graph is uniquely defined. We define also this graph as a transitive reduction, because we want that all oriented paths in this graph are as detailed as possible. In other words, by definition of the transitive reduction, if we have the path q~0; q~1; : : : ; q~k in the rewriting graph, we know that for all 1 i k, there is no rewriting qi0 of q such as qi0 2= qi~ 1 \ q~i and such as qi 1; R j= qi0 and qi0; R j= qi.
Definition 4.4 (Rewriting Graph). The rewriting graph of a query q with respect to a set of existential rules R is the transitive reduction of the saturated graph of R-rewriting classes of q.
Construction of the Rewriting Graph
In this subsection, we prove that Algorithm 1 builds the rewriting graph of a query q with respect to a set of existential rules R. Algorithm 1 works as follows: Lines 4 to 14, the graph of rewriting process is built. Lines 15 to 18, all homomorphisms links are added to the edges of the graph. Strongly connected components are then merged (Line 21, by MergeStronglyConnectedComponents(G)) and the transitive reduction is computed by TransitiveReduction(G) and output. We do not provide more details about the done by TransitiveReduction and MergeStronglyConnectedComponents, as these operations are rather classical.
Table of contents :
1.1 Context and Motivation
2 Recall of OBQA Problem & Approches
2.1 Basic Logical Notations
2.2 Ontology-Based Query Answering Problem
2.3 Rewriting Approach with Existential Rules
3 Problem Statement
4 Rewritings Graphs
4.1 Definition & Properties
4.1.1 Definition of the Rewriting Graph
4.1.2 Construction of the Rewriting Graph
4.2 RDFS rewriting
4.3 Product of Graph
6 Implementation & Experiment
6.2 Computation of the Rewriting Graph
6.3 Building Labellisations
7 Future Work