High-Order Deterministic tree Transducers

Get Complete Project Material File(s) Now! »

Simply-typed lambda calculus

We explain here how to represent trees and functions on trees using simply-typed – calculus, and we present the de nitions of linear and almost linear -terms. Fix a nite set of atomic types A, we then de ne the set of types over A, types(A), as the types that are either an atomic type, i.e. an element of A, or a functional type (A ! B), with A and B being in types(A). The operator ! is right-associative and A1 ! ! An ! B denotes the type (A1 ! ( ! (An ! B) )). The order of a type A is inductively de ned by order(A) = 0 when A 2 A, and order(A ! B) = max(order(A) + 1; order(B)). Signatures A signature is a triple (C; A; ) with C being a nite set of constants, A a nite set of atomic types, and a mapping from C to types(A), the typing function. We allow ourselves to write types( ) to refer to the set types(A).
The order of a signature is the maximal order of a type assigned to a constant (i.e. maxforder( (c)) j c 2 Cg). A tree signature is a signature of order 1 with a unique atomic type. In this work, we mostly deal with tree signatures. In a tree signature with atomic type o, the types of constants are of the form o ! ! o ! o. We write on ! o for an order-1 type which uses n + 1 occurrences of o, for example, o2 ! o denotes o ! o ! o. When c is a constant of type A, we may write cA to make explicit that c has type A. Two signatures 1 = (C1; A1; 1) and 2 = (C2; A2; 2) can be summed if for every c in C1 \ C2 we have 1(c) = 2(c); in such a case we write 1 + 2 for the signature (C1 [C2; A1 [A2; ) so that if c is in C1, (c) = 1(c) and if c is in C2, (c) = 2(c). The sum operation over signatures being associative and commutative, we write 1 + + n to denote the sum of several signatures. We assume that for every type A, there is an in nite countable set of variables of type A. When two types are di erent the set of variables of those types are of course disjoint. As with constants, we may write xA to make it clear that variable x is of type A.
-Terms When is a signature, we de ne the family of simply typed -terms over , denoted ( ) = ( A( ))A2types( ), as the smallest family indexed by types( ) so that:
• if cA is a constant in , then cA is in A( ),
• any variable xA is in A( ),
• if A = B ! C and M is in C ( ), then ( xB:M) is in A( ),
• if M is in B!A( ) and N is in B( ), then (M N) is in A( ).
The term M is a pure -term if it does not contain any constant cA from . When the type is irrelevant we write M 2 ( ) instead of M 2 A( ). We drop parentheses when it does not bring ambiguity. In particular, we write x1 : : : xn:M for ( x1(: : : ( xn:M) : : : )), and M0M1 : : : Mn for ((: : : (M0M1) : : : )Mn). The set fv(M) of free variables of a term M is inductively de ned on the structure of M :
• fv(c) = ;,
• fv(x) = fxg,
• fv(M N) = fv(M) [ fv(N),
• fv( x:M) = fv(M) n fxg.
Terms which have no free variables are called closed. We write M[x1; : : : ; xk] to emphasize that fv(M) is included in fx1; : : : ; xkg. When doing so, we write M[N1; : : : ; Nk] for the capture avoiding substitution of variables x1, . . . , xk by the terms N1, . . . , Nk. In other contexts, we simply use the usual notation M[x1=N1; : : : ; xk=Nk]. Moreover given a substitution , we write M: for the result of applying this (capture avoiding) substitution and we write [x1=N1; : : : ; xk=Nk] for the substitution that maps the variables xi to the terms Ni but is otherwise equal to . Of course, we authorize such substitutions only when the -term Ni has the same type as the variable xi.
Reduction A -redex is a term of the form (( x:M)N) and its -contractum is M[x=N]. A term M -contracts to M0 when one of its subterm is a -redex and is replaced by its -contractum in M0. We write M ! M0 when M -contracts to M0. The re exive transitive closure of -contraction is -reduction and is written ! and its symmetric re exive and transitive closure, -conversion, is written = . A term that does not contain a -redex is said in -normal form.
An -redex is a term of the form ( x:(M x)) when x 2= fv(M) and its -contractum is the term M. The relations of -contraction, ! , -reduction, ! , and -conversion, = , are de ned similarly to -contraction. So as to compare -terms, we use the union of – contraction and -contration, ! . But this can be done by putting terms in a particular form: the -long form. A term M is said to be in -long form whenever if N is a subterm of M that has type A ! B then either N is of the form x:N0, or its occurrence in M is applied to some argument. For every term M there is a term M0 in -long form such that M = M0 and moreover M = N i given M0 and N0 that are -long forms of M and N, M 0 = N0. From now on, we are always going to work with terms in -long form. Consider closed terms of type o that are in -normal form and that are built on a tree signature, they can only be of the form a t1 : : : tn where a is a constant of type on ! o and t1, . . . , tn are closed terms of type o in -normal form. This is just another notation for ranked trees. So when the type o is meant to represent trees, types of order 1 which have the form o ! ! o ! o represent functions from trees to trees, or more precisely tree contexts. Types of order 2 are types of trees parametrized by contexts. The notion of order captures the complexity of the operations that terms of a certain type describe.
Linearity and almost linearity A term M is said linear if each variable (either bound or free) in M occurs exactly once in M. A term M is said syntactically almost linear when each variable in M of non-atomic type occurs exactly once in M. Note that, throug reduction, linearity is preserved but not syntactic almost linearity.
For example, given a tree signature 1 with one atomic type o and two constants f of type o2 ! o and a of type o, the term M = ( y1y2:f y1 (f a y2)) a (f x a) with free variable x of type o is linear because each variable (y1, y2 and x) occurs exactly once in M. The term M contains a -redex so: ( y1y2:f y1 (f a y2)) a (f x a) ! ( y2:f a (f a y2)) (f x a) ! f a (f a (f x a)). The term f a (f a (f x a)) has no -redex so it is the -normal form of M.
Another example: the term M2 = ( y:f y y) (x a) with free variable x of type o ! o is syntactically almost linear because the variable y which occurs twice in the term is of the atomic type o. It -reduces to the term M20 = f (x a) (x a) which is not syntactically almost linear, so -reduction does not preserve syntactical almost linearity. We call a term almost linear when it is -convertible to a syntactically almost linear term. Almost linear terms are characterized also by typing properties (see [22]).
For any signature = (C; fog; ) with a single atomic type o and for any type A on this signature, we de ne a new symbol A which will serve as an undetermined object. We note the signature obtained this way. Intuitively, it represent the result of a computation which does not produce an output. For example, if there is no transition rule with a state q and a tree a x1 : : : xn, then we can add a rule q(a x1 : : : xn) ! A; this allows us to have a complete set of transition rules, while still being a generalisation of other transduction models (such as DTOP and MTT) that do not necessarily produce an output for every input.

High-Order Deterministic top-down tree Transducers

From now on we assume that i is a tree signature for every number i and that its atomic type is oi.
A High-Order Deterministic top-down tree Transducer (HODT) T is a tuple ( Q; 1; 2; q0; R) where:
• 1 = (C1; fo1g; 1) is a rst-order tree signature, the input signature,
• 2 = (C2; fo2g; 2) is a rst-order tree signature, the output signature,
• Q = (Q; fo1; o2g; s) is the state signature, and is so that for every q 2 Q, s(q) is of the form o1 ! Aq where Aq is in types( 2). Constants of Q are called states,
• q0 is the initial state,
• R is a nite set of rules of the form ! ! q(a x ) M[x1; : : : ; xn] where:
{ q is a state of Q.
{ a is a constant of 1 with type on1 ! o1.
{!x = x1; : : : ; xn are variables of type o1 representing the child trees of the root labeled a.
{ M is a term of type Aq that is built on the signature 2 + Q,
{ there is exactly one rule in R for each possible left-hand side (determinism).
Notice that we have given states a type of the form o1 ! A where A 2 types(o2). The reason why we do this is to have a uniform notation. Indeed, a state q is meant to transform, thanks to the rules in R, a tree built in 1 into a -term built on 2 with type Aq. So we simply write q M N1 : : : Nn when we want to transform M with the state q and pass N1,. . . , Nn as arguments to the result of the transformation. We write T for the signature 1 + 2 + Q (assuming that the constants in Q, 1 and 2 are distinct from each other). Notice also that the right-hand part of a rule is a term that is built only with constants of 2 , states from Q and variables of type o1. Thus, in order for this term to have a type in types( 2), it is necessary that the variables of type o1 only occur as the rst argument of a state in Q.
Remark that we did not put any requirement on the type of the initial state. So as to restrict our attention to transducers as they are usually understood, it su ces to add the requirement that the initial state is of type o1 ! o2. However, we consider as well that transducers may produce programs instead of rst order terms.

READ AN EXPLORATION OF THE INTER-DEPENDENCE OF INSTITUTIONS, RIGHTS, AND GOVERNANCE

Regular look-ahead

Similarly to known models of transducers (like DTOP in section 1.4) we equip HODT with what we call a regular look-ahead. It is a deterministic bottom-up tree automaton which is run on the input tree of a High-Order Deterministic top-down tree Transducer (HODT) in order to guide its rules. We rst restate the de nition of bottom-up tree automata, but with the -calculus representation of trees. Tree automata A BOT A is a tuple ( P ; ; R) where:
• = (C; fog; ) is a rst-order tree signature, the input signature,
• P = (P; fog; P ) is the state signature, and is such that for every p 2 P , P (p) = o. Constants of P are called states,
• R is a nite set of rules of the form a p1 : : : pn ! p where:
{ p,p1; : : : ; pn are states of P ,
{ a is a constant of with type on ! o.
An automaton is said deterministic when there is at most one rule in R for each possible left hand side. It is non-deterministic otherwise. Apart from the notation, our de nition di ers from the classical one by the fact there are no nal states, and hence, the automaton does not describe a language. This is due to the fact that BOT will be used here purely for look-ahead purposes. A High-Order Deterministic top-down tree Transducer with Regular look-ahead (HODTR) T is a tuple ( Q; 1; 2; q0; R; A) where 1, 2, Q and q0 are as before and:
• A is a deterministic bottom-up tree automaton, we call it the look-ahead automaton of T ,

Linear and Almost-Linear variants

A linear High-Order Deterministic top-down tree Transducer with Regular look-ahead (HODTRlin) is a HODTR where the terms on the right side of rules are linear. An almost-linear High-Order Deterministic top-down tree Transducer with Regular look-ahead (HODTRal) is a HODTR where the terms on the right side of rules are almost-linear. The rules of such transducers are of the form
(! ! i ! 1 n q a x ) h p M[x ; : : : ; x ] where x = x ; : : : ; x n are variables of type o 1 and p = p ; : : : ; p n are states of the look- ! 1 ! 1 ahead automaton. The linearity or almost linearity constraint on M a ects both bound variables and free variables x1; : : : ; xn, meaning that all of the subtrees x1; : : : ; xn are used in computing the output.
That will be important for the composition of two transducers because if the rst transducer fails in a branch of its input tree then the second transducer, applied to that tree, must fail too. This restriction forcing the use of input subtrees does not reduce the model’s expres-sivity because we can always add a state q which visits the subtree but only produces the identity function on type o2 (this state then has type o1 ! Aq = o1 ! o2 ! o2).
A linear High-Order Weakly Deterministic top-down tree Transducer with Regular look-ahead (HOWDTRlin) is a HODTRlin whose look-ahead automaton is not necessarily de-terministic, but the transducer is deterministic in the weaker sense that, when two rules of the transducer are of the form: and q(a x1 : : : xn)hp1; : : : ; pni !T M[x1; : : : ; xn]

Table of contents :

I Part 1 : Introduction and state of the art
1 Introduction
1.1 Motivations
1.2 Our approach
1.3 The CoLiS project
1.3.1 The CoLiS toolchain
1.3.2 Modelisation of lesystems
1.3.3 Modeling tree transformations for CoLiS
1.4 Theoretical models of tree transformations
1.4.1 Models of tree transformations
1.4.2 Expressivity of tree transformation models
1.4.3 Composition of tree transformations
1.4.4 The functional programming approach
1.5 Contribution and Plan of the thesis
1.5.1 High-Order Deterministic tree Transducers
1.5.2 Tree transducers in the CoLiS project
1.5.3 The MIX language
1.5.4 Plan of the thesis
II Part 2 : Contribution to the CoLiS project
2 Abstraction of lesystems and scripts in the CoLiS project
2.1 Abstraction of the problem
2.2 Control ow structure of Shell scripts
2.3 Abstraction of le systems
2.4 NP-hardness of verication on scripts
3 A model of transducers for the CoLiS project
3.1 Tree pattern transducers
3.2 Formalisation of Unix commands
3.2.1 mkdir
3.2.2 rmdir
3.2.3 rm
3.2.4 touch
3.2.5 test and [ ]
3.2.6 which
3.2.7 mv
3.2.8 cp
3.3 Composition of tree pattern transducers
3.3.1 Example of composition
3.3.2 Unication of tree patterns
3.3.3 Tree constraints in composition
3.3.4 Conclusion on composition
4 Implementation
4.1 The algorithm
4.1.1 Naive version of the algorithm
4.1.2 The improved algorithm
4.1.3 Specicity of the backward approach
4.2 Practical results
4.3 Equivalence
4.4 Conclusion on CoLiS
III Part 3 : Transduction through functional programming
5 Denition of high-order tree transducers
5.1 Denitions
5.1.1 Simply-typed lambda calculus
5.1.2 High-Order Deterministic top-down tree Transducers
5.1.3 Regular look-ahead
5.1.4 Linear and Almost-Linear variants
5.1.5 Tree transformations associated with transducers
5.2 Example of high-order tree transducers
5.3 Properties of High-Order Transducers
5.3.1 Domains
5.3.2 Look-ahead
5.3.3 The expressivity of the order of transducers
6 Composition of transducers
6.1 Composition of HODTR based on Scott models
6.2 Composition of HODTRlin based on coherent spaces
6.2.1 Semantic analysis
6.2.2 Unicity of derivation for semantic token judgements
6.2.3 Collapsing of token derivations
6.2.4 Construction of the transducer which realizes the composition
7 Equivalence with existing models
7.1 Template decomposition
7.1.1 Finiteness of linear templates
7.1.2 Finiteness of almost linear templates
7.2 Order reduction
7.2.1 Linear case of the order reduction
7.2.2 Almost linear case of the order reduction
7.3 Expressiveness of HODTRlin and HODTRal
7.3.1 Denition of ATT
7.3.2 REL ATT HODTRal and REL ATTsur HODTRlin
7.3.3 HODTRal REL ATT and HODTRlin REL ATTsur
IV Part 4 : Expressivity of MSO transductions, case of the MIX languages
8 The MIX languages and MSO transductions
8.1 Preliminaries
8.2 MIX2 is not an EDT0L language
8.2.1 The counter example word sk
V Conclusion
9 Conclusion
9.1 Contributions
9.2 Perspectives
9.2.1 Our implementation in CoLiS
9.2.2 Implementing HODTRlin in CoLiS
9.2.3 Testing equivalence of HODTR .