Get Complete Project Material File(s) Now! »

## Lyndon words

In 1954 and using the lexicographic order, Roger Lyndon [34] investigated the family of Lyndon words after being introduced by Anatoly Shirshov [45] in 1953 with a di erent name. The proper de nition of this family of words is stated as follows:

De nition 1.2 A word w 2 A is a Lyndon word if for all u; v 2 A+ such that w = uv, we have w <‘ vu.

In other words, we can say that the Lyndon word is the smallest word among all its conjugates. Hence, since i(w) is the circular permutation of w at position i which gives the conjugate of w of order i, hence we can write w <l i(w) ; 8i 2 f1 : : : jwjg. It is equivalent that for each such factorization w <l v.

Based on this de nition, we can deduce one of the properties of this family of words. In fact, we have that Lyndon words are primitive words otherwise, we can have an equality between at least one of the conjugates of w. Therefore, if w is a Lyndon word, ww is not Lyndon.

Example 7 Let w = 0110, we have X = f i(w); 8 0 i 3g, hence we can write X = f0011; 0110; 1100; 1001g. The smallest element of X in lexicographic order is w0 = 0011. Therefore, w is not a Lyndon word.

Over the alphabet A = f0; 1g, we represent the rst 9 Lyndon words where the trivial words are the letters of the alphabet.

We say that s is a proper su x of w if we can write w = w0s where s 6= and w 6= w0.

We de ne now two types of factorization for the Lyndon words, the right and the left factorizations. We mention before, that two words u and v are elements of the factorization of w if w is obtained by the concatenation of these two words. We represent the factorization by w = uv = (u; v).

Proposition 1.3 Let w be a Lyndon word, if w can be written as w = (u; v) with v its lexicographically least proper su x, then u; v are Lyndon words and u <l v.

This is the right factorization, while if u is the longest pre x then we get the left factorization.

In some cases these two factorizations coincide as we will see in Example 8 and in this case the word is called a regular word. This concept was used by Melancon in 1999 [35], then presented by (Berstel et al.2009) in [5] with a di erent notation as \balanced2″ and nally used by (Reutenauer 2016) [43] as \equilibrated ».

Example 8 Let w1 = xxyxxzxy, w2 = xxyy, and w3 = xxyxxyxy be three words over the alphabet A = fx; y; z g. Table 1.1 shows the right and left factorization of each word. We remark that the left and right factorization of w3 are the same, hence w3 is a regular word.

The right factorization is simply called the standard factorization. We give now another kind of factorization that was de ned in 1958 by Chen, Fox and Lyndon [13].

De nition 1.4 Every non empty word w admits a unique factorization as a lex-icographically decreasing sequence of Lyndon words: w = l1n1 l2n2 lknk , s.t l1 >l l2 >l >l lk where ni 1 and li are Lyndon words.

This factorization is called Lyndon factorization. In 1980 Duval proved that this factorization can be computed in a linear time [22].

Example 9 In Table 1.2, we nd the Lyndon factorization of the following three words: w1 = 100101100101010, w2 = 0010010 and w3 = 001010110010.

Lyndon words give rise to a basis of some algebra. In particular, those factor-izations were used as tools to solve problems in free groups and free Lie algebra [42].

**Standard Sturmian Words**

In this section, we study another family of words. We introduce the family of standard Sturmian words as the family of one sided in nite words over a bi-nary alphabet that are obtained as a discretization of a line with irrational slope starting from the origin. An old game of English billiards on a square table can clearly represent the situation. In fact, the pounded ball follows a trajectory and hits the billiard board in such way that the vertical intersections are denoted by 0 and the horizontal ones are denoted 1. We represent this trajectory in a square lattice by considering the linear trajectory of irrational slope that cuts the squared lattice and we label the horizontal intersection by 1 and the vertical one by 0. The sequence obtained forms exactly the standard Sturmian word.

By considering the nite case of standard Sturmian words, the slope of the line segments turns into a rational number and we get a new family called family of Christo el words that will be the fundamental object of this thesis. All the properties and characteristics of this family will be given in detail in the next sec-tion.

One of the most important characterizations of the standard Sturmian words is the order of balancedness that is de ned as follows:

De nition 1.5 [51] A word w is k-balanced if, for any two factors u,v of w, of same length, and any letter a of the alphabet A, If 8 1 i jwj; u; v 2 F acti(w); juj = jvj =) jjuja j vjaj k:

By convention, a balanced word refers to a 1-balanced word, see [40]. One of the properties of standard Sturmian words is that are balanced words. Further more in this chapter, we will expand and explicit the balancedness property.

While studying the structure of these words, we can not avoid to mention their periodicity de ned as follows:

De nition 1.6 A positive integer p is a period of a nite word w; 1 p jwj if w[i] = w[i + p]; for all 1 i jwj p.

A nite word can have more than one period and at the same time, we can nd some words with no periods.

Example 10 Let w = 0001, with De nition 1.6, we can note that for all 1 i 4, the word w has no period.

The most important result concerning the periodicity is the theorem Fine and Wilf (1965) [23] that describes the structure of words with more than one period.

Theorem 1.7 If w has two periods p and q such that jwj p + q gcd(p; q) then the gcd(p; q) is also a period of w.

Example 11 Let w = 0100100100100 be a word of length n = jwj = 13.

As we can see, the word w has a rst period p = 6 since w[1 7] = 0100100 = w[7 13]. We also remark that the second period of w is q = 9 where w[1 4] = 0100 = w[10 13]. Hence this word has two periods p = 6 and q = 9 and since n 6 + 9 gcd(6; 9) = 12, then by the theorem Fine and Wilf, we have that 3 = gcd(6; 9) is also a period as we can see: w[1 10] = w[4 13] = 0100100100.

This theorem has an optimal bound related to the case when the two periods p and q are coprime. In this case, the maximal length of a non constant word is p + q 2. Such words are called extremal Fine and Wilf words. This is the second characterization of Sturmian words, since in 1994, de Luca and Mignosi in [17] showed that the set of all factors of extremal Fine and Wilf words corresponds to the set of factors of standard Sturmian words.

### Christo el words

After de ning the previous two families of words, we study in detail and we give all the properties of this particular family of words over a binary alphabet. Christo el words [14] have many equivalent de nitions. In 1771, Jean Bernoulli was the rst to give the de nition of Christo el words in the discrete plane and in 1990 Jean Berstel gave them this name with respect to Elwin. B. Christo el (1829 1900). In a geometric view, we consider the Christo el words to be the discretization of a line segment of rational slope. This was well explored in the papers (Osborne and Zieschang, 1981 [37]; Dulucq and Gouyou-Beauchamps, 1990 [21]; Borel and Laubie, 1993 [7]; Berstel and al., 2009 [5]). For the rest of this thesis, we restrict the work to the binary alphabet A = f0; 1g, unless mentioned elsewhere.

We start by de ning the Christo el path which is de ned on a discrete lattice, i.e it is a sequence of unitary steps joining points of integer lattice.

We recall that if two integers a and b in the set of non-negative integers N have the property that their di erence a b is integrally divisible by a number n (i.e., (a b)=n is an integer), then a and b are said to be « congruent modulo n. » and we denote a b mod n. Two numbers a and b are coprime if the greatest common divisor between them is 1, and we denote it: a ? b.

Let a?b, the Lower Christo el path of slope a=b is the path joining the origin O(0; 0) to the point (b; a) and respecting the following characteristics:

(i) it is the nearest path below the line segment joining these two points;

(ii) there are no points of Z Z between the path and line segment.

Analogously, the Upper Christo el path is the path that lies above the line segment. By convention, the Christo el path is exactly the Lower Christo el path.

Example 12 Consider the line segment joining the origin O(0; 0) to the point (8; 5). We have a = 5; b = 8 and n = a + b = 13. The Lower and upper Christo el paths of slope 5=8 are represented in the Figure 1.2:

We remark that the Upper Christo el path of slope a=b is the mirror image of the Lower Christo el path obtained by a rotation of an angle of 180 .

By assigning to each horizontal step the letter 0 and to each vertical step the letter 1 of the binary alphabet A = f0; 1g, we get the Christo el word of slope ab denoted by: C( a ), where the fraction a is exactly: jwj1 (see Figure 1.3).

We de ne the morphism : A ! Q [ f1g by: ( ) = 1 and (w) = jwj1 ; 8 w 6= 2 A ; jwj0 where 1 = 1. This morphism determines the slope for each given word in A . 0

1. For a = 0, we get the Christo el word of slope 0 represented by w = C(01 ) = 0.

2. For b = 0, we have the Christo el word of slope 1 represented by w = C(10) = 1.

Example 13 Let (a; b) = (3; 5), the Christo el word of slope 3=5 is given by: C 3 = 00100101 and represented in Figure 1.3: Let A = f0; 1; : : : ; kg be an alphabet with integers, we let E be the automorphism de ned as follows: 8i 2 A;

We have E(i) + E(k i) = k. In particular, as we are dealing with the binary alphabet A = f0; 1g, we have: E(0) = 1 and E(1) = 0.

Example 14 Let u = 00100110 and v = 11011001, we have v = E(u).

Remark 1.8 With the automorphism E, we remark that if w is a Lower Christof-fel word of slope a=b then E(w) is an Upper Christo el word of slope b=a.

Example 15 For the Lower Christo el word of slope 3=5, we get w = 00100101. By evaluating E(w), we get: E(w) = 11011010 which is an Upper Christo el word of slope 5=3 as shown in Figure 1.4.

In 1997, Berstel in his paper [4], gave 14 di erent properties for the Christo el words. Here, we mention four of them that are useful for the results obtained in this thesis.

Property 1.9 Christo el words are primitive words.

In fact, if w is a Christo el word of slope a=b then jwj1 = a and jwj0 = b where a?b. This implies that w is not a proper power of another word which means that w is a primitive word.

Property 1.10 [5] Let w = C ab be the Christo el word of slope a=b, we write w = 0w01, where w0 is a palindrome.

We name w0 the central part of w. Note that the lower and upper Christo el words have the same central part.

For example: The word w = C(35 ) = 0 010010 1 is a Christo el word, where the central part 010010 is a palindrome as shown in the Figure 1.3.

In the previous section, we de ned the Lyndon word to be the smallest between its conjugates. This property may be equivalently stated for Christo el words and hence we have the third characterization of Christo el words in terms of Lyndon words [4].

Theorem 1.11 [35] A word w is a Christo el word if and only if it is a 1-balanced Lyndon Word.

We remark that the conjugate of Christo el words are all distinct.

Example 16 Let w = C(23 ) = 00101 be the Christo el word of slope 2=3. In Ta-ble 1.3, we show all the conjugates of w placed in an increasing lexicographic order:

We remark that the rst word in the lexicographic order is the Christo el word of slope 2=3, while the last word in the lexicographic order is exactly the Upper Christo el word of slope 2=3. Hence, we can conclude that Lower and Upper Christo el words are conjugate as we can see in Example 16 and this result was proved in [17] by De Luca and Migonsi in 1994.

#### Standard factorization

One of the most important characterizations of a Christo el word is that any Christo el word can be written as the concatenation of two Christo el words in a unique way. This concept was introduced by Borel and Laubie in 1993 [7] and they called it the Standard factorization de ned on proper Christo el words as follows:

Theorem 1.12 (Borel-Laubie 1993) [7] A proper Christo el word w has a unique standard factorization w = (u; v), where u and v are both Christo el words.

For this theorem, we are able to nd several proof in di erent books. I introduce the proof given by Hugh Thomas that was mentioned in [5].

Proof. Let w be a Christo el word of the line segment [OF ] of slope a=b. We prove the existence and uniqueness of this factorization as follow:

1. The existence:

Let C be the closest point on the Christo el path to the line segment [OF ]. The triangle OCF has no interior integer points, same for the two paths determined by the line segments [OC] and [CF ] as we can see in the Figure 1.5. The consequence of this statement is that the coordinates of C are coprime and the two paths obtained are Christo el ones. Hence w = (u; v), where u is the Christo el word corresponding to the line segment [OC] and v is the Christo el word corresponding to the line segment [CF ].

**Table of contents :**

**1 Some combinatorics on words **

1.1 Notation and preliminaries

1.2 Lyndon words

1.3 Standard Sturmian Words

1.4 Christoel words

1.4.1 Standard factorization

1.4.2 Christoel tree

1.5 Balance property

1.5.1 Balance matrix

1.5.2 Properties of the balance matrix

1.5.3 Christoel words with a and b not coprime

**2 Algebraic view **

2.1 Algebraic denition

2.2 More about Christoel words

2.2.1 Continued fractions

2.2.2 Stern-Brocot tree

2.2.3 Second construction of the balance test matrix

**3 Second order balanced matrix **

3.1 Second order balance matrix

3.2 Recursive construction of the second order balance matrix

3.2.1 Properties of the matrix Ua

3.2.2 General form of the second order balance matrix

3.3 The construction of Ua

3.3.1 The trivial cases : z 2 f0; 1g

3.3.2 The general case : z 2

**4 Some results **

4.1 Second order balance of C( a b )

4.1.1 Renement of the value of 2( a b ) using the continued fraction

4.1.2 Fibonacci sequence

4.2 Abelian vectors

**5 Symmetric standard factorization **

5.1 Symmetric standard factorization

5.1.1 Geometrical examples

**6 Synchronization of Christoel words **

6.1 Synchronization of Christoel words

6.2 Vertical invariant

6.3 Seeds for two generators

6.4 Horizontal invariant

6.5 Case of three generators

6.5.1 Generators not relatively co prime

6.6 Relation between In;k and Ihgi

6.7 Equal generators

6.8 Distinct generators

6.8.1 Fraenkel’s seed

6.9 Relation between 2 and 3 generators

**7 Convexity and digital convex polyominoes **

7.1 Introduction

7.1.1 Discrete geometry and digital space

7.2 Theoretical results

7.2.1 Perturbations on the WN paths

7.2.2 Denition of the split operator

7.2.3 Commutativity of the split operator