Locally decodable and correctable codes

Get Complete Project Material File(s) Now! »

Locally decodable and correctable codes

Classical correcting algorithms allow us to retrieve a codeword c from a damaged version y of c. They usually take as input the entire noisy word y and return, hopefully, the entire codeword c. Assume now that one wants to retrieve only one symbol ci of c. Of course, one can use the previous algorithm which returns c entirely, and then on can extract the desired symbol ci. However, the complexity of this method is at least linear in the length of the code. Local correcting algorithms were introduced in order to reach a better computation complexity for this problem. Explicitly, they aim at retrieving a correct symbol from a damaged codeword, with high probability (w.h.p.) and in sublinear time.
In this chapter, we recall formal definitions of locally decodable and locally correctable codes. We then present algebraic constructions in the high-rate regime, which is of main interest for our applications. We also review current bounds and issues in this area. Next, we show how locally correctable codes (LCCs) can be built from block designs. Finally, after introducing generalised design-based codes, we discuss to what extent they can lead us to high-rate LCCs.

Motivation and definitions

Historical points

The notion of locality in codes appeared in computational theory in the early 1990s. It was initially raised in the scope of testing computations and self-correcting programs, see e.g. [BLR93, BFLS91, GLR +91, GS92]. A typical question was to output w.h.p. the image f (x) of x 2 S by a low-degree polynomial f , given a program P that computes f (y) correctly for only a constant fraction of entries y 2 S.
Later, codes with locality also turned out to be useful to obtain short probabilistic checkable proofs (PCPs). Very informally, assume one wants to assess w.h.p. whether a long string x belongs to some language L, with the restriction that one can make only a small number of oracle queries1 to x or to another long string derived from x. Generically, if we directly access x, the partial information we get cannot be sufficient to determine whether x 2 L. Thus, a natural idea is to encode x into a longer proof p, so that a few oracle queries to p assert w.h.p. if p corresponds to an encoding of an element of L. This leads us to the notion of locally testable codes which were a key element for proving the well-known PCP Theorem [AS98, ALM+98].
A few years later, Katz and Trevisan [KT00] formally introduced so-called locally decodable codes in an information theoretic perspective. This seminal work was motivated by sublinear-time data recovery in large corrupted databases. The authors proved that codes equipped with a local decoding procedure must be somewhat smooth, in the sense that queries should be sufficiently balanced. They also gave first lower bounds on the number of redundancy symbols needed in the codeword, and they proposed an application to private information retrieval protocols — see Chapter 4 for more details.

Definitions

Let us formally introduce locally correctable and decodable codes, as defined in Yekhanin’s survey [Yek12]. In this thesis, an oracle query to a string y 2 SS refers to a map y 7!yi for some i 2 S. One usually considers that the computational cost of such a map is 1, and the index i is sometimes called a query. If an algorithm A is only allowed to access a string y via oracle queries, we say it has oracle access to y, and we represent it by A(y). Notice that, if the string 2 SS is seen as a map gy : S ! S, then an oracle query is purely the evaluation of gy at i 2 S.
Definition 2.1 (locally correctable code, or LCC). Let C SS be an Fq-linear code, j Sj = n. Let also 1 ‘ n, d 2 (0, 1) and # < 1/2. We say that C is an (‘, d, #)-locally correctable code (LCC) if there exists a randomised algorithm LC, taking as input i 2 S and having oracle access to words y 2 SS, which satisfies the following requirements. For every y 2 SS and c 2 C such that d(y, c) dn, and for every i 2 S, we have:
– Pr(LC(y)(i) = ci) 1 #, the probability being taken over the internal randomness of LC, and 1an oracle query is a map y 7!yi for some index i, see Subsection 2.1.2
– LC(y)(i) queries at most ‘ symbols of y.
We refer to ‘ as the locality of the code C, and to LC as the local correcting algorithm.
In the literature, it is usually considered sufficient to have a failure probability # 1/3. The idea is that, by repeating the local correcting algorithm LC several times on the same input i, one can attain values of # exponentially small in the number of procedures we run.
Definition 2.2 (locally decodable code, or LDC). Let C SS be an Fq-linear code, jSj = n, equipped with an encoder E : Fkq ! C. Let also 1 ‘ k, d 2 (0, 1) and # < 1/2. We say that C is an (‘, d, #)-locally decodable code (LDC) if there exists a randomised algorithm LD, taking as input i 2 [1, k] and having oracle access to words y 2 SS, which satisfies the following requirements. For every y 2 SS and m 2 Fkq such that d(y, E(m)) dn, and for every i 2 [1, k], we have:
– Pr(LD(y)(i) = mi) 1 #, the probability being taken over the internal randomness of LD, and
– LD(y)(i) queries at most ‘ symbols of y.
Once again, ‘ is the locality of the code C, and LD is referred to as a local decoding algorithm. Remark 2.3. Let C SS be an (‘, d, #)-LCC with ‘ k = dim(C). If we know a systematic encoder E : Fkq ! C, then the code C is also (‘, d, #)-locally decodable. Indeed, any message symbol mi, 1 i k, is also a codeword symbol cj for some j 2 S, and the local correcting algorithm C thus becomes a local decoding algorithm.
For g 2 (0, 1), we say that a locally correctable code C SS is g-smooth if, for all i, j 2 S, algorithm LC(i) queries y j with probability at most g. Moreover, C is called perfectly smooth if symbols y j are queried with equal probability. Hence, a perfectly smooth LCC is (‘/n)-smooth, but the converse may be false when some vectors of queries have size strictly less than ‘. Smooth LDCs can be defined very similarly to smooth LCCs, the only difference being that i 2 [1, k] is a message coordinate instead of a codeword coordinate.
In fact, any LCC intrinsically admits a certain smoothness, as we show in the following result inspired by [KT00, Theorem 1].
Proposition 2.4. Let C SS be an (‘, d, #)-LCC of length n. For every g > d‘n , the code C is also a g-smooth (‘, d ‘ , #)-LCC. gn
Proof. Let g > ‘ and d0 = d ‘ , and denote by LC the local correcting algorithm for dn gn C. Relying on LC, we will describe another local correcting algorithm LCg for C, with the additional g-smooth property.
Let y 2 SS and c 2 C such that d(y, c) d0n. On input i 2 S, define
Si := fj 2 S j Pr(LC(y)(i) reads yj) > gg as the set of coordinates which are queried too often. Let us now consider a new word y0 2 SS, defined for all j 2 S as follows: y0 = 0 if j 2 Si
j yj otherwise.
The new algorithm LCg is simply defined by LC(gy)(i) := LC(y0)(i). Let us analyse its properties.
As we removed all queries that LC makes with probability larger than g, the algorithm LCg is g-smooth. Moreover, we can easily check that jSij ‘/g. Hence d(y0, y) ‘/g, and d(y0, c) ‘/g + d0n = dn follows. Therefore LC(gy)(i) outputs ci with probability 1 # as expected.
The previous proposition may question the relevance of focusing on the smoothness of LCCs.
However, we see that we cannot make any LCC perfectly smooth, since d is strictly less than 1.
Therefore, searching for perfectly smooth LCCs remains a very relevant goal.

A first example: the binary Hadamard code

The binary Hadamard code Had2(m) is a classical example of locally correctable code — see Subsection 1.2.3 for a definition. Since the code consists in evaluation vectors of linear forms over F2m, the 3 non-zero points of any plane give rise to a parity-check equation for Had2(m). Such equations can then be exploited to recover any symbol with only 2 queries.
For convenience we here adopt the functional representation. Formally, denote by S = F2m n f0g and let f 2 C (that is, f is a linear form over F2m). For all u 6= v 2 S, we see that u + v 2 S, and the fact that f (u) + f (v) = f (u + v) leads us to the following local correcting algorithm.
Algorithm 1: A smooth (2, d, 2d)-local correcting algorithm for Had2(m).
Input: a point u 2 S := Fmq n f0g, and an oracle access to g 2 F2S such that d(g, f ) dn for some f 2 Had(m), where n = jSj.
Output: f (u) with high probability.
1 Pick v R S n fug uniformly at random. .
2 Toss a random binary coin b 2 f0, 1g, following B 2
n+ 1
if b = 1 then
Query fg(u)g and output g(u).
else
6 Query fg(v), g(u + v)g and output g(u + v) g(v).
Algorithm 1 is often presented with steps 1 and 6 only. In that case, coordinate u is never queried. We here add the tossing trick in order to make the algorithm perfectly smooth, as we can see in the following proposition.
Proposition 2.5. Let d < 1/4, and m 2. By using Algorithm 1, the Hadamard code Had2(m) is a perfectly smooth (2, d, 2d)-locally correctable code.
Proof. We first check that evaluations of g are queried with equal probability. We see that g(u) is read with probability 2/(n + 1), and g(v), v 2 S n fug, is read with probability
Let us now prove that the output is correct with probability 1 2d, and denote by E = supp( f g) the support of the errors (we have jEj dn). We also define Su as the (random) support of the query made by LC(g)(u). Notice that Pr(v 2 Su) = 2/(n + 1) for every v 2 S. From Algorithm 1 we see that Su can be either fug or fv, u + vg for some v 6= u. If Su = fug, then the local correcting algorithm LC(g)(u) outputs f (u) if and only if u 2/ E. If Su = fv, v + ug, then LC(g)(u) = f (u) if and only if jE \ Suj 2 f0, 2g, in other words, if and only if E \ Su = ? or Su.
Thus, the probability of success of LC(g)(u) is given by:
Pr(LC(g)(u) = f (u)) = 1 Pr(jE \ Suj = 1) 1 Pr(jE \ Suj 1) 1 E(jE \ Suj) .
Furthermore, by linearity we obtain
E(jE \ Suj) = å Pr(v 2 Su) = å = jEj dn < 2d .
n + 1 n + 1 n + 1
v2E v2E
Recall that Had2(m) has dimension m. Denote by fe1, . . . , e mg (resp. fX1, . . . , Xmg) the canon-ical basis of the affine space F2m (resp. the space of linear forms over F2m). The information set I = fe1, . . . , emg S induces a systematic encoder E : F2m ! F2S for Had2(m), given by E(mi) = evS(X i) for every m 2 F2m. Therefore Had2(m) is also a (2, d, 2d)-LDC.
Quantitatively, Hadamard codes define a family of binary LCCs with increasing length 2m 1 and constant query size 2. However, the dimension m of these codes is only logarithmic in their length. In the following section, we will see that a certain class of Reed-Muller codes defines another family of LCCs, achieving constant rate provided an increase of the query size.

Some algebraic constructions of LCCs

This section is devoted to presenting families of locally correctable codes that are built by eval-uating polynomials over vector spaces. In Subsection 2.2.1, we detail two smooth local cor-recting algorithms for Reed-Muller codes. We show how perfect smoothness can be achieved, and we depict qualitative relations between the three parameters of LCCs. Next subsections are then devoted to more evolved algebraic constructions leading to high-rate LCCs, namely multiplicity codes and lifted codes.

Reed-Muller codes

Reed-Muller codes have been defined in Subsection 1.2.2. In this subsection, we let C = RMq(m, r) for 0 r m(q 1). Informally, we see that if r is small enough, then restrictions of C to subspaces A Fmq define non-degenerate codes, which consist in the evaluation vectors of low degree polynomials over A. This key property will be used for local correcting purposes. Before going deeper into details, we state a few results which are straightforward to prove.
Lemma 2.6. Let 1 t m and f 2 Aff(Ftq, Fmq) be an injective affine map.
If f 2 Fq[X1, . . . , Xm], deg( f ) r, then there exists g 2 Fq[X1, . . . , Xt], deg(g) r, such that
8u 2 Fqt, ( f f)(u) = g(u) . (2.1)
Conversely, if g 2 Fq[X1, . . . , Xt], deg(g) r, then there exists f 2 Fq[X1, . . . , Xm] with deg( f ) r, such that (2.1) holds.
In terms of codes, Lemma 2.6 implies the next corollary.
Corollary 2.7. Let C = RMq(m, r) with r t( q 1) for some t 1, and f 2 Aff(Ftq, Fmq). Then f (C) = RMq(t, r). In particular, if we denote by A = im(f) the t-dimensional affine space that defines, then CjA FqA is permutation-equivalent to RMq(t, r), the permutation being given by : Ftq ! A.
A first local correcting algorithm. Corollary 2.7 induces a local correcting procedure for RMq( m, r), r < t(q 1). It consists in picking at random f 2 Aff(Ftq, Fmq), and correcting the noisy codeword restricted to the image of f, thanks to an efficient decoding algorithm for RMq(t, r).
Thus, we now assume to have at our disposal a correcting algorithm Corr for RMq(t, r), which corrects 1 erasure and up to w errors, where 2 + 2w = dmin(RMq(t, r)). Such an algorithm can be derived from an efficient half-distance 1-erasure correcting algorithm for Reed-Solomon codes, as those we have mentioned in Subsection 1.2.2.
Algorithm 2: A perfectly smooth local correcting algorithm of locality ‘ = qt 1 for the Reed-Muller code RMq(m, r), where r < t(q 1).
Input: a coordinate u 2 S := Fmq, and an oracle access to g 2 FSq such that d( f , g) dn, for some f 2 RMq(m, r), where n = jSj.
Output: f (u) with high probability.
/* CorrA code
denotes a half-distance one-erasure correcting algorithm for the RMq(m, r)jA, isomorphic to RMq(t, r).
Pick uniformly at random f 2 Aff(Ftq, Fmq) such that u 2 A := im(f).
Toss a random binary coin b 2 f0, 1g, following B(p) with p := qtqm1 .
if b = 1 then
Pick v uniformly at random in A n fug.
else
6 Define u.
v
Define A0 = A n fvg and query fg(v) : v 2 A0g.
Define g0 2 (Fq [ f?g)A by gj0A0 = gjA0 and g0(v) =?. Run CorrA on input g0.
9 if CorrA fails then
Abort with fail.
else
Denote by h 2 FqA the output of CorrA. Output h(u).
Let us give additional notation to make the analysis of Algorithm 2 simpler. We write r = a(q 1) + b, with 0 b q 1 and a m 1, the degree of the Reed-Muller code RMq(m, r), and we recall that dmin(RMq(m, r)) = (q b)qm a 1. We also denote by w = dmin(RMq(t, r)) 2 = (q b)qt a 1 2 the number of errors that Corr can correct, and by t := w/(qt 1) its ratio compared to the locality parameter.
Proposition 2.8. Let m 2 and r = a(q 1) + b such that a m 1 and 0 b < (q 1). Let also t be the relative error correcting capability, defined as above. Then, for every d < t/2 and every a + 1 t m 1, the Reed-Muller code RMq(m, r) is a perfectly smooth (qt 1, d, d/t)-locally correctable code, using Algorithm 2.
Proof. We use the same arguments as for Hadamard codes. First, we notice that Algorithm 2 is smooth: every point is chosen with probability p = (qt 1)/qm. Let us now bound its probability of success.
Denote by S = Fmq, jSj = n. Let g = f + e 2 FSq, where f 2 RMq(m, r),
= supp(e) and jEj dn. Let also A0u represent the random queries made by know the algorithm succeeds if jE \ A0uj w, where 1 + 2w = dmin(RMq(t, r))
2 FSq with LC(g)(u). We
Hence,
Pr(LC(g)(u) = f (u)) 1 Pr( E \ A0 w + 1) 1 E(jE \ Au0j)
j uj w + 1
by Markov’s inequality. Let us now estimate E(jE \ Au0 j). By linearity
E(jE \ Au0j) = å Pr(v 2 Au0) = å p dqm qt 1 = d(qt 1) .
qm
v2E v2E
Finally, we get d(qt
Pr(LC(g)(u) = f (u)) 1 1) 1 d .
t(qt 1)+1 t
Many parameters are involved in Proposition 2.8. If we fix the degree r of the Reed-Muller code, there remains freedom for the choice of t. We see that t affects exponentially the loc-ality, but its influence on t is moderate, since we have approximately t ’ ( q b)q (a+1)/2. Therefore, choosing the minimum t = a + 1 appears to be the most relevant choice since the locality is a crucial parameter. A second local correcting algorithm for RM codes. In Algorithm 2, the strategy was to use the full correcting capability of the local code Cj A ’ RMq(t, r), hoping that the number of errors on the queried symbols does not exceed the packing radius w of CjA. One can reduce the locality at the expense of an increase of the failure probability. Simply, the idea is to pick at random a map f 2 Iso(Ftq, A) and an information set I A for the local code CjA, and then to query the noisy codeword only on I. Hoping for no corrupted symbols on I, we get an LCC of locality ‘ = dim(CjA) = dim(RMq(t, r)).
A minor difficulty appears if we require smoothness. Indeed we need that each individual query is uniformly distributed over the support Fmq, when information sets I are picked at random. Fortunately, Reed-Muller codes admit an automorphism group which contains the doubly-transitive group Aff(Ftq) of affine transformations. Therefore, if I is a fixed information set for RMq(t, r), then every elements of its orbit under Aff(Ftq) is also an information set for RMq(t, r). Hence, it allows us to define the smooth query generator given in Algorithm 3.

Table of contents :

Remerciements
Contents
Résumé
Introduction
I Codes with locality
1 Basics of coding and design theory
1.1 Elementary notions in coding theory
1.2 Algebraic constructions of codes
1.2.1 Reed-Solomon codes
1.2.2 Reed-Muller codes
1.2.3 Hadamard codes
1.3 Combinatorial constructions through block designs
1.3.1 Designs
1.3.2 Design-based codes
2 Locally decodable and correctable codes
2.1 Motivation and definitions
2.1.1 Historical points
2.1.2 Definitions
2.1.3 A first example: the binary Hadamard code
2.2 Some algebraic constructions of LCCs
2.2.1 Reed-Muller codes
2.2.2 Multiplicity codes
2.2.3 Lifted codes
2.2.4 A short comparison between multiplicity and lifted codes
2.3 Other constructions and bounds
2.3.1 Constructions
2.3.2 Bounds
2.4 LCCs from a design theory perspective
2.4.1 Formulating some LCCs as design-based codes
2.4.2 Design-based codes for low-error LCCs
2.4.3 Generalised design-based codes
2.4.4 Further perspectives
3 Projective lifted codes
3.1 Preliminaries
3.1.1 Affine and projective evaluation codes
3.1.2 Reduced degree sets
3.1.3 Isomorphisms and embeddings
3.2 Definition and first properties of projective lifted codes
3.2.1 Affine lifted codes
3.2.2 Projective lifted codes
3.2.3 Monomiality of projective lifted codes
3.2.4 Degree sets of lifted codes
3.3 Local correction
3.4 Intertwined relations between affine and projective lifted codes
3.4.1 Motivation and similar results
3.4.2 Shortening and puncturing projective lifted codes
3.4.3 Projective lifted codes as generalised design-based codes
3.5 Other properties towards practicality
3.5.1 Automorphisms and (quasi-)cyclicity
3.5.2 Explicit information sets
3.5.3 Estimation of the minimum distance
3.6 Rate and degree sets of lifted codes
3.6.1 Computation of degree sets
3.6.2 The case m = 2
II Cryptographic applications
4 Private information retrieval from transversal designs
4.1 Private information retrieval
4.1.1 PIR models
4.1.2 First constructions, and links with locally decodable codes
4.2 A 1-private protocol based on transversal designs
4.2.1 Transversal designs
4.2.2 The protocol
4.3 Instances
4.3.1 Transversal designs from affine geometries
4.3.2 Transversal designs from projective geometries
4.3.3 Orthogonal arrays and the incidence code construction
4.3.4 High-rate incidence codes from divisible codes
4.4 The t-private construction
4.4.1 Generic construction and analysis
4.4.2 Instances
4.5 Recent PIR constructions and bounds
4.5.1 PIR capacity, and capacity achieving protocols
4.5.2 Is PIR rate the only criterion to consider?
5 Proofs of retrievability from codes with locality
5.1 Introduction
5.1.1 Motivation
5.1.2 Previous works
5.1.3 Our approach
5.1.4 Organisation
5.2 Proofs of retrievability
5.2.1 Definition
5.2.2 Security models
5.3 The PoR construction
5.3.1 Verification structures for codes
5.3.2 A PoR scheme based on codes with locality
5.3.3 Analysis
5.3.4 Estimation of the bias
5.3.5 Pairwise uncorrelation of variables fXugu2D
5.4 Performance
5.4.1 Efficient scrambling of the encoded file
5.4.2 Parameters
5.5 Instantiations
5.5.1 Tensor-product codes
5.5.2 Reed-Muller and related codes
5.5.3 Experimental estimate of the bias a
Conclusion
List of Algorithms
List of Figures
List of Tables