                        Theory of Computing, Volume 10 (19), 2014, pp. 515–533
                                       www.theoryofcomputing.org




           Query Complexity Lower Bounds for
                Reconstruction of Codes
               Sourav Chakraborty                        Eldar Fischer∗              Arie Matsliah
                Received June 10, 2012; Revised August 25, 2014; Published December 29, 2014




       Abstract: We investigate the problem of local reconstruction, as defined by Saks and
       Seshadhri (2008), in the context of error correcting codes.
            The first problem we address is that of message reconstruction, where given oracle
       access to a corrupted encoding w ∈ {0,1}^n of some message x ∈ {0,1}^k, our goal is to
       probabilistically recover x (or some portion of it). This should be done by a procedure
       (reconstructor) that, given an index i as input, probes w at few locations and outputs the
       value of x_i. The reconstructor can (and indeed must) be randomized, with all its randomness
       specified in advance by a single random seed, and the main requirement is that for most
       random seeds, all values x_1, . . . , x_k are reconstructed correctly. (Notice that swapping the
       order of “for most random seeds” and “for all x_1, . . . , x_k” would make the definition equivalent
       to standard Local Decoding.)
            A message reconstructor can serve as a “filter” that allows evaluating certain classes
       of algorithms on x safely and efficiently. For instance, to run a parallel algorithm, one can
       initialize several copies of the reconstructor with the same random seed, and then they can
       autonomously handle decoding requests while producing outputs that are consistent with the
     A conference version of this paper appeared in The Proceedings of the Second Symposium on Innovations in Computer
Science (ICS 2011) [6].
   ∗ Supported by an ERC-2007-StG grant number 202405.



ACM Classification: F.2.2, H.1.1
AMS Classification: 68Q17, 68W20
Key words and phrases: property testing, locally decodable codes, locally correctable codes, reconstruction

© 2014 Sourav Chakraborty, Eldar Fischer, and Arie Matsliah
Licensed under a Creative Commons Attribution License (CC-BY)                           DOI: 10.4086/toc.2014.v010a019

      original message x. Another motivation for studying message reconstruction arises from the
      theory of Locally Decodable Codes.
          The second problem that we address is codeword reconstruction, which is similarly
      defined, but instead of reconstructing the message the goal is to reconstruct the codeword
      itself, given oracle access to its corrupted version.
          Error correcting codes that admit message and codeword reconstruction can be obtained
      from Locally Decodable Codes (LDC) and Self Correctable Codes (SCC) respectively.
      The main contribution of this paper is a proof that in terms of query complexity, these
      constructions are close to best possible in many settings, even when we disregard the length
      of the encoding.


1    Introduction
Consider the following problem: a large data set x ∈ {0,1}^k is stored on a storage device in encoded
form, but a small fraction of the encoding may be corrupted. We want to execute an algorithm M on x,
but most likely M will only need a small fraction of x for its execution. This can be the case if M is a
single process in a large parallelized system, or if M is a querying algorithm with limited memory, that
can even be adaptive, not knowing which bits of the input it will need in advance. Ideally, in all these
cases M should have the ability to efficiently decode any bit of x only when the need arises (in particular,
decoding every bit should be done by reading only a small fraction of the corrupted encoding), and to
ensure correctness it is also necessary that M succeeds (with high probability) in correctly decoding all
bits that are required for its execution. One way of ensuring this is to decode the whole message x in
advance, but in many cases this may be very inefficient, or even impossible.
     To perform the above task, a message reconstructor is required that can simulate query access to
x in a local manner. Concretely, a message reconstructor is an algorithm that can recover the original
message x ∈ {0,1}^k from a corrupted encoding w ∈ {0,1}^n under two main conditions: (locality) for
every i ∈ [k], reconstructing x_i requires reading w only at very few locations; (consistency) with high
probability, all indices i ∈ [k] should be reconstructed correctly. When the consistency condition is
weakened, so that only each index in itself is reconstructed correctly with high probability, the definition
becomes equivalent to Local Decoding, as formally defined in [15]. Informally, a locally decodable code
(LDC) is an error-correcting code which allows one to probabilistically decode any symbol of an encoded
message by probing only a few symbols of its corrupted encoding. As in the case of LDCs, the main
challenge in message reconstruction is to find short codes that allow reconstruction of every x_i with as
few queries to w as possible.
     Any q-query LDC can be used for O(q log k)-query message reconstruction, by simply repeating the
local decoding procedure O(log k) times (for every decoding request x_i) and outputting the majority vote.
The repetition will reduce the probability of incorrectly reconstructing x_i to 1/(3k), so that with probability
2/3 all k indices are reconstructed correctly. Thus having O(1)-query LDCs (e. g., the Hadamard code)
immediately implies the existence of codes with an O(log k)-query message reconstructor. The question
that immediately arises is whether one can do better, specifically in terms of query complexity. The
first theorem of this paper (see Theorem 3.2) states that for any encoding of any length, a non-adaptive
message reconstructor must make Ω(log k) queries per decoding request.
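
To make the amplification concrete, here is a minimal Python sketch of the repetition-and-majority
construction just described. The interface local_decode(i, rng), the helper name make_reconstructor
and the constant 54 are our own illustrative assumptions, not taken from the paper; the point is that all
coins are a fixed function of the seed, so every copy of the reconstructor answers consistently.

    import random

    def make_reconstructor(local_decode, k, seed):
        # Sketch: amplify a q-query local decoder with per-bit error <= 1/3
        # into an O(q log k)-query message reconstructor.
        reps = 54 * max(1, k.bit_length())  # c*log k repetitions, c a suitable constant

        def reconstruct(i):
            # Coins depend only on (seed, i, t), so copies sharing the seed
            # return identical answers for every index i.
            votes = sum(local_decode(i, random.Random(hash((seed, i, t))))
                        for t in range(reps))
            # A Chernoff bound drives the per-bit error below 1/(3k); a union
            # bound over the k indices then gives overall success >= 2/3.
            return int(2 * votes > reps)

        return reconstruct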


    Another family of error correcting codes related to LDCs is that of self-correctable codes (SCC) [5]. In
a q-query self-correctable code the probabilistic decoder is required to recover an arbitrary symbol of
the encoding itself. The second problem that we study here is codeword reconstruction, which is
related to SCCs in the same manner that message reconstruction is related to LDCs. Concretely, a
codeword reconstructor is an algorithm that can recover a codeword y ∈ {0,1}^n from its corrupted version
w ∈ {0,1}^n under two conditions: (locality) for every i ∈ [n], reconstructing y_i requires reading w only at
very few locations; (consistency) with high probability, all indices i ∈ [n] should be reconstructed correctly.
Here too, if the consistency condition is weakened, so that only each index in itself is reconstructed
correctly with high probability, then the definition becomes equivalent to self-correction and any q-query
SCC can be used for O(q log n)-query codeword reconstruction. Thus having O(1)-query SCCs (e. g., the
Hadamard code) immediately implies the existence of an O(log n)-query codeword reconstructor.
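
For concreteness, the following hedged sketch shows the classical 2-query self-correction of the
Hadamard code mentioned above (the representation of w as a list indexed by integer bit-masks is our
own assumption):

    import random

    def hadamard_self_correct(w, a, rng):
        # w has length 2^k, position b holding a (possibly corrupted) copy of
        # <b, x> mod 2.  If at most an eps-fraction of w is corrupted, the
        # output equals the correct bit <a, x> with probability >= 1 - 2*eps.
        b = rng.randrange(len(w))  # uniformly random mask
        # <b,x> + <a xor b, x> = <a,x> (mod 2), and each of the two probed
        # positions is individually uniform, so a union bound applies.
        return w[b] ^ w[a ^ b]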
    The second theorem of this paper (see Theorem 3.4) gives lower bounds on the query complexity of
codeword reconstruction for linear codes.^1 Denoting by n̂ the number of distinct rows in the generating
matrix of a linear code, Theorem 3.4 states that codeword reconstruction requires Ω(√(log n̂)) queries per
decoding request. Since essentially all known SCCs are linear codes with n̂ = n, this bound is tight up to
the square root. Furthermore, a lower bound on the query complexity can neither be stated for general
(non-linear) codes, nor stated in terms of n alone. We elaborate on this in Remark 3.5 below.

1.1     Related work
1.1.1    Local reconstruction
Initially the model of online property reconstruction was introduced in [1]. In this setting a data set f is
given (we can think of it as a function f : [n]^d → N) which should have a specified structural property
P, but this property may not hold due to unavoidable errors. The specific property studied in [1] was
monotonicity, and the goal of the reconstructor was to construct (online) a new function g that is monotone,
but not very far from the original f. The authors developed such a reconstructor, which given x ∈ [n]^d
could compute g(x) by querying f at only few locations. However, the reconstructor had to “remember”
its previous answers in order to be consistent with some fixed monotone function g, making it not suitable
for parallel or memory-limited applications.
     This issue was addressed in [17], where the authors presented a purely local reconstructor for
monotonicity, that could output g(x) based only on few inspections of f . Given a random string r, the
reconstructor of [17] could locally reconstruct g at any point x, such that all answers were consistent
with some function g, which for most random strings r was monotone. Such a reconstructor affords an
obvious distributed implementation: generate one random seed, and distribute it to each of the copies of
the reconstructor. Since they are all determined by r, their answers will be consistent.
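
The shared-seed pattern can be sketched in a few lines of Python (a toy illustration with assumed
names, not the construction of [17]): since each copy's answer is a fixed function of (seed, query),
consistency across copies is automatic.

    import random

    def make_local_reconstructor(f, seed):
        def g(x):
            rng = random.Random(hash((seed, x)))  # coins fixed by (seed, x)
            # ... a real reconstructor would probe f at a few points chosen
            # by rng and return a corrected value; we return f(x) as a stub.
            return f(x)
        return g

    copies = [make_local_reconstructor(lambda x: x % 2, seed=42) for _ in range(4)]
    assert all(c(17) == copies[0](17) for c in copies)  # all copies agree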
     Local reconstruction was also studied in the context of graphs [14, 12], functions on discrete do-
mains [4, 13] and geometric problems [7]. In particular, [14] studied the problem of expander recon-
struction. Given oracle access to a graph that is close to being an expander, the algorithm of [14]
can simulate oracle access to a corrected graph that is an expander. Reconstruction in the context of
partition problems for dense graphs was implicitly studied in [12]. Given a dense graph G that is close to
   ^1 A code C, encoding a k-bit string by an n-bit string, is linear if (for all k) it has a generating matrix G ∈ {0,1}^{n×k} such
that for all x ∈ {0,1}^k, C(x) = Gx.



satisfying some partition property (say k-colorability), the approximate partitioning algorithm from [12]
can be made into one that efficiently and locally reconstructs a graph G′ that satisfies the partition property
and is close to G.

1.1.2   LDCs and SCCs
Locally decodable codes were explicitly defined in [15], but they were extensively studied before that in
the context of self-correcting computation, worst-case to average-case reductions, randomness extraction,
probabilistically checkable proofs and private information retrieval (see [18] for a survey). The initial
constructions of LDCs (and SCCs) were based on Reed-Muller codes and allowed one to encode a k-bit
message by a poly(log k)-query LDC of length poly(k) [3, 8].
    For a fixed number of queries, the complexity of LDCs was first studied in [15], and has since been
the subject of a large body of work (see [18, 11] for surveys). To this day, there is a nearly exponential
gap between the best known upper bounds on the length of q-query LDCs [21, 9] and the corresponding
lower bounds [16, 19, 20], for any constant q ≥ 3.
    While interesting in its own right, studying message reconstruction can help us in understanding the
limitations of LDCs. If we view the local decoding algorithm as a deterministic algorithm that takes as
input i ∈ [k] and a random string r, then we require that for each i, most choices of r lead to the correct
decoding of the i-th bit of the message. For message reconstruction we swap the “for all i” and “for most r”
quantifiers, and require that for most choices of r, the algorithm correctly decodes all k bits of the encoded
message.
    Our lower bounds imply that in the case of error-correcting codes, it is impossible to correlate the
success probabilities of a local decoder in a way that for most random strings r, either all bits of the
message are decoded correctly, or only very few of them are. This is in contrast to the results from
local reconstruction of general properties (described in the previous section), where clever use of the fixed
randomness r allows reconstruction with very few queries.
    As we explained earlier, a constant query LDC of polynomial length would imply the existence of an
O(log k)-query message reconstructor of polynomial rate. Thus constructing an O(log k)-query message
reconstructable code of polynomial rate can be an intermediate step towards constant-query LDCs of
polynomial length.


2    Preliminaries
For α ∈ Σ^n we denote by α_i the i-th symbol of α, and for a subset I ⊆ [n] we denote by α_I the restriction
of α to the indices in I. The Hamming distance, or simply the distance d(α, β), between two strings
α, β ∈ Σ^n is the number of indices i ∈ [n] such that α_i ≠ β_i. Given a set S ⊆ Σ^n and α ∈ Σ^n, the distance
of α from S is defined as d(α, S) = min_{β∈S} {d(α, β)}, where as expected the minimum over an empty
set is defined to be +∞. For ε > 0 we say that α is ε-close to S if d(α, S) ≤ εn.
     An [n, k, d]_Σ code is a mapping C : Σ^k → Σ^n such that for every x ≠ x′ ∈ Σ^k, d(C(x), C(x′)) ≥ d. The
parameter k is called the message length (or the information length) of the code and n is called the
block length or simply the length of the code. Sometimes we refer to C also as a subset of Σ^n, defined as
C = {y : ∃x ∈ Σ^k, y = C(x)}, and the elements y ∈ C are called codewords.
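
These definitions translate directly into code; here is a minimal Python sketch of d(·,·) and of the
decoding function D_C used below (the toy repetition code is our own example, not from the paper):

    def hamming_distance(a, b):
        # d(a, b): the number of positions where a and b differ
        assert len(a) == len(b)
        return sum(x != y for x, y in zip(a, b))

    def decode_nearest(w, codewords):
        # D_C(w): the closest codeword to w; it is unique whenever
        # d(w, C) < d/2 for a code of minimum distance d
        return min(codewords, key=lambda y: hamming_distance(w, y), default=None)

    C = ["000", "111"]                        # length-3 repetition code
    assert decode_nearest("010", C) == "000"  # unique, since d(w, C) = 1 < 3/2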


    Usually we will be interested in some family C of codes ⟨C_k⟩_{k∈N}, with functions n = n(k) and
d = d(k), in which every C_k is an [n(k), k, d(k)]_Σ code. Whenever the alphabet Σ is not specified it
means that Σ = {0, 1}. With a slight abuse of notation, for x ∈ Σ^k we will denote by C(x) the mapping
C_k(x) given by the “right size” code C_k from the family C. Similarly, for n = n(k) and w ∈ Σ^n we define
the distance of w from C as d(w, C) = d(w, C_k), which is the minimal distance between w and some
codeword of C. If d(w, C) < d/2 then the minimum is achieved by a unique codeword y ∈ C, and we
denote this codeword by D_C(w). In this case the original message is well defined, and it will be denoted
by C^{-1}(D_C(w)).
    An [n, k, d] code C is linear if (for all k) it has a generating matrix G ∈ {0,1}^{n×k} such that^2 for all
x ∈ {0,1}^k, C(x) = Gx. We denote by R(C) the number of distinct non-zero rows in C's generating
matrix.
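
A short Python sketch of these two notions (the generating matrix G below is a toy example of ours,
not taken from the paper):

    import numpy as np

    def encode(G, x):
        # C(x) = Gx over GF(2), for an n x k generating matrix G
        return (G @ x) % 2

    def R(G):
        # R(C): the number of distinct non-zero rows of G
        rows = {tuple(r) for r in G.tolist()}
        rows.discard((0,) * G.shape[1])
        return len(rows)

    G = np.array([[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]])
    x = np.array([1, 0, 1])
    print(encode(G, x))  # [1 1 0 1]  (two identical rows, so R(C) = 3 < n = 4)
    print(R(G))          # 3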


3     Definition of our model and statement of main results
Here we formally define message and codeword reconstruction, and state our main results. All our
definitions apply to non-adaptive algorithms, i. e., algorithms that base their query strategy solely on their
random bits, while basing their output on both random bits and the answers to the queries. For clarity
of analysis we make the dependence on a random string r of bits (assumed to be chosen uniformly and
independently) explicit, and we restrict^3 our attention to families of [n, k, d] codes, where Σ = {0, 1}. We
disregard computation time considerations because all lower bounds presented here hold regardless of
computation time.
Definition 3.1 (Message Reconstruction). Let C be a family of [n, k, d] codes with d ≥ 2δn for some
fixed δ > 0, let q, ρ : N → N and let ε be a fixed constant^4 satisfying 0 < ε < δ. A (q, ε, ρ) message
reconstructor for C is a deterministic machine A, taking as inputs k, n = n(k) ∈ N, i ∈ [k] and a random
string r ∈ {0,1}^{ρ(k)}, that for every w ∈ {0,1}^n satisfies the following:

   • A generates a set Q_{i,r} ⊆ [n] of q = q(k) indices and a function C_{i,r} : {0,1}^q → {0,1}, and outputs

                                        A_r^w(i) ≜ C_{i,r}(w_{Q_{i,r}}) .

   • Let A_r^w ∈ {0,1}^k denote the concatenation A_r^w(1) A_r^w(2) · · · A_r^w(k). If d(w, C) ≤ εn then

                                        Pr_r [ A_r^w = C^{-1}(D_C(w)) ] ≥ 2/3 .

When it is not important, we will avoid mentioning explicitly the random string length ρ, and will simply
use the term (q, ε) message reconstructor (or even q-query message reconstructor) to mean a (q, ε, Ω(1))
message reconstructor. For brevity, we may also use A_r to denote the algorithm A that operates
with the fixed random string r ∈ {0,1}^ρ.
    ^2 Here and in the following we may identify {0, 1} with the field of two elements in the standard way.
    ^3 Nevertheless, the bounds presented in this paper generalize to any constant-size alphabet as well.
    ^4 For clarity of presentation ε will be considered to be an absolute constant. Nevertheless, the bounds presented here have
only logarithmic dependence on 1/ε.



    In this terminology, if a code C has a (q, ε) message reconstructor then it is locally decodable with q
queries, up to noise rate ε. On the other hand, a code that is locally decodable with q queries (up to noise
rate ε) has an (O(q log k), ε) message reconstructor, and so the existence of constant query LDCs implies
that there exist codes that have an (O(log k), Ω(1)) message reconstructor. Our first result shows that in
terms of the number of queries this is essentially optimal, even for arbitrarily long codes.

Theorem 3.2. There are no codes (of any length) with an o(log k)-query non-adaptive message recon-
structor.

    Next we formally define codeword reconstruction. Notice that here the query complexity and the
randomness of the algorithm are mentioned in terms of n, the block length, rather than k as in the definition
of message reconstruction.

Definition 3.3 (Codeword Reconstruction). Let C, δ, q, ρ and ε be as in Definition 3.1. A (q, ε, ρ)
codeword reconstructor for C is a deterministic machine A, taking as inputs k, n = n(k) ∈ N, i ∈ [n] and a
random string r ∈ {0,1}^{ρ(n)}, that for every w ∈ {0,1}^n satisfies the following conditions:

   • A generates a set Q_{i,r} ⊆ [n] of q = q(n) indices and a function C_{i,r} : {0,1}^q → {0,1}, and outputs

                                        A_r^w(i) ≜ C_{i,r}(w_{Q_{i,r}}) .

   • Let A_r^w ∈ {0,1}^n denote the concatenation A_r^w(1) A_r^w(2) · · · A_r^w(n). If d(w, C) ≤ εn then

                                        Pr_r [ A_r^w = D_C(w) ] ≥ 2/3 .


Here too, we may avoid mentioning ρ explicitly, and may use A_r to denote the algorithm A that operates
with the fixed random string r.

    If a code C has a (q, ε) codeword reconstructor then it is self-correctable with q queries, and conversely,
any code that is self-correctable with q queries up to noise rate ε has an (O(q log n), ε) codeword
reconstructor. As stated earlier, constant query SCCs exist, hence there are codes with an (O(log n), Ω(1))
codeword reconstructor. Since essentially all known LDCs and SCCs are linear codes, and furthermore
they satisfy R(C) = n (i. e., all rows in their generating matrix are distinct), we actually have linear codes
with an (O(log R(C)), Ω(1)) codeword reconstructor. Our second result shows that in the case of linear
codes one cannot get significantly better than that.
Theorem 3.4. There are no linear codes (of any length) with an o(√(log R(C)))-query non-adaptive
codeword reconstructor.

Remark 3.5. For general (non-linear) codes there is no lower bound on the query complexity which
is poly-logarithmic in n. This follows from a simple observation that if a code has a q-query message
reconstructor then it also has a qk-query codeword reconstructor (with qk queries it is possible to decode
the whole message and hence its encoding). So, for example, Long Codes (of length n = 2^{2^k}, so that
k = log log n) admit a codeword reconstructor that makes only O(k log k) = O(log log n · log log log n) queries.


     Furthermore, even if we focus on linear codes only, stating the bound in Theorem 3.4 in terms of
n (instead of R(C)) is impossible, since there are linear [n, k, d] codes that have a q-query codeword
reconstructor with q arbitrarily smaller than n. For example, let C′ be a linear [n′, k, d′] code with a
(q, ε) codeword reconstructor, and define a new linear [n = tn′, k, d = td′] code C as C(x) = C′(x) · · · C′(x),
namely, encoding x with t copies of C′(x). Now for any t (and hence arbitrarily large n), C has
a (q, ε/2) codeword reconstructor, which picks a random copy of C′(x) from the encoding and then
simulates on it the original codeword reconstructor for C′.
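
A hedged Python sketch of this construction (all interface names are ours): the reconstructor fixes one
random copy with its seed and forwards every request into that copy.

    import random

    def repeated_code_reconstructor(base_reconstruct, n_prime, t, seed):
        # C(x) = t concatenated copies of C'(x); base_reconstruct(w_view, j, rng)
        # is an assumed (q, eps) codeword reconstructor for C'.
        copy = random.Random(seed).randrange(t)  # one copy, fixed by the seed

        def reconstruct(w, i):
            offset = copy * n_prime
            w_view = w[offset:offset + n_prime]   # oracle restricted to the copy
            rng = random.Random(hash((seed, i)))  # coins fixed by (seed, i)
            # position i of C(x) equals position i mod n' of C'(x)
            return base_reconstruct(w_view, i % n_prime, rng)

        return reconstruct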

   In the following corollary we use the fact that any linear code can be transformed into a systematic
code^5 and combine Theorem 3.2 with Theorem 3.4.

Corollary 3.6. There is no linear code with an o(max{log k, √(log R(C))})-query codeword reconstructor.

   It is worth mentioning that, while both local decoding and self-correction become trivial in the
random-noise model (via simple repetition codes), this is not the case for reconstruction. In fact, the
query-complexity lower bounds from Theorems 3.2 and 3.4 apply to the random-noise model as well.
See more details in Section 6.1.


4     Proof of Theorem 3.2
Outline: Usually, lower bound proofs that use Yao’s Principle involve an input distribution that fools
any deterministic algorithm of a certain type (bounded query complexity in our case) with probability
larger than 1/3. However, in this proof we will need to analyse the probabilistic algorithm A itself,
giving special treatment to the indices that it queries with high probability. We first prove that for most of
the deterministic algorithms A_r that result from fixing the random seed r in A (the “typical” ones), the
number of bits that are reconstructed correctly based only on the indices that are frequently queried by A
is small. Then we show that there is a distribution that fools any such typical algorithm with very high
probability. Due to this technique, instead of the usual application of Yao’s Principle we argue that there
is a distribution D that fools at least half of the deterministic algorithms A_r with probability 1 − o(1).
Thus, the distribution D would fool the probabilistic algorithm A with probability at least 2/5 − o(1).
     Fix ε > 0. Let C be a family of [n, k, d] codes and let A be its (q, ε) message reconstructor. Our goal
is to prove that q = Ω(log k). Recall that Q_{i,r} ⊆ [n] denotes the set of indices queried by A on input i ∈ [k],
given the random string r. For j ∈ [n] we define

                                        I(j, r) = {i ∈ [k] : j ∈ Q_{i,r}} .

Namely, I(j, r) is the set of indices whose reconstructed value may depend on the j-th bit of the received
word, given that the random seed is r. Similarly, for a subset S ⊂ [n] we define

                                I(S, r) = ∪_{j∈S} I(j, r) = {i ∈ [k] : S ∩ Q_{i,r} ≠ ∅} .

We call an index j ∈ [n] influential with respect to r if |I(j, r)| > 10q, and non-influential otherwise.
    ^5 A linear code C is systematic if C(x)_{[k]} = x for all x ∈ {0,1}^k.


     The following two lemmas follow by considering the bipartite graph G_r with message indices on the
left and code indices on the right, where edges are defined by Q_{i,r} and I(j, r), i. e., ij is an edge if and
only if j ∈ Q_{i,r}, if and only if i ∈ I(j, r).

Lemma 4.1. For any r, there are at most k/10 influential indices in [n].

Proof. If there are more than k/10 influential indices then

                                ∑_{j=1}^n |I(j, r)| > (k/10) · 10q = kq .

On the other hand,

                                ∑_{j=1}^n |I(j, r)| = ∑_{i=1}^k |Q_{i,r}| ≤ kq ,

a contradiction.

Lemma 4.2. Let A_r^0 ⊆ [n] be the set of influential indices (with respect to r). There exists a partition
A_r^1, A_r^2, . . . , A_r^T of [n] \ A_r^0 into T ≤ k parts such that for all i ∈ [T], |I(A_r^i, r)| ≤ 10q.

Proof. Recall that ∑_{j=1}^n |I(j, r)| ≤ kq from the previous proof. Let us construct a partition A_r^1, . . . , A_r^T of
the non-influential indices as follows: A_r^1 contains the first ℓ_1 non-influential indices, where ℓ_1 is the
maximal number for which |I(A_r^1, r)| ≤ 10q (note that in particular ℓ_1 ≥ 1); then A_r^2 contains the next ℓ_2
non-influential indices, where ℓ_2 is the largest number satisfying |I(A_r^2, r)| ≤ 10q, and so on. We claim that
at the end of this process T ≤ k.
     Assume that T > k. Consider the partition B_1, . . . , B_{⌈T/2⌉} of the non-influential indices where B_i =
A_r^{2i−1} ∪ A_r^{2i}. Notice that since T ≥ k + 1, there are at least k/2 parts in this partition. By the definition of
the A_r^i, all but at most one (the last one) of the B_i satisfy |I(B_i, r)| > 10q. So we have

                ∑_{j=1}^n |I(j, r)| ≥ ∑_{i=1}^{⌈T/2⌉−1} |I(B_i, r)| > (k/2 − 1) · 10q > kq ,

contradicting the fact that ∑_{j=1}^n |I(j, r)| ≤ kq.

     Now let us define a distribution D over received words. We will use the notation x ∼ D to mean that x
is chosen at random according to D, and whenever D is a set, we also use x ∼ D to mean that x is chosen
uniformly at random from D. Whenever the distribution or the set are clear from context we may omit
their specification.
     Recall that we need a distribution over received words that are ε-close to C, on which every determin-
istic message reconstructor fails with probability larger than 1/3. Instead, we define a distribution D that
will provide a word that is ε-close to C only with probability 1 − o(1). However, this will be sufficient,
because we will show that any algorithm will fail to reconstruct the correct message with probability at
least 2/5 − o(1); so even after conditioning on the probability-(1 − o(1)) event we still get the same error
probability estimate.


    A random word w ∼ D is generated by picking a uniformly random x ∼ {0,1}^k, setting y = C(x) and
then obtaining w by flipping each of the bits of y with probability ε/2, independently of the other bits. In
fact this will be a distribution over w and x, but with some abuse of notation we will generally omit x, as
with probability 1 − o(1) it is just the message corresponding to the codeword closest to w. We need the
following simple fact about D.

Fact 4.3. For every q ≤ n, Q ⊆ [n] of size |Q| = q and α ∈ {0,1}^q,

                                        Pr_{w∼D} [ w_Q = α ] ≥ (ε/2)^q .

(Indeed, conditioned on any codeword y, each bit of w equals the prescribed value with probability
either ε/2 or 1 − ε/2, independently of the other bits.)
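
For concreteness, a minimal Python sketch of sampling w ∼ D (the function name and the encode
interface are our own assumptions):

    import random

    def sample_from_D(encode, k, eps, rng):
        # pick x uniformly, set y = C(x), flip each bit independently w.p. eps/2
        x = [rng.randrange(2) for _ in range(k)]
        y = encode(x)
        w = [b ^ (rng.random() < eps / 2) for b in y]
        return x, w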

    We shall prove that if the query complexity of A is o(log k), then the probability that A fails on
w ∼ D is large, namely,

                        Pr_{w∼D, r} [ A_r^w = C^{-1}(D_C(w)) ] ≤ 1/2 + o(1) .

    For w ∼ {0,1}^n, r ∈ {0,1}^ρ and i ∈ [k] we say that A_r^w(i) is determined by w_{A_r^0} if A_r^w(i) = A_r^y(i) for
every y ∈ {0,1}^n with y_{A_r^0} = w_{A_r^0} (notice that in general, A_r^w(i) may be determined by w_{A_r^0} even when
Q_{i,r} ⊄ A_r^0). The next definition and lemma say that for most random strings r, the expected number of
indices correctly determined only by the influential indices is not very large, where the expectation is
taken over w ∼ D.

Definition 4.4. For every x ∈ {0,1}^k, w ∈ {0,1}^n and r ∈ {0,1}^ρ we set β_r^{w,x} = 0 if there is any index i
such that A_r^w(i) is determined by w_{A_r^0} but does not match the value x_i. If there is no such index, then we
set β_r^{w,x} to be the number of i ∈ [k] such that A_r^w(i) is determined (correctly) by w_{A_r^0}. We call r ∈ {0,1}^ρ
typical with respect to A if

                                        E_{w,x∼D} [β_r^{w,x}] ≤ (2/3) k .
Definition 4.5. For any typical r we call a w ∈ {0,1}^n r-hard if

                                        E_{x∼D} [β_r^{w,x}] ≤ (5/6) k .

Lemma 4.6. For any typical r,

                                        Pr_{w∼D} [w is r-hard] > 4/5 .
Proof. This follows directly from the definitions of typical and r-hard, together with Markov's Inequality.

Lemma 4.7. Let A be a message reconstructor for C. Then Pr_r [r is typical w. r. t. A] > 1/2.

    We defer the proof of Lemma 4.7 to Section 4.1, and first show how to use it to prove Theorem 3.2.
    For every random string r ∈ {0,1}^ρ we denote by N(r), 0 ≤ N(r) ≤ k, the expected number of
bits correctly reconstructed by A_r, where the expectation is taken over w ∼ D. Formally,

                        N(r) = E_{w∼D} [ k − d(A_r^w, C^{-1}(D_C(w))) ] .



Furthermore, given y ∈ D (by y ∈ D we mean a y ∈ {0,1}^n that is in the support of D) and S ⊆ [n], we
denote by N(r, y, S) the expected number of bits correctly reconstructed by A_r, where the expectation is
taken over w ∼ D conditioned on the event w_S = y_S. Formally,

                        N(r, y, S) = E_{w∼D} [ k − d(A_r^w, C^{-1}(D_C(w))) | w_S = y_S ] .

Lemma 4.8. For any typical r and y that is r-hard we have

                        N(r, y, A_r^0) ≤ k ( 1 − (ε/2)^{11q/10} / 6 ) .

Proof. By the definition of a typical r and the fact that y is r-hard, the expected number of bits whose
reconstructed value either depends on non-influential indices or is guaranteed to be always wrong is at
least k/6. Call these bits free bits. This means that every free bit may be incorrectly reconstructed by A_r
for some assignment α to the non-influential indices (if it can be correctly reconstructed at all). So, by
Fact 4.3, the probability that A_r fails on a specific free bit, taken over

                                        w ∼ D | w_{A_r^0} = y_{A_r^0} ,

is at least (ε/2)^{11q/10} (we can assume that the reconstructed value of each bit depends on at most
q ≤ log k ≤ log n of the non-influential indices, since otherwise the query complexity is not as advertised
and we are done; recall also that |A_r^0| ≤ k/10). Hence, by linearity of expectation, the expected number
of bits that are reconstructed correctly by A_r, taken over w ∼ D, is at most

                (5/6) k + (k/6) (1 − (ε/2)^{11q/10}) = k ( 1 − (ε/2)^{11q/10} / 6 ) .

    We now define a sequence of T + 1 random variables forming a Doob martingale. For a fixed
r ∈ {0,1}^ρ, y ∈ D and every t, 0 ≤ t ≤ T, we define H_t^{r,y} ≜ N(r, y, A_r^0 ∪ · · · ∪ A_r^t), that is, the expected
number of bits reconstructed correctly by A_r, where the expectation is taken over all w ∼ D that agree
with y on all indices in A_r^0 ∪ A_r^1 ∪ · · · ∪ A_r^t. That is, in the t-th step of the martingale we expose y_{A_r^t}.
    By Lemma 4.8,

                        H_0^{r,y} ≤ k ( 1 − (ε/2)^{11q/10} / 6 )

for all typical r and r-hard y. On the other hand, if A_r properly reconstructs y then H_T^{r,y} = k. Thus, for
every typical r we have

                        Pr_{y∼D} [ A_r^y = C^{-1}(D_C(y)) ] = Pr_{y∼D} [ H_T^{r,y} = k ] ,

which is at most

                        Pr_{y∼D} [ H_T^{r,y} − H_0^{r,y} ≥ k (ε/2)^{11q/10} / 6 ] .


    By Lemma 4.2, T ≤ k and the influence |I(A_r^t, r)| of each set A_r^t, 1 ≤ t ≤ T, is bounded by 10q. Thus
for all 1 ≤ t < T we have |H_{t+1}^{r,y} − H_t^{r,y}| ≤ 10q. Now we can apply Azuma's Inequality [2], by which we
obtain that

                        Pr_{y∼D} [ H_T^{r,y} − H_0^{r,y} ≥ k (ε/2)^{11q/10} / 6 ]

is at most

                exp( −k^2 (ε/2)^{11q/5} / (36 T (10q)^2) ) ≤ exp( −k (ε/2)^{11q/5} / (3600 q^2) ) ,

where in the last inequality we used the fact that T ≤ k.
     This means that the probability that A reconstructs all k bits correctly with a typical r is o(1),
unless q = Ω(log k). Since a random r is typical with probability at least 1/2 and y is r-hard with
probability at least 4/5, we conclude that unless q = Ω(log k), A fails on w ∼ D with overall probability
at least (1/2)(4/5) − o(1) = 2/5 − o(1). This completes the proof of Theorem 3.2, pending the proof of
Lemma 4.7 which we present next.

4.1    Proof of Lemma 4.7
We need to prove that most random strings r satisfy

                                        E_{w,x∼D} [β_r^{w,x}] ≤ (2/3) k .

To this end, we need the following auxiliary lemma, which is essentially an entropy preservation argument.

Lemma 4.9. For any two (possibly probabilistic) algorithms given by their functions, an encoder
E : {0,1}^k → {0,1}^{k/10} and a decoder D : {0,1}^{k/10} → {0,1}^k,

                        Pr_{r∼{0,1}^ρ, x∼{0,1}^k} [ x = D(E(x)) ] ≤ 2^{−9k/10} ,

where r denotes the outcome of the random coin flips of E and D. Furthermore, the inequality holds even
if E and D have shared randomness.

Proof. Let E, D be the two algorithms. Denote by E_r and D_r their deterministic versions operating with a
fixed random seed r. For every possible r, partition the set {0,1}^k into at most m_r ≤ 2^{k/10} parts X_1^r, . . . , X_{m_r}^r
such that for all i and x, y ∈ X_i^r we have E_r(x) = E_r(y). Observe that for every fixed r, conditioned on
the event that a random x falls in X_i^r, Pr_{x∼X_i^r} [D_r(E_r(x)) = x] ≤ 1/|X_i^r|, and clearly the probability that x
falls in X_i^r is exactly |X_i^r|/2^k. So for every fixed r we have

                        Pr_x [ x = D_r(E_r(x)) ] = ∑_{i=1}^{m_r} (|X_i^r| / 2^k) · (1/|X_i^r|) ,

which is equal to ∑_{i=1}^{m_r} 1/2^k ≤ 2^{−9k/10}. Hence this holds for a random r as well.




    Now consider the following encoder/decoder pair E, D corresponding to C and A. We will assume
that E and D share a random string r, and describe their operation for every possible r.
    The encoder E_r on x ∈ {0,1}^k computes y = C(x), converts y into w by flipping each bit with
probability ε/2 independently of the other bits, and outputs

                                        E_r(x) ≜ w_{A_r^0}

(the identity of the flipped bits is not part of the shared randomness). By the definition of A_r^0, |E_r(x)| ≤
k/10 for every x and r. (If necessary, E_r(x) can be padded arbitrarily up to length k/10.)
    Before defining the decoder, let us denote by S_r^z ⊆ [k] the set of bits for which the value is determined
by A_r, given that the assignment to the influential indices A_r^0 of the received word equals z.
    The decoder D_r on z ∈ {0,1}^{k/10} constructs

                                        D_r(z) ≜ x′ ∈ {0,1}^k

by reconstructing x′_i according to A_r for all i ∈ S_r^z, and guessing x′_i ∈ {0,1} uniformly at random for all
other indices i ∈ [k] \ S_r^z.
    We can now prove Lemma 4.7 by showing that the pair E, D defined above contradicts the statement
of Lemma 4.9, unless for most random strings r,

                                        E_{w,x∼D} [β_r^{w,x}] ≤ (2/3) k .

We start by observing that the distributions (E_r(x) : x ∼ {0,1}^k) and (w_{A_r^0} : w ∼ D) are identical for every r.
Therefore, since for every r the value β_r^{w,x} depends only on w_{A_r^0} and x, we have

                        E_{w,x∼D} [β_r^{w,x}] = E_{x∼{0,1}^k} [β_r^{ext(E_r(x)), x}]

for all r as well, where ext(E_r(x)) ∈ {0,1}^n is an arbitrary string (extension) whose restriction to A_r^0
equals E_r(x).
    Now assume towards a contradiction that for at least half of the random strings r,

                        E_{w,x∼D} [β_r^{w,x}] = E_{x∼{0,1}^k} [β_r^{ext(E_r(x)), x}] > (2/3) k .

Since 0 ≤ β_r^{w,x} ≤ k for all w, x and r, this means that for at least half of the random strings r,

                        Pr_{x∼{0,1}^k} [ β_r^{ext(E_r(x)), x} ≥ k/2 ] > 1/6 .

Therefore (taking into account that β_r^{ext(E_r(x)), x} > 0 also implies that each of the bits not
determined by w_{A_r^0} obtains the correct value with probability 1/2, independently of the others),

                        Pr_{r, x∼{0,1}^k} [ x = D_r(E_r(x)) ] ≥ (1/2) · (1/6) · 2^{−k/2} ,

which is more than 2^{−9k/10}, thus contradicting Lemma 4.9.


5     Proof of Theorem 3.4
Outline: For the proof of Theorem 3.4 we show that there is an input distribution D on which any
deterministic codeword reconstructor with query complexity o(√(log R(C))) fails with probability larger
than 1/3. This is done by constructing a set of indices S ⊂ [n] such that for every deterministic recon-
structor and for every i ∈ S, if E_i is the event that the reconstructor errs on index i, then the events E_i,
i ∈ S, are independent (with respect to D). Since the reconstructor fails unless it reconstructs all indices
i ∈ S correctly, the probability that it does not fail goes down exponentially with the size of the set S.
Thus it is sufficient to show that the probability of each E_i is not too low, and that S is sufficiently large.
The latter is done using the Sunflower Lemma [10] and the fact that C is a linear code.
    Fix ε > 0. Let C be a family of linear [n, k, d] codes and let A be its (q, ε) codeword reconstructor.
Let G_1, . . . , G_n ∈ {0,1}^k denote the rows of C's generating matrix G, satisfying C(x)_i = ⟨G_i, x⟩ (mod 2)
for every x ∈ {0,1}^k and i ∈ [n], and let n̂ ≜ R(C) denote the number of distinct rows in G.
    The distribution D is defined similarly to the definition in Section 4, that is, a random word w ∼ D
is generated by picking a uniformly random y ∈ C and then obtaining w by flipping each of the bits of
y with probability ε/2, independently of the other bits. As we also mentioned in Section 4, w ∼ D is
ε-close to C only with probability 1 − o(1), but this will be sufficient because we will show that for any r,
A_r will fail to reconstruct the correct codeword with probability at least 1/2.
Lemma 5.1. The following holds for D and any linear code C:

    1. Let T ⊆ [n] and i ∈ [n] \ T be such that G_i (the i-th row of C's generating matrix) is linearly
       independent of the rows {G_j : j ∈ T}. Then for any α ∈ {0,1}^{|T|},

                Pr_{w∼D : w_T = α} [ D_C(w)_i = 1 ]   is equal to   Pr_{w∼D : w_T = α} [ D_C(w)_i = 0 ] ,

       and thus both are equal to 1/2, where “w ∼ D : w_T = α” is shorthand for “w ∼ D conditioned
       over the event that w_T = α.”

    2. Let S_1, . . . , S_t ⊆ [n] be disjoint sets with |S_i| ≤ q ≤ log n for all i ∈ [t]. For any sequence

                                        ⟨α_i ∈ {0,1}^{|S_i|}⟩_{i∈[t]}

       of partial assignments, we have that

                Pr_{w∼D} [ w_{S_i} = α_i for some i ∈ [t] ]   is at least   1 − (1 − (ε/2)^q)^t .


Proof. For the first item, notice that since the codewords of C form a linear subspace, for every i ∈ [n]
we have Pr_{y∼C} [y_i = 1] = Pr_{y∼C} [y_i = 0] = 1/2 (here we assume without loss of generality that C is not
redundant, i. e., the generating matrix of C has no all-zero rows). Conditioning the above over some
restriction to y_T has no effect on indices i that are linearly independent of the indices in T. This holds
since the subset of C formed by the restriction is an affine subspace not parallel to the kernel of G_i. Finally,
in D the i-th bit of y can be flipped with some probability, but this has no effect since the probability of y_i
being flipped is independent of the value y_i.
    The second item follows immediately from the definition of D.


    Recall that Q_{i,r} ⊆ [n] is the set of indices queried by A on input i ∈ [n] and random string r. From this
point on let us fix r and prove that unless

                                        q ≜ max_{i∈[n]} |Q_{i,r}|

is of order Ω(√(log n̂)), the probability (over w ∼ D) that A_r correctly reconstructs w (into y = D_C(w)) is
less than 1/2. Since r is fixed, we will make the notation shorter by omitting the subscript r from the sets
Q_{i,r}.
     The proof proceeds in two steps. In the first step we find a large subset F ⊆ [n] of indices, such that
Q_i ∩ Q_j = Q_{i′} ∩ Q_{j′} for all i, j, i′, j′ ∈ F, and in addition, for all i, j ∈ F the values D_C(w)_i, D_C(w)_j are
independent of w_{Q_i ∩ Q_j}.
F is constructed by first finding a large sunflower in the sets Q_1, . . . , Q_n and then removing from it the
(not too many) “bad” petals that violate the above property. In the second part of the proof we show that
given a large enough set F as above, the probability that A_r incorrectly reconstructs at least one of the
indices in F is high.
Definition 5.2. A sunflower with t petals and core T is a collection of sets Q_1, . . . , Q_t such that
Q_i ∩ Q_j = T for all i ≠ j. We call the number of petals the size of the sunflower.

Lemma 5.3 (Sunflower Lemma [10]). Let Q be a family of n sets, each set having cardinality at most q.
If n > q!(t − 1)^q then Q contains a sunflower with t petals. In particular, Q contains a sunflower of size at
least (1/q) n^{1/q}.
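
The standard proof of Lemma 5.3 is constructive; the following Python sketch (our own rendering, not
from the paper) extracts a sunflower greedily: take a maximal pairwise-disjoint subfamily, and if it is
too small, recurse on a popular element.

    def find_sunflower(sets, t):
        # Returns (core, petals) with t petals if one is found, else None.
        sets = [frozenset(s) for s in sets]
        disjoint = []
        for s in sets:                      # maximal pairwise-disjoint subfamily
            if all(s.isdisjoint(d) for d in disjoint):
                disjoint.append(s)
        if len(disjoint) >= t:
            return frozenset(), disjoint[:t]          # empty core
        # The union of `disjoint` hits every set and has <= q(t-1) elements,
        # so some element lies in at least a 1/(q(t-1)) fraction of the sets.
        universe = set().union(*disjoint) if disjoint else set()
        if not universe:
            return None
        x = max(universe, key=lambda e: sum(e in s for s in sets))
        rec = find_sunflower([s - {x} for s in sets if x in s], t)
        if rec is None:
            return None
        core, petals = rec
        return core | {x}, [p | {x} for p in petals]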
    Let R be a set of n̂ indices corresponding to the n̂ distinct rows in G. Let Q = ⟨Q_i⟩_{i∈R} be the family of sets
queried by A_r on inputs i ∈ R. By definition, Q contains n̂ sets, each of size at most q. From Lemma 5.3
we can obtain a sunflower S ⊆ Q, S = {Q_{i_1}, . . . , Q_{i_t}}, with t ≥ (1/q) n̂^{1/q} petals. Let T ⊂ [n] denote the core
of S. We define the span of the core T as

        span(T) = span{G_i : i ∈ T} = { ∑_{i∈T} α_i G_i (mod 2) : ∀i, α_i ∈ {0,1} } ⊆ {0,1}^k .

Since |T| ≤ q, the span of T contains at most 2^q different rows of G.
    Next we form a family S′ ⊆ S of sets by removing from S every petal Q_{i_j} for which G_{i_j} belongs to
the span of T. Namely, we set

                                S′ ≜ { Q_{i_j} ∈ S : G_{i_j} ∉ span(T) } .

Notice that the resulting family S′ is a sunflower as well, with the same core T. Furthermore, the size of
S′ is at least t′ ≥ (1/q) n̂^{1/q} − 2^q.
     Intuitively, the query sets in S′ correspond to those indices in R that are “independent” of w_T. We
call these indices free indices and denote their set by F. Namely, F = {i : Q_i ∈ S′}. By the first item of
Lemma 5.1, for every free index i ∈ F and all α ∈ {0,1}^{|T|} we have

        Pr_{w∼D : w_T = α} [D_C(w)_i = 1] = Pr_{w∼D : w_T = α} [D_C(w)_i = 0] = 1/2 .                    (5.1)


     Now we can show that if q = o(√(log n̂)), then with probability at least 1/2 one of the free indices will
be reconstructed incorrectly by A_r. Assume that the contrary is true, i. e., Pr_{w∼D} [β_w] > 1/2, where β_w
is the indicator of the event that A_r reconstructs all free indices correctly. This implies that there exists
an α ∈ {0,1}^{|T|} such that Pr_{w∼D : w_T = α} [β_w] > 1/2. If for some i ∈ F the value of A_r^w(i) is determined by
the fact that w_T = α, then by equation (5.1) the probability that A_r reconstructs i incorrectly is 1/2. So
it must be the case that having w_T = α does not determine A_r^w(i) for any of the free indices i, and in
particular, for every S_i ≜ Q_i \ T, Q_i ∈ S′, there must be an assignment α_i to w_{S_i} for which A_r reconstructs
i incorrectly. Since the sets S_i are disjoint, we can apply the second item of Lemma 5.1, by which the
probability that one of the free indices is reconstructed incorrectly is at least

        1 − (1 − (ε/2)^q)^{|S′|} ≥ 1 − (1 − (ε/2)^q)^{(1/q) n̂^{1/q} − 2^q} ,

which is at least

                        1 − e^{−(ε/2)^q ((1/q) n̂^{1/q} − 2^q)} .

Notice that the above probability is larger than 1/2 (in fact it is almost 1) if

                        (ε/2)^q ( (1/q) n̂^{1/q} − 2^q ) > 10 ,

which is the case for q = o(√(log n̂)). Hence a contradiction.


6     Extensions and open problems
6.1   Reconstruction against random noise
In the usual definition of locally decodable codes (and self-correctable codes) it is assumed that
the noise is adversarial, i. e., that the set of corrupted locations is chosen in a worst-case manner.
Our definition of message and codeword reconstruction is against adversarial noise as well. But does
reconstruction become easier in the random-noise model, where the received word w is obtained by
flipping (independently) each bit of the codeword y with probability at most ε, and the requirement is that
for every y ∈ C the decoding/correction/reconstruction is successful with probability at least 2/3, taken
over both r and w?
     While both local decoding and self-correction become trivial in the random-noise model (via simple
repetition codes), this is not the case for reconstruction. In fact, our proofs used input distributions that
exactly correspond to the random-noise model. Moreover, there is a general reduction that allows one to
translate any query-complexity lower bound in the adversarial-noise model for message reconstruction to
a lower bound in the random-noise model (or alternatively, translate any upper bound in the random-noise
message reconstruction model into one that works in the adversarial-noise model) via LDCs:

Claim 6.1. Let C be a code with a q-query message reconstructor against random noise, and let H be a
p-query LDC. Then the code H(C)(x) ≜ H(C(x)) (the LDC H composed with C) has an O(pq)-query
message reconstructor in the adversarial-noise model.


    We omit the formal proof of Claim 6.1, but the main idea is that the additional LDC encoding enables
converting adversarial noise into random noise. This is the case since by the definition of an LDC, every
bit of the codeword y ∈ C can be decoded correctly with high probability, independently of other bits.
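
A hedged Python sketch of the composition (all interface names are ours): every probe that the
random-noise reconstructor makes into C(x) is served by locally decoding the outer LDC, costing p
queries each and O(pq) in total.

    import random

    def composed_reconstructor(ldc_decode, noise_reconstruct, seed):
        # ldc_decode(w, j, rng): p-query local decoding of bit j of C(x) from
        #   the adversarially corrupted outer word w (a corrupted H(C(x)));
        # noise_reconstruct(oracle, i, rng): q-query message reconstructor for
        #   C against random noise, probing oracle(j) for bits of C(x).
        def reconstruct(w, i):
            rng = random.Random(hash((seed, i)))
            # each oracle answer errs with small probability, (almost)
            # independently across probes -- emulating random noise on C(x)
            oracle = lambda j: ldc_decode(w, j, rng)
            return noise_reconstruct(oracle, i, rng)
        return reconstruct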
    Combining Claim 6.1 with Theorem 3.2, and using the fact that there are constant-query LDCs, we
get that message reconstruction against random noise requires Ω(log k) queries. We can also deduce
corresponding lower bounds for codeword reconstruction; however, here the length of the LDC
used becomes important, as it affects the codeword length of the combined code.
    We note that while there are codes of super-polynomial length with an O(log k)-query message
reconstructor, we do not know of any polynomial-length code with an O(log^2 k)-query message reconstructor.
So the real bound in Theorem 3.2 may be higher than Ω(log k) for efficient codes. On the other hand, the
repetition code, whose encoding is obtained by concatenating O(log k) copies of the original message,
has a trivial O(log k)-query message reconstructor against random noise. Thus for random noise our lower
bound is optimal, irrespective of the length of the code.


6.2     Partial reconstruction
In a more general setting, we may require that a message reconstructor should decode correctly any
specified t bits of the message, instead of all k. It is straightforward to extend the lower bound in
Theorem 3.2 for this generalization to Ω(log t). Similarly, if we require that a codeword reconstructor
should decode correctly any specified t bits of the codeword instead of all n, then we can extend Theorem 3.4
to provide a lower bound of Ω(√(log t̂)), where t̂ is the number of distinct rows among the t rows
corresponding to the decoded part.


6.3     Open problems
      • Our results suggest that one cannot get message reconstructors that are significantly more query-
        efficient than the amplified versions of constant-query LDCs. It is an interesting question whether
        this connection is bidirectional, i. e., whether q-query message reconstruction implies q-query local
        decoding with error probability O(1/k).

      • Are there codes of polynomial length that have O(log k)-query message reconstructors? A positive
        answer would be an intermediate step towards proving that efficient constant query LDCs exist.

    • In the case of codeword reconstruction, Theorem 3.4 says that for any linear code the codeword
      reconstructor must have query complexity Ω(√(log n̂)). On the other hand, we have linear codes with
      an O(log n̂)-query codeword reconstructor. We expect the upper bound to be the correct one, but we
      are currently unable to close this gap.


Acknowledgments
We thank C. Seshadhri, David García-Soriano, and Ronald de Wolf for useful discussions on the topic. In
particular, we thank Ronald de Wolf for many valuable comments on an early draft of this paper.


References

 [1] NIR AILON, BERNARD CHAZELLE, SESHADHRI COMANDUR, AND DING LIU: Property-preserving
     data reconstruction. Algorithmica, 51(2):160–182, 2008. Preliminary version in ISAAC'04.
     [doi:10.1007/s00453-007-9075-9] 517

 [2] NOGA ALON AND JOEL H. SPENCER: The Probabilistic Method. Wiley, 1992. 525

 [3] LÁSZLÓ BABAI, LANCE FORTNOW, LEONID A. LEVIN, AND MARIO SZEGEDY: Checking
     computations in polylogarithmic time. In Proc. 23rd STOC, pp. 21–31. ACM Press, 1991.
     [doi:10.1145/103418.103428] 518

 [4] ARNAB BHATTACHARYYA, ELENA GRIGORESCU, MADHAV JHA, KYOMIN JUNG, SOFYA
     RASKHODNIKOVA, AND DAVID P. WOODRUFF: Lower bounds for local monotonicity reconstruction
     from transitive-closure spanners. SIAM J. Discrete Math., 26(2):618–646, 2012. Preliminary
     version in RANDOM'10. [doi:10.1137/100808186] 517

 [5] MANUEL BLUM, MICHAEL LUBY, AND RONITT RUBINFELD: Self-testing/correcting with
     applications to numerical problems. J. Comput. System Sci., 47(3):549–595, 1993. Preliminary
     version in STOC'90. [doi:10.1016/0022-0000(93)90044-W] 517

 [6] SOURAV CHAKRABORTY, ELDAR FISCHER, AND ARIE MATSLIAH: Query complexity lower bounds
     for reconstruction of codes. In Proc. 2nd Symp. on Innovations in Computer Science (ICS'11),
     pp. 264–274, 2011. Accessible at ICS. See also ECCC. 515

 [7] BERNARD CHAZELLE AND C. SESHADHRI: Online geometric reconstruction. J. ACM, 58(4):14,
     2011. Preliminary version in SoCG'06. [doi:10.1145/1989727.1989728] 517

 [8] BENNY CHOR, ODED GOLDREICH, EYAL KUSHILEVITZ, AND MADHU SUDAN: Private information
     retrieval. J. ACM, 45(6):965–981, 1998. Preliminary version in FOCS'95.
     [doi:10.1145/293347.293350] 518

 [9] KLIM EFREMENKO: 3-query locally decodable codes of subexponential length. SIAM J. Comput.,
     41(6):1694–1703, 2012. Preliminary version in STOC'09. [doi:10.1137/090772721] 518

[10] PAUL ERDŐS AND RICHARD RADO: Intersection theorems for systems of sets. J. London Math.
     Soc., 35:85–90, 1960. Accessible at Alfréd Rényi Institute of Mathematics. 527, 528

[11] WILLIAM I. GASARCH: A survey on private information retrieval (computational complexity
     column). Bulletin of the EATCS, 82:72–107, 2004. Accessible at Author's Website. 518

[12] ODED GOLDREICH, SHAFI GOLDWASSER, AND DANA RON: Property testing and its connection
     to learning and approximation. J. ACM, 45(4):653–750, 1998. Preliminary version in FOCS'96.
     [doi:10.1145/285055.285060] 517, 518

[13] MADHAV JHA AND SOFYA RASKHODNIKOVA: Testing and reconstruction of Lipschitz functions
     with applications to data privacy. SIAM J. Comput., 42(2):700–731, 2013. Preliminary version in
     FOCS'11. See also ECCC. [doi:10.1137/110840741] 517


[14] SATYEN KALE, YUVAL PERES, AND C. SESHADHRI: Noise tolerance of expanders and sublinear
     expander reconstruction. SIAM J. Comput., 42(1):305–323, 2013. Preliminary version in FOCS'08.
     [doi:10.1137/110837863] 517

[15] JONATHAN KATZ AND LUCA TREVISAN: On the efficiency of local decoding procedures for
     error-correcting codes. In Proc. 32nd STOC, pp. 80–86. ACM Press, 2000.
     [doi:10.1145/335305.335315] 516, 518

[16] IORDANIS KERENIDIS AND RONALD DE WOLF: Exponential lower bound for 2-query locally
     decodable codes via a quantum argument. J. Comput. System Sci., 69(3):395–420, 2004. Preliminary
     versions in STOC'03 and CoRR. [doi:10.1016/j.jcss.2004.04.007, arXiv:quant-ph/0208062] 518

[17] MICHAEL E. SAKS AND C. SESHADHRI: Parallel monotonicity reconstruction. In Proc. 19th Ann.
     ACM-SIAM Symp. on Discrete Algorithms (SODA'08), pp. 962–971. ACM Press, 2008. Accessible
     at ACM DL. 517

[18] LUCA TREVISAN: Some applications of coding theory in computational complexity. Quaderni di
     Matematica, 13:347–424, 2004. See ECCC and CoRR. [arXiv:cs/0409044] 518

[19] STEPHANIE WEHNER AND RONALD DE WOLF: Improved lower bounds for locally decodable
     codes and private information retrieval. In Proc. 32nd Internat. Colloq. on Automata, Languages
     and Programming (ICALP'05), pp. 1424–1436, 2005. [doi:10.1007/11523468_115,
     arXiv:quant-ph/0403140] 518

[20] DAVID P. WOODRUFF: New lower bounds for general locally decodable codes. Electronic
     Colloquium on Computational Complexity (ECCC), 14(006), 2007. Accessible at ECCC. 518

[21] SERGEY YEKHANIN: Towards 3-query locally decodable codes of subexponential length. J. ACM,
     55(1):1, 2008. Preliminary version in STOC'07. [doi:10.1145/1326554.1326555] 518


AUTHORS

      Sourav Chakraborty
      Associate professor
      Chennai Mathematical Institute, Chennai, India
      sourav cmi ac in
      http://www.cmi.ac.in/~sourav





    Eldar Fischer
    Faculty of Computer Science
    Technion – Israel Institute of Technology
    Technion City, Haifa 32000, Israel
    eldar cs technion ac il
    http://www.cs.technion.ac.il/users/eldar


    Arie Matsliah
    Software Engineer
    Google, Mountain View, CA
    ariem google com


ABOUT THE AUTHORS

    S OURAV C HAKRABORTY is an associate professor at the Chennai Mathematical Institute,
       India. He completed his Ph. D. in 2008 at the University of Chicago under the supervision
       of László Babai. His research interests lie in the classical and quantum complexity of
       Boolean functions (including property testing and different combinatorial measures of the
       complexity of Boolean functions), electronic commerce, graph algorithms, and coding
       theory.


    E LDAR F ISCHER completed his Ph. D. in 1999 at Tel-Aviv University under the supervision
       of Noga Alon, and has been a member of the Faculty of Computer Science at the Israel
       Institute of Technology (the Technion) since 2001. His main interests lie in the analysis
       of algorithms, especially property testers and sublinear-time algorithms, but he is also
       interested in most things combinatorial. His main extracurricular interest centers on
       board and card games.


    A RIE M ATSLIAH is a software engineer in the ads-quality team at Google. He graduated
       from the Technion – Israel Institute of Technology in 2008; his advisor was Eldar Fischer.
       His research interests include sublinear algorithms and complexity of Boolean functions.



