                            Theory of Computing, Volume 16 (6), 2020, pp. 1–20
                                         www.theoryofcomputing.org




         Sharp Bounds for Population Recovery
               Anindya De∗                   Ryan O’Donnell†                      Rocco A. Servedio‡
               Received August 25, 2017; Revised January 22, 2019; Published September 25, 2020




       Abstract. The population recovery problem is a basic problem in noisy unsupervised learn-
       ing that has attracted significant attention in recent years (Dvir et al., ITCS’12), (Wigderson
       and Yehudayoff, STOC’12), (Moitra and Saks, FOCS’13), (Batman et al., RANDOM’13),
       (Lovett and Zhang, STOC’15), (De et al., FOCS’16). A number of variants of this problem
       have been studied, often under assumptions on the unknown distribution (such as that it
       has restricted support size). In this article we study the sample complexity and algorithmic
       complexity of the most general version of the problem, under both the bit-flip noise and the
       erasure noise models. We give essentially matching upper and lower sample complexity
       bounds for both noise models, and efficient algorithms matching these sample complexity
       bounds up to polynomial factors.
ACM Classification: G.3, F.2.0
AMS Classification: 68Q32
Key words and phrases: population recovery, Littlewood polynomial, heavy hitters, Hadamard three-circle theorem

1     Introduction
1.1    The erasure noise and bit-flip noise population recovery problems
The noisy population recovery (NPR) problem is to learn an unknown probability distribution D on
{0, 1}^n, under ν-noise, to ℓ∞-accuracy ε.¹ In this problem the learner gets access to independent samples y,
   ∗ Supported by NSF grants CCF-1926872 (transferred from CCF-1814706) and CCF-1910534. Most of this work was done
while the author was a faculty member at Northwestern University.
   † Supported by NSF grant CCF-1717606.
   ‡ Supported by NSF grants CCF-1420349, CCF-1563155, and CCF-1814873.
   ¹ With high probability. Because we are not concerned with logarithmic factors in our time/sample complexity, we will for
simplicity omit discussion of the standard tricks (independent repetition, taking the median of estimators) used to boost success
probabilities. We will also always assume, without loss of generality, that ε is at most some sufficiently small absolute constant.


© 2020 Anindya De, Ryan O’Donnell, and Rocco A. Servedio
Licensed under a Creative Commons Attribution License (CC-BY)                                 DOI: 10.4086/toc.2020.v016a006

each distributed as follows: First x ∼ D, and then y ∼ Noiseν (x), where Noiseν (·) denotes the application
of bit-flip noise or erasure noise (described below). The learner’s task is to produce an estimate D̂ of D
satisfying ‖D̂ − D‖∞ ≤ ε (with high probability). For the sake of a compact representation, we assume
the learner only lists the nonzero values of D̂; this means that a successful learner need only list O(1/ε)
nonzero values. We are interested in minimizing both the sample complexity and the running time of
learning algorithms.
    A simpler variation of the NPR problem is the estimation task. Here the algorithm does not need
to compute a complete D̂; it only needs to produce an ε-accurate estimate of D(u) for a given input
u ∈ {0, 1}^n. Certainly the estimation task is no harder than full NPR; conversely, it is known and not
hard to see (see Section 2.1) that given the ability to do estimation, one can do full NPR with just a
poly(n, 1/ε) factor slowdown. Hence we mainly focus on estimation in this paper.
    As mentioned above, we consider two different models of noise. Each involves a parameter 0 < ν < 1;
smaller values of ν correspond to more noise, so ν may be better thought of as a “correlation” parameter.


Erasure noise. For x ∈ {0, 1}^n we define Erase_{1−ν}(x) to be the distribution on {0, 1, ?}^n given by
independently replacing each coordinate of x with the symbol ‘?’ with probability 1 − ν. Thus, ν is the
retention probability for each coordinate.


Bit-flip noise. For x ∈ {0, 1}^n we define Flip_{(1−ν)/2}(x) to be the distribution on {0, 1}^n given by
independently flipping each coordinate of x with probability (1 − ν)/2. Equivalently, each coordinate of x is
retained with probability ν (as in erasure noise), and is otherwise replaced with a uniformly random bit.
This is also the model of noise associated with the so-called “Bonami noise operator” Tν (see [13] for the
precise description and many applications of this operator).
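
For concreteness, a minimal simulation of the two noise channels might look as follows (a Python sketch of our own; the function names and the use of ‘?’ as the erasure symbol are illustrative choices, not notation from the paper):

    import random

    def erase_noise(x, nu):
        # Erase_{1-nu}: each coordinate is retained with probability nu,
        # otherwise replaced by the erasure symbol '?'.
        return ['?' if random.random() > nu else b for b in x]

    def flip_noise(x, nu):
        # Flip_{(1-nu)/2}: each coordinate is retained with probability nu,
        # otherwise replaced by a uniformly random bit, so it is flipped
        # with probability (1 - nu)/2 overall.
        return [b if random.random() < nu else random.randint(0, 1) for b in x]

    # Example: one noisy sample of a fixed string under each channel.
    x = [0, 1, 1, 0, 1]
    print(erase_noise(x, nu=0.7))
    print(flip_noise(x, nu=0.7))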


1.2    Our results
For the bit-flip noise population recovery problem, our main result is a lower bound on the sample
complexity of estimation, as well as a full NPR algorithm whose running time (hence also sample
complexity) matches it up to polynomial factors.

Theorem 1.1 (NPR bit-flip noise upper and lower bounds). Let ε > 0 be sufficiently small and let n ∈ N.
Then any estimation algorithm for NPR with bit-flip noise must use at least the following number of
samples:
                                                                    14
              exp Θ n1/3 · ln2/3 (1/ε)/ν 2/3              if 2 ln(1/ε)
                                                                   n         ≤ ν ≤ 1/2,
                                                   
              exp Θ n1/3 · ln2/3 (1/ε) · (1 − ν)1/3       if 1/2 ≤ ν ≤ 1 − 2 ln(1/ε)
                                                                                   n  .

Here Θ(·) hides an absolute constant factor independent of ν and n. Furthermore, for 2 ln(1/ε)/n ≤ ν ≤
1 − 2 ln(1/ε)/n, there is an algorithm for the full NPR problem with bit-flip noise having running time
and number of samples equal to the above times poly(n, 1/ε).




     Prior to the present paper and the recent independent article [14], no nontrivial upper or lower bounds
were known even for the sample complexity of the general bit-flip noise population recovery problem.
(See [17, 10, 6] for earlier work that gave upper bounds and algorithms under the additional assumption
that the unknown distribution D is guaranteed to be supported on at most k strings.)
     For the erasure noise population recovery problem, our main positive result is an efficient algorithm,
and our main negative result is a near-matching lower bound for algorithms that satisfy either of the
following conditions: (a) the algorithm only uses information about the number of 1s present in the received
string, or (b) the ambient dimension n is sufficiently large. More precisely, we have the following theorem.
Theorem 1.2 (NPR erasure noise upper and lower bounds). Let ε > 0 be sufficiently small, 0 ≤ ν ≤ 1
and let n ∈ N.

   1. There is an algorithm for the full NPR problem with erasure noise using time and samples at most
      poly(n, 1/ε^{1/ν}). Moreover, the sample complexity of the estimation algorithm is upper bounded by
      O(1/ε^{1/ν}).

   2. Assume that 16·√(ln(1/ε)/n) ≤ ν ≤ 1/160. Then any estimation algorithm for NPR with erasure
      noise that only uses the number of 1s in each received string must use at least (1/ε)^{Ω(1/ν)} samples.

   3. If n ≥ ε^{−Ω(1/ν)}, then any estimation algorithm for NPR with erasure noise must use at least
      (1/ε)^{Ω(1/ν)} samples.

    For this problem, an earlier paper by Moitra and Saks [11] gave an algorithm with sample complexity
and running time (n/ε)^{O(log(1/ν)/ν)}. In the above theorem, item 3 follows from a simple reduction from
item 2 (which exploits the shift invariance of binomial distributions). So as to not affect the flow of the
text, this reduction is presented in Section 6. Thus, in the main body of this paper, we only focus on
proving items 1 and 2.
    Finally, we note that in recent independent work, Yury Polyanskiy, Ananda Theertha Suresh, and
Yihong Wu [14] have obtained results similar to Theorem 1.1 and Theorem 1.2 for the population recovery
problem. We explain the relationship between their results and ours below.

1.3   Our techniques and relationship to [14]
Our approach is similar in spirit to, and shares some technical similarities with, the recent papers [5, 12]
on the trace reconstruction problem as well as an earlier paper by Moitra and Saks [11] on population
recovery with erasure noise. At a high level, we take an analytic view on the combinatorial process
defined by the bit-flip and erasure noise operators, and convert the sample complexity questions for these
population recovery problems to questions about the extrema of real-coefficient polynomials satisfying
certain conditions on various circles in the complex plane; we then obtain our sample complexity bounds
by analyzing these extremal polynomial questions. We remark here that [11] was the first paper to
introduce complex analytic tools in the line of work mentioned here—in contrast to our paper, [11] arrives
at the complex analytic formulation by considering the dual of an LP-based estimator. However, it is
possible to directly arrive at the complex analytic formulation without invoking the notion of LP duality,
and this is what we do in this paper. Finally, we mention that the main algorithmic ingredient in our
results is linear programming.


    The work presented here and in [14] was done concurrently and independently. We now briefly
explain the relationship between the techniques and results in these papers. (a) The results for NPR with
bit-flip noise (i. e., Theorem 1.1) are the same as those of [14]. (b) As stated, our results for NPR with
erasure noise are quantitatively somewhat weaker though qualitatively quite similar to those of [14]. In
particular, we show that the sample complexity of the estimation problem in this setting is (1/ε)^{Θ(1/ν)}. In
contrast, [14] shows that the sample complexity for the estimation problem in the presence of erasure noise
is precisely (1/ε)^{max{2, 2(1−ν)/ν}} up to polylogarithmic factors. For any ν, our result differs from that of
[14] only by a fixed constant factor in the exponent of ε.
    As our paper was written independently of [14], no attempt was made to match the results of [14] or
to obtain exponents with precise constant factors. Incidentally, it turns out that the proof of item 1 of
Theorem 1.2 in fact yields the same exponent as that of [14] though we state our result without the factor
“1 − ν” (see Theorem 4.1). Finally, we also note that the sample complexity for both our results and the
results of [14] are “dimension free” for the estimation problem, i. e., the sample complexity bound is
independent of the ambient dimension n. In contrast, for the full NPR problem, the sample complexity
depends on n (in both the papers).
    At a high level, the techniques of [14] are similar to ours (and those of [12, 5]) though our proofs are
substantially shorter. This is essentially because we are able to leverage some recent results from [4, 8] in
our proofs. In particular, in the proof of Item 2 of Theorem 1.2, we directly utilize the construction of an
extremal polynomial from [8], while in contrast [14] rely on an argument from first principles based on
Hadamard’s three line theorem.


2     Preliminaries
We start by formally defining the notion of a probability distribution on a finite set X as well as the ℓ1 and
ℓ∞ metrics on the space of such distributions.
Definition 2.1. A probability distribution on a finite set X is a function D : X → [0, 1] such that
∑_{x∈X} D(x) = 1. For two such distributions D1 and D2, their ℓ1 distance, denoted by ‖D1 − D2‖1,
is given by ∑_{x∈X} |D1(x) − D2(x)|. Likewise, their ℓ∞ distance, denoted by ‖D1 − D2‖∞, is given by
max_{x∈X} |D1(x) − D2(x)|.

2.1   Well-known preliminary reductions
Estimation, enumeration, and recovery. Variants of the NPR problem with relaxed goals have been
studied in the literature. One is the aforementioned estimation problem. Another (complementary) variant
is called enumeration: in the enumeration problem, the learning algorithm is only required to return a list
of strings x1 , . . . , xm that is guaranteed (with high probability) to include all strings that have probability
at least ε under D; such strings are sometimes referred to as “heavy hitters.” Batman et al. [1] give a
range of results for the enumeration problem.
     It is easy to see that a solution to the estimation problem can be efficiently bootstrapped to full NPR
given the ability to solve the enumeration problem (simply run estimation, with a sufficiently boosted
success probability, on each of the m strings in the list obtained from enumeration). In turn, it is also well
known that an estimation algorithm can be efficiently transformed into an enumeration algorithm via a


“branch-and-prune” approach. Roughly speaking, such an approach maintains a not-too-large (size at
most O(1/ε)) set of i-bit prefixes that is known to contain all the “heavy hitters”; to construct the set of
(i + 1)-bit prefixes, the approach first “branches” to extend each i-bit prefix x to both x0 and x1, and then
“prunes” any element of {x0, x1} that is determined, using the estimation procedure, not to be a heavy
hitter. (Note that since only heavy hitters are maintained it will again be the case that the set of (i + 1)-bit
prefixes has size at most O(1/ε).) As observed in [1], an early example of such a branch-and-prune
routine that performs enumeration given an oracle for estimation is the Goldreich–Levin algorithm [9] for
list-decoding the Hadamard code. Both Dvir et al. [7] and Batman et al. [1] give fairly detailed analyses
of the above-described reduction from enumeration to estimation; we omit the details here and refer the
interested reader to Section 6.1 of [7] and Section 2 of [1], respectively.
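
As an illustration of this reduction (our own sketch, not code from [7] or [1]), the branch-and-prune routine can be written as follows, assuming a black-box oracle estimate_prefix_mass(prefix) that returns an additive ε/4-accurate estimate of the probability that a sample from D begins with the given prefix; in the actual reduction this oracle is implemented by running the estimation algorithm on the distribution projected to the first |prefix| coordinates:

    def enumerate_heavy_hitters(n, eps, estimate_prefix_mass):
        # Maintains a set of i-bit prefixes containing every eps-heavy hitter.
        prefixes = [()]
        for _ in range(n):
            # Branch: extend each surviving i-bit prefix by 0 and by 1.
            branched = [p + (b,) for p in prefixes for b in (0, 1)]
            # Prune: discard prefixes whose estimated mass is clearly below eps;
            # every true eps-heavy hitter survives, and only O(1/eps) prefixes
            # can pass the threshold.
            prefixes = [p for p in branched if estimate_prefix_mass(p) >= eps / 2]
        return prefixes

    # Toy usage with a noiseless oracle over an explicit distribution.
    D = {(0, 0, 1): 0.6, (1, 1, 1): 0.3, (1, 0, 0): 0.1}
    oracle = lambda p: sum(w for x, w in D.items() if x[:len(p)] == p)
    print(enumerate_heavy_hitters(3, 0.25, oracle))   # [(0, 0, 1), (1, 1, 1)]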
     Summarizing the reductions discussed above, we have that NPR is (up to polynomial factors) no
harder than the estimation problem, and it is also clearly no easier than estimation (since estimation is a
subproblem of general NPR). Thus, in the rest of this paper we restrict our attention to the estimation
problem.


Symmetrization. We further recall some well-known techniques that have been used in past papers
on NPR. First, in the estimation problem, we may assume without loss of generality that the string u
whose probability is to be estimated is u = (0, . . . , 0). To see this, for any point u ∈ {0, 1}^n, let Du denote
the distribution where Du(v) = D(u ⊕ v). Here u ⊕ v denotes the bitwise XOR of u and v. Then
Du(0, . . . , 0) = D(u). For bit-flip noise, we can generate y ∼ Flip_{(1−ν)/2}(Du) as u ⊕ Flip_{(1−ν)/2}(D). For
erasure noise, we can generate y ∼ Erase_{1−ν}(Du) as u ⊕ Erase_{1−ν}(D) (where we are overloading the
operator ⊕ in the obvious way, i. e., ‘?’ ⊕ {0, 1} = ‘?’). Thus, for both erasure and bit-flip noise, given
noisy samples from D, we can generate noisy samples from the distribution Du.
     Next, for the problem of estimating D(0, . . . , 0), we may assume without loss of generality that D is
symmetric, meaning that it gives equal probability mass to all strings of the same Hamming weight. In other
words, D is effectively given by a probability distribution Dsym on [0..n], with D(x) = Dsym(|x|) / \binom{n}{|x|}.
On one hand, if D(0, . . . , 0) can be estimated in the general case, it can certainly be estimated in the
symmetric case. On the other hand, given a general distribution D, the learner can randomly permute the
coordinates of each sample, effectively obtaining access to samples from a symmetric distribution D′,
with D′(0, . . . , 0) = D(0, . . . , 0). Thus, it suffices for the learner to be able to estimate in the symmetric
case. (We note that both these tricks, namely reducing to the case when u = (0, . . . , 0) and symmetrizing
D, appear in prior work; see, e. g., [7, 11].)
     In the symmetric case, we will express the unknown Dsym as a probability (row) vector [p0 p1 · · · pn ].
Here pi denotes the total weight of the strings with Hamming weight i. Although the learner observes full
strings, it may as well only consider the Hamming weights of the strings it receives. (This is without loss
of generality in the bit-flip noise model, since the number of 0s is completely determined by the number
of 1s. In the erasure noise model, this is why our lower bound holds only for algorithms that only use the
number of 1s in each received string; see the discussion in Section 1.3.)
     Thus we may view the learner as obtaining samples from the probability (row) vector [q0 q1 · · · qn],
where
            q = pA,        A_{ij} = Pr[a weight-i string becomes a weight-j string under ν-noise].              (2.1)
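
Concretely (our own sketch, not code from the paper), since the learner only uses the number of 1s in each noisy sample, symmetrizing by a random coordinate permutation changes nothing, and the empirical estimate of q = pA is simply a histogram of observed Hamming weights:

    import numpy as np

    def empirical_q(noisy_samples, n):
        # Each noisy sample is a sequence over {0, 1} (bit-flip noise) or
        # {0, 1, '?'} (erasure noise); we record only its number of 1s.
        counts = np.zeros(n + 1)
        for y in noisy_samples:
            counts[sum(1 for b in y if b == 1)] += 1
        return counts / len(noisy_samples)   # empirical estimate of q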


It is not hard to write down the entries of A in either noise model. We remark that, after symmetrization,
the bit-flip model becomes equivalent to running the well-known Ehrenfest urn model for continuous
time tn, where e^{−t} = ν. It is easy to write down the known generating function for that model.

Proposition 2.2 ([16, 2]). For A associated to the Flip_{(1−ν)/2} noise model, and z an indeterminate,

                  ∑_{j=0}^{n} A_{ij} z^j  =  ( (1−ν)/2 + ((1+ν)/2)·z )^i · ( (1+ν)/2 + ((1−ν)/2)·z )^{n−i}.


Proof. Fix i, j and let x be any string of weight i. Let E_k denote the event that k of the 1’s in x become
0 and j − i + k of the 0’s in x become 1. From positivity constraints, we derive that 0 ≤ k ≤ i and
i ≤ j + k ≤ n. It follows then that

      A_{ij} = ∑_{k : 0≤k≤i and i≤j+k≤n} \binom{i}{k} \binom{n−i}{j−i+k} ( (1+ν)/2 )^{(i−k)+(n−j−k)} ( (1−ν)/2 )^{k+(j−i+k)}.

Thus, we get that

      ∑_{j=0}^{n} A_{ij} z^j = ∑_{j=0}^{n} ∑_{k : 0≤k≤i and i≤j+k≤n} \binom{i}{k} \binom{n−i}{j−i+k} ( (1+ν)/2 )^{(i−k)+(n−j−k)} ( (1−ν)/2 )^{k+(j−i+k)} z^j.

We now simplify the right-hand side by reversing the order of summation and rewriting it in terms of
ℓ = j + k − i:

      ∑_{k=0}^{i} ∑_{ℓ=0}^{n−i} \binom{i}{k} \binom{n−i}{ℓ} ( (1+ν)/2 )^{n−k−ℓ} ( (1−ν)/2 )^{k+ℓ} z^{ℓ+i−k}
        = ∑_{k=0}^{i} \binom{i}{k} ( (1+ν)/2 )^{n−k} ( (1−ν)/2 )^{k} z^{i−k} · ( 1 + (1−ν)z/(1+ν) )^{n−i}
        = ( 1 + (1−ν)z/(1+ν) )^{n−i} ( (1+ν)/2 )^{n} z^{i} ( 1 + (1−ν)/((1+ν)z) )^{i}
        = ( (1−ν)/2 + ((1+ν)/2)·z )^{i} ( (1+ν)/2 + ((1−ν)/2)·z )^{n−i}.                                       (2.2)

This completes the proof.

   For the erasure model, the generating function is even simpler.

Proposition 2.3. For A associated to the Erase_{1−ν} noise model, and z an indeterminate,

                                     ∑_{j=0}^{n} A_{ij} z^j = ( (1 − ν) + νz )^i.




Proof. By definition of Erase_{1−ν}, it follows that A_{ij} = \binom{i}{j} ν^j (1 − ν)^{i−j} when j ≤ i, and A_{ij} = 0 otherwise.
Thus,
                   ∑_{j=0}^{n} A_{ij} z^j = ∑_{j≤i} \binom{i}{j} ν^j (1 − ν)^{i−j} z^j = ( (1 − ν) + νz )^i.
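
The matrix A can be built directly from these generating functions; here is a small sketch (ours, using numpy polynomial arithmetic) that constructs A in both noise models and checks that every row is a probability distribution:

    import numpy as np
    from math import comb
    from numpy.polynomial import polynomial as P

    def transition_matrix_flip(n, nu):
        # Row i holds the coefficients of
        # ((1-nu)/2 + ((1+nu)/2) z)^i * ((1+nu)/2 + ((1-nu)/2) z)^(n-i),
        # as in Proposition 2.2 (coefficients in increasing order of degree).
        f = [(1 - nu) / 2, (1 + nu) / 2]
        g = [(1 + nu) / 2, (1 - nu) / 2]
        A = np.zeros((n + 1, n + 1))
        for i in range(n + 1):
            row = P.polymul(P.polypow(f, i), P.polypow(g, n - i))
            A[i, :len(row)] = row
        return A

    def transition_matrix_erase(n, nu):
        # A_{ij} = binom(i, j) nu^j (1-nu)^(i-j) for j <= i, as in Proposition 2.3.
        A = np.zeros((n + 1, n + 1))
        for i in range(n + 1):
            for j in range(i + 1):
                A[i, j] = comb(i, j) * nu ** j * (1 - nu) ** (i - j)
        return A

    # Each row of A is a probability distribution, so the rows sum to 1.
    assert np.allclose(transition_matrix_flip(6, 0.3).sum(axis=1), 1.0)
    assert np.allclose(transition_matrix_erase(6, 0.3).sum(axis=1), 1.0)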

     To recap, in the estimation problem the learner’s task is to estimate p0 to accuracy ε, given samples
from q. We recall the well-known fact that, by taking the empirical distribution of O(n/δ^2) samples, the
learner may obtain an estimate q̂ of q satisfying ‖q̂ − q‖1 ≤ δ (with high probability). Although q = pA,
as noted in previous works one unfortunately cannot effectively estimate p0 simply as the first coordinate
of q̂A^{−1}, because A is poorly conditioned. Instead one needs a more sophisticated approach.


3     Reduction to an analytic problem
It is not hard to characterize the optimal sample complexity for the estimation problem. Define

                   η(ε, ν)  =  min { ‖pA − p′A‖1 : p, p′ probability vectors with |p0 − p′0| > 2ε }                  (3.1)

(where the parameter ν implicitly appears within A). If two probability vectors p and p′ have |p0 − p′0| >
2ε, then a successful estimation algorithm must be able to distinguish the two cases. But if q = pA and
q′ = p′A are close, in the sense that ‖q − q′‖1 ≤ δ, then a learning algorithm will need Ω(1/δ) samples
to distinguish them with high probability. We have reached the following conclusion.

Proposition 3.1. The sample complexity of any population recovery algorithm—indeed, any estimation
algorithm—is Ω(1/η(ε, ν)).

    On the other hand, suppose the lower bound η(ε/2, ν) ≥ 2δ holds. Consider an estimation algorithm
that first produces an empirical estimate q̂ with ‖q̂ − q‖1 < δ using O(n/δ^2) samples, and then exactly
solves the following optimization problem using linear programming:

                                       min_{probability vectors p′}  ‖q̂ − p′A‖1.

(This can be efficiently written as an LP with O(n) variables and constraints and with rational numbers of
poly(n) bit-complexity.²) First, observe that the objective of the above program is at most δ (because
p′ = p achieves objective at most δ). Thus, if p∗ is the optimal solution, then ‖q̂ − p∗A‖1 ≤ δ. This
implies that ‖pA − p∗A‖1 < 2δ which, by our assumption on η(ε/2, ν), implies that |p0 − p∗0| ≤ ε. Thus
we get an efficient solution to the estimation problem. In conclusion, we have established the following
result.

Proposition 3.2. The estimation problem can be solved with poly(n, 1/η(ε/2, ν)) time and samples.
   ² For simplicity in this paper we assume that ε and ν are rational quantities of poly(n) bits known to the learning algorithm.

Since ε is part of the input, this is a reasonable assumption about ε. In the case of erasure noise, it is easy to estimate ν from the
samples—see Section 3.3 in [17].
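
A direct rendering of this LP-based estimator (our own sketch, assuming scipy is available; it minimizes ‖q̂ − p′A‖1 over probability vectors p′ via the standard auxiliary-variable formulation) might look like this:

    import numpy as np
    from scipy.optimize import linprog

    def lp_estimate_p0(q_hat, A):
        # Minimize || q_hat - p'A ||_1 over probability vectors p'.
        # Variables: p'_0..p'_n and slacks t_0..t_n, with
        #   p'A - t <= q_hat,  -p'A - t <= -q_hat,  p' >= 0,  sum_i p'_i = 1;
        # minimize sum_j t_j.
        m = len(q_hat)                        # m = n + 1
        c = np.concatenate([np.zeros(m), np.ones(m)])
        A_ub = np.block([[A.T, -np.eye(m)], [-A.T, -np.eye(m)]])
        b_ub = np.concatenate([q_hat, -q_hat])
        A_eq = np.concatenate([np.ones(m), np.zeros(m)]).reshape(1, -1)
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                      bounds=[(0, None)] * (2 * m))
        p_star = res.x[:m]
        return p_star[0]                      # the estimate of p_0 = D(0, ..., 0)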



Thus we see that, up to polynomial factors, both the sample complexity and the algorithmic complexity
of the estimation problem are controlled by the parameter η(ε, ν).
    We now further simplify the definition of η(ε, ν), similar to what was done in [5]. The set of differences of
two probability vectors over [0..n] is precisely the set

                                   ∆ := { [c0 c1 · · · cn] : ∑_i c_i = 0, ∑_i |c_i| ≤ 2 }.

Thus we have that
                                           η(ε, ν) = min_{c∈∆, c0>2ε} ‖cA‖1.
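
For small n, the quantity η(ε, ν) itself can be computed exactly by linear programming over c (a sketch of ours; we relax the strict inequality c0 > 2ε to c0 ≥ 2ε, which does not change the infimum):

    import numpy as np
    from math import comb
    from scipy.optimize import linprog

    def eta(A, eps):
        # min ||cA||_1 over c with sum_i c_i = 0, sum_i |c_i| <= 2, c_0 >= 2*eps.
        # Variables: c (free), s >= |c|, t >= |cA| (each of length m = n + 1).
        m = A.shape[0]
        I, Z = np.eye(m), np.zeros((m, m))
        obj = np.concatenate([np.zeros(2 * m), np.ones(m)])     # minimize sum t
        A_ub = np.vstack([
            np.hstack([ A.T, Z, -I]),                           #  cA - t <= 0
            np.hstack([-A.T, Z, -I]),                           # -cA - t <= 0
            np.hstack([ I, -I, Z]),                             #  c - s <= 0
            np.hstack([-I, -I, Z]),                             # -c - s <= 0
            np.concatenate([np.zeros(m), np.ones(m), np.zeros(m)])[None, :],   # sum s <= 2
            np.concatenate([-I[0], np.zeros(2 * m)])[None, :],                 # -c_0 <= -2*eps
        ])
        b_ub = np.concatenate([np.zeros(4 * m), [2.0], [-2 * eps]])
        A_eq = np.concatenate([np.ones(m), np.zeros(2 * m)])[None, :]          # sum c = 0
        res = linprog(obj, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[0.0],
                      bounds=[(None, None)] * m + [(0, None)] * (2 * m))
        return res.fun

    # Example: eta for erasure noise with a small n (A_{ij} as in Proposition 2.3).
    n, nu, eps = 12, 0.3, 0.05
    A = np.array([[comb(i, j) * nu**j * (1 - nu)**(i - j) if j <= i else 0.0
                   for j in range(n + 1)] for i in range(n + 1)])
    print(eta(A, eps))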

Let z be defined as z = (1, z, z^2, . . . , z^n). Then, using the triangle inequality and the Cauchy–Schwarz
inequality, we obtain
                                 max_{|z|=1} |cAz| ≤ ‖cA‖1 ≤ √(n + 1) · max_{|z|=1} |cAz|.

Note also that cAz is a polynomial in z that is easily calculated from the generating function of the
noise process (see Proposition 2.2 and Proposition 2.3). Putting it all together, we obtain the following
inequality.

Theorem 3.3.
     η(ε, ν)/√(n + 1)  ≤  min_{c∈∆, c0>2ε} max_{|z|=1} |Fc(z)|  ≤  η(ε, ν)        in the Flip_{(1−ν)/2} noise model,
     η(ε, ν)/√(n + 1)  ≤  min_{c∈∆, c0>2ε} max_{|z|=1} |Ec(z)|  ≤  η(ε, ν)        in the Erase_{1−ν} noise model,

where
                 Fc(z) = ∑_{i=0}^{n} c_i ( (1−ν)/2 + ((1+ν)/2)·z )^i ( (1+ν)/2 + ((1−ν)/2)·z )^{n−i},            (3.2)
                 Ec(z) = ∑_{i=0}^{n} c_i ( (1 − ν) + νz )^i.                                                      (3.3)

    Given c ∈ ∆ with c0 > 2ε, define the following polynomial (with real coefficients and a complex
parameter):
                                                     Qc(v) = ∑_{i=0}^{n} c_i v^i.

Thus the assumptions on c are equivalent to Qc(0) > 2ε, Qc(1) = 0, and L(Qc) ≤ 2, where L(Qc) is the
length of Qc, i. e., the sum of the absolute values of its coefficients.
    In analyzing Ec above, we use that Ec(z) = ∑_{i=0}^{n} c_i u^i, where u = (1 − ν) + νz. As z ranges over the
unit circle |z| = 1, the range of the parameter u, namely {(1 − ν) + νz : |z| = 1}, is precisely the circle
∂Dν(1 − ν) of radius ν centered at the real value 1 − ν. Thus

                                  max_{|z|=1} |Ec(z)| = max_{u∈∂Dν(1−ν)} |Qc(u)|.



      In analyzing Fc above, we use that

        Fc(z) = ( ν / ( (1+ν)/2 − ((1−ν)/2)·w ) )^n · ∑_{i=0}^{n} c_i w^i,      where  w = ( (1−ν)/2 + ((1+ν)/2)·z ) / ( (1+ν)/2 + ((1−ν)/2)·z ).        (3.4)

As the parameter z ranges over the unit circle, w also ranges over the unit circle. To see this, note that w
is a Möbius transformation of z, and thus as z ranges over a circle w ranges over either a circle or a line.
Further, it is easy to see that for z = 1, −1, i, the resulting w lies on the unit circle. Consequently, for any
z such that |z| = 1, w lies on the unit circle. Parameterizing w as w = eiθ , it is not hard to compute that

        | ν / ( (1+ν)/2 − ((1−ν)/2)·w ) |^2  =  2ν^2 / ( (1 − cos θ) + (1 + cos θ)ν^2 )  =  1 / ( 1 + ((1−ν^2)/ν^2)·sin^2(θ/2) ).        (3.5)

Thus
              max_{|z|=1} |Fc(z)| = max_{−π<θ≤π} ( 1 / ( 1 + ((1−ν^2)/ν^2)·sin^2(θ/2) ) )^{n/2} · |Qc(e^{iθ})|.        (3.6)

We have reached the following conclusion.

Corollary 3.4.
   η(ε, ν)/√(n + 1)  ≤  min_Q max_{−π<θ≤π} ( 1 / ( 1 + ((1−ν^2)/ν^2)·sin^2(θ/2) ) )^{n/2} · |Q(e^{iθ})|  ≤  η(ε, ν)     in the Flip_{(1−ν)/2} noise model,
   η(ε, ν)/√(n + 1)  ≤  min_Q max_{u∈∂Dν(1−ν)} |Q(u)|  ≤  η(ε, ν)                                         in the Erase_{1−ν} noise model,

where the minimum is over real-coefficient polynomials Q of degree at most n satisfying Q(0) > 2ε,
Q(1) = 0, and L(Q) ≤ 2.
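
Numerically, the two quantities appearing in Corollary 3.4 can be evaluated on a fine grid of the relevant circle for any candidate coefficient vector (a sketch of ours; the grid size and the example polynomial (1 − v)^k, rescaled to lie in ∆, are illustrative choices):

    import numpy as np
    from numpy.polynomial import polynomial as P

    def flip_objective(c, nu, n, grid=4000):
        # max over theta of (1/(1 + ((1-nu^2)/nu^2) sin^2(theta/2)))^(n/2) * |Q_c(e^{i theta})|
        theta = np.linspace(-np.pi, np.pi, grid)
        Q = P.polyval(np.exp(1j * theta), c)
        damp = (1.0 + (1 - nu**2) / nu**2 * np.sin(theta / 2) ** 2) ** (-n / 2)
        return np.max(damp * np.abs(Q))

    def erase_objective(c, nu, grid=4000):
        # max of |Q_c(u)| over the circle of radius nu centered at 1 - nu
        u = (1 - nu) + nu * np.exp(1j * np.linspace(-np.pi, np.pi, grid))
        return np.max(np.abs(P.polyval(u, c)))

    # Example vector c in Delta: coefficients of (1 - v)^k, rescaled so that
    # sum_i |c_i| = 2; then c_0 = 2^(1-k) plays the role of "2 eps".
    k, nu, n = 8, 0.25, 200
    c = np.zeros(n + 1)
    c[:k + 1] = P.polypow([1.0, -1.0], k) / 2 ** (k - 1)
    print(flip_objective(c, nu, n), erase_objective(c, nu))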

   Combining Propositions 3.1 and 3.2 with Corollary 3.4, we see that Theorems 1.1 and 1.2 follow
from giving bounds on the two quantities specified in Corollary 3.4 (or in Theorem 3.3). We give such
bounds in the following sections.


4      Circle bounds for erasure noise
4.1     A lower bound on η(ε, ν) for erasure noise
Notice that L(Q) ≤ 2 implies that |Q(u)| ≤ 2 for all |u| = 1. We have the following result.

Theorem 4.1. Let Q be a complex polynomial with |Q(0)| ≥ 2ε and |Q(u)| ≤ 2 for |u| = 1. Then for
0 < ν < 1/2 we have max_{u∈∂Dν(1−ν)} |Q(u)| ≥ 2ε^{(1−ν)/ν}. For 1/2 ≤ ν ≤ 1, we have
max_{u∈∂Dν(1−ν)} |Q(u)| ≥ 2ε.


Proof. Let us first consider the case when 1/2 ≤ ν ≤ 1. In this case, note that 0 is contained in the circle
Dν(1 − ν). By the maximum modulus principle, |Q(0)| ≤ max_{u∈∂Dν(1−ν)} |Q(u)|. However, |Q(0)| ≥ 2ε,
which completes the proof in this case.
    Let us now assume that 0 < ν < 1/2. Let U be the unit circle, let O be the circle of radius 1/2 centered
at 1/2, which lies inside U, and let C = ∂Dν(1 − ν), which lies inside O. The Möbius transformation
A(u) = 1/(1 − u) takes these circles to vertical lines U′, O′, and C′ with real parts 1/2, 1, and 1/(2ν),
respectively. Defining the function f(u) = Q(A^{−1}(u)), we have that f is bounded on the strip defined
by U′ and C′, and we have that sup_{u∈U′} |f(u)| ≤ 2, sup_{u∈O′} |f(u)| ≥ 2ε. Writing M for the maximum
modulus of f on C′, the Hadamard Three-Lines Theorem implies that

                                        2^{(1−2ν)/(1−ν)} · M^{ν/(1−ν)} ≥ 2ε,

which completes the proof after rearrangement.
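
As a quick numerical sanity check of Theorem 4.1 (our own illustration, not part of the proof), one can take the near-extremal polynomial Q(u) = 2^{1−k}(1 − u)^k, which satisfies |Q(0)| = 2ε with ε = 2^{−k} and |Q(u)| ≤ 2 on the unit circle, and compare its maximum on ∂Dν(1 − ν) with the guaranteed lower bound 2ε^{(1−ν)/ν}:

    import numpy as np

    def max_on_inner_circle(k, nu, grid=2000):
        # Q(u) = 2^(1-k) (1-u)^k on the circle of radius nu centered at 1 - nu.
        u = (1 - nu) + nu * np.exp(1j * np.linspace(0, 2 * np.pi, grid))
        return np.max(np.abs(2.0 ** (1 - k) * (1 - u) ** k))

    k, nu = 12, 0.2
    eps = 2.0 ** (-k)                       # so |Q(0)| = 2*eps and |Q| <= 2 on |u| = 1
    print(max_on_inner_circle(k, nu))       # observed maximum on the inner circle
    print(2 * eps ** ((1 - nu) / nu))       # lower bound from Theorem 4.1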

4.2     An upper bound on η(ε, ν) for erasure noise
In this section, we will prove the following result.

Theorem 4.2. There is an absolute constant τ > 0 such that for every ν ≤ 1/10, 0 < ε < τ and
ln(1/ε)/ν^2 ≤ n, there exists a vector c ∈ ∆ with c0 > 2ε such that the polynomial Qc(u) = ∑_{i=0}^{n} c_i u^i
satisfies
                                   sup_{u∈∂D_{ν/16}(1−ν/16)} |Qc(u)| = ε^{Ω(1/ν)}.

      In order to prove this theorem, we will first collect a few facts. Given a, r > 0, define the set Ba,r as

                                 Ba,r = { (1 − 8a) + 4a(z + z^{−1}) : z ∈ ∂Dr(0) }.


We now make a few observations about the set Ba,r as r varies. In particular, we have the following fact.

Fact 4.3. For r ∈ {1, 2, 4}, the sets Ba,r are as follows:

      • For r = 1, the set Ba,r is the line segment joining 1 and 1 − 16a.

      • For r = 2, the set Ba,r is the ellipse centered at 1 − 8a with major axis [1 − 8a − 10a, 1 − 8a + 10a]
        and minor axis [1 − 8a − 6ai, 1 − 8a + 6ai].

      • For r = 4, the set Ba,r is the ellipse centered at 1 − 8a with major axis [1 − 8a − 17a, 1 − 8a + 17a]
        and minor axis [1 − 8a − 15ai, 1 − 8a + 15ai].

Proof. For z ∈ ∂Dr(0), we can express z = x + iy where x = r cos θ and y = r sin θ, where r ∈ R and
θ ∈ [0, 2π). Consequently, points on Ba,r can be parameterized as

                   Ba,r = { (1 − 8a) + 4a cos θ · (r + 1/r) + 4ai sin θ · (r − 1/r) : θ ∈ [0, 2π) }.

Let w = x1 + iy1 where x1, y1 ∈ R and w ∈ Ba,r. Then the tuple (x1, y1) satisfies

                    (x1 − (1 − 8a))^2 / ( 16a^2 (r + 1/r)^2 )  +  y1^2 / ( 16a^2 (r − 1/r)^2 )  =  1.

This implies that for each r, Ba,r describes an ellipse with center 1 − 8a. The major axis is given by
[1 − 8a − 4a(r + 1/r), 1 − 8a + 4a(r + 1/r)] and the minor axis is given by [1 − 8a − 4a(r − 1/r)i,
1 − 8a + 4a(r − 1/r)i]. Plugging in the values of r (for r ∈ {1, 2, 4}), we get the claim.

    Next, we make the following claim.

Claim 4.4. The circle D4a (1 − 4a) is contained in Ba,2 .

Proof. The ellipse Ba,2 is centered at 1 − 8a with the major and minor axes aligned with the real and
imaginary axes. Further, the length of the semi-major axis is 10a and the length of the semi-minor axis is
6a. Thus, any point z = x + iy is contained in this ellipse as long as

                              (x − (1 − 8a))^2 / (100a^2)  +  y^2 / (36a^2)  ≤  1.

The circle ∂D4a(1 − 4a) consists of points z = x + iy where x = 1 − 4a + 4a cos θ and y = 4a sin θ.
Observe that for any such point z = x + iy,

     (x − (1 − 8a))^2 / (100a^2) + y^2 / (36a^2) = (4a + 4a cos θ)^2 / (100a^2) + (4a sin θ)^2 / (36a^2) = 4(1 + cos θ)^2 / 25 + 4 sin^2 θ / 9.

The last quantity is easily bounded by 1, showing that ∂D4a(1 − 4a) is contained in Ba,2. This
immediately implies the same for D4a(1 − 4a).

    By Hadamard’s three-circle theorem, any holomorphic function f satisfies

         sup_{u∈D4a(1−4a)} |f(u)| ≤ sup_{u∈Ba,2} |f(u)| ≤ √( sup_{u∈Ba,1} |f(u)| ) · √( sup_{u∈Ba,4} |f(u)| ).            (4.1)


Consequently, we have the following corollary.

Corollary 4.5. Let c ∈ ∆ and Qc(u) = ∑_{i=0}^{n} c_i u^i. Then,

                   sup_{u∈D4a(1−4a)} |Qc(u)| ≤ √( sup_{u∈Ba,1} |Qc(u)| ) · √( 2 exp(9an) ).

Proof. We apply equation (4.1) to the function Qc and then observe that

            sup_{u∈Ba,4} |Qc(u)| ≤ sup_{u∈Ba,4} |u|^n · ( ∑_{j=0}^{n} |c_j| ) ≤ 2 · (1 + 9a)^n ≤ 2 · exp(9an),

which concludes the proof.


    We next recall the following result from [8].

Theorem 4.6 ([8, Lemma 3.3]). For any L ∈ [0, 1/17) and M ∈ N, there is a real-coefficient polynomial
p(u) = ∑_{j=0}^{M} a_j u^j with |a0| ≥ L · (∑_{j=1}^{M} |a_j|) such that p has at least

                                   T_{L,M} = min( (2/7) · √( M · (− ln L) ),  M )

repeated roots at 1.

    We will also use the following result from [4].

Claim 4.7 (Lemma 5.4 of [4]). Let p : C → C be defined as p(u) = ∑_{j=0}^{M} a_j u^j where |a_j| ≤ 1 for all
0 ≤ j ≤ M. Further, let p have k repeated roots at 1. Let A be the interval [1 − k/(9M), 1]. Then

                                        sup_{u∈A} |p(u)| ≤ (M + 1) · (e/9)^k.
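
As a small numerical illustration of Claim 4.7 (ours; we take the simple choice p(u) = ((1 − u)/2)^k, whose coefficients have absolute value at most 1 and which has k repeated roots at 1, so that M = k):

    import numpy as np

    k = 10
    A_interval = np.linspace(1 - k / (9 * k), 1, 1000)      # the interval [1 - k/(9M), 1]
    sup_on_A = np.max(np.abs(((1 - A_interval) / 2) ** k))
    bound = (k + 1) * (np.e / 9) ** k                        # the bound from Claim 4.7
    print(sup_on_A, bound, sup_on_A <= bound)                # prints ... True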

    With these two results in hand, we are now ready to prove Theorem 4.2.
Proof of Theorem 4.2: Let us set M = ⌊ln(1/ε)/ν^2⌋ and let p(u) = ∑_{j=0}^{M} c_j u^j be the polynomial from
Theorem 4.6 with L = 2ε. Let us also scale the coefficients such that |c0| = 2ε and thus ∑_{j=0}^{M} |c_j| ≤ 2.
As ln(1/ε)/ν^2 ≤ n, M ≤ n and thus our construction is well-defined. The polynomial p has at least T
roots at 1, where
                        T = min( (2/7) · ln(1/ε)/ν ,  ln(1/ε)/ν^2 ) = (2/7) · ln(1/ε)/ν.

Let us define θ = T/(9M) = (2/63) · ν. By applying Claim 4.7, it follows that

                              sup_{u∈[1−θ,1]} |p(u)| ≤ (M + 1) · (e/9)^T ≤ (1/3)^T.

Here the last inequality uses the relation between T and M and ε ≤ τ. Finally, set a = ν/63. Then,
applying Corollary 4.5, we obtain

          sup_{u∈D4a(1−4a)} |p(u)| ≤ √( sup_{u∈Ba,1} |p(u)| ) · √( 2 exp(9aM) ) ≤ √( (1/3)^T · 4 · exp(9aM) ).

Plugging in a = ν/63, M = ⌊ln(1/ε)/ν^2⌋ and T = (2Mν)/7, we obtain that

                                      sup_{u∈D4a(1−4a)} |p(u)| ≤ ε^{Ω(1/ν)},


which concludes the proof.


5     Circle bounds for bit-flip noise
5.1   A lower bound on η(ε, ν) for bit-flip noise
In this section we prove the following theorem.
Theorem 5.1. For 0 < ν, ε < 1 and n ∈ N which satisfy
                                        2 ln(2/ε)/n ≤ ν ≤ 1 − 2 ln(2/ε)/n,
we have
                          η(ε, ν) ≥ ε · exp( − O( ln^{2/3}(1/ε) · (n(1 − ν^2))^{1/3} / ν^{2/3} ) ).
Proof. Fix any vector [c0 c1 . . . cn] ∈ ∆ with |c0| > 2ε. Recalling Theorem 3.3 and Corollary 3.4, to prove
Theorem 5.1 it suffices to show that the function Fc(z) as defined in equation (3.4) satisfies

                 max_{|z|=1} |Fc(z)| ≥ ε · exp( − O( ln^{2/3}(1/ε) · (n(1 − ν^2))^{1/3} / ν^{2/3} ) ).                (5.1)

To prove this, we recall equation (3.6), which states that

                max_{|z|=1} |Fc(z)| = max_{−π<θ≤π} ( 1 / ( 1 + ((1−ν^2)/ν^2)·sin^2(θ/2) ) )^{n/2} · |Qc(e^{iθ})|.

Next, we observe that for −π < θ ≤ π, we have

                1 − (1 − ν^2) sin^2(θ/2) ∈ [ 1 − (1 − ν^2)θ^2/4 , 1 − (1 − ν^2)θ^2/16 ],

where the last inclusion uses θ^2/16 ≤ sin^2(θ/2) ≤ θ^2/4, which holds for θ ∈ [−π, π]. Using the
elementary fact e^{−x} ≤ 1/(1 + x) for all x ≥ 0, it follows that

                1 / ( 1 + ((1−ν^2)/ν^2)·sin^2(θ/2) ) ≥ exp( − ((1 − ν^2)/(4ν^2)) · θ^2 )

and thus, we have

                max_{|z|=1} |Fc(z)| ≥ max_{−π<θ≤π} exp( − ((1 − ν^2)/(8ν^2)) · θ^2 · n ) · |Qc(e^{iθ})|.              (5.2)

Next, set θ∗ as
                          θ∗ = (1/10) · ν^{2/3} · ln^{1/3}(1/ε) / (n(1 − ν^2))^{1/3}.

(It is easy to see that the constraints on ν imply that θ∗ ≤ 1.) Let A∗ = [−θ∗, θ∗]. Then, plugging in
the value of θ∗ in equation (5.2), we get

        max_{|z|=1} |Fc(z)| ≥ exp( − O( ln^{2/3}(1/ε) · (n(1 − ν^2))^{1/3} / ν^{2/3} ) ) · max_{θ∈A∗} |Qc(e^{iθ})|.        (5.3)

To give a lower bound on max_{θ∈A∗} |Qc(e^{iθ})|, we recall Corollary 3.2 from [3].


Theorem 5.2 (Corollary 3.2 of [3]). There is a universal constant c > 0 such that the following holds.
Let Q(u) be a univariate polynomial with complex coefficients, Q(u) = ∑_{j=0}^{n} b_j u^j with |b0| = 1 and all
coefficients |b_j| ≤ M. Let A be a subarc of the unit circle with length a, where 0 < a < 2π. Then there is
some w ∈ A such that
                                        |Q(w)| ≥ exp( −c(1 + ln M)/a ).
    We now apply this theorem to the polynomial Qc/c0 by setting “M” to 1/c0, “a” to θ∗ and “A” to A∗.
This yields
                     max_{θ∈A∗} |Qc(e^{iθ})| / c0 ≥ exp( −Θ(1) · (1 + ln(1/c0)) / θ∗ ).

Using that 1 ≥ c0 ≥ 2ε, we get that

                max_{θ∈A∗} |Qc(e^{iθ})| ≥ ε · exp( − O( ln^{2/3}(1/ε) · (n(1 − ν^2))^{1/3} / ν^{2/3} ) ).

Combining with equation (5.3) completes the proof.

5.2   An upper bound on η(ε, ν) for bit-flip noise
In this section we prove the following result.

Theorem 5.3. There is a universal constant c > 0 such that for ν, 0 < ε < c and n ∈ N which satisfy

                               ( 2 ln(2/ε)/n )^{1/4} ≤ ν ≤ 1 − 2 ln(2/ε)/n,

we have
                   η(ε, ν) = exp( − Ω( ln^{2/3}(1/ε) · (n(1 − ν^2))^{1/3} / ν^{2/3} ) ).

    Recalling equation (3.6), to prove this result we must demonstrate the existence of a vector [c0 c1 . . . cn]
in ∆ with |c0| > 2ε such that Fc(z) satisfies

              sup_{|z|=1} |Fc(z)| = exp( − Ω( ln^{2/3}(1/ε) · (n(1 − ν^2))^{1/3} / ν^{2/3} ) ),                (5.4)

where we recall from equation (3.2) that

              Fc(z) = ∑_{i=0}^{n} c_i ( (1−ν)/2 + ((1+ν)/2)·z )^i ( (1+ν)/2 + ((1−ν)/2)·z )^{n−i}.

To prove (5.4), we will use Theorem 4.6 and the following lemma, which relates the multiplicity of roots
of a polynomial at 1 with the supremum of p on an arc centered at 1.


Lemma 5.4 (Lemma 4.7 in [3]). Suppose p : C → C is a polynomial of the form p(u) = ∑_{j=0}^{M} a_j u^j, where
|a_j| ≤ 9 and p has k repeated roots at 1. If A denotes the arc of the unit circle that is symmetric around 1
and has length (2k)/(9M), then
                                   sup_{u∈A} |p(u)| ≤ 9(M + 1) · (e/9)^k.

Proof of Theorem 5.3. With these results in hand we are ready to specify our construction of [c0, . . . , cn].
For this, we set M as follows:

                        M = ⌊ n^{2/3} · ln^{1/3}(1/ε) · (1 − ν^2)^{2/3} · ν^{−4/3} ⌋.

We first make the following observations about M. (i) Since ν^4 ≥ ln(1/ε)/n, it is the case that M ≤ n.
(ii) Since 1 − ν ≥ 2 ln(2/ε)/n, it is moreover the case that M ≥ ln(1/ε).
     For M as defined above, let us rescale the polynomial in Theorem 4.6 so that |a0| = 2ε and thus
∑_{j=1}^{M} |a_j| ≤ 1. We now set c_j = a_j for all 0 ≤ j ≤ M and c_j = 0 otherwise. Note that since M ≤ n, this is
well-defined.
     By construction, the polynomial p(u) defined as p(u) = ∑_{j=0}^{M} c_j u^j has at least T repeated roots at 1,
where
                  T = min( (2/7) · √( M · ln(1/(2ε)) ),  M ) = (2/7) · √( M · ln(1/(2ε)) ),

where the last equality uses 1 − ν ≥ 2 ln(2/ε)/n. We note for later reference that

                        T = Ω( n^{1/3} · ln^{2/3}(1/ε) · (1 − ν^2)^{1/3} · ν^{−2/3} ).                        (5.5)

Let us define θ∗ as
            θ∗ = 2T/(9M) = (4/63) · √( ln(1/(2ε)) / M ) ≤ (4/63) · ln^{1/3}(1/ε) · ν^{2/3} / ( n^{1/3} · (1 − ν^2)^{1/3} ).        (5.6)

Observe that since 1 − ν ≥ 2 ln(1/ε)/n, it holds that θ∗ ≤ 4/63. Let A be the arc of the unit circle
A = {e^{iθ} | − θ∗ ≤ θ ≤ θ∗}. Applying Lemma 5.4 (and observing that all degree M + 1 and higher
coefficients of p are zero), we obtain that

                          sup_{u∈A} |p(u)| ≤ 9 · (M + 1) · (e/9)^T ≤ (1/3)^T.                                (5.7)

Here the last inequality uses
                                   T = (2/7) · √( M · ln(1/(2ε)) )
and the fact that ε is at most some sufficiently small constant.
    Now we turn our attention to Fc(z). Recalling equation (3.4), we have that

              sup_{|z|=1} |Fc(z)| = sup_{|w|=1} | ν / ( (1+ν)/2 − ((1−ν)/2)·w ) |^n · | ∑_{i=0}^{n} c_i w^i |.        (5.8)



Let us write Φc(w) to denote

                        ( ν / ( (1+ν)/2 − ((1−ν)/2)·w ) )^n · ∑_{i=0}^{n} c_i w^i,

so we seek to upper bound sup_{|w|=1} |Φc(w)|. We do this by upper bounding |Φc(w)| separately on the
arc A and on its complement Ā in the unit circle.
    First, we bound |Φc(w)| on the arc A as follows:

     sup_{w∈A} |Φc(w)| ≤ sup_{w∈A} |p(w)| ≤ e^{−Ω(T)} = exp( − Ω( ln^{2/3}(1/ε) · (n(1 − ν^2))^{1/3} / ν^{2/3} ) ).        (5.9)

Here the first inequality uses the fact that
                                   | ν / ( (1+ν)/2 − ((1−ν)/2)·w ) | ≤ 1,

the second inequality uses equation (5.7), and the last equality uses equation (5.5).
    To bound |Φc(w)| on Ā, we will need a couple of facts. First, since ∑_{j=0}^{n} |c_j| ≤ 2, it is the case that
|p(w)| ≤ 2 for all |w| = 1, and consequently

                             |Φc(w)| ≤ 2 · | ν / ( (1+ν)/2 − ((1−ν)/2)·w ) |^n.

Recalling equation (3.5), we have

        sup_{w∈Ā} |Φ_c(w)| ≤ 2 · ( 1 / (1 + ((1−ν²)/ν²)·sin²(θ*/2)) )^{n/2} ≤ 2 · ( 1 / (1 + (1−ν²)(θ*)²/(8ν²)) )^{n/2} ,


where the last inequality uses sin²(θ*/2) ≥ (θ*)²/8, which holds since θ* ≤ 4/63. Finally, again using
ν⁴ ≥ ln(1/ε)/n and recalling equation (5.6), we have

                                    (1−ν²)(θ*)² / (8ν²) ≤ 4/63
(with room to spare). Thus, we have that

        sup_{w∈Ā} |Φ_c(w)| ≤ 2 · ( 1 / (1 + (1−ν²)(θ*)²/(8ν²)) )^{n/2} ≤ exp( −Ω( (1−ν²)(θ*)² n / ν² ) )
                           ≤ exp( −Ω( ln^{2/3}(1/ε) · (n(1−ν²))^{1/3} / ν^{2/3} ) ),
where for the last inequality we used

                        θ* = Θ(1) · ln^{1/3}(1/ε) · ν^{2/3} / ( n^{1/3} · (1−ν²)^{1/3} ),
which follows from equation (5.6). Combining with equation (5.9) completes the proof.
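
    As a quick numerical sanity check on the modulus computation used above (the following sketch is
ours and not part of the proof; all variable and function names are our own), one can verify that for
w = e^{iθ} on the unit circle the factor appearing in Φ_c satisfies |ν/((1+ν)/2 − ((1−ν)/2)w)|² =
1/(1 + ((1−ν²)/ν²)·sin²(θ/2)); in particular it is at most 1, as used in equation (5.9), and it yields the
first bound in the display following equation (3.5).

    # Sanity check (not part of the proof) of the modulus identity behind the bounds above:
    # for w = e^{i*theta} on the unit circle,
    #   |nu / ((1+nu)/2 - ((1-nu)/2)*w)|^2  ==  1 / (1 + ((1-nu^2)/nu^2) * sin^2(theta/2)),
    # which in particular is at most 1.  All names here are ours.
    import cmath, math, random

    def factor_modulus_sq(nu, theta):
        w = cmath.exp(1j * theta)
        return abs(nu / ((1 + nu) / 2 - (1 - nu) / 2 * w)) ** 2

    def closed_form(nu, theta):
        return 1.0 / (1.0 + (1 - nu ** 2) / nu ** 2 * math.sin(theta / 2) ** 2)

    random.seed(0)
    for _ in range(1000):
        nu = random.uniform(0.05, 0.95)
        theta = random.uniform(-math.pi, math.pi)
        lhs, rhs = factor_modulus_sq(nu, theta), closed_form(nu, theta)
        assert abs(lhs - rhs) < 1e-9 and lhs <= 1 + 1e-12
    print("modulus identity verified on 1000 random (nu, theta) pairs")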


6    Reduction between the restricted and the general lower bound

    In this section we give a reduction from item 2 to item 3 of Theorem 1.2. We first set up some notation.
For a distribution D supported on {0,1}^n, let D_e denote the distribution obtained from Erase_{1−ν}(x)
where x ∼ D. Let D_{1,e} denote the distribution of the number of ones in D_e and D_{0,1,e} denote the joint
distribution of the number of zeros and ones in D_e. Note that D_{1,e} is supported on N and D_{0,1,e} is
supported on N².
    We will now show how item 2 of Theorem 1.2 implies item 3. Observe that item 2 is equivalent to the
existence of a pair of distributions X and Y supported on {0,1}^{n_0} (for n_0 ∈ N) such that ‖X − Y‖_∞ ≥ ε
and ‖X_{1,e} − Y_{1,e}‖_1 ≤ ε^{Ω(1/ν)}. In fact, it suffices to choose n_0 = Θ(ν^{−2} · ln(1/ε)).
    Note that by symmetrization, without loss of generality, we can assume that X and Y are symmetric
distributions. As a consequence, for any pair z, z′ ∈ {0,1,?}^{n_0} which have the same number of zeros and
ones (and hence the same number of ‘?’), X_e(z) = X_e(z′) and Y_e(z) = Y_e(z′). Item 3 would thus follow if
we can show ‖X_{0,1,e} − Y_{0,1,e}‖_1 ≤ ε^{Ω(1/ν)}. While this is not necessarily true, we will modify X and Y to
achieve this property.
    Choose a number m_0 (we will fix this later) and let us define X̃ by sampling x from X and padding
with m_0 zeros. The distribution Ỹ is defined likewise. Observe that

                                        ‖X̃ − Ỹ‖_∞ = ‖X − Y‖_∞ ≥ ε.

For any ℓ ∈ N and q ∈ [0, 1], let Bin(ℓ, q) denote the binomial distribution with ℓ trials where each trial
succeeds with probability q. We now claim that

                        ‖X̃_{0,1,e} − X_{1,e} × Bin(m_0, ν)‖_1 ≤ n_0 / √( m_0 · min{ν, 1−ν} ).        (6.1)
To prove this, we need the following basic fact about binomial distributions.
Fact 6.1. Let X ∼ Bin(m_0, ν) and Y ∼ Bin(m_0, ν) + 1. Then

                                ‖X − Y‖_1 ≤ 1 / √( m_0 · min{ν, 1−ν} ).
     We note that for random variables Z_1 and Z_2, ‖Z_1 − Z_2‖_1 denotes the ℓ_1 distance between the
corresponding distributions. To prove equation (6.1), let X_{0,z,e} = X_{0,1,e} | (X_{1,e} = z). In other words, X_{0,z,e}
is the conditional distribution of the number of zeros in X_{0,1,e} conditioned on X_{1,e} = z. Note that the
number of ones in X̃_{0,1,e} is distributed as X_{1,e}. We now define X̃_{0,z,e} = X̃_{0,1,e} | (X_{1,e} = z). Observe that
X̃_{0,z,e} = X_{0,z,e} + Bin(m_0, ν). Moreover, X_{0,z,e} is supported on {0, 1, . . . , n_0}. Applying Fact 6.1 (together
with the triangle inequality over at most n_0 unit shifts), we get

                             ‖X̃_{0,z,e} − Bin(m_0, ν)‖_1 ≤ n_0 / √( m_0 · min{ν, 1−ν} ).
Since this holds for all z, and both distributions in equation (6.1) have the same marginal (namely X_{1,e})
on the number of ones, averaging over z gives equation (6.1). Analogously, we also get

                        ‖Ỹ_{0,1,e} − Y_{1,e} × Bin(m_0, ν)‖_1 ≤ n_0 / √( m_0 · min{ν, 1−ν} ).        (6.2)
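
    To get a concrete feel for Fact 6.1 and the shift argument above, the following small numerical
illustration (ours, not part of the paper; the parameter values are arbitrary) computes the ℓ_1 distance
between Bin(m_0, ν) and its shift by k and compares it with the bound n_0/√(m_0 · min{ν, 1−ν}); the
distance decays like 1/√(m_0), which is why padding with many zeros washes out the n_0-dependent
shift in the number of zeros.

    # Numerical illustration (ours) of Fact 6.1 and the shift argument behind equation (6.1):
    # the l1 distance between Bin(m0, nu) and Bin(m0, nu) + k is O(k / sqrt(m0 * nu * (1 - nu))),
    # so a large zero-padding m0 washes out a shift of size at most n0 in the zero-count.
    from math import lgamma, log, exp, sqrt

    def binom_pmf(m, q):
        # pmf of Bin(m, q), computed in log-space for numerical stability
        logs = [lgamma(m + 1) - lgamma(j + 1) - lgamma(m - j + 1)
                + j * log(q) + (m - j) * log(1 - q) for j in range(m + 1)]
        return [exp(v) for v in logs]

    def l1_shift(m, q, k):
        # l1 distance between the distributions of Bin(m, q) and Bin(m, q) + k
        p = binom_pmf(m, q)
        diff = [0.0] * (m + k + 1)
        for j, pj in enumerate(p):
            diff[j] += pj      # mass placed by Bin(m, q)
            diff[j + k] -= pj  # mass placed by Bin(m, q) + k
        return sum(abs(d) for d in diff)

    nu, n0 = 0.3, 20
    for m0 in (1_000, 10_000, 100_000):
        print(f"m0 = {m0:>7}: l1(Bin, Bin+1) = {l1_shift(m0, nu, 1):.5f}, "
              f"l1(Bin, Bin+{n0}) = {l1_shift(m0, nu, n0):.5f}, "
              f"n0/sqrt(m0*min(nu,1-nu)) = {n0 / sqrt(m0 * min(nu, 1 - nu)):.5f}")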


By construction, we have that ‖X_{1,e} − Y_{1,e}‖_1 ≤ ε^{Ω(1/ν)}. Combining with equation (6.1) and equation (6.2),
we have

                        ‖Ỹ_{0,1,e} − X̃_{0,1,e}‖_1 ≤ ε^{Ω(1/ν)} + 2n_0 / √( m_0 · min{ν, 1−ν} ).

To ensure that the right hand side is bounded by ε^{Ω(1/ν)}, it suffices to choose

                                m_0 = ε^{−Ω(1/ν)} · n_0² · 1/min{ν, 1−ν}.

Plugging in the value of n_0 = Θ(ν^{−2} · ln(1/ε)) (the resulting n_0² factor can be absorbed into the
ε^{−Ω(1/ν)} term), it suffices to choose

                                m_0 = (1/min{ν, 1−ν}) · ε^{−Ω(1/ν)}.

Thus, the distributions X̃ and Ỹ are supported on {0,1}^n where

                                n = (1/min{ν, 1−ν}) · ε^{−Ω(1/ν)}.

This almost proves item 3, except for the factor of 1/min{ν, 1−ν} in the lower bound for n.
    To remove this factor, note that even if ν = 1, there is a trivial lower bound of ε^{−2} for any estimation
algorithm for NPR (this holds as long as n ≥ log(1/ε)). Further, since access to samples with erasure
noise ν′ can be simulated given access to samples with noise rate ν (provided ν′ ≤ ν), this lower bound
holds for any noise rate ν ≥ 0. Combining this observation with the earlier lower bound proves item 3.
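
    The simulation used in the last step is elementary; here is a minimal sketch (ours; the function name
and the string encoding of samples are our own), under the convention that a sample with noise rate ν
retains each coordinate independently with probability ν and replaces it with ‘?’ otherwise. To degrade
the retention rate from ν to any ν′ ≤ ν, independently erase each surviving coordinate with probability
1 − ν′/ν, so that overall each coordinate survives with probability exactly ν′.

    # Minimal sketch (ours) of simulating erasure noise with retention rate nu_prime <= nu
    # from samples that were generated with retention rate nu: each coordinate that survived
    # the first round of erasures is kept independently with probability nu_prime / nu.
    import random

    def degrade(sample, nu, nu_prime):
        assert 0 < nu_prime <= nu <= 1
        keep = nu_prime / nu
        return ['?' if (b != '?' and random.random() > keep) else b for b in sample]

    # Example: a sample observed under retention rate 0.8, degraded to retention rate 0.5.
    noisy = ['1', '?', '0', '1', '?', '0']
    print(degrade(noisy, nu=0.8, nu_prime=0.5))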

Acknowledgments
A. D. would like to thank Mike Saks for suggesting the noisy population recovery problem for unrestricted
support and for many illuminating conversations about this problem. The authors would like to thank
Tamás Erdélyi for several helpful email exchanges about [8].
    The first version of this paper stated Item 2 of Theorem 1.2 as an unconditional lower bound for all
algorithms rather than just those which use the total number of 1s in the received strings. We gratefully
acknowledge Yury Polyanskiy and Yihong Wu [15] for informing us of this error.


References
 [1] LUCIA BATMAN, RUSSELL IMPAGLIAZZO, CODY MURRAY, AND RAMAMOHAN PATURI: Finding
     heavy hitters from lossy or noisy data. In Proc. 17th Internat. Workshop on Randomization and
     Computation (RANDOM’13), pp. 347–362. Springer, 2013. [doi:10.1007/978-3-642-40328-6_25]
     4, 5

 [2] RICHARD BELLMAN AND THEODORE HARRIS: Recurrence times for the Ehrenfest model. Pacific
     J. Math., 1(2):179–193, 1951. [doi:10.2140/pjm.1951.1.179] 6


 [3] PETER BORWEIN AND TAMÁS ERDÉLYI: Littlewood-type problems on subarcs of the unit circle.
     Indiana Univ. Math. J., 46(4):1323–1346, 1997. [doi:10.1512/iumj.1997.46.1435] 13, 14, 15

 [4] PETER BORWEIN, TAMÁS ERDÉLYI, AND GÉZA KÓS: Littlewood-type problems on [0, 1]. Proc.
     London Math. Soc., 79(1):22–46, 1999. [doi:10.1112/S0024611599011831] 4, 12

 [5] ANINDYA DE, RYAN O’DONNELL, AND ROCCO A. SERVEDIO: Optimal mean-based algorithms
     for trace reconstruction. Ann. Appl. Probab., 29(2):851–874, 2019. Preliminary version in STOC’17.
     [doi:10.1214/18-AAP1394] 3, 4, 8

 [6] ANINDYA DE, MICHAEL E. SAKS, AND SIJIAN TANG: Noisy population recovery in polynomial
     time. In Proc. 57th FOCS, pp. 675–684. IEEE Comp. Soc. Press, 2016. [doi:10.1109/FOCS.2016.77,
     arXiv:1602.07616] 3

 [7] ZEEV DVIR, ANUP RAO, AVI WIGDERSON, AND AMIR YEHUDAYOFF: Restriction access. In
     Innovations in Theoret. Computer Sci. (ITCS’12), pp. 19–33. ACM Press, 2012.
     [doi:10.1145/2090236.2090239] 5

 [8] TAMÁS ERDÉLYI: Coppersmith–Rivlin type inequalities and the order of vanishing of polynomials
     at 1. Acta Arithmetica, 172:271–284, 2016. [doi:10.4064/aa8129-11-2015, arXiv:1406.2560] 4, 12,
     18

 [9] ODED GOLDREICH AND LEONID A. LEVIN: A hard-core predicate for all one-way functions. In
     Proc. 21st STOC, pp. 25–32. ACM Press, 1989. [doi:10.1145/73007.73010] 5

[10] SHACHAR LOVETT AND JIAPENG ZHANG: Improved noisy population recovery, and reverse
     Bonami–Beckner inequality for sparse functions. In Proc. 47th STOC, pp. 137–142. ACM Press,
     2015. [doi:10.1145/2746539.2746540] 3

[11] ANKUR MOITRA AND MICHAEL SAKS: A polynomial time algorithm for lossy population recovery.
     In Proc. 54th FOCS, pp. 110–116. IEEE Comp. Soc. Press, 2013. [doi:10.1109/FOCS.2013.20] 3, 5

[12] FEDOR NAZAROV AND YUVAL PERES: Trace reconstruction with exp(O(N^{1/3})) samples. In Proc.
     49th STOC, pp. 1042–1046. ACM Press, 2017. [doi:10.1145/3055399.3055494, arXiv:1612.03599]
     3, 4

[13] RYAN O’DONNELL: Analysis of Boolean Functions. Cambridge Univ. Press, 2014.
     [doi:10.1017/CBO9781139814782] 2

[14] YURY POLYANSKIY, ANANDA THEERTHA SURESH, AND YIHONG WU: Sample complexity of
     population recovery. In Proc. 30th Ann. Conf. on Learning Theory (COLT’17), pp. 1589–1618,
     2017. Link at MLR Press. [arXiv:1702.05574] 3, 4

[15] YURY POLYANSKIY AND YIHONG WU: Personal communication, March 6, 2017. 18

[16] H. A. F. SIEGERT: Note on the Ehrenfest problem. Technical Report LADC-438, Technical
     Information Division, Oak Ridge Operations, Oak Ridge, Tennessee, 1947. 6


[17] AVI WIGDERSON AND AMIR YEHUDAYOFF: Population recovery and partial identification. Machine
     Learning, 102:29–56, 2016. Preliminary version in FOCS’12. [doi:10.1007/s10994-015-5489-9] 3, 7

AUTHORS
     Anindya De
     Assistant Professor
     University of Pennsylvania
     anindyad seas upenn edu
     http://www.seas.upenn.edu/~anindya

     Ryan O’Donnell
     Professor
     Carnegie Mellon University
     odonnell cs cmu edu
     http://www.cs.cmu.edu/~odonnell/

     Rocco A. Servedio
     Professor
     Columbia University
     rocco cs columbia edu
     https://cs.columbia.edu/~rocco/

ABOUT THE AUTHORS
     ANINDYA DE is an Assistant Professor of Computer Science at the University of Pennsylva-
        nia. He received his Ph. D. from U.C. Berkeley in 2013, advised by Luca Trevisan. He
        was a postdoc at the Institute for Advanced Study and DIMACS from 2013 to 2015 and an
        Assistant Professor at Northwestern University from 2015 to 2018. His research interests
        are in complexity theory, learning theory, discrete analysis and applied probability.

     RYAN O’DONNELL received a B.Sc. from the University of Toronto in 1999 and a Ph.D.
       from the MIT Mathematics Department in 2003. His Ph.D. advisor was Madhu Sudan.
       Following this he was a postdoc at IAS for a year in Avi Wigderson’s group, and a
       postdoc at Microsoft Research for two years in Jennifer Chayes’s group. Since 2006 he
       has been an assistant professor in the Computer Science Department at Carnegie Mellon
       University. Ryan’s research interests include Analysis of Boolean Functions, Constraint
       Satisfaction Problems, Quantum Complexity, and Probability. He enjoys his spare time.

     ROCCO SERVEDIO is a professor in the Department of Computer Science at Columbia
       University. He graduated from Harvard University, where his Ph. D. was supervised by
       Les Valiant. He is interested in computational complexity theory, computational learning
       theory, randomness in computing, property testing, and other topics.

