                           THEORY OF COMPUTING, Volume 16 (7), 2020, pp. 1–50
                                        www.theoryofcomputing.org




 More on Bounded Independence Plus Noise:
       Pseudorandom Generators for
          Read-Once Polynomials
                                 Chin Ho Lee∗                       Emanuele Viola†
                Received March 30, 2019; Revised December 26, 2019; Published October 9, 2020




       Abstract. We construct pseudorandom generators with improved seed length for several
       classes of tests. First, we consider the class of read-once polynomials over GF(2) in m
       variables. For error ε we obtain seed length Õ(log(m/ε) log(1/ε)). This is optimal up to a
       factor of log(1/ε) · poly log log(m/ε). The previous best seed length was polylogarithmic in
       m and 1/ε.
           Second, we consider product tests f : {0, 1}^m → C≤1. These tests are the product of k
       functions fi : {0, 1}^ℓ → C≤1, where the inputs of the fi are disjoint subsets of the m variables
       and C≤1 is the complex unit disk. Here we obtain seed length ℓ · polylog(m/ε). This implies
       better generators for other classes of tests. If moreover the fi have output range {−1, 0, 1}
       then we obtain seed length Õ((log(k/ε) + ℓ)(log(1/ε) + log log m)). This is again optimal
       up to a factor of log(1/ε) · polylog(ℓ, log k, log m, log(1/ε)), while the previous best seed
       length was ≥ √k.
           A main component of our proofs is showing that these classes of tests are fooled by
       almost d-wise independent distributions perturbed with noise.

ACM Classification: G.3
AMS Classification: 68Q87, 68W20
Key words and phrases: pseudorandom generators, bounded independence plus noise, branching
programs, read-once polynomials
   ∗ Supported by NSF CCF award 1813930, the Croucher Foundation and the Simons Collaboration on Algorithms and
Geometry. Most of this work was carried out while the author was at Northeastern University.
   † Supported by NSF CCF award 1813930.



© 2020 Chin Ho Lee and Emanuele Viola
Licensed under a Creative Commons Attribution License (CC-BY)                                  DOI: 10.4086/toc.2020.v016a007

1     Introduction
A pseudorandom generator (PRG) is a function that stretches a few input bits to a longer output so that, on
a uniformly distributed input, the output distribution of the generator fools, i. e., looks random to, a given
class of tests.

Definition 1.1. A distribution D over a finite set Ω fools a family F of functions f : Ω → C with error ε
if
                                  | E_{x∼D}[ f(x)] − E_{u∼U}[ f(u)] | ≤ ε

for every f ∈ F, where U is the uniform distribution over Ω.

   We often use the letter U to denote a random variable, uniformly distributed over the shared domain
Ω of a family F of functions.

Definition 1.2. A pseudorandom generator (PRG) for a family F of functions with domain Ω is a
function G : {0, 1}^s → Ω such that G(U) fools F, where U is a random variable distributed uniformly
over {0, 1}^s.

    The parameter s is called the seed length, and sometimes we call the family F a class of tests. The
construction of unconditional pseudorandom generators that fool restricted classes of tests is a fundamental
research direction, as the generators constructed often have disparate applications in theoretical computer
science. In this article we obtain new generators for several classes of tests. We start with the simplest
one.

1.1     History and results
1.1.1    Fooling read-once polynomials
Pseudorandom generators for polynomials have been studied since at least the 1993 work by Luby,
Veličković, and Wigderson [27], who gave a generator for GF(2) polynomials consisting of s monomials
with error ε and seed length 2^{O(√(log(s/ε)))}. See [38] for an alternative proof. Servedio and Tan [34]
recently improved the seed length to 2^{O(√(log s))} · log(1/ε), and any significant improvement on the
seed length would require a breakthrough on circuit lower bounds. For low-degree polynomials, better generators are
known [6, 25, 39]. In this work we consider read-once polynomials on m variables, which are a sum of
monomials on disjoint sets of variables. For this class, a generator with seed length polylogarithmic in
m and 1/ε is given in [14] and it applies more generally to read-once ACC0 . We obtain a seed length
which is optimal up to a factor of log(1/ε) · poly log log(m/ε). In particular, when ε is not too small, our
generator has seed length optimal up to poly log log m. Throughout this paper the notation Õ(t) stands for
t · polylog(t).

Theorem 1.3. There exists an explicit generator G : {0, 1}^s → {0, 1}^m that fools any read-once GF(2)
polynomial with error ε and seed length s = Õ(log(m/ε) log(1/ε)).

    A specific motivation for studying read-once polynomials comes from derandomizing space-bounded
algorithms, a major line of research in pseudorandomness whose leading goal is proving RL = L. It
is known that RL ⊆ DSPACE((log n)^{3/2}) [33]. However, despite a lot of effort, in terms of PRGs for
general space-bounded algorithms, there has been no improvement over the seed length ≥ log² m since
the classic 1992 paper by Nisan [31]. In fact, no improvement is known even under the restriction that the
algorithm uses constant space. Read-once polynomials can be implemented by constant-space algorithms,
and were specifically pointed out by several authors as a bottleneck for progress on space-bounded
algorithms, see for example this survey talk by Trevisan [36]. Thus our work can be seen as progress
towards derandomizing small-space algorithms. We note that the concurrent work of Chattopadhyay,
Hatami, Reingold and Tal [9] gives a generator for space-bounded algorithms which implies a generator
for polynomials with seed length Õ(log³ m log²(m/ε)).
    Theorem 1.3 also holds for polynomials modulo M for any fixed M; in fact we obtain it as an easy
corollary of a more general generator (Theorem 1.7).

1.1.2   Fooling products
We consider tests on m bits that can be written as the product of k bounded functions on disjoint inputs
of ℓ bits each. Such tests generalize the well-studied combinatorial rectangles [1, 31, 32, 22, 12, 3, 26,
40, 16, 19] as well as other classes of tests, see [15]. They were introduced in [15] by Gopalan, Kane,
and Meka, who call them Fourier shapes. However, in their definition the partition of the m-bit input into
the k blocks of ℓ-bit inputs to the functions is fixed and known to the generator. Following a recent push
for breaking the mold of “fixed-order” tests, we consider such tests under arbitrary order. We call them
product tests and define them formally next.
Definition 1.4 (Product tests). A function f : {0, 1}^m → C≤1 is an m-bit product test with k functions
of input length ℓ if there exist k disjoint subsets I1, I2, . . . , Ik ⊆ {1, 2, . . . , m} of size ≤ ℓ each such
that f(x) = ∏_{i≤k} fi(x_{Ii}) for some functions fi with range in C≤1. Here C≤1 is the complex unit disk
{z ∈ C : |z| ≤ 1}, and x_{Ii} is the string of |Ii| bits of x indexed by Ii.
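    As a concrete illustration (ours, not from the paper), the following minimal Python sketch evaluates a
product test given the blocks Ii and the functions fi; the parity example at the end is the product test that
reappears in Section 1.2.2.

        # Sketch of Definition 1.4: f(x) = prod_i fi(x_{Ii}) with disjoint blocks Ii.
        from typing import Callable, List, Sequence

        def product_test(blocks: List[Sequence[int]],
                         fs: List[Callable[[tuple], complex]]):
            """Return f(x) = prod_i fs[i](x restricted to blocks[i])."""
            def f(x: Sequence[int]) -> complex:
                val: complex = 1
                for I, fi in zip(blocks, fs):
                    val *= fi(tuple(x[j] for j in I))  # |fi| <= 1 keeps |f| <= 1
                return val
            return f

        # Parity of m bits: k = m functions of input length ell = 1, outputs in {-1, 1}.
        m = 8
        parity = product_test([[j] for j in range(m)], [lambda b: (-1) ** b[0]] * m)
        print(parity([1, 0, 1, 1, 0, 0, 0, 0]))  # three ones, so (-1)^3 = -1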
    Handling arbitrary order is significantly more challenging, because the classical space-bounded
generators [31, 22] only work in fixed order [37, 5]. Our previous work with Haramaty [20] gave the first
generators for this class, but its dependence on k is poor: the seed length is always ≥ √k. In this paper
we improve the dependence on k exponentially, though the results in [20] remain unsurpassed when k is
very small, e. g., k = O(1). We actually obtain two incomparable generators.
Theorem 1.5. There exists an explicit generator G : {0, 1}^s → {0, 1}^m that fools any m-bit product
test with k functions of input length ℓ with error ε and seed length s = Õ((ℓ + log k)(log k) log(1/ε) +
log log m).
    Using the probabilistic method, one can show that there exist (non-explicit) generators with seed
length s = O(ℓ + log k + log(1/ε) + log log m).
    By the reductions in [15], we also obtain generators that fool variants of product tests where the
outputs of the fi are not simply multiplied but combined in other ways. These variants include generalized
halfspaces [18] and combinatorial shapes [17, 10], extended to arbitrary order. For those we obtain
seed length Õ((ℓ + log k)² log(1/ε) log k), whereas the previous best was ≥ ℓ√k [20]. As this application
amounts to plugging the above theorem into previous reductions, we don't discuss it further in this paper
and instead refer the reader to Section 6 in [20].

    We then give another generator which has a seed length that is optimal up to a factor log(1/ε) ·
polylog(ℓ, log k, log m, log(1/ε)), just like Theorem 1.3. However, for this we need each function fi in the
definition of product tests to have expectation at most 1 − Ω(2^{−ℓ}). This condition is satisfied by Boolean
and most natural functions. For simplicity one can think of the functions fi as having outputs in {−1, 1}.

Definition 1.6 (Nice product tests). A product test as in Definition 1.4 is nice if each function fi has
expectation at most 1 − 2^{−ℓ}/4.

    Here, the constant 4 is chosen for ease of presentation and can be replaced by an arbitrary constant.

Theorem 1.7. There exists an explicit generator G : {0, 1}^s → {0, 1}^m that fools any nice m-bit product
test with k functions of input length ℓ with error ε and seed length Õ((log(k/ε) + ℓ)(log(1/ε) +
log log m)).

    This is the result from which the generator for polynomials in Theorem 1.3 follows easily.


1.1.3   Bounded independence plus noise

The framework in which we develop these generators was first laid out by Ajtai and Wigderson in their
pioneering paper [2] where they constructed generators for AC0 with polynomial seed length. The
framework seems to have been forgotten for a while, possibly due to the spectacular successes by Nisan
who gave better and arguably simpler generators [30, 31]. The interest in the Ajtai–Wigderson generators
has recently been revived in a series of papers starting with the influential paper [16] by Gopalan, Meka,
Reingold, Trevisan, and Vadhan who use it to obtain a generator for read-once CNF on m bits with
error ε and seed length Õ(log(m/ε)). This significantly improves the previously available seed length of
O(log m) log(1/ε) [8, 11] when ε is small.
    The Ajtai–Wigderson framework proceeds by showing that the test is fooled by a distribution with
bounded independence [29] (d-wise independence for some small value of d) if we perturb it with noise.
(Previous papers use the equivalent language of restrictions; we instead follow [20].) Then the high-level
idea is to recurse on the function restricted to the positions perturbed by the noise. This has to be coupled
with a separate, sometimes technical argument showing that each recursive step simplifies the test. We
shall explain this in Section 1.2. Thus our goal is to understand whether bounded independence plus noise
fools product tests.
    We say that X is an n-bit random variable if it is a random variable that takes its values in {0, 1}^n.
We say that U is a uniformly distributed n-bit random variable if U is uniformly distributed over {0, 1}^n.
    We perform bitwise operations on m-bit random variables. Specifically, for m-bit random variables
X = (X1, . . . , Xm) and Y = (Y1, . . . , Ym), we write X +Y = (X1 +Y1, . . . , Xm +Ym) (coordinatewise XOR)
and X ∧Y = (X1Y1, . . . , XmYm).
    Let D and T be m-bit random variables and U a uniformly distributed m-bit variable. We write the
perturbed version of D = (D1, . . . , Dm) as D + (T ∧U). So if T = (T1, . . . , Tm) and Ti = 1 then we replace
Di by a uniform random bit, and all these bits are independent. We refer to T ∧U as the noise vector. For
the application it is important that T be selected pseudorandomly, though the result is interesting even if
T is uniform in {0, 1}^m. (The sketch below illustrates this perturbation.) We now state the result from [20]
after defining bounded near-independence.

Definition 1.8 (Total variation distance of m-bit random variables). Let X = (X1, . . . , Xm) and Y =
(Y1, . . . , Ym) be m-bit random variables. The total variation distance of X and Y is defined as

                          δ_TV(X, Y) = max_{S⊆{0,1}^m} | Pr(X ∈ S) − Pr(Y ∈ S)|.

Definition 1.9 ((δ, d)-closeness). The m-bit random variable X = (X1, . . . , Xm) is (δ, d)-close to the m-bit
random variable Y = (Y1, . . . , Ym) if for every i1, . . . , id ∈ {1, 2, . . . , m} the d-bit random variables
(X_{i1}, . . . , X_{id}) and (Y_{i1}, . . . , Y_{id}) have total variation distance at most δ.

    Note that the variables Xi are d-wise independent exactly if (X1 , . . . , Xm ) is (0, d)-close to the uniform
distribution.
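    A standard example separating (0, 2)-closeness from uniformity (ours, not from the paper): for uniform
bits a and b, the triple (a, b, a + b) is (0, 2)-close to uniform, yet it has total variation distance 1/2 from
the uniform distribution over {0, 1}³. A brute-force check in Python:

        from itertools import product
        from collections import Counter

        support = Counter((a, b, a ^ b) for a, b in product([0, 1], repeat=2))
        for i, j in [(0, 1), (0, 2), (1, 2)]:
            pairs = Counter((x[i], x[j]) for x in support.elements())
            assert all(c == 1 for c in pairs.values())  # every pair of bits is uniform
        print(support)  # only 4 of the 8 strings occur, each with probability 1/4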

Theorem 1.10 ([20]). Let f : {0, 1}^m → C≤1 be an m-bit product test with k functions of input length ℓ.
Let D, T and U be three independent m-bit random variables, where D and T are (0, dℓ)-close to uniform,
and U is uniform. Then

                          | E[ f(D + T ∧U)] − E[ f(U)] | ≤ k · 2^{−Ω(d²ℓ/k)}.

    Note that the dependence on k, the number of functions, is poor: when k = Ω(d²ℓ), the error bound
does not give anything non-trivial. One of our main technical contributions is obtaining an exponentially
better dependence on k using techniques different from those of [20]. Our theorem gives a non-trivial
error bound even when d = O(1) and k is exponential in ℓ.

Theorem 1.11. Let f : {0, 1}^m → C≤1 be an m-bit product test with k functions of input length ℓ. Let
D, T and U be three independent m-bit random variables, where D and T are (δ, dℓ)-close to uniform,
and U is uniform. Then

                          | E_{D,T,U}[ f(D + T ∧U)] − E_U[ f(U)] | ≤ 2^{−Ω(d)} + Bδ,

for the following choices of B:

    i. B = (k2^ℓ)^{O(d)};

   ii. if f is nice, then B = (d2^ℓ)^{O(d)}.

     Setting δ = 0, Theorem 1.11 gives a better bound than Theorem 1.10 when k = Ω(dℓ). An interesting
feature of Theorem 1.11 is that for nice products the parameter δ can be independent of k. We complement
this feature with a negative result showing that for general products a dependence on k is necessary. Thus,
the distinction between products and nice products is not an artifact of our proof but is inherent.

Claim 1.12. For every sufficiently large k, there exists a random variable D over {0, 1}^k that is
(k^{−Ω(1)}, k^{Ω(1)})-close to uniform, and a k-bit product test f : {0, 1}^k → C≤1 with k functions of input
length 1, such that
                          | E[ f(D + T ∧U)] − E[ f(U)] | ≥ 1/100,

where D, T and U are independent, and T and U are uniform over {0, 1}^k.


    This claim also shows that for ℓ = 1 and ε = Ω(1) one needs δ ≤ k^{−Ω(1)}, and even for random
variables which are (δ, k^{Ω(1)})-close to uniform, instead of just (δ, O(1))-close.
    For the class of combinatorial rectangles, which corresponds to product tests with each fi outputting
values in {0, 1}, the classic result [12] (extended in [8]; for an exposition see Lecture 1 in [41]) shows
that dℓ-wise independence alone fools rectangles with error 2^{−Ω(d)}, and this error bound is tight. So
Theorem 1.11 does not give better bounds for rectangles, even in the presence of noise. We develop
additional machinery and obtain an improvement on Theorem 1.11. While the improvement is modest,
the machinery we develop may be useful for further improvements. Since this improvement is not used
in our construction of PRGs, we only state and prove it for exact bounded independence. For technical
reasons we restrict the range of the fi.

Theorem 1.13. Let f be an m-bit product test with k functions of input length ℓ. Suppose the range of
each function fi of f is the set {0, 1}, or the set of all M-th roots of unity for some fixed M. Let D, T
and U be three independent m-bit random variables, where D and T are dℓ-wise independent, and U is
uniform. Then
                          | E[ f(D + T ∧U)] − E[ f(U)] | ≤ ℓ^{−Ω(d)}.

    Finally, it is natural to ask if similar techniques fool non-read-once polynomials. In this regard, we
are able to show that small-bias distributions [29] plus noise fool F₂-polynomials of degree 2.

Claim 1.14. Let p : {0, 1}^m → {0, 1} be any F₂-polynomial of degree 2. Let D, T = (T1, . . . , Tm) and
U be three independent m-bit random variables, where D is δ-biased, T1, . . . , Tm are independent and
E[Ti] = 2/3 for every i, and U is uniform. Then

                          | E[p(D + T ∧U)] − E[p(U)] | ≤ δ.

1.1.4    Subsequent developments
This paper is one of a series of recent articles that construct new pseudorandom generators. One celebrated
goal is to construct better generators for read-once bounded-width branching programs. As previously
mentioned, the special case of read-once polynomials was an obstacle to achieving this goal that was
noted by several researchers including Trevisan [36] and Vadhan (personal communication). This paper
removes that obstacle. Building on this paper, subsequent work made progress on width-3 branching
programs [28]. As of February 2019, the main generators in this paper are subsumed by subsequent
work [28, 23], but in the case of products of complex-valued functions, the seed length in this paper
remains unsurpassed, and we believe the techniques here may find further applications.

1.2     Techniques
We first explain how to prove that bounded independence plus noise fools product tests (Theorem 1.11).
After that, we will explain the additional ideas that go into constructing our PRGs.
     Following the literature [16, 17, 19], at a high level we do a case analysis based on the total variance
of the product test f we want to fool. This is defined as the sum of the variances Var[fi] of the functions fi
in the definition of product test. The variance of a function g is E[|g(x)|²] − |E[g(x)]|², where x is uniform.

1.2.1   Low total variance
Our starting point is an important inequality by Gopalan, Kane and Meka [15] (cf. [16, 19]), who showed
that bounded independence alone without noise fools low total-variance product tests. However, their
result is only proved for exact bounded independence, namely, every d bits are exactly uniform, whereas
it is critical for our seed lengths to handle bounded near-independence, i. e., every d bits are close to
uniform.
     One of the technical contributions of this paper is extending the inequality in [15] to work for bounded
near-independence. The proof of the inequality in [15] is somewhat technical, and our extension introduces
several complications. For example, the expectations of the fi under a bounded near-independent
distribution and the uniform distribution are not guaranteed to be equal, and this requires additional
arguments. However our proof follows the argument in [15], which we also present in a slightly different
way that is possibly of interest to some readers. Finally we mention that Claim 1.12 shows that our error
term is close to tight in certain regimes, cf. Section 7.

1.2.2   High total variance
Here we take a different approach from the ones in the literature: The papers [14, 15] essentially reduce
the case of high total variance to the case of low total variance. However their techniques either blow up
the seed length polynomially [14] or rely on space-bounded generators that only work in fixed order [15].
    We instead observe that bounded independence plus noise fools even high total-variance product tests.
We now give some details of our approach. A standard fact is that the expectation of a product test f is
bounded above by
                          ∏_i |E[fi]| ≤ ∏_i (1 − Var[fi])^{1/2} ≤ e^{−∑_i Var[fi]/2}.

So if the total variance ∑_i Var[fi] is large then the expectation of the product test under the uniform distri-
bution is small. Thus, it suffices to show that the expectation is also small under bounded independence
plus noise. To show this, we argue that typically, the total variance remains high even considering the
fi as functions of the noise only. Specifically, we first show that on average over uniformly distributed
x and t, the variance of the functions fi′(y) := fi(x + t ∧ y) is about as large as that of the fi. This uses
Fourier analysis. Then we use concentration inequalities for bounded near-independent random variables
to derandomize this fact: we show that it also holds for typical x and t sampled from D and T.
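    For completeness, the standard fact above has a two-line derivation, using only |fi| ≤ 1 and
1 − v ≤ e^{−v}; in LaTeX:

        % Since |f_i| <= 1 we have E[|f_i|^2] <= 1, hence
        % |E[f_i]|^2 = E[|f_i|^2] - Var[f_i] <= 1 - Var[f_i]. Combining with 1 - v <= e^{-v}:
        \[
          \prod_{i \le k} \bigl|\mathbb{E}[f_i]\bigr|
            \le \prod_{i \le k} \bigl(1 - \operatorname{Var}[f_i]\bigr)^{1/2}
            \le \prod_{i \le k} e^{-\operatorname{Var}[f_i]/2}
            = e^{-\sum_{i \le k} \operatorname{Var}[f_i]/2}.
        \]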
     This suffices to prove Theorem 1.11.i. Proving Theorem 1.11.ii requires additional ideas.
     We first note that the case of high total variance actually does not appear in the read-once CNF
generator in [16]. This is because one can always truncate the CNF to have at most 2^w log(1/ε) clauses
of width w, which suffices to determine the expected value of the CNF up to an additive error of ε, and
such a CNF has low total variance (for this one argues that noise helps reduce the variance a little).
To handle an arbitrary read-once CNF, [16] partitions the clauses according to their width and handles
each partition separately.
     However, one cannot truncate polynomials without noise. To see why, consider, as a simple example,
the linear polynomial x1 + x2 + · · · + xm (corresponding to a product test that computes the parity function).
Here no strict subset of the monomials determines the expectation of the polynomial. Indeed, one can
construct distributions which look random to m − 1 monomials, but not to m: the uniform distribution
over the strings of even parity is (m − 1)-wise independent, yet the parity of all m bits is constant on it.


1.2.3   Truncation using noise

Although we cannot truncate polynomials without noise, we show that something almost as good can
still be done, and this idea is critical to obtaining our seed lengths. We show that the statistical closeness
parameter in D and T can be selected as if the polynomial was truncated: it is independent from the
number k of functions. This is reflected in Theorem 1.11.ii, where δ is independent from k. The proof
goes by showing that if the number k of functions is much larger than 2^{3ℓ} then noise alone will be enough
to fool the test, regardless of anything else. This proof critically uses noise: without noise a dependence
on k is necessary, as shown in the parity example in our discussion. Also, for the proof to work the
functions must have expectation at most 1 − Ω(2^{−ℓ}). As mentioned earlier, we further prove that this
last requirement is necessary (Claim 1.12): we construct functions whose expectation is about 1 − 1/k
but their product is not fooled by almost bounded independence plus noise, if the statistical closeness
parameter is larger than 1/k^c for a suitable constant c.

1.2.4   Additional ideas for improved bound

To obtain the improved error bound in Theorem 1.13, we show that whenever the total variance of a
product test lies below dℓ^{0.1}, we can use noise to bring it down below dℓ^{−0.1}. This produces a gap
of [dℓ^{−0.1}, dℓ^{0.1}] between the cases of high and low total variance, which gives the better bound using
the previous arguments. Reducing the total variance requires a few additional ideas. First, we use
Theorem 1.10 to handle the functions fi in the product test which have high variance. Then we use the
hypercontractivity theorem to reduce the variances of the rest of the fi individually. [16] also uses noise
to reduce variance, but the functions fi in [16] are just ANDs, and so it does not need hypercontractivity.
To combine both ideas, we prove a new “XOR Lemma” for bounded independence, a variant of an XOR
lemma for small-bias distributions proved in [16].

1.2.5   Constructing our PRGs

We now explain how to use Theorem 1.11 to construct our PRGs. The high-level idea of our PRG
construction is to apply Theorem 1.11 recursively following the Ajtai–Wigderson framework: Given
D + T ∧U, where T = (T1, . . . , Tm), we can think of each Ti as an indicator random variable that selects
the i-th position with probability 1/2. For intuition, it is helpful to assume that each position is selected
independently.
     We will focus on how to construct a PRG using a seed of length Õ(log m) for read-once polynomials
with constant error, as this simplifies the parameters and captures all the ideas. Without loss of generality,
we can assume the degree of the polynomial to be ℓ = O(log m), because the contribution of higher-degree
terms can be shown to be negligible under a small-bias distribution. (See the proof of Theorem 1.3.)
     Let p : {0, 1}^m → {0, 1} be a degree-ℓ read-once polynomial with k monomials. It is convenient to
think of p as outputting values in {−1, 1}. Further, we can write p as a product ∏_{i=1}^k pi, where each pi
is a monomial on at most ℓ bits (with outputs in {−1, 1}).
     Now suppose we only assign the values in D to the positions not chosen by T, that is, we set the
input bits xi = Di for every i with Ti = 0. This induces another polynomial pD,T defined on the positions
selected by T. Clearly, pD,T also has degree at most ℓ, and so we can reapply Theorem 1.11 to pD,T.

     Repeating the above argument t times induces a polynomial defined on the positions selected by
T_t := ∧_{i=1}^t T_i. One can think of T_t = (T_{t,1}, . . . , T_{t,m}) as a single random variable, where each T_{t,i}
selects the i-th position with probability 2^{−t}. Viewing T_t this way, it is easy to see that we can terminate
the recursion after t := O(log m) steps, as the set of positions selected by T_t becomes empty with high
probability.
     By standard constructions [29], it takes s := Õ(ℓ) bits to sample D and T in Theorem 1.11.ii each
time. Therefore, we get a PRG of seed length t · s = Õ(ℓ log m).
     To obtain a better seed length, we instead apply Theorem 1.11 in stages. Our goal in each stage is
to reduce the degree of the polynomial by half. In other words, we want the restricted polynomial defined
on the positions in ∧_{i=1}^t T_i to have degree ℓ/2. It is not difficult to see that in order to reduce the degree
of the m monomials of p to ℓ/2 with high probability, it suffices to apply our above argument recursively
for t := O(log m)/ℓ times. So in each stage, we use a seed of length

                          t · s = Õ(ℓ) · (log m)/ℓ = Õ(log m).

After repeating the same argument for O(log ℓ) = O(log log m) stages, with high probability the restricted
polynomial has degree 0 and we are done. Therefore, the total seed length of our PRG is Õ(log m).
     Here we remark that it is crucial in our argument that D and T are bounded near-independent, as
opposed to small-biased. Otherwise, we could not have seed length s = Õ(ℓ) when ℓ = o(log m). For
example, when ℓ = O(1), with small-bias we would need s = O(log m) bits, whereas we use just
O(log log m) bits.
     Motivated by the analysis in [20], Forbes and Kelley [13] show that 2t-wise independence plus noise
fools width-w ROBPs on m bits with error 2^{−t/2} · mw. Their work implicitly shows that t-wise independence
plus noise fools product tests with k functions of input length ℓ with error k · 2^{−Ω(t)+ℓ−1}, improving [20].
However, their result is incomparable to Theorems 1.11 and 1.13, as there is no dependence on k in our
error bounds for exact bounded independence, i. e., when D is (0, dℓ)-close to uniform. By combining
their result with the observation that noise alone fools nice product tests when k, the number of functions,
is much larger than 2^{3ℓ}, we show that the dependence on k in their error bound can be removed for nice
product tests.
Theorem 1.15. Let f : {0, 1}^m → C≤1 be a nice product test with k functions of input length ℓ. Let D, T
and U be three m-bit random variables, where D and T are t-wise independent and U is uniform. Then

                          | E_{D,T,U}[ f(D + T ∧U)] − E_U[ f(U)] | ≤ 2^{8ℓ−Ω(t)}.

    We note that for product tests this error bound is optimal up to the constant in the exponent, because
the same distribution fools parities with error 2^{−(t+1)}. On the other hand, [7, Theorem 8] shows that for
ROBPs the dependence on m in the error is inherent.

Organization. We prove that bounded independence plus noise fools products (Theorem 1.11) in Section 2,
except for the case of low total variance, whose proof we defer to Section 4. Then we give the constructions
of our PRGs in Section 3. In Section 5, we show how to obtain the modest improvement on Theorem 1.11
(Theorem 1.13) and the optimal error bound for nice product tests (Theorem 1.15) using [13]. After that,
we prove our result on fooling degree-2 polynomials in Section 6. Finally, we prove Claim 1.12 in Section 7.


         Conditions                           Uses          Follows from               Error
   (1)   ∑_{i≤k} Var[fi] ≤ αd                 D             Lemma 2.1                  2^{−Ω(d)} + (k2^ℓ)^{O(d)} δ
   (2)   ∑_{i≤k} Var[fi] ≥ αd                 D + T ∧U      Derandomized Claim 2.2     2^{−Ω(d)} + k^{O(d)} δ
   (3)   k ≥ 2^{3ℓ+1} d, nice products        T ∧U          Claim 2.4                  2^{−Ω(dℓ)} + 2^{O(dℓ)} δ

Table 1: Error bounds for fooling a product test of k functions of input length ℓ under different conditions.
Here D, T and U are independent, where D and T are (δ, dℓ)-close to uniform, U is uniform, and α is a
small constant.



2    Bounded independence plus noise fools products
In this section we prove Theorem 1.11. As we mentioned in the introduction, the proof consists of 3
parts: (1) low total variance, (2) high total variance, and (3) truncation using noise for nice products. We
summarize the conditions and the error bounds we obtain for these cases in Table 1. Let us now quickly
explain how to put them together to prove Theorem 1.11. Clearly, combining (1) and (2) immediately
gives a bound of 2^{−Ω(d)} + (k2^ℓ)^{O(d)} δ for product tests, proving Theorem 1.11.i. For nice product
tests, we can apply (3) if k ≥ 2^{3ℓ+1} d; otherwise we can plug k ≤ 2^{3ℓ+1} d into the previous bound,
proving Theorem 1.11.ii.
    We now discuss each of the 3 cases in order. Since the proof of the case of low total variance is quite
involved, we only state the lemma in this section and defer its proof to Section 4.

Lemma 2.1. Let X1, X2, . . . , Xk be k independent random variables over C≤1 such that for every i ∈
{1, . . . , k} and zi ∈ Supp(Xi) we have Pr[Xi = zi] ≥ 2^{−ℓ}. Let Y1, Y2, . . . , Yk be k random variables over C≤1
that are (ε, 16d)-close to X1, . . . , Xk. For every σ ≥ (∑_{i=1}^k Var[Xi])^{1/2},

                 | E[∏_{i=1}^k Yi] − E[∏_{i=1}^k Xi] | ≤ 2^{O(d)} (σ²/d)^{d/2} + (k2^ℓ)^{O(d)} ε.

    We now prove a claim that handles the case of high total variance. This claim shows that for
uniformly distributed x and t, the variance of the function g(y) := f (x + t ∧ y) is close to the variance of f
in expectation. Its proof follows from a simple calculation in Fourier analysis. Later, we will derandomize
this claim in the proof of Theorem 1.11.

Claim 2.2. Let T = (T1, . . . , Tℓ) be an ℓ-bit random variable where the Tj are independent and E[Tj] = η
for each j. Let f : {0, 1}^ℓ → C be any function. Then

                 E_{U,T}[ Var_{U′}[ f(U + T ∧U′)] ] ≥ η Var[ f ],

where T, U and U′ are independent, and U and U′ are uniform.


Proof of Claim 2.2. By the definition of variance and linearity of expectation, we have

    E_{U,T}[ Var_{U′}[ f(U + T ∧U′)] ] = E_{U,T}[ E_{U′}[| f(U + T ∧U′)|²] − | E_{U′}[ f(U + T ∧U′)] |² ]
                                       = E_{U,T,U′}[ | f(U + T ∧U′)|² ] − E_{U,T}[ | E_{U′}[ f(U + T ∧U′)] |² ].

The first term is equal to

    E_U[| f(U)|²] = ∑_{α,α′} f̂_α conj(f̂_{α′}) E_U[χ_{α−α′}(U)] = ∑_α | f̂_α |²,

writing conj(·) for complex conjugation. The second term is equal to

    E_{U,T}[ E_{U′}[ ∑_α f̂_α χ_α(U + T ∧U′) ] · conj( E_{U″}[ ∑_{α′} f̂_{α′} χ_{α′}(U + T ∧U″) ] ) ]
        = E_{U,T}[ ∑_{α,α′} f̂_α conj(f̂_{α′}) E_{U′}[χ_α(U + T ∧U′)] E_{U″}[χ_{α′}(U + T ∧U″)] ]
        = ∑_{α,α′} f̂_α conj(f̂_{α′}) E_U[χ_{α+α′}(U)] E_T[ E_{U′}[χ_α(T ∧U′)] E_{U″}[χ_{α′}(T ∧U″)] ]
        = ∑_α | f̂_α |² E_{T,U′,U″}[ χ_α(T ∧ (U′ +U″)) ]
        = ∑_α | f̂_α |² (1 − η)^{|α|},

using that the characters χ_α are real-valued. Therefore,

    E_{U,T}[ Var_{U′}[ f(U + T ∧U′)] ] = ∑_α | f̂_α |² (1 − (1 − η)^{|α|}) ≥ η ∑_{α≠0} | f̂_α |² = η Var[ f ],

where the inequality is because 1 − (1 − η)^{|α|} ≥ 1 − (1 − η) = η for any α ≠ 0.
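    As a sanity check (ours, not from the paper), Claim 2.2 can be verified by brute force for small ℓ and
real-valued f; for the parity function on 3 bits and η = 1/2, the left-hand side is 7/8 and the right-hand
side is 1/2.

        from itertools import product

        def check_claim_2_2(f, ell: int, eta: float = 0.5):
            pts = list(product([0, 1], repeat=ell))
            def var(vals):  # variance of a real-valued function from its value list
                mean = sum(vals) / len(vals)
                return sum(v * v for v in vals) / len(vals) - mean * mean
            lhs = 0.0
            for u in pts:
                for t in pts:  # averaging over uniform t gives E[T_j] = 1/2 = eta
                    vals = [f(tuple(u[j] ^ (t[j] & w[j]) for j in range(ell)))
                            for w in pts]
                    lhs += var(vals)
            lhs /= len(pts) ** 2
            return lhs, eta * var([f(x) for x in pts])

        print(check_claim_2_2(lambda x: (-1) ** sum(x), 3))  # (0.875, 0.5)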

    With Lemma 2.1 and Claim 2.2, we now prove Theorem 1.11.

Proof of Theorem 1.11.i. Let σ be exactly (∑_{i≤k} Var[fi])^{1/2}. We will consider two cases: σ² ≤ αd and
σ² > αd, where α > 0 is a sufficiently small constant.
     If σ² ≤ αd, we use Lemma 2.1. Specifically, since Pr[fi(U) = zi] ≥ 2^{−ℓ} for every i and zi ∈ Supp(fi),
it follows from Lemma 2.1 that

                 | E[∏_{i=1}^k fi(D)] − E[∏_{i=1}^k fi(U)] | ≤ 2^{−Ω(d)} + (k2^ℓ)^{O(d)} δ,

and the desired bound holds for every fixing of T and U.
     If σ² ≥ αd, then the expectation of f under the uniform distribution is small. More precisely, we
have
                 | ∏_{i≤k} E_U[fi(U)] | ≤ ∏_{i≤k} (1 − Var[fi])^{1/2} ≤ e^{−σ²/2} ≤ 2^{−Ω(d)}.                (2.1)




Thus, it suffices to show that its expectation under D + T ∧U is at most 2^{−Ω(d)} + (k2^ℓ)^{O(d)} δ. We now use
Claim 2.2 to show that

                 E_{D,T,U}[ | ∏_{i=1}^k fi(D + T ∧U) | ] ≤ 2^{−Ω(d)} + k^{O(d)} δ.

For each t, x ∈ {0, 1}^m and each i ∈ {1, 2, . . . , k}, let σ²_{t,x,i} denote Var_{U′}[fi(x + t ∧U′)]. We claim that
∑_{i≤k} σ²_{t,x,i} is large for most x and t sampled from D and T respectively. From Claim 2.2 we know that
this quantity is large in expectation for uniform x and t. By a tail bound for bounded near-independent
random variables, we show that the same is true for most x ∈ D and t ∈ T. By a calculation similar to
(2.1) we show that for these x and t the quantity | E[ f(x + t ∧U)] | is small.
    To proceed, let T′ be a random variable distributed uniformly over {0, 1}^m. Applying Claim 2.2 with
η = 1/2, we have E_{T′,U}[σ²_{T′,U,i}] ≥ Var[fi]/2. So by linearity of expectation,

                 E_{T′,U}[ ∑_{i≤k} σ²_{T′,U,i} ] ≥ σ²/2 ≥ αd/2.

Since T and D are both (δ, dℓ)-close to uniform, the random variables σ²_{T,D,1}, . . . , σ²_{T,D,k} are (2δ, d)-close
to σ²_{T′,U,1}, . . . , σ²_{T′,U,k}. We now use the following tail bound on the σ²_{T,D,i}. Its proof can be found in
Section 8.
Lemma 2.3. Let X1, X2, . . . , Xk ∈ [0, 1] be independent random variables. Let d be an even positive
integer. Let Y1, Y2, . . . , Yk ∈ [0, 1] be random variables that are (ε, d)-close to X1, . . . , Xk. Let Y = ∑_{i≤k} Yi
and µ = E[∑_i Xi]. Then

                 Pr[|Y − µ| ≥ δ µ] ≤ 4 · 2^d ((√(µd) + d)/(δ µ))^d + 4 (2k/(δ µ))^d ε.

In particular, if µ ≥ 25d, we have Pr[|Y − µ| ≥ µ/2] ≤ 2^{−Ω(d)} + k^d ε.
    Let µ be E_{T′,U}[∑_{i≤k} σ²_{T′,U,i}] ≥ αd/2. By Lemma 2.3,

                 Pr_{T,D}[ ∑_{i≤k} σ²_{T,D,i} ≤ µ/2 ] ≤ 2^{−Ω(d)} + k^{O(d)} δ.                (2.2)

Hence, except with probability 2^{−Ω(d)} + k^{O(d)} δ over t ∈ T and x ∈ D, we have

                 ∑_{i≤k} σ²_{t,x,i} = ∑_{i≤k} Var_{U′}[fi(x + t ∧U′)] ≥ αd/4.

For every such t and x, we have

                 | ∏_{i≤k} E_U[fi(x + t ∧U)] | ≤ ∏_{i≤k} | E_U[fi(x + t ∧U)] |
                                              ≤ ∏_{i≤k} (1 − σ²_{t,x,i})^{1/2}
                                              ≤ e^{−∑_{i≤k} σ²_{t,x,i}/2} ≤ 2^{−Ω(d)}.                (2.3)



In addition, we always have | f | ≤ 1. Hence, summing the right-hand sides of (2.2) and (2.3), we have

                 | E_{D,T,U}[ ∏_{i≤k} fi(D + T ∧U) ] | ≤ E_{D,T}[ ∏_{i≤k} | E_U[fi(D + T ∧U)] | ] ≤ 2^{−Ω(d)} + k^{O(d)} δ.


    To prove Theorem 1.11.ii, we use the following additional observation that noise alone fools nice
products when k is suitably larger than 2^{2ℓ}. The high-level idea is that in such a case there will be at least
k2^{−ℓ} ≥ 2^ℓ functions fi whose inputs are completely set to uniform by the noise. Since the expectation
of each fi is bounded by 1 − 2^{−ℓ}/4 in absolute value, the expectation of their product becomes small
when k is suitably larger than 2^{2ℓ}. On the other hand, under the uniform distribution every fi receives
uniform input, so |E[ f(U)]| can only be smaller, and hence the expectations under uniform and under
noise are both small.

Claim 2.4 (Noise fools nice products with large k). Let f : {0, 1}^m → C≤1 be a nice m-bit product test
with k ≥ 2^{3ℓ+1} d functions of input length ℓ. Let T, U be two independent m-bit random variables where
T is (δ, dℓ)-close to uniform, and U is uniform.
    Then
                 | E_{T,U}[ f(T ∧U)] − E[ f(U)] | ≤ 2^{−Ω(dℓ)} + 2^{O(dℓ)} δ.


Proof. We will bound both expectations above in absolute value. Let k₀ := 2^{3ℓ+1} d ≤ k. Write f = ∏_{i=1}^k fi,
where fi : {0, 1}^{Ii} → C≤1. Since f is nice, we have |E[fi(U)]| ≤ 1 − 2^{−ℓ}/4 for every i ∈ {1, . . . , k}. Under
the uniform distribution, we have

                 | E[ f(U)] | = ∏_{i=1}^k | E[fi(U)] | ≤ (1 − 2^{−ℓ}/4)^k ≤ e^{−Ω(k2^{−ℓ})} ≤ 2^{−Ω(dℓ)}.                (2.4)


It suffices to show that the expectation under T ∧U is at most 2^{−Ω(dℓ)} + 2^{O(dℓ)} δ. Note that

                 | E[ f(T ∧U)] | ≤ E_T[ ∏_{i=1}^k | E_U[fi(T ∧U)] | ] ≤ E_T[ ∏_{i=1}^{k₀} | E_U[fi(T ∧U)] | ].


We now show that the right-hand side is at most 2^{−Ω(dℓ)} + 2^{O(dℓ)} δ. We first show that, when T is
uniform, the expected number of fi whose inputs are all selected by T is large, and then apply Lemma 2.3
to show that the same holds for most t ∈ T. Let T′ be a random variable distributed uniformly over
{0, 1}^m. Then

                 E[ ∑_{i=1}^{k₀} 1(T′_{Ii} = 1^{|Ii|}) ] = ∑_{i=1}^{k₀} Pr[ T′_{Ii} = 1^{|Ii|} ] ≥ k₀ 2^{−ℓ} = 2^{2ℓ+1} d.

Since T is (δ, dℓ)-close to uniform, the T_{Ii} are (δ, d)-close to uniform. By Lemma 2.3,

                 Pr_T[ ∑_{i=1}^{k₀} 1(T_{Ii} = 1^{|Ii|}) ≤ 2^{2ℓ} d ] ≤ 2^{−Ω(dℓ)} + 2^{O(dℓ)} δ.                (2.5)



Note that if T_{Ii} = 1^{|Ii|}, then |E_U[fi(T ∧U)]| = |E[fi]| ≤ 1 − 2^{−ℓ}/4. Thus, conditioned on
∑_{i=1}^{k₀} 1(T_{Ii} = 1^{|Ii|}) ≥ 2^{2ℓ} d, we have

                 ∏_{i=1}^{k₀} | E[fi(T ∧U)] | ≤ (1 − 2^{−ℓ}/4)^{2^{2ℓ} d} ≤ 2^{−Ω(dℓ)}.                (2.6)

Since we always have | f | ≤ 1, the error bound follows from summing the right-hand sides of (2.4), (2.5)
and (2.6).

    Theorem 1.11.ii now follows easily from Claim 2.4 and Theorem 1.11.i.

Proof of Theorem 1.11.ii. Since f is nice, |E[fi]| ≤ 1 − 2^{−ℓ}/4. If k ≥ 2^{3ℓ+1} d, then the theorem follows
from Claim 2.4. Otherwise, k ≤ 2^{3ℓ+1} d and the theorem follows from Theorem 1.11.i.


3    Pseudorandom generators
In this section we construct our generators. As explained in the introduction, all constructions follow from
applying Theorem 1.11 recursively. We obtain our generator for arbitrary product tests (Theorem 1.5)
by applying Theorem 1.11 recursively O(log ℓk) times. Our progress measure for the recursion is
the number of bits the restricted product test is defined on. We show that after O(log ℓk) steps of the
recursion we are left with a product test that is defined on m₀ := O(ℓ log(1/ε)) bits, which can be fooled
by a distribution that is (ε, m₀)-close to uniform. On a first read, we suggest that readers track only the Õ(·)
notation in the statements and proofs, i. e., ignore polylogarithmic factors in ℓ, log k, log(1/ε) and log m,
and think of k as m and of ε as a small constant.

Proof of Theorem 1.5. Let C be a sufficiently large constant. Let t = C log(ℓk), d = C log(t/ε) and
δ = (k2^ℓ)^{−Cd}. Let D1, . . . , Dt, T1, . . . , Tt and G0 be 2t + 1 independent m-bit random variables that are
(δ, dℓ)-close to uniform. Define D^{(1)} := D1 and D^{(i+1)} := D_{i+1} + T_i ∧ D^{(i)}. Let D := D^{(t)} and
T := ∧_{i=1}^t T_i. Our generator G outputs
                                                  D + T ∧ G0.
    We first look at the seed length of G. By [29, Lemma 4.2], sampling G0 and each of the random
variables Di and Ti takes a seed of length

                 O(dℓ + log(1/δ) + log log m) = O((ℓ + log k) log(t/ε) + log log m)
                                              = Õ((ℓ + log k) log(1/ε)) + O(log log m).

Hence the total seed length of G is

    (2t + 1) · (Õ((ℓ + log k) log(1/ε)) + O(log log m)) = Õ((ℓ + log k)(log k) log(1/ε) + log log m).

We now look at the error of G. By our choice of δ and applying Theorem 1.11.i recursively t times,
we have
                 | E[ f(D + T ∧U)] − E[ f(U)] | ≤ t · (2^{−Ω(d)} + (k2^ℓ)^{O(d)} δ) ≤ ε/2.

Next, we show that for every fixing of D and most choices of T, the function f_{D,T}(y) := f(D + T ∧ y) is a
product test defined on dℓ bits, which can be fooled by G0.
    Let I = ∪_{i≤k} Ii. Note that |I| ≤ ℓk. Because the variables Ti are independent and each of them is
(δ, dℓ)-close to uniform, we have

                 Pr[ |I ∩ T| ≥ dℓ ] ≤ (|I| choose dℓ) · (2^{−dℓ} + δ)^t ≤ 2^{dℓ log(ℓk)} · 2^{−Ω(Cdℓ log(ℓk))} ≤ ε/4.

It follows that for every fixing of D, with probability at least 1 − ε/4 over the choice of T, the function
f_{D,T} is a product test defined on at most dℓ bits, which can be fooled by G0 with error ε/4. Hence G fools
f with error ε.

     Our generator for nice product tests (Theorem 1.7) uses the maximum input length of the functions
fi as the progress measure. We will use the following lemma, which captures the trade-off between the
number of recursive steps and the simplification of a product test, measured in terms of the maximum
input length of the fi.
Lemma 3.1. Given an explicit generator G0 : {0, 1}^{s′} → {0, 1}^m that fools nice m-bit product tests with
k functions of input length r with error ε′ and seed length s′, one can construct another explicit generator
G : {0, 1}^s → {0, 1}^m that fools nice m-bit product tests with k functions of input length ℓ with error
ε′ + tε, where
                 t = O( log(k/ε)/(r + 1) + log(ℓ/(r + 1)) ),
and seed length

    s = s′ + t · O((ℓ + log log(1/ε)) log(1/ε) + log log m) = s′ + t · (Õ(ℓ log(1/ε)) + O(log log m)).

    We defer its proof to the end of this section. Theorem 1.7 requires applying the lemma in stages, where
in each stage we apply the lemma with a different value of ℓ. XORing its output with a random variable
sampled according to a small-bias distribution gives our generator for polynomials (Theorem 1.3).
    We will apply Lemma 3.1 in O(log ℓ) stages. In each stage our goal is to halve the input length of the
product test.

Proof of Theorem 1.7. Let f be a nice m-bit product test with k functions of input length ℓ. Note that by
applying Lemma 3.1 with r = ℓ/2 and error ε/(t log ℓ), where t = O(log(k/ε)/ℓ + 1), we can halve its
input length while incurring an error of ε/O(log ℓ) and using a seed of length

                 t · O((ℓ + log log((t log ℓ)/ε)) log((t log ℓ)/ε) + log log m)
                     = Õ((log(k/ε) + ℓ)(log(1/ε) + log log m)).

Now we repeat the argument for s = O(log ℓ) steps until the input length is zero, at which point the
restricted product test is a constant function and can be fooled with zero error. So we have a generator
that fools nice m-bit product tests with k functions of input length ℓ, with error ε and seed length
s · Õ((log(k/ε) + ℓ)(log(1/ε) + log log m)) = Õ((log(k/ε) + ℓ)(log(1/ε) + log log m)).


    Theorem 1.3 follows from XORing the output of the above generator with a small-bias distribution.

Proof of Theorem 1.3. Let c be a sufficiently large constant. Let D and G be two independent m-bit
random variables, where D is sampled from an (ε/m)^c-biased distribution [29], and G is sampled from the
output distribution of the generator in Theorem 1.7 that fools m-bit product tests with m functions and
input length c log(m/ε) with error ε/2. The generator outputs D + G. By [29] and Theorem 1.7, it takes
a seed of length

         O(log(m/ε)) + Õ((log(m/ε) + c log(m/ε)) log(1/ε)) = Õ(log(m/ε)) · log(1/ε).

     Let p : {0, 1}^m → {−1, 1} be any read-once GF(2) polynomial. Consider the polynomial p′ obtained
from p by removing all the monomials of degree greater than c log(m/ε) in p. We claim that the
expectations of p and p′ under D + G differ by at most ε. Note that for any random variable X sampled
according to any (ε/m)^c-biased distribution, the probability that any given set of c log(m/ε) bits of X are
all 1 is at most ε/(4m), and so by a union bound we have Pr[p(X) ≠ p′(X)] ≤ ε/4. In particular, this holds
for D + G and U. It follows that

         | E[p(D + G)] − E[p(U)] | ≤ | E[p′(D + G)] − E[p′(U)] | + ε/2 ≤ ε,

where the last inequality holds for any fixed D because of Theorem 1.7.

    We now prove Lemma 3.1. First we make an observation that will be used in the proof to reduce the
input length of the product test.

Claim 3.2. Let T^{(1)}, . . . , T^{(t)} be t independent and identically distributed random variables over {0, 1}^ℓ
that are δ-close to uniform. Then Pr[wt(∧_{i=1}^t T^{(i)}) > r] ≤ (ℓ choose r+1) · (2^{−(r+1)} + δ)^t, where
wt(·) denotes Hamming weight.

Proof. We have

    Pr[wt(∧_{i=1}^t T^{(i)}) > r] ≤ ∑_{S:|S|=r+1} Pr[∧_{i=1}^t ∧_{j∈S} (T_j^{(i)} = 1)]
                                  = ∑_{S:|S|=r+1} ∏_{i=1}^t Pr[∧_{j∈S} (T_j^{(i)} = 1)]
                                  ≤ ∑_{S:|S|=r+1} (2^{−(r+1)} + δ)^t
                                  = (ℓ choose r+1) · (2^{−(r+1)} + δ)^t.
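    The bound in Claim 3.2 is easy to test empirically. The following Python sketch takes the T^{(i)} exactly uniform (the case δ = 0) and compares the empirical tail probability of wt(∧_i T^{(i)}) against the bound (ℓ choose r+1)(2^{−(r+1)})^t; the parameter values are arbitrary.

    import random
    from math import comb

    def and_weight_tail(ell=16, t=4, r=2, trials=100000):
        # Estimate Pr[wt(AND of t uniform vectors in {0,1}^ell) > r].
        exceed = 0
        for _ in range(trials):
            acc = (1 << ell) - 1
            for _ in range(t):
                acc &= random.getrandbits(ell)   # coordinatewise AND
            if bin(acc).count("1") > r:
                exceed += 1
        bound = comb(ell, r + 1) * (2 ** -(r + 1)) ** t   # delta = 0 case
        return exceed / trials, bound

    print(and_weight_tail())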

Proof of Lemma 3.1. Let C be a sufficiently large constant. The generator G will output H^{(1)}, where we define the distribution of the random variable H^{(i)} recursively for

    t = O(log(k/ε)/(r+1) + log(ℓ/(r+1)))

steps: At the i-th step, H^{(i)} samples two independent m-bit random variables D^{(i)} and T^{(i)} that are (δ, Cℓ log(1/ε))-close to uniform, where δ = 2^{−C(ℓ + log log(1/ε)) log(1/ε)}, and independent of H^{(i+1)}. Then output

    H^{(i)} := D^{(i)} + T^{(i)} ∧ H^{(i+1)}.


We define H^{(t+1)} to be G′(U_{s′}).
    By [29, Lemma 4.2], sampling D^{(i)} and T^{(i)} takes a seed of length

    u := O(ℓ log(1/ε) + log(1/δ) + log log m)
       = O((ℓ + log log(1/ε)) log(1/ε) + log log m)
       = Õ(ℓ log(1/ε)) + O(log log m).

The total seed length of G is therefore s = s′ + tu = s′ + t · (Õ(ℓ log(1/ε)) + O(log log m)).
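    Concretely, the recursion can be sketched as follows in Python; `sample_close_to_uniform` stands in for the (δ, Cℓ log(1/ε))-close-to-uniform sampler of [29, Lemma 4.2], and `g_prime` for the assumed generator G′. Both are hypothetical names with uniform bits substituted, so the sketch only illustrates the structure of the construction.

    import secrets

    def sample_close_to_uniform(m):
        # Placeholder for an m-bit sample (delta, C*ell*log(1/eps))-close to
        # uniform [29, Lemma 4.2]; uniform bits keep the sketch runnable.
        return secrets.randbits(m)

    def g_prime(m):
        # Placeholder for G', which fools product tests of input length r.
        return secrets.randbits(m)

    def H(i, t, m):
        # H^(i) = D^(i) + T^(i) AND H^(i+1), with "+" meaning bitwise XOR;
        # the recursion bottoms out at H^(t+1) = G'(U_{s'}).
        if i == t + 1:
            return g_prime(m)
        D = sample_close_to_uniform(m)
        T = sample_close_to_uniform(m)
        return D ^ (T & H(i + 1, t, m))

    print(format(H(1, t=5, m=32), "032b"))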
                                                                                          
    We now analyze the error of G. For i ∈ {1, 2, . . . , t}, consider the variant H_U^{(i)} of H^{(1)}, which is the same as H^{(1)} except that at the i-th step it replaces H^{(i+1)} with U_m. Let H_U^{(0)} = U_m.
    For every i ∈ {1, . . . , t} and every fixed D^{(1)}, . . . , D^{(i−1)} and T^{(1)}, . . . , T^{(i−1)}, the function f restricted to ∧_{j<i} T^{(j)} remains a product test with k functions of input length ℓ, and remains nice if f is nice. Call the restricted function g. Then, by Theorem 1.11, we have

    |E[f(H_U^{(i−1)})] − E[f(H_U^{(i)})]| = |E[g(U)] − E[g(D^{(i)} + T^{(i)} ∧ U_m)]| ≤ ε.

Hence, summing over i we have

    |E[f(U_m)] − E[f(H_U^{(t)})]| ≤ ∑_{i=1}^t |E[f(H_U^{(i−1)})] − E[f(H_U^{(i)})]| ≤ tε.

    We now prove that |E[f(H_U^{(t)})] − E[f(H^{(1)})]| ≤ ε′ + 2ε. We will show that except with probability ε, the function f restricted to ∧_{j≤t} T^{(j)} is an r-bit product test, and so we can fool the restricted function using the generator G′ given by our assumption.
    Write f = ∏_{i≤k} f_i, where each f_i is defined on {0,1}^{I_i} with |I_i| ≤ ℓ. We claim that

    Pr[wt(∧_{i=1}^t T_{I_j}^{(i)}) > r for some j ∈ {1, . . . , k}] ≤ ε.

It suffices to analyze Pr[wt(∧_{i=1}^t T_{I_j}^{(i)}) > r] for each j and take a union bound over j ≤ k.
    Since |I_j| ≤ ℓ, the variable T_{I_j}^{(i)} is 2^{−Cℓ}-close to uniform. So, by Claim 3.2 and a union bound over j ≤ k, the probability that some f_j has input length > r is at most

    k · (ℓ choose r+1) · (2^{−(r+1)} + 2^{−Cℓ})^t ≤ k · (ℓe/(r+1))^{r+1} · 2^{−r·Ω(log(k/ε)/(r+1) + log(ℓ/(r+1)))} ≤ ε.

Hence, for every D^{(1)}, . . . , D^{(t)}, with probability 1 − ε over the choice of T^{(1)}, . . . , T^{(t)}, the function f restricted to ∧_{i=1}^t T^{(i)} becomes a product test with k functions of input length r, and remains nice if f is nice. Conditioned on this event, we have by the definition of G′ that |E[f(H_U^{(t)})] − E[f(H^{(1)})]| ≤ ε′. Otherwise, as |f| is bounded by 1, the absolute difference is always at most 2. Hence, |E[f(H_U^{(t)})] − E[f(H^{(1)})]| ≤ ε′ + 2ε, and so the total error is at most ε′ + (t + 2)ε.


4    On almost k-wise independent variables with small total variance
In this section we prove Lemma 2.1. We first restate the lemma.
Lemma 4.1. Let X_1, X_2, . . . , X_k be k independent random variables over C≤1 such that for every i ∈ {1, . . . , k} and z_i ∈ Supp(X_i) we have Pr[X_i = z_i] ≥ 2^{−ℓ}. Let Y_1, Y_2, . . . , Y_k be k random variables over C≤1 that are (ε, 16d)-close to X_1, . . . , X_k. For every σ ≥ (∑_{i=1}^k Var[X_i])^{1/2},

    |E[∏_{i=1}^k Y_i] − E[∏_{i=1}^k X_i]| ≤ 2^{O(d)} (σ²/d)^{d/2} + (k2^ℓ)^{O(d)} ε.

    Our proof closely follows the one in [15], which proves the lemma for ε = 0, that is, when the Y_i's are d-wise independent. We first give an overview of their proof.
    For independent random variables Z_1, . . . , Z_k, we will use σ(Z) to denote (∑_{i=1}^k Var[Z_i])^{1/2}.
    As a first step, let us assume each E[X_i] is nonzero and normalize the variables X_i by writing

    ∏_i X_i = ∏_i (E[X_i] + (X_i − E[X_i])) = ∏_i E[X_i] · (1 + (X_i − E[X_i])/E[X_i]).

Let Z_i denote (X_i − E[X_i])/E[X_i]. If |Z_i| is small, then intuitively a low-order Taylor expansion of ∏_i (1 + Z_i) should approximate the original function well. To write down the Taylor expansion, it is convenient to rewrite ∏_i (1 + Z_i) as e^{∑_i log(1+Z_i)}. It then suffices to bound above the error term of the expansion in expectation, which is equivalent to bounding the d-th moment of ∑_i log(1 + Z_i). A standard calculation gives a bound in terms of the norm and variance of the functions log(1 + Z_i). Since |Z_i| is small, log(1 + Z_i) behaves similarly to Z_i, so we can bound the error term in terms of |Z_i| and ∑_i Var[Z_i] = σ(Z)². In particular, if |Z_i| ≤ B for all i, then we would get an error bound of the form 2^{O(d)} ((σ(Z)²/d)^{1/2} + B)^{O(d)}. For now, think of E[X_i] as bounded away from 0, so that Var[Z_i] = Θ(Var[X_i]).
    Now we handle the case where |Zi | is large. Note that this implies either (1) |Xi − E[Xi ]| is large, or
(2) E[Xi ] is small. We will handle the two conditions separately by a reduction to the case where the |Zi |’s
are small.
    The recurring idea throughout is that we can always tolerate a few bad variables that violate the
conditions, provided with high probability there can be at most O(d) of them. This is because by affording
an extra O(d) amount of independence in the beginning, we can condition on the values of these variables
and work with the remaining ones.
    As a simple illustration of this idea, throughout the proof we can assume each Var[Xi ] is bounded by
σ (X)2 /d, as there can be at most d bad variables Xi that violate this inequality, and so we can start with
2d-wise independence, then conditioned on the values of the bad variables Xi , each of the rest of the Xi
would satisfy the bound.
    We first assume the |E[X_i]|'s are large and handle (1): we round X_i to E[X_i] whenever |X_i − E[X_i]| ≥ B. Note that by Chebyshev's inequality an X_i gets rounded with probability Var[X_i]/B². It follows that the probability that there are more than d such X_i's is at most (σ(X)/Bd)^d. This suggests taking B to be (σ(X)/d)^α for some constant α ∈ (0, 1) to balance the error terms.
    It remains to handle condition (2). For Z_i to be bounded by B = (σ(X)²/d)^{Ω(1)}, as explained above, it suffices to show that all but O(d) of the X_i's satisfy |E[X_i]| ≥ (σ(X)²/d)^{O(1)}. If |E[X_i]| ≤ (σ(X)²/d)^{O(1)}
for Ω(d) of the X_i's, then by an argument similar to the one above, one can show that with high probability at least half of them are bounded by (σ(X)²/d)^{Ω(1)} in absolute value. Hence, |E[∏_i X_i]| is at most (σ(X)²/d)^{Ω(d)} when the X_i's are d-wise independent. This finishes the proof.
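    As a quick numeric illustration of the normalization step, the identity ∏_i(1 + Z_i) = e^{∑_i log(1+Z_i)} is exact for the principal branch, since e^{log w} = w. The Python snippet below checks it for random complex Z_i with |Z_i| ≤ 1/2, an arbitrary choice made only for illustration.

    import cmath
    import random

    random.seed(0)
    # Random complex Z_i of magnitude at most 1/2.
    Z = [0.5 * random.random() * cmath.exp(2j * cmath.pi * random.random())
         for _ in range(20)]

    prod = 1
    for z in Z:
        prod *= 1 + z
    via_exp = cmath.exp(sum(cmath.log(1 + z) for z in Z))

    print(abs(prod - via_exp))   # agrees up to floating-point error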
    Note that in the case ε > 0, each X_i is only ε-close to the corresponding Y_i, and the two are not exactly identical. As a result, throughout the proof we will often have to introduce hybrid terms to move from functions of the X_i to functions of the Y_i, and vice versa, and we will show that each of these steps introduces an error of at most k^{O(d)} ε.
    Also, there is some loss in ε whenever we condition on the values of any subset of the Y_i's; see Claim 4.9 for a formal statement. This introduces the extra condition that each X_i must put a certain amount of mass on each outcome in its support.

4.1   Preliminaries
In this section, we prove several claims that will be used in the proof of Lemma 2.1.

Lemma 4.2. For any z ∈ C with |z| ≤ 1/2, we have |log(1 + z)| ≤ 2|z|, where we take the principal branch of the logarithm (with phase in (−π, π)).

Proof. From the Taylor series expansion of the complex-valued log function we have

    |log(1 + z)| = |∑_{n=1}^∞ (−1)^{n−1} z^n / n| ≤ ∑_{n=1}^∞ |z|^n ≤ |z| · ∑_{n=0}^∞ (1/2)^n = 2|z|.


Lemma 4.3. Let Z ∈ C be a random variable with |Z| ≤ 1/2, E[Z] = 0 and W = log(1 + Z). We have
Var[W ] ≤ 4 Var[Z].

Proof. Using the definition of variance, the assumption E[Z] = 0, and Lemma 4.2,

    Var[W] = E[|W|²] − |E[W]|² ≤ E[|W|²] ≤ 4E[|Z|²] = 4 Var[Z].

Lemma 4.4 (Taylor's approximation). For w ∈ C and d > 0,

    |e^w − ∑_{j=0}^{d−1} w^j/j!| ≤ O(1) · (|w|^d/d!) · max{1, e^{ℜ(w)}}.

Proof. We will prove the inequality |e^w − 1| ≤ O(1) · |w| · max{1, e^{ℜ(w)}}. The rest then follows from the observations that

    e^w − ∑_{j=0}^{d} w^j/j! = ∫_0^w (e^z − ∑_{j=0}^{d−1} z^j/j!) dz



and

    |∫_0^w (|z|^{d−1}/(d−1)!) · max{1, e^{ℜ(z)}} dz| ≤ (|w|^d/d!) · max{1, e^{ℜ(w)}}.
We now prove the inequality stated at the beginning of the proof. Note that |e^w| = e^{ℜ(w)}. Suppose |w| > 1. If ℜ(w) > 0, then we have

    |e^w − 1| / (|w| · max{1, e^{ℜ(w)}}) ≤ (e^{ℜ(w)} + 1)/e^{ℜ(w)} ≤ 2.

Otherwise, ℜ(w) ≤ 0 and we have

    |e^w − 1| / (|w| · max{1, e^{ℜ(w)}}) ≤ e^{ℜ(w)} + 1 ≤ 2.

Now suppose |w| ≤ 1. Then |e^w − 1| = |w ∫_0^1 e^{tw} dt| ≤ |w| · e.



Lemma 4.5. For any random variable W ∈ C, |e^{E[W]}| ≤ E[|e^W|].

Proof. By Jensen's inequality (convexity of the real exponential), we have

    |e^{E[W]}| = e^{E[ℜ(W)]} ≤ E[e^{ℜ(W)}] = E[|e^W|].

Claim 4.6. |e^{z_1} − e^{z_2}| ≤ |e^{z_2}| · O(|z_1 − z_2|) whenever |z_1 − z_2| ≤ 1.

Proof. By Lemma 4.4 with d = 1,

    |e^{z_1−z_2} − 1| ≤ O(1) · |z_1 − z_2| · max{1, e^{ℜ(z_1−z_2)}} = O(|z_1 − z_2|),

because ℜ(z_1 − z_2) ≤ |z_1 − z_2| ≤ 1. Therefore,

    |e^{z_1} − e^{z_2}| = |e^{z_2}(e^{z_1−z_2} − 1)| = |e^{z_2}| · |e^{z_1−z_2} − 1| ≤ |e^{z_2}| · O(|z_1 − z_2|).

Claim 4.7. Let X, Y ∈ Ω be two discrete random variables such that sd(X, Y) ≤ ε. Let f : Ω → C be any function. We have |E[f(X)] − E[f(Y)]| ≤ 2 max_z |f(z)| · sd(X, Y).

Proof. Let p and q be the probability mass functions of X and Y. Using the fact that sd(X, Y) = (1/2) ∑_z |p(z) − q(z)|, we have

    |E[f(X)] − E[f(Y)]| = |∑_z p(z) f(z) − ∑_z q(z) f(z)|
                        ≤ ∑_z |f(z)| · |p(z) − q(z)|
                        ≤ max_z |f(z)| · ∑_z |p(z) − q(z)|
                        = 2 max_z |f(z)| · sd(X, Y).
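    Claim 4.7 can be checked directly on small finite distributions. The Python sketch below draws two random distributions on a domain of size 6 and a random bounded complex-valued function, then compares the two sides; all parameters are arbitrary and for illustration only.

    import random

    random.seed(1)
    n = 6

    def random_distribution():
        w = [random.random() for _ in range(n)]
        s = sum(w)
        return [x / s for x in w]

    p, q = random_distribution(), random_distribution()
    f = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(n)]

    sd = 0.5 * sum(abs(a - b) for a, b in zip(p, q))
    lhs = abs(sum(a * z for a, z in zip(p, f)) - sum(b * z for b, z in zip(q, f)))
    rhs = 2 * max(abs(z) for z in f) * sd
    print(lhs <= rhs + 1e-12, lhs, rhs)   # the claimed bound holds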



Claim 4.8 (Maclaurin's inequality (cf. [35])). Let z_1, . . . , z_k be k non-negative real numbers. For any i ∈ {0, . . . , k}, we have

    S_i(z_1, . . . , z_k) := ∑_{S:|S|=i} ∏_{j∈S} z_j ≤ (e/i)^i (∑_{j=1}^k z_j)^i.
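    Maclaurin's inequality can likewise be verified by brute force for small k; the sketch below compares S_i(z) with (e/i)^i (∑_j z_j)^i for random non-negative inputs (the parameters k = 8 and i = 3 are arbitrary).

    import itertools
    import random
    from math import e, prod

    random.seed(2)
    k, i = 8, 3
    z = [random.random() for _ in range(k)]

    # S_i(z): sum over all i-element subsets of the product of the chosen z_j.
    S_i = sum(prod(c) for c in itertools.combinations(z, i))
    bound = (e / i) ** i * sum(z) ** i
    print(S_i <= bound, S_i, bound)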


4.2     Proof of Lemma 2.1
We now prove Lemma 2.1. Recall that for independent random variables Z_1, . . . , Z_k, we use σ(Z) to denote (∑_{i=1}^k Var[Z_i])^{1/2}. We will also denote σ²/d by v for notational simplicity. As hinted in the overview above, throughout the proof we will assume Var[X_i] ≤ v = σ²/d for every i ∈ {1, . . . , k}. This assumption will be used in the proof of Lemma 4.13 to give a uniform bound on how close the rounded X_i's and the X_i's are in expectation. We will show how this assumption can be removed right after the proof of Lemma 4.20.

4.2.1    Assuming the variances are not too small
We first prove a claim showing that the Y_i remain close to the X_i even if we condition on the values of a few of the Y_i's. This claim will be used multiple times throughout the proof. Note that the claim is immediate for exact independence (ε = 0) but less so for almost independence. We shall use the assumption that each X_i takes any value in its support with probability at least 2^{−ℓ}.
Claim 4.9. Let X1 , X2 , . . . , Xk be k independent random variables over C≤1 . Let Y1 ,Y2 , . . . ,Yk be k random
variables over C≤1 that are (ε, d)-close to X1 , X2 , . . . , Xk . Let S ⊆ {1, . . . , k} be a subset of size t. Suppose
for every i ∈ S and zi ∈ Supp(Xi ), we have Pr[Xi = zi ] ≥ 2−` . Then conditioned on any values of the Yi for
i ∈ S, the Yi for i 6∈ S are (2t` ε, d − t)-close to the Xi for i 6∈ S.
Proof. For every subset W ⊆ [k], denote by Z_W = z_W the event ∧_{j∈W} (Z_j = z_j). Let T ⊆ [k] − S be a subset of size at most d − t. For every choice of z we have

    |Pr[Y_T = z_T | Y_S = z_S] − Pr[X_T = z_T]|
    = |Pr[Y_{S∪T} = z_{S∪T}]/Pr[Y_S = z_S] − Pr[X_{S∪T} = z_{S∪T}]/Pr[X_S = z_S]|
    ≤ |1/Pr[Y_S = z_S] − 1/Pr[X_S = z_S]| · Pr[Y_{S∪T} = z_{S∪T}] + |Pr[Y_{S∪T} = z_{S∪T}] − Pr[X_{S∪T} = z_{S∪T}]| / Pr[X_S = z_S].

Summing over all possible choices of z_T we have

    ∑_{z_T} |Pr[Y_T = z_T | Y_S = z_S] − Pr[X_T = z_T]|
    ≤ |1/Pr[Y_S = z_S] − 1/Pr[X_S = z_S]| · Pr[Y_S = z_S] + ε / Pr[X_S = z_S]
    = |Pr[X_S = z_S] − Pr[Y_S = z_S]| / Pr[X_S = z_S] + ε / Pr[X_S = z_S]
    ≤ 2ε / Pr[X_S = z_S],


where the inequalities follow because |S ∪ T| ≤ d and the variables Y_i are (ε, d)-close to the X_i. Since the X_i are independent and Pr[X_i = z_i] ≥ 2^{−ℓ}, we have Pr[X_S = z_S] ≥ 2^{−tℓ}. This completes the proof.

4.2.2   Assuming the variables are close to their expectations and the expectations are large
Lemma 4.10. Let X_1, X_2, . . . , X_k be k independent discrete random variables over C≤1. Let Y_1, Y_2, . . . , Y_k be k discrete random variables over C≤1 that are (ε, d)-close to X_1, . . . , X_k. Assume for each X_i and Y_i there exist Z_i and Z_i′ such that

    X_i = E[X_i](1 + Z_i) and Y_i = E[X_i](1 + Z_i′),

where |Z_i| ≤ B ≤ 1/2 and |Z_i′| ≤ B ≤ 1/2. Then for every σ_Z ≥ σ(Z),

    |E[∏_{i=1}^k X_i] − E[∏_{i=1}^k Y_i]| ≤ 2^{O(d)} ((σ_Z √d + Bd)/d)^d + (kB)^{O(d)} ε.

Remark 4.11. Note that we define Z_i′ above in terms of E[X_i] and not E[Y_i]. The random variables Z_i are independent, but the variables Z_i′ may not be. Also, later we will take B to be v^{1/3}.

Proof. Define W_i and Ŵ_i by

    W_i = log(1 + Z_i) and Ŵ_i = W_i − E[W_i].

Also define W_i′ and Ŵ_i′ by

    W_i′ = log(1 + Z_i′) and Ŵ_i′ = W_i′ − E[W_i].

Let Ŵ = ∑_i Ŵ_i and Ŵ′ = ∑_i Ŵ_i′. Note that X_i = E[X_i] e^{Ŵ_i + E[W_i]} and Y_i = E[X_i] e^{Ŵ_i′ + E[W_i]}. We have

    ∏_{i=1}^k X_i = (∏_{i=1}^k E[X_i] e^{E[W_i]}) · e^{Ŵ}  and  ∏_{i=1}^k Y_i = (∏_{i=1}^k E[X_i] e^{E[W_i]}) · e^{Ŵ′}.

Hence the difference is

    ∏_{i=1}^k X_i − ∏_{i=1}^k Y_i = (∏_{i=1}^k E[X_i] e^{E[W_i]}) (e^{Ŵ} − e^{Ŵ′}).

We rewrite e^{Ŵ} − e^{Ŵ′} as a sum of 3 terms:

    e^{Ŵ} − e^{Ŵ′} = (e^{Ŵ} − ∑_{j=0}^{d−1} Ŵ^j/j!) + ∑_{j=0}^{d−1} (Ŵ^j − Ŵ′^j)/j! + (∑_{j=0}^{d−1} Ŵ′^j/j! − e^{Ŵ′}).


It suffices to bound above the expectation of each term multiplied by γ := ∏_{i=1}^k E[X_i] e^{E[W_i]}. We bound the first and last terms using Taylor's approximation (Lemma 4.4), and the second term using the (ε, d)-closeness
of the variables. Specifically, we will show the following:

    |E[γ · (e^{Ŵ′} − ∑_{j=0}^{d−1} Ŵ′^j/j!)]| ≤ 2^{O(d)} ((σ_Z √d + Bd)/d)^d + (kB)^{O(d)} ε          (4.1)

    |E[γ · (e^{Ŵ} − ∑_{j=0}^{d−1} Ŵ^j/j!)]| ≤ 2^{O(d)} ((σ_Z √d + Bd)/d)^d          (4.2)

    |γ · E[∑_{j=0}^{d−1} (Ŵ^j − Ŵ′^j)/j!]| ≤ 2k^d ε.          (4.3)

For (4.1), by Lemma 4.4 we have

    |γ · (e^{Ŵ′} − ∑_{j=0}^{d−1} Ŵ′^j/j!)| ≤ |γ| · O(1) · (|Ŵ′|^d/d!) · max{1, e^{ℜ(Ŵ′)}}.

We now bound above |γ| · max{1, e^{ℜ(Ŵ′)}} by 1. We have
    |γ| = |∏_{i=1}^k E[X_i] e^{E[W_i]}|
        = |∏_{i=1}^k E[X_i]| · |e^{E[∑_i W_i]}|
        ≤ |∏_{i=1}^k E[X_i]| · E[|e^{∑_i W_i}|]          (Jensen's inequality, see Lemma 4.5)
        = E[|∏_{i=1}^k E[X_i] · e^{∑_i W_i}|]
        = E[|∏_{i=1}^k X_i|]
        ≤ 1.
Moreover,

    |γ · e^{ℜ(Ŵ′)}| = |∏_{i=1}^k E[X_i] e^{E[W_i]}| · e^{ℜ(Ŵ′)} = |∏_{i=1}^k E[X_i] e^{E[W_i]} e^{Ŵ′}| = |∏_{i=1}^k Y_i| ≤ 1.

Hence, it suffices to bound above E[|Ŵ′|^d]. Note that the Ŵ_i′'s are (ε, d)-close to the Ŵ_i's. So we bound above |Ŵ_i| and Var[Ŵ_i] and then apply the following lemma. (The same lemma and its proof can be found in Section 8 as Lemma 8.1.)
Lemma 4.12. Let Ŵ_1, Ŵ_2, . . . , Ŵ_k ∈ C be independent random variables with E[Ŵ_i] = 0 and |Ŵ_i| < B. Let d be an even positive integer. Let Ŵ_1′, Ŵ_2′, . . . , Ŵ_k′ ∈ C be random variables that are (ε, d)-close to Ŵ_1, . . . , Ŵ_k. Then

    E[|∑_{i=1}^k Ŵ_i′|^d] ≤ 4 · 2^d ((∑_i Var[Ŵ_i] · d)^{1/2} + dB)^d + 4(2kB)^d ε.




    First, since |Z_i| ≤ B, we have |W_i| ≤ 2B by Lemma 4.2, and so |Ŵ_i| ≤ |W_i| + |E[W_i]| ≤ 4B. Next, we have Var[Ŵ_i] ≤ 4 Var[Z_i] by Lemma 4.3, and so σ(Ŵ) ≤ 2σ(Z). Therefore, applying Lemma 4.12 to E[|Ŵ′|^d] and using the inequality d! ≥ (d/e)^d, we have

    E[|∏_{i=1}^k E[X_i] e^{E[W_i]}| · |e^{Ŵ′} − ∑_{j=0}^{d−1} Ŵ′^j/j!|] ≤ O(1) · E[|Ŵ′|^d]/d!
        ≤ 2^{O(d)} ((σ(Ŵ) √d + Bd)/d)^d + (kB)^{O(d)} ε
        ≤ 2^{O(d)} ((σ_Z √d + Bd)/d)^d + (kB)^{O(d)} ε.          (4.4)


    Inequality (4.2) follows from Inequality (4.4) by considering ε = 0:

    E[|∏_{i=1}^k E[X_i] e^{E[W_i]}| · |e^{Ŵ} − ∑_{j=0}^{d−1} Ŵ^j/j!|] ≤ 2^{O(d)} ((σ_Z √d + Bd)/d)^d.


    Finally we prove Inequality (4.3). By linearity of expectation,

    E[∑_{j=0}^{d−1} (Ŵ^j − Ŵ′^j)/j!] = ∑_{j=0}^{d−1} (E_X[Ŵ^j] − E_Y[Ŵ′^j])/j!.

Note that Ŵ^j = (∑_i Ŵ_i)^j can be written as a sum of k^j terms, where each term is a product of at most j ≤ d different W_i's. Moreover, we have |W_i| ≤ 2B ≤ 1 for each i by Lemma 4.2. So by Claim 4.7 we have |E[Ŵ^j] − E[Ŵ′^j]| ≤ 2k^j ε. Hence,

    |E[∑_{j=0}^{d−1} (Ŵ^j − Ŵ′^j)/j!]| ≤ ∑_{j=0}^{d−1} |E[Ŵ^j] − E[Ŵ′^j]| ≤ 2 ∑_{j=0}^{d−1} k^j ε ≤ 2k^d ε.

Recalling that |γ| ≤ 1, this proves (4.3).


4.2.3   Assuming large expectations and small variances

We now prove the main lemma assuming each random variable Xi has expectation far from zero and small
variance.


Lemma 4.13. Let σ be a real number. Let X_1, X_2, . . . , X_k be k independent random variables over C≤1 such that for every i ∈ {1, . . . , k} and z_i ∈ Supp(X_i) we have Pr[X_i = z_i] ≥ 2^{−ℓ}. Let Y_1, Y_2, . . . , Y_k be k random variables over C≤1 that are (ε, 9d)-close to X_1, . . . , X_k. Assume for each i we have |E[X_i]| ≥ v^{1/6} and Var[X_i] ≤ v, where v = σ²/d. We have

    |E[∏_{i=1}^k X_i] − E[∏_{i=1}^k Y_i]| ≤ 2^{O(d)} v^d + (k2^ℓ)^{O(d)} ε.


Proof. We will assume that v is less than a sufficiently small constant and that ε ≤ (k2^ℓ)^{−Cd} for a sufficiently large constant C; otherwise the right-hand side of the inequality is greater than 2 and there is nothing to prove.
    For each i ∈ {1, 2, . . . , k}, we define a function rd_i : C≤1 → C≤1 that will be used to round the variables X_i and Y_i. We define rd_i as

    rd_i(z) := z if |z − E[X_i]| ≤ v^{1/3}, and rd_i(z) := E[X_i] otherwise.

Let X̃_i = rd_i(X_i) and Ỹ_i = rd_i(Y_i). We will write both ∏_i X_i and ∏_i Y_i as

    ∏_{i=1}^k X_i = ∏_{i=1}^k (X_i − X̃_i + X̃_i) = ∑_{S⊆{1,2,...,k}} ∏_{i∈S} (X_i − X̃_i) ∏_{i∉S} X̃_i,

and

    ∏_{i=1}^k Y_i = ∏_{i=1}^k (Y_i − Ỹ_i + Ỹ_i) = ∑_{S⊆{1,2,...,k}} ∏_{i∈S} (Y_i − Ỹ_i) ∏_{i∉S} Ỹ_i.

Let m = 3d. Define

    P_m(z_1, z_2, . . . , z_k) = ∑_{|S|<m} ∏_{i∈S} (z_i − rd_i(z_i)) ∏_{i∉S} rd_i(z_i).

We will prove two claims below. Claim 4.14 shows that P_m is a good approximation of the product in expectation under both the Y_i's and the X_i's (by setting ε to 0). Claim 4.15 shows that the expectations of P_m under the X_i's and the Y_i's are close.

Claim 4.14. |E[∏_i Y_i − P_m(Y_1, . . . , Y_k)]| ≤ 2^{O(d)} v^d + k^{O(d)} ε.

Claim 4.15. |E[P_m(X_1, . . . , X_k)] − E[P_m(Y_1, . . . , Y_k)]| ≤ 2^{O(d)} v^d + (k2^ℓ)^{O(d)} ε.
      Combining these two claims proves Lemma 4.13.

    We now prove Claims 4.14 and 4.15. We will use the following inequalities repeatedly. Recall that v = σ²/d.

Claim 4.16. Pr[X̃_i ≠ X_i] ≤ Var[X_i] v^{−2/3} ≤ v^{1/3}. In particular, ∑_i Pr[X̃_i ≠ X_i] ≤ (dσ)^{2/3}.

Proof. The first inequality follows from Chebyshev's inequality, and the second follows from the assumption Var[X_i] ≤ v. The last sentence follows by summing the first inequality over i.
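    The rounding map and the estimate of Claim 4.16 are easy to visualize by simulation. In the Python sketch below, `rd` is the map rd_i defined above, and X_i is modeled by a toy real-valued variable concentrated near its mean; this choice is only for illustration and is not part of the formal argument.

    import random

    def rd(z, mean, radius):
        # rd_i: keep z if it lies within `radius` of E[X_i], else round to E[X_i].
        return z if abs(z - mean) <= radius else mean

    random.seed(3)
    v = 0.01
    mean, sigma = 0.8, 0.05          # toy X_i: mean plus a small fluctuation
    radius = v ** (1 / 3)            # rounding radius v^(1/3)

    trials = 100000
    rounded = 0
    for _ in range(trials):
        x = random.gauss(mean, sigma)    # illustrative stand-in for X_i
        if rd(x, mean, radius) != x:
            rounded += 1

    chebyshev = sigma ** 2 * v ** (-2 / 3)   # Var[X_i] * v^(-2/3), as in Claim 4.16
    print(rounded / trials, chebyshev)       # empirical rate vs. Chebyshev bound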


Proof of Claim 4.14. Consider the product ∏_{i∈S} (Y_i − Ỹ_i). Let N′ be the number of i ∈ {1, 2, . . . , k} such that Ỹ_i ≠ Y_i. If N′ < m then any set S of size at least m must contain an i such that Ỹ_i = Y_i. In this case the product is 0, and thus

    ∏_i Y_i − P_m(Y_1, . . . , Y_k) = ∑_{|S|≥m} ∏_{i∈S} (Y_i − Ỹ_i) ∏_{i∉S} Ỹ_i = 0.



So,

    |E[∏_i Y_i − P_m(Y_1, . . . , Y_k)]| = |E[1(N′ ≥ m) · (∏_i Y_i − P_m(Y_1, . . . , Y_k))]|
        ≤ E[1(N′ ≥ m) · (|∏_i Y_i| + |P_m(Y_1, . . . , Y_k)|)]
        = E[1(N′ ≥ m) · |∏_i Y_i|] + E[1(N′ ≥ m) · |P_m(Y_1, . . . , Y_k)|].


If N′ ≥ m, then at most

    ∑_{s=0}^{m−1} (N′ choose s) ≤ ∑_{s=0}^{m−1} (m choose s) · (N′ choose m) ≤ 2^m · (N′ choose m)

of the subsets in the sum defining P_m have a nonzero product, and each such product has magnitude at most 2^m because |S| < m. Thus,

    1(N′ ≥ m) · |P_m(Y_1, . . . , Y_k)| ≤ 1(N′ ≥ m) · 2^m ∑_{s=0}^{m−1} (N′ choose s)
                                        ≤ 1(N′ ≥ m) · 2^m · 2^m (N′ choose m)
                                        ≤ 2^{2m} (N′ choose m).


Therefore,

    E[1(N′ ≥ m) · (|∏_i Y_i| + |P_m(Y_1, . . . , Y_k)|)] ≤ E[1(N′ ≥ m) · |∏_i Y_i|] + 2^{2m} E[(N′ choose m)]
                                                         ≤ E[1(N′ ≥ m)] + 2^{2m} E[(N′ choose m)].

Claim 4.17. Pr[N′ ≥ m] ≤ E[(N′ choose m)] ≤ v^d + k^{O(d)} ε.



Proof. The first inequality is clear. To see the second one, note that

    E[(N′ choose m)] ≤ ∑_{|S|=m} Pr[∧_{i∈S} (Y_i ≠ Ỹ_i)]
                     ≤ ∑_{|S|=m} (∏_{i∈S} Pr[X_i ≠ X̃_i] + ε)          (the Y_i are (ε, m)-close to the X_i)
                     ≤ ∑_{|S|=m} ∏_{i∈S} Pr[X_i ≠ X̃_i] + k^m ε
                     ≤ (e · ∑_{i=1}^k Pr[X_i ≠ X̃_i] / m)^m + k^m ε          (Maclaurin's inequality)
                     ≤ (e · (dσ)^{2/3} / 3d)^{3d} + k^m ε          (Claim 4.16)
                     ≤ v^d + k^{O(d)} ε.

    Therefore,

    E[1(N′ ≥ m) · |∏_i Y_i − P_m(Y_1, . . . , Y_k)|] ≤ E[1(N′ ≥ m)] + 2^{2m} E[(N′ choose m)]
                                                     ≤ (1 + 2^{6d})(v^d + k^{O(d)} ε)          (m = 3d)
                                                     ≤ 2^{O(d)} v^d + k^{O(d)} ε,

proving the claim.

   We have just shown that Pm approximates the product well under both Xi and Yi in expectation. It
remains to show that Pm (Y1 , . . . ,Yk ) is close to Pm (X1 , . . . , Xk ) in expectation.

Proof of Claim 4.15. The difference between P_m(X_1, . . . , X_k) and P_m(Y_1, . . . , Y_k) equals

    P_m(X_1, . . . , X_k) − P_m(Y_1, . . . , Y_k) = ∑_{|S|<m} (∏_{i∈S} (X_i − X̃_i) ∏_{i∉S} X̃_i − ∏_{i∈S} (Y_i − Ỹ_i) ∏_{i∉S} Ỹ_i).

We can rewrite the right-hand side as

    ∑_{|S|<m} ((∏_{i∈S} (X_i − X̃_i) − ∏_{i∈S} (Y_i − Ỹ_i)) ∏_{i∉S} X̃_i + ∏_{i∈S} (Y_i − Ỹ_i) (∏_{i∉S} X̃_i − ∏_{i∉S} Ỹ_i)).

It suffices to show that

    |E[∑_{|S|<m} (∏_{i∈S} (X_i − X̃_i) − ∏_{i∈S} (Y_i − Ỹ_i)) ∏_{i∉S} X̃_i]| ≤ k^{O(d)} ε          (4.5)

    |E[∑_{|S|<m} ∏_{i∈S} (Y_i − Ỹ_i) (∏_{i∉S} X̃_i − ∏_{i∉S} Ỹ_i)]| ≤ 2^{O(d)} v^d + (k2^ℓ)^{O(d)} ε.          (4.6)




We first prove Inequality (4.5). Because the X_i's are independent, the left-hand side of the inequality equals

    |∑_{|S|<m} (E[∏_{i∈S} (X_i − X̃_i)] − E[∏_{i∈S} (Y_i − Ỹ_i)]) · E[∏_{i∉S} X̃_i]|
        ≤ ∑_{s=1}^{m−1} ∑_{|S|=s} |E[∏_{i∈S} (X_i − X̃_i)] − E[∏_{i∈S} (Y_i − Ỹ_i)]| · |E[∏_{i∉S} X̃_i]|
        ≤ ∑_{s=1}^{m−1} ∑_{|S|=s} |E[∏_{i∈S} (X_i − X̃_i)] − E[∏_{i∈S} (Y_i − Ỹ_i)]|
        ≤ ∑_{s=1}^{m−1} ∑_{|S|=s} 2 · 2^s ε
        ≤ ∑_{s=1}^{m−1} k^s · 2 · 2^s ε
        ≤ 2(2k)^m ε
        = k^{O(d)} ε.

To see the third inequality, note that |z − rd_i(z)| ≤ 2, and so |∏_{i∈S} (z_i − rd_i(z_i))| ≤ 2^{|S|}. So we can apply Claim 4.7 to bound the absolute difference above by 2 · 2^{|S|} ε.

    Now we prove Inequality (4.6). As the Y_i's are (ε, 9d)-close to the X_i's, for each S with |S| ≤ m = 3d, if we condition on the values of the Ỹ_i for i ∈ S, then by Claim 4.9 the remaining Ỹ_i's for i ∉ S are still (2^{O(m·ℓ)} ε, 6d)-close to their corresponding X̃_i's. (Recall that we can assume ε ≤ (k2^ℓ)^{−Cd} for a sufficiently large C.) We will apply Lemma 4.10 to these X̃_i and Ỹ_i.
    Define Z_i and Z_i′ such that X̃_i = E[X̃_i](1 + Z_i) and Ỹ_i = E[X̃_i](1 + Z_i′). To apply Lemma 4.10, we need the following two claims to bound above |Z_i|, |Z_i′| and σ(Z)². We defer their proofs to the end.


Claim 4.18. Let B = 6v^{1/6}. Then |Z_i| ≤ B and |Z_i′| ≤ B.

Claim 4.19. σ(Z)² ≤ 4σ(X)² v^{−1/3} ≤ 4σ² v^{−1/3}.


    Therefore, by Lemma 4.10 applied with ε′ = 2^{O(m·ℓ)} ε and B = 6v^{1/6} ≤ 1/2 (recall that we can assume v is less than a sufficiently small constant),

    |E[∑_{|S|<m} ∏_{i∈S} (Y_i − Ỹ_i) (∏_{i∉S} X̃_i − ∏_{i∉S} Ỹ_i)]| ≤ ∑_{|S|<m} E[|∏_{i∈S} (Y_i − Ỹ_i)|] · M,

where

    M ≤ 2^{O(d)} ((σ v^{−1/6} √d + dB)/d)^{6d} + (Bk)^{O(d)} ε′
      ≤ 2^{O(d)} (v^{1/3} + B)^{6d} + (Bk2^ℓ)^{O(d)} ε
      = 2^{O(d)} (v^{1/3} + v^{1/6})^{6d} + (k2^ℓ)^{O(d)} ε
      = 2^{O(d)} v^d + (k2^ℓ)^{O(d)} ε.
    We now bound above E[|∏_{i∈S} (Y_i − Ỹ_i)|]. Note that |∏_{i∈S} (z_i − rd_i(z_i))| ≤ 2^{|S|}. Hence by Claim 4.7,

    E[|∏_{i∈S} (Y_i − Ỹ_i)|] ≤ E[|∏_{i∈S} (X_i − X̃_i)|] + 2 · 2^{|S|} ε.

Let N be the number of i ∈ {1, . . . , k} such that X̃_i ≠ X_i. Note that

    ∑_{|S|<m} E[|∏_{i∈S} (X_i − X̃_i)|] ≤ ∑_{s=0}^{m−1} 2^s E[(N choose s)]
        ≤ 2^m E[2^N]
        = 2^m ∏_{i=1}^k (1 + Pr[X_i ≠ X̃_i])
        ≤ 2^m e^{∑_i Pr[X_i ≠ X̃_i]}
        ≤ 2^m e^{(dσ)^{2/3}}          (Claim 4.16)
        ≤ 2^{O(d)},

where the last inequality holds because σ²/d ≤ 1 and so σ^{2/3} ≤ d^{1/3}. Therefore,

    ∑_{|S|<m} E[|∏_{i∈S} (Y_i − Ỹ_i)|] ≤ 2^{O(d)} + 2 ∑_{|S|<m} 2^{|S|} ε ≤ 2^{O(d)} + 2(2k)^m ε ≤ 2^{O(d)},

where the last inequality holds because ε ≤ k^{−Cd} for a sufficiently large C. So altogether the bound is 2^{O(d)} · M ≤ 2^{O(d)} v^d + (k2^ℓ)^{O(d)} ε, proving Inequality (4.6). This completes the proof of Claim 4.15.
    We now prove Claims 4.18 and 4.19. By Claim 4.16, |E[X_i] − E[X̃_i]| ≤ 2v^{1/3}. Also, by assumption, |E[X_i]| ≥ v^{1/6}. So we have |E[X̃_i]| ≥ |E[X_i]|/2 ≥ v^{1/6}/2.
Proof of Claim 4.18. As |E[X̃_i]| ≥ v^{1/6}/2, we have

    |Z_i| = |X̃_i − E[X̃_i]| / |E[X̃_i]|
          ≤ (|X̃_i − E[X_i]| + |E[X̃_i] − E[X_i]|) / |E[X̃_i]|
          ≤ 6v^{1/3} / v^{1/6}
          = 6v^{1/6},

and the same argument holds for |Z_i′| because |Ỹ_i − E[X_i]| ≤ v^{1/3} by the definition of rd_i.

Proof of Claim 4.19. Since h* = E[H] is the minimizer of E[|H − h|²] over h, for any random variable H, we have

    Var[X̃_i] = E[|X̃_i − E[X̃_i]|²]
              ≤ E[|X̃_i − E[X_i]|²]
              ≤ E[|X_i − E[X_i]|²]          (X̃_i = rd_i(X_i))
              = Var[X_i].

Therefore, Var[Z_i] = Var[X̃_i]/|E[X̃_i]|² ≤ 4 Var[X_i] v^{−1/3}, and thus ∑_i Var[Z_i] ≤ 4σ² v^{−1/3}.

4.2.4   The general case
We first prove Lemma 2.1 assuming that Var[X_j] ≤ σ²/d for all j and that the Y_i's are only (ε, 15d)-close to the X_i's; later we will handle the general case. We will again assume that σ²/d is less than a sufficiently small constant and that ε ≤ (k2^ℓ)^{−Cd} for a sufficiently large constant C.
Lemma 4.20. Let X_1, X_2, . . . , X_k be k independent random variables over C≤1 such that for every i and z_i ∈ Supp(X_i) we have Pr[X_i = z_i] ≥ 2^{−ℓ}. Let Y_1, Y_2, . . . , Y_k be k random variables over C≤1. Suppose Var[X_i] ≤ v = σ²/d for every i ∈ [k] and the Y_i are (ε, 15d)-close to the X_i. Then for every σ ≥ (∑_{i=1}^k Var[X_i])^{1/2},

    |E[∏_{i=1}^k Y_i] − E[∏_{i=1}^k X_i]| ≤ 2^{O(d)} (σ²/d)^{d/2} + (k2^ℓ)^{O(d)} ε.

Proof. Let m be the number of i such that |E[X_i]| ≤ v^{1/6}. If m ≤ 6d, let J be the set of indices i for which |E[X_i]| ≤ v^{1/6}. We can write

    ∏_i X_i − ∏_i Y_i = (∏_{j∈J} X_j − ∏_{j∈J} Y_j) ∏_{j∉J} X_j + ∏_{j∈J} Y_j (∏_{j∉J} X_j − ∏_{j∉J} Y_j).

    It suffices to show that

    |E[(∏_{j∈J} X_j − ∏_{j∈J} Y_j) ∏_{j∉J} X_j]| ≤ ε          (4.7)

    |E[∏_{j∈J} Y_j (∏_{j∉J} X_j − ∏_{j∉J} Y_j)]| ≤ 2^{O(d)} v^d + (k2^ℓ)^{O(d)} ε.          (4.8)

    We first show Inequality (4.7). Since the X_i are independent and the Y_i are (ε, 15d)-close to the X_i, the left-hand side of (4.7) is

    |E[∏_{j∈J} X_j] − E[∏_{j∈J} Y_j]| · |E[∏_{j∉J} X_j]| ≤ |E[∏_{j∈J} X_j] − E[∏_{j∈J} Y_j]| ≤ ε.



    To prove Inequality (4.8), note that conditioned on the values of the Y_i's for which i ∈ J, by Claim 4.9, the rest of the Y_i's are still (2^{O(dℓ)} ε, 9d)-close to the corresponding X_i's, which satisfy |E[X_i]| ≥ v^{1/6}. (Recall that we can assume ε ≤ (k2^ℓ)^{−Cd} for a sufficiently large C.) So the bound follows from Lemma 4.13.
    If m ≥ 6d, then note that

    |E[∏_{i=1}^k X_i]| = ∏_{i=1}^k |E[X_i]| ≤ v^{m/6} ≤ v^d.

So it suffices to show that

    |E[∏_{i=1}^k Y_i]| ≤ 2^{O(d)} v^{d/2} + k^{O(d)} ε.

Consider the event E that at least 3d of the Y_i for i ∈ J have absolute value less than 2v^{1/6}. Conditioned on E, we have

    |∏_{i=1}^k Y_i| ≤ 2^{3d} · v^{d/2}.

We will show that E happens except with probability at most v^{2d} + k^{3d} ε. Let N ∈ {0, 1, 2, . . . , m} be the number of i ∈ J such that |Y_i| ≥ 2v^{1/6}. Note that

    Pr[N ≥ 3d] ≤ ∑_{S⊆J:|S|=3d} Pr[∧_{i∈S} (|Y_i| ≥ 2v^{1/6})]
               ≤ ∑_{S⊆J:|S|=3d} ∏_{i∈S} Pr[|X_i| ≥ 2v^{1/6}] + k^{3d} ε.

By Chebyshev's inequality, for i ∈ J (using that |E[X_i]| ≤ v^{1/6}),

    Pr[|X_i| ≥ 2v^{1/6}] ≤ Pr[|X_i − E[X_i]| ≥ v^{1/6}] ≤ Var[X_i] v^{−1/3}.

Hence, by Maclaurin's inequality,

    ∑_{S⊆J:|S|=3d} ∏_{i∈S} Pr[|X_i| ≥ 2v^{1/6}] ≤ (e · ∑_{i∈J} Pr[|X_i| ≥ 2v^{1/6}] / 3d)^{3d}
                                                ≤ (e · ∑_{i∈J} Var[X_i] v^{−1/3} / 3d)^{3d}
                                                ≤ (e · σ² v^{−1/3} / 3d)^{3d}
                                                ≤ v^{2d}.

So,

    Pr[N ≥ 3d] ≤ v^{2d} + k^{3d} ε.


Therefore,

    |E[∏_i Y_i]| ≤ 2^{3d} v^{d/2} + v^{2d} + k^{3d} ε ≤ 2^{O(d)} v^{d/2} + k^{O(d)} ε.

    We now use Lemma 4.20 to prove Lemma 2.1, paying a small overhead in the (ε, d)-closeness between the random variables X_i and Y_i to remove the assumption that Var[X_i] ≤ v = σ²/d for each i.

Proof of Lemma 2.1. Note that there can be at most d indices i for which Var[X_i] > v. Let J be the set of these indices. We have

    ∏_i X_i − ∏_i Y_i = ∏_{j∈J} X_j ∏_{i∉J} X_i − ∏_{j∈J} Y_j ∏_{i∉J} Y_i
                      = (∏_{j∈J} X_j − ∏_{j∈J} Y_j) ∏_{i∉J} X_i + ∏_{j∈J} Y_j (∏_{i∉J} X_i − ∏_{i∉J} Y_i).

We first bound the expectation of the first term. Since the X_i's are independent,

    |E_{X,Y}[(∏_{j∈J} X_j − ∏_{j∈J} Y_j) ∏_{i∉J} X_i]| = |E[∏_{j∈J} X_j] − E[∏_{j∈J} Y_j]| · |E[∏_{i∉J} X_i]|
                                                       ≤ |E[∏_{j∈J} X_j] − E[∏_{j∈J} Y_j]|
                                                       ≤ ε.

For the second term, note that conditioned on the values of the Y_j for which j ∈ J, by Claim 4.9, the remaining variables are (2^{dℓ} ε, 15d)-close to the corresponding X_j, and Var[X_i] ≤ v for each of them. So we can apply Lemma 4.20, and this completes the proof.


5    Improved bound for bounded independence plus noise fools products
In this section we prove Theorem 1.13, which improves the error bound in Theorem 1.11 from 2^{−Ω(d)} to ℓ^{−Ω(d)}, and Theorem 1.15, which gives the optimal error bound for nice product tests. The proof of Theorem 1.13 requires developing a few additional technical tools. We first outline the high-level idea of how to obtain the improvement.
    For simplicity, we will assume d = O(1) and show how to obtain an error bound of ℓ^{−Ω(1)}. Recall in the proof of Theorem 1.11 (see also Table 1) that we used a win-win argument on the total variance: we applied two different arguments depending on whether the total variance of a product test f is above or below a certain threshold. Suppose now the total variance of f is guaranteed to lie outside the interval [ℓ^{−0.1}, ℓ^{0.1}]. Then applying the same arguments as before would already give us an error of ℓ^{−Ω(1)}. So it suffices to handle the additional case, where the total variance lies in the range [ℓ^{−0.1}, ℓ^{0.1}]. Our goal is to use noise to reduce the total variance down to ℓ^{−0.1}, which can then be handled by the low-total-variance
                             T HEORY OF C OMPUTING, Volume 16 (7), 2020, pp. 1–50                                                      32
         M ORE ON B OUNDED I NDEPENDENCE P LUS N OISE : PRG S FOR R EAD -O NCE P OLYNOMIALS

argument. To achieve this, as a first step we will handle the functions fi with (individual) variance above
and below `−0.6 separately, and show that O(`)-wise independence plus noise fools the product of the fi
in each case.
     For the former, note that since the total variance is ≤ `0.1 , there can be at most `0.7 functions with
variances above `−0.6 . In this case we can simply apply the result in [20] (Theorem 1.10). To prove the
latter case, we use noise to reduce the variance of each function. Specifically, we use the hypercontractivity
theorem to show that applying the noise operator to a function reduces its variance from σ 2 to (σ 2 )(4/3) .
This is proved in Section 5.1 below. Hence, on average over the noise, the variance σi2 of each fi is
reduced to at most (`−0.6 )1/3 σi2 , and so the total variance of the fi is at most (`−0.6 )1/3 · `0.1 = `−0.1 and
we can argue as before. To combine the two cases, we prove a new XOR Lemma for bounded independent
distributions, inspired by a similar lemma for small-bias distributions which is proved in [16], and the
theorem follows.

5.1     Noise reduces variance of bounded complex-valued functions
In this section, we show that on average, noise reduces the variance of bounded complex-valued functions.
We will use the hypercontractivity theorem for complex-valued functions (cf. [21, Theorem 6.1.8]).
    Let f : {0,1}^n → C be any function. For every ρ ∈ [0,1], define the noise operator T_ρ by T_ρ f(x) := E_N[f(x + N)], where N = (N_1, ..., N_n) is a random variable over {0,1}^n whose coordinates N_i are independent with E[N_i] = (1 − ρ)/2 for all i, so that E[(−1)^{N_i}] = ρ.
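To make the definition concrete, here is a small Python sketch (ours, not from the paper) that computes T_ρ f by brute-force enumeration on four bits and checks that T_ρ scales each Fourier coefficient f̂_α by ρ^{|α|}, the fact used repeatedly below.

    import itertools, random

    n, rho = 4, 0.5
    points = list(itertools.product([0, 1], repeat=n))
    f = {x: random.uniform(-1, 1) for x in points}      # an arbitrary bounded test function

    def chi(alpha, x):                                  # the character (-1)^{<alpha, x>}
        return (-1) ** sum(a & b for a, b in zip(alpha, x))

    def fourier(g):
        return {a: sum(g[x] * chi(a, x) for x in points) / 2 ** n for a in points}

    def T(g):                                           # T_rho g(x) = E_N[g(x + N)], Pr[N_i = 1] = (1 - rho)/2
        return {x: sum(g[tuple(xi ^ ni for xi, ni in zip(x, nv))]
                       * ((1 - rho) / 2) ** sum(nv) * ((1 + rho) / 2) ** (n - sum(nv))
                       for nv in points)
                for x in points}

    fh, tfh = fourier(f), fourier(T(f))
    assert all(abs(tfh[a] - rho ** sum(a) * fh[a]) < 1e-9 for a in points)
    print("T_rho multiplies each Fourier coefficient fhat_alpha by rho^{|alpha|}")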
Theorem 5.1 (Hypercontractivity Theorem). Let q ∈ [2, ∞). Then for any ρ ∈ [0, 1/√(q − 1)],

    E[|T_ρ f(x)|^q]^{1/q} ≤ E[|f(x)|²]^{1/2}.

      We will use the following well-known corollary.
Corollary 5.2. Let f : {0,1}^n → C. Then

    E[|T_ρ f(x)|²] ≤ E[|f(x)|^{1+ρ²}]^{2/(1+ρ²)}.

Proof. Writing z* for the complex conjugate of z,

    E_x[|T_ρ f(x)|²] = E_x[E_{N,N′}[f(x + N) f(x + N′)*]]
                     = E_x[E_{N,N′}[f(x) f(x + N + N′)*]]        (replacing x by x + N)
                     = E_x[f(x) · (E_{N,N′}[f(x + N + N′)])*]
                     = E_x[f(x) · (T_ρ T_ρ f(x))*]
                     ≤ E[|f(x)|^{1+ρ²}]^{1/(1+ρ²)} · E[|T_ρ T_ρ f(x)|^{1+1/ρ²}]^{1/(1+1/ρ²)}
                     ≤ E[|f(x)|^{1+ρ²}]^{1/(1+ρ²)} · E[|T_ρ f(x)|²]^{1/2}.

The first inequality follows from Hölder's inequality, because 1/(1+ρ²) + 1/(1+1/ρ²) = 1, and the second inequality follows from Theorem 5.1 with q = 1 + 1/ρ². Dividing both sides by E[|T_ρ f(x)|²]^{1/2} proves the corollary.

    Let T = (T_1, ..., T_m) be an m-bit random variable whose coordinates T_i are independent with E[T_i] = 1 − ρ for all i.

Claim 5.3. E_{T,U}[|E_{U′}[f(U + T ∧ U′)]|²] = E[|T_{√ρ} f(x)|²].

Proof. Expanding f in its Fourier basis, and again writing z* for complex conjugation,

    E_{T,U}[|E_{U′}[f(U + T ∧ U′)]|²] = E_T[∑_{α,α′} f̂_α (f̂_{α′})* E_U[χ_{α+α′}(U)] E_{U′}[χ_α(T ∧ U′)] (E_{U″}[χ_{α′}(T ∧ U″)])*]
                                      = ∑_α |f̂_α|² E_{T,U′,U″}[χ_α(T ∧ (U′ + U″))]
                                      = ∑_α |f̂_α|² ρ^{|α|} = E[|T_{√ρ} f(x)|²],

where the last equality follows from Parseval's identity, because the Fourier expansion of T_ρ f(x) is ∑_α f̂_α ρ^{|α|} χ_α(x) and hence E[|T_{√ρ} f(x)|²] = ∑_α |f̂_α|² ρ^{|α|}.
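Claim 5.3 can be verified exhaustively for small parameters. The following Python sketch (ours) compares both sides for a random complex-valued f on three bits, with Pr[T_i = 1] = 1 − ρ as above.

    import itertools, math, random

    n, rho = 3, 0.4
    pts = list(itertools.product([0, 1], repeat=n))
    f = {x: complex(random.uniform(-1, 1), random.uniform(-1, 1)) for x in pts}

    def prob(vec, p1):                       # Pr[vec] when each bit is 1 with probability p1
        w = sum(vec)
        return p1 ** w * (1 - p1) ** (n - w)

    def inner_avg(x, t):                     # E_{U'}[f(x + t /\ U')]
        return sum(f[tuple(xi ^ (ti & ui) for xi, ti, ui in zip(x, t, u))]
                   for u in pts) / 2 ** n

    lhs = sum(prob(t, 1 - rho) * abs(inner_avg(x, t)) ** 2
              for t in pts for x in pts) / 2 ** n

    def T(s, x):                             # T_s f(x), noise rate chosen so E[(-1)^{N_i}] = s
        return sum(prob(nv, (1 - s) / 2) * f[tuple(xi ^ ni for xi, ni in zip(x, nv))]
                   for nv in pts)

    rhs = sum(abs(T(math.sqrt(rho), x)) ** 2 for x in pts) / 2 ** n
    assert abs(lhs - rhs) < 1e-9
    print(f"both sides equal {lhs:.6f}")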
    We are now ready to prove that noise reduces the variance of a function. The main idea is to translate
the function to a point close to its mean so that its variance is close to its second moment, and then apply
Corollary 5.2 to it.
Lemma 5.4. Let f : {0,1}^n → C≤1 be any function. Let δ := min{|f(x) − f(x′)| : f(x) ≠ f(x′)}. Then

    E_T[Var_x[E_U[f(x + T ∧ U)]]] ≤ 4 · (2 Var[f]/δ²)^{2/(1+ρ)}.

Proof. We can assume Var[f] ≤ δ²/2; otherwise the conclusion is trivial. Let S be the support of f. For every y ∈ S, let p_y := Pr[f(x) = y]. Let µ := E[f] and σ² := Var[f]. Since σ² = E[|f(x) − µ|²], there is a point z ∈ S such that |z − µ|² ≤ σ². We have

    σ² = ∑_{y∈S} p_y |y − µ|² ≥ ∑_{y∈S: y≠z} p_y |y − µ|² ≥ (min_{y∈S: y≠z} |y − µ|²) ∑_{y∈S: y≠z} p_y.

Define g(x) := (f(x) − z)/2. For every t we have

    Var_x[E_U[f(x + t ∧ U)]] = 4 Var_x[E_U[g(x + t ∧ U)]] ≤ 4 E_x[|E_U[g(x + t ∧ U)]|²].

By Corollary 5.2, applied with noise parameter √ρ,

    E[|T_{√ρ} g|²] ≤ E[|g|^{1+ρ}]^{2/(1+ρ)} = (∑_{y∈S: y≠z} |(y − z)/2|^{1+ρ} p_y)^{2/(1+ρ)} ≤ (∑_{y∈S: y≠z} p_y)^{2/(1+ρ)},

because |y − y′| ≤ 2 for every y, y′ ∈ C≤1. So by Claim 5.3, we have

    E_{T,x}[|E_U[g(x + T ∧ U)]|²] = E[|T_{√ρ} g|²] ≤ (∑_{y∈S: y≠z} p_y)^{2/(1+ρ)}.


It follows from the above that

    E_T[Var_x[E_U[f(x + T ∧ U)]]] ≤ 4 E_{T,x}[|E_U[g(x + T ∧ U)]|²] ≤ 4 (∑_{y∈S: y≠z} p_y)^{2/(1+ρ)} ≤ 4 (Var[f] / min_{y∈S: y≠z} |y − µ|²)^{2/(1+ρ)}.

Now we bound min_{y∈S: y≠z} |y − µ|² from below. For every y ≠ z,

    δ² ≤ |y − z|² ≤ |y − µ|² + |µ − z|² ≤ |y − µ|² + σ².

Because σ² ≤ δ²/2, we have

    E_T[Var_x[E_U[f(x + T ∧ U)]]] ≤ 4 (Var[f]/(δ² − σ²))^{2/(1+ρ)} ≤ 4 (2 Var[f]/δ²)^{2/(1+ρ)}.
Remark 5.5. The dependence on δ is necessary. Consider a function f with support {0, ε}. Then f = εg, where g has support {0, 1}. We have Var[f] = ε² Var[g]. Applying noise to f is the same as applying noise to g, but g has no dependence on ε.
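As a numerical illustration of Lemma 5.4 (ours, not part of the proof), the following Python sketch computes E_T[Var_x E_U[f(x + T ∧ U)]] exactly for a biased ±1-valued function on six bits, where δ = 2, and compares it with Var[f] and the bound 4(2 Var[f]/δ²)^{4/3} corresponding to ρ = 1/2.

    import itertools

    n, rho = 6, 0.5
    pts = list(itertools.product([0, 1], repeat=n))

    def f(x):                                # +/-1-valued, hence delta = 2 in Lemma 5.4
        return -1.0 if sum(x) <= 1 else 1.0

    def var(vals):
        m = sum(vals) / len(vals)
        return sum(abs(v - m) ** 2 for v in vals) / len(vals)

    var_f = var([f(x) for x in pts])

    noisy = 0.0                              # E_T[ Var_x E_U[f(x + T /\ U)] ]
    for t in pts:
        pr_t = (1 - rho) ** sum(t) * rho ** (n - sum(t))     # E[T_i] = 1 - rho
        smooth = [sum(f(tuple(xi ^ (ti & ui) for xi, ti, ui in zip(x, t, u)))
                      for u in pts) / 2 ** n for x in pts]
        noisy += pr_t * var(smooth)

    bound = 4 * (2 * var_f / 4) ** (2 / (1 + rho))
    assert noisy <= bound
    print(f"Var[f] = {var_f:.4f}, noisy variance = {noisy:.4f}, bound = {bound:.4f}")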

5.2       XOR Lemma for bounded independence
We now prove a version of the XOR lemma for bounded-independence distributions, similar to the lemma proved for small-bias distributions in [16].

Lemma 5.6. Let f_1, ..., f_k : {0,1}^m → [0,1] be k functions on disjoint inputs. Let H : [0,1]^k → [0,1] be multilinear in its inputs. If each f_i is fooled by any d_i-wise independent distribution with error ε, then the function h : {0,1}^m → [0,1] defined by h(x) := H(f_1(x), f_2(x), ..., f_k(x)) is fooled by any (∑_{i≤k} d_i)-wise independent distribution with error 8^k ε.
    We will use the following dual equivalence between bounded independence and sandwiching polynomials, introduced by Bazzi [4]. This equivalence was stated for Boolean functions in [4], but it is clear from the proof that it holds for [0,1]-valued functions as well.

Fact 5.7 ([4]). A function f : {0,1}^m → [0,1] is fooled by every d-wise independent distribution with error ε if and only if there exist two multivariate polynomials p` and pu of degree d such that

    1. for every x ∈ {0,1}^m, we have p`(x) ≤ f(x) ≤ pu(x), and

    2. E[pu(U) − f(U)] ≤ ε and E[f(U) − p`(U)] ≤ ε.
Proof of Lemma 5.6. By Fact 5.7, for each i ∈ {1, ..., k} there exist two degree-d_i polynomials f_i^u and f_i^` for f_i which satisfy the conditions in Fact 5.7. Hence, we have

    f_i^u(x) ≥ f_i(x) ≥ 0   and   1 − f_i^`(x) ≥ 1 − f_i(x) ≥ 0.

For every α ∈ {0,1}^k, define

    M_α^u(x) := ∏_{i:α_i=1} f_i^u(x) · ∏_{j:α_j=0} (1 − f_j^`(x))   and   M_α(x) := ∏_{i:α_i=1} f_i(x) · ∏_{j:α_j=0} (1 − f_j(x)).


Clearly M_α^u(x) ≥ M_α(x), and M_α^u(x) has degree ∑_{i≤k} d_i. We claim that for every α ∈ {0,1}^k,

    E[M_α^u(x) − M_α(x)] ≤ 2^k ε.

Fix a string α ∈ {0,1}^k. Define the hybrids M_0 = M_α^u(x), M_1, ..., M_k = M_α(x), where

    M_i(x) := M_i^{(1)}(x) · M_i^{(2)}(x),

with

    M_i^{(1)}(x) := ∏_{j≤i: α_j=1} f_j(x) · ∏_{j≤i: α_j=0} (1 − f_j(x))

and

    M_i^{(2)}(x) := ∏_{j>i: α_j=1} f_j^u(x) · ∏_{j>i: α_j=0} (1 − f_j^`(x)).

Note that, because the f_j are defined on disjoint inputs, expectations of products over distinct j factorize. In particular,

    E[M_i^{(2)}(x)] = ∏_{j>i: α_j=1} E[f_j^u(x)] · ∏_{j>i: α_j=0} E[1 − f_j^`(x)] ≤ (1 + ε)^{k−i},

and M_i^{(1)}(x) ≤ 1. So, if α_i = 1, then

    E[M_{i−1}(x) − M_i(x)] = E[(f_i^u(x) − f_i(x)) · M_{i−1}^{(1)}(x) · M_i^{(2)}(x)] ≤ ε · (1 + ε)^{k−i}.

Likewise, if α_i = 0, we have

    E[M_{i−1}(x) − M_i(x)] = E[((1 − f_i^`(x)) − (1 − f_i(x))) · M_{i−1}^{(1)}(x) · M_i^{(2)}(x)] ≤ ε · (1 + ε)^{k−i}.

Hence,

    E[M_α^u(x) − M_α(x)] = ∑_{1≤i≤k} E[M_{i−1}(x) − M_i(x)] ≤ ε ∑_{0≤i≤k−1} (1 + ε)^i ≤ 2^k ε.

Now define M_α^`(x) := 1 − ∑_{β≠α} M_β^u(x). Note that M_α^`(x) also has degree ∑_{i≤k} d_i. Since

    ∑_{α∈{0,1}^k} M_α(x) = ∏_{i≤k} (f_i(x) + (1 − f_i(x))) = 1,

we have

    M_α^`(x) = 1 − ∑_{β≠α} M_β^u(x) ≤ 1 − ∑_{β≠α} M_β(x) = M_α(x).

Hence,

    E[M_α(x) − M_α^`(x)] = ∑_{β≠α} E[M_β^u(x) − M_β(x)] ≤ ∑_{β≠α} 2^k ε ≤ 2^k · 2^k ε = 4^k ε.

As H is multilinear, we can write H as

    H(y_1, ..., y_k) = ∑_{α∈{0,1}^k} H(α) ∏_{i:α_i=1} y_i ∏_{i:α_i=0} (1 − y_i),


where H(α) ∈ [0,1] for every α. So

    h(x) = ∑_{α∈{0,1}^k} H(α) ∏_{i:α_i=1} f_i(x) ∏_{i:α_i=0} (1 − f_i(x)) = ∑_{α∈{0,1}^k} H(α) M_α(x).

Now define

    h^u(x) := ∑_{α∈{0,1}^k} H(α) M_α^u(x)   and   h^`(x) := ∑_{α∈{0,1}^k} H(α) M_α^`(x).

Clearly h^u and h^` both have degree ∑_{i≤k} d_i. We also have h^u(x) ≥ h(x) ≥ h^`(x), and

    E[h^u(x) − h(x)] ≤ ∑_{α∈{0,1}^k} H(α) E[M_α^u(x) − M_α(x)] ≤ ∑_{α∈{0,1}^k} 2^k ε ≤ 4^k ε.

A similar calculation shows that E[h(x) − h^`(x)] ≤ 8^k ε. Therefore, since h^u and h^` are two polynomials that satisfy the conditions in Fact 5.7 with error 8^k ε, the lemma follows.
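The two elementary facts underlying this proof, namely that the products M_α sum to 1 and that any multilinear H is recovered from its values on {0,1}^k, are easy to sanity-check. A short Python sketch (ours, with an arbitrary multilinear H as an example):

    import itertools, random

    k = 3
    y = [random.random() for _ in range(k)]          # stands in for f_1(x), ..., f_k(x)

    def M(alpha, vals):                              # prod over alpha_i=1 of v_i times prod over alpha_i=0 of (1 - v_i)
        out = 1.0
        for a, v in zip(alpha, vals):
            out *= v if a else 1 - v
        return out

    alphas = list(itertools.product([0, 1], repeat=k))
    assert abs(sum(M(a, y) for a in alphas) - 1) < 1e-12

    def H(vals):                                     # an arbitrary multilinear function
        return 0.2 * vals[0] + 0.5 * vals[1] * vals[2] + 0.3 * vals[0] * (1 - vals[1])

    expansion = sum(H([float(b) for b in a]) * M(a, y) for a in alphas)
    assert abs(H(y) - expansion) < 1e-12
    print("sum_alpha M_alpha = 1 and H = sum_alpha H(alpha) M_alpha: verified")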

5.3   Proof of Theorem 1.13
Armed with Lemma 5.4 and Lemma 5.6, we are now ready to prove Theorem 1.13. We first need the following useful fact to handle the case when S is the set of M-th roots of unity.

Fact 5.8. Let X and Y be two random variables on {0,1}^m. Suppose that for every m-bit product test g : {0,1}^m → S, where S is the set of all M-th roots of unity, we have |E[g(X)] − E[g(Y)]| ≤ ε. Then for every m-bit product test g : {0,1}^m → S and every z ∈ S, we have |Pr[g(X) = z] − Pr[g(Y) = z]| ≤ ε.
Proof. Let ω be a primitive M-th root of unity. Note that for every integer j, the function g^j is also a product test with the same range. So for every j, k ∈ {0, ..., M − 1},

    |E[(ω^{−k} g(X))^j] − E[(ω^{−k} g(Y))^j]| = |ω^{−kj}| · |E[g(X)^j] − E[g(Y)^j]| ≤ ε.

Using the identity

    (1/M) ∑_{j=0}^{M−1} ω^{(i−k)j} = 1 if i = k, and 0 otherwise,

we have for every k ∈ {0, ..., M − 1},

    |Pr[g(X) = ω^k] − Pr[g(Y) = ω^k]| ≤ (1/M) ∑_{j=0}^{M−1} |E[(ω^{−k} g(X))^j] − E[(ω^{−k} g(Y))^j]| ≤ ε.
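The proof is simply the inverse discrete Fourier transform on Z_M. A Python sketch (ours) verifying the recovery of point probabilities from the moments for M = 5:

    import cmath, random

    M = 5
    omega = cmath.exp(2j * cmath.pi / M)
    raw = [random.random() for _ in range(M)]
    probs = [p / sum(raw) for p in raw]                         # Pr[Z = omega^i]
    moments = [sum(p * omega ** (i * j) for i, p in enumerate(probs))
               for j in range(M)]                               # E[Z^j]

    for k in range(M):
        recovered = sum(omega ** (-k * j) * moments[j] for j in range(M)) / M
        assert abs(recovered - probs[k]) < 1e-9
    print("Pr[Z = omega^k] = (1/M) sum_j omega^{-kj} E[Z^j] for every k")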

Proof of Theorem 1.13. Write f = ∏_{i=1}^k f_i, where f_i : {0,1}^{I_i} → C≤1. Let σ denote (∑_{i≤k} Var[f_i])^{1/2}. We will consider two cases: σ² ≥ d`^{0.1} and σ² ≤ d`^{0.1}.
    If σ² ≥ d`^{0.1}, then the expectation of f under the uniform distribution is small. Specifically, we have

    |∏_{i≤k} E_U[f_i(U)]| ≤ ∏_{i≤k} (1 − Var[f_i])^{1/2} ≤ e^{−σ²/2} ≤ 2^{−Ω(d`^{0.1})} ≤ `^{−Ω(d)}.    (5.1)



Thus, it suffices to show that its expectation under D + T ∧ U is at most `^{−Ω(d)} in magnitude. We now use Claim 2.2 to show that

    |E_{D,T,U}[∏_{i=1}^k f_i(D + T ∧ U)]| ≤ `^{−Ω(d)}.

For each t, x ∈ {0,1}^m and each i ∈ {1, 2, ..., k}, let σ_{t,x,i}² denote Var_{U′}[f_i(x + t ∧ U′)]. Let T′ be a uniform m-bit random variable. By Claim 2.2 with η = 1/2, we have E_{T′,U}[σ_{T′,U,i}²] ≥ Var[f_i]/2. So by linearity of expectation,

    E_{T′,U}[∑_{i≤k} σ_{T′,U,i}²] ≥ σ²/2 ≥ d`^{0.1}/2.

Since T and D are both d`-wise independent, the random variables σ_{T,D,1}², ..., σ_{T,D,k}² are (0, d)-close to σ_{T′,U,1}², ..., σ_{T′,U,k}². Let µ := E_{T′,U}[∑_{i≤k} σ_{T′,U,i}²] ≥ d`^{0.1}/2. By Lemma 2.3,

    Pr_{T,D}[∑_{i≤k} σ_{T,D,i}² ≤ µ/2] ≤ 4 · 2^d ((√(µd) + d)/(µ/2))^d = `^{−Ω(d)}.

Hence, except with probability `^{−Ω(d)} over t ∼ T and x ∼ D, we have

    ∑_{i≤k} σ_{t,x,i}² = ∑_{i≤k} Var_{U′}[f_i(x + t ∧ U′)] ≥ d`^{0.1}/4.

By a calculation similar to (5.1), for every such t and x,

    |∏_{i≤k} E_U[f_i(x + t ∧ U)]| ≤ ∏_{i≤k} |E_U[f_i(x + t ∧ U)]| ≤ ∏_{i≤k} (1 − σ_{t,x,i}²)^{1/2} ≤ e^{−(1/2)∑_{i≤k} σ_{t,x,i}²} ≤ 2^{−Ω(d`^{0.1})} ≤ `^{−Ω(d)}.

In addition, we always have |f| ≤ 1. Hence,

    |E_{D,T,U}[∏_{i≤k} f_i(D + T ∧ U)]| ≤ E_{D,T}[|∏_{i≤k} E_U[f_i(D + T ∧ U)]|] ≤ `^{−Ω(d)}.



    Suppose now that σ² ≤ d`^{0.1}. Let σ_1² ≥ σ_2² ≥ ··· ≥ σ_k² be the variances of f_1, f_2, ..., f_k, respectively. Let k′ = d`^{0.7}. We have σ_{k′}² ≤ d`^{0.1}/k′ = `^{−0.6}; for otherwise σ² ≥ ∑_{i=1}^{k′} σ_i² ≥ k′ σ_{k′}² > d`^{0.1}, a contradiction. Let T′ be a uniform m-bit random variable, and let σ̃_i² denote

    Var_{T′,U}[E_{U′}[f_i(U + T′ ∧ U′)]].


We now show that σ̃_i² ≤ O(σ_i²)^{4/3}. For every i ∈ {1, ..., k}, define g_i : {0,1}^m → C≤1 by g_i(x) := (f_i(x) − E[f_i])/2, so that E[g_i] = 0 and Var[g_i] = Var[f_i]/4. We apply Lemma 5.4 with ρ = 1/2. Notice that, since M is fixed, we have |g_i(x) − g_i(x′)| = Ω(1) whenever g_i(x) ≠ g_i(x′). Hence,

    σ̃_i² = 4 Var_{T′,U}[E_{U′}[g_i(U + T′ ∧ U′)]]
         = 4 E_{T′}[Var_U[E_{U′}[g_i(U + T′ ∧ U′)]]]
         ≤ O(σ_i²)^{4/3},

where the second equality holds because E_U[E_{U′}[g_i(U + T′ ∧ U′)]] = E[g_i] = 0 for every fixed value of T′.

It follows that

    ∑_{i>k′} σ̃_i² = O(∑_{i>k′} (σ_i²)^{4/3}) ≤ O((σ_{k′}²)^{1/3} ∑_{i>k′} σ_i²) ≤ O(`^{−0.2}) · d`^{0.1} = d`^{−Ω(1)}.

Now, if we let F_2 := ∏_{i>k′} f_i, then by Lemma 2.1,

    |E_{D,T,U}[F_2(D + T ∧ U)] − E_U[F_2(U)]| ≤ `^{−Ω(d)}.    (5.2)

On the other hand, if we define F_1 := ∏_{i=1}^{k′} f_i, then it follows from Theorem 1.10 that

    |E_{D,T,U}[F_1(D + T ∧ U)] − E_U[F_1(U)]| ≤ k′ · 2^{−Ω(d²`/k′)} = 2^{−Ω(d`^{0.3})}.    (5.3)


    We now combine (5.2) and (5.3) using Lemma 5.6. To begin, define g_1(x) := E_{T,U}[F_1(x + T ∧ U)] and g_2(x) := E_{T,U}[F_2(x + T ∧ U)].
    Let S be the range of the functions f_i. If S = [0,1], then the theorem follows immediately by applying Lemma 5.6 to g_1 and g_2. However, if S is the set of M-th roots of unity, then we cannot apply Lemma 5.6 directly, because it only applies to functions with range [0,1]. Nevertheless, we can use Fact 5.8 to reduce from S to {0,1}, as follows. For every z ∈ S, define the point function 1_z : S → {0,1} by 1_z(x) = 1 if and only if x = z. Then for every random variable Z on S,

    E[Z] = ∑_{z∈S} z Pr[Z = z] = ∑_{z∈S} z E[1_z(Z)].

Hence,

    E[g_1(X) g_2(X)] = ∑_{z∈S} z E[1_z(g_1(X) g_2(X))]
                     = ∑_{z∈S} z E[∑_{u,v∈S: uv=z} 1_u(g_1(X)) 1_v(g_2(X))]
                     = ∑_{z∈S} z ∑_{u,v∈S: uv=z} E[1_u(g_1(X)) 1_v(g_2(X))].



Hence, by Fact 5.8, for every u, v ∈ S, the functions 1_u ∘ g_1 and 1_v ∘ g_2 are fooled by d-wise independence with error `^{−Ω(d)}. So by Lemma 5.6, the product (1_u ∘ g_1)(1_v ∘ g_2) is fooled by 2d-wise independence with error `^{−Ω(d)}. Hence,

    |E[f(D + T ∧ U)] − E[f(U)]|
        = |E[(g_1 g_2)(D)] − E[(g_1 g_2)(U)]|
        ≤ ∑_{z∈S} |z| ∑_{u,v∈S: uv=z} |E[((1_u ∘ g_1)(1_v ∘ g_2))(D)] − E[((1_u ∘ g_1)(1_v ∘ g_2))(U)]|
        ≤ M² · `^{−Ω(d)} = `^{−Ω(d)},

because M is fixed. This proves the theorem.

5.4     Proof of Theorem 1.15
We now prove Theorem 1.15. We will need the following theorem, which is implicit in [13]. The original theorem was stated for read-once branching programs; below we sketch how to modify their proof to handle product tests. Combining the theorem with Claim 2.4 proves Theorem 1.15.

Theorem 5.9 (implicit in [13]). Let f : {0,1}^m → C≤1 be an m-bit product test with k functions of input length `. Let D, T and U be three independent m-bit random variables, where D and T are 2t-wise independent and U is uniform. Then

    |E_{D,T,U}[f(D + T ∧ U)] − E_U[f(U)]| ≤ k · 2^{−(t−`+1)/2}.

Proof. We can assume t ≥ `, for otherwise the conclusion is trivial. Let t′ := t − ` + 1 ≥ 1. We slightly modify the decomposition in [13, Proposition 6.1] as follows. Let f be an m-bit product test and write f = ∏_{i=1}^k f_i. As the random variable D + T ∧ U is symmetric, i. e., invariant under permutations of its coordinates, we can assume the function f_i is defined on the i-th block of ` bits. For every i ∈ {1, ..., k}, let f^{≤i} := ∏_{j≤i} f_j and f^{>i} := ∏_{j>i} f_j. We decompose f into

    f = f̂_∅ + L + ∑_{i=1}^k H_i f^{>i},    (5.4)

where

    L := ∑_{α∈{0,1}^{`k}: 0<|α|<t′} f̂_α χ_α

and

    H_i := ∑_{α=(α_1,...,α_i)∈{0,1}^{`i}: the t′-th 1 in α appears in α_i} f̂_α^{≤i} χ_α,

with f̂_α^{≤i} denoting the Fourier coefficient of f^{≤i} indexed by α.
We now show that the two sides of Equation (5.4) are identical. Clearly, every Fourier coefficient on the right-hand side is a coefficient of f. To see that every coefficient of f appears on the right-hand side exactly once, let α = (α_1, ..., α_k) ∈ {0,1}^{`k} and let f̂_α = ∏_{i=1}^k f̂_i(α_i) be a coefficient of f. If |α| < t′, then f̂_α appears in f̂_∅ or L. Otherwise |α| ≥ t′, and the t′-th 1 in α must appear in one of α_1, ..., α_k; say it appears in α_i. Then we claim that α appears in H_i f^{>i}. This is because the coefficient indexed by (α_1, ..., α_i) appears in H_i, and the coefficient indexed by (α_{i+1}, ..., α_k) appears in f^{>i}. Note that all the coefficients in each function H_i have weights between t′ = t − ` + 1 and t′ + ` − 1 = t, and because our random variables D and T are both 2t-wise independent, we get an error of 2^{−t′} = 2^{−(t−`+1)} in Lemma 6.2 in [13]. The rest of the analysis follows from [13] or [20].
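The bookkeeping in this case analysis can be checked mechanically: every index α with |α| ≥ t′ is assigned to exactly one H_i, and the corresponding prefix has weight in [t − ` + 1, t]. A small Python sketch (ours, with toy parameters):

    import itertools

    ell, k, tp = 3, 3, 2                   # block length, number of blocks, t' = t - ell + 1
    for alpha in itertools.product([0, 1], repeat=ell * k):
        if sum(alpha) < tp:
            continue                       # covered by the fhat_empty or L terms
        pos = [j for j, a in enumerate(alpha) if a][tp - 1]    # position of the t'-th 1
        i = pos // ell                     # alpha is assigned to the term for block i
        prefix_weight = sum(alpha[:(i + 1) * ell])
        assert tp <= prefix_weight <= tp + ell - 1             # weight between t' and t
    print("every alpha with |alpha| >= t' lands in exactly one H_i, with weight in [t - ell + 1, t]")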

    Theorem 1.15 easily follows from Theorem 5.9 and Claim 2.4.

Proof of Theorem 1.15. We may assume t ≥ 8`, for otherwise the conclusion is trivial. If k ≥ 2^{3`+1} ⌈t/`⌉, then the theorem follows from Claim 2.4. Otherwise, k ≤ 2^{3`+1} ⌈t/`⌉ and the theorem follows from Theorem 5.9.


6    Small-bias plus noise fools degree-2 polynomials
In this section we show that small-bias distributions plus noise fool non-read-once F2 -polynomials of
degree 2. We first state a structural theorem about degree-2 polynomials over F2 which will be used in
our proof.

Theorem 6.1 (Theorem 6.30 in [24]). For every F_2-polynomial p : {0,1}^m → {0,1} of degree 2, there exist an invertible matrix A ∈ F_2^{m×m}, an integer k ≤ ⌊m/2⌋, and a subset L ⊆ [m] such that p(Ax) = ∑_{i=1}^k x_{2i−1} x_{2i} + ∑_{i∈L} x_i.

Proof of Claim 1.14. Let p be a degree-2 polynomial. It suffices to fool q(x) := (−1)^{p(x)}. By Theorem 6.1, there exist an invertible matrix A, an integer k and a subset L ⊆ [m] such that q(Ax) = r(x) · χ_L(x), where r(x) := (−1)^{∑_{i=1}^k x_{2i−1} x_{2i}} and χ_L(x) := (−1)^{∑_{i∈L} x_i}. Writing r(x) in its Fourier expansion, q(Ax) has the Fourier expansion

    q(Ax) = (∑_{S⊆[2k]} r̂_S χ_S(x)) · χ_L(x),

where |r̂_S| = 2^{−k}. Note that L is a subset of [m]. Viewing the sets S and L as vectors in {0,1}^m, we have

    |E[q(D + T ∧ U)] − E[q(U)]| ≤ ∑_{∅≠S⊆[2k]} 2^{−k} |E[χ_{S+L}(A^{−1}D)]| · |E[χ_{S+L}(A^{−1}(T ∧ U))]|
                                ≤ 2^{−k} δ ∑_{∅≠S⊆[2k]} |E[χ_{S+L}(A^{−1}(T ∧ U))]|
                                = 2^{−k} δ ∑_{∅≠S⊆[2k]} |E[χ_{A(S+L)}(T ∧ U)]|
                                = 2^{−k} δ ∑_{∅≠S⊆[2k]} (1/3)^{|A(S+L)|},



where the second inequality follows because small-bias distributions are closed under linear transformations. We now bound the summation from above. We claim that

    ∑_{S⊆[2k]} (1/3)^{|A(S+L)|} ≤ ∑_{S⊆[2k]} (1/3)^{|S|} = (4/3)^{2k}.

The equality is clear. To see the inequality, notice that since S ⊆ [2k], when viewed as a vector in {0,1}^m its last m − 2k positions must be 0. So we will instead think of S as a vector in {0,1}^{2k}, and rewrite A(S + L) as A′S + AL, where A′ consists of the first 2k columns of the full-rank matrix A. In particular, A′ is a full-rank m × 2k matrix. As we are only concerned with the Hamming weight of A′S + AL, we can permute its coordinates and rewrite A′ as [I_{2k} | A″]^T for some 2k × (m − 2k) matrix A″. (Readers familiar with linear codes may think of the standard form of a generator matrix.) Moreover, for a lower bound on the Hamming weight, we can restrict our attention to the first 2k bits of A′S + AL. Hence we can think of the first 2k bits of A′S + AL as S shifted by the first 2k bits of the fixed vector AL. Since we are summing over all S in {0,1}^{2k}, the shift does not affect the sum, and the inequality follows. Therefore, we have

    |E[q(D + T ∧ U)] − E[q(U)]| ≤ 2^{−k} δ · (4/3)^{2k} = (8/9)^k δ,

proving the claim.
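The flatness |r̂_S| = 2^{−k} that drives the (8/9)^k bound is easy to confirm by brute force. A Python sketch (ours) for k = 3:

    import itertools

    k = 3
    n = 2 * k
    pts = list(itertools.product([0, 1], repeat=n))

    def r(x):                                   # (-1)^{sum_i x_{2i-1} x_{2i}}
        return (-1) ** sum(x[2 * i] * x[2 * i + 1] for i in range(k))

    for S in pts:                               # S ranges over subsets of [2k]
        coeff = sum(r(x) * (-1) ** sum(s & xi for s, xi in zip(S, x))
                    for x in pts) / 2 ** n
        assert abs(abs(coeff) - 2 ** -k) < 1e-12
    print("all", len(pts), "Fourier coefficients of r have magnitude 2^-k =", 2 ** -k)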




7    Proof of Claim 1.12

In this section we exhibit, more generally, a distribution D that is (10d²/k, d)-close to uniform for any d. One can obtain Claim 1.12 by setting d = k^{1/3}. To simplify notation we will switch from {0,1} to {−1,1}, and replace k with 2k.
    We define D to be a random variable distributed uniformly over the strings in {−1,1}^{2k} with an equal number of −1's and 1's.

Claim 7.1. D is (10d²/k, d)-close to uniform for every integer d.


Proof. We can assume d² ≤ k/10, for otherwise the conclusion is trivial. Let I ⊆ [2k] be a subset of size d. For every x ∈ {−1,1}^d, we have

    Pr[D_I = x] = (2k−d choose k−wt(x)) / (2k choose k),

where wt(x) is the number of −1's in x. We bound the right-hand side from below by

    (2k−d choose k−d) / (2k choose k) = k(k−1)···(k−d+1) / (2k(2k−1)···(2k−d+1))
                                      ≥ ((k−d+1)/(2k))^d
                                      = 2^{−d} (1 − (d−1)/k)^d
                                      ≥ 2^{−d} (1 − d(d−1)/k)
                                      ≥ 2^{−d} · (1 − d²/k),
and bound it from above by

    (2k−d choose k−d/2) / (2k choose k) = (k(k−1)···(k−d/2+1))² / (2k(2k−1)···(2k−d+1))
                                        ≤ (k/(2k−d+1))^d
                                        = 2^{−d} (1 + (d−1)/(2k−d+1))^d
                                        ≤ 2^{−d} (1 + ∑_{i=1}^d (d(d−1)/(2k−d+1))^i)
                                        ≤ 2^{−d} (1 + 2d(d−1)/(2k−d+1))
                                        ≤ 2^{−d} · (1 + 2d²/k).

The second inequality follows from (1 + a)^d = 1 + ∑_{i=1}^d (d choose i) a^i ≤ 1 + ∑_{i=1}^d (da)^i, where a = (d−1)/(2k−d+1). The third inequality holds because the geometric sum has ratio ≤ 1/2 (as d² ≤ k/10), and so it is bounded by twice its first term. Hence we have |Pr[D_I = x] − 2^{−d}| ≤ 2^{−d} · 2d²/k for every x ∈ {−1,1}^d. The claim then follows from summing this inequality over every x ∈ {−1,1}^d.
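For concrete parameters the hypergeometric formula makes this easy to check exactly. A Python sketch (ours), using math.comb:

    from math import comb

    k, d = 50, 6
    worst = 0.0
    for wt in range(d + 1):                    # wt = number of -1's among the d coordinates
        p = comb(2 * k - d, k - wt) / comb(2 * k, k)
        worst = max(worst, abs(p - 2 ** -d) / 2 ** -d)
    assert worst <= 2 * d ** 2 / k
    print(f"max relative deviation from uniform: {worst:.4f} <= 2d^2/k = {2 * d ** 2 / k:.4f}")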

    We now define our product test f. For each j ∈ {1, ..., 2k}, define f_j : {−1,1}^{2k} → C≤1 by f_j(x) := ω^{x_j}, where ω := e^{−i/√(2k)}. Let f = ∏_{j≤2k} f_j. We now show that for every large enough k we have

    |E[f(D + T ∧ U)] − E[f(U)]| ≥ 1/100.

We bound the expectation of f from below under the distribution D + T ∧ U and from above under U. We will use the fact that 1 − θ²/2 ≤ cos θ ≤ 1 − 2θ²/5 for θ ∈ [−1, 1]. First, we have

    E[f(U)] = ∏_{j≤2k} E_{x∼{−1,1}}[ω^x] = ∏_{j≤2k} (ω + ω^{−1})/2 = cos(1/√(2k))^{2k} ≤ (1 − 1/(5k))^{2k}.


Next, for every j ∈ {1, 2, ..., 2k}, we have

    E_{T,U}[f_j(x + T ∧ U)] = (3/4) ω^{x_j} + (1/4) ω^{−x_j}.

Define β : {−1,1} → C≤1 by β(x) := (3/4) ω^x + (1/4) ω^{−x}. Since D has the same number of −1's and 1's,

    E_D[∏_{j≤2k} β(D_j)] = β(1)^k β(−1)^k
                         = (10/16 + (3/16)(ω² + ω^{−2}))^k
                         = (5/8 + (3/8) cos(2/√(2k)))^k
                         ≥ (5/8 + (3/8)(1 − 1/k))^k
                         = (1 − 3/(8k))^k.

Therefore |E[f(D + T ∧ U)] − E[f(U)]| ≥ (1 − 3/(8k))^k − (1 − 1/(5k))^{2k} ≥ 1/100 for every sufficiently large k, concluding the proof.
    The f_i in this proof have variance Θ(1/k). So this counterexample gives a product test with total variance O(1), and it is relevant also to Lemma 2.1. Specifically, it shows that for ` = 1 and, say, d = O(1), the error term (k 2^`)^{O(d)} ε in Lemma 2.1 cannot be replaced with k^c ε for a certain constant c. Moreover, it cannot be so replaced even if any k^{Ω(1)} of the Y_i are close to the X_i (as opposed to just O(1) of them).
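Numerically, the two expectations approach e^{−3/8} ≈ 0.687 and e^{−1/2} ≈ 0.607 respectively, so the gap is in fact about 0.08. A Python sketch (ours) evaluating the exact expressions derived above:

    import math

    for k in (10 ** 2, 10 ** 4, 10 ** 6):
        e_unif = math.cos(1 / math.sqrt(2 * k)) ** (2 * k)
        e_noisy = (5 / 8 + 3 / 8 * math.cos(2 / math.sqrt(2 * k))) ** k
        print(f"k = {k:>7}: E[f(D + T^U)] = {e_noisy:.5f}, "
              f"E[f(U)] = {e_unif:.5f}, gap = {e_noisy - e_unif:.5f}")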


8    Moment bounds for sums of almost d-wise independent variables
In this section we prove some moment bounds and tail bounds for sums of almost d-wise independent complex-valued random variables.

Lemma 8.1. Let Z_1, Z_2, ..., Z_k ∈ C be independent random variables with E[Z_i] = 0 and |Z_i| < B. Let d be an even positive integer. Let W_1, W_2, ..., W_k ∈ C be random variables that are (ε, d)-close to Z_1, ..., Z_k with |W_i| < B. Then,

    E[|∑_{i=1}^k W_i|^d] ≤ 4 · 2^d ((∑_i Var[Z_i] · d)^{1/2} + dB)^d + 4 (2kB)^d ε.

Proof. Note that for any random variable W ∈ C we have

    E[|W|^d] = E[(|ℜ(W)|² + |ℑ(W)|²)^{d/2}]
             ≤ E[(2 max{|ℜ(W)|², |ℑ(W)|²})^{d/2}]
             ≤ 2^{d/2} · E[|ℜ(W)|^d + |ℑ(W)|^d],    (8.1)

and Var[W] = Var[ℜ(W)] + Var[ℑ(W)]. We first prove a bound assuming the W_i are real-valued.

    Since W_1, ..., W_k are (ε, d)-close to Z_1, ..., Z_k, and d is even, we have

    E[|∑_{i=1}^k W_i|^d] = E[(∑_{i=1}^k W_i)^d] = ∑_{i_1,...,i_d} E[∏_{j=1}^d W_{i_j}] ≤ ∑_{i_1,...,i_d} E[∏_{j=1}^d Z_{i_j}] + 2 k^d B^d ε,

because there are k^d products in the sum and we can apply Claim 4.7 to each product, each of which is bounded by B^d.
    We now estimate the quantity ∑_{i_1,...,i_d} E[∏_{j=1}^d Z_{i_j}]. We have

    ∑_{i_1,...,i_d} E[∏_{j=1}^d Z_{i_j}] = ∑_{m=1}^d ∑_{|S|=m} ∑_{i_1,...,i_d∈S: {i_j}_j=S} E[∏_{j=1}^d Z_{i_j}].

The expectation is zero whenever some Z_{i_j} appears only once in the product, so each Z_{i_j} must appear at least twice; in particular, the expectation is 0 whenever m > d/2. As each Z_i is bounded by B, each product is bounded by B^{d−2m} ∏_{j∈S} E[Z_j²] = B^{d−2m} ∏_{j∈S} Var[Z_j]. For each S ⊆ [k] of size m, there are at most m^d such terms. Let σ denote (∑_{i=1}^k Var[Z_i])^{1/2}. Then,

    ∑_{i_1,...,i_d} E[∏_{j=1}^d Z_{i_j}]
        ≤ ∑_{m=1}^{d/2} B^{d−2m} m^d ∑_{|S|=m} ∏_{j∈S} Var[Z_j]
        ≤ ∑_{m=1}^{d/2} B^{d−2m} m^{d−m} e^m σ^{2m}                  (Maclaurin's inequality, see Claim 4.8)
        ≤ e^{d/2} ∑_{m=1}^{d/2} B^{d−2m} (d/2)^{d−m} σ^{2m}
        ≤ e^{d/2} (d/2)^d B^d ∑_{m=0}^{d/2} (σ²/((d/2)B²))^m
        ≤ e^{d/2} (d/2)^d B^d · d (1 + σ^d/((d/2)^{d/2} B^d))        (∑_{m=0}^{d/2} α^m ≤ d(α⁰ + α^{d/2}) for all α > 0)
        ≤ d e^{d/2} ((d/2)^d B^d + (d/2)^{d/2} σ^d)
        ≤ 2 · 2^{d/2} (dB + σ√d)^d.

Hence,

    E[|∑_{i=1}^k W_i|^d] ≤ 2 · 2^{d/2} ((∑_{i=1}^k Var[Z_i] · d)^{1/2} + dB)^d + 2 (kB)^d ε.


To handle complex-valued W_i, we apply the bound above to |∑_{i=1}^k ℜ(W_i)|^d and |∑_{i=1}^k ℑ(W_i)|^d, and plug both bounds into (8.1), giving us

    E[|∑_{i=1}^k W_i|^d]
        ≤ 2 · 2^d ((∑_{i=1}^k Var[ℜ(Z_i)] · d)^{1/2} + dB)^d + 2 · 2^d ((∑_{i=1}^k Var[ℑ(Z_i)] · d)^{1/2} + dB)^d + 4 · 2^{d/2} (kB)^d ε
        ≤ 4 · 2^d ((∑_{i=1}^k Var[Z_i] · d)^{1/2} + dB)^d + 4 (2kB)^d ε.
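For independent summands (so that ε = 0), the bound can be compared against a Monte Carlo estimate of the d-th moment. A Python sketch (ours), with W_i uniform on [−1, 1], so B = 1 and Var[W_i] = 1/3:

    import random

    k, d, B = 200, 4, 1.0
    trials = 20000
    moment = sum(abs(sum(random.uniform(-1, 1) for _ in range(k))) ** d
                 for _ in range(trials)) / trials
    bound = 4 * 2 ** d * ((k / 3 * d) ** 0.5 + d * B) ** d     # eps = 0, sum of variances is k/3
    assert moment <= bound
    print(f"E|sum W_i|^{d} ≈ {moment:.0f} <= bound {bound:.0f}")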

Lemma 8.2. Let X_1, X_2, ..., X_k ∈ [0,1] be independent random variables. Let d be an even positive integer. Let Y_1, Y_2, ..., Y_k ∈ [0,1] be random variables that are (ε, d)-close to X_1, ..., X_k. Let Y = ∑_{i≤k} Y_i and µ = E[∑_i X_i]. Then,

    Pr[|Y − µ| ≥ δµ] ≤ 4 · 2^d ((√(µd) + d)/(δµ))^d + 4 (2k/(δµ))^d ε.

In particular, if µ ≥ 25d, we have Pr[|Y − µ| ≥ µ/2] ≤ 2^{−Ω(d)} + k^d ε.

Proof. Let X_i′ := X_i − E[X_i], Y_i′ := Y_i − E[X_i] and Y′ := ∑_i Y_i′. Note that X_i′ ∈ [−1, 1] and E[X_i′] = 0. Since X_i ∈ [0,1], we have

    E[X_i] ≥ E[X_i²] ≥ Var[X_i] = Var[X_i − E[X_i]] = Var[X_i′].

By Lemma 8.1 and Markov's inequality,

    Pr[|Y − µ| ≥ δµ] = Pr[|Y′|^d ≥ (δµ)^d]
                     ≤ 4 · 2^d (((∑_i Var[X_i′] · d)^{1/2} + d)/(δµ))^d + 4 (2k/(δµ))^d ε
                     ≤ 4 · 2^d ((√(µd) + d)/(δµ))^d + 4 (2k/(δµ))^d ε,

where in the last inequality we used µ ≥ ∑_i Var[X_i′].
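An empirical look at Lemma 8.2 for independent uniform bits (again ε = 0), via a Python sketch (ours):

    import random

    k, d = 1000, 6
    mu = k / 2                                     # X_i uniform on {0, 1}
    trials = 5000
    hits = sum(1 for _ in range(trials)
               if abs(sum(random.getrandbits(1) for _ in range(k)) - mu) >= mu / 2)
    bound = 4 * 2 ** d * ((mu * d) ** 0.5 + d) ** d / (mu / 2) ** d
    print(f"empirical Pr[|Y - mu| >= mu/2] = {hits / trials}, bound = {bound:.2e}")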


Acknowledgments. We thank Daniel Kane for answering some questions about [15], and Kavi Rama
Murthy for providing a proof of Lemma 4.4. An earlier version of this paper had the bound Õ(` +
log k) log(k/ε) log k in Theorem 1.5. We thank Dean Doron, Pooya Hatami and William Hoza for
pointing out an improvement of this bound. We are also very grateful to the anonymous reviewers for
their detailed comments which, in addition to correcting several mistakes, also strengthened some of the
intermediate claims and simplified their proofs.

References

 [1] Miklós Ajtai, János Komlós, and Endre Szemerédi: Deterministic simulation in LOGSPACE. In Proc. 19th STOC, pp. 132–140. ACM Press, 1987. [doi:10.1145/28395.28410]

 [2] Miklós Ajtai and Avi Wigderson: Deterministic simulation of probabilistic constant depth circuits. Advances in Computing Research, 5(1):199–222, 1989. Preliminary version in FOCS'85.

 [3] Roy Armoni, Michael Saks, Avi Wigderson, and Shiyu Zhou: Discrepancy sets and pseudorandom generators for combinatorial rectangles. In Proc. 37th FOCS, pp. 412–421. IEEE Comp. Soc. Press, 1996. [doi:10.1109/SFCS.1996.548500]

 [4] Louay M. J. Bazzi: Polylogarithmic independence can fool DNF formulas. SIAM J. Comput., 38(6):2220–2272, 2009. Preliminary version in FOCS'07. [doi:10.1137/070691954]

 [5] Andrej Bogdanov, Periklis A. Papakonstantinou, and Andrew Wan: Pseudorandomness for read-once formulas. In Proc. 52nd FOCS, pp. 240–246. IEEE Comp. Soc. Press, 2011. [doi:10.1109/FOCS.2011.57]

 [6] Andrej Bogdanov and Emanuele Viola: Pseudorandom bits for polynomials. SIAM J. Comput., 39(6):2464–2486, 2010. Preliminary version in FOCS'07. [doi:10.1137/070712109]

 [7] Ravi Boppana, Johan Håstad, Chin Ho Lee, and Emanuele Viola: Bounded independence versus symmetric tests. ACM Trans. Computation Theory, 11(4):21:1–27, 2019. Preliminary version in RANDOM'16. [doi:10.1145/3337783]

 [8] Suresh Chari, Pankaj Rohatgi, and Aravind Srinivasan: Improved algorithms via approximations of probability distributions. J. Comput. System Sci., 61(1):81–107, 2000. Preliminary version in STOC'94. [doi:10.1006/jcss.1999.1695]

 [9] Eshan Chattopadhyay, Pooya Hatami, Omer Reingold, and Avishay Tal: Improved pseudorandomness for unordered branching programs through local monotonicity. In Proc. 50th STOC, pp. 363–375. ACM Press, 2018. [doi:10.1145/3188745.3188800]

[10] Anindya De: Beyond the central limit theorem: asymptotic expansions and pseudorandomness for combinatorial sums. In Proc. 56th FOCS, pp. 883–902. IEEE Comp. Soc. Press, 2015. [doi:10.1109/FOCS.2015.59]

[11] Anindya De, Omid Etesami, Luca Trevisan, and Madhur Tulsiani: Improved pseudorandom generators for depth 2 circuits. In Proc. 14th Internat. Workshop on Randomization and Computation (RANDOM'10), pp. 504–517. Springer, 2010. [doi:10.1007/978-3-642-15369-3_38]

[12] Guy Even, Oded Goldreich, Michael Luby, Noam Nisan, and Boban Veličković: Efficient approximation of product distributions. Random Structures Algorithms, 13(1):1–16, 1998. Preliminary version in STOC'92. [doi:10.1002/(SICI)1098-2418(199808)13:1<1::AID-RSA1>3.0.CO;2-W]

[13] Michael A. Forbes and Zander Kelley: Pseudorandom generators for read-once branching programs, in any order. In Proc. 59th FOCS, pp. 946–955. IEEE Comp. Soc. Press, 2018. [doi:10.1109/FOCS.2018.00093, arXiv:1808.06265]

[14] Dmitry Gavinsky, Shachar Lovett, and Srikanth Srinivasan: Pseudorandom generators for read-once ACC^0. In Proc. 27th IEEE Conf. on Computational Complexity (CCC'12), pp. 287–297. IEEE Comp. Soc. Press, 2012. [doi:10.1109/CCC.2012.37]

[15] Parikshit Gopalan, Daniel M. Kane, and Raghu Meka: Pseudorandomness via the discrete Fourier transform. SIAM J. Comput., 47(6):2451–2487, 2018. Preliminary version in FOCS'15. [doi:10.1137/16M1062132, arXiv:1506.04350]

[16] Parikshit Gopalan, Raghu Meka, Omer Reingold, Luca Trevisan, and Salil Vadhan: Better pseudorandom generators from milder pseudorandom restrictions. In Proc. 53rd FOCS, pp. 120–129. IEEE Comp. Soc. Press, 2012. [doi:10.1109/FOCS.2012.77]

[17] Parikshit Gopalan, Raghu Meka, Omer Reingold, and David Zuckerman: Pseudorandom generators for combinatorial shapes. SIAM J. Comput., 42(3):1051–1076, 2013. Preliminary version in STOC'11. [doi:10.1137/110854990]

[18] Parikshit Gopalan, Ryan O'Donnell, Yi Wu, and David Zuckerman: Fooling functions of halfspaces under product distributions. In Proc. 25th IEEE Conf. on Computational Complexity (CCC'10), pp. 223–234. IEEE Comp. Soc. Press, 2010. [doi:10.1109/CCC.2010.29, arXiv:1001.1593]

[19] Parikshit Gopalan and Amir Yehudayoff: Inequalities and tail bounds for elementary symmetric polynomial with applications, 2014. [arXiv:1402.3543]

[20] Elad Haramaty, Chin Ho Lee, and Emanuele Viola: Bounded independence plus noise fools products. SIAM J. Comput., 47(2):493–523, 2018. Preliminary version in CCC'17. [doi:10.1137/17M1129088]

[21] Hamed Hatami: Lecture notes on Harmonic Analysis of Boolean functions, 2014. Available at the author's home page.

[22] Russell Impagliazzo, Noam Nisan, and Avi Wigderson: Pseudorandomness for network algorithms. In Proc. 26th STOC, pp. 356–364. ACM Press, 1994. [doi:10.1145/195058.195190]

[23] Chin Ho Lee: Fourier bounds and pseudorandom generators for product tests. In 34th Computational Complexity Conference (CCC'19), volume 137, pp. 7:1–25. Schloss Dagstuhl, Leibniz-Zent. Inform., Wadern, 2019. [doi:10.4230/LIPIcs.CCC.2019.7]

[24] Rudolf Lidl and Harald Niederreiter: Finite Fields. Cambridge Univ. Press, 1997. [doi:10.1017/CBO9780511525926]

[25] Shachar Lovett: Unconditional pseudorandom generators for low-degree polynomials. Theory of Computing, 5(3):69–82, 2009. Preliminary version in STOC'08. [doi:10.4086/toc.2009.v005a003]

[26] CHI-JEN LU: Improved pseudorandom generators for combinatorial rectangles. Combinatorica,
     22(3):417–433, 2002. Preliminary version in ICALP’98. [doi:10.1007/s004930200021] 3

[27] MICHAEL LUBY, BOBAN VELIČKOVIĆ, AND AVI WIGDERSON: Deterministic approximate
     counting of depth-2 circuits. In 2nd Israel Symp. on Theory and Computing Systems (ISTCS’93),
     pp. 18–24. IEEE Comp. Soc. Press, 1993. [doi:10.1109/ISTCS.1993.253488] 2

[28] RAGHU MEKA, OMER REINGOLD, AND AVISHAY TAL: Pseudorandom generators for width-3
     branching programs. In Proc. 51st STOC. ACM Press, 2019. [doi:10.1145/3313276.3316319,
     arXiv:1806.04256] 6

[29] JOSEPH NAOR AND MONI NAOR: Small-bias probability spaces: efficient constructions and
     applications. SIAM J. Comput., 22(4):838–856, 1993. Preliminary version in STOC’90.
     [doi:10.1137/0222053] 4, 6, 9, 14, 16, 17

[30] NOAM NISAN: Pseudorandom bits for constant depth circuits. Combinatorica, 11(1):63–70, 1991.
     [doi:10.1007/BF01375474] 4

[31] NOAM NISAN: Pseudorandom generators for space-bounded computation. Combinatorica,
     12(4):449–461, 1992. Preliminary version in STOC’90. [doi:10.1007/BF01305237] 3, 4

[32] NOAM NISAN AND DAVID ZUCKERMAN: Randomness is linear in space. J. Comput. System Sci.,
     52(1):43–52, 1996. [doi:10.1006/jcss.1996.0004] 3

[33] MICHAEL SAKS AND SHIYU ZHOU: BP_H SPACE(S) ⊆ DSPACE(S^{3/2}). J. Comput. System Sci.,
     58(2):376–403, 1999. Preliminary version in FOCS’95. [doi:10.1006/jcss.1998.1616] 3

[34] ROCCO A. SERVEDIO AND LI-YANG TAN: Improved pseudorandom generators from pseudorandom
     multi-switching lemmas. In Proc. 23rd Internat. Workshop on Randomization and Computation
     (RANDOM’19), pp. 45:1–45:23. Schloss Dagstuhl–Leibniz-Zentrum für Informatik, 2019.
     [doi:10.4230/LIPIcs.APPROX-RANDOM.2019.45] 2

[35] J. MICHAEL STEELE: The Cauchy-Schwarz Master Class. Cambridge Univ. Press, 2004.
     [doi:10.1017/CBO9780511817106] 21

[36] LUCA TREVISAN: Open problems in unconditional derandomization. Slides presented at China
     Theory Week (CTW’10), 2010. 3, 6

[37] YOAV TZUR: Notions of weak pseudorandomness and GF(2^n)-polynomials. Master’s thesis,
     Weizmann Institute of Science, 2009. Link at ECCC. 3

[38] EMANUELE VIOLA: Pseudorandom bits for constant-depth circuits with few arbitrary symmetric
     gates. SIAM J. Comput., 36(5):1387–1403, 2007. Preliminary version in CCC’05.
     [doi:10.1137/050640941] 2

[39] EMANUELE VIOLA: The sum of D small-bias generators fools polynomials of degree D. Comput.
     Complexity, 18(2):209–217, 2009. Preliminary version in CCC’08. [doi:10.1007/s00037-009-0273-5] 2

[40] EMANUELE VIOLA: Randomness buys depth for approximate counting. Comput. Complexity,
     23(3):479–508, 2014. Preliminary version in FOCS’11. [doi:10.1007/s00037-013-0076-6] 3

[41] EMANUELE VIOLA: Special topics in complexity theory. Lecture notes, Northeastern Univ., 2017.
     Available at ECCC. 6


AUTHORS

      Chin Ho Lee
      Postdoctoral research fellow
      Department of Computer Science
      Columbia University
      New York, NY
c.h.lee@columbia.edu
      https://cs.columbia.edu/~chlee


      Emanuele Viola
      Associate professor
      Khoury College of Computer Sciences
      Northeastern University
      Boston, MA
viola@ccs.neu.edu
      http://www.ccs.neu.edu/home/viola/


ABOUT THE AUTHORS

CHIN HO LEE just completed his Ph. D. at Northeastern University under the guidance of
   Emanuele Viola. He is now doing his postdoc at Columbia University. He recently got
   married! The picture on the HTML page was taken on Jeju Island, Korea.


EMANUELE VIOLA is contemplating replacing his humidifier with an areca palm.



