DOKK Library

Concentration for Limited Independence via Inequalities for the Elementary Symmetric Polynomials

Authors Parikshit Gopalan, Amir Yehudayoff

License CC-BY-3.0

                          T HEORY OF C OMPUTING, Volume 16 (17), 2020, pp. 1–29
                                       www.theoryofcomputing.org




 Concentration for Limited Independence via
  Inequalities for the Elementary Symmetric
                  Polynomials
                            Parikshit Gopalan                       Amir Yehudayoff∗
               Received October 8, 2015; Revised August 30, 2020; Published December 24, 2020




       Abstract. We study the extent of independence needed to approximate the product of
       bounded random variables in expectation. This natural question has applications in pseudo-
       randomness and min-wise independent hashing.
       For random variables with absolute value bounded by 1, we give an error bound of
       the form σ^{Ω(k)} when the input is k-wise independent and σ^2 is the variance of their sum.
       Previously, known bounds only applied in more restricted settings, and were quantitatively
       weaker.
       Our proof relies on a new analytic inequality for the elementary symmetric polynomials
       S_k(x) for x ∈ R^n. We show that if |S_k(x)| and |S_{k+1}(x)| are small relative to |S_{k−1}(x)| for some
       k > 0, then |S_ℓ(x)| is also small for all ℓ > k.
            We use this to give a simpler and more modular analysis of a construction of min-wise
       independent hash functions and pseudorandom generators for combinatorial rectangles due
       to Gopalan et al., which also improves the overall seed-length.

ACM Classification: G.3
AMS Classification: 68Q87
Key words and phrases: pseudorandomness, k-wise independence, hashing, concentration, symmetric
polynomials


    A preliminary version of this paper appeared as ECCC TR14-019 in 2014.
   ∗ Horev fellow—supported by the Taub foundation. Research supported by ISF and BSF.



© 2020 Parikshit Gopalan and Amir Yehudayoff
Licensed under a Creative Commons Attribution License (CC-BY)                        DOI: 10.4086/toc.2020.v016a017

1    Introduction
The power of independence in probability and randomized algorithms stems from the fact that it lets us
control expectations of products of random variables. If X1 , . . . , Xn are independent random variables,
then E[∏_{i=1}^n X_i] = ∏_{i=1}^n μ_i, where the μ_i are their respective expectations. (To avoid measurability issues,
we assume all random variables have finite support.) However, there are numerous settings in computer
science, where true independence either does not hold, or is too expensive (in terms of memory or
randomness).
    Motivated by this, we explore settings when approximate versions of the product rule for expectations
hold even under limited independence. Concretely, let X_1, . . . , X_n be random variables lying in the range
[−1, 1], where X_i has mean μ_i and variance σ_i^2. We are interested in the smallest k = k(δ) such that
whenever the X_i's are drawn from a k-wise independent distribution D, it holds that
                                              "       #
                                                  n           n
                                          ED     ∏ Xi − ∏ µi ≤ δ .                                     (1.1)
                                                 i=1         i=1

As stated, we cannot hope to make do even with k = n − 1. Consider the case where each X_i is a random
{±1} bit. If X_n = ∏_{i=1}^{n−1} X_i, then the resulting distribution is (n − 1)-wise independent, but E[∏_i X_i] = 1,
whereas ∏_i μ_i = 0. So, we need some additional assumptions about the random variables.
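For small n this failure can be verified exhaustively. The sketch below (plain Python, our own illustration with n = 4) builds the support of the parity distribution, checks that every choice of n − 1 coordinates is uniform, and confirms that E[∏ X_i] = 1 while each marginal mean, and hence ∏ μ_i, is 0.

```python
from itertools import product, combinations

n = 4
# X_1, ..., X_{n-1} are uniform over {-1, +1}; X_n is forced to their product.
points = []
for xs in product([-1, 1], repeat=n - 1):
    parity = 1
    for x in xs:
        parity *= x
    points.append(xs + (parity,))

# (n-1)-wise independence: every projection onto n-1 coordinates is uniform.
for idx in combinations(range(n), n - 1):
    counts = {}
    for p in points:
        key = tuple(p[i] for i in idx)
        counts[key] = counts.get(key, 0) + 1
    assert sorted(counts.values()) == [1] * 2 ** (n - 1)

# Each marginal has mean 0, so the product of the means is 0 ...
assert all(sum(p[i] for p in points) == 0 for i in range(n))
# ... yet the product of the variables is identically 1.
assert sum(p[0] * p[1] * p[2] * p[3] for p in points) == len(points)
```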
    The main message of this paper is that small total variance is sufficient to ensure that the product rule
holds approximately even under k-wise independence.

Theorem 1.1. Let X_1, . . . , X_n be random variables, each distributed in the range [−1, 1], where X_i has
mean μ_i and variance σ_i^2. Let σ^2 = ∑_i σ_i^2. There exist constants c_1 > 1 and 1 > c_2 > 0 such that under
any k-wise independent distribution D,

                                     |E_D[∏_{i=1}^n X_i] − ∏_{i=1}^n μ_i| ≤ (c_1 σ)^{c_2 k}.            (1.2)

Specifically, if σ < 1/(2c_1) then k = O(log(1/δ))-wise independence suffices for Equation (1.1).

    An important restriction that naturally arises is positivity, where each Xi lies in the interval [0, 1].
This setting of parameters (positive variables, small total variance) is important for the applications
considered in this paper: pseudorandom generators (PRGs) for combinatorial rectangles [7, 12] and min-
wise independent permutations [4]. The former is an important problem in the theory of unconditional
pseudorandomness which has been studied intensively [7, 12, 19, 3, 13, 9]. Min-wise independent hashing
was introduced by Broder et al. [4] motivated by similarity estimation, and further studied by [11, 5, 19].
The authors of [19] showed that PRGs for rectangles give min-wise independent hash functions.
    The results of [7, 11] tell us that under k-wise independence, positivity and boundedness, the LHS
of Equation (1.1) is bounded by exp(−Ω(k)), hence k = O(log(1/δ)) suffices for error δ. In contrast,
we have seen that such a bound cannot hold in the [−1, 1] case. However, once the variance is smaller
than some constant, our bound improves on this even in the [0, 1] setting. Concretely, when σ^2 < n^{−ε}
for some ε > 0, our result says that O(1)-wise independence suffices for inverse polynomial error in
Equation (1.1), as opposed to O(log(n))-wise independence. This improvement is crucial in analyzing


PRGs and hash functions in the polynomially small error regime. A recent result of [9] achieves near-
logarithmic seed-length for both these problems, even in the regime of inverse polynomial error. Their
construction is simple, but its analysis is not. Using our results, we give a modular analysis of the
pseudorandom generator construction for rectangles of [9], using the viewpoint of hash functions. Our
analysis also improves the seed-length of the construction, getting the dependence on the dimension n
down to O(log log(n)) as opposed to O(log(n)), which (nearly) matches a lower bound due to [12].
     The main technical ingredient in our work is a new analytic inequality about symmetric polynomials
in real variables, which we believe is of independent interest. The k-th symmetric polynomial in a =
(a_1, a_2, . . . , a_n) is defined as

                                     S_k(a) = ∑_{T ⊆ [n]: |T| = k} ∏_{i∈T} a_i                          (1.3)

(we let S_0(a) = 1). We show that for any real vector a, if |S_k(a)| and |S_{k+1}(a)| are small relative to |S_{k−1}(a)|
for some k > 0, then |S_ℓ(a)| is also small for all ℓ > k. This strengthens and generalizes a result of [9] for
the case k = 1.
    We give an overview of the new inequality, its use in the derivation of bounds under limited indepen-
dence, and finally the application of these bounds to the construction of pseudorandom generators and
hash functions.

1.1   The elementary symmetric polynomials
The elementary symmetric polynomials appear as coefficients of a univariate polynomial with real roots,
since ∏_{i∈[n]}(ξ + a_i) = ∑_{k=0}^n ξ^k S_{n−k}(a). They have been well studied in mathematics, dating back to
classical results of Newton and Maclaurin (see [20] for a survey). This work focuses on their growth rates.
Specifically, we study how local information on Sk (a) for two consecutive values of k implies global
information for all larger values of k.
    It is easy to see that symmetric polynomials over the real numbers have the following property:
Fact 1.2. Over the real numbers, if S1 (b) = S2 (b) = 0 then b = 0.
    This is equivalent to saying that if p(ξ) is a real univariate polynomial of degree n with n nonzero
roots and p′(0) = p′′(0) = 0, then p ≡ 0. This does not hold over all fields; for example, the polynomial
p(ξ) = ξ^3 + 1 has three nonzero complex roots and p′(0) = p′′(0) = 0.
    A robust version of Fact 1.2 was recently proved in [9]: For every a ∈ R^n and k ∈ [n],

                                     |S_k(a)| ≤ (S_1^2(a) + 2|S_2(a)|)^{k/2}.                           (1.4)

That is, if S1 (a), S2 (a) are small in absolute value, then so is everything that follows. We provide an
essentially optimal bound.
Theorem 1.3. For every a ∈ R^n and k ∈ [n],

                                     |S_k(a)| ≤ (6e(S_1^2(a) + |S_2(a)|)^{1/2} / k^{1/2})^k.



     The parameters promised by Theorem 1.3 are tight up to a factor that is exponential in k, which is
often too small to matter (we do not attempt to optimise the constants). For example, if a_i = (−1)^i for all
i ∈ [n], then |S_1(a)| ≤ 1 and |S_2(a)| ≤ n + 1, but S_k(a) is roughly (n/k)^{k/2}.
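These growth rates are easy to probe numerically. The sketch below (illustrative Python; the helper and the test vectors are ours, not the paper's) computes every S_k(a) as a coefficient of ∏_i(1 + a_i ξ) and checks the bound of Theorem 1.3 on random vectors as well as on the alternating-sign vector.

```python
import math
import random

def elem_sym(a):
    """S[0..n] with S[k] = S_k(a), read off as coefficients of prod_i (1 + a_i * x)."""
    S = [1.0] + [0.0] * len(a)
    for x in a:
        for j in range(len(a), 0, -1):
            S[j] += x * S[j - 1]
    return S

random.seed(0)
for _ in range(100):
    a = [random.uniform(-1, 1) for _ in range(20)]
    S = elem_sym(a)
    base = math.sqrt(S[1] ** 2 + abs(S[2]))
    for k in range(1, len(a) + 1):
        # Theorem 1.3: |S_k(a)| <= (6e * base / sqrt(k))^k
        assert abs(S[k]) <= (6 * math.e * base / math.sqrt(k)) ** k

# Alternating signs: S_1 and S_2 stay small, yet S_k grows like (n/k)^(k/2),
# so no bound in terms of |S_1| (or any single |S_k|) alone can hold.
n = 40
a = [(-1) ** i for i in range(1, n + 1)]
S = elem_sym(a)
assert abs(S[1]) <= 1 and abs(S[2]) <= n + 1
```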
     A more general statement than Fact 1.2 actually holds (see Section 2.1 for a proof).

Fact 1.4. Over the reals, if S_k(a) = S_{k+1}(a) = 0 for some k > 0, then S_ℓ(a) = 0 for all ℓ ≥ k.

   We prove a robust version of this fact as well: A twice-in-a-row bound on the increase of the
symmetric functions implies a bound on what follows.

Theorem 1.5. For every a ∈ R^n, if S_k(a) ≠ 0 and

              (k+1 choose k) · |S_{k+1}(a)/S_k(a)| ≤ C    and    (k+2 choose k) · |S_{k+2}(a)/S_k(a)| ≤ C^2,

then for every 1 ≤ h ≤ n − k,

                              (k+h choose k) · |S_{k+h}(a)/S_k(a)| ≤ (8eC / h^{1/2})^h.

     Theorem 1.5 is proved by reduction to Theorem 1.3. The proof of Theorem 1.3 is analytic and uses
the method of Lagrange multipliers, and is different from that of [9] which relied on the Newton–Girard
identities. The argument is quite general, and similar bounds may be obtained for functions that are
recursively defined.
     Stronger bounds are known when the inputs are nonnegative. When a_i ≥ 0 for all i ∈ [n], the classical
Maclaurin inequalities [20] imply that S_k(a) ≤ (e/k)^k (S_1(a))^k. In contrast, when we do not assume
non-negativity, one cannot hope for such bounds to hold under the assumption that |S_1(a)| or any single
|S_k(a)| is small (see the alternating-signs example above).

1.2   Tail bounds
We return to the question alluded to earlier about how much independence is required for the approximate
product rule of expectation. This question arises in the context of min-wise hashing [11], PRGs for
combinatorial rectangles [7, 9], read-once DNFs [9] and more.
    One could derive bounds of a similar shape to ours using the work of [9], but with much stronger
assumptions on the variables. More precisely, one would require E[X_i^{2k}] ≤ (2k)^{2k} σ_i^{2k} for all i ∈ [n], and
get an error bound of roughly k^{O(k)} σ^{Ω(k)}. These stronger assumptions limit the settings where their
bound can be applied (biased variables typically do not have good moment bounds), and ensuring these
conditions hold led to tedious case analysis in analyzing their PRG construction.
    We briefly outline our approach. We start from the results of [7, 11] who give an error bound of
δ ≤ exp(−k) in (1.1). To prove this, they consider the random variables Y_i = 1 − X_i, so that

                         ∏_{i=1}^n X_i = ∏_{i=1}^n (1 − Y_i) = ∑_{j=0}^n (−1)^j S_j(Y_1, . . . , Y_n).  (1.5)




By the inclusion-exclusion/Bonferroni inequalities, the partial sums on the right alternately upper- and
lower-bound the product, and the error incurred by truncating to k terms is bounded by S_k(Y). So we can
bound the expected error by E[S_k(Y)], for which k-wise independence suffices.
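This truncation guarantee is a pointwise statement about numbers in [0, 1] and can be sanity-checked directly. A small Python sketch (our own, assuming the Y_i take values in [0, 1]):

```python
import random

def elem_sym(y):
    """Coefficients of prod_i (1 + y_i * x): S[j] = S_j(y)."""
    S = [1.0] + [0.0] * len(y)
    for v in y:
        for j in range(len(y), 0, -1):
            S[j] += v * S[j - 1]
    return S

random.seed(1)
n = 12
for _ in range(200):
    y = [random.random() for _ in range(n)]
    S = elem_sym(y)
    full = 1.0
    for v in y:
        full *= 1.0 - v
    for k in range(1, n + 1):
        # Bonferroni: the first k terms of the alternating sum approximate
        # prod_i (1 - y_i) to within the next term, S_k(y).
        partial = sum((-1) ** j * S[j] for j in range(k))
        assert abs(full - partial) <= S[k] + 1e-9
```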
    Our approach replaces inclusion-exclusion by a Taylor-series style expansion about the mean, as
in [9]. Let us assume μ_i ≠ 0 and let X_i = μ_i(1 + Z_i). Thus,

                         ∏_{i=1}^n X_i = ∏_{i=1}^n μ_i(1 + Z_i) = (∏_{i=1}^n μ_i)(∑_{j=0}^n S_j(Z)).    (1.6)


In this approach, it is usually not sufficient to bound E[|S_k(Z)|], since Z may have negative entries (even
if we start with X_i's that are all positive). So, to argue that the first k terms are a good approximation, we
need to bound the tail |∑_{ℓ≥k} S_ℓ(Z)|. At first, this seems problematic, since this involves high-degree
polynomials, and it seems hard to get their expectations right assuming just k-wise independence.¹ Even
though we cannot bound E[S_ℓ(Z)] under k-wise independence once ℓ ≫ k, we use our new inequalities
for symmetric polynomials to get strong tail bounds on them. This lets us show that truncating Equation (1.6)
after k terms gives error roughly σ^k, and thus k = O(log(1/δ)/log(1/σ)) suffices for error δ. We next
describe these tail bounds in detail.
    We assume the following setup: Z = (Z_1, . . . , Z_n) is a vector of real-valued random variables where Z_i
has mean 0 and variance σ_i^2, and σ^2 = ∑_i σ_i^2 < 1. Let U denote the distribution where the coordinates of
Z are independent. One can show that E_{Z∈U}[|S_ℓ(Z)|] ≤ σ^ℓ/√(ℓ!), and hence by Markov's inequality (see
Corollary 3.2), when t > 1 and tσ ≤ 1/2,

                              Pr_{Z∈U}[∑_{ℓ=k}^n |S_ℓ(Z)| ≥ 2(tσ)^k] ≤ 2t^{−2k}.                        (1.7)


   Although k-wise independence does not suffice to bound E[S_ℓ(Z)] for ℓ ≫ k, we use Theorem 1.5 to
show that a similar tail bound holds under limited independence.

Theorem 1.6. Let D denote a distribution over Z = (Z_1, . . . , Z_n) as above, where the Z_i's are (2k + 2)-wise
independent. For t > 0 and² 16etσ < 1,

                              Pr_{Z∈D}[∑_{ℓ=k}^n |S_ℓ(Z)| ≥ 2(8etσ)^k] ≤ 2t^{−2k}.                      (1.8)


    Typically, proofs of tail bounds under limited independence proceed by bounding the expectation
of some suitable low-degree polynomial. The proof of Theorem 1.6 does not follow this route. In
Section 3.2, we give an example of Z_i's and a (2k + 2)-wise independent distribution on them where E[|S_ℓ(Z)|]
for ℓ ∈ {2k + 3, . . . , n − 2k − 3} is much larger than under the uniform distribution. The same example
also shows that our tail bounds are close to tight.
   ¹ We formally show this in Section 3.2.
   ² A weaker but more technical assumption on t, σ, k suffices; see Equation (3.11).




1.3   Applications
The notion of min-wise independent hashing was introduced by Broder et al. [4] motivated by similarity
estimation, and independently by Mulmuley [15] motivated by computational geometry. A hash function
is a map h : [n] → [m]. Let U denote the family of all hash functions h : [n] → [m]. Let H ⊆ U be a family
of hash functions. For S ⊆ [n], let min h(S) = minx∈S h(x). The following generalization was introduced
by Broder et al. [5]:

Definition 1.7. We say that H : [n] → [m] is approximately ℓ-minima-wise independent with error ε if
for every S ⊆ [n] of size |S| ≥ ℓ and for every sequence T = (t_1, . . . , t_ℓ) of ℓ distinct elements of S,

           |Pr_{h∈H}[h(t_1) < · · · < h(t_ℓ) < min h(S \ T)] − Pr_{h∈U}[h(t_1) < · · · < h(t_ℓ) < min h(S \ T)]| ≤ ε.

    Combinatorial rectangles are a well-studied class of tests in pseudorandomness [7, 12, 19, 3, 13, 9].
In addition to being a natural class of statistical tests, constructing generators for them with optimal seeds
(up to constant factors) will improve on Nisan’s generator for logspace [3], a long-standing open problem
in derandomization.

Definition 1.8. A combinatorial rectangle is a function f : [m]^n → {0, 1} which is specified by n
coordinate functions f_i : [m] → {0, 1} as f(x_1, . . . , x_n) = ∏_{i∈[n]} f_i(x_i). A map G : {0, 1}^r → [m]^n is a PRG for
combinatorial rectangles with error ε if for every combinatorial rectangle f : [m]^n → {0, 1},

                                    |E_{x∈{0,1}^r}[f(G(x))] − E_{x∈[m]^n}[f(x)]| ≤ ε.

     A generator G : {0, 1}^r → [m]^n can naturally be thought of as a collection of 2^r hash functions, one
for each seed. For y ∈ {0, 1}^r, let G(y) = (x_1, . . . , x_n); the corresponding hash function is given by
g_y(i) = x_i. These hash functions have the property that they fool all test functions given by combinatorial
rectangles. Saks et al. [19] showed that this suffices for ℓ-minima-wise independence. They state their
result for ℓ = 1, but their proof extends to all ℓ.
     Constructions of PRGs for rectangles and min-wise hash functions that achieve seed-length
O(log(mn) log(1/ε)) were given by [7, 11] using limited independence. The first construction GMR
to achieve seed-length Õ(log(mn/ε)) was given recently by [9]. We use our results to give an analysis
of their generator which we believe is simpler and more intuitive, and also improves the seed-length, to
(nearly) match the lower bound from [12].
     We take the view of GMR as a collection of hash functions g : [n] → [m], based on iterative applications
of an alphabet-squaring step. We describe the generator formally in Section 5. We start by observing that
fooling rectangles is easy when m is small: O(log(1/δ))-wise independence suffices, and this requires
O(log(1/δ) log(m)) = O(log(1/δ)) random bits for m = O(1).
     The key insight in [9] is that gradually increasing the alphabet is also easy (in that it requires only
logarithmic randomness). Assume that we have a hash function g_0 : [n] → [m], and from it we define
g_1 : [n] → [m^2]. To do this, we pick a function g_1′ : [n] × [m] → [m^2] and set g_1(i) = g_1′(i, g_0(i)). The
key observation is that it suffices to pick g_1′ using only O(log(1/δ)/log(m))-wise independence, rather
than the O(log(1/δ))-wise independence needed in one shot (the larger m is, the less independence is
required).
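To make the composition concrete, here is a toy Python sketch (our illustration, not the actual construction of Section 5). It draws k-wise independent functions as uniformly random polynomials of degree < k over a prime field, a standard construction, and composes them in the alphabet-squaring step g_1(i) = g_1′(i, g_0(i)); all names and parameters here are ours.

```python
import random

P = (1 << 31) - 1  # a Mersenne prime; the field Z_P hosts the hash family

def kwise_hash(k, rng):
    """A k-wise independent function Z_P -> Z_P: a uniformly random
    polynomial of degree < k over Z_P (a standard construction)."""
    coeffs = [rng.randrange(P) for _ in range(k)]
    def h(x):
        acc = 0
        for c in reversed(coeffs):       # Horner evaluation mod P
            acc = (acc * x + c) % P
        return acc
    return h

def square_alphabet(g0, m, k, rng):
    """From g0 : [n] -> [m], build g1 : [n] -> [m^2] by hashing the pair
    (i, g0(i)) with a fresh k-wise independent function.
    (Reducing mod m^2 is only near-uniform; fine for illustration.)"""
    h = kwise_hash(k, rng)
    return lambda i: h(i * m + g0(i)) % (m * m)

rng = random.Random(0)
m, k = 16, 4
h0 = kwise_hash(k, rng)
g0 = lambda i: h0(i) % m           # base hash into the small alphabet [m]
g1 = square_alphabet(g0, m, k, rng)
assert all(0 <= g1(i) < m * m for i in range(100))
```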


    To see why this is so, fix subsets S_i ⊆ [m^2] for each coordinate and pretend that g_0 is truly random.
One can show that the random variable Pr_{g_0}[g_1(i) ∈ S_i], over the choice of g_1′, has variance 1/poly(m).
Since we are interested in ∏_i Pr_{g_0}[g_1(i) ∈ S_i], which is a product of n small-variance random variables,
Theorem 1.1 says it suffices to use limited independence.³

Theorem 1.9. Let GMR be the family of hash functions from [n] to [m] defined in Section 5.2 with error
parameter δ > 0. Its seed length is at most O((log log(n) + log(m/δ)) log log(m/δ)), and for every
S_1, . . . , S_n ⊆ [m],

                       |Pr_{g∈GMR}[∀ i ∈ [n] g(i) ∈ S_i] − Pr_{h∈U}[∀ i ∈ [n] h(i) ∈ S_i]| ≤ δ.

      This improves the bound from [9] in the dependence on n and δ ; their bound was

                      O(log(mn/δ ) log log(m) + log(1/δ ) log log(1/δ ) log log log(1/δ )).

In particular, the dependence on n reduces from log(n) to log log(n).⁴ The authors of [12] showed
a lower bound of Ω(log(m) + log(1/ε) + log log(n)), even for hitting sets, so our bound is tight up to
the log log(m/δ) factor. While [12] constructed hitting-set generators for rectangles with near-optimal
seed-length, we are unaware of previous constructions of pseudorandom generators for rectangles where
the dependence of the seed-length on n is o(log(n)).
    Saks et al. [19] showed how to translate a PRG for combinatorial rectangles to an approximately
minima-wise independent family (for completeness, see Section 5.5 for a proof).

Theorem 1.10 ([19]). Let G : {0, 1}^r → [m]^n be a PRG for combinatorial rectangles with error ε. The
resulting family {g_y : y ∈ {0, 1}^r} of hash functions is approximately ℓ-minima-wise independent with
error at most m^ℓ ε.

      We thus get the following corollary.

Corollary 1.11. For every ℓ, there is a family of approximately ℓ-minima-wise independent hash functions
with error ε and seed length at most O((log log(n) + log(m^ℓ/ε)) log log(m^ℓ/ε)).

1.4     Follow-up work
The basic nature of the questions we consider has led to follow-up work which we now briefly describe.
(A preliminary version of this article appeared as [10]).
    Gopalan, Kane and Meka [8] constructed the first PRG with seed-length O(log(n/δ)(log log(n/δ))^2)
for several classes of functions, including halfspaces, modular tests and combinatorial shapes. The key
technical ingredient of their work is a generalization of Theorem 1.1 to the complex numbers. Their
proof, however, is different from ours, and in particular it does not imply the inequalities and tail bounds
for symmetric polynomials that are proved here.
   ³ To optimize the seed-length, we actually use almost k-wise independence rather than exact k-wise independence. So the
analysis does not use Theorem 1.1 as a black-box, but rather it directly uses Theorem 1.6.
   ⁴ The reason log log(n) seed-length is possible is that every rectangle can be ε-approximated by one that depends only on
O(m log(1/ε)) co-ordinates. Hence the number of functions to fool grows polynomially in n, rather than exponentially.



    Understanding the tradeoff between space and randomness as computational resources is an important
problem in computational complexity theory. A central technique for understanding this tradeoff is via
PRGs for branching programs [17]. Meka, Reingold and Tal [14] constructed the first PRG for width-3
branching programs with nearly optimal seed length. Their proof relies on the proof strategy described
here.
    The iterated restrictions approach is one of the few general mechanisms for constructing PRGs. It
was suggested by Ajtai and Wigderson [1] and in most applications does not yield truly optimal seed
length. Doron, Hatami and Hoza [6] showed that the iterated restrictions approach can achieve
optimal seed length (in a specific scenario). A key step in their proof is an extension of our results on the
elementary symmetric polynomials to subset-wise symmetric polynomials.

1.5     Organization
We present the proofs of our inequalities for symmetric polynomials (Theorems 1.3 and 1.5) in Section 2
and tail bounds for symmetric polynomials (Theorem 1.6) in Section 3. We use these bounds to prove the
bound on products of low-variance variables (Theorem 1.1) in Section 4, and to analyze the generator
from [9] in Section 5.


2     Inequalities for symmetric polynomials
The proof of our inequality for the elementary symmetric polynomials is by induction on k, and uses the
method of Lagrange multipliers together with the Maclaurin inequalities.

Proof of Theorem 1.3. It will be convenient to use

                                              E_2(a) = ∑_{i∈[n]} a_i^2.

By Newton's identity, E_2 = S_1^2 − 2S_2, so for all a ∈ R^n,

                                     S_1^2(a) + E_2(a) ≤ 2(S_1^2(a) + |S_2(a)|).
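Newton's identity here is just the expansion (∑ a_i)^2 = ∑ a_i^2 + 2 ∑_{i<j} a_i a_j; a brute-force Python check (our own sketch):

```python
import random
from itertools import combinations

random.seed(3)
for _ in range(100):
    a = [random.uniform(-5, 5) for _ in range(10)]
    S1 = sum(a)
    S2 = sum(x * y for x, y in combinations(a, 2))
    E2 = sum(x * x for x in a)
    # Newton's identity E_2 = S_1^2 - 2 S_2 ...
    assert abs(E2 - (S1 ** 2 - 2 * S2)) <= 1e-8
    # ... which gives S_1^2 + E_2 <= 2 (S_1^2 + |S_2|).
    assert S1 ** 2 + E2 <= 2 * (S1 ** 2 + abs(S2)) + 1e-8
```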

It therefore suffices to prove that for all a ∈ R^n and k ∈ [n],

                                     S_k^2(a) ≤ (16e^2(S_1^2(a) + E_2(a)))^k / k^k.
We prove this by induction. For k ∈ {1, 2}, it indeed holds. Let k > 2. Our goal will be upper bounding
the maximum of the projectively defined⁵ function

                                     φ_k(a) = S_k^2(a) / (S_1^2(a) + E_2(a))^k

    ⁵ That is, for every a ≠ 0 in R^n and c ≠ 0 in R, we have φ_k(ca) = φ_k(a).




under the constraint that S_1(a) is fixed. Since φ_k is projectively defined, its supremum is attained on
the (compact) unit sphere, and is therefore a maximum. Choose a ≠ 0 to be a point that achieves the
maximum of φ_k. We assume, without loss of generality, that S_1(a) is non-negative (if S_1(a) < 0, consider
−a instead of a). There are two cases to consider:
    The first case is that for all i ∈ [n],

                                     a_i ≤ 2k^{1/2}(S_1^2(a) + E_2(a))^{1/2} / n.                       (2.1)

In this case we do not need the induction hypothesis, and can in fact replace each a_i by its absolute value.
Let P ⊆ [n] be the set of i ∈ [n] such that a_i ≥ 0. Then by Equation (2.1),

                                     ∑_{i∈P} |a_i| ≤ 2k^{1/2}(S_1^2(a) + E_2(a))^{1/2}.

Note that
                                     S_1(a) = ∑_{i∈P} |a_i| − ∑_{i∉P} |a_i| ≥ 0.

Hence
                                ∑_{i∉P} |a_i| ≤ ∑_{i∈P} |a_i| ≤ 2k^{1/2}(S_1^2(a) + E_2(a))^{1/2}.

Overall we have
                                     ∑_{i∈[n]} |a_i| ≤ 4k^{1/2}(S_1^2(a) + E_2(a))^{1/2}.

We then bound

                     |S_k(a_1, . . . , a_n)| ≤ S_k(|a_1|, . . . , |a_n|)
                                            ≤ (e/k)^k (∑_{i∈[n]} |a_i|)^k        (by the Maclaurin inequalities)
                                            ≤ (4e/k^{1/2})^k (S_1^2(a) + E_2(a))^{k/2}.

    The second case is that there exists i_0 ∈ [n] such that

                                     a_{i_0} > 2k^{1/2}(S_1^2(a) + E_2(a))^{1/2} / n.                   (2.2)

In this case we use induction and Lagrange multipliers. For simplicity of notation, for a function F on R^n
denote
                                 F(−i) = F(a_1, a_2, . . . , a_{i−1}, a_{i+1}, . . . , a_n)

for i ∈ [n]. For every δ ∈ R^n such that ∑_i δ_i = 0 we have φ_k(a + δ) ≤ φ_k(a). Hence,⁶ for all δ such that
∑_i δ_i = 0,

      φ_k(a) ≥ S_k^2(a + δ) / (S_1^2(a + δ) + E_2(a + δ))^k
             ≥ (S_k(a) + ∑_i δ_i S_{k−1}(−i) + O(δ^2))^2 / (S_1^2(a) + E_2(a) + 2∑_i a_i δ_i + O(δ^2))^k
             ≥ (S_k^2(a) + 2S_k(a) ∑_i δ_i S_{k−1}(−i) + O(δ^2)) / ((S_1^2(a) + E_2(a))^k + 2k(S_1^2(a) + E_2(a))^{k−1} ∑_i a_i δ_i + O(δ^2)).

Hence, for all δ close enough to zero such that ∑_i δ_i = 0,

      S_k^2(a) / (S_1^2(a) + E_2(a))^k
             ≥ (S_k^2(a) + 2S_k(a) ∑_i δ_i S_{k−1}(−i) + O(δ^2)) / ((S_1^2(a) + E_2(a))^k + 2k(S_1^2(a) + E_2(a))^{k−1} ∑_i a_i δ_i + O(δ^2)),

or

                     ∑_i δ_i (a_i S_k(a) k − (S_1^2(a) + E_2(a)) S_{k−1}(−i)) ≥ 0.                      (2.3)

For the above inequality to hold for all such δ, it must be that there is λ so that for all i ∈ [n],

                                a_i S_k(a) k − (S_1^2(a) + E_2(a)) S_{k−1}(−i) = λ.

To see why this is true, set λ_i = a_i S_k(a) k − (S_1^2(a) + E_2(a)) S_{k−1}(−i). We now have λ_1, . . . , λ_n so that

                                                    ∑_i λ_i δ_i ≥ 0                                     (2.4)

for every δ_1, . . . , δ_n of sufficiently small norm where ∑_i δ_i = 0. We claim that this implies that in fact
λ_i = λ for every i. To see this, assume towards a contradiction that λ_1 ≠ λ_2, where without loss of
generality |λ_1| ≥ |λ_2|. Set

                               δ_1 = −μλ_1 ,  δ_2 = μλ_1 ,  δ_3 = δ_4 = · · · = δ_n = 0

for μ > 0 sufficiently small. It follows that ∑_i δ_i = 0 and ∑_i λ_i δ_i = μ(λ_1 λ_2 − λ_1^2) < 0, so
Equation (2.4) is violated.
    Sum over i to get

                     λn = S_1(a) S_k(a) k − (S_1^2(a) + E_2(a)) (n − (k − 1)) S_{k−1}(a).
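The summation step uses the identity ∑_{i∈[n]} S_{k−1}(−i) = (n − (k − 1)) S_{k−1}(a): every (k−1)-subset of [n] survives the deletion of each of the n − (k − 1) indices outside it. A brute-force check (our own sketch):

```python
import math
import random
from itertools import combinations

def S(a, k):
    """S_k(a) by direct summation over k-subsets."""
    return sum(math.prod(c) for c in combinations(a, k))

random.seed(4)
n, k = 8, 4
a = [random.uniform(-1, 1) for _ in range(n)]
# Deleting coordinate i and summing S_{k-1} over all i counts each
# (k-1)-subset exactly n - (k - 1) times.
lhs = sum(S(a[:i] + a[i + 1:], k - 1) for i in range(n))
rhs = (n - (k - 1)) * S(a, k - 1)
assert abs(lhs - rhs) <= 1e-9
```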

Thus, for all i ∈ [n],

                   a_i S_k(a) k − (S_1^2(a) + E_2(a)) S_{k−1}(−i)
                                = (1/n) (S_1(a) S_k(a) k − (S_1^2(a) + E_2(a)) (n − (k − 1)) S_{k−1}(a)),
    6 Here and below, $O(\delta^2)$ means of absolute value at most $C \cdot \|\delta\|_\infty^2$ for $C = C(n,k) \ge 0$.




or

$$S_k(a)k\Bigl(a_i - \frac{S_1(a)}{n}\Bigr) = (S_1^2(a) + E_2(a))\bigl(S_{k-1}(-i) - S_{k-1}(a)\bigr) + \frac{k-1}{n}(S_1^2(a) + E_2(a))S_{k-1}(a).$$
This specifically holds for $i_0$, so using Equation (2.2) we have

$$S_k(a)k\,\frac{a_{i_0}}{2} < S_k(a)k\Bigl(a_{i_0} - \frac{S_1(a)}{n}\Bigr) \le (S_1^2(a) + E_2(a))a_{i_0}S_{k-2}(-i_0) + \frac{(k-1)(S_1^2(a) + E_2(a))S_{k-1}(a)}{n},$$
or

$$|S_k(a)| \le \frac{2(S_1^2(a) + E_2(a))S_{k-2}(-i_0)}{k} + \frac{2(k-1)(S_1^2(a) + E_2(a))S_{k-1}(a)}{nka_{i_0}} < \frac{2(S_1^2(a) + E_2(a))S_{k-2}(-i_0)}{k} + \frac{(S_1^2(a) + E_2(a))^{1/2}S_{k-1}(a)}{k^{1/2}}. \tag{2.5}$$

To apply induction we need to bound $S_1^2(-i_0) + E_2(-i_0)$ from above. Since

$$S_1^2(a) + E_2(a) - S_1^2(-i_0) - E_2(-i_0) = a_{i_0}^2 + 2a_{i_0}S_1(-i_0) + a_{i_0}^2 = 2a_{i_0}S_1(a) \ge 0,$$

we have the bound

$$S_1^2(-i_0) + E_2(-i_0) \le S_1^2(a) + E_2(a).$$

Finally, by induction and Equation (2.5),

$$\begin{aligned}
|S_k(a)| &\le \frac{2(S_1^2(a) + E_2(a))}{k}\cdot\frac{\bigl(16e^2(S_1^2(-i_0) + E_2(-i_0))\bigr)^{(k-2)/2}}{(k-2)^{(k-2)/2}} + \frac{(S_1^2(a) + E_2(a))^{1/2}}{k^{1/2}}\cdot\frac{\bigl(16e^2(S_1^2(a) + E_2(a))\bigr)^{(k-1)/2}}{(k-1)^{(k-1)/2}} \\
&\le \frac{\bigl(16e^2(S_1^2(a) + E_2(a))\bigr)^{k/2}}{k^{k/2}}\left(\frac{2}{16e^2\bigl(1 - \tfrac{2}{k}\bigr)^{(k-2)/2}} + \frac{1}{4e\bigl(1 - \tfrac{1}{k}\bigr)^{(k-1)/2}}\right) \\
&< \frac{\bigl(16e^2(S_1^2(a) + E_2(a))\bigr)^{k/2}}{k^{k/2}}.
\end{aligned}$$


   The proof of the more general inequality (Theorem 1.5) is by reduction to Theorem 1.3, and uses the
connection between real polynomials in one variable and the symmetric polynomials.

Proof of Theorem 1.5. Assume $a_1, \ldots, a_m$ are nonzero and $a_{m+1}, \ldots, a_n$ are zero. Denote $a' = (a_1, \ldots, a_m)$ and notice that for all⁷ $k \in [n]$,
$$S_k(a) = S_k(a').$$

Write
$$p(\xi) = \prod_{i\in[m]} (\xi a_i + 1) = \sum_{k=0}^{m} \xi^k S_k(a).$$
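The identity above, that the coefficients of $\prod_i(\xi a_i + 1)$ are exactly the $S_k(a)$, also gives a fast way to compute all elementary symmetric polynomials at once. A minimal Python sketch (the helper names are ours, not from the paper):

```python
from itertools import combinations
from math import prod

def elem_sym_all(a):
    """Coefficients of prod_i (1 + x*a_i), ascending: coeffs[k] == S_k(a)."""
    coeffs = [1.0]
    for ai in a:
        # multiply the running polynomial by (1 + x*ai)
        coeffs = [c + ai * d for c, d in zip(coeffs + [0.0], [0.0] + coeffs)]
    return coeffs

def elem_sym_brute(a, k):
    """Direct definition: sum of products over all k-subsets."""
    return sum(prod(t) for t in combinations(a, k))

a = [0.5, -1.25, 2.0, 3.0, -0.75]
coeffs = elem_sym_all(a)
assert len(coeffs) == len(a) + 1
for k in range(len(a) + 1):
    assert abs(coeffs[k] - elem_sym_brute(a, k)) < 1e-9
```

The product form runs in $O(n^2)$ time, versus the exponential cost of summing over all subsets.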

Differentiate $k$ times to get
$$p^{(k)}(\xi) = S_k(a)\,k!\left(\binom{m}{k}\frac{S_m(a)}{S_k(a)}\,\xi^{m-k} + \binom{m-1}{k}\frac{S_{m-1}(a)}{S_k(a)}\,\xi^{m-k-1} + \ldots + \binom{k+1}{k}\frac{S_{k+1}(a)}{S_k(a)}\,\xi + 1\right).$$

Since $p$ has $m$ real roots, $p^{(k)}$ has $m-k$ real roots. Since $p^{(k)}(0) \ne 0$, there is $b \in \mathbb{R}^{m-k}$ so that
$$p^{(k)}(\xi) = S_k(a)\,k! \prod_{i\in[m-k]} (\xi b_i + 1).$$


For all $h \in [m-k]$,
$$S_h(b) = \binom{k+h}{k}\frac{S_{k+h}(a)}{S_k(a)}.$$
By assumption,
$$|S_1(b)| \le C \quad\text{and}\quad |S_2(b)| \le C^2.$$
Theorem 1.3 implies
$$|S_h(b)| = \left|\binom{k+h}{k}\frac{S_{k+h}(a)}{S_k(a)}\right| \le \frac{(8eC)^h}{h^{h/2}}.$$
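The derivative construction in this proof is easy to test numerically: differentiate $p$ $k$ times, recover $b$ from the roots of $p^{(k)}$, and compare $S_h(b)$ against $\binom{k+h}{k}S_{k+h}(a)/S_k(a)$. A sketch assuming NumPy (the sample vector and names are ours):

```python
import numpy as np
from itertools import combinations
from math import prod, comb

def S(x, j):
    """Elementary symmetric polynomial S_j."""
    return sum(prod(t) for t in combinations(list(x), j))

a = [0.5, 1.5, -2.0, 3.0, 1.0]
m, k = len(a), 2

# p(xi) = prod_i (xi*a_i + 1) has ascending coefficients S_0(a), ..., S_m(a)
coeffs = [S(a, j) for j in range(m + 1)]
d = np.polynomial.polynomial.polyder(coeffs, k)   # p^{(k)}
roots = np.polynomial.polynomial.polyroots(d)     # all real since p is real-rooted
b = -1.0 / np.real(roots)                         # p^{(k)}(xi) = S_k(a) k! prod_i (xi*b_i + 1)

for h in range(1, m - k + 1):
    assert abs(S(b, h) - comb(k + h, k) * S(a, k + h) / S(a, k)) < 1e-6
```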

2.1     Zeros of polynomials
We conclude the section with a proof of Fact 1.4, which states that over the reals, if $S_k(a) = S_{k+1}(a) = 0$ for $k > 0$ then $S_\ell(a) = 0$ for all $\ell \ge k$.
    For a univariate polynomial $p(\xi)$ and a root $y \in \mathbb{R}$ of $p$, denote by $\mathrm{mult}(p, y)$ the multiplicity of the root $y$ in $p$. We use the following property of polynomials $p(\xi)$ with real roots (see, e. g., [18]), which can be proved using the interlacing of the zeroes of $p(\xi)$ and $p'(\xi)$: if $\mathrm{mult}(p', y) \ge 2$ then $\mathrm{mult}(p, y) \ge \mathrm{mult}(p', y) + 1$.
    7 For $k > m$ we have $S_k(a) = 0$, so there is nothing to prove.




Proof of Fact 1.4. Let
$$p(\xi) = \prod_{i\in[n]} (\xi + b_i) = \sum_{k=0}^{n} \xi^k S_{n-k}(b).$$
Consider $p^{(n-k-1)}(\xi)$, the $(n-k-1)$-th derivative of $p(\xi)$. Since $S_k(b) = S_{k+1}(b) = 0$ for $k > 0$, it follows that $\xi^2$ divides $p^{(n-k-1)}(\xi)$ and hence $\mathrm{mult}(p^{(n-k-1)}, 0) \ge 2$. Applying the above fact $n-k-1$ times, we get $\mathrm{mult}(p, 0) \ge n-k+1$, so $S_n(b) = \ldots = S_k(b) = 0$.
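The two facts driving this proof, the coefficient identity for $p$ and the observation that the two lowest coefficients of $p^{(n-k-1)}$ are (up to factorials) $S_{k+1}(b)$ and $S_k(b)$, can be verified numerically; a sketch assuming NumPy (the sample vector is ours):

```python
import numpy as np
from itertools import combinations
from math import prod, factorial

b = np.array([1.0, 2.0, -0.5, 3.0])
n = len(b)
S = [sum(prod(t) for t in combinations(b, j)) for j in range(n + 1)]

# p(x) = prod_i (x + b_i): the coefficient of x^j is S_{n-j}(b)
p = np.polynomial.polynomial.polyfromroots(-b)
for j in range(n + 1):
    assert abs(p[j] - S[n - j]) < 1e-9

# the (n-k-1)-th derivative exposes S_{k+1} and S_k in its two lowest coefficients
k = 2
d = np.polynomial.polynomial.polyder(p, n - k - 1)
assert abs(d[0] - factorial(n - k - 1) * S[k + 1]) < 1e-9
assert abs(d[1] - factorial(n - k) * S[k]) < 1e-9
```

So $S_k(b) = S_{k+1}(b) = 0$ forces the constant and linear terms of $p^{(n-k-1)}$ to vanish, which is exactly the divisibility by $\xi^2$ used above.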


3    Tail bounds under limited independence
In this section we work with the following setup. Let $X = (X_1, \ldots, X_n)$ be a vector of real-valued random variables so that $\mathrm{E}[X_i] = 0$ for all $i \in [n]$. Let $\sigma_i^2$ denote the variance of $X_i$, and let $\sigma^2 = \sum_{i=1}^n \sigma_i^2$. The goal is to prove a tail bound on the behavior of the symmetric functions under limited independence.
    We start by obtaining tail estimates under full independence. Let $U$ denote the distribution over $X = (X_1, \ldots, X_n)$ where $X_1, \ldots, X_n$ are independent.

Lemma 3.1. $\mathrm{E}_{X\in U}[S_\ell^2(X)] \le \dfrac{\sigma^{2\ell}}{\ell!}$.

Proof. Since the expectation of $X_i$ is zero for all $i \in [n]$,
$$\begin{aligned}
\mathrm{E}[S_\ell^2(X)] &= \sum_{T,T'\subseteq[n]:|T|=|T'|=\ell} \mathrm{E}\Bigl[\prod_{t\in T} X_t \prod_{t'\in T'} X_{t'}\Bigr] \\
&= \sum_{T\subseteq[n]:|T|=\ell} \mathrm{E}\Bigl[\prod_{t\in T} X_t^2\Bigr] = \sum_{T\subseteq[n]:|T|=\ell} \prod_{t\in T} \sigma_t^2 \\
&\le \frac{1}{\ell!}\Bigl(\sum_{i\in[n]} \sigma_i^2\Bigr)^{\ell} = \frac{\sigma^{2\ell}}{\ell!}.
\end{aligned}$$

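The key identity in the proof, that $\mathrm{E}[S_\ell(X)^2]$ equals the $\ell$-th elementary symmetric polynomial of the variances, can be checked directly, together with the final Maclaurin-type bound by $(\sum_i \sigma_i^2)^\ell/\ell!$. A small sketch (the variable choices are ours):

```python
import random
from itertools import combinations
from math import prod, factorial

def S(x, k):
    """Elementary symmetric polynomial S_k."""
    return sum(prod(t) for t in combinations(x, k))

scales = [0.3, 0.5, 0.2, 0.7, 0.4, 0.6]  # X_i uniform on {-s_i, +s_i}, so Var[X_i] = s_i^2
ell = 3
sigma2 = sum(s * s for s in scales)

# exact identity: E[S_ell(X)^2] equals S_ell applied to the variances
exact = S([s * s for s in scales], ell)
assert exact <= sigma2 ** ell / factorial(ell)

# Monte-Carlo cross-check of the same expectation
random.seed(1)
trials = 20000
emp = sum(S([random.choice((-s, s)) for s in scales], ell) ** 2
          for _ in range(trials)) / trials
assert abs(emp - exact) < 0.05
```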

Corollary 3.2. For $t > 0$ and $\ell \in [n]$, by Markov's inequality,
$$\Pr_{X\in U}\left[|S_\ell(X)| \ge \left(\frac{e^{1/2}t\sigma}{\ell^{1/2}}\right)^{\ell}\right] \le \Pr_{X\in U}\left[|S_\ell(X)| \ge \frac{(t\sigma)^\ell}{\sqrt{\ell!}}\right] \le \frac{1}{t^{2\ell}}. \tag{3.1}$$

If $2e^{1/2}t\sigma \le k^{1/2}$ then by the union bound
$$\Pr_{X\in U}\left[\sum_{\ell=k}^{n} |S_\ell(X)| \ge 2\left(\frac{e^{1/2}t\sigma}{k^{1/2}}\right)^{k}\right] \le \frac{1}{t^{2k} - t^{2(k-1)}}. \tag{3.2}$$

    We now consider limited independence.


Lemma 3.3. Let $D$ denote a distribution over $X = (X_1, \ldots, X_n)$ where $X_1, \ldots, X_n$ are $(2k+2)$-wise independent. Let $t \ge 1$. Except with $D$-probability at most $2t^{-2k}$, the following bounds hold for all $\ell \in \{k, \ldots, n\}$:
$$|S_\ell(X)| \le (8et\sigma)^{\ell}\left(\frac{k}{\ell}\right)^{\ell/2}. \tag{3.3}$$

Proof. In the following the underlying probability distribution over $X$ is $D$. By Lemma 3.1, for $i \in \{k, k+1\}$,
$$\mathrm{E}[S_i^2(X)] \le \frac{\sigma^{2i}}{i!}.$$
Hence by Markov's inequality,
$$\Pr\left[|S_i(X)| \ge \frac{(t\sigma)^i}{\sqrt{i!}}\right] \le t^{-2i}.$$
From now on, condition on the event that
$$|S_k(X)| \le \frac{(t\sigma)^k}{\sqrt{k!}} \quad\text{and}\quad |S_{k+1}(X)| \le \frac{(t\sigma)^{k+1}}{\sqrt{(k+1)!}}, \tag{3.4}$$
which occurs with probability at least $1 - 2t^{-2k}$. Fix $x = (x_1, \ldots, x_n)$ such that Equation (3.4) holds.
    We claim that there must exist $k_0 \in \{0, \ldots, k-1\}$ for which the following bounds hold:
$$|S_{k_0}(x)| \ge \frac{(t\sigma)^{k_0}}{\sqrt{k_0!}}, \tag{3.5}$$
$$|S_{k_0+1}(x)| \le \frac{(t\sigma)^{k_0+1}}{\sqrt{(k_0+1)!}}, \tag{3.6}$$
$$|S_{k_0+2}(x)| \le \frac{(t\sigma)^{k_0+2}}{\sqrt{(k_0+2)!}}. \tag{3.7}$$

To see this, mark point $j \in \{0, \ldots, k+1\}$ as high if
$$|S_j(x)| \ge \frac{(t\sigma)^j}{\sqrt{j!}}$$
and low if
$$|S_j(x)| \le \frac{(t\sigma)^j}{\sqrt{j!}}.$$
A point is marked both high and low if equality holds. Observe that 0 is marked high (and low) since $S_0(x) = 1$, and $k$ and $k+1$ are marked low by Equation (3.4). This implies the existence of a triple $k_0, k_0+1, k_0+2$ where the first point is high and the next two are low.
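One way to realize the marking argument is to take $k_0$ as the last high index below $k$; everything above it up to $k+1$ is then low. A direct transcription (names and toy data are ours):

```python
from math import sqrt, factorial

def find_k0(absS, t_sigma, k):
    """Given absS[j] = |S_j(x)| for j = 0..k+1 satisfying Equation (3.4),
    return k_0 with properties (3.5)-(3.7)."""
    thresh = lambda j: t_sigma ** j / sqrt(factorial(j))
    # j is 'high' when |S_j| >= threshold; j = 0 always qualifies since S_0 = 1
    return max(j for j in range(k) if absS[j] >= thresh(j))

# toy data with t*sigma = 1, so the thresholds are 1/sqrt(j!)
absS = [1.0, 1.5, 0.9, 0.3, 0.1]   # k = 3: indices 3 and 4 are 'low'
k = 3
k0 = find_k0(absS, 1.0, k)
thresh = lambda j: 1.0 / sqrt(factorial(j))
assert absS[k0] >= thresh(k0)            # (3.5): k_0 is high
assert absS[k0 + 1] <= thresh(k0 + 1)    # (3.6)
assert absS[k0 + 2] <= thresh(k0 + 2)    # (3.7)
```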


    Let $\gamma > 0$ be the smallest number so that the following inequalities hold:
$$|S_{k_0+1}(x)| \le |S_{k_0}(x)|\,\frac{\gamma}{\sqrt{k_0+1}}, \tag{3.8}$$
$$|S_{k_0+2}(x)| \le |S_{k_0}(x)|\,\frac{\gamma^2}{\sqrt{(k_0+1)(k_0+2)}}. \tag{3.9}$$
By definition, one of Equations (3.8) and (3.9) holds with equality, so
$$|S_{k_0}(x)| = \max\left\{\frac{|S_{k_0+1}(x)|\sqrt{k_0+1}}{\gamma},\ \frac{|S_{k_0+2}(x)|\sqrt{(k_0+1)(k_0+2)}}{\gamma^2}\right\}.$$
Observe further that $\gamma \le t\sigma$ by Equations (3.5), (3.6) and (3.7). Combining this with the bounds in Equations (3.6) and (3.7),
$$|S_{k_0}(x)| \le \max\left\{\frac{(t\sigma)^{k_0+1}}{\gamma\sqrt{k_0!}},\ \frac{(t\sigma)^{k_0+2}}{\gamma^2\sqrt{k_0!}}\right\} = \frac{(t\sigma)^{k_0+2}}{\gamma^2\sqrt{k_0!}}. \tag{3.10}$$
                                                                 √
    Equations (3.8) and (3.9) let us apply Theorem 1.5 with $C = \gamma\sqrt{k_0+1}$ and $h \ge 3$ to get
$$\left|\frac{S_{k_0+h}(x)}{S_{k_0}(x)}\right| \le (8e\gamma)^h\,\frac{(k_0+1)^{h/2}}{h^{h/2}\binom{k_0+h}{k_0}}.$$
Bounding $|S_{k_0}|$ by Equation (3.10), we get
$$|S_{k_0+h}(x)| \le (8e\gamma)^h\,\frac{(k_0+1)^{h/2}}{h^{h/2}\binom{k_0+h}{k_0}}\cdot\frac{(t\sigma)^{k_0+2}}{\gamma^2\sqrt{k_0!}} \le (8et\sigma)^{k_0+h}\,\frac{(k_0+1)^{h/2}}{h^{h/2}\sqrt{k_0!}\,\binom{k_0+h}{k_0}}.$$
Since
$$\binom{k_0+h}{h} \ge \max\left\{\left(\frac{k_0+h}{k_0}\right)^{k_0},\ \left(\frac{k_0+h}{h}\right)^{h}\right\} \ge \frac{(k_0+h)^{(k_0+h)/2}}{k_0^{k_0/2}\,h^{h/2}},$$
we have
$$\frac{(k_0+1)^{h/2}}{h^{h/2}\sqrt{k_0!}\,\binom{k_0+h}{h}} \le (k_0+1)^{h/2}\,\frac{k_0^{k_0/2}}{(k_0+h)^{(k_0+h)/2}} \le \left(\frac{k_0+1}{k_0+h}\right)^{(k_0+h)/2}.$$
Therefore, denoting $\ell = k_0 + h$, since $k_0 + 1 \le k$,
$$|S_\ell(x)| \le (8et\sigma)^{\ell}\left(\frac{k}{\ell}\right)^{\ell/2}.$$



3.1   Proof of tail bounds
Proof of Theorem 1.6. As in Lemma 3.3, fix $x = (x_1, \ldots, x_n)$ such that Equation (3.4) holds (the random vector $X$ has this property with $D$-probability at least $1 - 2t^{-2k}$). By the proof of the lemma, since by assumption $8et\sigma < 1/2$,
$$\sum_{\ell=k}^{n} |S_\ell(x)| \le \frac{(t\sigma)^k}{\sqrt{k!}} + \frac{(t\sigma)^{k+1}}{\sqrt{(k+1)!}} + \sum_{\ell=k+2}^{n} (8et\sigma)^{\ell}\left(\frac{k}{\ell}\right)^{\ell/2} \le 2(8et\sigma)^k. \tag{3.11}$$




3.2   On the tightness of the tail bounds
We conclude by showing that $(2k+2)$-wise independence is insufficient to fool $|S_\ell|$ for $\ell > 2k+2$ in expectation. We use a modification of a simple proof due to Noga Alon of the $\Omega(n^{k/2})$ lower bound on the support size of a $k$-wise independent distribution on $\{-1,1\}^n$, which was communicated to us by Raghu Meka.
    For this section, let $X_1, \ldots, X_n$ be so that each $X_i$ is uniform over $\{-1,1\}$. Thus $\sigma^2 = \sum_i \mathrm{Var}[X_i] = n$. By Lemma 3.1, we have
$$\mathrm{E}_{X\in U}[|S_\ell(X)|] \le \left(\mathrm{E}_{X\in U}[S_\ell^2(X)]\right)^{1/2} \le \frac{n^{\ell/2}}{\sqrt{\ell!}}. \tag{3.12}$$
In contrast we have the following:
Lemma 3.4. There is a $(2k+2)$-wise independent distribution $D$ on $X = (X_1, X_2, \ldots, X_n)$ in $\{-1,1\}^n$ such that for every $\ell \in [n]$,
$$\Pr_{X\in D}\left[|S_\ell(X)| \ge \binom{n}{\ell}\right] \ge \frac{1}{3n^{k+1}}.$$
Specifically,
$$\mathrm{E}_{X\in D}[|S_\ell(X)|] \ge \frac{\binom{n}{\ell}}{3n^{k+1}}. \tag{3.13}$$
Proof. Let $D$ be a $(2k+2)$-wise independent distribution on $\{-1,1\}^n$ that is uniform over a set $D$ of size $2(n+1)^{k+1} \le 3n^{k+1}$. Such distributions are known to exist [2]. Further, by translating the support by some fixed vector if needed, we may assume that $(1, 1, \ldots, 1) \in D$. It is easy to see that every such translate also induces a $(2k+2)$-wise independent distribution. The claim holds since $S_\ell(1, \ldots, 1) = \binom{n}{\ell}$.
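The translation step can be checked on a toy instance: below, the classic pairwise-independent distribution on $\{-1,1\}^3$ with $x_3 = x_1x_2$ is a stand-in for the $(2k+2)$-wise independent distributions of [2]. Coordinate-wise multiplication by a fixed sign vector preserves pairwise independence, and multiplying by a support point puts $(1,1,\ldots,1)$ into the support:

```python
# pairwise-independent distribution on {-1,1}^3: x3 = x1*x2
support = [(a, b, a * b) for a in (-1, 1) for b in (-1, 1)]

def pairwise_uniform(points):
    """Check that every pair of coordinates is uniform over {-1,1}^2."""
    n = len(points[0])
    for i in range(n):
        for j in range(i + 1, n):
            counts = {}
            for p in points:
                counts[(p[i], p[j])] = counts.get((p[i], p[j]), 0) + 1
            if set(counts.values()) != {len(points) // 4}:
                return False
    return True

assert pairwise_uniform(support)

# translating (coordinate-wise multiplying) by any fixed vector preserves the property
v = (-1, 1, -1)
translated = [tuple(vi * pi for vi, pi in zip(v, p)) for p in support]
assert pairwise_uniform(translated)

# choosing v in the support forces (1,1,1) into the translated support, as in the proof
v = support[1]
shifted = [tuple(vi * pi for vi, pi in zip(v, p)) for p in support]
assert (1, 1, 1) in shifted
```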

    When, for example, $k = O(\log n)$, which is often the case of interest, for $2k+3 \le \ell \le n - (2k+3)$ the RHS of (3.13) is much larger than the bound guaranteed by Equation (3.12). The tail bound provided by Lemma 3.3 therefore cannot be extended to a satisfactory bound on the expectation. Furthermore, applying Lemma 3.3 with
$$t = \frac{1}{8e}\sqrt{\frac{n}{\ell k}}$$


implies that for any $(2k+2)$-wise independent distribution,
$$\Pr\left[|S_\ell(X)| \ge \binom{n}{\ell}\right] \le \Pr\left[|S_\ell(X)| \ge (8et\sqrt{n})^{\ell}\left(\frac{k}{\ell}\right)^{\ell/2}\right] \le 2\left(\frac{64e^2k\ell}{n}\right)^{k}.$$
When $k\ell = o(n)$, this is at most $O(n^{-k+o(1)})$. Comparing this to the bound given in Lemma 3.4, we see that the bound provided by Lemma 3.3 is nearly tight.


4    Limited independence fools products of variables
In this section we work with the following setup. We have n random variables X1 , . . . , Xn , each distributed
in the interval [−1, 1]. Let µi and σi2 denote the mean and variance of Xi , and let σ 2 = ∑ni=1 σi2 . The
following theorem shows that limited independence fools products of bounded variables with low total
variance.

Theorem 4.1. There exists $C > 0$ such that under any $Ck$-wise independent distribution $D$,
$$\left|\mathrm{E}_D\Bigl[\prod_{i=1}^{n} X_i\Bigr] - \prod_{i=1}^{n} \mu_i\right| \le (C\sigma)^k. \tag{4.1}$$


Proof. Denote by $U$ the distribution on $(X_1, \ldots, X_n)$ in which the $X_i$s are independent with the same marginal distribution as in $D$. Define $H \subseteq [n]$ to be the set of indices such that $|\mu_i| \le \sqrt{\sigma}$. There are two cases to consider.
Case one: The first case is that $|H| \ge 2k$. In this case, let $H'$ be a subset of $H$ of size $2k$. Since the variables are bounded in $[-1,1]$, we have
$$\left|\mathrm{E}_D\Bigl[\prod_{i\in[n]} X_i\Bigr]\right| \le \mathrm{E}_D\Bigl[\prod_{i\in[n]} |X_i|\Bigr] \le \mathrm{E}_D\Bigl[\prod_{i\in H'} |X_i|\Bigr].$$
The $2k$-wise independence implies
$$\mathrm{E}_D\Bigl[\prod_{i\in H'} |X_i|\Bigr] = \prod_{i\in H'} \mathrm{E}[|X_i|] \le \prod_{i\in H'} \sqrt{\mathrm{E}[X_i^2]} = \prod_{i\in H'} \sqrt{\sigma_i^2 + \mu_i^2} \le (2\sigma)^k.$$
The same bound also holds under $U$. Hence,
$$\left|\mathrm{E}_D\Bigl[\prod_{i\in[n]} X_i\Bigr] - \mathrm{E}_U\Bigl[\prod_{i\in[n]} X_i\Bigr]\right| \le 2(2\sigma)^k.$$


Case two: The second case is that $|H| < 2k$. Let $T = [n] \setminus H$. For ease of notation, we shall assume that $T = [m]$ for some $m \le n$. We may assume that $m > 10k$, since otherwise there is nothing to prove. Even after conditioning on the outcome of the variables in $H$, the resulting distribution on $X_1, \ldots, X_m$ is $10k$-wise

independent. Since the variables have absolute value at most 1, it suffices to show that for a $10k$-wise independent distribution $D$,
$$\left|\mathrm{E}_D\Bigl[\prod_{i\in[m]} X_i\Bigr] - \mathrm{E}_U\Bigl[\prod_{i\in[m]} X_i\Bigr]\right| \le 2\sigma^k.$$
Write $X_i = \mu_i(1 + Z_i)$ so that $Z_i$ has mean 0 and variance $\sigma_i^2/\mu_i^2$. Define the random variables
$$P = \prod_{i\in[m]} X_i = \prod_{i\in[m]} \mu_i \cdot \sum_{\ell=0}^{m} S_\ell(Z), \qquad P_0 = \prod_{i\in[m]} \mu_i \cdot \sum_{\ell=0}^{4k} S_\ell(Z),$$


where $Z = (Z_1, \ldots, Z_m)$. We will prove the following claim.
Claim 4.2. For a $4k$-wise independent distribution $D$,
$$\mathrm{E}_D\bigl[|P - P_0|\bigr] \le (c\sigma)^k/2.$$

    We first show how to finish the proof of Theorem 4.1 with this claim. We have
$$|\mathrm{E}_D[P] - \mathrm{E}_U[P]| \le \mathrm{E}_D\bigl[|P - P_0|\bigr] + \mathrm{E}_U\bigl[|P - P_0|\bigr] + \bigl|\mathrm{E}_D[P_0] - \mathrm{E}_U[P_0]\bigr|.$$
The first two terms are each bounded from above by $(c\sigma)^k/2$ by the claim, and the last is 0 since $10k$-wise independence fools degree-$4k$ polynomials, such as $P_0$.
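The role of $P_0$ is simply truncating the expansion $\prod_i(1+Z_i) = \sum_\ell S_\ell(Z)$ at degree $4k$; for small $z_i$ the discarded tail is tiny. A quick numeric illustration (the sample values and truncation degree are ours):

```python
from itertools import combinations
from math import prod

def S(x, k):
    """Elementary symmetric polynomial S_k."""
    return sum(prod(t) for t in combinations(x, k))

z = [0.05, -0.08, 0.03, 0.07, -0.04, 0.06]
m = len(z)

# exact identity: prod(1 + z_i) = sum of all S_l(z)
full = prod(1 + zi for zi in z)
assert abs(full - sum(S(z, l) for l in range(m + 1))) < 1e-12

d = 2   # truncation degree (the proof uses 4k)
trunc = sum(S(z, l) for l in range(d + 1))
tail = sum(abs(S(z, l)) for l in range(d + 1, m + 1))
assert abs(full - trunc) <= tail + 1e-12   # triangle inequality on the tail
assert abs(full - trunc) < 1e-2            # the tail is indeed tiny here
```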

Proof of Claim 4.2. Denote by $\bar\sigma_i^2$ the variance of $Z_i$. By definition of $T$, we have $\bar\sigma_i^2 = \sigma_i^2/\mu_i^2 \le \sigma_i^2/\sigma$. The total variance of $Z$ can be bounded by
$$\bar\sigma^2 = \sum_{i=1}^{m} \bar\sigma_i^2 \le \sum_{i\in T} \frac{\sigma_i^2}{\sigma} \le \sigma.$$
Let $G$ denote the event that $|P - P_0| \le 2(8e\sqrt{\bar\sigma})^{4k}$, and denote by $\neg G$ the complement of $G$. Write
$$\mathrm{E}[P - P_0] = \mathrm{E}[(P - P_0)\mathbf{1}(G)] + \mathrm{E}[(P - P_0)\mathbf{1}(\neg G)].$$
By the definition of $G$,
$$|\mathrm{E}[(P - P_0)\mathbf{1}(G)]| \le 2(8e\sqrt{\bar\sigma})^{4k}.$$

Bound the second term as follows. First, since $-1 \le P \le 1$,
$$|\mathrm{E}[(P - P_0)\mathbf{1}(\neg G)]| \le |\mathrm{E}[P\,\mathbf{1}(\neg G)]| + |\mathrm{E}[P_0\,\mathbf{1}(\neg G)]| \le \mathrm{E}[\mathbf{1}(\neg G)] + \sqrt{\mathrm{E}[P_0^2]\cdot \mathrm{E}[\mathbf{1}(\neg G)]}. \tag{4.2}$$



Recall that
$$P - P_0 = \prod_{i\in[m]}\mu_i \cdot \sum_{\ell=4k+1}^{m} S_\ell(Z).$$
Letting $t = 1/\sqrt{\bar\sigma}$ and applying Theorem 1.6,
$$\mathrm{E}[\mathbf{1}(\neg G)] \le 2t^{-8k} = 2\bar\sigma^{4k}. \tag{4.3}$$
Since $\mathrm{E}[Z_i] = 0$ for all $i$, and by Lemma 3.1,
$$\mathrm{E}[P_0^2] \le \sum_{i=0}^{4k} \mathrm{E}[S_i^2] \le \sum_{i=0}^{4k} \frac{\bar\sigma^{2i}}{i!} \le 2. \tag{4.4}$$
So, the RHS of Equation (4.2) is at most $2\bar\sigma^{4k} + 2\bar\sigma^{2k} \le 3\sigma^k$, as required.




5     Analyzing the PRG for rectangles
Gopalan et al. [9] proposed and analyzed a PRG for combinatorial rectangles, which we denote by GMR .
In this section, we provide a different presentation and analysis of their construction, which is based on
our results concerning the symmetric polynomials. Our analysis is simpler and follows the intuition that
products of low variance events are easy to fool using limited independence. It also improves on their seed length in the dependence on $n$ and $\delta$, as discussed above.

5.1   Preliminaries
Let $U$ denote the uniform distribution on $[m]^n$, and let $D$ be a distribution on $[m]^n$. We denote by $\Pr_{x\in D}$ the probability distribution induced by choosing $x$ according to $D$. For $K \subseteq [n]$, denote by $D_K$ the marginal distribution of $D$ in co-ordinates in $K$.
Definition 5.1. A distribution $D$ on $[m]^n$ is $(k,\varepsilon)$-wise independent if for every $K \subseteq [n]$ of size at most $k$, the total variation distance between $D_K$ and $U_K$ is at most $\varepsilon$.
    Naor and Naor [16] showed that such distributions (for m a power of two) can be generated using
seed-length O(log log(n) + k log(m) + log(1/ε)). Indeed, such distributions can be generated by taking
a (k log(m))-wise ε-dependent string of length n log(m). We can also assume that every co-ordinate is
uniformly random in [m], by adding the string (a, a, . . . , a) modulo m, where a ∈ [m] is uniformly random.
    The following property holds. Let $P$ be a real linear combination of combinatorial rectangles,
$$P = \sum_S c_S f_S,$$
where $f_S(x) = \prod_{i\in S} f_{S,i}(x_i)$ with $f_{S,i} : [m] \to \{0,1\}$ for all $S, i$. Let $L_1(P) = \sum_S |c_S|$. The degree of $P$ is the maximum size of $S$ for which $c_S \ne 0$. Convexity implies that if $D$ is $(k,\varepsilon)$-wise independent and $P$ has degree at most $k$ then
$$|\mathrm{E}_{x\in D}[P(x)] - \mathrm{E}_{x\in U}[P(x)]| \le L_1(P)\,\varepsilon.$$
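The convexity bound can be exercised on a tiny example: a hypothetical distribution $D$ on $[2]^3$ perturbed away from uniform, with $\varepsilon$ computed as the worst marginal total-variation distance over coordinate sets of size at most $k$ (all names and values below are ours):

```python
from itertools import product, combinations

m, n, k = 2, 3, 2
points = list(product(range(m), repeat=n))

# a hypothetical, slightly non-uniform distribution D on [m]^n
D = {p: 1.0 / len(points) for p in points}
D[(0, 0, 0)] += 0.01
D[(1, 1, 1)] -= 0.01

def tv_marginal(K):
    """Total variation distance between D_K and the uniform marginal U_K."""
    marg = {}
    for p, w in D.items():
        key = tuple(p[i] for i in K)
        marg[key] = marg.get(key, 0.0) + w
    u = 1.0 / m ** len(K)
    return 0.5 * sum(abs(w - u) for w in marg.values())

eps = max(tv_marginal(K) for r in range(1, k + 1)
          for K in combinations(range(n), r))

# a degree-<=k combination of rectangles: P(x) = 2 * [x_0 == 0] * [x_1 == 1]
cS = {(0, 1): 2.0}
tests = {(0, 1): (lambda x: x == 0, lambda x: x == 1)}

def P(x):
    total = 0.0
    for Sset, c in cS.items():
        val = 1.0
        for i, f in zip(Sset, tests[Sset]):
            val *= 1.0 if f(x[i]) else 0.0
        total += c * val
    return total

L1 = sum(abs(c) for c in cS.values())
ED = sum(w * P(p) for p, w in D.items())
EU = sum(P(p) for p in points) / len(points)
assert abs(ED - EU) <= L1 * eps + 1e-12
```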


5.2   The generator
We use an alternate view of GMR as a collection of hash functions $g : [n] \to [m]$. The generator GMR is based on iterative applications of an alphabet-increasing step. The first alphabet size $m_0$ is chosen to be large enough, and at each step the alphabet size is squared: $m_t = m_{t-1}^2$.

    There is a constant $C > 0$ so that the following holds. Denote by $\delta$ the error parameter of the generator. Let $T \le C\log\log(m)$ be the first integer so that $m_T \ge m$. Let $\delta' = \delta/T$.
Base Case: Let $m_0 \ge C\log(1/\delta)$ be a power of 2. Sample $g_0 : [n] \to [m_0]$ using a $(k_0, \varepsilon_0)$-wise independent distribution on $[m_0]^n$ with
$$k_0 = C\log(1/\delta'), \qquad \varepsilon_0 = \delta' \cdot m_0^{-Ck_0}. \tag{5.1}$$
This requires seed length $O(\log\log(n) + \log(\log\log(m)/\delta)\,\log\log(\log\log(m)/\delta))$.
Squaring the alphabet: Pick $g_t' : [m_{t-1}] \times [n] \to [m_t]$ using a $(k_t, \varepsilon_t)$-wise independent distribution over $[m_t]^{[m_{t-1}]\times[n]}$ with
$$k_t = \max\left\{C\,\frac{\log(1/\delta')}{\log(m_t)},\ 2\right\}, \qquad \varepsilon_t \le m_t^{-Ck_t}.$$
Define a hash function $g_t : [n] \to [m_t]$ as
$$g_t(i) = g_t'(g_{t-1}(i), i).$$
This requires seed length $O(\log\log(n) + \log(m_t) + \log(\log\log(m)/\delta))$.


5.3   Two lemmas
We first analyze the base case using the inclusion-exclusion approach of [7]. We need to extend their
analysis to the setting where the co-ordinates are only approximately k-wise independent.

Lemma 5.2. Let $D$ be a $(k,\varepsilon)$-wise independent distribution on $[m]^n$ with $k$ odd. Then, for every $S_1, \ldots, S_n \subseteq [m]$,
$$\left|\Pr_{g\in D}[\forall i \in [n]\ g(i) \in S_i] - \Pr_{h\in U}[\forall i \in [n]\ h(i) \in S_i]\right| \le \varepsilon m^k + \exp(-\Omega(k)).$$


Proof. Let $p_i = |S_i|/m$, and $q_i = 1 - p_i$. Observe that
$$\Pr_{h\in U}[\forall i \in [n]\ h(i) \in S_i] = \prod_{i=1}^{n} p_i = \prod_{i=1}^{n} (1 - q_i) \le \exp\Bigl(-\sum_{i=1}^{n} q_i\Bigr). \tag{5.2}$$


We consider two cases based on ∑i qi .

                        Theory of Computing, Volume 16 (17), 2020, pp. 1–29

Case 1: When ∑_i q_i ≤ k/(2e). Since every non-zero q_i is at least 1/m, there can be at most mk/(2e)
indices i so that q_i > 0. For i so that q_i = 0, we have S_i = [m], so we can drop such indices and assume
n ≤ mk/(2e). By the Bonferroni inequality, since k is odd,

        | Pr_g[∀ i ∈ [n] g(i) ∈ S_i] − ∑_{j=0}^{k−1} (−1)^j ∑_{J⊆[n]:|J|=j} Pr_g[∀ i ∈ J g(i) ∉ S_i] |
                ≤ ∑_{J⊆[n]:|J|=k} Pr_g[∀ i ∈ J g(i) ∉ S_i].

A similar bound holds for h. Using the (k, ε)-wise independence, and since (n choose k) ≤ (en/k)^k,

        | Pr_g[∀ i ∈ [n] g(i) ∈ S_i] − Pr_h[∀ i ∈ [n] h(i) ∈ S_i] |
                ≤ ε(en/k)^k + 2 ∑_{J⊆[n]:|J|=k} Pr_h[∀ i ∈ J h(i) ∉ S_i].

The second term is twice S_k(q_1, . . . , q_n), which we can bound by Maclaurin's inequality as

        S_k(q_1, . . . , q_n) ≤ (e/k)^k (∑_{i=1}^n q_i)^k ≤ 2^{−k}.

The lemma is proved since n ≤ mk/(2e).
Case 2: When ∑_i q_i > k/(2e). Once again, we drop indices i so that q_i = 0. Consider the largest n' such
that

        k/(2e) − 1 ≤ ∑_{i=1}^{n'} q_i ≤ k/(2e).

Repeating the argument from Case 1 for this n', we get

        | Pr_g[∀ i ∈ [n'] g(i) ∈ S_i] − Pr_h[∀ i ∈ [n'] h(i) ∈ S_i] | ≤ ε(m/2)^k + 2^{−k+1}.

Similarly to Equation (5.2),

        Pr_h[∀ i ∈ [n'] h(i) ∈ S_i] ≤ e^{1−k/(2e)}.

Finally, since

        Pr_g[∀ i ∈ [n] g(i) ∈ S_i] ≤ Pr_g[∀ i ∈ [n'] g(i) ∈ S_i],

the lemma is proved.
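The Maclaurin step in Case 1 is easy to sanity-check numerically. The sketch below (our own helper, not from the paper) computes S_k by the standard coefficient DP on ∏_i (1 + q_i t), and verifies S_k(q) ≤ (∑ q_i)^k / k! ≤ (e/k)^k (∑ q_i)^k ≤ 2^{−k} when ∑ q_i ≤ k/(2e); the second inequality holds because k! ≥ (k/e)^k.

```python
import math
import random

def elem_sym(xs, k):
    """S_k(x): coefficient of t^k in prod_i (1 + x_i t), via the usual DP."""
    e = [1.0] + [0.0] * k
    for x in xs:
        for j in range(k, 0, -1):  # update in place, highest degree first
            e[j] += x * e[j - 1]
    return e[k]

rng = random.Random(1)
k = 5
# Nonnegative q_i whose sum is at most k/(2e), as in Case 1.
q = [rng.uniform(0, 1.0 / 30) for _ in range(20)]
total = sum(q)
assert total <= k / (2 * math.e)

Sk = elem_sym(q, k)
# Maclaurin-type chain: S_k(q) <= (sum q)^k / k! <= (e/k)^k (sum q)^k <= 2^{-k}.
assert Sk <= total ** k / math.factorial(k) + 1e-12
assert (math.e / k) ** k * total ** k <= 2.0 ** (-k) + 1e-12
```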

   To analyze the iterative steps, we use the following lemma. To simplify notation, for a finite set X,
we denote by Pr_{x∈X} the probability distribution induced by choosing x uniformly in X.


Lemma 5.3. There is C > 0 so that the following holds. Let 0 < δ < 1/C. Assume

        k > 1,    ℓ ≥ log(1/δ),    ℓ ≥ k,    ℓ^{−k} ≤ δ^C,    ε ≤ (mℓ)^{−Ck}.

Let D be a (Ck, ε)-wise independent distribution on g' : [ℓ] × [n] → [m] so that for every (a, i) ∈ [ℓ] × [n],
the distribution of g'(a, i) is uniform on [m]. Let g : [n] → [m] be defined by g(i) = g'(x_i, i). Then,

        | Pr_{g'∈D, x∈[ℓ]^n}[∀ i ∈ [n] g(i) ∈ S_i] − Pr_{h∈[m]^n}[∀ i ∈ [n] h(i) ∈ S_i] | ≤ δ.

Proof. Let p_i = |S_i|/m and q_i = 1 − p_i. Partition [n] into a head H = {i : p_i < ℓ^{−0.1}} and a tail T = {i :
p_i ≥ ℓ^{−0.1}}. There are two cases to consider.8
The head is large. If |H| ≥ k, we show that both probabilities are small, which means that they are close.
Indeed, let H' be the first k indices in H. First, by definition of H,

        Pr_h[∀ i ∈ [n] h(i) ∈ S_i] ≤ Pr_h[∀ i ∈ H' h(i) ∈ S_i] = ∏_{i∈H'} Pr_h[h(i) ∈ S_i] ≤ ℓ^{−0.1k}.

Second, (k, ε)-wise independence implies

        Pr_g[∀ i ∈ [n] g(i) ∈ S_i] ≤ Pr_g[∀ i ∈ H' g(i) ∈ S_i] ≤ ℓ^{−0.1k} + ε,

so the proof is complete.
The head is small. From now on, we may assume that |H| < k. We may also assume that q_i ≥ 1/m
and p_i > 0 for all i ∈ T, since otherwise S_i is trivial and we can drop such an index. As in the proof of
Lemma 5.2, by restricting to a subset if necessary, we can also assume that

        ∑_{i∈T} q_i ≤ C log(1/δ).                                                                 (5.3)

Therefore, |T| ≤ Cm log(1/δ).
   For i ∈ T and a ∈ [ℓ], define the random variable

        Y_{i,a} = 1(g'(a, i) ∈ S_i) − p_i.

Since g'(a, i) is uniform over [m] for all a, i, we have E[Y_{i,a}] = 0 and Var[Y_{i,a}] = q_i p_i. Define the random
vector A = (A_i : i ∈ T) by

        A_i = (1/(ℓ p_i)) ∑_{a=1}^ℓ Y_{i,a}.

Define the random variable

        Q = Pr_x[∀ i ∈ H g(i) ∈ S_i].
   8 Standard arguments imply that if (k, ε)-wise independence fools both ∀ i ∈ H g(i) ∈ S_i and ∀ i ∈ T g(i) ∈ S_i with error δ,
then (O(k), ε^{O(1)})-wise independence fools their intersection with error O(δ). So it suffices to consider each of them separately.
However, since we could not find an explicit reference for this statement, we provide a self-contained argument.



So, for every fixed g',

        Pr_x[∀ i ∈ [n] g(i) ∈ S_i] = Q · ∏_{i∈T} p_i (1 + A_i) = Q · ∏_{i∈T} p_i · ∑_{j=0}^{|T|} S_j(A).          (5.4)

Define

        P  = Q · ∏_{i∈T} p_i · ∑_{j=0}^{|T|} S_j(A),
        P' = Q · ∏_{i∈T} p_i · ∑_{j=0}^{k} S_j(A).

The degree of P' is at most 2k. We will show that P' is a good approximation to P.
Claim 5.4. |E_D[P − P']| ≤ O(ℓ^{−0.2k}).
    The claim completes the proof:

        |E_D[P] − E_U[P]| ≤ |E_D[P − P']| + |E_D[P'] − E_U[P']| + |E_U[P − P']|.

Bound the first and third terms by O(ℓ^{−0.2k}) using the claim (U is also (Ck, ε)-wise independent). Bound
the second term as follows. Since k ≥ 2, for all i,

        Var[A_i] = (1/(ℓ² p_i²)) ∑_{a=1}^ℓ Var[Y_{i,a}] = q_i/(ℓ p_i) ≤ q_i/ℓ^{0.9},
        L1(A_i) ≤ (1/(ℓ p_i)) ∑_{a=1}^ℓ L1(Y_{i,a}) ≤ 2/p_i ≤ ℓ.

Plugging in the bounds from Equation (5.3):

        ∑_{i∈T} Var[A_i] ≤ C log(1/δ)/ℓ^{0.9} ≤ 1/ℓ^{0.6},
        ∑_{i∈T} L1(A_i) ≤ Cm log(1/δ) · ℓ ≤ m ℓ^{O(1)},
        L1(S_k(A)) ≤ (∑_{i∈T} L1(A_i))^k ≤ m^k ℓ^{O(k)}.

Thus, since the degree of P' is 2k,

        |E_D[P'] − E_U[P']| ≤ ε L1(P') ≤ ℓ^{−k}.

Overall,

        | Pr_g[∀ i ∈ [n] g(i) ∈ S_i] − Pr_h[∀ i ∈ [n] h(i) ∈ S_i] | = |E_D[P] − E_U[P]| ≤ O(ℓ^{−0.2k}) ≤ δ.



Proof of Claim 5.4. We repeat the proof of Lemma 3.3 with σ² = ℓ^{−0.5} and t = ℓ^{0.2}. The event G defined
as

        G = { |S_k(A)| ≤ ℓ^{−0.05k}/√(k!)   and   |S_{k+1}(A)| ≤ ℓ^{−0.05(k+1)}/√((k+1)!) }

occurs with probability at least 1 − 2ℓ^{−0.4k}. Denote by ¬G the complement of G. Write

        E[P − P'] = E[(P − P') 1(G)] + E[(P − P') 1(¬G)].

   Bound the first term as follows. If the co-ordinates of A are (Ck, 0)-wise independent, then, by
Lemma 3.1,

        E[S_k(A)²] ≤ (∑_{i∈T} Var[A_i])^k / k! ≤ ℓ^{−0.6k}/k!.

Hence, under (Ck, ε)-wise independence,

        E[S_k(A)²] ≤ ℓ^{−0.6k}/k! + ε L1(S_k) ≤ ℓ^{−0.5k}/k!.                                     (5.5)

As in the proof of Theorem 1.6, conditioned on G,

        |P − P'| ≤ 2(8e ℓ^{−0.05})^k.

Thus,

        |E[(P − P') 1(G)]| ≤ 2(20 ℓ^{−0.25})^k.

   It remains to bound the second term from above. Note that 0 ≤ P ≤ 1 since it is the probability of an
event. Bound

        |E[(P − P') 1(¬G)]| ≤ |E[P 1(¬G)]| + |E[P' 1(¬G)]| ≤ E[1(¬G)] + √( E[P'²] · E[1(¬G)] ).

Since E[A_i] = 0 for all i and L1(S_k) ≤ ℓ^{O(k)}, using Equation (5.5), it follows that under (Ck, ε)-wise
independence, we have E[P'²] ≤ O(1). So, we can bound the right-hand side from above by O(ℓ^{−0.2k}).
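The second-moment bound from Lemma 3.1 used above can be verified exactly on a toy example: for independent mean-zero co-ordinates, the cross terms in E[S_k(A)²] vanish, so it equals S_k applied to the variance vector, which Maclaurin bounds by (∑_i Var[A_i])^k / k!. In the sketch below (ours), full independence stands in for (Ck, 0)-wise independence, and the ±σ_i co-ordinates are an illustrative toy choice.

```python
import itertools
import math

def elem_sym(xs, k):
    """S_k(x) via the coefficient DP on prod_i (1 + x_i t)."""
    e = [1.0] + [0.0] * k
    for x in xs:
        for j in range(k, 0, -1):
            e[j] += x * e[j - 1]
    return e[k]

sig = [0.3, 0.5, 0.7]   # A_i = +/- sig_i with prob 1/2: mean 0, Var = sig_i^2
k = 2

# Exact expectation of S_k(A)^2 over all sign patterns (full independence).
second_moment = sum(
    elem_sym([s * v for s, v in zip(signs, sig)], k) ** 2
    for signs in itertools.product([1, -1], repeat=len(sig))
) / 2 ** len(sig)

var = [v * v for v in sig]
# E[S_k(A)^2] = S_k(Var[A_1], ..., Var[A_n]) for independent mean-zero A_i.
assert abs(second_moment - elem_sym(var, k)) < 1e-12
# Maclaurin: S_k of the variances is at most (sum of variances)^k / k!.
assert second_moment <= sum(var) ** k / math.factorial(k) + 1e-12
```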




5.4     Completing the analysis
Proof of Theorem 1.9. The proof uses a hybrid argument. The GMR generator chooses g_0 : [n] → [m_0],
and then g'_1, . . . , g'_T, where g'_t : [m_{t−1}] × [n] → [m_t] defines the map

        g_t(i) = g'_t(g_{t−1}(i), i).

Let h_0, h'_1, . . . , h'_T be truly random hash functions with the same domains and ranges. For 0 ≤ t, q ≤ T, define
the hybrid family G^q_t = {f^q_t : [n] → [m_t]} as follows: for t = 0 and every q, define

        f^q_0 = g_0  for q = 0,        f^q_0 = h_0  for q > 0,

and for t > 0 and every q,

        f^q_t(i) = g'_t(f^q_{t−1}(i), i)  for t ≥ q,        f^q_t(i) = h'_t(f^q_{t−1}(i), i)  for t < q.

For every q, let G^q = G^q_T. Thus, G^0 = GMR and G^T = U. We will show that for every q ≥ 0,

        | Pr_{f^{q+1}∈G^{q+1}}[∀ i ∈ [n] f^{q+1}(i) ∈ S_i] − Pr_{f^q∈G^q}[∀ i ∈ [n] f^q(i) ∈ S_i] | ≤ δ' = δ/T.

The desired bound then follows by the triangle inequality.
   In the case q = 0, couple G^0 and G^1 by picking the same g'_1, . . . , g'_T, and use them to define the function
f' : [m_0] × [n] → [m] so that

        f^0(i) = f'(g_0(i), i),        f^1(i) = f'(h_0(i), i).

For i ∈ [n], define

        S'_i = {a ∈ [m_0] : f'(a, i) ∈ S_i}.

Thus,

        | Pr_{f^1∈G^1}[∀ i ∈ [n] f^1(i) ∈ S_i] − Pr_{f^0∈G^0}[∀ i ∈ [n] f^0(i) ∈ S_i] |
                = | Pr_{h_0}[∀ i ∈ [n] h_0(i) ∈ S'_i] − Pr_{g_0}[∀ i ∈ [n] g_0(i) ∈ S'_i] | ≤ δ',

by applying Lemma 5.2 with k = O(log(1/δ')) and ε = δ' · m_0^{−O(k)}.
   For the case q > 0, couple G^{q+1} and G^q by picking the same g'_{q+1}, . . . , g'_T, and pick x ∈ [m_{q−1}]^n
uniformly at random. There is a function f' : [m_q] × [n] → [m] so that

        f^{q+1}(i) = f'(h'_q(x_i, i), i),        f^q(i) = f'(g'_q(x_i, i), i).

As before, define

        S'_i = {a ∈ [m_q] : f'(a, i) ∈ S_i}.

Hence,

        | Pr_{f^{q+1}∈G^{q+1}}[∀ i ∈ [n] f^{q+1}(i) ∈ S_i] − Pr_{f^q∈G^q}[∀ i ∈ [n] f^q(i) ∈ S_i] |
                = | Pr_{h'_q, x}[∀ i ∈ [n] h'_q(x_i, i) ∈ S'_i] − Pr_{g'_q, x}[∀ i ∈ [n] g'_q(x_i, i) ∈ S'_i] | ≤ δ',

by Lemma 5.3 with

        k_q > 1,    m_{q−1} ≥ log(1/δ'),    m_{q−1} ≥ k_q,    m_{q−1}^{−k_q} ≤ δ'^C,    ε_q ≤ (m_q m_{q−1})^{−C k_q}.



5.5   Minima-wise independence
Saks et al. [19] showed how to translate a PRG for combinatorial rectangles to an approximately minima-
wise independent family. We conclude with a routine extension of their result to large ℓ.

Proof of Theorem 1.10. Fix S ⊆ [n] and a sequence T = (t_1, . . . , t_ℓ) of ℓ distinct elements from S. The
event

        g(t_1) < · · · < g(t_ℓ) < min g(S \ T)

can be viewed as the disjoint union of (m choose ℓ) events, obtained by fixing the set A = {a_1 < · · · < a_ℓ} that T maps to.
The indicator 1_A of the event

        g(t_1) = a_1, . . . , g(t_ℓ) = a_ℓ,  g(S \ T) > a_ℓ

is a combinatorial rectangle: Define

        f_i(x_i) = 1                for i ∉ S,
        f_i(x_i) = 1(x_i = a_j)     for i = t_j ∈ T,
        f_i(x_i) = 1(x_i > a_ℓ)     for i ∈ S \ T,

and

        f_A(x_1, . . . , x_n) = ∏_{i∈[n]} f_i(x_i).


Since g(i) = x_i, it follows that 1_A(g) = f_A(x). Further, choosing h ∈ U is equivalent to choosing x ∈ [m]^n
uniformly at random. Hence,

        Pr_{g∈G}[g(t_1) < · · · < g(t_ℓ) < min g(S \ T)]
                = ∑_A E_{y∈{0,1}^r}[f_A(G(y))]
                = ∑_A ( E_{h∈U}[1_A(h)] ± ε )
                = Pr_{h∈U}[h(t_1) < · · · < h(t_ℓ) < min h(S \ T)] ± (m choose ℓ) ε.


Acknowledgments
We thank Nati Linial, Raghu Meka, Yuval Peres, Dan Spielman, Avi Wigderson and David Zuckerman
for helpful discussions. We thank an anonymous referee for pointing out an error in the statement of
Theorem 1.6 in a previous version of the paper.


References
 [1] Miklós Ajtai and Avi Wigderson: Deterministic simulation of probabilistic constant depth
     circuits. In Silvio Micali and Franco Preparata, editors, Randomness and Computation,
     volume 5 of Adv. in Computing Research, pp. 199–222. JAI Press, 1989. Available on author's
     homepage. Preliminary version in FOCS'85. 8

 [2] Noga Alon, László Babai, and Alon Itai: A fast and simple randomized parallel algorithm
     for the maximal independent set problem. J. Algorithms, 7(4):567–583, 1986. [doi:10.1016/0196-
     6774(86)90019-2] 16

 [3] Roy Armoni, Michael E. Saks, Avi Wigderson, and Shiyu Zhou: Discrepancy sets and
     pseudorandom generators for combinatorial rectangles. In Proc. 37th FOCS, pp. 412–421. IEEE
     Comp. Soc., 1996. [doi:10.1109/SFCS.1996.548500] 2, 6

 [4] Andrei Z. Broder, Moses Charikar, Alan M. Frieze, and Michael Mitzenmacher:
     Min-wise independent permutations. J. Comput. System Sci., 60(3):630–659, 2000. Preliminary
     version in STOC'98. [doi:10.1006/jcss.1999.1690] 2, 6

 [5] Andrei Z. Broder, Moses Charikar, and Michael Mitzenmacher: A derandomization
     using min-wise independent permutations. J. Discr. Algorithms, 1(1):11–20, 2003. Preliminary
     version in RANDOM'98. [doi:10.1016/S1570-8667(03)00003-0] 2, 6

 [6] Dean Doron, Pooya Hatami, and William M. Hoza: Log-seed pseudorandom generators
     via iterated restrictions. In Proc. 35th Comput. Complexity Conf. (CCC'20), pp. 1–36. Schloss
     Dagstuhl–Leibniz-Zentrum fuer Informatik, 2020. [doi:10.4230/LIPIcs.CCC.2020.6] 8

 [7] Guy Even, Oded Goldreich, Michael Luby, Noam Nisan, and Boban Veličković: Ef-
     ficient approximation of product distributions. Random Struct. Algor., 13(1):1–16, 1998. Preliminary
     version in STOC'92. [doi:10.1002/(SICI)1098-2418(199808)13:1<1::AID-RSA1>3.0.CO;2-W] 2,
     4, 6, 20

 [8] Parikshit Gopalan, Daniel M. Kane, and Raghu Meka: Pseudorandomness via the
     discrete Fourier transform. SIAM J. Comput., 47(6):2451–2487, 2018. Preliminary version in
     FOCS'15. [doi:10.1137/16M1062132, arXiv:1506.04350] 7

 [9] Parikshit Gopalan, Raghu Meka, Omer Reingold, Luca Trevisan, and Salil P.
     Vadhan: Better pseudorandom generators from milder pseudorandom restrictions. In Proc. 53rd
     FOCS, pp. 120–129. IEEE Comp. Soc., 2012. [doi:10.1109/FOCS.2012.77, arXiv:1210.0049] 2, 3,
     4, 5, 6, 7, 8, 19

[10] Parikshit Gopalan and Amir Yehudayoff: Inequalities and tail bounds for elementary
     symmetric polynomials. Electron. Colloq. Comput. Complexity, TR14-019, 2014. [ECCC] 7

[11] Piotr Indyk: A small approximately min-wise independent family of hash functions. J. Algorithms,
     38(1):84–90, 2001. Preliminary version in SODA'99. [doi:10.1006/jagm.2000.1131] 2, 4, 6


[12] Nathan Linial, Michael Luby, Michael E. Saks, and David Zuckerman: Efficient
     construction of a small hitting set for combinatorial rectangles in high dimension. Combinatorica,
     17(2):215–234, 1997. Preliminary version in STOC'93. [doi:10.1007/BF01200907] 2, 3, 6, 7

[13] Chi-Jen Lu: Improved pseudorandom generators for combinatorial rectangles. Combinatorica,
     22(3):417–434, 2002. Preliminary version in ICALP'98. [doi:10.1007/s004930200021] 2, 6

[14] Raghu Meka, Omer Reingold, and Avishay Tal: Pseudorandom generators for
     width-3 branching programs. In Proc. 51st STOC, pp. 626–637. ACM Press, 2019.
     [doi:10.1145/3313276.3316319, arXiv:1806.04256] 8

[15] Ketan Mulmuley: Randomized geometric algorithms and pseudorandom generators. Algo-
     rithmica, 16(4/5):450–463, 1996. Preliminary version in FOCS'92. [doi:10.1007/BF01940875]
     6

[16] Joseph Naor and Moni Naor: Small-bias probability spaces: Efficient constructions and
     applications. SIAM J. Comput., 22(4):838–856, 1993. Preliminary version in STOC'90.
     [doi:10.1137/0222053] 19

[17] Noam Nisan: Pseudorandom generators for space-bounded computation. Combinatorica,
     12(4):449–461, 1992. Preliminary version in STOC'90. [doi:10.1007/BF01305237] 8

[18] George Pólya and George Szegő: Problems and Theorems in Analysis II. Springer, 1976. 12

[19] Michael E. Saks, Aravind Srinivasan, Shiyu Zhou, and David Zuckerman: Low
     discrepancy sets yield approximate min-wise independent permutation families. Information Pro-
     cessing Letters, 73(1–2):29–32, 2000. Preliminary version in RANDOM'99. [doi:10.1016/S0020-
     0190(99)00163-5] 2, 6, 7, 26

[20] J. Michael Steele: The Cauchy-Schwarz Master Class. Cambridge Univ. Press, 2004.
     [doi:10.1017/CBO9780511817106] 3, 4


AUTHORS

      Parikshit Gopalan
      Senior Researcher
      VMware
      Palo Alto, CA, US
      pgopalan vmware com
      https://research.vmware.com/researchers/parikshit-gopalan





    Amir Yehudayoff
    Associate Professor
    Department of Mathematics
    Technion-IIT, Haifa, Israel
    amir yehudayoff gmail com
    https://yehudayoff.net.technion.ac.il/


ABOUT THE AUTHORS

    PARIKSHIT GOPALAN has been a researcher at Microsoft Research (Silicon Valley and
       Redmond) and previously a postdoc at UT Austin and the University of Washington.
      After completing his undergraduate studies at IIT Bombay, he received his Ph. D. in
       2006 from the Algorithms, Combinatorics and Optimization program at Georgia Tech
       under the supervision of Richard Lipton. He has worked on coding theory, erasure
       coding for distributed storage and computational complexity. He is currently interested
       in algorithms and systems for big data and machine learning.


    AMIR YEHUDAYOFF received his Ph. D. in 2008 from the Weizmann Institute of Science
       under the supervision of Ran Raz. Subsequently he spent two years at the Institute for
       Advanced Study in Princeton. He is currently an associate professor in the Department
       of Mathematics at the Technion–Israel Institute of Technology. His main research area
       is theoretical computer science, with a recent focus on learning theory and information
       theory.



