Bounding the Sensitivity of Polynomial Threshold Functions

Authors Prahladh Harsha, Adam Klivans, Raghu Meka,
Plaintext
                           T HEORY OF C OMPUTING, Volume 10 (1), 2014, pp. 1–26
                                        www.theoryofcomputing.org

                   S PECIAL ISSUE : A NALYSIS OF B OOLEAN F UNCTIONS



                   Bounding the Sensitivity of
                 Polynomial Threshold Functions
                 Prahladh Harsha∗                     Adam Klivans               Raghu Meka†
                 Received July 16, 2012; Revised November 11, 2013; Published March 25, 2014




       Abstract: We give the first nontrivial upper bounds on the average sensitivity and noise
       sensitivity of polynomial threshold functions. More specifically, for a Boolean function f on
       n variables equal to the sign of a real, multivariate polynomial of total degree d, we prove

           • The average sensitivity of f is at most O(n1−1/(4d+6) ). (We also give a combinatorial
                                          d
             proof of the bound O(n1−1/2 ).)
          • The noise sensitivity of f with noise rate δ is at most O(δ 1/(4d+6) ).
                                                                               √        √
          Previously, only bounds for the degree d = 1 case were known (O( n) and O( δ ), for
       average and noise sensitivity, respectively).
          We highlight some applications of our results in learning theory where our bounds
       immediately yield new agnostic learning algorithms and resolve an open problem of Klivans,
       O’Donnell and Servedio (FOCS’08).
     An extended abstract of this result, which was proved independently by two groups (the authors of this paper and
Diakonikolas et al. [11]), appeared in the Proc. 42nd ACM Symposium on Theory of Computing, 2010 [10].
   ∗ Work done while the author was visiting the University of Texas at Austin.
   † Work done while the author was at the University of Texas at Austin.



ACM Classification: F.2.2, G.3, I.2.6
AMS Classification: 68Q25, 68Q32
Key words and phrases: polynomial threshold functions, average sensitivity


© 2014 Prahladh Harsha, Adam Klivans, and Raghu Meka
c b Licensed under a Creative Commons Attribution License (CC-BY)                        DOI: 10.4086/toc.2014.v010a001
                        P RAHLADH H ARSHA , A DAM K LIVANS , AND R AGHU M EKA

          The proof of our results use (i) the invariance principle of Mossel, O’Donnell and
      Oleszkiewicz (2010), (ii) the anti-concentration properties of polynomials in Gaussian space
      due to Carbery and Wright (2001) and (iii) new structural theorems about random restrictions
      of polynomial threshold functions obtained via hypercontractivity.
          These structural results may be of independent interest, as they provide a generic template
      for transforming problems related to polynomial threshold functions defined on the Boolean
      hypercube to polynomial threshold functions defined in Gaussian space.


1     Introduction
1.1   Background
Let P be a real, multivariate polynomial of degree d, and let f = sign(P). We say that the Boolean function
 f is a polynomial threshold function (PTF) of degree d. PTFs play an important role in computational
complexity with applications in circuit complexity [2, 4], learning theory [26, 24], communication
complexity [35, 36], and quantum computing [3]. While many interesting properties (e. g., Fourier
spectra, influence, sensitivity) have been characterized for the case d = 1 of linear threshold functions
(LTFs), very little is known for degrees 2 and higher. Gotsman and Linial [13] conjectured, for example,
                                                                 √
that the average sensitivity of a degree-d polynomial is O(d n). In this work, we take a step towards
resolving this conjecture and give the first nontrivial bounds on the average sensitivity and noise sensitivity
of degree-d PTFs (Theorem 1.6).
     Average sensitivity [5] and noise sensitivity [17, 6] are two fundamental quantities that arise in
the analysis of Boolean functions. Roughly speaking, the average sensitivity of a Boolean function f
measures the expected number of bit positions that change the sign of f for a randomly chosen input, and
the noise sensitivity of f measures the probability over a randomly chosen input x that f changes sign if
each bit of x is flipped independently with probability δ (we give formal definitions below).
     Bounds on the average and noise sensitivity of Boolean functions have direct applications in hardness
of approximation [14, 23], hardness amplification [31], circuit complexity [27], the theory of social
choice [19], and quantum complexity [37]. In this paper, we focus on applications in learning theory,
where it is known that bounds on the noise sensitivity of a class of Boolean functions yield learning algo-
rithms for the class that succeed in harsh noise models (i. e., work in the agnostic model of learning) [18].
We obtain the first efficient algorithms for agnostically learning PTFs with respect to the uniform distribu-
tion on the hypercube. We also give efficient algorithms for agnostically learning ellipsoids in Rn with
respect to the Gaussian distribution, resolving an open problem of Klivans et al. [25]. We discuss these
learning theory applications in Section 2.


1.2   Main definitions and results
We begin by defining the (Boolean) noise sensitivity of a Boolean function:

Definition 1.1 (Boolean noise sensitivity). Let f be a Boolean function f : {1, −1}n → {1, −1}. For any
δ ∈ (0, 1), let X be a random element of the hypercube {1, −1}n and Z a δ -perturbation of X defined as

                        T HEORY OF C OMPUTING, Volume 10 (1), 2014, pp. 1–26                                 2
                 B OUNDING THE S ENSITIVITY OF P OLYNOMIAL T HRESHOLD F UNCTIONS

follows: for each i independently, Zi is set to Xi with probability 1 − δ and −Xi with probability δ . The
noise sensitivity of f , denoted NSδ ( f ), for noise δ is then defined as follows:

                                       NSδ ( f ) = Pr [ f (X) 6= f (Z)] .

    Intuitively, the Boolean noise sensitivity of f measures the probability that f changes value when a
random input to f is perturbed slightly. In order to analyze Boolean noise sensitivity, we will also need
to analyze the Gaussian noise sensitivity, which is defined similarly, but the random variables X and Z
are drawn from a multivariate Gaussian distribution. Let N = N(0, 1) denote the univariate Gaussian
distribution on R with mean 0 and variance 1.
Definition 1.2 (Gaussian noise sensitivity). Let f : Rn → {−1, 1} be any Boolean function on Rn . Let
                                               √ from the multivariate Gaussian distribution N and Z a
X,Y be two independent random variables drawn                                                   n

δ -perturbation of X defined by Z = (1 − δ )X + 2δ − δ 2Y . The Gaussian noise sensitivity of f , denoted
GNSδ ( f ), for noise δ is defined as follows:

                                      GNSδ ( f ) = Pr [ f (X) 6= f (Z)] .
                                                                                               √
    It is well known that the Boolean and Gaussian noise sensitivity of LTFs are at most O( δ ). Our
results give the first nontrivial bounds for degrees 2 and higher in both the Gaussian and Boolean cases,
with the Gaussian case being considerably easier to handle than the Boolean case.
Theorem 1.3 (Boolean noise sensitivity). For any degree-d PTF f : {1, −1}n → {1, −1} and 0 < δ < 1,
                                                                
                                   NSδ ( f ) = 2O(d) · δ 1/(4d+6) .

    For the Gaussian case, we get a slightly better dependence on the degree d.
Theorem 1.4 (Gaussian noise sensitivity). For any degree-d polynomial P such that P is either multilinear
or corresponds to an ellipsoid, the following holds for the corresponding PTF f = sign(P). For all
0 < δ < 1,                                                       
                                   GNSδ ( f ) = 2O(d) · δ 1/(2d+1) .

   Diakonikolas et al. [11] prove that a similar bound holds for all degree-d PTFs. Our next set of results
bound the average sensitivity or total influence of degree-d PTFs.
Definition 1.5 (average sensitivity). Let f be a Boolean function, and let X be a random element of
                                                                 (i)      (i)
the hypercube {1, −1}n . Let X (i) ∈ {1, −1}n be such that Xi = −Xi and X j = X j for j 6= i. Then, the
influence of the ith variable is defined by
                                                    h               i
                                       Ii ( f ) = Pr f (X) 6= f X (i) .

The sum of all the influences is referred to as the average sensitivity of the function f ,

                                             AS( f ) = ∑ Ii ( f ) .
                                                         i


                        T HEORY OF C OMPUTING, Volume 10 (1), 2014, pp. 1–26                             3
                         P RAHLADH H ARSHA , A DAM K LIVANS , AND R AGHU M EKA

     Clearly, for any function f , AS( f ) is at most n. It is well known that the average sensitivity of “unate”
                                                                                              √
functions (functions monotone in each coordinate), and thus of LTFs in particular is O( n). This bound
                                                                    √
is tight as the Majority function has average sensitivity Θ( n). As mentioned before, Gotsman and
                                                                                               √
Linial [13] conjectured in 1994 that the average sensitivity of any degree-d PTF f is O(d n). We are not
aware of any progress on this conjecture until now, with no o(n) bounds known.
     We give two upper bounds on the average sensitivity of degree-d PTFs. We first use a simple
translation lemma for bounding average sensitivity in terms of noise sensitivity of a Boolean function and
Theorem 1.3 to obtain the following bound.

Theorem 1.6 (average sensitivity). For a degree-d PTF f : {1, −1}n → {1, −1},
                                                                   
                                       AS( f ) = 2O(d) · n1−1/(4d+6) .

      We also give an elementary combinatorial argument, to show that the average sensitivity of any
                                          d
degree-d PTF is at most 3n1−1/2 . The combinatorial proof is based on the following lemma for
general Boolean functions that may prove useful elsewhere. For x ∈ {1, −1}n , and i ∈ [n], let x−i =
(x1 , . . . , xi−1 , xi+1 , . . . , xn ).

Lemma 1.7. For Boolean functions fi : {1, −1}n → {1, −1} with fi not depending on the i’th coordinate
xi , and X ∈u {1, −1}n ,
                              "                #2
                                  E    ∑ Xi fi (X−i )    ≤ 2 ∑ AS( fi ) + n .
                                  X
                                        i                      i

    We believe that when the functions fi in the above lemma are LTFs, the above bound can be improved
to O(n), which in turn would imply the Gotsman-Linial conjecture for quadratic threshold functions.


1.3   Random restrictions of PTFs—a structural result
An important ingredient of our sensitivity bounds for PTFs are new structural theorems about random
restrictions of PTFs obtained via hypercontractivity. The structural results we obtain can be seen as
part of the high level “randomness vs. structure” paradigm that has played a fundamental role in many
recent breakthroughs in additive number theory and combinatorics. Specifically, we obtain the following
structural result (Lemmas 5.1 and 5.2): for any PTF, there exists a small set of variables such that with at
least a constant probability, any random restriction of these variables satisfies one of the following: (1)
the restricted polynomial is “regular” in the sense that no single variable has large influence or (2) the
sign of the restricted polynomial is a very biased function.
    We remark that our structural results, though motivated by similar results of Servedio [34] and
Diakonikolas et al. [9] for the simpler case of LTFs, do not follow from a generalization of their
arguments for LTFs to PTFs. The structural results for random restrictions of low-degree PTFs provide a
reasonably generic template for reducing problems involving arbitrary PTFs to ones on regular PTFs. In
fact, these structural properties are used precisely for the above reason both in this work and in a parallel
work by one of the authors, Meka and Zuckerman [29] to construct pseudorandom generators for PTFs.

                         T HEORY OF C OMPUTING, Volume 10 (1), 2014, pp. 1–26                                  4
                 B OUNDING THE S ENSITIVITY OF P OLYNOMIAL T HRESHOLD F UNCTIONS

1.4   Related and subsequent work
Independent of this work, Diakonikolas, Raghavendra, Servedio, and Tan [11] have obtained nearly
identical results to ours for both the average and noise sensitivity of PTFs. The broad outline of their
proof is also similar to ours. In our proof, we first obtain bounds on noise sensitivity and then move to
average sensitivity using a translation lemma. On the other hand, Diakonikolas et al. [11] first obtain
bounds on the average sensitivity of PTFs and then use a generalization of Peres’ argument [33] for LTFs
to move from average sensitivity to noise sensitivity.
    Regarding our structural result described in Section 1.3, Diakonikolas, Servedio, Tan and Wan [12]
have independently obtained similar results to ours. As an application, they prove the existence of
low-weight approximators for polynomial threshold functions. In the context of approximating functions
with many influential variables by low-degree PTFs, Ben-Eliezer, Lovett and Yadin [28] obtained similar
structural results (with weaker parameters) independent of both our work and that of [12].
    There has been considerable progress towards the resolution of the Gotsman-Linial conjecture since
our work. In particular, Kane proved optimal upper bounds for the average sensitivity of PTFs in the
Gaussian setting [20] and an upper bound of Od,ε (n5/6+ε ) in the Boolean setting [21]. Building on this
                                                       √                    2
final work, Kane recently showed an upper bound of n(log n)O(d log d) 2O(d log d) . These later works have
a similar broad outline as ours (use regularity lemma to reduce to Gaussian case) and use a much stronger
structure theorem for PTFs called diffused decompositions by Kane [21].

1.5   Proof outline
The proofs of our theorems are inspired by the use of the invariance principle in the proof of the “Majority
is Stablest” theorem [30]. As in the proof of the “Majority is Stablest” theorem, our main technical tools
are the invariance principle and the anti-concentration bounds (also called small ball probabilities) of
Carbery and Wright [8].
     Bounding the probability that a threshold function changes value either when it is perturbed slightly
(in the case of noise sensitivity) or when a variable is flipped (average sensitivity) involves bounding
probabilities of the form Pr [|Q(X)| ≤ |R(X)|] where Q(X), R(X) are low-degree polynomials and R has
small `2 -norm relative to that of Q. The event |Q(X)| ≤ |R(X)| implies that either |Q(X)| is small or
|R(X)| is large. In other words, for every γ

                        Pr [|Q(X)| ≤ |R(X)|] ≤ Pr [|Q(X)| ≤ γ] + Pr [|R(X)| > γ] .

Since R has small norm, the second quantity in the above expression can be easily bounded using a tail
bound (even Markov’s inequality suffices). Bounding the first quantity is trickier. Our first observation is
that if the random variable X were distributed according to the Gaussian distribution as opposed to the
uniform distribution on the hypercube, bounds on probabilities of the form Pr [|Q(X)| ≤ γ] immediately
follow from the anti-concentration bounds of Carbery and Wright [8]. We then transfer these bounds to
the Boolean setting using the invariance principle.
     Unfortunately, the invariance principle holds only for regular polynomials (i.e., polynomials in which
no single variable has large influence). We thus obtain the required bounds on noise sensitivity and
average sensitivity for the special case of regular PTFs. We then extend these results to an arbitrary PTF
 f using our structural results on random restrictions of the PTF f . The structural results state that either

                        T HEORY OF C OMPUTING, Volume 10 (1), 2014, pp. 1–26                                5
                        P RAHLADH H ARSHA , A DAM K LIVANS , AND R AGHU M EKA

the restricted PTF is a regular polynomial or is a very biased function. In the former case, we resort to
the above argument for regular PTFs and bound the noise sensitivity of the given PTF. In the latter case,
we merely note that the noise sensitivity of a biased function can be easily bounded. This in turn lets us
extend the results for regular PTFs to all PTFs.


2    Learning theory applications
In this section we briefly elaborate on the learning theory applications of our results. Our bounds on
Boolean and Gaussian noise sensitivity imply learning results in the challenging agnostic model of
learning of Haussler [15] and Kearns, Schapire and Sellie [22] which we define below.
Definition 2.1. Let D be an arbitrary distribution on X and C a class of Boolean functions f : X → {−1, 1}.
For δ , ε ∈ (0, 1), we say that algorithm A is a (δ , ε)-agnostic learning algorithm for C with respect to D
if the following holds. For any distribution D0 on X × {−1, 1} whose marginal over X is D, if A is given
access to a set of labeled examples (x, y) drawn from D0 , then with probability at least 1 − δ algorithm A
outputs a hypothesis h : X → {−1, 1} such that
                                           Pr      [h(x) 6= y] ≤ opt + ε
                                        (x,y)∼D0

where opt is the error made by the best classifier in C, that is,
                                        opt = inf      Pr      [g(x) 6= y] .
                                                g∈C (x,y)∼D0

    Kalai, Klivans, Mansour and Servedio [18] showed that the existence of low-degree real valued
polynomial `2 -approximators to a class of functions, implies agnostic learning algorithms for the class. In
an earlier result, Klivans, O’Donnell and Servedio [24] gave a precise relationship between polynomial
approximation and noise sensitivity, essentially showing that small noise sensitivity bounds imply good
low-degree polynomial `2 -approximators.
    Combining these two results, it follows that bounding the noise sensitivity (either Boolean or Gaus-
sian) of a concept class C yields an agnostic learning algorithm for C (with respect to the appropriate
distribution). Thus, using our bounds on noise sensitivity of PTFs, we obtain corresponding learning
algorithms for PTFs with respect to the uniform distribution over the hypercube.
Theorem 2.2. The concept class of degree-d PTFs is agnostically learnable to within ε with respect to
                                                 O(d)
the uniform distribution on {−1, 1}n in time n1/ε .
     These are the first polynomial-time algorithms for agnostically learning constant-degree PTFs with
respect to the uniform distribution on the hypercube (to within any constant error parameter). Previously,
Klivans et al. [25] had shown that quadratic (degree-2) PTFs corresponding to spheres are agnostically
learnable with respect to spherical Gaussians on Rn . Our bounds on the Gaussian noise sensitivity of
ellipsoids imply that this result can be extended to all ellipsoids with respect to (not necessarily spherical)
Gaussian distributions thus resolving an open problem of Klivans et al. [25].
     It is implicit from a recent paper of Blais, O’Donnell and Wimmer [7] that bounding the Boolean
noise sensitivity for a concept class C yields non-trivial learning algorithms for a very broad class of
discrete and continuous product distributions. We believe this is additional motivation for obtaining
bounds on a function’s Boolean noise sensitivity.

                        T HEORY OF C OMPUTING, Volume 10 (1), 2014, pp. 1–26                                 6
                    B OUNDING THE S ENSITIVITY OF P OLYNOMIAL T HRESHOLD F UNCTIONS

3     Organization
The rest of the paper is organized as follows. We introduce the necessary notation and preliminaries in
Section 4. We then present the structural results on random restrictions of PTFs (Lemmas 5.1 and 5.2) in
Section 5. In Section 6 we present our analysis of Gaussian noise sensitivity, followed by the analysis of
Boolean noise sensitivity in Section 7. We remark that the analysis of the Gaussian noise sensitivity is
simpler than the Boolean noise sensitivity analysis, since the Boolean case, in some sense, reduces to the
“regular” or Gaussian case. We then present our bounds on average sensitivity of PTFs in Section 8.


4     Notation and preliminaries
We will consider functions/polynomials over n variables X1 , . . . , Xn . Corresponding to any set I ⊆ [n]
(possibly multi-set), there is a monomial X I defined as X I = ∏i∈I Xi . The degree of the monomial X I is
the size of the set I, denoted by |I|. Note that if I is a “regular” set (opposed to a multi-set), then the
monomial X I is linear in each of the participating variables Xi , i ∈ I.
    A polynomial of degree d is a linear combination of monomials of degree at most d, that is,

                                              P(X1 , . . . , Xn ) =       ∑         aI X I .
                                                                      I⊆[n],|I|≤d

The aI are called the coefficients of the polynomial P. By convention, we set aI = 0 for all other I. If
the above summation is only over sets I and not multi-sets, then the polynomial is said to be multilinear.
Observe that while working over the hypercube, it suffices to consider only multilinear polynomials. We
use the following notations throughout.

    1. Unless otherwise stated, we work with a PTF f of degree d and a degree-d polynomial P(X) =
       ∑I aI X I with zero constant term (i. e., a0/ = 0) such that

                                            f (X1 , . . . , Xn ) = sign(P(X1 , . . . , Xn ) − θ ) .

       In case of ambiguity, we will refer to the coefficients aI as aI (P).

    2. For a polynomial P as above and an underlying distribution over X = (X1 , . . . , Xn ), the `2 -norm of
                                                2           2
                                                              
       the polynomial over X is defined by kPk = E P(X) . Note that if P is a multilinear polynomial
       and the distribution is either the multivariate Gaussian Nn or the uniform distribution over the
       hypercube, then kPk2 = ∑I a2I .

    3. For i ∈ [n], xi = (x1 , . . . , xi ) ∈ {1, −1}i , fxi : {1, −1}n−i → {1, −1} is defined by

                                  fxi (Xi+1 , . . . , Xn ) = sign(P(x1 , . . . , xi , Xi+1 , . . . , Xn ) − θ ) .

    4. For i ∈ [n], P|i (X1 , . . . , Xi ) = ∑I⊆[i] aI X I is the restriction of P to the variables X1 , . . . , Xi .

    5. For a multi-set S, x ∈u S denotes an uniformly chosen element from S.

                             T HEORY OF C OMPUTING, Volume 10 (1), 2014, pp. 1–26                                       7
                         P RAHLADH H ARSHA , A DAM K LIVANS , AND R AGHU M EKA

   6. For clarity, we suppress the exact dependence of the constants on the degree d; a more careful
      examination of our proofs shows that all constants depending on the degree d are at worst 2O(d) .
Definition 4.1. A partial assignment xi = (x1 , . . . , xi ) is ε-determining for f , if there exists b ∈ {1, −1}
such that
                                   Pr             [ fxi (Xi+1 , . . . , Xn ) 6= b ] ≤ ε .
                              (Xi+1 ,...,Xn )∈u {1,−1}n−i

    We now define regular polynomials which play an important role in all our results. Intuitively,
a polynomial is regular if no variable has high influence. For a polynomial Q, the weight of the ith
coordinate is defined by w2i (Q) = ∑I3i a2I . For i ∈ [n], let σi (Q)2 = ∑ j≥i w2j (Q).
Definition 4.2 (regular polynomials). A multilinear polynomial Q is ε-regular if
                                                        !2
                                   ∑ w4i (P) ≤ ε 2 ∑ w2i (P)       = ε 2 σ14 (P) .
                                    i                        i

A PTF f (x) = sign(Q(x) − θ ) is ε-regular if Q is ε-regular.
    We also assume without loss of generality that the variables are ordered such that
                                         w1 (P) ≥ w2 (P) ≥ · · · ≥ wn (P) .
   We repeatedly use three powerful tools: (2, 4)-hypercontractivity (cf. [32, Chapter 9, Bonami
Lemma]), the invariance principle of Mossel et al. [30] and the anti-concentration bounds of Carbery and
Wright [8]. We state the relevant results below.
Lemma 4.3 ((2,
             24)-hypercontractivity).  If Q,R are degree-d multilinear polynomials, then for X ∈u
{1, −1} , EX Q · R ≤ 9 · EX Q · EX R2 . In particular,
                  2
       n
                       d
                              2

                                                        2
                                       E Q4 ≤ 9d · E Q2 .
                                          

   The following anti-concentration bound is a special case of Theorem 8 of [8]. (In their notation, set
q = 2d and the log-concave distribution µ to be Nn .)
Theorem 4.4 (Carbery-Wright anti-concentration bound). There exists an absolute constant C such that
for any polynomial Q of degree at most d with kQk = 1 and any interval I ⊆ R of length α,
                                             Pr [Q(X) ∈ I] ≤ Cd α 1/d .
                                           X←Nn
    The following result due to Mossel et al. [30] generalizes the classical quantitative central limit
theorem for sums of independent variables, the Berry-Esséen Theorem, to low-degree polynomials over
independent variables.
Theorem 4.5 (Mossel et al.). There exists a universal constant C such that the following holds. For any
ε-regular multilinear polynomial P of degree at most d with kPk = 1 and t ∈ R,

                              Pr        [P(X) < t] − Pr n [P(Y ) < t] ≤ Cd ε 2/(4d+1) .
                          X∈u {1,−1}n                       Y ←N

  The result stated in [30] uses maxi w2i (P) as the notion of regularity instead of ∑i w4i (P) as we do.
However, their proof extends straightforwardly to the above.

                         T HEORY OF C OMPUTING, Volume 10 (1), 2014, pp. 1–26                                  8
                  B OUNDING THE S ENSITIVITY OF P OLYNOMIAL T HRESHOLD F UNCTIONS

5    Random restrictions of PTFs
We now establish our structural results on random restrictions of low-degree PTFs. The use of critical
indices (K(P, ε)) in our analysis is motivated by the results of Servedio [34] and Diakonikolas et al. [9]
who obtain similar results for LTFs. At a high level, we show the following.
    Given any ε > 0, define the ε-critical index of a multilinear polynomial P, K = K(P, ε), to be the
least index i such that w2j (P) ≤ ε 2 σi+1
                                        2 (P) for all j > i. We consider two cases depending on how large

K(P, ε) is and roughly, show the following (here c, α > 0 are some universal constants).

    1. K ≤ 1/ε cd . In this case we show that for xK = (x1 , . . . , xK ) ∈u {1, −1}K , the PTF fxK is ε-regular
       with probability at least α.

    2. K > 1/ε cd . In this case we show that with probability at least α, the value of the threshold function
       is determined by the top L = 1/ε cd variables.

    More concretely, we show the following.

Lemma 5.1. For every integer d, there exist constants ad ∈ R, γd > 0 such that for any multilinear
polynomial P of degree at most d and K = K(P, ε) as defined above, the following holds. The polynomial

                                                      def
                               PxK (Yk+1 , . . . ,Yn ) = P(x1 , . . . , xK ,YK+1 , . . . ,Yn )

in variables YK+1 , . . . ,Yn obtained by randomly choosing xK = (x1 , . . . , xK ) ∈u {1, −1}K is ad ε-regular
with probability at least γd .

Lemma 5.2. For every d, there exist constants bd , cd ∈ R, δd > 0, such that for any multilinear polynomial
P of degree at most d the following holds. If K(P, ε) ≥ cd log(1/ε)/ε 2 = L, then a random partial
assignment (x1 , . . . , xL ) ∈u {1, −1}L is bd ε-determining for P with probability at least δd .

    To prove the above structural properties we need the following simple lemmas.

           ([1, Lemma 3.2]). Let A be a real valued random variable satisfying E [A] = 0, E A2 = σ 2
                                                                                            
Lemma 5.3
and E A4 ≤ bσ 4 . Then,
        
                                     h          √ i
                                   Pr A ≥ σ /4 b ≥ 1/44/3 b .

Lemma 5.4. For d > 0 there exist constants αd , βd > 0 such that for any degree at most d polynomial Q,
and X ∈u {1, −1}n ,
                                Pr [ Q(X) ≥ E [Q] + αd σ (Q) ] ≥ βd ,

where σ 2 (Q) is the variance of Q(X) = kQk2 − (EX [Q])2 . In particular, Pr [ Q(X) ≥ E [Q] ] ≥ βd .
                                                                       2
Proof. Let random variable
                      4  A  = Q(X)
                                 2 − E X [Q(X)]. Then, E [A] = 0, E  A = σ 2 (Q) and by (2, 4)-
                            d          d   4
hypercontractivity, E A ≤ 9 E A = 9 σ (Q). The claim now follows from Lemma 5.3.


                         T HEORY OF C OMPUTING, Volume 10 (1), 2014, pp. 1–26                                 9
                           P RAHLADH H ARSHA , A DAM K LIVANS , AND R AGHU M EKA

5.1     Proof of Lemma 5.1
Let X ≡ (X1 , . . . , XK ). We prove the lemma as follows: (1) Bound the expectation of ∑ j>K w4j (PX ) using
hypercontractivity and use Markov’s inequality to show that with high probability ∑ j>K w4j (PX ) is small.
                           2 (P ) =
(2) Use the fact that σK+1      X     ∑ j>K w2j (PX ) is a polynomial of degree at most 2d in X and Lemma 5.4
to lower bound the probability that σK+1  2 (P ) is large. Let
                                                 X

                 PX (YK+1 , . . . ,Yn ) = P(X1 , . . . , XK ,YK+1 , . . . ,Yn )
                                          = R(X1 , . . . , XK ) +                   ∑          QJ (X1 , . . . , XK ) ∏ Y j .
                                                                          J⊆[K+1,n],0<|J|≤d                         j∈J
                    h               i
      We now bound E ∑ j>K w4j (PX ) . Fix a j > K and observe that w2j (PX ) = ∑J3 j Q2J (X). Thus,

                         E w2j (PX ) = ∑ E Q2J (X) = ∑ kQJ k2 = w2j (P) .
                                                  
                                                                                                     (5.1)
                                X                               X
                                                         J3 j                       J3 j

Further, by (2, 4)-hypercontractivity (Lemma 4.3),
                                                       2
                             E w4j (PX ) ≤ 9d E w2j (PX )    = 9d w4j (P) .
                                       
                                     X                                X
Hence,                                             "                      #
                                               E        ∑ w4j (PX ) ≤ 9d ∑ w4j (P) .
                                                       j>K                          j>K

Now, from the definition of K(P, ε), w2j (P) ≤ ε 2 σK+1
                                                    2 (P) for all j > K. Thus,


                                    ∑ w4j (P) ≤ ε 2 σK+1
                                                     2
                                                         (P) ∑ w2j (P) = ε 2 σK+1
                                                                              4
                                                                                  (P) .
                                 j>K                                          j>K

Combining the above inequalities and applying Markov’s inequality we get
                                "                           #
                                         Pr
                                         X
                                               ∑ w4j (PX ) ≥ γ9d ε 2 σK+1
                                                                      4
                                                                          (P) ≤ 1/γ .                                          (5.2)
                                               j>K

Observe that Q(X) = ∑ j>K w2j (PX ) is a polynomial of degree at most 2d in X1 , . . . , Xk and by (5.1),
                                                   E [Q] = ∑ w2j (P) = σK+1
                                                                        2
                                                                            (P) .
                                                                j>K

Thus, by applying Lemma 5.4 to Q,
                                                   "                                       #
                                              Pr       ∑ w2j (PX ) ≥ σK+1
                                                                      2
                                                                          (P) ≥ β2d .
                                                       j>K

Setting γ = 2/β2d in (5.2) and using the above equation, we get
                                                               !2 
                          Pr  ∑ w4j (PX ) ≤ a2d ε 2 ∑ w2j (PX )  ≥ β2d /2 ,
                                X
                                         j>K                                  j>K

where a2d = 2 · 9d /β2d . Thus, the polynomial PX (YK+1 , . . . ,Yn ) is (ad ε)-regular with probability at least
γd = β2d /2.

                           T HEORY OF C OMPUTING, Volume 10 (1), 2014, pp. 1–26                                                  10
                       B OUNDING THE S ENSITIVITY OF P OLYNOMIAL T HRESHOLD F UNCTIONS

5.2      Proof of Lemma 5.2
We use the following simple lemma.
Lemma 5.5. For 1 ≤ i < j < K(P, ε), σ 2j (P) ≤ (1 − ε 2 ) j−i σi2 (P).
Proof. For 1 ≤ i < K(P, ε), we have
                                          σi2 (P) = w2i (P) + σi+1
                                                                2
                                                                   (P) ≥ ε 2 σi2 (P) + σi+1
                                                                                         2
                                                                                            (P) .
        2 (P) ≤ (1 − ε 2 )σ 2 (P). The lemma follows.
Thus, σi+1                 i

Proof of Lemma 5.2. Suppose that K(P, ε) ≥ L = cd log(1/ε)/ε 2 for a constant c to be chosen later and
let
                         Q(X1 , . . . , Xn ) = P(X1 , . . . , Xn ) − P|L (X1 , . . . , XL ) .
The proof proceeds as follows. We first show that kQk is significantly smaller than kP|L k. We then use
Lemma 5.4 applied to P|L − θ and Markov’s inequality applied to |Q(X)| to show that |P|L (X1 , . . . , XL ) − θ |
is larger than |Q(X)|, so that Q(X) cannot flip the sign of P|L (X1 , . . . , XL ) − θ , with at least a constant
probability. We first bound kQk.
                                                     √
Claim 5.6. For a suitably large constant cd , kQk ≤ ε αd kP|L k.
Proof. Let αd , βd be the constants from Lemma 5.4. By definition kQk2 = ∑I:I6⊆[L] a2I ≤ σL2 (P). Now,
           σ12 (P) = ∑ w2j (P) + σL2 (P) ≤ d                    ∑          a2I + σL2 (P) ≤ d             ∑        a2I + d     ∑ a2I + σL2 (P)
                        j<L                                  I:I∩[L]6=0/                           I:06/ =I⊆[L]             I:I6⊆[L]

                      ≤d      ∑           a2I + d   ∑     w2j (P) + σL2 (P) ≤ d              ∑         a2I + (d + 1) σL2 (P) .
                           I:06/ =I⊆[L]             j>L                                 I:06/ =I⊆[L]

Further, by Lemma 5.5, σL2 (P) ≤ (1 − ε 2 )L−1 σ12 (P). Combining the above inequalities we get,
                σL2 (P) ≤ Od (1 − ε 2 )L−1          ∑ a2I = Od (1 − ε 2 )L−1 σ 2 (P) .
                                                                            
                                                                                                                                                (5.3)
                                                                    I:06/ =I⊆[L]

Choosing L = cd log(1/ε)/ε 2 for large enough cd , we get the claim.
      By Claim 5.6 and Markov’s inequality,
                                                                                                               √ 
            Pr n |Q(x1 , . . . , xn )| ≥ αd kP|L k ≤                           Pr   n
                                                                                      |Q(x1 , . . . , xn )| ≥ kQk/ ε ≤ ε.                       (5.4)
          x∈u {1,−1}                                                       x∈u {1,−1}

Let S ⊆ {1, −1}L be the set of all bad xL ∈ {1, −1}L such that,
                                                                                          
                        Pr            |Q(x1 , . . . , xL , XL+1 , . . . , Xn )| ≥ αd kP|L k ≥ 2ε/βd .
                       (XL+1 ,...,Xn )∈u {1,−1}n

Then, from (5.4) and the above equation, PrxL ∈u {1,−1}L xL ∈ S ≤ βd /2. Now, let T ⊆ {1, −1}L be
                                                                                 

such that for xL ∈ T , | P|L (x1 , . . . , xL ) − θ | ≥ αd kPL k and xL ∈      / S. Observe that all xL ∈ T are (2ε/βd )-
determining and by Lemma 5.4 and the above equations,
               L                                                                                 L    
       Pr      x ∈T ≥              Pr            P|L (x1 , . . . , xL ) − θ ≥ αd kPL k −     Pr      x ∈ S ≥ βd /2 .
      xL ∈u {1,−1}L                   xL ∈u {1,−1}L                                                               xL ∈u {1,−1}L

The lemma now follows.

                                T HEORY OF C OMPUTING, Volume 10 (1), 2014, pp. 1–26                                                              11
                        P RAHLADH H ARSHA , A DAM K LIVANS , AND R AGHU M EKA

6    Gaussian noise sensitivity of PTFs
In this section we bound the Gaussian noise sensitivity of PTFs and thus prove Theorem 1.4. The proof is
simpler than the Boolean case and only makes use of an anti-concentration bound for polynomials in
Gaussian space.
     Although Theorem 1.4 was stated only for multilinear polynomials and ellipsoids, we give a proof
below that works for all degree-d polynomials using ideas from Diakonikolas et al. [11], who were the
first to prove a bound on the Gaussian noise sensitivity of general degree-d polynomials (see remarks
after the statement of Claim 6.1).

Proof of Theorem 1.4. Let f be the degree-d PTF and P the corresponding degree-d polynomial such
                                                                                                   2
   2f (x) = sign(P(x)). We may assume without loss of generality that P is normalized, i. e., kPk =
that
E P (X) = 1.
     The proof is based on the Carbery-Wright anti-concentration bound (Theorem 4.4) for degree-d PTFs.
Let X,Y ∼ Nn and
                                       q
                        def
                                                                      p
                      Z = (1 − δ ) X + 1 − (1 − δ )2 Y = (1 − δ ) X + 2δ − δ 2 Y .
          √
Let ρ =    2δ − δ 2 . Define the perturbation polynomial

                         Q(X,Y ) = P(Z) − P(X) = P((1 − δ )X + ρY ) − P(X) .

    Now, for γ > 0 to be chosen later,

                Pr [sign(P(X)) 6= sign(P(Z))] = Pr [sign(P(X)) 6= sign(P(X) + Q(X,Y ))]
                                                ≤ Pr [|P(X)| < |Q(X,Y )|]
                                                ≤ Pr [|P(X)| < γ] + Pr [|Q(X,Y ) > γ]
                                                ≤ Cd γ 1/d + Pr [|Q(X,Y )| > γ] ,

where the last inequality follows from the anti-concentration bound√ (Theorem 4.4). In Claim 6.1, we
show that the norm kQk of the perturbation polynomial is at most cd δ for some constant cd (dependent
on d). We can now apply Markov’s inequality to bound the second quantity as follows.

                                Pr [|Q(X,Y )| > γ] ≤ kQk2 /γ 2 ≤ cd δ /γ 2 .

Thus,
                                                                   cd δ
                                         GNSδ ( f ) ≤ Cd γ 1/d +        .
                                                                    γ2
The theorem follows by setting γ = δ d/(2d+1) in which case we get GNSδ ( f ) = Od (δ 1/(2d+1) ).

    We note that we can get a slightly stronger bound of
                                                 p         
                                         Od δ 1/2d log(1/δ )

if we used a stronger tail bound instead of Markov’s in the above argument.

                       T HEORY OF C OMPUTING, Volume 10 (1), 2014, pp. 1–26                          12
                      B OUNDING THE S ENSITIVITY OF P OLYNOMIAL T HRESHOLD F UNCTIONS
                                                        √
Claim 6.1. There exists a constant cd such that kQk ≤ cd δ .

     An earlier version of this paper had an error in the proof of this claim. As pointed out to us by
the authors of [11], that proof worked only for multilinear polynomials and ellipsoids. Diakonikolas et
al. [11] proved the claim for general degree-d polynomials. For the sake of completeness, we give a
simplified presentation of their proof (that works for all degree-d polynomials) in Section 9.


7      Noise sensitivity of PTFs
We now bound the noise sensitivity of PTFs and prove Theorem 1.3. We do so by first bounding the noise
sensitivity of regular PTFs and then use the results of the previous section to reduce the general case to
the regular case.


7.1     Noise sensitivity of regular PTFs
At a high level, we bound the noise sensitivity of regular PTFs as follows: (1) Reduce the problem
to that of proving certain anti-concentration bounds for regular PTFs over the hypercube. (2) Use the
invariance principle of Mossel et al. [30] to reduce proving anti-concentration bounds over the hypercube
to that of proving anti-concentration bounds over Gaussian distributions. (3) Use the Carbery-Wright
anti-concentration bounds [8] for polynomials over log-concave distributions.
    For the rest of this section, we fix a degree-d multilinear polynomial P and a corresponding degree-d
PTF f . Recall that it suffices to consider multilinear polynomials as we are working over the hypercube.
We first reduce bounding noise sensitivity to proving anti-concentration bounds.

Lemma 7.1. For 0 < ρ < 1 and δ > 0,
                                                                                                         √
                                NSρ ( f ) ≤ (d + 1) δ +                       Pr        [ |P(x) − θ | ≤ 2 ρ/δ ] .
                                                                          x∈{1,−1}n


Proof. Let S be a random subset S ⊆ [n] where each i ∈ [n] is in S independently with probability ρ. From
the definition of noise sensitivity it easily follows that
                                     "                                                                                           !#
                                                                                                                         I
      NSρ ( f ) =        Pr              sign (P(X) − θ ) 6= sign                  P(X) − 2               ∑           aI X − θ
                    X∈u {1,−1}n ,S
                                                                                                     I:|I∩S| is odd
                                     "                                                           #
              ≤          Pr              |P(x) − θ | ≤ 2                   ∑           aI X I
                    X∈u {1,−1}n ,S
                                                                      I:|I∩S| is odd
                                     "                                       #
                                                                  I      √                                                      √
              ≤          Pr                    ∑           aI X         ≥ ρ/δ +                      Pr       [ | P(X) − θ | ≤ 2 ρ/δ ] .   (7.1)
                    X∈u {1,−1}n ,S                                                              X∈u {1,−1}n
                                          I:|I∩S| is odd


Define a non-negative random variable PS as follows: PS2 = ∑I:|I∩S| is odd a2I . We can then bound the first

                              T HEORY OF C OMPUTING, Volume 10 (1), 2014, pp. 1–26                                                           13
                               P RAHLADH H ARSHA , A DAM K LIVANS , AND R AGHU M EKA

quantity in the above expression using PS as follows:
                   "                                           #                          "                                        #
                                                   √                                                                          √
       Pr                   ∑           aI X I    ≥ ρ/δ              ≤         Pr                  ∑           aI X I   ≥ PS / δ
  X∈u {1,−1}n ,S                                                         X∈u {1,−1}n ,S
                       I:|I∩S| is odd                                                         I:|I∩S| is odd
                                                                                                                    h    √ √ i
                                                                                                                + Pr PS ≥ ρ/ δ . (7.2)
                                                                                                                    S

Since EX ( ∑I|I∩S| is odd aI X I )2 = PS2 , by Markov’s inequality, we have
                                                       "                                              #
                                                                                                √
                                                                                      I
                                                 Pr                  ∑         aI X       ≥ PS / δ        ≤ δ.                         (7.3)
                                         x∈u {1,−1}n
                                                           I:|I∩S| is odd


Now, note that PS2 ≤ ∑i∈S w2i (P). Thus,
                                                           "                   #
                                          E PS2 ≤ E                  w2i (P)       = ρ ∑ w2i (P) ≤ d ρ .
                                            
                                          S            S
                                                               ∑
                                                               i∈S                        i

                                              √ √
Hence, by Markov’s inequality, PrS [ PS ≥ ρ/ δ ] ≤ d δ . The lemma now follows by combining
equations (7.1), (7.2), (7.3) and the above equation.

    We now prove an anti-concentration bound for regular PTFs.

Lemma 7.2. If P is ε-regular, then for any interval I ⊆ R of length at most α,

                                                 Pr    [ P(X) ∈ I ] = Od ( α 1/d + ε 2/(4d+1) ) .
                                         X∈u {1,−1}n


Proof. Let Z1 = P(X), Z2 = P(Y ) for X ∈u {1, −1}n ,Y ← Nn . Then, since P is ε-regular, by Theorem 4.5,
for all t ∈ R,
                             | Pr [Z1 > t] − Pr [Z2 > t] | = Od (ε 2/(4d+1) ) .
Now, by the above equation and Theorem 4.4 applied to the random variable Y for interval I,

                         Pr [Z1 ∈ I] = Pr [Z2 ∈ I] + Od ( ε 2/(4d+1) ) = Od ( α 1/d + ε 2/(4d+1) ) .

    We can now obtain a bound on noise sensitivity of regular PTFs.

Theorem 7.3. If f is an ε-regular PTF of degree d, then NSε ( f ) ≤ Od ε 1/(2d+2) .
                                                                                 

Proof. Let δ > 0 to be chosen later. Then, by Lemma 7.1 and Lemma 7.2 above,

                                           NSε ( f ) = Od ( δ + ε 2/(4d+1) + ε 1/2d /δ 1/d ) .

Choosing δ = ε 1/(2d+2) we get NSε ( f ) = Od ( ε 1/(2d+2) ).

                              T HEORY OF C OMPUTING, Volume 10 (1), 2014, pp. 1–26                                                       14
                  B OUNDING THE S ENSITIVITY OF P OLYNOMIAL T HRESHOLD F UNCTIONS

7.2   Noise sensitivity of arbitrary PTFs
We prove Theorem 1.3 by recursively applying the following lemma.

Lemma 7.4. For every d there exist universal constants cd , ∆d ∈ N, αd ∈ (0, 1) such that for M =
min(K(P, ε), cd log(1/ε)/ε 2 ) and X M = (X1 , . . . , XM ) ∈u {1, −1}M ,
                                     h                              i
                                  Pr NSε ( fX M ) ≤ ∆d ε 1/(2d+2) ≥ αd .                      (7.4)
                                         XM

Proof. Let ad , bd , cd , γd , δd be the constants from Lemmas 5.1 and 5.2. Let αd = min(γd , δd ). We consider
two cases.
Case (i): M = K(P, ε). Then, by Lemma 5.1 and Theorem 7.3, for X K ∈u {1, −1}K , with probability at
least αd , NSε ( fxK ) ≤ ∆d ε 1/(2d+2) for some constant ∆d .
Case (ii): M = cd log(1/ε)/ε 2 . Then, by Lemma 5.2, X M ∈u {1, −1}M is bd ε-determining with probabil-
ity at least αd . Further, if X M is bd ε-determining, with fX M biased towards b ∈ {1, −1}, then

  NSε ( fX M ) =            Pr           [ fX M (Z1 ) 6= fX M (Z2 ) ] ≤ 2           Pr        [ fX M (Z) 6= b ] ≤ 2bd ε ,
                     Z1 ∈u {1,−1}n−M ,                                        Z∈u {1,−1}n−M
                          Z2 ∈ε Z1

where “Z2 ∈ε Z1 ” is short-hand to denote that Z2 is an ε-perturbation of Z1 . The lemma now follows.

Proof of Theorem 1.3. Let cd , ∆d , αd be as in the above lemma and let

                              L = cd log(1/ε)/ε 2               and       t = log1−αd (1/ε) .

We will show that for

                            δ = ε 1/(2d+2) /(Lt) = Od (ε (4d+5)/(2d+2) / log2 (1/ε)) ,

we have
                                               NSδ ( f ) = Od ( ε 1/(2d+2) ) .
      Observe that for this setting, δ ≤ ε. For S ⊆ [n] and x ∈ {1, −1}n let Px,S : {1, −1}S̄ → R be the
polynomial of degree at most d defined by Px,S (XS̄ ) = P(x|S , XS̄ ). Fix x = (x1 , . . . , xn ) ∈ {1, −1}n and
define Sx,i ⊆ [n] for i ≥ 1, recursively as follows. Sx,1 is the set of M1 ≤ L largest weight coordinates in P
given by applying Lemma 7.4 to P. For i ≥ 1, let Sx,i = Sx,1 ∪ Sx,2 ∪ . . . ∪ Sx,i .
      For i > 1, let Sx,i+1 be the set of Mi+1 ≤ L largest weight coordinates in Px,Sx,i given by applying
Lemma 7.4 to the polynomial Px,Sx,i . Define fx,i by fx,i (·) ≡ sign(Px,Sx,i (·) − θ ). Note that the definition of
 fx,i only depends on x j for j ∈ Sx,i and that |Sx,i | ≤ L · i.
      Call x ∈ {1, −1}n (ε, f )-good if there exists an i, 1 ≤ i ≤ t such that NSε ( fx,i ) ≤ ∆d ε 1/(2d+2) and let
tx be such an i for a (ε, f )-good x. Then, from the definition of fx,i and Lemma 7.4,

                                              Pr        [ x is (ε, f )-good ] ≥ 1 − ε .                               (7.5)
                                          x∈u {1,−1}n


                         T HEORY OF C OMPUTING, Volume 10 (1), 2014, pp. 1–26                                           15
                            P RAHLADH H ARSHA , A DAM K LIVANS , AND R AGHU M EKA

      Let y ∈δ x be a δ -perturbation of x ∈u {1, −1}n . Then, since |Sx,tx | ≤ Lt,

                                    Pr x|Sx,tx 6= y|Sx,tx ≤ Lt δ = ε 1/(2d+2) .
                                                        
                                                                                                                      (7.6)
                                         x,y

      Also note that for any i ≥ 1, conditioned on an assignment for the values in x|Sx,i and x|Sx,i = y|Sx,i ,

                                       Pr [ f (x) 6= f (y)] = NSδ ( fx,i ) ≤ NSε ( fx,i ) .
                                       x,y

Thus, conditioned on x being (ε, f )-good and x|Sx,tx = y|Sx,tx ,

                                   Pr [ f (x) 6= f (y) ] ≤ NSε ( fx,tx ) ≤ ∆d ε 1/(2d+2) .                            (7.7)
                                   x,y

Combining (7.5), (7.6), (7.7), we get
                                                                                               
                                                               1/(2d+2)              1/(2d+2)
                              NSδ ( f ) ≤ ε + Lt δ + ∆d ε                 = Od ε                    .
             4d+5              
Since δ = Od ε 2d+2 / log2 (1/ε) and the above is applicable for all ε > 0, we get that for all ρ > 0,
                                                                          
                            NSρ ( f ) = Od log(1/ρ)ρ 1/(4d+5) = Od ρ 1/(4d+6) .


8      Average sensitivity of PTFs
In this section we bound the average sensitivity of PTFs on the Boolean hypercube, proving Theorem 1.6.
We first prove a lemma bounding the average sensitivity of a Boolean function in terms of its noise
sensitivity. Theorem 1.6 follows immediately from Theorem 1.3 and the following lemma:

Lemma 8.1 (noise sensitivity to average sensitivity). For any Boolean function f : {1, −1}n → {1, −1},

                                                 AS( f ) ≤ ne NS(1/n) ( f ) .

Proof. Let δ = 1/n. Let X ∈u {1, −1}n and let S ⊆ [n] be a random set with each element i ∈ [n] present
in S independently with probability δ . Let X(S) be the vector obtained by flipping the coordinates of X in
S. Then, NS( f ) = PrX,S [ f (X) 6= f (X(S))]. Observe that for i ∈ [n],

                          Pr [S = {i}] = δ (1 − δ )n−1 = (1/n) (1 − 1/n)n−1 > 1/(ne) .

Therefore,

    NSδ ( f ) = Pr [ f (X) 6= f (X(S))]
               X,S

             = ∑ Pr [ S = {i} ] · Pr [ f (X) 6= f (X(S)) | S = i ] + Pr [ |S| 6= 1 ] · Pr [ f (X) 6= f (X(S)) | |S| 6= 1 ]
                     S             X                                      S              X,S
                 i
                   1                             1
             >∑       Pr [ f (X) 6= f (X({i}))] = AS( f ) .
                 i ne                            ne
                      X



                           T HEORY OF C OMPUTING, Volume 10 (1), 2014, pp. 1–26                                          16
                      B OUNDING THE S ENSITIVITY OF P OLYNOMIAL T HRESHOLD F UNCTIONS

                                                         −d
    We now give a bound of O(n1−2 ) on the average sensitivity using a different, combinatorial,
argument (not using the noise sensitivity bounds).
                                                                                                                                      −d
Theorem 8.2. For any degree-d PTF f : {1, −1}n → {1, −1}, AS( f ) ≤ 3 n1−2 .

    We first show the theorem using Lemma 1.7.


Proof. Let P(x) = xi Pi (x−i ) + Qi (x−i ), where Pi ( ), Qi ( ) are degree-(d − 1) and degree-d polynomials,
respectively, that do not depend on xi . Define fi (x−i ) = sign(Pi (x−i )) and gi (x) = f (x) fi (x−i ). Then,
                                       h                      i                           h                 i
         Ii ( f ) =        Pr              f (X) 6= f (X (i) ) =        f (X)Pr
                                                                              f  (X
                                                                                i −i ) 6
                                                                                       = f (X (i)
                                                                                                  ) f  (X
                                                                                                      i −i )
                     X∈u {1,−1}n                      X∈u {1,−1}n
                                 h                                          i               h                  i
                   =     Pr n f (X) fi (X−i ) 6= f (X (i) ) fi ((X (i) )−i ) =       Pr n gi (X) 6= gi (X (i) )
                      X∈u {1,−1}                                                                                   X∈u {1,−1}

                   = Ii (gi ) .

     Observe that gi is monotone increasing in xi as the coefficient of xi in gi written as a polynomial is
fi (x−i ) · Pi (x−i ) ≥ 0. Hence, for i ∈ [n], Ii (gi ) = EX [Xi gi (X)]. Thus,
                                                                                                                                         "       #
  AS( f ) = ∑ Ii ( f ) = ∑ Ii (gi ) = ∑ E [Xi gi (X)] = ∑ E [Xi f (X) fi (X−i )] = E f (X) ∑ Xi fi (X−i ) .
                                                         X                                X                                          X
               i                   i                 i                                i                                                      i


Since | f (x)| ≤ 1 for all x, we have
                                                                         "                                     #
                                                     AS( f ) ≤ E             ∑ Xi fi (X−i )                        .                             (8.1)
                                                                     X
                                                                              i


    We now use induction and Lemma 1.7. For an LTF f , fi as defined above are constants. Therefore,
by equation (8.1) and the Cauchy-Schwarz inequality,
                                                         "                        #               "                    #
                                                                                                                               √
                                       AS( f ) ≤ E            ∑ Xi fi (X−i )              =E          ∑ Xi                 ≤    n.
                                                    X                                         X
                                                               i                                           i


Suppose the theorem is true for degree-d PTFs and let f be a degree-(d + 1) PTF and let fi be as defined
before. Then, by equation (8.1) and Lemma 1.7

                                                                                                      −d                         −d
                                  AS( f )2 ≤ 2 ∑ AS( fi ) + n ≤ ∑ 6 n1−2 + n ≤ 7 n2−2 .
                                                     i                            i

                                       −(d+1)
Therefore, AS( f ) ≤ 3 n1−2                     . The theorem follows by induction.

                              T HEORY OF C OMPUTING, Volume 10 (1), 2014, pp. 1–26                                                                   17
                                   P RAHLADH H ARSHA , A DAM K LIVANS , AND R AGHU M EKA


                  For brevity, let fi (x) = fi (x−i ). By Cauchy-Schwarz, for any random variable Z we
Proof of Lemma 1.7.
have E [|Z|]2 ≤ E Z 2 . Thus,
                                                                    ! 
                            "                #                   2                                        2

                                      E    ∑ Xi fi (X−i )             ≤ E               ∑ Xi fi (X−i )       
                                      X                                   X
                                            i                                             i
                                                                              "                                #
                                                                      =E          ∑ Xi X j fi (X) f j (X)
                                                                          X
                                                                                  i, j

                                                                      = n + ∑ E [ Xi X j fi (X) f j (X) ] .                            (8.2)
                                                                                          X
                                                                                  i6= j

For i 6= j ∈ [n], let x−i j = (xk : k ∈ [n], k 6= i, j) and let

                                   Sij = {x ∈ {1, −1}n : fi (x) 6= fi (x ⊕ e j )} .
                          h       i
Note that I j ( fi ) = PrX X ∈ Sij . Now,

               E [ Xi X j fi (X) f j (X) ] =       ∑        µ(x) xi x j fi (x) f j (x) +              ∑ µ(x) xi x j fi (x) f j (x) ,   (8.3)
               X                                    j                                                  j
                                                x∈Si ∪Sij                                           / i ∪Sij
                                                                                                   x∈S

where µ(x) = 1/2n is the probability of choosing x under the uniform distribution. We bound the first
term in the above expression by the average sensitivity of the fi and show that the second term vanishes.
Observe that

                         ∑ µ(x)xi x j fi (x) f j (x) ≤ µ(Sij ∪ Sij ) ≤ µ(Sij ) + µ(Sij ) = I j ( fi ) + Ii ( f j ) .
                              j
                                                                                                                                       (8.4)
                       x∈Si ∪Sij


    Note that for x ∈  / Sij ∪ Sij , fi (x), f j (x) are both independent of the values of xi , x j . For such x (abusing
notation) let fi (x−i j ) = fi (x), f j (x−i j ) = f j (x) and let

                                                                             / Sij ∪ Sij } .
                                                Ti j = {(xk : k 6= i, j) : x ∈

                  / Sij ∪ Sij , fi (x), f j (x) depend only on x−i j , we get that x ∈
Then, since for x ∈                                                                  / Sij ∪ Sij if and only if x−i j ∈
                                                                                                                      / Ti j .
Therefore,

                        ∑ µ(x) xi x j fi (x) f j (x) = ∑ µ(x−i j ) µ(xi ) µ(x j ) fi (x−i j ) f j (x−i j ) xi x j
                          j                                       j
                    / i ∪Sij
                   x∈S                                        / i ∪Sij
                                                             x∈S

                                                         =      ∑ µ(x−i j ) fi (x−i j ) f j (x−i j ) xE,x [xi x j ] = 0 .              (8.5)
                                                                                                                   i   j
                                                             x−i j ∈T
                                                                   / ij

    From equations (8.2), (8.3), (8.4),(8.5) we have,
          "               #2
          E     ∑ Xi fi (X−i )            ≤ n + ∑ (I j ( fi ) + Ii ( f j )) = n + 2 ∑ ∑ I j ( fi ) = n + 2 ∑ AS( fi ) .
          X
                   i                             i6= j                                            i j: j6=i                 i



                                   T HEORY OF C OMPUTING, Volume 10 (1), 2014, pp. 1–26                                                  18
                  B OUNDING THE S ENSITIVITY OF P OLYNOMIAL T HRESHOLD F UNCTIONS

Remark 8.3. The bound of Lemma 1.7 is tight up to a constant factor if we only have bounds on the
average sensitivity of the fi to go with. For example, consider fi defined as follows. Divide [n] into
    √
m = n blocks B1 , . . . , Bm of size m each and for 1 ≤ j ≤ m, i ∈ B j , let fi = ∏k∈B j :k6=i xk . Then, the left
                                                             √
hand side of the lemma is Θ(n3/2 ) and AS( fi ) = m − 1 = Θ( n) for all i.


9     Bounding the perturbation polynomial in the Gaussian setting
In this section we prove the bound on the perturbation polynomial (Claim 6.1). For this, we need some
background on Hermite polynomials, which we provide in Section 9.1.

9.1   Background on Hermite polynomials
The univariate Hermite polynomials are defined as follows:

                                              (−1)k 2 d k         2
                                     Hk (x) = √ ex /2 k e−x /2 .
                                                 k!       dx
                                                      √
The univariate Hermite polynomials satisfy Hk0 (x) = kHk−1 (x).
    The multivariate Hermite polynomials in n variables (x1 , . . . , xn ) are defined as follows. Let S ⊆ [n]
be a multiset. It will be convenient to denote a multiset S by a sequence of n indices as S = (s1 , . . . , sn )
where each si denotes the multiplicity of element i ∈ [n] in the set S. Note, by this notation, |S| = ∑ si .
                                                                            n
                                                  HS (x1 , . . . , xn ) = ∏ Hsi (xi ) .
                                                                           i=1

    The Hermite polynomials are especially useful while working over the (multivariate) normal distribu-
tion due to the following orthonormality conditions.
                                                       (
                                                        1 if S = T,
                                  E n [HS (X)HT (X)] =
                                X←N                     0 otherwise.

This implies that kPk2 = EX←Nn [P2 (X)] = ∑ P̂S2 .
    We will need to work with the Taylor series expansion for the Hermite polynomials. For this, we first
observe that the partial derivatives of the multivariate Hermite polynomials can be calculated as follows
                                                  √                               √
                  (∂ HS )i (x1 , . . . , xn ) =    si Hsi −1 (xi ) ∏ Hs j (x j ) = si HS\{i} (x1 , . . . , xn ) .
                                                                    j6=i


Furthermore, the iterative partial derivatives can be calculated as follows. Let R = (r1 , . . . , rn ) ⊆ S be any
multiset.                                        s
                                                     n
                                                            si !
                                      (∂ HS )R = ∏                  · HS\R .
                                                    i=1 (si − ri )!


                         T HEORY OF C OMPUTING, Volume 10 (1), 2014, pp. 1–26                                       19
                                P RAHLADH H ARSHA , A DAM K LIVANS , AND R AGHU M EKA

This in particular gives the follow Taylor series expansion for HS (z) = HS (z1 , . . . , zn ) about the point
x = (x1 , . . . , xn ) for multisets S. Let |S| = d. Since HS depends on at most d variables, we can assume
without loss of generality that HS is defined on the first d variables, i.e., S ⊆ [d] and HS (x) = HS (x1 , . . . , xd ).
                                                                                      !
                                      d                                   d
                                                   1
                  HS (z) = HS (x) + ∑ ∑          d
                                                         (∂ HS )R (x) · ∏(zi − xi )ri
                                                ∏    r
                                     k=1 R:|R|=k i=1 i !                 i=1
                                                         s                                          !
                                      d                      d                          d
                                                   1                si !
                          = HS (x) + ∑ ∑
                                                ∏d
                                                     r !
                                                            ∏ (si − ri )! · HS\R (x) · ∏(zi − xi )ri .           (9.1)
                                     k=1 R:|R|=k i=1 i      i=1                        i=1


    The multivariate Hermite polynomials up to degree d form a basis for the set of all multivariate
polynomials of degree d. In particular, given any degree-d polynomial P(x1 , . . . , xn ), we can write it as a
linear combination of Hermite polynomials as follows

                                       P(x1 , . . . , xn ) =      ∑          P̂S HS (x1 , . . . , xn ) .
                                                               S⊂[n]:|S|≤d


The values P̂S are called the Hermite coefficients of P.

9.2    Proof of Claim 6.1
                                                                       √
Recall that we must prove there exists a constant cd such that kQk ≤ cd δ .

Proof. Given any degree-d multivariate polynomial P, we can write it in the Hermite basis as P(x) =
∑S:|S|≤d P̂S HS (x) and use this expansion to bound kQk = kP(Z) − P(X)k as follows.
                                                                                      !2 
       kQk2 = E (P(Z) − P(X))2 = E 
                             
                                                          ∑       P̂S (HS (Z) − HS (X)) 
                                                        S:|S|≤d

             = ∑ P̂S P̂T E [(HS (Z) − HS (X)) · (HT (Z) − HT (X))]
                S,T

             = ∑ P̂S2 E (HS (Z) − HS (X))2 + ∑ P̂S P̂T E [(HS (Z) − HS (X)) · (HT (Z) − HT (X))]
                                         
                 S                                         S6=T

                      P̂S2 E    (HS (Z) − HS (X))2 − ∑ P̂S P̂T (E [HS (Z)HT (X)] + E [HS (X)HT (Z)]) ,
                                                 
             =∑                                                                                                   (9.2)
                 S                                         S6=T

where the last step follows from the orthonormality of the Hermite polynomials.
   We will now show that

                                  E[HS (X)HT (Z)] = E[HS (X1 . . . Xn )HT (Z1 . . . Zn )] = 0

for S 6= T . Since S 6= T and (X1 , . . . , Xn ), (Z1 , . . . , Zn ) are product distributions, it suffices to show the
following univariate case: E[Hs (X1 )Ht (Z1 )] = 0 for s 6= t. We now observe that the joint distribution
(X1 , Z1 ) is identical to the distribution (Z1 , X1 ). Hence, to calculate E[Hs (X1 )Ht (Z1 )] for s 6= t we can

                               T HEORY OF C OMPUTING, Volume 10 (1), 2014, pp. 1–26                                  20
                 B OUNDING THE S ENSITIVITY OF P OLYNOMIAL T HRESHOLD F UNCTIONS

assume without loss of generality that s > t. Now, Ht (Z1 ) = Ht ((1 − δ )X1 + ρY1 ) is a bivariate degree-t
polynomial and can be expanded in the Hermite basis as ∑ti, j=0 αi j Hi (X1 )H j (Y1 ). We thus have
                                                        t
                      E [Hs (X1 )Ht (Z1 )] = ∑ αi j E [Hs (X1 )Hi (X1 )] · E[H j (Y1 )] = 0
                     X1 ,Y1                                       X1                      Y1
                                                      i, j=0

since s > t ≥ i.
    Plugging this into the expression for kQk in (9.2), we have

                                 kQk2 = ∑ P̂S2 E (HS (Z) − HS (X))2 .
                                                                  
                                                    S

Since kPk2 = ∑S P̂S2 = 1, to prove the claim it suffices if we show that there exists a constant cd such that
for any multiset S, kHS (Z) − HS (X)k2 ≤ c2d δ . We bound the norm kHS (Z) − HS (X)k using the Taylor
series expansion of HS (Z) as stated in equation (9.1). Let |S| = d; then we have

                                                      s                         "                               !#
                               d                             d                                       d
                                             1                    si !                                     ri
E [|HS (Z) − HS (X)|] ≤ ∑           ∑      d                ∏ (si − ri )! · E       HS\R (X) ·   ∏(Zi − Xi )
                              k=1 R:|R|=k ∏i=1 ri !
X,Y
                                                            i=1                                  i=1
                                                                "                                    !#
                               d                                                     d
                                             1
                       ≤∑           ∑ ∏d r ! d k/2 E              HS\R (X) ·        ∏(Zi − Xi )r i

                              k=1 R:|R|=k   i=1 i                                   i=1

                               [ since each si ≤ d and ∑ ri = |R| = k ]
                                                   v              "               #
                          d
                                                   u h               d
                                           1       u         i
                       ≤ ∑ d k/2 ∑       d
                                                   tE H 2 (X) · E
                                                       S\R         ∏(Zi − Xi )2ri
                         k=1           ∏
                                R:|R|=k i=1  r i !                  i=1

                               [ by the Cauchy-Schwarz inequality ]
                                                  s
                          d                          d
                                           1
                       = ∑ d k/2 ∑       d         ∏    E [(Zi − Xi )2ri ]
                         k=1    R:|R|=k ∏i=1 ri !   i=1

                               [ by orthonormality of HS\R and independence of (Zi − Xi ) over the values i ]
                                                  s
                          d                          d
                                           1                 (2ri )!
                       = ∑ d k/2 ∑       d         ∏    δ ri
                         k=1    R:|R|=k ∏i=1 ri !   i=1        ri !
                                                          √                            (2r)!
                                 [ since Zi − Xi ∼ N(0, 2δ ) whose 2r-th moment is δ r       ]
                                             s                                         r!
                          d                     d
                                                     2ri     d                  d       √
                       = ∑ (dδ )k/2 ∑         ∏ ri ≤ ∑ (dδ )k/2 · d k · 2k/2 = ∑ (d 3/2 2δ )k
                         k=1         R:|R|=k  i=1           k=1                k=1
                               √                         √
                       ≤ 2d 3/2 2δ             [ if d 3/2 2δ ≤ 1/2 ] .

To complete the proof we need the following (1, 2)-hypercontractivity for degree-d polynomials under
the normal distribution.

                        T HEORY OF C OMPUTING, Volume 10 (1), 2014, pp. 1–26                                         21
                       P RAHLADH H ARSHA , A DAM K LIVANS , AND R AGHU M EKA

Lemma 9.1 ((1, 2)-hypercontractivity [16, Remark 5.13]). If P is a multivariate polynomial of degree d,
then                               q
                           kPk =       E n [P(X)2 ] ≤ ed · E n [|P(X)|] .
                                        X←N                 X←N
                 √                                            √
   Thus, if d 3/2 2δ ≤ 1/2, then E[|HS (Z) − HS (X)|] ≤ 2d 3/2 2δ . We can now use the above (1, 2)-
hypercontractivity and bound kHS (Z) − HS (X)k as follows.
                                                                              √
                     kHS (Z) − HS (X)k2 ≤ ed E [|HS (Z) − HS (X)|] ≤ 2d 3/2 ed 2δ .
            √
    If d 3/2 2δ > 1/2, we have
                                                                                √
                    E[|HS (Z) − HS (X)|2 ] ≤ 2 E[HS2 (Z) + HS2 (X)] ≤ 4 < 8d 3/2 2δ .
                                                                                          √
Thus, either way, we have that there exists a constant cd such that kHS (Z) − HS (X)k ≤ cd δ .


Acknowledgments
We thank Ilias Diakonikolas for pointing out an error in an earlier version of this writeup.


References
 [1] N OGA A LON , G REGORY G UTIN , AND M ICHAEL K RIVELEVICH: Algorithms with large domina-
     tion ratio. J. Algorithms, 50(1):118–131, 2004. [doi:10.1016/j.jalgor.2003.09.003] 9

 [2] JAMES A SPNES , R ICHARD B EIGEL , M ERRICK L. F URST, AND S TEVEN RUDICH: The expressive
     power of voting polynomials. Combinatorica, 14(2):135–148, 1994. Preliminary version in
     STOC’91. [doi:10.1007/BF01215346] 2

 [3] ROBERT B EALS , H ARRY B UHRMAN , R ICHARD C LEVE , M ICHELE M OSCA , AND RONALD DE
     W OLF: Quantum lower bounds by polynomials. J. ACM, 48(4):778–797, 2001. Preliminary version
     in FOCS’98. [doi:10.1145/502090.502097] 2

 [4] R ICHARD B EIGEL: The polynomial method in circuit complexity. In Proc. 8th IEEE Conf.
     on Structure in Complexity Theory (SCT’93), pp. 82–95. IEEE Comp. Soc. Press, 1993.
     [doi:10.1109/SCT.1993.336538] 2

 [5] M ICHAEL B EN -O R AND NATHAN L INIAL: Collective coin flipping, robust voting schemes and
     minima of Banzhaf values. In Proc. 26th FOCS, pp. 408–416. IEEE Comp. Soc. Press, 1985.
     [doi:10.1109/SFCS.1985.15] 2

 [6] I TAI B ENJAMINI , G IL K ALAI , AND O DED S CHRAMM: Noise sensitivity of Boolean functions and
     applications to percolation. Inst. Hautes Études Sci. Publ. Math., 90(1):5–43, 1999. See also at
     arXiv. [doi:10.1007/BF02698830] 2

                       T HEORY OF C OMPUTING, Volume 10 (1), 2014, pp. 1–26                         22
                B OUNDING THE S ENSITIVITY OF P OLYNOMIAL T HRESHOLD F UNCTIONS

 [7] E RIC B LAIS , RYAN O’D ONNELL , AND K ARL W IMMER: Polynomial regression under arbitrary
     product distributions. Machine Learning, 80(2-3):273–294, 2010. Preliminary version in COLT’08.
     [doi:10.1007/s10994-010-5179-6] 6

 [8] A NTHONY C ARBERY AND JAMES W RIGHT: Distributional and Lq norm inequalities for
     polynomials over convex bodies in Rn . Mathematical Research Letters, 8(3):233–248, 2001.
     [doi:10.4310/MRL.2001.v8.n3.a1] 5, 8, 13

 [9] I LIAS D IAKONIKOLAS , PARIKSHIT G OPALAN , R AGESH JAISWAL , ROCCO A. S ERVEDIO , AND
     E MANUELE V IOLA: Bounded independence fools halfspaces. SIAM J. Comput., 39(8):3441–3462,
     2010. Preliminary version in FOCS’09. See also at ECCC. [doi:10.1137/100783030] 4, 9

[10] I LIAS D IAKONIKOLAS , P RAHLADH H ARSHA , A DAM K LIVANS , R AGHU M EKA , P RASAD
     R AGHAVENDRA , ROCCO A. S ERVEDIO , AND L I -YANG TAN: Bounding the average sensitivity
     and noise sensitivity of polynomial threshold functions. In Proc. 42nd STOC, pp. 533–542. ACM
     Press, 2010. [doi:10.1145/1806689.1806763] 1

[11] I LIAS D IAKONIKOLAS , P RASAD R AGHAVENDRA , ROCCO S ERVEDIO , AND L I -YANG TAN:
     Average sensitivity and noise sensitivity of polynomial threshold functions. Technical report, 2009.
     To appear in SIAM J. Comput. [arXiv:0909.5011] 1, 3, 5, 12, 13

[12] I LIAS D IAKONIKOLAS , ROCCO A. S ERVEDIO , L I -YANG TAN , AND A NDREW WAN: A reg-
     ularity lemma and low-weight approximators for low-degree polynomial threshold functions.
     Theory of Computing, 10(2):27–53, 2014. Preliminary version in CCC’10. See also at arXiv.
     [doi:10.4086/toc.2014.v010a002] 5

[13] C RAIG G OTSMAN AND NATHAN L INIAL: Spectral properties of threshold functions. Combinator-
     ica, 14(1):35–50, 1994. [doi:10.1007/BF01305949] 2, 4

[14] J OHAN H ÅSTAD: Some optimal inapproximability results. J. ACM, 48(4):798–859, 2001. Prelimi-
     nary version in STOC’97. [doi:10.1145/502090.502098] 2

[15] DAVID H AUSSLER: Decision theoretic generalizations of the PAC model for neural net and other
     learning applications. Inform. and Comput., 100(1):78–150, 1992. Preliminary version in ALT’90.
     [doi:10.1016/0890-5401(92)90010-D] 6

[16] S VANTE JANSON:      Gaussian Hilbert           Spaces.        Cambridge    Univ.    Press,   1997.
     [doi:10.1017/CBO9780511526169] 22

[17] J EFF K AHN , G IL K ALAI , AND NATHAN L INIAL: The influence of variables on Boolean functions.
     In Proc. 29th FOCS, pp. 68–80. IEEE Comp. Soc. Press, 1988. [doi:10.1109/SFCS.1988.21923] 2

[18] A DAM TAUMAN K ALAI , A DAM R. K LIVANS , Y ISHAY M ANSOUR , AND ROCCO A. S ERVEDIO:
     Agnostically learning halfspaces. SIAM J. Comput., 37(6):1777–1805, 2008. Preliminary version in
     FOCS’05. [doi:10.1137/060649057] 2, 6

                      T HEORY OF C OMPUTING, Volume 10 (1), 2014, pp. 1–26                            23
                      P RAHLADH H ARSHA , A DAM K LIVANS , AND R AGHU M EKA

[19] G IL K ALAI: Noise sensitivity and chaos in social choice theory. In Fete of Combinatorics and
     Computer Science, pp. 173–212. Springer, 2010. [doi:10.1007/978-3-642-13580-4_8] 2

[20] DANIEL M. K ANE: The Gaussian surface area and noise sensitivity of degree-d polynomial thresh-
     old functions. Comput. Complexity, 20(2):389–412, 2011. See also at arXiv. [doi:10.1007/s00037-
     011-0012-6] 5

[21] DANIEL M. K ANE: A structure theorem for poorly anticoncentrated Gaussian chaoses and applica-
     tions to the study of polynomial threshold functions. In Proc. 53rd FOCS, pp. 91–100. IEEE Comp.
     Soc. Press, 2012. [doi:10.1109/FOCS.2012.52] 5

[22] M ICHAEL J. K EARNS , ROBERT E. S CHAPIRE , AND L INDA S ELLIE: Toward efficient ag-
     nostic learning. Machine Learning, 17(2-3):115–141, 1994. Preliminary version in COLT’92.
     [doi:10.1007/BF00993468] 6

[23] S UBHASH K HOT, G UY K INDLER , E LCHANAN M OSSEL , AND RYAN O’D ONNELL: Optimal
     inapproximability results for MAX-CUT and other 2-variable CSPs? SIAM J. Comput., 37(1):319–
     357, 2007. Preliminary version in FOCS’04. See also at ECCC. [doi:10.1137/S0097539705447372]
     2

[24] A DAM R. K LIVANS , RYAN O’D ONNELL , AND ROCCO A. S ERVEDIO: Learning intersections
     and thresholds of halfspaces. J. Comput. System Sci., 68(4):808–840, 2004. Preliminary version in
     FOCS’02. [doi:10.1016/j.jcss.2003.11.002] 2, 6

[25] A DAM R. K LIVANS , RYAN O’D ONNELL , AND ROCCO A. S ERVEDIO: Learning geometric
     concepts via Gaussian surface area. In Proc. 49th FOCS, pp. 541–550. IEEE Comp. Soc. Press,
     2008. [doi:10.1109/FOCS.2008.64] 2, 6
                                                                               1/3
[26] A DAM R. K LIVANS AND ROCCO A. S ERVEDIO: Learning DNF in time 2Õ(n ) . J. Comput. System
     Sci., 68(2):303–318, 2004. Preliminary version in STOC’01. [doi:10.1016/j.jcss.2003.07.007] 2

[27] NATHAN L INIAL , Y ISHAY M ANSOUR , AND N OAM N ISAN: Constant depth circuits, Fourier
     transform, and learnability. J. ACM, 40(3):607–620, 1993. Preliminary version in FOCS’89.
     [doi:10.1145/174130.174138] 2

[28] S HACHAR L OVETT, I DO B EN -E LIEZER , AND A RIEL YADIN: Polynomial threshold functions:
     Structure, approximation and pseudorandomness. Electron. Colloq. on Comput. Complexity (ECCC),
     16:118, 2009. ECCC. See also at arXiv. 5

[29] R AGHU M EKA AND DAVID Z UCKERMAN: Pseudorandom generators for polynomial threshold
     functions. SIAM J. Comput., 42(3):1275–1301, 2013. Preliminary version in STOC’10. See also at
     arXiv. [doi:10.1137/100811623] 4

[30] E LCHANAN M OSSEL , RYAN O’D ONNELL , AND K RZYSZTOF O LESZKIEWICZ: Noise stability
     of functions with low influences: invariance and optimality. Ann. of Math., 171(1):295–341, 2010.
     Preliminary version in FOCS’05. [doi:10.4007/annals.2010.171.295] 5, 8, 13

                      T HEORY OF C OMPUTING, Volume 10 (1), 2014, pp. 1–26                         24
                B OUNDING THE S ENSITIVITY OF P OLYNOMIAL T HRESHOLD F UNCTIONS

[31] RYAN O’D ONNELL: Hardness amplification within NP. J. Comput. System Sci., 69(1):68–94, 2004.
     Preliminary version in STOC’02. [doi:10.1016/j.jcss.2004.01.001] 2

[32] RYAN O’D ONNELL: Analysis of Boolean Functions. Cambridge Univ. Press, 2014. See online
     version here. 8

[33] Y UVAL P ERES: Noise stability of weighted majority. Technical report, 2004. [arXiv:math/0412377]
     5

[34] ROCCO A. S ERVEDIO: Every linear threshold function has a low-weight approximator. Comput.
     Complexity, 16(2):180–209, 2007. Preliminary version in CCC’06. [doi:10.1007/s00037-007-0228-
     7] 4, 9

[35] A LEXANDER A. S HERSTOV: Separating AC0 from depth-2 majority circuits. SIAM J. Comput.,
     38(6):2113–2129, 2009. Preliminary version in STOC’07. [doi:10.1137/08071421X] 2

[36] A LEXANDER A. S HERSTOV: The pattern matrix method. SIAM J. Comput., 40(6):1969–2000,
     2011. Preliminary version in STOC’08. See also at arXiv. [doi:10.1137/080733644] 2

[37] YAOYUN S HI: Lower bounds of quantum black-box complexity and degree of approximating
     polynomials by influence of Boolean variables. Information Processing Letters, 75(1-2):79–83,
     2000. See also at arXiv. [doi:10.1016/S0020-0190(00)00069-7] 2


AUTHORS

      Prahladh Harsha
      Tata Institute of Fundamental Research
      Mumbai, India
      prahladh tifr res in
      http://www.tcs.tifr.res.in/~prahladh/


      Adam Klivan
      The University of Texas at Austin
      klivans cs utexas edu
      http://www.cs.utexas.edu/~klivans/


      Raghu Meka
      Microsoft Research, Silicon Valley
      raghu ias edu
      http://www.math.ias.edu/~raghu




                      T HEORY OF C OMPUTING, Volume 10 (1), 2014, pp. 1–26                         25
                    P RAHLADH H ARSHA , A DAM K LIVANS , AND R AGHU M EKA

ABOUT THE AUTHORS

    P RAHLADH H ARSHA is a reader at the Tata Institute of Fundamental Research, Mumbai, In-
       dia. He obtained his Ph. D. from MIT under the supervision of Madhu Sudan. Prahladh’s
       research interests include complexity theory, probabilistically checkable proofs, and
       communication complexity.


    A DAM K LIVANS is an Associate Professor of Computer Science at the University of Texas
       at Austin. His interests are in learning theory and its relationship with computational
       complexity. He is frequently confused with Adam Kalai.


    R AGHU M EKA is a researcher at Microsoft Research, Silicon Valley, Mountain View. He
       received his Ph. D. from the University of Texas at Austin in 2011 under the supervision
       of David Zuckerman. His main interests are in complexity theory, pseudo-randomness,
       algorithm design, learning theory and more generally in anything to do with probability.




                    T HEORY OF C OMPUTING, Volume 10 (1), 2014, pp. 1–26                          26