A Regularity Lemma and Low-Weight Approximators for Low-Degree Polynomial Threshold Functions

Authors Ilias Diakonikolas, Rocco A. Servedio, Li-Yang Tan, Andrew Wan,
Plaintext
                           T HEORY OF C OMPUTING, Volume 10 (2), 2014, pp. 27–53
                                        www.theoryofcomputing.org




         A Regularity Lemma and Low-Weight
            Approximators for Low-Degree
           Polynomial Threshold Functions
             Ilias Diakonikolas ∗                      Rocco A. Servedio†                Li-Yang Tan‡
                                                      Andrew Wan§
              Received September 8, 2012; Revised September 27, 2013; Published March 25, 2014




       Abstract: We give a “regularity lemma” for degree-d polynomial threshold functions
       (PTFs) over the Boolean cube {−1, 1}n . Roughly speaking, this result shows that every
       degree-d PTF can be decomposed into a constant number of subfunctions such that almost
       all of the subfunctions are close to being regular PTFs. Here a “regular” PTF is a PTF
       sign(p(x)) where the influence of each variable on the polynomial p(x) is a small fraction of
       the total influence of p.
     A conference version of this paper appeared in the Proc. 25th Ann. IEEE Conf. on Computational Complexity, CCC
2010 [11].
   ∗ ilias.d@ed.ac.uk. Supported in part by a Simons Postdoctoral Fellowship. Most of this work was done at Columbia

University supported by NSF grant CCF-0728736, and by an Alexander S. Onassis Foundation Fellowship. Part of this research
was done while visiting IBM Almaden.
   † rocco@cs.columbia.edu. Supported by NSF grants CNS-0716245, CCF-0915929, and CCF-1115703.
   ‡ liyang@cs.columbia.edu. Supported by DARPA award no. HR0011-08-1-0069 and NSF CyberTrust grant no. CNS-

0716245.
   § atw12@eecs.berkeley.edu. This work was done at Columbia University supported by NSF CyberTrust award CNS-

0716245.


ACM Classification: F.1.3, G.2.0
AMS Classification: 68Q15, 68R01
Key words and phrases: complexity theory, Boolean functions, polynomials, threshold functions


© 2014 Ilias Diakonikolas, Rocco A. Servedio, Li-Yang Tan, and Andrew Wan
c b Licensed under a Creative Commons Attribution License (CC-BY)                           DOI: 10.4086/toc.2014.v010a002
             I LIAS D IAKONIKOLAS , ROCCO A. S ERVEDIO , L I -YANG TAN , AND A NDREW WAN

          As an application of this regularity lemma, we prove that for any constants d ≥ 1, ε > 0,
      every degree-d PTF over n variables can be approximated to accuracy ε by a constant-degree
      PTF that has integer weights of total magnitude Oε,d (nd ). This weight bound is shown to be
      optimal up to logarithmic factors.


1     Introduction
A polynomial threshold function (henceforth PTF) is a Boolean function f : {−1, 1}n → {−1, 1},

                                             f (x) = sign(p(x)) ,

where p : {−1, 1}n → R is a polynomial with real coefficients. If p has degree d, we say that f is a
degree-d PTF. Low-degree PTFs are a natural generalization of linear threshold functions (the case d = 1)
and hence are of significant interest in complexity theory, see e. g., [1, 4, 29, 28, 9, 13, 16, 25, 32, 34]
and many other works.
    The influence of coordinate i on a function g : {−1, 1}n → R measures the extent to which xi affects
the output of g. More precisely, we have Infi (g) = ∑S3i gb(S)2 , where ∑S⊆[n] gb(S)χS (x) is the Fourier
expansion of g. The total influence of g is the sum of all n coordinate influences, Inf(g) = ∑ni=1 Infi (g).
See [27, 18] for background on influences.
    We say that a polynomial p : {−1, 1}n → R is “τ-regular” if the influence of each coordinate on p is
at most a τ fraction of p’s total influence (see Section 3 for a more detailed definition). A PTF f is said to
be τ-regular if f = sign(p), where p is τ-regular. Roughly speaking, regular polynomials and PTFs are
useful because they inherit some nice properties of PTFs and polynomials over Gaussian (rather than
Boolean) inputs; this intuition can be made precise using the “invariance principle” of Mossel et al. [26].
This point of view has been useful in the d = 1 case for constructing pseudorandom generators [6],
low-weight approximators [33, 10], and other results for LTFs [30, 24].

1.1   Our results
A regularity lemma for degree-d PTFs A number of useful results in different areas, loosely referred
to as “regularity lemmas,” show that for various types of combinatorial objects an arbitrary object can be
approximately decomposed into a constant number of “pseudorandom” sub-objects. The best-known
example of such a result is Szemerédi’s classical regularity lemma for graphs [35], which (roughly) says
that any graph can be decomposed into a constant number of subsets such that almost every pair of
subsets induces a “pseudorandom” bipartite graph. Another example is Green’s recent regularity lemma
for Boolean functions [14]. Results of this sort are useful because different properties of interest are
sometimes easier to establish for pseudorandom objects, and via regularity lemmas it may be possible
to prove the corresponding theorems for general objects. We note also that results of this sort play an
important part in the “structure versus randomness” paradigm that has been prominent in recent work in
combinatorics and number theory, see e. g., [36].
    We prove a structural result about degree-d PTFs which follows the above pattern; we thus refer to it
as a “regularity lemma for degree-d PTFs.” Our result says that any low-degree PTF can be decomposed
as a small depth decision tree, most of whose leaves are close to regular PTFs:

                       T HEORY OF C OMPUTING, Volume 10 (2), 2014, pp. 27–53                               28
                     A R EGULARITY L EMMA FOR P OLYNOMIAL T HRESHOLD F UNCTIONS

Theorem 1.1. Let f (x) = sign(p(x)) be any degree-d PTF. Fix any τ > 0. Then f is equivalent to a
decision tree T, of depth
                                                        1 O(d)
                                                         
                                               1
                                 depth(d, τ) := · d log
                                               τ        τ
with variables at the internal nodes and a degree-d PTF fρ = sign(pρ ) at each leaf ρ, with the following
property: with probability at least 1 − τ, a random path1 from the root reaches a leaf ρ such that either
(i) pρ is τ-regular, or (ii) fρ is τ-close to a constant function.
Regularity is a natural way to capture the notion of pseudorandomness for PTFs, and results of interest
can be easier to establish for regular PTFs than for arbitrary PTFs (this is the case for our main application,
constructing low-weight approximators, as we describe below). Our regularity lemma provides a general
tool to reduce questions about arbitrary PTFs to regular PTFs; it has been used in this way as an essential
ingredient in the recent proof that bounded independence fools all degree-2 PTFs [8] and (subsequently to
the conference publication of this work [11]) degree-d PTFs [19] (see Section 9). We note that the recent
construction of pseudorandom generators for degree-d PTFs of [25] also crucially uses a decomposition
result which is very similar to our regularity lemma; we discuss the relation between our work and [25] in
more detail below and in Section 1.2. Finally, Kane’s recent work establishing the correct exponent in
the Gotsman-Linial conjecture also uses a decomposition result which is very similar to our regularity
lemma [21].

Application: Every low-degree PTF has a low-weight approximator [33] showed that every linear
threshold function (LTF) over {−1, 1}n can be ε-approximated by an LTF with integer weights w1 , . . . , wn
                              2
such that ∑i w2i = n · 2Õ(1/ε ) . (Here and throughout the paper we say that g is an ε-approximator for f
if f and g differ on at most ε2n inputs from {−1, 1}n .) This result and the tools used in its proof found
several subsequent applications in complexity theory and learning theory, see e. g., [6, 30].
     We apply our regularity lemma for degree-d PTFs to prove an analogue of the [33] result for low-
degree polynomial threshold functions. Our result implies that for any constants d, ε, any degree-d PTF
has an ε-approximating PTF of constant degree and (integer) weight O(nd ).
     When we refer to the weight of a PTF f = sign(p(x)), we assume that all the coefficients of p are
integers; by “weight” we mean the sum of the squares of p’s coefficients. We prove
Theorem 1.2. Let f (x) = sign(p(x)) be any degree-d PTF. Fix any ε > 0. Then there is a polynomial
                                              O(d)
q(x) of degree D = (d/ε)O(d) and weight 2(d/ε) · nd such that sign(q(x)) is ε-close to f .
A result on the existence of low-weight ε-approximators for PTFs is implicit in the recent work [9]. They
show that any degree-d PTF f has Fourier concentration ∑|S|>1/ε O(d) fb(S)2 ≤ ε, and this easily implies
that f can be ε-approximated by a PTF with integer weights. (Indeed, recall that the learning algorithm
of [22] works by constructing such a PTF as its hypothesis.) The above Fourier concentration bound
                                                               O(d)
implies that there is a PTF of degree 1/ε O(d) and weight n1/ε      which ε-approximates f . In contrast, our
Theorem 1.2 can give a weaker degree bound (if d = 1/ε ω(1) ), but always gives a much stronger weight
bound in terms of the dependence on n. We mention here that Podolskii [31] has shown that for every
                                                                                                    d
constant d ≥ 2, there is a degree-d PTF for which any exact representation requires weight nΩ(n ) .
   1 A random path corresponds to the standard uniform random walk on the tree.



                         T HEORY OF C OMPUTING, Volume 10 (2), 2014, pp. 27–53                              29
              I LIAS D IAKONIKOLAS , ROCCO A. S ERVEDIO , L I -YANG TAN , AND A NDREW WAN

                                                    e d ) is required to ε-approximate degree-d PTFs
    We also prove lower bounds showing that weight Ω(n
for sufficiently small constant ε; see Section 4.3.


Techniques An important ingredient in our proof of Theorem 1.1 is a case analysis based on the “critical
index” of a degree-d polynomial (see Section 3 for a formal definition). The critical index measures the
first point (going through the variables from most to least influential) at which the influences “become
small;” it is a natural generalization of the definition of the “critical index” of a linear form [33] that has
been useful in several subsequent works [30, 6, 10]. Roughly speaking we show that:

      • If the critical index of p is large, then a random restriction fixing few variables (the variables with
        largest influence in p) causes sign(p) to become a close-to-constant function with non-negligible
        probability; see Section 3.1.

      • If the critical index of p is positive but small, then a random restriction as described above causes p
        to become regular with non-negligible probability; see Section 3.2.

      • If the critical index of p is zero, then p is already a regular polynomial as desired.


Structure of the paper Section 2 contains notation and background. In Section 3 we prove our main
result (Theorem 1.1). In Section 4 we prove our upper bound regarding integer-weight approximations
(Theorem 1.2) and also give a nearly matching lower bound for any constant accuracy ε > 0.


1.2     Related work
Theorem 1.1 and the results in Sections 3.1 and 3.2 strengthen earlier results with a similar flavor that
appeared in the work of Diakonikolas et al. [7]. Furthermore, similar structural results were proven
simultaneously and independently by Ben-Eliezer et al. [23] and by Harsha et al. and Meka et al. [16, 25].
We describe each of these works below and their structural results for PTFs as they compare to our
Theorem 1.1.
    The results in [7] were obtained simultaneously and independently by Diakonikolas et al. [9] and
Harsha et al. [16], and [7] is the resulting merge (which followed the approach of [9]). A regularity lemma
as in Theorem 1.1 is not explicitly derived in [7]; using their Lemmas 5.10 and 5.9 and the ideas present
here, one may obtain the conclusion of Theorem 1.1, but with depth(d, τ) := (1/τ)(d log n log(1/τ))O(d) .
Note the dependence on n; eliminating this dependence is essential for our low-weight approximator
application and for the applications in [8, 19]. We eliminate the dependence on the dimension partly
by developing dimension-independent versions of Lemmas 5.10 and 5.9, which we do in Lemmas 3.5
and 3.9, respectively.
    Harsha et al. [16] give a result which is very similar to Lemma 3.14, the main component in our
proof of Theorem 1.1. By applying the result from [16], Meka and Zuckerman [25] give a dimension-
independent regularity lemma which is quite similar to our Theorem 1.1. They obtain a small-depth
decision tree such that most leaves are ε-close to being ε-regular under a stronger definition of regularity,
which we will call “ε-regularity in `2 ” to distinguish it from our notion.

                        T HEORY OF C OMPUTING, Volume 10 (2), 2014, pp. 27–53                               30
                   A R EGULARITY L EMMA FOR P OLYNOMIAL T HRESHOLD F UNCTIONS

    Let p : {−1, 1}n → R be a polynomial and ε > 0. We say that the polynomial p is “ε-regular in `2 ” if
                                   s
                                           n                 n
                                          ∑ Infi (p)2 ≤ ε · ∑ Infi (p) .
                                          i=1               i=1

    Recall that in our definition of regularity, instead of upper bounding the `2 -norm of the influence
vector I = (Inf1 (p), . . . , Infn (p)) by ε times the total influence of p (i. e., the `1 norm of I), we upper
bound the `∞ norm (i. e., the maximum influence). We may thus call our notion “ε-regularity in `∞ .”
    Note that if a polynomial is ε-regular in `2 , then it is also ε-regular in `∞ . (And this implication is
easily seen to be essentially tight, e. g., if we have many variables with tiny influence and one variable
  √ an ε-fraction of the total influence.) For the other direction, if a polynomial is ε-regular in `∞ , then it
with
is ε-regular in `2 . (This is also tight if we have 1/ε many variables with influence ε.)
    Meka and Zuckerman prove the following.

Theorem 1.3 ([25]). Every degree-d PTF f = sign(p) can be expressed as a decision tree of depth
2O(d) · (1/ε 2 ) log2 (1/ε) with variables at the internal nodes and a degree-d PTF fρ = sign(pρ ) at each
leaf ρ, such that with probability 1 − ε, a random root-to-leaf path reaches a leaf ρ such that fρ is ε-close
to being ε-regular in `2 . (In particular, for a “good” leaf ρ, either pρ will be ε-regular in `2 or fρ will
be ε-close to a constant.)

    Theorem 1.1 shows exactly the same statement as the one above if we replace “`2 ” by “`∞ ” and the
depth of the tree by (1/ε) · (d log(1/ε))O(d) .
    Since ε-regularity in `2 implies ε-regularity in `∞ , the result of [25] implies a version of Theorem 1.1
which has depth of 2O(d) · (1/ε 2 ) log2 (1/ε). Hence the latter result and our result are quantitatively
incomparable to each other. Roughly, if d is a constant (independent of ε), then our Theorem 1.1 is
asymptotically better when ε becomes small. This range of parameters is quite natural in the context of
pseudo-random generators. In particular, in the proof that poly(1/ε)-wise independence ε-fools degree-2
PTFs [8], using [25] instead of Theorem 1.1, would give a worse bound on the degree of independence
           e −18 ) as opposed to O(ε
(namely, O(ε                       e −9 )). On the other hand, if d = Ω(log(1/ε)),
                                                                          e             then the result of [25]
is better.
    Finally, Ben-Eliezer et al. [23] establish the existence of a decision tree such that most leaves are
τ-regular (as opposed to τ-close to being τ-regular):

Theorem 1.4 ([23]). Every degree-d PTF f = sign(p) can be expressed as a decision tree of depth
      O(d)
2(d/τ) · log(1/τ) with variables at the internal nodes and a degree-d PTF fρ = sign(pρ ) at each leaf
ρ, such that with probability 1 − τ, a random root-to-leaf path reaches a leaf ρ such that fρ is τ-regular.

    Note that the depth of their tree is exponential in 1/τ.


2    Preliminaries
We start by establishing some basic notation. We write [n] to denote {1, 2, . . . , n} and [k, `] to denote
{k, k + 1, . . . , `}. We write E[X] and Var[X] to denote expectation and variance of a random variable X,

                       T HEORY OF C OMPUTING, Volume 10 (2), 2014, pp. 27–53                                 31
             I LIAS D IAKONIKOLAS , ROCCO A. S ERVEDIO , L I -YANG TAN , AND A NDREW WAN

where the underlying distribution will be clear from the context. For x ∈ {−1, 1}n and A ⊆ [n] we write
xA to denote (xi )i∈A .
    For a function f : {−1, 1}n → R and q ≥ 1, we denote by k f kq its `q norm, i. e.,
                                               def            1/q
                                        k f kq = Ex | f (x)|q
                                                   
                                                                   ,
where the intended distribution over x will always be uniform over {−1, 1}n . For Boolean-valued
functions f , g : {−1, 1}n → {−1, 1} the distance between f and g, denoted dist( f , g), is Prx [ f (x) 6= g(x)]
where the probability is over uniform x ∈ {−1, 1}n .
    We assume familiarity with the basic elements of Fourier analysis over {−1, 1}n ; see Section 2.1 for
a concise review. Our proofs will use various analytic and probabilistic bounds, which we collect for easy
reference in Section 2.1 below. We call the reader’s attention in particular to Theorem 2.3; throughout the
paper, C (which will be seen to play an important role in our proofs) denotes C02 , where C0 is the universal
constant from that theorem.

2.1   Useful background results
Fourier Analysis over {−1, 1}n and influences The survey by de Wolf [37] is an excellent source
of background material on Fourier analysis over {−1, 1}n , including material on the Hypercontractive
Inequality; here we recall only the very basics that we will require. We consider functions f : {−1, 1}n →
R (though we often focus on Boolean-valued functions which map to {−1, 1}), and we think of the inputs
x to f as being distributed according to the uniform probability distribution. The set of such functions
forms a 2n -dimensional inner product space with inner product given by h f , gi = E[ f (x)g(x)]. The set of
functions (χS )S⊆[n] defined by χS (x) = ∏i∈S xi forms a complete orthonormal basis for this space. Given
a function f : {−1, 1}n → R we define its Fourier coefficients by
                                                  def
                                            fb(S) = E[ f (x)χS (x)] ,
and we have that f (x) = ∑S fb(S)χS (x). We refer to the maximum |S| over all nonzero fb(S) as the Fourier
degree of f .
     As an easy consequence of orthonormality we have Plancherel’s identity h f , gi = ∑S fb(S)b          g(S),
which has as a special case Parseval’s identity, E[ f (x)2 ] = ∑S fb(S)2 . From this it follows that for every
 f : {−1, 1}n → {−1, 1} we have ∑S fb(S)2 = 1. We recall the well-known fact (see e. g., [17]) that
the total influence Inf( f ) of any Boolean function equals ∑S fb(S)2 |S|. Note that, in this setting, the
expectation and the variance can be expressed in terms of the Fourier coefficients of f by E[ f ] = fb(0)    /
                                2
and Var[ f ] = ∑06/ =S⊆[n] f (S) .
                           b
     Let f : {−1, 1}n → R and f (x) = ∑S fb(S)χS (x) be its Fourier expansion. The influence of variable i
                   def
on f is Infi ( f ) = ∑S3i fb(S)2 , and the total influence of f is Inf( f ) = ∑n Infi ( f ).
                                                                            i=1


Useful probability bounds We first recall the following moment bound for low-degree polynomials,
which is equivalent to the well-known hypercontractive inequality of [3, 15]:
Theorem 2.1. Let p : {−1, 1}n → R be a degree-d polynomial and q > 2. Then
                                           kpkq ≤ (q − 1)d/2 kpk2 .


                       T HEORY OF C OMPUTING, Volume 10 (2), 2014, pp. 27–53                                 32
                   A R EGULARITY L EMMA FOR P OLYNOMIAL T HRESHOLD F UNCTIONS

     The following concentration bound for low-degree polynomials, a simple corollary of hypercontrac-
tivity, is well known (see e. g., [12, 2]):

Theorem 2.2. Let p : {−1, 1}n → R be a degree-d polynomial. For any t > ed , we have

                                  Prx [|p(x)| ≥ tkpk2 ] ≤ exp(−Ω(t 2/d )) .

   We will also need the following weak anti-concentration bound for low-degree polynomials over the
cube:

Theorem 2.3 ([12, 2]). There is a universal constant C0 > 1 such that for any non-zero degree-d
polynomial p : {−1, 1}n → R with E[p] = 0, we have

                                      Prx [p(x) > C0−d · kpk2 ] > C0−d .

     Throughout this paper, we let C = C02 , where C0 is the universal constant from Theorem 2.3. Note
that since C > C0 , Theorem 2.3 holds for C as well.
     We denote by Nn the standard n-dimensional Gaussian distribution N(0, 1)n . The following two facts
will be useful in the proof of Theorem 1.2, in particular in the analysis of the regular case. The first fact
is a powerful anti-concentration bound for low-degree polynomials over independent Gaussian random
variables: EG∼Nn [p(G)2 ]1/2 ):

Theorem 2.4 ([5]). Let p : Rn → R be a nonzero degree-d multilinear polynomial. For all ε > 0 and
t ∈ R we have                  h               p          i
                        PrG∼Nn |p(G) − t| ≤ ε · Var[p(G)] ≤ O(dε 1/d ) .

   We note that the above bound is essentially tight.
   The second fact is a version of the invariance principle of Mossel, O’Donnell and Oleszkiewicz,
specifically Theorem 3.19 under hypothesis H4 in [26]:

Theorem 2.5 ([26]). Let
                                        p(x) =       ∑         pb(S)χS (x)
                                                 S⊆[n],|S|≤d

be a degree-d multilinear polynomial with Var[p] = 1. Suppose each coordinate i ∈ [n] has Infi (p) ≤ τ.
Then,
                        sup |Prx [p(x) ≤ t] − PrG∼Nn [p(G) ≤ t]| ≤ O(dτ 1/(8d) ) .
                          t∈R


3    Main result: a regularity lemma for low-degree PTFs
Let f : {−1, 1}n → {−1, 1} be a degree-d PTF. Fix a representation f (x) = sign(p(x)), where p :
{−1, 1}n → R is a degree-d polynomial which (w. l. o. g.) we may take to have Var[p] = 1. We assume
w. l. o. g. that the variables are ordered in such a way that Infi (p) ≥ Infi+1 (p) for all i ∈ [n − 1].
     We now define the notion of the τ-critical index of a polynomial [9] and state its basic properties.

                       T HEORY OF C OMPUTING, Volume 10 (2), 2014, pp. 27–53                              33
               I LIAS D IAKONIKOLAS , ROCCO A. S ERVEDIO , L I -YANG TAN , AND A NDREW WAN

Definition 3.1. Let p : {−1, 1}n → R and τ > 0. Assume the variables are ordered such that Inf j (p) ≥
Inf j+1 (p) for all j ∈ [n − 1]. The τ-critical index of p is the least i such that:
                                                                  n
                                              Infi+1 (p) ≤ τ · ∑ Inf j (p) .                                          (3.1)
                                                                j=i+1

If (3.1) does not hold for any i we say that the τ-critical index of p is +∞. If p has τ-critical index 0, we
say that p is τ-regular.

    Note that if p is a τ-regular polynomial then maxi Infi (p) ≤ dτ since the total influence of p is at most
d. If f (x) = sign(p(x)), we say f is τ-regular when p is τ-regular, and we take the τ-critical index of f to
be that of p.2 The following lemma says that the total influence ∑ni= j+1 Infi (p) goes down geometrically
as a function of j prior to the critical index:

Lemma 3.2. Let p : {−1, 1}n → R and τ > 0. Let k be the τ-critical index of p. For j ∈ [0, k] we have
                                              n
                                             ∑ Infi (p) ≤ (1 − τ) j · Inf(p) .
                                            i= j+1


Proof. The lemma trivially holds for j = 0. In general, since j is at most k, we have that Inf j (p) ≥
τ · ∑ni= j Infi (p), or equivalently ∑ni= j+1 Infi (p) ≤ (1−τ)· ∑ni= j Infi (p) which yields the claimed bound.

    We will use the fact that in expectation, the influence of an unrestricted variable in a polynomial does
not change under random restrictions:

Lemma 3.3. Let p : {−1, 1}n → R. Consider a random assignment ρ to the variables x1 , . . . , xk and fix
` ∈ [k + 1, n]. Then Eρ [Inf` (pρ )] = Inf` (p).

Proof. To prove Lemma 3.3, we first recall an observation about the expected value of Fourier coefficients
under random restrictions (see e. g., [22]):
Fact 3.4. Let p : {−1, 1}n → R. Consider a random assignment ρ to the variables x1 , . . . , xk . Fix any
S ⊆ [k + 1, n]. Then we have c                                               pρ (S)2 ] = ∑T ⊆[k] pb(S ∪ T )2 .
                             pρ (S) = ∑T ⊆[k] pb(S ∪ T )ρT and therefore Eρ [c
    In words, the above fact says that all the Fourier weight on sets of the form S ∪ {any subset of restricted
variables} “collapses” down onto S in expectation. Consequently, the influence of an unrestricted variable
does not change in expectation under random restrictions:
    We thus have
                                         "                    #
                                                            2
                                                                                  pb(S ∪ T )2
                               
                   Eρ Inf` (pρ ) = Eρ          ∑      pρ (S) = ∑
                                                      c                     ∑
                                                  `∈S⊆[k+1,n]             T ⊆[k] `∈S⊆[k+1,n]
                                                          2
                                       =     ∑       pb(U) = Inf` (p) .
                                           `∈U⊆[n]

   2 Strictly speaking, τ-regularity is a property of a particular representation and not of a PTF f , which could have many

different representations. The particular representation we are concerned with will always be clear from context.


                          T HEORY OF C OMPUTING, Volume 10 (2), 2014, pp. 27–53                                          34
                    A R EGULARITY L EMMA FOR P OLYNOMIAL T HRESHOLD F UNCTIONS

Notation: For S ⊆ [n], we write “ρ fixes S” to indicate that ρ ∈ {−1, 1}|S| is a restriction mapping xS ,
i. e., each coordinate in S, to either −1 or 1 and leaving coordinates not in S unrestricted.

3.1   The large critical index case
The main result of this section is Lemma 3.5, which says that if the critical index of f is large, then a
noticeable fraction of restrictions ρ of the high-influence variables cause fρ to become close to a constant
function.

Lemma 3.5. Let f : {−1, 1}n → {−1, 1} be a degree-d PTF f = sign(p). Fix β > 0 and suppose that
                                    def
f has τ-critical index at least K = α/τ, where α = Ω(d log log(1/β ) + d log d). Then, for at least a
1/(2Cd ) fraction of restrictions ρ fixing [K], the function fρ is β -close to a constant function.
                                                         def
proof Partition the coordinates into a “head” part H = [K] (the high-influence coordinates) and a “tail”
part T = [n] \ H. We can write p(x) = p(xH , xT ) = p0 (xH ) + q(xH , xT ), where p0 (xH ) is the truncation of
p comprising only the monomials all of whose variables are in H, i. e., p0 (xH ) = ∑S⊆H pb(S)χS (xH ).
    Now consider a restriction ρ of H and the corresponding polynomial pρ (xT ) = p(ρ, xT ). It is clear
that the constant term of this polynomial is exactly p0 (ρ). To prove the lemma, we will show that for
at least a 1/(2Cd ) fraction of all ρ ∈ {−1, 1}K , the (restricted) degree-d PTF fρ (xT ) = sign(pρ (xT ))
satisfies PrxT [ fρ (xT ) 6= sign(p0 (ρ))] ≤ β . Let us define the notion of a good restriction:

Definition 3.6. A restriction ρ ∈ {−1, 1}K that fixes H is called good iff the following two conditions
are simultaneously satisfied:
                              def                                                         −d/2
             (i) |p0 (ρ)| ≥ t ∗ = 1/(2Cd ) ,   and   (ii) kq(ρ, xT )k2 ≤ t ∗ · Θ(log(1/β )      .

     Intuitively condition (i) says that the constant term p0 (ρ) of pρ has “large” magnitude, while condition
(ii) says that the polynomial q(ρ, xT ) has “small” `2 -norm. We claim that if ρ is a good restriction then
the degree-d PTF fρ satisfies PrxT [ fρ (xT ) 6= sign(p0 (ρ))] ≤ β . To see this claim, note that for any fixed
ρ we have fρ (xT ) 6= sign(p0 (ρ)) only if |q(ρ, xT )| ≥ |p0 (ρ)|, so to show this claim it suffices to show that
if ρ is a good restriction then PrxT [|q(ρ, xT )| ≥ |p0 (ρ)|] ≤ β . But for ρ a good restriction, by conditions
(i) and (ii) we have
                                                   h                                          d/2 i
             PrxT |q(ρ, xT )| ≥ |p0 (ρ)| ≤ PrxT |q(ρ, xT )| ≥ kq(ρ, xT )k2 · Θ(log(1/β )
                                         
                                                                                                     ,

which is at most β by the concentration bound (Theorem 2.2), as desired. So the claim holds: if ρ is a
good restriction then fρ (xT ) is β -close to sign(p0 (ρ)). Thus to prove Lemma 3.5 it remains to show that
at least a 1/(2Cd ) fraction of all restrictions ρ to H are good.
     We prove this in two steps. First we show (Lemma 3.7) that the polynomial p0 is not too concentrated:
with probability at least 1/Cd over ρ, condition (i) of Definition 3.6 is satisfied. We then show (Lemma 3.8)
that the polynomial q is highly concentrated: the probability (over ρ) that condition (ii) is not satisfied is
at most 1/(2Cd ). Lemma 3.5 then follows by a union bound.

Lemma 3.7. We have that Prρ |p0 (ρ)| ≥ t ∗ ≥ 1/Cd .
                                                


                        T HEORY OF C OMPUTING, Volume 10 (2), 2014, pp. 27–53                                 35
                I LIAS D IAKONIKOLAS , ROCCO A. S ERVEDIO , L I -YANG TAN , AND A NDREW WAN

Proof. Using the fact that the critical index of p is large, we will show that the polynomial p0 has large
variance (close to 1), and hence we can apply the anti-concentration bound Theorem 2.3.
    We start by establishing that Var[p0 ] lies in the range [1/2, 1]. To see this, first recall that for
g : {−1, 1}n → R we have Var[g] = ∑06/ =S⊆[n] gb2 (S). It is thus clear that Var[p0 ] ≤ Var[p] = 1. To
establish the lower bound we use the property that the “tail” T has “very small” influence in p, which is a
consequence of the critical index of p being large. More precisely, Lemma 3.2 yields

                         ∑ Infi (p) ≤ (1 − τ)K · Inf(p) = (1 − τ)α/τ · Inf(p) ≤ d · e−α                  (3.2)
                         i∈T

where the last inequality uses the fact that Inf(p) ≤ d. Therefore, we have:

                   Var[p0 ] = Var[p] −        ∑            pb(S)2 ≥ 1 − ∑ Infi (p) ≥ 1 − de−α ≥ 1/2
                                         T ∩S6=0,S⊆[n]
                                               /                        i∈T

where the first inequality uses the fact that Infi (p) = ∑i∈S⊆[n] pb(S)2 , the second follows from (3.2) and
the third from our choice of α. We have thus established that indeed Var[p0 ] ∈ [1/2, 1].
    At this point, we would like to apply Theorem 2.3 for p0 . Note however that E[p0 ] = E[p] = pb(0)      /
which is not necessarily zero. To address this minor technical point we apply Theorem 2.3 twice: once for
the polynomial p00 = p0 − pb(0)
                              / and once for −p00 . (Clearly, E[p00 ] = 0 and Var[p00 ] = Var[p0 ] ∈ [1/2, 1].)
We thus get that, independent of the value of pb(0),   / we have Prρ [|p0 (ρ)| > 2−1/2 · C−d ] ≥ C−d , as
desired.
                                                                    −d/2 
Lemma 3.8. We have that Prρ kq(ρ, xT )k2 > t ∗ · Θ(log(1/β )
                                
                                                                             ≤ 1/(2Cd ).
Proof. To obtain the desired concentration bound we must show that the degree-2d polynomial Q(ρ) =
kq(ρ, xT )k22 has “small” variance. The desired bound then follows by an application of Theorem 2.2.
   We thus begin by showing that kQk2 ≤ 3d de−α . To see this, we first note that Q(ρ) = ∑06/ =S⊆T c
                                                                                                   pρ (S)2 .
Hence an application of the triangle inequality for norms and hypercontractivity (Theorem 2.1) yields:
                                 kQk2 ≤         pρ (S)k24 ≤ 3d
                                             ∑ kc                            pρ (S)k22 .
                                                                          ∑ kc
                                          06/ =S⊆T                      06/ =S⊆T

We now proceed to bound from above the RHS term by term:
                                             h           i                  h            i
           pρ (S)k22 = ∑ Eρ [cpρ (S)2 ] = Eρ      pρ (S)2 ≤ Eρ Inf(pρ ) = Eρ ∑ Infi (pρ )
                                                                      
     ∑ kc                                      ∑ c
     06/ =S⊆T              06/ =S⊆T                          06/ =S⊆T                            i∈T

                        = ∑ Eρ Infi (pρ ) = ∑ Infi (p) ≤ de−α
                                        
                                                                                                         (3.3)
                           i∈T                       i∈T

                                                              pρ (S)2 , the equality in (3.3) follows from
where the first inequality uses the fact Inf(pρ ) ≥ ∑06/ =S⊆T c
Lemma 3.3, and the last inequality is equation (3.2). We have thus shown that kQk2 ≤ 3d de−α .
   We now upper bound
                                 Prρ Q(ρ) > (t ∗ )2 · Θ(log(1/β ))−d .
                                                                       

Since kQk2 ≤ 3d de−α , Theorem 2.2 implies that for all t > ed we have Prρ [Q(ρ) > t · 3d de−α ] ≤
exp(−Ω(t 1/d )). Taking t to be Θ(d d lnd C) this upper bound is at most 1/(2Cd ). Our choice of the
parameter α gives t · d3d · e−α ≤ (t ∗ )2 · Θ(log(1/β ))−d . This completes the proof of Lemma 3.8, and
thus also the proof of Lemma 3.5.

                         T HEORY OF C OMPUTING, Volume 10 (2), 2014, pp. 27–53                              36
                     A R EGULARITY L EMMA FOR P OLYNOMIAL T HRESHOLD F UNCTIONS

3.2    The small critical index case
In this section we show that if the critical index of p is “small,” then a random restriction of “few”
variables causes p to become regular with non-negligible probability. We do this by showing that no
matter what the critical index is, a random restriction of all variables up to the τ-critical index causes p
to become τ 0 -regular, for some τ 0 not too much larger than τ, with probability at least 1/(2Cd ). More
formally, we prove:

Lemma 3.9. Let p : {−1, 1}n → R be a degree-d polynomial with τ-critical index k ∈ [n]. Let ρ be a
random restriction that fixes [k], and let τ 0 = (C0 · d ln d · ln τ1 )d · τ for some suitably large absolute constant
C0 . With probability at least 1/(2Cd ) over the choice of ρ, the restricted polynomial pρ is τ 0 -regular.

Proof. We must show that with probability at least 1/(2Cd ) over ρ the restricted polynomial pρ satisfies
                                                        n
                                          Inf` (pρ )/ ∑ Inf j (pρ ) ≤ τ 0                                       (3.4)
                                                      j=k+1


for all ` ∈ [k + 1, n]. Note that before the restriction, we have Inf` (p) ≤ τ · ∑nj=k+1 Inf j (p) for all ` ∈
[k + 1, n] because the τ-critical index of p is k.
      Let us give an intuitive explanation of the proof. We first show (Lemma 3.10) that with probability
at least C−d the denominator in (3.4) does not decrease under a random restriction. This is an anti-
concentration statement that follows easily from Theorem 2.3. We then show (Lemma 3.11) that with
probability at least 1 −C−d /2 the numerator in (3.4) does not increase by much under a random restriction,
i. e., no variable influence Inf` (pρ ), ` ∈ [k + 1, n], becomes too large. Thus both events occur (and pρ is
τ 0 -regular) with probability at least C−d /2.
      We note that while each individual influence Inf` (pρ ) is indeed concentrated around its expectation
(see Claim 3.12), we need a concentration statement for n − k such influences. This might seem difficult to
achieve since we require bounds that are independent of n. We get around this difficulty by a “bucketing”
argument that exploits the fact (at many different scales) that all but a few influences Inf` (p) must be
“very small.”
      It remains to state and prove Lemmas 3.10 and 3.11. Consider the event
                                                                                 
                              def                      n                n
                            E = ρ ∈ {−1, 1} | ∑ Inf` (pρ ) ≥ ∑ Inf` (p) .
                                                 k
                                                    `=k+1               `=k+1

We first show:

Lemma 3.10. Prρ [E] ≥ C−d .

                                                                                             def
Proof. It follows from Fact 3.4 and the Fourier expression of Inf` (pρ ) that A(ρ) = ∑n`=k+1 Inf` (pρ ) is
a degree-2d polynomial. By Lemma 3.3 we get that Eρ [A] = ∑n`=k+1 Inf` (p) > 0. Also observe that
A(ρ) ≥ 0 for all ρ ∈ {−1, 1}k . We may now apply Theorem 2.3 to the polynomial A0 = A − Eρ [A], to
obtain:
                     Prρ [E] = Pr[A0 ≥ 0] ≥ Pr[A0 ≥ C0−2d · σ (A0 )] > C0−2d = C−d .


                         T HEORY OF C OMPUTING, Volume 10 (2), 2014, pp. 27–53                                     37
             I LIAS D IAKONIKOLAS , ROCCO A. S ERVEDIO , L I -YANG TAN , AND A NDREW WAN

    We now turn to Lemma 3.11. Consider the event
                       def
                           n                               n          o
                     J = ρ ∈ {−1, 1}k max Inf` (pρ ) > τ 0 ∑ Inf j (p) .
                                                `∈[k+1,n]                    j=k+1

We show:
Lemma 3.11. Prρ [J] ≤ (1/2) ·C−d .
    The rest of this subsection consists of the proof of Lemma 3.11. A useful intermediate claim is that
the influences of individual variables do not increase by a lot under a random restriction (note that this
claim does not depend on the value of the critical index):
Claim 3.12. Let p : {−1, 1}n → R be a degree-d polynomial. Let ρ be a random restriction fixing [ j].
Fix any t > e2d and any ` ∈ [ j + 1, n]. With probability at least 1 − exp(−Ω(t 1/d )) over ρ, we have
Inf` (pρ ) ≤ 3d t Inf` (p).

                                               pρ (S)2 and Fact 3.4 imply that Inf` (pρ ) is a degree-2d
Proof. The identity Inf` (pρ ) = ∑`∈S⊆[ j+1,n] c
polynomial in ρ. Hence the claim follows from the concentration bound, Theorem 2.2, assuming we can
appropriately upper bound the `2 -norm of the polynomial Inf` (pρ ). So, to prove Claim 3.12 it suffices to
show that

                                          kInf` (pρ )k2 ≤ 3d Inf` (p) .                                 (3.5)
   The proof of equation (3.5) is similar to the argument establishing that kQk2 ≤ 3d de−α in Section 3.1.
The triangle inequality tells us that we may bound the `2 -norm of each squared-coefficient separately:

                                   kInf` (pρ )k2 ≤          ∑       k pbρ (S)2 k2 .
                                                     `∈S⊆[ j+1,n]

Since pbρ (S) is a degree-d polynomial, Theorem 2.1 yields that

                                  k pbρ (S)2 k2 = k pbρ (S)k24 ≤ 3d k pbρ (S)k22 ,

hence
                           kInf` (pρ )k2 ≤ 3d        ∑         k pbρ (S)k22 = 3d Inf` (p) ,
                                                `∈S⊆[ j+1,n]

where the last equality is a consequence of Fact 3.4. Thus equation (3.5) holds, and Claim 3.12 is
proved.

    Claim 3.12 says that for any given coordinate, the probability that its influence after a random
restriction increases by a t factor decreases exponentially in t. Note that Claim 3.12 and a naive union
bound over all coordinates in [k + 1, n] does not suffice to prove Lemma 3.11. Instead, we proceed as
follows: We partition the coordinates in [k + 1, n] into “buckets” according to their influence in the tail of
p. In particular, the i-th bucket (i ≥ 0) contains all variables ` ∈ [k + 1, n] such that

                                          Inf` (p)
                                                        ∈ [τ/2i+1 , τ/2i ] .
                                      ∑nj=k+1 Inf j (p)

                       T HEORY OF C OMPUTING, Volume 10 (2), 2014, pp. 27–53                               38
                    A R EGULARITY L EMMA FOR P OLYNOMIAL T HRESHOLD F UNCTIONS

    We analyze the effect of a random restriction ρ on the variables of each bucket i separately and then
conclude by a union bound over all the buckets.
    So fix a bucket i. Note that, by definition, the number of variables in the i-th bucket is at most
2i+1 /τ. We bound from above the probability of the event B(i) that there exists a variable ` in bucket i
that violates the regularity constraint, i. e., such that Inf` (pρ ) > τ 0 ∑n`=k+1 Inf` (p). We will do this by a
combination of Claim 3.12 and a union bound over the variables in the bucket. We will show:

Claim 3.13. We have that Prρ [B(i)] ≤ 2−(i+2) ·C−d .

     The above claim completes the proof of Lemma 3.11 by a union bound across buckets. Indeed,
assuming the claim, the probability that any variable ` ∈ [k + 1, n] violates the condition Inf` (pρ ) ≤
τ 0 ∑n`=k+1 Inf` (p) is at most

                              ∞                           ∞
                              ∑ Prρ [B(i)] ≤ C−d 2−2 ∑ (1/2)i = (1/2) ·C−d .
                              i=0                        i=0


    It thus remains to prove Claim 3.13. Fix a variable ` in the i-th bucket. We apply Claim 3.12 selecting
a value of
                                                               d
                                                      Cd 4i+2
                                                   
                                               def
                                         t =et = ln                .
                                                         τ

                 t ≤ c0d (d + i + ln(1/τ))d for some absolute constant c0 . As a consequence, there is an
It is clear that e
absolute constant C0 such that for every i,

                                                            1 d
                                                             
                                              −d 0d i
                                        t ≤ 3 C 2 d ln d ln
                                        e                       .                                          (3.6)
                                                            τ

(To see this, note that for i ≤ 10d ln d we have d + i + ln(1/τ) < 11d ln d ln(1/τ), from which the claimed
bound is easily seen to hold. For i > 10d ln d, we use d + i + ln(1/τ) < di ln(1/τ) and the fact that id < 2i
for i > 10d ln d.)
    Inequality (3.6) can be rewritten as 3d · et · τ/2i ≤ τ 0 . Hence, our assumption on the range of Inf` (p)
gives
                                                                  n
                                           t · Inf` (p) ≤ τ 0 · ∑ Inf j (p) .
                                      3d · e
                                                               j=k+1

Therefore, by Claim 3.12, the probability that coordinate ` violates the condition

                                                              n
                                         Inf` (pρ ) ≤ τ 0 ∑ Inf j (p)
                                                          j=k+1


is at most τ/(Cd 4i+2 ) by our choice of e
                                         t . Since bucket i contains at most 2i+1 /τ coordinates, Claim 3.13
follows by a union bound. Hence Lemma 3.11, and thus Lemma 3.9, is proved.

                       T HEORY OF C OMPUTING, Volume 10 (2), 2014, pp. 27–53                                  39
             I LIAS D IAKONIKOLAS , ROCCO A. S ERVEDIO , L I -YANG TAN , AND A NDREW WAN

3.3   Putting everything together: proof of Theorem 1.1
The following lemma combines the results of the previous two subsections:
Lemma 3.14. Let p : {−1, 1}n → R be a degree-d polynomial and 0 < τe, β < 1/2. Fix

                  α = Θ(d log log(1/β ) + d log d) and        τe0 = τe · (C0 d ln d ln(1/τe))d ,

where C0 is a universal constant. (We assume w. l. o. g. that the variables are ordered s.t. Infi (p) ≥
Infi+1 (p), i ∈ [n − 1].) One of the following statements holds true:
   1. The polynomial p is τe-regular.

   2. With probability at least 1/(2Cd ) over a random restriction ρ fixing the first α/τe (most influential)
      variables of p, the function sign(pρ ) is β -close to a constant function.

   3. There exists a value k ≤ α/τe, such that with probability at least 1/(2Cd ) over a random restriction
      ρ fixing the first k (most influential) variables of p, the polynomial pρ is τe0 -regular.
Proof. The proof is by case analysis based on the value ` of the τe-critical index of the polynomial p. If
` = 0, then by definition p is τe-regular, hence the first statement of the lemma holds. If ` > α/τe, then we
randomly restrict the first α/τe many variables. Lemma 3.5 says that for a random restriction ρ fixing
these variables, with probability at least 1/(2Cd ) the (restricted) degree-d PTF sign(pρ ) is β -close to
a constant. Hence, in this case, the second statement holds. To handle the case ` ∈ [1, α/τe], we apply
Lemma 3.9. This lemma says that with probability at least 1/(2Cd ) over a random restriction ρ fixing
variables [`], the polynomial pρ is τe0 -regular, so the third statement of Lemma 3.14 holds.

Proof of Theorem 1.1. We begin by observing that any function f on {−1, 1}n is equivalent to a decision
tree where each internal node of the tree is labeled by a variable, every root-to-leaf path corresponds
to a restriction ρ that fixes the variables as they are set on the path, and every leaf is labeled with the
restricted subfunction fρ . Given an arbitrary degree-d PTF f = sign(p), we will construct a decision tree
T of the form described in Theorem 1.1. It is clear that in any such tree every leaf function fρ will be a
degree-d PTF; we must show that T has depth depth(d, τ) and that with probability 1 − τ over the choice
of a random root-to-leaf path ρ, the restriction ρ is such that either fρ = sign(pρ ) is τ-close to a constant
function, or pρ is τ-regular.
    For a tree T computing f = sign(p), we denote by N(T ) its set of internal nodes and by L(T ) its set
of leaves. We call a leaf ρ ∈ L(T ) “good” if either fρ is τ-close to a constant function or pρ is τ-regular.
We call a leaf “bad” otherwise. Let GL(T ) and BL(T ) be the sets of “good” and “bad” leaves in T
respectively.
    The basic approach for the proof is to invoke Lemma 3.14 repeatedly in a sequence of at most
2Cd ln(1/τ) stages. In the first stage we apply Lemma 3.14 to f itself; this gives us an initial decision
tree. In the second stage we apply Lemma 3.14 to the restricted subfunctions fρ corresponding to leaves
of the initial decision tree that are bad; this “grows” our initial decision tree. Subsequent stages continue
similarly; we will argue that after at most 2Cd ln(1/τ) stages, the resulting tree satisfies the required
properties for T. In every application of Lemma 3.14 the parameters β and τe0 are both taken to be τ; note
that taking τe0 to be τ sets the value of τe in Lemma 3.14 to a value that is less than τ.

                       T HEORY OF C OMPUTING, Volume 10 (2), 2014, pp. 27–53                               40
                   A R EGULARITY L EMMA FOR P OLYNOMIAL T HRESHOLD F UNCTIONS

     We now provide the details. In the first stage, the initial application of Lemma 3.14 results in a tree
T1 . This tree T1 may consist of a single leaf node that is τe-regular (if f is τe-regular to begin with—in this
case, since τe < τ, we are done), or a complete decision tree of depth α/τe (if f had large critical index),
or a complete decision tree of depth k < α/τe (if f had small critical index). Note that in each case the
depth of T1 is at most α/τe. Lemma 3.14 guarantees that:
                                    Prρ∈T1 [ρ ∈ BL(T1 )] ≤ 1 − 1/(2Cd ) ,
where the probability is over a random root-to-leaf path ρ in T1 .
    In the second stage, the “good” leaves ρ ∈ GL(T1 ) are left untouched; they will be leaves in the final
tree T. For each “bad” leaf ρ ∈ BL(T1 ), we order the unrestricted variables in decreasing order of their
influence in the polynomial pρ , and we apply Lemma 3.14 to fρ . This “grows” T1 at each bad leaf by
replacing each such leaf with a new decision tree; we call the resulting overall decision tree T2 .
    A key observation is that the probability that a random path from the root reaches a “bad” leaf is
significantly smaller in T2 than in T1 ; in particular
                                  Prρ∈T2 [ρ ∈ BL(T2 )] ≤ (1 − 1/(2Cd ))2 .
We argue this as follows: Let ρ be any fixed “bad” leaf in T1 , i. e., ρ ∈ BL(T1 ). The function fρ is not
τe0 -regular and consequently not τe-regular. Thus, either statement (2) or (3) of Lemma 3.14 must hold
when the Lemma is applied to fρ . The tree that replaces ρ in T0 has depth at most α/τe, and a random
root-to-leaf path ρ1 in this tree reaches a “bad” leaf with probability at most 1 − 1/(2Cd ). So the overall
probability that a random root-to-leaf path in T2 reaches a “bad” leaf is at most (1 − 1/(2Cd ))2 .
      Continuing in this fashion, in the i-th stage we replace all the bad leaves of Ti−1 by decision trees
according to Lemma 3.14 and we obtain the tree Ti . An inductive argument gives that
                                   Prρ∈Ti [ρ ∈ BL(Ti )] ≤ (1 − 1/(2Cd ))i ,
                          def
which is at most τ for i∗ = 2Cd ln(1/τ).
     The depth of the overall tree will be the maximum number of stages (2Cd ln(1/τ)) times the maximum
depth added in each stage (at most α/τe, since we always restrict at most this many variables), which is at
most (α/τe) · i∗ . Since β = τ, we get α = Θ(d log log(1/τ) + d log d). Recalling that τe0 in Lemma 3.14 is
set to τ, we see that
                                                           τ
                                          τe =                         .
                                               (C0 d ln d ln(1/τ))O(d)
By substitution we get that the depth of the tree is upper bounded by d O(d) · (1/τ) · log(1/τ)O(d) which
concludes the proof of Theorem 1.1.


4    Every degree-d PTF has a low-weight approximator
In this section we apply Theorem 1.1 to prove Theorem 1.2, which we restate below:
Theorem 1.2. Let f (x) = sign(p(x)) be any degree-d PTF. Fix any ε > 0 and let τ = (Θ(1) · ε/d)8d .
Then there is a polynomial q(x) of degree D = d + depth(d, τ) and weight nd · 24depth(d,τ) · (d/ε)O(d) ,
which is such that the PTF sign(q(x)) is ε-close to f .

                       T HEORY OF C OMPUTING, Volume 10 (2), 2014, pp. 27–53                                 41
             I LIAS D IAKONIKOLAS , ROCCO A. S ERVEDIO , L I -YANG TAN , AND A NDREW WAN

    To prove Theorem 1.2, we first show that any sufficiently regular degree-d PTF over n variables has a
low-weight approximator, of weight roughly nd . Theorem 1.1 asserts that almost every leaf ρ of T is a
regular PTF or close to a constant function; at each regular leaf ρ we use the low-weight approximator of
the previous sentence to approximate fρ . Finally, we combine all of these low-weight polynomials to get
an overall PTF of low weight which is a good approximator for f . We give details below.

4.1   Low-weight approximators for regular PTFs
In this subsection we prove that every sufficiently regular PTF has a low-weight approximator of degree
d:

Lemma 4.1. Given ε > 0, let τ = (Θ(1) · ε/d)8d . Let p : {−1, 1}n → R be a τ-regular degree-d
polynomial with Var[p] = 1. There exists a degree-d polynomial q : {−1, 1}n → R of weight nd · (d/ε)O(d)
such that sign(q(x)) is an ε-approximator for sign(p(x)).

Proof. The polynomial q is obtained by rounding the weights of p to an appropriate granularity, similar
to the regular case in [33] for the d = 1 case. To show that this works, we use the fact that regular PTFs
have very good anti-concentration. In particular we will use the following claim, which follows easily by
combining the invariance principle [26] and Gaussian anti-concentration [5]:
Claim 4.2. Let p : {−1, 1}n → R be a τ-regular degree-d polynomial with Var[p] = 1. Then

                                      Prx [|p(x)| ≤ τ] ≤ O(dτ 1/8d ) .

Proof. We recall that, since Var[p] = 1 and p is of degree d, it holds Inf(p) ≤ d. Thus, since p is
τ-regular, we have that maxi∈[n] Infi (p) ≤ dτ. An application of the invariance principle (Theorem 2.5) in
tandem with anti-concentration in gaussian space (Theorem 2.4) yields

                         Prx [|p(x)| ≤ τ] ≤ O(d · (dτ)1/8d ) + PrG∼Nn [|p(G)| ≤ τ]
                                         ≤ O(dτ 1/8d ) + O(dτ 1/d ) = O(dτ 1/8d ) ,

and the claim follows.

    We turn to the detailed proof of Lemma 4.1. We first note that if the constant coefficient pb(0)
                                                                                                   / of P
                                           d/2
has magnitude greater than (O(log(1/ε))) , then Theorem 2.2 (applied to p(x) − pb(0))     / implies that
                                / for at least a 1 − ε fraction of inputs x. So in this case sign(p(x)) is
sign(p(x)) agrees with sign( pb(0))
ε-close to a constant function, and the conclusion of the Lemma certainly holds. Thus we henceforth
                 / is at most (O(log(1/ε)))d/2 .
assume that | pb(0)|
    Let
                                                       τ
                                         α=
                                              (Kn · ln(4/ε))d/2
where K > 0 is an absolute constant (specified later). For each S 6= 0/ let qb(S) be the value obtained by
rounding pb(S) to the nearest integer multiple of α, and let qb(0) / equal pb(0).
                                                                                / This defines a degree-d
polynomial q(x) = ∑S qb(S)χS (x). It is easy to see that rescaling by α, all of the non-constant coefficients

                       T HEORY OF C OMPUTING, Volume 10 (2), 2014, pp. 27–53                              42
                     A R EGULARITY L EMMA FOR P OLYNOMIAL T HRESHOLD F UNCTIONS

of q(x)/α are integers. Since each coefficient qb(S) has magnitude at most twice that of pb(S), we may
bound the sum of squares of coefficients of q(x)/α by

                              / 2 ∑S6=0/ 4 pb(S)2 (O(log(1/ε)))d
                           pb(0)
                                 +               ≤               ≤ nd · (d/ε)O(d) .
                             α2         α2              α2
We now observe that the constant coefficient pb(0)    / of q(x) can be rounded to an integer multiple of α
without changing the value of sign(q(x)) for any input x. Doing this, we obtain a polynomial q0 (x)/α
with all integer coefficients, weight nd · (d/ε)O(d) , and which has sign(q0 (x)) = sign(q(x)) for all x.
      In the rest of our analysis we shall consider the polynomial q(x) (recall that the constant coefficient of
                       / It remains to show that sign(q) is an ε-approximator for sign(p). For each S 6= 0/
q(x) is precisely pb(0)).
let eb(S) equal pb(S) − qb(S). This defines a polynomial (with constant term 0) e(x) = ∑S eb(S)xS , and we
have q(x) + e(x) = p(x). (The coefficients eb(S) are the “errors” induced by approximating pb(S) by qb(S).)
      Recall that τ = (Θ(1) · ε/d)8d . For any input x, we have that sign(q(x)) 6= sign(p(x)) only if either

  (i) |e(x)| ≥ τ, or

  (ii) |p(x)| ≤ τ.

Since each coefficient of e(x) satisfies
                                                 α            τ
                                      |b
                                       e(S)| ≤     ≤                    ,
                                                 2   2(Kn · ln(4/ε))d/2

the sum of squares of all (at most nd ) coefficients of e is at most

                                        τ2                                      τ
                     ∑ eb(S)2 ≤                 ,    and thus   kek2 ≤                   .
                       S          4(K ln(4/ε))d                          2(K ln(4/ε))d/2

Applying Theorem 2.2, we get that Prx [|e(x)| ≥ τ] ≤ ε/2 (for a suitable absolute constant choice of K),
so we have upper bounded the probability of (i).
    For (ii), we use the anti-concentration bound for regular polynomials, Claim 4.2. This directly gives
us that Prx [|p(x)| ≤ τ] ≤ O(dτ 1/8d ) ≤ ε/2.
    Thus the probability, over a random x, that either (i) or (ii) holds is at most ε. Consequently sign(q)
is an ε-approximator for sign(p), and Lemma 4.1 is proved.

4.2   Proof of Theorem 1.2
Let f = sign(p) be an arbitrary degree-d PTF over n Boolean variables, and let ε > 0 be the desired
approximation parameter. We invoke Theorem 1.1 with its “τ” parameter set to τ = (Θ(1) · (ε/2)/d)8d
(i. e., our choice of τ is obtained by plugging in “ε/2” for ε in the first sentence of Lemma 4.1). Each leaf
ρ of the tree T as described in Theorem 1.1 (we call these “good” leaves) is either a τ-regular degree-d
PTF or τ-approximated by a constant function. By Lemma 4.1, for each regular leaf ρ there is a degree-d
polynomial of weight nd · (d/ε)O(d) , which we denote q(ρ) , such that fρ is ε/2-close to sign(q(ρ) ). For
each of the other leaves in T (which are reached by at most a τ fraction of all inputs to T —we call

                           T HEORY OF C OMPUTING, Volume 10 (2), 2014, pp. 27–53                             43
             I LIAS D IAKONIKOLAS , ROCCO A. S ERVEDIO , L I -YANG TAN , AND A NDREW WAN

these “bad” leaves), for which fρ is not τ-close to any τ-regular degree-d PTF, let q(ρ) be the constant-1
function.
    For each leaf ρ of depth r in T , let Pρ (x) be the unique multilinear polynomial of degree r which
outputs 2r iff x reaches ρ and outputs 0 otherwise. (As an example, if ρ is a leaf which is reached by the
path “x3 = −1, x6 = 1, x2 = 1” from the root in T , then Pρ (x) would be (1 − x3 )(1 + x6 )(1 + x2 ).) Our
final PTF is
                          g(x) = sign(Q(x)) , where Q(x) = ∑ Pρ (x)q(ρ) (x) .
                                                                     ρ

It is easy to see that on any input x, the value Q(x) equals 2|ρx | · q(ρx ) (x), where we write ρx to denote the
leaf of T that x reaches and |ρx | to denote the depth of that leaf. Thus sign(Q(x)) equals sign(q(ρx ) (x))
for each x, and from this it follows that Prx [g(x) 6= f (x)] is at most τ + τ + ε/2 < ε. Here the first τ is
because a random input x may reach a bad leaf with probability up to τ, and the τ + ε/2 is because for
each good leaf ρ, the function is either τ-approximated by a constant function or ε/2-approximated by
sign(q(ρ) ).
     Since T has depth depth(d, τ), it is easy to see that Q has degree at most depth(d, τ) + d. It is
clear that the coefficients of Q are all integers, so it remains only to bound the sum of squares of these
coefficients. Each polynomial addend Pρ (x)q(ρ) (x) in the sum is easily seen to have sum of squared
coefficients
                                                      
                                (ρ) 2
       ∑ P\  (ρ)   2
          ρ q (S) = E[(Pρ · q ) ] ≤ max Pρ (x)
                                                     2
                                                         · E[q(ρ) (x)2 ] ≤ 22depth(d,τ) · nd · (d/ε)O(d) .  (4.1)
      S                                        x


Since T has depth depth(d, τ), the number of leaves ρ is at most 2depth(d,τ) , and hence for each S by
Cauchy-Schwarz we have
                                           2
                           Q(S) = ∑ Pρ q (S) ≤ 2depth(d,τ) · ∑ P\
                           b   2    \   (ρ)                        (ρ) 2
                                                                ρ q (S) .                                  (4.2)
                                       ρ                                 ρ


This implies that the total weight of Q is

                   b 2 ≤ 2depth(d,τ) · ∑ P\
                 ∑ Q(S)                       (ρ)
                                           ρ q (S)
                                                      2
                                                                                 (using (4.2))
                 S                     ρ,S
                                                             
                        ≤ 22depth(d,τ) max ∑ P\   ρ q(ρ) (S)2
                                           ρ       S

                          ≤ 24depth(d,τ) · nd · (d/ε)O(d) ,                      (using (4.1))

and Theorem 1.2 is proved.


4.3                         e d )-weight approximators
      Degree-d PTFs require Ω(n
In this section we give two lower bounds on the weight required to ε-approximate certain degree-d PTFs.
(We use the notation Ωd () below to indicate that the hidden constant of the big-Omega depends on d.)

                       T HEORY OF C OMPUTING, Volume 10 (2), 2014, pp. 27–53                                  44
                   A R EGULARITY L EMMA FOR P OLYNOMIAL T HRESHOLD F UNCTIONS

Theorem 4.3. For all sufficiently large n, there is a degree-d n-variable PTF f (x) with the following
property: Let K(d) be any positive-valued function depending only on d. Suppose that g(x) = sign(q(x))
                                                                                            def
is a degree-K(d) PTF with integer coefficients qb(S) such that dist( f , g) ≤ ε ? where ε ? = C−d /2. Then
the weight of q is Ωd (nd / log n).

Theorem 4.4. For all sufficiently large n, there is a degree-d n-variable PTF f (x) with the following
property: Suppose that g(x) = sign(q(x)) is any PTF (of any degree) with integer coefficients qb(S) such
                               def
that dist( f , g) ≤ ε ? where ε ? = C−d /2. Then the weight of q is Ωd (nd−1 ).

   Viewing d and ε as constants, Theorem 4.3 implies that the O(nd ) weight bound of our ε-approximator
from Theorem 1.2 (of constant degree) is essentially optimal for any constant-degree ε-approximator.
Theorem 4.4 says that there is only small room for improving our weight bound even if arbitrary-degree
PTFs are allowed as approximators.
   Theorems 4.3 and 4.4 are both consequences of the following theorem:
                                                                     d
Theorem 4.5. There exists a set C = { f1 , . . . , fM } of M = 2Ωd (n ) degree-d PTFs fi such that for any
1 ≤ i < j ≤ M, we have dist( fi , f j ) ≥ C−d .

Proof of Theorems 4.3 and 4.4 assuming Theorem 4.5. First we prove Theorem 4.3. We begin by claim-
ing that there are at most
                                                  A
                                               n
                                          3
                                             ≤ K(d)

many integer-weight PTFs of degree K(d) and weight at most A. This is because any such PTF can be

   n
        by making a sequence of A steps, where at each step either −1, 0, or 1 is added
obtained
                                                                                           n
                                                                                               to one of the
 ≤K(d)  many    monomials   of degree at most   K(d). Each  step can be carried out in 3 ≤K(d) ways, giving
the claimed bound.
    By Theorem 4.5, there are M distinct degree-d PTFs f1 , . . . , fM any two of which are C−d -far from
each other. Consequently any Boolean function (in particular, any weight-A degree-K(d) PTF g) can
have dist(g, fi ) ≤ C−d /2 for at most one fi . Since there are only
                                                     A
                                                   n
                                               3
                                                 ≤ K(d)

many weight-A degree-K(d) PTFs, which is less than M for some A = Ωd (nd / log n), it follows that some
fi must have distance at least C−d /2 from every weight-A, degree-K(d) PTF. This gives Theorem 4.3.
    The proof of Theorem 4.4 is nearly identical. We now use the fact that there are at most (3 · 2n )A
many integer-weight PTFs of weight at most A (and any degree), and use the fact that (3 · 2n )A is less than
M for some A = Ωd (nd−1 ).


    It remains to prove Theorem 4.5.

                       T HEORY OF C OMPUTING, Volume 10 (2), 2014, pp. 27–53                              45
             I LIAS D IAKONIKOLAS , ROCCO A. S ERVEDIO , L I -YANG TAN , AND A NDREW WAN

4.3.1   Proof of Theorem 4.5
The proof is by the probabilistic method. We define the following distribution D over n-variable degree-d
polynomials. A draw of p(x) = ∑S⊂[n],|S|=d pb(S)χS (x) from D is obtained in the following way: each of
the dn coefficients pb(S) is independently and uniformly selected from {−1, 1}.
      

    We will prove Theorem 4.5 using Lemma 4.6, which says that it is extremely likely the polynomial
c—the product of two independent draws a and b from D—will have both small bias and large variance.
Lemma 4.6. Let a(x) and b(x) be two degree-d polynomials drawn independently from D, and let
                                                           d
c(x) = a(x)b(x). Then with probability at least 1 − 2−Ωd (n ) we have:

         / ≤ 41 C−d n/2
                       
   1. |b
       c(0)|          d , and

                                1 n/2 2
   2. Var[c] = ∑|S|>0 cb(S)2 ≥ 12
                                     
                                   d    .
    Suppose that Lemma 4.6 holds. Let a(x) and b(x) be independent draws from D and let c(x) =
a(x)b(x) which satisfies the conclusions of the lemma. Then the constant term cb(0)     / is small compared
with the variance of c(x). Let us rescale so the variance is 1; i. e., define the polynomial

                                                   def     c(x)
                                              e(x) =
                                                         Var[c]1/2

so Var[e] = 1 and |b   / < C−d . We now apply Theorem 2.3 to the degree-2d polynomial q(x) =
                     e(0)|
            / and we see that with probability at least C−d (over a random uniform draw of x) we have
−e(x) + eb(0),
            / > C−d , and hence Prx [sign(e(x)) < 0] > C−d .
−e(x) + eb(0)
    We now observe that sign(e(x)) < 0 if and only if sign(a(x)) 6= sign(b(x)), and consequently
Prx [sign(e(x)) < 0] is precisely dist(sign(a), sign(b)). We thus have that for a(x), b(x) drawn from
                                                                                                    d
D as described above, the probability that dist(sign(a), sign(b)) is less than C−d is at most 2−αd n for
some absolute constant αd > 0 (depending only on d).
    Now let us consider
                                                          d
                                             M = 2(αd /2)n
many independent draws of polynomials a1 , a2 , . . . , aM from D. A union bound over all the
                                            
                                             M                d
                                                       < 2αd n
                                              2
pairs (i, j) with 1 ≤ i < j ≤ M gives that with nonzero probability, every ai , a j pair satisfies

                                      dist(sign(ai ), sign(a j )) ≥ C−d .

Thus there must be some outcome for the polynomials a1 , a2 , . . . , aM such that

                                       dist(sign(ai ), sign(a j )) ≥ C−d

for all 1 ≤ i < j ≤ M. Setting fi = sign(ai ) for this outcome, Theorem 4.5 is proved.
    It remains only for us to prove Lemma 4.6.

                       T HEORY OF C OMPUTING, Volume 10 (2), 2014, pp. 27–53                            46
                   A R EGULARITY L EMMA FOR P OLYNOMIAL T HRESHOLD F UNCTIONS

4.3.2   Proof of Lemma 4.6
Let us consider a(x) = ∑|S|=d ab(S)χS (x) and b(x) = ∑|S|=d b b(S)χS (x) drawn independently from D. We
will show that the bias of the polynomial c(x) = a(x)b(x) fails to satisfy the bound in item 1 with
                   d                                                                                   d
probability 2−Ωd (n ) . Then we show the variance of c fails to satisfy item 2 with probability 2−Ωd (n ) , and
the lemma follows from a union bound.
    To bound the bias of c, we begin by noting that:

                                             cb(0)
                                                / = ∑ ab(S)b
                                                           b(S) .
                                                   S⊆[n]


               b(S) in the summand is uniform, i.i.d in {−1, 1}. Define the random variable XS =
Each term ab(S)b
1/2 − (1/2)b
           a(S)b
               b(S). Then ∑S⊆[n] XS is binomially distributed and setting

                                                 1     n d
                                              t = C−d        ,
                                                 4     2d
we may apply the Chernoff bound to obtain:
                                                                    !
                                                                 t2
                                  
                       1 −d  n d                                           d
                   / <− C
             Pr cb(0)                = Pr[X > E[X] + t] ≤ exp −2 n = 2−Ωd (n ) .
                       4     2d                                   d

The same analysis gives a bound on the magnitude in the negative direction. Since
                                           
                                          n/2         n d
                                                ≥          ,
                                           d         2d

this concludes the analysis for the first item of the lemma.
    Now we show that item 2 of the lemma also fails with very small probability. The following
terminology will be useful. Let T ⊂ [n] be a subset of size exactly |T | = 2d (we think of T as the set of
variables defining some monomial of degree 2d). For such a T we let

                             def                                      def
                     first(T ) = T ∩ [n/2]      and        second(T ) = T ∩ [n/2 + 1, n] .

                                                                                                   n/22
We say that such a T is balanced if |first(T )| = |second(T )| = d. Note that there are exactly     d    many
balanced subsets T .
    We say that a subset U ⊂ [n], |U| = d is pure if U is contained entirely in [n/2 + 1, n].
    Let us consider

                        a(x) = ∑ ab(S)χS (x)          and       b(x) = ∑ b
                                                                         b(S)χS (x)
                                   |S|=d                               |S|=d


drawn independently from D. Fix any outcome for a (i. e., for the values of all dn coefficients ab(S)), and
                                                                                  

fix any outcome for b
                    b(U) for every U which is not pure. Thus the only “remaining randomness” is the

                       T HEORY OF C OMPUTING, Volume 10 (2), 2014, pp. 27–53                                47
             I LIAS D IAKONIKOLAS , ROCCO A. S ERVEDIO , L I -YANG TAN , AND A NDREW WAN

value (drawn uniformly from {−1, 1}}) for each of the n/2
                                                              
                                                            d coefficients b(U) for pure U. We will show
                                                                           b
                                          d
that with probability at least 1 − 2−Ωd (n ) over the remaining randomness, at least 1/6 of the
                                                    2
                                                     n/2
                                                      d
many balanced subsets T have cb(T ) 6= 0. Since each value cb(T ) which is nonzero is at least 1 in magnitude,
this suffices to prove the lemma.
    Consider any fixed pure subset U ⊂ [n], |U| = d (for example U = {n − d + 1, . . . , n}). Let T be a
balanced subset of n (so |T | = 2d) such that second(T ) equals U. (There are precisely n/2
                                                                                                    
                                                                                                 d    balanced
subsets T with this property; let TU denote the collection of all d of them.) Consider the value cb(T ):
                                                                   n/2 

this is
                                      cb(T ) = ∑ ab(S)b     b(T − S) .
                                               S⊆T,|S|=d

The only “not-yet-fixed” part of the above expression is the single coefficient b    b(U); everything else has
been fixed. Since the coefficient ab(T −U) of b(U) is a nonzero integer, there are two possible outcomes
                                                 b
for the value of cb(T ), depending on whether b b(U) is set to +1 or -1. These two possible values differ by
2; consequently, there is at most one possible outcome of b   b(U) that will cause cb(T ) to be zero. (Note that
it may well be the case that no outcome for b  b(U) would cause cb(T ) to become zero.)
     Let us say that an outcome of b b(U) is pernicious if it has the following property: at most 1/3 of the
  n/2
   d   elements T ∈ TU have cb(T ) take a nonzero value under that outcome of b        b(U). (Equivalently, at
least 2/3 of the d elements T ∈ TU have cb(T ) become zero under that outcome of b
                    n/2
                                                                                              b(U).) It may be
the case that neither outcome in {−1, 1} for b(U) is pernicious (e. g., if each outcome makes at least
                                                 b
95% of the cb(T ) values come out nonzero). It cannot be the case that both outcomes {−1, 1} for b          b(U)
are pernicious (for if there were two pernicious outcomes, this would mean that at least 1/3 of the cb(T )
values evaluate to 0 under both outcomes for b  b(U), but it is impossible for any cb(T ) to evaluate to 0 under
two outcomes for b   b(U)). Consequently we have
                                                                      
                               Pr the outcome of b  b(U) is pernicious ≤ 1/2 .

     This is true independently for each of the n/2
                                                      
                                                    d many pure subsets U. As a result, a simple analysis
gives
                                   
             h                      n/2                                                 i           d
           Pr at least 3/4 of the         pure subsets U have a pernicious outcome ≤ 2−Ωd (n ) .
                                     d

Thus we may assume that fewer than 3/4 of the n/2
                                                        
                                                       d pure subsets U have a pernicious outcome. So at
least 1/4 of the pure subsets U are non-pernicious. For each such non-pernicious U, more than 1/3 of
the n/2d elements in TU have c
         
                                 b(T ) take a nonzero value. Consequently, at least

                                                   1 n/2 2
                                                      

                                                  12 d
many balanced subsets T overall have cb(T ) 6= 0. This proves the lemma.

                       T HEORY OF C OMPUTING, Volume 10 (2), 2014, pp. 27–53                                 48
                   A R EGULARITY L EMMA FOR P OLYNOMIAL T HRESHOLD F UNCTIONS

5    Conclusions and open problems
In this paper, we gave a regularity lemma for low-degree PTFs. Our lemma roughly says that any
low-degree PTF can be decomposed as a shallow binary decision tree T, such that “most” leaves of T
correspond to “regular” low-degree PTFs (or constant functions). As an application, we showed that any
constant-degree PTF is well-approximated by a constant-degree PTF with low integer weights. Our result
has also been used as an essential ingredient in the recent line of work [8, 19] establishing that bounded
independence fools low-degree PTFs.
     We conclude this paper by mentioning a few relevant open problems. At the quantitative level, an
obvious question is whether the depth of the tree T must depend exponentially on the degree parameter d.
It is plausible to conjecture that such a dependence on d is inherent. It would also be very interesting
to devise qualitatively improved “regularity lemmas” for PTFs (e. g., using a more refined notion of
regularity) that may result in improved bounds for applications; see [20] for a first important step in this
direction.
     We believe that our regularity lemma may find other applications. For example, it may be useful in
the problem of algorithmically reconstructing a degree-d PTF from its Fourier coefficients of degree at
most d (see [30] for a solution to the d = 1 version of this problem). Finally, regarding integer-weight
approximations, say that an integer weight approximator to a degree-d PTF is proper if its degree
is (at most) d. Clearly, the approximators that we obtain are not proper; constructing proper such
approximations is left as an interesting open problem.


References
 [1] JAMES A SPNES , R ICHARD B EIGEL , M ERRICK L. F URST, AND S TEVEN RUDICH: The expressive
     power of voting polynomials. Combinatorica, 14(2):135–148, 1994. Preliminary version in
     STOC’91. [doi:10.1007/BF01215346] 28

 [2] P ER AUSTRIN AND J OHAN H ÅSTAD: Randomly supported independence and resistance. SIAM J.
     Comput., 40(1):1–27, 2011. Preliminary version in STOC’09. [doi:10.1137/100783534] 33

 [3] A LINE B ONAMI: Étude des coefficients Fourier des fonctions de L p (G). Annales de l’Institute
     Fourier, 20(2):335–402, 1970. EuDML. 32

 [4] J EHOSHUA B RUCK: Harmonic analysis of polynomial threshold functions. SIAM J. Discrete Math.,
     3(2):168–177, 1990. [doi:10.1137/0403015] 28

 [5] A NTHONY C ARBERY AND JAMES W RIGHT: Distributional and Lq norm inequalities for
     polynomials over convex bodies in Rn . Mathematical Research Letters, 8(3):233–248, 2001.
     [doi:10.4310/MRL.2001.v8.n3.a1] 33, 42

 [6] I LIAS D IAKONIKOLAS , PARIKSHIT G OPALAN , R AGESH JAISWAL , ROCCO A. S ERVEDIO , AND
     E MANUELE V IOLA: Bounded independence fools halfspaces. SIAM J. Comput., 39(8):3441–3462,
     2010. Preliminary version in FOCS’09. See also at ECCC. [doi:10.1137/100783030] 28, 29, 30

                      T HEORY OF C OMPUTING, Volume 10 (2), 2014, pp. 27–53                              49
            I LIAS D IAKONIKOLAS , ROCCO A. S ERVEDIO , L I -YANG TAN , AND A NDREW WAN

 [7] I LIAS D IAKONIKOLAS , P RAHLADH H ARSHA , A DAM K LIVANS , R AGHU M EKA , P RASAD
     R AGHAVENDRA , ROCCO A. S ERVEDIO , AND L I -YANG TAN: Bounding the average sensitivity
     and noise sensitivity of polynomial threshold functions. In Proc. 42nd STOC, pp. 533–542. ACM
     Press, 2010. [doi:10.1145/1806689.1806763] 30

 [8] I LIAS D IAKONIKOLAS , DANIEL M. K ANE , AND J ELANI N ELSON: Bounded independence fools
     degree-2 threshold functions. In Proc. 51st FOCS, pp. 11–20. IEEE Comp. Soc. Press, 2010. See
     also at ECCC and arXiv. [doi:10.1109/FOCS.2010.8] 29, 30, 31, 49

 [9] I LIAS D IAKONIKOLAS , P RASAD R AGHAVENDRA , ROCCO S ERVEDIO , AND L I -YANG TAN:
     Average sensitivity and noise sensitivity of polynomial threshold functions. Technical report, 2009.
     To appear in SIAM J. Comput. [arXiv:0909.5011] 28, 29, 30, 33

[10] I LIAS D IAKONIKOLAS AND ROCCO A. S ERVEDIO: Improved approximation of linear thresh-
     old functions. Comput. Complexity, 22(3):623–677, 2013. Preliminary version in CCC’09.
     [doi:10.1007/s00037-012-0045-5] 28, 30

[11] I LIAS D IAKONIKOLAS , ROCCO A. S ERVEDIO , L I -YANG TAN , AND A NDREW WAN: A regularity
     lemma, and low-weight approximators, for low-degree polynomial threshold functions. In Proc.
     25th IEEE Conf. on Computational Complexity (CCC’10), pp. 211–222. IEEE Comp. Soc. Press,
     2010. [doi:10.1109/CCC.2010.28] 27, 29

[12] I RIT D INUR , E HUD F RIEDGUT, G UY K INDLER , AND RYAN O’D ONNELL: On the Fourier tails
     of bounded functions over the discrete cube. Israel J. Math., 160(1):389–412, 2007. Preliminary
     version in STOC’06. [doi:10.1007/s11856-007-0068-9] 33

[13] C RAIG G OTSMAN AND NATHAN L INIAL: Spectral properties of threshold functions. Combinator-
     ica, 14(1):35–50, 1994. [doi:10.1007/BF01305949] 28

[14] B EN G REEN: A Szemerédi-type regularity lemma in abelian groups, with applications. Geometric
     and Functional Analysis (GAFA), 15(2):340–376, 2005. See also at arXiv. [doi:10.1007/s00039-
     005-0509-8] 28

[15] L EONARD G ROSS: Logarithmic Sobolev inequalities. Amer. J. Math., 97(4):1061–1083, 1975.
     JSTOR. 32

[16] P RAHLADH H ARSHA , A DAM K LIVANS , AND R AGHU M EKA: Bounding the sensitivity of
     polynomial threshold functions. Theory of Computing, 10(1):1–26, 2014. See also at arXiv.
     [doi:10.4086/toc.2014.v010a001] 28, 30

[17] J EFF K AHN , G IL K ALAI , AND NATHAN L INIAL: The influence of variables on Boolean functions.
     In Proc. 29th FOCS, pp. 68–80. IEEE Comp. Soc. Press, 1988. [doi:10.1109/SFCS.1988.21923] 32

[18] G IL K ALAI AND S HMUEL S AFRA: Threshold phenomena and influence. In Computational
     Complexity and Statistical Physics, pp. 25–60. Oxford Univ. Press, 2006. Available at author’s
     home page. 28

                     T HEORY OF C OMPUTING, Volume 10 (2), 2014, pp. 27–53                            50
                  A R EGULARITY L EMMA FOR P OLYNOMIAL T HRESHOLD F UNCTIONS

[19] DANIEL M. K ANE: k-independent Gaussians fool polynomial threshold functions. In Proc. 26th
     IEEE Conf. on Computational Complexity (CCC’11), pp. 252–261. IEEE Comp. Soc. Press, 2011.
     See also at arXiv. [doi:10.1109/CCC.2011.13] 29, 30, 49

[20] DANIEL M. K ANE: A structure theorem for poorly anticoncentrated Gaussian chaoses and applica-
     tions to the study of polynomial threshold functions. In Proc. 53rd FOCS, pp. 91–100. IEEE Comp.
     Soc. Press, 2012. [doi:10.1109/FOCS.2012.52] 49

[21] DANIEL M. K ANE: The correct exponent for the Gotsman-Linial conjecture. In Proc. 28th
     IEEE Conf. on Computational Complexity (CCC’13), pp. 56–64, 2013. See also at arXiv.
     [doi:10.1109/CCC.2013.15] 29

[22] NATHAN L INIAL , Y ISHAY M ANSOUR , AND N OAM N ISAN: Constant depth circuits, Fourier
     transform, and learnability. J. ACM, 40(3):607–620, 1993. Preliminary version in FOCS’89.
     [doi:10.1145/174130.174138] 29, 34

[23] S HACHAR L OVETT, I DO B EN -E LIEZER , AND A RIEL YADIN: Polynomial threshold functions:
     Structure, approximation and pseudorandomness. Electron. Colloq. on Comput. Complexity (ECCC),
     16:118, 2009. ECCC. See also at arXiv. 30, 31

[24] K EVIN M ATULEF, RYAN O’D ONNELL , RONITT RUBINFELD , AND ROCCO A. S ERVEDIO: Testing
     halfspaces. SIAM J. Comput., 39(5):2004–2047, 2010. Preliminary version in SODA’09. See also at
     ECCC, and related paper in Property Testing. [doi:10.1137/070707890] 28

[25] R AGHU M EKA AND DAVID Z UCKERMAN: Pseudorandom generators for polynomial threshold
     functions. SIAM J. Comput., 42(3):1275–1301, 2013. Preliminary version in STOC’10. See also at
     arXiv. [doi:10.1137/100811623] 28, 29, 30, 31

[26] E LCHANAN M OSSEL , RYAN O’D ONNELL , AND K RZYSZTOF O LESZKIEWICZ: Noise stability
     of functions with low influences: invariance and optimality. Ann. of Math., 171(1):295–341, 2010.
     Preliminary version in FOCS’05. [doi:10.4007/annals.2010.171.295] 28, 33, 42

[27] RYAN O’D ONNELL: Analysis of Boolean Functions. Cambridge Univ. Press, 2014. See online
     version here. 28

[28] RYAN O’D ONNELL AND ROCCO A. S ERVEDIO: Extremal properties of polynomial thresh-
     old functions. J. Comput. System Sci., 74(3):298–312, 2008. Preliminary version in CCC’03.
     [doi:10.1016/j.jcss.2007.06.021] 28

[29] RYAN O’D ONNELL AND ROCCO A. S ERVEDIO: New degree bounds for polynomial threshold func-
     tions. Combinatorica, 30(3):327–358, 2010. Preliminary version in STOC’03. [doi:10.1007/s00493-
     010-2173-3] 28

[30] RYAN O’D ONNELL AND ROCCO A. S ERVEDIO: The Chow parameters problem. SIAM J. Comput.,
     40(1):165–199, 2011. Preliminary version in STOC’08. [doi:10.1137/090756466] 28, 29, 30, 49

                     T HEORY OF C OMPUTING, Volume 10 (2), 2014, pp. 27–53                         51
           I LIAS D IAKONIKOLAS , ROCCO A. S ERVEDIO , L I -YANG TAN , AND A NDREW WAN

[31] V LADIMIR V. P ODOLSKII: Perceptrons of large weight. Problems of Information Transmission,
     45(1):46–53, 2009. Preliminary version in CSR’07. [doi:10.1134/S0032946009010062] 29

[32] M ICHAEL S AKS: Slicing the hypercube. In K EITH WALKER, editor, Surveys in Combinatorics
     1993, pp. 211–256. Cambridge Univ. Press, 1993. [doi:10.1017/CBO9780511662089.009] 28

[33] ROCCO A. S ERVEDIO: Every linear threshold function has a low-weight approximator. Comput.
     Complexity, 16(2):180–209, 2007. Preliminary version in CCC’06. [doi:10.1007/s00037-007-0228-
     7] 28, 29, 30, 42

[34] A LEXANDER A. S HERSTOV: The intersection of two halfspaces has high threshold degree.
     In Proc. 50th FOCS, pp. 343–362. IEEE Comp. Soc. Press, 2009. See also at ECCC.
     [doi:10.1109/FOCS.2009.18] 28

[35] E NDRE S ZEMERÉDI: Regular partitions of graphs. Colloq. Internat. CNRS: Problèmes combina-
     toires et théorie des graphes, 260:399–401, 1978. 28

[36] T ERENCE TAO: Structure and randomness in combinatorics. In Proc. 48th FOCS, pp. 3–15. IEEE
     Comp. Soc. Press, 2007. [doi:10.1109/FOCS.2007.17] 28

[37] RONALD DE W OLF: A Brief Introduction to Fourier Analysis on the Boolean Cube. Number 1 in
     Graduate Surveys. Theory of Computing Library, 2008. [doi:10.4086/toc.gs.2008.001] 32


AUTHORS

     Ilias Diakonikolas
     School of Informatics
     University of Edinburgh
     ilias.d ed.ac.uk
     http://www.iliasdiakonikolas.org


     Rocco A. Servedio
     Department of Computer Science
     Columbia University
     rocco cs.columbia.edu
     http://www.cs.columbia.edu/~rocco


     Li-Yang Tan
     Department of Computer Science
     Columbia University
     liyang cs.columbia.edu
     http://www.cs.columbia.edu/~liyang


                    T HEORY OF C OMPUTING, Volume 10 (2), 2014, pp. 27–53                      52
                A R EGULARITY L EMMA FOR P OLYNOMIAL T HRESHOLD F UNCTIONS

    Andrew Wan
    Simons Institute for the Theory of Computing
    University of California, Berkeley
    atw12 eecs.berkeley.edu
    http://people.seas.harvard.edu/~atw12


ABOUT THE AUTHORS

    I LIAS D IAKONIKOLAS is an assistant professor in the School of Informatics at the University
        of Edinburgh. He obtained his Ph. D. from Columbia University advised by Mihalis
        Yannakakis. He is interested in algorithms, learning, statistics, and game theory.


    ROCCO S ERVEDIO is an associate professor in the Department of Computer Science at
      Columbia University. He graduated from Harvard University, where his Ph. D. was su-
      pervised by Les Valiant. He is interested in computational learning theory, computational
      complexity, and property testing, with the study of Boolean functions as an underlying
      theme tying these topics together. He enjoys spending time with his family and hopes to
      share a bottle of Cutty Sark with H.M. in the afterlife.


    L I -YANG TAN is a Ph. D. student at Columbia University where he is fortunate to be
         advised by Rocco Servedio. His research interests lie in complexity theory, with a focus
         on discrete Fourier analysis, the complexity of Boolean functions, and computational
         learning theory.


    A NDREW WAN is currently a research fellow at the Simons Institute for the Theory of
       Computing at UC Berkeley. Before this appointment, he was an assistant professor
       at IIIS, Tsinghua University. He received his doctorate from Columbia University in
       2010 under the supervision of Tal Malkin and Rocco Servedio. His interests include
       complexity theory, cryptography, and computational learning theory. Before graduate
       school, he was a student of philosophy at Columbia University and enjoyed playing the
       piano, the trumpet, and the accordion. Although he still enjoys playing music, the PAC
       model rarely affords him the time.




                    T HEORY OF C OMPUTING, Volume 10 (2), 2014, pp. 27–53                           53