Authors Anindya De, Elchanan Mossel, Joe Neeman,
License CC-BY-3.0
THEORY OF COMPUTING, Volume 15 (6), 2019, pp. 1–47
www.theoryofcomputing.org
SPECIAL ISSUE: CCC 2017
Noise Stability is Computable and
Approximately Low-Dimensional
Anindya De∗ Elchanan Mossel† Joe Neeman‡
Received September 13, 2017; Revised July 7, 2019; Published October 11, 2019
Abstract: The notion of Gaussian noise stability plays an important role in hardness
of approximation in theoretical computer science as well as in the theory of voting. The
Gaussian noise stability of a partition of Rn is simply the probability that two correlated
Gaussian vectors both fall into the same part. In many applications, the goal is to find an
optimizer of noise stability among all possible partitions of Rn to k parts with given Gaussian
measures µ1 , . . . , µk . We call a partition ε-optimal, if its noise stability is optimal up to an
additive ε. In this paper, we give a computable function n(ε) such that an ε-optimal partition
exists in Rn(ε) . This result has implications for the computability of certain problems in
non-interactive simulation, which are addressed in a subsequent paper.
A conference version of this paper appeared in the Proceedings of the 32nd Computational Complexity Conference, 2017.
∗ Supported by NSF grant CCF 1926872 (transferred from CCF 1814706). Most of this work was done while the author was on the faculty at Northwestern University.
† Supported by DMS-1737944 and ONR N00014-16-1-2227, N00014-17-1-2598.
‡ Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy GZ 2047/1, Projekt-ID 390685813, and a Sloan Fellowship.
ACM Classification: G.3, G.1.6
AMS Classification: 60
Key words and phrases: noise stability, Gaussian surface area, computability
© 2019 Anindya De, Elchanan Mossel, and Joe Neeman
Licensed under a Creative Commons Attribution License (CC-BY) DOI: 10.4086/toc.2019.v015a006
1 Introduction
Isoperimetric theory and noise stability. Isoperimetric problems have been studied in mathematics
since antiquity, and they have become central to mathematics and theoretical computer science. Our
interest in this work is in a modern version of the isoperimetric problem: that of maximizing Gaussian
noise stability. This problem was originally studied by Borell in the context of stochastic processes [3].
Benjamini, Kalai, and Schramm [2] introduced Boolean noise stability (both the notion and the terminol-
ogy) to computer science: roughly speaking, a Boolean function—that is, a function {0, 1}n → {0, 1}—is
noise sensitive if slightly rerandomizing the inputs has the effect of completely rerandomizing the output.
This notion naturally extends to functions {0, 1}n → {0, . . . , k − 1}, or, equivalently, to partitions of
{0, 1}n into k parts; this extension and its applications in theoretical computer science are discussed
in [17].
Boolean noise stability has a natural Gaussian analogue, in which the Boolean hypercube {0, 1}n is
replaced by Rn , and the Boolean noise is replaced by Gaussian noise. The invariance principle of [25]
shows a deep connection between these two notions of noise stability. In particular, it allows one to
translate results about Gaussian noise stability into results about Boolean noise stability, which are
typically harder. This connection is responsible for many results in hardness of approximation [21]. For
more on the applications of noise stability in the theory of computing, see the book by O’Donnell [29].
We equip Rn with the standard n-dimensional Gaussian measure, denoted by γn . In particular, for any
x ∈ Rn , the density γn (x) is given by
γn(x) = (2π)^{−n/2} · e^{−‖x‖₂²/2}.
The Ornstein-Uhlenbeck operator Pt is defined for t ∈ [0, ∞) and bounded, measurable f : Rn → R by
(Pt f)(x) = ∫_{y∈Rn} f(e^{−t} · x + √(1 − e^{−2t}) · y) dγn(y).
We define the (Gaussian) noise stability of f by Stabt ( f ) := E[ f · Pt f + (1 − f ) · (1 − Pt f )], where the
expectation is with respect to γn . If f denotes the indicator function of a set (call it A), then Stabt ( f ) is
the probability that two e−t -correlated Gaussian random variables fall either both in A or both in Ac . In
the limit t → 0+ , noise stability is related to the Gaussian surface area of the set A [23].
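As a quick numerical illustration (ours, not part of the paper's argument), Stabt(f) for an indicator f can be estimated by directly sampling e^{−t}-correlated Gaussian pairs; for a halfspace of Gaussian measure 1/2, the answer is also available in closed form via Sheppard's formula, 1 − arccos(e^{−t})/π.

```python
import numpy as np

def noise_stability_mc(indicator, t, n, samples=400_000, seed=0):
    """Monte Carlo estimate of Stab_t(f) = Pr[f(X) = f(Y)], where
    (X, Y) is a pair of e^{-t}-correlated standard Gaussians in R^n."""
    rng = np.random.default_rng(seed)
    rho = np.exp(-t)
    x = rng.standard_normal((samples, n))
    z = rng.standard_normal((samples, n))
    # Y = e^{-t} X + sqrt(1 - e^{-2t}) Z has the right joint law with X.
    y = rho * x + np.sqrt(1.0 - rho ** 2) * z
    return float(np.mean(indicator(x) == indicator(y)))

# Halfspace of Gaussian measure 1/2: A = {x : x_1 <= 0}.
halfspace = lambda pts: pts[:, 0] <= 0
t = 0.5
estimate = noise_stability_mc(halfspace, t, n=3)
exact = 1.0 - np.arccos(np.exp(-t)) / np.pi  # Sheppard's formula
```

With a few hundred thousand samples, the estimate matches Sheppard's closed form to roughly three decimal places.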
Halfspaces are the most stable sets. Since Borell’s work in the 1980s [3], halfspaces have been known
to maximize noise stability subject to a measure constraint. We will state this fact in a slightly convoluted
way, in order to more easily generalize it. Let ∆k denote the standard simplex in Rk (i. e., the convex hull
of the standard unit vectors {e1 , . . . , ek }). Using [k] to denote {1, . . . , k}, we observe that any function
with range [k] naturally embeds in ∆k by identifying i ∈ [k] with ei . Moreover, we extend the Ornstein-
Uhlenbeck operator to act on vector-valued functions in the obvious way: if f = ( f1 , . . . , fk ) : Rn → Rk
then Pt f = (Pt f1, . . . , Pt fk). Finally, we say that f = (f1, f2) : Rn → ∆2 is a halfspace if there exist a, b ∈ Rn such that f1(x) = 1{⟨x − a, b⟩ ≤ 0} for all x ∈ Rn. (Here, ⟨v, w⟩ denotes the Euclidean inner product ∑_{i=1}^n vi · wi.)
Theorem 1.1 (Borell). For any g : Rn → ∆2 and any t ≥ 0, every halfspace f : Rn → ∆2 with E[f] = E[g] satisfies
E[⟨f, Pt f⟩] ≥ E[⟨g, Pt g⟩].
Theorem 1.1 has many applications in computational complexity. Most famously, it can be com-
bined with an invariance principle [25] in order to prove a number of tight Unique-Games-hardness of
approximation results [20].
More parts? It is straightforward to extend the notion of noise stability to partitions with more than
two parts. Namely, a partition of Rn can be described by f : Rn → [k]. Identifying [k] with {e1 , . . . , ek } as
above, we define the noise stability of such a partition by E[⟨f, Pt f⟩]. One may then ask for an analogue
of Theorem 1.1 where ∆2 is replaced by ∆k for some k ≥ 3.
Even the three-part version of Theorem 1.1 turns out to be significantly harder than the two-part
one. We will try to explain why, by analogy with isoperimetric problems. First of all, the Euclidean,
isoperimetric analogue for k = 3 is known as the “double bubble” problem. The well-known “double
bubble conjecture” states that the minimum total surface area of two bodies separating and enclosing two
given (Lebesgue) volumes is achieved by two spheres meeting at 120◦ . After being open for more than a
century, this problem was settled rather recently [15, 16, 32, 31]. For the Gaussian space, [5] showed that
for some small positive constant c > 0, the Gaussian surface area of three-partitions is minimized by the
“standard simplex partition” as long as the measure of each part is within 1/3 ± c.
Since halfspaces both maximize noise stability and minimize Gaussian surface area, and since the
standard simplex partition is known to minimize multi-part Gaussian surface area in certain cases, it
seems natural to guess that the standard simplex partition also maximizes multi-part noise stability. This
was explicitly conjectured in [20], in the special case that all of the parts in the partition have equal
measure. However, a somewhat surprising recent result [14] showed that the standard simplex partition
fails to maximize multi-part noise stability unless all of the parts have equal measure. On the other hand,
there is also some support for the conjecture in the equal-measure case: Heilman [13] showed that the
conjecture is true if the noise rate t is larger than some explicit function of the ambient dimension n.
Approximate noise stability of multipartitions? In light of the uncertainty about optimal partitioning
for k ≥ 3, one can ask a more modest question. Given k ≥ 3, t > 0, and prescribed measures v ∈ ∆k
for the k parts, let αn = αn (k, v) be the maximal noise stability that can be obtained in Rn under these
constraints. Since the Gaussian measure is a product measure, αn is clearly non-decreasing in n. Since it
is bounded above by one, it has a limit as n → ∞.
Our main result is that there is a computable n0 = n0 (k,t, ε) such that αn0 ≥ αm − ε for all m ∈ N.
Although the bound on n0 that we give is not particularly good (in fact, it is not primitive recursive),
the key point is that it is computable. As a consequence, up to error ε, the noise-stable partition is also
computable. We conclude the introduction by an open question:
Question 1.2. Fix k ≥ 3 and measure constraints v ∈ ∆k . Does there exist n0 such that αn0 = αn for all
n ≥ n0 ?
Our current techniques are not suitable for addressing the question above.
1.1 Acknowledgments
The authors started this research at the workshop on “Information theory and concentration phenomena”
at the IMA, Minneapolis in 2015. The authors would like to thank the organizers of this workshop and
the IMA for their hospitality.
2 Main theorem and overview of proof technique
In order to state the main theorem, we first need to recall the notion of a polynomial threshold function. A
function f : Rn → {0, 1} is said to be a degree-d PTF if there exists a polynomial p : Rn → R of degree d
such that f (x) = 1 if and only if p(x) > 0. We will need a k-ary generalization of this definition. We note
that there are several possible ways to generalize the notion of PTFs to k-ary PTFs and our particular
choice is dictated by the convenience of using the relevant results from [9].
Definition 2.1. A function f : Rn → [k] is said to be a multivariate PTF if there exist polynomials
p1 , . . . , pk : Rn → R such that
f(x) = j if pj(x) > 0 and pi(x) ≤ 0 for all i ≠ j, and f(x) = 1 otherwise.
In this case, we denote f = PTF(p1 , . . . , pk ). Further, f is said to be a degree-d multivariate PTF if
p1 , . . . , pk are degree-d polynomials.
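To make Definition 2.1 concrete, the following small sketch (ours, with toy degree-1 polynomials) evaluates a multivariate PTF; note that the "otherwise" clause sends every point where zero or several polynomials are strictly positive to the label 1.

```python
def multivariate_ptf(polys):
    """Build f = PTF(p_1, ..., p_k) as in Definition 2.1: f(x) = j when
    p_j(x) > 0 and p_i(x) <= 0 for every i != j; otherwise f(x) = 1.
    Labels are 1-based, matching [k] = {1, ..., k}."""
    def f(x):
        positive = [j for j, p in enumerate(polys) if p(x) > 0]
        if len(positive) == 1:
            return positive[0] + 1
        return 1  # zero or >= 2 strictly positive polynomials: a collision
    return f

# Toy degree-1 example on R with k = 3.
f = multivariate_ptf([
    lambda x: x[0],        # p_1: positive on x > 0
    lambda x: -x[0],       # p_2: positive on x < 0
    lambda x: x[0] - 5.0,  # p_3: positive on x > 5 (overlaps p_1)
])
```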
We now state the main theorem of this paper. We set the convention that, unless explicitly mentioned
otherwise, the underlying distribution is γn , the standard n-dimensional Gaussian measure. Likewise,
given any measurable function f : Rn → Rk , E[ f ] denotes its (vector-valued) expectation (with respect to
γn ) and Var( f ) denotes its covariance matrix. For a vector v ∈ Rk and a number p ≥ 1, kvk p denotes the
` p norm, (∑i |vi | p )1/p .
Theorem 2.2. There exist computable functions n0 = n0(t, k, ε) and d = d(t, k, ε) such that the following holds for all k ∈ N and all t, ε > 0. Let f : Rn → [k] be such that E[f] = µ ∈ Rk. Then there is a degree-d PTF g : Rn0 → [k] such that
1. ‖E[f] − E[g]‖₁ ≤ ε,
2. E[⟨g, Pt g⟩] ≥ E[⟨f, Pt f⟩] − ε.
We note that while the functions n0 (·, ·, ·) and d(·, ·, ·) are computable, n0 is not primitive recursive.
Note that the above theorem automatically implies that a function g satisfying the above properties
can be computed up to some additional error ε. This is because the set of degree-d PTFs on Rn0 has a
finite ε-cover (whose elements can be computed).
Remark 2.3. As explained above, we may interpret the expression E[⟨f, Pt f⟩] in Theorem 1.1 in terms of Gaussian random variables taking values in Rn. The correlation of these random variables is e−t (which
is always positive), but in many applications it is useful to allow for negative correlations as well. For example, Theorem 1.1 remains true with negative correlations, but with the inequality reversed.
We do not know how to prove the analogue of Theorem 2.2 in the case of negative correlations,
although we do discuss some related results in the follow-up paper [8]. The reason for this failure is that a
certain trick which works in the k = 2 setting fails for k ≥ 3: suppose that instead of maximizing E[⟨f, Pt f⟩], we attempt to minimize E[⟨f1, Pt f2⟩], where f1 and f2 are not required to be the same function. In the
setting of Theorem 1.1, the minimizers miraculously satisfy f1 (x) = f2 (−x) for all x, and it follows that
f1 minimizes the noise stability for the negative correlation −e−t . As far as we know, this miracle does
not occur for k ≥ 3.
Proof sketch. The proof of Theorem 2.2 consists of the following main steps:
From general partitions to PTF. The first step in the proof is to show that given any f : Rn → [k],
there is a multivariate PTF g0 : Rn → [k] which meets the two criteria in Theorem 2.2 and has degree
d = d(t, k, ε), for some computable function d(t, k, ε). In other words, g0 satisfies ‖E[f] − E[g0]‖₁ ≤ ε and Stabt(g0) ≥ Stabt(f) − ε. This is done in Section 6. Note that the main difference between the desired conclusion of Theorem 2.2 and what is accomplished in this step is that the ambient dimension remains n as opposed to a bounded dimension n0.
Why is this true? The basic intuition is that if f is noise stable then it should have most of its Hermite
expansion weight at low degree. Therefore we should be able to replace f with the PTF where the
polynomial is the truncated expansion of f. There are a number of challenges in formalizing this intuition: 1. we cannot rule out that a positive fraction of the weight of f is at high degrees (perhaps as large as n); 2. it is not clear that the PTF obtained this way is noise stable; and 3. it is not clear that it has the right expected value.
Our analysis proceeds as follows. We would like to construct g0 from f by “rounding” Pδ f for some
small δ > 0. The advantage of Pδ f over f is that Pδ f is guaranteed to have Hermite coefficients that
decay at a certain rate. The rounding of Pδ f can be performed given some a ∈ Rk by considering the function ga : Rn → [k] which takes the value i whenever the i-th coordinate of Pδ f − a is the largest. It is not
hard to prove that it is possible to choose a such that E[ga ] = E[ f ]; moreover, one can show that this
function ga has better noise stability than f does. The main obstacle is that the function ga is not a
PTF. Unfortunately the Hermite decay of Pδ f does not translate to Hermite decay of ga . Instead we use
a randomized construction to show that for most a’s, ga has Hermite decay and can therefore be well
approximated by a PTF. The analysis of this construction uses the co-area formula [11] and gradient
bounds [23] in the Gaussian space and draws on ideas from [26, 22].
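The argmax rounding described above can be sketched as follows (our own illustration with hypothetical names; F stands in for x ↦ (Pδ f)(x), and in the actual argument the shift a is tuned so that E[ga] = E[f]).

```python
import numpy as np

def round_to_partition(F, a):
    """The rounding g_a from the proof sketch: g_a(x) is the index of the
    largest coordinate of F(x) - a, where F(x) plays the role of
    (P_delta f)(x) in R^k.  Labels are 1-based."""
    def g(x):
        return int(np.argmax(F(x) - a)) + 1
    return g

# Toy stand-in for P_delta f on R with k = 3 (not an actual smoothed f).
F = lambda x: np.array([x[0], 0.0, -x[0]])
g = round_to_partition(F, np.zeros(3))
```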
From PTF in dimension n to a small PTF of bounded-degree polynomials. Given the function
g0 : Rn → [k] of degree d = d(t, k, ε), our next goal is to show that it is possible to obtain a PTF g on
some n0 = n0(d, k, ε) variables such that (i) ‖E[g] − E[g0]‖₁ ≤ ε and (ii) |E[⟨g, Pt g⟩] − E[⟨g0, Pt g0⟩]| ≤ ε. This
part builds on and extends the theory and results of [9]. The key notion introduced in [9] is that of an
eigenregular polynomial. Namely, a polynomial p is said to be δ-eigenregular if, for the canonical tensor Ap associated with p, the ratio of the maximum singular value to the Frobenius norm is at most δ (the tensor notions are explicitly defined later).
The key advantage of this definition is that as shown in [9], when δ → 0, the distribution of p (under
γn ) converges to a normal. In other words, eigenregular polynomials obey a central limit theorem. In fact,
given k polynomials p1 , . . . , pk which are δ -eigenregular, they also obey a multidimensional central limit
theorem.
The regularity lemma from [9] implies that the polynomials p1 , . . . , pk can be jointly expressed as
bounded-size polynomials in eigenregular homogeneous polynomials. More precisely, for any δ > 0
there is a collection of “inner” polynomials {In(ps,q,ℓ)} for 1 ≤ s ≤ k, 1 ≤ q ≤ d and 1 ≤ ℓ ≤ num(s, q)
(where num(·, ·) is an explicitly defined function), and a collection of “outer” polynomials {Out(ps )} for
1 ≤ s ≤ k such that
ps = Out(ps)({In(ps,q,ℓ)}_{q∈[d], ℓ∈[num(s,q)]}).
Moreover, every inner polynomial is δ -eigenregular and homogeneous, and each outer polynomial
Out(ps) has arity num(s) = ∑_{q=1}^{d} num(s, q) bounded by a computable function of d, k, and δ.
In [9], the regularity lemma was used to conclude that the joint distribution of p1 , . . . , pk can be
approximated in a bounded dimension as we can replace each of the inner polynomials by a one
dimensional Gaussian. For our application things are more delicate, as we are not only interested in the
joint distribution of p1 , . . . , pk but also in the noise stability of p1 , . . . , pk . For this reason it is important
for us to maintain the degrees of the inner polynomials (each of which is homogeneous) and not replace
them with Gaussians.
A small PTF representation. In the final step of the proof, we maintain Out(ps ) and show how
the polynomials {In(ps,q,` )} can be replaced by a collection of polynomials {In(rs,q,` )} in bounded
dimensions (i. e., bounded in terms of d, k and ε). The fact that a collection of homogeneous polynomials can be replaced by polynomials in bounded dimensions is a tensor analogue of the fact that for any k
vectors in Rn , there exist k vectors in Rk with the same matrix of inner products. Once such polynomials
are found, it is not hard to construct eigenregular polynomials from them by averaging the polynomials
over independent copies of random variables.
3 Applications
Given the wide applicability of Borell’s isoperimetric result to combinatorics and theoretical computer
science, we believe that Theorem 2.2 will also be widely applicable. We will now point out some
applications of this theorem. First, by combining Theorem 2.2 with the invariance principle [25], we
derive a weak k-ary analogue of “Majority is Stablest.” The analogue of the Ornstein-Uhlenbeck operator
is the so-called Bonami operator (also referred to as the Bonami-Beckner operator) defined as follows: for
ρ ∈ [−1, 1] and x ∈ {−1, 1}n , let Dρ (x) be the product distribution over {−1, 1}n such that for y ∼ Dρ (x),
for all 1 ≤ i ≤ n, E[xi yi ] = ρ. Then, for any f : {−1, 1}n → Rk ,
Tρ f (x) = Ey∼Dρ (x) [ f (y)].
Likewise, for any i ∈ [n] and z ∈ {−1, 1}^{n−1}, let fz,−i : {−1, 1} → Rk denote the function obtained by restricting all but the i-th coordinate to z. Then Var(fz,−i) = Ex∈{−1,1}[‖fz,−i(x) − E[fz,−i]‖₂²]. Define
the influence of the i-th coordinate on f by
Infi(f) = E_{z∈{−1,1}^{n−1}}[Var(fz,−i)].
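For intuition, the influence defined above can be computed exhaustively for small n; the sketch below (ours) embeds [k] into Rk via j ↦ ej as in Section 3. For a dictator function, only the dictating coordinate has nonzero influence.

```python
import itertools
import numpy as np

def influence(f, n, k, i):
    """Inf_i(f) = E_z[Var(f_{z,-i})] for f: {-1,1}^n -> [k], where f is
    embedded in R^k via j -> e_j.  Exhaustive sum over the hypercube."""
    e = np.eye(k)
    total = 0.0
    for z in itertools.product((-1, 1), repeat=n - 1):
        # Restrict all but coordinate i to z; vary coordinate i.
        vals = np.array([e[f(z[:i] + (xi,) + z[i:]) - 1] for xi in (-1, 1)])
        mean = vals.mean(axis=0)
        total += ((vals - mean) ** 2).sum(axis=1).mean()
    return total / 2 ** (n - 1)

# Dictator on 3 bits: output 1 if x_1 = +1, else 2 (so k = 2).
dictator = lambda x: 1 if x[0] == 1 else 2
```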
Theorem 3.1. There exist computable functions n0 = n0 (k, ε, ρ) and C = C(k, ε, ρ) such that the
following holds. Given any k ∈ N and ρ ∈ [0, 1], ε > 0 and µ = (µ1 , . . . , µk ) ∈ ∆k and n ≥ n0 , there is a
function g = gµ,ρ,ε : {−1, 1}n → [k] such that
1. maxi Infi(g) ≤ C/√n;
2. |Pr_{x∈{−1,1}^n}[g(x) = i] − µi| ≤ C/√n for each i ∈ [k]; and
3. for any f : {−1, 1}n → [k] satisfying | Prx∈{−1,1}n [ f (x) = i] − µi | ≤ ε and maxi Inf i ( f ) ≤ ε,
E[⟨f, Tρ f⟩] ≤ E[⟨g, Tρ g⟩] + 2kε.
Further, the function g can be computed given the parameters µ, ρ and ε.
In particular, the case where (µ1, . . . , µk) = (1/k, . . . , 1/k) implies that in a tied election between k alternatives, we can find an ε-optimally robust noise-stable voting rule, where the stability is with respect to each voter independently rerandomizing their vote with probability 1 − ρ. Note that in the
actual “Majority is Stablest” theorem, the function g is known explicitly (it is the “majority” function),
whereas in Theorem 3.1 we only know that g is computable. An anonymous referee suggested that we
call Theorem 3.1 a “Something is Stablest” theorem.
The proof of the theorem, which is omitted, is by now a standard reduction from a Gaussian noise
stability result to a discrete one [25, 17, 6]. In one direction, the invariance principle immediately bounds
the discrete stability of any partition by the Gaussian stability. In the other direction, starting from
an ε-optimal Gaussian partition, each Gaussian can be replaced by a normalized sum of independent
variables to obtain a discrete partition with nearly the same stability. The same result holds for other
domains, for example for f , g : [k]n → [k].
3.1 Relationship to rounding of SDPs
We do not know of any connection between our results and those of Raghavendra and Steurer [30] who
showed that for any CSP, there is a rounding algorithm that is optimal up to ε, whose running time
is polynomial in the instance size and doubly exponential in 1/ε. It is natural to suspect that the two
results are related, since there are well-known connections between SDP rounding and Gaussian noise
stability. However, the usual analysis relating rounding to Gaussian noise stability seems to require that
halfspaces maximize noise stability for all possible values of the noise. In our results, this property does
not necessarily hold.
In the other direction, it is tempting to try to cast the noise stability problem as an optimization problem on a “Gaussian graph”: divide Rn into tiny pieces A1, . . . , Am, and consider the weighted graph with vertices 1, . . . , m and an edge between vertices i and j of weight E[1Ai · Pt 1Aj] (where 1A denotes the
indicator function of the set A). If the partition is fine enough, maximizing noise stability is approximately
equivalent to finding an appropriate optimal partition of this graph, and so one might hope to apply the
results of Steurer and Raghavendra to obtain computable bounds on the dimension where an almost
optimal solution can be achieved. It is hard to implement this approach for two reasons: first, we do not
know the SDP solution for the Gaussian graph; second, we are interested in the optimal solution and it is
not clear what is the relation between the best integral solution and the SDP solution for the Gaussian
graph.
While we do not see how to formally relate the two works, connecting the two (if possible) would be
likely to yield important insights.
3.2 Non-interactive correlation distillation
Next, we discuss a basic problem in information theory and communication complexity which was
recently considered in the work of Ghazi, Kamath and Sudan [12]. Let there be two non-communicating
players Alice and Bob who have access to independent samples from a joint distribution P on the set
A × B. In other words, Alice (resp. Bob) has access to (x1, x2, . . .) (resp. (y1, y2, . . .)) such that xi ∈ A, yi ∈ B, each pair (xi, yi) is distributed according to P, and the pairs {(xi, yi)}i∈N are mutually independent (as i varies). Let µ = (µ1, . . . , µk) ∈ ∆k and ν = (ν1, . . . , νk) ∈ ∆k. What is the
maximum κ ∈ [0, 1] such that Alice and Bob can non-interactively jointly sample a distribution Q on
[k] × [k] such that the marginal distributions of Alice's and Bob's outputs are µ and ν, respectively, and they
sample the same output with probability κ?
To formulate this problem more precisely, we introduce some notation. Let xn = (x1, . . . , xn) and yn = (y1, . . . , yn), where each pair (xi, yi) is drawn from P, independently with respect to i. Now, note that a
non-interactive protocol for Alice and Bob is equivalent to a pair ( f , g) where f : An → [k] and g : Bn → [k]
(for some n ∈ N). In this terminology, the question now becomes the following: given µ, ν, do there exist
n and f : An → [k] and g : Bn → [k] such that f (xn ) ∼ µ, g(yn ) ∼ ν, and Pr( f (xn ) = g(yn )) = κ? (Here,
f (xn ) ∼ µ means that µ is the law of f (xn ).)
Before we state the main result of [12] and our extension, we consider a motivating example. Let
A = B = R and let P be the law of two ρ-correlated standard Gaussians. Let k = 2 and µ = ν. Then,
Borell’s isoperimetric theorem (Theorem 1.1) states that the maximum achievable κ is given by
κ = Pr_{x,y}[f(x) = f(y)],
where f : R → ∆2 is a halfspace with measure µ. Thus, in the above case, n = 1 suffices and f = g is the
halfspace whose measure is µ. We now state the main result of [12]. For a probability distribution P, we
let |P| denote the size of some standard encoding of P.
Theorem 3.2 (Ghazi-Kamath-Sudan). Let (A × B, P) be a probability space, fix δ > 0, and let xn and
yn be as above. There is an algorithm running in time O_{|P|,δ}(1) such that given µ and ν in ∆2 and a
parameter κ ∈ [0, 1], it distinguishes between the following two cases:
1. There exist n ∈ N, f : An → {0, 1}, and g : Bn → {0, 1} such that f (xn ) ∼ µ, g(yn ) ∼ ν, and
Pr( f (xn ) = g(yn )) ≥ κ − δ . Moreover, there is a computable function n0 such that n ≤ n0 (P, δ ).
Similarly, f and g can also be computed given P, µ, ν and δ as inputs.
2. For any n ∈ N and f : An → {0, 1} and g : Bn → {0, 1} that satisfy f (xn ) ∼ µ and g(yn ) ∼ ν,
Pr( f (xn ) = g(yn )) ≤ κ − 8δ .
In other words, the above theorem states that there is an algorithm which, given P, target marginals
µ, ν, and correlation κ, can distinguish between two cases: (a) In the first case, it is possible for Alice
and Bob to non-interactively simulate a distribution R which has the correct marginals and achieves the
correlation κ up to δ . In this case, there is a computable bound on the number of copies of P required and
the algorithm also outputs the functions f , g used for the non-interactive simulation. (b) In the second
case, no non-interactive protocol between Alice and Bob can simulate a distribution R such that it has the
correct marginals and the correlation is more than κ − 8δ .
We remark that the requirement for the marginals to match exactly is unimportant. Indeed, any
protocol where the marginals match approximately can be “fixed” to have exact marginals, with a small
loss in the correlation.
The main restriction of Theorem 3.2 is that the output of the non-interactive protocol is a pair of bits.
Theorem 2.2 immediately implies the following modification of Theorem 3.2: The distribution P is a
Gaussian measure and {0, 1} is replaced by [k]. In the best of both worlds, we would be able to replace
{0, 1} by [k] without adding the assumption that P is a Gaussian measure. Using the methods we develop
here, this also turns out to be possible; this is done in a follow-up work [8].
4 Preliminaries
We will start by defining some technical preliminaries which will be useful for the rest of the paper.
Definition 4.1. For k ∈ N and 1 ≤ i ≤ k, let ei ∈ Rk be the unit vector along coordinate i and let ∆k be
the convex hull formed by {ei }1≤i≤k .
A function f : Rn → Rk is said to be in L2(γn) if ∫_{Rn} ‖f(x)‖₂² · γn(x) dx is finite. In this paper, unless
explicitly mentioned otherwise, (a) the domain is always equipped with the standard Gaussian measure
γn and (b) all functions are in L2 (γn ). A key property of such functions is that they admit the so-called
Hermite expansion. To define this notion, let us first define a family of polynomials Hq : R → R (for
q ≥ 0) as
H0(x) = 1;  H1(x) = x;  Hq(x) = ((−1)^q / √(q!)) · e^{x²/2} · (d^q/dx^q) e^{−x²/2}.
This family of polynomials is referred to as the Hermite polynomials. Let Z∗ denote the set of non-negative integers. For S ∈ Z∗^n, define HS : Rn → R as
HS(z) = ∏_{i=1}^{n} H_{Si}(zi).
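The one-dimensional family Hq satisfies the standard three-term recurrence He_{q+1}(x) = x·He_q(x) − q·He_{q−1}(x) followed by the normalization Hq = He_q/√(q!); as a numerical sanity check (ours, not from the paper), the sketch below generates Hq this way and verifies orthonormality under γ1 by probabilists' Gauss-Hermite quadrature.

```python
import math
import numpy as np

def hermite(q, x):
    """Normalized probabilists' Hermite polynomial H_q = He_q / sqrt(q!),
    computed via He_{q+1}(x) = x * He_q(x) - q * He_{q-1}(x)."""
    x = np.asarray(x, dtype=float)
    if q == 0:
        return np.ones_like(x)
    he_prev, he = np.ones_like(x), x.copy()
    for j in range(1, q):
        he_prev, he = he, x * he - j * he_prev
    return he / math.sqrt(math.factorial(q))

# Quadrature nodes/weights for the weight e^{-x^2/2}; dividing by
# sqrt(2*pi) turns the quadrature sum into an expectation under gamma_1.
nodes, weights = np.polynomial.hermite_e.hermegauss(30)
inner = lambda a, b: float(
    (weights * hermite(a, nodes) * hermite(b, nodes)).sum() / math.sqrt(2 * math.pi)
)
```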
It is well known that the set {HS }S∈Z∗n forms an orthonormal basis for L2 (γn ). In other words, every
f ∈ L2 (γn ) may be written as
f = ∑_{S∈Z∗^n} f̂(S) · HS,
where the sum converges with respect to L2(γn). We call f̂(S) = (f̂1(S), . . . , f̂k(S)) ∈ Rk the Hermite coefficients,
and we refer to the formula above as the Hermite expansion of f . Because the HS form an orthonormal
basis, we obtain Parseval’s identity:
∫_{Rn} ‖f(x)‖₂² γn(x) dx = ∑_{S∈Z∗^n} ‖f̂(S)‖₂².  (4.1)
Finally, for f : Rn → Rk and d ∈ N, we define the degree-d truncation of f by “throwing away” all
Hermite coefficients at levels higher than d:
f≤d(x) = ∑_{S:|S|≤d} f̂(S) · HS(x).
We also define W≤d[f] = ‖f≤d‖₂² and W>d[f] = ∑_{|S|>d} ‖f̂(S)‖₂².
Ornstein-Uhlenbeck operator
Definition 4.2. The Ornstein-Uhlenbeck operator Pt is defined for t ∈ [0, ∞) and any bounded, measurable
function f : Rn → Rk by
(Pt f)(x) = ∫_{y∈Rn} f(e^{−t} · x + √(1 − e^{−2t}) · y) dγn(y).
Note that if f takes values in ∆k ⊂ Rk , then so does Pt f for every t > 0. A basic fact about the
Ornstein-Uhlenbeck operator is that the functions {HS } are eigenfunctions of this operator. We leave the
proof of the next proposition to the reader.
Proposition 4.3. For S ∈ Z∗n , Pt HS = e−t·|S| · HS .
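Proposition 4.3 is easy to check numerically in one dimension. The sketch below (ours) evaluates Pt f by Gauss-Hermite quadrature and compares Pt H2 against e^{−2t} · H2, where H2(x) = (x² − 1)/√2.

```python
import math
import numpy as np

def H2(x):
    """Normalized degree-2 probabilists' Hermite polynomial."""
    return (x ** 2 - 1.0) / math.sqrt(2.0)

def ornstein_uhlenbeck(f, t, x, quad_deg=40):
    """(P_t f)(x) = E_y[f(e^{-t} x + sqrt(1 - e^{-2t}) y)], y ~ gamma_1,
    computed with probabilists' Gauss-Hermite quadrature."""
    y, w = np.polynomial.hermite_e.hermegauss(quad_deg)
    rho = math.exp(-t)
    vals = f(rho * x + math.sqrt(1.0 - rho ** 2) * y)
    return float((w * vals).sum() / math.sqrt(2.0 * math.pi))

t, x0 = 0.3, 1.7
lhs = ornstein_uhlenbeck(H2, t, x0)
rhs = math.exp(-2 * t) * H2(x0)  # Proposition 4.3 with |S| = 2
```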
We define the noise stability of f : Rn → ∆k with noise rate t ∈ [0, ∞) by
Stabt(f) = E[⟨f, Pt f⟩].
Multivariate polynomial threshold functions
We recall the definition of polynomial threshold functions from the introduction (primarily to define an
associated quantity which shall be useful later).
Definition. A function f : Rn → [k] is said to be a multivariate PTF if there exist polynomials p1 , . . . , pk :
Rn → R such that f(x) = j if pj(x) > 0 and pi(x) ≤ 0 for all i ≠ j, and f(x) = 1 otherwise.
We then denote f as f = PTF(p1 , . . . , pk ). Further, f is said to be a degree-d multivariate PTF if p1 , . . . , pk
are degree-d polynomials. For the rest of this paper, we will use “PTF” to mean “multivariate PTF,” and
the underlying k will be clear from the context. Also, we define Collision(f) ⊆ Rn as
Collision(f) = {x ∈ Rn : |{j : pj(x) > 0}| ≠ 1}.
For technical reasons, we will also require the PTFs to satisfy a certain regularity property:
Definition 4.4. A degree-d multivariate PTF f = PTF(p(1) , . . . , p(k) ) is said to be (d, δ )-balanced if each
p(i) has variance 1 and |E[p(i) ]| ≤ logd/2 (k · d/δ ).
Observe that the first condition (namely, Var(p(i) ) = 1) can be achieved without loss of generality
by simply rescaling all the polynomials to have variance 1. This rescaling does not change the value of
the PTF at any point x. While the condition on the expectation is non-trivial, the next proposition says that any multivariate PTF can be assumed to be (d, δ)-balanced while changing the value of the PTF on only a δ-fraction of points.
Proposition 4.5. Let f : Rn → [k] satisfy f = PTF(p(1) , . . . , p(k) ) for some collection of degree-d poly-
nomials p(1) , . . . , p(k) . For any δ > 0, there is a (d, δ )-balanced PTF g = PTF(q(1) , . . . , q(k) ) defined on
Rn such that Prx [g(x) 6= f (x)] ≤ δ and Prx [x ∈ Collision(g)] ≤ Prx [x ∈ Collision( f )] + δ .
To prove this proposition, we will require the following standard concentration bound for low-degree
polynomials which follows as a consequence of the well-known hypercontractive inequality—e. g.,
Lemma 2.2 in [10] proves the analogue of this statement for the Boolean hypercube. To get the statement for Gaussians, one can substitute each one-dimensional Gaussian xi by (yi,1 + · · · + yi,m)/√m, where each yi,j is an unbiased {±1} random variable, and then take the limit as m → ∞.
Theorem 4.6. Let p : Rn → R be a degree-d polynomial. Then, for any t > 0,
Pr_x[|p(x) − E[p(x)]| ≥ t · √(Var[p])] ≤ d · e^{−t^{2/d}}.
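For intuition, the tail bound can be checked empirically (our illustration, not the paper's): the degree-2 polynomial p(x) = x1 · x2 has mean 0 and variance 1 under γ2, and its sampled tail indeed sits below d · e^{−t^{2/d}}.

```python
import numpy as np

# p(x) = x_1 * x_2: degree d = 2, E[p] = 0, Var[p] = 1 under gamma_2.
rng = np.random.default_rng(0)
x = rng.standard_normal((500_000, 2))
p = x[:, 0] * x[:, 1]

d, t = 2, 3.0
empirical_tail = float(np.mean(np.abs(p) >= t))
bound = d * np.exp(-t ** (2.0 / d))  # Theorem 4.6: d * e^{-t^{2/d}}
```

The bound is far from tight here (as expected for a worst-case estimate), but the ordering is clear even at moderate t.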
Proof of Proposition 4.5. As we have observed, we can assume without loss of generality that the
polynomials p(i) have variance 1. We define the polynomials q(i) as follows. If |E[p(i) ]| ≤ logd/2 (d · k/δ ),
then we set q(i) = p(i) . Else, let bi = sign(E[p(i) ]) and define
q(i) = p(i) − E[p(i) ] + bi · logd/2 (d · k/δ ).
According to this definition, sign(q(i)(x)) ≠ sign(p(i)(x)) implies that |q(i)(x) − E[q(i)]| ≥ logd/2(dk/δ).
By Theorem 4.6,
Pr_{x∼γn}[sign(q(i)(x)) ≠ sign(p(i)(x))] ≤ δ/k.
Note that if f(x) ≠ g(x) then there is at least one i for which sign(p^{(i)}(x)) ≠ sign(q^{(i)}(x)). By a union bound, it follows that Pr_{x∼γ_n}[f(x) ≠ g(x)] ≤ δ. Similarly, if x ∈ Collision(g) but x ∉ Collision(f) then there is at least one i for which sign(p^{(i)}(x)) ≠ sign(q^{(i)}(x)). Using another union bound,

Pr_{x∼γ_n}[x ∈ Collision(g)] ≤ Pr_x[x ∈ Collision(f)] + δ.
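To make the mean-shift concrete, here is a worked one-dimensional instance of the construction (our illustration, with d = 1, k = 2, δ = 0.1): p(x) = x + 10 is far from balanced, and the shifted polynomial q(x) = x + log^{1/2}(dk/δ) disagrees in sign with p with probability at most δ/k, which we can compute exactly via the Gaussian CDF.

```python
import math

def Phi(z):  # standard normal CDF
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

d, k, delta = 1, 2, 0.1
L = math.log(d * k / delta) ** (d / 2.0)   # log^{d/2}(dk/delta), about 1.73

mu = 10.0   # E[p] for p(x) = x + 10, which has Var[p] = 1
# q(x) = p(x) - E[p] + sign(E[p]) * L = x + L, so sign(q(x)) != sign(p(x))
# exactly when -mu <= x < -L.
disagree = Phi(-L) - Phi(-mu)
assert disagree <= delta / k
```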
Convention
At several places in the paper, we will establish bounds on one quantity in terms of others without the
explicit mention of the dependence. For example, if we state that a quantity d = d(k, ε), we mean that d can be bounded as a computable function of k and ε (even if the function d(·,·) is simple, in the interest of clarity, we do not explicitly write it).
5 Proof strategy for Theorem 2.2
We now formally state the main results required to prove Theorem 2.2. The first theorem says that for any k-ary function with given measures for the k parts of the induced partition, there is a low-degree multivariate PTF which, up to an error ε, (a) induces a partition with the same measures and (b) is no less noise stable at a fixed noise rate t. In particular, we have the following theorem.
Theorem 5.1. Let f : R^n → [k] satisfy E[f] = µ where µ = (µ_1, ..., µ_k). Then, for every ε > 0 and t > 0, there exists a multivariate PTF g_0 : R^n → [k] of degree d = d(t, k, ε) such that
• ‖E[g_0] − µ‖_1 ≤ ε,
• E[⟨g_0, P_t g_0⟩] ≥ E[⟨f, P_t f⟩] − ε,
• Pr[x ∈ Collision(g_0)] ≤ ε, and
• g_0 is (d, ε)-balanced.
The key difference between Theorem 5.1 and Theorem 2.2 is that the number of variables in the PTF g_0
is the same as that in f . The above statement corresponds to the first step (“from general partitions to
PTF”) of the proof of Theorem 2.2 sketched in Section 2. The next theorem statement formalizes the
second and third steps sketched in Section 2. In particular, it states that given a degree-d multivariate PTF
over n variables, there is another multivariate PTF which induces approximately the same partition sizes
and has approximately the same noise stability (at any fixed noise rate t) but the new PTF is only over
some Od,t (1) variables.
Theorem 5.2. Let f : R^n → [k] be a degree-d, (d, ε)-balanced PTF with E_x[f(x)] = µ where µ = (µ_1, ..., µ_k). Further, let us assume that Pr[x ∈ Collision(f)] ≤ ε/(40k²). Then, for every ε > 0 and t > 0, there exists a degree-d PTF f_junta : R^{n_0} → [k] such that
• ‖E_x[f_junta(x)] − µ‖_1 ≤ ε,
• E[⟨f_junta, P_t f_junta⟩] ≥ E[⟨f, P_t f⟩] − ε.
Further, n_0 = n_0(d, k, ε, t) is a computable function.
We now see how Theorem 5.1 and Theorem 5.2 yield Theorem 2.2.
Proof of Theorem 2.2
Let us assume that given the measure µ = (µ_1, ..., µ_k) and noise rate t > 0, the most noise-stable partition is f : R^n → [k]. Then, applying Theorem 5.1 (with error parameter ε/(80k²)), we obtain a PTF g_1 of degree d = O_{t,k,ε}(1) such that g_1 is (d, ε/2)-balanced and Pr[x ∈ Collision(g_1)] ≤ ε/(80k²). Further, note that ‖E[g_1] − E[f]‖_1 ≤ ε/2 and E[⟨g_1, P_t g_1⟩] ≥ E[⟨f, P_t f⟩] − ε/2.
We next apply Theorem 5.2 to the function g_1 to obtain f_junta, which is a degree-d PTF on n_0 = O_{t,k,ε}(1) variables such that ‖E[g_1] − E[f_junta]‖_1 ≤ ε/2 and E[⟨f_junta, P_t f_junta⟩] ≥ E[⟨g_1, P_t g_1⟩] − ε/2.
Combining these two facts, we obtain ‖E[f] − E[f_junta]‖_1 ≤ ε and E[⟨f_junta, P_t f_junta⟩] ≥ E[⟨f, P_t f⟩] − ε. Setting g = f_junta concludes the proof.
6 Reduction from arbitrary functions to PTFs: Proof of Theorem 5.1
In this section, we will prove Theorem 5.1 in two parts. The first step is to show that we can replace f by a function with a small Hermite tail:
Lemma 6.1. For every f : R^n → [k], every ε > 0, and every t > 0, there exists a function h : R^n → [k] satisfying the following:
• E[h] = E[f],
• E[⟨h, P_t h⟩] ≥ E[⟨f, P_t f⟩] − ε,
• W^{>d}[h] ≤ ε for a computable d = d(k, ε).
The second step in the proof of Theorem 5.1 goes from a small Hermite tail to an actual PTF.
Lemma 6.2. Let h : R^n → [k] satisfy W^{>d}[h] ≤ ε. Then, there is a PTF g : R^n → [k] of degree d such that Pr_x[g(x) ≠ h(x)] ≤ k²·ε and Pr_x[x ∈ Collision(g)] ≤ k²·ε.
It is easy to see that Theorem 5.1 follows in a straightforward way by combining Lemma 6.1 and
Lemma 6.2. Note that the condition of g being (d, ε)-balanced can be obtained by applying Proposition 4.5.
We begin with the proof of Lemma 6.2, which is easier.
Proof of Lemma 6.2. By identifying i ∈ [k] with e_i as before, we view h as a function taking values in R^k. Define h^{(0)} : R^n → R^k by h^{(0)}(x) = h(x) − (1/k)·1, where 1 = (1, ..., 1) ∈ R^k. Note that for every x ∈ R^n, h^{(0)}(x) is a k-dimensional vector with the entry (k−1)/k in one of the coordinates and −1/k in every other coordinate. Call such a point a k-lattice point.
Let h^{(0)}_{≤d} : R^n → R^k be the function obtained by truncating the Hermite expansion of h^{(0)} at degree d. Consider the degree-d polynomials p_1, ..., p_k : R^n → R where p_i is the i-th coordinate of h^{(0)}_{≤d}, and define g = PTF(p_1, ..., p_k). We claim that g satisfies the conditions of the lemma.
Consider any point x ∈ R^n.
• If x ∈ Collision(g), we claim that ‖h^{(0)}(x) − h^{(0)}_{≤d}(x)‖₂² ≥ k^{−2}. To see this, note that if x ∈ Collision(g), there are two possibilities: either p_i(x) ≤ 0 for all 1 ≤ i ≤ k, or there are two distinct i, j such that p_i(x) > 0 and p_j(x) > 0. In either case, it easily follows that the ℓ₂ distance from h^{(0)}_{≤d}(x) to any k-lattice point is at least 1/k.
• If x ∉ Collision(g) and g(x) ≠ h(x), then again we claim that ‖h^{(0)}(x) − h^{(0)}_{≤d}(x)‖₂² ≥ k^{−2}. To see this, suppose that g(x) = e_i ≠ e_j = h(x). Then the j-th coordinate of h^{(0)}(x) is (k−1)/k, but on the other hand p_j(x) ≤ 0 because g(x) = e_i and x ∉ Collision(g). Hence, ‖h^{(0)}(x) − h^{(0)}_{≤d}(x)‖₂² ≥ (k−1)²/k².
Combining these two, we get

E_x[ ‖h^{(0)}(x) − h^{(0)}_{≤d}(x)‖₂² ] ≥ (1/k²) · ( Pr_x[x ∈ Collision(g)] + Pr_x[x ∉ Collision(g) ∧ g(x) ≠ h(x)] ).

As h^{(0)} and h differ only in the Hermite coefficient for S = ∅, the left-hand side is the same as W^{>d}[h]. Combining this with the above inequality achieves all the stated guarantees.
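The construction in the proof of Lemma 6.2 can be run in one dimension. The sketch below is our illustration (not from the paper): take k = 2 and h the halfspace partition by sign(x). The Hermite coefficients of the centered first coordinate are available in closed form (ĉ_q = He_{q−1}(0)·φ(0)/√(q!) for q ≥ 1); we truncate at degree d = 5, threshold the truncation, and check the guarantee Pr[g ≠ h] ≤ k²·W^{>d}[h].

```python
import math

def He(q, x):
    # probabilists' Hermite polynomials via He_{n+1} = x He_n - n He_{n-1}
    h0, h1 = 1.0, x
    if q == 0:
        return h0
    for n in range(1, q):
        h0, h1 = h1, x * h1 - n * h0
    return h1

d, k = 5, 2
phi0 = 1.0 / math.sqrt(2.0 * math.pi)

# Normalized Hermite coefficients of h1 - 1/2, where h1(x) = 1[x >= 0]:
# c_q = He_{q-1}(0) * phi(0) / sqrt(q!) for q >= 1 (c_0 = 0, even q vanish).
c = [0.0] + [He(q - 1, 0.0) * phi0 / math.sqrt(math.factorial(q))
             for q in range(1, d + 1)]

# Var[h1 - 1/2] = 1/4; the coordinate h2 - 1/2 = -(h1 - 1/2) contributes equally.
W_tail = 2.0 * (0.25 - sum(cq * cq for cq in c))

def p_trunc(x):  # degree-<= d Hermite truncation of h1 - 1/2
    return sum(c[q] * He(q, x) / math.sqrt(math.factorial(q))
               for q in range(1, d + 1))

# Pr[g != h] for g = PTF(p_trunc, -p_trunc), via a Riemann sum against the density
dx, disagree, x = 1e-3, 0.0, -8.0
while x < 8.0:
    if (p_trunc(x) > 0) != (x >= 0):
        disagree += phi0 * math.exp(-x * x / 2.0) * dx
    x += dx

assert W_tail > 0.0
assert disagree <= k * k * W_tail   # the guarantee of Lemma 6.2
```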
The proof of Lemma 6.1 is longer, and so we begin with an outline. The first observation is that
bounding E[|∇h|] implies a bound on W^{>d}[h]. This follows from a standard spectral argument, and is stated as
Corollary 6.4. The second observation is that the function h obtained by thresholding Pt f at a suitable
value (chosen, for example, so that E[h] = E[ f ]) satisfies E[hh, Pt hi] ≥ E[h f , Pt f i]. This is stated as
Lemma 6.7.
Based on the previous paragraph, it seems like we would like to bound E[|∇h|] where h is obtained
by thresholding Pt f ; let a ∈ Rk be the desired threshold value, so that thresholding Pt f at a produces a
partition with the right measures. It turns out, unfortunately, that for h defined in this way, E[|∇h|] could
be arbitrarily large. A key insight of [22] is that (using the co-area formula and gradient bounds on Pt f )
the partition produced by thresholding at a random value near a has bounded expected surface area. In
particular, although thresholding exactly at a might be a bad idea, there exist many good nearby values at
which to threshold. Based on this observation, we construct h in two steps. In the first step, we define h
by thresholding Pt f , but only on the set of x ∈ Rn for which Pt f (x) is not too close to a. By choosing
“not too close” in a suitable random way, the observation of [22] implies that this step only contributes
a bounded amount to E[|∇h|]. Since the first step is almost the same as just thresholding Pt f at a, it is
consistent with our desire that E[hh, Pt hi] ≥ E[h f , Pt f i] − ε.
In the second step, we partition the remaining part of Rn by chopping it with halfspaces of the correct
size. Since halfspaces have a bounded surface area, this also contributes a bounded amount to E[|∇h|].
Crucially, this step does not destroy the value of ⟨h, P_t h⟩; fundamentally, this is because P_t f is almost
constant on the set we are partitioning. This finishes the outline of the proof of Lemma 6.1. We now start
with some preliminaries required in the proof of this lemma.
6.1 Surface area and spectrum
In our outline of the proof of Lemma 6.1, we claimed that control of E[|∇h|] implies control of W^{>d}[h]. Here we prove that claim, using a theorem of Ledoux [23] that gives a lower bound on the gradient in terms of the noise sensitivity.
Theorem 6.3 (Ledoux). For a differentiable function f : R^n → [0, 1] and any t > 0,

E_{x∼γ_n}[ f · (f − P_t f) ] ≤ (arccos(e^{−t})/√(2π)) · E_{x∼γ_n}[ |∇f| ].
Using this theorem, it is possible to establish an upper bound on W^{≥d}[f] in terms of E[|∇f|], as done below.
Corollary 6.4. For any differentiable f : R^n → [0, 1] of bounded variation (in Gaussian measure) and any d ∈ N,

W^{≥d}[f] ≤ O(1/√d) · E[|∇f|].
Proof.

E[f(f − P_t f)] = ∑_{j=0}^∞ (1 − e^{−t·j}) · W^j[f] ≥ (1 − e^{−d·t}) · W^{≥d}[f].
Setting t = 1/d and applying Theorem 6.3, we get

W^{≥d}[f] ≤ O(1) · E[f(f − P_{1/d} f)] ≤ O(1) · (arccos(e^{−1/d})/√(2π)) · E_{x∼γ_n}[|∇f|] ≤ O(1/√d) · E[|∇f|].
6.2 Various lemmas for thresholding
Recall that our proof of Lemma 6.1 is based on thresholding Pt f . Before proving it, we introduce various
properties of the thresholding procedure. The first lemma states that the “best” way to round a ∆k -valued
function to a {e1 , . . . , ek }-valued function while preserving its expectation is simply to threshold it. Here,
“best” means that we are trying to maximize the correlation between the original function and the rounded
one.
Lemma 6.5. Let f : R^n → ∆_k, g : R^n → {e_1, ..., e_k}, and z ∈ R^k. Suppose that whenever g(x) = e_i, we have

i ∈ argmax{ f_j(x) − z_j : j = 1, ..., k }.

Then g maximizes E[⟨f, h⟩] among all h : R^n → {e_1, ..., e_k} satisfying E[h] = E[g].
Proof. Let h : R^n → {e_1, ..., e_k} satisfy E[h] = E[g]. By the defining assumption of g, ⟨g, f − z⟩ ≥ ⟨h, f − z⟩ pointwise. Hence,

E[⟨h, f⟩] = E[⟨h, f − z⟩] + ⟨E[h], z⟩ ≤ E[⟨g, f − z⟩] + ⟨E[g], z⟩ = E[⟨g, f⟩].
Let us assign a notation to the sort of thresholding involved in Lemma 6.5.
Definition 6.6. For f : R^n → ∆_k and z ∈ R^k, let F_z(f) denote the set of functions g : R^n → {e_1, ..., e_k} such that

⟨g(x), f(x) − z⟩ = max_j { f_j(x) − z_j }

for every x ∈ R^n.
We say that functions in F_z(f) are obtained by “thresholding” f around the value z.
The next step is to show that a function obtained by thresholding Pt f is always at least as noise-stable
(with parameter t) as the original function f .
Lemma 6.7. Let f : R^n → {e_1, ..., e_k} and take t > 0. If g ∈ F_z(P_t f) for some z ∈ R^k satisfies E[g] = E[f], then Stab_t(g) ≥ Stab_t(f).
Proof. Since E[g] = E[f], Lemma 6.5 implies that E[⟨g, P_t f⟩] ≥ E[⟨f, P_t f⟩] = Stab_t(f). On the other hand, the Cauchy–Schwarz inequality and the semigroup property of P_t imply that

E[⟨g, P_t f⟩] = E[⟨P_{t/2} g, P_{t/2} f⟩] ≤ √(Stab_t(g) · Stab_t(f)).

The claim follows.
Next, we come to the importance of the thresholding parameter z: it allows us to tune the expectation
of the thresholded functions. In particular, in order to make Lemma 6.7 useful we need to show that we
can always find a thresholded function with the same expectation as the original function.
Lemma 6.8. For any f : R^n → ∆_k, there exists some z ∈ R^k and g ∈ F_z(f) such that E[g] = E[f].
It will actually be convenient to prove a more general lemma about splitting up the mass of a general probability measure. Given z ∈ R^k and i ∈ [k], let A_i(z) ⊂ ∆_k be the set {x ∈ ∆_k : x_i − z_i = max_j (x_j − z_j)}. The correspondence between this notion and the definition of F_z(f) is that g ∈ F_z(f) if and only if g(x) = e_i implies f(x) ∈ A_i(z).
Lemma 6.9. For every probability measure ζ on ∆_k and every q ∈ ∆_k, there exists z ∈ R^k such that ζ(∪_{i∈I} A_i(z)) ≥ ∑_{i∈I} q_i for every I ⊂ [k].
Before proving Lemma 6.9, we will show how it implies Lemma 6.8. For this, we introduce one more
ingredient: a weighted version of Hall’s marriage theorem. It will be used to show that we can divide a
certain amount of “missing” volume into pieces of the right size, while preserving certain constraints.
Lemma 6.10. Let G = (U, V, w, E) be a finite, vertex-weighted, bipartite graph. That is, U and V are finite sets, w is a weight function w : U ⊔ V → (0, ∞), and E ⊂ U × V is the set of edges. Suppose that for every U′ ⊂ U, ∑_{u∈U′} w(u) ≤ ∑_{v∼U′} w(v), where v ∼ U′ if there exists u ∈ U′ such that (u, v) ∈ E. Then there exists p : V → ∆_{|U|} such that p_u(v) > 0 implies u ∼ v, and such that

∑_{v∈V} p_u(v) w(v) = w(u)

for every u ∈ U.
In the case that w takes values in {0, 1}, Lemma 6.10 follows from the usual formulation of Hall’s
marriage theorem (which guarantees in addition that p : V → {e1 , . . . , e|U| }). When w is integer-valued
(or, by scaling, rational-valued), it follows by applying Hall’s marriage theorem to the graph in which u is
replicated w(u) times. The general case follows by a simple approximation argument.
Proof of Lemma 6.8. Let ζ be the distribution of f and let q = E[f]. Applying Lemma 6.9, we may take z ∈ R^k such that ζ(∪_{i∈I} A_i(z)) ≥ ∑_{i∈I} q_i for every I ⊂ [k]. For I ⊂ [k], let B_I be the set of x ∈ R^n such that {i : f_i(x) − z_i = max_j (f_j(x) − z_j)} = I. Then {B_I : I ⊂ [k]} is a partition of R^n. Moreover, if we define w(I) = γ_n(B_I) then

∑_{J : J∩I≠∅} w(J) = γ_n( ∪_{J : J∩I≠∅} B_J ) = ζ( ∪_{i∈I} A_i(z) ) ≥ ∑_{i∈I} q_i.   (6.1)

Now we construct a bipartite graph with U = [k] and V = {I : I ⊂ [k]}. The pair (i, I) is an edge if i ∈ I. We define w on V as above, and we define w on U by w(i) = q_i. According to (6.1), the condition of Lemma 6.10 holds; hence, there exists p : V → ∆_k such that p_i(I) > 0 only when i ∈ I, and such that ∑_{I∋i} p_i(I) γ_n(B_I) = q_i. Now we will define g: for each I ⊂ [k], partition B_I arbitrarily into sets {B_{I,i} : i ∈ I}, where γ_n(B_{I,i}) = p_i(I) γ_n(B_I). This may be done because γ_n has no atoms. Finally, define g to be e_i on every B_{I,i}. Then the condition ∑_{I∋i} p_i(I) γ_n(B_I) = q_i ensures that γ_n({g = e_i}) = q_i. Moreover, g(x) = e_i implies that x ∈ B_I for some I ∋ i, which implies that f_i(x) − z_i = max_j (f_j(x) − z_j). This completes the construction of g.
Proof of Lemma 6.9. We will assume initially that ζ is absolutely continuous with respect to the uniform measure on ∆_k. As a consequence, the function ψ : ∆_k → R^k defined by ψ_i(z) = ζ(A_i(−kz)) is continuous. Moreover, the image of ψ is contained in ∆_k, and we need to prove that it is all of ∆_k.
For I ⊊ {1, ..., k}, let F_I = {x ∈ ∆_k : x_i = 0 for all i ∈ I}. Note that every face of ∆_k is of the form F_I for some I ⊊ {1, ..., k}. Next, we claim that ψ maps F_I into F_I for every I. Indeed, if z ∈ F_I then there is at least one j ∉ I such that z_j ≥ 1/k. For this j and any i ∈ I, if x is in the interior of ∆_k then

x_i + k·z_i = x_i < x_j + k·z_j.

It follows that A_i(−kz) does not intersect the interior of ∆_k; hence, ψ_i(z) = 0. Since this holds for all i ∈ I, ψ(z) ∈ F_I. By Sperner's lemma, any map from ∆_k into itself that leaves all faces invariant must be onto, and so ψ is onto, as claimed.
To complete the proof, we must eliminate the assumption that ζ is absolutely continuous with respect to the uniform measure. For an arbitrary probability measure ζ, let (ζ_n) be a sequence of probability measures that are absolutely continuous with respect to the uniform measure and converge to ζ in distribution. (For example, we may construct ζ_n by first pushing ζ forward under the map x ↦ (1/2n)·(1/k, ..., 1/k) + (1 − 1/n)·x, and then convolving the push-forward with a smooth bump function supported on (1/2n)·∆_k.) Define ψ_n : ∆_k → ∆_k by ψ_n(z) = (ζ_n(A_1(z)), ..., ζ_n(A_k(z))). By the previous argument, for every n there is some z_n ∈ ∆_k such that ψ_n(z_n) = q. Since ∆_k is compact, we may pass to a subsequence and thereby assume that z_n converges to some limit z_∞. Note that

∪_{i∈I} A_i(z_∞) = ∩_{n=1}^∞ ∪_{i∈I} ( ∪_{m≥n} A_i(z_m) ∪ A_i(z_∞) ).

Moreover, ∪_{m≥n} A_i(z_m) ∪ A_i(z_∞) is closed for every n, since z_∞ is the only limit point of the sequence (z_n). It follows that

ζ( ∪_{i∈I} A_i(z_∞) ) = lim_{n→∞} ζ( ∪_{i∈I} ( ∪_{m≥n} A_i(z_m) ∪ A_i(z_∞) ) ) ≥ lim_{n→∞} lim_{ℓ→∞} ζ_ℓ( ∪_{i∈I} ( ∪_{m≥n} A_i(z_m) ∪ A_i(z_∞) ) ) ≥ ∑_{i∈I} q_i,

since ∑_{i∈I} ζ_ℓ(A_i(z_ℓ)) = ∑_{i∈I} q_i for every I and ℓ (using the fact that ζ_ℓ has a density, and so it assigns no mass to A_i(z_ℓ) ∩ A_j(z_ℓ)).
6.3 Proof of Lemma 6.1
We introduce two final ingredients before beginning the proof of Lemma 6.1. The first is a gradient bound on P_t f that is due (in a much more precise form) to Bakry and Ledoux [1].
Theorem 6.11. If f : R^n → [0, 1] then for any t > 0, P_t f is differentiable and ‖∇P_t f‖_∞ ≤ C/√t.
The second ingredient is the co-area formula in Gaussian space. To state this, let γ_n^+ denote Gaussian surface area in n dimensions. The co-area formula (see [11] for a reference) is stated next. For any locally Lipschitz f : R^n → R and any continuous, compactly supported µ : R → R,

∫_R µ(t) · γ_n^+({x : f(x) ≥ t}) dt = ∫_{R^n} µ(f(x)) · |∇f(x)| dγ(x).
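As a one-dimensional sanity check (our illustration, not from the paper), take f(x) = x². Then {x : f(x) ≥ t} = {|x| ≥ √t} for t > 0, with Gaussian surface area 2φ(√t), and |∇f(x)| = 2|x|, so both sides of the co-area formula can be evaluated by quadrature for a triangular bump µ supported on [0, 2]:

```python
import math

phi = lambda x: math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)
mu = lambda t: max(0.0, 1.0 - abs(t - 1.0))   # continuous bump supported on [0, 2]
# For f(x) = x^2: {x : f >= t} = {|x| >= sqrt(t)} for t > 0, with Gaussian
# surface area 2 * phi(sqrt(t)); and |grad f(x)| = 2|x|.

dt, lhs = 5e-4, 0.0
t = dt / 2
while t < 2.0:
    lhs += mu(t) * 2.0 * phi(math.sqrt(t)) * dt
    t += dt

dx, rhs = 5e-4, 0.0
x = -2.0 + dx / 2
while x < 2.0:
    rhs += mu(x * x) * 2.0 * abs(x) * phi(x) * dx
    x += dx

assert abs(lhs - rhs) < 1e-3
```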
The co-area formula relates the Gaussian surface of a Boolean function (specified by {x : f (x) ≥ t}) to the
gradient of f . We remark that the co-area formula is sometimes written with γn+ ({x : f (x) ≥ t}) replaced
by the Gaussian-weighted (n − 1)-dimensional Hausdorff measure of {x : f (x) = t}. The two formulations
are equivalent, because these two quantities are equal for almost every t (see, for example, [24, Chapter
13]).
Proof of Lemma 6.1. We begin by applying Lemma 6.8 to P_t f: let z and g ∈ F_z(P_t f) be such that E[g] = E[P_t f] = E[f]. According to Lemma 6.7, Stab_t(g) ≥ Stab_t(f). However, E[|∇g|] cannot be controlled in general; therefore, we will move to an approximation of g.
Let µ be a probability measure on [0, ε] with continuous density bounded by 2/ε. By the co-area formula, for any i ≠ j we have (writing s for the thresholding variable, to avoid confusion with the noise rate t)

∫_R µ(s) · γ^+({x : P_t f_i(x) − z_i > P_t f_j(x) − z_j + s}) ds
  = ∫_{R^n} µ(P_t f_i(x) − z_i − P_t f_j(x) + z_j) · |∇P_t f_i(x) − ∇P_t f_j(x)| dγ(x)
  ≤ (2/ε) ∫_{R^n} |∇P_t f_i(x) − ∇P_t f_j(x)| dγ(x)
  ≤ C/(ε·√t),

where the last inequality follows from Theorem 6.11. In particular, there exist some y_{ij}^+ and y_{ij}^− in [0, ε] such that if A_{ij}^± = {x ∈ R^n : P_t f_i(x) − z_i > P_t f_j(x) − z_j ± y_{ij}^±} then

γ^+(A_{ij}^±) ≤ C(ε)/√t.
We repeat this construction of y_{ij}^± and A_{ij}^± for every ordered pair (i, j) with i ≠ j. Define A_i = ∩_{j≠i} A_{ij}^+, and note that A_i ⊂ {x : g(x) = e_i}. On the other hand, A_{ij}^− ⊃ {x : g(x) = e_i} for every i, j. Next, define

C_i = ∩_{j≠i} A_{ij}^−,
C_I = ∩_{i∈I} C_i,
B_I = C_I \ ( ∪_{J⊋I} C_J ∪ ∪_{i∈I} A_i ).

The meaning of these sets is the following: A_i is the set where P_t f_i − z_i is significantly larger than every other P_t f_j − z_j. On C_i, P_t f_i − z_i is almost max_j (P_t f_j − z_j); on C_I, P_t f_i − z_i is almost maximal for every i ∈ I; and on B_I, the set of i for which P_t f_i − z_i is almost maximal is exactly I. Importantly, the collection of all A_i and B_I forms a partition of R^n. Our basic strategy will be to set h to be e_i on A_i, and then to define h on the remaining part of the space in order to satisfy two properties: E[h] = E[f] and h(x) = e_i only if x ∈ B_I for some I ∋ i.
Since the Gaussian surface area obeys the inequalities γ_n^+(A ∩ B) ≤ γ_n^+(A) + γ_n^+(B) and γ_n^+(A ∪ B) ≤ γ_n^+(A) + γ_n^+(B), and since B_I and A_i are defined using a finite (depending on k) number of intersections and unions, it follows that

γ^+(A_i) ≤ C(ε, k)/√t,   γ^+(B_I) ≤ C(ε, k)/√t.   (6.2)
Now, for any I ⊂ [k], the set

∪_{J : J∩I≠∅} B_J ∪ ∪_{i∈I} A_i

contains the set of x for which g(x) ∈ {e_i : i ∈ I}. It follows that

∑_{J : J∩I≠∅} γ_n(B_J) ≥ ∑_{i∈I} ( E[g_i] − γ_n(A_i) ).   (6.3)
Consider the bipartite graph where U = [k] and V = {I : I ⊂ [k]}, and (i, I) ∈ E if i ∈ I. We assign the weights w(i) = E[g_i] − γ_n(A_i) to i ∈ U and w(I) = γ_n(B_I) for I ∈ V (note that w(i) ≥ 0 because the construction of A_i ensures that A_i ⊂ {x : g(x) = e_i}). Equation (6.3) ensures that this weighted graph satisfies the hypothesis of Lemma 6.10, and so there exists p : V → ∆_k with p_i(I) > 0 only if i ∈ I, and with

∑_{I∋i} p_i(I) γ_n(B_I) = E[g_i] − γ_n(A_i).
Finally, we will use p to define h. First, for every I and i ∈ I, let B_{I,i} be a set of the form {x ∈ R^n : a ≤ x_1 ≤ b} such that γ_n(B_{I,i} ∩ B_I) = p_i(I) γ_n(B_I); moreover, we choose the B_{I,i} such that γ_n(B_{I,i} ∩ B_{I,j}) = 0 when i ≠ j. Then we set h(x) to equal e_i on the set A_i ∪ ∪_{I∋i} (B_{I,i} ∩ B_I). By the defining property of p, h satisfies E[h_i] = E[g_i] = E[f_i]. Moreover, h(x) = e_i only on a subset of A_i ∪ ∪_{I∋i} B_I, and on this set

P_t f_i(x) − z_i ≥ max_j (P_t f_j(x) − z_j) − ε = ⟨g(x), P_t f(x) − z⟩ − ε.

It follows that

E[⟨h, P_t f⟩] ≥ E[⟨g, P_t f⟩] − ε ≥ Stab_t(f) − ε.

By the Cauchy–Schwarz inequality,

√(Stab_t(h)) ≥ √(Stab_t(f)) − ε/√(Stab_t(f)),

which implies that Stab_t(h) ≥ Stab_t(f) − 2ε.
Finally, we address the surface area of h. Recall that

{x : h(x) = e_i} = A_i ∪ ∪_{I∋i} (B_{I,i} ∩ B_I).

The number of terms on the right is some constant depending on k. By (6.2), γ_n^+(A_i) and γ_n^+(B_I) are bounded by C(ε, k)/√t. Since B_{I,i} is an intersection of two halfspaces, its Gaussian surface area is at most a constant. It follows that γ_n^+({x : h(x) = e_i}) ≤ C(ε, k)/√t for every i. Hence, E[|∇h|] ≤ C(ε, k)/√t.
After applying Corollary 6.4, this completes the proof, except that d = d(k, ε, t) instead of the claimed d(k, ε), where d(k, ε, t) blows up as t → 0. To eliminate this dependence on t, it suffices to note that the claim is trivial if t ≪ (ε/k)². Indeed, one can easily construct h with E[h] = E[f], Stab_t(h) ≥ E[|h|²] − O(k√t) and E[|∇h|] ≤ O(k). For example, we could take the pieces {x : h(x) = e_i} to be parallel slabs of the form {x : a_i ≤ x_1 ≤ b_i}, where {a_i} and {b_i} are chosen so that the slabs have the correct volumes. If t ≪ (ε/k)² then such an example satisfies the claim of the lemma.
7 Reduction from PTFs to PTFs on a constant number of variables
In this section, we are going to prove Theorem 5.2. We restate it here for the convenience of the reader.
Theorem 5.2. Let f : R^n → [k] be a degree-d, (d, ε)-balanced PTF with E_x[f(x)] = µ where µ = (µ_1, ..., µ_k). Further, let us assume that Pr[x ∈ Collision(f)] ≤ ε/(40k²). Then, for every ε > 0 and t > 0, there exists a degree-d PTF f_junta : R^{n_0} → [k] such that
• ‖E_x[f_junta(x)] − µ‖_1 ≤ ε, and
• E[⟨f_junta, P_t f_junta⟩] ≥ E[⟨f, P_t f⟩] − ε.
Our main workhorses for this section are two structural theorems for low-degree polynomials proven in [9]. In order to state these theorems, we will need to introduce several notions related to tensors and polynomial decompositions.
7.1 Polynomials and tensors
We give a brief description of the connection between symmetric tensors and polynomials under the Gaussian distribution. The interested reader may consult the book [18] for a detailed background. Let H denote the Hilbert space R^n and let H^{⊗q} denote the q-fold tensor power of H, i. e., the space of all multilinear functions H^q → R. For f_1, ..., f_q ∈ H, their tensor product f_1 ⊗ ··· ⊗ f_q is the element of H^{⊗q} defined by

(f_1 ⊗ ··· ⊗ f_q)(h_1, ..., h_q) = ∏_{i=1}^q ⟨f_i, h_i⟩.

It is straightforward to check that H^{⊗q} is spanned by elements of the form f_1 ⊗ ··· ⊗ f_q; we therefore define an inner product on H^{⊗q} by setting

⟨f_1 ⊗ ··· ⊗ f_q, g_1 ⊗ ··· ⊗ g_q⟩ = ∏_{i=1}^q ⟨f_i, g_i⟩

and extending to H^{⊗q} by bilinearity. The norm of a tensor is defined by ‖f‖²_F = ⟨f, f⟩.
An element f ∈ H^{⊗q} is said to be symmetric if it is invariant under any permutation σ : [q] → [q], in the sense that f(h_1, ..., h_q) = f(h_{σ(1)}, ..., h_{σ(q)}) for every h_1, ..., h_q ∈ H. Let H^{⊙q} denote the symmetric subspace of H^{⊗q}. We now describe a map between the space H^{⊙q} and polynomials. Recall that H_q denotes the degree-q Hermite polynomial.
Definition 7.1. The iterated Itô integral I_q maps H^{⊙q} to polynomials as follows: let h ∈ H be a unit vector and note that h^{⊗q} ∈ H^{⊙q}. Then, I_q(h^{⊗q}) = H_q(⟨h, x⟩) where x ∈ R^n. The map I_q is extended linearly to H^{⊙q}.
For the convenience of the reader, here we describe a basis of the space H^{⊙q}. Consider an unordered multiset S = {s_1, ..., s_q} ⊆ [n] of size q, and define Φ_S ∈ H^{⊙q} by

Φ_S = ∑_{σ∈Sym_q} e_{s_{σ(1)}} ⊗ ··· ⊗ e_{s_{σ(q)}}.

It is easy to check (and we omit the proof) that the tensors Φ_S form a basis for H^{⊙q}:
Proposition 7.2. Let S be the set of unordered multisets of [n] of size q. Then, {Φ_S}_{S∈S} forms an orthogonal basis of H^{⊙q}.
A fundamental property of the map I_q is that it is an isometry between the space of symmetric tensors and polynomials endowed with the standard normal measure (see [18] for a proof).
Proposition 7.3. E_{x∼γ_n}[I_q(f) · I_p(g)] = δ_{p=q} · ⟨f, g⟩.
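Proposition 7.3 can be checked numerically in the simplest case f = u^{⊗p}, g = v^{⊗q} for unit vectors u, v: then I_p(f) = H_p(⟨u, x⟩), I_q(g) = H_q(⟨v, x⟩), the pair (⟨u, x⟩, ⟨v, x⟩) is jointly Gaussian with correlation ρ = ⟨u, v⟩, and ⟨f, g⟩ = δ_{p=q}·ρ^q. The sketch below is our illustration, using normalized Hermite polynomials and a two-dimensional quadrature.

```python
import math

def He(q, x):  # probabilists' Hermite: He_{n+1} = x He_n - n He_{n-1}
    h0, h1 = 1.0, x
    if q == 0:
        return h0
    for n in range(1, q):
        h0, h1 = h1, x * h1 - n * h0
    return h1

def h(q, x):   # normalized so that E[h_q(x)^2] = 1 under the Gaussian measure
    return He(q, x) / math.sqrt(math.factorial(q))

rho = 0.7      # plays the role of <u, v> for unit vectors u, v

def corr(p, q):  # E[h_p(<u,x>) h_q(<v,x>)] via a 2-D midpoint sum
    s, dz = 0.0, 0.04
    z = -6.0 + dz / 2
    while z < 6.0:
        y = -6.0 + dz / 2
        while y < 6.0:
            zp = rho * z + math.sqrt(1.0 - rho * rho) * y
            s += (h(p, z) * h(q, zp)
                  * math.exp(-(z * z + y * y) / 2.0) / (2.0 * math.pi) * dz * dz)
            y += dz
        z += dz
    return s

for (p, q) in [(1, 1), (2, 2), (3, 3), (1, 2)]:
    expected = rho ** q if p == q else 0.0   # delta_{p=q} * <u,v>^q
    assert abs(corr(p, q) - expected) < 5e-3
```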
We will refer to the range of I_q as the Wiener chaos W_q. Based on Proposition 7.3, I_q is a bijective map from H^{⊙q} to W_q. On the other hand, it is easy to show that any polynomial p : R^n → R of degree at most d can be expressed uniquely as

p = ∑_{q=0}^d I_q(f_q)  where f_q ∈ H^{⊙q}.   (7.1)

In order to apply the results from [9], we will often restrict our attention to multilinear polynomials, i. e., those in which every variable appears with degree at most 1. We call a tensor f ∈ H^{⊙q} square-free if I_q(f) is a multilinear polynomial. Square-free tensors in H^{⊙q} may also be characterized as the set of tensors f for which f(h_1, ..., h_q) = 0 whenever h_1, ..., h_q contains a repeated element. Alternatively, the space of square-free tensors in H^{⊙q} is the span of {Φ_S} where S ranges over all sets (not multisets). It is easy to check that if p : R^n → R is a multilinear polynomial, then the tensors {f_q}_{0≤q≤d} appearing in the decomposition of p in (7.1) are all square-free.
7.2 Itô’s multiplication formula
We now state the formula for the product of two polynomials I_p(f) and I_q(g) in terms of f and g. For a reference, see Nourdin's survey [28]. First, we define tensor contraction: for f ∈ H^{⊗p}, g ∈ H^{⊗q}, and r ≤ p ∧ q, define f ⊗_r g ∈ H^{⊗(p+q−2r)} by

(f ⊗_r g)(t_1, ..., t_{p+q−2r}) = ∑_{1≤z_1,...,z_r≤n} f(t_1, ..., t_{p−r}, e_{z_1}, ..., e_{z_r}) · g(t_{p−r+1}, ..., t_{p+q−2r}, e_{z_1}, ..., e_{z_r}).

The symmetrized contraction product, denoted by f ⊗̃_r g ∈ H^{⊙(p+q−2r)}, is defined as the symmetrization of the tensor f ⊗_r g:

(f ⊗̃_r g)(t_1, ..., t_{p+q−2r}) = (1/(p+q−2r)!) · ∑_σ (f ⊗_r g)(t_{σ(1)}, ..., t_{σ(p+q−2r)}),

where the sum ranges over all permutations σ of [p+q−2r].
Proposition 7.4 (Itô's multiplication formula).

I_p(f) · I_q(g) = ∑_{r=0}^{p∧q} r! · (p choose r) · (q choose r) · ( √((p+q−2r)!) / √(p!·q!) ) · I_{p+q−2r}(f ⊗̃_r g).
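As a worked instance (our illustration), take p = q = 1 and n = 1, with f = g = e₁. The r = 0 term of Proposition 7.4 carries the coefficient 0!·(1 choose 0)²·√(2!)/√(1!·1!) = √2, and the r = 1 term is the constant ⟨f, g⟩ = 1, so the formula reads I₁(e₁)² = √2·I₂(e₁⊗e₁) + 1. With the normalization under which Proposition 7.3 holds, I₂(e₁⊗e₁) = H₂(x) = (x² − 1)/√2, and the formula reduces to the identity x² = √2·H₂(x) + 1, checked pointwise below.

```python
import math

def H2(x):                  # normalized degree-2 Hermite: I_2(e1 ⊗ e1) in one variable
    return (x * x - 1.0) / math.sqrt(2.0)

for x in [-3.0, -1.2, 0.0, 0.7, 2.5]:
    lhs = x * x                              # I_1(e1) * I_1(e1)
    rhs = math.sqrt(2.0) * H2(x) + 1.0       # r = 0 and r = 1 terms of the formula
    assert abs(lhs - rhs) < 1e-12
```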
We now list a basic fact about contraction products of tensors which will be helpful later. In interpreting this fact, it is useful to think of tensors as tables which can be “flattened” into matrices. That is, if we fix an orthonormal basis e_1, ..., e_n of H then the n^q elements {e_{i_1} ⊗ ··· ⊗ e_{i_q} : 1 ≤ i_1, ..., i_q ≤ n} form a basis for H^{⊗q}. For any 1 ≤ r < q, we can view f ∈ H^{⊗q} as an n^r × n^{q−r} matrix, where the rows are indexed by [n]^r, the columns are indexed by [n]^{q−r}, and the entry in position ((i_1, ..., i_r), (j_1, ..., j_{q−r})) is the coefficient in f of the basis element e_{i_1} ⊗ ··· ⊗ e_{i_r} ⊗ e_{j_1} ⊗ ··· ⊗ e_{j_{q−r}}. The advantage of this point of view is that contraction can be viewed as matrix multiplication: if M_f is the n^{p−r} × n^r matrix representing f ∈ H^{⊗p} and M_g is the n^r × n^{q−r} matrix representing g ∈ H^{⊗q} then M_f M_g is the n^{p−r} × n^{q−r} matrix representing f ⊗_r g. (Note that each of these matrix representations depends on the choice of a basis for H, but the contraction itself does not.)
The following fact follows immediately from the previous paragraph and the Cauchy–Schwarz inequality:
Fact 7.5. For f ∈ H^{⊗p}, g ∈ H^{⊗q}, and 0 ≤ r ≤ p ∧ q, ‖f ⊗_r g‖_F ≤ ‖f‖_F · ‖g‖_F. Consequently, ‖f ⊗̃_r g‖_F ≤ ‖f‖_F · ‖g‖_F.
The following polynomial anticoncentration theorem is due to Carbery and Wright.
Theorem 7.6 ([4]). Let p : R^n → R be a degree-d polynomial. Then, for ε > 0,

sup_{θ∈R} Pr_x[ |p(x) − θ| ≤ ε · √(Var[p]) ] = O(d · ε^{1/d}).
An immediate consequence is that Pr_x[|p(x)| > d^{−O(d)}] ≥ 1/2 for a degree-d polynomial p with Var[p] = 1. With this observation and Theorem 4.6, it is easy to deduce the following bound. (See Lemma 5 in [9].)
Theorem 7.7. Let a, b : R^n → R be degree-d polynomials such that E_x[a(x) − b(x)] = 0 and Var[a − b] ≤ (τ/d)^{3d} · Var[a]. Then, Pr_x[sign(a(x)) ≠ sign(b(x))] = O(τ).
Before we proceed further, we make a minor simplifying assumption: the polynomials defining f in Theorem 5.2 can be assumed to be multilinear. As we have said earlier, this is because the results in [9] are stated for multilinear polynomials (though it should be possible to carry them over to non-multilinear polynomials). In order to show this, we will use the following simple lemma.
Lemma 7.8. Let f : R^n → [k] be a degree-d, (d, ε)-balanced PTF. Then, for any ε > 0 and t > 0, there exists a degree-d, (d, 2ε)-balanced multilinear PTF f_multi : R^ℓ → [k] such that
• ‖E_x[f(x)] − E_x[f_multi(x)]‖_1 ≤ ε,
• |E_x[⟨f, P_t f⟩] − E_x[⟨f_multi, P_t f_multi⟩]| ≤ ε, and
• Pr_x[x ∈ Collision(f_multi)] ≤ Pr_x[x ∈ Collision(f)] + ε.
Here ℓ = n · (dk/ε)^{3d} · d².
Proof. Let f = PTF(p^{(1)}, ..., p^{(k)}), where p^{(1)}, ..., p^{(k)} are (d, ε)-balanced. Let p^{(i)} = ∑_{q=0}^d I_q(f_q^{(i)}) for f_q^{(i)} ∈ H^{⊗q}. For some T ∈ N which will be fixed later, consider a collection of ℓ = n·T variables denoted by {x_{i,j}}_{1≤i≤n, 1≤j≤T}. For s = 1, ..., k, we define the polynomial r^{(s)} (in the variables {x_{i,j}}) by

r^{(s)}({x_{i,j}}) = p^{(s)}( (x_{1,1} + ··· + x_{1,T})/√T, ..., (x_{n,1} + ··· + x_{n,T})/√T ).

In terms of the tensor representations, if f_q^{(i)} = ∑_{1≤j_1,...,j_q≤n} α^{(i)}_{j_1,...,j_q} e_{j_1} ⊗ ··· ⊗ e_{j_q} and we define g_q^{(i)} by r^{(i)} = ∑_{q=0}^d I_q(g_q^{(i)}), then

g_q^{(i)} = ∑_{1≤j_1,...,j_q≤n} α^{(i)}_{j_1,...,j_q} · ( (e_{j_1,1} + ··· + e_{j_1,T})/√T ) ⊗ ··· ⊗ ( (e_{j_q,1} + ··· + e_{j_q,T})/√T ).
Next, define f′ : R^ℓ → [k] as f′ = PTF(r^{(1)}, ..., r^{(k)}). Since the joint distribution of (r^{(1)}, ..., r^{(k)}) under γ_ℓ equals the joint distribution of (p^{(1)}, ..., p^{(k)}) under γ_n, we have that for 1 ≤ i ≤ k,

Pr_{x∼γ_n}[f(x) = i] = Pr_{x∼γ_ℓ}[f′(x) = i],
E_{x∼γ_n}[⟨f, P_t f⟩] = E_{x∼γ_ℓ}[⟨f′, P_t f′⟩],
Pr_{x∼γ_n}[x ∈ Collision(f)] = Pr_{x∼γ_ℓ}[x ∈ Collision(f′)].   (7.2)
To construct a multilinear polynomial, let h_q^{(i)} be the orthogonal projection of g_q^{(i)} onto the space of square-free tensors, and define the polynomials w^{(1)}, ..., w^{(k)} by w^{(i)} = ∑_{q=0}^d I_q(h_q^{(i)}). Finally, we set f_multi : R^ℓ → [k] as f_multi = PTF(w^{(1)}, ..., w^{(k)}) and we claim that it has the desired properties. By its construction, f_multi is a degree-d multilinear PTF. Next, for 1 ≤ j_1, ..., j_q ≤ n, let us define S_{j_1,...,j_q} as

S_{j_1,...,j_q} = { (s_1, ..., s_q) ∈ [T]^q : |{(j_1, s_1), ..., (j_q, s_q)}| < q }.

With this definition, observe that

r^{(i)} − w^{(i)} = ∑_{q=1}^d ∑_{1≤j_1,...,j_q≤n} α^{(i)}_{j_1,...,j_q} · (1/T^{q/2}) · ∑_{(s_1,...,s_q)∈S_{j_1,...,j_q}} I_q( ⊗_{u=1}^q e_{j_u,s_u} ).
As the tensors {⊗Tu=1 e ju ,su } ju ∈[n],su ∈[T ] are orthonormal, using Proposition 7.3, we get that
d |S j1 ,..., jq |
(i)
Var[r(i) − w(i) ] = ∑ ∑ (α j1 ,..., jq )2 · .
q=1 1≤ j1 ,..., jq ≤n Tq
On the other hand,
$$\mathrm{Var}[r^{(i)}] = \sum_{q=1}^{d} \ \sum_{1 \le j_1, \ldots, j_q \le n} \big(\alpha^{(i)}_{j_1,\ldots,j_q}\big)^2.$$
THEORY OF COMPUTING, Volume 15 (6), 2019, pp. 1–47
It is easy to see that $|S_{j_1,\ldots,j_q}| \le |S_{1,\ldots,1}| \le q^2 T^{q-1} \le d^2 T^{q-1}$. Thus,
$$\mathrm{Var}[r^{(i)} - w^{(i)}] \le \mathrm{Var}[r^{(i)}] \cdot \frac{d^2}{T}.$$
By applying Theorem 7.7, we get that if $T \ge (d/\tau)^{3d} \cdot d^2$, then
$$\Pr_{x\sim\gamma_\ell}\big[\mathrm{sign}(r^{(i)}) \ne \mathrm{sign}(w^{(i)})\big] \le \tau.$$
By a union bound, it follows that $\Pr_{x\sim\gamma_\ell}[f_{\mathrm{multi}}(x) \ne f'(x)] \le k \cdot \tau$ and
$$\Pr_{x\sim\gamma_\ell}[x \in \mathrm{Collision}(f_{\mathrm{multi}})] \le \Pr_{x\sim\gamma_\ell}[x \in \mathrm{Collision}(f')] + k \cdot \tau.$$
Combining this with (7.2) and setting $\tau = \varepsilon/k$ gives us the claim. Also, by an application of the Cauchy--Schwarz inequality, we have that
$$\sqrt{\mathrm{Var}(w^{(i)})} \ \ge \ \sqrt{\mathrm{Var}(r^{(i)})} \cdot \left(1 - \frac{d}{\sqrt{T}}\right) \ \ge \ \sqrt{\mathrm{Var}(r^{(i)})} \cdot (1 - \varepsilon).$$
Hence, by rescaling the polynomials $w^{(i)}$, we can ensure that $f_{\mathrm{multi}}$ is a $(d, 2\varepsilon)$-balanced PTF.
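As a quick numerical sanity check (ours, not part of the paper), the following Monte-Carlo sketch instantiates the splitting step for the single degree-2 polynomial $p(x) = x^2 - 1$: the discarded non-multilinear (diagonal) part of $r$ has variance exactly $\mathrm{Var}(r)/T$, comfortably inside the $d^2/T$ bound above. All names and parameters here are our own choices for illustration.

```python
import numpy as np

# Splitting x into T coordinates turns p(x) = x^2 - 1 into
# r = ((x_1 + ... + x_T)/sqrt(T))^2 - 1.  Its multilinear part w keeps the
# off-diagonal products x_j x_{j'}; the discarded diagonal part is
# r - w = (1/T) * sum_j (x_j^2 - 1), with variance 2/T = Var(r)/T.
rng = np.random.default_rng(0)
T, N = 10, 400_000
x = rng.standard_normal((N, T))
r = (x.sum(axis=1) / np.sqrt(T)) ** 2 - 1
diag = (x ** 2 - 1).sum(axis=1) / T        # r - w, computed directly
print(diag.var())                          # close to 2/T = 0.2
print(r.var())                             # close to Var(p) = 2
```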
7.3 Eigenvalues of tensors and polynomials
We will now define the notion of eigenvalues of tensors and the corresponding polynomials.
Definition 7.9. Let $f \in \mathsf{H}^{d}$ and $d \ge 2$. For a partition of $[d]$ into $S, \bar{S}$, define
$$\lambda_{S,\bar{S}}(f) = \max_{g \in \mathsf{H}^{\otimes|S|},\, h \in \mathsf{H}^{\otimes|\bar{S}|}} \frac{\langle f, g \otimes h\rangle}{\|g\|_F \cdot \|h\|_F}.$$
Define $\lambda_{\max}(f)$ by
$$\lambda_{\max}(f) = \max_{S \,:\, 0 < |S| < d} \lambda_{S,\bar{S}}(f).$$
Definition 7.10. Let $p = \sum_{q=0}^{d} I_q(f_q)$ be a multilinear polynomial of degree $d > 0$, and let $(f_0, \ldots, f_d) \in \mathsf{H}^0 \times \cdots \times \mathsf{H}^d$ be the tensors associated with $p$. Define
$$\lambda_{\max}(p) = \max_{1 < q \le d} \lambda_{\max}(f_q).$$
(Note that we exclude $\lambda_{\max}(f_1)$.) Further, we say that $p$ is $\delta$-eigenregular if $\frac{\lambda_{\max}(p)}{\sqrt{\mathrm{Var}(p)}} \le \delta$.
It is easy to check that eigenvalues of tensors are invariant under unitary transformations, from which
it follows that eigenregularity is also unaffected by certain changes of variables:
Fact 7.11. If $p(x)$ is $\delta$-eigenregular then for any $0 < \rho < 1$, $p\big(\rho \cdot x + \sqrt{1-\rho^2}\, y\big)$ is also $\delta$-eigenregular.
The next fact states that the contraction product with an eigenregular tensor is significantly contractive.

Fact 7.12. Let $f \in \mathsf{H}^d$ satisfy $\lambda_{\max}(f) \le \kappa$. For any $g \in \mathsf{H}^{\otimes d'}$ and any $0 < r \le \min\{d-1, d'\}$,
$$\|f \otimes_r g\|_F \le \kappa \cdot \|g\|_F.$$

Proof. For a fixed orthonormal basis of $\mathsf{H}$, we represent $f$ and $g$ with the matrices $M_f$ and $M_g$ as discussed above, where $M_f$ is an $n^{d-r} \times n^r$ matrix and $M_g$ is $n^r \times n^{d'-r}$. The condition $\lambda_{\max}(f) \le \kappa$ implies that $\|M_f\|_{\mathrm{op}} \le \kappa$, where $\|\cdot\|_{\mathrm{op}}$ denotes the operator norm. For $S \in [n]^{d'-r}$, let $M_{g,S}$ denote the corresponding column of $M_g$. Then the corresponding column of $M_f \cdot M_g$ is given by $M_f M_{g,S}$. Thus,
$$\|M_f \cdot M_g\|_F^2 = \sum_{S \in [n]^{d'-r}} \|M_f \cdot M_{g,S}\|_F^2 \le \sum_{S \in [n]^{d'-r}} \kappa^2 \cdot \|M_{g,S}\|_F^2 = \kappa^2 \cdot \|M_g\|_F^2.$$
This finishes the proof.
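In matrix form the argument is a one-liner, which the following sketch (ours, with arbitrary small dimensions) checks numerically: once the matricization $M_f$ has operator norm at most $\kappa$, every column of $M_f M_g$ shrinks by the same factor.

```python
import numpy as np

# Fact 7.12 as a matrix statement: ||Mf @ Mg||_F <= kappa * ||Mg||_F
# whenever ||Mf||_op <= kappa.  Dimensions are hypothetical stand-ins for
# the n^{d-r} x n^r and n^r x n^{d'-r} matricizations.
rng = np.random.default_rng(1)
kappa = 0.3
A = rng.standard_normal((8, 5))
Mf = kappa * A / np.linalg.norm(A, ord=2)   # force ||Mf||_op = kappa
Mg = rng.standard_normal((5, 7))            # matricization of g
lhs = np.linalg.norm(Mf @ Mg, ord="fro")
rhs = kappa * np.linalg.norm(Mg, ord="fro")
assert lhs <= rhs + 1e-12
```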
We next have the following easy proposition, which says that the sum of copies of the same polynomial over disjoint variables is eigenregular.

Fact 7.13. Let $p : \mathbb{R}^n \to \mathbb{R}$ and define $\tilde{p} : \mathbb{R}^{nk} \to \mathbb{R}$ by
$$\tilde{p}(X_1, \ldots, X_k) = \frac{1}{\sqrt{k}} \cdot \big(p(X_1) + \cdots + p(X_k)\big),$$
where each $X_i = (x_{i,1}, \ldots, x_{i,n})$ is a disjoint "block" of $n$ variables. Then $\tilde{p}$ is $\frac{1}{\sqrt{k}}$-eigenregular.

Proof. Let $d$ be the degree of $\tilde{p}$ and define the tensors $f_q$ by $\tilde{p}(X_1, \ldots, X_k) = \sum_{q=0}^{d} I_q(f_q)$. Let $\mathsf{H} = \mathbb{R}^{nk}$ and let $\mathsf{H} = \mathsf{H}_1 \oplus \cdots \oplus \mathsf{H}_k$ where $\mathsf{H}_i = \mathbb{R}^n$ corresponds to the coordinates in $X_i$. Note that $f_q$ has a block diagonal structure on $\mathsf{H}_1 \oplus \cdots \oplus \mathsf{H}_k$, in the sense that $f_q(h_1, \ldots, h_q) = 0$ whenever two of the $h_i$ belong to different components $\mathsf{H}_j$.

For any $g \in \mathsf{H}^{\otimes q}$, let $g^{(i)} \in \mathsf{H}_i^{\otimes q}$ denote the $i$th "diagonal block" of $g$, defined by
$$g^{(i)}(h_1, \ldots, h_q) = g(\iota(h_1), \ldots, \iota(h_q)),$$
where $\iota : \mathsf{H}_i \to \mathsf{H}$ is the inclusion map. The definition of $\tilde{p}$ ensures that $\|f_q^{(1)}\|_F^2 = \cdots = \|f_q^{(k)}\|_F^2$ for every $q$, and the fact that $f_q$ is block diagonal ensures that $\|f_q\|_F^2 = \sum_{i=1}^{k} \|f_q^{(i)}\|_F^2 = k\,\|f_q^{(1)}\|_F^2$. Assuming without loss of generality that $\mathrm{Var}(\tilde{p}) = \mathrm{Var}(p) = 1$, we have $\|f_q^{(i)}\|_F \le k^{-1/2}\|f_q\|_F \le k^{-1/2}$ for every $i$.
For any non-trivial partition $S, \bar{S}$ of $[q]$, we have
$$\lambda_{S,\bar{S}}(f_q) = \max_{g \in \mathsf{H}^{\otimes|S|},\, h \in \mathsf{H}^{\otimes|\bar{S}|}} \frac{\langle f_q, g \otimes h\rangle}{\|g\|_F \cdot \|h\|_F} = \max_{g,h} \frac{\sum_{i=1}^{k} \langle f_q^{(i)}, g^{(i)} \otimes h^{(i)}\rangle}{\|g\|_F \cdot \|h\|_F} \le \max_{g,h} \frac{\sum_{i=1}^{k} \langle f_q^{(i)}, g^{(i)} \otimes h^{(i)}\rangle}{\sqrt{\sum_{i=1}^{k} \|g^{(i)}\|_F^2} \cdot \sqrt{\sum_{i=1}^{k} \|h^{(i)}\|_F^2}}$$
$$\le \max_{g,h} \frac{\sum_{i=1}^{k} \frac{1}{\sqrt{k}}\,\|g^{(i)} \otimes h^{(i)}\|_F}{\sqrt{\sum_{i=1}^{k} \|g^{(i)}\|_F^2} \cdot \sqrt{\sum_{i=1}^{k} \|h^{(i)}\|_F^2}} = \max_{g,h} \frac{\sum_{i=1}^{k} \frac{1}{\sqrt{k}}\,\|g^{(i)}\|_F \cdot \|h^{(i)}\|_F}{\sqrt{\sum_{i=1}^{k} \|g^{(i)}\|_F^2} \cdot \sqrt{\sum_{i=1}^{k} \|h^{(i)}\|_F^2}} \le \frac{1}{\sqrt{k}},$$
where the last equality uses the fact that $\|A \otimes B\|_F = \|A\|_F \cdot \|B\|_F$, and the last inequality is a consequence of Cauchy--Schwarz. This implies that $\lambda_{\max}(f_q) \le 1/\sqrt{k}$ for all $q$, and hence $\lambda_{\max}(\tilde{p}) \le 1/\sqrt{k}$.
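The degree-2 case of this computation can be checked directly (our sketch, with hypothetical sizes): the matricization of the quadratic part of $\tilde{p}$ is block diagonal with $k$ identical blocks, and its operator norm — which is exactly what $\lambda_{S,\bar{S}}$ computes for a 2-tensor — is the operator norm of a single block, hence at most $1/\sqrt{k}$ after normalizing the whole tensor.

```python
import numpy as np

# Block-diagonal matrix with k identical blocks, each carrying a 1/k
# fraction of the squared Frobenius norm.  Operator norm of the whole
# matrix = operator norm of one block <= Frobenius norm of one block
# = 1/sqrt(k).
rng = np.random.default_rng(2)
k, n = 5, 4
B = rng.standard_normal((n, n))
B /= np.linalg.norm(B, ord="fro")            # one block, Frobenius norm 1
F = np.kron(np.eye(k), B) / np.sqrt(k)       # block diagonal, ||F||_F = 1
assert abs(np.linalg.norm(F, ord="fro") - 1.0) < 1e-12
assert np.linalg.norm(F, ord=2) <= 1 / np.sqrt(k) + 1e-12
```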
One of the two main results that we will use from [9] is a Central Limit Theorem for Gaussian
polynomials:
Theorem 7.14. Let $d > 1$ and let $p_1, \ldots, p_t : \mathbb{R}^n \to \mathbb{R}$ be $t$ degree-$d$ polynomials such that for $1 \le i \le t$, $\mathbb{E}[p_i] = 0$ and $\mathrm{Var}(p_i) \le 1$ and each $p_i$ is $\varepsilon$-eigenregular. Let $F = (p_1(x), \ldots, p_t(x))$ where $x \sim \gamma_n$ and let $C$ denote the covariance matrix of $F$, i.e., $C[i,j] = \mathbb{E}_{x\sim\gamma_n}[p_i(x) \cdot p_j(x)]$. Let $Z = (Z_1, \ldots, Z_t)$ be a $t$-dimensional Gaussian with mean zero and covariance $C$. Then, for any $\alpha : \mathbb{R}^t \to \mathbb{R}$ such that $\alpha \in C^2$,
$$\big|\mathbb{E}_{x\sim\gamma_n}[\alpha(p_1(x), \ldots, p_t(x))] - \mathbb{E}_Z[\alpha(Z_1, \ldots, Z_t)]\big| \le 2^{O(d \log d)} \cdot t^2 \cdot \sqrt{\varepsilon} \cdot \|\alpha''\|_\infty.$$
7.4 Polynomial regularity lemma
The next result we will need from [9] is a regularity lemma for low-degree polynomials. Suppose that
we begin with a collection of k(d + 1) polynomials {ps,q : 1 ≤ s ≤ k, 0 ≤ q ≤ d}, where ps,q ∈ Wq and
Var[ps,q ] = 1 for all s and q. We will consider “decompositions” of these polynomials consisting of the
following objects:
• for every 1 ≤ s ≤ k and 0 ≤ q ≤ d, an “outer” polynomial Out(ps,q ) of arity num(s, q) ∈ N; and
• for every $1 \le s \le k$, $0 \le q \le d$, and $1 \le \ell \le \mathrm{num}(s,q)$, an "inner" polynomial $\mathrm{In}(p_{s,q})_\ell$.
We also introduce the following statistics of these polynomials:
• the “total arity,” Num = ∑ks=1 ∑dq=0 num(s, q);
• Coeff(ps,q ), defined to be the sum of the absolute values of the coefficients of Out(ps,q ); and
• Coeff = ∑ks=1 ∑dq=0 Coeff(ps,q ).
Definition 7.15. Consider polynomials $\{p_{s,q}\}$ and a function $\beta : [1,\infty) \to (0,1)$ satisfying $\beta(x) \le 1/x$. Given collections of polynomials $\{\mathrm{Out}(p_{s,q})\}$ and $\{\mathrm{In}(p_{s,q})_\ell\}$ as above, we say that these polynomials are a $(\beta, M, N)$ decomposition of $\{p_{s,q}\}$ if the following conditions hold:
1. for every $1 \le s \le k$, every $0 \le q \le d$, and every $x \in \mathbb{R}^n$,
$$p_{s,q}(x) = \mathrm{Out}(p_{s,q})\big(\mathrm{In}(p_{s,q})_1(x), \ldots, \mathrm{In}(p_{s,q})_{\mathrm{num}(s,q)}(x)\big); \quad (7.3)$$
2. for every 1 ≤ s ≤ k, every 1 ≤ q ≤ d, and every 1 ≤ ` ≤ num(s, q), the polynomial In(ps,q )` lies in
W j for some 1 ≤ j ≤ q and satisfies Var[In(ps,q )` ] = 1;
3. for every 1 ≤ s ≤ k and every 1 ≤ q ≤ d, Out(ps,q ) is a multilinear polynomial of degree at most d;
4. Coeff ≤ M and Num ≤ N; and
5. for every 1 ≤ s ≤ k, every 1 ≤ q ≤ d, and every 1 ≤ ` ≤ num(s, q), In(ps,q )` is β (Num + Coeff)-
eigenregular.
Intuitively, the above definition states the following: given a decreasing function $\beta(\cdot)$ and a collection of $k$ multilinear polynomials of degree $d$, a $(\beta, M, N)$ decomposition expresses the polynomials as "outer polynomials" composed with "inner polynomials." The quality of this decomposition is captured by the following parameters:
• the sum of the arities (denoted by Num, and bounded by $N$) and the sum of absolute values of coefficients (denoted by Coeff, and bounded by $M$) of the outer polynomials; and
• the eigenregularity of the inner polynomials (bounded in terms of $\beta$).
The main decomposition theorem of [9] says that approximate (β , M, N) decompositions exist and
can be computed.
Theorem 7.16. Fix d ≥ 2 and fix any non-increasing computable function β : [1, ∞) → (0, 1) that satisfies
β (x) ≤ 1/x. There is an algorithm MultiRegularize-Many-Wienersd,β with the following properties.
The algorithm takes as input:
• a collection {ps,q : 1 ≤ s ≤ k, 0 ≤ q ≤ d} of multilinear polynomials satisfying ps,q ∈ Wq and
Var[ps,q ] = 1 for every s, q; and
• a parameter τ > 0.
The output of the algorithm consists of
• a collection { p̃s,q : 1 ≤ s ≤ k, 0 ≤ q ≤ d} of polynomials satisfying (for every s and q) p̃s,q ∈ Wq ,
p̃s,0 = ps,0 , and Var[ps,q − p̃s,q ] ≤ τ;
• two numbers M = Mβ (k, d, τ) and N = Nβ (k, d, τ); and
• a (β , M, N) decomposition of { p̃s,q }.
The functions Mβ (k, d, τ) and Nβ (k, d, τ) are computable but not necessarily primitive recursive.
We remark that our use of Theorem 7.16 from [9] is precisely the reason why the quantity n0 in
Theorem 2.2 is not primitive recursive.
7.5 Second moments and approximate equivalence
As we mentioned in the outline of the proof of Theorem 5.2, one of the main steps in the proof is (after
applying a (β , M, N) decomposition) to swap out one collection of inner polynomials for a different one.
The main result of this section essentially says that in order to do so, it suffices to preserve the second
moments of all the inner polynomials:
Lemma 7.17. Let $\Psi : \mathbb{R}^{\mathrm{Num}_\Psi} \to \mathbb{R}$ be a degree-$d$ polynomial, and let $\mathrm{Coeff}_\Psi$ be the sum of the absolute values of its coefficients. Let $\{A_i\}_{1 \le i \le \mathrm{Num}_\Psi}$ and $\{B_i\}_{1 \le i \le \mathrm{Num}_\Psi}$ be centered families of polynomials of variance 1 such that for all $1 \le i \le \mathrm{Num}_\Psi$, there exists $0 < j \le d$ with $A_i, B_i \in W_j$. Assume, moreover, that for all $1 \le i \le j \le \mathrm{Num}_\Psi$, $\mathbb{E}[A_i A_j] = \mathbb{E}[B_i B_j]$. If each $A_i, B_i$ is $\zeta$-eigenregular where $\zeta \cdot \mathrm{Coeff}_\Psi \cdot 2^{\mathrm{Num}_\Psi \cdot (\mathrm{Num}_\Psi \cdot d + 1)} \le \kappa$, then $\big|\mathbb{E}[\Psi(A_1, \ldots, A_{\mathrm{Num}_\Psi})] - \mathbb{E}[\Psi(B_1, \ldots, B_{\mathrm{Num}_\Psi})]\big| \le \kappa$.
Our main tool in the proof of Lemma 7.17 is the following lemma, which says that the expectation of
a product of eigenregular polynomials is essentially determined by their second moments.
Lemma 7.18. Let $p_1, \ldots, p_t : \mathbb{R}^n \to \mathbb{R}$ be $\kappa$-eigenregular polynomials with variance 1, such that $p_i \in W_{q_i}$ for every $i$. Define $C \in \mathbb{R}^{t \times t}$ by $C(i,j) = \langle p_i, p_j\rangle$. There exists a function $F : \mathbb{R}^{t \times t} \times \mathbb{Z}^t \to \mathbb{R}$ such that
$$\left|\mathbb{E}\left[\prod_{i=1}^{t} p_i\right] - F(C, q_1, \ldots, q_t)\right| \le 2^{t(q_1 + \cdots + q_t + 1)} \cdot \kappa.$$
In other words, up to the error term of $2^{t(q_1 + \cdots + q_t + 1)} \cdot \kappa$, the expectation of the product $\prod_{i=1}^{t} p_i$ depends only on the degrees of the polynomials and on their covariance matrix.
Proof of Lemma 7.17. Applying Lemma 7.18 and the triangle inequality, we have
$$\big|\mathbb{E}[\Psi(A_1, \ldots, A_{\mathrm{Num}_\Psi})] - \mathbb{E}[\Psi(B_1, \ldots, B_{\mathrm{Num}_\Psi})]\big| \le \zeta \cdot \mathrm{Coeff}_\Psi \cdot 2^{\mathrm{Num}_\Psi \cdot (\mathrm{Num}_\Psi \cdot d + 1)}.$$
Applying the assumption on $\zeta$ completes the proof.
Before proving Lemma 7.18, we will explore some further consequences. The following lemma is
along the same lines as Lemma 7.17, but it concerns the sign of Ψ instead of just its expectation.
Lemma 7.19. Fix $\varepsilon > 0$ and $\eta > 0$. Let $\Psi$, $A_i$, and $B_i$ satisfy the assumptions of the previous lemma, but assume that $A_i$ and $B_i$ are $\zeta$-eigenregular for $\zeta$ satisfying
$$\zeta \cdot \mathrm{Coeff}_\Psi^2 \cdot 2^{\mathrm{Num}_\Psi \cdot (\mathrm{Num}_\Psi \cdot 2d + 1)} \cdot 2^{O(d^3)} \le \frac{\eta}{4}, \qquad 2^{O(d \log d)} \cdot \mathrm{Num}_\Psi^2 \cdot (4c^2) \cdot \zeta \le \varepsilon,$$
where $c$ is defined by
$$B = \Omega\left(\ln \frac{\mathrm{Num}_\Psi \cdot d}{\varepsilon}\right)^{d/2}; \qquad \delta = \frac{\varepsilon \cdot \eta^{1/2d}}{\sqrt{d} \cdot B^{d} \cdot \mathrm{Num}_\Psi \cdot \mathrm{Coeff}_\Psi^{1/d}}; \qquad c = \frac{\mathrm{Num}_\Psi}{\sqrt{\delta \cdot \varepsilon}}.$$
If $\mathrm{Var}(\Psi(A_1, \ldots, A_{\mathrm{Num}_\Psi})) \ge \eta$ and $|\mathbb{E}[\Psi(A_1, \ldots, A_{\mathrm{Num}_\Psi})]| \le \eta$ then
$$\big|\mathbb{E}[\mathrm{sign}(\Psi(A_1, \ldots, A_{\mathrm{Num}_\Psi}))] - \mathbb{E}[\mathrm{sign}(\Psi(B_1, \ldots, B_{\mathrm{Num}_\Psi}))]\big| \le O(\varepsilon).$$
One step in the proof of Lemma 7.19 is a kind of mollification for the sign function:

Lemma 7.20. Fix $\varepsilon > 0$, $\eta > 0$. Let $\Psi$ be a degree-$d$ polynomial of arity $\mathrm{Num}_\Psi$, and let $B$, $\delta$, and $c$ be defined as in Lemma 7.19. There is a function $\tilde{g}_c : \mathbb{R}^{\mathrm{Num}_\Psi} \to [0,1]$ satisfying $\|\tilde{g}_c''\|_\infty \le 4c^2$ such that for every collection $\{A_i\}_{1 \le i \le \mathrm{Num}_\Psi}$ of centered, unit-variance polynomials in $n$ variables that satisfy $\mathrm{Var}(\Psi(A_1, \ldots, A_{\mathrm{Num}_\Psi})) \ge \eta$, we have
$$\mathbb{E}_{x\sim\gamma_n}\big[\big|\mathrm{sign}(\Psi(A_1, \ldots, A_{\mathrm{Num}_\Psi})) - \tilde{g}_c(A_1, \ldots, A_{\mathrm{Num}_\Psi})\big|\big] \le O(\varepsilon).$$
Proof. Let $R$ denote the set $\{x : \mathrm{sign}(\Psi(x)) = 1\}$. By a standard argument based on convolution with a mollifier, there is a function $\tilde{g}_c : \mathbb{R}^{\mathrm{Num}_\Psi} \to [0,1]$ such that $\|\tilde{g}_c''\|_\infty \le 4c^2$ and
$$\big|\mathrm{sign}(\Psi(x)) - \tilde{g}_c(x)\big| \le \min\left(1,\ \frac{\mathrm{Num}_\Psi^2}{c^2 \cdot \mathrm{dist}^2(x, \partial R)}\right) \quad (7.4)$$
for all $x$, where $\mathrm{dist}(x, \partial R)$ denotes the Euclidean distance of $x$ from the boundary of $R$.

It is not hard to check (and follows immediately from Claim 49 in [9]) that if $x \in \mathbb{R}^{\mathrm{Num}_\Psi}$ satisfies $\|x\|_\infty \le B$ and $|\Psi(x)| \ge d \cdot (2B)^d \cdot \delta \cdot \mathrm{Coeff}_\Psi \cdot \mathrm{Num}_\Psi^{d/2}$, then $\mathrm{dist}(x, \partial R) \ge \delta$, in which case (7.4) implies that $|\mathrm{sign}(\Psi(x)) - \tilde{g}_c(x)| \le \mathrm{Num}_\Psi^2/(c^2\delta^2)$. Rearranging the logic, if $|\mathrm{sign}(\Psi(A_1(x), \ldots)) - \tilde{g}_c(A_1(x), \ldots)| > \mathrm{Num}_\Psi^2/(c^2\delta^2)$ then either $\max_j |A_j(x)| > B$ or $|\Psi(A_1(x), \ldots, A_{\mathrm{Num}_\Psi}(x))| \le d \cdot (2B)^d \cdot \delta \cdot \mathrm{Coeff}_\Psi \cdot \mathrm{Num}_\Psi^{d/2}$. Hence,
$$\mathbb{E}_{x\sim\gamma_n}\big[|\mathrm{sign}(\Psi(A_1, \ldots, A_{\mathrm{Num}_\Psi})) - \tilde{g}_c(A_1, \ldots, A_{\mathrm{Num}_\Psi})|\big]$$
$$\le \Pr_{x\sim\gamma_n}\Big[\max_j |A_j| > B\Big] + \Pr_{x\sim\gamma_n}\Big[|\Psi(A_1, \ldots, A_{\mathrm{Num}_\Psi})| \le d \cdot (2B)^d \cdot \delta \cdot \mathrm{Coeff}_\Psi \cdot \mathrm{Num}_\Psi^{d/2}\Big] + \frac{\mathrm{Num}_\Psi^2}{c^2\delta^2}.$$
We now bound each term on the right hand side. For the first term, each $A_j$ is a centered, unit-variance, degree-$d$ polynomial and so Theorem 4.6 gives
$$\Pr_{x\sim\gamma_n}\Big[\max_j |A_j(x)| > B\Big] \le \mathrm{Num}_\Psi \cdot d \cdot \exp\big(-B^{2/d}\big).$$
To bound the second term, apply Theorem 7.6 and the assumption that $\mathrm{Var}(\Psi(A_1, \ldots, A_{\mathrm{Num}_\Psi})) \ge \eta$:
$$\Pr_{x\sim\gamma_n}\Big[|\Psi(A_1, \ldots, A_{\mathrm{Num}_\Psi})| \le d \cdot (2B)^d \cdot \delta \cdot \mathrm{Coeff}_\Psi \cdot \mathrm{Num}_\Psi^{d/2}\Big] \le \frac{\sqrt{d} \cdot (2B) \cdot \delta^{1/d} \cdot \mathrm{Coeff}_\Psi^{1/d} \cdot \mathrm{Num}_\Psi^{1/2}}{\eta^{1/2d}}.$$
Plugging in our choice of parameters completes the proof.
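A one-dimensional toy instance (ours, not the construction from [9]) makes the shape of (7.4) concrete: smoothing the 0/1 step function with a Gaussian of width $1/c$ gives $\tilde{g}(x) = \Phi(cx)$, whose second derivative is about $0.24\,c^2$ (well within the $4c^2$ budget) and which deviates from the step by at most $1/(cx)^2$ at distance $|x|$ from the boundary.

```python
import math

# g(x) = Phi(c*x): a mollified step.  |g''(x)| = c^2 * |u| * phi(u) at
# u = c*x, maximized near u = 1; and 1 - Phi(u) <= phi(u)/u <= 1/u^2 for
# u > 0 gives the quadratic decay away from the boundary.
def phi(u):
    return math.exp(-u * u / 2.0) / math.sqrt(2.0 * math.pi)

def Phi(u):
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

c = 10.0
grid = [i / 100.0 for i in range(-500, 501)]           # x in [-5, 5]
max_second = max(c * c * abs(c * x) * phi(c * x) for x in grid)
assert max_second <= 4 * c * c                          # smoothness budget
for x in (0.1, 0.5, 1.0, 2.0):                          # decay off the boundary
    assert abs(1.0 - Phi(c * x)) <= 1.0 / (c * x) ** 2
```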
Proof of Lemma 7.19. Consider the polynomial $\Upsilon(z_1, \ldots, z_{\mathrm{Num}_\Psi}) = (\Psi(z_1, \ldots, z_{\mathrm{Num}_\Psi}))^2$; let $\mathrm{Num}_\Upsilon$ be its arity and let $\mathrm{Coeff}_\Upsilon$ be the sum of the absolute values of its coefficients. Observe that $\Upsilon$ is a degree-$2d$ polynomial, and that $\mathrm{Coeff}_\Upsilon \le \mathrm{Coeff}_\Psi^2$ and $\mathrm{Num}_\Upsilon = \mathrm{Num}_\Psi$. Now, applying Lemma 7.17 to the function $\Upsilon$,
$$\big|\mathbb{E}[\Psi^2(A_1, \ldots, A_{\mathrm{Num}_\Psi})] - \mathbb{E}[\Psi^2(B_1, \ldots, B_{\mathrm{Num}_\Psi})]\big| \le \eta/4. \quad (7.5)$$
By Lemma 7.17 applied to the function $\Psi$,
$$\big|\mathbb{E}[\Psi(A_1, \ldots, A_{\mathrm{Num}_\Psi})] - \mathbb{E}[\Psi(B_1, \ldots, B_{\mathrm{Num}_\Psi})]\big| \le \zeta \cdot \mathrm{Coeff}_\Psi \cdot 2^{\mathrm{Num}_\Psi \cdot (\mathrm{Num}_\Psi \cdot d + 1)} \le \frac{\eta}{4 \cdot \mathrm{Coeff}_\Psi \cdot 2^{\mathrm{Num}_\Psi^2 \cdot d} \cdot 2^{O(d^3)}}. \quad (7.6)$$
Hölder's inequality and the Gaussian hypercontractive inequality (see Theorem 5.10 in [18]) imply that for any monomial in the $A_i$, $\mathbb{E}[\prod_{j=1}^{d} A_{i_j}] \le \prod_{j=1}^{d} \mathbb{E}[|A_{i_j}|^d]^{1/d} \le d^{2d} \le 2^{O(d^2)}$; by the triangle inequality, it follows that $\mathbb{E}[\Psi(A_1, \ldots, A_{\mathrm{Num}_\Psi})] \le \mathrm{Coeff}_\Psi \cdot 2^{O(d^3)}$. The same bound applies with $B$ in place of $A$, and together with (7.6) this yields
$$\big|\mathbb{E}[\Psi(A_1, \ldots, A_{\mathrm{Num}_\Psi})]^2 - \mathbb{E}[\Psi(B_1, \ldots, B_{\mathrm{Num}_\Psi})]^2\big| \le \eta/4. \quad (7.7)$$
Combining (7.5) and (7.7), we get that $|\mathrm{Var}(\Psi(A_1, \ldots, A_{\mathrm{Num}_\Psi})) - \mathrm{Var}(\Psi(B_1, \ldots, B_{\mathrm{Num}_\Psi}))| \le \eta/2$ and thus
$$\mathrm{Var}(\Psi(B_1, \ldots, B_{\mathrm{Num}_\Psi})) \ge \eta/2.$$
Applying Lemma 7.20, we obtain a function $\tilde{g}_c : \mathbb{R}^{\mathrm{Num}_\Psi} \to [0,1]$ such that $\|\tilde{g}_c''\|_\infty \le 4c^2$,
$$\big|\mathbb{E}[\mathrm{sign}(\Psi(A_1, \ldots, A_{\mathrm{Num}_\Psi}))] - \mathbb{E}[\tilde{g}_c(A_1, \ldots, A_{\mathrm{Num}_\Psi})]\big| \le O(\varepsilon), \quad \text{and}$$
$$\big|\mathbb{E}[\mathrm{sign}(\Psi(B_1, \ldots, B_{\mathrm{Num}_\Psi}))] - \mathbb{E}[\tilde{g}_c(B_1, \ldots, B_{\mathrm{Num}_\Psi})]\big| \le O(\varepsilon).$$
Finally, applying Theorem 7.14, we have
$$\big|\mathbb{E}[\tilde{g}_c(B_1, \ldots, B_{\mathrm{Num}_\Psi})] - \mathbb{E}[\tilde{g}_c(A_1, \ldots, A_{\mathrm{Num}_\Psi})]\big| \le 2^{O(d \log d)} \cdot \mathrm{Num}_\Psi^2 \cdot (4c^2) \cdot \zeta \le \varepsilon.$$
The last inequality uses the second condition on $\zeta$. Combining these three inequalities, we obtain the proof.
In the remainder of this section, we prove Lemma 7.18. Let $p_i$ and $q_i$ be as in the statement of Lemma 7.18, and let $h_i \in \mathsf{H}^{\otimes q_i}$ be defined by $p_i = I_{q_i}(h_i)$. To prove the lemma, we will express the product $p_1 \cdots p_t$ in terms of the tensors $\{h_1, \ldots, h_t\}$ using successive applications of Itô's multiplication formula. Towards this, we first have the following definition.

Definition 7.21. Given $p_1, \ldots, p_t$ as above, a tuple $r = (r_1, \ldots, r_{t-1})$ is said to be a contraction sequence if it satisfies the following condition:
$$\text{For all } 1 \le i \le t-1, \qquad 0 \le r_i \le \min\left\{q_{i+1},\ q_i + \sum_{j=1}^{i-1}(q_j - 2r_j)\right\}. \quad (7.8)$$
Let $S_{\mathrm{valid}}$ be the set of all contraction sequences. Also, for a contraction sequence $r = (r_1, \ldots, r_{t-1})$, define the quantity
$$\gamma_r = \prod_{i=1}^{t-1} r_i! \cdot \binom{q_{i+1}}{r_i} \cdot \binom{q_i + \sum_{j=1}^{i-1}(q_j - 2r_j)}{r_i} \cdot \sqrt{\frac{\big(q_{i+1} + \sum_{j=1}^{i}(q_j - 2r_j)\big)!}{q_{i+1}! \cdot \big(q_i + \sum_{j=1}^{i-1}(q_j - 2r_j)\big)!}}.$$
Also, let us define $g_1^r = h_1$ and for $1 < i \le t$, define $g_i^r$ iteratively as $g_i^r = g_{i-1}^r\, \tilde{\otimes}_{r_{i-1}}\, h_i$. Define $m_i^r = q_{i+1} + \sum_{j=1}^{i}(q_j - 2r_j)$ and observe that $g_{i+1}^r \in \mathsf{H}^{\otimes m_i^r}$. Applying Itô's multiplication formula (Proposition 7.4) iteratively, we obtain that
$$\prod_{i=1}^{t} p_i = \sum_{r = (r_1, \ldots, r_{t-1}) \in S_{\mathrm{valid}}} \gamma_r\, I_{m_{t-1}^r}(g_t^r). \quad (7.9)$$
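The one-dimensional ancestor of this expansion is the classical product formula for Hermite polynomials, which the following sketch (ours) verifies numerically: for probabilists' Hermite polynomials, $\mathrm{He}_p \cdot \mathrm{He}_q = \sum_r r!\binom{p}{r}\binom{q}{r}\mathrm{He}_{p+q-2r}$, i.e., one application of Itô's multiplication formula in its unnormalized form.

```python
import math

# He_n via the three-term recurrence He_{k+1} = x*He_k - k*He_{k-1}.
def He(n, x):
    a, b = 1.0, x                      # He_0, He_1
    if n == 0:
        return a
    for k in range(1, n):
        a, b = b, x * b - k * a
    return b

# Check the product formula for all small degrees at a few points.
for p in range(5):
    for q in range(5):
        for x in (-1.3, 0.4, 2.0):
            rhs = sum(math.factorial(r) * math.comb(p, r) * math.comb(q, r)
                      * He(p + q - 2 * r, x) for r in range(min(p, q) + 1))
            assert abs(He(p, x) * He(q, x) - rhs) < 1e-9
```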
The next claim establishes a simple upper bound on $\gamma_r$.

Claim 7.22. $\sum_{r = (r_1, \ldots, r_{t-1}) \in S_{\mathrm{valid}}} |\gamma_r| \le 2^{t(q_1 + \cdots + q_t + 1)}$.

Proof. By applying the fact that $\binom{n}{x} \le 2^n$, it is easy to show that
$$r_i! \cdot \binom{q_{i+1}}{r_i} \cdot \binom{q_i + \sum_{j=1}^{i-1}(q_j - 2r_j)}{r_i} \cdot \sqrt{\frac{\big(q_{i+1} + \sum_{j=1}^{i}(q_j - 2r_j)\big)!}{q_{i+1}! \cdot \big(q_i + \sum_{j=1}^{i-1}(q_j - 2r_j)\big)!}} \le 2^{\sum_{j=1}^{i}(q_j - 2r_j) + q_{i+1} + r_i}.$$
This implies that
$$|\gamma_r| \le 2^{\sum_{i=1}^{t-1}\left(\sum_{j=1}^{i}(q_j - 2r_j) + q_{i+1} + r_i\right)} \le 2^{t(q_1 + \cdots + q_t) - \sum_{i=1}^{t-1} r_i}.$$
As a result, we obtain
$$\sum_{r \in S_{\mathrm{valid}}} |\gamma_r| \le \sum_{r \in S_{\mathrm{valid}}} 2^{t(q_1 + \cdots + q_t) - \sum_{i=1}^{t-1} r_i} \le 2^{t(q_1 + \cdots + q_t + 1)}.$$
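A small exhaustive check (ours, with arbitrarily chosen small degrees) confirms how generous this bound is: enumerate all valid contraction sequences for $t = 3$ polynomials of degrees $q = (2, 3, 2)$, compute $\gamma_r$ from the product formula above, and compare against $2^{t(q_1 + \cdots + q_t + 1)}$.

```python
import math
from itertools import product

q = (2, 3, 2)
t = len(q)

def gamma(r):
    """gamma_r for a candidate sequence, or None if r is invalid."""
    m, g = q[0], 1.0
    for i in range(1, t):
        ri = r[i - 1]
        if ri > min(q[i], m):
            return None                      # violates (7.8)
        g *= (math.factorial(ri) * math.comb(q[i], ri) * math.comb(m, ri)
              * math.sqrt(math.factorial(m + q[i] - 2 * ri)
                          / (math.factorial(q[i]) * math.factorial(m))))
        m += q[i] - 2 * ri
    return g

total = sum(g for r in product(range(max(q) + 1), repeat=t - 1)
            if (g := gamma(r)) is not None)
assert total <= 2 ** (t * (sum(q) + 1))      # Claim 7.22's bound
```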
Our strategy for proving Lemma 7.18 is as follows. We will first partition the sum in (7.9) into two sets and show that the terms belonging to the first set are all bounded by $\kappa$. We will then further partition the terms in the second set into two sets: again, the terms in the first set will be bounded by $\kappa$, whereas the terms in the second set will just be a function of $(C, q_1, \ldots, q_t)$, which will conclude our proof. Towards this, let us partition $S_{\mathrm{valid}}$ into two sets, $S_z$ and $S_{\mathrm{valid}} \setminus S_z$, where the first set is defined as $S_z = \{r \in S_{\mathrm{valid}} : \text{for all } i < t,\ (r_i = 0) \vee (r_i = q_{i+1})\}$. Next, we prove the following proposition.
Proposition 7.23. Let $h_1, \ldots, h_t$ be as above such that $\|h_1\|_F = \cdots = \|h_t\|_F = 1$ and each $h_i$ is $\kappa$-eigenregular. If $r \notin S_z$, then $\|g_t^r\|_F \le \kappa$.

Proof. If $r \notin S_z$, then there is a coordinate $1 \le j \le t-1$ such that $r_j \ne 0$ and $r_j \ne q_{j+1}$. Appealing to Fact 7.5, it is easy to see that $\|g_j^r\|_F \le 1$. Next, appealing to Fact 7.12, we get $\|g_j^r \otimes_{r_j} h_{j+1}\|_F \le \kappa$. Since symmetrization does not increase the norm (by the triangle inequality), we obtain that $\|g_j^r\, \tilde{\otimes}_{r_j}\, h_{j+1}\|_F \le \kappa$. Finally, again using that $\|h_{j+2}\|_F, \ldots, \|h_t\|_F \le 1$ and applying Fact 7.5 iteratively, we obtain $\|g_t^r\|_F \le \kappa$. This finishes the proof.
To define further partitions of $S_z$, we will need a somewhat more elaborate definition. Consider any $r \in S_z$. Recall that $m_i^r = q_{i+1} + \sum_{j=1}^{i}(q_j - 2r_j)$ and $g_{i+1}^r \in \mathsf{H}^{\otimes m_i^r}$. For any $\ell \in \mathbb{N}$, let $\mathbb{S}_\ell$ denote the symmetric group on $\ell$ elements. Consider a tuple of permutations $\sigma = (\sigma_1, \ldots, \sigma_{t-1})$ such that $\sigma_i \in \mathbb{S}_{m_i^r}$. For $\sigma$ and $r$, we define $g_i^{r,\sigma}$ iteratively as $g_1^{r,\sigma} = h_1$ and $g_i^{r,\sigma} = \sigma_{i-1}\big(g_{i-1}^{r,\sigma} \otimes_{r_{i-1}} h_i\big)$. Let $\Lambda_r = \{\sigma : \sigma = (\sigma_1, \ldots, \sigma_{t-1})\}$. With this notation, we have that $g_t^r = (1/|\Lambda_r|) \cdot \sum_{\sigma \in \Lambda_r} g_t^{r,\sigma}$. We will now partition the pairs $(r, \sigma)$ (where $r \in S_z$). For this subsequent partitioning, we will associate a graph with each pair $(r, \sigma)$ and then partition based on this graph. To understand what this graph is meant to capture, note that to obtain $g_{i+1}^{r,\sigma}$ from $g_i^{r,\sigma}$, one of the following two events happens: (a) If $r_i = 0$, then we first compute $g_i^{r,\sigma} \otimes h_{i+1}$ and then permute the indices of the resulting tensor by $\sigma_i$. Note that each time an index is added, we can associate a unique $1 \le j \le t$ with this index. (b) If $r_i = q_{i+1}$, we compute the contraction product of $g_i^{r,\sigma}$ with $h_{i+1}$ and subsequently permute the indices of the resulting tensor by $\sigma_i$. Note that, crucially, the contraction is along the last $r_i$ indices of $g_i^{r,\sigma}$. If we do a contraction at $i = i_0$, we can associate $i_0$ with all the indices that got collapsed at $i = i_0$. In other words, each index is associated with an element of $[t]$ when it gets created, and with another one when it gets annihilated. The graph structure is meant to capture this relation, as formalized below.
Description of the graph

1. Initialize a bipartite graph with $L = R = \emptyset$. The vertex set $L$ will be ordered at all points in the algorithm.

2. Add $q_1$ vertices (in order), each colored '1', to $L$.

3. For $i = 2$ to $i = t$:

(a) If $r_{i-1} = 0$, then add $q_i$ vertices to $L$ with color '$i$' (in order). Permute the unmatched vertices in $L$ with the permutation $\sigma_{i-1}$.

(b) If $r_{i-1} = q_i$, then add $q_i$ vertices to $R$ with color '$i$' (in order) and match them to the last $q_i$ vertices in $L$. Permute the unmatched vertices in $L$ with the permutation $\sigma_{i-1}$.
The correspondence between the graph constructed above and the contraction process used to generate $g_{i+1}^{r,\sigma}$ is immediate. For example, we have the following easy observation.

Observation 7.24. The dimension of the tensor $g_t^{r,\sigma}$ is precisely $n^{|U|}$ where $U$ is the set of unmatched vertices in $L$. Consequently, $\mathbb{E}[g_t^{r,\sigma}] \ne 0$ only if $U = \emptyset$.
Since we are interested in the expectation $\mathbb{E}[\prod_{i=1}^{t} p_i]$, from now on we focus only on pairs $(r, \sigma)$ such that there are no unmatched vertices in the graph generated above. Note that having no unmatched vertices is a property of $r$ alone and not of $\sigma$. The next definition forms the basis of our partitioning of the pairs $(r, \sigma)$.

Definition 7.25. For a pair $(r, \sigma)$ such that the corresponding graph has no unmatched vertices, we say that $(r, \sigma)$ is "aligned" if and only if for each color '$i$' there is a unique color '$j$' such that vertices of color '$i$' are matched only to vertices of color '$j$' and vice versa. Otherwise, it is said to be "unaligned."
We will now prove the following claim.

Claim 7.26. Let $(r, \sigma)$ be an unaligned pair. Then $\|g_t^{r,\sigma}\|_F \le \kappa$.

Proof. Note that we are assuming that there are no unmatched vertices in the graph. If the pair $(r, \sigma)$ is unaligned, then there is a unique smallest integer $1 < N \le t$ such that at the $N$-th step of the algorithm forming the graph, the pair becomes unaligned. Namely, $N$ is the smallest integer such that one of the following events happens:

Event A: At the beginning of Step (3), when $i = N$, we have $r_{i-1} = q_i$ and the last $q_i$ vertices in $L$ are not all of the same color.

Event B: At the beginning of Step (3), when $i = N$, we have $r_{i-1} = q_i$, the last $q_i$ vertices in $L$ are of the same color (say $j$), but there is also another unmatched vertex in $L$ colored $j$.

We ask the reader to verify that $(r, \sigma)$ is unaligned if and only if one of Events A or B happens. We will now prove that in both cases A and B, our conclusion holds.
Proof for Event A: Let $I_1 = \{1\} \cup \{1 < i < N : r_{i-1} = 0\}$ and $I_2 = \{1 < i < N : r_{i-1} \ne 0\}$. By the definition of $N$, it is easy to verify that there is a one-to-one mapping $\nu$ from $I_2$ to $I_1$ such that for any $a \in I_2$, vertices of color $a$ are matched with vertices of color $\nu(a)$ and vice versa. Let us define $I_3 = I_1 \setminus \nu(I_2)$. As a consequence, we have
$$g_{N-1}^{r,\sigma} = \prod_{a \in I_2} \langle h_a, h_{\nu(a)}\rangle \cdot \mu\big(\otimes_{j \in I_3} h_j\big),$$
where $\mu$ is a permutation of the indices of $\otimes_{j \in I_3} h_j$. Further, since at $i = N$ the last $q_N$ vertices of $L$ are of at least two different colors, the last $q_N$ indices of $\mu(\otimes_{j \in I_3} h_j)$ belong to more than one $h_j$ (for $j \in I_3$). Let $A$ be the set of all indices of $\otimes_{j \in I_3} h_j$ which do not map to the last $q_N$ positions under $\mu$. Consider some assignment $\rho$ for these indices. Then, we have that
$$\|g_N^{r,\sigma}\|_F^2 = \sum_\rho \Big\|\prod_{a \in I_2} \langle h_a, h_{\nu(a)}\rangle \cdot \mu\big(\otimes_{j \in I_3} h_{j,\rho}\big) \otimes_{q_N} h_N\Big\|_F^2.$$
Here $h_{j,\rho}$ is the tensor obtained by restricting the indices in $A$ to $\rho$. Since $\lambda_{\max}(h_N) \le \kappa$, we get
$$\big\|\mu\big(\otimes_{j \in I_3} h_{j,\rho}\big) \otimes_{q_N} h_N\big\|_F^2 \le \kappa^2 \cdot \big\|\otimes_{j \in I_3} h_{j,\rho}\big\|_F^2.$$
This implies that
$$\|g_N^{r,\sigma}\|_F^2 \le \kappa^2 \sum_\rho \prod_{a \in I_2} \langle h_a, h_{\nu(a)}\rangle^2 \cdot \big\|\otimes_{j \in I_3} h_{j,\rho}\big\|_F^2 \le \kappa^2 \prod_{a \in I_2} \langle h_a, h_{\nu(a)}\rangle^2 \le \kappa^2.$$
The penultimate inequality uses the fact that $\sum_\rho \|\otimes_{j \in I_3} h_{j,\rho}\|_F^2 = 1$. Using the fact that $\|h_j\|_F = 1$ for all $j > N$ and applying Fact 7.5, we get that $\|g_t^{r,\sigma}\|_F \le \kappa$. This finishes the proof.
Proof for Event B: Let us define $I_1$, $I_2$ and $I_3$ as in Event A. As before, we have
$$g_{N-1}^{r,\sigma} = \prod_{a \in I_2} \langle h_a, h_{\nu(a)}\rangle \cdot \mu\big(\otimes_{j \in I_3} h_j\big),$$
where $\mu$ is a permutation of the indices of $\otimes_{j \in I_3} h_j$. The crucial difference is that there exists $j_0 \in I_3$ such that if $B$ denotes the set of positions to which the indices of $h_{j_0}$ get mapped under $\mu$, then $B$ contains the last $q_N$ positions as a proper subset. Using $\lambda_{\max}(h_{j_0}) \le \kappa$, we get that $\|\mu(\otimes_{j \in I_3} h_j) \otimes_{q_N} h_N\|_F^2 \le \kappa^2 \cdot \|\otimes_{j \in I_3 \setminus \{j_0\}} h_j\|_F^2$. However, $\|\otimes_{j \in I_3 \setminus \{j_0\}} h_j\|_F^2 = 1$, and hence we get $\|g_N^{r,\sigma}\|_F^2 \le \kappa^2$. Using the fact that $\|h_j\|_F = 1$ for all $j > N$ and applying Fact 7.5, we get that $\|g_t^{r,\sigma}\|_F \le \kappa$. This finishes the proof.
Claim 7.27. Let $(r, \sigma)$ be an aligned pair. Then $g_t^{r,\sigma}$ depends only on the covariance matrix $C$ (and on the pair $(r, \sigma)$).

Proof. Since $(r, \sigma)$ is an aligned pair, the set $\{1, \ldots, t\}$ can be partitioned into two sets $L$ and $R$ together with a bijection $\pi : L \to R$ such that $g_t^{r,\sigma} = \prod_{a \in L} \langle h_a, h_{\pi(a)}\rangle$. Note that the mapping $\pi$ is just a function of $(r, \sigma)$, and once $\pi$ is fixed, $g_t^{r,\sigma}$ is just a function of the matrix $C$.
Proof of Lemma 7.18. Let $B = \{(r, \sigma) : r \in S_z \text{ and } (r, \sigma) \text{ is aligned}\}$. Applying (7.9), we get that
$$\left|\mathbb{E}\left[\prod_{i=1}^{t} p_i\right] - \sum_{(r,\sigma) \in B} \frac{1}{|\Lambda_r|}\, \gamma_r\, I_{m_{t-1}^r}(g_t^{r,\sigma})\right| \le \sum_{r \notin S_z} \|\gamma_r \cdot g_t^r\|_F + \sum_{(r,\sigma) \notin B \,:\, r \in S_z} \frac{1}{|\Lambda_r|} \|\gamma_r \cdot g_t^{r,\sigma}\|_F.$$
Note that $\sum_{(r,\sigma) \in B} (1/|\Lambda_r|)\, \gamma_r\, g_t^{r,\sigma}$ depends only on $C$ and $r$. Further, applying Claim 7.26, Proposition 7.23 and Claim 7.22, we see that the right hand side can be bounded by $2^{t(q_1 + \cdots + q_t + 1)} \cdot \kappa$. This finishes the proof.
7.6 Proof of Theorem 5.2
Let us assume that $f = \mathrm{PTF}(p_1, \ldots, p_k)$, where $f$ is the PTF appearing in the statement of Theorem 5.2. After applying Lemma 7.8, we may assume that $f$ is multilinear. After scaling the polynomials $p_1, \ldots, p_k$, we can assume that $\mathrm{Var}(p_s) = 1$ for all $1 \le s \le k$. With this, we note that there are constants $c_{s,q}$ for $1 \le s \le k$ and $1 \le q \le d$ such that
$$p_s = p_{s,0} + \sum_{q=1}^{d} c_{s,q}\, p_{s,q},$$
where $p_{s,q} \in W_q$, $\mathrm{Var}(p_{s,q}) = 1$ for all $s \in [k]$, $1 \le q \le d$, and $\sum_{q=1}^{d} c_{s,q}^2 = 1$.
Let $\beta : [1,\infty) \to (0,1)$ be a quickly decreasing function, with properties that will be specified later. Apply Theorem 7.16 (with $\tau = (\varepsilon/(kd))^{3d}$), and let $\tilde{p}_{s,q}$ (for $1 \le s \le k$ and $0 \le q \le d$) be the resulting polynomials. Defining $\tilde{p}_s = \tilde{p}_{s,0} + \sum_{q=1}^{d} c_{s,q}\, \tilde{p}_{s,q}$, Theorem 7.16 implies that $\mathbb{E}[p_s - \tilde{p}_s] = 0$ and
$$\mathrm{Var}(p_s - \tilde{p}_s) = \sum_{q=1}^{d} c_{s,q}^2\, \mathrm{Var}(p_{s,q} - \tilde{p}_{s,q}) \le \tau.$$
Define the PTF $\tilde{f} = \mathrm{PTF}(\tilde{p}_1, \ldots, \tilde{p}_k)$. By Theorem 7.7 and a union bound,
$$\big\|\mathbb{E}_{x\sim\gamma_n}[\tilde{f}(x)] - \mathbb{E}_{x\sim\gamma_n}[f(x)]\big\| \le \varepsilon, \qquad \big|\mathbb{E}_{x\sim\gamma_n}[\langle \tilde{f}, P_t \tilde{f}\rangle] - \mathbb{E}_{x\sim\gamma_n}[\langle f, P_t f\rangle]\big| \le \varepsilon,$$
$$\Pr_{x\sim\gamma_n}[x \in \mathrm{Collision}(\tilde{f})] \le \Pr_{x\sim\gamma_n}[x \in \mathrm{Collision}(f)] + \varepsilon.$$
From now on, we will forget about $f$ and work with $\tilde{f}$. Recalling (from Theorem 7.16) that
$$\tilde{p}_{s,q} = \mathrm{Out}(\tilde{p}_{s,q})\big(\mathrm{In}(\tilde{p}_{s,q})_1(x), \ldots, \mathrm{In}(\tilde{p}_{s,q})_{\mathrm{num}(s,q)}(x)\big),$$
our next task is to "swap out" the inner polynomials for a new collection. Let $P$ denote the collection of polynomials $\{\mathrm{In}(\tilde{p}_{s,q})_\ell : 1 \le s \le k,\ 1 \le q \le d,\ 1 \le \ell \le \mathrm{num}(s,q)\}$; we will introduce another family of polynomials $R = \{\mathrm{In}(r_{s,q})_\ell : 1 \le s \le k,\ 1 \le q \le d,\ 1 \le \ell \le \mathrm{num}(s,q)\}$ that satisfies certain conditions:
Condition 7.28.

• Each polynomial $\mathrm{In}(r_{s,q})_\ell$ has arity $n_0 = O_{d,k,\beta,\varepsilon}(1)$.

• For every $s$, $q$, and $\ell$, $\mathrm{In}(r_{s,q})_\ell \in W_j$ if and only if $\mathrm{In}(\tilde{p}_{s,q})_\ell \in W_j$.

• $P$ and $R$ have the same covariances: for every $1 \le s_1, s_2 \le k$, $1 \le q_1, q_2 \le d$, and $1 \le \ell_i \le \mathrm{num}(s_i, q_i)$ (for $i \in \{1,2\}$),
$$\mathbb{E}[\mathrm{In}(\tilde{p}_{s_1,q_1})_{\ell_1}(x) \cdot \mathrm{In}(\tilde{p}_{s_2,q_2})_{\ell_2}(x)] = \mathbb{E}[\mathrm{In}(r_{s_1,q_1})_{\ell_1}(x) \cdot \mathrm{In}(r_{s_2,q_2})_{\ell_2}(x)].$$

• Every polynomial in the family $R$ is $\beta(\mathrm{Num} + \mathrm{Coeff})$-eigenregular.
It turns out to be possible to satisfy these conditions:
Lemma 7.29. There exists a family R of polynomials on n0 = poly(d, Num, β (Num + Coeff)) variables
that meets the requirements of Condition 7.28.
Proof. For $1 \le i \le d$, let $P_i = \{\mathrm{In}(\tilde{p}_{s,q})_\ell\}_{\ell = 1, \ldots, \mathrm{num}(s,q)} \cap W_i$. Let $m_i$ denote the size of $P_i$. We will construct $R$ by constructing the corresponding set $R_i$ for each $1 \le i \le d$. Note that individually constructing $R_i$ for each $1 \le i \le d$ suffices for our construction, because if $r_1 \in W_i$, $r_2 \in W_j$ and $i \ne j$ then $\mathbb{E}[r_1(x) \cdot r_2(x)] = 0$.

For convenience of notation, let us enumerate the elements of $P_i$ as $\{p_1, \ldots, p_{m_i}\}$, and we will construct $R_i = \{r_1, \ldots, r_{m_i}\}$. Define $h_j \in \mathsf{H}^{i}$ by $p_j = I_i(h_j)$, and note that $h_1, \ldots, h_{m_i}$ span a space of dimension at most $m_i$. Setting $H_1 = \mathbb{R}^{m_i}$, Proposition 7.2 implies that $\dim(H_1^{i}) \ge m_i$, and so there exist $v_1, \ldots, v_{m_i} \in H_1^{i}$ satisfying $\langle v_j, v_{j'}\rangle = \langle h_j, h_{j'}\rangle$ for all $j, j'$. For $1 \le j \le m_i$, define $q_j : \mathbb{R}^{m_i} \to \mathbb{R}$ by $q_j = I_i(v_j)$.

Let $\delta = \beta(\mathrm{Num} + \mathrm{Coeff})$, $\kappa = \lceil 1/\delta^2\rceil$, and $n_0 = m_i \kappa$; note that $n_0$ is indeed polynomial in $d$, Num, and $\beta(\mathrm{Num} + \mathrm{Coeff})$. For $1 \le j \le m_i$, define $r_j : \mathbb{R}^{n_0} \to \mathbb{R}$ as follows: divide $x \in \mathbb{R}^{n_0}$ into $\kappa$ blocks of size $m_i$ each (call the blocks $X_1, \ldots, X_\kappa$) and define
$$r_j = \frac{1}{\sqrt{\kappa}} \cdot \big(q_j(X_1) + \cdots + q_j(X_\kappa)\big).$$
Then
$$\mathbb{E}[r_{j_1} \cdot r_{j_2}] = \sum_{j=1}^{\kappa} \frac{1}{\kappa} \cdot \mathbb{E}[q_{j_1}(X_j) \cdot q_{j_2}(X_j)] = \sum_{j=1}^{\kappa} \frac{1}{\kappa} \cdot \langle v_{j_1}, v_{j_2}\rangle = \langle h_{j_1}, h_{j_2}\rangle = \mathbb{E}[p_{j_1} \cdot p_{j_2}].$$
The first equality relies on the observation that $q_{j_1}, q_{j_2} \in W_i$ for $i \ge 1$ and thus $\mathbb{E}[q_{j_1}(X_j) \cdot q_{j_2}(X_\ell)] = \mathbb{E}[q_{j_1}(X_j)] \cdot \mathbb{E}[q_{j_2}(X_\ell)] = 0$ if $j \ne \ell$; the second and fourth equalities use Proposition 7.3. Finally, Fact 7.13 implies that the polynomials $\{r_j\}_{1 \le j \le m_i}$ are $\delta = \beta(\mathrm{Num} + \mathrm{Coeff})$-eigenregular.
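The covariance-matching step is easy to see concretely in the degree-1 case, which the following sketch (ours, with an arbitrary target covariance) illustrates: a Cholesky factor of the Gram matrix supplies vectors $v_j$ on few coordinates with the prescribed inner products, and replicating them over $\kappa$ disjoint blocks (scaled by $1/\sqrt{\kappa}$) preserves every covariance while Fact 7.13 makes the resulting polynomials $1/\sqrt{\kappa}$-eigenregular.

```python
import numpy as np

rng = np.random.default_rng(3)
m, kappa = 4, 25
A = rng.standard_normal((m, 30))
C = A @ A.T / 30                             # target Gram matrix of the h_j
V = np.linalg.cholesky(C)                    # rows v_j with <v_j, v_j'> = C[j, j']
# Coefficient vectors of the linear forms r_j: kappa scaled copies of v_j,
# one per disjoint block of variables.
R = np.hstack([V / np.sqrt(kappa)] * kappa)
assert np.allclose(V @ V.T, C)
assert np.allclose(R @ R.T, C)               # E[r_{j1} r_{j2}] = E[p_{j1} p_{j2}]
```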
Having constructed the family $R$, for $1 \le s \le k$ and $1 \le q \le d$, we define
$$r_{s,q} = \mathrm{Out}(\tilde{p}_{s,q})\big(\mathrm{In}(r_{s,q})_1(x), \ldots, \mathrm{In}(r_{s,q})_{\mathrm{num}(s,q)}(x)\big).$$
That is, compared to $\tilde{p}_{s,q}$, the inner polynomials have changed but the outer polynomial is the same. For $1 \le s \le k$, we define
$$r_s = \tilde{p}_{s,0} + \sum_{q=1}^{d} c_{s,q}\, r_{s,q}.$$
Finally, we define $f_{\mathrm{junta}} = \mathrm{PTF}(r_1, \ldots, r_k)$. Note that $f_{\mathrm{junta}} : \mathbb{R}^{n_0} \to [k]$.
We will now show that this construction has all the properties stated in Theorem 5.2. First of all, note that as long as the function $\beta(\cdot)$ is computable, $n_0$ is a computable function of $d$ and $\varepsilon$. To examine the noise stability of $f_{\mathrm{junta}}$, we introduce a new family of polynomials which are essentially "noise-attenuated" versions of the existing ones. These polynomials will be over either $2n$ or $2n_0$ variables. For $1 \le s \le k$, $1 \le q \le d$ and $1 \le \ell \le \mathrm{num}(s,q)$, we define
$$\mathrm{In}(u_{s,q})_\ell(x,y) = \mathrm{In}(\tilde{p}_{s,q})_\ell\big(e^{-t}x + \sqrt{1 - e^{-2t}}\,y\big), \qquad \mathrm{In}(v_{s,q})_\ell(x,y) = \mathrm{In}(r_{s,q})_\ell\big(e^{-t}x + \sqrt{1 - e^{-2t}}\,y\big),$$
and
$$u_{s,q} = \mathrm{Out}(\tilde{p}_{s,q})\big(\mathrm{In}(u_{s,q})_1(x,y), \ldots, \mathrm{In}(u_{s,q})_{\mathrm{num}(s,q)}(x,y)\big),$$
$$v_{s,q} = \mathrm{Out}(\tilde{p}_{s,q})\big(\mathrm{In}(v_{s,q})_1(x,y), \ldots, \mathrm{In}(v_{s,q})_{\mathrm{num}(s,q)}(x,y)\big),$$
and finally
$$u_s = u_{s,0} + \sum_{q=1}^{d} c_{s,q}\, u_{s,q}, \qquad v_s = v_{s,0} + \sum_{q=1}^{d} c_{s,q}\, v_{s,q},$$
where $u_{s,0} = v_{s,0} = \tilde{p}_{s,0}$. To help keep the notation straight, note that every polynomial involving "$u$" is on $2n$ variables, while every polynomial involving "$v$" is on $2n_0$ variables. The PTFs corresponding to $u$ and $v$ will be denoted $f_u = \mathrm{PTF}(u_1, \ldots, u_k)$ and $f_v = \mathrm{PTF}(v_1, \ldots, v_k)$.
The purpose of introducing these new polynomials is to have the following expressions for noise stability:
$$\mathbb{E}_{x\sim\gamma_n}[\langle \tilde{f}(x), P_t \tilde{f}(x)\rangle] = \mathbb{E}_{x\sim\gamma_n,\, y\sim\gamma_n}[\langle \tilde{f}(x), f_u(x,y)\rangle],$$
$$\mathbb{E}_{x\sim\gamma_{n_0}}[\langle f_{\mathrm{junta}}(x), P_t f_{\mathrm{junta}}(x)\rangle] = \mathbb{E}_{x\sim\gamma_{n_0},\, y\sim\gamma_{n_0}}[\langle f_{\mathrm{junta}}(x), f_v(x,y)\rangle].$$
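The identity behind these expressions can be checked by Monte Carlo in the simplest case (our sketch, with a halfspace partition standing in for the PTF): the noise stability of $f(x) = \mathrm{sign}(x_1)$ equals the probability that $x$ and the coupled point $x' = e^{-t}x + \sqrt{1-e^{-2t}}\,y$ land on the same side, for which Sheppard's formula gives the closed form $1 - \arccos(e^{-t})/\pi$.

```python
import numpy as np

rng = np.random.default_rng(4)
t, N = 0.5, 1_000_000
rho = np.exp(-t)
x = rng.standard_normal(N)
y = rng.standard_normal(N)
xprime = rho * x + np.sqrt(1 - rho * rho) * y    # noise-attenuated copy of x
est = np.mean(np.sign(x) == np.sign(xprime))     # empirical noise stability
exact = 1 - np.arccos(rho) / np.pi               # Sheppard's formula
assert abs(est - exact) < 3e-3
```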
The next claim establishes relations between the pairwise correlations of polynomials in the families
{In(us,q )` } and {In(vs,q )` }.
Claim 7.30. For all 1 ≤ s1 , s2 ≤ k, 1 ≤ q1 , q2 ≤ d and 1 ≤ `i ≤ num(si , qi ) (for i ∈ {1, 2}), we have
E[In(us1 ,q1 )`1 (x, y) · In(us2 ,q2 )`2 (x, y)] = E[In(vs1 ,q1 )`1 (x, y) · In(vs2 ,q2 )`2 (x, y)]
and
E[In(ps1 ,q1 )`1 (x) · In(us2 ,q2 )`2 (x, y)] = E[In(rs1 ,q1 )`1 (x) · In(vs2 ,q2 )`2 (x, y)].
Proof. Note that In(u_{s_1,q_1})_{ℓ_1} and In(u_{s_2,q_2})_{ℓ_2} are obtained by applying a unitary transformation (on the space of variables) to the polynomials In(p_{s_1,q_1})_{ℓ_1} and In(p_{s_2,q_2})_{ℓ_2}. Since the standard Gaussian measure is rotationally invariant,

E[In(u_{s_1,q_1})_{ℓ_1}(x, y) · In(u_{s_2,q_2})_{ℓ_2}(x, y)] = E[In(p̃_{s_1,q_1})_{ℓ_1}(x) · In(p̃_{s_2,q_2})_{ℓ_2}(x)].

Likewise,

E[In(v_{s_1,q_1})_{ℓ_1}(x, y) · In(v_{s_2,q_2})_{ℓ_2}(x, y)] = E[In(r_{s_1,q_1})_{ℓ_1}(x) · In(r_{s_2,q_2})_{ℓ_2}(x)].

Since the right-hand sides above are equal (by the construction of the family {In(r_{s,q})_ℓ}), the first claim follows.
To prove the second claim, note that there exist j_1, j_2 ∈ ℕ such that

In(p_{s_1,q_1})_{ℓ_1}(x), In(r_{s_1,q_1})_{ℓ_1}(x) ∈ W_{j_1} and In(u_{s_2,q_2})_{ℓ_2}(x, y), In(v_{s_2,q_2})_{ℓ_2}(x, y) ∈ W_{j_2}.

If j_1 ≠ j_2, then the second claim holds trivially because both sides are zero; we will assume, therefore, that j_1 = j_2 = j. Next, we observe that

E[In(p_{s_1,q_1})_{ℓ_1}(x) · In(u_{s_2,q_2})_{ℓ_2}(x, y)] = ⟨In(p_{s_1,q_1})_{ℓ_1}(x), P_t In(p_{s_2,q_2})_{ℓ_2}(x)⟩,

and recall that P_t w = e^{−jt} w for all w ∈ W_j. Hence,

E[In(p_{s_1,q_1})_{ℓ_1}(x) · In(u_{s_2,q_2})_{ℓ_2}(x, y)] = e^{−jt} E[In(p_{s_1,q_1})_{ℓ_1}(x) · In(p_{s_2,q_2})_{ℓ_2}(x)],
E[In(r_{s_1,q_1})_{ℓ_1}(x) · In(v_{s_2,q_2})_{ℓ_2}(x, y)] = e^{−jt} E[In(r_{s_1,q_1})_{ℓ_1}(x) · In(r_{s_2,q_2})_{ℓ_2}(x)],

and as before, the right-hand sides are equal.
We will now list the conditions that we require of the function β(·). These conditions look somewhat intimidating, but it is easy to see that they are satisfied as long as β decreases sufficiently quickly.
Condition 7.31. The function β : [1, ∞) → [0, 1] should satisfy the following conditions:

1. For ξ = ε/(40k²) and for B^{(1)}, δ^{(1)}, c^{(1)} defined as

B^{(1)} = Ω( ln(Num · d/ξ) )^{d/2},   δ^{(1)} = ( ξ / (√d · B^{(1)} · Num · Coeff) )^{1/d},   c^{(1)} = Num / √(δ^{(1)} · ξ),

we require

β(Coeff + Num) · Coeff² · 2^{Num·(Num·2d+1)} · 2^{O(d³)} ≤ 1/4   and
2^{O(d log d)} · Num² · 4·(c^{(1)})² · β(Coeff + Num) ≤ ξ.
2. Define ξ = ε/(40k²), L = 4 · 9^{d+1} · (d + 1)² · log^{d/2}(k · d/ε) and ϑ = (L/2) · L^{−2^{2d}}. Further, define B^{(2)}, δ^{(2)} and c^{(2)} as

B^{(2)} = Ω( ln(2·Num·2d/ξ) )^{d},   δ^{(2)} = ( ξ·ϑ / (√(2d) · B^{(2)} · 2·Num · Coeff²) )^{1/2d},   c^{(2)} = 2·Num / √(δ^{(2)} · ξ).

We require

β(Coeff² + 2·Num) · Coeff⁴ · 2^{2Num·(Num·4d+1)} · 2^{O(d³)} ≤ ϑ/4   and
2^{O(d log d)} · Num² · 4·(c^{(2)})² · β(Coeff² + 2·Num) ≤ ξ.
Applying the above conditions on β will allow us to conclude that our pairs of polynomials have approximately the same joint distribution of signs. For the rest of this section, set ξ = ε/(40k²).
Corollary 7.32. If β satisfies Condition 7.31, then for all 1 ≤ s, s' ≤ k,

(a) |Pr_{x∼γ_n}[p̃_s ≥ 0] − Pr_{x∼γ_{n_0}}[r_s ≥ 0]| ≤ ξ,
(b) |Pr_{x,y∼γ_n}[u_s ≥ 0] − Pr_{x,y∼γ_{n_0}}[v_s ≥ 0]| ≤ ξ,
(c) |Pr_{x∼γ_n}[p̃_s · p̃_{s'} ≥ 0] − Pr_{x∼γ_{n_0}}[r_s · r_{s'} ≥ 0]| ≤ ξ,
(d) |Pr_{x,y∼γ_n}[u_s · u_{s'} ≥ 0] − Pr_{x,y∼γ_{n_0}}[v_s · v_{s'} ≥ 0]| ≤ ξ,
(e) |Pr_{x,y∼γ_n}[p̃_s · u_{s'} ≥ 0] − Pr_{x,y∼γ_{n_0}}[r_s · v_{s'} ≥ 0]| ≤ ξ.
Proof. We first note that it suffices to prove (a), (c) and (e). This is because the polynomials {u_s} (resp. {v_s}) are obtained by a unitary transformation of variables from the polynomials {p̃_s} (resp. {r_s}). For 1 ≤ s ≤ k, note that there is a Ψ_s : ℝ^{Num} → ℝ such that

p̃_s = Ψ_s({In(p̃_{s,q})_1(x), …, In(p̃_{s,q})_{num(s,q)}(x)}_{1≤q≤d}),
r_s = Ψ_s({In(r_{s,q})_1(x), …, In(r_{s,q})_{num(s,q)}(x)}_{1≤q≤d}).
Proof of Item (a): Recall that Var(p̃_s) = 1. Define B^{(1)}, δ^{(1)}, c^{(1)} as in Condition 7.31. We now apply Lemma 7.19 with Ψ = Ψ_s, η = 1, {A_i} = P, and {B_i} = R. Condition 7.31 implies that the B polynomials (i.e., In(r_{s,q})_ℓ) satisfy the eigenregularity assumption in Lemma 7.19. The conclusion of Lemma 7.19 proves (a).
Proof of Item (c): For 1 ≤ s, s' ≤ k, note that there is a function Ψ_{s,s'} : ℝ^{2·Num} → ℝ such that

p̃_s · p̃_{s'} = Ψ_{s,s'}({In(p̃_{s,q})_ℓ(x)}_{q,ℓ}, {In(p̃_{s',q})_ℓ(x)}_{q,ℓ}),
r_s · r_{s'} = Ψ_{s,s'}({In(r_{s,q})_ℓ(x)}_{q,ℓ}, {In(r_{s',q})_ℓ(x)}_{q,ℓ}).
Note that Ψ_{s,s'} is a degree-2d polynomial with Coeff_{Ψ_{s,s'}} ≤ Coeff² and Num_{Ψ_{s,s'}} ≤ 2·Num. In order to apply Lemma 7.19, we will need a lower bound on the variance of p̃_s · p̃_{s'}:

Var(p̃_s · p̃_{s'}) ≥ (L/2) · L^{−2^{2d}},

where L = 4 · 9^{d+1} · (d + 1)² · log^{d/2}(k · d/ε). (This lower bound will be proven as Lemma 7.33.) Note that the right-hand side above is exactly what we called ϑ in Condition 7.31. Now we will apply Lemma 7.19 with η = ϑ: Claim 7.30 implies that the second moments match, while the second part of Condition 7.31 ensures that the polynomials In(r_{t,q})_ℓ are sufficiently eigenregular. The conclusion of Lemma 7.19 proves item (c).
Proof of Item (e): The proof of item (e) is the same as the proof of item (c), with u_{s'} taking over the role of p̃_{s'}. We omit the details.
Lemma 7.33. Let p, q : ℝ^n → ℝ be degree-d polynomials such that Var(p) = Var(q) = 1. Let T = max{E[p²], E[q²]} and L = 4 · T · 9^{d+1} · (d + 1)². Then

Var(p · q) ≥ (L/2) · L^{−2^{2d}}.
We will begin with an easier lower bound, in terms of the “highest-level weights”:

Lemma 7.34. If p = ∑_{i=0}^{d_1} I_i(f_i) and q = ∑_{i=0}^{d_2} I_i(g_i) for f_i ∈ H^{⊙i} and g_i ∈ H^{⊙i}, then

Var(p · q) ≥ ‖f_{d_1}‖²_F · ‖g_{d_2}‖²_F.
Proof.

p · q = ∑_{r_1=0}^{d_1} ∑_{r_2=0}^{d_2} I_{r_1}(f_{r_1}) · I_{r_2}(g_{r_2})
      = ∑_{r_1=0}^{d_1} ∑_{r_2=0}^{d_2} ∑_{r=0}^{r_1∧r_2} r! · C(r_1, r) · C(r_2, r) · ( √((r_1 + r_2 − 2r)!) / (√(r_1!) · √(r_2!)) ) · I_{r_1+r_2−2r}(f_{r_1} ⊗̃_r g_{r_2}).

The second equality uses Itô's multiplication formula (Proposition 7.4). The term with r = 0, r_1 = d_1 and r_2 = d_2 is orthogonal to all the other terms because it is the only one belonging to the (d_1 + d_2)th Wiener chaos; hence,

Var(p · q) ≥ Var( ( √((d_1 + d_2)!) / (√(d_1!) · √(d_2!)) ) · I_{d_1+d_2}(f_{d_1} ⊗̃ g_{d_2}) ) = ( (d_1 + d_2)! / (d_1! · d_2!) ) · ‖f_{d_1} ⊗̃ g_{d_2}‖²_F.
Finally, Neuberger [27] showed that

‖f_{d_1} ⊗̃ g_{d_2}‖²_F ≥ ( 1 / C(d_1 + d_2, d_1) ) · ‖f_{d_1}‖²_F · ‖g_{d_2}‖²_F,

which completes the proof.
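For a concrete instance of the computation above, consider the (illustrative) lowest nontrivial case d_1 = d_2 = 1, so p = I_1(f) and q = I_1(g); the multiplication formula then has only the terms r = 0 and r = 1:

```latex
% Degree-one case of the product expansion: p = I_1(f), q = I_1(g).
% The r = 0 term lies in the second Wiener chaos; the r = 1 term is the constant <f, g>.
I_1(f)\, I_1(g)
  \;=\; \frac{\sqrt{2!}}{\sqrt{1!}\,\sqrt{1!}}\, I_2\bigl(f \mathbin{\tilde\otimes} g\bigr)
  \;+\; \langle f, g\rangle .
```

Hence Var(p · q) = 2‖f ⊗̃ g‖²_F ≥ ‖f‖²_F · ‖g‖²_F by Neuberger's bound with C(2, 1) = 2, in agreement with Lemma 7.34.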
Proof of Lemma 7.33. Let us express p and q in terms of iterated Itô integrals:

p = ∑_{r=0}^{d} I_r(f_r)   and   q = ∑_{r=0}^{d} I_r(g_r).
Let α = E[p²] and β = E[q²]. Let Γ : ℕ → [0, 1] be a function (which we shall fix later). Consider the following iterative process:

• Start with i = d and j = d. If ‖f_i‖_F · ‖g_j‖_F ≥ Γ(i + j), then stop the process.
• If ‖f_i‖_F ≤ √(Γ(i + j)), then i ← i − 1; else j ← j − 1.
• If either i or j reaches zero, terminate the process.
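The process above can be sketched in code as follows (a sketch only: the function name, the toy norm sequences and the sample Γ below are hypothetical and not part of the proof):

```python
import math

def find_heavy_pair(f_norms, g_norms, gamma):
    """Run the level-selection process on ||f_i||_F and ||g_j||_F (indices 0..d).

    Returns the final (i, j): either a pair with ||f_i|| * ||g_j|| >= gamma(i + j),
    or a pair in which one coordinate has hit zero.
    """
    d = len(f_norms) - 1
    i, j = d, d
    while i > 0 and j > 0:
        if f_norms[i] * g_norms[j] >= gamma(i + j):
            return i, j                # stop: the current pair is heavy enough
        if f_norms[i] <= math.sqrt(gamma(i + j)):
            i -= 1                     # f's top level is light, drop it
        else:
            j -= 1                     # otherwise g's top level must be light
    return i, j

# Toy run: with these norms the starting pair (3, 3) is already heavy for this gamma.
gamma = lambda x: 0.01 * 2.0 ** (-x)
i0, j0 = find_heavy_pair([0.1, 0.5, 0.5, 0.5], [0.1, 0.5, 0.5, 0.5], gamma)
print(i0, j0)
```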
Recall that T = max{α, β} and L = 4 · T · 9^{d+1} · (d + 1)². We let Γ(x) = L · L^{−2^x}. First, we claim that for Γ(·) as chosen here, the iterative process above terminates with (i, j) such that neither of them is zero. Towards a contradiction, assume that the process terminates with i = 0 and j > 0. Further, for each 1 ≤ ℓ ≤ d, let κ_ℓ be the value of i + j at the step when i drops from ℓ to ℓ − 1. Note that κ_1 < · · · < κ_d and κ_1 ≥ 2. Thus, if the process terminates with i = 0, then
Var(p) = ∑_{ℓ=1}^{d} ‖f_ℓ‖²_F ≤ ∑_{ℓ=1}^{d} Γ(κ_ℓ) ≤ 2 · Γ(κ_1) < 1.
This results in a contradiction, which means that both i and j must be non-zero at the end of the process. Here the penultimate inequality uses that, for our choice of Γ(·), Γ(x + 1) ≤ Γ(x)/2, and the last inequality uses that Γ(κ_1) < 1/2.
Let us assume that (i_0, j_0) is the pair returned by the above iterative process and define p̃ = ∑_{r=0}^{i_0} I_r(f_r) and q̃ = ∑_{r=0}^{j_0} I_r(g_r). Applying Lemma 7.34, we have Var(p̃ · q̃) ≥ Γ²(i_0 + j_0). Observe that p · q = p̃ · q̃ + (p − p̃) · q + p̃ · (q − q̃). Thus, it follows that Var(p · q − p̃ · q̃) = Var((p − p̃) · q + p̃ · (q − q̃)). Applying Jensen's inequality, we get

√(Var(p · q)) ≥ √(Var(p̃ · q̃)) − √(Var(p̃ · (q − q̃))) − √(Var(q · (p − p̃))).
Hence, it suffices to bound Var(p̃ · (q − q̃)) and Var(q · (p − p̃)). To do this, notice that by the definition of the process,

Var(q − q̃), Var(p − p̃) < ∑_{ℓ=i_0+j_0+1}^{2d} Γ(ℓ) < 2 · Γ(i_0 + j_0 + 1).

Now, Cauchy–Schwarz and the Gaussian hypercontractive inequality imply that E[(r · s)²] ≤ 9^d · E[r²] · E[s²] for any degree-d polynomials r and s. Hence,

Var(p̃ · (q − q̃)), Var(q · (p − p̃)) ≤ 2 · Γ(i_0 + j_0 + 1) · d · (d + 1) · 8^d · T.

Thus,

√(Var(p · q)) ≥ Γ(i_0 + j_0) − 2√2 · (d + 1) · (2√2)^d · √T · √(Γ(i_0 + j_0 + 1)).
However, note that for any x ≥ 0, √(Γ(x + 1)) = Γ(x)/√L. Plugging in the value of L, we obtain that

√(Var(p · q)) ≥ Γ(i_0 + j_0)/2 ≥ (L/2) · L^{−2^{2d}}.
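As a sanity check on the hypercontractive moment bound used in this proof, one can verify E[(r · s)²] ≤ 9^d · E[r²] · E[s²] on the illustrative instance r = s = x^d in one variable, where all Gaussian moments are explicit (E[x^{2k}] = (2k − 1)!! for x ∼ N(0, 1)):

```python
def gaussian_even_moment(two_k):
    """E[x^(2k)] for x ~ N(0, 1): the double factorial (2k - 1)!!."""
    m = 1
    for odd in range(1, two_k, 2):
        m *= odd
    return m

# With r = s = x^d: E[(r*s)^2] = E[x^(4d)] and E[r^2] = E[s^2] = E[x^(2d)].
for d in range(1, 8):
    lhs = gaussian_even_moment(4 * d)
    rhs = 9 ** d * gaussian_even_moment(2 * d) ** 2
    assert lhs <= rhs
print("moment bound verified for d = 1..7")
```

For d = 1 this reads E[x⁴] = 3 ≤ 9 · (E[x²])² = 9, and the slack grows with d.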
Before continuing with the proof, let us summarize what we have shown so far:
Theorem 7.35. Fix ε > 0 and t > 0, and let p_1, …, p_k : ℝ^n → ℝ be degree-d polynomials satisfying Var(p_s) = 1 and |E[p_s]| ≤ log^{d/2}(k · d/ε) for all 1 ≤ s ≤ k. There exist a computable n_0 = n_0(k, d, t, ε) and polynomials r_1, …, r_k : ℝ^{n_0} → ℝ such that for every 1 ≤ s, s' ≤ k:

(a) |Pr_{x∼γ_n}[p_s ≥ 0] − Pr_{x∼γ_{n_0}}[r_s ≥ 0]| ≤ ε,
(b) |Pr_{x,y∼γ_n}[u_s ≥ 0] − Pr_{x,y∼γ_{n_0}}[v_s ≥ 0]| ≤ ε,
(c) |Pr_{x∼γ_n}[p_s · p_{s'} ≥ 0] − Pr_{x∼γ_{n_0}}[r_s · r_{s'} ≥ 0]| ≤ ε,
(d) |Pr_{x,y∼γ_n}[u_s · u_{s'} ≥ 0] − Pr_{x,y∼γ_{n_0}}[v_s · v_{s'} ≥ 0]| ≤ ε,
(e) |Pr_{x,y∼γ_n}[p_s · u_{s'} ≥ 0] − Pr_{x,y∼γ_{n_0}}[r_s · v_{s'} ≥ 0]| ≤ ε,

where u_s(x, y) = p_s(e^{−t} x + √(1 − e^{−2t}) y) and v_s(x, y) = r_s(e^{−t} x + √(1 − e^{−2t}) y).
Indeed, Theorem 7.35 follows directly from Corollary 7.32 and the fact that sign(ps ) and sign( p̃s )
disagree with probability at most O(ε).
Returning to the notation of Corollary 7.32, our next goal is to prove that Pr_{x∼γ_{n_0}}[x ∈ Collision(f_junta)] is small.
Claim 7.36. Pr_{x∼γ_{n_0}}[x ∈ Collision(f_junta)] ≤ (k² + 1) · Pr[x ∈ Collision(f)] + k·ξ + 3k²ξ/2.
Proof. For a multivariate PTF f = PTF(p_1, …, p_k), define Unique(f, i) = {x : p_i(x) ≥ 0} \ Collision(f). In other words, it is the set of all points x such that i is the unique index with p_i(x) ≥ 0. Since we haven't mentioned f and f_junta for a while, recall that f = PTF(p_1, …, p_k) and f_junta = PTF(r_1, …, r_k). Next, observe that for real numbers a, b ≠ 0,

I[a ≥ 0 ∧ b ≥ 0] = (1/2) · ( I[a · b ≥ 0] + I[a ≥ 0] + I[b ≥ 0] − 1 )   (7.10)

(where I(P) is 1 if P is true and 0 otherwise). Since p_s(x) ≠ 0 almost surely for every s, the identity above implies that
|Pr[p_s(x) ≥ 0 ∧ p_{s'}(x) ≥ 0] − Pr[r_s(x) ≥ 0 ∧ r_{s'}(x) ≥ 0]|
≤ (1/2) · ( |Pr_{x∼γ_n}[p_s(x) ≥ 0] − Pr_{x∼γ_{n_0}}[r_s(x) ≥ 0]| + |Pr_{x∼γ_n}[p_{s'}(x) ≥ 0] − Pr_{x∼γ_{n_0}}[r_{s'}(x) ≥ 0]| )
+ (1/2) · |Pr_{x∼γ_n}[p_{s'}(x) · p_s(x) ≥ 0] − Pr_{x∼γ_{n_0}}[r_{s'}(x) · r_s(x) ≥ 0]|
≤ 3ξ/2,   (7.11)
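The identity (7.10) driving this estimate is easy to check exhaustively over sign patterns (a standalone numeric check; the helper names are ours, not the paper's):

```python
def ind(pred):
    """Indicator I(P): 1 if the predicate is true, else 0."""
    return 1 if pred else 0

def both_nonneg(a, b):
    # Right-hand side of identity (7.10); valid whenever a, b != 0.
    return (ind(a * b >= 0) + ind(a >= 0) + ind(b >= 0) - 1) / 2

# Check the identity on a grid of nonzero values covering all sign patterns.
for a in [-2.5, -1.0, 0.5, 3.0]:
    for b in [-4.0, -0.1, 1.0, 2.0]:
        assert both_nonneg(a, b) == ind(a >= 0 and b >= 0)
print("identity (7.10) holds on all tested sign patterns")
```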
where the last inequality follows from the first and third items of Corollary 7.32. Now observe that

Pr_{x∼γ_n}[p_s(x) ≥ 0] − ∑_{s'≠s} Pr_{x∼γ_n}[p_s(x) ≥ 0 ∧ p_{s'}(x) ≥ 0] ≤ Pr_{x∼γ_n}[x ∈ Unique(f, s)] ≤ Pr_{x∼γ_n}[p_s(x) ≥ 0],
Pr_{x∼γ_{n_0}}[r_s(x) ≥ 0] − ∑_{s'≠s} Pr_{x∼γ_{n_0}}[r_s(x) ≥ 0 ∧ r_{s'}(x) ≥ 0] ≤ Pr_{x∼γ_{n_0}}[x ∈ Unique(f_junta, s)] ≤ Pr_{x∼γ_{n_0}}[r_s(x) ≥ 0].
Subtracting these two and applying (7.11), we obtain

|Pr_{x∼γ_n}[x ∈ Unique(f, s)] − Pr_{x∼γ_{n_0}}[x ∈ Unique(f_junta, s)]| ≤ |Pr_{x∼γ_n}[p_s(x) ≥ 0] − Pr_{x∼γ_{n_0}}[r_s(x) ≥ 0]| + ∑_{s'≠s} Pr_{x∼γ_n}[p_s(x) ≥ 0 ∧ p_{s'}(x) ≥ 0] + 3kξ/2.
Adding over all s and applying the first item of Corollary 7.32, we obtain

∑_{s=1}^{k} |Pr_{x∼γ_n}[x ∈ Unique(f, s)] − Pr_{x∼γ_{n_0}}[x ∈ Unique(f_junta, s)]| ≤ k·ξ + ∑_{s=1}^{k} ∑_{s'≠s} Pr_{x∼γ_n}[p_s(x) ≥ 0 ∧ p_{s'}(x) ≥ 0] + 3k²ξ/2.
Noting that Pr[p_s(x) ≥ 0 ∧ p_{s'}(x) ≥ 0] ≤ Pr[x ∈ Collision(f)] for all pairs (s, s') with s ≠ s', we obtain

∑_{s=1}^{k} |Pr_{x∼γ_n}[x ∈ Unique(f, s)] − Pr_{x∼γ_{n_0}}[x ∈ Unique(f_junta, s)]| ≤ k² · Pr[x ∈ Collision(f)] + k·ξ + 3k²ξ/2.
Since Pr[x ∈ Collision(g)] = 1 − ∑_{s=1}^{k} Pr[x ∈ Unique(g, s)] for any multivariate PTF g, this immediately implies that

Pr_x[x ∈ Collision(f_junta)] ≤ (k² + 1) · Pr[x ∈ Collision(f)] + k·ξ + 3k²ξ/2.
Lemma 7.37.

|E[⟨f, P_t f⟩] − E[⟨f_junta, P_t f_junta⟩]| ≤ 2·Pr[x ∈ Collision(f)] + 2·Pr[x ∈ Collision(f_junta)] + 3kξ/2.
Proof. We begin by noting the following two inequalities:

|E[⟨f, P_t f⟩] − ∑_{s=1}^{k} Pr[p_s(x) ≥ 0 ∧ u_s(x, y) ≥ 0]| ≤ 2·Pr[x ∈ Collision(f)],
|E[⟨f_junta, P_t f_junta⟩] − ∑_{s=1}^{k} Pr[r_s(x) ≥ 0 ∧ v_s(x, y) ≥ 0]| ≤ 2·Pr[x ∈ Collision(f_junta)].
Thus,

|E[⟨f, P_t f⟩] − E[⟨f_junta, P_t f_junta⟩]| ≤ 2·Pr[x ∈ Collision(f)] + 2·Pr[x ∈ Collision(f_junta)]
+ ∑_{s=1}^{k} |Pr[p_s(x) ≥ 0 ∧ u_s(x, y) ≥ 0] − Pr[r_s(x) ≥ 0 ∧ v_s(x, y) ≥ 0]|.   (7.12)
We now seek to bound |Pr[p_s(x) ≥ 0 ∧ u_s(x, y) ≥ 0] − Pr[r_s(x) ≥ 0 ∧ v_s(x, y) ≥ 0]|. As before, using that the polynomials p_s, u_s, r_s and v_s are zero only on a set of measure zero, and using the identity from (7.10), we obtain that

|Pr[p_s(x) ≥ 0 ∧ u_s(x, y) ≥ 0] − Pr[r_s(x) ≥ 0 ∧ v_s(x, y) ≥ 0]|
≤ (1/2) · |Pr[p_s(x) ≥ 0] − Pr[r_s(x) ≥ 0]| + (1/2) · |Pr[u_s(x, y) ≥ 0] − Pr[v_s(x, y) ≥ 0]|
+ (1/2) · |Pr[p_s(x) · u_s(x, y) ≥ 0] − Pr[r_s(x) · v_s(x, y) ≥ 0]|.
Applying Corollary 7.32 to bound all three terms, we obtain that

|Pr[p_s(x) ≥ 0 ∧ u_s(x, y) ≥ 0] − Pr[r_s(x) ≥ 0 ∧ v_s(x, y) ≥ 0]| ≤ 3ξ/2.
Plugging this bound back into (7.12), we obtain that

|E[⟨f, P_t f⟩] − E[⟨f_junta, P_t f_junta⟩]| ≤ 2·Pr[x ∈ Collision(f)] + 2·Pr[x ∈ Collision(f_junta)] + 3kξ/2.
We are now in a position to finish the proof of Theorem 5.2 and show that the construction f_junta indeed satisfies the required properties. Note that by construction, f_junta is a degree-d PTF over n_0 variables where, as we have said before, n_0 = n_0(k, ε, d, t) is a computable function once β(·) is a computable function. Further, it is easy to see that β(·) can be chosen to be computable; it just needs to decrease sufficiently quickly to satisfy Condition 7.31.
We first bound ‖E[f_junta] − µ‖_1 = ‖E[f_junta] − E[f]‖_1. To do this, note that

‖E[f_junta] − E[f]‖_1 ≤ ∑_{s=1}^{k} |Pr[p̃_s(x) ≥ 0] − Pr[r_s(x) ≥ 0]| + Pr[x ∈ Collision(f)] + Pr[x ∈ Collision(f_junta)]
≤ k·ξ + Pr[x ∈ Collision(f)] + (k² + 1) · Pr[x ∈ Collision(f)] + k·ξ + 3k²ξ/2.

Here the last inequality follows by applying the first item of Corollary 7.32 and Claim 7.36. Plugging in the value ξ = ε/(40k²) and the upper bound on Pr[x ∈ Collision(f)], we obtain that ‖E[f_junta] − E[f]‖_1 ≤ ε.
For the second part, note that applying Lemma 7.37, we obtain

⟨f_junta, P_t f_junta⟩ ≥ ⟨f, P_t f⟩ − 2·Pr[x ∈ Collision(f)] − 2·Pr[x ∈ Collision(f_junta)] − 3kξ/2.

Again plugging in the value ξ = ε/(40k²) and the upper bound on Pr[x ∈ Collision(f)], we obtain that

⟨f_junta, P_t f_junta⟩ ≥ ⟨f, P_t f⟩ − ε.

This completes the proof of Theorem 5.2.
References

[1] Dominique Bakry and Michel Ledoux: Lévy–Gromov's isoperimetric inequality for an infinite dimensional diffusion generator. Inventiones Math., 123(1):259–281, 1996. [doi:10.1007/BF01232376] 17

[2] Itai Benjamini, Gil Kalai, and Oded Schramm: Noise sensitivity of Boolean functions and applications to percolation. Publ. Math. Inst. Hautes Études Sci., 90:5–43, 1999. [doi:10.1007/BF02698830, arXiv:math/9811157] 2

[3] Christer Borell: Geometric bounds on the Ornstein–Uhlenbeck velocity process. Zeitschrift für Wahrscheinlichkeitstheorie und Verw. Gebiete, 70(1):1–13, 1985. [doi:10.1007/BF00532234] 2

[4] Anthony Carbery and James Wright: Distributional and L^q norm inequalities for polynomials over convex bodies in ℝ^n. Mathematical Research Letters, 8(3):233–248, 2001. [doi:10.4310/MRL.2001.v8.n3.a1] 22

[5] Joe Corneli, Ivan Corwin, Stephanie Hurder, Vojislam Sesum, Ya Xu, Elizabeth Adams, Diana Davis, Michelle Lee, Regina Visocchi, Neil Hoffman, and Robert Hardt: Double bubbles in Gauss space and spheres. Houston J. Math., 34(1):181–204, 2008. 3

[6] Anindya De, Elchanan Mossel, and Joe Neeman: Majority is stablest: Discrete and SoS. Theory of Computing, 12(4):1–50, 2016. Preliminary version in STOC'13. [doi:10.4086/toc.2016.v012a004, arXiv:1211.1001] 7

[7] Anindya De, Elchanan Mossel, and Joe Neeman: Noise stability is computable and approximately low-dimensional. In Proc. 32nd Computational Complexity Conf. (CCC'17), pp. 10:1–10:11. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, 2017. (Conference version of this paper.) [doi:10.4230/LIPIcs.CCC.2017.10, arXiv:1701.01483]

[8] Anindya De, Elchanan Mossel, and Joe Neeman: Non-interactive simulation of correlated distributions is decidable. In Proc. 29th Ann. ACM-SIAM Symp. on Discrete Algorithms (SODA'18), pp. 2728–2746. ACM Press, 2018. [doi:10.1137/1.9781611975031.174, arXiv:1701.01485] 5, 9

[9] Anindya De and Rocco A. Servedio: Efficient deterministic approximate counting for low-degree polynomial threshold functions. In Proc. 46th STOC, pp. 832–841. ACM Press, 2014. [doi:10.1145/2591796.2591800, arXiv:1311.7178] 4, 5, 6, 20, 21, 22, 26, 27, 28, 29

[10] Irit Dinur, Ehud Friedgut, Guy Kindler, and Ryan O'Donnell: On the Fourier tails of bounded functions over the discrete cube. Israel J. Math., 160(1):389–412, 2007. Preliminary version in STOC'06. [doi:10.1007/s11856-007-0068-9] 11

[11] Herbert Federer: Geometric Measure Theory. Springer, 1969. [doi:10.1007/978-3-642-62010-2] 5, 17

[12] Badih Ghazi, Pritish Kamath, and Madhu Sudan: Decidability of non-interactive simulation of joint distributions. In Proc. 57th FOCS, pp. 545–554. IEEE Comp. Soc. Press, 2016. [doi:10.1109/FOCS.2016.65] 8

[13] Steven Heilman: Euclidean partitions optimizing noise stability. Electron. J. Probab., 19(71):1–37, 2014. [doi:10.1214/EJP.v19-3083, arXiv:1211.7138] 3

[14] Steven Heilman, Elchanan Mossel, and Joe Neeman: Standard simplices and pluralities are not the most noise stable. Israel J. Math., 213(1):33–53, 2016. Preliminary version in ITCS'15. [doi:10.1007/s11856-016-1320-y, arXiv:1403.0885] 3

[15] Michael Hutchings, Frank Morgan, Manuel Ritoré, and Antonio Ros: Proof of the double bubble conjecture. Electron. Res. Announc. Amer. Math. Soc., 6:45–49, 2000. [doi:10.1090/S1079-6762-00-00079-2, arXiv:math/0406017] 3

[16] Michael Hutchings, Frank Morgan, Manuel Ritoré, and Antonio Ros: Proof of the double bubble conjecture. Ann. of Math., 155(2):459–489, 2002. [doi:10.2307/3062123, arXiv:math/0406017] 3

[17] Marcus Isaksson and Elchanan Mossel: Maximally stable Gaussian partitions with discrete applications. Israel J. Math., 189(1):347–396, 2012. [doi:10.1007/s11856-011-0181-7, arXiv:0903.3362] 2, 7

[18] Svante Janson: Gaussian Hilbert Spaces. Cambridge University Press, 1997. [doi:10.1017/CBO9780511526169] 20, 21, 30

[19] Gil Kalai: A Fourier-theoretic perspective on the Condorcet paradox and Arrow's theorem. Adv. in Appl. Math., 29(3):412–426, 2002. [doi:10.1016/S0196-8858(02)00023-4]

[20] Subhash Khot, Guy Kindler, Elchanan Mossel, and Ryan O'Donnell: Optimal inapproximability results for Max-Cut and other 2-variable CSPs? In Proc. 45th FOCS, pp. 146–154. IEEE Comp. Soc. Press, 2004. [doi:10.1109/FOCS.2004.49] 3, 45

[21] Subhash Khot, Guy Kindler, Elchanan Mossel, and Ryan O'Donnell: Optimal inapproximability results for Max-Cut and other 2-variable CSPs? SIAM J. Comput., 37(1):319–357, 2007. Preliminary version [20]. [doi:10.1137/S0097539705447372] 2

[22] Pravesh Kothari, Amir Nayyeri, Ryan O'Donnell, and Chenggang Wu: Testing surface area. In Proc. 25th Ann. ACM-SIAM Symp. on Discrete Algorithms (SODA'14), pp. 1204–1214. ACM Press, 2014. [doi:10.1137/1.9781611973402.89] 5, 14

[23] Michel Ledoux: Semigroup proofs of the isoperimetric inequality in Euclidean and Gauss space. Bull. des Sciences Mathématiques, 118(6):485–510, 1994. 2, 5, 14

[24] Francesco Maggi: Sets of Finite Perimeter and Geometric Variational Problems: An Introduction to Geometric Measure Theory. Cambridge University Press, 2012. [doi:10.1017/CBO9781139108133] 18

[25] Elchanan Mossel, Ryan O'Donnell, and Krzysztof Oleszkiewicz: Noise stability of functions with low influences: Invariance and optimality. Ann. of Math., 171(1):295–341, 2010. Preliminary version in FOCS'05. [doi:10.4007/annals.2010.171.295, arXiv:math/0503503] 2, 3, 6, 7

[26] Joe Neeman: Testing surface area with arbitrary accuracy. In Proc. 46th STOC, pp. 393–397. ACM Press, 2014. [doi:10.1145/2591796.2591807, arXiv:1309.1387] 5

[27] John W. Neuberger: Norm of symmetric product compared with norm of tensor product. Linear and Multilinear Algebra, 2(2):115–121, 1974. [doi:10.1080/03081087408817047] 39

[28] Ivan Nourdin: Lectures on Gaussian approximations with Malliavin calculus. In Séminaire de Probabilités XLV, pp. 3–89. Springer, 2013. [doi:10.1007/978-3-319-00321-4_1, arXiv:1203.4147] 21

[29] Ryan O'Donnell: Analysis of Boolean Functions. Cambridge Univ. Press, 2014. 2

[30] Prasad Raghavendra and David Steurer: How to round any CSP. In Proc. 50th FOCS, pp. 586–594. IEEE Comp. Soc. Press, 2009. [doi:10.1109/FOCS.2009.74] 7

[31] Ben W. Reichardt: Proof of the double bubble conjecture in ℝ^n. J. Geom. Anal., 18(1):172–191, 2008. [doi:10.1007/s12220-007-9002-y, arXiv:0705.1601] 3

[32] Ben W. Reichardt, Cory Heilmann, Yuan Y. Lai, and Anita Spielman: Proof of the double bubble conjecture in ℝ⁴ and certain higher dimensional cases. Pacific J. Math., 208(2):347–366, 2003. [PJM]. 3
AUTHORS
Anindya De
Assistant professor
University of Pennsylvania
anindyad seas upenn edu
http://www.seas.upenn.edu/~anindya
Elchanan Mossel
Professor
Massachusetts Institute of Technology
elmos mit edu
http://www.math.mit.edu/~elmos/
Joe Neeman
Assistant professor
University of Texas, Austin
jneeman math utexas edu
https://web.ma.utexas.edu/users/jneeman/
ABOUT THE AUTHORS
ANINDYA DE is an assistant professor of computer science at the University of Pennsylvania. He received his Ph. D. from U. C. Berkeley in 2013, advised by Luca Trevisan. He was a postdoc at the Institute for Advanced Study and DIMACS from 2013 to 2015 and an assistant professor at Northwestern University from 2015 to 2018. His research interests are in complexity theory, learning theory, discrete analysis and applied probability.
ELCHANAN MOSSEL is a professor of mathematics at MIT. He received his Ph. D. in mathematics from Hebrew University in 2000, advised by Yuval Peres. After his Ph. D., he was a postdoc in the Theory Group of Microsoft Research, Redmond, and a Miller fellow in Statistics and Computer Science at U. C. Berkeley. His research interests include combinatorial statistics, discrete Fourier analysis and influences, randomized algorithms, computational complexity, MCMC, Markov random fields, social choice, game theory and evolution.
JOE NEEMAN is a professor of mathematics at the University of Bonn and an assistant professor of mathematics at the University of Texas, Austin. He received his Ph. D. in 2013 from U. C. Berkeley, advised by Elchanan Mossel, after which he was a postdoc at U. T. Austin. He works in various areas of probability and on applications to computer science.