                           THEORY OF COMPUTING, Volume 12 (4), 2016, pp. 1–50
                                        www.theoryofcomputing.org




          Majority is Stablest: Discrete and SoS
                   Anindya De∗                   Elchanan Mossel†                  Joe Neeman‡
               Received September 11, 2013; Revised September 9, 2014; Published July 17, 2016




       Abstract: The “Majority is Stablest” Theorem has numerous applications in hardness of
       approximation and social choice theory. We give a new proof of the “Majority is Stablest”
       Theorem by induction on the dimension of the discrete cube. Unlike the previous proof, it
       uses neither the “invariance principle” nor Borell’s result in Gaussian space. Moreover, the
       new proof allows us to derive a proof of “Majority is Stablest” in a constant level of the Sum
       of Squares hierarchy. This implies in particular that the Khot-Vishnoi instance of Max-Cut
       does not provide a gap instance for the Lasserre hierarchy.

ACM Classification: F.2, G.3
AMS Classification: 68Q17, 60G15, 60D99
Key words and phrases: Majority is Stablest, Sum-of-Squares hierarchy, Gaussian isoperimetry

1     Introduction
The “Majority is Stablest” Theorem [45] affirmed a conjecture in hardness of approximation [30] and
in social choice theory [27]. The result has since been extensively used in the two areas. One of the
surprising features of the proof of [45] is the crucial use of deep results in Gaussian analysis [8] and an
“Invariance Principle” that connects the discrete result to the Gaussian one.
    Since the statement of the “Majority is Stablest” Theorem [45] deals with functions on the discrete
cube, it is natural to ask (as many have) if there is a “discrete proof” of the statement that “Majority
     An extended abstract of this paper appeared in the Proceedings of the 45th ACM Symposium on the Theory of Computing,
2013 [11].
   ∗ Supported by Umesh Vazirani’s Templeton Foundation Grant 21674.
   † Supported by NSF award DMS-1106999, CCF-1320105, DOD ONR grant N000141110140 and Simons Foundation grant 328025.
   ‡ Supported by NSF award DMS-1106999 and DOD ONR grant N00014-11-1-0140, ONR grant N00014-14-1-0823.



© 2016 Anindya De, Elchanan Mossel, and Joe Neeman
Licensed under a Creative Commons Attribution License (CC-BY)                          DOI: 10.4086/toc.2016.v012a004

is Stablest.” In this paper we answer this question affirmatively and provide a short general proof of
the “Majority is Stablest” Theorem. The proof does not rely on Borell’s result, nor does it rely on
the “Invariance Principle.” One advantage of our new proof of “Majority is Stablest” is that it can be
turned into a constant level “Sum of Squares” proof [3]. This shows that the Khot-Vishnoi instance of
MAX-CUT [33] does not provide an integrality gap instance for MAX-CUT in the Lasserre hierarchy [36].


1.1   Functions with low-influence variables
In discrete Fourier analysis, special attention is devoted to functions f : {−1, 1}n → {0, 1} with low
influences. The ith influence of f is defined by

             Infi ( f ) = P[ f (x1 , . . . , xi−1 , xi , xi+1 , . . . , xn ) ≠ f (x1 , . . . , xi−1 , −xi , xi+1 , . . . , xn )] ,   (1.1)

where the probability is taken over the uniform distribution on the discrete cube.
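As a concrete illustration (ours, not from the paper), definition (1.1) can be evaluated by brute force for small n. A dictator has influence 1 in its own coordinate and 0 elsewhere, while each voter in a 5-bit majority is pivotal only when the other four votes split evenly.

```python
from itertools import product

def influence(f, n, i):
    """Inf_i(f) = P[f(x) != f(x with coordinate i flipped)], with x uniform
    on {-1,1}^n, as in definition (1.1)."""
    count = 0
    for x in product([-1, 1], repeat=n):
        y = list(x)
        y[i] = -y[i]  # flip the i-th coordinate
        if f(x) != f(tuple(y)):
            count += 1
    return count / 2 ** n

# The dictator f_0(x) = x_0 always notices a flip of coordinate 0: Inf_0 = 1.
dictator = lambda x: x[0]
# In 5-bit majority, voter i is pivotal iff the other four split 2-2,
# which happens with probability C(4,2)/2^4 = 6/16 = 0.375.
majority = lambda x: 1 if sum(x) > 0 else -1
```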
     Functions with low influences have played a crucial role in the development of the theory of discrete
Fourier analysis. Starting with Kahn, Kalai, and Linial [26], the use of hyper-contractive estimates applied
to low-influence variables has become one of the main techniques in discrete Fourier analysis [57, 17].
     Of particular interest are functions all of whose influences are low. The work of Friedgut and
Kalai [17] shows that such functions have sharp thresholds. Central work in theoretical computer
science [9, 14, 52] also pointed to the importance of low-influence functions, including in the context of
the “Unique Games Conjecture” [29, 30]. Such functions have also attracted much interest in the theory
of social choice, see, e. g., [16, 28].
     In the context of voting it is natural to exclude voting schemes that give individual voters too much
power. The same is true in the theory of hardness of approximation where a central concept is to
distinguish between functions that really depend on many variables versus those which have a strong
dependency on a small number of variables, see, e. g., [22, 29, 14]. In this context, it is helpful to recall
the definitions of two standard families of Boolean functions. The dictator functions { fi }i∈[n] are defined
as fi (x) = xi , while the majority function is MAJ(x) = SIGN(∑ xi ). The dictator functions lie at one end
of the spectrum, as each of them depends on exactly one variable. At the other end is the majority
function, in which every variable has o(1) influence.
     The “Majority is Stablest” Theorem considers the correlation between f (x) and f (y) where x, y ∈
{−1, 1}n are ρ-correlated vectors with ρ > 0. Assuming E[ f ] = 1/2, the function that maximizes
E[ f (x) f (y)] is a dictator function. The “Majority is Stablest” theorem states that for functions with low
influences the value of E[ f (x) f (y)] cannot be much larger than the corresponding value for the majority
function. More formally,

Definition 1.1. For ρ ∈ (−1, 1), the noise stability of f : {−1, 1}n → R at ρ is defined to be

                                                    Stabρ ( f ) := E[ f (x) f (y)] ,

when (x, y) ∈ {−1, 1}n × {−1, 1}n is chosen so that (xi , yi ) ∈ {−1, 1}2 are independent random variables
with E[xi ] = E[yi ] = 0 and E[xi yi ] = ρ.
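For small n, the quantity in Definition 1.1 can be computed exactly by summing over all pairs (x, y) with the appropriate product weights; the sketch below (our illustration, not from the paper) uses the fact that each coordinate pair agrees with probability (1 + ρ)/2. For the ±1-valued dictator f(x) = x_1 this recovers Stab_ρ( f ) = ρ.

```python
from itertools import product

def stab(f, n, rho):
    """Exact Stab_rho(f) = E[f(x) f(y)] per Definition 1.1: the pairs
    (x_i, y_i) are independent across i, with uniform marginals and
    E[x_i y_i] = rho."""
    total = 0.0
    for x in product([-1, 1], repeat=n):
        for y in product([-1, 1], repeat=n):
            p = 1.0
            for xi, yi in zip(x, y):
                # P[(x_i, y_i)] = (1 + rho)/4 if x_i = y_i, else (1 - rho)/4
                p *= (1 + rho) / 4 if xi == yi else (1 - rho) / 4
            total += f(x) * f(y) * p
    return total

# A +/-1 dictator has Stab_rho = E[x_1 y_1] = rho.
dictator = lambda x: x[0]
```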



Theorem 1.2 (“Majority Is Stablest” [45]). Let 0 ≤ ρ ≤ 1 and ε > 0 be given. Then there exists τ > 0
such that if f : {−1, 1}n → [0, 1] satisfies E[ f ] = 1/2 and Infi ( f ) ≤ τ for all i, then

                                        Stabρ ( f ) ≤ 1 − (arccos ρ)/π + ε .

    By Sheppard’s Formula [54], the quantity

                                               1 − (arccos ρ)/π

is precisely limn→∞ Stabρ (Majn ), where

                                  Majn (x1 , . . . , xn ) = sign(x1 + · · · + xn ) .

In particular, the “Majority is Stablest” Theorem shows that no low-influence function can be much more
noise stable than the majority function.
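For the reader’s convenience, here is the standard geometric proof of Sheppard’s Formula (this derivation is folklore, not part of the paper’s argument):

```latex
% Sheppard's formula: if X, Y are standard Gaussians with E[XY] = rho, then
%   Pr[sign(X) != sign(Y)] = (arccos rho) / pi.
% Proof sketch: write X = <u, G> and Y = <v, G>, where G is a standard
% Gaussian vector in R^2 and u, v are unit vectors at angle
% theta = arccos(rho). By rotational invariance, G/|G| is uniform on the
% circle, and the signs of <u, G> and <v, G> differ exactly when G/|G|
% lies in one of two arcs of total length 2*theta out of 2*pi. Hence
\begin{equation*}
  \Pr\bigl[\operatorname{sign}(X) \neq \operatorname{sign}(Y)\bigr]
    \;=\; \frac{2\theta}{2\pi} \;=\; \frac{\arccos\rho}{\pi}\,.
\end{equation*}
```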
    We also remark here that Theorem 1.2 readily generalizes to the case when E[ f ] ≠ 1/2, with the right
hand side replaced by the corresponding quantity for the shifted majority with the same expectation. This
statement of “Majority is Stablest” was conjectured in [30] in the context of hardness of approximation
for MAX-CUT. Assuming that Theorem 1.2 holds, the authors showed that it is UNIQUE-GAMES
hard1 to approximate the maximum cut in graphs to within a factor greater than .87856. . . . This result is
optimal, since the efficient algorithm of Goemans and Williamson [19] is guaranteed to find partitions
that cut a .87856. . . fraction of the maximum. A closely related conjecture (for ρ = −1/3) was made by
Kalai in the context of Arrow’s Impossibility Theorem [27]. The results of [45] imply Kalai’s conjecture
and show that Majority minimizes the probability of Arrow’s paradox in ranking 3 alternatives using a
balanced ranking function f . See [27, 45] for more details.
    The statement of Theorem 1.2 deals with Boolean functions, yet the proof of [45] crucially relies on
Gaussian analysis as (a) it uses a deep result of Borell [8] on noise stability in Gaussian space and (b) it
uses the invariance principle developed in [45], which allows one to deduce discrete statements from Gaussian
statements. This raises the following natural (informal) question:


Question: Is there a “discrete” proof of “Majority is Stablest”?

    In other words, does there exist a proof of “Majority is Stablest” that uses neither Borell’s result nor
any other result in Gaussian space? We note that almost all prior results in discrete Fourier analysis do
not use Gaussian results. In particular, the classical hyper-contractive estimates [7, 5] are proved by
induction on dimension in the discrete cube. Moreover, most of the results in the area starting from KKL
including [26, 57, 17, 9] do not require sophisticated results in Gaussian geometry.
    In our main result we provide a positive answer to the question above. Informally we show that
   1 A decision problem P is said to be UNIQUE-GAMES hard if, given an instance G of Unique Games, the problem of deciding
whether VAL(G) ≤ ε or VAL(G) ≥ 1 − ε reduces to P for any ε > 0.



Main Result: There is a proof of “Majority is Stablest” by induction on dimension.

    Our proof is short and elegant and involves only elementary calculus and hyper-contractivity. The
main difficulty in the proof is finding the right statement to prove by induction. The induction statement
involves a certain function J, which was recently used in the derivation of a robust version of Borell’s
result and “Majority is Stablest” [44] using Gaussian techniques and the invariance principle.
    In a way, our results here are an analogue of Bobkov’s famous inequality in the discrete cube [6].
Bobkov proved by induction a discrete functional inequality that at the limit becomes the Gaussian
isoperimetric inequality. Moreover, the functional that Bobkov studied was later used by Bakry and
Ledoux [2] to give a semi-group proof of the Gaussian isoperimetric inequality. Moving from isoperimetry
to noise sensitivity, the order is reversed: [44] developed a certain functional in order to give a semi-group
proof of Borell’s result. Here we use the same functional to give a discrete proof by induction.
    It is well known that the “Majority is Stablest” Theorem implies Borell’s result. Here we show
how this can be done by elementary methods only (our proof of Borell’s result does not even require
hyper-contractivity!). Our proof of Borell’s result joins a number of recent proofs of the result including
using spherical symmetrization [24], sub-additivity [34] and a semi-group proof [44]. It is the simplest
proof of Borell’s result using elementary arguments only ([24] uses sophisticated spherical re-arrangement
inequalities, [34] only works for sets of measure 1/2 and certain noise values and [44] requires basic
facts on the Ornstein-Uhlenbeck process).
    Since it was proved, Theorem 1.2 has been generalized a number of times, including in [13, 42]. The
results and their generalizations have been used numerous times in hardness of approximation and social
choice theory, including in [1, 48, 43, 18]. Our simple proof extends to cover all of the generalizations
above. It also enables us to prove a Sum-of-Squares version of the statement of “Majority is Stablest,” thus
answering the main open problem of [49], as we discuss next.

1.2   Sum of Squares proof system
We now discuss an application of our new proof of “Majority is Stablest” to hardness of approximation.
To discuss the application, we will first need to introduce the “Sum of Squares” (SoS) proof system. In a
nutshell, the SoS proof system is an algebraic proof system where constraints are encoded by polynomial
(in)equalities and the deduction rules are specified by a restricted class of polynomial operations. Viewing
this proof system as a refutation system for polynomial inequalities, the goal is to show that the given
system of constraints is infeasible by using the allowed polynomial operations to arrive at a polynomial
constraint which is “obviously” infeasible.
    Without further ado, we introduce the following notation: let X = (X1 , . . . , Xn ) be a sequence of
variables and let R[X] be the ring of polynomials on X. For polynomials p1 , . . . , pm ∈ R[X], let A =
{p1 ≥ 0, . . . , pm ≥ 0} be a set of constraints (on X = (X1 , . . . , Xn )). Also, let M[X] ⊂ R[X] be the set
of polynomials which can be expressed as sums-of-squares. In other words, q ∈ M[X] if and only if
q = r1² + · · · + rℓ² for some r1 , . . . , rℓ ∈ R[X]. For S ⊆ [m], we use pS to denote ∏i∈S pi , with p∅ = 1. Now,
suppose that there exist {qS }S⊆[m] such that qS ∈ M[X] for all S ⊆ [m] and

                                              −1 = ∑S⊆[m] pS · qS .



Then, it is clear that the constraint set A is infeasible over Rn . The surprisingly powerful theorem
of Stengle [56] (and a slightly weaker version proven earlier by Krivine [35]) shows that whenever
A is infeasible, such a certificate of infeasibility always exists. This theorem is known as Stengle’s
Positivstellensatz. Hence, given an infeasible system of constraints A, we can consider a proof system
where the proof of infeasibility is given by the set {qS : S ⊆ [m]}. We shall refer to this as the Sum of
Squares (SoS) proof system. In fact, provided a certain compactness condition holds, the certificate of
infeasibility (i. e., the set {qS : S ⊆ [m]}) can always be assumed to have qS = 0 for |S| > 1; this is due to
Putinar [25].
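Two toy certificates (our examples, not from the paper) illustrate the shape of such refutations. Take A = {p1 ≥ 0, p2 ≥ 0} with p1 = x and p2 = −x − 1, which forces x ≥ 0 and x ≤ −1 simultaneously:

```latex
% Choosing q_{\{1\}} = q_{\{2\}} = 1 = 1^2 (trivially sums of squares) and
% all other q_S = 0 gives a degree-1 refutation:
\begin{equation*}
  p_1 \, q_{\{1\}} + p_2 \, q_{\{2\}} \;=\; x + (-x - 1) \;=\; -1 .
\end{equation*}
% A certificate can also need genuine squares: for A = \{-x^2 - 1 \ge 0\},
% take q_{\emptyset} = x^2 and q_{\{1\}} = 1, so that (with p_{\emptyset} = 1)
\begin{equation*}
  p_{\emptyset} \, q_{\emptyset} + p_1 \, q_{\{1\}}
    \;=\; x^2 + (-x^2 - 1) \;=\; -1 .
\end{equation*}
% Both certificates have degree at most 2, so they already live at a
% constant level of the SoS hierarchy defined below.
```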
     While the results in [35, 56, 25] were well-known in the algebraic geometry community and are
intimately tied to Hilbert’s seventeenth problem [23], the interest in the theoretical computer science
community is relatively new. In the late 1980s, Shor [55] introduced the idea of replacing “positivity
constraints” by “sum-of-squares constraints” in optimization problems and also noted that the latter type
of constraints (for a fixed degree d) can be enforced using semidefinite programming. Subsequently, this
idea appeared in other works in optimization (for example, see Nesterov [46]). The first paper to view
Stengle’s Positivstellensatz as a proof system for refutation was that of Grigoriev and Vorobjov [21]. It
should be mentioned that an earlier paper by Lombardi, Mnev and Roy [38] also considered the proof-
theoretic aspects of the Positivstellensatz, but no attempt was made to quantify the complexity of such
proofs. While
the degree of the proofs in the SoS proof system can be potentially unbounded, from the point of view of
complexity theory, it is also interesting to consider a restricted form of the SoS proof system where one
only looks at proofs of refutation where maxS deg(pS · qS ) ≤ d. The resulting hierarchy of proof systems,
where level d corresponds to proofs of refutation of degree at most d, is referred to as the SoS hierarchy.
     Besides the obvious motivation from proof complexity, another reason to consider this restricted
version is that, while one loses completeness (i. e., an infeasible constraint set A may not have a proof of
refutation in the degree-d SoS hierarchy for any fixed d), the restricted proof system is nevertheless
effective in the following sense: if the set A has a proof of infeasibility of degree d, then it can be found in
time O(m · n^{O(d)} ) using semidefinite programming. As we have mentioned, while Shor [55] was the first
to observe this, the hierarchy was first systematically treated as a tool for optimization by two concurrent
but independent works by Jean Lasserre [36] and by Pablo Parrilo (in his Ph. D. thesis) [50]. Strictly
speaking, the semidefinite programming hierarchy considered by Lasserre is the dual of the SoS hierarchy
considered by Parrilo. Roughly, the dual of the dth level of the Lasserre hierarchy corresponds to the 2dth
level of the SoS hierarchy. However, in this paper, we will be mainly concerned with the hierarchy at level
d = O(1) vs. d = ω(1) and hence we will interchangeably use the terms “SoS hierarchy” and “Lasserre
hierarchy.” We remark that while the works of Lasserre and Parrilo were concurrent, Lasserre was the
first to connect this hierarchy with the existing lift-and-project hierarchies such as the Lovász-Schrijver
hierarchy [40].
     Given that the degree-d SoS hierarchy is automatizable, several researchers have tried to understand
the limitations of its power. Grigoriev [20] showed linear lower bounds for proofs of refutation of Tseitin
tautologies and the mod 2 principle. The latter result was independently rediscovered by Schoenebeck [53]
in the Lasserre setting.


Applications to hardness of approximation. While the results of Parillo [50] and Lasserre [36] have
been known for more than a decade, there were only a few works in the theoretical computer science


community which harnessed the algorithmic power of [50, 36] (see [4, 10]). In fact, for the results which
did use the Lasserre hierarchy, it was not clear if its full power was required, or whether weaker
hierarchies, like that of Lovász and Schrijver [40], would have sufficed.
    However, in a recent exciting paper, Barak et al. [3] used the degree-8 SoS hierarchy to refute the
known integrality gap instances for Unique Games [33, 51, 31]. In other words, there are degree-8 SoS
proofs which can be used to certify that the true value of the integrality gap instances is o(1). This is
interesting for two reasons. The first is that, even after a decade of intense investigation, these integrality
gaps remained some of the strongest evidence for the truth of the Unique Games Conjecture (UGC).
Thus, the SoS hierarchy discredits these instances as evidence for the truth of the UGC. The second
reason is that these integrality gaps were known to survive Ω((log log n)^{1/4} ) rounds of weaker hierarchies
like “SDP + Sherali-Adams” [51] or “Approximate Lasserre” [32]. Thus, this showed a big gap between
the Lasserre/SoS hierarchy and the weaker hierarchies like “SDP + Sherali-Adams” or “Approximate
Lasserre.”
    We now mention the main idea behind showing that the degree-8 SoS hierarchy refutes the known
integrality gap instances for Unique Games [33, 51, 31]. Analyzing the true optimum of these instances
uses tools from analysis like hypercontractivity [7, 5], the KKL theorem [26] etc. Hence, to show that the
degree-d SoS hierarchy can refute these instances, one essentially needs to prove SoS versions of these
statements in the degree-d SoS hierarchy. Note that so far we have only viewed SoS as a refutation
system, but in fact, as we will see later in the paper, there is an easy extension of the earlier definition
which formalizes the notion of proving a statement in the degree-d SoS hierarchy. In particular, [3] prove
SoS versions of results like hypercontractivity, small-set expansion etc.
    Extending the results of [3], O’Donnell and Zhou [49] analyze the problems “upward” of Unique
Games (i. e., problems such that Unique Games reduces to them), like MAX-CUT and BALANCED-
SEPARATOR. In particular, [49] refutes the integrality gap instances of BALANCED-SEPARATOR from [12].
Since the key to analyzing the optimum of the BALANCED-SEPARATOR instances in [12] is the KKL
theorem [26], the authors provide a proof of the KKL theorem in the degree-4 SoS hierarchy. For MAX-
CUT, their results are somewhat less powerful. Again, here they analyze the instances of MAX-CUT
from [33]. More precisely, for any ρ ∈ (−1, 0), [33] construct gap instances of MAX-CUT where the
true optimum is arccos ρ/π + o(1) whereas the basic SDP-optimum is (1 − ρ)/2 + o(1). The key to
analyzing the true optimum is the “Majority is Stablest” theorem of [45]. Thus, to refute these instances
completely—i. e., to show that the true optimum is arccos ρ/π + o(1)—requires proving the “Majority is
Stablest” theorem in some constant degree-d SoS hierarchy. While O’Donnell and Zhou did not prove
that, they did prove the weaker “2/π” theorem from [30] in some constant degree of the SoS hierarchy.
The 2/π theorem states that for any balanced function whose maximum influence is o(1), the sum of
squares of the degree-1 Fourier coefficients is bounded by 2/π + o(1). This implies that the SoS hierarchy
can certify that the true optimum is at most (1/2 − ρ/π) − (1/2 − 1/π)ρ³. They left open the problem of
refuting these gap instances optimally, i. e., showing that a constant number of rounds of the SoS hierarchy
can certify that the true optimum of these gap instances is arccos ρ/π + o(1). In this paper, as the main
application of the new proof of “Majority is Stablest,” we resolve this problem.
    It should be mentioned here that while the new proof of “Majority is Stablest” is more suitable
for the SoS hierarchy, several powerful theorems and techniques are needed to achieve this adaptation.
For example we use results from approximation theory [39] and a powerful matrix version of Putinar’s


Positivstellensatz [37] to prove that a certain polynomial approximation preserves positivity. We
mention here that, unlike the previous two papers [3, 49] connecting the SoS hierarchy with hardness of
approximation, we make essential use of Putinar’s Positivstellensatz (essentially, the completeness of the
SoS hierarchy). The next theorem shows the power of the SoS hierarchy on the instances of MAX-CUT
from [33].
Theorem (SoS-version of MAX-CUT). For every δ ∈ (0, 1) and ρ ∈ (−1, 0), there exists d = d(δ , ρ) such
that the degree-d SoS hierarchy can certify that the MAX-CUT instances from [33] with noise ρ have
true optimum less than arccos ρ/π + δ .
    Note that the true optimum for these instances is known to be at least arccos ρ/π and hence the
degree-d SoS hierarchy refutes these instances optimally. Also, as a key intermediate step, we establish an
SoS version of the “Majority is Stablest” theorem.
Theorem (SoS-version of “Majority is Stablest”). For every δ ∈ (0, 1) and ρ ∈ (−1, 0), there are
constants c = c(δ , ρ) and d = d(δ , ρ) such that the following is true: let f satisfy 0 ≤ f (x) ≤ 1 for all
x ∈ {−1, 1}n and maxi Infi ( f ) ≤ τ. Then there is a degree-d SoS proof of the statement

                                  Stabρ ( f ) ≥ 1 − arccos ρ/π − δ − c · τ .
    Our proof can be easily modified to give the analogous statement of “Majority is Stablest” when
ρ ∈ (0, 1). Of course, we have to change the direction of the inequality as well as impose the condition
that E[ f ] = 1/2 (this condition is not required when ρ ∈ (−1, 0)).
    As the reader can see, the theorem is stated very informally. This is because SoS proofs are heavy in
notation and it is difficult to express the precise statement without having the proper notation. However,
we do remark that the SoS version of MAX-CUT follows easily by composing the proof of refutation of
the UNIQUE-GAMES instances of [33] (done in [3]) with the [30] reduction (the proof of soundness of
this reduction is the step where we require the SoS version of “Majority is Stablest”).


2    Our tensorization theorem
In this section, we will prove our main tensorization inequality on the cube. In subsequent sections,
we will use it to give new proofs of the “Majority is Stablest” theorem of Mossel, O’Donnell and
Oleszkiewicz [45] and the Gaussian stability inequality of Borell [8]. We begin by recalling a function
from [44]: first, let Φ : R → (0, 1) denote the cumulative distribution function of a standard normal
variable:
                                   Φ(x) = (1/√(2π)) ∫_{−∞}^{x} e^{−y²/2} dy .
Next, for every ρ ∈ [−1, 1], define Jρ : (0, 1)² → [0, 1] as

                               Jρ (x, y) = PrX,Y [X ≤ Φ−1 (x), Y ≤ Φ−1 (y)] .

Here X,Y are jointly normally distributed random variables with the covariance matrix

                                                      ( 1   ρ )
                                      Cov(X,Y ) =     ( ρ   1 ) .
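The function Jρ is easy to evaluate numerically; the sketch below (ours, assuming SciPy’s bivariate normal CDF is available) checks it against the classical orthant formula Jρ (1/2, 1/2) = 1/4 + arcsin(ρ)/(2π) and against independence at ρ = 0.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

def J(rho, x, y):
    """J_rho(x, y) = Pr[X <= Phi^{-1}(x), Y <= Phi^{-1}(y)] for standard
    bivariate normals X, Y with correlation rho."""
    rv = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]])
    return rv.cdf([norm.ppf(x), norm.ppf(y)])

# At rho = 0 the coordinates are independent, so J_0(x, y) = x * y;
# at (1/2, 1/2) the classical formula gives 1/4 + arcsin(rho)/(2*pi).
```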


We note that [44] proved that in the Gaussian setup where f , g : Rn → [0, 1], and X,Y are jointly normal
random variables with covariance
                                                ( In    ρIn )
                                                ( ρIn   In  ) ,
then
                                      E Jρ ( f (X), g(Y )) ≤ Jρ (E f , E g) .                          (2.1)
Applied to indicator functions, this reduces to Borell’s inequality [8]. To prove “Majority is Stablest,” we
would like to show an analogous inequality on the cube, with f , g : {−1, 1}n → [0, 1] and X,Y correlated
points on {−1, 1}n . Unfortunately, the inequality (2.1) is not true in this setup, but it is true with some
extra error terms on the right hand side. First, let us define the error term:
Definition 2.1. If Ω is a probability space and f : Ωn → R, then for X ∈ Ω, we define fX : Ωn−1 → R by
                                   fX (X1 , . . . , Xn−1 ) = f (X1 , . . . , Xn−1 , X) .
Definition 2.2. For a function f : Ω → R, define
                                               ∆1 ( f ) = E | f − E f |³ .
For a function f : Ωn → R, define ∆n ( f ) recursively by
                                ∆n ( f ) = EXn [∆n−1 ( fXn )] + ∆1 (E[ fXn | Xn ]) ,
where E[ fXn | Xn ] : Ω → R is defined as
                                 E[ fXn | Xn ](xn ) = Ex1 ,...,xn−1 [ f (x1 , . . . , xn )] .
    Note that the definition of ∆n ( f ) measures the “Lipschitzness” of the function f . For example, if f is
L-Lipschitz in the sense that changing any one coordinate of the input changes the output by at most
L, then ∆n ( f ) is bounded by n · L³. Next, we recall the notion of Rényi correlation. This definition will
allow us to state a result that applies somewhat more generally than just to correlated points on {−1, 1}n .
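Definition 2.2 translates directly into a short recursive computation. The sketch below (our illustration, not from the paper) evaluates ∆n for the averaging function f (x) = (∑ xi )/n, which is L-Lipschitz with L = 2/n, so ∆n ( f ) ≤ n · L³ as remarked above.

```python
from itertools import product
from statistics import mean

def delta(f, n):
    """Delta_n(f) from Definition 2.2, for f on {-1,1}^n with the uniform
    measure: Delta_1(f) = E|f - Ef|^3, and
    Delta_n(f) = E_{X_n}[Delta_{n-1}(f_{X_n})] + Delta_1(E[f | X_n])."""
    if n == 1:
        vals = [f((b,)) for b in (-1, 1)]
        m = mean(vals)
        return mean(abs(v - m) ** 3 for v in vals)
    # First term: fix the last coordinate and recurse on the rest.
    first = mean(delta(lambda z, b=b: f(z + (b,)), n - 1) for b in (-1, 1))
    # Second term: Delta_1 of the one-variable function x_n -> E[f | X_n = x_n].
    g = {b: mean(f(z + (b,)) for z in product((-1, 1), repeat=n - 1))
         for b in (-1, 1)}
    return first + delta(lambda x: g[x[0]], 1)

# f(x) = (sum x_i)/n changes by at most L = 2/n per coordinate flip;
# each of the n recursion levels contributes (1/n)^3, so Delta_n(f) = 1/n^2.
n = 4
f = lambda x: sum(x) / n
```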
Definition 2.3. Let Ω1 and Ω2 be two sets and µ be a probability measure on Ω1 × Ω2 . We say that
µ has Rényi correlation at most ρ if for every measurable (w. r. t. µ) f : Ω1 → R and g : Ω2 → R with
Eµ f = Eµ g = 0,
                                      Eµ [ f g] ≤ ρ √( Eµ [ f ²] Eµ [g²] ) .
     For example, suppose that Ω1 = Ω2 and suppose (X,Y ) are generated by the following procedure:
first choose X according to some distribution ν. Then, with probability ρ, we set Y = X, and with
probability 1 − ρ, Y is chosen to be an independent sample from ν. If µ is the distribution of (X,Y ), then
it is easy to check that µ has Rényi correlation ρ.
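For a finite state space, the Rényi (maximal) correlation of µ equals the second-largest singular value of the matrix Q[x, y] = µ(x, y)/√(ν₁(x)ν₂(y)), where ν₁, ν₂ are the marginals — the classical Hirschfeld-Gebelein-Rényi characterization (a standard fact, not from this paper). The sketch below uses it to verify that the mixture construction above has Rényi correlation exactly ρ.

```python
import numpy as np

rho = 0.3
nu = np.array([0.2, 0.5, 0.3])  # base distribution nu on a 3-point space

# Joint law of (X, Y): with probability rho set Y = X, otherwise draw Y ~ nu.
mu = rho * np.diag(nu) + (1 - rho) * np.outer(nu, nu)

# Maximal correlation = second singular value of mu normalized by the
# marginals (both marginals of mu equal nu here).
Q = mu / np.sqrt(np.outer(nu, nu))
singular_values = np.linalg.svd(Q, compute_uv=False)
# Top singular value 1 corresponds to constant functions; the next one is rho.
```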
     We prove the following general theorem, which we will later use to derive both Borell’s inequality
and the “Majority is Stablest” theorem.
Theorem 2.4. For any ε > 0 and 0 < ρ < 1, there is C(ρ) > 0 such that the following holds. Let µ be a
ρ-correlated measure on Ω1 × Ω2 and let (Xi ,Yi )ni=1 be i. i. d. variables with distribution µ. Then for any
measurable functions f : Ωn1 → [ε, 1 − ε] and g : Ωn2 → [ε, 1 − ε],
                     E Jρ ( f (X), g(Y )) ≤ Jρ (E f , E g) + C(ρ)ε^{−C(ρ)} (∆n ( f ) + ∆n (g)) .



2.1   The base case
We prove Theorem 2.4 by induction on n. In this section, we will prove the base case n = 1:

Claim 2.5. For any ε > 0 and 0 < ρ < 1, there is a C(ρ) such that for any two random variables
X,Y ∈ [ε, 1 − ε] with correlation in [0, ρ],

                 E Jρ (X,Y ) ≤ Jρ (E X, EY ) + C(ρ)ε^{−C(ρ)} (E |X − E X|³ + E |Y − EY |³ ) .

    The proof of Claim 2.5 essentially follows from Taylor’s theorem applied to the function Jρ ; the
crucial point is that Jρ satisfies a certain differential equation. Define the matrix Mρσ (x, y) by

                                              ( ∂²Jρ (x, y)/∂x²        σ ∂²Jρ (x, y)/∂x∂y )
                               Mρσ (x, y) =   ( σ ∂²Jρ (x, y)/∂x∂y     ∂²Jρ (x, y)/∂y²    ) .




Claim 2.6. For any (x, y) ∈ (0, 1)2 and 0 ≤ σ ≤ ρ, Mρσ (x, y) is a negative semidefinite matrix. Likewise,
if ρ ≤ σ ≤ 0, then Mρσ (x, y) is a positive semidefinite matrix.

    We will also use the fact that the third derivatives of Jρ are bounded (at least, away from the boundary
of [0, 1]2 ).

Claim 2.7. For any −1 < ρ < 1, there exists C(ρ) > 0 such that for any i, j ≥ 0 with i + j = 3,

                          |∂³Jρ (x, y)/∂x^i ∂y^j| ≤ C(ρ)(xy(1 − x)(1 − y))^{−C(ρ)} .

Further, the function C(ρ) can be chosen so that it is continuous for ρ ∈ (−1, 1).

    Claims 2.6 and 2.7 follow from elementary calculus, and we defer their proofs to the appendix (in
fact, Claim 2.6 is implicit in [44] and we include the proof here for the sake of completeness). Now we
will use them with Taylor’s theorem to prove Claim 2.5.

Proof of Claim 2.5. Fix ε > 0 and ρ ∈ (0, 1). Now let C(ρ) be large enough so that all third derivatives
of Jρ are uniformly bounded by C(ρ)ε −C(ρ) on the square [ε, 1 − ε]2 (such a C(ρ) exists by Claim 2.7).
Taylor’s theorem then implies that for any a, b, a + x, b + y ∈ [ε, 1 − ε],

  Jρ (a + x, b + y) ≤ Jρ (a, b) + x ∂Jρ/∂x (a, b) + y ∂Jρ/∂y (a, b)

                                    ( ∂²Jρ/∂x² (a, b)      ∂²Jρ/∂x∂y (a, b) ) ( x )
                      + (1/2) (x y) ( ∂²Jρ/∂x∂y (a, b)     ∂²Jρ/∂y² (a, b)  ) ( y )

                      + C(ρ)ε^{−C(ρ)} (|x|³ + |y|³) .                                          (2.2)



                         T HEORY OF C OMPUTING, Volume 12 (4), 2016, pp. 1–50                                  9
                                A NINDYA D E , E LCHANAN M OSSEL , AND J OE N EEMAN

   Now suppose that X and Y are random variables taking values in [ε, 1 − ε]. If we apply (2.2) with
a = E X, b = EY , x = X − E X, and y = Y − EY , and then take expectations of both sides, we obtain

    E Jρ(X,Y) ≤ Jρ(E X, EY)
                 + (1/2) E [ (X̃  Ỹ) [ ∂²Jρ/∂x²(a, b)   ∂²Jρ/∂x∂y(a, b) ; ∂²Jρ/∂x∂y(a, b)   ∂²Jρ/∂y²(a, b) ] (X̃  Ỹ)^T ]
                 + C(ρ)ε^{−C(ρ)} (E |X̃|³ + E |Ỹ|³) ,                                                          (2.3)

where X̃ = X − E X and Ỹ = Y − EY . Now, if X and Y have correlation σ ∈ [0, ρ] then

                                      E X̃ Ỹ = σ √(E X̃² · E Ỹ²) ,

and so

    E [ (X̃  Ỹ) [ ∂²Jρ/∂x²(a, b)   ∂²Jρ/∂x∂y(a, b) ; ∂²Jρ/∂x∂y(a, b)   ∂²Jρ/∂y²(a, b) ] (X̃  Ỹ)^T ]
      = (σX  σY) [ ∂²Jρ/∂x²(a, b)   σ ∂²Jρ/∂x∂y(a, b) ; σ ∂²Jρ/∂x∂y(a, b)   ∂²Jρ/∂y²(a, b) ] (σX  σY)^T ,
where σX = √(E X̃²) and σY = √(E Ỹ²). By Claim 2.6,

    (σX  σY) [ ∂²Jρ/∂x²(a, b)   σ ∂²Jρ/∂x∂y(a, b) ; σ ∂²Jρ/∂x∂y(a, b)   ∂²Jρ/∂y²(a, b) ] (σX  σY)^T
      = (σX  σY) Mρσ(a, b) (σX  σY)^T ≤ 0 .

Applying this to (2.3), we obtain

                             E Jρ(X,Y) ≤ Jρ(E X, EY) + C(ρ)ε^{−C(ρ)} (E |X̃|³ + E |Ỹ|³) .

2.2      The inductive step
Next, we prove Theorem 2.4 by induction.

Proof of Theorem 2.4. Claim 2.5 proves the base case of the induction. Only the inductive case remains
to be proven. Assume that the theorem holds with n replaced by n − 1. Consider f : Ω1^n → [ε, 1 − ε] and
g : Ω2^n → [ε, 1 − ε].
    Conditioning on (Xn ,Yn ) and writing X̃ = (X1 , . . . , Xn−1 ), Ỹ = (Y1 , . . . ,Yn−1 ), we have

                                 E Jρ ( f (X), g(Y )) = EXn ,Yn EX̃,Ỹ Jρ ( fXn (X̃), gYn (Ỹ )) .

Applying the inductive hypothesis for n − 1 conditionally on Xn and Yn ,

      EX̃,Ỹ Jρ( fXn (X̃), gYn (Ỹ)) ≤ Jρ(E[ fXn | Xn ], E[gYn | Yn ]) + C(ρ)ε^{−C(ρ)} (∆n−1( fXn ) + ∆n−1(gYn )) .   (2.4)

On the other hand, the base case for n = 1 implies that

 EXn,Yn Jρ(E[ fXn | Xn ], E[gYn | Yn ]) ≤ Jρ(E f , E g) + C(ρ)ε^{−C(ρ)} (∆1(E[ fXn | Xn ]) + ∆1(E[gYn | Yn ])) . (2.5)


Taking the expectation of (2.4) over Xn and Yn and combining it with (2.5), we obtain

           E Jρ( f (X), g(Y)) ≤ Jρ(E f , E g) + C(ρ)ε^{−C(ρ)} (EXn ∆n−1( fXn ) + EYn ∆n−1(gYn ))
                                + C(ρ)ε^{−C(ρ)} (∆1(E[ fXn | Xn ]) + ∆1(E[gYn | Yn ])) .

Finally, note that the definition of ∆n implies that the right-hand side above is just

                                Jρ(E f , E g) + C(ρ)ε^{−C(ρ)} (∆n( f ) + ∆n(g)) .


3      Borell’s inequality
The most interesting special case of Theorem 2.4 is when Ω1 = Ω2 = {−1, 1} and the distributions of
Xi , Yi satisfy E Xi = EYi = 0, E XiYi = ρ. In this section and the next, we will focus on this special case.
As we will see in this section, even proving very crude bounds for this case implies Borell’s Gaussian
stability inequality. First, let us recall the functional version of Borell’s inequality that was given in [44].
Theorem 3.1. Let ρ ≥ 0 and let G1 and G2 be Gaussian vectors with joint distribution

                                (G1 , G2 ) ∼ N( 0, [ Id   ρId ; ρId   Id ] ) .

For any measurable f1 , f2 : R^d → [0, 1],

                                E Jρ( f1 (G1 ), f2 (G2 )) ≤ Jρ(E f1 , E f2 ) .
    We will prove Theorem 3.1 using Theorem 2.4 and a crude bound on ∆n ( f ) (in the next section, we
will need a much better bound on ∆n ( f ) to prove that “Majority is Stablest”).
Claim 3.2. For X ∈ {−1, 1}^n, define

                              X^{−i} = (X1 , . . . , Xi−1 , −Xi , Xi+1 , . . . , Xn ) .

Then

                              ∆n( f ) ≤ (1/8) ∑_{i=1}^{n} E | f (X) − f (X^{−i})|³ .
Proof. The proof is by induction: the base case is trivial. The inductive step follows by conditioning on
Xn :

    ∆n( f ) = EXn [∆n−1( fXn )] + EXn | E[ fXn | Xn ] − E f |³
            ≤ (1/8) ∑_{i=1}^{n−1} E | f (X) − f (X^{−i})|³ + (1/8) EXn | E[ fXn | Xn ] − E[ f−Xn | Xn ] |³
            ≤ (1/8) ∑_{i=1}^{n−1} E | f (X) − f (X^{−i})|³ + (1/8) E | f (X) − f (X^{−n})|³ .
Here the first inequality uses the induction hypothesis while the second inequality uses Jensen’s inequality.

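Claim 3.2 can be sanity-checked numerically on a small cube. The sketch below is ours, not the authors'; it assumes the recursive form of ∆n from Section 2 (conditioning on the last coordinate, with ∆1(g) = |ĝ(1)|³ as in the base case of Claim 4.2), and all function names are hypothetical.

```python
import itertools, random

def delta(g, k):
    # Recursive Delta_k: Delta_1(h) = |h-hat(1)|^3 and
    # Delta_k(g) = E_{X_k}[Delta_{k-1}(g_{X_k})] + Delta_1(E[g_{X_k} | X_k]).
    if k == 1:
        return abs((g[(1,)] - g[(-1,)]) / 2) ** 3
    out, means = 0.0, {}
    for xk in (1, -1):
        gx = {x[:-1]: g[x] for x in g if x[-1] == xk}  # restrict last coordinate
        out += delta(gx, k - 1) / 2
        means[xk] = sum(gx.values()) / 2 ** (k - 1)
    return out + abs((means[1] - means[-1]) / 2) ** 3

def flip_bound(f, n):
    # Right-hand side of Claim 3.2: (1/8) * sum_i E|f(X) - f(X^{-i})|^3.
    total = 0.0
    for i in range(n):
        for x in f:
            y = x[:i] + (-x[i],) + x[i + 1:]           # X^{-i}: flip coordinate i
            total += abs(f[x] - f[y]) ** 3 / 2 ** n / 8
    return total

random.seed(0)
n = 4
f = {x: random.random() for x in itertools.product((1, -1), repeat=n)}
assert delta(f, n) <= flip_bound(f, n) + 1e-12
```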


Proof of Theorem 3.1. Let n = md and define

               G1,n = (1/√m) ( ∑_{i=1}^{m} Xi , ∑_{i=m+1}^{2m} Xi , . . . , ∑_{i=(d−1)m+1}^{md} Xi ) .

Define G2,n similarly by using Y instead of X. In other words, G1,n and G2,n are vectors obtained by
averaging the vectors X and Y over consecutive blocks of size m. Define Z as

                            Z = (X1 , Xm+1 , . . . , X(d−1)m+1 ,Y1 ,Ym+1 , . . . ,Y(d−1)m+1 ) .
Observe that (G1,n , G2,n ) is distributed as a sum of m independent copies of Z scaled by 1/√m. Applying
the Lindeberg-Feller central limit theorem [15], we obtain that (G1,n , G2,n ) converges in distribution to
(G1 , G2 ) as m → ∞.
    Suppose first that f1 and f2 are L-Lipschitz functions taking values in [ε, 1 − ε]. Define g1 , g2 :
{−1, 1}^n → R as

               gj(z) = fj( (1/√m) ( ∑_{i=1}^{m} zi , ∑_{i=m+1}^{2m} zi , . . . , ∑_{i=(d−1)m+1}^{md} zi ) ) .

By Theorem 2.4,

                   E Jρ(g1(X), g2(Y)) ≤ Jρ(E g1 , E g2 ) + C(ρ)ε^{−C(ρ)} (∆n(g1) + ∆n(g2)) .      (3.1)

Since fi is L-Lipschitz,

                                      |gi(X) − gi(X^{−j})| ≤ 2L/√m ,

for every j, and so Claim 3.2 implies that

                                      ∆n(gi) ≤ L³n/m^{3/2} = L³d/√m .
Applying this to (3.1),

                    E Jρ(g1(X), g2(Y)) ≤ Jρ(E g1 , E g2 ) + C(ρ)ε^{−C(ρ)} · 2L³d/√m ,

and so the definition of gi implies

          E Jρ( f1(G1,n ), f2(G2,n )) ≤ Jρ(E f1(G1,n ), E f2(G2,n )) + C(ρ)ε^{−C(ρ)} · 2L³d/√m .
Taking m → ∞, the multivariate central limit theorem implies that

                                E Jρ ( f1 (G1 ), f2 (G2 )) ≤ Jρ (E f1 (G1 ), E f2 (G2 )) .             (3.2)

    This establishes the theorem for functions f1 and f2 which are Lipschitz and take values in [ε, 1 − ε].
But any measurable f1 , f2 : Rd → [0, 1] can be approximated (say in L p (Rd , γd )) by Lipschitz functions
with values in [ε, 1 − ε]. Since neither the Lipschitz constant nor ε appears in (3.2), the general statement
of the theorem follows from the dominated convergence theorem.


4    “Majority is Stablest”
By giving a bound on ∆n ( f ) that is better than Claim 3.2, we can derive the “Majority is Stablest” theorem
from Theorem 2.4. Indeed, we can express ∆n ( f ) in terms of the Fourier coefficients of f , and we can
bound ∆n ( f ) in terms of the maximum influence of any variable in f . For this, we will introduce some
very basic Fourier analytic preliminaries below.

Fourier analysis
Consider the domain {−1, 1}^n equipped with the uniform measure. We start by defining the “character”
functions, i. e., for every S ⊆ [n], define χS : {−1, 1}^n → R as χS(x) = ∏_{i∈S} xi . Now, every function
f : {−1, 1}^n → R can be expressed as

               f (x) = ∑_{S⊆[n]} f̂(S) χS(x)    where    f̂(S) = E_{x∈{−1,1}^n} [ f (x) · χS(x)] .


The coefficients f̂(S) are called the Fourier coefficients of f and the expansion of f in terms of f̂(S) is
called the Fourier expansion of f . It is easy to show that

                                  ∑_{S⊆[n]} f̂²(S) = E_{x∈{−1,1}^n} [ f ²(x)] .


This is known in the literature as Parseval’s identity. For the sake of conciseness, we use f̂(i) to denote
f̂({i}) for 1 ≤ i ≤ n. Similarly, for any ρ ∈ [−1, 1] and x ∈ {−1, 1}^n, we define y ∼ρ x as the distribution
over {−1, 1}^n where every bit of y is independent and E[xi yi ] = ρ. This immediately lets us define the
noise operator Tρ as follows: for any function f : {−1, 1}^n → R,

                                          Tρ f (x) = E_{y∼ρ x} [ f (y)] .


The effect of the noise operator Tρ is particularly simple to describe on the Fourier spectrum: T̂ρ f (S) =
ρ^{|S|} f̂(S). The reader is referred to the excellent set of lecture notes by Ryan O’Donnell [47] for an
extensive reference on this topic.
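These preliminaries are easy to verify by brute force on a small cube. The sketch below is our own illustration (all names are hypothetical): it checks Parseval's identity and the diagonal action of Tρ on the spectrum against the direct definition of y ∼ρ x, where each bit of y independently agrees with the corresponding bit of x with probability (1 + ρ)/2.

```python
import itertools
from math import prod

n = 3
cube = list(itertools.product((1, -1), repeat=n))
subsets = [tuple(S) for r in range(n + 1)
           for S in itertools.combinations(range(n), r)]

def chi(S, x):
    # character chi_S(x) = prod_{i in S} x_i
    return prod(x[i] for i in S)

def fourier(f):
    # f-hat(S) = E[f(x) chi_S(x)] under the uniform measure
    return {S: sum(f[x] * chi(S, x) for x in cube) / 2 ** n for S in subsets}

f = {x: max(x[0], x[1] * x[2]) for x in cube}   # an arbitrary test function
fh = fourier(f)

# Parseval's identity: sum_S f-hat(S)^2 = E[f^2]
lhs = sum(c * c for c in fh.values())
rhs = sum(f[x] ** 2 for x in cube) / 2 ** n
assert abs(lhs - rhs) < 1e-12

# T_rho acts diagonally on the spectrum: (T_rho f)-hat(S) = rho^{|S|} f-hat(S)
rho = 0.5
Tf = {x: sum(rho ** len(S) * fh[S] * chi(S, x) for S in subsets) for x in cube}
for x in cube:
    # direct computation of E_{y ~_rho x} f(y)
    direct = sum(f[y] * prod((1 + rho) / 2 if xi == yi else (1 - rho) / 2
                             for xi, yi in zip(x, y))
                 for y in cube)
    assert abs(Tf[x] - direct) < 1e-12
```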
      It is also important to remark here that while we prove the “Majority is Stablest” theorem for the
hypercube with the uniform measure, one can easily derive analogues of this theorem for more general
product spaces by extending our machinery. Instead of using the Fourier expansion of the function,
one has to use the Efron-Stein decomposition (see the lecture notes by Mossel [41] for an extensive
reference on the Efron-Stein decomposition). All the statements that we prove here have analogues in the
Efron-Stein world. We leave it to the expert reader to fill in the details.
      We start by extending the notation of Definition 2.1:

Definition 4.1. For disjoint sets S, T ⊂ [n], and elements x ∈ {−1, 1}^S, y ∈ {−1, 1}^T, we write x · y for
their concatenation in {−1, 1}^{S∪T}.
    For a function f : {−1, 1}^n → R, a set S ⊂ [n], and an element x ∈ {−1, 1}^S, we define fx :
{−1, 1}^{[n]\S} → R by fx(y) = f (x · y).


    Our first observation is that ∆n ( f ) can be written in terms of Fourier coefficients of random restrictions
of f .
Claim 4.2. Let Si be defined as Si = {i + 1, . . . , n}. Then

                              ∆n( f ) = ∑_{i=1}^{n} E_{X∈{−1,1}^{Si}} | f̂X(i)|³ .

Proof. The proof is by induction. The base case is just the fact that for a function f : {−1, 1} → R,

                         | f̂(1)|³ = |( f (1) − f (−1))/2|³ = (E | f − E f |)³ .

For the inductive step, we have

      ∆n( f ) = EXn [∆n−1( fXn )] + ∆1(E[ fXn | Xn ]) = EXn [∆n−1( fXn )] + | f̂(n)|³
              = EXn [ ∑_{i=1}^{n−1} E_{Xi+1 ,...,Xn−1} | f̂X(i)|³ ] + | f̂(n)|³ = ∑_{i=1}^{n} E_{X∈{−1,1}^{Si}} | f̂X(i)|³ .

The first equality uses the definition of ∆n ( f ) and the third equality uses the induction hypothesis. The
second equality follows simply from the definition of E[ fXn | Xn ].

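Claim 4.2 can likewise be checked by brute force. The sketch below is ours (hypothetical helper names, coordinates 0-indexed, so Si corresponds to positions i + 1, . . . , n − 1): it compares the recursive ∆n against the sum of cubed degree-one Fourier coefficients of random restrictions.

```python
import itertools, random

random.seed(1)
n = 3
f = {x: random.random() for x in itertools.product((1, -1), repeat=n)}

def delta(g, k):
    # recursive Delta_k, conditioning on the last coordinate
    if k == 1:
        return abs((g[(1,)] - g[(-1,)]) / 2) ** 3
    out, means = 0.0, {}
    for xk in (1, -1):
        gx = {x[:-1]: g[x] for x in g if x[-1] == xk}
        out += delta(gx, k - 1) / 2
        means[xk] = sum(gx.values()) / 2 ** (k - 1)
    return out + abs((means[1] - means[-1]) / 2) ** 3

def rhs(f):
    # sum_i E_{X in {-1,1}^{S_i}} |f_X-hat(i)|^3, the right-hand side of Claim 4.2
    total = 0.0
    for i in range(n):
        acc = 0.0
        for tail in itertools.product((1, -1), repeat=n - 1 - i):
            # f_X is a function of coordinates 0..i; its Fourier coefficient
            # on {i} is the expectation of f_X(y) * y_i
            coef = sum(f[y + tail] * y[i]
                       for y in itertools.product((1, -1), repeat=i + 1))
            acc += abs(coef / 2 ** (i + 1)) ** 3
        total += acc / 2 ** (n - 1 - i)
    return total

assert abs(delta(f, n) - rhs(f)) < 1e-12
```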
   In order to control the Fourier coefficients of restrictions of f , we first express them in terms of the
Fourier coefficients of f :
Claim 4.3. For any disjoint S and U and any x ∈ {−1, 1}^S,

                                  f̂x(U) = ∑_{T⊂S} χT(x) f̂(T ∪ U) .

Proof. Fix S and x. Let g : {−1, 1}^n → R be the function such that g(y) = 1 when yi = xi for all i ∈ S,
and g(y) = 0 otherwise. It is easy to check that the Fourier expansion of g is

                                  g(y) = 2^{−|S|} ∑_{T⊂S} χT(x) χT(y) .

Then

       f̂x(U) = E_{X[n]\S} [ fx(X[n]\S ) · χU(X[n]\S )] = 2^{|S|} E_X [ f (X) g(X) χU(X)]
              = E_X [ f (X) ∑_{T⊂S} χT(x) χT(X) χU(X) ] = ∑_{T⊂S} χT(x) f̂(T ∪ U) .

    In particular, the identity of Claim 4.3 allows us to compute second moments of f̂X :

Claim 4.4. For any function f : {−1, 1}^n → R, any x ∈ {−1, 1}^S and any i ∈ U ⊂ [n],

                                  E_{X∈{−1,1}^S} | f̂X(U)|² ≤ Infi( f ) .

Moreover, if Si = {i + 1, . . . , n} then

                             ∑_{i=1}^{n} E_{X∈{−1,1}^{Si}} | f̂X(i)|² = Var( f ) .



Proof. In view of Claim 4.3, we can write

                   | f̂X(U)|² = ∑_{T,T′⊂S} χT(X) χT′(X) f̂(T ∪ U) f̂(T′ ∪ U) .

When we take the expectation with respect to XS , E χT(X) χT′(X) = δT,T′ (where δT,T′ denotes the
indicator for the event T = T′) and so the cross-terms vanish:

                             E_{XS} | f̂X(U)|² = ∑_{T⊂S} f̂²(T ∪ U) .                           (4.1)


Since Infi( f ) = ∑_{T∋i} f̂²(T ), the first part of the claim follows.
    For the second part,

       ∑_{i=1}^{n} E_{X∈{−1,1}^{Si}} | f̂X(i)|² = ∑_{i=1}^{n} ∑_{T⊂Si} f̂²(T ∪ {i}) = ∑_{U⊂[n], U≠∅} f̂²(U) ,      (4.2)

where the last equality uses the fact that every non-empty U ⊂ [n] can be written uniquely in the form
T ∪ {i} for some T ⊂ {i + 1, . . . , n}. But of course the right-hand side of (4.2) is just Var( f ).

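Both parts of Claim 4.4 can be confirmed numerically, computing restriction coefficients directly from the definition (rather than via Claim 4.3, to keep the check independent). The sketch below is ours; the particular choice of restricted set S = {1, 2} and U = {0} is an arbitrary example.

```python
import itertools, random
from math import prod

random.seed(2)
n = 3
cube = list(itertools.product((1, -1), repeat=n))
f = {x: random.random() for x in cube}

subsets = [frozenset(S) for r in range(n + 1)
           for S in itertools.combinations(range(n), r)]
fh = {S: sum(f[x] * prod(x[i] for i in S) for x in cube) / 2 ** n
      for S in subsets}

infl = {i: sum(c * c for S, c in fh.items() if i in S) for i in range(n)}
var = sum(c * c for S, c in fh.items() if S)       # sum over non-empty S

def restricted_coef(fixed, U):
    # f_x-hat(U), computed directly: expectation over the free coordinates
    free = [i for i in range(n) if i not in fixed]
    acc = 0.0
    for vals in itertools.product((1, -1), repeat=len(free)):
        point = dict(fixed); point.update(zip(free, vals))
        acc += f[tuple(point[i] for i in range(n))] * prod(point[i] for i in U)
    return acc / 2 ** len(free)

# First part: E_X |f_X-hat(U)|^2 <= Inf_i(f), here with S = {1, 2} and U = {0}
second_moment = sum(restricted_coef(dict(zip((1, 2), xs)), {0}) ** 2
                    for xs in itertools.product((1, -1), repeat=2)) / 4
assert second_moment <= infl[0] + 1e-12

# Second part: sum_i E_{X in {-1,1}^{S_i}} |f_X-hat(i)|^2 = Var(f)
total = 0.0
for i in range(n):
    Si = tuple(range(i + 1, n))
    acc = sum(restricted_coef(dict(zip(Si, xs)), {i}) ** 2
              for xs in itertools.product((1, -1), repeat=len(Si)))
    total += acc / 2 ** len(Si)
assert abs(total - var) < 1e-12
```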
    Next, we will consider f̂x(i) as a polynomial in x and apply hypercontractivity to the right-hand
side of Claim 4.2. First, note that Tσ commutes (up to a multiplicative factor) with restriction:

Claim 4.5. For any 0 < σ < 1, if S,U ⊂ [n] are disjoint then, as polynomials in x = (xi )i∈S ,

                                  (T̂σ f )x(U) = σ^{|U|} Tσ( f̂x(U)) .

Proof. By Claim 4.3,

                                  f̂x(U) = ∑_{T⊂S} χT(x) f̂(T ∪ U) .

Since T̂σ f (T ∪ U) = σ^{|T|+|U|} f̂(T ∪ U) and Tσ χT(x) = σ^{|T|} χT(x), it follows that

              (T̂σ f )x(U) = ∑_{T⊂S} σ^{|T|+|U|} χT(x) f̂(T ∪ U) = σ^{|U|} Tσ( f̂x(U)) .

At this point, we recall the well-known Bonami-Beckner hypercontractive inequality:

Theorem 4.6 ([7, 5]). Let f : {−1, 1}^n → R and 1 ≤ q ≤ p. Then, for any ρ ≤ √((q − 1)/(p − 1)),

                                          ‖Tρ f ‖p ≤ ‖ f ‖q .

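Theorem 4.6 can be spot-checked by brute force; the sketch below (our own code) takes q = 2 and p = 4, so the critical noise rate is ρ = √(1/3), and computes Tρ f directly from the definition of y ∼ρ x.

```python
import itertools, random

random.seed(3)
n = 4
cube = list(itertools.product((1, -1), repeat=n))
f = {x: random.uniform(-1, 1) for x in cube}

q, p = 2, 4
rho = ((q - 1) / (p - 1)) ** 0.5          # the critical rate sqrt((q-1)/(p-1))

def noise(f, rho):
    # T_rho f(x) = E_{y ~_rho x} f(y); each bit of y independently equals
    # the corresponding bit of x with probability (1 + rho)/2
    out = {}
    for x in cube:
        acc = 0.0
        for y in cube:
            w = 1.0
            for xi, yi in zip(x, y):
                w *= (1 + rho) / 2 if xi == yi else (1 - rho) / 2
            acc += w * f[y]
        out[x] = acc
    return out

def norm(g, r):
    # L^r norm under the uniform measure
    return (sum(abs(v) ** r for v in g.values()) / 2 ** n) ** (1 / r)

Tf = noise(f, rho)
assert norm(Tf, p) <= norm(f, q) + 1e-12   # hypercontractivity
```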
Now, set q = 2 and p = 1 + σ^{−2} for some 0 < σ < 1. Let S and U be disjoint subsets of [n]. Defining the
function g : {−1, 1}^S → R as g(x) = f̂x(U), we apply Theorem 4.6 to get

                E_{X∈{−1,1}^S} |Tσ( f̂X(U))|^p ≤ (E_{X∈{−1,1}^S} | f̂X(U)|²)^{p/2} .


Claim 4.5 then implies that

           σ^{−p|U|} · E_{X∈{−1,1}^S} |(T̂σ f )X(U)|^p ≤ (E_{X∈{−1,1}^S} | f̂X(U)|²)^{p/2} ,

and since 0 ≤ σ ≤ 1, we may remove it from the left-hand side:

                E_{X∈{−1,1}^S} |(T̂σ f )X(U)|^p ≤ (E_{X∈{−1,1}^S} | f̂X(U)|²)^{p/2} .

Specializing this to S = Si = {i + 1, . . . , n} and U = Ui = {i} (for any 1 ≤ i ≤ n), we obtain

                E_{X∈{−1,1}^{Si}} |(T̂σ f )X(i)|^p ≤ (E_{X∈{−1,1}^{Si}} | f̂X(i)|²)^{p/2} .

Applying Claim 4.4 to the right-hand side, we get

           E_{X∈{−1,1}^{Si}} |(T̂σ f )X(i)|^p ≤ Infi( f )^{(p−2)/2} · (E_{X∈{−1,1}^{Si}} | f̂X(i)|²) .

Summing over i = 1 to n, we get

   ∑_{i=1}^{n} E_{X∈{−1,1}^{Si}} |(T̂σ f )X(i)|^p ≤ ∑_{i=1}^{n} Infi( f )^{(p−2)/2} · (E_{X∈{−1,1}^{Si}} | f̂X(i)|²)
                                                 ≤ max_i Infi( f )^{(p−2)/2} · Var( f ) ,       (4.3)

where the last inequality uses the fact that

                ∑_{i=1}^{n} E_{X∈{−1,1}^{Si}} | f̂X(i)|² = ∑_{S≠∅} f̂²(S) = Var( f ) .

As a consequence, we have the following claim:
Claim 4.7. If 1 + σ^{−2} ≤ 3 then

                          ∆n(Tσ f ) ≤ (max_i Infi( f ))^{(1−σ²)/(2σ²)} .

Proof. Since Range( f ) ⊆ [−1, 1], we have |(T̂σ f )X(i)| ≤ 1 for any X ∈ {−1, 1}^{Si}. As a consequence,
using Claim 4.2, we get that for 2 ≤ p ≤ 3,

                     ∆n(Tσ f ) ≤ ∑_{i=1}^{n} E_{X∈{−1,1}^{Si}} |(T̂σ f )X(i)|^p .

Finally, using (4.3), we get the desired conclusion.

       Now we are ready to prove the “Majority is Stablest” theorem. For this, we define Sρ ( f ) as

                                           Sρ ( f ) = Ex∈{−1,1}n , y∼ρ x [ f (x) f (y)] .

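By Parseval, the definition above is equivalent to Sρ( f ) = ∑_S ρ^{|S|} f̂²(S). The sketch below (our own code, not the authors') uses this to compare a dictator with the 3-bit majority, both as [0, 1]-valued functions; the dictator turns out to be more noise-stable, which is exactly why the theorem below must assume that all influences are small.

```python
import itertools
from math import prod

def stability(f, n, rho):
    # S_rho(f) = E[f(x) f(y)] for y ~_rho x; by Parseval this equals
    # sum over S of rho^{|S|} * f-hat(S)^2.
    cube = list(itertools.product((1, -1), repeat=n))
    total = 0.0
    for r in range(n + 1):
        for S in itertools.combinations(range(n), r):
            c = sum(f(x) * prod(x[i] for i in S) for x in cube) / 2 ** n
            total += rho ** len(S) * c * c
    return total

dictator = lambda x: (1 + x[0]) / 2                   # all weight on coordinate 0
maj3 = lambda x: (1 + (1 if sum(x) > 0 else -1)) / 2  # every influence is small
rho = 0.5
assert abs(stability(dictator, 3, rho) - (1/4 + rho/4)) < 1e-12
assert abs(stability(maj3, 3, rho) - (1/4 + 3*rho/16 + rho**3/16)) < 1e-12
assert stability(maj3, 3, rho) < stability(dictator, 3, rho)
```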
Theorem 4.8. For any 0 < ρ < 1, there are constants 0 < c(ρ),C(ρ) < ∞ such that for any function
f : {−1, 1}^n → [0, 1] with maxi Infi( f ) ≤ τ,

                     Sρ( f ) ≤ Jρ(E f , E f ) + C(ρ) · log log(1/τ) / log(1/τ) .


    As remarked earlier, our proof extends to the generalizations of Theorem 4.8 such as those presented
by [42]. The extension of the proof uses the Efron-Stein decomposition instead of the Fourier decomposition.
The only difference is that the hypercontractivity parameter will now depend on the underlying
space. See [42] for more details.

Proof. Suppose f : {−1, 1}^n → [ε, 1 − ε] satisfies maxi Infi( f ) ≤ τ, and let X,Y be uniformly random
elements of {−1, 1}^n with E XiYi = ρ. By setting σ = 1 − η for sufficiently small η > 0, Claim 4.7
implies that
                                       ∆n(T1−η f ) ≤ τ^{η/2} .
Note that the range of T1−η f is contained in [ε, 1 − ε] because the range of f is. Hence, Theorem 2.4
applied to T1−η f implies that

  E Jρ(T1−η f (X), T1−η f (Y)) ≤ Jρ(E T1−η f , E T1−η f ) + C(ρ)ε^{−C(ρ)} ∆n(T1−η f ) ≤ Jρ(E f , E f ) + C(ρ)ε^{−C(ρ)} τ^{cη} .

Since Jρ(x, y) ≥ xy, it follows that

              S_{ρ(1−η)²}( f ) = E T1−η f (X) T1−η f (Y) ≤ Jρ(E f , E f ) + C ε^{−C(ρ)} τ^{cη} .

This inequality holds for any 0 < ρ < 1; hence we can replace ρ(1 − η)² by ρ to obtain

              Sρ( f ) = E T1−η f (X) T1−η f (Y) ≤ J_{ρ(1−η)^{−2}}(E f , E f ) + C ε^{−C(ρ)} τ^{cη}          (4.4)

for any ρ ≤ (1 − η)².
    Now, (4.4) holds for any f : {−1, 1}n → [ε, 1 − ε]. For a function f taking values in [−1, 1], let f ε be
f truncated to [ε, 1 − ε]. Since | E f ε − E f | ≤ ε and (by the proof of Claim 2.6)
                                       ∂Jρ(x, y)/∂x ≤ 1

for any ρ,
                 J_{ρ(1−η)^{−2}}(E f ε, E f ε) ≤ J_{ρ(1−η)^{−2}}(E f , E f ) + 2ε .
On the other hand, | f − f ε | ≤ ε and so

                                     Sρ ( f ) = E f (X) f (Y ) ≥ Sρ ( f ε ) − 2ε .

Thus, (4.4) applied to f ε implies that for any ρ ≤ (1 − η)2 and any ε > 0,

                     Sρ( f ) ≤ J_{ρ(1−η)^{−2}}(E f , E f ) + 2ε + C ε^{−C(ρ)} τ^{cη} .

If we set ε = τ^{cη/(2C(ρ))} then

                          Sρ( f ) ≤ J_{ρ(1−η)^{−2}}(E f , E f ) + C τ^{c(ρ)η} .

    Finally, some calculus on Jρ (see Claim A.1) shows that

                              ∂Jρ(x, y)/∂ρ ≤ (1 − ρ²)^{−3/2}


for any x, y; hence

   Sρ( f ) ≤ J_{ρ(1−η)^{−2}}(E f , E f ) + ((1 − η)^{−2} − 1)/(1 − ρ²)^{3/2} + C τ^{c(ρ)η} ≤ Jρ(E f , E f ) + C(ρ)(η + τ^{c(ρ)η}) .

Choosing

                               η = C(ρ) · log log(1/τ) / log(1/τ)
completes the proof as long as ρ ≤ (1 − η)². However, we can trivially make the theorem true for
(1 − η)² ≤ ρ by choosing C(ρ) and c(ρ) appropriately.


5    Sum of Squares hierarchy
In this section, we formally give an introduction to the Sum of Squares (hereafter abbreviated as SoS)
hierarchy. To define the SoS hierarchy, let X = (x1 , . . . , xn ) and let R[X] be the ring of real polynomials
over these variables. We also let R≤d [X] denote the subset of R[X] consisting of polynomials of total
degree bounded by d. As before, let M[X] ⊂ R[X] be the set of polynomials which can be expressed as
sums-of-squares. For polynomials p1 , . . . , pm and q1 , . . . , q` , consider two sets of constraints,

    • Ae = {p1 (X) = 0, p2 (X) = 0, . . . , pm (X) = 0};

    • Ag = {q1 (X) ≥ 0, q2 (X) ≥ 0, . . . , q` (X) ≥ 0}.

Before we go ahead, we define the set Mn,d [X] as the set of monomials over x1 , . . . , xn of degree bounded
by d. Also, let M≤d [X] denote the subset of M[X] of polynomials of degree bounded by d. Further, if
A = Ae ∪ Ag , define V(A) = {X ∈ Rn : A holds on X}. We next define the (degree d) closure of these
constraints:

             Cd(Ae) = { ps(X) · p(X) : s ∈ [m], p(X) ∈ Mn,d[X] and deg(p) + deg(ps) ≤ d } ;
             Cd(Ag) = { ∏_{i=1}^{ℓ} q_i^{a_i}(X) : a1 , . . . , aℓ ∈ Z+ and ∑_{i=1}^{ℓ} ai · deg(qi) ≤ d } .

Note that Cd(Ag) includes 1 ∈ R. Also, it is obvious that the following constraints are implied by Ae ∪ Ag :
for p(X) ∈ Cd(Ae), p(X) = 0, and for q(X) ∈ Cd(Ag), q(X) ≥ 0. We also observe that the sets Cd(Ae)
and Cd(Ag) can be computed from Ae and Ag in time n^{O(d)} · max{ℓ, m}.

Definition 5.1. For the constraint set A = Ae ∪ Ag defined above and h(X) ∈ R[X], we say A ⊢d h(X) ≥ 0
if and only if

                 h(X) = ∑_{p(X)∈Cd(Ae)} αp · p(X) + ∑_{q(X)∈Cd(Ag)} rq(X) · q(X) ,

where αp ∈ R, rq ∈ M[X] and for all q(X) ∈ Cd(Ag), deg(rq) + deg(q) ≤ d. In this case, we say that A
degree-d SoS proves h(X) ≥ 0.

                         T HEORY OF C OMPUTING, Volume 12 (4), 2016, pp. 1–50                                  18
                                M AJORITY IS S TABLEST: D ISCRETE AND S O S

    For the constraint set A, we say that A ⊢d −1 ≥ 0 if and only if we can write

                 −1 = ∑_{p(X)∈Cd(Ae)} αp · p(X) + ∑_{q(X)∈Cd(Ag)} rq(X) · q(X)

with the same constraints on αp and rq as above. In this case, we say that there is a degree-d SoS
refutation of the constraint set A.

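As a toy instance of Definition 5.1, take a single variable with Ae = {x² − 1 = 0} and Ag empty. The identity 1 − x = ½(1 − x)² − ½(x² − 1) exhibits a degree-2 SoS proof that A ⊢2 1 − x ≥ 0: the first summand is a square and the second is a real multiple of the equality constraint. The sketch below (hypothetical helper names, our own illustration) verifies the identity by comparing coefficient lists.

```python
# Univariate polynomials as coefficient lists [c0, c1, c2, ...].
def add(p, q):
    m = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
            for i in range(m)]

def mul(p, q):
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def scale(c, p):
    return [c * a for a in p]

def trim(p):
    # drop trailing zero coefficients so equal polynomials compare equal
    while p and p[-1] == 0:
        p = p[:-1]
    return p

sos_part = scale(0.5, mul([1, -1], [1, -1]))   # (1/2)(1 - x)^2, a sum of squares
lin_part = scale(-0.5, [-1, 0, 1])             # alpha_p * p(X) with p = x^2 - 1
assert trim(add(sos_part, lin_part)) == [1, -1]   # equals 1 - x
```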
    Note that we are adopting the same notation as in [49]. The reason we are interested in Definition 5.1
is that one can efficiently decide if A ⊢d −1 ≥ 0 using semidefinite programming. This is because
deciding if A ⊢d −1 ≥ 0 is equivalent to refuting the existence of a map Ẽ : R≤d[X] → R satisfying the
following conditions (see [50] for more details):

    • Ẽ(1) = 1.

    • It is a linear map, i. e., for every g, h ∈ R≤d[X] and α, β ∈ R, Ẽ(αg + β h) = α Ẽ(g) + β Ẽ(h).

    • For every h ∈ Cd(Ae), Ẽ(h) = 0.

    • For every h ∈ Cd(Ag) and g ∈ M≤d[X] such that deg(g · h) ≤ d, Ẽ(g · h) ≥ 0.

A map Ee which satisfies all the above constraints is called a degree-d SoS consistent map for the constraint
set A = Ae ∪ Ag . Lasserre [36] and Parillo [50] have shown that using semidefinite programming, it is
possible to decide the feasibility of such a map Ee in time m · nO(d) . In fact, if there exists such a map E,
                                                                                                            e
then the algorithm outputs one in the same time. It is important to mention that since the domain of Ee is
not finite, it is not obvious what one means by outputting the map. To see why this makes sense, note that
Ee is a linear map and hence it suffices to give to specify Ee on the set Mn,d [X]. We also remark here that
the notion of finding a mapping Ẽ is close to the viewpoint taken by Barak et al. [3].
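To make the conditions on Ẽ concrete, note that a genuine expectation over any distribution supported on the solution set of the constraints satisfies all of them. The following sketch is ours, not part of the paper; the one-variable constraint set (Ae = {x^2 − 1 = 0}, Ag = {1 − x ≥ 0}) is purely illustrative:

```python
from fractions import Fraction

# Toy constraint set: A_e = {x^2 - 1 = 0}, A_g = {1 - x >= 0}.
# The genuine expectation over the uniform distribution on the solution
# set {-1, +1} is one example of a degree-d SoS consistent map E~.
def E(poly):
    # poly: a Python function of x; average it over the two solutions
    return Fraction(1, 2) * (poly(Fraction(1)) + poly(Fraction(-1)))

assert E(lambda x: Fraction(1)) == 1             # E~(1) = 1
assert E(lambda x: (x**2 - 1) * (3*x + 5)) == 0  # vanishes on C_d(A_e)
assert E(lambda x: x**2 * (1 - x)) >= 0          # >= 0 on (square) * (A_g)
# Linearity is immediate, since E is an average of point evaluations.
```

A pseudo-expectation need not come from an actual distribution; the point of the SDP is that these four conditions are all it has to satisfy.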
     To get started with SoS proof systems, we state a few facts (which are very easy to prove):

Fact 5.2.

    • If A ⊢_d p ≥ 0 and A′ ⊢_{d′} q ≥ 0, then A ∪ A′ ⊢_{max{d,d′}} p + q ≥ 0.

    • If A ⊢_d p ≥ 0 and A ⊢_{d′} q ≥ 0, then A ⊢_{d+d′} p · q ≥ 0.

    • If A ⊢_d {p1 ≥ 0, p2 ≥ 0, . . . , pm ≥ 0} and {p1 ≥ 0, p2 ≥ 0, . . . , pm ≥ 0} ⊢_{d′} q ≥ 0, then A ⊢_{d·d′} q ≥ 0.

    We next prove several other propositions in the SoS proof system. It will be helpful for the uninitiated reader to look at these proofs in order to get familiar with the SoS proof system. For the rest of the paper, we set the following convention for indeterminates appearing in SoS proofs: capital letters X, Y and Z will be used to denote a sequence of indeterminates (i. e., X = (x1, . . . , xn)) while small letters x, y and z will be used to indicate single indeterminates. This convention, however, is only for indeterminates in the SoS proofs. For other variables, both capital and small letters will be used. Also, we will consider polynomials on the indeterminates occurring in the SoS proofs. Whenever we refer to such polynomials without an explicit reference to the underlying indeterminates, the set of indeterminates will be clear from the context.


5.1     Useful facts in SoS hierarchy
Fact 5.3. If A ⊢_d p ≥ 0 and A ⊢_d q ≥ 0, then A ⊢_d p + q ≥ 0.

Fact 5.4. If A = {−1 ≤ y ≤ 1},

    • if k is an even integer, A ⊢_{k+1} 0 ≤ y^k ≤ 1,

    • if k is an odd integer, A ⊢_k −1 ≤ y^k ≤ 1.

Proof. Note that for k = 0, 1, the conclusion is trivially true. For k = 2, note that trivially, y^2 ≥ 0. So, we begin by observing that (from [49])

    1 − y^2 = (1/2)(1 + y)^2 (1 − y) + (1/2)(1 + y)(1 − y)^2,

and hence −1 ≤ y ≤ 1 ⊢_3 y^2 ≤ 1. This finishes the case k = 2. For the remaining cases, we use induction. We first consider the case when k > 2 is even. Then, trivially, we have y^k ≥ 0. Also, observe that 1 − y^k = y^2(1 − y^{k−2}) + (1 − y^2). Hence, by the induction hypothesis, we have A ⊢_{k+1} y^k ≤ 1.
    Next, consider the case when k > 2 is odd. Again, as 1 − y^k = y^2(1 − y^{k−2}) + (1 − y^2), by the induction hypothesis we get A ⊢_k y^k ≤ 1. Also, note that 1 + y^k = y^2(1 + y^{k−2}) + (1 − y^2). Hence, again by the induction hypothesis, we get A ⊢_k −1 ≤ y^k ≤ 1.

Fact 5.5. −1 ≤ y ≤ 1 ⊢_5 y^4 ≤ y^2.

Proof.
    y^2 − y^4 = (1/2) y^2 (1 + y)^2 (1 − y) + (1/2)(1 + y)(1 − y)^2 y^2.

As the largest degree appearing is 5, this finishes the proof.
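The certificates in Facts 5.4 and 5.5 are plain polynomial identities, so they can be verified exactly by evaluating both sides at enough rational points. The check below is ours, not part of the paper:

```python
from fractions import Fraction

def check(lhs, rhs, pts=range(-5, 6)):
    # Two univariate polynomials of degree <= 10 are identical iff they
    # agree at the 11 points -5, ..., 5; we evaluate exactly.
    return all(lhs(Fraction(y)) == rhs(Fraction(y)) for y in pts)

half = Fraction(1, 2)
# Base case of Fact 5.4: the certificate for 1 - y^2
assert check(lambda y: 1 - y**2,
             lambda y: half*(1+y)**2*(1-y) + half*(1+y)*(1-y)**2)
# Induction identity used for general k in Fact 5.4
for k in (3, 4, 5, 6):
    assert check(lambda y, k=k: 1 - y**k,
                 lambda y, k=k: y**2*(1 - y**(k-2)) + (1 - y**2))
# Fact 5.5: the degree-5 certificate for y^2 - y^4
assert check(lambda y: y**2 - y**4,
             lambda y: half*y**2*(1+y)**2*(1-y) + half*(1+y)*(1-y)**2*y**2)
```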

Fact 5.6. Let {a ≤ y ≤ b} ⊢_d p(y) ≥ 0. Then, for λ1, . . . , λk ∈ R such that ∀i ∈ [k], λi ≥ 0 and ∑_{i=1}^k λi = 1, we have that

    ∪_{i=1}^k {a ≤ zi ≤ b} ⊢_d p(∑_{i=1}^k λi zi) ≥ 0.

Proof. In the SoS proof of p(y) ≥ 0, whenever the term (b − y) appears, we simply substitute it by ∑_{i=1}^k λi (b − zi). Likewise, whenever the term (y − a) appears, we substitute it by ∑_{i=1}^k λi (zi − a). It is easy to see that this substitution shows that

    ∪_{i=1}^k {a ≤ zi ≤ b} ⊢_d p(∑_{i=1}^k λi zi) ≥ 0.

Fact 5.7. For k ≥ 3, −1 ≤ y ≤ 1 ⊢_{2k+1} 0 ≤ y^{2k} ≤ y^4.

Proof. 1 − y^2 = (1/2)(1 + y)^2 (1 − y) + (1/2)(1 + y)(1 − y)^2 and hence −1 ≤ y ≤ 1 ⊢_3 y^2 ≤ 1. As a consequence, for any j ≥ 1, we can get that −1 ≤ y ≤ 1 ⊢_{2j+1} y^{2j} ≤ y^{2j−2}. Summing these inequalities as j varies from j = 3 to j = k, we get the stated inequalities.


Fact 5.8. For integers m, n ≥ 2,

    {−1 ≤ y ≤ 1, −1 ≤ z ≤ 1} ⊢_{1+max{2m,2n}} −(y^4 + z^4) ≤ y^m z^n ≤ (y^4 + z^4).

Proof. ⊢_{max{2m,2n}} y^{2m} + z^{2n} ≥ y^m z^n. Also, using Fact 5.7, −1 ≤ y ≤ 1 ⊢_{2m+1} y^4 ≥ y^{2m}. Similarly, we have −1 ≤ z ≤ 1 ⊢_{2n+1} z^4 ≥ z^{2n}. Combining these, we have

    {−1 ≤ y ≤ 1, −1 ≤ z ≤ 1} ⊢_{1+max{2m,2n}} y^m z^n ≤ (y^4 + z^4).

Replacing y by −y and z by −z, we can similarly get

    {−1 ≤ y ≤ 1, −1 ≤ z ≤ 1} ⊢_{1+max{2m,2n}} −y^m z^n ≤ (y^4 + z^4).

This completes the proof.
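The first step of the proof, ⊢ y^{2m} + z^{2n} ≥ y^m z^n, has the explicit SoS certificate (1/2)(y^m − z^n)^2 + (1/2)(y^m)^2 + (1/2)(z^n)^2. The following exact check of this decomposition is ours, not from the paper:

```python
from fractions import Fraction

# Exact check of the SoS certificate
#   y^{2m} + z^{2n} - y^m z^n
#     = (1/2)(y^m - z^n)^2 + (1/2)(y^m)^2 + (1/2)(z^n)^2
# on a grid of rational points, for a few small m, n.
half = Fraction(1, 2)
for m in (2, 3):
    for n in (2, 3):
        for yi in range(-4, 5):
            for zi in range(-4, 5):
                y, z = Fraction(yi), Fraction(zi)
                lhs = y**(2*m) + z**(2*n) - y**m * z**n
                rhs = half*(y**m - z**n)**2 + half*y**(2*m) + half*z**(2*n)
                assert lhs == rhs
```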

Fact 5.9. For any odd integer n ≥ 3,

    {−1 ≤ y ≤ 1, −1 ≤ z ≤ 1} ⊢_{n+2} −(y^4 + z^4) ≤ y z^n ≤ (y^4 + z^4).

Proof. We use A to denote {−1 ≤ y ≤ 1, −1 ≤ z ≤ 1}. We begin with the case n = 3. We first use Fact 3.10 from [49], which states that A ⊢_4 yz^3 ≤ y^4 + z^4. This proves one side of the inequality. We can replace y by −y to get the other inequality. This proves the case n = 3. For n > 3, we begin by observing that the case n = 3 implies A ⊢_{n+1} yz^n ≤ y^4 z^{n−3} + z^{n+1}. Now, using Fact 5.4 (Item 1), we get A ⊢_{n+2} z^{n+1} ≤ z^4. And similarly, we get A ⊢_{n−2} z^{n−3} ≤ 1 and hence A ⊢_{n+2} z^{n−3} y^4 ≤ y^4. Combining these, we get that A ⊢_{n+2} yz^n ≤ y^4 + z^4. Replacing y by −y, we get the other side.

Fact 5.10. Let A ⊢_{d1} 0 ≤ x ≤ 1 and A ⊢_{d2} −z ≤ y ≤ z where z ∈ M≤d3[X]. Then,

    A ⊢_{d1+max{d2,d3}} −z ≤ xy ≤ z.

Proof. Note that z − xy = z(1 − x) + x(z − y). Now, A ⊢_{d1+d2} x(z − y) ≥ 0 and A ⊢_{d3+d1} z(1 − x) ≥ 0. Combining these, we get that A ⊢_{d1+max{d2,d3}} xy ≤ z. Flipping y to −y, we get the other inequality.

    Moving on from elementary inequalities, we have a bounded-degree hypercontractive inequality due to Barak et al. This inequality will be used in place of the Bonami-Beckner hypercontractive inequality (Theorem 4.6) in the SoS version of the error analysis.

Fact 5.11. [3] Let n, d ∈ N and d ≤ n. For every S ⊆ [n] such that |S| ≤ d, we have an indeterminate ℓ̂(S). For x ∈ {−1, 1}^n, define

    ℓ(x) = ∑_{S⊆[n]:|S|≤d} ℓ̂(S) χS(x).

Then,
    ⊢_4  E_{x∈{−1,1}^n}[ℓ^4(x)] ≤ 9^d (E_{x∈{−1,1}^n}[ℓ^2(x)])^2.
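Since Fact 5.11 asserts a polynomial inequality, any concrete instance can be tested numerically. The sketch below is ours (the random coefficients are arbitrary); it evaluates both sides for a random degree-2 multilinear polynomial on {−1,1}^5:

```python
import itertools
import random
from math import prod

# Numerical sanity check of the hypercontractive bound of Fact 5.11
# on a random multilinear polynomial of degree <= d over {-1,1}^n.
random.seed(0)
n, d = 5, 2
subsets = [S for r in range(d + 1) for S in itertools.combinations(range(n), r)]
coeffs = {S: random.uniform(-1.0, 1.0) for S in subsets}  # arbitrary l-hat(S)
cube = list(itertools.product([-1, 1], repeat=n))

def ell(x):
    # l(x) = sum_S l-hat(S) chi_S(x), where chi_S(x) = prod_{i in S} x_i
    return sum(c * prod(x[i] for i in S) for S, c in coeffs.items())

m2 = sum(ell(x) ** 2 for x in cube) / len(cube)   # E[l^2]
m4 = sum(ell(x) ** 4 for x in cube) / len(cube)   # E[l^4]
assert m4 <= 9 ** d * m2 ** 2
```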

   We next state the following important theorem of Putinar [25] which shall be repeatedly used in the
SoS proof of “Majority is Stablest.”


Theorem 5.12. [25] Let A = {p1(X) ≥ 0, . . . , pm(X) ≥ 0} and define M(A) to be the set of all polynomials expressible as ∑_{i=1}^m ri pi + r0 where r0, . . . , rm ∈ M[X]. Assume that ∃q ∈ M(A) such that the set {X : q(X) ≥ 0} is compact. If p > 0 on the set V(A), then p ∈ M(A).

    In the SoS proof terminology, Putinar’s theorem has the following consequence. For the convenience
of the reader, we recall that V(A) is the set of x where the constraints defined by A hold.

Corollary 5.13. Let X = (x1, x2) and A = {x1 ≥ ε, x2 ≥ ε, x1 ≤ 1 − ε, x2 ≤ 1 − ε}. Then, for any p(X) such that p(X) ≥ ε for all X ∈ V(A), there exists an integer d = d(p) such that A ⊢_d p ≥ ε/2.

Proof. We define the polynomials p1 = x1 − ε, p2 = 1 − ε − x1, p3 = x2 − ε and p4 = 1 − ε − x2. Now, note that q(x1, x2) defined as

    q(x1, x2) = (1 − ε − x1)^2 · p1 + (x1 − ε)^2 · p2 + (1 − ε − x2)^2 · p3 + (x2 − ε)^2 · p4
              = (1 − 2ε) · ( −(x1 − 1/2)^2 − (x2 − 1/2)^2 + 1/2 − 2ε(1 − ε) ).

Clearly, q(x1, x2) ∈ M(A), and the set {(x1, x2) : q(x1, x2) ≥ 0} is compact. As a consequence, we can apply Theorem 5.12 to get that there is an integer d = d(p) such that A ⊢_d p − ε/2 ≥ 0. This implies A ⊢_d p ≥ ε/2.
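The algebraic identity for q(x1, x2) used above can be confirmed exactly. The following check is ours, not from the paper; it evaluates both forms at rational points for a fixed ε:

```python
from fractions import Fraction

# Exact check, on rational points, of the identity for q(x1, x2)
# from the proof of Corollary 5.13.
def q_lhs(x1, x2, e):
    return ((1 - e - x1)**2 * (x1 - e) + (x1 - e)**2 * (1 - e - x1)
            + (1 - e - x2)**2 * (x2 - e) + (x2 - e)**2 * (1 - e - x2))

def q_rhs(x1, x2, e):
    h = Fraction(1, 2)
    return (1 - 2*e) * (-(x1 - h)**2 - (x2 - h)**2 + h - 2*e*(1 - e))

e = Fraction(1, 10)
for i in range(-3, 4):
    for j in range(-3, 4):
        x1, x2 = Fraction(i, 2), Fraction(j, 2)
        assert q_lhs(x1, x2, e) == q_rhs(x1, x2, e)
```

Both sides are cubic in each variable, so agreement on a 7 × 7 rational grid establishes the identity.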

    As a key step in one of our proofs, we will also require a matrix version of Putinar's Positivstellensatz (see [37] for details). A matrix Γ ∈ (R[X])^{p×p} is said to be a sum of squares if there exists B ∈ (R[X])^{p×q} (for some q ∈ N) such that B · B^T = Γ.

Theorem 5.14. Let A = {p1(X) ≥ 0, . . . , pm(X) ≥ 0} satisfy the conditions in the hypothesis of Theorem 5.12. Let Γ ∈ (R[X])^{p×p} be a symmetric matrix and δ > 0 be such that Γ ⪰ δI on the set V(A). Then, Γ = Γ0(X) + ∑_{i=1}^m Γi(X) · pi(X) where Γ0, . . . , Γm are sums of squares.

    In the same way that Corollary 5.13 followed from Theorem 5.12, we have the following consequence of Theorem 5.14.

Corollary 5.15. Let X = (x1, . . . , xn) and A = {p1(X) ≥ 0, . . . , pm(X) ≥ 0} satisfy the conditions in Theorem 5.12. Let Γ ∈ (R[X])^{p×p} be such that for x ∈ V(A), Γ ⪰ δI for some δ > 0. Let v ∈ (R[X])^p. Then, if p = v^T · Γ · v, then p ∈ M(A).

Proof. First, by applying Theorem 5.14, we get Γ = Γ0(X) + ∑_{i=1}^m Γi(X) · pi(X). Let us write Γi = Bi · Bi^T. Then,

    p = v^T · Γ · v = v^T (Γ0(X) + ∑_{i=1}^m Γi(X) · pi(X)) v = (B0^T · v)^T · (B0^T · v) + ∑_{i=1}^m (Bi^T · v)^T · (Bi^T · v) · pi(X).

This proves the claim.


6    SoS proof of “Majority is Stablest”
The principal theorem of this section is the SoS version of the "Majority is Stablest" theorem of [45]. Before we state the theorem, we will need a few definitions. We will consider the indeterminates f(x) (for x ∈ {−1, 1}^n). The constraints on these indeterminates are given by

    A_p = {0 ≤ f(x) ≤ 1 : for all x ∈ {−1, 1}^n}.

As is the case in the usual setting, it is helpful to define the Fourier coefficients of f:

    For S ⊆ [n],  f̂(S) = E_{x∈{−1,1}^n}[f(x) · χS(x)]  and hence  f(x) = ∑_{S⊆[n]} f̂(S) χS(x).


Note that the f̂(S) are nothing but linear forms in the original indeterminates. It is also helpful to recall the notion of influences and low-degree influences in this context:

    Inf_i(f) = ∑_{S∋i} f̂^2(S),    Inf_i^{≤d}(f) = ∑_{S∋i:|S|≤d} f̂^2(S).


With this, we state the main theorem of this section.

Theorem 6.1. For any κ > 0 and ρ ∈ (−1, 0), ∃ d0 = d0(κ, ρ), d1 = d1(κ, ρ) and c = c(κ, ρ) such that

    A_p ⊢_{d0}  E_{x∈{−1,1}^n, y ∼_ρ x}[f(x) · f(y) + (1 − f(x)) · (1 − f(y))] ≥ 1 − (1/π) arccos ρ − κ − c · ∑_{i=1}^n (Inf_i^{≤d1} f)^2.


    This is easily seen to be equivalent to the statement of the “Majority is Stablest” theorem of [45].
Before we delve further into the SoS proofs, we will familiarize ourselves with the Fourier machinery in
the SoS world. The upshot of the ensuing discussion is that the basic Fourier identities and operations
hold without any changes in the SoS world.
    We start by verifying that Parseval's identity holds, i. e., for {f(x)} and {f̂(S)} defined as above,

    E[f^2(x)] = ∑_{S⊆[n]} f̂^2(S).


Similarly, we can define the noise operator Tρ here as follows: given the sequence of indeterminates {f(x)}_{x∈{−1,1}^n}, we define the sequence of indeterminates {g(x)}_{x∈{−1,1}^n} as

    g(x) = E_{y ∼_ρ x}[f(y)]

and for every x, use Tρ f(x) to refer to g(x). It is also easy to check that if we define ĝ(S) = E_x[g(x) · χS(x)], then ĝ(S) = ρ^{|S|} f̂(S).
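Both identities, Parseval and ĝ(S) = ρ^{|S|} f̂(S), can be verified exactly for a concrete function on {−1,1}^3. The sketch below is ours (the random f and ρ = −1/3 are arbitrary choices); it uses exact rational arithmetic:

```python
import itertools
import random
from fractions import Fraction
from math import prod

# Exact check of g-hat(S) = rho^{|S|} f-hat(S) and Parseval on {-1,1}^3.
random.seed(1)
n, rho = 3, Fraction(-1, 3)
cube = list(itertools.product([-1, 1], repeat=n))
subsets = [S for r in range(n + 1) for S in itertools.combinations(range(n), r)]
f = {x: Fraction(random.randint(0, 10), 10) for x in cube}  # values in [0,1]

def fourier(h, S):
    # h-hat(S) = E_x[h(x) chi_S(x)]
    return Fraction(sum(h[x] * prod(x[i] for i in S) for x in cube), len(cube))

def T(h, x):
    # T_rho h(x) = E_{y ~rho x} h(y): each coordinate of y agrees with x
    # with probability (1 + rho)/2, independently.
    return sum(prod(Fraction(1 + rho, 2) if y[i] == x[i] else Fraction(1 - rho, 2)
                    for i in range(n)) * h[y] for y in cube)

g = {x: T(f, x) for x in cube}
for S in subsets:
    assert fourier(g, S) == rho ** len(S) * fourier(f, S)
# Parseval: E[f^2] = sum_S f-hat(S)^2
assert Fraction(sum(f[x] ** 2 for x in cube), len(cube)) == \
    sum(fourier(f, S) ** 2 for S in subsets)
```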


6.1     Smoothening the function
For our purposes, it is necessary to do a certain smoothening of the function f. In particular, we start by considering a new function f1, i. e., we create a new sequence of indeterminates defined by f1(x) = (1 − ε) f(x) + ε/2 for some ε > 0. The value of ε shall be fixed later. We observe that

    A_p ⊢_1 ∪_{x∈{−1,1}^n} {ε ≤ f1(x) ≤ 1 − ε},                (6.1)
    f̂1(S) = (1 − ε) f̂(S) + (ε/2) · 1_{S=∅}.

We begin by making the following claim.

Claim 6.2.
    A_p ⊢_2 f(x) f(y) − 2ε ≤ f1(x) f1(y) ≤ f(x) f(y) + 2ε.

Proof. Begin by noting that 1 − f(x) f(y) = (1 − f(x)) f(y) + (1 − f(y)) and hence A_p ⊢_2 0 ≤ f(x) f(y) ≤ 1. Also,

    f1(x) f1(y) − f(x) f(y) = (ε^2 − 2ε) f(x) f(y) + ε^2/4 + (ε/2)(1 − ε)(f(x) + f(y)).

Using A_p ⊢_2 0 ≤ f(x) f(y), we get that

    A_p ⊢_2 f1(x) f1(y) − f(x) f(y) ≤ (1 − ε)ε + ε^2/4 ≤ 2ε.

On the other hand, using A_p ⊢_2 f(x) f(y) ≤ 1, we get that

    A_p ⊢_2 f(x) f(y) − f1(x) f1(y) ≤ 2ε − ε^2 ≤ 2ε.

    The next stage of smoothening is done by defining g = T_{1−η} f1 for some η > 0. Again, the value of η will be fixed later. We observe that the following relations hold:

    ∪_{x∈{−1,1}^n} {ε ≤ f1(x) ≤ 1 − ε} ⊢_1 ∪_{x∈{−1,1}^n} {ε ≤ g(x) ≤ 1 − ε},        (6.2)
    ĝ(S) = (1 − ε)(1 − η)^{|S|} f̂(S) + (ε/2) · 1_{S=∅}.

Also, observe that
    E_{x, y ∼_ρ x}[f1(x) · f1(y)] = E_{x, y ∼_{ρ′} x}[g(x) · g(y)]

where ρ′ = ρ/(1 − η)^2. Of course, this imposes the condition |ρ| ≤ (1 − η)^2. So, we have to choose η to be small enough compared to |ρ|. Now, define the constraint set

    A′_p = ∪_{x∈{−1,1}^n} {ε ≤ g(x) ≤ 1 − ε}.
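The change of correlation parameter above can be checked via the standard Fourier formula E_{x, y ∼_ρ x}[h(x)h(y)] = ∑_S ρ^{|S|} ĥ(S)^2. The sketch below is ours (the particular f1, ρ and η are arbitrary); it verifies the identity exactly:

```python
import itertools
from fractions import Fraction
from math import prod

# Check of E_{x, y ~rho x}[f1(x) f1(y)] = E_{x, y ~rho' x}[g(x) g(y)]
# with g = T_{1-eta} f1 and rho' = rho / (1 - eta)^2, using the Fourier
# formula E_{x, y ~rho x}[h(x) h(y)] = sum_S rho^{|S|} h-hat(S)^2.
n = 3
rho, eta = Fraction(-1, 4), Fraction(1, 10)
rho_p = rho / (1 - eta) ** 2
cube = list(itertools.product([-1, 1], repeat=n))
subsets = [S for r in range(n + 1) for S in itertools.combinations(range(n), r)]
f1 = {x: Fraction(sum(x) + n, 2 * n) for x in cube}  # some values in [0, 1]

def fourier(h, S):
    return Fraction(sum(h[x] * prod(x[i] for i in S) for x in cube), len(cube))

f1_hat = {S: fourier(f1, S) for S in subsets}
g_hat = {S: (1 - eta) ** len(S) * f1_hat[S] for S in subsets}  # g = T_{1-eta} f1

lhs = sum(rho ** len(S) * f1_hat[S] ** 2 for S in subsets)
rhs = sum(rho_p ** len(S) * g_hat[S] ** 2 for S in subsets)
assert lhs == rhs
```

The equality holds term by term, since ρ′^{|S|} (1 − η)^{2|S|} = ρ^{|S|}.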

So, we summarize the discussion of this subsection in the following two claims.

Claim 6.3. For any q and d ∈ N, if A′_p ⊢_d q ≥ 0, then A_p ⊢_d q ≥ 0.


    The proof of the above is obtained by combining (6.1) and (6.2) with the third bullet of Fact 5.2. Similarly, using Claim 6.2 and

    E_{x, y ∼_ρ x}[f1(x) · f1(y)] = E_{x, y ∼_{ρ′} x}[g(x) · g(y)],

we get the next claim:

Claim 6.4.
    A_p ⊢_2 E_{x, y ∼_ρ x}[f(x) · f(y)] ≥ E_{x, y ∼_{ρ′} x}[g(x) · g(y)] − 2ε.

    Thus, the above two claims mean that from now on, we can work with A′_p and aim to prove a lower bound on

    E_{x, y ∼_{ρ′} x}[g(x) · g(y)].
At this stage, let J̃_{ρ′} be the approximation obtained from Claim B.1 with parameter ε > 0 and δ = ε. For the sake of brevity, we denote this by J̃ itself. The following claim allows us to compare the terms x · y and J̃(x, y).

Claim 6.5. For any ε > 0 such that ρ′ ∈ (−1, 0) and J̃ as described above, there is a dα = dα(ε, ρ′) such that

    {ε ≤ x ≤ 1 − ε, ε ≤ y ≤ 1 − ε} ⊢_{dα} x · y ≥ J̃(x, y) − 2ε.

Proof. Note that for (x, y) ∈ (0, 1)^2, J0(x, y) = xy and hence by Slepian's lemma, we get that if ρ′ < 0, then xy ≥ J_{ρ′}(x, y). Now, by definition, we have that for (x, y) ∈ [ε, 1 − ε]^2, xy ≥ J̃(x, y) − ε. In other words, if we define the polynomial p(x, y) = xy − J̃(x, y) + 2ε, then we know that for (x, y) ∈ [ε, 1 − ε]^2, p(x, y) ≥ ε. We can thus apply Corollary 5.13 to get that there is an integer dα = dα(ε, ρ′) such that for ρ′ ∈ (−1, 0),

    {ε ≤ x ≤ 1 − ε, ε ≤ y ≤ 1 − ε} ⊢_{dα} p ≥ 0.

Expanding p finishes the proof.

6.2     Taylor’s theorem in the SoS world
Following the proof of “Majority is Stablest,” we now need to prove a Taylor’s theorem in the SoS
hierarchy. The following lemma is the SoS analogue of Claim 2.5.
Lemma 6.6. Define a sequence of indeterminates {h0 (1), h0 (−1), h1 (1), h1 (−1)}. Let A be a set of
constraints defined as A = i, j∈{0,1} {ε ≤ hi ( j) ≤ 1 − ε}. For any ε > 0, ρ 0 ∈ (−1, 0), ∃ cγ = cγ (ε, ρ 0 )
                               S

and ∃ dγ = dγ (ε, ρ 0 ) such that

      A `dγ       E                         e hb0 (0), hb1 (0)) − ε · (hb0 2 (1) + hb1 2 (1)) − cγ · (hb0 4 (1) + hb1 4 (1)) ,
                        e 0 (x), h1 (y))] ≥ J(
                       [J(h
              x∈R {−1,1}
                y∼ρ 0 x

where
                                                                         j
                                                    hi (1) + (−1) · hi (−1)
                                         hbi ( j) =
                                                               2
for i, j ∈ {0, 1}. Further, cγ (ε, ρ ) is a continuous function of ε and ρ 0 .
                                    0




    Note that while the error term in Claim 2.5 was third-order, the error term in Lemma 6.6 consists of a small second-order term and a large fourth-order term. This sort of error term (which was also used in [49]) is more convenient in the SoS world, because it avoids the need to consider absolute values.
    The proof of Lemma 6.6 is fairly long, since it must be written in the restrictive SoS proof system. To give some motivation for the proof, let us first give a short outline. First of all, if we restrict to the first- and second-order terms of J̃, then the claimed inequality is true even without the fourth-order error term. By the completeness of the SoS proof system (Corollary 5.15), there must therefore be a constant-degree SoS proof of this fact. Most of the proof of Lemma 6.6 goes into showing that the higher-order terms of J̃ can be bounded by a fourth-order error term. This is of course trivial to show in the usual proof system, but takes some work to write as an SoS proof.

Proof of Lemma 6.6. We start by noting that since J̃ is a symmetric polynomial, we can write

    J̃(x, y) = ∑_{m,n: m+n≤K} μ_{m,n} x^m y^n.

Here, we assume that K is the degree of J̃ and c is the maximum absolute value of any coefficient. We next make the following claim.

Claim 6.7.

    E_{x∈_R{−1,1}, y ∼_{ρ′} x}[J̃(h0(x), h1(y))] = ∑_{m,n: m+n even} ν_{m,n} · ĥ0^m(1) · ĥ1^n(1) · ((1 + ρ′)/2 + (−1)^m · (1 − ρ′)/2),

where
    ν_{m,n} = ∑_{m1≥m; n1≥n} μ_{m1,n1} · ĥ0^{m1−m}(0) · ĥ1^{n1−n}(0) · (m1 choose m) · (n1 choose n).


Proof. We begin by observing that

    J̃(ĥ0(0) + x, ĥ1(0) + y) = ∑_{m,n} μ_{m,n} · (ĥ0(0) + x)^m · (ĥ1(0) + y)^n = ∑_{m,n} ν_{m,n} · x^m · y^n.

As a consequence, we get that

    J̃(ĥ0(0) + ĥ0(1), ĥ1(0) + ĥ1(1)) = ∑_{m,n} ν_{m,n} · ĥ0^m(1) · ĥ1^n(1),
    J̃(ĥ0(0) − ĥ0(1), ĥ1(0) − ĥ1(1)) = ∑_{m,n} (−1)^{m+n} ν_{m,n} · ĥ0^m(1) · ĥ1^n(1).

Adding these equations, we get

    J̃(ĥ0(0) + ĥ0(1), ĥ1(0) + ĥ1(1)) + J̃(ĥ0(0) − ĥ0(1), ĥ1(0) − ĥ1(1)) = 2 · ∑_{m,n: m+n even} ν_{m,n} · ĥ0^m(1) · ĥ1^n(1).    (6.3)
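The binomial-expansion definition of the coefficients ν_{m,n} used above can be cross-checked against direct evaluation. The check below is ours, with an arbitrary choice of μ and of the base point (a, b) standing in for (ĥ0(0), ĥ1(0)):

```python
import itertools
from fractions import Fraction
from math import comb

# The coefficient nu_{m,n} of s^m t^n in sum_{m1,n1} mu_{m1,n1} (a+s)^{m1} (b+t)^{n1}
# equals sum_{m1>=m, n1>=n} mu_{m1,n1} a^{m1-m} b^{n1-n} C(m1,m) C(n1,n).
# We verify this by comparing both expansions at rational points.
K = 4
mu = {(m1, n1): Fraction(m1 - n1 + 1, 3)
      for m1 in range(K + 1) for n1 in range(K + 1) if m1 + n1 <= K}
a, b = Fraction(1, 3), Fraction(2, 5)

def nu(m, n):
    return sum(c * a**(m1 - m) * b**(n1 - n) * comb(m1, m) * comb(n1, n)
               for (m1, n1), c in mu.items() if m1 >= m and n1 >= n)

for si in range(-3, 4):
    for ti in range(-3, 4):
        s, t = Fraction(si), Fraction(ti)
        direct = sum(c * (a + s)**m1 * (b + t)**n1 for (m1, n1), c in mu.items())
        via_nu = sum(nu(m, n) * s**m * t**n
                     for m in range(K + 1) for n in range(K + 1))
        assert direct == via_nu
```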



Similarly, we have that

    J̃(ĥ0(0) − ĥ0(1), ĥ1(0) + ĥ1(1)) = ∑_{m,n} (−1)^m ν_{m,n} · ĥ0^m(1) · ĥ1^n(1),
    J̃(ĥ0(0) + ĥ0(1), ĥ1(0) − ĥ1(1)) = ∑_{m,n} (−1)^n ν_{m,n} · ĥ0^m(1) · ĥ1^n(1).

Thus,

    J̃(ĥ0(0) − ĥ0(1), ĥ1(0) + ĥ1(1)) + J̃(ĥ0(0) + ĥ0(1), ĥ1(0) − ĥ1(1)) = 2 · ∑_{m,n: m+n even} (−1)^m ν_{m,n} · ĥ0^m(1) · ĥ1^n(1).    (6.4)
Hence, combining (6.3) and (6.4), we get that

    E_{x∈_R{−1,1}, y ∼_{ρ′} x}[J̃(h0(x), h1(y))] = ∑_{m,n: m+n even} ν_{m,n} · ĥ0^m(1) · ĥ1^n(1) · ((1 + ρ′)/2 + (−1)^m · (1 − ρ′)/2).

    Next, we note that ν_{0,0} = J̃(ĥ0(0), ĥ1(0)). Thus, we get that

    E_{x∈_R{−1,1}, y ∼_{ρ′} x}[J̃(h0(x), h1(y))] − J̃(ĥ0(0), ĥ1(0)) = ∑_{m,n: m+n even, K≥m+n≥2} ν_{m,n} · ĥ0^m(1) · ĥ1^n(1) · ((1 + ρ′)/2 + (−1)^m · (1 − ρ′)/2).    (6.5)
We first make the following claim which bounds the terms when m + n ≥ 4.
Claim 6.8.

    A ⊢_{2K+3}  Y ≥ ∑_{m,n: m+n even, m+n≥4} ν_{m,n} · ĥ0^m(1) · ĥ1^n(1) · ((1 + ρ′)/2 + (−1)^m · (1 − ρ′)/2) ≥ −Y,

where Y = 2cK^4 2^{2K} (ĥ0^4(1) + ĥ1^4(1)).

Proof. For m + n ≥ 4 and m1 ≥ m and n1 ≥ n, we define Γ_{m1,n1,m,n} as follows:

    Γ_{m1,n1,m,n} = μ_{m1,n1} · (m1 choose m) · (n1 choose n) · ĥ0^m(1) · ĥ1^n(1) · ĥ0^{m1−m}(0) · ĥ1^{n1−n}(0) · ((1 + ρ′)/2 + (−1)^m · (1 − ρ′)/2).

Next, define the set of constraints Am as

    Am = {0 ≤ ĥ0(0) ≤ 1, 0 ≤ ĥ1(0) ≤ 1, −1 ≤ ĥ0(1) ≤ 1, −1 ≤ ĥ1(1) ≤ 1}.

Now, it is easy to see that A ⊢_1 Am. Hence, by using the third bullet of Fact 5.2, if for any p and d ∈ N, Am ⊢_d p ≥ 0, then A ⊢_d p ≥ 0. We shall be using this fact throughout this proof. Applying Fact 5.4, we get that

    A ⊢_{m1−m+1} 0 ≤ ĥ0^{m1−m}(0) ≤ 1,    A ⊢_{n1−n+1} 0 ≤ ĥ1^{n1−n}(0) ≤ 1,


and hence we have that

    A ⊢_{m1+n1−m−n+2} 0 ≤ ĥ0^{m1−m}(0) · ĥ1^{n1−n}(0) ≤ 1.    (6.6)

Now, we consider two possibilities: either m = n = 2 or max{m, n} ≥ 3. In the first case, using Fact 5.8, we get that

    A ⊢_5 −(ĥ0^4(1) + ĥ1^4(1)) ≤ ĥ0^2(1) · ĥ1^2(1) ≤ ĥ0^4(1) + ĥ1^4(1).    (6.7)

Next, consider the other case, i. e., when max{m, n} ≥ 3. Without loss of generality, assume m ≥ 3. Then, by Fact 5.9, we get that

    A ⊢_{m+2} −(ĥ0^4(1) + ĥ1^4(1)) ≤ ĥ0^m(1) · ĥ1(1) ≤ ĥ0^4(1) + ĥ1^4(1).

Next, we use Fact 5.10 to get that

    A ⊢_{m+n+1} −(ĥ0^4(1) + ĥ1^4(1)) ≤ ĥ0^m(1) · ĥ1^n(1) ≤ ĥ0^4(1) + ĥ1^4(1).    (6.8)

Thus, combining (6.7) and (6.8), we get that for any m and n such that m + n ≥ 4,

    A ⊢_{m+n+1} −(ĥ0^4(1) + ĥ1^4(1)) ≤ ĥ0^m(1) · ĥ1^n(1) ≤ ĥ0^4(1) + ĥ1^4(1).

Now, combining (6.6) along with an application of Fact 5.10, we get that

    A ⊢_{m1+n1+3} −(ĥ0^4(1) + ĥ1^4(1)) ≤ ĥ0^m(1) · ĥ1^n(1) · ĥ0^{m1−m}(0) · ĥ1^{n1−n}(0) ≤ ĥ0^4(1) + ĥ1^4(1).

Now, recalling that 0 ≤ m1, n1, m, n ≤ K, (m1 choose m), (n1 choose n) ≤ 2^K, |μ_{m1,n1}| ≤ c and |ρ′| ≤ 1, we get that

    A ⊢_{2K+3} −c 2^{2K} (ĥ0^4(1) + ĥ1^4(1)) ≤ Γ_{m1,n1,m,n} ≤ c 2^{2K} (ĥ0^4(1) + ĥ1^4(1)).

As

    ∑_{m,n: m+n even, m+n≥4} ν_{m,n} ĥ0^m(1) ĥ1^n(1) · ((1 + ρ′)/2 + (−1)^m · (1 − ρ′)/2) = ∑_{m,n: m+n even, m+n≥4; K≥m1≥m, K≥n1≥n} Γ_{m1,n1,m,n},

we can conclude that

    A ⊢_{2K+3} −c 2^{2K} K^4 · (ĥ0^4(1) + ĥ1^4(1)) ≤ ∑_{m,n: m+n even, m+n≥4} ν_{m,n} ĥ0^m(1) · ĥ1^n(1) · ((1 + ρ′)/2 + (−1)^m · (1 − ρ′)/2) ≤ c 2^{2K} K^4 · (ĥ0^4(1) + ĥ1^4(1)).

This completes the proof of Claim 6.8.

    Finally, we need to consider the terms when $m + n = 2$. This is where we will use Putinar's theorem. Note that
$$\sum_{m+n=2} \nu_{m,n} \cdot \hat h_0^m(1) \cdot \hat h_1^n(1) \cdot \left( \frac{1+\rho'}{2} + (-1)^m \cdot \frac{1-\rho'}{2} \right) = \nu_{2,0} \cdot \hat h_0^2(1) + \nu_{0,2} \cdot \hat h_1^2(1) + \rho'\, \nu_{1,1} \cdot \hat h_0(1) \cdot \hat h_1(1)\,.$$
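The three surviving coefficients can be checked directly: the factor $(1+\rho')/2 + (-1)^m (1-\rho')/2$ equals $1$ when $m$ is even and $\rho'$ when $m$ is odd. A small numeric sanity check of this (illustrative only, not part of the SoS derivation; the variable names are ours):

```python
# The weight multiplying nu_{m,n} * h0^m(1) * h1^n(1) in the m + n = 2 sum:
# (1 + rho)/2 + (-1)**m * (1 - rho)/2. It equals 1 for m even and rho for
# m odd, which yields exactly the three displayed terms.
def weight(m, rho):
    return (1 + rho) / 2 + (-1) ** m * (1 - rho) / 2

rho = -0.4  # any rho' in (-1, 0)
assert abs(weight(2, rho) - 1.0) < 1e-12    # (m, n) = (2, 0)
assert abs(weight(0, rho) - 1.0) < 1e-12    # (m, n) = (0, 2)
assert abs(weight(1, rho) - rho) < 1e-12    # (m, n) = (1, 1)
```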


Theory of Computing, Volume 12 (4), 2016, pp. 1–50

For the sake of brevity, call the above quantity $\Lambda$. Next, we observe that at $x = \hat h_0(0)$, $y = \hat h_1(0)$,
$$\frac{\partial^2 \widetilde J(x,y)}{\partial x^2} = 2\nu_{2,0}\,, \qquad \frac{\partial^2 \widetilde J(x,y)}{\partial y^2} = 2\nu_{0,2}\,, \qquad \frac{\partial^2 \widetilde J(x,y)}{\partial x\, \partial y} = \nu_{1,1}\,.$$
To see this, note that
$$\left. \frac{\partial^2 \widetilde J(x,y)}{\partial x^2} \right|_{x = \hat h_0(0),\ y = \hat h_1(0)} = \left. \frac{\partial^2 \widetilde J(\hat h_0(0) + x',\, \hat h_1(0) + y')}{\partial x'^2} \right|_{x' = 0,\ y' = 0}.$$
However, the quantity on the right side is simply twice the coefficient of $x'^2$ in the polynomial $\widetilde J(\hat h_0(0) + x',\, \hat h_1(0) + y')$, which is exactly $2\nu_{2,0}$. The other equalities follow similarly. Thus, we get that
    b

$$\Lambda = \frac{1}{2} \left[ \frac{\partial^2 \widetilde J(x,y)}{\partial x^2}\, \hat h_0^2(1) + \frac{\partial^2 \widetilde J(x,y)}{\partial y^2}\, \hat h_1^2(1) + 2\rho'\, \frac{\partial^2 \widetilde J(x,y)}{\partial x\, \partial y}\, \hat h_0(1) \cdot \hat h_1(1) \right].$$
We mention here that in the above equation, the derivatives are evaluated at $x = \hat h_0(0)$, $y = \hat h_1(0)$. We do not specify this in the equation for the sake of compactness. We now make the following claim, which gives a lower bound on $\Lambda$.
Claim 6.9. For every $\varepsilon > 0$ and $\rho' \in (-1, 0)$, there exists $d_\gamma' = d_\gamma'(\varepsilon, \rho')$ such that
$$\mathcal{A} \vdash_{d_\gamma'} \; \Lambda \ge -\varepsilon \cdot (\hat h_0^2(1) + \hat h_1^2(1))\,.$$

Proof. We begin by defining the matrix $\widetilde M$ as follows:
$$\widetilde M = \begin{pmatrix} \dfrac{\partial^2 \widetilde J(x,y)}{\partial x^2} & \rho'\, \dfrac{\partial^2 \widetilde J(x,y)}{\partial x\, \partial y} \\[8pt] \rho'\, \dfrac{\partial^2 \widetilde J(x,y)}{\partial x\, \partial y} & \dfrac{\partial^2 \widetilde J(x,y)}{\partial y^2} \end{pmatrix}.$$

Put $\beta = 2\varepsilon$. Now, let us define $\widetilde M_1 = \widetilde M + \beta I$. Using Claim 2.6 and Claim B.1, for $\rho' \in (-1, 0)$ and $(x, y) \in [\varepsilon, 1-\varepsilon]^2$, we can say that $\widetilde M_1 \succeq \varepsilon \cdot I$. Hence, using Corollary 5.15, there exists $d_\gamma'$ such that we have the following:
$$\mathcal{A} \vdash_{d_\gamma'} \; \hat h_0^2(1) \left( \frac{\partial^2 \widetilde J(x,y)}{\partial x^2} + \beta \right) + 2\rho' \cdot \hat h_0(1) \cdot \hat h_1(1) \cdot \frac{\partial^2 \widetilde J(x,y)}{\partial x\, \partial y} + \hat h_1^2(1) \left( \frac{\partial^2 \widetilde J(x,y)}{\partial y^2} + \beta \right) \ge 0\,,$$
or, equivalently,
$$\mathcal{A} \vdash_{d_\gamma'} \; \hat h_0^2(1) \cdot \frac{\partial^2 \widetilde J(x,y)}{\partial x^2} + 2\rho' \cdot \hat h_0(1) \cdot \hat h_1(1) \cdot \frac{\partial^2 \widetilde J(x,y)}{\partial x\, \partial y} + \hat h_1^2(1) \cdot \frac{\partial^2 \widetilde J(x,y)}{\partial y^2} \ge -\beta\, (\hat h_0^2(1) + \hat h_1^2(1))\,.$$
    Dividing both sides by 2 finishes the proof. Note that the reason $d_\gamma'$ depends only on $\varepsilon$ and $\rho'$ is that, by Corollary 5.15, the degree $d_\gamma'$ depends on $\varepsilon$, $\rho'$ and the polynomial $\widetilde J$, and $\widetilde J$ depends only on $\varepsilon$ and $\rho'$.

    Now, set $d_\gamma = \max\{d_\gamma', 2K+3\}$. Also, note that both $K$ and $c$ are bounded by continuous functions of $\varepsilon$ and $\rho'$ (see Claim B.1); hence $c_\gamma$ can be chosen to be a continuous function of $\rho'$ and $\varepsilon$ that is an upper bound on $2cK^4 2^{2K}$. Combining Claim 6.8 and Claim 6.9 with (6.5), we get Lemma 6.6.


6.3   Tensorization
We now do a "tensorization" of the inequality in Lemma 6.6 (analogous to Theorem 2.4). Unlike in the base case, this step of the SoS proof follows the non-SoS proof fairly closely. Let $\{\varphi(x)\}_{x \in \{-1,1\}^n}$ be a set of indeterminates. We recall that for $y \in \{-1,1\}^i$, we define the set $\{\varphi_y(z)\}_{z \in \{-1,1\}^{n-i}}$ of indeterminates as follows: $\varphi_y(z) = \varphi(z \cdot y)$, where $z \cdot y$ denotes the concatenation of $z$ and $y$. As before, we can define the Fourier coefficients $\hat\varphi_y(S)$ for $S \subseteq [n-i]$, and it is easy to see that they are homogeneous linear forms in the indeterminates $\hat\varphi(S)$ (for $S \subseteq [n]$). We now state a few basic properties of the indeterminates $g_y(z)$ and $\hat g_y(S)$. (The proofs are easy and can be filled in by the reader.)

$$\mathcal{A}'_p \vdash_1 \; \bigcup_{i=0}^{n-1}\ \bigcup_{y \in \{-1,1\}^i}\ \bigcup_{z \in \{-1,1\}^{n-i}} \{\varepsilon \le g_y(z) \le 1 - \varepsilon\}\,, \tag{6.9}$$
$$\mathcal{A}'_p \vdash_1 \; \bigcup_{i=0}^{n-1}\ \bigcup_{y \in \{-1,1\}^i}\ \bigcup_{S \subseteq [n-i]} \{-1 \le \hat g_y(S) \le 1\}\,, \tag{6.10}$$
$$\vdash_2 \; \mathop{\mathbf{E}}_{y \in \{-1,1\}^i} [\hat g_y^2(n-i)] = \sum_{\substack{S \subseteq \{n-i, \ldots, n\}\\ n-i \in S}} \hat g^2(S)\,. \tag{6.11}$$

Lemma 6.10. For the parameters $c_\gamma = c_\gamma(\varepsilon, \rho')$ and $d_\gamma = d_\gamma(\varepsilon, \rho')$ from Lemma 6.6,
$$\mathcal{A}'_p \vdash_{d_\gamma} \; \mathop{\mathbf{E}}_{\substack{x \in \{-1,1\}^n\\ y \sim_{\rho'} x}} [\widetilde J(g(x), g(y))] \ge \widetilde J(\mathbf{E}[g(x)], \mathbf{E}[g(y)]) - \varepsilon \left( \sum_{i=0}^{n-1} \mathop{\mathbf{E}}_{z \in \{-1,1\}^i} [\hat g_z^2(n-i)] \right) - c_\gamma \left( \sum_{i=0}^{n-1} \mathop{\mathbf{E}}_{z \in \{-1,1\}^i} [\hat g_z^4(n-i)] \right).$$


Proof. The proof is by induction. For any pair of strings $z_{-1}, z_1 \in \{-1,1\}^i$ and $j \in \{-1,1\}$, we define the indeterminate
$$h_i(j) = \mathop{\mathbf{E}}_{z \in \{-1,1\}^{n-i-1}} [g(z \cdot j \cdot z_j)]\,.$$
Define
$$A_{z_{-1},z_1} = \bigcup_{j \in \{-1,1\}} \{\varepsilon \le h_i(j) \le 1 - \varepsilon\}\,.$$

It is trivial to see that $\mathcal{A}'_p \vdash_1 A_{z_{-1},z_1}$. Now, using Lemma 6.6, for any two strings $z_1, z_{-1} \in \{-1,1\}^i$, we get that
$$A_{z_{-1},z_1} \vdash_{d_\gamma} \; \mathop{\mathbf{E}}_{\substack{x \in \{-1,1\}\\ y \sim_{\rho'} x}} \left[ \widetilde J(\mathbf{E}[g_{x \cdot z_1}], \mathbf{E}[g_{y \cdot z_{-1}}]) \right] \ge \widetilde J(\mathbf{E}[g_{z_1}], \mathbf{E}[g_{z_{-1}}]) - \varepsilon \left( \hat g_{z_1}^2(n-i) + \hat g_{z_{-1}}^2(n-i) \right) - c_\gamma \left( \hat g_{z_{-1}}^4(n-i) + \hat g_{z_1}^4(n-i) \right).$$



As a consequence of the third bullet of Fact 5.2, the left hand side of "$\vdash$" can be replaced by $\mathcal{A}'_p$. Now, for any given $z_1, z_{-1} \in \{-1,1\}^i$, let $d(z_1, z_{-1})$ be their Hamming distance. In particular, if $i = 0$, then $d(z_1, z_{-1}) = 0$. Now, multiply the above inequality by
$$\left( \frac{1+\rho'}{4} \right)^{i - d(z_1, z_{-1})} \cdot \left( \frac{1-\rho'}{4} \right)^{d(z_1, z_{-1})}$$
and add all such inequalities generated by choosing $(z_1, z_{-1}) \in \{-1,1\}^i \times \{-1,1\}^i$ for $0 \le i < n$. It is easy to see that upon addition, all terms of the form $\widetilde J(\mathbf{E}[g_{z_1}], \mathbf{E}[g_{z_{-1}}])$ cancel out except when $z_1, z_{-1} \in \{-1,1\}^n$ or $z_1 = z_{-1} = \phi$. Thus, we get the inequality

$$\mathcal{A}'_p \vdash_{d_\gamma} \; \mathop{\mathbf{E}}_{\substack{x \in \{-1,1\}^n\\ y \sim_{\rho'} x}} [\widetilde J(g(x), g(y))] \ge \widetilde J(\mathbf{E}[g(x)], \mathbf{E}[g(y)]) + \text{error terms}\,,$$


where the error terms are the terms coming from $\hat g_{z_j}^2(n-i)$ and $\hat g_{z_j}^4(n-i)$. We now explicitly compute the error terms. First, we sum up the error coming from the term
$$\varepsilon \left( \hat g_{z_1}^2(n-i) + \hat g_{z_{-1}}^2(n-i) \right).$$


For any given $z_1 \in \{-1,1\}^i$, consider the term $\varepsilon\, \hat g_{z_1}^2(n-i)$. For every $z_{-1} \in \{-1,1\}^i$, it occurs with the factor
$$\left( \frac{1+\rho'}{4} \right)^{i - d(z_1, z_{-1})} \cdot \left( \frac{1-\rho'}{4} \right)^{d(z_1, z_{-1})}.$$
Since there are exactly $\binom{i}{k}$ strings $z_{-1} \in \{-1,1\}^i$ such that $d(z_1, z_{-1}) = k$, we get that the total weight associated is
$$\sum_{k=0}^{i} \binom{i}{k} \left( \frac{1+\rho'}{4} \right)^{i-k} \left( \frac{1-\rho'}{4} \right)^{k} = 2^{-i}\,.$$
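This identity is simply the binomial theorem applied to the two weights:

```latex
\sum_{k=0}^{i} \binom{i}{k} \left( \frac{1+\rho'}{4} \right)^{i-k} \left( \frac{1-\rho'}{4} \right)^{k}
  = \left( \frac{1+\rho'}{4} + \frac{1-\rho'}{4} \right)^{i}
  = \left( \frac{1}{2} \right)^{i} = 2^{-i}.
```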
Thus, we get that the first kind of error terms contribute
$$\varepsilon \sum_{i=0}^{n-1} \mathop{\mathbf{E}}_{z \in \{-1,1\}^i} \left[ \hat g_z^2(n-i) \right].$$
The calculation for the "fourth-degree" error terms is exactly identical, which yields the lemma.

    We now simplify the error terms. Towards this, note that (6.11) implies that
$$\vdash_2 \; \varepsilon \cdot \left( \sum_{i=0}^{n-1} \mathop{\mathbf{E}}_{z \in \{-1,1\}^i} [\hat g_z^2(n-i)] \right) = \varepsilon \cdot \sum_{S \ne \phi} \hat g^2(S) \le \varepsilon \cdot \sum_{S} \hat g^2(S) = \varepsilon \cdot \mathop{\mathbf{E}}_{x \in \{-1,1\}^n} [g^2(x)]\,.$$


Further, $\mathcal{A}'_p \vdash_3 \mathbf{E}_{x \in \{-1,1\}^n}[g^2(x)] \le 1$ (using Fact 5.4). Thus, we get that
$$\mathcal{A}'_p \vdash_{d_\gamma} \; \mathop{\mathbf{E}}_{\substack{x \in \{-1,1\}^n\\ y \sim_{\rho'} x}} [\widetilde J(g(x), g(y))] \ge \widetilde J(\mathbf{E}[g(x)], \mathbf{E}[g(y)]) - \varepsilon - c_\gamma \left( \sum_{i=0}^{n-1} \mathop{\mathbf{E}}_{x \in \{-1,1\}^i} [\hat g_x^4(n-i)] \right). \tag{6.12}$$




6.4     Bounding the error terms
Thus, all that is left is to bound the "degree-4" term. We briefly describe why one has to be careful to get a (meaningful) upper bound here. The obvious strategy is to break $g$ into high-degree and low-degree parts based on the noise parameter (call them $h$ and $\ell$). Very naively, this gives an error which is the sum of $\mathbf{E}_x \hat h_x^4(n-i)$ and $\mathbf{E}_x \hat\ell_x^4(n-i)$ (up to constant factors). The term $\mathbf{E}_x \hat\ell_x^4(n-i)$ can easily be bounded using hypercontractivity. However, there does not seem to be an obvious way to bound the term $\mathbf{E}_x \hat h_x^4(n-i)$, despite the fact that $\mathbf{E}_x \hat h_x^2(n-i)$ is small. We now show how to get around this problem.
    We define $d_\eta = (1/\eta) \cdot \log(1/\eta)$. Now, define the sequences of indeterminates $\{h(x)\}_{x \in \{-1,1\}^n}$ and $\{\ell(x)\}_{x \in \{-1,1\}^n}$ as follows:
$$h(x) = \sum_{|S| > d_\eta} \hat g(S)\, \chi_S(x)\,, \qquad \ell(x) = \sum_{|S| \le d_\eta} \hat g(S)\, \chi_S(x)\,.$$

By the way it is defined, it is clear that $\vdash_1 h(x) + \ell(x) = g(x)$. Now, we can analyze the term $\mathbf{E}_{x \in \{-1,1\}^i}[\hat g_x^4(n-i)]$ as
$$\vdash_4 \; \sum_{i=0}^{n-1} \mathop{\mathbf{E}}_{x \in \{-1,1\}^i} [\hat g_x^4(n-i)] = \sum_{i=0}^{n-1} \mathop{\mathbf{E}}_{x \in \{-1,1\}^i} \left[ \hat g_x^3(n-i) \left( \hat h_x(n-i) + \hat\ell_x(n-i) \right) \right]$$
$$= \sum_{i=0}^{n-1} \left( \mathop{\mathbf{E}}_{x \in \{-1,1\}^i} [\hat g_x^3(n-i)\, \hat h_x(n-i)] + \mathop{\mathbf{E}}_{x \in \{-1,1\}^i} [\hat g_x^2(n-i)\, \hat\ell_x^2(n-i)] + \mathop{\mathbf{E}}_{x \in \{-1,1\}^i} [\hat g_x^2(n-i)\, \hat\ell_x(n-i)\, \hat h_x(n-i)] \right). \tag{6.13}$$

We begin by stating the following useful fact:

Fact 6.11. $\mathcal{A}_p \vdash_3 \sum_{i=0}^{n-1} \mathbf{E}_{x \in \{-1,1\}^i} \hat h_x^2(n-i) \le \eta$.

Proof.
$$\vdash_2 \; \sum_{i=0}^{n-1} \mathop{\mathbf{E}}_{x \in \{-1,1\}^i} \hat h_x^2(n-i) = \sum_{S \ne \phi} \hat h^2(S) = \sum_{|S| > d_\eta} (1-\eta)^{2|S|} \hat f^2(S) \le \eta \cdot \sum_{|S| > d_\eta} \hat f^2(S) \le \eta \cdot \sum_{S} \hat f^2(S)\,.$$
Here the first equality uses (6.11) while the second uses the definition of $h$. Also,
$$\mathcal{A}_p \vdash_3 \; \sum_{S} \hat f^2(S) = \mathop{\mathbf{E}}_{x \in \{-1,1\}^n} [f^2(x)] \le 1\,.$$
Combining the two facts finishes the proof.
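For completeness, here is a short verification (under the convention that $\log$ denotes the natural logarithm) that the choice $d_\eta = (1/\eta)\log(1/\eta)$ makes the attenuation factor in the proof above at most $\eta$. Using $1-\eta \le e^{-\eta}$, for every $|S| > d_\eta$ and $0 < \eta \le 1$,

```latex
(1-\eta)^{2|S|} \;\le\; (1-\eta)^{2 d_\eta} \;\le\; e^{-2\eta d_\eta}
  \;=\; e^{-2\log(1/\eta)} \;=\; \eta^2 \;\le\; \eta.
```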

We now bound the terms appearing in (6.13). This is done in Claims 6.12, 6.13 and 6.14.

Claim 6.12. $\mathcal{A}_p \vdash_7 \sum_{i=0}^{n-1} \mathbf{E}_{x \in \{-1,1\}^i} [\hat g_x^3(n-i)\, \hat h_x(n-i)] \le \sqrt{\eta}$.



Proof. For any $\eta > 0$, we have that
$$\vdash_6 \; \sum_{i=0}^{n-1} \mathop{\mathbf{E}}_{x \in \{-1,1\}^i} [\hat g_x^3(n-i)\, \hat h_x(n-i)] \le \frac{\sqrt{\eta}}{2} \sum_{i=0}^{n-1} \mathop{\mathbf{E}}_{x \in \{-1,1\}^i} \hat g_x^6(n-i) + \frac{\sum_{i=0}^{n-1} \mathbf{E}_{x \in \{-1,1\}^i} \hat h_x^2(n-i)}{2\sqrt{\eta}}\,.$$
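The step above is an SoS-certified version of the AM–GM inequality $ab \le \frac{t}{2}a^2 + \frac{1}{2t}b^2$, applied with $t = \sqrt{\eta}$, $a = \hat g_x^3(n-i)$ and $b = \hat h_x(n-i)$; the certificate is that the difference is an explicit square:

```latex
\frac{t}{2}\,a^2 + \frac{1}{2t}\,b^2 - ab \;=\; \frac{1}{2t}\,(ta - b)^2 \;\ge\; 0
  \qquad (t > 0).
```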

Next, recall that using Fact 6.11, we have
$$\mathcal{A}_p \vdash_3 \; \sum_{i=0}^{n-1} \mathop{\mathbf{E}}_{x \in \{-1,1\}^i} \hat h_x^2(n-i) \le \eta\,.$$
Similarly, from (6.10) and Claim 6.3, we have that $\mathcal{A}_p \vdash -1 \le \hat g_x(n-i) \le 1$. This in turn implies $\mathcal{A}_p \vdash_7 \hat g_x^6(n-i) \le \hat g_x^2(n-i)$ (combining Fact 5.5 and Fact 5.7). Combining all the above, we get
$$\mathcal{A}_p \vdash_7 \; \sum_{i=0}^{n-1} \mathop{\mathbf{E}}_{x \in \{-1,1\}^i} [\hat g_x^3(n-i)\, \hat h_x(n-i)] \le \frac{\sqrt{\eta}}{2} \cdot \left( \sum_{i=0}^{n-1} \mathop{\mathbf{E}}_{x \in \{-1,1\}^i} \hat g_x^2(n-i) \right) + \frac{\sqrt{\eta}}{2}\,.$$
However, we have that
$$\mathcal{A}_p \vdash_3 \; \sum_{i=0}^{n-1} \mathop{\mathbf{E}}_{x \in \{-1,1\}^i} \hat g_x^2(n-i) = \sum_{|S| > 0} \hat g^2(S) \le \mathbf{E}[g^2(x)] \le 1\,.$$

This proves the claim.

Claim 6.13.
$$\mathcal{A}_p \vdash_5 \; \sum_{i=0}^{n-1} \mathop{\mathbf{E}}_{x \in \{-1,1\}^i} [\hat g_x^2(n-i)\, \hat\ell_x^2(n-i)] \le \frac{\sqrt{\eta}}{2} + \frac{9^{d_\eta}}{2\sqrt{\eta}} \sum_{i=1}^{n} \left( \mathrm{Inf}_i^{\le d_\eta}(f) \right)^2.$$

Proof. For $\eta > 0$, we have that
$$\vdash_4 \; \sum_{i=0}^{n-1} \mathop{\mathbf{E}}_{x \in \{-1,1\}^i} [\hat g_x^2(n-i)\, \hat\ell_x^2(n-i)] \le \frac{\sqrt{\eta}}{2} \sum_{i=0}^{n-1} \mathop{\mathbf{E}}_{x \in \{-1,1\}^i} \hat g_x^4(n-i) + \frac{\sum_{i=0}^{n-1} \mathbf{E}_{x \in \{-1,1\}^i} \hat\ell_x^4(n-i)}{2\sqrt{\eta}}\,.$$
Similarly, using Fact 5.11,
$$\vdash_4 \; \sum_{i=0}^{n-1} \mathop{\mathbf{E}}_{x \in \{-1,1\}^i} \hat\ell_x^4(n-i) \le 9^{d_\eta} \cdot \sum_{i=0}^{n-1} \left( \mathop{\mathbf{E}}_{x \in \{-1,1\}^i} \hat\ell_x^2(n-i) \right)^2.$$
As in the proof of Claim 6.12, we can show
$$\mathcal{A}_p \vdash_5 \; \sum_{i=0}^{n-1} \mathop{\mathbf{E}}_{x \in \{-1,1\}^i} \hat g_x^4(n-i) \le 1\,.$$



Combining all the above, we get
$$\mathcal{A}_p \vdash_5 \; \sum_{i=0}^{n-1} \mathop{\mathbf{E}}_{x \in \{-1,1\}^i} [\hat g_x^2(n-i)\, \hat\ell_x^2(n-i)] \le \frac{\sqrt{\eta}}{2} + \frac{9^{d_\eta}}{2\sqrt{\eta}} \sum_{i=0}^{n-1} \left( \mathop{\mathbf{E}}_{x \in \{-1,1\}^i} \hat\ell_x^2(n-i) \right)^2. \tag{6.14}$$
Again, observe that
$$\vdash_2 \; \mathop{\mathbf{E}}_{x \in \{-1,1\}^i} \hat\ell_x^2(n-i) = \sum_{\substack{S \subseteq \{n-i,\ldots,n\}:\\ n-i \in S}} \hat\ell^2(S) \le \sum_{S \subseteq [n]:\, n-i \in S} \hat\ell^2(S) = \sum_{\substack{S \subseteq [n]:\, n-i \in S\\ \text{and } |S| \le d_\eta}} \hat g^2(S) \le \sum_{\substack{S \subseteq [n]:\, n-i \in S\\ \text{and } |S| \le d_\eta}} \hat f^2(S) \le \mathrm{Inf}_{n-i}^{\le d_\eta}(f)\,.$$
As a consequence of Fact 5.2, we get
$$\vdash_4 \; \sum_{i=0}^{n-1} \left( \mathop{\mathbf{E}}_{x \in \{-1,1\}^i} \hat\ell_x^2(n-i) \right)^2 \le \sum_{i=0}^{n-1} \left( \mathrm{Inf}_{n-i}^{\le d_\eta}(f) \right)^2.$$

Combining this with (6.14), we get the final result.

Claim 6.14.
$$\mathcal{A}_p \vdash_9 \; \sum_{i=0}^{n-1} \mathop{\mathbf{E}}_{x \in \{-1,1\}^i} [\hat g_x^2(n-i)\, \hat h_x(n-i)\, \hat\ell_x(n-i)] \le \sqrt{\eta} + \frac{9^{d_\eta}}{2\sqrt{\eta}} \sum_{i=1}^{n} \left( \mathrm{Inf}_i^{\le d_\eta}(f) \right)^2.$$

Proof. For any $\eta > 0$, we have
$$\vdash_6 \; \sum_{i=0}^{n-1} \mathop{\mathbf{E}}_{x \in \{-1,1\}^i} [\hat g_x^2(n-i)\, \hat h_x(n-i)\, \hat\ell_x(n-i)] \le \frac{1}{2\sqrt{\eta}} \sum_{i=0}^{n-1} \mathop{\mathbf{E}}_{x \in \{-1,1\}^i} \hat h_x^2(n-i) + \frac{\sqrt{\eta}}{2} \sum_{i=0}^{n-1} \mathop{\mathbf{E}}_{x \in \{-1,1\}^i} \left[ \hat g_x^4(n-i)\, \hat\ell_x^2(n-i) \right].$$
From Fact 6.11 (as recalled in the proof of Claim 6.12), we know that
$$\mathcal{A}_p \vdash_3 \; \sum_{i=0}^{n-1} \mathop{\mathbf{E}}_{x \in \{-1,1\}^i} \hat h_x^2(n-i) \le \eta\,.$$

Thus, we get
$$\vdash_6 \; \sum_{i=0}^{n-1} \mathop{\mathbf{E}}_{x \in \{-1,1\}^i} [\hat g_x^2(n-i)\, \hat h_x(n-i)\, \hat\ell_x(n-i)] \le \frac{\sqrt{\eta}}{2} + \frac{\sqrt{\eta}}{2} \sum_{i=0}^{n-1} \mathop{\mathbf{E}}_{x \in \{-1,1\}^i} \left[ \hat g_x^4(n-i)\, \hat\ell_x^2(n-i) \right]. \tag{6.15}$$
However,
$$\vdash_8 \; \sum_{i=0}^{n-1} \mathop{\mathbf{E}}_{x \in \{-1,1\}^i} \left[ \hat g_x^4(n-i)\, \hat\ell_x^2(n-i) \right] \le \frac{\sum_{i=0}^{n-1} \mathbf{E}_{x \in \{-1,1\}^i} \hat g_x^8(n-i) + \sum_{i=0}^{n-1} \mathbf{E}_{x \in \{-1,1\}^i} \hat\ell_x^4(n-i)}{2}\,.$$


Following the same proof as in the proof of Claim 6.12, we can show that
$$\mathcal{A}_p \vdash_9 \; \sum_{i=0}^{n-1} \mathop{\mathbf{E}}_{x \in \{-1,1\}^i} \hat g_x^8(n-i) \le 1\,.$$
Similarly, from the argument in the proof of Claim 6.13, we can show that
$$\mathcal{A}_p \vdash_4 \; \sum_{i=0}^{n-1} \mathop{\mathbf{E}}_{x \in \{-1,1\}^i} \hat\ell_x^4(n-i) \le 9^{d_\eta} \cdot \sum_{i=0}^{n-1} \left( \mathrm{Inf}_{n-i}^{\le d_\eta}(f) \right)^2.$$
Combining these, we get that
$$\mathcal{A}_p \vdash_9 \; \sum_{i=0}^{n-1} \mathop{\mathbf{E}}_{x \in \{-1,1\}^i} \left[ \hat g_x^4(n-i)\, \hat\ell_x^2(n-i) \right] \le \frac{1 + 9^{d_\eta} \cdot \sum_{i=0}^{n-1} \left( \mathrm{Inf}_{n-i}^{\le d_\eta}(f) \right)^2}{2}\,.$$

Plugging this in (6.15), we get the claim.

    Combining (6.12) with Claim 6.12, Claim 6.13, Claim 6.14 and plugging into Claim 6.3, we get that for the $c_\gamma$ and $d_\gamma$ described in Lemma 6.6,
$$\mathcal{A}_p \vdash_{\max\{d_\gamma, 9\}} \; \mathop{\mathbf{E}}_{\substack{x \in \{-1,1\}^n\\ y \sim_{\rho'} x}} [\widetilde J(g(x), g(y))] \ge \widetilde J(\mathbf{E}[g(x)], \mathbf{E}[g(y)]) - \varepsilon - \frac{5 c_\gamma \sqrt{\eta}}{2} - \frac{9^{d_\eta} c_\gamma}{\sqrt{\eta}} \sum_{i=1}^{n} \left( \mathrm{Inf}_i^{\le d_\eta}(f) \right)^2. \tag{6.16}$$

Using Claim 6.5, we have that A_p ⊢_{d_α} g(x)·g(y) ≥ J̃(g(x), g(y)) − 2ε. Similarly, combining this with
Claim 6.4 (and using that d_α ≥ 2), we get that
\[
A_p \vdash_{d_\alpha} \mathop{\mathbf{E}}_{x,\,y\sim_\rho x}\big[f(x)\cdot f(y)\big] \;\ge\; \mathop{\mathbf{E}}_{x,\,y\sim_{\rho'} x}\big[\widetilde{J}(g(x),g(y))\big] - 4\varepsilon . \tag{6.17}
\]

We now combine this with (6.16) to get that

\[
A_p \vdash_{\max\{d_\gamma,d_\alpha,9\}} \mathop{\mathbf{E}}_{\substack{x\in\{-1,1\}^n \\ y\sim_\rho x}}\big[f(x)\cdot f(y)\big] \;\ge\; \widetilde{J}\big(\mathbf{E}[g(x)],\mathbf{E}[g(y)]\big) - 5\varepsilon - \frac{5\cdot c_\gamma\cdot\sqrt{\eta}}{2} - \frac{9^{d_\eta}\cdot c_\gamma}{\sqrt{\eta}}\sum_{i=1}^{n}\big(\mathrm{Inf}_i^{\le d_\eta}(f)\big)^2 . \tag{6.18}
\]

Now, define a new sequence of indeterminates {f_c(x)}_{x∈{−1,1}^n} where f_c(x) = 1 − f(x). Analogously
to the (sequences of) indeterminates {f_1(x)}_{x∈{−1,1}^n} and {g(x)}_{x∈{−1,1}^n} defined earlier, we define the
indeterminates {f_{1c}(x)}_{x∈{−1,1}^n} and {g_c(x)}_{x∈{−1,1}^n} as follows:
\[
f_{1c}(x) = (1-\varepsilon)\,f_c(x) + \varepsilon/2 \qquad\text{and}\qquad g_c(x) = \mathop{\mathbf{E}}_{y\sim_{1-\eta} x}\big[f_{1c}(y)\big] .
\]

We now make the following observations:


   • For every x ∈ {−1, 1}^n, A_p ⊢_1 ε ≤ g_c(x) ≤ 1 − ε.

   • E_x[g(x)] + E_x[g_c(x)] = 1.

   • For all i ∈ [n], Inf_i^{≤d_η} f = Inf_i^{≤d_η} f_c.
Thus, using the above, analogous to (6.18), we have the following:
\[
A_p \vdash_{\max\{d_\gamma,d_\alpha,9\}} \mathop{\mathbf{E}}_{\substack{x\in\{-1,1\}^n \\ y\sim_\rho x}}\big[f_c(x)\cdot f_c(y)\big] \;\ge\; \widetilde{J}\big(\mathbf{E}[g_c(x)],\mathbf{E}[g_c(y)]\big) - 5\varepsilon - \frac{5\cdot c_\gamma\cdot\sqrt{\eta}}{2} - \frac{9^{d_\eta}\cdot c_\gamma}{\sqrt{\eta}}\sum_{i=1}^{n}\big(\mathrm{Inf}_i^{\le d_\eta}(f_c)\big)^2 . \tag{6.19}
\]

Defining ξ as
\[
\xi = 5\varepsilon + \frac{5\cdot c_\gamma\cdot\sqrt{\eta}}{2} + \frac{9^{d_\eta}\cdot c_\gamma}{\sqrt{\eta}}\sum_{i=1}^{n}\big(\mathrm{Inf}_i^{\le d_\eta}(f)\big)^2
\]
and summing up (6.18) and (6.19), we get
\[
A_p \vdash_{\max\{d_\gamma,d_\alpha,9\}} \mathop{\mathbf{E}}_{\substack{x\in\{-1,1\}^n \\ y\sim_\rho x}}\big[f(x)\cdot f(y) + (1-f(x))\cdot(1-f(y))\big] \;\ge\; \widetilde{J}\big(\mathbf{E}[g(x)],\mathbf{E}[g(y)]\big) + \widetilde{J}\big(\mathbf{E}[1-g(x)],\mathbf{E}[1-g(y)]\big) - 2\xi . \tag{6.20}
\]
Next, we recall the following fact:
Fact 6.15. For any a ∈ (0, 1) and ρ ∈ (−1, 0),
\[
J_\rho(a,a) + J_\rho(1-a,1-a) \;\ge\; 2\,J_\rho(1/2,1/2) = 1 - \frac{\arccos\rho}{\pi} .
\]
    As a consequence we have that for every x ∈ [ε, 1 − ε],
\[
\widetilde{J}(x,x) + \widetilde{J}(1-x,1-x) \;\ge\; 1 - \frac{\arccos\rho'}{\pi} - 2\varepsilon .
\]
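Fact 6.15 can be spot-checked numerically. The following sketch (illustrative only, not part of the paper's argument) uses the standard Gaussian orthant-probability formula J_ρ(1/2, 1/2) = 1/4 + arcsin(ρ)/(2π) for the equality case, and a Monte Carlo estimate of J_ρ(a, a) for other values of a; the trial counts and tolerances are arbitrary choices.

```python
# A numerical spot check of Fact 6.15 (sketch only; not part of the proof).
# Uses J_rho(1/2, 1/2) = 1/4 + arcsin(rho)/(2*pi), the Gaussian orthant
# probability, and Monte Carlo for other diagonal values J_rho(a, a).
import math
import random
from statistics import NormalDist

def J_mc(rho, a, trials=100_000, seed=0):
    """Monte Carlo estimate of J_rho(a, a) = Pr[X <= Phi^{-1}(a), Y <= Phi^{-1}(a)]."""
    rng = random.Random(seed)
    s = NormalDist().inv_cdf(a)
    c = math.sqrt(1 - rho * rho)
    hits = sum(1 for _ in range(trials)
               if (x := rng.gauss(0, 1)) <= s and rho * x + c * rng.gauss(0, 1) <= s)
    return hits / trials

rho = -0.5
rhs = 1 - math.acos(rho) / math.pi                      # right-hand side, = 1/3
assert abs(2 * (0.25 + math.asin(rho) / (2 * math.pi)) - rhs) < 1e-12
for a in (0.2, 0.35, 0.5):
    lhs = J_mc(rho, a) + J_mc(rho, 1 - a, seed=1)
    assert lhs >= rhs - 0.01                            # Monte Carlo slack
```

The equality case a = 1/2 is checked exactly through the orthant formula; the Monte Carlo slack accounts only for sampling noise.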
By using Corollary 5.13, we have that there exists d_δ = d_δ(ε, ρ′) such that
\[
A_p \vdash_{d_\delta} \widetilde{J}\big(\mathbf{E}[g(x)],\mathbf{E}[g(y)]\big) + \widetilde{J}\big(\mathbf{E}[1-g(x)],\mathbf{E}[1-g(y)]\big) \;\ge\; 1 - \frac{\arccos\rho'}{\pi} - 4\varepsilon . \tag{6.21}
\]
Combining (6.20) and (6.21), we get
\[
A_p \vdash_{\max\{d_\gamma,d_\alpha,d_\delta,9\}} \mathop{\mathbf{E}}_{\substack{x\in\{-1,1\}^n \\ y\sim_\rho x}}\big[f(x)\cdot f(y) + (1-f(x))\cdot(1-f(y))\big] \;\ge\; 1 - \frac{\arccos\rho'}{\pi} - 4\varepsilon - 2\xi . \tag{6.22}
\]

From here, getting to Theorem 6.1 is pretty easy. We first note that
\[
4\varepsilon + 2\xi = 14\varepsilon + 5\cdot c_\gamma\cdot\sqrt{\eta} + \frac{2\cdot 9^{d_\eta}\cdot c_\gamma}{\sqrt{\eta}}\sum_{i=1}^{n}\big(\mathrm{Inf}_i^{\le d_\eta}(f)\big)^2 .
\]

We now proceed as follows:


    • For the given ρ and κ, first we choose ε = κ/100. This implies that 14ε ≤ κ/4.

    • Next, observe that c_γ = c_γ(ρ′, ε) is a continuous function of ρ′ and ε. Now, recall that ρ′ =
      ρ/(1 − η). At η = 0, √η · c_γ(ρ′, ε) = 0. Hence, there exists η₀ = η₀(ρ, ε, κ) > 0 such that for all
      η ≤ η₀, √η · c_γ(ρ′, ε) ≤ κ/32.

    • Again, observe that for any ρ ∈ (−1, 0) and κ > 0, there exists η₁ = η₁(ρ, κ) > 0 such that for all
      η ≤ η₁, (arccos ρ′)/π ≤ (arccos ρ)/π + κ/4.
Now, choose η = min{η₀, η₁}. With η and ε having been fixed in terms of κ and ρ, we set
\[
d_0(\kappa,\rho) = \max\{d_\gamma, d_\alpha, d_\delta, 9\} , \qquad c(\kappa,\rho) = \frac{2\cdot 9^{d_\eta}\cdot c_\gamma}{\sqrt{\eta}} , \qquad d_1(\kappa,\rho) = d_\eta ,
\]
and hence get
\[
A_p \vdash_{d_0(\kappa,\rho)} \mathop{\mathbf{E}}_{\substack{x\in\{-1,1\}^n \\ y\sim_\rho x}}\big[f(x)\cdot f(y) + (1-f(x))\cdot(1-f(y))\big] \;\ge\; 1 - \frac{\arccos\rho}{\pi} - \kappa - c(\kappa,\rho)\cdot\sum_{i=1}^{n}\big(\mathrm{Inf}_i^{\le d_1(\kappa,\rho)}(f)\big)^2 .
\]

This finishes the proof of Theorem 6.1.
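As a sanity check on the constant in Theorem 6.1, the following sketch (illustrative only; the dimension n and trial count are arbitrary choices) estimates E[f(x)f(y) + (1 − f(x))(1 − f(y))] = Pr[Maj(x) = Maj(y)] for the majority function, whose small influences make the bound 1 − arccos(ρ)/π essentially tight:

```python
# Monte Carlo illustration (a sketch, not part of the proof): for majority,
# which has small influences, Pr[Maj(x) = Maj(y)] under rho-correlated inputs
# approaches 1 - arccos(rho)/pi, the constant appearing in Theorem 6.1.
import math
import random

def maj_agreement(rho, n=501, trials=4000, seed=0):
    """Estimate Pr[Maj(x) = Maj(y)] where y_i = x_i with probability (1 + rho)/2."""
    rng = random.Random(seed)
    keep = (1 + rho) / 2              # gives E[x_i * y_i] = rho
    agree = 0
    for _ in range(trials):
        x = [rng.choice((-1, 1)) for _ in range(n)]
        y = [xi if rng.random() < keep else -xi for xi in x]
        agree += (sum(x) > 0) == (sum(y) > 0)
    return agree / trials

rho = -0.5
limit = 1 - math.acos(rho) / math.pi   # = 1/3 for rho = -1/2
est = maj_agreement(rho)
assert abs(est - limit) < 0.06         # finite-n and sampling error allowance
```

The tolerance is deliberately loose: it absorbs both the finite-n deviation from the Gaussian limit and the Monte Carlo noise.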


7    Refuting the Khot-Vishnoi instances of Max-Cut
In this section, we will prove the following theorem:
Theorem 7.1. Let ρ ∈ (−1, 0) and let G_ρ = (V_ρ, E_ρ) be the Max-Cut instance constructed in [33] for the
noise parameter ρ. Let {x_v}_{v∈V} be a sequence of indeterminates and A = ∪_{v∈V} {0 ≤ x_v ≤ 1}. Then, for
any δ > 0, there exists d′ = d′(δ, ρ) such that
\[
A \cup \Big\{ \mathop{\mathbf{E}}_{(u,v)\in E}\big[x_u\cdot(1-x_v) + x_v\cdot(1-x_u)\big] \;\ge\; \frac{1}{\pi}\arccos\rho + \delta \Big\} \vdash_{d'} -1 \ge 0 .
\]

Note that if the graph G_ρ has a cut with λ·|E| edges, then there is a {0,1}-assignment to the set {x_v}_{v∈V_ρ}
such that E_{(u,v)∈E}[x_u·(1−x_v) + x_v·(1−x_u)] = λ. Thus, Theorem 7.1 shows that for any δ > 0, there is a
constant-degree SoS refutation (with degree depending only on δ and ρ) of the assertion that the size of the
fractional max-cut of G_ρ exceeds (1/π) arccos ρ + δ. This theorem is essentially tight, as it is known that
the size of the max-cut in G_ρ is at least (1/π) arccos ρ − o(1). Thus, a constant number of rounds of the
Lasserre hierarchy computes the value of the max-cut in G_ρ (nearly) optimally.
    To understand the significance of this result, we recall that O’Donnell and Zhou [49] had proved the
same result with (1/π) arccos ρ + δ replaced by
                                                               
\[
\frac{1}{2} - \frac{\rho}{\pi} - \Big(\frac{1}{2} - \frac{1}{\pi}\Big)\cdot\rho^3 .
\]


While this does show that a constant number of rounds of Lasserre does better than the basic SDP on
the KV Max-Cut instances, the question of whether a constant number of rounds of Lasserre can bring the
integrality gap on these instances arbitrarily close to 1 remained open. Theorem 7.1 answers this question
in the affirmative.
    The main technical component that O’Donnell and Zhou use in the proof of their analogue of
Theorem 7.1 is the following theorem.

Theorem 7.2 ([49]). Let {f(x)}_{x∈{−1,1}^n} be a sequence of indeterminates and let A = ∪_{x∈{−1,1}^n} {0 ≤
f(x) ≤ 1}. Then, for any δ > 0 and ρ ∈ (−1, 0),
\[
A \vdash_{O(1/\delta^2)} \mathrm{Stab}_\rho(f) \;\ge\; K(\rho) - \delta - 2^{O(1/\delta^2)}\cdot\sum_{i=1}^{n} \hat{f}^{\,4}(i) ,
\]
where
\[
K(\rho) = \frac{1}{2} + \frac{\rho}{\pi} + \Big(\frac{1}{2} - \frac{1}{\pi}\Big)\cdot\rho^3 .
\]
Further, they apply this theorem essentially in a black-box manner. The main idea behind strengthening
the O'Donnell-Zhou result is that Theorem 6.1 is a strengthening of Theorem 7.2. Thus, we follow the
same steps as in [49], except that we replace the application of Theorem 7.2 by Theorem 6.1. As
Theorem 7.1 can be proved by just following the steps in [49] (and applying Theorem 6.1), we will not
give its complete proof here. However, to help the reader, we describe some of the important steps.

Unique Games. We begin by recalling the description of instances of Unique-Games (UG) (see [30]).
A UG instance consists of:

   • an undirected graph G = (V, E);

   • a probability distribution E on tuples of the form ((u, v), π_{(u,v)}) where (u, v) ∈ E and π_{(u,v)} : [k] → [k]
     is a permutation.

Further, the weighted graph on G defined by E is regular. Also, [k] is said to be the alphabet of the
Unique-Games instance. The objective is to find a mapping L : V → [k] so as to maximize the quantity
\[
\nu(G, E) := \Pr_{((u,v),\pi_{(u,v)})\sim E}\big[L(v) = \pi_{(u,v)}(L(u))\big] .
\]
The quantity ν(G, E) is said to be the value of the instance. Let Eu denote the marginal distribution
on (v, π) when the first vertex is conditioned to be u. We next consider the SoS formulation for the
UG instance described above. It is slightly different from the “obvious” formulation and follows the
formulation in [49]. To motivate the formulation, we first state the following claim without proof. The
proof can be found in [49].

Claim 7.3. Let (G, E) be a Unique-Games instance as defined above. Let {x_{v,i}}_{v∈V, i∈[k]} be a set of real
numbers satisfying the following conditions:

   • For all v ∈ V , i ∈ [k], xv,i ≥ 0.


   • For all v ∈ V , ∑_{i∈[k]} x_{v,i} ≤ 1.

If ν(G, E) ≤ β , then
\[
\mathop{\mathbf{E}}_{u\in V}\Big[\sum_{i=1}^{k}\Big(\mathop{\mathbf{E}}_{(v,\pi_{(u,v)})\in E_u}\big[x_{v,\pi_{(u,v)}(i)}\big]\Big)^{2}\Big] \;\le\; 4\beta .
\]
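Claim 7.3 can be exercised on a small example. The instance below is a hypothetical toy (not the construction from [33]): two vertices, alphabet size k = 2, and both orientations of a single edge carrying both the identity and the swap permutation, so every labeling satisfies exactly half the constraints and ν(G, E) = 1/2.

```python
# A toy Unique-Games instance (hypothetical, for illustration only) to make
# nu(G, E) and Claim 7.3 concrete. Two vertices, alphabet [k] = {0, 1}; E is
# uniform over both orientations of one edge, carrying both the identity and
# the swap permutation, so every labeling satisfies exactly half of E.
from itertools import product

k, V = 2, [0, 1]
# tuples ((u, v), pi), with the permutation pi given as a tuple of images
E = [((0, 1), (0, 1)), ((1, 0), (0, 1)),   # identity
     ((0, 1), (1, 0)), ((1, 0), (1, 0))]   # swap

def value(L):
    """Fraction of constraints ((u, v), pi) with L[v] = pi(L[u])."""
    return sum(L[v] == pi[L[u]] for (u, v), pi in E) / len(E)

nu = max(value(L) for L in product(range(k), repeat=len(V)))
assert nu == 0.5

def claim_lhs(x):
    """E_u sum_i ( E_{(v, pi) ~ E_u} [ x_{v, pi(i)} ] )^2, as in Claim 7.3."""
    total = 0.0
    for u in V:
        Eu = [(v, pi) for (uu, v), pi in E if uu == u]
        total += sum((sum(x[v, pi[i]] for v, pi in Eu) / len(Eu)) ** 2
                     for i in range(k))
    return total / len(V)

# Claim 7.3 with beta = nu, checked for every 0/1 labeling indicator x.
for L in product(range(k), repeat=len(V)):
    x = {(v, i): float(L[v] == i) for v in V for i in range(k)}
    assert claim_lhs(x) <= 4 * nu + 1e-9
```

The brute-force check only covers indicator assignments; the claim itself holds for every x satisfying the two bullet conditions.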


    Let {xv,i }v∈V,i∈[k] be a set of indeterminates. Inspired by Claim 7.3, we make the following definition.
Definition 7.4. Given a UG instance (G, E) with alphabet size k, there is a degree-d SoS refutation for
optimum β if
\[
A_p \cup \Big\{ \mathop{\mathbf{E}}_{u\in V}\Big[\sum_{i=1}^{k}\Big(\mathop{\mathbf{E}}_{(v,\pi_{u,v})\in E_u}\big[x_{v,\pi_{u,v}(i)}\big]\Big)^{2}\Big] \ge \beta \Big\} \vdash_d -1 \ge 0 ,
\]
where
\[
A_p = \bigcup_{v\in V,\, i\in[k]} \{x_{v,i} \ge 0\} \;\cup\; \bigcup_{v\in V} \Big\{ \sum_{i\in[k]} x_{v,i} \le 1 \Big\} .
\]

    Before we go ahead, we recall that for any η ∈ (0, 1) and N ∈ ℕ (which is a power of 2), [33] construct
UG instances over 2^N/N vertices with alphabet size N such that the optimal value of the instance is bounded by
N^{−η}.² Modifying the result from [3], O'Donnell and Zhou [49] show the following (Lemma 8.7 in [49]):

Theorem 7.5. Let η ∈ (0, 1), let N be a power of 2 and let (V, E) be the corresponding instance of UG
constructed in [33]. Then, there is a degree-4 SoS refutation for optimum β = N^{−Ω(η)}.
    We next describe the reduction from [30] of UG to Max-Cut. The reduction is parameterized by
a “correlation” value ρ ∈ (−1, 0). Given the instance of UG described above, the set of vertices in the
corresponding Max-Cut instance is given by V′ = V × {−1, 1}^k. Further, the probability distribution
E_{ρ,k} over the edges is given by the following sampling procedure:

   • Choose u ∼ V uniformly at random.

   • Choose (v₁, π_{(u,v₁)}) and (v₂, π_{(u,v₂)}) independently from the distribution E_u.

   • Choose x ∈ {−1, 1}^k uniformly at random and y ∼_ρ x.

   • Output the edge ((v₁, π_{(u,v₁)}(x)), (v₂, π_{(u,v₂)}(y))).
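The sampling procedure above can be sketched in runnable form. The UG instance below is a hypothetical toy, and π(x) is interpreted as the string x with its coordinates permuted by π, which is one standard convention for this reduction.

```python
# Runnable sketch of the edge-sampling procedure above (illustrative only).
# The UG instance here is a hypothetical toy; pi(x) is interpreted as the
# string x with coordinates permuted by pi, one standard convention.
import random

def sample_edge(V, Eu, k, rho, rng):
    """Sample one edge of the Max-Cut instance on V' = V x {-1,1}^k."""
    u = rng.choice(V)                                  # u ~ V uniformly
    v1, pi1 = rng.choice(Eu[u])                        # (v1, pi_{(u,v1)}) ~ E_u
    v2, pi2 = rng.choice(Eu[u])                        # independent second draw
    x = [rng.choice((-1, 1)) for _ in range(k)]        # x uniform in {-1,1}^k
    keep = (1 + rho) / 2
    y = [xi if rng.random() < keep else -xi for xi in x]   # y ~_rho x
    apply_pi = lambda pi, z: tuple(z[pi[i]] for i in range(k))
    return (v1, apply_pi(pi1, x)), (v2, apply_pi(pi2, y))

rng = random.Random(0)
k, V = 2, [0, 1]
Eu = {0: [(1, (0, 1)), (1, (1, 0))], 1: [(0, (0, 1)), (0, (1, 0))]}
(a, za), (b, zb) = sample_edge(V, Eu, k, -0.5, rng)
assert a in V and b in V and len(za) == len(zb) == k
```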
We also have the following simple claim from [30]:
Claim 7.6. Let G′ = (V′, E_{ρ,k}) be the instance of Max-Cut described above. Consider a partition of the
graph G′ (into two sets) specified by a collection of functions {f_v : {−1,1}^k → {0,1}}. Then, the value
of the cut defined by this partition is 1 − E_{u∈V}[Stab_ρ(g_u)], where g_u : {−1,1}^k → [0,1] is defined as
\[
g_u(x) = \mathop{\mathbf{E}}_{(v,\pi)\in E_u}\big[f_v(\pi(x))\big] .
\]

  2 Of course, the interesting part is that [33] shows that the standard SDP relaxation on this instance has value 1 − η.




     Having described the reduction from Unique-Games to Max-Cut, we now consider the SoS
relaxation of the Max-Cut instance defined by V′ and E_{ρ,k}. In particular, we have an indeterminate
f_v(z) for every v ∈ V and z ∈ {−1, 1}^k. The constraint set A_m is given by
\[
A_m = \bigcup_{v\in V}\; \bigcup_{z\in\{-1,1\}^k} \{0 \le f_v(z) \le 1\} .
\]
Given Claim 7.6, we can say that there is a degree-d SoS refutation for the value of the max-cut in this
graph exceeding β if
\[
A_m \cup \Big\{ 1 - \mathop{\mathbf{E}}_{u\in V}\big[\mathrm{Stab}_\rho(g_u)\big] \ge \beta \Big\} \vdash_d -1 \ge 0 .
\]

Note that Stab_ρ(g_u) is defined in terms of the indeterminates {f_v(z)}_{v∈V, z∈{−1,1}^k} in the obvious
way (g_u is a homogeneous linear form in these indeterminates). The following claim (proven by O'Donnell
and Zhou [49]) shows the connection between SoS refutations for Unique-Games instances and SoS
refutations for Max-Cut instances obtained after the [30] reduction.

Claim 7.7. Let (G, E) be a Unique-Games instance with a degree-4 SoS refutation for optimum β and
let {g_u} be the corresponding indeterminates obtained after the [30] reduction. Then,
\[
A_m \cup \Big\{ 1 - \mathop{\mathbf{E}}_{u\in V}\big[\mathrm{Stab}_\rho(g_u)\big] \;\ge\; 1 - K(\rho) + \delta + 2^{O(1/\delta^2)}\,\beta \Big\} \vdash_{O(1/\delta^2)+4} -1 \ge 0 . \tag{7.1}
\]

    This implies that if the reduction from [30] is applied to the instances from Theorem 7.5,
\[
A_m \cup \Big\{ 1 - \mathop{\mathbf{E}}_{u\in V}\big[\mathrm{Stab}_\rho(g_u)\big] \;\ge\; 1 - K(\rho) + \delta + 2^{O(1/\delta^2)}\cdot N^{-\Omega(\eta)} \Big\} \vdash_{O(1/\delta^2)+4} -1 \ge 0 .
\]

The proof of Claim 7.7 is reasonably straightforward and goes by proving SoS versions of many analytic
statements and inequalities, including the use of Theorem 7.2. If we follow exactly the same steps but use
Theorem 6.1 instead of Theorem 7.2, we get the following claim.

Claim 7.8. Let (G, E) be a Unique-Games instance with a degree-4 SoS refutation for optimum β and
let {g_u} be the corresponding indeterminates obtained after the [30] reduction. Then,
\[
A_m \cup \Big\{ 1 - \mathop{\mathbf{E}}_{u\in V}\big[\mathrm{Stab}_\rho(g_u)\big] \;\ge\; \frac{\arccos\rho}{\pi} + \delta + d_1(\delta,\rho)\,c(\delta,\rho)\cdot\beta \Big\} \vdash_{d_0(\delta,\rho)+4} -1 \ge 0 . \tag{7.2}
\]


    We do not repeat the steps here, as the proof is identical to that of Claim 7.7. Using β = N^{−Ω(η)}, we get
Theorem 7.1.


Acknowledgements
We thank Ryan O’Donnell and Yuan Zhou for sharing the manuscript [49]. We thank the anonymous
reviewers for their helpful comments.


A       Facts regarding Jρ
Here we collect various facts about the function
\[
J_\rho(x,y) = \Pr\big[X \le \Phi^{-1}(x),\; Y \le \Phi^{-1}(y)\big] ,
\]
where
\[
(X,Y) \sim N\!\left(0, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}\right) .
\]
As is standard, we will use φ to denote the density of the standard normal distribution. These calculations
all follow from elementary calculus.
Claim 2.6. For any (x, y) ∈ (0, 1)2 and 0 ≤ σ ≤ ρ, Mρσ (x, y) is a negative semidefinite matrix. Likewise,
if ρ ≤ σ ≤ 0, then Mρσ (x, y) is a positive semidefinite matrix.
Proof. Towards proving this, note that we can define Y = ρ·X + √(1−ρ²)·Z, where Z ∼ N(0,1) is an
independent normal. Also, let us define Φ⁻¹(x) = s and Φ⁻¹(y) = t. For s, t ∈ ℝ, define K_ρ(s,t) as
\[
K_\rho(s,t) = \Pr_{X,Y}[X \le s,\; Y \le t] = \Pr_{X,Z}\big[X \le s,\; Z \le (t - \rho X)/\sqrt{1-\rho^2}\,\big] .
\]
Note that for the aforementioned relations between x, y, s and t, Kρ (s,t) = Jρ (x, y). Note that
\[
K_\rho(s,t) = \int_{s'=-\infty}^{s} \varphi(s') \int_{t'=-\infty}^{(t-\rho s')/\sqrt{1-\rho^2}} \varphi(t')\, dt'\, ds' . \tag{A.1}
\]
This implies that
\[
\frac{\partial K_\rho(s,t)}{\partial s} = \varphi(s) \int_{t'=-\infty}^{(t-\rho s)/\sqrt{1-\rho^2}} \varphi(t')\, dt' .
\]
By the chain rule, we get that
\[
\frac{\partial J_\rho(x,y)}{\partial x} = \frac{\partial K_\rho(s,t)}{\partial s} \cdot \frac{\partial s}{\partial x} .
\]
By elementary calculus, it follows that
\[
\frac{d\Phi^{-1}(x)}{dx} = \frac{1}{\varphi(\Phi^{-1}(x))} \qquad\Longrightarrow\qquad \frac{\partial s}{\partial x} = \frac{1}{\varphi(\Phi^{-1}(x))} = \frac{1}{\varphi(s)} .
\]
Thus,
\[
\frac{\partial J_\rho(x,y)}{\partial x} = \int_{t'=-\infty}^{(t-\rho s)/\sqrt{1-\rho^2}} \varphi(t')\, dt' .
\]
Thus, we next get that
\[
\frac{\partial^2 J_\rho(x,y)}{\partial x^2} = \frac{\partial^2 J_\rho(x,y)}{\partial x\,\partial s}\cdot\frac{\partial s}{\partial x} = \varphi\Big(\frac{t-\rho s}{\sqrt{1-\rho^2}}\Big)\cdot\frac{-\rho}{\sqrt{1-\rho^2}}\cdot\frac{1}{\varphi(s)} = \varphi\Big(\frac{\Phi^{-1}(y)-\rho\,\Phi^{-1}(x)}{\sqrt{1-\rho^2}}\Big)\cdot\frac{-\rho}{\sqrt{1-\rho^2}}\cdot\frac{1}{\varphi(s)} ,
\]
\[
\frac{\partial^2 J_\rho(x,y)}{\partial x\,\partial y} = \frac{\partial^2 J_\rho(x,y)}{\partial x\,\partial t}\cdot\frac{\partial t}{\partial y} = \varphi\Big(\frac{\Phi^{-1}(y)-\rho\,\Phi^{-1}(x)}{\sqrt{1-\rho^2}}\Big)\cdot\frac{1}{\sqrt{1-\rho^2}}\cdot\frac{1}{\varphi(t)} .
\]




Because we know that (X,Y) ∼ (Y,X), by symmetry we can conclude that
\[
\frac{\partial^2 J_\rho(x,y)}{\partial y^2} = \varphi\Big(\frac{\Phi^{-1}(x)-\rho\,\Phi^{-1}(y)}{\sqrt{1-\rho^2}}\Big)\cdot\frac{-\rho}{\sqrt{1-\rho^2}}\cdot\frac{1}{\varphi(t)} ,
\]
and likewise,
\[
\frac{\partial^2 J_\rho(x,y)}{\partial y\,\partial x} = \varphi\Big(\frac{\Phi^{-1}(x)-\rho\,\Phi^{-1}(y)}{\sqrt{1-\rho^2}}\Big)\cdot\frac{1}{\sqrt{1-\rho^2}}\cdot\frac{1}{\varphi(s)} .
\]

Since φ(s)·φ((t−ρs)/√(1−ρ²)) = φ(t)·φ((s−ρt)/√(1−ρ²)) (both sides equal √(1−ρ²) times the joint
density of (X,Y) at (s,t)), it now follows that
\[
\frac{\partial^2 J_\rho(x,y)}{\partial x^2}\cdot\frac{\partial^2 J_\rho(x,y)}{\partial y^2} \;-\; \rho^2\Big(\frac{\partial^2 J_\rho(x,y)}{\partial x\,\partial y}\Big)^{2} = 0 .
\]
Now, suppose that |σ| ≤ |ρ|. Then
\[
\det\big(M_\rho^\sigma(x,y)\big) = \frac{\partial^2 J_\rho(x,y)}{\partial x^2}\cdot\frac{\partial^2 J_\rho(x,y)}{\partial y^2} \;-\; \sigma^2\Big(\frac{\partial^2 J_\rho(x,y)}{\partial x\,\partial y}\Big)^{2} \ge 0 .
\]

If ρ ≥ 0 then the diagonal of Mρσ (x, y) is non-positive, and it follows that Mρσ (x, y) is negative semidefi-
nite. If ρ ≤ 0 then the diagonal is non-negative and so Mρσ (x, y) is positive semidefinite.
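The closed-form second derivatives above, and the determinant identity, can be verified numerically. The following sketch (not part of the proof) uses Python's `statistics.NormalDist` for Φ and Φ⁻¹, and compares the formula for ∂²J/∂x² against a finite difference of ∂J/∂x = Φ((t − ρs)/√(1−ρ²)):

```python
# Numerical check (a sketch) of the second-derivative formulas from the proof
# of Claim 2.6, and of the identity d2x * d2y - rho^2 * dxy^2 = 0.
import math
from statistics import NormalDist

N = NormalDist()
phi = lambda z: math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def second_partials(rho, x, y):
    s, t = N.inv_cdf(x), N.inv_cdf(y)
    r = math.sqrt(1 - rho * rho)
    A = (t - rho * s) / r        # argument in d2J/dx^2 and d2J/dxdy
    B = (s - rho * t) / r        # its (x <-> y)-symmetric counterpart
    d2x = phi(A) * (-rho / r) / phi(s)
    d2y = phi(B) * (-rho / r) / phi(t)
    dxy = phi(A) / (r * phi(t))
    return d2x, d2y, dxy

def dJdx(rho, x, y):
    """dJ/dx = Phi((t - rho*s)/sqrt(1 - rho^2)), as derived above."""
    s, t = N.inv_cdf(x), N.inv_cdf(y)
    return N.cdf((t - rho * s) / math.sqrt(1 - rho * rho))

rho, x, y, h = -0.6, 0.3, 0.7, 1e-5
d2x, d2y, dxy = second_partials(rho, x, y)
fd = (dJdx(rho, x + h, y) - dJdx(rho, x - h, y)) / (2 * h)
assert abs(fd - d2x) < 1e-4                          # formula vs. finite diff
assert abs(d2x * d2y - rho ** 2 * dxy ** 2) < 1e-12  # determinant identity
```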

Claim 2.7. For any −1 < ρ < 1, there exists C(ρ) > 0 such that for any i, j ≥ 0 with i + j = 3,
\[
\Big|\frac{\partial^3 J_\rho(x,y)}{\partial x^i\,\partial y^j}\Big| \;\le\; C(\rho)\,\big(xy(1-x)(1-y)\big)^{-C(\rho)} .
\]

Further, the function C(ρ) can be chosen so that it is continuous for ρ ∈ (−1, 1).

Proof. As before, we set Φ⁻¹(x) = s and Φ⁻¹(y) = t. From the proof of Claim 2.6, we see that
\[
\frac{\partial^2 J_\rho(x,y)}{\partial x^2} = \varphi\Big(\frac{\Phi^{-1}(y)-\rho\,\Phi^{-1}(x)}{\sqrt{1-\rho^2}}\Big)\cdot\frac{-\rho}{\sqrt{1-\rho^2}}\cdot\frac{1}{\varphi(s)} .
\]

To compute the third derivatives of J, recall that
\[
\frac{\partial s}{\partial x} = \frac{1}{\varphi(s)} \qquad\text{and}\qquad \frac{\partial t}{\partial y} = \frac{1}{\varphi(t)} .
\]
We have
\[
\frac{\partial^3 J_\rho(x,y)}{\partial x^3} = \frac{\rho}{(1-\rho^2)^{3/2}}\cdot\frac{\rho t + (2\rho^2-1)s}{\varphi(s)}\,\exp\Big(-\frac{t^2 - 2\rho st + (2\rho^2-1)s^2}{2(1-\rho^2)}\Big)
= \frac{\sqrt{2\pi}\,\rho}{(1-\rho^2)^{3/2}}\,\big(\rho t + (2\rho^2-1)s\big)\exp\Big(-\frac{t^2 - 2\rho st + (3\rho^2-2)s^2}{2(1-\rho^2)}\Big) . \tag{A.2}
\]


Now, |Φ⁻¹(x)| ∼ √(2 log(1/x)) as x → 0; hence there is a constant C such that |Φ⁻¹(x)| ≤ C·√(log(1/x)) for all x ≤ 1/2.
Hence, exp(s²) ≤ x^{−C} for all x ≤ 1/2; by symmetry, exp(s²) ≤ (x(1−x))^{−C} for all x ∈ (0,1). Therefore
\[
\exp\Big(-\frac{t^2 - 2\rho st + (3\rho^2-2)s^2}{2(1-\rho^2)}\Big)
= e^{-\frac{t^2}{2(1-\rho^2)}}\, e^{\frac{\rho st}{1-\rho^2}}\, e^{\frac{(2-3\rho^2)s^2}{2(1-\rho^2)}}
\le e^{\frac{|\rho|(s^2+t^2)}{2(1-\rho^2)}}\, e^{\frac{|2-3\rho^2|\,s^2}{2(1-\rho^2)}}
\le \big(x(1-x)\,y(1-y)\big)^{-C'(\rho)} \tag{A.3}
\]
for a constant C′(ρ) depending continuously on ρ.
Further, using exp(s²) ≤ (x(1−x))^{−C} and exp(t²) ≤ (y(1−y))^{−C}, we have |ρt + (2ρ²−1)s| ≤ 4(xy)^{−C}. As a consequence, applying
this to (A.2), we see that there is a constant C(ρ) > 0 such that
\[
\Big|\frac{\partial^3 J_\rho(x,y)}{\partial x^3}\Big| \;\le\; C(\rho)\,\big(x(1-x)\,y(1-y)\big)^{-C(\rho)} .
\]
The other third derivatives are similar:
\[
\frac{\partial^3 J_\rho(x,y)}{\partial x^2\,\partial y} = \frac{\sqrt{2\pi}\,\rho}{(1-\rho^2)^{3/2}}\,(t - 2\rho s)\,\exp\Big(-\frac{(2\rho^2-1)t^2 - 2\rho st + (2\rho^2-1)s^2}{2(1-\rho^2)}\Big) .
\]
By the same steps that led to (A.3), we get
\[
\Big|\frac{\partial^3 J_\rho(x,y)}{\partial x^2\,\partial y}\Big| \;\le\; C(\rho)\,\big(x(1-x)\,y(1-y)\big)^{-C(\rho)}
\]
(for a slightly different C(ρ)). The bounds on ∂³J/∂y²∂x and ∂³J/∂y³ then follow because J is symmetric
in x and y.
     The fact that C(ρ) can be chosen so that it is continuous for ρ ∈ (−1, 1) is obvious from the discussion
above.

Claim A.1. For any x, y ∈ (0,1),
\[
\Big|\frac{\partial J_\rho(x,y)}{\partial\rho}\Big| \;\le\; (1-\rho^2)^{-3/2} .
\]
Proof. We begin from (A.1), but this time we differentiate with respect to ρ:
\[
\frac{\partial K_\rho(s,t)}{\partial\rho} = -\frac{1}{(1-\rho^2)^{3/2}} \int_{s'=-\infty}^{s} \varphi(s')\,\varphi\Big(\frac{t-\rho s'}{\sqrt{1-\rho^2}}\Big)\, ds' .
\]
Since Range(φ) ⊂ (0,1] and ∫_ℝ φ(s′) ds′ = 1, it follows that
\[
\Big|\frac{\partial K_\rho(s,t)}{\partial\rho}\Big| \;\le\; (1-\rho^2)^{-3/2} .
\]
Since
$$\frac{\partial J_\rho(x,y)}{\partial \rho} = \frac{\partial K_\rho(\Phi^{-1}(x), \Phi^{-1}(y))}{\partial \rho}\,,$$
the proof is complete.
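As a sanity check, the bound of Claim A.1 can be verified numerically. The sketch below is illustrative only: it assumes the representation K_ρ(s,t) = ∫_{−∞}^{s} φ(s′) Φ((t − ρs′)/√(1−ρ²)) ds′ (as in (A.1)), approximates the integral by the trapezoid rule, and estimates ∂K_ρ/∂ρ with a central difference; the evaluation point and step sizes are arbitrary choices, not from the text.

```python
import math

def Phi(t):
    # standard normal CDF
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def phi(u):
    # standard normal density
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def K(rho, s, t, lo=-8.0, n=4000):
    # K_rho(s, t) = int_{-inf}^{s} phi(u) * Phi((t - rho*u) / sqrt(1 - rho^2)) du,
    # approximated by the composite trapezoid rule on [lo, s].
    c = math.sqrt(1.0 - rho * rho)
    f = lambda u: phi(u) * Phi((t - rho * u) / c)
    h = (s - lo) / n
    return h * (0.5 * (f(lo) + f(s)) + sum(f(lo + i * h) for i in range(1, n)))

def dK_drho(rho, s, t, eps=1e-4):
    # central finite difference in rho
    return (K(rho + eps, s, t) - K(rho - eps, s, t)) / (2.0 * eps)

rho, s, t = 0.5, 0.3, -0.2
bound = (1.0 - rho * rho) ** (-1.5)
print(dK_drho(rho, s, t), bound)  # the estimated derivative should respect the bound
```

At ρ = 0 the function K reduces to a product of one-dimensional Gaussian probabilities, which gives an easy consistency check on the quadrature.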


We also state the following useful claim without proof; it follows immediately from the calculations in
the proofs of Claim 2.6 and Claim 2.7.

Claim A.2. For any ρ ∈ (−1, 1) and ε > 0 there exists a continuous function γ(ρ, ε) such that for any
(x, y) ∈ [ε, 1 − ε]² and 1 ≤ i + j ≤ 3,
$$\left|\frac{\partial^{i+j} J_\rho(x,y)}{\partial x^i\,\partial y^j}\right| \le \gamma(\rho, \varepsilon)\,.$$

B    Approximation by polynomials
Claim B.1. For any ρ ∈ (−1, 1) and any ε, δ > 0, there is a polynomial J̃ρ such that for all 0 ≤ i + j ≤ 2,
$$\sup_{x,y \in [\varepsilon, 1-\varepsilon]} \left| \frac{\partial^{i+j} J_\rho(x,y)}{\partial x^i\,\partial y^j} - \frac{\partial^{i+j} \tilde{J}_\rho(x,y)}{\partial x^i\,\partial y^j} \right| \le \delta\,.$$

Moreover, if ρ ∈ [−1 + ε, 1 − ε], then the degree of J̃ρ and its largest coefficient can be bounded by
constants depending only on ε and δ. Further, both of these bounds are continuous functions of ε and δ.

   The proof of Claim B.1 follows from standard results on Bernstein polynomials. In particular, we
make use of the following theorem which may be found, for example, in [39].

Theorem B.2. Suppose f : [0, 1] → R has m continuous derivatives, all bounded in absolute value by M.
For any n ∈ N, let Bₙf be the polynomial
$$(B_n f)(x) = \sum_{k=0}^{n} f(k/n) \binom{n}{k} x^k (1-x)^{n-k}\,.$$
Then for any 0 ≤ i ≤ m,
$$\sup_{x \in [0,1]} \left| \frac{d^i f(x)}{dx^i} - \frac{d^i (B_n f)(x)}{dx^i} \right| \le C\sqrt{M/n}\,.$$
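Theorem B.2 is easy to experiment with numerically. The sketch below is a minimal illustration, not taken from [39]: the test function sin(2x), the degree n = 200, and the finite-difference step are arbitrary choices. It builds Bₙf directly from the definition and measures the uniform error of the polynomial and of its first derivative on an interior grid.

```python
import math

def bernstein(f, n):
    # (B_n f)(x) = sum_{k=0}^{n} f(k/n) * C(n, k) * x^k * (1 - x)^(n - k)
    coeffs = [f(k / n) * math.comb(n, k) for k in range(n + 1)]
    def Bnf(x):
        return sum(c * x ** k * (1 - x) ** (n - k) for k, c in enumerate(coeffs))
    return Bnf

f = lambda x: math.sin(2 * x)
df = lambda x: 2 * math.cos(2 * x)

Bnf = bernstein(f, 200)

def num_deriv(g, x, h=1e-5):
    # central finite difference, good enough to probe the first derivative
    return (g(x + h) - g(x - h)) / (2 * h)

grid = [i / 50 for i in range(1, 50)]
err0 = max(abs(Bnf(x) - f(x)) for x in grid)
err1 = max(abs(num_deriv(Bnf, x) - df(x)) for x in grid)
print(err0, err1)  # both errors should be small and shrink as n grows
```

Note that the Bernstein weights C(n, k) x^k (1 − x)^(n−k) are a binomial probability mass function, so the sum is numerically stable even for large n.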

    Seeing as the first three derivatives of Jρ are bounded on [ε, 1 − ε]², Claim B.1 is essentially just a
two-variable version of Theorem B.2. Although such a result is almost certainly known (and for more than
two variables), we were not able to find a reference in the literature, and so we include the proof here.

Proof of Claim B.1. Suppose that f : [0, 1]² → R has all partial derivatives up to third order bounded by
M. Define
$$g_n(x,y) = (B_n f(\cdot,y))(x) = \sum_{k=0}^{n} \binom{n}{k} f(k/n, y)\, x^k (1-x)^{n-k}\,,$$
$$h_n(x,y) = (B_n g_n(x,\cdot))(y) = \sum_{k=0}^{n} \sum_{\ell=0}^{n} \binom{n}{k} \binom{n}{\ell} f(k/n, \ell/n)\, x^k (1-x)^{n-k}\, y^\ell (1-y)^{n-\ell}\,.$$



Fix 0 ≤ i + j ≤ 2 and note that
$$\frac{\partial^j g_n(\cdot,y)}{\partial y^j} = B_n\!\left(\frac{\partial^j f(\cdot,y)}{\partial y^j}\right), \tag{B.1}$$
$$\frac{\partial^i h_n(x,\cdot)}{\partial x^i} = B_n\!\left(\frac{\partial^i g_n(x,\cdot)}{\partial x^i}\right). \tag{B.2}$$
Now, fix y ∈ [0, 1] and apply Theorem B.2 to (B.1): for any x ∈ [0, 1],
$$\left|\frac{\partial^{i+j} g_n(x,y)}{\partial x^i\,\partial y^j} - \frac{\partial^{i+j} f(x,y)}{\partial x^i\,\partial y^j}\right| \le C\sqrt{M/n}\,.$$

On the other hand, fixing x and applying Theorem B.2 to (B.2) yields
$$\left|\frac{\partial^{i+j} h_n(x,y)}{\partial x^i\,\partial y^j} - \frac{\partial^{i+j} g_n(x,y)}{\partial x^i\,\partial y^j}\right| \le C\sqrt{M/n}\,.$$

Putting these together,
$$\left|\frac{\partial^{i+j} h_n(x,y)}{\partial x^i\,\partial y^j} - \frac{\partial^{i+j} f(x,y)}{\partial x^i\,\partial y^j}\right| \le 2C\sqrt{M/n}\,.$$
Since hn is a polynomial, taking n sufficiently large implies that there is a polynomial f̃ such that f̃,
and its partial derivatives of order at most 2, uniformly approximate the corresponding derivatives of f.
Although we stated this for functions on [0, 1]², a change of coordinates shows that it holds equally well
for functions on [ε, 1 − ε]² with three bounded derivatives. Since Jρ is such a function, the first part of
the claim follows.
    For the second part of the claim, note that all of the error bounds hold uniformly in ρ ∈ [−1 + ε, 1 − ε],
since the first, second and third derivatives of Jρ are uniformly bounded over this range by a continuous
function of ρ and ε (Claim A.2). Hence n, too, can be bounded by a continuous function of ε and δ.
Moreover, since maxx,y |Jρ(x, y)| ≤ 1, the coefficients of J̃ρ can be bounded in terms of n, so both the
degree of J̃ρ and its coefficients are bounded by continuous functions of ε and δ.
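The iterated construction gₙ, hₙ in the proof above can be sketched in a few lines. The following minimal example (the test function and n = 60 are arbitrary choices, not from the text) applies the one-variable Bernstein operator in x and then in y, and measures the uniform error of hₙ against f on a grid.

```python
import math

def bernstein2(f, n):
    # h_n(x, y) = sum_{k,l} C(n,k) C(n,l) f(k/n, l/n) x^k (1-x)^(n-k) y^l (1-y)^(n-l):
    # the one-variable Bernstein operator applied first in x, then in y.
    vals = [[f(k / n, l / n) for l in range(n + 1)] for k in range(n + 1)]
    binom = [math.comb(n, k) for k in range(n + 1)]
    def hn(x, y):
        bx = [binom[k] * x ** k * (1 - x) ** (n - k) for k in range(n + 1)]
        by = [binom[l] * y ** l * (1 - y) ** (n - l) for l in range(n + 1)]
        return sum(bx[k] * by[l] * vals[k][l]
                   for k in range(n + 1) for l in range(n + 1))
    return hn

f = lambda x, y: math.sin(x + 2 * y)
hn = bernstein2(f, 60)

grid = [i / 10 for i in range(11)]
err = max(abs(hn(x, y) - f(x, y)) for x in grid for y in grid)
print(err)  # uniform error on the grid; shrinks as n grows
```

Since the two-variable operator interpolates f at the corners of [0, 1]², hₙ(0, 0) = f(0, 0) exactly, which gives a quick correctness check.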



References
 [1] Per Austrin: Balanced MAX-2-SAT might not be the hardest. In Proc. 39th STOC, pp. 189–197.
     ACM Press, 2007. Available at ECCC. [doi:10.1145/1250790.1250818]

 [2] Dominique Bakry and Michel Ledoux: Lévy-Gromov's isoperimetric inequality for
     an infinite-dimensional diffusion generator. Inventiones Mathematicae, 123(1):259–281, 1996.
     [doi:10.1007/BF01232376]


 [3] Boaz Barak, Fernando G. S. L. Brandão, Aram Wettroth Harrow, Jonathan A. Kelner,
     David Steurer, and Yuan Zhou: Hypercontractivity, sum-of-squares proofs, and their
     applications. In Proc. 44th STOC, pp. 307–326. ACM Press, 2012. [doi:10.1145/2213977.2214006]

 [4] Boaz Barak, Prasad Raghavendra, and David Steurer: Rounding semidefinite programming
     hierarchies via global correlation. In Proc. 52nd FOCS, pp. 472–481. IEEE Comp. Soc.
     Press, 2011. [doi:10.1109/FOCS.2011.95]

 [5] William Beckner: Inequalities in Fourier analysis. Ann. of Math., 102(1):159–182, 1975.
     [doi:10.2307/1970980]

 [6] Sergey G. Bobkov: An isoperimetric inequality on the discrete cube, and an elementary proof
     of the isoperimetric inequality in Gauss space. Ann. Probab., 25(1):206–214, 1997. Available at
     Project Euclid.

 [7] Aline Bonami: Étude des coefficients de Fourier des fonctions de L^p(G). Annales de l'Institut
     Fourier, 20(2):335–402, 1970. Available at EuDML.

 [8] Christer Borell: Geometric bounds on the Ornstein-Uhlenbeck velocity process. Probab.
     Theory Related Fields, 70(1):1–13, 1985. [doi:10.1007/BF00532234]

 [9] Jean Bourgain: On the distributions of the Fourier spectrum of Boolean functions. Israel J.
     Math., 131(1):269–276, 2002. [doi:10.1007/BF02785861]

[10] Eden Chlamtac and Gyanit Singh: Improved approximation guarantees through higher
     levels of SDP hierarchies. In Proc. 11th Internat. Workshop on Approximation Algorithms for
     Combinatorial Optimization Problems (APPROX'08), volume 5171 of LNCS, pp. 49–62. Springer,
     2008. [doi:10.1007/978-3-540-85363-3_5]

[11] Anindya De, Elchanan Mossel, and Joe Neeman: Majority is stablest: discrete and
     SoS. In Proc. 45th STOC, pp. 477–486. ACM Press, 2013. [doi:10.1145/2488608.2488668,
     arXiv:1211.1001]

[12] Nikhil R. Devanur, Subhash Khot, Rishi Saket, and Nisheeth K. Vishnoi: Integrality
     gaps for sparsest cut and minimum linear arrangement problems. In Proc. 38th STOC, pp. 537–546.
     ACM Press, 2006. [doi:10.1145/1132516.1132594]

[13] Irit Dinur, Elchanan Mossel, and Oded Regev: Conditional hardness for approximate
     coloring. SIAM J. Comput., 39(3):843–873, 2009. Preliminary version in STOC'06.
     [doi:10.1137/07068062X]

[14] Irit Dinur and Samuel Safra: On the hardness of approximating minimum vertex cover.
     Ann. of Math., 162(1):439–485, 2005. Preliminary version in STOC'02.
     [doi:10.4007/annals.2005.162.439]


[15] William Feller: An Introduction to Probability Theory and its Applications. John Wiley & Sons,
     1968.

[16] Dan S. Felsenthal and Moshé Machover: The Measurement of Voting Power. Edward
     Elgar Publishing, 1998.

[17] Ehud Friedgut and Gil Kalai: Every monotone graph property has a sharp threshold. Proc.
     Amer. Math. Soc., 124(10):2993–3002, 1996. [doi:10.1090/S0002-9939-96-03732-X]

[18] Ehud Friedgut, Gil Kalai, and Noam Nisan: Elections can be manipulated often. In Proc.
     49th FOCS, pp. 243–249. IEEE Comp. Soc. Press, 2008. [doi:10.1109/FOCS.2008.87]

[19] Michel X. Goemans and David P. Williamson: Improved approximation algorithms for
     maximum cut and satisfiability problems using semidefinite programming. J. ACM, 42(6):1115–
     1145, 1995. Preliminary version in STOC'94. [doi:10.1145/227683.227684]

[20] Dima Grigoriev: Linear lower bound on degrees of Positivstellensatz calculus proofs for the
     parity. Theoret. Comput. Sci., 259(1-2):613–622, 2001. [doi:10.1016/S0304-3975(00)00157-2]

[21] Dima Grigoriev and Nicolai Vorobjov: Complexity of Null- and Positivstellensatz proofs.
     Ann. Pure Appl. Logic, 113(1-3):153–160, 2001. [doi:10.1016/S0168-0072(01)00055-0]

[22] Johan Håstad: Some optimal inapproximability results. J. ACM, 48(4):798–859, 2001. Prelimi-
     nary version in STOC'97. [doi:10.1145/502090.502098]

[23] David Hilbert: Über die Darstellung definiter Formen als Summe von Formenquadraten. Mathe-
     matische Annalen, 32(3):342–350, 1888. [doi:10.1007/BF01443605]

[24] Marcus Isaksson and Elchanan Mossel: Maximally stable Gaussian partitions with
     discrete applications. Israel J. Math., 189(1):347–396, 2012. [doi:10.1007/s11856-011-0181-7,
     arXiv:0903.3362]

[25] Vaithilingam Jeyakumar, Jean Bernard Lasserre, and Guoyin Li: On polynomial
     optimization over non-compact semi-algebraic sets. J. Optim. Theory and Appl., 163(3):707–718,
     2014. [doi:10.1007/s10957-014-0545-3]

[26] Jeff Kahn, Gil Kalai, and Nathan Linial: The influence of variables on Boolean func-
     tions (extended abstract). In Proc. 29th FOCS, pp. 68–80. IEEE Comp. Soc. Press, 1988.
     [doi:10.1109/SFCS.1988.21923]

[27] Gil Kalai: A Fourier-theoretic perspective on the Condorcet paradox and Arrow's theorem. Adv.
     in Appl. Math., 29(3):412–426, 2002. [doi:10.1016/S0196-8858(02)00023-4]

[28] Gil Kalai: Social indeterminacy. Econometrica, 72(5):1565–1581, 2004. [doi:10.1111/j.1468-
     0262.2004.00543.x]

[29] Subhash Khot: On the power of Unique 2-Prover 1-Round Games. In Proc. 34th STOC, pp.
     767–775. ACM Press, 2002. [doi:10.1145/509907.510017]


[30] Subhash Khot, Guy Kindler, Elchanan Mossel, and Ryan O'Donnell: Optimal
     inapproximability results for Max-Cut and other 2-variable CSPs? SIAM J. Comput., 37(1):319–357,
     2007. Preliminary version in FOCS'04. [doi:10.1137/S0097539705447372]

[31] Subhash Khot, Preyas Popat, and Rishi Saket: Approximate Lasserre integrality gap for
     Unique Games. In Proc. 13th Internat. Workshop on Approximation Algorithms for Combinato-
     rial Optimization Problems (APPROX'10), volume 6302 of LNCS, pp. 298–311. Springer, 2010.
     [doi:10.1007/978-3-642-15369-3_23]

[32] Subhash Khot and Rishi Saket: SDP integrality gaps with local ℓ1-embeddability. In Proc.
     50th FOCS, pp. 565–574. IEEE Comp. Soc. Press, 2009. [doi:10.1109/FOCS.2009.37]

[33] Subhash Khot and Nisheeth K. Vishnoi: The Unique Games Conjecture, integrality gap for
     cut problems and embeddability of negative type metrics into ℓ1. J. ACM, 62(1), 2015. Preliminary
     version in FOCS'05. [doi:10.1145/2629614, arXiv:1305.4581]

[34] Guy Kindler and Ryan O'Donnell: Gaussian noise sensitivity and Fourier tails. In Proc.
     27th IEEE Conf. on Computational Complexity (CCC'12), pp. 137–147. IEEE Comp. Soc. Press,
     2012. [doi:10.1109/CCC.2012.35]

[35] Jean-Louis Krivine: Anneaux préordonnés. J. Analyse Math., 12:307–326, 1964. Available at
     HAL.

[36] Jean Bernard Lasserre: Global optimization with polynomials and the problem of moments.
     SIAM J. Optim., 11(3):796–817, 2001. [doi:10.1137/S1052623400366802]

[37] Jean Bernard Lasserre: Moments, Positive Polynomials and Their Applications. Imperial
     College Press, London, 2010. [doi:10.1142/9781848164468_fmatter]

[38] Henri Lombardi, Nikolai Mnev, and Marie-Françoise Roy: The Positivstellensatz and
     small deduction rules for systems of inequalities. Mathematische Nachrichten, 181(1):245–259,
     1996. [doi:10.1002/mana.3211810110]

[39] George G. Lorentz: Bernstein Polynomials. Chelsea Publishing Company, Inc., 1986.

[40] László Lovász and Alexander Schrijver: Cones of matrices and set-functions and 0-1
     optimization. SIAM J. Optim., 1(2):166–190, 1991. [doi:10.1137/0801013]

[41] Elchanan Mossel: Lecture notes on Fourier analysis. Available at author's website, 2005.

[42] Elchanan Mossel: Gaussian bounds for noise correlation of functions. Geom. Funct. Anal.,
     19(6):1713–1756, 2010. Preliminary version in FOCS'08. [doi:10.1007/s00039-010-0047-x,
     arXiv:math/0703683]

[43] Elchanan Mossel: A quantitative Arrow theorem. Probab. Theory Related Fields, 154(1–
     2):49–88, 2012. [doi:10.1007/s00440-011-0362-7, arXiv:0903.2574]


[44] Elchanan Mossel and Joe Neeman: Robust optimality of Gaussian noise stability. J. Eur.
     Math. Soc., 17(2):433–482, 2015. [doi:10.4171/JEMS/507, arXiv:1210.4126]

[45] Elchanan Mossel, Ryan O'Donnell, and Krzysztof Oleszkiewicz: Noise stability
     of functions with low influences: invariance and optimality. Ann. of Math., 171(1):295–341, 2010.
     Preliminary version in FOCS'05. [doi:10.4007/annals.2010.171.295]

[46] Yurii Nesterov: Global quadratic optimization via conic relaxation. In Handbook of Semidefinite
     Programming. Kluwer, 2000. Available from Citeseer.

[47] Ryan O'Donnell: Analysis of Boolean Functions. Cambridge University Press, 2014.

[48] Ryan O'Donnell and Yi Wu: An optimal SDP algorithm for Max-Cut, and equally optimal long
     code tests. In Proc. 40th STOC, pp. 335–344. ACM Press, 2008. [doi:10.1145/1374376.1374425]

[49] Ryan O'Donnell and Yuan Zhou: Approximability and proof complexity. In Proc. 24th
     Ann. ACM-SIAM Symp. on Discrete Algorithms (SODA'13), pp. 1537–1556. ACM Press, 2013.
     [doi:10.1137/1.9781611973105.111]

[50] Pablo A. Parrilo: Structured Semidefinite Programs and Semialgebraic Methods in Robustness
     and Optimization. Ph. D. thesis, California Institute of Technology, 2000. Available from the
     Caltech Thesis Library.

[51] Prasad Raghavendra and David Steurer: Integrality gaps for strong SDP relaxations
     of Unique Games. In Proc. 50th FOCS, pp. 575–585. IEEE Comp. Soc. Press, 2009.
     [doi:10.1109/FOCS.2009.73]

[52] Alex Samorodnitsky and Luca Trevisan: Gowers uniformity, influence of variables,
     and PCPs. SIAM J. Comput., 39(1):323–360, 2009. Preliminary version in STOC'06.
     [doi:10.1137/070681612, arXiv:math/0510264]

[53] Grant Schoenebeck: Linear level Lasserre lower bounds for certain k-CSPs. In Proc. 49th
     FOCS, pp. 593–602. IEEE Comp. Soc. Press, 2008. [doi:10.1109/FOCS.2008.74]

[54] William Fleetwood Sheppard: On the application of the theory of error to cases of nor-
     mal distribution and normal correlation. Phil. Trans. Royal Soc. London, 192:101–167, 1899.
     [doi:10.1098/rsta.1899.0003]

[55] Naum Z. Shor: Class of global minimum bounds of polynomial functions. Cybernetics, 23(6):731–
     734, 1987. [doi:10.1007/BF01070233]

[56] Gilbert Stengle: A Nullstellensatz and a Positivstellensatz in semialgebraic geometry. Mathe-
     matische Annalen, 207(2):87–97, 1974. Available at EuDML. [doi:10.1007/BF01362149]

[57] Michel Talagrand: On Russo's approximate 0-1 law. Ann. Probab., 22(3):1576–1587, 1994.
     Available from Project Euclid.




AUTHORS
    Anindya De
    Assistant Professor
    Northwestern University
    anindya eecs northwestern edu
    http://users.eecs.northwestern.edu/~anindya/

    Elchanan Mossel
    Professor
    Massachusetts Institute of Technology
    mossel stat berkeley edu
    http://www.stat.berkeley.edu/~mossel/


    Joe Neeman
    Assistant Professor
    University of Texas, Austin
    neeman iam uni-bonn de
    https://wt.iam.uni-bonn.de/neeman/


ABOUT THE AUTHORS
    ANINDYA DE is an Assistant Professor of Computer Science at Northwestern University. He
       received his Ph. D. from U. C. Berkeley in 2013 advised by Luca Trevisan and Umesh
       Vazirani. From 2013 to 2015, he was a postdoc at the Institute for Advanced Study
       and DIMACS. His research interests are in complexity theory, learning theory, discrete
       analysis and applied probability.

    ELCHANAN MOSSEL is a Professor of Mathematics at MIT. He received his Ph. D. in
       mathematics from Hebrew University in 2000, advised by Yuval Peres. After his Ph. D.,
       he was a post-doc at the Theory Group of Microsoft Research, Redmond and a Miller
       fellow in Statistics and Computer Science at U. C. Berkeley. His research interests
       include Combinatorial Statistics, Discrete Fourier Analysis and Influences, Randomized
       Algorithms, Computational Complexity, MCMC, Markov Random Fields, Social Choice,
       Game Theory and Evolution.


    JOE NEEMAN is a Professor in Mathematics at the University of Bonn and an Assistant
        Professor of Mathematics at the University of Texas, Austin. He received his Ph. D. from
        U.C. Berkeley, advised by Elchanan Mossel, after which he was a postdoc at U. T. Austin.
        He works in various areas of probability and on applications to computer science.

