SDP Gaps and UGC-hardness for Max-Cut-Gain

Authors: Subhash Khot, Ryan O’Donnell

License: CC-BY-3.0

                           Theory of Computing, Volume 5 (2009), pp. 83–117
                                       www.theoryofcomputing.org




                SDP gaps and UGC-hardness for
                       Max-Cut-Gain∗
                                Subhash Khot                  Ryan O’Donnell†
                                   Received: July 9, 2008; published: May 29, 2009.



      Abstract: Given a graph with maximum cut of (fractional) size c, the semidefinite pro-
      gramming (SDP)-based algorithm of Goemans and Williamson is guaranteed to find a cut
      of size at least .878 · c. However this guarantee becomes trivial when c is near 1/2, since
      making random cuts guarantees a cut of size 1/2 (i. e., half of all edges). A few years ago,
      Charikar and Wirth (analyzing an algorithm of Feige and Langberg) showed that given a
      graph with maximum cut 1/2 + ε, one can find a cut of size 1/2 + Ω(ε/ log(1/ε)). The
      main contribution of our paper is twofold:
      1. We give a natural and explicit 1/2 + ε vs. 1/2 + O(ε/ log(1/ε)) integrality gap for the
      Max-Cut SDP based on Euclidean space with the Gaussian probability distribution. This
      shows that the SDP-rounding algorithm of Charikar-Wirth is essentially best possible.
      2. We show how this SDP gap can be translated into a Long Code test with the same param-
      eters. This implies that beating the Charikar-Wirth guarantee with any efficient algorithm
      is NP-hard, assuming the Unique Games Conjecture (UGC). This result essentially settles
      the asymptotic approximability of Max-Cut, assuming UGC.
          Building on the first contribution, we show how “randomness reduction” on related
      SDP gaps for the Quadratic-Programming problem lets us make the Ω(log(1/ε)) gap as
      large as Ω(log n) for n-vertex graphs. In addition to optimally answering an open question
      of Alon, Makarychev, Makarychev, and Naor, this technique may prove useful for other
      SDP gap problems.
          Finally, illustrating the generality of our second contribution, we also show how to
      translate the Davie–Reeds SDP gap for the Grothendieck Inequality into a UGC-hardness
      result for computing the ‖·‖_{∞→1} norm of a matrix.

  ∗ An extended abstract appeared in STOC 2006 [30].
  † Supported in part by NSF CAREER grant CCF-0747250.

ACM Classification: F.2.2, G.2.2
AMS Classification: 68Q17, 68Q25, 68W25, 52A40, 90C20, 90C22
Key words and phrases: Max-Cut, Max-Cut-Gain, semidefinite programming, semidefinite programming gaps, Unique Games Conjecture, dictator testing, Gaussian space, quadratic programming, Grothendieck inequality

© 2009 Subhash Khot and Ryan O’Donnell
Licensed under a Creative Commons Attribution License                                DOI: 10.4086/toc.2009.v005a004



1    Introduction
1.1 Max-Cut
Constraint satisfaction problems (CSPs) constitute some of the most fundamental algorithmic tasks.
For most interesting CSPs, finding an optimum solution is NP-hard; hence it is of interest to study
“approximation algorithms,” i. e., efficient algorithms guaranteed to find a solution within a certain factor
of the optimum. Unfortunately, the computational complexity of approximating CSPs is still not well
understood; for example, it is not known if approximating Vertex-Cover to a factor of 3/2 is in P, nor is
the problem known to be NP-hard.
    The main topic of this paper is the approximability of the Max-Cut problem—arguably the simplest
of all NP-hard constraint satisfaction problems. Recall that Max-Cut is the following algorithmic task:
Given an undirected graph G with nonnegative weights on the edges, partition its vertices into two parts
so as to maximize the “value” of the “cut”—i. e., the sum of the weights of the edges that straddle the
partition. Throughout this paper we will assume that graphs’ edge-weights are normalized so that their
total sum is 1.
    Regarding approximation algorithms for Max-Cut, the trivial solution of picking a random partition
guarantees (in expectation) a cut of value at least 1/2. No essential improvement on this was known
until the breakthrough paper of Goemans and Williamson [20]. Let us very briefly review the Goemans–
Williamson algorithm. Given an n-vertex input graph G with weight ai j on edge (i, j) (and weights
summing to 1), the algorithm writes down the associated Max-Cut problem as an integer program:
                              max  1/2 + (1/2) ∑_{i,j=1}^{n} (−a_ij) (y_i · y_j)                       (1.1)

                                        subject to: y_i ∈ {−1, 1} .
The algorithm then relaxes this to a semidefinite program (SDP) which can be solved
efficiently:
                              max  1/2 + (1/2) ∑_{i,j=1}^{n} (−a_ij) (y_i · y_j)                       (1.2)

                                          subject to: y_i ∈ S^{n−1} .
Here S^{n−1} denotes the unit sphere in n dimensions, {y ∈ R^n : ‖y‖_2 = 1}, and · is interpreted as inner
product. Finally, given the optimal unit vector solution {y_i^*}, the algorithm “rounds” the vectors to a ±1


solution as follows: It picks a random vector r from the n-dimensional Gaussian distribution (i. e., each
component r_i is an independent standard Gaussian) and sets

                                              y_i = sgn(y_i^* · r) .

Goemans and Williamson showed that their algorithm has the following two guarantees (in expectation,
but the results can be made to hold with high probability):
    • Given a graph with maximum cut 1 − ε, the algorithm finds a cut of value at least 1 − Θ(√ε).

    • Given a graph with maximum cut s, the algorithm finds a cut of value at least α_GW · s, where
      α_GW ≈ .878 is a certain trigonometric quantity.
    On the hardness-of-approximation side, the best NP-hardness result known for Max-Cut, due to
Håstad [24] and Trevisan–Sorkin–Sudan–Williamson [44], shows that given a graph with maximum cut
17/21 it is NP-hard to find a cut with value 16/21 + ε.
    However, Khot, Kindler, Mossel, and O’Donnell [28] recently showed hardness results that match
the above two guarantees of Goemans and Williamson, assuming Khot’s “Unique Games Conjec-
ture” (UGC) [27]. For a discussion of why assuming UGC seems necessary for sharp results given
our current knowledge, see [27, 28].

1.2 Max-Cut-Gain
Despite the results from [28], the Goemans–Williamson algorithm is certainly suboptimal in some cases.
For example, given a graph with optimum cut .55, the Goemans–Williamson algorithm is only guaran-
teed to return a cut with value .878 · .55 < .49, which is worse than the trivial random algorithm. Indeed,
Alon, Sudakov, and Zwick [4] have shown that there exist graphs on which the Goemans–Williamson
algorithm performs just this poorly. The issue was addressed first by Zwick [45] who gave an alternate
SDP rounding procedure which on graphs with maximum cut of fractional size c is guaranteed to find
a cut of size at least β (c), where β is a somewhat explicitly defined function satisfying β (c)/c → 1
as c → 1/2. The function β was improved by Feige and Langberg [19] using a different rounding
algorithm:
             Step 1:  Set y′_i = round_T(y_i^* · r),  where round_T(t) = t/T if t ∈ [−T, T],
                      and round_T(t) = sgn(t) otherwise.                                               (1.3)

             Step 2:  Randomly round y′_i ∈ [−1, 1] to y_i ∈ {−1, 1}.                                  (1.4)

In Step 1 (where again r is a random Gaussian vector), the algorithm may try different values for T. The
meaning of Step 2 is that we take y_i to be 1 with probability 1/2 + (1/2)y′_i and −1 otherwise (so
E[y_i] = y′_i), independently for each i. Note that we may as well view the [−1, 1] values {y′_i} as the
final “solution,” since with respect to Step 2,

                  E[ 1/2 + (1/2) ∑(−a_ij)(y_i · y_j) ] = 1/2 + (1/2) ∑(−a_ij)(y′_i · y′_j) .

Feige and Langberg believed the rounding functions used in Step 1, which they called “s-linear func-
tions,” were “close to being [optimal]” given their rounding technique.
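The two rounding steps (1.3)–(1.4) are concrete enough to sketch directly; the function names below are ours, for illustration. Step 2's randomized rounding is exactly what makes E[y_i] = y′_i.

```python
import random

def round_T(t, T):
    # The truncated-linear function of (1.3): t/T on [-T, T], sgn(t) outside.
    if t >= T:
        return 1.0
    if t <= -T:
        return -1.0
    return t / T

def randomized_round(y_frac):
    # Step 2 (1.4): independently set y_i = +1 with probability 1/2 + y'_i/2,
    # and -1 otherwise, so that E[y_i] = y'_i.
    return [1 if random.random() < 0.5 + 0.5 * yi else -1 for yi in y_frac]
```

Since the y_i are rounded independently, E[y_i · y_j] = y′_i · y′_j for i ≠ j, which is the identity displayed above.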


    Since the trivial random algorithm finds cuts of value at least 1/2, it makes sense to measure the
performance of a Max-Cut approximation algorithm in terms of how much more than 1/2 it can guar-
antee. Indeed, Håstad and Venkatesh [25] suggest measuring the performance of algorithms for CSPs
by looking at how well they approximate the gain over a random solution; Håstad and Venkatesh were
particularly interested in this question for the Max-Lin (mod 2) problem, of which Max-Cut is a special
case. To that end, we define the “Max-Cut-Gain” approximation problem as follows: Given a weighted
graph with maximum cut 1/2 + ε, find a cut of value 1/2 + αε for α as large as possible.
    Neither Zwick [45] nor Feige–Langberg [19] provided any analysis of their algorithms in the Max-
Cut-Gain framework. However, Charikar and Wirth [12] subsequently gave an analysis of the Feige–
Langberg algorithm showing that when T = Θ(√(log(1/ε))), it has the following guarantee:

      • Given a graph where (1.2) is at least 1/2 + ε (e. g., if the maximum cut (1.1) is at least 1/2 + ε),
        the algorithm finds a cut of value at least 1/2 + Ω(ε/ log(1/ε)).

As for hardness of Max-Cut-Gain, the 16/17 NP-hardness result can be trivially translated into NP-
hardness of finding cuts of value 1/2 + (11/13)ε in graphs with maximum cut 1/2 + ε. The methods
of [28] slightly improve this to 1/2 + (2/π)ε, assuming UGC.

1.3     Our main results—informal statement and discussion
The first two main results of this paper can now be stated.

Main Result 1: (For details, see Theorem 4.1.) For each ε > 0, the Goemans–Williamson SDP for
Max-Cut-Gain has an “integrality gap” of 1/2 + ε vs. 1/2 + O(ε/ log(1/ε)). In other words, there are
nonnegative weights ai j summing to 1 such that (1.2) has value at least 1/2 + ε but (1.1) has value at
most 1/2 + O(ε/ log(1/ε)). Thus the Charikar-Wirth SDP-rounding algorithm is essentially optimal.

Main Result 2: (For details, see Theorems 4.2, 4.3.) The above SDP gap can be translated into an
equivalent “Long Code test.” As a consequence, we get that there exists a constant C such that for
each ε > 0, given a graph with maximum cut 1/2 + ε, it is UGC-hard to find a cut of value at least
1/2 + Cε/ log(1/ε). In other words, beating the Charikar-Wirth Max-Cut-Gain guarantee with any
efficient algorithm is NP-hard, assuming UGC.
     Recall that [28] proved sharp UGC-hardness in the “high end”—maximum cut near 1—and also
sharp UGC-hardness of .878-factor approximation. Since our second theorem proves sharp UGC-
hardness in the “low end”—maximum cut near 1/2—we consider the question of Max-Cut’s approx-
imability to be qualitatively completely settled, up to UGC.
     We view these two results as an interesting continuation of the flurry of work in the last four years
on 2-bit constraint satisfaction problems such as Vertex-Cover, Max-Cut and Sparsest-Cut. This recent
work has made intriguing connections among the following topics:

      • semidefinite programming (SDP) algorithms [6, 12, 26, 1, 3, 2, 11];

      • SDP integrality gaps [32, 29, 33];


      • Fourier analysis-based hardness results [31, 28, 13, 37, 18] subject to the Unique Games Conjec-
        ture (UGC) [27].

In particular, within these papers we see SDP rounding algorithms, SDP gap constructions, and Fourier
analysis results all motivating one another. It was not until recently—two years subsequent to the ini-
tial conference publication of this work—that the full extent of these relationships became clear. (See
Section 1.5.)
     The main theme in the present paper is the illustration of how SDP gaps for 2-bit constraint satis-
faction problems arise naturally in Gaussian space, and how these SDP gaps can be naturally translated
into Long Code tests and UGC-hardness results.

1.4     Related problems: Correlation-Clustering and Quadratic-Programming
Max-Cut-Gain is a special case of an algorithmic problem called (weighted) Correlation-Clustering. The
unweighted (i. e., all edges having equal weight) version of the problem, introduced by Bansal, Blum and
Chawla [7], has the following setting: Given is an unweighted graph G with some edges labeled “similar”
and the remaining edges labeled “dissimilar.” The goal is to partition the vertices into any number of
“clusters” with the idea that edges labeled “similar” should be contained within clusters and edges
labeled “dissimilar” should straddle clusters. The authors of [7] considered three different goals for a
solution: “MaxAgree,” namely maximizing the number of correctly positioned edges, “MinDisagree,”
minimizing the number of incorrectly positioned edges, and “MaxCorr,” maximizing the “correlation”—
i. e., the difference of the number of correctly positioned edges and incorrectly positioned edges.
      Although the approximability of the MaxAgree and MinDisagree versions became fairly well un-
derstood, progress on the approximability of the MaxCorr was not made until the paper of Charikar and
Wirth [12]. Charikar-Wirth first showed that up to a constant factor (at most 3), partitions into just two
clusters are as good as partitions into arbitrary numbers of clusters. Thus not much is lost by restricting
to the two-cluster version of Correlation-Clustering. It is easy to see that the weighted two-cluster ver-
sion of Correlation-Clustering is essentially the same as the Quadratic-Programming problem: Given a
matrix of weights A = (ai j ),
                                     max  ∑_{i,j=1}^{n} (a_ij) (y_i · y_j)                            (1.5)

                                        subject to: y_i ∈ {−1, 1} .
To see the equivalence to two-cluster Correlation-Clustering, think of the yi as indicating which cluster i
is in and think of the positive weights ai j as measuring similarity and the negative weights ai j measuring
dissimilarity. Also, note that Max-Cut-Gain is the special case of Quadratic-Programming in which all
the weights are nonpositive.
     Quadratic-Programming was shown to admit an Ω(1/ log n)-approximation algorithm in works of
Nesterov [39], Nemirovski, Roos, and Terlaky [38], and Megretski [36]; as described in Charikar and
Wirth [12], running the Feige–Langberg rounding procedure with T = Θ(√(log n)) on the natural SDP
relaxation (1.5) yields an Ω(1/ log n)-approximation algorithm. Thus Correlation-Clustering also has an
Ω(1/ log n)-approximation algorithm.
     On the hardness side, Arora, Berger, Hazan, Kindler and Safra [5] showed that giving a (1/log^{γ_0} n)-
approximation is hard for some universal γ_0 > 0 unless NP is contained in quasipolynomial time. The


instances constructed in that paper use heavily skewed positive and negative weights and are thus not
relevant for the Max-Cut-Gain problem. Alon, Makarychev, Makarychev, and Naor [2] showed a tight
SDP gap for Quadratic-Programming of Ω(log n). Their proof, however, was completely nonconstruc-
tive; they used duality to argue that such a gap existed without giving any explicit instance. They gave
as an open problem the question of finding an explicit instance demonstrating the Ω(log n) gap. The
work [5] gives a fairly complicated construction showing an Ω(log n/ log log n) gap.

      The third main result in this paper is the following:

Main Result 3: (For details, see Theorem 4.4.) There is a relatively simple and essentially explicit
Ω(log n) SDP gap for Quadratic-Programming, based on “randomness reduction” of the 1/2 + ε vs.
1/2 + O(ε/ log(1/ε)) SDP gap we prove for Max-Cut-Gain.

    Finally, the two-cluster Correlation-Clustering problem is of special interest when the underlying
graph is bipartite. Following Alon, Makarychev, Makarychev, and Naor [2], we call the weighted version
of this problem KN,N -Quadratic-Programming:
                                     max  ∑_{i,j=1}^{n} (a_ij) (y_i · z_j)                            (1.6)

                                 subject to: y_i ∈ {−1, 1},  z_j ∈ {−1, 1} .
This problem is equivalent to that of computing the ‖·‖_{∞→1} norm of the matrix A, which is closely related
to its “cut norm.” See Alon and Naor [3] for a discussion of the algorithmic significance of this problem.
Alon and Naor studied approximation algorithms for this problem and noted that the Grothendieck
inequality [22] from the theory of Banach spaces is nothing more than a constant upper bound on the SDP
gap of (1.6). The supremum of the SDP gaps of all instances of (1.6) is called the Grothendieck constant.
Alon and Naor took the best known upper bound on Grothendieck’s constant—K_Krivine = π/(2 ln(1 + √2)) ≈ 1.78,
due to Krivine [34]—and translated it into a (1/K_Krivine)-approximation algorithm for the
problem of approximating ‖A‖_{∞→1}.
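As a quick numerical check of the constant quoted above (the snippet is ours, purely for verification):

```python
import math

# Krivine's upper bound on the Grothendieck constant: pi / (2 ln(1 + sqrt(2))).
K_KRIVINE = math.pi / (2 * math.log(1 + math.sqrt(2)))

print(round(K_KRIVINE, 4))  # approximately 1.7822, so 1/K_KRIVINE is about 0.561
```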
     The last main theorem of our paper is the analogue of this for lower bounds/hardness:

Main Result 4: (For details, see Theorems 4.5, 4.6.) The best known lower bound on Grothendieck’s
constant—namely, K_{Davie–Reeds} ≈ 1.67, due independently to Davie [17] (see [23]) and Reeds [43]—
can be translated into a UGC-hardness result for (1/K_{Davie–Reeds})-approximating the K_{N,N}-
Quadratic-Programming problem.

1.5     Subsequent work
Subsequent to the 2006 publication of the extended abstract of this work [30], research has progressed
on the connections between SDP algorithms and gaps, Fourier analysis, and Long Code test described
in Section 1.3. Two papers appearing recently are particularly relevant. In [40], O’Donnell and Wu
continue the research in the present paper. Using a combination of the von Neumann Minimax theorem


and Borell’s theorem on Gaussian rearrangement [10], the authors show that the Gaussian graphs used
to prove our two main theorems are indeed the “worst” graphs. More specifically, they show that the
optimal SDP gaps and UGC-hardness for Max-Cut-Gain come from negative probability operators of
the form −(pT_1 + (1 − p)T_{ρ_0}), where p ∈ [0, 1], ρ_0 ∈ [−1, 0]—just as in the present paper’s Theorem 5.5.
This allows O’Donnell and Wu to determine, for every c, the precise number s such that c vs. s is the
optimal SDP gap and UGC-hardness for Max-Cut-Gain. In [41], Raghavendra significantly extends the
methods in these papers, showing (roughly speaking) that for all constraint satisfaction problems, the
optimal SDP gaps and UGC-hardness match. The only downside of this result is that it is somewhat
inexplicit; determining what the matching values are is not so easy. In particular, to determine the best s
for a given c up to an additive δ , Raghavendra gives only an algorithm taking doubly-exponential time
in δ .

1.6   Outline of this paper
The remainder of the paper is organized as follows. In Section 2 we formally define the algorithmic prob-
lems we are interested in for this paper: Max-Cut-Gain, Quadratic-Programming, and KN,N -Quadratic-
Programming. We also describe the SDP relaxations for these problems, as well as a probabilistic
rephrasing of them that allows us to consider instances on Gaussian space. In Section 3 we intro-
duce the Fourier and Hermite expansion tools we will need, define “Long Code tests,” and introduce
the “Davie–Reeds operator” that provides the basis for our SDP gaps and UGC-hardness results. Sec-
tion 4 is devoted to formal statements of our main results. Section 5 contains the proofs of our first
two main results, for Max-Cut-Gain. Section 6 gives the proof of our “randomness reduction” for the
Quadratic-Programming SDP. Finally, Section 7 proves our UGC-hardness result for KN,N -Quadratic-
Programming.


2     Problem definitions
In this section we give formal definitions of the problems studied in this paper.

2.1   Algorithmic problems
The most general algorithmic problem we are concerned with is Quadratic-Programming:

Definition 2.1. The Quadratic-Programming problem is the following: Given a symmetric real matrix
A = (ai j ), compute
                                     max  ∑_{i,j} (a_ij) (y_i · y_j)

                                     subject to: y_i ∈ [−1, 1] .

Convention: We will always assume that ∑i, j |ai j | = 1. This is without any loss of generality because
we can scale all entries ai j by the same positive constant.
   The assumption that A is symmetric is without loss of generality. Also, our definition of Quadratic-
Programming has a slight difference from the definition given in [12, 5, 2]: namely, we allow the


variables to take values in [−1, 1], not just {−1, 1}, and we also allow nonzero diagonal entries in
A, corresponding to “self-loops” in the graph in the Correlation-Clustering problem. We believe our
definition is mathematically more natural, and in any case, the two definitions are essentially equivalent
for all intents and purposes—see Section 2.3.
    We are interested in two special cases of Quadratic-Programming. The first is our main problem of
interest, Max-Cut-Gain:

Definition 2.2. The Max-Cut-Gain problem is the special case of Quadratic-Programming in which all
the ai j are nonpositive. This corresponds to the graphical instance of Max-Cut-Gain in which the weight
on edge (i, j) is −ai j /2.

    We remark that Crescenzi, Silvestri and Trevisan [16] showed that the weighted and unweighted
versions of Max-Cut-Gain have the same polynomial-time approximability up to an additive 1/poly(n).
    We also define the bipartite special case of the Quadratic-Programming problem, investigated by
Alon and Naor [3] using the Grothendieck Inequality. This is equivalent to the problem of computing
‖A‖_{∞→1}.

Definition 2.3. The K_{N,N}-Quadratic-Programming problem is the following: Given a real matrix
A = (a_ij), compute
                                     max  ∑_{i,j} (a_ij) (y_i · z_j)                                  (2.1)

                                 subject to: y_i, z_j ∈ [−1, 1] .

    It is easy to check that the optimizers in this problem are always in {−1, 1} and thus the issue of
“zeros on the diagonal” is irrelevant.
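On small matrices the quantity (2.1) can be computed exactly by brute force, which is a handy sanity check. Since the optimizers are in {−1, 1} and, for a fixed y, the best choice is z_j = sgn(∑_i a_ij y_i), only the y side needs enumerating. This exponential-time checker is our own illustration, not an algorithm from the paper.

```python
from itertools import product

def inf_to_one_norm(A):
    # ||A||_{inf->1} = max over y, z in {-1,1}^n of sum_{i,j} a_ij * y_i * z_j.
    # For fixed y the optimal z_j is the sign of the j-th column sum, whose
    # contribution is then |sum_i a_ij y_i|; so enumerate only y.
    n = len(A)
    m = len(A[0])
    best = float("-inf")
    for y in product((-1, 1), repeat=n):
        best = max(best,
                   sum(abs(sum(A[i][j] * y[i] for i in range(n)))
                       for j in range(m)))
    return best
```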


2.2     SDP relaxations, and the probabilistic viewpoint
We begin this section by defining the semidefinite programming (SDP) relaxation of the Quadratic-
Programming problem.

Definition 2.4. For d ∈ N, we define B_d to be the d-dimensional unit ball, B_d = {x ∈ R^d : ‖x‖_2 ≤ 1}. In
particular, B_1 = [−1, 1].

      Note that the Quadratic-Programming problem can be written as:

                                     max  ∑_{i,j=1}^{n} (a_ij) (y_i · y_j)                            (2.2)

                                          subject to: y_i ∈ B_1 .




Definition 2.5. The d-dimensional SDP relaxation of (2.2) is defined to be:

                                     max  ∑_{i,j=1}^{n} (a_ij) (y_i · y_j)                            (2.3)

                                          subject to: y_i ∈ B_d .

If unspecified, the value d is assumed to be n.

     The interest in the SDP relaxation is that if d = n, it can be solved in polynomial time (strictly
speaking, it can be solved to within additive error 2^{−poly(n)}; see [20]). Thus we have the following
strategy for approximating (2.2): solve the SDP relaxation and then try to “round” the optimal vectors
y_i^* into reals y_i in [−1, 1] such that the value of the quadratic form does not go down too much. This
strategy motivates the definition of “SDP integrality gaps”:

Definition 2.6. Given the matrix A, we say that the SDP has an SDP (integrality) gap of κ vs. κ′ if the
value of (2.3) is at least κ and the value of (2.2) is at most κ′. The associated SDP gap ratio is defined
to be κ′/κ. The SDP gap of the problem Quadratic-Programming is defined to be the worst SDP gap
ratio over all possible inputs A. We make the same definitions for Max-Cut-Gain and K_{N,N}-Quadratic-
Programming.

     The input instances A we construct in exhibiting SDP gaps for Max-Cut-Gain and Quadratic-
Programming are most naturally set in Gaussian space. One can think of this setting as the one in
which the matrices A are infinite, their “coordinates” are indexed by points in R^n, and their coordinates
are also “weighted” according to the Gaussian probability distribution. The most natural way to allow
for this is to rephrase the Quadratic-Programming problem and its SDP relaxation in a probabilistic
manner. In the remainder of this section we therefore give definitions for the probabilistic versions of
Quadratic-Programming and Max-Cut-Gain.
     Making these definitions requires some preparation. Let (X, µ) be any probability space. The three
cases of interest to us are X = {1, . . . , N} with µ the uniform distribution, X = {−1, 1}^n with µ the
uniform distribution, and X = R^n with µ = γ, the n-dimensional Gaussian probability distribution. We
consider the set of functions f : X → R as an inner product space, with

                                     ⟨f, g⟩ = E_{x∼µ}[f(x)g(x)] ,

We will sometimes write ∫_X F(x) dµ(x) in place of E_{x∼µ}[F(x)]. We also write ‖f‖_2 = √⟨f, f⟩.
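For the Gaussian case this inner product can be estimated by plain Monte Carlo sampling. The sketch below is our own helper (one-dimensional for simplicity) and only illustrates the definition:

```python
import random

def inner_product(f, g, sample, trials=200_000):
    # <f, g> = E_{x ~ mu}[f(x) g(x)], estimated by averaging over draws from mu,
    # where `sample` draws one point from the measure mu.
    return sum(f(x) * g(x) for x in (sample() for _ in range(trials))) / trials

# Example: on (R, gamma), the function f(x) = x has ||f||_2^2 = E[x^2] = 1.
```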

Definition 2.7. An instance of the probabilistic version of Quadratic-Programming is a (bounded)
linear operator A on the above inner product space, and the associated problem is to determine

                                     sup_{f : X→[−1,1]}  ⟨f, A f⟩ .                                   (2.4)


     Note that in the case of X = {1, . . . , N} this is the same as Quadratic-Programming as originally
defined, after scaling (the matrix) A by a factor of N.


Definition 2.8. The probabilistic version of K_{N,N}-Quadratic-Programming is to determine

                                     sup_{f,g : X→[−1,1]}  ⟨f, Ag⟩ .

    Finally, to define the probabilistic version of Max-Cut-Gain, we require the following:

Definition 2.9. A probability operator on X is any linear operator B satisfying the following two
conditions: B f ≥ 0 for all f ≥ 0, and ⟨1, B1⟩ = 1.

    In the simple case of X = {1, . . . , N}, a probability operator is precisely a nonnegative matrix with
entries summing to 1. We therefore make the following definition:

Definition 2.10. The probabilistic version of Max-Cut-Gain is defined to be the special case of
Quadratic-Programming in which A is the negative of a probability operator.

    To define the probabilistic version of the SDP for Quadratic-Programming, we need to introduce a
few more notions. First, we extend the inner product space defined above to functions f : X → R^d by
defining
                                     ⟨f, g⟩ = E_{x∼µ}[⟨f(x), g(x)⟩] ,

where the ⟨·, ·⟩ inside the expectation is just the usual inner product in R^d. Next:

Definition 2.11. Given a linear operator A on functions X → R, we say its component-wise version is the
linear operator A^{(d)} on functions f : X → R^d given by applying A component-wise on the d coordinate
functions of f .

    We now can define the probabilistic version of the SDP:

Definition 2.12. The probabilistic d-dimensional SDP relaxation of (2.4) is defined to be

                                     sup_{f : X→B_d}  ⟨f, A^{(d)} f⟩ .                                (2.5)

We make the analogous definitions also for the probabilistic SDP relaxations of Max-Cut-Gain and
K_{N,N}-Quadratic-Programming. When X = R^n with the Gaussian measure and d is not mentioned, we
assume d = n.

     Regarding the possibility of an infinite domain X, the reader can be assured of the following facts:
First, for any constants κ and κ′, a probabilistic SDP gap of κ vs. κ′ in the Gaussian space setting for any
of our problems implies the same gap in the usual {1, . . . , N} setup for sufficiently large N, by a limiting
argument. Furthermore: (a) our Long Code test and UGC-hardness result for Max-Cut-Gain take place
entirely in the finite setting of X = {−1, 1}^n, using the Gaussian space setting only for motivation; (b) our
third theorem on “randomness reduction” shows how to effectively convert an ε vs. O(ε/ log(1/ε)) SDP
gap in Gaussian space into one with the same parameters over {1, . . . , N} with N = poly(1/ε). We stress
that the extension to the Gaussian space framework is made because this is the setting in which our SDP
gaps naturally occur.


2.3   A slight annoyance: zeros on the diagonal
In this section we consider a minor technical point—the issue of “self-loops,” or “zeros on the diagonal.”
    As noted, our definition of Quadratic-Programming, Definition 2.1, is slightly different from the
definition of the problem used by [12, 5, 2]. In their definition, the variables are maximized only over
{−1, 1}, and the diagonal entries of the matrix A must be zero. Let us call this version of the problem
“Restricted-Quadratic-Programming.”
    Let us compare the two definitions. First, note that if A has zero diagonal entries then the definitions
are completely equivalent: given optimal (y∗i ) for our problem, rounding them randomly as in the Feige–
Langberg algorithm yields a solution in {−1, 1} that is equally good in expectation, and thus there exist
maximizers in {−1, 1} with the same value.
    On the other hand, if nonzero diagonal entries are allowed then the maximum over yi ∈ {−1, 1} can
be negative, leading to trouble in defining approximation guarantees. However with our definition the
solution yi ≡ 0 is always allowed, showing that the maximum is always nonnegative. We believe this
makes our definition more mathematically natural.
    In any case, the definitions are essentially equivalent for all intents and purposes, as the following
proposition (whose trick appears implicitly in [5]) shows:
Proposition 2.13. There is an efficient reduction mapping Quadratic-Programming instances to
Restricted-Quadratic-Programming instances on M² times as many variables under which the SDP
value and the actual value change by at most an additive O(1/M).
Proof. Suppose we have an instance of Quadratic-Programming, with our convention that ∑ |ai j | = 1
(which is without loss of generality). Replace each variable i with M copies of itself. Split the old
weights ai j into M² new weights in the natural way: put weight ai j /M² between each new i-variable and
new j-variable when i ≠ j; and put weight 2aii /M² between each pair of distinct i-variables and weight
aii /M² on each i-variable’s self-loop. This gives a new instance of Quadratic-Programming and it is
easy to see that it has the same value and SDP value as the old one. Indeed, any solution (yi ) in the old
instance can be duplicated into an equal-value solution in the new instance, and any solution (yi ) in the
new instance can be averaged into an equal-value solution in the old instance (SDP value or real value).
But note that if the total weight on the diagonal, ∑ |aii |, was α in the old instance, the new instance has
only total weight α/M ≤ 1/M on its diagonal. If we simply delete this diagonal weight (and rescale
slightly) we get an instance of Restricted-Quadratic-Programming, and this deletion changes the value
of any solution by at most an additive O(1/M).
   From this, we conclude that the Restricted-Quadratic-Programming and Quadratic-Programming
problems have identical SDP gaps and also the same polynomial-time approximability up to arbitrarily
small inverse-polynomial additive factors.


3     Long Code tests and Fourier analysis
3.1   Fourier and Hermite expansions
We begin by defining the notions we need for Fourier analysis of Boolean functions and Hermite analysis
of functions on Gaussian space.


Definition 3.1. Recall the inner product space on functions f : {−1, 1}n → R defined in Section 2.2. Un-
der this definition, the set of functions (χS )S⊆[n] defined by χS (x1 , . . . , xn ) = ∏i∈S xi forms an orthonormal
basis, and thus we have the Fourier expansion of any f : {−1, 1}n → R,

                                                   f = ∑ fˆ(S)χS ,
                                                         S⊆[n]

where fˆ(S) = E[ f χS ]. The functions χi = ±χ{i} are called the dictator functions or Long Codes. We
also have the Plancherel identity,
                                         h f , gi = ∑ fˆ(S)ĝ(S) .
                                                          S⊆[n]

We define the projection to level k operator Pk (for 0 ≤ k ≤ n) and the Bonami-Beckner operator Tρ (for
0 ≤ ρ ≤ 1) by
                                  Pk f = ∑_{|S|=k} fˆ(S) χS ,        Tρ f = ∑_{k=0}^{n} ρ^k Pk f .

The Bonami-Beckner operator also acts as (Tρ f )(x) = Ey [ f (y)], where the string y is formed by letting
yi = xi with probability ρ and letting yi be uniformly random otherwise, independently for each i. Thus
the Bonami-Beckner operator is a probability operator.
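The agreement of the Fourier and probabilistic descriptions of Tρ can be checked exactly on a tiny cube (a brute-force pure-Python sketch; the test function f below is arbitrary):

```python
# Exact check (pure Python, n = 3) that the Fourier and probabilistic forms
# of the Bonami-Beckner operator agree; f below is an arbitrary test function.
from itertools import product

n, rho = 3, 0.6
cube = list(product([-1, 1], repeat=n))
f = {x: x[0] * x[1] + 0.5 * x[2] for x in cube}

def chi(S, x):                       # chi_S(x) = prod_{i in S} x_i
    out = 1
    for i in S:
        out *= x[i]
    return out

subsets = [tuple(i for i in range(n) if m >> i & 1) for m in range(2**n)]
fhat = {S: sum(f[x] * chi(S, x) for x in cube) / 2**n for S in subsets}

def T_fourier(x):                    # T_rho f = sum_S rho^|S| fhat(S) chi_S
    return sum(rho**len(S) * fhat[S] * chi(S, x) for S in subsets)

def T_noise(x):                      # (T_rho f)(x) = E_y[f(y)], y rho-correlated with x
    total = 0.0
    for y in cube:
        p = 1.0
        for xi, yi in zip(x, y):     # y_i = x_i w.p. rho, else uniformly random
            p *= (rho + (1 - rho) / 2) if yi == xi else (1 - rho) / 2
        total += p * f[y]
    return total

assert all(abs(T_fourier(x) - T_noise(x)) < 1e-9 for x in cube)
```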
Definition 3.2. Recall the inner product space on functions f : (X, γ) → Rd defined in Section 2.2,
where X = Rn and γ is the n-dimensional Gaussian probability distribution. Under this definition there
is a complete set of orthonormal polynomials Rn → R called the Hermite polynomials, (Hk )k∈Nn (see,
e. g., the book of Ledoux and Talagrand [35]). We write |k| = k1 + · · · + kn . Any function f : Rn → Bd
has a Hermite expansion,
                                          f = ∑ fˆ(k)Hk ,
                                                         k∈Nn

where the Hermite coefficients fˆ(k) = ∫_{Rn} f (x) Hk (x) dγ(x) are in Bd . We have the Plancherel identity


                                           h f , gi = ∑ h fˆ(k), ĝ(k)i .
                                                         k∈Nn

We define the projection to level k operator Pk (for k ∈ N) and the Ornstein-Uhlenbeck operator Tρ (for
0 ≤ ρ ≤ 1) by
                                 Pk f = ∑_{|k|=k} fˆ(k) Hk ,        Tρ f = ∑_{k≥0} ρ^k Pk f .

As H0 (x) = 1 and H_{e_i}(x) = xi , we have

                               P0 f = E[ f ] ,      (P1 f )(x) = ∫_{Rn} ⟨x, y⟩ f (y) dγ(y) .

We also have that
                                  (Tρ f )(x) = ∫_{Rn} f (ρx + √(1 − ρ²) y) dγ(y) ,
so the Ornstein-Uhlenbeck operator is a probability operator. From these facts we also see that P0 , P1 ,
and Tρ are component-wise operators.
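The probabilistic form of the Ornstein-Uhlenbeck operator can be sanity-checked by Monte Carlo (a pure-Python sketch with an illustrative one-dimensional test function; the closed form on the right follows from E[(a + bY)²] = a² + b² for Y ∼ N(0,1)):

```python
# Monte Carlo sanity check of (T_rho f)(x) = E_y[f(rho*x + sqrt(1-rho^2)*y)]
# for the test function f(z) = z^2, where the exact value is
# rho^2 * x^2 + (1 - rho^2).
import math, random

random.seed(0)
rho, x1 = 0.7, 1.5
f = lambda z: z**2

samples = 100000
acc = 0.0
for _ in range(samples):
    y1 = random.gauss(0.0, 1.0)
    acc += f(rho * x1 + math.sqrt(1 - rho**2) * y1)
estimate = acc / samples

exact = rho**2 * x1**2 + (1 - rho**2)   # E[(a + bY)^2] = a^2 + b^2 for Y ~ N(0,1)
assert abs(estimate - exact) < 0.05
```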



3.2     Long Code tests
Our second main theorem in this paper is a hardness result for Max-Cut-Gain. Being a “2-variable
constraint satisfaction problem,” it seems very difficult to prove a sharp inapproximability NP-hardness
result for it using current techniques (see Khot [27]). Thus we prove a UGC-hardness result; i. e., we
reduce the “Unique Label Cover” problem to that of approximating Max-Cut-Gain. The paper [28]
provides a relatively clean template for such reductions in its UGC-hardness result for Max-Cut; the
reduction is through the use of gadgets known as “Long Code tests.” (Such reductions originate in the
work of Bellare, Goldreich, and Sudan [8] and were developed significantly by the work of Håstad [24].)
    Long Code tests are usually defined as property testing algorithms for testing whether a given func-
tion f : {−1, 1}n → {−1, 1} is correlated to a “Long Code” or “dictator”—i. e., ±χi for some i. For
our purposes, however, we can think of a Long Code test as something like an instance A demonstrating
a probabilistic SDP integrality gap in the setting X = {−1, 1}n . The difference is that we don’t com-
pare h f , A f i for functions f : {−1, 1}n → Bd and f : {−1, 1}n → [−1, 1]. Instead we compare dictator
functions f = ±χi versus functions f : {−1, 1}n → [−1, 1] that are far from being dictators. Note that a
function f is far from being a dictator if | fˆ(i)| is small for all i. We make the following definition:

Definition 3.3. In the context of Quadratic-Programming and its subproblems, a linear operator A on
functions f : {−1, 1}n → [−1, 1] is said to be a Long Code test with completeness c and soundness s if:

                                          hχi , Aχi i ≥ c         for all i,

                                 and         sup          h f , A f i ≤ s + δ (ε) ,                  (3.1)
                                       f :{−1,1}n →[−1,1]
                                           ∀i | fˆ(i)|≤ε

where δ (ε) is a function such that δ (ε) → 0 when ε → 0.

    As a rule of thumb, given a c vs. s Long Code test for a 2-query constraint satisfaction problem, the
techniques in [28] tend to yield UGC-hardness of finding solutions of value at least s + δ on inputs with
optimum value at least c − δ , for every δ > 0. We will see that this applies in our case of Max-Cut-Gain,
and thus to get our sharp ε vs. O(ε/ log(1/ε)) UGC-hardness for approximating Max-Cut-Gain it will
suffice for us to construct a Long Code test with these parameters.
    Indeed, the methodology of [28] can even work in a more relaxed setting, where instead of show-
ing (3.1), one only needs to show

                                          sup          ⟨ f , A f ⟩ ≤ s + δ (ε) ,
                                   f : {−1,1}^n → [−1,1]
                                   ∀i  Inf_i^{≤k(δ )}( f ) ≤ ε

where
                                   Inf_i^{≤k}( f ) :=  ∑_{|S|≤k, S∋i}  fˆ(S)²

and k(δ ) < ∞ is a function independent of n. We will use this relaxation in our UGC-hardness result for
KN,N -Quadratic-Programming.
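The low-degree influence just defined can be computed by brute force on a small cube (an illustrative pure-Python sketch; the test function is ours, not from the paper):

```python
# Brute-force computation of Inf_i^{<=k}(f): the sum of fhat(S)^2 over sets S
# with |S| <= k that contain i. The test function f is an arbitrary example.
from itertools import product

n = 3
cube = list(product([-1, 1], repeat=n))
f = {x: max(x[0], x[1] * x[2]) for x in cube}       # a {-1,1}-valued example

def chi(S, x):                                      # chi_S(x) = prod_{i in S} x_i
    out = 1
    for i in S:
        out *= x[i]
    return out

subsets = [tuple(i for i in range(n) if m >> i & 1) for m in range(2**n)]
fhat = {S: sum(f[x] * chi(S, x) for x in cube) / 2**n for S in subsets}

def low_degree_influence(i, k):
    return sum(fhat[S]**2 for S in subsets if len(S) <= k and i in S)

print(low_degree_influence(0, 1))    # fhat({0})^2 = 0.25 for this f
```

For a {−1, 1}-valued f, Parseval gives ∑_S fˆ(S)² = 1, so every Inf_i^{≤k}(f) lies in [0, 1].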


3.3   The Davie–Reeds operator
We now define a linear operator that plays a crucial role in all of our results:

Definition 3.4. For each 0 < λ < 1, define the Davie–Reeds operator DRλ on functions f : (X, µ) → Rd
by DRλ = P1 − λ · id. This definition makes sense both for X = {−1, 1}n and µ the uniform distribution,
and for X = Rn and µ = γ, the n-dimensional Gaussian distribution. Indeed, as we will see in Section 5.1,
this definition makes sense in an even more general context.

    Davie [17] and Reeds [43] used this operator (with a suitable value of λ ) to give the best known lower
bound on the integrality gap of the KN,N -Quadratic-Programming SDP (2.1) (i. e., the Grothendieck
constant). See Section 7 for further details.
    Although a Davie–Reeds operator is not a negative probability operator, it can be closely related
to one; we will use it to give our SDP gaps for Max-Cut-Gain and also our Long Code test for this
problem. Our “randomness reduction” will also be for Quadratic-Programming instances with Davie–
Reeds operators, and our UGC-hardness result for KN,N -Quadratic-Programming relies on the Davie–
Reeds SDP.
    Note that in general, id = ∑k≥0 Pk . Thus we can also write the Davie–Reeds operator DRλ as

                              DRλ = −λ P0 + (1 − λ )P1 − λ P2 − λ P3 − · · · .

Intuitively, the Davie–Reeds operator mostly keeps the “linear” part of a function and also “negatively
highlights” the nonlinear parts of a function. Thus it makes sense that it plays a useful role in Long Code
tests, where the functions to be distinguished, dictator functions ±χ, are precisely the linear {−1, 1}-
valued functions.
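The level-by-level expansion of DRλ can be verified pointwise on a small cube (a pure-Python sketch; the test function is illustrative): compute DRλ f = P1 f − λ f directly and compare with reweighting the Fourier levels of f by (−λ, 1 − λ, −λ, . . .).

```python
# Pointwise check (pure Python, n = 3) of the expansion
# DR_lambda = -lambda*P0 + (1 - lambda)*P1 - lambda*P2 - ... :
# compute DR_lambda f = P1 f - lambda*f directly and via level reweighting.
from itertools import product

n, lam = 3, 0.25
cube = list(product([-1, 1], repeat=n))
f = {x: 0.3 * x[0] + 0.5 * x[0] * x[1] * x[2] - 0.2 for x in cube}

def chi(S, x):
    out = 1
    for i in S:
        out *= x[i]
    return out

subsets = [tuple(i for i in range(n) if m >> i & 1) for m in range(2**n)]
fhat = {S: sum(f[x] * chi(S, x) for x in cube) / 2**n for S in subsets}

def DR_direct(x):                    # definition: DR_lambda f = P1 f - lambda * f
    return sum(fhat[(i,)] * x[i] for i in range(n)) - lam * f[x]

def DR_levels(x):                    # weight level |S| by 1-lambda (|S|=1) or -lambda
    return sum(((1 - lam) if len(S) == 1 else -lam) * fhat[S] * chi(S, x)
               for S in subsets)

assert all(abs(DR_direct(x) - DR_levels(x)) < 1e-9 for x in cube)
```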


4     Formal statements of our results
In this section we formally state our main results. They illustrate the main theme of the paper—natural
SDP gaps on Gaussian space yielding Long Code tests yielding UGC-hardness results.
    First, an SDP gap on Gaussian space for a Max-Cut-Gain operator:

Theorem 4.1. Let ε > 0 and set d = poly(1/ε). Let A denote either the Davie–Reeds operator DR1−ε
or the negative probability operator Aε defined in Section 5.4. Then we have the following SDP gap on
d-dimensional Gaussian space:
                                          sup h f , A f i ≥ Ω(ε) ,
                                          f :(Rd ,γ)→Bd

                               and         sup       h f , A f i ≤ O(ε/ log(1/ε)) .
                                     f :(Rd ,γ)→[−1,1]

    For negative probability operators—i. e., those operators used in the Max-Cut-Gain problem—this
tradeoff is optimal, by the SDP rounding algorithm of Charikar and Wirth [12]. The key part of this the-
orem is the study of h f , DR1−ε f i for functions f : (Rd , γ) → [−1, 1]. Interestingly, our analysis suggests
that the optimal functions f should be close to the form f (x) = roundT (hx, ri) for some vector r, where


roundT is as in the Feige–Langberg rounding algorithm (1.3). This would corroborate a suggestion made
in [19], that such functions seem to be close to optimal for rounding the SDP.
    We next show that this gap can be translated into an equivalent Long Code test.

Theorem 4.2. There exist universal constants C, c > 0 such that the following holds. Let ε > 0 and
assume n ≥ C log(1/ε). Let A be as in Theorem 4.1. Then we have the following Long Code test on
{−1, 1}n :
                                hχi , Aχi i ≥ Ω(ε) for all i ∈ [n] ,
                             and           sup           h f , A f i ≤ O(ε/ log(1/ε)) .
                                    f :{−1,1}n →[−1,1]
                                   ∀i | fˆ(i)|≤c/ log(1/ε)

    Using the Unique Label Cover reduction from [28], we easily obtain the following:

Theorem 4.3. Assume UGC holds. Then for all (sufficiently small) constants ε > 0, it is NP-hard to dis-
tinguish Max-Cut-Gain instances with gain at least ε from instances with gain at most O(ε/ log(1/ε)).

    Again, because of [12] this result is optimal up to constant factors. It also closes all qualitative
aspects of the hardness of Max-Cut subject to UGC: from [28] we have hardness of 1 − ε versus
1 − Θ(√ε) at the “high end” and also .878 factor hardness (both sharp due to [20]); Theorem 4.3 gives
the sharp “low end” tradeoff, 1/2 + ε versus 1/2 + O(ε/ log(1/ε)).
    Regarding Quadratic-Programming, we show that we can perform “randomness reduction” on the
SDP gap of Theorem 4.1 for DR1−ε on Gaussian space; i. e., this SDP gap can be highly efficiently
“discretized.” In particular, we can discretize to poly(d) = poly(1/ε) many points and maintain the gap.

Theorem 4.4. Let ε > 0 be sufficiently small, let d = 1/ε 3 and N = Θ(d 7 ). Let G be a set of N points
drawn at random from d-dimensional Gaussian space and think of G as having the uniform probability
distribution. Then for almost all choices of G (specifically, with probability at least 1 − 1/N), we obtain
an SDP gap on G with factor Ω(log(1/ε)) = Ω(log N):

                                            sup h f , DR1−ε f i ≥ Ω(ε) ,
                                          f :G→Bd

                            and           sup      h f , DR1−ε f i ≤ O(ε/ log(1/ε)) .
                                      f :G→[−1,1]

                                                                                          (d)
(Here we are using our general definition of the Davie–Reeds operator on arbitrary L2 (Rn , µ) spaces;
see Section 5.1 for more details.)
    This SDP gap for Quadratic-Programming is optimal up to constant factors by the O(log N)-
approximation algorithm of [12]. The fact that Quadratic-Programming has an SDP gap of Ω(log N)
was first established by Alon, Makarychev, Makarychev, and Naor in [2]. However the argument used
there was completely nonconstructive; they used duality to argue that an Ω(log N) gap exists without
explicitly giving any instance. They left the problem of describing an instance achieving the gap as an
open problem. Arora, Berger, Hazan, Kindler, and Safra [5] gave a somewhat complicated construction
achieving gap Ω((log N)/(log log N)). We believe our construction essentially resolves the problem by
providing a natural and more or less explicit instance with gap Ω(log N).


   Finally, we describe our hardness result for the KN,N -Quadratic-Programming problem. Here we
show that the Davie–Reeds lower bound for Grothendieck’s constant can be translated into a “two-
function Long Code test” on {−1, 1}n :
Theorem 4.5. For all 0 < λ < .4, d ≥ 1, and δ > 0,

                                 sup           h f , DRλ gi = hχi , DRλ χi i = 1 − λ   for all i ∈ [n] ,
                            f ,g:{−1,1}n →Bd

                 and           sup             h f , DRλ gi ≤ s(λ ) + O(δ ) .
                        f ,g:{−1,1}n →{−1,1}
                              ∀i |ĝ(i)|≤δ

    Here s(λ ) is SDP “soundness” arising in the Davie–Reeds proof; see Section 7 for more details. This
Long Code test is not yet sufficient for a UGC-hardness result because it only shows that if f and g pass
the test with probability significantly more than s(λ ) then each has a coordinate with large low-degree
influence. However we need f and g to share such a coordinate. This technical difficulty can be over-
come, however, with a novel application of Green’s “Szemerédi Regularity Lemma for (Z/2Z)n ” [21].
Having done this, we get a UGC-hardness result for KN,N -Quadratic-Programming:
Theorem 4.6. Assume UGC holds. Then for every constant ε > 0 it is NP-hard to approximate the
KN,N -Quadratic-Programming problem to within factor 1/KDavie–Reeds + ε ≈ .597.
   This result complements the SDP rounding algorithm of Alon and Naor [3] which uses Krivine’s
upper bound on Grothendieck’s constant to give a 1/KKrivine ≈ .561 approximation for KN,N -Quadratic-
Programming. The result also improves significantly on the 12/13 ≈ .923 NP-hardness result for KN,N -
Quadratic-Programming given in [3], albeit only under UGC.


5      Quadratic-Programming and Max-Cut-Gain
5.1     Elementary identities
In this section we give some elementary identities regarding the operators P1 and DR1−ε . Most of the
identities hold completely formally from the definitions and work in the setting of functions Rn → Rd .
Let us give some general definitions:
Definition 5.1. Let µ be any probability distribution on Rn , discrete or continuous. We write E[·] for
                                                                                     (d)
integration with respect to µ. For functions f : Rn → Rd in the inner product space L2 (Rn , µ) we define
the following:
      • The ith “orthogonal coefficient” fˆ(i) ∈ Rd , for 1 ≤ i ≤ n, via

                                                        fˆ(i) = E[yi f (y)] .
                                                                y


      • The operator P1 via
                                         (P1 f )(x) = E_y [⟨x, y⟩ f (y)] = ∑_{i=1}^{n} fˆ(i) xi .



      • The “projection deficiency” operator ∆1 via

                                                             ∆1 = P1² − P1 .

      • The Davie–Reeds operator DR1−ε via

                                                    DR1−ε = P1 − (1 − ε) · id .

      The following identities hold immediately from expanding the definitions:
                                (d)
Lemma 5.2. For any f in L2 (Rn , µ),
                                          n
                         h f , P1 f i = ∑ k fˆ(i)k22                                                  (5.1)
                                         i=1
                                      = k f k22 − kP1 f − f k22 + h∆1 f , f i                         (5.2)

                                          n
                    h f , DR1−ε f i = ∑ k fˆ(i)k22 − (1 − ε)k f k22                                   (5.3)
                                         i=1
                                            n
                                      = ε ∑ k fˆ(i)k22 − (1 − ε)kP1 f − f k22 + (1 − ε)h∆1 f , f i    (5.4)
                                           i=1


                                           n
                        h∆1 f , f i = ∑ (E[xi x j ] − δi j ) · h fˆ(i), fˆ( j)i .                     (5.5)
                                         i, j=1

(Here δi j is the Kronecker delta.)
    Our main interest is in the case that µ is either the n-dimensional Gaussian distribution or the uni-
form distribution on {−1, 1}n . (The general case is only necessary for the Gaussian SDP discretization
argument in Section 6.) In these two cases we have E[xi x j ] = δi j ; hence h∆1 f , f i = 0 for every f and
P1 is indeed a projection operator (P1² = P1 ). This implies all the familiar identities from Fourier and
Hermite analysis; e. g., h f , P1 f i = kP1 f k22 .
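These identities are easy to spot-check numerically in the cube case (a pure-Python sketch with d = 1, the uniform measure on {−1, 1}³, and a random bounded test function):

```python
# Numeric spot-check (d = 1, uniform measure on {-1,1}^3, random bounded f)
# of <f, P1 f> = sum_i fhat(i)^2 and
# <f, DR_{1-eps} f> = sum_i fhat(i)^2 - (1 - eps) * ||f||_2^2.
import random
from itertools import product

random.seed(1)
n, eps = 3, 0.1
cube = list(product([-1, 1], repeat=n))
f = {x: random.uniform(-1, 1) for x in cube}

E = lambda g: sum(g(x) for x in cube) / 2**n            # expectation over uniform x
fhat = [E(lambda x, i=i: x[i] * f[x]) for i in range(n)]
P1f = lambda x: sum(fhat[i] * x[i] for i in range(n))
DRf = lambda x: P1f(x) - (1 - eps) * f[x]

assert abs(E(lambda x: f[x] * P1f(x)) - sum(c**2 for c in fhat)) < 1e-9
norm2 = E(lambda x: f[x]**2)
assert abs(E(lambda x: f[x] * DRf(x))
           - (sum(c**2 for c in fhat) - (1 - eps) * norm2)) < 1e-9
```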

5.2     SDP gap for DR1−ε on Gaussian space
In this section we show an ε versus O(ε/ log(1/ε)) SDP gap for the Davie–Reeds operator DR1−ε on
Gaussian space. This yields the A = DR1−ε part of Theorem 4.1.
Theorem 5.3. There is a universal constant C such that for every ε > 0 and d ≥ 1 we have

                                                sup       h f , DR1−ε f i ≥ ε −C/d ,                  (5.6)
                                          f :(Rd ,γ)→Bd

                          and                 sup         h f , DR1−ε f i ≤ C ε/ log(1/ε) .           (5.7)
                                      f :(Rd ,γ)→[−1,1]




Proof. To prove (5.6), take f : (Rd , γ) → Bd to be the function f (z) = z/∥z∥2 . In this case,

                 fˆ(i) = ∫_{Rd} yi (y/∥y∥2 ) dγ(y) ,     and hence     fˆ(i) j = ∫_{Rd} (yi y j /∥y∥2 ) dγ(y) .

It is easy to see that for j 6= i we have fˆ(i) j = 0 (since the density function of y j is even, say). Thus
                 ∥ fˆ(i)∥2 = ∫_{Rd} (yi²/∥y∥2 ) dγ(y) ,     and hence     ∑_{i=1}^{d} ∥ fˆ(i)∥2 = ∫_{Rd} ∥y∥2 dγ(y) .

It is well known that the expected length of a random d-dimensional Gaussian is at least √d − C/√d ,
for some universal constant C (this also follows from Lemma 6.2). Thus we have
                                  ∑_{i=1}^{d} ∥ fˆ(i)∥2 ≥ √d − C/√d .

But all summands here are in fact the same, by symmetry; thus
                         ⟨ f , P1 f ⟩ = ∑_{i=1}^{d} ∥ fˆ(i)∥2² ≥ (√d − C/√d)²/d ≥ 1 − 2C/d ,

where we used (5.1). Substituting this into (5.3) and using k f k2 ≤ 1 we get

                                                   ⟨ f , DR1−ε f ⟩ ≥ ε − 2C/d

for our particular f , thus proving the “completeness” (5.6).
    We now prove the “soundness” (5.7) using identity (5.4); note that the h∆1 f , f i term is 0 in our
Gaussian setting. Given any f : (Rd , γ) → [−1, 1], the orthogonal coefficients fˆ(i) are just d real numbers
and (P1 f )(x) is just the linear form ∑di=1 fˆ(i)xi with the xi ’s being independent standard one-dimensional
Gaussians. Thus P1 f is distributed as a mean-zero one-dimensional Gaussian with variance equal to
σ 2 = ∑di=1 fˆ(i)2 . Note that σ 2 ≤ k f k22 ≤ 1, by (5.2). Using (5.4), we have

                                     h f , DR1−ε f i = εσ 2 − (1 − ε)kP1 f − f k22 .                            (5.8)

Now whenever |P1 f | > 2 there is a contribution of at least 1 to kP1 f − f k22 , simply because f ’s values
are in [−1, 1]. Hence by the “heavy tail” property of mean-zero one-dimensional Gaussians, i. e.,

                                          Pr[|P1 f | > 2] ≥ exp(−C · 22 /σ 2 )

for some universal constant C, we have

                                    h f , DR1−ε f i ≤ εσ 2 − (1 − ε) exp(−4C/σ 2 ) .

If σ 2 ≥ C0 / log(1/ε), where C0 is a sufficiently large constant, then (1 − ε) exp(−4C/σ 2 ) ≥ 2ε; since
σ 2 ≤ 1 this would make h f , DR1−ε f i negative. Thus we can assume σ 2 ≤ C0 / log(1/ε). But this means
h f , DR1−ε f i ≤ C0 ε/ log(1/ε) and so (5.7) is proved.
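The completeness computation above can be sanity-checked by Monte Carlo (a pure-Python sketch; by symmetry, ⟨f, P1 f⟩ = (E∥y∥)²/d for f(z) = z/∥z∥2, so the gain is (E∥y∥)²/d − (1 − ε)):

```python
# Monte Carlo sketch of the completeness bound: for f(z) = z/||z||_2 on
# d-dimensional Gaussian space, <f, P1 f> = (E||y||)^2 / d, so the gain
# <f, DR_{1-eps} f> = (E||y||)^2/d - (1 - eps) is eps - O(1/d).
import math, random

random.seed(0)
d, eps, samples = 100, 0.1, 5000

acc = 0.0
for _ in range(samples):
    acc += math.sqrt(sum(random.gauss(0.0, 1.0)**2 for _ in range(d)))
mean_norm = acc / samples          # estimate of E||y||, roughly sqrt(d)

gain = mean_norm**2 / d - (1 - eps)
assert eps - 2.0 / d < gain < eps + 0.01
```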


    Let us make a remark about the analysis of the soundness case. Our upper bound worked by first
considering the linear part of f to be fixed; say, P1 f (x) = hx, ri for some vector r with krk22 = σ 2 . It then
considered the “worst possible” values f could have in (5.8), assuming that these values could be chosen
arbitrarily in [−1, 1]. Of course they can’t be, since P1 f was fixed. But if they could, it’s clear from (5.8)
that it’s worst if f (x) equals P1 f (x) whenever this value is in [−1, 1], and equals sgn(P1 f (x)) otherwise.
Then optimizing σ 2 to Θ(1/ log(1/ε)), we conclude that the worst functions “for our analysis” are of
the form
                                             f (z) = roundT (hz, ri) ,
where T = Θ(√log(1/ε)). Interestingly, this precisely mimics the SDP rounding algorithm for
Quadratic-Programming/Max-Cut-Gain suggested by Feige and Langberg [19] and analyzed by
Charikar and Wirth [12].
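For concreteness, here is a sketch of the truncation shape just described (our normalization, which may differ from (1.3); the randomized ±1 step is the standard rounding that preserves expectations):

```python
# Illustrative truncation rounding: round_T clips t/T to [-1, 1]; the
# randomized step maps v in [-1, 1] to a +/-1 value with expectation exactly v.
import random

def round_T(t, T):
    return max(-1.0, min(1.0, t / T))

def randomized_sign(v, rng=random):
    return 1 if rng.random() < (1 + v) / 2 else -1

print(round_T(5.0, 2.0), round_T(-5.0, 2.0), round_T(1.0, 2.0))   # 1.0 -1.0 0.5
```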

5.3   Long Code test with DR1−ε on the discrete cube
When DR1−ε operates on functions on the discrete cube {−1, 1}n rather than functions on Gaussian
space, it turns out that the “dictator” functions ±χi become optimal for h f , DR1−ε f i. Thus the com-
pleteness bound (5.6) becomes simply ε even for d = 1, and therefore the soundness bound (5.7) no
longer holds. However the soundness bound holds if we restrict to functions which have all the quantities
fˆ(i)² small enough. (Note that fˆ(i)² = Inf_i^{≤1}( f ).) The following gives the A = DR1−ε part of
Theorem 4.2.

Theorem 5.4. There is a universal constant C such that for every 0 < ε < 1/2 and d ≥ 1 we have

                                  sup         h f , DR1−ε f i = hχi , DR1−ε χi i = ε       for all i,      (5.9)
                            f :{−1,1}n →Bd


                  and            sup            h f , DR1−ε f i ≤ C ε/ log(1/ε) .                         (5.10)
                          f :{−1,1}n →[−1,1]
                        ∀i | fˆ(i)| ≤ 1/(C log(1/ε))

Proof. The proof of (5.9) is trivial: By identity (5.4) (with the h∆1 f , f i term dropping out) we see that
h f , DR1−ε f i is always at most ε. Since dictators achieve ε, they are indeed optimal. As for (5.10), the
proof is similar to the Gaussian case, except now P1 f is a Rademacher average, P1 f = ∑ni=1 fˆ(i)xi with
the xi independent uniform ±1. Nevertheless the heavy tail property still holds assuming all | fˆ(i)| are
small enough: the following result appears in, e. g., the book of Ledoux–Talagrand [35, p. 92]: There is
a universal constant C such that if ∑ni=1 αi2 = σ 2 , t ≥ σ , and |αi | ≤ σ 2 /Ct for all i, then

                                  Pr_{x∈{−1,1}^n} [ ∑_{i=1}^{n} αi xi ≥ t ] ≥ exp(−Ct²/σ²) .

The result (5.10) now follows by the same argument as in the Gaussian case.

    We now have a “Long Code test” for Quadratic-Programming using DR1−ε very much like the one
for Max-Cut used in [28]. However DR1−ε is not a Max-Cut-Gain operator, which we would prefer.
As we will see in the next section, however, it can be replaced by one.


5.4   The relation of DR1−ε to Max-Cut-Gain
In this section we show that DR1−ε can be replaced by a negative probability operator with essentially
the same properties. This gives us an SDP gap and Long Code test for Max-Cut-Gain with essentially
the same soundness and completeness as in Theorems 5.3 and 5.4, completing the proof of Theorems 4.1
and 4.2.

Theorem 5.5. Let X denote either L2 (Rn , γ) or L2 ({−1, 1}n ). Then for each 0 < ε < 1/2 there is a
negative probability operator Aε on X such that for all d ≥ 1,

                 sup_{ f : X→Bd} ⟨ f , DR1−ε f ⟩  ≤  sup_{ f : X→Bd} ⟨ f , Aε f ⟩  ≤  (1/4) · sup_{ f : X→Bd} ⟨ f , DR1−4ε f ⟩ .       (5.11)

Proof. We let Aε = −(pT1 + (1 − p)T−1/2 ), where Tρ denotes the Bonami-Beckner operator in the dis-
crete cube case and the Ornstein-Uhlenbeck operator in the Gaussian case; and, where p is chosen so
that the projection function g(y) = y1 satisfies ⟨g, Aε g⟩ = ε; specifically, p = 1/3 − (2/3)ε. Since the Bonami-
Beckner and Ornstein-Uhlenbeck operators are probability operators, Aε is clearly a negative probability
operator.
    Let us now calculate h f , Aε f i. We write f in terms of its Fourier or Hermite expansion, f = ∑k≥0 Pk f ,
and recall that Tρ Pk f = ρ k Pk f . Hence

                                         ⟨ f , Aε f ⟩ = ∑_{d≥0} cd ∥Pd f ∥2² ,


where cd = −(p · 1^d + (1 − p) · (−1/2)^d ). It’s not very difficult to check that c1 = ε and that

                                         −1 ≤ cd ≤ c3 = −(1/4 − (3/4)ε)

for all d 6= 1. Hence

                 ε∥P1 f ∥2² − ∥ f − P1 f ∥2² ≤ ⟨ f , Aε f ⟩ ≤ ε∥P1 f ∥2² − (1/4 − ε)∥ f − P1 f ∥2² .

Comparing this with (5.4) (which reads h f , DR1−ε f i = εkP1 f k22 − (1 − ε)kP1 f − f k22 in the Gaussian
setting since h∆1 f , f i = 0) yields the theorem, (5.11).
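The eigenvalue computation in this proof is a one-liner to verify (pure Python, with p = 1/3 − (2/3)ε as in the proof):

```python
# Verify the eigenvalue claims for A_eps = -(p*T_1 + (1-p)*T_{-1/2}) with
# p = 1/3 - (2/3)*eps: on Fourier/Hermite level d the eigenvalue is
# c_d = -(p * 1^d + (1-p) * (-1/2)^d), with c_1 = eps and c_d <= c_3 otherwise.
eps = 0.05
p = 1 / 3 - (2 / 3) * eps
c = lambda d: -(p * 1**d + (1 - p) * (-1 / 2)**d)

assert abs(c(1) - eps) < 1e-12                 # the dictator level has eigenvalue eps
assert abs(c(3) - (-(1 / 4 - (3 / 4) * eps))) < 1e-12
assert all(-1 - 1e-12 <= c(d) <= c(3) + 1e-12 for d in range(30) if d != 1)
```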

5.5   UGC-hardness of Max-Cut-Gain
Using the Long Code test just developed, we can get Theorem 4.3, an ε versus O(ε/ log(1/ε)) UGC-
hardness result for Max-Cut-Gain which matches the algorithm of Charikar and Wirth [12]. The proof
follows by repeating almost verbatim the reduction in [28] that proves .878 + ε UGC-hardness for Max-
Cut.
    More specifically, we first modify [28]’s reduction in Section 8.1. That reduction picks x ∈ {−1, 1}M
uniformly, forms y = xµ ∈ {−1, 1}M by picking µ from the ρ-biased distribution, and “accepts” if and
only if fw (x ◦ σ ) 6= fw (y ◦ σ 0 ). Instead, given a constant ε > 0, our reduction chooses the pair (x, y)
with probability −Axy , where A = Aε is the negative probability operator from Theorem 4.1. This makes


sense: since A is a negative probability operator, its entries are all negative and their absolute values sum to 1. Under this distribution, for f : {−1, 1}^M → {−1, 1} we have, as in (1.1),

        Pr_{x,y}[ f(x) ≠ f(y) ] = ∑_{x,y} (−A_{xy}) (1/2 − (1/2) f(x) f(y)) = 1/2 + (1/2)⟨f, A f⟩ ;

and hence the “gain” over 1/2 is ⟨f, A f⟩.
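This identity can be checked mechanically. The following is a small self-contained sketch (Python; the toy domain size and the random matrix are hypothetical stand-ins for the reduction's actual distribution) that builds a negative probability operator and verifies Pr[f(x) ≠ f(y)] = 1/2 + (1/2)⟨f, A f⟩:

```python
import random

random.seed(0)
n = 4  # toy domain size, standing in for {-1,1}^M (hypothetical)

# A "negative probability operator": symmetric, all entries negative,
# absolute values summing to 1.
raw = [[random.random() for _ in range(n)] for _ in range(n)]
sym = [[raw[i][j] + raw[j][i] for j in range(n)] for i in range(n)]
total = sum(sum(row) for row in sym)
A = [[-sym[i][j] / total for j in range(n)] for i in range(n)]

f = [random.choice([-1, 1]) for _ in range(n)]

# Draw (x, y) with probability -A[x][y]; then
# Pr[f(x) != f(y)] = sum_{x,y} (-A[x][y]) * (1/2 - f(x)f(y)/2).
p_neq = sum(-A[x][y] * (0.5 - 0.5 * f[x] * f[y])
            for x in range(n) for y in range(n))
quad = sum(A[x][y] * f[x] * f[y]          # <f, A f>
           for x in range(n) for y in range(n))

print(p_neq, 0.5 + 0.5 * quad)  # the two sides agree exactly
```

The agreement is exact because the 1/2 terms sum to 1/2 (the absolute entries of A sum to 1), and the cross term is exactly (1/2)⟨f, A f⟩.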
    The reduction's completeness is as in [28]: If f is a dictator (Long Code) then the first part of Theorem 4.2 implies that the Max-Cut-Gain instance will have value at least Ω(ε) − O(η). (The label size M can be increased to O(log(1/ε)) if necessary.) This value is still Ω(ε) if η > 0 is taken to be a small enough constant compared to ε. As for soundness, the same analysis as in [28] gives that the value of the Max-Cut-Gain instance is ⟨g_v, A g_v⟩, where g_v is as defined in [28]. We define a vertex v to be “good” if ⟨g_v, A g_v⟩ is, say, at least twice the O(ε/log(1/ε)) quantity in Theorem 4.2. Then at least an Ω(ε/log(1/ε)) fraction of v's are good, and for each, our Theorem 4.2 implies that g_v has at least one coordinate i with |ĝ_v(i)| ≥ Ω(1/log(1/ε)). We can interpret this as Inf_i^{≤1}(g_v) ≥ Ω(1/log²(1/ε)) and proceed with the proof from [28], taking its “k” to be 1. Overall, we get a Unique Label Cover labeling satisfying at least a γ′ = Ω(ε/log⁵(1/ε)) fraction of edges, a positive constant as necessary.


6     Discretizing the Gaussian SDP gap for DR_{1−ε}

6.1   Overview

As an interlude before we move on to K_{N,N}-Quadratic-Programming, we describe how to perform a discretization or “randomness-reduction” on our Gaussian SDP gap for DR_{1−ε}. The ideas in this section may be useful for shortening the Long Code and possibly eliminating the loss of a logarithm in certain SDP gap constructions, such as those in [32, 33].
    By taking the constant ε → 0, Theorem 5.3 shows that there is no constant integrality gap for the
Quadratic-Programming problem. We would like to give a construction where ε is “subconstant”; i. e.,
a function of the number of points in the domain. This doesn’t make sense in Gaussian space since
this is an infinite domain. However we will show that we can discretize Gaussian space into N points
in such a way that the SDP gap still holds. In particular we will show how to do this with dimension
d = poly(1/ε) and number of points N = poly(d) = poly(1/ε). Since the SDP gap is Ω(log(1/ε)),
this yields a Quadratic-Programming integrality gap of Ω(log N) on N-vertex graphs. As discussed in
Section 4, this is tight up to constant factors.

6.2   How to discretize
The discretization we perform is as simple as possible: We simply pick a set G ⊂ ℝ^d of N points randomly according to the d-dimensional Gaussian distribution. We view G as having the uniform distribution µ. We can define orthogonal coefficients, P_1, Δ_1, and DR_{1−ε}^{(d)} on L₂(G, µ) as in Section 5.1. We then show that with high probability over the choice of G, the ε versus O(ε/log(1/ε)) SDP gap from Theorem 5.3 still holds. The basic idea is that the proof of Theorem 5.3 uses only a small number of facts about Gaussian random variables; we will isolate the facts used and ensure that G has each of them, at least approximately.

                         T HEORY OF C OMPUTING, Volume 5 (2009), pp. 83–117                              103
                                  S UBHASH K HOT AND RYAN O’D ONNELL

    Specifically, given d ≥ 1, we will require the following three facts about G ⊂ ℝ^d:

        ∀ x ∈ G,   √d − O(√(log d)) ≤ ‖x‖₂ ≤ √d + O(√(log d)) ,                        (6.1)

        ∀ 1 ≤ i, j ≤ d,   | E_{x∈G}[x_i x_j] − δ_{ij} | ≤ 1/d³ ,                       (6.2)

        ∀ v ∈ ℝ^d with ‖v‖₂ = 1,   Pr_{x∈G}[ ⟨v, x⟩ > Ω(√(log d)) ] ≥ α,  where α = d^{−1/4} .   (6.3)

(Here δ_{ij} denotes the Kronecker delta.) In words: Property (6.1) demands that all vectors in G have length about √d; Property (6.2) demands that products of pairs of coordinates have essentially the correct expectation, since ∫ x_i x_j dγ(x) = δ_{ij}; and Property (6.3) demands that the random variable ⟨v, x⟩, which should act like a linear combination of Gaussians, has an appropriately heavy tail for all unit-length coefficient vectors v.
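The first two properties can be illustrated with a quick Monte Carlo sketch (Python; the small d and N below are purely illustrative, whereas Lemma 6.1 below takes N = Θ(d⁷)):

```python
import math, random

random.seed(1)
d, N = 30, 4000   # illustrative only; Lemma 6.1 takes N = Theta(d^7)
G = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(N)]

# Property (6.1): every x in G has length about sqrt(d).
lengths = [math.sqrt(sum(c * c for c in x)) for x in G]
max_dev = max(abs(L - math.sqrt(d)) for L in lengths)
print("sqrt(d) =", math.sqrt(d), " max length deviation =", max_dev)

# Property (6.2): empirical second moments are near delta_ij.
def moment(i, j):
    return sum(x[i] * x[j] for x in G) / N

print("E[x_0^2]  =", moment(0, 0))   # should be near 1
print("E[x_0 x_1] =", moment(0, 1))  # should be near 0
```

With these tiny parameters the second-moment error is only about 1/√N rather than the 1/d³ demanded by (6.2); the much larger N of Lemma 6.1 is what makes the exact bounds hold.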
    Standard arguments based on the probabilistic method and the union bound show that G has all three properties with high probability when N = d^{O(1)}:

Lemma 6.1. Let d be at least a sufficiently large constant and let N = Θ(d⁷). Then with probability at least 1 − 1/N, G satisfies Properties (6.1), (6.2), and (6.3).

Proof. For Property (6.1), if g is drawn from the d-dimensional Gaussian distribution then E[‖g‖₂²] = d and there is sharp concentration around the mean; in particular, Part 1 of Lemma 6.2 below implies

    Pr[ |‖g‖₂² − d| ≤ O(√(d log N)) ] = Pr[ √d − O(√(log d)) ≤ ‖g‖₂ ≤ √d + O(√(log d)) ] ≥ 1 − 1/(3N²)

if the constant in the O(·) is large enough and d is at least a large constant. So by a union bound we have that Property (6.1) holds for G except with probability 1/(3N).
     The proof of Property (6.2) is essentially similar. Fix some particular i ≠ j; then when G is chosen at random, the distribution of the quantity E_{x∈G}[x_i x_j] is that of the average of N i.i.d. random variables whose distribution is the product of two independent normals. By Part 2 of Lemma 6.2 below, the quantity is within O(√((log N)/N)) ≤ 1/d³ of the desired value δ_{ij} = 0 except with probability at most 1/(3N²). Similarly, when j = i we have that the quantity is within 1/d³ of δ_{ij} = 1 except with probability at most 1/(3N²), using Part 1 of Lemma 6.2 below. Taking a union bound over all d² < N pairs (i, j), we get that Property (6.2) holds for G except with probability at most 1/(3N).

    The proof of Property (6.3) is only a little bit trickier. Let us first fix a particular vector v ∈ ℝ^d with ‖v‖₂ = 1 and analyze the probability (over the choice of G) that Y_v ≥ αN, where Y_v is defined by

                        Y_v = #{ x ∈ G : ⟨v, x⟩ > Ω(√(log d)) } .

We can write Y_v = ∑_{i=1}^N X_i, where X_i is an indicator random variable for the event that ⟨v, g_i⟩ ≥ Ω(√(log d)). Since ⟨v, g_i⟩ is distributed as a standard one-dimensional Gaussian, by the heavy tail property of Gaussians we get that Pr[X_i = 1] ≥ 2α, assuming the constant in the Ω(·) is small enough. Hence by standard large deviation bounds with respect to the sum Y_v = ∑_{i=1}^N X_i we conclude that Pr[Y_v < αN] ≤ exp(−Ω(αN)) ≤ exp(−d⁶).


    Now let 𝒩 be a (1/d)-net on the d-dimensional unit sphere; by a standard bound we may assume |𝒩| ≤ 2^{O(d log d)}. Since |𝒩| exp(−d⁶) is much less than 1/(3N), another union bound lets us conclude that when G is chosen at random, except with probability at most 1/(3N) we have

                    ∀ v ∈ 𝒩,   Pr_{x∈G}[ ⟨v, x⟩ > Ω(√(log d)) ] ≥ α .

Assuming this happens, and also assuming Property (6.1) holds, Property (6.3) follows easily. To see this, let w be any unit vector in ℝ^d, and let v be the closest point in 𝒩 to w. We know that at least an α fraction of the points x in G satisfy ⟨v, x⟩ > Ω(√(log d)). But for these x's,

    ⟨w, x⟩ ≥ ⟨v, x⟩ − ‖v − w‖₂ · ‖x‖₂ > Ω(√(log d)) − (1/d)(√d + O(√(log d))) = Ω(√(log d)) ;

thus Property (6.3) holds for w. The proof is complete.

    Following are the large deviation bounds we used in the above proof:

Lemma 6.2. There is a universal constant C < ∞ such that the following hold:

   1. Let X_1, ..., X_n be i.i.d. random variables, where X_i is distributed as the square of a standard Gaussian. Then for all s < √n,

                Pr[ |(X_1 + ··· + X_n) − n| ≥ s√n ] ≤ 2 exp(−s²/C) .

   2. If X_i is instead distributed as the product of two independent standard Gaussians, then for all s < √n,

                Pr[ |(X_1 + ··· + X_n)/n| ≥ s/√n ] ≤ 2 exp(−s²/C) .

Proof. These are completely standard applications of the theory of large deviations [15, 14] of Cramér and Chernoff; we include the proofs for convenience. For Part 1, direct calculation shows that the moment generating function of X_1 − 1 is g(t) = exp(−t)/√(1 − 2t), converging for t < 1/2. Writing ε = s/√n < 1 and selecting t = ε/(2 + 2ε) < 1/2, the theory tells us that

        Pr[ (X_1 + ··· + X_n − n)/n ≥ s/√n ] ≤ ( exp(−ε/2)·√(1 + ε) )^n .

We have the Taylor expansion exp(−ε/2)·√(1 + ε) = 1 − ε²/4 + ε³/6 − ··· and so it's easy to check that for ε < 1 the quantity is upper-bounded by exp(−ε²/C) for some C < ∞; indeed, C = 7 suffices. Thus

        Pr[ X_1 + ··· + X_n − n ≥ s√n ] ≤ ( exp(−ε²/C) )^n = exp(−s²/C) .

The proof of Pr[X_1 + ··· + X_n − n ≤ −s√n] ≤ exp(−s²/C) is the same, replacing t with −t.
     As for Part 2, direct calculation shows that the moment generating function of X_1 is 1/√(1 − t²), converging for |t| < 1. Again we write ε = s/√n < 1, and we then select t = ε. Then Chernoff's theory tells us that

        Pr[ (X_1 + ··· + X_n)/n ≥ s/√n ] ≤ ( exp(−ε²)/√(1 − ε²) )^n .

We have the Taylor expansion exp(−ε²)/√(1 − ε²) = 1 − ε²/2 + (3/8)ε⁴ + ···, and the remainder of the proof is very similar to that for Part 1.
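Both moment generating functions above are easy to spot-check by Monte Carlo (a Python sketch, not part of the proof; t = 0.2 is an arbitrary point inside both regions of convergence):

```python
import math, random

random.seed(2)
M = 200_000  # Monte Carlo sample size
t = 0.2      # arbitrary test point inside both regions of convergence

sq = prod = 0.0
for _ in range(M):
    g1, g2 = random.gauss(0, 1), random.gauss(0, 1)
    sq += math.exp(t * (g1 * g1 - 1.0))   # X_1 - 1 of Part 1
    prod += math.exp(t * (g1 * g2))       # X_1 of Part 2
sq /= M
prod /= M

print(sq,   math.exp(-t) / math.sqrt(1 - 2 * t))  # vs exp(-t)/sqrt(1-2t)
print(prod, 1.0 / math.sqrt(1 - t * t))           # vs 1/sqrt(1-t^2)
```

Each estimate should agree with its closed form to a couple of decimal places at this sample size.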


6.3   Reproving the SDP gap on G
We can now prove Theorem 4.4:

Proof. Applying Lemma 6.1 we have that G satisfies Properties (6.1), (6.2), and (6.3). We now proceed with the proof of Theorem 5.3, making the corrections necessary given that we are on the domain G rather than (ℝ^d, γ).
    For the “completeness” lower bound we use the same function f(z) = z/‖z‖₂. Now the i-th coordinate of f̂(i) satisfies

        f̂(i)_i = E_{y∈G}[ y_i²/‖y‖₂ ] ≥ (1 − 1/d³)/(√d + O(√(log d))) ≥ 1/√d − O(√(log d)/d)

using Properties (6.1) and (6.2); hence

        ∑_{i=1}^d ‖f̂(i)‖₂² ≥ 1 − O(√((log d)/d)) ≥ 1 − ε/2 ,

and therefore ⟨f, DR_{1−ε} f⟩ ≥ ε/2, using (5.3) again.
    For the “soundness” lower bound, given f : G → [−1, 1] we can no longer use ⟨Δ_1 f, f⟩ = 0. However using Properties (6.1) and (6.2) it is not hard to show that |⟨Δ_1 f, f⟩| is very small. Combining (5.5) and Property (6.2) we have

        |⟨Δ_1 f, f⟩| ≤ (1/d³) ∑_{i,j=1}^d |f̂(i) f̂(j)| ≤ (1/d²) ∑_{i=1}^d f̂(i)² .

We can crudely upper-bound ∑_{i=1}^d f̂(i)² using the following formal identity:

        ∑_{i=1}^d f̂(i)² = E_{x,y∈G}[ ⟨x, y⟩ f(x) f(y) ] .

Since f's range is [−1, 1] and |⟨x, y⟩| ≤ O(d) for every x, y ∈ G (using Property (6.1)), we get ∑_{i=1}^d f̂(i)² ≤ O(d). Overall we conclude that

        |⟨Δ_1 f, f⟩| ≤ O(1/d) ≤ ε² .                                           (6.4)

We now continue as in the proof of Theorem 5.3. Let w ∈ ℝ^d be the vector whose coefficients are the f̂(i)'s and let σ² = ‖w‖₂² = ∑ f̂(i)². Now that we know |⟨Δ_1 f, f⟩| ≤ O(1/d) ≤ ε², the identity (5.2) lets us conclude the much sharper upper bound σ² ≤ 1 + ε². We will again use (5.4) (combined with (6.4)):

        ⟨f, DR_{1−ε} f⟩ ≤ εσ² − (1 − ε)‖P_1 f − f‖₂² + ε² .                     (6.5)

Now (P_1 f)(x) = ⟨w, x⟩ is no longer a Gaussian; however, Property (6.3) still lets us conclude that

        Pr_{x∈G}[ (P_1 f)(x) > Ω(√(log d))·σ ] ≥ α = ε^{3/4} .

So as in the old proof, if σ² ≥ C_0/log(1/ε) for C_0 a sufficiently large constant, then Ω(√(log d))·σ > 2. In this case, (P_1 f)(x) exceeds 2 with probability at least ε^{3/4}; hence ‖P_1 f − f‖₂² ≥ ε^{3/4} and so (6.5) is negative (using σ² ≤ 1 + ε²). Thus we may assume σ² ≤ C_0/log(1/ε), and we get ⟨f, DR_{1−ε} f⟩ ≤ O(ε/log(1/ε)) from (6.5), as required.


7      K_{N,N}-Quadratic-Programming

7.1    SDP gap for DR_λ on Gaussian space:
       The Davie–Reeds lower bound on the Grothendieck constant

In this section we review the best known lower bound on the Grothendieck constant. This bound (i. e., SDP gap) was first proven by Davie [17], although his paper is unpublished and difficult to obtain. We will instead follow the independent proof of Reeds [43], which uses the same operator. Reeds's proof is less slick but it is the one that we will be able to translate into an equivalent Long Code test in the {−1, 1}^n setting. The main translational difficulty is that both papers begin by simplifying the analysis using the rotational symmetry of the Gaussian distribution. This is of course missing in the {−1, 1}^n case. We show in this section how to change Reeds's proof so that it does not use this step. For more on SDP gaps and Long Code tests for the K_{N,N}-Quadratic-Programming problem, the reader may consult the very recent paper of Raghavendra and Steurer [42].

    Let us begin with some notation from Reeds's paper. For 0 < λ < .4, define s(λ) to be (λ/η)² + λ(1 − 4Φ(−η)), where η ∈ (0, 1) satisfies √(2/π)·η·exp(−η²/2) = λ and Φ is the cdf of a standard normal.¹ The Davie–Reeds result shows an SDP gap of 1 − λ versus s(λ); in particular, this gives a lower bound of

            K_{Davie–Reeds} = sup_{0<λ<.4} (1 − λ)/s(λ) ≈ 1.6769

for Grothendieck's constant, taking λ = λ* ≈ .19748.
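These constants are straightforward to reproduce numerically; the following sketch (Python, with bisection for η and a grid over λ; not from the paper) recovers λ* ≈ .19748 and the bound ≈ 1.6769:

```python
import math

def Phi(x):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def eta_of(lam):
    """Solve sqrt(2/pi) * eta * exp(-eta^2/2) = lam for eta in (0, 1).

    The left side is increasing on (0, 1), so bisection works."""
    lo, hi = 0.0, 1.0
    for _ in range(80):
        mid = (lo + hi) / 2
        if math.sqrt(2 / math.pi) * mid * math.exp(-mid * mid / 2) < lam:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def s(lam):
    eta = eta_of(lam)
    return (lam / eta) ** 2 + lam * (1 - 4 * Phi(-eta))

# Maximize (1 - lam)/s(lam) over a fine grid of lam in (0, .4).
best_lam, best_ratio = max(
    ((lam, (1 - lam) / s(lam)) for lam in (l / 10000 for l in range(100, 4000))),
    key=lambda p: p[1])
print(best_lam, best_ratio)   # ~ .19748 and ~ 1.6769
```

The maximizer also lets one confirm the footnote below: the corresponding η is ≈ .25573, the value Reeds's paper misprints as the optimal λ.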
      We now sketch (a slight alteration of) Reeds’s proof:

Theorem 7.1. There is a universal constant C such that for all 0 < λ < .4 and d ≥ 1,

        sup_{f,g : (ℝ^d,γ)→B_d} ⟨f, DR_λ g⟩ ≥ 1 − λ − C/d                                  (7.1)

and

        sup_{f,g : (ℝ^d,γ)→[−1,1]} ⟨f, DR_λ g⟩ = sup_{g : (ℝ^d,γ)→[−1,1]} ∫ |DR_λ g| ≤ s(λ) .   (7.2)

Proof. (Sketch of Reeds's proof.) The bound (7.1) follows immediately from (5.6). To prove (7.2), first note that it suffices to take the supremum over functions g : (ℝ^d, γ) → {−1, 1}, since ∫ |DR_λ g| is a convex function of g. Hence we must determine

        sup_{g : (ℝ^d,γ)→{−1,1}} ∫ |P_1 g − λ g| .                                        (7.3)

So suppose g : (ℝ^d, γ) → {−1, 1}. Let ℓ denote P_1 g and σ = ‖ℓ‖₂ ≤ 1. If σ = 0 then clearly (7.3) equals λ, which is smaller than s(λ) for λ < .4. Hence we may assume σ > 0; in this case, let φ denote the pdf of ℓ/σ (so in fact here φ is the standard Gaussian pdf). Let θ_g : ℝ → [−1, 1] be defined by
    ¹ Reeds's proof considers all 0 ≤ λ ≤ √(2/(eπ)) ≈ .484, but the proof is slightly simpler if one restricts λ's range slightly so that we always have λ < s(λ). This doesn't hurt because the optimal λ turns out to be .19748. (Note that Reeds's paper states the optimal λ is .25573, but this is a typo: that's the optimal value of η.)



θ_g(z) = E[g | ℓ/σ = z], so Pr[g = 1 | ℓ/σ = z] = 1/2 + θ_g(z)/2 and Pr[g = −1 | ℓ/σ = z] = 1/2 − θ_g(z)/2. Note that, by definition,

        ∫_ℝ φ(z) · z · θ_g(z) dz = σ                                                      (7.4)

and

        ∫ |P_1 g − λ g| = ∫_ℝ φ(z) [ (1/2 + θ_g(z)/2) · |σz − λ| + (1/2 − θ_g(z)/2) · |σz + λ| ] dz .   (7.5)
    Now (7.3) can be bounded from above as follows: First fix the value σ. Then the supremum in (7.3), conditioned on g satisfying ‖P_1 g‖₂² = σ², is upper-bounded by what can be thought of as an (infinite) linear program: the unknowns are the values of θ_g, the objective is (7.5) (which is linear in θ_g), and (7.4) and |θ_g| ≤ 1 serve as the constraints. For each value of σ an optimal θ* is determined; substituting this into (7.5) gives an upper bound for (7.3) conditioned on ‖P_1 g‖₂² = σ². Finally, this upper bound is maximized over 0 < σ ≤ 1.
    Let us rewrite the objective (7.5) as

        ∫_ℝ φ(z) ψ(z) θ(z) dz + C_{λ,σ} ,                                                 (7.6)

where

        ψ(z) = ½(|σz − λ| − |σz + λ|) ,       C_{λ,σ} = ∫_ℝ φ(z) · ½(|σz − λ| + |σz + λ|) dz .

Since φ is an even function (and ψ is odd), we see that both the constraint (7.4) and the objective (7.6) involve integrating θ against an odd function. It follows that the optimal θ* may as well be odd, since one can replace an arbitrary optimizer θ* by θ′(z) = (θ*(z) − θ*(−z))/2. Thus we are equivalently trying to find θ : ℝ⁺ → [−1, 1] maximizing

        ∫₀^∞ 2φ(z) ψ(z) θ(z) dz + C_{λ,σ}       subject to       ∫₀^∞ 2φ(z) · z · θ(z) dz = σ .
Since ψ(z)/z is increasing in z for all σ, λ, it follows that the optimizing θ* is of the form

        θ*(z) = {  1     if z ∈ (−h, 0) ∪ (h, ∞)
                  −1     if z ∈ (−∞, −h) ∪ (0, h),                                        (7.7)

where h = h(σ) ≥ 0 is selected to be as large as possible such that the constraint (7.4) holds.
    Thus we conclude that

        sup_{f,g : (ℝ^d,γ)→[−1,1]} ⟨f, DR_λ g⟩ ≤ sup_{0<σ≤1} ∫_ℝ φ(z) Ψ_{λ,h}(z) dz ,      (7.8)

where Ψ_{λ,h} : ℝ → ℝ is defined by

        Ψ_{λ,h}(z) = ½(|σz − λ| − |σz + λ|) θ*(z) + ½(|σz − λ| + |σz + λ|) ,

h is defined in terms of σ via

        ∫_ℝ φ(z) · z · θ*(z) dz = σ ,                                                     (7.9)

and θ* is defined in (7.7). With φ(z) = exp(−z²/2)/√(2π), everything in the expression on the right in (7.8) is completely explicit, and Reeds calculates that it equals the bound s(λ) in (7.2).
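This last calculation can be reproduced numerically. The sketch below (Python; not from the paper) uses the closed form of the constraint (7.9) for θ* as in (7.7) (namely ∫φ(z)·z·θ*(z) dz = 2(2φ(h) − φ(0)), so that h solves φ(h) = (σ + 2φ(0))/4) and checks that the supremum on the right of (7.8) comes out to s(λ*) ≈ .4786 at λ* ≈ .19748:

```python
import math

SQRT2PI = math.sqrt(2 * math.pi)
phi = lambda z: math.exp(-z * z / 2) / SQRT2PI

def rhs_78(lam, sigma, dz=1e-3, zmax=10.0):
    """Numerically evaluate the integral in (7.8) for the threshold rule (7.7).

    For theta* as in (7.7), the constraint (7.9) evaluates to
    2*(2*phi(h) - phi(0)) = sigma, so h solves phi(h) = (sigma + 2*phi(0))/4
    (feasible whenever sigma <= 2*phi(0))."""
    ph = (sigma + 2 * phi(0)) / 4
    h = math.sqrt(max(0.0, -2 * math.log(SQRT2PI * ph)))
    total, z = 0.0, dz / 2
    while z < zmax:
        theta = -1.0 if z < h else 1.0
        # On z > 0: Psi_{lam,h}(z) = max(sigma*z, lam) - min(sigma*z, lam)*theta,
        # and the integrand phi * Psi is even, hence the factor 2.
        total += 2 * phi(z) * (max(sigma * z, lam)
                               - min(sigma * z, lam) * theta) * dz
        z += dz
    return total

lam = 0.19748  # the optimal lambda* quoted in the text
best = max(rhs_78(lam, sig / 500) for sig in range(1, 399))
print("sup over sigma in (7.8):", best)   # ~ s(lambda*) ~ .4786
```

(Note the effective range of σ is at most 2φ(0) ≈ .798, since no feasible θ can make ∫φ·z·θ exceed E|z|.)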


7.2   Long Code test with DR_λ on the discrete cube

We now transfer the SDP gap from the previous section to the discrete cube setting, just as we did in the Quadratic-Programming case. This time the SDP gap uses more than just the “heavy tail” property of the linear part of a Gaussian function; it uses estimates on certain piecewise linear functionals of the linear part. The following is a restatement of Theorem 4.5.

Theorem 7.2. There is a universal constant C such that for all 0 < λ < .4, d ≥ 1, and δ > 0,

        sup_{f,g : {−1,1}^n→B_d} ⟨f, DR_λ g⟩ = ⟨χ_i, DR_λ χ_i⟩ = 1 − λ    for all i,        (7.10)

and

        sup_{f,g : {−1,1}^n→{−1,1}, ∀i |ĝ(i)|≤δ} ⟨f, DR_λ g⟩ = sup_{g : {−1,1}^n→{−1,1}, ∀i |ĝ(i)|≤δ} E[|DR_λ g|] ≤ s(λ) + Cδ .   (7.11)

Proof. We begin by noting that for f, g : {−1, 1}^n → B_d,

        ⟨f, DR_λ g⟩ ≤ ‖DR_λ g‖₂ ≤ √( (1 − λ)²‖P_1 g‖₂² + λ² ∑_{i≠1} ‖P_i g‖₂² ) ;          (7.12)

thus certainly ⟨f, DR_λ g⟩ ≤ 1 − λ and so (7.10) is proved.
    To prove (7.11) we proceed as in the proof of (7.2) from Reeds's proof; we bound E[|DR_λ g|] assuming g : {−1, 1}^n → {−1, 1} is some function satisfying |ĝ(i)| ≤ δ for all i. Again we let ℓ = P_1 g, a Rademacher average, and define σ = ‖ℓ‖₂. We may assume

        σ² ≥ .001 ,                                                                       (7.13)

because otherwise E[|DR_λ g|] ≤ ‖DR_λ g‖₂ ≤ √(.001 + λ²) using (7.12), and it can be checked numerically that √(.001 + λ²) ≤ s(λ). (In fact, we only need this for the optimal λ*.)
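That numerical claim can be verified directly; the sketch below (Python, with s(λ) and η computed as defined at the start of Section 7.1) checks the inequality on a grid over 0 < λ < .4:

```python
import math

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def s(lam):
    # eta in (0,1) with sqrt(2/pi)*eta*exp(-eta^2/2) = lam, by bisection
    lo, hi = 0.0, 1.0
    for _ in range(80):
        mid = (lo + hi) / 2
        if math.sqrt(2 / math.pi) * mid * math.exp(-mid * mid / 2) < lam:
            lo = mid
        else:
            hi = mid
    eta = (lo + hi) / 2
    return (lam / eta) ** 2 + lam * (1 - 4 * Phi(-eta))

# sqrt(.001 + lam^2) <= s(lam) across 0 < lam < .4; the margin is
# smallest near lam = .4 but stays positive.
gap = min(s(lam) - math.sqrt(0.001 + lam * lam)
          for lam in (l / 1000 for l in range(1, 400)))
print("min over lambda of s(lambda) - sqrt(.001 + lambda^2):", gap)
```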
    We now define φ′ to be the pmf (probability mass function) of ℓ/σ and proceed with the linear programming part of Reeds's proof. We continue to use the notation

        ∫_ℝ φ′(z) F(z) dz

for the expected value of the discrete random variable F(z) when z is distributed as ℓ/σ. Note that there are now only a finite number of “variables” θ(z), corresponding to the values that ℓ/σ can attain. The pmf φ′ is still an even function, so the optimizing θ* can again be odd. Again, the fact that ψ(z)/z is increasing implies that the optimizing θ* is of the form (7.7) for some h that ℓ/σ attains; the value of θ* precisely at ±h may be in the interior of [−1, 1] so as to satisfy the constraint (7.4). Thus we have

        sup_{g : {−1,1}^n→{−1,1}, ∀i |ĝ(i)|≤δ} E[|DR_λ g|] ≤ sup_{.001≤σ≤1} ∫_ℝ φ′(z) Ψ_{λ,h}(z) dz

as in Reeds's proof.
    Our goal is to show this is at most the same quantity with φ in place of φ′ (as in (7.8)), up to an additive O(δ). By the piecewise-linearity of θ* and Ψ_{λ,h}, it is easy to check that this would follow immediately from:

                         T HEORY OF C OMPUTING, Volume 5 (2009), pp. 83–117                                   109
                                   S UBHASH K HOT AND RYAN O’D ONNELL

Claim 7.3. If F(z) is any piecewise-linear function on ℝ with O(1) pieces, each of the form az + b with |a|, |b| ≤ O(1), then

        | ∫ φ′(z) F(z) dz − ∫ φ(z) F(z) dz | ≤ O(δ) .

    We complete the overall proof by proving the claim. The key is to use the fact that ℓ/σ is a Rademacher average with variance 1 and all coefficients at most δ/σ ≤ 1000δ in magnitude (using (7.13)). Thus the Central Limit Theorem implies that φ′ is “close” to the normal pdf φ. More precisely, for I ⊆ ℝ an interval, define Φ′(I) = Pr[ℓ/σ ∈ I] and define Φ(I) = Pr[N(0, 1) ∈ I]. Then the non-uniform Berry-Esseen Theorem due to Bikelis [9] implies that

        | Φ′(−∞, t] − Φ(−∞, t] | ≤ O(δ)/(1 + |t|³)       ∀ t ∈ ℝ.                         (7.14)

In particular, |Φ′(I) − Φ(I)| ≤ O(δ) for all intervals I ⊆ ℝ, and Pr[ℓ/σ = t] ≤ O(δ) for each t ∈ ℝ.
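For a concrete instance of this closeness (a Python sketch, not from the paper): take ℓ/σ to be the normalized sum of n equal Rademacher coefficients, so that every coefficient equals δ = 1/√n, and compare its exact cdf against Φ:

```python
import math

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

n = 100                      # l/sigma = (sum of n Rademachers)/sqrt(n)
delta = 1.0 / math.sqrt(n)   # every Fourier coefficient equals delta here

# Exact cdf of the normalized Rademacher sum, via binomial probabilities.
pmf = {(2 * k - n) / math.sqrt(n): math.comb(n, k) / 2 ** n
       for k in range(n + 1)}

max_dev, cdf = 0.0, 0.0
for t in sorted(pmf):
    cdf += pmf[t]
    max_dev = max(max_dev, abs(cdf - Phi(t)))
print("delta =", delta, " sup |cdf - Phi| =", max_dev)
```

The sup-deviation (driven by the atom at 0) indeed comes out below δ, consistent with the O(δ) bound.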
    To prove the Claim it suffices to prove the following: for any linear function F(z) = az + b on any interval I ∌ 0,

        | ∫_I φ′(z) F(z) dz − ∫_I φ(z) F(z) dz | ≤ O(δ)(|a| + |b|)

(here the notation ∫_I φ′(z) F(z) dz should be interpreted as ∫ φ′(z) 1_{z∈I} F(z) dz). We will show this for I = [u, v) with u ≥ 0, leaving the other cases to the reader. We have

        ∫_{[u,v)} φ′(z) F(z) dz = b Φ′[u, v) + a ∫_{[u,v)} φ′(z) z dz
                                = b Φ′[u, v) + a ( u Φ′[u, v) + ∫_u^v Φ′[t, v) dt ) .

Subtracting this from the same expression with Φ gives an error term of

        b ( Φ′[u, v) − Φ[u, v) ) + a ( u ( Φ′[u, v) − Φ[u, v) ) + ∫_u^v ( Φ′[t, v) − Φ[t, v) ) dt ) ,

the magnitude of which, by (7.14), is indeed at most

        |b| O(δ) + |a| ( u · O(δ)/(1 + u³) + O(δ) ∫_u^v 1/(1 + t³) dt ) ≤ O(δ)(|a| + |b|) ,

as claimed.

    Using this theorem we get a sort of “two-function K_{N,N}-Quadratic-Programming Long Code test”: we see that if f, g : {−1, 1}^n → {−1, 1} have ⟨f, DR_λ g⟩ > s(λ) + Cδ then g must have a coordinate i with Inf_i^{≤1}(g) ≥ δ². Indeed, since DR_λ is self-adjoint, both f and g must have at least one influential coordinate. However this is not a proper two-function Long Code test, because to get something useful for UGC-hardness we need to show that f and g share an influential coordinate when ⟨f, DR_λ g⟩ is too large.
    It's not clear whether or not this is true. However in the next section we will show that such a statement can be made if we slightly “smooth” the operator DR_λ.


7.3   A smoothed version of DR_λ

To get a useful two-function K_{N,N}-Quadratic-Programming Long Code test we introduce the following operator, for small ε > 0:

        DR_λ^{(1−ε)} = DR_λ T_{1−ε} = T_{1−ε} DR_λ ;

here T_{1−ε} is the Bonami-Beckner operator with parameter 1 − ε. Introducing the slight smoothing T_{1−ε} does not hurt completeness very much, and on the soundness side it helps by making general influences and low-degree influences essentially equivalent. With T_{1−ε} built into our operator it might seem that a straightforward averaging trick (like the one used by Dinur, Mossel, and Regev [18] in the context of noise stability tests) should suffice to give us a two-function Long Code test. However there is a technical difficulty: we only know how to prove soundness (7.11) for functions g : {−1, 1}^n → {−1, 1} with small |ĝ(i)|'s, not for the functions g : {−1, 1}^n → [−1, 1] that arise from averaging. Bypassing this difficulty requires some technical machinations.
    The main trick involves a twist on the so-called Szemerédi Regularity Lemma for (ℤ/2ℤ)^n due to Green [21]. Given f : {−1, 1}^n → {−1, 1}, 0 < ε < 1/2, and J ⊆ [n] we say that f is ε-flat for J if for all but an ε fraction of the strings x_J ∈ {−1, 1}^J, it holds that |f̂_{x_J}(i)| ≤ ε for all i ∈ [n] \ J. Here f_{x_J} denotes the restricted function on {−1, 1}^{[n]\J} obtained by fixing the inputs x_J.

Theorem 7.4. There is a universal function W : (0, 1/2) → ℕ such that the following holds: For any f : {−1, 1}^n → {−1, 1} and 0 < ε < 1/2 one can associate a canonical set of coordinates J = J_{f,ε} ⊆ [n] satisfying |J| ≤ W(ε) such that f is ε-flat for J.

    To prove this theorem one only needs to repeat Green's proof with the notion of ε-flatness replacing his notion of ε-regularity.
    We can now state our two-function Long Code test based on K_{N,N}-Quadratic-Programming.

Theorem 7.5. There is a universal constant C_0 such that the following holds: Given 0 < ε < 1/2, set k = ⌈log(1/ε)/ε⌉ and δ = ε²/W(ε), where W is the function from Theorem 7.4. Then

        sup_{f,g : {−1,1}^n→{−1,1}} ⟨f, DR_λ^{(1−ε)} g⟩ = ⟨χ_i, DR_λ^{(1−ε)} χ_i⟩ = (1 − λ)(1 − ε)    for all i,   (7.15)

and

        sup′_{f,g : {−1,1}^n→{−1,1}} ⟨f, DR_λ^{(1−ε)} g⟩ ≤ s(λ) + C_0 ε ,                   (7.16)

where the sup′ in (7.16) is taken over all f and g such that Inf_i^{≤k}(g) ≤ δ for all i ∈ J = J_{f,ε}, the canonical set of coordinates given by Theorem 7.4.

Proof. The completeness statement (7.15) follows easily from (7.12), so we now focus on the soundness
statement. Specifically, assume f , g : {−1, 1}n → {−1, 1}, J = J_{f,ε} is the set of coordinates guaranteed
by Theorem 7.4, and Inf_i^{≤k}(g) ≤ δ for all i ∈ J.
     Write Q = DRλ^(1−ε) for short and note that the notation Qg^{≤k} makes sense, in that Q “commutes”
with taking the low degree part. We have

                     ⟨ f , Qg⟩ = ⟨ f , Qg^{≤k} ⟩ + ⟨ f , Qg^{>k} ⟩ ≤ ⟨ f , Qg^{≤k} ⟩ + ‖Qg^{>k}‖₂ .

                           T HEORY OF C OMPUTING, Volume 5 (2009), pp. 83–117                                        111
                                   S UBHASH K HOT AND RYAN O’D ONNELL

We have ‖Qg^{>k}‖₂² = ‖DRλ T1−ε g^{>k}‖₂² ; also ‖T1−ε g^{>k}‖₂² ≤ (1 − ε)^{2k} ≤ ε² and DRλ is a contraction on
L2 . Thus it suffices to show
                                        ⟨ f , Qg^{≤k} ⟩ ≤ s(λ) + C0 ε .
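As a numerical sanity check on the tail bound just used (reading log as the natural logarithm, which is what makes the arithmetic come out):

```python
import math

# With k = ceil(log(1/eps)/eps), the high-degree part shrinks below eps:
# (1-eps)^k <= exp(-eps*k) <= exp(-log(1/eps)) = eps, so (1-eps)^(2k) <= eps^2.
for eps in [0.3, 0.1, 0.01, 0.001]:
    k = math.ceil(math.log(1 / eps) / eps)
    assert (1 - eps) ** (2 * k) <= eps ** 2
```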
Given a function h : {−1, 1}n → {−1, 1}, let us use a bar to denote averaging over the coordinates of J.
In other words, we write h̄ : {−1, 1}n → [−1, 1] for the function E_{xJ}[h_{xJ}] = ∑_{S∩J=∅} ĥ(S)χS , and this is
a function which doesn’t depend on the coordinates of J. Note that this averaging also commutes with
taking the low degree part. We have

                       ‖g^{≤k} − ḡ^{≤k}‖₂² =        ∑          ĝ(S)² ≤ ∑ Inf_i^{≤k}(g) ≤ |J|δ ≤ ε² .
                                               |S|≤k, S∩J≠∅           i∈J
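The middle inequality holds because each set S with S ∩ J ≠ ∅ contains at least one coordinate i ∈ J, and is therefore counted (at least once) among the low-degree influences of those coordinates; in symbols:

```latex
\sum_{\substack{|S|\le k \\ S\cap J\ne\emptyset}} \hat{g}(S)^2
\;\le\; \sum_{i\in J}\ \sum_{\substack{|S|\le k \\ S\ni i}} \hat{g}(S)^2
\;=\; \sum_{i\in J} \mathrm{Inf}_i^{\le k}(g)
\;\le\; |J|\,\delta
\;\le\; W(\varepsilon)\cdot\frac{\varepsilon^2}{W(\varepsilon)}
\;=\; \varepsilon^2 .
```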

Using the fact that DRλ^(1−ε) is a contraction on L2 , an argument similar to the last one implies that it
suffices for us to show
                                        ⟨ f , Qḡ^{≤k} ⟩ ≤ s(λ) + C0 ε .
Now since Qḡ^{≤k} does not depend on the coordinates of J we may write

                                        ⟨ f , Qḡ^{≤k} ⟩ = ⟨ f̄ , Qḡ^{≤k} ⟩ ,

where here ⟨·, ·⟩ is the inner product on {−1, 1}^([n]\J) . We will henceforth work over this probability space.
Using the fact that DRλ is self-adjoint and then using linearity, we have

                               ⟨ f̄ , Qḡ^{≤k} ⟩ = E_{xJ} ⟨DRλ f_{xJ} , T1−ε ḡ^{≤k} ⟩ .

Now again, ‖T1−ε ḡ^{>k}‖₂ ≤ (1 − ε)^k ≤ ε so it suffices for us to prove

                               E_{xJ} ⟨T1−ε ḡ , DRλ f_{xJ} ⟩ ≤ s(λ) + C0 ε .

Since T1−ε ḡ has range [−1, 1], it suffices to show

                               E_{xJ} E[ |DRλ f_{xJ}| ] ≤ s(λ) + C0 ε .

We now use the fact that f is ε-flat for J. For at most an ε fraction of exceptional strings xJ we have
no control over f_{xJ} . For these xJ we use simply E[ |DRλ f_{xJ}| ] ≤ 1. For the remainder of the strings xJ we
have that f_{xJ} is a {−1, 1}-valued function with | f̂_{xJ}(i)| ≤ ε for all i. Thus E[ |DRλ f_{xJ}| ] ≤ s(λ) + Cε for
these strings, by (7.11). This completes the proof.
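For the record, the two cases combine as follows (a sketch: C is the constant from (7.11), and C0 is assumed chosen large enough to also absorb the two earlier ε-sized losses):

```latex
\mathop{\mathbf{E}}_{x_J}\,\mathbf{E}\bigl[\,|DR_\lambda f_{x_J}|\,\bigr]
\;\le\; \underbrace{\varepsilon \cdot 1}_{\text{exceptional } x_J}
\;+\; \underbrace{(1-\varepsilon)\bigl(s(\lambda) + C\varepsilon\bigr)}_{\text{flat } x_J}
\;\le\; s(\lambda) + (C+1)\,\varepsilon .
```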

7.4   UGC-hardness of KN,N -Quadratic-Programming
The two-function Long Code test from Theorem 7.5 yields Theorem 4.6, a 1/K_{Davie–Reeds} + ε ≈ .597
UGC-hardness result for KN,N -Quadratic-Programming. This complements the algorithm of Alon and
Naor [3], which approximates KN,N -Quadratic-Programming within factor 1/K_{Krivine} ≈ .561, where
K_{Krivine} denotes the best known upper bound on Grothendieck’s constant, due to Krivine [34].


    The proof of Theorem 4.6 is again very easily derived by substituting the Long Code test from
Theorem 7.5 into the UGC-hardness result for Max-Cut given in [28]. Indeed, the proof is even simpler
because one directly replaces each constraint in the bipartite Unique Label Cover instance by the
DRλ∗^(1−Ω(ε)) Long Code test, instead of putting tests between pairs of vertices on the right side and
averaging over Long Codes on the left side.


References
 [1] Amit Agarwal, Moses Charikar, Konstantin Makarychev, and Yury Makarychev: O(√log n)
     approximation algorithms for Min-Uncut, Min-2CNF-Deletion, and directed cut problems. In
     Proc. 37th STOC, pp. 573–581. ACM Press, 2005. [doi:10.1145/1060590.1060675]. 86

 [2] Noga Alon, Konstantin Makarychev, Yury Makarychev, and Assaf Naor: Quadratic forms on
     graphs. Invent. Math., 163(3), 2006. [doi:10.1007/s00222-005-0465-9]. 86, 88, 89, 93, 97

 [3] Noga Alon and Assaf Naor: Approximating the Cut-Norm via Grothendieck’s inequality. SIAM
     J. Comput., 35(4):787–803, 2006. [doi:10.1137/S0097539704441629]. 86, 88, 90, 98, 112

 [4] Noga Alon, Benny Sudakov, and Uri Zwick: Constructing worst case instances for semidefinite
     programming based approximation algorithms. SIAM J. Discrete Math., 15(1):58–72, 2002.
     [doi:10.1137/S0895480100379713]. 85

 [5] Sanjeev Arora, Eli Berger, Elad Hazan, Guy Kindler, and Muli Safra: On non-approximability
     for quadratic programs. In Proc. 46th FOCS, pp. 206–215. IEEE Comp. Soc. Press, 2005.
     [doi:10.1109/SFCS.2005.57]. 87, 88, 89, 93, 97

 [6] Sanjeev Arora, Satish Rao, and Umesh Vazirani: Expander flows, geometric embeddings
     and graph partitioning. In Proc. 36th STOC, pp. 222–231. ACM Press, 2004.
     [doi:10.1145/1007352.1007355]. 86

 [7] Nikhil Bansal, Avrim Blum, and Shuchi Chawla: Correlation clustering. Machine Learning,
     56(1-3):89–113, 2004. [doi:10.1023/B:MACH.0000033116.57574.95]. 87

 [8] Mihir Bellare, Oded Goldreich, and Madhu Sudan: Free bits, PCPs, and
     nonapproximability—towards tight results. SIAM J. Comput., 27(3):804–915, 1998.
     [doi:10.1137/S0097539796302531]. 95

 [9] Algimantas Bikelis: Estimates of the remainder in the Central Limit Theorem. Litovsk. Mat.
     Sb., 6(3):323–346, 1966. In Russian. 110

[10] Christer Borell: Geometric bounds on the Ornstein-Uhlenbeck velocity process. Z. Wahrsch.
     Verw. Gebiete, 70(1):1–13, 1985. [doi:10.1007/BF00532234]. 89


[11] Moses Charikar, Konstantin Makarychev, and Yury Makarychev: Near-optimal
     algorithms for unique games. In Proc. 38th STOC, pp. 205–214. ACM Press, 2006.
     [doi:10.1145/1132516.1132547]. 86

[12] Moses Charikar and Anthony Wirth: Maximizing quadratic programs: Extending
     Grothendieck’s inequality. In Proc. 45th FOCS, pp. 54–60. IEEE Comp. Soc. Press, 2004.
     [doi:10.1109/FOCS.2004.39]. 86, 87, 89, 93, 96, 97, 101, 102

[13] Shuchi Chawla, Robert Krauthgamer, Ravi Kumar, Yuval Rabani, and D. Sivakumar: On the
     hardness of approximating Multicut and Sparsest-Cut. Comput. Complexity, 15(2):94–114,
     2006. [doi:10.1007/s00037-006-0210-9]. 87

[14] Herman Chernoff: A measure of asymptotic efficiency for tests of a hypothesis based
     on the sum of observations. Annals of Mathematical Statistics, 23(4), 1952.
     [doi:10.1214/aoms/1177729330]. 105

[15] Harald Cramér: Sur un nouveau théorème-limite de la théorie des probabilités. Actualités
     Scientifiques et Industrielles, 736, 1938. 105

[16] Pierluigi Crescenzi, Riccardo Silvestri, and Luca Trevisan: On weighted vs. unweighted
     versions of combinatorial optimization problems. Inform. and Comput., 167(1):10–26, 2001.
     [doi:10.1006/inco.2000.3011]. 90

[17] Alexander Davie: Lower bound for K_G. Unpublished, 1984. 88, 96, 107

[18] Irit Dinur, Elchanan Mossel, and Oded Regev: Conditional hardness for approximate coloring.
     In Proc. 38th STOC, pp. 344–353. ACM Press, 2006. [doi:10.1145/1132516.1132567]. 87, 111

[19] Uriel Feige and Michael Langberg: The RPR2 rounding technique for semidefinite programs.
     J. Algorithms, 60(1):1–23, 2006. [doi:10.1016/j.jalgor.2004.11.003]. 85, 86, 97, 101

[20] Michel Goemans and David Williamson: Improved approximation algorithms for maximum cut
     and satisfiability problems using semidefinite programming. J. ACM, 42:1115–1145, 1995.
     [doi:10.1145/227683.227684]. 84, 91, 97

[21] Ben Green: A Szemerédi-type regularity lemma in abelian groups, with applications. Geom.
     Funct. Anal., 15(2):340–376, 2005. [doi:10.1007/s00039-005-0509-8]. 98, 111

[22] Alexandre Grothendieck: Résumé de la théorie métrique des produits tensoriels topologiques.
     Bol. Soc. Mat. São Paulo, 8:1–79, 1953. 88

[23] Uffe Haagerup: A new upper bound for the complex Grothendieck constant. Israel J. Math.,
     60(2):199–224, 1987. [doi:10.1007/BF02790792]. 88

[24] Johan Håstad: Some optimal inapproximability results. J. ACM, 48(4):798–859, 2001.
     [doi:10.1145/502090.502098]. 85, 95


[25] Johan Håstad and Srinivasan Venkatesh: On the advantage over a random assignment. Random
     Structures Algorithms, 25(2):117–149, 2004. [doi:10.1002/rsa.20031]. 86

[26] George Karakostas: A better approximation ratio for the Vertex Cover problem. In Proc. 32nd
     Internat. Conf. on Automata, Languages, and Programming (ICALP), volume 3580 of Lecture
     Notes in Computer Science, pp. 1043–1050. Springer, 2005. [doi:10.1007/11523468_84]. 86

[27] Subhash Khot: On the power of unique 2-prover 1-round games. In Proc. 34th STOC, pp.
     767–775. ACM Press, 2002. [doi:10.1145/509907.510017]. 85, 87, 95

[28] Subhash Khot, Guy Kindler, Elchanan Mossel, and Ryan O’Donnell: Optimal inapproximability
     results for MAX-CUT and other 2-variable CSPs? SIAM J. Comput., 37(1):319–357, 2007.
     [doi:10.1137/S0097539705447372]. 85, 86, 87, 95, 97, 101, 102, 103, 113

[29] Subhash Khot and Assaf Naor: Nonembeddability theorems via Fourier analysis. Math. Ann.,
     334(4):821–852, 2006. [doi:10.1007/s00208-005-0745-0]. 86

[30] Subhash Khot and Ryan O’Donnell: SDP gaps and UGC-hardness for MaxCutGain. In Proc.
     47th FOCS, pp. 217–226. IEEE Comp. Soc. Press, 2006. [doi:10.1109/FOCS.2006.67]. 83, 88

[31] Subhash Khot and Oded Regev: Vertex Cover might be hard to approximate to within 2 − ε.
     In Proc. 18th Conf. on Computational Complexity (CCC), pp. 379–386. IEEE Comp. Soc.
     Press, 2003. 87

[32] Subhash Khot and Nisheeth Vishnoi: The Unique Games Conjecture, integrality gap for cut
     problems and embeddability of negative type metrics into ℓ1. In Proc. 46th FOCS, pp. 53–62.
     IEEE Comp. Soc. Press, 2005. [doi:10.1109/SFCS.2005.74]. 86, 103

[33] Robert Krauthgamer and Yuval Rabani: Improved lower bounds for embeddings into L1. In
     Proc. 17th Symp. on Discrete Algorithms (SODA), pp. 1010–1017. ACM Press, 2006.
     [doi:10.1145/1109557.1109669]. 86, 103

[34] Jean-Louis Krivine: Sur la constante de Grothendieck. Comptes Rendus Acad. Sci. Paris Sér.
     A-B, 284:445–446, 1977. 88, 112

[35] Michel Ledoux and Michel Talagrand: Probability in Banach Spaces. Springer, 1991. 94, 101

[36] Alexandre Megretski: Relaxations of quadratic programs in operator theory and system
     analysis. In Systems, Approximation, Singular Integral Operators, and Related Topics:
     International Workshop on Operator Theory and Applications. Birkhäuser, 2001. 87

[37] Elchanan Mossel, Ryan O’Donnell, and Krzysztof Oleszkiewicz: Noise stability of functions
     with low influences: Invariance and optimality. In Proc. 46th FOCS, pp. 21–30. IEEE Comp.
     Soc. Press, 2005. To appear, Annals of Mathematics. [doi:10.1109/SFCS.2005.53]. 87


[38] Arkadi Nemirovski, Cornelis Roos, and Tamás Terlaky: On maximization of quadratic form
     over intersection of ellipsoids with common center. Math. Program., 86(3):463–473, 1999.
     [doi:10.1007/s101070050100]. 87

[39] Yurii Nesterov: Global quadratic optimization via conic relaxation, pp. 363–384. Kluwer
     Academic Publishers, 2000. 87

[40] Ryan O’Donnell and Yi Wu: An optimal SDP algorithm for Max-Cut, and equally
     optimal Long Code tests. In Proc. 40th STOC, pp. 335–344. ACM Press, 2008.
     [doi:10.1145/1374376.1374425]. 88

[41] Prasad Raghavendra: Optimal algorithms and inapproximability results for every CSP? In
     Proc. 40th STOC, pp. 245–254. ACM Press, 2008. [doi:10.1145/1374376.1374414]. 89

[42] Prasad Raghavendra and David Steurer: Towards computing the Grothendieck constant. In
     Proc. 19th Symp. on Discrete Algorithms (SODA), pp. 525–534. ACM Press, 2009. 107

[43] James Reeds: A new lower bound on the real Grothendieck constant. Manuscript (http://www.
     dtc.umn.edu/~reedsj/bound2.dvi), 1991. 88, 96, 107

[44] Luca Trevisan, Gregory Sorkin, Madhu Sudan, and David Williamson: Gadgets,
     approximation, and linear programming. SIAM J. Comput., 29(6):2074–2097, 2000.
     [doi:10.1137/S0097539797328847]. 85

[45] Uri Zwick: Outward rotations: A tool for rounding solutions of semidefinite programming
     relaxations, with applications to MAX CUT and other problems. In Proc. 31st STOC, pp.
     679–687. ACM Press, 1999. [doi:10.1145/301250.301431]. 85, 86


AUTHORS

     Subhash Khot
     Department of Computer Science
     Courant Institute of Mathematical Sciences
     New York University
     khot cs nyu edu
     http://cs.nyu.edu/~khot


     Ryan O’Donnell
     Department of Computer Science
     Carnegie Mellon University
     Pittsburgh, PA
     odonnell cs cmu edu
     http://www.cs.cmu.edu/~odonnell




ABOUT THE AUTHORS

   SUBHASH KHOT is an Associate Professor in the Computer Science Department at New
      York University, part of the Courant Institute of Mathematical Sciences. He completed
      his Ph. D. in the summer of 2003 at the Princeton CS Department under the supervision
      of Sanjeev Arora. He stayed in Princeton for another year as a member of the School of
      Mathematics at IAS. He has been an Assistant Professor at the College of Computing at
      Georgia Tech since the fall of 2004, currently on leave.


   RYAN O’DONNELL received a B. S. from the University of Toronto in 1999 and a Ph. D.
     from the MIT Mathematics Department in 2003. His Ph. D. advisor was Madhu Sudan.
     Following this he was a postdoc at IAS for a year in Avi Wigderson’s group, and a
     postdoc at Microsoft Research for two years in Jennifer Chayes’s group. Since 2006 he
     has been an assistant professor in the Computer Science Department at Carnegie Mellon
     University. Ryan’s research interests include Analysis of Boolean Functions, Hardness
     of Approximation, Learning Theory, and Probability. He enjoys his spare time.



