DOKK Library

Sherali--Adams Strikes Back

Authors: Ryan O'Donnell, Tselil Schramm

License CC-BY-3.0

                           THEORY OF COMPUTING, Volume 17 (9), 2021, pp. 1–37
                                        www.theoryofcomputing.org

                                        SPECIAL ISSUE: CCC 2019



                       Sherali–Adams Strikes Back
                               Ryan O’Donnell∗                      Tselil Schramm†
              Received August 31, 2019; Revised October 29, 2020; Published September 28, 2021




       Abstract. Let G be any n-vertex graph whose random walk matrix has its nontrivial
       eigenvalues bounded in magnitude by 1/√∆ (for example, a random graph G of average
       degree Θ(∆) typically has this property). We show that the exp(c · (log n)/(log ∆))-round
       Sherali–Adams linear programming hierarchy certifies that the maximum cut in such a G is
       at most 50.1% (in fact, at most 1/2 + 2^{−Ω(c)}). For example, in random graphs with n^{1.01}
       edges, O(1) rounds suffice; in random graphs with n · polylog(n) edges, n^{O(1/log log n)} = n^{o(1)}
       rounds suffice.
            Our results stand in contrast to the conventional beliefs that linear programming hierar-
       chies perform poorly for MAX-CUT and other CSPs, and that eigenvalue/SDP methods are
       needed for effective refutation. Indeed, our results imply that constant-round Sherali–Adams
       can strongly refute random Boolean k-CSP instances with n^{⌈k/2⌉+δ} constraints; previously
       this had only been done with spectral algorithms or the SOS SDP hierarchy. We also show
       similar results for other classical combinatorial optimization problems such as independent
       set, max clique, and vertex cover.

ACM Classification: G.1.6
AMS Classification: 03F20, 68W25
Key words and phrases: hierarchies, Sherali–Adams, combinatorial optimization
      A conference version of this paper appeared in the Proceedings of the 34th Computational Complexity Conference
(CCC’19) [37].
    ∗ Supported by NSF grants CCF-1618679, CCF-1717606. This material is based upon work supported by the National
Science Foundation under grant numbers listed above. Any opinions, findings and conclusions or recommendations expressed
in this material are those of the author and do not necessarily reflect the views of the National Science Foundation (NSF).
    † This work was supported by NSF grants CCF 1565264 and CNS 1618026.



© 2021 Ryan O’Donnell and Tselil Schramm
Licensed under a Creative Commons Attribution License (CC-BY)                            DOI: 10.4086/toc.2021.v017a009

1     Introduction
Linear programming (LP) is a fundamental algorithmic primitive, and is the method of choice for a huge
number of optimization and approximation problems. Still, there are some very basic tasks where it
performs poorly. A classic example is the simplest of all constraint satisfaction problems (CSPs), the
MAX-CUT problem: Given a graph G = (V, E), partition V into two parts so as to maximize the fraction
of "cut" (crossing) edges. The standard LP relaxation for this problem [5, 38] involves optimizing over
the metric polytope. Using "±1 notation," we have a variable Y_uv for each pair of vertices {u, v} (with
Y_uv supposed to be −1 if the edge is cut, +1 otherwise); the LP is:

    MAX-CUT(G) ≤ max  1/2 − (1/2) · (1/|E|) · Σ_{uv∈E} Y_uv
        s.t.  −1 ≤ Y_uv ≤ 1                  (for all u, v ∈ V)
              −Y_uv − Y_vw − Y_wu ≤ 1        (for all u, v, w ∈ V)
              −Y_uv + Y_vw + Y_wu ≤ 1        (for all u, v, w ∈ V)

While this LP gives the optimal bound for some graphs (precisely, all graphs not contractible to K_5 [5]), it
can give a very poor bound in general. Indeed, although there are graphs with maximum cut arbitrarily
close to 1/2 (e. g., K_n), the above LP bound is at least 2/3 for every graph, since Y_uv ≡ −1/3 is always
a valid solution. Worse, there are graphs G with MAX-CUT(G) arbitrarily close to 1/2 but with LP
value arbitrarily close to 1, i. e., graphs where the integrality ratio is 2 − o(1). For example, this is
true [39] of an Erdős–Rényi G(n, ∆/n) random graph with high probability (whp) when ∆ = ∆(n) satisfies
ω(1) < ∆ < n^{o(1)}.
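To make the relaxation concrete, here is a small sketch (our own illustration, not from the paper, assuming scipy is available) that builds the metric-polytope LP above and solves it for K_5: the LP bound comes out to 2/3, while the true maximum cut fraction of K_5 is 0.6, illustrating the 2/3 floor just discussed.

```python
import itertools
import numpy as np
from scipy.optimize import linprog

def metric_lp_bound(n, edges):
    """LP upper bound on MAX-CUT over the metric polytope.
    One variable Y_uv per pair; Y_uv = -1 means edge {u,v} is cut."""
    pairs = list(itertools.combinations(range(n), 2))
    idx = {p: i for i, p in enumerate(pairs)}
    # linprog minimizes, so minimize sum_{uv in E} Y_uv; the objective
    # 1/2 - (1/(2|E|)) * sum_{uv in E} Y_uv is then maximized.
    c = np.zeros(len(pairs))
    for u, v in edges:
        c[idx[(min(u, v), max(u, v))]] += 1.0
    # Triangle inequalities for every triple {u,v,w}: all-minus, and the
    # three sign patterns with exactly one minus.
    A_ub, b_ub = [], []
    for u, v, w in itertools.combinations(range(n), 3):
        trip = [idx[(u, v)], idx[(v, w)], idx[(u, w)]]
        for signs in ((-1, -1, -1), (-1, 1, 1), (1, -1, 1), (1, 1, -1)):
            row = np.zeros(len(pairs))
            for t, s in zip(trip, signs):
                row[t] = s
            A_ub.append(row)
            b_ub.append(1.0)
    res = linprog(c, A_ub=np.array(A_ub), b_ub=b_ub,
                  bounds=[(-1, 1)] * len(pairs), method="highs")
    return 0.5 - res.fun / (2 * len(edges))

K5_edges = list(itertools.combinations(range(5), 2))
lp_value = metric_lp_bound(5, K5_edges)  # LP bound 2/3; true MAX-CUT(K_5) = 0.6
```

Here Y_uv ≡ −1/3 is in fact an optimal LP solution, matching the observation that the LP bound never drops below 2/3.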
    There have been two main strategies employed to overcome this deficiency: strengthened LPs, and
eigenvalue methods.

Strengthened LPs. One way to try to improve the performance of LPs on MAX - CUT is to add more valid
inequalities to the LP relaxation, beyond just the “triangle inequalities.” Innumerable valid inequalities
have been considered: (2k + 1)-gonal, hypermetric, negative type, gap, clique-web, suspended tree, as
well as inequalities from the Lovász–Schrijver hierarchy; see Deza and Laurent [18, Ch. 28–30] for a
review.
    It is now known that the most principled and general form of this strategy is the Sherali–Adams
LP hierarchy [45], reviewed in Section 2.4. At a high level, the Sherali–Adams LP hierarchy gives
a standardized way to tighten LP relaxations of Boolean integer programs, by adding variables and
constraints. The number of new variables/constraints is parameterized by a positive integer R, called the
number of “rounds.” Given a Boolean optimization problem with n variables, the R-round Sherali–Adams
LP has variables and constraints corresponding to monomials of degree up to R, and thus has size O(n)^R.
A remarkable recent line of work [12, 33] has shown that for any CSP (such as MAX-CUT), the R-round
Sherali–Adams LP relaxation achieves essentially^1 the tightest integrality ratio among all LPs of its size.
Nevertheless, even this most powerful of LPs arguably struggles to certify good bounds for MAX-CUT.
In a line of work [23, 44] concluding in a result of Charikar–Makarychev–Makarychev [13], it was
demonstrated that for any constant ε > 0, there are graphs (random ∆-regular ones, ∆ = O(1)) for which
    ^1 Provided that the LP is an extended formulation, which means that the feasible region is required to contain a representation
of each solution in {±1}^n, so only the objective function of the LP can depend on the specific MAX-CUT instance.



the n^{Ω(1)}-round Sherali–Adams LP has a MAX-CUT integrality gap of 2 − ε. As a consequence, every
MAX-CUT LP relaxation of size up to 2^{n^{Ω(1)}} has such an integrality gap.

Eigenvalue and SDP methods. But for MAX-CUT, there is a simple non-LP algorithm that works
very well to certify that random graphs have maximum cut close to 1/2: the eigenvalue bounds. There
are two slight variants here (which coincide in the case of regular graphs): Given a graph G = (V, E) with
adjacency matrix A and diagonal degree matrix D, the eigenvalue bounds are

                      MAX-CUT(G) ≤ (|V| / (4|E|)) · λ_max(D − A),                     (1.1)

                      MAX-CUT(G) ≤ 1/2 + (1/2) · λ_max(−D^{−1}A).                     (1.2)
Here D − A and D^{−1}A are the Laplacian matrix and the random walk matrix, respectively. The use of
eigenvalues to bound various cut values in graphs (problems like MAX-CUT, MIN-BISECTION, 2-XOR,
expansion, etc.) has a long history dating back to Fiedler and Donath–Hoffman [24, 19] among others
(Inequality (1.1) is specifically from Mohar–Poljak [36]). It was recognized early on that eigenvalue
methods work particularly well for solving planted-random instances (e. g., of 2-XOR [31] and MIN-
BISECTION [10]) and for certifying MAX-CUT values near 1/2 for truly random instances. Indeed, as soon
as one knows (as we now do [46, 22]) that D^{−1}A has all nontrivial eigenvalues bounded in magnitude by
O(1/√∆) (whp) for a random ∆-regular graph (or an Erdős–Rényi G(n, ∆/n) graph with ∆ ≳ log n), the
eigenvalue bound (1.2) certifies that MAX-CUT(G) ≤ 1/2 + O(1/√∆). This implies an integrality ratio
tending to 1; indeed, MAX-CUT(G) = 1/2 + Θ(1/√∆) in such random graphs (whp).
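As a small numerical sketch of the eigenvalue certificate (ours, assuming numpy), the following samples an Erdős–Rényi graph of average degree roughly 100 and evaluates bound (1.2); since D^{−1}A is similar to the symmetric D^{−1/2}AD^{−1/2}, its (real) eigenvalues can be computed with a symmetric eigensolver.

```python
import numpy as np

def eigenvalue_bound(A):
    """Bound (1.2): MAX-CUT(G) <= 1/2 + (1/2) * lambda_max(-D^{-1}A).
    D^{-1}A is similar to D^{-1/2} A D^{-1/2}, so we diagonalize the
    symmetric normalized adjacency matrix instead."""
    deg = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    N = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    lam_min = np.linalg.eigvalsh(N)[0]  # most negative eigenvalue of D^{-1}A
    return 0.5 + 0.5 * (-lam_min)

# Sample G(n, Delta/n) with n = 500, Delta = 100.
rng = np.random.default_rng(0)
n, Delta = 500, 100
upper = np.triu(rng.random((n, n)) < Delta / n, k=1)
A = (upper | upper.T).astype(float)
bound = eigenvalue_bound(A)  # roughly 1/2 + O(1/sqrt(Delta))
```

For this average degree the certified bound lands near 0.6, consistent with the 1/2 + O(1/√∆) behavior described above.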
    Furthermore, if one extends the eigenvalue bound (1.1) above to

                MAX-CUT(G) ≤ min_{U diagonal, tr(U)=0} (|V| / (4|E|)) · λ_max(D − A + U)

(as suggested by Delorme and Poljak [17], following [19, 10]), one obtains the polynomial-time computable
semidefinite programming (SDP) bound. Goemans and Williamson [28] showed this bound has
integrality ratio less than 1.14 ≈ 1/.88 for worst-case G, and it was subsequently shown [49, 21, 15] that
the SDP bound is 1/2 + o(1) whenever MAX-CUT(G) ≤ 1/2 + o(1).

LPs cannot compete with eigenvalues/SDPs? This seemingly striking separation between the performance
of LPs and SDPs in the context of random MAX-CUT instances is now taken as a matter of course.
To quote, e. g., [47],

          [E]xcept for semidefinite programming, we know of no technique that can provide, for
      every graph of max cut optimum ≤ .501, a certificate that its optimum is ≤ .99. Indeed, the
      results of [23, 44, 13] show that large classes of Linear Programming relaxations of max cut
      are unable to distinguish such instances.

Specifically, the last statement here is true for ∆-regular random graphs when ∆ is a certain large constant.
The conventional wisdom is that for such graphs, linear programs cannot compete with semidefinite


programs, and cannot certify even the eigenvalue bound.

      Our main result challenges this conception.

1.1     Our results
We show that whenever the eigenvalue bound (1.2) certifies the bound MAX-CUT(G) ≤ 1/2 + o(1), then
n^{o(1)}-round Sherali–Adams can certify this as well.^2

Theorem 1.1 (Simplified version of Theorem 3.6 and Theorem 3.7). Let G be a simple n-vertex graph
and assume that |λ| < ρ for all eigenvalues λ of G's random walk matrix D^{−1}A (excluding the trivial
eigenvalue of 1). Then for any 1 ≤ c ≤ Ω(log(1/ρ)), Sherali–Adams with n^{O(c/log(1/ρ))} rounds certifies
that MAX-CUT(G) ≤ 1/2 + 2^{−c}.

    For example, if G's random walk matrix has its nontrivial eigenvalues bounded in magnitude by
n^{−.001}, as is the case (whp) for random graphs with about n^{1.002} edges, then Sherali–Adams can certify
MAX-CUT(G) ≤ 50.1% with constantly many rounds. We find this result surprising, and in defiance
of the common belief that polynomial-sized LPs cannot take advantage of spectral properties of the
underlying graph.^3
    (As an aside, in Theorem 4.2, we show that the R-round Sherali–Adams relaxation for MAX-CUT has
value at least 1/2 + Ω(1/R) for every graph G. This at least demonstrates that some dependence of the
refutation strength on the number of Sherali–Adams rounds is always necessary.)
    One might ask whether Theorem 1.1 even requires the assumption of small eigenvalues. That is,
perhaps n^{o(1)}-round Sherali–Adams can certify MAX-CUT ≤ 1/2 + o(1) whenever this is true. As far
as we know, this may be possible; after all, the basic SDP relaxation has this property [49, 21, 15]. On
the other hand, the eigenvalue bound itself does not have this property; there exist graphs with large
(nontrivial) eigenvalues even though the maximum cut is close to 1/2.^4

1.1.1    Subexponential-sized LPs for MAX-CUT in sparse random graphs
One setting in which the spectral radius ρ is understood concretely is in random regular graphs. Building
upon [26, 11, 16], the following was recently shown:

Theorem 1.2 ([46]). There is a fixed constant C such that for all 3 ≤ ∆ ≤ n/2 with ∆n even, it holds that
a uniformly random n-vertex ∆-regular simple graph G satisfies the following with high probability: all
eigenvalues of G's normalized adjacency matrix, other than 1, are at most C/√∆ in magnitude.
    ^2 Actually, there is a slight mismatch between our result and Inequality (1.2): in Theorem 1.1 we need the maximum
eigenvalue in magnitude to be small; i. e., we need λ_min(−D^{−1}A) to be not too negative. This may well just be an artifact of our
proof.
    ^3 We should emphasize that it is not the concomitant n^{O(1)} running time that is surprising, since the eigenvalue bound itself
already achieves that.
    ^4 Consider, for example, a graph given by the union of a ∆-regular random graph on n vertices and a ∆-regular bipartite graph
on √n vertices. This will have MAX-CUT value close to 1/2, but will also have large negative eigenvalues coming from the
bipartite component.



    Combining the above with Theorem 1.1, we have the following consequence for MAX-CUT on random
regular graphs:

Corollary 1.3. Let n, 3 ≤ ∆ ≤ n/2, and 1 ≤ c ≤ Ω(log ∆) be positive integers. Then if G is a random
∆-regular n-vertex graph, with high probability n^{O(c/log ∆)}-round Sherali–Adams can certify that
MAX-CUT(G) ≤ 1/2 + 2^{−c}.

    For example, if ∆ ≥ C · 10^6 (for C the constant in the bound on λ(G)), then n^{1/3} rounds of Sherali–
Adams can certify MAX-CUT(G) ≤ .51. This result serves as a partial converse to [13]:

Theorem 1.4 ([13, Theorem 5.3]). For every fixed integer ∆ ≥ 3, with high probability over the choice of
an n-vertex ∆-regular random graph G,^5 the n^{Θ(1/f(∆))}-round Sherali–Adams relaxation for MAX-CUT
has value at least 0.99, where f(∆) is a function that grows with ∆.^6

    While [13] show that ∆-regular random graphs require Sherali–Adams (and by [33] any LP) relax-
ations of at least subexponential size, our result implies that subexponential LPs are sufficient. Further,
though the function f (∆) is not specified in [13], by tracing back through citations (e. g., [3, 4, 14]) to
extract a dependence, it appears we may take f (∆) = log ∆. So our upper bound is tight as a function of
∆, up to constant factors.
    Prior to our result, it was unclear whether even (n/polylog n)-round Sherali–Adams could certify
that the MAX-CUT value was < 0.9 for sparse random regular graphs. Indeed, it was equally if not more
conceivable that Charikar et al.'s result was not tight, and could be extended to Ω̃(n) rounds. In light of
our result, we are left to wonder whether there are instances of MAX-CUT which have truly exponential
extension complexity.

1.1.2    Refuting random CSPs with linear programs
With minor modifications, our argument extends as well to 2- XOR. Then following the framework in [2],
we have the following consequence for certifying bounds on the value of random k-CSPs:

Theorem 1.5 (Simplified version of Theorem 5.2). Suppose that P : {±1}^k → {0, 1} is a k-ary Boolean
predicate, and that δ, ε > 0. Let E[P] be the probability that a random x ∈ {±1}^k satisfies P. Then
for a random instance I of P on n variables with m ≥ n^{⌈k/2⌉+δ} expected clauses, with high probability
Sherali–Adams can certify that OBJ_I(x) ≤ E[P] + ε using R = O_{ε,δ,k}(1) rounds.

    This almost matches the comparable performance of sum-of-squares and spectral algorithms [2],
which are known to require m ≥ n^{k/2} clauses to certify comparable bounds in polynomial time [30, 43,
34].^7 Prior to our work it was known that Sherali–Adams admits weak refutations (i. e., a certificate that
    ^5 In [13], the graph is actually a pruned random graph, in which o(n) edges are removed; this does not affect compatibility
with our results, as the LP value is Lipschitz and so the pruning changes the LP value by o(1).
    ^6 Though f(∆) is not specified in [13] (and in their proof f(∆) is dictated by a combination of lemmas in prior work) it
appears we can take f(∆) = log ∆.
    ^7 The expert may notice that we require the number of clauses m ≫ n^{⌈k/2⌉}, whereas the best sum-of-squares and spectral
algorithms require only m ≫ n^{k/2}. This is because we do not know how to relate the Sherali–Adams value of the objective
function to its square (local versions of the Cauchy–Schwarz argument result in a loss). Such a relation would allow us to apply
our techniques immediately to prove that Sherali–Adams matches the SOS and spectral performance for odd as well as even k.



OBJ ≤ 1 − o(1)) when m ≥ n^{k/2}, but it was conceivable (and even conjectured) that O(1) rounds could
not certify OBJ ≤ 1 − δ for constant δ at m = o(n^k).
    The result above also extends to t-wise independent predicates as in [2] (see Section 5). Also, one may
extract the dependence on the parameters ε, δ to give nontrivial results when these parameters depend on
n.^8

1.1.3    Independent Set, Clique, and Vertex Cover
Our result also extends to constrained optimization problems such as Independent Set, Vertex Cover,
and Clique in regular^9 graphs. Here again, our results complement strong LP lower bounds obtained by
Charikar et al. and others (e. g., [3, 14]). We prove a theorem for Independent Set in regular graphs:

Theorem 1.6 (Simplified version of Theorem 3.11). Suppose that G is a regular, simple, n-vertex graph
and assume that |λ| < ρ for all eigenvalues of G's random walk matrix D^{−1}A (excluding the trivial 1
eigenvalue). Then for any 1 ≤ c ≤ Ω(log(1/ρ)), Sherali–Adams with n^{O(c/log(1/ρ))} rounds certifies that
INDEPENDENT-SET(G) ≤ 2^{−c}.

    Because the maximum independent set value is related to the minimum vertex cover value via a
proof that is captured by the Sherali–Adams proof system, and because the maximum clique problem is
equivalent to maximum independent set on the complement graph, this also implies that Sherali–Adams
certifies nontrivial bounds on the value of the maximum clique and the minimum vertex cover. We leave
the precise consequences for clique and vertex cover as an exercise for the interested reader.

1.2     Prior work
It is a folklore result that in random graphs with average degree n^δ, 3-round Sherali–Adams (SA) certifies
a MAX-CUT value of at most max(1 − Ω(δ), 2/3) (observed for the special case of δ > 1/2 in [6, 39]); this is
simply because of concentration phenomena, since most edges participate in roughly the same number of
odd cycles of length O(1/δ), after which one can apply the triangle inequality. However, this observation
does not allow one to take the refutation strength independent of the average degree.
     There is some prior work examining the performance of Sherali–Adams hierarchies on random CSPs.
De la Vega and Mathieu [23] show that in dense graphs, with average degree Ω(n), O(1) rounds of
Sherali–Adams certify tight bounds on MAX-CUT. This result heavily leverages the density of the graph,
and is consistent with the fact that dense MAX-CUT is known to admit a PTAS [27]. However, their work
does not give O(1)-round approximations better than 2 − o(1) for graphs with o(n^2) edges.
     Another relevant line of work is a series of LP hierarchy lower bounds (both for Sherali–Adams and
for the weaker Lovász–Schrijver hierarchy) for problems such as MAX-CUT, vertex cover, and sparsest
cut, including [1, 3, 23, 44], and culminating in the already mentioned result of Charikar, Makarychev and
Makarychev; in [13], they give subexponential lower bounds on the number of rounds of Sherali–Adams
required to strongly refute MAX-CUT in random regular graphs. Initially, one might expect that this result
    ^8 Though for 2-XOR and MAX-CUT we have done this explicitly, for higher-arity random CSPs we have left this for the
interested reader.
    ^9 Theorem 3.11 actually gives nontrivial bounds for irregular graphs as well, but what we are able to control is the size of the
clique as weighted by the vertex degree, rather than weighting all vertices equally.



could be strengthened to prove that sparse random graphs require almost-exponential-sized LPs to refute
MAX-CUT; our result demonstrates instead that [13] is almost tight.
     We also mention the technique of global correlation rounding in the sum-of-squares hierarchy,
which was used to give subexponential time algorithms for unique games [8] and polynomial-time
approximations to max-bisection [42]. One philosophical similarity between these algorithms and ours
is that they both relate local properties (correlation among edges) to global properties (correlation of
uniformly random pairs). But [8, 42] use the fact that the relaxation is an SDP (whereas our result is
interesting because it is in the LP-only setting), and the “conditioning” steps that drive their algorithm are
a fundamentally different approach.
     There is much prior work concerned with certifying bounds on random CSPs, and we survey only
some of them here, referring the interested reader to the discussion in [2]. The sequence of articles
[30, 43, 34] establishes sum-of-squares lower bounds for refuting any random constraint satisfaction
problem, and these results are tight via the SOS algorithms of [2, 40]. The upshot is that for k-SAT and
k-XOR,^{10} ω(1) rounds of SOS are necessary to strongly refute an instance with m = o(n^{k/2}) clauses, and
O(1) rounds of SOS suffice when m = Ω̃(n^{k/2}). Because SOS is a tighter relaxation than Sherali–Adams,
the lower bounds [30, 43, 34] apply; our work can be seen to demonstrate that SA does not lag far behind
SOS, strongly refuting with O(1) rounds as soon as m = Ω(n^{⌈k/2⌉+δ}) for any δ > 0.
     In a way, our result is part of a trend in anti-separation results for SDPs and simpler methods for
pseudorandom and structured instances. For example, we have for planted clique that the SOS hierarchy
performs no better than the Lovász–Schrijver+ hierarchy [20, 7], and also no better than a more primitive
class of estimation methods based on local statistics (see, e. g., [41] for a discussion). Similar results hold
for problems relating to estimating the norms of random tensors [32]. Further, in [32] an equivalence is
shown between SOS and spectral algorithms for a large class of average-case problems. Our result shows
that for random CSPs, the guarantees of linear programs are surprisingly not far from the guarantees of
SOS.
     Finally, we mention related work in extended formulations. The sequence of articles [12, 33]
shows that SA lower bounds for CSPs imply lower bounds for any LP relaxation which is an extended
formulation; the stronger (and later) statement is due to Kothari et al. [33], who show that subexponential-
round integrality gaps for CSPs in the Sherali–Adams hierarchy imply subexponential-size lower bounds
for any LP extended formulation. The techniques of these papers are then applied in conjunction with the
techniques of [30, 43, 13] to give subexponential lower bounds against CSPs for any LP. Our results give
an upper limit to the mileage one can get from these lower bounds in the case of MAX-CUT, as we show
that the specific construction of [13] cannot be strengthened much further.

1.3    Techniques
Our primary insight is that while Sherali–Adams is unable to reason about spectral properties globally, it
does enforce that every set of R variables behave locally according to the marginals of a valid distribution,
which induces local spectral constraints on every subset of up to R variables.
    At first, it is unclear how one harnesses such local spectral constraints. But now, suppose that we are
in a graph with a small spectral radius. This implies that random walks mix rapidly, in say t steps, to a
  ^{10} This is more generally true for any CSP that supports a k-wise independent distribution over satisfying assignments.




close-to-uniform distribution. Because a typical pair of vertices at distance t is distributed roughly as
a uniformly random pair of vertices, any subset of R vertices which contains a path of length t already
allows us to relate global and local graph properties.
    To see why this helps, we take for a moment the "pseudoexpectation" view, in which we think of the
R-round Sherali–Adams as providing a proxy Ẽ for the degree-R moments of a distribution over MAX-CUT
solutions x ∈ {±1}^n, with MAX-CUT value

                MAX-CUT(G) = 1/2 − (1/2) · E_{(u,v)∈E(G)} Ẽ[X_u X_v]                  (1.3)

where Ẽ[X_u X_v] is the "pseudo-correlation" of variables X_u, X_v. Because there is no globally consistent
assignment, the pseudo-correlation Ẽ[X_u X_v] for vertices u, v sampled uniformly at random will be close
to 0.^{11} But in any fixed subgraph of size Ω(t), enforcing Ẽ[X_u X_v] ≈ 0 for pairs u, v at distance t has
consequences, and limits the magnitude of correlation between pairs of adjacent vertices as well. In
particular, for any X_S which is the restriction of X to a set S of up to R vertices, the pseudo-second-moment
matrix Ẽ[X_S X_S^⊤] must be PSD, and forcing some entries to 0 gives a constraint on the magnitude of
edge correlations.
    For example, suppose for a moment that we are in a graph G with t = 2, and that S is a star graph in
G, given by one "root" vertex r with k ≤ R − 1 children U = {u_1, . . . , u_k}, and define X = Ẽ[X_S X_S^⊤] ⪰ 0.
Notice that pairs of distinct children u_i, u_j are at distance t = 2 in S. If we suppose for a moment that
Ẽ[X_{u_i} X_{u_j}] = 0 for every u_i ≠ u_j, the only nonzero entries of X are the diagonals (which are all
Ẽ[X_{u_i}^2] = 1), and the entries corresponding to edges from the root to its children, (r, u_i), which are
Ẽ[X_r X_{u_i}]. Now defining the vector c ∈ R^S with a 1 at the root r, c_r = 1, and α (to be chosen later)
on each child u ∈ U, c_u = α, we have from the PSDness of X that

    0 ≤ c^⊤ X c = ||c||_2^2 + Σ_{u∈U} 2 c_r c_u · Ẽ[X_r X_u] = (1 + α^2 k) + 2αk · E_{(u,v)∈E(S)} Ẽ[X_u X_v].

Choosing α = k^{−1/2}, we have that

                              E_{(u,v)∈E(S)} Ẽ[X_u X_v] ≥ −k^{−1/2}.
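This star computation is easy to verify numerically. The sketch below (ours, assuming numpy) builds the idealized pseudo-moment matrix of a k-star and checks that PSDness fails exactly when the root–child correlation drops below −k^{−1/2}, with the test vector c = (1, α, . . . , α) witnessing the failure.

```python
import numpy as np

def star_moment_matrix(k, e):
    """Idealized pseudo-second-moment matrix of a k-star: index 0 is the
    root r, indices 1..k the children; diagonal 1, root-child entries
    e = E~[X_r X_u], child-child entries 0 (the exact-zero idealization)."""
    X = np.eye(k + 1)
    X[0, 1:] = e
    X[1:, 0] = e
    return X

def is_psd(X, tol=1e-9):
    return np.linalg.eigvalsh(X)[0] >= -tol

k = 16
threshold = -1 / np.sqrt(k)
psd_at_threshold = is_psd(star_moment_matrix(k, threshold))      # PSD holds
psd_below = is_psd(star_moment_matrix(k, threshold - 0.01))      # PSD fails

# The vector c = (1, alpha, ..., alpha) with alpha = k^{-1/2} gives
# c^T X c = 2 + 2*sqrt(k)*e, which is negative below the threshold.
c = np.concatenate(([1.0], np.full(k, k ** -0.5)))
quad = c @ star_moment_matrix(k, threshold - 0.01) @ c
```

In the paper the child–child correlations are only approximately zero on average over a distribution of stars; this sketch treats the exact-zero idealization from the text.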

    To deduce this lower bound on the edge correlation, we required that Ẽ[X_{u_i} X_{u_j}] = 0 for all u_i ≠ u_j ∈ S;
this will not necessarily hold for each star S (or even for any star S). But if we take a weighted average
over all k-stars according to a well-chosen distribution D, this will (approximately) hold on average: we
will have

                              E_{S∼D} Ẽ[X_{u_i}^{(S)} X_{u_j}^{(S)}] ≈ 0

for all i ≠ j. Our strategy is to take a carefully chosen average over specific subgraphs S of G with
|S| = Ω(t), according to a distribution D. By our choice of distribution and subgraph, the fact that
the subgraphs locally have PSD pseudo-correlation matrices has consequences for the global average
pseudo-correlation across edges, which in turn gives a bound on the objective value (1.3). This allows
us to show that Sherali–Adams certifies much better bounds than was previously thought possible, by
aggregating local spectral information across many small subgraphs.
  ^{11} This is implicit in our proof, but intuitively it should be true because, e. g., u, v should be connected by equally many even-
and odd-length paths.



Organization
We begin with technical preliminaries in Section 2. In Section 3 we prove our main result. Section 4
establishes a mild lower bound for arbitrary graphs. Finally, Section 5 applies Theorem 1.1 to the
refutation of arbitrary Boolean CSPs.


2      Setup and preliminaries
We begin by recalling preliminaries and introducing definitions that we will rely upon later.


2.1      Random walks on undirected graphs
Here, we recall some properties of random walks in undirected graphs that will be of use to us.

Definition 2.1. Let G = (V, E) be an undirected finite graph, with parallel edges and self-loops allowed,^{12}
and with no isolated vertices. The standard random walk on G is the Markov chain on V in which at each
step one follows a uniformly random edge out of the current vertex. For u ∈ V, we use the notation v ∼ u
to denote that v is the result of taking one random step from u.

Definition 2.2. We write K for the transition operator of the standard random walk on G. That is, K is
obtained from the adjacency matrix of G by normalizing the uth row by a factor of 1/ deg(u).

Definition 2.3. We write π for the probability distribution on V defined by π(v) = deg(v)/(2|E|). As is well
known, this is an invariant distribution for the standard random walk on G, and this Markov chain is
reversible with respect to π. For u ∼ π and v ∼ u, the distribution of (u, v) is that of a uniformly random
(directed) edge from E. We will also use the notation π_∗ = min_{v∈V} {π(v)}.

Definition 2.4. For f, g : V → R we use the notation ⟨f, g⟩_π for E_{u∼π}[f(u)g(u)]. This is an inner product
on the vector space R^V; in case G is regular and hence π is the uniform distribution, it is the usual inner
product scaled by a factor of 1/|V|. It holds that

                              ⟨f, Kg⟩_π = ⟨Kf, g⟩_π = E_{(u,v)∼E}[f(u)g(v)].                 (2.1)


Definition 2.5. A stationary d-step walk is defined to be a sequence (u_0, u_1, . . . , u_d) formed by choos-
ing an initial vertex u_0 ∼ π, and then taking a standard random walk, with u_t ∼ u_{t−1}. Generalizing
Equation (2.1), it holds in this case that

                                                E[f(u_0)g(u_d)] = ⟨f, K^d g⟩_π .

    12 Self-loops count as “half an edge,” and contribute 1 to a vertex’s degree.
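The definitions above admit a quick numerical sanity check (this sketch is illustrative and not part of the original text; the example graph is arbitrary): build K and π from an adjacency matrix, then verify invariance πK = π and the self-adjointness identity (2.1).

```python
import numpy as np

def walk_data(A):
    """Given a symmetric adjacency matrix A (entries = edge multiplicities),
    return the transition matrix K (Definition 2.2) and the stationary
    distribution pi (Definition 2.3)."""
    deg = A.sum(axis=1)
    K = A / deg[:, None]          # row u normalized by 1/deg(u)
    pi = deg / deg.sum()          # pi(v) = deg(v) / (2|E|)
    return K, pi

# A 4-cycle plus one chord, as an arbitrary small example.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [1, 1, 1, 0]], dtype=float)
K, pi = walk_data(A)
assert np.allclose(pi @ K, pi)         # pi is invariant for the walk

# Identity (2.1): <f, K g>_pi = <K f, g>_pi = E over a random edge of f(u)g(v).
rng = np.random.default_rng(0)
f, g = rng.standard_normal(4), rng.standard_normal(4)
inner = lambda a, b: np.sum(pi * a * b)   # <a, b>_pi = E_{u~pi}[a(u) b(u)]
assert np.isclose(inner(f, K @ g), inner(K @ f, g))
```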



                              T HEORY OF C OMPUTING, Volume 17 (9), 2021, pp. 1–37                              9
                                      RYAN O’D ONNELL AND T SELIL S CHRAMM

2.2    Tree-indexed random walks
To prove our main theorem we define a class of homomorphisms we call tree-indexed random walks.

Definition 2.6. Suppose we have a finite undirected tree with vertex set T . A stationary T -indexed
random walk in G is a random homomorphism φ : T → V defined as follows: First, root the tree at an
arbitrary vertex i0 ∈ T . Next, define φ (i0 ) ∼ π. Then, independently for each “child” j of i0 in the tree,
define φ ( j) ∼ φ (i0 ); that is, define φ ( j) ∈ V to be the result of taking a random walk step from φ (i0 ).
Recursively repeat this process for all children of i0 ’s children, etc., until each vertex k ∈ T has been
assigned a vertex φ (k) ∈ V .

    We note that the homomorphism φ defining the T -indexed random walk need not be injective.
Consequently, if T is a tree with maximum degree D, we can still have a T -indexed random walk in a
d-regular graph with d < D.
    The following fact is simple to prove; see, e. g., [35].

Fact 2.7. The definition of φ does not depend on the initially selected root i_0 ∈ T . Further, for any two
vertices i, j ∈ T at tree-distance d, if i = i_0, i_1, . . . , i_d = j is the unique path in the tree between them, then
the sequence (φ(i_0), φ(i_1), . . . , φ(i_d)) is distributed as a stationary d-step walk in G.
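A stationary T-indexed random walk can be sampled exactly as Definition 2.6 describes: draw the root's image from π, then take one walk step per tree edge, outward from the root. The following sketch is illustrative (the tree and host graph are arbitrary choices, not from the paper):

```python
import random

def sample_tree_indexed_walk(tree_adj, root, K_rows, pi, rng):
    """Sample phi : T -> V per Definition 2.6.  tree_adj: adjacency lists of
    the tree T; K_rows[u]: list of (neighbor, prob) pairs for one walk step
    from u; pi: stationary distribution as (vertex, prob) pairs."""
    phi = {root: rng.choices([v for v, _ in pi], [p for _, p in pi])[0]}
    stack, seen = [root], {root}
    while stack:
        i = stack.pop()
        for j in tree_adj[i]:
            if j not in seen:                      # j is a child of i
                seen.add(j)
                nbrs = K_rows[phi[i]]              # one random-walk step
                phi[j] = rng.choices([v for v, _ in nbrs],
                                     [p for _, p in nbrs])[0]
                stack.append(j)
    return phi

# Example: T is the path 0-1-2, G is the triangle (2-regular, so pi is uniform).
tree = {0: [1], 1: [0, 2], 2: [1]}
K_rows = {u: [((u + 1) % 3, 0.5), ((u + 2) % 3, 0.5)] for u in range(3)}
pi = [(u, 1 / 3) for u in range(3)]
phi = sample_tree_indexed_walk(tree, 0, K_rows, pi, random.Random(1))
# phi is a homomorphism: every tree edge maps to a graph edge (a step never stays put).
assert all(phi[j] != phi[i] for i in tree for j in tree[i])
```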

2.3    2XOR and signed random walks
The 2- XOR constraint satisfaction problem is defined by instances of linear equations in Fn2 . For us it
will be convenient to associate with these instances a graph with signed edges, and on such graphs we
perform a slightly modified random walk.

Definition 2.8. We assume that for each pair of vertices (u, v) which are connected by an edge in G, there
is an associated sign ξ_uv = ξ_vu ∈ {±1}.13 We arrange these signs into a symmetric matrix Ξ = (ξ_uv)_uv. If
G has no (u, v) edge then the entry Ξ_uv will not matter; we can take it to be 0.

Definition 2.9. We write K̄ = Ξ ◦ K for the signed transition operator.14 The operator K̄ is self-adjoint
with respect to ⟨·, ·⟩_π, and hence has real eigenvalues. It also holds that

                                    ⟨f, K̄g⟩_π = ⟨K̄f, g⟩_π = E_{(u,v)∼E}[ξ_uv f(u)g(v)] .                 (2.2)

Definition 2.10. We may think of G and Ξ as defining a 2-XOR constraint satisfaction problem (CSP), in
which the task is to find a labeling f : V → {±1} so as to maximize the fraction of edges (u, v) ∈ E for
which the constraint f(u)f(v) = ξ_uv is satisfied. The fraction of satisfied constraints is

                                E_{(u,v)∼E}[ 1/2 + 1/2 ξ_uv f(u)f(v) ] = 1/2 + 1/2 ⟨f, K̄f⟩_π .                 (2.3)


We will typically ignore the 1/2's and think of the 2-XOR CSP as maximizing the quadratic form ⟨f, K̄f⟩_π.
When all signs in the matrix Ξ are −1, we refer to this as the MAX-CUT CSP.
  13 If G has multiple (u, v) edges, we think of them as all having the same sign.
  14 Here ◦ denotes the Hadamard product.




Definition 2.11. A signed stationary d-step walk is a sequence of pairs (u_t, σ_t) ∈ V × {±1} for 0 ≤ t ≤ d,
chosen as follows: first, we choose a stationary d-step walk (u_0, u_1, . . . , u_d) in G; second, we choose
σ_0 ∈ {±1} uniformly at random; finally, we define σ_t = σ_{t−1} ξ_{u_{t−1} u_t}. Generalizing Equation (2.2), it
holds in this case that

                                      E[σ_0 f(u_0) σ_d g(u_d)] = ⟨f, K̄^d g⟩_π .
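The signed operator from Definition 2.9 can be checked numerically as well (an illustrative sketch with an arbitrary signing of the 4-cycle, not an example from the paper): identity (2.2) holds exactly when the uniform directed edge (u, v) is sampled with probability π(u)K_uv.

```python
import numpy as np

# Signed transition operator: Hadamard product of the sign matrix with K.
A = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]], float)
Xi = np.array([[0, 1, 0, -1], [1, 0, -1, 0], [0, -1, 0, 1], [-1, 0, 1, 0]], float)
deg = A.sum(axis=1)
K, pi = A / deg[:, None], deg / deg.sum()
Kbar = Xi * K                                   # the Hadamard product of Def. 2.9

inner = lambda f, g: np.sum(pi * f * g)
rng = np.random.default_rng(0)
f, g = rng.standard_normal(4), rng.standard_normal(4)

# Identity (2.2): <f, Kbar g>_pi equals E[xi_uv f(u) g(v)] over a uniformly
# random directed edge (u, v), which occurs with probability pi(u) * K[u, v].
lhs = inner(f, Kbar @ g)
rhs = sum(pi[u] * K[u, v] * Xi[u, v] * f[u] * g[v]
          for u in range(4) for v in range(4))
assert np.isclose(lhs, rhs)
assert np.isclose(lhs, inner(Kbar @ f, g))      # self-adjointness w.r.t. <.,.>_pi
```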

Definition 2.12. We extend the notion from Theorem 2.6 to that of a signed stationary T-indexed random
walk in G. Together with the random homomorphism φ : T → V , we also choose a random signing
σ : T → {±1} as follows: for the root i_0, the sign σ(i_0) ∈ {±1} is chosen uniformly at random; then, all
other signs are deterministically chosen: for each child j of i_0 we set σ(j) = ξ_{i_0 j} σ(i_0), and in general
σ(k) = ξ_{k′k} σ(k′) where k′ is the parent of k. Again, it is not hard to show that the definition of (φ, σ)
does not depend on the choice of root i_0, and that for any path i_0, i_1, . . . , i_d of vertices in the tree, the
distribution of (φ(i_0), σ(i_0)), (φ(i_1), σ(i_1)), . . . , (φ(i_d), σ(i_d)) is that of a signed stationary d-step walk
in G.


2.4    Proof systems
Our central object of study is the Sherali–Adams proof system, but our results also apply to a weaker
proof system which we introduce below.

Definition 2.13. We define the R-local, degree-D (static) sum of squares (SOS) proof system over
indeterminates X_1, . . . , X_n as follows. The “lines” of the proof are real polynomial inequalities in
X = (X_1, . . . , X_n). The “default axioms” are any real inequalities of the form p(X_{u_1}, . . . , X_{u_R})² ≥ 0, where
p is a polynomial in at most R variables and of degree at most D/2. The “deduction rules” allow one to
derive any nonnegative linear combination of previous lines. This is a sound proof system for inequalities
about n real numbers X_1, . . . , X_n.
    In addition to the default axioms, one may also sometimes include problem-specific “equalities” of
the form q(X) = 0. In this case, one is allowed the additional axioms q(X)s(X) ≥ 0 and q(X)s(X) ≤ 0, as
long as the polynomial q(X)s(X) depends on at most R indeterminates and has degree at most D.

Fact 2.14. The case of R = ∞ (equivalently, R = n) corresponds to the well-known degree-D SOS proof
system.

Definition 2.15. Suppose one includes the Boolean equalities, meaning X_i² − 1 = 0 for all 1 ≤ i ≤ n.15 In
this case D = ∞ is equivalent to D = R, and the corresponding proof system is the well-known R-round
Sherali–Adams proof system. It is well known that every inequality p(X_{u_1}, . . . , X_{u_R}) ≥ 0 that is true for
±1-valued X_{u_1}, . . . , X_{u_R} is derivable in this system.
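One standard way to see the last claim is the identity p(x) = ∑_{a∈{±1}^R} p(a)·1_a(x), where 1_a(x) = ∏_i (1 + a_i x_i)/2 is a square of itself on the cube and has nonnegative weight p(a); the sketch below verifies this expansion monomial-by-monomial on an arbitrary example polynomial (illustrative only, not taken from the paper):

```python
import math
from itertools import product, combinations
from fractions import Fraction

def indicator_poly(a):
    # Multilinear expansion of 1_a(x) = prod_i (1 + a_i*x_i)/2, represented
    # as {frozenset(S): coeff} for the monomial prod_{i in S} x_i.
    n = len(a)
    return {frozenset(S): Fraction(math.prod(a[i] for i in S), 2**n)
            for k in range(n + 1) for S in combinations(range(n), k)}

def evaluate(poly, x):
    return sum(c * math.prod(x[i] for i in S) for S, c in poly.items())

# An arbitrary p that is nonnegative on {-1,1}^3: p = 3 + x0*x1 - x2 + x0*x1*x2.
p = {frozenset(): Fraction(3), frozenset({0, 1}): Fraction(1),
     frozenset({2}): Fraction(-1), frozenset({0, 1, 2}): Fraction(1)}

combo = {}
for a in product([-1, 1], repeat=3):
    pa = evaluate(p, a)
    assert pa >= 0                       # p really is nonnegative on the cube
    for S, c in indicator_poly(a).items():
        combo[S] = combo.get(S, Fraction(0)) + pa * c

# The nonnegative combination reproduces p exactly, monomial by monomial.
assert all(combo.get(S, Fraction(0)) == p.get(S, Fraction(0))
           for S in set(combo) | set(p))
```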

Fact 2.16. There is a poly(n^R, L)-time algorithm based on Linear Programming for determining whether
a given polynomial inequality p(X) ≥ 0 of degree at most R (and rational coefficients of total bit-
complexity L) is derivable in the R-round Sherali–Adams proof system.
  15 Or alternatively, X_i² − X_i = 0 for all i.




    For example, in our primary case of MAX-CUT on the graph G = (V, E) with variables {X_u}_{u∈V(G)},
our axioms will be the Boolean equalities {X_u² − 1 = 0}_{u∈V(G)}, and we wish to certify the best upper
bound possible on the degree-2 polynomial that defines the objective value, 1/2 − 1/2 E_{(u,v)∼E(G)}[X_u X_v]. By
now there are many references which treat the use of the Sherali–Adams proof system in the optimization
of constraint satisfaction problems; for example see [12, 13].

Fact 2.17. We will often be concerned with the R-local, degree-2 SOS proof system, where all lines are
quadratic inequalities. In this case, we could equivalently state that the default axioms are all those
inequalities of the form
                                              x^⊤ P x ≥ 0 ,                                          (2.4)
where x = (X_{u_1}, . . . , X_{u_R}) is a length-R subvector of X, and P is an R × R positive semidefinite (PSD)
matrix.

Remark 2.18. In fact, we will often be concerned with the R-round, degree-2 Sherali–Adams proof
system. Despite the restriction to D = 2, we only know the poly(n^R, L)-time algorithm for deciding
derivability of a given quadratic polynomial p(X) ≥ 0 (of bit-complexity L).

2.5   Linear Algebra
We will use the following basic linear algebraic notions. We will denote the identity matrix by 1. For
an n × m real matrix M, we use ‖M‖ or ‖M‖_op to denote M's operator norm, which is equal to the
maximum singular value. When M is square and m = n, we use ρ(M) to denote the spectral radius of M,
ρ(M) = max{|λ| : ∃v ∈ R^n s.t. Mv = λv}. We will also use the following facts:

Fact 2.19. If M ∈ R^{n×n}, then for all i ∈ [n], ∑_{j∈[n]} M_{ji}² ≤ ‖M‖², and ∑_{j∈[n]} M_{ij}² ≤ ‖M‖².

Proof. By definition of the operator norm,

                                 ‖M‖ = max_{‖v‖=1} ‖Mv‖ ≥ ‖Me_i‖ = ( ∑_{j∈[n]} M_{ji}² )^{1/2} ,

where e_i is the ith standard basis vector. Repeating the same argument using the fact that ‖M‖ = ‖M^⊤‖
proves the second inequality.

Fact 2.20. Let e ∈ R^n be the all-1's vector. If K ∈ R^{n×n} is a transition matrix with stationary distribution
π ∈ R^n, then K′ = K − eπ^⊤ is the projection of K orthogonal to the trivial eigenspace with left eigenvector
π and right eigenvector e.

Proof. This is a consequence of the spectral theorem: viewing K as a linear operator from R^n to R^n, notice
that K is self-adjoint with respect to the inner product ⟨·, ·⟩_π. Thus we can write Kv = ∑_{i=1}^n λ_i φ_i ⟨φ_i, v⟩_π,
for orthonormal eigenfunctions φ_1, . . . , φ_n with corresponding real eigenvalues λ_1 ≥ · · · ≥ λ_n. Since K is
a transition operator, λ_i ≤ 1 for all i. Because e is a right-eigenvector of K with eigenvalue 1, we can take
φ_1 = e, λ_1 = 1, and then (K − eπ^⊤)v = ∑_{i=2}^n λ_i φ_i ⟨φ_i, v⟩_π. From this we see that K − eπ^⊤ has the same
spectral decomposition as K, except λ_1 is replaced by 0.
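Both facts are easy to confirm numerically (an illustrative sketch on an arbitrary random weighted graph, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Fact 2.19: every row and column of M has squared 2-norm at most ||M||^2.
M = rng.standard_normal((5, 5))
op = np.linalg.norm(M, 2)                      # operator norm = max singular value
assert (M**2).sum(axis=0).max() <= op**2 + 1e-9    # columns
assert (M**2).sum(axis=1).max() <= op**2 + 1e-9    # rows

# Fact 2.20: K' = K - e pi^T removes exactly the trivial eigenvalue.
A = rng.integers(1, 4, size=(6, 6)).astype(float)
A = A + A.T                                    # positive weighted undirected graph
deg = A.sum(axis=1)
K, pi = A / deg[:, None], deg / deg.sum()
Kp = K - np.outer(np.ones(6), pi)
assert np.allclose(Kp @ np.ones(6), 0)         # right eigenvector e is killed
assert np.allclose(pi @ Kp, 0)                 # left eigenvector pi is killed
lam = np.sort(np.abs(np.linalg.eigvals(K)))    # reversible => real spectrum
rho = np.abs(np.linalg.eigvals(Kp)).max()
assert np.isclose(lam[-1], 1.0)                # top eigenvalue of K is 1
assert np.isclose(rho, lam[-2])                # rho(K') = next-largest |eigenvalue|
```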


3    2XOR certifications from spider walks
In this section, we prove our main theorem: given a 2- XOR or MAX - CUT instance on a graph G with small
spectral radius, we will show that the R-local degree-2 SOS proof system gives nontrivial refutations with
R not too large. We also prove a similar result for maximum independent set.
    Our strategy is as follows: we select a specific tree T of size ∝ R, and we consider the distribution over
copies of T in our graph given by the T-indexed stationary random walk. We will use this distribution to
define the coefficients for a degree-2, R-local proof that bounds the objective value of the CSP. We will
do this by exploiting the uniformity of the graph guaranteed by the small spectral radius, and the fact
that degree-2 R-local SOS proofs can certify positivity of quadratic forms c^⊤ X|_S X|_S^⊤ c, where X|_S is the
restriction of X to a set S of variables with |S| ≤ R and c ∈ R^{|S|}.
    Intuitively, in the “pseudoexpectation” view, the idea of our proof is as follows. When there is no
globally consistent assignment, a uniformly random pair of vertices u, v ∈ V will have pseudocorrelation
close to zero. On the other hand, if t-step random walks mix to a roughly uniform distribution over vertices
in the graph, then pairs of vertices at distance t will also have pseudocorrelation close to zero. But also,
in our proof system the degree-2 pseudomoments of up to R variables obey a positive-semidefiniteness
constraint. By choosing the tree T with diameter at least t, while also choosing T to propagate the effect
of the low-pseudocorrelation at the diameter to give low-pseudocorrelation on signed edges, we show
that the proof system can certify that the objective value is small. Specifically, we will choose T to be a
spider graph:

Definition 3.1. For integers k, ℓ ∈ N_+, we define a (k, ℓ)-spider graph to be the tree formed by gluing
together k paths of length ℓ at a common endpoint called the root. This spider has kℓ + 1 vertices and
diameter 2ℓ.

    While we were not able to formally prove that the spider is the optimal choice of tree, intuitively, we
want to choose a tree that maximizes the ratio of the number of pairs at maximum distance (since such
pairs relate the local properties to the global structure) to the number of vertices in the tree (because we
need to take our number of rounds R to be at least the size of the tree). Among trees, the spider is the
graph that maximizes this ratio.
    Let us henceforth fix a (k, ℓ)-spider graph, where the parameters k and ℓ will be chosen later. We
write S for the vertex set of this tree (and sometimes identify S with the tree itself).

Definition 3.2. For 0 ≤ d ≤ 2ℓ, we define the matrix A^(d) ∈ R^{S×S} to be the “distance-d” adjacency matrix
of the spider; i. e., A^(d)_{ij} is 1 if dist_S(i, j) = d and is 0 otherwise. (We remark that A^(0) is the identity
matrix.)
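The spider and its distance matrices are straightforward to construct; the sketch below (illustrative, with arbitrary small parameters) checks the vertex count, diameter, and the count of ordered vertex pairs at the maximum distance 2ℓ, namely the k(k−1) pairs of leaves on distinct legs:

```python
import numpy as np
from collections import deque

def spider(k, ell):
    """Adjacency lists of the (k, ell)-spider of Definition 3.1:
    k paths of length ell glued at a root (vertex 0)."""
    adj, v = {0: []}, 1
    for _ in range(k):
        prev = 0
        for _ in range(ell):
            adj[prev].append(v)
            adj[v] = [prev]
            prev, v = v, v + 1
    return adj

def distance_matrices(adj):
    """The A^(d) matrices of Definition 3.2, via BFS from every vertex."""
    n = len(adj)
    dist = np.full((n, n), -1, int)
    for s in adj:
        dist[s, s] = 0
        q = deque([s])
        while q:
            u = q.popleft()
            for w in adj[u]:
                if dist[s, w] < 0:
                    dist[s, w] = dist[s, u] + 1
                    q.append(w)
    return [np.equal(dist, d).astype(int) for d in range(dist.max() + 1)]

k, ell = 3, 2
adj = spider(k, ell)
A = distance_matrices(adj)
assert len(adj) == k * ell + 1              # k*ell + 1 vertices
assert len(A) - 1 == 2 * ell                # diameter is 2*ell
assert (A[0] == np.eye(k * ell + 1)).all()  # A^(0) is the identity
assert A[2 * ell].sum() == k * (k - 1)      # ordered leaf pairs on distinct legs
```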

    The following key technical theorem establishes the existence of a matrix Ψ which will allow us to
define the coefficients in our R-local degree-2 SOS proof. It will be proven in Section 3.3:

Theorem 3.3. For any parameter α ∈ R, there is a PSD matrix Ψ = Ψ_α ∈ R^{S×S} with the following
properties:

                                  ⟨Ψ, A^(0)⟩ = 1 + α^{2ℓ}/(2k) + (α^{2ℓ} − α²)/((k − 1)(α² − 1)) ,
                                  ⟨Ψ, A^(1)⟩ = α ,
                                  ⟨Ψ, A^(d)⟩ = 0 for 1 < d < 2ℓ ,
                                  ⟨Ψ, A^(2ℓ)⟩ = ((1 − 1/k)/2) · α^{2ℓ} .

Here we are using the notation ⟨B, C⟩ for the “matrix (Frobenius) inner product” Tr(B^⊤C).

    The crucial property of Ψ is that, for a well-chosen value of α, it has value Θ(1) on A^(0), value 0 on
A^(d) for 1 < d < 2ℓ, and the values on A^(1) and A^(2ℓ) are related by a 2ℓth power:

Corollary 3.4. Assuming that k ≥ 3^ℓ and taking α = k^{1/2ℓ}, the PSD matrix Ψ satisfies

   3/2 ≤ ⟨Ψ, A^(0)⟩ ≤ 2,   ⟨Ψ, A^(1)⟩ = k^{1/2ℓ},   ⟨Ψ, A^(d)⟩ = 0 for 1 < d < 2ℓ,   ⟨Ψ, A^(2ℓ)⟩ = (k − 1)/2 .

    We will also use the following small technical lemma, which will let us relate the correlation across
vertex pairs when weighted by some matrix M to M’s spectral norm; for example, when M is a power of
the transition matrix of our instance.

Lemma 3.5. Let M ∈ R^{V×V} and recall π_* = min_{v∈V} π(v) > 0. Then the 2-local, degree-2 SOS proof
system can derive
                              E_{u∼π} ∑_{v∈V} M_{uv} X_u X_v ≤ π_*^{−1/2} ‖M‖ · E_{u∼π} X_u² .

Proof. The proof system can derive the following inequality for any γ > 0, since the difference of the
two sides is a perfect square:

                                 M_{uv} X_u X_v ≤ (M_{uv}²/(2γπ(v))) X_u² + (γπ(v)/2) X_v² .

Thus it can derive

                  E_{u∼π} ∑_{v∈V} M_{uv} X_u X_v ≤ E_{u∼π}[ X_u² ∑_{v∈V} M_{uv}²/(2γπ(v)) ] + (γ/2) E_{v∼π} X_v² .            (3.1)

We'll take γ = π_*^{−1/2} ‖M‖. Since we can certainly derive aX_u² ≤ bX_u² whenever a ≤ b, we see that it
suffices to establish
                                      ∑_{v∈V} M_{uv}²/(2γπ(v)) ≤ γ/2

for every outcome of u. But this is implied by ∑_{v∈V} M_{uv}²/π(v) ≤ ‖M‖²/π_* = γ², which indeed holds
since π(v) ≥ π_* for all v ∈ V and ∑_{v∈V} M_{uv}² ≤ ‖M‖² (Fact 2.19).
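Since every line of this derivation is a true inequality over the reals, the lemma's conclusion can be spot-checked numerically for arbitrary data (an illustrative sketch, not from the paper; M, π, and X are random):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
M = rng.standard_normal((n, n))
pi = rng.random(n)
pi /= pi.sum()
X = rng.standard_normal(n)                    # an arbitrary real assignment

# Lemma 3.5 as a real inequality:
lhs = sum(pi[u] * M[u, v] * X[u] * X[v] for u in range(n) for v in range(n))
rhs = pi.min() ** -0.5 * np.linalg.norm(M, 2) * np.sum(pi * X**2)
assert lhs <= rhs + 1e-9

# The key step: with gamma = pi_*^{-1/2} ||M||, each term obeys the
# perfect-square bound  M_uv X_u X_v <= M_uv^2/(2 gamma pi(v)) X_u^2
#                                       + (gamma pi(v)/2) X_v^2.
gamma = pi.min() ** -0.5 * np.linalg.norm(M, 2)
for u in range(n):
    for v in range(n):
        bound = (M[u, v] ** 2 / (2 * gamma * pi[v])) * X[u] ** 2 \
                + (gamma * pi[v] / 2) * X[v] ** 2
        assert M[u, v] * X[u] * X[v] <= bound + 1e-12
```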

    We can now prove the following main theorem:


Theorem 3.6. Suppose we have a graph G = (V, E) with signed transition operator K̄ and π_* =
min_{v∈V} deg(v)/(2|E|). Given parameters k ≥ 3^ℓ, let R = kℓ + 1 and define

                                        β = (k π_*^{−1/2} / (2 k^{1/2ℓ})) · ρ(K̄)^{2ℓ} + 2/k^{1/2ℓ} .

Then R-local, degree-2 SOS can deduce the bound “ρ(K̄) ≤ β”; more precisely, it can deduce the two
inequalities
                               −β ⟨X, X⟩_π ≤ ⟨X, K̄X⟩_π ≤ β ⟨X, X⟩_π .

    Before proving this theorem, let us simplify the parameters. For any ε ∈ (0, ε_0] for ε_0 a universal
constant, we can choose ℓ to be the smallest integer so that (ρ(K̄)/ε)^{2ℓ} π_*^{−1/2} ≤ ε, and k = ⌈(1/ε)^{2ℓ}⌉. This
gives the corollary:

Corollary 3.7. Suppose we have a graph G = (V, E) with signed transition operator K̄ and π_* =
min_{v∈V} deg(v)/(2|E|). Given ε ∈ (ρ(K̄), ε_0] for ε_0 a universal constant, take

                             ℓ = ⌈ (1/4) · log(ε²π_*) / log(ρ(K̄)/ε) ⌉       and       k = ⌈ (1/ε)^{2ℓ} ⌉ .

Then for R = kℓ + 1, it holds that R-local degree-2 SOS can deduce the bound ρ(K̄) ≤ (5/2)ε. In particular,
if we think of G, Ξ as a 2-XOR CSP, it holds that R-round Sherali–Adams can deduce the bound OBJ ≤
1/2 + (5/4)ε.

Proof. We take the parameters as outlined above. The constraints X_u² = 1 imply that R-round Sherali–
Adams can deduce that ⟨X, X⟩_π = 1 whenever R ≥ 2. Further, as noted in eq. (2.3), OBJ(X) = 1/2 +
1/2 ⟨X, K̄X⟩_π. Combining these observations with Theorem 3.6 gives the result.

    Theorem 3.7 implies the 2-XOR version of Theorem 1.1 since in simple graphs, log(1/π_*) = Θ(log n).
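The parameter bookkeeping in Corollary 3.7 can be carried out concretely (an illustrative sketch; the numeric regime below, with spectral radius ~1/√∆ and π_* ~ 1/n, mirrors Theorem 1.1 but the specific numbers are arbitrary):

```python
import math

def sa_rounds(rho, pi_star, eps):
    """Parameter choices from Corollary 3.7, together with the beta of
    Theorem 3.6: given spectral radius rho < eps and minimum stationary
    mass pi_star, return (ell, k, R, beta)."""
    ell = math.ceil(0.25 * math.log(eps**2 * pi_star) / math.log(rho / eps))
    k = math.ceil((1 / eps) ** (2 * ell))
    R = k * ell + 1
    beta = (k * pi_star**-0.5 / (2 * k ** (1 / (2 * ell)))) * rho ** (2 * ell) \
           + 2 / k ** (1 / (2 * ell))
    return ell, k, R, beta

# Example regime: n-vertex graph of average degree Delta, rho ~ Delta^{-1/2}.
n, Delta, eps = 10**6, 10**4, 0.1
ell, k, R, beta = sa_rounds(Delta**-0.5, 1 / n, eps)
assert ell >= 1
assert k >= 3**ell          # the hypothesis of Theorem 3.6 is met
assert beta <= 2.5 * eps    # so ell*k+1 rounds certify rho <= (5/2) eps
```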

Proof of Theorem 3.6. For our (k, ℓ)-spider graph on S, let (φ, σ) be a signed stationary S-indexed
random walk in G. Define x to be the S-indexed vector with x_i = σ(i) X_{φ(i)}. Then letting Ψ be the PSD
matrix from Theorem 3.4, the R-local, degree-2 SOS proof system can derive

                                           ⟨Ψ, xx^⊤⟩ = x^⊤ Ψ x ≥ 0 .

(This is in the form of Inequality (2.4) if we take P = diag(σ) Ψ diag(σ).) Furthermore, the proof system
can deduce this inequality in expectation; namely,

                                      ⟨Ψ, Y⟩ ≥ 0,    where Y = E[xx^⊤] .                                   (3.2)

Now by the discussion in Theorems 2.11 and 2.12,

                            Y_{ij} = E[σ(i) X_{φ(i)} σ(j) X_{φ(j)}] = ⟨X, K̄^{dist_S(i,j)} X⟩_π .         (3.3)


Thus recalling the notation A^(d) from Theorem 3.2,

                                        Y = ∑_{d=0}^{2ℓ} ⟨X, K̄^d X⟩_π A^(d) ,                              (3.4)

and hence from Inequality (3.2) we get that R-local, degree-2 SOS can deduce

       0 ≤ ∑_{d=0}^{2ℓ} ⟨Ψ, A^(d)⟩ ⟨X, K̄^d X⟩_π = c_0 ⟨X, X⟩_π + k^{1/2ℓ} ⟨X, K̄X⟩_π + (1/2)(k − 1) ⟨X, K̄^{2ℓ} X⟩_π ,      (3.5)

for some constant 3/2 ≤ c_0 ≤ 2 (here we used Theorem 3.4). Regarding the last term, we have:

                                  ⟨X, K̄^{2ℓ} X⟩_π = E_{u∼π} ∑_{v∈V} (K̄^{2ℓ})_{uv} X_u X_v .                 (3.6)

If we cared only about the Sherali–Adams proof system with Boolean equalities, we would simply now
deduce

   E_{u∼π} ∑_{v∈V} (K̄^{2ℓ})_{uv} X_u X_v ≤ E_{u∼π} ∑_{v∈V} |(K̄^{2ℓ})_{uv}| ≤ √|V| · E_{u∼π} ‖(K̄^{2ℓ})_{u,·}‖ ≤ √|V| · ρ(K̄^{2ℓ}) = √|V| · ρ(K̄)^{2ℓ} ,

and later combine this with c_0 ⟨X, X⟩_π = c_0. But proceeding more generally, we instead use Theorem 3.5
to show that our proof system can derive

                                   E_{u∼π} ∑_{v∈V} (K̄^{2ℓ})_{uv} X_u X_v ≤ π_*^{−1/2} ρ(K̄)^{2ℓ} ⟨X, X⟩_π .

Putting this into Equation (3.6) and Inequality (3.5) we get

                         ⟨X, K̄X⟩_π ≥ − ((c_0 + (1/2)(k − 1) π_*^{−1/2} ρ(K̄)^{2ℓ}) / k^{1/2ℓ}) ⟨X, X⟩_π ≥ −β ⟨X, X⟩_π .

Repeating the derivation with −K̄ in place of K̄ completes the proof.

3.1   Max-Cut
The following theorem is quite similar to Theorem 3.6. In it, we allow K to have the large eigenvalue 1,
and only certify that it has no large-magnitude negative eigenvalue. The subsequent corollary is deduced
identically to Theorem 3.7.
Theorem 3.8. Given transition operator K for the standard random walk on G, let K′ = K − eπ^⊤, where
e is the all-1's vector and π is the stationary measure. For parameters k ≥ 3^ℓ, let R = kℓ + 1 and define

                                        β = (k π_*^{−1/2} / (2 k^{1/2ℓ})) · ρ(K′)^{2ℓ} + 2/k^{1/2ℓ} .

Then 2R-local, degree-2 SOS can deduce the bound “λ_min(K) ≥ −β”; more precisely, it can deduce the
inequality
                                     ⟨X, K′X⟩_π ≥ −β ⟨X, X⟩_π .


Remark 3.9. Notice that ρ(K′) is equal to the absolute value of the maximum-magnitude eigenvalue
of K when the trivial 1 eigenvalue is excluded (see Theorem 2.20).

Corollary 3.10. Suppose we have a graph G = (V, E) with transition operator K and centered transition
operator K′ = K − eπ^⊤ with e the all-1's vector, and π_* = min_{v∈V} deg(v)/(2|E|). Given ε > ρ(K′), take

                             ℓ = ⌈ (1/4) · log(ε²π_*) / log(ρ(K′)/ε) ⌉       and       k = ⌈ (1/ε)^{2ℓ} ⌉ .

Then for R = kℓ + 1, it holds that 2R-local degree-2 SOS can deduce the bound ρ(K′) ≤ (5/2)ε. In particular,
if we think of G as a MAX-CUT CSP, it holds that R-round Sherali–Adams can deduce the bound
OBJ ≤ 1/2 + (5/4)ε.

    Again, Theorem 3.10 implies Theorem 1.1 since in simple graphs, log(1/π_*) = Θ(log n).

Proof of Theorem 3.8. The proof is a modification of the proof of Theorem 3.6. Letting S be the (k, ℓ)-
spider vertices, instead of taking a signed stationary S-indexed random walk in G, we take two independent
unsigned stationary S-indexed random walks, φ_1 and φ_2. For j ∈ {1, 2}, define x_j to be the S-indexed
vector with ith coordinate equal to X_{φ_j(i)}, and write ẋ for the concatenated vector (x_1, x_2). Also, for
0 < θ < 1 a parameter16 slightly less than 1, let Ψ be the PSD matrix from Theorem 3.4, and define the
PSD block-matrix
                                                   Ψ̇ = (1/2) [ (1/θ)Ψ    −Ψ ]
                                                              [   −Ψ     θΨ ] .

Then as before, the 2R-local, degree-2 SOS proof system can derive

   0 ≤ ⟨Ψ̇, E[ẋẋ^⊤]⟩ = ι ⟨Ψ, Y⟩ − ⟨Ψ, Z⟩,    where ι = (1/θ + θ)/2,    Y = E[xx^⊤],    Z = E[x_1 x_2^⊤] ,      (3.7)

and x denotes a vector with the common distribution of x_1 and x_2 (playing the role of the vector x in the
proof of Theorem 3.6). Similar to Equations (3.3) and (3.4), we now have

                                                 Y = ∑_{d=0}^{2ℓ} ⟨X, K^d X⟩_π A^(d) ,

and by independence of x_1 and x_2 we have

                                          Z = ⟨1, X⟩_π² · J = ⟨1, X⟩_π² · ∑_{d=0}^{2ℓ} A^(d) .

Thus applying Theorem 3.4 to Inequality (3.7), our proof system can derive

 0 ≤ ι · ( c_0 ⟨X, X⟩_π + k^{1/2ℓ} ⟨X, KX⟩_π + (1/2)(k − 1) ⟨X, K^{2ℓ} X⟩_π ) − E_{u∼π}[X_u]² · ( c_0 + k^{1/2ℓ} + (1/2)(k − 1) ) .      (3.8)

  16 This parameter is introduced to fix a small annoyance; the reader might like to imagine θ = 1 at first.




By selecting θ appropriately, we can arrange for the factor c_0 + k^{1/2ℓ} + (1/2)(k − 1) on the right to equal
ι · (1/2)(k − 1). Inserting this choice into Inequality (3.8) and then dividing through by ι, we conclude that
the proof system can derive

                       0 ≤ c_0 ⟨X, X⟩_π + k^{1/2ℓ} ⟨X, KX⟩_π + (1/2)(k − 1) ( ⟨X, K^{2ℓ} X⟩_π − ⟨1, X⟩_π² ) ,

cf. Inequality (3.5). Recalling now that K has the constantly-1 function as a right eigenvector with
eigenvalue 1, since K is self-adjoint under ⟨·, ·⟩_π (see Theorem 2.20) we have the identity

                 ⟨X, K^{2ℓ} X⟩_π − ⟨1, X⟩_π² = ⟨X, K^{2ℓ} X⟩_π − ⟨X, eπ^⊤ X⟩_π = ⟨X, (K − eπ^⊤)^{2ℓ} X⟩_π .

Now the remainder of the proof is just as in Theorem 3.6, with K − eπ^⊤ in place of the signed operator K̄,
except we do not have the step of repeating the derivation with the negated operator.
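The θ demanded by this proof always exists: with ι = (θ + 1/θ)/2, the equation ι · (k−1)/2 = c_0 + k^{1/2ℓ} + (k−1)/2 forces ι > 1, and θ + 1/θ = 2ι then has a root in (0, 1). A small numerical sketch (the sample values of c_0, k, ℓ are arbitrary, chosen only to satisfy k ≥ 3^ℓ):

```python
import math

def theta_for(c0, k, ell):
    """Solve for theta in (0, 1) so that, with iota = (theta + 1/theta)/2,
    iota * (k-1)/2 = c0 + k^{1/(2 ell)} + (k-1)/2, as in the proof of
    Theorem 3.8 (Psi-dot is PSD for any 0 < theta < 1)."""
    iota = (c0 + k ** (1 / (2 * ell)) + (k - 1) / 2) / ((k - 1) / 2)
    # theta + 1/theta = 2*iota; take the root lying in (0, 1):
    return iota - math.sqrt(iota**2 - 1)

c0, k, ell = 2.0, 81, 4          # arbitrary illustrative values; 81 = 3^4
theta = theta_for(c0, k, ell)
iota = (theta + 1 / theta) / 2
assert 0 < theta < 1
assert math.isclose(iota * (k - 1) / 2,
                    c0 + k ** (1 / (2 * ell)) + (k - 1) / 2)
```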

3.2     Independent Set
Given a graph G = (V, E), consider the Sherali–Adams LP relaxation for maximum independent set
using formal variables Z_u for each vertex u ∈ V (which represent the 0/1 indicators for membership in the
independent set):17

       INDEPENDENT-SET(G) ≤ max (1/|V|) ∑_{u∈V} Z_u        s.t.        Z_u² = Z_u        (for all u ∈ V)
                                                                     Z_u Z_v = 0        (for all (u, v) ∈ E)

    We show that in regular and near-regular graphs with bounded spectral radius, the Sherali–Adams
LP can certify nontrivial bounds on the value of the maximum independent set.
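The constraints Z_u Z_v = 0 on edges force ⟨Z, KZ⟩_π = 0 for any genuine independent-set indicator, which is how the A^(1) term disappears in the proof of Theorem 3.11 below. An illustrative numerical check on an arbitrary small example (the 5-cycle):

```python
import numpy as np
from itertools import combinations

n = 5
A = np.zeros((n, n))
for u in range(n):                               # build C_5, the 5-cycle
    A[u, (u + 1) % n] = A[(u + 1) % n, u] = 1
deg = A.sum(axis=1)
K, pi = A / deg[:, None], deg / deg.sum()

indep_pairs = 0
for u, v in combinations(range(n), 2):
    if A[u, v] == 0:                             # {u, v} is an independent set
        indep_pairs += 1
        z = np.zeros(n)
        z[[u, v]] = 1
        assert np.allclose(z**2, z)              # the constraint Z_u^2 = Z_u
        assert np.isclose(np.sum(pi * z * (K @ z)), 0)   # <z, K z>_pi = 0
```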

Theorem 3.11. Given transition operator K for the standard random walk on G, let K′ = K − eπ^⊤, where
e is the all-1's vector. For parameters k ≥ 3^ℓ, let R = kℓ + 1 and define

                                        β = (k π_*^{−1/2} / (2 k^{1/2ℓ})) · ρ(K′)^{2ℓ} + 2/k^{1/2ℓ} .

Then 2R-local, degree-2 SOS can deduce the bound “E_{u∼π}[Z_u] ≤ β.”

    Again, in the following corollary we select parameters for convenience:

Corollary 3.12. Suppose we have a regular graph G = (V, E) with transition operator K and centered
transition operator K′ = K − eπ^⊤ for e the all-1's vector. Given ε > ρ(K′), take

                             ℓ = ⌈ (1/4) · log(ε²n) / log(ρ(K′)/ε) ⌉       and       k = ⌈ (1/ε)^{2ℓ} ⌉ .

Then for R = kℓ + 1, it holds that 2R-local degree-2 SOS can deduce the bound INDEPENDENT-SET(G) ≤
(5/2)ε.
  17 We have changed to work with Z variables to emphasize that they are 0/1 valued rather than ±1-valued.




Proof of Theorem 3.11. The proof is a slight modification of the proof of Theorem 3.8. Letting S be the
(k, ℓ)-spider vertices, we again take two independent unsigned stationary S-indexed random walks, φ_1
and φ_2. For j ∈ {1, 2}, define z_j to be the S-indexed vector with ith coordinate equal to Z_{φ_j(i)}, and write
ż for the concatenated vector (z_1, z_2). Also, define the PSD block matrix

                                             Ψ̇ = (1/2) [  Ψ    −Ψ ]
                                                        [ −Ψ     Ψ ] .

Then as before, the 2R-local, degree-2 SOS proof system can derive

                0 ≤ ⟨Ψ̇, E[żż^⊤]⟩ = ⟨Ψ, A⟩ − ⟨Ψ, B⟩,    where A = E[zz^⊤],    B = E[z_1 z_2^⊤] ,           (3.9)
where z denotes the common distribution of z1 and z2 . Similar to Equations (3.3) and (3.4), we have
                                                 2`
                                           A = ∑ hZ, K d Ziπ A(d) ,
                                                 d=0

with the additional twist that our proof system certifies hZ, KZiπ = 0, since we have the constraints that
Zu Zv = 0 on edges. By the independence of z1 and z2 ,
                              B = ⟨1, Z⟩_π^2 · J = ⟨1, Z⟩_π^2 · ∑_{d=0}^{2ℓ} A^(d).

Thus applying Theorem 3.4 to Inequality (3.9), our proof system can derive

        0 ≤ c_0 ⟨Z, Z⟩_π + (1/2)(k − 1)⟨Z, K^{2ℓ} Z⟩_π − ⟨1, Z⟩_π^2 · (c_0 + k^{1/2ℓ} + (1/2)(k − 1)) .        (3.10)

Recalling now that K has the constantly-1 function e as a right eigenvector with eigenvalue 1, since K is
self-adjoint under ⟨·,·⟩_π (see Theorem 2.20) we have the identity

        ⟨Z, K^{2ℓ} Z⟩_π − ⟨1, Z⟩_π^2 = ⟨Z, K^{2ℓ} Z⟩_π − ⟨Z, eπ^⊤ Z⟩_π = ⟨Z, (K − eπ^⊤)^{2ℓ} Z⟩_π .

Now the remainder of the proof is similar to that of Theorem 3.8. We observe that ⟨Z, Z⟩_π = E_{u∼π} Z_u from
the constraint Z_u = Z_u^2, and ⟨1, Z⟩_π^2 = E_{u∼π}[Z_u]^2. We apply these observations along with Theorem 3.5
to Inequality (3.10) to get

        0 ≤ c_0 E_{u∼π}[Z_u] + (1/2)(k − 1) π_*^{−1/2} ρ(K − J)^{2ℓ} E_{u∼π}[Z_u] − (c_0 + k^{1/2ℓ}) · E_{u∼π}[Z_u]^2

and then algebraic manipulations along with the bound 0 ≤ c0 ≤ 2 give us the result.

3.3   A technical construction of coefficients on the spider
Here, we prove Theorem 3.3. The construction of Ψ is technical, and we cannot offer strong intuition
for it. Roughly speaking, we build Ψ as a sum of ` + 1 quadratic forms of vectors, which ensures that
Ψ  0. The first vector adds mass to Ψ’s correlation with A(0) , A(1) and A(2) ; each successive tth vector
for t ∈ {0, . . . , ` − 1} adds mass to the A(2t) and A(2t+4) correlations while subtracting mass from the
A(2t+2) correlation. By tuning everything carefully, and finally making a small rank-1 adjustment, we
ensure that the only nonzero correlations are on A(d) for d = 0, 1, 2`.


Proof of Theorem 3.3. We are considering the (k, `)-spider graph on vertex set S. We write Vt for the set
of all vertices at distance t from the root (so |V0 | = 1 and |Vt | = k for 1 ≤ t ≤ `). We will be considering
vectors in R^S, with coordinates indexed by the vertex set S. For 0 ≤ t ≤ ℓ define the vector

                                         μ_t = avg_{i∈V_t} {α^t e_i} ,

where ei = (0, . . . , 0, 1, 0, . . . , 0) is the vector with the 1 in the ith position. Further define vectors

                                         χ = μ_0 + μ_1 ,
                                         ψ_t = μ_t − μ_{t+2}   for 0 ≤ t < ℓ ,

with the understanding that μ_{ℓ+1} = 0. Next, define the PSD matrix

                                      Ψ̃ = χ χ^⊤ + ∑_{t=0}^{ℓ−1} ψ_t ψ_t^⊤ .

This will almost be our desired final matrix Ψ. Let us now compute

                              ⟨Ψ̃, A^(d)⟩ = χ^⊤ A^(d) χ + ∑_{t=0}^{ℓ−1} ψ_t^⊤ A^(d) ψ_t .

To do this, we observe that

                            μ_s^⊤ A^(d) μ_t = α^{s+t} · Pr_{i∼V_s, j∼V_t}[dist_S(i, j) = d] ,

and

                            μ_0^⊤ A^(d) μ_t = μ_t^⊤ A^(d) μ_0 = { α^t   if d = t ,
                                                                 { 0     else;

        and for s, t > 0,   μ_s^⊤ A^(d) μ_t = { (1/k)α^{s+t}        if d = |s − t| ,
                                               { (1 − 1/k)α^{s+t}    if d = s + t ,
                                               { 0                   else .

From this we can compute the following (with a bit of effort):

                        ⟨Ψ̃, A^(0)⟩    = 2 + (2/k)α^2 + (2/k)α^4 + · · · + (2/k)α^{2ℓ−2} + (1/k)α^{2ℓ}
                        ⟨Ψ̃, A^(1)⟩    = 2α
                        ⟨Ψ̃, A^(2)⟩    = −(2/k)α^2 − (2/k)α^4 − (2/k)α^6 − · · · − (2/k)α^{2ℓ−2}
                        ⟨Ψ̃, A^(2t+1)⟩ = 0,    1 ≤ t < ℓ
                        ⟨Ψ̃, A^(2t)⟩   = 0,    1 < t < ℓ
                        ⟨Ψ̃, A^(2ℓ)⟩   = (1 − 1/k)α^{2ℓ}



Now, for a parameter η > 0 to be chosen shortly, we finally define the PSD matrix

                                          Ψ = (1/2) Ψ̃ + η μ_1 μ_1^⊤ .

We have

                             ⟨η μ_1 μ_1^⊤, A^(d)⟩ = { η (1/k) α^2        if d = 0 ,
                                                     { η (1 − 1/k) α^2   if d = 2 .

Therefore by carefully choosing

                                   η = (1/(k − 1)) · (α^{2ℓ−2} − 1)/(α^2 − 1) ,
we get all of the desired inner products in the theorem statement.
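These inner products are concrete enough to check numerically. The sketch below is our own illustration (not part of the paper): it builds a small (k, ℓ)-spider, the distance-indicator matrices A^(d), and the matrix Ψ defined above, then confirms that ⟨Ψ, A^(1)⟩ = α while the correlations with A^(d) vanish for all 1 < d < 2ℓ.

```python
import numpy as np

def spider_check(k=3, ell=3, alpha=0.7):
    """Numerically verify the inner products <Psi, A^(d)> on the (k, ell)-spider."""
    # Vertices: the root is (0, 0); leg j has vertices (j, t) for 1 <= t <= ell.
    verts = [(0, 0)] + [(j, t) for j in range(k) for t in range(1, ell + 1)]
    n = len(verts)

    def dist(u, v):
        (ju, tu), (jv, tv) = u, v
        if tu == 0 or tv == 0 or ju == jv:   # along a single leg (or from the root)
            return abs(tu - tv)
        return tu + tv                       # between two different legs, via the root

    # A^(d) is the 0/1 indicator matrix of graph distance exactly d.
    A = [np.array([[float(dist(u, v) == d) for v in verts] for u in verts])
         for d in range(2 * ell + 1)]

    # mu_t = avg_{i in V_t} alpha^t e_i; chi = mu_0 + mu_1; psi_t = mu_t - mu_{t+2}.
    def mu(t):
        idx = [i for i, (_, s) in enumerate(verts) if s == t]
        vec = np.zeros(n)
        vec[idx] = alpha ** t / len(idx)
        return vec

    mus = [mu(t) for t in range(ell + 1)] + [np.zeros(n)]   # mu_{ell+1} = 0
    chi = mus[0] + mus[1]
    Psi_t = np.outer(chi, chi) + sum(np.outer(mus[t] - mus[t + 2], mus[t] - mus[t + 2])
                                     for t in range(ell))
    eta = (alpha ** (2 * ell - 2) - 1) / ((k - 1) * (alpha ** 2 - 1))
    Psi = Psi_t / 2 + eta * np.outer(mus[1], mus[1])
    return [np.sum(Psi * A[d]) for d in range(2 * ell + 1)]

ips = spider_check()
assert abs(ips[1] - 0.7) < 1e-9                       # <Psi, A^(1)> = alpha
assert all(abs(ips[d]) < 1e-9 for d in range(2, 6))   # zero for 1 < d < 2*ell
```

Varying k, ℓ, α in this sketch also confirms the table of inner products for Ψ̃ computed above.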


4      Lower bounds
In this section, we show that degree-R Sherali–Adams cannot refute a random 2-XOR or MAX-CUT
instance better than 1/2 + Ω(1/R). This is a straightforward application of the framework of Charikar,
Makarychev and Makarychev [13]. In that article, the authors show that if every subset of r points in a
metric space can be locally embedded into the unit sphere, then Goemans–Williamson rounding can be
used to give a Θ(r)-round Sherali–Adams feasible point. The upshot is the following theorem appearing
in [13] (where it is stated in slightly more generality, for the 0/1 version of the cut polytope):
Theorem 4.1 (Theorem 3.1 in [13]). Let (X, ρ) be a metric space, and assume that every r = 2R + 3
points of (X, ρ) isometrically embed in the Euclidean sphere of radius 1. Then the following point is
feasible for R rounds of the Sherali–Adams relaxation for the cut polytope:18

                            Ẽ[X_i X_j] = 1 − (2/π) arccos(1 − (1/2) ρ(i, j)^2) .
Proposition 4.2. In any 2-XOR or MAX-CUT instance, R rounds of Sherali–Adams cannot certify that

                                OBJ(x) < 1/2 + 1/(2πR) − 1/(3R^2) .

Proof. Suppose that we are given a 2-XOR (equivalently, MAX-CUT) instance on the graph G, so that on
each edge (i, j) ∈ E(G) we have the constraint X_i X_j b_ij = 1 for some b_ij ∈ {±1}. Define the metric space
(X, ρ) as follows: let X = {1, . . . , n} have a point for each vertex of G, and set ρ(i, j) = √(2(1 − b_ij/r))
for r = 2R + 3.
    We claim that any r points of X embed isometrically into the Euclidean sphere of radius 1. To see
this, fix a set S ⊂ X, and define the |S| × |S| matrix B_S so that

                                 (B_S)_ij = { b_ij/r   if (i, j) ∈ E(G),
                                            { 0        otherwise .
    18 We are here using the standard branch of arccos, returning values in [0, π).




So long as |S| ≤ r, the matrix M_S = 1 + B_S is diagonally dominant, and therefore positive semidefinite, so
in particular there must exist V ∈ R^{|S|×|S|} with M_S = VV^⊤. For each i ∈ S, let v_i be the row of
V indexed by i. Since 1 = (M_S)_{i,i} = ‖v_i‖^2 for each i ∈ S, we assign to i ∈ S the unit vector v_i; further,
for every pair i, j ∈ S, ‖v_i − v_j‖^2 = 2 − 2 b_ij/r = ρ(i, j)^2.
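This factorization step can be checked directly on a hypothetical small instance. The helper below is our own sketch (not part of the proof); it uses an eigendecomposition in place of an explicit Cholesky factor, which is an equivalent way to factor a PSD matrix.

```python
import numpy as np

def sphere_embedding(edges, signs, n, r):
    """Factor M_S = 1 + B_S = V V^T and return unit vectors realizing the metric rho."""
    B = np.zeros((n, n))
    for (i, j), b in zip(edges, signs):
        B[i, j] = B[j, i] = b / r
    M = np.eye(n) + B                # diagonally dominant when n <= r, hence PSD
    w, Q = np.linalg.eigh(M)
    return Q @ np.diag(np.sqrt(np.clip(w, 0.0, None)))   # rows are the v_i

# Hypothetical 4-cycle with alternating signs, and r = 2R + 3 for R = 1.
edges, signs, r = [(0, 1), (1, 2), (2, 3), (3, 0)], [1, -1, 1, -1], 5
V = sphere_embedding(edges, signs, n=4, r=r)
for i in range(4):
    assert abs(np.linalg.norm(V[i]) - 1) < 1e-9      # unit vectors
for (i, j), b in zip(edges, signs):
    d2 = np.linalg.norm(V[i] - V[j]) ** 2
    assert abs(d2 - (2 - 2 * b / r)) < 1e-9          # ||v_i - v_j||^2 = rho(i, j)^2
```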
    Applying Theorem 4.1, we have that the solution

        Ẽ[X_i X_j] = 1 − (2/π) arccos(1 − (1/2) · 2(1 − b_ij/r)) = 1 − (2/π) arccos(b_ij/r)

is feasible. For convenience, let f(z) = 1 − (2/π) arccos(z). We use the following properties of f:

Claim 4.3. The function f(z) = 1 − (2/π) arccos(z) exhibits the rotational symmetry f(z) = −f(−z), and
further f(z) ≥ (2/π) z for z ∈ [0, 1].
    We give the proof of the claim (using straightforward calculus) below. Now, because f(z) = −f(−z),
we have that

                  b_ij · Ẽ[X_i X_j] = b_ij · f(b_ij/r) = f(1/r) ≥ (2/π) · (1/r) ,

where to obtain the final inequality we have used that for z ∈ [0, 1], f(z) ≥ (2/π) z ≥ 0. We conclude that
R-round Sherali–Adams is unable to certify that OBJ < 1/2 + 1/(πr) = 1/2 + (1/π) · 1/(2R + 3), as desired.

Proof of Theorem 4.3. The rotational symmetry follows from simple manipulations:

        f(z) − (−f(−z)) = 2 − (2/π)(arccos(z) + arccos(−z)) = 2 − (2/π) arccos(−1) = 0 .

For the second claim, we use that the derivative of f(z) − (2/π)z is nonnegative on the interval [0, 1):

        ∂/∂z ( f(z) − (2/π)z ) = (2/π) · 1/√(1 − z^2) − 2/π ≥ 0   for |z| < 1 ,

and that at z = 0, f(z) − (2/π)z = 0.
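Both properties of f are easy to spot-check numerically; this is our own sanity check, not part of the argument.

```python
import math

# f(z) = 1 - (2/pi) * arccos(z), as in Claim 4.3.
f = lambda z: 1 - (2 / math.pi) * math.acos(z)

for t in range(101):
    z = t / 100
    assert abs(f(z) + f(-z)) < 1e-12           # rotational symmetry f(z) = -f(-z)
    assert f(z) >= (2 / math.pi) * z - 1e-12   # f(z) >= (2/pi) z on [0, 1]
```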


5    Refutation for any Boolean CSP
In this section, we argue that Sherali–Adams can also refute non-trivial Boolean CSPs. For integers
n, t ≥ 1, we will use the notation [n]^t to denote the set of sequences of length t in elements of [n]. First, for
any predicate P : {±1}^k → {0, 1} we define a parameterized distribution over CSP instances with constraints
from P:

Definition 5.1. Let P : {±1}^k → {0, 1} be a predicate. Then we define a random instance of P on n
variables with m expected clauses to be an instance sampled as follows: define p = m/n^k, and for each
sequence S ∈ [n]^k, independently with probability p we sample a uniformly random string ζ_S ∈ {±1}^k
and add the constraint that P(x_S ◦ ζ_S) = 1, where ◦ denotes the entry-wise (or Hadamard) product.
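This sampling model transcribes directly into code. The sketch below is our own illustration; the predicate P3 is a hypothetical example (3-XOR, with −1 as the satisfying product), not a predicate singled out by the paper.

```python
import itertools, random

def sample_instance(P, k, n, m, rng):
    """Sample a random instance of CSP(P): each sequence S in [n]^k independently
    receives a constraint P(x_S o zeta_S) = 1 with probability p = m / n^k."""
    p = m / n ** k
    inst = []
    for S in itertools.product(range(n), repeat=k):
        if rng.random() < p:
            zeta = tuple(rng.choice([-1, 1]) for _ in range(k))
            inst.append((S, zeta))
    return inst

def objective(inst, x, P):
    """Fraction of constraints satisfied by the assignment x in {±1}^n."""
    sat = sum(P(tuple(x[i] * z for i, z in zip(S, zeta))) for S, zeta in inst)
    return sat / max(1, len(inst))

rng = random.Random(0)
P3 = lambda z: int(z[0] * z[1] * z[2] == -1)   # hypothetical predicate: 3-XOR
inst = sample_instance(P3, k=3, n=10, m=200, rng=rng)
x = [rng.choice([-1, 1]) for _ in range(10)]
assert 0.0 <= objective(inst, x, P3) <= 1.0
```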


   This is one of several popular models, and in our case it is the most convenient to work with. By
employing some manipulations, results from this model transfer readily to the others (see for example
Appendix D of [2] for details).
   Our result is as follows:

Theorem 5.2. Suppose that P : {±1}^k → {0, 1} and that δ, ε > 0 are fixed constants. Let E[P] be the
probability that a random x ∈ {±1}^k satisfies P. Then with high probability, for a random instance I of P
on n variables with m ≥ n^{⌈k/2⌉+δ} expected clauses, the R-round Sherali–Adams proof system can certify
that OBJ_I(x) ≤ E[P] + ε with R = O_{ε,δ,k}(1) rounds. More specifically,

                    R = kℓ · (3 · 2^{k/2−1}/ε)^{2ℓ} + k    for    ℓ = ⌈⌈k/2⌉/(2δ)⌉.

   We can also prove a more fine-grained result, to obtain strong refutation at lower clause densities
when the predicate has certain properties.

Definition 5.3. We say that a predicate P : {±1}k → {0, 1} is η-far from t-wise supporting if every
t-wise uniform distribution has probability mass at least η on the set of unsatisfying assignments P−1 (0).

Theorem 5.4. Suppose that P : {±1}^k → {0, 1} is η-far from t-wise supporting, and that δ, ε > 0. Then
with high probability, for a random instance I of P on n variables and m ≥ n^{⌈t/2⌉+δ} expected clauses,
the R-round Sherali–Adams proof system can certify that OBJ_I(x) ≤ 1 − η + ε with R = O_{ε,δ,t}(1) rounds.
More specifically,

                    R = tℓ · (3 · 2^{t/2−1}/ε)^{2ℓ} + t    for    ℓ = ⌈⌈t/2⌉/(2δ)⌉.

    Following the strategy introduced in [2], we will do this by first refuting weighted random instances
of k- XOR for k ≥ 1. After this, any predicate P : {±1}k → {0, 1} can be decomposed according to its
Fourier decomposition, which will yield a weighted sum of t- XOR instances for t ≤ k, and our proof
system will refute each individually.


5.1   Higher-arity XOR
Ultimately, we will reduce each k-CSP to a sum over weighted t- XOR instances with t ≤ k:

Definition 5.5. Let W be a distribution over signed integers. We say that I is a random k-XOR instance
weighted according to W if it is sampled as follows: for each sequence S ∈ [n]^k, we take b_S to be a
uniformly random sample from W, and finally set the objective function to be ∑_S b_S · x_S.

    Following the standard strategy introduced by [29, 25] and subsequently honed in many articles, we
will reduce refuting these t- XOR instances to refuting 2- XOR instances.


5.1.1   Even k-XOR
In this case, we perform a standard transformation to view the k- XOR instance as a 2- XOR instance on
super-vertices given by subsets of vertices of size k/2.

Definition 5.6. Suppose k > 1 is an integer and I is a 2k-XOR instance on n variables x_1, . . . , x_n, with
objective ∑_{U∈[n]^{2k}} b_U · x_U. Then we let its flattening, I_flat, be the 2-XOR instance on n^k variables given by
associating a new variable y_S with each sequence S ∈ [n]^k, partitioning each U ∈ [n]^{2k} into the sequences
S, T with S containing the first k elements and T containing the last k, and taking the
objective function ∑_{S,T} b_U · y_S y_T.

Lemma 5.7. Suppose that I is a 2k- XOR instance, and let Iflat be the 2- XOR instance given by its
flattening. Then if the R-round Sherali–Adams proof system can certify that OBJIflat (x) ≤ c, then the
k · R-round Sherali–Adams proof system can certify that OBJI (x) ≤ c.

Proof. Every degree-R Sherali–Adams proof for Iflat can be transformed into a Sherali–Adams proof of
degree at most kR for I by applying the transformation yS = ∏i∈S xi = xS . Further, this transformation
exactly relates the objective functions of Iflat and I. This proves the claim.
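The substitution in this proof can be checked mechanically on a small random instance. This is our own sketch; the weight distribution below is an arbitrary illustrative choice.

```python
import itertools, random

def mono(x, S):
    """Evaluate the monomial x_S = prod_{i in S} x_i."""
    out = 1
    for i in S:
        out *= x[i]
    return out

def flatten(instance, k):
    """Flatten a 2k-XOR objective {U: b_U} into a 2-XOR objective keyed by (S, T),
    where S is the first half of the sequence U and T the second half."""
    return {(U[:k], U[k:]): b for U, b in instance.items()}

rng = random.Random(1)
n, k = 4, 2
inst = {U: rng.choice([-1, 0, 0, 1])
        for U in itertools.product(range(n), repeat=2 * k)}
flat = flatten(inst, k)

# Under the substitution y_S = prod_{i in S} x_i, both objectives agree on every x.
for _ in range(5):
    x = [rng.choice([-1, 1]) for _ in range(n)]
    obj = sum(b * mono(x, U) for U, b in inst.items())
    obj_flat = sum(b * mono(x, S) * mono(x, T) for (S, T), b in flat.items())
    assert obj == obj_flat
```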

    If the 2k- XOR instances that we start with are random weighted instances, then their flattenings are
also random weighted 2- XOR instances.

Claim 5.8. Suppose that I is a random 2k-XOR instance on n variables weighted according to W. Then
the flattening I_flat is a random 2-XOR instance on n^k variables weighted according to W.

Proof. This fact is immediate, since the sequences U ∈ [n]^{2k} are in bijection with ordered pairs of
sequences S, T ∈ [n]^k.

    We will require the following proposition, which applies our main theorem in the context of random
k- XOR instances with random weights from well-behaved distributions.

Proposition 5.9. Suppose that W is a distribution over integers which is symmetric about the origin, let
k be a fixed positive integer, and let n be an integer sufficiently large. Let E denote the expectation under
the measure W, and let σ^2 ≥ E w^2 be a bound on the variance. Furthermore, suppose that

    • The expected absolute value satisfies E |w| = ω(σ · √(log n / n^k)),

    • With high probability over n^{2k} i.i.d. samples w_1, . . . , w_{n^{2k}} ∼ W, max_{i∈[n^{2k}]} |w_i| ≤ M = o(σ^2 n^k).

Now, define

                        ρ = O( (σ log n)/(E |w| · √(n^k)) · max(1, M/√(n^k)) ) .

Then if I is a random 2k-XOR instance on n variables weighted according to W, with high probability I
has E |w| · n^{2k} ± O(σ n^k √(log n)) constraints. Further, choosing ℓ ∈ N^+ large enough so that n^{k/4ℓ} ρ ≤ (1/2) ε^{2ℓ}
and setting R = 2kℓ · (1/ε)^{2ℓ}, R rounds of Sherali–Adams can deduce the bound OBJ_I(x) ≤ 1/2 + (3/2)ε.


    To prove the above, we require the following standard matrix Bernstein inequality:

Theorem 5.10 (Theorem 6.6.1 in [48]). Let A_1, . . . , A_m ∈ R^{N×N} be independent random matrices, with
E A_i = 0 for all i ∈ [m] and ‖A_i‖ ≤ M for all i ∈ [m]. Let A = ∑_{i∈[m]} A_i denote their sum, and suppose that
‖E AA^⊤‖, ‖E A^⊤A‖ ≤ σ^2. Then

                        Pr(‖A‖ ≥ t) ≤ N · exp( −(1/2) · t^2/(σ^2 + (1/3) M t) ) .

Proof of Theorem 5.9. Given a weighted 2k-XOR instance on n variables with weights from W, we
consider its flattening I_flat with objective function OBJ(x) = (1/m) ∑_{i,j∈[N]} (1/2)(1 + b_ij x_i x_j), for m the absolute
sum of weights. We construct its signed adjacency matrix as follows: first take the matrix W defined so
that W_{i,j} = b_ij, and obtain a new matrix B = (1/2)(W + W^⊤). For any x, applying Theorem 5.7 we have that
(1/2) + (1/2m) x^⊤ B x = OBJ_I(x).
    Since W is a distribution over integers, 2B has signed integer entries. We think of 2B as defining a
multigraph G on n^k vertices with signed edges, so that there are 2 · |B_ij| multiedges between i, j ∈ [n^k],
each with sign sgn(B_ij). Let 2 · D be the degree matrix of G, let A = |2B| be the adjacency matrix of G,
let Ξ = sgn(B) be the matrix of signs of B, and let K = (2D)^{−1} A = D^{−1}(B ◦ Ξ) be the transition matrix for
the random walk on G.
    To apply Theorem 3.7, we must upper bound the spectral radius of Ξ ◦ K = D^{−1}B, as well as bound
the minimum degree of G and the total number of edges. We will use the bound

                        ‖D^{−1}B‖_op ≤ ‖D^{−1}‖_op · ‖B‖_op ≤ (1/d_min) · ‖B‖_op .

First, we bound ‖B‖_op. Take B′ to be the truncated version of B, so that B′_{i,j} = sgn(B_{i,j}) · min(|B_{i,j}|, M).
Thinking of the matrix B′ as the sum of (n^k choose 2) + n^k symmetric matrices, one for each pair i, j ∈ [n^k], satisfies
the requirements of Theorem 5.10. We have that E[B′B′^⊤] ⪯ n^k σ^2 · 1, so applying Theorem 5.10 with
t = O(max(√(σ^2 n^k log n), M log n)) we get that with high probability,

                        ‖B‖_op ≤ O(max(√(σ^2 n^k log n), M log n)) ,

where we have also used that with high probability B = B′ by the properties of W.
    Now, we bound the sum of degrees 2m and the minimum degree d_min. We have that the total sum of
the degrees is given by 2m = ∑_{i,j∈[n^k]} |b_ij| with b_ij ∼ W. By a Bernstein inequality,

            Pr( |2m − n^{2k} E |w|| ≥ s ) ≤ 2 exp( −(1/2) · s^2/(n^{2k} · E w^2 + (1/3) M s) ) ,

so since by assumption σ^2 n^{2k} ≫ M, setting s = O(σ n^k √(log n)) we have that with high probability

                        2m = n^{2k} E |w| ± O(σ n^k √(log n)) .                                   (5.1)


    By our assumptions on W we have that for every i ∈ [n^k], E deg_G(i) = n^k E |w|. Applying a Bernstein
inequality gives that

            Pr[deg_G(i) ≤ n^k E |w| − t] ≤ exp( −(1/2) · t^2/(n^k σ^2 + (1/3) M t) ) ,

so using that M ≤ n^k σ^2 and taking t = O(√(n^k σ^2 log n)) we get that d_min = n^k · E |w| ± O(√(n^k σ^2 log n))
with high probability. This gives that with high probability,

        π_* = d_min/2m ≥ (n^k · E |w| − O(σ √(n^k log n))) / (n^{2k} · E |w| + O(σ n^k √(log n))) ≥ (1/n^k) · (1 − o(1))

and

        ‖Ξ ◦ K‖_op ≤ ‖B‖_op/d_min
                   ≤ O(σ √(n^k log n)) · max(1, M/√(n^k)) / (n^k · E |w| − O(σ √(n^k log n)))
                   ≤ O( (σ log n)/(E |w| √(n^k)) · max(1, M/√(n^k)) ) ,

where we have used that σ √(log n) ≪ √(n^k) · E |w|.
    Now, the result follows by applying Theorem 3.7 and Theorem 5.7.

5.1.2    Odd k-XOR
For odd integers k, k- XOR instances do not have the same natural, symmetric flattenings. Instead, we
define what we call a lift:

Definition 5.11. Suppose k ≥ 1 is an integer and I is a (2k + 1)-XOR instance on n variables x_1, . . . , x_n,
with objective ∑_{U∈[n]^{2k+1}} b_U · x_U. Then we let its lift, I_lift, be the bipartite 2-XOR instance on two parts, each
containing n^{k+1} variables, created as follows:

    • Create new variables w_1, . . . , w_n.

    • For each U ∈ [n]^{2k+1}, choose a random index i_U ∈ [n] and modify the objective to ∑_{U∈[n]^{2k+1}} b_U ·
      x_U · w_{i_U}.

    • For each sequence S ∈ [n]^{k+1} associate a new variable y_S, and for each sequence T ∈ [n]^k and index i ∈ [n]
      associate a new variable z_{T,i}. We understand y_S = ∏_{i∈S} x_i, and z_{T,i} = ∏_{j∈T} x_j · w_i.

    • For each U ∈ [n]^{2k+1}, we take the sequence V = (U, i_U) and assign it a new coefficient b′_V = b_U.
      For the remaining V, we set b′_V = 0.

Finally, I_lift is the instance with the objective function ∑_{S∈[n]^{k+1}, T∈[n]^k, i∈[n]} b′_{S∪T∪i} · y_S z_{T,i}.

    We obtain a statement analogous to Theorem 5.7 for odd k- XOR:

Lemma 5.12. Suppose that I is a weighted (2k + 1)-XOR instance, and let I_lift be the bipartite 2-
XOR instance given by its lift. Then if the R-round Sherali–Adams proof system can certify that
OBJ_{I_lift}(x) ≤ c, then the (k + 1) · R-round Sherali–Adams proof system can certify that OBJ_I(x) ≤ c.



Proof. The only modification to the proof of Theorem 5.7 is that for zT,i we substitute zT,i = xT (where
we have implicitly substituted wi = 1 for all i ∈ [n]).

   However, the lifting procedure does not preserve the weighting distribution W, because of the step in
which a random index iU is chosen to lift U. For this reason, we prove an analog of Theorem 5.9:

Proposition 5.13. Suppose that W is a distribution over integers which is symmetric about the origin,
let k be a fixed positive integer, and let n be an integer sufficiently large. Let E denote the expectation
under the measure W, and let σ^2 ≥ E w^2 be a bound on the variance. Furthermore, suppose that

    • The expected absolute value satisfies E |w| ≫ σ · √(log n / n^k),

    • With high probability over n^{2k+1} i.i.d. samples w_1, . . . , w_{n^{2k+1}} ∼ W, max_{i∈[n^{2k+1}]} |w_i| ≤ M ≪ σ^2 n^k.

Now, define

                        ρ = O( (σ log n)/(E |w| · √(n^k)) · max(1, M/√(n^k)) ) .

Then if I is a random (2k + 1)-XOR instance on n variables weighted according to W, with high probability
it has E |w| · n^{2k+1} ± O(σ √(n^{2k+1} log n)) constraints. Furthermore, choosing ℓ ∈ N^+ large enough so that
n^{(k+1)/4ℓ} ρ ≤ (1/2) ε^{2ℓ} and setting R = (2k + 2)ℓ · (1/ε)^{2ℓ}, R rounds of Sherali–Adams can deduce the bound
OBJ_I(x) ≤ 1/2 + (3/2)ε.

Proof. The thread of the proof is the same as that of Theorem 5.9. We will refute I_lift, since by
Theorem 5.12 this is sufficient. We begin by associating with I_lift a multigraph G (which we may do
because W is a distribution over integers). The multigraph G is bipartite, with one part
corresponding to the variables y_S for S ∈ [n]^{k+1}, and the other part corresponding to the variables z_{T,i} for T ∈ [n]^k
and i ∈ [n]. We let the block matrix B be the (1/2-scaled) signed adjacency matrix of G, let Ξ be the matrix
of signs, D be the diagonal degree matrix, and K ◦ Ξ = D^{−1}B be the signed transition matrix of the random
walk on G. In order to apply Theorem 3.7 we must bound ‖K ◦ Ξ‖ and π_* = d_min(G)/2m.
    First, we bound the vertex degrees. For a vertex of the form (T, i), each potentially incident
edge (S, T ∪ i) has expected absolute weight (1/n) E |w|, since b′_{S,T∪i} = b_{S,T∪i} only when the index i was
chosen in the lift. The degree of T ∪ i is simply the sum

                            deg_G(T ∪ i) = ∑_{S∈[n]^{k+1}} |b′_{S,T∪i}| ,

a sum of independent random variables with expectation (1/n) E |w| and variance (1/n) σ^2. Applying a Bernstein
inequality, we have that

        Pr( |deg_G(T ∪ i) − (1/n) · n^{k+1} E[|w|]| ≥ t ) ≤ 2 exp( −(1/2) · t^2/(n^k σ^2 + (1/3) M t) ) ,

so taking t = O(√(σ^2 n^k log n)) (and using that M ≪ n^k σ^2), we have that the degree of T ∪ i vertices is
deg_G(T ∪ i) = n^k E |w| ± O(√(σ^2 n^k log n)) with high probability.


      A similar argument applies to the S vertices; the total degree of such a vertex is

                            deg_G(S) = ∑_{T∈[n]^k} ∑_{i∈[n]} |b′_{S,T∪i}| ,

since for each T only one of the b′_{S,T∪i} will be nonzero. The inner sums are independent random variables with mean
E |w| and variance σ^2, therefore

            Pr( |deg_G(S) − n^k E |w|| ≥ t ) ≤ 2 exp( −(1/2) · t^2/(n^k σ^2 + (1/3) M t) ) ,

so taking t = O(√(σ^2 n^k log n)) (and using that M ≪ n^k σ^2), we have that the degree of S vertices is also
deg_G(S) = n^k E |w| ± O(√(σ^2 n^k log n)) with high probability.
     We finally bound ‖K ◦ Ξ‖ ≤ ‖D^{−1}‖ · ‖B‖. As before, B is a sum of independent symmetric matrices,
one for each coefficient b_U from I. That is, we can define matrices B_U for each U ∈ [n]^{2k+1} with U = (S, T)
for S ∈ [n]^{k+1}, T ∈ [n]^k, where B_U has the entry b_U ∼ W in one off-diagonal block entry (S, T ∪ i) and
the other off-diagonal block entry (T ∪ i, S), for a randomly chosen i ∈ [n]. Thus, E B_U B_U^⊤ is a diagonal
matrix with (1/n) σ^2 in each diagonal entry of the form (T ∪ i, T ∪ i) and σ^2 in each block-diagonal entry of the form
(S, S). We then have that E BB^⊤ ⪯ n^k σ^2 · 1, since for each S there is a sum over n^k matrices B_U and for
each T ∪ i there is a sum over n^{k+1} matrices B_U. Applying Theorem 5.10 by using the same truncation
trick again, we have that

                        ‖B‖ ≤ O(max(√(σ^2 n^k log n), M log n)) ,

and from this we have that with high probability,

            ‖K ◦ Ξ‖ ≤ ‖B‖/deg_min(G) ≤ O( (σ log n)/(E |w| √(n^k)) · max(1, M/√(n^k)) ) ,          (5.2)

                            π_* = deg_min(G)/2m = (1/n^{k+1}) · (1 ± o(1)) ,                        (5.3)

after which we can apply Theorem 3.7.

5.2     From Boolean CSPs to k-XOR
Following [2], we prove Theorem 5.2 via reduction to XOR.

Proof of Theorem 5.2. Given a random instance of the CSP defined by the predicate P, and p = n^{−⌊k/2⌋+δ},
a Bernstein inequality gives us that the number of constraints m is with high probability given by
m = n^{⌈k/2⌉+δ} ± 10·√(n^{⌈k/2⌉+δ} log n). Set ℓ = ⌈1/(2δ)⌉.
   Since P is a Boolean predicate, we can write P in its Fourier expansion:

    P(z) = Σ_{α⊆[k]} P̂(α) · Π_{i∈α} z_i .
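As a concrete illustration of this expansion (our own toy example, with P taken to be the 3-bit OR predicate on ±1 inputs), the coefficients P̂(α) = E_z[P(z) · Π_{i∈α} z_i] can be computed by brute force and the expansion verified pointwise:

```python
from itertools import combinations, product

# Brute-force Fourier expansion of a Boolean predicate on {-1,+1}^k.
# The predicate here (3-bit OR) is an illustrative choice of ours.
k = 3

def P(z):                 # OR: 0 only when all inputs are -1
    return 0 if all(b == -1 for b in z) else 1

def chi(alpha, z):        # character: prod_{i in alpha} z_i
    out = 1
    for i in alpha:
        out *= z[i]
    return out

cube = list(product((-1, 1), repeat=k))
alphas = [a for r in range(k + 1) for a in combinations(range(k), r)]

# P_hat(alpha) = E_z[P(z) * chi(alpha, z)] under the uniform distribution
P_hat = {a: sum(P(z) * chi(a, z) for z in cube) / len(cube) for a in alphas}

# the expansion reproduces P exactly at every point of the cube
for z in cube:
    assert abs(sum(P_hat[a] * chi(a, z) for a in alphas) - P(z)) < 1e-9

assert P_hat[()] == 7 / 8   # P_hat(empty set) = E[P] for 3-bit OR
```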



                         T HEORY OF C OMPUTING, Volume 17 (9), 2021, pp. 1–37                            28
                                      S HERALI –A DAMS S TRIKES BACK

Using this expansion, we re-write the objective function. Recall that [n]^k is the set of all sequences of
k elements of [n]. For each S ∈ [n]^k, let b_S be the 0/1 indicator that there is a constraint on S, and let
ζ_S ∈ {±1}^S be the literal signing on the variables for the constraint on S. Then, if the total number of
constraints is m,

    OBJ_I(x) = (1/m) Σ_{S∈[n]^k} b_S · P(x_S ∘ ζ_S)
             = (1/m) Σ_{S={i_1,…,i_k}∈[n]^k} Σ_{α⊆[k]} b_S · P̂(α) · Π_{a∈α} x_{i_a} (ζ_S)_{i_a}
             = (1/m) Σ_{α⊆[k]} P̂(α) · ( Σ_{T∈[n]^{|α|}} ( Σ_{S∈[n]^k, S|_α=T} b_S · Π_{a∈α} (ζ_S)_{i_a} ) · x_T ) .     (5.4)

Now, define for each α ⊆ [k] with |α| = t > 0 the t-XOR instance

    I^α(x) = (1/m) Σ_{T∈[n]^t} ( Σ_{S∈[n]^k, S|_α=T} b_S · Π_{a∈α} (ζ_S)_{i_a} ) · x_T = (1/m) Σ_{T∈[n]^t} w_T · x_T ,

where we have taken w_T = Σ_{S∈[n]^k, S|_α=T} b_S · Π_{a∈α} (ζ_S)_{i_a}, so that from Equation (5.4),

    OBJ_I(x) = Σ_{α⊆[k]} P̂(α) · I^α(x) .     (5.5)
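This decomposition can be checked numerically on a toy instance. In the sketch below, the predicate (2-bit AND), the number of variables, and the constraint density are all illustrative choices of ours, not parameters from the paper:

```python
import math
import random
from itertools import combinations, product

# Numerical check of OBJ_I(x) = sum_alpha P_hat(alpha) * I^alpha(x) on a tiny
# random instance. Predicate, n, and density are our own illustrative choices.
random.seed(1)
n, k = 5, 2

def P(z):                 # AND on {-1,+1}^2
    return 1 if z == (1, 1) else 0

def chi(alpha, z):        # prod over the coordinates indexed by alpha
    out = 1
    for a in alpha:
        out *= z[a]
    return out

cube = list(product((-1, 1), repeat=k))
alphas = [a for r in range(k + 1) for a in combinations(range(k), r)]
P_hat = {a: sum(P(z) * chi(a, z) for z in cube) / len(cube) for a in alphas}

# random instance: b_S = constraint indicator, zeta_S = literal signs
S_all = list(product(range(n), repeat=k))
b = {S: 1 if random.random() < 0.3 else 0 for S in S_all}
zeta = {S: tuple(random.choice((-1, 1)) for _ in range(k)) for S in S_all}
m = sum(b.values())
assert m > 0

x = tuple(random.choice((-1, 1)) for _ in range(n))   # an arbitrary assignment

# direct objective value
obj = sum(b[S] * P(tuple(x[S[a]] * zeta[S][a] for a in range(k)))
          for S in S_all) / m

# the same value via the XOR subinstances I^alpha
total = 0.0
for alpha in alphas:
    w = {}   # w_T = sum_{S : S|_alpha = T} b_S * prod_{a in alpha} zeta_S[a]
    for S in S_all:
        if b[S]:
            T = tuple(S[a] for a in alpha)
            w[T] = w.get(T, 0) + chi(alpha, zeta[S])
    I_alpha = sum(wT * math.prod(x[i] for i in T) for T, wT in w.items()) / m
    total += P_hat[alpha] * I_alpha

assert abs(obj - total) < 1e-9
```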

    Let W_{n^{k−t}} be the distribution defined so that w ∼ W_{n^{k−t}} is a sum of n^{k−t} independent variables, each taking
a uniformly random value in {±1} with probability p and value 0 otherwise. Since for each S ⊇ T, the quantity Π_{a∈α} (ζ_S)_{i_a} is
an independent uniform sign in {±1} and b_S is an independent Bernoulli-p variable, we have that the
coefficients w_T in I^α are i.i.d. from W_{n^{k−t}}. The following lemma establishes some properties of W_N (we
will prove the lemma in Appendix A):
Lemma 5.14. Let W_N(p) be the distribution defined so that X ∼ W_N is given by X = Σ_{t=1}^N Y_t · Z_t, where
the {Y_t}_t, {Z_t}_t are i.i.d. with Y_t ∼ Ber(p) and Z_t uniform in {±1}. Then for X ∼ W_N(p), E X = 0 and E X^2 = pN.
Further, so long as pN ≥ 1, E|X| ≥ (2/e^{3/2}) · √(pN), and Pr(|X| > 2t·√(pN)) ≤ 2 exp(−t^2). Otherwise, if pN ≤ 1,
E|X| ≥ (1/(2e)) · log(1/(1−pN)), and Pr(|X| ≥ 1 + t) ≤ exp(−t/2).
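The moment claims in the lemma are easy to probe empirically. The following Monte Carlo sketch (with arbitrary test parameters N, p, and trial count of our choosing) samples X = Σ Y_t Z_t in the dense regime pN ≥ 1 and checks E X ≈ 0, E X^2 ≈ pN, and the stated lower bound on E|X|:

```python
import math
import random

# Monte Carlo check of the lemma's moments: X = sum_{t=1}^N Y_t * Z_t with
# Y_t ~ Ber(p) and Z_t a uniform sign. N, p, trials are arbitrary test values.
random.seed(2)
N, p, trials = 1_000, 0.05, 4_000   # here pN = 50 >= 1 (the dense regime)

def sample():
    s = 0
    for _ in range(N):
        if random.random() < p:            # Y_t = 1 with probability p
            s += random.choice((-1, 1))    # times a uniform sign Z_t
    return s

xs = [sample() for _ in range(trials)]
mean = sum(xs) / trials
second = sum(v * v for v in xs) / trials
mad = sum(abs(v) for v in xs) / trials

assert abs(mean) < 1.0                                # E X = 0
assert abs(second - p * N) < 0.1 * p * N              # E X^2 = pN
assert mad >= (2 / math.e**1.5) * math.sqrt(p * N)    # lemma's bound on E|X|
```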
                                                                                          q
    From Lemma 5.14, we have that E w_T^2 = pn^{k−t}, and by Cauchy–Schwarz, E|w_T| ≤ √(E w_T^2). Let m_α
be the total absolute weight of constraints in I^α,

    m_α = Σ_T |w_T| .

Notice that in all cases, m_α ≤ m.
    Now, we show that Sherali–Adams can certify upper bounds on |I^α(x)| for every α. First, consider α
with |α| = t = 1. In this case, Sherali–Adams with R = 1 can certify that

    I^α(x) = (1/m) Σ_{i∈[n]} w_i · x_i ≤ (1/m) Σ_{i∈[n]} |w_i| = m_α/m .

                                RYAN O’D ONNELL AND T SELIL S CHRAMM

From an application of Bernstein's inequality (the same as in the proof of Theorem 5.9),

    m_α ≤ n · √(E w_T^2) + √(p n^k log n)

with high probability whenever pn^k ≫ √(p n^{k−1}), and applying our bound on m we conclude that with
high probability SA will certify that

    I^α(x) ≤ ( n · √(p n^{k−1}) · (1 + o(1)) ) / ( p n^k · (1 ± o(1)) ) ≤ 2/√(p n^{k−1}) .

The lower bound on I^α(x) follows from identical reasoning applied to its negation, so we can conclude that
with high probability SA can certify that

    |I^α(x)| ≤ 2/√(p n^{k−1}) = o(1) .
     Now, we tackle α with |α| = t for 2 ≤ t ≤ k. We will verify that the conditions of Theorems 5.9
and 5.13 hold. First, consider the α with |α| = t for which pn^{k−t} ≥ 1. From Lemma 5.14, in this case we have
that

    E|w_T| ≥ (2/e^{3/2}) · √(p n^{k−t}) ,

and with high probability, |w_T| ≤ O(√(t · p n^{k−t} · log n)) for all T ∈ [n]^t. Letting M = O(√(t · p n^{k−t} · log n)) and
σ^2 = pn^{k−t}, we meet the conditions for Theorem 5.9:

    M = O(√(t · p n^{k−t} · log n)) ≪ p n^{k−t} · n^{⌊t/2⌋} = σ^2 n^{⌊t/2⌋} ,

    E|w_T| ≥ (2/e^{3/2}) · √(p n^{k−t}) ≥ √( (p n^{k−t} · log n)/n^{⌊t/2⌋} ) = √( (σ^2 log n)/n^{⌊t/2⌋} ) ,

so long as pn^{k−t} ≥ 1, which we have assumed. So applying Theorems 5.9 and 5.13 to both (m/m_α) · I^α and
−(m/m_α) · I^α, we have

    ρ = O( (σ log n)/(E|w_T| · √(n^{⌊t/2⌋})) · max(1, M/√(n^{⌊t/2⌋})) ) ≤ O( (log n)/√(n^{⌊t/2⌋}) · max(1, √(p n^{k−t}/n^{⌊t/2⌋})) ) ,

and so long as (m_α/m) · n^{⌈t/2⌉/4ℓ} · ρ ≤ (1/2)·ε^{2ℓ}, with high probability over I^α, t(ℓr + 1) rounds of Sherali–Adams
certify that |I^α(x)| ≤ (3/2)·ε. We confirm that

    (m_α/m) · n^{⌈t/2⌉/4ℓ} · ρ = ( (√(p n^{k−t}) · n^t)/(p n^k) ) · n^{⌈t/2⌉/4ℓ} · ρ = o(1) ,

whenever t ≥ 1 and ℓ ≥ 1.
   Finally, we handle α with |α| = t for 2 ≤ t and pn^{k−t} < 1. From Lemma 5.14 we have

    E|w_T| ≥ (1/e) · log( 1/(1 − p n^{k−t}) ) ,


and with high probability, |w_T| ≤ 4 log n for all T ∈ [n]^t. Taking M = 4 log n, we meet the conditions of
Theorems 5.9 and 5.13:

    M = 4 log n ≪ n^{⌈k/2⌉−⌈t/2⌉+δ} = p n^{k−t} · n^{⌊t/2⌋} = σ^2 n^{⌊t/2⌋} ,

    E|w_T| ≥ (1/e) · log( 1/(1 − p n^{k−t}) ) ≥ (1/e) · p n^{k−t} ≥ √( (p n^{k−t} · log n)/n^{⌊t/2⌋} ) = √( (σ^2 log n)/n^{⌊t/2⌋} ) ,

where the last inequality is true whenever p n^{k−t+⌊t/2⌋} = n^{⌈k/2⌉−⌈t/2⌉+δ} ≫ log n, which we have by
assumption. So applying Theorems 5.9 and 5.13 to both (m/m_α) · I^α and −(m/m_α) · I^α, we have that for

    ρ = O( √(p n^{k−t} · log n) / (p n^{k−t} · √(n^{⌊t/2⌋})) ) = O( √( log n/(p n^{k−⌈t/2⌉}) ) ) = O( √( log n/n^{⌈k/2⌉−⌈t/2⌉+δ} ) ) ,

so long as (m_α/m) · n^{⌈t/2⌉/4ℓ} · ρ ≤ (1/2)·ε^{2ℓ}, R = t(ℓr + 1) rounds of Sherali–Adams certify that |I^α(x)| ≤ (3/2)·ε with
high probability. Verifying,

    (m_α/m) · n^{⌈t/2⌉/4ℓ} · O( √( log n/n^{⌈k/2⌉−⌈t/2⌉+δ} ) ) ≤ √( 1/(p n^{k−t}) ) · n^{⌈t/2⌉/4ℓ} · O( √( log n/n^{⌈k/2⌉−⌈t/2⌉+δ} ) ) ,

which is maximized at t = k. By our choice of ℓ, the condition holds.
    We therefore have (using Parseval's identity and the bound ‖x‖_1 ≤ √d · ‖x‖_2 for x ∈ R^d, applied with d = 2^k, to simplify Equation (5.5))
that the same number of rounds certifies that

    OBJ_I(x) ≤ Σ_{α⊆[k]} P̂(α) · I^α(x) ≤ P̂(∅) + √(2^k) · (3/2)·ε ,

as desired.

   Using arguments analogous to the above, along with the reasoning outlined in the second proof of
Theorem 4.9 and in Claim 6.7 of [2], we can also prove Theorem 5.4.


A     Characteristics of distributions of XOR-subformula coefficients
We now prove Lemma 5.14. We will use the following estimate for the mean absolute deviation of a
binomial random variable.
Lemma A.1 (e. g., [9]). If X is distributed according to the binomial distribution X ∼ Bin(n, p), then

    E|X − E X| = √( (2/π) · n p (1 − p) ) + O( 1/√n ) .
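This estimate can be checked by exact summation over the binomial distribution for a moderate n; the parameters below are illustrative choices of ours:

```python
import math
from math import comb

# Exact computation of the mean absolute deviation of Bin(n, p), compared
# against the estimate sqrt(2*n*p*(1-p)/pi) + O(1/sqrt(n)) from the lemma.
# n and p are illustrative values of our choosing.
n, p = 400, 0.3
mu = n * p
mad = sum(comb(n, k) * p**k * (1 - p)**(n - k) * abs(k - mu)
          for k in range(n + 1))
estimate = math.sqrt(2 * n * p * (1 - p) / math.pi)

assert abs(mad - estimate) < 0.05 * estimate   # agree up to the O(1/sqrt(n)) term
```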
Proof of Lemma 5.14. We calculate the absolute value directly. Given that there are exactly k nonzero
Y_t, the absolute value of X is distributed according to 2·|Bin(k, 1/2) − (1/2)k|. Using the method of conditional
expectations,

    E|X| ≥ Σ_{k=0}^N Pr[k nonzero Y_t's] · E|Bin(k, 1/2) − (1/2)k| ≥ Σ_{k=0}^N (N choose k) · p^k (1−p)^{N−k} · √( k/(2π) ) ,



where we have applied the estimate from Lemma A.1. Letting D(a‖b) = a ln(a/b) + (1 − a) ln((1−a)/(1−b)) be the
relative entropy, we then have from Stirling's inequality that

    E|X| ≥ Σ_{k=1}^N √( 2πN/(e^2 · k(N−k)) ) · exp( −N · D(k/N ‖ p) ) · √( k/(2π) ) ≥ (1/e) Σ_{k=1}^N exp( −N · D(k/N ‖ p) ) .     (A.1)

Now, if pN < 1, we take

    (A.1) ≥ (1/e) Σ_{k=1}^N exp( k log(pN/k) ) = (1/e) Σ_{k=1}^N (pN/k)^k ≥ (1/e) · ( log(1/(1−pN)) − O(pN) ) ,     (A.2)

as desired.
    If pN ≥ 1, applying the change of variables ℓ = k − ⌊pN⌋ and δ = ℓ/N,

    (A.1) ≥ (1/e) Σ_{ℓ=1−⌊pN⌋}^{⌊(1−p)N⌋} exp( −N · D(p + δ ‖ p) ) .     (A.3)

Now using the Taylor expansion for log(1 − x) to simplify, and restricting the sum to the range
ℓ ∈ [−⌊√(pN)⌋, ⌊√(pN)⌋], we get the bound

    (A.3) ≥ (1/e) Σ_{ℓ=−⌊√(pN)⌋}^{⌊√(pN)⌋} exp( −N · δ^2/(2p) ) ≥ (1/e) · 2√(pN) · exp(−1/2) ≥ (2/e^{3/2}) · √(pN) ,     (A.4)

as desired.
    The first and second moments we can also obtain by direct calculation; the Z_t ensure that the summands have
mean 0, and the Y_t give that the variance of each summand is p, which gives the result.
    The tail bound Pr(|X| ≥ (1 + 2t)·√(pN)) ≤ 2 exp(−t^2) comes from an application of Bernstein's
inequality if pN ≥ 1; when pN < 1, we again apply Bernstein's inequality, in which case we have

    Pr( |X| − E|X| ≥ s ) ≤ exp( −(s^2/2)/(pN + (1/3)s) ) ,

and choosing s = 1 + t gives the result.


Acknowledgments
We thank Luca Trevisan for helpful comments, Sam Hopkins for feedback on an earlier written version,
and Boaz Barak for suggesting the title. We also thank the Schloss Dagstuhl Leibniz Center for Informatics
(and more specifically the organizers of the CSP Complexity and Approximability workshop), as well
as the Casa Matemática Oaxaca (and more specifically the organizers of the Analytic Techniques in
TCS workshop); parts of this paper came together during discussions at these venues. T.S. also thanks


the Oberwolfach Research Institute for Mathematics (and the organizers of the Proof Complexity and
Beyond workshop), the Simons Institute (and the organizers of the Optimization semester program),
and the Banff International Research Station (and the organizers of the Approximation Algorithms and
Hardness workshop) where she tried to prove the opposite of the results in this paper, as well as Sam
Hopkins, with whom some of those efforts were made.


References
 [1] Mikhail Alekhnovich, Sanjeev Arora, and Iannis Tourlakis: Towards strong nonapproximability
     results in the Lovász–Schrijver hierarchy. Comput. Complexity, 20(4):615–648, 2011.
     [doi:10.1007/s00037-011-0027-z] 6

 [2] Sarah R. Allen, Ryan O'Donnell, and David Witmer: How to refute a random CSP. In
     Proc. 56th FOCS, pp. 689–708. IEEE Comp. Soc., 2015. [doi:10.1109/FOCS.2015.48] 5, 6, 7, 23,
     28, 31

 [3] Sanjeev Arora, Béla Bollobás, László Lovász, and Iannis Tourlakis: Proving
     integrality gaps without knowing the linear program. Theory of Computing, 2(2):19–51, 2006.
     [doi:10.4086/toc.2006.v002a002] 5, 6

 [4] Sanjeev Arora, László Lovász, Ilan Newman, Yuval Rabani, Yuri Rabinovich,
     and Santosh Vempala: Local versus global properties of metric spaces. SIAM J. Comput.,
     41(1):250–271, 2012. [doi:10.1137/090780304] 5

 [5] Francisco Barahona and Ali Ridha Mahjoub: On the cut polytope. Math. Programming,
     36(2):157–173, 1986. [doi:10.1007/BF02592023] 2

 [6] Boaz Barak, Moritz Hardt, Thomas Holenstein, and David Steurer: Subsampling
     mathematical relaxations and average-case complexity. In Proc. 22nd Ann. ACM–SIAM Symp. on
     Discrete Algorithms (SODA'11), pp. 512–531. SIAM, 2011. [doi:10.1137/1.9781611973082.41] 6

 [7] Boaz Barak, Samuel B. Hopkins, Jonathan A. Kelner, Pravesh K. Kothari, Ankur
     Moitra, and Aaron Potechin: A nearly tight Sum-of-Squares lower bound for the Planted
     Clique problem. SIAM J. Comput., 48(2):687–735, 2019. Preliminary version in FOCS'16.
     [doi:10.1137/17M1138236] 7

 [8] Boaz Barak, Prasad Raghavendra, and David Steurer: Rounding semidefinite pro-
     gramming hierarchies via global correlation. In Proc. 52nd FOCS, pp. 472–481. IEEE Comp. Soc.,
     2011. [doi:10.1109/FOCS.2011.95] 7

 [9] Colin R. Blyth: Expected absolute error of the usual estimator of the binomial parameter. The
     American Statistician, 34(3):155–157, 1980. [doi:10.1080/00031305.1980.10483022] 31

[10] Ravi B. Boppana: Eigenvalues and graph bisection: An average-case analysis. In Proc. 28th
     FOCS, pp. 280–285. IEEE Comp. Soc., 1987. [doi:10.1109/SFCS.1987.22] 3


[11] Andrei Z. Broder, Alan M. Frieze, Stephen Suen, and Eli Upfal: Optimal con-
     struction of edge-disjoint paths in random graphs. SIAM J. Comput., 28(2):541–573, 1998.
     [doi:10.1137/S0097539795290805] 4
[12] Siu On Chan, James R. Lee, Prasad Raghavendra, and David Steurer: Ap-
     proximate constraint satisfaction requires large LP relaxations. J. ACM, 63(4):34:1–22, 2016.
     [doi:10.1145/2811255] 2, 7, 12
[13] Moses Charikar, Konstantin Makarychev, and Yury Makarychev: Integrality
     gaps for Sherali–Adams relaxations. In Proc. 41st STOC, pp. 283–292. ACM Press, 2009.
     [doi:10.1145/1536414.1536455] 2, 3, 5, 6, 7, 12, 21
[14] Moses Charikar, Konstantin Makarychev, and Yury Makarychev: Lo-
     cal global tradeoffs in metric embeddings. SIAM J. Comput., 39(6):2487–2512, 2010.
     [doi:10.1137/070712080] 5, 6
[15] Moses Charikar and Anthony Wirth: Maximizing quadratic programs: Extend-
     ing Grothendieck's Inequality. In Proc. 45th FOCS, pp. 54–60. IEEE Comp. Soc., 2004.
     [doi:10.1109/FOCS.2004.39] 3, 4
[16] Nicholas Cook, Larry Goldstein, and Tobias Johnson: Size biased couplings and
     the spectral gap for random regular graphs. Ann. Probab., 46(1):72–125, 2018. [doi:10.1214/17-
     AOP1180] 4
[17] Charles Delorme and Svatopluk Poljak: Laplacian eigenvalues and the maximum cut
     problem. Math. Programming, 62(1–3):557–574, 1993. ACM DL. 3
[18] Michel Marie Deza and Monique Laurent: Geometry of Cuts and Metrics. Volume 15 of
     Algorithms and Combinatorics. Springer, 1997. [doi:10.1007/978-3-642-04295-9] 2
[19] William E. Donath and Alan J. Hoffman: Lower bounds for the partitioning of graphs.
     IBM J. Res. Dev., 17(5):420–425, 1973. Included in "Selected Papers of Alan J. Hoffman: With
     Commentary" (World Scientific, 2003, pp. 437–442). [doi:10.1147/rd.175.0420] 3
[20] Uriel Feige and Robert Krauthgamer: The probable value of the Lovász–Schrijver
     relaxations for maximum independent set. SIAM J. Comput., 32(2):345–370, 2003.
     [doi:10.1137/S009753970240118X] 7
[21] Uriel Feige and Michael Langberg: The RPR² rounding technique for semidef-
     inite programs.      J. Algorithms, 60(1):1–23, 2006. Preliminary version in ICALP'01.
     [doi:10.1016/j.jalgor.2004.11.003] 3, 4
[22] Uriel Feige and Eran Ofek: Spectral techniques applied to sparse random graphs. Random
     Struct. Algor., 27(2):251–275, 2005. [doi:10.1002/rsa.20089] 3
[23] Wenceslas Fernandez de la Vega and Claire Mathieu: Linear programming relaxations
     of maxcut. In Proc. 18th Ann. ACM–SIAM Symp. on Discrete Algorithms (SODA'07), pp. 53–61.
     SIAM, 2007. ACM DL. 2, 3, 6


[24] Miroslav Fiedler: Algebraic connectivity of graphs. Czechoslovak Math. J., 23(2):298–305,
     1973. [doi:10.21136/CMJ.1973.101168] 3
[25] Joel Friedman, Andreas Goerdt, and Michael Krivelevich: Recognizing more
     unsatisfiable random k-SAT instances efficiently. SIAM J. Comput., 35(2):408–430, 2005.
     [doi:10.1137/S009753970444096X] 23
[26] Joel Friedman, Jeff Kahn, and Endre Szemerédi: On the second eigenvalue of random
     regular graphs. In Proc. 21st STOC, pp. 587–598. ACM Press, 1989. [doi:10.1145/73007.73063] 4
[27] Alan Frieze and Ravi Kannan: The regularity lemma and approximation schemes
     for dense problems.      In Proc. 37th FOCS, pp. 12–20. IEEE Comp. Soc., 1996.
     [doi:10.1109/SFCS.1996.548459] 6
[28] Michel X. Goemans and David P. Williamson: Improved approximation algorithms for
     maximum cut and satisfiability problems using semidefinite programming. J. ACM, 42(6):1115–
     1145, 1995. [doi:10.1145/227683.227684] 3
[29] Andreas Goerdt and Michael Krivelevich: Efficient recognition of random unsatisfiable k-
     SAT instances by spectral methods. In Proc. 18th Symp. Theoret. Aspects of Comp. Sci. (STACS'01),
     pp. 294–304. Springer, 2001. [doi:10.1007/3-540-44693-1_26] 23
[30] Dima Grigoriev: Linear lower bound on degrees of Positivstellensatz calculus proofs for the
     parity. Theoret. Comput. Sci., 259(1–2):613–622, 2001. [doi:10.1016/S0304-3975(00)00157-2] 5, 7
[31] Johan Håstad: An NP-complete problem – some aspects of its solution and some possible
     applications. Technical Report 16, Inst. Mittag-Leffler, 1984. 3
[32] Samuel B. Hopkins, Pravesh K. Kothari, Aaron Potechin, Prasad Raghavendra,
     Tselil Schramm, and David Steurer: The power of sum-of-squares for detecting hidden
     structures. In Proc. 58th FOCS, pp. 720–731. IEEE Comp. Soc., 2017. [doi:10.1109/FOCS.2017.72]
     7
[33] Pravesh K. Kothari, Raghu Meka, and Prasad Raghavendra: Approximating rect-
     angles by juntas and weakly-exponential lower bounds for LP relaxations of CSPs. In Proc. 49th
     STOC, pp. 590–603. ACM Press, 2017. [doi:10.1145/3055399.3055438] 2, 5, 7
[34] Pravesh K. Kothari, Ryuhei Mori, Ryan O'Donnell, and David Witmer: Sum of
     squares lower bounds for refuting any CSP. In Proc. 49th STOC, pp. 132–145. ACM Press, 2017.
     [doi:10.1145/3055399.3055485, arXiv:1701.04521] 5, 7
[35] David A. Levin and Yuval Peres: Counting walks and graph homomorphisms via
     Markov chains and importance sampling. Amer. Math. Monthly, 124(7):637–641, 2017.
     [doi:10.4169/amer.math.monthly.124.7.637] 10
[36] Bojan Mohar and Svatopluk Poljak: Eigenvalues in combinatorial optimization. In
     Combinatorial and Graph-Theoretical Problems in Linear Algebra, pp. 107–151. Springer, 1993.
     [doi:10.1007/978-1-4613-8354-3_5] 3
     [doi:10.1007/978-1-4613-8354-3_5] 3


[37] Ryan O'Donnell and Tselil Schramm: Sherali–Adams strikes back. In Proc. 34th Comput.
     Complexity Conf. (CCC'19), pp. 8:1–30. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, 2019.
     [doi:10.4230/LIPIcs.CCC.2019.8] 1
[38] Svatopluk Poljak: Polyhedral and eigenvalue approximations of the max-cut problem. In Sets,
     Graphs and Numbers (Budapest, 1991), volume 60 of Colloq. Math. Soc. János Bolyai, pp. 569–581.
     North-Holland, 1992. LINK. 2
[39] Svatopluk Poljak and Zsolt Tuza: The expected relative error of the polyhedral approx-
     imation of the max-cut problem. Oper. Res. Lett., 16(4):191–198, 1994. [doi:10.1016/0167-
     6377(94)90068-X] 2, 6
[40] Prasad Raghavendra, Satish Rao, and Tselil Schramm: Strongly refuting random
     CSPs below the spectral threshold. In Proc. 49th STOC, pp. 121–131. ACM Press, 2017.
     [doi:10.1145/3055399.3055417] 7
[41] Prasad Raghavendra, Tselil Schramm, and David Steurer: High-dimensional estima-
     tion via sum-of-squares proofs. In Proc. Internat. Congress of Mathematicians (ICM'18), volume 4,
     pp. 3389–3424. World Scientific, 2019. [doi:10.1142/9789813272880_0186, arXiv:1807.11419] 7
[42] Prasad Raghavendra and Ning Tan: Approximating CSPs with global cardinality constraints
     using SDP hierarchies. In Proc. 23rd Ann. ACM–SIAM Symp. on Discrete Algorithms (SODA'12),
     pp. 373–387. SIAM, 2012. [doi:10.1137/1.9781611973099.33, arXiv:1110.1064] 7
[43] Grant Schoenebeck: Linear level Lasserre lower bounds for certain k-CSPs. In Proc. 49th
     FOCS, pp. 593–602. IEEE Comp. Soc., 2008. [doi:10.1109/FOCS.2008.74] 5, 7
[44] Grant Schoenebeck, Luca Trevisan, and Madhur Tulsiani: Tight integrality gaps for
     Lovász–Schrijver LP relaxations of vertex cover and max cut. In Proc. 39th STOC, pp. 302–310.
     ACM Press, 2007. [doi:10.1145/1250790.1250836] 2, 3, 6
[45] Hanif D. Sherali and Warren P. Adams: A hierarchy of relaxations between the continuous
     and convex hull representations for zero-one programming problems. SIAM J. Discr. Math.,
     3(3):411–430, 1990. [doi:10.1137/0403036] 2
[46] Konstantin Tikhomirov and Pierre Youssef: The spectral gap of dense random regular
     graphs. Ann. Probab., 47(1):362–419, 2019. [doi:10.1214/18-AOP1263, arXiv:1610.01765] 3, 4
[47] Luca Trevisan: Max cut and the smallest eigenvalue. SIAM J. Comput., 41(6):1769–1786, 2012.
     Preliminary version in STOC'09. [doi:10.1137/090773714] 3
[48] Joel A. Tropp: An Introduction to Matrix Concentration Inequalities. Foundations and Trends
     in Machine Learning, 8(1–2):1–230, 2015. [doi:10.1561/2200000048, arXiv:1501.01571] 25
[49] Uri Zwick: Outward rotations: A tool for rounding solutions of semidefinite programming
     relaxations, with applications to max cut and other problems. In Proc. 31st STOC, pp. 679–687.
     ACM Press, 1999. [doi:10.1145/301250.301431] 3, 4



AUTHORS

    Ryan O’Donnell
    Professor
    Department of Computer Science
    Carnegie Mellon University
    odonnell cs cmu edu


    Tselil Schramm
    Assistant professor
    Department of Statistics
    Stanford University
    schramm stanford edu


ABOUT THE AUTHORS

    RYAN O'DONNELL received a B. Sc. from the University of Toronto in 1999 and a Ph. D.
      from the MIT Mathematics Department in 2003. His Ph. D. advisor was Madhu Sudan.
      Following this he was a postdoc at IAS for a year in Avi Wigderson’s group, and a
      postdoc at Microsoft Research for two years in Jennifer Chayes’s group. Since 2006 he
      has been a professor in the Computer Science Department at Carnegie Mellon University.
      Ryan’s research interests include Analysis of Boolean Functions, Constraint Satisfaction
      Problems, Quantum Complexity, and Probability. He enjoys his spare time.


    TSELIL SCHRAMM received a B. Sc. from Harvey Mudd College in 2012 and a Ph. D.
        from the UC Berkeley Computer Science Department in 2017, co-advised by Prasad
        Raghavendra and Satish Rao. Following this she spent two years as a postdoc at Harvard
        and MIT, hosted by Boaz Barak, Jonathan Kelner, Ankur Moitra, and Pablo Parrilo.
        In January of 2021 she began an appointment as an assistant professor in the Stanford
        University Statistics Department. Her research interests include average-case complexity,
        algorithms, and semidefinite programming hierarchies.



