                          Theory of Computing, Volume 14 (10), 2018, pp. 1–33
                                       www.theoryofcomputing.org

                                        Special Issue: CCC 2017



                      From Weak to Strong
                Linear Programming Gaps for All
                Constraint Satisfaction Problems
                            Mrinalkanti Ghosh∗                      Madhur Tulsiani†
                       Received July 30, 2017; Revised May 8, 2018; Published June 6, 2018




       Abstract: We study the approximability of constraint satisfaction problems (CSPs) by linear
       programming (LP) relaxations. We show that for every CSP, the approximation obtained by
       a basic LP relaxation is at least as strong as the approximation obtained using relaxations
       given by c · log n/ log log n levels of the Sherali–Adams hierarchy (for some constant c > 0)
       on instances of size n.
            It was proved by Chan et al. [FOCS 2013] (and recently strengthened by Kothari et
       al. [STOC 2017]) that for CSPs, any polynomial-size LP extended formulation is at most
       as strong as the relaxation obtained by a constant number of levels of the Sherali–Adams
       hierarchy (where the number of levels depend on the exponent of the polynomial in the size
       bound). Combining this with our result also implies that any polynomial-size LP extended
       formulation is at most as strong as the basic LP, which can be thought of as the base level of
       the Sherali–Adams hierarchy. This essentially gives a dichotomy result for approximation of
       CSPs by polynomial-size LP extended formulations.
           Using our techniques, we also simplify and strengthen the result by Khot et al. [STOC
       2014] on (strong) approximation resistance for LPs. They provided a necessary and sufficient
       condition under which o(log log n) levels of the Sherali–Adams hierarchy cannot achieve an
       approximation better than a random assignment. We simplify their proof and strengthen the
       bound to o(log n/ log log n) levels.

     A conference version of this paper appeared in the Proceedings of the 32nd Computational Complexity Conference
(CCC’17) [14].
   ∗ NSF award number CCF-1254044.
   † NSF award number CCF-1254044.

ACM Classification: F.2.2, G.1.6
AMS Classification: 68Q17, 90C05
Key words and phrases: constraint satisfaction problem, convex programming, linear programming
hierarchy, integrality gap

© 2018 Mrinalkanti Ghosh and Madhur Tulsiani
Licensed under a Creative Commons Attribution License (CC-BY)                      DOI: 10.4086/toc.2018.v014a010


1    Introduction
Given a finite alphabet [q] = {0, . . . , q − 1} and a predicate f : [q]k → {0, 1}, an instance of the problem
MAX k-CSP( f ) consists of m constraints over a set of n variables, x1 , . . . , xn , taking values in the set
[q]. Each constraint Ci is of the form f (xi1 + bi1 , . . . , xik + bik ) for some k-tuple (xi1 , . . . , xik ) of variables,
and constants bi1 , . . . , bik ∈ [q], where the addition is taken to be modulo q. We say that an assignment
σ to the variables satisfies the constraint Ci if Ci (σ (xi1 ), . . . , σ (xik )) = 1. Given an instance Φ of the
problem, the goal is to find an assignment σ to the variables satisfying as many constraints as possible.
The approximability of the MAX k-CSP( f ) problem has been extensively studied for various predicates
 f (see, e. g., [30] for a survey), and special cases include several interesting and natural problems such as
MAX 3-SAT, MAX 3-XOR and MAX-CUT.
     A topic of much recent interest has been the efficacy of Linear Programming (LP) and Semidefinite
Programming (SDP) relaxations. For a given instance Φ of MAX k-CSP( f ), let OPT(Φ) denote the
fraction of constraints satisfied by an optimal assignment, and let FRAC(Φ) denote the value of the
convex (LP/SDP) relaxation for the problem. Then, the performance guarantee of this algorithm is given
by the integrality gap which equals the supremum of FRAC(Φ)/OPT(Φ), over all instances Φ.
     The study of unconditional lower bounds for general families of LP relaxations was initiated by Arora,
Bollobás and Lovász [2] (see also [3]). They studied the Lovász-Schrijver [25] LP hierarchy and proved
lower bounds on the integrality gap for Minimum Vertex Cover (their technique also yields similar bounds
for MAX-CUT). De la Vega and Kenyon-Mathieu [12] and Charikar, Makarychev and Makarychev [9]
proved a lower bound of 2 − o(1) for the integrality gap of the LP relaxations for MAX-CUT given
respectively by O(log log n) and nO(1) levels of the Sherali–Adams LP hierarchy [29]. Several follow-up
papers have also shown lower bounds for various other special cases of the MAX k-CSP problem, both
for LP and SDP hierarchies [1, 28, 33, 27, 6, 5, 22].
     An LP extended formulation of a polytope P ⊆ Rd is a linear program of the form

                                   Ex + Fy = g           and       E 0 x + F 0 y ≤ g0 ,

where x ∈ Rd , y ∈ Rt , and x ∈ P if and only if there exists y such that (x, y) is a solution to the above
LP. We take the size of an LP extended formulation to be the sum of the number of variables and the
number of constraints (equalities plus inequalities). We refer the interested reader to a discussion in [13]
for different (equivalent) notions of size.
    A recent result by Chan et al. [7] shows a connection between strong lower bounds for the Sherali–
Adams hierarchy, and lower bounds on the size of LP extended formulations for the corresponding
problem. In fact, their result proved a connection not only for a lower bound on the worst case integrality


gap, but for the entire approximability curve. We say that Φ is a (c, s)-integrality gap instance for a
relaxation of MAX k-CSP( f ) if we have

                               FRAC(Φ) ≥ c           and      OPT(Φ) < s .

We say that MAX k-CSP( f ) is (c, s)-approximable by a relaxation if, for every instance Φ with
OPT(Φ) < s, we have FRAC(Φ) ≤ c. They showed that for any fixed t ∈ N, if there exist (c, s)-integrality
gap instances of size n for the relaxation given by t levels of the Sherali–Adams hierarchy, then for all
ε > 0 and sufficiently large N, there exists a (c − ε, s + ε)-integrality gap instance of size (number of
variables) N, for any linear extended formulation of size at most N^{t/2}. They also give a trade-off when t is
a function of n. This was recently improved by Kothari et al. [21] and we describe the improved trade-off
later.
    We strengthen the above results by showing that for all c, s ∈ [0, 1], (c, s)-integrality gap instances
for a “basic LP” can be used to construct (c − ε, s + ε)-integrality gap instances for Ωε (log n/ log log n)
levels of the Sherali–Adams hierarchy. The basic LP uses only a subset of the constraints in the relaxation
given by k levels of the Sherali–Adams hierarchy for MAX k-CSP( f ). In particular, this shows that a
lower bound on the integrality gap for even the basic LP implies a similar lower bound on the integrality
gap of any polynomial-size extended formulation. This can also be viewed as a dichotomy result showing
that for any predicate f , either MAX k-CSP( f ) is (c, s)-approximable by the basic LP relaxation (where
the number of LP variables and constraints are linear in the size of the instance) or for all ε > 0, a
(c − ε, s + ε)-approximation cannot be achieved by any polynomial-size LP extended formulation. We
note that both the above results and our result apply to all f , q and all c, s ∈ [0, 1].

1.1   Comparison with (implications of) Raghavendra’s UG-hardness result
A remarkable result by Raghavendra [26] shows that a (c, s)-integrality gap instance for a “basic SDP”
relaxation of MAX k-CSP( f ) implies Unique-Games-hardness (UG-hardness) [16] of distinguishing
instances Φ with OPT(Φ) < s from instances with OPT(Φ) ≥ c. The basic SDP considered by Raghaven-
dra involves moments for all pairs of variables and all subsets of variables included in a constraint. The
basic LP we consider is weaker than this SDP and does not contain the positive semidefiniteness constraint.
    Combining Raghavendra’s result with known constructions of integrality gaps for Unique Games
by Raghavendra and Steurer [27], and by Khot and Saket [17], one can obtain a result qualitatively
similar to ours, for the mixed hierarchy. In particular, a (c, s)-integrality gap for the basic SDP implies a
(c − ε, s + ε)-integrality gap for Ω((log log n)^{1/4}) levels of the mixed hierarchy.
    Note however, that the above result is incomparable to our result, since it starts with stronger
hypothesis (a basic SDP gap) and yields a gap for the mixed hierarchy as opposed to the Sherali–Adams
hierarchy. While the above can also be used to derive lower bounds for linear extended formulations,
one needs to start with an SDP gap instance to derive an LP lower bound. The basic SDP is known to be
provably stronger than the basic LP for several problems, including various 2-CSPs. Also, for the worst
case f for q = 2, the integrality gap of the basic SDP is O(2^k/k) [10], while that of the basic LP is 2^{k−1}.
The integrality gap for the basic LP is achieved by the “all zeros or all ones” predicate.
    A recent result by Khot and Saket [18] shows a connection between the integrality gaps for the
basic LP and those for the basic SDP. They prove that a (c, s)-integrality gap instance for the basic LP


implies UG-hardness of distinguishing instances Φ with OPT(Φ) ≥ Ω(c/(k^3 · log q)) from instances
with OPT(Φ) ≤ 4s. Their result also shows that a (c, s)-integrality gap instance for the basic LP can
be used to produce an (Ω(c/(k^3 · log q)), 4s)-integrality gap instance for the basic SDP, and hence for
Ω((log log n)^{1/4}) levels of the mixed hierarchy.


1.2     Other related work
The power of the basic LP for solving valued CSPs to optimality has been studied in several previous
papers. These results concern the problem of minimizing the penalty for unsatisfied constraints, where
the penalties take values in Q ∪ {∞}. Also, they study the problem not in terms of a single predicate f ,
but rather in terms of the constraint language generated by a given set of (valued) predicates.
    It was shown by Thapper and Živný [31] that when the penalties are finite-valued, if the problem
of finding the optimum solution cannot be solved by the basic LP, then it is NP-hard. Kolmogorov,
Thapper and Živný [20] give a characterization of CSPs where the problem of minimizing the penalty
for unsatisfied constraints can be solved exactly by the basic LP. Also, a recent result by Thapper and
Živný [32] shows that the valued CSP problem for a constraint language can be solved to optimality by a
bounded number of levels of the Sherali–Adams hierarchy if and only if it can be solved by a relaxation
obtained by augmenting the basic LP with constraints implied by three levels of the Sherali–Adams
hierarchy. However, the above papers only consider the case when the LP gives an exact solution, and do
not focus on approximation.
    The techniques from [9] used in our result were also extended by Lee [24] to prove a hardness result
for the Graph Pricing problem. Kenkre et al. [15] also applied these techniques to show the optimality of
a simple LP-based algorithm for Digraph Ordering.

1.2.1    Our results

Our main result is the following theorem, which shows that for every CSP, for instances of size n, the
basic LP is at least as strong as any relaxation given by o(log n/ log log n) levels of the Sherali–Adams
hierarchy.

Theorem 1.1. Let f : [q]k → {0, 1} be any predicate. Let Φ0 be a (c, s)-integrality gap instance for the
basic LP relaxation of MAX k-CSP( f ). Then for every ε > 0, there exists cε > 0 such that for infinitely
many N ∈ N, there exist (c − ε, s + ε)-integrality gap instances of size N for the LP relaxation given by
cε · log N/ log log N levels of the Sherali–Adams hierarchy.

     Combining the above with the connection between Sherali–Adams gaps and extended formulations
by [7, 21] also yields that the basic LP is at least as strong as any LP extended formulation of size
n^{o(log n/ log log n)}.

Corollary 1.2. Let f : [q]k → {0, 1} be any predicate. Let Φ0 be a (c, s)-integrality gap instance for the
basic LP relaxation of MAX k-CSP( f ). Then for every ε > 0, there exists c′ε > 0 such that for infinitely
many N ∈ N, there exist (c − ε, s + ε)-integrality gap instances of size N, for every linear extended
formulation of size at most N^{c′ε · log N/ log log N}.


    As an application of our methods, we also simplify and strengthen the approximation resistance
results for LPs proved by Khot et al. [19]. They studied predicates f : {0, 1}k → {0, 1} and provided
a necessary and sufficient condition for the predicate to be strongly approximation resistant for the
Sherali–Adams LP hierarchy. One says a predicate is strongly approximation resistant if for all ε > 0, it
is hard to distinguish instances Φ for which

                                        OPT(Φ) − E_{x∈{0,1}^k}[ f (x)] ≤ ε

from instances with OPT(Φ) ≥ 1 − ε. In the context of the Sherali–Adams hierarchy, they showed that
when this condition is satisfied, there exist instances Φ satisfying

                    OPT(Φ) − E_{x∈{0,1}^k}[ f (x)] ≤ ε        and        FRAC(Φ) ≥ 1 − ε ,

where FRAC(Φ) is the value of the relaxation given by Oε (log log n) levels of the Sherali–Adams
hierarchy. We strengthen their result (and provide a simpler proof) to prove the following.

Theorem 1.3. Let f : {0, 1}k → {0, 1} be any predicate satisfying the condition for strong approximation
resistance for LPs, given by [19]. Then for every ε > 0, there exists cε > 0 such that for infinitely many
N ∈ N, there exists an instance Φ of MAX k-CSP( f ) of size N, satisfying

                    OPT(Φ) − E_{x∈{0,1}^k}[ f (x)] ≤ ε        and        FRAC(Φ) ≥ 1 − ε ,

where FRAC(Φ) is the value of the relaxation given by cε · log N/ log log N levels of the Sherali–Adams
hierarchy.

    As before, the above theorem also yields a corollary for extended formulations.

1.2.2   Proof overview and techniques
The gap instance. The construction of our gap instances is inspired by the construction by Khot et
al. [19]. They gave a generic construction to prove integrality gaps for any approximation resistant
predicate (starting from certificates of hardness in form of certain “vanishing measures”), and we use
similar ideas to give a construction which can start from a basic LP integrality gap instance as a certificate,
to produce a gap instance for a large number of levels. This construction is discussed in Section 5.
     Given an integrality gap instance Φ0 on n0 variables, we treat this as a “template” (as in Raghaven-
dra [26]) and generate a random instance using this. Concretely, we generate a new instance Φ on n0 sets
of n variables each. To generate a constraint, we sample a random constraint C0 ∈ Φ0 , and pick a variable
randomly from each of the sets corresponding to variables in C0 . Thus, the instances generated are
n0 -partite random hypergraphs, with each edge being generated according to a specified “type” (the
indices of the sets to choose vertices from).
     Note that previous instances of gap constructions for LP and SDP hierarchies were hypergraphs
generated according to the model Gn,p with the signs of the literals chosen independently at random.


However, proving an LP/SDP lower bound using such instances implies a strong result: the predicate f is
useless for the corresponding relaxation, in the sense defined by [4] (replacing the “P ≠ NP” assumption
by the assumption that UG does not belong to P). Uselessness only holds for a limited class of predicates
f (when f^{−1}(1) supports a balanced pairwise independent distribution on [q]k ) [4]. Thus, proving an SDP
lower bound for predicates which are not expected to be useless requires a new construction of instances,
which cannot be generated uniformly at random. Our construction provides such a generalization, and
may be useful in proving new SDP lower bounds. The properties of random Gn,p hypergraphs easily carry
over to our instances, and we collect these properties in Section 3.
     The above construction ensures that if the instance Φ0 does not have an assignment satisfying more
than an s fraction of the constraints, then OPT(Φ) ≤ s + ε with high probability. Also, it is well-known
that providing a good LP solution to the relaxation given by t levels of the Sherali–Adams hierarchy
is equivalent to providing distributions DS on [q]S for all sets of variables S with |S| ≤ t, such that
the distributions are consistent restricted to subsets, i. e., for all S with |S| ≤ t and all T ⊆ S, we have
DS|T = DT . Thus, in our case, we need to produce such consistent local distributions such that the
expected probability that a random constraint C ∈ Φ is satisfied by the local distribution on the set of
variables involved in C (which we denote as SC ) is at least c − ε.

Local distributions from local structure. Most papers on integrality gaps for CSPs utilize the local
structure of random hypergraphs to produce such distributions. Since the girth of a sparse random
hypergraph is Ω(log n), any induced subgraph on o(log n) vertices is simply a forest. If the
induced hypergraph GS on a set S is a tree, there is an easy distribution to consider: simply choose an
arbitrary root and propagate down the tree by sampling each child conditioned on its parent. It is also
easy to see that for T ⊆ S, if the induced hypergraph GT is a subtree of GS , then the distributions DS and
DT produced as above are consistent.
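As a purely illustrative sketch of this root-and-propagate sampling: the helper below assumes a tree given as a child map and a hypothetical conditional distribution `cond` standing in for the pairwise distributions that would come from an LP solution (none of these names are from the paper).

```python
import random

def sample_on_tree(tree, root, q, cond, rng=random):
    """Sample an assignment on a tree top-down: pick the root's value
    uniformly from [q], then draw each child from cond(parent_value),
    a list of q (unnormalized) weights. tree: dict vertex -> children."""
    sigma = {root: rng.randrange(q)}
    stack = [root]
    while stack:
        v = stack.pop()
        for w in tree.get(v, []):
            sigma[w] = rng.choices(range(q), weights=cond(sigma[v]))[0]
            stack.append(w)
    return sigma

# Example: "not-equal" propagation (as for MAX-CUT) on the path 0 - 1 - 2.
flip = lambda a: [0.0 if b == a else 1.0 for b in range(2)]
sigma = sample_on_tree({0: [1], 1: [2]}, root=0, q=2, cond=flip)
assert sigma[1] != sigma[0] and sigma[2] != sigma[1]
```

Sampling a subtree first and then extending, or sampling the whole tree and restricting, yields the same distribution, which is the consistency property used above.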
     The extension of this idea to forests requires some care. One can consider extending the distribution
to forests by propagating independently on each tree in the forest. However, if for T ⊆ S, GT is a forest
while GS is a tree, then a pair of vertices disconnected in GT will have no correlation in DT but may
be correlated in DS . This was handled, for example, in [19] by adding noise to the propagation and
using a large ball B(S) around S to define DS . Then, if two vertices of T are disconnected in B(T ) but
connected in B(S), then they must be at a large distance from each other. Thus, because of the noise,
the correlation between them (which is zero in DT ) will be very small in DS . However, correcting
approximate consistency to exact consistency incurs a cost which is exponential in the number of levels
(i. e., the sizes of the sets), which is what limits the results in [19, 12] to O(log log n) levels. This also
makes the proof more involved since it requires a careful control of the errors in consistency.

Consistent partitioning schemes. We resolve the above consistency issue by first partitioning the
given set S into a set of clusters, each of which has diameter ∆H = o(log n) in the underlying hypergraph
H. Since each cluster has bounded diameter, it becomes a tree when we add all the missing paths between
any two vertices in the cluster. We then propagate independently on each cluster (augmented with the
missing paths). This preserves the correlation between any two vertices in the same cluster, even if the
path between them was not originally present in GS .
    Of course, the above plan requires that the partition obtained for T ⊆ S is consistent with the


restriction to T of the partition obtained for the set S. In fact, we construct distributions over partitions
{PS }|S|≤t , which satisfy the consistency property PS|T = PT . These distributions over partitions, which
we call consistent partitioning schemes, are constructed in Section 4.
     In addition to being consistent, we require that the partitioning scheme cuts only a small number of
edges in expectation, since these contribute to a loss in the LP objective. We remark that such low-diameter
decompositions (known as separating and padded decompositions) have been used extensively in the
theory of metric embeddings (see, e. g., [23] and the references therein). The only additional requirement in
our application is consistency.
     We obtain the decompositions by proving the (easy) hypergraph extensions of the results of Charikar,
Makarychev and Makarychev [11], who exhibit a metric which is similar to the shortest-path metric on
graphs at small distances, and has the property that its restriction to any subset of size at most n^{ε′} (for
an appropriate ε′ < 1) is ℓ2 -embeddable. This is proved in Section 3. A variant of this metric was used
by Charikar, Makarychev and Makarychev [9] to prove lower bounds for MAX-CUT, for n^{ε′} levels of
the Sherali–Adams hierarchy. They used the embedding to construct a “local SDP solution” for any n^{ε′}
variables (with value 1 − ε′) and produced the distributions required for Sherali–Adams by rounding the
SDP solutions (which gives value 1 − O(√ε′)). However, rounding an SDP solution with a high value
does not always produce a good integral solution for other CSPs.
     Instead, we use these metrics in Section 4 to construct the consistent partitioning schemes as described
above, by applying a result of Charikar et al. [8] giving separating decompositions for finite subsets of
ℓ2 . We remark that it is the consistency requirement of the partitioning procedure that limits our results
to O(log n/ log log n) levels. The separation probability in the decomposition procedure grows with the
dimension of the ℓ2 embedding, while (to the best of our knowledge) dimension reduction procedures
seem to break consistency.


2     Preliminaries
We use [n] to denote the set {1, . . . , n}. The only exception is [q], where we overload this notation
to denote the set {0, . . . , q − 1}, which corresponds to the alphabet for the Constraint Satisfaction
Problem under consideration. We use DS and PS to denote probability distributions over assignments
to and partitions of a set S, respectively. For T ⊆ S, the notation DS|T is used to denote the restriction
(marginal) of the distribution DS to the set T (and similarly for PS|T ).


2.1   Constraint Satisfaction Problems
Definition 2.1. Let [q] denote the set {0, . . . , q − 1}. For a predicate f : [q]k → {0, 1}, an instance Φ of
MAX k-CSPq ( f ) consists of a set {x1 , . . . , xn } of variables and a set {C1 , . . . ,Cm } of constraints. Each
constraint Ci is over a k-tuple (xi1 , . . . , xik ) of variables, and the constraint is of the form

                                        Ci ≡ f (xi1 + bi1 , . . . , xik + bik )

for some bi1 , . . . , bik ∈ [q], where the addition is modulo q. For an assignment σ : [n] → [q], let sat(σ )
denote the fraction of constraints satisfied by σ (where xi gets assigned to σ (i)). The maximum fraction


                                                        

             maximize      E_{C∈Φ} [ Σ_{α∈[q]^k} f (α + bC ) · x(SC ,α) ]

             Σ_{α∈[q]^S : α|T =β} x(S,α) = x(T,β )             ∀ T ⊆ S ⊆ [n], |S| ≤ t, ∀ β ∈ [q]^T

                     x(S,α) ≥ 0                                ∀ S ⊆ [n], |S| ≤ t, ∀ α ∈ [q]^S

                     x(∅,∅) = 1


                           Figure 1: Level-t Sherali–Adams LP for MAX k-CSPq ( f ).


of constraints that can be simultaneously satisfied is denoted by OPT(Φ), i. e.,

                                     OPT(Φ) =  max_{σ : [n]→[q]}  sat(σ ) .


    For a constraint C of the above form, we use xC to denote the tuple (xi1 , . . . , xik ) of variables, and bC
to denote the tuple (bi1 , . . . , bik ). We then write the constraint as f (xC + bC ). We also denote by SC the set
of indices, {i1 , . . . , ik }, of the variables participating in the constraint C.
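For concreteness, here is a small Python sketch (with our own, hypothetical instance encoding: each constraint is a pair of an index tuple and a shift tuple) that evaluates sat(σ) and computes OPT(Φ) by brute force on a toy instance.

```python
from itertools import product

def sat_fraction(constraints, sigma, f, q):
    """sat(sigma): fraction of constraints satisfied by the assignment sigma.
    Each constraint is a pair (vars, shifts) = ((i1,...,ik), (b1,...,bk))."""
    good = sum(f(tuple((sigma[i] + b) % q for i, b in zip(vs, bs)))
               for vs, bs in constraints)
    return good / len(constraints)

def opt_brute_force(constraints, n, f, q):
    """OPT(Phi) by exhaustive search over all q**n assignments (tiny n only)."""
    return max(sat_fraction(constraints, sigma, f, q)
               for sigma in product(range(q), repeat=n))

# MAX-CUT as a MAX 2-CSP: f accepts when its two arguments differ.
f_cut = lambda x: 1 if x[0] != x[1] else 0
# A triangle of "not-equal" constraints: no assignment satisfies all three.
triangle = [((0, 1), (0, 0)), ((1, 2), (0, 0)), ((0, 2), (0, 0))]
assert abs(opt_brute_force(triangle, 3, f_cut, 2) - 2/3) < 1e-9
```

The triangle instance has OPT = 2/3, since any two-coloring of an odd cycle leaves at least one edge monochromatic.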


2.2   The LP relaxations for Constraint Satisfaction Problems
Below we present various LP relaxations for the MAX k-CSPq ( f ) problem that are relevant in this paper.
    We start with the level-t Sherali–Adams relaxation. The intuition behind it is the following. Note
that an integer solution to the problem can be given by an assignment σ : [n] → [q]. Using this, we can
define {0, 1}-valued variables x(S,α) for each S ⊆ [n], 1 ≤ |S| ≤ t and α ∈ [q]^S , with the intended solution
x(S,α) = 1 if σ (S) = α and 0 otherwise. We also introduce a variable x(∅,∅) , which equals 1. We relax
the integer program and allow variables to take real values in [0, 1]. Now the variables {x(S,α) }α∈[q]^S give
a probability distribution DS over assignments to S. We can enforce consistency between these local
distributions by requiring that for T ⊆ S, the distribution over assignments to S, when marginalized to T ,
is precisely the distribution over assignments to T , i. e., DS|T = DT . The relaxation is shown in Figure 1.
    The basic LP relaxation is a reduced form of the above relaxation where only those variables x(S,α) are
included for which S = SC is the set of CSP variables for some constraint C. The consistency constraints
are included only for singleton subsets of the sets SC . We note here that for any feasible solution to the
basic LP relaxation, the local distributions {x(S,α) } assign the same value to the repeated variables of a
constraint. Note that all the constraints of the basic LP are implied by the relaxation obtained by level k
of the Sherali–Adams hierarchy.
    For an LP/SDP relaxation of MAX k-CSPq , and for a given instance Φ of the problem, we denote by
FRAC(Φ) the LP/SDP (fractional) optimum. A relaxation is said to have a (c, s)-integrality gap if there
exists a CSP instance Φ such that FRAC(Φ) ≥ c and OPT(Φ) < s.


                                                                

                     maximize      E_{C∈Φ} [ Σ_{α∈[q]^k} f (α + bC ) · x(SC ,α) ]

                      Σ_{b∈[q]} x(i,b) = 1                                ∀ i ∈ [n]

                      Σ_{α∈[q]^{SC} : α(i)=b} x(SC ,α) = x(i,b)           ∀ C ∈ Φ, i ∈ SC , b ∈ [q]

                            x(SC ,α) ≥ 0                                  ∀ C ∈ Φ, ∀ α ∈ [q]^{SC}

                                 Figure 2: Basic LP relaxation for MAX k-CSPq ( f ).
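To illustrate the relaxation of Figure 2, the sketch below (pure Python, with a hypothetical instance encoding of our own: each constraint is an index tuple plus a shift tuple) checks the equality constraints of the basic LP for a candidate solution and evaluates the objective. For a triangle of "not-equal" constraints (MAX-CUT), it exhibits a feasible fractional solution of value 1 even though OPT = 2/3, witnessing a basic LP integrality gap.

```python
from itertools import product

def basic_lp_objective(constraints, x_c, f, q):
    """Objective of Figure 2: average over constraints C of
    sum_alpha f(alpha + b_C) * x_(S_C, alpha)."""
    total = 0.0
    for c, (vs, bs) in enumerate(constraints):
        S = sorted(set(vs))
        pos = {v: j for j, v in enumerate(S)}
        for alpha, w in x_c[c].items():
            total += w * f(tuple((alpha[pos[v]] + b) % q for v, b in zip(vs, bs)))
    return total / len(constraints)

def check_feasible(constraints, x_v, x_c, n, q, tol=1e-9):
    """Verify nonnegativity, unit singleton marginals, and the
    marginal-consistency equalities of the basic LP (Figure 2)."""
    for i in range(n):
        assert abs(sum(x_v[i][b] for b in range(q)) - 1) < tol
    for c, (vs, bs) in enumerate(constraints):
        S = sorted(set(vs))
        pos = {v: j for j, v in enumerate(S)}
        assert all(w >= -tol for w in x_c[c].values())
        for v in S:
            for b in range(q):
                marg = sum(w for a, w in x_c[c].items() if a[pos[v]] == b)
                assert abs(marg - x_v[v][b]) < tol
    return True

f_cut = lambda x: 1 if x[0] != x[1] else 0
triangle = [((0, 1), (0, 0)), ((1, 2), (0, 0)), ((0, 2), (0, 0))]
x_v = {i: {0: 0.5, 1: 0.5} for i in range(3)}            # uniform singletons
x_c = {c: {(0, 1): 0.5, (1, 0): 0.5} for c in range(3)}  # per-edge distributions
assert check_feasible(triangle, x_v, x_c, n=3, q=2)
assert abs(basic_lp_objective(triangle, x_c, f_cut, q=2) - 1.0) < 1e-9
```

Each local distribution splits its mass between the two "cut" assignments of an edge, so every singleton marginal is uniform and all consistency constraints hold, yet every constraint is satisfied with probability 1 under its local distribution.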


2.3     Hypergraphs
An instance Φ of MAX k-CSP defines a natural associated hypergraph H = (V, E) with V being the set of
variables in Φ and E containing one hyperedge for every constraint C ∈ Φ. We remind the reader of the
familiar notions of degree, paths, and cycles for the case of hypergraphs:

Definition 2.2. Let H = (V, E) be a hypergraph.

      • For a vertex v ∈ V , the degree of the vertex v is defined to be the number of distinct hyperedges
        containing it.

      • A simple path P is a finite alternating sequence of distinct vertices and distinct edges, starting and
        ending at vertices, i. e., P = v1 , e1 , v2 , . . . , v` , e` , v`+1 , where vi ∈ V for all i ∈ [` + 1] and ei ∈ E
        for all i ∈ [`]. Furthermore, ei contains vi and vi+1 for each i. Here ` is called the length of the
        path P. All paths discussed in this paper will be simple paths.

      • A sequence C = (v1 , e1 , v2 , . . . , v` , e` , v1 ) is called a cycle of length ` if the initial segment
        v1 , e1 , . . . , v` is a (simple) path, e` ≠ ei and |ei | > 1 for all i ∈ [`], and v1 ∈ e` . We note that
        we don’t include hyperedges with only one vertex towards forming cycles. For a path P (or cycle
        C), we use V(P) (or V(C)) to denote the set of all the vertices that occur in the edges, i. e., the set
        {v : (∃i ∈ [h])(v ∈ ei )}, where e1 , . . . , eh are the hyperedges included in P (or C). For a path P, |P|
        denotes the number of hyperedges on the path P.

      • For a given hypergraph H, the length of the smallest cycle in H is called the girth of H.

    To observe the difference between the notions of cycles in graphs and hypergraphs, it is instructive to
consider the following example: let u, v be two distinct vertices in a k-uniform hypergraph for k ≥ 3, and
let e1 , e2 be two distinct hyperedges both containing u and v. Then u, e1 , v, e2 , u is a cycle of length 2,
which cannot occur in a graph.
    We shall also need the following notion of the closure of a set S ⊆ V in a given hypergraph H, defined
by [9] for the case of graphs. A stronger notion of closure was also considered by [5].



Definition 2.3. For a given hypergraph H, R ∈ N, and a set S ⊆ V(H), we denote by clR (S) the
R-closure of S, obtained by adding all the vertices on all the paths of length at most R connecting two
vertices of S, i. e.,

                 clR (S) = S ∪ ( ⋃ { V(P) : P is a path in H connecting u, v ∈ S, with |P| ≤ R } ).

For ease of notation, we use cl(S) to denote cl1 (S).
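An exponential-time, illustration-only computation of clR(S) by enumerating simple paths, with a hypergraph encoded simply as a list of vertex sets (our own encoding, not the paper's), could look as follows.

```python
def closure_R(edges, S, R):
    """cl_R(S): S together with V(P) for every simple path P of length at
    most R in the hypergraph that connects two vertices of S.
    edges: list of vertex sets. For illustration on tiny inputs only."""
    S = set(S)
    out = set(S)

    def dfs(v, used_edges, used_verts):
        if v in S and used_edges:        # reached another vertex of S
            for i in used_edges:
                out.update(edges[i])     # add all vertices of the path's edges
        if len(used_edges) == R:
            return
        for i, e in enumerate(edges):
            if i in used_edges or v not in e:
                continue
            for w in e:
                if w not in used_verts:
                    dfs(w, used_edges | {i}, used_verts | {w})

    for u in S:
        dfs(u, frozenset(), frozenset({u}))
    return out

# Path 0 -{0,1}- 1 -{1,2}- 2: vertex 1 joins cl_2({0, 2}) but not cl_1({0, 2}).
assert closure_R([{0, 1}, {1, 2}], {0, 2}, 2) == {0, 1, 2}
assert closure_R([{0, 1}, {1, 2}], {0, 2}, 1) == {0, 2}
```

The DFS tracks both used edges and visited vertices, matching the requirement that paths be simple.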


3    Properties of random hypergraphs
In this section we collect various properties of the hypergraphs corresponding to our integrality gap
instances. The gap instances we generate contain several disjoint collections of variables. Each constraint
in the instance has a specified “type,” which determines the collection from which each of the k
participating variables must be sampled. The constraint is generated by randomly sampling each of the
k variables from the collections specified by its type. This is captured by the generative model described
below.
    In the model below and in the construction of the gap instance, the parameter n0 should be thought
of as constant, while the parameters n and m should be thought of as growing to infinity. We will choose
m = γ · n for γ = Ok,q (1).

Definition 3.1. Let n0 , k ∈ N with k ≥ 2. Let m, n ∈ N and let Γ be a distribution on [n0 ]k . We define
a distribution Hk (m, n, n0 , Γ) on n0 -partite hypergraphs on N = n0 · n vertices, divided into n0 sets,
X1 , . . . , Xn0 , of size n each. Each H ∈ Supp(Hk (m, n, n0 , Γ)) has m edges and each edge has at most k
vertices. A random hypergraph H ∼ Hk (m, n, n0 , Γ) is generated by sampling m random hyperedges
independently as follows:
    • Sample a random type (i1 , . . . , ik ) ∈ [n0 ]k from the distribution Γ.
    • For all distinct ij , sample vij independently and uniformly from Xij .
    • Add the edge ei = {vi1 , . . . , vik } to H.
    Note that as specified above, the model may generate a multi-hypergraph. However, the number of
such repeated edges is likely to be small, and we will bound these, and in fact the number of cycles of
size o(log n) in Lemma A.2.
    We will study the following metrics (similar to the ones defined in [11]):
Definition 3.2. Given a hypergraph H with vertex set V , we define two metrics dµH (·, ·), ρµH (·, ·) on V as

              dµH (u, v) := 1 − (1 − µ)^{2·dH (u,v)}    and    ρµH (u, v) := √( (2 · dµH (u, v) + µ) / (1 + µ) ) ,

for u ≠ v, where dH (·, ·) denotes the shortest-path distance in H.


    We primarily need the local `2 -embeddability of the metric ρµH . The following theorem captures
various properties of random hypergraphs required for our construction. The proof of the theorem heavily
uses results proved in [3] and [9] and we defer the details to Appendix A.

Theorem 3.3. Let H′ ∼ Hk (m, n, n0 , Γ) with m = γ · n edges and let ε > 0. Then for large enough n,
with high probability (at least 1 − ε, over the choice of H′ ), there exist δ > 0, constants c = c(k, γ, n0 , ε),
θ = θ (k, γ, n0 , ε), and a subhypergraph H ⊂ H′ with V(H) = V(H′ ) satisfying the following:

    • H has girth g ≥ δ · log n.

    • |E(H′ ) \ E(H)| ≤ ε · m.

    • For all t ≤ nθ , for µ ≥ c · (log t + log log n)/ log n, for all S ⊆ V(H′ ) with |S| ≤ t, the metric ρµH
      restricted to S is isometrically embeddable into the unit sphere in ℓ2 .


4    Decompositions of hypergraphs from local geometry
We will construct the Sherali–Adams solution by partitioning the given subset of vertices into trees, and
then creating a natural distribution over satisfying assignments on trees. We define below the kind of
partitions we need.

Definition 4.1 (Consistent Partitioning Scheme). Let X be a finite set. For a set S, let PS denote a
distribution over partitions of S. For T ⊆ S, let PS|T be the distribution over partitions of T obtained by
restricting the partitions in PS to the set T . We say that a collection of distributions {PS }|S|≤t forms a
consistent partitioning scheme of order t, if

                              ∀S ⊆ X, |S| ≤ t       and    ∀T ⊆ S     PT = PS|T .

    In addition to being consistent as described above, we also require the distributions to have small
probability of cutting the hyperedges for the hypergraphs corresponding to our CSP instances. We define
this property below.

Definition 4.2. Let H = (V, E) be a hypergraph with each hyperedge having at most k vertices. Let
{PS }|S|≤t be a consistent partitioning scheme of order t for the vertex set V , with t ≥ k. We say the
scheme {PS }|S|≤t is ε-sparse for H if

                            ∀e ∈ E :   P_{P∼Pe} [e is cut by P] ≤ ε .

    In this section, we will prove that the hypergraphs arising from random CSP instances admit sparse
and consistent partitioning schemes. Recall that for a hypergraph H, we define (Definition 3.2) the metrics
dµH and ρµH as:

              dµH (u, v) := 1 − (1 − µ)^{2·dH (u,v)}    and    ρµH (u, v) := √( (2 · dµH (u, v) + µ) / (1 + µ) ) .



Lemma 4.3. Let H = (V, E) be a hypergraph with each hyperedge containing at most k vertices and
let ρµH be the metric as defined above. Further, let H be such that for all sets S ⊆ V with |S| ≤ t, the
metric induced by ρµH on S is isometrically embeddable into ℓ2 . Then, there exist ε ≤ 10k · √(µ · t) and
∆H = O(1/µ) such that H admits an ε-sparse consistent partitioning scheme of order t, with each
partition consisting of clusters of diameter at most ∆H in H.

    We use the following result of Charikar et al. [8] which shows that low-dimensional metrics have good
separating decompositions with bounded diameter, i. e., decompositions which have a small probability
of separating points at a small distance.

Theorem 4.4 ([8]). Let W be a finite collection of points in Rd and let ∆ > 0 be given. Then there exists
a distribution P over partitions of W such that

    - ∀P ∈ Supp(P), each cluster in P has ℓ2 diameter at most ∆, and

    - for all x, y ∈ W,

                      P_{P∼P} [P separates x and y] ≤ 2√d · ‖x − y‖2 / ∆ .

    We also need the observation that the partitions produced by the above theorem are consistent,
assuming the set S considered above lies in a fixed bounded set (using a trivial modification of the
procedure in [8]). For the sequel, we use B(x, δ ) to denote the ℓ2 ball around x of radius δ and BH (u, r)
to denote a ball of radius r around a vertex u ∈ V(H). Thus,

           B(x, δ ) := {y | ‖x − y‖2 ≤ δ }    and    BH (u, r) := {v ∈ V | dH (u, v) ≤ r} .

The balls B(S, δ ) and BH (S, r) are defined similarly.

Claim 4.5. Let S and T be sets such that T ⊆ S. Let WS = {wu }u∈S and WT = {w′u }u∈T be ℓ2 -embeddings
of S and T satisfying φ (WT ) ⊆ WS ⊆ B(0, R0 ) ⊂ Rd , for some unitary transformation φ and R0 > 0. Let
PS and PT be distributions over partitions of S and T respectively, induced by partitions on WS and WT
as given by Theorem 4.4. Then
                                              PS|T = PT .

Proof. The claim follows simply by considering (a trivial modification of) the algorithm of [8]. For a
given set W and a parameter ∆, they produce a partition using the following procedure:

    • Let W′ = W .

    • Repeat until W′ = ∅:

         – Pick a random point x in B(W, ∆/2) according to the Haar measure. Let Cx = B(x, ∆/2) ∩W′ .
         – If Cx ≠ ∅, set W′ = W′ \ Cx . Output Cx as a cluster in the partition.


[8] show that the above procedure produces a distribution over partitions satisfying the conditions in
Theorem 4.4. We simply modify the procedure to sample a random point x in B(0, R0 + ∆/2) instead of
B(W, ∆/2). This does not affect the separation probability of any two points, since the only non-empty
clusters are still produced by the points in B(S, ∆/2). Since R0 + ∆ < ∞, the above procedure almost
surely terminates in finitely many steps.
    Let P be a partition of S produced by the above procedure when applied to the point set WS , and
let P′ be a random partition produced when applied to the point set φ (WT ). It is easy to see from the
above procedure that the distribution PT is invariant under a unitary transformation of WT . By coupling
the random choice of a point in B(0, R0 + ∆/2) chosen at each step in the procedures applied to WS
and φ (WT ) ⊆ WS , we get that P(T ) = P′ , i. e., the partition P restricted to T equals P′ . Thus, we get
PS|T = PT .
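As a concrete illustration, the (modified) ball-carving procedure can be sketched as follows. This is a minimal sketch under our own naming and data layout, not the authors' code; sampling from the Haar measure on B(0, R0 + ∆/2) is done by the standard Gaussian-direction trick.

```python
import math
import random

def uniform_in_ball(d, r):
    """Uniform point in the d-dimensional Euclidean ball of radius r:
    Gaussian direction, radius proportional to U^(1/d)."""
    g = [random.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in g)) or 1.0
    rad = r * random.random() ** (1.0 / d)
    return tuple(rad * x / norm for x in g)

def carve(points, Delta, R0):
    """Ball-carving sketch: repeatedly pick a uniform center x in
    B(0, R0 + Delta/2) and remove the cluster B(x, Delta/2) among the
    remaining points.  `points` maps labels to coordinate tuples lying
    in B(0, R0); each cluster lies in a ball of radius Delta/2 and hence
    has l2 diameter at most Delta."""
    d = len(next(iter(points.values())))
    remaining = dict(points)
    clusters = []
    while remaining:
        x = uniform_in_ball(d, R0 + Delta / 2)
        C = [u for u, p in remaining.items() if math.dist(x, p) <= Delta / 2]
        if C:
            clusters.append(C)
            for u in C:
                del remaining[u]
    return clusters
```

Since every center is drawn from the same fixed region B(0, R0 + ∆/2), running the procedure on a subset of the points (under the coupling in the proof above) restricts the partition, which is the consistency property used in Claim 4.5.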

      We can use the above to prove Lemma 4.3.

Proof of Lemma 4.3. Given a set S, let WS be an ℓ2 embedding of the metric ρµH restricted to S. Since
|S| ≤ t, we can assume WS ⊆ Rt . We apply the partitioning procedure of Charikar et al. from Theorem 4.4
with ∆ = 1/2. From the definition of the metric ρµH , we get that there exists a ∆H = O(1/µ) such that

                                     ρµH (u, v) ≤ 1/2 =⇒ dH (u, v) ≤ ∆H .

Moreover, for u, v contained in an edge e, we have that ρµH (u, v) ≤ √(5µ) and hence the probability that u
and v are separated is at most 10 · √(µ · t). Thus, the probability that any vertex in e is separated from u is at
most 10k · √(µ · t), as e contains at most k vertices.
    Finally, for any T ⊆ S, if WS and WT denote the corresponding ℓ2 embeddings, by the rigidity of ℓ2
we have that φ (WT ) ⊆ WS for some unitary transformation φ . Thus, by Claim 4.5, we get that this is a
consistent partitioning scheme of order t.


5     The Sherali–Adams integrality gaps construction
5.1     Integrality gaps from the basic LP
Recall the basic LP relaxation for MAX k-CSPq ( f ) given in Figure 2. In this section, we will
prove Theorem 1.1. We recall the statement below.

Theorem 1.1. Let f : [q]k → {0, 1} be any predicate. Let Φ0 be a (c, s)-integrality gap instance for the
basic LP relaxation of MAX k-CSPq ( f ). Then for every ε > 0, there exists cε > 0 such that for infinitely
many N ∈ N, there exist (c − ε, s + ε)-integrality gap instances of size N for the LP relaxation given by
cε · log N/ log log N levels of the Sherali–Adams hierarchy.

     Let Φ0 be a (c, s)-integrality gap instance for the basic LP relaxation for MAX k-CSPq ( f ) with n0
variables and m0 constraints. We use it to construct a new integrality gap instance Φ. The construction
is similar to the gap instances constructed by Khot et al. [19] discussed in the next section. However,
we describe this construction first since it is simpler. The procedure for constructing the instance Φ is
described in Figure 3.



   Given: A (c, s) gap instance Φ0 on n0 variables, for the basic LP.
   Output: An instance Φ with N = n · n0 variables and m constraints.
   The variables are divided into n0 sets, X1 , . . . , Xn0 , one for each variable in Φ0 . We generate m
   constraints independently at random as follows:

      1. Sample a random constraint C0 ∼ Φ0 . Let xC0 = (i1 , . . . , ik ) ⊆ [n0 ] denote the multi-set of
         variables in this constraint.

      2. For each distinct ij for j ∈ [k], sample a random variable xij ∈ Xij . We note that if ij = ij′ then
         we set Xij = Xij′ .

      3. Add the constraint f ((xi1 , . . . , xik ) + bC0 ) to the instance Φ.


                                   Figure 3: Construction of the gap instance Φ.
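The construction of Figure 3 can be sketched in code. This is an illustrative data layout of our own, not the authors' code: each constraint of Φ0 is represented as a pair (xC0 , bC0 ) with xC0 a k-tuple over [n0 ], and the pair (j, v) denotes the v-th variable of the collection Xj .

```python
import random

def build_gap_instance(Phi0, n, m):
    """Generate the m constraints of the instance Phi from the basic-LP
    gap instance Phi0, following the three steps of Figure 3."""
    Phi = []
    for _ in range(m):
        xC0, bC0 = random.choice(Phi0)       # step 1: sample C0 ~ Phi0
        fresh = {}                           # step 2: one sample per distinct index
        for ij in xC0:
            if ij not in fresh:
                fresh[ij] = (ij, random.randrange(n))
        xs = tuple(fresh[ij] for ij in xC0)  # repeated indices reuse the variable
        Phi.append((xs, bC0))                # step 3: the constraint f(xs + bC0)
    return Phi
```

Note that, as in step 2, a repeated index in xC0 contributes the same sampled variable, so each constraint touches at most k distinct variables.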

Soundness
We first prove that no assignment satisfies more than s + ε fraction of constraints for the above instance.
Lemma 5.1. For every ε > 0, there exists γ = γ(ε, n0 , q) such that for an instance Φ generated by
choosing at least γ · n constraints independently at random as above, we have with probability 1 −
exp (−Ω(n)), OPT(Φ) < s + ε.
Proof. Fix an assignment σ ∈ [q]N . We will first consider E [satΦ (σ )] for a randomly generated Φ as
above.

                E_Φ [satΦ (σ )] = E_{C0 ∈Φ0} E_{(xi1 ,...,xik )} [ f (σ (xi1 ) + bi1 , . . . , σ (xik ) + bik )]

                                = E_{C0 ∈Φ0} E_{Z1 ,...,Zn0} [ f (ZC0 + bC0 )] ,

where for each i ∈ [n0 ], Zi is an independent random variable with the distribution
                                                                 
                                  P [Zi = b] := E_{x∈Xi} [ 1{σ (x)=b} ] ,

and ZC0 denotes the collection of variables in the constraint C0 , i. e.,

                                         ZC0 = {Zi }i∈SC0 .

Thus, the random variables Z1 , . . . , Zn0 define a random assignment to the variables in Φ0 , which gives,
for any σ
                      E_Φ [satΦ (σ )] = E_{C0 ∈Φ0} E_{Z1 ,...,Zn0} [ f (ZC0 + bC0 )] < s .

Consider a randomly added constraint C to the instance Φ. We have that

                              P [C(σ ) = 1] = E_Φ [satΦ (σ )] < s ,



for any fixed σ over random choice of the constraint C. Thus, for an instance Φ with m independently
and randomly generated constraints, we have

                 P_Φ [satΦ (σ ) ≥ s + ε] ≤ P_Φ [ satΦ (σ ) ≥ E_Φ [satΦ (σ )] + ε ]

                                         = P_Φ [ E_{C∈Φ} [1{C(σ )=1} ] ≥ E_Φ [satΦ (σ )] + ε ]

                                         ≤ exp (−Ω(ε² · m)) .

Taking a union bound over all assignments, we get

                 P_Φ [∃σ : satΦ (σ ) ≥ s + ε] ≤ q^{n·n0} · exp (−Ω(ε² · m)) ,

which is at most exp (−Ω(n)) for m = O(((log q)/ε²) · n · n0 ).
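The last step can be spelled out explicitly. For instance, with the (sufficient, and our own) choice m ≥ (2 ln q/ε²) · n · n0 , absorbing the constant of the Ω(·) into the choice of γ:

```latex
\[
\Pr_{\Phi}\big[\exists\,\sigma : \mathrm{sat}_\Phi(\sigma) \ge s+\varepsilon\big]
  \;\le\; q^{n n_0} \cdot e^{-\varepsilon^2 m}
  \;=\; \exp\big(n n_0 \ln q - \varepsilon^2 m\big)
  \;\le\; \exp\big(-n n_0 \ln q\big)
  \;=\; \exp(-\Omega(n)) .
\]
```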

Completeness
To prove the completeness, we first observe that the instance Φ as constructed above is also a gap instance
for the basic LP. We will then “boost” this hardness to many levels of the Sherali–Adams hierarchy.

Lemma 5.2. For every ε > 0, there exists γ = γ(ε) such that for an instance Φ generated by choosing at
least γ · n constraints independently at random as above, with probability 1 − exp (−Ω(n)) there exist
distributions DSC over [q]SC for each C ∈ Φ, and distributions Di over [q] for each variable i ∈ [n · n0 ],
satisfying:

    - For all C ∈ Φ and all i ∈ SC , DSC |{i} = Di ;

    - The distributions satisfy  E_{C∈Φ} E_{α∼DSC} [ f (α + bC )] ≥ c − ε/10 .

Proof. For each C0 ∈ Φ0 and each j ∈ [n0 ], let D(0)SC0 and D(0)j denote the basic LP solution satisfying

         D(0)SC0 |j = D(0)j   ∀C0 ∈ Φ0 ∀ j ∈ SC0     and     E_{C0 ∈Φ0} E_{α∼D(0)SC0} [ f (α + bC0 )] ≥ c .


Each constraint C ∈ Φ is sampled according to some constraint C0 ∈ Φ0 , and we take DSC := D(0)SC0 for
the corresponding constraint C0 ∈ Φ0 . Also, each variable xi for i ∈ [n0 · n] belongs to one of the sets Xj
for j ∈ [n0 ], and we take Di := D(0)j for the corresponding j ∈ [n0 ].
    The consistency of the distributions follows immediately from the construction of the instance Φ.
Let C ∈ Φ be any constraint and let C0 be the corresponding constraint in Φ0 . If xC0 = ( j1 , . . . , jk ), then
xC = (i1 , . . . , ik ) where each ir ∈ { jr } × [n] for all distinct jr . Thus, for any r ∈ [k],
                                 DSC |ir = D(0)SC0 |jr = D(0)jr = Dir .




We note that repeated variables in xC0 get the same value under the basic LP solution D(0)SC0 and
therefore the same is true for DSC . To bound the objective value, we again consider its expectation over a
randomly generated instance Φ. Let C be a random constraint added to Φ. Then, if we define DSC as
above for this constraint, we have

                 E_C E_{α∼DSC} [ f (α + bC )] = E_{C0 ∈Φ0} E_{α∼D(0)SC0} [ f (α + bC0 )] ≥ c .


Thus, the expected contribution of each constraint is at least c. The probability that the average of m
constraints deviates by at least ε/10 from the expectation is at most exp (−Ω(ε² · m)). There exists
γ = O(1/ε²) such that for m ≥ γ · n, the probability is at most exp(−Ω(n)).

    To construct local distributions for the Sherali–Adams hierarchy, we will consider (a slight modifica-
tion of) the hypergraph H corresponding to the instance Φ. We first show that distributions on hyperedges
of this hypergraph can be consistently propagated in a tree, provided they agree on intersecting vertices.
    For a set U ⊆ V(H) in a hypergraph H, recall that cl(U) includes all paths of length at most 1
between any two vertices in U. Thus, E(cl(U)) = {e ∈ E | |e ∩U| ≥ 2}. Note that Lemma 5.2 implies
that hyperedges forming a tree in H satisfy the hypothesis of Lemma 5.3 below.
Lemma 5.3. Let H = (V, E) be a hypergraph. Let U ⊆ V be such that the set of hyperedges E(cl(U))
form a tree. For each e ∈ E(cl(U)), let De be a distribution on [q]e such that for any u ∈ U and
e1 , e2 ∈ E(cl(U)) such that e1 ∩ e2 = {u}, we have De1 |u = De2 |u = Du . Then,

    - there exists a distribution DU on [q]U such that DU|e∩U = De|e∩U for all e ∈ E(U);

    - if U′ ⊆ U is such that the hyperedges in E(cl(U′ )) form a subtree of E(cl(U)), then DU|U′ = DU′ .
Proof. We define the distribution by starting with an arbitrary hyperedge and traversing the tree in an
arbitrary order. Let e1 , . . . , er be a traversal of the hyperedges in E(cl(U)) such that for all i,

                                    | ( ⋃_{j<i} ej ) ∩ ei | = 1 .
Let U0 = ⋃_{j<i} ej be the set of vertices for which we have already sampled an assignment and let ei be the
next hyperedge in the traversal, with u being the unique vertex in ei ∩U0 . We sample an assignment to
the vertices in ei , conditioned on the value for the vertex u. Formally, we extend the distribution DU0 to
U0 ∪ ei by taking, for any α ∈ [q]^{U0 ∪ei} ,

          DU0 ∪ei (α) = DU0 (α(U0 )) · Dei (α(ei )) / Dei |u (α(u)) = DU0 (α(U0 )) · Dei (α(ei )) / Du (α(u)) .

The above process defines a distribution Dcl(U) on cl(U), with

               Dcl(U) (α) = ( ∏_{e∈E(cl(U))} De (α(e)) ) / ( ∏_{u∈cl(U)} Du (α(u))^{deg(u)−1} ) .


In the above expression, we use deg(u) to denote the degree of vertex u in the tree formed by the hyperedges in
E(cl(U)), i. e., deg(u) = |{e ∈ E(cl(U)) | u ∈ e}|. We then define the distribution DU as the marginalized
distribution Dcl(U)|U , i. e.,

                             DU (α) = ∑_{β ∈[q]^{cl(U)} , β (U)=α} Dcl(U) (β ) .

Note that the distribution Dcl(U) and hence also the distribution DU are independent of the order in which
we traverse the hyperedges in E(cl(U)). Also, since the above process samples each hyperedge according
to the distribution De , we have that for any e ∈ E(U), Dcl(U)|e = De . Thus, also for any e ∈ E(U),

                                                  DU|e∩U = De|e∩U .

       Let U′ ⊆ U be any set such that E(cl(U′ )) forms a subtree of E(cl(U)). Then there exists a traversal
e1 , . . . , er , and i ∈ [r] such that ej ∈ E(cl(U′ )) ∀ j ≤ i and ej ∉ E(cl(U′ )) ∀ j > i. However, the distribution
defined by the partial traversal e1 , . . . , ei is precisely Dcl(U′ ) . Thus, we get that

                                       Dcl(U)| cl(U′ ) = Dcl(U′ )

which implies DU|U′ = DU′ .
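The sampling process in the proof above can be sketched as follows. This is a minimal illustration with our own data layout, not the authors' code: `edges` is a traversal e1 , . . . , er in which each new hyperedge meets the already-sampled vertex set in exactly one vertex, and `D[e]` maps assignment tuples (indexed like the tuple `e`) to probabilities, with consistent single-vertex marginals.

```python
import random

def sample_tree_assignment(edges, D):
    """Sample an assignment over the vertices of a tree of hyperedges by
    the traversal process of Lemma 5.3: each new hyperedge e_i shares a
    unique vertex u with the already-sampled set, and its assignment is
    drawn from D[e_i] conditioned on the value already given to u."""
    alpha = {}
    for e in edges:
        if not alpha:
            # First hyperedge: sample directly from D[e].
            support = list(D[e].items())
        else:
            u = next(v for v in e if v in alpha)  # the unique shared vertex
            # Condition D[e] on the value already sampled for u.
            support = [(a, p) for a, p in D[e].items()
                       if a[e.index(u)] == alpha[u]]
        assigns, probs = zip(*support)
        a = random.choices(assigns, weights=probs)[0]
        for v, val in zip(e, a):
            alpha[v] = val
    return alpha
```

Since `random.choices` normalizes the weights, the conditioning step implements the factor De (α(e))/Du (α(u)) appearing in the proof.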

    We can now prove the completeness for our construction using consistent decompositions.
Lemma 5.4. Let ε > 0 and let Φ be a random instance of MAX k-CSPq ( f ) generated by choos-
ing γ · n constraints independently at random as above, for large enough n. Then, there is a t =
Ωε,k,n0 (log n/ log log n), such that with probability 1 − ε over the choice of Φ, there exist distributions
{DS }|S|≤t satisfying:

    - For all S ⊆ V with |S| ≤ t, DS is a distribution on [q]S .

    - For all T ⊆ S ⊆ V with |S| ≤ t, DS|T = DT .

    - The distributions satisfy

                              E_{C∈Φ} E_{αC ∼DSC} [ f (αC + bC )] ≥ c − ε .

Proof. By Theorem 3.3, we know that there exists δ such that with probability 1 − ε/4, after removing
a set of constraints CB of size at most (ε/4) · m, we can assume that the remaining instance has girth
at least g = δ · log n. Also, there exist θ , c > 0 such that for all large enough n, for all t ≤ nθ , the
metric ρµH restricted to any set S of size at most t embeds isometrically into the unit sphere in `2 , for all
µ ≥ c · (logt + log log n)/ log n.
    We choose µ = 2c · log log n/ log n and t = ε²/(400k² · µ) so that

                     µ ≥ c · (log t + log log n)/ log n     and     √(µ · t) ≤ ε/(20k) .
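Both conditions can be checked directly from these choices of µ and t:

```latex
\[
\sqrt{\mu \cdot t} \;=\; \sqrt{\frac{\varepsilon^2}{400k^2}} \;=\; \frac{\varepsilon}{20k},
\qquad
t \;=\; \frac{\varepsilon^2}{400k^2\,\mu} \;=\; \frac{\varepsilon^2 \log n}{800\,c\,k^2 \log\log n}
 \;=\; \Omega_{\varepsilon,k}\!\left(\frac{\log n}{\log\log n}\right),
\]
```

and since t ≤ log n for large n, we have log t ≤ log log n, so that µ = 2c · log log n/ log n ≥ c · (log t + log log n)/ log n as required.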
Thus, by Lemma 4.3, H admits an (ε/2)-sparse partitioning scheme of order t with each cluster in the
partition having diameter at most ∆H = O(1/µ). Let {PS }|S|≤t denote this partitioning scheme.


    Given a set S, the distribution DS is a convex combination of several distributions DS,P , corresponding
to different partitions P sampled from PS . We describe the distribution DS by giving the procedure to
sample an α ∈ [q]S . Given the set S with |S| ≤ t:

    - Sample a partition P = (U1 , . . . ,Ur ) from the distribution PS .

    - For each set Ui , consider the set C (Ui ) obtained by including the vertices contained in all the
      hyperedges on the shortest paths between all pairs u, v ∈ Ui . Note that since Ui has diameter at most
      ∆H in H, C (Ui ) is connected and in fact C (Ui ) = cl∆H (Ui ). Also, since each vertex in an
      included path is within distance at most ∆H /2 of an end-point, and Ui has diameter at most ∆H , we
      know that the diameter of C (Ui ) is at most 2 · ∆H . Hence, since any cycle contained in C (Ui )
      would have length O(∆H ), while the girth of H is δ · log n, C (Ui ) contains no cycle for large n
      and is a tree. Finally, we must have cl(C (Ui )) = C (Ui ) since any additional path of length 1
      would create a cycle of length at most 2 · ∆H + 1.
      Thus, by Lemma 5.2 and Lemma 5.3 (with probability at least 1 − ε/4) there exists a distribution
      DC(Ui ) for each Ui , satisfying
                                          DC(Ui )|e = De
      for all e ∈ E (C (Ui )). Here, De are the distributions given by Lemma 5.2, which form a solution to
      the basic LP for Φ, with value at least c − ε/4. For each Ui , define the distribution

                                                       D′Ui := DC(Ui )|Ui .

    - Sample α ∈ [q]S according to the distribution

                                 DS,P := D′U1 × · · · × D′Ur .

Thus, we have

                                 DS := E_{P=(U1 ,...,Ur )∼PS} [ ∏_{i=1}^{r} D′Ui ] ,

where the distributions D′Ui are defined as above.
    We first prove the distributions are consistent on intersections, i. e., DS|T = DT for any T ⊆ S. Note
that by Lemma 4.3, the distributions PS and PT satisfy PS|T = PT . Each partition (U1 , . . . ,Ur ) of S also
produces a partition of T . For ease of notation, we assume that the first (say) r′ clusters have non-empty
intersection with T . Let Vi = Ui ∩ T for 1 ≤ i ≤ r′ (and Vi = ∅ for i > r′ ). Then, we have

        DS|T = E_{P=(U1 ,...,Ur )∼PS} [ ∏_{i=1}^{r} D′Ui |Vi ] = E_{P=(U1 ,...,Ur )∼PS} [ ∏_{i=1}^{r′} DC(Ui )|Vi ]

              = E_{P=(U1 ,...,Ur )∼PS} [ ∏_{i=1}^{r′} DC(Vi )|Vi ]

              = E_{P′=(V1 ,...,Vr′ )∼PT} [ ∏_{i=1}^{r′} DC(Vi )|Vi ] .




The second to last equality above uses the fact that C (Vi ) is a subtree of C (Ui ) and thus

                                            DC(Ui )|C(Vi ) = DC(Vi )

by Lemma 5.3. The last equality uses the fact that PS|T = PT by Lemma 4.3.
    We now argue that the LP solution corresponding to the above distributions {DS }|S|≤t has value at
least c − ε. Recall that the value of the LP solution is given by

                                 E_{C∈Φ} E_{α∼DSC} [ f (α + bC )] .

Consider any constraint C in Φ, with the corresponding set of variables SC and the corresponding
hyperedge e. When defining the distribution DSC , we will partition SC according to the distribution PSC .
By Lemma 4.3 and our choice of parameters
                        P_{P∼PSC} [SC is separated by P] ≤ 10k · √(µ · t) ≤ ε/2 .

For a constraint C which is not in the deleted set CB , if the hyperedge e corresponding to the constraint
C is not split by a partition P sampled according to PSC , then by Lemma 5.3

                                               DSC ,P = DSC .

Here, DSC is the distribution given by Lemma 5.2. Since f is Boolean, and the probability that e is split
is at most ε/2, we have that for C ∉ CB ,

                 E_{α∼DSC} [ f (α + bC )] ≥ E_{α∼DSC} [ f (α + bC )] − ε/2 ,

where the expectation on the left is over the distribution DSC constructed above, and the one on the right
is over the distribution given by Lemma 5.2.

Using Lemma 5.2 again, we get

     E_{C∼Φ} E_{α∼DSC} [ f (α + bC )] ≥ E_{C∼Φ} [ (1 − 1{C∈CB} ) · ( E_{α∼DSC} [ f (α + bC )] − ε/2 ) ]

                                      ≥ E_{C∼Φ} E_{α∼DSC} [ f (α + bC )] − ε/2 − E_{C∼Φ} [1{C∈CB} ]

                                      ≥ c − ε/4 − ε/2 − ε/4

                                      ≥ c − ε ,

where the penultimate inequality uses the fact that the fraction of constraints in the initially deleted set CB
is at most ε/4 (for large enough n).

5.2   Integrality gaps for resistant predicates
Let f : {0,1}^k → {0,1} be a Boolean predicate and let ρ(f) = |f^{-1}(1)|/2^k be the fraction of satisfying
assignments of f. Then f is approximation resistant if it is hard to distinguish instances of MAX k-CSP(f)
that are at least (1 − o(1))-satisfiable from those that are at most (ρ(f) + o(1))-satisfiable.

Mrinalkanti Ghosh and Madhur Tulsiani

    In [19], the authors introduce the notion of a vanishing measure (on a polytope defined by f) and use it
to characterize a variant of approximation resistance, called strong approximation resistance, assuming
the Unique Games Conjecture. They also give a weaker notion of vanishing measure, which they use to
characterize strong approximation resistance for LP hierarchies. In particular, they prove that when the
condition in their characterization is satisfied, there exists a (1 − o(1), ρ(f) + o(1))-integrality gap for
O(log log n) levels of the Sherali–Adams hierarchy for the predicate f. Here, we show that using Theorem 1.1,
their result can be simplified and strengthened^1 to O(log n/log log n) levels.
    Let us first recall some useful notation defined by Khot et al. [19] before we define the notion of
vanishing measure:

Definition 5.5. For a predicate f : {0,1}^k → {0,1}, let C(f) be the convex polytope of first moments
(biases) of distributions supported on satisfying assignments of f, i. e.,

$$C(f) := \left\{\zeta \in \mathbb{R}^k \;\middle|\; \exists\,\nu \text{ with } \mathrm{Supp}(\nu) \subseteq f^{-1}(1) \text{ such that } \forall i \in [k],\ \zeta_i = \mathop{\mathbb{E}}_{\alpha\sim\nu}\left[(-1)^{\alpha_i}\right]\right\}.$$
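As a concrete illustration of the polytope C(f), the Python sketch below enumerates the satisfying assignments of a small predicate and computes one bias vector; the choice of a 3-ary OR predicate and all names are ours, purely for illustration and not from the paper.

```python
from itertools import product

def predicate_or(bits):                    # illustrative predicate: 3-ary OR
    return int(any(bits))

k = 3
sat = [a for a in product([0, 1], repeat=k) if predicate_or(a)]

# Each satisfying assignment alpha contributes the point ((-1)^alpha_i)_i;
# C(f) is the convex hull of these points.
vertices = [tuple((-1) ** ai for ai in a) for a in sat]

# Bias vector of one distribution nu with Supp(nu) contained in f^{-1}(1):
nu = {a: 1.0 / len(sat) for a in sat}      # uniform over satisfying assignments
zeta = [sum(p * (-1) ** a[i] for a, p in nu.items()) for i in range(k)]
```

Any ζ ∈ C(f) arises this way, as a convex combination of the listed vertices.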

For a measure Λ on C(f), S ⊆ [k], b ∈ {0,1}^S and a permutation π : S → S, let Λ_{S,π,b} denote the
measure induced on R^S by considering the vectors with coordinates

$$\left\{(-1)^{b_{\pi(i)}}\cdot \zeta_{\pi(i)}\right\}_{i\in S}\,,$$

where ζ ∼ Λ.

     We recall below the definition of a vanishing measure for LPs from [19] (see Definition 1.3 there):

Definition 5.6. A measure Λ on C(f) is called vanishing (for LPs) if for every 1 ≤ t ≤ k, the signed
measure

$$\mathop{\mathbb{E}}_{|S|=t}\ \mathop{\mathbb{E}}_{\pi:S\to S}\ \mathop{\mathbb{E}}_{b\in\{0,1\}^t}\left[\left(\prod_{i=1}^{t}(-1)^{b_i}\right)\cdot \hat{f}(S)\cdot \Lambda_{S,\pi,b}\right]$$

is identically 0. We say f has a vanishing measure if there exists a vanishing measure Λ on C(f).

     In particular, they prove the following theorem:

Theorem 5.7. Let f : {0,1}^k → {0,1} be a k-ary Boolean predicate that has a vanishing measure. Then
for every ε > 0, there is a constant c_ε > 0 such that for infinitely many N ∈ N, there exists an instance Φ
of MAX k-CSP(f) on N variables satisfying the following:

    • OPT(Φ) ≤ ρ(f) + ε.

    • The optimum for the LP relaxation given by c_ε · log log N levels of the Sherali–Adams hierarchy satisfies

$$\mathrm{FRAC}(\Phi) \;\ge\; 1 - O\left(k\cdot\sqrt{\varepsilon}\right).$$
    ^1 The LP integrality gap result of Khot et al. is in fact slightly stronger than stated above. They show that the LP value is at least
1 − o(1), while no integer solution achieves a value outside the range [ρ(f) − o(1), ρ(f) + o(1)]. It is easy to see that
the same also holds for the instance constructed here.



    Combining this with our Theorem 1.1 already gives us the following stronger result:

Corollary 5.8. Let f : {0,1}^k → {0,1} be a k-ary Boolean predicate that has a vanishing measure. Then
for every ε > 0, there is a constant c_ε > 0 such that for infinitely many N ∈ N, there exists an instance Φ
of MAX k-CSP(f) on N variables satisfying the following:

    • Every integral assignment of Φ satisfies at most a ρ(f) + ε fraction of the constraints.

    • The LP relaxation given by c_ε · log N/log log N levels of the Sherali–Adams hierarchy satisfies

$$\mathrm{FRAC}(\Phi) \;\ge\; 1 - O\left(k\cdot\sqrt{\varepsilon}\right).$$

    However, note that to apply Theorem 1.1, one only needs a gap for the basic LP, which is a much
weaker requirement than the O(log log N)-level gap given by Theorem 5.7. We observe below that the
gap for the basic LP follows very simply from the construction of Khot et al. [19]. One can then apply
Theorem 1.1 directly to this gap instead of going through Theorem 5.7.
    Khot et al. [19] use the probabilistic construction given in Figure 4, for a given ε > 0. The construction
actually requires Λ to be a vanishing measure over the polytope C_δ(f) := (1 − δ)·C(f), for δ = √ε.
However, since C_δ(f) is simply a scaling of C(f), a vanishing measure over C(f) also gives a vanishing
measure over C_δ(f). Note that each ζ_0 ∈ C(f) corresponds to a distribution ν_0 supported in f^{-1}(1).
For each ζ ∈ C_δ(f), let ζ_0 = ζ/(1 − δ) be the corresponding point in C(f), with distribution ν_0. Then the
distribution ν = (1 − δ)·ν_0 + δ·U_k (where U_k denotes the uniform distribution on {0,1}^k) satisfies

$$\forall i \in [k],\quad \mathop{\mathbb{E}}_{\alpha\sim\nu}\left[(-1)^{\alpha_i}\right] = \zeta_i\,.$$
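The scaling step above can be checked numerically; a minimal sketch, where the particular ν_0, δ and k are illustrative assumptions and not from the paper:

```python
from itertools import product

k, delta = 3, 0.25                         # illustrative parameters

# nu0: a distribution on {0,1}^k (here a point mass, for simplicity),
# with bias vector zeta0.
nu0 = {(0, 1, 0): 1.0}
zeta0 = [sum(p * (-1) ** a[i] for a, p in nu0.items()) for i in range(k)]

# Mixing with the uniform distribution U_k (which has bias 0 in every
# coordinate) scales each coordinate bias by exactly (1 - delta).
uk = 1.0 / 2 ** k
nu = {a: (1 - delta) * nu0.get(a, 0.0) + delta * uk
      for a in product([0, 1], repeat=k)}
zeta = [sum(p * (-1) ** a[i] for a, p in nu.items()) for i in range(k)]
```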




   Let n_0 = ⌈1/ε⌉. Partition the interval [0, 1] into n_0 + 1 disjoint intervals I_0, I_1, . . . , I_{n_0}, where I_0 = {0}
   and I_i = ((i − 1)/n_0, i/n_0] for 1 ≤ i ≤ n_0. For each interval I_i, let X_i be a collection of n variables
   (disjoint from all X_j for j ≠ i).
   Generate m constraints independently according to the following procedure:

       • Sample ζ ∼ Λ.

       • For each j ∈ [k], let i_j be the index of the interval which contains |ζ(j)|. Sample a variable y_j
         uniformly from the set X_{i_j}.

       • If ζ(j) < 0, then negate y_j. If ζ(j) = 0, then negate y_j with probability 1/2.

       • Introduce the constraint f on the sampled k-tuple of literals.

                 Figure 4: Sherali–Adams integrality gap instance for a vanishing measure.
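The sampling procedure of Figure 4 can be sketched as follows. The representation of Λ as an arbitrary sampler of bias vectors, the variable encoding, and all names are our illustrative choices (and floating-point boundary effects at interval end-points are ignored in this sketch):

```python
import math
import random

def sample_constraint(sample_zeta, k, n0, n, rng):
    """One constraint of Figure 4: a k-tuple of (variable, negated?) literals."""
    zeta = sample_zeta(rng)
    literals = []
    for j in range(k):
        # Interval index i_j: I_0 = {0} and I_i = ((i-1)/n0, i/n0], so
        # |zeta_j| lies in I_i exactly when i = ceil(|zeta_j| * n0).
        i_j = math.ceil(abs(zeta[j]) * n0)
        y = (i_j, rng.randrange(n))          # a uniform variable from X_{i_j}
        if zeta[j] < 0:
            neg = True
        elif zeta[j] == 0:
            neg = rng.random() < 0.5         # negate with probability 1/2
        else:
            neg = False
        literals.append((y, neg))
    return literals

rng = random.Random(0)
lits = sample_constraint(lambda r: [0.5, -0.3, 1.0], k=3, n0=10, n=5, rng=rng)
```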

    They show that for a sufficiently large constant γ, an instance Φ with m = γ·n constraints satisfies, with
high probability, |sat_Φ(σ) − ρ(f)| ≤ ε for all assignments σ (see Lemma 4.4 in [19]). The proof is
similar to that of Lemma 5.1.


    Additionally, we need the following claim from [19] (see Claim 4.7 there), which allows one to
“round” the coordinates of the vectors ζ ∈ C_δ(f) to the end-points of the intervals I_0, . . . , I_{n_0}. This ensures
that any two variables in the same collection X_i have the same bias. The proof of the claim follows from
a simple hybrid argument. We include it below for completeness.

Claim 5.9. Let ζ ∈ C_δ(f) and let ν be the corresponding distribution supported in f^{-1}(1), so that for
all i ∈ [k] we have

$$\zeta_i = \mathop{\mathbb{E}}_{\alpha\sim\nu}\left[(-1)^{\alpha_i}\right].$$

Let t_1, . . . , t_k ∈ [0, 1] be such that |t_i − |ζ_i|| ≤ ε for all i ∈ [k], where ε < δ/2. Then there exists a distribution
ν′ on {0,1}^k such that

$$\|\nu - \nu'\|_1 = O(k\cdot(\varepsilon/\delta)) \qquad\text{and}\qquad \forall i\in[k],\ \mathop{\mathbb{E}}_{\alpha\sim\nu'}\left[(-1)^{\alpha_i}\right] = \mathrm{sign}(\zeta_i)\cdot t_i\,.$$

Proof. Let r_j = sign(ζ_j)·t_j be the desired bias of the j-th coordinate. Then |ζ_j − r_j| ≤ ε for all j ∈ [k].
We construct a sequence of distributions ν_0, . . . , ν_k with ν_0 = ν and ν_k = ν′, such that in ν_j the biases
are (r_1, . . . , r_j, ζ_{j+1}, . . . , ζ_k).
      The biases in ν_0 satisfy the above by definition. We obtain ν_j from ν_{j−1} as

$$\nu_j = (1-\tau_j)\cdot\nu_{j-1} + \tau_j\cdot D_j\,,$$

where D_j is the distribution in which all bits except the j-th one are sampled independently according
to their biases in ν_{j−1}, and the j-th bit is fixed so that (−1)^{α_j} = sign(r_j − ζ_j) (if r_j − ζ_j = 0, we
simply take ν_j = ν_{j−1}). The biases of all bits other than the j-th are unchanged. The bias of the j-th bit
becomes r_j if

$$r_j = (1-\tau_j)\cdot\zeta_j + \tau_j\cdot\mathrm{sign}(r_j-\zeta_j) \quad\Longleftrightarrow\quad \tau_j\cdot\left(\mathrm{sign}(r_j-\zeta_j)-r_j\right) = (1-\tau_j)\cdot(r_j-\zeta_j)\,.$$

Since ζ ∈ C_δ(f) and ε < δ/2, we have |r_j| ≤ 1 − δ + ε ≤ 1 − δ/2, and hence |sign(r_j − ζ_j) − r_j| ≥ δ/2.
Also, |r_j − ζ_j| ≤ ε by assumption. Thus, we can choose τ_j = O(ε/δ), which gives ‖ν_j − ν_{j−1}‖_1 = O(ε/δ).
The final bound then follows by the triangle inequality.
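A single step of the hybrid argument amounts to the following arithmetic; a sketch with illustrative values of ζ_j and r_j:

```python
# One hybrid step: mixing nu_{j-1} with weight tau_j of a distribution whose
# j-th bit is fixed to have bias s = sign(r_j - zeta_j) moves the j-th bias
# from zeta_j to exactly r_j.
zeta_j = 0.40          # current bias of the j-th bit (illustrative)
r_j = 0.45             # target bias, |r_j - zeta_j| <= eps (illustrative)
s = 1.0 if r_j > zeta_j else -1.0

tau_j = (r_j - zeta_j) / (s - zeta_j)   # solves r_j = (1-tau)*zeta_j + tau*s
new_bias = (1 - tau_j) * zeta_j + tau_j * s
```

Since |s − r_j| ≥ δ/2 while |r_j − ζ_j| ≤ ε, the mixing weight τ_j is O(ε/δ), which is what bounds ‖ν_j − ν_{j−1}‖_1 in the proof.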

    We can now use the above to give a simplified proof of Corollary 5.8.

Proof of Corollary 5.8. Here we exhibit a solution of the basic LP (Figure 2) for the instance given in
Figure 4. For each variable y_j coming from the set X_j, for j ∈ {0, 1, . . . , n_0}, we fix the bias t_j of the variable
to be the rightmost point of the interval I_j, i. e., we fix x_{(y_j,−1)} = (1 − j/n_0)/2 and x_{(y_j,1)} = (1 + j/n_0)/2.
    For each constraint C of the form f(y_{i_1} + b_1, . . . , y_{i_k} + b_k), let ζ(C) ∈ C_δ(f) be the point used to
generate it, and let ν(C) denote the corresponding distribution on {0,1}^k. By Claim 5.9, there exists a
distribution ν′(C) such that ‖ν(C) − ν′(C)‖_1 = O(k·ε/δ) and such that the biases of the literals satisfy

$$\mathop{\mathbb{E}}_{\alpha\sim\nu'(C)}\left[(-1)^{\alpha_j}\right] = \mathrm{sign}(\zeta_j)\cdot t_{i_j}\,,$$

where t_{i_j} denotes the bias for the interval to which y_{i_j} belongs. When t_{i_j} ≠ 0, we negate a variable only
when sign(ζ_j) < 0. Thus, we have

$$\mathop{\mathbb{E}}_{\alpha\sim\nu'(C)}\left[(-1)^{\alpha_j + b_j}\right] = t_{i_j}\,,$$



which is consistent with the bias given by the singleton variables x_{(y_{i_j},1)} and x_{(y_{i_j},−1)}. We thus define the
local distribution on the set S_C as

$$D_{S_C}(\alpha) = (\nu'(C))(\alpha + b_C)\,.$$

      For all C ∈ Φ, since ζ(C) ∈ C_δ(f), we have that

$$\mathop{\mathbb{E}}_{\alpha\sim\nu(C)}[f(\alpha)] \;\ge\; 1 - \delta\,.$$

Also, since ‖ν(C) − ν′(C)‖_1 = O(k·ε/δ), we get that

$$\mathop{\mathbb{E}}_{\alpha\sim\nu'(C)}[f(\alpha)] \;\ge\; 1 - \delta - O(k\cdot\varepsilon/\delta)\,.$$

Thus, we have for all C ∈ Φ,

$$\mathop{\mathbb{E}}_{\alpha\sim D_{S_C}}[f(\alpha + b_C)] \;\ge\; 1 - \delta - O(k\cdot\varepsilon/\delta)\,.$$

Taking δ = √ε balances the two error terms, giving a value of 1 − O(k·√ε), which proves the claim.

5.3     Lower bounds for LP extended formulations
A connection between LP integrality gaps for the Sherali–Adams hierarchy and lower bounds on the size
of LP extended formulations was first established by Chan et al. [7] and later improved by Kothari et
al. [21]. In [21], the authors proved the following:

Theorem 5.10 ([21], Theorem 1.2). There exist constants 0 < h < H such that the following holds.
Consider a function g : N → N. Suppose that the g(n)-level Sherali–Adams relaxation for a CSP cannot
achieve a (c, s)-approximation on instances on n variables. Then, no LP extended formulation (of the
original LP) of size at most nh·g(n) can achieve a (c, s)-approximation for the CSP on nH variables.

    Combining Theorem 1.1 with Theorem 5.10 (taking g(N) := c_ε · log N/log log N) yields the following
corollary.

Corollary 1.2. Let f : [q]^k → {0, 1} be any predicate. Let Φ_0 be a (c, s)-integrality gap instance for the basic
LP relaxation of MAX k-CSP(f). Then for every ε > 0, there exists c′_ε > 0 such that for infinitely many
N ∈ N, there exist (c − ε, s + ε)-integrality gap instances of size N for every linear extended formulation
of size at most N^{c′_ε · log N/ log log N}.


6      Conclusions and open problems
This work shows a dichotomy for approximating CSPs using linear programs: if a (c, s)-approximation
is not achievable using the basic LP, then for every ε > 0, a (c − ε, s + ε)-approximation is not achievable
using O_ε(log n/log log n) levels of the Sherali–Adams hierarchy. A natural open problem is to extend
this result to n^{O(1)} levels of the Sherali–Adams hierarchy. Using the results of [21],


this would also show that even exponential-size LP extended formulations are at most as strong as the
basic LP.
    As mentioned earlier in the paper, the current limitation on the number of levels in our result comes from the
consistency requirement of the low-diameter decompositions. Given a d-dimensional ℓ_2 embedding of the
restriction of the metric ρ_µ^H to a set of vertices S, the fraction of edges cut by the decomposition procedure
of [8] grows as √d. Our current proof only uses the trivial bound that when |S| = t, the metric admits a
t-dimensional ℓ_2 embedding. Even though for a single set S this bound can be improved to O(log t) at
the cost of slight errors in the distances (using randomized dimension reduction), we do not know how
to do this consistently across the various sets S. In particular, since we always want to obtain low-diameter
components, we may need to reject a small fraction of the dimension reduction maps for S, if they shrink
the distances too much. However, such maps may not necessarily be rejected when considering T ⊆ S,
which can violate the consistency requirement P_S|_T = P_T. Understanding how and when randomized
dimension reduction can be combined with the consistency requirement is an intriguing question which
may also be useful in other applications.


Acknowledgements
We thank Chandra Chekuri, Subhash Khot and Yury Makarychev for helpful discussions, and Rishi Saket
for pointers to references. We are also grateful to the CCC 2017 reviewers, and the ToC referees and
editors, for numerous suggestions to improve the presentation of this paper. This research was supported
by the National Science Foundation under award number CCF-1254044.


A     Local ℓ_2-embeddability of the Metric ρ_µ^H
The goal of this section is to prove the following result about the local ℓ_2-embeddability of the metric ρ_µ^H.


Theorem 3.3. Let H′ ∼ H_k(m, n, n_0, Γ) with m = γ·n edges, and let ε > 0. Then for large enough n,
with high probability (at least 1 − ε over the choice of H′), there exist δ > 0, constants c = c(k, γ, n_0, ε)
and θ = θ(k, γ, n_0, ε), and a subhypergraph H ⊆ H′ with V(H) = V(H′) satisfying the following:

    • H has girth g ≥ δ · log n.

    • |E(H′) \ E(H)| ≤ ε · m.

    • For all t ≤ n^θ and µ ≥ c · (log t + log log n)/log n, and for all S ⊆ V(H′) with |S| ≤ t, the metric ρ_µ^H
      restricted to S is isometrically embeddable into the unit sphere in ℓ_2.

    To prove the above theorem, we will use the local structure of random hypergraphs. We first prove
that, with high probability, a few hyperedges can be removed from a random hypergraph (sampled from
H_k(m, n, n_0, Γ)) to obtain a hypergraph whose girth is Ω(log n) and whose degree is bounded. The following
lemma shows a possible trade-off between the degree of the hypergraph and the number of hyperedges
that need to be removed.


Lemma A.1. Let H 0 ∼ Hk (m, n, n0 , Γ) be a random hypergraph with m = γ · n hyperedges for large
enough γ. Then for any ε > 0, with probability 1 − ε, there exists a sub-hypergraph H with V (H) = V (H 0 )
such that ∀u ∈ V (H), degH (u) ≤ 100 · log (n0 /ε) · k · γ and |E(H 0 ) \ E(H)| ≤ ε · m.

Proof. By linearity of expectation, the expected degree of any vertex v in H′ is at most k·γ. Let
D = 100·log(n_0/ε)·k·γ, and let S be the set of all vertices u with deg_{H′}(u) > D. Let E_S be the set of all
hyperedges with at least one vertex in S. We shall take E(H) = E(H′) \ E_S. Note that for any u ∈ V(H′),
P[u ∈ S] = P[deg_{H′}(u) ≥ D] ≤ exp(−D/4) by a Chernoff–Hoeffding bound. We use this to bound the
expected number of deleted edges:

$$\begin{aligned}
\mathbb{E}\left[|E_S|\right] \;\le\; \sum_{u\in V(H')}\mathbb{E}\left[\deg(u)\cdot\mathbf{1}_{\{u\in S\}}\right]
&= \sum_{u\in V(H')}\mathbb{E}\left[\deg(u)\mid u\in S\right]\cdot\mathbb{P}\left[u\in S\right] \\
&\le \sum_{u\in V(H')}\mathbb{E}\left[\deg(u)\mid u\in S\right]\cdot\exp(-D/4) \\
&\le \sum_{u\in V(H')}(D + k\gamma)\cdot\exp(-D/4) \\
&\le (n\cdot n_0)\cdot 2D\cdot\exp(-D/4)\,.
\end{aligned}$$

The penultimate inequality uses the independence of the hyperedges in the generation process, which
gives E[deg_{H′}(u) | deg_{H′}(u) ≥ D] ≤ D + E[deg_{H′}(u)]. From our choice of the parameter D, we get that
E[|E_S|] ≤ ε²·γ·n = ε²·m. Thus, by Markov's inequality, the number of deleted edges is at most ε·m with
probability at least 1 − ε.
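The deletion step in the proof can be sketched directly; the toy edge list and degree threshold below are illustrative, not the paper's parameters:

```python
from collections import Counter

def prune_high_degree(edges, D):
    """Delete every hyperedge containing a vertex of degree greater than D."""
    deg = Counter(v for e in edges for v in e)
    high = {v for v, d in deg.items() if d > D}        # the set S
    return [e for e in edges if not any(v in high for v in e)]  # E(H') \ E_S

# Toy example: vertices 0, 1, 2 have degree at least 5, so every hyperedge
# touching them is deleted; with D = 4 only (3, 4, 5) survives.
edges = [(0, 1, 2)] * 5 + [(3, 4, 5), (2, 3, 4)]
kept = prune_high_degree(edges, D=4)
```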

    The following lemma shows that the expected number of small cycles in a random hypergraph is small.


Lemma A.2. Let H ∼ H_k(m, n, n_0, Γ) be a random hypergraph and, for ℓ ≥ 2, let Z_ℓ(H) denote the
number of cycles of length at most ℓ in H. For m, n and k such that k²·(m/n) > 2, we have

$$\mathop{\mathbb{E}}_{H\sim \mathcal{H}_k(m,n,n_0,\Gamma)}\left[Z_\ell(H)\right] \;\le\; \left(k^2\cdot\frac{m}{n}\right)^{2\ell}.$$

Proof. Let the vertices of H correspond to the set [n_0] × [n]. Suppose we contract each set [n_0] × {j} of
vertices into a single vertex j ∈ [n], to get a random multi-hypergraph H′ on the vertex set [n]. An equivalent
way to view the sampling of H′ is: for each i ∈ [m], the i-th hyperedge e_i of H′ is sampled by independently
sampling k_{e_i} vertices (with replacement) uniformly at random from [n]. Note that the sampling of H′ is
independent of Γ in the definition of H_k(m, n, n_0, Γ). Clearly, a cycle of length at most ℓ in H produces a
cycle of length at most ℓ in H′. Hence, it suffices to bound the expected number of cycles in H′.
    Given any pair (u′, v′) of vertices of H′ with u′ ≠ v′, the probability of the pair (u′, v′) belonging
together in some hyperedge of H′ is at most mk²/n², since each hyperedge contains at most k vertices.
Consider a given h-tuple u = (u_{i_1}, . . . , u_{i_h}) of variables. Note that we require the hyperedges participating
in a cycle to be distinct, and hyperedges containing only one vertex of the tuple do not count towards forming
a cycle. So, the probability that u is part of a cycle in H′, i. e., that there exist distinct hyperedges e_j ∈ H′ for j ∈ [h] such


that u_{i_j}, u_{i_{j+1}} ∈ e_j for j ∈ [h − 1], and u_{i_1}, u_{i_h} ∈ e_h, is at most (mk²/n²)^h. As a result, the expected number
of cycles of length h in H′ is bounded above by

$$\binom{n}{h}\cdot\left(\frac{mk^2}{n^2}\right)^{h} \;\le\; n^h\cdot\left(\frac{mk^2}{n^2}\right)^{h} \;=\; \left(k^2\cdot\frac{m}{n}\right)^{h}.$$
From the geometric form of the bound, it follows that the expected number of cycles of length at most ℓ in
H′ is at most

$$\frac{\left(k^2\cdot m/n\right)^{\ell+1}}{\left(k^2\cdot m/n\right)-1} \;<\; \left(k^2\cdot m/n\right)^{2\ell}.$$
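The geometric-sum bound used in the last step can be checked numerically; an illustrative sanity check, with x playing the role of k²·m/n ≥ 2:

```python
# For x >= 2 and l >= 2: sum_{h=2}^{l} x^h <= x^(l+1)/(x-1) < x^(2l).
def geometric_bound_holds(x, l):
    total = sum(x ** h for h in range(2, l + 1))
    return total <= x ** (l + 1) / (x - 1) < x ** (2 * l)

results = [geometric_bound_holds(x, l)
           for x in (2, 2.5, 9) for l in (2, 5, 12)]
```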
    Using the above lemma, it is easy to show that one can remove all small cycles in a random hypergraph
by deleting only a small number of hyperedges.
Corollary A.3. Let H ∼ Hk (m, n, n0 , Γ) be a random hypergraph with m = γ · n for γ > 1. Then, there
exists δ = δ (γ, k) > 0 such that with probability 1 − n−1/6 , all cycles of length at most δ · log n in H can
be removed by deleting at most n2/3 hyperedges.
Proof. As above, let Z_ℓ denote the number of cycles of length at most ℓ. With our choice of m, n, and k,
we have k²·m/n ≥ 2. By Lemma A.2, E[Z_ℓ] ≤ (k²·m/n)^{2ℓ}. Since m = γ·n, there exists g = δ·log n
such that E[Z_g] ≤ √n. By Markov's inequality, P[Z_g ≥ n^{2/3}] ≤ n^{−1/6}. Thus, with probability 1 − n^{−1/6},
one can remove all cycles of length at most δ·log n by deleting at most n^{2/3} edges.

    Charikar et al. [11] prove an analogue of Theorem 3.3 for metrics defined on locally sparse graphs
(see Definition A.5). In fact, they use a consequence of sparsity, which they call ℓ-path decomposability.
To this end, we define the incidence graph² associated with a hypergraph, to which we will apply their
result.
Definition A.4. Let H = (V (H), E(H)) be a hypergraph. We define its incidence graph as the bipartite
graph GH defined on vertex sets V (H) and E(H), and edge set E defined as
                                   E := {(v, e) | v ∈ V (H), e ∈ E(H), v ∈ e} .
    Note that for any u, v ∈ V (H), we have dGH (u, v) = 2 · dH (u, v). We now define local sparsity of the
incidence graph.
Definition A.5 (Local Sparsity). A graph G = (V, E) on n vertices is defined to be (τ, η)-sparse if for all
S ⊂ V of size at most τ · n we have |E(S)| ≤ |S| /(1 − η) , where E(S) denotes the set of edges contained
in S.
    We note that we will require the sparsity η to be Ok,γ (1/ log n). This gives sparsity only for sublinear-
size sets, as compared to sets of size Ω(n) in previous results where η is a constant. We also observe that
if G1 is (τ, η)-sparse and G2 is a subgraph of G1 then G2 is also (τ, η)-sparse.
    Now we prove that the incidence graph of a hypergraph sampled from H_k(m, n, n_0, Γ) is locally
sparse with high probability. The proof of the following lemma closely follows the proofs of analogous
statements in [1, 6].
   2 This is the same notion as the constraint–variable graph considered in various papers on lower bounds for CSPs.




Lemma A.6. Let γ > 1, m = γ·n·n_0, and η < 1/4. Then there exists a constant

$$\tau \;\le\; \left((1+\gamma)^2\cdot e^3\cdot n_0^2\right)^{-1/\eta}$$

such that

$$\mathop{\mathbb{P}}_{H\sim \mathcal{H}_k(m,n,n_0,\Gamma)}\left[G_H \text{ is not } (\tau,\eta)\text{-sparse}\right] \;\le\; O\!\left(\frac{\gamma^2\cdot e^4\cdot n_0^2}{n}\right),$$

for all large enough n.
Proof. For a randomly generated hypergraph H ∼ H_k(m, n, n_0, Γ), we can think of the incidence graph
as a randomly generated bipartite graph on the vertex sets [m] and V(H). We write [m] instead of E(H)
since the set E(H) is only defined after sampling H. For each i ∈ [m], we randomly
sample a type from Γ, and sample at most k neighbors according to the type. Conditioned on a fixed type,
we have for any v ∈ V(H) that P[(v, i) ∈ E] ≤ 1/n.
    Let T be a set of vertices of G_H and let v, i ∈ T be fixed. Since the vertices in each bucket are chosen
independently, and the choices for different indices i ∈ [m] are made independently, the probability
that (v, i) is an edge in G_H, conditioned on the edges in (T × T) \ {(v, i), (i, v)}, is at most 1/n.
    Hence the probability that a set T of h vertices induces at least r = h/(1 − η) edges is at most

$$\binom{\binom{h}{2}}{r}\cdot\frac{1}{n^{r}}\,.$$
Since G_H has N′ = N + m = (1 + γ)·n·n_0 vertices, a union bound (with r := h/(1 − η)) gives that the
probability that G_H is not (τ, η)-sparse is at most

$$\begin{aligned}
\sum_{h=2}^{\tau\cdot N'}\binom{N'}{h}\binom{\binom{h}{2}}{r}\cdot\frac{1}{n^{r}}
&\le \sum_{h=2}^{\tau\cdot N'}\left(\frac{N'\cdot e}{h}\right)^{h}\cdot\left(\frac{h^{2}\cdot e}{2r}\right)^{r}\cdot\frac{1}{n^{r}} \\
&= \sum_{h=2}^{\tau\cdot N'}\left(\frac{(1+\gamma)\cdot n\cdot n_0\cdot e}{h}\right)^{h}\cdot\left(\frac{(1-\eta)\cdot h\cdot e}{2n}\right)^{h/(1-\eta)} \\
&= \sum_{h=2}^{\tau\cdot N'}\left(\frac{(1-\eta)\cdot(1+\gamma)^{1-\eta}\cdot e^{2-\eta}\cdot n_0^{1-\eta}}{2}\cdot\left(\frac{h}{n}\right)^{\eta}\right)^{h/(1-\eta)} \\
&\le \sum_{h=2}^{\tau\cdot N'}\left((1+\gamma)\cdot e^{2}\cdot n_0\cdot\left(\frac{h}{n}\right)^{\eta}\right)^{h/(1-\eta)}.
\end{aligned}$$


We split the above sum into two parts depending on the range of h. Let us first consider

$$\sum_{h=n^{\theta}}^{\tau\cdot N'}\left((1+\gamma)\cdot e^{2}\cdot n_0\cdot\left(\frac{h}{n}\right)^{\eta}\right)^{h/(1-\eta)}
\;\le\; \sum_{h=n^{\theta}}^{\tau\cdot(1+\gamma)\cdot n\cdot n_0}\left((1+\gamma)\cdot e^{2}\cdot n_0\cdot\left(\tau\cdot(1+\gamma)\cdot n_0\right)^{\eta}\right)^{h/(1-\eta)}
\;\le\; 2\exp\left(-\frac{n^{\theta}}{1-\eta}\right),$$

where the last inequality follows from our setting of the parameter

$$\tau \;\le\; \left((1+\gamma)^{2}\cdot e^{3}\cdot n_0^{2}\right)^{-1/\eta}$$

and the geometric nature of the summands.
   Now we fix θ = 1/2 and consider the first part of the summation:

$$\sum_{h=2}^{n^{\theta}}\left((1+\gamma)\cdot e^{2}\cdot n_0\cdot\left(\frac{h}{n}\right)^{\eta}\right)^{h/(1-\eta)}
\;\le\; n^{\theta}\cdot\left(\frac{(1+\gamma)\cdot e^{2}\cdot n_0}{n^{(1-\theta)\cdot\eta}}\right)^{2/(1-\eta)}
\;=\; \frac{\left((1+\gamma)\cdot e^{2}\cdot n_0\right)^{2/(1-\eta)}}{n^{1-\theta}}\,.$$


Hence the probability that G_H is not (τ, η)-sparse is at most

$$\frac{(1+\gamma)^{2}\cdot e^{4}\cdot n_0^{2}}{n} \;+\; 2\exp\left(-n^{1/2}\right)$$

for all large enough n > (2(1 + γ)·e²·n_0)².

   The above property was also implicitly used by Arora et al. [3] in proving the following lemma (see
Lemma 2.12 in [3]).

Lemma A.7. Let ℓ > 0 be an integer and let 0 < η < 1/(3ℓ − 1) < 1. Let G be an η-sparse graph with girth
g > ℓ. Then G is ℓ-path decomposable.

    Recall that we defined the metrics d_µ^H and ρ_µ^H on H as (for u ≠ v)

$$d_\mu^H(u,v) := 1 - (1-\mu)^{2\cdot d_H(u,v)} \qquad\text{and}\qquad \rho_\mu^H(u,v) := \sqrt{\frac{d_\mu^H(u,v) + \mu}{1+\mu}}\,.$$

For a graph G, we define the following two metrics, for u ≠ v:

$$d_\mu^G(u,v) := 1 - (1-\mu)^{d_G(u,v)} \qquad\text{and}\qquad \rho_\mu^G(u,v) := \sqrt{\frac{d_\mu^G(u,v) + \mu}{1+\mu}}\,.$$
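Since d_{G_H}(u, v) = 2·d_H(u, v) for u, v ∈ V(H) (as noted after Definition A.4), the two pairs of definitions agree on V(H); a quick numeric check with illustrative values of µ and the distances:

```python
def d_mu_hyper(mu, d_H):                 # 1 - (1 - mu)^(2 * d_H(u, v))
    return 1 - (1 - mu) ** (2 * d_H)

def d_mu_graph(mu, d_G):                 # 1 - (1 - mu)^(d_G(u, v))
    return 1 - (1 - mu) ** d_G

def rho(mu, d):                          # sqrt((d + mu) / (1 + mu))
    return ((d + mu) / (1 + mu)) ** 0.5

# On V(H), the incidence graph doubles every distance, so the metrics agree.
mu = 0.1
agree = all(
    abs(rho(mu, d_mu_hyper(mu, d)) - rho(mu, d_mu_graph(mu, 2 * d))) < 1e-12
    for d in range(1, 10)
)
```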

We note that if H is a hypergraph and G_H is its incidence graph, then the metrics d_µ^{G_H} and ρ_µ^{G_H}, restricted
to V(H), coincide with the metrics d_µ^H and ρ_µ^H defined on H. Charikar et al. [9] proved the following theorem
(see Theorem 5.2 there).
Theorem A.8 ([9]). Let G be a graph on n_0 vertices with maximum degree D. Let t < √n_0 and ℓ > 0 be
such that, for t′ = D^{ℓ+1}·t, every subgraph of G on at most t′ vertices is ℓ-path decomposable. Also, let µ,
t and ℓ satisfy the relation (1 − µ)^{ℓ/9} ≤ µ/(2t + 2). Then for every subset S of at most t vertices, there
exists a mapping ψ_S from S to the unit sphere in ℓ_2 such that for all u, v ∈ S,

$$\|\psi_S(u) - \psi_S(v)\|_{2} = \rho_\mu^{G}(u,v)\,.$$

    We use this theorem to prove the main theorem of the section.

                       T HEORY OF C OMPUTING, Volume 14 (10), 2018, pp. 1–33                               28
                             F ROM W EAK TO S TRONG LP G APS FOR A LL CSP S

Proof of Theorem 3.3. Let H′ ∼ Hk(m, n, n0, Γ) with m = γ · n hyperedges and N = n0 · n vertices, for
large enough n. Given ε > 0, from Lemma A.1 we have that with probability at least 1 − ε/2, there exists
H1 ⊂ H′ such that the maximum degree of H1 is at most D = 100 · log(2n0/ε) · k · γ and |E(H′) \ E(H1)| ≤
(ε/2) · m.
    Using Corollary A.3, we also have that there exists δ > 0 such that, with probability at least 1 − ε/4
(for large enough n), H′ has a sub-hypergraph H2 with girth g ≥ δ · log n and |E(H′) \ E(H2)| ≤ (ε/4) · m.
    By Lemma A.6, there exists η = Ω_{n0,k,γ}(1/log n) such that for all large enough n, GH′ is (τ, η)-
sparse with probability at least 1 − ε/4, for τ ≥ n^(−1/4).
    Hence with probability at least 1 − ε, we have that H = (V(H′), E(H1) ∩ E(H2)) satisfies:
    • The maximum degree of H is bounded above by D.
    • GH is (τ, η)-sparse (for τ ≥ n^(−1/4) and η = Ω_{n0,k,γ}(1/log n)).
    • The girth of H is at least g ≥ δ · log n.
    • |E(H′) \ E(H)| ≤ ε · m.
We now show that the metric ρ^µ_H is locally ℓ2-embeddable.
    For ease of notation, let us write G for GH, the incidence graph of the hypergraph H. Note that
N ≤ |V(G)| ≤ N · (1 + γ) and the maximum degree of G is also bounded by D. Since every cycle in G
corresponds to a cycle in H, the girth of G is also at least g ≥ δ · log n.
    By Lemma A.6, we have that G is (τ, η)-sparse. By Lemma A.7, any subgraph of G on at most τ · (N + m)
vertices is ℓ-path decomposable for any ℓ ≤ min{g, 1/(4η)}.
    Since D = 100 · kγ · log(2n0/ε), there exists ℓ0 = Ω_{k,γ,n0,ε}(log n) such that D^(ℓ0+1) ≤ n^(1/6). We choose
ℓ = min{g, 1/(4η), ℓ0}.
    Let µ0 be the smallest µ such that exp(−µℓ/9) ≤ µ/(2t + 2) (note that exp(−µℓ/9)/µ is decreasing
in µ). Since we must have µ ≥ 1/ℓ, there exists a µ0 satisfying

                                      µ0 ≤ (9/ℓ) · (ln(2(t + 1)) + ln ℓ) .

From our choice of ℓ, there exist constants c = c(k, γ, n0, ε) and θ = θ(k, γ, n0, ε) < 1/2 such that
µ0 ≤ c · (log t + log log n)/log n < 1 when t ≤ n^θ. Then, for any µ ∈ [µ0, 1), we have

                                  (1 − µ)^(ℓ/9) ≤ exp(−µℓ/9) ≤ µ/(2t + 2) .
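Both inequalities in the last display can be checked directly (a routine verification; the explicit choice of µ below is for illustration only):

```latex
% First inequality: 1-\mu \le e^{-\mu} for all \mu, and raising both sides
% to the power \ell/9 gives
(1-\mu)^{\ell/9} \;\le\; e^{-\mu\ell/9}.
% Second inequality: taking \mu = \tfrac{9}{\ell}\bigl(\ln(2(t+1)) + \ln\ell\bigr)
% (which matches the stated bound on \mu_0) gives
e^{-\mu\ell/9} \;=\; \frac{1}{2(t+1)\,\ell}
           \;\le\; \frac{1/\ell}{2t+2}
           \;\le\; \frac{\mu}{2t+2},
% where the last step uses \mu \ge 1/\ell.
```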
    We can now apply Theorem A.8 to construct the embedding. Given any subset S of V(H) of size
at most t ≤ n^θ, note that S is also a subset of V(G). Moreover, we have t ≤ n^θ ≤ (N + m)^(1/2). Also, we
have t · D^(ℓ+1) ≤ n^(1/2) · n^(1/6) = n^(2/3) ≤ τ · (N + m). Thus, any subgraph of G on at most t · D^(ℓ+1) vertices is ℓ-path
decomposable.
    Thus, when µ ≥ µ0, by Theorem A.8 there exists a mapping ψ_S from S to the unit sphere such that
for all u, v ∈ S, we have

                                ‖ψ_S(u) − ψ_S(v)‖² = ρ^µ_G(u, v) = ρ^µ_H(u, v) ,

where the last equality uses the fact that for all u, v ∈ V(H), ρ^µ_H(u, v) = ρ^µ_G(u, v) since d_G(u, v) =
2 · d_H(u, v).



References
 [1] MICHAEL ALEKHNOVICH, SANJEEV ARORA, AND IANNIS TOURLAKIS: Towards strong nonapproximability
     results in the Lovász–Schrijver hierarchy. Comput. Complexity, 20(4):615–648, 2011.
     Preliminary version in STOC'05. [doi:10.1007/s00037-011-0027-z] 2, 26

 [2] SANJEEV ARORA, BÉLA BOLLOBÁS, AND LÁSZLÓ LOVÁSZ: Proving integrality gaps without
     knowing the linear program. In Proc. 43rd FOCS, pp. 313–322. IEEE Comp. Soc. Press, 2002.
     [doi:10.1109/SFCS.2002.1181954] 2

 [3] SANJEEV ARORA, BÉLA BOLLOBÁS, LÁSZLÓ LOVÁSZ, AND IANNIS TOURLAKIS: Proving
     integrality gaps without knowing the linear program. Theory of Computing, 2(2):19–51, 2006.
     [doi:10.4086/toc.2006.v002a002] 2, 11, 28

 [4] PER AUSTRIN AND JOHAN HÅSTAD: On the usefulness of predicates. ACM Trans. Comput.
     Theory, 5(1):1:1–1:24, 2013. Preliminary version in CCC'12. [doi:10.1145/2462896.2462897,
     arXiv:1204.5662] 6

 [5] BOAZ BARAK, SIU ON CHAN, AND PRAVESH K. KOTHARI: Sum of squares lower
     bounds from pairwise independence. In Proc. 47th STOC, pp. 97–106. ACM Press, 2015.
     [doi:10.1145/2746539.2746625, arXiv:1501.00734] 2, 9

 [6] SIAVOSH BENABBAS, KONSTANTINOS GEORGIOU, AVNER MAGEN, AND MADHUR TUL-
     SIANI: SDP gaps from pairwise independence. Theory of Computing, 8(12):269–289, 2012.
     [doi:10.4086/toc.2012.v008a012] 2, 26

 [7] SIU ON CHAN, JAMES R. LEE, PRASAD RAGHAVENDRA, AND DAVID STEURER: Approximate
     constraint satisfaction requires large LP relaxations. J. ACM, 63(4):34:1–34:22, 2016. Preliminary
     version in FOCS'13. [doi:10.1145/2811255, arXiv:1309.0563] 2, 4, 23

 [8] MOSES CHARIKAR, CHANDRA CHEKURI, ASHISH GOEL, SUDIPTO GUHA, AND SERGE
     PLOTKIN: Approximating a finite metric by a small number of tree metrics. In Proc. 39th FOCS,
     pp. 379–388. IEEE Comp. Soc. Press, 1998. [doi:10.1109/SFCS.1998.743488] 7, 12, 13, 24

 [9] MOSES CHARIKAR, KONSTANTIN MAKARYCHEV, AND YURY MAKARYCHEV: Integrality
     gaps for Sherali–Adams relaxations. In Proc. 41st STOC, pp. 283–292. ACM Press, 2009.
     [doi:10.1145/1536414.1536455] 2, 4, 7, 9, 11, 28

[10] MOSES CHARIKAR, KONSTANTIN MAKARYCHEV, AND YURY MAKARYCHEV: Near-optimal
     algorithms for maximum constraint satisfaction problems. ACM Trans. Algorithms, 5(3):32:1–32:14,
     2009. Preliminary version in SODA'07. [doi:10.1145/1541885.1541893] 3

[11] MOSES CHARIKAR, KONSTANTIN MAKARYCHEV, AND YURY MAKARYCHEV: Local global
     tradeoffs in metric embeddings. SIAM J. Comput., 39(6):2487–2512, 2010. Preliminary version in
     FOCS'07. [doi:10.1137/070712080] 7, 10, 26


[12] WENCESLAS FERNANDEZ DE LA VEGA AND CLAIRE KENYON-MATHIEU: Linear programming
     relaxations of MaxCut. In Proc. 18th Ann. ACM-SIAM Symp. on Discrete Algorithms (SODA'07),
     pp. 53–61. ACM Press, 2007. ACM DL. 2, 6

[13] SAMUEL FIORINI, SERGE MASSAR, SEBASTIAN POKUTTA, HANS RAJ TIWARY, AND
     RONALD DE WOLF: Exponential lower bounds for polytopes in combinatorial optimization. J. ACM,
     62(2):17:1–17:23, 2015. Preliminary version in STOC'12. [doi:10.1145/2716307, arXiv:1111.0837]
     2

[14] MRINALKANTI GHOSH AND MADHUR TULSIANI: From weak to strong LP gaps for all CSPs.
     In Proc. 32nd Computational Complexity Conference (CCC'17), pp. 11:1–11:27. DROPS, 2017.
     [doi:10.4230/LIPIcs.CCC.2017.11, arXiv:1608.00497] 1

[15] SREYASH KENKRE, VINAYAKA PANDIT, MANISH PUROHIT, AND RISHI SAKET: On the ap-
     proximability of digraph ordering. Algorithmica, 78(4):1182–1205, 2017. Preliminary version in
     ESA'15. [doi:10.1007/s00453-016-0227-7, arXiv:1507.00662] 4

[16] SUBHASH KHOT: On the power of unique 2-prover 1-round games. In Proc. 34th STOC, pp.
     767–775. ACM Press, 2002. [doi:10.1145/509907.510017] 3

[17] SUBHASH KHOT AND RISHI SAKET: SDP integrality gaps with local ℓ1-embeddability. In Proc.
     50th FOCS, pp. 565–574. IEEE Comp. Soc. Press, 2009. [doi:10.1109/FOCS.2009.37] 3

[18] SUBHASH KHOT AND RISHI SAKET: Approximating CSPs using LP relaxation. In Proc. 42nd
     Internat. Colloq. on Automata, Languages and Programming (ICALP'15), pp. 822–833. Springer,
     2015. [doi:10.1007/978-3-662-47672-7_67] 3

[19] SUBHASH KHOT, MADHUR TULSIANI, AND PRATIK WORAH: A characterization of
     strong approximation resistance. In Proc. 46th STOC, pp. 634–643. ACM Press, 2014.
     [doi:10.1145/2591796.2591817] 5, 6, 13, 20, 21, 22

[20] VLADIMIR KOLMOGOROV, JOHAN THAPPER, AND STANISLAV ŽIVNÝ: The power of linear
     programming for general-valued CSPs. SIAM J. Comput., 44(1):1–36, 2015. Preliminary versions
     in FOCS'12 and ICALP'13. [doi:10.1137/130945648, arXiv:1311.4219] 4

[21] PRAVESH K. KOTHARI, RAGHU MEKA, AND PRASAD RAGHAVENDRA: Approximating rect-
     angles by juntas and weakly-exponential lower bounds for LP relaxations of CSPs. In Proc. 49th
     STOC, pp. 590–603. ACM Press, 2017. [doi:10.1145/3055399.3055438, arXiv:1610.02704] 3, 4,
     23

[22] PRAVESH K. KOTHARI, RYUHEI MORI, RYAN O'DONNELL, AND DAVID WITMER: Sum of
     squares lower bounds for refuting any CSP. In Proc. 49th STOC, pp. 132–145. ACM Press, 2017.
     [doi:10.1145/3055399.3055485, arXiv:1701.04521] 2

[23] ROBERT KRAUTHGAMER, JAMES R. LEE, MANOR MENDEL, AND ASSAF NAOR: Measured
     descent: A new embedding method for finite metrics. Geom. Funct. Anal. GAFA, 15(4):839–858,
     2005. Preliminary version in FOCS'04. [doi:10.1007/s00039-005-0527-6, arXiv:cs/0412008] 7


[24] EUIWOONG LEE: Hardness of graph pricing through generalized Max-Dicut. In Proc. 47th STOC,
     pp. 391–399. ACM Press, 2015. [doi:10.1145/2746539.2746549, arXiv:1405.0740] 4

[25] LÁSZLÓ LOVÁSZ AND ALEXANDER SCHRIJVER: Cones of matrices and set-functions and 0–1
     optimization. SIAM J. Optim., 1(12):166–190, 1991. [doi:10.1137/0801013] 2

[26] PRASAD RAGHAVENDRA: Optimal algorithms and inapproximability results for every CSP? In
     Proc. 40th STOC, pp. 245–254. ACM Press, 2008. [doi:10.1145/1374376.1374414] 3, 5

[27] PRASAD RAGHAVENDRA AND DAVID STEURER: Integrality gaps for strong SDP relax-
     ations of Unique Games. In Proc. 50th FOCS, pp. 575–585. IEEE Comp. Soc. Press, 2009.
     [doi:10.1109/FOCS.2009.73] 2, 3

[28] GRANT SCHOENEBECK: Linear level Lasserre lower bounds for certain k-CSPs. In Proc. 49th
     FOCS, pp. 593–602. IEEE Comp. Soc. Press, 2008. [doi:10.1109/FOCS.2008.74] 2

[29] HANIF D. SHERALI AND WARREN P. ADAMS: A hierarchy of relaxations between the continuous
     and convex hull representations for zero-one programming problems. SIAM J. Discrete Math.,
     3(3):411–430, 1990. [doi:10.1137/0403036] 2

[30] JOHAN HÅSTAD: On the efficient approximability of Constraint Satisfaction Problems. In
     Surveys in Combinatorics, volume 346, pp. 201–222. Cambridge University Press, 2007.
     [doi:10.1017/CBO9780511666209.008] 2

[31] JOHAN THAPPER AND STANISLAV ŽIVNÝ: The complexity of finite-valued CSPs. J. ACM,
     63(4):37:1–37:33, 2016. Preliminary version in STOC'13. [doi:10.1145/2974019, arXiv:1210.2987]
     4

[32] JOHAN THAPPER AND STANISLAV ŽIVNÝ: The power of Sherali–Adams relaxations for
     general-valued CSPs. SIAM J. Comput., 46(4):1241–1279, 2017. [doi:10.1137/16M1079245,
     arXiv:1606.02577] 4

[33] MADHUR TULSIANI: CSP gaps and reductions in the Lasserre hierarchy. In Proc. 41st STOC, pp.
     303–312. ACM Press, 2009. [doi:10.1145/1536414.1536457] 2


AUTHORS

     Mrinalkanti Ghosh
     Ph. D. student
     Toyota Technological Institute at Chicago
     Chicago, IL
     mkghosh ttic edu
     http://ttic.uchicago.edu/~mkghosh





    Madhur Tulsiani
    Assistant professor
    Toyota Technological Institute at Chicago
    Chicago, IL
    madhurt ttic edu
    http://ttic.uchicago.edu/~madhurt


ABOUT THE AUTHORS

     MRINALKANTI GHOSH (called "Mrinal" by friends and colleagues) is a 5th-year graduate
        student at TTI-C working with Madhur Tulsiani. Before joining the Ph. D. program, he
        completed his Masters at IIT Kanpur, where he worked on topics at the intersection
        of Ergodic Theory and Computability Theory. Later, Mrinal switched to the more
        practical field of Computational Complexity Theory. Currently, while taking a break
        from his busy TV-watching schedule, he tries to think about approximation algorithms.


     MADHUR TULSIANI is an assistant professor at TTI-Chicago, interested in various aspects
        of approximability and pseudorandomness. Madhur went to college at IIT Kanpur and
        spent some wonderful years at (the coffee shops around) UC Berkeley while working on
        his Ph. D. with Luca Trevisan. Madhur enjoys biking and running, and aspires to one day
        learn some music (though it's perhaps better for his neighbors that he hasn't).



