Approximation Resistance on Satisfiable Instances for Sparse Predicates

Authors Sangxia Huang,
Plaintext
                        T HEORY OF C OMPUTING, Volume 10 (14), 2014, pp. 359–388
                                       www.theoryofcomputing.org




                Approximation Resistance on
              Satisfiable Instances for Predicates
                  with Few Accepting Inputs
                                                   Sangxia Huang∗
                 Received May 13, 2013; Revised January 18, 2014; Published October 15, 2014




       Abstract: For every integer k ≥ 3, we prove that there is a predicate P on k Boolean vari-
                     e 1/3
       ables with 2O(k ) accepting assignments that is approximation resistant even on satisfiable
       instances. That is, given a satisfiable CSP instance with constraint P, we cannot achieve
       better approximation ratio than simply picking random assignments. This improves the best
       previously known result by Håstad and Khot (Theory of Computing, 2005) who showed that
                                           1/2
       a predicate on k variables with 2O(k ) accepting assignments is approximation resistant on
       satisfiable instances.
           Our construction is inspired by several recent developments. One is the idea of using
       direct sums to improve soundness of PCPs, developed by Siu On Chan (STOC, 2013). We
       also use techniques from Cenny Wenner (Theory of Computing, 2013) to construct PCPs
       with perfect completeness without relying on the d-to-1 Conjecture.

ACM Classification: F.2.0
AMS Classification: 68Q17
Key words and phrases: Max-CSP, probabilistically checkable proofs, approximation resistance, satisfi-
able instance

    A conference version of this paper appeared in the Proceedings of the 45th annual ACM Symposium on Theory of
Computing, 2013 [18].
  ∗ Supported by ERC Advanced Investigator grant 226203.



© 2014 Sangxia Huang
c b Licensed under a Creative Commons Attribution License (CC-BY)                    DOI: 10.4086/toc.2014.v010a014
                                                  S ANGXIA H UANG

1    Introduction

In this article we study optimization of constraint satisfaction problems with k variables in each constraint
(M AX -k-CSP). A k-CSP instance contains a set of Boolean variables and constraints, where each
constraint is expressed by a Boolean predicate on k literals. The goal of M AX -k-CSP is to find an
assignment that maximizes the number of satisfied constraints. Given a predicate P : {−1, 1}k → {0, 1}
on k Boolean inputs (where for the input variables we use −1 for True and 1 for False), we can also
consider the M AX -P problem, a special case of M AX -k-CSP where all constraints are expressed by the
same Boolean predicate P applied to literals of k distinct variables. A M AX -P instance is satisfiable
if there exists an assignment that satisfies all the constraints simultaneously. Let P−1 (1) be the set of
accepting inputs of P.
     One naive approximation algorithm for M AX -P is to simply pick a random assignment. The expected
fraction of constraints satisfied by this algorithm is |P−1 (−1)|/2k . Somewhat surprisingly, it turns out that
for some predicate P, the above naive algorithm gives the best possible performance assuming P 6= NP.
We call a predicate P approximation resistant if it is hard to achieve better approximation ratio than
simply picking random assignments. In a celebrated result, Håstad [14] showed that for M AX -k-L IN, sets
of linear equations in Z2 on k ≥ 3 variables, it is NP-hard to find an assignment satisfying more than a
1/2 + ε fraction of the constraints for any ε > 0, even when the input has an assignment that satisfies
1 − ε of them, while the random assignment algorithm achieves 1/2. There has been much progress
in understanding what kinds of predicates are approximation resistant, including characterization for
predicates of small arity [14, 33, 12], as well as a handful of approximation resistant predicates of higher
arities [14, 28, 12, 9].
     The picture of approximation resistance becomes even clearer if we assume the Unique Games
Conjecture (UGC) proposed by Khot [20], which states that it is NP-hard to distinguish whether certain
L ABEL -C OVER instances are almost satisfiable or far from satisfiable. Austrin and Mossel [4] proved that
assuming the UGC, P is approximation resistant if the set of satisfying assignments P−1 (1) contains the
support of a pairwise independent distribution. In [29], Samorodnitsky and Trevisan showed approxima-
tion resistance for the following predicate assuming the UGC: the predicate is on 2k − 1 variables, denoted
as x(S) for 0/ 6= S ⊆ {1, . . . , k}, and the predicate accepts if for all S ⊆ [k], |S| ≥ 2, we have x(S) = ∏i∈S x({i}) .
Let K = 2k − 1. We denote the above predicate as H ADAMARDK . Note that H ADAMARDK has only K + 1
accepting assignments over 2K possible assignments, giving a density of (K + 1)/2K . Gustav Hast [13]
proved that a predicate on K ≥ 3 variables with at most 2bK/2c + 1 accepting inputs is not approxima-
tion resistant. More generally, Charikar, Makarychev and Makarychev [6] gave a ck/2k -approximation
algorithm for M AX -k-CSP, for some c > 0.44. Thus the result of Samorodnitsky and Trevisan is optimal
in terms of the sparsity of the predicate.
     In a recent breakthrough [5], Siu On Chan settled the NP-hardness of M AX -H ADAMARDK (and, up
to a constant factor, M AX -k-CSP in general), bypassing the UGC. Chan introduced the idea of direct
sums of probabilistically checkable proofs (PCPs) to improve soundness, which worked very well for
predicates that are subgroups of a domain. In particular, the accepting assignments of the H ADAMARDK
predicate is a subgroup under elementwise product, and Chan’s result implies that it is approximation
resistant assuming only P 6= NP.
     Let us now focus on the approximation of satisfiable instances. We call P approximation resistant

                      T HEORY OF C OMPUTING, Volume 10 (14), 2014, pp. 359–388                                      360
           A PPROXIMATION R ESISTANCE ON S ATISFIABLE I NSTANCES FOR S PARSE P REDICATES

on satisfiable instances if the best possible algorithm is still the random assignment algorithm even with
the promise that there is an assignment that satisfies all constraints. In contrast to our understanding
of approximation resistance as demonstrated above, approximation resistance on satisfiable instances
is still largely a mystery. Most notably, if the constraints only involve linear equations, for instance
k-L IN and H ADAMARDK , we can always find a satisfying assignment using Gaussian elimination if we
are given satisfiable instances, whereas both problems are approximation resistant in general. Several
other approximation algorithms for satisfiable instances were introduced [31, 33], and in particular, it is
known that predicates with fewer than (k + 1) satisfying assignments are never approximation resistant
on satisfiable instances.
     On the hardness side, there have been only a handful of results: Håstad [14] proved that k-S AT is
approximation resistant for satisfiable instances. The sparsest such predicate known on k variables has
     1/2
2O(k ) accepting assignments, given by Håstad and Khot [16]. This situation is not particularly surprising,
as there are quite a few differences between satisfiable instances and almost satisfiable instances. Some
approximation resistant predicates, such as the k-L IN predicate discussed above, are not approximation
resistant on satisfiable instances. In addition to this inherent structural difference, there are challenges in
techniques as well. Even in the case where we do not require perfect completeness, the best hardness
                             1/2
result before [5] is the 2O(k ) /2k hardness proved by Samorodnitsky and Trevisan [28]. Engebretsen and
Holmerin [9] later improved the constant in the exponents and pointed out some technical difficulties in
                                    1/2
getting better hardness than 2O(k ) /2k using certain kind of PCP reduction. Another challenge is that
many approximation resistance results are obtained via reduction from Unique Games. This immediately
introduces problem if we need perfect completeness because the Unique Games problem is solvable in
polynomial time for satisfiable instances.
     To address this, Khot additionally proposed the “d-to-1 Conjectures” [20]. The conjecture states that it
is NP-hard to distinguish whether a “d-to-1 L ABEL -C OVER Instance” is satisfiable or far from satisfiable.
O’Donnell and Wu proved a strong result in [26] that the Not-Two predicate (NTW)—predicate on three
variables that accepts an input if the number of variables set to (−1) is not two—is approximation resistant
on satisfiable instances assuming the d-to-1 conjecture for some d. Their approach was generalized by
Tang [30] to M AX -3-CSPq where q is a prime greater than 3, and Huang [17] to Boolean predicates of
arity k ≥ 3 that accepts a strict superset of inputs of odd parity.
     Recently, Håstad [15] and Wenner [32] proved approximation resistance for the above predicates
without assuming the d-to-1 conjecture. Their proofs are based on new analytic tools as well as Khot’s
S MOOTH -L ABEL -C OVER [19]. We note that several previous results that bypassed the UGC [19, 21, 10,
11] started from S MOOTH -L ABEL -C OVER, although it is not needed in Chan’s recent result.
     An interesting question is whether we could combine these recent developments to get approximation
resistance result for M AX -P on satisfiable instances for predicate P sparser than the one in Håstad and
Khot [16]. From the PCP perspective, this requires PCPs that always accept correct proofs of correct
statements. Not only is this a natural property to have, given the challenge of getting proofs with perfect
completeness as discussed above, understanding approximability of k-CSP on satisfiable instances may
also lead to new tools in both algorithms and hardness results.
     An immediate proposal to achieve tight lower-bound for M AX -k-CSP on satisfiable instances would
be to construct predicates as in [17, 32], that is, adding a single additional accepting assignment to the
H ADAMARDK predicate of arity 2k − 1. However, this simple approach does not work—the accepting

                    T HEORY OF C OMPUTING, Volume 10 (14), 2014, pp. 359–388                              361
                                             S ANGXIA H UANG

inputs of H ADAMARDK form a k-dimensional subspace, so if we add d new accepting inputs to it and
get some other predicate P0 , we only need a (k + d)-dimensional subspace to contain all the accepting
inputs of P0 . Let Q be the predicate that accepts exactly all inputs from this (k + d)-dimensional subspace.
Given any satisfiable M AX -P0 instance with satisfying assignment α, we replace the predicate P0 in
each constraint with the predicate Q. The solution space of the instance with predicate Q is just a linear
subspace satisfying the following:

    • It contains the solution α to the original instance with predicate P0 .

    • If we project the solution space to the set of variables in each constraint, the resulting subspace has
      dimension at most (k + d).

Therefore, if we sample a random point from this linear subspace, then for each constraint, the probability
that we hit α restricted to the variables in that constraint (and hence satisfy the constraint with predicate
P0 ) is at least 1/2k+d . Thus whenever d = o(2k ), the expected performance of the above sampling method
                                                                   k
beats simple random assignment, which only gives (2k + d)/22 .
     The problem with adding more accepting assignments to H ADAMARDK is that the resulting predicate
does not have the group structure as in [5] any more. If we still take many rounds of direct sums as
in [5], then to ensure perfect completeness, we need to accept many assignments that are products of the
additional assignments we added and end up with a predicate that has more accepting assignments than
we would want. On the other hand, as is demonstrated in [5], having more rounds of direct sum helps us
to improve soundness dramatically and so if we are looking for sparse predicates that are approximation
resistant, it would be natural to have more rounds of proofs in the direct sum. This paper is an attempt to
strike a balance. We prove the following approximation resistance result.
                                                           1/3 )
Theorem 1.1. There is a predicate of arity K with 2O(K             accepting assignments that is approximation
                                                       e

resistant on satisfiable instances.
                                                              1/2
    This improves the best previous known result of 2O(K ) of Håstad and Khot [16].
    Our result is based on many ideas developed in a number of previous works, including [9, 32, 5]. On
the highest level, we use direct sum of several PCPs to get improved soundness result. However, as argued
above, we also want to limit the number of PCPs involved. Therefore, we use long-code based PCP
constructions that are already rather efficient, for example those used by Engebretsen and Holmerin [9].
In [32], Wenner showed how different types of noise operators behave similarly when the reduction is
based on S MOOTH -L ABEL -C OVER. This is helpful when analyzing soundness of PCPs in that it allows
us to move from correlated noise with perfect completeness to independent noise that are not perfect
but easier to analyze. We also use a multivariate invariance theorem in [32], which extends methods of
Mossel et al. [24, 23] to projection games. Similar techniques were developed also in other works such
as [25] as well as in [5].


2    Preliminaries
In this section, we introduce some notations. In Section 2.1, we discuss variants of L ABEL -C OVER
problems, and in particular the M ULTI -L AYER -S MOOTH -L ABEL -C OVER that we use in the rest of the

                    T HEORY OF C OMPUTING, Volume 10 (14), 2014, pp. 359–388                              362
            A PPROXIMATION R ESISTANCE ON S ATISFIABLE I NSTANCES FOR S PARSE P REDICATES

paper. We describe the general approach to proving approximation resistance via L ABEL -C OVER, as
well as Chan’s improvements in Section 2.2. In Section 2.3, we review the basics of harmonic analysis of
Boolean functions.


2.1     Variants of L ABEL -C OVER
We first recall the definition of the L ABEL -C OVER problem.

Definition 2.1. A L ABEL -C OVER instance is defined by a tuple (U,V, E, L, R, Π). Here U and V are two
sets of vertices of a bipartite multigraph, and E is the set of edges between them. L and R are label sets
for vertices U and V , respectively. Π is a collection of projections, one for each edge e, πe : R → L. For
a labeling σ = (σU , σV ) of the L ABEL -C OVER instance σU : U → L, σV : V → R, let its value be the
fraction of edges {u, v} ∈ E such that π{u,v} (σ (v)) = σ (u). The value of a L ABEL -C OVER instance is
the maximum value of all possible assignments.

    The following theorem combines the celebrated PCP theorem [1, 2] with Raz’s parallel repetition
theorem [27] and shows hardness of L ABEL -C OVER.

Theorem 2.2. For every constant η > 0, there is some constant C(η) < ∞ such that for L ABEL -C OVER
instances with |R| ≥ C(η), it is NP-hard to distinguish between those with value 1 and those with value
no more than η.

    The M ULTI -L AYERED -S MOOTH -L ABEL -C OVER problem is a variant of L ABEL -C OVER first studied
in Khot [19] for showing hardness of coloring 3-colorable 3-uniform hypergraphs.

Definition 2.3 (Smoothness). A L ABEL -C OVER instance is ξ -smooth if for any vertex v ∈ V and any
two labels r 6= r0 ∈ R, over a uniformly at random neighbor u of v, we have

                                                Pr [πe (r) = πe (r0 )] ≤ ξ .                                         (2.1)
                                               u∼v

      Similar to L ABEL -C OVER, we have the following hardness result for S MOOTH -L ABEL -C OVER.

Theorem 2.4. For every constant η, ξ > 0, there is some constant D(η, ξ ) < ∞ such that for ξ -smooth
L ABEL -C OVER instances with |R| ≥ D(η, ξ ), it is NP-hard to distinguish between those with value 1
and those with value no more than η.

    M ULTI -L AYERED -L ABEL -C OVER was first devised in [7] to prove strong approximation hardness
result for hypergraph vertex cover, and used in [9] for improving query efficiency and hardness of
approximation result for M AX -CSP. Briefly speaking, a normal L ABEL -C OVER instance checks consis-
tency of labeling between a pair of vertices, while in a k-layered L ABEL -C OVER instance, we consider
tuples of k − 1 independently sampled edges ({u1 , v1 }, {u2 , v2 }, . . . , {uk−1 , vk−1 }), the k hybrid tuples of
vertices (u1 , . . . , ui , vi+1 , . . . , vk−1 ) for i = 0, . . . , k − 1 and their corresponding labelings, and we require
consistency between all pairs of tuples. Formally, given a L ABEL -C OVER instance as defined above, the
constraint between pairs of labelings on tuples is defined as follows.

                       T HEORY OF C OMPUTING, Volume 10 (14), 2014, pp. 359–388                                        363
                                                                S ANGXIA H UANG

Definition 2.5. Let ~e = (e1 , . . . , ek−1 ) ∈ E k−1 be a vector, and let 1 ≤ i < j ≤ k. Define the mapping
π~e, j→i : Lk− j × R j−1 → Lk−i × Ri−1 as

     (l1 , . . . , lk− j , rk− j+1 , . . . , rk−1 ) 7→ (l1 , . . . , lk− j , πek− j+1 (rk− j+1 ), . . . , πek−i (rk−i ), rk−i+1 , . . . , rk−1 ) .

    It is not hard to see that the above definition preserves smoothness in the layered L ABEL -C OVER
instances.

Lemma 2.6. For any k-layered L ABEL -C OVER instance constructed from a ξ -smooth L ABEL -C OVER
instance (U,V, E, L, R, Π), any positive integer 1 < i ≤ k, vertex tuple ~u = (u1 , . . . , uk−1 ) ∈ U k−i ×V i−1 ,
two tuples of labelings~r 6= ~r0 ∈ Lk−i × Ri−1 , we have

                                       Pr π~e,i→1 (~r) = π~e,i→1 (~r0 ) < ξ ,
                                                                      
                                                  ~e∼~u

where we sample ~e by picking each ei ∼ ui independently.

Proof. If there exists j ∈ {1, . . . , k − i} such that r j 6= r0j , then we always have

                                                           π~e,i→1 (~r) 6= π~e,i→1 (~r0 )

and hence the above inequality holds for any ξ > 0.
      We now assume that for all j ∈ {1, . . . , k − i}, we have r j = r0j , and that there exists j0 ∈ {k − i +
1, . . . , k − 1} such that r j 6= r0j . Observe that

                                                           π~e,i→1 (~r) = π~e,i→1 (~r0 )

implies that for all j ∈ {k − i + 1, . . . , k − 1}, we have πe j (r j ) = πe j (r0j ), and in particular

                                                              πe j0 (r j0 ) = πe j0 (r0j0 ) .

By definition of smoothness, this happens with probability less than ξ .

    In many applications, it is often easier to work with a slightly stronger notion of smoothness. We
extend the definition of projection π : R → L to sets of labels S ⊆ R by π(S) := {l ∈ L | ∃r ∈ S, π(r) = l}.

Definition 2.7. A L ABEL -C OVER instance is (J, ξ )-smooth if for any vertex v ∈ V and any set of labels
S ⊂ R, |S| ≤ J, over a uniformly at random neighbor u of v, we have
                                                         
                                         Pr |πe (S)| < |S| ≤ ξ .                                    (2.2)
                                                          u∼v

 Similarly, a k-layered L ABEL -C OVER instance is (J, ξ )-smooth if for any integer 1 < i ≤ k, vertex tuple
~u = (u1 , · · · , uk−1 ) ∈ U k−i ×V i−1 , and set of labelings S ⊆ Lk−i × Ri−1 with |S| ≤ J, we have
                                                                    
                                               Pr |π~e,i→1 (S)| < |S| ≤ ξ .
                                                      ~e∼~u


                           T HEORY OF C OMPUTING, Volume 10 (14), 2014, pp. 359–388                                                                  364
            A PPROXIMATION R ESISTANCE ON S ATISFIABLE I NSTANCES FOR S PARSE P REDICATES

    Observe that |πe (S)| < |S| if and only if there exists r 6= r0 ∈ S such that πe (r) = πe (r0 ). By simple
union bound over all possible pairs of labelings in S, we can show that for constant J, the above two
notions of smoothness differs only by a constant factor. The same argument applies to multi-layered
instances.

Lemma 2.8. A ξ -smooth k-layered L ABEL -C OVER instance is J, 2J ξ -smooth.
                                                                           

   Combining all we have, we get the following hardness result for k-layered S MOOTH -L ABEL -C OVER
problem.

Theorem 2.9. For every constant η, J, ξ , k > 0, there is some constant G(η, J, ξ , k) < ∞ such that given
a (J, ξ )-smooth k-layered L ABEL -C OVER instance with |R| ≥ G(η, J, ξ , k), it is NP-hard to distinguish
between the following two cases:

YES: There exist assignments σm : U k−m × V m−1 → Lk−m × Rm−1 (1 ≤ m ≤ k), such that for all ~e =
    (e1 , . . . , ek−1 ) ∈ E k−1 and all i, j, 1 ≤ i < j ≤ k, it holds that

                   π~e, j→i (σ j (u1 , . . . , uk− j , vk− j+1 , . . . , vk−1 )) = σi (u1 , . . . , uk−i , vk−i+1 , . . . , vk−1 ) .

NO: There are no two integers l and h (1 ≤ l < h ≤ k) such that there exist functions Ph : U k−h ×
    V h−1 → Lk−h × Rh−1 and Pl : U k−l × V l−1 → Lk−l × Rl−1 , such that for more than η fraction of
    (e1 , . . . , ek−1 ) ∈ E k−1 , we have

          π~e,l→1 (Pl (u1 , . . . , uk−l , vk−l+1 , . . . , vk−1 )) = π~e,h→1 (Ph (u1 , . . . , uk−h , vk−h+1 , . . . , vk−1 )) .      (2.3)

       If an edge tuple (e1 , . . . , ek−1 ) satisfies the above condition, we say that it is weakly satisfied.

Proof. The proof is similar to [9]. The completeness case is straightforward. For soundness, suppose
there exists 1 ≤ l < h ≤ k and functions Pl , Ph , such that (2.3) holds for more than η fraction of
(e1 , . . . , ek−1 ) ∼ E k−1 . Pick any coordinate i ∈ {k − h + 1, . . . , k − l}. Then there is a way to fix edges
e1 , . . . , ei−1 , ei+1 , . . . , ek−1 , such that (2.3) holds for at least η fraction of the edges ei . We conclude the
proof by noting that the restriction of Pl and Ph on the i-th coordinate gives a labeling with value at least
η for the original L ABEL -C OVER instance.

2.2    PCP reductions
In this section, we describe a reduction from L ABEL -C OVER which is now the standard technique in
hardness of approximation. We also discuss direct sums of PCPs introduced by Chan [5].
    Consider a Boolean predicate P of arity w. The reduction typically translates labelings for u ∈ U and
v ∈ V to 2|L| and 2|R| Boolean variables, respectively. These variables are viewed as functions

                       f u : {−1, 1}|L| → {−1, 1}                  and          gv : {−1, 1}|R| → {−1, 1} .

We require that these functions be folded, that is, for any x ∈ {−1, 1}|L| , y ∈ {−1, 1}|R| ,

                                   f u (−x) = − f u (x)             and         gv (−y) = −gv (y) .

                        T HEORY OF C OMPUTING, Volume 10 (14), 2014, pp. 359–388                                                        365
                                                                S ANGXIA H UANG

For each pair of queries (x, −x), we select one of them. If x is selected, then when f (−x) is needed
we return − f (x) instead. Hence in the actual reduction we only use 2|L|−1 Boolean variables for each
u ∈ U and 2|R|−1 variables for each v ∈ V . This is also why we need to allow negated literals in the CSP
instances. In a correct proof for a satisfiable L ABEL -C OVER instance, the functions are long codes for
the corresponding labelings of u and v, that is, setting

                                          f u (x) = xσU (u)               and          gv (y) = yσV (v) .

    For an edge {u, v} in the L ABEL -C OVER, we sample queries

                                                        x(1) , . . . , x(m) , y(m+1) , . . . , y(w)
                                                                                                      


according to some carefully chosen test distribution T. The distribution T has the property that for any
l ∈ L and r ∈ R such that π(u,v) (r) = l, the predicate P accepts

                                                         (1)          (m)     (m+1)             (w) 
                                                        xl , . . . , xl , yr          , . . . , yr

with probability 1 (or 1 − ε for some small constant ε if we are considering non-perfect completeness).
    Let the value of an edge be the following expectation
                                       h                                                                     i
                          E             P( f u (x(1) ), . . . , f u (x(m) ), gv (y(m+1) ), . . . , gv (y(w) ) . (2.4)
                (x(1) ,...,x(m) ,y(m+1) ,...,y(w) )∼T

Observe that in the completeness case where the L ABEL -C OVER instance has an assignment that satisfies
all the edges, setting f u and gv to the long code of the labelings would give value 1 (or close to 1) for the
above expectation.
     In the soundness case, of course the functions f u and gv are not guaranteed to be long codes. Typically,
when proving approximation resistance, we start the analysis by taking the Fourier expansion of predicate
P in (2.4). The constant term in the expansion is exactly the density of P. We then show that if for some
non-constant terms we have that | E[∏ f u ∏ gv ]| ≥ δ for some small constant δ > 0, then we can find a
good labeling for the L ABEL -C OVER instance we started with, allowing us to distinguish between the
YES case and the NO case. In some cases we can show that all possible non-constant terms—including
those that do not appear in the expansion of P—are small, and this implies that predicate P is useless in
the sense of [3], a stronger notion of inapproximability.
     It is not hard to adapt the above reduction to k-layered L ABEL -C OVERs. Instead of encoding the
labelings of single vertices as long codes, we encode labelings for the hybrid vertex tuples. The rest of
the analysis is similar.
     In [5], Chan introduced direct sum of PCPs to get improved hardness of approximation results and
proved the first general criterion for approximation resistant predicate without assuming the UGC. The
main result is the following.

Theorem 2.10 (Siu On Chan). Let k ≥ 3 be an integer, G a finite abelian group, and C a balanced pairwise
independent subgroup of Gk . It is NP-hard to approximate Additive-CSP(C) better than |C|/|G|k + ε for
any constant ε > 0.

                        T HEORY OF C OMPUTING, Volume 10 (14), 2014, pp. 359–388                                 366
            A PPROXIMATION R ESISTANCE ON S ATISFIABLE I NSTANCES FOR S PARSE P REDICATES

    In the Boolean case, we have G = {−1, 1} with product “·” as the group operation, C consists of
the accepting assignments of some predicate P, and Additive-CSP(C) is exactly M AX -P. Note that the
balanced pairwise independence condition is similar to the condition of Austrin and Mossel [4].
    The main idea in Chan’s proof is that instead of sampling one edge, we sample c edges for some c
to be determined. We also have c test distributions T1 , . . . , Tc corresponding to the edges we sampled,
where each distribution Ti satisfies the same requirement as we had for T above. We sample queries
                                           (1,i)
                                            x , . . . , x(k,i) i∈[c] .
                                                              


Note that for the same m ∈ [k], whether the query x(m,i) is from {−1, 1}|L| or {−1, 1}|R| may vary among
different i ∈ [c] and is an important design choice. Depending on the edges we have sampled, we choose
k functions to query as in the classical setting (some of those functions might be the same one). The
functions now have larger domains, since the j-th function would take {x( j,i) }i∈[c] as input. We also
require that the functions are folded in each individual test—for any query {x( j,i) }i∈[c] and i ∈ [c] we have

                      f x( j,1) , . . . , −x( j,i) , . . . , x( j,c) = − f x( j,1) , . . . , x( j,i) , . . . , x( j,c) .
                                                                                                                     

The intended solution now is that each function is the product of functions for the individual PCPs, and
the functions in the individual PCPs are long codes of some legitimate labelings. Similar to the classical
approach, we take the answers to the queries and accept if predicate P accepts.
     It is not hard to see that for the completeness to hold, we would need that the set of satisfying
assignments of P has some group structure—the elementwise product of two satisfying assignments is
still a satisfying assignment. Observe that the H ADAMARDK predicate satisfies this property. On the
soundness side, we need to bound each term in the Fourier expansion of (2.4). The important observation
(Lemma 5.3 in [5]) is that the absolute value of these terms are bounded by the absolute value of these
terms in each individual PCP. Hence, all we need is to show that a term in (2.4) is small in at least one of
the PCPs in the direct sum unless there is good labeling.

2.3     Efron-Stein decomposition, influence and correlation
In this section, we recall basic notions from Fourier analysis, influence, the Bonami-Beckner operator,
and correlation of correlated probability spaces.
      Let (Ω, µ) be a finite probability space with |Ω| = q. We assume that µ(x) > 0 for every x ∈ Ω. Let
χ0 , . . . , χq−1 : Ω → R be an orthonormal basis for L2 (Ω, µ) with respect to scalar product under µ. Let
this basis be such that χ0 = 1 the identical one function. For σ ∈ Znq , define
                                                                  n
                                           χσ (x1 , . . . , xn ) = ∏ χσi (xi ) .
                                                                 i=1

Then {χσ }σ ∈Znq forms an orthonormal basis for L2 (Ωn , µ ⊗n ), and every function f ∈ L2 (Ωn , µ ⊗n ) can be
written as
                                       f (x) = ∑ fˆ(σ )χσ (x) .
                                                       σ ∈Znq

      We also make extensive use of the following Efron-Stein decomposition [8, 23]

                      T HEORY OF C OMPUTING, Volume 10 (14), 2014, pp. 359–388                                      367
                                                S ANGXIA H UANG

Theorem 2.11. Any function f ∈ L2 (Ωn , µ ⊗n ) can be uniquely decomposed as

                                                f (x) = ∑ fS (x) ,
                                                              S⊆[n]

where

    • the function fS (x) depends only on xS = {xi | i ∈ S};

                                     / x0 ∈ Ωn , it holds that
    • for every S, T ⊆ [n], S \ T 6= 0,
                                                                  0
                                                E[ fS (x) | xT = xT ] = 0 .

   For σ ∈ Znq , let Set(σ ) = {i | σi > 0}, and let |σ | = | Set(σ )|. It is easily verified that the Efron-Stein
decomposition is related to the Fourier decomposition as follows

                                          fS (x) =        ∑        fˆ(σ )χσ (x) .
                                                        σ ∈Znq
                                                      Set(σ )=S

    A useful notion of function is the influence of a coordinate.

Definition 2.12. For f ∈ L2 (Ωn , µ ⊗n ), i ∈ [n], the influence of i on f is defined as

                                           Infi ( f ) = E [Var[ f (x)]] .
                                                          x[n]\i   xi


    Note that when we refer to influence, it is always with respect to the underlying probability space
(Ωn , µ ⊗n ). We have the following characterization of influence in terms of Fourier decomposition and
Efron-Stein decomposition.

Proposition 2.13. For f ∈ L2 (Ωn , µ ⊗n ) and i ∈ [n],

                                     Infi ( f ) =     ∑         fˆ(σ )2 = ∑ E[ fS2 ] .
                                                     σ ∈Znq               S3i
                                                    i∈Set(σ )

Let the total influence Inf( f ) = ∑i∈[n] Infi ( f ) be the sum of influences of all coordinates on f .

    We next recall the Bonami-Beckner operator, or noise operator.

Definition 2.14. Let 0 ≤ γ ≤ 1. The Bonami-Beckner operator Tγ is a linear operator mapping f ∈
L2 (Ωn , µ ⊗n ) to Tγ f as follows
                                      (Tγ f )(x) = E[ f (y)] ,
where y is sampled by setting each bit independently to yi = xi with probability 1 − γ, and otherwise
sampled according to µ with probability γ.

    Again we have the following Fourier/Efron-Stein characterization of Tγ .

                     T HEORY OF C OMPUTING, Volume 10 (14), 2014, pp. 359–388                                 368
           A PPROXIMATION R ESISTANCE ON S ATISFIABLE I NSTANCES FOR S PARSE P REDICATES

Proposition 2.15. For any f ∈ L2 (Ωn , µ ⊗n ) and 0 ≤ γ ≤ 1,

                                          Tγ f = ∑ (1 − γ)|σ | fˆ(σ )χσ .
                                                     σ ∈Znq

                                         (γ)                                                 (γ)
    We define noisy influence as Infi ( f ) = Infi (Tγ f ), and similarly Inf(γ) ( f ) = ∑ Infi ( f ). The follow-
ing bound for the total noisy influence of functions with range [−1, 1] appears in [26, Lemma 5.9].

Proposition 2.16 (O’Donnell, Wu). For any f : Ωn → [−1, 1] and 0 < γ ≤ 1, we have
                                                                        (γ)
                                         Inf(γ) ( f ) = ∑ Infi ( f ) ≤ γ −1 .
                                                            i∈[n]

    The following concept of lifted functions is useful in the context of projection games.

Definition 2.17. Given function f : Ωnd → R and d-to-1 mapping π : R → L, define the lifted version of
      π
f as f : (Ωd )n → R as naturally induced by π
                                                        π
                                                       f (x) = f (x) ,

where x satisfies xr,t = x(r,t) for r ∈ L,t ∈ [d].

    In terms of influence, we have the following relation between f and f , due to Wenner [32].

Proposition 2.18 (Wenner). For any r, we have

                                               Infr ( f ) ≤         ∑        Infr0 ( f ) .
                                                              r0 :π(r0 )=r

Proof. The claim follows by applying Proposition 2.13 and comparing the terms.

   The correlation for correlated probability spaces was introduced by Mossel [23]. Given a probability
measure µ defined on Ω × Ψ, we say that Ω and Ψ are correlated spaces, and we use (Ω × Ψ, µ) to
denote correlated spaces and the corresponding measure. We use the following definition of correlation.

Definition 2.19. Let (Ω × Ψ, µ) be a correlated probability space, µ is a distribution on the finite product
set Ω × Ψ and that the marginals of µ on Ω and Ψ have full support. Define the correlation between Ω
and Ψ to be

              ρ(Ω, Ψ; µ) = max | E[ f g]| E[ f ] = 0, E[ f 2 ] ≤ 1, E[g] = 0, E[g2 ] ≤ 1 ,
                                  
                                f :Ω→R
                                g:Ψ→R

where the expectation E[ f g] is under µ, and E[ f ], E[ f 2 ], E[g] and E[g2 ] are under marginals of µ on
corresponding spaces.

    A useful fact for bounding correlation of probability spaces from [23] is that the correlation of
a product of correlated probability space is equal to the maximum correlation among the individual
correlated spaces (excluding empty components).

                     T HEORY OF C OMPUTING, Volume 10 (14), 2014, pp. 359–388                                369
                                              S ANGXIA H UANG

Lemma 2.20. Let {(Ωi × Ψ, µi )} be a set of correlated probability spaces, then
                                                         !
                             ρ   ∏ Ωi , ∏ Ψi ; ∏ µi          ≤ max ρ(Ωi , Ψi ; µi ) .
                                   i      i         i             i


    We also need the following lemma when analyzing correlations. Intuitively, if we can decompose µ
into a convex combination of two distributions and we can bound the correlation between Ω and Ψ in both
sub-distributions by some constant c, then barring special cases it seems reasonable that the correlation
ρ(Ω, Ψ; µ) should also be bounded by some function of c. More formally, we have the following lemma
from [32].

Lemma 2.21. Let (Ω × Ψ, δ ν + (1 − δ )ν 0 ) be a correlated space such that the marginal distribution of
at least one of Ω and Ψ is identical on both ν and ν 0 . Then
                                                  q
                  ρ(Ω, Ψ; δ ν + (1 − δ )ν 0 ) ≤    δ ρ(Ω, Ψ; ν)2 + (1 − δ )ρ(Ω, Ψ; ν 0 )2 .

    Next we recall the definition of the conditional expectation operator.

Definition 2.22. Let (Ω × Ψ, µ) be two correlated spaces. The conditional expectation operator U
associated with (Ω, Ψ) is the operator mapping f ∈ L2 (Ψ, µ) to U f ∈ L2 (Ω, µ) by

                                       (U f )(x) = E[ f (Y ) | X = x]

for x ∈ Ω and (X,Y ) ∈ Ω × Ψ is distributed according to µ.

    An important property we need in the analysis, due to Mossel [22], is that the Efron-Stein decomposi-
tion commutes with the conditional expectation operator.

Proposition 2.23 (Mossel). Let (Ω×Ψ, µ) := (∏ Ωi × ∏ Ψi , µi ) be correlated space and let U := Ui
                                                                      N                               N

be the conditional expectation operator associated with Ω and Ψ. Suppose f ∈ L2 (Ψ) has Efron-Stein
decomposition f (x) = ∑S⊆[n] fS (xS ). Then the Efron-Stein decomposition of U f satisfies (U f )S = U( fS )
for S ⊆ [n].

    The following result, due to Mossel [22], shows that in the above setting, if the correlations between
all Ω and Ψ are less than 1, then the L2 norms of the high-degree terms of U f are small.

Proposition 2.24 (Mossel). Assume the setting of Proposition 2.23 and that for all i, we have

                                              ρ(Ωi , Ψi ; µi ) ≤ ρi .

Then for all f , we have                                        !
                                       kU( fS )k2 ≤      ∏ ρi k fS k2 .
                                                          i∈S



                    T HEORY OF C OMPUTING, Volume 10 (14), 2014, pp. 359–388                            370
          A PPROXIMATION R ESISTANCE ON S ATISFIABLE I NSTANCES FOR S PARSE P REDICATES

3    The predicate, the PCP and outline of proof
Given the soundness parameter ε, the starting point of our reduction is a k-layered (J, ξ )-smooth L ABEL -
C OVER, where J and ξ are constants solely dependent on ε that we will specify later.
   The predicate. Fix some k, and let [k] := {1, 2, . . . , k}. Let

                    S3 := {S ⊆ [k] | |S| = 3}       and       S1 := {S ⊆ [k] | |S| = 1} .

The predicate is on variables {x(S) }S∈S1 ∪S3 taking values from {−1, 1}. We call the variables x({i})
 singleton variables and the remaining ones parity check variables. The predicate accepts if there exists
~w ∈ {−1, 1}S1 ∪S3 such that the number of −1 entries in ~w is no more than k, and

                                          wS x(S) · ∏ w{i} x({i}) = 1
                                                  i∈S

for all S ∈ S3 .
    We can view ~w as an error vector, and the predicate accepts inputs that are no more than Hamming
distance k away from an assignment  that satisfies all parity checks.
                            k
                              
    The predicate is on k + 3 variables, and it has
                                             k      !
                                                  + k
                                   O 2k · 3                = 2O(k log k)
                                                 k

                                                                               e 1/3
accepting inputs, thus the density (assuming the predicate has arity K) is 2O(K ) /2K , where the O     e hides
logarithmic factors.
     Outline of proof. Before going into details about the construction of our PCPs, we first give an
overview of our proof and explain the intuition behind the construction.
     Our PCP design is based on Chan’s idea of direct sums of PCPs [5] as described in Section 2.2. We
prove that all non-constant terms in the Fourier expansion of (2.4) are small.
     One crucial difference between Chan’s proof and ours is that we require perfect completeness. This
means that sometimes there would be perfect correlation between certain queries which makes it possible
for provers to find good cheating strategies. In Chan’s proof as well as in many related results where
perfect completeness is not required, one can usually break this correlation by applying some independent
noise to each query bit. However, in the case of perfect completeness, we cannot afford perturbing each
bit independently, and thus we need to take extra care when designing     test distributions. That is the main
reason our predicate accepts inputs that almost satisfy all 3k linear constraints. In some sense, these
                                                                  

extra accepting inputs serve as noise that breaks up perfect correlations.
     Another important property that Chan uses is the “group” structure of the predicate. This makes
it relatively easy to take direct sums of a large number of PCPs, each handling a small number of
non-constant terms from (2.4), without worrying too much about the completeness of the resulting PCP.
Our predicate, however, does not satisfy this property due to the extra noise we added. It is certainly
possible that if we take the sum of two assignments that are of distance k away from assignments that
satisfies all linear equations, we end up with something that is distance 2k away from an assignment that
satisfies all linear constraints, and that would break perfect completeness. To avoid this situation, we limit

                    T HEORY OF C OMPUTING, Volume 10 (14), 2014, pp. 359–388                              371
                                              S ANGXIA H UANG

both the number of PCPs in the direct sum and in each PCP the distance from an assignment that satisfies
all linear constraints. More specifically, in our construction the queries to each PCP are generated such
that if the provers (of each individual PCP) answer according to some consistent long code, then the
answers is at most distance 1 away from an assignment that satisfies all linear equations. When taking
direct sum of the k PCPs, an answer that is the direct sum of k long codes would give us an answer that
would be accepted by our predicate.
      It remains to find a number of suitable PCPs. If we try to generalize previous approaches, for example
those in [28, 9], to larger predicates such as H ADAMARDK , one of the main adversarial strategies that
we need to consider is that of inconsistent long codes. For example, consider a predicate P on variable
(x1 , . . . , xk ) and a simple PCP reduction where we sample an edge {u, v} and query functions f u and gv
according to some test distribution T as described in Section 2.2. For simplicity, assume that the query
to f u corresponds to input variable x1 , and the remaining queries are on gv . Suppose further that for a
1/2 + δ fraction of the accepting inputs of P, we have x2 x3 x4 = 1 (both H ADAMARDK and the predicate
we are studying in this work have properties similar to this.) Let gv be long code for some arbitrary
label r ∈ R. Observe that the non-constant term gv (x2 )gv (x3 )gv (x4 ) will always have expectation roughly
order of δ simply due to the requirements on T. In this case, we get a large non-constant term but it
does not help us find a consistent labeling for L ABEL -C OVER. A similar argument can be made for
M ULTI -L AYERED -L ABEL -C OVER. Chan’s construction in [5] solves this problem by making sure that
for each term, in at least one of the many PCPs in the direct sum the queries are on different functions.
As discussed before, since we are aiming for fewer PCPs in the direct sum, it would be good if each PCP
can carry out as many consistency checks as possible, and M ULTI -L AYERED -L ABEL -C OVER becomes a
very natural choice. We also need to decide which query should be in which layer for each PCP so that
we do not miss any sets of variables that has linear relations. This is mostly done in Section 4.1.
      Now we describe the PCPs in more details.
      The PCPs. Let C = {σ0 , . . . , σk−1 } be the set of cyclic permutations on [k]. The permutation σi
maps i to k, i + 1 to 1, and so on. We identify 0 with k, and thus σ0 is the identity permutation. Each
permutation corresponds to a PCP for a k-layered L ABEL -C OVER instance, and the permutation decides
which query should be in which layer in the M ULTI -L AYERED -L ABEL -C OVER. As stated above, the
final proof is the direct sum of these k PCPs.
      We   now describe the i-th PCP. It is based on a k-layered L ABEL -C OVER instance, and there are
k + 3k queries, one corresponding to an input variable. We denote the queries as x(S) . For S ∈ S1 ∪ S3 ,
define mi (S) := max σi (S) to be the maximum element of S under permutation σi . The query x(S) is in
layer mi (S). Let
                                           Vi (S) := U k−mi (S) ×V mi (S)−1
be the set of vertex tuples in layer mi (S). The proof has a function for each vertex tuple in Vi (S), and the
input to the functions are {−1, 1} strings indexed by the possible labelings Lk−mi (S) × Rmi (S)−1 in layer
                                                     (S)
mi (S). We denote the domain of the functions as Xi . In a correct proof of a correct labeling, the function
would be a long code encoding a proper labeling for all vertices in the tuple. As described in Section 2.2,
we require that all functions are folded.
     The test distributions. We first define the test distributions for each individual PCP.
     Fix i ∈ [k] and consider the i-th PCP. For notational simplicity we omit i in the subscript for now. We
first independently sample k − 1 edges ~e = {e1 , . . . , ek−1 }. For S ∈ S1 , sample x(S) ∈ X (S) uniformly at

                    T HEORY OF C OMPUTING, Volume 10 (14), 2014, pp. 359–388                               372
           A PPROXIMATION R ESISTANCE ON S ATISFIABLE I NSTANCES FOR S PARSE P REDICATES

random. For S = {s1 , s2 , s3 } ∈ S3 , let m = m(S) be the layer in which query x(S) is located, m j = m(s j )
for j = 1, 2, 3 be the layer query x({s j }) is in, and set
                                                     3
                                              (S)         ({s })
                                            xr = ∏ xπ j
                                                           ~e,m→m j (r)
                                                    j=1


for all possible labelings r ∈ Lk−m × Rm−1 .
    We then make use of the extra inputs allowed by the predicate to add some “noise” to the distributions.
As discussed above, the resulting distribution must have the property that the output obtained by
                                                                                                applying
some consistent long codes is at most distance 1 away from an assignment that satisfies all 3k equations.
The idea is to perturb one of the variables x(S) . For each r ∈ Lk−1 , pick a uniformly random set
Nr ∈ S1 ∪ S3 , and for each
                                                   −1
                                             t ∈ π~e,m(Nr )→1
                                                              (r) ,
    (N )
set xt r to a uniform random bit independently with probability 1/2.
    We denote the test distribution by T. For each r ∈ Lk−1 , let Tr be the marginal distribution of the bits
that map to r under π~e,l→1 for all l ∈ [k]. Observe that we have T = r∈Lk−1 Tr .
                                                                       N

    Let us start by analyzing the standard completeness case.

Lemma 3.1. For any sampling of edges, let f (S) be the functions we are querying, and let x(S) be the
corresponding queries. If the k-layered L ABEL -C OVER instance has a labeling that satisfies all the edges,
then we can find functions f (S) such that the answers
                                             (S) (S)
                                             f (x ) S∈S ∪S
                                                                1     3


is at most Hamming distance 1 away from an assignment that satisfies all linear constraints on 3 singleton
variables and 1 parity check variable.

Proof. The argument is similar to a standard completeness argument.
    Fix a labeling that satisfies all the edges. The proof in the PCP consists of long codes encoding the
labeling of all hybrid vertex tuples.
    Let r ∈ Lk−1 be the labeling for the vertex tuple in layer 1. The answers we get from the long codes is
the same as returning one bit from each query generated according to Tr . The claim follows by observing
that for each tuple of bits produced as above, either it already satisfies all linear constraints, or it would
satisfy all linear constraints after we flip the Nr -th bit.

    Denote the test distribution of the i-th PCP defined above as Ti . The distribution of the final composed
PCP is simply the product of the individual test distributions ki=1 Ti . The verifier samples the edges and
                                                               N

the inputs to the functions, queries the functions (those that correspond to the chosen vertex tuples) and
accepts if the answers returned by the functions are accepted by the predicate.
    It is not hard to see from above discussions that the above PCP has perfect completeness.

Lemma 3.2. If the k-layered L ABEL -C OVER instance has a labeling that satisfies all edges, then there
exists a set of functions { f (S) } such that after querying { f (S) } the verifier accepts with probability 1.

                    T HEORY OF C OMPUTING, Volume 10 (14), 2014, pp. 359–388                               373
                                             S ANGXIA H UANG

Proof. We let our final proof be the product of proofs of the k individual PCPs given by Lemma 3.1.
Since the answer for each proof is at most distance 1 away from an assignment that satisfies all linear
constraints, their product is at most distance k away, which is exactly what the verifier (and our predicate)
accepts.
                                                                                                     1/3
Remark 3.3. We briefly discuss some of the difficulties in getting hardness result better than 2O(K ) /2K .
                                                                                                 e

The key issue here is not unlike the one discussed in [9]. As we will see in Section 4, our construction
requires us to identify a permutation cover that covers all the queries, otherwise products of a random set
of dictatorship functions would be a good cheating strategy.
    To be more specific, one possibility is to consider a predicate on k + 4k variables which does “parity
                                                                              

checks” on tuples of 4 variables and accepts everything that has less than t errors for some t, and devise a
protocol which is a direct sum of t PCPs. As we can see from Section 4.1, in this case we actually need
Ω(k2 ) permutations to cover all queries which means that t = Ω(k2 ). Such a predicate already has much
more accepting inputs than the one we study here.


4    Soundness
In this section, we analyze the soundness of our PCP. We set

                                            ε1 = ε/(7k3 + 1) ,

                                             ξ = ε12 ,
                                                          
                                                          k
                                            ρ0 = 1 − 1/4     ,
                                                          3
                                             J = 2dlogρ0 ε1 e ,

                                                             J/2
and γ such that 1 − (1 − γ)J/2 < ε1 . Note that this gives ρ0 ≤ ε1 , and that all parameters depend only
on k and ε. Also γ < ε.
    As discussed in Section 3, we would like to prove that for all S 6= 0,
                                                                        / the expectation
                                              "              #
                                            E   ∏ f (S) (x(S) )                                            (4.1)
                                                S∈S

is small unless there is good labeling.

Remark 4.1. Since we are able to bound (4.1) for any S 6= 0,
                                                          / we actually proved that our predicate is
useless in the sense of [3].

Remark 4.2. The functions f (S) actually depend on the underlying edges we sampled. For notational
convenience we suppress this dependency and save another layer of subscripts (of subscripts of subscripts).

   As discussed in previous sections, we need to show that for each non-constant term, there is at least
one PCP among those in the direct sum, such that if the expectation of the term under the PCP is large,

                    T HEORY OF C OMPUTING, Volume 10 (14), 2014, pp. 359–388                                374
           A PPROXIMATION R ESISTANCE ON S ATISFIABLE I NSTANCES FOR S PARSE P REDICATES

we can find a good labeling for the underlying L ABEL -C OVER instance by looking at the functions f
restricted to that PCP. Formally, we have the following lemma which is a reformulation of Lemma 5.3 in
Chan [5].

Lemma 4.3. Let T =        i=1 Ti , where Ti is the test distribution for the i-th PCP. Suppose for some S 6= 0,
                        Nk
                                                                                                             /
we have                                         "                      #
                                                          (S)    (S)
                                           E        ∏f          (x )       =δ,
                                           T    S∈S

then for any i ∈ [k], there exists functions g(S) whose inputs are query bits to the i-th PCP, such that
                                                "                      #
                                           E
                                           Ti
                                                    ∏ g(S) (x(S) )         ≥δ.
                                                    S∈S


    Given f (S) , we find g(S) by fixing query bits that are not in the i-th PCP in a way that does not lower
the expectation.
    Thus to bound each term, we need to carefully find an i, such that the test restricted to the i-th PCP
has small expectation. We show how to choose such i in Section 4.1. We would be back to the traditional
setting with L ABEL -C OVERs and dictatorship testing from then on. In Section 4.2, we show that we
can instead look at the distribution where each individual bit is further perturbed independently by some
random noise. Then we show in Section 4.3 how to apply an invariance-type theorem from [32] in this
new setting to get our soundness result.


4.1   Permutation covering
Our k PCPs use cyclic permutations C ∈ C to decide the layer of each query and the inputs to the
corresponding function. We first give a general definition of the crucial property we need from such sets
of permutations.

Definition 4.4. Let P be a set of permutations on [k]. We say that P covers S1 ∪ S3 if for all 0/ 6= S ⊆
S1 ∪ S3 , there exists a permutation σ ∈ P, some j, l0 ∈ [k], such that

                                   {S ∈ S | j ∈ S, max σ (S) = l0 } is odd .

     We now reformulate the above definition and prove a necessary and sufficient condition for general
sets of permutations P to cover S1 ∪ S3 .
     For each set S ∈ S1 ∪ S3 , we construct a Boolean vector vPS as the following: the elements in the
vector are indexed by a tuple (i, l, j) ∈ [|P|] × [k] × [k], and vPS,(i,l, j) = 1 if max σi (S) = l and j ∈ S, and
vS,(i,l, j) = 0 otherwise.

Proposition 4.5. The set of permutations P covers S1 ∪ S3 iff the vectors {vPS }S∈S1 ∪S3 are linearly
independent over F2 .

                     T HEORY OF C OMPUTING, Volume 10 (14), 2014, pp. 359–388                                 375
                                              S ANGXIA H UANG

Proof. If the set P does not cover S1 ∪ S3 , then there exists a set 0/ 6= S ⊆ S1 ∪ S3 , such that for any
permutation σi ∈ P and j, l0 ∈ [k], we have that

                                  {S ∈ S | S 3 j, max σi (S) = l0 } is even .

Observe that for any S ∈ S1 ∪ S3 , the segment of vPS indexed by (i, l) for some fixed i and l is all zero
if max σi (S) 6= l, and otherwise it is exactly the character vector of the set S. Therefore the above is
equivalent to saying that for any i ∈ [|P|] and l0 , we have

                                                ∑ vPS,(i,l ) = 0 ,
                                                           0
                                               S∈S

where the summation is modulo 2. Since the above holds for all i and l0 , we have

                                                     ∑ vPS = 0 ,
                                                 S∈S

or the vectors {vPS }S∈S are linearly dependent.
    Note that all the above steps are equivalent statements. Thus the other direction also holds.

    As a side note, we can see from the above argument that it is necessary to have Ω(k) permutations in
order to cover S1 ∪ S3 , because otherwise we would have Θ(k3 ) vectors of dimension o(k3 ) and thus
they could not be linearly independent.
    We now prove that the set of all cyclic permutations C = {σ0 , . . . , σk−1 } covers S1 ∪ S3 .
Lemma 4.6. The set of all cyclic permutations C = {σ0 , . . . , σk−1 } covers S1 ∪ S3 .
Proof. For any given collection of sets S ⊆ S1 ∪ S3 , we show how to find the cyclic permutation σ and
indices j, l0 ∈ [k] required in Definition 4.4.
    For a set S ∈ S1 ∪ S3 , let

                                  span(S) = min {max σi (S) − min σi (S)} ,
                                             σi ∈C

that is, the minimum distance between the largest and the smallest element under cyclic permutations.
Note that for singleton sets S ∈ S1 , we have span(S) = 0.
    For a given collection of sets S , let S ∈ S be a set with minimum span in S where we break ties
arbitrarily. Pick i0 such that σi0 (S) contains 1 and span(S) + 1 as its minimum and maximum element.
Let σ := σi0 be the permutation we want, and let l0 = span(S) + 1.
    Now we select j. If span(S) = 0, then let j = σ −1 (1) and we are done. This is because for any
non-singleton set S0 , max σ (S) > 1, and for any singleton set S00 6= S, clearly σ (S00 ) 6= σ (S). Thus S would
be the only set containing j with max σ (S) = 1 = l0 .
    If span(S) 6= 0, then S has three elements, and there is no singleton set in S . If there is any other
non-singleton set S00 ∈ S with max σ (S00 ) = span(S) + 1, then σ (S00 ) and σ (S) have the same maximum
and minimum element, namely span(S) + 1 and 1. That leaves us with the middle element. But since
S 6= S00 , the middle element must be different, so each of them appear only in one set, and setting j to the
inverse of any of the middle elements under σ would work. Otherwise we take j = max S.

                     T HEORY OF C OMPUTING, Volume 10 (14), 2014, pp. 359–388                               376
          A PPROXIMATION R ESISTANCE ON S ATISFIABLE I NSTANCES FOR S PARSE P REDICATES

    For S ⊆ S1 ∪ S3 , we consider the PCP corresponding to the cyclic permutation σi ∈ C covering S
given by Lemma 4.6. We denote the PCP as PCPi . As discussed before, we only need to show that if
(4.1) is large even when restricted to PCPi , we can find a good labeling for the L ABEL -C OVER instance
we started with.
    For notational simplicity, we only prove the case where i = 0, that is, for the identity permutation σ0 .
Arguments for general cyclic permutations are entirely symmetric.

4.2   Introducing independent noise
In this section, we show that perturbing the queries does not change the expectation of the terms by too
much.
     Formally, let Tr0 be the distribution where we first sample according to Tr , and then resample each
bit independently with probability γ according to its marginal distribution in Tr —which in our case is
uniform. Also define T 0 = r∈Lk−1 Tr0 . We prove the following lemma which bounds the difference of
                              N

expectation of (4.1) under T and T 0 .
Lemma 4.7. For any S ⊆ S1 ∪ S3 , we have
                        "              #                 "                              #
                                       (S)    (S)                          (S)    (S)
                            E    ∏f          (x ) − E           ∏f               (x )       < 7k3 ε1 ,    (4.2)
                             T   S∈S                T0
                                                            S∈S

where ε1 = ε/(7k3 + 1) is as defined at the beginning of Section 4.
    Fix some S0 ∈ S1 ∪ S3 . Let T (S0 ) be the distribution where under T, we independently resample the
bits in x(S0 ) from the uniform distribution with probability γ. We first show in Lemma 4.8 below that the
expectation under T is close to that under T (S0 ) . Lemma 4.7 follows by applying similar arguments to
each x(S) in succession.
    For S ∈ S1 ∪ S3 , let m(S) = max S be the maximum element in S. Recall that query x(S) is located in
layer m(S), and for r ∈ Lk−1 , Tr is the distribution containing all bits in
                                    (S)
                                     xt     S ∈ S1 ∪ S3 , πm(S)→1 (t) = r ,
                                                                    (S )
that is, the query bits that map to the same r. We use Tr 0 to denote the marginal distribution of T (S0 ) on
bits in                                    (S0 )
                                           xt     πm(S0 )→1 (t) = r .
Let m = m(S0 ).
    Consider the difference of expectation between T and T (S0 ) . If f (S0 ) (x(S0 ) ) does not appear in the
product, then there would be no difference. We now assume otherwise. The following lemma shows that
introducing independent noise on one query does not change the expectation by too much.
Lemma 4.8. For any S ⊆ S1 ∪ S3 , we have
                        "              #                        "                           #
                            E
                             T
                                 ∏ f (S) (x(S) ) − TE ∏ f (S) (x(S) )
                                                        (S0 )
                                                                                                < 7ε1 .   (4.3)
                                 S∈S                            S∈S



                    T HEORY OF C OMPUTING, Volume 10 (14), 2014, pp. 359–388                               377
                                                           S ANGXIA H UANG

    In the rest of the section, we prove Lemma 4.8. The proof follows Wenner’s approach [32], especially
Lemmas 3.15 through 3.17.
    For notational simplicity let F 0 be the product of all terms but f (S0 ) (x(S0 ) ) and we abbreviate f (S0 ) as
             (S )
f . We use X 0 to abbreviate
                                                  ∏ X (S) .
                                                       S∈S1 ∪S3 ,S6=S0
                      (S )
Similarly we define X r 0 for r ∈ Lk−1 . The first step is to use Lemma 2.21 to bound the correlation
                                            (S0 )                                             (S0 )
                            ρ(X (S0 ) , X           ; T)         and          ρ(X (S0 ) , X           ; T (S0 ) ) .
Since T is simply a product of Tr with different values r, by Lemma 2.20, we only need to bound
                                 (S )    (S )                                          (S )   (S )        (S )
                            ρ(Xr 0 , X r 0 ; Tr )                  and            ρ(Xr 0 , X r 0 ; Tr 0 ) .
                                                                    (S )     (S )
Claim 4.9. For any S0 ∈ S3 , the correlation ρ(Xr 0 , X r 0 ; Tr ) is upper-bounded by
                                                                    1
                                                            1−          , ρ0 .
                                                                   4 3k
                                  (S )       (S )      (S )
The same bound holds for ρ(Xr 0 , X r 0 ; Tr 0 ).
Proof. We divide Tr into two parts: (i) the set S0 is chosen as Nr ; or (ii) some set other than S0 is chosen.
                                               (S )
It is not hard to verify that the marginal of Xr 0 after conditioning on either of them remains uniform and
thus we can apply Lemma 2.21. Let µ be the conditional distribution assuming (i) happens, and ν be the
one assuming (ii) happens. We have that
                                                            (S )      (S )
                                                     ρ(Xr 0 , X r 0 ; ν) = 1 .
For the correlation of the other part, we have
                                                            (S )      (S )
                                                     ρ(Xr 0 , X r 0 ; µ) = 1 ,
achieved by dictatorship functions. Therefore, the overall correlation is upper-bounded by
                 s                                s                          
                             k          k          2
                                                                    k              k
                    (1 − 1/     ) + 1/     · (1/2) ≤ 1 − 1/2            < 1 − 1/4     .
                             3          3                           3              3
                                              (S )
    Intuitively, the correlation under Tr 0 could not exceed that under Tr since the noise we added are all
independent. In particular, the part corresponding to
                                                               (S )        (S )
                                                           ρ(Xr 0 , X r 0 ; ν)
becomes less than 1 due to lack of perfect correlation, and the part corresponding to
                                                               (S )        (S )
                                                           ρ(Xr 0 , X r 0 ; µ)
remains the same. Thus the result follows by similar calculations as in Tr .

                     T HEORY OF C OMPUTING, Volume 10 (14), 2014, pp. 359–388                                         378
          A PPROXIMATION R ESISTANCE ON S ATISFIABLE I NSTANCES FOR S PARSE P REDICATES

    Take the Efron-Stein decomposition f = ∑T ⊆Lk−1 fT . More specifically, for T ⊆ Lk−1 , we have that

                                                 fT =         ∑          fˆU χU .
                                                         U⊆Lk−m ×Rm−1
                                                          πm→1 (U)=T

Again for notational simplicity, we temporarily drop the subscript and write πm→1 as π. We decompose
the terms in the expectation in (4.3) as following

                            f F0 = F0          ∑ fT = F 0 ∑                    fT + F 0      ∑        fT .              (4.4)
                                             T ⊆Lk−1              T ⊆Lk−1                  T ⊆Lk−1
                                                                  |T |≤J/2                 |T |>J/2

We first bound the expectation of the high degree parts under both T and T (S0 ) .
    This is a standard correlation argument. We first consider the expectation under T. Let UT be the
                                                                                           (S )
conditional expectation operator mapping a function in L2 (X (S0 ) ) to a function in L2 (X 0 ) with respect
to distribution T. We have
                                                                       
                                        0                        0
                                                                               ∑ UT fT  .
                                                                                             
                                       F
                                      E       ∑        fT  = E F                                                     (4.5)
                                      T                         T
                                                                
                                             T ⊆Lk−1                     T ⊆Lk−1
                                             |T |>J/2                    |T |>J/2

                                                                                                                         (S )
Note that the expectation on the right hand side is in fact taken under the marginals of T on X 0 .
Applying Cauchy-Schwarz, we have
                                             v                        
                                               u
                                               u
                                               u                         r
                         0          (S0 )                      (S0 ) 2 
                         F         f         ≤           U
                             ∑k−1 T  t T  ∑k−1 T T  ET [F 02 ]
                                                             ( f      )                       (4.6)
                                               u 
                     E                       uE                        
                      T
                        
                                T ⊆L                             T ⊆L
                                |T |>J/2                         |T |>J/2
                                                         v
                                                                                (S ) 2
                                                         u
                                                       ≤ u              UT fT 0                                         (4.7)
                                                         u
                                                               ∑
                                                            T ⊆Lk−1
                                                         t
                                                            |T |>J/2
                                                         v
                                                                                  (S ) 2
                                                         u
                                                                        |T |                      J/2
                                                       ≤ u                      fT 0         ≤ ρ0       ≤ ε1 ,          (4.8)
                                                         u
                                                               ∑ ρ0
                                                            T ⊆Lk−1
                                                         t
                                                            |T |>J/2

where the inequality in (4.8) follows from Proposition 2.24 and that the norm in (4.8) is with respect to
the marginal of T on X (S0 ) , which is uniform. The analysis for expectation under T (S0 ) is identical as it
only involves correlation. Therefore
                                                                              
                   0     0     0                                           0                         
                 E f F − E f F ≤ E
                                  F                          ∑            − E F
                                                                       fT                      ∑           + 2ε1 .
                                                                                                         fT            (4.9)
                 T          T (S0 )                T                             T (S0 )
                                                            T ⊆Lk−1                           T ⊆Lk−1
                                                            |T |≤J/2                          |T |≤J/2


                     T HEORY OF C OMPUTING, Volume 10 (14), 2014, pp. 359–388                                            379
                                                     S ANGXIA H UANG

Now we turn to the low degree parts. Further unraveling the Efron-Stein decomposition, we have


                               F0     ∑        fT                                                          (4.10)
                                    T ⊆Lk−1
                                    |T |≤J/2

                            = F0          ∑             fˆU χU                                             (4.11)
                                    U⊆Lk−m ×Rm−1
                                     |π(U)|≤J/2

                            = F0          ∑             fˆU χU + F 0         ∑        fˆU χU .             (4.12)
                                    U⊆Lk−m ×Rm−1                       U⊆Lk−m ×Rm−1
                                     |U|=|π(U)|                         |U|>|π(U)|
                                     |π(U)|≤J/2                         |π(U)|≤J/2



Following the terminology in [15, 32], we refer to the first term as shattered term, and the second as
non-shattered term. We study these two terms separately. From (4.9), we have

                                                                                                    
                                                                                  
                                                                                  
           0           0                  0            ˆ               0        ˆ
       [ f F ] −       [ f F ] ≤ 2ε  +    F             f     −       F          f                         (4.13)
                                                                                   
     E
     T
                  E                1   E
                                       T
                                               ∑          χ
                                                         U U   E        ∑        χ
                                                                                  U U
                 T 0
                  (S )
                                          U⊆Lk−m ×Rm−1
                                         
                                                             T 0  U⊆Lk−m ×Rm−1
                                                               (S )                
                                                                                     
                                                    |U|=|π(U)|                            |U|=|π(U)|
                                                    |π(U)|≤J/2                            |π(U)|≤J/2
                                                                                                    
                                                                             
                                                                             
                                     0            ˆ               0        ˆ
                                + E F             f     −       F          fU U .                        (4.14)
                                                                              
                                  T
                                          ∑          χ
                                                    U U   E        ∑        χ
                                                        T 0  U⊆Lk−m ×Rm−1
                                                           (S )
                                     U⊆Lk−m ×Rm−1
                                                                              
                                                                                
                                                    |U|>|π(U)|                            |U|>|π(U)|
                                                    |π(U)|≤J/2                            |π(U)|≤J/2



We first use smoothness to bound the non-shattered terms. The process is very similar to that in [32], and
we get

                                                                        
                                                           
                                                           
                                         0            ˆ
                                      E F             fU U  ≤ 2ε1 .                                      (4.15)
                                                            
                                      T
                                              ∑          χ
                                         U⊆Lk−m ×Rm−1
                                                            
                                                            
                                                    |U|>|π(U)|
                                                    |π(U)|≤J/2




The same argument holds under distribution T (S0 ) . For the difference involving the shattered terms, we

                    T HEORY OF C OMPUTING, Volume 10 (14), 2014, pp. 359–388                                 380
          A PPROXIMATION R ESISTANCE ON S ATISFIABLE I NSTANCES FOR S PARSE P REDICATES

have
                                                                                      
                                                                
                                                                
                         0            ˆ                0     ˆ
                      E F             f      −         F      f                                    (4.16)
                                                                  
                      T
                              ∑          χ
                                        U U    E         ∑     χ
                                                                U U
                                             T  U⊆Lk−m ×Rm−1
                                                (S 0 )
                         U⊆Lk−m ×Rm−1
                                                                 
                                                                   
                               |U|=|π(U)|                                   |U|=|π(U)|
                               |π(U)|≤J/2                                   |π(U)|≤J/2
                                                                                          
                                                                           
                                                                           
                  = E F 0           ˆ           0                   |U| ˆ
                                     f      −    F             (1 −       f                         (4.17)
                                                                            
                    T
                            ∑          χ
                                      U U    E
                                              T
                                                     ∑             γ)      χ
                                                                           U U
                       U⊆Lk−m ×Rm−1             U⊆Lk−m ×Rm−1
                                                                            
                                                                             
                               |U|=|π(U)|                                 |U|=|π(U)|
                               |π(U)|≤J/2                                 |π(U)|≤J/2
                                                                             
                      0
                                                 (1 − (1 − γ)|U| ) fˆU χU 
                                                                          
                  = E
                     F            ∑                                                                (4.18)
                      T
                                                                          
                               U⊆Lk−m ×Rm−1
                              |U|=|π(U)|≤J/2

                  ≤ 1 − (1 − γ)J/2 ≤ ε1 .                                                           (4.19)

The key step is (4.17) where we switch the distribution of the second term from T (S0 ) to T. We rely
crucially on the fact that |U| = |π(U)|. To see why this holds, denote the query to f as x (just for the
current argument). Observe that the variables xt are independent for t ∈ U with different π(t), so we first
focus on those values t that map to the same r ∈ U. Looking at each r ∈ π(U), |π(U)| = |U| implies
that there is a unique t ∈ U such that π(t) = r, and thus perturbing those xt satisfying π(t) = r with
probability γ would give exactly a multiplicative factor of (1 − γ) to the expectation. Since each r ∈ π(U)
contributes a factor of (1 − γ), the final factor thus becomes (1 − γ)|π(U)| = (1 − γ)|U| .
    Summing up the above, we have

                                                  0           0
                                            E[ f F ] − E [ f F ] ≤ 7ε1 .                            (4.20)
                                            T               T (S0 )

This completes the proof.

4.3    Influence based decoding
Suppose we have that for some S ⊆ S1 ∪ S3 , the following term is large
                                     "               #
                                             E0
                                             T
                                                    ∏ f (S) (x(S) )        > ε1 ,                   (4.21)
                                                   S∈S

then for at least an ε1 /2 fraction of all possible edge samplings, we have
                                            "              #
                                         E0
                                         T
                                                  ∏ f (S) (x(S) )         > ε1 /2 .                 (4.22)
                                                  S∈S


                    T HEORY OF C OMPUTING, Volume 10 (14), 2014, pp. 359–388                           381
                                                       S ANGXIA H UANG

In the rest of the proof, we focus on samplings of edges where (4.22) holds. We show how to extract a
labeling for these edges.
    Observe that after we fixed the edges, which function we query only depends on the layer of the
query, so for the rest of this section, let fl be the function on layer l. Also recall that m(S) = max S is the
layer query x(S) is in, and thus in the PCP query x(S) goes to function fm(S) . Let lm = maxS∈S m(S) be the
maximum layer among queries that appears in S .
    For l ∈ [k], denote the queries that appear on layer l as Ll := {S ∈ S1 ∪ S3 | max S = l}, and let
L≤l := ∪l 0 ≤l Ll 0 , and similarly define L<l . We need the following observation on independence between
queries.
Claim 4.10. For any l ∈ [k] and S0 ∈ Ll , x(S0 ) and {x(S) }S∈L<l are independent under both T and T 0 .
Proof. We first consider T. We can write x(S0 ) = xe · x({l}) , where x({l}) is a uniform random string,
“·” denotes the elementwise product, and xe depends on: (1) S0 , (2) {x(S) }S∈L<l , (3) the choice of the
locations Nr for r ∈ Lk−1 , and (4) the decision whether the bits in query x(Nr ) are resampled. Observe that
x({l}) is independent of {x(S) }S∈L<l , the Nr , and whether the bits are resampled, thus its marginal is still
uniform no matter how we fix everything else, and so is the marginal of x(S0 ) . This implies that x(S0 ) is
independent of everything else and in particular {x(S) }S∈L<l .
     For T 0 , note that the additional noise is applied independently to each bit, and we can use a similar
argument as above to show that the marginal of x(S0 ) is always uniform, no matter how we fix the other
parameters.
    We rewrite the left hand side of (4.22) as
                               "               #                            "                              #
                                                       (S)                                           (S)
                                   E0
                                   T
                                        ∏ fm(S) (x ) = ET ∏ ∏           0
                                                                                                fl (x ) .
                                        S∈S                                 l∈[k] S∈Ll ∩S

By our choice of permutation and Lemma 4.6, there exists l0 and j0 such that
                                              |{S ∈ Ll0 ∩ S | S 3 j0 }| is odd .
                                                              0
Then flipping x({ j0 }) while leaving all other x({ j }) unchanged changes the sign of the following
                                                             ∏          fl0 (x(S) ) ,
                                                       S∈Ll0 ∩S

and since the marginal of x({ j0 }) is uniform and all functions are folded, we have
                                                             

                                                E0      ∏             fl0 (x(S) ) = 0 .
                                                T   S∈Ll0 ∩S

    To complete the proof of soundness, we show that if
                         "               #      "                                                     #
                          E0
                          T
                                   ∏ fm(S) (x(S) )           = E
                                                                  T0
                                                                            ∏ ∏              fl (x(S) )
                               S∈S                                       l∈[k] S∈Ll ∩S
                               "                             #                       "                     #
                                                       (S)                                          (S)
                      = E          ∏ ∏           fl (x ) − ∏ E                           ∏     fl (x )         > ε1 /2 ,   (4.23)
                          T0   l∈[k] S∈Ll ∩S                           l∈[k] T
                                                                                 0
                                                                                     S∈Ll ∩S



                     T HEORY OF C OMPUTING, Volume 10 (14), 2014, pp. 359–388                                                382
            A PPROXIMATION R ESISTANCE ON S ATISFIABLE I NSTANCES FOR S PARSE P REDICATES

then there exists two layers 1 ≤ l < lm ≤ k such that

                                                                         (γ)           (γ)              ε12
                                                     ∑                Infrl ( fl ) Infrm ( flm ) >          ,                                    (4.24)
                                                rl ∈Lk−l ×Rl−1
                                                                                                        4Z
                                              rm ∈Lk−lm ×Rlm −1
                                            πlm →1 (rm )=πl→1 (rl )

                                  3
where Z = Z(k, γ) := 24k k9 γ −1 . This enables us to define a good labeling as the following: choose rl
                     (γ)                                                                  (γ)
with probability Infrl ( fl )/ Inf(γ) ( fl ), and similarly choose rm with probability Infrm ( flm )/ Inf(γ) ( flm ),
then the probability that the labeling weakly satisfies the edge is
                                      (γ)           (γ)
                               Infrl ( fl ) Infrm ( flm )                                               (γ)           (γ)            γ 2 ε12
                 ∑                                                > γ2              ∑                Infrl ( fl ) Infrm ( flm ) ≥            .
            rl ∈Lk−l ×Rl−1      Inf(γ) ( fl ) Inf(γ) ( flm )                   rl ∈Lk−l ×Rl−1
                                                                                                                                      4Z
          rm ∈Lk−lm ×Rlm −1                                                  rm ∈Lk−lm ×Rlm −1
        πlm →1 (rm )=πl→1 (rl )                                            πlm →1 (rm )=πl→1 (rl )

This holds for at least ε1 /2 fraction of choices of edges, thus the expected value achieved by the above
random labeling procedure is at least γ 2 ε13 /8Z, a value depending only on k and ε.
    The key step to proving (4.23) is to bound the following difference
                    "                 #      "                   # "                  #
                      E0
                      T
                            ∏ fm(S) (x(S) ) − ET ∏ ∏       0
                                                                                 fl (x(S) ) E
                                                                                              T0
                                                                                                        ∏       flm (x(S) )   ,                  (4.25)
                            S∈S                                 l<lm S∈Ll ∩S                         S∈Llm ∩S

where we recall that lm is the highest layer of queries involved in S . We can iteratively apply the bound
on (4.25) to get (4.23). In order to establish (4.25), we use an invariance-type result from [32].
Theorem 4.11 (Wenner). Consider functions
                                                                                                                            !⊗n
                                                                                                                 d
                                                                                                       P=
                    (t)
                    f ∈ L∞ (Ωtn ) t∈[d]                        on a probability space                           ∏ Ωt , P
                                                                                                                t=1

and a set M ( [d]. Furthermore, let C be the collection of minimal sets C ⊆ [d], C 6⊆ M, such that the
spaces {Ωt }t∈C are dependent. Then
                              "         #           s
      h         i
            (t)           (t)       (t)
    E ∏f          − ∏ E[ f ] E ∏ f        ≤ 2 max min Inf( f (r) ) ∑ ∏ Infi ( f (t) ) ∏ f (t) .
                                             2d
                                              t∈M                        C∈C       r∈C                   i t∈C\{r}                               ∞
                       t ∈M
                         /                                                                                                        t ∈C
                                                                                                                                    /


    To apply the above theorem, we first combine all functions that are not in the highest layer. Let

                                                               Q=         ∏        X (S) ,
                                                                      S∈S ∩L<lm


and q ∈ Q simply be concatenation of {x(S) }S∈S ∩L<lm . Define the combined function

                                                               F=         ∏        fm(S) ,
                                                                      S∈S ∩L<lm


                           T HEORY OF C OMPUTING, Volume 10 (14), 2014, pp. 359–388                                                                  383
                                                            S ANGXIA H UANG

and the noisy version
                                                   F0 =            ∏          Tγ fm(S) .
                                                                S∈S ∩L<lm

We still have by Claim 4.10 that Q and X (S0 ) are independent for all S0 ∈ Llm . Then the first term in (4.25)
becomes                    "                  #      "                           #
                              E0
                              T
                                   ∏ fm(S) (x(S) ) = ET F 0 (q) ∏                          Tγ flm (x(S) ) .
                                   S∈S                                          S∈S ∩Llm

Let us set M = S ∩ Llm . Consider the sets C in Theorem 4.11. Since Theorem 4.11 requires that C 6⊆ M,
we have that C must include variable q. Due to the independence in Claim 4.10, C must also include at
least two variables from S ∩ Llm . Applying Theorem 4.11, we have
                          "                                        #                       "                          #
                              0
                                                Tγ flm (x(S) ) − E F 0 (q) E                            Tγ flm (x(S) )
                                                                         
                     E F (q)          ∏                                                        ∏
                      T            S∈S ∩Llm                              T             T   S∈S ∩Llm
                            r
                      2k3
                 ≤2          Inf(Tγ flm ) ∑ Infr (F 0 ) Infr (Tγ flm ) ,
                                             r∈Lk−1


where F 0 and f lm are lifted versions of F 0 and flm as defined in Definition 2.17.
   Using Proposition 2.18, we have

                                    Inf(Tγ flm ) ≤ Inf(Tγ flm ) = Inf(γ) ( flm ) ≤ γ −1 ,

and similarly
                                                                                                           (γ)
                   Infr (Tγ flm ) ≤            ∑                Infrm (Tγ flm ) =              ∑        Infrm ( flm ) .
                                         rm ∈Lk−lm ×Rlm −1                          rm ∈Lk−lm ×Rlm −1
                                           πlm →1 (rm )=r                             πlm →1 (rm )=r

                                                      (γ)
Now we need to relate Infr (F 0 ) with Infr ( fm(S) ). We use the following generalization of Lemma 6.5
from [23].

Lemma 4.12. Let (∏m    n                                                  n
                  i=1 Ωi , µ) be correlated probability space, and f i : Ωi → [−1, 1] for i = 1, . . . , m.
Then for all r:                              !
                                                            m                 m
                                               Infr     ∏ fi           ≤ m ∑ Infr ( fi ) .
                                                            i=1               i=1

    The argument goes exactly the same so we omit the proof here.
    Applying Lemma 4.12, we can upper-bound Infr (F 0 ) by the following

                                                                                                                              (γ)
       Infr (F 0 ) ≤ k3       ∑       Infr (Tγ fm(S) ) ≤ k6 ∑ Infr (Tγ fl ) ≤ k6 ∑                            ∑           Infrl ( fl ) ,
                          S∈S ∩L<lm                                    l<lm                        l<lm rl ∈Lk−l ×Rl−1
                                                                                                          πl→1 (rl )=r

where we used Proposition 2.18 to obtain the last inequality.

                    T HEORY OF C OMPUTING, Volume 10 (14), 2014, pp. 359–388                                                               384
          A PPROXIMATION R ESISTANCE ON S ATISFIABLE I NSTANCES FOR S PARSE P REDICATES

    Summing up, we have
                  "                                                  #                         "                          #
                                                              (S)                                                   (S)
                                                                          
                         E Tγ F(q)        ∏         Tγ flm (x ) − E Tγ F(q) E                         ∏      Tγ flm (x )      (4.26)
                          T            S∈S ∩Llm                            T               T     S∈S ∩Llm
                              v
                    3                                                (γ)          (γ)
               ≤ 22k uk6 γ −1
                              u
                              u                  ∑                Infrl ( fl ) Infrm ( flm ) .                                (4.27)
                              u                 1≤l<lm
                              u             rl ∈Lk−l ×Rl−1
                                          rm ∈Lk−lm ×Rlm −1
                              t
                                        πlm →1 (rm )=πl→1 (rl )

                    3   p
    Let Z 0 = 22k         k6 γ −1 , applying (4.27) to all layers, we get
                          "                  #                v
                                    (S) (S)         0
                                                              u                        (γ)          (γ)
                        E0 ∏      f    (x   )  <  Z     ∑     u       ∑             Infrl ( fl ) Infrm ( flm ) .
                        T S∈S
                                                              u
                                                      2≤lm <k u     1≤l<l
                                                              u rl ∈Lk−l ×Rm l−1
                                                                    k−lm      lm −1
                                                              t
                                                                           rm ∈L      ×R
                                                                             πlm →l (rm )=rl


Thus if the left hand side of the above is larger than ε1 /2, then there exists 1 ≤ l < lm ≤ k such that
                                                                                                 2
                                                                                                             ε2
                                                                                    
                                                       (γ)          (γ)                 ε1 /2             1
                                      ∑             Infrl ( fl ) Infrm ( flm ) >                      ·     = 1 .
                                  rl ∈Lk−l ×Rl−1
                                                                                         kZ 0             k 4Z
                                rm ∈Lk−lm ×Rlm −1
                                  πlm →l (rm )=rl

This completes the proof.


5    Acknowledgments
The author would like to thank Johan Håstad for inspiring discussions and invaluable advice; and Cenny
Wenner for explaining many of his ideas, as well as numerous comments on improving this manuscript.
The author would also like to thank the referees for a careful reading of the paper and many suggestions
for improvements.


References
 [1] S ANJEEV A RORA , C ARSTEN L UND , R AJEEV M OTWANI , M ADHU S UDAN , AND M ARIO
     S ZEGEDY: Proof verification and the hardness of approximation problems. J. ACM, 45(3):501–555,
     1998. Preliminary version in FOCS’92. See also at ECCC. [doi:10.1145/278298.278306] 363

 [2] S ANJEEV A RORA AND S HMUEL S AFRA: Probabilistic checking of proofs: A new characterization
     of NP. J. ACM, 45(1):70–122, 1998. Preliminary version in FOCS’92. [doi:10.1145/273865.273901]
     363

                           T HEORY OF C OMPUTING, Volume 10 (14), 2014, pp. 359–388                                             385
                                         S ANGXIA H UANG

 [3] P ER AUSTRIN AND J OHAN H ÅSTAD: On the usefulness of predicates. ACM Trans. Computation
     Theory, 5(1):1, 2013. Preliminary version in CCC’12. [doi:10.1145/2462896.2462897] 366, 374

 [4] P ER AUSTRIN AND E LCHANAN M OSSEL: Approximation resistant predicates from pairwise
     independence. Comput. Complexity, 18(2):249–271, 2009. Preliminary version in CCC’08. See
     also at ECCC. [doi:10.1007/s00037-009-0272-6] 360, 367

 [5] S IU O N C HAN: Approximation resistance from pairwise independent subgroups. In Proc. 45th
     STOC, pp. 447–456. ACM Press, 2013. See also at ECCC. [doi:10.1145/2488608.2488665] 360,
     361, 362, 365, 366, 367, 371, 372, 375

 [6] M OSES C HARIKAR , KONSTANTIN M AKARYCHEV, AND Y URY M AKARYCHEV: Near-optimal
     algorithms for maximum constraint satisfaction problems. ACM Trans. Algorithms, 5(3):32:1–32:14,
     2009. Preliminary version in SODA’07. [doi:10.1145/1541885.1541893] 360

 [7] I RIT D INUR , V ENKATESAN G URUSWAMI , S UBHASH K HOT, AND O DED R EGEV: A new mul-
     tilayered PCP and the hardness of hypergraph vertex cover. SIAM J. Comput., 34(5):1129–1146,
     2005. Preliminary version in STOC’03. [doi:10.1137/S0097539704443057] 363

 [8] B RADLEY E FRON AND C HARLES S TEIN: The jackknife estimate of variance. Ann. Stat., 9(3):586–
     596, 1981. [doi:10.1214/aos/1176345462] 367

 [9] L ARS E NGEBRETSEN AND J ONAS H OLMERIN: More efficient queries in PCPs for NP and
     improved approximation hardness of maximum CSP. Random Structures & Algorithms, 33(4):497–
     514, 2008. Preliminary version in STACS’05. [doi:10.1002/rsa.20226] 360, 361, 362, 363, 365,
     372, 374

[10] V ITALY F ELDMAN , V ENKATESAN G URUSWAMI , P RASAD R AGHAVENDRA , AND Y I W U: Ag-
     nostic learning of monomials by halfspaces is hard. SIAM J. Comput., 41(6):1558–1590, 2012.
     Preliminary version in FOCS’09. [doi:10.1137/120865094] 361

[11] V ENKATESAN G URUSWAMI , P RASAD R AGHAVENDRA , R ISHI S AKET, AND Y I W U: Bypassing
     UGC from some optimal geometric inapproximability results. In Proc. 23rd Ann. ACM-SIAM
     Symp. on Discrete Algorithms (SODA’12), pp. 699–717. ACM Press, 2012. See also at ECCC.
     [ACM:2095174] 361

[12] G USTAV H AST: Beating a random assignment. In Proc. 9th Internat. Workshop on Random-
     ization and Computation (RANDOM’05), volume 3624 of LNCS, pp. 134–145. Springer, 2005.
     [doi:10.1007/11538462_12] 360

[13] G USTAV H AST: Beating a Random Assignment: Approximating Constraint Satisfaction Problems.
     Ph. D. thesis, KTH Royal Institute of Technology, 2005. DiVA. 360

[14] J OHAN H ÅSTAD: Some optimal inapproximability results. J. ACM, 48(4):798–859, 2001. Prelimi-
     nary version in STOC’97. [doi:10.1145/502090.502098] 360, 361

                  T HEORY OF C OMPUTING, Volume 10 (14), 2014, pp. 359–388                       386
          A PPROXIMATION R ESISTANCE ON S ATISFIABLE I NSTANCES FOR S PARSE P REDICATES

[15] J OHAN H ÅSTAD: On the NP-hardness of Max-Not-2. SIAM J. Comput., 43(1):179–193, 2014.
     Preliminary version in APPROX’12. [doi:10.1137/120882718] 361, 380
[16] J OHAN H ÅSTAD AND S UBHASH K HOT: Query efficient PCPs with perfect completeness. Theory of
     Computing, 1(7):119–148, 2005. Preliminary version in FOCS’01. [doi:10.4086/toc.2005.v001a007]
     361, 362
[17] S ANGXIA H UANG: Approximation resistance on satisfiable instances for predicates strictly domi-
     nating parity. Electron. Colloq. on Comput. Complexity (ECCC), 19:40, 2012. ECCC. 361
[18] S ANGXIA H UANG: Approximation resistance on satisfiable instances for predicates with few accept-
     ing inputs. In Proc. 45th STOC, pp. 457–466. ACM Press, 2013. [doi:10.1145/2488608.2488666]
     359
[19] S UBHASH K HOT: Hardness results for coloring 3-colorable 3-uniform hypergraphs. In Proc. 43rd
     FOCS, pp. 23–32. IEEE Comp. Soc. Press, 2002. [doi:10.1109/SFCS.2002.1181879] 361, 363
[20] S UBHASH K HOT: On the power of unique 2-prover 1-round games. In Proc. 34th STOC, pp.
     767–775. ACM Press, 2002. [doi:10.1145/509907.510017] 360, 361
[21] S UBHASH K HOT AND DANA M OSHKOVITZ: NP-hardness of approximately solving linear equa-
     tions over reals. SIAM J. Comput., 42(3):752–791, 2013. Preliminary version in STOC’11. See also
     at ECCC. [doi:10.1137/110846415] 361
[22] E LCHANAN M OSSEL: Gaussian bounds for noise correlation of functions and tight anal-
     ysis of long codes. In Proc. 49th FOCS, pp. 156–165. IEEE Comp. Soc. Press, 2008.
     [doi:10.1109/FOCS.2008.44] 370
[23] E LCHANAN M OSSEL: Gaussian bounds for noise correlation of functions. Geom. Funct. Anal.,
     19(6):1713–1756, 2010. Preliminary version in FOCS’08. [doi:10.1007/s00039-010-0047-x] 362,
     367, 369, 384
[24] E LCHANAN M OSSEL , RYAN O’D ONNELL , AND K RZYSZTOF O LESZKIEWICZ: Noise stability
     of functions with low influences: invariance and optimality. Ann. of Math., 171(1):295–341, 2010.
     Preliminary version in FOCS’05. [doi:10.4007/annals.2010.171.295] 362
[25] RYAN O’D ONNELL AND J OHN W RIGHT: A new point of NP-hardness for unique games. In Proc.
     44th STOC, pp. 289–306. ACM Press, 2012. [doi:10.1145/2213977.2214005] 362
[26] RYAN O’D ONNELL AND Y I W U: Conditional hardness for satisfiable 3-CSPs. In Proc. 41st STOC,
     pp. 493–502. ACM Press, 2009. [doi:10.1145/1536414.1536482] 361, 369
[27] R AN R AZ: A parallel repetition theorem. SIAM J. Comput., 27(3):763–803, 1998. Preliminary
     version in STOC’95. [doi:10.1137/S0097539795280895] 363
[28] A LEX S AMORODNITSKY AND L UCA T REVISAN: A PCP characterization of NP with op-
     timal amortized query complexity. In Proc. 32nd STOC, pp. 191–199. ACM Press, 2000.
     [doi:10.1145/335305.335329] 360, 361, 372

                   T HEORY OF C OMPUTING, Volume 10 (14), 2014, pp. 359–388                        387
                                           S ANGXIA H UANG

[29] A LEX S AMORODNITSKY AND L UCA T REVISAN: Gowers uniformity, influence of variables, and
     PCPs. SIAM J. Comput., 39(1):323–360, 2009. Preliminary version in STOC’06. See also at ECCC.
     [doi:10.1137/070681612] 360

[30] L INQING TANG: Conditional hardness of approximating satisfiable Max 3CSP-q. In Internat. Symp.
     Algorithms and Computation (ISAAC’09), pp. 923–932. Springer, 2009. [doi:10.1007/978-3-642-
     10631-6_93] 361

[31] L UCA T REVISAN: Approximating satisfiable satisfiability problems. Algorithmica, 28(1):145–172,
     2000. Preliminary version in ESA’97. [doi:10.1007/s004530010035] 361

[32] C ENNY W ENNER: Circumventing d-to-1 for approximation resistance of satisfiable predicates
     strictly containing parity of width at least four. Theory of Computing, 9(23):703–757, 2013.
     Conference version in APPROX’12. [doi:10.4086/toc.2013.v009a023] 361, 362, 369, 370, 375,
     378, 380, 383

[33] U RI Z WICK: Approximation algorithms for constraint satisfaction problems involving at most three
     variables per constraint. In Proc. 9th Ann. ACM-SIAM Symp. on Discrete Algorithms (SODA’98),
     pp. 201–210. ACM Press, 1998. [ACM:314701] 360, 361


AUTHOR

      Sangxia Huang
      Ph. D. Student
      KTH Royal Institute of Technology
      Stockholm, Sweden
      huang sangxia gmail com
      http://www.csc.kth.se/~sangxia/


ABOUT THE AUTHOR

      S ANGXIA H UANG is a Ph. D. student at KTH Royal Institute of Technology, Stockholm,
         Sweden, advised by Johan Håstad. His research interest is computational complexity, and
         in particular, approximation algorithms and hardness of approximation for Constraint
         Satisfaction Problems (CSPs) and other combinatorial optimization problems. He grew
         up in Shanghai, China, and spent his undergraduate days at Shanghai Jiao Tong University.
         During his spare time, he enjoys reading, playing billiards, and traveling. He is also an
         avid Arsenal fan.




                   T HEORY OF C OMPUTING, Volume 10 (14), 2014, pp. 359–388                          388