Authors: Johan Håstad, Sangxia Huang, Rajsekar Manokaran, Ryan O'Donnell, John Wright
License: CC-BY-3.0
THEORY OF COMPUTING, Volume 13 (19), 2017, pp. 1–51
www.theoryofcomputing.org
SPECIAL ISSUE: APPROX-RANDOM 2015
Improved NP-Inapproximability for
2-Variable Linear Equations
Johan Håstad ∗ Sangxia Huang † Rajsekar Manokaran ‡
Ryan O’Donnell § John Wright ¶
Received December 25, 2015; Revised October 25, 2016; Published December 24, 2017
Abstract: An instance of the 2-Lin(2) problem is a system of equations of the form
“xi + xj = b (mod 2).” Given such a system in which it is possible to satisfy all but an ε
fraction of the equations, we show it is NP-hard to satisfy all but a Cε fraction of equations,
for any C < 11/8 = 1.375 (and any 0 < ε ≤ 1/8). The previous best result, standing for
over 15 years, had 5/4 in place of 11/8. Our result provides the best known NP-hardness
even for the Unique-Games problem, and it also holds for the special case of Max-Cut. The
A preliminary version of this paper appeared in the Proceedings of the 18th International Workshop on Approximation
Algorithms for Combinatorial Optimization Problems (APPROX 2015) [16].
∗ Supported by ERC Advanced Investigator Grant 226203 and the Swedish Research Council.
† Work done when the author was a doctoral student at KTH, Sweden. Supported by ERC Advanced Investigator Grant
226203, the Swedish Research Council, and ERC Starting Grant 335288.
‡ Work done when the author was a post-doctoral researcher at KTH, Sweden. Work supported by ERC Advanced Investigator
Grant 226203.
§ Department of Computer Science, Carnegie Mellon University. Supported by NSF grants CCF-0747250 and CCF-1116594
and a grant from the MSR–CMU Center for Computational Thinking.
¶ Work done when the author was a graduate student at CMU. Work supported by NSF grants CCF-0747250 and CCF-
1116594 and a grant from the MSR–CMU Center for Computational Thinking.
ACM Classification: F.1.3, F.2.2, G.1.6
AMS Classification: 68Q17, 68W25
Key words and phrases: approximability, unique games, gadget, linear programming
© 2017 Johan Håstad, Sangxia Huang, Rajsekar Manokaran, Ryan O’Donnell, and John Wright
Licensed under a Creative Commons Attribution License (CC-BY) DOI: 10.4086/toc.2017.v013a019
precise factor 11/8 is unlikely to be best possible; we also give a conjecture concerning
analysis of Boolean functions which, if true, would yield a larger hardness factor of 3/2.
Our proof is by a modified gadget reduction from a pairwise-independent predicate. We
also show an inherent limitation to this type of gadget reduction. In particular, any such
reduction can never establish a hardness factor C greater than 2.54. Previously, no such
limitation on gadget reductions was known.
1 Introduction
The well-known constraint satisfaction problem (CSP) 2-Lin(q) is defined as follows: Given n variables
x1, . . . , xn, as well as a system of equations (constraints) of the form “xi − xj = b (mod q)” for
constants b ∈ Zq, the task is to assign values from Zq to the variables so that there are as few unsatisfied
constraints as possible. It is known [18, 20] that, from an approximability standpoint, this problem
is equivalent to the notorious Unique-Games problem [17]. The special case of q = 2 is particularly
interesting and can be equivalently stated as follows: Given a “supply graph” G and a “demand graph” H
over the same set V of vertices, partition V into two parts so as to minimize the total number of cut supply
edges and uncut demand edges. The further special case when the supply graph G is empty (i. e., every
equation is of the form xi − xj = 1 (mod 2)) is equivalent to the Max-Cut problem.
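To make the supply/demand-graph view concrete, here is a minimal sketch (ours, not the paper's; the edge-list representation and the name `violated` are assumptions) that counts the constraints a given two-part partition violates:

```python
# Sketch (not from the paper): a 2-Lin(2) instance in supply/demand-graph form.
# A supply edge encodes "x_u - x_v = 0 (mod 2)" and is violated when cut;
# a demand edge encodes "x_u - x_v = 1 (mod 2)" and is violated when uncut.

def violated(supply, demand, side):
    """side maps each vertex to 0 or 1, i.e., to one of the two parts."""
    cut_supply = sum(1 for u, v in supply if side[u] != side[v])
    uncut_demand = sum(1 for u, v in demand if side[u] == side[v])
    return cut_supply + uncut_demand

# With no supply edges this is Max-Cut in its deletion form: on a 4-cycle of
# demand edges, the alternating partition cuts every edge and violates nothing.
demand = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(violated([], demand, {0: 0, 1: 1, 2: 0, 3: 1}))  # -> 0
```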
Let us say that an algorithm guarantees an (ε, ε′)-approximation if, given any instance in which the
best solution violates at most an ε-fraction of the constraints, the algorithm finds a solution violating at
most an ε′-fraction of the constraints. If an algorithm guarantees (ε, Cε)-approximation for every ε then
we also say that it is a factor-C approximation. To illustrate the notation we recall two simple facts. On the
one hand, for each fixed q, there is a trivial greedy algorithm which (0, 0)-approximates 2-Lin(q). On the
other hand, (ε, ε)-approximation is NP-hard for every 0 < ε < 1/q; in particular, factor-1 approximation
is NP-hard.
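The trivial greedy (0, 0)-approximation for q = 2 can be sketched as follows (our illustration; the function name and input format are assumptions): when the system is fully satisfiable, fixing one value per connected component and propagating satisfies every equation, and a union-find structure with parity offsets does the propagation.

```python
# Sketch of the trivial (0,0)-approximation for 2-Lin(2): on a fully
# satisfiable system it satisfies every equation; otherwise it reports
# the inconsistency it found.  Union-find with a parity offset per node.

def solve_satisfiable(n, eqs):
    """eqs: list of (i, j, b) meaning x_i + x_j = b (mod 2).
    Returns a satisfying 0/1 assignment, or None if the system is inconsistent."""
    parent = list(range(n))
    parity = [0] * n  # parity[v] = x_v XOR x_{parent[v]}

    def find(v):
        if parent[v] == v:
            return v, 0
        root, p = find(parent[v])
        parent[v], parity[v] = root, parity[v] ^ p
        return root, parity[v]

    for i, j, b in eqs:
        ri, pi = find(i)
        rj, pj = find(j)
        if ri == rj:
            if pi ^ pj != b:
                return None  # not fully satisfiable
        else:
            parent[ri] = rj
            parity[ri] = pi ^ pj ^ b  # keep x_i + x_j = b consistent
    return [find(v)[1] for v in range(n)]  # roots get value 0, rest follow
```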
We remark here that we are prioritizing the so-called “Min-Deletion” version of the 2-Lin(2) problem.
We feel it is the more natural parameterization. For example, in the more traditional “Max-2-Lin(2)”
formulation, the discrepancy between known algorithms and NP-hardness involves two quirky factors,
0.878 and 0.912. However, this disguises what we feel is the really interesting question—the same
key open question that arises for the highly analogous Sparsest-Cut problem: Is there an efficient
(ε, O(ε))-approximation, or even one that improves on the known (ε, O(√log n)·ε)- and (ε, O(√ε))-approximations?
The relative importance of the “Min-Deletion” version is even more pronounced for the 2-Lin(q)
problem. As we describe below, this version of the problem is essentially equivalent to the notorious
Unique-Games problem. By way of contrast, the traditional maximization approximation factor measure
for Unique-Games is not particularly interesting—it is known [11] that there is no constant-factor
approximation for “Max-Unique-Games,” but this appears to have no relevance for the Unique Games
Conjecture.
1.1 History of the problem
No efficient (ε, O(ε))-approximation algorithm for 2-Lin(2) is known. The best known efficient ap-
proximation guarantee with no dependence on n dates back to the seminal work of Goemans and
Williamson [13].
Theorem 1.1 (Goemans, Williamson). There is a polynomial-time (ε, (2/π)√ε + o(ε))-approximation
algorithm for 2-Lin(2).
On the other hand, the best known approximation factor depending on n was given by Agarwal,
Charikar, Makarychev, and Makarychev [1], building on the work of Arora, Rao, and Vazirani [4].
Theorem 1.2 (Agarwal et al.). There is a polynomial-time factor-O(√log n) approximation for 2-Lin(2).
Generalizing Theorem 1.1 to 2-Lin(q), Charikar, Makarychev, and Makarychev [8] obtained the
following result.
Theorem 1.3 (Charikar, Makarychev, Makarychev). There is a polynomial-time (ε, Cq√ε)-approximation
for 2-Lin(q) (and indeed for Unique-Games), for a certain Cq = Θ(√log q).
The question of whether or not this theorem can be improved is known to be essentially equivalent to
the influential Unique Games Conjecture of Khot [17].
Theorem 1.4. The Unique Games Conjecture implies ([18, 20]) that improving on Theorems 1.1 and 1.3 is
NP-hard. On the other hand ([22]), if there exists q = q(ε) such that (ε, ω(√ε))-approximating 2-Lin(q)
is NP-hard then the Unique Games Conjecture holds.
The recent work of Arora, Barak, and Steurer [3] has also emphasized the importance of subexponen-
tial-time algorithms in this context.
Theorem 1.5 (Arora, Barak, Steurer). For any β ≥ log log(n)/log(n) there exists a 2^{O(qn^β)}-time algorithm
for (ε, O(β^{−3/2})·√ε)-approximating 2-Lin(q). For example, there is a constant K < ∞ and an O(2^{n^{0.001}})-time
algorithm for (ε, K√ε)-approximating 2-Lin(q) for any q = n^{o(1)}.
Finally, we remark that there is an exact algorithm for 2-Lin(2) running in time roughly 1.73^n due to
Williams [25].
The known NP-hardness results for 2-Lin(q) are rather far from the bounds achieved by the known
polynomial-time algorithms. It follows easily from the PCP Theorem that for any q, there exists C > 1
such that factor-C approximation of 2-Lin(q) is NP-hard. However, getting an explicit value for C
has been a difficult task. In 1995, Bellare, Goldreich, and Sudan [6] introduced the Long Code testing
technique, which let them prove NP-hardness of approximating 2-Lin(2) to a factor of roughly 1.02.
Around 1997, Håstad [15] gave an optimal inapproximability result for the 3-Lin(2) problem; combining
this with the “automated gadget” results of Trevisan et al. [24] allowed him to establish NP-hardness
of factor-C approximation for any C < 5/4. By including the “outer PCP” results of Moshkovitz and
Raz [19] we may state the following more precise theorem.
Theorem 1.6 (Håstad). Fix any C < 5/4. Then it is NP-hard to (ε, Cε)-approximate 2-Lin(2) (for any 0 <
ε ≤ ε0 = 1/4). In fact ([19]), there is a reduction with quasilinear blowup; hence (ε, Cε)-approximation
on size-N instances requires 2^{N^{1−o(1)}} time assuming the Exponential-Time Hypothesis (ETH).
Since 1997 there had been no improvement on this hardness factor of 5/4, even for the (presumably
much harder) 2-Lin(q) problem. We remark that Håstad [15] showed the same hardness result even for
Max-Cut (albeit with a slightly smaller ε0 ) and that O’Donnell and Wright [21] showed the same result
for 2-Lin(q) (even with a slightly larger ε0 , namely ε0 → 1/2 as q → ∞).
1.2 Our results and techniques
In this work we give the first known improvement to the factor-5/4 NP-hardness for 2-Lin(2) from [15]:
Theorem 1.7. Fix any C < 11/8. Then it is NP-hard to (ε, Cε)-approximate 2-Lin(2) (for any 0 < ε ≤
ε0 = 1/8). Furthermore, the reduction takes 3-Sat instances of size n to 2-Lin(2) instances of size O(n^8);
hence (ε, Cε)-approximating 2-Lin(2) instances of size N requires 2^{Ω(N^{1/8})} time assuming the ETH.
This theorem is proven in Section 3, wherein we also note that the same theorem holds in the special case
of Max-Cut (albeit with some smaller, inexplicit value of ε0 ).
Our result is a gadget reduction from the “7-ary Hadamard predicate” CSP, for which Chan [7]
recently established an optimal NP-inapproximability result. In a sense our Theorem 1.7 is a direct
generalization of Håstad’s Theorem 1.6, which involved an optimal gadget reduction from the “3-ary
Hadamard predicate” CSP, namely 3-Lin(2). That said, we should emphasize some obstacles that
prevented this result from being obtained 15 years ago.
First, we employ Chan’s recent approximation-resistance result for the 7-ary Hadamard predicate.
In fact, what is crucial is not its approximation-resistance, but rather the stronger fact that it is a useless
predicate, as defined in the recent work [5]. That is, given a nearly-satisfiable instance of the CSP, it
is NP-hard to assign values to the variables so that the distribution on constraint 7-tuples is noticeably
different from the uniform distribution.
Second, although in principle our reduction fits into the “automated gadget” framework of Trevisan
et al. [24], in practice it is completely impossible to find the necessary gadget automatically, since it
would involve solving a linear program with 2^{256} variables. Instead we had to construct and analyze our
gadget by hand. On the other hand, by also constructing an appropriate LP dual solution, we are able to
show the following in Section 4.
Theorem 1.8 (Informally stated). Our gadget achieving factor-11/8 NP-hardness for 2-Lin(2) is optimal
among gadget reductions from Chan’s 7-ary Hadamard predicate hardness.
(See Theorem 4.1 for a formal statement of this result.) In spite of Theorem 1.8, it seems extremely
unlikely that factor-11/8 NP-hardness for 2-Lin(2) is the end of the line. Indeed, we view Theorem 1.7 as
more of a “proof of concept” illustrating that the longstanding factor-5/4 barrier can be broken; we hope
to see further improvements in the future. In particular, in Section 5 we present a candidate NP-hardness
reduction from high-arity useless CSPs that we believe may yield NP-hardness of approximating 2-Lin(2)
to any factor no larger than 3/2. The analysis of this reduction eventually depends on a certain conjecture
regarding analysis of Boolean functions that we were unable to resolve; thus we leave it as an open
problem.
Finally, in Section 6 we show an inherent limitation of the method of gadget reductions from pairwise-
independent predicates. We prove that such reductions can never establish an NP-hardness factor better
than 1/(1 − e^{−1/2}) ≈ 2.54 for (ε, Cε)-approximation of 2-Lin(2). We believe that this highlights a serious
bottleneck in obtaining an inapproximability result matching the performance of algorithms for this
problem, as most optimal NP-inapproximability results involve pairwise-independent predicates.
2 Preliminaries
Definition 2.1. Given x, y ∈ {−1, 1}n , the Hamming distance between x and y, denoted dH (x, y), is the
number of coordinates i where xi and yi differ. Similarly, if f , g : V → {−1, 1} are two functions over a
variable set V , then the Hamming distance dH ( f , g) between them is the number of inputs x where f (x)
and g(x) disagree.
Definition 2.2. A predicate on n variables is a function φ : {−1, 1}n → {0, 1}. We say that x ∈ {−1, 1}n
satisfies φ if φ (x) = 1 and otherwise that it violates φ .
Definition 2.3. Given a predicate φ : {−1, 1}n → {0, 1}, Sat(φ ) is the set of satisfying assignments.
Definition 2.4. A set S ⊆ {−1, 1}n is a balanced pairwise-independent subgroup if it satisfies the
following properties:
1. S forms a group under bitwise multiplication.
2. If x is selected from S uniformly at random, then Pr[xi = 1] = Pr[xi = −1] = 1/2 for any i ∈ [n],
and for any i ≠ j, xi and xj are independent.
A predicate φ : {−1, 1}n → {0, 1} contains a balanced pairwise-independent subgroup if there exists a
set S ⊆ Sat(φ ) which is a balanced pairwise-independent subgroup.
Definition 2.5. For a subset S ⊆ [n], the parity function χS : {−1, 1}^n → {−1, 1} is defined as
χS(x) := ∏_{i∈S} xi .
Definition 2.6. The Hadk predicate has 2^k − 1 input variables, one for each nonempty subset S ⊆ [k]. The
input string {xS}_{∅ ≠ S ⊆ [k]} satisfies Hadk if for each S, xS = χS(x{1}, . . . , x{k}).
Fact 2.7. The Hadk predicate contains a balanced pairwise-independent subgroup. (In fact, the whole
set Sat(Hadk ) is a balanced pairwise-independent subgroup.)
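For small k, Fact 2.7 can be confirmed by brute force. The sketch below (ours, not the paper's) enumerates Sat(Had3), whose rows agree with the Had3-matrix of Figure 1, and checks both the subgroup condition and balanced pairwise independence:

```python
# Checking Fact 2.7 for k = 3 (our sketch).  Coordinates are indexed by the
# nonempty subsets of [3], singletons first, as in the Had_3 matrix.
from itertools import combinations, product
from math import prod

def sat_had(k):
    subsets = [s for r in range(1, k + 1) for s in combinations(range(k), r)]
    return [tuple(prod(x[i] for i in s) for s in subsets)
            for x in product([1, -1], repeat=k)]

rows = sat_had(3)
assert len(rows) == 8 and rows[0] == (1,) * 7
# Closure under bitwise multiplication, since chi_S(x) * chi_S(y) = chi_S(xy):
assert all(tuple(a * b for a, b in zip(r1, r2)) in rows for r1 in rows for r2 in rows)
# Balanced pairwise independence: every coordinate pair is uniform on {-1,1}^2:
for i, j in combinations(range(7), 2):
    for pair in product([1, -1], repeat=2):
        assert sum(1 for r in rows if (r[i], r[j]) == pair) == 2
```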
Given a predicate φ : {−1, 1}n → {0, 1}, an instance I of the Max-φ CSP is a variable set V and a
distribution of φ -constraints on these variables. To sample a constraint from this distribution, we write
C ∼ I, where C = ((x1 , b1 ), (x2 , b2 ), . . . , (xn , bn )). Here the xi are in V and the bi are in {−1, 1}. An
assignment A : V → {−1, 1} satisfies the constraint C if
φ (b1 · A(x1 ), b2 · A(x2 ), . . . , bn · A(xn )) = 1 .
We define several measures of assignments and instances.
Definition 2.8. The value of A on I is just val(A; I) := PrC∼I [A satisfies C], and the value of the instance
I is val(I) := maxassignments A val(A; I). We define uval(A; I) := 1 − val(A; I) and similarly uval(I).
Definition 2.9. Let (=) : {−1, 1}2 → {0, 1} be the equality predicate, i. e., for all x1 , x2 ∈ {−1, 1}, define
(=)(x1 , x2 ) = 1 iff x1 = x2 . We will refer to the Max-(=) CSP as the Max-2-Lin(2) CSP. Any constraint
C = ((x1 , b1 ), (x2 , b2 )) in a Max-2-Lin(2) instance tests “x1 = x2” if b1 · b2 = 1, and otherwise tests
“x1 ≠ x2.”
Typically, a hardness of approximation result will show that given an instance I of the Max-φ problem,
it is NP-hard to tell whether val(I) ≥ c or val(I) ≤ s, for some numbers c > s. A stronger notion of
hardness is uselessness, first defined in [5], in which in the second case, not only is val(I) small, but any
assignment to the variables A appears “uniformly random” to the constraints. To make this formal, we
will require a couple of definitions.
Definition 2.10. Given two probability distributions D1 and D2 on some set S, the total variation distance
dTV between them is defined to be
dTV(D1, D2) := (1/2) ∑_{e∈S} |D1(e) − D2(e)| .
Definition 2.11. Given a Max-φ instance I and an assignment A, denote by D(A, I) the distribution on
{−1, 1}n generated by first sampling ((x1 , b1 ), . . . , (xn , bn )) ∼ I and then outputting (b1 · A(x1 ), . . . , bn ·
A(xn )).
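For intuition, D(A, I) and its total variation distance to uniform can be computed exactly on toy instances. The sketch below uses our own instance representation (weighted constraint lists; nothing here is from the paper):

```python
# Sketch: computing the distribution D(A, I) of Definition 2.11 exactly, and
# its dTV to uniform, for an instance given as (weight, [(x1,b1),...,(xn,bn)]).
from collections import defaultdict
from itertools import product

def d_tv_to_uniform(instance, A, n):
    dist = defaultdict(float)
    total = sum(w for w, _ in instance)
    for w, constraint in instance:
        tup = tuple(b * A[x] for x, b in constraint)  # (b1*A(x1), ..., bn*A(xn))
        dist[tup] += w / total
    u = 1 / 2 ** n
    return sum(abs(dist[t] - u) for t in product([1, -1], repeat=n)) / 2

# Two equally weighted 2-Lin(2)-style constraints on variables "x" and "y":
inst = [(1.0, [("x", 1), ("y", 1)]), (1.0, [("x", 1), ("y", -1)])]
A = {"x": 1, "y": 1}
# D(A, inst) puts mass 1/2 on (1, 1) and 1/2 on (1, -1), so dTV to uniform is 1/2.
print(d_tv_to_uniform(inst, A, 2))  # -> 0.5
```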
The work of Chan [7] showed uselessness for a wide range of predicates, including the Hadk predicate.
Theorem 2.12 (Chan). Let φ : {−1, 1}n → {0, 1} contain a balanced pairwise-independent subgroup.
For every ε > 0, given an instance I of Max-φ , it is NP-hard to distinguish between the following two
cases:
• (Completeness) val(I) ≥ 1 − ε.
• (Soundness) For every assignment A, dTV (D(A, I), Un ) ≤ ε, where Un is the uniform distribution
on {−1, 1}n .
2.1 Gadgets
The work of Trevisan et al. [24] gives a generic methodology for constructing gadget reductions between
two predicates. In this section, we review this with an eye towards our eventual Hadk -to-2-Lin(2) gadgets.
Suppose φ : {−1, 1}n → {0, 1} is a predicate one would like to reduce to another predicate ψ :
{−1, 1}m → {0, 1}. Set K := |Sat(φ )|. We begin by arranging the elements of Sat(φ ) as the rows of
a K × n matrix, which we will call the φ -matrix. An example of this is done for the Had3 predicate in
Figure 1. The columns of this matrix are elements of {−1, 1}K . Naming this set V := {−1, 1}K , we
will think of V as the set of possible variables to be used in a gadget reduction from φ to ψ. One of the
contributions of [24] was to show that the set V is sufficient for any such gadget reduction, and that any
gadget reduction with more than 2^K variables has redundant variables which can be eliminated.
1 1 1 1 1 1 1
1 1 −1 1 −1 −1 −1
1 −1 1 −1 1 −1 −1
1 −1 −1 −1 −1 1 1
−1 1 1 −1 −1 1 −1
−1 1 −1 −1 1 −1 1
−1 −1 1 1 −1 −1 1
−1 −1 −1 1 1 1 −1
Figure 1: The Had3 -matrix. The rows are the satisfying assignments of Had3 .
Of these variables, the n variables found as the columns of the φ -matrix are special; they correspond
to n of the variables in the original φ instance and are therefore called generic primary variables. We
will call them v1 , v2 , . . . , vn , where they are ordered by their position in the φ -matrix. The remaining
variables are called generic auxiliary variables. For example, per Figure 1, (1, 1, 1, 1, −1, −1, −1, −1)
and (1, −1, −1, 1, −1, 1, 1, −1) are generic primary variables in any gadget reducing from φ , but the
variable (−1, −1, 1, −1, 1, −1, 1, −1) is always a generic auxiliary variable.
On top of the variables V there will be a distribution of ψ constraints. As a result, a gadget G is just an
instance of the Max-ψ CSP using the variable set V . As above, we will associate G with the distribution
of ψ constraints and write C ∼ G to sample a constraint from this distribution.
Given an assignment A : V → {−1, 1}, the goal is for G to be able to detect whether the values A assigns
to the generic primary variables satisfy the φ predicate. For shorthand, we will say that A satisfies φ when
φ (A(v1 ), A(v2 ), . . . , A(vn )) = 1 .
On the other hand, A fails to satisfy φ when this expression evaluates to 0. Of all assignments, we
are perhaps most concerned with the dictator assignments. The i-th dictator assignment, written di :
{−1, 1}K → {−1, 1}, is defined so that di (x) = xi for all x ∈ {−1, 1}K . The following fact shows why
the dictator assignments are so important:
Fact 2.13. Each dictator assignment di satisfies φ .
Proof. The string ((v1 )i , (v2 )i , . . . , (vn )i ) is the i-th row of the φ -matrix, which, by definition, satisfies φ .
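Fact 2.13 can be checked directly for φ = Had3 using the matrix of Figure 1 (our sketch): applying the i-th dictator to the seven column variables recovers the i-th row, which satisfies the predicate by construction.

```python
# Checking Fact 2.13 for phi = Had_3 (our sketch).  M is the phi-matrix of
# Figure 1; its columns are the generic primary variables v_1, ..., v_7, and
# the i-th dictator d_i(x) = x_i reads off the i-th row.
M = [
    ( 1,  1,  1,  1,  1,  1,  1),
    ( 1,  1, -1,  1, -1, -1, -1),
    ( 1, -1,  1, -1,  1, -1, -1),
    ( 1, -1, -1, -1, -1,  1,  1),
    (-1,  1,  1, -1, -1,  1, -1),
    (-1,  1, -1, -1,  1, -1,  1),
    (-1, -1,  1,  1, -1, -1,  1),
    (-1, -1, -1,  1,  1,  1, -1),
]

def had3(x1, x2, x3, x12, x13, x23, x123):
    return x12 == x1*x2 and x13 == x1*x3 and x23 == x2*x3 and x123 == x1*x2*x3

columns = list(zip(*M))                    # the primary variables v_1, ..., v_7
for i in range(8):                         # d_i applied to the primary variables
    assert had3(*(v[i] for v in columns))  # ...yields row i, which satisfies Had_3
```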
Before introducing the version we use in our reduction, we first give the standard definition of a
gadget. Typically, one constructs a gadget so that the dictator assignments pass with high probability,
whereas every assignment which fails to satisfy φ passes with low probability. This is formalized in the
following definition, which is essentially from [24].
Definition 2.14 (Old definition). A (c, s)-generic gadget reducing Max-φ to Max-ψ is a gadget G
satisfying the following conditions.
• (Completeness) For every dictator assignment di , uval(di ; G) ≤ c.
• (Soundness) For any assignment A which fails to satisfy φ , uval(A; G) ≥ s.
We use uval as our focus is on the deletion version of 2-Lin(2). We include the word generic in this
definition to distinguish it from the specific type of gadget we will use to reduce Hadk to 2-Lin(2). See
Section 2.3 for details.
This style of gadget reduction is appropriate for the case when one is reducing from a predicate for
which one knows an inapproximability result and nothing else. However, in our case we are reducing from
predicates containing a balanced pairwise-independent subgroup, and Chan [7] has shown uselessness
for this class of predicates (see Theorem 2.12). As a result, we can relax the (Soundness) condition in
Definition 2.14; when reducing from this class of predicates, it is sufficient to show that this (Soundness)
condition holds for distributions of assignments which appear random on the generic primary variables.
In the following paragraph we expand on what this means.
Denote by A a distribution over assignments A. The value of A is just the average value of an
assignment drawn from A, i. e., val(A; G) := EA∼A val(A; G), and similarly for uval(A; G). We say that A
is random on the generic primary variables if the tuple
(A(v1 ), A(v2 ), . . . , A(vn ))
is, over a random A ∼ A, distributed as a uniformly random element of {−1, 1}n .
Definition 2.15. Denote by Rgen (φ ) the set of distributions which are random on the generic primary
variables.
Our key definition is the following, which requires that our gadget only does well against distributions
in Rgen (φ ).
Definition 2.16 (New definition). A (c, s)-generic gadget reducing Max-φ to Max-ψ is a gadget G
satisfying the following properties:
• (Completeness) For every dictator assignment di , uval(di ; G) ≤ c.
• (Soundness) For any A ∈ Rgen (φ ), uval(A; G) ≥ s.
The following proposition is standard, and we sketch its proof for completeness.
Proposition 2.17. Suppose there exists a (c, s)-generic gadget reducing Max-φ to Max-ψ, where φ is
any predicate containing a balanced pairwise-independent subgroup. Then for all ε > 0, given an
instance I′ of Max-ψ, it is NP-hard to distinguish between the following two cases:
• (Completeness) uval(I′) ≤ c + ε.
• (Soundness) uval(I′) ≥ s − ε.
Proof sketch. Let I be an instance of the Max-φ problem produced via Theorem 2.12. To dispense with
some annoying technicalities, we will assume that every constraint C = ((x1 , b1 ), . . . , (xn , bn )) in the
support of I satisfies bi = 1 for all i = 1, . . . , n—if for some xi we have bi = −1, then we switch the sign
of its corresponding primary variable in every ψ-constraint that contains xi in the gadget for C.
Construct an instance I′ of Max-ψ as follows: for each constraint C = ((x1 , 1), . . . , (xn , 1)) in the
support of I, add in a copy of G—call it GC —whose total weight is scaled down so that it equals the
weight of C. Further, identify the primary variables v1 , . . . , vn of GC with the variables x1 , . . . , xn .
We now consider the completeness and soundness cases for Max-φ according to Theorem 2.12.
Completeness. In this case, the instance I is in the completeness case of Theorem 2.12, so there exists
an assignment A to the variables of I which violates at most an ε-fraction of the constraints. We extend
this to an assignment for all the variables of I′ and show that it violates no more than a (c + ε)-fraction of
the constraints. The extension is as follows: for any constraint C = ((x1 , 1), . . . , (xn , 1)) which A satisfies,
there is some dictator assignment to the variables of GC which agrees with A on the primary variables
v1 , . . . , vn . Set A to also agree with this dictator assignment on the auxiliary variables in GC . Regardless
of how A is extended in the remaining gadgets GC , it now labels a (1 − ε)-fraction of the G gadgets in I′
with a dictator assignment, meaning that uval(A; I′) ≤ (1 − ε) · c + ε · 1 ≤ c + ε.
Soundness. Let A be an assignment to the variables in I′. We lower-bound uval(A; I′) by roughly
s assuming that I belongs to the soundness case of Theorem 2.12. Consider the distribution A of
assignments to the gadget G generated as follows: sample C ∼ I and output the restriction of A to
the variables of GC . By the soundness case of Theorem 2.12, the distribution of (A(x1 ), . . . , A(xn )) is ε-
close to uniform in total variation distance, and thus A is ε-close in total variation distance to some
distribution A′ ∈ Rgen (φ ). As a result, by the definition of a (c, s)-generic gadget, we have uval(A; G) ≥
uval(A′ ; G) − ε ≥ s − ε. But then uval(A; G) = uval(A; I′), which is therefore bounded below by s − ε.
By Theorem 2.12, it is NP-hard to distinguish between the two cases above for instance I. It follows
that distinguishing between uval(I′) ≤ c + ε and uval(I′) ≥ s − ε is NP-hard.
2.2 Reducing into 2-Lin(2)
In this section, we consider gadgets which reduce into the 2-Lin(2) predicate. We show several convenient
simplifying assumptions that can be made in this case.
Definition 2.18. An assignment A : {−1, 1}K → {−1, 1} is folded if A(x) = −A(−x) for all x ∈ {−1, 1}K .
Here −x is the bitwise negation of x. In addition, a distribution A is folded if every assignment in its
support is folded.
The following proposition shows that when designing a gadget which reduces into 2-Lin(2), it suffices
to ensure that its (Soundness) condition holds for folded distributions. The proof is standard.
Proposition 2.19. For some predicate φ , suppose G is a gadget reducing Max-φ to Max-2-Lin(2) which
satisfies the following two conditions:
• (Completeness) For every dictator assignment di , uval(di ; G) ≤ c.
• (Soundness) For any folded A ∈ Rgen (φ ), uval(A; G) ≥ s.
Then there exists a (c, s)-generic gadget reducing Max-φ to Max-2-Lin(2).
Remark 2.20. Here we focus on uval instead of val, unlike at the beginning of Section 2, so the c and s
here correspond to 1 − c and 1 − s there.
Proof. For each pair of antipodal points x and −x in {−1, 1}K , pick one (say, x) arbitrarily, and set
canon(x) := canon(−x) := x .
This is the canonical variable associated to x and −x. The one constraint is that if either x or −x is one
of the generic primary variables, then it should be chosen as the canonical variable associated to x and
−x. Define is-canon(x) to be 1 if canon(x) = x and −1 otherwise. Now, let G′ be the gadget whose
constraints are sampled as follows:
1. Sample a constraint A(x1 ) · A(x2 ) = b from G.
2. For i ∈ {1, 2}, set bi = is-canon(xi ).
3. Output the constraint A(canon(x1 )) · A(canon(x2 )) = b · b1 · b2 .
We claim that G′ is a (c, s)-gadget reducing Max-φ to Max-2-Lin(2). To see this, note that the probability
that an assignment A fails on G′ is the same as the probability that the assignment A′(x) := is-canon(x) · A(canon(x))
fails on G. For any dictator function di , di (x) = is-canon(x) · di (canon(x)) for all x. Therefore, di fails G′
with probability at most c. Next, it is easy to see that for any assignment A, A′ is folded and, due to our restriction
on canon(·), A′ agrees with A on the generic primary variables. Thus, given a distribution A ∈ Rgen (φ ),
A fails on G′ with the same probability that some folded distribution in Rgen (φ ) fails on G, which is at
least s.
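The canonicalization step in the proof can be sketched concretely as follows (our code; the names `make_canon`/`fold_constraint` and the tuple representation of variables are assumptions):

```python
# Sketch of the folding transformation: fix one representative per antipodal
# pair {x, -x}, preferring primary variables, and fold signs into the
# right-hand side of each equality constraint.

def make_canon(variables, primary):
    canon, sign = {}, {}
    for x in variables:
        if x in canon:
            continue
        neg = tuple(-c for c in x)
        # Prefer a primary variable as the representative of {x, -x}:
        rep = x if (x in primary or neg not in primary) else neg
        rep_neg = tuple(-c for c in rep)
        canon[rep], sign[rep] = rep, 1          # sign[.] is "is-canon(.)"
        canon[rep_neg], sign[rep_neg] = rep, -1
    return canon, sign

def fold_constraint(x1, x2, b, canon, sign):
    # Original constraint: A(x1) * A(x2) = b.  For a folded A this becomes
    # A(canon(x1)) * A(canon(x2)) = b * is-canon(x1) * is-canon(x2).
    return canon[x1], canon[x2], b * sign[x1] * sign[x2]
```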
Proposition 2.21. For fixed values of c and s, let G be a gadget satisfying the (Completeness) and
(Soundness) conditions in the statement of Proposition 2.19. Then there exists another gadget satisfying
these conditions which only uses equality constraints.
Proof. Let G′ be the gadget which replaces each constraint in G of the form x ≠ y with the constraint
x = −y. If A is a folded assignment,
A(x) ≠ A(y) ⇐⇒ A(x) = A(−y) .
Thus, for every folded assignment A, val(A; G) = val(A; G′). As the (Completeness) and (Soundness)
conditions in Proposition 2.19 only concern distributions over folded assignments, G′ satisfies these
conditions.
This means that sampling from G can be written as (x, y) ∼ G, meaning that we have sampled the
constraint “x = y.”
2.3 The Hadk -to-2-Lin(2) gadget
Now we focus on our main setting, which is constructing a Hadk -to-2-Lin(2) gadget. Via Section 2.2, we
only need to consider how well the gadget does against folded assignments.
The Hadk predicate has 2^k − 1 variables. In addition, it has K := 2^k satisfying assignments, one for
each setting of the variables x{1} through x{k}. It will often be convenient to take an alternative (but
equivalent) viewpoint of the variable set V := {−1, 1}^K as the set of k-variable Boolean functions, i. e.,
V = { f | f : {−1, 1}^k → {−1, 1} } .
Remark 2.22. The variables 1^K and (−1)^K in V correspond to the constant functions. In what follows,
when considering generic Hadk -to-2-Lin(2) gadgets, it will be useful to view them as constants 1 and −1
in the 2-Lin(2) equations, instead of (auxiliary) variables. We argue that this is a valid simplification.
Our result concerning Hadk -to-2-Lin(2) gadgets consists of two parts: constructing a Hadk -to-2-Lin(2)
gadget in order to show an NP-hardness result for Max-2-Lin(2), and proving that the gadget we
constructed is optimal.
• For the first part, if we do a Hadk -to-2-Lin(2) reduction using these gadgets, the 2-Lin(2) instances
we get may have constraints that involve constants, such as (=)(x1 , −1). A standard transformation
to turn them into ones that do not use constants is to introduce a global variable z, and replace
1 with z, −1 with −z. By changing the signs of all variables if necessary, we can always find
an optimal assignment that gives z value 1. Thus the transformation maintains the values of the
instances, and does not change the computational complexity of the problem.
• For the second part, observe that we always get gadgets at least as good by replacing the variable 1^K
with the constant 1 and (−1)^K with the constant −1. The constants coincide with the dictator assignments
for 1^K and (−1)^K, so the completeness value does not change, and the soundness does not decrease
after this simplification.
The Hadk matrix is a 2^k × (2^k − 1) matrix whose rows are indexed by strings in {−1, 1}^k and whose
columns are indexed by nonempty subsets S ⊆ [k]. The (x, S)-entry of this matrix is χS (x). This can be
verified by noting that for any x ∈ {−1, 1}^k,
(χ{1} (x), χ{2} (x), . . . , χ{k} (x), χ{1,2} (x), . . . , χ{1,2,...,k} (x))
is a satisfying assignment of the Hadk predicate. As a result, for each S ≠ ∅, χS is a column in the Hadk
matrix. Therefore, these functions are the generic primary variables. However, it will be convenient to
consider a larger set of functions to be primary. For example, because we plan on using our gadget on
folded assignments, χS and −χS will always have opposite values, and so the −χS should also be primary
variables. In addition, it is a little unnatural to have every parity function but one as a primary variable, so
we will include the constant function χ∅ and its negation −χ∅ in the set of primary variables. In total, we
have the following definition. Note that in contrast to the above discussion about generic gadgets, we
include the constant functions as variables in the following definition of Hadk -to-2-Lin(2) gadgets.
Definition 2.23. The variables of a Hadk -to-2-Lin(2) gadget are all Boolean functions over k variables.
The primary variables of a Hadk -to-2-Lin(2) gadget are the functions ±χS , for any S ⊆ [k]. The remaining
functions are auxiliary variables.
To account for the inclusion of χ∅ as a primary variable, we will have to modify some of our definitions
from Section 2.1. We begin by defining the following modification to the Hadk predicate, and we now
refer to the definition at the beginning of this subsection as Had′k.
Definition 2.24. The Hadk predicate has 2^k input variables, one for each subset S ⊆ [k]. The input string {xS}S⊆[k] satisfies Hadk if for each S ≠ ∅,

xS = x∅ · ∏i∈S (x∅ · x{i}) .
Theory of Computing, Volume 13 (19), 2017, pp. 1–51
Johan Håstad, Sangxia Huang, Rajsekar Manokaran, Ryan O'Donnell, and John Wright
In other words, if x∅ = 1, then the remaining variables should satisfy the Had′k predicate, and if x∅ = −1,
then their negations should. We will say that A satisfies the Hadk predicate if

Hadk(A(χ∅), A(χ{1}), . . . , A(χ{k}), A(χ{1,2}), . . . , A(χ[k])) = 1 .

Otherwise, A fails to satisfy the Hadk predicate. We say that A is random on the primary variables if the tuple

(A(χ∅), A(χ{1}), . . . , A(χ{k}), A(χ{1,2}), . . . , A(χ[k]))

is, over a random A ∼ A, distributed as a uniformly random element of {−1, 1}^K.
Definition 2.25. Denote by R(Hadk ) the set of folded distributions which are random on the variables
{χS }S⊆[k] .
Definition 2.26. A (c, s)-gadget reducing Max-Hadk to Max-2-Lin(2) is a gadget G satisfying the
following properties:
• (Completeness) For every dictator and negated dictator assignment ±di , uval(±di ; G) ≤ c.
• (Soundness) For any A ∈ R(Hadk ), uval(A; G) ≥ s.
Proposition 2.27. The following two statements are equivalent:
1. There exists a (c, s)-gadget reducing Max-Hadk to Max-2-Lin(2).
2. There exists a (c, s)-generic gadget reducing Max-Had′k to Max-2-Lin(2).
Proof. We prove the two directions separately. Recall that the gadgets in both cases are distributions over
2-Lin(2) constraints on essentially the same set of variables, except that in (1) we have the variables ±χ∅,
whereas in (2) we have the constants ±1. To convert a gadget G in (1) to G′ in (2), we simply replace ±χ∅
with ±1, and vice versa.
(1) ⇒ (2). Let G be a (c, s)-gadget reducing Max-Hadk to Max-2-Lin(2), and let G′ be the corresponding generic gadget reducing Max-Had′k to Max-2-Lin(2). We claim that for any folded A′ ∈ Rgen(Had′k), uval(A′; G′) ≥ s. To see this, construct a distribution A ∈ R(Hadk) as follows: sample A′ ∼ A′ and b ∼ {−1, 1}, and output the assignment A where we assign A(χ∅) = b, A(−χ∅) = −b, and A(f) = bA′(f) for all f ∉ {±χ∅}. By definition of Rgen(Had′k), we have that A′ is uniformly random on {χS}S≠∅, and since b is uniform and independently sampled, we have that A ∈ R(Hadk) and therefore uval(A; G) ≥ s. We also have A(f)A(χ∅) = A′(f) · 1 and A(f)A(g) = A′(f)A′(g) for f, g ∉ {±χ∅}, so uval(A′; G′) = uval(A; G). As a result, G′ satisfies the (Completeness) and (Soundness) conditions in the statement of Proposition 2.19, meaning it is a (c, s)-generic gadget reducing Max-Had′k to Max-2-Lin(2).
(2) ⇒ (1). Let G′ be a (c, s)-generic gadget reducing Max-Had′k to Max-2-Lin(2), and G the corresponding (c, s)-gadget reducing Max-Hadk to Max-2-Lin(2). Let A ∈ R(Hadk), and for b ∈ {−1, 1}, write A(b) for A conditioned on the variable χ∅ being assigned the value b. Then b · A(b) (by which we mean the distribution where we sample A ∼ A(b) and output b · A) is in Rgen(Had′k) for both b ∈ {−1, 1}, and so uval(b · A(b); G′) ≥ s. As uval(A(b); G) = uval(b · A(b); G′), averaging over b gives uval(A; G) ≥ s, and so G is a (c, s)-gadget reducing Max-Hadk to Max-2-Lin(2).
Improved NP-Inapproximability for 2-Variable Linear Equations
Combining this with Proposition 2.17, we have the following corollary.
Corollary 2.28. Suppose there exists a (c, s)-gadget reducing Max-Hadk to Max-2-Lin(2). Then for all
ε > 0, given an instance I of Max-2-Lin(2), it is NP-hard to distinguish between the following two cases:
• (Completeness) uval(I) ≤ c + ε.
• (Soundness) uval(I) ≥ s − ε.
2.4 Reducing to the length-one case
When constructing good gadgets, we generally want dictators to pass with as high a probability as
possible. By Proposition 2.21, we can assume that our gadget operates by sampling an edge (x, y) and
testing equality between the two endpoints. Any such edge of Hamming distance i will be violated by i/K
of the dictator assignments. Intuitively, then, if we want dictators to pass with high probability, we should
concentrate the probability mass of our gadget G on edges of low Hamming distance. The following
proposition shows that this is true in the extreme: so long as we are only concerned with maximizing the
quantity s/c, we can always assume that G is entirely supported on edges of Hamming distance one.
Proposition 2.29. Suppose there exists a (c, s)-gadget G reducing Max-Hadk to Max-2-Lin(2). Then there exists a (c′, s′)-gadget reducing Max-Hadk to Max-2-Lin(2) using only length-one edges for which

s′/c′ ≥ s/c .
Proof. For each i ∈ {1, . . . , K}, let pi be the probability that an edge sampled from G has length i, and
let Gi denote the distribution of G conditioned on this event. Then sampling from G is equivalent to first
sampling a length i with probability pi , and then sampling an edge from Gi .
Let Q = 1 · p1 + 2 · p2 + · · · + K · pK, and for each i ∈ {1, . . . , K} define qi = i · pi/Q. It is easy to see
that the qi form a probability distribution. Now we may define the new gadget G′ as follows:
1. Sample a length i with probability qi .
2. Sample (x, y) ∼ Gi .
3. Pick an arbitrary shortest path x = x0, x1, . . . , xi = y through the hypercube {−1, 1}^K.
4. Output a uniformly random edge (x j , x j+1 ) from this path.
Note that G′ only uses length-one edges. Let G′i denote the distribution of G′ conditioned on i being
sampled in the first step. (Note that G′i is defined in a way that is different from the way Gi is defined.)
Let A : {−1, 1}^K → {−1, 1} be any assignment. Then
uval(A; G) = ∑_{i=1}^{K} pi · uval(A; Gi) ,   and   uval(A; G′) = ∑_{i=1}^{K} qi · uval(A; G′i) .
We can relate uval(A; G′i) to uval(A; Gi) as follows: if A assigns different values to the endpoints of the
edge (x, y) ∼ Gi, then on any shortest path x = x0, x1, . . . , xi = y through the hypercube {−1, 1}^K, A must
assign different values to at least one of the edges (xj, xj+1). As a result, every time A errs on Gi, it must err at least a (1/i)-fraction of the time on G′i. This means that

uval(A; G′i) ≥ uval(A; Gi)/i .    (2.1)
In the case when A is a dictator function, Equation (2.1) becomes an equality. This is because x =
x0, x1, . . . , xi = y is a shortest path between x and y through the hypercube {−1, 1}^K. If A assigns the
same values to x and y, then it will assign the same values to all of x0 , x1 , . . . , xi . If, on the other hand, it
assigns different values to x and y, then it will assign different values to the endpoints of exactly one edge
(x j , x j+1 ).
Now we can use this to relate uval(A; G0 ) to uval(A; G):
uval(A; G′) = ∑_{i=1}^{K} qi · uval(A; G′i) ≥ ∑_{i=1}^{K} (i · pi/Q) · (uval(A; Gi)/i) = (1/Q) · ∑_{i=1}^{K} pi · uval(A; Gi) = (1/Q) · uval(A; G) .    (2.2)

Here the inequality follows from the definition of qi and Equation (2.1). As Equation (2.1) is an equality in the case when A is a dictator function, we have that uval(A; G′) = (1/Q) · uval(A; G) in this case.
Let A ∈ R(Hadk) maximize val(A; G′), and let di be any dictator function. Then

s′/c′ = uval(A; G′)/uval(di; G′) ≥ ((1/Q) · uval(A; G))/((1/Q) · uval(di; G)) = uval(A; G)/uval(di; G) ≥ s/c .
Here the first inequality is by Equation (2.2) (and the fact that it is an equality for dictators), and the
second inequality follows from the fact that uval(A, G) ≥ s and uval(di , G) = c.
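The proof above is constructive, and the construction is easy to exercise on a small instance. The following Python sketch (ours; the two-edge toy gadget on {−1, 1}^3 is a made-up example, not a gadget from the paper) performs the path-splitting and checks Equation (2.2), including the fact that it holds with equality for dictators:

```python
from itertools import product
from fractions import Fraction

def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))

def shortest_path(x, y):
    # Flip the differing coordinates one at a time, left to right.
    path, cur = [x], list(x)
    for j in range(len(x)):
        if cur[j] != y[j]:
            cur[j] = y[j]
            path.append(tuple(cur))
    return path

# A toy gadget G: a list of (x, y, probability) equality constraints.
G = [((1, 1, 1), (-1, -1, 1), Fraction(1, 2)),
     ((1, 1, 1), (1, -1, 1), Fraction(1, 2))]

Q = sum(hamming(x, y) * p for x, y, p in G)

# Build G': an edge of length i and mass p contributes its i path edges
# with mass p * (i/Q) * (1/i) = p/Q each.
Gp = [(a, b, p / Q)
      for x, y, p in G
      for a, b in zip(shortest_path(x, y), shortest_path(x, y)[1:])]
assert sum(p for _, _, p in Gp) == 1  # G' is a probability distribution

def uval(A, edges):
    # Probability mass of equality constraints that A violates.
    return sum(p for x, y, p in edges if A[x] != A[y])

pts = list(product([-1, 1], repeat=3))
# Equation (2.2): uval(A; G') >= uval(A; G) / Q for every assignment A ...
for vals in product([-1, 1], repeat=8):
    A = dict(zip(pts, vals))
    assert uval(A, Gp) >= uval(A, G) / Q
# ... with equality for the dictator assignments.
dictator_ok = all(uval({x: x[j] for x in pts}, Gp)
                  == uval({x: x[j] for x in pts}, G) / Q
                  for j in range(3))
assert dictator_ok
```

Exact rational arithmetic (`Fraction`) is used so that the equality check for dictators is not disturbed by floating-point rounding.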
2.5 Linear programs
One of the key insights of the paper [24] is that optimal gadgets (as per Definition 2.14) can be computed
by simply solving a linear program. Fortunately, the same holds for computing optimal gadgets as per
Definition 2.26. In our case, the appropriate linear program (taking into account Proposition 2.29) is:
max s
s. t. uval(A; G) ≥ s, ∀A ∈ R(Hadk ),
G is a gadget supported on edges of length one.
As written, this linear program has an (uncountably) infinite number of constraints, but this can be fixed by
suitably discretizing the set R(Hadk). This is not so important for us, as even after performing this step,
the linear program is simply too large to ever be feasible in practice. What is important for us is that we
can take its dual; doing so yields the following linear program.
Definition 2.30. The dual LP is defined as

min s
s. t. Pr_{A∼A} [A(x) ≠ A(y)] ≤ s , ∀ edges (x, y) of length one,
      A ∈ R(Hadk) .
The dual linear program shows us that we can upper-bound the soundness of any gadget by the value s by exhibiting a distribution on assignments in R(Hadk) which violates each length-one edge with probability at most s. Moreover, strong LP duality tells us that the optimum values of the two LPs are the same. Hence, we can prove a tight upper bound by exhibiting the right distribution. We do this in Section 4 for gadgets reducing Max-Had3 to Max-2-Lin(2).
2.6 The Had3 gadget
In this section, we will prove some structural results about the hypercube {−1, 1}^8 which are relevant to
any Had3-to-2-Lin(2) gadget. The results of this section will be useful for both Sections 3 and 4.
Given a string x ∈ {−1, 1}^n and a subset of strings B ⊆ {−1, 1}^n, we define the distance of x to B as
dH(x, B) := min_{y∈B} dH(x, y).
Let V = {−1, 1}^8, and let G = (V, E) be the hypercube graph, where we connect two vertices
v1, v2 ∈ {−1, 1}^8 with an edge if and only if dH(v1, v2) = 1.
Definition 2.31. Let G = (V, E) be the hypercube graph with V = {−1, 1}^8. Let V0 be the set of primary
variables of a gadget from Had3, and for any i > 0, define Vi = {x ∈ V | dH(x, V0) = i}. This gives a
partition of the vertices in V according to their distances to V0.
We can identify V with the set of 3-variable Boolean functions. The set of primary variables V0
corresponds to the set of affine functions, i. e., those of the form ±χS , where S ⊆ [3].
Proposition 2.32. For the hypercube graph G = (V, E), where V = {−1, 1}^8, the following holds:

1. The vertex set V can be partitioned as V = V0 ∪ V1 ∪ V2, with |V0| = 16, |V1| = 128, and |V2| = 112.
2. Each x ∈ V0 has eight neighbors in V1 .
3. Let f , g ∈ V0 be a pair of distinct functions. Then either dH ( f , g) = 8, or dH ( f , g) = 4.
4. For any x, y ∈ {−1, 1}^3 with x ≠ y, and bx, by ∈ {−1, 1}, the number of functions f ∈ V0 such that f(x) = bx is 8, and the number of functions f ∈ V0 such that f(x) = bx and f(y) = by is 4.
5. Each x ∈ V1 has one neighbor in V0 and seven neighbors in V2 .
6. Each x ∈ V2 has eight neighbors in V1. Furthermore, these eight neighbors in V1 can be grouped into 4 pairs, and the two vertices in each pair share a common neighbor in V0. This gives exactly 4 vertices in V0 that are Hamming distance 2 away from x.
7. Let f ∈ V2 , and let g1 , g2 , g3 , and g4 be the four vertices in V0 which are Hamming distance 2 away
from f . Then for any x ∈ {−1, 1}3 , three of the gi functions have the same value and one has a
different value, and therefore f (x) = sign(g1 (x) + g2 (x) + g3 (x) + g4 (x)). We say that g1 , g2 , g3
and g4 are the primary variables associated with f .
Proof. In this proof, we will take the viewpoint of V as the set of 3-variable Boolean functions.
Item (2) is straightforward: for each function f ∈ V0, changing any of the 8 positions results in
a function in V1 by definition, and since functions in V0 are at distance at least 4 from each other, their
sets of neighbors in V1 do not intersect.
For item (3), suppose we have f = bf χF and g = bg χG. If F = G, then since f ≠ g we must have f = −g, so dH(f, g) = 8. Otherwise, since F ≠ G,

E_x[f(x)g(x)] = E_x[bf bg χF(x)χG(x)] = 0 ,

where x is sampled uniformly over {−1, 1}^3. Thus in this case dH(f, g) = 4.
We prove item (4) by a dimension argument. A function f ∈ V0 can be written as f(x1, x2, x3) =
(−1)^{b0 + b1x1 + b2x2 + b3x3}, and we can specify f by giving the parameters b0, b1, b2 and b3. Specifying
the value of f at a single point imposes 1 affine relation on the 4 parameters, so the number of affine
functions that satisfy this constraint is 2^{4−1} = 8. Specifying the values at 2 different points introduces 2
independent affine relations, and thus the resulting number of affine functions becomes 2^{4−2} = 4.
We now prove the remaining items. The primary variables are of the form ±χS , where S ⊆ [3]. There
are 16 such functions, and so |V0 | = 16.
Let f′ differ from one of the primary variables on a single input. It must be at distance at least 3 from
every other primary variable. This means that all of its neighbors other than that primary variable are at
distance at least 2 from every affine function, and thus lie in V \ (V0 ∪ V1). There are 16 · 8 = 128 distinct
ways of constructing such an f′, and so |V1| = 128.
This leaves 256 − 16 − 128 = 112 variables in V not yet accounted for. We will now show a method
for constructing exactly 112 different elements of V2 ; by the pigeonhole principle, this shows that V can
be partitioned as in item (1) with the given sizes. Item (5) also follows because each function in V1 has 8
neighbors, and exactly one of them is in V0 . Items (6) and (7) follow naturally from the proof below.
Given three primary variables b1 χS1 , b2 χS2 , and b3 χS3 , where b1 , b2 , b3 ∈ {−1, 1}, set b4 := −b1 · b2 ·
b3 and S4 := S1 ∆ S2 ∆ S3 . Consider the function f defined as
f (x) := sign (b1 χS1 (x) + b2 χS2 (x) + b3 χS3 (x) + b4 χS4 (x)) .
To see that this sign(·) is well-defined, note that by definition, ∏_{i=1}^{4} bi χSi(x) = −1 for all x ∈ {−1, 1}^3.
As a result, for any x, three of the bi χSi(x) have the same value, while the other one has a different value.
This also means that for all x, we have

∑_{i=1}^{4} bi χSi(x) = 2 · sign(∑_{i=1}^{4} bi χSi(x)) .
Thus, the correlation of any of the variables bi χSi with f is

E_x[f(x) · bi χSi(x)] = (1/2) E_x[(∑_{j=1}^{4} bj χSj(x)) · bi χSi(x)] = 1/2 .
In other words, Pr_x[f(x) = bi χSi(x)] = 3/4 for each i ∈ {1, . . . , 4}. This is equivalent to saying that f is at
distance 2 from each of the bi χSi, for i = 1, 2, 3, 4.
There are 2^3 · (8 choose 3) = 448 ways of selecting the bi χSi for i = 1, 2, 3 (a sign bi for each i and three
distinct subsets Si), and there are 4 different choices of b1 χS1, b2 χS2 and b3 χS3 that lead to the same
function. Therefore this construction gives 112 unique functions in V2. As there are only 112 functions in
V which are in neither V0 nor V1, all of the remaining variables in V must be contained in V2, and they
must all be generated in the manner above.
Now we consider the neighbors of f ∈ V2 . To get from f to each of its 4 associated functions in V0 ,
we need to flip 2 bits. Since we can flip them in 2 different orders, this gives all 8 neighbors of f and they
are all in V1 .
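All of the structural claims in Proposition 2.32 are finite statements about {−1, 1}^8, so they can also be machine-checked directly. The following Python sketch (our own verification, not part of the paper) enumerates all 256 functions and verifies items (1), (3), (5), and (6):

```python
from itertools import product, combinations

inputs = list(product([-1, 1], repeat=3))  # the 8 points of {-1,1}^3

def chi(S, x):
    p = 1
    for i in S:
        p *= x[i]
    return p

# V0: value tables of the 16 affine functions +-chi_S, S a subset of [3].
V0 = {tuple(b * chi(S, x) for x in inputs)
      for r in range(4) for S in combinations(range(3), r)
      for b in (-1, 1)}

def dist(f, g):
    return sum(a != b for a, b in zip(f, g))

def d0(f):
    return min(dist(f, g) for g in V0)

def neighbors(f):
    for j in range(8):
        g = list(f)
        g[j] = -g[j]
        yield tuple(g)

allf = list(product([-1, 1], repeat=8))
V1 = [f for f in allf if d0(f) == 1]
V2 = [f for f in allf if d0(f) == 2]

# Item (1): the partition sizes, and no function is further than 2 from V0.
sizes = (len(V0), len(V1), len(V2))
assert sizes == (16, 128, 112) and max(d0(f) for f in allf) == 2
# Item (3): distinct affine functions are at distance 4 or 8.
assert all(dist(f, g) in (4, 8) for f in V0 for g in V0 if f != g)
# Item (5): each f in V1 has one neighbor in V0 and seven in V2.
assert all(([d0(g) for g in neighbors(f)].count(0),
            [d0(g) for g in neighbors(f)].count(2)) == (1, 7) for f in V1)
# Item (6): each f in V2 has all 8 neighbors in V1 and exactly 4
# affine functions at distance 2 (its associated primary variables).
assert all(all(d0(g) == 1 for g in neighbors(f)) and
           sum(1 for g in V0 if dist(f, g) == 2) == 4 for f in V2)
```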
Proposition 2.33. Let B = sat(Had3 ). Then
Pr_x[dH(x, B) = 0] = 1/16 ,   Pr_x[dH(x, B) = 1] = 1/2 ,   and   Pr_x[dH(x, B) = 2] = 7/16 ,

where x is a uniformly random element of {−1, 1}^8.
Proof. This can be proven by an argument similar to the proof of Proposition 2.32. Alternatively, we can exhibit a direct correspondence between the setting here and the setting in Proposition 2.32, as follows.
The input to Had3 is a set of bits {xS}S⊆[3], which can also be thought of as the function f :
P({1, 2, 3}) → {−1, 1} in which f(S) := xS. The satisfying assignments are then any function of the form
S ↦ b · χS(x), where b ∈ {−1, 1} and x ∈ {−1, 1}^3 are both fixed. For a string x ∈ {−1, 1}^3, let α(x) be
the corresponding set, i. e., i ∈ α(x) if and only if xi = −1. For any function f : P({1, 2, 3}) → {−1, 1},
we can associate it with the function α(f) : {−1, 1}^3 → {−1, 1} defined by α(f)(x) := f(α(x)) for
all x. Then α maps any satisfying assignment to Had3 into one of the primary variables in V0, and
more generally, dH(f, B) = i if and only if α(f) ∈ Vi. The proposition therefore follows by applying
Proposition 2.32 and by noting that 16/256 = 1/16, 128/256 = 1/2, and 112/256 = 7/16.
2.7 Reducing to Max-Cut
Definition 2.34. Let (≠) : {−1, 1}^2 → {0, 1} be the inequality predicate, i. e., for all x1, x2 ∈ {−1, 1},
(≠)(x1, x2) = 1 iff x1 ≠ x2. The Max-Cut CSP is the special case of the Max-(≠) CSP in which every
constraint C = ((x1, b1), (x2, b2)) satisfies b1 = b2 = 1. In other words, every constraint is of the form
“x1 ≠ x2.”
Proposition 2.35. For some predicate φ, suppose G is a (c, s)-generic gadget reducing Max-φ to Max-2-Lin(2). Then there exists a (c′, s′)-gadget reducing Max-φ to Max-Cut satisfying

s′/c′ = s/c .
Proof. Suppose the vertex set of G is V = {−1, 1}^K. Let G′ be the gadget which operates as follows:

1. With probability 1 − 1/2^K, pick x ∈ {−1, 1}^K and output the constraint “x ≠ −x.”

2. Otherwise, sample C ∼ G. If C is of the form “x ≠ y,” output “x ≠ y.” If C is of the form “x = y,” output “x ≠ −y.”
Any folded assignment A fails G′ with probability at most 1/2^K. Any assignment A which is not folded fails G′ with probability at least

(1 − 1/2^K) · (2/2^K) > 1/2^K .

As a result, we can always assume that any assignment is folded.
Now, if A is folded, then for any x, y ∈ {−1, 1}^K, A(x) = A(y) if and only if A(x) ≠ A(−y). As a result, uval(A; G′) = uval(A; G)/2^K. Thus, c′ = c/2^K, s′ = s/2^K, and so s′/c′ = s/c.
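Both facts used in this proof are elementary and can be spot-checked mechanically. The following Python sketch (our own check; the choice K = 3 is arbitrary) verifies the failure-probability inequality for non-folded assignments and the identity A(x) = A(y) ⟺ A(x) ≠ A(−y) for folded A:

```python
from itertools import product

K = 3  # arbitrary small dimension for the check
pts = list(product([-1, 1], repeat=K))
neg = lambda x: tuple(-v for v in x)

# A non-folded assignment has some x with A(x) = A(-x), and step 1 of G'
# then samples a violated constraint "x != -x" (via x or -x) with
# probability at least (1 - 1/2^K) * (2/2^K), which exceeds 1/2^K
# for every K >= 2.
for k in range(2, 16):
    assert (1 - 2.0 ** -k) * 2 * 2.0 ** -k > 2.0 ** -k

# For every folded A (i.e., A(-x) = -A(x)): A(x) = A(y) iff A(x) != A(-y).
half = [x for x in pts if x[0] == 1]
for vals in product([-1, 1], repeat=len(half)):
    A = dict(zip(half, vals))
    A.update({neg(x): -v for x, v in list(A.items())})
    for x in pts:
        for y in pts:
            assert (A[x] == A[y]) == (A[x] != A[neg(y)])
```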
3 The factor-11/8 hardness result
In this section, we prove the following theorem.
Theorem 3.1. There is a (1/8, 11/64)-gadget reducing Had3 to 2-Lin(2).
Using Propositions 2.17 and 2.35, we have the following two corollaries.
Corollary 3.2. There is a (c, s)-generic gadget reducing Had3 to Max-Cut with s/c = 11/8.
Corollary 3.3 (Theorem 1.7 restated). Fix any C < 11/8. Then it is NP-hard to achieve a factor-C
approximation for both the Max-2-Lin(2) and the Max-Cut CSPs.
Proof of Theorem 3.1. To construct our gadget, we will assign a nonnegative weight to each edge in the
gadget. Our gadget will then sample each edge with probability equal to its weight normalized by the
weight of the entire gadget. As argued in Proposition 2.29, our gadget will only use length-one edges.
And by Proposition 2.21, our gadget will only use equality constraints. For f, g ∈ V with dH(f, g) = 1,
the weight of the edge {f, g} is 5 if either f ∈ V0 or g ∈ V0, and 1 otherwise. The
total weight of the edges in G is 5 × 128 + 896 = 1536.
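As a sanity check on these weights (our own verification script, not part of the paper): the gadget has 16 · 8 = 128 weight-5 edges incident to V0 and 896 weight-1 edges, and each of the 8 edge directions carries exactly 192 = 1536/8 of the weight, which is why every dictator fails exactly a 1/8 fraction of the total weight:

```python
from itertools import product, combinations

inputs = list(product([-1, 1], repeat=3))

def chi(S, x):
    p = 1
    for i in S:
        p *= x[i]
    return p

# V0: value tables of the 16 affine functions +-chi_S.
V0 = {tuple(b * chi(S, x) for x in inputs)
      for r in range(4) for S in combinations(range(3), r)
      for b in (-1, 1)}

# Enumerate each length-one edge of {-1,1}^8 once, tagged with the
# coordinate j it flips and its weight (5 if it touches V0, else 1).
edges = []
for f in product([-1, 1], repeat=8):
    for j in range(8):
        g = list(f)
        g[j] = -g[j]
        g = tuple(g)
        if f < g:
            edges.append((f, g, j, 5 if (f in V0 or g in V0) else 1))

assert len(edges) == 8 * 2 ** 8 // 2        # 1024 hypercube edges
total = sum(w for _, _, _, w in edges)
assert total == 5 * 128 + 896 == 1536       # 128 weight-5 edges touch V0
# Each direction carries 16*5 + 112*1 = 192 of the weight, so the
# dictator d_j fails exactly 192/1536 = 1/8 of the gadget.
per_dir = [sum(w for _, _, jj, w in edges if jj == j) for j in range(8)]
assert per_dir == [192] * 8
```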
For the completeness, the fact that the dictators pass with probability 7/8 follows immediately
from the fact that G only uses edges of length one. For the soundness, let A ∈ R(Had3). We consider
each A in the support of A, i. e., all folded assignments. We apply one of the following three lemmas
lower-bounding uval(A; G), depending on the distance between A's assignment to the primary variables
and the satisfying assignments of Had3.
Lemma 3.4. Let A : {−1, 1}^8 → {−1, 1}. If the assignment of A to the primary variables satisfies the Had3 predicate, then uval(A; G) ≥ 1/8.

Lemma 3.5. Let A : {−1, 1}^8 → {−1, 1}. If the assignment of A to the primary variables is distance one from satisfying the Had3 predicate, then uval(A; G) ≥ 21/128.

Lemma 3.6. Let A : {−1, 1}^8 → {−1, 1}. If the assignment of A to the primary variables is distance two from satisfying the Had3 predicate, then uval(A; G) ≥ 3/16.
Proposition 2.33 gives the probability that a random A ∼ A will fall into each of these three cases. In total,

uval(A; G) ≥ (1/16) · (1/8) + (1/2) · (21/128) + (7/16) · (3/16) = 11/64 ,
which is what the theorem guarantees.
Before proving these lemmas, we will do some preliminary work which is relevant to all three. In the
remaining part of this section, let A be some partial assignment to the variables in V; we focus on partial
assignments that assign values to at least all of the primary variables. To analyze the quality of the gadget,
we study certain measures of such a partial assignment and bound the value of the best way to complete A
to a full assignment. All definitions in the rest of this section will be with respect to the partial assignment A.
We classify variables in V2 by the assignments of their associated affine functions, where association
is defined in item (7) of Proposition 2.32.
Definition 3.7. Let A be a partial assignment, and f ∈ V2 be a function associated with g0 , g1 , g2 , g3 ∈ V0 .
We say that f is a (4, 0) function if A(g0 ) = A(g1 ) = A(g2 ) = A(g3 ). We define (3, 1) and (2, 2) functions
similarly.
We consider paths of length 2 that start at some f ∈ V2 and end at one of the affine functions gi ∈ V0 .
Definition 3.8. Let A be a partial assignment. A path of length 2 from f ∈ V2 to g ∈ V0 is good if
A( f ) = A(g). Otherwise it is bad.
Observe that for a function f ∈ V2 of type (2, 2), no matter how we assign the value of f , it will
always contribute 2 good paths and 2 bad paths.
Definition 3.9. Let A be a partial assignment that assigns values to all variables in V0 and to all (4, 0) and
(3, 1) variables in V2.
Let B0 be the number of bad paths starting from the (4, 0) and (3, 1) variables, and let B1 be the
number of (2, 2) variables. Let B := B0 + 2B1. We say that B is the number of bad paths of the assignment A.
Definition 3.10. Consider a function f ∈ V1 and a partial assignment A to the variables in V0 ∪ V2. We
say that f is of type (a, b, c) if in the partial assignment, there are a good paths, b bad paths and c
undetermined paths going through f. Note that a + b + c = 7.
A function is switched if it has no good path through it, and is fully switched if it is of type (0, 7, 0).
Given a partial assignment as in Definition 3.9, if we assign values to the variables in V1 according
to their closest variable in V0, then we get an assignment that violates edges of total weight exactly B.
Switching the assignment of f can only be beneficial if f is of type (0, b, c), in which case switching the value of
f decreases uval by at most 2. Therefore, given a partial assignment A whose number of bad paths is B
and which has C switched functions, the best way to extend it to a full assignment violates edges of total weight at least B − 2C.
The proof focuses on the partial assignment to variables in V2 , since this decides both the number
of bad paths and the number of switched functions. One natural assignment is the majority assignment,
where one sets A(f) for f ∈ V2 according to the majority assignment of the four associated affine functions
of f (breaking ties arbitrarily). The overall proof idea is to show that no assignment does better than this
majority assignment. In particular, we prove that no matter how we change the assignment of some of the
(4, 0) and (3, 1) variables to anti-majority and increase the number of bad paths B by some number Q, we
will never be able to increase the number of functions with no good paths through them by more than
Q/2. Thus, the weight of violated constraints will never decrease under non-majority assignments.
This completes the proof.
3.1 Assignments at distance 1 from Had3
In this section, we prove Lemma 3.5.
Proof of Lemma 3.5: Let A be a partial assignment that assigns values to the variables in V0 for which
there exist x0 ∈ {−1, 1}^3 and l0 ∈ V0 such that A(f) = f(x0) for all f ∈ V0 \ {l0, −l0}, and A(b · l0) =
−b · l0(x0) for b ∈ {−1, 1}. We call ±l0 the corrupted affine functions.
Of the 112 functions in V2 , 56 are associated with ±l0 , and 56 are not. The 56 functions that are not
associated with ±l0 are all (3, 1) functions by item (7) of Proposition 2.32.
Let f ∈ V2 be a function that is associated with a0 ∈ {l0, −l0}. If f is such that a0(x0) ≠ f(x0), then
it is a (4, 0) function. There are 56/(2 · 2) = 14 such functions. Otherwise, it is a (2, 2) function. The
following observation summarizes the above discussion on the types of functions in V2 .
Proposition 3.11. Of the 112 functions in V2 , 56 are not associated with ±l0 and all of these 56 have
type (3, 1). Of those that are associated with ±l0 , 14 have type (4, 0), and the remaining 42 have type
(2, 2).
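Proposition 3.11 is again a finite claim that can be machine-checked. The following Python sketch (ours, not from the paper) fixes arbitrary choices of x0 and l0, builds the corrupted assignment, and classifies all 112 functions in V2:

```python
from itertools import product, combinations

inputs = list(product([-1, 1], repeat=3))

def chi(S, x):
    p = 1
    for i in S:
        p *= x[i]
    return p

V0 = [tuple(b * chi(S, x) for x in inputs)
      for r in range(4) for S in combinations(range(3), r)
      for b in (-1, 1)]

def dist(f, g):
    return sum(a != b for a, b in zip(f, g))

V2 = [f for f in product([-1, 1], repeat=8)
      if min(dist(f, g) for g in V0) == 2]

neg = lambda f: tuple(-v for v in f)
x0 = inputs.index((1, 1, 1))  # arbitrary choice of the point x0
l0 = V0[0]                    # arbitrary choice of the function l0
# The distance-one assignment: A(f) = f(x0) on V0, except that the
# corrupted pair +-l0 receives the opposite values.
A = {g: (-g[x0] if g in (l0, neg(l0)) else g[x0]) for g in V0}

counts = {(4, 0): 0, (3, 1): 0, (2, 2): 0}
untouched = 0
for f in V2:
    assoc = [g for g in V0 if dist(f, g) == 2]  # 4 associated functions
    s = abs(sum(A[g] for g in assoc))           # 4, 2 or 0
    t = {4: (4, 0), 2: (3, 1), 0: (2, 2)}[s]
    counts[t] += 1
    if l0 not in assoc and neg(l0) not in assoc:
        untouched += 1
        assert t == (3, 1)  # functions not associated with +-l0

assert counts == {(4, 0): 14, (3, 1): 56, (2, 2): 42}
assert untouched == 56
```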
As discussed above, given a partial assignment to V0, we need to decide, for the (4, 0) and (3, 1)
functions in V2, whether to assign them the majority assignment. The analysis in this section proceeds
in two steps. We first argue that for any assignment to the (3, 1) variables, to minimize the weight
of violated constraints, either all (4, 0) variables are assigned according to the majority of their associated
functions in V0, or all of them are assigned according to anti-majority. It is then easy to argue that the
cost will be high in the case where all (4, 0) variables are assigned according to anti-majority. Then,
assuming that the (4, 0) variables are assigned according to majority, we prove that the (3, 1) variables
should also be assigned according to majority. This gives us a sufficient lower bound on the weight of
violated constraints.
Let us first classify the variables in V1 with respect to the majority assignment for V2 and see how
different classes of variables relate to each other. We first consider those that are neighbors of the
corrupted affine functions.
Proposition 3.12. Let A be a partial assignment where variables in V2 are assigned according to majority.
The following properties hold:
1. Each corrupted function has 1 neighbor in V1 of type (7, 0, 0) and 7 neighbors of type (1, 0, 6). This
contributes a total of 2 functions of type (7, 0, 0) and 14 functions of type (1, 0, 6).
2. The (7, 0, 0) functions are obtained by starting from ±l0 and flipping the value at x0. All 7 neighbors
in V2 of the (7, 0, 0) functions have type (4, 0). Note that this gives a total of 14 functions of type
(4, 0), and those are exactly all the (4, 0) functions.
Proof. The proof is illustrated in Figure 2.
Consider a path starting from a0 ∈ {±l0}. Let f1 be the function obtained by flipping the value of
a0 at x0. We then flip some other value y0 to get to some function f ∈ V2. Since f(x0) ≠ a0(x0), by item (7)
of Proposition 2.32, we know that a0 is the unique function associated with f that disagrees with f on
x0. Since A(a0) = −a0(x0) = f(x0), we have that f is a (4, 0) variable. There are 2 × 7 = 14 such paths, and each
Figure 2: The (7, 0, 0) and (1, 0, 6) neighbors of a (4, 0) function.
of them ends at a distinct (4, 0) variable. This argument also shows that all 7 V2-neighbors of f1 are of
type (4, 0), and hence such an f1 is a (7, 0, 0) variable.
Suppose now that we start at a0 and flip the value at y0 to get the function f1′. If we further flip x0, we
again arrive at f. On the other hand, if we flip some value other than x0 or y0, we get a function in V2 of
type (2, 2). This means that f1′ is a (1, 0, 6) function.
We can now describe all neighbors of the (4, 0) functions.
Proposition 3.13. Let A be a partial assignment where variables in V2 are assigned according to majority.
Then there are 14 functions in V2 of type (4, 0). Each of them has 1 neighbor of type (7, 0, 0), 1 of type
(1, 0, 6), and 6 of type (6, 0, 1).
Moreover, each (6, 0, 1) function that is adjacent to a (4, 0) function actually has 2 neighbors of type
(4, 0), 1 of type (2, 2) and 4 of type (3, 1). There are 14 × 6/2 = 42 such (6, 0, 1) functions.
Proof. From Proposition 3.12, we can obtain all (4, 0) functions by starting at some corrupted affine
function a0 , flip x0 together with some other bit y0 . Let f be such a function.
We already have from the above that on the 2 paths from f to a0 , we have 1 function of type (7, 0, 0)
and one of type (1, 0, 6).
To understand the other neighbors of f , consider the three affine functions g1 , g2 and g3 associated
with f that are not a0 . We have that for any g ∈ {g1 , g2 , g3 }, g(x0 ) 6= a0 (x0 ) and g(y0 ) 6= a0 (y0 ). To each
of those affine functions there are 2 paths from f , giving a total of 6 paths. We now show that all six V1
variables on these 6 paths have type (6, 0, 1).
Let f2 ∈ V1 be one of the functions on these 6 paths, and let a1 be its associated affine function. We
know that a1(x0) = f2(x0) ≠ a0(x0), and that dH(f2, a0) = 3. Now consider the effect of flipping different values
in f2.
• In order to get to the (7, 0, 0) neighbor of f from f2 , we need to flip two bits. We can flip them in
two different orders, so this gives us two different paths of length 2, each going through a different
(4, 0) function (because (7, 0, 0) functions only have (4, 0) neighbors).
• If we flip the value at x0 of the function f2 and get a function f′, then we have that dH(f′, a0) = 2 and thus
f′ is associated with a0. We also have that dH(f′, a1) = 2, so f′ is also associated with a1. Note
that a1(x0) ≠ f′(x0) = a0(x0), which means that A(a1) = A(a0) ≠ f′(x0), and thus f′ is a (2, 2)
function.
• For all the other V2 -neighbors of f2 , we have that they agree with f2 on x0 . Also, note that the
above two cases cover all 3 paths from f2 to a0 . Thus for the remaining V2 -neighbors of f2 , we
have that they all have distance 4 from a0 . This means that those functions are not associated with
±l0 , and therefore are (3, 1) functions contributing a good path to f2 .
This concludes the analysis.
We now consider the neighbors of the uncorrupted affine functions. Note that we have already
characterized some of them in the above Proposition 3.13.
Proposition 3.14. Let A be a partial assignment where variables in V2 are assigned according to majority.
Then each uncorrupted affine function has 4 neighbors of type (3, 1, 3), 1 of type (0, 4, 3), and 3 of type
(6, 0, 1). Functions of type (3, 1, 3) and (0, 4, 3) do not have neighbors of type (4, 0).
Proof. Let a1 be an arbitrary affine function that is not corrupted, and a0 now be the corrupted affine
function such that a0 (x0 ) = a1 (x0 ). Choose one of the four bits where a0 and a1 differ, let us assume that
we have chosen y0 .
Starting from a1 , let g1 be the function where we flip the value a1 (y0 ). If we flip any of the three
remaining bits where g1 and a0 differ, we get a function that is at distance 2 from a0 , and agrees with
both a0 and a1 on x0 . This is a (2, 2) function. Otherwise, we get a (3, 1) function.
There are two subcases for the latter case, in which we flip one bit that a0 and a1 agree on: if we flip
x0 and get function g2 , then taking the majority assignment for g2 gives us a bad path at g1 ; otherwise we
get a good path.
We thus conclude that in this case g1 is a (3, 1, 3) function, and all good and bad paths through it are
from (3, 1) functions. There are 14 × 4 = 56 such functions g1 .
Now let g0 be the function obtained by starting at a1 and flipping the value at x0 . If we then flip any
of the three remaining bits where a0 and a1 agree, we get a function in V2 that is associated with −a0
and is of type (2, 2). Otherwise, if we flip one of the 4 bits on which a0 and a1 disagree, we get a (3, 1)
function that contributes a bad path. This gives us 14 functions in V1 of type (0, 4, 3).
If we start at a1 and flip one of the 3 remaining bits, we get a (6, 0, 1) function.
This completes the proof.
We have now described the types of all 128 functions in V1 : 2 of type (7, 0, 0), 14 of type (1, 0, 6),
42 of type (6, 0, 1), 56 of type (3, 1, 3), and 14 of type (0, 4, 3). Given a partial assignment to the
affine variables as well as the (4, 0) and (3, 1) variables in V2 (where they are assigned according to
the majority of the assignments to their associated affine functions), and a function f ∈ V1 , let a0 be
the closest corrupted affine function, and let a1 be the closest affine function. We can decide the type
of f by checking whether dH ( f , a0 ) is 1 or 3, whether f (x0 ) = a0 (x0 ), and if dH ( f , a0 ) = 3, whether
f (x0 ) = a1 (x0 ). The details are in the above arguments and we summarize the result below.
Proposition 3.15. Fix a partial assignment as above. Let f ∈ V1 , a0 be its closest corrupted affine
function, and a1 be its closest affine function. We have the following regarding the type of f :
• If a0 = a1 and thus dH ( f , a0 ) = 1, then f has type (7, 0, 0) if and only if f (x0 ) = a0 (x0 ), and
otherwise it has type (1, 0, 6).
• If a0 ≠ a1, but a0(x0) = a1(x0) = f(x0), then f is a (3, 1, 3) function.
• If a0 ≠ a1, f(x0) ≠ a1(x0), and a0(x0) ≠ a1(x0), then f is a (0, 4, 3) function.
• If a0 ≠ a1, f(x0) = a1(x0), and a0(x0) ≠ a1(x0), then f is a (6, 0, 1) function.
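These case distinctions lend themselves to a brute-force check. The sketch below (our illustration, not code from the paper) enumerates all of {−1, 1}^8, identifies V1 as the points at Hamming distance 1 from the 16 affine functions, and buckets each f ∈ V1 by the distance and sign conditions of Proposition 3.15; the concrete choices x0 = (1, 1, 1) and l0 = χ{1} are arbitrary assumptions, and the five bucket sizes match the tally 128 = 2 + 14 + 42 + 56 + 14 stated above.

```python
from collections import Counter
from itertools import combinations, product

PTS = list(product([-1, 1], repeat=3))   # the 8 inputs of a function f: {-1,1}^3 -> {-1,1}
X0 = PTS.index((1, 1, 1))                # arbitrary choice of x0 (an assumption)

def chi(S):
    """The character chi_S, as a value vector over PTS."""
    out = []
    for x in PTS:
        v = 1
        for i in S:
            v *= x[i]
        out.append(v)
    return tuple(out)

def neg(f):
    return tuple(-v for v in f)

def dH(f, g):
    return sum(a != b for a, b in zip(f, g))

chars = [chi(S) for r in range(4) for S in combinations(range(3), r)]
affine = chars + [neg(c) for c in chars]          # the 16 affine functions
corrupted = [chi((0,)), neg(chi((0,)))]           # the pair +-l0, with l0 = chi_{1} (an assumption)

# V1 = points of {-1,1}^8 at Hamming distance 1 from the affine set
V1 = [f for f in product([-1, 1], repeat=8)
      if min(dH(f, a) for a in affine) == 1]

classes = Counter()
for f in V1:
    a1 = min(affine, key=lambda a: dH(f, a))      # closest affine function (distance 1)
    a0 = min(corrupted, key=lambda a: dH(f, a))   # closest corrupted affine function
    if a0 == a1:
        classes['dist 1, agree at x0' if f[X0] == a0[X0] else 'dist 1, differ at x0'] += 1
    elif a0[X0] == a1[X0]:
        classes['(3,1,3): a0(x0) = a1(x0) = f(x0)'] += 1
    elif f[X0] == a1[X0]:
        classes['(6,0,1): f(x0) = a1(x0) != a0(x0)'] += 1
    else:
        classes['(0,4,3): f(x0) != a1(x0) != a0(x0)'] += 1

print(len(V1), sorted(classes.values()))   # -> 128 [2, 14, 14, 42, 56]
```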
Finally, we describe the neighbors of the (3, 1) functions in V2 .
Proposition 3.16. Let A be a partial assignment where variables in V2 are assigned according to majority.
Each (3, 1) function contributes 1 good path each to 3 (3, 1, 3) functions, 1 good path each to 3 (6, 0, 1)
functions, and 1 bad path to 1 (0, 4, 3) function and 1 bad path to 1 (3, 1, 3) function.
Proof. Let f now be a (3, 1) function. There are 6 good paths coming from it and going into 3 uncorrupted
V0 variables, and 2 bad paths going to another uncorrupted variable in V0 .
Consider a pair of good paths that go to the same affine function a1 . We now argue that exactly one
of the two V1 -variables is (3, 1, 3), and the other is (6, 0, 1).
Let a0 be the corrupted affine function such that a1(x0) = a0(x0). Since f is a (3, 1) function, it is not associated with any corrupted affine functions. Therefore, on the paths from a1 to f, we flip exactly one bit x such that x ≠ x0 and a1(x) = a0(x), and another bit y such that a1(y) ≠ a0(y). By Proposition 3.15, we have that starting from a1, if we first flip y, we get a (3, 1, 3) function. If we first flip x, then the closest corrupted affine function becomes −a0 rather than a0, and in this case we get a (6, 0, 1) function.
Now we turn to the pair of bad paths, and let a2 be the affine function at the end of the bad paths. Note that dH(f, a2) = 2; one of the bits on which they differ is x0, and we denote the other by y. Since f is a (3, 1) function, we have that dH(f, a0) = 4, so it must be that a0(y) ≠ a2(y). Therefore, starting from a2, if we first flip x0 and get a function f′ ∈ V1, then the closest corrupted affine function to f′ becomes −a0, and f′(x0) = −a2(x0) = −a0(x0); thus by Proposition 3.15, the function f′ is a (0, 4, 3) function.
Otherwise, if we first flip y, then for the resulting function f′, the closest corrupted affine function is a0 and f′(x0) = a2(x0) = a0(x0), and therefore f′ is a (3, 1, 3) function.
The following proposition shows that in an optimal assignment, all (4, 0) variables should be assigned according to majority, unless all of them are flipped.
Proposition 3.17. Let A be a partial assignment to variables in V0 as defined at the beginning of
this subsection. For any partial assignment to the (3, 1) and (2, 2) variables, we can assume that the
assignment to the (4, 0) variables that minimizes the weight of violated constraints either assigns the (4, 0)
variables according to majority, or assigns the (4, 0) variables according to the negation of majority.
Proof. Fix an assignment to all variables of type (3, 1) and (2, 2).
We start by analyzing the majority assignment to the (4, 0) variables. Under this assignment, the
(7, 0, 0) functions only have (4, 0) neighbors and thus have 7 good paths through them. The (6, 0, 1)
functions have 2 neighbors of type (4, 0) and thus have at least 2 good paths through each of them, and
the (1, 0, 6) functions have at least 1 good path going through each of them.
Every time we change the assignment of one of those (4, 0) variables, we introduce 8 bad paths (in fact we flip these assignments in pairs, the variable and its negation, so we get 16 in each step). The key is to bound the number of variables in V1 that are now switched. Suppose we flipped
0 < k < 14 of the (4, 0) functions. In the best case we have switched k of the (1, 0, 6) functions and 6k/2 = 3k of the (6, 0, 1) functions. Note that unless we flip all (4, 0) functions, we will never make the (7, 0, 0) functions switch. Therefore, as long as 0 < k < 14, we increase the number of bad paths by 8k but gain at most 4k more switched functions. This means that unless we flip all (4, 0) functions to anti-majority, the value of the assignment will never be better than having all (4, 0) functions assigned to majority.
Next we show that if the (4, 0) functions are all fixed to majority, then the best assignment assigns majority to the (3, 1) functions.
Proposition 3.18. Let A be a partial assignment to variables in V0 as defined at the beginning of
this subsection. If all (4, 0) variables in V2 are assigned according to majority, then to complete this
assignment and minimize the weight of violated constraints, one should set all (3, 1) variables according
to majority.
Proof. We start by assigning all (3, 1) variables according to the majority assignments of their associated affine variables, and compare the number of bad paths introduced by each flip against the potential number of switched functions we gain. Note that of all the types of V1 functions we have, after setting the (4, 0) variables to majority, the only functions that might not have good paths through them have type (3, 1, 3) or (0, 4, 3). For (0, 4, 3) functions, all bad paths come from (3, 1) functions, so flipping some (3, 1) variable to non-majority only contributes good paths to some (0, 4, 3) functions and thus can only decrease the number of fully switched functions. In order to upper-bound the maximum number of fully switched functions, we can therefore safely ignore those functions and focus only on the (3, 1, 3) functions.
Now suppose that we change k of the (3, 1) functions to the anti-majority assignment. Then we get 4k extra bad paths, and the number of switched functions increases by at most 3k/3 = k < 4k/2 (every flip affects 3 (3, 1, 3) functions, and each (3, 1, 3) function needs 3 flips before its number of good paths becomes 0). This means that in this case anything other than the majority assignment to the (3, 1) functions gives a worse assignment.
Finally, we show that flipping all (4, 0) functions to anti-majority is not a good idea. In this case,
the number of bad paths after assigning anti-majority to (4, 0) and before assigning the (3, 1) and (2, 2)
functions is 14 · 8 = 112. With respect to the assignment where (4, 0) functions are assigned anti-majority
and (3, 1) functions are assigned majority, we have 2 functions in V1 of type (0, 7, 0), 42 functions of type
(4, 2, 1), 14 functions of type (0, 1, 6), 56 of type (3, 1, 3) and 14 of type (0, 4, 3).
We now show that all (3, 1) functions should still be assigned according to majority. Essentially, every flip of a (3, 1) function decreases the number of good paths in 3 (3, 1, 3) functions and 3 (4, 2, 1) functions. After flipping k of the (3, 1) variables away from majority, the increase in the number of switched functions is at most 3k/3 + 3k/4 = 7k/4 < 4k/2, not enough to make up for the increased number of bad paths. If we assign majority to the (3, 1) functions, then we already have 224 bad paths and 336 undetermined paths, so the total number of bad paths, regardless of the assignment to the (2, 2) functions, will be at least 224 + 336/2 = 392, while the number of switched functions is at most 16, giving a minimum cost of 392 − 2 · 16 = 360.
On the other hand, if we take the all-majority assignment, we get 280 bad paths and at most 14 switched functions, giving a minimum cost of 280 − 2 · 14 = 252.
This completes the proof of the lemma.
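As a sanity check, the cost comparison in the last two paragraphs can be replayed with exact arithmetic. The trivial sketch below just reuses the counts from the text:

```python
from fractions import Fraction

# Anti-majority on all (4,0) variables, majority on the (3,1) variables:
bad_antimaj = 224 + Fraction(336, 2)   # 336 undetermined paths: at best half end up good
cost_antimaj = bad_antimaj - 2 * 16    # each switched function recovers weight 2, at most 16 switch

# All-majority assignment:
cost_maj = 280 - 2 * 14                # 280 bad paths, at most 14 switched functions

print(cost_antimaj, cost_maj)          # -> 360 252
assert cost_maj < cost_antimaj
```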
3.2 Assignments at distance 2 from Had3
In this section, we prove Lemma 3.6.
Proof. Let A be a partial assignment to variables in V0 such that the assignment is at distance 2 from satisfying the Had3 predicate. That is, there exist x0 ∈ {−1, 1}^3 and l0, l1 ∈ V0 with l0 ≠ ±l1, such that A(f) = f(x0) for all f ∈ V0 \ {±l0, ±l1}, and A(f) = −f(x0) otherwise. Note that there are multiple choices of x0, and the identity of l0 and l1 depends on the choice of x0; here we pick x0 arbitrarily from all the possible options. We call ±l0 and ±l1 corrupted affine functions. Our goal is to show that the total weight violated by A is at least 288.
The functions in V2 are of three types: those that contain neither of l0 and l1 among their associated linear functions, those that contain exactly one of them, and those that contain both. Functions of the first and third types are all (3, 1) functions, and functions of the second type can be either (4, 0) or (2, 2).
We first describe the types of variables in V2 .
Proposition 3.19. Given a partial assignment to variables in V0 that is at distance 2 from an assignment
that satisfies Had3 , the following properties hold:
1. There are 24 functions associated with 2 corrupted affine functions. All of them have type (3, 1).
2. There are 24 functions associated with 0 corrupted affine functions. All of them have type (3, 1).
3. Of the functions associated with 1 corrupted affine function, 16 have type (4, 0), and 48 have type
(2, 2).
Proof. Let a0 be some arbitrary corrupted affine function, and let a1 be the other corrupted affine function with a0(x0) = a1(x0).
Let f be a function in V2 obtained from a0 by flipping x0 together with one of the bits y0 where a0 and a1 disagree. Then we have dH(f, a1) = 4, so f is associated with only 1 corrupted affine function. Also, note that the assignments to the associated affine functions of f all agree, so f is a (4, 0) function. This gives 4 × 4 = 16 functions of type (4, 0).
Consider now some function f ∈ V2 obtained from a0 by flipping x0 as well as one of the other bits where a0 and a1 agree. This means that dH(f, −a1) = 2, so f is associated with 2 corrupted affine functions and has type (3, 1). There are 2 × 2 = 4 ways to pick the pair of corrupted affine functions that f is associated with, and each pair gives (4 choose 2) = 6 different functions in V2. Therefore we get 24 functions of this type.
Next, starting from a0, flip 1 bit other than x0 on which a0 and a1 agree, and 1 bit on which a0 and a1 disagree. The resulting function f is again associated with only 1 corrupted affine function, and has type (2, 2). There are 48 functions of this type.
Finally, we count the functions that are not associated with any corrupted functions. Let a2 be an uncorrupted affine function, and assume that a0 and a1 both agree with a2 on x0. Note also that a0, a1 and a2 are all at distance 4 from each other. We have that each of the following sets has size 2:

{x ∈ {−1, 1}^3 | a0(x) = a1(x) = a2(x)},      (3.1)
{x ∈ {−1, 1}^3 | a0(x) = a1(x) = −a2(x)},     (3.2)
{x ∈ {−1, 1}^3 | a0(x) = −a1(x) = −a2(x)},    (3.3)
{x ∈ {−1, 1}^3 | a0(x) = −a1(x) = a2(x)}.     (3.4)
Starting from a2, in order to reach some f ∈ V2 that is not associated with any corrupted affine function, we can either flip one bit from (3.1) and one from (3.2), or one bit from (3.3) and one from (3.4). Thus from each a2, we can reach 8 different functions f ∈ V2 that are not associated with any corrupted affine function. Each such f is in turn associated with 4 functions in V0 like a2. Therefore, the total number of functions f ∈ V2 that are not associated with any corrupted affine function is 12 × 8/4 = 24.
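The counts in Proposition 3.19 can also be verified by exhaustive enumeration. In the sketch below (our illustration; the choices x0 = (1, 1, 1), l0 = χ{1}, l1 = χ{2} are arbitrary assumptions), each f ∈ V2 is bucketed by how many corrupted affine functions it is associated with, and the singly-associated ones are split by whether f differs from its corrupted neighbor at x0 (the (4, 0) case) or agrees with it there (the (2, 2) case):

```python
from collections import Counter
from itertools import combinations, product

PTS = list(product([-1, 1], repeat=3))   # the 8 inputs of f: {-1,1}^3 -> {-1,1}
X0 = PTS.index((1, 1, 1))                # arbitrary x0 (an assumption)

def chi(S):
    out = []
    for x in PTS:
        v = 1
        for i in S:
            v *= x[i]
        out.append(v)
    return tuple(out)

def neg(f):
    return tuple(-v for v in f)

def dH(f, g):
    return sum(a != b for a, b in zip(f, g))

chars = [chi(S) for r in range(4) for S in combinations(range(3), r)]
affine = chars + [neg(c) for c in chars]          # the 16 affine functions
l0, l1 = chi((0,)), chi((1,))                     # l0 != +-l1 (an assumption)
corrupted = {l0, neg(l0), l1, neg(l1)}

# V2 = points at Hamming distance exactly 2 from the affine set
V2 = [f for f in product([-1, 1], repeat=8)
      if min(dH(f, a) for a in affine) == 2]

by_corrupted, singles = Counter(), Counter()
for f in V2:
    assoc = [a for a in affine if dH(f, a) == 2]  # the 4 associated affine functions
    by_corrupted[sum(a in corrupted for a in assoc)] += 1
    if sum(a in corrupted for a in assoc) == 1:
        c = next(a for a in assoc if a in corrupted)
        singles['(4,0)' if f[X0] != c[X0] else '(2,2)'] += 1

print(len(V2), dict(by_corrupted), dict(singles))
# 112 functions: 24 with 2 corrupted, 24 with 0, 64 with 1;
# of the latter, 16 are (4,0) and 48 are (2,2).
```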
Now we turn to the neighbors of the affine functions.
Proposition 3.20. Given a partial assignment to variables in V0 that is at distance 2 from an assignment
that satisfies Had3 , the following properties hold:
• All affine functions have 1 (7, 0, 0) neighbor, 3 (1, 2, 4) neighbors and 4 (4, 0, 3) neighbors. This
gives a total of 16 (7, 0, 0) functions, 48 (1, 2, 4) functions and 64 (4, 0, 3) functions.
• All (7, 0, 0) functions have 4 (4, 0) neighbors and 3 (3, 1) neighbors.
• All (1, 2, 4) functions get 1 good path from a (3, 1) function, 2 bad paths from 2 (3, 1) functions,
and 4 undetermined paths from 4 (2, 2) functions.
• All (4, 0, 3) functions get 1 good path from a (4, 0) function, 3 good paths from 3 (3, 1) functions,
and 3 undetermined paths from 3 (2, 2) functions.
Proof. As in the previous proof, let a0 be some arbitrary corrupted affine function, and let a1 be the other
corrupted affine function with a0 (x0 ) = a1 (x0 ).
Let g ∈ V1 be the function obtained by flipping x0 in a0. As in the argument of Proposition 3.19, if we further flip one bit on which a0 and a1 disagree, we get a (4, 0) function. If we flip one bit on which a0 and a1 agree, then we get a function that contributes a good path to g and is associated with a0 and −a1, and thus is of type (3, 1). Thus g is of type (7, 0, 0), and it has 4 (4, 0) neighbors and 3 (3, 1) neighbors.
Now let g be the function obtained by flipping in a0 one bit other than x0 on which a0 and a1 agree. As in the argument of Proposition 3.19, if we further flip x0, then we get a (3, 1) function contributing a good path to g. If instead we flip one of the remaining 2 bits where a0 and a1 agree, we get another (3, 1) function contributing a bad path to g. Finally, if we flip one of the 4 bits where a0 and a1 disagree, by the argument in Proposition 3.19, we get a (2, 2) function. Therefore in this case g is of type (1, 2, 4), with 1 (3, 1) neighbor contributing a good path, 2 (3, 1) neighbors each contributing a bad path, and 4 (2, 2) neighbors.
Next, let g be the function obtained by flipping in a0 one bit on which a0 and a1 disagree. By the proof of Proposition 3.19, if we now flip x0, we get a (4, 0) function; also, if we flip another bit where a0 and a1 agree, we get a (2, 2) function. If we instead flip one bit where a0 and a1 disagree, we get a function f contributing a good path to g (note that in this case the majority assignment to f actually assigns −f(x0)) that is associated with a0 and a1, which means that f is of type (3, 1). Thus, g is a (4, 0, 3) function with 1 (4, 0) neighbor, 3 (3, 1) neighbors and 3 (2, 2) neighbors.
Now we turn to neighbors of uncorrupted affine functions. Let a2 be an uncorrupted affine function,
and let a0 , a1 be the two corrupted affine functions with a0 (x0 ) = a1 (x0 ) = a2 (x0 ).
First, consider the function g obtained from a2 by flipping x0 . If we flip the other bit where a0 , a1 and
a2 all agree, we get a function in V2 that is associated with −a0 and −a1 . This function has type (3, 1)
and contributes a good path to g. If instead we flip the two bits in (3.2), we get a function that is not
associated with any corrupted affine functions, and it contributes a bad path to g. If we flip the remaining
four bits, we get a function that is associated with only 1 corrupted affine function, and it has type (2, 2).
We conclude that g is a (1, 2, 4) function and the profile of its neighbors is the same as above.
Next, consider flipping the bit other than x0 where a0 , a1 and a2 all agree, and let g be the resulting
function. As discussed above, if we now flip x0 , we get a (3, 1) function with a good path. On the other
hand, if we now flip one of the 2 bits in (3.3), we get a function that is associated with only 1 corrupted
affine function, namely −a1 . This is a (4, 0) function. The situation is the same if we flip one of the 2
bits in (3.4). If we flip one of the 2 bits in (3.2) instead, we get a function associated with no corrupted
affine function, and it contributes a good path to g. We conclude that in this case g is (7, 0, 0) and the
profile of its neighbors is the same as above.
Now, let g be the one obtained by flipping one of the bits in (3.2). If we now flip x0 , we get a (3, 1)
function with a bad path, as discussed above. If we flip the other bit in (3.1), we get a (3, 1) function with
a good path, as discussed above. If we flip another bit in (3.2), we get a (3, 1) function associated with
a0 and a1 that contributes a bad path. If we flip one of the bits in (3.3) or (3.4), we get a (2, 2) function.
Therefore, again g is a (1, 2, 4) function with the same profile as above.
The remaining cases are where we obtain g by flipping one of the bits in (3.3) or (3.4) first. These
cases are symmetric, so without loss of generality, suppose the bit flipped is in (3.3). By the discussion
above, if we now flip one of the bits in (3.1) or (3.2), we get 1 (4, 0) function and 3 (2, 2) functions. If we
flip another bit in (3.3), we get a function associated with 2 corrupted affine functions with a good path.
If we flip one of the bits in (3.4), we get a function not associated with any corrupted affine function that
also contributes a good path. We conclude that g is of type (4, 0, 3) with the same profile as above.
We can use similar arguments to characterize the neighbors of functions in V2 .
Proposition 3.21. Given a partial assignment to variables in V0 that is at distance 2 from an assignment
that satisfies Had3 , the following properties hold:
• All (4, 0) functions have 4 neighbors of type (7, 0, 0) and 4 of type (4, 0, 3).
• All (3, 1) functions have 1 neighbor of type (7, 0, 0), 4 of type (4, 0, 3), 2 of type (1, 2, 4) receiving bad paths, and 1 of type (1, 2, 4) receiving a good path.
We can now conclude the proof of this section by arguing that assignments that assign according to
majority for the (4, 0) and (3, 1) functions never do worse than other assignments.
Suppose we flip k of the (4, 0) functions and k of their negations away from majority. At most 8k
functions of type (4, 0, 3) will switch, and (7, 0, 0) functions will not switch unless 4 ≤ k ≤ 8. This means
that for a fixed assignment to the (3, 1) and (2, 2) functions, the weight of violated constraints will not be
less than that of assigning majority to all (4, 0) functions, unless k ≥ 4.
Further, observe that if two (7, 0, 0) functions f and g share a (4, 0) neighbor, then dH(f, g) = 2. Therefore they share exactly 2 (4, 0) neighbors. Thus if k = 4, 5, at most 1 (7, 0, 0) function and its negation will be switched.
Let us consider the case when k = 4. We are flipping 8 (4, 0) functions, generating 64 bad paths. This
should be compensated by turning at least 32 functions from the set of (7, 0, 0) and (4, 0, 3) functions
into functions that have no good paths. Suppose we choose 0 ≤ k1 ≤ 1 of the (7, 0, 0) functions and
0 ≤ k2 ≤ 64 of the (4, 0, 3) functions, where k1 + k2 = 32. Each (7, 0, 0) is connected to a set of 3 (3, 1)
functions, and these sets are disjoint for different (7, 0, 0) functions. Therefore, at least 3k1 of the (3, 1)
functions need to be flipped. Also, each (4, 0, 3) is connected to a set of 3 (3, 1) functions, and each (3, 1)
function is connected to 4 (4, 0, 3) functions, therefore at least 3k2 /4 (3, 1) functions need to be flipped.
Moreover, the number of functions flipped must be even because we are flipping both the function and its
negation. Combining this, we conclude that at least 24 of the (3, 1) variables need to be flipped, contributing 96 new bad paths. This means that we now have 64 + 96 = 160 bad paths and would need at least 80 functions in V1 to switch. But 80 is the total number of (7, 0, 0) and (4, 0, 3) functions, and since we assumed that at most 1 of the (7, 0, 0) functions can switch, this is already impossible.
For k > 4, the number of functions we need to potentially switch will only be higher, and therefore
flipping (4, 0) is never optimal.
Once we know that the (4, 0) functions should be assigned majority, the rest of the argument is easy. The only functions that could possibly switch are the (1, 2, 4) functions. However, when we flip a (3, 1) variable, we get 4 bad paths and can gain at most 1 switched function. Therefore flipping a (3, 1) variable in this case always increases the weight of violated constraints.
It is then easy to see that under the majority assignment, there is no switched function, and the total
number of bad paths is 288.
4 Optimality of the 11/8-gadget
In this section, we will construct an optimal solution to the dual LP given in Definition 2.30. This yields
the following theorem.
Theorem 4.1. [Theorem 1.8 restated] The value of the LP in Definition 2.30 is 11/64. As a result, for every (c, s)-gadget reducing Max-Had3 to Max-2-Lin(2),

s/c ≤ 11/8.
In other words, the gadget given in Theorem 3.1 is optimal among gadget reductions from Chan’s 7-ary
Hadamard predicate.
Proof. Our goal is to construct A ∈ R(Had3), i. e., a folded distribution of assignments which is random on the primary variables. For i ∈ {0, 1, 2}, denote by Ri(Had3) the set of distributions Ai such that the string

A(χ∅), A(χ{1}), A(χ{2}), A(χ{3}), A(χ{1,2}), A(χ{1,3}), A(χ{2,3}), A(χ{1,2,3})

is, over a random A ∼ Ai, distributed like a uniformly random element of {−1, 1}^8 which is at distance i
from satisfying the Had3 predicate. To prove Theorem 4.1, we will construct three separate distributions
A0 , A1 , and A2 with the property that Ai ∈ Ri (Had3 ) for each i ∈ {0, 1, 2}. Then, using Proposition 2.33,
we will set
A = (1/16)·A0 + (1/2)·A1 + (7/16)·A2.
Using (1) of Proposition 2.32, we can decompose {−1, 1}^8 = V0 ∪ V1 ∪ V2 (a disjoint union). A length-one edge (x, y) in {−1, 1}^8 has one of two types: either it goes between V0 and V1, or it goes between V1 and V2. Each
Ai distribution we construct will perform equally well on edges of a given type. As a result, for each
distribution Ai , we only need to keep track of two numbers, one for each edge type. To do so, for any
fixed edge (x, y) of type V0 ↔ V1, define

val(A′, V0 ↔ V1) := Pr_{A∼A′}[A(x) = A(y)],

and define val(A′, V1 ↔ V2) analogously. For convenience in this section, we will keep track of these val(·) parameters rather than the corresponding uval(·) parameters.
For each i ∈ {0, 1, 2}, our goal is to construct the Ai which strikes the right balance between making
val(Ai ,V0 ↔ V1 ) large and making val(Ai ,V1 ↔ V2 ) large. The next three lemmas show what we achieve.
Lemma 4.2. There exists a distribution A0 ∈ R0 (Had3 ) such that
val(A0, V0 ↔ V1) = 7/8   and   val(A0, V1 ↔ V2) = 7/8.
Lemma 4.3. There exists a distribution A1 ∈ R1 (Had3 ) such that
val(A1, V0 ↔ V1) = 25/32   and   val(A1, V1 ↔ V2) = 7/8.
Lemma 4.4. There exists a distribution A2 ∈ R2 (Had3 ) such that
val(A2, V0 ↔ V1) = 7/8   and   val(A2, V1 ↔ V2) = 43/56.
By setting A := (1/16)·A0 + (1/2)·A1 + (7/16)·A2, we see that

val(A, V0 ↔ V1) = val(A, V1 ↔ V2) = 53/64.
Finally, 1 − 53/64 = 11/64, and this is the number guaranteed by the theorem.
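The mixture computation can be double-checked with exact rational arithmetic. The sketch below (our illustration) just combines the values from Lemmas 4.2–4.4 with the stated weights:

```python
from fractions import Fraction as F

# (val on V0<->V1 edges, val on V1<->V2 edges), per Lemmas 4.2-4.4
vals = [(F(7, 8), F(7, 8)),      # A0
        (F(25, 32), F(7, 8)),    # A1
        (F(7, 8), F(43, 56))]    # A2
weights = [F(1, 16), F(1, 2), F(7, 16)]

v01 = sum(w * v[0] for w, v in zip(weights, vals))
v12 = sum(w * v[1] for w, v in zip(weights, vals))
print(v01, v12, 1 - v01)   # -> 53/64 53/64 11/64
```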
We now prove Lemmas 4.2, 4.3, and 4.4. The first two are straightforward, but Lemma 4.4 is relatively
involved.
Proof of Lemma 4.2. The distribution A0 simply picks a random (negated) dictator assignment. More
formally, it does the following:
1. Pick b ∈ {−1, 1} and i ∈ {1, 2, . . . , 8} independently and both uniformly at random.
2. Let di be the i-th dictator function.
3. Output the function b · di .
A random dictator will satisfy any fixed edge of the hypercube with probability 7/8. Thus, it remains to
prove that A0 ∈ R0 (Had3 ), and this is easy to check.
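The 7/8 figure can be confirmed by enumerating all length-one edges of {−1, 1}^8 against all 16 (possibly negated) dictators. This is an illustrative check of ours, not code from the paper:

```python
from fractions import Fraction
from itertools import product

agree = total = 0
for x in product([-1, 1], repeat=8):
    for i in range(8):
        y = x[:i] + (-x[i],) + x[i + 1:]   # neighbor of x across coordinate i
        for j in range(8):                  # dictator index
            for b in (-1, 1):               # optional global negation
                agree += (b * x[j] == b * y[j])
                total += 1
print(Fraction(agree, total))               # -> 7/8
```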
Proof of Lemma 4.3. The distribution A1 picks a random (negated) dictator assignment and corrupts its
value on a single primary variable. More formally, it does the following:
1. Sample A from A0.
2. Let x be a uniformly random primary variable.
3. Set Ã(x) := −A(x) and Ã(−x) := −A(−x). Set Ã := A on all other inputs.
4. Output Ã.
Note that whenever Ã is generated from A, Ã and A agree on all variables in V1 and V2. As a result,

val(A1, V1 ↔ V2) = val(A0, V1 ↔ V2) = 7/8.
Next, fix an edge (x, y) where x ∈ V0 and y ∈ V1. Then Ã and A agree on (x, y) unless either x or −x is selected as the primary variable in Step 2, in which case they disagree on x. By (1) of Proposition 2.32, there are 16 primary variables, meaning this event occurs with probability 1/8. As a result,

val(A1, V0 ↔ V1) = (7/8)·val(A0, V0 ↔ V1) + (1/8)·(1 − val(A0, V0 ↔ V1)) = 25/32.
It remains to check that A1 ∈ R1(Had3); this follows from the fact that A0 ∈ R0(Had3) and that Ã is chosen by randomly perturbing A on a single primary variable.
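A quick arithmetic check of the last display, using the value from Lemma 4.2:

```python
from fractions import Fraction as F

v00 = F(7, 8)                              # val(A0, V0 <-> V1), from Lemma 4.2
v1 = F(7, 8) * v00 + F(1, 8) * (1 - v00)   # the corruption hits a fixed edge with probability 1/8
print(v1)                                  # -> 25/32
```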
Proof of Lemma 4.4. The distribution A2 is itself a mixture of two other distributions, A02 and A12 . We
now describe them separately and combine them later.
Constructing A02 . The distribution A02 is generated by the following process:
1. Sample di uniformly at random from the set of dictators and negated dictators.
2. Form d̃i by negating di on two uniformly random primary variables (and their negations).
3. Set A : V → {−1, 1} to agree with dei on V0 .
[Figure 3 diagram omitted: the neighborhood of z, drawn with x̃ = x1.]

Figure 3: The neighborhood of z. The variables are labeled with the assignment di gives them. Note that the assignment of z agrees with all but one of its neighbors and all but one of the xj. Further, the two variables which disagree with z are adjacent.
4. For every y ∈ V1 , let x be y’s neighbor in V0 .
• If A(x) = di (x), set A(y) := di (y).
• Otherwise, set A(y) := A(x).
5. For every z ∈ V2 , set A(z) to the majority value of A on z’s eight neighbors in V1 . (Ties will not
occur.)
By the construction of A, it is immediate that A02 ∈ R2 ({−1, 1}8 ). As for its performance on the two edge
types, we have the following proposition.
Proposition 4.5. val(A02 ,V0 ↔ V1 ) = 29/32 and val(A02 ,V1 ↔ V2 ) = 167/224.
Proof. Let x ∈ V0 and y ∈ V1 be neighbors. With probability 3/4, A(x) = di (x), in which case A(y) = di (y).
Conditioning on this event, A(y) = A(x) with 7/8 probability, as the dictator assignment satisfies each
edge with probability 7/8. On the other hand, when A(x) ≠ di(x), which happens with probability 1/4, A(y) always equals A(x). Thus,

val(A02, V0 ↔ V1) = (3/4)·(7/8) + 1/4 = 29/32.
To compute val(A02 ,V1 ↔ V2 ), we will condition the output of A02 on the choice of the (possibly
negated) dictator di in Step 1. Let us focus on a particular z ∈ V2 . We will now describe what the
immediate neighborhood of z looks like.
Let N(z) ⊆ V1 denote the neighborhood of z, and let x1 , x2 , x3 , and x4 be the four points in V0
which are distance two from z. Regardless of what di is, (di (x1 ), di (x2 ), di (x3 ), di (x4 )) will always have a
majority size of three. Furthermore, di(z) agrees with this majority. Let x̃ ∈ {x1, x2, x3, x4} be the variable in the minority, and assume without loss of generality that x̃ = x1. Then of the two elements in N(z) which neighbor x1, di assigns one the value 1 and the other the value −1; for the remaining elements y ∈ N(z), di(y) = di(z). A pictorial representation of this is given in Figure 3.
Consider the two primary variables which were selected in Step 2. If a given primary variable x j is
selected, then the value given to it by di is negated when forming A. Furthermore, all of x j ’s neighbors in
V1 are assigned the value A(xj). Pictorially, if x3 in Figure 3 were selected in Step 2, then it and its two neighbors' values would be flipped to −1 in A. Similarly, if x̃ = x1 in Figure 3 were selected, then both it and its two neighbors would be given the value 1 by A.
For a set of Boolean variables x1, . . . , xk ∈ {−1, 1} where there is a majority value (in other words, ∑i xi ≠ 0), define Maj-Size(x1, . . . , xk) as the number of variables whose value equals the majority value.
The value A assigns to z is just the majority value of z’s neighbors, so A’s success on the edges neighboring
z will just be the fractional size of this majority. If x1 is selected in Step 2, then the value given to every
y ∈ N(z) agrees with the value given to y’s neighbor in the variables x j . Thus, in this case, the majority
size of z’s neighbors will just be 2 · Maj-Size(A(x1 ), A(x2 ), A(x3 ), A(x4 )). Otherwise, the two neighbors
of x1 will disagree with one another, and the majority size will be 1 + 2 · Maj-Size(A(x2 ), A(x3 ), A(x4 )).
Now, we split into cases based on how many elements of {x1 , x2 , x3 , x4 } were selected in Step 2.
Case 0 occurs when zero were selected, Case 1 when one was selected, and so forth.
Case 0: In this case x1 is not selected, and Maj-Size of the remaining x j is three. As a result, the success
probability is (1 + 2 · 3)/8 = 7/8.
Case 1: We split into two subcases depending on whether x1 was selected.
Case x1 is selected: In this case, Maj-Size(A(x1 ), A(x2 ), A(x3 ), A(x4 )) = 4, and so the success
probability is 1.
Case x1 is not selected: In this case, Maj-Size(A(x2 ), A(x3 ), A(x4 )) = 2, and so the success prob-
ability is (1 + 2 · 2)/8 = 5/8.
The first subcase occurs with probability 1/4 and the second subcase occurs with probability 3/4.
Combined, this means that A succeeds with probability 1/4 + (3/4) · (5/8) = 23/32 in this case.
Case 2: Again, we split into two subcases depending on whether x1 was selected.
Case x1 is selected: In this case, Maj-Size(A(x1 ), A(x2 ), A(x3 ), A(x4 )) = 3, so the success proba-
bility is (2 · 3)/8 = 6/8.
Case x1 is not selected: In this case, Maj-Size (A(x2 ), A(x3 ), A(x4 )) = 2, and so the success prob-
ability is (1 + 2 · 2)/8 = 5/8.
Both subcases occur with probability 1/2. As a result, A succeeds in this case with probability (1/2)·(3/4 + 5/8) = 11/16.
There are 82 possible choices for the two primary variables which were negated to form dei . Of these, 42
yield Case 0, 4 · 4 yield Case 1, and 42 yield Case 2. This gives a total success probability of
4
4
7 4 · 4 23 11 167
val(A02 ,V1 ↔ V2 ) = 2
8
· + 8 · + 2
8
· = ,
2
8 2
32 2
16 224
as guaranteed by the proposition.
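The weighted-case arithmetic above can be checked mechanically; the following sketch (ours, not part of the argument) re-evaluates the three cases with exact rational arithmetic:

```python
from fractions import Fraction as F
from math import comb

pairs = comb(8, 2)  # choices of the two negated primary variables
total = (F(comb(4, 2), pairs) * F(7, 8)       # Case 0
         + F(4 * 4, pairs) * F(23, 32)        # Case 1
         + F(comb(4, 2), pairs) * F(11, 16))  # Case 2
assert total == F(167, 224)
```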
T HEORY OF C OMPUTING, Volume 13 (19), 2017, pp. 1–51 32
I MPROVED NP-I NAPPROXIMABILITY FOR 2-VARIABLE L INEAR E QUATIONS
Constructing A_2^1. In constructing A_2^1, it will be convenient to view V as the set of Boolean functions on 3 variables, V = { f : {−1, 1}^3 → {−1, 1} }. The distribution A_2^1 is generated by the following process:
1. Pick x1 , x2 , x3 ∈ {−1, 1}3 .
2. Set (x4 )i = (x1 )i · (x2 )i · (x3 )i for i ∈ [3].
3. Pick b1 , . . . , b4 ∈ {−1, 1} subject to b1 b2 b3 b4 = −1.
4. Set A1 , . . . , A4 such that Ai ( f ) = bi · f (xi ) for i ∈ [4].
5. Pick a uniformly random M : {−1, 1}4 → {−1, 1} subject to
• M(a_1, . . . , a_4) = sgn(a_1 + · · · + a_4) when a_1 + · · · + a_4 ≠ 0, and
• M(a1 , . . . , a4 ) = −M(−a1 , . . . , −a4 ) otherwise.
6. Output A( f ) = M(A1 ( f ), . . . , A4 ( f )).
The construction of A12 is similar to the construction of the variables in V2 in the proof of (7) of Proposi-
tion 2.32. Via the correspondence shown in the proof of Proposition 2.33, item (7) of Proposition 2.32
shows that A12 ∈ R2 (Had3 ). As for its performance on the two edge types, we have the following
proposition.
Proposition 4.6. val(A12 ,V0 ↔ V1 ) = val(A12 ,V1 ↔ V2 ) = 13/16.
Proof. Let f, f′ : {−1, 1}^3 → {−1, 1} ∈ V be neighboring variables, i. e., they differ on one input. Let x̃ be the input on which f and f′ differ. Let us condition on whether x̃ ∈ {x_1, . . . , x_4}. As there are eight elements of {−1, 1}^3, both cases occur with probability 1/2.
Case x̃ ∉ {x_1, . . . , x_4}: In this case, A_i(f) = A_i(f′) for all i, so A(f) always equals A(f′).
Case x̃ ∈ {x_1, . . . , x_4}: In this case, one of the two strings (f(x_1), . . . , f(x_4)) and (f′(x_1), . . . , f′(x_4)) has even parity and the other has odd parity. Without loss of generality, assume that (f(x_1), . . . , f(x_4)) has odd parity.
Over the random choice of the b_i, the string (b_1 f(x_1), . . . , b_4 f(x_4)) is distributed like a uniformly random string with even parity. In particular, the probability that all four entries are the same is 1/4. When this occurs, A(f) = A(f′). This holds because if, for instance, (b_1 f(x_1), . . . , b_4 f(x_4)) = (1, 1, 1, 1), then (b_1 f′(x_1), . . . , b_4 f′(x_4)) has three 1's, and so the majority value is 1.
This means that with probability 3/4, (b_1 f(x_1), . . . , b_4 f(x_4)) has two 1's and two (−1)'s. Fix the b_i, and consider the randomness over the choice of M. In this case, A(f) will be a uniformly random ±1 bit, because M is uniformly random when there is no clear majority. On the other hand, (b_1 f′(x_1), . . . , b_4 f′(x_4)) will have a clear majority, so A(f′) takes a deterministic value. Thus, in this case, A(f) = A(f′) with probability 1/2.
J OHAN H ÅSTAD , S ANGXIA H UANG , R AJSEKAR M ANOKARAN , RYAN O’D ONNELL , AND J OHN W RIGHT
Putting it all together. As a result,
$$\Pr[A(f) = A(f')] = \frac{1}{2} + \frac{1}{2}\left(\frac{1}{4} + \frac{3}{4} \cdot \frac{1}{2}\right)\,,$$
which equals 13/16. As this holds for any neighboring functions f and f′, we get our desired conclusion, which is that val(A_2^1, V_0 ↔ V_1) = val(A_2^1, V_1 ↔ V_2) = 13/16.
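The probability computation can again be verified with exact rationals (a mechanical check of the case analysis above, not part of the argument):

```python
from fractions import Fraction as F

# Pr[agree] = Pr[x~ not sampled] + Pr[x~ sampled] * (Pr[all same] + Pr[tie] * 1/2)
p = F(1, 2) + F(1, 2) * (F(1, 4) + F(3, 4) * F(1, 2))
assert p == F(13, 16)
```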
Combining A_2^0 and A_2^1: The final distribution A_2 is given by the mixture A_2 = (2/3)A_2^0 + (1/3)A_2^1. Because A_2^0 and A_2^1 are both elements of R_2(Had_3), it follows that A_2 ∈ R_2(Had_3). Furthermore, by combining Propositions 4.5 and 4.6, we see that
$$\mathrm{val}(A_2, V_0 \leftrightarrow V_1) = \frac{7}{8} \quad\text{and}\quad \mathrm{val}(A_2, V_1 \leftrightarrow V_2) = \frac{43}{56}\,,$$
as promised by the lemma.
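A quick exact-arithmetic check of the mixture, using the value 167/224 from Proposition 4.5 and 13/16 from Proposition 4.6 (a sketch, not part of the argument):

```python
from fractions import Fraction as F

# val(A2, V1 <-> V2) = (2/3) * val(A2^0) + (1/3) * val(A2^1)
v12 = F(2, 3) * F(167, 224) + F(1, 3) * F(13, 16)
assert v12 == F(43, 56)
```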
This completes the construction of the distributions A0 , A1 , and A2 , thereby completing the theorem.
5 A candidate factor-3/2 hardness reduction
5.1 The Game Show Conjecture
Herein we present an interesting problem concerning analysis of Boolean functions. We make a conjecture
about its solution which, if true, implies NP-hardness (with quasilinear blowup) of factor-(3/2 − δ )
approximating 2-Lin(2) for any δ > 0.
Definition 5.1. Let g : {−1, 1}^n → {−1, 1} be a folded function (i. e., g(−x) = −g(x)). The Game Show,
played with Middle Function g, works as follows. There are two personages: the Host and the Contestant.
Before the game begins, the Host secretly picks a uniformly random monotone path π from (1, 1, . . . , 1) to
(−1, −1, . . . , −1) in the Hamming cube. (Equivalently, π is a uniformly random permutation on [n].) The
Host also secretly picks T ∼ Binomial(n, 1/2). We define the secret half-path to be the sequence of the
first T edges along π: (x0 , x1 ), (x1 , x2 ), . . . , (xT −1 , xT ). Note that xT is uniformly distributed on {−1, 1}n .
The Game now begins, with the current time being t = 0, the current point being x0 = (1, 1, . . . , 1),
and the current function being g̃ = g. (The current function will always be ±g.)
At each time step t = 0, 1, 2, . . . , the Host asks whether the Contestant would like to negate the current function, meaning replace g̃ with −g̃. If the Contestant does not negate the current function there is no cost. However, if the Contestant elects to negate the current function, the Contestant must pay a cost of
$$w(t) := \frac{1}{(1 - t/n)^2}\,. \tag{5.1}$$
After the Contestant makes the decision, the Host reveals to the Contestant what the (t + 1)th point on the
secret half-path is, and the new time becomes t + 1.
As soon as time T is reached, the Game ends. (In particular, in the unlikely case that T = 0, the Contestant does not get to make any decisions.) At this instant, if g̃(x_T) ≠ 1, then the Contestant incurs a further cost of w(T). (It is as though the Contestant is now obliged to negate g̃.) Thus one can think of the Contestant's goal throughout the Game as trying to ensure that g̃(x_T) will equal 1, while trying to minimize the total cost incurred by all negations.
We define cost(g) to be the least expected cost that a Contestant can achieve when the Game Show is
played with Middle Function g.
To get a feel for the Game Show, let us make some observations. First, as mentioned, the negation cost w(t) is an increasing function of time; i. e., the later it is in the Game, the more costly it is for the Contestant to negate g̃. The cost to negate at the beginning of the Game is w(0) = 1 and the cost to negate when the Game ends is w(T) ∼ 4 (with very high probability, since T = (1/2 ± o_n(1))n with very high probability). As a consequence, we always have cost(g) ≤ 2 + o_n(1). The reason is that the Contestant can always use the strategy of never negating g̃ unless obliged to at the end of the Game. In this case, the final evaluation will be g(x) where x is uniformly random. By oddness of g, this evaluation is −1 with probability exactly 1/2, and only in this case does the Contestant suffer a cost, namely 4 ± o_n(1) (with high probability).
It can be shown that the best Middle Function g for the Contestant to play with is any (positive) dictator, say g = d_i. It is easy to check that the Contestant's best strategy is the obvious one: negate if and only if the Host restricted coordinate i to −1 on the previous turn. To estimate the expected cost of this strategy, first note that the probability that the Host restricts coordinate i during the course of the game is 1/2. If the Host never restricts coordinate i then the Contestant will have cost 0. Otherwise, conditioned on the Host restricting coordinate i over the course of the game, the Contestant will have cost w(t*), where t* is "essentially" uniformly distributed on {1, 2, . . . , n/2}. At this point it is natural to introduce the "continuous" time parameter u = t/n, which ranges in [0, 1/2] (with very high probability),
as well as the function
$$W(u) := w(un) = \frac{1}{(1-u)^2}\,, \tag{5.2}$$
which increases from 1 to 4 on [0, 1/2]. The distribution of u* = t*/n is "essentially" uniform on [0, 1/2]. More precisely, one may check that up to o_n(1) errors, the expected cost to the Contestant conditioned on coordinate i being restricted is just the average value of W, namely
$$\int_0^{1/2} W(u) \cdot 2\,du = 2\,.$$
Thus we finally conclude that cost(d_i) ∼ (1/2) · 2 = 1.
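This heuristic estimate can be corroborated numerically. The following sketch (ours, not part of the argument; n = 2000 is an arbitrary choice) evaluates the exact expected cost E[w(t*) · 1{t* ≤ T}] of the strategy just described, with t* the step at which coordinate i is restricted (uniform on [n]) and T ∼ Binomial(n, 1/2); the t = n term, which has probability 2^{−n}, is dropped.

```python
import math

n = 2000
two_n = 2 ** n

# Pr[T >= t] for T ~ Binomial(n, 1/2), via exact big-integer tail sums
tail = [0] * (n + 2)
for k in range(n, -1, -1):
    tail[k] = tail[k + 1] + math.comb(n, k)
surv = [tail[t] / two_n for t in range(n + 2)]

w = lambda t: 1.0 / (1.0 - t / n) ** 2

# Strategy for g = d_i: negate exactly when coordinate i is restricted,
# i.e. pay w(t*) when t* <= T; t* is uniform on [n], independent of T.
cost = sum(w(t) * surv[t] for t in range(1, n)) / n
assert abs(cost - 1.0) < 0.05  # matches cost(d_i) ~ 1 up to o_n(1)
```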
Let us now look at the best strategy when the Middle Function is a negated dictator; i. e., let us try to determine cost(−d_i). Playing the Game Show with Middle Function −d_i is more stressful for the Contestant because at the beginning of the game we have g̃(x) = −1. Assume the Contestant elects not to negate g̃ for a while at the beginning of the Game. If ever the Host restricts the ith coordinate to −1 then the Contestant can relax, knowing that no costs at all will be incurred. However as time progresses without the ith coordinate being restricted, the Contestant will naturally get more and more nervous that the game will end with g̃(1, 1, . . . , 1) = −1, forcing a cost of essentially 4. On the other hand, if the Contestant decides to "preemptively" negate, there is still some chance that the ith coordinate will
subsequently be restricted before the game ends, forcing the Contestant to negate again. Actually, it is not hard to show that the best strategy for the Contestant is of the following form, for some value u_0 ∈ [0, 1/2]: "If ever the Host restricts the ith coordinate to −1, negate g̃ if necessary and then never negate again. Otherwise, wait until the continuous time hits u_0 and then negate g̃." For example, the u_0 = 0 case of this strategy involves negating immediately at the game's beginning (incurring cost 1) and then playing the optimal strategy for a positive dictator (incurring expected cost 1 ± o_n(1)). The total expected cost is 2 ± o_n(1). As another example, the u_0 = 1/2 case of this strategy is the generic strategy of never negating unless forced to at the game's end. This case also has an expected cost of 2 ± o_n(1). In fact, one can do a short calculation to show that the expected cost is 2 ± o_n(1) for every value of u_0. This is by design: the particular cost function W is the unique function with this property.
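The claim that every threshold u_0 yields expected cost 2 ± o_n(1) can likewise be checked at finite n. The sketch below (ours, not part of the argument; n = 2000 and the sample thresholds are arbitrary) evaluates the exact expected cost of the threshold strategy for −d_i, splitting into the three relevant events: preemptive negation at time u_0·n, a later restriction of coordinate i forcing a second negation, or the game ending before u_0·n with i never restricted.

```python
import math

n = 2000
two_n = 2 ** n
comb = [math.comb(n, k) for k in range(n + 1)]
tail = [0] * (n + 2)
for k in range(n, -1, -1):
    tail[k] = tail[k + 1] + comb[k]
surv = [tail[t] / two_n for t in range(n + 2)]  # Pr[T >= t]
pmf = [c / two_n for c in comb]                 # Pr[T = k]
w = lambda t: 1.0 / (1.0 - t / n) ** 2

def threshold_cost(u0):
    """Exact expected cost of: 'if coordinate i is restricted, relax;
    otherwise negate when continuous time hits u0', on Middle Function -d_i."""
    t0 = round(u0 * n)
    c = w(t0) * surv[t0] * (1 - t0 / n)                       # preemptive negation at t0
    c += sum(w(t) * surv[t] for t in range(t0 + 1, n)) / n    # i restricted later: negate again
    c += sum(pmf[T] * w(T) * (1 - T / n) for T in range(t0))  # game ends first: forced negation
    return c

for u0 in (0.1, 0.25, 0.4):
    assert abs(threshold_cost(u0) - 2.0) < 0.05
```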
We have concluded that cost(d_i) ∼ 1 and cost(−d_i) ∼ 2. To now state our conjecture about the Game Show, we need a piece of notation; for g : {−1, 1}^n → {−1, 1} and "negation pattern" b ∈ {−1, 1}^n, we write g^{+b} to denote the function defined by g^{+b}(x) = g(b_1 x_1, . . . , b_n x_n). Roughly speaking, our conjecture about the Game Show is that for every odd g, the average value of cost(g^{+b}) over all b is at least 3/2. To be precise, we need to be concerned with averaging over merely pairwise-independent distributions on b.
Game Show Conjecture. Let g : {−1, 1}^n → {−1, 1} be odd and let D be any distribution on {−1, 1}^n which is pairwise-independent and symmetric (meaning Pr_D[b] = Pr_D[−b]). Then
$$\mathop{\mathbf{E}}_{b \sim D}[\mathrm{cost}(g^{+b})] \geq \frac{3}{2} - o_n(1)\,.$$
Our motivation for making the Game Show Conjecture is the following result.
Theorem 5.2. Suppose the Game Show Conjecture is true. Then it is NP-hard to approximate 2-Lin(2)
(and hence also Max-Cut) to factor 3/2 − δ for any δ > 0.
We remark that given a Middle Function g, in some sense it is “easy” to determine the Contestant’s
best strategy. It can be done with a dynamic program, since the Game Show is essentially a 2-Lin(2)
instance on a tree graph. Nevertheless, we have been unable to prove the Game Show Conjecture. We
will discuss some of our efforts in Section 5.3. First, however, we will prove Theorem 5.2.
5.2 Proof of Theorem 5.2
The proof is by construction of a gadget as in Definition 2.26.
Theorem 5.3. Suppose the Game Show Conjecture is true. Then for each k, there exists a (c, s)-gadget reducing Max-Had_k to Max-2-Lin(2) satisfying
$$\frac{s}{c} \geq \frac{3}{2} - o_k(1)\,.$$
Via Corollary 2.28, this shows that for every ε > 0, it is NP-hard to approximate 2-Lin(2) to factor 3/2 − o_k(1) − ε. Thus, by taking k large enough and ε small enough so that o_k(1) + ε ≤ δ, we get Theorem 5.2.
Now we prove Theorem 5.3.
Proof of Theorem 5.3. Set n := 2^k (previously we have used K for this number, but for this proof we will use n). For the gadget G, let {−1, 1}^n denote its variable set as usual, and let Z be the set of 2n primary variables. For reasons that will become clearer later, we will call the variables of G the Contestant Strategy variables. As in the definition of a (c, s)-gadget, we need only consider assignments A : {−1, 1}^n → {−1, 1} which are folded.
We will add one twist to this construction: we will have an additional collection of variables, identified with {−1, 1}^n, called the Middle Function variables. We will write g : {−1, 1}^n → {−1, 1} for assignments to these variables, and we will use folding here as well so that each g can be assumed to be folded. We remark that it would actually be okay if the Middle Function variables and the Contestant Strategy variables were identified; however, we find it conceptually clearer to separate them, and it does not affect our analysis of the gadget's quality.
We now describe the constraints we put on our gadget; these are highly reminiscent of the Game
Show described in the previous section. All of the constraints are equality tests along Hamming edges
(either within the Contestant Strategy/Middle Function hypercubes, or between them). We will henceforth
refer to gadget variables as “points,” and assume that n is at least a sufficiently large universal constant.
Overall Gadget Test G:
With probability δ , run the Average Sensitivity Test; with probability 1 − δ , run the Game Show Test.
Average Sensitivity Test:
Choose a uniformly random edge (y, y′) in the Middle Function hypercube and test that g(y) = g(y′).
Game Show Test:
Choose a random primary point x_0 ∈ Z in the Contestant Strategy cube. Choose a random path (x_0, x_1), (x_1, x_2), . . . , (x_{n−1}, x_n) from x_0 to x_n := −x_0. Choose T to be a Binomial(n, 1/2) random variable, conditioned on being in the range n/2 ± √(n log n). Choose t ∈ [T] such that Pr[t = t] is proportional to w(t) (as defined in (5.1)). Finally, if t < T then test that A(x_{t−1}) = A(x_t); otherwise, test that A(x_{t−1}) = g(x_T). (In the former case, both points are taken from the Contestant Strategy cube; in the latter case, x_{t−1} is taken from the Contestant Strategy cube and x_T is taken from the Middle Function cube.)
Here δ will be some decreasing function of n which we specify implicitly later. We first prove a little
lemma:
Lemma 5.4. Let 1 ≤ t ≤ n/2 + √(n log n) and write u = t/n. Then in the Game Show Test,
$$\Pr[t = t] = \frac{W(u)}{n} \cdot (1 \pm o_n(1))\,,$$
where W is defined as in (5.2).
Proof. By definition, all we need to do is show that the "constant of proportionality" in the Game Show Test, namely $\sum_{t=1}^{T} w(t)$, is equal to n(1 ± o_n(1)), uniformly for each outcome of T. Since w is an increasing function, this discrete sum is bounded between the integrals $\int_0^T w$ and $\int_1^{T+1} w$. Using the fact that $\int_0^{1/2} W = 1$, it is easy to compute that both of the bounding integrals are n ± O(√(n log n)) when T is n/2 ± √(n log n).
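The bracketing in this proof is easy to check numerically at the typical outcome T = n/2 (a sketch of ours, not part of the argument; n = 4000 is an arbitrary choice):

```python
import math

n = 4000
T = n // 2  # a typical outcome of T
w = lambda t: 1.0 / (1.0 - t / n) ** 2
S = sum(w(t) for t in range(1, T + 1))

# w is increasing, so the discrete sum is bracketed by the two integrals from the proof
lower = n * (1 / (1 - T / n) - 1)                      # integral of w over [0, T]
upper = n * (1 / (1 - (T + 1) / n) - 1 / (1 - 1 / n))  # integral of w over [1, T+1]
assert lower <= S <= upper
assert abs(S / n - 1.0) < 0.01  # the normalizing constant is n(1 +- o_n(1))
```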
It is easy to see that in both the Average Sensitivity Test and the Game Show Test, we have complete symmetry with respect to the direction in [n] of the edge being tested for equality. Thus, for any dictator function d_i, the probability of rejecting when A = g = d_i is indeed precisely 1/n. Thus, we need only show that s ≥ (3/2 − o_n(1))/n for G to prove Theorem 5.3.
Theorem 5.5 (Soundness). Assume the Game Show Conjecture. Suppose A's assignments to the 2n primary points Z are chosen uniformly at random. Then regardless of how g and the remaining values of A are chosen, the probability that the Overall Gadget Test rejects is, in expectation, at least
$$\frac{3/2 - o_n(1)}{n}\,,$$
provided that δ = o_n(1) is suitably chosen.
Proof. Let us ignore the Average Sensitivity Test for a moment and focus on the Game Show Test. It is
evidently somewhat similar to the Game Show as played with Middle Function g : {−1, 1}n → {−1, 1}.
In brief, the key differences are: (i) the Game Show Test starts its path from a random point in Z, rather
than from (1, 1, . . . , 1); (ii) in the Game Show Test, the Contestant/A-chooser’s job is even harder than in
the Game Show, because the entire strategy A must be fixed before the Game begins. In other words, it is
as though the Contestant must act at time t independently of what happened prior to time t.
Having given a little intuition, let us elaborate by carefully phrasing the operation of the Game Show
Test in language similar to that of the Game Show. We incorporate into the Game Show Test A’s random
assignment to the primary points Z; we can imagine that the Host announces uniformly random values
A(z) for each z ∈ Z that the Contestant must use. In the next step of the Game Show Test, we have
the Contestant fix an odd Middle Function g; it is important to note that the Contestant gets to do this
after seeing the random values (A(z))z∈Z . Next we can imagine that the Host secretly picks a uniformly
random primary point z0 ∈ Z and a random half-path from it, (z0 , z1 ), . . . , (zT −1 , zT ). The Host will be
testing equality of A’s values on a random (according to w) edge along this half-path, using the value of g
instead of A for the last point along the half-path. However in the Game Show Test, the Contestant must fix an entire strategy (A(x))_{x∉Z} before the Host reveals which edge is tested.
We will now make things easier on the Contestant; this can only decrease the probability of the test rejecting. Specifically, we will not force the Contestant to immediately announce an entire strategy (A(x))_{x∉Z}. Rather, the Host will reveal the edges (z_0, z_1), (z_1, z_2), . . . one-by-one, just as in the Game Show, and will only require the Contestant to decide on A(z_t) in the tth step of this process. Note that A(z_0) is already fixed, as is g(z_T). Once z_T is revealed, the Host will then choose t as in the Game Show Test and do the equality-test on the tth edge.
Now let us make a few more viewpoint changes, so that the Game Show Test becomes even more
similar to the Game Show. As it has now been described, the Game Show Test starts at a random primary
point z0 ∈ Z, and the Contestant is obliged to use the Host’s initially randomly chosen value A(z0 ). Let us
change this so that once the Host chooses z0 ∈ Z at random, the function g is immediately replaced by
ge = A(z0 )g+z0 = g+A(z0 )z0 . (The last equality uses the fact that g is odd.) In this way, we may equivalently
assume that (as in the Game Show) the random half-path always originates from (1, 1, . . . , 1), and that the
Contestant is obliged to use the assignment A(1, 1, . . . , 1) = 1. Now as the Game Show Test proceeds,
when the Host reveals the tth edge, the Contestant is allowed to specify whether A(zt ) should be equal to
A(z_{t−1}), or equal to its negation. Note that it is "costly" for the Contestant to choose negation, in the sense that it yields an "unequal" edge that will increase the Host's rejection probability by w(t). This process proceeds until z_T is reached, whereupon the Contestant is committed to the final assignment value g(z_T). In our final viewpoint change, instead of allowing the Contestant to either "keep" or "negate" the value assigned at time t − 1, it is equivalent to allow the Contestant to either keep g̃ or replace it by −g̃. The latter choice corresponds to creating an "unequal" edge and incurring cost (rejection probability) w(t). The fact that the Contestant is obliged to use the initial assignment 1 (recall we initially multiply g^{+z_0} by A(z_0)) in the test corresponds to the fact that in the Game Show, the Contestant is obliged to end on 1.
It may seem as though we have by now shown that the Game Show Test is equivalent to playing the Game Show with Middle Function g^{+A(z_0)z_0}, where z_0 ∈ Z is chosen uniformly at random, in the sense that the rejection probability of the test is equal to the expected value of cost(g^{+A(z_0)z_0}). However there is one catch: in the Game Show Test, g may be chosen after the random assignments (A(z))_{z∈Z} to the primary points. The remainder of the proof is devoted to showing that the Contestant cannot effectively "take advantage" of this.
For this we return to the actual Overall Gadget Test, which performs the Average Sensitivity Test with probability δ and the Game Show Test with probability 1 − δ. (Note that our theorem statement tolerates the loss of a factor (1 − δ) in soundness here.) The purpose of the Average Sensitivity Test is to ensure that the chosen g must essentially be a junta. More precisely, we see that if the chosen g has average sensitivity exceeding (3/2)/δ then the Average Sensitivity component of the Overall Gadget Test already rejects with probability exceeding (3/2)/n. Thus we can assume the chosen g has average sensitivity at most (3/2)/δ. It follows from Friedgut's Junta Theorem [12] that g must be δ-close to a junta on some J = 2^{O(1/δ^3)} variables. Using the fact that g is odd, one can also ensure that this junta is odd.¹ Next, note that the Game Show Test only involves g with probability
$$\frac{W(1/2)}{n} \cdot (1 \pm o_n(1)) \leq \frac{5}{n}$$
(using Lemma 5.4). Furthermore, when it does involve g, the point x_T on which it involves g is nearly uniformly random in {−1, 1}^n. The difference from uniform randomness comes because we conditioned T to lie in the range n/2 ± √(n log n); however, T lies outside this range only with probability o_n(1). Thus we see that if we simply replace g with its δ-close odd J-junta, we only affect the Overall Gadget Test's rejection probability by at most (5/n)(δ + o_n(1)), an amount we can absorb in the theorem statement. In summary, we may freely assume that the chosen g is always an odd junta on J = 2^{O(1/δ^3)} coordinates.
Finally, to complete the proof by applying the Game Show Conjecture, it would remain to show
that with probability 1 − on (1) over the initial random choice of (A(z))z∈Z , for all choices of J out of n
coordinates, the projection of the distribution A(z0 )z0 to the coordinates J is pairwise independent and
symmetric. This is not quite correct, but in Lemma 5.6 below we show that each projection is O(δ )-close
in total variation distance to being simultaneously pairwise-independent and symmetric. This is sufficient
to complete the proof.
Lemma 5.6. Suppose n is sufficiently large as a function of δ. Suppose that for each of the 2n strings z ∈ Z a uniformly random bit a_z is chosen. Then except with probability o_n(1), for every fixed set of J = J(δ) coordinates in [n] (in particular, for J = 2^{O(1/δ^3)}), the uniform distribution on the set of strings (a_z z_J)_{z∈Z} is O(δ)-close (in fact, o_n(1)-close) to being simultaneously pairwise-independent and symmetric.
¹The proof shows g is close to sgn($\sum_{S \subseteq J} \hat g(S)\chi_S$) for some |J| ≤ J, using an arbitrary convention for sgn(0). One need only ensure that any sgn(0) choices that arise are made in an oddness-preserving way.
Proof. We begin just by showing the pairwise-independence property; here we will not need that J is small compared to n. For any particular pair of coordinates (i, j) in [n], the list of two-bit strings (z_{{i,j}})_{z∈Z} has n/4 equal pairs and n/4 unequal pairs (since Walsh–Hadamard columns are orthogonal). Thus when the bits a_z are chosen, in the list (a_z z_{{i,j}})_{z∈Z} we will see each of the four possible two-bit strings n/8 ± O(√(n log n)) times except with probability 1/n². Thus by taking a union bound over all pairs of coordinates (i, j), it follows that except with probability o_n(1) we have that all projections of (a_z z)_{z∈Z} onto two coordinates are Õ(1/√n)-close to uniform.
Next we consider the symmetry property. Fix any subset J ⊂ [n] of J coordinates. Let us consider, for each string x ∈ {−1, 1}^J, how many times it occurs in the list (z_J)_{z∈Z}. Some strings x have at most √n occurrences. However even collectively, these constitute only 2^J·√n out of the n/2 strings, and thus they contribute only o_n(1) total probability mass to the uniform distribution on (z_J)_{z∈Z}. As for the remaining strings x ∈ {−1, 1}^J, each occurs at least √n times. Thus when the bits a_z are chosen, nearly equally many occurrences of ±x will be formed. Taking into account the need to union-bound over all 2^J strings and indeed all $\binom{n}{J}$ subsets J, we still get that except with probability o_n(1), for all J of cardinality J, the projection of (a_z z)_{z∈Z} onto the J coordinates is O(√(J log n)/n^{1/4})-close to symmetric.
Finally, we claim that if a distribution on J-bit strings is o_n(1)-close to symmetric and has all 2-bit marginals o_n(1)-close to uniform, then it has total variation distance at most 2J² · o_n(1) = o_n(1) from being simultaneously symmetric and pairwise-independent. To see this, we can begin with the o_n(1)-nearby symmetric distribution; it still has all its 2-bit marginals at most 2·o_n(1)-close to uniform, and all of its 1-bit marginals are exactly uniform. Now we can apply the "correction procedure" from [2] to deduce that the resulting distribution is 2J²·o_n(1)-close to being pairwise-independent. It only remains to observe that this correction procedure maintains symmetry, since it merely mixes the distribution with various symmetric distributions (namely, distributions of the form "uniform on {x : x_i x_j = 1}" for distinct i, j ∈ J).
This finishes the (Completeness) and (Soundness) cases of G, giving us Theorem 5.3.
5.3 Regarding the Game Show Conjecture
As mentioned, we are unable to prove the Game Show Conjecture. For the record, we describe here some
of our ideas toward proving it. Since we are not claiming any theorems in this section, we will not be
completely precise.
As we have already seen, it seems more natural to think of a “continuous” time parameter u = t/n
that starts at u = 0 and ends at u = 1/2. (Thinking of n as large, the ending time will indeed be
u = 1/2 ± on (1) with high probability.) In fact, we believe it is even more natural to use a different
continuous parameterization of time. Specifically, define the time remaining parameter s by s = ln(2−2u);
i. e., u = 1 − exp(s)/2. As time runs from u = 0 up to u = 1/2, the “time remaining” runs from s = ln 2
down to s = 0. The idea behind this rescaling is that now the Host restricts coordinates according to an
exponential clock of rate 1/n. Note from (5.2) that the cost to the Contestant of negating g̃ with s time
remaining is
$$W(s) = 4\exp(-2s)\,. \tag{5.3}$$
Suppose we have some current function g̃ and the time remaining is s. Let us define cost_{g̃}(s) to be the expectation of the Contestant's remaining cost, assuming an optimal strategy; note that cost(g̃) = cost_{g̃}(ln 2). For the two constant functions we have:
$$\mathrm{cost}_{+1}(s) = 0\,, \qquad \mathrm{cost}_{-1}(s) = 4\exp(-2s)\,.$$
The latter equality holds because whenever the current function g̃ gets restricted to the constant function −1, it is in the Contestant's best interest to immediately negate (because the negation cost is an increasing function of time).
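The reparameterization between clock time u and time remaining s can be sanity-checked numerically (a small sketch of ours; the sample points are arbitrary):

```python
import math

# W(u) = 1/(1-u)^2 in clock time and W(s) = 4 exp(-2s) in time remaining
# agree under the substitution s = ln(2 - 2u):
for u in [0.0, 0.1, 0.25, 0.4, 0.49]:
    s = math.log(2 - 2 * u)
    assert abs(1 / (1 - u) ** 2 - 4 * math.exp(-2 * s)) < 1e-9

# at the start of the game (u = 0, s = ln 2) negation costs 1; at the end (s = 0) it costs 4
assert abs(4 * math.exp(-2 * math.log(2)) - 1) < 1e-12
assert 4 * math.exp(0.0) == 4.0
```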
As we alluded to earlier, for general g̃ there is a "dynamic program" for computing cost_{g̃}(s). (Actually, because s is a continuous parameter, it is more like a "differential equation.") The "base case" for the dynamic program is
$$\mathrm{cost}_{\tilde g}(0) = \begin{cases} 0 & \text{if } \tilde g(1, 1, \ldots, 1) = 1, \\ 4 & \text{if } \tilde g(1, 1, \ldots, 1) = -1. \end{cases} \tag{5.4}$$
In general, we can compute cost_{g̃}(s + ds) given knowledge of cost_f(s) for all subfunctions f of ±g̃.
The fact that the Contestant is allowed to actively negate g̃ causes some complications, so let us begin by considering a lazy Contestant, meaning one who uses the (possibly nonoptimal) strategy of never negating g̃ unless it gets restricted to the constantly −1 function (or unless the game ends and negation is "forced").
Writing C_{g̃}(s) for the analogue of cost_{g̃}(s) in the case of a lazy Contestant, we have the same base case (5.4). As for the general formula, suppose that g̃ depends on r coordinates and that we consider the time remaining dropping from s + ds to s. Let u = 1 − exp(s)/2 as usual. It is easy to see that each of g̃'s relevant coordinates has probability du/(1 − u) of being restricted. Since u = 1 − exp(s)/2, this precisely equals −ds, and hence in going from s + ds to s time remaining, each coordinate has probability ds of being restricted. Thus we deduce the "dynamic programming" formula:
$$C_{\tilde g}(s + ds) = r \cdot ds \cdot \operatorname{avg}_{\tilde g'}\{C_{\tilde g'}(s)\} + (1 - r \cdot ds) \cdot C_{\tilde g}(s)\,,$$
where the average is over all functions g̃′ obtained by restricting one coordinate of g̃ to be −1. Let us write
$$A_{\tilde g}(s) = \operatorname{avg}_{\tilde g'}\{C_{\tilde g'}(s)\}\,,$$
and also approximate C_{g̃}(s + ds) = C_{g̃}(s) + ds · C′_{g̃}(s). Then after rearrangement, the above dynamic programming formula becomes the differential equation
$$-C'_{\tilde g}(s) = r\,\bigl(C_{\tilde g}(s) - A_{\tilde g}(s)\bigr)\,.$$
The solution to this ODE is
$$C_{\tilde g}(s) = r \exp(-rs) \cdot L_r A_{\tilde g}(s)\,, \tag{5.5}$$
where L_r denotes the operator defined by
$$L_r A(s) = \int_0^s \exp(ry)\,A(y)\,dy + \mathrm{const}\,,$$
with the value of const being determined by the initial condition, C_{g̃}(0) ∈ {0, 4}. With these formulas in hand one can directly compute the following formulas for the two r = 1 functions:
$$C_{d_i}(s) = 4\exp(-s) - 4\exp(-2s)\,, \qquad C_{-d_i}(s) = 4\exp(-s)\,.$$
It was previously argued that the lazy strategy is optimal when the Middle Function is a dictator d_i, and the first formula above confirms that cost(d_i) = 4 exp(− ln 2) − 4 exp(−2 ln 2) = 1. Furthermore, observe that
$$C_{-d_i}(s) = C_{d_i}(s) + W(s)\,.$$
The left-hand side is the expected cost to the Contestant when using the lazy strategy on a negated dictator; the right-hand side is the expected cost if the Contestant decides to negate −d_i to d_i at time s. Notice that we have equality for all s. This is precisely by design; as mentioned earlier, we chose the negation-cost formula W(s) so that when the Middle Function is a negated dictator, all strategies of the form "negate if and only if the relevant restriction has not occurred by time s_0" are equally good, including the lazy strategy.
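These closed forms can be checked against the ODE and against each other (a numerical sketch of ours, not part of the argument; the sample points are arbitrary):

```python
import math

ln2 = math.log(2)
W = lambda s: 4 * math.exp(-2 * s)                        # negation cost, from (5.3)
C_dict = lambda s: 4 * math.exp(-s) - 4 * math.exp(-2 * s)  # lazy cost of d_i
C_negdict = lambda s: 4 * math.exp(-s)                      # lazy cost of -d_i
A_dict = lambda s: 4 * math.exp(-2 * s)  # restricting d_i's coordinate gives constant -1

for s in [0.1, 0.2, 0.4, 0.6, ln2]:
    # check -C' = r(C - A) with r = 1, via a central finite difference
    h = 1e-6
    deriv = (C_dict(s + h) - C_dict(s - h)) / (2 * h)
    assert abs(-deriv - (C_dict(s) - A_dict(s))) < 1e-5
    # the identity C_{-d_i}(s) = C_{d_i}(s) + W(s)
    assert abs(C_negdict(s) - (C_dict(s) + W(s))) < 1e-12

assert abs(C_dict(ln2) - 1) < 1e-12     # cost(d_i) = 1
assert abs(C_negdict(ln2) - 2) < 1e-12  # cost(-d_i) = 2
```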
We have cost(d_i) = 1 and cost(−d_i) = 4 exp(− ln 2) = 2, hence E_{b∼D}[cost(g^{+b})] = 3/2 whenever D is symmetric and g is a dictator or negated dictator. To confirm the Game Show Conjecture, what we need to show is that for D symmetric and pairwise-independent we have
$$\mathop{\mathbf{E}}_{b \sim D}[\mathrm{cost}(g^{+b})] \geq \frac{3}{2}$$
for all odd g.
Next we consider functions with exactly 2 relevant variables. There are actually no such odd functions, but we still need to analyze them since they can arise at intermediate points in the game. From (5.5) one may compute:
$$C_{\mathrm{AND}_2}(s) = 8(\exp(s) - 1 - s)\exp(-2s)\,, \qquad C_{\mathrm{OR}_2}(s) = 8s\exp(-2s)\,, \tag{5.6}$$
$$C_{\mathrm{NOR}_2}(s) = 4\exp(-2s)\,, \qquad C_{=_2}(s) = 4(2\exp(s) - 1 - 2s)\exp(-2s)\,, \tag{5.7}$$
$$C_{\neq_2}(s) = 8(\exp(s) - 1)\exp(-2s)\,, \qquad C_{x\wedge\neg y}(s) = 4(\exp(s) - 1)\exp(-2s)\,, \tag{5.8}$$
$$C_{x\vee\neg y}(s) = 4\exp(s)\exp(-2s)\,, \qquad C_{\mathrm{NAND}_2}(s) = 4(2\exp(s) - 1)\exp(-2s)\,. \tag{5.9}$$
Note that C_{NAND_2}(s) > W(s) + C_{AND_2}(s) for all s > 0. This implies that the lazy strategy is not optimal for the Contestant when the function is NAND_2. It is not hard to show that the best strategy for the Contestant involves immediately negating NAND_2 to AND_2 whenever it arises. For all other functions above the lazy strategy is optimal, i. e., cost_g(s) = C_g(s); but for NAND_2 we have
cost_{NAND_2}(s) = W(s) + cost_{AND_2}(s) = 4(2 exp(s) − 1 − 2s) exp(−2s) .
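The inequality C_{NAND_2}(s) > W(s) + C_{AND_2}(s) and the closed form for cost_{NAND_2} can likewise be checked numerically (function names ours; W(s) = 4 exp(−2s) as implied by the dictator identity):

```python
import math

def W(s):                 # negation cost
    return 4 * math.exp(-2 * s)

def C_AND2(s):            # lazy-strategy cost on AND_2, from (5.6)
    return 8 * (math.exp(s) - 1 - s) * math.exp(-2 * s)

def C_NAND2(s):           # lazy-strategy cost on NAND_2, from (5.9)
    return 4 * (2 * math.exp(s) - 1) * math.exp(-2 * s)

def cost_NAND2(s):        # negate immediately, then play lazily on AND_2
    return 4 * (2 * math.exp(s) - 1 - 2 * s) * math.exp(-2 * s)

for s in (0.05, 0.25, math.log(2), 1.5, 3.0):
    # lazy play strictly overpays on NAND_2 (the gap is 8*s*exp(-2s)) ...
    assert C_NAND2(s) > W(s) + C_AND2(s)
    # ... and the optimal cost is exactly W(s) + cost_{AND_2}(s)
    assert abs(cost_NAND2(s) - (W(s) + C_AND2(s))) < 1e-9
```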
Theory of Computing, Volume 13 (19), 2017, pp. 1–51
Improved NP-Inapproximability for 2-Variable Linear Equations
Now we can consider 3-bit functions. The only odd ones (up to symmetry) are χ_{{1,2,3}} and Maj_3^{+b} for b ∈ {−1, 1}^3. It is an exercise to confirm the Game Show Conjecture for all parity functions, so let us focus on the majority-type functions. For them we will introduce the more general notation LTF_{a_1,...,a_r}(x) = sgn(a_1 x_1 + · · · + a_r x_r). From (5.5) one may compute:
C_{LTF_{1,1,1}}(s) = 24(1 − exp(s) + s exp(s)) exp(−3s) ,
C_{LTF_{1,1,−1}}(s) = 8(exp(2s) − s exp(s) − 1) exp(−3s) ,
C_{LTF_{1,−1,−1}}(s) = 4(2 exp(s) − 2s − 1) exp(−2s) ,
C_{LTF_{−1,−1,−1}}(s) = 4(3 exp(s) − 2) exp(−3s) .
One can show that in fact cost = C for the first three of these functions. However this is not true for the last function, namely −Maj_3. Here we have C_{−Maj_3}(s) > C_{Maj_3}(s) + W(s), implying that sometimes the Contestant should negate −Maj_3 to Maj_3. Indeed, now one should consider the overall "dynamic program" more carefully, to incorporate the fact that the Contestant is allowed to negate. Extending the differential equation reasoning above, one finds that it is optimal for the Contestant to negate g̃ at time s_0 if and only if
A_{g̃}(s_0) ≤ A_{−g̃}(s_0) + W(s_0) + W′(s_0)/r .
In particular, when g̃ = −Maj_3, this condition is equivalent to
cost_{NOR_2}(s_0) ≤ cost_{OR_2}(s_0) + W(s_0)/3 .
Using the formulas in (5.6), one directly calculates that the least s_0 for which we have equality above is s_0 = 1/3. It follows that the optimal strategy for −Maj_3 is to negate if and only if the time remaining reaches 1/3 without any relevant coordinates being restricted. Taking this into account, we ultimately determine:
cost_{−Maj_3}(s) = 24(1 − exp(s) + s exp(s)) exp(−3s) + 4 exp(−2s)   if s ≤ 1/3,
cost_{−Maj_3}(s) = 12(2 + exp(s) − 2 exp(1/3)) exp(−3s)   if s ≥ 1/3.
By substituting s = ln 2, we can state the optimal costs for all reorientations of Maj_3:
cost(LTF_{1,1,1}) = 6 ln 2 − 3 ≈ 1.159 ,
cost(LTF_{1,1,−1}) = 3 − 2 ln 2 ≈ 1.614 ,
cost(LTF_{1,−1,−1}) = 3 − 2 ln 2 ≈ 1.614 ,
cost(LTF_{−1,−1,−1}) = 6 − 3 exp(1/3) ≈ 1.813 .
The last of these, cost(−Maj_3), is surprisingly low. In particular, note that
avg{cost(Maj_3), cost(−Maj_3)} = (3/2)(1 − exp(1/3) + 2 ln 2) ≈ .99 · (3/2) .    (5.10)
This shows that it is not sufficient in the Game Show Conjecture merely to assume that the distribution on orientations b is symmetric. However, as the only symmetric pairwise-independent distribution on three bits is the uniform distribution, the case of g = Maj_3 is consistent with the Game Show Conjecture:
(1/8)(6 ln 2 − 3) + (3/8)(3 − 2 ln 2) + (3/8)(3 − 2 ln 2) + (1/8)(6 − 3 exp(1/3)) ≈ 1.58 > 3/2 .
We do not have a clear strategy for analyzing the general case. A reasonable place to start is to analyze all linear threshold functions, a class of functions closed under restriction, which includes dictators and majorities. In particular, the r-ary "monarchy" function LTF_{r−2,1,1,...,1} seems like an interesting challenge to analyze.
We make one final remark: As we have seen, the Game Show Conjecture is not correct if the
distribution on orientations b need only be symmetric. This is a bit of a shame, because in the proof
of the soundness Theorem 5.5, one can ensure symmetry without using any special properties of the
predicate being reduced from, besides uselessness. In particular, one could use the older NP-hardness
reduction of Samorodnitsky and Trevisan [23] in place of Chan’s, which has the advantage [19] of holding
with a quasilinear-size blowup. If one hazards the guess that E_{b∼D}[cost(g^{+b})] ≥ .99 · 3/2 whenever D is symmetric, a consequence would be that 2-Lin(2) is NP-hard to approximate to factor .99 · 3/2 with a quasilinear-size reduction; hence, this level of approximation would require nearly full exponential time, 2^{n^{1−o(1)}}, assuming the Exponential-Time Hypothesis.
6 Limitations of gadget reductions
In this section, we show a limitation to proving inapproximability using gadget reductions from balanced pairwise-independent predicates, that is, predicates φ that admit a set S ⊆ sat(φ) satisfying Property 2 in Definition 2.4. We show that gadget reductions from φ to 2-Lin(2) cannot prove an inapproximability factor larger than ≈2.54 for the deletion version. Note that this applies to the Had_k predicates and to a broader class of predicates that do not necessarily admit a natural group operation.
Theorem 6.1. Let G be a (c, s)-generic gadget reducing Max-φ to Max-2-Lin(2), where φ admits a
balanced pairwise-independent set. Then
s/c ≤ 1/(1 − e^{−1/2}) ≈ 2.54 .
Proof. As before, K is the number of satisfying assignments of φ. Recall that the vertex set of G is V = {−1, 1}^K. Further, via Propositions 2.19 and 2.21, we need only consider folded assignments to these variables, and we can assume G only uses (=)-constraints. Finally, via Proposition 2.29, we can assume that every (=)-constraint used by G is between two variables x and y which are at Hamming distance one from each other. Let P be the set of generic primary variables, let −P be their negations, and let P± = P ∪ (−P) denote the union of the two. Since φ is balanced pairwise-independent, we have a set S ⊆ [K] so that for i picked uniformly at random from S, Pr_i[u_i = v_i] = 1/2 for distinct primary variables u, v ∈ P.
Define the similarity between x and y to be the fraction of positions on which they agree, sim(x, y) := Pr_i[x_i = y_i], and set sim(x, P±) := max_{y∈P±} sim(x, y). Pairwise independence allows us to claim that any variable x is strongly similar (i. e., has similarity > 3/4) to at most one variable y ∈ P±; define y to be x's closest primary variable.
Fact 6.2. For any x ∈ V, if sim(x, y) > 3/4 for some y ∈ P±, then sim(x, y′) < 3/4 for all other y′ ∈ P±.
Proof. If x has sim(x, y_1) > 3/4 and sim(x, y_2) ≥ 3/4 for y_1, y_2 ∈ P±, then
sim(y_1, y_2) ≥ sim(y_1, x) + sim(x, y_2) − 1 > 1/2 ,
contradicting the assumption on φ.
This fact allows us to design the following "threshold-rounding" procedure to construct a distribution A ∈ R_gen(φ). Let D be a distribution over [3/4, 1] with probability density function D(t) = C · e^{2t} for t ∈ [3/4, 1] (and C set appropriately).
1. Pick a random assignment to the primary variables.
2. Pick a number t ∼ D. For any variable x ∈ V , call x type 1 if sim(x, P± ) > t and type 2 otherwise.
3. Assign all type-1 variables the value of their closest primary variable.
4. Pick a uniformly random dictator d_i and set all the type-2 variables to agree with this dictator.
5. Output the resulting assignment.
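The five steps above can be sketched in code. The sketch below is ours: `primaries` is a hypothetical stand-in for the set P of primary strings (each a ±1 tuple of length K), and t is drawn from D by inverting its CDF F(t) = (e^{2t} − e^{3/2})/(e^2 − e^{3/2}).

```python
import itertools
import math
import random

def sample_t():
    # t ~ D on [3/4, 1] with density C*e^{2t}, via inverse-CDF sampling
    u = random.random()
    return 0.5 * math.log(math.exp(1.5) + u * (math.exp(2) - math.exp(1.5)))

def sim(x, y):
    # fraction of coordinates on which x and y agree
    return sum(a == b for a, b in zip(x, y)) / len(x)

def threshold_round(vertices, primaries, K):
    val = {p: random.choice([-1, 1]) for p in primaries}  # step 1
    t = sample_t()                                        # step 2
    i = random.randrange(K)                               # dictator for step 4
    out = {}
    for x in vertices:
        # find the closest variable in P U (-P) and the value it passes on
        best_val, best_sim = None, -1.0
        for p in primaries:
            for q, sign in ((p, 1), (tuple(-b for b in p), -1)):
                s = sim(x, q)
                if s > best_sim:
                    best_sim, best_val = s, sign * val[p]
        if best_sim > t:
            out[x] = best_val  # step 3: type-1, copy closest primary's value
        else:
            out[x] = x[i]      # step 4: type-2, follow the random dictator
    return out                 # step 5

# tiny demo: K = 4 with two pairwise-balanced primaries (similarity 1/2)
K = 4
primaries = [(1, 1, 1, 1), (1, -1, 1, -1)]
vertices = list(itertools.product([-1, 1], repeat=K))
assignment = threshold_round(vertices, primaries, K)
# the output is folded: negating a vertex negates its assigned value
assert all(assignment[tuple(-b for b in x)] == -assignment[x] for x in vertices)
```

Foldedness holds by construction: negating a type-1 vertex swaps its closest pair (y, −y) in P±, and a type-2 vertex simply reads one coordinate, so out(−x) = −out(x) in both cases.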
Note that the assignments are folded and are random on the primary variables. We analyse the performance of this assignment. Let (x, y) be an edge in {−1, 1}^K, i. e., a pair at Hamming distance one. If both sim(x, P±), sim(y, P±) ≤ 3/4, then regardless of the value of t, x and y will both always be type-2 variables, in which case A violates the edge between them with the probability of a random dictator, which is
1/K ≤ 1/(1 − e^{−1/2}) · 1/K .
On the other hand, suppose without loss of generality that sim(x, P±) > sim(y, P±) and that sim(x, P±) > 3/4. If we set s := sim(y, P±), then sim(x, P±) = s + 1/K. Because y is distance one from x, s ≥ 3/4. Not only that, if y has a closest primary variable, then that variable is the same as x's closest primary variable (this is by Fact 6.2). Now, to calculate the probability that A violates (x, y), there are three cases:
1. If t ∈ [3/4, s), then x and y are assigned the value of the same variable in P± , so (x, y) is never
violated in this case.
2. If t ∈ [s, s + 1/K), then y’s value is chosen according to a uniformly random dictator assignment,
meaning that it is a uniformly random ±1-bit, independent from x’s value. In this case, (x, y) is
violated with probability 1/2.
3. If t ∈ [s + 1/K, 1], then both x and y are assigned values according to a random dictator, in which
case (x, y) is violated with probability 1/K.
In total,
Pr[A violates (x, y)] = (1/2) · Pr_{t∼D}[t ∈ [s, s + 1/K)] + (1/K) · Pr_{t∼D}[t ∈ [s + 1/K, 1)]
= (1/2) ∫_s^{s+1/K} C e^{2t} dt + (1/K) ∫_{s+1/K}^{1} C e^{2t} dt
≤ (1/2) · (C e^{2s+2/K} / K) + (1/K) ∫_{s+1/K}^{1} C e^{2t} dt
= C e^2 / (2K) = 1/(1 − e^{−1/2}) · 1/K ,
as promised. Here the inequality follows from the fact that e^{2t} is an increasing function. As G only uses length-one edges, c = 1/K. We have just shown that
uval(A; G) ≤ 1/(1 − e^{−1/2}) · 1/K .
Because A ∈ R_gen(φ), we conclude that
s/c ≤ 1/(1 − e^{−1/2}) .
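For the record, the constant "C set appropriately" and the final factor can be checked directly: normalizing the density forces C = 2/(e^2 − e^{3/2}), and then C e^2/2 is exactly 1/(1 − e^{−1/2}). A numeric confirmation (names ours):

```python
import math

# C normalizes the density D(t) = C*e^{2t} on [3/4, 1]:
# integral = (C/2)(e^2 - e^{3/2}) = 1  =>  C = 2 / (e^2 - e^{3/2})
C = 2 / (math.exp(2) - math.exp(1.5))
assert abs((C / 2) * (math.exp(2) - math.exp(1.5)) - 1) < 1e-9
# the bound C*e^2/(2K) equals (1/(1 - e^{-1/2})) * (1/K)
assert abs(C * math.exp(2) / 2 - 1 / (1 - math.exp(-0.5))) < 1e-9
# the resulting factor in Theorem 6.1 is ~2.5415
assert 2.54 < 1 / (1 - math.exp(-0.5)) < 2.55
```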
7 Conclusion
As mentioned, we view our factor-11/8 NP-hardness result more as a proof of concept, illustrating that
the longstanding barrier of factor-5/4 NP-hardness for Max-Cut/2-Lin(2)/Unique-Games can be broken.
There are quite a few avenues for further work:
• An obvious problem is to derive a better NP-hardness result for 2-Lin(2) by reduction from Had_4 rather than Had_3. As one can always embed our Had_3-based gadget into a standard Had_4-based gadget, this method will always yield a hardness of at least 11/8. But presumably the optimal Had_4-based gadget will do slightly better.
Since our analysis of the optimal Had_3 gadget is already somewhat complicated, it might be challenging to analyze the Had_4 case explicitly. A weaker but more plausible goal would be to prove (perhaps indirectly) that there exists a δ_0 > 0 such that the optimal Had_4 gadget achieves factor-(11/8 + δ_0) NP-hardness. This would at least definitively establish that 11/8 is not the "correct answer" either.
• Of course, proving the Game Show Conjecture would yield the improved NP-hardness factor of
3/2. It may also be simpler to try to prove a non-optimal version of the conjecture, yielding some
hardness factor better than 11/8 but worse than 3/2. Certain ideas we had for trying to prove the
conjecture (e. g., distinguishing whether g is “close to” or “far from” being a dictator) might yield
such a result.
• Along these lines, it could also be a good idea to prove some form of the Game Show Conjecture
that only relies on the distribution D on orientations b being symmetric, rather than pairwise-
independent. As mentioned, this would allow one to reduce from the Samorodnitsky–Trevisan
hardness result, yielding nearly full-exponential hardness under the ETH. Since our proof of Theorem 5.2 reduces from Chan's hardness result, it has the unfortunate aspect that we only get factor-(3/2 − δ) hardness (under ETH) for algorithms running in time 2^{n^{δ′}} for some δ′ = δ′(δ) that we did not bother to make explicit.
• For Max-Cut, our work establishes NP-hardness of (ε, Cε)-approximation for any C < 11/8, but only for ε ≤ ε_0 where ε_0 is some not-very-large constant arising out of Proposition 2.35. It would be nice to get a direct Max-Cut gadget yielding a larger ε_0, like the ε_0 = 1/8 we have for 2-Lin(2).
• A recent result of Gupta, Talwar, and Witmer [14] showed NP-hardness of approximating the (closely related) Non-Uniform Sparsest Cut problem to factor 17/16, by designing a gadget reduction from the old (4/21, 5/21)-approximation hardness of Håstad [15]. A natural question is whether one can use ideas from this paper to make a direct reduction from Had_2 or Had_3 to Non-Uniform Sparsest Cut, improving the NP-hardness factor of 17/16.
• We are now in the situation (similar to the situation prior to [21]) wherein the best NP-hardness
factor we know how to achieve for 2-Lin(q) (or Unique-Games) is achieved by taking q = 2. In
fact, we do not know how to achieve an NP-hardness factor better than 5/4 for 2-Lin(q) for any
q > 2, even though 2-Lin(q) is presumably harder for larger q. Can this situation be remedied?
• In light of the limitations described in Section 6, it makes sense to seek an alternative methodology for establishing improved NP-hardness for 2-CSPs. An example showing that this is not at all hopeless comes from the decade-old work of Chlebík and Chlebíková [9], which shows NP-hardness of approximating 2-Sat(-Deletion) to factor 8√5 − 15 ≈ 2.8885. Their result is essentially a small tweak to the Vertex-Cover hardness of Dinur and Safra [10] and thus indeed uses a fairly radical methodology for establishing two-bit CSP-hardness, namely direct reduction from a specialized Label-Cover-type problem.
Acknowledgments
The authors would like to warmly thank Per Austrin for his assistance with computer analysis of the
11/8-gadget.
References
[1] Amit Agarwal, Moses Charikar, Konstantin Makarychev, and Yury Makarychev: O(√log n) approximation algorithms for Min-Uncut, Min-2CNF-Deletion, and directed cut problems. In Proc. 37th STOC, pp. 573–581. ACM Press, 2005. [doi:10.1145/1060590.1060675]
[2] Noga Alon, Oded Goldreich, and Yishay Mansour: Almost k-wise independence versus k-wise independence. Information Processing Letters, 88(3):107–110, 2003. [doi:10.1016/S0020-0190(03)00359-4]
[3] Sanjeev Arora, Boaz Barak, and David Steurer: Subexponential algorithms for Unique Games and related problems. J. ACM, 62(5):42:1–42:25, 2015. Preliminary version in FOCS'10. [doi:10.1145/2775105]
[4] Sanjeev Arora, Satish Rao, and Umesh Vazirani: Expander flows, geometric embeddings and graph partitioning. J. ACM, 56(2):5:1–5:37, 2009. Preliminary version in STOC'04. [doi:10.1145/1502793.1502794]
[5] Per Austrin and Johan Håstad: On the usefulness of predicates. ACM Trans. Comput. Theory, 5(1):1:1–1:24, 2013. Preliminary version in CCC'12. [doi:10.1145/2462896.2462897, arXiv:1204.5662]
[6] Mihir Bellare, Oded Goldreich, and Madhu Sudan: Free bits, PCPs, and non-approximability – towards tight results. SIAM J. Comput., 27(3):804–915, 1998. Preliminary version in FOCS'95 and ECCC. [doi:10.1137/S0097539796302531]
[7] Siu On Chan: Approximation resistance from pairwise independent subgroups. J. ACM, 63(3):27:1–27:32, 2016. Preliminary versions in STOC'13 and ECCC. [doi:10.1145/2873054]
[8] Moses Charikar, Konstantin Makarychev, and Yury Makarychev: Near-optimal algorithms for Unique Games. In Proc. 38th STOC, pp. 205–214. ACM Press, 2006. [doi:10.1145/1132516.1132547]
[9] Miroslav Chlebík and Janka Chlebíková: Minimum 2SAT-DELETION: Inapproximability results and relations to minimum vertex cover. Discrete Applied Math., 155(2):172–179, 2007. Preliminary version in MFCS'04. [doi:10.1016/j.dam.2006.04.039]
[10] Irit Dinur and Samuel Safra: The importance of being biased. In Proc. 34th STOC, pp. 33–42. ACM Press, 2002. [doi:10.1145/509907.509915]
[11] Uriel Feige and Daniel Reichman: On systems of linear equations with two variables per equation. In Proc. 7th Internat. Workshop on Approximation Algorithms for Combinatorial Optimization Problems (APPROX'04), volume 3122 of LNCS, pp. 117–127. Springer, 2004. [doi:10.1007/978-3-540-27821-4_11]
[12] Ehud Friedgut: Boolean functions with low average sensitivity depend on few coordinates. Combinatorica, 18(1):27–35, 1998. [doi:10.1007/PL00009809]
[13] Michel X. Goemans and David P. Williamson: Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. J. ACM, 42(6):1115–1145, 1995. Preliminary version in STOC'94. [doi:10.1145/227683.227684]
[14] Anupam Gupta, Kunal Talwar, and David Witmer: Sparsest Cut on bounded treewidth graphs: algorithms and hardness results. In Proc. 45th STOC, pp. 281–290. ACM Press, 2013. [doi:10.1145/2488608.2488644, arXiv:1305.1347]
[15] Johan Håstad: Some optimal inapproximability results. J. ACM, 48(4):798–859, 2001. Preliminary version in STOC'97. [doi:10.1145/502090.502098]
[16] Johan Håstad, Sangxia Huang, Rajsekar Manokaran, Ryan O'Donnell, and John Wright: Improved NP-inapproximability for 2-variable linear equations. In Proc. 18th Internat. Workshop on Approximation Algorithms for Combinatorial Optimization Problems (APPROX'15), pp. 341–360, 2015. [doi:10.4230/LIPIcs.APPROX-RANDOM.2015.341]
[17] Subhash Khot: On the power of unique 2-prover 1-round games. In Proc. 34th STOC, pp. 767–775. ACM Press, 2002. [doi:10.1145/509907.510017]
[18] Subhash Khot, Guy Kindler, Elchanan Mossel, and Ryan O'Donnell: Optimal inapproximability results for Max-Cut and other 2-variable CSPs? SIAM J. Comput., 37(1):319–357, 2007. Preliminary version in FOCS'04. [doi:10.1137/S0097539705447372]
[19] Dana Moshkovitz and Ran Raz: Two-query PCP with subconstant error. J. ACM, 57(5):29, 2010. Preliminary version in FOCS'08. [doi:10.1145/1754399.1754402]
[20] Elchanan Mossel, Ryan O'Donnell, and Krzysztof Oleszkiewicz: Noise stability of functions with low influences: invariance and optimality. Annals of Mathematics, 171(1):295–341, 2010. Preliminary version in FOCS'05. [doi:10.4007/annals.2010.171.295]
[21] Ryan O'Donnell and John Wright: A new point of NP-hardness for Unique-Games. In Proc. 44th STOC, pp. 289–306. ACM Press, 2012. [doi:10.1145/2213977.2214005]
[22] Anup Rao: Parallel repetition in projection games and a concentration bound. SIAM J. Comput., 40(6):1871–1891, 2011. Preliminary version in STOC'08. [doi:10.1137/080734042]
[23] Alex Samorodnitsky and Luca Trevisan: A PCP characterization of NP with optimal amortized query complexity. In Proc. 32nd STOC, pp. 191–199. ACM Press, 2000. [doi:10.1145/335305.335329]
[24] Luca Trevisan, Gregory B. Sorkin, Madhu Sudan, and David P. Williamson: Gadgets, approximation, and linear programming. SIAM J. Comput., 29(6):2074–2097, 2000. Preliminary version in FOCS'96. [doi:10.1137/S0097539797328847]
[25] Ryan Williams: A new algorithm for optimal 2-constraint satisfaction and its implications. Theoret. Comput. Sci., 348(2-3):357–365, 2005. Preliminary version in ICALP'04. [doi:10.1016/j.tcs.2005.09.023]
AUTHORS
Johan Håstad
Professor
KTH Royal Institute of Technology
Stockholm, Sweden
johanh kth se
http://www.csc.kth.se/~johanh
Sangxia Huang
Postdoctoral Researcher
EPFL
Lausanne, Switzerland
huang sangxia gmail com
http://huang.sangxia.info/
Rajsekar Manokaran
Assistant professor
Indian Institute of Technology
Madras, India
rajsekar gmail com
http://www.cse.iitm.ac.in/~rajsekar
Ryan O’Donnell
Associate professor
Carnegie Mellon University
Pittsburgh, PA
odonnell cs cmu edu
http://www.cs.cmu.edu/~odonnell/
John Wright
Postdoctoral researcher
Massachusetts Institute of Technology
Cambridge, MA
jswright mit edu
ABOUT THE AUTHORS
JOHAN HÅSTAD received his Bachelor of Science from Stockholm University in 1981, his
Master of Science from Uppsala University in 1984, and his Ph. D. from MIT in 1986
under the supervision of Shafi Goldwasser. Johan was appointed Associate Professor
at the Royal Institute of Technology in Stockholm, Sweden in 1988 and advanced
to the level of Professor in 1992. He was elected a member of the Swedish Royal
Academy of Sciences in 2001. He has research interests within several subareas of
Theory of Algorithms and Complexity Theory but has recently mainly focused on the
approximability of NP-hard optimization problems.
SANGXIA HUANG received a B. Eng. from Shanghai Jiao Tong University in 2010, and a
Ph. D. from KTH Royal Institute of Technology in 2015 under the supervision of Johan
Håstad. He is currently a postdoc at EPFL in Ola Svensson’s group. His research is
mainly in the area of approximation algorithms and hardness of approximation.
RAJSEKAR MANOKARAN graduated from Princeton in 2012; his advisor was Sanjeev Arora.
His thesis focused on the approximability of CSPs using mathematical relaxations. He
was appointed as an assistant professor at the Indian Institute of Technology in Madras,
India in 2013.
RYAN O’DONNELL received a B. S. from the University of Toronto in 1999 and a Ph. D.
from the MIT Mathematics Department in 2003. His Ph. D. advisor was Madhu Sudan.
Following this he was a postdoc at IAS for a year in Avi Wigderson’s group, and a
postdoc at Microsoft Research for two years in Jennifer Chayes’s group. Since 2006 he
has been an assistant professor in the Computer Science Department at Carnegie Mellon
University. Ryan’s research interests include Analysis of Boolean Functions, Constraint
Satisfaction Problems, Quantum Complexity, and Probability. He enjoys his spare time.
JOHN WRIGHT received a B. S. from the University of Texas at Austin in 2010 and a Ph. D.
from the CMU Computer Science Department in 2016. His Ph. D. advisor was Ryan
O’Donnell. He is currently a postdoc at the MIT Center for Theoretical Physics in Eddie
Farhi, Aram Harrow, and Peter Shor’s group. His research areas include hardness of
approximation and quantum computing.