Easily refutable subformulas of large random 3CNF formulas

Authors Uriel Feige, Eran Ofek,
Plaintext
                                   T HEORY OF C OMPUTING, Volume 3 (2007), pp. 25–43
                                              http://theoryofcomputing.org




Easily refutable subformulas of large random
               3CNF formulas
                                                Uriel Feige                             Eran Ofek
                                         Received: May 2, 2006; published: February 9, 2007.




        Abstract: A simple nonconstructive argument shows that most 3CNF formulas with cn
        clauses (where c is a sufficiently large constant) are not satisfiable. It is an open question
        whether there is an efficient refutation algorithm that for most formulas with cn clauses
        proves that they are not satisfiable. We present a polynomial time algorithm that for most
        3CNF formulas with cn3/2 clauses (where c is a sufficiently large constant) finds a subfor-
        mula with Θ(c2 n) clauses and then uses spectral techniques to prove that this subformula
        is not satisfiable (and hence that the original formula is not satisfiable). Previously, it was
        only known how to efficiently certify the unsatisfiability of random 3CNF formulas with
        at least poly(log(n)) · n3/2 clauses. Our algorithm is simple enough to run in practice. We
        present some experimental results.

ACM Classification: F.2.2
AMS Classification: 68Q17,68Q25
Key words and phrases: proof complexity, average case analysis, Boolean formula, 3CNF, refutation,
spectral methods

1     Introduction
A 3CNF formula φ over n variables is a set of m clauses, each one contains exactly 3 literals of three
different variables. A formula φ is satisfiable if there exists an assignment to its n variables such that in
each clause there is at least one literal whose value is true. The problem of deciding whether an input
3CNF formula φ is satisfiable is NP-hard, but this does not rule out the possibility of designing a good

Authors retain copyright to their work and grant Theory of Computing unlimited rights
to publish the work electronically and in hard copy. Use of the work is permitted as
long as the author(s) and the journal are properly acknowledged. For the detailed
copyright statement, see http://theoryofcomputing.org/copyright.html.

c 2007 Uriel Feige and Eran Ofek                                                                    DOI: 10.4086/toc.2007.v003a002
                                       U RIEL F EIGE AND E RAN O FEK

heuristic for it. A heuristic for satisfiability may try to find a satisfying assignment for an input formula
φ , in case one exists. A refutation heuristic may try to prove that no satisfying assignment exists. In this
paper we present an algorithm which tries to refute an input formula φ . The algorithm has one sided
error, in the sense that it will never say “unsatisfiable” on a satisfiable formula, but for some unsatisfiable
formulas it will fail to output “unsatisfiable”. It then follows that for a formula φ on which the algorithm
outputs “unsatisfiable,” its execution on φ is a witness for the unsatisfiability of φ .
     How does one measure the quality of a refutation heuristic? A possible test may be to check how
good the heuristic is on a random input. But then, how do we generate a random unsatisfiable formula?
To answer this question we review some known properties of random 3CNF formulas. The satisfiability
property has the following interesting threshold behavior. Let φ be a random 3CNF formula with n
variables and cn clauses (each new clause is chosen independently and uniformly from the set of all
possible clauses). As the parameter c is increased, it becomes less likely that φ is satisfiable, as there
are more constraints to satisfy. In [8] it is shown that there exists cn such that for c < cn (1 − ε) almost
surely φ is satisfiable, and for c > cn (1 + ε), φ is almost surely unsatisfiable (for some ε which tends to
zero as n increases). It is also known that 3.52 < cn < 4.596 [14, 12, 13]. We will use random formulas
with cn clauses (for c > cn (1 + ε)) to measure the performance of a refutation heuristic. Specifically, the
refutation heuristic is considered good if for some c > (1 + ε)cn it almost surely proves that a random
formula with cn clauses is unsatisfiable.
     Notice that for any n, as c is increased (for c > cn (1 + ε)), the algorithmic problem of refutation
becomes less difficult since we can always ignore a fixed fraction of the clauses. The following question
is still open: how small can c be so that there is still a polynomial time algorithm which almost surely
refutes random 3CNF formulas with cn clauses (c may also be an increasing function of n).
     A possible approach for refuting a formula φ is to find a resolution proof for the unsatisfiability of
φ . In this approach one derives new clauses implied by φ by combining pairs of clauses in which one
clause contains a variable and the other clause contains the negation of this variable. Any satisfying
assignment must satisfy at least one of the remaining literals contained in the two clauses, and hence the
collection of these literals is a CNF clause implied by φ . A sequence of iterations of this resolution step
that eventually generates the empty clause is a proof that φ is not satisfiable. Chvátal and Szemerédi [3]
proved that a resolution proof of a random 3CNF formula with linear number of clauses is almost surely
of exponential size. A result of a similar flavor for denser formulas was given by Ben-Sasson and
Wigderson [2] who showed that a random formula with n3/2−ε clauses almost surely requires a resolution
proof of size 2Ω(n
                   ε/(1−ε) )
                             . These lower bounds imply that finding a resolution proof for a random formula
is computationally inefficient.
     A simple refutation algorithm can be used to refute random formula φ with n2 clauses. This is done
by considering only those clauses that contain a particular variable x. Fixing x to be true leaves about
half of the selected clauses as a random 2CNF formula with roughly 3n/2 clauses. A 2CNF formula
with this number of clauses is almost surely not satisfiable. Moreover, any polynomial time algorithm
for 2SAT can be used to certify that this particular sub-formula is not satisfiable. The same can be done
when fixing x to be false, implying that no matter how x is assigned, the formula φ cannot be satisfied.
     A new approach, introduced by Goerdt and Krivelevich in [10], gave a significant improvement and
reduced the bound to log7 n · nk clauses for efficient refutation of 2kCNF formulas. This approach was
later extended in [9] and [11] to handle also random 3CNF formulas with n3/2+ε and poly(log n) · n3/2

                          T HEORY OF C OMPUTING, Volume 3 (2007), pp. 25–43                                 26
                 E ASILY REFUTABLE SUBFORMULAS OF LARGE RANDOM 3CNF FORMULAS

clauses, respectively. In [5, 7] it is shown how to efficiently refute a random 2kCNF instances with at
least cnk clauses.
    The difficulty of finding refutation algorithms for formulas with linearly many clauses may lead
one to assume that no such algorithm exists. It is shown in [6] that this assumption (that there is no
polynomial time refutation heuristic that works for most 3CNF formulas with cn clauses, where c is an
arbitrarily large constant) implies that certain combinatorial optimization problems (like minimum graph
bisection, the 2-catalog segmentation problem, and others) have no polynomial time approximation
schemes. It is an open question whether it is NP-hard to approximate these problems arbitrarily well,
though further evidence that these problems are indeed hard to approximate is given in [15].
    Our refutation algorithm is based on techniques similar to those used in earlier work, such as [9, 5, 6].
We use these techniques in a different way, resulting in an algorithm that is easier to implement, easier
to analyze, and works at lower densities than previous algorithms. For example, both the algorithms
in [9, 11] and our algorithm perform eigenvalue computations on some random matrices derived from
the random input formula φ . However, our matrices are much smaller (of order n rather than n2 ),
making the computational task easier. Moreover, the structure of our matrices is simpler, making the
analysis of our algorithm simpler, and easier to apply also to formulas with fewer clauses than those
in [9, 11]. As a result of this simplicity, we can show that our algorithm refutes formulas with cn3/2
clauses, whereas the algorithms given in [9] and [11] are claimed only to refute formulas with Ω(n3/2+ε )
and Ω(poly(log n) · n3/2 ) clauses, respectively. An implementation of our algorithm refuted a random
formula with n = 50000 variables and 27335932 < 3n3/2 clauses (see details in Section 5).
    In some other respects, our algorithm is more limited then the algorithms in [9, 11]. An algorithm
is said to provide strong refutation if it shows not only that the input 3CNF formula is not satisfiable,
but also that every assignment to the variables fails to satisfy a constant fraction of the clauses. Our
refutation algorithm does not provide a strong refutation. The problem of strong refutation was addressed
in [4], where it was shown that variations of the algorithms of [9, 11] can strongly refute random 3CNF
formulas with at least log6 (n) · n3/2 clauses. The ability to perform strong refutation is an important
issue, and its relation to approximability is discussed in [6].


2      Preliminaries
2.1     The random model
Definition 2.1. A clause is a 3-tuple of literals that belong to three different variables. The set Cn is the
set of all possible clauses over n fixed variables (there are 2n · 2(n − 1) · 2(n − 2) such clauses).

      We use the following model for generating the random formula φ .

Definition 2.2. A 3CNF formula φ is generated by choosing m clauses from Cn independently at random
with repetitions. Such a random formula is denoted by φ ∈R Cnm .

    Although we concentrate on a specific random model for generating random formulas, our algorithm
succeeds also on other related random models. For example the formula can be a random set of cn3/2
distinct clauses.

                          T HEORY OF C OMPUTING, Volume 3 (2007), pp. 25–43                               27
                                        U RIEL F EIGE AND E RAN O FEK

2.2   Notation
                                  a(n)
We write a(n) ∼ b(n) if limn→∞ b(n)    → 1. We use the term w.h.p. (with high probability) to denote a
sequence of probabilities that converges to 1 as n increases.

2.3   Efficient certification of a property
An important concept which will be frequently used is the concept of efficient certification. Let P
be some property of graphs/formulas or any other combinatorial object. An algorithm A certifies the
property P if the following holds:

   1. On any input instance φ the algorithm returns either “has P” or “don’t know.”

   2. Soundness: The algorithm never outputs “has P” on an instance φ which does not have the prop-
      erty P. The algorithm may output “don’t know” on an instance φ which has the property P (the
      algorithm has one sided error).

We will use certification algorithms on random instances of formulas/graphs taken from some probabil-
ity space C. We shall consider properties that are almost surely true for the random object taken from C.
A certification algorithm is complete with respect to the probability space C and a property P if it almost
surely outputs “has P” on an input taken from C.
     The computationally heavy part of our algorithm is certifying that two different graphs derived from
the random formula φ do not have a large cut. One of these two graphs is random, and the other is
a multigraph that is the union of 6 graphs, where each of these graphs by itself is essentially random,
but there are correlations among the graphs. A cut in a graph is a partition of its vertices into two sets.
The size of the cut is the number of edges with one endpoint in each part. A certification algorithm for
verifying that an input graph with m edges has no cut significantly larger than m/2 is implicit in [16].
This algorithm is based on semi-definite programming; if the maximum cut in the input graph is of size
at most m(1/2 + ε), then the algorithm outputs a certificate that the maximum cut in G is bounded by
m(1/2 + δ (ε)), where δ (ε) → 0 as ε → 0. A computationally simpler algorithm can be applied if the
graph is random. In [7] it is shown how to certify that in a random graph taken from Gn,d/n the size of the
                                             √ 
maximum cut is bounded by dn 1/4 + θ (1/ d) , thus bounding the maximum cut by m(1/2 + ε) when
d is large enough. This is done by removing the vertices of highest degree from G, and then computing
the most negative eigenvalue of the adjacency matrix of the resulting graph.

2.4   An overview of our refutation algorithm
Our algorithm builds on ideas taken from earlier work ([10, 9, 6, 7, 5, 11]). This section gives an informal
overview of the algorithm at a fairly detailed level. Other sections of this manuscript fill in the formal
details.
    The input is an arbitrary 3CNF formula φ with n variables and m = cn3/2 clauses, where c is a large
enough constant. Below we describe the expected behavior of our algorithm when the input formula is
random. The algorithm first greedily extracts from φ a subformula φ 0 . This is done as follows. We say
that two clauses match if they differ in their first literal, but agree on their second literal and on their third

                           T HEORY OF C OMPUTING, Volume 3 (2007), pp. 25–43                                   28
                E ASILY REFUTABLE SUBFORMULAS OF LARGE RANDOM 3CNF FORMULAS

literal. For example, the clauses (x1 ∨ x̄2 ∨ x3 ) and (x4 ∨ x̄2 ∨ x3 ) match. The subformula φ 0 is constructed
by greedily putting into φ 0 pairs of clauses that match, until no further matches are found in φ (we allow
each clause of φ to participate in at most one matched pair of φ 0 ). Let m0 be the number of clauses in φ 0 .
A simple probabilistic argument shows that we can expect m0 = Θ(m2 /n2 ) = Θ(c2 n). Moreover, φ 0 is
essentially a union of two random (but correlated) formulas φ1 and φ2 (each containing one clause from
every pair of clauses that are matched in φ 0 ). Our algorithm will now ignore the rest of φ , and refute φ 0 .
As explained, φ 0 is a union of two random formulas. Here we use an observation that is made in [6], that
we shall call the 3XOR principle.
The 3XOR principle: In order to show that a random 3CNF formula is not satisfiable, it suffices to
strongly refute it as a 3XOR formula.
     Let us explain the terms used in the 3XOR principle. A clause in a 3XOR formula is satisfied if
either one or three of its literals are satisfied. A strong refutation algorithm is one that shows that every
assignment to the variables leaves at least a constant fraction of the clauses not satisfied (as 3XOR
clauses, in our case).
     A proof of the 3XOR principle is given in [6]. We sketch it here, and give it in more details in
Section 3. Observe that in a random formula every literal is expected to appear the same number of times,
and if the number of clauses is large enough, then things behave pretty much like their expectation. As a
consequence, every assignment to the variables sets roughly half the occurrences of literals to true, and
roughly half to false. Hence every assignment satisfies on average 3/2 literals per clause. Moreover, this
property is easily certifiable, by summing up the number of occurrences of the n most popular literals.
     Given that every assignment satisfies on average 3/2 literals per clause, let us consider properties
of satisfying assignments (if such assignments exist). The good option is that they satisfy one literal
in roughly 3/4 of the clauses, three literals in roughly 1/4 of the clauses, and 2 literals in a negligible
fraction of the clauses. This keeps the average roughly at 3/2, and indeed nearly satisfies the formula
also as a 3XOR formula, as postulated by the 3XOR principle. The bad option (which also keeps the
average at 3/2) is that the fraction of clauses that are satisfied three times drops significantly below 1/4,
implying that significantly more than 3/4 of the clauses are satisfied either once or twice, or in other
words, as a 3NAE (3-“not all equal” SAT) formula. But here, let us combine two facts. One is that
for a random large enough formula, every assignment satisfies roughly 3/4 of the clauses as a 3NAE
formula. The other (to be explained below) is that there are known efficient algorithms for certifying
that no assignment satisfies more than 3/4 + ε fraction of the clauses of a 3NAE formula. Hence for a
random 3CNF formula, one can efficiently certify that the bad option mentioned above does not occur.
     Having established the 3XOR principle, the next step of our algorithm makes one round of Gaussian
elimination. That is, under the assumption that we are looking for near satisfiability as 3XOR (which
is simply a linear equation modulo 2), we can add clauses modulo 2. Adding (modulo 2) two matched
clauses, the common literals drop out, and we get a clause with only two literals whose XOR is expected
to be 0, namely, a 2EQ clause (EQ for equality). For example, from the clauses (x1 ⊕ x̄2 ⊕ x3 ) and
(x4 ⊕ x̄2 ⊕ x3 ) one gets the clause (x1 = x4 ). Doing this for all pairs of matched clauses in φ 0 , we get
a random 2EQ formula φ2eq . If φ 0 was nearly satisfiable as 3XOR, then φ2eq must be nearly satisfiable
as 2EQ. But if φ 0 is random, then φ2eq is essentially a random 2EQ formula. For such formulas, every
assignment satisfies roughly half the clauses. Moreover, there are known algorithms that certify this (to
be explained shortly). Hence we can strongly refute φ2eq as 2EQ, implying strong refutation of φ 0 as

                           T HEORY OF C OMPUTING, Volume 3 (2007), pp. 25–43                                 29
                                          U RIEL F EIGE AND E RAN O FEK

3XOR, implying strong refutation of φ 0 as 3SAT, implying refutation (though not strong refutation) of
φ as 3SAT.
     Let us briefly explain here the major part that we skipped over in the description of our algorithm,
namely, how to certify that a random 2EQ formula is not 1/2 + ε satisfiable, and how to certify that a
random 3NAE formula is not 3/4 + ε satisfiable. In both cases, we reduce the certification problem to
certifying that certain random graphs do not have large cuts, and then use the certification algorithms
mentioned in Section 2.3. (The principle of refuting random formulas by reduction to random graph
problems was introduced in [10].)
     To strongly refute random 2EQ formulas, we negate the first literal in every clause, getting a 2XOR
formula. Now we construct a graph whose vertices are the literals, and whose edges are the clauses. A
nearly satisfying 2XOR assignment naturally partitions the vertices into two sides (those literals set to
true by the assignment versus those that are set to false), giving a cut containing nearly all the edges. On
the other hand, if the original 2EQ formula was random, then so is the graph, and it does not have any
large cut. As explained in Section 2.3, we can efficiently certify that the graph does not have a large cut,
thus strongly refuting the 2XOR formula, and hence also strongly refuting the original 2EQ formula.
     To strongly refute random 3NAE formulas, we again consider a max-cut problem on a graph (in fact,
a multigraph, as there may be parallel edges) whose vertices are the literals. From each 3NAE clause we
derive three edges, one for every pair of literals. For example, from the 3NAE clause (x1 , x̄2 , x3 ) we get
the edges (x1 , x̄2 ), (x̄2 , x3 ) and (x3 , x1 ). It is not hard to see that if a 3/4 + ε fraction of the 3NAE clauses
are satisfied as 3NAE, than a 32 ( 34 + ε) = 12 + 2ε3 fraction of the edges of the graph are cut by the partition
induced by the corresponding assignment. Note that in our case (of φ 0 that is the union of random φ1
and random φ2 ) this graph is essentially a union of 6 random graphs: 3 graphs derived from the clauses
of φ1 (one with edges derived from the first two literals in every clause, one with edges derived from
the last two literals, and one from the first and third literal), and 3 graphs derived from φ2 . Hence it is
not expected to have a cut containing significantly more than half the edges. One may certify that this
is indeed the case either by using the algorithm of [16] on the whole graph, or by using the algorithm
of [7] on each of the 6 random graphs separately.
     Summarizing, our refutation algorithm extracts from φ a subformula φ 0 (composed of matched pairs
of clauses), checks that in φ 0 almost all literals appear roughly the same number of times, derives from
φ 0 certain graphs on 2n vertices and certifies that they do not have large cuts (e. g., by computing the
most negative eigenvalue of their adjacency matrices). The combination of all this evidence forms a
proof that φ is not satisfiable. If φ is random and large enough (cn3/2 clauses), then almost surely the
algorithm will indeed manage to collect all the desired evidence.



3    The refutation algorithm
                                           3/2
The input formula φ is taken from Cncn . We will use (?, w, `) to denote a clause in which the second and
the third literals are w and `, respectively, and the first literal can be any literal. The following algorithm
is used to extract φ 0 from φ .


                            T HEORY OF C OMPUTING, Volume 3 (2007), pp. 25–43                                       30
                E ASILY REFUTABLE SUBFORMULAS OF LARGE RANDOM 3CNF FORMULAS


           Algorithm Extract(φ )
             Set φ1 = φ2 = 0.
                           /
             For every ordered pair of literals (w, `):

              1. Count the number of clauses in φ of the form (?, w, `) and store it in N(w,`) .

              2. If N(w,`) ≥ 2 add to φ1 the first appearance of a (?, w, `) clause and add to φ2
                 the second appearance of a (?, w, `) clause.

           Return φ 0 = (φ1 , φ2 ).


    Each of φ1 , φ2 is a random formula, though clauses in φ1 (and φ2 ) are not completely independent
of each other: if a clause (x, y, z) appears, then the clause (t, y, z) cannot appear. From now on we will
concentrate on refuting φ 0 , ignoring the rest of φ . The number of matched pairs in φ 0 is denoted by m
(in Section 2.4 we used m to denote the number of clauses in φ ; from here on, m will denote the number
of matched pairs in φ 0 ).
                                                                                   3/2
Lemma 3.1. Let φ 0 be the formula returned by Extract(φ ), where φ ∈R Cncn . W.h.p. the number of
                           2
matched pairs in φ 0 is ∼ c8n .

   The proof of Lemma 3.1 is deferred to Section 4. Before specifying the algorithm, we introduce
additional notation and definitions which will ease the description of the algorithm.

Definition 3.2. Let φ be any 3CNF formula over n variables. The graph induced by φ has 2n ver-
tices (corresponding to all possible literals) and the following edges. Each clause of φ induces three
edges by taking all (unordered) pairs of literals from it (e. g., the clause (x, y, z̄) induces the edges
(x, y), (x, z̄), (y, z̄)). We denote the (multi) graph induced by φ by Gφ .

Definition 3.3. Let φ 0 = (φ1 , φ2 ) be a 3CNF formula with m pairs of matched clauses. The graph Gφ 0 is
the graph induced by φ 0 (as in Definition 3.2). The graph G2eq
                                                             φ 0 is a graph with 2n vertices (corresponding
to all literals). Its edges are as follows: each matched pair from φ1 , φ2 , say (x, w, `), (y, w, `), induces
exactly one edge (x̄, y). Note that we negate the literal which corresponds to the clause of φ1 .

Definition 3.4. Let φ be a formula with n variables and m clauses. The imbalance of a variable i (denoted
by Imi ) is the difference in absolute value between the number of times it appears with positive polarity
and the number of times it appears with negative polarity. The total imbalance of φ is ∑ni=1 Imi and the
normalized imbalance of φ is (1/3m) ∑ni=1 Imi .

    If the normalized imbalance of φ is bounded by δ , then φ is δ -balanced.

Definition 3.5. A 3CNF formula φ has the (1 − ε) 3XOR property if for every assignment A, if A
satisfies φ as 3CNF, then at least 1 − ε fraction of the clauses are satisfied as 3XOR.

Definition 3.6. A graph is said to have a δ -cut if there is a partition of its vertices into two disjoint sets
such that at least a δ -fraction of the edges cross this partition.

                           T HEORY OF C OMPUTING, Volume 3 (2007), pp. 25–43                                31
                                        U RIEL F EIGE AND E RAN O FEK


           Algorithm Refute(φ 0 )

              1. Certify that φ 0 has the (1 − γ) 3XOR property. Specifically:
                 (a) Find the normalized imbalance of φ 0 and denote it by δ 0 .
                 (b) Certify that Gφ 0 has no ( 12 + ε 0 )-cut (ε 0 is returned by a subroutine).
                 Set γ = 32 (δ 0 + 2ε 0 ).

              2. Certify that G2eq          1
                               φ 0 has no ( 2 + ε)-cut (ε is returned by a subroutine).

              3. If ε + 2γ < 12 return “unsatisfiable,” otherwise return “don’t know.”


    In steps 1(b) and 2 of Refute(φ 0 ) we use as a blackbox a subroutine for bounding the maximum cuts
in Gφ 0 and G2eq
              φ 0 . This subroutine is explained in Theorem 3.10.
    We first show that Refute(φ 0 ) is sound, i. e., whenever it returns “unsatisfiable” it holds that φ 0 can
not be satisfied. This follows from Lemma 3.8 and Theorem 3.7.

Theorem 3.7 (Soundness). Let φ 0 = (φ1 , φ2 ) be a 3CNF formula composed of pairs of matched clauses.
Denote by G2eq                           0                                             0
               φ 0 the graph induced by φ as described in Definition 3.3. The formula φ is not satisfiable
if all the following conditions hold:

   1. φ 0 has the (1 − γ)3XOR property,

   2. G2eq          1
       φ 0 has no ( 2 + ε)-cut,

   3. ε + 2γ < 12 .

Proof. We shall show that if φ 0 is satisfiable and has the (1 − γ)3XOR property, then G2eq
                                                                                        φ 0 has a (1 − 2γ)-
cut. Combined with the fact that G2eq
                                  φ 0 has no 1/2 + ε cut we derive a contradiction (since ε + 2γ < 1/2).
Consider the cut induced on the vertices of G2eq   φ 0 by a satisfying assignment A (where in one side there
are all the literals whose value is true and in the other side there are all the literals whose value is false).
A(x) denotes the value of the literal x induced by the assignment A. By the 3XOR property of φ 0 , all but γ
fraction of the clauses of φ 0 are satisfied as 3XOR clauses. Hence at least a (1 − 2γ) fraction of the pairs
of matched clauses have both clauses in the pair satisfied by A as 3XOR. Let (x, w, `), (y, w, `) be a pair
such that both (x, w, `) and (y, w, `) are satisfied as 3XOR. It holds that: A(x) + A(y) + 2(A(w) + A(`)) =
0 mod 2. Thus exactly one of the literals x̄, y is true and the other is false (under the assignment A), and
the edge (x̄, y) induced by this pair of clauses crosses the cut. It then follows that at least 1 − 2γ fraction
of the edges in G2eqφ 0 are cut edges.

Lemma 3.8 (The 3XOR lemma). Let φ be a 3CNF formula. If the following hold:

   1. φ is δ -balanced, and

   2. the graph (induced by φ ) Gφ has no ( 12 + ε)-cut,

                           T HEORY OF C OMPUTING, Volume 3 (2007), pp. 25–43                                 32
                E ASILY REFUTABLE SUBFORMULAS OF LARGE RANDOM 3CNF FORMULAS

then φ has the (1 − 23 (δ + 2ε))3XOR property.

Proof. The proof of this lemma appears in [6]; also a similar version of this lemma (for denser random
2kCNF formulas) appears in [5]. We repeat the proof for the sake of self-containment.
    Denote by m the number of clauses in φ and let A be a satisfying assignment. We bound from
above the number of satisfied appearances of literals. The assignment which maximizes the number
of satisfied appearances of literals is the ’majority vote’: a variable x is assigned ’true’ iff it appears
more times with positive polarity than with negative polarity. Using this assignment the total number
of satisfied appearances is 21 3m + ∑ni=1 Imi (where Imi denotes the imbalance of variable i). It follows
that on average each clause is satisfied at most 32 + 2m1
                                                           ∑ni=1 Imi = 23 (1 + δ ) times.
    A clause is satisfied as 3AND by A if all its literal are satisfied by A. We next show that the fraction of
clauses satisfied by A as 3AND is at least 1/4 − 3ε/2. Equivalently it is enough to show that the fraction
of clauses satisfied as 3NAE, denoted by β , is at most 3/4 + 3ε/2 (since A is a satisfying assignment).
Consider the graph Gφ induced by φ . We remind the reader that each clause of φ contributes a “triangle”
of 3 edges to Gφ (e. g., the clause (x, ȳ, z) contributes the edges (x, ȳ), (ȳ, z), (x, z)). Consider the cut
induced by the satisfying assignment A. Each clause satisfied as 3NAE contributes exactly 2 edges to
the cut, thus this cut has at least 2β m edges. But Gφ has no (1/2 + ε)-cut. It follows that 2β m ≤
(1/2 + ε)3m, giving β ≤ 3/4 + 3ε/2 as needed.
    It remains to show that all but a small fraction of the clauses are satisfied as 3XOR by A. Denote
by α1 , α2 , α3 the fraction of clauses which are satisfied exactly once, exactly twice and exactly 3 times
respectively (∑3i=1 αi = 1). We already know that α3 ≥ 1/4 − 3ε/2 and that each clause is satisfied at
most 3/2 + δ times on average, thus

                                     3(1 + δ )
                                               ≥ 3 · α3 + 2 · α2 + 1 · α1 .
                                         2
Substituting α1 = (1 − α3 − α2) and α3 with 41 − 23 ε yields that α2 ≤ 23 (δ + 2ε).
                                                                                                            3/2
    The following two Theorems show that our refutation algorithm refutes most formulas φ ∈R Cncn .
                                                                                                            3/2
Theorem 3.9. For any ε > 0 there is a constant c = c(ε) such that w.h.p. over the choice of φ ∈R Cncn :

  (a) the subformula φ 0 is ε-balanced, and

  (b) each of the graphs Gφ 0 and G2eq          1
                                   φ 0 has no ( 2 + ε)-cut.

    It is well known and follows from standard probabilistic calculations that a random graph (with large
enough average degree) has no (1/2+ε)-cut, and that a random formula is ε-balanced. The distributions
of φ 0 , Gφ 0 are close enough to the standard models of random formulas/graphs respectively, so that the
proof techniques used for the random cases can be applied also in our case. The proof of Theorem 3.9
is deferred to Section 4.2.

Theorem 3.10. There is a polynomial time algorithm that finds the imbalance of a 3CNF formula. There
is a polynomial time algorithm that for every graph G that has no ( 12 + ε)-cut, certifies that G has no
( 21 + δ )-cut, where δ (ε) → 0 as ε → 0.


                           T HEORY OF C OMPUTING, Volume 3 (2007), pp. 25–43                                 33
                                       U RIEL F EIGE AND E RAN O FEK

Proof. Finding the normalized imbalance of a formula φ is done by counting positive and negative
appearances for every variable, computing its imbalance and averaging.
    An algorithm for certifying a bound on the maximum cut of a graph is given in [16]. Given a
graph with m edges whose maximum cut is bounded by m(1/2 + ε) this algorithm produces a proof that
the input graph has no cut of cardinality m(1/2 + δ ). This algorithm has the property that δ → 0 as
ε → 0.

Corollary 3.11 (Almost-completeness). For sufficiently large constant c, the refutation algorithm w.h.p.
                                                      3/2
returns “unsatisfiable” for a random formula φ ∈R Cncn .
Proof. Using Theorems 3.9, 3.10 we will show that by taking c to be a sufficiently large constant, the
terms ε, γ from Refute(φ 0 ) can be made arbitrary small (and thus ε + 2γ < 1/2). The term γ equals
(3/2)(δ 0 + 2ε 0 ) where δ 0 is the imbalance of φ 0 and (1/2 + ε 0 ) is the bound returned by the algorithm
of [16] when applied to Gφ 0 . By Theorem 3.9, δ 0 → 0 as c increases. Furthermore, the value of the
maximum cut in both Gφ 0 , G2eq φ 0 approaches 1/2 as c increases. It then follows, using Theorem 3.10, that
the bounds 1/2 + ε 0 and 1/2 + ε returned by the certification algorithm (of [16]) applied to Gφ 0 and G2eq
                                                                                                        φ0 ,
respectively, can be made arbitrary close to 1/2 by setting c to be a sufficiently large constant.



4     Proofs of Lemma 3.1 and Theorem 3.9
4.1   Proof of Lemma 3.1
Proof of Lemma 3.1. For two random clauses the probability that they induce a matched pair of clauses
is
                                             1 1          1
                                        p,             ∼ 2 .
                                            2n 2n − 2 4n
Thus, the expected number of pairs that match is
                                       3/2 
                                        cn         c2 n3 1    c2 n
                                µ=             p∼           ∼      .                             (4.1)
                                          2          2 4n2     8
Let X denote the number of pairs of clauses in φ that match. We use the second moment to show that
w.h.p. X ∼ µ. By Chebyshev’s inequality
                                                             Var(X)
                                       Pr[|X − µ| > ε µ] <          .                                   (4.2)
                                                              ε2µ2
For every two clause locations in φ (e.g., first clause and third clause) we set an indicator random variable
                     3/2 
Xi (i = 1, 2, ..., cn2 ) to be 1 if the respective two clauses match and otherwise 0. For i 6= j we say that
i ∼ j if the pair i and pair j share one clause location, and otherwise i  j. (Note that if pair i and pair j
share two clause locations, then i = j. Note also that i  j might share the same clause without sharing
a clause location, if a certain clause happened to appear twice in φ .) For any fixed i we let
                                          ∆∗ = ∑ Pr[X j | Xi ] .                                        (4.3)
                                                 j: j∼i



                          T HEORY OF C OMPUTING, Volume 3 (2007), pp. 25–43                                34
                E ASILY REFUTABLE SUBFORMULAS OF LARGE RANDOM 3CNF FORMULAS

¿From symmetry, ∆∗ does not depend on i.

         Var(X) = E X 2 − µ 2 = E (∑ Xi )2 − µ 2 = E ∑ Xi2 + ∑ Xi X j − µ 2
                                                h                i

                                                i                       i         i6= j
                                                                                                                  !
                 = µ − µ + ∑ Pr[Xi X j ] = µ − µ + ∑ Pr[Xi ]
                             2                          2
                                                                                 ∑ Pr[X j ] + ∑ Pr[X j | Xi ]
                                 i6= j                            i             j: ji          j: j∼i

                 ≤ µ − µ + ∑ Pr[Xi ] ∑ Pr[X j ] + ∑ Pr[Xi ] ∑ Pr[X j | Xi ]
                             2
                                  i             j           i          j: j∼i
                                 |         {z       }                  |         {z       }
                                           µ2                                    ∆∗

                 = µ + ∑ Pr[Xi ]∆∗ = µ(1 + ∆∗ ) .                                                                     (4.4)
                         i

Substituting Var(X) with µ(1 + ∆∗ ) in inequality (4.2) we derive
                                                                µ(1 + ∆∗ ) 1 + ∆∗
                                     Pr[|X − µ| > ε µ] <                  ≤ 2     .                                   (4.5)
                                                                  ε2µ2      ε µ
Thus, since µ = ω(1), it suffices to show that ∆∗ = o(µ). It holds that

                                           ∆∗ ≤ 2(cn3/2 − 2)p = o(µ) .                                                (4.6)

So far we showed that w.h.p. X ∼ µ. Note that X may over-count the number of matched pairs in φ 0 .
The reason is that in φ there are expected to be sets of three or more clauses in which any two clauses
match. From each such set, Extract(φ ) takes to φ 0 only the first two clauses of the set.
    For i ≥ 3, we call a set of i clauses bad if each two clauses of the set match. Let Yi be the number of
bad sets of size i in φ . The number of matched clauses in φ 0 is at least X − ∑i≥3 2i Yi . Thus, in order to
show that the number of matched pairs in φ 0 is ∼ µ it is enough to prove that
                                            "   #
                                                   i
                                          E ∑         Yi = o(µ) .
                                             i≥3 2

Then, using Markov’s inequality we derive that w.h.p. ∑i≥3 2i Yi = o(µ). It holds
                                                             

                        3/2                                             !i                    i−1
                     i      i cn            i2                        cn3/2 e                 1
                 E     Yi =          pi−1 ≤                                                                   .       (4.7)
                     2      2   i           2                           i              4n2 (1 − 1/n)

Thus, the sum ∑i≥3 E 2i Yi is bounded by the sum of a geometric sequence whose first term (i = 3) is
                        

o(µ). It follows that E ∑i≥3 2i Yi = o(µ).
                               



4.2   Proof of Theorem 3.9
Let P ⊂ Cn ×Cn be the set of all possible matched pairs of clauses, and let Pm be the set of all m-tuples of
matched pairs of clauses. For a pair of matched clauses (c1 , c2 ) ∈ P, the inducing pair is the pair formed

                          T HEORY OF C OMPUTING, Volume 3 (2007), pp. 25–43                                             35
                                           U RIEL F EIGE AND E RAN O FEK

by the second and third literals in each of the two matched clauses c1 , c2 . Let m denote the number of
                              0             0    m                                    m
    0                P
                        in φm. Note that φ ∈ P0 , however not all the elements of P are in the support
matched pairs of clauses
of φ . We denote by m ⊂ P the support of φ , i. e., the collection of all (ordered) m matched pairs for
which every matched pair of clauses has a distinct inducing pair. We claim that φ 0 is a random element
of mP .

Lemma 4.1. Given that φ 0 has m matched clauses, the set of inducing pairs is a random set of m different
ordered pairs of literals (each pair has two distinct literals).

Proof. Let C denote the set of clauses that have a matching clause (from the original formula φ ). Denote
by L the set of inducing pairs, i. e., pairs that participate as the second and third literal in one of the
clauses of C. The set L may be any set of distinct ordered pairs of size m. By symmetry, any such set is
equally likely to be L. The explanation is as follows. Assume we expose the indices of the clauses in C
and also the partition of C into equivalent classes (each equivalent class is a maximal set of clauses that
have the same second and third literals). Given this information, for each choice of L, the probability
that L is the set of inducing pairs is the same (for any L the number of ways to match the pairs of L with
the equivalent classes is the same; additionally, the probability for all other clauses not in C to avoid all
the pairs of L in the second and third literals is the same).

Proof of Theorem 3.9. ¿From here on we will assume that m is a fixed number         and that m ∼ c2 n/8 (this
                                                                             P
is justified by Lemma 3.1). The formula φ 0 is a random element of m . Let φ 00 ∈R Pm (i. e., φ 00 is
                                                                               

composed of m random and independent samples from P). Denote by Ω0 the event that every matched
pair of clauses in φ 00 has a distinct inducing pair. Conditioned on Ω0 , φ 00 has the distribution of φ 0 . As
Lemma 4.2 shows, the event Ω0 is not too rare (the proof of Lemma 4.2 is deferred to the end of this
section).
                           2                                                  4
Lemma 4.2. Let m = c8n . For φ 00 ∈R Pm it holds that Pr[Ω0 ] ≥ e−c /128
Furthermore, as Lemma 4.3 states, Gφ 00 is unlikely to have a large cut.
                           2
Lemma 4.3. Let m ∼ c8n . For φ 00 ∈R Pm with probability 1 − o(1) it holds that Gφ 00 has no ( 21 + 4c )-cut.
Combining Lemmas 4.2, 4.3 we now show that Gφ 0 is unlikely to have a large cut. The reasoning is as
follows:

                          Pr [Gφ 0 has ( 12 + 4c ) cut] = 00 Pr m [Gφ 00 has ( 21 + 4c ) cut | Ω0 ]      (4.8)
                      φ 0 ∈R (mP )                          φ ∈R P

                                           Prφ 00 [Gφ 00 has ( 12 + 4c ) cut]    o(1)
                                       ≤                         0
                                                                              ≤ −c2 /128 = o(1) ,        (4.9)
                                                        Prφ 00 [Ω ]            e

where the last equality is because c is a fixed constant.
   A similar argument shows that if w.h.p. G2eq                                00
                                                φ 00 has no (1/2 + ε)-cut and φ is ε-balanced, then w.h.p.
G2eq                           0
 φ 0 has no (1/2 + ε)-cut and φ is ε-balanced. Hence, we only need to prove the following lemmas.
                           2
Lemma 4.4. Let m ∼ c8n . For φ 00 ∈R Pm with probability 1 − o(1) it holds that G2eq           1   5
                                                                                 φ 00 has no ( 2 + c )-cut.


                               T HEORY OF C OMPUTING, Volume 3 (2007), pp. 25–43                            36
                 E ASILY REFUTABLE SUBFORMULAS OF LARGE RANDOM 3CNF FORMULAS

                          2
Lemma 4.5. Let m ∼ c8n . For φ 00 ∈R Pm , with probability 1 − o(1) it holds that φ 00 is 3c -balanced.


    To complete the proof of Theorem 3.9 we now give the proofs of Lemmas 4.3, 4.4, 4.5, and 4.2.

Proof of Lemma 4.3. Fix a partition (V1 ,V2 ) of the vertices of Gφ 00 . We denote by Wφ 00 (V1 ,V2 ) the num-
ber of edges crossing the cut (V1 ,V2 ) in Gφ 00 . For φ 00 ∈R Pm the expectation of Wφ 00 (V1 ,V2 ) is at most
6m(1/2 + 1/n). For any formula φ 00 ∈ Pm we let f (φ 00 ) be equal to Wφ 00 (V1 ,V2 ). Let Xi be the expected
value of f after exposing the first i pairs of φ 00 (for i = 0, 1, . . . , m). The sequence X0 , X1 , . . . , Xm is a
martingale. The following two facts:

   1. for any φ 00 ∈ Pm , changing one matched pair (an element of P) can change the value of f by at
      most 4 (each clause forms a triangle that contributes at most 2 edges to the cut), and

   2. φ 00 is taken from a product measure Pm ,

imply that for every i it holds that |Xi − Xi+1 | ≤ 4 (see Theorem 7.4.1 from [1]). Azuma’s inequality
implies that for any λ > 0
                                                                                λ2
                                       Pr[ f (φ 00 ) − 6m( 12 + 1n ) > λ ] < e− 2m42 .
                                         2
Setting λ = 17m              c n
             c and using m ∼ 8 we derive

                                       Pr [Wφ 00 (V1 ,V2 ) > 6m( 12 + 4c )] < 2−1.1n .
                                    φ 00 ∈R Pm

Using the union bound over all possible cuts we derive that w.h.p. (for φ 00 ∈R Pm ) the graph Gφ 00 has no
( 21 + 4c )-cut.


    The proof of Lemma 4.4 is very similar to the proof of Lemma 4.3, details are omitted.

Proof of Lemma 4.5. We first bound the expected imbalance of φ 00 . The total imbalance of φ 00 is bounded
by the sum of the imbalances of φ1 , φ2 (where φ1 /φ2 are formed by taking the first/second clause from
each matched pair of φ 00 ). Since φ2 has the same distribution of φ1 , it is enough to bound the expected
total imbalance of φ1 (and then multiply by two). The total imbalance of φ1 is the sum of the imbalances
of all variables in φ1 , i. e., ∑ni=1 Imi (the imbalance of xi in φ1 is denoted by Imi ).
     For any variable xi we denote by di the number of appearances of x in φ1 . For φ 00 ∈R Pm it holds
that ∑ni=1 E[di ] = 3m. By symmetry, for every i it holds that E[di ] = 3m/n. We denote d , 3m/n. Given
that di = k the polarities of the appearances of xi are still random. Given that di = k, the imbalance of
xi is the absolute value of the sum of k independent random variables, where each random variable has
probability 1/2 of being 1 and probability 1/2 of being −1. Hence E[Im2i | di = k] = k. It then follows
that

                  E[Im2i ] = ∑ Pr[di = k] · E[Im2i | di = k] = ∑ k Pr[di = k] = E[di ] = d ,                  (4.10)
                                k                                      k


                              T HEORY OF C OMPUTING, Volume 3 (2007), pp. 25–43                                   37
                                         U RIEL F EIGE AND E RAN O FEK

where all probabilities and expectations are taken over φ 00 ∈R Pm . Using the convexity of the square
function
                                               q             √
                                      E[Imi ] ≤ E[Im2i ] ≤ d .                                  (4.11)
                                                       √
So far we showed that Eφ 00 ∈R Pm ∑ni=1 Imi ≤ n d. Thus, for φ 00 ∈R Pm the total imbalance of φ 00 is
                                              
                            √
expected to be less than√ 2n   d. We will now show that the total imbalance of φ 00 is not likely to be
too large relative to 2n d. For any φ 00 ∈ Pm we let f (φ 00 ) be the total imbalance of φ 00 . Changing one
matched pair of clauses in φ 00 changes the total imbalance of φ 00 by at most 12. Azuma’s inequality
implies that for any λ > 0
                                                00
                                                          √               2
                                                                        − λ 2
                                  00
                                     Pr m
                                            f (φ    ) − 2n  d > λ   < e  2m12 .
                                    φ ∈R P
             √
Setting λ = n d and using d = 3m/n we derive
                                                              00
                                                                      √        n

                            00
                               Pr m
                                    [the total imbalance of φ    > 3n  d] < e− 96 .
                           φ ∈R P

                                                                             √
It then follows that the normalized imbalance of φ 00 is w.h.p. bounded by 3n d/6m ≤ 3/c (using
m ∼ c2 n/8 and d = 3m/n).

Proof of Lemma 4.2. We generate φ 00 iteratively by choosing each time (independently) a random ele-
ment of P. For each new random element of P, the probability for it to have an inducing pair which
is different from all previous inducing pairs is ≥ 1 − m/N, where N = 2n · (2n − 2) is the number of
possible inducing pairs. It then follows that with probability of at least
                                            m (1)                   (2)
                                 m                                         4
                       1−                                      m
                                                ≥ exp −m 2n2 (1−1/n)    ≥ e−c /128 ,
                             2n(2n − 2)

each of the matched pairs in φ 00 has a distinct inducing pair. Inequality (1) is because 1 − x ≥ e−2x
holds for every x ∈ [0, 1/2]. The constant in the exponent following Inequality (2) is derived by taking
m = c2 n/8.


5    Practical considerations for the refutation algorithm
Recall that our refutation algorithm extracts from φ a subformula φ 0 that contains matched pairs of
clauses, and then refutes φ 0 . The longer φ 0 is, the easier it is to refute it. For simplicity, we matched a
pair of clauses only if they agreed on their last two literals. Moreover, every clause of φ participated in
at most one pair of matched clauses in φ 0 , even though a clause may be eligible to participate in more
than one matched pair. In practical implementations, it is advantageous not to have these restrictions,
and thus get a longer formula φ 0 . In particular, we may allow the same clause to participate in several
pairs of matched clauses, by duplicating it. More importantly, we may match any two clauses that share
two variables (regardless of the polarity of the variables, and of their location within the clauses). For
example, the two clauses (x, w, `) and (w̄, `, y) can be matched. If φ 0 is satisfied by an assignment A that

                          T HEORY OF C OMPUTING, Volume 3 (2007), pp. 25–43                                38
                E ASILY REFUTABLE SUBFORMULAS OF LARGE RANDOM 3CNF FORMULAS

has the 3XOR property, then one step of Gaussian elimination gives in this case A(x) + A(w) + A(`) +
A(y) + A(w̄) + A(`) = 0 mod 2, thus A(x) + A(y) = 1 mod 2. Hence, we will associate the edge (x, y)
with this pair of matched clauses so that the edge induced by this pair in G2eq
                                                                             φ 0 will cross the cut which
corresponds to the assignment A. Using the principles above, the number of pairs of matched clauses
                                                                                  3/2 
that one expects to extract from a random formula of length cn3/2 is roughly 3cn2 / n2 ' 9c2 n.
                                                                                         

    We used the principles above to implement the algorithm in practice. In the current implementa-
tion, the problem of refuting φ 0 is reduced to strong refutation of two 2XOR formulas. We performed 2
eigenvalue computations on matrices of size n × n, whereas the original refutation algorithm performed
eigenvalue computations on matrices of size 2n × 2n. Our implementation uses the conditions of Theo-
rem 5.2 to refute φ 0 . Before stating Theorem 5.2 we need the following definition.
Definition 5.1. Let φ be a 2XOR formula with m clauses and n variables. Aφ is the following n × n
symmetric matrix associated with φ . Initially Aφ is the zero matrix. For each clause of the forms (x, ȳ)
or (x̄, y) we add +1 to positions A(x, y) and A(y, x). For each clause of the forms (x, y) or (x̄, ȳ) we add
−1 to positions A(x, y) and A(y, x).
   A similar matrix can be defined for a 2EQ formula, just by reducing the 2EQ formula into a 2XOR
formula.
Theorem 5.2. Let φ 0 be a 3CNF formula with m pairs of matched clauses and n variables. Let φ2xor be
the 2XOR formula with 6m clauses induced by replacing each 3CNF clause of φ 0 by three 2XOR clauses
(one for every two literals). Let φ2eq be the 2EQ formula with m clauses induced by adding pairs of
matched clauses modulo 2. If the following hold then φ 0 is not satisfiable:
  (1) φ 0 is δ -balanced.

  (2) Let λ2xor , λ2eq denote the largest eigenvalues of Aφ2xor , Aφ2eq respectively; then
                                                   n
                                           3δ +      (λ2xor + λ2eq ) < 12 .
                                                  4m

    The proof of Theorem 5.2 will follow shortly.
Lemma 5.3 (2XOR strong refutation). Let φ be a 2XOR formula with m clauses. If λ is the maximum
eigenvalue of Aφ then φ is at most ( 12 + ε) satisfiable, for ε = 4m
                                                                  λn
                                                                     .
Proof. Let T be an assignment satisfying the most clauses as 2XOR. Let x be the ±1 vector which
corresponds to T: xi equals +1 if T (i) = true, otherwise xi = −1. It holds that xt Aφ x = Aφ • xxt (where
for any two matrices A, B of the same dimensions, A • B = ∑i, j A(i, j)B(i, j)). Every 2XOR clause
satisfied by T contributes 2 to Aφ • xxt whereas every unsatisfied clause contributes −2. If T satisfies
exactly (1/2 + ε)m clauses, then

                            xt Aφ x ( 12 + ε)m(+2) + ( 21 − ε)m(−2)
                                   =                                = 4εm/n .                          (5.1)
                              xt x                 n
Since xt Aφ x/xt x ≤ λ we derive ε ≤ λ n/4m.


                            T HEORY OF C OMPUTING, Volume 3 (2007), pp. 25–43                             39
                                       U RIEL F EIGE AND E RAN O FEK

Lemma 5.4 (3XOR property certification). Let φ be a 3CNF formula which is δ -balanced with m
clauses and n variables. Assume that the 2XOR formula induced by replacing each 3CNF clause by 3
2XOR clauses is at most ( 21 + γ) satisfiable. Then φ has the (1 − 23 (δ + 2γ))3XOR property.

Proof. Assume that the assignment T satisfies φ as 3CNF. Denote by αi the number of clauses satisfied
exactly once, twice, three times respectively (∑3i=1 αi = 1). We need to upper bound α2 . It can not be
that α3 is too small, as in such case there are many satisfied clauses in the induced 2XOR formula: if a
clause of φ is satisfied exactly once or exactly twice, then out of the three 2XOR clauses induced by it,
two are satisfied as 2XOR. Since the number of satisfied 2XOR clauses (in the induced 2XOR formula)
is bounded by 3m(1/2 + γ), we derive 2m(α1 + α2 ) ≤ 3m(1/2 + γ), or equivalently α3 ≥ 1/4 − 3γ/2.
The imbalance of φ is bounded by δ , thus the number of satisfied literals in φ is at most 23 (1 + δ )m. This
implies
                                        3(1 + δ )
                                                  ≥ α1 + 2α2 + 3α3 .
                                            2
Using the facts: α1 = 1 − α2 − α3 and α3 ≥ 41 − 32 γ we deduce
                                       3                        1   3
                                       2 (1 + δ ) ≥ α2 + 1 + 2( 4 − 2 γ),                              (5.2)

and thus 23 (δ + 2γ) ≥ α2 .

Proof of Theorem 5.2. Assume that φ 0 is satisfiable as 3CNF. We show that property (1) contradicts
property (2). Set
                                         λ2xor n                    λ2eq n
                                 ε2xor ,         ,    and    ε2eq ,        .
                                         4 · 6m                       4m
By Lemma 5.3 φ2xor and φ2eq are at most (1/2 + ε2xor ) and (1/2 + ε2eq ) satisfied, respectively . It then
follows, using Lemma 5.4, that φ 0 is (1 − (3/2)(δ + 2ε2xor )) satisfied as 3XOR. Each pair of matched
clauses of φ 0 for which both clauses are satisfied as 3XOR yields a satisfied 2EQ clause of φ2eq . As φ2eq
is at most (1/2 + ε2eq ) satisfied, we conclude that

                                1 − 3(δ + 2ε2xor ) ≤ 21 + ε2eq , or equivalently                       (5.3)
                              3(δ + 2ε2xor ) + ε2eq ≥ 21   .                                           (5.4)

Substituting ε2xor and ε2xor according to the definitions in (5) we derive
                                                n
                                        3δ +      (λ2xor + λ2eq ) ≥ 12 ,
                                               4m
which contradicts property (2).

We generated several random formulas with n = 5 · 104 variables and 27335932 = d2.445 · n3/2 e clauses.
Our algorithm refuted all of them (our current implementation fails to refute formulas of significantly
lower clause density). We give more detailed results for one specific (though typical) run. Our algorithm
extracted a subformula φ 0 with m = 2689832 pairs of matched clauses. Table 1 summarizes the values
computed by the algorithm along with a heuristic estimation of what we could have expected them to
be.

                          T HEORY OF C OMPUTING, Volume 3 (2007), pp. 25–43                               40
               E ASILY REFUTABLE SUBFORMULAS OF LARGE RANDOM 3CNF FORMULAS

                               m         λ2eq      λ2xor          δ                Bound
          Algorithm         2689832    20.8961    54.6503     0.048662          0.49706 < 21
       Heuristic bound:     2690112    20.7454
                                        √         50.8157
                                                   √          0.05565           0.49946 < 21
                                                                pn          n
       (using formula)       ≈ 9c2 n   2 18c2     2 108c2          6m      4m (λ2eq + λ2xor ) + 3δ

          Table 1: Results for a random formula with 5 · 104 variables and 27335932 clauses.



    Let us explain the heuristic bounds
                                      √ used in the table. To estimate the largest eigenvalue of a sym-
metric matrix we use the formula 2 d where d is the average `1 norm of each of the rows. This bound
is known to be true for various random graph models, but apparently is too optimistic for Aφ2xor . To
estimate the imbalance δ we assume that each variable appears exactly 6m/n times, each time with ran-
dom polarity. The difference between the number of positive and negative appearances behaves like the
distance from 0 when performing a p    random walk of length 6m/n on Z (starting from 0). The expected
square of the distance is 6m/n, and 6m/n   p is an upperpbound on the expected distance. We estimated
the expected normalized imbalance as n 6m/n/6m = n/6m.
    A few words about the implementation of our algorithm. The part of extracting the subformula
  0
φ was implemented in C. The other parts (computing the imbalance and the eigenvalues) were imple-
mented in Matlab. To save memory we used Matlab’s sparse matrix objects. The heavy part of the
algorithm was computing the largest eigenvalues of the two matrices Aφ2xor , Aφ2eq . This part took 63
minutes on an Intel Xeon CPU 1700MHz with 256K cache and 2Gbyte memory (running the Linux
operating system).
    It may be interesting to see if other refutation algorithms (which may use various optimized versions
of resolution, OBDDs, backtracking, to name a few of the common algorithmic principles used for
refutation) can handle random formulas with as many variables as those handled by our algorithm. We
have not made a serious attempt to check this.


Acknowledgements
This research was supported by a grant from the G.I.F., the German-Israeli Foundation for Scientific
Research and Development. We thank Amin Coja-Oghlan for useful discussions.


References
 [1] * N. A LON AND J. S PENCER: The Probabilistic Method. John Wiley and Sons, 2002. 4.2

 [2] * E. B EN -S ASSON AND A. W IGDERSON: Short proofs are narrow— resolution made simple. J.
     ACM, 48(2):149–169, 2001. [JACM:10.1145/375827.375835]. 1

 [3] * V. C HV ÁTAL AND E. S ZEMER ÉDI: Many hard examples for resolution. J. ACM, 35(4):759–768,
     1988. [JACM:10.1145/48014.48016]. 1

                          T HEORY OF C OMPUTING, Volume 3 (2007), pp. 25–43                           41
                                    U RIEL F EIGE AND E RAN O FEK

 [4] * A. C OJA -O GHLAN , A. G OERDT, AND A. L ANKA: Strong refutation heuristics for ran-
     dom k-SAT. Combinatorics, Probability and Computing, 16(1):5–28, 2007. Conference ver-
     sion appeared in Proc. 8th Internat. Workshop on Randomization and Computation (RANDOM
     ’04), volume 3122 of LNCS, pp. 310–321. Springer, 2004. [doi:10.1017/S096354830600784X,
     Springer:x1tlgm3cexcj]. 1

 [5] * A. C OJA -O GHLAN , A. G OERDT, A. L ANKA , AND F. S CHADLICH: Techniques from combina-
     torial approximation algorithms yield efficient algorithms for random 2k-sat. Theoretical Computer
     Science, 329:1–45, 2004. [TCS:10.1016/j.tcs.2004.07.017]. 1, 2.4, 3

 [6] * U. F EIGE: Relations between average case complexity and approximation complexity. In Proc.
     34th STOC, pp. 534–543. ACM Press, 2002. [STOC:509907.509985]. 1, 2.4, 3

 [7] * U. F EIGE AND E. O FEK: Spectral techniques applied to sparse random graphs. Random Struc-
     tures and Algorithms, 27(2):251–275, 2005. [RSA:10.1002/rsa.20089]. 1, 2.3, 2.4

 [8] * E. F RIEDGUT AND J. B OURGAIN: Sharp thresholds of graph properties, and the k-sat prob-
     lem. Journal of the American Mathematical Society, 12(4):1017–1054, 1999. [JAMS:1999-12-
     04/S0894-0347-99-00305-7]. 1

 [9] * J. F RIEDMAN , A. G OERDT, AND M. K RIVELEVICH: Recognizing more unsatisfi-
     able random 3-sat instances efficiently. SIAM J. on Computing, 35(2):408–430, 2005.
     [SICOMP:10.1137/S009753970444096X]. 1, 2.4

[10] * A. G OERDT AND M. K RIVELEVICH: Efficient recognition of random unsatisfiable k-SAT in-
     stances by spectral methods. In Proc. Ann. Symp. on Theoretical Aspects of Computer Science
     (STACS ’01), volume 2010 of LNCS, pp. 294–304. Springer, 2001. [STACS:27cr0lrbpr7px7y3].
     1, 2.4

[11] * A. G OERDT AND A. L ANKA: Recognizing more random 3-sat instances efficiently. LICS’03
     Workshop on Typical Case Complexity and Phase Transitions, 2003. 1, 2.4

[12] * M. H AJIAGHAYI AND B. S ORKIN: The satisfiability threshold of random 3-SAT is at least 3.52,
     2003. [arXiv:math/0310193]. 1

[13] * S. JANSON , Y. S TAMATIOU , AND M. VAMVAKARI: Bounding the unsatisfiability threshold of
     random 3-sat. Random Structures and Algorithms, 17(2):103–116, 2000. [RSA:10.1002/1098-
     2418(200009)17:2¡103::AID-RSA2¿3.0.CO;2-P]. 1

[14] * A. K APORIS , L. K IROUSIS , AND E. L ALAS: The probabilistic analysis of a greedy satisfiability
     algorithm. Random Structures and Algorithms, 28(4):444–480, 2006. [RSA:10.1002/rsa.20104].
     1

[15] * S. K HOT: Ruling out PTAS for graph min-bisection, densest subgraph and bipartite clique. In
     Proc. 45th FOCS, pp. 136–145. IEEE Computer Society, 2004. [FOCS:10.1109/FOCS.2004.59].
     1

                         T HEORY OF C OMPUTING, Volume 3 (2007), pp. 25–43                           42
              E ASILY REFUTABLE SUBFORMULAS OF LARGE RANDOM 3CNF FORMULAS

[16] * U. Z WICK: Outward rotations: a tool for rounding solutions of semidefinite programming re-
     laxations, with applications to MAX CUT and other problems. In Proc. 31st STOC, pp. 679–687.
     ACM Press, 1999. [STOC:301250.301431]. 2.3, 2.4, 3, 3


AUTHORS

     Uriel Feige [About the author]
     Microsoft Research
     One Microsoft Way
     Redmond, WA 98052-6399
            and
     Weizmann Institute of Science
     Rehovot 76100, Israel
     urifeige microsoft com
     uriel feige weizmann ac il
     http://www.wisdom.weizmann.ac.il/~feige


     Eran Ofek [About the author]
     Department of Computer Science
     and Applied Mathematics
     Weizmann Institute of Science
     Rehovot 76100, Israel
     eran ofek weizmann ac il
     http://www.wisdom.weizmann.ac.il/~erano


ABOUT THE AUTHORS

     U RIEL F EIGE is a member of the theory group at Microsoft Research, currently on leave
        from the Weizmann Institute. His main research interests involve studying the border-
        line between P and NP as it manifests itself in approximation algorithms, heuristics,
        and exact algorithms that are not necessarily polynomial time. Other activities he en-
        joys include playing the piano, dancing with his wife, and helping his kids with their
        homework.


     E RAN O FEK obtained his Ph. D. in Computer Science from the Weizmann Institute of Sci-
        ence in 2006. His Ph. D. and M. S. advisor was Uriel Feige. His research interests
        include optimization algorithms, random structures, and average case complexity. On
        his spare time he likes to play soccer, volleyball, or spend time with his two sons.




                       T HEORY OF C OMPUTING, Volume 3 (2007), pp. 25–43                         43