New Algorithms and Lower Bounds for Circuits With Linear Threshold Gates

Authors R. Ryan Williams,
Plaintext
                           T HEORY OF C OMPUTING, Volume 14 (17), 2018, pp. 1–25
                                        www.theoryofcomputing.org




        New Algorithms and Lower Bounds for
         Circuits With Linear Threshold Gates
                                                  R. Ryan Williams∗
                   Received June 28, 2015; Revised June 4, 2018; Published December 10, 2018




       Abstract: Let ACC ◦ THR be the class of constant-depth circuits comprised of AND, OR,
       and MODm gates (for some constant m > 1), with a bottom layer of gates computing arbitrary
       linear threshold functions. This class of circuits can be seen as a “midpoint” between ACC
       (where we know nontrivial lower bounds) and depth-two linear threshold circuits (where
       nontrivial lower bounds remain open). We give an algorithm for evaluating an arbitrary
                                   o(1)                                   o(1)
       symmetric function of 2n ACC ◦ THR circuits of size 2n , on all possible inputs, in
       2n · poly(n) time. Several consequences are derived:
                                                                                       o(1)
           • The number of satisfying assignments to an ACC ◦ THR circuit of 2n size is com-
                              ε
             putable in 2n−n time (where ε > 0 depends on the depth and modulus of the circuit).
           • NEXP does not have quasi-polynomial size ACC ◦ THR circuits, and NEXP does not
             have quasi-polynomial size ACC ◦ SYM circuits. Nontrivial size lower bounds were
             not known even for AND ◦ OR ◦ THR circuits.
           • Every 0-1 integer linear program with n Boolean variables and s linear constraints
                                        4
             is solvable in 2n−Ω(n/ log (sM(log n))) · poly(s, n, M) time with high probability, where
                     o(1)
             M ≤ 2n is an upper bound on the bit-length of the coefficients. (For example, 0-1
                                                        o(1)   o(1)
             integer programs with weights in [−2n , 2n ] and poly(n) constraints can be solved
                           4
             in 2n−Ω(n/ log n) time.)
   ∗ Supported by an Alfred P. Sloan Fellowship, a Microsoft Research Faculty Fellowship, a David Morgenthaler II Faculty Fel-

lowship, NSF CCF-1212372 and NSF CCF-1552651 (CAREER). Any opinions, findings, and conclusions or recommendations
expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.


ACM Classification: F.1.1, F.2.3, G.1.6
AMS Classification: 68Q15, 68Q17, 68Q15
Key words and phrases: circuit complexity, satisfiability algorithms, linear threshold functions


© 2018 R. Ryan Williams
c b Licensed under a Creative Commons Attribution License (CC-BY)                                 DOI: 10.4086/toc.2018.v014a017
                                                       R. RYAN W ILLIAMS

            We also present an algorithm for evaluating depth-two linear threshold circuits (also
        known as THR ◦ THR) with exponential weights and 2n/24 size on all 2n input assignments,
        running in 2n · poly(n) time. This is evidence that non-uniform lower bounds for THR ◦ THR
        are within reach.


1     Introduction
In the non-uniform Boolean circuit model, one designs an infinite family of logical circuits {Cn }, one for
each input length n, in order to recognize a given binary language L ⊆ {0, 1}? . This model is notoriously
powerful, even when the size of Cn is bounded from above by a fixed polynomial in n, defining the
complexity class P/poly. With polynomial size circuits, one can already “compute” some undecidable
languages, such as L0 = {1n | the nth Turing machine halts on blank tape}. Nevertheless, it is strongly
believed that NP 6⊂ P/poly, meaning that for even modestly-sized instances of NP-complete problems,
the sizes of computations on such instances must be inevitably gigantic. However, knowledge of P/poly
is rather poor, partly due to the “infinite” nature of the model: it is open if the huge complexity class
nondeterministic exponential time (NEXP) is contained in P/poly. This containment would imply that
problems verifiable with exponentially-long witnesses could be efficiently “solved” with small circuits.
The possibility looks obviously absurd, but we do not know at present how to rule it out.
     In recent years, it has been demonstrated that the existence of nontrivial circuit-analysis algorithms is
closely linked to the NEXP versus P/poly problem. For instance, Impagliazzo, Kabanets, and Wigder-
                                                                   o(1)
son [20] showed that NEXP 6⊂ P/poly follows, if there is a 2n time algorithm that can approximate
a given circuit’s acceptance probability to within 1/10. They also proved a partial converse, in that
NEXP 6⊂ P/poly implies a certain kind of derandomization. Subsequent work [53] strengthened the
algorithms-to-lower bounds implication, proving that a similar algorithm which (for every k) runs in
2n−ω(log n) time on all n-input nk -size circuits still implies NEXP 6⊂ P/poly. A variant of this implication
(for circuit satisfiability algorithms) was combined with an satisfiability algorithm for a restricted circuit
class called ACC, implying that NEXP does not have polynomial-size ACC circuits [54]. Recently, it
was shown that NEXP 6⊂ P/poly is equivalent to establishing a “weak” form of natural proofs [55],
building on Impagliazzo, Kabanets, and Wigderson. In particular, NEXP 6⊂ P/poly if and only if there
is a “constructive” property of Boolean functions that is “useful” against P/poly with O(log n) bits of
advice.1 Hence, assuming strong cryptography is possible, NEXP lower bounds must somehow confront
the framework of natural proofs but sidestep the “large” condition.
     To continue progress on circuit lower bounds for NEXP, it is imperative to understand algorithms
for analyzing circuits, such as algorithms for circuit satisfiability, evaluating a circuit on all 2n inputs,
and approximating the acceptance probability of a circuit. (Recent surveys on these issues include [52,
46, 10, 37].) In this paper, we make this sort of algorithmic progress for circuits with arbitrary linear
threshold gates: such a gate outputs 1 if and only if a certain linear inequality ∑i wi xi ≥ t is true, where
wi ,t ∈ Z are weights and xi ∈ {0, 1} are inputs to the gate. Linear threshold functions have been studied
for decades, coinciding with research on neural networks [32, 33]. Low-depth linear threshold circuits are
powerful: many basic functions in arithmetic, algebra, and cryptography are known to be implementable
    1 The natural proofs barrier [41] states that if such a property is also “large” (true of a large fraction of functions) then strong

cryptographic pseudorandom generators do not exist.


                             T HEORY OF C OMPUTING, Volume 14 (17), 2018, pp. 1–25                                                    2
       N EW A LGORITHMS AND L OWER B OUNDS FOR C IRCUITS W ITH L INEAR T HRESHOLD G ATES

with only constant-depth linear threshold circuits [45, 48, 49, 30, 35]. In terms of lower bounds for
such circuits, very weak questions remain major open problems: for example, is all of NEXP solvable
with polynomial-size depth-two linear threshold circuits with exponential-size weights? (Note that for
thresholds with polynomially-bounded weights, depth-two threshold lower bounds are known; however
depth-three lower bounds are still open. The survey of Razborov [40] is still relatively current on these
points.) Depth-two threshold circuits correspond to multilayer perceptrons with only one hidden layer.
Despite considerable study in neural networks and deep learning, we still lack understanding of the power
of depth-two.
    In this paper, we report some new progress on understanding the power of linear threshold gates.

Algorithms and lower bounds for ACC with threshold gates. A MODm gate outputs 1 if and only if
the sum of its input bits is divisible by m. Let ACC ◦ THR denote the class of circuits consisting of AND,
OR, MODm gates for some constant m, and linear threshold gates, with unbounded fan-in and constant
depth, such that the inputs of all linear threshold gates connect directly to the circuit’s input variables.
Let SYM ◦ ACC ◦ THR be the class of circuits where the output gate computes an arbitrary symmetric
function, and its inputs connect to the outputs of ACC ◦ THR circuits. We show that such circuits can be
                                                                   o(1)
very efficiently evaluated on all 2n inputs, even if they are of 2n size.
                                                                                   o(1)
Theorem 1.1. Given a SYM ◦ ACC ◦ THR circuit with n inputs and 2n size, we can produce its outputs
on all 2n inputs in 2n · poly(n) time.2 More generally, such a circuit of size s can be evaluated on all
                                         c
inputs in 2n · poly(log s, n) + 2O(log s) time, for some c ≥ 1 depending on the depth of the circuit and the
modulus m of its MODm gates.

    The proof of Theorem 1.1 also carries through for SYM ◦ ACC ◦ SYM circuits, where the bottom-layer
gates compute arbitrary symmetric functions (i. e., functions which only depend on the number of true
             o(1)
inputs) of 2n wires. The algorithm of Theorem 1.1 can be used to count the satisfying assignments to
ACC ◦ THR circuits:

Theorem 1.2. For every integer m > 1 and d > 0, there is an ε > 0 such that counting satisfying
                                                            ε                               ε
assignments to ACC ◦ THR circuits with MODm gates of size 2n and depth d can be done in 2n−n time.

    By modifying prior arguments [54], we can conclude lower bounds for such circuits. The new
argument shows that the ability to count SAT assignments entails non-uniform lower bounds for circuit
classes with very weak closure properties.

Theorem 1.3. NEXP does not have non-uniform ACC ◦ THR circuits of quasi-polynomial size.

    As Theorem 1.1 also holds for SYM ◦ ACC ◦ SYM, it follows that NEXP doesn’t have ACC ◦ SYM
circuits of quasi-polynomial size as well.
    Twenty years ago, Maciel and Thérien [28] considered lower bounds for AC0 ◦ MAJ circuits (which
ACC ◦ THR subsumes), but nontrivial lower bounds have not been reported. Regan [44] studied MOD2 ◦
AND ◦ THR circuits and also noted the absence of lower bounds. Lower bounds have been open even for
the much weaker class AND ◦ OR ◦ MAJ [19].
  2 Throughout the paper, we use the notation “poly(n)” to denote arbitrary (but fixed) polynomial factors of n.



                          T HEORY OF C OMPUTING, Volume 14 (17), 2018, pp. 1–25                                    3
                                           R. RYAN W ILLIAMS

    Theorem 1.3 moves a little closer to an “unconditional break” of the natural proofs barrier [41]. More
precisely, it appears plausible that pseudorandom functions are implementable with ACC ◦ THR circuits,
in which case all lower bounds proved against such circuits must be non-naturalizing. We should note
that it is not completely settled whether the proof that NEXP 6⊂ ACC is “truly” non-naturalizing; it could
be that the natural proofs barrier is irrelevant to the problem. (If pseudorandom functions cannot be
implemented in ACC, then natural proofs considerations don’t apply to ACC anyway; if such functions
can be implemented in ACC, then the NEXP lower bound is indeed non-naturalizing.) Plaku [38] has
observed that the Naor-Reingold family of pseudorandom functions [35] can be implemented with quasi-
polynomial size OR ◦ THR ◦ AND circuits; it follows that the natural proofs barrier already applies to this
circuit class. It is an interesting open problem if ACC ◦ THR can efficiently simulate such depth-three
circuits.

Faster integer linear programming. Building on Theorem 1.1, we also give a new method for solving
0-1 integer linear programs (ILP). Impagliazzo, Paturi, and Schneider [23] showed that for each c > 1,
there is a δ < 1 such that 0-1 ILP with cn constraints can be solved in 2δ n time. Impagliazzo, Lovett,
Paturi, and Schneider [21] have recently shown that such ILPs can be solved in 2n(1−Ω(poly(1/c))) time. We
provide an improvement over exhaustive search for up to subexponentially many constraints:
Theorem 1.4. Every 0-1 integer linear program with n variables and s constraints can be solved in
                4                                                                o(1)
time 2n−Ω(n/ log (sM(log n))) · poly(s, n, M) with high probability, where M ≤ 2n is an upper bound on
bit-length of the coefficients in the program.
                                                                                      o(n)
    Notice that the theorem allows for enormous coefficients, of size up to 2M ≤ 22 . The time bound
compares favorably with the AC0 circuit satisfiability algorithm of Impagliazzo, Matthews, and Paturi [22]:
there, the authors use random restriction methods to solve satisfiability of AC0 circuits with depth d and
                      O(d)
size s in 2n−n/(log s) randomized time with zero error.

Depth-two linear threshold circuit evaluation. We take an important step towards depth-two linear
threshold circuit (also known as THR ◦ THR) lower bounds for the case of exponential weights, by giving
an efficient algorithm for evaluating such circuits on all possible assignments.
Theorem 1.5. Let k > 1. Given a depth-two 2n/24 -size linear threshold circuit C with integer weights in
    k    k
[−2n , 2n ], we can evaluate C on all 2n input assignments in 2n · poly(nk ) time.
    Theorem 1.5 follows from a more general result showing that any sufficiently large “combinatorial
rectangle” of inputs can be evaluated in poly(n) amortized time per input. Noting that a similar statement
for evaluating ACC circuits forms the heart of the proof of NEXP 6⊂ ACC [54], Theorem 1.5 suggests
that large complexity classes (such as NEXP) cannot have small depth-two linear threshold circuits.
    However, we do not yet know how to turn Theorem 1.5 into depth-two linear threshold lower
bounds; the precise difficulty is the following. The current known theorems connecting circuit evaluation
algorithms to circuit lower bounds require that, from the OR of a collection of circuits, we can generate
an equivalent circuit in the same class. We do not know how to convert a large OR of THR ◦ THR circuits
into an equivalent THR ◦ THR circuit, even assuming NEXP has small THR ◦ THR circuits. (In the case
of ACC, we trivially have OR ◦ ACC = ACC, because an OR of ACC circuits is still an ACC circuit.)

                       T HEORY OF C OMPUTING, Volume 14 (17), 2018, pp. 1–25                             4
        N EW A LGORITHMS AND L OWER B OUNDS FOR C IRCUITS W ITH L INEAR T HRESHOLD G ATES

1.1    Prior work
Considerable effort has been expended in proving lower bounds against circuits with linear threshold
gates. Here we will provide some major highlights, in addition to the work already mentioned.
    It will help to introduce a little (standard) notation. Define MAJ, AND, OR, THR, and SYM to be
the class of one-gate circuits corresponding to MAJORITY, AND, OR, linear threshold, and symmetric
functions, respectively, with “free” NOT gates that can appear after the output or on the input wires to the
gate. (Recall that a symmetric Boolean function’s output only depends on the number of true inputs.) For
classes of circuits C and D, define C ◦ D to be the class of circuits formed by taking a circuit C ∈ C, and
feeding the outputs of circuits from D as inputs to C. That is, C ◦ D is simply the composition of circuits
from C and D, with the circuits from D receiving the input and the circuit from C giving the output. We
will refer to the size of a circuit as the number of wires, i. e., the number of directed arcs in the DAG
defining the circuit. This is an important measure for circuits with symmetric gates, as the number of
wires governs the size of the symmetric function representation.
    Much work on depth-two threshold lower bounds has concentrated on lower bounds for inner product
modulo 2, i. e., IP2(x1 , . . . , xn , y1 , . . . , yn ) = ∑i xi · yi mod 2. Note that IP2 is easy for ACC (being a MOD2
of AND gates). In groundbreaking work, Hajnal et al. [15] proved that every MAJ ◦ MAJ circuit requires
2Ω(n) gates to compute IP2. They also showed MAJ ◦ SYM circuits can be efficiently simulated by
MAJ ◦ MAJ circuits, so small MAJ ◦ SYM circuits also cannot compute IP2. Nisan [36] extended the
lower bound to MAJ ◦ THR circuits, and Forster et al. [12] extended the lower bound to THR ◦ MAJ
circuits. More recently, Sherstov [47] showed that AC0 requires exponential-size MAJ ◦ MAJ circuits,
Razborov and Sherstov [42] proved that depth-three AC0 requires exponential-size MAJ ◦ THR circuits,
and Beame and Huynh [4] showed that AC0 requires nΩ(log n) -size MAJ ◦ SYM ◦ AND circuits.
    Although superpolynomial-size lower bounds against MAJ ◦ AC0 , THR ◦ AC0 , MAJ ◦ MAJ ◦ AND
and even MAJ ◦ MAJ ◦ AC0 circuits are known [2, 13, 43, 17], and many lower bounds are known for
AC0 circuits augmented with a small number of threshold gates [5, 3, 9, 51, 16, 14, 27, 39], lower
bounds for AC0 ◦ MAJ circuits have remained open. Maciel and Thérien [28] conjectured that the
majority-of-majority function is not in AC0 ◦ MAJ.
    Recently, Hansen and Podolskii [19] have shown an intriguing reduction: superpolynomial-size
THR ◦ THR lower bounds for a function f would follow from superlogarithmic lower bounds on the
3-party NOF unbounded-error communication complexity of f .

1.2    Comparison and intuition
It is instructive to discuss how this paper’s approach relates to prior work on depth-two threshold lower
bounds. A certain popular approach [12, 26, 47, 42] applies ingredients from Fourier analysis of Boolean
functions, linear algebra, communication complexity, discrepancy theory, etc. In particular, these works
follow the general scheme:
   1. Define some notion of “relaxed rank” of a 2n/2 × 2n/2 Boolean matrix C. Intuitively, if C has
      “relaxed rank” r, then there are 2n/2 × r and r × 2n/2 matrices A and B such that the entries of A · B
      correspond to the entries of C in a direct way.
   2. Show that every function f : ({0, 1}n/2 × {0, 1}n/2 ) → {0, 1} computable with a “small” C circuit
      has “small relaxed rank” when construed as an 2n/2 × 2n/2 Boolean matrix.

                          T HEORY OF C OMPUTING, Volume 14 (17), 2018, pp. 1–25                                       5
                                            R. RYAN W ILLIAMS

    3. Show that some explicit family of functions gn : ({0, 1}n/2 × {0, 1}n/2 ) → {0, 1}, construed as
        2n/2 × 2n/2 Boolean matrices, requires “high relaxed rank” asymptotically.
     Together, these steps prove that the family g := {gn } cannot have “small” C circuits.
     To prove ACC ◦ THR circuit lower bounds, we define a generalized rank notion we call the symmetric
rank, informally measuring how efficiently a 0-1 matrix M can be decomposed into a sum of rank-one
matrices such that, after applying a fixed symmetric function to each entry of the sum, we obtain the matrix
M. Combining several elements from previous work, we show that for a Boolean matrix representing
                                                                                       c
the truth table of a SYM ◦ ACC ◦ THR circuit of size s, its symmetric rank is O(2log s ) for some constant
c ≥ 1, depending on the depth d and modulus m of the MODm gates in the circuit. Moreover, given such
a circuit we can efficiently compute a low-rank decomposition.
     However, we do not know how to use existing methods to prove that an explicit function g has high
symmetric rank. Instead, we take a more computational approach that still exploits the low symmetric
rank property. The idea is that, if we can efficiently compute a low-rank decomposition from a given
circuit, then the circuit’s truth table can be obtained faster than evaluating the circuit on all its inputs
one-by-one. This in turn suggests that these circuits possess considerable structure that make them
unsuitable for simulating very complex functions, such as those in NEXP.
     Suppose we are given an SYM ◦ ACC ◦ THR circuit C of size s with n inputs. Let M be a 2n/2 × 2n/2
matrix defining the function computed by C. First we show how given any such C we can compute
            c          c
2n/2 × 2log s and 2log s × 2n/2 matrices A and B (and a symmetric function f ) giving a symmetric rank
                                       c
decomposition of M, in 2n/2 · 2O(log s) time. By multiplying A and B and applying f to each entry of
the output matrix, we can obtain M. When s is sufficiently small, a rectangular matrix multiplication of
Coppersmith [11] can be applied to compute the product of A and B, and the final matrix M is obtained in
                                                                                 o(1)
poly(n) time per entry. Hence, given an SYM ◦ ACC ◦ THR circuit C of size 2n , we can evaluate C on all
its 2n inputs in only 2n · poly(n) time. This fast evaluation algorithm is combined with prior work [53, 54]
along with some new tricks to exhibit a g := {gn } ∈ NEXP which does not have quasipolynomial-size
ACC ◦ THR circuits.
     Our evaluation algorithm for depth-two threshold circuits (Theorem 1.5) also uses Coppersmith’s
rectangular matrix multiplication as a subroutine, but the rest of the algorithm is rather different from the
evaluation algorithm for SYM ◦ ACC ◦ THR. We reduce the problem of efficiently evaluating a depth-two
threshold circuit on many inputs to a special type of matrix multiplication. Namely, for two matrices A
and B over the integers, we compute a “weighted” matrix product

                                   C[i, j] = ∑ wk · LEQ(A[i, k], B[k, j]) ,
                                             k

where LEQ(x, y) is a Boolean-valued function equal to 1 if and only if x ≤ y, and the wk ’s are arbitrary
integer weights given as parameters to the problem. We show how Coppersmith’s algorithm can be
combined with a mild brute force search to efficiently compute a rectangular matrix product of the above
form.


2    Algorithms and lower bounds for ACC with a layer of threshold gates
The main result of this section is Theorem 1.1.

                       T HEORY OF C OMPUTING, Volume 14 (17), 2018, pp. 1–25                               6
        N EW A LGORITHMS AND L OWER B OUNDS FOR C IRCUITS W ITH L INEAR T HRESHOLD G ATES

                                                                                                              o(1)
Reminder of Theorem 1.1. Given a SYM ◦ ACC ◦ THR circuit with n inputs and 2n size, we can
produce its outputs on all 2n inputs in 2n · poly(n) time. More generally, such a circuit of size s can be
                                                          c
evaluated on all inputs in 2n · poly(log s, n) + 2O(log s) time, for some c ≥ 1 depending on the depth of the
circuit and the modulus m of its MODm gates.

Depth reduction. The first stage of the proof is to convert an arbitrary SYM ◦ ACC ◦ THR circuit C
of size s into a depth-two circuit C00 of symmetric gates, i. e., a SYM ◦ SYM circuit. The size of the
                                c
depth-two circuit will be O(2log s ) for a constant c ≥ 1, depending on the (constant) depth and (constant)
modulus of circuit C. This stage requires several different pieces from prior work.

Lemma 2.1. There is an algorithm which given an SYM ◦ ACC ◦ THR circuit C of size s ≥ n, depth d,
                                                                                 c
and MODm gates, outputs an equivalent SYM ◦ SYM circuit C00 with at most 2(log s) wires, and runs in
               c
time O(2(log s) ), for c ≥ 1 depending only on d and m.

     In the proof, several constants arise; we will denote all of them by the same constant b which is
assumed to be the maximum of these quantities.
     Before we begin the proof, let us make a few remarks about the threshold gates of C. It is well known
that every THR function is equivalent to a THR function where the weights have absolute value at most
2bn log2 n [34, 33]. That is, every THR function is equivalent to one with weights of bit-length at most
bn log2 n. In the following, we shall assume the THR gates of C have this property; this assumption is
certainly valid for proving the lower bounds of Theorem 1.3, and the weights are dealt with explicitly in
our algorithms, such as Theorems 1.4 and 3.1.3
     The following paragraphs give the proof of Lemma 2.1. Let C be a SYM ◦ ACC ◦ THR circuit with
inputs x1 , . . . , xn , size s, depth d, and MODm gates, for constants d > 2 and m > 1. The first step in
Lemma 2.1 is to translate the THR layer of C into a SYM layer, by absorbing some of its complexity into
the ACC part. Maciel and Thérien [29] provided several fairly tight low-depth circuits for various tasks.
We need the following result ([29, Theorem 3.3]).

Theorem 2.2 (Maciel–Therien). Addition of n distinct n-bit numbers can be performed with polynomial-
size AND ◦ OR ◦ SYM circuits. Furthermore, the circuits can be constructed in polynomial time.

    We can therefore replace every THR gate of C with an AC0 ◦ MAJ circuit, as follows. Fix a threshold
gate of C, with weights wi1 , . . . , wit for t ≤ n, computing
                                                        t−1
                                                        ∑ wi xi ≥ wi
                                                               j   j     t
                                                        j=1


for some i j ∈ {1, . . . , n}. Note |wi j | ≤ 2bn log2 n for j = 1, . . . ,t. Set W = bn log2 n.
    3 The “small-weight” representation of a THR function can be obtained from a given linear form and threshold value, by

evaluating the linear form representation at n + 1 (linearly independent) points in {0, 1}n , then solving a linear system in n + 1
variables to determine the new weights and new threshold value. (See [34], Theorem 16.) However, determining which n + 1
points in {0, 1}n to evaluate the THR function on appears to require solving Knapsack problems (finding the “minimal-terms”
of the THR function); the best algorithm the author knows for explicitly computing such weights takes 2O(n) time. Nevertheless,
this determination of small weights is not required for any of our applications.


                            T HEORY OF C OMPUTING, Volume 14 (17), 2018, pp. 1–25                                                7
                                                        R. RYAN W ILLIAMS

      Let D be a circuit for the addition of t − 1 W -bit numbers, provided by Theorem 2.2. For j =
1, . . . ,t − 1, we connect to the jth W -bit input of D a circuit which, given xi j , feeds wi j to D if the input
bit xi j = 1, and the all-zero W -bit string if xi j = 0. Note this extra circuit actually contains no gates: it
simply has a wire from xi j to all bits of the jth W -bit input where the corresponding bit of wi j equals 1.
Letting this new circuit be D0 , we have
                                                                           t−1
                                                 D0 (x1 , . . . , xn ) = ∑ wi j xi j .
                                                                           j=1


This can be compared to the value wit with an AC0 circuit, using the fact that the “less-than-or-equal-to”
comparison of two integers can be performed in AC0 [8]. We now have an AC0 ◦ SYM circuit D00 of size
poly(W,t) ≤ nb computing the given threshold gate. Applying this construction to each threshold gate in
the THR layer of C, we obtain an SYM ◦ ACC ◦ SYM circuit C0 of size at most s · nb .
    The next step of Lemma 2.1 is to convert the SYM ◦ ACC part into a SYM ◦ AND circuit, using a
reduction of Beigel–Tarui [6] (with important details on constructibility filled in by Allender–Gore [1]).

Theorem 2.3 (Beigel–Tarui, Allender–Gore). Every SYM ◦ ACC circuit of size s can be simulated by a
                              c0
SYM ◦ AND circuit of 2(log s) size for some constant c0 depending only on the depth d and MODm gates
                                                                                     0
of the ACC part. Moreover, the AND gates of the final circuit have only (log s)c fan-in, the final circuit
                                                  c0
can be constructed from the original in 2O((log s) ) time, and the final symmetric function at the output can
                         c0
be computed in 2O((log s) ) time.

  Applying this reduction to the top SYM ◦ ACC part of the circuit C0 results in an equivalent SYM ◦
                                                            b c0
AND(log(s·nb ))c0 ◦ SYM circuit C00 of size s0 = 2O((log(s·n )) ) (where the subscript on the AND denotes the
                                                                                                 0
fan-in of each AND gate). For simplicity of notation, let t = (log(s · nb ))c in the following.
    Extending a trick of Beigel [5] to symmetric gates, we can convert every ANDt ◦ SYM subcircuit of
C00 with nb wires into a single SYM gate with O(nb·t ) wires. Let S1 (x1 , . . . , xn ) ∧ · · · ∧ St (x1 , . . . , xn ) be one
such subcircuit, where Si denotes the ith symmetric gate. In particular, for i = 1, . . . ,t, let fi : Z → {0, 1}
be such that                                     !
                                                    n
                                            fi     ∑ ci, j x j       = Si (x1 , . . . , xn ) ,
                                                   j=1

where ci, j denotes the number of copies of x j that feed into Si .
   Let B = 1 + maxi (∑nj=1 ci, j ); note that B ≤ nb . Consider the linear form
                                                                        !
                                                                 t                 n
                                       L(x1 , . . . , xn ) = ∑ Bi−1 ·            ∑ ci, j x j .
                                                              i=1                j=1

For any Boolean assignment to the x j ’s, the number encoded by the linear form L(x1 , . . . , xn ) is an integer
encoded in O(t · b log n) bits. By construction, the bit representation of this integer contains, for every
i = 1, . . . ,t, the number of wires input to Si which are set true, as a string of (b log n) bits. Therefore,
from the linear form L(x1 , . . . , xn ) we can easily infer whether all Si (x1 , . . . , xn ) output 1 or not, and hence
output the value of S1 ∧ · · · ∧ St .

                           T HEORY OF C OMPUTING, Volume 14 (17), 2018, pp. 1–25                                            8
        N EW A LGORITHMS AND L OWER B OUNDS FOR C IRCUITS W ITH L INEAR T HRESHOLD G ATES

    To implement this linear form with a single SYM gate, for all j = 1, . . . , n we put
                                                        t
                                                      ∑ Bi−1 ci, j
                                                      i=1

wires from the input variable x j into the new SYM gate. Hence there are O(nb·t ) wires from the inputs
into this new SYM gate. By choosing the appropriate symmetric function (which outputs 1 if and only if
L(x1 , . . . , xn ) encodes a number such that S1 ∧ · · · ∧ St is true) we can simulate any ANDt ◦ SYM circuit of
nb wires with a single SYM gate of O(nb·t ) wires.
     Replacing each AND ◦ SYM subcircuit in this manner results in a SYM ◦ SYM circuit of size O(s0 ·
                     c
n ) ≤ 2O(log s) for some constant c ≥ 1. This concludes the proof of Lemma 2.1.
 b·t



Symmetric rank. Next, we prove that the truth table of any SYM ◦ SYM circuit C00 of t wires and n
inputs represents a 2n/2 × 2n/2 matrix of symmetric rank at most poly(t), and this rank decomposition can
be efficiently computed. For given matrices A and B over the integers, let A · B denote their matrix product
over the integers. Let M ∈ {0, 1}m×n . We define the symmetric rank of M to be the minimum r ∈ N such
that there are matrices A ∈ {0, 1}m×r , B ∈ {0, 1}r×n and a function f : {0, 1, . . . , r} → {0, 1} satisfying
M[i, j] = f ((A · B)[i, j]) for all i, j. We call the triple (A, B, f ) a symmetric rank decomposition of M. The
symmetric rank is similar to the typical notion of rank, except for the additional function f providing a
“filter” from arbitrary integers back to {0, 1}. This filter function could potentially lead to smaller rank
decompositions than the typical notion. However, note that the symmetric rank of M is not necessarily at
most (for instance) the rank of M over R, because A and B are required to have Boolean entries.
     For simplicity let n be even, and let z1 , . . . , z2n/2 be the list of all 2n/2 n/2-bit strings in lexicographical
order. For a circuit C with n inputs, define the truth table matrix MC to be the 2n/2 × 2n/2 matrix with
MC [i, j] equal to the output of C(zi , z j ).
Lemma 2.4. Given a SYM ◦ SYM circuit C with t wires and n inputs, its truth table matrix MC has
symmetric rank O(t 3 ), and a symmetric rank decomposition of MC can be computed from C in 2n/2 ·poly(t)
time.
Proof. For simplicity we assume n is even; the case of odd n will be apparent. Index the input variables of
C by x1 , . . . , xn . Let g1 , . . . , gs be an indexing of the gates of C on the bottom layer (closest to the inputs)
and let g0 denote the output gate of C. (Note that s ≤ t.) Let f : {0, 1, . . . , s} → {0, 1} be the symmetric
function of gate g0 : for all a ∈ {0, 1, . . . , s}, f (a) = b if and only if a true inputs make g0 output b.
       We shall show how to efficiently construct matrices A and B with the appropriate properties. Let
z1 , . . . , z2n/2 be the list of all n/2-bit strings in lexicographical order, in the following. For every pair
(a, b) ∈ {0, 1, . . . ,t}2 such that a + b ≤ t, let Sa,b ⊆ {g1 , . . . , gs } denote the subset of gates g j such that
a + b true inputs makes gate g j output 1.
       The matrices A and B to be constructed show that the symmetric rank of MC is at most
                                          r=        ∑            |Sa,b | ≤ O(t 3 ) .
                                               a,b∈{0,1,...,t}
                                                  a+b≤t

In other words, each pair (a, b) will add |Sa,b | additional components to the rows of A and to the columns
of B, increasing the lengths of all rows of A (and columns of B) by |Sa,b |.

                          T HEORY OF C OMPUTING, Volume 14 (17), 2018, pp. 1–25                                       9
                                              R. RYAN W ILLIAMS

      For i = 1, . . . , 2n/2 , the ith row of A and ith column of B are defined as follows. For every pair (a, b),
allocate |Sa,b | additional components for the rows of A and columns of B.
      For j = 1, . . . , |Sa,b |, put a 1 in the jth additional component of the ith row of A if and only if there
are a true wires going into the jth gate of Sa,b when the input variables x1 , . . . , xn/2 are given assignment
zi . That is, the jth component is 1 if and only if the contribution (from the first half of variables) to the
overall sum for the jth gate is a.
      Similarly, for j = 1, . . . , |Sa,b |, put a 1 in the jth additional component of the ith column of B if and
only if there are b true wires going into the jth gate of Sa,b when the input variables xn/2+1 , . . . , xn are
given assignment zi .
      Note that each entry of A and B can be determined in poly(t) time.
      For every fixed (a, b), the product of two jth components for the ith row of A and the kth column of B
is either 0 or 1, and the product is 1 if and only if:
     • the sum of true inputs into the jth gate of Sa,b from the inputs (x1 , . . . , xn/2 ) equals a when the
         inputs (x1 , . . . , xn/2 ) are assigned zi ,
     • the sum of true inputs into the same gate from (xn/2+1 , . . . , xn ) equals b when the inputs
         (xn/2+1 , . . . , xn ) are assigned zk , and
     • the jth gate outputs 1 when its sum of true inputs equals a + b.
      It follows that the inner product of the ith row of A and the kth column of B equals the total number
Ni,k of true wires going into the output gate of C on the variable assignment (x1 , . . . , xn ) 7→ (zi , zk ). By
definition, f (Ni,k ) equals the output of C on that variable assignment.

      We need one more lemma to complete the proof of Theorem 1.1.

Lemma 2.5. For all sufficiently large N, and α ≤ .172, multiplication of an N × N α matrix with an
N α × N matrix can be done in N 2 · poly(log N) arithmetic operations, over any field with O(2poly(log N) )
elements.

      A short exposition of Lemma 2.5 can be found in Appendix C of [54].

Proof of Theorem 1.1. Given a SYM ◦ ACC ◦ THR circuit C and size s, convert C into a SYM ◦ SYM
                       c
circuit C00 of 2(log s) size using Lemma 2.1. Compute a symmetric rank decomposition of C into
                c             c                                                                              c
2n/2 × 23(log s) and 23(log s) × 2n/2 0-1 matrices A and B respectively, along with a function f : [23(log s) ] →
{0, 1}. Compute the product of A and B in 2n · poly(log s, n) time, using Lemma 2.5. Finally, evaluate
function f on all entries of the matrix product. This can be done by numerically sorting the entries,
                                                                                                           c
replacing each entry v by f (v), then inverting the sorted order, in time 2n · poly(log s, n) + 2O(log s) . For
       o(1)
s ≤ 2n , the runtime is 2n · poly(n).

2.1     Counting satisfying assignments to ACC of linear thresholds
The evaluation algorithm of Theorem 1.1 is quite powerful, substantially extending the class of circuits
for which we can perform non-trivial circuit analysis.
Reminder of Theorem 1.2. For every m > 1 and d > 0, there is an ε > 0 such that counting satisfying
                                            ε                                             ε
assignments to ACC ◦ THR circuits of size 2n , depth d, and MODm gates can be done in 2n−n time.

                        T HEORY OF C OMPUTING, Volume 14 (17), 2018, pp. 1–25                                   10
       N EW A LGORITHMS AND L OWER B OUNDS FOR C IRCUITS W ITH L INEAR T HRESHOLD G ATES

Proof. For all k ∈ N and for i = 1, . . . , 2k, define a symmetric function Bitki with 22k inputs as follows:
for all i = 1, . . . , 2k, Bitki outputs the ith bit of the sum of its input bits.
     Suppose we are given an ACC ◦ THR circuit C of size s and n inputs, and we wish to count its
satisfying assignments. Let ` < n/2 be a parameter to set later. For every assignment A j ∈ {0, 1}2` to the
last 2` inputs of C, make a copy of C with the assignment A j plugged into those 2` inputs, calling this
copy CA j . Note that each CA j has (the same) n − 2` inputs x1 , . . . , xn−2` .
     For every i = 1, . . . , 2`, define

                  Bi (x1 , . . . , xn−2` ) := Bit`i (CA1 (x1 , . . . , xn−2` ), . . . ,CA22` (x1 , . . . , xn−2` )) .

Each function Bi can be implemented in s0 = 22` · s size, as a SYM ◦ ACC ◦ THR circuit. Applying
Theorem 1.1, Bi can be evaluated on all of its 2n−2` possible assignments in time
                                                             0
                        2n−2` · poly(n) + 2poly(log s ) ≤ 2n−2` · poly(n) + 2poly(`+log s) .

    The above for-loop over all i produces 2` · 2n−2` bits: for each of the 2n−2` partial assignments to
n − 2` variables, we learn the number (in 2` bits) of partial assignments on the other 2` variables which
result in satisfaction. The number of all satisfying assignments is obtained by simply summing all 2`-bit
numbers obtained from the 2n−2` assignments, in 2n−2` · poly(`) time.
                                                                   ε
    Letting ` = nε /2 for sufficiently small ε > 0, we have a 2n−n time algorithm.

2.2   Faster 0-1 linear programming
ACC ◦ THR circuits are definitely powerful enough to simulate 0-1 integer linear programming; a
straightforward application of Theorem 1.2 would yield a faster algorithm for the problem. However, the
improvement over exhaustive search would be rather minor, and tedious to calculate. By modifying the
proof of Theorem 1.1 in appropriate places, we can derive a better algorithm in this case.
Reminder of Theorem 1.4. Every 0-1 integer linear program with n variables and s constraints can
                             4
be solved in time 2n−Ω(n/ log (sM(log n))) · poly(s, n, M) with high probability, where M ≤ 2o(n) is an upper
bound on the bit-length of the coefficients in the program.

Proof. Consider a 0-1 linear program of the form Ax ≤ b, along with a cost function hc, xi we wish to
maximize, where A ∈ Zs×n , b ∈ Zs , and c ∈ ([−2M , 2M ] ∩ Z)n by assumption on M. First, reduce the
optimization problem to one of feasibility, in a standard way: include hc, xi ≥ v as an additional constraint
for various v ∈ Z, and by binary searching on v, we maximize the value of v such that the s + 1 constraint
system remains feasible. Since the xi are Boolean valued, the binary search uses at most t := O(M + log n)
calls to feasibility questions.
    Next, observe the feasibility questions can be viewed as a satisfiability question for a depth-two
circuit D with an AND of fan-in s at the top gate, and linear threshold gates on the bottom layer, by
directly translating each constraint in the program into a linear threshold gate. By Theorem 2.2 and the
argument in Lemma 2.1, each threshold gate in the circuit D can be replaced with a polynomial-sized
LEQ ◦ AND ◦ OR ◦ SYM circuit, where LEQ computes on t-bit integers a and b whether a ≤ b (recall
t := O(M + log n)). As LEQ has an XOR ◦ AND ◦ XOR circuit of O(t 2 ) size for t-bit inputs (see [8] for a

                        T HEORY OF C OMPUTING, Volume 14 (17), 2018, pp. 1–25                                           11
                                                R. RYAN W ILLIAMS

reference), the satisfiability question for the circuit D reduces to the SAT question for an AC0 [2] ◦ SYM
circuit C of size poly(M, log n, s). In particular, the circuit has the form

                                AND ◦ XOR ◦ AND ◦ XOR ◦ AND ◦ OR ◦ SYM .

Following the strategy of Theorem 1.2 (and the author’s ACC SAT algorithm [54]), the satisfiability
question for C with n inputs and size poly(M, log n, s) can be converted into the problem of evaluating
a larger AC0 [2] ◦ SYM circuit C0 , where C0 has n0 = n − k inputs, 2k · poly(M, log n, s) size, k < n/2 is a
parameter, and the entire AC0 [2] part of C0 has the form

                                 OR ◦ AND ◦ XOR ◦ AND ◦ XOR ◦ AND ◦ OR .

More precisely, C0 is the OR of 2k copies of the circuit C, where each copy has its first k inputs assigned
to a distinct string from {0, 1}k . Clearly, this circuit C0 is satisfiable if and only if C is satisfiable.
    Now we wish to evaluate C0 on all 2n−k inputs, efficiently. Rather than applying Beigel-Tarui at this
point, as in Lemma 2.1, we instead apply the probabilistic polynomials of Smolensky [50] to convert C0
into a SYM ◦ SYM circuit C00 . In particular, we use a slight modification of Smolensky’s construction, as
described by Kopparty and Srinivasan [25].
Theorem 2.6 (Smolensky, Kopparty–Srinivasan). For every AC0 [2] circuit C of depth d, size z, and
m inputs, and ε > 0, there is a distribution of n-variate polynomials DC over F2 with the following
properties. Each p with nonzero support in DC has degree at most (4 log z)d−1 · (log 1/ε), a polynomial
                                     d−1
p can be sampled from DC in mO(log z) (log 1/ε) time, and for every x ∈ {0, 1}m ,

                                            Pr [p(x) = C(x)] ≥ 1 − ε .
                                           p∼DC


    In our case, the AC0 [2] parts of our subcircuits Ci have six different layers, where each layer has the
                                                                                              ?
same gate type. The degree in the above construction is in fact only (4 log z)d −1 · (log 1/ε) where d ? is
the number of layers which are AND or OR (the XOR layers do not contribute to the degree, because
XOR can be directly simulated by summation over F2 ). Note that for each Ci we have d ? = 4 (two layers
are XORs).
    We can therefore apply Theorem 2.6 as follows. Recall that C0 is an OR of some AC0 [2] ◦ SYM
circuits C1 , . . . ,C2k , each with (the same) n − k inputs. Moreover, the top AC0 [2] part of each Ci has
z = poly(M, log n, s) size, and takes poly(M, log n, s) inputs (coming from the outputs of SYM gates).
For every i, we take the top AC0 [2] part of Ci , and invoke Theorem 2.6 with ε = 1/(10 · 2k ) to sample a
                                                                                                     3
polynomial pi ∼ DCi of degree at most O(k(log z)3 ) having at most poly(s, M)O(k·(log z) ) monomials.
    We replace the AC0 [2] part of Ci with the polynomial pi construed as an XOR of ANDs circuit. Now
the circuit C0 is an OR of 2k XOR of AND of SYM circuits; call them C100 , . . . ,C200k . For every input
x ∈ {0, 1}n−k , the SYM gates of C0 produce a single poly(s, M)-bit length input y. Taking the union bound
over all 2k subcircuits, every C100 , . . . ,C200k outputs the same values as C1 , . . . ,C2k on x, with probability at
least 1 − 1/10, by our choice of ε.
    Now we randomly convert the topmost OR in C0 to an XOR, via the usual Razborov-Smolensky
subsum trick: we pick r1,1 , r2,1 , r1,2 , r2,2 , . . . , r1,2k , r2,2k ∈ {0, 1} uniformly at random, and replace C =

                         T HEORY OF C OMPUTING, Volume 14 (17), 2018, pp. 1–25                                     12
           N EW A LGORITHMS AND L OWER B OUNDS FOR C IRCUITS W ITH L INEAR T HRESHOLD G ATES

OR(C100 , . . . ,C200k ) with
                                                                                        !                                              !
                                             2k                                                   2k
         C00 (x1 , . . . , xn−k ) :=         ∑ r1,i ·Ci00 (x1 , . . . , xn−k ) mod 2 ∨ ∑ r2,i ·Ci00 (x1 , . . . , xn−k ) mod 2
                                             i=1                                                  i=1
                                        2k                                       2k
                                   = ∑ r1,i ·Ci00 (x1 , . . . , xn−k ) + ∑ r2,i ·Ci00 (x1 , . . . , xn−k )
                                       i=1                                     i=1
                                                                                  !                                   !
                                               2k                                        2k
                                       +      ∑ r1,i ·Ci00 (x1 , . . . , xn−k ) · ∑ r2,i ·Ci00 (x1 , . . . , xn−k )       mod 2 ,
                                              i=1                                       i=1

which means that C00 equals

    2k                                        2k
   ∑     r1,i ·Ci00 (x1 , . . . , xn−k ) +   ∑ r2,i ·Ci00 (x1 , . . . , xn−k )
   i=1                                       i=1
                                                                        2k
                                                                 + ∑ r1,i · r2, j ·Ci00 (x1 , . . . , xn−k ) ·C00j (x1 , . . . , xn−k ) mod 2 .
                                                                     i, j=1


Now for every x ∈ {0, 1}n−k ,

            Pr        [C00 (x) 6= C0 (x)]
   pi ∼D,ri, j ∈{0,1}

          ≤         Pr           [∃ i,Ci00 (x) 6= Ci (x)] +        Pr        [OR(C100 (x), . . . ,C200k (x)) = C0 (x) | ∀ i,Ci00 (x) = Ci (x)]
              p1 ,...,p2k ∼DCi                                 ri, j ∈{0,1}

          ≤ 1/10 + 1/4 ≤ 1/3 .

That is, for every input x ∈ {0, 1}n−k , the probability that C0 (x) = C00 (x) will be greater than 2/3.
    Since each polynomial pi has degree at most O(k ·(log z)3 ), the AND gates representing the monomials
of pi have f ≤ O(k · (log z)3 ) fan-in. Applying another part of Lemma 2.1, the AND f ◦ SYM subcircuits
of C00 with poly(s, M) wires can be replaced by a single SYM gate with poly(s, M)O( f ) input wires. This
                                                              3
results in an XOR ◦ SYM circuit C00 of poly(s, M)O(k·(log z) ) total wires; note this is also a SYM ◦ SYM
circuit.
    Let ε > 0 be a parameter, and set
                                                                     
                                                            εn
                                      k := max 1, 4                     .
                                                    log (Ms(log n))

(Note that if k = 1, the statement of Theorem 1.4 is trivially true.) Following the proof of Theorem 1.1,
we can apply fast rectangular matrix multiplication to evaluate C00 on all 2n−k inputs. Recalling that
z = poly(M, log n, s), note that (log z) ≤ O(log(Ms(log n))). So for sufficiently small ε > 0, the evaluation
of C00 runs in time
                                                                                              3               4
  2n−k · poly(O(k · (log z)3 ), log M, n − k) + poly(s, M)O(k·(log z) ) ≤ 2n−Ω(n/ log (sM(log n))) · poly(s, n, M) .

                                  T HEORY OF C OMPUTING, Volume 14 (17), 2018, pp. 1–25                                                       13
                                              R. RYAN W ILLIAMS

The output of this procedure is a 2n−k -bit string which, for every x ∈ {0, 1}n−k , contains the correct output
C0 (x) with probability at least 2/3.
      Suppose we repeat the above randomized procedure for n2 times: that is, for n2 times, we inde-
pendently sample 2k polynomials pi for each Ci and sample ri, j ∈ {0, 1}, constructing n2 different
circuits C100 , . . . ,Cn002 from C0 . Then, standard tail bound arguments show that the majority value output by
C100 (x), . . . ,Cn002 (x) equals C0 (x) for every x ∈ {0, 1}n−k , with high probability. If some assignment x? has
majority value 1, we conclude that the integer program is feasible; otherwise, we output infeasible.

2.3   Non-uniform ACC ◦ THR lower bounds
We now turn to the main application of the evaluation algorithm:
Reminder of Theorem 1.3. NEXP does not have non-uniform ACC ◦ THR circuits of quasi-polynomial
size.
    To set the context, let us discuss the prior connection between known circuit satisfiability algorithms
and circuit lower bounds.

Definition 2.7. Let C be a circuit class. C is said to be typical if, given any circuit D from one of the
classes C ◦ C, AND ◦ C, OR ◦ C, NOT ◦ C, an equivalent D0 ∈ C can be produced in poly(size(D)) time.

    That is, C is typical if it is efficiently closed under composition, unbounded fan-in AND, OR, and
negations. Most well-studied circuit classes have this property.
    From prior work, we know there are connections between the existence of good SAT algorithms for
typical circuit classes, and lower bounds against those classes:

Theorem 2.8 ([54]). Let C be typical. Suppose for every c ≥ 1, there is an ε > 0 and an algorithm for
                                                                                        c
satisfiability of C circuits running in time O(2n−n ) on circuits with n inputs and nlog n size. Then NEXP
                                                   ε


does not have quasi-polynomial size C circuits.

    For example, the proof that NEXP 6⊂ ACC follows from giving a faster-than-exhaustive-search ACC
satisfiability algorithm, noting that ACC is typical, and applying Theorem 2.8.
    This theorem cannot be directly applied to a class such as ACC ◦ THR, because it is not known
whether ACC ◦ THR ◦ ACC ◦ THR can be efficiently simulated with ACC ◦ THR. However, by modifying
the argument of Theorem 2.8 and using an algorithm for counting SAT assignments, we can extend the
theorem to circuits with a very weak closure property.

Definition 2.9. Let C be a circuit class. We say C is efficiently closed under AND if, given the AND of
two circuits of C, an equivalent circuit in C can be produced in polynomial time.

     Efficient closure under AND is satisfied by strictly more circuit classes than the property of being
typical. To give an example, any class of the form SYM ◦ · · · is efficiently closed under AND, because an
AND of t SYM gates with s wires can be collapsed into a single symmetric gate with O(st ) wires (as seen
in the proof of Lemma 2.1). However, classes like SYM ◦ SYM and THR ◦ THR are not known to be
efficiently closed under composition or unbounded-fan in AND/OR, hence Theorem 2.8 does not apply
to such classes. We prove:

                        T HEORY OF C OMPUTING, Volume 14 (17), 2018, pp. 1–25                                   14
        N EW A LGORITHMS AND L OWER B OUNDS FOR C IRCUITS W ITH L INEAR T HRESHOLD G ATES

Theorem 2.10. Let C be efficiently closed under AND. Suppose for every c ≥ 1, there is an ε > 0 and an
algorithm for counting the satisfying assignments of C circuits in time O(2n−n ) on circuits with n inputs
                                                                              ε

        c
and nlog n size. Then NEXP does not have quasi-polynomial size C circuits.

     Note that Theorem 1.3 (the ACC ◦ THR lower bound) follows immediately from Theorem 2.10 and
the counting algorithm of Theorem 1.2.4 It is our hope that Theorem 2.10 may be applicable in the future
to depth-two classes such as SYM ◦ SYM and depth-two exact threshold circuits [18], both of which are
efficiently closed under AND. A nontrivial counting SAT algorithm for one of these classes would entail
new lower bounds.

Proof of Theorem 2.10. Let us start with C as typical. We survey what is needed to conclude C lower
bounds in the proof of Theorem 2.8 (from [54]), and show that the new hypothesis supplies these needs.
     The proof of Theorem 2.8 starts by assuming that NEXP ⊂ C and the theorem hypothesis. From these
two assumptions, we derive that every L ∈ NTIME[2n ] can be simulated in nondeterministic 2n /n time,
contradicting the nondeterminstic time hierarchy [56]. In particular, it is shown that the assumptions imply
that the NEXP-complete problem S UCCINCT 3SAT on AND/OR/NOT circuits of fan-in at most two,
                                                                               ε
n inputs, and poly(n) size can be nondeterministically solved in O(2n−n ) time, which is also provably
false (see for example [52] for a detailed exposition). Recall that S UCCINCT 3SAT is the problem: given
an AND/ OR/ NOT circuit C of fan-in two, does the truth table of C encode a satisfiable 3-CNF formula?
That is, S UCCINCT 3SAT is a “compressed” version of the 3SAT problem.
     Suppose we are given an (arbitrary) circuit C of s size and n inputs, and wish to determine if it is a
yes-instance of S UCCINCT 3SAT. Assuming NEXP has quasipolynomial-size circuits, it is proved (again
in [54]) that for every C encoding a satisfiable 3-CNF F, there is a quasipolynomial-size circuit D which
succinctly encodes a satisfying assignment for F: for all i, D(i) outputs the value of variable xi in the
satisfying assignment. Our “faster” nondeterministic algorithm for S UCCINCT 3SAT guesses this circuit
                                                                 c
D, and uses it to construct a circuit E with n inputs and nlog n size for some c, which is unsatisfiable if
and only if D encodes a satisfying assignment to the formula F encoded by C.
     Assuming that NEXP has quasipolynomial-size C circuits and that there is an O(2n−n ) time algorithm
                                                                                            ε


for C satisfiability, it is then proved that there is a nondeterministic algorithm A running in 2n−Ω(n ) time
                                                                                                      ε


which, given an AND/OR/NOT circuit E of fan-in two with s size and n inputs, outputs an equivalent
           c
E 0 of slog s size and n + O(log(size(E))) ≤ n + O(logc+1 n) inputs from the class C on at least one
nondeterministic branch (and prints no on other branches). Running this algorithm A, obtaining E 0 , then
running the C satisfiability algorithm on E 0 , we nondeterministically determine that C is a yes-instance of
                                 ε
S UCCINCT-3SAT in 2n−Ω(n ) time.
     Now assume C is only efficiently closed under AND. In the proof sketch above, the closure properties
of C become relevant, precisely in the argument that the nondeterministic algorithm A exists. In fact,
if the present theorem’s hypothesis and the assumption that NEXP has quasipolynomial-size C circuits
implies that such an algorithm A exists, it can be observed that the rest of the proof carries over without
any modifications. We now construct such an algorithm A.
   4 The reader is also recommended to read [24, 37, 7], which consider other (stronger) closure properties. Their work also

implies that faster SAT algorithms for ACC ◦ THR circuits yield lower bounds such as Theorem 1.3. In that light, the significant
contribution of this section is to show how circuit lower bounds still follow from a notably weaker closure property, provided we
can count SAT assignments faster.


                           T HEORY OF C OMPUTING, Volume 14 (17), 2018, pp. 1–25                                              15
                                                     R. RYAN W ILLIAMS

    Our new algorithm A is supposed to take a circuit E of size s and n inputs, and print an equivalent E 0 of
    c                                                                                   c
slog s size from the class C. Given E, our A starts by guessing a C circuit E 00 of nlog n size, which takes as
input a pair (x, g) ∈ {0, 1}n × {0, 1}log(size(E)) , and outputs 1 if and only if the gate g in E outputs 1 when E
is given the input x. Observe that such an E 00 exists, assuming that P has quasi-polynomial size C circuits
(if P has quasi-polynomial size C circuits, then the Circuit Evaluation problem has quasi-polynomial size
C circuits; see [54] for details).
      Now we verify that for every gate g indexed by 1, 2, . . . , size(E), E 00 (x, g) outputs what gate g of E(x)
outputs, on all x. Each gate g is either an input, an AND of two previous gates g1 and g2 , an OR of two
previous gates g1 and g2 , or a NOT of a previous gate g1 .
      To aid this verification, we show how to efficiently check for arbitrary C circuits G and H whether
G(x) = H(x) for all inputs x, using an algorithm for counting SAT assignments. Let #SAT (C) be the
number of satisfying assignments to a circuit C. Observe that G(x) = H(x) for all x if and only if
#SAT (G) = #SAT (H) = #SAT (G ∧ H). (Note the third quantity can be efficiently computed, assuming C
is efficiently closed under AND.) Moreover, G(x) 6= H(x) for all x if and only if #SAT (G)+#SAT (H) = 2n
and #SAT (G ∧ H) = 0. Therefore, by counting SAT assignments, we have algorithms checking whether
                                                                                                                              ε
G is equivalent to H, and whether G is equivalent to the negation of H, both running in time O(2n−n ).
      We claim that the verification problem for E 00 can be reduced to a small number of calls to the above
kinds of checks. First, nondeterministically guess a circuit Enot              00 , intended to satisfy E 00 (x, g) = ¬E 00 (x, g)
                                                                                                         not
for all x and g. Verifying this condition can be done by counting SAT assignments, as described above.
      Checking E 00 is correct on the input gates of E means that for all i = 1, . . . , n, E 00 (x1 , . . . , xn , i) = xi .
Both E 00 (x1 , . . . , xn , i) and I(x1 , . . . , xn ) = xi are C circuits, hence their equivalence can be verified by #SAT
calls. Checking a NOT gate g of E with input gate g1 is equivalent to checking that Enot                   00 (x, g ) = E 00 (x, g)
                                                                                                                    1
on all x.
      Checking an AND gate g of two previous gates g1 and g2 amounts to checking that E 00 (x, g) =
E 00 (x, g1 ) ∧ E 00 (x, g2 ) on all x. To do this, compute Gand (x) := E 00 (x, g1 ) ∧ E 00 (x, g2 ) (assuming C is
efficiently closed under AND), then check Gand (x) = E 00 (x, g) for all x.
      Finally, for an OR gate g with inputs g1 and g2 , we want to check that E 00 (x, g) = E 00 (x, g1 ) ∨ E 00 (x, g2 )
on all x. This is equivalent to ¬E 00 (x, g) = ((¬E 00 (x, g1 )) ∧ (¬E 00 (x, g2 ))) for all x. This can be checked
by forming Gor (x) := Enot        00 (x, g ) ∧ E 00 (x, g ), then checking that G (x) = E 00 (x, g) for all x.
                                          1          not     2                          or         not
                                            c                                                    ε                ε
      On a circuit E with s ≤ nlog n gates, the above procedure runs in O(2n−n · s) ≤ 2n−Ω(n ) time. When
it concludes, we know that for all gates g and all x that E 00 (x, g) outputs the correct value. The circuit
E 0 (x) output by A simply evaluates E 00 (x, g? ), where g? is the output gate of E.


3     Fast evaluation of depth-two threshold circuits
Finally we show, that, in a strong sense, depth-two threshold circuits are weak, by giving a fast algorithm
for evaluating such circuit on many assignments in batch. The general result is the following.

Theorem 3.1. Given a depth-two linear threshold circuit C with 2k inputs and at most n1/12 gates with
weights on the bottom layer of absolute value at most Wb , weights on the output gate of absolute value at
most Wo , and given two sets A, B ⊆ {0, 1}k where |A| = |B| = n, we can evaluate C on all n2 points in
A × B using n2 · poly(logWo , log n) + n1+1/12 · poly(log n, logWo , logWb ) time.

                           T HEORY OF C OMPUTING, Volume 14 (17), 2018, pp. 1–25                                               16
        N EW A LGORITHMS AND L OWER B OUNDS FOR C IRCUITS W ITH L INEAR T HRESHOLD G ATES

    The following is immediate from Theorem 3.1:
Reminder of Theorem 1.5. Let k > 1. Given a depth-two 2n/24 -size linear threshold circuit C with
                         k    k
integer weights in [−2n , 2n ], we can evaluate C on all 2n input assignments in 2n · poly(nk ) time.
    While the proof of Theorem 3.1 also ultimately depends on Coppersmith’s rectangular matrix
multiplication, the rest of the algorithm is rather different from the evaluation algorithm of Theorem 1.1.

Proof of Theorem 3.1. We reduce the evaluation task to a special kind of matrix multiplication, then
combine Coppersmith’s matrix multiplication with a mild brute force to expedite the matrix multiplication.
    Define the function LEQ : Z × Z → {0, 1} to output 1 on (a, b) if and only if a ≤ b. Given a
vector w = (w1 , . . . , wd ) ∈ Zd , and given two matrices M and N which are n × d and d × n, define their
w-weighted threshold product to be
                                                          d
                                  (M ~w N)[i, j] := ∑ wk · LEQ(M[i, k], N[k, j]) .
                                                         k=1

    We shall show that the w-weighted threshold product of an n × n1/12 matrix and an n1/12 × n matrix
can be computed in essentially n2 · poly(log n) time (with some additional but negligible overhead in
terms of the weights). Let us postpone this algorithm for the moment, and first show how to embed the
evaluation problem into the weighted threshold product.
    Let C be a depth-two circuit of size s, with the 2k input variables x1 , . . . , xk , y1 , . . . , yk . Let w1 , . . . , ws
be the weights of the top threshold gate of C, and let `1 ,t1 , . . . , `s ,ts be the corresponding linear forms and
threshold values from the bottom layer of threshold gates: that is, the output of LEQ(ti , `i ) is multipled
by wi in the output gate. Without loss of generality, we may assume that all weights wi are multiplied by
the output of some threshold gate at the bottom layer (there are at most 2k wires from the input directly
to the output gate, and they can be replaced by O(k) dummy gates at the bottom layer with wires to the
output gate). Let A = {A1 , . . . , An } ⊆ {0, 1}k and B = {B1 , . . . , Bn } ⊆ {0, 1}k .
                                                                                   (x)      (y)               (x)
    We partition each linear form ` j on the bottom layer into two sums ` j and ` j , such that ` j involves
                                       (y)                                      (x)   (y)                  (x)             (y)
only input variables x1 , . . . , xk , ` j involves only y1 , . . . , yk , and ` j + ` j = ` j . Let Ai (` j ) and B j (` j )
                                             (x)                  (y)
denote the value of the linear form ` j (respectively, ` j ) evaluated on assignment Ai (respectively, B j ).
    We now define two special matrices M and N with respect to the Ai ’s and B j ’s. Define a matrix M
with rows indexed by elements of A, and columns indexed by the bottom layer gates 1, . . . , s, where
                                                                        (x)
                                                   M[i, k] := tk − Ai (`k ) .

Further define a matrix N with rows indexed by the bottom layer gates 1, . . . , s, and columns indexed by
elements of B, where
                                                          (y)
                                         N[k, j] := B j (`k ) .
Now consider the w-weighted threshold product M ~w N, where w is the same as above. The i, j entry of
this product equals
                  s                                    s                                    
                                    (x)         (y)                          (x)          (y)
                  w
                 ∑ k · LEQ  tk − A(`k   ), B  (`
                                             j k    )  =   w
                                                         ∑ k · LEQ  t  , A
                                                                      k i k(`    ) + B  (`
                                                                                       j k    )  .
                 k=1                                            k=1


                          T HEORY OF C OMPUTING, Volume 14 (17), 2018, pp. 1–25                                             17
                                                R. RYAN W ILLIAMS

This is precisely the value of the linear form in the output gate of C, when x1 , . . . , xk are given the
assignment Ai and y1 , . . . , yk are assigned B j . The truth table of C on A × B can be recovered by simply
checking which entries in (M ~w N) exceed the output gate’s threshold.
    Next, we shall show how to compute a weighted threshold matrix product efficiently. Let δ be a
parameter, and let M and N be n × nδ and nδ × n matrices, respectively. The first step is to reduce the
weights significantly. For all k = 1, . . . , nδ , let Sk be a list of all entries in the kth column of M, plus the kth
row of N. Sort Sk , obtaining a ranking of 2n items, and replace each entry in the kth column of M and the
kth row of N by their rank in the sorted list Sk . This step reduces the domains of M and N to {1, . . . , 2n},
and the w-weighted threshold matrix product remains the same: all inequalities M[i, k] ≤ N[k, j] are
preserved. Note this step takes n1+δ · poly(log n, logWb ) time.
    In order to reduce this matrix computation to a standard matrix multiplication, we perform two
strategies with different advantages. (The reduction is inspired by work of Matousek [31] on computing
dominances in high dimensions.) Let s ∈ {1, . . . , n} be a parameter. Partition each sorted list Sk into
t = dn/se contiguous buckets T1 , . . . , Tt , where each bucket Ti contains at most s entries. (For all i < j,
the largest entry in Ti is at most the smallest entry in T j .)
    Start with an n × n output matrix P that is all zeroes. For every (i, k) ∈ [n] × [nδ ], look up the bucket
T` containing M[i, k] in the sorted list Sk . For all N[k, j] contained in T` such that M[i, k] ≤ N[k, j], add
the weight wk to the entry P[i, j]. This loop adds to P all terms wk · LEQ(M[i, k], N[k, j]) such that M[i, k]
and N[k, j] appear in the same bucket of Sk . Observe that this step takes n · nδ · s · poly(log(Wo ), log(Wb ))
time.
    To handle the (M[i, k], N[k, j]) pairs that do not appear in the same bucket, we use matrix multiplication.
For each (i, k) ∈ [n] × [nδ ], replace the entry M[i, k] with a row vector vi,k ∈ {0, wk }t , such that vi,k [`] := wk
if and only if M[i, k] is in bucket T` of Sk . That is, vi,k has wk in exactly one entry, and zeroes elsewhere.
This forms a matrix M 0 of dimensions n × (nδ · t). For (k, j) ∈ [nδ ] × [n], replace each entry N[k, j] with a
column vector uk, j ∈ {0, 1}t , such that uk, j [`0 ] := 1 if and only if N[k, j] is in bucket T` of Sk and ` > `0 .
This forms a matrix N 0 of dimensions (nδ · t) × n. The matrix product M 0 · N 0 over the integers computes
a sum of inner products
                                           (M 0 · N 0 )[i, j] = ∑hvi,k , uk, j i .
                                                             nδ

If M[i, k] > N[k, j], or M[i, k] and N[k, j] are in the same bucket of Sk , then hvi,k , uk, j i = 0. If M[i, k] ≤
N[k, j] but N[k, j] and M[i, k] are in different buckets of Sk then hvi,k , uk, j i = wk .
    Letting P := P + (M 0 · N 0 ), this procedure adds to P all terms wk · LEQ(M[i, k], N[k, j]) such that
M[i, k] and N[k, j] appear in different buckets of Sk . Therefore P[i, j] contains the value of the linear form
for the output gate of C, under variable assignment (Ai , B j ), for all i, j.
    The above algorithm runs in time

                            O(n · nδ · s logWo + MM(n, n1+δ /s, n) · poly(logWo )) ,

where MM(a, b, c) is the running time for multiplying a × b and b × c matrices. If we set n1+δ /s = n0.172 ,
then Coppersmith’s algorithm (Lemma 2.5) can be applied to the second term of the running time,
implementing it in n2 · poly(log n) time. Under this setting, s = nδ · n0.828 and the first term of the running
time is n1+2δ +0.828 . Setting δ = 0.086 > 1/12, the first term becomes n2 (note that s = n.914 ).

                         T HEORY OF C OMPUTING, Volume 14 (17), 2018, pp. 1–25                                     18
        N EW A LGORITHMS AND L OWER B OUNDS FOR C IRCUITS W ITH L INEAR T HRESHOLD G ATES

    It is easy to see that, since the above algorithm outputs the values of the linear form at the output gate
of a depth-two threshold circuit over all inputs, we can also efficiently evaluate large SYM ◦ THR circuits.


4    Discussion
In this paper, we have shown how Boolean functions computable by small ACC ◦ THR circuits have low
“symmetric rank,” how to efficiently compute an appropriate rank decomposition (given a circuit), and
how to exploit this rank decomposition for new satisfiability algorithms and new circuit lower bounds.
Our methods provide ways to algorithmically handle symmetric and threshold gates at the bottom of
a circuit, which also lead to faster algorithms for problems where threshold functions are part of the
constraint set, such as 0-1 linear programming.
     Recall that in Section 3 (Theorem 3.1), we showed how to evaluate a given THR ◦ THR circuit of
2 Ω(n) size on all 2n inputs in 2n+o(n) time. Perhaps the most interesting problem left open by this paper is:
can we apply the fast evaluation methods of Section 3 to prove lower bounds for THR ◦ THR circuits?
Drawing from the results of this paper, we believe that THR ◦ THR circuits should be provably weaker
than arbitrary (unrestricted) circuits of polynomially related size. First let us recall a simple observation
regarding the connection between multipoint circuit evaluation and Circuit-SAT (implicit in [54, Theorem
4.5]).

Theorem 4.1. Suppose there is an α > 0 such that we can evaluate arbitrary (unrestricted) circuits of
2αn+o(n) size on all 2n inputs, in 2n+o(n) time. Then there is a δ < 1 such that Circuit-SAT for 2o(n) -size
circuits with n inputs is solvable in 2δ n time.

Proof. The proof is similar in nature to the proof of Theorem 1.2: given a circuit C of size 2o(n) , for every
assignment A j ∈ {0, 1}` , make a circuit CA j (x1 , . . . , xn−` ) := C(x1 , . . . , xn−` , A j ), and define D to be the
OR over all j of CA j . D has n − ` inputs and size 2` · 2o(n) . Setting ` = αn, the hypothesis of the theorem
implies that we can evaluate D on all its possible inputs (and thereby solve SAT for C) in 2n−`+o(n) time,
so we can set δ = 1 − α.

    The conclusion of Theorem 4.1 is extremely strong; the author considers it to be unlikely to be true.
For one, it is contrary to what we generally believe about the power of arbitrary circuits: it seems unlikely
that we can solve the SAT problem much faster on circuits of exponential size. Moreover, this conclusion
has very strong consequences: it would imply exponential-size, unrestricted circuit lower bounds for
EXPNP by the known connections between SAT algorithms and lower bounds [53]. As the hypothesis of
Theorem 4.1 is true for THR ◦ THR circuits, it therefore seems unlikely that one can convert unrestricted
circuits of size s into THR ◦ THR circuits of size poly(s). Indeed, we have the following result.

Theorem 4.2. Suppose the circuit evaluation problem has THR ◦ THR circuits of poly(n) size, which
are constructible in poly(n) time. Then there is a δ < 1 such that Circuit-SAT for 2o(n) -size circuits with
n inputs is solvable in 2δ n time.

Proof. We prove that the hypothesis of the theorem implies the hypothesis of Theorem 4.1. Let k ≥ 1 be
such that the circuit evaluation problem (on circuits of size up to n) has THR ◦ THR circuits of at most
nk size, which are constructible in O(nk ) time. Let α = 1/(24k). Given an unrestricted circuit C with

                         T HEORY OF C OMPUTING, Volume 14 (17), 2018, pp. 1–25                                        19
                                           R. RYAN W ILLIAMS

n inputs of 2αn size, we first generate an O(2kαn )-size THR ◦ THR circuit D for the circuit evaluation
problem, in O(2kαn ) ≤ O(2n/24 ) time. The circuit D takes a pair (C0 , x) as input (where C0 is a circuit
and x is an input to C0 ), and outputs the value of C0 (x). If we hard-code the description of C into the
input of D, we obtain a THR ◦ THR circuit of at most 2kαn ≤ O(2n/24 ) size which is equivalent to C. By
Theorem 3.1, we can evaluate D (and therefore C) on all possible inputs in 2n · poly(n) time.

     Therefore, under the hypothesis that general Circuit-SAT cannot be solved much faster than 2n time
on subexponential-size circuits, we can conclude a super-polynomial lower bound for THR ◦ THR circuits
computing the circuit evaluation problem (which is in P). While this exact line of reasoning is highly
unlikely to lead to an unconditional lower bound itself (proving any lower bound against Circuit-SAT is
difficult!), it provides further confidence that THR ◦ THR circuits are much weaker than their unrestricted
counterparts.
     Together, the above two theorems show the following.

        If P has uniform THR ◦ THR circuits of polynomial size, then there is an α > 0
        such that EXPNP does not have 2αn -size circuits.

    For reasons such as these, we believe that lower bounds such as “NP does not have uniform THR ◦
THR circuits of polynomial size” may be within reach, but we have not yet found the right form of
argument.

Acknowledgements. Thanks to Igor Carboni Oliveira for sending a preliminary version of his survey,
which helped the ideas in the proof of Theorem 2.10 to congeal. I also thank Rahul Santhanam, Emanuele
Viola, and the anonymous STOC and ToC reviewers, for helpful comments (and in the case of the ToC
reviewers, I also thank them for years of patience).


References
 [1] E RIC A LLENDER AND V IVEK G ORE: On strong separations from AC0 . In Proc. 8th Internat. Conf.
     Fundamentals of Computation Theory (FCT’91), pp. 1–15. Springer, 1991. [doi:10.1007/3-540-
     54458-5_44] 8

 [2] JAMES A SPNES , R ICHARD B EIGEL , M ERRICK F URST, AND S TEVEN RUDICH: The expressive
     power of voting polynomials. Combinatorica, 14(2):135–148, 1994. Conference version in
     STOC’91. [doi:10.1007/BF01215346] 5

 [3] DAVID A. M IX BARRINGTON AND H OWARD S TRAUBING: Complex polynomials and circuit
     lower bounds for modular counting. Comput. Complexity, 4(4):325–338, 1994. Conference version
     in LATIN’92. [doi:10.1007/BF01263421] 5

 [4] PAUL B EAME AND T RINH H UYNH: Multiparty communication complexity and threshold cir-
     cuit size of AC0 . SIAM J. Comput., 41(3):484–518, 2012. Conference version in FOCS’09.
     [doi:10.1137/100792779] 5

                      T HEORY OF C OMPUTING, Volume 14 (17), 2018, pp. 1–25                             20
      N EW A LGORITHMS AND L OWER B OUNDS FOR C IRCUITS W ITH L INEAR T HRESHOLD G ATES

 [5] R ICHARD B EIGEL: When do extra majority gates help? Polylog(N) majority gates are equiv-
     alent to one. Comput. Complexity, 4(4):314–324, 1994. Conference version in STOC’92.
     [doi:10.1007/BF01263420] 5, 8
 [6] R ICHARD B EIGEL AND J UN TARUI: On ACC. Comput. Complexity, 4(4):350–366, 1994. Confer-
     ence version in FOCS’91. [doi:10.1007/BF01263423] 8
 [7] E LI B EN -S ASSON AND E MANUELE V IOLA: Short PCPs with projection queries. In Proc. 41st
     Internat. Colloq. on Automata, Languages and Programming (ICALP’14), pp. 163–173. Springer,
     2014. [doi:10.1007/978-3-662-43948-7_14] 15
 [8] A SHOK K. C HANDRA , L ARRY S TOCKMEYER , AND U ZI V ISHKIN: Constant depth reducibility.
     SIAM J. Comput., 13(2):423–439, 1984. [doi:10.1137/0213028] 8, 11
 [9] A RKADEV C HATTOPADHYAY AND K RISTOFFER A RNSFELT H ANSEN: Lower bounds for circuits
     with few modular and symmetric gates. In Proc. 32nd Internat. Colloq. on Automata, Languages
     and Programming (ICALP’05), pp. 994–1005. Springer, 2005. [doi:10.1007/11523468_80] 5
[10] G IL C OHEN: A taste of circuit complexity pivoted at NEXP 6⊂ ACC0 (and more), 2013. Available
     at ECCC. 2
[11] D ON C OPPERSMITH: Rapid multiplication of rectangular matrices. SIAM J. Comput., 11(3):467–
     471, 1982. [doi:10.1137/0211037] 6
[12] J ÜRGEN F ORSTER , M ATTHIAS K RAUSE , S ATYANARAYANA V. L OKAM , RUSTAM M UBARAKZ -
     JANOV, N IELS S CHMITT, AND H ANS U LRICH S IMON: Relations between communication complex-
     ity, linear arrangements, and computational complexity. In Proc. 21st Internat. Conf. Foundations
     of Software Technology and Theoretical Computer Science (FSTTCS’01), pp. 171–182. Springer,
     2001. [doi:10.1007/3-540-45294-X_15] 5
[13] M IKAEL G OLDMANN: On the power of a threshold gate at the top. Inform. Process. Lett., 63(6):287–
     293, 1997. [doi:10.1016/S0020-0190(97)00141-5] 5
[14] PARIKSHIT G OPALAN AND ROCCO A. S ERVEDIO: Learning and lower bounds for AC0 with thresh-
     old gates. In Proc. 14th Internat. Workshop on Randomization and Computation (RANDOM’10),
     pp. 588–601. Springer, 2010. [doi:10.1007/978-3-642-15369-3_44] 5
[15] A NDRÁS H AJNAL , W OLFGANG M AASS , PAVEL P UDLÁK , M ARIO S ZEGEDY, AND G YÖRGY
     T URÁN: Threshold circuits of bounded depth. J. Comput. System Sci., 46(2):129–154, 1993.
     Conference version in FOCS’87. [doi:10.1016/0022-0000(93)90001-D] 5
[16] K RISTOFFER A RNSFELT H ANSEN: Computing symmetric boolean functions by circuits with
     few exact threshold gates. In Proc. 13th Ann. Internat. Computing and Combinatorics Conf.
     (COCOON’07), pp. 448–458. Springer, 2007. [doi:10.1007/978-3-540-73545-8_44] 5
[17] K RISTOFFER A RNSFELT H ANSEN AND P ETER B RO M ILTERSEN: Some meet-in-the-middle circuit
     lower bounds. In Proc. 33rd Internat. Symp. Mathematical Foundations of Comp. Sci. (MFCS’04),
     pp. 334–345. Springer, 2004. [doi:10.1007/978-3-540-28629-5_24] 5

                     T HEORY OF C OMPUTING, Volume 14 (17), 2018, pp. 1–25                          21
                                          R. RYAN W ILLIAMS

[18] K RISTOFFER A RNSFELT H ANSEN AND V LADIMIR V. P ODOLSKII: Exact threshold circuits. In
     Proc. 25th IEEE Conf. on Computational Complexity (CCC’10), pp. 270–279. IEEE Comp. Soc.
     Press, 2010. [doi:10.1109/CCC.2010.33] 15

[19] K RISTOFFER A RNSFELT H ANSEN AND V LADIMIR V. P ODOLSKII: Polynomial threshold functions
     and boolean threshold circuits. Inform. and Comput., 240:56–73, 2015. Conference version in
     MFCS’13. [doi:10.1016/j.ic.2014.09.008] 3, 5

[20] RUSSELL I MPAGLIAZZO , VALENTINE K ABANETS , AND AVI W IGDERSON: In search of an easy
     witness: Exponential time vs. probabilistic polynomial time. J. Comput. System Sci., 65(4):672–694,
     2002. Conference version in CCC’01. [doi:10.1016/S0022-0000(02)00024-7] 2

[21] RUSSELL I MPAGLIAZZO , S HACHAR L OVETT, R AMAMOHAN PATURI , AND S TEFAN S CHNEIDER:
     0-1 integer linear programming with a linear number of constraints, 2014. [arXiv:1401.5512] 4

[22] RUSSELL I MPAGLIAZZO , W ILLIAM M ATTHEWS , AND R AMAMOHAN PATURI: A satisfiability
     algorithm for AC0 . In Proc. 23rd Ann. ACM-SIAM Symp. on Discrete Algorithms (SODA’12), pp.
     961–972. ACM Press, 2012. ACM DL. [arXiv:1107.3127] 4

[23] RUSSELL I MPAGLIAZZO , R AMAMOHAN PATURI , AND S TEFAN S CHNEIDER: A satisfiability
     algorithm for sparse depth two threshold circuits. In Proc. 54th FOCS, pp. 479–488. IEEE Comp.
     Soc. Press, 2013. [doi:10.1109/FOCS.2013.58, arXiv:1212.4548] 4

[24] H AMIDREZA JAHANJOU , E RIC M ILES , AND E MANUELE V IOLA: Local reductions, 2013.
     [arXiv:1311.3171] 15

[25] S WASTIK KOPPARTY AND S RIKANTH S RINIVASAN: Certifying polynomials for AC0 [⊕] circuits,
     with applications to lower bounds and circuit compression. Theory of Computing, 14(12):1–24,
     2018. Conference version in FSTTCS’12. [doi:10.4086/toc.2018.v014a012] 12

[26] S ATYANARAYANA V. L OKAM: Complexity lower bounds using linear algebra. Foundations and
     Trends in Theoretical Computer Science, 4(1-2):1–155, 2009. [doi:10.1561/0400000011] 5

[27] S HACHAR L OVETT AND S RIKANTH S RINIVASAN: Correlation bounds for poly-size AC0 circuits
     with n1−o(1) symmetric gates. In Proc. 15th Internat. Workshop on Randomization and Computation
     (RANDOM’11), pp. 640–651. Springer, 2011. [doi:10.1007/978-3-642-22935-0_54] 5

[28] A LEXIS M ACIEL AND D ENIS T HÉRIEN: Threshold circuits for iterated multiplication: Using AC0
     for free. In Proc. 10th Symp. Theoretical Aspects of Comp. Sci. (STACS’93), pp. 545–565. Springer,
     1993. [doi:10.1007/3-540-56503-5_54] 3, 5

[29] A LEXIS M ACIEL AND D ENIS T HÉRIEN: Threshold circuits of small majority-depth. Inform. and
     Comput., 146(1):55–83, 1998. [doi:10.1006/inco.1998.2732] 7

[30] A LEXIS M ACIEL AND D ENIS T HÉRIEN: Efficient threshold circuits for power series. Inform. and
     Comput., 152(1):62–73, 1999. [doi:10.1006/inco.1998.2783] 3

                      T HEORY OF C OMPUTING, Volume 14 (17), 2018, pp. 1–25                          22
      N EW A LGORITHMS AND L OWER B OUNDS FOR C IRCUITS W ITH L INEAR T HRESHOLD G ATES

[31] J I ŘÍ M ATOUŠEK: Computing dominances in E n . Inform. Process. Lett., 38(5):277–278, 1991.
     [doi:10.1016/0020-0190(91)90071-O] 18

[32] M ARVIN M INSKY AND S EYMOUR A. PAPERT: Perceptrons: An Introduction to Computational
     Geometry. MIT Press, 1969. 2

[33] S ABURO M UROGA: Threshold Logic and its Applications. John Wiley & Sons, Inc., 1971. 2, 7

[34] S ABURO M UROGA , I WAO T ODA , AND S ATORU TAKASU: Theory of majority decision elements.
     Journal of the Franklin Institute, 271(5):376–418, 1961. [doi:10.1016/0016-0032(61)90702-5] 7

[35] M ONI NAOR AND O MER R EINGOLD: Number-theoretic constructions of efficient pseudo-
     random functions.     J. ACM, 51(2):231–262, 2004. Conference version in FOCS’97.
     [doi:10.1145/972639.972643] 3, 4

[36] N OAM N ISAN: The communication complexity of threshold gates. In Combinatorics, Paul Erdős is
     Eighty, pp. 301–315. János Bolyai Mathematical Society, Budapest, 1994. 5

[37] I GOR C ARBONI O LIVEIRA: Algorithms versus circuit lower bounds, 2013. [arXiv:1309.0249] 2,
     15

[38] E RION P LAKU: Multiplicity automata, polynomials and the complexity of small-depth boolean
     circuits. Master’s thesis, Clarkson University, 2002. Available at robotmotionplanning.org. 4

[39] V LADIMIR V. P ODOLSKII: Exponential lower bound for bounded depth circuits with few threshold
     gates. Inform. Process. Lett., 112(7):267–271, 2012. [doi:10.1016/j.ipl.2011.12.011] 5

[40] A LEXANDER A. R AZBOROV: On small depth threshold circuits. In Proc. 3rd Scandinavian
     Workshop on Algorithm Theory (SWAT’92), pp. 42–52. Springer, 1992. [doi:10.1007/3-540-55706-
     7_4] 3

[41] A LEXANDER A. R AZBOROV AND S TEVEN RUDICH: Natural proofs. J. Comput. System Sci.,
     55(1):24–35, 1997. [doi:10.1006/jcss.1997.1494] 2, 4

[42] A LEXANDER A. R AZBOROV AND A LEXANDER A. S HERSTOV: The sign-rank of AC0 . SIAM J.
     Comput., 39(5):1833–1855, 2010. Conference version in FOCS’08. [doi:10.1137/080744037] 5

[43] A LEXANDER A. R AZBOROV AND AVI W IGDERSON: nΩ(log n) lower bounds on the size of depth-3
     threshold circuits with AND gates at the bottom. Inform. Process. Lett., 45(6):303–307, 1993.
     Conference version in STOC’94. [doi:10.1016/0020-0190(93)90041-7] 5

[44] K ENNETH W. R EGAN: Polynomials and combinatorial definitions of languages. In Complexity
     Theory Retrospective II, pp. 261–293. Springer, 1997. 3

[45] J OHN H. R EIF AND S TEPHEN R. TATE: On threshold circuits and polynomial computation. SIAM
     J. Comput., 21(5):896–908, 1992. Conference version in SCTC’87. [doi:10.1137/0221053] 3

                     T HEORY OF C OMPUTING, Volume 14 (17), 2018, pp. 1–25                        23
                                          R. RYAN W ILLIAMS

[46] R AHUL S ANTHANAM: Ironic complicity: Satisfiability algorithms and circuit lower bounds.
     Bulletin of the EATCS, 106:31–52, 2012. 2

[47] A LEXANDER A. S HERSTOV: Separating AC0 from depth-2 majority circuits. SIAM J. Comput.,
     38(6):2113–2129, 2009. Conference version in STOC’07. [doi:10.1137/08071421X] 5

[48] K AI -Y EUNG S IU , J EHOSHUA B RUCK , T HOMAS K AILATH , AND T HOMAS H OFMEISTER: Depth
     efficient neural networks for division and related problems. IEEE Trans. Inform. Theory, 39(3):946–
     956, 1993. [doi:10.1109/18.256501] 3

[49] K AI -Y EUNG S IU AND V WANI P. ROYCHOWDHURY: On optimal depth threshold circuits
     for multiplication and related problems. SIAM J. Discrete Math., 7(2):284–292, 1994.
     [doi:10.1137/S0895480192228619] 3

[50] ROMAN S MOLENSKY: Algebraic methods in the theory of lower bounds for Boolean circuit
     complexity. In Proc. 19th STOC, pp. 77–82. ACM Press, 1987. [doi:10.1145/28395.28404] 12

[51] E MMANUELE V IOLA: Pseudorandom bits for constant-depth circuits with few arbitrary sym-
     metric gates. SIAM J. Comput., 36(5):1387–1403, 2007. Conference version in CCC’05.
     [doi:10.1137/050640941] 5

[52] R. RYAN W ILLIAMS: Guest column: a casual tour around a circuit complexity bound. ACM
     SIGACT News, 42(3):54–76, 2011. [doi:10.1145/2034575.2034591, arXiv:1111.1261] 2, 15

[53] R. RYAN W ILLIAMS: Improving exhaustive search implies superpolynomial lower bounds. SIAM J.
     Comput., 42(3):1218–1244, 2013. Conference version in STOC’10. [doi:10.1137/10080703X] 2, 6,
     19

[54] R. RYAN W ILLIAMS: Nonuniform ACC circuit lower bounds. J. ACM, 61(1):2:1–2:32, 2014.
     Conference version in CCC’11. [doi:10.1145/2559903] 2, 3, 4, 6, 10, 12, 14, 15, 16, 19

[55] R. RYAN W ILLIAMS: Natural proofs versus derandomization. SIAM J. Comput., 45(2):497–529,
     2016. Conference version in STOC’13. [doi:10.1137/130938219, arXiv:1212.1891] 2

[56] S TANISLAV Ž ÁK: A Turing machine time hierarchy. Theoret. Comput. Sci., 26(3):327–333, 1983.
     [doi:10.1016/0304-3975(83)90015-4] 15


AUTHOR

      R. Ryan Williams
      Associate professor
      Electrical Engineering and Computer Science
      MIT, Cambridge, MA
      rrw mit edu
      http://people.csail.mit.edu/rrw/


                     T HEORY OF C OMPUTING, Volume 14 (17), 2018, pp. 1–25                           24
    N EW A LGORITHMS AND L OWER B OUNDS FOR C IRCUITS W ITH L INEAR T HRESHOLD G ATES

ABOUT THE AUTHOR

    R. RYAN W ILLIAMS received his Ph. D. from Carnegie Mellon University in 2007, advised
       by the incomparable Manuel Blum. Up until recently, he lived on the Stanford campus,
       where marble-sized tomatoes grew on his balcony. Nowadays, he has no balcony and no
       tomatoes, but at least he can afford a place larger than a closet. He does not understand
       everyone else’s distinction between algorithms and complexity.




                   T HEORY OF C OMPUTING, Volume 14 (17), 2018, pp. 1–25                           25