                          T HEORY OF C OMPUTING, Volume 15 (17), 2019, pp. 1–20
                                       www.theoryofcomputing.org




  Separation of AC0[⊕] Formulas and Circuits
                          Benjamin Rossman∗                         Srikanth Srinivasan†
               Received August 11, 2017; Revised October 3, 2019; Published December 17, 2019




       Abstract: We give the first separation between the power of formulas and circuits in the
       AC0[⊕] basis (unbounded fan-in AND, OR, NOT and MOD2 gates). We show that there
       exist poly(n)-size depth-d circuits that are not equivalent to any depth-d formula of size n^{o(d)}
       for all d ≤ O(log(n)/log log(n)). This result is obtained by a combination of new lower and
       upper bounds for Approximate Majorities, the class of Boolean functions {0, 1}^n → {0, 1}
       that agree with the Majority function on a 3/4 fraction of the inputs.
       AC0[⊕] formula lower bound. We show that every depth-d AC0[⊕] formula of size s has a
       1/4-error polynomial approximation over F2 of degree O((1/d) log s)^{d−1}. This strengthens
       a classic O(log s)^{d−1} degree approximation for circuits due to Razborov (1987). Since any
       polynomial that approximates the Majority function has degree Ω(√n), this result implies
       an exp(Ω(d n^{1/(2(d−1))})) lower bound on the depth-d AC0[⊕] formula size of all Approximate
       Majority functions for all d ≤ O(log n).
       Monotone AC0 circuit upper bound. For all d ≤ O(log(n)/log log(n)), we give a randomized
       construction of depth-d monotone AC0 circuits (without NOT or MOD2 gates) of size
       exp(O(n^{1/(2(d−1))})) that compute an Approximate Majority function. This strengthens a
       construction of formulas of size exp(O(d n^{1/(2(d−1))})) due to Amano (2009).

ACM Classification: F.1.3, F.2.3
AMS Classification: 68Q17, 68Q15
Key words and phrases: formulas, circuits, lower bounds, AC0
     A preliminary version of this paper appeared in the Proc. of the 44th International Colloquium on Automata, Languages,
and Programming (ICALP 2017).
   ∗ Supported by NSERC and a Sloan Foundation Fellowship.
   † Supported by MATRICS grant MTR/2017/000958 awarded by SERB, Government of India.



© 2019 Benjamin Rossman and Srikanth Srinivasan
Licensed under a Creative Commons Attribution License (CC-BY)                            DOI: 10.4086/toc.2019.v015a017

1     Introduction
The relative power of formulas versus circuits is one of the great mysteries in complexity theory. The
central question in this area is whether NC1 (the class of languages decidable by polynomial-size Boolean
formulas) is a proper subclass of P/poly (the class of languages decidable by polynomial-size Boolean
circuits). Despite decades of effort, this question remains wide open.1 In the meantime, there has been
progress on analogues of the NC1 vs. P/poly question in certain restricted settings. For instance, in the
monotone basis (with AND and OR gates only), the power of polynomial-size formulas vs. circuits was
separated by the classic lower bound of Karchmer and Wigderson [9] (on the monotone formula size of
st-Connectivity).
    The bounded-depth setting is another natural venue for investigating the question of formulas vs.
circuits. Consider the elementary fact that every depth-d circuit of size s is equivalent to a depth-d formula
of size at most s^{d−1}, where we measure size by the number of gates. This observation is valid with respect
to any basis (i. e., set of gate types). In particular, we may consider the AC0 basis (unbounded fan-in AND,
OR, NOT gates) and the AC0 [⊕] basis (unbounded fan-in MOD2 gates in addition to AND, OR, NOT
gates). With respect to either basis, there is a natural depth-d analogue of the NC1 vs. P/poly question
(where d = d(n) is a parameter that may depend on n), namely whether every language decidable by
polynomial-size depth-d circuits is decidable by depth-d formulas of size n^{o(d)} (i. e., better than the trivial
n^{O(d)} upper bound).
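To make the generic s^{d−1} blowup concrete, here is a minimal sketch (our own illustration; the circuit and its gate names are hypothetical) of unfolding a circuit DAG into a formula tree: every shared gate is duplicated once per path to the output, so a depth-d circuit of s gates unfolds into a tree with at most roughly s^{d−1} gates.

```python
# Hypothetical illustration (not from the paper): unfolding a circuit DAG into
# a formula tree. A gate is (op, children); input variables are plain strings.
def formula_gate_count(circuit, gate):
    """Gate count of the tree obtained by unfolding the DAG below `gate`."""
    op, children = circuit[gate]
    # Recurse without memoization: shared subcircuits are counted once per path.
    return 1 + sum(formula_gate_count(circuit, c)
                   for c in children if c in circuit)

# A depth-3 circuit in which both middle gates share both bottom gates:
# the DAG below "out" has 5 gates, but the unfolded formula has 7; with more
# sharing per level the growth approaches s^{d-1}.
circuit = {
    "g1": ("AND", ["x1", "x2"]), "g2": ("OR", ["x2", "x3"]),
    "h1": ("OR",  ["g1", "g2"]), "h2": ("AND", ["g1", "g2"]),
    "out": ("AND", ["h1", "h2"]),
}
print(formula_gate_count(circuit, "out"))  # 7
```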
    It is reasonable to expect that this question could be resolved in the sublogarithmic-depth regime
d(n) = o(log n) given the powerful lower-bound techniques against AC0 circuits (Håstad’s Switching
Lemma [6]) and AC0 [⊕] circuits (the Polynomial Method of Razborov [13] and Smolensky [16]). How-
ever, because the standard way of applying these techniques does not distinguish between circuits and
formulas, it is not clear how to prove quantitatively stronger lower bounds on formula size vis-a-vis
circuit size of a given function. Recent work of Rossman [14] developed a new way of applying Håstad’s
Switching Lemma to AC0 formulas, in order to prove an exp(Ω(d n^{1/(d−1)})) lower bound on the formula
size of the Parity function for all d ≤ O(log n). Combined with the well-known exp(O(n^{1/(d−1)})) upper
bound on the circuit size of Parity, this yields an asymptotically optimal separation in the power of
depth-d AC0 formulas vs. circuits for all d(n) ≤ O(log(n)/log log(n)), as well as a super-polynomial
separation for all ω(1) ≤ d(n) ≤ o(log n).
    In the present paper, we carry out a similar development for formulas vs. circuits in the AC0 [⊕]
basis, obtaining both an asymptotically optimal separation for all d(n) ≤ O(log(n)/log log(n)) and a
super-polynomial separation for all ω(1) ≤ d(n) ≤ o(log n). Our target functions lie in the class of
Approximate Majorities, here defined as Boolean functions {0, 1}n → {0, 1} that agree with the Majority
function on a 3/4 fraction of the inputs. First, we show how to apply the Polynomial Method to obtain
better parameters in the approximation of AC0 [⊕] formulas by low-degree polynomials over F2 . This
leads to an exp(Ω(d n^{1/(2(d−1))})) lower bound on the AC0 [⊕] formula size of all Approximate Majority
functions. The other half of our formulas vs. circuits separation comes from an exp(O(n^{1/(2(d−1))})) upper
bound on the AC0 [⊕] circuit size of some Approximate Majority function. In fact, this upper bound is
realized by a randomized construction of monotone AC0 circuits (AC0 circuits without NOT or MOD2
gates).
    1 In this paper we focus on non-uniform complexity classes. The question of uniform-NC1 vs. P is wide open as well.




Theorem 1.1. Let n ∈ N be a growing parameter.
   1. For all 2 ≤ d ≤ O(log n), the depth-d AC0[⊕] formula size of any n-variable Approximate Majority
      function is exp(Ω(d n^{1/(2(d−1))})).

   2. For all
                                          2 ≤ d ≤ (1/2) · log n/(log log n + O(1)),
      there exists an n-variable Approximate Majority function whose depth-d monotone AC0 circuit size
      is exp(O(n^{1/(2(d−1))})).
    (All asymptotic notation O(·) and Ω(·) hides constants independent of d.) Denote by AC0 [⊕]-
circuit[d(n), s(n)] and AC0 [⊕]-formula[d(n), s(n)] the class of languages decidable by AC0 [⊕]-circuits
and formulas, resp., of depth d(n) and size s(n). For all functions d(n), we have the inclusions

  AC0 [⊕]-formula[d(n), poly(n)] ⊆ AC0 [⊕]-circuit[d(n), poly(n)] ⊆ AC0 [⊕]-formula[d(n), n^{O(d(n))}].

Theorem 1.1 implies that the above inclusions are tight for small d(n).
Corollary 1.2. Let n ∈ N be a growing parameter.
 (i) For all functions 2 ≤ d(n) ≤ O(log n/log log n),

                       AC0 [⊕]-circuit[d(n), poly(n)] ⊄ AC0 [⊕]-formula[d(n), n^{o(d(n))}].

 (ii) For all functions ω(1) ≤ d(n) ≤ o(log n),

                       AC0 [⊕]-formula[d(n), poly(n)] ⊊ AC0 [⊕]-circuit[d(n), poly(n)].

Proof. We prove separation (i) using a padding argument. Let c > 0 be a constant such that Theorem 1.1(2)
holds for all
                                       2 ≤ d ≤ (1/2) · log n/(log log n + c) + 1.
And let { f_{n,d} } be the family of Approximate Majority functions from Theorem 1.1(2). Now suppose
                                       2 ≤ d ≤ (1/2)^{c+1} · log n/log log n + 1.

Letting m = (log n)^{2(d−1)}, we have

                       d − 1 ≤ (1/2)^{c+1} · log n/log log n = (d − 1) · (1/2)^c · m^{1/(2(d−1))}/log m.

Dividing both sides by d − 1 and taking logs, this rearranges to
                                       d ≤ (1/2) · log m/(log log m + c) + 1.


By Theorem 1.1, f_{m,d} has AC0 [⊕] circuit size poly(n) and AC0 [⊕] formula size n^{Ω(d)}. Separation (i) now
follows by padding f_{m,d} to a function on n variables (noting that m < n).
    Separation (ii) follows from (i) when
                                       d ≤ (1/2) · log n/(log log n + c) + 1.

For larger depth, separation (ii) follows from Theorem 1.1, noting that the lower bound exp(Ω(d n^{1/(2(d−1))}))
is super-polynomial for d(n) = o(log n).

   We point out that extending separation (ii) from depth o(log n) to depth log n is equivalent to separating
complexity classes NC1 (= AC0 [⊕]-formula[log n, poly(n)]) from AC1[⊕] (= AC0 [⊕]-circuit[log n, poly(n)]).

1.1   Proof outline
Improved polynomial approximation. The lower bound for AC0 [⊕] formulas follows the general
template due to Razborov [13] on proving lower bounds for AC0 [⊕] circuits using low-degree polynomials
over F2 . Razborov showed that for any Boolean function f : {0, 1}^n → {0, 1} that has an AC0 [⊕] circuit
of size s and depth d, there is a randomized polynomial P of degree O(log s)^{d−1} that computes f correctly
on each input with probability 7/8 (we call such polynomials 1/8-error probabilistic polynomials). By
showing that some explicit Boolean function f (e. g., the Majority function or the MODq function for q
odd) on n variables does not have such an approximation of degree less than Ω(√n) [13, 16, 18, 17], we
get that any AC0 [⊕] circuit of depth d computing f must have size exp(Ω(n^{1/(2(d−1))})).
    In this paper, we improve the parameters of Razborov’s polynomial approximation from above for
AC0 [⊕] formulas. More precisely, for AC0 [⊕] formulas of size s and depth d, we are able to construct
1/8-error probabilistic polynomials of degree O((1/d) log s)^{d−1}. (Since every depth-d circuit of size s is
equivalent to a depth-d formula of size at most s^{d−1}, this result implies Razborov’s original theorem that
AC0 [⊕] circuits of size s and depth d have 1/8-error probabilistic polynomials of degree O(log s)^{d−1}.)
    We illustrate the idea behind this improved polynomial approximation with the special case of a
balanced formula (i. e., all gates have the same fan-in) of fan-in t and depth d. Note that the size of the
formula (number of gates) is Θ(t^{d−1}) and hence it suffices in this case to show that it has a 1/8-error
probabilistic polynomial of degree O(log t)^{d−1}. We construct the probabilistic polynomial inductively.
Given a balanced formula F of depth d and fan-in t, let F1 , . . . , Ft be its subformulas of depth d − 1.
Inductively, each Fi has a 1/8-error probabilistic polynomial Pi of degree O(log t)^{d−2} and by a standard
error-reduction [11], it has a (1/16t)-error probabilistic polynomial of degree O(log t)^{d−1} (in particular,
at any given input x ∈ {0, 1}^n, the probability that there exists an i ∈ [t] such that Pi(x) ≠ Fi(x) is at most
1/16). Using Razborov’s construction of a 1/16-error probabilistic polynomial of degree O(1) for the
output gate of F and composing this with the probabilistic polynomials Pi , we get the result for balanced
formulas. This idea can be extended to general (i. e., not necessarily balanced) formulas with a careful
choice of the error parameter for each subformula Fi to obtain the stronger polynomial approximation
result.
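In recurrence form, the balanced-formula argument above reads as follows (our own summary of the steps just described, with D(d) denoting the probabilistic degree achieved for depth-d balanced formulas of fan-in t):

```latex
D(1) = O(\log t), \qquad
D(d) \;\le\; \underbrace{O(1)}_{\text{output gate}}
       \cdot \underbrace{O(\log t)}_{\text{error reduction to } 1/16t}
       \cdot D(d-1)
\quad\Longrightarrow\quad D(d) = O(\log t)^{\,d-1}.
```

The point of the improvement is that the O(log t) error-reduction factor is paid per level in terms of the fan-in t, not the total size s, which is what turns O(log s)^{d−1} into O((1/d) log s)^{d−1} for balanced formulas of size s = t^{d−1}.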

Improved formula lower bounds. Combining the above approximation result with known lower
bounds for polynomial approximation [13, 16, 18, 17], we can already obtain stronger lower bounds for


AC0 [⊕] formulas than are known for AC0 [⊕] circuits. For instance, it follows that any AC0 [⊕] formula
of depth d computing the Majority function on n variables must have size exp(Ω(d n^{1/(2(d−1))})) for all
d ≤ O(log n), which is stronger than the corresponding circuit lower bound. Similarly stronger formula
lower bounds also follow for the MODq function (q odd).


Separation between formulas and circuits. However, the above improved lower bounds do not
directly yield the claimed separation between AC0 [⊕] formulas and circuits. This is because we do
not have circuits computing (say) the Majority function of the required size. To be able to prove our
result, we would need to show that the Majority function has AC0 [⊕] circuits of depth d and size
exp(O(n^{1/(2(d−1))})) (where the constant in the O(·) is independent of d). However, as far as we know, the
strongest result in this direction [10] only yields AC0 [⊕] circuits of size greater than exp(Ω(n^{1/(d−1)})),2
which is superpolynomially larger than the upper bound.
     To circumvent this issue, we change the hard functions to the class of Approximate Majorities, which
is the class of Boolean functions that agree with the Majority function on a 3/4 fraction of inputs from
{0, 1}n . (The choice of the constant 3/4 is not crucial: it can be replaced by any constant in the interval
(1/2, 1).) While this has the downside that we are no longer dealing with an explicitly defined function,
the advantage is that the polynomial approximation method of Razborov yields tight lower bounds for
some functions from this class.
     Indeed, since the method of Razborov is based on polynomial approximations, it immediately
follows that the same proof technique also yields the same lower bound for computing Approximate
Majorities. Formally, any AC0 [⊕] circuit of depth d computing any Approximate Majority must have size
exp(Ω(n^{1/(2(d−1))})). On the upper bound side, it is known from the work of O’Donnell and Wimmer [12]
and Amano [1] that there exist Approximate Majorities that can be computed by monotone AC0 formulas
of depth d and size exp(O(d n^{1/(2(d−1))})). (Note that the double exponent 1/(2(d − 1)) is now the same in
the upper and lower bounds.)
     In order to separate AC0 [⊕] formulas and circuits, we show that both the upper and lower bounds
from the previous paragraph are actually tight (up to the constants implicit in the exponents). Plugging
in our stronger polynomial approximation for AC0 [⊕] formulas, we obtain that any AC0 [⊕] formula of
depth d computing any Approximate Majority must have size exp(Ω(d n^{1/(2(d−1))})). In particular, this
implies that Amano’s construction is tight (up to the universal constant in the exponent) even for AC0 [⊕]
formulas.
     Further, we also modify Amano’s construction [1] to obtain better constant-depth circuits for Approximate
Majorities: we show that there exist Approximate Majorities that are computed by monotone AC0
circuits of depth d and size exp(O(n^{1/(2(d−1))})) (the constant in the O(·) is independent of d).


Smaller circuits for Approximate Majority. Our construction closely follows Amano’s, which in
turn is related to Valiant’s probabilistic construction [19] of monotone formulas for the Majority function.
However, we need to modify the construction in a suitable way that exploits the fact that we are
constructing circuits. This modification is in a similar spirit to a construction of Hoory, Magen and
   2 Indeed, this is inevitable with all constructions that we are aware of, since they are actually AC0 circuits and it is known by

a result of Håstad [6] that any AC0 circuit of depth d for the Majority function must have size exp(Ω(n^{1/(d−1)})).



Pitassi [7] who modify Valiant’s construction to obtain smaller monotone circuits (of depth Θ(log n)) for
computing the Majority function exactly.
      At a high level, the difference between Amano’s construction and ours is as follows. Amano constructs
random formulas Fi of each depth i ≤ d as follows. The formula F1 is the AND of a1 independent and
randomly chosen variables. For even i > 1, Fi is the OR of ai independent and random copies of Fi−1 ,
while for odd i > 1, Fi is the AND of ai independent and random copies of Fi−1 . For suitable values of
a1 , . . . , ad ∈ N, the random formula Fd computes an Approximate Majority with high probability. In our
construction, we build a depth-i circuit Ci for each i ≤ d in a similar way, except that each Ci now has M
different outputs. Given such a Ci−1 , we construct Ci by taking M independent randomly chosen subsets
T1 , . . . , TM of ai many outputs of Ci−1 and adding gates that compute either the OR or AND (depending
on whether i is even or odd) of the gates in Ti . Any of the M final gates of Cd now serves as the output
gate. By an analysis similar to Amano’s (see also [7]) we can show that this computes an Approximate
Majority with high probability, which finishes the proof.3
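The wiring of the layered construction just described can be sketched in a few lines. This is our own toy rendering with arbitrary parameters: it shows only the structure (M-output levels, each gate an AND or OR of a random subset of the previous level, ignoring the modified final two levels mentioned in footnote 3); choosing the fan-ins a_1, . . . , a_d so that the result computes an Approximate Majority is the content of Section 4.

```python
# Toy sketch (ours) of the M-output layered random circuit: level i takes M
# random subsets T_1..T_M (multisets, for simplicity) of the previous level's
# outputs and ANDs/ORs each one. Parameters here are arbitrary, not the a_i
# from the analysis.
import random

def random_layered_circuit(n, fanins, M, rng):
    """Return a function {0,1}^n -> {0,1}^M with len(fanins) random levels."""
    layers = []
    width = n
    for i, a in enumerate(fanins):
        subsets = [[rng.randrange(width) for _ in range(a)] for _ in range(M)]
        # Level 1 is an AND of variables, then OR and AND levels alternate.
        layers.append(("AND" if i % 2 == 0 else "OR", subsets))
        width = M
    def C(x):
        vals = list(x)
        for op, subsets in layers:
            agg = all if op == "AND" else any
            vals = [int(agg(vals[j] for j in T)) for T in subsets]
        return vals
    return C

rng = random.Random(0)
C = random_layered_circuit(n=20, fanins=[3, 4, 3], M=16, rng=rng)
# The circuit is monotone: flipping a 0 to 1 never decreases any output.
x = [0] + [rng.randrange(2) for _ in range(19)]
y = list(x); y[0] = 1
assert all(cx <= cy for cx, cy in zip(C(x), C(y)))
print(len(C(x)))  # 16 outputs, any of which could serve as the output gate
```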


2     Preliminaries
Throughout, n will be a growing parameter. We will consider Boolean functions on n variables, i. e.,
functions of the form f : {0, 1}^n → {0, 1}. We will sometimes identify {0, 1} with the field F2 in the
natural way and consider functions f : F2^n → F2 instead.
    Given a Boolean vector y ∈ {0, 1}n , we use |y|0 to denote the number of 0s in y and |y|1 to denote the
number of 1s in y.
    The Majority function on n variables, denoted MAJn, is the Boolean function that maps inputs
x ∈ {0, 1}^n to 1 if and only if |x|1 > n/2.

Definition 2.1. An (ε, n)-Approximate Majority is a function f : {0, 1}n → {0, 1} such that

                                        Pr_{x∈{0,1}^n} [ f(x) ≠ MAJn(x)] ≤ ε.

    As far as we know, the study of this class of functions was initiated by O’Donnell and Wimmer [12].
See also [1, 4].
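Definition 2.1 can be checked by brute force for tiny n; the following sketch (our own illustration, feasible only for small n since it enumerates all 2^n inputs) makes the definition concrete.

```python
# Brute-force check of Definition 2.1 (our illustration): count disagreements
# with Majority over all of {0,1}^n and compare against eps * 2^n.
from itertools import product

def is_approx_majority(f, n, eps):
    errors = sum(f(x) != int(sum(x) > n / 2) for x in product([0, 1], repeat=n))
    return errors <= eps * 2 ** n

# Majority itself is a (0, n)-Approximate Majority, hence also a (1/4, n) one.
maj = lambda x: int(sum(x) > len(x) / 2)
print(is_approx_majority(maj, 9, 0.25))      # True
# A single AND gate is far from Majority: it errs on almost half the cube.
and_all = lambda x: int(all(x))
print(is_approx_majority(and_all, 9, 0.25))  # False
```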
    We refer the reader to [2, 8] for standard definitions of Boolean circuits and formulas. We use the
terms AC0 circuits and AC0 formulas to describe circuits and formulas of constant depth, made up of AND, OR
and NOT gates. Similarly, AC0 [⊕] circuits and AC0 [⊕] formulas will be circuits and formulas of constant
depth, made up of AND, OR, MOD2 and NOT gates.

Remark 2.2. In standard terminology, AC0 and AC0 [⊕] circuits refer to circuits of constant depth and
polynomial size (made up of the relevant sets of gates in each case). However, our results hold for more
general settings of these parameters. To phrase these results, it will be convenient to abuse notation and
refer to AC0 and AC0 [⊕] circuits of size s(n) and depth d(n) where s(n) is possibly superpolynomial and
d(n) possibly superconstant, which of course can be defined in the obvious way.
    3 This is a slightly imprecise description of the construction as the final two levels of the circuit are actually defined somewhat

differently.



     By the size of a circuit we mean the number of gates in the circuit. (Note that this, too, is a deviation
from standard terminology that refers to the number of wires as the “size” of the circuit. However, since
the number of wires is polynomially related to the number of gates, Theorem 1.1 and Corollary 1.2
also hold when the size denotes the number of wires.) By the size of a formula we mean the number
of its leaves which is, up to a constant factor, equal to the number of all nodes (including leaves) in the
formula.4


3     Lower bound
In this section, we show that any AC0 [⊕] formula of depth d computing a (1/4, n)-Approximate Majority
must have size at least exp(Ω(d n^{1/(2(d−1))})) for all d ≤ O(log n).
    We work over the field F2 and identify it with {0, 1} in the natural way. The following concepts are
standard in circuit complexity (see, e. g., Beigel’s survey [3]).

Definition 3.1. Fix any ε ∈ [0, 1]. A polynomial P ∈ F2 [X1 , . . . , Xn ] is said to be an ε-approximating
polynomial for a Boolean function f : {0, 1}n → {0, 1} if

                                        Pr_{x∈{0,1}^n} [ f(x) = P(x)] ≥ 1 − ε.

   The following lemma, motivated by the ideas of Smolensky [16] (1987), first appeared explicitly in
Szegedy’s Ph. D. Thesis [18, Theorem 2.4.7] (1989) and in Smolensky’s paper [17, Theorem 5] (1993).

Lemma 3.2 (Smolensky, Szegedy). Let ε ∈ (0, 1/2) be any fixed constant. Any (1/2 − ε)-approximating
polynomial for the Majority function on n variables must have degree Ω(√n).

Corollary 3.3. Let f be any (1/4, n)-Approximate Majority and ε ∈ (0, 1/4) an arbitrary constant. Then
any (1/4 − ε)-approximating polynomial for f must have degree Ω(√n).

Proof. The proof is immediate from Lemma 3.2 and the triangle inequality.

Definition 3.4. An ε-error probabilistic polynomial of degree D for a Boolean function f : {0, 1}n →
{0, 1} is a random variable P taking values from polynomials in F2 [X1 , . . . , Xn ] of degree at most D such
that for all x ∈ {0, 1}n , we have Pr[ f (x) = P(x) ] ≥ 1 − ε.

Definition 3.5. Let Dε ( f ) be the minimum degree of an ε-error probabilistic polynomial for f .

     We will make use of the following two lemmas concerning Dε (·).

Lemma 3.6 (Razborov [13]). Let ORn and ANDn be the OR and AND functions in n variables, respectively.
Then Dε(ORn), Dε(ANDn) ≤ ⌈log(1/ε)⌉.
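The construction behind Lemma 3.6 is not spelled out in this paper; the following sketch follows the standard presentation of Razborov's method for OR_n. Over F2, take k = ⌈log(1/ε)⌉ random vectors r_1, . . . , r_k and set P(x) = 1 − ∏_j (1 − ⟨r_j, x⟩), a degree-k polynomial: P(0^n) = 0 always, and for x ≠ 0 each inner product ⟨r_j, x⟩ is uniform in {0, 1}, so P errs with probability 2^{−k} ≤ ε.

```python
# Sketch of the standard probabilistic polynomial for OR_n (following the
# usual presentation of Razborov's method; details are not in this paper).
import math
import random

def sample_or_polynomial(n, eps, rng=random):
    k = max(1, math.ceil(math.log2(1 / eps)))  # degree k = ceil(log(1/eps))
    rs = [[rng.randrange(2) for _ in range(n)] for _ in range(k)]
    def P(x):
        prod = 1
        for r in rs:
            dot = sum(ri * xi for ri, xi in zip(r, x)) % 2  # <r, x> over F_2
            prod *= 1 - dot
        return 1 - prod
    return P

assert sample_or_polynomial(16, 1/8)([0] * 16) == 0  # never errs on 0^n
x = [1] + [0] * 15                                   # OR(x) = 1
rate = sum(sample_or_polynomial(16, 1/8)(x) != 1 for _ in range(2000)) / 2000
print(rate)  # concentrated around 2^-3 = 0.125
```

The one-sided error is what makes the AND case symmetric: AND_n(x) = 1 − OR_n(1 − x), so the same degree bound transfers.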

Lemma 3.7 (Kopparty and Srinivasan [11]). There is an absolute constant c1 such that for any ε ∈ (0, 1),
Dε( f ) ≤ c1 · ⌈log(1/ε)⌉ · D1/8( f ) for all Boolean functions f .
    4 We assume here without loss of generality that the formula does not contain a gate of fan-in 1 feeding into another gate.




    We now state our main result, which shows that every AC0 [⊕] formula of size s and depth d + 1
admits a 1/8-error probabilistic polynomial of degree O((1/d) log s)^d.
Theorem 3.8. There is an absolute constant c2 such that, if f is computed by an AC0 [⊕] formula F of
size s and depth d + 1, then
                                D1/8( f ) ≤ 3 · (c2 · (1/d) · log(s) + 1)^d.
Proof. The proof is an induction on the depth d of the formula.
    The base case d = 0 corresponds to the case when the formula is a single AND, OR or MOD2 gate
and we need to show that D1/8( f ) ≤ 3. In the case that the formula is an AND or OR gate, this follows
from Lemma 3.6. If the formula is a MOD2 gate, this follows from the fact that the MOD2 function is
exactly a polynomial of degree 1.
    Let d ≥ 1. We assume that the formula F is the AND/OR/MOD2 of subformulas F1 , . . . , Fm computing
f1 , . . . , fm where Fi has size si and depth d + 1. So F has size s = s1 + · · · + sm and depth d + 2. Assume
that
                                D1/8( fi ) ≤ 3 · (c2 · (1/d) · log(si) + 1)^d
for all i. We must show that
                                D1/8( f ) ≤ 3 · (c2 · (1/(d + 1)) · log(s) + 1)^{d+1}.
    By Lemma 3.7, each fi has an si/(16s)-error probabilistic polynomial Pi of degree c1 · ⌈log(16s/si)⌉ ·
D1/8( fi ), which is at most
                                3c1 · 5(log(s/si) + 1) · (c2 · (1/d) · log(si) + 1)^d.
Then (P1 , . . . , Pm ) jointly computes ( f1 , . . . , fm ) with error 1/16 (= ∑_{i=1}^{m} si/(16s)).
    By a reasoning identical to the base case, it follows that there exists a 1/16-error probabilistic
polynomial Q of degree 4 for the output gate of the formula.
    Then Q(P1 , . . . , Pm ) is a 1/8-error probabilistic polynomial for f of degree
                                60c1 · max_i {(log(s/si) + 1) · (c2 · (1/d) · log(si) + 1)^d}.
    So long as c2 ≥ 20c1 , it suffices to show that for all i,
                                (log(s/si) + 1) · ((1/d) · log(si) + 1)^d ≤ ((1/(d + 1)) · log(s) + 1)^{d+1}.

Consider any i and let a, b ≥ 0 be such that si = 2^a and s = 2^{a+b}. We must show
                                (b + 1) · (a/d + 1)^d ≤ ((a + b)/(d + 1) + 1)^{d+1}.


For fixed a ≥ 0, as a polynomial in b, the function

                    pa,d(b) := ((a + b)/(d + 1) + 1)^{d+1} − (b + 1) · (a/d + 1)^d

is nonnegative over b ≥ 0 with a unique root at b = a/d. This follows from

                    ∂/∂b pa,d(b) = ((a + b)/(d + 1) + 1)^d − (a/d + 1)^d,

which is zero iff b = a/d; this value is a minimum of pa,d with pa,d(a/d) = 0.
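The closing inequality is easy to sanity-check numerically; a small script of our own (a spot check on a grid, not a proof, which is the calculus argument above):

```python
# Numeric spot check (ours) of the inequality that closes the proof:
# p_{a,d}(b) = ((a+b)/(d+1) + 1)^(d+1) - (b+1)*(a/d + 1)^d is >= 0 for all
# b >= 0, with its unique root at b = a/d.
def p(a, d, b):
    return ((a + b) / (d + 1) + 1) ** (d + 1) - (b + 1) * (a / d + 1) ** d

for a in [0.0, 1.0, 3.5, 10.0]:
    for d in [1, 2, 5]:
        assert abs(p(a, d, a / d)) < 1e-9            # equality at b = a/d
        for b in [0.0, 0.1, 1.0, a / d + 2.0, 50.0]:
            assert p(a, d, b) >= -1e-9               # nonnegative for b >= 0
print("ok")
```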

Corollary 3.9. Let n ∈ N be a growing parameter and d = d(n) (possibly a constant). Let f be any
(1/4, n)-Approximate Majority. Then any AC0 [⊕] formula of depth d computing f must have size
exp(Ω(d n^{1/(2(d−1))})) for all d ≤ O(log n), where the asymptotic notations O(·) and Ω(·) hide absolute
constants (independent of d and n).

Proof. Say that F is an AC0 [⊕] formula of depth d and size s computing f . Then, by Theorem 3.8, f
has a 1/8-error probabilistic polynomial of degree D ≤ O((1/d) log s + 1)^{d−1}. In particular, by
an averaging argument, there is some fixed polynomial P ∈ F2 [X1 , . . . , Xn ] of degree at most D such that
P is a 1/8-error approximating polynomial for f .
    Corollary 3.3 implies that the degree of P must be Ω(√n). Hence, we obtain O((1/d) log s + 1)^{d−1} ≥
Ω(√n). It follows that s ≥ exp(Ω(d n^{1/(2(d−1))}) − O(d)). Observe that Ω(d n^{1/(2(d−1))}) dominates O(d)
so long as d ≤ ε log n for some absolute constant ε > 0 (depending on the constants in Ω(·) and O(·)).
Hence, we get the claimed lower bound s ≥ exp(Ω(d n^{1/(2(d−1))})) for all d ≤ ε log n.
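For completeness, the rearrangement in the last step is the routine calculation (with c and ε standing for the constants hidden in the O(·) and Ω(·) above):

```latex
\Bigl(\tfrac{c\log s}{d} + 1\Bigr)^{d-1} \;\ge\; \varepsilon\sqrt{n}
\;\Longrightarrow\;
\frac{c\log s}{d} + 1 \;\ge\; \varepsilon^{1/(d-1)}\, n^{1/(2(d-1))}
\;\Longrightarrow\;
\log s \;\ge\; \frac{d}{c}\Bigl(\varepsilon^{1/(d-1)}\, n^{1/(2(d-1))} - 1\Bigr)
\;=\; \Omega\bigl(d\, n^{1/(2(d-1))}\bigr) - O(d).
```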


4     Upper bound
In this section, we show that for any constant ε, there are (ε, n)-Approximate Majorities that can be
computed by depth-d AC0 circuits of size exp(O(n^{1/(2(d−1))})).

4.1   High-level idea
We start with an informal description of our construction, which closely follows the constructions of
O’Donnell and Wimmer [12] and Amano [1] (see also [19, 7]).
    The first basic observation, from [12], is that to compute an (ε, n)-Approximate Majority, it suffices
to compute the Majority function on most (say at least a 1 − (ε/4) fraction of) inputs of relative
Hamming weight bounded away from 1/2; say those of weight either at most 1/2 − ε/(4√n) or at
least 1/2 + ε/(4√n). So we assume that the inputs come from one of these sets. Let us call the sets of
high-weight inputs and low-weight inputs Y (for “Yes”) and N (for “No”), respectively.
    The problem of distinguishing inputs from Y and N is easier than computing Majority exactly since
there is a gap between the weights of the yes and no instances. In particular, given any inputs a ∈ Y
and b ∈ N, we have that the multiplicative gap between the weights of a and b is (1 + γ0), where
γ0 = Θ(ε/√n).


     The idea is to iteratively increase this gap at each level of the circuit, so that after (d − 1) levels, the
multiplicative gap is a large constant, at which point a simple OR or AND determines the answer on most
inputs.
     The basic observation that allows us to do this (formalized in Lemma 4.1 below) is the following.
Say we have an input x ∈ Y′ ∪ N′, where Y′ is the set of inputs of relative weight at least p(1 + γ) and N′
is the set of inputs of relative weight at most p(1 − γ) (for some known p, γ). Consider the depth-1 circuit
C0 that chooses M random subsets of the input bits, of size t each, and computes the OR of the variables
in each set (C0 is thus an M-output circuit). If the input x has (relative) weight at least p(1 + γ), then the
expected fraction of 0s in the output—let us call this the 0-weight of the output—is at most

                                            (1 − p(1 + γ))^t ≈ exp(−pt − pγt),

while if x has weight at most p(1 − γ), then the output has expected 0-weight at least exp(−pt + pγt). By a simple
application of the Chernoff bound and averaging, we can easily ensure that we have a fixed circuit C0
that has (roughly) this behaviour at every input x of interest. That is, for inputs a ∈ Y′ and b ∈ N′, the
0-weights of C0(a) and C0(b) are at most exp(−pt) · exp(−γpt) and at least exp(−pt) · exp(γpt),
respectively. In particular, they are multiplicatively separated by (1 + γ′) where γ′ ≈ (pt) · γ,⁵
which is much greater than γ if pt is much greater than 1. In an analogous way, we can construct a depth-1
circuit (made up of ANDs this time) that amplifies a multiplicative gap of (1 + γ) in the 0-weights to a
multiplicative gap of (1 + γ′) in the weights.
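This amplification step can be illustrated with a quick exact computation (the parameters p, γ, t below are toy values of our own choosing, not those used in the construction): when t coordinates are sampled with replacement, the OR is 0 with probability exactly (1 − w)^t on an input of relative weight w.

```python
import math

def or_zero_prob(w, t):
    # Probability that the OR of t coordinates sampled with replacement
    # is 0, when the input has relative weight (fraction of 1s) w.
    return (1.0 - w) ** t

p, gamma, t = 0.01, 0.05, 500          # toy parameters with pt = 5 >> 1
hi = or_zero_prob(p * (1 + gamma), t)  # heavier input -> fewer 0s at the output
lo = or_zero_prob(p * (1 - gamma), t)  # lighter input -> more 0s at the output
new_gap = lo / hi - 1                  # multiplicative separation of the 0-weights
print(gamma, new_gap)                  # the gap grows by roughly a factor of pt
```

With these values the gap of 0.05 in the weights becomes a gap of roughly 0.66 in the 0-weights, in line with γ′ ≈ (pt) · γ.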
    Given this fact, our way forward is clear. We stack these depth-1 circuits one on top of the other, at
each stage amplifying the multiplicative gap from 1 + γ (for some γ) to 1 + rγ.⁶ Since we want the gap
after d − 1 levels to be a constant, we take r ≈ (1/γ0)^{1/(d−1)} ≈ n^{1/2(d−1)}.
    This in turn determines the size of our constructed circuit. Recall that the depth-1 circuit described
above amplifies the gap between the weights from 1 + γ to 1 + (pt) · γ, which we would like to be
1 + rγ. At the first level of the circuit, we can take t ≈ r, as p = 1/2 = Θ(1). At all subsequent levels,
however, we have p ≈ exp(−r) and hence t ≈ r/p is exp(O(r)). Hence, we obtain a circuit of size
exp(O(r)) = exp(O(n^{1/2(d−1)})).
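The parameter arithmetic above can be sanity-checked numerically (illustrative values of n and d, not part of the construction):

```python
import math

# Illustrative check: with r = n^{1/2(d-1)} per-level amplification,
# a gap of gamma0 ~ 1/sqrt(n) becomes a constant after d-1 levels.
n, d = 10**6, 4
r = n ** (1 / (2 * (d - 1)))        # amplification factor per level
gamma0 = 1 / math.sqrt(n)           # initial multiplicative gap
print(r, gamma0 * r ** (d - 1))     # final gap is a constant (here, 1)
```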


4.2     Formal proof
Let ε0 ∈ (0, 1) be a small enough constant so that the following inequalities hold for any β ≤ ε0.

      • exp(−β) ≤ 1 − β exp(−β), and

      • 1 − β ≥ exp(−β − β²) ≥ exp(−2β).

(It suffices to take ε0 = 1/2.)
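These two inequalities are elementary; a quick grid check over β ∈ (0, 1/2] (illustrative, not a proof) confirms that ε0 = 1/2 suffices:

```python
import math

# Grid check of the two inequalities used throughout Section 4.2,
# for beta ranging over (0, 1/2].
ok = all(
    math.exp(-b) <= 1 - b * math.exp(-b)
    and 1 - b >= math.exp(-b - b * b) >= math.exp(-2 * b)
    for b in (k / 1000 for k in range(1, 501))
)
print(ok)
```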
     We need the following technical lemma.

   ⁵ We use the approximation exp(z) ≈ (1 + z) when z is small.
   ⁶ One could also amplify the gaps by varying amounts at different levels of the circuit. This turns out to be suboptimal in
terms of the size of the circuit constructed.



Lemma 4.1. Let A, s be positive reals, M, n ∈ ℕ, and γ ∈ (1/n, 1/10) be such that e^A ≥ n³, n ≥ 1/ε0,
and 1 ≤ s ≤ n. Define

  I0(γ) := {y ∈ {0, 1}^M | |y|₁ ≤ Me^{−A}(1 − γ)}   and   I1(γ) := {y ∈ {0, 1}^M | |y|₁ ≥ Me^{−A}(1 + γ)}.

If we choose S ⊆ [M] of size t := ⌈e^A · s⌉ by picking t random elements of [M] with replacement, then

                  x ∈ I0(γ) ⇒ Pr_S[∨_{j∈S} x_j = 0] ≥ exp(−s) · exp(sγ/2),
                  x ∈ I1(γ) ⇒ Pr_S[∨_{j∈S} x_j = 0] ≤ exp(−s) · exp(−sγ).

Further, if sγ ≤ ε0, then the above probabilities can be bounded from below and from above by exp(−s) ·
(1 + sγ exp(−sγ)) and exp(−s) · (1 − sγ exp(−sγ)), respectively.
    A similar statement holds for the sets

  J1(γ) := {y ∈ {0, 1}^M | |y|₀ ≤ Me^{−A}(1 − γ)}   and   J0(γ) := {y ∈ {0, 1}^M | |y|₀ ≥ Me^{−A}(1 + γ)},

with the event “∨_{j∈S} x_j = 0” replaced by the event “∧_{j∈S} x_j = 1.”
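A toy numerical instantiation of the lemma (with small illustrative values of e^A, s, γ, M, rather than the asymptotic regime of the hypotheses): since S is sampled with replacement, Pr_S[∨_{j∈S} x_j = 0] = (1 − |x|₁/M)^t exactly.

```python
import math

# Illustrative parameters (chosen small for readability, not from the paper).
eA, s, gamma, M = 100.0, 2.0, 0.05, 10**6
t = math.ceil(eA * s)          # |S| = ceil(e^A * s) samples, with replacement

w_yes = (1 + gamma) / eA       # relative weight of an input in I1(gamma)
w_no  = (1 - gamma) / eA       # relative weight of an input in I0(gamma)

pr_yes = (1 - w_yes) ** t      # exact Pr[OR = 0]; at most exp(-s)*exp(-s*gamma)
pr_no  = (1 - w_no) ** t       # exact Pr[OR = 0]; at least exp(-s)*exp(s*gamma/2)

assert pr_yes <= math.exp(-s) * math.exp(-s * gamma)
assert pr_no >= math.exp(-s) * math.exp(s * gamma / 2)
```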

Proof. We give the proof only for I0(γ) and I1(γ). The proof for J0(γ) and J1(γ) is similar.
    Consider first the case that x ∈ I1(γ). In this case, we have the following computation:

     Pr_S[∨_{j∈S} x_j = 0] ≤ (1 − (1 + γ)/e^A)^{e^A · s} ≤ exp(−(1 + γ) · s) ≤ exp(−s) · exp(−sγ).        (4.1)

The above implies the first upper bound on Pr_S[∨_{j∈S} x_j = 0] from the lemma statement. When sγ ≤ ε0,
we further have exp(−sγ) ≤ 1 − sγ exp(−sγ), which implies the second upper bound. This proves the
lemma when x ∈ I1(γ).
    Now consider the case that x ∈ I0(γ). We have

     Pr_S[∨_{j∈S} x_j = 0] ≥ (1 − (1 − γ)/e^A)^{e^A · s + 1}
                           ≥ exp(−((1 − γ)/e^A + 1/e^{2A}) · (e^A · s + 1))
                           = exp(−s + sγ − s/e^A − (1 − γ)/e^A − 1/e^{2A})
                           ≥ exp(−s + sγ − 2s/e^A)
                           = exp(−s) · exp(sγ(1 − 2e^{−A}/γ))                                           (4.2)



where for the second inequality we have used the fact that, since e^{−A} ≤ 1/n³ ≤ ε0, we have

                  1 − (1 − γ)/e^A ≥ exp(−(1 − γ)/e^A − 1/e^{2A}).

Since e^{−A} ≤ 1/n³ ≤ 1/(4n) ≤ γ/4, we can bound from below the right-hand side of inequality (4.2) by
exp(−s) · exp(sγ/2). Also, note that

     1 − 2e^{−A}/γ ≥ 1 − (2/n³)/(1/n) = 1 − 2/n² ≥ exp(−1/n) ≥ exp(−γ) ≥ exp(−sγ).

This implies that the right-hand side of inequality (4.2) can also be bounded from below by

                  exp(−s) · exp(sγ exp(−sγ)) ≥ exp(−s) · (1 + sγ exp(−sγ)),

which implies the claim about Pr_S[∨_{j∈S} x_j = 0] assuming that x ∈ I0(γ).

    We now prove the main result of this section.

Theorem 4.2. For any growing parameter n ∈ ℕ and

                        2 ≤ d ≤ (1/2) · (log n)/(log log n + O(1))

and ε > 0, there is an (ε, n)-Approximate Majority fn computable by a monotone AC0 circuit with at
most exp(O(n^{1/2(d−1)} log(1/ε)/ε)) many gates, where both O(·)'s hide absolute constants (independent
of d, ε).

Proof. We assume throughout that ε is a small enough constant and that n is large enough for various
inequalities to hold. We will actually construct a monotone circuit of depth d and size

                        exp(O(n^{1/2(d−1)} log(1/ε)/ε))

computing a (4ε, n)-Approximate Majority, which also implies the theorem.
    Fix a parameter A = n^{1/2(d−1)} − α, where α ∈ [0, ln 2] is chosen so that A = A1 · ln 2 for some positive
integer A1. We assume that A ≥ 10 log n (which holds as long as

                        2d ≤ log n/(log log n + c)

for a large enough absolute constant c > 0) and that ε ≤ ε0. Let M = ⌈e^{10A}⌉.
    Define a sequence of real numbers γ0, γ1, . . . , γ_{d−2} as follows:

                        γ0 = ε/√n,
                        γi = Aγ_{i−1} exp(−2Aγ_{i−1}),   for each i ∈ [d − 2].
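This recurrence, and the bounds stated for it just below in inequality (4.3), can be checked numerically for toy values of n, d, ε (illustrative only; we also ignore the rounding of A to A1 · ln 2):

```python
import math

# Compute the sequence gamma_i for toy parameters and check it stays within
# A^i gamma_0 * exp(-3 A^i gamma_0) <= gamma_i <= A^i gamma_0.
n, d, eps = 10**6, 4, 0.1
A = n ** (1 / (2 * (d - 1)))        # A ~ n^{1/2(d-1)} (rounding ignored)
g = eps / math.sqrt(n)              # gamma_0
for i in range(1, d - 1):           # i = 1, ..., d-2
    g = A * g * math.exp(-2 * A * g)
    top = A ** i * (eps / math.sqrt(n))
    assert top * math.exp(-3 * top) <= g <= top
print(g)
```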



    It is clear that γi ≤ A^i γ0 for each i ∈ [d − 2]. As a result we also obtain

     γi = A^i γ0 exp(−2A(γ0 + γ1 + · · · + γ_{i−1}))
        ≥ A^i γ0 exp(−2γ0 A(1 + A + A² + · · · + A^{i−1})) ≥ A^i γ0 exp(−3A^i γ0).                      (4.3)

    Let

                  Yε = {x ∈ {0, 1}^n | |x|₁ ≥ (1/2 + ε/√n) · n},
                  Nε = {x ∈ {0, 1}^n | |x|₁ ≤ (1/2 − ε/√n) · n}.

    The idea is to define a sequence of circuits C1, C2, . . . , C_{d−2} with n inputs and M outputs such that Ci
has depth i and iM many (non-input) gates. Further, for odd i,

                  x ∈ Nε ⇒ Ci(x) ∈ I0(γi),   and   x ∈ Yε ⇒ Ci(x) ∈ I1(γi),                            (4.4)

and similarly for even i,

                  x ∈ Nε ⇒ Ci(x) ∈ J0(γi),   and   x ∈ Yε ⇒ Ci(x) ∈ J1(γi).                            (4.5)

     After this is done, we will add on the top a depth-2 circuit that will reject most inputs from I0(γ_{d−2})
or J0(γ_{d−2})—depending on whether (d − 2) is odd or even—and accept most inputs from I1(γ_{d−2}) or
J1(γ_{d−2}).
     We begin with the construction of C1, . . . , C_{d−2}, which is done by induction.

Construction of C1. The base case of the induction is the construction of C1, which is done as follows.
We choose M i.i.d. random subsets T1, . . . , TM ⊆ [n] in the following way: for each i ∈ [M], we sample A1
random elements of [n] with replacement. Let b_i^x = ∧_{j∈Ti} x_j.
    If x ∈ Nε, then the probability that b_i^x = 1 is given by

     Pr[b_i^x = 1] ≤ (1/2 − γ0)^{A1} = (1/2^{A1})(1 − 2γ0)^{A1} ≤ (1/e^A)(1 − γ0 A1) ≤ (1/e^A)(1 − γ0 A),

where the second-last inequality follows from the fact that

                  (1 − z)^{A1} ≤ 1 − zA1 + A1²z²/2

for z ∈ [0, 1].
    Let δ = 1/n³. Note in particular that 2δ/(γ0 A) ≤ ε0 for large enough n.
    By a Chernoff bound, the probability that

                  (1/M) ∑_i b_i^x ≥ (1/e^A)(1 − γ0 A)(1 + δ)


is bounded by exp(−Ω(δ²M/e^A)) ≤ exp(−Ω(e^{9A}/n⁶)) ≤ exp(−n), since e^A ≥ n^{10}. Thus, with probabil-
ity at least 1 − exp(−n), we have

     (∑_i b_i^x)/M ≤ (1/e^A)(1 − γ0 A)(1 + δ)
                   ≤ (1/e^A)(1 − γ0 A + δ) = (1/e^A)(1 − γ0 A(1 − δ/(γ0 A)))
                   ≤ (1/e^A)(1 − γ0 A exp(−2δ/(γ0 A))) ≤ (1/e^A)(1 − γ0 A exp(−γ0 A))
                   ≤ (1/e^A)(1 − γ1).                                                                   (4.6)

Above, we have used the fact that

                  1 − δ/(γ0 A) ≥ exp(−2δ/(γ0 A))

since δ/(γ0 A) ≤ ε0.
    If x ∈ Yε, then the probability that b_i^x = 1 is given by

     Pr[b_i^x = 1] ≥ (1/2 + γ0)^{A1} ≥ (1/2^{A1})(1 + 2γ0)^{A1} ≥ (1/e^A)(1 + γ0 A1) ≥ (1/e^A)(1 + γ0 A).

    As above, we can argue that the probability that

                  (1/M) ∑_i b_i^x ≤ (1/e^A)(1 + γ0 A)(1 − δ)

is at most exp(−n). Thus, with probability 1 − exp(−n),

     (∑_i b_i^x)/M ≥ (1/e^A)(1 + γ0 A)(1 − δ)
                   ≥ (1/e^A)(1 + γ0 A − 2δ) = (1/e^A)(1 + γ0 A(1 − 2δ/(γ0 A)))
                   ≥ (1/e^A)(1 + γ0 A exp(−4δ/(γ0 A))) ≥ (1/e^A)(1 + γ0 A exp(−γ0 A))
                   ≥ (1/e^A)(1 + γ1).                                                                   (4.7)

Thus, by a union bound over x, we can fix a choice of T1, . . . , TM so that inequality (4.6) holds for all
x ∈ Nε and inequality (4.7) holds for all x ∈ Yε. Hence, condition (4.4) holds for i = 1 as required. This
concludes the construction of C1, which just outputs the values of ∧_{j∈Ti} x_j for each i.
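On a toy scale, the expected behaviour of a single gate of C1 can be computed exactly: a gate that ANDs A1 coordinates sampled with replacement outputs 1 with probability w^{A1} on an input of relative weight w. With illustrative values of A1 and γ0 (far from the paper's asymptotic regime):

```python
import math

A1 = 10
A = A1 * math.log(2)          # A = A1 * ln 2, so that 2**A1 = e**A
gamma0 = 0.001                # toy initial gap

p_no  = (0.5 - gamma0) ** A1  # Pr[gate = 1] on an input from N_eps
p_yes = (0.5 + gamma0) ** A1  # Pr[gate = 1] on an input from Y_eps

# The bounds proved above for the base case of the induction:
assert p_no <= math.exp(-A) * (1 - gamma0 * A)
assert p_yes >= math.exp(-A) * (1 + gamma0 * A)
```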


Construction of C_{i+1}. For the inductive case, we proceed as follows. We assume that i is odd (the case
that i is even is similar). So by the inductive hypothesis, we know that condition (4.4) holds and hence
that Ci(x) ∈ I0(γi) or I1(γi) depending on whether x ∈ Nε or Yε. Let γ := γi. Let the output gates of Ci be
g1, . . . , gM.
       We choose T1, . . . , TM ⊆ [M] randomly as in the statement of Lemma 4.1 with s = A. Note that the
chosen parameters satisfy all the hypotheses of Lemma 4.1. Further we also have

                  sγ ≤ A · A^i γ0 ≤ A^{d−2} · (ε/√n) ≤ ε0.

    The random circuit C′ is defined to be the circuit obtained by adding M OR gates to Ci such that the
jth OR gate computes ∨_{k∈Tj} g_k. Let b_j^x be the output of the jth OR gate on Ci(x).

    By Lemma 4.1, we have

     x ∈ Nε ⇒ Pr_S[b_j^x = 0] ≥ exp(−A) · (1 + Aγ exp(−Aγ)),   and
     x ∈ Yε ⇒ Pr_S[b_j^x = 0] ≤ exp(−A) · (1 − Aγ exp(−Aγ)).                                            (4.8)

    Let δ = 1/n³. Note that

                  Aγ ∈ [1/√n, 1/n^{1/2(d−1)}]

and hence for large enough n,

                  2δ/(Aγ exp(−Aγ)) ≤ ε0.
    Assume x ∈ Nε. In this case, the Chernoff bound implies that the probability that

                  ∑_{j∈[M]} b_j^x ≤ M exp(−A) · (1 + Aγ exp(−Aγ))(1 − δ)

is at most exp(−Ω(δ²M/e^A)) ≤ exp(−n). When this event does not occur, we have

     (∑_i b_i^x)/M ≥ (1/e^A)(1 + Aγ exp(−Aγ))(1 − δ)
                   ≥ (1/e^A)(1 + Aγ exp(−Aγ) − 2δ) = (1/e^A)(1 + Aγ exp(−Aγ)(1 − 2δ/(Aγ exp(−Aγ))))
                   ≥ (1/e^A)(1 + Aγ exp(−Aγ) · exp(−4δ/(Aγ exp(−Aγ))))
                   ≥ (1/e^A)(1 + Aγ exp(−Aγ) · exp(−Aγ))
                   ≥ (1/e^A)(1 + Aγ exp(−2Aγ)) ≥ (1/e^A)(1 + γ_{i+1}).                                  (4.9)
    We have used above that for large enough n,

                  2δ/(Aγ exp(−Aγ)) ≤ ε0

and hence

                  1 − 2δ/(Aγ exp(−Aγ)) ≥ exp(−4δ/(Aγ exp(−Aγ))).
    Similarly, when x ∈ Yε, the Chernoff bound tells us that the probability that

                  ∑_{j∈[M]} b_j^x ≥ M exp(−A) · (1 − Aγ exp(−Aγ))(1 + δ)


is at most exp(−n). In this case, we get

     (∑_i b_i^x)/M ≤ (1/e^A)(1 − Aγ exp(−Aγ))(1 + δ)
                   ≤ (1/e^A)(1 − Aγ exp(−Aγ) + δ) = (1/e^A)(1 − Aγ exp(−Aγ)(1 − δ/(Aγ exp(−Aγ))))
                   ≤ (1/e^A)(1 − Aγ exp(−Aγ) · exp(−2δ/(Aγ exp(−Aγ))))
                   ≤ (1/e^A)(1 − Aγ exp(−Aγ) · exp(−Aγ))
                   = (1/e^A)(1 − Aγ exp(−2Aγ)) ≥ (1/e^A)(1 − γ_{i+1}).                                 (4.10)

    By a union bound, we can fix T1 , . . . , TM so that inequality (4.9) and inequality (4.10) are true for all
x ∈ Nε and x ∈ Yε , respectively. This gives us the circuit Ci+1 which satisfies all the required properties.


The top two levels of the circuit. At the end of the above procedure, we have a circuit C_{d−2} of depth
d − 2 and at most (d − 2)M gates that satisfies one of condition (4.4) or condition (4.5), depending on
whether d − 2 is odd or even, respectively. We assume that d − 2 is even (the other case is similar).
   Define γ := γ_{d−2}. Recall from inequality (4.3) that γ ≥ A^{d−2} γ0 exp(−3A^{d−2} γ0) ≥ A^{d−2} γ0/2.
   Let

                  M′ = exp(10A log(1/ε)/ε + 10A).

We choose M′ many subsets T1, . . . , T_{M′} ⊆ [M] i.i.d. so that each Tj is picked as in Lemma 4.1 with
s = 10A log(1/ε)/ε. Note that

                  sγ ≥ s · A^{d−2} γ0/2 = (10A log(1/ε)/ε) · (A^{d−2}/2) · (ε/√n) ≥ 4 log(1/ε),

where the final inequality uses the fact that A^{d−1} = (n^{1/2(d−1)} − α)^{d−1} ≥ √n · (1 − o(1)) as long as
2d ≤ log n/(log log n + c) for a large enough absolute constant c > 0.
    Say g1, . . . , gM are the output gates of C_{d−2}. We define the random circuit C′ (with n inputs and M′
outputs) to be the circuit obtained by adding M′ AND gates such that the jth AND gate computes ∧_{k∈Tj} g_k.
Let b_j^x be the output of the jth AND gate on C_{d−2}(x).


    By Lemma 4.1, we have

     x ∈ Nε ⇒ Pr_S[b_j^x = 1] ≤ exp(−s) · exp(−sγ) ≤ ε² · exp(−s),   and
     x ∈ Yε ⇒ Pr_S[b_j^x = 1] ≥ exp(−s) · exp(sγ/2) ≥ exp(−s)/ε².                                     (4.11)
    Say x ∈ Nε. By a Chernoff bound, the probability that ∑_j b_j^x ≥ 2ε²M′ exp(−s) is at most

                  exp(−Ω(ε²M′ exp(−s))) ≤ exp(−Ω(ε²e^{10A})) ≤ exp(−n).

Similarly, when x ∈ Yε, the probability that

                  ∑_j b_j^x ≤ M′ exp(−s)/(2ε²)

is also bounded by exp(−n). By a union bound, we can fix a T1, . . . , T_{M′} to get a circuit C_{d−1} such that

     x ∈ Nε ⇒ |C_{d−1}(x)|₁ ≤ 2ε² exp(−s) M′,   and
     x ∈ Yε ⇒ |C_{d−1}(x)|₁ ≥ exp(−s) M′/(2ε²).                                                        (4.12)
    This gives us the depth-(d − 1) circuit C_{d−1}. Note that C_{d−1} has M′ + O(dM) = O(M′) gates.
    To get the depth-d circuit, we choose a random subset T ⊆ [M′] by sampling exactly ⌈exp(s)⌉ many
elements of [M′] with replacement. We construct a random depth-d circuit C′_d by taking the OR of the
output gates of C_{d−1} indexed by the subset T.
    From implication (4.12) it follows that

     x ∈ Nε ⇒ Pr_T[C′_d(x) = 1] ≤ |T| · 2ε² exp(−s) ≤ 4ε² < ε,   and
     x ∈ Yε ⇒ Pr_T[C′_d(x) = 0] ≤ (1 − exp(−s)/(2ε²))^{exp(s)} ≤ exp(−1/(2ε²)) < ε.
    The final inequalities in each case above hold as long as ε is a small enough constant.
    It follows from the above that there is a choice for T such that C′_d makes an error—i. e., C′_d(x) = 1 for
x ∈ Nε or C′_d(x) = 0 for x ∈ Yε—on at most a 2ε fraction of inputs from Nε ∪ Yε. We fix such a choice for
T and the corresponding circuit C.
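The two final-step estimates above can be verified numerically for illustrative constants ε and s (the actual proof takes s = 10A log(1/ε)/ε, far larger; we also approximate |T| = ⌈exp(s)⌉ by exp(s)):

```python
import math

eps, s = 0.05, 20.0
T = math.exp(s)                                   # number of ORed output gates

err_no  = T * 2 * eps**2 * math.exp(-s)           # union bound for x in N_eps
err_yes = (1 - math.exp(-s) / (2 * eps**2)) ** T  # all ORed gates output 0

assert err_no <= 4 * eps**2 < eps
assert err_yes <= math.exp(-1 / (2 * eps**2)) < eps
```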
    We have

     Pr_{x∈{0,1}^n}[C(x) ≠ Maj_n(x)] ≤ Pr_{x∈Yε∪Nε}[C(x) ≠ Maj_n(x)] + Pr_{x∈{0,1}^n}[x ∉ Yε ∪ Nε]
                                     ≤ 2ε + Pr_{x∈{0,1}^n}[x ∉ Yε ∪ Nε].

    Finally, by Stirling's approximation we get

     Pr_{x∈{0,1}^n}[x ∉ Yε ∪ Nε] = (1/2^n) ∑_{m∈[n/2−ε√n, n/2+ε√n]} (n choose m)
                                 ≤ (1/2^n) ∑_{m∈[n/2−ε√n, n/2+ε√n]} (n choose n/2) ≤ 2ε.
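This final estimate can be checked numerically for a toy n (illustrative only; math.comb computes the binomial coefficients exactly):

```python
import math

# Fraction of inputs whose Hamming weight lies in the middle window of
# half-width eps*sqrt(n); the Stirling estimate says this is at most 2*eps.
n, eps = 1000, 0.1
half = eps * math.sqrt(n)
mass = sum(math.comb(n, m) for m in range(n + 1)
           if abs(m - n / 2) < half) / 2 ** n
print(mass)
assert mass <= 2 * eps
```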

    Hence we see that the circuit C computes a (4ε, n)-Approximate Majority, which proves Theorem 4.2.
The circuit has depth d and size O(M′) = exp(O(n^{1/2(d−1)} log(1/ε)/ε)).


5    Conclusions
It is straightforward to extend our main results to AC0[MOD_p] for any fixed prime p, yielding a lower bound
of exp(Ω(dn^{1/2(d−1)}/p)) (the Ω(·) hides an absolute constant). To prove this, we construct approximating
polynomials as in Section 3 for size-s depth-d AC0[MOD_p] formulas of degree O((1/d)p log s)^{d−1}. The
construction is almost exactly the same, except that we use F_p-analogs of Lemma 3.6 (which is true
with the weaker degree bound of (p − 1) · ⌈log(1/ε)⌉ [16]) and Lemma 3.7 (which holds as stated over
all fields [5]). The Smolensky–Szegedy degree lower bound for approximating the Majority function
(Lemma 3.2) is true over all fields.
     The improved construction of approximating polynomials for AC0[MOD_p] formulas also implies
another result. Using the fact [16] that any (1/4)-approximating polynomial over F_p (p odd) for the
Parity function on n variables must have degree Ω(√n), we see that any polynomial-size AC0[MOD_p]
formula computing the Parity function on n variables must have depth Ω(log n) for constant primes p.
This strengthens a result of Rossman [14], which gives this statement for AC0 formulas.


Acknowledgements. We thank Rahul Santhanam for valuable discussions, and the organizers of the
2016 Complexity Semester at St. Petersburg, where this collaboration began. We are also grateful to the
anonymous referees who reviewed this paper for Theory of Computing for their corrections and feedback.


References
 [1] Kazuyuki Amano: Bounds on the size of small depth circuits for approximating majority. In
     Proc. 36th Internat. Colloq. on Automata, Languages and Programming (ICALP'09), pp. 59–70.
     Springer, 2009. [doi:10.1007/978-3-642-02927-1_7] 5, 6, 9

 [2] Sanjeev Arora and Boaz Barak: Computational Complexity – A Modern Approach. Cam-
     bridge Univ. Press, 2009. [doi:10.1017/CBO9780511804090] 6

 [3] Richard Beigel: The polynomial method in circuit complexity. In Proc. 8th IEEE Conf.
     on Structure in Complexity Theory (SCT'93), pp. 82–95. IEEE Comp. Soc. Press, 1993.
     [doi:10.1109/SCT.1993.336538] 7

 [4] Eric Blais and Li-Yang Tan: Approximating Boolean functions with depth-2 circuits. SIAM J.
     Comput., 44(6):1583–1600, 2015. Preliminary version in CCC'13. [doi:10.1137/14097402X] 6

 [5] Prahladh Harsha and Srikanth Srinivasan: On polynomial approximations to AC0.
     Random Structures Algorithms, 54(2):289–303, 2019. Preliminary version in RANDOM'16.
     [doi:10.1002/rsa.20786, arXiv:1604.08121] 18

 [6] Johan Håstad: Almost optimal lower bounds for small depth circuits. In Proc. 18th STOC, pp.
     6–20. ACM Press, 1986. [doi:10.1145/12130.12132] 2, 5


 [7] Shlomo Hoory, Avner Magen, and Toniann Pitassi: Monotone circuits for the ma-
     jority function. In Proc. Internat. Workshop on Approximation, Randomization, and Combina-
     torial Optimization. Algorithms and Techniques (RANDOM'06), pp. 410–425. Springer, 2006.
     [doi:10.1007/11830924_38] 6, 9

 [8] Stasys Jukna: Boolean Function Complexity: Advances and Frontiers. Springer, 2012.
     [doi:10.1007/978-3-642-24508-4] 6

 [9] Mauricio Karchmer and Avi Wigderson: Monotone circuits for connectivity require super-
     logarithmic depth. SIAM J. Discrete Math., 3(2):255–265, 1990. Preliminary version in STOC'88.
     [doi:10.1137/0403021] 2

[10] Maria M. Klawe, Wolfgang J. Paul, Nicholas Pippenger, and Mihalis Yannakakis:
     On monotone formulae with restricted depth. In Proc. 16th STOC, pp. 480–487. ACM Press, 1984.
     [doi:10.1145/800057.808717] 5

[11] Swastik Kopparty and Srikanth Srinivasan: Certifying polynomials for AC0[⊕] circuits,
     with applications to lower bounds and circuit compression. Theory of Computing, 14(12):1–24,
     2018. Preliminary version in FSTTCS'12. [doi:10.4086/toc.2018.v014a012] 4, 7

[12] Ryan O'Donnell and Karl Wimmer: Approximation by DNF: Examples and counterexam-
     ples. In Proc. 34th Internat. Colloq. on Automata, Languages and Programming (ICALP'07), pp.
     195–206. Springer, 2007. [doi:10.1007/978-3-540-73420-8_19] 5, 6, 9

[13] Alexander Alexandrovich Razborov: Lower bounds on the size of bounded depth circuits
     over a complete basis with logical addition. Mathematical Notes of the Academy of Sciences
     of the USSR, 41(4):333–338, 1987. Russian original in Mat. Zametki 41(4), 1987, 598–607.
     [doi:10.1007/BF01137685] 2, 4, 7

[14] Benjamin Rossman: The average sensitivity of bounded-depth formulas. Comput. Complex-
     ity, 27(2):209–223, 2018. Preliminary version in FOCS'15. [doi:10.1007/s00037-017-0156-0,
     arXiv:1508.07677] 2, 18

[15] Petr Savický and Alan R. Woods: The number of Boolean functions computed by formulas
     of a given size. Random Structures Algorithms, 13(3–4):349–382, 2000. [doi:10.1002/(SICI)1098-
     2418(199810/12)13:3/4<349::AID-RSA9>3.0.CO;2-V]

[16] Roman Smolensky: Algebraic methods in the theory of lower bounds for Boolean circuit
     complexity. In Proc. 19th STOC, pp. 77–82. ACM Press, 1987. [doi:10.1145/28395.28404] 2, 4, 7,
     18

[17] Roman Smolensky: On representations by low-degree polynomials. In Proc. 34th FOCS, pp.
     130–138. IEEE Comp. Soc. Press, 1993. [doi:10.1109/SFCS.1993.366874] 4, 7

[18] Mario Szegedy: Algebraic Methods in Lower Bounds for Computational Models with Limited
     Communication. Ph. D. thesis, University of Chicago, 1989. Available at advisor's home page. 4, 7


[19] Leslie G. Valiant: Short monotone formulae for the majority function. J. Algorithms, 5(3):363–
     366, 1984. [doi:10.1016/0196-6774(84)90016-6] 5, 9


AUTHORS

      Benjamin Rossman
      Assistant professor
      Department of Mathematics and
      Department of Computer Science
      University of Toronto
      Toronto, ON, Canada
      ben Rossman utoronto ca
      http://math.toronto.edu/~Rossman


      Srikanth Srinivasan
      Assistant professor
      Department of Mathematics
      IIT Bombay
      Powai, Mumbai, India
      srikanth math iitb ac in
      http://www.math.iitb.ac.in/~srikanth/


ABOUT THE AUTHORS

      Benjamin Rossman found his way to complexity theory after studying mathematical logic
         as an undergraduate at the University of Pennsylvania and subsequently as an intern with
         Yuri Gurevich at Microsoft Research. He received his Ph. D. from the Massachusetts
         Institute of Technology in 2010, under the supervision of Madhu Sudan and under the
         distraction of the ballroom dance team.


      Srikanth Srinivasan got his undergraduate degree from the Indian Institute of Technol-
          ogy Madras, where his interest in the theory side of CS was piqued under the tutelage of
          N. S. Narayanswamy. Subsequently, he obtained his Ph. D. from The Institute of Mathe-
          matical Sciences in 2011, where his advisor was V. Arvind. His research interests span
          all of TCS (in theory), but in practice are limited to circuit complexity, derandomization,
          and related areas of mathematics. He looks back fondly at the days when he enjoyed
          running and pretending to play badminton.



