The Large-Error Approximate Degree of AC$^0$

Authors Mark Bun, Justin Thaler,
Plaintext
                           T HEORY OF C OMPUTING, Volume 17 (7), 2021, pp. 1–46
                                        www.theoryofcomputing.org

                          S PECIAL ISSUE : APPROX-RANDOM 2019



The Large-Error Approximate Degree of AC0
                                      Mark Bun∗                     Justin Thaler†
              Received January 24, 2020; Revised August 18, 2020; Published September 23, 2021




       Abstract. We prove two new results about the inability of low-degree polynomials to
       uniformly approximate constant-depth circuits, even to slightly-better-than-trivial error. First,
       we prove a tight Ω̃(n1/2 ) lower bound on the threshold degree of the SURJECTIVITY
       function on n variables. This matches the best known threshold degree bound for any AC0
       function, previously exhibited by a much more complicated circuit of larger depth (Sherstov,
                                                     1/2
       FOCS’15). Our result also extends to a 2Ω̃(n ) lower bound on the sign-rank of an AC0
                                                               2/5
       function, improving on the previous best bound of 2Ω(n ) (Bun and Thaler, ICALP’16).
           Second, for any δ > 0, we exhibit a function f : {−1, 1}n → {−1, 1} that is computed
       by a circuit of depth O(1/δ ) and is hard to approximate by polynomials in the following
                                                                          1−δ
       sense: f cannot be uniformly approximated to error ε = 1 − 2−Ω(n ) , even by polynomials
       of degree n1−δ . In our FOCS’17 paper we proved a similar lower bound, but which held
       only for error ε = 1/3.
                                   1−δ
           Our result implies 2Ω(n ) lower bounds on the complexity of AC0 under a variety of
       measures, including discrepancy, margin complexity, and threshold weight, that are central
     An extended abstract of this paper appeared in the Proceedings of the 23rd International Conference on Randomization and
Computation, 2019 [17].
   ∗ Supported by NSF Grant CCF-1947889. This work was done while the author was at Princeton University, and at the

Simons Institute for the Theory of Computing supported by a Google Research Fellowship.
   † Supported by NSF Grant CCF-1845125.



ACM Classification: F.1.2, F.1.3
AMS Classification: 68Q15, 68Q17, 68Q32
Key words and phrases: approximate degree, polynomial approximation, polynomial threshold, poly-
nomial method, dual polynomials, surjectivity, discrepancy


© 2021 Mark Bun and Justin Thaler
c b Licensed under a Creative Commons Attribution License (CC-BY)                              DOI: 10.4086/toc.2021.v017a007
                                             M ARK B UN AND J USTIN T HALER

        to communication complexity and learning theory. This nearly matches the trivial upper
        bound of 2O(n) that holds for every function. The previous best lower bound on AC0 for
                               1/2
        these measures was 2Ω(n ) (Sherstov, FOCS’15). Additional applications in learning theory,
        communication complexity, and cryptography are described.


1     Introduction
The threshold degree of a Boolean function f : {−1, 1}n → {−1, 1}, denoted deg± ( f ), is the least degree
of a real polynomial p that sign-represents f , i. e., p(x) · f (x) > 0 for all x ∈ {−1, 1}n . A closely related
notion is the ε-approximate degree of f , denoted deg f ( f ), which is the least degree of a real polynomial
                                                         ε
p such that |p(x) − f (x)| ≤ ε for all x ∈ {−1, 1} .n

    The parameter setting ε = 1 is a degenerate case: deg       f ( f ) = 0 because the constant 0 function
                                                                   1
approximates any Boolean f to error ε = 1. However, as soon as ε is strictly less than 1, ε-approximate
degree is a highly non-trivial notion with a rich mathematical theory. In particular, it is easily seen that

                                                  deg± ( f ) = lim deg
                                                                   f ( f ).
                                                                       ε
                                                                 ε%1

In other words, threshold degree is equivalent to the notion of ε-approximate degree when ε is permitted
to be arbitrarily close to (but strictly less than) 1.1
    In this paper, we are concerned with proving lower bounds on the ε-approximate degree when either:

     • ε is arbitrarily close to 1, or
                                                                  1−δ
     • ε is exponentially close to 1 (i. e., ε = 1 − 2−n                for some constant δ > 0).

The former parameter regime captures threshold degree, while we refer to the latter as large-error
approximate degree. While the approximate and threshold degree of a function f capture simple
statements about its approximability by polynomials, these quantities relate intimately to the complexity
of computing f in concrete computational models. Specifically, the query complexity models UPPdt and
PPdt , and the communication models UPPcc , PPcc , are all defined (see Section 2) as natural analogs of the
Turing machine class PP, which in turn captures probabilistic computation with arbitrarily small advantage
over random guessing. Briefly, both UPPdt and PPdt capture the minimum number of queries required to
evaluate a Boolean function with probability at least 1/2 on every input; UPPcc , PPcc do the same for two-
party communication problems. It is known that the threshold degree of f is equivalent to its complexity
UPPdt ( f ), while the logarithm of a fundamental matrix-analytic analog of threshold degree known as sign-
rank characterizes UPPcc . Similarly, large-error approximate degree characterizes the query complexity
measure PPdt , in the following sense: for any d > 0, deg   f                            dt
                                                               1−2−d ( f ) ≥ Ω(d) ⇐⇒ PP ( f ) ≥ Ω(d). As
described in Section 2, PPdt is also closely related to the notion of threshold weight. Section 2 elaborates
on these models and their many applications in learning theory, circuit complexity, and cryptography.
    1 It is known that for any d > 0, there are functions of threshold degree d that cannot be approximated by degree d polynomials
                               d
to error better than 1 − 2−Ω̃(n ) [37], and this bound is tight [11]. Hence, threshold degree is also equivalent to the notion of
ε-approximate degree for some value of ε that is doubly-exponentially close to 1.


                             T HEORY OF C OMPUTING, Volume 17 (7), 2021, pp. 1–46                                                2
                              T HE L ARGE -E RROR A PPROXIMATE D EGREE OF AC0

Our results in a nutshell. We prove two results about the threshold degree and large-error approximate
degree of functions in AC0 .2
    First, we prove a tight Ω̃(n1/2 ) lower bound on the threshold degree (i. e., UPPdt complexity) of a
natural function called SURJECTIVITY, which is computed by a depth-three circuit with logarithmic
bottom fan-in. This matches the previous best threshold degree lower bound for any AC0 function, due to
Sherstov [44]. Our analysis is much simpler than Sherstov’s, which takes up the bulk of a (70+)-page
manuscript [44]. An additional advantage of our analysis is that our lower bound on the threshold
degree of SURJECTIVITY “lifts” to give a lower bound for the communication analog UPPcc as well. In
particular, we obtain an Ω(n1/2 ) UPPcc lower bound for a related AC0 function; this improves over the
previous best UPPcc lower bound for AC0 , of Ω(n2/5 ) [15].
    Second, we give nearly optimal bounds on the large-error approximate degree (and hence, PPdt
complexity) of AC0 . For any constant δ > 0, we show that there is an AC0 function with ε-approximate
                                        1−δ
degree Ω(n1−δ ), where ε = 1 − 2−Ω(n ) . This result lifts to an analogous PPcc lower bound.
    To summarize our results succinctly:

   • We prove an Ω̃(n1/2 ) lower bound on the UPP complexity of SURJECTIVITY in the query setting,
     and of a related AC0 function in the communication setting.

   • We prove an Ω(n1−δ ) lower bound on the PP complexity of some AC0 circuit of depth O(1/δ ), in
     both the query and communication settings.

   Table 1 compares our new lower bounds for AC0 to the long line of prior work with similar goals.
Context and prior work. The study of both large-error approximate degree and threshold degree has
led to many breakthrough results in theoretical computer science, especially in the algorithmic and
complexity-theoretic study of constant-depth circuits. For example, threshold degree upper bounds are
at the core of many of the fastest known PAC learning algorithms. This includes the notorious case of
polynomial-size CNF formulas on n variables, for which the fastest known algorithm [27] runs in time
exp(Õ(n1/3 )) owing to an Õ(n1/3 ) upper bound on the threshold degree of any such formula. This upper
bound is tight, matching a classic Ω(n1/3 ) lower bound of Minsky and Papert [33] for the following
read-once CNF: ANDn1/3 ◦ ORn2/3 (here, we use subscripts to clarify the number of inputs on which a
function is defined).
     In complexity theory, breakthrough results of Sherstov [39, 40] and Buhrman et al. [11] used lower
bounds on large-error approximate degree to show that there are AC0 functions with polynomial PPcc
complexity. One notable implication of these results is that Allender’s [3] classic simulation of AC0
functions by depth-three majority circuits is optimal. (This resolved an open problem of Krause and
Pudlák [30].) A subsequent, related breakthrough of Razborov and Sherstov [38] used Minsky and
Papert’s lower bound on the threshold degree of ANDn1/3 ◦ ORn2/3 to prove the first polynomial UPPcc
lower bound for a function in AC0 , answering an old open question of Babai et al. [5].
   These breakthrough lower bounds raised the intriguing possibility that AC0 functions could be
maximally hard for the UPPcc and PPcc communication models, as well as for related complexity
measures. Nevertheless, the quantitative parameters achieved in these papers are far from actually
  2 AC0 is the non-uniform class of sequences of functions computed by polynomial-size Boolean circuits of constant depth.



                          T HEORY OF C OMPUTING, Volume 17 (7), 2021, pp. 1–46                                          3
                                        M ARK B UN AND J USTIN T HALER

showing that this is the case. Indeed, the following basic questions about the complexity of AC0 remain
open.
Open Problem 1. Is there an AC0 function F : {−1, 1}n×n → {−1, 1} with UPPcc complexity Ω(n)?
Open Problem 2. Is there an AC0 function F : {−1, 1}n×n → {−1, 1} with PPcc complexity Ω(n)?
    An affirmative answer to either question would be tight: Every function F : {−1, 1}n × {−1, 1}n →
{−1, 1} has UPPcc and PPcc complexity at most n. Obtaining an affirmative answer to Open Problem 1
is harder than for Open Problem 2, since UPPcc ( f ) ≤ PPcc ( f ) for all f .
    Guided by these open problems, a sequence of papers has established quantitatively stronger and
more general lower bounds for AC0 functions [13–16, 18, 43, 44]. In addition to making partial progress
toward resolving these questions, the techniques developed in this body of work have found fruitful
applications in new domains. For example, Bouland et al. [9] built on techniques from some of these
papers [13–15, 43] to resolve several old open questions about the relativized power of statistical zero
knowledge proofs and their variants. As another example, our recent papers [12, 18] built on the same
line of work to resolve or nearly resolve a number of longstanding open questions in quantum query
complexity. Finally, large-error and threshold degree lower bounds on AC0 functions have recently
proved instrumental in the development of cryptographic secret-sharing schemes with reconstruction
procedures in AC0 [7, 8, 19]. We thus believe that the new techniques developed in this article will find
further applications, perhaps in unexpected areas.
    Prior to our work, the best known result toward a resolution of Open Problem 1 was an Ω(n2/5 ) lower
bound on UPPcc complexity of an AC0 function [15], while the best known result toward Open Problem
2 was an Ω(n1/2 ) bound on the PPcc complexity of a very complicated AC0 circuit [44].

1.1     Our results in detail
1.1.1    Resolving the threshold degree of SURJECTIVITY
Surjectivity and its history. Let R be a power of 2 and n = N log R. The function SURJECTIVITYR,N
(SURJR,N for short) is defined as follows. Given an input in {−1, 1}n , SURJR,N interprets the input as
a list of N numbers (s1 , . . . , sN ) from a range [R] := {1, . . . , R}, and evaluates to −1 if and only if every
element of the range [R] appears at least once in the list.3 SURJR,N is computed by an AC0 circuit of
depth three and logarithmic bottom fan-in, since it is equivalent to the ANDR (over all range items r ∈ [R])
of the ORN (over all inputs i ∈ [N]) of “Is input si equal to r?”, where the quoted question is computed by
a conjunction of width log R over the input bits.
     SURJR,N has been studied extensively in the contexts of quantum query complexity and approximate
degree. Beame and Machmouchi [6] showed that computing SURJR,N for R = N/2 + 1 requires Ω̃(n)
quantum queries, making it the only known AC0 function with linear quantum query complexity. Mean-
while, the (1/3)-approximate degree of SURJR,N was recently shown to be Θ̃(R1/4 · N 1/2 ). The lower
bound is from our prior work [12], while the upper bound was shown by Sherstov [45], with a different
proof given in [12]. In particular, when R = N/2, deg    f (SURJR,N ) = Θ̃(N 3/4 ). Our prior work [12, 18]
                                                            1/3
built directly on the approximate-degree lower bound for SURJR,N to give near-optimal lower bounds on
the (1/3)-approximate degree of AC0 (see Section 3.3 for details).
   3 As is standard, we associate −1 with logical TRUE and +1 with logical FALSE throughout.



                          T HEORY OF C OMPUTING, Volume 17 (7), 2021, pp. 1–46                                   4
                           T HE L ARGE -E RROR A PPROXIMATE D EGREE OF AC0

Our result. In spite of the progress described above, the threshold degree SURJR,N remained open.
For R < N/2, an upper bound of Õ(min{R, N 1/2 }) follows from standard techniques (we prove this in
Section 6 for completeness). The best known lower bound was Ω(min{R, N 1/3 }), obtained by a reduction
to Minsky and Papert’s threshold degree lower bound for ANDn1/3 ◦ ORn2/3 . In this work, we settle the
threshold degree of SURJR,N , showing that the known upper bound is tight up to logarithmic factors.

Theorem 1.1. For R < N/2, the threshold degree of SURJR,N is Θ̃(min{R, N 1/2 }). In particular, if
R = N 1/2 , deg± (SURJR,N ) = Θ̃(N 1/2 ).

    In addition to resolving a natural question in its own right, Theorem 1.1 matches the best prior
threshold degree lower bound for AC0 , previously proved in [44] for a much more complicated function
computed by a circuit of strictly greater depth. Furthermore, with some extra effort, our lower bound for
SURJR,N extends to give an Ω̃(n1/2 ) lower bound on the UPPcc complexity of a related AC0 function,
yielding progress on Open Question 1 (see Section 1). In contrast, Sherstov’s Ω(n1/2 ) threshold degree
lower bound for AC0 [44] is not known to extend to UPPcc complexity. As stated in Section 1, the best
previous UPPcc lower bound for an AC0 function was Ω(n2/5 ).

Corollary 1.2. There is an AC0 function F : {−1, 1}n×n → {−1, 1} such that UPPcc (F) ≥ Ω̃(n1/2 ).

1.1.2   AC0 has nearly maximal PPcc complexity
In our second result, for any constant δ > 0, we exhibit an AC0 function f : {−1, 1}n → {−1, 1} with
 f ( f ) = Ω(n1−δ ) for some ε = 1−2−Ω(n1−δ ) . This is a major strengthening of our earlier results [12,18],
deg ε
which gave a similar result for ε = 1/3. By combining this large-error approximate-degree lower bound
with a “query-to-communication lifting theorem” for PP [40], we obtain an Ω(n1−δ ) bound on the PPcc
complexity of an AC0 function, nearly resolving Open Question 2 from the previous section.

Theorem 1.3. For any constant δ > 0, there is an AC0 function F : {−1, 1}n×n → {−1, 1} with PPcc (F) =
Ω(n1−δ ).

    The best previous lower bound for the PPcc complexity of an AC0 function was Ω(n1/2 ) [44].


2    Algorithmic and complexity-theoretic applications
To introduce the applications of our results, we begin by defining the query complexity quantities UPPdt
and PPdt and the communication complexity quantities UPPcc and PPcc .
Query models. In randomized query complexity, an algorithm aims to evaluate a known Boolean function
f on an unknown input x ∈ {−1, 1}n by reading as few bits of x as possible. We say that the query cost of
a randomized algorithm is the maximum number of bits it queries for any input x.

    • UPPdt considers “unbounded error” randomized algorithms, which means that on any input x, the
      algorithm outputs f (x) with probability strictly greater than 1/2. UPPdt ( f ) is the minimum query
      cost of any unbounded-error algorithm for f .

                        T HEORY OF C OMPUTING, Volume 17 (7), 2021, pp. 1–46                               5
                                    M ARK B UN AND J USTIN T HALER

   Reference             PPdt                   PPcc               UPPdt             UPPcc       Circuit
              and log(threshold weight) ≈ log(1/discrepancy) = threshold degree ≈ log(sign-rank) Depth
      [33]                —                      —                Ω(n1/3 )             —            2
      [30]                 1/3
                       Ω(n )                     —                   —                 —            3
      [21]                —                  Ω(logk (n))             —             Ω(logk (n))    O(k)
      [35]          Ω(n1/3 logk n)               —            Ω(n1/3 logk (n))         —          O(k)
      [39]                —                       1/5
                                              Ω(n )                  —                 —            3
    [11, 40]              —                   Ω(n1/3 )               —                 —            3
      [38]                —                      —                   —              Ω(n1/3 )        3
      [14]             Ω(n2/5 )               Ω(n2/5 )               —                 —            3
      [43]           Ω(n  1/2−δ  )           Ω(n 1/2−δ  )        Ω(n 1/2−δ  )          —         O(1/δ )
      [44]             Ω(n3/7 )                  —                Ω(n3/7 )             —            3
      [44]             Ω(n1/2 )               Ω(n1/2 )            Ω(n1/2 )             —            4
      [15]                —                      —                   —                  2/5
                                                                                    Ω(n )           3
      [16]           Ω(n1/2−δ )              Ω(n1/2−δ )              —                 —            3
   This paper          Ω̃(n1/2 )              Ω̃(n1/2 )           Ω̃(n1/2 )            —            3
   This paper             —                      —                   —              Ω̃(n1/2 )       7
   This paper         Ω(n )1−δ                    1−δ
                                              Ω(n )                  —                 —         O(1/δ )

Table 1: Comparison of our new bounds for AC0 to prior work in roughly chronological order. The circuit
depth column lists the depth of the Boolean circuit used to exhibit the bound, δ denotes an arbitrarily
small positive constant, and k an arbitrary positive integer. All Boolean circuits are of polynomial size.

   • PPdt ( f ) captures “large” (rather than unbounded) error algorithms. If a randomized query algorithm
     outputs f (x) with probability 1/2 + β for all x, then the PP-cost of the algorithm is the sum of the
     query cost and log(1/β ). The log(1/β ) term here is a proxy for the number of bits of randomness
     used by the algorithm. Hence, the PP-cost of a query algorithm is a lower bound on the runtime,
     assuming both a query and a random coin toss cost at least one time step. PPdt ( f ) is the minimum
     PP-cost of any randomized query algorithm for f .

Communication models. UPPcc and PPcc consider the standard two-party setup where Alice holds an
input x and Bob holds an input y, and they run a private-coin randomized communication protocol to
compute a function f (x, y), while minimizing the number of bits they exchange. In direct analogy to the
query complexity measures above, we say that the communication cost of a randomized protocol is the
maximum number of bits Alice and Bob exchange on any input (x, y).

   • UPPcc ( f ) [36] is the minimum communication cost of any randomized protocol that outputs f (x, y)
     with probability strictly greater than 1/2 on all inputs (x, y).
   • PPcc ( f ) [5] is the minimum PP-cost of a protocol for f , where the PP-cost of a protocol that
     outputs f (x, y) with probability 1/2 + β for all (x, y) is the sum of the communication cost and
     log(1/β ).

   We now give an overview of the applications of Theorem 1.3 and Corollary 1.2. Further technical
background and details are given in Section 9.

                       T HEORY OF C OMPUTING, Volume 17 (7), 2021, pp. 1–46                                6
                               T HE L ARGE -E RROR A PPROXIMATE D EGREE OF AC0

2.1     Applications of Theorem 1.3
PPcc is known to be equivalent to two measures of central importance in learning theory and communi-
cation complexity, namely margin complexity [32] and discrepancy [26]. Hence, Theorem 1.3 implies
that AC0 has nearly maximal complexity under both measures. Below, we highlight four additional
applications.
      • Communication Complexity. The PPcc communication model can efficiently simulate almost
        every two-party communication model, including P (i. e., deterministic communication), BPP
        (randomized communication), BQP (quantum), and PNP . The only well-studied exceptions are
        UPPcc , and communication analogs of the polynomial hierarchy (the latter of which we do not know
        how to prove lower bounds against). Hence, in showing that AC0 has essentially maximal PPcc
        complexity, we subsume or nearly subsume all previous results on the communication complexity
        of AC0 .
      • Cryptography. Bogdanov et al. [7] observed that for any f : {−1, 1}n → {−1, 1} and d > 0, if
                         f ( f ) ≥ d, then one obtains a scheme for sharing a secret bit b ∈ {−1, 1} among
        one shows that deg   ε
        n parties such that any subset of d shares provides no reconstruction advantage, yet applying f to all
        n shares yields b with probability at least 1/2 + ε/2. They combined this with known approximate-
        degree lower bounds for AC0 functions to get secret sharing schemes with reconstruction procedures
        in AC0 . Via this connection, an immediate corollary of Theorem 1.3 is a nearly optimal secret
        sharing scheme in AC0 : for any desired constant δ > 0, any subset of n1−δ shares provides no
        reconstruction advantage, yet all n shares can be successfully reconstructed (by applying an AC0
                                             1−δ
        function) with probability 1 − 2−n .
      • Learning Theory. Valiant [48] introduced the evolvability model to quantify how complex
        functions can emerge through an evolutionary process over realistic population sizes within
        realistic time periods. Feldman [20] showed that the “weak evolvability” of a class of functions
        F = {φ1 , . . . , φ|F| } is characterized by the PPcc complexity of the function F(x, y) = φx (y). Hence,
        a consequence of Theorem 1.3 is that there are AC0 functions that are nearly maximally hard to
                                                                                                 1−δ
        evolve. That is, for any constant δ > 0, there are AC0 functions that require either 2n generations,
                                       1−δ
        or populations of size 2n to evolve, even if one only wants evolution to produce a function that
                                     1−δ
        has advantage just 2−n over random guessing. See [20] for additional details.
      • Circuit Complexity. If PPcc ( f ) ≥ d, then f is not computable by Majority-of-Threshold circuits
        of size 2Ω(d) [34]. Hence, by showing that AC0 has nearly maximal PPcc complexity, we show that
                                                                                                  1−δ
        there are AC0 functions that are not computed by Majority-of-Threshold circuits of size 2n . That
        is, AC0 has essentially no non-trivial simulation by Majority-of-Threshold circuits (in contrast,
        AC0 can be efficiently simulated by depth-three Majority circuits [3]).

2.2     Applications of Corollary 1.2
As indicated in Section 1, UPPcc (F) is known to be characterized by (the logarithm of) of the sign-rank
of the matrix [F(x, y)]x,y∈{−1,1}n×n [36].4 Hence, Corollary 1.2 implies an exp(Ω̃(n1/2 )) lower bound on
  4 The sign-rank of a matrix M with entries in {±1} is the least rank of a real matrix M 0 that agrees in sign with M entry-wise.



                           T HEORY OF C OMPUTING, Volume 17 (7), 2021, pp. 1–46                                                 7
                                     M ARK B UN AND J USTIN T HALER

the sign-rank of AC0 function. Below, we highlight two additional applications of Corollary 1.2, based on
the following connections between communication complexity, circuit complexity, and learning theory.
    In communication complexity, UPPcc is the most powerful two-party model against which we know
how to prove lower bounds. In circuit complexity, if UPPcc ( f ) ≥ d, then f cannot be computed by
Threshold-of-Majority circuits of size 2Ω(d) [22]. (Threshold-of-Majority circuits represent the most
powerful class of threshold circuits against which we can prove superpolynomial lower bounds.) In
learning theory, it is commonly assumed that data can be classified by a halfspace in many dimensions;
the UPPcc -complexity of a concept class precisely captures how many dimensions are needed. To connect
this to a previously mentioned example, Klivans and Servedio [27] observed that an upper bound of d
on the UPPcc complexity of a concept class C yields a PAC learning for C running in time 2O(d) . They
                                1/3
used this result to give a 2Õ(n ) -time algorithm for PAC-learning CNFs. This remains the state-of-the-art
algorithm for this fundamental problem. Accordingly, Corollary 1.2 has the following implications.

      • Circuit Complexity. There are AC0 functions that are not computable by Threshold-of-Majority
                              1/2
        Circuits of size 2Ω̃(n ) .
                                                                                                          1/2 )
      • Learning Theory. UPPcc -based learning algorithms cannot learn AC0 in time better than 2Ω̃(n              .


3      Techniques
3.1     The SURJECTIVITY Lower Bound
For a function fn , let f ≤N denote the partial function obtained by restricting f to the domain of inputs of
Hamming weight at most N. The ε-approximate degree of f ≤N , denoted deg       f ( f ≤N ), is the least degree
                                                                                  ε
of a real polynomial p such that

                     |p(x) − f (x)| ≤ ε for all inputs x of Hamming weight at most N.                   (3.1)

Note that Property (3.1) allows p to behave arbitrarily on inputs x of Hamming weight more than N.
Similarly, the threshold degree of f ≤N is the least degree of a real polynomial p such that

                      p(x) · f (x) > 0 for all inputs x of Hamming weight at most N.

Our prior work [18] showed that the ε-approximate (respectively, threshold) degree of SURJR,N is
equivalent to the ε-approximate (respectively, threshold) degree of (ANDR ◦ ORN )≤N . Hence, the main
technical result underpinning our threshold-degree lower bound for SURJ is the following theorem about
the threshold degree of (ANDR ◦ ORN )≤N (we have made no effort to optimize the logarithmic factors).
                                                           
Theorem 3.1. Let R = N 1/2 . Then deg± (ANDR ◦ ORN )≤N = Ω(N 1/2 / log3/2 N).

Discussion. Theorem 3.1 is a substantial strengthening of the classic result of Minsky and Papert [33]
mentioned above, which established that the total function MPN 1/2 ,N := ANDN 1/2 ◦ ORN on n = N 3/2
inputs has threshold degree Ω(N 1/2 ). Theorem 3.1 establishes that Minsky and Papert’s lower bound
holds even under the promise that the n-bit input to MPN 1/2 ,N has Hamming weight at most N = n2/3 .

                        T HEORY OF C OMPUTING, Volume 17 (7), 2021, pp. 1–46                                 8
                            T HE L ARGE -E RROR A PPROXIMATE D EGREE OF AC0

That is, any polynomial that sign-represents ANDn1/3 ◦ ORn2/3 on inputs of Hamming weight at most n2/3
has degree Ω̃(n1/3 ), even when p is allowed to behave arbitrarily on inputs of Hamming weight larger
than n2/3 .
Proof overview for Theorem 3.1 and comparison to prior work. Like much recent work on approxi-
mate-degree and threshold-degree lower bounds, our proof makes use of dual polynomials. A dual
polynomial is a dual solution to a certain linear program capturing the approximate or threshold degree of
any function, and acts as a certificate of the high approximate or threshold degree of the function.
    A dual polynomial that witnesses the fact that deg± ( fM ) ≥ d is a function ψ : {−1, 1}M → {−1, 1}
satisfying three conditions:

    • ψ(x) · f (x) ≥ 0 for all x ∈ {−1, 1}M . If ψ satisfies this condition, we say ψ agrees in sign with f .

    • ∑x∈{−1,1}M |ψ(x)| = 1. If ψ satisfies this condition, it is said to have `1 -norm equal to 1.

    • For all polynomials p : {−1, 1}M → R of degree at most d, ∑x∈{−1,1}M p(x)·ψ(x) = 0. If ψ satisfies
      this condition, it is said to have pure high degree at least d.
                                 f ( fM ) ≥ d is similar, except that the first condition is replaced with:
A dual witness for the fact that degε

    • ∑x∈{−1,1}M ψ(x) · f (x) > ε. If ψ satisfies this condition, it is said to be ε-correlated with f . If
      ψ(x) · f (x) < 0, we say that ψ makes an error at x.

  Sherstov [44] reproved Minsky and Papert’s result by constructing an explicit dual witness for
MPN 1/2 ,N , via a two-step process. First, Sherstov started with a dual witness ψbase for the fact that

                              f (MP 1/2 ) = Ω(N 1/2 ), for ε = 1 − 2−N 1/2 .
                              degε N ,N

The function ψbase was introduced in our prior work [14], where it was constructed by combining a dual
witness for ANDN 1/2 with a dual witness for ORN via a technique called dual block composition [31,42,47]
(see Section 5.8 for details of this very important technique for combining dual witnesses).
    Unfortunately, ψbase falls short of witnessing Minsky and Papert’s threshold-degree lower bound
because it makes errors on some inputs. In the second step of Sherstov’s construction [44], he adds in a
correction term that zeros out the errors of ψbase , without disturbing the sign of ψbase on any other inputs,
and without lowering its pure high degree.
    Theorem 3.1 asserts that MP≤N  N 1/2 ,N
                                            satisfies the same threshold-degree lower bound as MPN 1/2 ,N itself.
To prove Theorem 3.1, we need to construct a dual witness ψ that not only reproves Minsky and Papert’s
classic lower bound for MPN 1/2 ,N , but also satisfies the extra condition that:

                        ψ(x) = 0 for all inputs x of Hamming weight more than N .                          (3.2)

(Here, the Hamming weight of x is the number of −1 entries.) To accomplish this, we apply a novel
strategy that can be thought of as a three-step process. First, like Sherstov, we start with ψbase . Second,
we modify ψbase to obtain a dual witness ψbase0   that places significant mass on all inputs of Hamming

                         T HEORY OF C OMPUTING, Volume 17 (7), 2021, pp. 1–46                                  9
                                     M ARK B UN AND J USTIN T HALER

                                                                           0
weight at most d, for some d = Ω̃(N 1/2 ) (details of the construction of ψbase are described two paragraphs
                                              0
hence). More specifically, we ensure that ψbase satisfies:

                      0
                    |ψbase (x)|  n−d for all inputs x of Hamming weight at most d .                    (3.3)

We refer to this property by saying that ψbase0   is “smooth” or “large” on all inputs of Hamming weight at
most d. Note that, in modifying ψbase to obtain ψbase  0  , we do not correct the errors that ψbase makes, nor
                       0
do we ensure that ψbase is supported on inputs of Hamming weight at most N.
    Third, we add in a correction term, very different than Sherstov’s correction term, that not only zeros
                     0
out the errors of ψbase  , but also zeros out any mass it places on inputs of Hamming weight more than N.
While the general technique we use to construct this correction term appeared in our prior work [12, 18],
the novelty in our construction and analysis is two-fold. First, the technique was used in our prior work
only to zero out mass placed on inputs of Hamming weight more than N (i. e., to ensure that Equation (3.2)
is satisfied), not to correct errors. Second, and more importantly, we crucially exploit the largeness of
  0
ψbase  on inputs of Hamming weight at most d to ensure that the correction term does not disturb the sign
     0
of ψbase on any inputs other than those on which it is deliberately being zeroed out. This is what enables
us to obtain a threshold-degree lower bound, whereas our in our prior work we were only able to obtain
ε-approximate-degree lower bounds for ε bounded away from 1.

    Our “smoothing followed by correction” approach appears to be significantly more generic than the
correction technique of [44]. For example, prior work of Bouland et al. [9] proved an Ω(n1/4 ) lower
bound on the threshold degree of a certain function denoted GAPMAJn1/4 ◦ PTPn3/4 , and used this result
to give an oracle separating the complexity classes SZK and UPP, thereby answering an open question of
Watrous from 2002. Our techniques can be used to give a much simpler proof of this result of this lower
bound on threshold degree. We expect that our technique will find additional applications in the future.

Details of the smoothing step. As stated above, the dual witness ψbase from our prior work does not
have the property we need (see Equation (3.3)) of being “large” on all inputs of Hamming weight at most
d = Ω̃(N 1/2 ).
     Fortunately, we observe that although ψbase is not large on all inputs of Hamming weight at most
d, it is large on one very special input of low Hamming weight, namely the ALL-FALSE input. That is,
ψbase (1) ≥ 2−d . So we just need a way to “bootstrap” this largeness property on 1 to a largeness property
on all inputs of Hamming weight at most d. Put another way, we need to be able to treat other inputs of
Hamming weight at most d as if they actually have Hamming weight 0. But MPN 1/2 ,N := ANDN 1/2 ◦ ORN
has a property that enables precisely this: we can fix the inputs to any constant fraction c of the
OR gates to an arbitrary value in OR−1 (−1), and the remaining function of the unrestricted inputs is
AND(1−c)·R ◦ ORN . This is “almost” the same function as ANDR ◦ ORN ; we have merely slightly reduced
the top fan-in, which does not substantially lower the threshold degree of the resulting function.
     We exploit the above observation to achieve the following: for each input x of Hamming weight at
most d, we build a dual witness νx targeted at x (i. e., that essentially treats x as if it is the ALL-FALSE
input). We do this as follows. Let T be the set of all OR gates that are fed one or more −1s by x, and let
S ⊆ [N 1/2 · N] be the union of the inputs to each of the OR gates in T . Let ψbase be the dual witness for

                        T HEORY OF C OMPUTING, Volume 17 (7), 2021, pp. 1–46                               10
                             T HE L ARGE -E RROR A PPROXIMATE D EGREE OF AC0

ANDN 1/2 −|T | ◦ ORN given in our earlier paper [14]. We let
                                                (
                                                 ψbase (yS̄ )    if yS = xS
                                       νx (y) =
                                                 0               otherwise,

where yS̄ denotes the set of all the coordinates of y other than those in S.
   The dual witness ψbase 0   is then defined to be the average of the νx ’s, over all inputs x of Hamming
weight at most d. This averaged dual witness ψbase 0    has all of the same useful properties as ψbase , and
additionally satisfies the key requirement captured by Equation (3.3).


3.2    Extension to UPPcc : Proof of Corollary 1.2
Building on the celebrated framework of Forster [21], Razborov and Sherstov [38] developed techniques
to translate threshold-degree lower bounds into sign-rank lower bounds. Specifically, they showed that,
in order for a threshold-degree lower bound of the form deg± ( fn ) ≥ d to translate into a UPPcc lower
bound for a related function F, it suffices for the threshold-degree lower bound for fn to be exhibited by a
dual witness φ satisfying the following smoothness condition:

                 |φ (x)| ≥ 2−O(d) · 2−n for all but a 2−Ω(d) fraction of inputs x ∈ {−1, 1}n .                 (3.4)

Note that this is a different smoothness condition than the one satisfied by the dual witness ψbase      0  discussed
                                                                                                  0
above for MPN 1/2 ,N (Equation (3.3)): on inputs x of Hamming weight at most d, |ψbase (x)| is always at
least n−d  2−d · 2−n , whereas on inputs x of Hamming weight more than d, |ψbase               0   (x)| may be 0. In
           0
words, |ψbase (x)| is very large on inputs x of Hamming weight at most d, but may not be large at all on
inputs of larger Hamming weight. In contrast, Equation (3.4) requires a dual witness to be “somewhat
large” (within a 2−O(d) factor of uniform) on nearly all inputs.
    In summary, our construction of a dual witness for MP≤N    N 1/2 ,N
                                                                        that is sketched in the previous subsection
is not sufficient to apply Razborov and Sherstov’s framework to SURJR,N , for two reasons. First, the dual
witness we construct for MP≤N   N 1/2 ,N
                                         is not smooth in the sense of Equation (3.4), as it is only “large” on
inputs of Hamming weight at most d. Second, to apply Razborov and Sherstov’s framework to SURJR,N ,
we actually need to give a smooth dual witness for SURJR,N itself, not for MP≤N       N 1/2 ,N
                                                                                               . Note that SURJR,N is
defined over the domain {−1, 1}n where n = N log R, while MP≤N    N 1/2 ,N
                                                                           is defined over subset of {−1, 1}NR
consisting of inputs of Hamming weight at most N.
    We address both of the above issues as follows. First, we show how to turn our dual witness
µ for MP≤NN 1/2 ,N
                   into a dual witness σ̂ for the fact that deg± (SURJR,N ) ≥ d, such that σ̂ inherits the
“largeness” property of µ on inputsof Hamming weight at most d. Second,              we transform σ̂ into a
dual witness τ for the fact that deg± SURJR,N ◦ ANDlog2 n ◦ PARITYlog3 n ≥ d, such that τ satisfies the
smoothness condition given in Equation (3.4). We conclude that SURJR,N ◦ ANDlog2 n ◦ PARITYlog3 n can
be transformed into a related function F (on Õ(n) inputs, and which is also in AC0 ) that has sign-rank
exp(Ω̃(n1/2 )).

                         T HEORY OF C OMPUTING, Volume 17 (7), 2021, pp. 1–46                                     11
                                           M ARK B UN AND J USTIN T HALER

3.3    The PPcc bound: Proof of Theorem 1.3
As mentioned in Section 1.1.2, the core of Theorem 1.3 is to exhibit an AC0 function f such that
 f ( f ) = Ω(n1−δ ) for some ε = 1 − 2−Ω(n1−δ ) . To accomplish this, we prove a hardness amplification
deg  ε
theorem that should be understood in the context of a weaker result from our prior work [18].
     As stated in Section 3.1, for ε = 1/3, our prior work [18] showed how to take any Boolean function
 fn in AC0 with ε-approximate degree d and transform it into a related function g on roughly the same
number of variables, such that g is still in AC0 , and g has significantly higher ε 0 -approximate degree
for some ε 0 ≈ 1/3. This was done in a two-step process. First, we showed that in order to construct a
“harder” function g, it is sufficient to identify an AC0 function G defined on poly(n) inputs such that for
some ` = n · polylog(n), degf 0 (G≤` )  d. 5 Second, we exhibited such a G. In our prior work [12, 18],
                                ε
for general functions fn , the function G was fn ◦ ANDr ◦ ORm0 , where r = 10 log n, and m0 = Θ(n/d).
     We would like to prove a similar result, but we require that G have larger ε 0 -approximate degree
than fn , where ε 0 is exponentially closer to 1 than is ε itself. Unfortunately, the definition of G from
our earlier papers [12, 18] does not necessarily result in such a function. For example, if fn = ORn (or
any polylogarithmic DNF for that matter), then the function G = fn ◦ ANDr ◦ ORm0 is also a DNF of
polylogarithmic width, and it is not hard to see that all such DNFs have ε-approximate degree at most
polylog(n) for some ε = 1 − 1/npolylog(n) .
     To address this situation, we change the definition of G. Rather than defining G := fn ◦ ANDr ◦ ORm0 ,
we define G = GAPMAJt ◦ fz ◦ ANDr ◦ ORm for appropriately chosen settings of the parameters t, z, r,
and m. (For simplicity of exposition, some of these parameters are described differently here than in the
proof in Section 8.) Here, GAPMAJt denotes any function evaluating to 1 on inputs of Hamming weight
at most t/3, −1 on inputs of Hamming weight at least 2t/3, and taking any value in {−1, 1} on all other
inputs (such functions are also called approximate majorities, and it is known that there are approximate
majorities computable in AC0 ). GAPMAJ has also played an important role in related prior work [9, 18].
                             f 0 (G≤` )  deg
     In order to show that deg                f ( fn ) for an ε 0 that is exponentially closer to 1 than is ε, we
                                 ε                ε
require a more delicate construction of a dual witness than our papers [12,18]. After all, those papers only
required a dual witness for G≤` with correlation at least 1/3 with G` , while we require a dual witness
achieving correlation with G≤` that is exponentially close to 1. Roughly speaking, in [12, 18] we were
able to get away with exclusively using the simple and clean technique called dual block composition
(described in Section 5.8) for constructing dual witnesses for block composed functions by combining
dual witnesses for the constituent functions. In this article, we instead use a closely related but more
involved construction introduced by Sherstov [41]. (Sherstov introduced his construction to prove that
approximate degree satisfies a type of direct-sum theorem.)
     More specifically, suppose that for some positive integer k, fz has ε(z)-approximate degree at least
                                           k/(k+1)
d(z) = zk/(k+1) , where ε(z) = 1 − 2−z             . In our definition of G, we set intermediate parameters
t=n    1/(k+2) ,z=n (k+1)/(k+2) , r = 10 log n, and m = n2/(k+2) , and we build a dual witness for G≤` via a
multi-step construction.
     In Step 1, we take dual witnesses ψ fz , ψANDr , and ψORm for fz , ANDr , and ORm respectively, and we
combine them using the technique of Sherstov [41], to give a dual witness γ for fz ◦ANDr ◦ORm satisfying
the following conditions: γ has pure high degree at least D(n) = n(k+1)/(k+2) = d(n)(k+1)/k  d(n), and γ’s
   5 This step was also used in the analysis of SURJ
                                                       R,N outlined in Section 3.2 above, where G was the function ANDR ◦ ORN .



                           T HEORY OF C OMPUTING, Volume 17 (7), 2021, pp. 1–46                                             12
                           T HE L ARGE -E RROR A PPROXIMATE D EGREE OF AC0

correlation with with fz ◦ ANDr ◦ ORm is ε 00 ≈ ε(z). That is, γ witnesses the fact that the ε 00 -approximate
degree of fz ◦ ANDr ◦ ORm is much larger than the ε(n)-approximate degree of fn itself.
     This step of the construction is in contrast to our prior work, which constructed a dual witness
for fn ◦ ANDr ◦ ORm0 via direct dual block composition of ψ fn , ψANDr , and ψORm . Direct dual block
composition does not suffice for us because it would yield a dual witness with significantly worse
correlation with fz ◦ ANDr ◦ ORm than ε(z).
     While achieving correlation ε 00 ≈ ε(z) is an improvement over what would obtain from direct dual
block composition, it is still significantly farther from 1 than is ε(n), i. e., 1 − ε 00  1 − ε(n). And we
ultimately need to construct a dual witness for G≤` that is significantly closer to 1 than is ε(n). To address
this issue, in Step 2 of our construction, we use dual block composition to turn γ into a dual witness η for
G = GAPMAJt ◦ fz ◦ ANDr ◦ ORm satisfying the following conditions: η has the same pure high degree
                                                                (k+1)/(k+2) )
as γ, and moreover η has correlation at least ε 0 = 1 − 2−Ω(n                 with G.
     However, after Step 2, we are still not done, because η places some mass on inputs of Hamming
weight as large as t · z · r · m  `. Hence η is only a dual witness to the high ε 0 -approximate degree of G,
not the high ε 0 -approximate degree of G≤` (recall that any dual witness for G≤` , must evaluate to 0 on all
inputs of Hamming weight larger than `, as described in Equation (3.2)). Nonetheless, as in our prior
work [12, 18], we are able to argue that η places very little mass on inputs of Hamming weight more than
`, and thereby invoke techniques from these papers to zero out this mass. The reason this final step of
the argument is not immediate from [12, 18] is as follows. Although these papers developed a precise
understanding of how much mass is placed on inputs of Hamming weight more than ` by dual witnesses
constructed via basic dual block composition, the dual witness γ for fz ◦ ANDr ◦ ORm that we constructed
in Step 1 was not built by invoking pure dual block composition. Our key observation is that Sherstov’s
technique that we invoked to construct γ is “similar enough” to vanilla dual block composition that the
precise understanding of dual block composition developed in our prior work can be brought to bear on
our dual witness η.
     In summary, there are two main technical contributions in our proof of Theorem 1.3. The first is the
identification of a hardness amplification construction for ε-approximate degree that not only amplifies
the degree against which the lower bound holds, but also the error parameter ε. The second is constructing
a dual polynomial to witness the claimed lower bound, using techniques more involved and delicate than
the vanilla dual block composition technique that sufficed in [12, 18].


4    Subsequent work and discussion
Subsequent to our work, Sherstov and Wu [46] have made major progress toward resolving Open Problem
1 by showing nearly optimal threshold degree and sign-rank lower bounds for AC0 . Specifically, for
every k ≥ 1, they exhibit a family of depth-k AC0 circuits with threshold degree Ω̃(n(k−1)/(k+1) ). This
generalizes Minsky and Papert’s lower bound of Ω(n1/3 ) on the threshold degree of DNF, as well as our
lower bound of Ω̃(n1/2 ) for the depth-3 SURJECTIVITY function. Sherstov and Wu, moreover, show
that for any positive constant δ > 0 there is a family of AC0 circuits with depth O(1/δ ) and sign-rank
exp Ω̃(n ) . This gives an almost optimal improvement to our sign-rank lower bound of exp Ω̃(n1/2 )
          1−δ
              

on an AC0 function.
    As in our proof of Theorem 1.3 (see Theorem 8.2 in Section 8), as well as our earlier paper [18],

                        T HEORY OF C OMPUTING, Volume 17 (7), 2021, pp. 1–46                               13
                                     M ARK B UN AND J USTIN T HALER

Sherstov and Wu obtain their lower bound on the threshold degree of AC0 by recursively applying a
new hardness amplification theorem. Their hardness amplification theorem shows how to convert a
function fz into a new function gn , computable by circuits with slightly higher depth and roughly the
same size, but with polynomially larger threshold degree. Again as in the proof of Theorem 1.3, in
order to obtain such a g, it suffices to construct a function G with deg± (G≤n )  deg± ( f ). Starting
from a function fz with threshold degree z(k−1)/(k+1) , the function G that they identify as sufficient for
this purpose is G = fz ◦ MPr,r2 , where z = n(k+1)/(k+3) and r = n2/(k+3) . When fz is a trivial function,
this recovers our lower bound of Ω̃(n1/2 ) for SURJECTIVITY. Hence, their construction in full can be
viewed as a generalization of our Theorem 3.1 that is amenable to recursive application. This requires
several technical new ideas in the construction of the dual witness. However, we remain optimistic that
the simplicity of our analysis for SURJECTIVITY will nonetheless lead to future applications of our
techniques.
    Sherstov and Wu’s sign-rank lower bound follows from a similar high-level (though more techni-
cally demanding) strategy, where they show that smooth threshold degree also obeys such a hardness
amplification theorem.
    While these new results resolve the most glaring question raised in the initial version of this paper,
a number of interesting directions remain for further study. A common feature of our large-error
approximate-degree lower bound and Sherstov and Wu’s threshold-degree and sign-rank lower bounds
for AC0 is that, in order to obtain lower bounds of the form Ω(n1−δ ), we must consider functions
computed by circuits of depth Θ(1/δ ). This contrasts with the situation for bounded-error approximate
degree [18], where a lower bound of Ω(n1−δ ) can be obtained at depth only O(log(1/δ )). Can one show
that there are AC0 functions f of depth O(log(1/δ )) with deg  f ( f ) = Ω(n1−δ ) for ε = 1 − 2−Ω(n1−δ ) or
                                                                  ε
with deg± ( f ) = Ω(n1−δ )? There is a common underlying reason why our construction and Sherstov
and Wu’s construction both require circuits of depth Θ(1/δ ) and not Θ(log(1/δ )): a component of the
hardness amplifier in both constructions (in our case, GAPMAJn1/(k+1) , and in Sherstov and Wu’s case, the
top gate of MPr,r2 ) is used to amplify error but does not amplify degree. In contrast, in the construction of
[18] for lower bounding bounded-error approximate degree, up to a logarithmic factor, all of the hardness
amplifier is used to amplify degree.

    We would also like to highlight the question of proving sublinear upper bounds on the threshold
degree of AC0 . Given the surprising O(R1/4 · N 1/2 ) upper bound on the (1/3)-approximate degree of
SURJR,N from recent work [12, 45], we have begun to seriously entertain the possibility that for every
function f computable by AC0 of depth k, there is some constant δ (k) > 0 such that the threshold degree
(and possibly even (1/3)-approximate degree) of f is O(n1−δ ). Unfortunately, we cannot currently even
show that this is true for depth-three circuits of quadratic size. Any progress in this direction would be
very interesting, and we believe that such progress would likely lead to new circuit lower bounds.

Outline for the rest of the paper. Section 5 covers technical preliminaries. Section 7 contains the proof
of our tight threshold-degree lower bound for SURJECTIVITY (Theorem 1.1) and its extension to a
sign-rank lower bound for a related function in AC0 (Corollary 1.2). Section 8 proves our nearly optimal
bound on the discrepancy of AC0 (Theorem 1.3). Section 9 elaborates on applications of these results in
communication complexity, circuit complexity, learning theory, and cryptography.

                       T HEORY OF C OMPUTING, Volume 17 (7), 2021, pp. 1–46                                14
                            T HE L ARGE -E RROR A PPROXIMATE D EGREE OF AC0

5      Preliminaries
5.1     Notation
For a natural number N, let [N] = {1, 2, . . . , N} and [N]0 = {0, 1, 2, . . . , N}. All logarithms in this paper
are taken in base 2 unless otherwise noted via the notation ln, which refers to base e. For x ∈ R, define
sgn(x) = −1 if x < 0 and 1 otherwise. For any set S, the notation x ∼ S means that x is drawn uniformly
at random from S.
     As mentioned in Section 1, we sometimes use subscripts to indicate the number of variables on which
a function is defined. For example, we denote OR : {−1, 1}n → {−1, 1} by ORn .
     For any input x ∈ {−1, 1}n , |x| = |{i : xi = −1}| denotes the Hamming weight of x. Hn denotes the
set {−1, 1}n , and for any d ≤ n, we define Hn≤d := {x ∈ {−1, 1}n : |x| ≤ d}. Similarly, Hn>d := {x ∈
{−1, 1}n : |x| > d}. Given a Boolean function fn , we denote by fn≤d the partial function obtained by
restricting the domain of fn to Hn≤d .

5.2     Threshold degree and its dual formulation
Definition 5.1. Let D ⊆ {−1, 1}n , and let f be a function mapping D to {−1, 1}. The threshold degree
of f , denoted deg± ( f ), is the least degree of a real polynomial p : {−1, 1}n → R such that p(x) · f (x) > 0
when x ∈ D. No constraint is placed on p(x) for any x ∈ ({−1, 1}n \ D).
    In this paper, we make essential use of a (standard) dual formulation of threshold degree. To describe
this dual formulation, we need to introduce some terminology.
Definition 5.2. Let ψ : {−1, 1}n → R be any real-valued function on the Boolean hypercube. Given
another function p : {−1, 1}n → R, we let hψ, pi := ∑x∈{−1,1}n ψ(x) · p(x), and refer to hψ, pi as the
correlation of ψ with p. If hψ, pi = 0 for all polynomials p of degree at most d, we say that ψ has pure
high degree at least d. We let kψk1 := ∑x∈{−1,1}n |ψ(x)|, and refer to kψk1 as the `1 -norm of ψ.
      The following theorem provides the aforementioned dual formulation of threshold degree.
Theorem 5.3. Let f : D → {−1, 1} with D ⊆ {−1, 1}n . Then deg± ( f ) > d if and only if there is a real
function ψ : {−1, 1}n → R such that:
    1. (Pure high degree): ψ has pure high degree at least d.

    2. (Non-triviality): kψk1 > 0.

    3. (Sign Agreement): ψ(x) · f (x) ≥ 0 for all x ∈ {−1, 1}n .

    4. (Appropriate Support): ψ(x) = 0 for all x ∈ {−1, 1}n \ D.
      We refer to functions mapping {−1, 1}n → R as dual polynomials. For a function f : D → {−1, 1},
we refer to any ψ satisfying the conditions of Theorem 5.3 as a threshold degree dual polynomial for
 f , or alternatively as a dual witness to the fact that deg± ( f ) ≥ d. Along the way to constructing a
threshold-degree dual polynomial for f , we will often construct dual polynomials that almost satisfy the
Sign Agreement and Appropriate Support conditions, and it will be convenient to give names to the inputs

                        T HEORY OF C OMPUTING, Volume 17 (7), 2021, pp. 1–46                                  15
                                           M ARK B UN AND J USTIN T HALER

on which these two conditions are violated. To this end, given a dual polynomial ψ : {−1, 1}n → R,
let E(ψ, f ) = {x ∈ D : ψ(x) · f (x) < 0}, and let W(ψ, f ) := {x ∈ {−1, 1}n \ D : |ψ(x)| > 0}. For each
input x ∈ E, we say that ψ makes an error on x. We will let E(ψ, f ) := ∑x∈E(ψ, f ) |ψ(x)| and W (ψ, f ) :=
∑x∈W(ψ, f ) |ψ(x)|.

5.3     Approximate degree and its dual formulation
Definition 5.4. Let D ⊆ {−1, 1}n , and let f be a function mapping D to {−1, 1}. The ε-approximate
                       f ( f ), is the least degree of a real polynomial p : {−1, 1}n → R such that
degree of f , denoted deg ε
|p(x) − f (x)| ≤ ε when x ∈ D. No constraint is placed on p(x) for any x ∈ ({−1, 1}n \ D).6
      The following theorem provides a standard dual formulation of approximate degree.
Theorem 5.5. Let f : D → {−1, 1} with D ⊆ {−1, 1}n . Then deg
                                                          f ( f ) > d if and only if there is a real
                                                             ε
                    n
function ψ : {−1, 1} → R such that:
   1. (Pure high degree): ψ has pure high degree at least d.

   2. (Non-triviality): kψk1 > 0.

   3. (Correlation): ∑x∈{−1,1}n ψ(x) · f (x) ≥ ε · kψk1 .

   4. (Appropriate Support): ψ(x) = 0 for all x ∈ {−1, 1}n \ D.
      We will also frequently use the following simple fact, which is immediate from linearity.
Fact 5.6. Suppose that ψ1 , . . . , ψk : {−1, 1}n → R all have pure high degree at least d. Then their sum,
ψ1 + · · · + ψk , has pure high degree at least d.

5.4     The Surjectivity function
Definition 5.7. For N ≥ R, define the function SURJR,N : [R]N → {−1, 1} by SURJR,N (s1 , . . . , sN ) = −1
iff for every j ∈ [R], there exists an i such that si = j.
    When N and R are clear from context, we will refer to the function SURJ without the explicit
dependence on these parameters. It will often be convenient to think of the input to SURJR,N as a function
mapping {−1, 1}n → {−1, 1} rather than [R]N → {−1, 1}. When needed, we assume that R is a power
of 2 and an element of [R] is encoded in binary using log R bits. In this case we will view Surjectivity as
a function on n = N log R bits, i. e., SURJ : {−1, 1}n → {−1, 1}.
    For technical reasons, when proving lower bounds, it will be more convenient to work with a variant
of SURJ where the range [R] is augmented by a “dummy element” 0 that is simply ignored by the function.
That is, while any of the items s1 , . . . , sN may take the dummy value 0, the presence of a 0 in the input
is not required for the input to be deemed surjective. We denote this variant of Surjectivity by dSURJ.
More formally:
   6 Prior work (e. g., [1, 12]) has also considered another natural definition of approximate degree for promise problems, that

does require p(x) to be bounded in magnitude outside of the domain D on which f is defined. For our purposes in this paper, the
definition that does not constrain p on inputs outside of D is most useful.


                           T HEORY OF C OMPUTING, Volume 17 (7), 2021, pp. 1–46                                              16
                               T HE L ARGE -E RROR A PPROXIMATE D EGREE OF AC0

Definition 5.8. For N ≥ R, define the function dSURJR,N : [R]N0 → {−1, 1} by dSURJR,N (s1 , . . . , sN ) =
−1 iff for every j ∈ [R], there exists an i ∈ [N] such that si = j.

   The following simple proposition shows that a lower bound on the approximate degree of dSURJ
implies a lower bound for SURJ itself.

Proposition 5.9 (Bun, Kothari, and Thaler [12]). Let ε > 0 and N ≥ R. Then

                              f (dSURJR,N ) ≤ deg
                              deg             f (SURJR+1,N+1 ) · log(R + 1) .
                                 ε               ε


    We will also use two additional simple properties of SURJR,N . The first roughly says that increasing
the range size can only make SURJECTIVITY harder to approximate (so long as the domain size remains
significantly larger than the range size).

Proposition 5.10. Let ε > 0. Then for any R0 ≥ R, deg
                                                  f (SURJR,N ) ≤ deg
                                                     ε
                                                                 f (SURJR0 ,N+R0 −R ).
                                                                     ε
                                 0           0
Proof. Let p : {−1, 1}(N+R −R)·dlog(R )e → {−1, 1} be a polynomial of degree d that ε-approximates
SURJR0 ,N+R0 −R . We will use to p to construct a polynomial of degree d that ε-approximates SURJR,N .
Recall that an input to SURJR,N takes the form (s1 , . . . , sN ) where each si is the binary representation of a
number in [R]. For every (s1 , . . . , sN ) ∈ [R]N , observe that

                   SURJR,N (s1 , . . . , sN ) = SURJR0 ,N+R0 −R (s1 , . . . , sN , R + 1, R + 2, . . . , R0 ) .

Hence, the polynomial
                                            p(s1 , . . . , sN , R + 1, R + 2, . . . , R0 )

has degree at most d and ε-approximates SURJR,N . This assumes that for each element r of [R], each bit
                                                0
of the encoding of r in {−1, 1}dlog R e (i. e., when r is viewed as an element of [R0 ]), is a degree-1 function
of its encoding in {−1, 1}dlog Re (i. e., when r is viewed as an element of [R]). This property holds for
the natural binary encoding. Even if some encoding were used that did not satisfy this condition, then
p(s1 , . . . , sN , R + 1, R + 2, . . . , R0 ) will have degree no larger than d · (log(R0 ) + 1).

   The second says that increasing the domain size can only make SURJECTIVITY harder to approxi-
mate.

Proposition 5.11. Let ε > 0. If N 0 > N, then deg
                                              f (SURJR,N ) ≤ deg
                                                  ε
                                                             f (SURJR+1,N 0 ).
                                                                ε

Proof. For every (s1 , . . . , sN ) ∈ [R]N , observe that

                   SURJR,N (s1 , . . . , sN ) = SURJR+1,N 0 (s1 , . . . , sN , R + 1, R + 1, . . . , R + 1),
                                                                                                  0
where the element R + 1 is repeated N 0 − N times. Hence, if p : {−1, 1}N ·dlog(R+1)e → {−1, 1} is a
polynomial of degree d that ε-approximates SURJR+1,N 0 , then p(s1 , . . . , sN , R + 1, R + 1, . . . , R + 1) is a
degree d polynomial that ε-approximates SURJR,N .


                          T HEORY OF C OMPUTING, Volume 17 (7), 2021, pp. 1–46                                    17
                                     M ARK B UN AND J USTIN T HALER

5.5   A useful auxiliary function
The following lemma, due to Razborov and Sherstov [38], gives a function that we will use many times
in this paper to “zero out” mass that intermediate dual witnesses ψ place on the “bad” sets E(ψ, f ) and
W(ψ, f ).

Lemma 5.12 ([38, Proof of Lemma 3.2]). Let D, n ∈ N with 0 ≤ D ≤ n − 1. Then for every y ∈ {−1, 1}n
with |y| > D, there exists a function φy : {−1, 1}n → R such that

                                                 φy (y) = 1                                            (5.1)


                                       |x| > D, x 6= y =⇒ φy (x) = 0                                   (5.2)

                                        deg p ≤ D =⇒ hφy , pi = 0                                      (5.3)
                                                          
                                                          |y|
                                                            D
                                           ∑ |φy (x)| ≤ 2 D .                                          (5.4)
                                         |x|≤D



5.6   A dual witness for OR
We will repeatedly make use of a dual witness for the high approximate degree of the OR function. The
dual witness itself was essentially first constructed by Špalek [50], but we require an analysis of it due to
Bun, Kothari, and Thaler [12].

Proposition 5.13 ([12]). There exist constants c1 , c2 ∈ (0, 1) for which the following holds. Let T ∈ N
and 1/T ≤ δ ≤ 1/2. Then there is a function ω : [T ]0 → R such that

                    T
           ω(0) − ∑ ω(t) ≥ 1 − δ ,                                                                     (5.5)
                   t=1
           ω(0) ≥ (1 − δ )/2 ,                                                                         (5.6)
            T
           ∑ |ω(t)| = 1 ,                                                                              (5.7)
           t=0
                                                               √       T
           for all univariate polynomials q : R → R, deg q < c1 δ T =⇒ ∑ ω(t) · q(t) = 0 ,             (5.8)
                                                                              t=0
                                  √ √
                    170 exp(−c2t δ / T )
           |ω(t)| ≤                               ∀t = 1, . . . , T .                                  (5.9)
                            δ · t2

Proposition 5.14. Let T, N ∈ N with T ≤ N, and let δ> 1/T . Define ω as in Proposition 5.13. Define
                                                  N
the function ψ : {−1, 1}N → R by ψ(x) = ω(|x|)/ |x|   for x ∈ HN≤T and ψ(x) = 0 otherwise. Then ψ

                         T HEORY OF C OMPUTING, Volume 17 (7), 2021, pp. 1–46                             18
                              T HE L ARGE -E RROR A PPROXIMATE D EGREE OF AC0

satisfies:

                 hψ, ORN i ≥ 1 − δ ,                                                                     (5.10)
                 ψ(1N ) ≥ (1 − δ )/2 ,                                                                   (5.11)
                 kψk1 = 1 ,                                                                              (5.12)
                                                                √
                 for any polynomial p : {−1, 1}N → R, deg p < c1 δ T =⇒ hψ, pi = 0 ,                     (5.13)
                                              √ √
                               170 exp(−c2t δ / T )
                  ∑ |ψ(x)| ≤            δ · t2
                                                         ∀t = 1, . . . , T .                             (5.14)
                 |x|=t


5.7    Block composition of functions
Definition 5.15. For functions f : Y n → Z and g : X → Y , define the block composition f ◦ g : X n → Z
by ( f ◦ g)(x1 , . . . , xn ) = f (g(x1 ), . . . , g(xn )), for all x1 , . . . , xn ∈ X.

5.8    Dual block composition
We now define the dual block method [31, 42, 47] for combining dual witnesses for functions f , g in order
to obtain a dual witness for f ◦ g. We then state two basic properties of the method.

Definition 5.16. Let Ψ : {−1, 1}M → R and ψ : {−1, 1}m → R be functions that are not identically
zero. Let x = (x1 , . . . , xM ) ∈ ({−1, 1}m )M . Define the dual block composition of Ψ and ψ, denoted
Ψ ? ψ : ({−1, 1}m )M → R, by

                                                                                        M
                      (Ψ ? ψ)(x1 , . . . , xM ) = 2M · Ψ(. . . , sgn (ψ(xi )) , . . . ) · ∏ |ψ(xi )| .
                                                                                       i=1

Proposition 5.17 ([18, 42]). The dual block composition has the following properties:

Preservation of `1 -norm:       Assume that ψ has pure high degree at least 1. If kΨk1 = 1 and kψk1 = 1,
then
                                                    kΨ ? ψk1 = 1 .                                       (5.15)

Multiplicativity of pure high degree: If hΨ, Pi = 0 for every polynomial P : {−1, 1}M → {−1, 1} of
degree less than D, and hψ, pi = 0 for every polynomial p : {−1, 1}m → {−1, 1} of degree less than d,
then for every polynomial q : {−1, 1}m·M → {−1, 1},

                                        deg q < D · d =⇒ hΨ ? ψ, qi = 0 .                                (5.16)

Associativity:   For every ζ : {−1, 1}mζ → R, ϕ : {−1, 1}mϕ → R, and ψ : {−1, 1}mψ → R, we have

                                             (ζ ? ϕ) ? ψ = ζ ? (ϕ ? ψ) .                                 (5.17)

                        T HEORY OF C OMPUTING, Volume 17 (7), 2021, pp. 1–46                                19
                                             M ARK B UN AND J USTIN T HALER

    Given three dual witnesses ζ , ϕ, and ψ, the associativity property allows us to write ζ ? ϕ ? ψ without
ambiguity.
    Next, we state an important lemma that follows directly from an analysis of Bun, Kothari, and
Thaler [12, Proof of Proposition 31]. This lemma shows that if ψ is the dual witness for ORN from
Proposition 5.14, then for any Φ : {−1, 1}R → {−1, 1}, ξ = Φ ? ψ places an exponentially small amount
                                ≤N 7
of mass on inputs outside of HN·R  .
                                                                                 √
Lemma 5.18 ( [12]). Fix a parameter 1 ≤ α ≤ R2 , and assume that N ≥ d20 αeR. For any β ∈ (0, 1)
assume that Φ : {−1, 1}R → R satisfies kΦk1 = 1. Furthermore, let ψ : {−1, 1}N → R be symmetric
(meaning ψ(x) depends only on |x|) and assume that ψ satisfies the following conditions:
                                  N
                                 ∑ ∑ ψ(x) = 0 ,                                                                 (5.18)
                                 t=0 |x|=t
                                  N
                                 ∑ ∑ |ψ(x)| = 1 ,                                                               (5.19)
                                 t=0 |x|=t

                                  ∑ |ψ(x)| ≤ α exp(−βt)/t 2                  ∀t = 1, 2, . . . , N .             (5.20)
                                 |x|=t

Then for sufficiently large R,
                                                                                  
                                                                    2        βN
                                         ∑     |(Φ ? ψ)(x)| ≤         exp −          .                          (5.21)
                                          ≤N                        β       6 ln R
                                      x∈H
                                       / N·R

   In particular, if ψ is the dual witness for ORN obtained from Proposition 5.14 with constant parameter
δ ∈ (0, 1) and with T = N/ log3 (N), then there is a constant c3 > 0 such that
                                                                                  √
                                             ∑ |(Φ ? ψ)(x)| ≤ 2−c · N log N .
                                             ≤N
                                                                              3
                                                                                                                (5.22)
                                         x∈H
                                          / N·R


5.9    Hard functions and Hamming weight promises
In our earlier paper [18], we identified a natural class of functions satisfying the following condition: for
any function h : {−1, 1}n → {−1, 1} in the class, there is a related function H : {−1, 1}poly(n) → {−1, 1}
          f (h) = Θ̃(deg
such that deg            f (H ≤N )) for some N = Θ̃(n). This enables one to prove ε-approximate degree
              ε             ε
(respectively, threshold degree) lower bounds on h by lower bounding the ε-approximate degree (threshold
degree) of the related function H, which is defined on significantly more variables than h itself, under the
promise that the Hamming weight of H’s input is at most N.
     In more detail, this connection is as follows. Fix R, N ∈ N, let f : {−1, 1}R → {−1, 1} and let
g : {−1, 1}N → {−1, 1}. Suppose g is a symmetric function, in the sense that for any x ∈ {−1, 1}N and
any permutation σ : [N] → [N], we have

                                         g(x1 , . . . , xN ) = g(xσ (1) , . . . , xσ (N) ) .
   7 Our Lemma 5.18 follows directly from an intermediate calculation in the proof of Proposition 31 of [12].



                           T HEORY OF C OMPUTING, Volume 17 (7), 2021, pp. 1–46                                    20
                                T HE L ARGE -E RROR A PPROXIMATE D EGREE OF AC0

Equivalently, the value of g on any input x depends only on its Hamming weight |x|.
    The functions f and g give rise to two functions. The first, denoted by F prop : [R]N0 → {−1, 1}, is a
                                                                                                      ≤N
certain property of a list of numbers s1 , . . . , sN ∈ [R]0 . The second, which we denote by F ≤N : HN·R →
{−1, 1}, is the block composition of f and g restricted to inputs of Hamming weight at most N. Formally,
these functions are defined as:

        F prop (s1 , . . . , sN ) = f (g(1[s1 = 1], . . . , 1[sN = 1]), . . . , g(1[s1 = R], . . . , 1[sN = R]))
                                   (
          ≤N                          f (g(x1 ), . . . , g(xR )) if x1 , . . . , xR ∈ {−1, 1}N , |x1 | + · · · + |xR | ≤ N
        F (x1 , . . . , xR ) =
                                     undefined                   otherwise.

    Observe that for f = ANDR and g = ORN , F prop = dSURJR,N . That is, dSURJR,N can be expressed
as an ANDR (over all range items r ∈ [R]) of the ORN (over all inputs i ∈ [N]) of “Is input xi equal to r?”.
    The following proposition relates the approximate degrees of the two functions F prop and F ≤N .

Theorem 5.19 (Bun and Thaler [18]). Let f : {−1, 1}R → {−1, 1} be any function and let g : {−1, 1}N →
{−1, 1} be a symmetric function. Then for F prop and F ≤N defined above, we have

                                                             f (F ≤N ) .
                                               f (F prop ) ≥ deg
                                               degε             ε

In addition,
                                              deg± (F prop ) ≥ deg± (F ≤N ) .


6    The threshold degree of Surjectivity: Upper bound
For completeness, we prove a tight upper bound on the threshold degree of SURJR,N . This upper bound
follows from standard techniques.
                                                             n                       o
Claim 6.1. The function SURJR,N has threshold degree O min R · log R, N 1/2 · log3/2 R .

Proof. We assume throughout that R ≤ N, the claim being trivial otherwise. This claim follows from two
well-known facts, stated as Expressions (6.1) and (6.2) below.

                                   deg± (SURJR,N ) ≤ deg± (ANDR ◦ ORN ) · log R                                              (6.1)

                                                         n                o
                                deg± (ANDR ◦ ORN ) = O min R, (N log R)1/2                                                   (6.2)

    Expression (6.1) holds because, as mentioned in Section 1.1.1, SURJR,N is equivalent to the ANDR
(over all range items r ∈ [R]) of the ORN (over all inputs i ∈ [N]) of “Is input si equal to r?”, and the
quoted question is computed by a conjunction of width log R over the input bits. That is, if
                                                 (
                                                  −1 if si = j
                                        yi, j :=
                                                  1 otherwise,

                           T HEORY OF C OMPUTING, Volume 17 (7), 2021, pp. 1–46                                                21
                                        M ARK B UN AND J USTIN T HALER

then
                   SURJR,N (x) = ANDR (ORN (y1,1 , . . . , yN,1 ), . . . , OR(y1,R , . . . , yN,R )).
Hence, if p sign-represents ANDR ◦ ORN , then p(y1,1 , . . . , yN,R ) is a polynomial of degree deg(p) · log R
that sign-represents SURJR,N .
    Equation (6.2) holds by the following standard analysis. To show that
                                deg± (ANDR ◦ ORN ) ≤ O((N log N)1/2 ),
                                                             √
we exploit the well-known fact that deg 1/(3R) (ORN ) = Θ( N log R) [10, 25]. Let p be a minimal degree
                                    f
                                                                                                       R
polynomial that uniformly approximates OR to error 1/(3R). For inputs x = (x1 , . . . , xR ) ∈ {−1, 1}N ,
the polynomial ∑Rj=1 p(x j ) + R − 1 sign-represents ANDR ◦ ORN , and has degree at most deg(p) =
   √
O( N log R).
    To show that deg± (ANDR ◦ ORN ) ≤ O(R), we use the fact that there exist two linear functions p and
q such that the ratio
                                   p(x) − N − ∑Ni=1 xi + 1/(3R)
                                                             
                                         =
                                                N − ∑Ni=1 xi + 1/(3R)
                                                            
                                   q(x)
satisfies |p(x)/q(x) − ORN (x)| ≤ 1/(3R) for every x. Then the following sum of rational functions
                                                                               N
agrees in sign with ANDR ◦ ORN at all inputs x = (x1 , . . . , xR ) ∈ {−1, 1}R : ∑Rj=1 p(x j )/q(x j ) + R −
1 = ∑Rj=1 p(x j )q(x j )/q2 (x j ) + R − 1. Placing everything over the non-negative common denominator
∏Rj=1 q2 (x j ), it follows that the polynomial
                                       R              R
                           (R − 1) · ∏ q2 (x j ) + ∑ p(x j )q(x j )         ∏             q(x j )2
                                      j=1            j=1              k=1,...,R : k6= j

sign-represents ANDR ◦ ORN . The degree of this polynomial is O(R).


7      The threshold degree of Surjectivity: Lower bound
The key technical result established in this section is Theorem 3.1, restated here for the reader’s conve-
nience.
                                                             
Theorem 3.1. Let R = N 1/2 . Then deg± (ANDR ◦ ORN )≤N = Ω(N 1/2 / log3/2 N).

   By combining Theorems 5.19, 5.9, and 3.1, we obtain the desired lower bound on the threshold
degree of SURJ.
Corollary 7.1. For R = N 1/2 , deg± (SURJR,N ) = Ω(N 1/2 / log5/2 N).
    In fact, combining Corollary 7.1 with Proposition 5.10 determines the threshold degree of SURJR,N
up to logarithmic factors, for all settings of R and N.
Corollary 7.2 (Restatement of Theorem 1.1). For any constant c < 1, if 1 < R < cN, then
                                                                 
                               deg± (SURJR,N ) = Ω̃ min(R, N 1/2 ) .


                        T HEORY OF C OMPUTING, Volume 17 (7), 2021, pp. 1–46                               22
                           T HE L ARGE -E RROR A PPROXIMATE D EGREE OF AC0

Proof. If N 1/2 ≤ R ≤ cN, let N 0 = Θ(N)
                                       be such that
                                                              0          0 1/2
                                                       N = N + R − (N ) . Then Proposition 5.10
implies that deg± (SURJR,N ) ≥ deg± SURJ(N 0 )1/2 ,N 0 = Ω̃(N 1/2 ), where the final equality holds by
Corollary 7.1.
   If R < N 1/2 , then Proposition 5.11 implies that deg± (SURJR,N ) ≥ deg± (SURJR−1,(R−1)2 ) = Ω̃(R),
where the final equality holds by Corollary 7.1.

Proof of Theorem 3.1. Let d = N 1/2 /(c log3/2 N) for a sufficiently large constant c to be chosen later. Let
ψ be the dual witness for ORN from Proposition 5.14 with T = N/ log3 N and δ = 1/4. Then ψ has pure
high degree at least d. Fix any positive integer r, and define Φ : {−1, 1}r → R as follows:
                                        
                                        −1/2 if w = (−1, −1, . . . , −1)
                                        
                              Φr (w) = 1/2           if w = (1, 1, . . . , 1)
                                        
                                           0        otherwise.
                                        

   Observe that Φr has pure high degree 1, and also kΦr k1 = 1. We remark that the dual block
                                                                                         f (ANDr ◦
composition Φr ? ψ is essentially the same dual witness constructed in [14] to show that degε
            1/2                 −r
ORN ) = Ω(N ) for ε = 1 − 2 (see Inequality (7.3) below). This dual witness was also used in
subsequent work [9, 12, 43].
   We observe that Φr ? ψ has the following properties.

                                              kΦr ? ψk1 = 1 .                                           (7.1)
                                  Φr ? ψ has pure high degree at least d .                              (7.2)
                                     E(Φr ? ψ, ANDr ◦ ORN ) ≤ 2−r .                                     (7.3)
                                                            1
                                       (Φr ? ψ) (1r·N ) ≥     · (3/4)r .                                (7.4)
                                                            2
                                                                   ≤r
                                     (Φr ? ψ) (y) ≥ 0 for all y ∈ Hr·N .                                (7.5)
                                                                                        √
       There is some constant c3 > 0 such that W (Φr ? ψ, (ANDr ◦ ORN )≤N ) ≤ 2      −c3 N log N
                                                                                                   .    (7.6)

Justification for Properties (7.1)-(7.6). Equation (7.1) follows from Proposition 5.17 (see Equa-
tion (5.15)) and the fact that kΦr k1 = kψk1 = 1. Equation (7.2) follows from Proposition 5.17 (see
Equation (5.16)), and the fact that the pure high degree of Φr is at least 1, and the pure high degree of ψ is
at least d. Inequality (7.3) is an immediate consequence of [14, Theorem 1]. Inequality (7.4) is immediate
from the definition of dual block composition and the fact that ψ(1N ) ≥ 3/8 (see Inequality (5.11)).
   Property (7.5) holds because (Φr ? ψ) (x) < 0 only if ψ(xi ) < 0 for all i = 1, . . . , r, and ψ(xi ) < 0
implies that |xi | ≥ 1 (see Inequality (5.11)). Hence, (Φr ? ψ) (x) < 0 =⇒ |x| ≥ r.
    Finally, Property (7.6) is immediate from Lemma 5.18 (see Inequality (5.22)) by choosing c3 suffi-
ciently large.
                                                           ≤d
Definitions of auxiliary functions νx . Fix any x ∈ HR·N      . We think of x as consisting of R blocks, each
of length N. Accordingly, denote x = (x1 , . . . , xR ), where each xi ∈ {−1, 1}N .

                       T HEORY OF C OMPUTING, Volume 17 (7), 2021, pp. 1–46                                23
                                           M ARK B UN AND J USTIN T HALER

    Let I = {i : xi = 1N }, and let r(x) = |I|. Enumerate the set I as I = {i1 , i2 , . . . , ir(x) }. Since |x| ≤ d,
                                                   R
r(x) ≥ R − d. Define the function νx : {−1, 1}N → R by
                                            (
                                             (Φr(x) ? ψ)(yi1 , . . . , yir(x) )                          /I
                                                                                    if yi = xi for all i ∈
                    νx (y1 , . . . , yR ) =
                                             0                                      otherwise.

    We observe that νx has the following properties.

                                                          kνx k1 = 1 .                                         (7.7)

                                          νx has pure high degree at least d .                                 (7.8)
                       E (νx , ANDR ◦ ORN ) = E Ψr(x) ? ψ, ANDr(x) ◦ ORN ≤ 2−r(x) .
                                                                                                
                                                                                                               (7.9)

             νx (x) · (ANDR ◦ ORN ) (x) = Φr(x) ? ψ (1R·N ) ≥ 1/2 · (3/4)r(x) ≥ 1/2 · (3/4)R .
                                                   
                                                                                                              (7.10)
                                                                    ≤d
                                            νx (y) ≥ 0 for all y ∈ HRN .                                      (7.11)
                                                                       √
                                   W νx , (ANDR ◦ ORN )≤N+d ≤ 2−c3 N log N .                                  (7.12)

Justification for Properties (7.7)-(7.12). Equations (7.7) and (7.8) are immediate from Equations (7.1)
and (7.2) respectively.
    We now derive Expression (7.9). The first equality holds by the following      reasoning. Since xi 6= 1N
for all i 6∈ I, zi = xi for all i ∈
                                  / I =⇒ ANDR ◦ ORN (z) = ANDr(x) ◦ ORN (zi1 , . . . , zir(x) ). The inequality in
                                                                       

Expression (7.9) follows from Inequality (7.3).
     Expression (7.10) is immediate from the definition of dual block composition and the fact that
sgn(ψ(1N )) > 0 (see Expression (5.11)). Property (7.11) is immediate from Property (7.5) and the fact
that 2d ≤ R. Inequality (7.12) is immediate from Inequality (7.6), the definition of νx , and the fact that
|x| ≤ d.
                                                     ≤d
                                                        | = ∑di=0 RN
                                                                     
Another intermediate dual witness. Let T = |HRN                    i . Consider the dual witness η =
 1
T  · ∑   ≤d νx . Let rmin = min
      x∈HRN                       ≤d r(x) ≥ R − d ≥ R/2.
                               x∈HRN
     We observe that η has the following properties.

                                                          kηk1 ≤ 1 .                                          (7.13)

                                                    kηk1 ≥ 1 − 21−rmin .                                      (7.14)
                                          η has pure high degree at least d .                                 (7.15)
                                             E(η, ANDR ◦ ORN ) ≤ 2−rmin .                                     (7.16)
                                                ≤d
                                   For all y ∈ HRN , η(y) ≥ (1/T ) · 1/2 · (3/4)R .                           (7.17)
                                                                                     √
                                    W (η, (ANDR ◦ ORN )≤N+d ) ≤ 2                 −c3 N log N
                                                                                                .             (7.18)

                         T HEORY OF C OMPUTING, Volume 17 (7), 2021, pp. 1–46                                     24
                                T HE L ARGE -E RROR A PPROXIMATE D EGREE OF AC0

Justification for Properties (7.13)-(7.18). To justify Inequality (7.13), observe that
                                                      1                         1
                     ∑         |η(y)| ≤       ∑             ∑ |νx (y)| = T ∑                   ∑        |νx (y)|
                  y∈{−1,1}RN               y∈{−1,1}RN
                                                      T       ≤d                       ≤d y∈{−1,1}RN
                                                           x∈HRN                    x∈HRN
                                                                                        1
                                                                                    =
                                                                                        T    ∑ kνx k1 = 1 ,
                                                                                               ≤d
                                                                                            x∈HRN

where the final equality is an immediate consequence of Equation (7.7).
   To justify Inequality (7.14), observe that

                                                                                             1
                                                                       ∑        |η(y)| =          ∑ RN         ∑ νx (y)
                                                                   y∈{−1,1}RN
                                                                                             T y∈{−1,1}         ≤d
                                                                                                             x∈HRN
                                                                                                               
          1                                                                                                     
      ≥      ∑                       ∑              |νx (y)|
                                                               − 
                                                                             ∑                   ∑         |νx (y)| 
          T y∈{−1,1}RN             x∈H≤d :                               y∈{−1,1}RN        x∈H≤d :
                                                                                                                      
                               y6∈E(νx ,ANDR ◦ORN )                                     y∈E(νx ,ANDR ◦ORN )
                                                                                                               
                  1                                                                                             
              =      ∑                    ∑             |νx (y)|
                                                                   −  ∑
                                                                                                 ∑         |νx (y)| 
                  T  ≤d              y∈{−1,1}RN :                                ≤d      y∈{−1,1}RN :
                                                                                                                      
                         x∈HRN                                                  x∈HRN
                                   y6∈E(νx ,ANDR ◦ORN )                                 y∈E(νx ,ANDR ◦ORN )
                                                                                   
                      1 
                    =
                      T    ∑≤d (1 − E (νx , ANDR ◦ ORN )) −  ∑≤d E (νx , ANDR ◦ ORN )
                                  x∈HRN                                              x∈HRN
                                                          1
                                                        =
                                                          T    ∑ 1 − 2 · E (νx , ANDR ◦ ORN ) ≥ 1 − 21−r
                                                                 ≤d
                                                                                                                     min
                                                                                                                           ,
                                                              x∈HRN

where the final statement holds by Expression (7.9).
    Equation (7.15) is immediate from Equation (7.8) and Fact 5.6.
    Expressions (7.16) and (7.18) are respectively immediate from Expressions (7.9) and (7.12) and the
triangle inequality.
    Expression (7.17) follows from Expressions (7.10) and (7.11).
    Let ζ := η/kηk1 . By the definition of ζ , and Equations (7.13)-(7.18), ζ has the following properties.

                                                          kζ k = 1 .                                                       (7.19)
                                 ζ has pure high degree at least d .                                                       (7.20)
                        E (ζ , ANDR ◦ ORN ) ≤ 2−rmin / 1 − 2−rmin ≤ 21−rmin .
                                                                 
                                                                                                                           (7.21)
                                               √                            √
                W ζ , (ANDR ◦ ORN )≤N+d ≤ 2−c3 N log N /(1 − 2−rmin ) ≤ 21−c3 N log N .                                    (7.22)
                                                 ≤d
                                    For all x ∈ HRN , ζ (x) ≥ (1/T ) · 1/2 · (3/4)R .                                      (7.23)

                         T HEORY OF C OMPUTING, Volume 17 (7), 2021, pp. 1–46                                                  25
                                       M ARK B UN AND J USTIN T HALER

    Let s= (1/T ) · 1/2 · (3/4)R denote the right hand side of Expression (7.23) . Recalling that T =
      RN
∑di=0 i    ≤ (1 + RN)d , and that d = N 1/2 /(c log3/2 N) for a constant c of our choosing, taking c suffi-
ciently large ensures that
                                                s ≥ 2−R/5 .                                         (7.24)

The final dual witness. Let f = (ANDR ◦ ORN )≤N+d . We now modify ζ using Lemma 5.12 to zero out
the mass on the “bad sets” E(ζ , ANDR ◦ ORN ) and W(ζ , f ), thereby constructing a threshold-degree dual
witness µ for f . Specifically, let B = W(ζ , f ) ∪ E(ζ , ANDR ◦ ORN ). By Expression (7.23), and the fact
                            ≤d
that f (x) = 1 for all x ∈ HRN , it holds that
                                                             ≤d
                                            B ⊆ {−1, 1}RN \ HRN .                                        (7.25)

Hence, for every x ∈ B, we can invoke Lemma 5.12 with D = d to obtain a function φx having Proper-
ties (5.1)-(5.4). For every x ∈ B, define ψcorr,x := ζ (x) · φx . Our final dual witness µ is

                                             µ := ζ − ∑ ψcorr,x .                                        (7.26)
                                                       x∈B


Analysis of µ: A brief overview. Since the correction terms in the definition of µ are specifically
designed to have pure high degree d and zero out all of the “bad” mass of ζ , all that remains in order to
show that µ witnesses a degree d lower bound on the threshold degree of f is show that the correction
terms do not disturb the sign of any inputs in their support. To accomplish this, we start by observing
that the total `1 -mass of all the correction objects is at most (W (ζ , f ) + E(ζ , ANDR ◦ ORN )) · 2d · RN
                                                                                                             
                                                                                                           d .
Recalling that R = N 1/2 and d = N 1/2 /(c log3/2 N) for a constant c of our choosing, by Properties (7.21)
                                                                                       √
and (7.22) we can set c sufficiently large to ensure that both R −d −1 > 4R/5 and c3 N log N −1 > 4R/5.
Then
                                                    
                                                    RN
                                                d
          (W (ζ , f ) + E (ζ , ANDR ◦ ORN )) · 2 ·         ≤ 2−4R/5 · (RN)d ≤ 2−3R/5 ≤ s/2 ,            (7.27)
                                                      d
                                                                             ≤d
where the final inequality holds by Property (7.24). Hence, for each x ∈ HRN    , µ(x) · f (x) ≥ ζ (x) · f (x) −
s/2 > s/2, where the final inequality holds by Property (7.23). That is, the correction terms do not disturb
the sign of any points in their support.
Analysis of µ: Details. We need to prove that µ satisfies the conditions of Theorem 5.3 with f =
(ANDR ◦ ORN )≤N+d .

    • (Pure high degree). That µ has pure high degree at least d follows from Fact 5.6 and the fact that
      ζ and each ψcorr,x have pure high degree at least d (Equations (7.20) and (5.3)).

    • (Sign-agreement). To establish sign-agreement, we consider three cases:

           – y ∈ B. In this case µ(y) = ζ (y) − ψcorr,y (y) = ζ (y) − ζ (y) = 0. Here, the first equality holds
             because B ⊆ HRN  >d
                                  (see Equation (7.25)), and for each x ∈ B \ {y}, the point y is not in the
                                                                                                      ≤d
             support of ψcorr,x (as Equation (5.2) states that the support of ψcorr,x is a subset of HRN ∪ {x}).

                         T HEORY OF C OMPUTING, Volume 17 (7), 2021, pp. 1–46                                26
                             T HE L ARGE -E RROR A PPROXIMATE D EGREE OF AC0

                      ≤d
         – y 6∈ (B ∪ HRN ). In this case, µ(y) · f (y) = ζ (y) · f (y) ≥ 0. The first equality holds because y 6∈
                                ≤d
           B, and since y 6∈ HRN   , y is not in the support of any ψcorr,x for any x ∈ B (see Equation (5.2)).
                ≤d
         – y ∈ HRN . In this case,

                   µ(y) = ζ (y) − ∑ ψcorr,x (y) ≥ ζ (y) − ∑ ψcorr,x (y) ≥ ζ (y) − ∑ kψcorr,x k1
                                  x∈B                      x∈B                        x∈B
                                                                            
                                                                        d   RN
                         ≥ ζ (y) − (E (ζ , ANDR ◦ ORN ) +W (ζ , f )) · 2 ·     > s − s/2 > 0 .           (7.28)
                                                                             d
              Here, the third to last inequality follows from the definition of ψcorr,x and Expression (5.4),
              and the penultimate inequality follows by Expressions (7.23) and (7.27).
    • (Appropriate support). The first case considered in the sign-agreement analysis above also
      establishes the appropriate support condition, since it shows that µ(x) = 0 for all x ∈ B, and by
      definition of B, HRN
                        >N+d
                             ⊆ B.
    • (Non-triviality). The third case considered in the sign-agreement analysis also establishes non-
                                              ≤d
      triviality, since µ(x) > 0 for all x ∈ HRN .

    By Theorem 5.3, µ witnesses the fact that for R = N 1/2 ,
                                                               
                       deg± (ANDR ◦ ORN )≤N+d = Ω N 1/2 / log3/2 N .

For any t > 0,                                               
                                         deg± (ANDR ◦ ORN )≤t
is clearly nondecreasing with N. Hence,
                                                                  
                        deg± (ANDR ◦ ORN+d )≤N+d = Ω N 1/2 / log3/2 N .

Since d = o(N), setting N 0 = N + d implies that
                                                0
                                                                         
                       deg± (ANDR ◦ ORN 0 )≤N = Ω (N 0 )1/2 / log3/2 (N 0 ) ,

as desired.


8    The large-error approximate degree of AC0 is nearly linear
Theorem 8.1. For any constant δ > 0, there is a function h : {−1, 1}n → {−1, 1} computed by an AC0
                                   f (h) = Ω̃(n1−δ ), for some ε = 1 − 2−Ω̃(n1−δ ) .
circuit of depth O(1/δ ) such that degε

    Theorem 8.1 is a consequence of the following hardness amplification result, which shows how to
transform any function f that is hard to approximate by low-degree polynomials into a function F that is
even harder to approximate, in terms of both the degree and the error that is achievable by polynomials of
said degree.

                         T HEORY OF C OMPUTING, Volume 17 (7), 2021, pp. 1–46                                 27
                                             M ARK B UN AND J USTIN T HALER

                                                                                   c
Theorem 8.2. Fix constants k ≥ 1, c ≥ 0. Let fn : {−1, 1}n log n → {−1, 1} be any (infinite family
                         f ( fn ) ≥ Ω nk/(k+1) for some ε = 1 − 2−Ω(nk/(k+1) ) and sufficiently large
                                              
of) functions satisfying degε
                                                                                                c+O(1)
n. Then there is an explicitly given sequence of functions Fn : {−1, 1}n log           n → {−1, 1} such that
                                                     (k+1)/(k+2)
f (Fn ) ≥ Ω n(k+1)/(k+2) for some ε = 1 − 2 (  −Ω   n            ) and sufficiently large n. Moreover, if f is
                          
deg ε                                                                                                      n
computed by polynomial-size circuits of depth ∆, then Fn is computed by polynomial-size circuits of depth
at most ∆ + O(1).

Proof of Theorem 8.1 assuming Theorem 8.2. SURJN 1/2 ,N is a function on n = Θ(N log N) input bits,
computed by a polynomial-size circuit of depth 3, and Corollary 7.1 implies that deg± (SURJN 1/2 ,N ) =
Ω̃(N 1/2 ). Applying Theorem 8.2 to f = SURJN 1/2 ,N with k = 1 yields a function F1 on n·polylog(n) inputs
                                                                                        f (F1 ) ≥ Ω(n2/3 )
that is computed by a polynomial-size circuit of depth 3 + O(1) = O(1), and satisfies deg                      ε
                             2/3
for some ε = 1 − 2−Ω(n ) . Applying Theorem 8.2 yet again, with f = F1 and k = 2 yields a function F2
in AC0 that is also defined on n · polylog(n) inputs, is computed by a polynomial-size circuit of depth
                     f (F2 ) ≥ Ω(n3/4 ) for some ε = 1 − 2−Ω(n3/4 ) .
O(1), and satisfies deg ε
    In general, for any constant δ ∈ (0, 1), let k be a constant such that 1−δ ≤ k/(k +1), i. e., k ≥ 1/δ −1.
Then iteratively applying Theorem 8.2 k − 1 times, starting with f = SURJN 1/2 ,N , yields a function h
computed by an AC0 circuit of depth O(k) = O(1/δ ), defined on n0 = n logO(k) n = Õ(n) input bits, such
     f (h) = Ω(n1−δ ) for some ε = 1 − 2−Ω(n1−δ ) . Theorem 8.1 follows.
that degε

Proof of Theorem 8.2. Theorem 5.19 implies that, in order to prove Theorem 8.2, it is sufficient to
identify a block-composed function G defined on poly(n) inputs that is computed by a polynomial-size
circuit of depth ∆ + O(1) such that for some ` = n · polylog(n), we have degε (G≤` ) ≥ Ω(n(k+1)/(k+2) )
                        (k+1)/(k+2)
for some ε = 1 − 2−Ω(n              ) . Indeed, if we accomplish this, then Theorem 8.2 follows by setting
F =G . prop

    To define G, we need the following lemma, which follows from the techniques of [2] (see [28] for an
exposition).
Lemma 8.3. There exists a Boolean circuit Cn : {−1, 1}n → {−1, 1} with n inputs, depth 3, and size
Õ(n2 ) satisfying the following two conditions:

    • Cn (x) = 1 for all x of Hamming weight at most n/3.

    • Cn (x) = −1 for all x of Hamming weight at least 2n/3.

   We refer to the function computed by the circuit C of Lemma 8.3 as GAPMAJ, short for a gapped
majority function (such a function is sometimes also called an approximate majority function).8 We
remark that while the circuit C from Lemma 8.3 is not explicitly constructed, explicit constructions of
AC0 circuits satisfying the two bulleted conditions of Lemma 8.3 are known [49].
Definition of G. Let t = n1/(k+2) , z = n(k+1)/(k+2) , r = 10 log n, and m = n2/(k+2) . Let N = t · Z · r · M,
where Z = z logc z, and M = 10n logc+4 n. We define G : {−1, 1}N → {−1, 1} to equal GAPMAJt ◦ fz ◦
    8 In prior related work [9], GAPMAJ referred to the promise function that equals 1 for all inputs x of Hamming weight at

most n/3, equals −1 for all inputs x of Hamming weight at least 2n/3, and is undefined otherwise. In contrast, we use GAPMAJ
to refer to any total function in AC0 that agrees with the partial function from [9] at all points in the partial function’s domain.


                            T HEORY OF C OMPUTING, Volume 17 (7), 2021, pp. 1–46                                                 28
                           T HE L ARGE -E RROR A PPROXIMATE D EGREE OF AC0

ANDr ◦ ORM . Here, fz denotes the function f on Z = z logc z variables whose existence is assumed by
the hypothesis of the theorem.
    We now begin the process of constructing a dual witness µ showing that deg  f (G≤10n logc+4 n ) =
                                                                                   ε
                                      (k+1)/(k+2)
Ω n(k+1)/(k+2) for some ε = 1 − 2−Ω(n             ).
               

    For any appropriate constant c > 0, let φ 0 : {−1, 1}Z → R be a dual witness for the fact that deg
                                                                                                   f ( fz ) ≥
                                                                                                      ε
C·z k/(k+1) =C·n  k/(k+2)                                                                     0
                          := d (for some constant C) as per Theorem 5.5. Then φ := φ /kφ k1 has the
following properties.

                                                    kφ k1 = 1 .                                        (8.1)
                                       φ has pure high degree at least d .                             (8.2)
                              ∑        φ (x) · fz (x) ≥ 1 − δ 0 for some δ 0 = 2−Ω(d) .                (8.3)
                           x∈{−1,1}Z

   Similar to the proof of Theorem 3.1, let ψ be the dual witness for ORM from Proposition 5.14 with
M = 10n logc+4 n, T = m, and δ = 1/4, and let
                                     
                                     −1/2 if w = (−1, −1, . . . , −1)
                                     
                             Φr (w) = 1/2        if w = (1, 1, . . . , 1)
                                     
                                        0       otherwise.
                                     

    We remind the reader that Equations (7.1)-(7.3) from the proof of Theorem 3.1 guarantee that Φr ? ψ
has the following properties.
                                             kΦr ? ψk1 = 1 .                                      (8.4)
                                                                                       
     Φr ? ψ has pure high degree at least that of ψ, which is D0 := Ω m1/2 = Ω n1/(k+2) .         (8.5)

                                 E (Φr ? ψ, ANDr ◦ ORM ) ≤ 2−r = 1/n10 .                               (8.6)
     We now combine φ and Φr ? ψ to obtain an intermediate dual witness γ, which will witness the fact
                                                √
that deg 1−2−Ω(d) ( f z ◦ ANDr ◦ ORM ) = Ω(d · m). The combining technique that we use is precisely the
       f
one introduced by Sherstov [41, Theorem 6.1] to establish a direct-sum theorem for approximate degree.
Roughly speaking, [41, Theorem 6.1] showed that for any Boolean functions f and g, if deg      f (f) ≥ d
                                                                                                  δ1
      f (g) ≥ d 0 , then deg
and deg                      f ( f ◦ g) = Ω(d · d 0 ) for some δ1 ≈ δ2 . The γ that we construct below is (a
         1/n                     δ2
normalized version of) the dual witness constructed in the proof of [41, Theorem 6.1] when applied with
outer function fz and inner function g = ANDr ◦ ORM .
     For a fixed even integer j ∈ (d/2 − 2, d/2], let Pj be the degree j univariate polynomial given by
             j
Pj (a) = ∏i=1   (a − i), and define p j : {−1, 1}Z → R to be the unique multilinear polynomial such that
p j (x) = Pj (|x|). We will need the following properties of p j .
                                                                                             
                                                                                        Z+ j
        The sum of the absolute values of the Fourier coefficients of p j is at most j!         .     (8.7)
                                                                                          j
                                                                                       1
                     For any c ∈ [1 − 1/Z 2 , 1], it holds that p j (c, . . . , c) ≥     j! ≥ 1 .      (8.8)
                                                                                       2

                       T HEORY OF C OMPUTING, Volume 17 (7), 2021, pp. 1–46                               29
                                                M ARK B UN AND J USTIN T HALER

    Property (8.7) is precisely Lemma 3.1 (specifically, Property 3.2) of [41]. To see that Property (8.8)
holds, first observe that p j (x) ≥ 0 for all x ∈ {−1, 1}Z , owing to the fact that j is even. Second, by
the multilinearity of p j , the value of p j (c, . . . , c) is the expected value of p j under the distribution over
x ∈ {−1, 1}Z in which each coordinate i of x is chosen independently from {−1, 1} such that the expected
value of xi is c. Denoting this distribution as Bc , Prx∼Bc [x = 1Z ] ≥ 1/2. Hence,
                                                                                            1
                                      p j (c, . . . , c) ≥ Pr [x = 1Z ] · p j (1Z ) ≥         j! .                          (8.9)
                                                            x∼Bc                            2


    Let δ 00 = 2 · E(ANDr ◦ ORM , Φr ? ψ) ≤ 2 · 2−r = 2/n10 . For y ∈ {−1, 1}r·M , define9 :
                          
                          1
                                       if (ANDr ◦ ORM )(y) = sgn((Φr ? ψ)(y)) = +1
                    α(y) = 1 − 2δ  00   if (ANDr ◦ ORM )(y) = sgn((Φr ? ψ)(y)) = −1
                          
                            −1          otherwise.
                          

    For an input x to fz ◦ ANDr ◦ ORM , write x = (x1 , . . . , xZ ) with each xi ∈ {−1, 1}r·M . Define:
                              γ 0 (x1 , . . . , xZ ) = p j (α(x1 ), . . . , α(xZ )) · (φ ? Φr ? ψ) (x),
and let γ = γ 0 /kγ 0 k1 .
   We observe that γ 0 has the following properties.

                                            kγ 0 k1 = p j (1 − 2δ 00 , . . . , 1 − 2δ 00 ) > 0 .                           (8.10)
                                        0                                                          0
                               γ has pure high degree at least (d/2) · D .                            (8.11)
                                                                                   2(δ 00 ) j+1
                                                                                                     
               0                                             00
                                                                             0                    n
     ∑ γ (x) · ( fz ◦ ANDr ◦ ORM ) (x) ≥ p j . . . , 1 − 2δ , . . . · (1 − 2δ ) − (1 − δ 00 )n · j + 1 .
 x∈{−1,1}Z·r·M
                                                                                                      (8.12)
Property (8.10) is precisely Claim 6.2 from [41]. Equation (8.11) is immediate from [41, Equation 6.7],
combined with the fact that Φr ? ψ has pure high degree at least D0 (Equation (8.5)) and φ has pure high
degree at least d (Equation (8.2)). Property (8.12) is precisely Claim 6.3 from [41].
    Recall that we defined γ = γ 0 /kγ 0 k1 . Properties (8.10)-(8.12) imply that γ satisfies the following
conditions:

                                                               kγk1 = 1 .                                                  (8.13)
                                       γ has pure high degree at least (d/2) · D0 .                                        (8.14)

                                                                                       2(δ 00 ) j+1
                                                                                                          
                                                                                                         n
                       ∑         γ(x) · ( fz ◦ ANDr ◦ ORM ) (x) ≥ (1 − 2δ 0 ) −                     ·
                 x∈{−1,1}Z·r·M
                                                                                       (1 − δ 00 )n     j+1
                                                                                          
                                                                         ≥ 1 − 2δ 0 − 2−3d = 1 − 2−Ω(d) .                  (8.15)
   9 Our definition of α specializes the definition in [41, Page 39] to our setting: our definition and the definition in [41, Page

39] coincide owing to the fact that (as established in [14, Theorem 1]) (Φr ? ψ)(y) < 0 =⇒ (ANDr ◦ ORM ) (y) < 0.


                            T HEORY OF C OMPUTING, Volume 17 (7), 2021, pp. 1–46                                                30
                              T HE L ARGE -E RROR A PPROXIMATE D EGREE OF AC0

In Property (8.15), the penultimate inequality holds because δ 00 ≤ 2 · 2−r = 2/n10 , and j > d/2 − 2.
The final steps in the construction of µ. Next, define ζ : {−1, 1}t·Z·r·M → R via:

                                            ζ = Φt ? γ.
                                c+4
                                     
    For every x ∈ W ζ , G≤10n log n , invoke Lemma 5.12 with D = C0 · n(k+1)/(k+2) (for a sufficiently
small constant C0 to be chosen later) to obtain a function φx satisfying Conditions (5.1)-(5.4), and define
ψcorr,x := ζ (x) · φx . We define our final dual witness to be µ := ζ − ∑x∈Wζ ,G≤10n logc+4 n  ψcorr,x .

Analysis of ζ and µ. We observe that ζ has the following properties.

                                                          kζ k1 = 1 .                                                  (8.16)

                ζ has pure high degree at least that of γ, which is at least D00 := (d/2) · D0 .                       (8.17)


                           ∑           ζ (x) · (GAPMAJt ◦ fz ◦ ANDr ◦ ORM ) ≥ 1 − 2−Ω(d·t) .                           (8.18)
                     x∈{−1,1}t·Z·r·M
                                                         c+4                 (k+1)/(k+2) log n)
                                       W (ζ , G≤10n log        n
                                                                   ) ≤ 2−ω(n                      .                    (8.19)

Justification for Properties (8.16)-(8.19). Equation (8.16) follows from Proposition 5.17 (see Equa-
tion (5.15)), and the facts that kΦt k = 1 and kγk = 1 (see Equation (8.13)), and γ has pure high degree at
least 1. Equation (8.17) follows from Proposition 5.17 (see Equation (5.16)), the fact that Φt has pure
high degree 1, and the fact that γ has pure high degree at least (d/2) · D0 (see Equation (8.14)). The
validity of Inequality (8.18) is established via a standard, but somewhat lengthy, analysis that we defer to
Section 8.1.
    We justify Inequality (8.19) as follows. Denote an input x in {−1, 1}t·Z·r·M as x = (. . . , xi,s , . . . ) where
i ranges over 1, . . . ,t, s ranges over 1, . . . , Z, and each xi,s ∈ {−1, 1}r·M . Since p j is non-negative on all
inputs in [−1, 1]Z (this follows from the same reasoning as in the proof of Property (8.8)),

                                                               −t     t
            |ζ (x)| = (Φt ? φ ? Φr ? ψ) (x) · kγ 0 k1                · ∏ p j (α(xi,1 ), . . . , α(xi,Z ))
                                                                      i=1
                                                                                        t
                   = (Φt ? φ ? Φr ? ψ) (x) · p j (. . . , 1 − 2δ 00 , . . . )−t · ∏ p j (α(xi,1 ), . . . , α(xi,Z ))
                                                                                      i=1
                                                    t
                   ≤ (Φt ? φ ? Φr ? ψ) (x) · ∏ p j (α(xi,1 ), . . . , α(xi,Z ))
                                                   i=1

                                                Z+ j t
                                                  
                   ≤ |(Φt ? φ ? Φr ? ψ) (x)| j!
                                                 j
                   ≤ |(Φt ? Φr ? ψ) (x)| · Z O( jt)
                   ≤ |(Φt ? φ ? Φr ? ψ) (x)| · 2O(dt log n) .                                                          (8.20)


                         T HEORY OF C OMPUTING, Volume 17 (7), 2021, pp. 1–46                                             31
                                          M ARK B UN AND J USTIN T HALER

Here, the second equality holds by Equation (8.10), the first inequality holds by Property (8.8), and
the second inequality follows from Property (8.7) and the fact that |α(xi,s )| ≤ 1 for all i = 1, . . . ,t and
s = 1, . . . , Z.
    Now invoke Lemma 5.18 (and associativity of dual block composition, see Equation (5.17)) to
conclude that
                                                                              c+4
                                                                                   
                  W (Φt ? φ ? Φr ) ? ψ, (GAPMAJt ◦ fz ◦ ANDr ◦ ORM )≤10n log n ≤
                                                                    2     √                 (k+1)/(k+2) log2 n)
                                                       2−Ω(n log n/ m) ≤ 2−Ω(n                                    .           (8.21)

    Combining Inequalities (8.20) and (8.21), we conclude that
                                                 c+4
                                                              (k+1)/(k+2) log2 n
      W ζ , (GAPMAJt ◦ fz ◦ ANDr ◦ ORM )≤10n log n ≤ 2−Ω(n                       ) · 2O(dt log n)
                                                                               (k+1)/(k+2) log2 n
                                                                        ≤ 2−Ω(n                  ) · 2O(n(k+1)/(k+2) log n)
                                                                               (k+1)/(k+2) log2 n
                                                                        ≤ 2−Ω(n                  ),

where the final inequality holds for large enough n.
   Finally, we observe the following properties of µ.
                                                                    (k+1)/(k+2) )
                                             kµk1 ≤ 1 + 2−ω(n                       .                                         (8.22)

                                                                                 
                     µ has pure high degree at least min(D00 , D) = Ω n(k+1)/(k+2) .                                          (8.23)

                                                                                             (k+1)/(k+2)
                            ∑            µ(x) · G(x) ≥ 1 − 2−Ω(t·d) = 1 − 2−Ω(n                            ).                 (8.24)
                       x∈{−1,1}t·Z·r·M

                                                                        >10n logc+4 n
                                         µ(x) = 0 for all x ∈ Ht·Z·r·M                  .                                     (8.25)

Justification for Properties (8.22)-(8.25). For Inequality (8.22), observe that

                             kµk1 ≤ kζ k1 +               ∑                   kψcorr,x k1
                                                                c+4 n
                                                 x∈W(ζ ,G≤10n log         )
                                                                                 
                                                     ≤10n logc+4 n      t ·Z ·r·M
                                                                              D
                                    ≤ 1 +W (ζ , G                )·2 ·
                                                                             D
                                          −ω (n(k+1)/(k+2) log n)
                                    ≤ 1+2                         ,

where the penultimate inequality holds by Expressions (5.4) and (8.16), and the final inequality holds by
Inequality (8.19).
    Equation (8.23) follows from Fact 5.6, Equation (7.20), and the fact that each term ψcorr,x has pure
high degree at least D (Equation (5.3)).

                        T HEORY OF C OMPUTING, Volume 17 (7), 2021, pp. 1–46                                                     32
                                  T HE L ARGE -E RROR A PPROXIMATE D EGREE OF AC0

      Expression (8.24) holds because

                     ∑           µ(x) · G(x) ≥          ∑          ζ (x) · G(x) −              ∑             kψcorr,x k1
               x∈{−1,1}t·Z·r·M                   x∈{−1,1}t·Z·r·M
                                                                                                        
                                                                                                     c+4
                                                                                     x∈W ζ ,G≤10n log n
                                                                           (k+1)/(k+2) )
                                             ≥ 1 − 2−Ω(td) − 2−ω(n                         ≥ 1 − 2−Ω(td) ,

where the penultimate inequality invokes Expressions (5.4) and (8.19).
                                                                       >10n logc+4 n
    Expression (8.25) holds because for any x ∈ Ht·Z·r·M            , we have µ(x) = ζ (x) − ψcorr,x (x) =
                0. Here, the first equality holds because Property (5.2) ensures that, for any x0 ∈
    − ζ (x) = c+4
ζ (x)
W ζ , G≤10n log n such that x 6= x0 , we have ψcorr,x0 (x) = 0.
    It is immediate from Properties (8.22)-(8.25) that µ satisfies the conditions required by Theorem 5.5
                         f (G≤10n logc+4 n ) = Ω(n(k+1)/(k+2) ) for some ε = 1 − 2−Ω(n(k+1)/(k+2) ) .
to witness the fact that degε


8.1     Justification for Expression (8.18)
All notation in this section the same as in the proof of Theorem 8.2. Let F = fz ◦ ANDr ◦ ORM . Recall
that ζ = Φt ? γ, where γ has the following properties.

                                                            kγk1 = 1 .                                                     (8.26)

                                     γ has pure high degree at least (d/2) · D0 ≥ 1 .                                      (8.27)

                                             ∑          γ(y) · F(y) ≥ ε := 1 − 2−Ω(d) .                                    (8.28)
                                        y∈{−1,1}Z·r·M

Equation (8.27) implies that

                                         ∑            |γ(y)| =         ∑            |γ(y)| = 1/2 .                         (8.29)
                                    y : sgn(γ(y))=1              y : sgn(γ(y))=−1


Let E1 (γ, F) = ∑y : γ(y)>0 and F(y)<0 |γ(y)| and E−1 (γ, F) = ∑y : γ(y)<0 and F(y)>0 |γ(y)|. Combining Expres-
sions (8.28) and (8.29) implies that E1 (γ, F) and E−1 (γ, F) ≤ 2−Ω(d) .
                                                                    t
       Let τ be the product distribution on {−1, 1}Z·r·M given by τ(x1 , . . . , xt ) = ∏ti=1 |γ(xi )|. Given a
string b = (b1 , . . . , bt ) ∈ {−1, 1}t , let τb be the distribution over x ∈ {−1, 1}t·Z·r·M equal to τ conditioned
on (. . . , sgn(xi ), . . . , ) = b. Equation (8.29) implies that when x = (x1 , . . . , xt ) is drawn from τ, the string
(. . . , sgn(γ(xi ), . . . ) is uniformly distributed in {−1, 1}t .
       Moreover, for any given b ∈ {−1, 1}t , the following two random variables are identically distributed:

      • The string (. . . , F(xi ), . . . ) when one chooses (. . . , xi , . . . ) from the conditional distribution τb .

      • The string (. . . , δi bi , . . . ), where δ ∈ {−1, 1}t is a random string whose ith bit independently takes
        on value −1 with probability 2 ∑x∈Eb |ψ(x)| < 2−Ω(d) .
                                                        i



                            T HEORY OF C OMPUTING, Volume 17 (7), 2021, pp. 1–46                                              33
                                           M ARK B UN AND J USTIN T HALER

      Thus,

                                                                  ∑          ζ (x) · GAPMAJ(. . . , F(xi ), . . . )
                                                           x∈{−1,1}t·Z·r·M

                                             = 2t ·        ∑      Φt (b) · Ex∼τb [GAPMAJ(. . . , F(xi ), . . . )]
                                                      b∈{−1,1}t
                                                                                                      
                  t1                                           1
              =2      Ex∼τ1t [GAPMAJ(. . . , F(xi ), . . . )] − Ex∼τ−1t [GAPMAJ(. . . , F(xi ), . . . )
                   2                                           2
               1                  (1)           (1)      1                    (−1)            (−1)
              = Eδ (1) [GAPMAJt (δ1 , . . . , δt )] − Eδ (−1) [GAPMAJt (−δ1 , . . . , −δt )] ,
               2                                         2

where for j ∈ {−1, 1}, δ ( j) ∈ {−1, 1}t is a random string whose ith coordinate takes value −1 with
probability 2 ∑x∈E j |ψ(x)| < 2−Ω(d) .
                                     (1)        (1)
    Note that Eδ (1) [GAPMAJt (δ1 , . . . , δt )] ≥ 1 − 2−Ω(td) : since each bit of δ (1) equals 1 with prob-
ability 1 − 2−Ω(d) , the probability that more than t/3 of the coordinates of δ (1) are −1 is at most
  t
     −Ω(td)                                                  (−1)         (−1)
 t/3 · 2      = 2−Ω(td) . Similarly, Eδ (−1) [GAPMAJt (−δ1 , . . . , −δt )] ≤ −1 + 2−Ω(td) . We con-
clude that the correlation of ζ with GAPMAJt ◦ F is at least 1 − 2−Ω(td) as claimed.


9     Algorithmic and complexity-theoretic applications
9.1     Complexity measures
Throughout this section, for a function F : {−1, 1}n × {−1, 1}n → {−1, 1}, we also view F as a 2n × 2n
matrix whose (x, y)’th entry is given by F(x, y).
Approximate rank. For a matrix F ∈ {−1, 1}N×N , the ε-approximate rank of F, denoted rankε (F), is
the least rank of a matrix A ∈ RN×N such that |Ai j − Fi j | ≤ ε for all (i, j) ∈ [N] × [N]. Approximate rank
is a fundamental notion in learning theory and communication complexity. Meanwhile, the sign-rank of
F is the least rank of a matrix A ∈ RN×N such that |Ai j − Fi j | < 1 for all (i, j) ∈ [N] × [N]. Clearly, for
any F, the exact (not just approximate) rank of F is at most N.
Discrepancy. The discrepancy of F, denoted disc(F), is a combinatorial measure of the complexity of F
that roughly captures F’s correlation with combinatorial rectangles (small discrepancy corresponds to
high complexity).

Definition 9.1. A combinatorial rectangle of X ×Y is a set of the form A × B with A ⊆ X and B ⊆ Y . For
a distribution µ over X ×Y , the discrepancy of F with respect to µ is defined to be the maximum over all
rectangles R of the bias of F on R. That is:


                                     discµ (F) = max           ∑ µ(x, y)F(x, y) .
                                                       R
                                                            (x,y)∈R


The discrepancy of F, disc(F) is defined to be minµ discµ (F).

                          T HEORY OF C OMPUTING, Volume 17 (7), 2021, pp. 1–46                                        34
                                 T HE L ARGE -E RROR A PPROXIMATE D EGREE OF AC0

      It is known that for any function F : {−1, 1}n × {−1, 1}n → {−1, 1}, the discrepancy of F is at least
2−O(n) . Discrepancy plays a central role in communication complexity because it characterizes PPcc ,
the class of communication problems efficiently solvable by small-bias protocols [26]. Discrepancy is
also important in circuit complexity, where it lower bounds the size of Majority-of-Threshold circuits
computing F [23, 24, 34, 39]. Finally, discrepancy is known to be equivalent (up to a constant factor) to
margin complexity [32], which is itself a fundamental notion in learning theory.
Threshold weight. The weight of an n-variate polynomial p is the sum of the absolute value of its coeffi-
cients. The length of p is the number of non-zero coefficients of p. The threshold weight (respectively,
length) of F : {−1, 1}N × {−1, 1}N → {−1, 1} is the least weight (respectively, length) of a polynomial
p with integer coefficients such that p(x, y) · F(x, y) > 0 for all (x, y) ∈ {−1, 1}N × {−1, 1}N (note that
no restriction is placed on the degree of p). The threshold weight of F is always at most 2O(n) , since
Parseval’s inequality implies that every Boolean function is always exactly computed by a polynomial of
weight 2O(n) .

9.2     Nearly optimal bounds on discrepancy, threshold weight, and more
Via well-known techniques, we now translate our approximate-degree lower bounds into approximate-
rank, discrepancy, and threshold-weight bounds in a black-box manner.
    We start with the following theorem (from which Theorem 1.3 from Section 1.1.2 follows).

Theorem 9.2. For any constant δ > 0, there is an AC0 function F : {−1, 1}n × {−1, 1}n → {−1, 1} with
                                                                                                1−δ
discrepancy at most exp(−Ω(n1−δ )) and ε-approximate rank exp(Ω(n1−δ )) for some ε = 1 − 2−Ω(n ) .
                                                                                                                               1−δ
Proof. Let f be the AC0 function with ε-approximate degree at least n1−δ for some ε = 1 − 2−Ω(n )
whose existence is guaranteed by Theorem 8.1. The pattern matrix method [40, Theorem 8.1] implies
that for some C = O(1), the function F : {−1, 1}Cn × {−1, 1}Cn → {−1, 1} given by

                                 F(x, y) = f . . . , ∨Cj=1 (xi, j ∧ yi, j ) . . .
                                                                                  

                                                                               1−δ
satisfies rankε 0 (F) ≥ exp(Ω(n1−δ )), for some ε 0 = 1 − 2−Ω(n ) . Moreover, by [40, Theorem 7.3] F
also satisfies disc(F) ≤ exp(−Ω(n1−δ )).10 Moreover, if f is computed by a Boolean circuit of depth k
and polynomial size, then F is computed by a Boolean circuit of polynomial size and depth k + 2. This
completes the proof of the theorem for approximate rank and discrepancy.
    Meanwhile, a result of Krause [29] implies that the following function F 0 on 3n inputs has threshold
             1−δ                                                                            1−δ
weight 2Ω(n ) : F 0 (x, y, z) = f (. . . , (xi ∧ zi ) ∨ (yi ∧ zi ), . . . ) is at least 2Ω(n ) .11 Clearly, since f is
computed by a Boolean circuit of depth k and polynomial size, F 0 is computed by a Boolean circuit of
polynomial size and depth k + 2.
  10 [40, Theorem 7.3] is actually expressed in terms of the degree d threshold weight of f , rather than the ε-approximate

degree of f , but our lower bound on the ε-approximate degree of f is easily seen to imply the lower bound on the degree d
threshold weight of f required to apply [40, Theorem 7.3] (see, e. g., [14, Lemma 20]).
   11 As in Footnote 10, Krause’s result is expressed in terms of the degree d threshold weight of f , rather than the ε-approximate

degree of f , but our lower bound on the ε-approximate degree of f is easily seen to imply the lower bound on the degree d
threshold weight of f required to apply Krause’s result (see, e. g., [14, Lemma 20]).


                            T HEORY OF C OMPUTING, Volume 17 (7), 2021, pp. 1–46                                                 35
                                         M ARK B UN AND J USTIN T HALER

    Via established applications of discrepancy, we conclude from Theorem 9.2 that for any constant
δ > 0, there is an AC0 function F with margin complexity exp(Ω(n1−δ )) [32], Majority-of-Threshold
circuit size exp(Ω(n1−δ )) [34, 39], and PPcc communication complexity Ω(n1−δ ) [26]. These bounds
are all nearly tight, in the sense that any function has margin complexity 2O(n) , is computed by Majority-
of-Threshold circuits of size 2O(n) , and has PPcc communication complexity at most n.

9.3     Lower bounds for sign-rank and threshold length
9.3.1    Threshold length
Theorem 9.3. There is a function F : {−1, 1}n → {−1, 1} computed by a polynomial-size circuit of
depth 3 and logarithmic bottom fan-in such that the threshold length of F is exp(Ω(n1/2 )).

Proof. Recall (Corollary 7.1) that for n = N log(N 1/2 ), we have SURJN 1/2 ,N : {−1, 1}n → {−1, 1}, and
deg± (SURJN 1/2 ,N ) = Ω(n1/2 ). Moreover, SURJN 1/2 ,N is computed by a quadratic-size circuit of depth 3,
with an AND gate at the top, and logarithmic bottom fan-in.
    Krause and Pudlák showed that if deg± ( f ) ≥ d, then the function F : {−1, 1}3n → {−1, 1} defined
via F(x, y, z) := f (. . . , (xi ∧ zi ) ∨ (yi ∧ zi ), . . . ) has threshold length 2Ω(d) . If f = SURJ, then F is clearly
computed by a polynomial-size circuit of depth 5, with an AND where gates at the bottom two layers
(just above the inputs) have fan-in 2, and gates at the third-to-bottom layer have fan-in O(log n). Hence,
each gate at the third layer from the bottom computes a function of just O(log n) inputs, and any such
function can be computed by a polynomial-size DNF and logarithmic width. Replacing each gate at the
third-to-bottom layer with an equivalent polynomial-size DNF of polynomial size and logarithmic width,
and collapsing the two adjacent layers of OR gates, yields a depth-three circuit of logarithmic bottom
fan-in.

9.3.2    Sign-rank
Recall that Corollary 1.2 from the Introduction asserted the existence of an AC0 function

                                    F(x, y) : {−1, 1}n × {−1, 1}n → {−1, 1}

such that the sign-rank of the matrix [F(x, y)] is exp(Ω̃(n1/2 )). We prove Corollary 1.2 below.
   Via established applications of sign-rank, we conclude as a consequence that there is a communication
problem in AC0 with UPPcc communication complexity Ω̃(n1/2 ) [36], and such that all Threshold-of-
                                                          1/2
Majority circuits computing the function have size 2Ω̃(n ) [22].

Proof of Corollary 1.2. Suppose that deg± ( f ) ≥ d. In order to establish sign-rank lower bounds for a
certain matrix A derived from f , Razborov and Sherstov [38] extended a lemma of Forster [21] to show
that it is enough to construct a dual witness µ for the fact that deg± ( f ) ≥ d that additionally satisfies a
smoothness condition. Specifically, to show that the sign-rank of A is exp(Ω(d)), it suffices to show that
there is a threshold degree dual witness µ for f satisfying µ(x) = exp(−O(d)) for all but an exp (−O(d))
fraction of inputs in x ∈ {−1, 1}n . Formally, we have the following theorem, which is implicit in [38]
(the statement here is taken from [15, Theorem 4.1]).

                          T HEORY OF C OMPUTING, Volume 17 (7), 2021, pp. 1–46                                       36
                           T HE L ARGE -E RROR A PPROXIMATE D EGREE OF AC0

Theorem 9.4 (Implicit in [38, Theorem 1.1]). Let h : {−1, 1}n → {−1, 1} be a Boolean function, and
suppose there exists a function τ : {−1, 1}n → {−1, 1} of pure high degree at least d such that τ(x)·h(x) ≥
0 for all x ∈ {−1, 1}n , and kτk1 = 1. Moreover, suppose that τ(x) ≥ 2−cd · 2−n for all but a 2−cd
fraction of inputs x ∈ {−1, 1}n . Then there exists a constant C (depending only on c) such that if
F(x, y) := h(. . . , ∧Cj=1 (xi j ∨ yi j ), . . . ), then the matrix [F(x, y)]x,y has sign-rank exp(Ω(d)).

    Let R = N 1/2 , and assume R + 1 is a power of 2. Letting n = N log(R + 1), recall dSURJR,N is a
function on n bits, where any x ∈ {−1, 1}n is interpreted as specifying a list of N numbers from [R]0 .
Here, 0 denotes a “dummy item” that is ignored by dSURJ. We assume without loss of generality in this
section that:
                       The dummy element is represented by the string 1log(R+1) .               (9.1)

This ensures that strings x ∈ {−1, 1}n that encode mostly dummy items have low Hamming weight.
    Recall that within the proof of Theorem 1.1 and Corollary 7.1, we proved that deg± (dSURJR,N ) ≥ d
for some d = Ω(N 1/2 / log3/2 N), via a two-step process. First, we borrowed a result (Theorem 5.19)
from our prior work [18], which showed that for any range size R, deg± (dSURJR,N ) is equivalent to
deg± (ANDR ◦ ORN )≤N . Second, we constructed a dual witness µ : HNR      ≤N
                                                                             → R showing that the latter
quantity is at least d.
     Unfortunately, the construction of µ is not sufficient to apply Theorem 9.4 to dSURJR,N , for two
reasons. First, to apply Theorem 9.4 to dSURJR,N , we need to give a smooth dual witness for dSURJR,N
itself, rather than for (ANDR ◦ ORN )≤N . Note that dSURJR,N is defined over the domain {−1, 1}n where
n = N log(R + 1), while (ANDR ◦ ORN )≤N is defined over the domain HNR     ≤N
                                                                              . Second, the dual witness
                        ≤N
µ for (ANDR ◦ ORN ) constructed in the proof of Corollary 7.1 is not smooth in the sense required
by Razborov and Sherstov, as it is only “large” on inputs of Hamming weight at most d (see the first
paragraph of Section 3.2 for further discussion of this point).
     We will address both of the above issues as follows. First, we will show how to turn µ into a dual
witness σ̂ for the fact that deg± (dSURJR,N ) ≥ d, such that σ̂ inherits the “largeness” property of µ
on inputs of Hamming weight at most d. Second,    we transform σ̂ into a dual witness τ for the fact
that deg± dSURJR,N ◦ ANDlog2 n ◦ PARITYlog3 n ≥ d, such that τ satisfies the smoothness condition
required to apply Theorem 9.4.
Construction and analysis of σ̂ . In the proof of Theorem 3.1, we constructed a dual witness µ for
F = ANDR ◦ ORN satisfying the following conditions.

                                 µ(x) · F(x) ≥ 0 for all x ∈ {−1, 1}R·N .                             (9.2)

                                       µ(x) = 0 for all x ∈ HR·N
                                                             >N
                                                                 .                                    (9.3)

                                    µ has pure high degree at least d .                               (9.4)
                                                                ≤d
                                    µ(x) ≥ 2−1−R/5 for all x ∈ HNR .                                  (9.5)

                                      µ(x) has `1 -norm at most 2 .                                   (9.6)

                       T HEORY OF C OMPUTING, Volume 17 (7), 2021, pp. 1–46                             37
                                         M ARK B UN AND J USTIN T HALER

    Expressions (9.2)-(9.4) are explicitly established in the proof of Theorem 3.1 (specifically, the part
entitled “Analysis of µ”). Expression (9.5) is immediate from the penultimate inequality in Expres-
sion (7.28). Equation (9.6) holds because we defined µ = ζ − ∑x∈B ψcorr,x (Expression (7.26)), and
hence
                                                                                     
                                                                                 d   RN
         kµk1 ≤ kζ k1 + ∑ kψcorr,x k1 ≤ 1 + (E (ζ , ANDR ◦ ORN ) +W (ζ , f )) · 2 ·         ≤ 2.
                        x∈B                                                           d

    Defining µ 0 = µ/kµk1 yields a function that also satisfies Equations (9.2)-(9.4), and such that:

                                                                       ≤d
                                          µ 0 (x) ≥ 2−R/4 for all x ∈ HNR .                                       (9.7)
                                                µ 0 (x) has `1 -norm 1 .                                          (9.8)
   Let Z denote the subset of ([N]0 )R defined as follows: Z := {z ∈ ([N]0 )R : z1 + · · · + zR ≤ N}. Define
G(z1 , . . . , zR ) : Z → {−1, 1} to equal −1 if and only if zi ≥ 1 for all i = 1, . . . , R.
   We will now turn µ 0 (which is defined over domain {−1, 1}NR ) into a function over domain Z.
   Define
                              σ (z1 , . . . , zR ) =          ∑                  µ 0 (x).
                                                x=(x1 ,...,xR )∈({−1,1}N )R :|xi |=zi for all i

We claim that σ has the following properties.

                                           σ (z) · G(z) ≥ 0 for all z ∈ Z.                                        (9.9)

                                                     ∑ |σ (z)| = 1 .                                            (9.10)
                                                     z∈Z

                                       deg(q) ≤ d =⇒ ∑ σ (z) · q(z) = 0 .                                       (9.11)
                                                              z∈Z
                                                                                     R
                                σ (z) ≥ 2−R/4 for all z ∈ Z such that ∑ zi ≤ d .                                (9.12)
                                                                                    i=1
    Expression (9.9) is immediate from Expression (9.2). Equation (9.10) is immediate from Equa-
tions (9.8) and (9.2), and the fact that F(x) = F(x0 ) if |xi | = |xi0 | for i = 1, . . . , R. Equation (9.11) holds
because
                       ∑ σ (z)q(z) =          ∑            µ 0 (x) · q(|x1 |, . . . , |xR |) = 0,
                        z∈Z               x=(x1 ,...,xR )∈{−1,1}R·N
                                                                                                   R
where the last equality holds because q(|x1 |, . . . , |xR |) is a polynomial over {−1, 1}N of degree at most
deg(q). Expression (9.12) is immediate from Expression (9.7).
     Finally, we are in a position to define our desired dual witness σ̂ : {−1, 1}n → R, which will witness
the fact that deg± (dSURJR,N ) ≥ d/ log R. For an x ∈ {−1, 1}n interpreted as a sequence (s1 , . . . , sN ) in
[R]N0 , let zi (x) denote { j : s j = i} , and let z(x) = (z1 (x), . . . , zR (x)) ∈ Z. For a z∗ ∈ Z, let N(z∗ ) = |{x ∈
{−1, 1}n : z(x) = z∗ }|. Define σ̂ : {−1, 1}n → R via:
                                                          1
                                            σ̂ (x) =            · σ (z(x)) .                                    (9.13)
                                                        N(z(x))

                          T HEORY OF C OMPUTING, Volume 17 (7), 2021, pp. 1–46                                       38
                             T HE L ARGE -E RROR A PPROXIMATE D EGREE OF AC0

    We observe that σ̂ has the following properties.

                                  σ̂ (x) · dSURJ(x) ≥ 0 for all x ∈ {−1, 1}n .                             (9.14)

                                               σ̂ (x) has `1 -norm 1 .                                     (9.15)

                                        σ̂ (x) ≥ 2−R/3 for all x ∈ Hn≤d .                                  (9.16)

                                 σ̂ (x) has pure high degree at least d/ log R .                           (9.17)

    Expression (9.14) follows from Expression (9.9) and the definition of σ̂ . Equation (9.15) is immediate
from the definition of σ̂ and Equation (9.10). To see that Expression (9.16) holds, observe that if x ∈ Hn≤d ,
then Equation (9.1) implies that ∑Ri=1 zi (x) ≤ d. Hence, Expression (9.12) implies that σ (z(x)) ≥ 2−R/4 .
Moreover, for any     x ∈ Hn≤d , N(z(x)) ≤ dn (this is because for any x, x0 , if z(x) = z(x0 ), then |x| = |x0 |,
                  n
                      ≤ dn ). Combining these two inequalities with the definition of σ̂ (Equation (9.13))
                    
so N(z(x)) ≤ |x|
                                −1
yields that σ̂ (x) ≥ 2−R/4 · dn     ≥ 2−R/3 .
    Equation (9.17) follows from Proposition 9.5 below, combined with Equation (9.11).

Proposition 9.5. Let σ : : Z → R satisfy Equation (9.11), and let σ̂ be as per Equation (9.13). Then σ̂
has pure high degree at least d/ log R.

   Proposition 9.5 is essentially a dual formulation of an important lemma of Ambainis [4], as we now
explain.

Lemma 9.6 (Ambainis [4]). Let n = N log(R + 1). Let p : {−1, 1}n → {−1, 1} be any polynomial of
degree at most d. Then there is a polynomial q : [R]N0 → R of degree at most d log R such that for all
z∗ ∈ Z, q(z∗ ) = Ex : z(x)=z∗ [p(x)].

    Proposition 9.5 follows from Lemma 9.6 by the following reasoning. If p : {−1, 1}n → {−1, 1} is
polynomial of degree at most d/ log R, then

                                     ∑        σ̂ (x) · p(x) = ∑ σ (z) · q(z) = 0 ,                         (9.18)
                                  x∈{−1,1}n                  z∈Z


where the polynomial q is as in Lemma 9.6, and the second equality holds by Equation (9.11).
Construction and analysis of τ. Now that we have constructed a dual witness σ̂ for the high threshold
degree of dSURJ (captured by Equations (9.14), (9.15), and (9.17)), that additionally satisfies the extra
condition of “largeness on low-Hamming-weight inputs” (Equation (9.16)), we can turn to constructing
a dual witness for dSURJ ◦ ANDlog2 n ◦ PARITYlog3 n that satisfies the smoothness condition needed by
Razborov’s and Sherstov’s sign-rank analysis.
    Specifically, let a = log2 n, and define ψ : {−1, 1}a → R via:
                                          (
                                           −1/2 if x = −1a
                                   ψ(x) =
                                           1/(2 · (2a − 1)) otherwise.

                         T HEORY OF C OMPUTING, Volume 17 (7), 2021, pp. 1–46                                  39
                                           M ARK B UN AND J USTIN T HALER

   Clearly, ψ has pure high degree at least 1, has `1 -norm 1, and ψ(x) · ANDa (x) ≥ 0 for all x ∈ {−1, 1}a .
Consider the dual witness ζ = σ̂ ? ψ. Then ζ has the following properties:
                                                    ζ has `1 -norm 1 .                                         (9.19)
                                   ζ has pure high degree at least d/ log R .                                  (9.20)
                                                   
                          ζ (x) · dSURJ ◦ ANDlog2 n (x) ≥ 0 for all x ∈ {−1, 1}n·a .                           (9.21)
Here, Equation (9.19) follows from Equation (9.15), the fact that ψ has `1 -norm 1, and Property (5.15).
Equation (9.20) follows from Equation (9.17), Equation (5.16), and the fact that ψ has pure high degree
at least 1. Expression (9.21) follows from the definition of dual block composition, Expression (9.14),
and the fact that ψ(x) · ANDa (x) ≥ 0 for all x ∈ {−1, 1}a .
     Let S be the set of all inputs x = (x1 , . . . , xn ) ∈ ({−1, 1}a )n such that fewer than d of the xi ’s are equal
to −1a . By Expression (9.16) and the definition of dual block composition,
                       For any input x ∈ S, |ζ (x)| ≥ 2−R/3 · (2a − 1)−n ≥ 2−R/3 · 2−an .                      (9.22)
Moreover,                                                          
                                                          −ad      n
                                Pr         [x 6∈ S] ≤ 2         ·     ≤ n−Ω(d log n) ≤ 2−Ω(R) .                (9.23)
                           x∼({−1,1}a )n                           d
    Finally, letting ` = log3 n, let η : {−1, 1}` → R be defined by η(x) = 2−` · PARITY` (x). Observe
that η has `1 -norm 1, has pure high degree `, and sgn(η(x)) = sgn(PARITY` (x)) for all x ∈ {−1, 1}` .
Then Equation (5.15) implies that τ := ζ ? η has `1 -norm 1, Equation (5.16) implies that τ has pure high
degree at least (d/ log R) · ` ≥ R, and it follows from Equations (9.22) and (9.23) that
                                      Pr     a n
                                                   [|τ(x)| < 2−R/3 · 2−`·a·n ] ≤ 2−Ω(R) .                      (9.24)
                               x∼((  {−1,1}`  ))
    Hence, we can apply Theorem 9.4 to the function h = dSURJ ◦ ANDlog2 n ◦ PARITYlog3 n , which is
defined on Õ(n) variables, to obtain an AC0 function F(x, y) that is also defined on Õ(n) variables, such
that [F(x, y)]x,y has sign-rank exp(Ω̃(n1/2 )) .

9.4    Secret sharing schemes
Bogdanov et al. [7] observed that for any f : {−1, 1}n → {−1, 1} and integer d > 0, any dual polynomial
                      f ( f ) ≥ d leads to a scheme for sharing a single secret bit b ∈ {−1, 1} among n
µ for the fact that deg   ε
parties as follows. Decompose µ as µ+ − µ− , where µ+ and µ− are non-negative functions with kµ+ k1 =
kµ0 k1 = 1/2. Then in order to split b among n parties, one draws an input x = (x1 , . . . , xn ) ∈ {−1, 1}n
from the distribution 2 · µb , and gives bit xi to the ith party. In order to reconstruct b, one applies f to
(x1 , . . . , xn ).
      Because µ is ε-correlated with f , the probability of correct reconstruction if the bit is chosen at
random is at least (1 + ε)/2 (and the the reconstruction advantage, defined to equal Prx∼µ+ [ f (x) =
1] − Prx∼µ− [ f (x) = 1], is at least ε). The fact that µ has pure high degree at least d means that any subset
of shares of size less than d provides no information about the secret bit b. Hence, an immediate corollary
of Theorem 8.1 is the following.

                         T HEORY OF C OMPUTING, Volume 17 (7), 2021, pp. 1–46                                      40
                           T HE L ARGE -E RROR A PPROXIMATE D EGREE OF AC0

Corollary 9.7. For any arbitrarily small constant δ > 0, there is a secret sharing scheme that shares a
single bit b among n parties by assigning a bit xi to each party i. The scheme has the following properties.

   1. The reconstruction procedure is computed by an AC0 circuit.
                                                          1−δ )
   2. The reconstruction advantage is at least 1 − 2−Ω(n          .

   3. Any subset of shares of size less than d = Ω(n1−δ ) provides no information about the secret bit b.

    The best previous results could only guarantee security against subsets of shares of size d = Θ(n1/2 ),
or could only guarantee reconstruction advantage bounded away from 1 [7, 18]. Cheng et al. [19] recently
considered a relaxed notion of security, where even very small subsets of shares are allowed to provide
(a bounded amount of) information about the secret bit b. Under this relaxed notion of security, they
achieved perfect reconstruction and security against subsets of size Ω(n).
Acknowledgments. The authors are grateful to Robin Kothari, Nikhil Mande, Jonathan Ullman, and the
anonymous reviewers for valuable comments earlier versions of this manuscript.


References
 [1] S COTT A ARONSON AND YAOYUN S HI: Quantum lower bounds for the collision and the element
     distinctness problems. J. ACM, 51(4):595–605, 2004. [doi:10.1145/1008731.1008735] 16

 [2] M IKLOS A JTAI AND M ICHAEL B EN -O R: A theorem on probabilistic constant depth computations.
     In Proc. 16th STOC, pp. 471–474. ACM Press, 1984. [doi:10.1145/800057.808715] 28

 [3] E RIC A LLENDER: A note on the power of threshold circuits. In Proc. 30th FOCS, pp. 580–584.
     IEEE Comp. Soc., 1989. [doi:10.1109/SFCS.1989.63538] 3, 7

 [4] A NDRIS A MBAINIS: Polynomial degree and lower bounds in quantum complexity: Colli-
     sion and element distinctness with small range. Theory of Computing, 1(3):37–46, 2005.
     [doi:10.4086/toc.2005.v001a003] 39

 [5] L ÁSZLÓ BABAI , P ÉTER F RANKL , AND JANOS S IMON: Complexity classes in communication
     complexity theory (preliminary version). In Proc. 27th FOCS, pp. 337–347. IEEE Comp. Soc.,
     1986. [doi:10.1109/SFCS.1986.15] 3, 6

 [6] PAUL B EAME AND W IDAD M ACHMOUCHI: The quantum query complexity of AC0 . Quantum Inf.
     Comput., 12(7-8):670–676, 2012. [doi:10.26421/QIC12.7-8-11] 4

 [7] A NDREJ B OGDANOV, Y UVAL I SHAI , E MANUELE V IOLA , AND C HRISTOPHER W ILLIAMSON:
     Bounded indistinguishability and the complexity of recovering secrets. In Proc. 36th Ann. Internat.
     Cryptology Conf. (CRYPTO’16), pp. 593–618. Springer, 2016. [doi:10.1007/978-3-662-53015-3_21,
     ECCC:TR15-182] 4, 7, 40, 41

                       T HEORY OF C OMPUTING, Volume 17 (7), 2021, pp. 1–46                              41
                                  M ARK B UN AND J USTIN T HALER

 [8] A NDREJ B OGDANOV AND C HRISTOPHER W ILLIAMSON: Approximate bounded indistin-
     guishability.  In Proc. 44th Internat. Colloq. on Automata, Languages, and Program-
     ming (ICALP’17), pp. 53:1–11. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, 2017.
     [doi:10.4230/LIPIcs.ICALP.2017.53] 4
 [9] A DAM B OULAND , L IJIE C HEN , D HIRAJ H OLDEN , J USTIN T HALER , AND P RASHANT NALINI
     VASUDEVAN: On the power of statistical zero knowledge. SIAM J. Comput., 49(4):FOCS17:1–
     58, 2019. Preliminary version in FOCS’17. [doi:10.1137/17M1161749, ECCC:TR16-140,
     arXiv:1609.02888] 4, 10, 12, 23, 28
[10] H ARRY B UHRMAN , R ICHARD C LEVE , RONALD DE W OLF, AND C HRISTOF Z ALKA: Bounds for
     small-error and zero-error quantum algorithms. In Proc. 40th FOCS, pp. 358–368. IEEE Comp.
     Soc., 1999. [doi:10.1109/SFFCS.1999.814607, arXiv:cs/9904019] 22
[11] H ARRY B UHRMAN , N IKOLAI K. V ERESHCHAGIN , AND RONALD DE W OLF: On computation
     and communication with small bias. In Proc. 22nd IEEE Conf. on Comput. Complexity (CCC’07),
     pp. 24–32. IEEE Comp. Soc., 2007. [doi:10.1109/CCC.2007.18] 2, 3, 6
[12] M ARK B UN , ROBIN KOTHARI , AND J USTIN T HALER: The polynomial method strikes back: Tight
     quantum query bounds via dual polynomials. Theory of Computing, 16(10):1–71, 2020. Preliminary
     version in STOC’18. [doi:10.4086/toc.2020.v016a010] 4, 5, 10, 12, 13, 14, 16, 17, 18, 20, 23
[13] M ARK B UN AND J USTIN T HALER: Dual lower bounds for approximate degree and Markov–
     Bernstein inequalities. Inform. Comput., 243:2–25, 2015. Preliminary version in ICALP’13.
     [doi:10.1016/j.ic.2014.12.003] 4
[14] M ARK B UN AND J USTIN T HALER: Hardness amplification and the approximate degree of constant-
     depth circuits. In Proc. 42nd Internat. Colloq. on Automata, Languages, and Programming
     (ICALP’15), pp. 268–280. Springer, 2015. [doi:10.1007/978-3-662-47672-7_22, arXiv:1311.1616]
     4, 6, 9, 11, 23, 30, 35
[15] M ARK B UN AND J USTIN T HALER: Improved bounds on the sign-rank of AC0 . In Proc. 43rd
     Internat. Colloq. on Automata, Languages, and Programming (ICALP’16), pp. 37:1–14. Schloss
     Dagstuhl–Leibniz-Zentrum fuer Informatik, 2016. [doi:10.4230/LIPIcs.ICALP.2016.37] 3, 4, 6, 36
[16] M ARK B UN AND J USTIN T HALER: Approximate degree and the complexity of depth three
     circuits. In Proc. 22nd Internat. Workshop on Randomization and Computation (RANDOM’18), pp.
     35:1–18. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, 2018. [doi:10.4230/LIPIcs.APPROX-
     RANDOM.2018.35, ECCC:TR16-121] 4, 6
[17] M ARK B UN AND J USTIN T HALER: The large-error approximate degree of AC0 . In Proc. 23rd Inter-
     nat. Workshop on Randomization and Computation (RANDOM’19), pp. 55:1–16. Schloss Dagstuhl–
     Leibniz-Zentrum fuer Informatik, 2019. [doi:10.4230/LIPIcs.APPROX-RANDOM.2019.55] 1
[18] M ARK B UN AND J USTIN T HALER: A nearly optimal lower bound on the approximate de-
     gree of AC0 . SIAM J. Comput., 49(4):FOCS17:59–96, 2019. Preliminary version in FOCS’17.
     [doi:10.1137/17M1161737] 4, 5, 8, 10, 12, 13, 14, 19, 20, 21, 37, 41

                     T HEORY OF C OMPUTING, Volume 17 (7), 2021, pp. 1–46                         42
                          T HE L ARGE -E RROR A PPROXIMATE D EGREE OF AC0

[19] K UAN C HENG , Y UVAL I SHAI , AND X IN L I: Near-optimal secret sharing and error correcting
     codes in AC0 . In Proc. Theory of Cryptography Conf. (TCC’17), pp. 424–458. Springer, 2017.
     [doi:10.1007/978-3-319-70503-3_14] 4, 41

[20] V ITALY F ELDMAN: Evolvability from learning algorithms. In Proc. 40th STOC, pp. 619–628.
     ACM Press, 2008. [doi:10.1145/1374376.1374465] 7

[21] J ÜRGEN F ORSTER: A linear lower bound on the unbounded error probabilistic communication
     complexity. J. Comput. System Sci., 65(4):612–625, 2002. Preliminary version in CCC’01.
     [doi:10.1016/S0022-0000(02)00019-3] 6, 11, 36

[22] J ÜRGEN F ORSTER , M ATTHIAS K RAUSE , S ATYANARAYANA V. L OKAM , RUSTAM M UBARAKZ -
     JANOV, N IELS S CHMITT, AND H ANS U LRICH S IMON: Relations between communication complex-
     ity, linear arrangements, and computational complexity. In Proc. 21st Found. Softw. Techn. Theoret.
     Comp. Sci. Conf. (EFSTTCS’01), pp. 171–182. Springer, 2001. [doi:10.1007/3-540-45294-X_15] 8,
     36

[23] M IKAEL G OLDMANN , J OHAN H ÅSTAD , AND A LEXANDER A. R AZBOROV: Majority
     gates vs. general weighted threshold gates. Comput. Complexity, 2(4):277–300, 1992.
     [doi:10.1007/BF01200426] 35

[24] A NDRÁS H AJNAL , W OLFGANG M AASS , PAVEL P UDLÁK , M ARIO S ZEGEDY, AND G YÖRGY
     T URÁN: Threshold circuits of bounded depth. J. Comput. System Sci., 46(2):129–154, 1993.
     [doi:10.1016/0022-0000(93)90001-D] 35

[25] J EFF K AHN , NATHAN L INIAL , AND A LEX S AMORODNITSKY: Inclusion-exclusion: Exact and
     approximate. Combinatorica, 16(4):465–477, 1996. [doi:10.1007/BF01271266] 22

[26] H ARTMUT K LAUCK: Lower bounds for quantum communication complexity. SIAM J. Com-
     put., 37(1):20–46, 2007. Preliminary version in FOCS’01. [doi:10.1137/S0097539702405620,
     arXiv:quant-ph/0106160] 7, 35, 36
                                                                                 1/3 )
[27] A DAM R. K LIVANS AND ROCCO A. S ERVEDIO: Learning DNF in time 2Õ(n                . J. Comput. System
     Sci., 68(2):303–318, 2004. [doi:10.1016/j.jcss.2003.07.007] 3, 8

[28] S WASTIK KOPPARTY: AC0 lower bounds and pseudorandomness, 2013. Lecture notes of “Topics
     in Complexity Theory and Pseudorandomness (Spring 2013)” at Rutgers University. 28

[29] M ATTHIAS K RAUSE: On the computational power of Boolean decision lists. Comput. Complexity,
     14(4):362–375, 2006. [doi:10.1007/s00037-005-0203-0] 35

[30] M ATTHIAS K RAUSE AND PAVEL P UDLÁK: On the computational power of depth-2 circuits with
     threshold and modulo gates. Theoret. Comput. Sci., 174(1–2):137–156, 1997. [doi:10.1016/S0304-
     3975(96)00019-9] 3, 6

[31] T ROY L EE: A note on the sign degree of formulas, 2009. [arXiv:0909.4607] 9, 19

                      T HEORY OF C OMPUTING, Volume 17 (7), 2021, pp. 1–46                               43
                                  M ARK B UN AND J USTIN T HALER

[32] NATI L INIAL AND A DI S HRAIBMAN: Learning complexity vs communication complexity. Combin.
     Probab. Comput., 18(1–2):227–245, 2009. [doi:10.1017/S0963548308009656] 7, 35, 36

[33] M ARVIN M INSKY AND S EYMOUR PAPERT: Perceptrons: An Introduction to Computational
     Geometry. MIT Press, 1969. [doi:10.7551/mitpress/11301.001.0001] 3, 6, 8

[34] N OAM N ISAN: The communication complexity of threshold gates. In Combinatorics, Paul Erdős is
     Eighty, pp. 301–315. János Bolyai Math. Soc., Budapest, 1994. 7, 35, 36

[35] RYAN O’D ONNELL AND ROCCO A. S ERVEDIO: New degree bounds for polynomial threshold func-
     tions. Combinatorica, 30(3):327–358, 2010. Preliminary version in STOC’03. [doi:10.1007/s00493-
     010-2173-3] 6

[36] R AMAMOHAN PATURI AND JANOS S IMON: Probabilistic communication complexity. J. Comput.
     System Sci., 33(1):106–123, 1986. [doi:10.1016/0022-0000(86)90046-2] 6, 7, 36

[37] V LADIMIR V. P ODOLSKII: Perceptrons of large weight. Probl. Info. Transmission, 45:46–53, 2009.
     Preliminary version in CSR’07. [doi:10.1134/S0032946009010062] 2

[38] A LEXANDER A. R AZBOROV AND A LEXANDER A. S HERSTOV: The sign-rank of AC0 . SIAM J.
     Comput., 39(5):1833–1855, 2010. [doi:10.1137/080744037] 3, 6, 11, 18, 36, 37

[39] A LEXANDER A. S HERSTOV: Separating AC0 from depth-2 majority circuits. SIAM J. Comput.,
     38(6):2113–2129, 2009. [doi:10.1137/08071421X] 3, 6, 35, 36

[40] A LEXANDER A. S HERSTOV: The pattern matrix method. SIAM J. Comput., 40(6):1969–2000,
     2011. Preliminary version in STOC’08. [doi:10.1137/080733644] 3, 5, 6, 35

[41] A LEXANDER A. S HERSTOV: Strong direct product theorems for quantum communication and
     query complexity. SIAM J. Comput., 41(5):1122–1165, 2012. Preliminary version in STOC’14.
     [doi:10.1137/110842661] 12, 29, 30

[42] A LEXANDER A. S HERSTOV: The intersection of two halfspaces has high threshold degree. SIAM J.
     Comput., 42(6):2329–2374, 2013. Preliminary version in FOCS’09. [doi:10.1137/100785260] 9, 19

[43] A LEXANDER A. S HERSTOV: Breaking the Minsky-Papert barrier for constant-depth circuits. SIAM
     J. Comput., 47(5):1809–1857, 2018. Preliminary version in STOC’14. [doi:10.1137/15M1015704]
     4, 6, 23

[44] A LEXANDER A. S HERSTOV: The power of asymmetry in constant-depth circuits. SIAM J. Comput.,
     47(6):2362–2434, 2018. Preliminary version in FOCS’15. [doi:10.1137/100785260] 3, 4, 5, 6, 9,
     10

[45] A LEXANDER A. S HERSTOV: Algorithmic polynomials. SIAM J. Comput., 49(6):1173–1231, 2020.
     Preliminary version in STOC’18. [doi:10.1137/19M1278831, arXiv:1801.04607] 4, 14

                     T HEORY OF C OMPUTING, Volume 17 (7), 2021, pp. 1–46                         44
                        T HE L ARGE -E RROR A PPROXIMATE D EGREE OF AC0

[46] A LEXANDER A. S HERSTOV AND P EI W U: Near-optimal lower bounds on the threshold degree and
     sign-rank of AC0 . SIAM J. Comput., online:STOC19:1–86, 2021. Preliminary version in STOC’19.
     [doi:10.1137/20M1312459, arXiv:1901.00988] 13

[47] YAOYUN S HI AND Y UFAN Z HU: Quantum communication complexity of block-composed func-
     tions. Quantum Inf. Comput., 9(5):444–460, 2009. [doi:10.26421/QIC9.5-6-7] 9, 19

[48] L ESLIE G. VALIANT: Evolvability. J. ACM, 56(1):3:1–21, 2009. [doi:10.1145/1462153.1462156]
     7

[49] E MANUELE V IOLA: On approximate majority and probabilistic time. Comput. Complexity,
     18(3):337–375, 2009. Preliminary version in CCC’07. [doi:10.1007/s00037-009-0267-3] 28

[50] ROBERT Š PALEK: A dual polynomial for OR, 2008. [arXiv:0803.4516] 18


AUTHORS

     Mark Bun
     Assistant professor
     Department of Computer Science
     Boston University
     Boston, MA, USA
     mbun bu edu
     http://cs-people.bu.edu/mbun/


     Justin Thaler
     Assistant professor
     Department of Computer Science
     Georgetown University
     Washington, DC, USA
     justin thaler georgetown edu
     http://people.cs.georgetown.edu/jthaler/


ABOUT THE AUTHORS

     M ARK B UN is an Assistant Professor in the Department of Computer Science at Boston
        University. He completed his Ph. D. in the Theory of Computation group at Harvard
        University, where he was advised by Salil Vadhan. After that, he was a postdoctoral
        researcher at Princeton University and a Google Research Fellow at the Simons Institute.
        His research interests include computational complexity, differential privacy, cryptozo-
        ology, and learning theory. As a native Seattleite, he enjoys coffee, rain, composting,
        4-way stop signs, hiking, cheese rolling, and competitive duck herding.


                     T HEORY OF C OMPUTING, Volume 17 (7), 2021, pp. 1–46                          45
                             M ARK B UN AND J USTIN T HALER

J USTIN T HALER, known to his friends as Justin, is an Assistant Professor in the Depart-
    ment of Computer Science at Georgetown University. He completed his Ph. D. in the
    Theory of Computation group at Harvard University, where he was advised by Michael
    Mitzenmacher. After that, he was a postdoctoral researcher at the Simons Institute, and
    a Research Scientist at Yahoo Labs in New York City. His research interests include
    computational complexity, verifiable computing, and algorithms for massive datasets. In
    his spare time, he enjoys running and playing low-quality tennis.




                T HEORY OF C OMPUTING, Volume 17 (7), 2021, pp. 1–46                          46