                          THEORY OF COMPUTING, Volume 16 (10), 2020, pp. 1–71
                                       www.theoryofcomputing.org




          The Polynomial Method Strikes Back:
             Tight Quantum Query Bounds
                 via Dual Polynomials
                      Mark Bun*                  Robin Kothari                 Justin Thaler†
              Received February 6, 2019; Revised November 26, 2019; Published October 24, 2020




       Abstract. The approximate degree of a Boolean function f is the least degree of a real
       polynomial that approximates f pointwise to error at most 1/3. The approximate degree of
       f is known to be a lower bound on the quantum query complexity of f (Beals et al., FOCS
       1998 and J. ACM 2001).
           We find tight or nearly tight bounds on the approximate degree and quantum query
       complexities of several basic functions. Specifically, we show the following.
           • k-Distinctness: For any constant k, the approximate degree and quantum query com-
             plexity of the k-distinctness function is Ω(n^{3/4−1/(2k)}). This is nearly tight for large k,
             as Belovs (FOCS 2012) has shown that for any constant k, the approximate degree and
             quantum query complexity of k-distinctness is O(n^{3/4−1/(2^{k+2}−4)}).
           • Image size testing: The approximate degree and quantum query complexity of testing
             the size of the image of a function [n] → [n] is Ω̃(n1/2 ). This proves a conjecture of
             Ambainis et al. (SODA 2016), and it implies tight lower bounds on the approximate
             degree and quantum query complexity of the following natural problems.
     An extended abstract of this paper [28] appeared in the Proceedings of the 50th ACM Symposium on Theory of Computing
(STOC 2018).
   * Supported by NSF grant CCF-1947889.
   † Supported by NSF CAREER award CCF-1845125.



ACM Classification: F.2.3
AMS Classification: 68Q17, 81P68
Key words and phrases: polynomial method, quantum query complexity, dual polynomials, surjectivity


© 2020 Mark Bun, Robin Kothari, and Justin Thaler
Licensed under a Creative Commons Attribution License (CC-BY)                           DOI: 10.4086/toc.2020.v016a010

       – k-Junta testing: A tight Ω̃(k1/2 ) lower bound for k-junta testing, answering the
         main open question of Ambainis et al. (SODA 2016).
       – Statistical distance from uniform: A tight Ω̃(n1/2 ) lower bound for approximating
         the statistical distance of a distribution from uniform, answering the main question
         left open by Bravyi et al. (STACS 2010 and IEEE Trans. Inf. Theory 2011).
       – Shannon entropy: A tight Ω̃(n1/2 ) lower bound for approximating Shannon entropy
         up to a certain additive constant, answering a question of Li and Wu (2017).
   • Surjectivity: The approximate degree of the surjectivity function is Ω̃(n3/4 ). The best
     prior lower bound was Ω(n2/3 ). Our result matches an upper bound of Õ(n3/4 ) due to
     Sherstov (STOC 2018), which we reprove using different techniques. The quantum
     query complexity of this function is known to be Θ(n) (Beame and Machmouchi,
     Quantum Inf. Comput. 2012 and Sherstov, FOCS 2015).
    Our upper bound for surjectivity introduces new techniques for approximating Boolean
functions by low-degree polynomials. Our lower bounds are proved by significantly refining
techniques recently introduced by Bun and Thaler (FOCS 2017).




Contents
1 Introduction                                                                                         4
  1.1 Our results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
  1.2 Prior work on lower bounds for approximate degree . . . . . . . . . . . . . . . . . . . . 9
  1.3 Our techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
  1.4 Outline for the rest of the paper . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

2 Preliminaries                                                                                        14
  2.1 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   14
  2.2 Two variants of approximate degree and their dual formulations . . . . . . . . . . . . .         14
  2.3 Basic facts about polynomial approximations . . . . . . . . . . . . . . . . . . . . . . .        15
  2.4 Functions of interest . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    16
  2.5 Connecting symmetric properties and block-composed functions . . . . . . . . . . . . .           18
  2.6 The dual block method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      19
  2.7 A refinement of a technical lemma from prior work . . . . . . . . . . . . . . . . . . . .        20

3 Upper bound for surjectivity                                                                         25
  3.1 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   26
  3.2 Warmup: Approximating NOR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .          26
  3.3 Informal terminology: polynomials as algorithms . . . . . . . . . . . . . . . . . . . . .        28
  3.4 Approximating surjectivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     29

4 Lower bound for surjectivity                                                                         37
  4.1 Step 1: A dual witness for OR_N . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     38
  4.2 Step 2: Constructing a preliminary dual witness for AND_R ◦ OR_N . . . . . . . . . . . .        42
  4.3 Step 3: Constructing the final dual witness . . . . . . . . . . . . . . . . . . . . . . . .     42

5 Lower bound for k-distinctness                                                                       43
  5.1 Step 1: A dual witness for THR^k_N . . . . . . . . . . . . . . . . . . . . . . . . . . . .      44
  5.2 Step 2: A preliminary dual witness for OR_R ◦ THR^k_N . . . . . . . . . . . . . . . . . .       47
  5.3 Step 3: Completing the construction . . . . . . . . . . . . . . . . . . . . . . . . . . . .     52

6 Lower bound for image size testing and its implications                                              53
  6.1 Image size testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   53
  6.2 Lower bound for junta testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    58
  6.3 Lower bound for SDU . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      59
  6.4 Lower bound for entropy comparison and approximation . . . . . . . . . . . . . . . . .           60

7 Conclusion and open questions                                                                   62
  7.1 Additional consequences: approximate degree lower bounds for DNFs and AC0 . . . . . 62
  7.2 Open problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

References                                                                                             63


1    Introduction
Approximate degree. The approximate degree of a Boolean function f : {−1, 1}^n → {−1, 1}, denoted
deg̃(f), is the least degree of a real polynomial p such that |p(x) − f(x)| ≤ 1/3 for all x ∈ {−1, 1}^n.
Approximate degree is a basic measure of the complexity of a Boolean function, and has diverse
applications throughout theoretical computer science.
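To make this definition concrete, the following sketch (ours, not from the paper) computes, for the small function MAJ₃, the pointwise error achieved by the degree-d truncation of its Fourier expansion. Note this only upper-bounds the error achievable at each degree, since the optimal pointwise approximator need not be a truncated Fourier expansion.

```python
from itertools import product, combinations

def maj3(x):
    # majority of three ±1 bits (never a tie on an odd number of bits)
    return 1 if sum(x) > 0 else -1

n = 3
cube = list(product([-1, 1], repeat=n))

def chi(x, S):
    # parity character chi_S(x) = prod_{i in S} x_i
    p = 1
    for i in S:
        p *= x[i]
    return p

def fourier_coeff(f, S):
    # f^(S) = 2^{-n} * sum_x f(x) * chi_S(x)
    return sum(f(x) * chi(x, S) for x in cube) / len(cube)

subsets = [S for d in range(n + 1) for S in combinations(range(n), d)]
coeffs = {S: fourier_coeff(maj3, S) for S in subsets}

def truncation_error(d):
    # max pointwise error of the degree-d truncation of the Fourier expansion
    return max(
        abs(sum(c * chi(x, S) for S, c in coeffs.items() if len(S) <= d) - maj3(x))
        for x in cube
    )

errors = [truncation_error(d) for d in range(n + 1)]
print(errors)  # -> [1.0, 0.5, 0.5, 0.0]
```

The degree-1 truncation (x₁ + x₂ + x₃)/2 achieves error 1/2; by symmetrization one can show no degree-1 polynomial does better, so the approximate degree of MAJ₃ is 2.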
    Upper bounds on approximate degree are at the heart of the most powerful known learning algorithms
in a number of models [10, 46, 47, 49, 50, 63, 68], algorithmic approximations for the inclusion-exclusion
principle [45, 70], and algorithms for differentially private data release [35, 87]. A recent line of
work [84, 85] has used approximate degree upper bounds to show new lower bounds on the formula and
graph complexities of explicit functions.
    Lower bounds on approximate degree have enabled progress in several areas of complexity theory,
including communication complexity [27, 36, 38, 39, 41, 64, 69, 72, 74, 77], circuit complexity [60, 71],
oracle separations [15, 21], and secret-sharing [20]. Most importantly for this paper, approximate degree
lower bounds have been critical in shaping our understanding of quantum query complexity [1, 3, 13].
    In spite of the importance of approximate degree, major gaps remain in our knowledge. In particular,
the approximate degrees of many basic functions are still unknown. Our goal in this paper is to determine
the approximate degrees of several natural functions for which there were significant gaps between known
upper and lower bounds.

Quantum query complexity. While determining the approximate degree of basic functions of interest
is a test of our understanding of approximate degree, it is also motivated by the study of quantum
algorithms. In the quantum query model, a quantum algorithm is given query access to the bits of an input
x, and the goal is to compute some function f of x while minimizing the number of queried bits. Quantum
query complexity captures much of the power of quantum computing, and most quantum algorithms were
discovered in or can easily be described in the query setting.
     Approximate degree was one of the first general lower bound techniques for quantum query complex-
ity. In 1998, Beals et al. [13] observed that the bounded-error quantum query complexity of a function
 f is lower bounded by (one half times) the approximate degree of f . Since polynomials are sometimes
easier to understand than quantum algorithms, this observation led to a number of new lower bounds on
quantum query complexity. This method of proving quantum query lower bounds is called the polynomial
method.
     After several significant quantum query lower bounds were proved via the polynomial method
(including the work of Aaronson and Shi [3], who proved optimal lower bounds for the collision and
element distinctness problems), the polynomial method took a back seat. Since then, the positive-weights
adversary method [5, 12, 52, 92] and the newer negative-weights adversary method [44, 55, 66] have
become the tools of choice for proving quantum query lower bounds (with some notable exceptions, such
as Zhandry’s recent tight lower bound for the set equality problem [91]). This leads us to our second goal
for this work.
     In this article, we seek to resolve several open problems in quantum query complexity using ap-
proximate degree as the lower bound technique. A distinct advantage of proving quantum query lower
bounds with the polynomial method is that any such bound can be “lifted” via Sherstov’s pattern matrix
method [72] to a quantum communication lower bound (even with unlimited shared entanglement [56]);


        Problem                Best prior upper bound           Our lower bound       Best prior lower bound
        k-Distinctness         O(n^{3/4−1/(2^{k+2}−4)}) [16]    Ω̃(n^{3/4−1/(2k)})    Ω̃(n^{2/3}) [3]
        Image size testing     O(√n log n) [9]                  Ω̃(√n)                Ω̃(n^{1/3}) [9]
        k-Junta testing        O(√k log k) [9]                  Ω̃(√k)                Ω̃(k^{1/3}) [9]
        SDU                    O(√n) [23]                       Ω̃(√n)                Ω̃(n^{1/3}) [3, 23]
        Shannon entropy        Õ(√n) [23, 58]                   Ω̃(√n)                Ω̃(n^{1/3}) [58]
   Table 1: Our lower bounds on quantum query complexity and approximate degree vs. prior work.


such a result is not known for any other quantum query lower bound technique. More generally, using ap-
proximate degree as a lower bound technique for quantum query complexity has other advantages, such as
the ability to show lower bounds for zero-error and small-error quantum algorithms [24], unbounded-error
quantum algorithms [13], and time-space tradeoffs [48].

Quantum query complexity and approximate degree. We illustrate the power of the polynomial
method by proving optimal or nearly optimal bounds on several functions studied in the quantum
computing literature. These results are summarized in Table 1, and definitions of the problems considered
can be found in Section 1.1. Since the upper bounds for these functions were shown using quantum
algorithms, our results determine both the quantum query complexity and approximate degree of these
functions.
    For most of the functions studied in this paper, the positive-weights adversary bound provably cannot
show optimal lower bounds due to the certificate complexity barrier [83, 92] and the property testing
barrier [44]. While these barriers do not apply to the negative-weights variant (which is actually capable
of proving tight quantum query lower bounds for all functions [55, 66]), the negative-weights adversary
method is often challenging to apply to specific problems, and tight bounds for the problems we consider
have remained unknown for a long time.
    For the functions presented in Table 1, the approximate degree and quantum query complexity are
essentially the same. This is not the case for the surjectivity function, which has played an important
role in the literature on approximate degree and quantum query complexity. Specifically, Beame and
Machmouchi [14] showed that surjectivity has quantum query complexity Θ̃(n). On the other hand,
Sherstov recently showed that surjectivity has approximate degree Õ(n3/4 ) [78]. Surjectivity is the only
known example of a “natural” function separating approximate degree from quantum query complexity;
prior examples of such functions [2, 7] were contrived, and (unlike surjectivity) specifically constructed
to separate the two measures.
    Our final result gives a nearly tight estimate of the approximate degree of surjectivity. We prove a new
lower bound of Ω̃(n3/4 ), which matches Sherstov’s upper bound up to logarithmic factors. We also give
a new construction of an approximating polynomial of degree Õ(n3/4 ), using very different techniques
than [78]. We believe that our proof of this Õ(n3/4 ) upper bound is of independent interest. In particular,
our lower bound proof for surjectivity is specifically tailored to showing optimality of our upper bound
construction, in a sense that can be made formal via complementary slackness. We are optimistic that our


        Problem        Prior upper bound    Our upper bound    Our lower bound    Prior lower bound
        Surjectivity   Õ(n^{3/4}) [78]      Õ(n^{3/4})         Ω̃(n^{3/4})        Ω̃(n^{2/3}) [3]

              Table 2: Our bounds on the approximate degree of surjectivity vs. prior work.


approximation techniques will be useful for showing additional tight approximate-degree bounds in the
future.

1.1     Our results
We now describe our results and prior work on these functions in more detail.

1.1.1    Functions considered
We now informally describe the functions studied in this paper. These functions are formally defined in
Section 2.4.
    Let R be a power of two and N ≥ R, and let n = N · log₂ R. Most of the functions that we consider
interpret their inputs in {−1, 1}^n as a list of N numbers from a range [R], and determine whether this list
satisfies various natural conditions. We let the frequency f_i of range item i ∈ [R] denote the number of
times i appears in the input list.
    In this paper we study the following functions in which the input is N numbers from a range [R]:

      • Surjectivity (SURJ): Do all range items appear at least once?
      • k-Distinctness: Is there a range item that appears k or more times?
      • Image size testing: Decide if all range items appear at least once or if at most γ · R range items
        appear at least once, under the promise that one of these is true.
      • Statistical distance from uniform (SDU): Interpret the input as a probability distribution p, where
        p_i = f_i /N. Compute the statistical distance of p from the uniform distribution over [R] up to some
        small additive error ε.
      • Shannon entropy: Interpret the input as a probability distribution p, where p_i = f_i /N. Compute the
        Shannon entropy ∑_{i∈[R]} p_i · log(1/p_i) of p up to additive error ε.

An additional function we consider that does not fit neatly into the framework above is k-junta testing.

      • k-Junta testing: Given an input in {−1, 1}^n representing the truth table of a function {−1, 1}^{log n} →
        {−1, 1}, determine whether this function depends on at most k of its input bits, or is at least ε-far
        from any such function.

    We give tight or nearly tight bounds on the quantum query complexities and/or approximate degrees
of all of the functions above. Our lower bounds for SURJ, k-distinctness, image size testing, SDU, and
entropy approximation all require N to be “sufficiently larger” than R, by a certain constant factor. For
simplicity, throughout this introduction we do not make this requirement explicit, and for this reason we
label the theorems in this introduction informal.


1.1.2   Results in detail

Surjectivity. In the surjectivity problem we are given N numbers from [R] and must decide if every
range item appears at least once in the input.
    The quantum query complexity of this problem was studied by Beame and Machmouchi [14], who
proved a lower bound of Ω̃(n), which was later improved by Sherstov to the optimal Θ(n) [80]. Beame
and Machmouchi [14] explicitly leave open the question of determining the approximate degree of
surjectivity. Recently, Sherstov [78] showed an upper bound of Õ(n3/4 ) on the approximate degree of
this function. The best prior lower bound was Ω̃(n2/3 ) [3, 33].
    We give a completely different construction of an approximating polynomial for surjectivity with
degree Õ(n3/4 ). We also prove a matching lower bound, which shows that the approximate degree of the
surjectivity function is Θ̃(n3/4 ).

Theorem 1.1 (Informal). The approximate degree of SURJ is Θ̃(n3/4 ).



k-Distinctness. In this problem, we are given N numbers in [R] and must decide if any range item
appears at least k times in the list (i. e., is there an i ∈ [R] with fi ≥ k?). This generalizes the well-studied
element distinctness problem, which is the same as 2-distinctness.
    Ambainis [8] first used quantum walks to give an O(n^{k/(k+1)}) upper bound on the quantum query
complexity of any problem with certificates of size k, including k-distinctness and k-sum.1 Later, Belovs
introduced a beautiful new framework for designing quantum algorithms [17] and used it to improve the
upper bound for k-distinctness to O(n^{3/4−1/(2^{k+2}−4)}) [16]. Several subsequent works have used Belovs'
k-distinctness algorithm as a black-box subroutine for solving more complicated problems (e. g., [58, 61]).
    As for lower bounds, Aaronson and Shi [3] established an Ω̃(n2/3 ) lower bound on the approximate
degree of k-distinctness for any k ≥ 2. Belovs and Špalek used the adversary method to prove a lower
bound of Ω(n^{k/(k+1)}) on the quantum query complexity of k-sum, showing that Ambainis' algorithm is
tight for k-sum. They asked whether their techniques can prove an ω(n2/3 ) quantum query lower bound
for k-distinctness. We achieve this goal, but using the polynomial method instead of the adversary method.
Our main result is the following:

Theorem 1.2 (Informal). For any k ≥ 2, the approximate degree and quantum query complexity of
k-distinctness is Ω̃(n^{3/4−1/(2k)}).

    This is nearly tight for large k, as it approaches Belovs' upper bound of O(n^{3/4−1/(2^{k+2}−4)}). Note that
the exponent in both bounds approaches 3/4. It remains an intriguing open question to close the gap
between n^{3/4−1/(2^{k+2}−4)} and n^{3/4−1/(2k)}, especially for small values of k ≥ 3.
    Our k-distinctness lower bound also implies an Ω̃(n^{3/4−1/(2k)}) lower bound on the quantum query
complexity of approximating the maximum frequency, F∞, of any element up to relative error less than
1/k [61], improving over the previous best bound of Ω̃(n^{2/3}).

   1 In the k-sum problem, we are given N numbers in [R] and asked to decide if any k of them sum to 0 (mod R).




Image size testing. In this problem, we are given N numbers in [R] and 0 < γ < 1, and must decide if
every range item appears at least once or if at most γ · R range items appear at least once. We show that for
any γ > 0, the problem has approximate degree and quantum query complexity Ω̃(√n). This holds as
long as N = c · R for a certain constant c > 0.
Theorem 1.3 (Informal). The approximate degree and quantum query complexity of image size testing is
Ω̃(√n).
    This lower bound is tight, matching a quantum algorithm of Ambainis, Belovs, Regev, and de
Wolf [9], and confirms a conjecture from their work. The previous best lower bound was Ω(n^{1/3}) [9],
obtained via reduction to the collision lower bound [3]. The classical query complexity of this problem is
Θ(n/ log n) [89].
    The version of image size testing we define is actually a special case of the one studied in [9]. The
version we define is solvable via the following simple algorithm making O(√n) queries: pick a random
range item, and Grover search for an instance of that range item. The fact that our lower bound holds
even for this special case of the problem considered in prior work only makes our lower bound
stronger.
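The simple algorithm just described can be sketched classically as follows. This is our illustrative sketch, not the paper's algorithm: a linear scan stands in for Grover search (which would find an occurrence, if one exists, with O(√N) quantum queries), the function names are hypothetical, and the repetition count is not tuned.

```python
import random

def contains(xs, item):
    # stand-in for Grover search: does `item` occur anywhere in the list?
    # (a quantum algorithm decides this with O(sqrt(N)) queries)
    return any(x == item for x in xs)

def image_size_tester(xs, R, gamma, reps=20, rng=random):
    # Under the promise, either every range item appears, or at most
    # gamma*R items appear. In the latter case a uniformly random range
    # item is missing with probability >= 1 - gamma, so O(1/(1 - gamma))
    # repetitions catch a missing item with high probability.
    for _ in range(reps):
        item = rng.randint(1, R)
        if not contains(xs, item):
            return False  # found a missing item: certainly not surjective
    return True  # no missing item found: surjective side of the promise
```

On the surjective side the tester always accepts; on the small-image side it rejects with high probability over the random choices.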
    This lower bound also serves as a starting point to establish the next three lower bounds.

k-Junta testing. In this problem, we are given the truth table of a Boolean function and have to
determine if the function depends on at most k variables or if it is ε-far from any such function.
     The best classical algorithm for this problem uses O(k log k + k/ε) queries [19]. The problem was
first studied in the quantum setting by Atıcı and Servedio [11], who gave a quantum algorithm making
O(k/ε) queries. This was later improved by Ambainis et al. [9] to Õ(√(k/ε)). They also proved a lower
bound of Ω(k^{1/3}). Via a connection established by Ambainis et al., our image size testing lower bound
implies an Ω̃(√k) lower bound on the approximate degree and quantum query complexity of k-junta
testing (for some ε = Ω(1)).
Theorem 1.4 (Informal). The approximate degree and quantum query complexity of k-junta testing is
Ω̃(√k).
    This matches the upper bound of [9], resolving the main open question from their paper.

Statistical distance from uniform (SDU). In this problem, we are given a list of N numbers in [R],
which we interpret as a probability distribution p, where pi = fi /N, the fraction of times i appears. The
goal is to compute the statistical distance between p and the uniform distribution to error ε.
    This problem was studied by Bravyi, Harrow, and Hassidim [23], who gave an O(√n)-query quantum
algorithm approximating the statistical distance between two input distributions to additive error ε = Ω(1).
We show that the approximate degree and quantum query complexity of this task are Ω̃(√n), even when
one of the distributions is known to be the uniform distribution.
Theorem 1.5 (Informal). There is a constant c > 0 such that the approximate degree and quantum query
complexity of approximating the statistical distance of a distribution over a range of size n from the
uniform distribution over the same range to additive error c is Ω̃(√n).
   This matches the upper bound of Bravyi et al. [23] and answers the main question left open from that
work. Note that the classical query complexity of this problem is Θ(n/ log n) [89].


Entropy approximation. As in the previous problem, we interpret the input as a probability distribution,
and the goal is to compute its Shannon entropy to additive error ε. The classical query complexity of
this problem is Θ(n/ log n) [89]. We show that, for some ε = Ω(1), the approximate degree and quantum
query complexity are Ω̃(√n).

Theorem 1.6 (Informal). There is a constant ε > 0 such that the approximate degree and quantum query
complexity of approximating the Shannon entropy of a distribution over a range of size n to additive error
ε is Ω̃(√n).

      This too is tight, answering a question of Li and Wu [58].


1.2     Prior work on lower bounds for approximate degree
A relatively new lower-bound technique for approximate degree, called the method of dual polynomials,
plays an essential role in our paper. The method dates back to work of Sherstov [75]
and Špalek [82], though dual polynomials had been used earlier to resolve longstanding questions in
communication complexity [36, 57, 71, 72, 81]. To prove a lower bound for a function f via this method,
one exhibits an explicit dual polynomial for f , which is a dual solution to a certain linear program
capturing the approximate degree of f .
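The linear program in question, and its dual, are standard (the paper makes them precise in Section 2.2; the rendering below is ours). A function f has ε-approximate degree greater than d if and only if there exists a dual polynomial ψ satisfying:

```latex
% Dual characterization: \widetilde{\deg}_\varepsilon(f) > d iff such a \psi exists.
\begin{align*}
&\psi \colon \{-1,1\}^n \to \mathbb{R}, \\
&\sum_{x \in \{-1,1\}^n} |\psi(x)| = 1
  && \text{(unit $\ell_1$ norm)}, \\
&\sum_{x \in \{-1,1\}^n} \psi(x)\, f(x) > \varepsilon
  && \text{(high correlation with $f$)}, \\
&\sum_{x \in \{-1,1\}^n} \psi(x)\, q(x) = 0 \ \text{ for every polynomial } q \text{ with } \deg q \le d
  && \text{(pure high degree)}.
\end{align*}
```

The third condition says ψ is orthogonal to all low-degree polynomials; together with the first two it certifies, by LP duality, that no degree-d polynomial can ε-approximate f pointwise.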
    A notable feature of the method of dual polynomials is that it is lossless, in the sense that it can
exhibit a tight lower bound on the approximate degree of any function f (though actually applying the
method to specific functions may be highly challenging). Prior to the method of dual polynomials, the
primary tool available for proving approximate degree lower bounds was symmetrization, introduced by
Minsky and Papert [60] in the 1960s. Although powerful, symmetrization is not a lossless technique.
    Most prior work on the method of dual polynomials can be understood as establishing hardness
amplification results. Such results show how to take a function f that is “somewhat hard” to approximate
by low-degree polynomials, and turn f into a related function g that is much harder to approximate.
Here, harder means either that g requires larger degree to approximate to the same error as f , or that
approximations to g of a given degree incur much larger error than do approximations to f of the same
degree.
Results for block-composed functions. Until very recently, the method of dual polynomials had been
used exclusively to prove hardness amplification results for block-composed functions. That is, the
harder function g would be obtained by block-composing f with another function h, i. e., g = h ◦ f .
Here, a function g : {−1, 1}n·m → {−1, 1} is the block-composition of h : {−1, 1}n → {−1, 1} and
 f : {−1, 1}m → {−1, 1} if g interprets its input as a sequence of n blocks, applies f to each block, and
then feeds the n outputs into h.
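In code, the block composition g = h ◦ f just described is the following (our sketch; the {−1, 1} convention with −1 playing the role of TRUE is one common choice, and the helper names are ours):

```python
def block_compose(h, f, n, m):
    """Return g = h ∘ f on n*m bits: interpret the input as n blocks of m
    bits, apply f to each block, then feed the n outputs into h."""
    def g(x):
        assert len(x) == n * m
        return h([f(x[i * m:(i + 1) * m]) for i in range(n)])
    return g

# Example: the AND-OR tree, with -1 denoting TRUE.
AND = lambda z: -1 if all(b == -1 for b in z) else 1
OR = lambda z: -1 if any(b == -1 for b in z) else 1

# AND of n=2 ORs, each over a block of m=2 bits
and_or = block_compose(AND, OR, 2, 2)
```

For instance, and_or evaluates to −1 (TRUE) exactly when every block of two bits contains at least one −1.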
      The method of dual polynomials turns out to be particularly suited to analyzing block-composed
functions, as there are sophisticated ways of “combining” dual witnesses for h and f individually to give
an effective dual witness for h ◦ f [21, 29, 30, 73, 75, 79–81, 86]. Prior work on analyzing block-composed
functions has, for example, resulted in the determination of the approximate degree of the function
f(x) = ⋀_{i=1}^{n} ⋁_{j=1}^{m} x_{i,j}, known as the AND-OR tree, which had been open for 19 years [29, 73], established
new lower bounds for AC0 under basic complexity measures including discrepancy [30, 79, 80, 86],


sign-rank [32], and threshold degree [79, 80], and resolved a number of open questions about the power
of statistical zero knowledge proofs [21].
Beyond block-composed functions. While the aforementioned results led to considerable progress in
complexity theory, many basic questions require understanding the approximate degree of non-block-
composed functions. One prominent example with many applications is to exhibit an AC0 circuit over n
variables with approximate degree Ω(n). Until very recently, the best result in this direction was Aaronson
and Shi’s well-known Ω̃(n2/3 ) lower bound on the approximate degree of the element distinctness function
(which is equivalent to k-distinctness for k = 2) [3]. However, Bun and Thaler [33] recently achieved a
near-resolution of this problem by proving the following theorem.

Theorem 1.7 (Bun and Thaler [33]). For any constant δ > 0, there is an AC0 circuit with approximate
degree Ω(n^{1−δ}).

The reason that Theorem 1.7 required moving beyond block-composed functions is the following result
of Sherstov [76].
Theorem 1.8 (Sherstov). For any Boolean functions f and h, deg̃(h ◦ f) = O(deg̃(h) · deg̃(f)).

    Theorem 1.8 implies that the approximate degree of h ◦ f (viewed as a function of its input size) is
never higher than the approximate degree of f or h individually (viewed as a function of their input sizes).
For example, if f and h are both functions on n inputs, and both have approximate degree O(n^{1/2}), then
h ◦ f has N := n^2 inputs, and by Theorem 1.8, deg̃(h ◦ f) = O(n^{1/2} · n^{1/2}) = O(N^{1/2}).
    This means that block-composing multiple AC0 functions does not result in a function of higher
approximate degree (as a function of its input size) than that of the individual functions. Bun and
Thaler [33] overcome this hurdle by introducing a way of analyzing functions that cannot be written as a
block-composition of simpler functions.
    Bun and Thaler’s techniques set the stage for finding the approximate degrees of many basic functions
using the method of dual polynomials. However, they were not refined enough to accomplish this on their
own. Our lower bounds in this paper are obtained by refining and extending the methods of [33].

1.3   Our techniques
In order to describe our techniques, it is helpful to explain the process by which we discovered the tight
Θ̃(n3/4 ) lower and upper bounds for surjectivity (cf. Theorem 1.1). It has previously been observed
[29, 33, 86] that optimal dual polynomials for a function f tend to be tailored (in a sense that can be made
precise via complementary slackness) to showing optimality of some specific approximation technique
for f . Hence, constructing a dual polynomial for f can provide a strong hint as to how to construct an
optimal approximation for f , and vice versa.

Upper bound for surjectivity. In [33], Bun and Thaler constructed a dual polynomial witnessing a
suboptimal bound of Ω̃(n2/3 ) for SURJ. Even though this dual polynomial is suboptimal, it still provided
a major clue as to what an optimal approximation for SURJ should look like: it curiously ignored all
inputs failing to satisfy the following condition.

                      THEORY OF COMPUTING, Volume 16 (10), 2020, pp. 1–71                              10
                                      THE POLYNOMIAL METHOD STRIKES BACK

Condition 1.9. Every range item has frequency at most T , for a specific threshold T = O(N 1/3 ) ≪ N.

This suggested that an optimal approximation for SURJ should treat inputs satisfying Condition 1.9
differently than other inputs, leading us to the following multi-phase construction (for clarity and brevity,
this overview is simplified). The first phase constructs a polynomial p of degree O(n3/4 ) approximating
SURJ on all inputs satisfying Condition 1.9. However, p may be exponentially large on other inputs.
The second phase constructs a polynomial q of degree O(n3/4 ) that is exponentially small on inputs x
that do not satisfy Condition 1.9 (in particular, q(x) ≪ 1/p(x) for such x), and is close to 1 otherwise.
The product p · q still approximates SURJ on inputs satisfying Condition 1.9, and is exponentially small
on all other inputs. Notice that deg(p · q) ≤ deg(p) + deg(q) = O(n3/4 ). Combining the above with an
additional averaging step (the details of which we omit from this introduction) yields an approximation to
SURJ that is accurate on all inputs.

Lower bound for surjectivity. With the O(n3/4 ) upper bound in hand, we were able to identify the
fundamental bottleneck preventing further improvement of the upper bound. This suggested a way to
refine the techniques of [33] to prove a matching lower bound. Once the tight lower bound for SURJ
was established, we were able to identify additional refinements to analyze the other functions that we
consider. We now describe this in more detail.
     Bun and Thaler’s [33] (suboptimal) lower bound analysis for SURJ proceeds in two stages. In the
first stage, proving a lower bound for SURJ (on N input list items and R range items) is reduced to the
problem of proving a lower bound for the block-composed function ANDR ◦ ORN ,2 under the promise that
the input has Hamming weight at most N.3 In this paper, we use this stage of their analysis unmodified.
     The second stage proves an Ω̃(R2/3 ) lower bound for the latter problem by leveraging much of the
machinery developed to analyze the approximate degree of block-composed functions [29, 65, 73]. To
describe this machinery, we require the following notion. A dual polynomial that witnesses the fact that
deg̃ε ( f ) ≥ d is a function ψ : {−1, 1}n → [−1, 1] satisfying the following three conditions.

     • ∑x∈{−1,1}n ψ(x) · f (x) > ε. If ψ satisfies this condition, it is said to be well correlated with f .

     • ∑x∈{−1,1}n |ψ(x)| = 1. If ψ satisfies this condition, it is said to have `1 -norm equal to 1.

     • For all polynomials p : {−1, 1}n → R of degree less than d, we have ∑x∈{−1,1}n p(x) · ψ(x) = 0. If
       ψ satisfies this condition, it is said to have pure high degree at least d.
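As a sanity check (our own toy example, not taken from the paper), these three conditions can be verified numerically for the canonical dual witness for the parity function, ψ(x) = x1 · · · xn /2n : it has correlation exactly 1 with PARITYn , ℓ1-norm exactly 1, and pure high degree n, witnessing deg̃ε (PARITYn ) ≥ n for any ε < 1.

```python
import itertools

n = 4
inputs = list(itertools.product([-1, 1], repeat=n))

def parity(x):
    # PARITY in the -1/+1 convention is the product of the inputs.
    p = 1
    for xi in x:
        p *= xi
    return p

def psi(x):
    # Candidate dual polynomial: psi(x) = parity(x) / 2^n.
    return parity(x) / 2**n

def monomial(x, S):
    m = 1
    for i in S:
        m *= x[i]
    return m

# Condition 1: correlation with f (here it equals 1 exactly).
corr = sum(psi(x) * parity(x) for x in inputs)
# Condition 2: l1-norm equal to 1.
l1 = sum(abs(psi(x)) for x in inputs)
# Condition 3: pure high degree >= n, i.e., psi is orthogonal to
# every monomial (hence every polynomial) of degree less than n.
max_inner = max(
    abs(sum(psi(x) * monomial(x, S) for x in inputs))
    for d in range(n)
    for S in itertools.combinations(range(n), d)
)
print(corr, l1, max_inner)  # 1.0 1.0 0.0
```

The same brute-force checks apply to any candidate dual witness, though for functions other than parity the correlation is typically strictly between ε and 1.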

    In more detail, the second stage of the analysis from [33] itself proceeds in two steps. First, the
authors consider a dual witness ψ for the high approximate degree of ANDR ◦ ORN that was constructed
in prior work [29]. ψ is constructed by taking dual witnesses φ and γ for the high approximate degrees
of ANDR and ORN individually, and “combining” them in a specific way [54, 75, 81] to obtain a dual
witness for the high approximate degree of their block composition ANDR ◦ ORN .
   2 When it is not clear from context, we use subscripts to denote the number of variables on which a function is defined.
   3 Note that a reduction in the other direction is straightforward: to approximate SURJ, it suffices to approximate ANDR ◦ ORN
on inputs of Hamming weight exactly N. This is because SURJ can be expressed as an ANDR (over all range items r ∈ [R]) of
the ORN (over all input bits i ∈ [N]) of “Is input xi equal to r”? Each predicate of the form in quotes is computed exactly by a
polynomial of degree log R, since it depends on only log R of the inputs, and exactly N of these predicates (one for each i ∈ [N])
evaluate to TRUE.
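The decomposition described in this footnote can be checked by brute force for small, illustrative parameters (N = 4 and R = 3 below, our own choice); the sketch evaluates SURJ directly and as an ANDR over range items of ORN over positions, in the −1/+1 (TRUE = −1) convention used throughout the paper.

```python
import itertools

N, R = 4, 3  # illustrative parameters, small enough for brute force

def surj(s):
    # SURJ(s) = -1 (TRUE) iff every range item r in [R] appears in s.
    return -1 if all(any(si == r for si in s) for r in range(1, R + 1)) else 1

def and_of_ors(s):
    # OR_N over positions i of the predicate "is s_i = r?", then AND_R
    # over range items r; both in the -1/+1 (TRUE = -1) convention.
    ors = [-1 if any(si == r for si in s) else 1 for r in range(1, R + 1)]
    return -1 if all(o == -1 for o in ors) else 1

assert all(surj(s) == and_of_ors(s)
           for s in itertools.product(range(1, R + 1), repeat=N))
print("SURJ = AND_R(OR_N(...)) on all", R**N, "inputs")
```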



     Unfortunately, ψ only witnesses a lower bound for ANDR ◦ ORN without the promise that the
Hamming weight of the input is at most N. To address this issue, it is enough to “post-process” ψ so that
it no longer “exploits” any inputs of Hamming weight larger than N (formally, ψ(x) should equal zero for
any inputs in {−1, 1}R·N of Hamming weight more than N). The authors accomplish this by observing
that ψ “almost ignores” all such inputs (i. e., it places exponentially little total mass on all such inputs),
and hence it is possible to perturb ψ to make it completely ignore all such inputs.
     Key to this step is the fact that the “inner” dual witness γ for the high approximate degree of the ORN
function satisfies a “Hamming weight decay” condition:
                                                      
                                    |γ(x)| · (N choose |x|) ≤ 1/poly(|x|),                               (1.1)

for a suitable polynomial function.
     To improve the lower bound for SURJ from Ω̃(n2/3 ) to the optimal Ω̃(n3/4 ), we observe that γ in fact
satisfies a much stronger decay condition: while the inverse-polynomial decay property of Equation (1.1)
is tight for small Hamming weights |x|, |γ(x)| actually decays exponentially quickly once |x| is larger than
a certain threshold t. This observation is enough to obtain the tight Ω̃(n3/4 ) lower bound for SURJ.
     For intuition, it is worth mentioning that a primal formulation of the dual decay condition that
we exploit shows that any low-degree polynomial p that is an accurate approximation to ORN on low
Hamming-weight inputs requires large degree, even if |p(x)| is allowed to be exponentially large for
inputs of Hamming weight more than t.4 This is precisely the bottleneck that prevents us from improving
our upper bound for SURJ to o(N 3/4 ). In this sense, our dual witness is intuitively tailored to showing
optimality of the techniques used in our upper bound.

Other lower bounds. To obtain the lower bound for k-distinctness, the first stage of the analysis of [33]
reduces to a question about the approximate degree of the block-composed function ORR ◦ THRkN , under
the promise that the input has Hamming weight at most N. Here THRkN : {−1, 1}N → {−1, 1} denotes the
function that evaluates to −1 if and only if the Hamming weight of its input is at least k. By constructing
a suitable dual witness for THRkN , and combining it with a dual witness for ORR via similar techniques
as in our construction for SURJ, we are able to prove our Ω(n3/4−1/(2k) ) lower bound for k-distinctness.
(This description glosses over several significant technical issues that must be dealt with to ensure that
the combined dual witness is well-correlated with ORR ◦ THRkN ).5
   Recall that our lower bounds for k-junta testing, SDU, and entropy approximation are derived as
consequences of our lower bound for image size testing. The connection between image size testing and
    4 We do not formally describe this primal formulation of the dual decay condition, because it is not necessary to prove any of
the results in this paper.
    5 Specifically, our analysis requires the dual witness γ for THRkN to be very well-correlated with THRkN in a certain one-sided
sense. Roughly, we need the probability distribution |γ| to have the property that, conditioned on γ outputting a negative value,
the input to γ is in (THRkN )−1 (−1) with probability at least 1 − 1/(3R). This property was not required in the analysis for
SURJ, which is why our lower bound for SURJ is larger by a factor of n1/(2k) than our lower bound for k-distinctness (recall
that n = N log2 R and N is a constant factor larger than R). This seemingly technical issue is at least partially intrinsic: a
polynomial loss compared to the Ω̃(n3/4 ) lower bound for SURJ is unavoidable, owing to Belovs’ n3/4−Ω(1) upper bound [16]
for k-distinctness.




[Figure 1: a diagram relating the results. The surjectivity upper bound deg̃(SURJ) = Õ(n3/4 ) and the
surjectivity lower bound deg̃(SURJ) = Ω̃(n3/4 ) are connected by shared intuition and ideas. The image
size testing lower bound deg̃(IST) = Ω̃(n1/2 ) yields, via reductions, the lower bounds for k-junta testing,
statistical distance from uniformity (deg̃(SDU) = Ω̃(n1/2 )), and Shannon entropy (deg̃(Entropy) =
Ω̃(n1/2 )). The k-distinctness lower bound deg̃(DISTk ) = Ω̃(n3/4−1/(2k) ) is also shown.]

                 Figure 1: Diagram of reductions and relationships amongst our results.


junta testing was established by Ambainis et al. [9]. The reason that the image size testing lower bound implies
lower bounds for SDU is the following. Consider any distribution p over [R] such that all probabilities
pi are integer multiples of 1/N for some N = O(R). Then if p has full support, p is guaranteed to be
somewhat close to uniform, while if p has small support, p must be very far from uniform. We obtain our
lower bound for entropy approximation using a simple reduction from SDU due to Vadhan [88].

     To obtain our lower bound for image size testing, we observe that the first stage of the analysis of [33]
reduces to a question about the approximate degree of the block-composed function GapANDR ◦ ORN ,
under the promise that the input has Hamming weight at most N. Here, GapANDR is the promise function
that outputs −1 if all of its inputs equal −1, outputs +1 if fewer than γ · R of its inputs are −1, and is
undefined otherwise.
     Roughly speaking, we obtain the desired Ω̃(n1/2 ) lower bound by combining a dual witness for
GapANDR ◦ ORN from prior work [30] with the same techniques as in our construction for SURJ.
However, additional technical refinements to the analysis of [33] are required to obtain our results. In
particular, the analysis of [33] only provides a lower bound for SURJ if N = Ω(R · log2 (R)). But in order
to infer our lower bound for SDU and entropy approximation (as well as k-junta testing for ε = Ω(1)), it
is essential that the lower bound hold for N = O(R). This is because a distribution with full support is
guaranteed to be Ω(1)-close to uniform if all probabilities are integer multiples of 1/N with N = O(R),
but this is not the case otherwise. (Consider, e. g., a distribution that places mass 1 − 1/ log2 (R) on
a single range item, and spreads out the remaining mass evenly over all other range items). Refining
the methods of [33] to yield lower bounds even when N = O(R) requires a significantly more delicate
analysis than in [33].
     A diagram indicating how we obtain our results and the relationships that we establish between
problems is given in Figure 1.


1.4   Outline for the rest of the paper
Section 2 covers preliminary definitions and lemmas. Section 3 presents the Õ(n3/4 ) upper bound for
SURJ, while Section 4 presents the matching Ω̃(n3/4 ) lower bound. Section 5 gives the Ω̃(n3/4−1/(2k) )
lower bound for k-distinctness. Section 6 presents the lower bound for image size testing, and its
implications for junta testing, SDU, and Shannon entropy approximation. Section 7 concludes by briefly
describing some additional consequences of our results, as well as a number of open questions and
directions for future work.


2     Preliminaries
2.1   Notation
For a natural number N, let [N] = {1, 2, . . . , N} and [N]0 = {0, 1, 2, . . . , N}. All logarithms are taken in
base 2 unless otherwise noted. As is standard, we say that a function f (n) is in Õ(h(n)) if there exists a
constant k such that f (n) is in O(h(n) · logk (h(n))).
    We will frequently work with Boolean functions under the promise that their inputs have low Hamming
weight. To this end, we introduce the following notation for the set of low Hamming-weight inputs.

Definition 2.1. For 1 ≤ T ≤ n, let H^n_{≤T} denote the subset of {−1, 1}n consisting of all inputs of Hamming
weight at most T . We use |x| to denote the Hamming weight of an input x ∈ {−1, 1}n , so H^n_{≤T} = {x ∈
{−1, 1}n : |x| ≤ T }.


2.2   Two variants of approximate degree and their dual formulations
There are two natural notions of approximate degree for promise problems (i. e., for functions defined
on a strict subset X of {−1, 1}n ). One notion requires an approximating polynomial p to be bounded
in absolute value even on inputs in {−1, 1}n \ X . The other places no restrictions on p outside of the
promise X . We make use of both notions. Hence, we must introduce some (non-standard) notation to
distinguish the two.

Definition 2.2 (Approximate degree with boundedness outside of the promise required). Let ε > 0 and
X ⊆ {−1, 1}n . The ε-approximate degree of a Boolean function f : X → {−1, 1}, denoted deg̃ε ( f ),
is the least degree of a real polynomial p : {−1, 1}n → R such that |p(x) − f (x)| ≤ ε for all x ∈ X and
|p(x)| ≤ 1 + ε for all x ∈ {−1, 1}n \ X . We use the term approximate degree without qualification to refer
to deg̃( f ) = deg̃1/3 ( f ).


     The following standard dual formulation of this first variant of approximate degree can be found in,
e. g., [31].

Proposition 2.3. Let X ⊆ {−1, 1}n , and let f : X → {−1, 1}. Then deg̃ε ( f ) ≥ d if and only if there



exists a function ψ : {−1, 1}n → R with the following properties.

       ∑x∈X ψ(x) · f (x) − ∑x∈{−1,1}n \X |ψ(x)| > ε,                                                      (2.1)

       ∑x∈{−1,1}n |ψ(x)| = 1, and                                                                         (2.2)

       for every polynomial p : {−1, 1}n → R of degree less than d, ∑x∈{−1,1}n p(x) · ψ(x) = 0.           (2.3)

    We will refer to functions ψ : {−1, 1}n → R as dual polynomials. We refer to ∑x∈{−1,1}n |ψ(x)| as
the ℓ1 -norm of ψ, and denote this quantity by ‖ψ‖1 . If ψ satisfies Equation (2.3), it is said to have pure
high degree at least d.
    Given a function ψ : {−1, 1}n → R, and a (possibly partial) function f : X → {−1, 1}, where
X ⊆ {−1, 1}n , we let h f , ψi := ∑x∈X f (x) · ψ(x) − ∑x∈{−1,1}n \X |ψ(x)|, and refer to this as the correlation
of f and ψ. So Condition (2.1) is equivalent to requiring ψ and f to have correlation greater than ε.
Definition 2.4 (Approximate degree with unboundedness permitted outside of the promise). Let ε > 0
and X be a finite set. The ε-unbounded approximate degree of a Boolean function f : X → {−1, 1},
denoted ubdeg̃ε ( f ), is the least degree of a real polynomial p : X → R such that |p(x) − f (x)| ≤ ε for all
x ∈ X (if X is a strict subset of a larger domain, then no constraints are placed on p(x) for x 6∈ X ). We
use the term unbounded approximate degree without qualification to refer to ubdeg̃( f ) = ubdeg̃1/3 ( f ).

     The following standard dual formulation of this second variant of approximate degree can be found
in, e. g., [72]. A dual polynomial ψ : {−1, 1}n → [−1, 1] witnessing the fact that ubdeg̃ε ( f ) ≥ d is the
same as a dual witness for deg̃ε ( f ) ≥ d, but with the additional requirement that ψ(x) = 0 outside of X .

Proposition 2.5. Let X ⊆ {−1, 1}n , and let f : X → {−1, 1}. Then ubdeg̃ε ( f ) ≥ d if and only if there
exists a function ψ : {−1, 1}n → R satisfying the following conditions.

      ψ(x) = 0 for all x 6∈ X ,                                                                           (2.4)

      ∑x∈X ψ(x) · f (x) > ε,                                                                              (2.5)

      ∑x∈{−1,1}n |ψ(x)| = 1, and                                                                          (2.6)

      for every polynomial p : {−1, 1}n → R of degree less than d, ∑x∈{−1,1}n p(x) · ψ(x) = 0.            (2.7)

    Observe that deg̃( f ) and ubdeg̃( f ) coincide for total functions f . To avoid notational clutter, when
referring to the approximate degree of total functions, we will use the shorter notation deg̃( f ).

2.3   Basic facts about polynomial approximations
The seminal paper of Nisan and Szegedy [62] gave tight bounds on the approximate degree of the ANDn
and ORn functions.


Lemma 2.6. For any constant ε ∈ (0, 1), the functions AND and OR on n bits have ε-approximate degree
Θ(n1/2 ), and the same holds for their negations.
    Approximate degree is invariant under negating the inputs or output of a function, and hence the
result for AND implies the result for NAND, OR, etc.
    The following lemma, which forms the basis of the well-known symmetrization argument, is due to
Minsky and Papert [60].
Lemma 2.7. Let p : {−1, 1}n → R be an arbitrary polynomial and let [n]0 denote the set {0, 1, . . . , n}.
Then there is a univariate polynomial q : R → R of degree at most deg(p) such that

                                       q(t) = (1/(n choose t)) · ∑x∈{−1,1}n : |x|=t p(x)

for all t ∈ [n]0 .
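To illustrate Lemma 2.7 concretely (a toy check of our own, not part of the proof), the sketch below symmetrizes the degree-2 polynomial p(x) = x1 x2 on n = 5 variables and confirms that the resulting univariate map t 7→ q(t) has vanishing third finite differences, i. e., agrees with a polynomial of degree at most 2 on {0, . . . , n}.

```python
import itertools
from math import comb

n = 5

def p(x):
    # An arbitrary degree-2 polynomial on {-1,1}^5 (illustrative choice).
    return x[0] * x[1]

def weight(x):
    # Hamming weight |x|: the number of -1 entries.
    return sum(1 for xi in x if xi == -1)

# Symmetrization: q(t) is the average of p over inputs of weight t.
q = [sum(p(x) for x in itertools.product([-1, 1], repeat=n)
         if weight(x) == t) / comb(n, t)
     for t in range(n + 1)]

# A map on {0,...,n} agrees with a univariate polynomial of degree <= 2
# iff its third finite differences vanish.
diffs = q
for _ in range(3):
    diffs = [b - a for a, b in zip(diffs, diffs[1:])]
print(all(abs(d) < 1e-9 for d in diffs))  # True
```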

2.4     Functions of interest
We give formal definitions of the surjectivity and k-distinctness functions we consider, as well as several variations
that will be helpful in proving our lower bounds. The formal definitions of the other functions we study,
including image size testing, SDU, and the Shannon entropy functions, are deferred to Section 6.

2.4.1    Surjectivity
Definition 2.8. For N ≥ R, we define SURJN,R : [R]N → {−1, 1} by SURJN,R (s1 , . . . , sN ) = −1 iff for
every j ∈ [R], there exists an i such that si = j.
    When N and R are clear from context, we will often refer to the function SURJ without the explicit
dependence on these parameters. It will sometimes be convenient to think of SURJN,R as a function
mapping {−1, 1}n → {−1, 1} rather than [R]N → {−1, 1}. When needed, we assume that R
is a power of 2 and an element of [R] is encoded in binary using log R bits. In this case we will view
surjectivity as a function on n = N log R bits, i. e., SURJ : {−1, 1}n → {−1, 1}.
    For technical reasons, when proving lower bounds, it will be more convenient to work with a variant
of SURJ where the range [R] is augmented by a “dummy element” 0 that is simply ignored by the function.
That is, while any of the items s1 , . . . , sN may take the dummy value 0, the presence of a 0 in the input is
not required for the input to be deemed surjective. We denote this variant of surjectivity by dSURJ. More
formally:
Definition 2.9. For N ≥ R, we define dSURJN,R : [R]N0 → {−1, 1} by dSURJN,R (s1 , . . . , sN ) = −1 iff for
every j ∈ [R], there exists an i such that si = j.
   The following simple reduction shows that a lower bound on the approximate degree of dSURJ
implies a lower bound for SURJ itself.
Proposition 2.10. Let ε > 0 and N ≥ R. Then

                           deg̃ε (dSURJN,R ) ≤ deg̃ε (SURJN+1,R+1 ) · log(R + 1).



Proof. Let p : {−1, 1}(N+1)·log(R+1) → R be a polynomial of degree d that ε-approximates
SURJN+1,R+1 . We will use p to construct a polynomial of degree d · log(R + 1) that ε-approximates dSURJN,R .
Recall that an input to dSURJN,R takes the form (s1 , . . . , sN ) where each si is the binary representation of
a number in [R]0 . Define the transformation T : [R]0 → [R + 1] by

                                      T (s) = R + 1 if s = 0, and T (s) = s otherwise.

Note that as a mapping between binary representations, the function T is exactly computed by a vector of
polynomials of degree at most log(R + 1). For every (s1 , . . . , sN ) ∈ [R]N0 , observe that

                        dSURJN,R (s1 , . . . , sN ) = SURJN+1,R+1 (T (s1 ), . . . , T (sN ), R + 1).

Hence, the polynomial
                                               p(T (s1 ), . . . , T (sN ), R + 1)
is a polynomial of degree d · log(R + 1) that ε-approximates dSURJN,R .
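The identity at the heart of this proof can be verified exhaustively for small, illustrative parameters (N = 3, R = 2 below, our own choice):

```python
import itertools

N, R = 3, 2  # illustrative parameters, small enough for brute force

def surj(s, rng):
    # SURJ over the range [rng]: -1 iff every range item appears in s.
    return -1 if all(r in s for r in range(1, rng + 1)) else 1

def dsurj(s):
    # Dummy-augmented surjectivity: the dummy item 0 is ignored.
    return -1 if all(r in s for r in range(1, R + 1)) else 1

def T(si):
    # The transformation from the proof: send the dummy 0 to R + 1.
    return R + 1 if si == 0 else si

# dSURJ_{N,R}(s) = SURJ_{N+1,R+1}(T(s_1), ..., T(s_N), R + 1).
for s in itertools.product(range(R + 1), repeat=N):
    assert dsurj(s) == surj(tuple(T(si) for si in s) + (R + 1,), R + 1)
print("identity verified on all", (R + 1)**N, "inputs")
```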

2.4.2 k-Distinctness
Definition 2.11. For integers k, N, R with k ≤ N, define the function DISTkN,R : [R]N → {−1, 1} by
DISTkN,R (s1 , . . . , sN ) = −1 iff there exist r ∈ [R] and distinct indices i1 , . . . , ik such that si1 = · · · = sik = r.
   As with surjectivity, it will be convenient to work with a variant of k-distinctness where [R] is
augmented with a dummy item:
Definition 2.12. For integers k, N, R with k ≤ N, define the function dDISTkN,R : [R]N0 → {−1, 1} by
dDISTkN,R (s1 , . . . , sN ) = −1 iff there exist r ∈ [R] and distinct indices i1 , . . . , ik such that si1 = · · · = sik = r.

    For k ≥ 2, a lower bound on the approximate degree of dDISTk implies a lower bound on the
approximate degree of DISTk . The restriction that k ≥ 2 is essential, because the function DIST1 is the
constant function that evaluates to TRUE on any input (since at least one range item must always have
frequency at least one), whereas dDIST1 contains ORN as a subfunction, and hence has approximate
degree at least Ω(N 1/2 ).
Proposition 2.13. Let ε > 0, N, R ∈ N, and k ≥ 2. Then

                                deg̃ε (dDISTkN,R ) ≤ deg̃ε (DISTkN,R+N ) · log(R + 1).

Proof. The proof is similar to that of Proposition 2.10, but uses a slightly more involved reduction. Let
p : {−1, 1}N·log(R+N) → R be a polynomial of degree d that ε-approximates DISTkN,R+N . We will use
p to construct a polynomial of degree d · log(R + 1) that ε-approximates dDISTkN,R . For each i = 1, . . . , N,
define a transformation Ti : [R]0 → [R + N] by

                                      Ti (s) = R + i if s = 0, and Ti (s) = s otherwise.


As a mapping between binary representations, the function T = (T1 , . . . , TN ) is exactly computed by a
vector of polynomials of degree at most log(R + 1). For every (s1 , . . . , sN ) ∈ [R]N0 , observe that

                            dDISTkN,R (s1 , . . . , sN ) = DISTkN,R+N (T1 (s1 ), . . . , TN (sN )).

Hence, the polynomial
                                                     p(T1 (s1 ), . . . , TN (sN ))
is a polynomial of degree d · log(R + 1) that ε-approximates dDISTkN,R .
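As with Proposition 2.10, the underlying identity can be checked by brute force for small, illustrative parameters (N = 4, R = 2, k = 2 below, our own choice; recall the reduction needs k ≥ 2):

```python
import itertools
from collections import Counter

N, R, k = 4, 2, 2  # illustrative parameters; the reduction needs k >= 2

def dist(s):
    # k-distinctness: -1 iff some item occurs at least k times.
    return -1 if any(c >= k for c in Counter(s).values()) else 1

def ddist(s):
    # Dummy-augmented variant: occurrences of the dummy 0 are ignored.
    return -1 if any(c >= k for r, c in Counter(s).items() if r != 0) else 1

def transform(s):
    # T_i from the proof: the dummy in position i maps to the fresh
    # item R + i, which therefore occurs at most once (< k).
    return tuple(R + i + 1 if si == 0 else si for i, si in enumerate(s))

for s in itertools.product(range(R + 1), repeat=N):
    assert ddist(s) == dist(transform(s))
print("identity verified on all", (R + 1)**N, "inputs")
```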

2.5   Connecting symmetric properties and block-composed functions
An important ingredient in [33] is the relationship between the approximate degree of a property of a list
of numbers (such as SURJ) and the approximate degree of a simpler block-composed function, defined
as follows.

Definition 2.14. For functions f : Y n → Z and g : X → Y , define the block composition f ◦ g : X n → Z by
( f ◦ g)(x1 , . . . , xn ) = f (g(x1 ), . . . , g(xn )), for all x1 , . . . , xn ∈ X.

   Fix R, N ∈ N, let f : {−1, 1}R → {−1, 1} and let g : {−1, 1}N → {−1, 1}. Suppose g is a symmetric
function, in the sense that for any x ∈ {−1, 1}N and any permutation σ : [N] → [N], we have

                                          g(x1 , . . . , xN ) = g(xσ (1) , . . . , xσ (N) ).

Equivalently, the value of g on any input x depends only on its Hamming weight |x|.
    The functions f and g give rise to two functions. The first, which we denote by F prop : [R]N0 →
{−1, 1}, is a certain property of a list of numbers s1 , . . . , sN ∈ [R]0 . The second, which we denote by
F ≤N : H^{N·R}_{≤N} → {−1, 1}, is the block composition of f and g restricted to inputs of Hamming
weight at most N. Formally, these functions are defined as

       F prop (s1 , . . . , sN ) = f (g(1[s1 = 1], . . . , 1[sN = 1]), . . . , g(1[s1 = R], . . . , 1[sN = R])),

and

       F ≤N (x1 , . . . , xR ) = f (g(x1 ), . . . , g(xR )) if x1 , . . . , xR ∈ {−1, 1}N and |x1 | + · · · + |xR | ≤ N,

with F ≤N undefined otherwise.

    The following proposition from [33], which in turn relies heavily on a clever symmetrization argument
due to Ambainis [6], relates the approximate degrees of the two functions F prop and F ≤N .

Theorem 2.15. Let f : {−1, 1}R → {−1, 1} be any function and let g : {−1, 1}N → {−1, 1} be a
symmetric function. Then for F prop and F ≤N defined above, and for any ε > 0, we have

                                             deg̃ε (F prop ) ≥ ubdeg̃ε (F ≤N ).

   In the case where f = ANDR and g = ORN , the function F prop (s1 , . . . , sN ) is the surjectivity function
augmented with a dummy item, dSURJN,R (s1 , . . . , sN ). Hence,


Corollary 2.16. Let N, R ∈ N. Then for any ε > 0,

                                      deg̃ε (dSURJN,R ) ≥ ubdeg̃ε (F ≤N ),

where F ≤N : H^{N·R}_{≤N} → {−1, 1} is the restriction of ANDR ◦ ORN to H^{N·R}_{≤N} .

Definition 2.17. For integers k, N with k ≤ N, define the function THRkN : {−1, 1}N → {−1, 1} by
THRkN (x) = −1 iff |x| ≥ k.
   If we let f = ORR and g = THRkN , then the function F prop is the dummy augmented k-distinctness
function dDISTkN,R .
Corollary 2.18. Let N, R ∈ N. Then for any ε > 0,

                                      deg̃ε (dDISTkN,R ) ≥ ubdeg̃ε (G≤N ),

where G≤N : H^{N·R}_{≤N} → {−1, 1} is the restriction of ORR ◦ THRkN to H^{N·R}_{≤N} .
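To make the correspondence concrete, the following brute-force sketch (with illustrative parameters N = 3, R = 2, k = 2 of our own choosing, and 0/1 indicator bits fed to g) checks that F prop built from f = ORR and g = THRkN agrees with the dummy-augmented k-distinctness function:

```python
import itertools

N, R, k = 3, 2, 2  # illustrative parameters, small enough for brute force

def thr(bits):
    # THR^k_N on 0/1 indicator bits: -1 iff at least k bits are set.
    return -1 if sum(bits) >= k else 1

def or_fn(ys):
    # OR_R in the -1/+1 convention: -1 iff some input is -1.
    return -1 if -1 in ys else 1

def f_prop(s):
    # F^prop built from f = OR_R and g = THR^k_N on the frequency
    # indicators 1[s_i = r].
    return or_fn([thr([1 if si == r else 0 for si in s])
                  for r in range(1, R + 1)])

def ddist(s):
    # Dummy-augmented k-distinctness, defined directly.
    return -1 if any(sum(1 for si in s if si == r) >= k
                     for r in range(1, R + 1)) else 1

assert all(f_prop(s) == ddist(s)
           for s in itertools.product(range(R + 1), repeat=N))
print("F^prop coincides with dDIST^k on all", (R + 1)**N, "inputs")
```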


2.6   The dual block method
This section collects definitions and preliminary results on the dual block method [54, 75, 81] for
constructing dual witnesses for a block-composed function F ◦ f by combining dual witnesses for F and
f.
Definition 2.19. Let Ψ : {−1, 1}M → R and ψ : {−1, 1}m → R be functions that are not identically
zero. Let x = (x1 , . . . , xM ) ∈ ({−1, 1}m )M . Define the dual block composition of Ψ and ψ, denoted
Ψ ⋆ ψ : ({−1, 1}m )M → R, by

                     (Ψ ⋆ ψ)(x1 , . . . , xM ) = 2^M · Ψ(. . . , sgn (ψ(xi )) , . . . ) · ∏_{i=1}^{M} |ψ(xi )|.

Proposition 2.20 ([33, 75]). The dual block composition satisfies the following conditions.

Preservation of ℓ1 -norm:    If ‖Ψ‖1 = 1, ‖ψ‖1 = 1, and ⟨ψ, 1⟩ = 0, then

                                                  ‖Ψ ⋆ ψ‖1 = 1.                                        (2.8)

Multiplicativity of pure high degree: If ⟨Ψ, P⟩ = 0 for every polynomial P : {−1, 1}M → R of degree
less than D, and ⟨ψ, p⟩ = 0 for every polynomial p : {−1, 1}m → R of degree less than d, then for every
polynomial q : {−1, 1}m·M → R,

                                      deg q < D · d =⇒ ⟨Ψ ⋆ ψ, q⟩ = 0.                                 (2.9)

Associativity:   For every ζ : {−1, 1}mζ → R, ϕ : {−1, 1}mϕ → R, and ψ : {−1, 1}mψ → R, we have

                                           (ζ ⋆ ϕ) ⋆ ψ = ζ ⋆ (ϕ ⋆ ψ).                                 (2.10)
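For a concrete check of Proposition 2.20 (a toy example of our own, not from the paper), one can dual-block-compose the parity dual witnesses Ψ = χ/2^M and ψ = χ/2^m for M = m = 2. Since ⟨ψ, 1⟩ = 0 holds for parity, the composition should have ℓ1-norm 1, and since both witnesses have pure high degree 2, it should be orthogonal to all monomials of degree less than 4:

```python
import itertools

M = m = 2  # block counts/sizes for the toy example

def chi(x):
    # Parity character: product of the +1/-1 entries.
    p = 1
    for xi in x:
        p *= xi
    return p

def Psi(y):            # dual witness for PARITY_M
    return chi(y) / 2**M

def psi(x):            # dual witness for PARITY_m
    return chi(x) / 2**m

def sgn(v):
    return 1 if v >= 0 else -1

def dual_block(xs):
    # Definition 2.19: (Psi * psi)(x_1, ..., x_M).
    val = 2**M * Psi(tuple(sgn(psi(x)) for x in xs))
    for x in xs:
        val *= abs(psi(x))
    return val

def monomial(z, S):
    p = 1
    for i in S:
        p *= z[i]
    return p

blocks = list(itertools.product([-1, 1], repeat=m))
inputs = list(itertools.product(blocks, repeat=M))
flat = lambda xs: [v for x in xs for v in x]

# (2.8): the l1-norm is preserved.
l1 = sum(abs(dual_block(xs)) for xs in inputs)
# (2.9): pure high degrees multiply; here both equal 2, so the
# composition is orthogonal to all monomials of degree < 4.
max_inner = max(
    abs(sum(dual_block(xs) * monomial(flat(xs), S) for xs in inputs))
    for d in range(M * m)
    for S in itertools.combinations(range(M * m), d)
)
print(l1, max_inner)  # 1.0 0.0
```

Indeed, in this special case the composition Ψ ⋆ ψ is just the dual witness for parity on M · m = 4 bits.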


2.7   A refinement of a technical lemma from prior work
The following technical proposition refines techniques of Bun and Thaler [33]. This proposition is useful
for “zeroing out” the mass that a dual polynomial ξ places on inputs of high Hamming weight, if ξ is
obtained via the dual-block method.

Definition 2.21. Let M ∈ N and α, β > 0. A function ω : [M]_0 → R satisfies the (α, β)-decay condition
if

                                ∑_{t=0}^M ω(t) = 0,                                   (2.11)

                                ∑_{t=0}^M |ω(t)| = 1,                                 (2.12)

                                |ω(t)| ≤ α exp(−βt)/t^2    ∀t = 1, 2, . . . , M.      (2.13)
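To make the decay condition concrete, the following sketch (an illustration, not taken from the paper) builds one function ω satisfying the (α, β)-decay condition for the arbitrary demonstration values α = 1, β = 1/2: it places mass −1/2 at t = 0 and spreads +1/2 over t ≥ 1 proportionally to exp(−βt)/t².

```python
import math

# Illustrative omega on [M]_0 = {0, 1, ..., M}; alpha, beta chosen for the demo.
M, alpha, beta = 50, 1.0, 0.5
w = [math.exp(-beta * t) / t**2 for t in range(1, M + 1)]
W = sum(w)
omega = [-0.5] + [0.5 * wt / W for wt in w]

assert abs(sum(omega)) < 1e-12                       # condition (2.11)
assert abs(sum(abs(v) for v in omega) - 1) < 1e-12   # condition (2.12)
for t in range(1, M + 1):                            # condition (2.13)
    assert abs(omega[t]) <= alpha * math.exp(-beta * t) / t**2
```

Condition (2.13) holds here because the normalization 0.5/W is at most α for this choice of parameters.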


Proposition 2.22. Let R ∈ N be sufficiently large, and let Φ : {−1, 1}^R → R with ‖Φ‖_1 = 1. For M ≤ R,
let ω : [M]_0 → R satisfy the (α, β)-decay condition with parameters 1 ≤ α ≤ R^2, β ∈ (4 ln^2 R/(√α R), 1).
    Let N = ⌈20√α⌉R, and define ψ : {−1, 1}^N → R by ψ(x) = ω(|x|)/(N choose |x|). If D < N is such that

                For every polynomial p with deg p < D, we have ⟨Φ ⋆ ψ, p⟩ = 0,        (2.14)

then there exist ∆ ≥ β√α R/(4 ln^2 R) and a function ζ : ({−1, 1}^N)^R → R such that

                For every polynomial p with deg p < min{D, ∆}, we have ⟨ζ, p⟩ = 0,    (2.15)

                ‖ζ − Φ ⋆ ψ‖_1 ≤ 2/9,                                                  (2.16)

                ‖ζ‖_1 = 1,                                                            (2.17)

                ζ is supported on H^{N·R}_{≤N}.                                       (2.18)

    The key refinement of Proposition 2.22 relative to the analysis of Bun and Thaler is that
Proposition 2.22 applies when N = Θ(R) (assuming α = O(1)). In contrast, the techniques of Bun and Thaler
required N = Ω(R · log^2 R). As indicated in Section 1.3, this refinement will be essential in obtaining our
lower bounds for SDU, entropy approximation, and junta testing for constant proximity parameter.
    The proof of Proposition 2.22 occurs in two steps. First, in Proposition 2.23, we show that ξ = Φ ⋆ ψ
places an exponentially small amount of mass on inputs outside of H^{N·R}_{≤N}. Second, in Proposition 2.25, we
construct a correction object ν that zeroes out the mass ξ places outside of H^{N·R}_{≤N} without decreasing its
pure high degree. Combining ξ with ν yields the desired object ζ.

Proposition 2.23. Let Φ : {−1, 1}^R → R and ψ : {−1, 1}^N → R satisfy the conditions of Proposition 2.22.
Then for sufficiently large R, there exists ∆ ≥ β√α R/(4 ln^2 R) such that, for N = ⌈20√α⌉R,

                        ∑_{x ∉ H^{N·R}_{≤N}} |(Φ ⋆ ψ)(x)| ≤ (2NR)^{−2∆}.              (2.19)



Proof. Recall that ψ(x) = ω(|x|)/(N choose |x|), where ω : [M]_0 → R. By Equation (2.11) we may write
ω = ω_{+1} − ω_{−1}, where ω_{+1} and ω_{−1} are non-negative functions satisfying

                        ∑_{t=0}^M ω_{+1}(t) = ∑_{t=0}^M ω_{−1}(t) = 1/2.              (2.20)

    By the definition of dual block composition, we have

            (Φ ⋆ ψ)(x_1, . . . , x_R) = 2^R · Φ(. . . , sgn(ψ(x_i)), . . . ) · ∏_{i=1}^R |ψ(x_i)|.

Consequently,

    ∑_{x ∉ H^{N·R}_{≤N}} |(Φ ⋆ ψ)(x)|

        = 2^R ∑_{z ∈ {−1,1}^R} |Φ(z)| ∑_{(x_1,...,x_R) ∉ H^{N·R}_{≤N} : sgn(ψ(x_i)) = z_i ∀i ∈ [R]} ∏_{i=1}^R |ψ(x_i)|

        = 2^R ∑_{z ∈ {−1,1}^R} |Φ(z)| ∑_{(x_1,...,x_R) ∉ H^{N·R}_{≤N}} ∏_{i=1}^R [ ω_{z_i}(|x_i|) / (N choose |x_i|) ].      (2.21)

    Observe that for any (t_1, . . . , t_R) ∈ [M]_0^R, the number of inputs (x_1, . . . , x_R) ∈ ({−1, 1}^N)^R such that
|x_i| = t_i for all i ∈ [R] is exactly ∏_{i=1}^R (N choose t_i). Hence, defining

                        P = {(t_1, . . . , t_R) ∈ [M]_0^R : t_1 + · · · + t_R > N},

we may rewrite Expression (2.21) as

                        2^R ∑_{z ∈ {−1,1}^R} |Φ(z)| ( ∑_{(t_1,...,t_R) ∈ P} ∏_{i=1}^R ω_{z_i}(t_i) ).

To control this quantity, we appeal to the following combinatorial lemma, whose proof we delay to
Section 2.7.1 (this lemma is a substantial refinement of [33, Lemma 32]).

Lemma 2.24. Let α ≤ R^2, let β ∈ (4 ln^2 R/(√α R), 1), and let R be sufficiently large. Let N = ⌈20√α⌉R.
Let η_i : [M]_0 → R, for i = 1, . . . , R, be a sequence of non-negative functions where for every i,

                        ∑_{r=0}^M η_i(r) ≤ 1/2,                                       (2.22)

                        η_i(r) ≤ α exp(−βr)/r^2    ∀r = 1, . . . , M.                 (2.23)

Let P = {~t = (t_1, . . . , t_R) ∈ [M]_0^R : t_1 + · · · + t_R > N}. Then

                        ∑_{~t ∈ P} ∏_{i=1}^R η_i(t_i) ≤ 2^{−R} · (2NR)^{−2∆},

where ∆ ≥ β√α R/(4 ln^2 R).


   Observe that the functions ω_{z_i} satisfy Condition (2.22) (cf. Equation (2.20)) and Condition (2.23) (cf.
Property (2.13)). We complete the proof by bounding

    2^R ∑_{z ∈ {−1,1}^R} |Φ(z)| ( ∑_{~t ∈ P} ∏_{i=1}^R ω_{z_i}(t_i) ) ≤ 2^R ( ∑_{z ∈ {−1,1}^R} |Φ(z)| ) · 2^{−R} · (2NR)^{−2∆}

                                                                      = (2NR)^{−2∆}.

Here, the equality appeals to the condition that ‖Φ‖_1 = 1.

   Our final dual witness ζ is obtained by modifying ξ = Φ ⋆ ψ to zero out all of the mass it places on
inputs of total Hamming weight larger than N. The following proposition gives sufficient conditions
under which this postprocessing step can be done.

Proposition 2.25 ([33, Proposition 34], building on [65]). Let N ≥ R > D and let ξ : ({−1, 1}^N)^R → R
be any function such that
                        ∑_{x ∉ H^{N·R}_{≤N}} |ξ(x)| ≤ (2NR)^{−2D}.

Then there exists a function ν : ({−1, 1}^N)^R → R such that

                For all polynomials p : ({−1, 1}^N)^R → R, deg p < D =⇒ ⟨ν, p⟩ = 0,

                ‖ν‖_1 ≤ 1/10,

                |x| > N =⇒ ν(x) = ξ(x).

Remark 2.26. Proposition 2.25 is framed exclusively in the language of dual polynomials: it states that if
a dual polynomial ξ places very little mass on inputs of Hamming weight more than N, then there exists
another dual polynomial ν with certain useful properties (we ultimately use ν to zero out the mass that ξ
places on inputs of Hamming weight more than N). Proposition 2.25 also has a clean primal formulation.
Roughly speaking, it is equivalent to a bound on the growth rate of any polynomial of degree at most D
that is bounded at all inputs of Hamming weight at most D. We direct the interested reader to [90] for
details of this primal formulation.
   We are now ready to combine Proposition 2.23 and Proposition 2.25 to complete the proof of
Proposition 2.22.

Proof of Proposition 2.22. Let ξ = Φ ⋆ ψ. By Proposition 2.23, we have

                        ∑_{x ∉ H^{N·R}_{≤N}} |(Φ ⋆ ψ)(x)| ≤ (2NR)^{−2∆},

where ∆ ≥ β√α R/(4 ln^2 R). By Proposition 2.25, there exists a function ν : ({−1, 1}^N)^R → R such that

            For all polynomials p : ({−1, 1}^N)^R → R, deg p < min{D, ∆} =⇒ ⟨ν, p⟩ = 0,     (2.24)

            ‖ν‖_1 ≤ 1/10,                                                                   (2.25)

            |x| > N =⇒ ν(x) = ξ(x).                                                         (2.26)



   Observe that ‖ξ − ν‖_1 > 0, as ‖ξ‖_1 = 1 (cf. Equation (2.8)) and ‖ν‖_1 ≤ 1/10 (cf. Inequality (2.25)).
Define the function
                        ζ(x) = (ξ(x) − ν(x)) / ‖ξ − ν‖_1.

Since ν(x) = ξ(x) whenever |x| > N (cf. Equation (2.26)), the function ζ is supported on the set H^{N·R}_{≤N},
establishing (2.18). We establish (2.16) by computing

    ‖ζ − ξ‖_1 = ∑_{x ∈ ({−1,1}^N)^R} | (ξ(x) − ν(x))/‖ξ − ν‖_1 − ξ(x) |

              ≤ ∑_{x ∈ ({−1,1}^N)^R} [ |1/‖ξ − ν‖_1 − 1| · |ξ(x)| + (1/‖ξ − ν‖_1) · |ν(x)| ]

              ≤ (1/(‖ξ‖_1 − ‖ν‖_1) − 1) · ‖ξ‖_1 + (1/(‖ξ‖_1 − ‖ν‖_1)) · ‖ν‖_1

              ≤ (1/(1 − 1/10) − 1) + (1/10)/(1 − 1/10)

              = 2/9.
   Equation (2.17) is immediate from the definition of ζ . Finally, (2.15) follows from (2.14), (2.24),
and linearity.
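The normalization step above uses only the norm bounds ‖ξ‖_1 = 1 and ‖ν‖_1 ≤ 1/10, so the 2/9 bound can be sanity-checked numerically on arbitrary toy vectors standing in for the dual witnesses (an illustration, not part of the proof):

```python
import random

random.seed(7)
n = 32
xi = [random.uniform(-1, 1) for _ in range(n)]
s = sum(abs(v) for v in xi)
xi = [v / s for v in xi]                    # ||xi||_1 = 1
nu = [random.uniform(-1, 1) for _ in range(n)]
s = sum(abs(v) for v in nu)
nu = [v / (10 * s) for v in nu]             # ||nu||_1 = 1/10

d = sum(abs(a - b) for a, b in zip(xi, nu))       # ||xi - nu||_1
zeta = [(a - b) / d for a, b in zip(xi, nu)]      # normalized difference

err = sum(abs(z - a) for z, a in zip(zeta, xi))   # ||zeta - xi||_1
assert err <= 2 / 9 + 1e-12                       # matches (2.16)
assert abs(sum(abs(z) for z in zeta) - 1) < 1e-12 # matches (2.17)
```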

2.7.1    Proof of Lemma 2.24
Our final task is to prove Lemma 2.24, restated here for the reader’s convenience.
Lemma 2.24. Let α ≤ R^2, let β ∈ (4 ln^2 R/(√α R), 1), and let R be sufficiently large. Let N = ⌈20√α⌉R.
Let η_i : [M]_0 → R, for i = 1, . . . , R, be a sequence of non-negative functions where for every i,

                        ∑_{r=0}^M η_i(r) ≤ 1/2,                                       (2.22)

                        η_i(r) ≤ α exp(−βr)/r^2    ∀r = 1, . . . , M.                 (2.23)

Let P = {~t = (t_1, . . . , t_R) ∈ [M]_0^R : t_1 + · · · + t_R > N}. Then

                        ∑_{~t ∈ P} ∏_{i=1}^R η_i(t_i) ≤ 2^{−R} · (2NR)^{−2∆},

where ∆ ≥ β√α R/(4 ln^2 R).
Lemma 2.27. Let k, n ∈ N with k ≤ n. Then (n choose k) ≤ (en/k)^k.

Lemma 2.28. Let m ∈ N and η > 0. Then

                        ∑_{r=m}^∞ e^{−ηr} = (1/(1 − e^{−η})) · e^{−ηm}.



Proof. We calculate

            ∑_{r=m}^∞ e^{−ηr} = e^{−ηm} · ∑_{r=0}^∞ e^{−ηr} = e^{−ηm} · 1/(1 − e^{−η}).


Lemma 2.29. Let m ∈ N. Then

                        ∑_{r=1}^m 1/√r ≤ 2√m.

Proof. Using the fact that the function 1/√t is decreasing, we may estimate

                        ∑_{r=1}^m 1/√r ≤ ∫_0^m (1/√t) dt = 2√m.
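The three auxiliary lemmas are elementary, and a reader can spot-check them numerically for small parameters (an illustration only; the constants and ranges below are arbitrary):

```python
import math

# Lemma 2.27: C(n, k) <= (e*n/k)^k
for n in range(1, 30):
    for k in range(1, n + 1):
        assert math.comb(n, k) <= (math.e * n / k) ** k

# Lemma 2.28: sum_{r >= m} e^{-eta*r} = e^{-eta*m} / (1 - e^{-eta})
eta, m = 0.3, 5
tail = sum(math.exp(-eta * r) for r in range(m, 2000))  # truncated series
assert abs(tail - math.exp(-eta * m) / (1 - math.exp(-eta))) < 1e-9

# Lemma 2.29: sum_{r=1}^m 1/sqrt(r) <= 2*sqrt(m)
for m2 in range(1, 200):
    assert sum(1 / math.sqrt(r) for r in range(1, m2 + 1)) <= 2 * math.sqrt(m2)
```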

Proof. Let T = ⌊N/(2M)⌋. For each s ∈ {T, . . . , R}, let

                        C(s) = max{ N/(6s ln R), N/(6√(R · s)) }.

    We begin with a simple, but important, structural observation about the set P. Let ~t = (t_1, . . . , t_R) ∈
[M]_0^R be a sequence such that t_1 + · · · + t_R > N. Then we claim that there exists an s ∈ {T, . . . , R} such
that t_i ≥ C(s) for at least s indices i ∈ [R]. To see this, assume without loss of generality that the entries
of ~t are sorted so that t_1 ≥ t_2 ≥ · · · ≥ t_R. Then there must exist an s ≥ T such that t_s ≥ C(s). Otherwise,
because no t_i can exceed M, we would have

    t_1 + · · · + t_R < T · M + ∑_{s=1}^R C(s)

                      ≤ N/2 + (N/(6 ln R)) ∑_{s=1}^{⌊R/ln^2 R⌋} 1/s + (N/(6√R)) ∑_{s=⌈R/ln^2 R⌉}^R 1/√s

                      ≤ N/2 + N/6 + N/3

                      = N,

where the last inequality follows from Lemma 2.29 and the fact that ∑_{s=1}^m s^{−1} ≤ ln m + 1 (and R is
sufficiently large). Since the entries of ~t are sorted, the preceding values t_1, . . . , t_{s−1} ≥ C(s) as well.
    For each subset S ⊆ [R], define

                        P_S = {~t ∈ P : t_i ≥ C(|S|) for all indices i ∈ S}.

The observations above guarantee that for every ~t = (t_1, . . . , t_R) ∈ P, there exists some set S of size at least

s ∈ {T, . . . , R} such that t_i ≥ C(s) for all i ∈ S. Hence

  ∑_{~t ∈ P} ∏_{i=1}^R η_i(t_i)

    ≤ ∑_{s=T}^R ∑_{S ⊆ [R] : |S| = s} ∑_{~t ∈ P_S} ∏_{i=1}^R η_i(t_i)

    ≤ ∑_{s=T}^R (R choose s) ( max_{i ∈ [R]} ∑_{r=⌈C(s)⌉}^M η_i(r) )^s ( max_{i ∈ [R]} ∑_{r=0}^M η_i(r) )^{R−s}

    ≤ 2^{−R} ∑_{s=T}^R (R choose s) ( ∑_{r=⌈C(s)⌉}^M 2α exp(−βr) r^{−2} )^s            by Properties (2.22) and (2.23)

    ≤ 2^{−R} ∑_{s=T}^R (Re/s)^s (2α/(C(s))^2)^s · ∑_{r=⌈C(s)⌉}^∞ exp(−βrs)             by Lemma 2.27

    ≤ 2^{−R} ∑_{s=T}^R (Re/s)^s (2α/(C(s))^2)^s · (1/(1 − e^{−βs})) · exp(−βsC(s))     by Lemma 2.28

    ≤ 2^{−R} ∑_{s=T}^R (Re/s)^s (72αRs/N^2)^s · (2/β) · exp(−βN/(6 ln R))              by definition of C(s),
                and since 1/(1 − e^{−βs}) ≤ 1/(1 − (1 − β/2)) = 2/β for s ≥ 1, β ∈ (0, 1)

    ≤ 2^{−R} ∑_{s=T}^R 2^{−s} · exp(−3β√α R/ln R + ln(2/β))                            setting N = ⌈20√α⌉R

    ≤ 2^{−R} · (2NR)^{−2∆},

where
                ∆ = (1/(2 ln(2NR))) · (3β√α R/ln R − ln(2/β)) ≥ β√α R/(4 ln^2 R)

holds for sufficiently large R by the restrictions placed on α and β in the statement of the lemma.


  3        Upper bound for surjectivity
The goal of this section is to prove that the approximate degree of the surjectivity function (Definition 2.8)
is Õ(N^{3/4}).

Theorem 3.1. For any R ∈ N, we have deg̃(SURJ_{N,R}) = Õ(N^{3/4}).

    For the remainder of the section, we focus on proving Theorem 3.1 in the case that R = Θ(N).
This is without loss of generality by the following argument. If R > N, then surjectivity is identically
false, and hence has (exact) degree 0. And if R = o(N), then we can reduce to the case R = Θ(N) as
follows. Let N′ = 2N and R′ = R + N; clearly R′ = Θ(N). Given an input x to SURJ_{N,R}, obtain an input
x′ to SURJ_{N′,R′} by appending range elements R + 1, . . . , R + N to x. This construction guarantees that
SURJ_{N,R}(x) = SURJ_{N′,R′}(x′). It follows that an approximation of degree Õ(N^{3/4}) for SURJ_{N′,R′} implies
an approximation of the same degree for SURJ_{N,R}.
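The padding reduction above is simple enough to check mechanically. The sketch below treats SURJ as a plain Boolean predicate on a list of range items (the helper names are ours, chosen for illustration) and verifies that padding preserves the function value on random instances:

```python
import random

def surj(x, R):
    """SURJ_{N,R}: is every range element 1..R hit by some entry of x?"""
    return set(range(1, R + 1)).issubset(x)

def pad(x, R, N):
    """Map an input of SURJ_{N,R} to one of SURJ_{N',R'} with N' = 2N,
    R' = R + N, by appending range elements R+1, ..., R+N."""
    return x + list(range(R + 1, R + N + 1))

random.seed(0)
N, R = 12, 5
for _ in range(200):
    x = [random.randint(1, R) for _ in range(N)]
    # The appended elements cover R+1..R+N, so x' is surjective onto
    # [R'] exactly when x is surjective onto [R].
    assert surj(x, R) == surj(pad(x, R, N), R + N)
```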


Section Outline. This section is structured as follows. Section 3.1 introduces some notation that is specific
to this section. Section 3.2 provides some intuition for the construction of the polynomial approximation
for surjectivity using the simpler function NOR as a warmup example. Section 3.3 introduces some
terminology that is useful for providing intuitive descriptions of our final polynomial construction using
the language of algorithms. Finally, in Section 3.4 we formally apply our strategy to surjectivity in order
to prove Theorem 3.1.

3.1   Notation
In this section, we make a few departures from the notation used in the introduction and later sections
in order to more easily convey the algorithmic intuition behind our polynomial constructions. First,
we will consider Boolean functions f : {0, 1}n → {0, 1}, where 1 corresponds to logical TRUE and 0
corresponds to logical FALSE. For such a Boolean function, we say that a polynomial p : {0, 1}n → R is
an ε-approximating polynomial for f if p(x) ∈ [0, ε] when f (x) = 0 and p(x) ∈ [1 − ε, 1] when f (x) = 1.
For such a polynomial p, it will be useful to think of p(x) as representing the probability that a randomized
or quantum algorithm accepts when run on input x.
    By extension, throughout this section we will use deg̃_ε(f) to denote the least degree of a real
polynomial p that ε-approximates f in the sense described above, and we will write deg̃(f) = deg̃_{1/3}(f).
Given an input x ∈ {0, 1}^n, we will use |x| to denote its Hamming weight (i. e., |x| = ∑_i x_i).

3.2   Warmup: Approximating NOR
To convey the intuition behind our polynomial construction for surjectivity, we start by considering a
much simpler function as an illustrative example. Consider the negation of the OR function on n bits,
NOR : {0, 1}n → {0, 1}. We will give a novel construction of an approximating polynomial for NOR of
degree Õ(√n). Of course, this is not terribly interesting since it is already known that deg̃(NOR) = Θ(√n)
(cf. Lemma 2.6). But this construction highlights the main idea behind the more involved constructions to
follow.
     In many models of computation, such as deterministic or randomized query complexity, the NOR
function remains just as hard if we restrict to inputs with |x| = 0 or |x| = 1 (where |x| denotes the Hamming
weight of x ∈ {0, 1}n ). This is fairly intuitive, since distinguishing these two types of inputs essentially
requires finding a single 1 among n possible locations. The fact that these inputs represent the “hard case”
is true for approximate degree as well: any polynomial that uniformly approximates NOR to error 1/3 on
Hamming weights 0 and 1, and merely remains bounded in [0, 1] on the rest of the hypercube, has degree
    √
Ω( n) [62].
     However, if we remove the boundedness constraint on inputs of Hamming weight larger than 1, then
there is a polynomial of degree one that exactly equals the NOR function on inputs of Hamming weight 0
and 1, namely, the polynomial 1 − ∑_i x_i. That is, if we view NOR as a promise function mapping H^n_{≤1} to
{−1, 1}, then its approximate degree is Θ(n^{1/2}), but its unbounded approximate degree is just 1.
    More generally, suppose that we only want the polynomial to approximate the NOR function on all
inputs of Hamming weight up to T ≤ n, and we place no restrictions on the polynomial when evaluated
at inputs of Hamming weight larger than T. This can be achieved by a polynomial of degree O(√T) (see
Lemma 3.3 for a proof).


    Let us refer to the set of low Hamming-weight inputs as P, i. e.,

                        P = H^n_{≤T} = {x ∈ {0, 1}^n : 0 ≤ |x| ≤ T}.                  (3.1)
    The above discussion shows that we can construct a polynomial p of degree O(√T) that tightly
approximates NOR on inputs x ∈ P, though |p(x)| may be exponentially large for x ∉ P. On the other
hand, distinguishing inputs with |x| = 0 from inputs with x ∉ P also seems “easy”. For example, a
randomized algorithm that simply samples Θ(n/T) bits and declares |x| = 0 if and only if it does not see
a 1 is correct with high probability. Analogously, we can construct a low-degree polynomial q̃ which
distinguishes between |x| = 0 or x ∉ P (and is bounded in [0, 1] for all inputs in {0, 1}^n): it is not hard to
show (via an explicit construction involving Chebyshev polynomials, or by appeal to a quantum algorithm
called quantum counting [22]) that there exists a polynomial q̃ of degree O(√(n/T)) that accomplishes
this.
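The randomized sampler mentioned above can be sketched directly (an illustration; the function name and the constant 3 in the sample count are our arbitrary choices, not from the paper):

```python
import random

def looks_all_zero(x, T, seed=None):
    """Draw Theta(n/T) random coordinates of x and declare |x| = 0
    iff no sampled bit is 1.  Always accepts the all-zero input;
    rejects any input with |x| > T with high probability."""
    rng = random.Random(seed)
    n = len(x)
    samples = 3 * ((n + T - 1) // T)   # Theta(n/T) samples
    return all(x[rng.randrange(n)] == 0 for _ in range(samples))

n, T = 100, 10
assert looks_all_zero([0] * n, T, seed=1)       # |x| = 0: always accepted
heavy = [1] * 50 + [0] * 50                     # |x| > T
assert not looks_all_zero(heavy, T, seed=1)     # rejected w.h.p.
```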
    To summarize the above discussion, we can construct polynomials p and q̃, of degree O(√T) and
O(√(n/T)), respectively, with the following properties.

        p(x) ∈ { [9/10, 1] if |x| = 0;   [0, 1/10] if 1 ≤ |x| ≤ T;   R if x ∉ P }
        q̃(x) ∈ { [9/10, 1] if |x| = 0;   [0, 1]    if 1 ≤ |x| ≤ T;   [0, 1/10] if x ∉ P }     (3.2)

    Now consider the polynomial p(x) · q̃(x). This polynomial approximates NOR on |x| = 0, since its
value is in [0.81, 1]. It also approximates NOR on 1 ≤ |x| ≤ T, since its value is in [0, 1/10]. However,
when x ∉ P, although q̃(x) is small, we cannot ensure that the product p(x) · q̃(x) is small, since we have
no control over p(x) for such x.
    To fix this, we will construct a new polynomial q that behaves like q̃ for x ∈ P and is extremely small
when x ∉ P (in particular, 0 ≤ q(x) ≪ 1/|p(x)| for such x). To understand how small we need q to be, we
need to determine just how large p(x) can be on inputs with x ∉ P.
    To understand the behavior of p(x) outside of P, we can either analyze the behavior of an explicit
polynomial of our choice for the NOR function or we can appeal to a general result about the growth
of polynomials that are bounded in a region (see Lemma 3.3). In either case, we get that there exists an
upper bound M = exp(O(√T log n)) such that for all x ∉ P, |p(x)| ≤ M.
    This leads us to design a new polynomial q, which has the same behavior as q̃ for all x ∈ P, but is at
most 1/(3M) when x ∉ P. We can construct the polynomial q from q̃ by applying standard error reduction
to reduce the approximation error of q̃ to ε = 1/(3M), which increases the degree by a multiplicative
factor of O(log(3M)) = O(√T log n). Thus, deg(q) = O(√(n/T) · √T log n) = Õ(√n).
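One standard error-reduction gadget (offered here as an illustration; the paper's construction may differ in details) composes q̃ with the degree-k "majority of k runs" polynomial: the probability that a majority of k independent coins, each heads with probability y, comes up heads. This keeps [0, 1] mapped into [0, 1] and, by a Chernoff bound, shrinks a [0, 1/10] vs. [9/10, 1] gap to error exp(−Ω(k)), at the cost of multiplying the degree by k = O(log(1/ε)):

```python
from math import comb

def amplify(y, k):
    """Pr[majority of k independent coins, each heads w.p. y, is heads]:
    a degree-k polynomial in y used for error reduction (k odd)."""
    return sum(comb(k, j) * y**j * (1 - y)**(k - j)
               for j in range(k // 2 + 1, k + 1))

k = 49
assert amplify(1 / 10, k) < 1e-6        # reject-side error shrinks
assert amplify(9 / 10, k) > 1 - 1e-6    # accept-side error shrinks
assert 0 <= amplify(0.5, k) <= 1        # stays bounded on intermediate values
```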
    In summary, we have now constructed polynomials p and q with the following behavior:

        p(x) ∈ { [9/10, 1] if |x| = 0;         [0, 1/10] if 1 ≤ |x| ≤ T;   [0, M] if x ∉ P }
        q(x) ∈ { [1 − 1/(3M), 1] if |x| = 0;   [0, 1]    if 1 ≤ |x| ≤ T;   [0, 1/(3M)] if x ∉ P }     (3.3)

    Caricatures of these polynomials are depicted in Figure 2 and Figure 3. It is now easy to see that the
product r(x) = p(x) · q(x) is a (1/3)-error approximation to NOR for all x ∈ {0, 1}^n. The degree of the
constructed polynomial is deg(r) ≤ deg(p) + deg(q) = Õ(√T + √n) = Õ(√n).
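The case analysis for the product r = p · q can be checked with simple interval arithmetic (an illustration; the concrete value of M below is an arbitrary stand-in for the bound exp(O(√T log n))):

```python
def interval_mul(a, b):
    """Range of u*v for u in interval a, v in interval b (closed intervals)."""
    c = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(c), max(c))

M = 1e6  # illustrative stand-in for exp(O(sqrt(T) log n))
p_rng = {"zero": (9/10, 1.0), "light": (0.0, 1/10), "outside": (0.0, M)}
q_rng = {"zero": (1 - 1/(3*M), 1.0), "light": (0.0, 1.0), "outside": (0.0, 1/(3*M))}

r_rng = {c: interval_mul(p_rng[c], q_rng[c]) for c in p_rng}
assert r_rng["zero"][0] >= 2/3             # r is close to 1 when |x| = 0
assert r_rng["light"][1] <= 1/3            # r is close to 0 when 1 <= |x| <= T
assert r_rng["outside"][1] <= 1/3 + 1e-9   # the 1/(3M) cutoff tames p's growth
```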





   Figure 2: Caricature of the polynomial p(x).
   Figure 3: Caricature of the polynomial q(x).

    Thus our constructed polynomial, r, has degree Õ(√n) and approximates NOR to error 1/3, which is
optimal by Lemma 2.6.

3.3    Informal terminology: polynomials as algorithms
Before moving on to surjectivity, we briefly introduce some terminology that will allow us to convey the
intuition of more involved constructions by reasoning about polynomials as if they were algorithms.
    Consider three Boolean functions p1 : {0, 1}n → {0, 1}, p2 : {0, 1}n → {0, 1} and p3 : {0, 1}n →
{0, 1}, and suppose there are deterministic algorithms A1 , A2 , and A3 that compute these Boolean
functions exactly. Now it makes sense to say “run algorithm A1 on input x; if it accepts, then output A2 (x),
and if it rejects, then output A3 (x).” The Boolean function computed by this procedure is A2 (x) if A1 (x) = 1 and
A3 (x) if A1 (x) = 0. Observe that the following polynomial composition of p1 , p2 , p3 computes the same
Boolean function: p1 (x)p2 (x) + (1 − p1 (x))p3 (x).
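As a concrete sanity check, here is a small sketch with hypothetical choices p1 = AND, p2 = OR, p3 = XOR on two bits (these particular functions are assumptions of this example, not from the text), verifying that p1(x)p2(x) + (1 − p1(x))p3(x) computes exactly the branching behavior described above on every Boolean input.

```python
from itertools import product

# Hypothetical example: exact multilinear polynomials for three Boolean functions on {0,1}^2.
def p1(x): return x[0] * x[1]                    # AND(x)
def p2(x): return x[0] + x[1] - x[0] * x[1]      # OR(x)
def p3(x): return x[0] + x[1] - 2 * x[0] * x[1]  # XOR(x)

def branch(x):
    # "Run A1; if it accepts output A2(x), otherwise output A3(x)" as a polynomial.
    return p1(x) * p2(x) + (1 - p1(x)) * p3(x)

for x in product([0, 1], repeat=2):
    assert branch(x) == (p2(x) if p1(x) == 1 else p3(x))
```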
    We would like to use this terminology to discuss combining polynomials more generally. For
polynomials p that approximate Boolean functions by outputting a value in [0, 1] on any input x, we can
imagine p(x) as representing the probability that a randomized (or quantum) algorithm accepts on input
x, and the same interpretation goes through. We will also use the same terminology for polynomials that
output values greater than 1, in which case we cannot interpret the output as a probability, but expressions
like p1 (x)p2 (x) + (1 − p1 (x))p3 (x) still make sense.
    For example, in the previous section we had two polynomials p and q and we constructed the
polynomial r(x) = p(x) · q(x) by multiplying the two polynomials together. Another way to think of
this is that r is the polynomial obtained when we “run” the polynomial q and output p if it accepts and
output 0 if it rejects. This yields the polynomial q(x)p(x) + (1 − q(x)) · 0 = q(x)p(x). We would like to
informally describe polynomial constructions using this language, which will be especially useful when
the constructions become more involved. So for example, we would informally describe the polynomial
we constructed for the NOR function as follows (Polynomial 1):

Polynomial 1 An informal description of the polynomial approximation for NOR
 1: Using q, check if |x| > T (with exponentially small probability of error if indeed |x| > T ). If so, halt
      and output 0.
 2: Using p, compute NOR under the promise that 0 ≤ |x| ≤ T and output the result.



                      T HEORY OF C OMPUTING, Volume 16 (10), 2020, pp. 1–71                               28
                                 T HE P OLYNOMIAL M ETHOD S TRIKES BACK

3.4     Approximating surjectivity
We now construct a polynomial that approximates surjectivity using the strategy described above. Recall
that SURJ : [R]N → {0, 1} is defined by SURJ(x1 , . . . , xN ) = 1 if and only if for all r ∈ [R] there exists an
i ∈ [N] such that xi = r.
     In this section, we will use the notation

                                          #r (x) = |{i ∈ [N] : xi = r}|                                    (3.4)

to denote the number of times the range element r appears in the input x. Note that the surjectivity
function evaluates to 1 if and only if #r (x) ≥ 1 for all r ∈ [R].
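In code, #r(x) and SURJ are straightforward to transcribe; the sketch below (a direct illustration, not part of the paper's construction) simply counts occurrences of each range item.

```python
from collections import Counter

def counts(x):
    """Return the map r -> #_r(x), the number of indices i with x_i = r."""
    return Counter(x)

def surj(x, R):
    """SURJ(x) = 1 iff every range item r in [R] = {1, ..., R} appears at least once."""
    c = counts(x)
    return int(all(c[r] >= 1 for r in range(1, R + 1)))

assert surj([1, 2, 3, 2], 3) == 1   # all of 1, 2, 3 appear
assert surj([1, 1, 3, 3], 3) == 0   # 2 never appears
```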
    Finally, we will also need to consider a generalized version of surjectivity, R-surjectivity for some set
R ⊆ [R], which we denote SURJR . As with surjectivity,

                                           SURJR : [R]N → {0, 1},                                          (3.5)

and SURJR (x1 , . . . , xN ) = 1 if and only if for all r ∈ R there exists an i ∈ [N] such that xi = r. In other
words, SURJR (x) = 1 if and only if for all r ∈ R we have #r (x) ≥ 1. Note that SURJ[R] = SURJ. Our
construction will actually show more generally that the approximate degree of SURJR is Õ(N^{3/4}) for
all R ⊆ [R].


3.4.1    Approximating surjectivity on the hard inputs
To implement the strategy described in Section 3.2 in the context of surjectivity, we first choose a set P
of inputs that we consider to be “hard”. Since surjectivity can be phrased as asking whether all range
items appear at least once in the input, it is natural to consider the hard inputs to be those that have few
occurrences of each range item. Intuitively, this is because on such inputs, evidence that any given range
item r appears at least once in the input is hard to find.
    To this end, we define P as the set of inputs for which every range item appears at most T times, for
a parameter T to be chosen later:

                                       P = {x : ∀r ∈ [R], #r (x) ≤ T }.                                    (3.6)

More generally, when we consider R-surjectivity, we denote the set of hard inputs PR and define it as

                                       PR = {x : ∀r ∈ R, #r (x) ≤ T }.                                     (3.7)

    In this section, we will construct a polynomial pR that approximates SURJR on PR to bounded error,
but may be exponentially large outside of PR . The value of T in the definition of PR will be chosen later;
for now we only assume that our choice will satisfy T = N Θ(1) , which simplifies some expressions since
we have log T = Θ(log N).
Overview of the construction of pR . To construct a polynomial that works on the hard inputs, PR , we
first express SURJR as
                              SURJR (x) = ⋀_{r∈R} ⋁_{i∈[N]} 1[xi = r],                       (3.8)



where 1[xi = r] is the indicator function that takes value 1 when xi = r and 0 otherwise. Observe that for
any fixed r ∈ R, the function 1[xi = r] depends on only log R bits of x and hence can be exactly computed
by a polynomial of degree at most log R.
    Since our goal in this section is to construct a polynomial pR that approximates SURJR on all inputs
in PR and may be exponentially large outside of PR , we can assume that each inner OR gate in Equation
(3.8) is fed an input of Hamming weight at most T . Hence, our approach will be to first construct a
low-degree polynomial q that approximates AND|R| ◦ ORN for inputs in (H^N_{≤T})^{|R|}. We then obtain the
polynomial pR that approximates SURJR at all inputs in PR by composing q with the indicator functions
{1[xi = r]}i∈[N],r∈R . Notice that the degree of pR is at most deg(q) · log R.
    To construct q, our approach is as follows. First, we construct a polynomial VT of degree O(T^{1/2} log N)
that approximates ORN to error O(1/N) at all inputs of Hamming weight at most T . (However, VT (x)
may be as large as exp(T^{1/2} log N) for inputs x of larger Hamming weight.) Invoking Lemma 2.6, we
let w be a polynomial of degree Θ(N^{1/2}) that approximates AND|R| to error 1/20, and finally we define
q := w ◦ VT . A simple and elegant analysis of Buhrman et al. [25] (cf. Lemma 3.4) allows us to argue that
q indeed approximates AND|R| ◦ ORN on (H^N_{≤T})^{|R|}.

Preliminary lemmas. Before formally defining and analyzing pR , we record a few important facts about
Chebyshev polynomials that will be useful throughout the remainder of this section.

Lemma 3.2 (Properties of Chebyshev polynomials). Let d ∈ N.

(1) There exists a polynomial Td : R → R (the Chebyshev polynomial of degree d) such that Td (x) ∈ [−1, 1]
    for all x ∈ [−1, 1] and Td (1 + µ) ≥ (1/2) exp(d√µ) for all µ ∈ (0, 1).

(2) For any polynomial p : R → R of degree d with |p(x)| ≤ 1 for all x ∈ [−1, 1], we have that for any x
    with |x| > 1,
                                      |p(x)| ≤ |Td (x)| ≤ (2|x|)d ,                                (3.9)

    where Td is the Chebyshev polynomial of degree d.

Proof. To establish property (1), we use the fact that for µ > 0, the value of the degree-d Chebyshev
polynomial can be written [37, §3.2 Problem 8(f)] as

                     Td (1 + µ) = cosh(d arcosh(1 + µ))
                                ≥ cosh(d√µ)                          for µ ≤ 1
                                ≥ (1/2) · exp(d√µ).

Property (2) appears as [37, §3.2 Problem 19].
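Both properties are easy to check numerically. The sketch below evaluates Td via the standard three-term recurrence (an implementation choice of this illustration, not stated in the text) and tests the growth bound of Property (1) for a few values of d and µ, plus boundedness on [−1, 1].

```python
import math

def cheb(d, x):
    # Chebyshev T_d via the recurrence T_0 = 1, T_1 = x, T_d = 2x T_{d-1} - T_{d-2}.
    t0, t1 = 1.0, float(x)
    if d == 0:
        return t0
    for _ in range(d - 1):
        t0, t1 = t1, 2 * x * t1 - t0
    return t1

# Property (1): growth just outside [-1, 1].
for d in (5, 10, 20):
    for mu in (0.01, 0.1, 0.5):
        assert cheb(d, 1 + mu) >= 0.5 * math.exp(d * math.sqrt(mu))

# Boundedness: |T_10(x)| <= 1 on [-1, 1].
assert all(abs(cheb(10, -1 + k / 50)) <= 1 + 1e-9 for k in range(101))
```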

    We are now ready to establish the existence of a low-degree polynomial that approximates OR at all
inputs of low Hamming weight.



Lemma 3.3 (Approximating OR on inputs of low Hamming weight). Let ε ∈ (0, 1) and 1 ≤ T ≤ N.
There is a polynomial VT,ε : {0, 1}^N → R of degree O(√T log(1/ε)) such that

         VT,ε (x) ∈ [0, ε]                                                 if |x| = 0
         VT,ε (x) ∈ [1 − ε, 1]                                             if 1 ≤ |x| ≤ T         (3.10)
         VT,ε (x) ∈ [−a, a] for some a ≤ exp(O(√T · log N · log(1/ε)))     if |x| > T

Proof. Choose d = O(√T log(1/ε)) so that M := Td (1 + 1/T ) + 1 ≥ 2/ε (as guaranteed by Property 1
of Lemma 3.2). Define VT,ε by the following affine transformation of Td :

                        VT,ε (x) = 1 − (1/M) − (1/M) · Td (1 + 1/T − |x|/T ).

Then for |x| = 0, we have

                        VT,ε (x) = 1 − (1/M) − (1/M) · Td (1 + 1/T ) = 0.

If 1 ≤ |x| ≤ T , then 1 + 1/T − |x|/T ∈ [−1, 1], so VT,ε (x) ∈ [1 − ε, 1]. Finally, if T + 1 ≤ |x| ≤ N, then

                 |VT,ε (x)| ≤ 1 + (1/M) · Td (N/T ) ≤ exp(O(√T · log N · log(1/ε))),

where the final inequality holds by Equation (3.9).
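The construction in this proof can be replayed numerically. The sketch below picks small illustrative parameters (T = 16, ε = 0.05, both assumptions of this example), finds the least d with Td(1 + 1/T) + 1 ≥ 2/ε, and checks the first two cases of (3.10) as a function of the Hamming weight |x|.

```python
import math

def cheb(d, x):
    # Chebyshev T_d via the recurrence T_0 = 1, T_1 = x, T_d = 2x T_{d-1} - T_{d-2}.
    t0, t1 = 1.0, float(x)
    if d == 0:
        return t0
    for _ in range(d - 1):
        t0, t1 = t1, 2 * x * t1 - t0
    return t1

T, eps = 16, 0.05            # illustrative parameters, not from the paper
d = 1
while cheb(d, 1 + 1 / T) + 1 < 2 / eps:   # choose d so that M >= 2/eps
    d += 1
M = cheb(d, 1 + 1 / T) + 1

def V(hw):
    # V_{T,eps}(x) depends on x only through its Hamming weight hw = |x|.
    return 1 - 1 / M - cheb(d, 1 + 1 / T - hw / T) / M

assert abs(V(0)) < 1e-9                   # |x| = 0: exactly 0
for hw in range(1, T + 1):                # 1 <= |x| <= T: in [1 - eps, 1]
    assert 1 - eps <= V(hw) <= 1 + 1e-9
```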

    The following lemma shows that if p and q are approximating polynomials for Boolean functions f
and g, respectively, then the block composition p ◦ q approximates f ◦ g, with a blowup in error that is
proportional to the number of variables over which f is defined. The proof is due to Buhrman et al. [25],
but our formulation is slightly different so we give the proof for completeness.

Lemma 3.4. Let f : {0, 1}n → {0, 1} and g : X → {0, 1} be Boolean functions, where X ⊆ {0, 1}m for
some m. Let p : {0, 1}n → [0, 1] be an ε-approximating polynomial for f , and let q : X → [0, 1] be a δ -
approximating polynomial for g. Then the block composition p ◦ q : X n → R is an (ε + δ n)-approximating
polynomial for f ◦ g : X n → {0, 1}.

Proof. Fix any input x = (x1 , . . . , xn ) ∈ X n , and let y = (g(x1 ), . . . , g(xn )) ∈ {0, 1}n . Let z ∈ [0, 1]n be
defined by z = (q(x1 ), . . . , q(xn )). Since p is an ε-approximating polynomial for f , by the triangle
inequality, it suffices to show that |p(y) − p(z)| ≤ δ n.
    Let Z be a random variable on {0, 1}^n where each Zi = 1 independently with probability zi . Then due
to the multilinearity of p, we have

                 p(z) = E[p(Z)] = Pr[Z = y] · p(y) + Pr[Z ≠ y] · E[p(Z) | Z ≠ y].

Since q is a δ -approximating polynomial for g, we have |yi − zi | ≤ δ for every i ∈ [n]. Hence,

                                           Pr[Z = y] ≥ (1 − δ )n ≥ 1 − δ n.


Because p is bounded in [0, 1], this implies

                                      p(z) ≥ (1 − δ n) · p(y) + 0 ≥ p(y) − δ n

and
                                        p(z) ≤ 1 · p(y) + δ n · 1 = p(y) + δ n,
completing the proof.
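The heart of this proof — that a multilinear p bounded in [0, 1] moves by at most δn when its Boolean inputs are perturbed by δ — can be checked directly. In this sketch, p is the multilinear extension of arbitrary random values in [0, 1] on the cube (an illustrative p, not one from the construction).

```python
import itertools, math, random

random.seed(0)
n = 3
# A multilinear p bounded in [0,1] on {0,1}^n, given by its table of cube values.
vals = {x: random.random() for x in itertools.product([0, 1], repeat=n)}

def p(z):
    # Multilinear extension: p(z) = sum_x vals[x] * prod_i z_i^{x_i} (1-z_i)^{1-x_i}.
    return sum(v * math.prod(z[i] if x[i] else 1 - z[i] for i in range(n))
               for x, v in vals.items())

delta = 0.05
y = (1, 0, 1)
# z_i = q(x_i) is within delta of the Boolean value y_i = g(x_i).
z = tuple(yi - delta if yi else yi + delta for yi in y)

# Key inequality from the proof: |p(y) - p(z)| <= delta * n.
assert abs(p(y) - p(z)) <= delta * n + 1e-12
```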

Formal definition of pR . Let w be a (1/20)-approximating polynomial for AND|R| of degree O(|R|1/2 )
whose existence is guaranteed by Lemma 2.6. We may assume that w(x) ∈ [0, 1] for all x ∈ {0, 1}n (we
will exploit this assumption in the proof of Lemma 3.7 below, as it allows us to apply Lemma 3.6 below
to w). Let q := w ◦ VT,1/(20n) . Finally, let us define pR to be the composition of q with the indicator
functions {1[xi = r]}i∈[N],r∈R . For example, if R = {1, . . . , |R|}, then

                pR = q (1 [x1 = 1] , 1 [x2 = 1] , . . . , 1 [xN = 1] , 1 [x1 = 2] , . . . , 1 [xN = |R|]) .

    Observe that

   deg(pR ) ≤ deg(w) · deg(VT,1/(20n) ) · deg(1[xi = r]) ≤ O(|R|^{1/2} · T^{1/2} log n · log R) ≤ Õ(√(NT )).

Showing pR approximates SURJR on PR . Lemma 3.4 implies that:

                 |q(x) − AND|R| ◦ OR(x)| ≤ 1/10 for all x ∈ (H^N_{≤T})^{|R|}.                (3.11)
An immediate consequence is the following lemma.
Lemma 3.5. |pR (x) − SURJR (x)| ≤ 1/10 for all x ∈ PR .

Bounding pR outside of PR . For an input x ∈ [R]N outside of PR , let bR (x) be the number of range
items that appear more than T times, i. e.,

                                 bR (x) = |{r ∈ R : #r (x) > T }|.                           (3.12)

We claim that |pR (x)| ≤ exp(bR (x) · Õ(√T )). This bound relies on the following elementary lemma.
Lemma 3.6. Let p : R^n → R be a multilinear polynomial with p(x) ∈ [0, 1] for all x ∈ {0, 1}^n . Then for
x ∈ R^n , we have
                                 |p(x)| ≤ ∏_{i=1}^{n} (|1 − xi | + |xi |).

Proof. We prove the lemma by induction on the number of variables n. If n = 0, then p is a constant in
the interval [0, 1] and the claim is true.
    Now suppose the claim is true for n − 1 variables, and let p : Rn → R with p(x) ∈ [0, 1] for x ∈ {0, 1}n .
We begin by decomposing

                            p(x) = (1 − xn ) · q0 (x1 , . . . , xn−1 ) + xn · q1 (x1 , . . . , xn−1 )


where q0 and q1 are themselves multilinear polynomials. Since p(x) ∈ [0, 1] for all x ∈ {0, 1}^n , this is in
particular true when xn = 0. Hence q0 (x′ ) ∈ [0, 1] for all x′ ∈ {0, 1}^{n−1} . Similarly, setting xn = 1 reveals
that q1 (x′ ) ∈ [0, 1] for all x′ ∈ {0, 1}^{n−1} . Now for any x ∈ R^n with x′ = (x1 , . . . , xn−1 ) we have

                        |p(x)| = |(1 − xn ) · q0 (x′ ) + xn · q1 (x′ )|
                               ≤ |1 − xn | · |q0 (x′ )| + |xn | · |q1 (x′ )|
                               ≤ (|1 − xn | + |xn |) · ∏_{i=1}^{n−1} (|1 − xi | + |xi |),

where the final inequality uses the inductive hypothesis. This proves the claim.
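A quick randomized check of Lemma 3.6, again using the multilinear extension of random values in [0, 1] (an illustrative p, not one from the construction), evaluated at points well outside the Boolean cube:

```python
import itertools, math, random

random.seed(1)
n = 3
vals = {x: random.random() for x in itertools.product([0, 1], repeat=n)}

def p(z):
    # Multilinear extension of the cube values; multilinear, and in [0,1] on {0,1}^n.
    return sum(v * math.prod(z[i] if x[i] else 1 - z[i] for i in range(n))
               for x, v in vals.items())

# Lemma 3.6: |p(x)| <= prod_i (|1 - x_i| + |x_i|) for every real x.
for _ in range(200):
    x = tuple(random.uniform(-3, 3) for _ in range(n))
    assert abs(p(x)) <= math.prod(abs(1 - xi) + abs(xi) for xi in x) + 1e-9
```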
Lemma 3.7. There exists a function a(x) = exp(bR (x) · Õ(√T )) such that for any R ⊆ [R], the polyno-
mial pR : [R]^N → R has degree Õ(√(NT )) and satisfies:

                        pR (x) ∈ [0, 1/10]          if x ∈ PR and SURJR (x) = 0
                        pR (x) ∈ [9/10, 1]          if x ∈ PR and SURJR (x) = 1
                        pR (x) ∈ [−a(x), a(x)]      if x ∉ PR .

Proof. The first two cases are an immediate consequence of Lemma 3.5.
    To upper bound the value of |pR (x)| for x ∉ PR , we exploit the structure of pR as a multilinear
polynomial w of degree O(√N) over the variables z1 , . . . , z|R| , where each zr is the output of the rth
polynomial from Lemma 3.3. That is,

                        zr = VT,1/(20n) (1[x1 = r], . . . , 1[xN = r]) .

If bR (x) range items appear greater than T times, this means that up to bR (x) of the variables zr might take
values outside [0, 1]. However, by Lemma 3.3, each of these bR (x) variables is still at most exp(Õ(√T )) in
magnitude. By Lemma 3.6,

                        |w(z)| ≤ ∏_{r∈R} (|1 − zr | + |zr |) ≤ exp(bR (x) · Õ(√T )),

since each zr that is in [0, 1] contributes a factor of exactly 1 to the product, whereas each of the remaining
bR (x) variables contributes a factor of at most exp(Õ(√T )) to the product.

3.4.2   Controlling the easy inputs
Intuition. Unlike the example of the NOR function, where all the inputs outside the “hard” set P (cf.
Equation (3.1)) were in NOR−1 (1), for surjectivity there are both 0- and 1-inputs outside of the hard set
P (cf. Equation (3.6)). So our remaining task is not simply a matter of constructing a polynomial that
detects if the input is outside of P, as it was in the case of the NOR function.
    However, we will show that, for surjectivity, the inputs outside of P are easy to handle in a different
sense—they are easy because we can construct a reduction from inputs outside of P to inputs in P.
To gain some intuition, consider the task of designing a randomized algorithm for surjectivity where
we want to reduce the general case to the case where there is a set R ⊆ [R] such that all range items


r ∈ R have #r (x) ≤ T . To do this, we can simply sample a large number of elements xi and remove from
consideration all range items appearing at least once in the sample, because we know these range items
all appear in the input at least once. After this step, we have a new set R ⊆ [R] consisting of all range
items that have not been seen in the sampling stage. Every r ∈ R is likely to have #r (x) ≤ T because
elements that appeared too frequently would have (with high probability) been observed in the sampling
stage. Thus it now suffices to solve SURJR on the input under the assumption that all range items r ∈ R
have #r (x) ≤ T .
     Informally, the above discussion states that we want to construct the polynomial described in Polyno-
mial 2.

Polynomial 2 Informal description of the polynomial approximation for SURJ
 1: Sample S = Θ̃(N^{3/4}) items and remove all range items seen from [R]. Let the remaining set be
    R ⊆ [R].
 2: Solve SURJR under the promise that all r ∈ R have #r (x) ≤ T , where T = Θ̃(√N).


    We now have to construct a polynomial that represents this algorithmic idea. We have already
constructed an (unbounded approximating) polynomial for SURJR under the promise PR in the previous
section, so the second step of this construction is done.
    For Step 1 of Polynomial 2, we need to construct a polynomial to represent the idea of sampling
input elements and evaluating a polynomial that depends on the sampled elements. To build up to this,
consider a deterministic algorithm that queries a subset S ⊆ [N] of input elements, checks if the sampled
string equals another fixed string y and outputs 1 if true and 0 if false. If we denote the input x ∈ [R]N
restricted to the subset S ⊆ [N] as xS , then this algorithm outputs 1 if and only if xS = y. Interpreting the
input as an element of {0, 1}N log R rather than [R]N , it is easy to see that a deterministic query algorithm
querying |S| log R bits of x can solve this problem. Consequently, there is a polynomial of degree |S| log R
that outputs 1 if xS = y and outputs 0 otherwise. For any fixed S ⊆ [N] and y ∈ [R]|S| , we denote this
polynomial by 1y (xS ).
    Now we can construct a polynomial which samples S elements of the input and then outputs a bit
depending on the elements seen. Let S ⊆ [N] be a subset of indices with |S| = S. Let f (S, xS ) be an
arbitrary Boolean function that tells us whether to output 0 or 1 on seeing the sample (S, xS ). Then the
following polynomial samples a random set S of size S, reads the input x restricted to the set S, and
outputs the bit f (S, xS ):

                 (1 / (N choose S)) ∑_{S ⊆ [N], |S|=S} ∑_{y ∈ [R]^S} 1y (xS ) f (S, y)               (3.13)


Note that for any function f , this is a polynomial of degree S log R in the variables x1 , . . . , xN , because
1y (xS ) is a multilinear polynomial over those variables, and f (S, y) is simply a hard-coded bit that does
not depend on x. The value of this polynomial on an input x equals

   (1 / (N choose S)) ∑_{S ⊆ [N], |S|=S} ∑_{y ∈ [R]^S} 1y (xS ) f (S, y) = (1 / (N choose S)) ∑_{S ⊆ [N], |S|=S} f (S, xS ).    (3.14)
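The identity (3.14) holds because, for each set S, exactly one string y agrees with the actual sample xS. The sketch below verifies it by brute force on a tiny instance (N = 4, R = 2, S = 2, with an arbitrary hypothetical decision rule f; all of these choices are assumptions of this example).

```python
from itertools import combinations, product
from math import comb

N, R, S = 4, 2, 2          # tiny illustrative instance
x = (1, 2, 2, 1)

def f(subset, ys):
    # Arbitrary hypothetical rule on the sample: accept iff the range item 1 was seen.
    return int(1 in ys)

# Left-hand side of (3.14): the indicator 1_y(x_S) picks out the one matching string y.
lhs = sum(int(tuple(x[i] for i in subset) == y) * f(subset, y)
          for subset in combinations(range(N), S)
          for y in product(range(1, R + 1), repeat=S)) / comb(N, S)

# Right-hand side of (3.14): the average of f(S, x_S) over all size-S sets.
rhs = sum(f(subset, tuple(x[i] for i in subset))
          for subset in combinations(range(N), S)) / comb(N, S)

assert lhs == rhs
```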




      We can now generalize this construction to allow for the possibility that the function f is itself
a polynomial. This is what we need to implement Polynomial 2, in which we sample a random set
S ⊆ [N] of size S, query xS (using S log R queries), and then run a polynomial that depends on the results.
Specifically, if we sample the set S ⊆ [N], and learn that xS equals the string y ∈ [R]|S| , then we want to
run the polynomial pR (y) from Lemma 3.7 for the set R(y) of all elements in [R] that do not appear in y,
i. e.,
                                       R(y) = [R] \ {r : ∃i yi = r}.                                  (3.15)

Formal description of the approximation to surjectivity. Using the tools from Section 3.4.1 and the
above discussion, we can now construct a polynomial that corresponds to the informal description in
Polynomial 2:

                 r(x) = (1 / (N choose S)) ∑_{S ⊆ [N], |S|=S} ∑_{y ∈ [R]^S} 1y (xS ) pR(y) (x).      (3.16)

This is a polynomial of degree S log R + max_y {deg(pR(y) )} = Õ(S + √(NT )) = Õ(N^{3/4}), using S = Θ̃(N^{3/4})
and T = Θ̃(√N) (where the factors hidden by the Θ̃ notation will be chosen later). The value of the
polynomial on input x is

                 r(x) = (1 / (N choose S)) ∑_{S ⊆ [N], |S|=S} pR(xS ) (x),                           (3.17)


where recall that R(xS ) is as defined in Equation (3.15). The right hand side of Equation (3.17) is
precisely the expected value of the polynomial pR(xS ) (x) when S is a uniformly random set of size S.
We will now show that r(x) is an approximating polynomial for surjectivity, i. e., that for all x ∈ [R]N ,
|SURJ(x) − r(x)| ≤ 1/3.
    Recalling that bR(xS ) is the number of range items r ∈ R that appear in x greater than T times (cf.
Equation (3.12)), we compute the value of the polynomial r(x) on an input x:

        r(x) = (1 / (N choose S)) ∑_{S ⊆ [N], |S|=S} pR(xS ) (x)                                        (3.18)

             = (1 / (N choose S)) ∑_{S ⊆ [N], |S|=S} ∑_{b=0}^{N/T} 1[bR(xS ) (x) = b] · pR(xS ) (x)     (3.19)

             = (1 / (N choose S)) ∑_{S ⊆ [N], |S|=S} 1[bR(xS ) (x) = 0] · pR(xS ) (x)                   (3.20)

               + ∑_{b=1}^{N/T} (1 / (N choose S)) ∑_{S ⊆ [N], |S|=S} 1[bR(xS ) (x) = b] · pR(xS ) (x),  (3.21)

where we split up the sum into the b = 0 (3.20) and b ≥ 1 (3.21) cases. We first show that the b ≥ 1 term
(3.21) has essentially no contribution to the final value.


   When bR(xS ) (x) ≥ 1, we know by Lemma 3.7 that the magnitude of the polynomial |pR(xS ) (x)| is at
most exp(bR(xS ) (x) · Õ(√T )). Thus the magnitude of the term in Expression (3.21) is at most

        ∑_{b=1}^{N/T} (1 / (N choose S)) ∑_{S ⊆ [N], |S|=S} 1[bR(xS ) (x) = b] · exp(b · Õ(√T ))        (3.22)

             = ∑_{b=1}^{N/T} PrS [bR(xS ) (x) = b] · exp(b · Õ(√T )).                                   (3.23)
    We now need to compute the value of PrS [bR(xS ) (x) = b] for each b. Intuitively, this roughly
corresponds to the probability that we sample S = Θ̃(N^{3/4}) elements from the input and miss all bT
elements that correspond to the T = Θ̃(√N) copies of each of the b range items that appear at least T times.
If we were to sample N/(bT ) elements, the probability of seeing none of the bT elements would be
Θ(1). Since we are sampling S = Θ̃(N^{3/4}) ≫ N/(bT ) elements, the probability of not seeing one of the
bT elements is exp(−bST /N) = exp(−b · Ω̃(N^{1/4})). We make this heuristic calculation formal in the
following lemma.
Lemma 3.8. Let S, T ≤ N. Let S ⊂ [N] be a random subset of size S. Then for every x ∈ [R]^N and every
b ≥ 1,
                        PrS [bR(xS ) (x) ≥ b] ≤ exp(−b · (ST /N − log N)),
recalling that bR(xS ) (x) is the number of range items r that appear in x more than T times but do not
appear in xS .
Proof. To calculate the probability of interest, we begin by analyzing a simplified experiment. Suppose
there are b range items r1 , . . . , rb each appearing greater than T times in x. We compute the probability
that none of the items r1 , . . . , rb appear in the sample xS . If T ≥ N − S, then this probability is zero.
Otherwise, by direct calculation, we have for each i = 1, . . . , b,

   PrS [ri ∉ xS | r1 ∉ xS , . . . , ri−1 ∉ xS ] ≤ ∏_{j=0}^{S−1} (1 − T /(N − j)) ≤ (1 − T /N)^S ≤ exp(−ST /N).
Hence the probability that none of r1 , . . . , rb appear is

   PrS [r1 ∉ xS ∧ · · · ∧ rb ∉ xS ] = ∏_{i=1}^{b} PrS [ri ∉ xS | r1 ∉ xS , . . . , ri−1 ∉ xS ] ≤ exp(−bST /N).
       Now fix an arbitrary x, and let r1 , . . . , rk be the range items appearing greater than T times in x.
Observe that k ≤ N/T . We estimate the probability of interest by taking a union bound over all subsets of
r1 , . . . , rk of size b:
                        PrS [bR(xS ) (x) ≥ b] ≤ ∑_{Y ⊆ [k] : |Y |=b} PrS [rj ∉ xS ∀ j ∈ Y ]
                                             ≤ (k choose b) · exp(−bST /N)
                                             ≤ exp(−bST /N + b log N).






    From Lemma 3.8 we know that Pr_S[b_{R(x_S)}(x) = b] ≤ exp(−b · Ω̃(ST/N)). Hence the b ≥ 1 term of
(3.21) is at most

                           ∑_{b=1}^{N/T} exp(−b · Ω̃(ST/N)) · exp(b · Õ(√T)) = o(1),                      (3.24)

by choosing T = Θ̃(√N) and S = Θ̃(N^{3/4}) appropriately. This shows that the term in (3.21) does not
significantly influence the value of the polynomial for any input x.
    Thus we have for any input x,

                         r(x) = (N choose S)^{−1} ∑_{S ∈ ([N] choose S)} 1[b_{R(x_S)}(x) = 0] · p_{R(x_S)}(x) + o(1).   (3.25)

   By Lemma 3.7, we know that if b_{R(x_S)}(x) = 0 (and hence x ∈ P_{R(x_S)}), then p_{R(x_S)}(x) is a (1/10)-
approximation to SURJ(x). Applying Lemma 3.8 once again shows that

                                  Pr_S[b_{R(x_S)}(x) ≥ 1] ≤ exp(−Ω̃(N^{1/4})),

so Pr_S[b_{R(x_S)}(x) = 0] = 1 − o(1). Hence, for all x ∈ [R]^N, |r(x) − SURJ(x)| ≤ 1/10 + o(1) ≤ 1/3.
    This completes the proof of Theorem 3.1.


4    Lower bound for surjectivity
The goal of this section is to show the following improved lower bound on the approximate degree of the
surjectivity function.

Theorem 4.1. For some N = O(R), the (1/3)-approximate degree of SURJ_{N,R} is Ω̃(R^{3/4}).

   To prove Theorem 4.1, we combine the following theorem with the reductions of Proposition 2.10
and Corollary 2.16.

Theorem 4.2. Let N = c · R for a sufficiently large constant c > 0. Let F^{≤N} : H^{N·R}_{≤N} → {−1, 1} equal
AND_R ◦ OR_N restricted to inputs in H^{N·R}_{≤N} = {x ∈ {−1, 1}^{N·R} : |x| ≤ N}. Then ũbdeg(F^{≤N}) ≥ Ω̃(R^{3/4}).

    The proof of Theorem 4.2 entails using dual witnesses for the high approximate degree of AND_R and
OR_N to construct a dual witness for the higher approximate degree of F^{≤N}. As indicated in Section 1.3,
the construction is essentially the same as in [33], except that we observe that a dual witness for OR
constructed and used in prior work satisfies an exponentially stronger decay condition than has been
previously realized.
    The construction can be thought of as consisting of three steps:


Step 1. We begin by constructing a dual witness ψ for the fact that the unbounded approximate degree of
the OR_N function is Ω(√T) even when promised that the input has Hamming weight at most T = Θ(√R).
The dual witness ψ is a small variant of the one in [33], but we give a more careful analysis of its tail
decay. In particular, we make use of the fact that for all t ≥ 1, the ℓ₁ weight that ψ places on the t-th layer
of the Hamming cube is upper bounded by exp(−Ω(t/√T))/t².

Step 2. We combine ψ with a dual witness Φ for AND_R to obtain a preliminary dual witness Φ ⋆ ψ for
F = AND_R ◦ OR_N. The dual witness Φ ⋆ ψ shows that F has approximate degree Ω(√R · √T) = Ω(R^{3/4}).
However, Φ ⋆ ψ places weight on inputs of Hamming weight larger than N, and hence does not give an
unbounded approximate degree lower bound for the promise variant F^{≤N}.

Step 3. Using Proposition 2.22 we zero out the mass that Φ ⋆ ψ places on inputs of Hamming weight
larger than N, while maintaining its pure high degree and correlation with F^{≤N}. This yields the final
desired unbounded approximate degree dual witness ζ for F^{≤N}, as per Proposition 2.5.

4.1   Step 1: A dual witness for ORN
We begin by constructing a univariate function which captures the properties we need of our inner dual
witness for ORN . The construction slightly modifies the dual polynomial for ORN given by Špalek [82].
We provide a careful analysis of its decay as a function of the input’s Hamming weight.

Proposition 4.3. Let T ∈ N and 1/T ≤ δ ≤ 1/2. There exist constants c₁, c₂ ∈ (0, 1) and a function
ω : [T]₀ → R such that

          ω(0) − ∑_{t=1}^{T} ω(t) ≥ 1 − δ,                                                        (4.1)

          ∑_{t=0}^{T} |ω(t)| = 1,                                                                 (4.2)

          For all univariate polynomials q : R → R, deg q < c₁√(δT) =⇒ ∑_{t=0}^{T} ω(t) · q(t) = 0,   (4.3)

          |ω(t)| ≤ 170 exp(−c₂ t √δ/√T)/(δ · t²)   ∀t = 1, …, T.                                  (4.4)
Proof of Proposition 4.3. By renormalizing, it suffices to construct a function ω : [T]₀ → R such that

          ω(0) − ∑_{t=1}^{T} ω(t) ≥ (1 − δ)‖ω‖₁,                                                  (4.5)

          For all univariate polynomials q : R → R, deg q < c₁√(δT) =⇒ ∑_{t=0}^{T} ω(t) · q(t) = 0,   (4.6)

          |ω(t)| ≤ 170 exp(−c₂ t √δ/√T) · ‖ω‖₁/(δ · t²)   ∀t = 1, …, T.                           (4.7)


Let c = ⌈8/δ⌉. We will freely use below the fact that since δ ≤ 1/2, we have c ≤ c²/(c − 1) ≤ 10/δ. Let
m = ⌊√(T/(2c))⌋ and define the set

                                      S = {1, c} ∪ {2ci² : 0 ≤ i ≤ m}.

Note that |S| ≥ c₁√(δT) for some absolute constant c₁ > 0. Define the function

                                ω(t) = ((−1)^{t+(T−m)}/T!) · (T choose t) · ∏_{r∈[T]₀\S} (t − r).


Property (4.6) follows from the following well-known combinatorial identity.

Fact 4.4 (e. g., [43] Equation (5.23) or [63]). Let T ∈ N, and let p be a polynomial of degree less than T .
Then
                                           ∑_{t=0}^{T} (−1)^t (T choose t) p(t) = 0.
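Fact 4.4 can be verified exactly in integer arithmetic; by linearity it suffices to check the monomials t^e. A minimal sketch (T = 12 is chosen arbitrarily):

```python
from math import comb

T = 12
# Sum_{t=0}^{T} (-1)^t C(T,t) p(t) = 0 for every polynomial p of degree < T;
# by linearity, checking the monomials t^e for e = 0, ..., T-1 suffices.
for e in range(T):
    assert sum((-1) ** t * comb(T, t) * t ** e for t in range(T + 1)) == 0
```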

    Expanding the binomial coefficient in the definition of ω reveals that

                          |ω(t)| = ∏_{r∈S\{t}} 1/|t − r|   for t ∈ S,   and   |ω(t)| = 0 otherwise.


    We now use this characterization to establish the improved tail decay property (4.7). This clearly
holds for t = 1 with c₂ = 1/10, since |ω(1)| ≤ ‖ω‖₁ and

                               (170/δ) · exp(−c₂ √δ/√T) ≥ 340 · e^{−√2/10} > 1.

    For t = c, we have

                    |ω(c)|/ω(0) = [c ∏_{i=1}^{m} 2ci²] / [c(c − 1) ∏_{i=1}^{m} (2ci² − c)]
                                = (1/(c − 1)) · (∏_{i=1}^{m} (i² − 1/2)/i²)^{−1}
                                ≤ (1/(c − 1)) · (1 − ∑_{i=1}^{m} 1/(2i²))^{−1}
                                ≤ (1/(c − 1)) · (1 − π²/12)^{−1} ≤ 6/(c − 1),                     (4.8)

where the first inequality follows from the fact that ∏_{i=1}^{m} (1 − a_i) ≥ 1 − ∑_{i=1}^{m} a_i for a_i ∈ (0, 1). Now note



that

                 |ω(c)|/‖ω‖₁ ≤ |ω(c)|/ω(0)
                             ≤ 6/(c − 1)
                             ≤ 60/(δ · c²)                              since 10/δ ≥ c²/(c − 1)
                             ≤ (170/(δ · c²)) · e^{−c√δ/(10√T)},

since δ ≥ 1/T and hence e^{−c√δ/(10√T)} ≥ e^{−1}. Thus (4.7) holds for t = c, recalling that c₂ = 1/10.
    For t = 2cj² with j ≥ 1, we get

            |ω(t)|/ω(0) = [c ∏_{i=1}^{m} 2ci²] / [(2cj² − 1)(2cj² − c) ∏_{i∈[m]₀\{j}} |2ci² − 2cj²|]
                        = c(m!)² / [(4c²j⁴ − (2c² + 2c)j² + c) ∏_{i∈[m]₀\{j}} (i + j)|i − j|]
                        = [c/(4c²j⁴ − (2c² + 2c)j² + c)] · [(m!)²/((m + j)!(m − j)!)].

For j ≥ 1, the first factor is bounded by

                                  c/(4c²j⁴ − (2c² + 2c)j² + c) ≤ 3c/(2cj²)²,
using the fact that c ≥ 2. We control the second factor by

               (m!)²/((m + j)!(m − j)!) = (m/(m + j)) · ((m − 1)/(m + j − 1)) · ⋯ · ((m − j + 1)/(m + 1))
                                        ≤ (m/(m + j))^j
                                        ≤ (1 − j/(2m))^j
                                        ≤ e^{−j²/(2m)},

where the last inequality uses the fact that 1 − x ≤ e^{−x} for all x. Since

          |ω(t)|/‖ω‖₁ ≤ |ω(2cj²)|/ω(0) ≤ (3c/(2cj²)²) · e^{−2cj²/(4cm)} ≤ (170/(δ · t²)) · e^{−t√δ/(10√T)},

this establishes (4.7).
    What remains is to perform the correlation calculation to establish (4.5). For t = 1, we observe

         |ω(1)|/ω(0) = [c ∏_{i=1}^{m} 2ci²] / [(c − 1) ∏_{i=1}^{m} (2ci² − 1)] ≥ ∏_{i=1}^{m} i²/(i² − 1/(2c)) ≥ 1.



Next, we observe that the total contribution of t > c to ‖ω‖₁/ω(0) is at most

                    ∑_{t>c} |ω(t)|/ω(0) ≤ ∑_{j=1}^{m} 3c/(2cj²)² < ∑_{j=1}^{∞} 3/(4cj⁴) = π⁴/(120c).        (4.9)

    Next, we calculate

             ω(0) − ∑_{t=1}^{T} ω(t) ≥ ω(0) − ω(1) − ∑_{t=c}^{T} |ω(t)|
                                     ≥ ω(0) − ω(1) − (ω(c) + ω(0) · π⁴/(120c))                  by (4.9)
                                     ≥ −ω(1) + ω(0) · (1 − 6/(c − 1) − π⁴/(120c))               by (4.8)
                                     ≥ −ω(1) + (1 − δ)ω(0)                   by our choice of c ≥ 8/δ.   (4.10)

On the other hand,

                  ‖ω‖₁ ≤ ω(0) − ω(1) + ω(c) + ω(0) · π⁴/(120c)                    by (4.9)
                       ≤ −ω(1) + ω(0) · (1 + 6/(c − 1) + π⁴/(120c))               by (4.8)
                       ≤ −ω(1) + (1 + δ)ω(0)                                      since c ≥ 8/δ.        (4.11)

Combining (4.10) and (4.11), and using the fact that −ω(1) ≥ ω(0), shows that

        (ω(0) − ∑_{t=1}^{T} ω(t))/‖ω‖₁ ≥ (−ω(1) + (1 − δ)ω(0))/(−ω(1) + (1 + δ)ω(0)) ≥ (2 − δ)/(2 + δ) ≥ 1 − δ.

This establishes (4.5), completing the proof.
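The construction in the proof above can be instantiated and checked with exact rational arithmetic. The sketch below (with T = 50 and δ = 1/2 chosen only for illustration) builds ω from its defining formula and verifies its support, the orthogonality property (4.6) for monomials of degree up to |S| − 2, and the correlation property (4.5):

```python
from fractions import Fraction
from math import comb, factorial, ceil, isqrt

def build_omega(T, delta):
    """The function from the proof of Proposition 4.3:
    omega(t) = (-1)^(t + T - m) * C(T,t)/T! * prod over r in [T]_0 \\ S of (t - r)."""
    c = ceil(8 / delta)
    m = isqrt(T // (2 * c))                      # floor(sqrt(T/(2c)))
    S = {1, c} | {2 * c * i * i for i in range(m + 1)}
    omega = {}
    for t in range(T + 1):
        prod = 1
        for r in range(T + 1):
            if r not in S:
                prod *= (t - r)
        omega[t] = (-1) ** (t + T - m) * Fraction(comb(T, t), factorial(T)) * prod
    return omega, S

T, delta = 50, Fraction(1, 2)
omega, S = build_omega(T, delta)

# omega is supported on S: the product vanishes for every t outside S.
assert {t for t, v in omega.items() if v != 0} == S

# Orthogonality as in (4.6): omega kills every polynomial of degree < |S| - 1.
for e in range(len(S) - 1):
    assert sum(v * t ** e for t, v in omega.items()) == 0

# Correlation as in (4.5): omega(0) - sum_{t>=1} omega(t) >= (1 - delta)*||omega||_1.
l1 = sum(abs(v) for v in omega.values())
assert omega[0] - sum(omega[t] for t in range(1, T + 1)) >= (1 - delta) * l1
```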


    The following construction of a dual polynomial for OR_N, with N ≥ T, is an immediate consequence
of Minsky-Papert symmetrization (Lemma 2.7), combined with Proposition 4.3.

Proposition 4.5. Let T, N ∈ N with T ≤ N, and let δ > 1/T. Define ω as in Proposition 4.3. Define the
function ψ : {−1, 1}^N → [−1, 1] by ψ(x) = ω(|x|)/(N choose |x|) for x ∈ H^N_{≤T} and ψ(x) = 0 otherwise. Then

                  ⟨ψ, OR_N⟩ ≥ 1 − δ,                                                              (4.12)
                  ‖ψ‖₁ = 1,                                                                       (4.13)
                  For any polynomial p : {−1, 1}^N → R, deg p < c₁√(δT) =⇒ ⟨ψ, p⟩ = 0.            (4.14)


4.2   Step 2: Constructing a preliminary dual witness for AND_R ◦ OR_N

The following proposition, when combined with Proposition 2.20, shows that there is a function Φ :
{−1, 1}^R → [−1, 1] such that the dual block composition Φ ⋆ ψ is a good dual polynomial for AND_R ◦
OR_N. In the next section, we will modify Φ ⋆ ψ to zero out the weight it places outside H^{N·R}_{≤N}.

Proposition 4.6. Let OR_N : {−1, 1}^N → {−1, 1} and AND_R : {−1, 1}^R → {−1, 1}. Let ψ : {−1, 1}^N →
[−1, 1] be a function such that

                                   ‖ψ‖₁ = 1   and   ⟨ψ, OR_N⟩ ≥ 19/20.

Then there exists a function Φ : {−1, 1}^R → [−1, 1] with pure high degree Ω(√R) and ‖Φ‖₁ = 1 such
that
                                     ⟨Φ ⋆ ψ, AND_R ◦ OR_N⟩ > 2/3.

      The proof of Proposition 4.6 is implicit in the results of [29, 73].


4.3     Step 3: Constructing the final dual witness
Proposition 4.7. Let R be sufficiently large. There exist N = O(R), D = Ω̃(N^{3/4}), and ζ : ({−1, 1}^N)^R →
R such that

        ζ(x) = 0 for all x ∉ H^{N·R}_{≤N},                                                        (4.15)
        ∑_{x∈H^{N·R}_{≤N}} ζ(x) · (AND_R ◦ OR_N)(x) > 1/3,                                        (4.16)
        ‖ζ‖₁ = 1, and                                                                             (4.17)
        For every polynomial p : ({−1, 1}^N)^R → R of degree less than D, we have ⟨p, ζ⟩ = 0.     (4.18)

Proof. We start by fixing choices of several key parameters:

      • d = Θ(√R) is the pure high degree of the dual witness Φ for AND_R in Proposition 4.6,

      • T = ⌊(R/d)^{1/2}⌋² = Θ(√R),

      • D̂ = c₁√T · d = Θ(R^{3/4}), where c₁ is the constant from Proposition 4.3,

      • δ = 1/20,

      • α = 170/δ = 3400,

      • β = c₂ · √δ/√T = Θ(1/R^{1/4}), where c₂ is the constant from Proposition 4.3,

      • N = ⌈20√α⌉R = 693R.
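As a quick sanity check on the parameter arithmetic above, the sketch below instantiates the definitions for an illustrative R, treating the unspecified constants c₁, c₂ and the hidden Θ-constants as 1 (an assumption made purely for illustration):

```python
from math import isqrt

R = 10 ** 8                    # illustrative problem size (assumption, for concreteness)
d = isqrt(R)                   # pure high degree of Phi: Theta(sqrt(R))
T = isqrt(R // d) ** 2         # Theta(sqrt(R)); a perfect square by construction
D_hat = isqrt(T) * d           # c1 * sqrt(T) * d with c1 := 1: Theta(R^(3/4))
delta = 1 / 20
alpha = 170 * 20               # 170 / delta = 3400

assert d == T == 10 ** 4       # both are sqrt(R) here
assert D_hat == 10 ** 6        # equals R^(3/4), matching D_hat = Theta(R^(3/4))
assert alpha == 3400
```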


    Let ψ : {−1, 1}^N → [−1, 1] be the function constructed in Proposition 4.5 with δ := 1/20. Let
Φ : {−1, 1}^R → [−1, 1] be the function constructed in Proposition 4.6, and define ξ = Φ ⋆ ψ. Then by
Proposition 2.20, ξ satisfies the following conditions:

                   ⟨ξ, AND_R ◦ OR_N⟩ > 2/3,                                                       (4.19)
                   ‖ξ‖₁ = 1,                                                                      (4.20)
                   For every polynomial p of degree less than D̂, we have ⟨ξ, p⟩ = 0.              (4.21)

Recall that ψ was obtained by symmetrizing the function ω constructed in Proposition 4.3. Proposition 2.22
guarantees that for some ∆ ≥ β√α · R/(4 ln² R) = Ω̃(R^{3/4}), the function ξ can be modified to
produce a function ζ : ({−1, 1}^N)^R → R such that

              ζ(x) = 0 for all x ∉ H^{N·R}_{≤N},
              ⟨ζ, AND_R ◦ OR_N⟩ ≥ ⟨ξ, AND_R ◦ OR_N⟩ − ‖ζ − ξ‖₁ ≥ 2/3 − 2/9 > 1/3,
              ‖ζ‖₁ = 1,
              For every polynomial p of degree less than min{D̂, ∆}, we have ⟨ζ, p⟩ = 0.

Observing that
                                        D = min{D̂, ∆} = Ω̃(R^{3/4})
shows that the function ζ satisfies the conditions necessary to prove Proposition 4.7.

    Theorem 4.2 follows by combining Proposition 4.7 with the dual characterization of unbounded
approximate degree given in Proposition 2.5. By Corollary 2.16, we conclude that d̃eg(SURJ_{N,R}) =
Ω̃(R^{3/4}). Theorem 4.1 follows by Proposition 2.10.


5    Lower bound for k-distinctness
Our goal is to prove the following lower bound on the approximate degree of the k-distinctness function.
Theorem 5.1. For k ≥ 2 and some N = O_k(R), the (1/3)-approximate degree of DIST^k_{N,R} is

                                              Ω̃_k(R^{3/4−1/(2k)}).

The same lower bound holds for the quantum query complexity of DIST^k_{N,R}.
    Here, the notation O_k hides factors depending only on k, and Ω̃_k hides factors logarithmic in R and
factors depending only on k.
    Theorem 5.1 is a consequence of applying the reductions of Proposition 2.13 and Corollary 2.18 to
the following, which is the ultimate goal of this section.
Theorem 5.2. Let G^{≤N} : H^{N·R}_{≤N} → {−1, 1} equal OR_R ◦ THR^k_N restricted to inputs in H^{N·R}_{≤N}. Then for some
N = O_k(R), we have ũbdeg(G^{≤N}) ≥ Ω̃_k(R^{3/4−1/(2k)}).
   The proof of Theorem 5.2 will follow the same basic outline as the proof of Theorem 4.2. We will
construct a dual polynomial for G^{≤N} via the following three steps:


Step 1. We first construct a dual witness ψ showing that the unbounded approximate degree of the
THR^k_N function is Ω_k(√(T · N^{−1/k})), even when promised that the input has Hamming weight at most
T = Θ_k(√R). Moreover, this dual witness satisfies additional conditions that are exploited in Step 2
below.

Step 2. We combine ψ with a dual witness Φ for OR_R to obtain a preliminary dual witness Φ ⋆ ψ for
OR_R ◦ THR^k_N. The dual witness Φ ⋆ ψ shows that OR_R ◦ THR^k_N has approximate degree Ω(√R · √(T · N^{−1/k})) =
Ω_k(R^{3/4−1/(2k)}). However, Φ ⋆ ψ places weight on inputs of Hamming weight larger than N, and hence
does not give an unbounded approximate degree lower bound for the promise variant G^{≤N}.

Step 3. Using Proposition 2.22 we zero out the mass that Φ ⋆ ψ places on inputs of Hamming weight
larger than N, while maintaining its pure high degree and correlation with G^{≤N}. This yields the final
desired dual witness ζ for G^{≤N}.

Additional Notation. For functions f : X → {−1, 1} and ψ : X → R, define the error regions

                              E₊(ψ, f) = {x ∈ X : ψ(x) > 0, f(x) = −1},
                              E₋(ψ, f) = {x ∈ X : ψ(x) < 0, f(x) = +1}.

    These are the regions where ψ disagrees in sign with f. We refer to E₊ as the set of “false positive”
errors made by ψ, and E₋ as the set of “false negative” errors.

5.1   Step 1: A dual witness for THR^k_N

We begin by constructing a univariate version of our dual witness for THR^k_N. Properties (5.1) and (5.2)
below amount to more refined conditions on the correlation between ω and the (symmetrized) THR^k_N
function. These properties will be needed in order to execute Step 2 of the construction in Section 5.2.

Proposition 5.3. Let k, T, N ∈ N with k ≤ T. There exist constants c₁, c₂ ∈ (0, 1] and a function ω :
{0, 1, …, T} → R such that

                      ∑_{ω(t)>0, t≥k} |ω(t)| ≤ 1/(48N),                                           (5.1)
                      ∑_{ω(t)<0, t<k} |ω(t)| ≤ 1/2 − 2 · 4^{−k},                                  (5.2)
                      ‖ω‖₁ := ∑_{t=0}^{T} |ω(t)| = 1,                                             (5.3)
                      For all univariate polynomials q : R → R,
                          deg q < c₁√(k^{−1} · T · N^{−1/k}) =⇒ ∑_{t=0}^{T} ω(t) · q(t) = 0,      (5.4)
                      |ω(t)| ≤ (2k)^k exp(−c₂ t/√(k · T · N^{1/k}))/t²   ∀t = 1, 2, …, T.         (5.5)


Proof. If k = 1, then the function defined by ω(0) = 1/2 and ω(1) = −1/2 satisfies the conditions of the
proposition for c₁ = c₂ = 1. In what follows, we treat the complementary case where k ≥ 2.
   Let E₊ := {t : ω(t) > 0, t ≥ k}, and E₋ := {t : ω(t) < 0, t < k}. By normalizing, it suffices to
construct a function ω : [T]₀ → R such that

                       ∑_{t∈E₊} |ω(t)| ≤ ‖ω‖₁/(48N),                                              (5.6)
                       ∑_{t∈E₋} |ω(t)| ≤ (1/2 − 2 · 4^{−k}) · ‖ω‖₁,                               (5.7)
                       For all univariate polynomials q : R → R,
                           deg q < c₁√(k^{−1} · T · N^{−1/k}) =⇒ ∑_{t=0}^{T} ω(t) · q(t) = 0,     (5.8)
                       |ω(t)| ≤ (2k)^k exp(−c₂ t/√(k · T · N^{1/k})) · ‖ω‖₁/t²   ∀t = 1, 2, …, T. (5.9)
    Let c = 2k⌈N^{1/k}⌉, and let m = ⌊√(T/c)⌋. Define the set

                                      S = {1, 2, …, k} ∪ {ci² : 0 ≤ i ≤ m}.

Note that |S| = Ω(k^{−1/2} T^{1/2} N^{−1/(2k)}). Define the polynomial

                                 ω(t) = ((−1)^{t+(T−m)+1}/T!) · (T choose t) · ∏_{r∈[T]₀\S} (t − r).


    The signs are chosen so that ω(k) < 0. It is immediate from Fact 4.4 that ω satisfies (5.8) for
c₁ = 1/√2. We now show that (5.9) holds. For t = 1, …, k, we have

                             (2k)^k exp(−c₂ t/√(k · T · N^{1/k}))/t² ≥ (2k)^k exp(−c₂√k)/k² ≥ 1

as long as c₂ ≤ 1/2 and k ≥ 2. Since |ω(t)| ≤ ‖ω‖₁, the bound holds for t = 1, …, k.
    For t = cj² with j ≥ 1, we expand out the binomial coefficient in the definition of ω to obtain

                          |ω(t)| = ∏_{r∈S\{t}} 1/|t − r|   for t ∈ S,   and   |ω(t)| = 0 otherwise.

For t ∈ {0, 1, …, k}, we observe that

          |ω(t)|/|ω(k)| = [k! · ∏_{i=1}^{m} (ci² − k)] / [t! · (k − t)! · ∏_{i=1}^{m} (ci² − t)] ≤ (k choose t).        (5.10)


Meanwhile, for t = cj² with j ≥ 1, we get

                |ω(t)|/|ω(k)| = [k! · ∏_{i=1}^{m} (ci² − k)] / [∏_{i=1}^{k} (cj² − i) · ∏_{i∈[m]₀\{j}} |ci² − cj²|]
                              ≤ [k! · ∏_{i=1}^{m} ci²] / [(cj² − k)^k · ∏_{i∈[m]₀\{j}} c(i + j)|i − j|]
                              = [k!/(j(cj² − k)^k)] · [(m!)²/((m + j)!(m − j)!)].

The first factor is bounded above by

                                                    k!/((c − k)^k j^{2k+1}).

As long as c ≥ 2k and k ≥ 2, this expression is at most

                                                 k^k/((c/2)^k j⁴) = (2k)^k/(c^k · j⁴).
    We control the second factor by

               (m!)²/((m + j)!(m − j)!) = (m/(m + j)) · ((m − 1)/(m + j − 1)) · ⋯ · ((m − j + 1)/(m + 1))
                                        ≤ (m/(m + j))^j
                                        ≤ (1 − j/(2m))^j
                                        ≤ e^{−j²/(2m)},

where the last inequality uses the fact that 1 − x ≤ e^{−x} for all x. Hence,

                                |ω(cj²)|/|ω(k)| ≤ ((2k)^k/(c^k · j⁴)) · e^{−j²/(2m)}.             (5.11)

This immediately yields

                         |ω(cj²)|/‖ω‖₁ ≤ |ω(cj²)|/|ω(k)| ≤ ((2k)^k/(cj²)²) · e^{−cj²/(2cm)},

which establishes (5.9) for all t = cj² > k.
   Moreover, by (5.11)
$$\sum_{t>k}|\omega(t)| \le |\omega(k)|\cdot\sum_{j=1}^{m}\frac{(2k)^k}{c^k\, j^4}\cdot e^{-j^2/(2m)} \le \frac{(2k)^k}{c^k}\cdot|\omega(k)|\cdot\sum_{j=1}^{m}\frac{1}{j^4} \le \frac{|\omega(k)|}{48N}. \tag{5.12}$$
Hence, since ω(k) < 0,
$$\sum_{t\in E_+}|\omega(t)| \le \sum_{t>k}|\omega(t)| \le \frac{|\omega(k)|}{48N} \le \frac{\|\omega\|_1}{48N},$$
THEORY OF COMPUTING, Volume 16 (10), 2020, pp. 1–71

which gives (5.6).
   Finally, to establish (5.7), we combine (5.10) and (5.12) to obtain
$$\frac{\|\omega\|_1}{|\omega(k)|} \le \sum_{t=0}^{k}\binom{k}{t} + \frac{1}{48N} < 2^k + 1 < \frac{1}{2}\cdot 4^k. \tag{5.13}$$

We calculate
$$\frac{\|\omega\|_1}{2} - \sum_{t\in E_-}|\omega(t)| = \sum_{t:\,\omega(t)<0}(-\omega(t)) - \sum_{t\in E_-}(-\omega(t)) \qquad \text{since } \langle\omega,\mathbf{1}\rangle = 0$$
$$= \sum_{t:\,\omega(t)<0,\ t\ge k}(-\omega(t)) \ge -\omega(k).$$

Rearranging and applying the bound (5.13),
                                                           
$$\sum_{t\in E_-}|\omega(t)| \le \left(\frac{1}{2} + \frac{\omega(k)}{\|\omega\|_1}\right)\cdot\|\omega\|_1 \le \left(\frac{1}{2} - 2\cdot 4^{-k}\right)\cdot\|\omega\|_1.$$

    By applying Minsky–Papert symmetrization (Lemma 2.7) to the function constructed in Proposition 5.3, which ensures that the resulting function has the appropriate pure high degree, we obtain a dual polynomial for THR^k_N.
                                                                                             N
Proposition 5.4. Let k, T, N ∈ N with k ≤ T ≤ N. Define ψ : {−1, 1}N → R by ψ(x) = ω(|x|)/ |x|
                                                                                               
                                                                                                 for
     N
x ∈ H≤T and ψ(x) = 0 otherwise, where ω is as constructed in Proposition 5.3. Then

                                      1
                ∑         |ψ(x)| ≤       ,                                                         (5.14)
                                     48N
        x∈E+ (ψ,THRkN )
                                     1 2
                ∑         |ψ(x)| ≤    − ,                                                          (5.15)
                                     2 4k
        x∈E− (ψ,THRkN )

        kψk1 = 1,                                                                                  (5.16)
                                                           p
        For any polynomial p : {−1, 1}N → R, deg p < c1 k−1 · T · N −1/k =⇒ hψ, pi = 0,            (5.17)
                                      p
         ∑  |ψ(x)| ≤ (2k)k
                           exp(−c 2 t/ k · T · N 1/k )/t 2  ∀t = 1, 2, . . . , N.                  (5.18)
        |x|=t


5.2     Step 2: A preliminary dual witness for OR_R ◦ THR^k_N
5.2.1    Refined amplification lemmas
The dual witness Φ for OR_R that we construct will itself be obtained as a dual block composition ρ ⋆ ϕ. Each constituent dual polynomial will play a distinct role in showing that Φ ⋆ ψ is a good dual polynomial for OR_R ◦ THR^k_N. The first function ϕ is an “error amplifier,” in the sense that ϕ ⋆ ψ is much better correlated with OR_R ◦ THR^k_N than ψ is with THR^k_N. The second function ρ, on the other hand, is a “degree amplifier,” in that it serves to increase the pure high degree of ϕ ⋆ ψ.
    While the amplification results we need are new, they are relatively straightforward extensions of
similar results in [29, 30, 73]. Proofs appear below for completeness.

Amplifying error. The following proposition shows that if ψ is a dual witness for the high approximate degree of a Boolean function f, then there is a dual witness of the form ϕ ⋆ ψ for OR_M ◦ f such that (a) ϕ ⋆ ψ may make slightly more false-positive errors than ψ (worse by at most a factor of M), and (b) ϕ ⋆ ψ makes significantly fewer false-negative errors than ψ (exponentially smaller in M).

Proposition 5.5. Let f : {−1,1}^m → {−1,1} and ψ : {−1,1}^m → ℝ be functions such that

$$\sum_{x\in E_+(\psi,f)}|\psi(x)| \le \delta^+, \tag{5.19}$$

$$\sum_{x\in E_-(\psi,f)}|\psi(x)| \le \delta^-, \tag{5.20}$$

$$\|\psi\|_1 = 1. \tag{5.21}$$

For every M ∈ ℕ, there exists a function ϕ : {−1,1}^M → [−1,1] with ‖ϕ‖₁ = 1 and pure high degree 1 such that

$$\sum_{x\in E_+(\varphi\star\psi,\ \mathrm{OR}_M\circ f)}|(\varphi\star\psi)(x)| \le M\cdot\delta^+, \tag{5.22}$$

$$\sum_{x\in E_-(\varphi\star\psi,\ \mathrm{OR}_M\circ f)}|(\varphi\star\psi)(x)| \le \frac{1}{2}\cdot(2\delta^-)^M. \tag{5.23}$$

Proof. Let ϕ : {−1,1}^M → [−1,1] be defined such that ϕ(1) = 1/2, ϕ(−1) = −1/2, and ϕ(x) = 0 for all other x, where 1 = (1, . . . , 1) denotes the all-ones input and −1 its negation. Notice that

$$\sum_{(x_1,\ldots,x_M)\in\{-1,1\}^M}\varphi(x_1,\ldots,x_M) = 0, \tag{5.24}$$

so ϕ has pure high degree 1, and that ‖ϕ‖₁ = 1.
       We now prove that Equation 5.22 holds. Let λ be the distribution on {−1, 1}m given by λ (x) = |ψ(x)|,
and let λ ⊗M be the product distribution on ({−1, 1}m )M given by λ ⊗M (x1 , . . . , xM ) = ∏M         i=1 |ψ(xi )|.
Since ψ is orthogonal to the constant polynomial, it has expected value 0, and hence the string
(. . . , sgn(ψ(xi )), . . . ) is distributed uniformly in {−1, 1}M when one samples (x1 , . . . , xM ) according
to λ ⊗M . This allows us to write

$$\sum_{(x_1,\ldots,x_M)\in E_+(\varphi\star\psi,\ \mathrm{OR}_M\circ f)}|(\varphi\star\psi)(x_1,\ldots,x_M)|$$
$$= 2^M\,\mathbb{E}_{\lambda^{\otimes M}}\Big[\varphi(\ldots,\operatorname{sgn}(\psi(x_i)),\ldots)\cdot\mathbb{I}\big(\varphi(\ldots,\operatorname{sgn}(\psi(x_i)),\ldots) > 0 \ \wedge\ \mathrm{OR}_M(\ldots,f(x_i),\ldots) = -1\big)\Big]$$
$$= \sum_{z:\,\varphi(z)>0}\varphi(z)\cdot\Pr_{\lambda^{\otimes M}}\big[\mathrm{OR}_M(\ldots,f(x_i),\ldots) = -1 \,\big|\, (\ldots,\operatorname{sgn}(\psi(x_i)),\ldots) = z\big]. \tag{5.25}$$




Observe that for any bit b,

$$\Pr_{x\sim\lambda}\big[f(x)\ne\operatorname{sgn}(\psi(x)) \,\big|\, \operatorname{sgn}(\psi(x)) = b\big] = 2\sum_{x\in A_b}|\psi(x)|,$$

where for brevity, we have written A_{+1} = E_+(ψ, f) and A_{−1} = E_−(ψ, f). Therefore, as noted in [75], for any given z ∈ {−1,1}^M, the following two random variables are identically distributed:

    • The string (..., f(x_i), ...), when one chooses (..., x_i, ...) from λ^{⊗M} conditioned on (..., sgn(ψ(x_i)), ...) = z.

    • The string (..., y_i z_i, ...), where y ∈ {−1,1}^M is a random string whose i-th bit independently takes on value −1 with probability 2∑_{x∈A_{z_i}} |ψ(x)|.

Thus, Expression (5.25) equals

$$\sum_{z:\,\varphi(z)>0}\varphi(z)\cdot\Pr_y\big[\mathrm{OR}_M(\ldots,y_i z_i,\ldots) = -1\big], \tag{5.26}$$


where y ∈ {−1, 1}M is a random string whose ith bit independently takes on value −1 with probability
2 ∑x∈Azi |ψ(x)|. The only term in this sum corresponds to z = 1, which we now argue is at most Mδ + . By
(5.19), each yi = −1 with probability 2 ∑x∈A1 |ψ(x)| = 2 ∑x∈E + (ψ, f ) |ψ(x)| ≤ 2δ + . Hence, for z = 1, we
have
                                   Pr[ORM (. . . , yi , . . . ) = −1] ≤ 2Mδ +
                                                  y

by a union bound. Thus Expression (5.26) is at most Mδ + , proving (5.22).
    It now remains to prove the bound (5.23). By an identical argument as above, we have

$$\sum_{(x_1,\ldots,x_M)\in E_-(\varphi\star\psi,\ \mathrm{OR}_M\circ f)}|(\varphi\star\psi)(x_1,\ldots,x_M)| = \sum_{z:\,\varphi(z)<0}|\varphi(z)|\cdot\Pr_y\big[\mathrm{OR}_M(\ldots,y_i z_i,\ldots) = 1\big]. \tag{5.27}$$


The only term in the sum corresponds to z = −1, which we argue is at most ½ · (2δ^−)^M. Here, each y_i = −1 independently with probability 2∑_{x∈A_{−1}} |ψ(x)| = 2∑_{x∈E_−(ψ,f)} |ψ(x)| ≤ 2δ^−, and OR_M(..., −y_i, ...) = 1 only if y_i = −1 for every i. Hence, we conclude that

$$\Pr_y\big[\mathrm{OR}_M(\ldots,-y_i,\ldots) = 1\big] \le (2\delta^-)^M.$$

Since |ϕ(−1)| = 1/2, it follows that Expression (5.27) is at most ½ · (2δ^−)^M, establishing (5.23). This completes the proof.
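The exponential false-negative amplification can be seen concretely on a toy instance. The following brute-force check is our own illustration (the witness ψ, the dictator f, and the sign conventions stated in the comments are assumptions for the example, not taken from the paper); it confirms that the false-negative mass of ϕ ⋆ ψ is at most ½ · (2δ⁻)^M.

```python
from itertools import product

# Brute-force check (illustrative) of the false-negative bound (5.23) for a
# tiny hand-built dual witness. Conventions assumed here:
#   OR(z) = -1 if some z_i = -1, else 1;
#   E+(psi,f) = {x : psi(x) > 0, f(x) = -1};  E-(psi,f) = {x : psi(x) < 0, f(x) = 1};
#   (phi ⋆ psi)(x_1,...,x_M) = 2^M * phi(sgn psi(x_1),...,sgn psi(x_M)) * prod_i |psi(x_i)|.

def OR(bits):
    return -1 if -1 in bits else 1

f = lambda x: x[0]                      # a one-bit "dictator" on {-1,1}^2
# hand-built psi: l1-norm 1, orthogonal to constants, E-(psi,f) = {(1,-1)}
psi = {(1, 1): 0.5, (1, -1): -0.05, (-1, 1): -0.25, (-1, -1): -0.2}
delta_minus = 0.05

for M in (2, 3):
    fn_mass = 0.0                       # false-negative mass of phi ⋆ psi
    for xs in product(psi, repeat=M):
        z = tuple(1 if psi[x] > 0 else -1 for x in xs)
        # phi is 1/2 at the all-ones input, -1/2 at the all-minus-ones input
        phi_val = 0.5 if z == (1,) * M else (-0.5 if z == (-1,) * M else 0.0)
        val = 2 ** M * phi_val
        for x in xs:
            val *= abs(psi[x])
        if val < 0 and OR(tuple(f(x) for x in xs)) == 1:
            fn_mass += abs(val)
    assert fn_mass <= 0.5 * (2 * delta_minus) ** M + 1e-12
print("false-negative bound (5.23) verified for M = 2, 3")
```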


Amplifying degree. The following proposition states that if ψ is a dual polynomial for a Boolean function f, then there is a dual polynomial ρ ⋆ ψ for OR_M ◦ f with significantly larger pure high degree that does not make many more false-positive and false-negative errors than ψ itself does.

MARK BUN, ROBIN KOTHARI, AND JUSTIN THALER

Proposition 5.6. Let f : {−1,1}^m → {−1,1} and ψ : {−1,1}^m → ℝ be functions such that

$$\sum_{x\in E_+(\psi,f)}|\psi(x)| \le \delta^+, \tag{5.28}$$

$$\sum_{x\in E_-(\psi,f)}|\psi(x)| \le \delta^-, \tag{5.29}$$

$$\|\psi\|_1 = 1. \tag{5.30}$$

For every M ∈ ℕ, there exists a function ρ : {−1,1}^M → ℝ with ‖ρ‖₁ = 1 and pure high degree Ω(√M) such that

$$\langle\rho\star\psi,\ \mathrm{OR}_M\circ f\rangle \ge \frac{9}{10} - 4M\delta^+ - 4\delta^-. \tag{5.31}$$
                                                                                     √
Proof. Lemma 2.6 shows that the function OR_M has (9/10)-approximate degree Ω(√M). Hence, Proposition 2.3 guarantees the existence of a function ρ : {−1,1}^M → ℝ with ‖ρ‖₁ = 1 and pure high degree Ω(√M) such that

$$\langle\rho,\ \mathrm{OR}_M\rangle \ge \frac{9}{10}. \tag{5.32}$$
What remains is to establish the correlation bound (5.31). Letting λ denote the distribution λ(x) = |ψ(x)| as in the proof of Proposition 5.5, we may write

$$\sum_{(x_1,\ldots,x_M)\in(\{-1,1\}^m)^M}(\rho\star\psi)(x_1,\ldots,x_M)\cdot\mathrm{OR}_M(\ldots,f(x_i),\ldots)$$
$$= 2^M\,\mathbb{E}_{\lambda^{\otimes M}}\big[\rho(\ldots,\operatorname{sgn}(\psi(x_i)),\ldots)\cdot\mathrm{OR}_M(\ldots,f(x_i),\ldots)\big]$$
$$= \sum_{z\in\{-1,1\}^M}\rho(z)\cdot\mathbb{E}_{\lambda^{\otimes M}}\big[\mathrm{OR}_M(\ldots,f(x_i),\ldots) \,\big|\, (\ldots,\operatorname{sgn}(\psi(x_i)),\ldots) = z\big]$$
$$= \sum_{z\in\{-1,1\}^M}\rho(z)\cdot\mathbb{E}_y\big[\mathrm{OR}_M(\ldots,y_i z_i,\ldots)\big], \tag{5.33}$$

where y ∈ {−1,1}^M is a random string whose i-th bit independently takes the value −1 with probability 2∑_{x∈A_{z_i}} |ψ(x)|. (Here, we are again using the abbreviated notation A_{+1} = E_+(ψ, f) and A_{−1} = E_−(ψ, f).) We first consider the contribution of the term corresponding to z = 1 to the sum. Here, by a union bound,

$$\mathbb{E}_y\big[\mathrm{OR}_M(\ldots,y_i,\ldots)\big] = 1 - 2\Pr_y\big[\mathrm{OR}_M(\ldots,y_i,\ldots) = -1\big] \ge 1 - 2M\cdot\Big(2\sum_{x\in A_{+1}}|\psi(x)|\Big) \ge 1 - 4M\delta^+.$$

Hence, the term z = 1 contributes ρ(1) · (1 − 4Mδ^+) to the sum.
    Now we consider the contribution of any term corresponding to z ≠ 1. Given such a z, let i* be an index such that z_{i*} = −1. Then we have

$$-\mathbb{E}_y\big[\mathrm{OR}_M(\ldots,y_i z_i,\ldots)\big] = 1 - 2\Pr_y\big[\mathrm{OR}_M(\ldots,y_i z_i,\ldots) = 1\big] \ge 1 - 2\Pr_{y_{i^*}}\big[y_{i^*} = -1\big] = 1 - 2\cdot\Big(2\sum_{x\in A_{-1}}|\psi(x)|\Big) \ge 1 - 4\delta^-.$$

We can now lower bound (5.33) by

$$\rho(\mathbf{1})\cdot(1-4M\delta^+) - \sum_{z\ne\mathbf{1}}\rho(z)(1-4\delta^-) \ge \sum_{z\in\{-1,1\}^M}\rho(z)\,\mathrm{OR}_M(z) - 4M\delta^+\,|\rho(\mathbf{1})| - 4\delta^-\sum_{z\ne\mathbf{1}}|\rho(z)| \ge \frac{9}{10} - 4M\delta^+ - 4\delta^-.$$

5.2.2   Constructing a dual witness for OR_R ◦ THR^k_N

We now combine our amplification lemmas to construct a dual witness for OR_R ◦ THR^k_N.

Proposition 5.7. Let k, T, N, R ∈ ℕ with k ≤ T ≤ R ≤ N and R divisible by 4^k. Let ψ : {−1,1}^N → ℝ be a function with ‖ψ‖₁ = 1 and

$$\sum_{x\in E_+(\psi,\,\mathrm{THR}^k_N)}|\psi(x)| \le \frac{1}{48N},$$

$$\sum_{x\in E_-(\psi,\,\mathrm{THR}^k_N)}|\psi(x)| \le \frac{1}{2} - \frac{2}{4^k}.$$

Then there exists a function Φ : {−1,1}^R → ℝ with ‖Φ‖₁ = 1 and pure high degree Ω(2^{−k}√R) such that

$$\langle\Phi\star\psi,\ \mathrm{OR}_R\circ\mathrm{THR}^k_N\rangle \ge 2/3.$$
Proof. Using the construction of Proposition 5.5 with m = N, M = 4^k, and f = THR^k_N, we first obtain a function ϕ : {−1,1}^{4^k} → ℝ with ‖ϕ‖₁ = 1 and pure high degree 1 such that

$$\sum_{x\in E_+(\varphi\star\psi,\ \mathrm{OR}_{4^k}\circ\mathrm{THR}^k_N)}|(\varphi\star\psi)(x)| \le \frac{4^k}{48N},$$

$$\sum_{x\in E_-(\varphi\star\psi,\ \mathrm{OR}_{4^k}\circ\mathrm{THR}^k_N)}|(\varphi\star\psi)(x)| \le \frac{1}{2}\cdot\big(1 - 4\cdot 4^{-k}\big)^{4^k} \le \frac{e^{-4}}{2}.$$

Now by the construction of Proposition 5.6 with m = 4^k · N, M = R/4^k, and f = OR_{4^k} ◦ THR^k_N, there exists a function ρ : {−1,1}^{R/4^k} → ℝ with ‖ρ‖₁ = 1 and pure high degree Ω(2^{−k}√R) such that

$$\langle\rho\star(\varphi\star\psi),\ \mathrm{OR}_{R/4^k}\circ(\mathrm{OR}_{4^k}\circ\mathrm{THR}^k_N)\rangle \ge \frac{9}{10} - \frac{4R}{48N} - 2e^{-4} \ge \frac{2}{3}.$$


Let Φ : {−1,1}^R → ℝ be the dual block composition ρ ⋆ ϕ. Since the dual block composition preserves ℓ₁-norms (Proposition 2.20, Condition (2.8)) and multiplies pure high degrees (Proposition 2.20, Condition (2.9)), the function Φ itself has ℓ₁-norm 1 and pure high degree Ω(2^{−k}√R). The claim now follows from the associativity of dual block composition (Proposition 2.20, Condition (2.10)) and the fact that OR_R = OR_{R/4^k} ◦ OR_{4^k}.
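These properties of dual block composition are easy to verify by brute force on toy examples. The sketch below is our own illustration, assuming the standard definition (ρ ⋆ ψ)(x₁, . . . , x_M) = 2^M · ρ(. . . , sgn(ψ(x_i)), . . .) · ∏_i |ψ(x_i)|; it confirms ℓ₁-norm preservation and multiplication of pure high degrees for a small pair of dual witnesses.

```python
from itertools import product

# Illustrative brute-force check (not from the paper) that dual block
# composition preserves l1-norm and multiplies pure high degree.

def fourier(f, n):
    # hat{f}(S) = 2^{-n} * sum_x f(x) * chi_S(x) over {-1,1}^n
    coeffs = {}
    for S in product([0, 1], repeat=n):
        tot = 0.0
        for x in product([-1, 1], repeat=n):
            chi = 1
            for s, xi in zip(S, x):
                if s:
                    chi *= xi
            tot += f[x] * chi
        coeffs[S] = tot / 2 ** n
    return coeffs

def pure_high_degree(f, n, tol=1e-9):
    # least |S| with a nonzero Fourier coefficient: f is orthogonal to all
    # polynomials of strictly smaller degree
    return min(sum(S) for S, c in fourier(f, n).items() if abs(c) > tol)

def block_compose(rho, M, psi, n):
    # (rho ⋆ psi)(x_1,...,x_M) = 2^M * rho(..., sgn psi(x_i), ...) * prod |psi(x_i)|
    out = {}
    for xs in product(list(psi), repeat=M):
        signs = tuple(1 if psi[x] > 0 else -1 for x in xs)
        val = 2 ** M * rho[signs]
        for x in xs:
            val *= abs(psi[x])
        out[tuple(c for x in xs for c in x)] = val
    return out

# psi on {-1,1}^2: psi(x) = x1*x2/4 has l1-norm 1 and pure high degree 2
psi = {x: x[0] * x[1] / 4 for x in product([-1, 1], repeat=2)}
# rho on {-1,1}^2: same form, l1-norm 1 and pure high degree 2
rho = {z: z[0] * z[1] / 4 for z in product([-1, 1], repeat=2)}

comp = block_compose(rho, 2, psi, 2)            # a function on {-1,1}^4
assert abs(sum(abs(v) for v in comp.values()) - 1.0) < 1e-9
assert pure_high_degree(comp, 4) == 4           # 2 (from rho) x 2 (from psi)
print("l1-norm preserved; pure high degree multiplies")
```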

5.3     Step 3: Completing the construction

We are now ready to apply Proposition 2.22 to zero out the mass that the construction of Proposition 5.7 places on inputs outside of H^{N·R}_{≤N}.

Proposition 5.8. Let R be sufficiently large. There exist N = O((2k)^{k/2} · R), D = Ω̃(2^{−k/4} k^{(k−3)/4} · R^{3/4−1/(2k)}), and ζ : ({−1,1}^N)^R → ℝ such that

$$\zeta(x) = 0 \ \text{for all}\ x\notin H^{N\cdot R}_{\le N}, \tag{5.34}$$

$$\sum_{x\in H^{N\cdot R}_{\le N}}\zeta(x)\cdot(\mathrm{OR}_R\circ\mathrm{THR}^k_N)(x) > 1/3, \tag{5.35}$$

$$\|\zeta\|_1 = 1,\ \text{and} \tag{5.36}$$

for every polynomial p : ({−1,1}^N)^R → ℝ of degree less than D, we have ⟨p, ζ⟩ = 0.   (5.37)

Proof. We start by fixing choices of several key parameters:

    • d = Θ(2^{−k}√R) is the pure high degree of the dual witness Φ for OR_R in Proposition 5.7,

    • T = ⌊(8k)^{k/2}·√R⌋,

    • α = (2k)^k,

    • N = ⌈20√α⌉·R = Θ((2k)^{k/2}·R),

    • D̂ = c₁·√(k^{−1} · T · N^{−1/k}) · d = Θ(2^{−k/4} k^{(k−3)/4} · R^{3/4−1/(2k)}), where c₁ is the constant from Proposition 5.3,

    • β = c₂/√(k · T · N^{1/k}) = Θ(2^{−3k/4} k^{−(k+3)/4} · R^{−1/4−1/(2k)}), where c₂ is the constant from Proposition 5.3.
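The stated Θ(·) simplifications of D̂ and β can be sanity-checked numerically. In the sketch below (our own illustration; the constants c₁, c₂ and the hidden constant in d are set to 1), we confirm that D̂ scales as R^{3/4−1/(2k)} and β as R^{−1/4−1/(2k)} for fixed k.

```python
import math

# Illustrative check of the R-exponents of D-hat and beta under the parameter
# settings above, with c1 = c2 = 1 and d = 2^{-k} * sqrt(R) (floors/ceilings
# dropped so the power laws are exact).
def params(k, R):
    d = 2 ** (-k) * math.sqrt(R)                 # pure high degree of Phi
    T = (8 * k) ** (k / 2) * math.sqrt(R)
    N = 20 * (2 * k) ** (k / 2) * R
    D_hat = math.sqrt(T * N ** (-1 / k) / k) * d
    beta = 1 / math.sqrt(k * T * N ** (1 / k))
    return D_hat, beta

for k in (2, 3, 5):
    R1, R2 = 10 ** 6, 10 ** 8
    (D1, b1), (D2, b2) = params(k, R1), params(k, R2)
    exp_D = math.log(D2 / D1) / math.log(R2 / R1)
    exp_b = math.log(b2 / b1) / math.log(R2 / R1)
    assert abs(exp_D - (3 / 4 - 1 / (2 * k))) < 1e-9
    assert abs(exp_b - (-1 / 4 - 1 / (2 * k))) < 1e-9
print("exponents match 3/4 - 1/(2k) and -1/4 - 1/(2k)")
```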

    Let ψ : {−1,1}^N → [−1,1] be the function constructed in Proposition 5.4. Let Φ : {−1,1}^R → [−1,1] be the function constructed in Proposition 5.7, and define ξ = Φ ⋆ ψ. Then by Proposition 2.20, ξ satisfies the following conditions:

$$\langle\xi,\ \mathrm{OR}_R\circ\mathrm{THR}^k_N\rangle > 2/3, \tag{5.38}$$

$$\|\xi\|_1 = 1, \tag{5.39}$$

for every polynomial p of degree less than D̂, we have ⟨ξ, p⟩ = 0.   (5.40)



Recall that ψ was obtained by symmetrizing the function ω constructed in Proposition 5.3. Proposition 2.22 guarantees that for some Δ ≥ β√α·R/(4 ln² R) = Ω̃(2^{−k/4} k^{(k−3)/4} · R^{3/4−1/(2k)}), the function ξ can be modified to produce a function ζ : ({−1,1}^N)^R → ℝ such that

$$\zeta(x) = 0 \ \text{for all}\ x\notin H^{N\cdot R}_{\le N},$$

$$\langle\zeta,\ \mathrm{OR}_R\circ\mathrm{THR}^k_N\rangle \ge \langle\xi,\ \mathrm{OR}_R\circ\mathrm{THR}^k_N\rangle - \|\zeta - \xi\|_1 \ge 2/3 - 2/9 > 1/3,$$

$$\|\zeta\|_1 = 1,$$

for every polynomial p of degree less than min{D̂, Δ}, we have ⟨ζ, p⟩ = 0.

Observing that

$$D = \min\{\hat{D}, \Delta\} = \tilde{\Omega}\big(2^{-k/4} k^{(k-3)/4}\cdot R^{3/4-1/(2k)}\big)$$

shows that the function ζ satisfies the conditions necessary to prove Proposition 5.8.

    Theorem 5.2 now follows by combining Proposition 5.8 with the dual characterization of unbounded approximate degree (Proposition 2.5). The approximate degree lower bound in Theorem 5.1 is then a consequence of Proposition 2.13 and Corollary 2.18. The quantum query lower bound follows via the standard fact that the ε-error quantum query complexity of f is lower bounded by $\tfrac{1}{2}\cdot\widetilde{\deg}_{2\varepsilon}(f)$ [13].



6     Lower bound for image size testing and its implications

6.1    Image size testing

The image size testing problem (IST for short) is defined as follows.

Definition 6.1. Given an input s = (s₁, . . . , s_N) ∈ ([R] ∪ {0})^N, for each i ∈ [R] let f_i = |{ j : s_j = i }|. The image size of s is the number of i ∈ [R] such that f_i > 0. For 0 < γ < 1, define:

$$\mathrm{IST}^{\gamma}_{N,R}(s_1,\ldots,s_N) = \begin{cases} -1 & \text{if the image size is } R,\\ 1 & \text{if the image size is at most } \gamma\cdot R,\\ \text{undefined} & \text{otherwise.}\end{cases}$$

    Observe that the definition of IST ignores whether or not the range item 0 has positive frequency, just
like the functions dSURJ and dDISTk . We choose to define IST in this manner to streamline our analysis.
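A direct implementation of Definition 6.1 makes the promise structure explicit. The sketch below is illustrative (the function and variable names are ours); inputs outside the promise return None for "undefined," and range item 0 is ignored, matching the definition.

```python
# Illustrative implementation of the IST function from Definition 6.1.
# Inputs s_j range over {0, 1, ..., R}; only items in [R] = {1, ..., R}
# count toward the image size.

def IST(s, R, gamma):
    image = {v for v in s if 1 <= v <= R}       # image restricted to [R]
    if len(image) == R:
        return -1                               # full image
    if len(image) <= gamma * R:
        return 1                                # small image
    return None                                 # outside the promise: undefined

assert IST([1, 2, 3, 4], R=4, gamma=0.5) == -1  # image size 4 = R
assert IST([1, 1, 1, 0], R=4, gamma=0.5) == 1   # image size 1 <= 2
assert IST([1, 2, 3, 3], R=4, gamma=0.5) is None  # image size 3: undefined
print("IST examples behave as expected")
```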
    The goal of this section is to prove the following lower bound.
                                                                                    
Theorem 6.2. For some constant c > 0, and any constant γ ∈ (0, 1), $\widetilde{\deg}\big(\mathrm{IST}^{\gamma}_{N,R}\big) \ge \tilde{\Omega}(R^{1/2})$, where N = c · γ^{−1/2} · R. The same lower bound applies to the quantum query complexity of IST^γ_{N,R}.


Remark 6.3. It is possible to refine our analysis to show that even the unbounded approximate degree of IST^γ_{N,R} is Ω̃(R^{1/2}), and that this holds even if the error parameter is 1 − 2^{−n^{Ω(1)}}. However, for brevity we do not explicitly establish this stronger result. We direct the interested reader to the subsequent work [34], which shows that the threshold degree of SURJ_{N,R} is Ω̃(R^{1/2}) for R ≤ N/2. The proof of that result can be extended with little difficulty to show that the threshold degree of IST^γ_{N,R} is Ω̃(R^{1/2}).


6.1.1    Connecting symmetric promise properties and block compositions of partial functions

For any function f and symmetric function g, Section 2.5 described a connection between the symmetric property

$$F^{\mathrm{prop}}(s_1,\ldots,s_N) = f\big(g(\mathbb{1}[s_1 = 1],\ldots,\mathbb{1}[s_N = 1]),\ \ldots,\ g(\mathbb{1}[s_1 = R],\ldots,\mathbb{1}[s_N = R])\big)$$

and the partial function

$$F^{\le N}(x_1,\ldots,x_R) = \begin{cases} f(g(x_1),\ldots,g(x_R)) & \text{if } x_1,\ldots,x_R\in\{-1,1\}^N \text{ and } |x_1|+\cdots+|x_R|\le N,\\ \text{undefined} & \text{otherwise.}\end{cases}$$
    For simplicity and clarity, that discussion was restricted to total functions f and g (in particular, this avoided having to address the possibility that f(g(x₁), . . . , g(x_R)) is undefined in the definitions of F^prop and F^{≤N}). Because IST is a partial function, we need to explain why the same connection still holds even when f is a partial function. To do this, we introduce the notion of the double-promise approximate degree of F^{≤N}.
Definition 6.4. Let Y ⊂ {−1,1}^R and f : Y → {−1,1}, and let g : {−1,1}^N → {−1,1} be a symmetric (total) function. Let G = {(x₁, . . . , x_R) : (g(x₁), . . . , g(x_R)) ∈ Y}. Let F^{≤N} be defined as above. Observe that F^{≤N} is defined at all inputs in H^{N·R}_{≤N} ∩ G. The double-promise ε-approximate degree of F^{≤N}, denoted $\widetilde{\mathrm{dpdeg}}_\varepsilon(F^{\le N})$, is the least degree of a real polynomial p such that:

$$|p(x) - F^{\le N}(x)| \le \varepsilon \ \text{for all}\ x\in H^{N\cdot R}_{\le N}\cap G, \tag{6.1}$$

$$|p(x)| \le 1 + \varepsilon \ \text{for all}\ x\in H^{N\cdot R}_{\le N}\setminus G. \tag{6.2}$$

    Observe that in the definition above, no restriction is placed on p(x) for any inputs that are not in H^{N·R}_{≤N}.
   Bun and Thaler’s analysis from [33] (cf. Theorem 2.15) applies to partial functions f in the following
way.
Theorem 6.5 (Bun and Thaler [33]). Let Y ⊂ {−1, 1}R and let f : Y → {−1, 1} be any partial function.
Let g : {−1, 1}N → {−1, 1} be any symmetric function. Then for F prop and F ≤N defined above, and for
any ε > 0, we have
                                   degε
                                                  ^ ε (F ≤N ).
                                    f (F prop ) ≥ dpdeg
    We will require the following dual formulation of $\widetilde{\mathrm{dpdeg}}_\varepsilon(F^{\le N})$.

Proposition 6.6. Let F^{≤N} and G be defined as above. Then $\widetilde{\mathrm{dpdeg}}_\varepsilon(F^{\le N}) \ge d$ if and only if there exists a function ψ : {−1,1}^n → ℝ satisfying the following conditions.

$$\psi(x) = 0 \ \text{for all}\ x\notin H^{N\cdot R}_{\le N}, \tag{6.3}$$

$$\sum_{x\in H^{N\cdot R}_{\le N}\cap G}\psi(x)\cdot F^{\le N}(x)\ -\ \sum_{x\in H^{N\cdot R}_{\le N}\setminus G}|\psi(x)| > \varepsilon, \tag{6.4}$$

$$\sum_{x\in\{-1,1\}^n}|\psi(x)| = 1,\ \text{and} \tag{6.5}$$

for every polynomial p : {−1,1}^n → ℝ of degree less than d, $\sum_{x\in\{-1,1\}^n} p(x)\cdot\psi(x) = 0$.   (6.6)



    We will need to define the following partial function.

Definition 6.7. Define GapAND^γ_R : H^R_{≤γ·R} ∪ {−1} → {−1,1} via:

$$\mathrm{GapAND}^{\gamma}_R(x) = \begin{cases} -1 & \text{if } x_i = -1 \text{ for all } i,\\ 1 & \text{if } x\in H^R_{\le\gamma\cdot R},\\ \text{undefined} & \text{otherwise.}\end{cases}$$

    In the case where f = GapAND^γ_R and g = OR_N, the function F^prop(s₁, . . . , s_N) is precisely the IST^γ_{N,R} function. Hence:

Corollary 6.8. Let N, R ∈ ℕ. Let Y be the domain of GapAND^γ_R, and let G = {(x₁, . . . , x_R) ∈ {−1,1}^{N·R} : (OR_N(x₁), . . . , OR_N(x_R)) ∈ Y}. Then for any ε > 0,

$$\widetilde{\deg}_\varepsilon\big(\mathrm{IST}^{\gamma}_{N,R}\big) \ge \widetilde{\mathrm{dpdeg}}_\varepsilon(F^{\le N}),$$

where F^{≤N} : G ∩ H^{N·R}_{≤N} → {−1,1} is the partial function obtained by restricting GapAND^γ_R ◦ OR_N to H^{N·R}_{≤N}.
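The claimed correspondence can be verified exhaustively for tiny parameters. The following sketch is our own illustration, assuming the encoding in which bit −1 means "true," so that OR_N of the indicator vector of item i equals −1 exactly when i appears in the input.

```python
from itertools import product

# Illustrative brute-force check of the correspondence F_prop = IST from
# Corollary 6.8, under the encoding assumption x_{i,j} = -1 iff s_j = i.

def OR(bits):
    return -1 if -1 in bits else 1

def GapAND(z, gamma):
    R = len(z)
    if all(b == -1 for b in z):
        return -1                                     # all items present
    if sum(1 for b in z if b == -1) <= gamma * R:
        return 1                                      # few items present
    return None                                       # undefined

def IST(s, R, gamma):
    image = {v for v in s if 1 <= v <= R}
    if len(image) == R:
        return -1
    if len(image) <= gamma * R:
        return 1
    return None

N, R, gamma = 3, 3, 1 / 3
for s in product(range(R + 1), repeat=N):             # s_j in {0, 1, ..., R}
    # z_i = OR_N of the indicator vector of range item i
    z = tuple(OR(tuple(-1 if sj == i else 1 for sj in s)) for i in range(1, R + 1))
    assert GapAND(z, gamma) == IST(list(s), R, gamma)
print("F_prop agrees with IST on all", (R + 1) ** N, "inputs")
```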

    With Corollary 6.8 in hand, we now turn to proving a lower bound on $\widetilde{\mathrm{dpdeg}}_\varepsilon(F^{\le N})$.

6.1.2   Completing the proof of Theorem 6.2

Proof. We construct a dual polynomial to witness the lower bound in Theorem 6.2.
    Define the parameters

    • δ = γ/4,

    • α = 170/δ,

    • T = N = ⌈20√α⌉·R ≤ 310 · γ^{−1/2} · R,

    • β = c₂ · √δ/√T, where c₂ is the constant from Proposition 4.3.

    Let ψ be the dual witness for OR_N from Proposition 4.5 with T = N. Define Φ : {−1,1}^R → ℝ as follows:

$$\Phi(x) = \begin{cases} -1/2 & \text{if } x = (-1,-1,\ldots,-1),\\ 1/2 & \text{if } x = (1,1,\ldots,1),\\ 0 & \text{otherwise.}\end{cases}$$
                                    

    The dual block composition Φ ? ψ is the same dual witness for ANDR ◦ ORN which Bun and Thaler
                        f (ANDR ◦ ORN ) = Ω(N 1/2 ) for ε = 1 − 2−R . This dual witness was also used
[30] used to show that deg ε
in subsequent work [21, 79]. Curiously, we are interested in this dual witness for a completely different
reason than these prior works. These prior works were interested in Φ ? ψ because its correlation with the
target function ANDR ◦ ORN is exponentially closer to 1 than is the correlation of ψ with ORN . For our

                       T HEORY OF C OMPUTING, Volume 16 (10), 2020, pp. 1–71                                 55
                               M ARK B UN , ROBIN KOTHARI , AND J USTIN T HALER

purposes, it will not be essential to exploit such a strong correlation guarantee. Rather, we are interested
in Φ ⋆ ψ because most of its "ℓ_1-mass" lies on inputs with either full image or tiny image (i. e., most of its
mass lies in the domain of GapAND_R^γ ◦ OR_N).
                                                                                γ
    As in Corollary 6.8, let G denote the set of all inputs on which GapAND_R^γ ◦ OR_N is defined, i. e.,

            G = {(x_1, . . . , x_R) ∈ ({−1,1}^N)^R : (OR_N(x_1), . . . , OR_N(x_R)) ∈ H^R_{≤γ·R} ∪ {−1}}.

The analysis in these prior works [21, 30] implies that Φ ⋆ ψ satisfies the following three conditions.

        ‖Φ ⋆ ψ‖_1 = 1,                                                                              (6.7)

        ∑_{x∈G} (Φ ⋆ ψ)(x) · (GapAND_R^γ ◦ OR_N)(x) − ∑_{x∈{−1,1}^{N·R}\G} |(Φ ⋆ ψ)(x)| ≥ 9/10,     (6.8)

        For any polynomial p : {−1,1}^{N·R} → R, deg p < c_1 √(δT) ⟹ ⟨Φ ⋆ ψ, p⟩ = 0,               (6.9)

where c_1 is the constant from Proposition 4.5. Indeed, Properties (6.7) and (6.9) are immediate from
Proposition 2.20 on the properties of dual block composition. For completeness, we prove that Prop-
erty (6.8) holds in Section 6.1.3 below, making use of the fact that δ = γ/4 and taking R to be sufficiently
large.
    Proposition 2.22 (with α and β set as above) guarantees that for some ∆ ≥ β·√(αR)/(4 ln² R) =
Ω̃(R^{1/2}), the function Φ ⋆ ψ can be modified to produce a function ζ : ({−1,1}^N)^R → R such that

        ζ(x) = 0 for all x ∉ H^{N·R}_{≤N},
        ⟨ζ, GapAND_R^γ ◦ OR_N⟩ ≥ ⟨Φ ⋆ ψ, GapAND_R^γ ◦ OR_N⟩ − ‖ζ − Φ ⋆ ψ‖_1 ≥ 9/10 − 2/9 > 1/3,
        ‖ζ‖_1 = 1,
        For every polynomial p of degree less than D := min{D̂, ∆}, we have ⟨ζ, p⟩ = 0.

Observing that
                                D = min{c_1 √(δT), ∆} = Ω̃(R^{1/2})
shows that the function ζ satisfies the conditions necessary to prove Theorem 6.2 via Proposition 6.6.

6.1.3    Proof of Condition (6.8)
Lemma 6.9. Let δ > 0, γ > 2δ, and let

            G = {(x_1, . . . , x_R) ∈ ({−1,1}^N)^R : (OR_N(x_1), . . . , OR_N(x_R)) ∈ H^R_{≤γ·R} ∪ {−1}}.

Define Φ : {−1,1}^R → [−1,1] by Φ(−1) = −1/2, Φ(1) = 1/2, and Φ(z) = 0 otherwise. Let ψ :
{−1,1}^N → [−1,1] be any dual witness for OR_N such that ‖ψ‖_1 = 1, ⟨ψ, 1⟩ = 0, and ⟨ψ, OR_N⟩ ≥ 1 − δ.
Then

    ∑_{x∈G} (Φ ⋆ ψ)(x) · (GapAND_R^γ ◦ OR_N)(x) − ∑_{x∈{−1,1}^{N·R}\G} |(Φ ⋆ ψ)(x)| ≥ 1 − δ^R − exp(−(γ − δ)R/3).

    The proof of Lemma 6.9 crucially relies on a special property, called one-sided error, that is satisfied
by any dual polynomial for OR.


Definition 6.10. Let f : {−1, 1}N → {−1, 1} and let ψ : {−1, 1}N → R. We say that ψ has one-sided
error with respect to f if for all x ∈ {−1, 1}N ,

                                               f (x) = 1 =⇒ ψ(x) > 0.                                                    (6.10)

    The following lemma shows that any dual witness for the ORN function has one-sided error.

Lemma 6.11 (Gavinsky and Sherstov [41]). Let ψ : {−1, 1}N → R be a function with pure high degree
at least 1 such that hψ, ORN i > 0. Then ψ has one-sided error with respect to ORN .

    In particular, if ψ is such a dual witness for OR_N, then we have

            ∑_{x∈A_{+1}} |ψ(x)| ≤ (1/2) (1 − ⟨ψ, OR_N⟩),        ∑_{x∈A_{−1}} |ψ(x)| = 0,           (6.11)

where the sets A_{+1} and A_{−1} are, respectively, the sets of false positive and false negative errors given by

            A_{+1} = E_+(ψ, OR_N) = {x ∈ {−1,1}^N : ψ(x) > 0, OR_N(x) = −1},
            A_{−1} = E_−(ψ, OR_N) = {x ∈ {−1,1}^N : ψ(x) < 0, OR_N(x) = +1}.
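Lemma 6.11 can be sanity-checked numerically. In this paper's convention OR_N(x) = +1 only at the all-ones input, so orthogonality to constants gives ⟨ψ, OR_N⟩ = 2ψ(1^N), and positive correlation forces ψ(1^N) > 0. The following sketch (the helper names are ours, not from the paper) verifies this on random zero-sum witnesses:

```python
import itertools
import random

def or_n(x):
    # OR in the paper's convention: -1 (TRUE) if some coordinate is -1
    return -1 if -1 in x else 1

def check_one_sided(N, trials=200, seed=0):
    rng = random.Random(seed)
    inputs = list(itertools.product((-1, 1), repeat=N))
    ones = (1,) * N
    for _ in range(trials):
        psi = {x: rng.uniform(-1, 1) for x in inputs}
        # pure high degree >= 1 means orthogonality to constants: shift to zero sum
        mean = sum(psi.values()) / len(inputs)
        psi = {x: v - mean for x, v in psi.items()}
        corr = sum(psi[x] * or_n(x) for x in inputs)
        # since OR_N = +1 only at 1^N and sum(psi) = 0, corr = 2 * psi(1^N)
        assert abs(corr - 2 * psi[ones]) < 1e-9
        if corr > 1e-9:
            assert psi[ones] > 0  # one-sided error at the unique +1 input
    return True

assert check_one_sided(4)
```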

Proof of Lemma 6.9. We begin by observing that the quantity of interest can be written as

    ∑_{x∈{−1,1}^{N·R}} (Φ ⋆ ψ)(x) · (AND_R ◦ OR_N)(x)
        − ( ∑_{x∈{−1,1}^{N·R}\G} (Φ ⋆ ψ)(x) · (AND_R ◦ OR_N)(x) + ∑_{x∈{−1,1}^{N·R}\G} |(Φ ⋆ ψ)(x)| )

    ≥ ∑_{x∈{−1,1}^{N·R}} (Φ ⋆ ψ)(x) · (AND_R ◦ OR_N)(x) − 2 ∑_{x∈{−1,1}^{N·R}\G} |(Φ ⋆ ψ)(x)|.     (6.12)

We estimate each term of Expression (6.12) separately, beginning with the first term. Just as in the proofs
of Proposition 5.5 and Proposition 5.6, we have

    ∑_{x∈{−1,1}^{N·R}} (Φ ⋆ ψ)(x) · (AND_R ◦ OR_N)(x) = ∑_{z∈{−1,1}^R} Φ(z) · E_y[AND_R(. . . , y_i z_i , . . . )],

where y ∈ {−1,1}^R is a random string whose ith bit independently takes the value −1 with probability
2 ∑_{x∈A_{z_i}} |ψ(x)|. For z = −1, we have by (6.11) that 2 ∑_{x∈A_{−1}} |ψ(x)| = 0, so the contribution of the
corresponding term to the sum is 1/2. For z = 1, we use the fact that 2 ∑_{x∈A_{+1}} |ψ(x)| ≤ δ, together with
the observation that the bits y_i are independent and AND_R(y) = −1 only if every y_i equals −1, to compute

    (1/2) · E_y[AND_R(. . . , y_i , . . . )] = (1/2) · (1 − 2 Pr_y[AND_R(. . . , y_i , . . . ) = −1])
                                            ≥ (1/2) · (1 − 2δ^R).

Hence, the first summand of (6.12) is at least 1 − δ^R.
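As a numerical illustration of this step (our own check, with arbitrary hypothetical parameter values): for independent bits with Pr[y_i = −1] = q ≤ δ, the expectation of AND_R in this convention is exactly 1 − 2q^R ≥ 1 − 2δ^R.

```python
import itertools

def and_r(y):
    # AND in the paper's convention: -1 (TRUE) only when every coordinate is -1
    return -1 if all(b == -1 for b in y) else 1

def expected_and(q, R):
    # exact E[AND_R(y)] for independent bits with Pr[y_i = -1] = q
    total = 0.0
    for y in itertools.product((-1, 1), repeat=R):
        prob = 1.0
        for b in y:
            prob *= q if b == -1 else 1 - q
        total += prob * and_r(y)
    return total

delta, R = 0.1, 6  # illustrative values
val = expected_and(delta, R)
assert abs(val - (1 - 2 * delta ** R)) < 1e-12
assert val >= 1 - 2 * delta ** R - 1e-12
```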


   We now estimate the second summand, 2 ∑_{x∉G} |(Φ ⋆ ψ)(x)|. As in the proofs of Proposition 5.5 and
Proposition 5.6, we let λ denote the distribution with probability mass function λ(x) = |ψ(x)|. Then

    2 ∑_{x∉G} |(Φ ⋆ ψ)(x)| = 2^{R+1} E_{λ^{⊗R}} [ |Φ(. . . , sgn ψ(x_i), . . . )| · I(x ∉ G) ]
                           = 2 ∑_{z∈{−1,1}^R} |Φ(z)| · Pr_{λ^{⊗R}} [x ∉ G | (. . . , sgn(ψ(x_i)), . . . ) = z]
                           = Pr_{λ^{⊗R}} [x ∉ G | (. . . , sgn(ψ(x_i)), . . . ) = −1] + Pr_{λ^{⊗R}} [x ∉ G | (. . . , sgn(ψ(x_i)), . . . ) = 1].

We analyze each term of this sum separately. For the first term, observe that by the one-sided error of ψ, it
follows that if x = (x_1, . . . , x_R) is any input with sgn(ψ(x_i)) = −1 for all i, then OR_N(x_i) = −1 for all i. Thus
we are guaranteed that x ∈ G, so the contribution of the first summand is zero. To analyze the second
summand, let us denote by r_i ∈ {0,1} the indicator random variable for the event OR_N(x_i) = −1 when x_i
is drawn from the conditional distribution (λ | sgn(ψ(x_i)) = 1). Then

    Pr[r_i = 1] = Pr_{x_i∼λ} [OR_N(x_i) = −1 | sgn(ψ(x_i)) = 1] = 2 ∑_{x∈A_{+1}} |ψ(x)| ≤ δ

by (6.11). Hence,

    Pr_{λ^{⊗R}} [x ∉ G | (. . . , sgn(ψ(x_i)), . . . ) = 1] ≤ Pr [ ∑_{i=1}^R r_i > γR ] ≤ exp(−(γ − δ)R/3)

by the multiplicative Chernoff bound.⁶ Thus, we have

    2 ∑_{x∉G} |(Φ ⋆ ψ)(x)| ≤ exp(−(γ − δ)R/3).

Putting everything together, we see that Expression (6.12) is at least 1 − δ^R − exp(−(γ − δ)R/3), as we
wanted to show.

6.2    Lower bound for junta testing
It follows from a reduction in Ambainis et al. [9, Section 6] that for any N = O(R) and sufficiently small
constant γ > 0, an Ω̃(R^{1/2}) lower bound for the approximate degree or quantum query complexity of
IST^γ_{N,R} implies an Ω̃(k^{1/2}) approximate degree or quantum query lower bound for k-junta testing with
proximity parameter ε = 1/3. Hence, Theorem 6.2 has the following corollary.
   ⁶ The formulation we use here is as follows. Let r_1, . . . , r_R be independent {0,1}-valued random variables, S = ∑_{i=1}^R r_i,
and µ = E[S]. Then for η > 1, we have Pr[S > (1 + η)µ] ≤ exp(−ηµ/3). In this application, we are taking µ ≤ δR and
η = γR/µ − 1 > 1.
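The footnote's bound can be checked against the exact binomial tail for concrete, arbitrarily chosen values of δ, γ, and R (a numerical illustration, not part of the argument):

```python
from math import comb, exp

def binom_tail_gt(R, q, t):
    # Pr[S > t] for S ~ Binomial(R, q), computed exactly
    return sum(comb(R, k) * q ** k * (1 - q) ** (R - k)
               for k in range(int(t) + 1, R + 1))

delta, gamma, R = 0.05, 0.25, 200  # illustrative constants with gamma > 2*delta
tail = binom_tail_gt(R, delta, gamma * R)
# the bound used in the text: Pr[sum r_i > gamma*R] <= exp(-(gamma - delta) * R / 3)
assert tail <= exp(-(gamma - delta) * R / 3)
```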



Corollary 6.12. Any quantum tester that distinguishes k-juntas from functions that are (1/3)-far from
any k-junta with error probability at most 1/3 makes Ω̃(k1/2 ) queries to the function.

6.3    Lower bound for SDU
The goal of this section is to derive a lower bound for approximating the statistical distance of an input
distribution from uniform up to some additive constant error. We formalize this problem as follows.
    Given an input (s_1, . . . , s_N) ∈ [R]^N and i ∈ [R], let f_i = |{ j : s_j = i}|, and let p be the probability
distribution over [R] such that p_i = f_i/N. For N ≥ R and 0 < γ2 < γ1 < 1, define the partial function
SDU^{γ1,γ2}_{N,R} as follows.

Definition 6.13. Define

                                          −1          if (1/2) ∑_{i=1}^R |p_i − 1/R| ≤ γ2,
    SDU^{γ1,γ2}_{N,R}(s_1, . . . , s_N) =  1          if (1/2) ∑_{i=1}^R |p_i − 1/R| ≥ γ1,
                                           undefined  otherwise.

Above, (1/2) ∑_{i=1}^R |p_i − 1/R| is the statistical distance between p and the uniform distribution.
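To make Definition 6.13 concrete, here is a small computation of the statistic it thresholds (an illustration of ours; the function name is not from the paper):

```python
from collections import Counter

def stat_dist_from_uniform(samples, R):
    # p_i = f_i / N as in Definition 6.13; returns (1/2) * sum_i |p_i - 1/R|
    N = len(samples)
    freq = Counter(samples)
    return 0.5 * sum(abs(freq.get(i, 0) / N - 1 / R) for i in range(1, R + 1))

# a perfectly balanced input list represents the uniform distribution
assert stat_dist_from_uniform([1, 2, 3, 4], R=4) == 0.0
# all mass on a single item: distance (1/2)(|1 - 1/R| + (R-1)/R) = 1 - 1/R
assert abs(stat_dist_from_uniform([1, 1, 1, 1], R=4) - 0.75) < 1e-12
```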
     The SDU problem reduces to IST in the sense that any approximating polynomial for SDU implies
the existence of an approximation to IST of the same degree. Hence, the approximate degree of SDU is
at least as large as that of IST. For intuition as to why this is true, let us relate IST^{1/2}_{N,R} to SDU^{1/2,0}_{N,R} in the
special case where N = R and no dummy (i. e., 0) items appear in the input to IST. If (s_1, . . . , s_N) is a true
input to IST^{1/2}_{N,R}, i. e., IST^{1/2}_{N,R}(s_1, . . . , s_N) = −1, then every index i ∈ [R] must appear in the input list exactly
once. Hence, the distribution represented by (s_1, . . . , s_N) is exactly uniform, so SDU^{1/2,0}_{N,R}(s_1, . . . , s_N) = −1.
On the other hand, if IST^{1/2}_{N,R}(s_1, . . . , s_N) = 1, then at most R/2 indices i ∈ [R] appear in the input list, so the
list represents a distribution with statistical distance at least 1/2 from uniform. Thus, an approximating
polynomial for SDU^{1/2,0}_{N,R} is also an approximating polynomial for IST^{1/2}_{N,R}.
     We will actually need a more general relationship between the approximate degrees of SDU and IST
to handle the fact that we cannot take N to be exactly equal to R in the IST lower bound, as well as to
handle the occurrences of dummy items in the definition of IST.
                                                                                                     
Theorem 6.14. For some N = O(R), and some constants 0 < γ2 < γ1 < 1, deg                f SDUγ1 ,γ2 ≥ Ω̃(R1/2 ).
                                                                                                  N,R
                                                                    1 2           γ ,γ
The same lower bound applies to the quantum query complexity of SDUN,R  .

Proof. Fix R > 0, let c be the constant from Theorem 6.2, let γ < (2/(3c))² be a sufficiently small constant,
and let N = c · γ^{−1/2} · R. As inputs to IST^γ_{N,R} are interpreted as elements of [R]_0^N, we can equivalently interpret
them as elements of [R + 1]^N, i. e., as inputs to SDU^{γ1,γ2}_{N,R+1}, for any desired 0 < γ2 < γ1 < 1.
    Set γ1 = 1 − 3γ/2 and γ2 = 1 − γ^{1/2}/c. Observe that since γ < (2/(3c))², γ1 is strictly greater than γ2.
We claim that

                (IST^γ_{N,R})^{−1}(−1) ⊆ (SDU^{γ1,γ2}_{N,R+1})^{−1}(−1), and
                (IST^γ_{N,R})^{−1}(+1) ⊆ (SDU^{γ1,γ2}_{N,R+1})^{−1}(+1).



Indeed, since inputs in (IST^γ_{N,R})^{−1}(−1) define a probability distribution over [R + 1] with support size at
least R, with all probabilities being integer multiples of 1/N = γ^{1/2}/(cR), the statistical distance between
any such distribution and the uniform distribution is at most 1 − R/N = 1 − γ^{1/2}/c. This follows from
the following calculation. Amongst probability distributions (p_1, . . . , p_{R+1}) over [R + 1] with support
size at least R and all probabilities p_i being integer multiples of 1/N, it is not hard to see that one
maximizes the statistical distance from the uniform distribution over [R + 1] by setting p_1 = 1 − (R−1)/N,
p_2 = p_3 = · · · = p_R = 1/N, and p_{R+1} = 0. The statistical distance from uniform is

    (1/2) ( |1 − (R−1)/N − 1/(R+1)| + (R − 1) · |1/N − 1/(R+1)| + 1/(R+1) )
        = 1 − R/N + 1/N − 1/(R+1) ≤ 1 − R/N,

where we have assumed that N ≥ R + 1, which is true for a sufficiently small choice of γ.
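The closed form above is easy to confirm numerically for sample values of R and N (our own sanity check, with N ≥ R + 1):

```python
def dist_from_uniform(p):
    # statistical distance of the vector p from uniform over len(p) items
    m = len(p)
    return 0.5 * sum(abs(pi - 1 / m) for pi in p)

R, N = 10, 16  # illustrative values with N >= R + 1
# the extremal distribution from the text: p_1 = 1 - (R-1)/N, then 1/N's, then 0
p = [1 - (R - 1) / N] + [1 / N] * (R - 1) + [0.0]
assert abs(sum(p) - 1.0) < 1e-12
closed_form = 1 - R / N + 1 / N - 1 / (R + 1)
assert abs(dist_from_uniform(p) - closed_form) < 1e-12
assert dist_from_uniform(p) <= 1 - R / N
```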
     Similarly, since inputs in (IST^γ_{N,R})^{−1}(1) define a probability distribution over [R + 1] with support
size at most γ · R + 1, the statistical distance between any such distribution and the uniform distribution is
at least 1 − 3γ/2. To see this, let p = (p_1, . . . , p_{R+1}) be any distribution with support size at most γ · R + 1,
and let S = {i : p_i = 0} and S̄ be the complement of S. Then the statistical distance of p from uniform is at
least

    (1/2) ∑_{i∈S} |p_i − 1/(R+1)| + (1/2) | ∑_{i∈S̄} p_i − |S̄|/(R+1) |
        = (1/2) ( |S|/(R+1) + ∑_{i∈S̄} p_i − |S̄|/(R+1) )
        = (1/2) ( |S|/(R+1) + 1 − |S̄|/(R+1) )
        ≥ (1/2) ( (1−γ)R/(R+1) + 1 − (γR+1)/(R+1) )
        = (1/2) ( 1 + (R − 2γR − 1)/(R+1) )
        = 1 − (γR+1)/(R+1) = 1 − γ − (1−γ)/(R+1) ≥ 1 − 3γ/2,

where the final inequality holds for R sufficiently large.
    It is an easy consequence of the above that any ε-approximating polynomial of degree d for SDU^{γ1,γ2}_{N,R+1}
implies an approximation to IST^γ_{N,R} of the same degree (i. e., that deg̃_ε(IST^γ_{N,R}) ≤ d). Theorem 6.14 then
follows from Theorem 6.2.

6.4   Lower bound for entropy comparison and approximation
Given a distribution p over [R], the Shannon entropy of p, denoted H(p), is defined to be H(p) :=
∑_{i∈[R]} p_i log_2(1/p_i). Following Goldreich and Vadhan [42], we define a partial function GapCmprEnt^{α,β}_{N,R}
capturing the problem of comparing the entropies of two distributions.
    The function GapCmprEnt^{α,β}_{N,R} takes as input two vectors in [R]^N and interprets each vector i ∈ {1, 2}
as a probability distribution p_i over [R], with p_i(j) = f_{i,j}/N, where f_{i,j} is the frequency of j in the ith
vector. The function GapCmprEnt^{α,β}_{N,R} evaluates to

                                    −1          if H(p_1) − H(p_2) ≤ β,
                                     1          if H(p_1) − H(p_2) ≥ α,
                                     undefined  otherwise.



Theorem 6.15. There exist constants 0 < β < α < 1 such that deg̃(GapCmprEnt^{α,β}_{N,R}) = Ω̃(R^{1/2}). The
same lower bound applies to the quantum query complexity of GapCmprEnt^{α,β}_{N,R}.

Proof. Vadhan [88, Claim 4.4.2 and Remark 4.4.3] showed that as long as H((1 + γ1)/2) < 1 − γ2 − λ,
then SDU^{γ1,γ2}_{N,R} is reducible to GapCmprEnt^{α,β}_{4N,2R} for some constants α, β such that α − β = λ. This
reduction (described next for completeness) implies that deg̃_ε(SDU^{γ1,γ2}_{N,R}) ≤ deg̃_ε(GapCmprEnt^{α,β}_{4N,2R}).
     For completeness, we sketch this transformation, closely following the presentation of Goldreich
and Vadhan [42]. At a high level, the reduction transforms an input in [R]^N to SDU^{γ1,γ2}_{N,R} (interpreted as a
distribution p over [R]) into two distributions p_1, p_2 over [R] × {0,1} as follows. Both p_1 and p_2 start by
sampling an s ∈ {0,1} at random. If s = 0, then a random sample r is chosen from p, and if s = 1, then r
is set to a uniform random sample from [R]. Distribution p_2 outputs (r, s), while p_1 outputs (r, b) for a
random b ∈ {0,1}.
     The entropy of p_1 is always v + 1, where v = (1/2)H(p) + (1/2)log_2(R). As for the entropy of p_2, if p is
far from the uniform distribution, then the selection bit s will be essentially determined by the sample r.
Hence, the entropy of p_2 will be approximately v, which is noticeably smaller than the entropy of p_1. On
the other hand, if p is close to uniform, then (even conditioned on the sample selected) the selection bit s
will be almost random, and so H(p_2) ≈ v + 1, which is approximately the same as H(p_1). Quantitatively,
Vadhan [88] shows that if the statistical distance between p and the uniform distribution is δ, then
1 − H((1 + δ)/2) ≤ H(p_1) − H(p_2) ≤ δ.
     Since we are considering distributions specified as vectors in [R]^N, this transformation can be
equivalently described as follows. Assume for simplicity that R divides N. If p is specified by a vector
u in [R]^N, then p_2 is specified by a vector w in ([R] × {0,1})^{4N} defined as follows. For all i ∈ [N] and
j ∈ {0,1}, w_{i,j} = (u_i, 0), and for j ∈ {2,3}, w_{i,j} = (⌈Ri/N⌉, 1). Similarly, p_1 is specified by a vector
v in ([R] × {0,1})^{4N}: for i ∈ [N] and j ∈ {0,1}, v_{i,j} = (u_i, j), and for j ∈ {2,3}, v_{i,j} = (⌈Ri/N⌉, j − 2).
Observe that when representing u, v, and w as vectors in {−1,1}^{N log_2(R)} or {−1,1}^{4N·log_2(R)}, each bit of
v and w depends on at most one bit of u.
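The vector-level construction can be exercised directly. The sketch below (function names are ours; parameters illustrative) builds v and w from u and checks the qualitative entropy behavior described above: equal entropies when p is uniform, and a large gap when p is a point mass.

```python
from collections import Counter
from math import ceil, log2

def entropy(counts, total):
    # Shannon entropy of the empirical distribution given by counts
    return sum(-(c / total) * log2(c / total) for c in counts.values() if c)

def reduction_vectors(u, R):
    # build v (specifying p1) and w (specifying p2) from u, as in the text
    N = len(u)
    v, w = [], []
    for i in range(1, N + 1):
        ui, mi = u[i - 1], ceil(R * i / N)
        w += [(ui, 0), (ui, 0), (mi, 1), (mi, 1)]
        v += [(ui, 0), (ui, 1), (mi, 0), (mi, 1)]
    return v, w

R = 8
# u uniform: p1 and p2 are both uniform over [R] x {0,1}, so H(p1) = H(p2)
v, w = reduction_vectors(list(range(1, 9)), R)
assert abs(entropy(Counter(v), len(v)) - entropy(Counter(w), len(w))) < 1e-9
# u constant: p is a point mass, far from uniform, so H(p1) - H(p2) is large
v, w = reduction_vectors([1] * 8, R)
gap = entropy(Counter(v), len(v)) - entropy(Counter(w), len(w))
assert gap > 0.5
```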
     Recall that in the statement of Theorem 6.14, γ1 = 1 − 3γ/2 and γ2 = 1 − γ^{1/2}/c, where γ is an arbitrary
constant less than (2/(3c))², and c > 1 is the constant from Theorem 6.2. Clearly, H((1 + γ1)/2) =
H(1 − 3γ/4) = H(3γ/4). Using the fact that for any p ∈ [0, 1/2],

                            H(p) = −p log_2(p) − (1 − p) log_2(1 − p)
                                 ≤ −2p log_2(p)
                                 ≤ 6p^{3/4},

it follows that for some constant γ ≤ 1/(648c⁴), we have H(3γ/4) < γ^{1/2}/c. Hence, Vadhan's reduction
from SDU^{γ1,γ2}_{N,R} to GapCmprEnt^{α,β}_{4N,2R} applies to this setting of γ1 and γ2, and this shows that any degree-d
ε-approximating polynomial for GapCmprEnt^{α,β}_{4N,2R} implies a degree-d ε-approximating polynomial
for SDU^{γ1,γ2}_{N,R}.
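The two inequalities in the entropy bound above can be checked on a grid (a quick numerical verification of ours, not a proof):

```python
from math import log2

def binary_entropy(p):
    # H(p) = -p log2(p) - (1-p) log2(1-p)
    return -p * log2(p) - (1 - p) * log2(1 - p)

# check H(p) <= -2 p log2(p) <= 6 p^{3/4} for p in (0, 1/2]
for k in range(1, 501):
    p = k / 1000
    assert binary_entropy(p) <= -2 * p * log2(p) + 1e-12
    assert -2 * p * log2(p) <= 6 * p ** 0.75 + 1e-12
```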
                                                                 α,β
    Combined with Theorem 6.14, this implies that deg̃(GapCmprEnt^{α,β}_{4N,2R}) = Ω̃(R^{1/2}).

   Clearly, a quantum query algorithm that approximates entropy up to additive error (α − β )/4 can be
                          α,β
used to solve GapCmprEnt4N,2R , by (approximately) computing the entropies of each of the two input


distributions, and determining whether the difference is at most (β + α)/2. Hence, Theorem 6.15 implies
the following lower bound for approximating entropy to additive error α − β .

Corollary 6.16. Let N = c · R for a sufficiently large constant c. Interpret an input in [R]N as a distribution
p in the natural way (i. e., for each j ∈ [R], p j = f j /N, where f j is the number of times j appears in the
input). There is a constant ε > 0 such that any quantum algorithm that approximates the entropy of p up
to additive error ε with probability at least 2/3 requires Ω̃(R1/2 ) queries.


7     Conclusion and open questions
We conclude by briefly describing some additional consequences of our results, as well as a number of
open questions and directions for future work.


7.1    Additional consequences: approximate degree lower bounds for DNFs and AC0
For any constant k > 0, k-distinctness is computed by a DNF of polynomial size. Our Ω̃(n^{3/4−1/(2k)})
bound (Theorem 1.2) is the best known lower bound on the approximate degree of polynomial-size DNF formulas. The
previous best was Ω̃(n^{2/3}) for element distinctness (a.k.a. 2-distinctness) [3], although Bun and Thaler
did establish, for any δ > 0, an Ω(n^{1−δ}) lower bound on the approximate degree of quasipolynomial-size
DNFs.
    Similarly, for any constant k ≥ 1, Bun and Thaler exhibited an AC0 circuit of depth 2k − 1 with
approximate degree Ω̃(n^{1−2^{k−2}/3^{k−1}}). Our techniques can be used to give a polynomial improvement for
any fixed k ≥ 2, to Ω̃(n^{1−2^{−k}}) (Theorem 1.1 is the special case of k = 2, as SURJ is computed by an
AC0 circuit of depth three). We omit further details of this result for brevity.


7.2    Open problems
The most obvious direction for future work is to extend our techniques to determine the approximate
degree and quantum query complexity of additional problems of interest in the study of quantum
algorithms. These include triangle finding problem [53, 59], graph collision [59], and verifying matrix
products [26, 51]. It would also be
                                   interesting  to close
                                                         the gap between our Ω(n3/4−1/(2k) ) lower bound
                                             k+2
for k-distinctness and Belovs’ O n3/4−1/(2 −4 ) upper bound, especially for small values of k (e. g.,
k = 3). Note that Belovs’ upper bound is known to be tight for k equal to 1 and 2 [3].
    Although we prove a lower bound of Ω̃(R^{1/2}) for SDU^{γ1,γ2}_{N,R} for some constants 0 < γ2 < γ1, we leave
open whether or not deg̃(SDU^{2/3,1/3}_{N,R}) = Ω̃(R^{1/2}). It may be tempting to suspect that Theorem 6.14 implies
an Ω̃(R^{1/2}) lower bound on SDU^{2/3,1/3}_{N,R} by invoking the well-known Polarization Lemma of Sahai and
Vadhan [67]. The Polarization Lemma reduces SDU^{γ1,γ2}_{N,R} for any pair of constants γ1, γ2 with γ2 < γ1² to
SDU^{2/3,1/3}_{N′,R′} for an appropriate choice of N′ and R′. Unfortunately, N′ and R′ may be polynomially larger
than N and R, so this reduction does not give an Ω̃(R^{1/2}) lower bound for SDU^{2/3,1/3}_{N,R} itself.


    Another important direction is to determine the approximate degree of specific classes of functions,
especially polynomial size DNF formulae, and AC0 circuits. As mentioned in the previous subsection,
our k-distinctness lower bound (Theorem 1.2) gives the best known lower bound on polynomial size
DNFs. A compelling candidate for improving this lower bound is the k-sum function, which may have
approximate degree as large as Θ(nk/(k+1) ) (it is known that the quantum query complexity of k-sum is
Θ̃(nk/(k+1) ) [8, 18]). On the upper bounds side, it may be possible to extend the techniques underlying
our Õ(n3/4 ) upper bound on the approximate degree of SURJ to yield a sublinear upper bound for every
DNF formula of polynomial size.

Open Problem 7.1. For every constant c > 0 and every DNF formula f : {−1,1}^n → {−1,1} of size at
most n^c, is there a δ > 0 (depending only on c) such that deg̃(f) = O(n^{1−δ})?

    A positive answer to Open Problem 7.1 would have major algorithmic consequences, including a
subexponential time algorithm for agnostically learning DNF formulae [46] (and PAC learning depth
three circuits [49]) of any fixed polynomial size.
    For general AC0 circuits, an Ω(n1−δ ) approximate degree lower bound is already known [33]. It
would be very interesting to improve this lower bound to an optimal Ω(n). Until recently, SURJ was a
prime candidate for exhibiting such a lower bound; however, owing to Sherstov's upper bound [78] and
Theorem 1.1, SURJ is no longer a candidate. Nevertheless, we are optimistic about the following
closely related candidate. An approximate majority function is any total Boolean function that evaluates
to −1 (or +1) whenever at least 2/3 of its inputs are −1 (+1, respectively). It is well-known (via the
probabilistic method) that there are approximate majorities computable by depth 3 circuits of quadratic
size and logarithmic bottom fan-in [4]. It is possible that every approximate majority has approximate
degree Ω(n); proving this would resolve a question of Srinivasan [40].


Acknowledgements
We are grateful to Sasha Sherstov for an inspiring conversation at the BIRS 2017 workshop on Communi-
cation Complexity and Applications, II, which helped to spark this work. We also thank the anonymous
STOC and Theory of Computing reviewers for comments improving the presentation of this manuscript.
    This work was done while M. B. was a postdoctoral researcher at Princeton University. Some of this
work was performed when R. K. was a postdoctoral associate at MIT and was partly supported by NSF
grant CCF-1629809.


References
 [1] S COTT A ARONSON: Impossibility of succinct quantum proofs for collision-freeness. Quantum Inf.
     Comput., 12(1-2):21–28, 2012. [arXiv:1101.0403] 4

 [2] S COTT A ARONSON , S HALEV B EN -DAVID , AND ROBIN KOTHARI: Separations in query
     complexity using cheat sheets. In Proc. 48th STOC, pp. 863–876. ACM Press, 2016.
     [doi:10.1145/2897518.2897644] 5

                      T HEORY OF C OMPUTING, Volume 16 (10), 2020, pp. 1–71                          63
                         M ARK B UN , ROBIN KOTHARI , AND J USTIN T HALER

 [3] Scott Aaronson and Yaoyun Shi: Quantum lower bounds for the collision and the element distinctness problems. J. ACM, 51(4):595–605, 2004. Preliminary version in STOC'02. [doi:10.1145/1008731.1008735] 4, 5, 6, 7, 8, 10, 62

 [4] Miklós Ajtai: Σ¹₁-formulae on finite structures. Ann. Pure Appl. Logic, 24(1):1–48, 1983. [doi:10.1016/0168-0072(83)90038-6] 63

 [5] Andris Ambainis: Quantum lower bounds by quantum arguments. J. Comput. System Sci., 64(4):750–767, 2002. Preliminary version in STOC'00. [doi:10.1006/jcss.2002.1826] 4

 [6] Andris Ambainis: Polynomial degree and lower bounds in quantum complexity: Collision and element distinctness with small range. Theory of Computing, 1(3):37–46, 2005. [doi:10.4086/toc.2005.v001a003] 18

 [7] Andris Ambainis: Polynomial degree vs. quantum query complexity. J. Comput. System Sci., 72(2):220–238, 2006. Preliminary version in FOCS'03. [doi:10.1016/j.jcss.2005.06.006] 5

 [8] Andris Ambainis: Quantum walk algorithm for element distinctness. SIAM J. Comput., 37(1):210–239, 2007. Preliminary version in FOCS'04. [doi:10.1137/S0097539705447311] 7, 63

 [9] Andris Ambainis, Aleksandrs Belovs, Oded Regev, and Ronald de Wolf: Efficient quantum algorithms for (gapped) group testing and junta testing. In Proc. 27th Ann. ACM-SIAM Symp. on Discrete Algorithms (SODA'16), pp. 903–922. ACM Press, 2016. [doi:10.1137/1.9781611974331.ch65] 5, 8, 13, 58

[10] Andris Ambainis, Andrew M. Childs, Ben Reichardt, Robert Špalek, and Shengyu Zhang: Any AND-OR formula of size N can be evaluated in time N^{1/2+o(1)} on a quantum computer. SIAM J. Comput., 39(6):2513–2530, 2010. Preliminary version in FOCS'07. [doi:10.1137/080712167] 4

[11] Alp Atici and Rocco A. Servedio: Quantum algorithms for learning and testing juntas. Quantum Information Processing, 6(5):323–348, 2007. [doi:10.1007/s11128-007-0061-6] 8

[12] Howard Barnum, Michael Saks, and Mario Szegedy: Quantum query complexity and semi-definite programming. In Proc. 18th IEEE Conf. on Computational Complexity (CCC'03), pp. 179–193. IEEE Comp. Soc., 2003. [doi:10.1109/CCC.2003.1214419] 4

[13] Robert Beals, Harry Buhrman, Richard Cleve, Michele Mosca, and Ronald de Wolf: Quantum lower bounds by polynomials. J. ACM, 48(4):778–797, 2001. Preliminary version in FOCS'98. [doi:10.1145/502090.502097] 4, 5, 53

[14] Paul Beame and Widad Machmouchi: The quantum query complexity of AC0. Quantum Inf. Comput., 12(7–8):670–676, 2012. Link at ACM DL. [arXiv:1008.2422] 5, 7

[15] Richard Beigel: Perceptrons, PP, and the polynomial hierarchy. Comput. Complexity, 4:339–349, 1994. Preliminary version in SCT'92. [doi:10.1007/BF01263422] 4


[16] Aleksandrs Belovs: Learning-graph-based quantum algorithm for k-distinctness. In Proc. 53rd FOCS, pp. 207–216. IEEE Comp. Soc., 2012. [doi:10.1109/FOCS.2012.18] 5, 7, 12

[17] Aleksandrs Belovs: Span programs for functions with constant-sized 1-certificates. In Proc. 44th STOC, pp. 77–84. ACM Press, 2012. [doi:10.1145/2213977.2213985] 7

[18] Aleksandrs Belovs and Robert Špalek: Adversary lower bound for the k-sum problem. In Proc. 4th Symp. Innovations in Theoretical Computer Science (ITCS'13), pp. 323–328. ACM Press, 2013. [doi:10.1145/2422436.2422474] 63

[19] Eric Blais: Testing juntas nearly optimally. In Proc. 41st STOC, pp. 151–158. ACM Press, 2009. [doi:10.1145/1536414.1536437] 8

[20] Andrej Bogdanov, Yuval Ishai, Emanuele Viola, and Christopher Williamson: Bounded indistinguishability and the complexity of recovering secrets. In Proc. 36th Annual Intern. Cryptology Conf. (CRYPTO'16), pp. 593–618, 2016. [doi:10.1007/978-3-662-53015-3_21] 4

[21] Adam Bouland, Lijie Chen, Dhiraj Holden, Justin Thaler, and Prashant Nalini Vasudevan: On the power of statistical zero knowledge. SIAM J. Comput., 49(4):1–58, 2019. Preliminary version in FOCS'17. [doi:10.1137/17M1161749] 4, 9, 10, 55, 56

[22] Gilles Brassard, Peter Høyer, and Alain Tapp: Quantum counting. In Proc. 25th Internat. Colloq. on Automata, Languages and Programming (ICALP'98), pp. 820–831. Springer, 1998. [doi:10.1007/BFb0055105] 27

[23] Sergey Bravyi, Aram Wettroth Harrow, and Avinatan Hassidim: Quantum algorithms for testing properties of distributions. IEEE Trans. Inform. Theory, 57(6):3971–3981, 2011. Preliminary version in STACS'10. [doi:10.1109/TIT.2011.2134250] 5, 8

[24] Harry Buhrman, Richard Cleve, Ronald de Wolf, and Christof Zalka: Bounds for small-error and zero-error quantum algorithms. In Proc. 40th FOCS, pp. 358–368. IEEE Comp. Soc., 1999. [doi:10.1109/SFFCS.1999.814607] 5

[25] Harry Buhrman, Ilan Newman, Hein Röhrig, and Ronald de Wolf: Robust polynomials and quantum algorithms. Theory Comput. Syst., 40(4):379–395, 2007. Preliminary version in STACS'05. [doi:10.1007/s00224-006-1313-z] 30, 31

[26] Harry Buhrman and Robert Špalek: Quantum verification of matrix products. In Proc. 17th Ann. ACM-SIAM Symp. on Discrete Algorithms (SODA'06), pp. 880–889. ACM Press, 2006. Link at ACM DL. 62

[27] Harry Buhrman, Nikolai K. Vereshchagin, and Ronald de Wolf: On computation and communication with small bias. In Proc. 22nd IEEE Conf. on Computational Complexity (CCC'07), pp. 24–32. IEEE Comp. Soc., 2007. [doi:10.1109/CCC.2007.18] 4

[28] Mark Bun, Robin Kothari, and Justin Thaler: The polynomial method strikes back: Tight quantum query bounds via dual polynomials. In Proc. 50th STOC, pp. 297–310. ACM Press, 2018. [doi:10.1145/3188745.3188784] 1


[29] Mark Bun and Justin Thaler: Dual lower bounds for approximate degree and Markov-Bernstein inequalities. Inform. Comput., 243:2–25, 2015. Preliminary version in ICALP'13. [doi:10.1016/j.ic.2014.12.003] 9, 10, 11, 42, 48

[30] Mark Bun and Justin Thaler: Hardness amplification and the approximate degree of constant-depth circuits. In Proc. 42nd Internat. Colloq. on Automata, Languages and Programming (ICALP'15), pp. 268–280. Springer, 2015. [doi:10.1007/978-3-662-47672-7_22] 9, 13, 48, 55, 56

[31] Mark Bun and Justin Thaler: Dual polynomials for collision and element distinctness. Theory of Computing, 12(16):1–34, 2016. [doi:10.4086/toc.2016.v012a016] 14

[32] Mark Bun and Justin Thaler: Improved bounds on the sign-rank of AC0. In Proc. 43rd Internat. Colloq. on Automata, Languages and Programming (ICALP'16), pp. 37:1–37:14. Schloss Dagstuhl–Leibniz-Zentr. fuer Informatik, 2016. [doi:10.4230/LIPIcs.ICALP.2016.37] 10

[33] Mark Bun and Justin Thaler: A nearly optimal lower bound on the approximate degree of AC0. In Proc. 58th FOCS, pp. 1–12. IEEE Comp. Soc., 2017. [doi:10.1109/FOCS.2017.10] 7, 10, 11, 12, 13, 18, 19, 20, 21, 22, 37, 38, 54, 63

[34] Mark Bun and Justin Thaler: The large-error approximate degree of AC0. In Proc. 23rd Internat. Workshop on Randomization and Computation (RANDOM'19), pp. 55:1–55:16. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, 2019. [doi:10.4230/LIPIcs.APPROX-RANDOM.2019.55] 53

[35] Karthekeyan Chandrasekaran, Justin Thaler, Jonathan Ullman, and Andrew Wan: Faster private release of marginals on small databases. In Proc. 5th Symp. Innovations in Theoretical Computer Science (ITCS'14), pp. 387–402. ACM Press, 2014. [doi:10.1145/2554797.2554833] 4

[36] Arkadev Chattopadhyay and Anil Ada: Multiparty communication complexity of disjointness. Electron. Colloq. Comput. Complexity, TR08-002:1–15, 2008. [ECCC] 4, 9

[37] E. W. Cheney: Introduction to Approximation Theory. AMS Chelsea Pub., 1982. 30

[38] Matei David and Toniann Pitassi: Separating NOF communication complexity classes RP and NP. Electron. Colloq. Comput. Complexity, TR08-014:1–11, 2008. [ECCC] 4

[39] Matei David, Toniann Pitassi, and Emanuele Viola: Improved separations between nondeterministic and randomized multiparty communication. ACM Trans. Comput. Theory, 1(2):5:1–5:20, 2009. Preliminary version in RANDOM'08. [doi:10.1145/1595391.1595392] 4

[40] Yuval Filmus, Hamed Hatami, Steven Heilman, Elchanan Mossel, Ryan O'Donnell, Sushant Sachdeva, Andrew Wan, and Karl Wimmer: Real analysis in computer science: A collection of open problems, 2014. Preprint at the Simons Institute. 63

[41] Dmitry Gavinsky and Alexander A. Sherstov: A separation of NP and coNP in multiparty communication complexity. Theory of Computing, 6(10):227–245, 2010. [doi:10.4086/toc.2010.v006a010] 4, 57


[42] Oded Goldreich and Salil P. Vadhan: On the complexity of computational problems regarding distributions (a survey). In Oded Goldreich, editor, Studies in Complexity and Cryptography, volume 6650 of LNCS, pp. 390–405. Springer, 2011. [doi:10.1007/978-3-642-22670-0_27, ECCC:TR11-004] 60, 61

[43] Ronald L. Graham, Donald E. Knuth, and Oren Patashnik: Concrete Mathematics: A Foundation for Computer Science. Addison–Wesley, 2nd edition, 1994. 39

[44] Peter Høyer, Troy Lee, and Robert Špalek: Negative weights make adversaries stronger. In Proc. 39th STOC, pp. 526–535. ACM Press, 2007. [doi:10.1145/1250790.1250867] 4, 5

[45] Jeff Kahn, Nathan Linial, and Alex Samorodnitsky: Inclusion-exclusion: Exact and approximate. Combinatorica, 16(4):465–477, 1996. [doi:10.1007/BF01271266] 4

[46] Adam Tauman Kalai, Adam R. Klivans, Yishay Mansour, and Rocco A. Servedio: Agnostically learning halfspaces. SIAM J. Comput., 37(6):1777–1805, 2008. Preliminary version in FOCS'05. [doi:10.1137/060649057] 4, 63

[47] Varun Kanade and Justin Thaler: Distribution-independent reliable learning. In Proc. 27th Ann. Conf. on Learning Theory (COLT'14), pp. 3–24, 2014. MLR Press. [arXiv:1402.5164] 4

[48] Hartmut Klauck, Robert Špalek, and Ronald de Wolf: Quantum and classical strong direct product theorems and optimal time-space tradeoffs. SIAM J. Comput., 36(5):1472–1493, 2007. Preliminary version in FOCS'04. [doi:10.1137/05063235X] 5

[49] Adam R. Klivans and Rocco A. Servedio: Learning DNF in time 2^{Õ(n^{1/3})}. J. Comput. System Sci., 68(2):303–318, 2004. Prelim. version in STOC'01. [doi:10.1016/j.jcss.2003.07.007] 4, 63

[50] Adam R. Klivans and Rocco A. Servedio: Toward attribute efficient learning of decision lists and parities. J. Mach. Learn. Res., 7:587–602, 2006. JMLR, prelim. version in COLT'04. 4

[51] Robin Kothari and Ashwin Nayak: Quantum algorithms for matrix multiplication and product verification. In Encyclopedia of Algorithms, pp. 1673–1677. Springer, 2016. [doi:10.1007/978-1-4939-2864-4_303] 62

[52] Sophie Laplante and Frédéric Magniez: Lower bounds for randomized and quantum query complexity using Kolmogorov arguments. SIAM J. Comput., 38(1):46–62, 2008. Preliminary version in CCC'04. [doi:10.1137/050639090] 4

[53] François Le Gall: Improved quantum algorithm for triangle finding via combinatorial arguments. In Proc. 55th FOCS, pp. 216–225. IEEE Comp. Soc., 2014. [doi:10.1109/FOCS.2014.31] 62

[54] Troy Lee: A note on the sign degree of formulas, 2009. [arXiv:0909.4607] 11, 19

[55] Troy Lee, Rajat Mittal, Ben W. Reichardt, Robert Špalek, and Mario Szegedy: Quantum query complexity of state conversion. In Proc. 52nd FOCS, pp. 344–353. IEEE Comp. Soc., 2011. [doi:10.1109/FOCS.2011.75] 4, 5


[56] Troy Lee and Adi Shraibman: An approximation algorithm for approximation rank. In Proc. 24th IEEE Conf. on Computational Complexity (CCC'09), pp. 351–357. IEEE Comp. Soc., 2009. [doi:10.1109/CCC.2009.25] 4

[57] Troy Lee and Adi Shraibman: Disjointness is hard in the multiparty number-on-the-forehead model. Comput. Complexity, 18(2):309–336, 2009. Preliminary version in CCC'08. [doi:10.1007/s00037-009-0276-2] 9

[58] Tongyang Li and Xiaodi Wu: Quantum query complexity of entropy estimation. IEEE Trans. Inform. Theory, 65(5):2899–2921, 2018. [doi:10.1109/TIT.2018.2883306] 5, 7, 9

[59] Frédéric Magniez, Miklos Santha, and Mario Szegedy: Quantum algorithms for the triangle problem. SIAM J. Comput., 37(2):413–424, 2007. Preliminary version in SODA'05. [doi:10.1137/050643684] 62

[60] Marvin Minsky and Seymour Papert: Perceptrons – An Introduction to Computational Geometry. MIT Press, 1969. [doi:10.7551/mitpress/11301.001.0001] 4, 9, 16

[61] Ashley Montanaro: The quantum complexity of approximating the frequency moments. Quantum Inf. Comput., 16(13–14):1169–1190, 2016. Link at ACM DL. [arXiv:1505.00113] 7

[62] Noam Nisan and Mario Szegedy: On the degree of Boolean functions as real polynomials. Comput. Complexity, 4(4):301–313, 1994. Preliminary version in STOC'92. [doi:10.1007/BF01263419] 15, 26

[63] Ryan O'Donnell and Rocco A. Servedio: New degree bounds for polynomial threshold functions. Combinatorica, 30(3):327–358, 2010. Preliminary version in STOC'03. [doi:10.1007/s00493-010-2173-3] 4, 39

[64] Anup Rao and Amir Yehudayoff: Simplified lower bounds on the multiparty communication complexity of disjointness. In Proc. 30th IEEE Conf. on Computational Complexity (CCC'15), pp. 88–101. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, 2015. [doi:10.4230/LIPIcs.CCC.2015.88] 4

[65] Alexander A. Razborov and Alexander A. Sherstov: The sign-rank of AC0. SIAM J. Comput., 39(5):1833–1855, 2010. Prelim. version in FOCS'08. [doi:10.1137/080744037] 11, 22

[66] Ben W. Reichardt: Reflections for quantum query algorithms. In Proc. 22nd Ann. ACM-SIAM Symp. on Discrete Algorithms (SODA'11), pp. 560–569. ACM Press, 2011. [doi:10.1137/1.9781611973082.44] 4, 5

[67] Amit Sahai and Salil Vadhan: A complete problem for statistical zero knowledge. J. ACM, 50(2):196–249, 2003. Preliminary version in FOCS'97. [doi:10.1145/636865.636868] 62

[68] Rocco A. Servedio, Li-Yang Tan, and Justin Thaler: Attribute-efficient learning and weight–degree tradeoffs for polynomial threshold functions. In Proc. 25th Ann. Conf. on Learning Theory (COLT'12), pp. 14:1–14:19, 2012. MLR Press. 4


[69] Alexander A. Sherstov: Communication lower bounds using dual polynomials. Bull. EATCS, 95:59–93, 2008. [ECCC:TR08-057] 4

[70] Alexander A. Sherstov: Approximate inclusion-exclusion for arbitrary symmetric functions. Comput. Complexity, 18(2):219–247, 2009. Preliminary version in CCC'08. [doi:10.1007/s00037-009-0274-4] 4

[71] Alexander A. Sherstov: Separating AC0 from depth-2 majority circuits. SIAM J. Comput., 38(6):2113–2129, 2009. Preliminary version in STOC'07. [doi:10.1137/08071421X] 4, 9

[72] Alexander A. Sherstov: The pattern matrix method. SIAM J. Comput., 40(6):1969–2000, 2011. Preliminary version in STOC'08. [doi:10.1137/080733644] 4, 9, 15

[73] Alexander A. Sherstov: Approximating the AND-OR tree. Theory of Computing, 9(20):653–663, 2013. [doi:10.4086/toc.2013.v009a020] 9, 11, 42, 48

[74] Alexander A. Sherstov: Communication lower bounds using directional derivatives. In Proc. 45th STOC, pp. 921–930. ACM Press, 2013. [doi:10.1145/2488608.2488725] 4

[75] Alexander A. Sherstov: The intersection of two halfspaces has high threshold degree. SIAM J. Comput., 42(6):2329–2374, 2013. [doi:10.1137/100785260] 9, 11, 19, 49

[76] Alexander A. Sherstov: Making polynomials robust to noise. Theory of Computing, 9(18):593–615, 2013. Preliminary version in STOC'12. [doi:10.4086/toc.2013.v009a018] 10

[77] Alexander A. Sherstov: The multiparty communication complexity of set disjointness. SIAM J. Comput., 45(4):1450–1489, 2016. Preliminary version in STOC'12. [doi:10.1137/120891587] 4

[78] Alexander A. Sherstov: Algorithmic polynomials. In Proc. 50th STOC, pp. 311–324. ACM Press, 2018. Journal version to appear in SIAM J. Comput. [doi:10.1145/3188745.3188958] 5, 6, 7, 63

[79] Alexander A. Sherstov: Breaking the Minsky–Papert barrier for constant-depth circuits. SIAM J. Comput., 47(5):1809–1857, 2018. Preliminary version in STOC'14. [doi:10.1137/15M1015704] 9, 10, 55

[80] Alexander A. Sherstov: The power of asymmetry in constant-depth circuits. SIAM J. Comput., 47(6):2362–2434, 2018. Preliminary version in FOCS'15. [doi:10.1137/16M1064477] 7, 9, 10

[81] Yaoyun Shi and Yufan Zhu: Quantum communication complexity of block-composed functions. Quantum Inf. Comput., 9(5):444–460, 2009. [arXiv:0710.0095] 9, 11, 19

[82] Robert Špalek: A dual polynomial for OR, 2008. [arXiv:0803.4516] 9, 38

[83] Robert Špalek and Mario Szegedy: All quantum adversary methods are equivalent. Theory of Computing, 2(1):1–18, 2006. Preliminary version in ICALP'05. [doi:10.4086/toc.2006.v002a001] 5

[84] Avishay Tal: Shrinkage of De Morgan formulae by spectral techniques. In Proc. 55th FOCS, pp. 551–560. IEEE Comp. Soc., 2014. [doi:10.1109/FOCS.2014.65] 4


[85] Avishay Tal: Formula lower bounds via the quantum method. In Proc. 49th STOC, pp. 1256–1268. ACM Press, 2017. [doi:10.1145/3055399.3055472] 4

[86] Justin Thaler: Lower bounds for the approximate degree of block-composed functions. In Proc. 43rd Internat. Colloq. on Automata, Languages and Programming (ICALP'16), pp. 17:1–17:15. Schloss Dagstuhl–Leibniz-Zentr. fuer Informatik, 2016. [doi:10.4230/LIPIcs.ICALP.2016.17] 9, 10

[87] Justin Thaler, Jonathan Ullman, and Salil P. Vadhan: Faster algorithms for privately releasing marginals. In Proc. 39th Internat. Colloq. on Automata, Languages and Programming (ICALP'12), pp. 810–821. Springer, 2012. [doi:10.1007/978-3-642-31594-7_68] 4

[88] Salil Pravin Vadhan: A Study of Statistical Zero-knowledge Proofs. Ph. D. thesis, Massachusetts Institute of Technology, 1999. Link at Springer. 13, 61

[89] Gregory Valiant and Paul Valiant: Estimating the unseen: An n/log(n)-sample estimator for entropy and support size, shown optimal via new CLTs. In Proc. 43rd STOC, pp. 685–694. ACM Press, 2011. [doi:10.1145/1993636.1993727] 8, 9

[90] Emanuele Viola: Special Topics in Complexity Theory. Lecture notes, Northeastern Univ., 2017. ECCC. 22

[91] Mark Zhandry: A note on the quantum collision and set equality problems. Quantum Inf. Comput., 15(7&8):557–567, 2015. Link at ACM DL. [arXiv:1312.1027] 4

[92] Shengyu Zhang: On the power of Ambainis lower bounds. Theoret. Comput. Sci., 339(2):241–256, 2005. Preliminary version in ICALP'04. [doi:10.1016/j.tcs.2005.01.019] 4, 5


AUTHORS

      Mark Bun
      Assistant professor
      Department of Computer Science
      Boston University
      Boston, MA, USA
      mbun bu edu
      http://cs-people.bu.edu/mbun/


      Robin Kothari
      Senior researcher
      Microsoft Quantum and Microsoft Research
      Redmond, WA, USA
      robin kothari microsoft com
      http://www.robinkothari.com




    Justin Thaler
    Assistant professor
    Department of Computer Science
    Georgetown University
    Washington, DC, USA
    justin thaler georgetown edu
    http://people.cs.georgetown.edu/jthaler/


ABOUT THE AUTHORS

    Mark Bun is an Assistant Professor in the Department of Computer Science at Boston
       University. He completed his Ph. D. in the Theory of Computation group at Harvard
       University, where he was advised by Salil Vadhan. After that, he was a postdoctoral
       researcher at Princeton University and a Google Research Fellow at the Simons Institute.
       His research interests include computational complexity, differential privacy,
       cryptozoology, and learning theory. As a native Seattleite, he enjoys coffee, rain,
       composting, 4-way stop signs, hiking, cheese rolling, and competitive duck herding.


    ROBIN KOTHARI received his Ph. D. in Computer Science from the University of Waterloo
      in 2014, under the supervision of Andrew Childs and John Watrous. He was then a
      postdoctoral associate at the Center for Theoretical Physics at the Massachusetts Institute
      of Technology for three years. He is currently a Senior Researcher at Microsoft Quantum
      and Microsoft Research. His research interests include quantum algorithms, quantum
      complexity theory, query complexity, and communication complexity.


    Justin Thaler is an Assistant Professor in the Department of Computer Science at
        Georgetown University. He completed his Ph. D. in the Theory of Computation group at
        Harvard University, where he was advised by Michael Mitzenmacher. After that, he was
        a postdoctoral researcher at the Simons Institute, and a Research Scientist at Yahoo Labs
        in New York City. His research interests include computational complexity, verifiable
        computing, and algorithms for massive datasets. In his spare time, he enjoys running and
        playing low-quality tennis.



