Theory of Computing, Volume 9 (18), 2013, pp. 593–615
www.theoryofcomputing.org
Special Issue: Analysis of Boolean Functions
Making Polynomials Robust to Noise∗
Alexander A. Sherstov†
Received July 1, 2012; Revised May 14, 2013; Published June 11, 2013
Abstract: A basic question in any model of computation is how to reliably compute a given
function when its inputs are subject to noise. Buhrman, Newman, Röhrig, and de Wolf (2003)
posed the noisy computation problem for real polynomials. We give a complete solution
to this problem. For any δ > 0 and any polynomial p : {−1,+1}^n → [−1, 1], we construct a
corresponding polynomial p_robust : R^n → R of degree O(deg p + log 1/δ) that is robust to
noise in the inputs: |p(x) − p_robust(x + ε)| < δ for all x ∈ {−1,+1}^n and all ε ∈ [−1/3, 1/3]^n.
This result is optimal with respect to all parameters. We construct p_robust explicitly for each p.
Previously, it was open to give such a construction even for p = x1 ⊕ x2 ⊕ · · · ⊕ xn (Buhrman
et al., 2003). The proof contributes a technique of independent interest, which allows one to
force partial cancellation of error terms in a polynomial.
ACM Classification: F.0, F.1.3
AMS Classification: 68Q05, 68Q87
Key words and phrases: computation with noise, polynomial approximation, robust polynomials, real
polynomials on the Boolean hypercube
1 Introduction
Noise is a well-studied phenomenon in the computing literature. It arises naturally in several ways. Most
obviously, the input to a computation can be noisy due to imprecise measurement or human error. In
addition, both the input and the intermediate results of a computation can be corrupted to some extent by
∗ An extended abstract of this article appeared in the Proceedings of the Forty-Fourth Annual ACM Symposium on Theory of
Computing (STOC’12), pages 747–758, 2012.
† Supported by NSF CAREER award CCF-1149018.
© 2013 Alexander A. Sherstov
Licensed under a Creative Commons Attribution License (CC-BY)     DOI: 10.4086/toc.2013.v009a018
a malicious third party. Finally, even in a setting with correct input and no third-party interference, errors
can be introduced by using a randomized algorithm as a subroutine in the computation. In all these settings,
one would like to compute the correct answer with high probability despite the presence of noise. A
matter of both theoretical and practical interest is how many additional resources are necessary to combat
the noise. Research has shown that the answer depends crucially on the computational model in question.
Models studied in this context include decision trees [21, 43, 19, 17, 38], circuits [40, 22, 18, 30, 52, 53],
broadcast networks [23, 36, 20, 38, 25, 14, 15], and communication protocols [45, 46, 7, 24]. Some
computational models exhibit a surprising degree of robustness to noise, in that one can compute the
correct answer with probability 99% with only a constant-factor increase in cost relative to the noise-free
setting. In other models, even the most benign forms of noise increase the computational complexity by a
superconstant factor.
In most cases, one can overcome the noise by brute force, with a logarithmic-factor increase in
computational complexity. In a noisy decision tree, for example, one can repeat each query a logarithmic
number of times and use the majority answer. Assuming independent corruption of the queries, this
strategy results in a correct computation with high probability. Similarly, in a noisy broadcast network
one can repeat each broadcast a logarithmic number of times and take the majority of the received bits.
It may seem, then, that noise is an issue of minor numerical interest. This impression is incorrect on
several counts. First, in some settings such as communication protocols [45, 46, 7, 24], it is nontrivial to
perform any computation at all in the presence of noise. Second, even a logarithmic-factor increase in
complexity can be too costly for some applications [42]. Third and most important, the question at hand
is a qualitative one: is it possible to arrange the steps in a computation so as to cause the intermediate
errors to almost always cancel? This study frequently reveals aspects of the model that would otherwise
be overlooked.
Our problem
The computational model of interest to us is the real polynomial. In this model, the complexity measure of
a Boolean function f : {−1,+1}^n → {−1,+1} is the least degree of a real polynomial that approximates
f pointwise. Formally, the approximate degree of f, denoted $\widetilde{\deg}(f)$, is the least degree of a real
polynomial p with |f(x) − p(x)| ≤ 1/3 for every x ∈ {−1,+1}^n. The constant 1/3 is chosen for aesthetic
reasons and can be replaced by any other in (0, 1) without changing the model. The approximate degree
of Boolean functions has been studied for over forty years and has enabled spectacular progress in circuit
complexity [39, 51, 6, 3, 34, 35, 49, 5], quantum query complexity [4, 9, 2, 1, 29], communication
complexity [12, 41, 11, 42, 47], and computational learning theory [54, 31, 28, 32, 33, 48, 50].
The contribution of this paper is to answer a question posed ten years ago by Buhrman et al. [10].
These authors asked whether real polynomials, as a computational model, are robust to noise. Robustness
to noise becomes an issue when one wants to do anything nontrivial with approximating polynomials,
e.g., compose them. To use a motivating example from [10], suppose that we have approximating
polynomials p and q for Boolean functions f : {−1, +1}n → {−1, +1} and g : {−1, +1}m → {−1, +1},
respectively. Having these two polynomials gives us no way whatsoever to approximate the composed
function f (g, g, . . . , g) on nm variables. In particular, the natural construction p(q, q, . . . , q) does not work
because q can range anywhere in [−4/3, −2/3] ∪ [2/3, 4/3] and the behavior of p on non-Boolean inputs
can be arbitrary. In other words, the problem is that the output of q is inherently noisy, and the original
polynomial p is not designed to handle that noise. What we need is a robust approximating polynomial
for f , to use the term introduced by Buhrman et al. [10]. Formally, a robust approximating polynomial
for f is a real polynomial p_robust : R^n → R such that
$$|f(x) - p_{\mathrm{robust}}(x + \varepsilon)| < \frac{1}{3}$$
for every x ∈ {−1,+1}^n and every ε ∈ [−1/3, 1/3]^n. Thus, a robust polynomial approximates f not only
on Boolean inputs but also on the larger domain [−4/3, −2/3] ∪ [2/3, 4/3]. Robust polynomials compose
in the natural way: to use the notations of this paragraph, the polynomial p_robust(q, q, . . . , q) is a valid
approximating polynomial for f (g, g, . . . , g).
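To see concretely how fragile a non-robust representation is, consider parity: on {−1,+1}^n it is represented exactly by the monomial x_1 x_2 · · · x_n, yet that representation is useless on noisy inputs. The following quick illustration is ours, not part of the paper:

```python
# Parity is represented exactly by p(x) = x1*x2*...*xn on {-1,+1}^n,
# but this p is not robust: feed it the noisy encoding 4/3 of each +1 bit.
n = 20
x = [4/3] * n
p = 1.0
for xi in x:
    p *= xi
# The correct value is 1, but p(x) = (4/3)^20, off by a huge factor.
assert p > 100
```
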
The obvious question is whether robustness comes at a cost. Ideally, one would like to make an
approximating polynomial robust with only a constant-factor increase in degree, so that every Boolean
function f would have a robust polynomial of degree $\Theta(\widetilde{\deg}(f))$. Analogous to noisy decision trees
and broadcast networks, a fairly direct calculation shows that every Boolean function f : {−1,+1}^n →
{−1,+1} has a robust polynomial of degree $O(\widetilde{\deg}(f) \log n)$. Buhrman et al. [10] improved this bound
to $\min\{O(n),\; O(\widetilde{\deg}(f)\log\widetilde{\deg}(f))\}$ using combinatorial arguments and quantum query complexity.
In particular, the work of Buhrman et al. shows that parity, majority, and random functions—all of
which have approximate degree Θ(n)—also have robust approximating polynomials of degree Θ(n). The
authors of [10] asked whether every Boolean function f has a robust approximating polynomial of degree
$\Theta(\widetilde{\deg}(f))$.
Our result
We give a complete solution to the problem of Buhrman et al. [10]. To be precise, we study a more
general question. Buhrman et al. [10] asked whether a polynomial p can be made robust with only a
constant-factor increase in degree, provided that p approximates a Boolean function. We prove that
every polynomial p : {−1, +1}n → [−1, 1] can be made robust, regardless of whether p approximates a
Boolean function.
Theorem 1.1. Let p : {−1,+1}^n → [−1, 1] be a given polynomial. Then for every δ > 0, there is a
polynomial p_robust : R^n → R of degree O(deg p + log 1/δ) such that
$$|p(x) - p_{\mathrm{robust}}(x + \varepsilon)| < \delta$$
for all x ∈ {−1,+1}^n and ε ∈ [−1/3, 1/3]^n. Furthermore, p_robust has an explicit, closed-form description.
Theorem 1.1 shows that real polynomials are robust to noise. In this regard, they behave differently
from other computational models such as decision trees [21] and broadcast networks [25], where handling
noise provably increases the computational complexity by a superconstant factor. In fact, Theorem 1.1
reveals a very high degree of robustness: the degree of a δ -error robust polynomial grows additively rather
than multiplicatively with the error parameter δ , and the actual dependence on δ is logarithmic. This
dependence on δ is best possible; see Remark 6.3 for details. Theorem 1.1 has the following consequence,
which the reader may find counterintuitive: high-degree polynomials are more easily made robust than
low-degree polynomials, in the sense that a degree-d polynomial can be made robust within error $2^{-\Theta(d)}$
with only a constant-factor increase in degree.
A final point of interest is that Theorem 1.1 gives an explicit, formulaic construction of a robust
polynomial p_robust in terms of the original polynomial p. Prior to this work, no explicit robust construction
was known even for the parity polynomial p(x) = x1 x2 · · · xn . To quote Buhrman et al. [10], “We are not
aware of a direct ‘closed form’ or other natural way to describe a robust degree-O(n) polynomial for the
parity of n bits, but can only infer its existence from the existence of a robust quantum algorithm. Given
the simplicity of the non-robust representing polynomial for parity, one would hope for a simple closed
form for robust polynomials for parity as well.”
As a consequence of Theorem 1.1, we conclude that the approximate degree behaves nicely under
function composition:
Corollary. For all Boolean functions f and g,
$$\widetilde{\deg}(f(g, g, \dots, g)) = O\big(\widetilde{\deg}(f)\,\widetilde{\deg}(g)\big)\,.$$
Prior to this paper, this conclusion was known to hold only for several special functions, e.g., [8, 27, 10],
and required quantum query arguments.
Our techniques
We will now overview the techniques of previous work and contrast them with the approach of this
paper. Using a remarkable quantum query argument, Buhrman et al. [10] gave a robust approximating
polynomial of degree $O(\widetilde{\deg}(f))$ for every symmetric Boolean function f as well as every Boolean
function f with approximate degree Θ(n). Unfortunately, the quantum argument does not seem to
generalize beyond these two important cases. With an unrelated, combinatorial argument, the authors
of [10] obtained an upper bound of $O(\widetilde{\deg}(f)\log\widetilde{\deg}(f))$ on the degree of a robust polynomial for
any given Boolean function f . This combinatorial argument also seems to be of no use in proving
Theorem 1.1. For one thing, it is unclear how to save a logarithmic factor in the combinatorial analysis,
and more fundamentally, the combinatorial argument only works for approximating Boolean functions
rather than arbitrary real functions {−1, +1}n → [−1, 1].
We approach the problem of robust approximation differently, with a direct analytic treatment rather
than combinatorics or quantum query complexity. Our solution comprises three steps, corresponding to
functions of increasing generality:
(i) robust approximation of the parity polynomial, p(x) = x1 x2 · · · xn ;
(ii) robust approximation of homogeneous polynomials, $p(x) = \sum_{|S|=d} a_S \prod_{i\in S} x_i$;
(iii) robust approximation of arbitrary polynomials.
For step (i), we construct an exact representation of the sign function on the domain [−4/3, −2/3] ∪
[2/3, 4/3] as an analytic series whose coefficients decrease exponentially with degree. Multiplying n such
series, we show that the resulting coefficients decay rapidly enough to allow truncation at degree O(n).
For step (iii), we write a general polynomial p : {−1, +1}n → [−1, 1] as the sum of its homogeneous
parts p = p0 + p1 + p2 + · · · + pd , where d is the degree of p. Using approximation theory and a convexity
argument, we show that $\|p_i\|_\infty \le 2^{O(d)}$ for all i. For our purposes, this means that a robust polynomial
for p can be obtained by summing robust polynomials for all p_i with sufficiently small error, $2^{-\Omega(d)}$.
Obtaining such a polynomial for each pi is the content of step (ii).
Step (ii) is the most difficult part of the proof. A natural approach to the robust approximation of a
homogeneous polynomial p is to robustly approximate every monomial in p to within a suitable error δ ,
using the construction from step (i). Since we want the robust polynomial for p to have degree O(d), the
smallest setting that we can afford is δ = $2^{-\Theta(d)}$. Unfortunately, there is no reason to believe that with this
δ , the proposed robust polynomial will have small error in approximating p. As a matter of fact, a direct
calculation even suggests that this approach is doomed: it is straightforward to verify that a homogeneous
polynomial p : {−1,+1}^n → [−1, 1] of degree d can have $\binom{n}{d}$ monomials, each equal to $\pm\,2^{-n}\binom{n}{d}^{1/2}$,
which suggests that the proposed approximant for p could have error as large as
$$\delta\,\binom{n}{d}\cdot 2^{-n}\binom{n}{d}^{1/2} \;\gg\; 1\,.$$
Surprisingly, we are able to show that the proposed robust approximant for p does work and furthermore
has excellent error, $2^{-\Theta(d)}$.
We now describe step (ii) in more detail. The naïve, term-by-term error analysis above ignores
key aspects of the problem, such as the convexity of the unit cube [−1, 1]n , the metric structure of the
hypercube {−1, +1}n , and the multilinearity of p. We contribute a novel technique that exploits these
considerations. In particular, we are able to express the error in the proposed approximant at any given
point z ∈ ([−4/3, −2/3] ∪ [2/3, 4/3])^n as a series
$$\sum_{i=1}^{\infty} a_i\, p(z_i)\,,$$
where each z_i = z_i(z) is a suitable point in [−1, 1]^n, and the coefficients in the series are small and decay
rapidly: $\sum_{i=1}^{\infty}|a_i| \le 2^{-\Theta(d)}$. Since p is bounded by 1 in absolute value on the hypercube {−1,+1}^n, it is
also bounded by 1 inside the convex cube [−1, 1]^n, leading to the desired error estimate. In words, even
though the error in the approximation of an individual monomial is relatively large, we show that the
errors across the monomials behave in a coordinated way and essentially cancel each other out.
2 Notation and preliminaries
Throughout this manuscript, we represent the Boolean values “true” and “false” by −1 and +1, re-
spectively. In particular, Boolean functions are mappings f : X → {−1, +1} for some finite set X
such as X = {−1, +1}n . The natural numbers are denoted N = {0, 1, 2, 3, . . . }. The symbol log x
denotes the logarithm of x to base 2. For a string x ∈ R^n and a set S ⊆ {1, 2, . . . , n}, we adopt the
shorthand $x|_S = (x_{i_1}, x_{i_2}, \dots, x_{i_{|S|}}) \in \mathbb{R}^{|S|}$, where $i_1 < i_2 < \cdots < i_{|S|}$ are the elements of S. The family
of all subsets of a given set X is denoted P(X). The symbol S_n stands for the group of permutations
σ : {1, 2, . . . , n} → {1, 2, . . . , n}. A function φ : R^n → R is called symmetric if φ is invariant under
permutations of the variables, i. e., φ(x) ≡ φ(x_{σ(1)}, x_{σ(2)}, . . . , x_{σ(n)}) for all σ ∈ S_n. We adopt the standard
definition of the sign function:
$$\operatorname{sgn} t = \begin{cases} -1 & \text{if } t < 0, \\ 0 & \text{if } t = 0, \\ 1 & \text{if } t > 0. \end{cases}$$
For a set X, we let R^X denote the real vector space of functions X → R. For φ ∈ R^X, we write
$$\|\varphi\|_\infty = \sup_{x\in X} |\varphi(x)|\,, \qquad \|\varphi\|_1 = \sum_{x\in X} |\varphi(x)|\,,$$
where the symbol ‖φ‖₁ is reserved for finite X. By the degree of a multivariate polynomial p on R^n,
denoted deg p, we shall always mean the total degree of p, i. e., the greatest total degree of any monomial
of p. The symbol P_d stands for the family of all univariate real polynomials of degree up to d. For a
natural number i and a real α, the generalized binomial coefficient $\binom{\alpha}{i}$ is given by
$$\binom{\alpha}{i} = \frac{\alpha(\alpha-1)\cdots(\alpha-i+1)}{i!}\,.$$
The following identity is well-known and straightforward to show [26, p. 186]:
$$\binom{-1/2}{i} = \left(-\frac{1}{4}\right)^{i}\binom{2i}{i}\,. \tag{2.1}$$
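Identity (2.1) can be checked in exact rational arithmetic. The sketch below is ours, not part of the paper; `gen_binom` is a hypothetical helper implementing the definition of the generalized binomial coefficient above:

```python
from fractions import Fraction
from math import comb

def gen_binom(alpha, i):
    # generalized binomial coefficient: alpha*(alpha-1)*...*(alpha-i+1) / i!
    out = Fraction(1)
    for k in range(i):
        out = out * (alpha - k) / (k + 1)
    return out

# identity (2.1): binom(-1/2, i) = (-1/4)^i * binom(2i, i)
for i in range(12):
    assert gen_binom(Fraction(-1, 2), i) == Fraction(-1, 4)**i * comb(2 * i, i)
```
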
Fourier transform
Consider the vector space of functions {−1,+1}^n → R. For S ⊆ {1, 2, . . . , n}, define χ_S : {−1,+1}^n →
{−1,+1} by $\chi_S(x) = \prod_{i\in S} x_i$. Then the functions χ_S, S ⊆ {1, 2, . . . , n}, form an orthogonal basis for the
vector space in question. In particular, every function φ : {−1,+1}^n → R has a unique representation as
a linear combination of the characters χ_S:
$$\varphi = \sum_{S\subseteq\{1,2,\dots,n\}} \hat{\varphi}(S)\,\chi_S\,,$$
where $\hat{\varphi}(S) = 2^{-n}\sum_{x\in\{-1,+1\}^n}\varphi(x)\chi_S(x)$ is the Fourier coefficient of φ that corresponds to the character
χ_S. Note that
$$\deg \varphi = \max\{|S| : \hat{\varphi}(S) \ne 0\}\,.$$
Formally, the Fourier transform is the linear transformation φ ↦ φ̂, where φ̂ is viewed as a function
P({1, 2, . . . , n}) → R. In particular, we have the shorthand
$$\|\hat{\varphi}\|_1 = \sum_{S\subseteq\{1,2,\dots,n\}} |\hat{\varphi}(S)|\,.$$
Multilinear extensions and convexity
As the previous paragraph shows, associated to every mapping φ : {−1,+1}^n → R is a unique multilinear
polynomial φ̃ : R^n → R such that φ ≡ φ̃ on {−1,+1}^n. In discussing the Fourier transform, we identified
φ with its multilinear extension φ̃ to R^n, and will continue to do so throughout this paper. Among other
things, this convention allows one to evaluate φ everywhere in [−1, 1]^n as opposed to just {−1,+1}^n. It
is a simple but important fact that for every φ : {−1,+1}^n → R,
$$\max_{x\in[-1,1]^n} |\varphi(x)| = \max_{x\in\{-1,+1\}^n} |\varphi(x)| = \|\varphi\|_\infty\,.$$
To see this, fix ξ ∈ [−1, 1]^n arbitrarily and consider the probability distribution on strings x ∈ {−1,+1}^n
whereby x_1, . . . , x_n are distributed independently and E[x_i] = ξ_i for all i. Then φ(ξ) = E[φ(x)] by
multilinearity, so that $|\varphi(\xi)| \le \max_{x\in\{-1,+1\}^n}|\varphi(x)|$.
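This convexity fact is easy to test numerically. The following check is ours, not the paper's: it builds a random multilinear polynomial from its Fourier coefficients and confirms that no interior point of [−1, 1]^n beats the maximum over the hypercube vertices:

```python
import itertools, random
from math import prod

random.seed(0)
n = 4
subsets = [S for r in range(n + 1) for S in itertools.combinations(range(n), r)]
coef = {S: random.uniform(-1, 1) for S in subsets}

def phi(x):
    # multilinear extension: evaluate the Fourier expansion at any real point
    return sum(c * prod(x[i] for i in S) for S, c in coef.items())

vertex_max = max(abs(phi(x)) for x in itertools.product([-1, 1], repeat=n))
for _ in range(1000):
    xi = [random.uniform(-1, 1) for _ in range(n)]
    assert abs(phi(xi)) <= vertex_max + 1e-9
```
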
3 A robust polynomial for parity
The objective of this section is to construct a low-degree robust polynomial for the parity function. In
other words, we will construct a polynomial p : R^n → R of degree O(n) such that $p(x_1, x_2, \dots, x_n) \approx \prod \operatorname{sgn} x_i$
whenever the input variables are close to Boolean: x_1, x_2, . . . , x_n ∈ [−4/3, −2/3] ∪ [2/3, 4/3].
Recall that our eventual goal is a robust polynomial for every bounded real function. To this end, the
parity approximant p that we are to construct needs to possess a key additional property: the error
p(x) − ∏ sgn xi , apart from being small, needs to be expressible as a multivariate series in which the
coefficients decay rapidly with monomial order.
To obtain this coefficient behavior, we use a carefully chosen approximant for the univariate function
sgn t. The simplest candidate is the following ingenious construction due to Buhrman et al. [10]:
$$B_d(t) = \sum_{i=\lceil d/2\rceil}^{d} \binom{d}{i}\, t^{i}(1-t)^{d-i}\,.$$
In words, B_d(t) is the probability of observing more heads than tails in a sequence of d independent coin
flips, each coming up heads with probability t. By the Chernoff bound, B_d sends $[0, 1/4] \to [0, 2^{-\Omega(d)}]$
and similarly $[3/4, 1] \to [1 - 2^{-\Omega(d)}, 1]$. As Buhrman et al. [10] point out, this immediately gives a
degree-d approximant for the sign function with exponentially small error on [−4/3, −2/3] ∪ [2/3, 4/3].
Unfortunately, the coefficients of B_d do not exhibit the rapid decay that we require. Instead, in what
follows we use a purely analytic construction based on the Maclaurin series for $1/\sqrt{1+t}$.
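For intuition, the amplification achieved by B_d is easy to observe numerically; this snippet is our own sanity check of the probabilistic description above, not part of the paper:

```python
from math import comb, ceil

def B(d, t):
    # probability of at least ceil(d/2) heads among d independent flips,
    # each coming up heads with probability t
    return sum(comb(d, i) * t**i * (1 - t)**(d - i)
               for i in range(ceil(d / 2), d + 1))

# B_d maps [0, 1/4] into [0, 2^{-Omega(d)}] and [3/4, 1] into [1 - 2^{-Omega(d)}, 1]
assert B(51, 0.25) < 2**-10
assert B(51, 0.75) > 1 - 2**-10
```
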
Lemma 3.1. For $x_1, x_2, \dots, x_n \in (-\sqrt{2}, 0) \cup (0, \sqrt{2})$,
$$\operatorname{sgn}(x_1 x_2 \cdots x_n) = x_1 x_2 \cdots x_n \sum_{i_1, i_2, \dots, i_n \in \mathbb{N}}\; \prod_{j=1}^{n} \left(-\frac{1}{4}\right)^{i_j} \binom{2i_j}{i_j} (x_j^2 - 1)^{i_j}\,. \tag{3.1}$$
Proof. Recall the binomial series $(1+t)^{\alpha} = \sum_{i=0}^{\infty}\binom{\alpha}{i}t^{i}$, valid for all −1 < t < 1 and all real α. In
particular, setting α = −1/2 gives
$$\frac{1}{\sqrt{1+t}} = \sum_{i=0}^{\infty}\binom{-1/2}{i}t^{i} = \sum_{i=0}^{\infty}\left(-\frac{1}{4}\right)^{i}\binom{2i}{i}t^{i}\,, \qquad -1 < t < 1, \tag{3.2}$$
where the second step uses (2.1). One easily verifies that this absolutely convergent series is the Maclaurin
expansion for $1/\sqrt{1+t}$. For all real t with $0 < |t| < \sqrt{2}$, we have
$$\operatorname{sgn} t = \frac{t}{\sqrt{1+(t^2-1)}} = t\sum_{i=0}^{\infty}\left(-\frac{1}{4}\right)^{i}\binom{2i}{i}(t^2-1)^{i}\,, \tag{3.3}$$
where the second step holds by (3.2). For $x_1, x_2, \dots, x_n \in (-\sqrt{2}, 0) \cup (0, \sqrt{2})$, it follows that
$$\operatorname{sgn}(x_1 x_2 \cdots x_n) = x_1 x_2 \cdots x_n \prod_{j=1}^{n}\left\{\sum_{i=0}^{\infty}\left(-\frac{1}{4}\right)^{i}\binom{2i}{i}(x_j^2-1)^{i}\right\} = x_1 x_2 \cdots x_n \sum_{i_1,\dots,i_n\in\mathbb{N}}\;\prod_{j=1}^{n}\left(-\frac{1}{4}\right)^{i_j}\binom{2i_j}{i_j}(x_j^2-1)^{i_j}\,.$$
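Lemma 3.1 lends itself to a direct numerical check (ours, not the paper's): truncate each index i_j at a common cap and compare against the sign of the product:

```python
import itertools
from math import comb

def sgn_series(xs, cap):
    # truncation of the series (3.1): each index i_j ranges over 0, 1, ..., cap
    total = 0.0
    for idx in itertools.product(range(cap + 1), repeat=len(xs)):
        term = 1.0
        for x, i in zip(xs, idx):
            term *= (-0.25)**i * comb(2 * i, i) * (x * x - 1)**i
        total += term
    for x in xs:
        total *= x
    return total

# sgn(0.8 * (-1.2) * 0.7) = -1; the truncated series converges to it rapidly
assert abs(sgn_series([0.8, -1.2, 0.7], 30) - (-1)) < 1e-6
```
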
We have reached the main result of this section.
Theorem 3.2 (Robust polynomial for the parity function). Fix ε ∈ [0, 1) and let
$$X = \left[-\sqrt{1+\varepsilon},\, -\sqrt{1-\varepsilon}\,\right] \cup \left[\sqrt{1-\varepsilon},\, \sqrt{1+\varepsilon}\,\right].$$
Then for every integer N ≥ 1, there is an (explicitly given) polynomial p : R^n → R of degree at most
2N + n such that
$$\max_{x\in X^n}\,|\operatorname{sgn}(x_1 x_2 \cdots x_n) - p(x)| \;\le\; \varepsilon^{N}(1+\varepsilon)^{n/2}\binom{N+n}{N}\,n\,. \tag{3.4}$$
Setting ε = 7/9 and N = 22n in this result, we infer that the function sgn(x_1 x_2 · · · x_n) with inputs
x_1, x_2, . . . , x_n ∈ [−4/3, −2/3] ∪ [2/3, 4/3] can be approximated to within $2^{-n}$ everywhere by a polynomial
of degree O(n). This is the desired robust polynomial for parity.
Proof of Theorem 3.2. If ε > N/(N + n), the right-hand side of (3.4) exceeds
$$\left(\frac{N}{N+n}\right)^{N}\binom{N+n}{N} \;\ge\; 1$$
and therefore the theorem holds trivially with p = 0. In what follows, we treat the complementary case:
$$0 \le \varepsilon \le \frac{N}{N+n}\,. \tag{3.5}$$
For a natural number d, let I_d stand for the family of n-tuples (i_1, . . . , i_n) of nonnegative integers such
that i_1 + · · · + i_n = d. Clearly,
$$|I_d| = \binom{d+n-1}{d}\,.$$
One can restate (3.1) in the form
$$\operatorname{sgn}(x_1 x_2 \cdots x_n) = x_1 x_2 \cdots x_n \sum_{d=0}^{\infty} \xi_d(x_1, x_2, \dots, x_n)\,, \tag{3.6}$$
where
$$\xi_d(x_1, x_2, \dots, x_n) = \sum_{(i_1,\dots,i_n)\in I_d}\;\prod_{j=1}^{n}\left(-\frac{1}{4}\right)^{i_j}\binom{2i_j}{i_j}(x_j^2-1)^{i_j}\,.$$
On X^n,
$$\|\xi_d\|_\infty \le \varepsilon^{d}\,|I_d| = \varepsilon^{d}\binom{d+n-1}{d}\,.$$
As a result, dropping the terms ξ_{N+1}, ξ_{N+2}, . . . from the series (3.6) results in an approximant of degree
2N + n with pointwise error at most
$$(1+\varepsilon)^{n/2}\sum_{d=N+1}^{\infty}\binom{d+n-1}{d}\varepsilon^{d} \;\le\; (1+\varepsilon)^{n/2}\,\varepsilon^{N+1}\binom{N+n}{N+1}\sum_{d=0}^{\infty}\left(\frac{N+n}{N+1}\,\varepsilon\right)^{d} \;\le\; (1+\varepsilon)^{n/2}\,\varepsilon^{N+1}\binom{N+n}{N+1}\sum_{d=0}^{\infty}\left(\frac{N}{N+1}\right)^{d} \quad\text{by (3.5)}$$
$$=\; (1+\varepsilon)^{n/2}\,\varepsilon^{N+1}\binom{N+n}{N}\,n \;\le\; \varepsilon^{N}(1+\varepsilon)^{n/2}\binom{N+n}{N}\,n\,.$$
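The construction in Theorem 3.2 is fully explicit and can be exercised directly. Below is our own spot check (not from the paper) for n = 2: truncate the series (3.6) at total degree N and measure the error at the extreme inputs ±2/3 and ±4/3:

```python
import itertools
from math import comb

def p_robust(xs, N):
    # truncation of the series (3.6) at total degree N in the indices
    total = 0.0
    for idx in itertools.product(range(N + 1), repeat=len(xs)):
        if sum(idx) > N:
            continue
        term = 1.0
        for x, i in zip(xs, idx):
            term *= (-0.25)**i * comb(2 * i, i) * (x * x - 1)**i
        total += term
    for x in xs:
        total *= x
    return total

n, N = 2, 44   # N = 22n, as in the remark following the theorem
worst = 0.0
for xs in itertools.product([-4/3, -2/3, 2/3, 4/3], repeat=n):
    sign = 1.0
    for x in xs:
        if x < 0:
            sign = -sign          # sgn(x1 * ... * xn)
    worst = max(worst, abs(sign - p_robust(xs, N)))
assert worst < 2.0**-n            # error within 2^{-n}, as claimed
```
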
4 Reduction to homogeneous polynomials
We now turn to the construction of a robust polynomial for any real function on the Boolean cube. Real
functions given by homogeneous polynomials on {−1, +1}n are particularly convenient to work with,
and the proof is greatly simplified by first reducing the problem to the homogeneous case.
To obtain this reduction, we need a bound on the coefficients of a univariate polynomial in terms
of its degree d and its maximum absolute value on the interval [−1, 1]. This fundamental problem was
solved in the nineteenth century by V. A. Markov [37, p. 81], who proved an upper bound of
$$O\!\left(\frac{(1+\sqrt{2})^{d}}{\sqrt{d}}\right) \tag{4.1}$$
on the size of the coefficients of any degree-d polynomial that is bounded on [−1, 1] in absolute value
by 1. Markov further showed that (4.1) is tight. Rather than appeal to this deep result in approximation
theory, we give a first-principles proof of a $4^d$ bound which suffices for our purposes.
Lemma 4.1 (Coefficients of bounded polynomials). Let $p(t) = \sum_{i=0}^{d} a_i t^{i}$ be a given polynomial. Then
$$\sum_{i=0}^{d} |a_i| \;\le\; 4^{d} \max_{j=0,1,\dots,d}\left|p\!\left(\frac{d-2j}{d}\right)\right|\,. \tag{4.2}$$
Proof. We will use a common approximation-theoretic technique [13, 44] whereby one expresses p as
a linear combination of more structured polynomials and analyzes the latter objects. For this, define
q_0, q_1, . . . , q_d ∈ P_d by
$$q_j(t) = \frac{(-1)^{j}\,d^{d}}{d!\,2^{d}}\binom{d}{j}\prod_{\substack{i=0\\ i\ne j}}^{d}\left(t - \frac{d-2i}{d}\right)\,, \qquad j = 0, 1, \dots, d\,. \tag{4.3}$$
One easily verifies that these polynomials behave like delta functions, in the sense that for i, j =
0, 1, 2, . . . , d,
$$q_j\!\left(\frac{d-2i}{d}\right) = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{otherwise.} \end{cases}$$
Lagrange interpolation gives
$$p = \sum_{j=0}^{d} p\!\left(\frac{d-2j}{d}\right) q_j\,. \tag{4.4}$$
By (4.3), the absolute values of the coefficients of q_j sum to at most
$$\frac{d^{d}}{d!\,2^{d}}\binom{d}{j}\prod_{\substack{i=0\\ i\ne j}}^{d}\left(1 + \frac{|d-2i|}{d}\right) = \frac{1}{d!\,2^{d}}\binom{d}{j}\cdot\frac{1}{d+|d-2j|}\prod_{i=0}^{d}\big(d+|d-2i|\big)$$
$$= \frac{1}{d!\,2^{d}}\binom{d}{j}\cdot\frac{1}{d+|d-2j|}\prod_{i=0}^{\lfloor d/2\rfloor}(2d-2i)\prod_{i=\lfloor d/2\rfloor+1}^{d}(2i)$$
$$= \frac{1}{d!\,2^{d}}\binom{d}{j}\cdot\frac{2^{d+1}}{d+|d-2j|}\cdot\frac{d!}{(\lceil d/2\rceil-1)!}\cdot\frac{d!}{\lfloor d/2\rfloor!}$$
$$= \binom{d}{j}\cdot\frac{2d}{d+|d-2j|}\cdot\binom{d-1}{\lfloor d/2\rfloor} \;\le\; 2^{d}\binom{d}{j}\,.$$
In view of (4.4), we conclude that the absolute values of the coefficients of p sum to at most
$$\max_{j=0,1,\dots,d}\left|p\!\left(\frac{d-2j}{d}\right)\right| \sum_{j=0}^{d} 2^{d}\binom{d}{j} \;=\; 4^{d}\max_{j=0,1,\dots,d}\left|p\!\left(\frac{d-2j}{d}\right)\right|\,.$$
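To see that the $4^d$ bound of Lemma 4.1 is in the right ballpark, one can test it on the Chebyshev polynomial T_d, whose coefficients are about as large as a polynomial bounded on [−1, 1] allows. The check below is ours, not part of the paper:

```python
def cheb_coeffs(d):
    # coefficients of the Chebyshev polynomial T_d (lowest degree first),
    # via the recurrence T_0 = 1, T_1 = t, T_{k+1} = 2t*T_k - T_{k-1}
    if d == 0:
        return [1]
    T_prev, T_cur = [1], [0, 1]
    for _ in range(d - 1):
        nxt = [0] + [2 * c for c in T_cur]
        for i, c in enumerate(T_prev):
            nxt[i] -= c
        T_prev, T_cur = T_cur, nxt
    return T_cur

def evalp(coeffs, t):
    return sum(c * t**i for i, c in enumerate(coeffs))

d = 8
c = cheb_coeffs(d)
lhs = sum(abs(a) for a in c)     # sum of |coefficients|; equals 577 for T_8
rhs = 4**d * max(abs(evalp(c, (d - 2 * j) / d)) for j in range(d + 1))
assert lhs <= rhs                # the bound (4.2)
```
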
We are now prepared to give the desired reduction to the homogeneous case.
Theorem 4.2. Let φ : {−1,+1}^n → R be a given function, deg φ = d. Write φ = φ_0 + φ_1 + · · · + φ_d,
where φ_i : {−1,+1}^n → R is given by $\varphi_i = \sum_{|S|=i}\hat{\varphi}(S)\chi_S$. Then
$$\|\varphi_i\|_\infty \le 4^{d}\,\|\varphi\|_\infty\,, \qquad i = 0, 1, \dots, d\,.$$
The above result gives an upper bound on the infinity norm of the homogeneous parts of a polynomial
φ in terms of the infinity norm of φ itself. Note that the bound is entirely independent of the number
of variables. For our purposes, Theorem 4.2 has the following consequence: a robust polynomial for φ
can be obtained by constructing robust polynomials with error $2^{-\Omega(d)}\|\varphi\|_\infty$ separately for each of the
homogeneous parts. The homogeneous problem will be studied in the next section.
Proof of Theorem 4.2. Pick a point x ∈ {−1,+1}^n arbitrarily and fix it for the remainder of the proof.
Consider the univariate polynomial p ∈ P_d given by
$$p(t) = \sum_{i=0}^{d} \varphi_i(x)\,t^{i}\,.$$
For −1 ≤ t ≤ 1, consider the probability distribution μ_t on the Boolean cube {−1,+1}^n whereby each
bit is independent and has expected value t. Then
$$\|\varphi\|_\infty \ge \left|\operatorname*{\mathbb{E}}_{z\sim\mu_t}\big[\varphi(x_1 z_1, \dots, x_n z_n)\big]\right| = \left|\sum_{|S|\le d}\hat{\varphi}(S)\operatorname*{\mathbb{E}}_{z\sim\mu_t}\left[\prod_{i\in S}x_i z_i\right]\right| = \left|\sum_{|S|\le d}\hat{\varphi}(S)\,t^{|S|}\prod_{i\in S}x_i\right| = |p(t)|\,.$$
Hence, p is bounded on [−1, 1] in absolute value by ‖φ‖∞. By Lemma 4.1, it follows that the coefficients
of p do not exceed $4^{d}\|\varphi\|_\infty$ in absolute value:
$$|\varphi_i(x)| \le 4^{d}\,\|\varphi\|_\infty\,.$$
Since the choice of x ∈ {−1,+1}^n was arbitrary, the theorem follows.
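Theorem 4.2 can likewise be sanity-checked on random examples. The following code is ours, not the paper's:

```python
import itertools, random
from math import prod

random.seed(1)
n, d = 4, 4
subsets = [S for r in range(d + 1) for S in itertools.combinations(range(n), r)]
coef = {S: random.uniform(-1, 1) for S in subsets}
cube = list(itertools.product([-1, 1], repeat=n))

def norm_inf(c):
    # infinity norm over the hypercube of the function with Fourier coefficients c
    return max(abs(sum(v * prod(x[i] for i in S) for S, v in c.items()))
               for x in cube)

whole = norm_inf(coef)
for i in range(d + 1):
    part = {S: v for S, v in coef.items() if len(S) == i}   # homogeneous part
    assert norm_inf(part) <= 4**d * whole
```
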
5 Error cancellation in homogeneous polynomials
Recall that the goal of this paper is to construct a degree-O(d) robust polynomial for every degree-d
real function φ : {−1, +1}n → [−1, 1]. By the results of Section 4, we may now assume that φ is
homogeneous:
$$\varphi = \sum_{|S|=d}\hat{\varphi}(S)\,\chi_S\,.$$
A naïve approach would be to use the construction of Section 3 and robustly approximate each parity
χ_S to within $2^{-\Theta(d)}$ by a degree-O(d) polynomial. Unfortunately, it is unclear whether the resulting
polynomial would be a good approximant for φ. Indeed, as explained in the introduction, the cumulative
error could conceivably be as large as $n^{\Omega(d)}\,2^{-\Theta(d)} \gg 1$. The purpose of this section is to prove that, for
a careful choice of approximants for the χ_S, the errors do not compound but instead partially cancel,
resulting in a cumulative error of $2^{-\Theta(d)}$. The proof is rather technical. To simplify the exposition, we first
illustrate our technique in the simpler setting of {−1,+1}^n and then adapt it to our setting of interest, R^n.
Error cancellation on the Boolean hypercube
Let φ : {−1, +1}n → R be a degree-d homogeneous polynomial. Our goal is to show that perturbing the
Fourier characters of φ in a suitable, coordinated manner results in partial cancellation of the errors and
does not change the value of φ by much relative to the norm ‖φ‖∞. A precise statement follows.
Theorem 5.1. Let φ : {−1,+1}^n → R be given such that φ̂(S) = 0 whenever |S| ≠ d. Fix an arbitrary
symmetric function δ : {−1,+1}^d → R and define ∆ : {−1,+1}^n → R by
$$\Delta(x) = \sum_{|S|=d}\hat{\varphi}(S)\,\delta(x|_S)\,.$$
Then
$$\|\Delta\|_\infty \le \frac{d^{d}}{d!}\,\|\varphi\|_\infty\,\|\hat{\delta}\|_1\,.$$
In the above result, δ should be thought of as the error in approximating individual characters χ_S,
whereas ∆ is the cumulative error so incurred. The theorem states that the cumulative error exceeds the
product of the norms of φ and δ by a factor of at most $d^d/d! \le e^{d}$, which is substantially smaller than the
factor of $n^{\Omega(d)}$ growth that one could expect a priori.
Proof of Theorem 5.1. We adopt the convention that $a^0 = 1$ for all real a. For a given vector v ∈ {0,1}^d,
consider the operator A_v that takes a function f : {−1,+1}^n → R into the function A_v f : {−1,+1}^n → R
where
$$(A_v f)(x) = \operatorname*{\mathbb{E}}_{z\in\{-1,+1\}^d}\left[z_1 z_2 \cdots z_d\; f\!\left(\frac{1}{d}\sum_{i=1}^{d} z_i x_1^{v_i},\; \dots,\; \frac{1}{d}\sum_{i=1}^{d} z_i x_n^{v_i}\right)\right].$$
This definition is meaningful because we identify f with its multilinear extension to [−1, 1]^n, thus making
it possible to evaluate f on inputs with fractional coordinates (see Section 2). It is important to note that
A_v is a linear transformation in the vector space $\mathbb{R}^{\{-1,+1\}^n}$. This somewhat magical operator is the key to
the proof; the remainder of the proof will provide insight into how this definition could have been arrived
at in the first place. To start with,
$$\|A_v \varphi\|_\infty \le \max_{x\in[-1,1]^n}|\varphi(x)| = \max_{x\in\{-1,+1\}^n}|\varphi(x)| = \|\varphi\|_\infty\,, \tag{5.1}$$
where the second step holds by convexity. The strategy of the proof is to express ∆ as a linear combination
of the A_v φ with small coefficients. Since the infinity norm of any individual A_v φ is small, that will give
the desired bound on the infinity norm of ∆.
To find what suitable coefficients would be, we need to understand the transformation A_v in terms of
the Fourier spectrum. Since A_v is linear and the nonzero Fourier coefficients of φ have order d, it suffices
to determine the action of A_v on the characters of order d. For every S ⊆ {1, 2, . . . , n} with |S| = d,
$$(A_v \chi_S)(x) = \operatorname*{\mathbb{E}}_{z\in\{-1,+1\}^d}\left[z_1 z_2 \cdots z_d \prod_{j\in S}\left(\frac{1}{d}\sum_{i=1}^{d} z_i x_j^{v_i}\right)\right] = \operatorname*{\mathbb{E}}_{z\in\{-1,+1\}^d}\left[z_1 z_2 \cdots z_d \operatorname*{\mathbb{E}}_{\tau\colon S\to\{1,\dots,d\}}\left[\prod_{j\in S} z_{\tau(j)}\, x_j^{v_{\tau(j)}}\right]\right] = \operatorname*{\mathbb{E}}_{\tau\colon S\to\{1,\dots,d\}}\left[\operatorname*{\mathbb{E}}_{z\in\{-1,+1\}^d}\left[z_1 z_2 \cdots z_d \prod_{j\in S} z_{\tau(j)}\right]\prod_{j\in S} x_j^{v_{\tau(j)}}\right], \tag{5.2}$$
where τ is chosen uniformly at random from all possible mappings S → {1, 2, . . . , d}. The inner expectation
in (5.2) acts like the indicator random variable for the event that τ is a bijection, i. e., it evaluates to 1
when τ is a bijection and vanishes otherwise. As a result,
$$(A_v \chi_S)(x) = \operatorname*{\mathbf{P}}_{\tau}[\tau \text{ is a bijection}]\; \operatorname*{\mathbb{E}}_{\tau}\left[\prod_{j\in S} x_j^{v_{\tau(j)}} \,\middle|\, \tau \text{ is a bijection}\right] = \frac{d!}{d^{d}} \operatorname*{\mathbb{E}}_{\substack{T\subseteq S,\\ |T|=v_1+\cdots+v_d}}\big[\chi_T(x)\big]\,. \tag{5.3}$$
By the symmetry of δ,
$$\delta(x|_S) = \sum_{k=0}^{d}\hat{\delta}(\{1,2,\dots,k\})\sum_{\substack{T\subseteq S,\\ |T|=k}}\chi_T(x) = \frac{d^{d}}{d!}\sum_{k=0}^{d}\binom{d}{k}\,\hat{\delta}(\{1,2,\dots,k\})\,(A_{1^k 0^{d-k}}\chi_S)(x)\,,$$
where the second step uses (5.3). Taking a weighted sum over S and using the linearity of A_v,
$$\sum_{\substack{S\subseteq\{1,2,\dots,n\},\\ |S|=d}}\hat{\varphi}(S)\,\delta(x|_S) = \frac{d^{d}}{d!}\sum_{k=0}^{d}\binom{d}{k}\,\hat{\delta}(\{1,2,\dots,k\})\left(A_{1^k 0^{d-k}}\sum_{\substack{S\subseteq\{1,2,\dots,n\},\\ |S|=d}}\hat{\varphi}(S)\chi_S\right)(x)\,,$$
or equivalently
$$\Delta = \frac{d^{d}}{d!}\sum_{k=0}^{d}\binom{d}{k}\,\hat{\delta}(\{1,2,\dots,k\})\,A_{1^k 0^{d-k}}\varphi\,.$$
In light of (5.1), this representation gives the sought upper bound on the norm of ∆:
$$\|\Delta\|_\infty \le \frac{d^{d}}{d!}\sum_{k=0}^{d}\binom{d}{k}\,|\hat{\delta}(\{1,2,\dots,k\})|\,\|\varphi\|_\infty = \frac{d^{d}}{d!}\,\|\varphi\|_\infty\,\|\hat{\delta}\|_1\,.$$
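The key identity (5.3) can be verified by brute force for small d. The code below is our own check (not the paper's): it implements A_v literally and compares it with the right-hand side of (5.3) for d = 2 and all v ∈ {0,1}^2:

```python
import itertools
from math import prod, factorial

def A(v, f, x, d):
    # the operator A_v from the proof of Theorem 5.1
    total = 0.0
    for z in itertools.product([-1, 1], repeat=d):
        y = [sum(z[i] * xj**v[i] for i in range(d)) / d for xj in x]
        total += prod(z) * f(y)
    return total / 2**d

d, n = 2, 3
S = (0, 1)
chi_S = lambda y: prod(y[i] for i in S)   # multilinear extension of a character
for x in itertools.product([-1, 1], repeat=n):
    for v in itertools.product([0, 1], repeat=d):
        k = sum(v)
        Ts = list(itertools.combinations(S, k))
        # (5.3): A_v chi_S = (d!/d^d) * average of chi_T over T in S, |T| = k
        rhs = factorial(d) / d**d * sum(prod(x[i] for i in T) for T in Ts) / len(Ts)
        assert abs(A(v, chi_S, list(x), d) - rhs) < 1e-9
```
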
Error cancellation with real variables
We now consider the error cancellation problem in its full generality. Again, our goal will be to show that
replacing individual characters with suitable approximants results in moderate cumulative error. This
time, however, the input variables are no longer restricted to be Boolean, and can take on arbitrary values
in $[-\sqrt{1+\varepsilon}, -\sqrt{1-\varepsilon}\,] \cup [\sqrt{1-\varepsilon}, \sqrt{1+\varepsilon}\,]$ for 0 < ε < 1. This in turn means that the error term will be
given by an infinite series. Another difference is that the coefficients of the error series will not converge
to zero rapidly enough, requiring additional ideas to bound the cumulative error.
Theorem 5.2. Let φ : {−1,+1}^n → R be given such that φ̂(S) = 0 whenever |S| ≠ d. Let 0 < ε < 1 and
$$X = \left[-\sqrt{1+\varepsilon},\, -\sqrt{1-\varepsilon}\,\right] \cup \left[\sqrt{1-\varepsilon},\, \sqrt{1+\varepsilon}\,\right].$$
Then for every integer D ≥ 1, there is an (explicitly given) polynomial p : R^d → R of degree at most
2D + d such that
$$P(x) = \sum_{\substack{S\subseteq\{1,2,\dots,n\},\\ |S|=d}}\hat{\varphi}(S)\,p(x|_S)$$
obeys
$$\max_{x\in X^n}\,|\varphi(\operatorname{sgn} x_1, \dots, \operatorname{sgn} x_n) - P(x)| \;\le\; (1+\varepsilon)^{d/2}\,\varepsilon^{D}\,\frac{d^{d}}{d!}\binom{D+d}{D}\,d\,\|\varphi\|_\infty\,. \tag{5.4}$$
Proof. As before, we adopt the notational convention that a^0 = 1 for all real a. We will follow the proof of Theorem 5.1 as closely as possible, pointing out key differences as we go along.
If ε > D/(D + d), then the right-hand side of (5.4) is at least

\[
\left(\frac{D}{D+d}\right)^{D} \binom{D+d}{D} \|\phi\|_\infty \ge \|\phi\|_\infty
\]

and hence the theorem holds trivially with p = 0. In what follows, we treat the complementary case ε ≤ D/(D + d). To start with,
\[
\begin{aligned}
\sum_{\substack{v \in \mathbb{N}^d: \\ v_1 + \cdots + v_d \ge D+1}} \varepsilon^{v_1 + \cdots + v_d} \prod_{j=1}^{d} \underbrace{\frac{1}{4^{v_j}} \binom{2v_j}{v_j}}_{\le 1}
&\le \sum_{\substack{v \in \mathbb{N}^d: \\ v_1 + \cdots + v_d \ge D+1}} \varepsilon^{v_1 + \cdots + v_d} \\
&= \varepsilon^{D+1} \sum_{i=0}^{\infty} \binom{D+d+i}{D+1+i} \varepsilon^i \\
&\le \varepsilon^{D+1} \binom{D+d}{D+1} \sum_{i=0}^{\infty} \left(\frac{D+d}{D+1}\right)^i \varepsilon^i \\
&\le \varepsilon^{D+1} \binom{D+d}{D+1} \sum_{i=0}^{\infty} \left(\frac{D}{D+1}\right)^i \\
&= \varepsilon^{D+1} \binom{D+d}{D}\, d \,,
\end{aligned}
\tag{5.5}
\]
where the fourth line in the derivation uses ε ≤ D/(D + d). Define p by

\[
p(x_1, \ldots, x_d) = \sum_{\substack{v \in \mathbb{N}^d: \\ v_1 + \cdots + v_d \le D}} \; \prod_{j=1}^{d} \left(-\frac{1}{4}\right)^{v_j} \binom{2v_j}{v_j}\, x_j\, (x_j^2 - 1)^{v_j} \,.
\]
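The combinatorial estimate (5.5) above can also be checked numerically by truncating the infinite sum. The script below is only an illustration, with arbitrary small parameter choices, and every name in it is invented for the purpose.

```python
import itertools
import math

d, D, eps = 2, 3, 0.3   # illustrative values with eps <= D/(D+d) = 0.6
CAP = 60                # truncation point; the neglected tail is negligible here

# Left-hand side of (5.5): sum over v in N^d with v_1+...+v_d >= D+1 of
# eps^(v_1+...+v_d) * prod_j binom(2 v_j, v_j) / 4^(v_j).
lhs = 0.0
for v in itertools.product(range(CAP + 1), repeat=d):
    if sum(v) >= D + 1:
        term = eps ** sum(v)
        for vj in v:
            term *= math.comb(2 * vj, vj) / 4 ** vj
        lhs += term

# Right-hand side of (5.5).
rhs = eps ** (D + 1) * math.comb(D + d, D) * d
```

For these parameters the truncated left-hand side comes out well below the closed-form right-hand side, as (5.5) predicts.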
Analogous to the Boolean setting, we will define functions to capture the error in approximating an
individual character as well as the cumulative error. Let δ : X^d → R and Δ : X^n → R be given by

\[
\delta(x_1, \ldots, x_d) = \sum_{\substack{v \in \mathbb{N}^d: \\ v_1 + \cdots + v_d \ge D+1}} \; \prod_{j=1}^{d} \left(-\frac{1}{4}\right)^{v_j} \binom{2v_j}{v_j}\, x_j\, (x_j^2 - 1)^{v_j} \,,
\]
\[
\Delta(x_1, \ldots, x_n) = \sum_{\substack{S \subseteq \{1, 2, \ldots, n\}, \\ |S| = d}} \hat\phi(S)\, \delta(x|_S) \,.
\]

Lemma 3.1 implies that δ is the error incurred in approximating a single character by p, in other words, δ(x_1, . . . , x_d) = sgn(x_1 · · · x_d) − p(x_1, . . . , x_d). Hence, Δ captures the cumulative error:

\[
\Delta(x) = \phi(\operatorname{sgn} x_1, \ldots, \operatorname{sgn} x_n) - P(x) \,. \tag{5.6}
\]
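In the univariate case d = 1, the polynomial p is the truncation of the binomial series sgn(x) = x/√(1 + (x² − 1)) = ∑_{v≥0} (−1/4)^v binom(2v, v) x (x² − 1)^v, and its error on X can be observed directly. The following script is an illustration only, with arbitrary parameter choices, and is not part of the proof.

```python
import math

def p_univariate(x, D):
    # Truncation of sgn(x) = x / sqrt(1 + (x^2 - 1))
    #                      = sum_{v>=0} (-1/4)^v binom(2v, v) x (x^2 - 1)^v.
    return sum((-0.25) ** v * math.comb(2 * v, v) * x * (x * x - 1) ** v
               for v in range(D + 1))

D, eps = 20, 0.25
lo, hi = math.sqrt(1 - eps), math.sqrt(1 + eps)

# Maximum observed error over a grid covering X = [-hi,-lo] U [lo,hi].
worst = 0.0
for k in range(101):
    x = lo + k * (hi - lo) / 100
    worst = max(worst, abs(1 - p_univariate(x, D)),
                abs(-1 - p_univariate(-x, D)))
```

The observed error decays geometrically in the truncation order D, in line with the tail estimate (5.5).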
Recall that our goal is to place an upper bound on ‖Δ‖∞. For v ∈ N^d, consider the operator A_v that takes a function f : {−1, +1}^n → R into a function A_v f : X^n → R, where

\[
(A_v f)(x) = \mathbf{E}_{z \in \{-1,+1\}^d}\Biggl[ z_1 z_2 \cdots z_d\, f\Biggl( \ldots, \underbrace{\frac{1}{d} \sum_{i=1}^{d} \frac{z_i\, x_j (x_j^2 - 1)^{v_i}}{\varepsilon^{v_i} \sqrt{1+\varepsilon}}}_{j\text{th coordinate}}, \ldots \Biggr) \Biggr] .
\]
The jth coordinate in this expression is bounded by 1 in absolute value because √(1−ε) ≤ |x_j| ≤ √(1+ε) on X^n. This definition departs from the earlier one in Theorem 5.1, where v was restricted to 0/1 entries. Perhaps the most essential difference is the presence of scaling factors in the denominator, which is what ultimately allows one to bound the cumulative error in the setting of an infinite series. Note that A_v is a linear transformation sending R^{{−1,+1}^n} into R^{X^n}. We further have

\[
\|A_v \phi\|_\infty \le \max_{x \in [-1,1]^n} |\phi(x)| = \max_{x \in \{-1,+1\}^n} |\phi(x)| = \|\phi\|_\infty \,, \tag{5.7}
\]

where the first step uses the fact that A_v φ has domain X^n rather than all of R^n, and the second step holds by convexity.
We proceed to examine the action of A_v on the characters of order d. Since the definition of A_v is symmetric with respect to the n coordinates, it suffices to consider S = {1, 2, . . . , d}:

\[
\begin{aligned}
(A_v \chi_{\{1,\ldots,d\}})(x) &= \mathbf{E}_{z \in \{-1,+1\}^d}\Biggl[ z_1 z_2 \cdots z_d \prod_{j=1}^{d} \frac{1}{d} \sum_{i=1}^{d} \frac{z_i\, x_j (x_j^2 - 1)^{v_i}}{\varepsilon^{v_i} \sqrt{1+\varepsilon}} \Biggr] \\
&= \frac{1}{(1+\varepsilon)^{d/2}}\; \mathbf{E}_\tau\, \mathbf{E}_z\Biggl[ \prod_{j=1}^{d} z_j z_{\tau(j)} \prod_{j=1}^{d} \frac{x_j (x_j^2 - 1)^{v_{\tau(j)}}}{\varepsilon^{v_{\tau(j)}}} \Biggr] ,
\end{aligned}
\]
where the first expectation is taken over a uniformly random mapping τ : {1, 2, . . . , d} → {1, 2, . . . , d}.
The expectation over z acts like the indicator random variable for the event that τ is a bijection, i. e., it
evaluates to 1 when τ is a bijection and vanishes otherwise. Thus,
" #
d x (x2 − 1)vτ( j)
1 j j
(Av χ{1,...,d} )(x) = P[τ is a bijection] E ∏ τ is a bijection
(1 + ε)d/2 τ τ
j=1 ε vτ( j)
" #
d
1 d!
= · · E ∏ x j (x2j − 1)vσ ( j) . (5.8)
(1 + ε)d/2 ε v1 +···+vd d d σ ∈Sd j=1
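The way the average over z singles out the bijections τ is an instance of a general identity: for any d × d array a, one has E_z[z₁ · · · z_d ∏_j ∑_i z_i a_{j,i}] = ∑_{σ∈S_d} ∏_j a_{j,σ(j)}, the permanent of a. The brute-force check below is illustrative only; the ε- and x-dependent scaling factors appearing in the definition of A_v are stripped out.

```python
import itertools
import math
import random

random.seed(1)
d = 3
a = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(d)]

# E_z [ z_1 ... z_d * prod_j (sum_i z_i a[j][i]) ] over z uniform in {-1,+1}^d.
lhs = 0.0
for z in itertools.product([-1, 1], repeat=d):
    lhs += math.prod(z) * math.prod(sum(z[i] * a[j][i] for i in range(d))
                                    for j in range(d))
lhs /= 2 ** d

# Only bijective maps tau survive the averaging, giving the permanent of a.
rhs = sum(math.prod(a[j][sigma[j]] for j in range(d))
          for sigma in itertools.permutations(range(d)))
```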
Now, consider the operator

\[
A = (1+\varepsilon)^{d/2}\, \frac{d^d}{d!} \sum_{\substack{v \in \mathbb{N}^d: \\ v_1 + \cdots + v_d \ge D+1}} \varepsilon^{v_1 + \cdots + v_d} \Biggl\{ \prod_{j=1}^{d} \left(-\frac{1}{4}\right)^{v_j} \binom{2v_j}{v_j} \Biggr\} A_v \,.
\]
This operator is well-defined because, by (5.5), the series in question converges absolutely. By (5.8), the symmetry of δ, and linearity, (Aχ_{1,...,d})(x) = δ(x_1, . . . , x_d). Since the definition of A is symmetric with respect to the n coordinates, we conclude that

\[
(A \chi_S)(x) = \delta(x|_S)
\]
for all subsets S ⊆ {1, 2, . . . , n} of cardinality d. As an immediate consequence,
\[
\Delta = \sum_{\substack{S \subseteq \{1, 2, \ldots, n\}, \\ |S| = d}} \hat\phi(S) \cdot (A \chi_S) = A\Biggl( \sum_{\substack{S \subseteq \{1, 2, \ldots, n\}, \\ |S| = d}} \hat\phi(S) \chi_S \Biggr) = A\phi \,,
\]
where the second step uses the linearity of A. In particular,
\[
\begin{aligned}
\|\Delta\|_\infty &= \|A\phi\|_\infty \\
&\le (1+\varepsilon)^{d/2}\, \frac{d^d}{d!} \sum_{\substack{v \in \mathbb{N}^d: \\ v_1 + \cdots + v_d \ge D+1}} \varepsilon^{v_1 + \cdots + v_d}\, \|A_v \phi\|_\infty \prod_{j=1}^{d} \frac{1}{4^{v_j}} \binom{2v_j}{v_j} \\
&\le (1+\varepsilon)^{d/2}\, \frac{d^d}{d!}\, \varepsilon^{D+1} \binom{D+d}{D} d\, \|\phi\|_\infty \,,
\end{aligned}
\]
where the final step follows by (5.5) and (5.7). In light of (5.6), the proof is complete.
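As an illustrative numerical check of Theorem 5.2 (not part of the proof), one can take n = d = 2 and φ = χ_{{1,2}}, in which case P = p, and compare the observed error against the right-hand side of (5.4) on a sample grid. All parameter choices below are arbitrary.

```python
import itertools
import math

d, D, eps = 2, 5, 0.2   # eps <= D/(D+d), as the non-trivial case requires

def p(x1, x2):
    # sum over v in N^2 with v1+v2 <= D of
    # prod_j (-1/4)^(v_j) binom(2 v_j, v_j) x_j (x_j^2 - 1)^(v_j)
    total = 0.0
    for v1 in range(D + 1):
        for v2 in range(D + 1 - v1):
            term = 1.0
            for xj, vj in ((x1, v1), (x2, v2)):
                term *= ((-0.25) ** vj * math.comb(2 * vj, vj)
                         * xj * (xj * xj - 1) ** vj)
            total += term
    return total

# Right-hand side of (5.4) with ||phi||_inf = 1.
bound = ((1 + eps) ** (d / 2) * d ** d / math.factorial(d)
         * eps ** D * math.comb(D + d, D) * d)

lo, hi = math.sqrt(1 - eps), math.sqrt(1 + eps)
grid = [lo + k * (hi - lo) / 20 for k in range(21)]
grid += [-t for t in grid]

worst = 0.0
for x1, x2 in itertools.product(grid, repeat=2):
    target = math.copysign(1.0, x1) * math.copysign(1.0, x2)  # sgn(x1) sgn(x2)
    worst = max(worst, abs(target - p(x1, x2)))
```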
6 Main result
We are now in a position to prove the main result of this paper, which states that every bounded real
polynomial can be made robust with only a constant-factor increase in degree. Recall that we have
already proved this fact for homogeneous polynomials (see Theorems 5.1 and 5.2). It remains to remove
the homogeneity assumption, which we will do using the technique of Section 4. For the purposes of
exposition, we will first show how to remove the homogeneity assumption in the much simpler context of
Theorem 5.1. Essentially the same technique will then allow us to prove the main result.
Theorem 6.1. Let φ : {−1, +1}^n → R be given, deg φ = d. Fix symmetric functions δ_i : {−1, +1}^i → R, i = 0, 1, 2, . . . , d, and define Δ : {−1, +1}^n → R by

\[
\Delta(x) = \sum_{|S| \le d} \hat\phi(S)\, \delta_{|S|}(x|_S) \,.
\]

Then

\[
\|\Delta\|_\infty \le (4e)^d\, \|\phi\|_\infty \sum_{i=0}^{d} \|\hat\delta_i\|_1 \,.
\]
The functions δ_0, δ_1, . . . , δ_d in this result are to be thought of as the errors in the approximation of characters of orders 0, 1, . . . , d, respectively, and Δ is the cumulative error so incurred. As the theorem shows, the cumulative error exceeds the norms of the functions involved by a factor of only 2^{O(d)}, which is independent of the number of variables.
Proof of Theorem 6.1. We have

\[
\|\Delta\|_\infty \le \sum_{i=0}^{d} \|\Delta_i\|_\infty \,, \tag{6.1}
\]

where Δ_i : {−1, +1}^n → R is given by Δ_i(x) = ∑_{|S|=i} φ̂(S) δ_i(x|_S). For i = 0, 1, . . . , d, consider φ_i = ∑_{|S|=i} φ̂(S) χ_S, the degree-i homogeneous part of φ. By Theorem 5.1,

\[
\|\Delta_i\|_\infty \le e^i\, \|\phi_i\|_\infty\, \|\hat\delta_i\|_1 \,. \tag{6.2}
\]
By Theorem 4.2,

\[
\|\phi_i\|_\infty \le 4^d\, \|\phi\|_\infty \,, \qquad i = 0, 1, \ldots, d \,. \tag{6.3}
\]
Combining (6.1)–(6.3) completes the proof.
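For a concrete feel for the inequality (6.3), here is a brute-force check on a random function φ on {−1, +1}³. The script is an illustration only, and all names in it are made up.

```python
import itertools
import math
import random

random.seed(2)
n = 3
cube = list(itertools.product([-1, 1], repeat=n))
phi = {x: random.uniform(-1, 1) for x in cube}

def chi(S, x):
    return math.prod(x[j] for j in S)

subsets = [S for k in range(n + 1) for S in itertools.combinations(range(n), k)]
# Fourier coefficients: phi_hat(S) = E_x [ phi(x) chi_S(x) ].
phi_hat = {S: sum(phi[x] * chi(S, x) for x in cube) / len(cube)
           for S in subsets}

d = n  # deg(phi) <= n always holds
norm_phi = max(abs(v) for v in phi.values())

# Each degree-i homogeneous part obeys ||phi_i||_inf <= 4^d ||phi||_inf.
bound_holds = all(
    max(abs(sum(phi_hat[S] * chi(S, x) for S in subsets if len(S) == i))
        for x in cube) <= 4 ** d * norm_phi + 1e-9
    for i in range(d + 1))
```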
We will now apply a similar argument in the setting of real variables.
Theorem 6.2 (Main Theorem). Let 0 < ε < 1 and φ : {−1, +1}^n → [−1, 1] be given, deg φ = d. Then for each δ > 0, there is a polynomial P of degree

\[
O\left( \frac{d}{1-\varepsilon} + \frac{1}{1-\varepsilon} \log\frac{1}{\delta} \right)
\]

such that

\[
\max_{x \in X^n} |\phi(\operatorname{sgn} x_1, \ldots, \operatorname{sgn} x_n) - P(x)| < \delta \,, \tag{6.4}
\]

where X = [−1 − ε, −1 + ε] ∪ [1 − ε, 1 + ε]. Furthermore, P is given explicitly in terms of the Fourier spectrum of φ.
Letting ε = 1/3 in this result settles Theorem 1.1 in the introduction.
Proof. We first consider 0 ≤ ε ≤ 1/10, in which case

\[
X \subset \Biggl[ -\sqrt{1+\frac{1}{4}},\, -\sqrt{1-\frac{1}{4}} \Biggr] \cup \Biggl[ \sqrt{1-\frac{1}{4}},\, \sqrt{1+\frac{1}{4}} \Biggr] . \tag{6.5}
\]
Let D = D(d, δ ) ≥ 1 be a parameter to be chosen later. For i = 0, 1, . . . , d, consider φi = ∑|S|=i φ̂ (S)χS ,
the degree-i homogeneous part of φ . By Theorem 4.2,
\[
\|\phi_i\|_\infty \le 4^d \,, \qquad i = 0, 1, \ldots, d \,. \tag{6.6}
\]
In light of (6.5), Theorem 5.2 gives explicit polynomials p_i : R^i → R, i = 0, 1, 2, . . . , d, each of degree at most 2D + d, such that

\[
\max_{x \in X^n} \Biggl| \phi_i(\operatorname{sgn} x_1, \ldots, \operatorname{sgn} x_n) - \sum_{\substack{S \subseteq \{1, 2, \ldots, n\}, \\ |S| = i}} \hat\phi(S)\, p_i(x|_S) \Biggr| \le \frac{K^d}{2^D}\, \|\phi_i\|_\infty
\]

for some absolute constant K > 1. Letting
\[
P(x) = \sum_{|S| \le d} \hat\phi(S)\, p_{|S|}(x|_S) \,,
\]
we infer that

\[
\max_{x \in X^n} |\phi(\operatorname{sgn} x_1, \ldots, \operatorname{sgn} x_n) - P(x)| \le \frac{K^d}{2^D} \sum_{i=0}^{d} \|\phi_i\|_\infty \le \frac{(d+1)(4K)^d}{2^D} \,,
\]

where the last step uses (6.6). Therefore, (6.4) holds with D = O(d + log 1/δ).
To handle the case 1/10 < ε < 1, basic approximation theory [44] gives an explicit univariate
polynomial r of degree O(1/(1 − ε)) that sends [−1−ε, −1+ε] → [−11/10, −9/10] and [1−ε, 1+ε] →
[9/10, 11/10]. In particular, we have |φ (sgn x1 , . . . , sgn xn ) − P(r(x1 ), . . . , r(xn ))| < δ everywhere on X n ,
where P is the approximant constructed in the previous paragraph.
Remark 6.3. The degree of the robust polynomial in Theorem 1.1 grows additively with O(log 1/δ ),
where δ is the error parameter. As stated in the introduction, this dependence on δ is best possible. To
see this, we may assume that p takes on −1 and +1 on the hypercube {−1, +1}n ; this can be achieved
by appropriately translating and scaling p, without increasing its infinity norm beyond 1. Without loss
of generality, p(1, 1, . . . , 1) = 1 and p(−1, −1, . . . , −1) = −1. As a result, the univariate polynomial p_robust(t, t, . . . , t) would need to approximate the sign function on [−4/3, −2/3] ∪ [2/3, 4/3] to within δ, which forces deg(p_robust) ≥ Ω(log 1/δ) by basic approximation theory [16].
Acknowledgments
The author would like to thank Mark Braverman, Harry Buhrman, Vitaly Feldman, Dmitry Gavinsky,
Ankur Moitra, Ryan O’Donnell, Ronald de Wolf, and the anonymous reviewers for their useful feedback.
In particular, the author is thankful to Ryan for the hard-to-find reference [37] on the growth rate of
coefficients of bounded polynomials.
References
[1] SCOTT AARONSON: Limitations of quantum advice and one-way communication. Theory of Computing, 1(1):1–28, 2005. Preliminary version in CCC’04. [doi:10.4086/toc.2005.v001a001] 594

[2] SCOTT AARONSON AND YAOYUN SHI: Quantum lower bounds for the collision and the element distinctness problems. J. ACM, 51(4):595–605, 2004. Preliminary version in STOC’02. [doi:10.1145/1008731.1008735] 594

[3] JAMES ASPNES, RICHARD BEIGEL, MERRICK L. FURST, AND STEVEN RUDICH: The expressive power of voting polynomials. Combinatorica, 14(2):135–148, 1994. Preliminary version in STOC’91. [doi:10.1007/BF01215346] 594

[4] ROBERT BEALS, HARRY BUHRMAN, RICHARD CLEVE, MICHELE MOSCA, AND RONALD DE WOLF: Quantum lower bounds by polynomials. J. ACM, 48(4):778–797, 2001. Preliminary version in FOCS’98. [doi:10.1145/502090.502097] 594

[5] PAUL BEAME AND DANG-TRINH HUYNH-NGOC: Multiparty communication complexity and threshold circuit size of AC^0. SIAM Journal on Computing, 41(3):484–518, 2012. Preliminary version in FOCS’09. [doi:10.1137/100792779] 594

[6] RICHARD BEIGEL, NICK REINGOLD, AND DANIEL A. SPIELMAN: PP is closed under intersection. J. Comput. System Sci., 50(2):191–202, 1995. Preliminary version in STOC’91. [doi:10.1006/jcss.1995.1017] 594

[7] MARK BRAVERMAN AND ANUP RAO: Towards coding for maximum errors in interactive communication. In Proc. 43rd STOC, pp. 159–166. ACM Press, 2011. [doi:10.1145/1993636.1993659] 594

[8] HARRY BUHRMAN, RICHARD CLEVE, AND AVI WIGDERSON: Quantum vs. classical communication and computation. In Proc. 30th STOC, pp. 63–68. ACM Press, 1998. [doi:10.1145/276698.276713] 596

[9] HARRY BUHRMAN, RICHARD CLEVE, RONALD DE WOLF, AND CHRISTOF ZALKA: Bounds for small-error and zero-error quantum algorithms. In Proc. 40th FOCS, pp. 358–368. IEEE Comp. Soc. Press, 1999. [doi:10.1109/SFFCS.1999.814607] 594
[10] HARRY BUHRMAN, ILAN NEWMAN, HEIN RÖHRIG, AND RONALD DE WOLF: Robust polynomials and quantum algorithms. Theory Comput. Syst., 40(4):379–395, 2007. Preliminary version in STACS’05. [doi:10.1007/s00224-006-1313-z] 594, 595, 596, 599

[11] HARRY BUHRMAN, NIKOLAI K. VERESHCHAGIN, AND RONALD DE WOLF: On computation and communication with small bias. In Proc. 22nd IEEE Conf. on Computational Complexity (CCC’07), pp. 24–32. IEEE Comp. Soc. Press, 2007. (See also SAGA’07.) [doi:10.1109/CCC.2007.18] 594

[12] HARRY BUHRMAN AND RONALD DE WOLF: Communication complexity lower bounds by polynomials. In Proc. 16th IEEE Conf. on Computational Complexity (CCC’01), pp. 120–130. IEEE Comp. Soc. Press, 2001. [doi:10.1109/CCC.2001.933879] 594

[13] ELLIOTT W. CHENEY: Introduction to Approximation Theory. Chelsea Publishing, New York, 2nd edition, 1982. 602

[14] CHINMOY DUTTA, YASHODHAN KANORIA, D. MANJUNATH, AND JAIKUMAR RADHAKRISHNAN: A tight lower bound for parity in noisy communication networks. In Proc. 19th Ann. ACM-SIAM Symp. on Discrete Algorithms (SODA’08), pp. 1056–1065. ACM Press, 2008. [ACM:1347198] 594

[15] CHINMOY DUTTA AND JAIKUMAR RADHAKRISHNAN: Lower bounds for noisy wireless networks using sampling algorithms. In Proc. 49th FOCS, pp. 394–402. IEEE Comp. Soc. Press, 2008. [doi:10.1109/FOCS.2008.72] 594

[16] ALEXANDRE EREMENKO AND PETER YUDITSKII: Uniform approximation of sgn x by polynomials and entire functions. J. d’Analyse Mathématique, 101(1):313–324, 2007. [doi:10.1007/s11854-007-0011-3] 610

[17] WILLIAM S. EVANS AND NICHOLAS PIPPENGER: Average-case lower bounds for noisy Boolean decision trees. SIAM J. Comput., 28(2):433–446, 1998. Preliminary version in STOC’96. [doi:10.1137/S0097539796310102] 594

[18] WILLIAM S. EVANS AND LEONARD J. SCHULMAN: Signal propagation and noisy circuits. IEEE Trans. Inform. Theory, 45(7):2367–2373, 1999. Preliminary version in FOCS’93. [doi:10.1109/18.796377] 594

[19] URIEL FEIGE: On the complexity of finite random functions. Inform. Process. Lett., 44(6):295–296, 1992. [doi:10.1016/0020-0190(92)90102-2] 594

[20] URIEL FEIGE AND JOE KILIAN: Finding OR in a noisy broadcast network. Inform. Process. Lett., 73(1-2):69–75, 2000. [doi:10.1016/S0020-0190(00)00002-8] 594

[21] URIEL FEIGE, PRABHAKAR RAGHAVAN, DAVID PELEG, AND ELI UPFAL: Computing with noisy information. SIAM J. Comput., 23(5):1001–1018, 1994. Preliminary version in STOC’90. [doi:10.1137/S0097539791195877] 594, 595
[22] PÉTER GÁCS AND ANNA GÁL: Lower bounds for the complexity of reliable Boolean circuits with noisy gates. IEEE Trans. Inform. Theory, 40(2):579–583, 1994. Preliminary version in FOCS’91. [doi:10.1109/18.312190] 594

[23] ROBERT G. GALLAGER: Finding parity in a simple broadcast network. IEEE Trans. Inform. Theory, 34(2):176–180, 1988. [doi:10.1109/18.2626] 594

[24] RAN GELLES, ANKUR MOITRA, AND AMIT SAHAI: Efficient and explicit coding for interactive communication. In Proc. 52nd FOCS, pp. 768–777. IEEE Comp. Soc. Press, 2011. [doi:10.1109/FOCS.2011.51] 594

[25] NAVIN GOYAL, GUY KINDLER, AND MICHAEL E. SAKS: Lower bounds for the noisy broadcast problem. SIAM J. Comput., 37(6):1806–1841, 2008. Preliminary version in FOCS’05. [doi:10.1137/060654864] 594, 595

[26] RONALD L. GRAHAM, DONALD E. KNUTH, AND OREN PATASHNIK: Concrete Mathematics: A Foundation for Computer Science. Addison-Wesley, Boston, Mass., USA, 2nd edition, 1994. 598

[27] PETER HØYER, MICHELE MOSCA, AND RONALD DE WOLF: Quantum search on bounded-error inputs. In Proc. 30th Internat. Colloq. on Automata, Languages and Programming (ICALP’03), pp. 291–299. Springer, 2003. [doi:10.1007/3-540-45061-0_25] 596

[28] ADAM TAUMAN KALAI, ADAM R. KLIVANS, YISHAY MANSOUR, AND ROCCO A. SERVEDIO: Agnostically learning halfspaces. SIAM J. Comput., 37(6):1777–1805, 2008. Preliminary version in FOCS’05. [doi:10.1137/060649057] 594

[29] HARTMUT KLAUCK, ROBERT ŠPALEK, AND RONALD DE WOLF: Quantum and classical strong direct product theorems and optimal time-space tradeoffs. SIAM J. Comput., 36(5):1472–1493, 2007. Preliminary version in FOCS’04. [doi:10.1137/05063235X] 594

[30] DANIEL J. KLEITMAN, FRANK THOMSON LEIGHTON, AND YUAN MA: On the design of reliable Boolean circuits that contain partially unreliable gates. J. Comput. System Sci., 55(3):385–401, 1997. Preliminary version in FOCS’94. [doi:10.1006/jcss.1997.1531] 594

[31] ADAM R. KLIVANS AND ROCCO A. SERVEDIO: Learning DNF in time 2^{Õ(n^{1/3})}. J. Comput. System Sci., 68(2):303–318, 2004. Preliminary version in STOC’01. [doi:10.1016/j.jcss.2003.07.007] 594

[32] ADAM R. KLIVANS AND ALEXANDER A. SHERSTOV: Unconditional lower bounds for learning intersections of halfspaces. Machine Learning, 69(2-3):97–114, 2007. Preliminary version in COLT’06. [doi:10.1007/s10994-007-5010-1] 594

[33] ADAM R. KLIVANS AND ALEXANDER A. SHERSTOV: Lower bounds for agnostic learning via approximate rank. Comput. Complexity, 19(4):581–604, 2010. Preliminary version in COLT’07. [doi:10.1007/s00037-010-0296-y] 594
[34] MATTHIAS KRAUSE AND PAVEL PUDLÁK: On the computational power of depth-2 circuits with threshold and modulo gates. Theoret. Comput. Sci., 174(1-2):137–156, 1997. Preliminary version in STOC’94. [doi:10.1016/S0304-3975(96)00019-9] 594

[35] MATTHIAS KRAUSE AND PAVEL PUDLÁK: Computing Boolean functions by polynomials and threshold circuits. Comput. Complexity, 7(4):346–370, 1998. Preliminary version in FOCS’95. [doi:10.1007/s000370050015] 594

[36] EYAL KUSHILEVITZ AND YISHAY MANSOUR: Computation in noisy radio networks. SIAM J. Discrete Math., 19(1):96–108, 2005. Preliminary version in SODA’98. [doi:10.1137/S0895480103434063] 594

[37] VLADIMIR A. MARKOV: On functions of least deviation from zero in a given interval. Russian Academy of Sciences, St. Petersburg, 1892. In Russian. 601, 611

[38] ILAN NEWMAN: Computing in fault tolerant broadcast networks and noisy decision trees. Random Structures & Algorithms, 34(4):478–501, 2009. Preliminary version in CCC’04. [doi:10.1002/rsa.20240] 594

[39] RAMAMOHAN PATURI AND MICHAEL E. SAKS: Approximating threshold circuits by rational functions. Inform. and Comput., 112(2):257–272, 1994. Preliminary version in FOCS’90. [doi:10.1006/inco.1994.1059] 594

[40] NICHOLAS PIPPENGER: On networks of noisy gates. In Proc. 26th FOCS, pp. 30–38. IEEE Comp. Soc. Press, 1985. [doi:10.1109/SFCS.1985.41] 594

[41] ALEXANDER A. RAZBOROV: Quantum communication complexity of symmetric predicates. Izvestiya: Mathematics, 67(1):145–159, 2003. [doi:10.1070/IM2003v067n01ABEH000422] 594

[42] ALEXANDER A. RAZBOROV AND ALEXANDER A. SHERSTOV: The sign-rank of AC^0. SIAM J. Comput., 39(5):1833–1855, 2010. Preliminary version in FOCS’08. [doi:10.1137/080744037] 594

[43] RÜDIGER REISCHUK AND BERND SCHMELTZ: Reliable computation with noisy circuits and decision trees–a general n log n lower bound. In Proc. 32nd FOCS, pp. 602–611. IEEE Comp. Soc. Press, 1991. [doi:10.1109/SFCS.1991.185425] 594

[44] THEODORE J. RIVLIN: An Introduction to the Approximation of Functions. Dover Publications, New York, 1981. 602, 610

[45] LEONARD J. SCHULMAN: Communication on noisy channels: A coding theorem for computation. In Proc. 33rd FOCS, pp. 724–733. IEEE Comp. Soc. Press, 1992. [doi:10.1109/SFCS.1992.267778] 594

[46] LEONARD J. SCHULMAN: Coding for interactive communication. IEEE Trans. Inform. Theory, 42(6):1745–1756, 1996. Preliminary versions in FOCS’92 and STOC’93. [doi:10.1109/18.556671] 594
[47] ALEXANDER A. SHERSTOV: Communication lower bounds using dual polynomials. Bulletin of the EATCS, 95:59–93, 2008. EATCS. 594

[48] ALEXANDER A. SHERSTOV: The intersection of two halfspaces has high threshold degree. In Proc. 50th FOCS, pp. 343–362. IEEE Comp. Soc. Press, 2009. [doi:10.1109/FOCS.2009.18] 594

[49] ALEXANDER A. SHERSTOV: Separating AC^0 from depth-2 majority circuits. SIAM J. Comput., 38(6):2113–2129, 2009. Preliminary version in STOC’07. [doi:10.1137/08071421X] 594

[50] ALEXANDER A. SHERSTOV: Optimal bounds for sign-representing the intersection of two halfspaces by polynomials. In Proc. 42nd STOC, pp. 523–532. ACM Press, 2010. [doi:10.1145/1806689.1806762] 594

[51] KAI-YEUNG SIU, VWANI P. ROYCHOWDHURY, AND THOMAS KAILATH: Rational approximation techniques for analysis of neural networks. IEEE Trans. Inform. Theory, 40(2):455–466, 1994. Preliminary version in NIPS’92. [doi:10.1109/18.312168] 594

[52] DANIEL A. SPIELMAN: Highly fault-tolerant parallel computation. In Proc. 37th FOCS, pp. 154–163. IEEE Comp. Soc. Press, 1996. [doi:10.1109/SFCS.1996.548474] 594

[53] MARIO SZEGEDY AND XIAOMIN CHEN: Computing Boolean functions from multiple faulty copies of input bits. Theoret. Comput. Sci., 321(1):149–170, 2004. Preliminary version in LATIN’02. [doi:10.1016/j.tcs.2003.07.001] 594

[54] JUN TARUI AND TATSUIE TSUKIJI: Learning DNF by approximating inclusion-exclusion formulae. In Proc. 14th IEEE Conf. on Computational Complexity (CCC’99), pp. 215–221. IEEE Comp. Soc. Press, 1999. [doi:10.1109/CCC.1999.766279] 594
AUTHOR
Alexander A. Sherstov
Assistant Professor
University of California, Los Angeles
sherstov cs ucla edu
http://www.cs.ucla.edu/~sherstov
ABOUT THE AUTHOR
ALEXANDER A. SHERSTOV completed his Ph. D. in 2009 at the University of Texas at Austin. After a postdoc at Microsoft Research, Alexander joined UCLA as an assistant professor of computer science. His research interests include computational complexity theory, computational learning theory, and quantum computing.