Limitations of Quantum Advice and One-Way Communication

Authors Scott Aaronson,
Plaintext
                                    T HEORY OF C OMPUTING, Volume 1 (2005), pp. 1–28
                                               http://theoryofcomputing.org




               Limitations of Quantum Advice and
                   One-Way Communication
                                                               Scott Aaronson∗
                                        Received: June 29, 2004; published: February 9, 2005.



        Abstract: Although a quantum state requires exponentially many classical bits to de-
        scribe, the laws of quantum mechanics impose severe restrictions on how that state can be
        accessed. This paper shows in three settings that quantum messages have only limited
        advantages over classical ones.
            First, we show that BQP/qpoly ⊆ PP/poly, where BQP/qpoly is the class of problems
        solvable in quantum polynomial time, given a polynomial-size “quantum advice state” that
        depends only on the input length. This resolves a question of Buhrman, and means that we
        should not hope for an unrelativized separation between quantum and classical advice. Un-
        derlying our complexity result is a general new relation between deterministic and quantum
        one-way communication complexities, which applies to partial as well as total functions.
            Second, we construct an oracle relative to which NP 6⊂ BQP/qpoly. To do so, we
        use the polynomial method to give the first correct proof of a direct product theorem for
        quantum search. This theorem has other applications; for example, it can be used to fix a
        result of Klauck about quantum time-space tradeoffs for sorting.
            Third, we introduce a new trace distance method for proving lower bounds on quan-
        tum one-way communication complexity. Using this method, we obtain optimal quantum
   ∗ Work supported by an NSF Graduate Fellowship.



ACM Classification: F.1.2, F.1.3
AMS Classification: 81P68, 81P05, 68Q10, 68Q15, 42A05
Key words and phrases: quantum computation, advice, relativized complexity, direct product theorem,
nonuniform, BQP, PP, communication, polynomial method, oracle


Authors retain copyright to their work and grant Theory of Computing unlimited rights
to publish the work electronically and in hard copy. Use of the work is permitted as
long as the author(s) and the journal are properly acknowledged. For the detailed
copyright statement, see http://theoryofcomputing.org/copyright.html.

c 2005 Scott Aaronson                                                                   DOI: 10.4086/toc.2005.v001a001
                                                    S COTT A ARONSON

        lower bounds for two problems of Ambainis, for which no nontrivial lower bounds were
        previously known even for classical randomized protocols.
            A preliminary version of this paper appeared in the 2004 Conference on Computational
        Complexity (CCC).



1     Introduction
How many classical bits can “really” be encoded into n qubits? Is it n, because of Holevo’s Theorem
[19]; 2n, because of dense quantum coding [13] and quantum teleportation [9]; exponentially many,
because of quantum fingerprinting [12]; or infinitely many, because amplitudes are continuous? The
best general answer to this question is probably mu, the Zen word that “unasks” a question.1
     To a computer scientist, however, it is natural to formalize the question in terms of quantum one-way
communication complexity [6, 12, 20, 40]. The setting is as follows: Alice has an n-bit string x, Bob
has an m-bit string y, and together they wish to evaluate f (x, y) where f : {0, 1}n × {0, 1}m → {0, 1} is
a Boolean function. After examining her input x = x1 . . . xn , Alice can send a single quantum message
ρx to Bob, whereupon Bob, after examining his input y = y1 . . . ym , can choose some basis in which to
measure ρx . He must then output a claimed value for f (x, y). We are interested in how long Alice’s
message needs to be, for Bob to succeed with high probability on any x, y pair. Ideally the length will
be much smaller than if Alice had to send a classical message.
     Communication complexity questions have been intensively studied in theoretical computer science
(see the book of Kushilevitz and Nisan [23] for example). In both the classical and quantum cases,
though, most attention has focused on two-way communication, meaning that Alice and Bob get to
send messages back and forth. We believe that the study of one-way quantum communication presents
two main advantages. First, many open problems about two-way communication look gruesomely
difficult—for example, are the randomized and quantum communication complexities of every total
Boolean function polynomially related? We might gain insight into these problems by tackling their
one-way analogues first. And second, because of its greater simplicity, the one-way model more directly
addresses our opening question: how much “useful stuff” can be packed into a quantum state? Thus,
results on one-way communication fall into the quantum information theory tradition initiated by Holevo
[19] and others, as much as the communication complexity tradition initiated by Yao [38].
     Related to quantum one-way communication is the notion of quantum advice. As pointed out by
Nielsen and Chuang [28, p.203], there is no compelling physical reason to assume that the starting state
of a quantum computer is a computational basis state:2

       [W]e know that many systems in Nature ‘prefer’ to sit in highly entangled states of many
       systems; might it be possible to exploit this preference to obtain extra computational power?
    1 Another mu-worthy question is, “Where does the power of quantum computing come from? Superposition? Interference?

The large size of Hilbert space?”
    2 One might object that the starting state is itself the outcome of some computational process, which began no earlier than

the Big Bang. However, (1) for all we know highly entangled states were created in the Big Bang, and (2) 14 billion years is
a long time.


                                T HEORY OF C OMPUTING, Volume 1 (2005), pp. 1–28                                             2
                     L IMITATIONS OF Q UANTUM A DVICE AND O NE - WAY C OMMUNICATION

       It might be that having access to certain states allows particular computations to be done
       much more easily than if we are constrained to start in the computational basis.

    One way to interpret Nielsen and Chuang’s provocative question is as follows. Suppose we could re-
quest the best possible starting state for a quantum computer, knowing the language to be decided and the
input length n but not knowing the input itself.3 Denote the class of languages that we could then decide
by BQP/qpoly—meaning quantum polynomial time, given an arbitrarily-entangled but polynomial-
size quantum advice state.4 How powerful is this class? If BQP/qpoly contained (for example) the
NP-complete problems, then we would need to rethink our most basic assumptions about the power
of quantum computing. We will see later that quantum advice is closely related to quantum one-way
communication, since we can think of an advice state as a one-way message sent to an algorithm by a
benevolent “advisor.”
    This paper is about the limitations of quantum advice and one-way communication. It presents three
contributions which are basically independent of one another.
    First, Section 3 shows that D1 ( f ) = O mQ12 ( f ) log Q12 ( f ) for any Boolean function f , partial or
                                                                     

total. Here D1 ( f ) is deterministic one-way communication complexity, Q12 ( f ) is bounded-error one-
way quantum communication complexity, and m is the length of Bob’s input. Intuitively, whenever the
set of Bob’s possible inputs is not too large, Alice can send him a short classical message that lets him
learn the outcome of any measurement he would have wanted to make on the quantum          message ρx . It is
interesting that a slightly tighter bound for total functions—D1 ( f ) = O mQ12 ( f ) —follows easily from
                                                                                     

a result of Klauck [20] together with a lemma of Sauer [34] about VC-dimension. However, the proof
of the latter bound is highly nonconstructive, and seems to fail for partial f .
    Using our communication complexity result, in Section 3.1 we show that BQP/qpoly ⊆ PP/poly—
in other words, BQP with polynomial-size quantum advice can be simulated in PP with polynomial-size
classical advice.5 This resolves a question of Harry Buhrman (personal communication), who asked
whether quantum advice can be simulated in any classical complexity class with short classical advice.
A corollary of our containment is that we cannot hope to show an unrelativized separation between
quantum and classical advice (that is, that BQP/poly 6= BQP/qpoly), without also showing that PP
does not have polynomial-size circuits.
    What makes this result surprising is that, in the minds of many computer scientists, a quantum
state is basically an exponentially long vector. Indeed, this belief seems to fuel skepticism of quantum
computing (see Goldreich [17] for example). But given an exponentially long advice string, even a
classical computer could decide any language whatsoever. So one might imagine naı̈vely that quantum
advice would let us solve problems that are not even recursively enumerable given classical advice of
   3 If we knew the input, we would simply request a starting state that contains the right answer!
   4 BQP/qpoly might remind readers of a better-studied class called QMA (Quantum Merlin-Arthur).          But there are two key
differences: first, advice can be trusted while proofs cannot; second, proofs can be tailored to a particular input while advice
cannot.
    5 Here PP is Probabilistic Polynomial-Time, or the class of languages for which there exists a polynomial-time classical

randomized algorithm that accepts with probability greater than 1/2 if and only if an input x is in the language. Also, given
a complexity class C, the class C/poly consists of all languages decidable by a C machine, given a polynomial-size classical
advice string that depends only on the input length. See www.complexityzoo.com for more information about standard
complexity classes mentioned in this paper.


                                T HEORY OF C OMPUTING, Volume 1 (2005), pp. 1–28                                              3
                                            S COTT A ARONSON

a similar size! The failure of this naı̈ve intuition supports the view that a quantum superposition over
n-bit strings is “more similar” to a probability distribution over n-bit strings than to a 2n -bit string.
     Our second contribution, in Section 4, is an oracle relative to which NP is not contained in
BQP/qpoly. Underlying this oracle separation is the first correct proof of a direct product theorem
for quantum search. Given an N-item√       database
                                                     with K marked items, the direct product theorem says
that if a quantum algorithm makes o N queries, then the probability that the algorithm finds all K
of the marked items decreases exponentially in K. Notice that such a result does not follow from any
existing quantum lower bound. Earlier Klauck [21] claimed a weaker direct product theorem, based on
the hybrid method of Bennett et al. [8], in a paper on quantum time-space tradeoffs for sorting. Un-
fortunately, Klauck’s proof contained a bug. Our proof uses the polynomial method of Beals et al. [7],
with the novel twist that we examine all higher derivatives of a polynomial (not just the first derivative).
Our proof has already been improved by Klauck, Špalek, and de Wolf [22], who were able to recover
and even extend Klauck’s original results about quantum sorting.
     Our final contribution, in Section 5, is a new trace distance method for proving lower bounds on
quantum one-way communication complexity. Previously there was only one basic lower bound tech-
nique: the VC-dimension method of Klauck [20], which relied on lower bounds for quantum random
access codes due to Ambainis et al. [5] and Nayak [27]. Using VC-dimension one can show, for exam-
ple, that Q12 (DISJ) = Ω (n), where the disjointness function DISJ : {0, 1}n × {0, 1}n → {0, 1} is defined
by DISJ (x, y) = 1 if and only if xi yi = 0 for all i ∈ {1, . . . , n}.
     For some problems, however, the VC-dimension method yields no nontrivial quantum lower bound.
Seeking to make this point vividly, Ambainis posed the following problem. Alice is given two elements
x, y of a finite field F p (where p is prime); Bob is given another two elements a, b ∈ F p . Bob’s goal
is to output 1 if y ≡ ax + b (mod p) and 0 otherwise. For this problem, the VC-dimension method
yields no randomized or quantum lower bound better than constant. On the other hand, the well-known
fingerprinting protocol for the equality function [31] seems to fail for Ambainis’ problem, because of
the interplay between addition and multiplication. So it is natural to conjecture that the randomized
and even quantum one-way complexities are Θ (log p)—that is, that no nontrivial protocol exists for this
problem.
     Ambainis posed a second problem in the same spirit. Here Alice is given x ∈ {1, . . . , N}, Bob is
given y ∈ {1, . . . , N}, and both players know a subset S ⊂ {1, . . . , N}. Bob’s goal is to decide whether
x − y ∈ S where
            √      subtraction is modulo N. The conjecture is that if S is chosen uniformly at random with
|S| about N, then with high probability the randomized and quantum one-way complexities are both
Θ (log N).
     Using our trace distance method, we are able to show optimal quantum lower bounds for both of
Ambainis’ problems. Previously, no nontrivial lower bounds were known even for randomized proto-
cols. The key idea is to consider two probability distributions over Alice’s quantum message ρx . The
first distribution corresponds to x chosen uniformly at random; the second corresponds to x chosen uni-
formly conditioned on f (x, y) = 1. These distributions give rise to two mixed states ρ and ρy , which
Bob must be able to distinguish with non-negligible bias assuming he can evaluate f (x, y). We then
show an upper bound on the trace distance kρ − ρy ktr , which implies that Bob cannot distinguish the
distributions.
     Theorem 5.1 gives a very general condition under which our trace distance method works; Corollar-

                           T HEORY OF C OMPUTING, Volume 1 (2005), pp. 1–28                               4
                    L IMITATIONS OF Q UANTUM A DVICE AND O NE - WAY C OMMUNICATION

ies 5.2 and 5.3 then show that the condition is satisfied for Ambainis’ two problems. Besides showing
a significant limitation of the VC-dimension method, we hope our new method is a non-negligible step
towards proving that R12 ( f ) = O Q12 ( f ) for all total Boolean functions f , where R12 ( f ) is randomized
one-way complexity. We conclude in Section 6 with some open problems.
     This paper is a moderately revised version of an extended abstract that appeared in CCC 2004 [2].
The proofs of Theorems 3.4, 3.5, 5.1, 5.2, and 5.4 have been written out in more detail; and discussions
have been added to Sections 2.2 and 3.1, about the group membership problem and the class PQP/qpoly
respectively. Also, an error has been fixed in Section 4: the direct product theorem in [2] based on
Bernstein’s inequality is incorrect. Fortunately, the easier version based on V. A. Markov’s inequality is
still perfectly sufficient to show NP 6⊂ BQP/qpoly relative to an oracle; and in any case, the Bernstein’s
version has been superseded by the results of Klauck, Špalek, and de Wolf [22].


2     Preliminaries
This section reviews basic definitions and results about quantum one-way communication (in Sec-
tion 2.1) and quantum advice (in Section 2.2); then Section 2.3 proves a quantum information lemma
that will be used throughout the paper.


2.1    Quantum One-Way Communication
Following standard conventions, we denote by D1 ( f ) the deterministic one-way complexity of f , or
the minimum number of bits that Alice must send if her message is a function of x. Also, R12 ( f ), the
bounded-error randomized one-way complexity, is the minimum k such that for every x, y, if Alice sends
Bob a k-bit message drawn from some distribution Dx , then Bob can output a bit a such that a = f (x, y)
with probability at least 2/3. (The subscript 2 means that the error is two-sided.) The zero-error
randomized complexity R10 ( f ) is similar, except that Bob’s answer can never be wrong: he must output
 f (x, y) with probability at least 1/2 and otherwise declare failure.
     The bounded-error quantum one-way complexity Q12 ( f ) is the minimum k such that, if Alice sends
Bob a mixed state ρx of k qubits, there exists a joint measurement of ρx and y enabling Bob to output
an a such that a = f (x, y) with probability at least 2/3. The zero-error and exact complexities Q10 ( f )
and Q1E ( f ) are defined analogously. Requiring Alice’s message to be a pure state would increase these
complexities by at most a factor of 2, since by Kraus’ Theorem, every k-qubit mixed state can be realized
as half of a 2k-qubit pure state. (Winter [37] has shown that this factor of 2 is tight.) See Klauck [20]
for more detailed definitions of quantum and classical one-way communication complexity measures.
     It is immediate that D1 ( f ) ≥ R10 ( f ) ≥ R12 ( f ) ≥ Q12 ( f ), that R10 ( f ) ≥ Q10 ( f ) ≥ Q 1
                                                                                                      2 ( f ), and that
                                                                            R0 ( f ) = Θ D ( f ) , while Klauck
   1          1
D ( f ) ≥ QE ( f ). Also, for total f , Duriš et al. [14] showed that          1                1

[20] showed that Q1E ( f ) = D1 ( f ) and that Q10 ( f ) = Θ D1 ( f ) . In other words, randomized and
quantum messages yield no improvement for total functions if we are unwilling to tolerate a bounded
probability of error. This remains true even if Alice and Bob share arbitrarily many EPR pairs [20]. As
is often the case, the situation is dramatically different for partial functions: there it is easy to see that
R10 ( f ) can be constant even though D1 ( f ) = Ω (n): let f (x, y) = 1 if x1 y1 + · · · + xn/2 yn/2 ≥ n/4 and

                              T HEORY OF C OMPUTING, Volume 1 (2005), pp. 1–28                                        5
                                                       S COTT A ARONSON

xn/2+1 yn/2+1 + · · · + xn yn = 0 and f (x, y) = 0 if x1 y1 + · · · + xn/2 yn/2 = 0 and xn/2+1 yn/2+1 + · · · + xn yn ≥
n/4, promised that one of these is the case.
     Moreover, Bar-Yossef, Jayram, and Kerenidis [6] have almost shown that Q1E ( f ) can be exponen-
tially smaller than R12 ( f ). In particular, they proved that separation for a relation, meaning a problem
for which Bob has many possible valid outputs. For a partial function f based on their relation, they
                                                                    √
also showed that Q1E ( f ) = Θ (log n) whereas R10 ( f ) = Θ ( n); and they conjectured (but did not prove)
                    √
that R12 ( f ) = Θ ( n).

2.2    Quantum Advice
Informally, BQP/qpoly is the class of languages decidable in polynomial time on a quantum computer,
given a polynomial-size quantum advice state that depends only on the input length. We now make the
definition more formal.

Definition 2.1. A language L is in BQP/qpoly if there exists a polynomial-size quantum circuit family
{Cn }n≥1 , and a polynomial-size family of quantum states {|ψn i}n≥1 , such that for all x ∈ {0, 1}n ,

   (i) If x ∈ L then q (x) ≥ 2/3, where q (x) is the probability that the first qubit is measured to be |1i,
       after Cn is applied to the starting state |xi ⊗ |0 · · · 0i ⊗ |ψn i.

            / L then q (x) ≤ 1/3.6
  (ii) If x ∈

    The central open question about BQP/qpoly is whether it equals BQP/poly, or BQP with
polynomial-size classical advice. We do have a candidate for an oracle problem separating the two
classes: the group membership problem of Watrous [36], which we describe for completeness. Let Gn
be a black box group7 whose elements are uniquely labeled by n-bit strings, and let Hn be a subgroup of
Gn . Both Gn and Hn depend only on the input length n, so we can assume that a nonuniform algorithm
knows generating sets for both of them. Given an element x ∈ Gn as input, the problem is to decide
whether x ∈ Hn .
    If Gn is “sufficiently nonabelian” and Hn is exponentially large, we do not know how to solve this
problem in BQP or even BQP/poly. On the other hand, we can solve it in BQP/qpoly as follows. Let
our quantum advice state be an equal superposition over all elements of Hn :

                                                                  1
                                                    |Hn i = p            ∑ |yi
                                                                  |Hn | y∈Hn

We can transform |Hn i into
                                                                  1
                                                   |xHn i = p            ∑ |xyi
                                                                  |Hn | y∈Hn
   6 If the starting state is |xi ⊗ |0 · · · 0i ⊗ |ϕi for some |ϕi 6= |ψ i, then we do not require the acceptance probability to lie in
                                                                        n
                                                                                                                             ∗
[0, 1/3]∪[2/3, 1]. Therefore, what we call BQP/qpoly corresponds to what Nishimura and Yamakami [30] call BQP/ Qpoly.
Also, it does not matter whether the circuit family {Cn }n≥1 is uniform, since we are giving it advice anyway.
    7 In other words, we have a quantum oracle available that given x, y ∈ G outputs xy (i.e. exclusive-OR’s xy into an answer
                                                                             n
register), and that given x ∈ Gn outputs x−1 .


                                 T HEORY OF C OMPUTING, Volume 1 (2005), pp. 1–28                                                    6
                  L IMITATIONS OF Q UANTUM A DVICE AND O NE - WAY C OMMUNICATION

by mapping |yi |0i to |yi |xyi to y ⊕ x−1 xy |xyi = |0i |xyi for each y ∈ Hn . Our algorithm will first pre-
                                         √
pare the state (|0i |Hn i + |1i |xHn i) / 2, then apply a Hadamard gate to the first qubit, and finally mea-
sure the first qubit in the standard basis, in order to distinguish the cases |Hn i = |xHn i and hHn |xHn i = 0
with constant bias. The first case occurs whenever x ∈ Hn , and the second occurs whenever x ∈        / Hn .
    Although the group membership problem provides intriguing evidence for the power of quantum
advice, we have no idea how to show that it is not also solvable using classical advice. Indeed, apart
from a result of Nishimura and Yamakami [30] that EESPACE 6⊂ BQP/qpoly, essentially nothing was
known about the class BQP/qpoly before the present work.


2.3   The Almost As Good As New Lemma
The following simple lemma, which was implicit in [5], is used three times in this paper—in Theorems
3.4, 3.5, and 4.7. It says that, if the outcome of measuring a quantum state ρ could be predicted with
near-certainty given knowledge of ρ, then measuring ρ will damage it only slightly. Recall that the
trace distance kρ − σ ktr between two mixed states ρ and σ equals 21 ∑i |λi |, where λ1 , . . . , λN are the
eigenvalues of ρ − σ .

Lemma 2.2. Suppose a 2-outcome measurement of a mixed state ρ yields outcome 0√with probability
1 − ε. Then after the measurement, we can recover a state ρe such that kρe − ρktr ≤ ε. This is true
even if the measurement is a POVM (that is, involves arbitrarily many ancilla qubits).

Proof. Let |ψi be a purification of the entire system (ρ plus ancilla). We can represent any measurement
as a unitary U applied to |ψi, followed by a 1-qubit measurement. Let |ϕ0 i and |ϕ1 i be the two possible
pure states after the measurement; then hϕ0 |ϕ1 i = 0 and U |ψi = α |ϕ0 i + β |ϕ1 i for some α, β such that
|α|2 = 1 − ε and |β |2 = ε. Writing the measurement result as σ = (1 − ε) |ϕ0 i hϕ0 | + ε |ϕ1 i hϕ1 |, it is
easy to show that
                                    σ −U |ψi hψ|U −1 tr = ε (1 − ε).
                                                             p


So applying U −1 to σ ,
                                    U −1 σU − |ψi hψ| tr =
                                                              p
                                                               ε (1 − ε).

Let ρe be the restriction of U −1 σU to the original qubits of ρ. Theorem 9.2 of Nielsen
                                                                                   p and Chuang
                                                                                              √ [28]
shows that tracing out a subsystem never increases trace distance, so kρe − ρktr ≤ ε (1 − ε) ≤ ε.


3     Simulating Quantum Messages
Let f : {0, 1}n × {0, 1}m → {0, 1} be a Boolean function. In this section we first combine existing
results to obtain the relation D1( f ) = O mQ12 ( f ) for total f , and then prove using a new method that
D1 ( f ) = O mQ12 ( f ) log Q12 ( f ) for all f (partial or total).
    Define the communication matrix M f to be a 2n × 2m matrix with f (x, y) in the xth row and yth
column. Then letting rows ( f ) be the number of distinct rows in M f , the following is immediate.



                            T HEORY OF C OMPUTING, Volume 1 (2005), pp. 1–28                                 7
                                                   S COTT A ARONSON

Proposition 3.1. For total f ,

                                         D1 ( f ) = dlog2 rows ( f )e ,
                                         Q12 ( f ) = Ω (log log rows ( f )) .

   Also, let the VC-dimension VC ( f ) equal the maximum k for which there exists a 2n × k submatrix
Mg of M f with rows (g) = 2k . Then Klauck [20] observed the following, based on a lower bound for
quantum random access codes due to Nayak [27].

Proposition 3.2 (Klauck). Q12 ( f ) = Ω (VC ( f )) for total f .

   Now let cols ( f ) be the number of distinct columns in M f . Then Theorem 3.2 yields the following
general lower bound:

Corollary 3.3. D1 ( f ) = O mQ12 ( f ) for total f , where m is the size of Bob’s input.
                                      


Proof. It follows from a lemma of Sauer [34] that

                                               VC( f )          
                                                       cols ( f )
                                rows ( f ) ≤    ∑         i
                                                                   ≤ cols ( f )VC( f )+1 .
                                                i=0

Hence VC ( f ) ≥ logcols( f ) rows ( f ) − 1, so
                                                                                   
                                                                 log rows ( f )
                                  Q12 ( f ) = Ω (VC ( f )) = Ω
                                                                  log cols ( f )
                                                                 1      
                                                                 D (f)
                                                             =Ω             .
                                                                    m



    In particular, D1 ( f ) and Q12 ( f ) are polynomially related for total f , whenever Bob’s input is polyno-
mially smaller than Alice’s, and Alice’s input is not “padded.” More formally, D1 ( f ) = O Q12 ( f )1/(1−c)
whenever m = O (nc ) for some c < 1 and rows ( f ) = 2n (i.e. all rows      of M f are distinct). For then
D1 ( f ) = n by Theorem 3.1, and Q12 ( f ) = Ω D1 ( f ) /nc = Ω n1−c by Corollary 3.3.
    We now give a new method for replacing quantum messages by classical ones when Bob’s
input is small.       Although    the best bound we know how to obtain with this method—D1 ( f ) =
O mQ12 ( f ) log Q12 ( f ) —is slightly weaker than the D1 ( f ) = O mQ12 ( f ) of Corollary 3.3, our method
                          

works for partial Boolean functions as well as total ones. It also yields a (relatively) efficient procedure
by which Bob can reconstruct Alice’s quantum message, a fact we will exploit in Section 3.1 to show
BQP/qpoly ⊆ PP/poly. By contrast, the method based on Sauer’s Lemma seems to be nonconstructive.

Theorem 3.4. D1 ( f ) = O mQ12 ( f ) log Q12 ( f ) for all f (partial or total).
                                                  


                             T HEORY OF C OMPUTING, Volume 1 (2005), pp. 1–28                                  8
                  L IMITATIONS OF Q UANTUM A DVICE AND O NE - WAY C OMMUNICATION

Proof. Let f : D → {0, 1} be a partial Boolean function with D ⊆ {0, 1}n × {0, 1}m , and for all x ∈
{0, 1}n , let Dx = {y ∈ {0, 1}m : (x, y) ∈ D}. Suppose Alice can send Bob a quantum state with Q12 ( f )
qubits, that enables him to compute f (x, y) for any y ∈ Dx with error
                                                                     probability at most 1/3. Then she
can also send him a boosted state ρ with K = O Q2 ( f ) log Q2 ( f ) qubits, such that for all y ∈ Dx ,
                                                     1        1


                                                                            1
                                       |Py (ρ) − f (x, y)| ≤                          ,
                                                                      Q12 ( f )10
where Py (ρ) is the probability that some measurement Λ [y] yields a ‘1’ outcome when applied to ρ. We
can assume for simplicity that ρ is a pure state |ψi hψ|; as discussed in Section 2.1, this increases the
message length by at most a factor of 2.
    Let Y be any subset of Dx satisfying |Y| ≤ Q12 ( f )2 . Then starting with ρ, Bob can measure Λ [y]
for each y ∈ Y in lexicographic order, reusing the same message state again and again but uncomputing
whatever garbage he generates while measuring. Let ρt be the state after the t th measurement; thus
ρ0 = ρ = |ψi hψ|. Since the probability that Bob outputs the wrong value of f (x, y) on any given y is
at most 1/Q12 ( f )10 , Lemma 2.2 implies that
                                                   s
                                                        1            1
                                  kρt − ρt−1 ktr ≤           10
                                                                =           .
                                                      1
                                                     Q2 ( f )     Q2 ( f )5
                                                                   1


Since trace distance satisfies the triangle inequality, this in turn implies that
                                                           t                    1
                                     kρt − ρktr ≤                    ≤                    .
                                                      Q12 ( f )  5
                                                                           Q12 ( f )3
Now imagine an “ideal scenario” in which ρt = ρ for every t; that is, the measurements do not damage
ρ at all. Then the maximum bias with which Bob could distinguish the actual from the ideal scenario is
                                                                                |Y|                  1
                          kρ0 − ρktr + · · · + ρ|Y|−1 − ρ tr ≤                                ≤               .
                                                                           Q12 ( f )3             Q12 ( f )

So by the union bound, Bob will output f (x, y) for every y ∈ Y simultaneously with probability at least
                                                |Y|               1
                                        1−                 −               ≥ 0.9
                                             Q12 ( f )10       Q12 ( f )

for sufficiently large Q12 ( f ).
    Now imagine that the communication channel is blocked, so Bob has to guess what message Alice
wants to send him. He does this by using the K-qubit maximally mixed state I in place of ρ. We can
write I as
                                                K
                                            1 2
                                       I = K ∑ ψj ψj ,
                                           2 j=1
where |ψ1 i , . . . , |ψ2K i are orthonormal vectors such that |ψ1 i = |ψi. So if Bob uses the same procedure
as above except with I instead of ρ, then for any Y ⊆ Dx with |Y| ≤ Q12 ( f )2 , he will output f (x, y) for
every y ∈ Y simultaneously with probability at least 0.9/2K .

                            T HEORY OF C OMPUTING, Volume 1 (2005), pp. 1–28                                      9
                                                        S COTT A ARONSON

      We now give the classical simulation of the quantum protocol. Alice’s message to Bob consists
of T ≤ K inputs y1 , . . . , yT ∈ Dx, together with f (x, y1 ) , . . . , f (x, yT ).8 Thus the message length is
mT + T = O mQ12 ( f ) log Q12 ( f ) . Here are the semantics of Alice’s message: “Bob, suppose you
looped over all y ∈ Dx in lexicographic order; and for each one, guessed that f (x, y) = round (Py (I)),
where round (p) is 1 if p ≥ 1/2 and 0 if p < 1/2. Then y1 is the first y for which you would guess the
wrong value of f (x, y). In general, let It be the state obtained by starting from I and then measuring
Λ [y1 ] , . . . , Λ [yt ] in that order, given that the outcomes of the measurements are f (x, y1 ) , . . . , f (x, yt ) re-
spectively. (Note that It is not changed by measurements of every y ∈ Dx up to yt , only by measurements
of y1 , . . . , yt .) If you looped over all y ∈ Dx in lexicographic order beginning from yt , then yt+1 is the
first y you would encounter for which round (Py (It )) 6= f (x, y).”
      Given the sequence of yt ’s as defined above, it is obvious that Bob can compute f (x, y) for any
y ∈ Dx . First, if y = yt for some t, then he simply outputs f (x, yt ). Otherwise, let t ∗ be the largest t
for which yt < y lexicographically. Then Bob prepares a classical description of the state It ∗ —which
he can do since he knows y1 , . . . , yt ∗ and f (x, y1 ) , . . . , f (x, yt ∗ )—and then outputs round (Py (It ∗ )) as his
claimed value of f (x, y). Notice that, although Alice uses her knowledge of Dx to prepare her message,
Bob does not need to know Dx in order to interpret the message. That is why the simulation works for
partial as well as total functions.
      But why can we assume that the sequence of yt ’s stops at yT for some T ≤ K? Suppose T > K; we
will derive a contradiction. Let Y = {y1 , . . . , yK+1 }. Then |Y| = K + 1 ≤ Q12 ( f )2 , so we know from
previous reasoning that if Bob starts with I and then measures Λ [y1 ] , . . . , Λ [yK+1 ] in that order, he will
observe f (x, y1 ) , . . . , f (x, yK+1 ) simultaneously with probability at least 0.9/2K . But by the definition of
yt , the probability that Λ [yt ] yields the correct outcome is at most 1/2, conditioned on Λ [y1 ] , . . . , Λ [yt−1 ]
having yielded the correct outcomes. Therefore f (x, y1 ) , . . . , f (x, yK+1 ) are observed simultaneously
with probability at most 1/2K+1 < 0.9/2K , contradiction.

3.1    Simulating Quantum Advice
We now apply our new simulation method to upper-bound the power of quantum advice.

Theorem 3.5. BQP/qpoly ⊆ PP/poly.

Proof. For notational convenience, let Ln (x) = 1 if input x ∈ {0, 1}n is in language L, and Ln (x) = 0
otherwise. Suppose Ln is computed by a BQP machine using quantum advice of length p (n). We
will give a PP machine that computes Ln using classical advice of length O (np (n) log p (n)). Because
of the close connection between advice and one-way communication, the simulation method will be
essentially identical to that of Theorem 3.4.
    By using a boosted advice state on K = O (p (n) log p (n)) qubits, a polynomial-time quantum al-
gorithm A can compute Ln (x) with error probability at most 1/p (n)10 . Now the classical advice
to the PP machine consists of T ≤ K inputs x1 , . . . , xT ∈ {0, 1}n , together with Ln (x1 ) , . . . , Ln (xT ).
Let I be the maximally mixed state on K qubits. Also, let Px (ρ) be the probability that A outputs
‘1’ on input x, given ρ as its advice state. Then x1 is the lexicographically first input x for which
   8 Strictly speaking, Bob will be able to compute f (x, y ) , . . . , f (x, y ) for himself given y , . . . , y ; he does not need Alice
                                                           1                   T                     1           T
to tell him the f values.


                                 T HEORY OF C OMPUTING, Volume 1 (2005), pp. 1–28                                                     10
                    L IMITATIONS OF Q UANTUM A DVICE AND O NE - WAY C OMMUNICATION

round (Px (I)) 6= Ln (x). In general, let It be the state obtained by starting with I as the advice and
then running A on x1 , . . . , xt in that order (uncomputing garbage along the way), if we postselect on
A correctly outputting Ln (x1 ) , . . . , Ln (xt ). Then xt+1 is the lexicographically first x > xt for which
round (Px (It )) 6= Ln (x).
    Given the classical advice, we can compute Ln (x) as follows: if x ∈ {x1 , . . . , xT } then output Ln (xt ).
Otherwise let t ∗ be the largest t for which xt < x lexicographically, and output round (Px (It ∗ )). The proof
that this algorithm works is the same as in Theorem 3.4, and so is omitted for brevity. All we need to
show is that the algorithm can be implemented in PP.
    Adleman, DeMarrais, and Huang [4] (see also Fortnow and Rogers [16]) showed that BQP ⊆ PP, by
using what physicists would call a “Feynman sum-over-histories.” Specifically, let C be a polynomial-
size quantum circuit that starts in the all-0 state, and that consists solely of Toffoli and Hadamard gates
(Shi [35] has shown that this gate set is universal). Also, let αz be the amplitude of basis state |zi
after all gates in C have been applied. We can write αz as a sum of exponentially many contributions,
a1 + · · · + aN , where each ai is a rational real number computable in classical polynomial time. So by
evaluating the sum
                                                                  N
                                                    |αz |2 = ∑ ai a j ,
                                                                i, j=1

putting positive and negative terms on “opposite sides of the ledger,” a PP machine can check whether
|αz |2 > β for any rational constant β . It follows that a PP machine can also check whether

                                                ∑ |αz |2 > ∑ |αz |2
                                              z : S1 (z)           z : S0 (z)

(or equivalently, whether Pr [S1 ] > Pr [S0 ]) for any classical polynomial-time predicates S1 and S0 .
     Now suppose the circuit C does the following, in the case x ∈              / {x1 , . . . , xT }. It first prepares the
K-qubit maximally mixed state I (as half of a 2K-qubit pure state), and then runs A on x1 , . . . , xt ∗ , x in
that order, using I as its advice state. The claimed values of Ln (x1 ) , . . . , Ln (xt ∗ ) , Ln (x) are written to
output registers but not measured. For i ∈ {0, 1}, let the predicate Si (z) hold if and only if basis state
|zi contains the output sequence Ln (x1 ) , . . . , Ln (xt ∗ ) , i. Then it is not hard to see that
                                                                  Pr [S1 ]
                                              Px (It ∗ ) =                       ,
                                                             Pr [S1 ] + Pr [S0 ]
so Px (It ∗ ) > 1/2 and hence Ln (x) = 1 if and only if Pr [S1 ] > Pr [S0 ]. Since the case x ∈ {x1 , . . . , xT } is
trivial, this shows that Ln (x) is computable in PP/poly.

    We make five remarks about Theorem 3.5. First, for the same reason that Theorem 3.4 works
for partial as well as total functions, we actually obtain the stronger result that PromiseBQP/qpoly ⊆
PromisePP/poly, where PromiseBQP and PromisePP are the promise-problem versions of BQP and
PP respectively.
    Second, as pointed out to us by Lance Fortnow, a corollary of Theorem 3.5 is that we cannot
hope to show an unrelativized separation between BQP/poly and BQP/qpoly, without also showing
that PP does not have polynomial-size circuits. For BQP/poly 6= BQP/qpoly clearly implies that
P/poly 6= PP/poly. But the latter then implies that PP 6⊂ P/poly, since assuming PP ⊂ P/poly we

                              T HEORY OF C OMPUTING, Volume 1 (2005), pp. 1–28                                          11
                                             S COTT A ARONSON

could also obtain polynomial-size circuits for a language L ∈ PP/poly by defining a new language
L0 ∈ PP, consisting of all (x, a) pairs such that the PP machine would accept x given advice string a.
The reason this works is that PP is a syntactically defined class.
     Third, an earlier version of this paper showed that BQP/qpoly ⊆ EXP/poly, by using a simulation
in which an EXP machine keeps track of a subspace H of the advice Hilbert space to which the ‘true’
advice state must be close. In that simulation, the classical advice specifies inputs x1 , . . . , xT for which
dim (H) is at least halved; the observation that dim (H) must be at least 1 by the end then implies that
T ≤ K = O (p (n) log p (n)), meaning that the advice is of polynomial size. The huge improvement from
EXP to PP came solely from working with measurement outcomes and their probabilities instead of
with subspaces and their dimensions. We can compute the former using the same “Feynman sum-over-
histories” that Adleman et al. [4] used to show BQP ⊆ PP, but could not see any way to compute the
latter without explicitly storing and diagonalizing exponentially large matrices.
     Fourth, assuming BQP/poly 6= BQP/qpoly, Theorem 3.5 is almost the best result of its kind that one
could hope for, since the only classes known to lie between BQP and PP and not known to equal either
are obscure ones such as AWPP [16]. Initially the theorem seemed to us to prove something stronger,
namely that BQP/qpoly ⊆ PostBQP/poly. Here PostBQP is the class of languages decidable by
polynomial-size quantum circuits with postselection—meaning the ability to measure a qubit that has a
nonzero probability of being |1i, and then assume that the measurement outcome will be |1i. Clearly
PostBQP lies somewhere between BQP and PP; one can think of it as a quantum analogue of the
classical complexity class BPPpath [18]. We have since shown, however, that PostBQP = PP [3].
     Fifth, it is clear that Adleman et al.’s BQP ⊆ PP result [4] can be extended to show that PQP = PP.
Here PQP is the quantum analogue of PP—that is, quantum polynomial time but where the probability
of a correct answer need only be bounded above 1/2, rather than above 2/3. A reviewer asked whether
Theorem 3.5 could similarly be extended to show that PQP/qpoly = PP/poly. The answer is no—for
indeed, PQP/qpoly contains every language whatsoever! To see this, given any function Ln : {0, 1}n →
{0, 1}, let our quantum advice state be

                                                 1
                                      |ψn i =          ∑ |xi |Ln (x)i .
                                                2n/2 x∈{0,1}n

Then a PQP algorithm to compute Ln is as follows: given an input x ∈ {0, 1}n , first measure |ψn i in the
standard basis. If |xi |Ln (x)i is observed, output Ln (x); otherwise output a uniform random bit.


4    Oracle Limitations
Can quantum computers solve NP-complete problems in polynomial time? In the early days of quantum
computing, Bennett et al. [8] gave an oracle relative to which NP 6⊂ BQP, providing what is still the
best evidence we have that the answer is no. It is easy to extend Bennett et al.’s result to give an oracle
relative to which NP 6⊂ BQP/poly; that is, NP is hard even for nonuniform quantum algorithms. But
when we try to show NP 6⊂ BQP/qpoly relative to an oracle, a new difficulty arises: even if the oracle
encodes 2n exponentially hard search problems for each input length n, the quantum advice, being an
“exponentially large object” itself, might somehow encode information about all 2n problems. We need

                           T HEORY OF C OMPUTING, Volume 1 (2005), pp. 1–28                                 12
                      L IMITATIONS OF Q UANTUM A DVICE AND O NE - WAY C OMMUNICATION

to argue that even if so, only a miniscule fraction of that information can be extracted by measuring the
advice.
     How does one prove such a statement? As it turns out, the task can be reduced to proving a direct
product theorem for quantum search. This is a theorem that in its weakest form says the following:
given N items, K of which are marked, if we lack enough time to find even one marked item, then the
probability of finding all K items decreases exponentially in K. For intuitively, suppose there were a
quantum advice state that let us efficiently find any one of K marked items. Then by “guessing” the
advice (i.e. replacing it by a maximally mixed state), and then using the guessed advice multiple times,
we could efficiently find all K of the items with a success probability that our direct product theorem
shows is impossible. This reduction is formalized in Theorem 4.7.
     But what about the direct product theorem itself? It seems like it should be trivial to prove—for
surely there are no devious correlations by which success in finding one marked item leads to success
in finding all the others! So it is surprising that even a weak direct product theorem eluded proof for
years. In 2001, Klauck [21] gave an attempted proof using the hybrid method of Bennett et al. [8].
His motivation was to show a limitation of space-bounded quantum sorting algorithms. Unfortunately,
Klauck’s proof contained a bug.9
     In this section we give the first correct proof of a direct product theorem, based on the polynomial
method of Beals et al. [7]. Besides showing that NP 6⊂ BQP/qpoly relative to an oracle, our result can
be used to recover the claims made in [21] about the hardness of quantum sorting (see Klauck, Špalek,
and de Wolf [22] for details). We expect the result to have other applications as well.
     We will need the following lemma of Beals et al. [7], which builds on ideas due to Minsky and
Papert [26] and Nisan and Szegedy [29].
Lemma 4.1 (Beals et al.). Suppose a quantum algorithm makes T queries to an oracle string X ∈
{0, 1}N , and accepts with probability A (X). Then there exists a real polynomial p, of degree at most
2T , such that
                                           p (i) = EX [A (X)]
                                                            |X|=i

for all integers i ∈ {0, . . . , N}, where |X| denotes the Hamming weight of X.
     Lemma 4.1 implies that, to lower-bound the number of queries T made by a quantum algorithm,
it suffices to lower-bound deg (p), where p is a real polynomial representing the algorithm’s expected
acceptance probability. As an example, any quantum algorithm that computes the OR function on N bits,
with success probability at least 2/3, yields a polynomial p such that p (0) ∈ [0, 1/3] and p (i) ∈ [2/3, 1]
for all integers i ∈ {1, . . . , N}. To lower-bound the degree of such a polynomial, one can use an inequality
proved by A. A. Markov in 1890 ([24]; see also [32]):
Theorem 4.2 (A. A. Markov). Given a real polynomial p and constant N > 0, let r(0) = maxx∈[0,N] |p (x)|
and r(1) = maxx∈[0,N] |p0 (x)|. Then
                                                s
                                                   Nr(1)
                                      deg (p) ≥          .
                                                   2r(0)
   9 Specifically, the last sentence in the proof of Lemma 5 in [21] (“Clearly this probability is at least q (p − α)”) is not
                                                                                                             x x
justified by what precedes it.


                                 T HEORY OF C OMPUTING, Volume 1 (2005), pp. 1–28                                          13
                                             S COTT A ARONSON

     Theorem 4.2 deals with the entire range [0, N], whereas in our setting p (x) is constrained only at the
integer points x ∈ {0, . . . , N}. But as shown in [15, 29, 33], this is not a problem. For by elementary
calculus, p (0) ≤ 1/3 and p (1) ≥ 2/3 imply that p0 (x) ≥ 1/3 for some real x ∈ [0, 1], and therefore
r(1) ≥ 1/3. Furthermore, let x∗ be a pointin [0, N] where |p (x∗ )| = r(0) . Then p (bx∗ c) ∈ [0, 1] and
p (dx∗ e) ∈ [0, 1] imply that r(1) ≥ 2 r(0) − 1 . Thus
                                  s          s
                                    Nr(1)
                                                      
                                                N max 1/3, 2 r(0) − 1
                                                                               √ 
                      deg (p) ≥            ≥                               =  Ω     N .
                                     2r(0)               2r(0)
                                                                        √ 
This is the proof of Beals et al. [7] that quantum search requires Ω N queries.
     When proving a direct product theorem, we can no longer apply Theorem 4.2 so straightforwardly.
The reason is that the success probabilities in question are extremely small, and therefore the maximum
derivative r(1) could also be extremely small. Fortunately, though, we can still prove a good lower
bound on the degree of the relevant polynomial p. The key is to look not just at the first derivative of p,
but at higher derivatives.
     To start, we need a lemma about the behavior of functions under repeated differentiation.
Lemma 4.3. Let f : R → R be an infinitely differentiable function such that for some positive integer
K, we have f (i) = 0 for all i ∈ {0, . . . , K − 1} and f (K) = δ > 0. Also, let r(m) = maxx∈[0,N] f (m) (x) ,
where f (m) (x) is the mth derivative of f evaluated at x (thus f (0) = f ). Then r(m) ≥ δ /m! for all
m ∈ {0, . . . , K}.
                                                                                 (m)            (m)
Proof. We
         claim,                                    K − m
               by induction on m, that there exist     + 1 points 0 ≤ x0             < · · · < xK−m ≤ K such
            (m)                                            (m)                                (0)
that f (m) xi      = 0 for all i ≤ K − m − 1 and f (m) xK−m ≥ δ /m!. If we define xi = i, then the
base case m = 0 is immediate from the conditions of the lemma. Suppose the claim is true for m;
                                                                             (m+1)    (m) (m)
then by elementary calculus, for all i ≤ K − m − 2 there exists a point xi         ∈ xi , xi+1 such that
                
           (m+1)                       (m+1)    (m)            (0)                             (m+1)
 f (m+1) xi        = 0. Notice that xi       ≥ xi ≥ · · · ≥ xi = i. So there is also a point xK−m−1 ∈
                
    (m)      (m)
   xK−m−1 , xK−m such that
                                                                            
                                                        (m)              (m)
                                            f (m) xK−m       − f (m) xK−m−1
                                     (m+1)
                            f (m+1) xK−m−1 ≥              (m)      (m)
                                                         xK−m − xK−m−1
                                                    δ /m! − 0
                                              ≥
                                                 K − (K − m − 1)
                                                    δ
                                              =           .
                                                 (m + 1)!


     With the help of Lemma 4.3, we can sometimes lower-bound the degree of a real polynomial even its
first derivative is small throughout the region of interest. To do so, we use the following generalization
of A. A. Markov’s inequality (Theorem 4.2), which was proved by A. A. Markov’s younger brother V.
A. Markov in 1892 ([25]; see also [32]).

                           T HEORY OF C OMPUTING, Volume 1 (2005), pp. 1–28                                14
                  L IMITATIONS OF Q UANTUM A DVICE AND O NE - WAY C OMMUNICATION

Theorem 4.4 (V. A. Markov). Given a real polynomial p of degree d and positive real number N, let
r(m) = maxx∈[0,N] p(m) (x) . Then for all m ∈ {1, . . . , d},
                                        !m
                                2r(0)         (m)
                     r(m) ≤                  Td     (1)
                                 N
                                                                                                   
                                                                  d 2 − 22 · · · · · d 2 − (m − 1)2
                                        !m
                                             d 2 d 2 − 12
                                                                         
                                2r(0)
                          ≤                                                                             .
                                 N                            1 · 3 · 5 · · · · · (2m − 1)

Here Td (x) = cos (d arccos x) is the d th Chebyshev polynomial of the first kind.

    As we demonstrate below, combining Theorem 4.4 with Lemma 4.3 yields a lower bound on deg (p).

Lemma 4.5. Let p be a real polynomial such that

  (i) p (x) ∈ [0, 1] at all integer points x ∈ {0, . . . , N}, and

  (ii) for some positive integer K ≤ N and real δ > 0, we have p (K) = δ and p (i) = 0 for all i ∈
       {0, . . . , K − 1}.
                           √      
     Then deg (p) = Ω        Nδ 1/K .

Proof. Let p(m) and r(m) be as in Theorem 4.4. Then for all m ∈ {1, . . . , deg (p)}, Theorem 4.4 yields
                                               !m
                                        2r (0)            deg (p)2m
                                 r(m) ≤
                                          N       1 · 3 · 5 · · · · · (2m − 1)

Rearranging,                              r
                                               N                                     (m) 1/m
                                                                                        
                              deg (p) ≥             1 · 3 · 5 · · · · · (2m − 1) · r
                                              2r(0)
for all m ≥ 1 (if m > deg (p) then r(m) = 0 so the bound is trivial).
                                              (0)
    There are now two
      (1)       (0)
                      cases. First suppose r ≥ 2. Then as discussed previously, condition (i) implies
that r ≥ 2 r − 1 , and hence that
                                             s                s
                                                  Nr(1)           N r(0) − 1
                                                                                 √ 
                              deg (p) ≥                   ≥                   = Ω   N
                                                  2r(0)              r(0)

by Theorem 4.2. Next suppose r(0) < 2. Then r(m) ≥ δ /m! for all m ≤ K by Lemma 4.3. So setting
m = K yields              s
                                                                δ 1/K
                                                                  
                            N                                            p        
                deg (p) ≥        1 · 3 · 5 · · · · · (2K − 1) ·       =Ω    Nδ 1/K   .
                            4                                   K!
Either way we are done.

                            T HEORY OF C OMPUTING, Volume 1 (2005), pp. 1–28                                15
                                                    S COTT A ARONSON

    Strictly speaking, we do not need the full strength of Theorem 4.4 to prove a lower bound on deg (p)
that suffices for an oracle separation between NP and BQP/qpoly. For we can show a “rough-and-
ready” version of V. A. Markov’s inequality by applying A. A. Markov’s inequality (Theorem 4.2)
repeatedly, to p, p(1) , p(2) , and so on. This yields
                                                                                     m
                                             2                           2
                                  r   (m)
                                            ≤ deg (p)2 r(m−1) ≤            deg (p)2        r(0)
                                             N                           N

for all m.   If deg (p) is small, then this upper bound on r(m) contradicts the lower bound of
Lemma4.3. However, the lower bound  √on deg (p)
                                                 that one gets from A. A. Markov’s inequality is
only Ω     Nδ /K , as opposed to Ω
        p
             1/K                         Nδ 1/K  from Lemma 4.5.10
    Shortly after seeing our proof of a weak direct product theorem, Klauck,√Špalek, and
                                                                                         de Wolf [22]
managed to improve the lower bound on deg (p) to the essentially tight Ω       NKδ  1/K  . In particular,
                                                                                 √ 
their bound implies that δ decreases exponentially in K whenever deg (p) = o NK . They obtained
this improvement by factoring p instead of differentiating it as in Lemma 4.3.
    In any case, a direct product theorem follows trivially from what has already been said.

Theorem 4.6 (Direct Product Theorem). Suppose a quantum algorithm makes T queries to an oracle
string X ∈ {0, 1}N . Let δ be the minimum probability, over all X with Hamming weight |X| = K, that
                                                           K
the algorithm finds all K of the ‘1’ bits. Then δ ≤ cT 2 /N for some constant c.

Proof. Have the algorithm accept if it finds K or more ‘1’ bits and reject otherwise. Let p (i) be the
expected probability of acceptance if X is drawn uniformly at random subject to |X| = i. Then we know
the following about p:

   (i) p (i) ∈ [0, 1] at all integer points i ∈ {0, . . . , N}, since p (i) is a probability.

  (ii) p (i) = 0 for all i ∈ {0, . . . , K − 1}, since there are not K marked items to be found.

 (iii) p (K) ≥ δ .

   Furthermore, Lemma√4.1 implies
                                 that p is a polynomial in i satisfying deg (p) ≤ 2T . It follows from
                                                                    K
Lemma 4.5 that T = Ω    Nδ  1/K                                2
                                 , or rearranging, that δ ≤ cT /N for some constant c.

    We can now prove the desired oracle separation using standard complexity theory tricks.

Theorem 4.7. There exists an oracle relative to which NP 6⊂ BQP/qpoly.

                                                                     √                    
  10 An earlier version of this paper claimed to prove deg (p) = Ω        NK/ log3/2 (1/δ ) , by applying Bernstein’s inequality
[11] rather than A. A. Markov’s to all derivatives p(m) . We have since discovered a flaw in that argument. In any case, the
Bernstein lower bound is both unnecessary for an oracle separation, and superseded by the later results of Klauck et al. [22].


                               T HEORY OF C OMPUTING, Volume 1 (2005), pp. 1–28                                              16
                     L IMITATIONS OF Q UANTUM A DVICE AND O NE - WAY C OMMUNICATION

Proof. Given an oracle A : {0, 1}∗ → {0, 1}, define the language LA by (y, z) ∈ LA if and only if y ≤ z
lexicographically and there exists an x such that y ≤ x ≤ z and A (x) = 1. Clearly LA ∈ NPA for all A.
We argue that for some A, no BQP/qpoly machine M with oracle access to A can decide LA . Without
loss of generality we assume M is fixed, so that only the advice states {|ψn i}n≥1 depend on A. We also
                                                                                        2
assume the advice is boosted, so that M’s error probability on any input (y, z) is 2−Ω(n ) .
    Choose a set S ⊂ {0, 1}n subject to |S| = 2n/10 ; then for all x ∈ {0, 1}n , set A (x) = 1 if and only if
x ∈ S. We claim that by using M, an algorithm could find all 2n/10 elements of S with high probability
after only 2n/10 poly (n) queries to A. Here is how: first use binary search (repeatedly halving the
distance between y and z) to find the lexicographically first element of S. By Lemma 2.2, the boosted
advice state |ψn i is good for 2Ω(n ) uses, so this takes only poly (n) queries. Then use binary search to
                                   2


find the lexicographically second element, and so on until all elements have been found.
    Now replace |ψn i by the maximally mixed state as in Theorem 3.4. This yields an algorithm that uses
no advice, makes 2n/10 poly (n) queries, and finds all 2n/10 elements of S with probability 2−O(poly(n)) .
But taking δ = 2−O(poly(n)) , T = 2n/10 poly (n), N = 2n , and K = 2n/10 , such an algorithm would satisfy
              K
δ  cT 2 /N , which violates the bound of Theorem 4.6.

    Indeed one can show that NP 6⊂ BQP/qpoly relative a random oracle with probability 1.11


5    The Trace Distance Method
This section introduces a new method for proving lower bounds on quantum one-way communication
complexity. Unlike in Section 3, here we do not try to simulate quantum protocols using classical ones.
Instead we prove lower bounds for quantum protocols directly, by reasoning about the trace distance
between two possible distributions over Alice’s quantum message (that is, between two mixed states).
The result is a method that works even if Alice’s and Bob’s inputs are the same size.
    We first state our method as a general theorem; then, in Section 5.1, we apply the theorem to prove
lower bounds for two problems of Ambainis. Let kD − Ek denote the variation distance between prob-
ability distributions D and E.
Theorem 5.1. Let f : {0, 1}n × {0, 1}m → {0, 1} be a total Boolean function. For each y ∈ {0, 1}m , let
Ay be a distribution over x ∈ {0, 1}n such that f (x, y) = 1. Let B be a distribution over y ∈ {0, 1}m , and
let Dk be the distribution over ({0, 1}n )k formed by first choosing y ∈ B and then choosing k samples
independently from Ay . Suppose that Prx∈D1 ,y∈B [ f (x, y) = 0] = Ω (1) and that D2 − D21 ≤ δ . Then
Q12 ( f ) = Ω (log 1/δ ).
Proof. Suppose that if Alice’s input is x, then she sends Bob the l-qubit mixed state ρx . Suppose also
that for every x ∈ {0, 1}n and y ∈ {0, 1}m , Bob outputs f (x, y) with probability at least 2/3. Then by
amplifying a constant number of times, Bob’s success probability can be made 1 − ε for any constant
ε > 0. So with L = O (l) qubits of communication, Bob can distinguish the following two cases with
constant bias:
  11 First group the oracle bits into polynomial-size blocks as Bennett and Gill [10] do, then use the techniques of Aaronson

[1] to show that the acceptance probability is a low-degree univariate polynomial in the number of all-0 blocks. The rest of
the proof follows Theorem 4.7.


                              T HEORY OF C OMPUTING, Volume 1 (2005), pp. 1–28                                            17
                                                  S COTT A ARONSON

   Case I. y was drawn from B and x from D1 .
   Case II. y was drawn from B and x from Ay .
   For in Case I, we assumed that f (x, y) = 0 with constant probability, whereas in Case II, f (x, y) = 1
always. An equivalent way to say this is that with constant probability over y, Bob can distinguish the
mixed states ρ = EXx∈D1 [ρx ] and ρy = EXx∈Ay [ρx ] with constant bias. Therefore
                                      EX kρ − ρy ktr = Ω (1) .
                                                     
                                        y∈B

      We need an upper bound on the trace distance kρ − ρy ktr that is more amenable to analysis. Let
λ1 , . . . , λ2L be the eigenvalues of ρ − ρy . Then
                                                        L
                                             1 2
                               kρ − ρy ktr = ∑ |λi |
                                             2 i=1
                                               v
                                                      2L
                                               u
                                             1u
                                           ≤   t 2 ∑ λi2
                                                   L
                                             2       i=1
                                                    v
                                                    u 2L
                                                                          2
                                           = 2L/2−1 t ∑ (ρ)i j − (ρy )i j
                                                    u
                                                                  i, j=1

where (ρ)i j is the (i, j) entry of ρ. Here the second line uses the Cauchy-Schwarz inequality, and the
third line uses the unitary invariance of the Frobenius norm.
    We claim that                        " L                  #
                                              2                                  2
                                   EX
                                   y∈B i, j=1
                                              ∑ (ρ)i j − (ρy )i j                       ≤ 2δ .

From this claim it follows that
                                                   v
                                                    u 2L                   
                                                                         2
                        EX kρ − ρy ktr ≤ 2L/2−1 EX t ∑ (ρ)i j − (ρy )i j 
                                                  u
                        y∈B                                  y∈B               i, j=1
                                                    v   " L                    #
                                                    u     2                  2
                                           ≤ 2L/2−1 t EX ∑ (ρ)i j − (ρy )i j
                                                    u
                                                                 y∈B i, j=1
                                            √
                                           ≤ 2L−1 δ .
Therefore the message length L must be Ω (log 1/δ ) to ensure that EXy∈B kρ − ρy ktr = Ω (1).
                                                                                   

   Let us now prove the claim. We have
         " L                   #
            2                2      2L         2
                                                                         i       
                                                                                          2
                                                                                            
     EX ∑ (ρ)i j − (ρy )i j      = ∑
                                                                    h
                                                              ∗
                                          (ρ)i j − 2 Re (ρ)i j EX (ρy )i j + EX (ρy )i j
      y∈B i, j=1                     i, j=1                                              y∈B            y∈B

                                      2L               
                                                                       2
                                                                           
                                                                                            2
                                                                                                
                                  = ∑             EX        (ρy )i j           − (ρ)i j             ,
                                     i, j=1       y∈B



                          T HEORY OF C OMPUTING, Volume 1 (2005), pp. 1–28                                    18
                    L IMITATIONS OF Q UANTUM A DVICE AND O NE - WAY C OMMUNICATION
           h        i
since EXy∈B (ρy )i j = (ρ)i j . For a given (i, j) pair,
                                                                       "                    #
                     
                                    2
                                        
                                                         2                     h         i2       h         i2
               EX        (ρy )i j           − (ρ)i j         = EX            EX (ρx )i j      − EX (ρx )i j
               y∈B                                               y∈B        x∈Ay                            x∈D1
                                                                         h                  i        h               i
                                                             = EX         (ρx )∗i j (ρz )i j − EX (ρx )∗i j (ρz )i j
                                                              y∈B,x,z∈Ay                      x,z∈D1
                                                                                           !
                                                             = ∑ Pr [x, z] − Pr [x, z] (ρx )∗i j (ρz )i j .
                                                                 x,z       D2            D21


Now for all x, z,
                                                 2L                                 2L                 2
                                                 ∑ (ρx )∗i j (ρz )i j ≤ ∑ (ρx )i j                         ≤ 1.
                                                i, j=1                             i, j=1

Hence
                                                             !                                                           !
                                                                  2L
                     ∑ DPr [x, z] − DPr [x, z] ∑
                               2                 2
                                                                          (ρx )∗i j (ρz )i j ≤   ∑ DPr [x, z] − DPr [x, z]
                                                                                                            2      2
                     x,z                         1               i, j=1                          x,z               1

                                                                                            = 2 D2 − D21
                                                                                            ≤ 2δ ,

and we are done.

   The difficulty in extending Theorem 5.1 to partial functions is that the distribution D1 might not
make sense, since it might assign a nonzero probability to some x for which f (x, y) is undefined.

5.1   Applications
In this subsection we apply Theorem 5.1 to prove lower bounds for two problems of Ambainis. To
facilitate further research and to investigate the scope of our method, we state the problems in a more
general way than Ambainis did. Given a group G, the coset problem Coset (G) is defined as follows.
Alice is given a left coset C of a subgroup in G, and Bob is given an element y ∈ G. Bob must output 1
if y ∈ C and 0 otherwise. By restricting the group G, we obtain many interesting and natural problems.
For example, if p is prime then Coset (Z p ) is just the equality problem, so the protocol of Rabin and Yao
[31] yields Q12 (Coset (Z p )) = Θ (log log p).

Theorem 5.2. Q12 Coset Z2p = Θ (log p).
                                

Proof. The upper bound is obvious. For the lower bound, it suffices to consider a function f p defined
as follows. Alice is given hx, yi ∈ F2p and Bob is given ha, bi ∈ F2p ; then
                                                                    
                                                                           1 if y ≡ ax + b (mod p)
                                            f p (x, y, a, b) =
                                                                           0 otherwise.

                                   T HEORY OF C OMPUTING, Volume 1 (2005), pp. 1–28                                          19
                                                    S COTT A ARONSON

Let B be the uniform distribution over ha, bi ∈ F2p , and let Aa,b be the uniform distribution over hx, yi
such that y ≡ ax + b (mod p). Thus D1 is the uniform distribution over hx, yi ∈ F2p ; note that

                                                                                    1
                                            Pr         [ f p (x, y, a, b) = 0] = 1 − .
                                     hx,yi∈D1 ,ha,bi∈B                              p

But what about the distribution D2 , which is formed by first drawing ha, bi ∈ B, and then drawing hx, yi
and hz, wi independently from Aa,b ? Given a pair hx, yi , hz, wi ∈ F2p , there are three cases regarding the
probability of its being drawn from D2 :

  (1) hx, yi = hz, wi (p2 pairs). In this case

                           Pr [hx, yi , hz, wi] =
                           D2
                                                         ∑ Pr [ha, bi] Pr [hx, yi , hz, wi | ha, bi]
                                                      ha,bi∈F2p
                                                                    
                                                             1 1             1
                                                   =p          ·         =      .
                                                             p2 p2           p3

  (2) x 6= z (p4 − p3 pairs). In this case there exists a unique ha∗ , b∗ i such that y ≡ a∗ x + b∗ (mod p) and
      w ≡ a∗ z + b∗ (mod p), so

                                Pr [hx, yi , hz, wi] = Pr [ha∗ , b∗ i] Pr [hx, yi , hz, wi | ha∗ , b∗ i]
                               D2
                                                         1 1      1
                                                     =    2
                                                            · 2 = 4.
                                                         p p     p

  (3) x = z but y 6= w (p3 − p2 pairs). In this case PrD2 [hx, yi , hz, wi] = 0.

    Putting it all together,
                                                                                   
                           1     2 1    1             1    1                    1
               D2 − D21                        4   3                3    2
                                                                           
                       =       p     −    + p −p         −     + p − p 0− 4
                           2       p3 p4               p4 p4                     p
                           1     1
                       = − 2.
                           p p

So taking δ = 1/p − 1/p2 , we have Q12 Coset Z2p = Ω (log (1/δ )) = Ω (log p) by Theorem 5.1.
                                                

    We now consider Ambainis’ second problem. Given a group G and nonempty set S ⊂ G with
|S| ≤ |G| /2, the subset problem Subset (G, S) is defined as follows. Alice is given x ∈ G and Bob is
given y ∈ G; then Bob must output 1 if xy ∈ S and 0 otherwise.
    Let M be the distribution over st −1 ∈ G formed by drawing s and t uniformly and independently
from S. Then let ∆ = kM − D1 k, where D1 is the uniform distribution over G.

Proposition 5.3. For all G, S such that |S| ≤ |G| /2,

                                          Q12 (Subset (G, S)) = Ω (log 1/∆) .

                               T HEORY OF C OMPUTING, Volume 1 (2005), pp. 1–28                             20
                 L IMITATIONS OF Q UANTUM A DVICE AND O NE - WAY C OMMUNICATION

Proof. Let B be the uniform distribution over y ∈ G, and let Ay be the uniform distribution over x such
that xy ∈ S. Thus D1 is the uniform distribution over x ∈ G; note that

                                                                     |S|  1
                                           Pr       [xy ∈
                                                        / S] = 1 −       ≥ .
                                      x∈D1 ,y∈B                      |G| 2

We have

                                 1       |{y ∈ G, s,t ∈ S : xy = s, zy = t}|    1
                     D2 − D21 =     ∑
                                 2 x,z∈G                |G| |S|2
                                                                             −
                                                                               |G|2
                                            s,t ∈ S : xz−1 = st −1
                                          
                                 1                                       1
                                = ∑                      2
                                                                     −
                                 2 x,z∈G              |S|               |G|2
                                           s,t ∈ S : x = st −1
                                         
                                 1                                  1
                                = ∑                   2
                                                                 −
                                 2 x∈G            |S|              |G|
                                    1              1
                                =     ∑ Pr
                                    2 x∈G M
                                            [x] −
                                                  |G|
                                = kM − D1 k
                                = ∆.

Therefore log (1/δ ) = Ω (log 1/∆).

    Having lower-bounded Q12 (Subset (G, S)) in terms of 1/∆, it remains only to upper-bound the varia-
tion distance ∆. The following proposition implies that for all constants ε > 0, if S is chosen uniformly
at random subject to |S| = |G|1/2+ε , then Q12 (Subset (G, S)) = Ω (log (|G|)) with constant probability
over S.

Theorem 5.4. For all groups G            K ∈ {1, . . . , |G|}, if S ⊂ G is chosen uniformly at random
                                and integers
                               p
subject to |S| = K, then ∆ = O    |G|/K with Ω (1) probability over S.

Proof. We have                                                    s
                                                                                 1 2
                                                          p                        
                           1            1                   |G|
                        ∆ = ∑ Pr [x] −     ≤                         ∑ Pr [x] − |G|
                           2 x∈G M     |G|                  2        x∈G M

by the Cauchy-Schwarz inequality. We claim that
                                   "                               2 #
                                                               1               c
                                      EX
                                       S
                                                ∑    Pr [x] −
                                                     M        |G|
                                                                           ≤
                                                                               K2
                                            x∈G

for some constant c. From this it follows by Markov’s inequality that
                                    "                         #
                                                  1 2 2c
                                                     
                                                                    1
                                 Pr ∑ Pr [x] −           ≥ 2 ≤
                                  S
                                      x∈G M      |G|       K        2

                          T HEORY OF C OMPUTING, Volume 1 (2005), pp. 1–28                            21
                                                         S COTT A ARONSON

and hence                                            p           r               p         !
                                                         |G|         2c              |G|
                                                ∆≤                      =O
                                                         2           K2              K
with probability at least 1/2.
    Let us now prove the claim. We have
                                                       h         i
                                            Pr [x] = Pr si s−1
                                                            j = x = Pr [si = xs j ] ,
                                            M         i, j                    i, j

where S = {s1 , . . . , sK } and i, j are drawn uniformly and independently from {1, . . . , K}. So by linearity
of expectation,
            "                         #        "                                                          !#
                                  1 2
                                                                           2
                                                                                  2                     1
        EX ∑ Pr [x] −                      = EX ∑            Pr [si = xs j ] −       Pr [si = xs j ] +
          S
              x∈G M             |G|           S
                                                   x∈G
                                                             i, j                |G| i, j              |G|2
                                                                          !                            !
                                                            K
                                                     1                         2          1 K                1
                                           =∑         4 ∑
                                                                  px,i jkl −      ∑ K 2 ∑ px,i j + |G|
                                             x∈G    K  i, j,k,l=1             |G| x∈G       i, j=1

where

                                                 px,i j = Pr [si = xs j ] ,
                                                             S
                                                px,i jkl = Pr [si = xs j ∧ sk = xsl ] .
                                                             S

    First we analyze px,i j . Let ord (x) be the order of x in G. Of the K 2 possible ordered pairs (i, j), there
are K pairs with the “pattern” ii (meaning that i = j), and K (K − 1) pairs with the pattern i j (meaning
that i 6= j). If ord (x) = 1 (that is, x is the identity), then we have px,i j = PrS [si = s j ], so px,i j = 1 under
the pattern ii, and px,i j = 0 under the pattern i j. On the other hand, if ord (x) > 1, then px,i j = 0 under
                               1
the pattern ii, and px,i j = |G|−1 under the pattern i j. So

                                            K                                            
                                   1                     1                      K (K − 1)
                                    2 ∑    ∑
                                   K x∈G i, j=1
                                                px,i j = 2
                                                        K
                                                                  K + (|G| − 1)
                                                                                 |G| − 1
                                                                                            = 1.

    Though unnecessarily cumbersome, the above analysis was a warmup for the more complicated case
of px,i jkl . The following table lists the expressions for px,i jkl , given ord (x) and the pattern of (i, j, k, l).

      Pattern                        Number of such 4-tuples            ord (x) = 1            ord (x) = 2      ord (x) > 2
      iiii, iikk                     K2                                      1                       0                0
                                                                                                     1                1
      i ji j                         K (K − 1)                               0                     |G|−1            |G|−1
                                                                                                     1
      i j ji                         K (K − 1)                               0                     |G|−1              0
      iiil, iiki, i jii, i j j j     4K (K − 1)                              0                       0                0
                                                                                                                      1
      i jki, i j jk                  2K (K − 1) (K − 2)                      0                       0          (|G|−1)(|G|−2)
      iikl, i jkk, i jik, i jk j     4K (K − 1) (K − 2)                      0                       0                0
                                                                                                     1                1
      i jkl                          K (K − 1) (K − 2) (K − 3)               0                 (|G|−1)(|G|−3)   (|G|−1)(|G|−3)


                                    T HEORY OF C OMPUTING, Volume 1 (2005), pp. 1–28                                             22
                          L IMITATIONS OF Q UANTUM A DVICE AND O NE - WAY C OMMUNICATION

Let r be the number of x ∈ G such that ord (x) = 2, and let r0 = |G| − r − 1 be the number such that
ord (x) > 2. Then

                                           K 2 + (2r + r0 ) K(K−1)        0 K(K−1)(K−2)
                                                                                        !
                            K
                1                    1                       |G|−1 + 2r (|G|−1)(|G|−2)
                   ∑ ∑ px,i jkl = K 4
               K 4 x∈G                            + (r + r0 ) K(K−1)(K−2)(K−3)
                       i, j,k,l=1                                (|G|−1)(|G|−3)
                                                  
                                       1            1
                                  ≤          +O
                                    |G| − 3        K2

using the fact that K ≤ |G|.
    Putting it all together,
                     "                          2 #                                 
                                            1                1        1      2   1      1
                    EX
                      S
                            ∑     Pr [x] −
                                  M        |G|
                                                        ≤
                                                          |G| − 3
                                                                  +O
                                                                      K 2
                                                                          −    +
                                                                            |G| |G|
                                                                                    =O
                                                                                        K2
                            x∈G

and we are done.

    From fingerprinting we also have the following upper bound. Let q be the periodicity of S, defined
as the number of distinct sets gS = {gs : s ∈ S} where g ∈ G.
Proposition 5.5. R12 (Subset (G, S)) = O (log |S| + log log q).
Proof. Assume for simplicity that q = |G|; otherwise we could reduce to a subgroup H ≤ G with
|H|
 2= q.2 The protocol       is as follows: Alice draws a uniform random prime p from the range
                    2   2
 |S| log |G| , 2 |S| log |G| ; she then sends Bob the pair (p, x mod p) where x is interpreted as an in-
teger. This takes O (log |S| + log log |G|) bits. Bob outputs 1 if and only if there exists a z ∈ G such
that zy ∈ S and x ≡ z (mod p). To see the protocol’s correctness, observe that if x 6= z,then there at
                                                                                                                  |S|2 log2 |G|
                                                                                                       
most log |G| primes p such that x − z ≡ 0 (mod p), whereas the relevant range contains Ω                        log(|S| log|G|)
primes. Therefore, if xy ∈ / S, then by the union bound
                                                                                  !
                                                                log (|S| log |G|)
                Pr [∃z : zy ∈ S, x ≡ z (mod p)] = O |S| log |G|                     = o (1) .
                 p                                                |S|2 log2 |G|




6      Open Problems
      • Are R12 ( f ) and Q12 ( f ) polynomially related for every total Boolean function f ? Also, can we
        exhibit any asymptotic separation between these measures? The best separation we know of
        is a factor of 2: for the equality function we have R12 (EQ) ≥ (1 − o (1)) log2 n, whereas Winter
        [37] has shown that Q12 (EQ) ≤ (1/2 + o (1)) log2 n using a protocol involving mixed states.12
        This factor-2 savings is tight for equality: a simple counting argument shows that Q12 (EQ) ≥
    12 If we restrict ourselves to pure states, then (1 − o (1)) log
                                                                2 n qubits are needed. Based on that fact, a previous version of
this paper claimed incorrectly that Q12 (EQ) ≥ (1 − o (1)) log2 n.


                                  T HEORY OF C OMPUTING, Volume 1 (2005), pp. 1–28                                            23
                                          S COTT A ARONSON

  (1/2 − o (1)) log2 n; and although the usual randomized protocol for equality [31] uses (2 + o (1))
  log2 n bits, there exist protocols based on error-correcting codes that use only log2 (cn) = log2 n +
  O (1) bits. All of this holds for any constant error probability 0 < ε < 1/2.

• As a first step toward answering the above questions, can we lower-bound Q12 (Coset (G))
  for groups other than Z2p (such as Zn2 , or nonabelian groups)? Also, can we characterize
  Q12 (Subset (G, S)) for all sets S, closing the gap between the upper and lower bounds?

• Is there an oracle relative to which BQP/poly 6= BQP/qpoly?

• Can we give oracles relative to which NP ∩ coNP and SZK are not contained in BQP/qpoly?
  Bennett et al. [8] gave an oracle relative to which NP ∩ coNP 6⊂ BQP, while Aaronson [1] gave
  an oracle relative to which SZK 6⊂ BQP.

• Even more ambitiously, can we prove a direct product theorem for quantum query complexity that
  applies to any partial or total function (not just search)?
                                                  √
• For all f (partial or total), is R12 ( f ) = O ( n) whenever Q12 ( f ) = O (log n)? In other words, is the
  separation of Bar-Yossef et al. [6] the best possible?

• Can the result D1 ( f ) = O mQ12 ( f ) log Q12 ( f ) for partial f be improved to D1 ( f ) = O mQ12 ( f ) ?
                                                                                                           

  We do not even know how to rule out D1 ( f ) = O m + Q12 ( f ) .

• In the Simultaneous Messages (SM) model, there is no direct communication between Alice and
  Bob; instead, Alice and Bob both send messages to a third party called the referee, who then
  outputs the function value. The complexity measure is the sum of the two message lengths.
         ||           ||
  Let R2 ( f ) and Q2 ( f ) be the randomized and quantum bounded-error SM complexities of f re-
                         ||,pub
  spectively, and let R2 ( f ) be the randomized SM complexity if Alice and Bob share an arbi-
  trarily long random string. Building on work by Buhrman et al. [12], Yao [40] showed that
    ||                              ||,pub
  Q2 ( f ) = O (log n) whenever R2 ( f ) = O (1). He then asked about the other direction: for some
                  ||,pub                             ||                           ||
  ε > 0, does R2 ( f ) = O n1/2−ε whenever Q2 ( f ) = O (log n), and does R2 ( f ) = O n1−ε
                                                                                                 
                ||                                                                                   ||
    √  Q2 ( f ) = O (log
  whenever                 n)? In an earlier version of this paper, we showed that R2 ( f ) =
             ||,pub
  O     n R2 ( f ) + log n , which means that a positive answer to Yao’s first question would
  imply a positive answer to the second. Later we learned that Yao independently proved the same
  result [39].
                                                 ||                                                ||,pub
  Here we ask a related question: can Q2 ( f ) ever be exponentially smaller than R2 ( f )?
                                        ||                                               ||
  (Buhrman et al. [12] showed that Q2 ( f ) can be exponentially smaller than R2 ( f ).) Iordanis
  Kerenidis has pointed out to us that, based on the hidden matching problem of Bar-Yossef et al.
                                                                            ||
  [6] discussed in Section 2, one can define a relation for which Q2 ( f ) is exponentially smaller
         ||,pub
  than R2 ( f ). However, as in the case of Q12 ( f ) versus R12 ( f ), it remains to extend that result to
  functions.

                        T HEORY OF C OMPUTING, Volume 1 (2005), pp. 1–28                                    24
                 L IMITATIONS OF Q UANTUM A DVICE AND O NE - WAY C OMMUNICATION

7   Acknowledgments
I thank Oded Regev for pointing out the connection between advice and one-way communication; An-
dris Ambainis for posing the coset and subset problems; Harry Buhrman for asking about an upper
bound on BQP/qpoly; Lance Fortnow for pointing out that P/poly 6= PP/poly implies PP 6⊂ P/poly;
Hartmut Klauck and Ronald de Wolf for discussions that led to improvements of Lemma 4.5; Iordanis
Kerenidis and Andreas Winter for correcting me about the constant factors in R12 (EQ) and Q12 (EQ); and
the anonymous reviewers and ToC editors for helpful comments and corrections.


References
 [1] * S. A ARONSON: Quantum lower bound for the collision problem. In Proc. 34th ACM STOC, pp.
     635–642, 2002. [STOC:509907.509999, arXiv:quant-ph/0111102]. 11, 6

 [2] * S. A ARONSON: Limitations of quantum advice and one-way communication. In 19th IEEE Conf.
     Comput. Complexity (CCC), pp. 320–332, 2004. [CCC:10.1109/CCC.2004.1313854, arXiv:quant-
     ph/0402095]. 1

 [3] * S. A ARONSON: Quantum computing, postselection, and probabilistic polynomial-time. Submit-
     ted, 2004. [arXiv:quant-ph/0412187]. 3.1

 [4] * L. A DLEMAN , J. D E M ARRAIS , AND M.-D. H UANG: Quantum computability. SIAM J. Com-
     puting, 26(5):1524–1540, 1997. [SICOMP:10.1137/S0097539795293639]. 3.1

 [5] * A. A MBAINIS , A. NAYAK , A. TA -S HMA , AND U. V. VAZIRANI: Quantum dense coding and
     quantum finite automata. J. ACM, 49:496–511, 2002. Earlier version in 31st ACM STOC, 1999,
     pp. 376-383. [JACM:581771.581773, arXiv:quant-ph/9804043]. 1, 2.3

 [6] * Z. BAR -YOSSEF, T. S. JAYRAM , AND I. K ERENIDIS: Exponential separation of quantum
     and classical one-way communication complexity. In 36th ACM STOC, pp. 128–137, 2004.
     [STOC:1007352.1007379, ECCC:TR04-036]. 1, 2.1, 6

 [7] * R. B EALS , H. B UHRMAN , R. C LEVE , M. M OSCA , AND R. DE W OLF: Quantum lower bounds
     by polynomials. J. ACM, 48(4):778–797, 2001. Earlier version in 39th IEEE FOCS, 1998, pp.
     352-361. [JACM:502090.502097, arXiv:quant-ph/9802049]. 1, 4, 4

 [8] * C. B ENNETT, E. B ERNSTEIN , G. B RASSARD , AND U. VAZIRANI: Strengths and weaknesses of
     quantum computing. SIAM J. Computing, 26(5):1510–1523, 1997. [SICOMP:30093, arXiv:quant-
     ph/9701001]. 1, 4, 6

 [9] * C. H. B ENNETT, G. B RASSARD , C. C R ÉPEAU , R. J OZSA , A. P ERES , AND W. W OOTTERS:
     Teleporting an unknown quantum state by dual classical and EPR channels. Phys. Rev. Lett.,
     70:1895–1898, 1993. [PRL:10.1103/PhysRevLett.70.1895]. 1

[10] * C. H. B ENNETT AND J. G ILL: Relative to a random oracle A, PA 6= NPA 6= coNPA with proba-
     bility 1. SIAM J. Computing, 10(1):96–113, 1981. [SICOMP:10/0210008]. 11

                         T HEORY OF C OMPUTING, Volume 1 (2005), pp. 1–28                           25
                                         S COTT A ARONSON

[11] * S. N. B ERNSTEIN: Sur l’ordre de la meilleure approximation des fonctions continues par les
     polynômes de degré donné. Mem. Cl. Sci. Acad. Roy. Belg., 4:1–103, 1912. French. 10

[12] * H. B UHRMAN , R. C LEVE , J. WATROUS , AND R. DE W OLF: Quantum fingerprinting. Phys. Rev.
     Lett., 87(16), 2001. 4 pages. [PRL:10.1103/PhysRevLett.87.167902, arXiv:quant-ph/0102001]. 1,
     6

[13] * H. B UHRMAN AND R. DE W OLF: Complexity measures and decision tree complexity: a survey.
     Theoretical Computer Sci., 288:21–43, 2002. [TCS:10.1016/S0304-3975(01)00144-X]. 1

[14] * P. D URI Š , J. H ROMKOVI Č , J. D. P. ROLIM , AND G. S CHNITGER: Las Vegas versus determin-
     ism for one-way communication complexity, finite automata, and polynomial-time computations.
     In 14th Symp. Theoret. Aspects of Comp.Sci. (STACS), pp. 117–128, 1997. 2.1

[15] * H. E HLICH AND K. Z ELLER: Schwankung von Polynomen zwischen Gitterpunkten. Mathema-
     tische Zeitschrift, 86:41–44, 1964. 4

[16] * L. F ORTNOW AND J. ROGERS: Complexity limitations on quantum computation. J. Comput.
     Sys. Sci., 59(2):240–252, 1999. [JCSS:10.1006/jcss.1999.1651, arXiv:cs.CC/9811023]. 3.1

[17] * O. G OLDREICH: On quantum computing. 2004. 1 page. 1

[18] * Y. H AN , L. H EMASPAANDRA , AND T. T HIERAUF: Threshold computation and cryptographic
     security. SIAM J. Computing, 26(1):59–78, 1997. [SICOMP:24046]. 3.1

[19] * A. S. H OLEVO: Some estimates of the information transmitted by quantum communication
     channels. Problems of Information Transmission, 9:177–183, 1973. English translation. 1

[20] * H. K LAUCK: Quantum communication complexity. In 27th Int’l. Colloq. on Automata, Lan-
     guages, and Programming (ICALP), pp. 241–252, 2000. [arXiv:quant-ph/0005032]. 1, 2.1, 3

[21] * H. K LAUCK: Quantum time-space tradeoffs for sorting. In 35th ACM STOC, pp. 69–76, 2003.
     [STOC:780542.780553, arXiv:quant-ph/0211174]. 1, 4, 9

[22] * H. K LAUCK , R. Š PALEK , AND R. DE W OLF: Quantum and classical strong direct prod-
     uct theorems and optimal time-space tradeoffs. In 45th IEEE FOCS, pp. 12–21, 2004.
     [FOCS:10.1109/FOCS.2004.52, arXiv:quant-ph/0402123]. 1, 4, 4, 10

[23] * E. K USHILEVITZ AND N. N ISAN: Communication Complexity. Cambridge, 1997. 1

[24] * A. A. M ARKOV: On a question by D. I. Mendeleev. Zapiski Imperatorskoi Akademii Nauk,
     SP6(62):1–24, 1890. Russian. English translation at http://www.math.technion.ac.il/hat/
     fpapers/markov4.pdf. 4

[25] * V. A. M ARKOV: Über Polynome, die in einem gegebenen Intervalle möglichst wenig von Null
     abweichen. Math. Ann., 77:213–258, 1916. German. Originally written in 1892. 4

                         T HEORY OF C OMPUTING, Volume 1 (2005), pp. 1–28                         26
                 L IMITATIONS OF Q UANTUM A DVICE AND O NE - WAY C OMMUNICATION

[26] * M. M INSKY AND S. PAPERT: Perceptrons (2nd edition). MIT Press, 1988. First appeared in
     1968. 4

[27] * A. NAYAK: Optimal lower bounds for quantum automata and random access codes. In 40th IEEE
     FOCS, pp. 369–377, 1999. [FOCS:10.1109/SFFCS.1999.814608, arXiv:quant-ph/9904093]. 1, 3

[28] * M. N IELSEN AND I. C HUANG: Quantum Computation and Quantum Information. Cambridge
     University Press, 2000. 1, 2.3

[29] * N. N ISAN AND M. S ZEGEDY: On the degree of Boolean functions as real polynomials. Com-
     putational Complexity, 4(4):301–313, 1994. 4, 4

[30] * H. N ISHIMURA AND T. YAMAKAMI: Polynomial time quantum computation with ad-
     vice. Inform. Proc. Lett., 90:195–204, 2003. [IPL:10.1016/j.ipl.2004.02.005, ECCC:TR03-059,
     arXiv:quant-ph/0305100]. 6, 2.2

[31] * M. R ABIN AND A. C-C. YAO: Manuscript, cited in [38], 1979. 1, 5.1, 6

[32] * T. J. R IVLIN: Chebyshev Polynomials: From Approximation Theory to Algebra and Number
     Theory. Wiley, 1990. 4, 4

[33] * T. J. R IVLIN AND E. W. C HENEY: A comparison of uniform approximations on an interval and
     a finite subset thereof. SIAM J. Numerical Analysis, 3(2):311–320, 1966. [SINUM:03/0703024].
     4

[34] * N. S AUER: On the density of families of sets. J. Combinatorial Theory Series A, 13:145–147,
     1972. [JCombThA:10.1016/0097-3165(72)90019-2]. 1, 3

[35] * Y. S HI: Both Toffoli and controlled-NOT need little help to do universal quantum computation.
     Quantum Information and Computation, 3(1):84–92, 2002. [arXiv:quant-ph/0205115]. 3.1

[36] * J. WATROUS: Succinct quantum proofs for properties of finite groups. In 41st IEEE FOCS, pp.
     537–546, 2000. [FOCS:10.1109/SFCS.2000.892141, arXiv:cs.CC/0009002]. 2.2

[37] * A. W INTER: Quantum and classical message identification via quantum channels. In O. Hirota,
     editor, Quantum Information, Statistics, Probability (A. S. Holevo festschrift), pp. 172–189. Rinton,
     2004. [arXiv:quant-ph/0401060]. 2.1, 6

[38] * A. C-C. YAO: Some complexity questions related to distributive computing. In 11th ACM STOC,
     pp. 209–213, 1979. [STOC:804414]. 1, 31

[39] * A. C-C. YAO: Princeton University course assignment, 2001. 6

[40] * A. C-C. YAO: On the power of quantum fingerprinting. In 35th ACM STOC, pp. 77–81, 2003.
     [STOC:780542.780554, ECCC:TR02-074]. 1, 6



                          T HEORY OF C OMPUTING, Volume 1 (2005), pp. 1–28                             27
                                                S COTT A ARONSON

AUTHOR

      Scott Aaronson13
      postdoc
      Institute for Advanced Study, Princeton
      aaronson ias edu
      http://www.cs.berkeley.edu/~aaronson


ABOUT THE AUTHOR

      Scott Aaronson received his Ph.D. at the University of California, Berkeley, in 2004 under
         Umesh Vazirani. He is proud to be the first author in Theory of Computing.




 13 This work was performed while the author was a graduate student at UC Berkeley.



                            T HEORY OF C OMPUTING, Volume 1 (2005), pp. 1–28                       28