The Power of Unentanglement

Authors Scott Aaronson, Salman Beigi, Andrew Drucker, Bill Fefferman, Peter Shor,
Plaintext
                               T HEORY OF C OMPUTING, Volume 5 (2009), pp. 1–42
                                          www.theoryofcomputing.org




                      The Power of Unentanglement
  Scott Aaronson∗                     Salman Beigi†   Andrew Drucker‡                      Bill Fefferman§
                                                Peter Shor¶
                  Received: April 22, 2008; revised: December 27, 2008; published: May 11, 2009.



        Abstract: The class QMA(k), introduced by Kobayashi et al., consists of all languages
        that can be verified using k unentangled quantum proofs. Many of the simplest questions
        about this class have remained embarrassingly open: for example, can we give any evidence
        that k quantum proofs are more powerful than one? Does QMA(k) = QMA(2) for k ≥ 2?
        Can QMA(k) protocols be amplified to exponentially small error?
            In this paper, we make progress on all of the above questions.

    • We give a protocol by which a verifier can be convinced that a 3S AT formula of size m is sat-
      isfiable, with constant soundness, given O(e √m) unentangled quantum witnesses with O(log m)
      qubits each. Our protocol relies on the existence of very short PCPs.

    • We show that assuming a weak version of the Additivity Conjecture from quantum informa-
      tion theory, any QMA(2) protocol can be amplified to exponentially small error, and QMA(k) =
      QMA(2) for all k ≥ 2.

    • We prove the nonexistence of “perfect disentanglers” for simulating multiple Merlins with one.


ACM Classification: F.1.2, F.1.3
AMS Classification: 81P68, 68Q15, 68Q17
Key words and phrases: quantum computing, PCPs, entanglement, Merlin-Arthur, 3SAT
   ∗ To whom correspondence should be addressed. Supported by MIT as well as the NSF IGERT program.
   † Supported in part by NSF Grant CCF-0829421
   ‡ Supported by an Akamai Presidential Graduate Fellowship.
   § Supported by a Caltech Division of Engineering and Applied Science Fellowship and NSF Grants CCF-0830787, CCF-

0829909.
   ¶ Supported in part by NSF Grant CCF-0829421



  2009 S. Aaronson, S. Beigi, A. Drucker, B. Fefferman, and P. Shor
  Licensed under a Creative Commons Attribution License                                DOI: 10.4086/toc.2009.v005a001
                     S. A ARONSON , S. B EIGI , A. D RUCKER , B. F EFFERMAN , AND P. S HOR

1     Introduction
Quantum entanglement is often described as a complicated, hard-to-understand resource. But ironically,
many questions in quantum computing are easiest to answer assuming unlimited entanglement, and
become much more difficult if entanglement is not allowed! One way to understand this is that Hilbert
space—the space of all quantum states—has extremely useful linear-algebraic properties, and when we
restrict to the set of separable states we lose many of those properties. So for example, finding a quantum
state that maximizes the probability of a given measurement outcome is just a principal eigenvector
problem, but finding a separable state that does the same is NP-hard [7].
    These observations naturally give rise to a general question at the intersection of computational
complexity and entanglement theory. Namely: supposing we had k quantum proofs, could we use the
promise that the proofs were unentangled to verify mathematical statements beyond what we could
verify otherwise?


1.1    Background and related work
The class QMA, or Quantum Merlin-Arthur, consists of all languages that admit a proof protocol in
which Merlin sends Arthur a polynomial-size quantum state |ψi, and then Arthur decides whether to
accept or reject in quantum polynomial time. This class was introduced by Knill [17], Kitaev [15], and
Watrous [29] as a quantum analogue of NP. By now we know a reasonable amount about QMA: for
example, it allows amplification of success probabilities, is contained in PP, and has natural complete
promise problems. (See Aharonov and Naveh [2] for a survey.)
    In 2003, Kobayashi, Matsumoto, and Yamakami [19] defined a generalization of QMA called
QMA(k). Here there are k Merlins, who send Arthur k quantum proofs |ψ1 i, . . . , |ψk i respectively that
are guaranteed to be unentangled with each other. (Thus QMA(1) = QMA.) Notice that in the classi-
cal case, this generalization is completely uninteresting: we have MA(k) = MA for all k, since we can
always simulate k Merlins by a single Merlin who sends Arthur a concatenation of the k proofs. In the
quantum case, however, a single Merlin could cheat by entangling the k proofs, and we know of no
general way to detect such entanglement.
    When we try to understand QMA(k), we encounter at least three basic questions. First, do multiple
quantum proofs ever actually help? That is, can we find some sort of evidence that QMA(k) 6= QMA(1)
for some k? Second, can QMA(k) protocols be amplified to exponentially small error? Third, are two
Merlins the most we ever need? That is, does QMA(k) = QMA(2) for all k ≥ 2?1
    We know of three previous results that are relevant to the above questions.
    First, in their original paper on QMA(k), Kobayashi et al. [19] proved that a positive answer to the
second question implies a positive answer to the third. That is, if QMA(k) protocols can be amplified,
then QMA(k) = QMA(2) for all k ≥ 2.
    Second, Liu, Christandl, and Verstraete [21] gave a natural problem from quantum chemistry, called
pure state N-representability, which is in QMA(2) but is not known to be in QMA.
   1 The second and third questions are motivated, in part, by an analogy to classical multi-prover interactive proof systems

—where the Parallel Repetition Theorem of Raz [26] and the MIP(k) = MIP(2) theorem of Ben-Or et al. [3] turned out to be
crucial for understanding the class MIP.


                               T HEORY OF C OMPUTING, Volume 5 (2009), pp. 1–42                                            2
                                        T HE P OWER OF U NENTANGLEMENT

    Third, Blier and Tapp [7] recently (and independently of us) gave an interesting QMA(2) protocol
for an NP-complete problem, namely 3-C OLORING. In this protocol, Arthur verifies that an n-vertex
graph G is 3-colorable, using two unentangled witnesses with only O(log n) qubits each. There is a
crucial caveat, though: if G is not 3-colorable, then Arthur can only detect this with probability Ω(1/n6 )
rather than constant probability.2


1.2   Our results
In this paper, we present new results about all three problems listed previously. Our main results are as
follows:


Proving 3S AT with O(e √m) qubits. In Section 3, we give a protocol by which Arthur can verify that
                                                                  √
a 3S AT instance of size m has a satisfying assignment, using O( m polylog m) unentangled witnesses
with O(log m) qubits each. Of course, this is a larger number of qubits than in the protocol of Blier and
Tapp [7], but the point is that Arthur can detect cheating with constant probability. Our protocol relies
on the PCP Theorem, and in particular, on the existence of PCP’s of size O(m polylog m), which was
recently shown by Dinur [11].


Weak additivity implies amplification. In Section 4, we reduce several open problems about QMA(k)
to weak versions of the Additivity Conjecture in quantum information theory. (In an earlier version of
this paper, we pointed out that the weak versions would suffice, but based our results on the original
Additivity Conjecture, which was widely believed at the time. Then, as the paper was undergoing final
revisions, Hastings [13] announced a spectacular disproof of the Additivity Conjecture.) In particular,
we show that weak versions of Additivity Conjecture imply that any QMA(2) protocol can be amplified
to exponentially small error, that the “QMA(k) hierarchy” collapses to QMA(2), and that forcing the
Merlins’ witnesses to be identical does not change the power of QMA(2).


Nonexistence of perfect disentanglers. In Section 5, we rule out one possible approach to showing
QMA(2) = QMA, by proving an extremely simple result that nevertheless seems new and might be of
interest. Namely, given finite-dimensional Hilbert spaces H, K, there is no quantum operation mapping
the set of all states in H to the set of all separable states in K ⊗ K.
     Note: In an earlier version of this paper, we claimed to give evidence that QMA(2) ⊆ PSPACE,
by showing that this would follow from what we called the “Strong Amplification Conjecture”: that
it is possible to amplify any QMA(2) protocol, in such a way that one of the Merlin’s Hilbert space
dimensions remains small compared to the inverse of the error bound. (The trivial upper bound is
QMA(2) ⊆ NEXP, which follows by just guessing exponential-size classical descriptions of the k quan-
tum proofs.) However, Fernando Brandão subsequently pointed out to us that the Strong Amplification

   2 Indeed, if the soundness gap were constant rather than 1/ poly(n), then Blier and Tapp’s protocol could presumably be

“scaled up by an exponential” to show QMA(2) = NEXP!


                              T HEORY OF C OMPUTING, Volume 5 (2009), pp. 1–42                                          3
                      S. A ARONSON , S. B EIGI , A. D RUCKER , B. F EFFERMAN , AND P. S HOR

Conjecture would actually imply QMA(2) = QMA!3 Thus, the Strong Amplification Conjecture now
seems very unlikely to be true, and while our result was correct, it has been both superseded and effec-
tively nullified in its import. We have not included it in the current version.
    In the remainder of this introduction, we give some intuition behind each of these results.

1.3                      e √m) qubits
       Proving 3SAT with O(
Let ϕ be a 3S AT instance with n variables. Then how long a proof does Merlin need to send Arthur, to
convince him that ϕ is satisfiable? (As usual, Merlin is an omniscient prover and Arthur is a skeptical
BPP verifier.)
     Intuitively, it seems the answer should be about n bits. Certainly, if sublinear-size proofs of satisfia-
bility existed, then 3S AT would be in solvable in 2o(n) time, since Arthur could just loop over all possible
proofs until he found one that worked. Even in the quantum case, one can make a similar statement: if
quantum proofs of satisfiability with o(n) qubits existed, then 3S AT would have a 2o(n) -time quantum
algorithm.
     On the other hand, suppose Arthur is given several quantum proofs, which are guaranteed to be
unentangled with each other. Then the previous argument seems no longer to work. And this at least
raises the possibility that 3S AT might have sublinear proofs in this setting.
     We will show that this possibility is realized:

Theorem 1.1. Let ϕ be a satisfiable 3S AT instance with n variables and m ≥ n clauses. Then one can
                                                                                         √
prove the satisfiability of ϕ, with perfect completeness and constant soundness, using O( m polylog m)
unentangled quantum proofs, each with O(log m) qubits.

    In particular, if m = O(n),4 then we get an almost-quadratic improvement over the best known
witness size in the classical world (or for that matter, in the quantum world with one prover).
    Obviously, Theorem 1.1 does not immediately generalize to all NP-complete problems, since the
blowup in reducing to 3S AT could be more than quadratic. It is an interesting question for which NP-
complete problems an analogue of Theorem 1.1 holds and for which it does not.
   3 Here is a sketch of the argument, which we are grateful to Brandão for allowing us to include. If we have two Merlins A

and B, and the witness of A has only s(n) qubits, then any bipartite pure state |ψAB i can be decomposed in Schmidt form as

                                                                    2s(n)
                                                           |ψAB i = ∑ λi |φi iA |ϕi iB ,
                                                                    i=1

where the |φi iA |ϕi iB ’s are all orthogonal to each other, regardless of the size of B. Now, by assumption, every unentangled state
of the form |φi iA |ϕi iB causes Arthur’s amplified protocol to accept with only a tiny probability—without loss of generality,
less than 2−2s(n) . But this means that Arthur’s acceptance probability on |ψAB i can be at most
                                                                                       !2
                                                   2s(n)                       2s(n)
                                          −2s(n)
                                         2         ∑       λi∗ λ j ≤ 2−2s(n)   ∑ |λi |      ≤ 2−s(n)
                                                   i=1                         i=1

by Cauchy-Schwarz. Therefore, the amplified QMA(2) protocol is still sound even if the Merlins entangle their witnesses, and
hence the language being verified is in QMA.
   4 Note that setting m = O(n) is fairly common in the study of 3S AT , and indeed, the “hardest” random 3S AT instances are

believed to occur around m ≈ 4.25n.


                                 T HEORY OF C OMPUTING, Volume 5 (2009), pp. 1–42                                                  4
                                    T HE P OWER OF U NENTANGLEMENT

    We now explain the intuition behind Theorem 1.1. The first step in our protocol is to reduce 3S AT to
a more convenient problem called 2-O UT-O F -4-SAT, where every clause has exactly four literals, and
is satisfied if and only if exactly two of the literals are. We also want our 2-O UT-O F -4-SAT instance
to be a PCP: that is, either it should be satisfiable, or else at most a 1 − ε fraction of clauses should
be satisfiable for some constant ε > 0. Finally we want the instance to be bounded-literal, meaning
that every variable occurs in at most a constant number of clauses. Fortunately, we can get all of this
via known classical reductions, including the improved PCP Theorem of Dinur [11], which increase the
number of variables and clauses by at most a polylog m factor.
    So suppose Arthur has applied these reductions, to obtain a bounded-literal 2-O UT-O F -4-SAT PCP
instance φ with N variables. And now suppose Merlin sends Arthur a log N-qubit quantum state of the
form
                                                   1 N
                                         |ψi = √ ∑ (−1)xi |ii ,
                                                    N i=1
where x1 , . . . , xN ∈ {0, 1}N is the claimed satisfying assignment for φ . (We call a state having the above
form a proper state.) Then we show that Arthur can check the veracity of x1 , . . . , xN with perfect com-
pleteness and constant soundness. To do so, Arthur simply measures |ψi in a basis corresponding to the
clauses of φ . With constant probability, he will get an outcome of the form

                              (−1)xi |ii + (−1)x j | ji + (−1)xk |ki + (−1)x` |`i

where (i, j, k, `) is a randomly chosen clause of φ . Assuming this occurs, Arthur can perform a measure-
ment that accepts with certainty if xi + x j + xk + x` = 2 and rejects with constant probability otherwise.
     Thus, if only Arthur could somehow assume |ψi was proper, we would have a log N-qubit witness
for 3S AT! The problem, of course, is that Arthur has no way of knowing whether Merlin has cheated
and given him an improper state. For example, what if Merlin concentrates the amplitude of |ψi on
some small subset of basis states, and simply omits the other basis states?         √
     Our key technical contribution is to show that, if Arthur gets not one but O( N) copies of |ψi, then
he can check with constant√soundness whether |ψi is proper or far from any proper state. Indeed, even
if Arthur is given K = O( N) states |ϕ1 i, . . . , |ϕK i which are not necessarily identical, so long as the
states are not entangled with each other Arthur can check with constant soundness whether most of them
are√close to some proper state |ψi. This then yields a protocol for 3S AT with constant soundness and
O( N) unentangled proofs of size O(log N)—for Arthur can just choose randomly whether to perform
the satisfiability test described above, or to check whether most of the |ϕk i’s are close to some proper
state |ψi.
     To check that most of the states are at least close to each other, Arthur simply has to perform a “swap
test” between (say) |ϕ1 i and a random other state |ϕk i. So the problem is reduced to the following:
assuming most of the |ϕk i’s are close to |ϕ1 i, how can Arthur decide whether |ϕ1 i is proper or far from
any proper state?
     In our protocol, Arthur does this by first choosing a matching M on the set {1, . . . , N} uniformly at
random. He then measures each state |ϕk i in an orthonormal basis that contains the vectors |ii + | ji and
|ii − | ji for every edge (i, j) ∈ M.                                                         √
     Let us think about what happens when Arthur does this. Since he is performing O( N) measure-
ments on almost-identical states, and since each measurement has N possible outcomes, by using a

                           T HEORY OF C OMPUTING, Volume 5 (2009), pp. 1–42                                 5
                    S. A ARONSON , S. B EIGI , A. D RUCKER , B. F EFFERMAN , AND P. S HOR

suitable generalization of the Birthday Paradox one can prove that with Ω(1) probability, Arthur will
find a collision: that is, two outcomes of the form |ii ± | ji, for the same edge (i, j) ∈ M. So suppose this
happens. Then if the |ϕk i’s are all equal to a proper state |ψi = ∑Ni=1 (−1)xi |ii, the two outcomes will
clearly “agree”: that is, they will either both be |ii + | ji (if xi = x j ) or both be |ii − | ji (if xi 6= x j ). On
the other hand, suppose the |ϕk i’s are far from any proper state. In that case, we show that the outcomes
will “disagree” (that is, one will be |ii + | ji and the other will be |ii − | ji) with Ω(1) probability.
     To understand why, consider that there are two ways for a state |ϕi = ∑Ni=1 αi |ii to be far from
proper. First, the probability distribution (|α1 |2 , . . . , |αN |2 ), which corresponds to measuring |ϕi in the
standard basis, could be far from the uniform distribution. Second, the αi ’s could be roughly equal in
magnitude, but they could have complex phases that cause |ϕi to be far from any state involving positive
and negative real amplitudes only. In either case, though, if Arthur measures according to a random
matching M, then with high probability he will obtain an outcome αi |ii + α j | ji that is not close to either
|ii + | ji or |ii − | ji.
     As one would imagine, making all of these claims quantitative and proving them requires a good
deal of work.
     The reason we need the assumption of unentanglement is that without it, cheating Merlins might
correlate their states in such a way that a swap test between any two states passes with certainty, and yet
no collisions are ever observed. As we point out in Section 3.7, it√ seems unlikely that the assumption of
unentanglement can be removed, since this would lead to a 2O( m) -time classical algorithm for 3S AT.
                                                                         e

On the other hand, we believe it should be possible to improve our protocol to one involving only two
unentangled proofs. This is a problem we leave to future work.


1.4    Weak additivity implies amplification
In the one-prover case, it is easy to amplify a QMA protocol with constant error to a protocol with
exponentially small error. Merlin simply sends Arthur m = poly(n) copies of his proof; then Arthur
checks each of the copies and outputs the majority answer. To show that this works, the key observation
is that Merlin cannot gain anything by entangling the m proofs. Indeed, because of the convexity of
Arthur’s acceptance probability, Merlin might as well send Arthur an unentangled state |ψi⊗m , in which
case the completeness and soundness errors will decrease exponentially with m by the usual Chernoff
bound.
     Now suppose we try the same argument for QMA(2). If we ask each Merlin to send m copies of
his state, each Merlin might cheat by instead sending an entangled state on m registers. And in that
case, as soon as Arthur checks the first copy (consisting of one register from MerlinA and one from
MerlinB ), his doing so might create entanglement in the remaining copies where there was none before!
This is because of a counterintuitive phenomenon called entanglement swapping [30], by which two
quantum systems that have never interacted in the past can nevertheless become entangled, provided
those systems are entangled with other systems on which an entangling measurement is performed.
     Let us give a small illustration of this phenomenon. Suppose that each “proof” is a single qubit, and
that Arthur asks for two proofs from each Merlin (thus, 4 qubits in total). Then if MerlinA is dishonest,
he might send Arthur the entangled state |ψA i = |00i + |11i, and likewise MerlinB might send Arthur
|ψB i = |00i + |11i (omitting normalization). Now suppose Arthur measures the qubits |ψA i(1) and

                             T HEORY OF C OMPUTING, Volume 5 (2009), pp. 1–42                                       6
                                           T HE P OWER OF U NENTANGLEMENT

|ψB i(1) in the “Bell basis,” consisting of the four entangled states |00i + |11i, |00i − |11i, |01i + |10i,
and |01i − |10i. Then conditioned on the outcome of this measurement, it is not hard to see that the joint
state of |ψA i(2) and |ψB i(2) will also be entangled.5
     Of course, as soon as the remaining m−1 copies become entangled, we lose our soundness guarantee
and the proof of amplification fails.
     Nevertheless, there is a natural amplification procedure that seems like it ought to be robust against
such “pathological” behavior. Suppose Arthur chooses the number of copies m to be very large, say n10
(much larger than the number of copies he is actually going to check), and suppose that each copy he
does check is chosen uniformly at random. Then whatever entanglement Arthur produces during the
checking process ought be “spread out” among the copies, so that with high probability, every copy that
Arthur actually encounters is close to separable.
     It follows, from the “finite quantum de Finetti theorem” of König and Renner [20], that if the number
of copies were large enough then the above argument would work. Unfortunately, the number of copies
needs to be exponential in n for that theorem to apply.
     We will show that the argument works with poly(n) copies, provided one can formalize terms like
“spread out” and “close to separable” using a suitable measure of entanglement. The only problem, then,
is that a measure of entanglement with the properties we want is not yet known to exist! Informally, we
want an entanglement measure E that

   (i) is superadditive (meaning it “spreads itself out” among registers), and

  (ii) is faithful (meaning if E(ρ) is polynomially small then ρ is polynomially close to a separable state
       in trace distance).

     Among existing entanglement measures, there is one—the entanglement of formation EF , introduced
by Bennett et al. [6]—that is known to satisfy (ii), and that until recently was conjectured to satisfy
(i).6 This conjecture is known to be equivalent to the Additivity Conjecture from quantum information
theory [27]. Thus, an earlier version of this paper assumed the Additivity Conjecture in proving several
results.
     In a dramatic recent development, Hastings [13] has shown that the Additivity Conjecture is false.
Fortunately, our results require only weak, asymptotic versions of the Additivity Conjecture, which we
still conjecture are true, and which are stated formally in Section 4.1.
     Our first result says that, if a weak Additivity Conjecture holds, then any QMA(2) protocol can be
amplified to exponentially small error. We also prove, unconditionally, that any QMA(k) protocol with
constant soundness can be simulated by a QMA(2) protocol with Ω(1/k) soundness. Combining these
two results, we find that if a weak Additivity Conjecture holds, then QMA(k) = QMA(2) for all k ≥ 2.7
Another interesting consequence we get is that, assuming a weak Additivity Conjecture, k Merlins who
    5 Indeed, this example can be seen as a special case of quantum teleportation [5]: Arthur uses the entanglement between

MerlinA ’s left and right registers, as well as between MerlinB ’s left and right registers, to teleport an entangled state into the
two right registers by acting only on the two left registers.
    6 There is also another measure—the squashed entanglement E , introduced by Christandl and Winter [10] —that is known
                                                                    sq
to satisfy (i), but unfortunately can be shown not to satisfy (ii).
    7 Kobayashi et al. (personal communication) have shown independently of us that if QMA(2) protocols can be amplified to

exponentially small error, then QMA(k) = QMA(2) for all k ≥ 2.


                                 T HEORY OF C OMPUTING, Volume 5 (2009), pp. 1–42                                                 7
                       S. A ARONSON , S. B EIGI , A. D RUCKER , B. F EFFERMAN , AND P. S HOR

all send copies of the same witness yield the same computational power as k Merlins who can send
different witnesses.


1.5     Nonexistence of perfect disentanglers
While we now have a few examples where multiple quantum proofs seem to help—such as the 3S AT
protocol of this paper, and the pure state N-representability problem [21]—we still have no “complexity-
theoretic” evidence that QMA(2) 6= QMA. Indeed, even proving an oracle separation between QMA(2)
and QMA seems extremely difficult.
     Thus, let us consider the other direction and try to prove these classes are the same. Potentially the
first approach would be to equip Arthur with a disentangler: that is, a quantum operation that would
convert Merlin’s (possibly-entangled) witness into a separable witness, and thereby let Arthur simulate
a QMA(2) protocol in QMA. In this paper we take a first step in the study of disentanglers, by proving
that in finite-dimensional Hilbert spaces, there is no operation that produces all and only the separable
states as output.
     Note that, if we are willing to settle for there being an output close to every separable state, then
a disentangler does exist: simply take as input a classical description of the separable state σ to be
prepared, measure that description in the computational basis, and then prepare σ .8
     Likewise, if we are willing to settle for every output being close to a separable state, then a disentan-
gler also exists. For some sufficiently large N, take as input a quantum state on registers R0 , R1 , . . . , RN ,
choose an index i ∈ [N] uniformly at random, and output the joint state of R0 and Ri (discarding every-
thing else). It follows, from the finite quantum de Finetti theorem [20], that with high probability this
state will be close to separable.
     The key problem with both of these approaches is that the input Hilbert space needs to be exponen-
tially larger than the output Hilbert space. In the case of the “de Finetti approach,” this follows from
considering the maximally antisymmetric state

                                         1
                                        √    ∑    (−1)sgn(σ ) |σ (1)i · · · |σ (N)i ,
                                         N! σ ∈SN

which has the properties that

   (i) there are exponentially many registers (as a function of n = log N, the size of a given register), but

  (ii) the reduced state of any two registers is far from a separable state.

    Watrous (personal communication) has conjectured that this exponentiality is an unavoidable feature
of any approximate disentangler. Proving or disproving this remains one of the central open problems
about QMA(2).
   8 This example also shows that our result fails if the input space is infinite-dimensional—for then one could give an

infinitely-precise description of σ .


                                  T HEORY OF C OMPUTING, Volume 5 (2009), pp. 1–42                                    8
                                          T HE P OWER OF U NENTANGLEMENT

1.6     Table of contents
The rest of the paper is organized as follows.

       Section 2     Preliminaries
       Section 3                         e √m) qubits
                     Proving 3S AT with O(
       Section 4     Weak additivity implies amplification (as well as QMA(k) = QMA(2), etc.)
       Section 5     Nonexistence of perfect disentanglers
       Section 6     List of open problems



2      Preliminaries
In this section, we first define the complexity class QMA(k, a, b), or Quantum Merlin-Arthur with k
unentangled witnesses and error bounds a, b, and state some basic facts and conjectures about this class.
We then survey some concepts from quantum information theory we will need, including trace distance,
superoperators, and the swap test.

2.1     Multiple-prover QMA
Definition 2.1. A language L is in QMA(k, a, b) if there exists a polynomial-time quantum algorithm Q
such that for all inputs x ∈ {0, 1}n :

    (i) If x ∈ L then there exist witnesses |ψ1 i, . . . , |ψk i, with poly(n) qubits each, such that Q accepts
        with probability at least b given |xi ⊗ |ψ1 i ⊗ · · · ⊗ |ψk i.

    (ii) If x ∈
              / L then Q accepts with probability at most a given |xi⊗|ψ1 i⊗· · ·⊗|ψk i, for all |ψ1 i, . . . , |ψk i.

      As a convention, we also define QMA(k) := QMA(k, 1/3, 2/3), and QMA := QMA(1).9

   The above definition makes sense for all integers k from 1 up to poly(n), and nonnegative real
functions 2− poly(n) ≤ a(n) < b(n) ≤ 1 − 2− poly(n) .
   In the one-prover case, we know that QMA(1, 1/3, 2/3) = QMA(1, 2−p(n) , 1 − 2−p(n) ) for all poly-
nomials p (see [22] for example). This is what justifies the convention QMA(1) = QMA(1, 1/3, 2/3).
By contrast, we do not yet know whether the convention QMA(k) = QMA(k, 1/3, 2/3) is justified for
k ≥ 2. That it is justified is the content of the following conjecture:

Conjecture 2.2 (Amplification). QMA(k, a, b) = QMA(k, 2−p(n) , 1 − 2−p(n) ) for all k, all a < b with
b − a = Ω(1/ poly(n)), and all polynomials p.

   One is tempted to make an even stronger conjecture: that the entire hierarchy of QMA(k, a, b)’s we
have defined collapses to just two complexity classes, namely QMA and QMA(2).
    9 For purposes of definition, we assume we have fixed a specific machine model (e. g., a universal set of quantum gates)—

though if the Amplification Conjecture to be discussed shortly holds, then this choice will turn out not to matter.


                                T HEORY OF C OMPUTING, Volume 5 (2009), pp. 1–42                                           9
                    S. A ARONSON , S. B EIGI , A. D RUCKER , B. F EFFERMAN , AND P. S HOR

Conjecture 2.3 (Collapse). QMA(k, a, b) = QMA(2, 2−p(n) , 1 − 2−p(n) ) for all k ≥ 2, all a < b with
b − a = Ω(1/ poly(n)), and all polynomials p.

    The main progress so far on these conjectures has been due to Kobayashi et al. [19], who showed
that the Amplification and Collapse Conjectures are actually equivalent:

Theorem 2.4 ([19]). Conjecture 2.2 implies Conjecture 2.3.

     Let us observe that one can make the completeness error (though not the soundness error) exponen-
tially small, using a simple argument based on Markov’s inequality. We will need this observation in
Section 4.

Lemma 2.5. QMA(k, a, b) ⊆ QMA(k, 1−(b−a), 1−2−p(n) ) for all k, all a < b < 1, and all polynomials
p.

Proof. We use the following protocol. Each Merlin provides m = C · p(n)/(b − a)2 registers for some
constant C. Then Arthur runs his verification procedure m times in parallel, once with each k-tuple of
registers, and accepts if and only if at least a d fraction of invocations accept, for some d slightly less
than b.
     To show completeness, we use a Chernoff bound. Assuming the Merlins are honest, each one simply
provides m copies of his witness. Then on each invocation, Arthur accepts with independent probability
at least b. So assuming we chose a sufficiently large constant C, the probability that Arthur accepts less
than dm times is at most 2−p(n) .
     To show soundness, we use Markov’s inequality. The expected number of accepting invocations
is at most am (by linearity of expectation, this is true even if the registers are entangled). Hence the
probability that this number exceeds dm is at most a/d, which we can ensure is less than 1 − (b − a) by
                   a
choosing d ∈ ( 1−(b−a)  , b) (note that such a d must exist by the assumption a < b < 1).

2.2     Quantum information
We now review some quantum information concepts that we will need. For more details see Nielsen and
Chuang [23] for example.
    Given two mixed states ρ and σ , their trace distance is kρ − σ ktr := 21 ∑ni=1 |λi |, where (λ1 , . . . , λn )
are the eigenvalues of ρ − σ . We say σ is ε-close to ρ if kρ − σ ktr ≤ ε, and ε-far otherwise. The
importance of trace distance comes from the following fact:

Proposition 2.6. Suppose σ is ε-close to ρ. Then any measurement that accepts ρ with probability p,
accepts σ with probability at most p + ε.

      Trace distance also satisfies the triangle inequality:

Proposition 2.7. kρ − σ ktr + kσ − ξ ktr ≥ kρ − ξ ktr for all ρ, σ , ξ .

    Given a pure state |ψi and a mixed state ρ, their squared fidelity hψ|ρ|ψi is the probability of
obtaining |ψi as the result of a projective measurement on ρ. Squared fidelity behaves nicely under
tensor products:

                             T HEORY OF C OMPUTING, Volume 5 (2009), pp. 1–42                                   10
                                      T HE P OWER OF U NENTANGLEMENT

Proposition 2.8. Given a k-partite state ρ A1 A2 ···Ak , suppose there are pure states |ψ1 i, . . . , |ψk i such that
hψi |ρ Ai |ψi i ≥ 1 − εi for all i. Let |Ψi := |ψ1 i ⊗ · · · ⊗ |ψk i and ε := ε1 + · · · + εk . Then

                                             Ψ|ρ A1 A2 ···Ak |Ψ ≥ 1 − ε .

Proof. We can assume without loss of generality that |ψi i = |0i for all i. Then each ρ Ai , when measured
in the standard basis, yields the outcome |0i with probability at least 1 − εi . By the union bound,
it follows that ρ A1 A2 ···Ak , when measured in the standard basis, yields the outcome |Ψi = |0i⊗k with
probability at least 1 − ε. Hence hΨ|ρ A1 A2 ···Ak |Ψi ≥ 1 − ε.

     Trace distance and squared fidelity are related to each other as follows:
Proposition 2.9. For all ρ and |ψi,
                                                                     2
                                        hψ|ρ|ψi + ρ − |ψi hψ|              ≤ 1.
                                                                     tr

   The most general kind of operation on quantum states is called a superoperator. Any superoperator
Φ acting on n qubits can be expressed in the following operator-sum representation:
                                           22n                       22n
                                 Φ(ρ) = ∑        Ei ρEi† ,   where   ∑ Ei† Ei = I .
                                           i=1                       i=1

    Given a product state ρ ⊗ σ , the swap test is a quantum operation that measures the overlap between
ρ and σ . The test accepts with probability 21 (1 + tr(ρσ )) and rejects otherwise. The swap test can also
reveal information about the purity of a state, as follows:
Proposition 2.10. Suppose hψ|ρ|ψi < 1 − ε for all pure states |ψi. Then a swap test between ρ and
any other state rejects with probability greater than ε/2.
Proof. Choose a basis that diagonalizes ρ, so that ρ = diag(λ1 , . . . , λN ) where λ1 , . . . , λN are ρ’s eigen-
values. By assumption, λi < 1 − ε for every i. So given any mixed state σ , a swap test between ρ and σ
accepts with probability

                          1 + tr (ρσ ) 1 1 N        1 1−ε N           ε
                               2
                                      = + ∑ λi σii < +
                                       2 2 i=1      2     ∑ σii = 1 − 2 .
                                                       2 i=1




3     Proving 3SAT with quadratically fewer qubits
                                                                                                e √m)
We now present our protocol for proving the satisfiability of a 3S AT instance of size m, using O(
unentangled quantum proofs with O(log m) qubits each. For ease of presentation, the protocol will be
broken into a sequence of four steps:

    (1) In Section 3.1, we give a sequence of classical reductions, from the original 3S AT problem to a
        different NP-complete problem that we will actually use.

                             T HEORY OF C OMPUTING, Volume 5 (2009), pp. 1–42                                     11
                   S. A ARONSON , S. B EIGI , A. D RUCKER , B. F EFFERMAN , AND P. S HOR

  (2) In Section 3.2, we describe a protocol for the special case where Merlin’s message to Arthur is
      “proper”: that is, of the form √1N ∑Ni=1 (−1)xi |ii for some Boolean x1 , . . . , xN .

  (3) In Section 3.3, we generalize our protocol to the case where the Merlins send Arthur O(    e √m)
      witnesses, which are not necessarily proper but which are guaranteed to be identical to each other.
  (4) In Section 3.5, we remove the restriction that the states be identical.
   We end in Section 3.7 with some general observations about our protocol and the prospects for
improving it further.

3.1   Classical reductions
It will be convenient to work not with 3S AT but with a related problem called 2-O UT-O F -4-SAT, in
which every clause has exactly four literals, and is satisfied if and only if exactly two of the literals are.
We will also need our 2-O UT-O F -4-SAT instance to be a PCP, and to be bounded-literal (that is, every
variable should appear in at most O(1) clauses). The following lemma shows how to get everything we
want with only a polylogarithmic blowup in the number of variables and clauses.
Lemma 3.1. There exists a polynomial-time Karp reduction that maps a 3S AT instance ϕ to a 2-O UT-
O F -4-SAT instance φ , and that has the following properties:
  (i) If ϕ has n variables and m ≥ n clauses, then φ has O(m polylog m) variables and O(m polylog m)
      clauses.
  (ii) Every variable of φ occurs in at most c clauses, for some constant c.
 (iii) The reduction is a PCP reduction (meaning that satisfiable instances map to satisfiable instances,
       while unsatisfiable instances map to instances that are ε-far from satisfiable for some constant
       ε > 0).
Proof. Given a 3S AT instance ϕ, we first amplify its soundness gap to a constant using the celebrated
method of Dinur [11]. Next we use a reduction due to Papadimitriou and Yannakakis [25], which makes
every variable occur in exactly 29 clauses, without destroying the soundness gap. Finally we use a
gadget due to Khanna et al. [14], which converts from 3S AT to 2-O UT-O F -4-SAT, without destroying
either the soundness gap or the bounded literal property. Note that the reduction of Dinur [11] incurs
only a polylogarithmic blowup in the total size of the instance, while the other two reductions incur a
constant blowup.

3.2   The proper state case
Suppose Arthur has applied Lemma 3.1, to obtain a bounded-literal 2-O UT-O F -4-SAT instance φ with
N = O(m polylog m) variables, M = O(m polylog m) clauses, and a constant soundness gap ε > 0. And
now suppose Merlin sends Arthur a log N-qubit state of the form
                                                1 N
                                         |ψi = √ ∑ (−1)xi |ii ,
                                                 N i=1

                           T HEORY OF C OMPUTING, Volume 5 (2009), pp. 1–42                                12
                                      T HE P OWER OF U NENTANGLEMENT

where x1 , . . . , xN ∈ {0, 1}N is a claimed satisfying assignment for φ . Call a state having the above form
(for some Boolean xi ’s) a proper state. Then we claim the following:
Lemma 3.2. Assuming |ψi is proper, Arthur can check whether φ is satisfiable with perfect complete-
ness and constant soundness.
Proof. To perform the check, Arthur uses the following Satisfiability Test. First he partitions the clauses
of φ into a constant number of blocks B1 , . . . , Bs , such that within each block, no two clauses share a
variable. Such a partition clearly exists by the assumption that φ is bounded-literal, and furthermore can
be found efficiently (e. g., using a greedy algorithm). Next he chooses one of the blocks Br uniformly at
random, and measures |ψi in an orthonormal basis with one projector for each clause in Br . Because a
single block in the partition of clauses does not necessarily cover all the variables, it is possible that the
measurement result will not correspond to any clause in Br , in which case Arthur accepts. However, sup-
pose that the measurement yields the following reduced state, for some random clause Ci jk` := (i, j, k, `)
in Br :
                                  1
                        ψi jkl := (−1)xi |ii + (−1)x j | ji + (−1)xk |ki + (−1)x` |`i .
                                                                                           
                                  2
Notice that, of the 16 possible assignments to the variables (xi , x j , xk , x` ), six of them satisfy Ci jk` , and
those six lead to three states |ψi jk` i that are orthogonal to one another (as well as the negations of those
states, which are essentially the same). It follows that Arthur can perform a projective measurement on
|ψi jk` i, which accepts with probability 1 if Ci jk` is satisfied, and rejects with constant probability if Ci jk`
is unsatisfied. Furthermore, because the number of blocks Br is a constant, each of the M clauses of
φ is checked in this test with probability Ω(1/M). And we know that, if x1 , . . . , xN is not a satisfying
assignment for φ , then a constant fraction of the clauses will be unsatisfied.
     Putting everything together, we find that if φ is satisfiable, then the Satisfiability Test accepts |ψi
with probability 1; while if φ is unsatisfiable, then it rejects with constant probability.

3.3   The symmetric case
Thus, the problem we need to solve is “merely” how to force Merlin to send a proper state. For example,
how can Arthur prevent a cheating Merlin from concentrating the amplitude of |ψi on some subset of
basis states for which the Satisfiability Test accepts, and omitting the other basis states?
    To solve
         √ this problem, Arthur is going to need more Merlins. In particular, let us suppose there are
K = Θ( N) unentangled Merlins, who send Arthur log N-qubit states |ϕ1 i, . . . , |ϕK i respectively. By
convexity, we can assume without loss of generality that these states are pure. For the time being, we
also assume that the states are identical; that is, |ϕi i = |ϕi for all i ∈ [K]. Given these states, Arthur
performs one of the following two tests, each with probability 1/2:
Satisfiability Test: Arthur chooses any copy of |ϕi, and performs the Satisfiability Test described in
       Section 3.2.
Uniformity Test: Arthur chooses a matching M on [N] uniformly at random. He then measures each
     copy of |ϕi in an orthonormal basis, which contains the vectors
                                                   |ii + | ji |ii − | ji
                                                      √      , √
                                                        2          2

                            T HEORY OF C OMPUTING, Volume 5 (2009), pp. 1–42                                     13
                  S. A ARONSON , S. B EIGI , A. D RUCKER , B. F EFFERMAN , AND P. S HOR

      for every edge (i, j) ∈ M. If for some (i, j) ∈ M, the two outcomes |ii+|
                                                                             √ ji and |ii−|
                                                                              2
                                                                                         √ ji both occur
                                                                                          2
      among the K measurement outcomes, then Arthur rejects. Otherwise he accepts.

    It is clear that the above protocol has perfect completeness. For if φ is satisfiable, then the Merlins
can just send K copies of a proper state |ψi corresponding to a satisfying assignment for φ . In that case,
both tests will accept with probability 1. Our goal is to prove the following:

Theorem 3.3. The protocol has constant soundness (again, assuming the |ϕi i’s are all identical).

    To prove Theorem 3.3, we need to show that if φ is unsatisfiable, then one of the two tests rejects with
constant probability. There are two cases. First suppose |ϕi is ε-close in trace distance to some proper
state |ψi. Then provided we choose ε > 0 sufficiently small, Lemma 3.2, combined with Proposition 2.6,
already implies that the Satisfiability Test rejects with constant probability. So our task reduces to
proving the following:

Claim 3.4. Suppose |ϕi is ε-far in trace distance from any proper state |ψi, for some ε > 0. Then the
Uniformity Test rejects with some constant probability δ (ε) > 0.

    In analyzing the Uniformity Test, we say that Arthur finds a collision if he obtains two measurement
outcomes of the form |ii ± | ji for the same (i, j) pair, and that he finds a disagreement if one of the
outcomes is |ii + | ji and the other is |ii − | ji. Of course, finding a disagreement is what causes him to
reject.
    The first step, though, is to lower-bound the probability that Arthur finds a collision. Let |ϕi =
α1 |1i + · · · + αN |Ni. Then for every copy of |ϕi and every edge (i, j) ∈ M, Arthur measures an outcome
of the form |ii ± | ji with probability |αi |2 + |α j |2 , and these outcomes are independent from one copy
                                                                            √ Arthur measures |ii ± | ji more
to the next. We are interested in the probability that, for some (i, j) pair,
than once. But this is just the famous Birthday Paradox, with K = Θ( N) “people” (the copies of |ϕi)
and N/2 “days” (the edges in M). The one twist is that the distribution over birthdays need not be
uniform. However, a result of Bloom and Knight [8] shows that the Birthday Paradox occurs in the
nonuniform case as well:

Lemma 3.5 (Generalized Birthday Paradox [8]). Suppose there are N days in the year, and each person’s
birthday
   √      is drawn independently from the same distribution (not necessarily uniform). Then if there are
Θ( N) people, at least two of them share a birthday with Ω(1) probability. (Indeed, the probability of
a collision is minimized precisely when the distribution over birthdays is uniform.)

    Therefore Arthur finds a collision with constant probability. The hard part is to show that he finds
a disagreement with constant probability. Here, of course, we will have to use the fact that |ϕi is ε-far
from proper.
    For now, let us restrict attention to two copies of |ϕi. For each edge (i, j) ∈ M, define the “disagree-
ment probability”
                                                  α +α   2 α −α   2
                                               2 i√2 j     i
                                                           √ j
                                                             2
                                         pi j = 
                                                             2 2
                                                               
                                                      2
                                                 |αi | + α j

                           T HEORY OF C OMPUTING, Volume 5 (2009), pp. 1–42                               14
                                      T HE P OWER OF U NENTANGLEMENT

to be the probability that, conditioned on measuring two outcomes of the form |ii ± | ji, one of the
outcomes is |ii + | ji and the other one is |ii − | ji. Also, say an edge (i, j) ∈ M is c-unbalanced with
respect to |ϕi if pi j ≥ c, and let Sc ⊆ M be the set of c-unbalanced edges. Say that a set of edges S ⊆ M
is d-large with respect to |ϕi if                              
                                                      2       2
                                          ∑      |αi |  + α j     ≥ d.
                                          (i, j)∈S

Then the key fact is the following:
Theorem 3.6. Suppose |ϕi is ε-far in trace distance from any proper state. Then Sc is d-large with
respect to |ϕi with probability at least 1/3 over the choice of M, for some constants c and d depending
on ε.
    The proof of Theorem 3.6 is deferred to the next section.
    Assuming Theorem 3.6, we can complete the proof of Claim 3.4, and hence of Theorem 3.3. The
idea is this: when Arthur performs the Uniformity Test, simply discard all measurement outcomes that
are not of the form |ii ± | ji for some (i, j) ∈ Sc . Assuming Sc is d-large—which
                                                                               √ it is with constant prob-
ability by Theorem 3.6—with overwhelming probability that still leaves Θ( N) “good” measurement
outcomes. Then by Lemma 3.5, with constant probability there will be a collision among these good
outcomes. And by the definition of Sc , any such collision will also be a disagreement with constant
probability, thereby causing Arthur to reject.

3.4     Unbalanced edges in random matchings
The goal of this section is to prove Theorem 3.6, which we now restate in a more careful way.
Theorem. There exist constants c, d > 0 for which the following holds. Let N be even and sufficiently
large. Suppose the state |ϕi = α1 |1i + · · · + αN |Ni is ε-far in trace distance from any proper state (that
is, any state of the form
                                                1 N
                                               √ ∑ (−1)xi |ii
                                                 N i=1
where x1 , . . . , xN ∈ {0, 1}). Let M be a matching on [N] chosen uniformly at random, and let S be the set
of edges (i, j) ∈ M that are “cε 8 -unbalanced,” meaning that
                                                                       2 2
                                                 2
                                                                       
                                       αi2 − α 2j ≥ 2cε 8 |αi |2 + α j     .

Then                                                               
                                                                  2
                                           ∑         |αi |2 + α j     ≥ dε 4
                                         (i, j)∈S

with probability at least 1/3 over the choice of M.
      Given a state |ϕi = α1 |1i + · · · + αN |Ni, define the nonuniformity of |ϕi to be

                                                          1 N            1
                                        NU (|ϕi) :=         ∑   |αi |2 −   .
                                                          2 i=1          N

                            T HEORY OF C OMPUTING, Volume 5 (2009), pp. 1–42                              15
                   S. A ARONSON , S. B EIGI , A. D RUCKER , B. F EFFERMAN , AND P. S HOR

Intuitively, NU(|ϕi) measures whether the distribution induced by measuring |ϕi in the standard basis
is close to uniform or not. We will divide the proof of Theorem 3.6 into two cases: first that NU(|ϕi) >
ε 4 /100 (the “nonuniform case”), and second that NU(|ϕi) ≤ ε 4 /100 (the “uniform case”).

3.4.1   The nonuniform case
We now prove Theorem 3.6 in the case NU(|ϕi) > ε 4 /100. For convenience, define κ := ε 4 /100 and
pi := |αi |2 . Then the condition

                                                                           2 2
                                                  2
                                                                           
                                     αi2 − α 2j       ≥ 2cε 8 |αi |2 + α j

is equivalent to
                                  p2i + p2j − 2 Re αi2 α j 2 ≥ 2cε 8 (pi + p j )2 ,
which will certainly be true whenever (pi − p j )2 ≥ 2cε 8 (pi + p j )2 , or equivalently

                                             pi p j   2 + 2cε 8
                                                +   ≥           .
                                             p j pi   1 − 2cε 8

(If pi = 0 or p j = 0 then we stipulate that the above inequality holds.) Thus, it suffices to prove the
following classical lemma.

Lemma 3.7. Let N be even and sufficiently large. Let (p1 , . . . , pN ) be a probability distribution, and
suppose
                                      1 N         1
                                         ∑   pi −    ≥κ.
                                      2 i=1       N
Let M be a uniform random matching on [N], and let S be the set of edges (i, j) ∈ M such that pi /p j +
p j /pi ≥ 2 + κ 2 /16. Then
                                                         κ
                                        ∑ (pi + p j ) ≥ 12
                                     (i, j)∈S

with probability at least 1/3 over M.

    Let H (the “heavy elements”) be the set of i ∈ [N] such that pi ≥ 1/N, and let H ∗ ⊆ H (the “very
heavy elements”) be the set of i ∈ [N] such that pi ≥ (1 + κ/2)/N. Let L (the “light elements”) be the
set of i ∈ [N] such that pi < 1/N, and let L∗ ⊆ L (the “very light elements”) be the set of i ∈ [N] such
that pi ≤ (1 − κ/4)/N. Clearly
                                                             
                                            1           1
                                  ∑ i Np −      =  ∑      −  p i =κ.
                                 i∈H               i∈L N

Using this, we can prove two simple facts: that there are Ω(N) very light elements, and that the very
heavy elements have total weight Ω(κ).

Proposition 3.8. |L∗ | ≥ κN/2.

                           T HEORY OF C OMPUTING, Volume 5 (2009), pp. 1–42                            16
                                            T HE P OWER OF U NENTANGLEMENT

Proof. We have
                                                                        |L∗ |
                                                               
                                                      1                                        κ
                                  κ=∑                   − pi        ≤         + (|L| − |L∗ |)    .
                                            i∈L       N                  N                    4N
Now use |L| ≤ N and rearrange.

    Given any subset A ⊆ [N], define the “weight” of A to be WA := ∑i∈A pi .

Proposition 3.9. WH ∗ ≥ κ/2.

Proof. We have
                                                                                              
                                    1                                1                         1             κ
                κ=∑            pi −         =       ∑           pi −         + ∑          pi −         ≤N      +WH ∗ .
                     i∈H            N             i∈H\H ∗
                                                                     N        i∈H ∗            N            2N

Now subtract κ/2 from both sides.

     To prove Lemma 3.7, we divide into two cases.
     The first case is that |H| ≥ N/2 (in other words, at least half the elements are heavy). In this case, we
begin constructing the matching M by randomly assigning partners to the “very light elements” i ∈ L∗ .
Recall from Proposition 3.8 that |L∗ | ≥ κN/2. So by a standard Chernoff bound, it is easy to see that
at least (say) |L∗ |/6 elements i ∈ L∗ will be matched to partners j ∈ H, with probability 1 − o(1) over
M. Notice that every edge (i, j) with i ∈ L∗ and j ∈ H satisfies pi ≤ (1 − κ/4)/N and p j ≥ 1/N, and
therefore
                                   pi p j                    1             κ2
                                      +    ≥ 1 − κ/4 +             > 2+       .
                                   p j pi                1 − κ/4           16
Thus, all of these edges go into the set S. We then have

                                                                           |L∗ | 1          κ
                                                  ∑ (pi + p j ) ≥ 6 · N ≥ 12
                                             (i, j)∈S

and are done.
    The second case is that |H| < N/2 (in other words, there are more light elements than heavy ones).
In this case, we begin constructing M by randomly assigning partners to the “very heavy elements”
i ∈ H ∗ . Let B be the set of elements i ∈ H ∗ that get matched to partners in H. Then since |H| < N/2,
every element of H ∗ goes into B with probability less than 1/2, and hence

                                                                          pi WH ∗
                                                      E [WB ] < ∑           =     .
                                                      M             i∈H ∗ 2   2

Therefore                                                         
                                                              3       2
                                                       Pr WB > WH ∗ <
                                                       M      4       3
by Markov’s inequality. In other words, with probability greater than 1/3, at least 1/4 of the probability
weight in H ∗ gets matched to partners in L. Suppose this happens.

                               T HEORY OF C OMPUTING, Volume 5 (2009), pp. 1–42                                          17
                  S. A ARONSON , S. B EIGI , A. D RUCKER , B. F EFFERMAN , AND P. S HOR

   Notice that every edge (i, j) with i ∈ H ∗ and j ∈ L satisfies pi ≥ 1+κ/2
                                                                         N   and p j < 1/N, and therefore

                                 pi p j                1        κ2
                                    +   > 1 + κ/2 +         > 2+ .
                                 p j pi             1 + κ/2     8
Thus, all of these edges go into the set S. Furthermore, by the assumption WB ≤ 43 WH ∗ , we have
                                                                                WH ∗   κ
                              ∑ (pi + p j ) ≥ ∑ pi = WH \B ≥ 4 ≥ 8    ∗

                            (i, j)∈S           i∈H ∗ \B

and are done.

3.4.2   The uniform case
We now prove Theorem 3.6 for states |ϕi = α1 |1i + · · · + αN |Ni such that NU(|ϕi) ≤ ε 4 /100. The
first step is to define a measure of the distance from |ϕi to the closest proper state, which we call the
impropriety of |ϕi or imp(|ϕi):
                                                                N
                                       imp (|ϕi) := min ∑ αi2 − r .
                                                     |r|=1/N i=1

Clearly 0 ≤ imp(|ϕi) ≤ 2 for all |ϕi, with imp(|ϕi) = 0 if and only if |ϕi is equivalent to a proper state
up to a phase shift. We also have the following:
Lemma 3.10. Suppose |ϕi is ε-far in trace distance from any proper state. Then imp(|ϕi) > ε 2 .
                                                 √
Proof. By Proposition 2.9, we have |hϕ|ψi| < 1 − ε 2 < 1 − ε 2 /2 for all proper states |ψi. On the other
hand, suppose imp(|ϕi) ≤ ε 2 . Then we will construct a proper state |ψi such that |hϕ|ψi| ≥ 1 − ε 2 /2,
thereby obtaining the desired contradiction.
                                                                                       √
    Let r be a complex number with |r| = 1/N that minimizes ∑Ni=1 |αi2 − r|, and let r be a canonical
square root of r. Also let βi := |αi2 − r|. Then
                                          √        √
                                     αi + r αi − r = αi2 − r = βi ,
                                 √       p           √       p                                √        √
which means that either |αi + r| ≤ βi or |αi − r| ≤ βi . So by setting the γi ’s to r or − r
                              p a state |ψi = γ1 |1i + · · · + γN |Ni that is proper up to a trivial phase
appropriately, we can construct
factor, such that |αi − γi | ≤ βi for all i. Then
                                         N                N               N
                2 − 2 |hϕ|ψi| = 2 − 2 ∑ αi γi ≤ 2 − ∑ αi γi + ∑ αi γi
                                        i=1               i=1             i=1
                                   N         N         N                                
                             ≤ 2 − ∑ αi γi − ∑ αi γi = ∑ |αi |2 + |γi |2 − αi γi − αi γi
                                       i=1     i=1              i=1
                                  N              N
                             = ∑ |αi − γi |2 ≤ ∑ βi = imp (|ϕi) ≤ ε 2 ,
                                 i=1            i=1

and hence |hϕ|ψi| ≥ 1 − ε 2 /2 as claimed.

                          T HEORY OF C OMPUTING, Volume 5 (2009), pp. 1–42                             18
                                     T HE P OWER OF U NENTANGLEMENT

    In what follows, assume imp(|ϕi) > ε 2 .
    Now as in Section 3.4.1, let pi := |αi |2 , and for any subset A ⊆ [N], define the “probability weight”
of A to be WA := ∑i∈A pi . Also, let δ := ε 2 /5, and let U (the “δ -uniform subset”) be the set of all
i ∈ [N] such that |pi − 1/N| ≤ δ /N. The following proposition shows that U encompasses “most” of
|ϕi, whether in terms of cardinality or in terms of probability weight.

Proposition 3.11. |U| ≥ N(1 − ε 2 /10) and WU ≥ 1 − 3ε 2 /10.

Proof. We have
                                      ε4             1         δ
                                         ≥ NU (|ϕi) ≥ (N − |U|) ,
                                     100             2         N
hence
                                               ε4          ε2
                                                           
                                  |U| ≥ N 1 −       = N 1−      ,
                                              50δ          10
hence
                                          ε2                                      3ε 2
                                                                   
                                                           1 δ
                               WU ≥ N 1 −                   −              ≥ 1−        .
                                          10               N N                    10



    Let
                                     impU (|ϕi) := min            ∑ αi2 − r
                                                          |r|=1/N i∈U

be an analogue of impropriety that is restricted to the set U. By combining Lemma 3.10 with Proposi-
tion 3.11, we can now lower-bound impU (|ϕi).

Proposition 3.12. impU (|ϕi) ≥ 3ε 2 /5.

Proof. For all r with |r| = 1/N, we have

                                            1              N − |U| 3ε 2 ε 2   2ε 2
                 ∑ αi2 − r ≤ ∑ pi + ∑       N
                                              ≤ (1 −WU ) +
                                                             N
                                                                  ≤    +
                                                                    10 10
                                                                            =
                                                                               5
                i∈U
                 /             i∈U
                                /       i∈U
                                         /

by Proposition 3.11. Hence

                         impU (|ϕi) = min         ∑ αi2 − r
                                        |r|=1/N i∈U

                                      ≥ min
                                        |r|=1/N
                                                  ∑ αi2 − r − |r|=1/N
                                                                max ∑ αi2 − r
                                                  i∈[N]                          i∈U
                                                                                  /

                                                           2ε 2       3ε 2
                                      ≥ imp (|ϕi) −               ≥          .
                                                            5          5



                          T HEORY OF C OMPUTING, Volume 5 (2009), pp. 1–42                              19
                     S. A ARONSON , S. B EIGI , A. D RUCKER , B. F EFFERMAN , AND P. S HOR

    We are finally ready for the geometric core of our result. Let V be a collection of vectors in R2
(possibly with multiplicity), which consists of the vector (N Re αi2 , N Im αi2 ) for every i ∈ U. Let kvk be
the 2-norm of v. Then we know by the definition of U that 1 − δ ≤ kvk ≤ 1 + δ for all v ∈ V . We also
know from Proposition 3.11 that

                                                             ε2
                                                                             
                                          |V | = |U| ≥ N 1 −                       ≥ 0.9N ,
                                                             10

and from Proposition 3.12 that for all unit vectors w ∈ R2 ,

                                                                   3ε 2 N   3ε 2 |V |
                                            ∑ kv − wk ≥              5
                                                                          ≥
                                                                               5
                                                                                      .
                                            v∈V


Based on this information, we want to find two subsets X,Y ⊆ V , both of size Ω(|V |), such that kx −yk =
Ω(1) for all x ∈ X and y ∈ Y .
    For suppose we can do this. Then just as in Section 3.4.1, when a matching M on [N] is chosen uni-
formly at random, by a Chernoff bound it will have Ω(N) edges between the subsets of [N] corresponding
to X and Y with overwhelming probability. Assuming that happens, we will have |αi2 − α 2j | = Ω(1/N)
for every such edge (i, j) ∈ M, and hence all of these edges will get added to the set S. We will therefore
have
                                          ∑ (pi + p j ) = Ω(1)
                                                  (i, j)∈S

as desired. (For simplicity, we have suppressed the dependence on ε here.)
    What we need, then, is the following geometric lemma.

Lemma 3.13. Let V be a collection of vectors in the plane. Suppose that 1 − δ ≤ kvk ≤ 1 + δ for every
v ∈ V , and that ∑v∈V kv − wk ≥ κ|V | for every unit vector w ∈ R2 . Then provided δ ≤ κ/2, there exist
subsets X,Y ⊆ V , both of size at least κ|V |/40, such that kx − yk ≥ κ/20 for all x ∈ X and y ∈ Y .10

Proof. Divide the plane into K ∈ [30/κ, 40/κ] equal-sized, half-open angular sectors, centered about the
origin. By the pigeonhole principle, one of these sectors (call it S) must contain at least |V |/K ≥ κ|V |/40
of the vectors. Let S0 be the union of S and its two adjacent sectors. Then we claim that at least κ|V |/40
of the vectors must lie outside of S0 . For suppose not. Then let z be the unit vector that bisects S, and let
                                                                                
                                             3           2π         3        2π            πκ
                                          θ=                      ≤                    =
                                             2           K          2       30/κ           10

be the angle between z and the border of S0 . Notice that by the triangle inequality, we have
                                                 √                                          πκ
                                   kv − zk ≤         2 − 2 cos θ + δ ≤ θ + δ ≤                 +δ
                                                                                            10
  10 We did not try to optimize the constants.



                               T HEORY OF C OMPUTING, Volume 5 (2009), pp. 1–42                            20
                                        T HE P OWER OF U NENTANGLEMENT

for every v in S0 (where we have used the bound cos θ ≥ 1 − θ 2 /2). We also have kv − zk ≤ 2 + δ for
every v ∈ V . Hence
                                               πκ     
                             ∑  kv − zk ≤  ∑       + δ   + ∑ (2 + δ )
                            v∈V           v∈S0 10          / 0
                                                          v∈S
                                                        πκ          κ |V |
                                                   ≤       |V | + 2        + δ |V | < κ |V |
                                                        10           40
which is a contradiction.
     Now let X be the set of all v’s in S, and let Y be the set of all v’s outside S0 . Then |X| ≥ κ|V |/40 and
|Y | ≥ κ|V |/40. Also, let
                                                     2π     πκ
                                                τ=       ≥
                                                      K      20
be the angle of a single sector. Then it is not hard to see that for all x ∈ X and y ∈ Y ,
                                      √                          τ          κ  πκ          κ
                  kx − yk ≥ (1 − δ ) 2 − 2 cos τ ≥ (1 − δ ) √ ≥ 1 −                 √ ≥
                                                                  2           2 20 2 20
where we have used the bound cos τ ≤ 1 − τ 2 /4 for all τ ∈ [0, π/2].

   Now set κ := 3ε 2 /5. Then δ = ε 2 /5 < κ/2 and the condition of Lemma 3.13 is satisfied. So
considering the sets X,Y from the lemma, we have
                                               κ |V | 0.9κN   (0.9) 3ε 2 N   ε 2N
                                |X| , |Y | ≥         ≥      =              >      ,
                                                40      40       200         100
and also
                                                      κ    3ε 2
                                                       kx − yk ≥
                                                        ≥
                                                     20 100
for all x ∈ X and y ∈ Y . This means that we can find subsets X 0 ,Y 0 ⊆ [N] such that

  (i) |X 0 | , |Y 0 | ≥ ε 2 N/100 and
                         2
                     3ε
  (ii) αi2 − α 2j ≥ 100N for all i ∈ X 0 and j ∈ Y 0 .

    Property (ii) implies that
                                               2         9ε 4           
                                                                              2       2 2
                                                                                       
                                 αi2 − α 2j        ≥            ≥ 2cε 8
                                                                         |αi |  + α j
                                                       10000N 2
for some suitable constant c. Hence every edge (i, j) ∈ M with i ∈ X 0 and j ∈ Y 0 will get added to the
set S (again assuming a suitable c).
    Property (i), together with a Chernoff bound, implies that with probability 1 − o(1) over the choice
of matching M, there are at least (say) ε 4 N/20000 edges (i, j) ∈ M such that i ∈ X 0 and j ∈ Y 0 . Suppose
this happens. Then
                                                 ε 4N
                                                                 
                                                           1−δ
                                ∑ (pi + p j ) ≥ 20000 · 2 N = Ω ε 4
                                                                            
                             (i, j)∈S

as desired. This completes the proof of Theorem 3.6.

                              T HEORY OF C OMPUTING, Volume 5 (2009), pp. 1–42                              21
                   S. A ARONSON , S. B EIGI , A. D RUCKER , B. F EFFERMAN , AND P. S HOR

3.5   The general case
                                                                        √
Of course, in general the states |ϕ1 i, . . . , |ϕK i sent by the K = Θ( N) Merlins need not be identical. To
deal with this, we now give our final protocol, which removes the symmetry restriction.

       The 2- OUT- OF -4-S AT Protocol
       Given |ϕ1 i, . . . , |ϕK i, Arthur performs one of the following three tests, each with probability
       1/3.
       Satisfiability Test: Arthur applies the Satisfiability Test, described in Section 3.2, to |ϕ1 i.
       Symmetry Test: Arthur chooses an index k ∈ {2, . . . , K} uniformly at random, performs a
       swap test between |ϕ1 i and |ϕk i, and accepts if and only if the swap test accepts.
       Uniformity Test: Arthur chooses a matching M on [N] uniformly at random. He then
       measures each |ϕk i in an orthonormal basis, which contains the vectors

                                               |ii + | ji |ii − | ji
                                                  √      , √
                                                    2          2

       for every edge (i, j) ∈ M. If for some (i, j) ∈ M, the two outcomes |ii+|
                                                                              √ ji and |ii−|
                                                                               2
                                                                                          √ ji both
                                                                                           2
       occur among the K measurement outcomes, then Arthur rejects. Otherwise he accepts.

     It is clear that the above protocol has perfect completeness, and thus the problem is to show sound-
ness: that is, if φ is unsatisfiable, then one of the three tests rejects with constant probability. There are
three cases.
     The first case is that |ϕ1 i is ε-close to some proper state |ψi. Then as before, the Satisfiability Test
will reject with constant probability, provided we choose ε sufficiently small.
     The second case is that |hϕ1 |ϕk i| < 1 − δ for at least a γ fraction of indices k ∈ {2, . . . , K}. In that
case it is clear that the Symmetry Test will reject with probability at least γδ /2.
     The third case is that |hϕ1 |ϕk i| ≥ 1 − δ for more than a 1 − γ fraction of indices k ∈ {2, . . . , K}, but
nevertheless |ϕ1 i is ε-far from any proper state. In this case we need to generalize the results of the
previous section, to show that the Uniformity Test will still reject with constant probability (dependent
on ε, δ , and γ).
     The first step in the analysis is simply to discard all states
                                                                √         |ϕk i such that |hϕ1 |ϕk i| < 1 − δ . By
                                      0
Proposition 2.9, the remaining K ≥ (1 − γ)K states are all 2δ -close to |ϕ1 i in trace distance.
     Now given a matching M on [N], let Sc be the set of edges in M that are c-unbalanced with respect
to |ϕ1 i, in the sense defined in Section 3.3. Then Theorem 3.6 implies that Sc is d-large with respect to
|ϕ1 i (for some constants c and d) with probability at least 1/3 over the choice of M. Suppose that it is.
     Call a measurement outcome |ii ± | ji good if (i, j) ∈ Sc . Then when Arthur performs the Uniformity    √
Test, we simply discard all states for which the outcome is not good. Since all of the states are 2δ -
                       √ Sc is d-large with respect to |ϕ1 i, with overwhelming probability this still leaves
close to |ϕ1 i, and since
us with K ≈ (d − 2δ )K 0 states. Call those states |ξ1 i, . . . , |ξK 00 i.
            00

     Let M e = {|ii ± | ji : (i, j) ∈ M}. Given a state |ϕi, let D|ϕi be the probability distribution over M     e
                                                                                            √
induced by measuring |ϕi according to M. Then we know that kD|ϕ1 i − D|ξk i k ≤ 2δ for all k ∈ [K 00 ].

                            T HEORY OF C OMPUTING, Volume 5 (2009), pp. 1–42                                   22
                                    T HE P OWER OF U NENTANGLEMENT

Next let D0|ϕi be the distribution over M e induced by measuring |ϕi, and then conditioning on the outcome
being good. Then we claim that                                   √
                                              0       0            2δ
                                            D|ξk i − D|ϕ1 i ≤      √
                                                               d − 2δ
for all k ∈ [K 00 ], where as always k · k denotes the variation distance. This is so because of the following
simple fact:
Proposition 3.14. Let D1 and D2 be probability distributions, let E be an event, and let D01 and D02
denote D1 and D2 respectively conditioned on E. Suppose kD1 − D2 k ≤ κ and Prx∈D1 [E(x)] ≥ a. Then
kD01 − D02 k ≤ a−κ
                κ
                   .
Proof. Let b = Prx∈D2 [E(x)], and note that |a − b| ≤ κ. Also let px = PrD1 [x] and qx = PrD2 [x]. Then
                                       1                           1         px qx
                          D01 − D02 =      ∑ DPr0 [x] − DPr0 [x] = 2 ∑ a − b
                                       2 x:E(x)   1        2         x:E(x)
                                                                           
                                        1                             b
                                     ≤      ∑ |px − qx | + px − a px
                                       2b x:E(x)
                                         κ  1  a κ   κ
                                     ≤     + 1− ≤ ≤     .
                                         2b 2  b b  a−κ


     By construction, every measurement outcome |ii ± | ji in the support of every D0|ξk i corresponds to
an edge (i, j) that is c-unbalanced with respect to |ϕ1 i. But this still leaves a key question unanswered:
is (i, j) reasonably unbalanced with respect to |ξk i itself? The following lemma will imply that it is, with
high probability over D0|ξk i : in particular that
                                                                                   √
                                    h          c                        i        16 2δ
                           Pr         (i, j) is -unbalanced w.r.t. |ξk i ≥ 1 −     √ 
                    |ii±| ji∈D0
                              |ξk i
                                               4                              c d − 2δ

for all k ∈ [K 00 ].
Lemma 3.15. Let D = (px , qx )x∈[N] and D0 = (p0x , q0x )x∈[N] be any two probability distributions over the
set [N] × {0, 1}. Suppose that kD − D0 k ≤ µ, and that 2px qx ≥ c(px + qx )2 for every x ∈ [N]. Let S be
the set of all x ∈ [N] such that 2p0x q0x ≥ c0 (p0x + q0x )2 . Then
                                                                   8µ
                                         ∑ p0x + q0x ≥ 1 − c − 2c0 ,
                                                        
                                         x∈S

for all constants c ∈ (0, 1/2) and c0 ∈ (0, c/2).
Proof. Assume for simplicity that px , qx > 0 for all x (it is not hard to remove this restriction). Let
εx = p0x − px and δx = q0x − qx . Then by assumption,
                                               N
                                               ∑ (|εx | + |δx |) ≤ 2µ .
                                           x=1


                           T HEORY OF C OMPUTING, Volume 5 (2009), pp. 1–42                                23
                  S. A ARONSON , S. B EIGI , A. D RUCKER , B. F EFFERMAN , AND P. S HOR

Let S be the complement of S. Consider an adversary with a “budget” of 2µ, who is trying to perturb D
so as to maximize ∑x∈S (p0x + q0x ). Define the “price per pound” of x to be

                                                       |εx | + |δx |
                                               $x :=                 .
                                                         p0x + q0x
Intuitively, $x is the amount the adversary has to “spend” on perturbing px and qx , divided by the amount
of probability mass that gets added to S as a result. We will show that $x ≥ (c − 2c0 )/4 for all x ∈ S. This
will suffice to prove the lemma, since we then have
                                                           |εx | + |δx |     8µ
                                 ∑ p0x + q0x = ∑
                                               
                                                                         ≤         .
                                                                 $x        c − 2c0
                                 x∈S               x∈S

We now lower-bound $x . If we simply divide through by px qx , the condition 2px qx ≥ c(px + qx )2 is
equivalent to px /qx + qx /px ≤ (2 − 2c)/c. Let A = (2 − 2c)/c; then in particular, we have px ≤ Aqx and
qx ≤ Apx for all x. On the other hand, to get x ∈ S we need p0x /q0x + q0x /p0x > (2 − 2c0 )/c0 , and hence
either p0x /q0x > B or q0x /p0x > B where B = (1 − c0 )/c0 .
    Suppose p0x /q0x > B without loss of generality. Then
                                                                p      
                                                                  x
                                     px + εx > B (qx + δx ) > B     + δx ,
                                                                 A
which rearranging means                                            
                                                               B
                                           εx − Bδx >            − 1 px .
                                                               A
Likewise
                                       B (qx + δx ) < px + εx < Aqx + εx ,
which rearranging means                                                    
                                                                       B
                                  εx − Bδx > (B − A) qx >                − 1 qx .
                                                                       A
Combining,                                                       
                                                           B          px + qx
                                         εx − Bδx >          −1
                                                           A             2
and hence                                                                  
                                            1                         1 1       px + qx
                             |εx | + |δx | > (εx − Bδx ) >             −                .
                                            B                         A B          2
Therefore
                                                                   1
                        |εx | + |δx |        |εx | + |δx |         2 (1/A − 1/B)
                   $x =               ≥                        >
                          p0x + q0x     px + qx + |εx | + |δx | 1 + 12 (1/A − 1/B)
                          c/ (2 − 2c) − c0 / (1 − c0 )          c − 2c0 + cc0    c − 2c0
                      =                                    =                   ≥
                        2 + c/ (2 − 2c) − c0 / (1 − c0 ) 4 − 3c − 6c0 + 5cc0        4
as claimed.

                           T HEORY OF C OMPUTING, Volume 5 (2009), pp. 1–42                               24
                                       T HE P OWER OF U NENTANGLEMENT

    We now need one last conditioning step: discard all states |ξk i for which the measurement outcome
is not c/4-unbalanced with respect to |ξk i. By Lemma 3.15, with overwhelming probability this still
leaves us with K 000 ≈ K 00 states (for suitable choices of c, d, and δ ). Call those states |ς1 i, . . . , |ςK 000 i.
    Given any state |ϕi, let D00|ϕi be the probability distribution over |ii ± | ji ∈ M  e obtained by starting
from D0|ϕi , and then conditioning on the edge (i, j) being c/4-unbalanced with respect to |ϕi. Then

                              D00|ςk i − D0|ϕ1 i ≤ D00|ςk i − D0|ςk i + D0|ςk i − D0|ϕ1 i
                                                          √              √
                                                     16 2δ                2δ
                                                 ≤         √ +          √
                                                  c d − 2δ             d − 2δ
                                                          √
                                                     17 2δ
                                                 ≤         √ 
                                                  c d − 2δ

for all k ∈ [K 000 ]. So by the triangle inequality,
                                                                    √
                                                                  34 2δ
                                         D00|ςk i − D00|ς i   ≤     √ 
                                                        `
                                                               c d − 2δ

for all k, ` ∈ [K 000 ].
                                                  √
    So to sum up: we have K 000 = Θ( N) samples from M,                  e drawn independently from probability
distributions D|ς1 i , . . . , D|ς 000 i respectively. The distributions D00|ςk i have bounded variation distance from
                  00            00
                                   K
one another. We also know, because the D00|ςk i ’s only involve c/4-unbalanced edges (i, j), that if Arthur
finds a collision among the K 000 samples (i. e., two samples of the form |ii ± | ji for some (i, j)), then that
collision will also be a disagreement with constant probability. Thus, the one remaining task is to show
that Arthur finds a collision with constant probability.
    Showing this amounts to generalizing the Birthday Paradox still further, to the case where the birth-
day distributions are not only nonuniform but can also differ from each other by small amounts. In
particular we want the following:

                          . , XK be independent random variables over [N], and let Di be the distribution
Theorem 3.16. Let X1 , . .√
over Xi . Suppose K ≥ 32 N and kDi − D j k ≤ 1/10 for all i, j. Then

                                                                        1
                                             Pr [∃i, j : Xi = X j ] ≥     .
                                                                        2
    In Section 3.6, we present a proof of Theorem 3.16 based on the second moment method. (Indeed,
our proof works even if the Xi ’s are only 4-wise independent.)               √
    By Theorem 3.16, Arthur will find a collision among the K 000 = Θ( N) remaining samples with
constant probability. Then by the definition of the D00|ςk i ’s, this collision will be a disagreement with
constant probability, thereby causing Arthur to reject.
    So in summary, we get a protocol with perfect completeness, constant soundness, and O(     e √m) unen-
tangled witnesses with O(log m) qubits each.

                             T HEORY OF C OMPUTING, Volume 5 (2009), pp. 1–42                                      25
                   S. A ARONSON , S. B EIGI , A. D RUCKER , B. F EFFERMAN , AND P. S HOR

    As a final remark, we can reduce the soundness error to be negligibly small (in particular, 2− polylog m ).
To do so, we simply multiply the number of Merlins by a further polylog m factor, and repeat the whole
protocol polylog m times.

3.6     The generalized birthday paradox
The purpose of this section is to prove the Birthday Paradox, even in the very general situation where

  (1) the distributions over birthdays need not be uniform, and
  (2) the distributions need not be the same for every person, but only ε-close in variation distance, and
  (3) the distributions need not be independent, but only 4-wise independent.

      First we need two lemmas.
Lemma 3.17. Let D1 , D2 be probability distributions over [n] such that kD1 − D2 k ≤ ε. Then
                                          Pr       [x = y] ≥ (1 − ε)2 /n .
                                      x∈D1 ,y∈D2

Proof. Let px = PrD1 [x] and let qx = PrD2 [x]. Then
                                                                                           !2
                                                            1                                       1
            Pr      [x = y] = ∑ px qx ≥ ∑ min (px , qx )2 ≥                 ∑ min (px , qx )    =     (1 − ε)2
         x∈D1 ,y∈D2
                             x∈[n]     x∈[n]
                                                            n           x∈[n]
                                                                                                    n

where the second inequality follows from Cauchy-Schwarz.

Lemma 3.18. Let p1 , . . . , pK be nonnegative reals, and let r = ∑i< j<k pi p j pk and s = ∑i< j pi p j . Then
r2 ≤ 2s3 .
Proof. Let S be the set of 6-tuples (i, j, k, `, m, n) such that i < j, k < `, and m < n, and let R be the set
of 6-tuples such that i < j < k and ` < m < n. Then
                                           s3 = ∑ pi p j pk p` pm pn
                                                   S

while
                                          r2 = ∑ pi p j pk p` pm pn .
                                                   R
Now define a mapping from R to S, by simply swapping k and ` if k > `, or swapping ` and m if k = `.
It is easily checked that this mapping is two-to-one. Hence r2 ≤ 2s3 as claimed.

      We now prove Theorem 3.16, which we restate for convenience.
Theorem. Let X1 , . . . , XK be 4-wise independent random variables over [n], and let Di be the marginal
                                        √
distribution over Xi . Suppose K ≥ 32 n and kDi − D j k ≤ 1/10 for all i, j. Then
                                                                      1
                                           Pr [∃i, j : Xi = X j ] ≥     .
                                                                      2

                           T HEORY OF C OMPUTING, Volume 5 (2009), pp. 1–42                                      26
                                    T HE P OWER OF U NENTANGLEMENT

Proof. Let Yi j be 1 if Xi = X j and 0 otherwise, and let Y := ∑i< j Yi j . By Lemma 3.17, we have

                                                   K (1 − 1/10)2
                                                   
                                     E [Y ] ≥                    ≥ 900 .
                                                   2      n

The remainder of the proof will involve upper-bounding the second moment E[Y 2 ]. Let us write

                                 E Y2 =
                                   
                                                  ∑ E [Yi jYk` ] = τ2 + τ3 + τ4 ,
                                             i< j,k<`


where τN contains the terms in which N distinct indices appear among {i, j, k, l}. It is easy to see that

                                             τ2 = ∑ E [Yi j ] = E [Y ]
                                                      i< j

and (by 4-wise independence) that

                                    τ4 =      ∑           E [Yi j ] E [Yk` ] ≤ E [Y ]2 .
                                            i< j,k<l
                                           all distinct

So the nontrivial part is to upper-bound τ3 . Let pi,x := Pr[Xi = x]. Also, let

                                            rx :=         ∑ pi,x p j,x pk,x ,
                                                      i< j<k

                                            sx := ∑ pi,x p j,x ,
                                                      i< j


and notice that ∑x∈[n] sx = E[Y ]. Then

                                 τ3 = 6 ∑          ∑ pi,x p j,x pk,x
                                          i< j<k x∈[n]

                                    = 6 ∑ rx
                                          x∈[n]
                                         q
                                    ≤ 6 ∑ 2s3x
                                          x∈[n]
                                      √
                                                                         
                                                           1 2
                                    ≤6 2 ∑                  s + 12sx
                                              x∈[n]
                                                          40 x
                                                               !2                         
                                       √    1
                                    ≤ 6 2                   ∑ sx        + 12 ∑ sx 
                                           40                x∈[n]              x∈[n]
                                                                           !
                                      √           E [Y ]2
                                    =6 2                  + 12 E [Y ] .
                                                   40


                           T HEORY OF C OMPUTING, Volume 5 (2009), pp. 1–42                                 27
                   S. A ARONSON , S. B EIGI , A. D RUCKER , B. F EFFERMAN , AND P. S HOR

Here the third line follows from Lemma 3.18, and the fourth line follows from the basic calculus fact
            1 2
that s3/2 ≤ 40 s + 12s for all nonnegative s. Hence
                                                            h                      i Var [Y ]
               Pr [Y = 0] ≤ Pr [|Y − E [Y ]| ≥ E [Y ]] = Pr (Y − E [Y ])2 ≥ E [Y ]2 ≤
                                                                                      E [Y ]2
                            E Y − E [Y ]2 τ2 + τ3 + τ4 − E [Y ]2
                               2
                          =                   =
                                  E [Y ]2                 E [Y ]2
                                       √                         
                            E [Y ] + 6 2 E [Y ]2 /40 + 12 E [Y ] + E [Y ]2 − E [Y ]2
                          ≤
                                                       E [Y ]2
                                    √        √
                            1 + 72 2 6 2 1
                          =               +      ≤ .
                               E [Y ]        40      2

Finally,
                                                                             1
                                 Pr [∃i, j : Xi = X j ] = 1 − Pr [Y = 0] ≥
                                                                             2
and we are done.

3.7   General observations
We conclude this subsection by making three general observations about Theorem 1.1.
     First, we strongly believe that our protocol can be improved to one involving two provers, one of
                                                               √
whom sends O(log m) qubits and the other of whom sends O( m polylog m) qubits. Specifically, if all
but one of the witnesses in our current protocol are entangled with one another, in a way that breaks the
protocol’s soundness, we believe Arthur should be able to use the remaining witness to detect this. This
is a problem we leave to future work.
     Second, our protocol made essential use of the PCP Theorem, in the strong version proved by
Dinur [11]. One might wonder whether Theorem 1.1 could also be proved in a “black-box” fashion,
without exploiting anything about the structure of 3S AT. The following simple result shows that the
answer is no—and that indeed, in the black-box setting, there is essentially no savings at all over the
classical witness size.

Theorem 3.19. Let f : {0, 1}n → {0, 1} be a black-box function. Then any QMA f (k) protocol to con-
vince Arthur that there exists an x such that f (x) = 1, with soundness gap Ω(1/ poly(n)), must involve
n − O(log n) qubits sent by the Merlins.

Proof Sketch. Assume without loss of generality that either f is identically zero, or else there exists a
unique “marked item” x∗ such that f (x∗ ) = 1. Suppose it were possible to convince Arthur that x∗ exists
by giving him unentangled witnesses |ϕ1 i, . . . , |ϕK i with Q qubits in total. Then given these witnesses,
Arthur’s verification algorithm must query f (x∗ ) at some time step with non-negligible probability β =
Ω(1/ poly(n)). For otherwise, by the hybrid argument of Bennett, Bernstein, Brassard, and Vazirani [4],
Arthur’s verification algorithm would not have Ω(1/ poly(n)) soundness (i. e., Arthur would fail to detect
a change in f (x∗ ) from 1 to 0). But this means that Arthur’s algorithm can be modified to one that

                          T HEORY OF C OMPUTING, Volume 5 (2009), pp. 1–42                               28
                                    T HE P OWER OF U NENTANGLEMENT

uses no witnesses, and that finds x∗ with probability at least 2−Q β /T . For Arthur can simply replace
|ϕ1 i, . . . , |ϕK i by the Q-qubit maximally mixed state, then measure at a random time step to find which x
is being queried. On the other hand, we know from the result of Bennett et al. [4] mentioned previously
that if x∗ is uniformly random, then after T queries, Arthur can have found x∗ with probability at most
4T 2 /2n . Solving 2−Q β /T ≤ 4T 2 /2n for Q, we find that

                                                β 2n
                                      Q ≥ log        ≥ n − O (log n) .
                                                4T 3



    Third, notice that our protocol does not let Arthur find a satisfying assignment for ϕ; it only con-
vinces him that such an assignment exists. If there were a way to modify our protocol to let Arthur
recover an assignment, this would have a spectacular consequence for quantum algorithms. Namely,
by running Arthur’s verification procedure with the O(e √m)-qubit maximally mixed state in place of
                                                                                e√
the witnesses, we could find a satisfying assignment
                                            √
                                                     for ϕ with probability 2−O( m) , with no help from
any√Merlins. But this would yield a 2O( m) -time quantum algorithm for 3S AT—and in particular, a
                                          e

2O( n) -time algorithm in the “critical regime” m = O(n)!
  e




4      Weak additivity implies amplification
In this section we show how to amplify any QMA(k) protocol to exponentially small error, and to
simulate k provers with two, assuming a weak version of the Additivity Conjecture.

4.1     Entanglement of formation
The analysis of our amplification protocol will involve showing that Arthur cannot create “too much”
entanglement during his verification procedure. To make this precise, we need some way to measure
the entanglement of mixed states. Fortunately, this is one of the most studied topics in all of quantum
information theory. One particular entanglement measure—the entanglement of formation EF defined
by Bennett et al. [6]—will be particularly useful for us.
    To define EF we first need some other concepts. Given a mixed state σ , the von Neumann entropy
of σ is S(σ ) := H({λi }), where H({pi }) = − ∑i pi log2 pi is the usual entropy function and {λi } are
the eigenvalues of σ . Given a bipartite pure state |ψ AB i, let σ A and σ B be the reduced states of the A
and B registers respectively. Then it is not hard to show that S(σ A ) = S(σ B ). We call this quantity the
entanglement entropy of |ψ AB i, or E(|ψ AB i). We can then define EF (ρ AB ) as a weighted average of
entanglement entropy, minimized over all purifications of ρ AB :

Definition 4.1. Given a bipartite state ρ AB , the entanglement of formation EF (ρ AB ) is the minimum of
∑i pi E(|ψi i) over all decompositions ρ AB = ∑i pi |ψi ihψi |.

      Intuitively, EF measures the minimum number of entangled pairs √12 (|00i + |11i) that are needed to
prepare ρ AB .

                           T HEORY OF C OMPUTING, Volume 5 (2009), pp. 1–42                               29
                  S. A ARONSON , S. B EIGI , A. D RUCKER , B. F EFFERMAN , AND P. S HOR

    Almost by definition, EF satisfies convexity: for all ρ AB and σ AB ,
                      EF pρ AB + (1 − p) σ AB ≤ pEF ρ AB + (1 − p) EF σ AB .
                                                                        

It is also easy to see that EF (ρ AB ) = 0 if and only if ρ AB is separable. In this paper, we will need two
further properties of EF . The first property is what we called “faithfulness” in Section 1.4.
                                                                                        √
Lemma 4.2. Suppose EF (ρ AB ) ≤ ε. Then there exists a separable state that is 2ε-close to ρ AB in
trace distance.
Proof. Let S(ρ||σ ) be the quantum relative entropy between mixed states ρ and σ (see Nielsen and
Chuang [23] for a definition). Then Vedral and Plenio [28] showed that
                                    EF (ρ AB ) ≥ min S ρ AB ||σ AB ,
                                                                  

where the minimum is taken over all separable states σ AB . Also, it is known (see Klauck et al. [16] and
Ohya and Petz [24]) that
                                                       1 AB            2
                                     S(ρ AB ||σ AB ) ≥    ρ − σ AB tr .
                                                       2
Putting these results together, if EF (ρ AB ) ≤ ε then there exists a separable state σ AB such that
                                            S(ρ AB ||σ AB ) ≤ ε ,
                               √
and hence kρ AB − σ AB ktr ≤    2ε.
   The second property is that, if we start from a separable state, we cannot increase the value of EF
much by measuring few qubits and then conditioning on the outcome.
Lemma 4.3. Let ρ AB be a separable state, and suppose σ AB is obtained from ρ AB by applying an
arbitrary entangled measurement on at most n qubits from each register, and then possibly conditioning
on the outcome. Then EF (σ AB ) ≤ EF (ρ AB ) + 2n.
Proof. By convexity, we can assume without loss of generality that ρ AB is a pure state, |ψA i ⊗ |ψB i. So
we can write σ A as
                                         Φ(|ψA ihψA |)
                                                            ,
                                        kΦ(|ψA ihψA |)k
where Φ is some non-trace-increasing operator acting on at most n qubits. Or in the operator-sum
representation,
                                                                 †
                                          ∑Mi=1 Ei |ψA i hψA | Ei
                                   σA =                            †
                                        Tr ∑M i=1 Ei |ψA i hψA | Ei
         2n
where ∑2i=1 Ei† Ei  I and M ≤ 22n . We then have
                               EF σ AB ≤ S σ A
                                                 
                                                                              !
                                                                        †
                                                  ∑Mi=1 Ei |ψA i hψA | Ei
                                           =S                             †
                                                 Tr ∑Mi=1 Ei |ψA i hψA | Ei
                                           ≤ log2 M ≤ 2n
where the first line follows from the concavity of the von Neumann entropy.

                          T HEORY OF C OMPUTING, Volume 5 (2009), pp. 1–42                               30
                                          T HE P OWER OF U NENTANGLEMENT

                                                                                                       0   0
    Given an entanglement measure E, we call E superadditive if for any state ρ AA ,BB on four registers,
                                      0   0                    0 0
                                E(ρ AA ,BB ) ≥ E ρ AB + E(ρ A B ) .
                                                     

As mentioned earlier, the analysis of our QMA(k) amplification protocol originally relied on the con-
jecture that EF is superadditive. But in a spectacular recent development, Hastings [13] has shown that
this conjecture is false. (More precisely, Hastings shows a failure of additivity for the minimum out-
put entropy of a quantum channel. By a result of Shor [27], this is equivalent to the superadditivity of
entanglement of formation.)
    However, the violation of additivity found by Hastings is extremely small, and is perfectly consistent
with additivity being true in a weaker or asymptotic sense. To make this precise, we now state a version
of the Additivity Conjecture that suffices for our purposes. Call an entanglement measure E weakly
superadditive if it satisfies the relation

                                                                   c k
                                      E ρ A1 A2 ···Ak ,B1 B2 ···Bk ≥ ∑ E(ρ Ai B j ) ,
                                                                    k i, j=1

for some constant c independent of k. Weak superadditivity is, in particular, implied by the following
inequality:
                             0  0    1h                  0        0          0 0
                                                                                 i
                      E(ρ AA ,BB ) ≥    E ρ AB + E(ρ AB ) + E(ρ A B ) + E(ρ A B )
                                              
                                     2
which in turn is implied by ordinary superadditivity. Then we conjecture the following:

Conjecture 4.4 (Weak Additivity Conjecture). EF is weakly superadditive.

    As a side note, EF badly violates the so-called monogamy inequality11
                                                      0                          0
                                           E(ρ A,BB ) ≥ E(ρ AB ) + E(ρ AB ) .

As an example, consider again the maximally antisymmetric state
                                          1
                                   |ψi = √     ∑ (−1)sgn(σ ) |σ (1)i · · · |σ (N)i .
                                           N! σ ∈SN

The entanglement of formation between the first register of |ψi and the remaining N − 1 registers is at
most log N. Yet the entanglement of formation between the first register and any one other register can
be shown to be Ω(1). Note that, had EF satisfied the monogamy inequality, we would have been able to
use it to show QMA(2) = QMA, using essentially the same argument as in footnote 3.
    A different entanglement measure—the squashed entanglement Esq of Christandl and Winter [10]—
is known to satisfy both superadditivity and the stronger monogamy inequality. The trouble with Esq is
that it badly violates the analogue of Lemma 4.2: there exist N ×N-dimensional bipartite states ρ AB Such
  11 Note that the monogamy inequality implies superadditivity, via

                0   0          0            0   0                   0            0         0 0                     0 0
         E(ρ AA ,BB ) ≥ E(ρ AA ,B ) + E(ρ AA ,B ) ≥ E(ρ AB ) + E(ρ A B ) + E(ρ AB ) + E(ρ A B ) ≥ E(ρ AB ) + E(ρ A B ) .




                               T HEORY OF C OMPUTING, Volume 5 (2009), pp. 1–42                                            31
                   S. A ARONSON , S. B EIGI , A. D RUCKER , B. F EFFERMAN , AND P. S HOR

that Esq (ρ AB ) = O( logNN ), yet ρ AB has trace distance Ω(1) to any separable state. The example that shows
this is once again the maximally antisymmetric state, which seems like the “universal counterexample”
of entanglement theory! This is why we cannot use squashed entanglement in this paper, and must
instead use entanglement of formation.

4.2     The two-prover case
We now show that the Weak Additivity Conjecture implies the QMA(2) Amplification Conjecture.

Theorem 4.5. Assume the Weak Additivity Conjecture. Then QMA(2, a, b) = QMA(2, 2−p(n) , 1−2−p(n) )
for all b − a = Ω(1/ poly(n)) and all polynomials p.

Proof. Let L be a language in QMA(2, a, b); then we need to show L ∈ QMA(2, 2−p(n) , 1 − 2−p(n) ). Let
Q be Arthur’s verification algorithm in the original QMA(2, a, b) protocol, and let the original Merlins’
messages have r(n) qubits each for some polynomial r. Also, let T (n) be a number of repetitions of
Q that suffices to amplify it to error probability 2−p(n) , assuming no entanglement among MerlinA ’s or
MerlinB ’s registers. By a standard Chernoff bound, we can take T (n) := C · p(n)/(b − a)2 for some
constant C.
    Our amplified protocol is the following.

  (1) Arthur asks MerlinA and MerlinB to supply q(n) copies each of their respective witnesses, where
      q(n) := C0 · T (n)r(n)/(b − a)2 for some constant C0 . Denote by ρ A1 A2 ···Aq(n) and ρ B1 B2 ···Bq(n) the
      states on q(n)r(n) qubits that Arthur actually receives.

  (2) For all t := 1 to T (n), Arthur chooses registers A j and Bk uniformly and independently from
      among those not already chosen, and runs Q on the state ρ A j Bk .

  (3) Arthur accepts if at least a+b
                                  2 T (n) of the T (n) invocations of Q accepted, and rejects otherwise.

      We need to show two things about this protocol, completeness and soundness.

Completeness: If the Merlins are honest, they can simply send |ψA i⊗q(n) and |ψB i⊗q(n) respectively,
    where |ψA i ⊗ |ψB i is a witness that Q accepts with probability at least b. Then by assumption,
    Arthur will accept with probability at least 1 − 2−p(n) .

Soundness: As usual, this is the interesting part. Our central claim is the following: At every one of the
     T (n) iterations, Arthur can be considered
                                         p       to be running Q on a bipartite state ρ AB that is ε-close
     to a separable state, where ε = O( T (n)r(n)/q(n)).

    Let us first see why soundness follows from the above claim. Suppose x ∈/ L. Then Q accepts every
separable state with probability at most a. By Proposition 2.6, then, Q also accepts every state that is
ε-close to separable with probability at most a + ε. But
                                            s              !
                                               T (n) r (n)     b−a
                                    ε =O                     ≤     ,
                                                  q (n)         4

                           T HEORY OF C OMPUTING, Volume 5 (2009), pp. 1–42                                  32
                                           T HE P OWER OF U NENTANGLEMENT

provided we chose a sufficiently large constant C0 when defining q(n). So every invocation of Q accepts
with probability at most a + (b − a)/4. Therefore, provided we choose a sufficiently large constant C
when defining T (n), Arthur will accept with probability at most 2−p(n) by a Chernoff bound.
    We now prove the claim. By Lemma 4.3, the entanglement of formation between MerlinA ’s registers
and MerlinB ’s registers can be at most 2r(n) after the first iteration, at most 4r(n) after the second
iteration, and so on. Hence

                                        EF ρ A1 A2 ···Aq(n) ,B1 B2 ···Bq(n) ≤ 2T (n) r (n)
                                                                           


throughout the protocol. Also, let SA and SB be the sets of A-registers and B-registers respectively that
Arthur has not yet chosen. Then |SA | = |SB | = q(n) − T (n). Assuming the Weak Additivity Conjecture,
we therefore have

                                    EF ρ A j Bk = O (q (n) − T (n)) EF ρ A1 A2 ···Aq(n) ,B1 B2 ···Bq(n)
                                                                                                       
                       ∑
                  A j ∈SA ,Bk ∈SB

                                                 = O (T (n) r (n) (q (n) − T (n))) .

So if we define
                                                        1
                                             σ :=                               ρ A j Bk ,
                                                    |SA | |SB | A j ∈S∑
                                                                      A ,Bk ∈SB


then the convexity of EF implies that
                                                                                                 
                        1                            A j Bk
                                                                   T (n) r (n)         T (n) r (n)
          EF (σ ) ≤                             EF ρ          =O                   =O                 ,
                    |SA | |SB | A j ∈S∑
                                      A ,Bk ∈SB
                                                                   q (n) − T (n)           q (n)
                                                                         p
using the fact that T (n) ≤ q(n)/2. By Lemma 4.2, this means that σ is O( T (n)r(n)/q(n))-close to a
separable state, as claimed.


4.3   The k-prover case
In this section we show unconditionally that any QMA(k) protocol with constant soundness can be
simulated by a QMA(2) protocol with soundness Ω(1/k). Combined with Theorem 4.5, this will imply
that assuming the Weak Additivity Conjecture, for all k ≥ 2 we have

                           QMA (k, 1/3, 2/3) ⊆ QMA (2, a, b) ⊆ QMA (2, 1/3, 2/3)

for some a, b with b − a = Ω(1/k), and hence QMA(k) = QMA(2). Note that Kobayashi et al. (personal
communication) have independently shown that amplification of QMA(2) protocols implies QMA(k) =
QMA(2) for all k ≥ 2. (Their earlier result only showed that amplification of QMA(k) protocols implies
QMA(k) = QMA(2) for all k ≥ 2, which was not strong enough for our purposes.)
                                                              2
Theorem 4.6. QMA(k, a, b) ⊆ QMA(2, 1 − (b−a)       −n
                                         8k , 1 − 2 ).


                              T HEORY OF C OMPUTING, Volume 5 (2009), pp. 1–42                               33
                    S. A ARONSON , S. B EIGI , A. D RUCKER , B. F EFFERMAN , AND P. S HOR

Proof. We will show that for all k and all δ = Ω(1/ poly(n)),

                                                                      δ2
                                                                                 
                                                 −n                            −n
                                                      
                          QMA k, 1 − δ , 1 − 2            ⊆ QMA 2, 1 − , 1 − 2      .
                                                                      8k

This will suffice to prove the theorem, since Lemma 2.5 implies that for all k and all a, b, we have
QMA(k, a, b) ⊆ QMA(k, 1 − (b − a), 1 − 2−n ).
   Our protocol is as follows. MerlinA and MerlinB send k-partite states ρ A1 A2 ···Ak and ρ B1 B2 ···Bk re-
spectively. Given these states, Arthur performs one of the following two tests, each with probability
1/2:

  (1) Choose i ∈ [k] uniformly at random, perform a swap test between ρ Ai and ρ Bi , and accept if and
      only if the swap test accepts.

  (2) Simulate the QMA(k, 1 − δ , 1 − 2−n ) protocol, using ρ A1 A2 ···Ak in place of the k witness registers.

     We first show completeness of the above protocol. If the Merlins are honest, they can both simply
send k unentangled accepting witnesses for the QMA(k) protocol being simulated. In that case step (1)
accepts with probability 1, while step (2) accepts with probability at least 1 − 2−n .
     We now show soundness. Suppose any set of unentangled witnesses causes the QMA(k) protocol to
reject with probability at least δ . Then we need to show that any pair of witnesses ρ A1 A2 ···Ak and ρ B1 B2 ···Bk
causes the QMA(2) protocol to reject with probability at least δ 2 /(8k). We consider two cases.
     First suppose ρ A1 A2 ···Ak is ε-close in trace distance to some separable pure state |Ψi. Then by Propo-
sition 2.6, step (2) rejects with probability at least δ − ε.
     Next suppose ρ A1 A2 ···Ak is ε-far in trace distance from any separable pure state. Then by Proposi-
tion 2.9, we have hΨ|ρ A1 A2 ···Ak |Ψi < 1 − ε 2 for all separable pure states |Ψi. So taking the contrapositive
of Proposition 2.8, for all pure states |ψ1 i, . . . , |ψk i we have

                                           k
                                          ∑ 1 − ψi |ρ A |ψi           > ε 2.
                                                                  
                                                             i

                                          i=1


Hence step (1) rejects with probability greater than ε 2 /(2k) by Proposition 2.10. Setting ε = 3δ /4, we
thus find that the protocol rejects with probability at least

                                                 δ (3δ /4)2     δ2
                                                           
                                         1
                                           max     ,          ≥    .
                                         2       4    2k        8k



    Combining Theorem 4.6 with Theorem 4.5 now yields the following:

Corollary 4.7. The Weak Additivity Conjecture implies the Collapse Conjecture: QMA(k) = QMA(2)
for all k ≥ 2.

                            T HEORY OF C OMPUTING, Volume 5 (2009), pp. 1–42                                     34
                                       T HE P OWER OF U NENTANGLEMENT

4.4     Symmetric QMA(k)
Define the complexity class SymQMA(k, a, b) the same way as QMA(k, a, b), except that now we are
promised that the k witnesses are all identical (in both the completeness and soundness cases). We saw in
Section 3.3 that symmetric QMA(k) protocols are sometimes easier to analyze than non-symmetric ones.
However, we will now show that assuming the Weak Additivity Conjecture, QMA(k) and SymQMA(k)
are actually equivalent.12
    The first step is to show they are (unconditionally) equivalent up to a loss in error bounds.
                                                                               2
Lemma 4.8. QMA(k, a, b) ⊆ SymQMA(k, a, b) ⊆ QMA(k, 1 − (b−a)       −n
                                                         8k , 1 − 2 ).

Proof. For the first containment, have each Merlin in the SymQMA protocol send k witnesses (for a
total of k2 witnesses). Then simulate the QMA protocol by using the ith witness from the ith Merlin for
all i ∈ [k]. For the second containment, first observe that

                         SymQMA (k, a, b) ⊆ SymQMA k, 1 − (b − a) , 1 − 2−n ,
                                                                             


completely analogously to Lemma 2.5. Let δ = b − a. Then to simulate a SymQMA(k, 1 − δ , 1 − 2−n )
protocol without the symmetry promise we do the following. Let |ϕi i be the witness sent by the ith
Merlin. Then

      • With 1/2 probability, Arthur performs a swap test between |ϕ1 i and a random other witness, and
        accepts if and only if the swap test accepts.

      • With 1/2 probability, Arthur runs the SymQMA protocol as if the witnesses were identical.

    In the completeness case, it is clear that Arthur accepts with probability greater than 1 − 2−n . To
show soundness we consider two cases, just like in Theorem 4.6. Let |Φi = |ϕ1 i ⊗ · · · ⊗ |ϕk i. First
suppose |Φi is ε-close in trace distance to |ϕ1 i⊗k . Then by Proposition 2.6, when he runs the SymQMA
protocol Arthur will reject with probability at least δ − ε. Next suppose |Φi is ε-far from |ϕ1 i⊗k . Then
|hϕ1 |⊗k |Φi|2 < 1 − ε 2 by Proposition 2.9. So by Proposition 2.8, we have
                                              k                  
                                             ∑ 1 − |hϕ1 |ϕi i|2 > ε 2 .
                                             i=1

Hence when Arthur performs a swap test, he rejects with probability greater than ε 2 /(2k).
   Setting ε := 3δ /4, we thus find that the protocol rejects with probability at least δ 2 /(8k).

      Combining Lemma 4.8 with Theorem 4.6, we immediately get the following.

Theorem 4.9. The QMA(2) amplification conjecture implies SymQMA(k) = QMA(k) = QMA(2) for
all k ≥ 2.
  12 This does not mean that proving the Weak Additivity Conjecture would supersede the analysis of the non-symmetric

case in Section 3. Our result will only show that, assuming the Weak Additivity Conjecture, the witness sizes in QMA and
SymQMA protocols are polynomially related, which is vacuous in the context of our 3S AT protocol.


                             T HEORY OF C OMPUTING, Volume 5 (2009), pp. 1–42                                        35
                      S. A ARONSON , S. B EIGI , A. D RUCKER , B. F EFFERMAN , AND P. S HOR

5      Nonexistence of perfect disentanglers
In this section, we will speak interchangeably about Hilbert spaces and spaces of density operators over
those Hilbert spaces.

Definition 5.1. Let H and K be two finite-dimensional Hilbert spaces. Then given a superoperator
Φ : H → K ⊗ K, we say Φ is an (ε, δ )-disentangler if

    (i) Φ(ρ) is ε-close to a separable state for every ρ, and

    (ii) for every separable state σ , there exists a ρ such that Φ(ρ) is δ -close to σ .

    As pointed out in Section 1.5, if for sufficiently small constants ε, δ there exists an (ε, δ )-
disentangler with log dim H = poly(log dim K)—and if, moreover, that disentangler can be implemented
in quantum polynomial time—then QMA(2) = QMA.
    Watrous (personal communication) has proposed the following fundamental conjecture.

Conjecture 5.2 (Watrous). For all constants ε, δ < 1, any (ε, δ )-disentangler requires

                                               dim H = 2Ω(dim K) .

   A proof of Conjecture 5.2 would be an important piece of formal evidence that QMA(2) 6= QMA,
and might even lead to a “quantum oracle separation” (as defined by Aaronson and Kuperberg [1])
between the two classes.
   Here we show that, at least in the case ε = δ = 0, no disentangler exists in any finite dimension. The
counterexamples in Section 1.5 imply that this result would be false if we let either ε or δ be nonzero.

Theorem 5.3. Let Φ : H → K ⊗ K be any superoperator whose image is the set of separable states.
Then dim K ≥ 2 implies dim H = ∞.

Proof. For any pure state |αi ∈ K, by assumption there exists a state ρα such that Φ(ρα ) = |αihα| ⊗
|αihα|. By convexity, we can assume ρα = |φα ihφα | is pure. Also, suppose dim H is finite. Then Φ
admits an operator-sum representation Φ(ρ) = ∑ki=1 Ei ρEi† where ∑ki=1 Ei† Ei = I. We then have

                                               k
                            Φ(|φα i hφα |) = ∑ Ei |φα i hφα | Ei† = |αi hα| ⊗ |αi hα| .
                                              i=1

So again by convexity, we find that Ei |φα i must be a multiple of |αi|αi for all i and α; that is, there
exist constants cα,i such that Ei |φα i = cα,i |αi|αi.
    Now let |αi, |β i be any two pure states in K with |αi 6= |β i. Also let |ψi = a|φα i + b|φβ i for some
nonzero real numbers a, b. Then

Φ(|ψi hψ|) = a2 Φ(|φα i hφα |) + b2 Φ( φβ           φβ ) + abΦ(|φα i φβ ) + abΦ( φβ hφα |)
                  2                       2
              = a |αi hα| ⊗ |αi hα| + b |β i hβ | ⊗ |β i hβ | + abc |αi hβ | ⊗ |αi hβ | + abc |β i hα| ⊗ |β i hα| ,


                             T HEORY OF C OMPUTING, Volume 5 (2009), pp. 1–42                                   36
                                     T HE P OWER OF U NENTANGLEMENT

where
                                                       k
                                               c = ∑ cα,i cβ ,i .
                                                   i=1

We claim that c = 0. To see this, recall that Φ(|ψihψ|) is a separable mixed state, and consider any
decomposition of Φ(|ψihψ|) into separable pure states. Since Φ(|ψihψ|) is a mixed state in the sub-
space spanned by |αi|αi and |β i|β i, every pure state in the support of Φ(|ψihψ|) must have the form
x|αi|αi + y|β i|β i. But by the assumption |αi =   6 |β i, such a state cannot be separable unless x = 0
or y = 0. Hence the only separable pure states in the support of Φ(|ψihψ|) are |αi|αi and |β i|β i.
Therefore abc = 0. But a and b were nonzero, so c = 0 as claimed.
    This means in particular that Φ(|φα ihφβ |) = 0 for all |αi 6= |β i. Hence
                                                                      !
                        k                          k
           hφβ |φα i = ∑ φβ Ei† Ei |φα i = Tr     ∑ Ei |φα i φβ Ei†
                                                                                           
                                                                          = Tr Φ(|φα i φβ ) = 0 .
                       i=1                        i=1

So for different |αi’s, the states |φα i are all orthogonal, and since the number of |αi’s is infinite, dim H
must be infinite as well.


6     Open problems
6.1     The power of multiple merlins
The power of QMA(2) and related classes is still poorly understood. Can we find a “classical” problem
(for example, a group-theoretic problem like those of Watrous [29]) that is in QMA(2) but not obviously
in QMA? Can we find a natural QMA(k)-complete promise problem?
     We still do not have any upper bound on QMA(2) better than the trivial NEXP, or even good ev-
idence for such an upper bound. As mentioned before, an earlier version of this paper showed that
QMA(2) ⊆ PSPACE assuming the “Strong Amplification Conjecture,” but Fernando Brandão (personal
communication) subsequently showed that the same conjecture implies QMA(2) = QMA. Can we show
under a plausible conjecture that QMA(2) ⊆ EXP or (better yet) QMA(2) ⊆ PSPACE?
     Regarding our 3S AT protocol, can we reduce the number of provers to two? Can we reduce the
number of qubits below O( e √n), or alternatively, give some sort of evidence against this possibility? For
                              √
example, can we show that Ω( n) qubits are information-theoretically required for the Uniformity Test?
Also, can we show that a QMA(k) protocol for 3S AT with no(1) qubits would have unlikely complexity
consequences?
     A long-shot possibility would be to give a quantum algorithm to find the unentangled witnesses in
the √3S AT protocol, in as much time as it would take were the witnesses entangled. This would yield a
2O( n) -time quantum algorithm for 3S AT.
  e



6.2     Amplification and other complexity issues
In defining QMA(k), does it matter if the amplitudes are reals or complex numbers? For BQP and QMA,
it is not hard to show that this distinction is irrelevant. Interestingly, though, the usual equivalence proofs

                             T HEORY OF C OMPUTING, Volume 5 (2009), pp. 1–42                               37
                     S. A ARONSON , S. B EIGI , A. D RUCKER , B. F EFFERMAN , AND P. S HOR

break down for QMA(k). As evidence that QMA(k) might actually be sensitive to the difference between
reals and complex numbers, consider the analysis of our 3S AT protocol: in Section 3.4, the result that
we need becomes much simpler to prove when we assume all amplitudes are real.
    Can we show directly (i. e., without proving the Weak Additivity Conjecture) that

                                                 QMA(k) = QMA(2) ,

or that QMA(2) protocols can be amplified?
     Can we prove Conjecture 5.2: that there is no (ε, δ )-disentangler with poly(n) qubits and ε, δ > 0?
Can we at least rule out such a disentangler when either ε > 0 or δ > 0? Related to that, can we give a
quantum oracle U (as defined by Aaronson and Kuperberg [1]) such that QMAU 6= QMAU (2)? Can we
at least show that Conjecture 5.2 would imply the existence of such an oracle?
     Define the complexity class QMA(2; h) to be the same as QMA(2), except that now, instead of being
completely unentangled, the two Merlins are allowed to share h EPR pairs √12 (|00i + |11i).13 Can we
show—perhaps under some assumption—that QMA(2; h) = QMA(2) for all polynomial h?

6.3 QMA(k) with unentangled measurements
Recall that our 3S AT protocol involved three tests: Satisfiability, Symmetry, and Uniformity. Suppose
we are willing to settle for completeness 1 − ε rather than 1, and suppose we modify the Uniformity
Test so that Arthur rejects on not seeing enough collisions. Then can the Symmetry Test be omitted?
If so, then the resulting protocol would have the extremely interesting property of making no entan-
gled measurements, yet nevertheless depending crucially on the absence of entanglement among the
witnesses.
    More generally, define BellQMA(k) to be the subclass of QMA(k) in which Arthur is restricted to
making a separate measurement on each witness |ϕi i, with no entanglement between the measurements,
and no measurement depending on the outcome of any other. (The name arises because Arthur is es-
sentially restricted to performing a “Bell experiment.”) In an earlier version of this paper, we raised the
question of whether BellQMA(k) = QMA: that is, whether a QMA(k) protocol with separate measure-
ments on each witness ever does superpolynomially better than an ordinary QMA protocol. Recently,
Brandão [9] managed to settle our question, showing that BellQMA(k) = QMA for all constant k.14 In-
deed, Brandão’s result applies even to the stronger model in which Arthur can first measure |ϕ1 i, then
choose a measurement for |ϕ2 i depending on the outcome of the |ϕ1 i measurement, and so on up to |ϕk i.
(In other words, in which there is one-way communication from earlier to later measurement steps.)
    On the other hand, define LOCCQMA(k) to be the subclass of QMA(k) in which Arthur’s measure-
ments on the k witnesses must be unentangled, but can involve arbitrary local operations and classical
communication. (In other words, Arthur can perform a partial measurement on |ϕ1 i, then based on the
outcome of that measurement perform a partial measurement on |ϕ2 i, then based on the outcome of that
measurement perform another partial measurement on |ϕ1 i, and so on.) Then where LOCCQMA(k)
  13 A similar notion was considered by Gavinsky [12]. By contrast, Kobayashi and Matsumoto [18] allow two provers to

share an arbitrary polynomial amount of entanglement.
  14 The case of larger k remains open, and therefore Brandão’s result has no implications for the earlier question about our

3S AT protocol.


                               T HEORY OF C OMPUTING, Volume 5 (2009), pp. 1–42                                            38
                                 T HE P OWER OF U NENTANGLEMENT

sits between QMA and QMA(k) remains an open problem. (Note that it is trivial to show amplifica-
tion for LOCCQMA(k), as for BellQMA(k). This is because, without entangling measurements, the
entanglement-swapping problem described in Section 1.4 can never arise.)


Acknowledgments
We thank Fernando Brandão for pointing out that our “Strong Amplification Conjecture” implies not
only QMA(2) ⊆ PSPACE but QMA(2) = QMA, and that our original version of Lemma 4.3 contained a
bug, among other extremely helpful comments; Norbert Schuch for pointing out that Theorem 4.5 can be
based on a conjecture about entanglement of formation rather than squashed entanglement; Madhu Su-
dan for pointers on the PCP Theorem; John Watrous for posing Conjecture 5.2; an anonymous reviewer
for greatly simplifying the proof of Lemma 4.3 and other help; and Patrick Hayden, Ryan O’Donnell,
Alain Tapp, and Andreas Winter for helpful discussions.


References
 [1] S. A ARONSON AND G. K UPERBERG: Quantum versus classical proofs and advice. Theory of
     Computing, 3(7):129–157, 2007. Previous version in Proc. CCC 2007 and quant-ph/0604056.
     [ToC:v003/a007, arXiv:quant-ph/0604056]. 36, 38

 [2] D. A HARONOV AND T. NAVEH: Quantum NP - a survey. quant-ph/0210077, 2002. [arXiv:quant-
     ph/0210077]. 2

 [3] M. B EN -O R , S. G OLDWASSER , J. K ILIAN , AND A. W IGDERSON: Multi-prover interactive
     proofs: how to remove the intractability assumptions. In Proc. 20th STOC, pp. 113–131, New
     York, NY, USA, 1988. ACM Press. [STOC:62212.62223]. 2

 [4] C. B ENNETT, E. B ERNSTEIN , G. B RASSARD , AND U. VAZIRANI: Strengths and
     weaknesses of quantum computing.        SIAM J. Comput., 26(5):1510–1523, 1997.
     [SICOMP:10.1137/S0097539796300933, arXiv:quant-ph/9701001]. 28, 29

 [5] C. H. B ENNETT, G. B RASSARD , C. C R ÉPEAU , R. J OZSA , A. P ERES , AND W. W OOTTERS:
     Teleporting an unknown quantum state by dual classical and EPR channels. Phys. Rev. Lett.,
     70:1895–1898, 1993. [PRL:10.1103/PhysRevLett.70.1895]. 7

 [6] C. H. B ENNETT, D. P. D I V INCENZO , J. A. S MOLIN , AND W. K. W OOTTERS: Mixed-
     state entanglement and quantum error correction. Phys. Rev. A, 54(5):3824–3851, 1996.
     [PRA:10.1103/PhysRevA.54.3824, arXiv:quant-ph/9604024]. 7, 29

 [7] H. B LIER AND A. TAPP: All languages in NP have very short quantum proofs. arXiv:0709.0738,
     2007. [arXiv:0709.0738]. 2, 3

 [8] D. M. B LOOM AND W. K NIGHT: A birthday problem. Amer. Math. Monthly, 80(10):1141–1142,
     December 1973. 14

                        T HEORY OF C OMPUTING, Volume 5 (2009), pp. 1–42                         39
                 S. A ARONSON , S. B EIGI , A. D RUCKER , B. F EFFERMAN , AND P. S HOR

 [9] F. B RAND ÃO: Entanglement Theory and the Quantum Simulation of Many-Body Physics. PhD
     thesis, Imperial College, London, 2008. [arXiv:0810.0026]. 38

[10] M. C HRISTANDL AND A. W INTER: “Squashed entanglement” - an additive entanglement mea-
     sure. J. Math. Phys., 45(3):829–840, 2004. [arXiv:quant-ph/0308088]. 7, 31

[11] I. D INUR:  The PCP theorem by gap amplification.                    J. ACM, 54(3):12,    2007.
     [JACM:1236457.1236459]. 3, 5, 12, 28

[12] D. G AVINSKY: On the role of shared entanglement. Quantum Inf. Comput., 8(1-2):82–95, 2008.
     [arXiv:quant-ph/0604052]. 38

[13] M. B. H ASTINGS: A counterexample to additivity of minimum output entropy. arXiv:0809.3972,
     2008. [arXiv:quant-ph/0809.3972]. 3, 7, 31

[14] S. K HANNA , M. S UDAN , L. T REVISAN , AND D. P. W ILLIAMSON: The approxima-
     bility of constraint satisfaction problems. SIAM J. Comput., 30(6):1863–1920, 2000.
     [SICOMP:10.1137/S0097539799349948]. 12

[15] A. K ITAEV, A. S HEN , AND M. N. V YALYI: Classical and Quantum Computation. American
     Mathematical Society, 2002. 2

[16] H. K LAUCK , A. NAYAK , A. TA -S HMA , AND D. Z UCKERMAN: Interaction in quantum commu-
     nication. IEEE Trans. Inform. Theory, 53(6):1970–1982, 2007. Earlier version in STOC’2001 and
     quant-ph/0603135. [doi:10.1109/TIT.2007.896888]. 30

[17] E. K NILL: Quantum randomness and nondeterminism. quant-ph/9610012, 1996. [arXiv:quant-
     ph/9610012]. 2

[18] H. KOBAYASHI AND K. M ATSUMOTO: Quantum multi-prover interactive proof systems with
     limited prior entanglement. J. Comput. System Sci., 66(3):429–450, 2003. [JCSS:10.1016/S0022-
     0000(03)00035-7]. 38

[19] H. KOBAYASHI , K. M ATSUMOTO , AND T. YAMAKAMI: Quantum Merlin-Arthur proof systems:
     are multiple Merlins more helpful to Arthur? In Proc. 14th Intern. Symp. Algorithms and Com-
     putation, volume 2906 of LNCS, pp. 189–198, 2003. [ISAAC:847uk09hm42tdu3q, arXiv:quant-
     ph/0306051]. 2, 10

[20] R. K ÖNIG AND R. R ENNER: A de Finetti representation for finite symmetric quantum states. J.
     Math. Phys., 46(122108), 2005. quant-ph/0410229. [arXiv:quant-ph/0410229]. 7, 8

[21] Y.-K. L IU , M. C HRISTANDL , AND F. V ERSTRAETE: Quantum computational complex-
     ity of the N-representability problem: QMA complete. Phys. Rev. Lett., 98(11), 2007.
     [PRL:10.1103/PhysRevLett.98.110503, arXiv:quant-ph/0609125]. 2, 8

[22] C. M ARRIOTT AND J. WATROUS: Quantum Arthur-Merlin games.                   Comput. Complexity,
     14(2):122–152, 2005. [CC:h0521k6666871652]. 9

                        T HEORY OF C OMPUTING, Volume 5 (2009), pp. 1–42                          40
                                 T HE P OWER OF U NENTANGLEMENT

[23] M. N IELSEN AND I. C HUANG: Quantum Computation and Quantum Information. Cambridge
     University Press, 2000. 10, 30

[24] M. O HYA AND D. P ETZ: Quantum Entropy and its Use. Springer, 1993. 30

[25] C. H. PAPADIMITRIOU AND M. H. YANNAKAKIS: Optimization, approximation, and complexity
     classes. J. Comput. System Sci., 43(3):425–440, 1991. [JCSS:10.1016/0022-0000(91)90023-X].
     12

[26] R. R AZ: A parallel repetition theorem. SIAM J. Comput., 27(3):763–803, 1998. Earlier version in
     STOC 1995. [SICOMP:10.1137/S0097539795280895]. 2

[27] P. W. S HOR: Equivalence of additivity questions in quantum information theory. Comm. Math.
     Phys., 246(3):453–472, 2004. [Springer:a6d2vegprhlqvrby, arXiv:quant-ph/0305035]. 7, 31

[28] V. V EDRAL AND M. B. P LENIO: Entanglement measures and purification procedures. Phys. Rev.
     A, 57(3):1619–1633, 1998. [PRA:10.1103/PhysRevA.57.1619, arXiv:quant-ph/9707035]. 30

[29] J. WATROUS: Succinct quantum proofs for properties of finite groups. In Proc. 41st
     FOCS, pp. 537–546, Silver Spring, MD, USA, 2000. IEEE Computer Society Press.
     [doi:10.1109/SFCS.2000.892141]. 2, 37

[30] M. Z UKOWSKI , A. Z EILINGER , M. A. H ORNE , AND A. K. E KERT: “event-ready-detectors”:
     Bell experiment via entanglement swapping. Phys. Rev. Lett., 71(26):4287–4290, 1993.
     [PRL:10.1103/PhysRevLett.71.4287]. 6


AUTHORS

      Scott Aaronson
      assistant professor
      EECS Department
      Massachusetts Institute of Technology, Cambridge, MA
      aaronson csail mit edu
      http://www.scottaaronson.com


      Salman Beigi
      graduate student
      Department of Mathematics
      Massachusetts Institute of Technology, Cambridge, MA
      salman mit edu
      http://web.mit.edu/salman/www/




                         T HEORY OF C OMPUTING, Volume 5 (2009), pp. 1–42                         41
              S. A ARONSON , S. B EIGI , A. D RUCKER , B. F EFFERMAN , AND P. S HOR

   Andrew Drucker
   graduate student
   EECS Department
   Massachusetts Institute of Technology, Cambridge, MA
   add3993 yahoo com
   http://andysresearch.blogspot.com/

   Bill Fefferman
   graduate student
   Department of Computer Science
   California Institute of Technology, Pasadena, CA
   wjf caltech edu
   http://www.its.caltech.edu/∼wjf

   Peter Shor
   professor
   Massachusetts Institute of Technology, Cambridge, MA
   shor math mit edu
   http://www-math.mit.edu/∼shor/

ABOUT THE AUTHORS
   S COTT A ARONSON is a theoretical computer scientist and blogger. This is his fourth paper
      in Theory of Computing.

   S ALMAN B EIGI received his B. S. at Sharif University of Technology, Tehran in 2004. He
      is currently finishing his Ph. D. at the MIT Math Department under the direction of Peter
      Shor. The title of his thesis is “Quantum Proof Systems and Entanglement Theory.”
      He will continue his research as a postdoc at the Institute for Quantum Information
      at Caltech. His interests include quantum complexity theory, quantum coding theory,
      photography, and playing daf, a traditional Persian musical instrument.

   A NDREW D RUCKER is a Ph. D. student in theoretical computer science at MIT, supervised
      by Scott Aaronson. He has broad interests in complexity theory, and also enjoys run-
      ning, live jazz, and the game of Go.

   B ILL F EFFERMAN is a Ph. D. student at Caltech in computer science and at the Institute for
       Quantum Information. He started this research while visiting MIT, and continued it at
       the University of Chicago, where he was an undergraduate. His research interests are
       quantum computing and computational complexity.

   P ETER S HOR is a professor at MIT. He is known for his factoring algorithm.


                      T HEORY OF C OMPUTING, Volume 5 (2009), pp. 1–42                            42