                              THEORY OF COMPUTING, Volume 5 (2009), pp. 141–172
                                           www.theoryofcomputing.org




                Parallel Repetition: Simplifications
                    and the No-Signaling Case∗
                                                   Thomas Holenstein†
                                     Received: June 20, 2008; published: July 29, 2009.



         Abstract: Consider a game where a referee chooses (x, y) according to a publicly known
         distribution, sends x to Alice, and y to Bob. Without communicating with each other, Alice
         responds with a value a and Bob responds with a value b. Alice and Bob jointly win if a
         publicly known predicate Q(x, y, a, b) is satisfied.
             Assume that the maximum probability that Alice and Bob can win in such a game
         is v < 1. Raz (SIAM J. Comput. 27, 1998) shows that if the game is repeated n times
         in parallel, then the probability that Alice and Bob win all games simultaneously is at
         most v̄^{n/log(s)}, where s is the maximal number of possible responses from Alice and Bob in
         the initial game, and v̄ < 1 is a constant depending only on v.
             In this work, we simplify Raz’s proof in various ways and thus shorten it significantly.
         Further we study the case where Alice and Bob are not restricted to local computations and
         can use any strategy which does not imply communication between them.

ACM Classification: F.4.1, F.1.2
AMS Classification: 68Q15, 60C05
Key words and phrases: parallel repetition, probabilistically checkable proofs

1      Introduction
In a two-player refereed game, a referee chooses (x, y) randomly according to a known distribution PXY ,
sends x to Alice, and y to Bob, who respond with values a and b, respectively. Alice and Bob jointly
     ∗ A preliminary version of this paper appeared in the proceedings of the thirty-ninth annual ACM Symposium on Theory of Computing (STOC ’07) [21].
     † This work was done while the author was at ETH Zürich.



    2009 Thomas Holenstein
    Licensed under a Creative Commons Attribution License                                     DOI: 10.4086/toc.2009.v005a008

win if a known predicate Q(x, y, a, b) is satisfied; we denote the maximum winning probability for such
a game by v and assume v < 1.
    In the parallel repetition of such a game, n pairs (x_i, y_i) are chosen independently according to
(P_{XY})^n. The values x_1, . . . , x_n are sent to Alice, the values y_1, . . . , y_n to Bob. The players respond with
values a_1, . . . , a_n and b_1, . . . , b_n, and win if Q(x_i, y_i, a_i, b_i) is satisfied for all i.
    The question studied in this paper is how much parallel repetition reduces the winning probability of
the players. It is motivated by the study of two-prover interactive proofs, initiated by Ben-Or et al. [5].
It was first conjectured that in a game which is repeated n times in parallel, the probability that Alice
and Bob win all the games simultaneously is at most v^n (see [19]). However, later a counterexample to
this conjecture was given [18].1


Related work Because of the PCP-Theorem and its application to hardness of approximation [14, 2,
1], people became interested in whether the winning probability at least decreases exponentially in n.
Various papers give upper bounds on the winning probability of a game which is repeated n times in
parallel [8, 13, 25, 34], but all of them fall short of giving a result of the desired form. Raz then
gave a very strong result in [32], showing that the winning probability is at most v̄^{n/log(s)} for some
constant v̄ < 1 depending only on v, where s is the total number of answers Alice and Bob may give
in the initial game. It is the only explicit bound for arbitrary distributions P_{XY} and quantitatively the
strongest. Parnafes, Raz, and Wigderson [29] modify Raz’s proof to show that the term log(s) can be
replaced by a parameter which is much smaller for some games. The current paper is based on Raz’s
result (see below), and gives a slightly stronger upper bound of the same form.
     Games for which the n-fold parallel repetition does not decrease the winning probability all the way
from v to v^n were also constructed: Fortnow [18] gives a game for which the maximal winning probability
in two repetitions is larger than v^2 (see also [16]), Feige [13] constructs a game where the winning
probability in two parallel repetitions does not decrease at all, and Feige and Verbitsky [17] give, for
infinitely many s, a game where Θ(log(s)/log log(s)) repetitions decrease the winning probability from
at most 3/4 to some value larger than 1/8, where s is the number of possible answers Alice and Bob can
give. This last result shows that in general a dependence on s as in Raz’s bound is needed.


No-signaling strategies No-signaling strategies are all those strategies which do not imply commu-
nication. Popescu and Rohrlich [30] give an example of such a strategy: Alice receives a bit x, Bob
receives a bit y, and they respond with uniform random bits a and b such that a ⊕ b = x ∧ y. It is clear
that this strategy cannot be implemented with shared randomness and without communication. On the
other hand, black-box access to the strategy does not give Alice and Bob the power to communicate.
    The study of no-signaling strategies is motivated by the idea that if Alice and Bob share some
entangled quantum state, the set of possible strategies they might use increases, but stays a subset of the
no-signaling strategies (this subset is strict: for example the above strategy which achieves a ⊕ b = x ∧ y
from (x, y) cannot be simulated perfectly using quantum mechanics [9], see also [28, Problem 2.3]—the
corresponding game is called the CHSH-game [10]).
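The Popescu–Rohrlich box just described can be written down explicitly as a conditional distribution and checked mechanically. The following sketch (all function and variable names are our own) verifies the two no-signaling conditions and that the box satisfies the CHSH predicate a ⊕ b = x ∧ y with certainty, whereas local strategies win the CHSH game with probability at most 3/4.

```python
import itertools

def pr_box(x, y):
    """Popescu-Rohrlich box: the no-signaling conditional distribution
    P(a, b | x, y), with a, b uniform bits satisfying a XOR b = x AND y."""
    return {(a, b): (0.5 if (a ^ b) == (x & y) else 0.0)
            for a, b in itertools.product((0, 1), repeat=2)}

# No-signaling: Alice's marginal on a must not depend on Bob's query y
# (and symmetrically for Bob), so black-box access gives no communication.
for x in (0, 1):
    for a in (0, 1):
        marginals = [sum(p for (aa, b), p in pr_box(x, y).items() if aa == a)
                     for y in (0, 1)]
        assert marginals[0] == marginals[1]
for y in (0, 1):
    for b in (0, 1):
        marginals = [sum(p for (a, bb), p in pr_box(x, y).items() if bb == b)
                     for x in (0, 1)]
        assert marginals[0] == marginals[1]

# The box satisfies the CHSH predicate with probability 1 for every query pair.
assert all(sum(p for (a, b), p in pr_box(x, y).items() if (a ^ b) == (x & y)) == 1.0
           for x in (0, 1) for y in (0, 1))
```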
    We remark that there are games which can be won with probability 1 given a shared quantum state
   1 For readers not familiar with such counterexamples, a variation of Fortnow’s game is reproduced in Section 10.1.




(and thus with a no-signaling strategy), but not using local strategies. Those are called “pseudo-telepathy
games” (see [6] and the references therein).
     A parallel repetition theorem for the case where Alice and Bob share a quantum state and the decision
of the referee only depends on the XOR of the binary answers of Alice and Bob was given by Cleve et
al. [11].


Consistent sampling One of the contributions of this paper is to introduce consistent sampling in
the context of parallel repetition (Lemma 5.2). It essentially considers the following scenario: assume
Alice and Bob are given distributions P_{X_A} and P_{X_B} with the promise that ‖P_{X_A} − P_{X_B}‖_1 ≤ ε, and they
are supposed to use shared randomness to produce elements X_A and X_B distributed according to their
respective distributions while maximizing Pr[X_A = X_B]; we will see that 1 − O(ε) is achievable.
    After the conference version [21] of this paper was published, it came to our attention that
methods for this are known under the name consistent sampling. The first reference seems to be [27];
other occurrences are [7, 24, 20, 26]. Often, the special case where the two distributions are uniform
over a certain set is studied; the difference from the case considered here is small, however.
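A minimal sketch of the idea, using a rejection-sampling variant rather than the exact construction of Lemma 5.2 (the distributions and names below are illustrative): Alice and Bob run the same rejection sampler on an identical shared random stream, each against her or his own distribution. They can disagree only when one of them accepts a proposal that the other rejects, which happens with probability O(ε).

```python
import random

def sample_with_shared_stream(dist, rng):
    """Rejection sampling driven entirely by the shared random stream `rng`:
    propose a uniform element of the support and accept it with probability
    dist[element].  The accepted element is distributed according to `dist`."""
    universe = sorted(dist)
    while True:
        x = universe[rng.randrange(len(universe))]
        if rng.random() < dist[x]:
            return x

# Two distributions with statistical distance eps = 0.1.
pa = {'a': 0.5, 'b': 0.3, 'c': 0.2}
pb = {'a': 0.5, 'b': 0.2, 'c': 0.3}
eps = 0.5 * sum(abs(pa[k] - pb[k]) for k in pa)

# Alice and Bob run the procedure on identical shared streams; both accept
# at the same step unless one accepts a proposal the other rejects.
meta = random.Random(0)
trials, agree = 20000, 0
for _ in range(trials):
    seed = meta.getrandbits(64)
    xa = sample_with_shared_stream(pa, random.Random(seed))
    xb = sample_with_shared_stream(pb, random.Random(seed))
    agree += (xa == xb)
assert agree / trials >= 1 - 2 * eps   # Pr[X_A = X_B] >= 1 - O(eps)
```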


Contributions of this paper As mentioned above, the strongest upper bound on the winning prob-
ability of a game repeated n times in parallel previous to this paper was given by Raz [32]. His
proof was somewhat complicated. In this paper we simplify Raz’s proof and give a slightly stronger
bound in the following sense: Raz shows that the winning probability of the players is at most
(1 − Ω((1 − v)^c))^{n/log(s)}, where c = 32 is implicit in the paper (i. e., using our previous notation, Raz shows
v̄ = 1 − Ω((1 − v)^32)). We improve the constant in the exponent to c = 3. The most important change
we make to achieve this is to replace a large part (essentially Section 6) of Raz’s paper with consistent
sampling.
    The use of consistent sampling also makes the rest of the argument simpler. We briefly explain
why: the main part of Raz’s proof consists of showing that the information the players get in the n-fold
repetition does not help them win the subgame in some coordinate j, even conditioned on the event that
certain other subgames are won. This is done in three steps. In two of these steps the information does
not help the players because they can generate this information themselves with local computation only.
Consistent sampling implies that this also holds for the third step. This allows us to merge some of the
steps, which simplifies the overall structure.
    We then study how much the term log(s) in the exponent in the parallel repetition theorem can be
reduced. In [29] it is shown that the logarithm of the partition number of the acceptance predicate can
be used instead of log(s). Based on the ideas from there, we give a bound which might be stronger for
some games (see Theorem 8.2).
    Finally, we prove a parallel repetition theorem in case Alice and Bob are restricted to no-signaling
strategies (in both the given game and the parallel repetition of it).


Subsequent work Theorem 2.5, the main result of this paper, states that the winning probability of
the two players in the n-fold repetition is at most (1 − (1 − v)^3)^{Ω(n/log(s))}. Based on the proof given here,
Rao [31] strengthens this to (1 − (1 − v)^2)^{Ω(n)}, for games where for every triple (x, y, a) a unique b


satisfies Q(x, y, a, b); such games are known as “projection games.” This special case is important in the
application of the Parallel Repetition Theorem to probabilistically checkable proofs.
    Raz [33] subsequently showed that Rao’s bound is essentially tight for the family of odd cycle games,
excluding the possibility of reducing the exponent 2 in the above bound. Such an improvement would
have been interesting: [15] shows that any exponent smaller than 2 would imply an equivalence between
certain Max-Cut hardness and Khot’s Unique Games conjecture [23]. Barak et al. [3] strengthen Raz’s
result, and give a non-trivial approximation algorithm for the value of a unique game (the special case
of a projection game where also for each b there is a unique a which satisfies Q(x, y, a, b)) repeated in
parallel.
    In [31], Rao also gives a concentration bound: he shows that it is not only unlikely that the players
win all the games, but in fact it is unlikely that they win more than a 1 − v + δ fraction of the games for
any constant δ > 0. For this, he modifies the proof given here appropriately.


2     Notation and basic facts
2.1   Probability distributions
We use calligraphic letters to denote sets. We denote random variables using capital letters, and val-
ues with lower case letters. We use superscripts to denote tuples, e. g., X^n := (X_1, . . . , X_n) and
x^n := (x_1, . . . , x_n).
     If a distribution P_{XY} over X × Y is given, we write P_X or P_Y to denote the marginal distribution, e. g.,
P_X(x) := ∑_{y∈Y} P_{XY}(x, y). The conditional distribution P_{Y|X=x} is P_{Y|X=x}(y) := P_{XY}(x, y)/P_X(x), which is
defined if P_X(x) > 0. To also define it if P_X(x) = 0, we assign expressions of the form 0/0 the value 0,
if they occur in this (or any other) context.
     Let P_{X_0} be a distribution over X and P_{Y_1|X_1=x} be a conditional distribution over Y. We define the
distribution P_{X_0}P_{Y_1|X_1} over X × Y as

\[ (P_{X_0}P_{Y_1|X_1})(x, y) := P_{X_0}(x) \cdot P_{Y_1|X_1=x}(y) . \tag{2.1} \]

For this, it is necessary that P_{Y_1|X_1=x} is defined for every x ∈ X. We also use this notation when P_{Y_1|X_1=x}
is defined as the marginal of a given distribution P_{X_1Y_1}. In this case, we define P_{Y_1|X_1=x} in an arbitrary
way if P_{X_1}(x) = 0. This notation is used for example in Corollary 5.3 in the form P_{X_0Y_0}P_{S|X}, where it is
understood as (P_{X_0Y_0}P_{S|X})(x, y, s) := P_{X_0Y_0}(x, y)P_{S|X=x}(s). Note that the conditional distribution P_{S|X=x}
is defined there by the marginal distribution P_{SX} of the given distribution P_{SXY}. The notation P_{X_0Y_0}P_{S|X}
is not completely explicit, since it does not specify that X_0 is to be associated with X. However, it will
always be clear from the context.
    For two probability distributions P_{X_0} and P_{X_1} over the same set X we define the statistical distance

\[ \|P_{X_0} - P_{X_1}\| := \frac{1}{2} \sum_{x \in \mathcal{X}} \bigl| P_{X_0}(x) - P_{X_1}(x) \bigr| . \tag{2.2} \]

    The following lemma states alternate ways of characterizing the statistical distance. Equation (2.3)
states that the statistical distance is the advantage of the best function in predicting whether a given
sample is from X_0 or X_1. Equation (2.4) states that X_0 and X_1 can be generated at the same time such that
they differ only with probability ‖P_{X_0} − P_{X_1}‖ (i. e., it provides some sort of coupling).
Lemma 2.1. Let P_{X_0} and P_{X_1} be distributions over a finite set X. Then,

\[ \max_{f : \mathcal{X} \to \{0,1\}} \bigl( \Pr[f(X_0) = 0] - \Pr[f(X_1) = 0] \bigr) = \|P_{X_0} - P_{X_1}\| \tag{2.3} \]

\[ \min_{P_{X_0'X_1'} :\, P_{X_0} = P_{X_0'} \wedge P_{X_1} = P_{X_1'}} \Pr[X_0' \neq X_1'] = \|P_{X_0} - P_{X_1}\| . \tag{2.4} \]

Proof. For (2.3) we refer to [12], equation (11.137). For (2.4) we get, for any P_{X_0'X_1'} with
(P_{X_0} = P_{X_0'}) ∧ (P_{X_1} = P_{X_1'}):

\[ \Pr[X_0' = X_1'] = \sum_{x \in \mathcal{X}} \Pr[X_0' = X_1' = x] \leq \sum_{x \in \mathcal{X}} \min\bigl(P_{X_0}(x), P_{X_1}(x)\bigr) = \sum_{x \in \mathcal{X}} \Bigl( \frac{P_{X_0}(x) + P_{X_1}(x)}{2} - \frac{|P_{X_0}(x) - P_{X_1}(x)|}{2} \Bigr) = 1 - \|P_{X_0} - P_{X_1}\| . \]
We can reach equality. For this, let ε := ‖P_{X_0} − P_{X_1}‖ and assume 0 < ε < 1 (in the other cases the lemma
is trivial). Define the probability distributions

\[ P_X = \frac{1}{1-\varepsilon} \min(P_{X_0}, P_{X_1}) , \qquad P_{\tilde{X}_0} = \frac{1}{\varepsilon} \bigl( P_{X_0} - \min(P_{X_0}, P_{X_1}) \bigr) , \qquad P_{\tilde{X}_1} = \frac{1}{\varepsilon} \bigl( P_{X_1} - \min(P_{X_0}, P_{X_1}) \bigr) . \]

Then, P_{X_0'X_1'}(x_0, x_1) = (1 − ε)δ_{x_0,x_1}P_X(x_0) + εP_{X̃_0}(x_0)P_{X̃_1}(x_1), where δ_{a,b} is the Kronecker delta, has
the desired properties (note that it is a convex combination of the two distributions δ_{x_0,x_1}P_X(x_0) and
P_{X̃_0}(x_0)P_{X̃_1}(x_1)).
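The coupling constructed in this proof can be checked mechanically for a concrete pair of distributions. The following sketch (the two distributions are illustrative choices) builds the joint distribution from the proof and verifies that its marginals are P_{X_0} and P_{X_1} and that the disagreement probability is exactly ε = ‖P_{X_0} − P_{X_1}‖.

```python
from math import isclose

# Two illustrative distributions on {a, b, c}.
p0 = {'a': 0.6, 'b': 0.3, 'c': 0.1}
p1 = {'a': 0.3, 'b': 0.3, 'c': 0.4}
keys = list(p0)
eps = 0.5 * sum(abs(p0[k] - p1[k]) for k in keys)   # statistical distance

# The three auxiliary distributions from the proof of Lemma 2.1.
pmin = {k: min(p0[k], p1[k]) for k in keys}
px  = {k: pmin[k] / (1 - eps) for k in keys}
pt0 = {k: (p0[k] - pmin[k]) / eps for k in keys}
pt1 = {k: (p1[k] - pmin[k]) / eps for k in keys}

# Joint distribution (1-eps)*delta_{x0,x1}*P_X(x0) + eps*Pt0(x0)*Pt1(x1).
joint = {(k0, k1): (1 - eps) * (k0 == k1) * px[k0] + eps * pt0[k0] * pt1[k1]
         for k0 in keys for k1 in keys}

# Marginals are p0 and p1, and the disagreement probability equals eps.
for k in keys:
    assert isclose(sum(joint[(k, k1)] for k1 in keys), p0[k])
    assert isclose(sum(joint[(k0, k)] for k0 in keys), p1[k])
assert isclose(sum(p for (k0, k1), p in joint.items() if k0 != k1), eps)
```

Note that the residual distributions pt0 and pt1 always have disjoint supports, which is why the cross term contributes exactly ε to the disagreement probability.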

2.2   Games
Definition 2.2. A game G = (P_{XY}, Q) over X × Y × A × B is a distribution P_{XY} over X × Y and a
predicate Q over X × Y × A × B. The value v(G) of a game is

\[ v(G) := \max_{h_a, h_b} \Pr_{XY}\bigl[ Q(X, Y, h_a(X), h_b(Y)) \bigr] , \]

where the maximization is over functions h_a : X → A and h_b : Y → B. A strategy (h_a, h_b) for a game is
a pair of such functions.
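For small finite games the value of Definition 2.2 can be computed by brute force over all deterministic strategy pairs. The sketch below (function names are our own) recovers the classical value 3/4 of the CHSH game mentioned in the introduction.

```python
from itertools import product

def game_value(game_x, game_y, game_a, game_b, pxy, Q):
    """Brute-force v(G): maximize the winning probability over all
    deterministic strategy pairs (h_a, h_b), per Definition 2.2."""
    best = 0.0
    for ha in product(game_a, repeat=len(game_x)):     # ha indexed by x
        for hb in product(game_b, repeat=len(game_y)):  # hb indexed by y
            win = sum(pxy[(x, y)]
                      for ix, x in enumerate(game_x)
                      for iy, y in enumerate(game_y)
                      if Q(x, y, ha[ix], hb[iy]))
            best = max(best, win)
    return best

# CHSH game: uniform independent query bits, predicate a XOR b = x AND y.
bits = (0, 1)
pxy = {(x, y): 0.25 for x in bits for y in bits}
chsh = lambda x, y, a, b: (a ^ b) == (x & y)
assert game_value(bits, bits, bits, bits, pxy, chsh) == 0.75
```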
   Sometimes also randomized strategies for Alice and Bob are considered, where h_a and h_b also de-
pend on (the same) shared randomness r chosen according to some distribution P_R. However, there
always exists an r ∈ R such that

\[ \Pr_{RXY}\bigl[ Q(X, Y, h_a(X, R), h_b(Y, R)) \bigr] = \mathop{\mathrm{E}}_{R} \Bigl[ \Pr_{XY}\bigl[ Q(X, Y, h_a(X, R), h_b(Y, R)) \bigr] \Bigr] \leq \Pr_{XY}\bigl[ Q(X, Y, h_a(X, r), h_b(Y, r)) \bigr] , \tag{2.5} \]



and we see that the definition of the value is robust against such a change. Individual (local) randomness
can be obtained from shared randomness and is thus a special case of the above.
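The averaging argument behind (2.5) can be observed numerically: the winning probability of any shared-randomness strategy is a convex combination of the values of deterministic strategy pairs, so it never exceeds the deterministic maximum. A sketch for the CHSH game (the game and the sampling of mixtures are illustrative choices):

```python
import random
from itertools import product

# All deterministic strategy pairs for the CHSH game and their winning
# probabilities; a shared-randomness strategy is a mixture of these.
bits = (0, 1)
chsh = lambda x, y, a, b: (a ^ b) == (x & y)
pure_values = [sum(0.25 for x in bits for y in bits if chsh(x, y, ha[x], hb[y]))
               for ha in product(bits, repeat=2) for hb in product(bits, repeat=2)]
v = max(pure_values)                     # deterministic value, 3/4 for CHSH

# The winning probability of any mixture is an average of the pure values,
# so some deterministic r does at least as well: randomization cannot help.
rng = random.Random(0)
for _ in range(500):
    weights = [rng.random() for _ in pure_values]
    total = sum(weights)
    mixed = sum(w / total * p for w, p in zip(weights, pure_values))
    assert mixed <= v + 1e-12
assert v == 0.75
```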
Definition 2.3. The n-fold parallel repetition G^n of a game G = (P_{XY}, Q) over X × Y × A × B is the
game over X^n × Y^n × A^n × B^n which is given by G^n := (P_{X^nY^n}, Q^{∧n}) where

\[ P_{X^nY^n}(x^n, y^n) := \prod_{i=1}^{n} P_{XY}(x_i, y_i) , \qquad\text{and}\qquad Q^{\wedge n}(x^n, y^n, a^n, b^n) := \bigwedge_{i=1}^{n} Q(x_i, y_i, a_i, b_i) . \]
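For tiny games the repeated game of Definition 2.3 is still small enough to brute-force. The sketch below (the encoding of strategies is our own choice) computes v(G^2) for the CHSH game and checks the generic bounds v^2 ≤ v(G^2) ≤ v; the lower bound holds because playing the optimal single-game strategy independently in each coordinate wins both with probability v^2. The double loop over all 256 × 256 strategy pairs takes a few seconds.

```python
from itertools import product

# Brute-force v(G^2) for the CHSH game per Definition 2.3: queries and
# answers are bit pairs, and the players must win both coordinates.
bits = (0, 1)
chsh = lambda x, y, a, b: (a ^ b) == (x & y)
pairs = list(product(bits, repeat=2))     # the four possible x^2 (or y^2, a^2, b^2)

# Precompute which (query pair, answer pair) combinations win both subgames.
wins = {(i, a2, j, b2): chsh(pairs[i][0], pairs[j][0], a2[0], b2[0]) and
                        chsh(pairs[i][1], pairs[j][1], a2[1], b2[1])
        for i in range(4) for a2 in pairs for j in range(4) for b2 in pairs}

best = 0.0
for ha in product(pairs, repeat=4):       # ha maps each x^2 (by index) to an a^2
    for hb in product(pairs, repeat=4):
        won = sum(wins[(i, ha[i], j, hb[j])] for i in range(4) for j in range(4))
        best = max(best, won / 16)

v1 = 0.75                                  # v(G) for CHSH
assert v1 ** 2 <= best <= v1
```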

   If a strategy is given, the distribution P_{X^nY^nA^nB^n} of queries and answers is defined in the obvious way.
We further define, for all i, the event W_i which occurs if the ith subgame is won.

Definition 2.4. For a game G^n and a strategy (h_a, h_b) the distribution P_{X^nY^nA^nB^n} over X^n × Y^n × A^n × B^n
is given by

\[ P_{X^nY^nA^nB^n}(x^n, y^n, a^n, b^n) := \begin{cases} P_{X^nY^n}(x^n, y^n) & \text{if } h_a(x^n) = a^n \text{ and } h_b(y^n) = b^n \\ 0 & \text{otherwise.} \end{cases} \]

Further, W^n is the tuple of events (W_1, . . . , W_n) where W_i :⟺ Q(X_i, Y_i, A_i, B_i).
    We prove the following version of the Parallel Repetition Theorem.
Theorem 2.5 (Parallel Repetition Theorem). For any game G with value v := v(G) and any integer n:

\[ v(G^n) \leq \Bigl( 1 - \frac{(1-v)^3}{6000} \Bigr)^{\frac{n}{\log(|\mathcal{A}||\mathcal{B}|)}} . \]
    The constant 6000 could be improved by a more careful analysis (we do not optimize the constants
which arise during the proof). However, we do not know whether the 3 in the exponent can be reduced
in the general case (as remarked above, it was reduced to 2 for some games by Rao [31]).
    In [29] it is shown that in Raz’s proof the term log(|A||B|) in the exponent can be reduced to the
maximum of the logarithm of the partition number of Q(x, y, ·, ·). As shown by Beame [4], the argument
can be adapted to work with the proof given here. We give a slightly different argument in Section 8
which shows how the term can be reduced to a quantity which is a lower bound on the logarithm of the
partition number.


3    Proof sketch
Fix an arbitrary game G, its n-fold parallel repetition G^n, and a strategy (h_a, h_b) for G^n. With the notation
from Definition 2.4, the parallel repetition theorem is simply an upper bound on Pr[W_1 ∧ · · · ∧ W_n]. To
get such an upper bound, we show that for arbitrary indices i_1, . . . , i_m there exists an index j such that

\[ \Pr[W_j \mid W_{i_1} \wedge \cdots \wedge W_{i_m}] \leq v(G) + \varepsilon , \tag{3.1} \]

where ε depends on m, n, log(|A||B|), and Pr[W_{i_1} ∧ · · · ∧ W_{i_m}] (this is Lemma 6.5). From (3.1) a simple
induction gives the parallel repetition theorem, thus we now concentrate on the proof of (3.1).


Locally computable embeddings In order to prove (3.1) we define the distribution

\[ P_{\tilde{X}^n\tilde{Y}^n} := P_{X^nY^n \mid W_{i_1} \wedge \cdots \wedge W_{i_m}} \tag{3.2} \]

(i. e., the distribution of the message which the referee sends to Alice and Bob conditioned on the event
that the games i_1 to i_m are won).
     We show (Lemma 6.4) that for some j the following can be achieved by Alice and Bob without
communication and using shared randomness only:

   1. Alice, on input x, produces a tuple x̄^n with x̄_j = x.

   2. Bob, on input y, produces a tuple ȳ^n with ȳ_j = y.

   3. Let P_{X̄^nȲ^n} be the resulting joint distribution of the tuples (x̄^n, ȳ^n), assuming that (x, y) is chosen
      according to P_{XY}. Then

\[ \|P_{\bar{X}^n\bar{Y}^n} - P_{\tilde{X}^n\tilde{Y}^n}\| \leq \varepsilon . \]

If 1–3 hold, we say that “(X, Y) can be (1 − ε)-embedded into (X̃^n, Ỹ^n) with (X̃_j, Ỹ_j) = (X, Y) by local
computation.”
    If such an embedding is given, we can consider the following strategy for the initial game G: Alice
and Bob embed their inputs (X, Y) in (X̃^n, Ỹ^n) with (X̃_j, Ỹ_j) = (X, Y), and answer with coordinate j
of h_a(X̃^n) and h_b(Ỹ^n). This strategy wins with probability at least Pr[W_j | W_{i_1} ∧ · · · ∧ W_{i_m}] − ε. Since no
strategy for the initial game has higher winning probability than v(G), this implies (3.1).
    We remark that a necessary condition for such an embedding to exist is that

\[ \|P_{XY} - P_{\tilde{X}_j\tilde{Y}_j}\| \leq \varepsilon , \tag{3.3} \]

and indeed this follows from Lemma 4.1 for U_j = (X_j, Y_j) (of course this condition is not a sufficient
one).

Constructing an embedding We now give a more detailed explanation of how Alice and Bob can em-
bed (X, Y) into (X̃^n, Ỹ^n) with (X̃_j, Ỹ_j) = (X, Y). For this, given values (x, y) distributed according to P_{XY},
Alice and Bob proceed as follows:

   1. Alice and Bob use shared randomness to produce queries and responses for all the games won,
      i. e., values (x_{i_1}, y_{i_1}, a_{i_1}, b_{i_1}) to (x_{i_m}, y_{i_m}, a_{i_m}, b_{i_m}). Here, Alice and Bob both produce all these values
      consistently (i. e., they ensure that they produce the same actual values).

   2. For every index i ∉ {i_1, . . . , i_m, j}, Alice and Bob examine a shared random bit d_i. If d_i = 1 both
      locally produce x_i, otherwise both locally produce y_i. Again, Alice and Bob both produce all these
      values consistently.

   3. Using individual randomness, Alice and Bob locally expand their information such that Alice
      gets x^n and Bob y^n.


In steps 1 and 2 we have to take care of two things: first, the values produced should be distributed
according to the respective marginals of the distribution P_{Ã^nB̃^nX̃^nỸ^n | X̃_j=x ∧ Ỹ_j=y} (where P_{Ã^nB̃^nX̃^nỸ^n} is defined
analogously to (3.2)). Second, Alice and Bob should produce equal values (otherwise the resulting
random variables (X̄^n, Ȳ^n) will not have the correct overall distribution).
    For step 1 achieving both is simple: it follows from Corollary 4.3 that Alice and Bob can choose the
values (x_{i_1}, y_{i_1}, a_{i_1}, b_{i_1}), . . . , (x_{i_m}, y_{i_m}, a_{i_m}, b_{i_m}) independently of (x, y) according to P_{X̃_{i_1}Ỹ_{i_1}Ã_{i_1}B̃_{i_1} ··· X̃_{i_m}Ỹ_{i_m}Ã_{i_m}B̃_{i_m}}.
Using shared randomness this can be done such that both get the same tuple.
    The second step is harder, as in this case the values cannot be chosen independently of (x_j, y_j)
anymore.2 However, let S̃ be the random variables which Alice and Bob produce in Step 2. It will
follow from Corollary 4.3 that

\[ \|P_{XY}P_{\tilde{S}|\tilde{X}_j} - P_{XY\tilde{S}}\| \qquad\text{and}\qquad \|P_{XY}P_{\tilde{S}|\tilde{Y}_j} - P_{XY\tilde{S}}\| \]

are both small, and Lemma 5.2 implies that this is sufficient to generate S̃ locally.
    In fact, Corollary 4.3 and Lemma 5.2 are strong enough to do steps 1 and 2 at the same time, and
thus these steps are done simultaneously in the proof of Lemma 6.4.
    Step 3 will be simpler to implement, since conditioned on the values already generated, the values
Alice generates are independent of the values Bob generates. To see this, consider the following experi-
ment. Pick (X^n, Y^n) according to (P_{XY})^n, then, for each coordinate reveal either X or Y. Clearly, given
this information, the non-revealed X_i are independent of the non-revealed Y_i. Furthermore, this does not
change even if additionally coordinates i_1, . . . , i_m of h_a(X^n) and h_b(Y^n) are revealed to both, which is the
situation before step 3 (actually, one needs to take a little more care because Alice does not know exactly
the same values as Bob: Alice knows X_j while Bob knows Y_j; see Lemma 5.4 for the exact conditions).


4     Conditioned distributions
The following lemma is essentially Claim 5.1 in Raz’s paper [32] (and we use the proof given there).
It states that if random variables U_i are chosen independently, then conditioning on an event does not
change the individual distributions much on average, unless the event has very low probability.

Lemma 4.1. Let P_{U^k} := P_{U_1} · · · P_{U_k} be a probability distribution over U^k, W an event. Then,

\[ \Pr[W] \leq 2^{-\sum_{j=1}^{k} \left( \|P_{U_j|W} - P_{U_j}\| \right)^2} . \tag{4.1} \]

    As an example, let U_i be uniform and independent bits and W be the event that at least k(1/2 + ε) of
these bits are one. Then ‖P_{U_i|W} − P_{U_i}‖ ≥ ε and the lemma states that Pr[W] ≤ 2^{−kε²}, which is a version
of Chernoff’s inequality (note that this implies that Lemma 4.1 is almost tight; see, for example, [22]).
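Lemma 4.1 can be checked numerically in this Chernoff-type example; the sketch below (k and the threshold t are illustrative choices) verifies the inequality (4.1) exactly by exhausting all outcomes for small k.

```python
from itertools import product

# k independent uniform bits; W = "at least t of the bits are one".
k, t = 10, 8
outcomes = list(product((0, 1), repeat=k))
hits = [u for u in outcomes if sum(u) >= t]
pW = len(hits) / 2 ** k

exponent = 0.0
for j in range(k):
    # P[U_j = 1 | W]; the statistical distance to the unconditioned
    # uniform bit is |p - 1/2|.
    p = sum(1 for u in hits if u[j] == 1) / len(hits)
    exponent += abs(p - 0.5) ** 2

# Pr[W] <= 2^(-sum_j ||P_{U_j|W} - P_{U_j}||^2), as Lemma 4.1 states.
assert pW <= 2 ** (-exponent)
```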
    2 The values d_i can be chosen independently, but not the values of the x_i or the y_i. We quickly explain why this is impossible
in general. Assume that the random variables X and Y contain a shared bit B. The game G^n and the strategy (h_a, h_b) may be
such that Alice and Bob win subgame i_1 in case B_1 ⊕ · · · ⊕ B_n = 0. Generating the values independently of (x, y) would now
produce a distribution with statistical distance at least 1/2 from the target distribution. Therefore, a bit which is contained in both
x and y must be considered when generating the values of x_i and y_i.



    Using (∑_{j=1}^{k} a_j)² ≤ k ∑_{j=1}^{k} a_j² one checks that (4.1) implies

\[ \sum_{j=1}^{k} \|P_{U_j|W} - P_{U_j}\| \leq \sqrt{ k \log\Bigl( \frac{1}{\Pr[W]} \Bigr) } , \tag{4.2} \]

which is the form we use later.
   The proof of Lemma 4.1 uses the relative entropy D(P_S‖P_T), which, for two distributions P_S and P_T
over the same set S, is defined as

\[ D(P_S \| P_T) := \sum_{s \in \mathcal{S}} P_S(s) \log\Bigl( \frac{P_S(s)}{P_T(s)} \Bigr) \tag{4.3} \]

(which is ∞ by convention if there is s ∈ S with P_S(s) > 0 = P_T(s)). This quantity satisfies
D(P_S‖P_T) ≥ ‖P_S − P_T‖² (see [12, Lemma 11.6.1]). Furthermore, we have the following inequality, which is well
known, but for which we could not find a written proof in the literature.

Lemma 4.2. For any distributions P_{U^k} = P_{U_1} · · · P_{U_k} and P_{V^k} over a set U^k we have

\[ \sum_{j=1}^{k} D(P_{V_j} \| P_{U_j}) \leq D(P_{V^k} \| P_{U^k}) . \tag{4.4} \]


Proof. We prove the case k = 2; the general case follows by induction. We get

\[ D(P_{V_1V_2} \| P_{U_1}P_{U_2}) = \sum_{u_1,u_2} P_{V_1V_2}(u_1, u_2) \log\Bigl( \frac{P_{V_1}(u_1)}{P_{U_1}(u_1)} \Bigr) + \sum_{u_1,u_2} P_{V_1V_2}(u_1, u_2) \log\Bigl( \frac{P_{V_2}(u_2)}{P_{U_2}(u_2)} \Bigr) + \sum_{u_1,u_2} P_{V_1V_2}(u_1, u_2) \log\Bigl( \frac{P_{V_1V_2}(u_1, u_2)}{P_{V_1}(u_1)P_{V_2}(u_2)} \Bigr) = D(P_{V_1} \| P_{U_1}) + D(P_{V_2} \| P_{U_2}) + D(P_{V_1V_2} \| P_{V_1}P_{V_2}) , \]

and D(P_{V_1V_2}‖P_{V_1}P_{V_2}) ≥ 0 (see [12, Theorem 2.6.3]).
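Inequality (4.4) is easy to test numerically for k = 2. The following sketch (all names and parameter ranges are our own choices) draws random product distributions P_{U²} and arbitrary full-support distributions P_{V²} on {0,1}² and checks the inequality.

```python
import random
from math import log2

def kl(p, q):
    """Relative entropy D(p || q) in bits; p, q are probability lists."""
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

rng = random.Random(1)
for _ in range(200):
    # P_{U^2} = P_{U_1} * P_{U_2}: a product distribution on {0,1}^2,
    # indexed as pu[2*u1 + u2].
    q1, q2 = rng.uniform(0.1, 0.9), rng.uniform(0.1, 0.9)
    pu = [(1 - q1) * (1 - q2), (1 - q1) * q2, q1 * (1 - q2), q1 * q2]
    # P_{V^2}: an arbitrary full-support distribution, same indexing.
    w = [rng.uniform(0.1, 1.0) for _ in range(4)]
    pv = [x / sum(w) for x in w]

    # Marginals of both distributions.
    pv1 = [pv[0] + pv[1], pv[2] + pv[3]]
    pv2 = [pv[0] + pv[2], pv[1] + pv[3]]
    pu1, pu2 = [1 - q1, q1], [1 - q2, q2]

    # Lemma 4.2 (k = 2): the marginal divergences sum to at most the joint one.
    assert kl(pv1, pu1) + kl(pv2, pu2) <= kl(pv, pu) + 1e-9
```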


    We can now give the proof of Lemma 4.1.


Proof of Lemma 4.1. Using the above we get

    \sum_{j=1}^{k} \|P_{U_j|W} - P_{U_j}\|^2
        \le \sum_{j=1}^{k} D(P_{U_j|W} \| P_{U_j})
        \le D(P_{U^k|W} \| P_{U^k})
        = \sum_{u^k} P_{U^k|W}(u^k) \log\frac{P_{U^k|W}(u^k)}{P_{U^k}(u^k)}
        = \sum_{u^k} P_{U^k|W}(u^k) \log\frac{\Pr[W \mid U^k = u^k]}{\Pr[W]}
        = \log\frac{1}{\Pr[W]} + \sum_{u^k} P_{U^k|W}(u^k) \log \Pr[W \mid U^k = u^k]
        \le \log\frac{1}{\Pr[W]} .

    We next give a slight extension of Lemma 4.1 (this makes it simpler to apply later). First, the U j
are independent given the value of an additional random variable T . Second, an arbitrary third random
variable V with bounded alphabet size gives side information about U j . Then, choosing U j without
considering the fact that an event W happened and ignoring V does not change the distribution of U j too
much on average. For the notation in the following corollary we refer to Section 2.1, equation (2.1) and
the subsequent remarks.

Corollary 4.3. Let P_{TU^kV} := P_T P_{U_1|T} P_{U_2|T} \cdots P_{U_k|T} P_{V|TU^k} be a probability distribution over T × U^k × V, and let W be an event. Then,

    \sum_{j=1}^{k} \|P_{TU_jV|W} - P_{TV|W} P_{U_j|T}\| \le \sqrt{k} \sqrt{\log(|V^*|) + \log\frac{1}{\Pr[W]}} ,

where V^* := {v ∈ V : P_{V|W}(v) > 0}.

    The proof is essentially an application of Jensen’s inequality on Lemma 4.1.


Proof. Fix a pair (t,v) ∈ T × V and consider the distributions P_{U^k|T=t,V=v,W} and P_{U^k|T=t}. We apply Lemma 4.1 (in the form given by (4.2)) to these distributions (with the event (V=v) ∧ W) and get

    \sum_{j=1}^{k} \|P_{TU_jV|W} - P_{TV|W} P_{U_j|T}\|
        = \sum_{(t,v):\, P_{TV|W}(t,v)>0} P_{TV|W}(t,v) \sum_{j=1}^{k} \|P_{U_j|T=t,V=v,W} - P_{U_j|T=t}\|
        \le \sum_{(t,v):\, P_{TV|W}(t,v)>0} P_{TV|W}(t,v) \sqrt{k \log\frac{1}{\Pr[W \wedge V=v \mid T=t]}}
        \le \sqrt{k \log\Big( \sum_{(t,v):\, P_{TV|W}(t,v)>0} P_{TV|W}(t,v)\, \frac{1}{\Pr[W \wedge V=v \mid T=t]} \Big)} ,    (4.5)


                                                                                           p
where the last inequality is Jensen’s inequality applied to the function                    log(·), concave on [1, ∞). We

                           T HEORY OF C OMPUTING, Volume 5 (2009), pp. 141–172                                       150
                     PARALLEL R EPETITION : S IMPLIFICATIONS AND THE N O -S IGNALING C ASE

compute

    \sum_{(t,v):\, P_{TV|W}(t,v)>0} \frac{P_{TV|W}(t,v)}{\Pr[W \wedge V=v \mid T=t]}
        = \sum_{(t,v):\, P_{TV|W}(t,v)>0} \frac{\Pr[T=t \wedge V=v \mid W]}{\Pr[W \wedge V=v \mid T=t]}
        = \sum_{(t,v):\, P_{TV|W}(t,v)>0} \frac{\Pr[T=t \wedge V=v \wedge W]\,\Pr[T=t]}{\Pr[W]\,\Pr[V=v \wedge T=t \wedge W]}
        = \sum_{(t,v):\, P_{TV|W}(t,v)>0} \frac{\Pr[T=t]}{\Pr[W]} \le \frac{|V^*|}{\Pr[W]} .


Inserting this into (4.5) completes the proof.


5      Embedding by local computation
We next study under what conditions random variables can be embedded into other random variables by
local computations. The intent is that Alice holds a random variable X, Bob holds Y , and they would
like to produce S and T , respectively, using the shared randomness R.

Definition 5.1 (Embeddable). For two distributions P_{X_0Y_0} and P_{X_1SY_1T} we say that (X_0,Y_0) is (1 − ε)-embeddable in (X_1S, Y_1T) with (X_1,Y_1) = (X_0,Y_0) if there exists a probability measure P_R over a set R and functions f_A : X × R → S, f_B : Y × R → T, such that

    \|P_{X_0Y_0} P_{F_AF_B|XY} - P_{X_1Y_1ST}\| \le \varepsilon ,

where P_{F_AF_B|X=x,Y=y} is the distribution of the pair ( f_A(x,R), f_B(y,R) ).

      The following lemma gives a condition under which (X,Y ) is embeddable in (XS,Y S).

Lemma 5.2. Let a distribution P_{SXY} be given. If

    \|P_{SXY} - P_{XY} P_{S|X}\| \le \varepsilon_1                                  (5.1)

and

    \|P_{SXY} - P_{XY} P_{S|Y}\| \le \varepsilon_2 ,                                (5.2)

then (X,Y) is (1 − 2ε_1 − 2ε_2)-embeddable^3 in (XS, YS).

     Even if ε_1 = ε_2 = 0, equations (5.1) and (5.2) do not imply that S is independent of X and Y.
For example, if X and Y contain the same uniform random bit, then S can depend on this bit. However,
if ε_1 = ε_2 = 0 the lemma is obviously true: Alice uses shared randomness to choose S according to P_{S|X=x}
(more concretely: Alice chooses a uniform random real ρ ∈ [0, 1] and uses the smallest element s for
    ^3 It is understood that the embedding satisfies (X_1,Y_1) = (X,Y), i.e., that the original random variables result if from the resulting (XS, YS) the S-part is omitted.




Figure 1: Proof sketch of Lemma 5.2: Alice and Bob use the shared randomness to pick an infinite
sequence of points {pi }i≥0 in the plane. Both consider the first point which falls below their density
function, and pick the corresponding element. Here, Alice (dashed line) picks s3 , since p2 is the first
point below the dashed line. Bob (solid line) picks s4 , due to p4 . The probability that Alice and Bob
pick different values si is approximately proportional to the area between the lines, which is twice the
statistical distance.

which the cumulative distribution function \sum_{s' \le s} P_{S|X=x}(s') is larger than ρ). Since Bob has the same
distribution P_{S|Y=y}, he will find the same value if he uses the same shared randomness.
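This inverse-CDF construction is straightforward to implement. A minimal Python sketch (illustrative names; an exactly representable distribution is used to avoid rounding issues) confirms that identical conditional distributions plus shared randomness always yield identical samples, with the correct marginal:

```python
import numpy as np

def sample_inverse_cdf(p, rho):
    """Smallest index s whose CDF value sum_{s' <= s} p[s'] exceeds rho."""
    return int(np.searchsorted(np.cumsum(p), rho, side='right'))

p = np.array([0.25, 0.5, 0.25])      # P_{S|X=x} = P_{S|Y=y} in the error-free case
rng = np.random.default_rng(1)
counts = np.zeros(3)
for _ in range(20000):
    rho = rng.random()               # the shared randomness
    a = sample_inverse_cdf(p, rho)   # Alice's sample
    b = sample_inverse_cdf(p, rho)   # Bob's sample, from the same rho
    assert a == b                    # identical distributions always agree
    counts[a] += 1
assert np.abs(counts / 20000 - p).max() < 0.02   # and the marginal is P_S
```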
    In case ε1 > 0 and ε2 > 0, we have to overcome the following problem: PS|Y =y is unknown to Alice
(since y is unknown to Alice), and analogously PS|X=x is unknown to Bob. The solution is depicted
graphically in Figure 1 for an alphabet of size 4: Alice and Bob paint their density function on a piece
of paper (where every element gets assigned the same width), then use the shared randomness to sample
points uniformly in the rectangle until one of the points is below their density. They pick the element
whose label is below the point.
    Note that they will get a different output only if the first element which is below any of the two lines
is above the other line. This happens with probability

    \frac{2\|P_{S|X=x} - P_{S|Y=y}\|}{1 + \|P_{S|X=x} - P_{S|Y=y}\|} .

We formalize this in the following proof.

Proof of Lemma 5.2. Let R := (S × [0,1])^∞ be the set of infinite sequences over S × [0,1]. For fixed
x, y and a sequence r := {(s_i, ρ_i)}_{i≥0} ∈ R, we define f_A(x,r) := s_i if i is the smallest index for which
P_{S|X=x}(s_i) > ρ_i. Analogously, we define f_B(y,r) := s_j if j is the smallest index with P_{S|Y=y}(s_j) > ρ_j,
and^4 f_{AB}(x,y,r) := s_k if k is the smallest index with P_{S|X=x,Y=y}(s_k) > ρ_k. If no such index exists, the
corresponding function is defined in an arbitrary way (this happens with probability 0).
   ^4 The use of f_{AB} in order to simplify the analysis was suggested by Anup Rao.




     Let P_{XYF_AF_BF_{AB}} be the joint distribution of (x, y, f_A(x,r), f_B(y,r), f_{AB}(x,y,r)) where (x,y) is chosen
according to P_{XY} and r uniformly from R. We have P_{F_{AB}|X=x,Y=y} = P_{S|X=x,Y=y}, P_{F_A|X=x} = P_{S|X=x} and
P_{F_B|Y=y} = P_{S|Y=y}, since these equalities hold conditioned on the event that the corresponding function
accepts in round i, for any fixed i.
     Further, we have Pr[F_A = F_{AB} | X=x, Y=y] ≥ 1 − 2\|P_{F_A|X=x} − P_{F_{AB}|X=x,Y=y}\|: the two values F_A, F_{AB}
are equal if ρ_j < min(P_{F_A|X=x}(s_j), P_{F_{AB}|X=x,Y=y}(s_j)) for the smallest j for which ρ_j < max(P_{F_A|X=x}(s_j),
P_{F_{AB}|X=x,Y=y}(s_j)) is satisfied. This happens with probability

    \frac{\sum_s \min(P_{F_A|X=x}(s), P_{F_{AB}|X=x,Y=y}(s))}{\sum_s \max(P_{F_A|X=x}(s), P_{F_{AB}|X=x,Y=y}(s))}
        = \frac{1 - \|P_{F_A|X=x} - P_{F_{AB}|X=x,Y=y}\|}{1 + \|P_{F_A|X=x} - P_{F_{AB}|X=x,Y=y}\|}
        \ge 1 - 2\|P_{F_A|X=x} - P_{F_{AB}|X=x,Y=y}\| .
This yields Pr[F_A = F_{AB}] ≥ 1 − 2ε_1, and analogously we get Pr[F_B = F_{AB}] ≥ 1 − 2ε_2, and thus Pr[F_A =
F_B = F_{AB}] ≥ 1 − 2ε_1 − 2ε_2. This implies

    \|P_{XYSS} - P_{XY} P_{F_AF_B|XY}\| = \|P_{XYF_{AB}F_{AB}} - P_{XYF_AF_B}\| \le 2\varepsilon_1 + 2\varepsilon_2 .
   In the following corollary, the input distribution is changed slightly. This makes it a bit easier to
apply later.
Corollary 5.3. Let distributions P_{SXY} and P_{X_0Y_0} be given. If

    \|P_{SXY} - P_{X_0Y_0} P_{S|X}\| \le \varepsilon_1                              (5.3)

and

    \|P_{SXY} - P_{X_0Y_0} P_{S|Y}\| \le \varepsilon_2 ,                            (5.4)

then (X_0,Y_0) is (1 − 3ε_1 − 2ε_2)-embeddable^5 in (XS, YS) with (X,Y) = (X_0,Y_0).

Proof. From (5.3) we get \|P_{XY} − P_{X_0Y_0}\| ≤ ε_1. One can now find a joint distribution P_{XYX_0Y_0} with
Pr[(X,Y) = (X_0,Y_0)] ≥ 1 − ε_1. The corollary now follows by applying f_A and f_B from Lemma 5.2.

    Random variables S, T,U form a Markov chain, written S ↔ T ↔ U, if PSTU = PT PS|T PU|T (i. e., if
given T the random variable U is independent of S). The following lemma is essentially Lemma 4.1 in
Raz’s paper.
Lemma 5.4. Let PXY ST be any distribution. If
                                   S ↔ X ↔ YT           and        XS ↔ Y ↔ T
then (X,Y ) is 1-embeddable in (XS,Y T ).
Proof. Using individual (non-shared) randomness, Alice computes S according to P_{S|X=x} and Bob computes T according to P_{T|Y=y}. Since

    P_{STXY} = P_{XY} P_{S|XY} P_{T|SXY} = P_{XY} P_{S|X} P_{T|Y}                   (5.5)

this gives the correct (global) distribution.
   ^5 The statement could be made symmetric: (X_0,Y_0) is (1 − 2ε_1 − 2ε_2 − min(ε_1, ε_2))-embeddable.




6    Embeddings for games
Given a game G and its n-fold parallel repetition, we now show that (X,Y) can be embedded into
(\tilde{X}^n, \tilde{Y}^n), where P_{\tilde{X}^n\tilde{Y}^n} := P_{X^nY^n|W_{k+1} \wedge \cdots \wedge W_n}.
     We need the following simple fact on statistical distance.

Claim 6.1. Let P_{Z_0} and P_{Z_1} be distributions over Z. Let S ⊆ Z be such that Pr[Z_0 ∈ S] = Pr[Z_1 ∈ S] = 1/2.
Then,

    \|P_{Z_0|Z_0 \in S} - P_{Z_1|Z_1 \in S}\| \le 2\|P_{Z_0} - P_{Z_1}\| .

Proof. Use Lemma 2.1, (2.3).
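A small numeric instance of Claim 6.1 (illustrative, chosen with exactly representable values; this example is in fact tight):

```python
import numpy as np

# Z_0, Z_1 over {0,1,2,3}; the set S = {0,1} has probability 1/2 under both
p0 = np.array([0.375, 0.125, 0.25, 0.25])
p1 = np.array([0.25, 0.25, 0.25, 0.25])
S = [0, 1]

tv = lambda a, b: 0.5 * np.abs(a - b).sum()   # statistical distance
cond0 = p0[S] / p0[S].sum()                   # P_{Z_0 | Z_0 in S}
cond1 = p1[S] / p1[S].sum()                   # P_{Z_1 | Z_1 in S}

# conditioning on a set of probability 1/2 at most doubles the distance;
# here both sides equal 0.25
assert tv(cond0, cond1) <= 2 * tv(p0, p1)
```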

    Also, we need the following statements about Markov chains.

Claim 6.2. Let P_{X_0Y_0} P_{X_1Y_1} be a distribution over X_0 × Y_0 × X_1 × Y_1, and let f : X_0 × X_1 → U and g : Y_0 × Y_1 → V
be arbitrary. Then,

    X_0X_1 ↔ X_0 f(X_0,X_1) Y_1 g(Y_0,Y_1) ↔ Y_0Y_1 .                              (6.1)

Proof. It is sufficient to show this for all possible values x_0 ∈ X_0 and y_1 ∈ Y_1. Let

    P_{\tilde{Y}_0\tilde{X}_1} := P_{Y_0X_1|X_0=x_0,Y_1=y_1} = P_{Y_0|X_0=x_0} P_{X_1|Y_1=y_1} .

In this case, (6.1) reduces to

    \tilde{X}_1 ↔ f(x_0,\tilde{X}_1)\, g(\tilde{Y}_0,y_1) ↔ \tilde{Y}_0 .

Since \tilde{X}_1 and \tilde{Y}_0 are independent this is obvious.

Claim 6.3. Let PTUV be a distribution over T × U × V and W an event with

                                                     T ↔ U ↔ V,
                                                    W ↔ U ↔ TV .

Then, for P_{\tilde{T}\tilde{U}\tilde{V}} := P_{TUV|W} we have \tilde{T} ↔ \tilde{U} ↔ \tilde{V}.

Proof.

    P_{\tilde{T}\tilde{U}\tilde{V}}(t,u,v) = P_{TUV|W}(t,u,v)
        = P_{U|W}(u)\, P_{TV|U=u,W}(t,v)
        = P_{U|W}(u)\, P_{TV|U=u}(t,v)
        = P_{U|W}(u)\, P_{T|U=u}(t)\, P_{V|U=u}(v)
        = P_{U|W}(u)\, P_{T|U=u,W}(t)\, P_{V|U=u,W}(v) .
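The computation above can be replayed numerically. The sketch below (random distributions of illustrative sizes) builds a chain T ↔ U ↔ V together with an event W that depends on (T,U,V) only through U, conditions on W, and checks that each slice at fixed u still factorizes:

```python
import numpy as np

rng = np.random.default_rng(3)
nt, nu, nv = 3, 4, 3
pu = rng.random(nu); pu /= pu.sum()                                   # P_U
pt_u = rng.random((nu, nt)); pt_u /= pt_u.sum(axis=1, keepdims=True)  # P_{T|U}
pv_u = rng.random((nu, nv)); pv_u /= pv_u.sum(axis=1, keepdims=True)  # P_{V|U}
pw_u = rng.random(nu)                     # Pr[W | U=u]: W depends only on U

# P_{TUV}(t,u,v) * Pr[W | u], then normalize to get P_{TUV|W}
joint_w = np.einsum('u,ut,uv,u->tuv', pu, pt_u, pv_u, pw_u)
cond = joint_w / joint_w.sum()

# T <-> U <-> V still holds after conditioning: each slice at fixed u is rank one
for u in range(nu):
    slice_u = cond[:, u, :] / cond[:, u, :].sum()
    assert np.allclose(slice_u, np.outer(slice_u.sum(axis=1), slice_u.sum(axis=0)))
```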



Lemma 6.4. Let a game G^n = (Q^n, (P_{XY})^n), a strategy (h_a, h_b), and k ≤ n be given. Let

    P_{\tilde{X}^n\tilde{Y}^n} := P_{X^nY^n|W_{k+1} \wedge \cdots \wedge W_n} .

     Then, for 1 ≤ j ≤ k, there exists ε_j ≥ 0 such that (X,Y) is (1 − ε_j)-embeddable in (\tilde{X}^n, \tilde{Y}^n) with
(\tilde{X}_j, \tilde{Y}_j) = (X,Y) and

    \sum_{j=1}^{k} \varepsilon_j \le 15\sqrt{k} \sqrt{(n-k)\log(|A|\,|B|) + \log\frac{1}{\Pr[W_{k+1} \wedge \cdots \wedge W_n]}} .    (6.2)

Proof. As described in Definition 2.4 we consider the distribution P_{X^nY^nA^nB^n}, the corresponding random
variables, and the events W_i. Additionally, we let D_1, ..., D_k be uniform and independent bits. For 1 ≤ j ≤ k we define

    U_j := X_j if D_j = 0, and U_j := Y_j otherwise,

and

    \bar{U}_j := Y_j if D_j = 0, and \bar{U}_j := X_j otherwise.

Also, we set

    T := (X_{k+1}, \ldots, X_n, Y_{k+1}, \ldots, Y_n, D^k, \bar{U}^k) ,             (6.3)
    V := (A_{k+1}, \ldots, A_n, B_{k+1}, \ldots, B_n) ,                             (6.4)

and define the event W := W_{k+1} ∧ ··· ∧ W_n.
    From Corollary 4.3 we get

    \sum_{j=1}^{k} \|P_{TU_jV|W} - P_{TV|W} P_{U_j|T}\| \le \varepsilon_{Tot} ,     (6.5)

where we set

    \varepsilon_{Tot} := \sqrt{k} \sqrt{(n-k)\log(|A|\,|B|) + \log\frac{1}{\Pr[W]}}  (6.6)

(we applied Corollary 4.3 using |V^*| ≤ |V|).
    In (6.5), we condition on both sides on the event D_j = 0, which is, on both sides, a restriction to a
subset of probability 1/2. Claim 6.1 implies

    \sum_{j=1}^{k} \|P_{TU_jV|W \wedge (D_j=0)} - P_{TV|W \wedge (D_j=0)} P_{U_j|T}\| \le 2\varepsilon_{Tot} ,    (6.7)




where we do not need to condition on D_j = 0 in P_{U_j|T}, since this is included in the given t anyhow; in
fact we can now write P_{X_j|Y_j} instead of P_{U_j|T}.
    For a fixed j, define the random variable

    T^{(\setminus j)} := (X_{k+1}, \ldots, X_n, Y_{k+1}, \ldots, Y_n,
                          D_1, \ldots, D_{j-1}, D_{j+1}, \ldots, D_k,
                          \bar{U}_1, \ldots, \bar{U}_{j-1}, \bar{U}_{j+1}, \ldots, \bar{U}_k) .    (6.8)

With this notation (6.7) is equivalent to

    \sum_{j=1}^{k} \|P_{T^{(\setminus j)}X_jY_jV|W \wedge (D_j=0)} - P_{T^{(\setminus j)}Y_jV|W \wedge (D_j=0)} P_{X_j|Y_j}\| \le 2\varepsilon_{Tot} .    (6.9)

But now nothing depends on D_j = 0 anymore, so this also means

    \sum_{j=1}^{k} \|P_{T^{(\setminus j)}X_jY_jV|W} - P_{T^{(\setminus j)}Y_jV|W} P_{X_j|Y_j}\| \le 2\varepsilon_{Tot} .    (6.10)

We set S := (T^{(\setminus j)}, V) and define the probability distribution

    P_{\tilde{S}\tilde{X}^n\tilde{Y}^n} := P_{SX^nY^n|W} .                          (6.11)

With this, (6.10) becomes

    \sum_{j=1}^{k} \|P_{\tilde{S}\tilde{X}_j\tilde{Y}_j} - P_{\tilde{S}\tilde{Y}_j} P_{X_j|Y_j}\| \le 2\varepsilon_{Tot} ,    (6.12)

or, equivalently,

    \sum_{j=1}^{k} \|P_{\tilde{S}\tilde{X}_j\tilde{Y}_j} - P_{\tilde{Y}_j} P_{\tilde{S}|\tilde{Y}_j} P_{X|Y}\| \le 2\varepsilon_{Tot} .    (6.13)

Lemma 4.1 implies

    \sum_{j=1}^{k} \|P_{\tilde{Y}_j} - P_Y\| \le \varepsilon_{Tot} ,                (6.14)

and thus

    \sum_{j=1}^{k} \|P_{\tilde{S}\tilde{X}_j\tilde{Y}_j} - P_{XY} P_{\tilde{S}|\tilde{Y}_j}\| \le 3\varepsilon_{Tot} .    (6.15)

Symmetric reasoning yields

    \sum_{j=1}^{k} \|P_{\tilde{S}\tilde{X}_j\tilde{Y}_j} - P_{XY} P_{\tilde{S}|\tilde{X}_j}\| \le 3\varepsilon_{Tot} .    (6.16)




From (6.15) and (6.16), Corollary 5.3 implies that (X,Y) is (1 − ε_j)-embeddable in (\tilde{X}_j\tilde{S}, \tilde{Y}_j\tilde{S}) with
(\tilde{X}_j, \tilde{Y}_j) = (X,Y) and such that \sum_{j=1}^{k} \varepsilon_j \le 15\varepsilon_{Tot}.
      We next show that

    X^k ↔ TV ↔ Y^k .                                                               (6.17)

If the bits D^k and the values X_{k+1}, ..., X_n, Y_{k+1}, ..., Y_n are fixed, this follows immediately from Claim 6.2.
Since it holds for all these values, it must also hold overall.
     From (6.17) we easily get

    X^n ↔ X_jS ↔ Y^nY_jS ,
    X^nX_jS ↔ Y_jS ↔ Y^n .

Claim 6.3 yields

    \tilde{X}^n ↔ \tilde{X}_j\tilde{S} ↔ \tilde{Y}^n\tilde{Y}_j\tilde{S} ,          (6.18)
    \tilde{X}^n\tilde{X}_j\tilde{S} ↔ \tilde{Y}_j\tilde{S} ↔ \tilde{Y}^n .          (6.19)

    Above we have seen that (X,Y) is embeddable in (\tilde{X}_j\tilde{S}, \tilde{Y}_j\tilde{S}) with (\tilde{X}_j, \tilde{Y}_j) = (X,Y). Lemma 5.4
together with (6.18) and (6.19) now implies that we can 1-locally embed this in (\tilde{X}^n\tilde{X}_j\tilde{S}, \tilde{Y}^n\tilde{Y}_j\tilde{S}). Since
Alice and Bob can then ignore part of the information constructed, this completes the proof.

Lemma 6.5. Let a game G = (Q, P_{XY}), its n-fold repetition G^n, and a strategy (h_a, h_b) for G^n be given.
Let indices i_1, ..., i_m be given. Then there exists an index i_{m+1} such that

    \Pr[W_{i_{m+1}} \mid W_{i_1} \wedge \cdots \wedge W_{i_m}]
        \le v(G) + 15\sqrt{\frac{1}{n-m}} \sqrt{m\log(|A|\,|B|) + \log\frac{1}{\Pr[W_{i_1} \wedge \cdots \wedge W_{i_m}]}} .    (6.20)

Proof. First, we can assume that the given indices i_ℓ, 1 ≤ ℓ ≤ m, are pairwise distinct (otherwise we
get a stronger statement). Given this, we can even assume that i_ℓ = n − ℓ + 1 by appropriately redefining
the functions (h_a, h_b).
    Define the distribution P_{\tilde{X}^n\tilde{Y}^n} := P_{X^nY^n|W_{n-m+1} \wedge \cdots \wedge W_n}. Lemma 6.4 implies that there exists an index j
such that (X,Y) is (1 − ε)-embeddable in (\tilde{X}^n, \tilde{Y}^n) with (\tilde{X}_j, \tilde{Y}_j) = (X,Y) and

    \varepsilon := 15\sqrt{\frac{1}{n-m}} \sqrt{m\log(|A|\,|B|) + \log\frac{1}{\Pr[W_{n-m+1} \wedge \cdots \wedge W_n]}} .

Consider the following strategy for G. On input (X,Y), Alice and Bob (1 − ε)-embed this into (\tilde{X}^n, \tilde{Y}^n)
with (\tilde{X}_j, \tilde{Y}_j) = (X,Y). Since the resulting distribution has statistical distance at most ε from P_{\tilde{X}^n\tilde{Y}^n}, if
they output coordinate j of h_a(\tilde{X}^n) and h_b(\tilde{Y}^n), they win the initial game with probability at least
\Pr[W_j \mid W_{n-m+1} \wedge \cdots \wedge W_n] - \varepsilon. The shared randomness can be eliminated (see the remark after Definition 2.2),
and thus

    v(G) \ge \Pr[W_j \mid W_{n-m+1} \wedge \cdots \wedge W_n] - \varepsilon .



7    Parallel repetition theorem
We will need the following recursion in the proof of Theorem 2.5.

Lemma 7.1. Fix 0 < v < 1, c ≥ 12, ℓ ≥ 1, and let p_0, ..., p_n be a non-increasing sequence of non-negative reals with p_0 = 1 and

    p_{m+1} \le p_m \Big( v + \sqrt{\frac{c}{n-m}\Big( m\ell + \log\frac{1}{p_m} \Big)} \Big) .    (7.1)

Then,

    p_n \le \Big( 1 - \frac{(1-v)^3}{26c} \Big)^{n/\ell} .                          (7.2)
Proof. We show by induction that

    p_m \le \Big( \frac{1+v}{2} \Big)^m ,

as long as m ≤ \frac{(1-v)^2}{12c\ell}(n-m). The statement holds for m = 0, and we now make a step from m to m + 1.
First, we can assume that

    p_m \ge \Big( \frac{1+v}{2} \Big)^{m+1} > \Big( \frac{1}{2} \Big)^{m+1} ,

as otherwise the induction step is trivial. In this case, (7.1) yields

    p_{m+1} \le p_m \Big( v + \sqrt{\frac{c}{n-m}\big( m\ell + (m+1) \big)} \Big)
            \le p_m \Big( v + \sqrt{\frac{1}{n-m}\, 3cm\ell} \Big)
            \le p_m \Big( v + \frac{1-v}{2} \Big) = p_m \cdot \frac{1+v}{2} ,        (7.3)

where we used m ≤ \frac{(1-v)^2}{12c\ell}(n-m) in the last inequality.
    We get, setting m^* = \frac{n(1-v)^2}{13c\ell} (which satisfies m^* \le \frac{(1-v)^2}{12c\ell}(n-m^*) because c ≥ 12):

    p_n \le p_{m^*} \le \Big( \frac{1+v}{2} \Big)^{\frac{n(1-v)^2}{13c\ell}} .       (7.4)

We have

    \Big( \frac{1+v}{2} \Big)^{\frac{(1-v)^2}{13c}} = \Big( 1 - \frac{1-v}{2} \Big)^{\frac{(1-v)^2}{13c}} \le 1 - \frac{(1-v)^3}{26c} ,    (7.5)

where the last inequality follows from (1-b)^a \le 1 - ab, which holds for all a ∈ [0,1], b ≤ 1.
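The bound of Lemma 7.1 can be checked against the recursion itself. The sketch below (illustrative parameters v = 1/2, c = 12, ℓ = 1, n = 10000; logarithms taken base 2, and the sequence clamped so that it is non-increasing) iterates the recursion with equality and compares the result with (7.2):

```python
import math

def pn_bound(v, c, ell, n):
    """Iterate the recursion of Lemma 7.1 with equality, clamped to be non-increasing."""
    p = 1.0
    for m in range(n):
        factor = v + math.sqrt(c / (n - m) * (m * ell + math.log2(1.0 / p)))
        p *= min(1.0, factor)
    return p

v, c, ell, n = 0.5, 12.0, 1, 10000
pn = pn_bound(v, c, ell, n)
bound = (1.0 - (1.0 - v) ** 3 / (26.0 * c)) ** (n / ell)
assert pn <= bound   # the sequence respects (7.2)
```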


Remark 7.2. The minimal value of the sequence defined by p_0 := 1 and the relation

    p_{m+1} := p_m \Big( v + \sqrt{\frac{c}{n-m}\big( m\ell + \log(1/p_m) \big)} \Big)

is indeed \big(1 - \Theta((1-v)^3)\big)^{n/\ell}. The lemma above shows that the minimal value can only be lower. On the
other hand, the sequence given by

    p'_0 := 1 , \qquad p'_{m+1} := p'_m \Big( v + \sqrt{\frac{m\ell}{n}} \Big)

is strictly smaller than the sequence {p_j}_{j≥0}. This sequence does not decrease anymore if m > m_0 :=
n(1-v)^2/\ell, and

    p'_{m_0} = \prod_{i=0}^{m_0-1} \Big( v + \sqrt{\frac{i\ell}{n}} \Big) \ge v^{m_0} = (1 - (1-v))^{(1-v)^2 n/\ell} \approx e^{-(1-v)^3 n/\ell} \approx (1 - (1-v)^3)^{n/\ell} .

    We now prove the Parallel Repetition Theorem.

Proof of Theorem 2.5. Fix a strategy (h_a, h_b) for G^n. Then, repeatedly choose the index i_{m+1} for which
\Pr[W_{i_{m+1}} \mid W_{i_1} \wedge \cdots \wedge W_{i_m}] is minimized. We set p_0 := 1 and p_m := \Pr[W_{i_1} \wedge \cdots \wedge W_{i_m}]. Lemma 6.5 implies

    p_{m+1} \le p_m \Big( v + 15\sqrt{\frac{1}{n-m}\Big( m\log(|A|\,|B|) + \log\frac{1}{p_m} \Big)} \Big) .    (7.6)

We apply Lemma 7.1 (with c = 225, since 15\sqrt{x} = \sqrt{225x}, and \ell = \log(|A|\,|B|)) and get

    \Pr[W_1 \wedge \cdots \wedge W_n] = p_n \le \Big( 1 - \frac{(1-v)^3}{6000} \Big)^{\frac{n}{\log(|A|\,|B|)}} .    (7.7)



8    Improving the rate
Theorem 2.5 shows that n-fold parallel repetition reduces the winning probability from v(G) to
\big(1 - \Theta((1 - v(G))^3)\big)^{n/\log(|A|\,|B|)}. As shown in [29], the term |A| · |B| in the exponent can be reduced to the
maximum (over x, y) number of (fractional) rectangles needed to cover the 1-entries of Q(x, y, ·, ·). Here,
we show that it can be reduced to a quantity which is possibly smaller in some cases.

Definition 8.1 (Exact Fractional Product Cover). Let Q : A × B → {0,1} be an arbitrary predicate. Two
functions f : A × {1, ..., α} → [0,1] and g : B × {1, ..., α} → [0,1] form an exact fractional product
cover of size α for Q if for all a, b we have

    Q(a,b) = \sum_{i=1}^{\alpha} f(a,i)\, g(b,i) .



   Clearly, any partition by rectangles gives an exact fractional product cover (by defining f (a, i)
and g(b, i) as appropriate predicates). We will prove the following strengthening of Theorem 2.5.
Theorem 8.2. Let G = (PXY , Q) be a game. Let α be such that for all (x, y) there exists an exact
fractional product cover of size α for Qx,y (a, b) := Q(x, y, a, b). If α > 1 then
                                      v(G^n) ≤ ( 1 − (1 − v)³/6000 )^{n/log(α)} ,                        (8.1)
and if α = 1 then
                                      v(G^n) ≤ ( 1 − (1 − v)²/6000 )^n .                                 (8.2)
    To prove Theorem 8.2 we first need a characterization of fractional product covers by Markov chains.
Lemma 8.3. Let a distribution PABZ = PA PB PZ|AB be given for which there exist functions f (a, z) :
A × Z → [0, 1] and g(b, z) : B × Z → [0, 1] which satisfy
                                            PZ|A=aB=b (z) = f (a, z) · g(b, z) .                                            (8.3)
Then A ↔ Z ↔ B.
    Lemma 8.3 has a converse in the following sense: if PZ|AB is such that A ↔ Z ↔ B for all distributions
PA PB , then PZ|AB is of the form (8.3) for some functions f and g. For completeness, we prove this in
Section 10.2.

Proof of Lemma 8.3. We get

     P_{A|B=b,Z=z}(a) = P_{ABZ}(a, b, z) / P_{BZ}(b, z)
                      = P_A(a) P_B(b) f(a, z) g(b, z) / ∑_{a′} P_B(b) P_A(a′) f(a′, z) g(b, z)
                      = P_A(a) f(a, z) / ∑_{a′} P_A(a′) f(a′, z)
                      = ∑_{b′} P_A(a) P_B(b′) f(a, z) g(b′, z) / ∑_{a′,b′} P_A(a′) P_B(b′) f(a′, z) g(b′, z)
                      = P_{AZ}(a, z) / P_Z(z)
                      = P_{A|Z=z}(a) ,

and thus P_{ABZ}(a, b, z) = P_Z(z) P_{B|Z=z}(b) P_{A|B=b,Z=z}(a) = P_Z(z) P_{B|Z=z}(b) P_{A|Z=z}(a), which means that
A ↔ Z ↔ B.

Lemma 8.4. Assume Q : A × B → {0, 1} has a fractional product cover of size α. Then there exists a
conditional distribution P_{Z|AB} over6 Z = {1, . . . , α} ∪ (A × B) such that
  6 The intention is that {1, . . . , α} ∩ (A × B) = ∅, so that for a given z one can tell whether it is from {1, . . . , α} or from
A × B.



   • for any product distributions PAB = PA PB and PABZ = PA PB PZ|AB we have A ↔ Z ↔ B;

   • Q(a, b) = 1 ⇐⇒ z ∈ {1, . . . , α}.

Proof. Let f , g be the functions as guaranteed by the exact fractional product cover. We set

                     P_{Z|A=a,B=b}(z) =  1                 if Q(a, b) = 0 ∧ z = (a, b),
                                         f(a, z) g(b, z)   if Q(a, b) = 1 ∧ z ∈ {1, . . . , α},
                                         0                 otherwise.

The properties follow immediately from the definition and Lemma 8.3.
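The channel of Lemma 8.4 is easy to check numerically. The sketch below is an illustrative instance (the equality predicate with its indicator cover, 0-indexed, not an example from the paper): it verifies that the channel is normalized, that membership of z in the index set reveals Q(a, b), and that A ↔ Z ↔ B holds for one product input distribution:

```python
import itertools

# Illustrative instance: equality predicate on {0,1,2} with its indicator
# cover f(a,i) = [a == i], g(b,i) = [b == i] of size alpha = 3 (0-indexed).
n_sym = 3
Q = lambda a, b: 1 if a == b else 0
f = lambda a, i: 1.0 if a == i else 0.0
g = lambda b, i: 1.0 if b == i else 0.0
alpha = 3

# The channel of Lemma 8.4: z is an index i < alpha when Q(a,b) = 1,
# and the pair (a, b) itself when Q(a,b) = 0.
def channel(a, b):
    if Q(a, b) == 0:
        return {(a, b): 1.0}
    return {i: f(a, i) * g(b, i) for i in range(alpha) if f(a, i) * g(b, i) > 0}

for a, b in itertools.product(range(n_sym), repeat=2):
    dist = channel(a, b)
    assert abs(sum(dist.values()) - 1.0) < 1e-9          # normalization
    for z in dist:                                       # z reveals Q(a, b)
        assert isinstance(z, int) == (Q(a, b) == 1)

# A <-> Z <-> B for one product distribution P_A x P_B.
PA, PB = [0.5, 0.3, 0.2], [0.1, 0.6, 0.3]
PABZ = {}
for a, b in itertools.product(range(n_sym), repeat=2):
    for z, p in channel(a, b).items():
        PABZ[(a, b, z)] = PABZ.get((a, b, z), 0.0) + PA[a] * PB[b] * p

for z in {zz for (_, _, zz) in PABZ}:
    pz = sum(p for (a, b, zz), p in PABZ.items() if zz == z)
    for a, b in itertools.product(range(n_sym), repeat=2):
        pab = PABZ.get((a, b, z), 0.0)
        pa = sum(PABZ.get((a, bb, z), 0.0) for bb in range(n_sym))
        pb = sum(PABZ.get((aa, b, z), 0.0) for aa in range(n_sym))
        assert abs(pab * pz - pa * pb) < 1e-9            # P(ab|z) = P(a|z)P(b|z)
```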

   We strengthen Claim 6.2:

Claim 8.5. Let P_{X0 Y0} P_{X1 Y1} be a distribution over X0 × Y0 × X1 × Y1, and let f : X0 × X1 → A and
g : Y0 × Y1 → B be arbitrary. Let P_{Z|AB} be such that for any product distribution P_{AB} = P_A P_B the Markov
condition A ↔ Z ↔ B is satisfied. Then, if Z is obtained using P_{Z|A=f(X0,X1), B=g(Y0,Y1)} we have

                                         X0 X1 ↔ X0 Z Y1 ↔ Y0 Y1 .                                       (8.4)

Proof. We fix x0 ∈ X0 and y1 ∈ Y1 throughout the proof, and consider everything conditioned on these
values. Then f : X1 → A and g : Y0 → B; write F := f(X1) and G := g(Y0). We have

        P_{X1 Y0 FGZ} = P_{X1} P_{F|X1} P_{Y0} P_{G|Y0} P_{Z|FG} = P_F P_{X1|F} P_G P_{Y0|G} P_{Z|FG} = P_Z P_{F|Z} P_{G|Z} P_{X1|F} P_{Y0|G} ,

where the last step uses the Markov condition F ↔ Z ↔ G for the product distribution P_F P_G; this implies the claim.

   For the case α = 1 we will need a slightly different recursion than the one given in Lemma 7.1.

Lemma 8.6. Fix v < 1, c > 5, and let p0 , . . . , pn be a sequence of non-increasing non-negative reals
with p0 = 1 and
                              p_{m+1} ≤ p_m · ( v + √(c/(n−m)) · √( log(1/p_m) ) ) .                     (8.5)

Then,
                                    p_n ≤ ( 1 − (1 − v)²/(10c) )^n .                                     (8.6)

Proof. We show by induction that p_m ≤ ((1 + v)/2)^m as long as m + 1 ≤ (n − m)(1 − v)/(4c). Clearly, this
holds for m = 0. To make a step from m to m + 1 we can assume

         p_m ≥ ((1 + v)/2)^{m+1} = (1 − (1 − v)/2)^{m+1} ≥ (1 − 1/2)^{(1−v)(m+1)} = 2^{−(1−v)(m+1)}


(where we used (1 − b)^a ≤ 1 − ab for a ∈ [0, 1], b ≤ 1) which means that

                         p_{m+1} ≤ p_m · ( v + √( c(m + 1)(1 − v)/(n − m) ) )                            (8.7)
                                 ≤ p_m · (1 + v)/2                                                       (8.8)

as long as m + 1 ≤ (n − m)(1 − v)/(4c), which implies the hypothesis. We thus get for m = n(1 − v)/(5c)

                                  p_m ≤ ( (1 + v)/2 )^{n(1−v)/(5c)} .

Finally,

           ( (1 + v)/2 )^{(1−v)/(5c)} = ( 1 − (1 − v)/2 )^{(1−v)/(5c)} ≤ 1 − (1 − v)²/(10c) ,

again using (1 − b)^a ≤ 1 − ab.
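The conclusion of Lemma 8.6 can be sanity-checked numerically. The sketch below (parameters illustrative) iterates recursion (8.5) with equality, capping the factor at 1 so that the sequence stays non-increasing as the lemma requires, and compares p_n with bound (8.6):

```python
import math

# Extremal sequence for Lemma 8.6: take equality in (8.5), capped at 1
# so that p_0 = 1 and the p_m are non-increasing (as the lemma requires).
def worst_case_p_n(v: float, c: float, n: int) -> float:
    p = 1.0
    for m in range(n):
        factor = v + math.sqrt(c / (n - m)) * math.sqrt(math.log(1.0 / p))
        p *= min(1.0, factor)
    return p

v, c, n = 0.5, 6.0, 1000                      # the lemma needs v < 1 and c > 5
bound = (1.0 - (1.0 - v) ** 2 / (10.0 * c)) ** n
assert worst_case_p_n(v, c, n) <= bound       # p_n obeys (8.6)
```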
Remark 8.7. Lemma 8.6 is essentially tight: in case p_m ≤ ( 1 − Θ((1 − v)²) )^n, equation (8.5) only
implies

   p_{m+1} ≤ p_m ( v + √( c · (−log(1 − Θ((1 − v)²))) ) ) = p_m ( v + √( c · Θ((1 − v)²) ) ) = p_m (v + 1 − v) = p_m .

    We now come to the proof of Theorem 8.2.

Proof (of Theorem 8.2). We first show that Lemma 6.4 still holds if we replace (6.2) by
                     ∑_{j=1}^{k} ε_j ≤ 15 √k · √( (n − k) log(α) + log( 1/Pr[W_{k+1} ∧ · · · ∧ W_n] ) ) .        (8.9)

    For this, we define the random variables D^k, U^k, Ū^k, and T exactly as in the proof of Lemma 6.4.
Instead of (6.4) we now define

                                              V := (Zk+1 , . . . , Zn ) ,                              (8.10)

where Zi is obtained from (Ai , Bi , Xi ,Yi ) by a channel that has alphabet size at most α in case Wi , which
ensures Ai ↔ XiYi Zi ↔ Bi in case Ai and Bi are independent, and for which Wi can be inferred from
(Xi ,Yi , Zi ). The existence of such a random variable is ensured by Lemma 8.4 and the fact that for
every (x, y) there exists an exact fractional product cover of size α for Q(x, y, ·, ·).
     From Corollary 4.3 we now get
                              ∑_{j=1}^{k} ‖ P_{T U_j V|W} − P_{T V|W} P_{U_j|T} ‖ ≤ εTot ,               (8.11)




where we set
                         εTot := √k · √( (n − k) log(α) + log( 1/Pr[W] ) ) .                             (8.12)

    For a fixed j we define T^{(\ j)} as in the proof of Lemma 6.4 and obtain in exactly the same way for
S := (T^{(\ j)}, V) and

                                      P_{S̃ X̃^n Ỹ^n} := P_{S X^n Y^n |W}                                (8.13)

the equations

                              ∑_{j=1}^{k} ‖ P_{S̃ X̃_j Ỹ_j} − P_{XY} P_{S̃|Ỹ_j} ‖ ≤ 3 εTot               (8.14)

and

                              ∑_{j=1}^{k} ‖ P_{S̃ X̃_j Ỹ_j} − P_{XY} P_{S̃|X̃_j} ‖ ≤ 3 εTot .             (8.15)


Again, Corollary 5.3 implies that (X, Y) is (1 − ε_j)-embeddable in (X̃_j S̃, Ỹ_j S̃) with (X̃_j, Ỹ_j) = (X, Y) and
such that ∑_{j=1}^{k} ε_j ≤ 15 εTot.
   Again we get

                                            X^k ↔ T V ↔ Y^k ,                                            (8.16)

now using the properties of the Z_i. (This is done as follows: clearly, X^k ↔ T ↔ Y^k, i. e., for a fixed
value t of T the X^k and Y^k are independent. Now, inductively adding Z_i will not change this in any
step, due to Claim 8.5.) Claim 6.3 now yields

                                         X̃^n ↔ X̃_j S̃ ↔ Ỹ^n Ỹ_j S̃                                     (8.17)
                                         X̃^n X̃_j S̃ ↔ Ỹ_j S̃ ↔ Ỹ^n ,                                   (8.18)

and Lemma 5.4 completes the proof that (8.9) can replace (6.2) in Lemma 6.4.
   From Lemma 6.4 where (6.2) is replaced by (8.9) we apply Lemma 7.1 and get (8.1). To get (8.2)
we note first that in this case (8.9) reduces to
                        ∑_{j=1}^{k} ε_j ≤ 15 √k · √( log( 1/Pr[W_{k+1} ∧ · · · ∧ W_n] ) ) ,              (8.19)


and so we can apply Lemma 8.6.


9      No-signaling strategies
No-signaling strategies are those in which the only restriction on the responses of Alice and Bob is that
they must not imply communication between the players.

Definition 9.1 (No-signaling). A pair (h_a, h_b) of functions is no-signaling if h_a : X × Y × R → A and
h_b : X × Y × R → B satisfy

                               Pr_R[h_a(x, y, R) = a] = Pr_R[h_a(x, y′, R) = a]
                               Pr_R[h_b(x, y, R) = b] = Pr_R[h_b(x′, y, R) = b] ,

for all a, b, x, x′, y, y′.
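This condition can be checked directly for strategies given as functions of a shared uniformly random string R. The two strategies below are illustrative (not from the paper): the first masks each player's own input with a shared random bit and is no-signaling, the second blatantly signals:

```python
from itertools import product

def is_no_signaling(ha, hb, X, Y, R):
    # Alice's answer distribution may depend on x but not on y (and
    # symmetrically for Bob); R is a list of equally likely random strings,
    # so comparing counts over R compares probabilities.
    answers_a = {ha(x, y, r) for x in X for y in Y for r in R}
    answers_b = {hb(x, y, r) for x in X for y in Y for r in R}
    for x, y, yp in product(X, Y, Y):
        for a in answers_a:
            if sum(ha(x, y, r) == a for r in R) != sum(ha(x, yp, r) == a for r in R):
                return False
    for x, xp, y in product(X, X, Y):
        for b in answers_b:
            if sum(hb(x, y, r) == b for r in R) != sum(hb(xp, y, r) == b for r in R):
                return False
    return True

X = Y = R = [0, 1]  # one shared uniformly random bit
assert is_no_signaling(lambda x, y, r: x ^ r, lambda x, y, r: y ^ r, X, Y, R)
assert not is_no_signaling(lambda x, y, r: y, lambda x, y, r: x, X, Y, R)  # signals
```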
Definition 9.2 (No-signaling value). The no-signaling value vns (G) of a game G = (PXY , Q) over X ×
Y × A × B is
                            vns (G) := max Pr [Q(X,Y, ha (X,Y, R), hb (X,Y, R))] ,
                                            XY R

where the maximum is over all no-signaling functions (ha , hb ).
    Clearly, v(G) ≤ vns (G), since any local strategy is a no-signaling strategy. We further note that for
no-signaling strategies vns (G2 ) > (vns (G))2 is also possible, similarly to the local case (see Section 10.1).
    We will prove the following upper bound on the no-signaling value of parallel repetition.
Theorem 9.3. For any game G with no-signaling value vns := vns (G) and any integer n we have
                                     vns(G^n) ≤ ( 1 − (1 − vns)²/1000 )^n .                              (9.1)
    We remark that the proof of this theorem will be much simpler than the proof of Theorem 2.5.
    We first show that if P_{XY ST} is a distribution which is close to no-signaling (i. e., ‖P_{XY S} − P_{XY} P_{S|X}‖ ≤
ε and ‖P_{XY T} − P_{XY} P_{T|Y}‖ ≤ ε) then there exists a no-signaling strategy which produces values (s′, t′)
from (x, y) such that the distribution (S′, X, T′, Y) is statistically close to the distribution (S, X, T, Y).
This will follow by applying the following lemma twice.
Lemma 9.4. Let P_{S1 T} and P_{S2} be arbitrary distributions over S × T and S, respectively, where S and T are
finite. Then there exists a distribution P_{S̄ T̄} such that

                                     ‖P_{S̄ T̄} − P_{S1 T}‖ ≤ ‖P_{S1} − P_{S2}‖                           (9.2)
                                     ‖P_{S̄} − P_{S2}‖ = 0                                               (9.3)
                                     ‖P_{T̄} − P_{T}‖ = 0 .                                              (9.4)

Proof. Let P_{S̃1 S̃2} be the joint distribution guaranteed in Lemma 2.1, (2.4), which satisfies P_{S̃1} = P_{S1},
P_{S̃2} = P_{S2} and

                                    Pr[S̃1 = S̃2] = 1 − ‖P_{S1} − P_{S2}‖ .                               (9.5)

We consider P_{S̃1 S̃2 T} = P_{S̃1 S̃2} P_{T|S̃1} and set P_{S̄ T̄} := P_{S̃2 T} to be the corresponding marginal. Equations (9.3)
and (9.4) are clear. To see (9.2), note that with our definitions P_{S̄ T̄} and P_{S1 T} are marginals of the same
distribution, which additionally satisfies (9.5).


Lemma 9.5. Let P_{X0 Y0} and P_{XY ST} be arbitrary distributions. If

                                     ‖P_{X0 Y0} P_{S|X} − P_{XY S}‖ ≤ ε_1 ,                              (9.6)
                                     ‖P_{X0 Y0} P_{T|Y} − P_{XY T}‖ ≤ ε_2 ,                              (9.7)

then there exists a conditional distribution P_{S′T′|X0=x,Y0=y} with P_{S′|X0=x,Y0=y} = P_{S′|X0=x} and
P_{T′|X0=x,Y0=y} = P_{T′|Y0=y} such that

                                 ‖P_{X0 Y0} P_{S′T′|XY} − P_{XY ST}‖ ≤ 3ε_1 + 2ε_2 .                     (9.8)

Proof. For fixed x, y we define P_{S̄ T̄|X=x,Y=y} using Lemma 9.4 with the following properties:

                     ‖P_{S̄ T̄|X=x,Y=y} − P_{ST|X=x,Y=y}‖ ≤ ‖P_{S|X=x} − P_{S|X=x,Y=y}‖
                               ‖P_{S̄|X=x,Y=y} − P_{S|X=x}‖ = 0
                            ‖P_{T̄|X=x,Y=y} − P_{T|X=x,Y=y}‖ = 0 .

Then, again using Lemma 9.4, we define P_{S′T′|X=x,Y=y} such that

                     ‖P_{S′T′|X=x,Y=y} − P_{S̄ T̄|X=x,Y=y}‖ ≤ ‖P_{T̄|Y=y} − P_{T̄|X=x,Y=y}‖
                               ‖P_{T′|X=x,Y=y} − P_{T̄|Y=y}‖ = 0
                              ‖P_{S′|X=x,Y=y} − P_{S̄|X=x,Y=y}‖ = 0 .

We see that for all pairs x, y we have P_{S′|X=x,Y=y} = P_{S′|X=x} and P_{T′|X=x,Y=y} = P_{T′|Y=y}.
   We further get

         ‖P_{X0 Y0} P_{S′T′|XY} − P_{XY ST}‖
                  ≤ ε_1 + ‖P_{XY} P_{S′T′|XY} − P_{XY ST}‖
                  = ε_1 + (1/2) ∑_{x,y,s,t} | P_{XY}(x, y) P_{S′T′|X=x,Y=y}(s, t) − P_{XY}(x, y) P_{ST|X=x,Y=y}(s, t) |
                  ≤ ε_1 + ∑_{x,y} P_{XY}(x, y) ( ‖P_{S|X=x} − P_{S|X=x,Y=y}‖ + ‖P_{T|Y=y} − P_{T|X=x,Y=y}‖ )
                  ≤ ε_1 + ‖P_{XY} P_{S|X} − P_{S XY}‖ + ‖P_{XY} P_{T|Y} − P_{T XY}‖
                  ≤ 3ε_1 + 2ε_2 .

    We can now prove a no-signaling analogue of Lemma 6.5.7
Lemma 9.6. Let a game G = (Q, PXY ), its n-fold repetition Gn , and a no-signaling strategy (ha , hb )
for Gn be given. Let indices i1 , . . . , im be given. Then there exists an index im+1 such that

                  Pr[W_{i_{m+1}} | W_{i_1} ∧ · · · ∧ W_{i_m}]
                        ≤ vns(G) + 10 √(1/(n − m)) · √( log( 1/Pr[W_{i_1} ∧ · · · ∧ W_{i_m}] ) ) .       (9.9)
   7 A previous version of the proof of this lemma contained an error, which was first noticed by Oded Regev and Ricky Rosen.




Proof. As in the proof of Lemma 6.5 we assume that i_ℓ = n − ℓ + 1 and we define W := W_{n−m+1} ∧ · · · ∧ W_n.
The no-signaling property of (h_a, h_b) implies P_{X^n Y^n A^n} = P_{X^n} P_{Y^n|X^n} P_{A^n|X^n} = P_{A^n X^n} P_{Y^n|X^n}. Thus, when we
apply Corollary 4.3 on this distribution (with the event W and the random variables T = (X^n, A^n) and
U_j = Y_j) we get

        ∑_{j=1}^{n−m} ‖ P_{X^n A^n Y_j|W} − P_{X^n A^n|W} P_{Y_j|X_j} ‖ = ∑_{j=1}^{n−m} ‖ P_{T U_j|W} − P_{T|W} P_{U_j|T} ‖
                                                                        ≤ √( (n − m) log( 1/Pr[W] ) ) .

Taking appropriate marginals this gives
        ∑_{j=1}^{n−m} ‖ P_{X_j Y_j A_j|W} − P_{X_j A_j|W} P_{Y_j|X_j} ‖ ≤ √( (n − m) log( 1/Pr[W] ) ) .

Applying Lemma 4.1 once more and rearranging we get

        ∑_{j=1}^{n−m} ‖ P_{X_j Y_j A_j|W} − P_{XY} P_{A_j|X_j,W} ‖ ≤ 2 √( (n − m) log( 1/Pr[W] ) ) .     (9.10)

Symmetrically, we obtain
        ∑_{j=1}^{n−m} ‖ P_{X_j Y_j B_j|W} − P_{XY} P_{B_j|Y_j,W} ‖ ≤ 2 √( (n − m) log( 1/Pr[W] ) ) .     (9.11)

From (9.10), (9.11), and Lemma 9.5 we get that there exists a distribution P_{A′_j B′_j|XY} which can be
implemented by no-signaling functions and for which

        ∑_{j=1}^{n−m} ‖ P_{XY} P_{A′_j B′_j|XY} − P_{X_j Y_j A_j B_j|W} ‖ ≤ 10 √( (n − m) log( 1/Pr[W] ) ) .

Thus, if Alice and Bob use the strategy implied by P_{A′_j B′_j|XY} (which is no-signaling), they can win the
initial game with probability at least Pr[W_j|W] − 10 √( (1/(n − m)) log( 1/Pr[W] ) ) for some j, which
implies the lemma.

Proof (of Theorem 9.3). Fix a no-signaling strategy (h_a, h_b) for G^n. As in the proof of Theorem 2.5 we
repeatedly select indices i_{m+1} such that Pr[W_{i_{m+1}} | W_{i_1} ∧ · · · ∧ W_{i_m}] is minimized. Let p_m := Pr[W_{i_1} ∧ · · · ∧
W_{i_m}]. Lemma 9.6 implies

                         p_{m+1} ≤ p_m · ( vns + 10 √(1/(n − m)) · √( log(1/p_m) ) ) .                   (9.12)

From (9.12) we apply Lemma 8.6 (with c = 100) to get (9.1).


10     Appendix
10.1    Non-triviality
Local case We quickly reproduce a slight modification8 of Fortnow’s example [18] which shows that
v(G2 ) > (v(G))2 is possible. The same variation was also considered by Feige and Lovász [16].
    The game we describe is over bits (i. e., all the queries and all the responses are bits). We set

                              PXY(0, 0) := PXY(0, 1) := PXY(1, 0) := 1/3 ,
and define
                                                                            
                                         Q(x, y, a, b) := (x ∨ a) 6= (y ∨ b) .                                        (10.1)

This can be described in words: Alice and Bob each receive a bit, and at least one of these bits is 0. If
both players receive 0, exactly one player must respond with 1. If one of the players receives 1, the other
must respond with 0.
    We first show that for this game v = 2/3. Clearly, v ≥ 2/3 (e. g., both players always answer 0). To
show v ≤ 2/3 we check all deterministic strategies. If both players reply 0 on query 0, this fails in case x =
y = 0 (and thus with probability 1/3). If one player, w.l.o.g. Alice, answers query 0 with 1, the players fail in
case x = 0 and y = 1.
    If this game is repeated twice in parallel, setting (a_1, a_2) := (x_2, x_1), (b_1, b_2) := (y_2, y_1) also wins
with probability 2/3. One can check this as follows: for every fixed value of (x_2, y_2), answering the first
subgame with (a_1, b_1) = (x_2, y_2) wins it with probability 2/3 (one checks all 3 cases). Moreover, with this strategy

                                         Q(x1 , y1 , a1 , b1 ) ≡ Q(x2 , y2 , a2 , b2 )

which implies the claim.
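Both claims can be confirmed by brute force; a small sketch:

```python
from itertools import product

# Fortnow's game (modified): queries uniform on {(0,0), (0,1), (1,0)},
# predicate Q(x, y, a, b) = [(x OR a) != (y OR b)].
queries = [(0, 0), (0, 1), (1, 0)]
Q = lambda x, y, a, b: (x | a) != (y | b)

# v(G): maximize over deterministic strategies a = fa[x], b = gb[y].
best = max(sum(Q(x, y, fa[x], gb[y]) for (x, y) in queries)
           for fa in product([0, 1], repeat=2)
           for gb in product([0, 1], repeat=2))
assert best == 2            # won on 2 of the 3 queries, i.e., v(G) = 2/3

# G^2 with the strategy (a1, a2) := (x2, x1), (b1, b2) := (y2, y1).
wins = sum(Q(x1, y1, x2, y2) and Q(x2, y2, x1, y1)
           for (x1, y1) in queries for (x2, y2) in queries)
assert wins == 6            # 6 of 9 query pairs: probability 2/3 > (2/3)^2
```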

No-signaling case We now show that for the above game

                                        v(G) = v(G2 ) = vns (G) = vns (G2 ) .                                         (10.2)

Previously, it was known that quantum strategies do not help Alice and Bob to win this game [35] (in
either the single instance case or where two parallel instances are used).
    To show (10.2) it is sufficient to show that v(G) ≥ vns (G) (since vns (G) ≥ vns (G2 ) ≥ v(G2 ) = v(G)
is already known). There are two ways to see this. First, one can notice that the joint probability of
Alice’s and Bob’s reply only matters if x = y = 0; i. e., only for one query. In such a case one can
always get a local strategy which is as good as a given no-signaling strategy. Alternatively, let p be the
probability that Alice replies 0 on query 0 and q be the probability that Bob replies 0 on query 0. In
this case, the players win with probability at most p on query (x, y) = (0, 1), with probability at most q
on query (1, 0), and with probability at most (1 − p) + (1 − q) on query (0, 0), which gives an overall
winning probability of at most 2/3.
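Written out, the averaging in the second argument (each query having probability 1/3) is:

```latex
\Pr[\text{win}] \;\le\; \tfrac13\,p \;+\; \tfrac13\,q \;+\; \tfrac13\bigl((1-p)+(1-q)\bigr)
  \;=\; \tfrac13\,(p+q+2-p-q) \;=\; \tfrac23 .
```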
   8 Fortnow also lets the referee choose x = y = 1 with some probability, in which case the players cannot win the game.




10.2    Converse of Lemma 8.3
We show here that Lemma 8.3 can be strengthened to get an “if and only if” condition.

Lemma 10.1. Let a conditional distribution PZ|AB over finite sets A, B, Z be given. If for all product
distributions PAB = PA PB the Markov condition A ↔ Z ↔ B is satisfied then there exist functions f (a, z) :
A × Z → [0, 1] and g(b, z) : B × Z → [0, 1] such that

                                          PZ|A=aB=b (z) = f (a, z) · g(b, z) .                          (10.3)

Proof. Fix an arbitrary z throughout the proof, and consider arbitrary elements a, a′ ∈ A and b, b′ ∈ B.
We set P_A(a) = P_A(a′) = 1/2 and P_B(b) = P_B(b′) = 1/2. The Markov condition implies

                                     P_{A|Z=z,B=b}(a) = P_{A|Z=z,B=b′}(a)

which is equivalent to

   P_{ABZ}(a, b, z) / ( P_{ABZ}(a, b, z) + P_{ABZ}(a′, b, z) ) = P_{ABZ}(a, b′, z) / ( P_{ABZ}(a, b′, z) + P_{ABZ}(a′, b′, z) ) ,

or (because of our choice of P_AB)

   P_{Z|A=a,B=b}(z) / ( P_{Z|A=a,B=b}(z) + P_{Z|A=a′,B=b}(z) ) = P_{Z|A=a,B=b′}(z) / ( P_{Z|A=a,B=b′}(z) + P_{Z|A=a′,B=b′}(z) ) .

Analogously one gets (by swapping the roles of a and a′)

   P_{Z|A=a′,B=b}(z) / ( P_{Z|A=a,B=b}(z) + P_{Z|A=a′,B=b}(z) ) = P_{Z|A=a′,B=b′}(z) / ( P_{Z|A=a,B=b′}(z) + P_{Z|A=a′,B=b′}(z) ) .

Together, this implies

               P_{Z|A=a,B=b}(z) · P_{Z|A=a′,B=b′}(z) = P_{Z|A=a,B=b′}(z) · P_{Z|A=a′,B=b}(z) .           (10.4)

Fix a∗ with ∑_{b∈B} P_{Z|A=a∗,B=b}(z) > 0 (if there is no such a∗ we let f and g be the zero functions) and
let b∗ be such that P_{Z|A=a∗,B=b∗}(z) ≥ P_{Z|A=a∗,B=b}(z) for all b (note that P_{Z|A=a∗,B=b∗}(z) > 0). We define

                                     f(a, z) := P_{Z|A=a,B=b∗}(z) ,
                                     g(b, z) := P_{Z|A=a∗,B=b}(z) / P_{Z|A=a∗,B=b∗}(z) ,

and we note that because of our choice of a∗ and b∗ both f , g ∈ [0, 1]. We further get, for any a, b:

            f(a, z) g(b, z) = P_{Z|A=a,B=b∗}(z) P_{Z|A=a∗,B=b}(z) / P_{Z|A=a∗,B=b∗}(z) = P_{Z|A=a,B=b}(z) ,

where the last equation follows from (10.4).
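The recipe for a∗, b∗, f, and g above can be exercised numerically. The sketch below uses an illustrative product-form channel (so the Markov condition holds for every product input distribution) with made-up parameters, and re-checks (10.3):

```python
from itertools import product

# Illustrative product-form channel P_{Z|AB} on A = B = {0,1}, Z = {0,1,2,3}:
# with probability 1/2 report a biased bit about a, else one about b.
A, B, Z = range(2), range(2), range(4)
p, q = [0.3, 0.8], [0.6, 0.1]           # made-up parameters

def P(a, b, z):
    return [0.5 * p[a], 0.5 * (1 - p[a]), 0.5 * q[b], 0.5 * (1 - q[b])][z]

for a, b in product(A, B):              # sanity: each P_{Z|A=a,B=b} is a distribution
    assert abs(sum(P(a, b, z) for z in Z) - 1.0) < 1e-9

for z in Z:
    # a*: positive total mass at z; b*: maximizes P_{Z|A=a*,B=b}(z)
    a_star = next(a for a in A if sum(P(a, b, z) for b in B) > 0)
    b_star = max(B, key=lambda b: P(a_star, b, z))
    f = {a: P(a, b_star, z) for a in A}                                  # f(a, z)
    g = {b: P(a_star, b, z) / P(a_star, b_star, z) for b in B}           # g(b, z)
    for a, b in product(A, B):
        assert 0.0 <= f[a] <= 1.0 and 0.0 <= g[b] <= 1.0
        assert abs(f[a] * g[b] - P(a, b, z)) < 1e-9                      # (10.3)
```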


Acknowledgments
I would like to thank Paul Beame, Georges Baatz, Ryan O’Donnell, Johan Håstad, Bartosz Przydatek,
Anup Rao, Ran Raz, Renato Renner, Ricky Rosen, Stefano Tessaro, Stefan Wolf, and Jürg Wullschleger
for helpful discussions. Paul Beame pointed out [29] to me and explained how the strengthening given
there can be adapted to the proof given here; he also simplified an argument in a previous version of
this paper. Anup Rao simplified an argument in the proof of Lemma 5.2. Oded Regev and Ricky Rosen
found an error in a previous version of the proof of Lemma 9.6. The anonymous referees gave helpful
comments. I was supported by the Swiss National Science Foundation, project no. 200020-103847/1.


References
 [1] Sanjeev Arora, Carsten Lund, Rajeev Motwani, Madhu Sudan, and Mario Szegedy: Proof
     verification and hardness of approximation problems. In Proc. 33rd FOCS, pp. 14–23. IEEE
     Comp. Soc. Press, 1992. [doi:10.1109/SFCS.1992.267823]. 142

 [2] Sanjeev Arora and Shmuel Safra: Probabilistic checking of proofs; a new characterization of
     NP. In Proc. 33rd FOCS, pp. 2–13. IEEE Comp. Soc. Press, 1992.
     [doi:10.1109/SFCS.1992.267824]. 142

 [3] Boaz Barak, Moritz Hardt, Ishay Haviv, Anup Rao, Oded Regev, and David Steurer:
     Rounding parallel repetition of unique games. In Proc. 49th FOCS, pp. 374–383. IEEE Comp.
     Soc. Press, 2008. [doi:10.1109/FOCS.2008.31]. 144

 [4] Paul Beame: Personal communication, 2006. 146

 [5] Michael Ben-Or, Shafi Goldwasser, Joe Kilian, and Avi Wigderson: Multi-prover interactive
     proofs: How to remove intractability assumptions. In Proc. 20th STOC, pp. 113–132. ACM Press,
     1988. [doi:10.1145/62212.62223]. 142

 [6] Gilles Brassard, Anne Broadbent, and Alain Tapp: Quantum pseudo-telepathy. Foundations
     of Physics, 35(11):1877–1907, 2005. [doi:10.1007/s10701-005-7353-4, arXiv:quant-ph/0407221].
     143

 [7] Andrei Z. Broder, Steven C. Glassman, Mark S. Manasse, and Geoffrey Zweig: Syntactic
     clustering of the web. Computer Networks, 29(8–13):1157–1166, 1997. 143

 [8] Jin-yi Cai, Anne Condon, and Richard J. Lipton: On games of incomplete information.
     Theoretical Computer Science, 103(1):25–38, 1992. [doi:10.1016/0304-3975(92)90085-T]. 142

 [9] Boris S. Cirel'son: Quantum generalizations of Bell's inequality. Lett. Math. Phys., 4(2):93–100,
     1980. [doi:10.1007/BF00417500]. 142

[10] John F. Clauser, Michael A. Horne, Abner Shimony, and Richard A. Holt: Proposed
     experiment to test local hidden-variable theories. Phys. Rev. Lett., 23(15):880–884, 1969.
     [doi:10.1103/PhysRevLett.24.549]. 142

                      Theory of Computing, Volume 5 (2009), pp. 141–172                        169
                                       Thomas Holenstein

[11] Richard Cleve, William Slofstra, Falk Unger, and Sarvagya Upadhyay: Perfect parallel repetition theorem for quantum XOR proof systems. Computational Complexity, 17(2):282–299, 2008. [doi:10.1007/s00037-008-0250-4]. 143

[12] Thomas M. Cover and Joy A. Thomas: Elements of Information Theory. John Wiley & Sons, Inc., second edition, 1991. ISBN 0-471-24195-4. 145, 149

[13] Uriel Feige: On the success probability of the two provers in one-round proof systems. In Proc. 6th Ann. Structure in Complexity Theory Conf., pp. 116–123. IEEE Comp. Soc. Press, 1991. [doi:10.1109/SCT.1991.160251]. 142

[14] Uriel Feige, Shafi Goldwasser, László Lovász, Shmuel Safra, and Mario Szegedy: Approximating clique is almost NP-complete (preliminary version). In Proc. 32nd FOCS, pp. 2–12. IEEE Comp. Soc. Press, 1991. [doi:10.1109/SFCS.1991.185341]. 142

[15] Uriel Feige, Guy Kindler, and Ryan O'Donnell: Understanding parallel repetition requires understanding foams. In Proc. IEEE Conf. Comput. Complexity, pp. 179–192. IEEE Comp. Soc. Press, 2007. [doi:10.1109/CCC.2007.39]. 144

[16] Uriel Feige and László Lovász: Two-prover one-round proof systems: Their power and their problems. In Proc. 24th STOC, pp. 733–744. ACM Press, 1992. [doi:10.1145/129712.129783]. 142, 167

[17] Uriel Feige and Oleg Verbitsky: Error reduction by parallel repetition – a negative result. Combinatorica, 22(4):461–478, 2002. [doi:10.1007/s00493-002-0001-0]. 142

[18] Lance Fortnow: Complexity-Theoretic Aspects of Interactive Proof Systems. PhD thesis, Massachusetts Institute of Technology, 1989. 142, 167

[19] Lance Fortnow, John Rompel, and Michael Sipser: On the power of multi-prover interactive proof systems. Theoretical Computer Science, 134(2):545–557, 1994. [doi:10.1016/0304-3975(94)90251-8]. 142

[20] Sreenivas Gollapudi and Rina Panigrahy: A dictionary for approximate string search and longest prefix search. In Proc. 15th ACM Intern. Conf. on Inf. and Knowl. Manag. (CIKM), pp. 768–775. ACM Press, 2006. [doi:10.1145/1183614.1183723]. 143

[21] Thomas Holenstein: Parallel repetition: Simplifications and the no-signaling case. In Proc. 39th STOC, pp. 411–419. ACM Press, 2007. [doi:10.1145/1250790.1250852]. 141, 143

[22] Thomas Holenstein and Renato Renner: On the randomness of independent experiments, 2006. [arXiv:cs.IT/0608007]. 148

[23] Subhash Khot: On the power of unique 2-prover 1-round games. In Proc. 34th STOC, pp. 767–775. ACM Press, 2002. [doi:10.1145/509907.510017]. 144

               Parallel Repetition: Simplifications and the No-Signaling Case

[24] Jon M. Kleinberg and Éva Tardos: Approximation algorithms for classification problems with pairwise relationships: Metric labeling and Markov random fields. J. ACM, 49(5):616–639, 2002. [doi:10.1145/585265.585268]. 143

[25] Dror Lapidot and Adi Shamir: A one-round, two-prover, zero-knowledge protocol for NP. Combinatorica, 15(2):203–214, 1995. [doi:10.1007/BF01200756]. 142

[26] Mark Manasse, Frank McSherry, and Kunal Talwar: Consistent weighted sampling. Manuscript, 2007. 143

[27] Udi Manber: Finding similar files in a large file system. In USENIX Winter, pp. 1–10, 1994. 143

[28] Michael A. Nielsen and Isaac L. Chuang: Quantum Computation and Quantum Information. Cambridge University Press, 2000. ISBN 0-521-63503-9. 142

[29] Itzhak Parnafes, Ran Raz, and Avi Wigderson: Direct product results and the GCD problem, in old and new communication models. In Proc. 29th STOC, pp. 363–372. ACM Press, 1997. [doi:10.1145/258533.258620]. 142, 143, 146, 159, 169

[30] Sandu Popescu and Daniel Rohrlich: Nonlocality as an axiom for quantum theory. Foundations of Physics, 24(3):379–385, March 1994. [doi:10.1007/BF02058098, arXiv:quant-ph/9508009]. 142

[31] Anup Rao: Parallel repetition in projection games and a concentration bound. In Proc. 40th STOC, pp. 1–10. ACM Press, 2008. [doi:10.1145/1374376.1374378]. 143, 144, 146

[32] Ran Raz: A parallel repetition theorem. SIAM J. Comput., 27(3):763–803, 1998. [doi:10.1137/S0097539795280895]. 142, 143, 148

[33] Ran Raz: A counterexample to strong parallel repetition. In Proc. 49th FOCS, pp. 369–373. IEEE Comp. Soc. Press, 2008. [doi:10.1109/FOCS.2008.49]. 144

[34] Oleg Verbitsky: Towards the parallel repetition conjecture. In Proc. 9th Ann. Structure in Complexity Theory Conf., pp. 304–307. IEEE Comp. Soc. Press, 1994. [doi:10.1109/SCT.1994.315794]. 142

[35] John Watrous: A note on parallel repetition of two-prover quantum-interactive proofs, 2002. Manuscript. 167





AUTHOR

   Thomas Holenstein
   postdoctoral researcher
   Princeton University
   tholenst princeton edu
   http://www.cs.princeton.edu/~tholenst


ABOUT THE AUTHOR

   Thomas Holenstein obtained his Ph.D. from ETH Zürich in 2006. His advisor was
      Ueli Maurer. His CS interests include cryptography, interactive proofs, semidefinite
      programming, and information theory. He enjoys playing Go and riding his bicycle to
      work.



