Authors: Mika Göös, Thomas Watson
License: CC-BY-3.0
Theory of Computing, Volume 12 (9), 2016, pp. 1–23
www.theoryofcomputing.org
Special Issue: APPROX-RANDOM 2014
Communication Complexity of
Set-Disjointness for All Probabilities
Mika Göös Thomas Watson
Received December 21, 2014; Revised June 8, 2015; Published August 15, 2016
Abstract: We study Set-Disjointness in a generalized model of randomized two-party
communication where the probability of acceptance must be at least α(n) on yes-inputs and
at most β(n) on no-inputs, for some functions α(n) > β(n). Our main result is a complete
characterization of the private-coin communication complexity of Set-Disjointness for
all functions α and β, and a near-complete characterization for public-coin protocols. In
particular, we obtain a simple proof of a theorem of Braverman and Moitra (STOC 2013),
who studied the case where α = 1/2 + ε(n) and β = 1/2 − ε(n). The following contributions
play a crucial role in our characterization and are interesting in their own right.
(1) We introduce two communication analogues of the classical complexity class that
captures small bounded-error computations: we define a “restricted” class SBP (which
lies between MA and AM) and an “unrestricted” class USBP. The distinction between
them is analogous to the distinction between the well-known communication classes
PP and UPP.
(2) We show that the SBP communication complexity is precisely captured by the classical
corruption lower bound method. This sharpens a theorem of Klauck (CCC 2003).
A preliminary version of this paper appeared in the Proceedings of the 18th International Workshop on Randomization and
Computation (RANDOM), 2014 [22].
ACM Classification: F.1.3
AMS Classification: 68Q17, 68Q15
Key words and phrases: communication complexity, set-disjointness, complexity classes
© 2016 Mika Göös and Thomas Watson
Licensed under a Creative Commons Attribution License (CC-BY). DOI: 10.4086/toc.2016.v012a009
(3) We use information complexity arguments to prove a linear lower bound on the USBP
complexity of Set-Disjointness.
1 Introduction
In the Set-Disjointness problem, Alice is given an x ⊆ [n], Bob is given a y ⊆ [n], and their task is to
decide whether x ∩ y = ∅. Equivalently, viewing x and y as binary strings, we define

DISJ(x, y) := ¬ ⋁_{i∈[n]} (x_i ∧ y_i).
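For concreteness, a minimal Python rendering of this definition (our own illustrative code, not from the paper):

    def disj(x, y):
        # x, y are n-bit strings encoding subsets of [n]; DISJ = 1 iff disjoint.
        return 0 if any(a == "1" == b for a, b in zip(x, y)) else 1

    assert disj("0110", "1001") == 1  # {2,3} and {1,4} are disjoint
    assert disj("0110", "0100") == 0  # both sets contain element 2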
Set-disjointness is the preeminent coNP-complete problem in communication complexity [2, 15]. A
fundamental result of Kalyanasundaram and Schnitger [27] (with alternative proofs given by [35, 4])
states that every randomized protocol for Set-Disjointness requires Ω(n) bits of communication to
achieve a constant error probability that is bounded away from 1/2. These lower bounds have been
extremely useful in applications of communication complexity to other areas of theoretical computer
science, including circuit complexity, distributed computing, streaming, data structures, combinatorial
optimization, and more; see [32, 26, 15].
In this work, we study Set-Disjointness in a generalized setting where the probability of acceptance
must be at least α(n) on yes-inputs and at most β(n) on no-inputs, for any prescribed functions α(n) >
β(n).
1.1 Main result
Our main result is a complete characterization of the private-coin communication complexity of
Set-Disjointness for all functions α and β, and a near-complete characterization for public-coin protocols.
Roughly speaking, we prove that the randomized complexity is

Θ(n · (1 − β/α))

for typical functions α and β; see Section 1.4 for the statement of the exact bounds.
As a special case, we obtain a simple proof of a result of Braverman and Moitra [8]. They showed
that the communication complexity of Set-Disjointness is Θ(εn) in case α = 1/2 + ε(n) and β =
1/2 − ε(n). While this special case might suggest that the complexity is determined by the additive gap
α − β, our characterization reveals that, in fact:
Central tenet: It is not the additive gap between α and β that determines the complexity of
Set-Disjointness; what matters is the multiplicative gap.
Our proof follows this ideology: we show that in order to understand the communication complexity for
all α and β it suffices to understand the small bounded-error case where α is tiny (e. g., exponentially
small in n) and β = α/2. The basic reason is that a protocol's multiplicative gap between α and β
can be efficiently amplified at the expense of decreasing α, and thus upper bounds for the general case
yield upper bounds for the small bounded-error case.
1.2 SBP: Small bounded-error probabilities
In classical time-bounded (i. e., poly-time Turing machine) complexity theory, small bounded-error
acceptance probabilities are captured by a counting class called SBP, which was introduced by Böhler,
Glaßer, and Meister [5] and has also been studied in [41]. In particular, [5] observed that SBP is
sandwiched between the Arthur–Merlin classes MA and AM [3].
In this work, we introduce two communication complexity analogues of SBP: a restricted class called
SBP, and an unrestricted class called USBP. These classes are natural and interesting in their own right.
Most importantly, they serve to structure our argument.
Randomized communication complexity. In what follows, we assume familiarity with basic defini-
tions of communication complexity [32, 26]. Fix a two-party function f : {0,1}^n × {0,1}^n → {0,1}
where on input (x, y) Alice is given x and Bob is given y. We say (x, y) is a b-input if (x, y) ∈ f^{−1}(b). We
let R^pub_{α,β}(f), respectively R^priv_{α,β}(f), denote the minimum communication complexity (as a function of n)
of a public-coin, respectively private-coin, protocol for f such that the probability of acceptance is at
least α(n) on all 1-inputs and at most β(n) on all 0-inputs. As is customary [2], for any communication
measure C(f) we often let C stand for the class of functions f with C(f) = polylog(n).
PP and UPP. To motivate our upcoming definitions for SBP, we take a little detour and recall the
communication classes associated with the standard complexity class PP. There are in fact two distinct
measures—restricted and unrestricted—as introduced in [2, 34]:

PP(f) := min_{ε(n)>0} ( R^pub_{1/2+ε, 1/2−ε}(f) + log(1/ε) ),
UPP(f) := min_{ε(n)>0} R^priv_{1/2+ε, 1/2−ε}(f).
In the (restricted) public-coin model, one needs to charge the additional log(1/ε) term in order for the
measure to be well-behaved when ε is tiny. (For example, note that R^pub_{1/2+ε, 1/2−ε}(f) ≤ 2 for ε = 2^{−n−1}
since public randomness can be used to guess Alice's input.) The original definition of PP(f) given in [2]
actually charged for the number of public coin flips instead of the +log(1/ε); however, by standard
sparsification techniques (see [33] and [32, Theorem 3.14]) the two versions are essentially equivalent—
they are within a constant factor plus O(log n)—and the definition we have stated is much more prevalent
in recent literature. It also follows from standard sparsification that we may convert any PP protocol
into a UPP protocol of comparable cost: UPP(f) ≤ O(PP(f) + log n). In the converse direction, an
exponential separation between UPP and PP is known [10, 36, 37].
SBP and USBP. Analogously to the above, we define

SBP(f) := min_{α(n)>0} ( R^pub_{α, α/2}(f) + log(1/α) ),
USBP(f) := min_{α(n)>0} R^priv_{α, α/2}(f).
Here the constant factor 1/2 = β/α can be replaced by any positive constant less than 1 while affecting
the complexity measures by only a constant factor: if we run a protocol ℓ times and accept iff all iterations
accept, then β/α gets raised to the power ℓ while the communication and the log(1/α) term each get
multiplied by ℓ. We call this procedure and-amplification (in contrast to the usual majority-amplification).
We also note that by standard sparsification, USBP(f) ≤ O(SBP(f) + log n) holds for all f. In the
converse direction, at the time of this writing we did not know whether USBP is significantly more
powerful than SBP (this question has subsequently been resolved in the negative—see Section 6), though
a small separation is witnessed by the greater-than function, which has constant USBP complexity but
Θ(log n) SBP and PP complexity [9].
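To make and-amplification concrete, here is a minimal simulation sketch (our own illustrative Python, not from the paper): running a protocol ℓ times independently and accepting iff every run accepts turns acceptance probabilities (α, β) into (α^ℓ, β^ℓ), so the ratio β/α drops exponentially in ℓ.

    import random

    def and_amplify(protocol, ell):
        # Accept iff all of ell independent runs of `protocol` accept.
        # If protocol accepts with probability p, this accepts with p**ell.
        return lambda: all(protocol() for _ in range(ell))

    def estimate(p, trials=200_000):
        return sum(p() for _ in range(trials)) / trials

    yes = lambda: random.random() < 0.8   # acceptance probability alpha on a 1-input
    no  = lambda: random.random() < 0.6   # acceptance probability beta on a 0-input

    # One run has ratio beta/alpha = 0.75; after ell = 3 runs the ratio is
    # 0.75**3 < 1/2, at the cost of alpha dropping from 0.8 to 0.8**3 = 0.512.
    print(estimate(and_amplify(yes, 3)), estimate(and_amplify(no, 3)))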
Relationship to Arthur–Merlin classes. Klauck [29, 31] and Aaronson and Wigderson [1] took up
the study of communication complexity analogues of Arthur–Merlin games. Their results have already
found applications in data streaming [11, 12, 23]. We do not define the communication models MA and
AM here, but we note that the classical inclusions continue to hold in the communication setting (for the
same reasons):
MA ⊆ SBP ⊆ AM .
Indeed, if MA(f) = m then by majority-amplification and by absorbing Merlin's nondeterminism into the
randomness we obtain

R^pub_{2^{−m−1}, 2^{−m−2}}(f) ≤ O(m²).

Thus SBP(f) ≤ O(MA(f)²) (and the quadratic blow-up is necessary for “black-box” simulations [17]).
On the other hand, AM(f) ≤ O(SBP(f) + log n) holds by sparsifying the randomness and using the
Goldwasser–Sipser protocol [20].
1.3 Results for SBP and USBP
We prove that SBP communication complexity is exactly characterized by the well-known corruption
lower bound method (also known as the rectangle bound or one-sided discrepancy). The definition of the
corruption bound Corr( f ) is given in Section 3, but for now, we note that Corr( f ) essentially depends on
the size of the largest approximately 1-monochromatic rectangle in the communication matrix of f . (For
an extensive discussion of the different lower bound methods in communication complexity, see [24].)
Previously, Klauck [29] showed that Corr(f) lies somewhere between the MA and AM communication
complexities of f; namely Ω(AM(f)) ≤ Corr(f) ≤ O(MA(f)²). Klauck also gave a combinatorial near-
characterization of Corr(f) (tight up to logarithmic factors) using so-called one-sided uniform threshold
covers. The following theorem (proved in Section 3) sharpens these results by pinpointing precisely the
class between MA and AM that is characterized by corruption.

Theorem 1.1. SBP(f) = Θ(Corr(f)) for all f.
One way to frame Theorem 1.1 is as follows. A lot of effort (e. g., [29, 24, 28, 25, 19]) has been
spent on comparing the relative strengths of different lower bound methods in communication complexity
with the goal of finding a natural method that captures the bounded-error randomized communication
complexity of every function. Theorem 1.1 can be viewed as achieving a diametrically opposite goal: we
start with a historically important lower bound method (i. e., corruption) and find a natural communication
measure that it captures. Theorem 1.1 is also somewhat analogous, in content and proof, to another result
of Klauck [30] showing that the discrepancy bound captures PP.
Razborov [35] famously proved that Corr(DISJ) = Θ(n). (The first linear lower bound for Set-
Disjointness [27] did not use corruption.) By the results of [29], this implies that MA(DISJ) ≥ Ω(√n).
We immediately have a stronger corollary.

Corollary 1.2. SBP(DISJ) = Θ(n).
To obtain lower bounds for USBP we show that the information complexity framework, as formulated
by Bar-Yossef, Jayram, Kumar, and Sivakumar [4] (see also [14]), can be adapted to suit our purposes.
The main technical result of this work is the following, proved in Section 4.
Theorem 1.3. USBP(DISJ) = Θ(n).
We note that the statement of Theorem 1.3 is similar in spirit to Forster's theorem [18] stating
that the UPP complexity of the Inner-Product function is Θ(n). Note also that Corollary 1.2 is of
course a corollary of Theorem 1.3, too, but the corruption-based proof via Theorem 1.1 is arguably
more elementary than the proof of Theorem 1.3. Finally, we note that the well-studied Gap-Hamming-
Distance promise problem [13, 40, 39] (where 1-inputs have distance ≥ n/2 + √n and 0-inputs have
distance ≤ n/2 − √n) has SBP and USBP complexities Θ(√n), where the lower bound follows by
Theorem 1.3 and a standard reduction from DISJ, and the upper bound follows by and-amplification of
the trivial protocol that checks inequality at a random bit position.
1.4 Characterization for all α and β
Using our results for SBP and USBP in a black-box manner we derive the following (near) complete
characterization for the randomized communication complexity of Set-Disjointness in Section 2.

Theorem 1.4 (Private-coin). For all α(n) > β(n),

R^priv_{α,β}(DISJ) = Θ(n · (1 − β/α) + log n).
Theorem 1.5 (Public-coin). There is a universal constant C > 0 such that for all α(n) > β(n),

R^pub_{α,β}(DISJ) = Θ(n · (1 − β/α))  when log(1/α) ≤ C · n · (1 − β/α),
R^pub_{α,β}(DISJ) = 2                 when log(1/α) ≥ ⌈n · (1 − β/α)⌉.
We stress that for the public-coin characterization (and in particular, the result of [8] as a corollary), it
suffices to rely only on Razborov's corruption lemma (via Corollary 1.2), and not on any information
complexity techniques. Braverman and Moitra [8] observed that

R^pub_{1/2+ε, 1/2−ε}(DISJ) ≥ Ω(ε²n)

follows from the standard bounded-error lower bound by majority-amplification, and they obtained
the tight Ω(εn) bound by developing information complexity techniques tailored to this setting. Our
idea is that and-amplification imposes only an ε factor loss (rather than the ε² factor loss imposed by
majority-amplification) while still reducing to a case where the corruption method applies.
We also note that for public-coin protocols there remains a small gap in the parameters around the
threshold log(1/α) = Θ(n · (1 − β/α)) that is not covered by our theorem. As we discuss in Section 2,
the power of the public coins kicks in at this threshold.
Finally, we mention that all the Set-Disjointness lower bounds in this paper continue to hold
under the unique-intersection promise where the inputs are either disjoint or intersect in exactly one
coordinate: for Corollary 1.2 this property is inherited from Razborov's proof; for Theorem 1.3 this
property is implicit in our proof.
1.5 Extended formulations for maximum-clique
The authors of [6] proved that for every positive constant δ < 1/2, so-called n^{1/2−δ}-approximate extended
formulations for a certain natural linear programming encoding of the maximum-clique problem for
graphs have complexity 2^{Ω(n^{2δ})}. Their proof involves developing a generalization of the corruption
lemma from [35]. In [8], this result was improved using information-theoretic methods to show a tight
lower bound for maximum-clique: for every positive constant δ < 1, such n^{1−δ}-approximate extended
formulations have complexity 2^{Ω(n^δ)}. In [7], a simplified proof of the latter result was given, also using
information-theoretic methods.
The reason the proof due to [6] only works up to n^{1/2−δ}-approximation (rather than n^{1−δ}-approxima-
tion) is similar to the reason why majority-amplification yields

R^pub_{1/2+ε, 1/2−ε}(DISJ) ≥ Ω(ε²n)

(rather than Ω(εn)) from the standard bounded-error lower bound for DISJ. Although communication
complexity lower bounds do not seem to imply extended formulation lower bounds in a black-box way,
the tools for the former have generally been useful for the latter. This suggests that perhaps an idea
analogous to our and-amplification-based communication lower bound proof could be used to “bootstrap”
the result of [6] to get the tight lower bound for maximum-clique. In Section 5 we confirm that this is
indeed the case, thus obtaining a proof of the tight bound that is simpler than the ones given in [8, 7] and
avoids their use of information-theoretic methods (instead relying only on the corruption-based argument
of [6]).
2 The complexity of Set-Disjointness
We now prove Theorem 1.4 and Theorem 1.5 using Theorem 1.3 and Corollary 1.2.
2.1 Upper bounds
Public-coin protocols. We start with a simple R^pub_{1, β/α} protocol for DISJ of cost Θ(n · (1 − β/α)).
Basic public-coin protocol Π.
1. Use public randomness to pick a uniformly random S ⊆ [n] of size ⌈n · (1 − β/α)⌉.
2. Alice sends the substring x|S to Bob.
3. Bob outputs DISJ(x|S, y|S).
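A minimal simulation of one run of Π (our own illustrative Python; sampling S stands in for the shared public coins). On a 1-input it always accepts; on a 0-input with a unique intersection point it accepts iff S misses that point, which happens with probability 1 − |S|/n ≤ β/α.

    import math, random

    def protocol_pi(x, y, alpha, beta):
        # x, y are 0/1 lists of length n encoding the two sets.
        n = len(x)
        k = math.ceil(n * (1 - beta / alpha))
        S = random.sample(range(n), k)     # public randomness
        # Alice sends x restricted to S (k bits); Bob outputs DISJ(x|S, y|S).
        return not any(x[i] == 1 == y[i] for i in S)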
It is straightforward to check that Π is indeed an R^pub_{1, β/α} protocol. To obtain an R^pub_{α,β} protocol for
the first part of Theorem 1.5 (without needing any restriction on the parameters), we can reject with
probability 1 − α at the beginning and otherwise run Π. To obtain a protocol of cost 2 for the second part
of Theorem 1.5, we need to better exploit the power of public coins. If we modify Π so that additional
public coins are used to guess x|S, then Alice can just send one bit indicating whether the guess is correct,
and Bob can send the output bit (rejecting if the guess was wrong). This yields an

R^pub_{1/2^{|S|}, β/(α·2^{|S|})}

protocol which, by the restriction that α ≤ 1/2^{|S|}, can be adapted into an R^pub_{α,β} protocol by automatically
rejecting with probability 1 − α·2^{|S|}.
In fact, the above protocols can be seen as special cases of the following general protocol, which
interpolates between them. For simplicity of presentation, let us assume that log(1/α) is an integer and
log(1/α) ≤ |S|. In step 2 of the basic protocol Π, Alice can expedite the sending of her message to Bob
as follows: Alice and Bob interpret additional public coins as guessing the first log(1/α) bits of Alice's
message. Alice can use one bit of communication to indicate whether this guess is correct, and if so she
can send the other |S| − log(1/α) bits of her message normally. The probability that the public guess is
correct is 2^{−log(1/α)} = α. Thus, this new protocol ends up working in a familiar way: with probability
1 − α the public guess fails (in which case we reject), but otherwise we are able to run Π successfully.
This results in an R^pub_{α,β} protocol of cost |S| − log(1/α) + 2. Here the +2 comes from Alice indicating
whether the public guess is correct and Bob sending the final answer.
Private-coin protocols. By sparsification, we may assume the basic protocol Π uses only O(log n) bits
of public randomness. Thus we have

R^priv_{1, β/α}(DISJ) ≤ O(n · (1 − β/α) + log n)

since Alice can pick S privately and send it to Bob along with x|S. An R^priv_{α,β} protocol for Theorem 1.4
can be obtained as previously: automatically reject with probability 1 − α and otherwise run the R^priv_{1, β/α}
protocol.
2.2 Lower bounds
Private-coin lower bounds. Let Π be an R^priv_{α,β} protocol for DISJ. We prove that the cost of Π is both
Ω(n · (1 − β/α)) and Ω(log n), as required for Theorem 1.4.

First, if we do and-amplification by iterating the protocol ⌈1/(1 − β/α)⌉ times and accepting iff all
runs accept, we get an R^priv_{α′, α′/2} protocol for DISJ with α′ := α^{⌈1/(1−β/α)⌉} (since (β/α)^{⌈1/(1−β/α)⌉} < 1/2).
By Theorem 1.3 the amplified protocol must use Ω(n) communication and hence Π must have used
Ω(n · (1 − β/α)) communication.
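As a quick numeric sanity check on the amplification parameters (our own verification code, not from the paper): the claim (β/α)^{⌈1/(1−β/α)⌉} < 1/2 holds for every ratio β/α ∈ (0, 1), with the worst case approached as β/α → 1, where (β/α)^{1/(1−β/α)} → 1/e < 1/2.

    import math

    def amplified_ratio(r):
        # r = beta/alpha in (0,1); the ratio after and-amplification.
        return r ** math.ceil(1 / (1 - r))

    assert all(amplified_ratio(r / 1000) < 0.5 for r in range(1, 1000))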
Second, Forster's result [18] that the UPP complexity of Inner-Product is Ω(n) gives us the
Ω(log n) lower bound for Π. Indeed, the Inner-Product function reduces to DISJ with exponential
blow-up (see [38, Proposition 6.5]) and we may convert Π into a UPP protocol by shifting the acceptance
threshold near 1/2.
Public-coin lower bounds. Let Π be an R^pub_{α,β} protocol for DISJ. We consider the two parts of Theo-
rem 1.5 separately.

For the first part, suppose log(1/α) ≤ C · n · (1 − β/α) for a to-be-specified constant C. We proceed
exactly as above: We first and-amplify Π into an R^pub_{α′, α′/2} protocol. The parameters satisfy

log(1/α′) = log(1/α) · ⌈1/(1 − β/α)⌉ ≤ C · n · (1 − β/α) · ⌈1/(1 − β/α)⌉ ≤ 2C · n.

Hence if C is a sufficiently small universal constant then the Ω(n) lower bound for the amplified protocol
(provided now by Corollary 1.2) must be coming from the communication cost and not from the log(1/α′)
term. We conclude that the original protocol Π must have used Ω(n · (1 − β/α)) communication.
For the second part, we do not need any restriction on the parameters. We claim that since DISJ has
a 2 × 2 identity submatrix, we cannot have R^pub_{α,β}(DISJ) ≤ 1.¹ Suppose for contradiction there is a 1-bit
protocol and yet DISJ(x, y) = DISJ(x′, y′) = 1 and DISJ(x, y′) = DISJ(x′, y) = 0. Say r is the probability
Alice declares the output and 1 − r is the probability Bob declares the output. Conditioned on Alice
declaring the output let p_x, p_{x′} be the acceptance probabilities for the x and x′ rows, and conditioned
on Bob declaring the output let q_y, q_{y′} be the acceptance probabilities for the y and y′ columns. Letting
π_{xy} := r·p_x + (1 − r)·q_y be the overall acceptance probability on input (x, y), we have

α − β ≤ π_{xy} − π_{x′y} = r(p_x − p_{x′}) and α − β ≤ π_{x′y′} − π_{xy′} = r(p_{x′} − p_x),

a contradiction.

¹ For the public-coin version of UPP, an equivalence actually holds. For all f, we have UPP^pub(f) ≤ 2, and it is not
difficult to show that the following are equivalent: (i) UPP^pub(f) ≤ 1, (ii) there exist row and column values p_x, q_y ∈ [0, 1] and
r ∈ [0, 1] such that |r·p_x + (1 − r)·q_y − f(x, y)| < 1/2, (iii) the rows and columns can be permuted so each row and each column
is monotonically nondecreasing (0's then 1's), (iv) f does not contain as a submatrix the 2 × 2 identity (or its complement).
To see that (iii) ⇒ (ii), take r = 1/2, p_x = fraction of 1's in the x row, and q_y = (y − 1/2)/(number of columns) where y is
viewed as a positive integer.
3 SBP is characterized by corruption

In this section we prove Theorem 1.1, which states that SBP(f) = Θ(Corr(f)) for all f. We start
by defining the corruption bound. We say a distribution µ over inputs is balanced (with respect to
f) if µ(f^{−1}(1)) = µ(f^{−1}(0)) = 1/2. We say a rectangle R is 1-biased (with respect to f and µ) if
µ(R ∩ f^{−1}(0)) ≤ µ(R)/8. The corruption bound is defined as

Corr(f) := max_{balanced µ} min_{1-biased R} log(1/µ(R)).
If the corruption bound of f is high, then for some µ all 1-biased rectangles are small (and hence if many
inputs must be covered by such rectangles, then many such rectangles will be needed). It was proved
in [29] that the constant factor of 1/8 (in the definition of 1-biased) can be replaced by any positive
constant at most 1/8 while affecting the corruption bound by only a constant factor. It was also proved
in [29] that the bound is robust with respect to the balance condition on µ.
3.1 SBP is lower bounded by corruption
Here we show the lower bound SBP(f) ≥ Ω(Corr(f)). The intuition is as follows. The first step is to fix
the public randomness of an SBP protocol in such a way that the average-case behavior of the resulting
deterministic protocol mimics the worst-case behavior of the original protocol. Typically, this sort of
thing is done by invoking the distributive law (linearity of expectation), but here we need a more elaborate
calculation due to the asymmetric nature of SBP. Then, the rest of the argument follows along similar
lines as the proof in [29] (that Corr(f) ≤ O(MA(f)²)), showing that the 1-inputs are mostly covered by
“small” transcript rectangles (of our deterministic protocol), hence many such rectangles are needed.
We proceed with the formal proof. Let Π be an R^pub_{α, α/32} protocol for f; recall that by and-
amplification we may assume β = α/32 rather than β = α/2 in the definition of SBP. Assuming
log(1/α) < Corr(f)/2, we show that Π uses Ω(Corr(f)) bits of communication. To this end, fix a
balanced distribution µ such that for all 1-biased rectangles R, µ(R) ≤ 2^{−Corr(f)}.
Identify the possible outcomes of public randomness with {1, . . . , m}, and let Π_i denote Π running
with public randomness i. Let p_i be the probability the public randomness is i (so p_i = 1/m if the public
randomness is uniformly distributed). Let q_i be the probability over µ that Π_i accepts, conditioned on the
input being a 1-input. Let r_i be the same but conditioned on a 0-input. Now

∑_i p_i q_i = Pr_{i,(x,y)∼µ}[ Π_i(x, y) accepts | f(x, y) = 1 ] ≥ α,    (3.1)
∑_i p_i r_i = Pr_{i,(x,y)∼µ}[ Π_i(x, y) accepts | f(x, y) = 0 ] ≤ α/32.  (3.2)
Claim 3.1. There exists an i* such that q_{i*} ≥ α/2 and r_{i*} ≤ q_{i*}/16.
Proof of Claim. Suppose for contradiction that for all i either q_i < α/2 or r_i > q_i/16. Let S ⊆ {1, . . . , m}
be such that for all i ∈ S, q_i < α/2, and for all i ∉ S, r_i > q_i/16. Then

∑_i p_i r_i ≥ ∑_{i∉S} p_i r_i ≥ ∑_{i∉S} p_i q_i/16 = (1/16)( ∑_i p_i q_i − ∑_{i∈S} p_i q_i ) ≥ (1/16)( α − ∑_{i∈S} p_i · α/2 ) ≥ α/32.

Furthermore, at least one of the inequalities must be strict, contradicting (3.2).
Fix an i* guaranteed by Claim 3.1. Using i* as the public randomness in Π, we can now apply the
usual corruption argument. Consider the 1-rectangles that correspond to accepting transcripts of Π_{i*}. Call
a 1-rectangle R large if µ(R) > 2^{−Corr(f)} and small otherwise. Recall that by our assumption on µ, no
large 1-rectangle is 1-biased: for every large 1-rectangle R we have µ(R ∩ f^{−1}(0)) > µ(R)/8. Under µ,
the total measure of large 1-rectangles is at most half the total measure of all 1-rectangles, since otherwise

r_{i*} = 2 Pr_{(x,y)∼µ}[ Π_{i*}(x, y) accepts and f(x, y) = 0 ]
       = 2 ∑_{1-rectangles R} µ(R ∩ f^{−1}(0))
       ≥ 2 ∑_{large 1-rectangles R} µ(R ∩ f^{−1}(0))
       > (1/4) ∑_{large 1-rectangles R} µ(R)
       > (1/4) · (1/2) ∑_{1-rectangles R} µ(R)
       ≥ (1/8) ∑_{1-rectangles R} µ(R ∩ f^{−1}(1))
       = (1/8) Pr_{(x,y)∼µ}[ Π_{i*}(x, y) accepts and f(x, y) = 1 ]
       = (1/8) · q_{i*}/2
       = q_{i*}/16.
Therefore

∑_{small 1-rectangles R} µ(R) ≥ (1/2) ∑_{1-rectangles R} µ(R) ≥ (1/2) · q_{i*}/2 ≥ α/8 > 2^{−Corr(f)/2−3}.

Thus there are at least 2^{−Corr(f)/2−3}/2^{−Corr(f)} = 2^{Corr(f)/2−3} small 1-rectangles, which implies that Π
uses at least Corr(f)/2 − 3 bits of communication.
3.2 SBP is upper bounded by corruption
Here we show the upper bound SBP(f) ≤ O(Corr(f)). The intuition is as follows. If the corruption
bound is small, that means for every balanced distribution over inputs there exists a rectangle (which can
be viewed as a 2-bit protocol) that exhibits average-case SBP-like behavior—accepting a random 1-input
with not-too-small probability, and accepting a random 0-input with constant-factor-smaller probability.
We use the minimax theorem to convert this property into a distribution over rectangles, with a worst-case
SBP guarantee. Several technical issues arise with this argument. One is the asymmetry between 1-inputs
and 0-inputs, but this can be massaged away using a linear transformation of probabilities before invoking
minimax. Another is that the corruption bound can yield an average-case SBP rectangle with a different
“α” for different balanced distributions, whereas the minimax application requires a single α to work
uniformly for all balanced distributions. This issue is fixed by passing to an appropriate subrectangle to
decrease the α if necessary, for any given balanced distribution.
We proceed with the formal proof. For notational convenience we let 0 and 1 stand for the events
f^{−1}(0) and f^{−1}(1), respectively. For example,

µ(0 | R) = µ(R ∩ f^{−1}(0))/µ(R) and µ(R | 0) = µ(R ∩ f^{−1}(0))/µ(f^{−1}(0)).

Define α := 2^{−Corr(f)}.
Claim 3.2. For every balanced µ there exists a rectangle R with µ(R | 1) ≥ α and µ(R | 0) ≤ α/2.
Proof of Claim. Fix a balanced distribution µ. By definition of corruption, there exists a rectangle S such
that µ(S) ≥ α and µ(0 | S) ≤ 1/8. Decompose S as the disjoint union S_1 ∪ S_2 ∪ · · · ∪ S_m where the S_i's
are the individual rows of S, sorted in nondecreasing order of µ(0 | S_i). Let S_{≤i} := S_1 ∪ S_2 ∪ · · · ∪ S_i. For
every i we know that µ(0 | S_{≤i}) ≤ µ(0 | S) ≤ 1/8. If there exists an i such that α ≤ µ(S_{≤i} | 1) ≤ 2α then
R := S_{≤i} witnesses the claim since

µ(S_{≤i} | 0) = [ µ(0 | S_{≤i}) · µ(S_{≤i} | 1) · µ(1) ] / [ µ(0) · µ(1 | S_{≤i}) ] ≤ [ (1/8) · 2α · (1/2) ] / [ (1/2) · (7/8) ] = 2α/7 ≤ α/2.

Otherwise, since µ(S_{≤m} | 1) = µ(S | 1) = µ(1 | S) · µ(S)/µ(1) ≥ (7/8) · α/(1/2) > α and µ(S_{≤0} | 1) =
0 < α, there must exist an i such that µ(S_{≤i} | 1) > 2α and µ(S_{≤i−1} | 1) < α and thus µ(S_i | 1) > 2α − α =
α. In this case, the rectangle R := S_i ∩ 1 witnesses the claim since µ(S_i ∩ 1 | 1) = µ(S_i | 1) > α and
µ(S_i ∩ 1 | 0) = 0 ≤ α/2.
Let M be the matrix with rows indexed by inputs (x, y) ∈ {0,1}^n × {0,1}^n and columns indexed by
rectangles R ⊆ {0,1}^n × {0,1}^n such that

M_{(x,y),R} := 1              if f(x, y) = 1 and (x, y) ∈ R,
               0              if f(x, y) = 1 and (x, y) ∉ R,
               0              if f(x, y) = 0 and (x, y) ∈ R,
               α/(1 − α/2)    if f(x, y) = 0 and (x, y) ∉ R.
We claim that for every distribution µ over inputs, there exists a rectangle R such that E(M_{µ,R}) ≥ α
(where E denotes expectation). If µ(0) = 0 then take R := {0,1}^n × {0,1}^n, and if µ(1) = 0 then take
R := ∅. Otherwise, let µ′ be the balanced version of µ and invoke Claim 3.2 to find an R such that
µ′(R | 1) ≥ α and µ′(R | 0) ≤ α/2. Then, writing R̄ for the complement of R, we have

E(M_{µ,R}) = µ(R | 1) · µ(1) + [α/(1 − α/2)] · µ(R̄ | 0) · µ(0)
           = µ′(R | 1) · µ(1) + [α/(1 − α/2)] · µ′(R̄ | 0) · µ(0)
           ≥ α · µ(1) + [α/(1 − α/2)] · (1 − α/2) · µ(0)
           = α.
Now by the minimax theorem (we can use the most basic version since the matrix M is finite), there
exists a distribution D over rectangles such that for every input (x, y), E(M_{(x,y),D}) ≥ α. If f(x, y) = 1
this means the probability a random rectangle from D contains (x, y) is at least α. If f(x, y) = 0 this
means α/(1 − α/2) times the probability a random rectangle from D does not contain (x, y) is at least
α, in other words the probability a random rectangle from D contains (x, y) is at most α/2. Thus the
protocol that picks a random rectangle from D and accepts iff the input is in the rectangle shows that
R^pub_{α, α/2}(f) ≤ 2 and hence SBP(f) ≤ 2 + log(1/α) = 2 + Corr(f).
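The final protocol is simple enough to spell out; a minimal sketch in Python (our own illustration, with D given as an explicit weighted list of rectangles A × B):

    import random

    def rectangle_protocol(x, y, D):
        # D is a list of (A, B, weight) triples: a distribution over rectangles.
        rects = [(A, B) for A, B, w in D]
        weights = [w for A, B, w in D]
        A, B = random.choices(rects, weights=weights)[0]  # public randomness
        alice_bit = x in A   # 1 bit: is Alice's input in the row set?
        bob_bit = y in B     # 1 bit: is Bob's input in the column set?
        return alice_bit and bob_bit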
4 USBP lower bound
In this section we prove Theorem 1.3, which states that USBP(DISJ) = Θ(n). We first give an informal
overview.
Our proof uses the by-now standard information complexity approach [4, 14]. In this approach, one
considers some suitably distributed random input (X,Y ) and measures the amount of information that
the protocol transcript Π(X,Y ) (i. e., the concatenation of all messages sent) “leaks” about the input
as quantified by the mutual information I(Π(X,Y ) ; X,Y ). Lower bounding the mutual information has
the side effect of lower bounding the entropy H(Π(X,Y )) of the transcript, which in turn lower bounds
the length of the transcript and thereby the communication complexity. It is often useful to involve the
conditional versions of these information measures, defined by
H(Π | Z) := Ez∼Z H(Π | Z = z) and I(Π ; X,Y | Z) := Ez∼Z I(Π ; X,Y | Z = z)
where Z is some random variable (jointly distributed with X and Y ). We refer the reader to [16] for
discussions of these basic information theory concepts.
A key benefit of studying mutual information is that one automatically obtains for it a direct sum
property (as in [14, 4]), as long as the coordinates (X_i, Y_i), i ∈ [n], are mutually independent. This way,
the task of proving an Ω(n) lower bound for the original problem reduces to the task of proving an
Ω(1) information lower bound for some constant-size “gadget.” For the Set-Disjointness function
DISJ := AND_n ∘ NAND^n this gadget is typically NAND.
Our proof follows this outline. The reduction to the single-gadget case will be packaged into
Lemma 4.1 and is standard. By contrast, in proving the Ω(1) information lower bound for the single
gadget, we need to overcome the following two new technical issues.
(1) Small acceptance probabilities. Since the protocol is only required to succeed with a tiny prob-
ability α(n) on 1-inputs, the transcript of Π may be useless most of the time: Imagine a protocol that
rejects with probability 1 − α at the start (and otherwise does something useful). The entropy of the
transcript of such protocols can be as low as O(α).
To address this issue, we do not work with the transcript distribution of Π directly, but rather with
the conditional distribution given that the protocol accepts. That is, for 1-inputs (x, y), we consider the
random variable
T(x, y) := ( Π(x, y) | Π(x, y) is an accepting transcript )
and proceed to lower bound I(T (X,Y ) ; X,Y ) instead. One subtlety is that conditioning on acceptance
does not “commute” with the reduction to the single-gadget case. We must consider the distribution of T
that arises from first conditioning on acceptance and then doing the reduction, which is generally not the
same distribution as if we did the reduction and then conditioned on acceptance. However, this is not a
significant technical obstacle.
(2) Large acceptance probabilities. The acceptance probability of a protocol Π can vary between α
and 1 when run on different 1-inputs. This, together with our conditioning idea above, introduces a new
problem: there are USBP protocols for NAND such that the associated T leaks no information about the
input!
Indeed, consider the protocol Π* for NAND given in Figure 1.

Protocol Π*. On input (x, y) ∈ {0,1}²:
1. If x = 0 Alice sends a “1.” If x = 1 Alice sends a “1” with probability α and rejects otherwise (by sending a “0”).
2. Suppose Alice sent a “1.” Then if y = 0 Bob accepts (by sending a “1”). If y = 1 then Bob accepts with probability α and rejects otherwise.

Figure 1: Protocol Π* for NAND. (In the paper's illustration, omitted here, each of the input blocks is further subdivided into rectangles according to the outcomes of the private coins, and the rectangles are labeled with the associated transcripts.)

This protocol accepts the 1-input (0, 0) with probability 1, the 1-inputs (0, 1) and (1, 0) with probability α, and the 0-input (1, 1) with probability α². Choosing α such that α² ≤ α/2 we obtain a USBP protocol for NAND where the associated conditioned-on-acceptance variable T* is constant (the protocol Π* has only one accepting transcript, namely “11”).
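A small sanity check of Π*'s acceptance probabilities (our own code, not from the paper):

    def pi_star_accept_prob(x, y, alpha):
        # Probability that protocol Pi* accepts on input (x, y).
        alice_sends_1 = 1.0 if x == 0 else alpha
        bob_accepts = 1.0 if y == 0 else alpha
        return alice_sends_1 * bob_accepts

    alpha = 0.25
    assert pi_star_accept_prob(0, 0, alpha) == 1.0       # 1-input
    assert pi_star_accept_prob(0, 1, alpha) == alpha     # 1-input
    assert pi_star_accept_prob(1, 0, alpha) == alpha     # 1-input
    assert pi_star_accept_prob(1, 1, alpha) == alpha**2  # 0-input; alpha^2 <= alpha/2 since alpha <= 1/2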
To avoid this problem, we use a more complicated gadget than NAND; see Figure 2a. The new
gadget G contains two instances of NAND: in Figure 2b one instance of NAND corresponds to the pair
of edges AB and another one to AC. We show that the bad situation described above—i. e., the 1-input
(0, 0) of NAND having much higher acceptance probability than the other two 1-inputs (0, 1) and (1, 0)
of NAND—cannot happen simultaneously for both instances. One subtlety (arising from the conditioning
not “commuting” with the reduction, as described above) is that the bad situation actually depends on
the transcript, with some transcripts being bad for AB and some being bad for AC, but none being bad
for both. We prove an information lower bound (conditioned on acceptance) for whichever instance of
NAND behaves better for “most” transcripts.
We note that a similar technical issue arose in the proof of Braverman and Moitra [8] when analyzing
the case α = 1/2 + ε, β = 1/2 − ε. Their solution involved applying a certain type of random-self-
reduction (they called it smoothing) to the inputs before invoking the protocol. This approach is highly
tailored to their setting and does not seem to be directly helpful to us. Nevertheless, our gadget G was
inspired by their analysis.
        y=1  y=2
x=0:     1    1
x=1:     0    1
x=2:     1    0

Figure 2: (a) Truth table of the gadget G. (b) Distributions Q_A, Q_B, and Q_C are uniform over the endpoints
of the edges A, B, and C, respectively. (The edge diagram from the paper is omitted here; the edges are
A = {(0,1), (0,2)}, B = {(0,1), (2,1)}, and C = {(0,2), (1,2)}.)
4.1 Proof of Theorem 1.3
Define the gadget G : {0,1,2} × {1,2} → {0,1} as the indicator for non-equality; see Figure 2a. Define
the function F : {0,1,2}^n × {1,2}^n → {0,1} by F := AND_n ∘ G^n, i. e., F(x, y) = 1 iff G(x_i, y_i) = 1 for all
i ∈ [n]. Since G reduces to DISJ on 2 bits by the map (0 ↦ 00, 1 ↦ 01, 2 ↦ 10), we find that F reduces
to DISJ on 2n bits. Hence it suffices to prove that USBP(F) ≥ Ω(n).
Input distribution. Define Q_A, Q_B, Q_C to be the following three distributions over {0,1,2} × {1,2}: Q_A
is uniform over {(0,1), (0,2)}, Q_B is uniform over {(0,1), (2,1)}, and Q_C is uniform over {(0,2), (1,2)};
see Figure 2b. For i ∈ [n], u ∈ {0,1,2}, v ∈ {1,2}, and z a length-(n − 1) string over the alphabet
{A, B, C} indexed by [n]\{i}, define D_{i,u,v,z} to be the distribution over pairs (x, y) ∈ {0,1,2}^n × {1,2}^n
obtained by setting x_i = u, y_i = v, and for each j ≠ i (independently) sampling (x_j, y_j) from Q_{z_j}. Note
that support(D_{i,u,v,z}) ⊆ F^{−1}(G(u, v)) and that x and y are independent when sampled from D_{i,u,v,z}.
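A small sampler for these distributions (our own illustrative Python):

    import random

    Q = {"A": [(0, 1), (0, 2)], "B": [(0, 1), (2, 1)], "C": [(0, 2), (1, 2)]}

    def sample_D(i, u, v, z, n):
        # Sample (x, y) from D_{i,u,v,z}: coordinate i is fixed to (u, v); each
        # other coordinate j is drawn independently from Q_{z[j]}, where z maps
        # every j != i to a letter in {'A', 'B', 'C'}.
        x, y = [None] * n, [None] * n
        x[i], y[i] = u, v
        for j in range(n):
            if j != i:
                x[j], y[j] = random.choice(Q[z[j]])
        return x, y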
Reduction to the single-gadget case. Let Π be an R^priv_{α, α/4} protocol for F (recall that by and-ampli-
fication we may assume β = α/4 in the definition of USBP). Let Π(x, y) denote the transcript of Π
on input (x, y). Thus Π(x, y) is a random variable whose outcome depends on the private coins of the
protocol. For (x, y) ∈ F^{−1}(1) define T(x, y) as the random variable whose distribution is that of Π(x, y)
conditioned on Π(x, y) being an accepting transcript.

Suppose for contradiction that the transcripts have length less than n/2400. Using the direct sum
methodology we will next find a coordinate i (and a string z) such that the protocol leaks very little
information about the i-th input (conditioned on the data z). This is formalized in the following lemma
whose proof we defer to Section 4.2 as it is essentially identical to the corresponding argument in [4].
Below, ‖X − Y‖ denotes the statistical distance between the distributions of the random variables X and
Y.
Lemma 4.1. If γ > 0 is such that for all i and z either

‖T(D_{i,0,1,z}) − T(D_{i,0,2,z})‖ ≥ γ or ‖T(D_{i,0,1,z}) − T(D_{i,2,1,z})‖ ≥ γ or ‖T(D_{i,0,2,z}) − T(D_{i,1,2,z})‖ ≥ γ,

then some transcript has length at least nγ²/6.
Contrapositively, letting γ = 1/20, Lemma 4.1 implies that there exists an i and z (which we fix
henceforth) such that ‖T(D_{i,0,1,z}) − T(D_{i,0,2,z})‖, ‖T(D_{i,0,1,z}) − T(D_{i,2,1,z})‖, and ‖T(D_{i,0,2,z}) − T(D_{i,1,2,z})‖
are all less than γ.
The single-gadget case. Let (u, v) be an input to G and let τ be a transcript. We define

π_{uv}(τ) := Pr[ Π(D_{i,u,v,z}) = τ ]  for any (u, v),
t_{uv}(τ) := Pr[ T(D_{i,u,v,z}) = τ ]  for (u, v) ∈ G^{−1}(1).

Note that π_{uv}(τ)/t_{uv}(τ) is generally not (as might appear at a glance) the acceptance probability of Π on
a random input from D_{i,u,v,z} (this is related to the conditioning on acceptance not “commuting” with the
reduction to the single-gadget case, as mentioned in the intuition).
We henceforth adopt the convention that 0/0 := 0. Let

S := { accepting τ : π_{01}(τ)/t_{01}(τ) ≤ π_{02}(τ)/t_{02}(τ) }

and let S̄ := {accepting τ} \ S. Connecting to the intuition, S consists of transcripts that are bad for AC,
and S̄ consists of transcripts that are bad for AB. Since ‖T(D_{i,0,1,z}) − T(D_{i,0,2,z})‖ < γ, we have either

Pr[ T(D_{i,0,1,z}) ∈ S ] ≥ (1 − γ)/2 or Pr[ T(D_{i,0,2,z}) ∈ S̄ ] ≥ (1 − γ)/2.

Henceforth assume the former case; a completely analogous argument handles the latter case. We now
only need to consider (u, v) ∈ {(0, 1), (2, 1), (0, 2), (2, 2)}, forming the AB instance of NAND in G. (In
the latter case, we would only need to consider (u, v) ∈ {(0, 1), (1, 1), (0, 2), (1, 2)}, forming the AC
instance of NAND in G.)
Note that π_{22}(τ) · π_{01}(τ) = π_{21}(τ) · π_{02}(τ) by the basic rectangular structure of τ. Also note that if
G(u, v) = 1 and τ is accepting, then the following both hold.

− We have π_{uv}(τ) = 0 iff t_{uv}(τ) = 0, and hence π_{uv}(τ) = (π_{uv}(τ)/t_{uv}(τ)) · t_{uv}(τ).

− Assuming π_{uv}(τ) and t_{uv}(τ) are nonzero, we have π_{uv}(τ)/t_{uv}(τ) ≥ α. This is because

π_{uv}(τ) = E_{(x,y)∼D_{i,u,v,z}}[ Pr[Π(x, y) accepts] · Pr[ Π(x, y) = τ | Π(x, y) accepts ] ]
         ≥ α · E_{(x,y)∼D_{i,u,v,z}}[ Pr[ Π(x, y) = τ | Π(x, y) accepts ] ]
         = α · t_{uv}(τ)

since support(D_{i,u,v,z}) ⊆ F^{−1}(1) and Π is correct.
For (u, v) ∈ {(2, 1), (0, 2)} define γ_{uv}(τ) := |t_{uv}(τ) − t_{01}(τ)| and note that

∑_{accepting τ} γ_{uv}(τ) = 2‖T(D_{i,u,v,z}) − T(D_{i,0,1,z})‖ < 2γ.    (4.1)
Claim 4.2. For all accepting τ,

t_{21}(τ) · t_{02}(τ) / t_{01}(τ) ≥ t_{01}(τ) − γ_{21}(τ) − γ_{02}(τ).    (4.2)

Proof of Claim. It suffices to show that

t_{21}(τ) · t_{02}(τ) ≥ t_{01}(τ)² − t_{01}(τ)( γ_{21}(τ) + γ_{02}(τ) ).    (4.3)

(If t_{01}(τ) ≠ 0 then (4.2) follows by dividing (4.3) by t_{01}(τ), and if t_{01}(τ) = 0 then (4.2) follows since
its right side is nonpositive and its left side is 0; recall our convention that 0/0 := 0.) For some signs
σ_{uv}(τ) ∈ {1, −1}, the left side of (4.3) equals

( t_{01}(τ) + σ_{21}(τ)γ_{21}(τ) ) · ( t_{01}(τ) + σ_{02}(τ)γ_{02}(τ) ),

which expands to

t_{01}(τ)² + σ_{21}(τ)t_{01}(τ)γ_{21}(τ) + σ_{02}(τ)t_{01}(τ)γ_{02}(τ) + σ_{21}(τ)σ_{02}(τ)γ_{21}(τ)γ_{02}(τ).    (4.4)

If σ_{21}(τ) = σ_{02}(τ) then (4.4) is at least the right side of (4.3) since the last term of (4.4) is nonnegative.
If σ_{21}(τ) ≠ σ_{02}(τ), say σ_{21}(τ) = −1 and σ_{02}(τ) = 1, then (4.4) is at least the right side of (4.3) since the
sum of the last two terms in (4.4) is t_{01}(τ)γ_{02}(τ) − γ_{21}(τ)γ_{02}(τ) = t_{21}(τ)γ_{02}(τ) ≥ 0.
We have

Pr[ Π(D_{i,2,2,z}) accepts ] = ∑_{accepting τ} π_{22}(τ)
  ≥ ∑_{τ∈S} π_{22}(τ)
  ≥ ∑_{τ∈S} π_{21}(τ) · π_{02}(τ) / π_{01}(τ)
  = ∑_{τ∈S} [ (π_{21}(τ)/t_{21}(τ)) · (π_{02}(τ)/t_{02}(τ)) / (π_{01}(τ)/t_{01}(τ)) ] · [ t_{21}(τ) · t_{02}(τ) / t_{01}(τ) ]
  ≥ ∑_{τ∈S} α · ( t_{01}(τ) − γ_{21}(τ) − γ_{02}(τ) )
  > α · ( (1 − γ)/2 − 2γ − 2γ )
  > α/4.

To see that the fifth line follows from the fourth, consider each τ ∈ S: If t_{21}(τ) ≠ 0 and t_{02}(τ) ≠ 0 then it
follows by

π_{21}(τ)/t_{21}(τ) ≥ α and (π_{02}(τ)/t_{02}(τ)) / (π_{01}(τ)/t_{01}(τ)) ≥ 1
(since τ ∈ S) and Claim 4.2. On the other hand, if t_{21}(τ) = 0 or t_{02}(τ) = 0, say, t_{21}(τ) = 0, then it follows
since the summand on the fourth line is 0, and t_{01}(τ) − γ_{21}(τ) = 0 so the summand on the fifth line is
nonpositive. The sixth line follows from the fifth by

∑_{τ∈S} t_{01}(τ) = Pr[ T(D_{i,0,1,z}) ∈ S ] ≥ (1 − γ)/2 and ∑_{τ∈S} γ_{21}(τ) ≤ ∑_{accepting τ} γ_{21}(τ)

and (4.1), and similarly for γ_{02}.
We conclude that Pr[Π(x, y) accepts] > α/4 for some (x, y) ∈ support(D_{i,2,2,z}) ⊆ F^{−1}(0), contradict-
ing the correctness of Π. This finishes the proof of Theorem 1.3.
4.2 Proof of Lemma 4.1
Define jointly distributed random variables X = X_1 · · · X_n ∈ {0,1,2}^n, Y = Y_1 · · · Y_n ∈ {1,2}^n, and Z =
Z_1 · · · Z_n ∈ {A, B, C}^n as follows: Z is uniform, and given a particular choice of Z, for each i ∈ [n]
(independently) (X_i, Y_i) is sampled from Q_{Z_i}. Thus the marginal distribution of (X, Y) is that for each
i (independently), (X_i, Y_i) has probability 1/3 for each of (0, 1), (0, 2), and probability 1/6 for each
of (2, 1), (1, 2). Since the support of (X, Y) is in F^{−1}(1), we may also view T as a random variable
distributed jointly with (X, Y, Z). Let Z_{−i} denote Z_1 · · · Z_{i−1} Z_{i+1} · · · Z_n.
For any i ∈ [n], z_i ∈ {A, B, C}, and z_{−i} ∈ {A, B, C}^{n−1} indexed by [n]\{i}, we can view

( T(D_{i,Q_{z_i},z_{−i}}), Q_{z_i} )

as a pair of jointly distributed random variables that is distributed identically to

( T, (X_i, Y_i) ) | Z_i = z_i, Z_{−i} = z_{−i}.

For all i and z_{−i}, by a standard lemma (see [4, Lemma 6.2 and Proposition 6.10]) we have

I( T(D_{i,Q_A,z_{−i}}) ; Q_A ) ≥ ‖T(D_{i,0,1,z_{−i}}) − T(D_{i,0,2,z_{−i}})‖²/2
and similarly for B and C. Therefore

Maximum length of transcript ≥ H(T | Z)
  ≥ I(T ; X,Y | Z)
  ≥ ∑_i I(T ; X_i,Y_i | Z)
  = ∑_i E_{z_{−i}} (1/3) ∑_{z_i} I( T ; X_i,Y_i | Z_i = z_i, Z_{−i} = z_{−i} )
  = ∑_i E_{z_{−i}} (1/3) ∑_{z_i} I( T(D_{i,Q_{z_i},z_{−i}}) ; Q_{z_i} )
  ≥ ∑_i E_{z_{−i}} (1/3)·(1/2)·( ‖T(D_{i,0,1,z_{−i}}) − T(D_{i,0,2,z_{−i}})‖² + ‖T(D_{i,0,1,z_{−i}}) − T(D_{i,2,1,z_{−i}})‖² + ‖T(D_{i,0,2,z_{−i}}) − T(D_{i,1,2,z_{−i}})‖² )
  ≥ ∑_i E_{z_{−i}} (γ²/6)
  = nγ²/6,

where the third line follows by a standard direct sum property for conditional information cost [4].
5 Extended formulations for Maximum-Clique
Recall that the nonnegative rank of a nonnegative ℓ × m matrix M, denoted rank_+(M), is the least r
such that M = UV for some nonnegative matrices U and V of dimensions ℓ × r and r × m, respectively.
We say M is an ε-UDISJ matrix (where ε is to be thought of as a function of n, and UDISJ stands for
Unique-Set-Disjointness) iff it is 2^n × 2^n with rows and columns indexed by subsets x, y ⊆ [n], each
entry with |x ∩ y| = 0 is 1, each entry with |x ∩ y| = 1 is 1 − ε, and all other entries are nonnegative. The
authors of [6] proved the following two things.

(i) For every ε-UDISJ matrix M, rank_+(M) ≥ 2^{Ω(ε²n)}.

(ii) For some ε-UDISJ matrix M, rank_+(M) lower bounds the complexity of 1/ε-approximate extended
formulations for a certain natural linear programming encoding of the maximum-clique problem
for graphs.
The lower bound in (i) was improved to 2^{Ω(εn)} in [8], and a simpler alternative proof of this stronger
bound was given in [7]. Combining such nonnegative rank lower bounds with (ii) yields lower bounds on
the approximate extended formulation complexity for maximum-clique. It is not known how to exploit
the additional structure of the matrix M from the proof of (ii) when lower bounding the nonnegative rank.
The paper [6] proved (i) by developing a generalization of the corruption lemma from [35]. Both of the
subsequent papers [8, 7] employed information-theoretic methods. We now show that an idea analogous
to our and-amplification-based communication lower bound proof (Theorem 1.4 and Theorem 1.5) can
be used to “bootstrap” (i) to get the stronger 2^{Ω(εn)} lower bound, thus bypassing the information-theoretic
methods of [8, 7].
Given a nonnegative matrix M, let M^k denote the entrywise k-th power of M. It is an elementary fact
that rank_+(M^k) ≤ rank_+(M)^k. If M is an ε-UDISJ matrix, then choosing k = Θ(1/ε), we find that M^k is
a c-UDISJ matrix for some c ≥ 1/2. Then by (i) we have rank_+(M^k) ≥ 2^{Ω(n)} and thus

rank_+(M) ≥ rank_+(M^k)^{1/k} ≥ (2^{Ω(n)})^{1/k} ≥ 2^{Ω(εn)}.
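The elementary fact has an explicit witness: if M = UV then M^k factors through k-fold “tensored” versions of U and V, whose columns (respectively rows) are entrywise products of k-tuples of columns of U (rows of V). A small numpy check of this (our own illustration, with k = 2; the function name is ours):

    import numpy as np

    def entrywise_power_factors(U, V, k):
        # Given nonnegative U (l x r) and V (r x m) with M = U @ V, return
        # nonnegative factors of inner dimension r**k whose product equals
        # the entrywise k-th power M**k.
        r = U.shape[1]
        tuples = np.indices((r,) * k).reshape(k, -1)   # all r**k index tuples
        U_new = np.prod(U[:, tuples], axis=1)          # shape (l, r**k)
        V_new = np.prod(V[tuples, :], axis=0)          # shape (r**k, m)
        return U_new, V_new

    rng = np.random.default_rng(0)
    U, V = rng.random((5, 3)), rng.random((3, 4))      # nonnegative factors
    U2, V2 = entrywise_power_factors(U, V, k=2)
    assert np.allclose((U @ V) ** 2, U2 @ V2)          # witnesses rank+(M^2) <= rank+(M)^2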
6 Subsequent developments
It has been proven in [21] that SBP(f) ≤ O(USBP(f) + log n) for all f (even partial f) using a new
and elementary “Truncation Lemma” which enables rectangle-based techniques to be applied to low
nonnegative rank matrices in certain situations. Combining this result with Corollary 1.2 yields an
alternative proof of Theorem 1.3 and shows the class equality SBP = USBP.
Furthermore, this result implies that the tight extended formulation lower bound for maximum-clique
(discussed in Section 1.5 and Section 5) can be derived from Razborov’s corruption lemma [35] in a
black-box way, without needing any of the machinery in [6, 8, 7]: Specifically, any ε-UDISJ matrix
having nonnegative rank r can be interpreted as witnessing

R^priv_{α, (1−ε)α}(UDISJ) ≤ log r + O(1)

(for some α > 0). This protocol can be and-amplified to witness USBP(UDISJ) ≤ O((log r)/ε), and
thus SBP(UDISJ) ≤ O((log r)/ε + log n). Finally, Corollary 1.2, which is an elementary consequence of
Razborov's corruption lemma, implies that log r ≥ Ω(εn).
It has also been proven in [21] that MA ≠ SBP, solving another open problem posed in the original
version of this paper [22].
Acknowledgements
We thank Mark Braverman, Tom Gur, Raghu Meka, Toniann Pitassi, and anonymous reviewers for
comments and discussions.
References
[1] Scott Aaronson and Avi Wigderson: Algebrization: A new barrier in complexity theory. ACM Trans. Comput. Theory, 1(1):2:1–2:54, 2009. Preliminary version in STOC'08. [doi:10.1145/1490270.1490272]

[2] László Babai, Peter Frankl, and Janos Simon: Complexity classes in communication complexity theory. In Proc. 27th FOCS, pp. 337–347. IEEE Comp. Soc. Press, 1986. [doi:10.1109/SFCS.1986.15]

[3] László Babai and Shlomo Moran: Arthur–Merlin games: A randomized proof system, and a hierarchy of complexity classes. J. Comput. System Sci., 36(2):254–276, 1988. [doi:10.1016/0022-0000(88)90028-1]

[4] Ziv Bar-Yossef, T. S. Jayram, Ravi Kumar, and D. Sivakumar: An information statistics approach to data stream and communication complexity. J. Comput. System Sci., 68(4):702–732, 2004. Preliminary version in FOCS'02. [doi:10.1016/j.jcss.2003.11.006]

[5] Elmar Böhler, Christian Glaßer, and Daniel Meister: Error-bounded probabilistic computations between MA and AM. J. Comput. System Sci., 72(6):1043–1076, 2006. Preliminary version in MFCS'03. [doi:10.1016/j.jcss.2006.05.001]

[6] Gábor Braun, Samuel Fiorini, Sebastian Pokutta, and David Steurer: Approximation limits of linear programs (beyond hierarchies). Math. Oper. Res., 40(3):756–772, 2015. Preliminary version in FOCS'12. [doi:10.1287/moor.2014.0694, arXiv:1204.0957]

[7] Gábor Braun and Sebastian Pokutta: Common information and unique disjointness. Algorithmica, pp. 1–33, 2016. Preliminary versions in FOCS'13 and ECCC. [doi:10.1007/s00453-016-0132-0]
[8] Mark Braverman and Ankur Moitra: An information complexity approach to extended formulations. In Proc. 45th STOC, pp. 161–170. ACM Press, 2013. Preliminary version in ECCC. [doi:10.1145/2488608.2488629]

[9] Mark Braverman and Omri Weinstein: A discrepancy lower bound for information complexity. In Proc. 16th Internat. Workshop on Randomization and Computation (RANDOM'12), pp. 459–470. Springer, 2012. Preliminary version in ECCC. [doi:10.1007/978-3-642-32512-0_39, arXiv:1112.2000]

[10] Harry Buhrman, Nikolai Vereshchagin, and Ronald de Wolf: On computation and communication with small bias. In Proc. 22nd IEEE Conf. on Computational Complexity (CCC'07), pp. 24–32. IEEE Comp. Soc. Press, 2007. [doi:10.1109/CCC.2007.18]

[11] Amit Chakrabarti, Graham Cormode, Andrew McGregor, and Justin Thaler: Annotations in data streams. ACM Trans. Algorithms, 11(1):7, 2014. Preliminary version in ECCC. [doi:10.1145/2636924]

[12] Amit Chakrabarti, Graham Cormode, Andrew McGregor, Justin Thaler, and Suresh Venkatasubramanian: Verifiable stream computation and Arthur–Merlin communication. In Proc. 30th IEEE Conf. on Computational Complexity (CCC'15), pp. 217–243. Schloss Dagstuhl, 2015. Preliminary version in ECCC. [doi:10.4230/LIPIcs.CCC.2015.217]

[13] Amit Chakrabarti and Oded Regev: An optimal lower bound on the communication complexity of Gap-Hamming-Distance. SIAM J. Comput., 41(5):1299–1317, 2012. Preliminary version in STOC'11. [doi:10.1137/120861072, arXiv:1009.3460]

[14] Amit Chakrabarti, Yaoyun Shi, Anthony Wirth, and Andrew Yao: Informational complexity and the direct sum problem for simultaneous message complexity. In Proc. 42nd FOCS, pp. 270–278. IEEE Comp. Soc. Press, 2001. [doi:10.1109/SFCS.2001.959901]

[15] Arkadev Chattopadhyay and Toniann Pitassi: The story of set disjointness. ACM SIGACT News, 41(3):59–85, 2010. [doi:10.1145/1855118.1855133]

[16] Thomas Cover and Joy Thomas: Elements of Information Theory. Wiley, 2006.

[17] Scott Diehl: Lower bounds for swapping Arthur and Merlin. In Proc. 11th Internat. Workshop on Randomization and Computation (RANDOM'07), pp. 449–463. Springer, 2007. [doi:10.1007/978-3-540-74208-1_33]

[18] Jürgen Forster: A linear lower bound on the unbounded error probabilistic communication complexity. J. Comput. System Sci., 65(4):612–625, 2002. Preliminary version in CCC'01. [doi:10.1016/S0022-0000(02)00019-3]

[19] Anat Ganor, Gillat Kol, and Ran Raz: Exponential separation of information and communication for boolean functions. In Proc. 47th STOC, pp. 557–566. ACM Press, 2015. Preliminary version in ECCC. [doi:10.1145/2746539.2746572]
[20] Shafi Goldwasser and Michael Sipser: Private coins versus public coins in interactive proof systems. In Proc. 18th STOC, pp. 59–68. ACM Press, 1986. [doi:10.1145/12130.12137]

[21] Mika Göös, Shachar Lovett, Raghu Meka, Thomas Watson, and David Zuckerman: Rectangles are nonnegative juntas. In Proc. 47th STOC, pp. 257–266. ACM Press, 2015. Preliminary version in ECCC. [doi:10.1145/2746539.2746596]

[22] Mika Göös and Thomas Watson: Communication complexity of set-disjointness for all probabilities. In Proc. 18th Internat. Workshop on Randomization and Computation (RANDOM'14), pp. 721–736. Schloss Dagstuhl, 2014. Preliminary version in ECCC. [doi:10.4230/LIPIcs.APPROX-RANDOM.2014.721]

[23] Tom Gur and Ran Raz: Arthur–Merlin streaming complexity. Inform. and Comput., 243:145–165, 2015. Preliminary version in ICALP'13. [doi:10.1016/j.ic.2014.12.011, arXiv:1302.0418]

[24] Rahul Jain and Hartmut Klauck: The partition bound for classical communication complexity and query complexity. In Proc. 25th IEEE Conf. on Computational Complexity (CCC'10), pp. 247–258. IEEE Comp. Soc. Press, 2010. [doi:10.1109/CCC.2010.31, arXiv:0910.4266]

[25] Rahul Jain, Troy Lee, and Nisheeth Vishnoi: A quadratically tight partition bound for classical communication complexity and query complexity, 2014. [arXiv:1401.4512]

[26] Stasys Jukna: Boolean Function Complexity: Advances and Frontiers. Volume 27 of Algorithms and Combinatorics. Springer, 2012. [doi:10.1007/978-3-642-24508-4]

[27] Bala Kalyanasundaram and Georg Schnitger: The probabilistic communication complexity of set intersection. SIAM J. Discrete Math., 5(4):545–557, 1992. Preliminary version in SCT'87. [doi:10.1137/0405044]

[28] Iordanis Kerenidis, Sophie Laplante, Virginie Lerays, Jérémie Roland, and David Xiao: Lower bounds on information complexity via zero-communication protocols and applications. SIAM J. Comput., 44(5):1550–1572, 2015. Preliminary versions in FOCS'12 and ECCC. [doi:10.1137/130928273, arXiv:1204.1505]

[29] Hartmut Klauck: Rectangle size bounds and threshold covers in communication complexity. In Proc. 18th IEEE Conf. on Computational Complexity (CCC'03), pp. 118–134. IEEE Comp. Soc. Press, 2003. [doi:10.1109/CCC.2003.1214415, arXiv:cs/0208006]

[30] Hartmut Klauck: Lower bounds for quantum communication complexity. SIAM J. Comput., 37(1):20–46, 2007. Preliminary version in FOCS'01. [doi:10.1137/S0097539702405620, arXiv:quant-ph/0106160]

[31] Hartmut Klauck: On Arthur Merlin games in communication complexity. In Proc. 26th IEEE Conf. on Computational Complexity (CCC'11), pp. 189–199. IEEE Comp. Soc. Press, 2011. [doi:10.1109/CCC.2011.33, arXiv:1101.0523]
[32] Eyal Kushilevitz and Noam Nisan: Communication Complexity. Cambridge Univ. Press, 1997.

[33] Ilan Newman: Private vs. common random bits in communication complexity. Inform. Process. Lett., 39(2):67–71, 1991. [doi:10.1016/0020-0190(91)90157-D]

[34] Ramamohan Paturi and Janos Simon: Probabilistic communication complexity. J. Comput. System Sci., 33(1):106–123, 1986. Preliminary version in FOCS'84. [doi:10.1016/0022-0000(86)90046-2]

[35] Alexander A. Razborov: On the distributional complexity of disjointness. Theoret. Comput. Sci., 106(2):385–390, 1992. Preliminary version in ICALP'90. [doi:10.1016/0304-3975(92)90260-M]

[36] Alexander A. Sherstov: Halfspace matrices. Comput. Complexity, 17(2):149–178, 2008. Preliminary version in CCC'07. [doi:10.1007/s00037-008-0242-4]

[37] Alexander A. Sherstov: The pattern matrix method. SIAM J. Comput., 40(6):1969–2000, 2011. Preliminary version in STOC'08. [doi:10.1137/080733644, arXiv:0906.4291]

[38] Alexander A. Sherstov: The unbounded-error communication complexity of symmetric functions. Combinatorica, 31(5):583–614, 2011. Preliminary version in FOCS'08. [doi:10.1007/s00493-011-2580-0]

[39] Alexander A. Sherstov: The communication complexity of Gap Hamming Distance. Theory of Computing, 8(8):197–208, 2012. Preliminary version in ECCC. [doi:10.4086/toc.2012.v008a008]

[40] Thomas Vidick: A concentration inequality for the overlap of a vector on a large set, with application to the communication complexity of the Gap-Hamming-Distance problem. Chicago J. Theoret. Comput. Sci., 2012(1):1–12, 2012. [doi:10.4086/cjtcs.2012.001]

[41] Thomas Watson: The complexity of estimating min-entropy. Comput. Complexity, 25(1):153–175, 2016. Preliminary version in ECCC. [doi:10.1007/s00037-014-0091-2]
AUTHORS
Mika Göös
Graduate Student
University of Toronto, Toronto, ON
mika goos mail utoronto edu
http://www.cs.utoronto.ca/~mgoos/
Thomas Watson
Assistant Professor
University of Memphis, Memphis, TN
Thomas.Watson memphis edu
http://www.cs.toronto.edu/~thomasw/
ABOUT THE AUTHORS
Mika Göös is a Ph. D. student in the Theory Group at the Department of Computer Science
at the University of Toronto, advised by Toniann Pitassi. His research interests lie in
computational complexity with a particular fondness for distributed graph algorithms
and communication complexity.
Thomas Watson is an assistant professor of computer science at the University of Memphis.
Previously, he was a postdoc at the University of Toronto, supervised by Toniann Pitassi.
He received his Ph. D. in computer science from the University of California, Berkeley,
supervised by Luca Trevisan. He received his undergraduate degree in computer science,
math, and computer engineering from the University of Wisconsin-Madison, where his
interest in theoretical computer science research was sparked and fostered by Dieter van
Melkebeek. His research interests include most of complexity theory, but recently he has
been working on communication complexity.