Robust Lower Bounds for Communication and Stream Computation

Authors Amit Chakrabarti, Graham Cormode, Andrew McGregor,
Plaintext
                          T HEORY OF C OMPUTING, Volume 12 (10), 2016, pp. 1–35
                                       www.theoryofcomputing.org




          Robust Lower Bounds for
     Communication and Stream Computation
          Amit Chakrabarti∗                    Graham Cormode†                      Andrew McGregor‡
                    Received June 28, 2014; Revised May 27, 2016; Published August 22, 2016




       Abstract: We study the communication complexity of evaluating functions when the input
       data is randomly allocated (according to some known distribution) amongst two or more
       players, possibly with information overlap. This naturally extends previously studied variable
       partition models such as the best-case and worst-case partition models. We aim to understand
       whether the hardness of a communication problem holds for almost every allocation of the
       input, as opposed to holding for perhaps just a few atypical partitions.
           A key application is to the heavily studied data stream model. There is a strong connection
       between our communication lower bounds and lower bounds in the data stream model that are
       “robust” to the ordering of the data. That is, we prove lower bounds for when the order of the
       items in the stream is chosen not adversarially but rather uniformly (or near-uniformly) from
       the set of all permutations. This random-order data stream model has attracted recent interest,
       since lower bounds here give stronger evidence for the inherent hardness of streaming
       problems.
      A short version of this article is available in the Proceedings of the 40th Annual ACM Symposium on Theory of Computing
(STOC’08), ACM, pp. 641–650 [6]. Compared to the conference presentation, this version considerably expands the detail of
the discussion and in the proofs, and substantially changes some of the proof techniques.
    ∗ Supported in part by NSF Awards CCF-0448277 and IIS-0916565, and a McLane Family Fellowship.
    † Supported in part by European Research Council grant ERC-2014-CoG 647557, the Yahoo Faculty Research and Engage-

ment Program and a Royal Society Wolfson Research Merit Award.
    ‡ This work was supported by NSF Awards CCF-0953754, CCF-1320719, and a Google Research Award.



ACM Classification: F.2.2
AMS Classification: 68Q25
Key words and phrases: communication complexity, lower bounds, data streams


© 2016 Amit Chakrabarti, Graham Cormode, and Andrew McGregor
c b Licensed under a Creative Commons Attribution License (CC-BY)                             DOI: 10.4086/toc.2016.v012a010
                 A MIT C HAKRABARTI , G RAHAM C ORMODE , AND A NDREW M C G REGOR

          Our results include the first random-partition communication lower bounds for problems
      including multi-party set disjointness and gap-Hamming-distance. Both are tight. We also
      extend and improve previous results for a form of pointer jumping that is relevant to the
      problem of selection (in particular, median finding). Collectively, these results yield lower
      bounds for a variety of problems in the random-order data stream model, including estimating
      the number of distinct elements, approximating frequency moments, and quantile estimation.


1    Introduction
Since its introduction in 1979 by Yao, communication complexity [44, 32] has proven to be a powerful
framework for proving lower bounds in a variety of settings, including the cell-probe and data stream
models, circuit and decision tree complexity and VLSI design. The majority of results in this area involve
a fixed-partition model of communication complexity, where the goal is for two or more players to
evaluate a function of an input that has been partitioned between them in a particular way, e. g., computing
 f (x, y) when one player holds x and the other has y. Many explicit functions can be shown to require
a large amount of communication to evaluate when the input is partitioned between the players in this
manner. These imply lower bounds for various models of computation, via arguments that such partitions
necessarily arise in the course of the computation.
      To a lesser extent, variable-partition models, such as best-case and worst-case partition, have also
been studied; see, e. g., [2, 33, 36] and [32, Chap. 7] for a survey. For example, understanding the
best-case partition complexity, where the data is partitioned in the most advantageous manner (subject to
constraints such as each player receiving an equal amount of the input), is important for understanding
various problems in VLSI design [2]. Another kind of worst-case partition arises when the corresponding
bits of two equal-length input strings are written on opposite sides of opaque cards (the “two-sided card
model” [14, 37]). However, a natural question that, to the best of our knowledge, has not been explored
to date, is what happens when the input is partitioned amongst the players at random. In other words,
does evaluating a given function require significant communication for only a few pathological partitions
or does such a requirement apply to an overwhelming fraction of all partitions?
      In this paper we initiate a study of communication complexity under random partitions of the input.
In fact, we consider more general allocations of the input to the players, possibly allowing information
overlap, where bits of data may be known to more than one player. A particularly interesting case is when
each token of data is given to a player chosen uniformly at random; this provides a convenient way to
count “bad” partitions. We consider a communication lower bound to be robust if it applies to all but
a small fraction of possible partitions. One can think of our work as a form of average-case analysis.
However, it is important to note that our work stands in contrast to the usual notion of distributional
complexity: rather than considering a random input, we consider worst-case inputs allocated randomly
amongst the players.

Data stream computation. A strong motivation for our study is the goal of proving robust lower
bounds for problems in the data stream model. The data stream model has enjoyed significant attention
in recent years owing to some influential work in the late 1990s [3, 26, 16]. Study of this model has
thrived both because of the rich theoretical questions it raises and its applicability to numerous real world

                       T HEORY OF C OMPUTING, Volume 12 (10), 2016, pp. 1–35                               2
              ROBUST L OWER B OUNDS FOR C OMMUNICATION AND S TREAM C OMPUTATION

applications such as network monitoring and query planning in databases. Consequently, it is important
to understand the complexity of problems not just in worst-case but also in “average-case” settings. To
this end we prove lower bounds in the setting that the ordering of tokens in the data stream is chosen not
adversarially but randomly, from the set of all permutations. Arguably, such a lower bound provides a
stronger indication that a problem cannot be solved efficiently in the data stream model than a “fragile”
lower bound that might depend on a clever adversarial ordering. (For further, more detailed, justification
see the recent papers [23, 8]).
    Random-order data streams were considered by Munro and Paterson [35] in one of the first studies of
the data stream model. In recent years there has a been a resurgence of interest in this model for a variety
of reasons [8, 13, 23, 22, 24, 43]. Uniform or near-uniform orderings can arise in a number of ways,
such as when processing a stream of samples that are drawn independently from a non-time-varying
distribution. For problems such as quantile estimation and finding frequent items it has been shown that
there is a considerable difference between processing random-order stream and adversarial streams. In
particular, streaming algorithms to find the median using polylogarithmic space require exponentially
fewer passes if the stream is ordered randomly [23, 8].
    In this paper, we use robust lower bounds on communication complexity in order to deduce robust
data stream lower bounds. Once the communication bounds have been shown, the data stream bounds
follow by simple reductions to appropriate instances of communication. Where such bounds were known
before, our method yields cleaner proofs and tighter bounds. It also yields a number of new bounds for
random-order data streams.


Our results and overview. We begin in Section 2 with a formal definition of our model and introduce
some techniques and terminology. We prove the following results.

   • Multi-Party Set Disjointness: We consider the problem of t-way set disjointness where each entry
     of the relevant t × n matrix is given to one of p players chosen uniformly at random. If p = Ω(t 2 )
     then we show that any randomised protocol requires Ω(n/t) communication. See Section 3.
   • Pointer Jumping and Selection: We consider a natural variant of tree pointer jumping, called
     weight-based tree pointer jumping, that is related to the problem of selection. In this problem,
     instead of an explicit pointer at each node, we have a binary string at each node whose weight
     encodes the pointer. We consider trees of depth p + 1 and show that if the bits of these strings are
     distributed uniformly between two players, then, for every constant ε > 0, any p-round randomised
                                −p
     protocol requires Ω(n(2+ε) ) bits of communication. See Section 4 for details of this two-player
     result and a generalization to more than two players.
   • Hamming Distance and Index: For x, y ∈ {0, 1}n , let ∆(x, y) := |{i ∈ [n] : xi 6= yi }| denote the
     Hamming distance between x and y. We show that, for some constant c, any protocol that
                                                          √                          √
     can distinguish between the cases ∆(x, y) ≤ n/2 − c n and ∆(x, y) ≥ n/2 + c n requires Ω(n)
     communication if the 2n input bits are split uniformly between two players. We also show that a
     one-way protocol for the index problem—INDEX(x, j) := x j , with x ∈ {0, 1}n , j ∈ [n]—requires
     Ω(n) communication if the n + 1 tokens ( j being a single token) are split uniformly between two
     players. See Section 5.

                       T HEORY OF C OMPUTING, Volume 12 (10), 2016, pp. 1–35                              3
                  A MIT C HAKRABARTI , G RAHAM C ORMODE , AND A NDREW M C G REGOR

    The above communication lower bounds lead to lower bounds for a number of data stream problems in
the random-order model. In Section 6, we deduce such bounds, many of which are tight, for approximating
frequency moments, the number of distinct values, entropy, information divergences, selection, and graph
connectivity. Two of these bounds deserve particular emphasis. For the k-th frequency moment, we
obtain a robust lower bound of Ω(n1−3/k ) for k ≥ 3, where n is the universe size, which comes close to
the optimal Ω(n1−2/k ) bound under adversarial ordering. For the problem of finding the median of a
stream of length m, our framework greatly simplifies the proof of an Ω(log log m) lower bound [8] on the
number of passes required to achieve polylogarithmic space. Note that in a multi-pass algorithm, the data
is seen in the same order in each pass. Further, our pass-space tradeoff for this problem improves the
results of [8]. For instance, with two passes, we obtain a space lower bound of Ω(m1/10 ) as compared
with their Ω(m3/80 ).


2     Notation and preliminaries
We summarise some notation that we use throughout the paper. We use “log” and “ln” to denote base-
2 and natural logarithms, respectively. Define the weight |x| of a Boolean vector x ∈ {0, 1}N to be
|{i : xi = 1}|. Let ei denote the vector that is 1 at location i and 0 elsewhere. For random variables X and
Y , let E[X] denote the expectation and H(X) the entropy of X, H(X | Y ) the conditional entropy of X given
Y , and I(X : Y ) the mutual information between X and Y . We use some basic results from information
theory at certain points in this paper; the textbook by Cover and Thomas [12] is a good reference for all
such results. We write X ∼ µ to indicate that X is drawn from the probability distribution µ, and X ≡ Y
to indicate that X and Y have the same distribution. We denote the product of the distributions µ and ν by
µ ⊗ ν. We use the notation X ∈R S to denote that the random variable X is uniform over the set S.
     There are a large number of natural notions of “distance” between two probability distributions µ and
ν. In this paper, we use three of them: the total variation distance
                                                     1
                                         DTV (µ, ν) = kµ − νk1 ,
                                                     2
the Hellinger distance
                                                     1 √      √
                                        h(µ, ν) = √ k µ − νk2 ,
                                                      2
         √
where “ ·” denotes the pointwise positive square root, and the Kullback-Leibler divergence DKL (µkν),
which is also known as relative entropy. Unlike the first two of these “distances,” the third is not a metric.
    The Binomial distribution with parameters n (number of trials) and p (success probability) is denoted
B(n, p). For an integer k, Sk denotes the set of all k-subsets of S and 2S denotes the power set of S. We
say that a real quantity Q0 is an (ε, δ )-approximation for Q if Pr[|Q0 − Q| > εQ] ≤ δ . For a real value
x ∈ [0, 1] we let Hb (x) := −x log x − (1 − x) log(1 − x) denote the binary entropy function; for continuity
we define Hb (0) = Hb (1) = 0.

2.1   The communication model
Traditionally, a two-party communication problem (between Alice and Bob, say) is formalised as a
function, or partial function, on a domain of the form X ×Y , where the finite set X (resp. Y ) is the set

                         T HEORY OF C OMPUTING, Volume 12 (10), 2016, pp. 1–35                              4
               ROBUST L OWER B OUNDS FOR C OMMUNICATION AND S TREAM C OMPUTATION

of Alice’s (resp. Bob’s) possible inputs. For our purposes, it is helpful to think of the input domain
represented differently. We shall think of an input as an m-tuple of tokens, where the tokens are given
to the players according to a random allocation drawn from a known distribution. Thus, it will help to
represent the input domain as X1 × X2 × · · · × Xm , where Xi is the set of possible values for the i-th token.
Typically, each Xi will be either the set {0, 1} or the set [N] := {1, 2, . . . , N}, for some positive integer N.
An allocation amongst p players is then a function σ : [m] → 2[p] .
    A natural and interesting special case of an allocation is a split, where each token is given to exactly
one player selected at random (not necessarily uniformly) from amongst all players. It will be convenient
to think of splits as functions σ : [m] → [p]. A further special case is that of a uniform split, where each
token is equally likely to go to each of the players. We let U p denote the probability distribution of a
uniform split amongst p players.
Definition 2.1 (Communication Problems and Protocols). A random-allocation communication problem
for p players consists of a function f : X1 × · · · × Xm → Z and a probability distribution ν on allocations
σ : [m] → 2[p] . A traditional communication problem is a special case, where ν is supported on a single
allocation (that is typically a split). Protocols, unless explicitly qualified otherwise, are assumed to be
randomised, with the players having access to private as well as public coins. (For a formal definition
of a “protocol,” we refer the reader to a standard textbook, such as Kushilevitz and Nisan [32].) For a
random-allocation protocol P, let P(x, σ ) denote the (possibly random) transcript of P, and out(P, x, σ )
the output of P, on input x allocated according to σ . For a traditional protocol, where σ has only one
possible value, we drop σ from these notations.
Definition 2.2 (Error, Cost, Complexity). Let P be a protocol for a random-allocation communication
problem ( f , ν). We define the error
                                err(P, f , ν) := max Pr[out(P, x, σ ) 6= f (x)] ,
                                                    x

where the probability is taken over σ ∼ ν and the (public and private) coins used by the protocol. If µ is
a distribution on the inputs to f , we define the distributional error
                                  errµ (P, f , ν) := Pr[out(P, X, σ ) 6= f (X)] ,
where X ∼ µ and σ ∼ ν. Let cost(P) := max |P(x, σ )| denote the communication cost of P, where this
maximum is taken over x, σ , and the random coins of P. We define the δ -error communication complexity
of ( f , ν) to be
                              Rδ ( f , ν) := min{cost(P) : err(P, f , ν) ≤ δ }
and the δ -error µ-distributional complexity to be
                              Rµ,δ ( f , ν) := min{cost(P) : errµ (P, f , ν) ≤ δ } .

Let R→ and Rk denote the restrictions of these notions to one-way and k-round protocols, respectively
(the notion of a “round” will be made precise later, when we use it). For traditional communication
problems, where there is a deterministic and well-known input allocation, we drop ν from these notations.
    Informally, a communication lower bound is robust if it applies to Rδ ( f , ν) or Rµ,δ ( f , ν) for some
high-entropy distribution ν, such as the aforementioned U p .

                        T HEORY OF C OMPUTING, Volume 12 (10), 2016, pp. 1–35                                   5
                  A MIT C HAKRABARTI , G RAHAM C ORMODE , AND A NDREW M C G REGOR

2.2   Technique preliminaries
In this section we introduce some of the main techniques that we use to establish our results. These are
all based on considering random input in addition to random splits.
     The notion of information complexity has been used on many occasions in the study of communication
protocols [11, 4, 9, 29]. Loosely speaking, information complexity is used to establish a direct sum
result, which reduces the problem of lower bounding the complexity of a “compound” problem (here,
disjointness) to that of lower bounding the complexity of a simpler “base” problem (here, the AND
function). The direct sum result follows from a simulation argument, where we design a protocol for
the base problem that randomly pads its input to generate an artificial input for the compound problem
and then simulates a protocol for the compound problem. Here, for our robust lower bounds for set
disjointness, we need to consider information complexity in a setting that allows both public and private
coins. This is a subtle matter: we must condition on the public coin to have a meaningful notion of
information complexity. At the same time, we must be careful about how the public coin is used in the
simulation argument, ensuring that we do not introduce undesirable correlations in the random padding.
Definition 2.3 (Information cost and complexity). Let PR be a protocol that uses a public random string
R (in addition to any private random strings that players use) and let µ be a distribution on its inputs. We
define
                           icostµ (PR ) := I(X : PR (X) | R) , where X ∼ µ .
Let D be a random variable, possibly correlated with X, but independent from R and any private
randomness used in P. We define the D-conditional µ-information cost

                                  icostµ (PR | D) := I(X : PR (X) | D, R) .

For each information cost measure above, we define a corresponding information complexity measure in
the natural way, e. g., for a communication problem f ,

                               ICµ,δ ( f ) := inf {icostµ (P) : err(P, f ) ≤ δ } ,

where P ranges over protocols that are allowed both private and public coins.
    We also consider random inputs X ∼ µ in another setting. Some of our lower bounds will use a
reduction from a communication problem in the fixed-partition model to one where the allocation σ ∼ ν.
In these reductions, the players choose σ using public random bits, but then distributing the input tokens
according to σ would seem to necessitate communicating a large fraction of the data and this would render
the reduction useless. The solution is to use distributional lower bounds on fixed-partition problems. This
suggests that the players may “guess” data that they do not know. Unfortunately, the issue that arises is
that this guessing may be correlated to the distribution of σ . However, the following lemma connects us
back to the “usual” situation, when inputs and allocations are independent of each other, provided this
correlation is sufficiently weak.
Lemma 2.4. If a protocol P satisfies Pr(x,σ )∼ξ [out(P, x, σ ) 6= f (x)] ≤ δ , for some joint distribution ξ ,
then for all input distributions µ and allocation distributions ν,

                                   errµ (P, f , ν) ≤ δ + DTV (µ ⊗ ν, ξ ) .


                       T HEORY OF C OMPUTING, Volume 12 (10), 2016, pp. 1–35                                6
              ROBUST L OWER B OUNDS FOR C OMMUNICATION AND S TREAM C OMPUTATION

Proof. Simply observe that

       errµ (P, f , ν) = Pr [out(P, x, σ ) 6= f (x)] ≤     Pr [out(P, x, σ ) 6= f (x)] + DTV (µ ⊗ ν, ξ ) .
                        x∼µ,                             (x,σ )∼ξ
                        σ ∼ν


2.3   Preliminary lemmas
We collect together a few basic results that we appeal to at various points in the paper. The first result is a
sharp lower bound on the communication complexity of the INDEX problem. In this problem, Alice holds
a string x ∈ {0, 1}n and Bob holds j ∈ [n]. The goal is for Bob to learn x j . See, e. g., Ablayev [1] for a
proof of the following result.

Lemma 2.5. Let n > 0 be an integer and δ ∈ (0, 1/2) be a real number. Then we have

                                       R→
                                        δ ( INDEX ) ≥ (1 − Hb (δ ))n .

    The second result is a well-known calculation giving a pair of probability bounds about what is often
called the “birthday problem.”

Lemma 2.6. For t, p ∈ N, let α(t, p) denote the probability that t independent random variables, each
drawn uniformly from [p], do not take t distinct values. Then
                                                               t−1         
                            −t(t−1)/(2p)                                i           t(t − 1)
                      1−e                  ≤ α(t, p) = 1 − ∏         1−         ≤            .
                                                               i=1      p              2p

    The third result upper bounds the total variation distance between binomial distributions with similar
parameters. The proof of this lemma is presented in Appendix A.

Lemma 2.7. There exists a constant c1 > 0 such that for all q ∈ [1/2, 1), r ∈ (0, 1), and a, w ∈ N,
                                                                          r
                    w                                                          2
                   √ ≤ r =⇒ DTV (B(a, q), B(a − w, q)) ≤ c1 r ln ,
                      v                                                        r

where v = aq(1 − q) is the variance of B(a, q). In order to define the total variation distance above, we
treat the binomial distribution B(n, p) as a distribution on the set of all non-negative integers, rather
than just {0, 1, . . . , n}.


3     Multi-party set disjointness
Let DISJn,t : {0, 1}nt → {0, 1} denote the following problem. The input is an (nt)-tuple of bits denoted
{xi j }i∈[t], j∈[n] , to be thought of as the entries of a t × n Boolean matrix. The input satisfies a unique
intersection promise, namely, each column of the matrix has weight in {0, 1,t} and at most one column
has weight t. The desired output is nj=1 ti=1 xi j . Gronemeier [20] culminated a line of work [3, 4, 9] on
                                          W    V

this problem, showing that Rδ (DISJn,t ) = Ω(n/t) under a t-player split where each player receives one
row of the matrix.

                        T HEORY OF C OMPUTING, Volume 12 (10), 2016, pp. 1–35                                7
                   A MIT C HAKRABARTI , G RAHAM C ORMODE , AND A NDREW M C G REGOR

      Let ANDt : {0, 1}t → {0, 1} be shorthand for DISJ1,t . That is, the input is a t-tuple of bits x =
(x1 , . . . , xt ) that satisfies the promise |x| ∈ {0, 1,t}. The desired output is ti=1 xi . Let D ∈R [t] and
                                                                                    V

X ∈R {0, eD }. Denote the resulting joint distribution of (X, D) by λ and the marginal distribution of X
by µ. The lower bound of [20] follows by carefully analysing ICµ,δ (ANDt | D) and using the direct sum
techniques of Bar-Yossef et al. [4] to link this quantity with ICµ n ,δ (DISJn,t | Dn ).
      Here, we consider the random-allocation communication problem (DISJn,t , U p ) for some suitably
large number, p, of players. We now prove a robust lower bound on its complexity by extending the
earlier techniques.

Lemma 3.1. Let δ 0 = δ + α(t, p). Then

                                     Rδ (DISJn,t , U p ) ≥ n · ICµ,δ 0 (ANDt | D) .

Proof. Let PR be a minimum-cost δ -error protocol for (DISJn,t , U p ) that uses a public random string R,
possibly in addition to private randomness. Then cost(PR ) = Rδ (DISJn,t , U p ). Consider n independent
pairs of random variables (X1 , D1 ), . . . , (Xn , Dn ), each drawn from λ . Then X := X1 X2 . . . Xn ∼ µ n is a
suitable random input for DISJn,t . Let S ∼ U p be a random split. Then, by standard information theoretic
arguments, we have

                           cost(PR ) = max |PR (x, σ )| ≥ H(PR (X, S))
                                          x,σ

                                      ≥ I(X : PR (X, S) | D1 D2 . . . Dn , R, S)

                                      ≥ ∑ I(X j : PR (X, S) | D1 D2 . . . Dn , R, S)                            (3.1)
                                          j∈[n]

                                      = ∑ Ed [I(X j : PR (X, S) | D j , R, S, D− j = d)] ,
                                          j∈[n]

where (3.1) holds because the X j s are independent even after conditioning on D1 D2 . . . Dn , R, and S. Here,
D− j denotes the vector (D1 , . . . , D j−1 , D j+1 , . . . , Dn ) and the final expectation is over d drawn uniformly
from [t][n]\{ j} . To finish the proof, it suffices to show that

                       c j,d := I(X j : PR (X, S) | D j , R, S, D− j = d) ≥ ICµ,δ 0 (ANDt | D) ,

for each j ∈ [n] and each d ∈ [t][n]\{ j} . To this end, we shall design a certain δ 0 -error t-party traditional
protocol QR,S
            j,d for ANDt , parametrised by j and d, that uses (R, S) as a public random string. Further,
                                                             ρ,σ
for each possible value (ρ, σ ) of (R, S), the transcript Q j,d (X j ) will either be constant or be distributed
identically to (PR (X, σ ) | R = ρ, D− j = d), and the players will know which case they are in based on σ
alone. Then, as required, we shall have

               ICµ,δ 0 (ANDt | D) ≤ icostµ (QR,S                    R,S
                                             j,d | D j ) = I(X j : Q j,d (X j ) | D j , R, S) ≤ c j,d .

                 ρ,σ
The protocol Q j,d works as follows. On input x = (x1 , . . . , xt ) ∈ {0, 1}t , the t players create a random
virtual input {Zik }i,k ∈ {0, 1}t×n for DISJn,t , pretend that this input has been split according to σ amongst
p virtual players, and then, if possible, simulate the behaviour of these virtual players when they execute

                          T HEORY OF C OMPUTING, Volume 12 (10), 2016, pp. 1–35                                     8
                ROBUST L OWER B OUNDS FOR C OMMUNICATION AND S TREAM C OMPUTATION

Pρ on the virtual input. The virtual input is obtained by embedding x into the j-th column of a random
Boolean matrix drawn from (µ n |D− j = d). To wit:
                                             
                                             {xi } ,
                                                      if k = j ,
                                       Zik ∈R {0} ,    if k 6= j and d(k) 6= i ,
                                             
                                              {0, 1} , if k =6 j and d(k) = i .
                                             

     Therefore, the simulation is possible iff σ assigns each of the inputs (Z1 j , . . . , Zt j ) to a distinct virtual
player; we shall say that σ ramifies if this condition is met. If σ does not ramify, the protocol ends
immediately (note that all players know σ so this happens without any communication), leading to a
constant empty transcript and an error probability of 1. If σ does ramify, then Player i plays the role of
that virtual player who is assigned Zi j by σ . The crucial observation that makes this role-playing possible
is that all the other bits assigned to that virtual player are available to Player i, because they are either set
to 0 or can be drawn uniformly at random from {0, 1} using Player i’s private coin. All virtual players
who are not assigned any of the inputs {Zi j }i∈[t] are simulated by Player 1 (say). Thus, if σ ramifies, then

                                      Q j,d (X j ) ≡ (PR (X, σ ) | R = ρ, D− j = d) .
                                         ρ,σ



Finally, QR,S              0
          j,d is indeed a δ -error protocol, because


          err(QR,S                                                                              0
               j,d , ANDt ) ≤ Pr[σ does not ramify] + err(P , DISJ n,t , U p ) = α(t, p) + δ = δ .
                                                           R


Lemma 3.2. There exists a constant c > 0 such that, for all δ ∈ (0, 1/10) and t ≥ 2,

                                                  ICµ,δ (ANDt | D) ≥ c/t .

Proof. This result can almost be deduced from Gronemeier [20], except for the subtlety introduced by
public coins. Specifically, from the work of Gronemeier we can deduce that for a private coin traditional
protocol P such that err(P, ANDt ) ≤ 1/10, we have icostµ (P | D) = Ω(1/t).
    To complete the proof, we show that public coins cannot help reduce information complexity.1 Con-
sider a general δ -error protocol QS for ANDt that uses a public random string S (recall that Definition 2.1
allows such a protocol to use both public and private coins). Let P be a private coin protocol in which
Player 1 generates S privately and announces it to all players, following which the players simulate QS .
Clearly, err(P, ANDt ) = err(QS , ANDt ). The transcript of P on input X is precisely (S, QS (X)). Thus,

        icostµ (P | D) = I(X : S, QS (X) | D) = I(X : S | D) + I(X : QS (X) | D, S) = icostµ (QS | D) ,

where the second step uses the chain rule and the third step uses the independence of S from (X, D).
Combining this with the private-coin lower bound completes the proof.

    Putting together Lemmas 2.6, 3.1 and 3.2 yields the following theorem.
   1 This observation is folklore but we have included a proof for the sake of completeness. Note that this situation is in contrast

to standard communication complexity, where the public-coin complexity could be smaller than the private-coin complexity.


                            T HEORY OF C OMPUTING, Volume 12 (10), 2016, pp. 1–35                                                 9
                      A MIT C HAKRABARTI , G RAHAM C ORMODE , AND A NDREW M C G REGOR

Theorem 3.3. For all δ ∈ (0, 1/20), t = t(n) ≥ 2, and p ≥ 10t 2 , we have the robust lower bound

                                                Rδ (DISJn,t , U p ) = Ω(n/t) .

    We note that in order to get this kind of robust lower bound for DISJn,t under U p that increases linearly
with n, we must make p, the number of players, as large as Ω(t 2 ) for constant δ . This is because when
an input x such that DISJn,t (x) = 1 is allocated to p players, with probability α(t, p) there exists a player
that receives at least two tokens from the all-ones column. Therefore, a simple O(p)-communication
protocol, where each player announces whether or not they have received two 1s from the same column,
has error probability at most 1 − α(t, p). By Lemma 2.6, we now have Rδ (DISJn,t , U p ) = O(p) for
p ≤ t(t − 1)/(2 ln(1/δ )) = O(t 2 ).


4     Pointer jumping and selection
We now consider the tree pointer jumping problem TPJk,t , defined as follows. (In reading this section, it
will help to think of t as growing and k as fixed.)

Definition 4.1 (The tree pointer jumping function). Consider a complete k-level t-ary tree, T , rooted at
v0 . We use the convention that the leaves are at level 1 and the root at level k. The input is a function
φ : V (T ) → [t], with φ (v) ∈ {0, 1} if v is a leaf of T . We shall call such an input a “k-input” and shall
sometimes view it as a labelling of V (T ). Define g(v) to be the φ (v)-th child of v if v is an internal node,
and φ (v) if v is a leaf. The desired output is TPJk,t (φ ) := g(k) (v0 ) = g(g(· · · g(v0 ) · · · )) ∈ {0, 1}.

    There are at least two natural ways to make a traditional communication problem out of TPJk,t , both
of which are of interest to us. The first way is to have two players, Alice and Bob, with Alice (resp. Bob)
receiving the values of φ (v) for odd-level (resp. even-level) vertices v; recall that leaves are at level 1.
The second way is to have k players, with Player i receiving the values of φ (v) for vertices v on level
i. When speaking of communication problems, we shall use TPJk,t to denote the former, and M - TPJk,t
to denote the latter (“M” for “multi-player”). For k = 2, the two definitions coincide and we obtain the
well-studied INDEX problem, for which strong one-way lower bounds are known [1], with numerous
implications for stream computation. In particular, Guha and McGregor [23] use a reduction from INDEX
to obtain a tight (up to logarithmic factors) space lower bound for estimating the median of a randomly
ordered stream of numbers in one pass. This lower bound was subsequently extended to multiple passes
by Chakrabarti, Jayram and Pǎtraşcu [8] via a rather different (and intricate) proof.
    As a consequence of the robust communication lower bounds we prove in this section, we obtain
a considerably simpler and improved multi-pass streaming lower bound for median finding.2 The five
theorems in this section can be organised into two parallel chains of implications, each consisting of three
stages and culminating in a lower bound for the MEDIAN problem, as follows.

Stage 1: We prove a multi-round lower bound on the communication complexity of an appropriate
     “source problem,” which is either M - TPJk,t , as in Theorem 4.4 or TPJk,t , as in Theorem 4.9.
    2 Our results, like the earlier ones [23, 8], apply to the more general problem of selection.



                            T HEORY OF C OMPUTING, Volume 12 (10), 2016, pp. 1–35                           10
              ROBUST L OWER B OUNDS FOR C OMMUNICATION AND S TREAM C OMPUTATION

Stage 2: We reduce the source problem to an intermediate problem that we call weight-based tree
      pointer jumping, or W- TPJk,n , defined below. At this stage, we have a robust lower bound for
      W- TPJ k,n , under an allocation distribution that depends on the source problem we started with.
      These reductions appear as Theorems 4.8 and 4.10 below.

Stage 3: Finally, we reduce W- TPJk,n to the MEDIAN problem, as in Theorem 4.3, obtaining a robust
      lower bound for the latter. This reduction does not depend on the choice of the source problem.

   The precise notion of a “round” is crucial here, and is different for the two parallel chains of
implications. When using the two-player problem TPJk,t as the source, a round consists of a single
message, from either Alice or Bob. The player that does not know φ (v0 ) speaks first. When using the
multi-player problem M - TPJk,t as the source, a round consists of one message from each of the k players,
speaking in the fixed order Player 1, . . ., Player k (recall that Player 1 holds the labels of the leaf nodes).

Definition 4.2 (Cost and Complexity, Multi-Round). Fix one of the two notions of a “round,” as described
above. We define the notations Rkµ,δ ( f , ν), etc., as in Definition 2.2, with protocols restricted to k rounds.
The cost of a round is the maximum possible total number of bits communicated by the players who
speak in that round. The cost of a protocol is the maximum cost of a single round.

    The next three subsections are organised thus. We first present the Stage 3 reduction, then the Stage 1
and Stage 2 theorems for the implication chain that starts with M - TPJk,t , and then deal with the chain that
starts from TPJk,t . We choose to present the M - TPJ chain first, and in greater detail, because it ultimately
implies stronger lower bounds for data stream computation. Furthermore, the Stage 1 theorem in this
chain (Theorem 4.4) is a fundamental and interesting result in communication complexity in its own right
that, to the best of our knowledge, has not been proven before.

4.1   Weight-based TPJ and a reduction to selection
We now define the problem W- TPJk,n mentioned above. It is closely related to TPJk,t and M - TPJk,t (with n
determined by k and t); as before, the input can be thought of as a labelling of a complete k-level t-ary
tree. However, the labels are presented differently: instead of specifying φ (v) directly, the input specifies
a binary string xv ∈ {0, 1}ai for each level-i node of T , where the lengths ai are parameters to be fixed
later, and the Hamming weight of xv implicitly determines φ (v). If v is a leaf (i = 1), then ai = 1 and
φ (v) = xv = |xv |. Otherwise, |xv | uniquely determines φ (v) via the following equation.
                                                                
                                              ai            t +1
                                      |xv | =    + φ (v) −          bi−1 ,                              (4.1)
                                              2               2

where bi is the total length of all strings associated with nodes in the subtree rooted at a level-i node, i. e.,
bi = ai + tbi−1 and b1 = 1. We will only define W- TPJk,n on inputs such that each |xv | determines a value
φ (v) in the range {1, . . . ,t}. In particular, each ai will need to be “large enough” so that Equation (4.1) is
feasible. Let x ∈ {0, 1}n be the concatenation of all the strings xv . We then define the partial function
W- TPJ k,n (x) := TPJ k,t (φ ), where φ is determined by x as just described.
    The next theorem completes Stage 3 in the above proof outline. The reduction from W- TPJ to MEDIAN
used in its proof is along similar lines to one by Guha and McGregor [23].

                       T HEORY OF C OMPUTING, Volume 12 (10), 2016, pp. 1–35                                  11
                     A MIT C HAKRABARTI , G RAHAM C ORMODE , AND A NDREW M C G REGOR

Theorem 4.3. Let MEDIANm,N denote the random allocation communication problem where the input
consists of m tokens (x1 , . . . , xm ) ∈ [N]m and the desired output is the median of this collection of tokens.
For any δ > 0, any allocation distribution ν, and any number p ≥ 1 of rounds of communication, we have

                                       Rδp (MEDIANn,Θ(n) , ν) ≥ Rδp (W- TPJ p+1,n , ν) .

Proof. Let k = p + 1. We reduce W- TPJ to MEDIAN. Let T be a complete k-level t-ary tree as usual, and
let x = {xv }v∈V (T ) be an input to W- TPJk,n . Our reduction will associate a pair of integers (α(v), β (v))
with each v ∈ V (T ) such that the following properties are satisfied.

   1. For each leaf v, we have α(v) ≡ 0 (mod 2) and β (v) ≡ 1 (mod 2).

   2. For each strict descendant v of each internal node u, we have α(u) < α(v) < β (v) < β (u).

   3. If vi and v j are the i-th and j-th children of u, with i < j, then β (vi ) < α(v j ).

Further, it will associate a multiset A(v) with each v ∈ V (T ) as follows. If v is a level-i node, then
A(v) consists of ai − |xv | copies of α(v) and |xv | copies of β (v). The properties above, together with
Equation (4.1), ensure that
                                                                  
                                                         [
                                W- TPJ (x) = median           A(v) mod 2 ;
                                                                       v∈V (T )

this can be justified by a straightforward induction on k. The reduction itself works by having each player
                            S
generate one element of v∈V (T ) A(v) per bit of x allocated to her. This is done in the natural way: if the
bit in question corresponds to a node v, then she generates the element α(v) if the bit’s value is 0 and
β (v) if the bit’s value is 1.
      It remains to demonstrate that suitable values (α(v), β (v)) satisfying the above properties exist. Here
is an explicit construction. We use the notation v[ik , . . . , i j ] to denote the i j -th child of v[ik , . . . , i j−1 ], with
v[ ] being the root of T . Set B = 2 d(t + 2)/2e and let hhk , hk−1 , . . . , h1 iB denote the quantity ∑ki=1 Bi−1 hi ,
i. e., a base-B representation. We now set

                α(v) = hik , . . . , i j+1 , 0, 0, . . . , 0iB   and β (v) = hik , . . . , i j+1 ,t + 1, 0, . . . 0iB ,

for each internal node v = v[ik , . . . , i j+1 ] at level j. For each leaf node v = v[ik , . . . , i2 ], let

                                α(v) = hik , . . . , i2 , 0iB     and β (v) = hik , . . . , i2 , 1iB .

One can easily verify that this construction has the properties claimed.

4.2    A robust multi-player lower bound
We now fill in Stages 1 and 2 of our proof outline, using M - TPJ as our source problem, and deriving a
robust lower bound for W- TPJ. Both problems involve (p + 1) players, for p ≥ 1. Recall that, in this case,
a “round” consists of one message from each player, in the order Player 1, . . . , Player (p + 1). We start
by obtaining the following traditional (i. e., “fragile") bounded-round lower bound for M - TPJ.

                           T HEORY OF C OMPUTING, Volume 12 (10), 2016, pp. 1–35                                             12
                 ROBUST L OWER B OUNDS FOR C OMMUNICATION AND S TREAM C OMPUTATION

Theorem 4.4. Let µk denote the uniform distribution over k-inputs (as introduced in Definition 4.1).
Then, for p = p(t) ≥ 1, we have

                                           Rµp p+1 ,1/3 (M - TPJ p+1,t ) = Ω(t/p2 ) .

    To prove this, we define an appropriate notion of information cost that is concerned only with the
information revealed in the first round of a multi-round protocol’s execution. We then use this notion to
establish an appropriate round elimination lemma, à la Miltersen et al. [34] and Sen [38], which in turn
implies the above theorem.
Definition 4.5 (First-round information cost). Let P be a multi-round, multi-player, private-coin protocol
and µ an input distribution for P. Let P1 (x, R) denote the concatenation of all messages sent by the
players during the first round of P, where R denotes the concatenation of the random strings used by the
players. Then, we define the first round µ-information cost of P as follows.

                                  icost1µ (P) = I(X : P1 (X, R)) ,         where X ∼ µ .

   As a precursor to our round elimination lemma, we prove the following multi-round analogue of a
lemma of Sen [38, Lemma 1].
Lemma 4.6 (Uninformative round lemma). Suppose a k-input Boolean function f has an r-round k-
player private-coin protocol P, in which each round costs at most C. Then, for any input distribution µ, f
has an (r − 1)-round k-player deterministic protocol Q such that
                                             r
                                               ln 2
                                                                                  q
               errµ (Q, f ) ≤ errµ (P, f ) +        · icost1µ (P) ≤ errµ (P, f ) + icost1µ (P) ,
                                                2
and where each round costs at most C.
Proof. Without loss of generality, we may assume that each player uses two independent random strings
in P: one to generate his first-round message, and another to generate all subsequent messages.3 We
proceed under this assumption. Let Qm denote the (r − 1)-round protocol obtained by fixing the first
round’s communication in P to m. Define the function g by

                                            g(x, m) = Pr[out(Qm , x) 6= f (x)] ,                                            (4.2)

where the probability is over the collection of second random strings used the players.
    Define random variables X and M, where X ∼ µ, and M is the first-round communication generated
from X according to P; let λ denote the resulting joint distribution of (X, M). Let β denote the distribution
of M. We then have
                           r                            r                     r
                             ln 2                          ln 2                  ln 2
      DTV (λ , µ ⊗ β ) ≤          · DKL (λ kµ ⊗ β ) =           · I(X : M) =          · icost1µ (P) ,   (4.3)
                               2                             2                    2
   3 To see why, consider a particular player who uses a random string R to generate his messages, the first such message being

M. This player can instead draw two independent random strings R and R0 , using R to generate M, and then R0 to draw from the
conditional distribution R | M. Finally, he can use R0 in place of R while generating all his remaining messages. It is easy to see
that the distribution of messages so generated is identical to that in the original protocol.


                           T HEORY OF C OMPUTING, Volume 12 (10), 2016, pp. 1–35                                                13
                    A MIT C HAKRABARTI , G RAHAM C ORMODE , AND A NDREW M C G REGOR

where the first two steps are basic information theory (the inequality is often credited to Pinsker).
   We can express the distributional errors of P and Qm in terms of g, by averaging Equation (4.2) in
two ways:
                  errµ (P, f ) = E(X,M)∼λ [g(X, M)] ; errµ (Qm , f ) = EX∼µ [g(X, m)] .
Thus, we have

                          Em∼β [errµ (Qm , f )] = E(X,M)∼µ⊗β [g(X, M)]
                                                  ≤ E(X,M)∼λ [g(X, M)] + DTV (λ , µ ⊗ β )
                                                                   r
                                                                     ln 2
                                                  ≤ errµ (P, f ) +        · icost1µ (P) ,
                                                                      2
where the first inequality holds because |g(x, m)| ≤ 1 for all x and m, and the second inequality uses
Equation (4.3). Choose m to minimise errµ (Qm , f ), and fix the random strings used by the players in Qm
so as to minimise the µ-distributional error of the resulting deterministic protocol, Q. Then errµ (Q, f ) is
upper-bounded as desired.

Lemma 4.7 (Round elimination for M - TPJ). Let p ≥ 2 be an integer, let K and ε be positive reals. Let
µk denote the uniform distribution over k-inputs. Let A(p, K, ε) denote the statement “M - TPJ p+1,t has a
deterministic p-round protocol in which each round uses at most t/K 2 bits of communication in total,
and whose distributional error under µ p+1 is at most ε.” Then A(p, K, ε) ⇒ A(p − 1, K, ε + 1/K).

Proof. Let P be a protocol whose existence is asserted by A(p, K, ε). Based on P, we shall construct t
private-coin protocols Q1 , . . . , Qt , each for M - TPJ p,t . Let T be a (p + 1)-level t-ary tree, and let T1 , . . . , Tt
denote the p-level subtrees hanging off the root, v0 . Recall, from Definition 4.1 that a (p + 1)-input
can be thought of as a function from V (T ) to [t], or equivalently, as a labelling of V (T ) using labels
from [t]. Given a p-input φ and an integer i ∈ [t], let φ (i) denote the random (p + 1)-input obtained as
follows. Treat φ as a function from V (Ti ) to [t]. Choose independent random inputs ψ j : V (T j ) → [t], for
j ∈ [t] \ {i}, each distributed according to µ p . Then put
                                       
                                       i ,
                                                           if v = v0 ,
                                 (i)
                                φ (v) = φ (v) ,             if v ∈ V (Ti ) ,
                                       
                                        ψ j (v) ,           if v ∈ V (T j ) where j 6= i .
                                       


Let ξi denote the distribution of Φ(i) , where Φ ∼ µ p . Notice that ξi is identical to µ p+1 conditioned on
the label of the root being i.
     Here is how the protocol Qi works. On input φ , the players use private randomness to construct φ (i)
(note that this is possible because of an appropriate product structure of φ (i) ), and then simulate P on this
input, using a virtual “Player p + 1,” who can be locally simulated by each real player, because his input,
i, is common knowledge. Clearly, Qi only errs when its call to P errs. Therefore, we have

        1 t                            1 t
              err
          ∑ µp i
        t i=1
                  (Q , M - TPJ p,t ) =   ∑ errξi (P, M - TPJ p+1,t ) = errµp+1 (P, M - TPJ p+1,t ) ≤ ε .
                                       t i=1
                                                                                                                      (4.4)


                          T HEORY OF C OMPUTING, Volume 12 (10), 2016, pp. 1–35                                          14
              ROBUST L OWER B OUNDS FOR C OMMUNICATION AND S TREAM C OMPUTATION

    Let M denote the concatenation of the messages generated in the first round by Players 1, . . . , p when
the protocol P runs on input X ∼ µ p+1 , defined on the tree T . For i ∈ [t], let Xi denote the portion of X
that corresponds to the labelling of the subtree Ti . Then we have
                                              t              t
                        t
                       K2
                          ≥ |M| ≥ I(X : M) ≥ ∑ i I(X : M) = ∑ icost1µp (Qi ) ,                          (4.5)
                                             i=1            i=1

where the rightmost inequality uses the independence of {Xi }i∈[t] . Combining (4.4) and (4.5), we have
                                                           s
                        1 t                                    1 t                        1
                          ∑ errµp (Qi , M - TPJ p,t ) +          ∑   icost1µ p (Qi ) ≤ ε + .
                        t i=1                                  t i=1                      K

Using the concavity of the square root function, plus an averaging argument, we now conclude that
                                                                    q                     1
                         ∃ i ∈ [t] : errµ p (Qi , M - TPJ p,t ) +    icost1µ p (Qi ) ≤ ε + .
                                                                                          K
Applying Lemma 4.6 to this particular Qi gives us the desired protocol, thereby establishing the truth of
A(p − 1, K, ε + 1/K).

    We now have the tools we need to prove Theorem 4.4.

Proof of Theorem 4.4. Suppose that

                                               Rµp p+1 ,1/3 (M - TPJ p+1,t )

is not lower bounded as stated. Specifically, using a standard error-reduction argument, we may assume
that
                                   Rµp p+1 ,1/6 (M - TPJ p+1,t ) ≤ t/(6p)2 .

By the easy direction of Yao’s minimax lemma, we have A(p, 6p, 1/6), where the predicate A is as defined
in Lemma 4.7. Applying that lemma repeatedly, we conclude A(1, 6p, 1/6 + (p − 1)/6p), which implies
A(1, 6p, 1/3). Notice that M - TPJ2,t is just the INDEX problem with a t-bit input. We have just shown
that this problem has a one-round protocol with error at most 1/3 under the uniform distribution and
communication cost at most t/(6p)2 ≤ t/36. Since Hb (1/3) < 12/13, this contradicts Lemma 2.5.

    Now that we have the desired Stage 1 lower bound, we move on to Stage 2, proving the following
robust lower bound. In our proof, we use a reduction from TPJ that introduces a slight correlation between
input and split, and then appeal to Lemma 2.4 to correct for this.

Theorem 4.8. Let p = p(n) ≥ 1 and let V p+1 be the (non-uniform) split distribution that gives each token
to Player 1 with probability 1/2 and to Player i with probability γ := 1/(2p) for each i ∈ {2, . . . , p + 1}.
Then
                                                                                              2 p −1
                                           1       .       
    p
  R1/24 (W- TPJ p+1,n , V p+1 ) = Ω n (p−1)2 p+1 +2
                                                     q(n, p) , where q(n, p) = p2 cp3 log n (p−1)2 p+1 +2


                       T HEORY OF C OMPUTING, Volume 12 (10), 2016, pp. 1–35                               15
                  A MIT C HAKRABARTI , G RAHAM C ORMODE , AND A NDREW M C G REGOR

for some large constant c. Note that q(n, p) = polylog(n) for constant p.
    Thus, for every constant ε > 0, when p is large enough, we have
                                 p                                     −p 
                               R1/24 (W- TPJ p+1,n , V p+1 ) = Ω n(2+ε) .

Proof. Let P be a protocol for (W- TPJ, V p+1 ) such that err(P, W- TPJ, V p+1 ) ≤ 1/24. We will use P to
construct a protocol Q for M - TPJ such that errµ (Q, M - TPJ p+1,t ) ≤ 1/3, where µ is an arbitrary distribution
with the property that, for an instance φ ∼ µ, we have φ (v) ∈R {0, 1} for each leaf node v. Note that,
in particular, this will imply errµ p+1 (Q, M - TPJ p+1,t ) ≤ 1/3. The result will then follow by invoking
Theorem 4.4.
    In Q, the players first use public randomness to transform an input φ for M - TPJ into an input x for
W- TPJ together with a random split of its tokens. They then proceed to simulate P on this instance. Recall
the notation ai and bi from the start of Section 4.1. We set
                                                               i−1 −1 −2(3·2i−1 −i−2)
                                 ai := (cp3t 2(p+2) log n)2         t                                           (4.6)

for some large constant c to be determined. For each node v, the players use the following public coin
randomised procedure to determine a bit string xv and an allocation of its bits to the players in P.

If v is an internal node at level i: Choose random integers
                                       a                      a       
                                          i                        i
                               d1v ∼ B      ,1−γ    and d0v ∼ B      ,1−γ ,
                                         2                       2
      as well as a set                                               
                                                               [ai ]
                                                 Sv−i ∈R                .
                                                            d1v + d0v
      Let Sv−i = Sv1 ∪ . . . ∪ Svi−1 ∪ Svi+1 ∪ . . . ∪ Svp+1 be a random partition where, for each k ∈ Sv−i ,
                                                           (
                                                    j
                                                             γ/(1 − γ) ,     if j 6= 1 ,
                                        Pr k ∈ Sv =
                                                              1/(2(1 − γ)) , if j = 1 ,

      and put Svi = [ai ] \ Sv−i . Player j will be allocated the values {xv,k : k ∈ Svj }. Randomly set d1v of
      the bits {xv,k : k ∈ Sv−i } to 1 and the remaining d0v bits to 0. Notice that all of this is done without
      reference to the input φ .
      Player i uses φ to determine a target weight |xv | for the string xv , based on Equation (4.1). Notice
      that many of the bits of xv have already been fixed by the construction so far. Player i sets the free
      bits in such a way as to achieve this target weight, i. e., she randomly sets |xv | − d1v of the bits
      {xv,k : k ∈ Svi } to 1 and the remaining bits to 0. Note that this requires d1v ≤ |xv | ≤ ai − d0v ; if this
      condition fails to hold, the protocol aborts and outputs a uniform random bit.

If v is a leaf node: In this case xv is a single bit. Allocate this bit to a random player, with Player 1 being
        chosen with probability 1/2 and every other player being chosen with probability γ. If the bit is
        allocated to Player 1, she sets xv = φ (v). Otherwise, the players set xv ∈R {0, 1}.

                         T HEORY OF C OMPUTING, Volume 12 (10), 2016, pp. 1–35                                    16
               ROBUST L OWER B OUNDS FOR C OMMUNICATION AND S TREAM C OMPUTATION

This completes the description of Q. Because V p+1 allocates each token to the first player with probability
1/2, and φ assigns a uniformly random bit to each leaf, we have
                                                                              1 1 1          3
                Pr[W- TPJ(x) = TPJ(φ ) | the protocol Q does not abort] = + · = .
                                                                              2 2 2          4
    It remains to show that the bit string x and the allocation σ generated in the reduction are sufficiently
close to being independent. Note that the marginals are correct: we do have σ ∼ V p+1 and, for each
leaf v, the value of xv is indeed chosen according to a uniform setting of φ (v). The issue is that the joint
distribution is not a product distribution. However, note that had d1v and d0v been chosen according
to B(|xv |, 1 − γ) and B(ai − |xv |, 1 − γ), respectively, then σ and x would have been independent, and
furthermore, the protocol would not abort. For each internal node v at level i, let
                              a                                       a          
                                 i                                         i
                    Ãv := B       ,1−γ ,                      B̃v := B      ,1−γ ,
                                2                                        2
                    Av := B (|xv |, 1 − γ) ,                   Bv := B (ai − |xv |, 1 − γ) .
Then it suffices to show that the product of the distributions Ãv and B̃v , over all internal nodes v, is
sufficiently close to the corresponding product of Av and Bv . Using Lemma 2.7 with the fact that
||xv | − ai /2| ≤ tbi−1 , we can bound the total variation distance in terms of ai and bi as follows,
                                                     
                          O                O                                                 
                   DTV       (Ãv ⊗ B̃v ),  (Av ⊗ Bv ) ≤ ∑ DTV Ãv , Av + ∑ DTV B̃v , Bv
                        v              v                     v                  v
                                                                      p+1 p+2−i
                                                          p              t      bi−1
                                                      ≤ O( log n) ∑            p        ,
                                                                      i=2       ai /p
where the first inequality follows from the triangle inequality. Noting that bi−1 ≤ 2ai−1 and by substituting
in the value for ai , we get
                                   √                        i−2          i−2
          bi−1         2ai−1     2 p(cp3t 2(p+2) log n)2 −1t −2(3·2 −i−1)                 2
        p         ≤ p          =       3  2(p+2)       2i−2 −1/2 −(3·2i−1 −i−2) = √ p+2−i √          .
           ai /p        ai /p       (cp t        log n)         t                   cpt        log n
Therefore,
                                                   p+1 √                         p+1
                                                         p · t p+2−i bi−1
                                     
        O                 O                p                                         1    O(1)
  DTV       (Ãv ⊗ B̃v ),   (Av ⊗ Bv ) ≤ O( log n) ∑          √           = O(1) ∑ √    = √
          v               v                        i=2           ai              i=2 cp     c
and the distance can be made less than 1/24 for sufficiently large constant c. By Lemma 2.4,
                                             1  1                                      1
                   errµ (Q, M - TPJ p+1,t ) ≤ + + err(P, W- TPJ p+1,n , V p+1 ) ≤ .
                                             4 24                                      3
As noted above, this implies the same upper bound on errµ p+1 (Q, M - TPJ p+1,t ). Therefore, by Theorem 4.4,
                                   p
                                  R1/24 (W- TPJ p+1,n , V p+1 ) = Ω(t/p2 ) .
Note that
                                              p          p                          p       p+1 +2
            n = b p+1 ≤ 2(cp3t 2(p+2) log n)2 −1t −2(3·2 −p−3) = 2(cp3 log n)2 −1t (p−1)2            ,
and hence,                                  1                 2 p −1 
                               t = Ω n (p−1)2 p+1 +2 cp3 log n (p−1)2 p+1 +2 .
                                                    



                       T HEORY OF C OMPUTING, Volume 12 (10), 2016, pp. 1–35                              17
                 A MIT C HAKRABARTI , G RAHAM C ORMODE , AND A NDREW M C G REGOR

4.3   A robust two-player lower bound
Finally, we revisit Stages 1 and 2 of our proof outline, this time using the 2-player problem TPJk,t as
our source problem. Now a “round” consists of one message from either Alice or Bob. The traditional
(fragile) lower bound that we need for Stage 1 can be deduced from the work of Klauck et al. [31], who
in fact studied the problem in the more general quantum communication setting. The underlying intuition
is, once again, round elimination.

Theorem 4.9. For p = p(t) ≥ 1, we have

                                      Rµp p+1 ,1/3 (TPJ p+1,t ) = Ω(t/p2 ) ,

where µk is the uniform distribution over k-inputs (as introduced in Definition 4.1).

    For Stage 2, we obtain the following robust lower bound for W- TPJ, using a proof that closely parallels
that of Theorem 4.8: as before, our reduction from TPJ introduces a slight correlation between input and
split, and we use Lemma 2.4 to correct for this.

Theorem 4.10. For each p = p(n) ≥ 1,
                                                                                                      2 p −1
                                        1      .       
    p
  R1/24 (W- TPJ p+1,n , U2 ) = Ω n (p−1)2 p+1 +2 q(n, p) ,       where         q(n, p) = p2 cp2 log n (p−1)2 p+1 +2

for some large constant c. Thus, for every constant ε > 0, when p is large enough, we have
                                  p                                  −p 
                               R1/24  (W- TPJ p+1,n , U2 ) = Ω n(2+ε) .

Proof. Let P be a protocol for (W- TPJ, U2 ) such that err(P, W- TPJ, U2 ) ≤ 1/24. We will use P to construct
a protocol Q for TPJ that works with probability at least 2/3 on any instance φ when φ (v) ∈R {0, 1} for
each leaf node v. In Q, Alice and Bob first use public randomness to construct an input x for W- TPJ
together with a random split of its tokens. They then proceed to simulate P on this instance. We first
define
                                                          i−1          i−1
                               ai := (cp2t 2(p+2) log n)2 −1t −2(3·2 −i−2)                              (4.7)
for some large constant c. For each node v, the players use the following public coin randomised procedure
to determine a bit string xv and an allocation of its bits to the players in P.

If v is an internal node at level i: Choose random integers
                                        a                     a        
                                           i                       i
                                d1v ∼ B      , 1/2  and d0v ∼ B      , 1/2 ,
                                          2                      2
      as well as a set                                             
                                                             [ai ]
                                                Sv ∈R                 .
                                                          d1v + d0v
      First assume i is even. Alice determines {xv,k : k ∈ Sv } and, uniformly at random, sets d1v of these
      tokens to 1 and the remaining d0v tokens to 0. Notice that all of this is done without reference to the

                         T HEORY OF C OMPUTING, Volume 12 (10), 2016, pp. 1–35                                        18
               ROBUST L OWER B OUNDS FOR C OMMUNICATION AND S TREAM C OMPUTATION

      input φ . Bob then uses φ to determine a target weight |xv | for the string xv , based on Equation (4.1).
      Notice that many of the bits of xv have already been fixed by the construction so far. Bob sets the
      free bits in such a way as to achieve this target weight, i. e., he randomly sets |xv | − d1v of the bits
      {xv,k : k ∈
                / Sv } to 1 and the remaining bits to 0. Note that this requires d1v ≤ |xv | ≤ ai − d0v ; if this
      condition fails to hold, the protocol aborts and outputs a uniform random bit. If i is odd then Alice
      and Bob’s roles are reversed.
If v is a leaf node: In this case xv is a single bit. Allocate this bit to a random player, with Alice and Bob
        being chosen with equal probability. If the bit is allocated to Alice, she sets xv = φ (v). Otherwise,
        Bob sets xv ∈R {0, 1}.
This completes the description of Q. Because U2 allocates each token to Alice with probability 1/2, and
φ assigns a uniformly random bit to each leaf, we have
                                                                               1 1 1 3
               Pr[W- TPJ(x) = TPJ(φ ) | the protocol Q does not abort] =        + · = .
                                                                               2 2 2 4
    It remains to show that the bit string x and the allocation σ generated in the reduction are sufficiently
close to being independent. As in the proof of Theorem 4.8, we note that the marginals are correct: we
do have σ ∼ U2 and, for each leaf v, the value of xv is indeed chosen according to a uniform setting of
φ (v). The issue, as before, is that the joint distribution is non-product. However, note that had d1v and d0v
been chosen according to B(|xv |, 1/2) and B(ai − |xv |, 1/2), respectively, then σ and x would have been
independent, and furthermore, the protocol would not abort. For each internal node v at level i, let
                                                                                              
                   1 1                      1 1                         1                          1
        Ãv := B    ai ,   , B̃v := B        ai ,    , Av := B |xv |,       , Bv := B ai − |xv |,      .
                   2 2                      2 2                         2                          2
Hence, we need to show that the product of the distributions Ãv and B̃v , over all internal nodes v, is
sufficiently close to that of all Av and Bv . Using Lemma 2.7, we can bound the total variation distance in
terms of ai and bi as follows,
                                                    
                        O                 O                                              
                 DTV        (Ãv ⊗ B̃v ),  (Av ⊗ Bv ) ≤ ∑ DTV Ãv , Av + ∑ DTV B̃v , Bv
                        v              v                      v                v
                                                                   p+1 p+2−i
                                                           p           t     bi−1
                                                       ≤ O( log n) ∑     √
                                                                   i=2     ai

where the first inequality follows from the triangle inequality. Noting that bi−1 ≤ 2ai−1 and substituting in
the value for ai , the distance can be made less than 1/24 for sufficiently large constant c. By Lemma 2.4,
                                                1  1                              1
                       errµ (Q, TPJ p+1,t ) ≤     + + err(P, W- TPJ p+1,n , U2 ) ≤ .
                                                4 24                              3
Therefore, by Theorem 4.9,
                                      p
                                     R1/24 (TPJ p+1,n , U2 ) = Ω(t/p2 ) .
Note that
                                                 p        p                        p          p+1 +2
            n = b p+1 ≤ 2(cp2t 2(p+2) log n)2 −1t −2(3·2 −p−3) = 2(cp2 log n)2 −1t (p−1)2              ,

                       T HEORY OF C OMPUTING, Volume 12 (10), 2016, pp. 1–35                                  19
                   A MIT C HAKRABARTI , G RAHAM C ORMODE , AND A NDREW M C G REGOR

and hence,
                                               1                 2 p −1 
                                  t = Ω n (p−1)2 p+1 +2 cp2 log n (p−1)2 p+1 +2 .
                                                       



5     Hamming distance and index
In this section, we prove robust lower bounds for the fundamental communication problems INDEX and
GAP - HAMMING - DISTANCE .


5.1    Hamming distance
The GAP - HAMMING - DISTANCE problem (henceforth, GHD) was first formally stated in the context of
data stream lower bounds [30, 42, 27]. The central goal is to determine whether the Hamming distance
between two binary strings is “low” or “high,” with a certain gap (given by a parameter, G) between the
demarcations of “low” and “high.” To be precise, define the function ∆ : {0, 1}2n → Z by

                             ∆(x) := |{i ∈ [n] : x2i 6= x2i−1 }| ,     for x ∈ {0, 1}2n .

For G ∈ R+ , we then define GHDn,G : {0, 1}2n → {0, 1, ?} by
                                                  
                                                  0 , if ∆(x) ≥ n/2 + G ,
                                                  
                                   GHD n,G (x) :=  1 , if ∆(x) ≤ n/2 − G ,
                                                  
                                                   ? , otherwise,
                                                  

where “?” can be interpreted as “undefined.” Equivalently, a computation problem corresponding to the
function GHDn,G can be thought of as a promise problem, where we are promised that ∆(x) does not
fall between n/2 − G and n/2 + G. Traditional (i. e., “fragile”) communication lower bounds for this
problem, where Alice receives x1 , x3 , . . . , x2n−1 and Bob receives x2 , x4 , . . . , x2n , have been heavily studied
recently. In particular, Chakrabarti and Regev [10] show that R(GHDn,G ) = Θ(min{n, n2 /G2 }); see, also,
Sherstov [39] and Vidick [41].
     For a number of reasons (in particular, the data stream applications) the problem is most interesting
                      √
when we set G = Θ( n). We shall prove an optimal robust lower bound for the problem in this setting.

Theorem 5.1. There exists a constant c3 > 0 such that

                                          R1/4 (GHDn,c3 √n , U2 ) = Ω(n) .

Remark 5.2. It is worth pointing out that c3 needs to be sufficiently small for the theorem to hold. In
fact, for a sufficiently large constant c4 , we have R1/4 (GHDn,c4 √n , U2 ) = 0. This is in contrast to the case
                                                                                                  √
of the standard fixed-partition version of GHD, which remains hard at all gaps in Θ( n).
    The theorem will be proved by a reduction from the GHD problem in the standard setting where Alice
and Bob hold x1 , x3 , . . . , x2n−1 and x2 , x4 , . . . , x2n respectively. Before presenting the actual proof, it may
be useful to consider a proof attempt that will not work. Specifically, suppose Alice and Bob use public
randomness to determine a random split σ . If σ (2i − 1) = 1 and σ (2i) = 2, then Alice and Bob already

                         T HEORY OF C OMPUTING, Volume 12 (10), 2016, pp. 1–35                                       20
               ROBUST L OWER B OUNDS FOR C OMMUNICATION AND S TREAM C OMPUTATION

know the relevant bits of x. If σ (2i − 1) = 2 and σ (2i) = 1, then Alice could use x2i−1 in place of x2i
and Bob could use x2i in place of x2i−1 since this will not change the Hamming distance. However, if
σ (2i − 1) = σ (2i) then the relevant player will not know both x2i−1 and x2i so suppose that he or she
picks the unknown bit randomly. If y is the resulting bit string upon which the protocol is being simulated
and GHDn,√n (x) ∈ {0, 1}, then it can be shown that GHDn,√n (x) = GHDn,c√n (y) with probability 0.99 if
c > 0 is a sufficiently small constant.
     Hence, it would appear that we can conclude that any protocol for GHDn,c√n in the random-allocation
setting can be used to solve GHDn,√n in the traditional setting and therefore imply a lower bound.
Unfortunately this is incorrect. The correctness guarantee of a protocol in the random-allocation setting
is that for any input y, if the bits of y are partitioned uniformly at random, then the protocol should be
correct with the required probability. This guarantee naturally still applies if y is chosen according to
some distribution and the bits are partitioned uniformly at random, but only if the distribution of y is
independent of the partitioning, which is not the case in the above attempt at a proof, since
                                                   (
                                                     1/2      if σ (2i − 1) = σ (2i) ,
                          Pr [y2i−1 ⊕ y2i = 1] =
                                                     ∆(x)/n if; σ (2i − 1) 6= σ (2i) .

We address this issue in the proof by padding the original instance of GHD with extra coordinates such
that the weak correlation in pairs of bits that are split between the different players is masked by the
random bits in the extra coordinates. There will be a small correlation between the resulting string and
the random allocation but, by setting parameters appropriately, we will be able to make this correlation
arbitrarily small.

Proof of Theorem 5.1. Let y ∈ {0, 1}2n be an instance of GHD in the standard setting where yi is known
to Alice if i is odd and known to Bob if i is even. Define g by ∆(y) = n/2 + g. Distinguishing whether
      √              √
g ≥ n or g < − n requires Ω(n) bits of communication [10] in the standard setting. This bound
                               √                √
also applies if we promise n ≤ |g| ≤ 10 n since the lower bound applies even when y ∈R {0, 1}2n
and under this distribution the promise is satisfied with constant probability. Henceforth, we assume
√                 √
  n ≤ |g| ≤ 10 n.
     Using y and some public randomness, Alice and Bob generate an instance z ∈ {0, 1}2m in the robust
setting, together with a split σ , where m = cn for some large constant c. The pair I = (z, σ ) will have the
properties that z and σ are nearly independent, and that GHDn,√n (y) = GHDm,√n (z).
     It will be helpful to generate (z, σ ) by first generating five sets S0 , S1 , D0 , D1 , and Dinput that partition
[m]. For b ∈ {0, 1}, Sb includes i if the (2i − 1)-th token and the (2i)-th token are received by the same
player, i. e., σ (2i − 1) = σ (2i), and z2i−1 ⊕ z2i = b. For b ∈ {0, 1}, Db includes i if the (2i − 1)-th token
and the (2i)-th token are received by different players and z2i−1 ⊕ z2i = b. The set Dinput encodes a further
n index pairs that will be received by different players and where

                                          z2i−1 ⊕ z2i = y2π(i)−1 ⊕ y2π(i) ,

where π is a random bijection between [n] and Dinput .
   To determine S0 , S1 , D0 , D1 , and Dinput , we proceed as follows. Using public randomness, Alice and
Bob choose independent random integers s0 , s1 ∼ ν where ν is the distribution B (m/2, 1/2) conditioned

                         T HEORY OF C OMPUTING, Volume 12 (10), 2016, pp. 1–35                                      21
                       A MIT C HAKRABARTI , G RAHAM C ORMODE , AND A NDREW M C G REGOR

on the event that the value is at most (m − n)/2. Let (S0 , S1 , E, D0 , D1 ) be a random partition of [m]
conditioned on

        |S0 | = s0 ,    |S1 | = s1 ,   |E| = n ,   |D0 | = (m − n)/2 − s0 ,    and   |D1 | = (m − n)/2 − s1 .

Alice and Bob then choose (z, σ ) uniformly at random conditioned on the choice of the above sets:
      • For i ∈ D0 ∪ D1 ∪ S0 ∪ S1 , Alice and Bob know z2i−1 ⊕ z2i = b and hence the pair (z2i−1 , z2i ) ∈R
        {(0, b), (1, 1 − b)} can be determined using only public randomness.

      • For i ∈ Dinput , let ri be a public random bit. If σ (2i − 1) = 1 and σ (2i) = 2 then Alice sets
        z2i−1 = ri ⊕ y2π(i)−1 and Bob sets z2i = ri ⊕ y2π(i) . If σ (2i − 1) = 2 and σ (2i) = 1 then Alice sets
        z2i = ri ⊕ y2π(i)−1 and Bob sets z2i−1 = ri ⊕ y2π(i) . In each case, z2i−1 and z2i are chosen uniformly
        at random conditioned on z2i−1 ⊕ z2i = y2π(i)−1 ⊕ y2π(i) .
    Observe that ∆(z) = |S1 | + |D1 | + ∆(y) = (m − n)/2 + n/2 + g = m/2 + g. Therefore GHDn,√n (y) =
GHD m,√n (z) as required.
    Suppose P is a robust protocol with failure probability δ . In the above reduction, z and σ are not
generated independently but we will show that P still has a failure probability at most δ + 1/5 on the
inputs we generate. To do this, we consider a second distribution over instances I0 ; this distribution is
purely for the sake of analysis and Alice and Bob will not have sufficient information to generate instances
according to this distribution. To generate I0 , choose s0 ∼ B (m/2 − g, 1/2) and s1 ∼ B (m/2 + g, 1/2)
and let (S0 , S1 , D0 , D1 ) be a random partition of [m] conditioned on

                 |S0 | = s0 ,    |S1 | = s1 ,   |D0 | = m/2 − g − s0 ,   and   |D1 | = m/2 + g − s1 ,

and we set the values of σ (2i − 1), σ (2i), z2i−1 , and z2i uniformly at random conditioned on the choice
of these sets. Note that the distribution of I and I0 is identical conditioned on the values of s0 and s1 .
Furthermore, I0 is distributed according to a product distribution and hence P outputs GHDm,√n (z) =
GHD n,√n (y) with probability at least 1 − δ on this distribution. Hence, if δ 0 is the failure probability of P
on the distribution of I, we have
                                                                               
           δ 0 ≤ δ + DTV ν, B (m/2 − g, 1/2) + DTV ν, B (m/2 + g, 1/2)
                                                                                                  
               ≤ δ + 2 Pr [B (m/2, 1/2) ≥ (m − n)/2] + DTV B (m/2, 1/2), B (m/2 − g, 1/2)
                                                           
                 + DTV B (m/2, 1/2), B (m/2 + g, 1/2)
                ≤ δ + 1/10 + 1/20 + 1/20 ≤ δ + 1/5 .

where the last line follows from the Chernoff bound and from Lemma 2.7 (since g2 /m ≤ 100n/m), by
ensuring that m/n = c is sufficiently large.

5.2     Index
For our purposes, we define the INDEXn problem over inputs x ∈ [n] × {0, 1}n as follows: INDEXn (x) := x j
where j := x0 . Traditionally, one considers the worst-case partition where Alice (the player who speaks)

                            T HEORY OF C OMPUTING, Volume 12 (10), 2016, pp. 1–35                               22
               ROBUST L OWER B OUNDS FOR C OMMUNICATION AND S TREAM C OMPUTATION

holds x1 . . . xn and Bob holds j. The resulting problem is one of the most basic in communication
complexity, and strong randomised lower bounds are known for it in this setting [1]. In this fixed-partition
model, INDEXn can be thought of as a special case of DISJn,2 , where one string is of the form ei . This is
no longer the case under uniform splits, since the zeros in ei get spread between the players, and leak
information about which indices are not of interest.
    We prove a robust lower bound for a generalisation of INDEX that allocates multiple copies of the
tokens (x0 , . . . , xn ) amongst two players. This generalisation is needed for proving subsequent data stream
bounds. For positive integers a and b, let INDEXa,b     n denote the problem where the input consists of a
copies of each xi (for i ∈ [n]) and b copies of x0 , with x = (x0 , . . . , xn ) being an input for INDEXn . The
(partial) function INDEXa,b   n takes the value INDEX n (x) on such an input. Let ν p denote the distribution of
a random split obtained by independently giving each input token to Player 1 with probability p, and to
Player 2 otherwise.

Theorem 5.3. Let a, b, and p constants such that a and b are positive integers and 0 < p < 1. We have

                                          R→         a,b
                                           δ ( INDEX n , ν p ) = Ω(n) ,

where δ = (1 − p)b pa /4.

Proof. The proof is by reduction from INDEXn when Alice holds x1 . . . xn ∈ {0, 1}n and Bob holds index
x0 = j. Let µ be the uniform distribution over all possible inputs. By Lemma 2.5, any one-way protocol
succeeding with probability at least 1/2 + (1 − p)b pa /4 (for a, b, p positive constants) for instances of
INDEX n drawn from µ requires Ω(n) bits to be communicated.
    Suppose there exists a one-way protocol P with the property that err(P, INDEXa,b                   b a
                                                                                    n , ν p ) ≤ (1 − p) p /4.
We use P to create the following (traditional) protocol Q for INDEXn . Let x = (x0 , . . . , xn ) denote the
input given to Alice and Bob in Q, and let x̂ denote the corresponding input to the players in INDEXa,b    n .
Alice and Bob agree on a split σ ∼ ν p , of the tokens in x̂, using public coins. Consider the events

                          B0 = “a copy of x0 is allocated to Player 1,” and
                           Bi = “a copy of xi is allocated to Player 2,” for i ∈ [n].

Alice and Bob behave as follows in the protocol Q. If B0 occurs, then Bob outputs a uniform random bit.
Otherwise, for each i ∈ [n] such that Bi occurs, they jointly choose a (public) random bit ri ∈R {0, 1} and
set all bits of x̂ that are copies of xi equal to ri . The remaining bits of x̂ (i. e., those that are copies of xi
such that Bi does not occur) are left unchanged. Alice and Bob then simulate P on this updated input x̂,
playing the roles of Player 1 and Player 2 respectively. Clearly, cost(Q) ≤ cost(P).
    Let B = B0 ∨ B j (recall that j = x0 ). If B occurs, then the output of Q is a random bit. Otherwise, Q is
correct whenever P is. Note that Pr [B] = 1 − (1 − p)b pa . Thus, the correctness probability of Q is

        Pr [B]                              Pr [B]                               1 (1 − p)b pa
               + Pr [¬B ∧ (P is correct)] ≥        + Pr [P is correct] − Pr [B] ≥ +            ,
          2                                   2                                  2     4

which implies cost(Q) = Ω(n), and hence, cost(P) = Ω(n).

                        T HEORY OF C OMPUTING, Volume 12 (10), 2016, pp. 1–35                                  23
                      A MIT C HAKRABARTI , G RAHAM C ORMODE , AND A NDREW M C G REGOR

6     Robust lower bounds for data stream computation
We use our results on communication complexity from the previous sections to derive robust lower
bounds for a number of problems in the data stream model. The connection between random-allocation
communication complexity and robust bounds in the data stream model is a natural extension of the
connection between fixed-partition communication complexity and the basic data stream model where
the data is ordered adversarially. In particular, an r-pass, s-space data stream algorithm for evaluating
a function f on a set S of tokens presented in (uniform) random order yields an r-round, p-player
communication protocol for evaluating f (S) for certain ways of partitioning S into p subsets S1 , . . . , S p ,
with the i-th player receiving Si . To be precise, each token is placed in one subset chosen independently,
but not necessarily uniformly, from S1 , . . . , S p .
     The communication protocol then works as follows. The i-th player (uniformly) randomly permutes
Si to generate stream si and the players emulate the algorithm on the concatenated stream hs1 |s2 | . . . |s p i.
This emulation requires O(rps) bits of communication. Given a lower bound on the complexity of the
communication problem, this allows us to deduce a lower bound for the data stream problem.

6.1     Frequency moments
The (estimations of) various frequency moments are some of the most well-studied problems in the data
stream model [3]. Suppose the stream comprises a sequence of m values a j ∈ [n]. Define fi = |{ j : a j = i}|.
The k-th frequency moment (k not necessarily integral) is

                                                       Fk := ∑ fik .
                                                                i∈[n]

We consider constant k ≥ 3. It is known that any O(1)-pass algorithm that returns a (1/2, 1/4)-
approximation of Fk requires Ω̃(n1−2/k ) space4 and that this is tight under worst-case orderings [28, 9].
However, it was observed that for random orderings and m = Ω̃ε (an) (a > 1) there exists a single pass
Õ((n/a)1−2/k )-space algorithm that (ε, δ )-approximates Fk [22]. The following theorem shows a lower
bound on the space usage in the random-order case. The proof combines Theorem 3.3 with a variation of
the reduction used in Alon, Matias, and Szegedy [3, Theorem 3.2].

Theorem 6.1. An r-pass algorithm giving a (1/10, 1/10)-approximation for Fk of a randomly or-
dered stream requires Ω(n1−3/k /r) space. For streams where m = Ω(an), for some integer a > 1,
Ω(n1−3/k /(a3 r)) space is required.

Proof. Suppose there exists an r-pass, (1/10, 1/10)-approximation algorithm for Fk that uses s bits
of space. Set t = (5n/4)1/k . Let x = {xi j }i∈[t], j∈[n] be an instance for DISJn,t that satisfies the unique
intersection promise (as discussed at the start of Section 3). Consider a uniform random split of the nt
input tokens between p = 20t 2 players. Let the player who receives the token for xi j , generate the value
 j if xi j = 1 and define S j to be the multiset of values generated by the j-th player. Note that the sets
S1 , . . . , S p are a random partition of S = S1 ∪ · · · ∪ S p . Furthermore Fk (S) ≥ t k = 5n/4 if DISJn,t (x) = 1
    4 The Õ(·) and Ω̃(·) notations used in this section suppress logarithmic dependencies on the stream length, m, the universe

size, n, and the inverse error probability, δ −1 .


                             T HEORY OF C OMPUTING, Volume 12 (10), 2016, pp. 1–35                                           24
              ROBUST L OWER B OUNDS FOR C OMMUNICATION AND S TREAM C OMPUTATION

and Fk (S) ≤ n if DISJn,t (x) = 0. Using the template at the start of Section 6 and appealing to Theorem 3.3,
we can deduce that rps = Ω(n/t). Therefore s = Ω(n1−3/k ) as required.
    To prove the second part of the theorem, the reduction from DISJn,t proceeds as before but we also
add a copies of [n] randomly distributed between the p players. This is achieved using public randomness.
Now, if DISJn,t (x) = 1, then Fk ≥ t k , but if DISJn,t (x) = 0, then Fk ≤ (a + 1)k n. If we now choose
t = (5n/4)1/k (a + 1), a (1/10, 1/10)-approximation to Fk distinguishes the two cases. The resulting
lower bound on the space is Ω(n/(rpt)) = Ω(n1−3/k /(a3 r)).

6.2   Distinct elements and entropy: Space/approximation tradeoffs
The number of distinct elements in a stream is F0 := |{i ∈ [n] : fi 6= 0}|, and the empirical entropy is
H := ∑i∈[n] ( fi /m) log(m/ fi ); recall that m denotes the length of the stream. Within this subsection let
us think of the “input size” parameters m and n as fixed, and the approximation parameter ε as varying.
One-pass, Õ(ε −2 )-space, (ε, δ )-approximation algorithms are known for both problems [7, 25, 17, 5, 18].
We prove that the known algorithms are essentially tight even under random order. These results follow
from Theorem 5.1 and the reductions in [7, Theorem 2] and [42, Section 3.2].
Theorem 6.2. Let k ≥ 0 be a constant with k 6= 1. Then, any r-pass (ε, 1/10)-approximation for Fk of a
randomly ordered stream requires Ω(ε −2 /r) space. Also, any r-pass (ε, 1/10)-approximation for H of a
randomly ordered stream requires Ω(ε −2 /(log2 ε −1 · r)) space.
Proof. Suppose there exists an r-pass, (ε, 1/10)-approximation algorithm for H that uses s(ε) bits of
space. Let x ∈ {0, 1}2n be an instance of GHDn,G (where n and G are determined by ε below) and consider
a uniform random split of the 2n tokens between two players. Let the player who receives the token for xi
generate the value (di/2e , xi ). Define SA and SB to be the multisets of values generated by Alice and Bob
respectively. Note that SA and SB are a random partition of S := SA ∪ SB . Furthermore,
                                           ∆             n−∆          ∆
                               H =           log(2n) +        log n = + log n ,
                                           n               n          n
where ∆ = ∆(x) = |{i ∈ [n] : x2i 6= x2i−1 }|. Hence, any algorithm which can (ε, 1/10)-approximate H can
also distinguish the cases ∆(x) ≤ n/2 − G and ∆(x) ≥ n/2 + G, provided ε < G/(n log n). For any ε that
                                                                √
is less than c3 , we set n = c23 ε −2 /(4 log2 ε −1 ) and G = c3 n, where c3 is the constant from Theorem 5.1.
This ensures ε < G/(n log n). Using the template at the start of Section 6 and appealing to Theorem 5.1,
we can deduce that rs(ε) = Ω(n) = Ω(ε −2 / log2 ε −1 ).
     The frequency moments lower bound is similar: the same reduction ensures that if ∆(x) = n/2 + g,
then
        Fk = 2k (n − ∆(x)) + 1k · 2∆(x) = 2k n + (2 − 2k ) · ∆(x) = (2k + 1 − 2k−1 )n + (2 − 2k ) · g .
Then either
              Fk                   |2 − 2k |G                   Fk                   |2 − 2k |G
                         ≤ 1 −                       or                    ≥ 1 +                   .
       (2k + 1 − 2k−1 )n       (2k + 1 − 2k−1 )n         (2k + 1 − 2k−1 )n       (2k + 1 − 2k−1 )n
                                                            √
Setting n = (ε −1 c3 |2 − 2k |/(2k + 1 − 2k−1 ))2 and G = c3 n ensures that the value of Fk in each case
differs by a factor of at least 1 + ε and hence the communication lower bound of Ω(n) entails a space
lower bound of Ω(ε −2 /r) for constant k.

                       T HEORY OF C OMPUTING, Volume 12 (10), 2016, pp. 1–35                               25
                  A MIT C HAKRABARTI , G RAHAM C ORMODE , AND A NDREW M C G REGOR

6.3   Selection
Selection, including median-finding, is one of the earliest-studied problems in the data stream model [35]
and has been the focus of several works [19, 23, 8]. The following result improves upon the previous best
single and multi-pass lower bounds [23, 8]. As an example, our theorem implies a Ω̃(m1/10 ) space lower
bound for 3-pass algorithms whereas the best previous result was Ω̃(m3/80 ) [8].
Theorem 6.3. Any p-pass algorithm to return the median of a length-m randomly ordered stream which
succeeds with probability at least 3/4 requires
                                                p+1
                                                               
                                     Ω m1/((p−1)2 +2) /q(m, p)

space where the function q, defined in Theorem 4.8, is polylog(m) for any constant p ≥ 1.
Proof. Using the template at the start of Section 6, the theorem is immediate from Theorem 4.8. Note
that this is an example where we consider a robust communication bound in which the player receiving a
specific token is not chosen uniformly. However, as long as the tokens are distributed independently, the
order of the concatenated stream is chosen uniformly at random as required.

    We note that a weaker bound follows from Theorem 4.10. The reason that a reduction from the
two-player result is weaker (despite the apparent similarity between Theorem 4.10 and Theorem 4.8)
stems from the different definition of communication rounds. In the multi-player setting, p streaming
passes corresponds to p rounds, whereas in the two-player setting, p streaming passes corresponds to
2p − 1 rounds. Hence, a reduction from the two-party setting would result in occurrences of p in the
exponent in the above theorem being replaced by occurrences of 2p − 1.

6.4   Graph streaming
We now consider bounds on solving graph problems given a stream of edges in random order. These
problems are known to require a large amount of space when edges are presented in arbitrary order [15, 26].
The problems are no easier when the edge order is randomized. Using Theorem 3.3 and Theorem 5.3 and
reductions similar in spirit to those in [15, 26] we show the following results.
Theorem 6.4. An r-pass algorithm that, given a stream of edges in random order, determines whether
the resulting graph is connected requires Ω(n/r) space.
Proof. We consider a reduction from DISJn/2,2 where tokens corresponding to each bit are uniformly
distributed between p players. We present a lower bound on the communication required between p
players to determine whether a graph is connected when the edges of the graph are randomly partitioned
between the p players. The stream lower bound follows immediately from the template at the start of
Section 6. Let x = {xi j }i∈[2], j∈[n/2] be an instance of DISJn/2,2 . Based on x we define the following graph
Gx = (L ∪ R, E1 ∪ E2 ∪ E3 ) where L = {l1 , . . . , ln/2 }, R = {r1 , . . . , rn/2 } and the edge set includes

                                  E1 = {(li , ri ) : i ∈ [n/2]} ,
                                  E2 = {(l j , l j+1 ) : j ∈ [n/2], x1, j = 0} ,
                                  E3 = {(r j , r j+1 ) : j ∈ [n/2], x2, j = 0} ,


                       T HEORY OF C OMPUTING, Volume 12 (10), 2016, pp. 1–35                                26
              ROBUST L OWER B OUNDS FOR C OMMUNICATION AND S TREAM C OMPUTATION

where ln/2+1 = l1 and rn/2+1 = r1 . It is easy to see that Gx is disconnected iff there exists j such that
x1, j = x2, j = 1. To perform the reduction, the players replace the token corresponding to each xi, j if
appropriate. Note that the edges of E2 ∪ E3 are randomly partitioned between the players because the
relevant tokens were randomly partitioned. Using public randomness, the players can decide on a random
partition of E1 . In this way the entire edge set of Gx is randomly partitioned between the p players.
Setting p = 40 and appealing to Theorem 3.3 gives the required result.

Theorem 6.5. Given a stream of edges in random order, a single pass algorithm that distinguishes
between when the distance between two given vertices (known at the start of the stream) is at most 1
or at least t + 1 requires Ω(ex(n − 2,C3 , . . . ,Ct+1 )) space, where ex(n − 2,C3 , . . . ,Ct+1 ) is the maximum
number of edges of a graph on n − 2 vertices that does not include any cycles of length strictly less than
t + 2.

   A well-known result in extremal graph theory is that ex(n,C3 , . . . ,Ct+1 ) = Ω(n1+1/t ), and it has long
been conjectured that ex(n,C3 , . . . ,Ct+1 ) = Ω(n1+2/t ) for t ≥ 2; see, e. g., Simonovits [40].

Proof. Let G = (V, E) be a graph on n − 2 vertices with m = ex(n − 2,C3 , . . . ,Ct+1 ) edges such that the
shortest cycle has length at least t + 2. Let e1 , . . . , em be some arbitrary ordering of the edges in G. Let
a, b be two vertices not in V . Consider an instance x ∈ [m] × {0, 1}m of INDEX where one copy of each
xi (i ∈ [m]) and two copies of x0 are distributed uniformly between two players. Consider the reduction
in which, for i ≥ 1, each xi is ignored if xi = 0 and replaced by an edge ei with unit weight if xi = 1.
Suppose that x0 = j and that e j = (u j , v j ). With probability 1/2 replace the first copy of x0 by (a, u j )
and the second copy by (b, v j ); these edges have zero weight. With the remaining probability, replace
them in the reverse order. In this way we define a graph G0 on vertices V ∪ {a, b} where the distance
between a and b is 1 if x j = 1 and at least t + 1 if x j = 0. Hence, any protocol that distinguishes between
the distance being 1 and at least t + 1 also determines the value of x j . Appealing to Theorem 5.3 gives
the required result.


6.5   Information divergences
It is common to interpret streams as defining a distribution over a finite set of tokens: the frequencies can
be rescaled to define an empirical probability distribution. Over such empirical distributions, we desire
to compute various statistical measures. The next theorem extends a result by Guha et al. [21] on the
approximation of information divergences. The results follows from Theorem 5.3 using a variant of their
reduction.

Theorem 6.6. Let a be an even positive integer. Given a randomly ordered stream defining two empirical
distributions p and q on [n], Ω(n) space is required to produce an estimate ĥ such that

                                   h2 (p, q)       p
                                  p          ≤ ĥ ≤ (a + 1)/2 · h2 (p, q)
                                   (a + 1)/2

holds with probability at least 1 − 2−a−3 . Here, h2 denotes squared Hellinger distance.

                        T HEORY OF C OMPUTING, Volume 12 (10), 2016, pp. 1–35                                 27
                   A MIT C HAKRABARTI , G RAHAM C ORMODE , AND A NDREW M C G REGOR

Proof. We consider a reduction from INDEX. Let k ∈ [n], x1 . . . xn ∈ {0, 1}n be an instance of INDEX.
Consider the random allocation where a copies of each xi are uniformly distributed between the two
players and k is revealed to a player chosen uniformly at random. The players transform this input into a
set of tokens hp, ii and hq, ii as follows:
    1. Using public randomness, the players generate n random binary strings y1 , . . . , yn ∈ {p, q}a where
       each string has the same number of each symbol. Suppose Alice receives di copies of the token for
       xi . Then, Alice generates a copy of hyij , ii for each j ≤ di if xi = 1. Similarly, Bob generates a copy
       of hyij , ii for each di < j ≤ a if xi = 1. Note that if di = 0 or di = a, then one of the players will not
       know xi . But in this case, that player will be generating the empty set.
    2. The player receiving the token for k generates a copy of hq, ki.
    3. Additionally, the players generate a/2 + 1 copies of hp, n + 1i and a/2 copies of hq, n + 1i. These
       are uniformly distributed between the players.
   In this way, for each i ∈ [n] such that xi = 1, a/2 copies of hp, ii and a/2 copies of hq, ii have been
generated. Additionally, one copy of hq, ki, a/2 + 1 copies of hp, n + 1i and a/2 copies of hq, n + 1i have
been generated. Therefore,
                                    p
                                     1              p         2     
                                  
                                            a/2 − a/2 + 1 + 1 , if xk = 0 ,
                                     m
                                  
                     h2 (p, q) =
                                    2 p          p         2
                                  
                                          a/2 − a/2 + 1 ,                 if xk = 1 ,
                                     m
where m = a/2 + 1 + (a/2)|{i ∈ [n] : xi = 1}|. Furthermore, the ratio of these quantities is
                p       p           2
                   a/2 − a/2 + 1 + 1            1              1                 a+1
                                        2 = + p                           2 >         .
                                                2 2 a/2 − a/2 + 1                  2
                   p       p                                   p
                2 a/2 − a/2 + 1
    Hence, an algorithm producing an estimate satisfying the stated bounds would be sufficient to solve
the instance of INDEX. Therefore, by Theorem 5.3, any stream algorithm that returns such an estimate
requires Ω(n) space.


Acknowledgements
We are grateful to the anonymous referees for a number of comments that were a great help to us in
improving the presentation of the paper and fixing some subtle issues in the proof of Theorem 3.3.


A     Distance between binomial distributions
We prove the following technical lemma, giving an upper bound on the total variation distance between
two binomial distributions that have roughly the same number of trials. The lemma essentially shows that
the total variation distance between two binomial distributions B(a, q) and B(a − w, q) is small as long as
w is small compared to the standard deviation of B(a, q).

                        T HEORY OF C OMPUTING, Volume 12 (10), 2016, pp. 1–35                                  28
              ROBUST L OWER B OUNDS FOR C OMMUNICATION AND S TREAM C OMPUTATION

Lemma A.1 (Restatement of Lemma 2.7). There exists a constant c1 > 0 such that for all q ∈ [1/2, 1),
r ∈ (0, 1), and a ∈ N,
                                                                    r
                       w                                                2
                       √ ≤ r =⇒ DTV (B(a, q), B(a − w, q)) ≤ c1 r ln ,
                        v                                                r
where v = aq(1 − q) is the variance of B(a, q). In order to define the total variation distance above, we
treat B(n, p) as a distribution on the set of all non-negative integers, rather than just {0, 1, . . . , n}.
Proof. Let γ = 1 − q and
                                                                 
                            a i                                  a−w i
                      αi =    q (1 − q)a−i           and   βi =      q (1 − q)a−w−i .
                            i                                     i
By an application of the Chernoff bounds,
                                   h               p             i
                                Pr |B(a, q) − aq| ≥ 4 ln(2/r) · v ≤ r , and
                         h                         p             i
                       Pr |B(a − w, q) − aq| ≥ wq + 4 ln(2/r) · v ≤ r .
            p
   Let t = 4 ln(2/r) · v and note that
                           p               p                     p
                      t = 4 ln(2/r) · v + 4 ln(2/r) · v ≥ wq + 4 ln(2/r) · v .
              √        p
because wq ≤ v and 4 ln(2/r) ≥ 1. Therefore we may bound the total variation distance as follows.
      DTV (B(a, q), B(a − w, q)) = ∑ |αi − βi |
                                       i
                                  ≤ 2r + ∑ (αi − βi ) +              ∑          (βi − αi )
                                           i∈aq±t               i∈aq±t:βi ≥αi
                                            αi ≥βi

                                  = 2r +         ∑         αi (1 − βi /αi ) +       ∑           βi (1 − αi /βi )
                                           i∈aq±t:αi ≥βi                        i∈aq±t:βi ≥αi
                                                           !                     !             
                                                                   βi                          αi
                                  ≤ 2r + ∑ αi max 1 −                   + ∑ βi max 1 −
                                                     i∈aq±t        αi                i∈aq±t    βi
                                           i∈aq±t     αi ≥βi               i∈aq±t     βi ≥αi
                                            αi ≥βi                          βi ≥αi
                                                                    
                                                     βi                αi
                                  ≤ 2r + 2 − min             − min        ,
                                              i∈aq±t αi        i∈aq±t βi

where the last line follows because ∑i αi = ∑i βi = 1 and αi , βi are positive.
   For i ∈ aq ± t, we have a − i ∈ γa ∓ t, and thus
                           a!(a − w − i)!
                 αi /βi =                   (1 − q)w
                          (a − i)!(a − w)!
                                a(a − 1) . . . (a − w + 1)
                        =                                          (1 − q)w
                          (a − i)(a − i − 1) . . . (a − w − i + 1)
                             aγ w
                                                  w                   w
                                               aγ                     t             wt
                        ≥             ≥                  = 1−                 ≥ 1−        ,
                            a−i             γa + t                  γa + t         γa + t


                       T HEORY OF C OMPUTING, Volume 12 (10), 2016, pp. 1–35                                       29
                  A MIT C HAKRABARTI , G RAHAM C ORMODE , AND A NDREW M C G REGOR

and
                     (a − w)!(a − i)!
             βi /αi =                  (1 − q)−w
                      (a − w − i)!a!
                     (a − i)(a − i − 1) . . . (a − w − i + 1)
                   =                                          (1 − q)−w
                           a(a − 1) . . . (a − w + 1)
                       a−w−i w                γa − w − t w             w+t w      w2 + tw
                                                                      
                   ≥                 ≥                        = 1−           ≥ 1−         .
                          γa                      γa                    γa          γa
Therefore,
                                                  wt      w2 + tw
               DTV (B(a, q), B(a − w, q)) ≤ 2r +
                                                                            p
                                                       +          = O(1) · r ln(2/r) ,
                                                γa + t       γa
                                              p                 p               p
where the last line follows since w2 ≤ tw = 4w ln(2/r) · v ≤ 4rv ln(2/r) < 4rγa ln(2/r).


References
 [1] FARID M. A BLAYEV: Lower bounds for one-way probabilistic communication complexity and
     their application to space complexity. Theoret. Comput. Sci., 157(2):139–159, 1996. Preliminary
     version in ICALP’93. [doi:10.1016/0304-3975(95)00157-3] 7, 10, 23
 [2] A LFRED V. A HO , J EFFREY D. U LLMAN , AND M IHALIS YANNAKAKIS: On notions of in-
     formation transfer in VLSI circuits. In Proc. 15th STOC, pp. 133–139. ACM Press, 1983.
     [doi:10.1145/800061.808742] 2
 [3] N OGA A LON , YOSSI M ATIAS , AND M ARIO S ZEGEDY: The space complexity of approximating
     the frequency moments. J. Comput. System Sci., 58(1):137–147, 1999. Preliminary version in
     STOC’96. [doi:10.1006/jcss.1997.1545] 2, 7, 24
 [4] Z IV BAR -YOSSEF, T. S. JAYRAM , R AVI K UMAR , AND D. S IVAKUMAR: An information statistics
     approach to data stream and communication complexity. J. Comput. System Sci., 68(4):702–732,
     2004. Preliminary version in FOCS’02. [doi:10.1016/j.jcss.2003.11.006] 6, 7, 8
 [5] Z IV BAR -YOSSEF, T. S. JAYRAM , R AVI K UMAR , D. S IVAKUMAR , AND L UCA T REVISAN:
     Counting distinct elements in a data stream. In Proc. 6th Internat. Workshop on Randomization and
     Computation (RANDOM’02), volume 2483 of LNCS, pp. 1–10. Springer, 2002. [doi:10.1007/3-540-
     45726-7_1] 25
 [6] A MIT C HAKRABARTI , G RAHAM C ORMODE , AND A NDREW M C G REGOR: Robust lower bounds
     for communication and stream computation. In Proc. 40th STOC, pp. 641–650. ACM Press, 2008.
     [doi:10.1145/1374376.1374470] 1
 [7] A MIT C HAKRABARTI , G RAHAM C ORMODE , AND A NDREW M C G REGOR: A near-optimal
     algorithm for computing the entropy of a stream. ACM Trans. Algorithms, 6:51:1–51:21, 2010.
     Preliminary version in SODA’07. [doi:10.1145/1798596.1798604] 25

                        T HEORY OF C OMPUTING, Volume 12 (10), 2016, pp. 1–35                      30
             ROBUST L OWER B OUNDS FOR C OMMUNICATION AND S TREAM C OMPUTATION

 [8] A MIT C HAKRABARTI , T. S. JAYRAM , AND M IHAI P ĂTRA ŞCU: Tight lower bounds for selection in
     randomly ordered streams. In Proc. 19th Ann. ACM-SIAM Symp. on Discrete Algorithms (SODA’08),
     pp. 720–729. ACM Press, 2008. ACM DL. 3, 4, 10, 26

 [9] A MIT C HAKRABARTI , S UBHASH K HOT, AND X IAODONG S UN: Near-optimal lower bounds
     on the multi-party communication complexity of set disjointness. In Proc. 18th IEEE
     Conf. on Computational Complexity (CCC’03), pp. 107–117. IEEE Comp. Soc. Press, 2003.
     [doi:10.1109/CCC.2003.1214414] 6, 7, 24

[10] A MIT C HAKRABARTI AND O DED R EGEV: An optimal lower bound on the communication
     complexity of gap-hamming-distance. SIAM J. Comput., 41(5):1299–1317, 2012. Preliminary
     versions in STOC’11 and ECCC. [doi:10.1137/120861072, arXiv:1009.3460] 20, 21

[11] A MIT C HAKRABARTI , YAOYUN S HI , A NTHONY W IRTH , AND A NDREW C HI -C HIH YAO: Infor-
     mational complexity and the direct sum problem for simultaneous message complexity. In Proc.
     42nd FOCS, pp. 270–278. IEEE Comp. Soc. Press, 2004. [doi:10.1109/SFCS.2001.959901] 6

[12] T HOMAS M. C OVER AND J OY A. T HOMAS: Elements of Information Theory (2nd ed.). Wiley-
     Interscience, 2006. 4

[13] E RIK D. D EMAINE , A LEJANDRO L ÓPEZ -O RTIZ , AND J. I AN M UNRO: Frequency estimation
     of internet packet streams with limited space. In Proc. 10th Ann. European Symp. on Algorithms
     (ESA’02), volume 2461 of LNCS, pp. 348–360. Springer, 2002. [doi:10.1007/3-540-45749-6_33] 3

[14] J EFF E DMONDS AND RUSSELL I MPAGLIAZZO: Towards time-space lower bounds on branching
     programs. Unpublished manuscript, 1993. Available on author’s website. 2

[15] J OAN F EIGENBAUM , S AMPATH K ANNAN , A NDREW M C G REGOR , S IDDHARTH S URI , AND J IAN
     Z HANG: Graph distances in the data-stream model. SIAM J. Comput., 38(5):1709–1727, 2008.
     Preliminary version in SODA’05. [doi:10.1137/070683155] 26

[16] J OAN F EIGENBAUM , S AMPATH K ANNAN , M ARTIN S TRAUSS , AND M AHESH V ISWANATHAN:
     An approximate L1 difference algorithm for massive data streams. SIAM J. Comput., 32(1):131–151,
     2002. Preliminary version in FOCS’99. [doi:10.1137/S0097539799361701] 2

[17] P HILIPPE F LAJOLET AND G. N IGEL M ARTIN: Probabilistic counting algorithms for data base
     applications. J. Comput. System Sci., 31(2):182–209, 1985. [doi:10.1016/0022-0000(85)90041-8]
     25

[18] S UMIT G ANGULY: Counting distinct items over update streams. Theoret. Comput. Sci., 378(3):211–
     222, 2007. Preliminary version in ISAAC’05. [doi:10.1016/j.tcs.2007.02.031] 25

[19] M ICHAEL B. G REENWALD AND S ANJEEV K HANNA: Space-efficient online computation of
     quantile summaries. In ACM SIGMOD Internat. Conf. on Managment of Data, pp. 58–66. ACM
     Press, 2001. [doi:10.1145/375663.375670] 26

                     T HEORY OF C OMPUTING, Volume 12 (10), 2016, pp. 1–35                        31
                 A MIT C HAKRABARTI , G RAHAM C ORMODE , AND A NDREW M C G REGOR

[20] A NDRE G RONEMEIER: Asymptotically optimal lower bounds on the NIH-multi-party informa-
     tion complexity of the AND-function and disjointness. In Proc. 26th Symp. Theoretical As-
     pects of Comp. Sci. (STACS’09), volume 3 of LIPIcs, pp. 505–516. Dagstuhl Publishing, 2009.
     [doi:10.4230/LIPIcs.STACS.2009.1846, arXiv:0902.1609] 7, 8, 9

[21] S UDIPTO G UHA , P IOTR I NDYK , AND A NDREW M C G REGOR: Sketching information divergences.
     Machine Learning, 72(1):5–19, 2008. [doi:10.1007/s10994-008-5054-x] 27

[22] S UDIPTO G UHA AND A NDREW M C G REGOR: Space-efficient sampling. In Proc. 11th Internat.
     Conf. on Artificial Intelligence and Stat. (AISTATS’07), volume 2 of JMLR Proceedings, pp. 171–178.
     JMLR, 2007. JMLR.org. 3, 24

[23] S UDIPTO G UHA AND A NDREW M C G REGOR: Stream order and order statistics: Quantile estimation
     in random-order streams. SIAM J. Comput., 38(5):2044–2059, 2009. Preliminary version in
     ICALP’07. [doi:10.1137/07069328X] 3, 10, 11, 26

[24] S UDIPTO G UHA , A NDREW M C G REGOR , AND S URESH V ENKATASUBRAMANIAN: Streaming and
     sublinear approximation of entropy and information distances. ACM Trans. Algorithms, 5(4):35:1–
     35:16, 2009. Preliminary version in SODA’06. [doi:10.1145/1597036.1597038, arXiv:cs/0508122]
     3

[25] N ICHOLAS J. A. H ARVEY, J ELANI N ELSON , AND K RZYSZTOF O NAK: Sketching and streaming
     entropy via approximation theory. In Proc. 49th FOCS, pp. 489–498. IEEE Comp. Soc. Press, 2008.
     [doi:10.1109/FOCS.2008.76] 25

[26] M ONIKA R AUCH H ENZINGER , P RABHAKAR R AGHAVAN , AND S RIDHAR R AJAGOPALAN: Com-
     puting on data streams. In JAMES M. A BELLO AND J EFFREY S COTT V ITTER, editors, External
     Memory Algorithms, volume 50 of DIMACS Ser. in Discr. Math. and Theoret. Comput. Sci., pp.
     107–118. Amer. Math. Soc., 1999. ACM DL. 2, 26

[27] P IOTR I NDYK AND DAVID P. W OODRUFF: Tight lower bounds for the distinct elements problem. In
     Proc. 44th FOCS, pp. 283–288. IEEE Comp. Soc. Press, 2003. [doi:10.1109/SFCS.2003.1238202]
     20

[28] P IOTR I NDYK AND DAVID P. W OODRUFF: Optimal approximations of the frequency moments of
     data streams. In Proc. 37th STOC, pp. 202–208. ACM Press, 2005. [doi:10.1145/1060590.1060621]
     24

[29] T. S. JAYRAM , R AVI K UMAR , AND D. S IVAKUMAR: Two applications of information complexity.
     In Proc. 35th STOC, pp. 673–682. ACM Press, 2003. [doi:10.1145/780542.780640] 6

[30] T. S. JAYRAM , R AVI K UMAR , AND D. S IVAKUMAR: The one-way communication complexity of
     Hamming distance. Theory of Computing, 4(6):129–135, 2008. [doi:10.4086/toc.2008.v004a006]
     20

                     T HEORY OF C OMPUTING, Volume 12 (10), 2016, pp. 1–35                           32
             ROBUST L OWER B OUNDS FOR C OMMUNICATION AND S TREAM C OMPUTATION

[31] H ARTMUT K LAUCK , A SHWIN NAYAK , A MNON TA -S HMA , AND DAVID Z UCKERMAN: Interac-
     tion in quantum communication and the complexity of set disjointness. IEEE Trans. Inform. Theory,
     53(6):1970–1982, 2007. Preliminary version in STOC’01. [doi:10.1109/TIT.2007.896888] 18

[32] E YAL K USHILEVITZ AND N OAM N ISAN: Communication Complexity. Cambridge Univ. Press,
     1997. 2, 5

[33] TAK WAH L AM AND WALTER L. RUZZO: Results on communication complexity classes. J.
     Comput. System Sci., 44(2):324–342, 1992. Preliminary version in SCT’89. [doi:10.1016/0022-
     0000(92)90025-E] 2

[34] P ETER B RO M ILTERSEN , N OAM N ISAN , S HMUEL S AFRA , AND AVI W IGDERSON: On data
     structures and asymmetric communication complexity. J. Comput. System Sci., 57(1):37–49, 1998.
     Preliminary version in STOC’95. [doi:10.1006/jcss.1998.1577] 13

[35] J. I AN M UNRO AND M IKE PATERSON: Selection and sorting with limited storage. Theoret. Comput.
     Sci., 12(3):315–323, 1980. Preliminary version in FOCS’78. [doi:10.1016/0304-3975(80)90061-4]
     3, 26

[36] C HRISTOS H. PAPADIMITRIOU AND M ICHAEL S IPSER: Communication complexity. J. Com-
     put. System Sci., 28(2):260–269, 1984. Preliminary version in STOC’82. [doi:10.1016/0022-
     0000(84)90069-2] 2

[37] PAVEL P UDLÁK AND J I ŘÍ S GALL: An upper bound for a communication game related to time-space
     tradeoffs. In The Mathematics of Paul Erdős I, pp. 399–407. Springer, 2013. Preliminary version
     (1995) in ECCC. [doi:10.1007/978-1-4614-7258-2_24] 2

[38] P RANAB S EN AND S RINIVASAN V ENKATESH: Lower bounds for predecessor searching in the
     cell probe model. J. Comput. System Sci., 74(3):364–385, 2008. Preliminary version in CCC’03.
     [doi:10.1016/j.jcss.2007.06.016, arXiv:cs/0309033] 13

[39] A LEXANDER A. S HERSTOV: The communication complexity of gap Hamming distance. Theory of
     Computing, 8(8):197–208, 2012. Preliminary version in ECCC. [doi:10.4086/toc.2012.v008a008]
     20

[40] M IKLÓS S IMONOVITS: Extremal graph theory. In L OWELL W. B EINEKE AND ROBIN J. W ILSON,
     editors, Selected Topics in Graph Theory, volume 2, pp. 161–200. Academic Press, 1983. 27

[41] T HOMAS V IDICK: A concentration inequality for the overlap of a vector on a large set, with
     application to the communication complexity of the gap-Hamming-distance problem. Chicago J.
     Theor. Comput. Sci., 2012(1), 2012. Preliminary version in ECCC. [doi:10.4086/cjtcs.2012.001] 20

[42] DAVID P. W OODRUFF: Optimal space lower bounds for all frequency moments. In Proc. 15th Ann.
     ACM-SIAM Symp. on Discrete Algorithms (SODA’04), pp. 167–175. ACM Press, 2004. ACM DL.
     20, 25

                     T HEORY OF C OMPUTING, Volume 12 (10), 2016, pp. 1–35                         33
                A MIT C HAKRABARTI , G RAHAM C ORMODE , AND A NDREW M C G REGOR

[43] DAVID P. W OODRUFF: The average-case complexity of counting distinct elements. In Proc. 13th
     Internat. Conf. on Database Theory (ICDT’09), volume 361 of ACM Internat. Conf. Proc. Series,
     pp. 284–295. ACM Press, 2009. [doi:10.1145/1514894.1514928] 3

[44] A NDREW C HI -C HIH YAO: Some complexity questions related to distributive computing (prelimi-
     nary report). In Proc. 11th STOC, pp. 209–213. ACM Press, 1979. [doi:10.1145/800135.804414]
     2


AUTHORS

     Amit Chakrabarti
     Dartmouth College, NH
     ac cs dartmouth edu
     http://www.cs.dartmouth.edu/~ac/


     Graham Cormode
     University of Warwick, UK
     g cormode warwick ac uk
     http://dimacs.rutgers.edu/~graham/


     Andrew McGregor
     University of Massachusetts, Amherst, MA
     mcgregor cs umass edu
     https://people.cs.umass.edu/~mcgregor/


ABOUT THE AUTHORS

      A MIT C HAKRABARTI grew up in Bangalore, received a B. Tech. from IIT Bombay, and a
         Ph. D. from Princeton University. While broadly interested in theoretical computer sci-
         ence, he is especially fascinated by impossibility phenomena in algorithms and mathemat-
         ics. During his high school days, he was known to cite Abel’s theorem on algebraically
         solving the general quintic as his favourite mathematical result. This has not changed
         yet. He enjoys playing Scrabble at the international tournament level, and various other
         games with his son at a sub-amateur level.


      G RAHAM C ORMODE grew up in England, received a B. A. from the University of Cambridge,
         and a Ph. D. from the University of Warwick. His research interests within computer
         science span algorithms, databases, networks, and security. In his vanishingly small
         spare time, he likes to solve mysteries.



                     T HEORY OF C OMPUTING, Volume 12 (10), 2016, pp. 1–35                          34
       ROBUST L OWER B OUNDS FOR C OMMUNICATION AND S TREAM C OMPUTATION

A NDREW M C G REGOR grew up in Scotland, received a B. A. and M. Math from the Univer-
   sity of Cambridge, and a Ph. D. from the University of Pennsylvania. He is interested in
   a broad range of problems in theoretical computer science and information theory. He
   sometimes cares about constants but mainly when they are in the exponent. He once
   played Scrabble against an opponent of international renown; who won will be left as a
   mystery.




               T HEORY OF C OMPUTING, Volume 12 (10), 2016, pp. 1–35                          35