Authors Elbert Du, Michael Mitzenmacher, David Woodruff, Guang Yang,
License CC-BY-3.0
T HEORY OF C OMPUTING, Volume 19 (10), 2023, pp. 1–44
www.theoryofcomputing.org
Separating 𝑘-Player from 𝑡-Player
One-Way Communication, with
Applications to Data Streams
Elbert Du Michael Mitzenmacher∗ David Woodruff
Guang Yang
Received September 25, 2021; Revised January 8, 2023; Published December 31, 2023
Abstract. In a 𝑘-party communication problem, the 𝑘 players with inputs
𝑥 1 , 𝑥2 , . . . , 𝑥 𝑘 want to evaluate a function 𝑓 (𝑥 1 , 𝑥2 , . . . , 𝑥 𝑘 ) using as little communica-
tion as possible. We consider the message-passing model, in which the inputs are
partitioned in an arbitrary, possibly worst-case manner, among a smaller number 𝑡 of
players (𝑡 < 𝑘). The 𝑡-player communication cost of computing 𝑓 can only be smaller
than the 𝑘-player communication cost, since the 𝑡 players can trivially simulate the
𝑘-player protocol. But how much smaller can it be? We study deterministic and
randomized protocols in the one-way model, and provide separations for product
input distributions, which are optimal for low error probability protocols. We also
provide much stronger separations when the input distribution is non-product.
A preliminary version of this paper, by a subset of the authors, appeared in the Proceedings of the 46th
International Colloquium on Automata, Languages and Programming, 2019 (ICALP’19) [24].
∗ Supported in part by NSF grants CCF-2101140, CNS-2107078, and DMS-2023528, and by a gift to the Center for
Research on Computation and Society at Harvard University.
ACM Classification: F.1.3, F.2.3, F.2.1
AMS Classification: 68Q17, 68W25, 68W20
Key words and phrases: streaming, space complexity, hamming norm, approximation, algo-
rithms with predictions, direct sum
© 2023 Elbert Du, Michael Mitzenmacher, David Woodruff, and Guang Yang
Licensed under a Creative Commons Attribution License (CC-BY). DOI: 10.4086/toc.2023.v019a010
A key application of our results is in proving lower bounds for data stream
algorithms. In particular, we give an optimal Ω(𝜀^{-2} log(𝑁) log log(𝑚𝑀)) bits of space
lower bound for the fundamental problem of (1 ± 𝜀)-approximating the number
‖𝑥‖_0 of non-zero entries of an 𝑁-dimensional vector 𝑥 after 𝑚 integer updates each
of magnitude at most 𝑀, with success probability ≥ 2/3, in a strict turnstile
stream. We additionally prove the matching Ω(𝜀^{-2} log(𝑁) log log(𝑇)) space lower
bound for the problem when we have access to a heavy hitters oracle with threshold
𝑇. Our results match the best known upper bounds when 𝜀 ≥ 1/polylog(𝑚𝑀) and
when 𝑇 = 2^{poly(1/𝜀)}, respectively. Our bound also improves on the prior Ω(𝜀^{-2} log(𝑚𝑀)) lower
bound and separates the complexity of approximating 𝐿_0 from that of approximating the
𝑝-norm 𝐿_𝑝 for 𝑝 bounded away from 0, since the latter has an 𝑂(𝜀^{-2} log(𝑚𝑀)) bit
upper bound.
1 Introduction
Consider a 𝑘-party communication problem, in which the players have inputs 𝑥 1 , 𝑥2 , . . . , 𝑥 𝑘
and want to compute a function 𝑓 (𝑥 1 , 𝑥2 , . . . , 𝑥 𝑘 ) of their inputs using as little communication
as possible. We consider the message-passing model, in which the inputs are partitioned
in an arbitrary, possibly worst-case manner among a smaller number 𝑡 of players. That is,
we partition {1, 2, . . . , 𝑘} into 𝑡 subsets 𝑆_1, 𝑆_2, . . . , 𝑆_𝑡 such that ∪_{𝑖=1}^{𝑡} 𝑆_𝑖 = {1, 2, . . . , 𝑘} and
𝑆_𝑖 ∩ 𝑆_𝑗 = ∅ for every 1 ≤ 𝑖 < 𝑗 ≤ 𝑡, and let the 𝑖-th player 𝑃_𝑖 hold the sequence of inputs
𝑦_𝑖 := (𝑥_{𝑖_1}, 𝑥_{𝑖_2}, . . . , 𝑥_{𝑖_{|𝑆_𝑖|}}). When we work in this model in the reduction from streaming,
the indices within each 𝑆_𝑖 are contiguous, i.e., 𝑖_1 + |𝑆_𝑖| − 1 = 𝑖_2 + |𝑆_𝑖| − 2 = · · · = 𝑖_{|𝑆_𝑖|}.
We are still interested in computing the original
function 𝑓. The total communication required can only be smaller than in the original 𝑘-player
setting, since the 𝑡 players can simulate the protocol involving the original 𝑘 players. A natural
question is: how much smaller can the communication be?
Many communication models are possible, but our main motivation for
looking at this question comes from applications to data streams (see below), and so we are
primarily interested in the one-way number-in-hand model. In this model, each of the 𝑡 players
can only see its own input. The first player composes a message 𝑚1 based on its input 𝑦1 and
sends 𝑚1 to the second player. The second player takes 𝑚1 and its input 𝑦2 to compute a message
𝑚2 for the third player, and so on. The 𝑡-th (also the last) player, upon receiving the message
𝑚𝑡−1 from the (𝑡 − 1)-st player, computes the output of the protocol based on 𝑚𝑡−1 and its own
input 𝑦_𝑡. We sometimes abuse notation and refer to the output as 𝑚_𝑡. The total communication
cost is the maximum of ∑_{𝑖=1}^{𝑡} |𝑚_𝑖|, where |𝑚_𝑖| denotes the length of the 𝑖-th message and the
maximum is taken over all possible inputs 𝑦_1, . . . , 𝑦_𝑡 (which form a partition of {𝑥_1, . . . , 𝑥_𝑘}) and
all random coin tosses of the players. For streaming applications we are especially interested in
max_{𝑖∈{1,...,𝑡}} |𝑚_𝑖|.
To explain the connection to data streams: almost all known lower bounds on the
memory required by a data stream algorithm are proven via communication complexity, or at
least can be reformulated using communication complexity. The basic idea is to partition the
elements of an input stream, say 𝑘 elements, contiguously among a possibly smaller
number 𝑡 of players. Then one argues that if there is a data stream algorithm solving the problem,
then the communication problem can be solved by passing the memory contents as messages
from player to player. Note that this naturally gives rise to the one-way number-in-hand model.
Since the total communication cost is at most 𝑡 · 𝑆, where 𝑆 is the size of the memory of the streaming
algorithm, if the randomized 𝑡-player communication complexity of the function 𝑓 is 𝐶𝐶_𝑡, we
must have 𝑆 ≥ 𝐶𝐶_𝑡/𝑡. Many lower bounds in data streams are proven already with two players.
However, it is known that for some functions more players are needed to obtain stronger lower
bounds, such as for estimating the frequency moments in insertion only streams (see, e. g., [3, 25]
and references therein).
One cannot help but ask: how powerful is communication complexity for proving data stream lower
bounds? Another natural question is: for a given function 𝑓, among how many players 𝑡 should one
partition the stream? Yet another question concerns the input distribution – should it be a
product distribution for which the inputs to the players are chosen independently, or should the
inputs be drawn from a non-product distribution to obtain the best space lower bounds? Since
we are interested in the limits of using 𝑡 players for establishing lower bounds for data stream
algorithms, we allow the original 𝑘 inputs (which correspond to the 𝑘 elements in a stream) to
be partitioned in the worst possible way for a 𝑡-player communication protocol, as this will give
the strongest possible lower bound.
1.1 Our results
In this paper we study these communication questions and their connections to data streams.
We first make the simple observation that for non-product input distributions, the commu-
nication complexity can be arbitrarily smaller if we partition the 𝑘 inputs into 𝑡 < 𝑘 players.
Indeed, consider the 𝑘-player set disjointness problem in which the 𝑖-th player, 1 ≤ 𝑖 ≤ 𝑘, has
a set 𝑆 𝑖 ⊆ [𝑛], where for notational simplicity we define [𝑛] := {1, 2, . . . , 𝑛} for 𝑛 ∈ ℕ . The
input distribution satisfies the promise that either (1) 𝑆 𝑖 ∩ 𝑆 𝑗 = ∅ for every 1 ≤ 𝑖 < 𝑗 ≤ 𝑘, or (2)
there is a unique item 𝑎 ∈ [𝑛] such that 𝑎 ∈ 𝑆_𝑖 for all 𝑖 ∈ [𝑘], and for any other 𝑎′ ≠ 𝑎, there is
at most one 𝑖 ∈ [𝑘] for which 𝑎′ ∈ 𝑆_𝑖. It is well-known that the randomized communication
complexity of this problem is Ω (𝑛/𝑘) [3, 9, 12], and that the bound holds even for multiple
rounds of communication and when players share a common blackboard. However, if we look
at 𝑡 < 𝑘 players and an arbitrary, even worst-case, mapping of the input sets 𝑆_1, . . . , 𝑆_𝑘 to
the 𝑡 players, then by the pigeonhole principle there exists a player who gets two input sets
𝑆 𝑖 , 𝑆 𝑗 with 𝑖 ≠ 𝑗. Now this player can locally determine the output of the function by checking if
𝑆 𝑖 ∩ 𝑆 𝑗 = ∅. Thus with 𝑡 < 𝑘 players the problem is solvable using 𝑂 (1) bits per player. This
simple argument shows that for non-product distributions, there can be an arbitrarily large
gap between the 𝑘-player and the 𝑡-player worst-case-partitioned randomized communication
complexities. Note that this example applies to a symmetric problem, meaning that the 𝑘-player
set disjointness problem is invariant under any one-to-one assignment of 𝑥 1 , . . . , 𝑥 𝑘 to the 𝑘
players.
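The pigeonhole argument above can be made concrete with a small sketch. This is a hedged illustration with our own names; the round-robin mapping stands in for an arbitrary assignment of the 𝑘 sets to 𝑡 < 𝑘 players.

```python
# With t < k players, some player receives two of the k sets and can
# decide promise set disjointness locally with O(1) communication.

def assign_sets(sets, t):
    """An arbitrary (here: round-robin) mapping of k sets to t players."""
    players = [[] for _ in range(t)]
    for j, s in enumerate(sets):
        players[j % t].append(s)
    return players

def local_decision(players):
    """By pigeonhole some player holds >= 2 sets; under the promise,
    the k sets share a common element iff any two of them intersect."""
    for held in players:
        if len(held) >= 2:
            return bool(held[0] & held[1])  # True: intersecting instance
    raise AssertionError("t < k guarantees a player with two sets")

# promise case (2): all k = 5 sets share the element 7
k_sets = [{7, i} for i in range(5)]
assert local_decision(assign_sets(k_sets, 3))        # t = 3 < k
# promise case (1): pairwise disjoint sets
disjoint = [{i} for i in range(5)]
assert not local_decision(assign_sets(disjoint, 3))
```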
Perhaps surprisingly, and this is one of the main messages of our work: for symmetric
functions and product input distributions, we show that for any 𝑡 < 𝑘, for deterministic
one-way communication complexity or randomized one-way communication complexity with
error probability 1/poly(𝑘), the gap between the 𝑘-player and 𝑡-player communication
complexities is at most a multiplicative 𝑂(1) factor in maximum message length (the maximum
communication from a single player), and 𝑂(𝑘) in total communication. Further, this gap is tight,
as there are problems for which the input distribution is a product distribution, and the 𝑡-player
communication with 1/poly(𝑘) error probability is 𝑂(log 𝑘) for constant 𝑡 = 𝑂(1), while the
𝑘-player communication with 1/poly(𝑘) error probability is Ω(𝑘 log 𝑘).
Thus, the answer for product input distributions is significantly different than what we saw
for non-product distributions, even for symmetric functions.
We also show that for protocols with constant error and under product input distributions,
the gap is at most a multiplicative 𝑂(log 𝑘) factor in message length and 𝑂(𝑘 log 𝑘) in total
communication. Further, we show that there exists a symmetric function and input distribution
which is product on any 𝑘 − 1 out of 𝑘 inputs, for which this gap is best possible. We leave open
the question of the existence of a symmetric function and product input distribution (on all 𝑘
inputs rather than 𝑘 − 1 out of 𝑘) which realizes this gap for constant error protocols.
One takeaway message from our results is that when showing space lower bounds for data
stream algorithms computing symmetric functions on product distributions, by looking at
2-player communication complexity (which is by far the most common communication setup),
there is only an 𝑂(1) factor loss for error probability 1/poly(𝑘) protocols, and an 𝑂(log 𝑘) factor
loss for constant error protocols.
However, for non-product distributions, which are often needed to show hardness of
approximation in data streams (such as for the frequency moments [3]), one may need to use as
many as 𝑘 players in order to obtain a non-trivial lower bound from communication complexity.
1.1.1 Data stream lower bounds
As a key application of our lower bound techniques, we provide a space lower bound for
(1 ± 𝜀)-approximating the Hamming norm in the strict turnstile model. This problem, also
known as 𝐿_0 norm estimation and denoted by T_𝜀, requires estimating ‖x‖_0 := |{𝑖 | 𝑥_𝑖 ≠ 0}|
of a vector x = (𝑥_1, . . . , 𝑥_𝑁) and outputting an estimate 𝐹̃ for which (1 − 𝜀)‖x‖_0 ≤ 𝐹̃ ≤ (1 + 𝜀)‖x‖_0
with constant probability. The vector x is initialized to all zeros and undergoes a sequence
of 𝑚 updates each of the form (𝑖, 𝑣) ∈ [𝑁] × [±𝑀], where [±𝑀] := {0, ±1, . . . , ±𝑀} and each
update (𝑖, 𝑣) causes 𝑥_𝑖 ← 𝑥_𝑖 + 𝑣. In the strict turnstile model 𝑥_𝑖 ≥ 0 holds for all 𝑖 and at
all points in the stream. We obtain an Ω(𝜀^{-2} log(𝑁) log log(𝑚𝑀)) bits of space lower bound
for (1 ± 𝜀)-approximating the Hamming norm. This lower bound matches the best known
upper bound 𝑂(𝜀^{-2} log(𝑁)(log(1/𝜀) + log log(𝑚𝑀))) [16] for any 𝜀 ≥ 1/polylog(𝑚𝑀). Note
that 𝜀 ≥ 1/polylog(𝑚𝑀) is required in order to obtain polylogarithmic space, and so this is the most
common setting of parameters.
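The update model can be sketched as follows. This toy example (our own, not from the paper) stores x exactly for illustration, whereas the whole point of the lower bound is the space needed to *approximate* ‖x‖_0 without storing x.

```python
# A minimal sketch of the strict turnstile update model: updates (i, v)
# set x_i <- x_i + v, and every prefix of the stream keeps all
# coordinates non-negative. The Hamming norm ||x||_0 counts non-zeros.

def apply_updates(N, updates):
    x = [0] * N
    for i, v in updates:
        x[i] += v
        assert x[i] >= 0, "strict turnstile promise violated"
    return x

def hamming_norm(x):
    """||x||_0 = number of non-zero entries of x."""
    return sum(1 for xi in x if xi != 0)

updates = [(0, 2), (3, 1), (0, -2), (4, 5)]
x = apply_updates(8, updates)
assert hamming_norm(x) == 2   # only coordinates 3 and 4 are non-zero
```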
Perhaps surprisingly, there is an upper bound of 𝑂(𝜀^{-2} log(𝑚𝑀)) bits of space for (1 ± 𝜀)-
approximating 𝐿_𝑝 for 𝑝 > 0 [15] (improving an earlier 𝑂(log²(𝑁)) bound of [11]; see also a
time-efficient version in [14]), and thus we provide a strict separation between the complexities for
𝑝 = 0 and 𝑝 > 0.
The Hamming norm has many applications, as it corresponds to estimating the number of
distinct values, and can be used to estimate set union and intersection sizes (see [8] where it was
introduced).
Lower Bounds in the Learning Augmented Setting
Recently, there has been a growing interest
in using machine learning to infer information about the stream that would be useful for solving
certain problems in the streaming setting. In this learning augmented setting, we have access
to an oracle (which in practice would have some degree of error and could be implemented
with machine learning). Learned oracles have been used to develop improved algorithms for
various problems, including frequency estimation [10], caching [19], scheduling [20], frequency
moments [13], and more. A fairly comprehensive survey of learning augmented algorithms can
be found in [21].
In our setting, the oracle provides an additional operation: we can give the oracle a coordinate,
and the oracle will tell us whether the frequency of this coordinate at the end of the stream is at
least 𝑇 for a threshold 𝑇. We refer to this oracle as the heavy hitters oracle. Approximate heavy
hitter oracles have been used for frequency estimation [10].
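The oracle's interface can be sketched as follows. This is a hedged sketch with our own names; the "perfect oracle" of the lower bound knows the end-of-stream frequencies and costs no space, whereas a practical oracle would be a learned predictor with some error.

```python
# A sketch of the heavy hitters oracle interface: queried with a
# coordinate, it answers whether that coordinate's frequency at the
# end of the stream is at least the threshold T.

class HeavyHittersOracle:
    def __init__(self, final_frequencies, T):
        # The idealized oracle is given the end-of-stream frequencies;
        # queries incur no space cost in the lower-bound model.
        self.freq = final_frequencies
        self.T = T

    def is_heavy(self, i):
        """True iff coordinate i's final frequency is at least T."""
        return self.freq.get(i, 0) >= self.T

oracle = HeavyHittersOracle({0: 10, 2: 3}, T=5)
assert oracle.is_heavy(0) and not oracle.is_heavy(2)
assert not oracle.is_heavy(7)   # an unseen coordinate has frequency 0
```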
We derive a new method to prove space lower bounds even with a perfect heavy hitters
oracle (that is, an oracle that can be accessed with no space cost which always answers correctly
whether the frequency of the coordinate is at least 𝑇). We use this method to prove a lower
bound of Ω 𝜀−2 log(𝑁) log log(𝑇) for approximating the 𝐿0 norm, which is optimal when
𝑇 = 2poly(1/𝜖) as it matches the upper bound in [13]. To prove this, we prove and use a slightly
modified version of the direct sum theorem for Viola’s problem, which will be stated in the
following section.
1.2 Notable changes from the conference version
In this version of our paper, we have substantially updated Section 7. First, the result was
generalized to the setting with a heavy hitters oracle as described in the paragraph above. Second,
the proof of the bound in the previous version used an incorrect reduction to Gap-Hamming. In
this version, the issue was resolved by instead reducing to Gap-Orthogonality.
1.3 Technical overview
We first illustrate the idea behind showing there is no gap between 𝑘-player and 2-player
deterministic one-way communication complexity. The first player 𝑃1 of the 𝑘-player protocol
pretends to be Alice, the first player of the 2-player protocol, to create the message 𝑚1 as Alice
would do and sends it to the second player 𝑃2 of the 𝑘-player protocol. Having received this
message 𝑚1 , 𝑃2 enumerates over all possible inputs of 𝑃1 until finding one which would cause 𝑃1
to send 𝑚1 . Since the protocol is deterministic and it evaluates a function defined on a product
domain,1 meaning that it is a total function on a domain of the form 𝑆1 × 𝑆2 × · · · × 𝑆 𝑘 , the
1Note that while we will be working with non-product input distributions, the function is still defined on all
inputs, including ones that occur with probability 0 in the distribution we are working with.
function value must be the same as long as 𝑃1 ’s input results in the same message 𝑚1 to be sent.
So 𝑃2 can arbitrarily pick one of those inputs as its guess for 𝑃1 . Now 𝑃2 has a guess 𝑥 for 𝑃1 ’s
input together with its own input 𝑦, and 𝑃2 can simulate Alice in the 2-player protocol. This
is feasible because the 2-player protocol works under any partitioning of the inputs. Then 𝑃2
sends to the third player 𝑃3 the message that Alice would send to Bob in the 2-player protocol,
given that Alice had input (𝑥, 𝑦). In case a player 𝑃_𝑖 cannot determine, from its own input and
the received message 𝑚_{𝑖−1}, how many input items have been processed (which is important
for its simulation of the 2-player protocol), an additional index of logarithmically many bits
carrying this information is passed along with the simulated messages.
In this way, the entire 𝑘-player protocol can be simulated, and the per-player communication
equals the communication of the 2-player protocol between Alice and Bob, sometimes plus
the additional logarithmically many bits for the index. Moreover, both protocols are deterministic.
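The key reconstruction step can be sketched as follows. This is our own toy illustration: the 1-bit "protocol" stands in for an arbitrary deterministic 2-player protocol, and the point is only that any input consistent with the observed message is as good as the true one.

```python
# A toy sketch of the deterministic simulation: player P_2, seeing only
# the first message m_1, enumerates candidate inputs for P_1 and adopts
# any input that would have produced m_1. For a deterministic protocol
# on a product domain, all such inputs lead to the same function value.

def alice_message(x):
    """Stand-in for the deterministic 2-player protocol's first message
    (here: a crude 1-bit summary of Alice's input)."""
    return x % 2

def reconstruct_input(m1, domain):
    """P_2 picks an arbitrary input of P_1 consistent with message m_1."""
    for x in domain:
        if alice_message(x) == m1:
            return x
    raise ValueError("no input of P_1 is consistent with m_1")

domain = range(10)
x_true = 7
m1 = alice_message(x_true)
x_guess = reconstruct_input(m1, domain)
# the guess need not equal x_true, but it induces the same message,
# so P_2 can continue the simulation with (x_guess, y) in Alice's role
assert alice_message(x_guess) == m1
```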
For the randomized case with a product input distribution, we first consider 2-player
protocols with error probability 1/poly(𝑘).
We would like to run the same simulation as for deterministic protocols, except now it is
unclear how the second player 𝑃2 can reconstruct a valid input 𝑥 for the first player 𝑃1 from the
first message 𝑚. A natural thing would be for 𝑃2 to choose the input 𝑥 = 𝑥 𝑚 to 𝑃1 for which
the probability of sending 𝑚, given that 𝑃1 ’s input is 𝑥 𝑚 , is greatest. This is not correct though,
since the overall probability of 𝑃1 holding 𝑥 𝑚 and sending 𝑚 may be less than the 1/poly(𝑘)
error bound and the protocol could afford to be always wrong on such a combination of 𝑥 𝑚 and
𝑚. Thus we need some balancing between two probabilities: (i) the first player 𝑃1 sends 𝑚 on
input 𝑥; and (ii) the protocol output is correct given that 𝑃1 has input 𝑥 and sends 𝑚.
The above naturally suggests that we should impose an input product distribution 𝜇. Then
it must be that for a good fraction of 𝑥, weighted according to 𝜇, the 𝑘-player protocol is correct
when the first player has input 𝑥 and sends message 𝑚. Thus we can sample 𝑥 from the
conditional distribution on 𝜇 given that message 𝑚 is sent. Here, for correctness, it is crucial that
𝜇 is a product distribution; this ensures that for most settings of the remaining players' inputs
(weighted according to 𝜇) and for most choices of 𝑥 (weighted according to 𝜇) giving rise to 𝑚,
the function evaluated on the inputs is the same, and 𝑥 can be sampled independently of the
remaining inputs.
Once we have sampled 𝑥, and given that the second player has private input 𝑦 in the 𝑘-player
protocol, we can then have the second player pretend to be Alice of a randomized 2-player
protocol with input (𝑥, 𝑦), similar to the deterministic case. Ultimately, we will show that under
distribution 𝜇 we obtain a protocol with total communication at most 𝑂(𝑘) times that of the
2-player protocol with error probability 1/poly(𝑘), where the factor 𝑘 comes from the number of
invocations of the 2-player protocol. The maximum message length, which is an important
resource measure in our setting, blows up by at most an 𝑂(1) multiplicative factor over that of
the 2-player protocol.
We illustrate the optimality of the randomized reduction above by looking at the Sum-Equal
problem studied by Viola [23]: in this problem each of 𝑘 players holds an input 𝑥_𝑖 mod 𝑝,
where 𝑝 = Θ(𝑘^{1/4}) is a prime, and they wish to determine whether ∑_𝑖 𝑥_𝑖 ≡ 0 or 1 (mod 𝑝).
Viola shows this problem has randomized communication complexity Θ(𝑘 log 𝑘), both for
randomized protocols with constant error probability and for deterministic protocols (and
thus also randomized protocols with 1/poly(𝑘) error probability). Moreover, for randomized
protocols with 1/poly(𝑘) error probability, Viola's Ω(𝑘 log 𝑘) lower bound holds even for a
product distribution on the inputs (where if ∑_𝑖 𝑥_𝑖 mod 𝑝 ∉ {0, 1} the output can be arbitrary).
We observe that under any partition of the inputs between two players, Alice and Bob, the problem
can be solved with 𝑂(log 𝑘) bits with probability 1 − 1/poly(𝑘), just by running an equality test
between the sum modulo 𝑝 of Alice and the negated sum modulo 𝑝 of Bob. Thus, this illustrates that
the factor-𝑂(𝑘) gap in total communication for protocols for product input distributions with
1/poly(𝑘) error probability is optimal.
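The 2-player upper bound can be sketched as follows. Since 𝑝 = Θ(𝑘^{1/4}), a residue modulo 𝑝 is itself only 𝑂(log 𝑘) bits, so this sketch (our own simplification) just sends Alice's partial sum directly rather than running a randomized equality test.

```python
# A hedged sketch of the 2-player Sum-Equal protocol: Alice sends her
# partial sum mod p in O(log p) = O(log k) bits; Bob adds his own
# partial sum and checks whether the total is 0 or 1 mod p (the
# promise of the problem guarantees it is one of the two).

def sum_equal_two_player(alice_inputs, bob_inputs, p):
    m1 = sum(alice_inputs) % p           # Alice's O(log k)-bit message
    total = (m1 + sum(bob_inputs)) % p
    assert total in (0, 1), "promise of Sum-Equal violated"
    return total == 0                    # True iff the sum is 0 mod p

p = 7
assert sum_equal_two_player([3, 5], [6], p)       # 14 = 0 mod 7
assert not sum_equal_two_player([3, 5], [7], p)   # 15 = 1 mod 7
```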
On the other hand, for constant error protocols and a product input distribution, there is a
2-player 𝑂 (1) bit upper bound in the public coin model which comes from running an equality
test with constant error probability (since we measure error with respect to an input distribution,
equality has an 𝑂(1) upper bound with constant error).
We note that the 𝑘-player protocol has communication Ω(𝑘 log 𝑘) for constant error protocols,
which gives the Ω(𝑘 log 𝑘) factor gap we claimed. The only downside is that the Ω(𝑘 log 𝑘)
lower bound holds for an input distribution which is product on 𝑘 − 1 out of 𝑘 players, rather
than all 𝑘 players. We leave it as an open question to give an optimal separation for product
input distributions for constant error probability.
Given the importance of Viola's problem in showing separations, we next show a direct
sum theorem for his problem, showing its communication complexity increases to Ω(𝑘𝑚 log 𝑘)
for solving a constant fraction of 𝑚 independent copies. This additionally confirms that the
Ω(𝑘 log 𝑘) factor gap noted above is multiplicative and not additive. For technical reasons we
require 𝑚 < 𝑘^𝑐 for a constant 𝑐, as discussed in Remark 6.6, but we suspect this may be an
artifact of the proof.
To show the direct sum theorem for Viola’s problem, one issue is that, unlike for two players
where the technique of information complexity often provides direct sum theorems, for 𝑘-players
the analogues are much weaker. A natural route would be to take Viola’s corruption bound,
argue it implies a high information bound, and then apply standard direct sum theorems for
information. This approach does not give an information cost lower bound on private coin
protocols, though one can fix it for two players using [5], which improves upon a bound in [6].
However, for 𝑘 players similarly strong bounds are unknown. Another natural approach is to
use the fact that if a problem has a corruption bound, then one immediately has a direct sum for
it [4]. Again though, this holds only for two players or the number-on-forehead model, and not for our
setting.
Instead, our proof is inspired by Viola’s rectangle argument for a single copy of the Sum-Equal
problem, where each rectangle, restricted to the first 𝑘 − 1 players, is a product distribution
on which the protocol generates a message to the 𝑘-th player. We use a rectangle argument
on multiple copies where the output is now a binary vector instead of a single bit. The main
obstacle is that we must consider the Hamming distance between the protocol output and
the correct answer in a vector space, which is much more involved than studying the error
probability for a single instance. The intuition of our proof is that for every large rectangle,
there must be linearly many copies that appear (almost) uniformly random in the last player’s
view. The above argument is fairly intricate, and involves several levels of conversion:
(i) a large rectangle implies large conditional entropy in many players’ inputs;
(ii) the large entropy of all copies implies we have min-entropy at least 1 on many copies;
(iii) a random variable of min-entropy at least 1 can always be decomposed into a convex
combination of uniform distributions over two elements;
(iv) the summation of 𝜔(𝑝^2) independent random variables, each drawn from a uniform
distribution over two elements, turns out to be nearly uniform over ℤ_𝑝 by a simple
argument based on the primitive roots of unity, and hence many Sum-Equal copies
look uniform to the last player.
Thus, the last player can hardly outperform a random guess. Note that it is insufficient to
prove uniformity for many copies individually (which is not too hard using the same idea as in
Viola’s proof), since such a situation could be simulated with a much smaller rectangle with
very small error. We instead perform our rectangle argument inductively to show most copies
appear almost uniform, even if conditioned on previous copies.
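Step (iv) above can be sketched in a few lines; this reconstruction is our own, with constants suppressed.

```latex
% Let X_1,\dots,X_m be independent, with each X_j uniform over two
% distinct values a_j \ne b_j in \mathbb{Z}_p. For a nontrivial
% character \omega = e^{2\pi i r/p} with r \not\equiv 0 \pmod p:
\[
  \bigl|\mathbb{E}\,\omega^{X_j}\bigr|
  = \Bigl|\cos\Bigl(\tfrac{\pi r (a_j - b_j)}{p}\Bigr)\Bigr|
  \le 1 - \Omega\!\left(\tfrac{1}{p^2}\right),
\]
% since r(a_j - b_j) \not\equiv 0 \pmod p. Hence, for S = \sum_j X_j,
\[
  \bigl|\mathbb{E}\,\omega^{S}\bigr|
  = \prod_{j=1}^{m}\bigl|\mathbb{E}\,\omega^{X_j}\bigr|
  \le \left(1 - \Omega\!\left(\tfrac{1}{p^2}\right)\right)^{m} = o(1)
  \quad\text{for } m = \omega(p^2),
\]
% so every nontrivial Fourier coefficient of S vanishes and S is
% nearly uniform over \mathbb{Z}_p.
```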
This direct sum technique has further applications. One application of the direct sum
technique, with slight modifications, is to prove a lower bound for approximating the Hamming
norm in a strict turnstile stream. Using a result of [2], to show lower bounds for streaming
algorithms in the strict turnstile model, it suffices to show lower bounds in the simultaneous
communication model, where each player simultaneously sends a linear sketch to a referee
who outputs the answer. To get the desired direct sum property, we have a chain of reductions
leading to the Sum-Equal problem, for which we compute the information complexity.
Specifically, we consider a composition of the Gap-Orthogonality problem on top of the
Sum-Equal instances as well as an augmented index version of the composed problem. When
we compose these problems, each coordinate of the Gap-Orthogonality problem becomes a
Sum-Equal instance, and we show that in order to solve Gap Orthogonality, we must solve most
of the Sum-Equal instances. Thus, we can use a direct sum to bound the information cost of
the composed problem in a similar manner as in [25]. We then prove that approximating the
Hamming norm reduces to the augmented index version of this, which allows us to bound its
communication complexity and accordingly its streaming complexity.
In the augmented problem we additionally give a referee an index 𝑖 and the answers to all
copies 𝑗, with 𝑗 > 𝑖. Similar augmentation has been studied for 𝐿 𝑝 -norms [15]. This allows us to
reduce our communication problem to Hamming norm approximation, and ultimately prove
our data stream lower bound.
2 Preliminaries
A function 𝑓 : Σ^𝑘 → Γ is called a 𝑘-party symmetric function if for every (𝑥_1, 𝑥_2, . . . , 𝑥_𝑘) ∈ Σ^𝑘 and
for every permutation 𝜎 over {1, 2, . . . , 𝑘}, we have 𝑓(𝑥_1, . . . , 𝑥_𝑘) = 𝑓(𝑥_{𝜎(1)}, . . . , 𝑥_{𝜎(𝑘)}).
A 𝑘-dimensional vector space 𝑆 is called a product space if it can be represented as 𝑆 =
𝑆1 × 𝑆2 × · · · × 𝑆 𝑘 . A distribution 𝜇 is called a product distribution if it is obtained by taking the
product of 𝑘 independent distributions, i. e., 𝜇 = 𝜇1 × 𝜇2 × · · · × 𝜇 𝑘 .
In the 𝑡-player communication complexity model, there are 𝑡 computationally unbounded
players, e. g., 𝑃1 , . . . , 𝑃𝑡 , required to compute a function 𝑓 : 𝑋1 × · · · × 𝑋𝑡 → 𝑌, where 𝑓
is usually a 𝑡-party symmetric function. Each player 𝑃𝑖 is given a private input 𝑥 𝑖 ∈ 𝑋𝑖
and follows a fixed protocol to exchange messages. For every input (𝑥1 , . . . , 𝑥 𝑡 ), the mes-
sage transcript is denoted by Π𝑡 (𝑥 1 , . . . , 𝑥 𝑡 ) when all players follow the protocol Π𝑡 (when
Π_𝑡 is randomized, Π_𝑡(𝑥_1, . . . , 𝑥_𝑡) is a random variable over the players' random
coins). A deterministic protocol Π_𝑡 computes 𝑓 if there is a function Π_{𝑜𝑢𝑡} such that
Π_{𝑜𝑢𝑡}(Π_𝑡^{(𝑡)}(𝑥_1, . . . , 𝑥_𝑡), 𝑥_𝑡) ≡ 𝑓, where Π_𝑡^{(𝑡)}(𝑥_1, . . . , 𝑥_𝑡) denotes 𝑃_𝑡's view under the execution of
Π_𝑡 on input (𝑥_1, . . . , 𝑥_𝑡), and for simplicity we let Π_{𝑜𝑢𝑡}(𝑥_1, . . . , 𝑥_𝑡) := Π_{𝑜𝑢𝑡}(Π_𝑡^{(𝑡)}(𝑥_1, . . . , 𝑥_𝑡), 𝑥_𝑡).
A 𝛿-error randomized protocol Π𝑡 for 𝑓 requires the existence of Π𝑜𝑢𝑡 such that for all inputs
(𝑥 1 , . . . , 𝑥 𝑡 ), PrΠ𝑡 [Π𝑜𝑢𝑡 (𝑥 1 , . . . , 𝑥 𝑡 ) = 𝑓 (𝑥1 , . . . , 𝑥 𝑡 )] ≥ 1 − 𝛿. The communication cost of Π𝑡 is the
maximum size of Π𝑡 (𝑥 1 , . . . , 𝑥 𝑡 ) over all 𝑥1 , . . . , 𝑥 𝑡 and all random coins. The 𝑡-player deterministic
communication complexity, denoted by DCC_𝑡(𝑓), is the cost of the best 𝑡-player deterministic
protocol Π_𝑡 for 𝑓. Analogously, the 𝑡-player 𝛿-error randomized communication complexity, denoted by
RCC_{𝑡,𝛿}(𝑓), is the cost of the best 𝑡-player 𝛿-error randomized protocol Π_𝑡 for 𝑓.
Given a 𝑘-party function 𝑓 : 𝑋1 × · · · × 𝑋 𝑘 → 𝑌 and 𝑡 < 𝑘, we define DCC𝑡 ( 𝑓 ) and RCC𝑡,𝛿 ( 𝑓 )
under a worst-case partition of inputs. That is, for every partition 0 = 𝑖_0 ≤ 𝑖_1 ≤ · · · ≤ 𝑖_𝑡 = 𝑘, let
𝑓_𝑡(𝑧_1, . . . , 𝑧_𝑡) = 𝑓(𝑥_1, . . . , 𝑥_𝑘) where 𝑧_𝑗 := (𝑥_{𝑖_{𝑗−1}+1}, . . . , 𝑥_{𝑖_𝑗}); the 𝑡-player communication
complexity of 𝑓 is defined with respect to the worst choice of 𝑓_𝑡, i. e., DCC_𝑡(𝑓) := max_{𝑓_𝑡} DCC_𝑡(𝑓_𝑡)
and RCC_{𝑡,𝛿}(𝑓) := max_{𝑓_𝑡} RCC_{𝑡,𝛿}(𝑓_𝑡).
Given a 𝑡-party function 𝑓 and its input distribution 𝜇, we let DCC^𝜇_{𝑡,𝛿}(𝑓) denote the
communication cost of the best 𝑡-player deterministic protocol Π_𝑡 computing 𝑓 such that
Pr_{𝑥∼𝜇}[Π_{𝑜𝑢𝑡}(𝑥) ≠ 𝑓(𝑥)] ≤ 𝛿. Similarly we define RCC^𝜇_{𝑡,𝛿}(𝑓) for randomized protocols.
In the one-way communication model [22, 1, 17],² the 𝑖-th player sends exactly one message
to the (𝑖 + 1)-st player for 𝑖 ∈ [𝑡 − 1] following Π_𝑡, and then 𝑃_𝑡 announces the output of Π_𝑡 as
specified by Π_{𝑜𝑢𝑡}. Note that in this setting there are only 𝑡 − 1 messages, sent by 𝑃_1, . . . , 𝑃_{𝑡−1},
and we do not count the final output announced by 𝑃_𝑡 in the communication, in order to best
correspond to streaming algorithms. This is also known as a sententious protocol in previous
work, e. g., [23]. We denote the deterministic and randomized 𝑡-player one-way communication
complexities of 𝑓 by →DCC_𝑡(𝑓) and →RCC_{𝑡,𝛿}(𝑓), respectively.
In the common reference string model (a. k. a. CRS model), there is a sequence of public random
coins, which is by default a uniformly random binary string, accessible to all players. The
obvious advantage of communication in the CRS model is that players have access to the same
random string and thus save the cost of synchronizing their private coins.
A streaming algorithm is an algorithm that scans the input (𝑥_1, . . . , 𝑥_𝑚) ∈ Σ^𝑚 as 𝑚 stream
input items in sequence, updates its internal memory of size 𝑠 = 𝑜(𝑚 log |Σ|) (i. e., a streaming
automaton with 2^𝑠 states, where the space cost of updating the internal memory is not accounted
2We are aware that there are errors in [17]. This does not affect our results in any way as we do not use any
theorems from this work.
for), and finally outputs a function 𝑓 (𝑥 1 , . . . , 𝑥 𝑚 ) evaluated on all input items. If the best3
deterministic streaming algorithm computes 𝑓 with 𝑠 bits of memory and 𝑡 passes over the data
stream, then we say the deterministic streaming complexity of 𝑓 is 𝑠𝑡, denoted by DSC( 𝑓 ) = 𝑠𝑡.
The 𝛿-error streaming complexity of 𝑓 is defined analogously (with reference to the best 𝛿-error
randomized streaming algorithm) and is denoted by RSC𝛿 ( 𝑓 ) = 𝑠𝑡. In a popular and standard
setting, a streaming algorithm scans the input stream in a single pass and only processes every
input item once. The necessary amount of memory required by such single-pass algorithms
is called the single-pass deterministic/𝛿-error streaming complexity, denoted by →DSC(𝑓) and
→RSC_𝛿(𝑓), respectively.
Note that every streaming algorithm can be naturally interpreted as a communication
protocol where each party holds some (possibly an empty set of) input items on the stream and
the messages capture the memory updates. The connection between streaming complexity and
communication complexity is captured by the following lemma.
Lemma 2.1. For every function $f$, every error tolerance $\delta$, and every $k \in \mathbb{N}$, it holds that
$$\mathrm{DSC}(f) \ge \frac{1}{k} \cdot \mathrm{DCC}_k(f), \qquad \mathrm{RSC}_\delta(f) \ge \frac{1}{k} \cdot \mathrm{RCC}_{k,\delta}(f).$$
Furthermore, similar relations hold for single-pass streaming complexities versus $k$-player one-way communication complexities:
$$\overrightarrow{\mathrm{DSC}}(f) \ge \frac{1}{k-1} \cdot \overrightarrow{\mathrm{DCC}}_k(f), \qquad \overrightarrow{\mathrm{RSC}}_\delta(f) \ge \frac{1}{k-1} \cdot \overrightarrow{\mathrm{RCC}}_{k,\delta}(f).$$
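The simulation behind Lemma 2.1 can be sketched concretely. Below is a minimal, illustrative Python sketch (not from the paper; the helper names are ours): a toy streaming algorithm for parity uses $s = 1$ bit of memory, and $k$ players holding an arbitrary partition of the stream simulate it by forwarding the $s$-bit state one-way, for at most $(k-1) \cdot s$ bits of communication.

```python
# A minimal sketch of the simulation behind Lemma 2.1, using a toy
# streaming algorithm (parity of a bit stream, s = 1 bit of memory).
# The stream is split arbitrarily among k players; each player resumes
# the automaton from the forwarded state and passes the new state on,
# so the one-way communication is at most (k - 1) * s bits.

def parity_update(state: int, item: int) -> int:
    """One streaming update: s = 1 bit of internal memory."""
    return state ^ (item & 1)

def k_player_simulation(parts: list[list[int]]) -> int:
    """Each player runs the automaton on its part and forwards the state."""
    state = 0  # initial memory of the streaming algorithm
    for part in parts:           # players speak in order P_1, ..., P_k
        for item in part:
            state = parity_update(state, item)
        # 'state' is the s-bit message sent to the next player
    return state                 # last player announces the output

stream = [1, 0, 1, 1, 0, 1]
parts = [[1, 0], [1], [1, 0, 1]]   # an arbitrary partition among k = 3 players
assert k_player_simulation(parts) == sum(stream) % 2
```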
Next, we introduce the linear sketch model of communication. In this setting, we have $n$ players, the last of whom is the referee, and the only protocols allowed are of the following form: there is some matrix $A$ such that if player $i$ receives input $x_i$, they compute $Ax_i$ and send the result to the referee. The referee then computes $\sum_{i=1}^{n} Ax_i$ and uses the result to compute the answer. We denote the randomized communication complexity of a function $f$ in this model by $\mathrm{RCC}^{LIN}_{k,\delta}(f)$.⁴
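The linear sketch model can be illustrated with a toy example (an illustrative sketch with arbitrary choices of the matrix $A$ and dimensions, not a protocol from the paper): each player sends $Ax_i$, and the referee only works with the sum of the sketches, which by linearity equals $A$ applied to the aggregate input.

```python
import numpy as np

# A toy instance of the linear sketch model: every player applies the
# same matrix A to its input vector and sends A @ x_i; the referee only
# sees the coordinate-wise sum of the sketches.

rng = np.random.default_rng(0)
n_players, dim, sketch_rows = 4, 16, 3

A = rng.integers(0, 2, size=(sketch_rows, dim))        # shared sketch matrix
inputs = [rng.integers(0, 5, size=dim) for _ in range(n_players)]

messages = [A @ x for x in inputs]                     # what each player sends
referee_view = sum(messages)                           # all the referee learns

# Linearity: sketching the aggregate input directly gives the same view.
assert np.array_equal(referee_view, A @ sum(inputs))
```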
Additionally, we let $D_{k,\delta,\mu}(f)$ denote the communication complexity of $f$ with $k$ players and $\delta$ error under input distribution $\mu$, and $\mathrm{IC}_{k,\delta}(f)$ denote the information complexity of $f$ with $k$ players and $\delta$ error. Both of these complexities are considered in the linear sketch model. We extend the notion of information complexity from [7] to this setting by summing the information costs over all of the players and allowing some probability of returning an incorrect answer. That is, let $I$ denote mutual information, and let $\Pi^f_{k,\delta}$ denote the set of $k$-player randomized protocols in the linear sketch model solving $f$ with probability $1-\delta$. Additionally, let $\Pi(x_1, x_2, \ldots, x_k)_i$ denote the message sent by the $i$-th player when we run the protocol $\Pi$ on input $(x_1, x_2, \ldots, x_k)$.
³The "best" such algorithm is the one with the minimal value of $st$ on the input that maximizes $st$.
⁴Simultaneous communication models often define the referee as an additional player with no input. In this case, this is equivalent to our model, except that the $k-1$'s in our proofs and bounds would become $k$'s.
Then
$$\mathrm{IC}_{k,\delta}(f) := \min_{\Pi \in \Pi^f_{k,\delta}} \sum_{i=1}^{k} I\left(x_i, \Pi(x_1, x_2, \ldots, x_k)_i\right).$$
The following lemma relates $\mathrm{IC}_{k,\delta}(f)$ to $\mathrm{RCC}^{LIN}_{k,\delta}(f)$.

Lemma 2.2. For any function $f$,
$$\mathrm{IC}_{k,\delta}(f) \le \mathrm{RCC}^{LIN}_{k,\delta}(f).$$
Proof. Consider any protocol $\Pi$ solving $f$ in the linear sketch model, and let $A$ be the matrix used in the protocol $\Pi$. Then $\Pi(x_1, x_2, \ldots, x_k)_i = Ax_i$. If we let $b_i$ be the number of bits used to represent $Ax_i$ for each $i \in [k]$, we have $I(x_i, Ax_i) \le b_i$ for every $i \in [k]$. In particular, if we let
$$\Pi := \operatorname{argmin}_{\Pi \in \Pi^f_{k,\delta}} \sum_{i=1}^{k} I\left(x_i, \Pi(x_1, x_2, \ldots, x_k)_i\right),$$
then
$$\mathrm{IC}_{k,\delta}(f) = \sum_{i=1}^{k} I\left(x_i, \Pi(x_1, x_2, \ldots, x_k)_i\right) \le \sum_{i=1}^{k} b_i \le \mathrm{RCC}^{LIN}_{k,\delta}(f).$$
Additionally, $\mathrm{IC}(f)$ is well-behaved in the sense that it satisfies the direct sum property. That is, letting $f^m$ denote the problem where we solve $m$ independent instances of $f$:

Theorem 2.3. For any function $f$ and any positive integer $m$,
$$\mathrm{IC}_{k,\delta}(f^m) \ge m \cdot \mathrm{IC}_{k,\delta}(f),$$
where a $\delta$ probability of failure for $f^m$ is defined to mean a $\delta$ probability of failure on each instance.
This follows from the direct sum theorem for two players and no error: we group all players but player $i$ into the referee, for each $i$, and sum the information complexities of the resulting protocols over $i$. Then, to deal with the $\delta$ probability of error, we simply force the protocols to be deterministic and consider the function only on the values for which it is correct.
We denote the randomized communication complexity of a function $f$ in the linear sketch model, given that the maximum frequency of any coordinate at the end of the stream is at most $T$, by $\mathrm{RCC}^{LIN,T}_{k,\delta}(f)$. Similarly, in the other models, when we bound the frequency of the coordinates, we will write $D^{LIN,T}_{k,\delta,\mu}$, $\mathrm{IC}^{T}_{k,\delta}$, and $\mathrm{RSC}^{T}_{k,\delta}$ for the distributional complexity, information complexity, and randomized streaming complexity, respectively.
3 Communication complexity for functions on non-product spaces
Theorem 3.1. For every $t \ge 2$, there is a $t$-party symmetric function $f : D \to \{0,1\}$ defined on $D \subseteq \{0,1\}^n = (\{0,1\}^{n/t})^t$ such that for every error tolerance $\delta < 1/4$, $\overrightarrow{\mathrm{DCC}}_{t-1}(f) \le t-1$ but $\mathrm{RCC}_{t,\delta}(f) = \Omega(n/t)$. In particular, as long as $t = O(1)$, we have $\overrightarrow{\mathrm{DCC}}_{t-1}(f) = O(1)$ and $\mathrm{RSC}_\delta(f) \ge \frac{1}{t} \cdot \mathrm{RCC}_{t,\delta}(f) = \Omega(n)$.
Proof. Consider the $t$-party set disjointness problem Disj$_{n/t,t}$ defined as follows: there are $t$ players $P_1, \ldots, P_t$ such that every player $P_i$ holds a private indicator vector $\mathbf{x}_i \in \{0,1\}^{n/t}$ which represents a subset of $[n/t]$, i.e., $\mathrm{Disj}_{n/t,t}(\mathbf{x}_1, \ldots, \mathbf{x}_t) = \bigvee_{j=1}^{n/t} \bigwedge_{i=1}^{t} x_{i,j}$, where $x_{i,j}$ denotes the $j$-th coordinate of $\mathbf{x}_i$. We consider the domain $D$ such that the vectors $\mathbf{x}_1, \ldots, \mathbf{x}_t \in \{0,1\}^{n/t}$ are either (1) pairwise disjoint, or (2) there exists a unique $j \in [n/t]$ such that $x_{i,j} = 1$ for all $i \in [t]$. Let $f$ be the function that computes Disj$_{n/t,t}$ on domain $D$.
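The promise on the domain $D$ is what drives the upper bound below: on $D$, any two of the indicator vectors intersect if and only if all $t$ of them do. A small illustrative check (the helper names are ours, not the paper's):

```python
# A small sketch of the promise version of set disjointness used in the
# proof: inputs are either pairwise disjoint or share one common element.
# Under this promise, any two of the t indicator vectors already
# determine the answer, which is why one player holding two vectors can
# decide f by itself.

def disj(vectors: list[list[int]]) -> int:
    """Disj(x_1,...,x_t): 1 iff some coordinate j is 1 in every vector."""
    return int(any(all(v[j] for v in vectors) for j in range(len(vectors[0]))))

def decide_from_two(u: list[int], v: list[int]) -> int:
    """On the promise domain D, two vectors intersect iff all of them do."""
    return int(any(a and b for a, b in zip(u, v)))

pairwise_disjoint = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]   # case (1)
unique_common     = [[1, 0, 1, 0], [0, 0, 1, 0], [0, 1, 1, 0]]   # case (2), j = 2

for inst in (pairwise_disjoint, unique_common):
    assert disj(inst) == decide_from_two(inst[0], inst[1])
```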
On the one hand, it is easy to verify that $\overrightarrow{\mathrm{DCC}}_{t-1}(f) \le t-1$. Indeed, at least one of the $t-1$ players obtains two of the indicator vectors and hence can by itself decide the output of $f$: by the promise on $D$, the inputs fall into case (2) if and only if those two vectors intersect. The communication is 1 bit per player to pass the result, and hence the total communication is bounded by $t-1$ since there are $t-1$ players.

On the other hand, the $\Omega(n/t)$ lower bound for $\mathrm{RCC}_{t,\delta}(f)$ follows from the known lower bound for multi-player set disjointness (see [3], which was improved to optimal in [9, 12]). The lower bound for $\mathrm{RSC}_\delta(f)$ immediately follows by Lemma 2.1.
4 Deterministic communication and streaming complexity
We first show that 2-player one-way communication complexity is equivalent to the streaming
complexity of single-pass streaming algorithms in the deterministic setting. In the following
theorem, we assume for convenience that 𝑚 is known to both players.
−−−→ −−−→ −−−→
Theorem 4.1. For every symmetric function 𝑓 : Σ𝑚 → Γ, DCC2 ( 𝑓 ) ≤ DSC( 𝑓 ) ≤ DCC2 ( 𝑓 ) + log 𝑚.
−−−→ −−−→
Proof. Obviously, DSC( 𝑓 ) ≥ DCC2 ( 𝑓 ) since a 2-player communication protocol can simulate a
−−−→ −−−→
streaming algorithm. It remains to prove DSC( 𝑓 ) ≤ DCC2 ( 𝑓 ) + log 𝑚.
Suppose the input stream is $(x_1, \ldots, x_m) \in \Sigma^m$, and for every partition into $(x_1, \ldots, x_i)$ and $(x_{i+1}, \ldots, x_m)$ there is a deterministic 2-player one-way protocol $\Pi_2^i$ computing $f$. We design the deterministic single-pass streaming algorithm $A$ for $f$ by simulating 2-player one-way communication protocols under different partitions. The memory usage of $A$ is therefore bounded by the maximum communication cost of the simulated 2-player protocols plus an index in $[m]$ recording the number of processed items.

Notice that when processing the item $x_{i+1}$, $A$ has already processed $x_1, \ldots, x_i$ and has $(m_i, i)$ in memory. $A$ can thus reconstruct a compatible guess $x''_1, \ldots, x''_i$ that would induce exactly the message $m_i$ in $\Pi_2^i$, and then it sets the memory to $(m_{i+1}, i+1)$, where $m_{i+1}$ is the message sent in $\Pi_2^{i+1}$ when $P_1$ has $(x''_1, \ldots, x''_i, x_{i+1})$ and $P_2$ has $(x_{i+2}, \ldots, x_m)$. Since $\Pi_2^i$ is deterministic, it will always output the same answer when $P_2$ has input $(x_{i+2}, \ldots, x_m)$ and receives message $m_i$. Thus, if $x''_1, \ldots, x''_i$ would induce the same message $m_i$, then $\Pi_2^i$ would produce the same answer regardless of whether $P_1$ had input $(x_1, \ldots, x_i)$ or $(x''_1, \ldots, x''_i)$. In particular, since $\Pi_2^i$ computes $f$, this means $f(x_1, \ldots, x_m) = f(x''_1, \ldots, x''_i, x_{i+1}, \ldots, x_m)$. Thus, when we compute $m_{i+1}$, we still get some message that $\Pi_2^{i+1}$ can use to correctly compute $f$ alongside $P_2$'s input $(x_{i+2}, \ldots, x_m)$.

$A$ repeats this process for every $i = 1, \ldots, m-1$, and at the end it outputs $f(x_1, \ldots, x_m)$. Therefore, we complete the proof with $\overrightarrow{\mathrm{DCC}}_2(f) \le \overrightarrow{\mathrm{DSC}}(f) \le \overrightarrow{\mathrm{DCC}}_2(f) + \log m$.
Note that the additional index $i$ in the above simulation, which results in the additive $\log m$ term in the upper bound, indicates which 2-player protocol should be simulated in the reconstruction; it is implicitly shared in the 2-player communication case when $m$ is common knowledge.

When $m$ is not known, the memory used for the index follows any previously agreed-upon encoding, which uses $O(\log m)$ space. For functions that are well-defined for an arbitrary number of input items, e.g., the parity function, this index can be saved, and hence $\overrightarrow{\mathrm{DSC}}(f) = \overrightarrow{\mathrm{DCC}}_2(f)$.
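The reconstruction step in the proof of Theorem 4.1 can be walked through on a toy symmetric function (an illustrative sketch with hypothetical helpers, not the paper's construction): for $f = $ "sum mod 5", the 2-player message is the prefix sum, and the streaming algorithm replaces the true prefix by any compatible one inducing the same message.

```python
# A toy walk-through of the simulation in the proof of Theorem 4.1
# (illustrative only): the 2-player protocol for f = "sum mod 5" sends
# the prefix sum mod 5.  The streaming algorithm never stores the real
# prefix; at each step it reconstructs *some* compatible prefix that
# induces the same message and feeds it to the next protocol.

Q = 5

def message(prefix: list[int]) -> int:       # what P_1 sends in Pi_2^i
    return sum(prefix) % Q

def compatible_prefix(m_i: int, i: int) -> list[int]:
    """Any length-i prefix inducing message m_i, e.g. (m_i, 0, ..., 0)."""
    return [m_i] + [0] * (i - 1) if i > 0 else []

stream = [3, 1, 4, 1, 5, 9]
m = 0                                        # memory: (m_i, i)
for i, item in enumerate(stream):
    guess = compatible_prefix(m, i)          # reconstruct x''_1..x''_i
    m = message(guess + [item])              # message of Pi_2^{i+1}
assert m == sum(stream) % Q                  # final message determines f
```

Because the function is symmetric and the message is a function of the prefix only through its sum, any compatible guess leads to the same final answer, exactly as in the proof.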
For communication complexity among more players, we establish the following corollary.
Corollary 4.2. For every $k$-party symmetric function $f$,
$$(k-1) \cdot \overrightarrow{\mathrm{DCC}}_2(f) \le \overrightarrow{\mathrm{DCC}}_k(f) \le (k-1) \cdot \overrightarrow{\mathrm{DCC}}_2(f) + \log k.$$

Proof. Combining Lemma 2.1 and Theorem 4.1, it follows that
$$\overrightarrow{\mathrm{DCC}}_k(f) \le (k-1) \cdot \overrightarrow{\mathrm{DSC}}(f) \le (k-1) \cdot \overrightarrow{\mathrm{DCC}}_2(f) + \log k.$$
The other direction, $\overrightarrow{\mathrm{DCC}}_k(f) \ge (k-1) \cdot \overrightarrow{\mathrm{DCC}}_2(f)$, holds by giving $z_j = \emptyset$ to every player $j \in \{2, \ldots, k-1\}$ in the $k$-player case, when the problem degenerates to 2-player communication but the same message has to be passed $k-1$ times.
Such a linear separation naturally extends to the communication complexity of 𝑡-player
versus 𝑘-player protocols, as long as 2 ≤ 𝑡 < 𝑘. Thus, the deterministic communication
complexity grows linearly in the number of parties.
We remark that if every player must get a non-trivial input, i.e., at least one input element to the function, the linear growth remains for some but not all problems. For example, the communication complexity of the parity of $k$ bits is linear in the number of players. However, to decide whether $k$ elements in $[k]$ are distinct, the 2-player protocol requires communication $\log\binom{k}{k/2} \approx k - \log\sqrt{k}$, whereas the $k$-player worst-case communication grows sublinearly, i.e., for $k$ players the communication is no more than $\sum_{i=1}^{k-1} \log\binom{k}{i} \le (k-1) \cdot \log\binom{k}{k/2}$.
5 Communication complexity for functions on a product space
5.1 Separations for randomized communication complexity
In this section, we consider the communication cost of randomized multi-player protocols
defined on product input distributions and present a 𝑘 log 𝑘 versus 𝑡 log 𝑡 separation between
𝑘-player and 𝑡-player communication complexity.
First we introduce the Sum-Equal problem (as used in Viola’s work [23]).
Definition 5.1. The $k$-player Sum-Equal over the integers, denoted by Sum-Equal$_k$, requires deciding whether $\sum_{i=1}^{k} x_i = 0$, where each player $P_i$ is given an integer $x_i$ as its private input, together with the integer $k$ as public input shared by all players. In the CRS model, an additional public random string is also known to all players. The $k$-player Sum-Equal over $\mathbb{Z}_m$, denoted by Sum-Equal$_{k,m}$, is defined similarly to Sum-Equal$_k$, except that the input items are drawn from $\mathbb{Z}_m$ and the summation is over $\mathbb{Z}_m$, for a publicly known $m$.
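The predicates of Definition 5.1 are straightforward to evaluate directly; a small sketch (evaluating the function itself, not a communication protocol; the helper names are ours):

```python
# A direct evaluation of the Sum-Equal predicates from Definition 5.1.

def sum_equal(xs: list[int]) -> bool:
    """Sum-Equal_k over the integers: does the private input sum to 0?"""
    return sum(xs) == 0

def sum_equal_mod(xs: list[int], m: int) -> bool:
    """Sum-Equal_{k,m}: the same test with the summation taken in Z_m."""
    return sum(xs) % m == 0

assert sum_equal([4, -7, 3])              # 4 - 7 + 3 = 0
assert not sum_equal([1, 2, 3])
assert sum_equal_mod([4, 5, 3], 6)        # 12 = 0 in Z_6
assert not sum_equal_mod([4, 5, 3], 7)
```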
Lemma 5.2 ([23], Theorem 15 and Theorem 29). For every $k \in \mathbb{N}$, $0 \le \delta \le 1/3$, and in the CRS model, the $k$-player $\delta$-error communication complexity of Sum-Equal satisfies:

(a) For every $m \in \mathbb{N}$, $\overrightarrow{\mathrm{RCC}}_{k,\delta}(\text{Sum-Equal}_{k,m}) = O(k \log(k/\delta))$.

(b) For every prime $p \in (k^{1/4}, 2k^{1/4})$, $\mathrm{RCC}_{k,\delta}(\text{Sum-Equal}_{k,p}) = \Omega(k \log k)$.⁵

In particular, $\mathrm{RCC}_{k,\delta}(\text{Sum-Equal}_{k,p}) = \Theta(k \log k)$ in the CRS model if $\delta = \Omega(1/\mathrm{poly}(k))$.
Remark 5.3. Viola's lower bound for Sum-Equal$_{k,p}$ is proved for a non-product distribution $\mu'$ whose support covers exactly a $2/p$ fraction of the whole (product) input space. Specifically, $\mu'$ is defined as follows:
Definition 5.4. We define two distributions $G$ and $B$:
$$G := \left(G_1, \ldots, G_{k-1}, -\sum_{j=1}^{k-1} G_j\right), \qquad B := \left(B_1, \ldots, B_{k-1}, 1 - \sum_{j=1}^{k-1} B_j\right),$$
where for each $i \in [k-1]$, $G_i$ and $B_i$ are chosen i.i.d. uniformly from $\mathbb{Z}_p$. Then $\mu' := G/2 + B/2$ is the mixture drawn from each distribution with probability $\frac{1}{2}$.
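A quick sampling sketch of Definition 5.4 (illustrative; the helper names are ours): $G$ is uniform conditioned on summing to 0 in $\mathbb{Z}_p$, $B$ on summing to 1, and $\mu'$ mixes them with a fair coin.

```python
import random

# Sampling sketch for Definition 5.4: G always sums to 0 in Z_p, B always
# sums to 1, and the hard distribution mu' flips a fair coin between them.

def sample_G(k: int, p: int) -> list[int]:
    g = [random.randrange(p) for _ in range(k - 1)]
    return g + [(-sum(g)) % p]            # last coordinate forces sum = 0

def sample_B(k: int, p: int) -> list[int]:
    b = [random.randrange(p) for _ in range(k - 1)]
    return b + [(1 - sum(b)) % p]         # last coordinate forces sum = 1

def sample_mu_prime(k: int, p: int) -> list[int]:
    return sample_G(k, p) if random.random() < 0.5 else sample_B(k, p)

random.seed(1)
k, p = 10, 5
for _ in range(100):
    assert sum(sample_G(k, p)) % p == 0   # Sum-Equal holds on G
    assert sum(sample_B(k, p)) % p == 1   # ... and fails on B
```

Note that the first $k-1$ coordinates of both $G$ and $B$ are uniform, which is exactly property (b) used later in the proof of Theorem 6.1.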
Thus, if a $k$-player protocol solves Sum-Equal$_{k,p}$ with error $\delta \le 1/k$ on the uniform distribution $\mu$ over the whole input space, then its error with respect to $\mu'$ is bounded by $\frac{1/k}{2/p} < k^{-3/4}$.
Notice that the 2-player version of Sum-Equal$_{k,p}$ degenerates to testing equality over $\mathbb{Z}_p$, whose upper bound is $O(\log(1/\delta) + \log\log k)$; see more details in Appendix A. By Lemma 5.2, the $\Omega(k)$ separation in Corollary 5.5 naturally follows.

⁵Viola states the lower bound for constant $\delta$, but it naturally holds for smaller $\delta$ (though it is sometimes not tight).
Corollary 5.5. For every prime $p \in (k^{1/4}, 2k^{1/4})$ and $\delta \le 1/\mathrm{poly}(k)$, there is a product distribution $\mu$ such that $\mathrm{RCC}^{\mu}_{k,\delta}(\text{Sum-Equal}_{k,p}) = \Omega(k \log k)$ and $\overrightarrow{\mathrm{RCC}}_{2,\delta}(\text{Sum-Equal}_{k,p}) = O(\log k)$.
For a larger error tolerance, say 𝛿 is a constant, we have a stronger separation between
𝑘-party communication and 𝑡-party communication. However, the hard distribution is slightly
non-product, that is, it is a product distribution on any 𝑘 − 1 out of the 𝑘 players.
Corollary 5.6. For every $k \in \mathbb{N}$, there is a $k$-party symmetric function $f$ such that

(a) For any product distribution $\mu$, for every $2 \le t \le k$ and $0 \le \delta \le 1/3$, $\overrightarrow{\mathrm{RCC}}^{\mu}_{t,\delta}(f) = O(t \log(t/\delta))$. In particular, $\overrightarrow{\mathrm{RCC}}^{\mu}_{2,\delta}(f) = O(\log(1/\delta))$.

(b) There exists a distribution $\mu'$, which is product on any $k-1$ out of the $k$ players, for which $\mathrm{RCC}^{\mu'}_{k,\delta}(f) = \Omega(k \log k)$ as long as $\delta \le 1/3$.

For $\delta \ge 1/\mathrm{poly}(t)$, the gap between $\mathrm{RCC}^{\mu'}_{k,\delta}(f)$ and $\overrightarrow{\mathrm{RCC}}^{\mu}_{t,\delta}(f)$ is bounded as below:
$$\frac{\mathrm{RCC}^{\mu'}_{k,\delta}(f)}{\overrightarrow{\mathrm{RCC}}^{\mu}_{t,\delta}(f)} = \Omega\left(\frac{k \log k}{t \log t}\right).$$
Proof. (a) If we plug in $k = t$ in part (a) of Lemma 5.2, we get $\overrightarrow{\mathrm{RCC}}_{t,\delta}(\text{Sum-Equal}_{t,m}) = O(t \log(t/\delta))$ for every $m, t \in \mathbb{N}$ and $0 \le \delta \le 1/3$. Thus, Sum-Equal$_{t,m}$ satisfies (a).

(b) Part (b) of Lemma 5.2 tells us that for any $0 \le \delta \le 1/3$, we can take some prime $p \in (k^{1/4}, 2k^{1/4})$, and we have $\mathrm{RCC}_{k,\delta}(\text{Sum-Equal}_{k,p}) = \Omega(k \log k)$. Furthermore, as noted in Remark 5.3, this holds for a distribution $\mu'$ which is product on the first $k-1$ players. As the Sum-Equal problem is symmetric with respect to all $k$ players, the desired property follows immediately.
The outline of the proof of Corollary 5.6 was given in Section 1: the upper bound in part (a) follows from applying $k = t$ in the first part of Lemma 5.2, while the lower bound in part (b) follows from the second part of Lemma 5.2.
5.2 Tightness of the communication complexity separation
The following theorem and corollary show tightness of our separations.
Theorem 5.7. For every $k$-party function $f : \Sigma^k \to \Gamma$, product distribution $\mu$ over $\Sigma^k$, and error tolerance $\delta < 1/3$, the following holds:
$$\overrightarrow{\mathrm{RCC}}^{\mu}_{k,\delta}(f) = \begin{cases} O(k \log k) \cdot \overrightarrow{\mathrm{RCC}}_{2,\delta}(f) & \text{if } \delta = \Omega(1) \\ O(k) \cdot \overrightarrow{\mathrm{RCC}}_{2,\delta}(f) + O(k \log k) & \text{if } \delta \le 1/k^{\Omega(1)} \end{cases}$$
When $\delta > 0$, we have that $\overrightarrow{\mathrm{RCC}}_{2,\delta}(f) = \Omega(\log k)$, and thus the following holds:
$$\frac{\overrightarrow{\mathrm{RCC}}^{\mu}_{k,\delta}(f)}{\overrightarrow{\mathrm{RCC}}_{2,\delta}(f)} \le O\left(k \cdot \left(1 + \frac{\log k}{\log(1/\delta)}\right)\right) = \begin{cases} O(k \log k) & \text{if } \delta = \Omega(1) \\ O(k) & \text{if } \delta = 1/k^{\Omega(1)} \end{cases}$$
Proof. First we let $\Pi_0$ be the optimal $\delta$-error 2-player one-way protocol that computes $f$ with communication $C = \overrightarrow{\mathrm{RCC}}_{2,\delta}(f)$, and construct a new protocol $\Pi_2$ by taking the majority of $M$ independent parallel copies of $\Pi_0$, such that $\Pi_2$ has error $\varepsilon = \frac{\delta^2}{16k^2}$ and communication $CM$. Recalling that $\Pi_0$ has $\delta < 1/3$, it suffices to let $t$ and $M$ be defined as in Lemma 5.8 below:
$$t = \log\left(\frac{\delta}{16k^2}\right) \Big/ \log\big(4\delta(1-\delta)\big) \tag{5.1}$$
$$M = 1 + 2t = 1 + 2 \cdot \frac{\log(1/\delta) + 2\log k + 4}{\log(1/\delta) + \log(1/(1-\delta)) - 2} = \Theta\left(1 + \frac{\log k}{\log(1/\delta)}\right) \tag{5.2}$$
Lemma 5.8. Let $t \in \mathbb{N}$ and $X_1, X_2, \ldots, X_{2t+1}$ be i.i.d. binary random variables such that $\Pr[X_i = 1] = \delta < 1/2$ for every $i \in [2t+1]$, and let $Y = \mathrm{Majority}\{X_1, \ldots, X_{2t+1}\}$ be the majority of all $X_i$'s. Then $\Pr[Y = 1] \le \varepsilon$ as long as $t \ge \log(\varepsilon/\delta)/\log(4\delta(1-\delta))$.

Proof. For $0 < \delta < 1/2$ and $t \ge \log(\varepsilon/\delta)/\log(4\delta(1-\delta))$, we have
$$\begin{aligned}
\Pr[Y = 1] &= \Pr\left[|\{i \mid X_i = 1\}| \ge t+1\right] \\
&= \sum_{j=t+1}^{2t+1} \binom{2t+1}{j} \delta^j (1-\delta)^{2t+1-j} \\
&\le \sum_{j=t+1}^{2t+1} \binom{2t+1}{j} \delta^{t+1} (1-\delta)^{t} \\
&= \frac{2^{2t+1}}{2} \cdot \delta^{t+1}(1-\delta)^t = (4\delta(1-\delta))^t \cdot \delta \\
&\le \frac{\varepsilon}{\delta} \cdot \delta = \varepsilon
\end{aligned}$$
The first inequality holds because $\delta < 1/2$ and hence $\delta^j(1-\delta)^{2t+1-j} \le \delta^{t+1}(1-\delta)^t$ for $j \ge t+1$. The second inequality holds because $4\delta(1-\delta) < 1$ for $\delta < 1/2$, and $(4\delta(1-\delta))^t \le (4\delta(1-\delta))^{\log(\varepsilon/\delta)/\log(4\delta(1-\delta))} = \varepsilon/\delta$. Thus, we have proved that $\Pr[Y = 1] \le \varepsilon$ for $t \ge \log(\varepsilon/\delta)/\log(4\delta(1-\delta))$.
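Lemma 5.8 can be sanity-checked numerically (an illustrative check with our own parameter choices, not part of the paper): for $t$ at the threshold $\log(\varepsilon/\delta)/\log(4\delta(1-\delta))$, both the exact majority error and the proof's intermediate bound $(4\delta(1-\delta))^t \cdot \delta$ fall below $\varepsilon$.

```python
from math import ceil, comb, log

# Numerical check of Lemma 5.8: the exact probability that the majority
# of 2t+1 i.i.d. delta-biased bits errs is at most eps once t reaches
# the threshold log(eps/delta) / log(4*delta*(1-delta)).

def majority_error(t: int, delta: float) -> float:
    """Pr[at least t+1 of 2t+1 delta-biased coins come up 1], exactly."""
    n = 2 * t + 1
    return sum(comb(n, j) * delta**j * (1 - delta)**(n - j)
               for j in range(t + 1, n + 1))

delta, eps = 0.3, 1e-4
t = ceil(log(eps / delta) / log(4 * delta * (1 - delta)))

assert majority_error(t, delta) <= eps                        # the lemma
assert majority_error(t, delta) <= (4*delta*(1-delta))**t * delta  # proof's bound
```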
Note that $\Pi_2$ is still a 2-player one-way protocol, but has communication $CM$. Furthermore, we remark that $CM = \Omega(\log k)$ for $\delta > 0$, since the error probability must be $\delta \ge 1/2^C$ if it is not zero, and hence $M = \Theta\left(1 + \frac{\log k}{\log(1/\delta)}\right) = \Omega\left(1 + \frac{\log k}{C}\right)$.
Second, we prove that for every product input distribution $\mu$ over $\Sigma^k$, the $k$-party function $f$ can be evaluated by a randomized $k$-player one-way protocol $\Pi_k$ with communication $O(k \cdot (CM + \log k))$ and error $\delta/2$ with respect to $\mu$. The idea is that given the product input distribution $\mu$, each player $P_i$ acts as follows:

1. $P_i$ first assumes that the received message $m_{i-1}$ from $P_{i-1}$ will lead to a correct answer with probability $\ge 1 - \frac{\delta}{4k}$ with respect to $\mu$. When we make this assumption, we essentially have $P_i$ consider the problem using the input that $P_{i-1}$ generated in its step 2 rather than the real input.

2. $P_i$ samples a possible input $x'_1, \ldots, x'_{i-1}$ of the previous players $P_1, \ldots, P_{i-1}$, such that if Alice gets input $(x'_1, \ldots, x'_{i-1})$ and sends $m_{i-1}$, then with probability $\ge 1 - \frac{\delta}{4k}$ the protocol $\Pi_2$ leads to the correct answer. The probability is taken over the internal randomness and over Bob's input following the marginal distribution of $\mu$ on the remaining players (here we use the condition that $\mu$ is a product distribution).

3. Finally, $P_i$ sends a message $(m_i, i)$ of length $CM + \log k = O(CM)$, where $m_i$ is the message that Alice would send in $\Pi_2$ when her input is $(x'_1, \ldots, x'_{i-1}, x_i)$.
Now, we can bound the error probability recursively. Suppose player $P_i$ receives the message $m_{i-1}$ from $P_{i-1}$, generated from $(x_1, x_2, \ldots, x_{i-1})$, and suppose $P_i$ generates $(x'_1, x'_2, \ldots, x'_{i-1})$ in step 2. Then $\Pi_2$ is correct on the input $X' = (x'_1, x'_2, \ldots, x'_{i-1}, x_i, x_{i+1}, \ldots, x_k)$ with probability $\ge 1 - \frac{\delta}{4k}$ by our choice of $M$. Furthermore, since $\Pi_2$ generates $m_i$ from both $(x_1, x_2, \ldots, x_{i-1})$ and $(x'_1, x'_2, \ldots, x'_{i-1})$ and is deterministic, it produces the same answer on both $X'$ and $X = (x_1, x_2, \ldots, x_k)$. Thus, the answer we get from $\Pi_2$ on input $X'$ is also correct on input $X$ with probability at least $1 - \frac{\delta}{2k}$.

Thus, we can union bound over all the players to get that the error probability of $\Pi_k$ is bounded by $k \cdot \frac{\delta}{2k} = \delta/2$ with respect to $\mu$. The fact that $\mu$ is a product distribution is used in the second step, where the sampling process relies on the previous players' inputs being distributed independently from those of future players.

Thus we finish the proof and conclude that $\overrightarrow{\mathrm{RCC}}^{\mu}_{k,\delta}(f) \le O(kCM)$.
Notice that in the proof of Theorem 5.7, every message in $\Pi_k$ has length bounded by $O(CM)$, which gives an upper bound for the single-pass streaming complexity.

Corollary 5.9. For every $k$-party function $f$ and product input distribution $\mu$, and for every $\delta < 1/3$,
$$\mathrm{RSC}^{\mu}_\delta(f) \le \overrightarrow{\mathrm{RSC}}^{\mu}_\delta(f) \le O\left(1 + \frac{\log k}{\log(1/\delta)}\right) \cdot \overrightarrow{\mathrm{RCC}}_{2,\delta}(f).$$
6 A direct sum for Viola’s problem
We next turn to our direct sum theorem for Viola's problem. This extends the results of the previous section by demonstrating that the gap is indeed multiplicative; the results of the previous section do not rule out the possibility of an additive $\Omega(k \log k)$ gap between 2-player and $k$-player communication complexity. A slightly modified version of this result is used in our streaming result. Note that the theorem is proved for $\delta < 1/9$, but lower bounds for larger error tolerances such as $\delta = 1/3$ can be obtained using a standard error amplification argument.
Theorem 6.1. Let $F : (\mathbb{Z}_p^m)^k \to \{0,1\}^m$ be the $k$-party function computing $m$ independent copies of Sum-Equal$_{k,p}$, where $p$ is a prime between $k^{1/4}$ and $2k^{1/4}$ and $m = o(k^{1/4})$. For every error tolerance $\delta \in (0, 1/9)$, we say a protocol $\Pi$ is correct with probability $1-\delta$ if there is a reconstruction function $G$ such that for every fixed $i \in [m]$ and input $x \in (\mathbb{Z}_p^m)^k$, $G(i, \Pi_{out}(x))$ equals the output of the $i$-th instance of Sum-Equal$_{k,p}$ with probability at least $1-\delta$, over the internal randomness of $\Pi$. Then the communication cost of any $\Pi$ which is correct with probability $1-\delta$ is $\Omega(mk \log k)$.
Proof. For simplicity of notation in the proof, we flip the output of $F$, so that it outputs 0 if the input to the corresponding Sum-Equal$_{k,p}$ instance sums to 0 in $\mathbb{Z}_p$, and $F$ outputs 1 on instances whose sum is not 0.

Let $\Pi$ be a $\delta$-error randomized protocol for $F$, and let $\Pi_{out}(x)$ denote the output of $\Pi$ on input $x$. Here by "$\delta$-error protocol" we mean that the expected error rate of $\Pi$ is bounded by $\delta$, since both $\Pi_{out}(x)$ and $F(x)$ are binary vectors in $\{0,1\}^m$. Therefore,
$$\Pr_{i \in_R [m]}\left[\Pi_{out}(x)_i \ne F_i(x)\right] \le \delta,$$
where the input to $F$ is partitioned as $x = \left(x^{(1)}, x^{(2)}, \ldots, x^{(m)}\right) \in \mathbb{Z}_p^{m \times k}$ such that $F_i(x) := \text{Sum-Equal}_{k,p}(x^{(i)})$ computes the $i$-th instance of Sum-Equal$_{k,p}$ for each $i \in [m]$.

We abuse notation a little in this proof and let $|\cdot|$ denote the Hamming weight of a not necessarily binary vector, i.e., the number of non-zero coordinates of the vector. Then,
$$\mathbb{E}\left[|\Pi_{out}(x) - F(x)|\right] \le \delta m.$$
To prove that $\mathrm{RCC}_{k,\delta}(F) = \max_x |\Pi(x)| = \Omega(mk \log k)$ for the optimal $\delta$-error protocol $\Pi$, we will deduce a contradiction if $\Pi$ uses $c < \gamma mk \log k$ bits of communication, for the constant $\gamma = (1-9\delta)/135 > 0$ and sufficiently large $k$. Thus, we can conclude a communication lower bound of $c \ge \gamma mk \log k = \Omega(mk \log k)$.

Towards a contradiction, we first convert the randomized protocol $\Pi$ into a deterministic protocol $\Pi'$ that has small error with respect to a specific distribution $\mathcal{H}$. The deterministic protocol $\Pi'$ is obtained by fixing all internal random coins of $\Pi$ so that $\Pi'$ has error rate at most $\delta$ for inputs drawn from $\mathcal{H}$:
$$\mathbb{E}_{X \sim \mathcal{H}}\left[\left|\Pi'_{out}(X) - F(X)\right|\right] \le \delta m.$$
Since $\Pi'$ can never generate a transcript larger than the communication that $\Pi$ uses in the worst case, i.e., $|\Pi'(X)| \le \max_x |\Pi(x)| = c$, it suffices to prove a communication lower bound for $\Pi'$ on inputs drawn from $\mathcal{H}$.

By Markov's inequality, we have that for every positive constant $\varepsilon > 0$,
$$\Pr_{X \sim \mathcal{H}}\left[\left|\Pi'_{out}(X) - F(X)\right| > \varepsilon m\right] \le \frac{\mathbb{E}_{X \sim \mathcal{H}}\left[\left|\Pi'_{out}(X) - F(X)\right|\right]}{\varepsilon m} \le \frac{\delta}{\varepsilon} \tag{6.1}$$
Now we specify the distribution $\mathcal{H}$. Let $G, B$ be defined as in Definition 5.4. Note that:

(a) Sum-Equal$_{k,p}(G) = 1$ and Sum-Equal$_{k,p}(B) = 0$, and hence $F_i(G) = 0$, $F_i(B) = 1$.

(b) The first $k-1$ elements of $G$ and $B$, denoted by $G_{-k}$ and $B_{-k}$, follow the same distribution, i.e., the uniform distribution over $\mathbb{Z}_p^{k-1}$.

For convenience we can write $B = (G_{-k}, 1 + G_k)$.

Let $H := G/2 + B/2$ be a mixture of $G$ and $B$, and let $\mathcal{H}$ be $m$ independent copies of $H$:
$$\mathcal{H} := H^m = (G/2 + B/2)^m.$$
Since $B = (G_{-k}, 1 + G_k)$ and $H = G/2 + B/2$, we note that
$$\mathcal{H} = \sum_{v \in \{0,1\}^m} \frac{1}{2^m}\left(G^m_{-k}, v + G^m_k\right) = \left(G^m_{-k}, V + G^m_k\right),$$
where $\left(G^m_{-k}, G^m_k\right)$ is a vector in $\mathbb{Z}_p^{m \times k}$ such that $G^m_{-k}$ is uniformly distributed over $\mathbb{Z}_p^{m \times (k-1)}$ and $G^m_k = -\sum_{j=1}^{k-1} G^m_j$, and $V$ is a random variable that is uniform over $\{0,1\}^m$, which we will think of as an element of $\mathbb{Z}_p^m$. With the above notation of $\mathcal{H}, V$, we have
$$F(\mathcal{H}) = F\left(G^m_{-k}, V + G^m_k\right) = V.$$
To prove the communication lower bound for a deterministic protocol $\Pi'$ that has error probability $\le \delta$ w.r.t. $\mathcal{H}$, we recall the following protocol decomposition by monochromatic rectangles, cf. Claim 24 in [23] or Lemma 1.16 in [18].

Claim 6.2 ([23], Claim 24). A $k$-player (number-in-hand) deterministic protocol using communication $\le c$ partitions the inputs into $C \le 2^c$ sets of inputs $R_1, R_2, \ldots, R_C$ such that

• the protocol outputs the same value on inputs in the same set, and

• the sets are rectangles: each $R_i$ can be written as $R_i = R^i_1 \times R^i_2 \times \cdots \times R^i_k$, where $R^i_j$ is a subset of the inputs of Player $j$.
For every $i \in [C]$ and rectangle $R_i$, we use the notation $R^i_{-j} := R^i_1 \times R^i_2 \times \cdots \times R^i_{j-1} \times R^i_{j+1} \times \cdots \times R^i_k$ to denote the projection of $R_i$ onto the $k-1$ coordinates other than the $j$-th one, for every $j \in [k]$. In particular, $R^i_{-k} := R^i_1 \times R^i_2 \times \cdots \times R^i_{k-1}$ denotes the first $k-1$ coordinates. Sometimes the index $i$ of the rectangle $R_i$ is clear from context, in which case we simply write $R$ instead of $R_i$.
In what follows we show a contradiction when $\Pi'$ has communication $c < \gamma mk \log k$, and hence there are $C \le 2^c < k^{\gamma mk}$ rectangles. The argument depends on the following lemma, which essentially guarantees that for every large rectangle, $\Pi'$ is likely to make mistakes on more than $\varepsilon m$ coordinates.
Lemma 6.3. For every rectangle $R$ satisfying $\Pr_{X_{-k} \sim \mathcal{H}_{-k}}[X_{-k} \in R_{-k}] \ge \frac{1}{\alpha C} > \frac{1}{\alpha k^{\gamma mk}}$, where $\alpha = p^{O(1)}$, there must be a set $L \subseteq [m]$ such that $|L| = (1 - 135\gamma)m$ and the conditional distribution $G^{(L)}_k \mid \left(G^m_{-k} \in R_{-k}\right)$ is $\frac{|L|}{p}$-close to uniform over $\mathbb{Z}_p^{|L|}$; that is, the total variation distance between $G^{(L)}_k \mid \left(G^m_{-k} \in R_{-k}\right)$ and the uniform distribution is at most $\frac{|L|}{p}$, where $G^{(L)}_k$ is the sequence of coordinates of $G^m_k$ corresponding to the instances of Sum-Equal in $L$.
Lemma 6.3 implies the following claim:

Claim 6.4. For every rectangle $R$ on which $\Pi'$ outputs $w \in \{0,1\}^m$, if $\Pr_{X_{-k} \sim \mathcal{H}_{-k}}[X_{-k} \in R_{-k}] \ge \frac{1}{\alpha C}$, then for $\gamma, \varepsilon$ satisfying $1 - 135\gamma \ge 3\varepsilon$,
$$\Pr_{X \sim \mathcal{H}}\left[X \in R, |F(X) - w| \le \varepsilon m\right] < \frac{1}{2} \Pr_{X \sim \mathcal{H}}[X \in R] \tag{6.2}$$
For compactness of the proof of Theorem 6.1 we defer the proofs of Claim 6.4 and Lemma 6.3
to the end of this section.
Let $\widetilde{\mathcal{R}}$ be the set of the $C$ rectangles and $\mathcal{R} \subseteq \widetilde{\mathcal{R}}$ be the set of all large rectangles satisfying $\Pr_{X_{-k} \sim \mathcal{H}_{-k}}[X_{-k} \in R_{-k}] \ge \frac{1}{\alpha C} > \frac{1}{\alpha k^{\gamma mk}}$. Then for every rectangle $R \in \widetilde{\mathcal{R}} \setminus \mathcal{R}$,
$$\Pr_{X \sim \mathcal{H}}[X \in R] \le \Pr_{X_{-k} \sim \mathcal{H}_{-k}}[X_{-k} \in R_{-k}] < \frac{1}{\alpha C} \le \frac{1}{\alpha \left|\widetilde{\mathcal{R}} \setminus \mathcal{R}\right|}.$$
Using Claim 6.4, we have
$$\begin{aligned}
\Pr_{X \sim \mathcal{H}}\left[\left|\Pi'_{out}(X) - F(X)\right| \le \varepsilon m\right]
&= \sum_{R \in \widetilde{\mathcal{R}}} \Pr_{X \sim \mathcal{H}}\left[X \in R, \left|F(X) - \Pi'_{out}(R)\right| \le \varepsilon m\right] \\
&\le \sum_{R \in \mathcal{R}} \Pr_{X \sim \mathcal{H}}\left[X \in R, \left|F(X) - \Pi'_{out}(R)\right| \le \varepsilon m\right] + \sum_{R \in \widetilde{\mathcal{R}} \setminus \mathcal{R}} \Pr_{X \sim \mathcal{H}}[X \in R] \\
&\le \sum_{R \in \mathcal{R}} \frac{1}{2} \Pr_{X \sim \mathcal{H}}[X \in R] + \sum_{R \in \widetilde{\mathcal{R}} \setminus \mathcal{R}} \Pr_{X \sim \mathcal{H}}[X \in R] \\
&\le \frac{1}{2} \sum_{R \in \widetilde{\mathcal{R}}} \Pr_{X \sim \mathcal{H}}[X \in R] + \frac{1}{2} \cdot \left|\widetilde{\mathcal{R}} \setminus \mathcal{R}\right| \cdot \frac{1}{\alpha \left|\widetilde{\mathcal{R}} \setminus \mathcal{R}\right|} \le \frac{1}{2} + \frac{1}{2\alpha}
\end{aligned}$$
Combining this with (6.1), we have
$$1 - \frac{\delta}{\varepsilon} \le \Pr_{X \sim \mathcal{H}}\left[\left|\Pi'_{out}(X) - F(X)\right| \le \varepsilon m\right] \le \frac{1}{2} + \frac{1}{2\alpha} \implies 1 - \frac{2\delta}{\varepsilon} \le \frac{1}{\alpha}.$$
However, the above inequality cannot hold if we set $\varepsilon = 3\delta$ and pick a constant $\alpha > 3$. Let $\gamma := (1-9\delta)/135$ be the constant for which we want to show $c \ge \gamma mk \log k = \Omega(mk \log k)$. Then $1 - 135\gamma = 9\delta \ge 3\varepsilon$ satisfies the condition in Claim 6.4, and $\alpha = O(1)$ satisfies the requirement in Lemma 6.3.
Thus we finish the contradiction argument and complete the proof of Theorem 6.1 with $\mathrm{RCC}_{k,\delta}(F) \ge \gamma mk \log k = \Omega(mk \log k)$.
Proof of Claim 6.4. Let $G' := G^{(L)}_k \mid \left(G^m_{-k} \in R_{-k}\right)$; recall that $G'$ is $|L|/p$-close to the uniform distribution by Lemma 6.3. Therefore, for every fixed $u \in \mathbb{Z}_p^{|L|}$, letting $\bar{v}$ denote the complement of $v$ (that is, we flip all of the bits in $v$),
$$\begin{aligned}
\sum_{v \in \{0,1\}^{|L|} : |v| \le \varepsilon m} \Pr[G' = u - v]
&= \frac{1}{2}\left(\sum_{v : |v| \le \varepsilon m} \Pr[G' = u - v] + \sum_{v : |v| \ge |L| - \varepsilon m} \Pr[G' = u - \bar{v}]\right) \\
&\le \frac{1}{2}\left(\sum_{v : |v| \le \varepsilon m} \Pr[G' = u - v] + \sum_{v : |v| \ge |L| - \varepsilon m} \Pr[G' = u - v] + \frac{2|L|}{p}\right) \\
&< \frac{1}{2} \sum_{v \in \{0,1\}^{|L|}} \Pr[G' = u - v],
\end{aligned}$$
where the first inequality follows from Lemma 6.3, and the last inequality holds since, as long as $G'$ is close to the uniform distribution and $|L| = (1 - 135\gamma)m \ge 3\varepsilon m$, we have
$$\sum_{v : \varepsilon m < |v| < |L| - \varepsilon m} \Pr[G' = u - v] = \Omega(1) > \frac{2|L|}{p}.$$
Recall that $u_L$ and $v_L$ denote $u$ and $v$ restricted to coordinates in the set $L$, that $u_{-L}$ and $v_{-L}$ denote $u$ and $v$ restricted to coordinates not in $L$, and that $G^{(-L)}_k$ denotes $G^m_k$ restricted to coordinates not in $L$. We then apply the above inequality to get the following bound relating probabilities on a
T HEORY OF C OMPUTING, Volume 19 (10), 2023, pp. 1–44 21
E LBERT D U , M ICHAEL M ITZENMACHER , DAVID W OODRUFF , AND G UANG YANG
single coordinate, conditioned on the rest of the coordinates being contained in the rectangle $R$:
$$\begin{aligned}
\sum_{v \in \{0,1\}^m : |v - w| \le \varepsilon m} \Pr\left[G^m_k = u - v \mid G^m_{-k} \in R_{-k}\right]
&\le \sum_{v \in \{0,1\}^m : |v_L - w_L| \le \varepsilon m} \Pr\left[G^m_k = u - v \mid G^m_{-k} \in R_{-k}\right] \\
&= \sum_{v_L \in \{0,1\}^{|L|} : |v_L - w_L| \le \varepsilon m} \Pr\left[G^{(L)}_k = u_L - v_L \mid G^m_{-k} \in R_{-k}\right] \cdot \sum_{v_{-L} \in \{0,1\}^{m-|L|}} \Pr\left[G^{(-L)}_k = u_{-L} - v_{-L} \mid G^m_{-k} \in R_{-k},\, G^{(L)}_k = u_L - v_L\right] \\
&< \frac{1}{2} \sum_{v_L \in \{0,1\}^{|L|}} \Pr\left[G^{(L)}_k = u_L - v_L \mid G^m_{-k} \in R_{-k}\right] \cdot \sum_{v_{-L} \in \{0,1\}^{m-|L|}} \Pr\left[G^{(-L)}_k = u_{-L} - v_{-L} \mid G^m_{-k} \in R_{-k},\, G^{(L)}_k = u_L - v_L\right] \\
&= \frac{1}{2} \sum_{v \in \{0,1\}^m} \Pr\left[G^m_k = u - v \mid G^m_{-k} \in R_{-k}\right] \tag{6.3}
\end{aligned}$$
The above inequality (6.3) implies (6.2), since:
$$\begin{aligned}
\Pr_{X \sim \mathcal{H}}\left[X \in R, |F(X) - w| \le \varepsilon m\right]
&= \Pr_{X_{-k} \sim \mathcal{H}_{-k}}[X_{-k} \in R_{-k}] \cdot \Pr_{X_k \sim \mathcal{H}_k}\left[X_k \in R_k, |F(X) - w| \le \varepsilon m \mid X_{-k} \in R_{-k}\right] \\
&= \Pr\left[G^m_{-k} \in R_{-k}\right] \cdot \sum_{v \in \{0,1\}^m} \frac{1}{2^m} \Pr_{X_k \sim \mathcal{H}_k}\left[X_k \in R_k, |F(X) - w| \le \varepsilon m \mid G^m_{-k} \in R_{-k}, F(\mathcal{H}) = v\right] \\
&= \Pr\left[G^m_{-k} \in R_{-k}\right] \cdot \sum_{v \in \{0,1\}^m} \frac{1}{2^m} \Pr\left[v + G^m_k \in R_k, |v - w| \le \varepsilon m \mid G^m_{-k} \in R_{-k}\right] \\
&= \Pr\left[G^m_{-k} \in R_{-k}\right] \cdot \sum_{v \in \{0,1\}^m : |v - w| \le \varepsilon m} \frac{1}{2^m} \sum_{u \in R_k} \Pr\left[v + G^m_k = u \mid G^m_{-k} \in R_{-k}\right] \\
&= \frac{1}{2^m} \Pr\left[G^m_{-k} \in R_{-k}\right] \cdot \sum_{u \in R_k} \sum_{v \in \{0,1\}^m : |v - w| \le \varepsilon m} \Pr\left[v + G^m_k = u \mid G^m_{-k} \in R_{-k}\right] \\
&< \frac{1}{2^m} \Pr\left[G^m_{-k} \in R_{-k}\right] \cdot \sum_{u \in R_k} \frac{1}{2} \sum_{v \in \{0,1\}^m} \Pr\left[v + G^m_k = u \mid G^m_{-k} \in R_{-k}\right] \\
&= \frac{1}{2} \Pr_{X \sim \mathcal{H}}[X \in R]
\end{aligned}$$
Thus we have completed the proof of Claim 6.4.
Proof of Lemma 6.3. We prove this lemma inductively over the indices in $L$, which we label as $1, 2, \ldots, \ell$. In what follows, let $\delta_i := \frac{i}{p}$ for every $i \in [\ell]$. Given that $\left(G^{(1)}_k, \ldots, G^{(\ell-1)}_k\right) \mid \left(G^m_{-k} \in R_{-k}\right)$ is $\delta_{\ell-1}$-close to the uniform distribution over $\mathbb{Z}_p^{\ell-1}$, we will show that there exists another instance, which w.l.o.g. we label as $G^{(\ell)}_k$, for which $\left(G^{(1)}_k, \ldots, G^{(\ell-1)}_k, G^{(\ell)}_k\right) \mid \left(G^m_{-k} \in R_{-k}\right)$ is $\delta_\ell$-close to the uniform distribution over $\mathbb{Z}_p^{\ell}$.

The base case $\ell = 0$ is trivial. In what follows we suppose that the conditional distribution $\left(G^{(1)}_k, \ldots, G^{(\ell-1)}_k\right) \mid \left(G^m_{-k} \in R_{-k}\right)$ is already $\delta_{\ell-1}$-uniform, and we do our induction step for $G^{(\ell)}_k$.
First we fix $x \in \mathbb{Z}_p^{(\ell-1)\times(k-1)}$ for which $\Pr\left[\left(G^{(1)}_{-k}, \ldots, G^{(\ell-1)}_{-k}\right) = x \mid G^m_{-k} \in R_{-k}\right] \ge \frac{1}{\eta p^{(\ell-1)(k-1)}}$, where $\eta$ is some value that will be specified later to get the desired property, and let $\mathcal{E}_x$ denote the event $\left(G^{(1)}_{-k}, \ldots, G^{(\ell-1)}_{-k}\right) = x$. Then we discuss the conditional distribution of the remaining instances given $\mathcal{E}_x$.
Let
$$J_x := \left\{\, j \in [k-1] \;\Big|\; \Pr\left[G^m_j \in R_j \mid \mathcal{E}_x\right] \ge \frac{1}{\beta k^{2\gamma m}} \right\}.$$
Then
$$\Pr\left[G^m_{-k} \in R_{-k} \mid \mathcal{E}_x\right] = \prod_{j \in [k-1]} \Pr\left[G^m_j \in R_j \mid \mathcal{E}_x\right] \le \prod_{j \in [k-1] \setminus J_x} \frac{1}{\beta k^{2\gamma m}} = \left(\frac{1}{\beta k^{2\gamma m}}\right)^{k-1-|J_x|} \tag{6.4}$$
On the other hand, recalling that $\left(G^{(1)}_{-k}, \ldots, G^{(\ell-1)}_{-k}\right)$ is uniformly distributed and hence $\Pr[\mathcal{E}_x] = \frac{1}{p^{(\ell-1)(k-1)}}$, we have
$$\begin{aligned}
\Pr\left[G^m_{-k} \in R_{-k} \mid \mathcal{E}_x\right]
&= \Pr\left[G^m_{-k} \in R_{-k}, \mathcal{E}_x\right] / \Pr[\mathcal{E}_x] \\
&= \Pr\left[\mathcal{E}_x \mid G^m_{-k} \in R_{-k}\right] \cdot \Pr\left[G^m_{-k} \in R_{-k}\right] / \Pr[\mathcal{E}_x] \\
&\ge \frac{1}{\eta p^{(\ell-1)(k-1)}} \cdot \frac{1}{\alpha k^{\gamma mk}} \cdot \left(\frac{1}{p^{(\ell-1)(k-1)}}\right)^{-1} = \frac{1}{\eta\alpha k^{\gamma mk}}
\end{aligned} \tag{6.5}$$
Combining eqs. (6.4) and (6.5) and letting $\beta \ge (\eta\alpha)^{2/k}$, we can conclude
$$\left(\frac{1}{\beta k^{2\gamma m}}\right)^{k-1-|J_x|} \ge \frac{1}{\eta\alpha k^{\gamma mk}} \implies |J_x| \ge k - 1 - \frac{\gamma mk \log k + \log(\eta\alpha)}{2\gamma m \log k + \log \beta} \ge k - 1 - \frac{k}{2}.$$
Thus the size of $J_x$ is at least $|J_x| \ge \frac{k}{2} - 1 = \Omega(k)$.
For every $j \in J_x$, we have $\Pr\left[G^m_j \in R_j \mid \mathcal{E}_x\right] \ge \frac{1}{\beta k^{2\gamma m}}$ by definition of $J_x$, and hence
$$\mathrm{H}\left[G^m_j \mid G^m_j \in R_j, \mathcal{E}_x\right] \ge \log \frac{p^{m-\ell}}{\beta k^{2\gamma m}} = (m - \ell)\log p - 2\gamma m \log k - \log \beta \tag{6.6}$$
The inequality in (6.6) follows by noting that $G^m_j$ is initially uniform over $\mathbb{Z}_p^m$, which has $p^m$ distinct values. When we condition on $\mathcal{E}_x$, we fix $\ell - 1$ of the coordinates, so there are $p^{m-\ell+1}$ possible values of $G^m_j$ conditioned on $\mathcal{E}_x$. Then, the distribution is still uniform, so there must be at least $p^{m-\ell+1} / \left(\beta k^{2\gamma m}\right)$ values of $G^m_j$ satisfying $\mathcal{E}_x$ and $G^m_j \in R_j$. Thus, the entropy is at least
$$\mathrm{H}\left[G^m_j \mid G^m_j \in R_j, \mathcal{E}_x\right] \ge \log \frac{p^{m-\ell+1}}{\beta k^{2\gamma m}} > \log \frac{p^{m-\ell}}{\beta k^{2\gamma m}}.$$
Note that for every $i \in [m]$, $G_j^{(i)}$ is uniform over $\mathbb{Z}_p$ as long as $j \in [k-1]$. Thus, conditioned on $\mathcal{E}_x$ and $G_j^m \in R_j$, if there exists $a \in \mathbb{Z}_p$ with $\Pr\bigl[G_j^{(i)} = a \mid G_j^m \in R_j, \mathcal{E}_x\bigr] = p_a > \frac12$, then we have an upper bound for the conditional entropy of $G_j^{(i)}$:
$$\mathrm{H}\bigl[G_j^{(i)} \mid G_j^m \in R_j, \mathcal{E}_x\bigr] \le p_a\log\frac{1}{p_a} + (1-p_a)\log(p-1) < \bigl(1+\log(p-1)\bigr)/2$$
Let $I_{j,x}$ be defined as
$$I_{j,x} := \Bigl\{\, i \in [m] \;\Big|\; \mathrm{H}_\infty\bigl[G_j^{(i)} \mid G_j^m \in R_j, \mathcal{E}_x\bigr] \ge 1 \,\Bigr\} = \Bigl\{\, i \;\Big|\; \forall a,\ \Pr\bigl[G_j^{(i)} = a \mid G_j^m \in R_j, \mathcal{E}_x\bigr] \le \frac12 \,\Bigr\}$$
where $\mathrm{H}_\infty$ refers to the min-entropy.
Then for all $i \in \bar I_{j,x} := \bigl([m]\setminus[\ell-1]\bigr)\setminus I_{j,x}$, $\mathrm{H}\bigl[G_j^{(i)} \mid G_j^m \in R_j, \mathcal{E}_x\bigr] < \bigl(1+\log(p-1)\bigr)/2$.
Additionally, for $i \in [\ell-1]$, $\mathrm{H}\bigl[G_j^{(i)} \mid G_j^m \in R_j, \mathcal{E}_x\bigr] = 0$ since $G_j^{(i)}$ is already fixed in $\mathcal{E}_x$. Finally, for all $i \in I_{j,x}$, $\mathrm{H}\bigl[G_j^{(i)} \mid G_j^m \in R_j, \mathcal{E}_x\bigr] \le \log p$ since there are $p$ possible values of $G_j^{(i)}$. As such, we can bound the conditional entropy as follows:
$$\mathrm{H}\bigl[G_j^m \mid G_j^m \in R_j, \mathcal{E}_x\bigr] \le \sum_{i=1}^{m}\mathrm{H}\bigl[G_j^{(i)} \mid G_j^m \in R_j, \mathcal{E}_x\bigr] \tag{6.7}$$
$$= \sum_{i\in I_{j,x}}\mathrm{H}\bigl[G_j^{(i)} \mid G_j^m \in R_j, \mathcal{E}_x\bigr] + \sum_{i\notin I_{j,x}\cup\bar I_{j,x}}\mathrm{H}\bigl[G_j^{(i)} \mid G_j^m \in R_j, \mathcal{E}_x\bigr] + \sum_{i\in\bar I_{j,x}}\mathrm{H}\bigl[G_j^{(i)} \mid G_j^m \in R_j, \mathcal{E}_x\bigr] \tag{6.8}$$
$$= \sum_{i\in I_{j,x}}\mathrm{H}\bigl[G_j^{(i)} \mid G_j^m \in R_j, \mathcal{E}_x\bigr] + 0 + \sum_{i\in\bar I_{j,x}}\mathrm{H}\bigl[G_j^{(i)} \mid G_j^m \in R_j, \mathcal{E}_x\bigr] \tag{6.9}$$
$$\le |I_{j,x}|\cdot\log p + \bigl(m-\ell+1-|I_{j,x}|\bigr)\bigl(1+\log(p-1)\bigr)/2 \tag{6.10}$$
Here, (6.7) follows from the subadditivity of entropy, (6.8) splits the indices into three sets, and (6.9) and (6.10) use the statements we just showed above to upper bound the entropy of $G_j^{(i)}$ for $i$ in each of the sets $I_{j,x}$, $\bar I_{j,x}$, and $i$ in neither set.
Combining the above with the lower bound for $\mathrm{H}\bigl[G_j^m \mid G_j^m \in R_j, \mathcal{E}_x\bigr]$ in (6.6),
$$(m-\ell)\log p - 2\gamma m\log k - \log\beta \le |I_{j,x}|\cdot\log p + \bigl(m-\ell+1-|I_{j,x}|\bigr)\bigl(1+\log p\bigr)/2$$
$$\Longrightarrow\quad |I_{j,x}|\cdot\frac{\log p-1}{2} \ge (m-\ell)\cdot\frac{\log p-1}{2} - 2\gamma m\log k - \frac{1+\log p}{2} - \log\beta$$
Therefore, recalling that $p > k^{1/4}$ and for $\log\beta = o(\log p) = o(\log k)$, we have
$$|I_{j,x}| \ge m-\ell-\frac{4\gamma m\log k}{\log p - 1} - O\Bigl(\frac{\log\beta}{\log p}\Bigr) > m-\ell-\frac{4\gamma m\log k}{\frac14\log k - 1} - 1 - o(1) > m-\ell-18\gamma m+1$$
Therefore, for every $x \in \mathbb{Z}_p^{(\ell-1)\times(k-1)}$ for which
$$\Pr\Bigl[\bigl(G_{-k}^{(1)},\dots,G_{-k}^{(\ell-1)}\bigr) = x \;\Big|\; G_{-k}^m \in R_{-k}\Bigr] \ge \frac{1}{\eta\, p^{(\ell-1)(k-1)}},$$
the size of $J_x$ satisfies $|J_x| \ge \bigl(1-\frac{\gamma}{2\gamma}\bigr)k-1 = \Omega(k)$; and for every $j \in J_x$, $|I_{j,x}| > m-\ell-18\gamma m+1$ and $\bigl|\bar I_{j,x}\bigr| = m-\ell+1-|I_{j,x}| < 18\gamma m$.
That is, these three bounds hold with probability at least $1-\frac{1}{\eta}$ by taking a union bound over all $x \in \mathbb{Z}_p^{(\ell-1)\times(k-1)}$ where
$$\Pr\Bigl[\bigl(G_{-k}^{(1)},\dots,G_{-k}^{(\ell-1)}\bigr) = x \;\Big|\; G_{-k}^m \in R_{-k}\Bigr] < \frac{1}{\eta\, p^{(\ell-1)(k-1)}}$$
for $x \sim \bigl(G_{-k}^{(1)},\dots,G_{-k}^{(\ell-1)} \mid G_{-k}^m \in R_{-k}\bigr)$. In what follows we abuse notation a little by assuming $\mathcal{X} := \bigl(G_{-k}^{(1)},\dots,G_{-k}^{(\ell-1)} \mid G_{-k}^m \in R_{-k}\bigr)$ is a distribution over $\mathbb{Z}_p^{(\ell-1)(k-1)}$ for which $\mathcal{X}$ satisfies all the above statements about $J_x$ and $I_{j,x}$. This causes at most an additional loss of $\frac{1}{\eta}$ in the error probability.
Notice that the conditional distribution $\bigl(G_{-k}^{(1)},\dots,G_{-k}^{(\ell-1)} \mid G_{-k}^m \in R_{-k}\bigr)$ is indeed a product distribution, since $R$ is a rectangle. That is, letting $x = (x_1,\dots,x_{k-1})$ where $x_j \in \mathbb{Z}_p^{\ell-1}$ for $j \in [k-1]$, the event $\mathcal{E}_x$ can be decomposed into $k-1$ independent events $\mathcal{E}_{x_j}$, where each $\mathcal{E}_{x_j}$ denotes the event $\bigl(G_j^{(1)},\dots,G_j^{(\ell-1)}\bigr) = x_j$ and $\mathcal{E}_x = \wedge_{j=1}^{k-1}\mathcal{E}_{x_j}$. Therefore the conditional distribution $\bigl(G_j^m \mid \mathcal{E}_x\bigr)$ is identical to $\bigl(G_j^m \mid \mathcal{E}_{x_j}\bigr)$, since the distribution of $G_j^m$ is independent from the inputs of the remaining $k-2$ players (among the first $k-1$ players) in the product distribution. As a result, we have $\Pr\bigl[G_j^m \in R_j \mid \mathcal{E}_x\bigr] = \Pr\bigl[G_j^m \in R_j \mid \mathcal{E}_{x_j}\bigr]$, so that $\mathcal{E}_{x_j}$ and $x_j$ fully determine whether $j \in J_x$, following the definition of $J_x$. Similarly, $\bigl(G_j^{(i)} \mid G_j^m \in R_j, \mathcal{E}_x\bigr)$ is identical to $\bigl(G_j^{(i)} \mid G_j^m \in R_j, \mathcal{E}_{x_j}\bigr)$, so that $I_{j,x}$ is also fully determined by $x_j$ and $\mathcal{E}_{x_j}$.
Next we fix $j \in [k-1]$ and pick $x_j \in \mathbb{Z}_p^{\ell-1}$ for which $j \in J_x$ for $x$ extended from $x_j$. Now we have $\mathcal{E}_{x_j}$, and $I_{j,x_j} := I_{j,x}$ contains all but a fraction $< \frac{18\gamma m}{m-\ell+1}$ of the coordinates, since $\bigl|\bar I_{j,x_j}\bigr| < 18\gamma m$ out of the $m-\ell+1$ unfixed coordinates in total. Then for $X_j \sim \mathcal{U}\bigl(\mathbb{Z}_p^{\ell-1}\bigr)$ and $\mathcal{I}(\cdot)$ denoting the indicator function,
$$\begin{aligned}
\sum_{i=\ell}^m \mathcal{I}\Bigl(\Pr_{X_j}\bigl[i \in \bar I_{j,X_j} \;\big|\; j \in J_{X_j}\bigr] \ge \frac13\Bigr)
&\le 3\sum_{i=\ell}^m \Pr_{X_j}\bigl[i \in \bar I_{j,X_j} \;\big|\; j \in J_{X_j}\bigr]\\
&= 3\sum_{i=\ell}^m \sum_{x_j \in \mathbb{Z}_p^{\ell-1}:\, j\in J_{x_j}} \Pr\bigl[X_j = x_j \mid j \in J_{X_j}\bigr]\cdot\mathcal{I}\bigl(i \in \bar I_{j,x_j}\bigr)\\
&= 3\sum_{x_j \in \mathbb{Z}_p^{\ell-1}:\, j\in J_{x_j}} \Pr\bigl[X_j = x_j \mid j \in J_{X_j}\bigr]\cdot\sum_{i=\ell}^m \mathcal{I}\bigl(i \in \bar I_{j,x_j}\bigr)\\
&= 3\sum_{x_j \in \mathbb{Z}_p^{\ell-1}:\, j\in J_{x_j}} \Pr\bigl[X_j = x_j \mid j \in J_{X_j}\bigr]\cdot\bigl|\bar I_{j,x_j}\bigr| < 54\gamma m
\end{aligned}$$
That is, for every fixed $j \in [k-1]$, there are at least $m-\ell+1-54\gamma m$ coordinates $i \in [m]$ satisfying $\Pr\bigl[i \in I_{j,X_j} \mid j \in J_{X_j}\bigr] > \frac23$, i.e., with probability $\frac23$, $G_j^{(i)}$ satisfies $\mathrm{H}_\infty\bigl[G_j^{(i)} \mid G_j^m \in R_j, \mathcal{E}_x\bigr] \ge 1$ for a randomly selected $x_j$, conditioned on the event that $j \in J_{x_j}$ specifies a big component in the rectangle. This is exactly the probability that the $i$-th coordinate $G_j^{(i)}$ of $G_j^m$ can be decomposed into a convex combination of uniform distributions over $2$ elements.
Now we have at least $(m-\ell+1-54\gamma m)(k-1)$ pairs of $(i,j) \in \{\ell,\ell+1,\dots,m\}\times[k-1]$ satisfying the above condition $\Pr\bigl[i \in I_{j,X_j} \mid j \in J_{X_j}\bigr] > \frac23$, which means at least one fixed $i$ must appear in $\frac{(m-\ell+1-54\gamma m)(k-1)}{m-\ell+1} = \bigl(1-\frac{54\gamma m}{m-\ell+1}\bigr)(k-1)$ many pairs for different $j \in [k-1]$ by a standard averaging argument. Without loss of generality we may assume $i = \ell$, and let $G'' := (G''_1,\dots,G''_k)$ denote the conditional distribution of $G^{(\ell)}$, i.e., each $G''_j := \bigl(G_j^{(\ell)} \mid G_j^m \in R_j, \mathcal{E}_x\bigr)$ denotes the conditional distribution of $G_j^{(\ell)}$. Recalling that $|J_x| \ge \bigl(1-\frac{\gamma}{2\gamma}\bigr)k-1$, the number of elements of $J_x$ hit by those pairs containing $\ell$ is at least
$$\Bigl(1-\frac{\gamma}{2\gamma}\Bigr)k-1 + \Bigl(1-\frac{54\gamma m}{m-\ell+1}\Bigr)(k-1) - (k-1) \ge \Bigl(1-\frac{\gamma}{2\gamma}-\frac{54\gamma m}{m-\ell+1}\Bigr)k - 1 = \Omega(k)$$
We say the pair $(i,j)$ is good for $x$ if $j \in J_x$ and $i \in I_{j,x}$. Then, recalling that $|J_x| \ge \bigl(1-\frac{\gamma}{2\gamma}\bigr)k-1$, the expected number of good $(\ell,j)$ over $x \sim \mathcal{X}$ is lower bounded as follows.
$$\begin{aligned}
\mathbb{E}_x\Bigl[\#\bigl\{j \in [k-1] \mid (\ell,j) \text{ is good for } x\bigr\}\Bigr]
&= \mathbb{E}_x\Bigl[\sum_{j=1}^{k-1}\mathcal{I}\bigl((\ell,j)\text{ is good for }x\bigr)\Bigr]\\
&\ge \sum_{j=1}^{k-1}\mathbb{E}_x\Bigl[\mathcal{I}\bigl((\ell,j)\text{ is good for }x\bigr)\Bigr] = \sum_{j=1}^{k-1}\Pr_x\bigl[\ell\in I_{j,x},\; j\in J_x\bigr]\\
&\ge \sum_{j\in J_x}\Pr_x\bigl[\ell\in I_{j,x} \mid j\in J_x\bigr]\\
&\ge \frac23\cdot\Bigl(1-\frac{\gamma}{2\gamma}-\frac{54\gamma m}{m-\ell+1}\Bigr)(k-1)
\end{aligned}$$
By a Chernoff bound it implies
$$\Pr_x\Bigl[\#\bigl\{j\in[k-1] \mid (\ell,j)\text{ is good for }x\bigr\} \le \frac13\Bigl(1-\frac{\gamma}{2\gamma}-\frac{54\gamma m}{m-\ell+1}\Bigr)k\Bigr] \le \exp\Bigl(-\Omega\Bigl(\Bigl(1-\frac{\gamma}{2\gamma}-\frac{54\gamma m}{m-\ell+1}\Bigr)k\Bigr)\Bigr)$$
Let $\delta' = \exp\bigl(-\Omega\bigl(\bigl(1-\frac{\gamma}{2\gamma}-\frac{54\gamma m}{m-\ell+1}\bigr)k\bigr)\bigr)$ be an upper bound of this error probability. Then with probability at least $1-\delta'$, the conditional distribution $G''_j$ can be decomposed into a convex combination of uniform distributions over two distinct elements for at least $\frac13\bigl(1-\frac{\gamma}{2\gamma}-\frac{54\gamma m}{m-\ell+1}\bigr)k$ indices $j \in [k-1]$.
Next we show that, conditioned on the above decomposition, which happens with probability $\ge 1-\delta'$, the conditional distribution $G''_k$ is close to uniform, by the following claim.

Claim 6.5 (Claim 31 in [23]). Let $p$ be a prime number. Let $X$ be the sum of $t$ independent random variables, each uniform over $\{a_i, b_i\} \subset \mathbb{Z}_p$ for $a_i \ne b_i$. Then $X$ modulo $p$ is $\delta$-close to uniform for $\delta \le 0.5\sqrt{p}\,\exp\bigl(-\Omega(t/p^2)\bigr)$.
Remark 6.6. This claim is actually stated incorrectly in the source, where Viola claimed $O(t/p^2)$ instead of $-\Omega(t/p^2)$. However, their usage of the claim, as well as their proof, are consistent with the statement here. The proof of this claim simply considers primitive roots of unity and does some basic computation, but to see intuitively why this is the case, we can think of $X = ta_i + Y(b_i-a_i)$, where $Y$ is a binomial random variable with $t$ trials of probability $1/2$. The probability mass of $Y$ is concentrated within $O(\sqrt{t})$ of the mean, so as long as $t$ is large enough for this range to be significantly larger than $p$, we will have a strong bound.
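As a sanity check of Claim 6.5 (illustrative only, not part of the proof), one can compute the exact distribution of such a sum modulo $p$ by dynamic programming and watch the distance to uniform collapse as $t$ grows; the concrete values of $p$ and $t$ below are arbitrary.

```python
def sum_mod_p_distribution(pairs, p):
    """Exact distribution of (sum of independent uniform-{a_i, b_i} variables)
    mod p, computed by dynamic programming over the residues."""
    dist = [0.0] * p
    dist[0] = 1.0
    for a, b in pairs:
        new = [0.0] * p
        for r, mass in enumerate(dist):
            new[(r + a) % p] += 0.5 * mass
            new[(r + b) % p] += 0.5 * mass
        dist = new
    return dist

def tv_to_uniform(dist):
    """Total variation (statistical) distance to the uniform distribution."""
    p = len(dist)
    return 0.5 * sum(abs(q - 1.0 / p) for q in dist)

p = 5
d_short = tv_to_uniform(sum_mod_p_distribution([(0, 1)] * 5, p))    # t = 5
d_long = tv_to_uniform(sum_mod_p_distribution([(0, 1)] * 200, p))   # t = 200
```

For $t = 5$ the distance is still $0.225$, while for $t = 200$ it is numerically indistinguishable from $0$, consistent with the $\exp(-\Omega(t/p^2))$ decay.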
Plugging our parameters into the above claim and following exactly the same argument as in [23] ($G''_k$ is $\delta''$-close to uniform if every component in the above convex decomposition of $G''_k$ is $\delta''$-close to uniform), the statistical distance between $G''_k = -\sum_{j=1}^{k-1}G''_j$ and the uniform distribution over $\mathbb{Z}_p$ is bounded by
$$\delta'' \le 0.5\sqrt{p}\,\exp\Bigl(-\Omega\Bigl(\frac13\Bigl(1-\frac{\gamma}{2\gamma}-\frac{54\gamma m}{m-\ell+1}\Bigr)k/p^2\Bigr)\Bigr) = \exp\Bigl(-\Omega\Bigl(\Bigl(1-\frac{\gamma}{2\gamma}-\frac{54\gamma m}{m-\ell+1}\Bigr)\sqrt{k}\Bigr)\Bigr)$$
Putting it all together, we conclude that $\bigl(G_k^{(\ell)} \mid G_{-k}^m \in R_{-k}, G^{(1)},\dots,G^{(\ell-1)}\bigr)$ is close to uniform, which implies $\bigl(G_k^{(1)},\dots,G_k^{(\ell-1)},G_k^{(\ell)} \mid G_{-k}^m \in R_{-k}\bigr)$ is also close to uniform. Moreover, its statistical distance to uniform is bounded by
$$\delta_\ell \le \delta_{\ell-1} + \frac{1}{\eta} + \delta' + \delta''$$
Let $\eta = 2p$ and $\beta = 2^{O(\log k)/k} \ge (\eta\alpha)^{2/k}$ for $\alpha = p^{O(1)}$. Then for sufficiently large $k$ the above induction argument goes through for $\ell \le (1-135\gamma)m$, with errors $\delta', \delta''$ bounded by
$$\delta' = \exp\bigl(-\Omega(k)\bigr),\qquad \delta'' = \exp\bigl(-\Omega\bigl(\sqrt{k}\bigr)\bigr) \quad\Longleftarrow\quad 1-\frac{\gamma}{2\gamma}-\frac{54\gamma m}{m-\ell+1} \ge 0.1 \quad\Longleftrightarrow\quad \ell \le (1-135\gamma)m+1$$
Therefore the conditional distribution $\bigl(G_k^{(1)},\dots,G_k^{(\ell-1)},G_k^{(\ell)} \mid G_{-k}^m \in R_{-k}\bigr)$ is $\delta_\ell$-close to uniform for $\delta_\ell$ bounded by $\frac{\ell}{p}$ as follows:
$$\delta_\ell \le \delta_{\ell-1} + \frac{1}{\eta} + \delta' + \delta'' \le \frac{\ell-1}{p} + \frac{1}{2p} + \exp\bigl(-\Omega\bigl(\sqrt{k}\bigr)\bigr) \le \frac{\ell}{p}$$
Thus we have proved the induction hypothesis for every $\ell \le (1-135\gamma)m$. Letting $L$ be the first $(1-135\gamma)m$ indices as in the induction hypothesis, we complete the proof of Lemma 6.3 for $|L| = (1-135\gamma)m$ and statistical distance $\frac{|L|}{p}$.
7 Lower bound for Hamming norm estimation
In this section we present a space lower bound for single-pass streaming algorithms for (1 ± 𝜀)-
approximating the Hamming norm 𝐿0 in the strict turnstile model, which is denoted by T𝜀 as in
Section 1.1.1.
Formally, in the Hamming norm estimation problem there is an underlying vector $(x_1,\dots,x_N)$ which starts as the all-zero vector and processes up to $m$ updates, each of the form $(i,v) \in [N]\times[\pm M]$. The update $(i,v)$ means one should add $v$ to the $i$-th coordinate $x_i$ in the vector $x$.
After processing all $m$ updates, we have $\|x\|_0 = |\{i \mid x_i \ne 0\}|$, and we want to output a number within $(1\pm\varepsilon)\|x\|_0$ with probability $\ge 2/3$. We additionally assume all players have access to a heavy hitters oracle, which tells them whether the frequency of a given coordinate is greater than $T$. This is a generalization of the case without a heavy hitters oracle, where we simply let $T = mM$ and we know that all frequencies are guaranteed to be smaller. The strict turnstile model guarantees that $x_i \ge 0$ for all $i \in [N]$ at all positions in the stream, in which case it suffices to prove the space lower bound in the simultaneous communication model following the reduction in Theorem 4.1 of [2]. Furthermore, it is also guaranteed that for every $i \in [N]$, $x_i \le \mathrm{poly}(n)$ at the end of the stream. In this setting, the algorithm of [16] approximates $\|x\|_0$ up to a $(1\pm\varepsilon)$ factor with $O\bigl(\varepsilon^{-2}\log(N)\bigl(\log(1/\varepsilon)+\log\log(T)\bigr)\bigr)$ bits of space,⁶ as long as $\varepsilon > 0$.
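For concreteness, here is a minimal (exact, and deliberately not space-efficient) reference implementation of the quantity being approximated, namely the strict turnstile stream semantics and the Hamming norm $\|x\|_0$; the example stream is invented for illustration:

```python
from collections import defaultdict

def hamming_norm(updates):
    """Apply a strict turnstile stream of (i, v) updates and return
    ||x||_0 = |{i : x_i != 0}| at the end of the stream."""
    x = defaultdict(int)
    for i, v in updates:
        x[i] += v
        # Strict turnstile model: no coordinate ever goes negative.
        assert x[i] >= 0, "stream violates the strict turnstile promise"
    return sum(1 for v in x.values() if v != 0)

stream = [(1, 3), (2, 1), (1, -3), (4, 2), (2, 5)]  # coordinate 1 ends at 0
```

On this stream only coordinates $2$ and $4$ end nonzero, so `hamming_norm(stream)` returns $2$, and a $(1\pm\varepsilon)$-approximation algorithm must output a value in $(1\pm\varepsilon)\cdot 2$.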
We first note that solving distinct elements with a heavy hitters oracle reduces to solving distinct elements given a threshold on the frequency of the coordinates. As such, we will address the space complexity question given a threshold $T$ on the frequencies.
Theorem 7.1. The space complexity of (1 ± 𝜖) approximating 𝐿0 with probability at least 2/3 in a strict
turnstile stream with access to a heavy hitters oracle with a threshold of 𝑇 > 1 is Ω(𝜖 −2 log 𝑁 log log 𝑇).
We note that the assumption $T > 1$ is necessary for this bound to be well defined. When $T = 1$, the heavy hitters oracle tells us exactly whether or not the frequency of a coordinate is $0$ at the end of the stream, so the complexity is $\Theta(\log N)$: the lower bound follows as we need to write down the answer, and the upper bound follows as we can directly count the elements with nonzero frequency.
To prove this theorem, we first prove the following lemma:
Lemma 7.2. The space complexity of $(1\pm\epsilon)$-approximating $L_0$ with probability at least $2/3$ in a strict turnstile stream with access to a heavy hitters oracle with a threshold of $T > 1$ is at least $\mathrm{RSC}^{T}_{2/3}(T_\epsilon)$.
Proof. Suppose we have an algorithm 𝐴 which gives us a (1 ± 𝜖) approximation of 𝐿0 in a strict
turnstile stream with access to a heavy hitters oracle with a threshold of 𝑇.
Now, if we are given an input where the maximum frequency of any element is at most 𝑇,
then we can go through our input and do exactly what 𝐴 would do for everything other than
calls to the heavy hitters oracle. If 𝐴 would make a call to a heavy hitters oracle, we just treat the
answer as 0 without making this query and proceed as 𝐴 would.
Since we assumed the input has a maximum frequency of 𝑇, the heavy hitters oracle would
return 0 for every element, so this would give us the same answer as 𝐴, and by correctness of 𝐴,
it is a (1 ± 𝜖) approximation.
Now, we will state and prove our main theorem:
⁶Indeed, their algorithm stores $O\bigl(\varepsilon^{-2}\log N\bigr)$ counters modulo primes that are each $O\bigl(\log(1/\varepsilon)+\log\log(T)\bigr)$ bits in magnitude, and it does not matter how large the values of $x_i$ are at intermediate positions in the stream.
Theorem 7.3. For error tolerance $\varepsilon < 1/3$ and $\varepsilon = \max\Bigl\{\Omega\Bigl(\sqrt{\tfrac{\log k}{k}}\Bigr),\, \frac{1}{N^{0.49}}\Bigr\}$, any single-pass streaming algorithm solving $T_\varepsilon$ with probability $\ge 2/3$ in the strict turnstile model must use $\Omega\bigl(\varepsilon^{-2}\log(N)\log\log(T)\bigr)$ bits of space.
First we introduce some supplementary problems that will be used in reductions:
Definition 7.4. In the $c$-Gap-Ort$_n$ problem, we have two players, Alice and Bob. They each have as input a vector in $\{0,1\}^n$ and we wish to compute
$$c\text{-Gap-Ort}_n(x,y) = \begin{cases} 1, & \Bigl|\sum_{i\in[n]}\mathrm{XOR}(x_i,y_i) - \frac{n}{2}\Bigr| \ge 2c\sqrt{n},\\[2pt] 0, & \Bigl|\sum_{i\in[n]}\mathrm{XOR}(x_i,y_i) - \frac{n}{2}\Bigr| \le c\sqrt{n}, \end{cases}$$
and otherwise, it can return anything.
Definition 7.5. In the $c$-Gap-Ort-Sum-Equal$_{n,k}$ problem, we have $k$ players. The $i$-th player has input $(x_{i,1}, x_{i,2}, \dots, x_{i,n}) \in \mathbb{Z}^n$. Then we wish to compute
$$c\text{-Gap-Ort-Sum-Equal}_{n,k}(x) = \begin{cases} 1, & \Bigl|\sum_{j\in[n]}\mathbf{1}_{x_{1,j}+x_{2,j}+\cdots+x_{k,j}=0} - \frac{n}{2}\Bigr| \ge 2c\sqrt{n},\\[2pt] 0, & \Bigl|\sum_{j\in[n]}\mathbf{1}_{x_{1,j}+x_{2,j}+\cdots+x_{k,j}=0} - \frac{n}{2}\Bigr| \le c\sqrt{n}, \end{cases}$$
and in other cases it can return either $0$ or $1$. We let $\mathbf{1}_{x_{1,j}+x_{2,j}+\cdots+x_{k,j}=0}$ denote the indicator function which is $1$ if $x_{1,j}+x_{2,j}+\cdots+x_{k,j}=0$ and $0$ otherwise.
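A direct transcription of Definition 7.5 may help fix the semantics (the gap region returns `None` to model the "any answer is legal" case; the absolute-value reading of the gap condition follows the definition above, and the inputs in the usage note are invented):

```python
import math

def gap_ort_sum_equal(inputs, c):
    """Evaluate c-Gap-Ort-Sum-Equal_{n,k}: `inputs` is a list of k integer
    vectors of length n, one per player. Count the coordinates whose k-wise
    sum is zero, compare the deviation from n/2 against the gap thresholds,
    and return None inside the gap, where any answer is allowed."""
    n = len(inputs[0])
    zero_sums = sum(1 for j in range(n) if sum(x[j] for x in inputs) == 0)
    deviation = abs(zero_sums - n / 2)
    if deviation >= 2 * c * math.sqrt(n):
        return 1
    if deviation <= c * math.sqrt(n):
        return 0
    return None  # inside the gap: the function may return anything
```

For instance, with two players holding `[1, -1, 0, 0]` and `[-1, 1, 0, 0]`, all four coordinate sums are zero, so the deviation is $2 = 2c\sqrt{n}$ for $c = 0.5$ and the answer is $1$.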
We will often be working with the $2$-player problem $c$-Gap-Ort-Sum-Equal$_{n,2}$, which we simply denote by $c$-Gap-Ort-Sum-Equal$_n$.
Additionally, we will let Sum-Equal$^{m,a}_{k,\delta}$ denote the problem where we have $m$ independent instances of Sum-Equal$_{k,\delta}$, and to solve it, our protocol needs to be able to solve at least $am$ of these instances correctly with probability at least $1-\delta$.
Definition 7.6. The Aug-Index-GOSE$^t_{n,k}$ problem consists of $t$ independent instances of $10\epsilon\sqrt{n}$-Gap-Ort-Sum-Equal$_n$, denoted $g_1, g_2, \dots, g_t$, with $k$ players and $n$ coordinates each.⁷ In this problem, the referee is asked to estimate $g_i$ based on an index $i \in [t]$ together with the auxiliary information of $f_{i+1}, \dots, f_t$, where $f_i \in [\pm n]$ is defined as follows:
Let $a$ be the number of underlying Sum-Equal$_k$ instances in $g_i$ outputting $1$, and let $b$ be the number of underlying Sum-Equal$_k$ instances in $g_i$ outputting $0$. Then, $f_i = a-b$.
To prove Theorem 7.3, we combine the following statements:
(1) $T_\epsilon$ reduces to Aug-Index-GOSE$^t_{n,k}$.
⁷Note that Aug-Index-GOSE$^t_{n,k}$ implicitly depends on the value of $\varepsilon$ even though we do not take $\varepsilon$ as a parameter, since in this work we will only be using it with one value of $\varepsilon$.
(2) Aug-Index-GOSE$^t_{n,k}$ reduces to $10\epsilon\sqrt{n}$-Gap-Ort-Sum-Equal$^t_{n,k}$.
(3) $10\epsilon\sqrt{n}$-Gap-Ort-Sum-Equal$^t_{n,k}$ reduces to $1$-Gap-Ort-Sum-Equal$^t_{n',k}$ for some $n'$ which will be defined later.
(4) The $k$-player communication complexity in the linear sketch model is bounded by $k-1$ times the $2$-player communication complexity in the linear sketch model with a specific input distribution.
(5) For a particular hard input distribution $\mu$, $1$-Gap-Ort-Sum-Equal$^t_{n',2}$ reduces to Sum-Equal$^{tn',c}_2$.
(6) We can directly bound the communication complexity of Sum-Equal$^{tn',c}_2$ on said distribution.
Of these statements, (2) follows almost immediately from the definition of Aug-Index-GOSE, and (4) follows by definition of the linear sketch model of communication. This will be explained in more detail when we combine all of the above parts to bound the complexity of $\mathrm{RSC}^T_{k,0.4}(T_\epsilon)$.
The following lemma proves (3).

Lemma 7.7. For every $k \in \mathbb{N}$, $0 \le \delta \le 1/2$, and $n \ge \frac{c^2}{100\epsilon^2} = n'$,
$$\mathrm{RCC}^{LIN,T}_{k,\delta}\bigl(10\epsilon\sqrt{n}\text{-Gap-Ort-Sum-Equal}_{n,k}\bigr) \ge \mathrm{RCC}^{LIN,T}_{k,\delta}\bigl(c\text{-Gap-Ort-Sum-Equal}_{n',k}\bigr)$$
Proof of Lemma 7.7. Given $n' = \frac{c^2}{100\varepsilon^2}$ and an input instance of $c$-Gap-Ort-Sum-Equal$_{n'}$ with underlying Sum-Equal problems outputting $\mathbf{x}' \in \{0,1\}^{n'}$, we create the new input to $10\varepsilon\sqrt{n}$-Gap-Ort-Sum-Equal$_n$ by taking $100\varepsilon^2 n/c^2$ copies of each coordinate, with the results of the underlying problems being $\mathbf{x} \in \{-1,1\}^n$, where we map each output of $0$ to $-1$. As a result, $\sum_{j=1}^{n}\mathbf{x}_j = \frac{100\varepsilon^2 n}{c^2}\cdot\sum_{j=1}^{n'}\mathbf{x}'_j$.
If $\bigl|\sum_j \mathbf{x}'_j\bigr| \le c\sqrt{n'}$, then $\bigl|\sum_j \mathbf{x}_j\bigr| \le 10\varepsilon n$, and on the other hand $\bigl|\sum_j \mathbf{x}'_j\bigr| \ge 2c\sqrt{n'}$ implies $\bigl|\sum_j \mathbf{x}_j\bigr| \ge 20\varepsilon n$.
Thus, any $k$-player $\delta$-error simultaneous communication protocol for $10\varepsilon\sqrt{n}$-Gap-Ort-Sum-Equal$_n$ immediately implies a $k$-player $\delta$-error simultaneous communication protocol for $c$-Gap-Ort-Sum-Equal$_{n'}$.
Since all we are doing is copying coordinates, this does not change the threshold.
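The scaling in this proof can be checked numerically; the following sketch (with arbitrary small parameters) blows up a $\pm 1$ vector by the prescribed number of copies and confirms that a boundary sum of $c\sqrt{n'}$ maps to exactly $10\varepsilon n$:

```python
def expand(x_prime, copies):
    """Duplicate each coordinate of x' `copies` times, as in the reduction."""
    return [v for v in x_prime for _ in range(copies)]

c, eps = 1.0, 0.05
n_prime = round(c ** 2 / (100 * eps ** 2))   # n' = c^2 / (100 eps^2) = 4
copies = 25                                  # = 100 eps^2 n / c^2 for n = 100
n = copies * n_prime                         # n = 100

x_prime = [1, 1, -1, 1]                      # sum = 2 = c * sqrt(n')
x = expand(x_prime, copies)
scaled_sum = sum(x)                          # = copies * sum(x') = 50 = 10 eps n
```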
Now, we prove (5).

Theorem 7.8. Given some simultaneous communication protocol $\Pi$ with two players that solves $1$-Gap-Ort-Sum-Equal$_n$ when each Sum-Equal instance has input drawn from the distribution $\mu$, where $\mu := (G/2+B/2)^m$ consists of $m$ independent copies of $G/2+B/2$ (here and later, we use this notation to denote a random variable being drawn from $G$ with probability $\frac12$ and $B$ with probability $\frac12$), for $G, B$ defined as follows:
$$\begin{cases} G := \bigl(G_1,\dots,G_{k-1},\; -\sum_{j=1}^{k-1}G_j\bigr)\\ B := \bigl(B_1,\dots,B_{k-1},\; M-\sum_{j=1}^{k-1}B_j\bigr) \end{cases}$$
there exists a protocol $\Pi'$ such that $\mathrm{RCC}^{LIN}_{2,\delta}(\Pi') \le O\bigl(\mathrm{RCC}^{LIN}_{2,\delta}(\Pi)\bigr)$ which solves $\Omega(n)$ of the individual Sum-Equal instances with probability at least $\frac{1+\alpha}{2}$ for some constant $\alpha > 0$.
Proof. Suppose $\Pi$ is a protocol that solves $1$-Gap-Ort-Sum-Equal$_n$. Now, if Alice has input $X = (x_1, x_2, \dots, x_n)$ and Bob has input $Y = (y_1, y_2, \dots, y_n)$ to $1$-Gap-Ort-Sum-Equal$_n$, we define a corresponding instance of $1$-Gap-Ort$_n$ where Alice gets input $X' = (x'_1, x'_2, \dots, x'_n)$ and Bob gets input $Y' = (y'_1, y'_2, \dots, y'_n)$. We define $x'_i = 1-y'_i$ iff $x_i+y_i = 0$, and $y'_i = 0$ with probability $1$ if $y_i < M/2$, probability $\frac12$ if $y_i = M/2$, and probability $0$ otherwise, where $M = a!$ for the value $a$ that is the largest integer such that $a! < T$.
When $X, Y \sim \mu$, each $x_i+y_i = 0$ with probability $\frac12$, and $y_i$ is equally likely to be $-x_i$ or $M-x_i$, so it is symmetric about $M/2$. Hence, $(X', Y') \sim \{0,1\}^{2n}$. Furthermore, the answer to the $1$-Gap-Ort-Sum-Equal$_n$ instance is the same as the answer to the $1$-Gap-Ort$_n$ instance by construction. Therefore, we can solve this instance of $1$-Gap-Ort$_n$ by simply running $\Pi$.
Now, if we let $\mathcal{M}$ be the message sent by Alice to Bob in protocol $\Pi$, then
$$I(X'; \mathcal{M}, Y) \ge \mathrm{IC}_{2,\delta}(1\text{-Gap-Ort}_n) = \Omega(n)$$
since Bob can solve $1$-Gap-Ort$_n$ where Alice has input $X'$ and Bob has input $Y'$ when he has access to $(\mathcal{M}, Y)$, by returning the answer to $1$-Gap-Ort-Sum-Equal$_n$ using protocol $\Pi$ with input $Y$ after being sent the message $\mathcal{M}$.
We now note that $X'$ is $n$ i.i.d. uniformly random bits. As such,
$$I(X'; \mathcal{M}, Y) = \sum_{i=1}^{n} I\bigl(x'_i; \mathcal{M}, Y \mid x'_1, x'_2, \dots, x'_{i-1}\bigr) \ge \sum_{i=1}^{n} I\bigl(x'_i; \mathcal{M}, Y\bigr).$$
Each of these terms is upper bounded by $1$, so in order for the sum to be $\Omega(n)$, there exists some constant $c > 0$ such that there are at least $cn$ indices $j$ such that $I(x'_j; \mathcal{M}, Y) \ge \alpha$ for some constant $\alpha > 0$.
Now, let
$$J = \bigl\{j \mid I(x'_j; \mathcal{M}, Y) \ge \alpha\bigr\}.$$
We claim that the transcript of $\Pi$ must contain the solution to the $j$-th Sum-Equal instance with probability at least $\frac{1+\alpha}{2}$ for each $j$. To see this, we note that Bob has as input $Y$ for $1$-Gap-Ort-Sum-Equal$_n$, so he can compute $y'_j$. Then, we note that
$$H\bigl(x'_j \mid \mathcal{M}, Y\bigr) = H(x'_j) - I\bigl(x'_j; \mathcal{M}, Y\bigr) \le 1-\alpha$$
Since $x'_j \in \{0,1\}$, let $\Pr[x'_j = 0 \mid \mathcal{M}, Y] = p$. Then, if $p = 0$, the entropy is $0$, so this is satisfied for any $0 < \alpha \le 1$. If $p > 0$, we have
$$-\bigl(p\log p + (1-p)\log(1-p)\bigr) \le 1-\alpha.$$
Since this is symmetric about $p = \frac12$ and cannot be satisfied by $p = \frac12$ since $\alpha > 0$, we assume WLOG that $p < \frac12$, in which case the entropy monotonically decreases as $p$ decreases. Now, we claim that we must have $p < \frac12-\frac{\alpha}{2}$. It suffices to show that
$$-\frac{1-\alpha}{2}\log\frac{1-\alpha}{2} - \frac{1+\alpha}{2}\log\frac{1+\alpha}{2} \ge 1-\alpha$$
Simplifying this expression yields the solution $0 < \alpha < 1$. Thus, for $0 < \alpha < 1$, we must have $p < \frac{1-\alpha}{2}$. If $\alpha = 1$, then the entropy is $0$, so we must have $p = 0$. Thus, we get that $p \le \frac{1-\alpha}{2}$.
By symmetry, we thus have that either
$$\Pr\bigl[x'_j = 0 \mid \mathcal{M}, Y\bigr] \le \frac{1-\alpha}{2}$$
or
$$\Pr\bigl[x'_j = 0 \mid \mathcal{M}, Y\bigr] \ge \frac{1+\alpha}{2}.$$
In the former case, Bob lets $\hat{x}'_j = 1$, and in the latter case, Bob lets $\hat{x}'_j = 0$. Bob then computes $y'_j$ from $y_j$. Then, if $\hat{x}'_j = y'_j$, Bob concludes that $x_j+y_j \ne 0$, and if $\hat{x}'_j = 1-y'_j$, Bob concludes that $x_j+y_j = 0$. By construction, this succeeds with probability at least $\frac{1+\alpha}{2}$, and all we did was run $\Pi$ and compute the value from the transcript.
Corollary 7.9. There exist constants $\alpha, c > 0$ such that
$$\mathrm{D}^{LIN,T}_{2,\delta,\mu'}\bigl(1\text{-Gap-Ort-Sum-Equal}_{\epsilon^{-2}/100}\bigr) \ge \mathrm{D}^{LIN,T}_{2,(1+\alpha)/2,\mu}\bigl(\text{Sum-Equal}_2^{\epsilon^{-2}/100,\,c}\bigr)$$
Proof. This follows directly from Theorem 7.8. The protocol $\Pi'$ solves $\Omega(n)$ of the individual Sum-Equal instances, so there is some constant $c$ such that $\Pi'$ solves at least $cn$ of them with probability at least $\frac{1+\alpha}{2}$. Each instance of Sum-Equal corresponds to a single coordinate from $1$-Gap-Ort-Sum-Equal, so their frequencies must all be bounded by $T$ as well.
We can now prove (6).

Theorem 7.10. When $\delta < \frac12$ and $a$ is some constant fraction,
$$\mathrm{IC}^T_{k,\delta}\bigl(\text{Sum-Equal}_k^{n',a}\bigr) \ge \Omega\bigl(n'k\log\log T\bigr) \tag{7.1}$$
where Sum-Equal$_k^{n',a}$ is the problem where we are given $n'$ independent instances of Sum-Equal and we are asked to solve $an'$ of them with probability $1-\delta$ each.
The proof of this theorem can be found in Appendix B.
Corollary 7.11. For the input distribution $\mu$ defined in the proof of Theorem 7.10, $\delta < \frac12$, and $0 < a < 1$,
$$\mathrm{D}^{LIN,T}_{2,\delta,\mu}\bigl(\text{Sum-Equal}_2^{n',a}\bigr) \ge \Omega\bigl(n'\log\log T\bigr)$$
Proof. If we plug $k = 2$ into (7.1), we get
$$\mathrm{D}^{LIN,T}_{2,\delta,\mu}\bigl(\text{Sum-Equal}_2^{n',a}\bigr) \ge \mathrm{IC}^T_{2,\delta}\bigl(\text{Sum-Equal}_2^{n',a}\bigr) \ge \Omega\bigl(n'\log\log T\bigr)$$
since by definition $\mu$ is the hard distribution from which we got the information complexity bound.
And finally, we prove (1).

Theorem 7.12. $\mathrm{RCC}^{LIN,T}_{k,1/3}(T_\epsilon) \ge \mathrm{RCC}^{LIN,T}_{k,0.4}(\text{Aug-Index-GOSE})$.
Proof. Suppose we have a protocol that solves $T_\epsilon$. Then, from the input of Aug-Index-GOSE$^t_{n,k}$, we construct an input to $T_\epsilon$ as follows:
For the $i$-th $10\varepsilon\sqrt{n}$-Gap-Ort-Sum-Equal$_n$ instance $g_i$ in the Aug-Index-GOSE$^t_{n,k}$ problem, we construct $100^{i-1}$ distinct copies of every element in the input. We take the concatenation of all of these inputs as our input to $T_\epsilon$. Thus the universe contains $N := n + 100\cdot n + \cdots + 100^{t-1}\cdot n \le 100^t n/99$ distinct elements in total, which is $N \le n^{1.01}$ for sufficiently small $t$ (and hence $1/N^{0.49} > 1/\sqrt{n}$). The final Hamming norm is a weighted sum $F' := \sum_{i=1}^{t} 100^{i-1}f'_i$. The advantage of $F'$ (that is, the difference between the number of $1$s and $0$s in this stream) is hence $F := 2F'-N = \sum_{i=1}^{t} 100^{i-1}f_i$.
Then we invoke the simultaneous communication protocol for $T_\varepsilon$ to estimate $F'$, which returns a value $\widetilde{F}'$ satisfying $(1-\varepsilon)F' \le \widetilde{F}' \le (1+\varepsilon)F'$. Translating to the advantage we get $\bigl|\widetilde{F}-F\bigr| \le 2\varepsilon F' \le 2\varepsilon N$. From this approximated value $\widetilde{F}$, together with the index $i$ and auxiliary information $f_{i+1},\dots,f_t$, we need to determine the output value of $g_i$. Since the influence of $f_j$ with $j > i$ can be precisely removed from $F$ before getting the approximated norm $\widetilde{F}$, in what follows it suffices to consider the estimation of $g_t$ when the index is indeed $i = t$. Recall that $F = 100^{t-1}f_t + \sum_{i=1}^{t-1}100^{i-1}f_i$, and thus $\widetilde{F}$ is also an approximation of $100^{t-1}f_t$ as long as the additive error $\sum_{i=1}^{t-1}100^{i-1}f_i$ is bounded.
Let the input distribution to every $f_i$ be padded from the $1$-Gap-Ort-Sum-Equal$_{\epsilon^{-2}}$ distribution $\mu'$ as in Theorem 7.8, where the coordinates are i.i.d. bits drawn uniformly from $\{0,1\}$. Thus,
each $f_i$ has expectation $0$ and variance $25\epsilon^2 n^2$. It immediately follows by Chebyshev's inequality that $\Pr\bigl[|f_i| \ge 50\varepsilon n\bigr] \le 1/100$. Similarly, $\Pr\bigl[|f_i| \ge 50^j\varepsilon n\bigr] \le 1/100^j$. Therefore,
$$\Pr\Biggl[\,\Biggl|\sum_{i=1}^{t-1}100^{i-1}f_i\Biggr| > 100^{t-1}\varepsilon n\Biggr] \le \sum_{i=1}^{t-1}\Pr\bigl[|f_{t-i}| > 50^i\varepsilon n\bigr] \le \sum_{i=1}^{t-1}\frac{1}{100^i} \le \frac{1}{99} \tag{7.2}$$
where the first inequality holds because if $|f_{t-i}| \le 50^i\varepsilon n$ for every $i$, then $\bigl|\sum_{i=1}^{t-1}100^{i-1}f_i\bigr| \le \sum_{i=1}^{t-1}100^{i-1}\cdot 50^{t-i}\varepsilon n \le \frac{50^t\varepsilon n}{100}\sum_{i=1}^{t-1}2^i < 100^{t-1}\varepsilon n$.
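The geometric-sum step in the justification of (7.2) can be sanity-checked with exact integer arithmetic (the range of $t$ below is arbitrary):

```python
def worst_case_tail(t):
    """sum_{i=1}^{t-1} 100^{i-1} * 50^{t-i}: each |f_{t-i}| at its allowed cap
    of 50^i, measured in units of eps*n."""
    return sum(100 ** (i - 1) * 50 ** (t - i) for i in range(1, t))

# The claim used in the text: the tail stays strictly below 100^{t-1}
# (again in units of eps*n).
tail_bounded = all(worst_case_tail(t) < 100 ** (t - 1) for t in range(2, 15))
```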
Notice that as long as $\widetilde{F}$ is a $(1\pm\varepsilon)$-approximation of $F$, we must have $\bigl|\widetilde{F}-F\bigr| \le 2\varepsilon N$. Furthermore, suppose that we return $0$ if $\bigl|\widetilde{F}\bigr| < 15\cdot 100^{t-1}\epsilon n$ and $1$ if $\bigl|\widetilde{F}\bigr| \ge 15\cdot 100^{t-1}\epsilon n$. Since we know that $N \le 100^t n/99$, we have
$$2\varepsilon N \le 2\epsilon\cdot 100^t n/99 < 3\cdot 100^{t-1}\epsilon n.$$
So in particular, if $T_\varepsilon$ succeeds: if $g_t = 0$, we have $\bigl|100^{t-1}f_t\bigr| \le 10\cdot 100^{t-1}\epsilon n$, so $|F| = \bigl|\sum_{i=1}^{t}100^{i-1}f_i\bigr| \le 11\cdot 100^{t-1}\epsilon n$ with probability at least $\frac{98}{99}$ by (7.2). Then, $\bigl|\widetilde{F}\bigr| < 14\cdot 100^{t-1}\epsilon n$ and our algorithm succeeds. Similarly, if $g_t = 1$, we have $\bigl|100^{t-1}f_t\bigr| \ge 20\cdot 100^{t-1}\epsilon n$. Thus, with probability at least $\frac{98}{99}$, $|F| = \bigl|\sum_{i=1}^{t}100^{i-1}f_i\bigr| \ge 19\cdot 100^{t-1}\epsilon n$, so $\bigl|\widetilde{F}\bigr| > 16\cdot 100^{t-1}\epsilon n$ and our algorithm succeeds.
Thus, if $T_\varepsilon$ succeeds with probability $\frac23$, the above algorithm succeeds with probability $\frac23\cdot\frac{98}{99} > 0.6$. Thus we can determine the value of $g_t$ with probability $\ge 0.6$. The thresholds stay the same because all we did to change the input was copy coordinates, which does not change the frequencies. Hence,
$$\mathrm{RCC}^{LIN,T}_{k,1/3}(T_\varepsilon) \ge \mathrm{RCC}^{LIN,T}_{k,0.4}\bigl(\text{Aug-Index-GOSE}^t_{n,k}\bigr).$$
Finally, we must fill in the missing statements (2) and (4) to bound the streaming complexity $\mathrm{RSC}^T_{k,0.4}(T_\epsilon)$. To do this, we conclude as follows:
When we solve Aug-Index-GOSE, we claim that in order to solve Aug-Index-GOSE with probability at least $0.6$ on every input, we must solve every instance of Gap-Ort-Sum-Equal with probability at least $0.6$. This is statement (2) in the overview, and can be proved as follows:
Suppose we have a protocol that solves Aug-Index-GOSE with probability at least $0.6$. Now, consider the $j$-th instance of Gap-Ort-Sum-Equal. Since the protocol succeeds with probability at least $0.6$ on any input, we can simply consider any input to Aug-Index-GOSE where the index is $j$. By assumption, our protocol succeeds with probability at least $0.6$, so it solves the $j$-th instance of Gap-Ort-Sum-Equal with probability at least $0.6$. This holds for every $j$, so our protocol must solve every instance of Gap-Ort-Sum-Equal with probability at least $0.6$.
To see why statement (4) in the overview is true, we wish to prove that
$$\mathrm{D}^{LIN,T}_{k,\delta,\mu}\bigl(1\text{-Gap-Ort-Sum-Equal}^t_{n',k}\bigr) \ge (k-1)\,\mathrm{D}^{LIN,T}_{2,\delta,\mu}\bigl(1\text{-Gap-Ort-Sum-Equal}^t_{n',2}\bigr)$$
and this holds by definition of the linear sketch model of communication: recall that a protocol consists of some matrix $A$, where every player simply multiplies their input by $A$ and sends the resulting vector to the referee. For any matrix $A$, the number of bits communicated by each player will be the same regardless of the number of players, so we get this equivalence.
Thus, putting all of this together, we get the chain of inequalities:
$$\begin{aligned}
\mathrm{RCC}^{LIN,T}_{k,1/3}(T_\epsilon) &\ge \mathrm{RCC}^{LIN,T}_{k,0.4}\bigl(\text{Aug-Index-GOSE}^t_{n,k}\bigr)\\
&\ge \mathrm{RCC}^{LIN,T}_{k,0.4}\bigl(10\epsilon\sqrt{n}\text{-Gap-Ort-Sum-Equal}^t_{n,k}\bigr)\\
&\ge \mathrm{RCC}^{LIN,T}_{k,0.4}\bigl(1\text{-Gap-Ort-Sum-Equal}^t_{n',k}\bigr)\\
&\ge \mathrm{D}^{LIN,T}_{k,\delta,\mu}\bigl(1\text{-Gap-Ort-Sum-Equal}^t_{n',k}\bigr)\\
&\ge (k-1)\,\mathrm{D}^{LIN,T}_{2,\delta,\mu}\bigl(1\text{-Gap-Ort-Sum-Equal}^t_{n',2}\bigr)\\
&\ge (k-1)\,\mathrm{D}^{LIN,T}_{2,\delta,\mu}\bigl(\text{Sum-Equal}_2^{tn',c}\bigr)\\
&\ge \Omega\bigl(ktn'\log\log T\bigr) = \Omega\bigl(\epsilon^{-2}k\log n\log\log T\bigr)
\end{aligned}$$
so
$$\mathrm{RSC}^T_{k,0.4}(T_\epsilon) \ge \frac{1}{k}\,\mathrm{RCC}^{LIN,T}_{k,0.4}(T_\epsilon) \ge \Omega\bigl(\epsilon^{-2}\log n\log\log T\bigr) = \Omega\bigl(\epsilon^{-2}\log N\log\log T\bigr)$$
A Communication upper bound for Equality

The standard $\delta$-error protocol solving the Equality problem starts by sending and comparing the digest under a random hash function $h : [p] \to [q]$ where $q = O\bigl(\delta^{-1}\log p\bigr)$. For example, let $q$ be a random prime drawn from the interval $\bigl[\delta^{-2}\log^2 p,\; 2\delta^{-2}\log^2 p\bigr]$ and let $h$ compute a number modulo $q$. By the prime number theorem there are at least $2\sqrt{N}$ primes in the interval $[N, 2N]$, which implies the existence of $2\delta^{-1}\log(p)$ distinct primes in that range. For any two distinct numbers $x, y \in \mathbb{Z}_p$, since $z = x-y$ has no more than $\log|z| \le \log p$ prime factors, the error probability of the protocol is bounded by the collision probability of $h$ as follows:
$$\Pr_q\bigl[h(x) = h(y)\bigr] = \Pr_q\bigl[x \equiv y \pmod q\bigr] = \Pr_q\bigl[q \mid (x-y)\bigr] \le \frac{\log p}{2\delta^{-1}\log p} < \delta$$
The communication is a message of the form $(h, h(x))$ (indeed $(q, x \bmod q)$ in the above example), whose length is at most $2\log q = O\bigl(\log(1/\delta)+\log\log p\bigr) = O\bigl(\log(1/\delta)+\log\log k\bigr)$
bits. In particular this is an upper bound for one-way communication protocols computing Equality. Recalling that $p = \Theta\bigl(k^{1/4}\bigr)$, we can conclude
$$\mathrm{RCC}_{2,\delta}(f) \le \overrightarrow{\mathrm{RCC}}_{2,\delta}(f) = O\bigl(\log(1/\delta)+\log\log k\bigr)$$
We note that the $1/\delta$ factor in $q$ is unavoidable, since otherwise more than a $\delta$ fraction of numbers would share the same message and hence the collision probability, as well as the error probability, would exceed $\delta$.
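A runnable sketch of this fingerprint protocol, under the parameter choices above (the trial-division primality test and the specific interval endpoints are implementation conveniences for the sketch, not part of the analysis):

```python
import math
import random

def is_prime(q):
    """Naive trial-division primality test; fine for the small moduli used here."""
    if q < 2:
        return False
    f = 2
    while f * f <= q:
        if q % f == 0:
            return False
        f += 1
    return True

def random_prime(lo, hi, rng):
    """Rejection-sample a prime from [lo, hi] (nonempty by Bertrand's postulate)."""
    while True:
        q = rng.randint(lo, hi)
        if is_prime(q):
            return q

def equality_fingerprint(x, y, p, delta, rng):
    """Alice sends (q, x mod q); Bob accepts iff y mod q matches.
    Errs only when x != y, namely when q happens to divide x - y."""
    log_p = max(2, math.ceil(math.log2(p)))
    lo = math.ceil(log_p ** 2 / delta ** 2)
    q = random_prime(lo, 2 * lo, rng)
    return x % q == y % q
```

Equal inputs are always accepted; for unequal inputs the modulus is large enough that only the few prime factors of $x-y$ can cause a collision.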
B The lower bound for Sum-Equal$^{m,a}_k$ over integers

Theorem 7.10 (restated). Let $\Pi$ be the $\delta$-error simultaneous $k$-player protocol for solving the Sum-Equal$^{m,a}_k$ problem, where $m \le \frac{k\log\log T}{20\log k}$ and the error tolerance $\delta \in (0, 1/6)$. The simultaneous communication complexity of $\Pi$ is $\mathrm{RCC}^{LIN,T}_{k,\delta}(\Pi) = \Omega\bigl(mk\log\log T\bigr)$.
Proof. To prove the $\Omega\bigl(mk\log\log T\bigr)$ lower bound we will deduce a contradiction if $\Pi$ uses $c < \gamma mk\log\log T$ bits of communication, for a sufficiently small constant $\gamma$. By decreasing $\gamma$ we may assume that $k$ is arbitrarily large.
For the hard distribution we first introduce a magnitude bound $a$, defined to be the largest integer such that $a! \le \min\bigl(k^{1/8}, T\bigr)$. We define $M = a!$. Note that $\log a = \Omega(\log\log T)$: as $M \ge \frac{T}{a+1}$, we have $a\log a = \Omega(\log M) = \Omega(\log T)$; taking the log of both sides, $\log(a\log a) = \log a + \log\log a < 2\log a$, so $\log a = \Omega(\log\log T)$.
Now we specify the distribution $\mathcal{H}$ for the Sum-Equal$_k$ instances. $\mathcal{H} := (G/2+B/2)^m$ consists of $m$ independent copies of $G/2+B/2$, for $G, B$ defined as follows:
$$\begin{cases} G := \bigl(G_1,\dots,G_{k-1},\; -\sum_{j=1}^{k-1}G_j\bigr)\\ B := \bigl(B_1,\dots,B_{k-1},\; M-\sum_{j=1}^{k-1}B_j\bigr) \end{cases}$$
where $G_j, B_j$ are uniformly and independently chosen from $[a]$ for every $j \in [k-1]$. Note that:
(a) Sum-Equal$_k(G) = 0$, Sum-Equal$_k(B) = 1$;
(b) the first $k-1$ elements of $G$ and $B$, denoted by $G_{-k}$ and $B_{-k}$, are the same uniform distribution over $[a]^{k-1}$. Thus we can write $B = (G_{-k}, M+G_k)$;
(c) for $j \in [k-1]$, the $j$-th player's input $\mathcal{H}_j$ is uniform over $[a]^m$ and independent from the other players' inputs.
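A small sampler (with arbitrary parameters) makes properties (a) and (b) concrete: $G$ always sums to $0$, $B$ always sums to $M$, and, coupling $B$ to the same first $k-1$ draws as the identity $B = (G_{-k}, M+G_k)$ suggests, the two agree on their first $k-1$ entries. In the actual distribution the $B_j$ are fresh uniform draws; the coupling below only illustrates that $G_{-k}$ and $B_{-k}$ are identically distributed.

```python
import random

def sample_G_B(k, a, M, rng):
    """Sample G and B from the hard distribution, coupled on shared uniform
    first k-1 entries (each uniform over [a] = {1, ..., a})."""
    front = [rng.randint(1, a) for _ in range(k - 1)]
    G = front + [-sum(front)]     # last entry forces sum(G) = 0
    B = front + [M - sum(front)]  # last entry forces sum(B) = M
    return G, B

rng = random.Random(0)
G, B = sample_G_B(k=6, a=4, M=24, rng=rng)  # M = 4! = 24 in this toy setting
```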
Besides $\mathcal{H}_k$, the referee additionally gets an index $n$ uniformly drawn from $[m]$, together with the answers $Y^{(j)} = \text{Sum-Equal}_k(X^{(j)})$ for $j = n+1, \dots, m$. Let $\mathcal{H}'_n := (\mathcal{H}, Y^{(n+1)}, \dots, Y^{(m)})$; the hard input distribution is then defined as $\mathcal{H}' := \sum_{n=1}^{m} \frac{1}{m} \cdot \mathcal{H}'_n$.
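To make the construction concrete, the following small Python sketch (illustrative only; the parameter values $k = 16$ and $a = 4$ are arbitrary choices, not from the proof) samples the vectors $G$ and $B$ and checks that their coordinates sum to $0$ and to $M = a!$, respectively, as in property (a):

```python
import math
import random

def sample_G_B(k, a):
    """Sample one copy of the vectors G and B from the hard distribution.

    The first k-1 coordinates of each vector are uniform over [a] = {1, ..., a};
    the last coordinate of G makes its coordinates sum to 0, while the last
    coordinate of B makes its coordinates sum to M = a!.
    """
    M = math.factorial(a)
    G_minus_k = [random.randint(1, a) for _ in range(k - 1)]
    B_minus_k = [random.randint(1, a) for _ in range(k - 1)]
    G = G_minus_k + [-sum(G_minus_k)]
    B = B_minus_k + [M - sum(B_minus_k)]
    return G, B

k, a = 16, 4  # arbitrary small parameters for illustration
G, B = sample_G_B(k, a)
assert sum(G) == 0                   # the G-case: coordinates sum to 0
assert sum(B) == math.factorial(a)   # the B-case: coordinates sum to M
```

Since the first $k-1$ coordinates of $G$ and $B$ are identically distributed, the two cases differ only in the last coordinate, which is exactly the shift by $M$ exploited in the proof.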
Now we derandomize the protocol $\Pi$ by fixing the randomness, and thus get a $\delta$-error deterministic protocol $\Pi'$ with respect to the above input distribution. That is, $\Pi'$ outputs Sum-Equal$_k^{(n)}$ = Sum-Equal$_k(X^{(n)})$ with probability $\ge 1 - \delta$.

By averaging, for at least $m/2$ choices of the index $n \in [m]$, the error of $\Pi'$ on the restricted distribution $\mathcal{H}'_n$ is bounded by $2\delta$:
$$\Pr_{(X,Y) \sim \mathcal{H}'_n}\Big[\Pi'_{out}(\Pi'(X, Y)) \ne \text{Sum-Equal}_k(X^{(n)})\Big] \le 2\delta. \tag{B.1}$$
Then we introduce Lemma B.1, which lower bounds $I\big(X^{(n)}_{-k}; M_1, \dots, M_{k-1}\big) \ge 0.1 k \log a$ for protocols with small error. For compactness, the proof of Lemma B.1 is deferred to the end of this section.

Lemma B.1. For every $n$ such that $\Pi'$ errs with probability $\le 1/3$ on input $(X, Y) \sim \mathcal{H}'_n$, on at least $a_0 m$ of the Sum-Equal instances, the mutual information between $X^{(n)}$ and $\Pi'(X, Y)$ must be at least
$$I\big(X^{(n)}_{-k}; M_1, \dots, M_{k-1}\big) \ge 0.1 k \log a.$$
Using Lemma B.1, it immediately follows that for $\delta \le 1/6$ the protocol $\Pi'$ must use $\Omega(mk \log a)$ bits of communication, since
$$\begin{aligned}
\mathrm{RCC}^{sim}_{k,\delta}(\Pi') &\ge I(X_{-k}; M_1, \dots, M_{k-1}) \\
&= \sum_{i=1}^{m} I\big(X^{(i)}_{-k}; M_1, \dots, M_{k-1} \,\big|\, X^{(1)}_{-k}, \dots, X^{(i-1)}_{-k}\big) \\
&= \sum_{i=1}^{m} I\big(X^{(i)}_{-k}; M_1, \dots, M_{k-1}, X^{(1)}_{-k}, \dots, X^{(i-1)}_{-k}\big) \\
&\ge \sum_{i=1}^{m} I\big(X^{(i)}_{-k}; M_1, \dots, M_{k-1}\big) \\
&\ge \frac{a_0 m}{2} \cdot 0.1 k \log a = \Omega(mk \log a),
\end{aligned}$$
where $a_0$ is some constant between 0 and 1.
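The two equalities in the chain above both rest on the identity $I(A; B, C) = I(A; B \mid C)$ when $A$ is independent of $C$, via $I(A; B, C) = I(A; C) + I(A; B \mid C) = I(A; B \mid C)$. As an illustrative numeric check (not part of the proof), the sketch below verifies the identity on the smallest interesting example, where $A$ and $C$ are independent fair bits and $B = A \oplus C$:

```python
import math
from collections import defaultdict
from itertools import product

# Joint distribution of (A, B, C): A, C independent fair bits, B = A XOR C.
joint = {}
for a_bit, c in product([0, 1], repeat=2):
    joint[(a_bit, a_bit ^ c, c)] = 0.25

def marginal(joint, idx):
    """Marginal pmf over the coordinate positions listed in idx."""
    m = defaultdict(float)
    for outcome, p in joint.items():
        m[tuple(outcome[i] for i in idx)] += p
    return m

def I(joint, xi, yi):
    """Mutual information I(X;Y) in bits between coordinate groups xi and yi."""
    pxy, px, py = marginal(joint, xi + yi), marginal(joint, xi), marginal(joint, yi)
    return sum(p * math.log2(p / (px[k[:len(xi)]] * py[k[len(xi):]]))
               for k, p in pxy.items())

assert abs(I(joint, (0,), (2,))) < 1e-9           # A independent of C
assert abs(I(joint, (0,), (1,))) < 1e-9           # I(A; B) = 0 alone
assert abs(I(joint, (0,), (1, 2)) - 1.0) < 1e-9   # I(A; B, C) = I(A; B | C) = 1 bit
```

Here $I(A; B) = 0$ yet $I(A; B, C) = 1$ bit: conditioning on an independent $C$ can only reveal more about $A$, which is exactly the direction exploited in the chain above.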
Proof of Lemma B.1. Suppose by contradiction that $I\big(X^{(n)}_{-k}; M_1, \dots, M_{k-1}\big) < 0.1 k \log a$, and recall that $m \le \frac{k \log\log T}{20 \log k} \le \frac{0.1 k \log a}{\log(ka)}$ for $\log a = \Omega(\log\log T)$ and sufficiently large $T$. Then
$$I\big(X^{(n)}_{-k}; M_1, \dots, M_{k-1}, X_k, Y^{(n+1)}, \dots, Y^{(m)}\big) < 0.1 k \log a + m \log(ka) < 0.2 k \log a.$$
Therefore, recalling that $I(A; B, C) = I(A; B \mid C)$ when $A$ is independent from $C$, and that $X^{(n)}_j$ is independent from $X^{(n)}_1, \dots, X^{(n)}_{j-1}$,
$$\begin{aligned}
\sum_{j=1}^{k-1} I\big(X^{(n)}_j;\, &M_1, \dots, M_{k-1}, X_k, Y^{(n+1)}, \dots, Y^{(m)}\big) \\
&\le \sum_{j=1}^{k-1} I\big(X^{(n)}_j; M_1, \dots, M_{k-1}, X_k, Y^{(n+1)}, \dots, Y^{(m)}, X^{(n)}_1, \dots, X^{(n)}_{j-1}\big) \\
&= \sum_{j=1}^{k-1} I\big(X^{(n)}_j; M_1, \dots, M_{k-1}, X_k, Y^{(n+1)}, \dots, Y^{(m)} \,\big|\, X^{(n)}_1, \dots, X^{(n)}_{j-1}\big) \\
&\le I\big(X^{(n)}_{-k}; M_1, \dots, M_{k-1}, X_k, Y^{(n+1)}, \dots, Y^{(m)}\big) < 0.2 k \log a.
\end{aligned}$$
As a result, there is $J \subseteq [k-1]$ with $|J| > k/2$ such that for every $j \in J$, it holds that
$$I\big(X^{(n)}_j; M_1, \dots, M_{k-1}, X_k, Y^{(n+1)}, \dots, Y^{(m)}\big) < -1 + 0.5 \log a,$$
and hence
$$\begin{aligned}
H\big[X^{(n)}_j \,\big|\, M_1, \dots, M_{k-1}, X_k, Y^{(n+1)}, \dots, Y^{(m)}\big]
&= H\big[X^{(n)}_j\big] - I\big(X^{(n)}_j; M_1, \dots, M_{k-1}, X_k, Y^{(n+1)}, \dots, Y^{(m)}\big) \\
&> \log a - (-1 + 0.5 \log a) = 1 + 0.5 \log a. \tag{B.2}
\end{aligned}$$
Note that $H_\infty\big[X^{(n)}_j \,\big|\, M_1, \dots, M_{k-1}, X_k, Y^{(n+1)}, \dots, Y^{(m)}\big] < 1$ implies the existence of $x \in [a]$ such that $\Pr\big[X^{(n)}_j = x \,\big|\, M_1, \dots, M_{k-1}, X_k, Y^{(n+1)}, \dots, Y^{(m)}\big] = p_x > \frac{1}{2}$, and hence it follows that
$$\begin{aligned}
H\big[X^{(n)}_j \,\big|\, M_1, \dots, M_{k-1}, X_k, Y^{(n+1)}, \dots, Y^{(m)}\big] &= \sum_{i \in [a]} p_i \log \frac{1}{p_i} \\
&\le p_x \log \frac{1}{p_x} + (1 - p_x) \log \frac{a-1}{1 - p_x} \\
&< 1 + 0.5 \log(a-1). \tag{B.3}
\end{aligned}$$
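The step from a point mass $p_x > 1/2$ to the entropy bound in (B.3) is the standard grouping bound $H \le h(p_x) + (1 - p_x)\log(a-1)$, where $h(\cdot)$ is the binary entropy function. A quick numeric sanity check (illustrative only; the value of $a$ is an arbitrary choice):

```python
import math

def entropy_upper_bound(p_x, a):
    """Grouping bound on the Shannon entropy (in bits) of a distribution
    over [a] that puts mass p_x on one point: the remaining 1 - p_x mass
    is spread over at most a - 1 values."""
    h = -p_x * math.log2(p_x) - (1 - p_x) * math.log2(1 - p_x)  # binary entropy
    return h + (1 - p_x) * math.log2(a - 1)

a = 1024  # arbitrary illustration value
for p_x in [0.51, 0.6, 0.75, 0.9, 0.99]:
    # For p_x > 1/2 the bound stays below 1 + 0.5 * log2(a - 1), as in (B.3).
    assert entropy_upper_bound(p_x, a) < 1 + 0.5 * math.log2(a - 1)
```

The inequality holds because $h(p_x) < 1$ for $p_x \ne 1/2$ and $(1 - p_x) < 1/2$ once $p_x > 1/2$.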
Thus, (B.2) ensures that $H_\infty\big[X^{(n)}_j \,\big|\, M_1, \dots, M_{k-1}, X_k, Y^{(n+1)}, \dots, Y^{(m)}\big] \ge 1$ for every $j \in J$. In what follows, we prove that if $H_\infty\big[X^{(n)}_j \,\big|\, M_1, \dots, M_{k-1}, X_k, Y^{(n+1)}, \dots, Y^{(m)}\big] \ge 1$ for every $j \in J$ and $|J| > k/2$, then the conditional distributions $B'_k := G'_k + M$ and $G'_k := -\sum_{j=1}^{k-1} X^{(n)}_j \,\big|\, M_1, \dots, M_{k-1}, X_k, Y^{(n+1)}, \dots, Y^{(m)}$ have statistical distance $\le k^{-1/8}$.
Notice that for $j \in J$, with $H_\infty\big[X^{(n)}_j \,\big|\, M_1, \dots, M_{k-1}, X_k, Y^{(n+1)}, \dots, Y^{(m)}\big] \ge 1$, the conditional distribution $G'_j := X^{(n)}_j \,\big|\, M_1, \dots, M_{k-1}, X_k, Y^{(n+1)}, \dots, Y^{(m)}$ is a convex combination of distributions uniform over two values. More specifically, $G'_j = \sum_{v_j} \alpha_{v_j} \cdot G^{[v_j]}$, where $\alpha_{v_j} \in (0, 1)$ and each $G^{[v_j]}$ is a random variable uniform over two values. For $j \notin J$, $G'_j = \sum_{v_j} \alpha_{v_j} \cdot G^{[v_j]}$ where $G^{[v_j]}$ is fixed, i. e., a random variable that equals one value with probability 1. For $v = (v_1, \dots, v_{k-1})$, let $\alpha_v = \prod_{j=1}^{k-1} \alpha_{v_j}$ and $G^{[v]} = \big(G^{[v_1]}, \dots, G^{[v_{k-1}]}, -\sum_{j=1}^{k-1} G^{[v_j]}\big)$; then $G'$ can be decomposed as $G' = \sum_v \alpha_v \cdot G^{[v]}$.
Now for every $j \in J$ with $G^{[v_j]}$ uniform over $\{a_j, b_j\} \subset [a]$, we can assume w.l.o.g. that $a_j < b_j$ and write $G^{[v_j]} = a_j + (b_j - a_j) Z_j$, where $Z_j$ is uniform over $\{0, 1\}$. Since $b_j - a_j \in [a]$, among the $> k/2$ indices $j \in J$ for which $G^{[v_j]}$ takes two values, we must have $t \ge |J|/a > k/O(\log k) > \sqrt{k}$ indices $J'$ such that for any $j \in J'$ the value $b_j - a_j$ is the same value $M'$.
Thus $G^{[v]}$ can be further decomposed into a convex combination of $G^{\{u\}}$ where, among the indices in $J$, only those in $J'$ are not fixed. Fix any $u$ and denote $G^{\{u\}}$ by $G''$. Let $S = \sum_{j \in J'} Z_j$ denote the sum of $t$ uniform i.i.d. 0/1 random variables. Then we can write
$$G''_k = b + M' S, \qquad B''_k = b + M' S + M.$$
Since $1 \le M' < a$, $M'$ divides $M = a!$ and hence $M = M' q$ for some integer $q \le M \le k^{1/8}$. Now we can apply the shift-invariance of the binomial distribution $q$ times, which is stated as follows:

Claim B.2 (Claim 39 in [23]). Let $S$ be the sum of $t$ uniform, i.i.d. Boolean random variables. Then $S$ and $S + 1$ have statistical distance $O(1/\sqrt{t})$.
This yields that $G''_k$ and $B''_k$ have statistical distance
$$SD(G''_k, B''_k) = SD\big(M' \cdot S,\; M' \cdot (q + S)\big) \le q \cdot O\big(1/\sqrt{t}\big) \le k^{1/8}/k^{1/4} = k^{-1/8},$$
using $t > \sqrt{k}$.
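As an illustrative numeric check of Claim B.2 and of the triangle-inequality step above (not part of the proof; the decay rate is only verified empirically here, and the slack constant 2 is an arbitrary choice), the sketch below computes the exact statistical distance between $S$ and $S + q$ for $S \sim \mathrm{Binomial}(t, 1/2)$:

```python
from math import comb, sqrt

def sd_shifted_binomial(t, q):
    """Exact statistical distance between S and S + q for S ~ Binomial(t, 1/2),
    computed directly from the probability mass function."""
    pmf = {s: comb(t, s) / 2 ** t for s in range(t + 1)}
    support = set(pmf) | {s + q for s in pmf}
    return 0.5 * sum(abs(pmf.get(s, 0.0) - pmf.get(s - q, 0.0)) for s in support)

for t in [100, 400, 1600]:
    # Claim B.2: SD(S, S + 1) = O(1/sqrt(t)); the constant 2 is a safe slack here.
    assert sd_shifted_binomial(t, 1) <= 2 / sqrt(t)
    # Triangle inequality over q unit shifts: SD(S, S + q) <= q * SD(S, S + 1).
    assert sd_shifted_binomial(t, 3) <= 3 * sd_shifted_binomial(t, 1) + 1e-12
```

For the symmetric binomial, $SD(S, S+1)$ telescopes to roughly the mass of the mode, about $\sqrt{2/(\pi t)}$, which is why the $O(1/\sqrt{t})$ rate appears.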
Recalling that $G'$ is just a convex combination of the $G''$, the statistical distance between $G'_k$ and $B'_k = G'_k + M$ is also bounded by $k^{-1/8}$. However, by the definition of $G'_k$ and $B'_k$ we conclude that the referee cannot distinguish the two cases $X^{(n)} \sim G$ and $X^{(n)} \sim B$ with advantage greater than $k^{-1/8} < 1/6$, which contradicts the condition that $\Pi'$ has error probability $\le 1/3$. Therefore, $I\big(X^{(n)}_{-k}; M_1, \dots, M_{k-1}\big) \ge 0.1 k \log a = \Omega(k \log a)$.
References
[1] Farid Ablayev: Lower bounds for one-way probabilistic communication complexity
and their application to space complexity. Theoret. Comput. Sci., 157(2):139–159, 1996.
[doi:10.1016/0304-3975(95)00157-3] 9
[2] Yuqing Ai, Wei Hu, Yi Li, and David P. Woodruff: New characterizations in turnstile
streams with applications. In Proc. 31st Comput. Complexity Conf. (CCC’16), pp. 20:1–22.
Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, 2016. ACM DL. 8, 29
[3] Ziv Bar-Yossef, T. S. Jayram, Ravi Kumar, and D. Sivakumar: An information statistics
approach to data stream and communication complexity. J. Comput. System Sci., 68(4):702–
732, 2004. [doi:10.1016/j.jcss.2003.11.006] 3, 4, 12
[4] Paul Beame, Toniann Pitassi, Nathan Segerlind, and Avi Wigderson: A direct sum theorem
for corruption and the multiparty NOF communication complexity of set disjointness. In
Proc. 20th IEEE Conf. on Comput. Complexity (CCC’05), pp. 52–66. IEEE Comp. Soc., 2005.
[doi:10.1109/CCC.2005.1] 7
[5] Mark Braverman and Ankit Garg: Public vs private coin in bounded-round information.
In Proc. 41st Internat. Colloq. on Automata, Languages, and Programming (ICALP’14), pp.
502–513. Springer, 2014. [doi:10.1007/978-3-662-43948-7_42] 7
[6] Joshua Brody, Harry Buhrman, Michal Koucký, Bruno Loff, Florian Speelman, and
Nikolay K. Vereshchagin: Towards a reverse Newman's theorem in interactive information complexity. Algorithmica, 76(3):749–781, 2016. Preliminary version in CCC’13.
[doi:10.1007/s00453-015-0112-9] 7
[7] Amit Chakrabarti, Yaoyun Shi, Anthony Wirth, and Andrew Chi-Chih Yao: Informational
complexity and the direct sum problem for simultaneous message complexity. In Proc.
42nd FOCS, pp. 270–278. IEEE Comp. Soc., 2001. [doi:10.1109/SFCS.2001.959901] 10
[8] Graham Cormode, Mayur Datar, Piotr Indyk, and S. Muthukrishnan: Comparing data
streams using Hamming norms (how to zero in). IEEE Trans. Knowledge Data Engr.,
15(3):529–540, 2003. [doi:10.1109/TKDE.2003.1198388] 5
[9] André Gronemeier: Asymptotically optimal lower bounds on the NIH-multi-party informa-
tion complexity of the AND-function and Disjointness. In Proc. 26th Symp. Theoret. Aspects
of Comp. Sci. (STACS’09), pp. 505–516. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik,
2009. [doi:10.4230/LIPIcs.STACS.2009.1846, arXiv:0902.1609] 3, 12
[10] Chen-Yu Hsu, Piotr Indyk, Dina Katabi, and Ali Vakilian: Learning-based frequency esti-
mation algorithms. In Int. Conf. Learning Representations (ICLR’19). ICLR, 2019. OpenReview.
5
[11] Piotr Indyk: Stable distributions, pseudorandom generators, embeddings, and data stream
computation. J. ACM, 53(3):307–323, 2006. [doi:10.1145/1147954.1147955] 4
[12] T. S. Jayram: Hellinger strikes back: A note on the multi-party information complexity of
AND. In Proc. 13th Internat. Workshop on Randomization and Computation (RANDOM’09), pp.
562–573. Springer, 2009. [doi:10.1007/978-3-642-03685-9_42] 3, 12
[13] Tanqiu Jiang, Yi Li, Honghao Lin, Yisong Ruan, and David P. Woodruff: Learning-
augmented data stream algorithms. In Int. Conf. Learning Representations (ICLR’20). ICLR,
2020. OpenReview. 5
[14] Daniel M. Kane, Jelani Nelson, Ely Porat, and David P. Woodruff: Fast moment
estimation in data streams in optimal space. In Proc. 43rd STOC, pp. 745–754. ACM Press,
2011. [doi:10.1145/1993636.1993735, arXiv:1007.4191] 4
[15] Daniel M. Kane, Jelani Nelson, and David P. Woodruff: On the exact space complexity
of sketching and streaming small norms. In Proc. 21st Ann. ACM–SIAM Symp. on Discrete
Algorithms (SODA’10), pp. 1161–1178. SIAM, 2010. ACM DL. 4, 8
[16] Daniel M. Kane, Jelani Nelson, and David P. Woodruff: An optimal algorithm for the
distinct elements problem. In Proc. 29th Symp. on Principles of Database Systems (PODS’10),
pp. 41–52. ACM Press, 2010. [doi:10.1145/1807085.1807094] 4, 29
[17] Ilan Kremer, Noam Nisan, and Dana Ron: On randomized one-round communication
complexity. Comput. Complexity, 8(1):21–49, 1999. Errata (2001). [doi:10.1007/s000370050018]
9
[18] Eyal Kushilevitz: Communication complexity. Adv. in Computers, 44:331–360, 1997.
[doi:10.1016/S0065-2458(08)60342-3] 19
[19] Thodoris Lykouris and Sergei Vassilvitskii: Competitive caching with machine learned ad-
vice. J. ACM, 68(4):24:1–25, 2021. Preliminary version in PMLR 2018. [doi:10.1145/3447579,
arXiv:1802.05399] 5
[20] Michael Mitzenmacher: Scheduling with predictions and the price of misprediction. In
Proc. 11th Innovations in Theoret. Comp. Sci. Conf. (ITCS’20), pp. 14:1–18. Schloss Dagstuhl–
Leibniz-Zentrum fuer Informatik, 2020. [doi:10.4230/LIPICS.ITCS.2020.14] 5
[21] Michael Mitzenmacher and Sergei Vassilvitskii: Algorithms with predictions. In Tim
Roughgarden, editor, Beyond the Worst-Case Analysis of Algorithms, pp. 646–662. Cambridge
Univ. Press, 2020. [doi:10.1017/9781108637435.037] 5
[22] Christos H. Papadimitriou and Michael Sipser: Communication complexity. J. Comput.
System Sci., 28(2):260–269, 1984. Preliminary version in STOC’82. [doi:10.1016/0022-
0000(84)90069-2] 9
[23] Emanuele Viola: The communication complexity of addition. Combinatorica, 35(6):703–747,
2015. Preliminary version in SODA’13. [doi:10.1007/s00493-014-3078-3] 6, 9, 14, 19, 27, 28,
40
[24] David P. Woodruff and Guang Yang: Separating 𝑘-player from 𝑡-player one-way com-
munication, with applications to data streams. In Proc. 46th Internat. Colloq. on Automata,
Languages, and Programming (ICALP’19), pp. 1–14. Schloss Dagstuhl–Leibniz-Zentrum fuer
Informatik, 2019. [doi:10.4230/LIPIcs.ICALP.2019.97] 1
[25] David P. Woodruff and Qin Zhang: Tight bounds for distributed functional monitoring.
In Proc. 44th STOC, pp. 941–960. ACM Press, 2012. [doi:10.1145/2213977.2214063] 3, 8
AUTHORS
Elbert Du
Ph. D. student
Department of Computer Science
Harvard University
Cambridge, MA, USA
edu1@g.harvard.edu
https://sites.google.com/view/elbert-du/about
Michael Mitzenmacher
Professor
Department of Computer Science
Harvard University
Cambridge, MA, USA
michaelm@eecs.harvard.edu
https://www.eecs.harvard.edu/~michaelm/
David Woodruff
Associate Professor
Department of Computer Science
Carnegie Mellon University
Pittsburgh, PA, USA
dwoodruf@andrew.cmu.edu
http://www.cs.cmu.edu/~dwoodruf/
Guang Yang
Research Director
Tree-Graph Blockchain Innovation Center of Shanghai and
Tree-Graph Blockchain Innovation Center of Xiang River Hunan
Conflux Foundation
Shanghai, China
guang.research@gmail.com
https://sites.google.com/site/guangyangresearch/home
ABOUT THE AUTHORS
Elbert Du is currently a Ph. D. student studying Computer Science at Harvard University. He was first introduced to the world of academic mathematics in fifth grade, when he began attending the late Professor Paul Sally Jr.’s Young Scholars Program at the University of Chicago. Elbert is now interested in complexity, differential privacy, and adaptive data analysis. In his spare time, Elbert enjoys reading, solving chess puzzles, and playing video games.
Michael Mitzenmacher is a Professor of Computer Science at Harvard University.
He is the coauthor of a well-known textbook on randomized algorithms and
probabilistic techniques in computer science with Eli Upfal. He is an ACM and
IEEE Fellow.
David Woodruff is an associate professor in the Computer Science Department
at Carnegie Mellon University. He works on the foundations of data science,
specifically in data streams, machine learning, randomized numerical linear
algebra, sketching and sparse recovery.
Guang Yang is currently the research director at Conflux, a startup blockchain
project initiated by Fan Long and Andrew Yao. Before joining Conflux, he was an
assistant professor at the Institute of Computing Technology (ICT), Chinese Academy
of Sciences.