Separation of Unbounded-Error Models in Multi-Party Communication Complexity

Authors Arkadev Chattopadhyay, Nikhil S. Mande,
Plaintext
                          T HEORY OF C OMPUTING, Volume 14 (21), 2018, pp. 1–23
                                       www.theoryofcomputing.org




    Separation of Unbounded-Error Models in
     Multi-Party Communication Complexity
                        Arkadev Chattopadhyay∗                          Nikhil S. Mande†
              Received September 7, 2016; Revised March 24, 2017; Published December 28, 2018




       Abstract: We construct a simple function that has small unbounded-error communication
       complexity in the k-party number-on-forehead (NOF) model but every probabilistic protocol
       that solves it with subexponential advantage over random guessing has cost essentially
          √
       Ω( n/4k ) bits. This separates these classes up to k ≤ δ log n players for any constant
       δ < 1/4, and gives the largest known separation by an explicit function in this regime of
       k. Our analysis is elementary and self-contained, inspired by the methods of Goldmann,
       Håstad, and Razborov (Computational Complexity, 1992). After initial publication of our
       work as a preprint (ECCC, 2016), Sherstov pointed out to us that an alternative proof of an
       Ω((n/4k )1/7 ) separation is implicit in his prior work (SICOMP, 2016). Furthermore, based
       on his prior work (SICOMP, 2013 and SICOMP, 2016), Sherstov gave an alternative proof
                               √
       of our constructive Ω( n/4k ) separation and also produced a stronger non-constructive
       Ω(n/4k ) separation. These results are explained in Sherstov’s preprint (ECCC, 2016) and in
       article 14/22 in Theory of Computing.

    A preliminary version of this paper appeared as ECCC technical report TR 16-095.
   ∗ This work was done while the author was partially supported by a Ramanujan fellowship of the Department of Science and

Technology, India.
   † This work was done while the author was supported by a fellowship of the Department of Atomic Energy, India.




ACM Classification: F.1.3, F.2.3
AMS Classification: 68Q05, 68Q10, 68Q15, 68Q17
Key words and phrases: complexity theory, communication complexity, weakly unbounded error,
unbounded error, NOF model

© 2018 Arkadev Chattopadhyay and Nikhil S. Mande
c b Licensed under a Creative Commons Attribution License (CC-BY)                            DOI: 10.4086/toc.2018.v014a021
                            A RKADEV C HATTOPADHYAY AND N IKHIL S. M ANDE

           Our result has the following consequence for Boolean threshold circuits. Let THR and
      MAJ denote the classes of linear threshold functions that have unbounded weights and
      polynomially bounded weights, respectively. Further, let PARk (ANYk ) denote the class of
      functions that are parities of k bits (any k-junta, respectively). Denote by THR ◦ PARk the
      class of depth-2 circuits where the top gate computes a linear threshold function, and the
      bottom gates compute functions in PARk . For every 2 ≤ k ≤ δ log n, we show that there exists
      a function computable by a linear-size THR ◦ PARk circuit, but requires exp nΩ(1) -size
      circuits in the class MAJ ◦ SYM ◦ ANYk−1 , where the gates in the middle layer compute
      symmetric functions. Applying a result of Goldmann et al. (loc. cit.) to the above, similar
      lower bounds on the size of circuits of the form MAJ ◦ THR ◦ ANYk−1 for computing the
      function follow.
           The main technical ingredient of our proof is to exhibit a composed function of the form
       f ◦ XOR which has exponentially small discrepancy while f has sign-degree just 1. An
      interesting aspect of our composed function is that the block size of the inner XOR function
      is fixed to 1, independent of k, the number of players.


1    Introduction
Over thirty years ago Chandra, Furst and Lipton [10] introduced the “number-on-forehead” (NOF) model
of multi-party communication to obtain lower bounds on the size of branching programs. In the NOF
model, there are k players each having an input that is metaphorically held on their foreheads. Every
forehead is visible to a player except her own. The two features that make this model much more subtle
than its classical two-party counterpart, are the mutual overlap of information and the fact that as k
grows, each player misses less information. Indeed, starting with the surprising result of Grolmusz [18],
several sets of authors (see, for example, [3, 1, 15]) constructed counter-intuitive protocols especially
when k is greater than log n. This makes proving multi-party lower bounds on the cost of protocols
quite challenging. However, researchers have been well motivated to take on this challenge due to
many well-known applications of such lower bounds in diverse areas like circuit complexity, proof
complexity, and pseudo-random generators. More recently new applications have emerged in areas like
data structures [25] and distributed computing [16].
     In a seminal paper, Babai, Frankl and Simon [2] introduced communication complexity classes for the
2-party model. Babai et al. [2] argue that protocols with poly-log (of input length) communication cost
is a natural notion of efficient protocols, just as polynomial time is a notion of efficient computation on
Turing machines. Armed with this notion, most computational complexity classes have their analogues
in communication complexity. This also extends easily to the NOF model and gives rise to complexity
classes Pcc         cc    cc       cc
           k , BPPk , NPk , PPk etc. While it is very hard to separate Turing machine complexity classes,
many separations in the communication world are known when k = 2. For instance, the Equality function
easily separates Pcc                cc                                             cc         cc
                     2 from BPP2 . Set-Disjointness famously separates BPP2 from PP2 [2] (cf. [22, 28]).
However, for k ≥ 3, things become more delicate. While for k ≥ 3 Beame et al. [5] separated Pcc           k from
BPPcc k  not too long  ago,  it is still outstanding to find an explicit function witnessing this separation even
for k = 3. A recent line of work [24, 13, 12, 33, 31, 26] showed that Set-Disjointness also separates
BPPcc            cc
      k and PPk for k ≤ δ · log n for some constant δ < 1.


                        T HEORY OF C OMPUTING, Volume 14 (21), 2018, pp. 1–23                                  2
     S EPARATION OF U NBOUNDED -E RROR M ODELS IN M ULTI -PARTY C OMMUNICATION C OMPLEXITY

    In this paper, we consider the class PPcc  k . Babai et al. [2] realized that the Turing machine complexity
class PP has two different natural versions in the communication world. Let ε be the advantage of a
probabilistic protocol over random guessing. Then, one way to measure the cost of the protocol is to add
a log(1/ε) term to the total number of bits communicated in the worst case. Functions that admit k-party
probabilistic protocols of poly-logarithmic cost in this model form the class PPcc         k . The other model is
unrestricted: it does not penalize by adding the log(1/ε) term to the cost, i. e., the cost is just the total
number of bits communicated in the worst case. Protocols in this model are allowed to use only private
random coins (see Section 2.1) and must, on each input, have non-zero advantage over random guessing.
Functions that have efficient k-party protocols in this model form the class UPPcc      k . It is not difficult to see
that PPcc
        k  ⊆ UPP   cc
                   k  . The fact that this inclusion is strict for k = 2 was  shown   independently    by Buhrman,
Vereshchagin and de Wolf [8] and by Sherstov [29]. The two papers use two different functions. However
the corresponding separation question for k ≥ 3 players remained unaddressed in the literature.
    We consider a simple and natural extension of the function defined by Goldmann, Håstad and
Razborov [17], which we define as follows.

Definition 1.1. Let
                                                    n−1 n4k −1
                             P(x, y1 , . . . , yk ) := ∑   ∑ 2i y1 j . . . yk j (xi,2 j + xi,2 j+1 )
                                                    i=0 j=0

                     2 k               k
where x ∈ {±1}2n 4 , yi ∈ {±1}n4 for each i.
We set
                          GHRNk x, y1 , . . . , yk := sgn(2P(x, y1 , . . . , yk ) + 1) ,
                                                  

where N = 2n2 4k .

    Our main theorem in this article uses GHRNk to separate PPcc                     cc
                                                                    k from UPPk for k ≤ δ log N, for any
constant δ < 1/4. Note that there is a natural way to assign the input variables to GHRNk to k + 1 players
as follows: x is Player 1’s input, and yi is Player (i + 1)’s input (for i = 1, . . . , k). Our main theorem is
given below.

Theorem 1.2 (Main Theorem). Let Π be any (k + 1)-party probabilistic public-coin protocol computing
the GHRNk function with advantage ε > 0. Then,
                                                     √             
                                                     N
                                 Cost Π + log 1/ε ≥ Ω    − log N − k .
                                                      4k

    Observe that Theorem 1.2 gives a lower bound precisely for the cost of a PPcc   k+1 protocol computing
GHRNk . On the other hand, note that GHRNk is a composition of a linear threshold function with N parities
of arity k + 1. A well-known simple fact (refer to Section 3 for a proof) says that every such function has
a UPPcc k+1 protocol of cost O(log N). This immediately yields the following separation result.

Corollary 1.3. For all 1 ≤ k ≤ δ log n, the GHRNk function is not in PPcc                         cc
                                                                       k+1 but is in the class UPPk+1 ,
for any constant 0 < δ < 1/4.

                           T HEORY OF C OMPUTING, Volume 14 (21), 2018, pp. 1–23                                    3
                           A RKADEV C HATTOPADHYAY AND N IKHIL S. M ANDE

    An additional motivation for our work comes from the study of constant-depth circuits with Threshold
gates. There are two types of Threshold gates that have been considered in the literature. The first one is
with unbounded weights and the class of such gates is denoted by THR. Formally, define a gate G to be a
Threshold gate if there exist integer weights a0 , a1 , . . . , an such that
                                                                             !
                                                                  n
                                   G(x1 , . . . , xn ) = sgn a0 + ∑ ai xi   .
                                                                 i=1

The second class of gates is those Threshold gates with polynomially bounded weights, called Majority
gates. We denote the class of such gates by MAJ.
    Goldmann et al. [17] showed that although linear threshold functions with unbounded weights can be
simulated by polynomial-size MAJ ◦ MAJ circuits, a simple function computable by linear-size circuits
of the form THR ◦ PAR2 requires exponential size to be computed by MAJ ◦ SYM circuits, where SYM
denotes gates computing arbitrary symmetric functions. We strengthen their result to depth-3 circuits as
follows.

Theorem 1.4. For each k ≥ 2, the function GHRNk can be computed by linear-size THR ◦ PARk+1 circuits,
but requires size
                                           √            
                                                 N log N
                                     exp Ω         −
                                               k4k      k
to be computed by depth-3 circuits of the form MAJ ◦ SYM ◦ ANYk .

    Let us remark that Theorem 1.4 continues to yield non-trivial bounds as long as k < δ log n for any
constant 0 < δ < 1/4. It is also worth noting that a result of [17] immediately yields, from the above
theorem, the following interesting result.

Corollary 1.5. The function GHRNk can be computed by linear-size THR ◦ PARk+1 circuits but requires
size                                    √              
                                               N log N
                                    exp Ω        −
                                             k4k      k
to be computed by depth-3 circuits of the form MAJ ◦ THR ◦ ANYk .

1.1   Related work
An anonymous reviewer, and subsequently Sherstov [32], pointed out that a comparatively off-the-shelf
Ω((n/4k )1/7 ) separation between PPcc               cc
                                        k and UPPk is implicit in prior work by combining known results
of Sherstov [33] and Beigel [7]. The best PPcc  k lower bound that one would get using functions obtained
in this way is Ω((n/4k )2/5 ), using a later result of Sherstov [31] and a more recent result of Thaler [35].
                                                         √
In contrast, our Theorem 1.2 obtains a bound of Ω( n/4k ) for a function that is a natural multi-party
adaptation of the function used by Goldmann et al. [17]. Our bound is stronger than the above bounds for
k ≤ (1/4 − ε) log n players. In particular, for any constant k, we get an Ω(n1/2 ) bound for our function
whereas the best previous implicit bounds are Ω(n2/5 ). After our result was published in a technical
report [14], Sherstov [34] showed that by carefully piecing together approximation-theoretic ideas from

                       T HEORY OF C OMPUTING, Volume 14 (21), 2018, pp. 1–23                               4
      S EPARATION OF U NBOUNDED -E RROR M ODELS IN M ULTI -PARTY C OMMUNICATION C OMPLEXITY

his earlier work [30] and the result in [33], one can obtain an Ω(n/4k ) lower bound for a non-explicit
function. This implies, invoking a standard technique, our lower bound, for an explicit function that is
similar to ours. We note that while our result separates PPcc                cc
                                                               k from UPPk for up to k ≤ (1/4 − ε) log n
players, Sherstov’s separation [34] extends to k ≤ (1/2 − ε) log n players. On the other hand, our method
is elementary and self-contained. Using first principles, we prove a strong PPcc      k lower bound for a
function which remained unanalyzed until this result.
    The route of combining earlier work of Sherstov [33] uses unique-disjointness as the inner function.
With such an inner function, the previous techniques work with any outer function, like ODD-MAX-BIT,
that has large approximation error for any polynomial of degree sufficiently smaller than n. This is in
contrast to our use of XOR as the inner function. It is not very difficult to see that ODD-MAX-BIT ◦
XOR has very efficient PPcc  k protocols for all k ≥ 2. Thus, our argument has to exploit some feature of
the outer function that is not possessed by functions like ODD-MAX-BIT. We find this an independently
interesting aspect of the technique used in this article. Indeed, there has been considerable recent interest
in studying the communication complexity of XOR functions (see, for example, [36, 21]).
    In summary, progress on separating communication complexity classes in the NOF model has been
slow. This article is the first one to explicitly address the question of separating PPcc              cc
                                                                                          k and UPPk for
k > 2.

1.2    Our proof technique and organization
We work with the GHR function which is easily seen to be the composition of the universal threshold
function [20] and Parity. The universal threshold function derives its name from the fact that by setting
some of its bits appropriately one can induce any arbitrary threshold function. In that sense, it is the
hardest function of sign-degree 1. To estimate the discrepancy of GHRNk , we extend ideas from [17]
where this was estimated in the setting of two players. The basic intuition can be seen after observing
that for a given setting of y1 , . . . , yk the GHRNk function essentially depends on the sign of a plus-minus
combination of the A j for 0 ≤ j ≤ n4k − 1, where

                                                 1 n−1 i                  
                                        A j :=     ∑   2 xi,2 j + xi,2 j+1 .
                                                 2 i=0

The relevant sign of each A j depends on the parity of the bits y1, j , . . . , yk, j . Further, the set of bits in
x that each A j depends on is disjoint from the set of bits that A j0 depends on, whenever j 6= j0 . We
sample x such that each A j is an i.i.d. binomial distribution centered at 0 with range [−2n + 1, 2n − 1].
Let this distribution be µX . We sample each yi uniformly at random. We want to ensure that the GHRNk
function, under this distribution, behaves in a way that leaves the players with little clue about the
outcome unless the relevant sign to be associated with each A j is determined. The distribution defined
above is a product distribution. Sherstov [29] showed that the GHR function has large discrepancy under
product distributions. Thus, as done in [17], one is forced to sample in a slightly more involved way.
First sample the y uniformly at random. Then sample x according to µX , conditioned on the fact that
         k −1
P = ∑n4j=0 A j y1, j · · · yk, j is very close to its mean compared to its standard deviation (which is as high as
exp (Ω(n))). Note that the mean of each A j is 0, which gives us plenty of room to exploit. This turns out
to be the hard distribution but to establish this requires technical work. This is mainly because analyzing

                        T HEORY OF C OMPUTING, Volume 14 (21), 2018, pp. 1–23                                    5
                              A RKADEV C HATTOPADHYAY AND N IKHIL S. M ANDE

the discrepancy under non-product distributions is difficult. As a first step to overcome this difficulty, we
follow the ideas of Goldmann et al. [17], and show that it is sufficient to prove an upper bound on the
discrepancy of a function related to the GHR function under a particular product distribution. Analyzing
the discrepancy of this related function on the obtained product distribution is still non-trivial, and this is
the main technical contribution of our result.
Organization: Section 2 develops the basic notions and lemmas. Section 3 establishes our main technical
result, Theorem 3.1, an upper bound on the k-wise discrepancy of the GHR function. Using this, we
prove Theorem 1.2 and Corollary 1.3. Section 4 derives the circuit consequences of Theorem 1.4 and
Corollary 1.5. Finally, Section 5 concludes with some open problems.


2     Preliminaries
2.1    The NOF model
In the k-party model of Chandra et al. [10], k players with unlimited computational power wish to compute
a function f : X1 × · · · × Xk → {−1, 1} on some input x = (x1 , . . . , xk ). For the purpose of this paper, we
consider inputs of the form Xi = {−1, 1}ni . On input x, player i is given (x1 , . . . , xi−1 , xi+1 , . . . , xk ), which
is why it is figuratively said that xi is on the i-th player’s forehead. Players communicate by writing
on a blackboard, so every player sees every message. We denote by Dk ( f ) the deterministic k-party
communication complexity of f , namely the number of bits exchanged in the best deterministic protocol
for f on the worst-case input.
     A probabilistic protocol Π with access to public (private) randomness computes f with advantage ε if
the probability that Π and f agree is at least 1/2 + ε for all inputs. The cost of Π is the maximum number
                                                                                                   pub             priv
of bits it communicates over its internal random choices in the worst case. Let us define Rε ( f ) (Rε ( f ))
to be the cost of the best such protocol. Note that for convenience, we deviate from the notation defined
in [23]. For the purpose of this paper, all logarithms are taken in base 2. We now define two other notions.
                                               
                                 pub            1                                        h
                                                                                           priv
                                                                                                  i
               PPk ( f ) := min Rε ( f ) + log      ,                   UPPk ( f ) := min Rε ( f ) .               (2.1)
                             ε                  ε                                        ε


Note that privateness of the random coins is essential in the definition of UPPk . It is a simple exercise to
show that every function can be computed by unbounded-error protocols using 2 bits if allowed public
coins. Define PPcc                                           cc
                  k = { f : PPk ( f ) = polylog(n)} and UPPk = { f : UPPk ( f ) = polylog(n)}, where n is
the maximum length of an input to a player. Each element in either of these classes refers to a family of
functions, f , one for each input length.


2.2    Cylinder intersections, discrepancy and the cube norm
Let f : X1 ×· · ·×Xk → {−1, 1}. A subset Si ⊆ X1 ×· · ·×Xk is a cylinder in the i-th direction if membership
in Si does not depend on the i-th coordinate. A subset S is called a cylinder intersection if it can be
represented as the intersection of k cylinders, S = ki=1 Si , where Si is a cylinder in the i-th direction.
                                                   T




                          T HEORY OF C OMPUTING, Volume 14 (21), 2018, pp. 1–23                                         6
      S EPARATION OF U NBOUNDED -E RROR M ODELS IN M ULTI -PARTY C OMMUNICATION C OMPLEXITY

Definition 2.1. Let µ be a distribution on X1 × · · · × Xk . The discrepancy of f according to µ, Disckµ ( f )
is
         max Pr[ f (x1 , . . . , xk ) = 1 ∧ (x1 , . . . , xk ) ∈ S] − Pr[ f (x1 , . . . , xk ) = −1 ∧ (x1 , . . . , xk ) ∈ S]
             S    µ                                                                µ

where the maximum is taken over all cylinder intersections S.

   The k in Disckµ denotes the dimension of the underlying cylinder intersections. We will drop this
superscript when it is clear from the context what k is. Let Disc( f ) = minµ Disckµ ( f ).
   The discrepancy method is a powerful tool that gives a lower bound on the randomized communication
complexity in terms of the discrepancy. The following lemma, due to Babai et al. [4], can be found, for
example, in [23, Sec. 3.5].
                   pub
Lemma 2.2. Rε ( f ) ≥ log(2ε/Disc( f )).

   We now recall a useful technique that shows upper bounds on the discrepancy of a function under a
product distribution. It is a standard lemma (see, for example, [12] and [27]).

Lemma 2.3. Let f : X ×Y1 × · · · ×Yk → R, µ = µX × µ1 × · · · × µk be any product distribution, and let
φ : X ×Y1 × · · · ×Yk → {0, 1} be the characteristic function of a cylinder intersection. Then,

                                                                                                                                            !1/2k
         E[ f (x, y1 , . . . , yk )φ (x, y1 , . . . , yk )] ≤            E                    E     Πa∈{0,1}k f (x, ya11 , . . . , yak k )
         µ                                                      y01 ,y11 ,...,y0k ,y1k       x∼µX


where y0i ∼ µi , y1i ∼ µi are sampled independently for each i ∈ [k].

Remark 2.4. When f is {−1, 1} valued, the left hand side represents the discrepancy of f over the
cylinder intersection φ with respect to the distribution µ. However, for our purposes, we are required to
use the inequality when f is {−1, 1, 0} valued.


2.3    The binomial distribution
Definition 2.5. Let B(N) denote the distribution obtained as the sum of 2N independent Bernoulli
variables, each of which take values 1/2, −1/2 with probability 1/2 each.

    A few important things to observe are that B(N) takes only integral values, it is centered and symmetric
around 0, so B(N) is identically distributed to −B(N). Its range is [−N, N].
Let us denote Pr[B(N) = 0] by p0 . It is a well-known fact that
                                                                       
                                                        2N   N       1
                                                  p0 =     /4 = Θ           .
                                                        N           N 1/2

The following lemma tells us that the probability of a binomial distribution taking any value close to its
mean is significantly high.

                              T HEORY OF C OMPUTING, Volume 14 (21), 2018, pp. 1–23                                                                   7
                          A RKADEV C HATTOPADHYAY AND N IKHIL S. M ANDE

Lemma 2.6. Let W be a binomial random variable distributed according to B(N). Let p0 denote
Pr[W = 0]. Then for all j ∈ [−N, N],

                                          j2
                                                
                                  p0 − O             ≤ Pr[W = j] ≤ p0 .
                                         N 3/2

Proof. Note that for | j| ≥ N/2, the bound to be proved is trivial. Thus we assume | j| < N/2.
                                                               
                                               2N             2N         1
         Pr[W = j − 1] − Pr[W = j] =                     −            · 2N
                                            N + j−1         N+ j       2
                                                                                        
                                                     (2N)!                  (2N)!            1
                                      =                              −                     ·
                                          (N + j − 1)!(N − j + 1)! (N + j)!(N − j)! 22N
                                                 (2N)!                 2j−1            1
                                      =                         ·                  · 2N
                                         (N + j − 1)!(N − j)! (N − j + 1)(N + j) 2
                                               
                                            2N         2j−1       1
                                      =            ·           ·
                                          N + j N − j + 1 22N
                                         
                                          2N       1     2j
                                      ≤         · 2N ·
                                           N      2     N− j

since the middle binomial coefficient is the maximum. Thus, we have ∀i, |i| ≤ j,
                                                            
                                                            2N  2j    1
                               Pr[W = i − 1] − Pr[W = i] ≤         · 2N .
                                                             N N− j 2
                           
Since 2N                1
           N
        N  /4  =  Θ   N 1/2  ,

                                        j
                                                                      2 j2
                                                                                     
                                                                                 1
             Pr[W = 0] − Pr[W = j] ≤ ∑ |Pr[W = i − 1] − Pr[W = i]| ≤       ·O
                                     i=1                             N− j       N 1/2
                                      2 · 2 j2
                                                          
                                                      1
                                   ≤            ·O                                 (since | j| ≤ N/2)
                                         N           N 1/2
                                         2 
                                              j
                                   =O               .
                                           N 3/2

3    A discrepancy upper bound for the multiparty GHR function
                                               √
In this section, we prove essentially an exp − N/4k upper bound on the discrepancy of the GHRNk
                                                        
                                                                                  
function where the first player gets N input bits. This gives us an exp −nΩ(1) upper bound on the
discrepancy if k ≤ ε log(N) for any constant 0 < ε < 1/4.
    Goldmann et al. [17] showed that when k = 2, if there is a low cost one-way protocol for GHRN2 , then
it must have low advantage. Sherstov [29] noted that the same proof technique can be adapted to prove an
upper bound on the discrepancy on GHRN2 . We generalize this for higher k. In particular, we show

                      T HEORY OF C OMPUTING, Volume 14 (21), 2018, pp. 1–23                            8
    S EPARATION OF U NBOUNDED -E RROR M ODELS IN M ULTI -PARTY C OMMUNICATION C OMPLEXITY

Theorem 3.1. For any k ≥ 1,
                                                                                !
                                                                 (8e)k N 1/4
                                      Disc(GHRNk ) = O           √
                                                                      k
                                                                                    ,
                                                                2 N/4 · 2k/2

where GHRNk is defined as in Definition 1.1, and N is the maximum number of bits a player gets (in this
case the first player).

Proof of Theorem 1.2. It follows directly from Theorem 3.1 and Lemma 2.2.

Proof of Corollary 1.3. From Theorem 1.2, it follows that for all 1 ≤ k ≤ δ · log n, the GHRNk function is
not in PPcc                                                                                        N
           k+1 for any constant 0 < δ < 1/4. Let us see an easy unbounded error protocol for GHRk . Note
that all the weights of the top threshold are positive. One player chooses and announces a Parity gate at
the bottom layer with probability proportional to its corresponding weight. The cost of announcing this is
O(log(N)). The probability of success equals ∑ w+                     +
                                                     i /w, where the wi are the weights of the gates which
                                   +        −
agree with the output. Since ∑ wi > ∑ wi (the weights of the gates which disagree with the output), the
probability of success is strictly greater than 1/2.

     Recall that N = 2n2 4k . The proof technique of Theorem 3.1 is inspired from that of Goldmann et
al. [17].

Proof of Theorem. 3.1. Let
                                                 1 n−1 i
                                          Aj =     ∑ 2 (xi,2 j + xi,2 j+1 ) .
                                                 2 i=0
It is easy to see that A j can take any integer value in [−2n + 1, 2n − 1]. Let µX be a distribution on the
variables x that make the variables A j independent and binomially distributed according to B(2n − 1) as
defined in Definition 2.5. Such a distribution exists because each A j depends on a disjoint set of variables.
                                                 k
Let U be the uniform distribution on {−1, 1}n4 . We choose a tuple (x, y1 , . . . , yk ) by first picking yi ∼ U
independently for each i, and then picking x ∼ µX under the condition that |P(x, y1 , . . . , yk )| = 2k . Let us
define µ to be the distribution obtained by this sampling procedure.
     We will now show an upper bound on the discrepancy of GHRNk under the distribution µ. Let S denote
the characteristic function (0-1 valued) of a cylinder intersection. By Definition 2.1, the discrepancy of
GHRNk according to µ is

                      Discµ (GHRNk ) = max E GHRNk (x, y1 , . . . , yk )S(x, y1 , . . . , yk ) .
                                                                                             
                                                                                                           (3.1)
                                            S    µ

The following lemma will enable us to switch to working with a product distribution on the inputs, for
which we have convenient techniques for proving discrepancy upper bounds via Lemma 2.3.
Lemma 3.2. For µX , U as defined above,
                                                                                       
                                                            k                   1
                               Pr [|P(x, y1 , . . . , yk )| = 2 ] ≥ Ω √ (n+2k)/2            .
                             µX ×Uk                                    n2


                        T HEORY OF C OMPUTING, Volume 14 (21), 2018, pp. 1–23                                  9
                                A RKADEV C HATTOPADHYAY AND N IKHIL S. M ANDE

Proof. We will show that for any fixed y1 , . . . , yk , if we sample x according to µX , then
                                                                       n4k −1
                                          P(x, y1 , . . . , yk )/2 = ∑ A j y1 j · · · yk j
                                                                        j=0

is distributed according to B(n4k (2n − 1)). Note that A j y1 j · · · yk j is always distributed according to
B(2n − 1), no matter what the values of y1 , . . . , yk are. Next, observe that the sum of binomial distributions
is a binomial distribution. This shows that
                                                       n4k −1
                                                         ∑ A j y1 j · · · yk j
                                                         j=0

is distributed according to B(n4k (2n − 1)).
    Hence, by plugging in N = n4k (2n − 1) and j = 2k in Lemma 2.6,

                                                                                        4k
                                                                                                
                                               k                1
               Pr [|P(x, y1 , . . . , yk )| = 2 ] ≥ Ω                       −O
             µX ×Uk                                     (n4k (2n − 1))1/2        (n4k (2n − 1))3/2
                                                                    
                                                              1
                                                  = Ω √ (n+2k)/2 .
                                                          n2
We can discard the second term since it equals
                                                   −1 
                                           k/2 n 3/2
                                   O 4 · n(2 − 1)          ,

and is dominated by the first term.

    Let us now recall the law of total expectation.
Fact 3.3 (Law of total expectation). For any probability space (Ω, F, ν), any event E ∈ F, and any
random variable Z, the following equality holds.

                                   E[Z] = E[Z | E] · Pr[E] + E[Z | Ē] · (1 − Pr[E]) .
                                    ν         ν            ν           ν                     ν

    Define a function q by
                                                 (
                                                  P(x, y1 , . . . , yk )/2k      if |P(x, y1 , . . . yk )| = 2k ,
                        q(x, y1 , . . . , yk ) =
                                                  0                              otherwise.

This means that if (x, y1 , . . . , yk ) is chosen according to the distribution µX × Uk , then q(x, y1 , . . . , yk ) =
GHRNk (x, y1 , . . . , yk ) on the support of µ, and 0 otherwise. For any cylinder intersection S, let Z denote
the random variable q(x, y1 , . . . , yk ) · S(x, y1 , . . . , yk ), let E denote the event |P(x, y1 , . . . yk )| = 2k . Using
Fact 3.3 and the fact that EµX ×Uk [Z | Ē] = 0, we obtain

                                                                    EµX ×Uk [q(x, y1 , . . . , yk ) · S(x, y1 , . . . , yk )]
              E[GHR(x, y1 , . . . , yk )S(x, y1 , . . . , yk )] =                                                             .   (3.2)
               µ                                                         PrµX ×Uk [|P(x, y1 , . . . yk )| = 2k ]

                           T HEORY OF C OMPUTING, Volume 14 (21), 2018, pp. 1–23                                                    10
    S EPARATION OF U NBOUNDED -E RROR M ODELS IN M ULTI -PARTY C OMMUNICATION C OMPLEXITY

Using Equation (3.1), Lemma 3.2 and Equation (3.2), we obtain the following.
                                                                                                                                       √ n+2k 
                 Discµ (GHRNk ) ≤ max                             E [q(x, y1 , . . . , yk )S(x, y1 , . . . , yk )] · O                   n2 2                       (3.3)
                                                       S      µX ×Uk


where S denotes a cylinder intersection. It therefore suffices to show that for all cylinder intersections S,
                                                                                                n+2k 
                                                                                                 −   −εn
                                           E [q(x, y1 , . . . , yk )S(x, y1 , . . . , yk )] ≤ O 2 2                                                                 (3.4)
                                         µX ×Uk

                                                                          
for some constant ε > 0 to give us a discrepancy upper bound of exp −nΩ(1) . For notational convenience,
we may switch between the notations Ex and Ex∼µX from now on. Now that we have a product distribution,
we can use Lemma 2.3.
                                                                                                        "                                             # !1/2k
     E [q(x, y1 , . . . , yk )S(x, y1 , . . . , yk )] ≤                                E          E            ∏            q(x, ya11 , . . . , yak k )         .   (3.5)
   µX ×Uk                                                                    y01 ,y11 ,...,y0k ,y1k x   a1 ,...,ak ∈{0,1}

We will now show an upper bound on the RHS of the above equation by splitting the outer expectation
into two terms, the first of which has low probability. We will require certain properties of Hadamard
matrices to prove an upper bound on the second term. Let β ∈ {0, 1}k . Define 2k subsets of indices as

                                           Iβ = { j ∈ [n4k ] : ∀i ∈ [k], (y0i ) j = (−1)βi · (y1i ) j } .

Note that {Iβ : β ∈ {0, 1}k } forms a partition of the indices. Since our distributions on the pairs y0i , y1i are
uniform and independent, each Iβ is empty with equal probability. An easy counting argument tells us
that the probability of Iβ being empty is
                                                                                    n4k
                                                                         (2k − 1)/2k      .

By a union bound, the probability that any one of them is empty is at most
                                                                                    n4k
                                                                    2k · (2k − 1)/2k      .

We have the following.
                                         "                                                     # !1/2k                        k    !1/2k
                                                                                                                           1 n4
                                                                                                                       
                      E            E              ∏             q(x, ya11 , . . . , yak k )                   ≤         k
                                                                                                                      2 1− k     +Z
              y01 ,y11 ,...,y0k ,y1k x     a1 ,...,ak ∈{0,1}
                                                                                                                          2

where
                                    Z=                      E                      E          ∏             q(x, ya11 , . . . , yak k ) .
                                             y01 ,y11 ,...,y0k ,y1k :∀β ,Iβ 6=0/   x
                                                                                       a1 ,...,ak ∈{0,1}



                                T HEORY OF C OMPUTING, Volume 14 (21), 2018, pp. 1–23                                                                                 11
                                      A RKADEV C HATTOPADHYAY AND N IKHIL S. M ANDE

Claim 3.4. For all y01 , . . . , y0k , y11 , . . . , y1k such that Iβ is non-empty for each β ∈ {0, 1}k , we have
                       "                                             #                                                             k+1
                                                                                                                                         !
                                                                                    k log(e)2k     2k   1         2(k+1)2
               E              ∏            q(x, ya11 , . . . , yak k )   ≤O 2                    ·2         k   ·                            .
                   x
                       a1 ,...,ak ∈{0,1}                                                            (2n/2 )2 −1     23n/2

      Let us assume the claim to be true for now. We have from Equation (3.3) that
                                                                                                 √ n+2k 
       Discµ (GHRNk ) ≤               E [q(x, y1 , . . . , yk )S(x, y1 , . . . , yk )] O           n2 2
                                   µX ×Uk

                                             n4k                                                 !!1/2k
                                                                                        (k+1)2k+1
                                   
                                          1                       k    k       1      2
                              ≤ 2k 1 − k           + O 2k log(e)2 · 22 n/2 2k −1 ·
                                          2                              (2 )            23n/2
                                  √ n+2k 
                               ·O     n2 2
                               "              n2k                    k
                                                                                !#
                                                                                     √ n+2k 
                                    k        1                    (4e)
                              ≤ 2k/2 1 − k            +O       n 1− 1     3n 1     O      n2 2
                                             2                              ·
                                                            (2 2 ) 2k · 2 2 2k
                                                                              √ 
                                            n2k
                                                    n/2+k+k/2k √       (8e)k n
                                 
                                      −1/2k
                              ≤O e               ·2           · n + 3n n 1
                                                                         ( − )·
                                                                      2 2 2 2k                      
                                                                    Using the fact that 1 − 1γ < e−1/γ
                                                            √            √ 
                                        n/2+k+k/2k √   (8e)k n       (8e)k n
                                                                  
                                    −n
                              = O e ·2            · n + n/2k    =O         k         Assuming k < n/3
                                                         2             2n/2
                                                !
                                    (8e)k N 1/4
                              =O    √
                                         k
                                                                                  Recall that N = 2n2 4k
                                   2 N/4 · 2k/2

which proves Theorem 3.1.

      Now it only remains to prove Claim 3.4.

3.1     Proof of Claim 3.4
Recall that we need to show the following. For all y01 , . . . , y0k , y11 , . . . , y1k such that Iβ is non-empty for each
β , we want
               "                                    #                                                         !
                                                                        k        k        1       2 (k+1)2k+1
             E
             x
                        ∏ q(x, ya11 , . . . , yakk ) ≤ O 2k log(e)2 · 22 (2n/2 )2k −1 · 23n/2 .
                a1 ,...,ak ∈{0,1}


Fix any such y01 , . . . , y0k , y11 , . . . , y1k . Note that the LHS of the above equation is
              "                                                    #    "                                                                #
              Pr             ∏             q(x, ya11 , . . . , yak k ) = 1   − Pr          ∏             q(x, ya11 , . . . , yak k ) = −1        .
               x                                                               x
                       a1 ,...,ak ∈{0,1}                                             a1 ,...,ak ∈{0,1}


                                T HEORY OF C OMPUTING, Volume 14 (21), 2018, pp. 1–23                                                                12
      S EPARATION OF U NBOUNDED -E RROR M ODELS IN M ULTI -PARTY C OMMUNICATION C OMPLEXITY

For convenience, for all a ∈ {0, 1}k let us denote P(x, ya11 , . . . , yak k ) by Pa and let Sa denote Pa /2. By the
definition of q, we have
                                               "            #            "                    #
       h
                       a            a
                                         i               Pa                          Pa
   Ex ∏a∈{0,1}k q(x, y11 , . . . , yk k ) = Pr
                                             x
                                                  ∏ 2k = 1 − Prx ∏ k 2k = −1
                                                a∈{0,1}k                     a∈{0,1}
                                               "                        #         "                       #
                                                                                          k                                         k
                                                 = Pr
                                                       x
                                                               ∏ Sa = 2(k−1)2                 − Pr
                                                                                                   x
                                                                                                           ∏ Sa = −2(k−1)2                 . (3.6)
                                                            a∈{0,1}k                                    a∈{0,1}k

Let
                                                      Wβ = ∑ A j (y01 ) j . . . (y0k ) j .
                                                                j∈Iβ

It will be useful to note here that Wβ only takes integral values. We will use this fact crucially later. Let pk
denote the 2k × 1 column vector whose elements are indexed by a = (a1 , . . . , ak ) ∈ {0, 1}k , and the a-th
element of pk is P(x, ya11 , . . . , yak k ). Similarly define column vectors sk (wk , respectively) whose a-th entries
are Sa (Wa , respectively) for all a ∈ {0, 1}k . Although pk , sk and wk depend on x, y01 , . . . , y0k , y11 , . . . , y1k , we
do not make this dependence explicit in the following discussion in order to avoid clutter.
Claim 3.5. The following holds true for all k, and all x, y01 , . . . , y0k , y11 , . . . , y1k .

                                     pk = 2sk = 2Hk · wk                                                                                     (3.7)
                                                                
               k   k                       1          Hk−1 Hk−1             
where Hk is a 2 × 2 Hadamard matrix defined as Hk =                and H0 = 1 .
                                                      Hk−1 −Hk−1
      Let us first state a well-known property of Hk .
Fact 3.6. Let Hk be as defined above. Then, (Hk )i j = (−1)hi, ji for all i, j ∈ {0, 1}k .
   In other words, Hk is the communication matrix of the inner product (modulo 2) function. Let us now
prove Claim 3.5.

Proof of Claim 3.5. Let
                                             n4k
          a ∈ {0, 1} , k
                                   Pa = 2 ∑ A j (ya11 ) j · · · (yak k ) j ,           and          Wβ = ∑ A j (y01 ) j · · · (y0k ) j .
                                             j=1                                                              j∈Iβ

Say j ∈ Iβ where β ∈ {0, 1}k . Note that (yai i ) j = −1 · (y0i ) j iff ai = 1, βi = 1. Hence, we have

                     (ya11 ) j · · · (yak k ) j = (−1)(∑i ai ·βi ) (y01 ) j · · · (y0k ) j = (−1)ha,β i (y01 ) j · · · (y0k ) j .

We conclude that
                                                                                                                               !
                             n4k
                   Pa = 2 ∑ A j (ya11 ) j · · · (yak k ) j = 2             ∑ ∑ (−1)ha,β i A j (y01 ) j · · · (y0k ) j
                             j=1                                       β ∈{0,1}k j∈Iβ

   1 This is the Sylvester construction of Hadamard matrices.



                            T HEORY OF C OMPUTING, Volume 14 (21), 2018, pp. 1–23                                                              13
                                 A RKADEV C HATTOPADHYAY AND N IKHIL S. M ANDE
                                                       !
                       =2        ∑ (−1)ha,β iWβ
                             β ∈{0,1}k

                       = 2(Hk )a · wk

where (Hk )a denotes the a-th row of Hk . Thus, pk = 2sk = 2Hk · wk .

3.1.1    On integral solutions to Hadamard constraints
In the remainder of this section, we shall refer to an integral assignment to wk as a valid integral
assignment if it satisfies Equation (3.7) for some setting of x, y01 , . . . , y0k , y11 , . . . , y1k . The conditions on sk
will be explicitly stated in each usage.
    First, we prove that the number of valid integral assignments to wk satisfying
                                                                           k
                                                      ∏ Sa = 2(k−1)2
                                                    a∈{0,1}k

is equal to the number of valid integral assignments to wk satisfying
                                                                            k
                                                    ∏ Sa = −2(k−1)2 .
                                                a∈{0,1}k

Moreover, we show that the total number of such valid integral assignments is small, and the values of |Wa |
are not too large in any such valid assignment. Recall from Equation (3.6) that for all y01 , . . . , y0k , y11 , . . . , y1k
such that Iβ is non-empty for each β , we have
          "                         #      "                      #     "                                  #
                                                                                   k                            k
        E
        x
              ∏ q(x, ya1 , . . . , yak )
                            1            k
                                             = Pr
                                                x
                                                        ∏ Sa = 2(k−1)2                 − Pr
                                                                                         x
                                                                                                ∏ Sa = −2(k−1)2     .
            a∈{0,1}k                                  a∈{0,1}k                                a∈{0,1}k

Thus, we can pair the valid “positive” and “negative” assignments. Higher-order terms in the difference
of probabilities          "                     #      "                     #
                                                           k                                        k
                            Pr
                             x
                                    ∏ Sa = 2(k−1)2             − Pr
                                                                 x
                                                                        ∏ Sa = −2(k−1)2
                                  a∈{0,1}k                            a∈{0,1}k

cancel out. We require the following well-known property of Hadamard matrices.
Fact 3.7. Let H be an N × N Hadamard matrix. Then, H is invertible, and H−1 = N1 H.
Claim 3.8. The number of valid integral assignments to wk such that
                                                                               k
                                                     ∏ Sa = +2(k−1)2
                                                 a∈{0,1}k

equals the number of valid integral assignments such that
                                                                            k
                                                    ∏ Sa = −2(k−1)2 .
                                                a∈{0,1}k



                          T HEORY OF C OMPUTING, Volume 14 (21), 2018, pp. 1–23                                           14
     S EPARATION OF U NBOUNDED -E RROR M ODELS IN M ULTI -PARTY C OMMUNICATION C OMPLEXITY

Proof. The constraints we have are Hk · wk = sk . Since Wa is integral for all a, and Hk is a ±1 matrix, this
implies that the Sa are integral as well. Thus, using Fact 3.7 we get (1/2k )Hk · sk = wk , or Hk · (sk /2k ) =
wk . Let us consider two cases, one where ∀a ∈ {0, 1}k , Sa /2k = 1/2, and another where there exists an
a such that Sa /2k 6= 1/2.
    • Let us assume ∀a, Sa /2k = 1/2. We show something slightly stronger, namely that every setting
      of each Sa /2k to ±1/2 gives us a valid assignment to the Wa . Since Hk is a ±1 matrix of even
      dimension, the parity of the number of appearances of +1/2 equals the parity of number of
      appearances of −1/2 in the sum (Hk )R · (sk /2k ), where (Hk )R is the R-th row of Hk . This holds for
      every row R. Thus, WR is always an integer. This means the number of valid positive assignments
      equals the number of valid negative assignments in this case.
    • The absolute value of Sa must equal a power of 2 for each a since the product of them is a power of
      2. If there exists an Sa whose value is not ±2k−1 , then there must exist an Sb (consider the last such
      one) which is a multiple of 2k since
                                                                                 k
                                                        ∏ Sa = ±2(k−1)2 .
                                                      a∈{0,1}k

       Since Sb /2k is an integer, and we had a valid integral assignment to wk , flipping the sign of Sb can
       change the value of any Wc to Wc ± 2 · Sb /2k , which remains an integer. This is a bijection between
       valid positive and negative assignments.
Lemma 3.9. The number of valid integral assignments to wk satisfying
                                                                             k
                                                     ∏ Sa = ±2(k−1)2
                                                   a∈{0,1}k

                     k
is at most 2k log(e)2 .
   We use the following standard fact about binomial coefficients.
Fact 3.10. For all n ∈ N and for all k ∈ [n], nk ≤ (n · e/k)k .
                                                

                                                                      k
Proof of Lemma 3.9. Suppose ∏a∈{0,1}k Sa = ±2(k−1)2 . This means we have to distribute (k − 1)2k
powers of 2 among among the integers 2k Sa . The total number of ways to do this equals the number of
non-negative integer solutions to m1 + · · · + m2k = (k − 1)2k , which equals
                                                k         
                                                  k2 − 1
                                                             .
                                                 (k − 1)2k
Note that
                            k2k − 1                k2k                                  (k−1)2k
                                                               
                                           =                      ≤ k2k · e/((k − 1)2k )         ,
                           (k − 1)2k            (k − 1)2k
where the last inequality follows by Fact 3.10. Now we use the fact that 1 + x ≤ ex and conclude that
                                                                   (k−1)2k
                                                k2k · e/((k − 1)2k )

                          T HEORY OF C OMPUTING, Volume 14 (21), 2018, pp. 1–23                             15
                                A RKADEV C HATTOPADHYAY AND N IKHIL S. M ANDE

                            k                              k
is bounded above by ek2 , which equals 2k log(e)2 . Each of these can give at most one integral assignment
to wk because the system of constraints is linearly independent.

    We now state an upper bound on the value of |Wa | in every integral assignment.
                                                               k
Lemma 3.11. For all a ∈ {0, 1}k , |Wa | ≤ 2(k+1)2 for any valid integral assignment to wk satisfying
                       k
∏a∈{0,1}k Sa = ±2(k−1)2 .
Proof. First note that for each a, |Wa | ≤ ∑a∈{0,1}k |Sa |/2k since Hk · sk = wk . We show that ∑a∈{0,1}k |Sa |
               k
is at most 2k2 . Suppose not. By a simple averaging argument, there must be a b such that
                                                                       k
                                                      |Sb | > 2k2 /2k ,
               k                                 k
which is 2k(2 −1) , which is at least 2(k−1)2 if k ≥ 1. But this is not possible since
                                                                                     k
                                                     ∏ Sa = ±2(k−1)2
                                                a∈{0,1}k

and each Sa is an integer.

3.1.2   Using properties of the binomial distribution
Recall from Equation (3.6) that for all y01 , . . . , y0k , y11 , . . . , y1k such that Iβ is non-empty for each β , we want
to show an upper bound on
                         "                                #             "                          #
                                                         k                                  k
                           Pr
                            x
                                   ∏ Sa = 2(k−1)2              − Pr
                                                                   x
                                                                             ∏ Sa = −2(k−1)2    .
                                 a∈{0,1}k                                  a∈{0,1}k

Recall that we defined
                                             Wβ = ∑ A j (y01 ) j . . . (y0k ) j .
                                                       j∈Iβ

For any β ∈ {0, 1}k , note that Wβ is always distributed according to B(cβ (2n − 1)), where cβ = Iβ ≥ 1.
We can prove this in a manner similar to that in the proof of Lemma 3.2. In Claim 3.8, we showed that
the number of valid integral assignments to wk such that
                                                                                 k
                                                      ∏ Sa = 2(k−1)2
                                                 a∈{0,1}k

equals the number of integral assignments such that
                                                                                    k
                                                     ∏ Sa = −2(k−1)2 .
                                                a∈{0,1}k

Note that if the assignment to wk is not integral, then it has probability 0, since for each a,Wa takes only
integral values. Let us call an assignment to wk positive if the corresponding value of
                                                                                k
                                                     ∏ Sa = 2(k−1)2 ,
                                                 a∈{0,1}k


                          T HEORY OF C OMPUTING, Volume 14 (21), 2018, pp. 1–23                                          16
    S EPARATION OF U NBOUNDED -E RROR M ODELS IN M ULTI -PARTY C OMMUNICATION C OMPLEXITY

and negative if the value of
                                                                              k
                                                       ∏ Sa = −2(k−1)2 .
                                                   a∈{0,1}k

Arbitrarily form a matching, denoted by M, between the positive and negative assignments. We will
bound the difference of probabilities of each match.
                         h                     k
                                                 i   h                        k
                                                                                i
                     Prx ∏a∈{0,1}k Sa = 2(k−1)2 − Prx ∏a∈{0,1}k Sa = −2(k−1)2

                        ≤      ∑        Pr[wk = w] − Pr[wk = w0 ]
                                           x               x
                            (w,w0 )∈M


where w = (wa )a∈{0,1}k is a valid positive assignment and w0 = (w0a )a∈{0,1}k is the valid negative assign-
ment that is the unique match of w according to M. The term Prx [wk = w] is equal to
                                                                             
                                                           ^
                                                   Pr             Wa = wa  .
                                                   x
                                                        a∈{0,1}k


In Lemma 3.11 we showed that for each β , the absolute value of Wβ in any integral assignment can be
               k
at most 2(k+1)2 . Each Wβ is distributed according to B(cβ (2n − 1)), cβ > 0, since ∀β ∈ {0, 1}k , Iβ > 0.
For a particular positive assignment w, negative assignment w0 and any y01 , . . . , y0k , y11 , . . . , y1k such that Iβ
is non-empty for each β ,
                                                                          

                Pr[wk = w] − Pr wk = w0 = Pr                       Wa = w0a 
                                             ^                  ^
                                                 Wa = wa  − Pr 
                x               x
                                                               a∈{0,1}k                     a∈{0,1}k

                                                           k
By plugging in N = cβ (2n − 1) and j = 2(k+1)2 in Lemma 2.6, we obtain
                                                                 k+1
                                                                         
                                p0 ≥ Pr[Wβ = wβ ] ≥ p0 − O 2(k+1)2 /23n/2 ,
                                           x


where p0 = Pr[Wβ = 0] = O 1/2n/2 . For convenience in calculations, let us say
                                

                                                               k+1
                                                                        
                                    Pr[Wβ = wβ ] ∈ p0 ± O 2(k+1)2 /23n/2 .
                                    x

Recall that the variables Wβ are independent since they depend on disjoint sets of variables. Thus,

                                                              0
                a∈{0,1}k Wa = wa ] − Pr[
              V                                V
          Pr[                                  a∈{0,1}k Wa = wa ]

                                     k+1
                                           !!2k                           k+1
                                                                                  !!2k             k         k+1
                            2(k+1)2                                 2(k+1)2                 2 · 22     2(k+1)2
          ≤       p0 ± O                           −    p0 ± O                           ≤ n/2 2k −1 ·             .
                              23n/2                                   23n/2               (2 )           23n/2


                           T HEORY OF C OMPUTING, Volume 14 (21), 2018, pp. 1–23                                       17
                              A RKADEV C HATTOPADHYAY AND N IKHIL S. M ANDE

                                                                                                           k
The last inequality holds because the highest-order term after binomially expanding both terms is (p0 )2 ,
                                                                             k
which cancel each other. Note that the sum of the binomial coefficients is 22 , and each term after the first
is at most                                                        
                                              k             k+1
                                    1/(2n/2 )2 −1 · 2(k+1)2 /23n/2 .
Thus, the sum of the remaining terms can be bounded above by
                                 k
                                            k
                                                        k+1
                                                                 
                              22 · 1/(2n/2 )2 −1 · 2(k+1)2 /23n/2 .
                                                                   k
By Lemma 3.9, the number of assignments is at most 2k log(e)2 . Thus,
                h                    k
                                       i      h                       k
                                                                        i
           Prx ∏a∈{0,1}k Sa = 2(k−1)2 − Prx ∏a∈{0,1}k Sa = −2(k−1)2
                                                                                                   k+1
                                                                       k   k       1         2(k+1)2
              ≤      ∑        Pr[wk = w] − Pr[wk = w0 ] ≤ 2k log(e)2 · 22              k   ·
                  (w,w0 )∈M
                              x              x                                 (2n/2 )2 −1     23n/2

which proves Claim 3.4. Using Equation (3.3), this proves Theorem 3.1.


4    Circuit lower bounds
In this section, we will show how we obtain lower bounds on the size of depth-3 circuits of the type
MAJ ◦ THR ◦ ANYk computing the GHRNk function. Recall that GHRNk can be computed by linear-size
THR ◦ PARk+1 circuits. First let us state the results that were known prior to this work.
Lemma 4.1 (Folklore). Any function f computable by size s circuits of the type SYM ◦ ANYk has a
deterministic simultaneous (k + 1)-player protocol of cost O(k log(s)) for any partitioning of the input
bits.
Proof. Since each of the bottom-layer gates has fan-in at most k, there must exist a player who sees all
the inputs to it. The protocol decides beforehand which gate “belongs” to which player. All players
simultaneously broadcast their contribution to the top SYM gate using at most log(s) bits each.

   A consequence of this is an upper bound for randomized protocols for depth-3 circuits, which may be
found in [11], for example, and is stated below without proof.
Lemma 4.2 (Folklore). Given any function f computable by size s circuits of the type MAJ ◦ SYM ◦ ANYk ,
and any partition of the input bits, there exists a public coin (k +1)-player randomized protocol computing
f with advantage Ω(1/s) and cost O(k log(s)).
    Let us now prove Theorem 1.4.

Proof. Suppose GHRNk could be computed by MAJ ◦ SYM ◦ ANYk circuits of size s. Using the pro-
tocol mentioned in Lemma 4.2, the cost of the protocol is O(k log(s)) and advantage Ω(1/s). Using
Theorem 1.2,                                      √                  
                                                      N
                        O(k log(s) + log(s)) ≥ Ω         − log(N) − k ,
                                                    4k

                         T HEORY OF C OMPUTING, Volume 14 (21), 2018, pp. 1–23                            18
    S EPARATION OF U NBOUNDED -E RROR M ODELS IN M ULTI -PARTY C OMMUNICATION C OMPLEXITY

which gives
                                               √            
                                                  N log(N)
                                    log(s) ≥ Ω      −      −1 .
                                                k4k    k
Thus,
                          √                   √          
                              N log(N)                N log(N)
                  s ≥ exp Ω     −      −1   ≥ exp Ω     −         .
                            k4k    k                k4k    k

    By definition, polynomial-size MAJ ◦ MAJ circuits can be simulated by polynomial-size MAJ ◦ SYM
circuits. Also, Goldmann et al. [17] (Theorem 26) showed that MAJ ◦ THR circuits can be simulated by
MAJ ◦ MAJ circuits with a polynomial blowup. More precisely, a MAJ ◦ THR circuit of size s can be
simulated by a MAJ ◦ MAJ circuit of size sα · nβ for some constants α, β . Hence, Corollary 1.5 follows
by a similar proof as that of Lemma 4.2.


5    Conclusion
                                                     √
We have shown that GHRNk requires essentially Ω( N/4k cost to be solved in the PPcc
                                                             
                                                                                           k+1 model. Since
it follows almost from the definition of GHRNk that it has O(log N) cost UPPcc k+1 protocols, this provides a
                  cc           cc
separation of PPk from UPPk for the NOF model when k ≤ δ · log N for some constant δ > 0. In general,
current techniques do not allow us to go beyond log N players to prove lower bounds for the cost of even
deterministic protocols. This remains one of the most interesting problems in NOF complexity. However,
let us remark that for many of the functions used in the literature (see, for example, [18, 3, 1, 15]), there
are surprisingly efficient protocols when k > log N. Moreover these protocols are typically deterministic
and either simultaneous or barely interactive. On the other hand, we do not immediately see an efficient
randomized interactive protocol for GHRNk at k > log N. This raises the question of whether GHRNk is a
hard function even for k > log N.                                 √ 
     Our result shows that the PPcc                        N
                                    k complexity of GHRk is Ω        N for any constant k. As mentioned
in Section 1.1, Sherstov [34] shows existence of functions with Ω(N) cost in PPk but that have efficient
UPPk protocols. A question that may be with reach is whether one can come up with an explicit function
in UPPcc k that requires Ω(N) PPk cost.
     Finally, proving super-logarithmic lower bounds for UPPcc k protocols for any explicit function remains
a very interesting challenge even for k = 3. Hansen and Podolskii [19] have shown that meeting this
challenge is enough to yield super-polynomial lower bounds for THR ◦ THR circuits.


Acknowledgements
We would like to thank Kristoffer Hansen for directing our attention to the result of Goldmann, Håstad
and Razborov [17] during the Summer School on Lower Bounds, held in Prague in the summer of
2015. We thank Michal Koucký for inviting us there, where we started this project. We are grateful to
an anonymous reviewer for pointing out to us that the results of Sherstov [31] and Beigel [7] can be
combined to get a separation between PPcc          cc
                                        k and UPPk for k at most O(log log n). We would like to thank
Alexander Sherstov [32] for his various helpful comments and pointers. Finally, we are thankful to the

                       T HEORY OF C OMPUTING, Volume 14 (21), 2018, pp. 1–23                              19
                         A RKADEV C HATTOPADHYAY AND N IKHIL S. M ANDE

anonymous ToC referees and Laci Babai, who provided detailed comments and valuable advice that
helped in significantly polishing this paper.


References
 [1] A NIL A DA , A RKADEV C HATTOPADHYAY, O MAR FAWZI , AND P HUONG N GUYEN: The NOF
     multiparty communication complexity of composed functions. Comput. Complexity, 24(3):645–694,
     2015. Preliminary version in ICALP’12. [doi:10.1007/s00037-013-0078-4] 2, 19

 [2] L ÁSZLÓ BABAI , P ETER F RANKL , AND JANOS S IMON: Complexity classes in communica-
     tion complexity theory. In Proc. 27th FOCS, pp. 337–347. IEEE Comp. Soc. Press, 1986.
     [doi:10.1109/SFCS.1986.15] 2, 3

 [3] L ÁSZLÓ BABAI , A NNA G ÁL , P ETER G. K IMMEL , AND S ATYANARAYANA V. L OKAM: Commu-
     nication complexity of simultaneous messages. SIAM J. Comput., 33(1):137–166, 2003. Preliminary
     version in STACS’95. [doi:10.1137/S0097539700375944] 2, 19

 [4] L ÁSZLÓ BABAI , N OAM N ISAN , AND M ARIO S ZEGEDY: Multiparty protocols, pseudorandom
     generators for logspace, and time-space trade-offs. J. Comput. System Sci., 45(2):204–232, 1992.
     Preliminary version in STOC’89. [doi:10.1016/0022-0000(92)90047-M] 7

 [5] PAUL B EAME , M ATEI DAVID , T ONIANN P ITASSI , AND P HILIPP W OELFEL: Separating determin-
     istic from randomized multiparty communication complexity. Theory of Computing, 6(9):201–225,
     2010. Preliminary version in ICALP’07. [doi:10.4086/toc.2010.v006a009] 2

 [6] PAUL B EAME AND DANG -T RINH H UYNH -N GOC: Multiparty communication complexity and
     threshold circuit size of AC0 . SIAM J. Comput., 41(3):484–518, 2012. Preliminary version in
     FOCS’09. [doi:10.1137/100792779]

 [7] R ICHARD B EIGEL: Perceptrons, PP, and the Polynomial Hierarchy. Comput. Complexity, 4(4):339–
     349, 1994. Preliminary version in SCT’92. [doi:10.1007/BF01263422] 4, 19

 [8] H ARRY B UHRMAN , N IKOLAI K. V ERESHCHAGIN , AND RONALD DE W OLF: On computation and
     communication with small bias. In Proc. 22nd IEEE Conf. on Computational Complexity (CCC’07),
     pp. 24–32. IEEE Comp. Soc. Press, 2007. [doi:10.1109/CCC.2007.18] 3

 [9] M ARK B UN AND J USTIN T HALER: Approximate degree and the complexity of depth three circuits.
     In Proc. 22nd Internat. Workshop on Randomization and Computation (RANDOM’18), pp. 35:1–
     35:18. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, 2018. [doi:10.4230/LIPIcs.APPROX-
     RANDOM.2018.35]

[10] A SHOK K. C HANDRA , M ERRICK L. F URST, AND R ICHARD J. L IPTON: Multi-party protocols. In
     Proc. 15th STOC, pp. 94–99. ACM Press, 1983. [doi:10.1145/800061.808737] 2, 6

[11] A RKADEV C HATTOPADHYAY: Discrepancy and the power of bottom fan-in in depth-three circuits.
     In Proc. 48th FOCS, pp. 449–458. IEEE Comp. Soc. Press, 2007. [doi:10.1109/FOCS.2007.30] 18

                     T HEORY OF C OMPUTING, Volume 14 (21), 2018, pp. 1–23                        20
    S EPARATION OF U NBOUNDED -E RROR M ODELS IN M ULTI -PARTY C OMMUNICATION C OMPLEXITY

[12] A RKADEV C HATTOPADHYAY: Circuits, Communication and Polynomials. Ph. D. thesis, McGill
     University, 2009. 2, 7
[13] A RKADEV C HATTOPADHYAY AND A NIL A DA: Multiparty communication complexity of dis-
     jointness. Electron. Colloq. on Comput. Complexity (ECCC), January 2008. [ECCC:TR08-002]
     2
[14] A RKADEV C HATTOPADHYAY AND N IKHIL M ANDE: Small error versus unbounded error pro-
     tocols in the NOF model. Electron. Colloq. on Comput. Complexity (ECCC), September 2016.
     [ECCC:TR16-095] 4
[15] A RKADEV C HATTOPADHYAY AND M ICHAEL E. S AKS: The power of super-logarithmic number of
     players. In Proc. 18th Internat. Workshop on Randomization and Computation (RANDOM’14), pp.
     596–603. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, 2014. [doi:10.4230/LIPIcs.APPROX-
     RANDOM.2014.596] 2, 19
[16] A NDREW D RUCKER , FABIAN K UHN , AND ROTEM O SHMAN: On the power of the congested
     clique model. In Proc. 33th Symp. on Principles of Distributed Comput. (PODC’14), pp. 367–376.
     ACM Press, 2014. [doi:10.1145/2611462.2611493] 2
[17] M IKAEL G OLDMANN , J OHAN H ÅSTAD , AND A LEXANDER A. R AZBOROV: Majority gates vs.
     general weighted threshold gates. Comput. Complexity, 2(4):277–300, 1992. Preliminary version in
     SCT’92. [doi:10.1007/BF01200426] 3, 4, 5, 6, 8, 9, 19
[18] V INCE G ROLMUSZ: The BNS lower bound for multi-party protocols in nearly optimal. Inform.
     and Comput., 112(1):51–54, 1994. [doi:10.1006/inco.1994.1051] 2, 19
[19] K RISTOFFER A RNSFELT H ANSEN AND V LADIMIR V. P ODOLSKII: Polynomial threshold functions
     and boolean threshold circuits. Inform. and Comput., 240:56–73, 2015. Preliminary version in
     MFCS’13. [doi:10.1016/j.ic.2014.09.008] 19
[20] J OHAN H ÅSTAD: On the size of weights for threshold gates. SIAM J. Discrete Math., 7(3):484–492,
     1994. [doi:10.1137/S0895480192235878] 5
[21] H AMED H ATAMI , K AAVE H OSSEINI , AND S HACHAR L OVETT: Structure of protocols
     for XOR functions. In Proc. 57th FOCS, pp. 282–288. IEEE Comp. Soc. Press, 2016.
     [doi:10.1109/FOCS.2016.38] 5
[22] BALA K ALYANASUNDARAM AND G EORG S CHNITGER: The probabilistic communication com-
     plexity of set intersection. SIAM J. Discrete Math., 5(4):545–557, 1992. Preliminary version in
     SCT’87. [doi:10.1137/0405044] 2
[23] E YAL K USHILEVITZ AND N OAM N ISAN: Communication complexity. Cambridge University
     Press, 1997. 6, 7
[24] T ROY L EE AND A DI S HRAIBMAN: Disjointness is hard in the multiparty number-on-the-
     forehead model. Comput. Complexity, 18(2):309–336, 2009. Preliminary version in CCC’08.
     [doi:10.1007/s00037-009-0276-2] 2

                     T HEORY OF C OMPUTING, Volume 14 (21), 2018, pp. 1–23                         21
                        A RKADEV C HATTOPADHYAY AND N IKHIL S. M ANDE

[25] M IHAI P ǍTRA ŞCU: Towards polynomial lower bounds for dynamic problems. In Proc. 42nd STOC,
     pp. 603–610. ACM Press, 2010. [doi:10.1145/1806689.1806772] 2

[26] A NUP R AO AND A MIR Y EHUDAYOFF: Simplified lower bounds on the multiparty communication
     complexity of disjointness. In Proc. 30th IEEE Conf. on Computational Complexity (CCC’15), pp.
     88–101. DROPS, 2015. [doi:10.4230/LIPIcs.CCC.2015.88] 2

[27] R AN R AZ: The BNS-Chung criterion for multi-party communication complexity. Comput. Com-
     plexity, 9(2):113–122, 2000. [doi:10.1007/PL00001602] 7

[28] A LEXANDER A. R AZBOROV: On the distributional complexity of disjointness. Theoret. Comput.
     Sci., 106(2):385–390, 1992. Preliminary version in ICALP’90. [doi:10.1016/0304-3975(92)90260-
     M] 2

[29] A LEXANDER A. S HERSTOV: Halfspace matrices. Comput. Complexity, 17(2):149–178, 2008.
     Preliminary version in CCC’07. [doi:10.1007/s00037-008-0242-4] 3, 5, 8

[30] A LEXANDER A. S HERSTOV: The intersection of two halfspaces has high threshold degree. SIAM J.
     Comput., 42(6):2329–2374, 2013. Preliminary version in FOCS’09. [doi:10.1137/100785260] 5

[31] A LEXANDER A. S HERSTOV: Communication lower bounds using directional derivatives. J. ACM,
     61(6):34:1–34:71, 2014. Preliminary version in STOC’13. [doi:10.1145/2629334] 2, 4, 19

[32] A LEXANDER A. S HERSTOV: Private Communication, 2016. 4, 19

[33] A LEXANDER A. S HERSTOV: The multiparty communication complexity of set disjointness. SIAM
     J. Comput., 45(4):1450–1489, 2016. Preliminary version in STOC’12. [doi:10.1137/120891587] 2,
     4, 5

[34] A LEXANDER A. S HERSTOV: On multiparty communication with large versus unbounded
     error.    Theory of Computing, 14(22), 2018. Preliminary version ECCC TR16-138.
     [doi:10.4086/toc.2018.v014a022] 4, 5, 19

[35] J USTIN T HALER: Lower bounds for the approximate degree of block-composed functions. In Proc.
     43rd Internat. Colloq. on Automata, Languages and Programming (ICALP’16), pp. 17:1–17:15.
     Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, 2016. [doi:10.4230/LIPIcs.ICALP.2016.17,
     ECCC:TR14-150] 4

[36] S HENGYU Z HANG: Efficient quantum protocols for XOR functions. In Proc. 25th SODA, pp.
     1878–1885. ACM Press, 2014. [doi:10.1137/1.9781611973402.136] 5




                    T HEORY OF C OMPUTING, Volume 14 (21), 2018, pp. 1–23                       22
  S EPARATION OF U NBOUNDED -E RROR M ODELS IN M ULTI -PARTY C OMMUNICATION C OMPLEXITY

AUTHORS

    Arkadev Chattopadhyay
    Associate professor
    Tata Institute of Fundamental Research
    Mumbai, India
    arkadev c tifr res in
    http://www.tcs.tifr.res.in/~arkadev/


    Nikhil S. Mande
    Postdoctoral fellow
    Georgetown University
    nm 936 georgetown edu
    http://www.tcs.tifr.res.in/~nikhil/


ABOUT THE AUTHORS

    A RKADEV C HATTOPADHYAY is currently an Associate Professor at the School of Tech-
       nology and Computer Science at TIFR, Mumbai, which he joined in September 2012.
       He obtained his Ph. D. from McGill University in 2008 under the supervision of Denis
       Thérien. He was a member of the School of Mathematics at the Institute for Advanced
       Study, Princeton for the year 2008-09. He was a postdoctoral member in the Department
       of Computer Science at the University of Toronto from 2009 to 2012. His research
       interests are in computational and communication complexity. Outside of this, he is also
       interested in literature, films and politics.


    N IKHIL S. M ANDE is a postdoctoral fellow at the Department of Computer Science at
        Georgetown University. This work was done while he was a Ph. D. student at the School
        of Technology and Computer Science at TIFR, Mumbai, where he was advised by
        Arkadev Chattopadhyay and graduated in 2018. His research interests lie in commu-
        nication complexity and circuit complexity. In his spare time, he enjoys speedsolving
        Rubik’s cube and related puzzles; his official records can be found here.




                   T HEORY OF C OMPUTING, Volume 14 (21), 2018, pp. 1–23                          23