                           THEORY OF COMPUTING, Volume 17 (8), 2021, pp. 1–28
                                        www.theoryofcomputing.org




                 The Layer Complexity
          of Arthur-Merlin-like Communication
                                                 Dmitry Gavinsky∗
               Received May 23, 2019; Revised December 11, 2020; Published September 27, 2021




       Abstract. In communication complexity the Arthur-Merlin (AM) model is the most natural
       one that allows both randomness and nondeterminism. Presently we do not have any super-
       logarithmic lower bound for the AM-complexity of an explicit function. Obtaining such a
       bound is a fundamental challenge to our understanding of communication phenomena.
          In this article we explore the gap between the known techniques and the complexity class
       AM. In the first part we define a new natural class, Small-advantage Layered Arthur-Merlin
       (SLAM), that has the following properties:
           • SLAM is (strictly) included in AM and includes all previously known subclasses of AM
             with non-trivial lower bounds:
                                          NP, MA, SBP, UAM ⊆ SLAM ⊂ AM .
             Note that NP ⊂ MA ⊂ SBP, while SBP and UAM are known to be incomparable.
           • SLAM is qualitatively stronger than the union of those classes:
                                                   f ∈ SLAM \ (SBP ∪ UAM)
              holds for an (explicit) partial function f .
   ∗ Partially funded by the grant 19-27871X of GA ČR. Part of this work was done while visiting the Centre for Quantum

Technologies at the National University of Singapore, and was partially supported by the Singapore National Research
Foundation, the Prime Minister’s Office and the Ministry of Education under the Research Centres of Excellence programme
under grant R 710-000-012-135.


ACM Classification: F.1.3
AMS Classification: 68Q15
Key words and phrases: communication complexity, complexity classes


© 2021 Dmitry Gavinsky
Licensed under a Creative Commons Attribution License (CC-BY)                          DOI: 10.4086/toc.2021.v017a008

         • SLAM is subject to the discrepancy bound: for any f ,

                                       SLAM( f ) ∈ Ω( √(log(1/disc( f ))) ) .

            In particular, the inner product function does not have an efficient SLAM-protocol.
      Structurally this can be summarised as

                                  SBP ∪ UAM ⊂ SLAM ⊆ AM ∩ PP .
           In the second part we ask why proving a lower bound of ω(√n) on the MA-complexity
      of an explicit function seems to be difficult. We show that such a bound either must explore
      certain “uniformity” of MA (which would require a rather unusual argument), or would imply
      a non-trivial lower bound on the AM-complexity of the same function.
          Both of these results are related to the notion of layer complexity, which is, informally,
      the number of “layers of nondeterminism” used by a protocol.


1    Introduction
The communication model Arthur-Merlin (AM) is beautiful. It is the most natural regime that allows both
randomness and nondeterminism. Informally,

    • BPP, the canonical complexity class representing randomised communication, contains such
      bipartite functions f that admit an approximate partition of the set f −1 (1) into quasi-polynomially
      many rectangles;

    • NP, the canonical complexity class representing nondeterministic communication, contains such f
      that admit an exact cover of the set f −1 (1) by quasi-polynomially many rectangles;

    • AM contains such f that admit an approximate cover of the set f −1 (1) by quasi-polynomially many
      rectangles.

    While both BPP and NP are relatively well understood and many strong and tight lower bounds
are known, we do not have any non-trivial lower bound for the AM-complexity of an explicit function.
Obtaining such a bound is a fundamental challenge to our understanding of communication complexity.
    Among numerous analytical efforts that have been made to understand AM, in this paper we are
paying special attention to these two:

    • In 2003 Klauck [13] studied the class Merlin-Arthur (MA): while (again, informally) AM can be
      viewed as “randomness over nondeterminism,” MA is “nondeterminism over randomness.” Klauck
      has found an elegant way to exploit this difference in order to prove strong lower bounds against
      MA.


    • In 2015 Göös, Pitassi and Watson [10] demonstrated strong lower bounds against the class Unam-
      biguous Arthur-Merlin (UAM), which was defined in the same paper. Similarly to AM (and unlike
      MA), their class can be viewed as “randomness over nondeterminism,” but only a very special
      form of nondeterminism is allowed: namely, only the (erroneously accepted) elements of f −1 (0)
      may belong to several rectangles; every element of f −1 (1) can belong to at most one rectangle of
      the nondeterministic cover. In other words, a UAM-protocol must correspond to an approximate
      partition of f −1 (1), but at the same time it may be an arbitrary cover of a small fraction of f −1 (0).
      Intuitively, a UAM-protocol must “behave like BPP ” over f −1 (1) and is unrestricted over the small
      erroneously accepted fraction of f −1 (0).

   Interestingly, the classes MA and UAM are incomparable: from the lower bounds demonstrated in [10]
and in [9] it follows that

                                           UAM ⊈ MA and MA ⊈ UAM .

     In the first half of this article (Section 3) we try to find a communication model that would be as close
to AM as possible, while staying within the reach of our analytic abilities. Inspired by the (somewhat
Hegelian) metamorphosis of “easy” BPP and NP into “hard” AM, we will try to apply a similar “fusion”
procedure to the classes MA and UAM, hoping that the outcome will give us some new insight into the
mystery of AM.
     Namely, we start by looking for a communication complexity class, defined as naturally as possible
and containing both MA and UAM. We will call it Layered Arthur-Merlin (LAM) (Definition 3.3).
Informally, it can be described as letting a protocol behave like MA over f −1 (1) and arbitrarily over the
erroneously accepted small fraction of f −1 (0). Note that it follows trivially from the previous discussion
(at least on the intuitive level) that

                                                MA ∪ UAM ⊆ LAM .

     Then we will add a few rather technical “enhancements” to LAM in order to get a class that includes
all previously known classes “under AM” with non-trivial lower bounds: most noticeably, the class SBP,
which is known to be strictly stronger than MA and strictly weaker than AM (see [11, 9, 14]).
     We call the resulting model Small-advantage Layered Arthur-Merlin (SLAM) (Definition 3.4) and it
holds that

                                    MA, UAM, SBP, LAM ⊆ SLAM ⊂ AM .

Moreover, we will demonstrate a partial function

                                             f ∈ SLAM \ (UAM ∪ SBP) ,

that is, SLAM will be strictly stronger than the union of all subclasses of AM with previously known
non-trivial lower bounds (as UAM ∪ SBP includes them all).1
    1 Here and later when referring to the subclasses of AM with previously known non-trivial lower bounds, we mean, in

particular, the classes that are known to be included by AM. Note that not only do we not have any non-trivial lower bound
against AM yet, but we also cannot guarantee that any interesting complexity class is not included in AM.



    Both LAM and SLAM seem to require a new approach for proving lower bounds. It will be developed
in Section 3.1, showing that these classes are still subject to the discrepancy bound: for any function f ,

                                       SLAM( f ) ∈ Ω( √(log(1/disc( f ))) ) ,

where SLAM( f ) denotes the “SLAM-complexity” of f . In particular, the inner product function does not
have an efficient SLAM-protocol.
   These properties of SLAM can be summarised structurally as

                                         SBP ∪ UAM ⊂ SLAM ⊆ AM ∩ PP ,

where PP is the class consisting of functions with high discrepancy.
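To make the discrepancy bound concrete, here is a small brute-force computation (an illustration, not a construction from the paper; it is only feasible for tiny n) of the uniform-distribution discrepancy of IP for n = 2, checked against the classical Lindsey bound disc(IP) ≤ 2^(−n/2):

```python
from itertools import product

n = 2
N = 2 ** n  # 4 rows and 4 columns

# +/-1 matrix of the inner product function: M[x][y] = (-1)^IP(x, y)
def ip(x, y):
    return bin(x & y).count("1") % 2

M = [[(-1) ** ip(x, y) for y in range(N)] for x in range(N)]

# Discrepancy under the uniform distribution: the largest normalised
# imbalance of M over all combinatorial rectangles S x T.
best = 0.0
for S in product([0, 1], repeat=N):      # indicator vector of the row set
    for T in product([0, 1], repeat=N):  # indicator vector of the column set
        s = sum(M[x][y] for x in range(N) if S[x] for y in range(N) if T[y])
        best = max(best, abs(s) / N ** 2)

print(best, best <= 2 ** (-n / 2))  # the brute-force value respects Lindsey's bound
```

For larger n the discrepancy decays exponentially, which is exactly what makes the SLAM bound above non-trivial.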
     The problem of proving a lower bound of ω(√n) for the MA-complexity of an explicit function has
been open since 2003, when Klauck [13] showed that the MA-complexity of Disj and IP was in Ω(√n).
At that point a number of researchers believed that the actual MA-complexity of these problems was in
Ω(n), so it was surprising when Aaronson and Wigderson [1] demonstrated MA-protocols for Disj and
IP of cost O(√n log n), which was later improved by Chen [6] to O(√(n log n log log n)).
     In the second part of this article (Section 4) we try to understand why proving a super-√n lower
bound against MA seems to be difficult. We will define a communication model M̃A that can be viewed,
in a certain sense, as a non-uniform MA. On the one hand, we will see that imposing the corresponding
uniformity constraint on M̃A-protocols makes them not stronger than MA-protocols; on the other hand,
all known lower bounds on MA( f ) readily translate to similar lower bounds on M̃A( f ).
     Intuitively, a complexity analysis that would explore the uniformity of MA (as opposed to M̃A) must
have a very unusual structure: the difference between the classes is subtle and we are not aware of
any examples where this type of argument is used. At the same time, we will see that M̃A( f ) ∈
O(√(n · AM( f ))) for any function f —that is, any lower bound of the form M̃A( f ) ∈ ω(√n) would have
non-trivial implications for AM( f ). This partially explains why no super-√n lower bound against MA
has been found yet.2


Why LAM is interesting. In the hope that it would benefit the reader, let us explain the motivation for
defining and studying the communication models presented in the first part of this article. The strong
lower bounds that were shown earlier for both MA and UAM were in the first place steps towards AM.
Both MA and UAM have very natural definitions, they both can be viewed as weakened versions of AM,
and the authors of both [13] and [10] have invented new insightful approaches while analysing these
models.
    2 It is relatively easy to show AM( f ) ∈ Ω(log n) for an explicit function (see Footnote 10); to improve that, a lower bound of
the form M̃A( f ) ∈ ω(√n · log n) would be needed. However, it seems that proving any M̃A( f0 ) ∈ ω(√n) and deriving from it,
via the argument of Section 4, that AM( f0 ) ∈ ω(1) would by itself shed some new light on the enigma of AM. As one of the
concluding open problems (Section 5), we suggest proving a lower bound of the form Ω(√n · log n) on the MA-complexity of an
explicit function.



    The model LAM, in turn, has been defined as a natural “junction” of MA and UAM, at least as strong
as either of the predecessors.3 As the known approaches for analysing MA and UAM were rather different
qualitatively, we expected the new model to be challenging enough to justify defining it. Our experience
of proving a strong lower bound for the newly defined model has confirmed those expectations.
    We hope that studying LAM will serve as the next step towards understanding AM.


2     Preliminaries and definitions
For x ∈ {0, 1}n and i ∈ [n] = {1, . . . , n}, we will write xi or x(i) to address the i-th bit of x (preferring
“xi ” unless it may cause ambiguity). Similarly, for S ⊆ [n], let both xS and x(S) denote the |S|-bit string,
consisting of (naturally ordered) bits of x, whose indices are in S.
    For a (discrete) set A and k ∈ N, we denote by Pow(A) the set of subsets of A and by (A choose k) the
set { a ∈ Pow(A) | |a| = k }.
     Our primary objects of computation will be bipartite Boolean functions of the form A × B → {0, 1}
(typically, {0, 1}n × {0, 1}n → {0, 1}). At times we will consider partial bipartite Boolean functions,
where some of the pairs are excluded: this can be interpreted either as assuming that those pairs are never
given as input, or as allowing any output of a communication protocol when those pairs are received. We
will view partial Boolean functions as total ones that are taking values from {0, 1, ⊥}, where “⊥” marks
the excluded input values. Note that the total functions are a special case, so f : A × B → {0, 1, ⊥} can be
either total or partial. When we refer to an input distribution for a function f : A × B → {0, 1, ⊥}, we
mean a distribution defined on f −1 (0) ∪ f −1 (1).
     We will use the logical OR (∨) operator with respect to partial Boolean functions, defined as follows
(note the asymmetry between the first two cases):

                                            1 if f1 (x) = 1 or f2 (y) = 1 ;
                                        def
                        f1 (x) ∨ f2 (y) =   0 if f1 (x) = 0 and f2 (y) = 0 ;                            (2.1)
                                            ⊥ otherwise.


2.1     Communication complexity
We refer to [15] for a classical background on communication complexity and to [11] for a great survey
of the more recent developments.

Communication problems.                We will repeatedly consider the following two communication problems.

Definition 2.1 (Disjointness function, Disj). For every n ∈ N, let (x, y) ∈ {0, 1}n × {0, 1}n . Then

                                 Disj(x, y) = ∧_{i=1}^n (xi = 0 ∨ yi = 0) .
    3 Here we are referring to LAM, as its definition is more natural and less technically involved than that of SLAM; on the other

hand, the difference between the two models is, in our opinion, merely formal (as explained above, we wanted the corresponding
complexity class to contain all previously studied subclasses of AM, including SBP, and that was the reason for “boosting” LAM,
which resulted in SLAM).



Definition 2.2 (Inner product function, IP). For every n ∈ N, let (x, y) ∈ {0, 1}n × {0, 1}n . Then

                                     IP(x, y) = ⊕_{i=1}^n (xi ∧ yi ) .

   Both Disj and IP are total bipartite Boolean functions—that is, their input sets are bipartite and the
function values are defined for every possible input pair.
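For reference, both functions can be evaluated by brute force (a sketch with x and y encoded as Python integers; the names are illustrative):

```python
# Reference implementations of Disj and IP on n-bit inputs.
def disj(x, y):
    return 1 if x & y == 0 else 0      # AND over bits: no common 1-position

def ip(x, y):
    return bin(x & y).count("1") % 2   # XOR over bits: parity of the overlap

print(disj(0b1010, 0b0101))  # 1: the supports are disjoint
print(disj(0b1010, 0b0010))  # 0: position 1 is shared
print(ip(0b1010, 0b0110))    # 1: exactly one common 1-position
```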

Communication models. The study of communication complexity was initiated by Abelson [2] in the
regime of real-valued messages and adapted by Yao [17] to the discrete regime that we are interested in.
The models P and BPP that capture one’s intuition of efficient communication (respectively, deterministic
and randomised) date back to [17]. Later Babai, Frankl and Simon [3] introduced a number of stronger
communication models—in particular, AM and MA—that intuitively corresponded to some classes studied
in the context of structural computational complexity.
Definition 2.3 (Polylogarithmic, P). We call deterministic bipartite communication protocols P-
protocols.
    We denote by P the class of functions solved by P-protocols of cost at most poly-log(n).
Definition 2.4 (Bounded-error Probabilistic Polylogarithmic, BPP). For every n ∈ N, let f : {0, 1}n ×
{0, 1}n → {0, 1, ⊥} and ε ≥ 0.
    If for every input distribution µn there exists a P-protocol of cost at most kε (n) that solves f with
error at most ε, then we say that the BPPε -complexity of f , denoted by BPPε ( f ), is at most kε (n).
    We let the BPP-complexity of f be its BPP1/3 -complexity.
    We denote by BPP the class of functions whose BPP-complexity is at most poly-log(n).
    The above definition of BPP, as well as those among the following model definitions that are distribution-
dependent, can be phrased in the “worst-case” formulations that do not make a reference to input
distributions. Those variants usually correspond to the closures of our definitions with respect to
mixed strategies, which, in turn, do not affect the resulting models, due to Von Neumann’s minimax
principle [16].
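As a concrete instance of a cheap randomised protocol, here is the classic fingerprinting protocol for equality (a textbook example, not a construction from this paper; the names and the toy prime range are illustrative):

```python
import random

# Classic fingerprinting protocol for equality: Alice picks a random prime p
# and sends (p, x mod p), i.e. O(log n) bits for suitably chosen parameters;
# the protocol errs only if p happens to divide x - y.
PRIMES = [p for p in range(2, 200)
          if all(p % d for d in range(2, p))]  # toy range: primes below 200

def eq_protocol(x, y):
    p = random.choice(PRIMES)        # Alice's random prime
    return 1 if x % p == y % p else 0

x = 0b1011_0010
assert eq_protocol(x, x) == 1        # equal inputs are never rejected
# On unequal inputs the error probability is at most
# log2|x - y| / len(PRIMES), since x - y has at most log2|x - y|
# distinct prime factors.
```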
Definition 2.5 (Non-deterministic Polylogarithmic, NP). For some k(n), let Π = { ri | i ∈ [2^k(n) ] } be a
family of characteristic functions of combinatorial rectangles over {0, 1}n × {0, 1}n .
     We call such Π an NP-protocol of cost k(n) that solves the function f (x, y) = ∨_{i=1}^{2^k(n)} ri (x, y) (as well
as any partial g that is consistent with f on g−1 (0) ∪ g−1 (1)).
     We say that Π accepts every (x, y) ∈ f −1 (1) and rejects every (x, y) ∈ f −1 (0).
     We say that Π solves a function g : {0, 1}n × {0, 1}n → {0, 1, ⊥} with error ε with respect to an input
distribution µn , if Pr (X,Y )∼µn [ f (X,Y ) ≠ g(X,Y )] = ε.
    We denote by NP the class of functions solved by NP-protocols of cost at most poly-log(n).
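A classic small example of an NP-protocol (an illustration, not taken from this paper): non-equality admits a cover by 2n rectangles, hence an NP-protocol of cost O(log n):

```python
n = 4

# NP-protocol for non-equality: rectangle (i, b) accepts (x, y) iff bit i
# of x equals b and bit i of y equals 1 - b. Each such condition is a
# product of a condition on x alone and a condition on y alone, i.e. a
# combinatorial rectangle, and the union of all 2n of them is NEQ^-1(1).
def in_rect(i, b, x, y):
    return (x >> i) & 1 == b and (y >> i) & 1 == 1 - b

def neq_protocol(x, y):  # the OR over the 2n rectangle indicators
    return 1 if any(in_rect(i, b, x, y) for i in range(n) for b in (0, 1)) else 0

# Exhaustive check that the cover computes non-equality exactly.
assert all(neq_protocol(x, y) == (1 if x != y else 0)
           for x in range(2 ** n) for y in range(2 ** n))
```

The cost is the index length, log(2n), matching the definition above.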
Definition 2.6 (Arthur-Merlin, AM). For every n ∈ N, let f : {0, 1}n × {0, 1}n → {0, 1, ⊥}.
    If for every input distribution µn there exists an NP-protocol of cost at most k(n) that solves f with
error at most 1/3, then we say that the AM-complexity of f , denoted by AM( f ), is at most k(n).
    We denote by AM the class of functions whose AM-complexity is at most poly-log(n).


    As we mentioned already, AM is a very strong model of communication, for which we currently do
not have any non-trivial lower bound. All the following classes can be viewed (and some of them have
been defined) as “weaker forms” of AM: for all of them we already have strong lower bounds.
Definition 2.7 (Merlin-Arthur, MA). For every n ∈ N, let f : {0, 1}n × {0, 1}n → {0, 1, ⊥}.
     If for some k(n) there are functions h1 , . . . , h_{2^k(n)} : {0, 1}n × {0, 1}n → {0, 1, ⊥}, whose BPP-complexity
is at most k(n), such that f (x, y) ≡ ∨_{i=1}^{2^k(n)} hi (x, y), then we say that the MA-complexity of f , denoted
by MA( f ), is at most k(n).
     We call Merlin-Arthur (MA) the class of functions whose MA-complexity is at most poly-log(n).
    Note that “∨” of partial functions is defined as in (2.1).
Definition 2.8 (Small-advantage Bounded-error Probabilistic Polylogarithmic, SBP). For every n ∈ N,
let f : {0, 1}n × {0, 1}n → {0, 1, ⊥}.
     If for an input distribution µn and some α > 0 there exists a P-protocol Π of cost at most k0 (n) such that

                          Pr (X,Y )∼µn [ Π accepts (X,Y ) | f (X,Y ) = 1 ] ≥ α    and
                          Pr (X,Y )∼µn [ Π accepts (X,Y ) | f (X,Y ) = 0 ] ≤ α/2 ,

then we call Π an SBP-protocol for f with respect to µn . The complexity of this protocol is k0 (n) +
log(1/α) (note that the value of α may depend on both n and µn ).
    If with respect to every µn there exists an SBP-protocol for f of cost at most k(n), then we say that the
SBP-complexity of f , denoted by SBP( f ), is at most k(n).
    We denote by SBP the class of functions whose SBP-complexity is at most poly-log(n).
   It was shown in [8, 4] that MA ⊆ SBP ⊆ AM, in [14] that SBP ≠ AM and in [9] that SBP ≠ MA.
Therefore,

                                                   MA ⊂ SBP ⊂ AM.4


    The following complexity measure is a core methodological notion for this work.
Definition 2.9 (Layer complexity). Let Π be an NP-protocol for solving f : {0, 1}n × {0, 1}n → {0, 1, ⊥},
possibly with error.
   We say that the protocol Π
    • has layer complexity l if every (x, y) ∈ {0, 1}n × {0, 1}n belongs to at most l rectangles of Π ;

    • has 0-layer complexity l0 if every (x, y) ∈ f −1 (0) belongs to at most l0 rectangles of Π ;

    • has 1-layer complexity l1 if every (x, y) ∈ f −1 (1) belongs to at most l1 rectangles of Π .
   4 Unless stated otherwise, we implicitly assume partial functions as the default type of communication problems. In those

cases when the object under consideration is a total function and the fact is significant for the context, that will be mentioned
explicitly.
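The three measures of Definition 2.9 can be computed directly for a small rectangle family (a sketch; the representation of protocols as explicit point sets is purely illustrative):

```python
# A direct sketch of Definition 2.9. A protocol is modelled as a list of
# rectangles, each a pair (set_of_x_values, set_of_y_values); zeros and
# ones are the points of f^-1(0) and f^-1(1) that we inspect.
def layer_complexities(rects, zeros, ones):
    def count(point):  # number of rectangles containing the point
        x, y = point
        return sum(1 for rows, cols in rects if x in rows and y in cols)

    l  = max(count(p) for p in zeros | ones)  # layer complexity
    l0 = max(count(p) for p in zeros)         # 0-layer complexity
    l1 = max(count(p) for p in ones)          # 1-layer complexity
    return l, l0, l1

# Two overlapping rectangles covering an L-shaped set of 1-inputs:
rects = [({0, 1}, {0}), ({0}, {0, 1})]
print(layer_complexities(rects, zeros={(1, 1)}, ones={(0, 0), (1, 0), (0, 1)}))
# (2, 0, 2): the point (0, 0) lies in both rectangles
```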



    The concept of layer complexity in the context of nondeterministic communication is very natural
and not new, dating back at least to [12] by Karchmer, Newman, Saks and Wigderson. We will use it
extensively in order to analyse some previously known subclasses of AM with strong lower bounds and to
define some new ones.
    The following two classes were introduced quite recently by Göös, Pitassi and Watson [10].

Definition 2.10 (Unambiguous Arthur-Merlin, UAM). For every n ∈ N, let f : {0, 1}n × {0, 1}n →
{0, 1, ⊥}.
     If for some constant ε < 1/2 and every input distribution µn there exists an NP-protocol of cost at most
k(n) and 1-layer complexity 1 that solves f with error at most ε, then we say that the UAM-complexity of
f , denoted by UAM( f ), is at most k(n).
     We denote by UAM the class of functions whose UAM-complexity is at most poly-log(n).

Definition 2.11 (Unambiguous Arthur-Merlin with perfect completeness, UAMcompl ). For every n ∈ N,
let f : {0, 1}n × {0, 1}n → {0, 1, ⊥}.
     If for every input distribution µn there exists an NP-protocol of cost at most k(n) and 1-layer
complexity 1 that solves f with perfect completeness (that is, it accepts every (x, y) ∈ f −1 (1)) and
soundness error at most 1/2 (that is, Pr µn [ (X,Y ) is accepted | f (X,Y ) = 0 ] ≤ 1/2), then we say that the
UAMcompl -complexity of f , denoted by UAMcompl ( f ), is at most k(n).
     We denote by UAMcompl the class of functions whose UAMcompl -complexity is at most poly-log(n).

    The classes AM, MA, UAMcompl and UAM can be defined in an alternative, more “narrative” way,
where an almighty prover Merlin interacts with a limited verifier Arthur (who, in turn, is a two-headed
union of the players Alice and Bob). In the cases of AM, UAMcompl and UAM these variants correspond to
the closures with respect to mixed strategies (that are equivalent to our definitions, as mentioned earlier).
    Note that the error parameters in the definitions of AM and UAMcompl are fixed without loss of
generality, while for UAM the error may be any constant ε < 1/2. In the first two cases the error can be
trivially reduced to an arbitrary constant by repeating the protocol a constant number of times; on the other hand,
in the case of UAM the possibility of efficient error reduction is not known, so fixing a specific ε might
result in weakening the model.5
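The standard majority-vote error reduction behind the AM case can be checked numerically (a sketch using the exact binomial tail; the function name is illustrative):

```python
from math import comb

# Probability that the majority vote of r independent runs errs, when each
# run errs independently with probability e < 1/2 (exact binomial tail).
def majority_error(e, r):
    return sum(comb(r, k) * e ** k * (1 - e) ** (r - k)
               for k in range(r // 2 + 1, r + 1))

print(majority_error(1 / 3, 1))   # a single run: error exactly 1/3
print(majority_error(1 / 3, 15))  # well below 1/3 after 15 repetitions
```

As the footnote explains, this trick is unavailable for UAM: taking a majority vote over repeated runs destroys the uniqueness of 1-certificates.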
    It was shown in [10] that NP ⊈ UAM. They also showed that UAM ⊈ SBP held in the context of
query complexity; later in [9] this separation was generalised to the case of communication complexity,
thus implying that UAM and SBP are incomparable:

                                          UAM ⊈ SBP and SBP ⊈ UAM .

    On the other hand, UAM and SBP are the strongest previously known communication complexity
classes contained in AM with non-trivial lower bounds, which makes it interesting to look for their
“natural merge” and try to prove good lower bounds there. That will be the quest of the next section.
   5 In order to reduce the error via repetition, the answer of the new protocol should be the majority vote of the individual

answers in the case of AM, and their logical conjunction in the case of UAMcompl . The problem with this approach for UAM
stems from the fact that in order to perform two-sided error reduction via repetition, one must take the majority vote of the
individual answers, which would ruin the required uniqueness of 1-certificates.



3 Layered Arthur-Merlin: getting as close to AM as we can
Let us try to construct as strong a communication model “under AM” as we can analyse.
    We start by considering several slightly stronger modifications of MA that will emphasise the intuition
behind the main definitions that will follow.

Definition 3.1 (MA′). For every n ∈ N, let f : {0, 1}n × {0, 1}n → {0, 1, ⊥}.
    If for some k(n) and 1 ≤ t(n) ≤ 2^k(n) there are functions h1 , . . . , h_t(n) : {0, 1}n × {0, 1}n → {0, 1, ⊥},
whose BPP_{1/(3·t²(n))} -complexity is at most k(n), such that f (x, y) ≡ ∨_{i=1}^{t(n)} hi (x, y), then we say that the
MA′-complexity of f , denoted by MA′( f ), is at most k(n).
    We call such { hi | i ∈ [t] } an MA′-protocol for f . We address the value t as the layer complexity of
this protocol.6

    Observe that

                                     MA( f ) ≤ MA′( f ) ∈ O((MA( f ))²)

always holds: the inequality follows trivially from the definitions, and the containment results from the
well-known fact that for every function h and ε > 0, BPPε (h) ∈ O(BPP(h) · log(1/ε)). So, MA is the class
of functions whose MA′-complexity is at most poly-log(n).
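The quadratic containment is a one-line calculation (a sketch, writing k = MA(f) and suppressing constants): since t(n) ≤ 2^k, the error parameter of Definition 3.1 satisfies log(3t²(n)) ∈ O(k), while each hi has BPP-complexity at most k, so

```latex
% Why MA'(f) is at most quadratically larger than k = MA(f):
% t(n) <= 2^k implies log(3 t^2(n)) <= 2k + 2 + log 3 = O(k).
\[
  \mathrm{BPP}_{1/(3t^2)}(h_i)
    \;\in\; O\!\bigl(\mathrm{BPP}(h_i)\cdot\log(3t^2)\bigr)
    \;\subseteq\; O(k\cdot k)
    \;=\; O(k^2),
  \qquad\text{hence}\qquad
  \mathrm{MA}'(f)\in O\bigl((\mathrm{MA}(f))^2\bigr).
\]
```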
    Now suppose MA′( f ) ≤ k(n); what does it imply with respect to a known input distribution µ? In
this case for every hi there is a P-protocol of cost at most k(n) that computes a function gi , such that
Pr µ [hi (X,Y ) ≠ gi (X,Y )] ≤ 1/(3·t²(n)); accordingly, the union bound gives

                          Pr µ [ f (X,Y ) ≠ ∨_{i=1}^{t(n)} gi (X,Y ) ] ≤ 1/(3·t(n)) .
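The union-bound step can be spelled out: the event that the disjunction of the gi disagrees with f = ∨hi implies that some gi disagrees with its hi, so

```latex
% Each g_i errs with probability at most 1/(3 t^2(n)); summing over the
% t(n) functions loses a factor of t(n):
\[
  \Pr_{\mu}\Bigl[f(X,Y)\neq\bigvee_{i=1}^{t(n)} g_i(X,Y)\Bigr]
    \;\le\; \sum_{i=1}^{t(n)} \Pr_{\mu}\bigl[h_i(X,Y)\neq g_i(X,Y)\bigr]
    \;\le\; t(n)\cdot\frac{1}{3\,t^2(n)}
    \;=\; \frac{1}{3\,t(n)}.
\]
```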

    What can we say about a communication complexity class that only requires that the above holds for
every µ: in particular, what will be its relation to MA? Let us define it.

Definition 3.2 (M̃A). For every n ∈ N, let f : {0, 1}n × {0, 1}n → {0, 1, ⊥}.
    For some k(n) and 1 ≤ t(n) ≤ 2^k(n) , let Π = { gi | i ∈ [t(n)] } be a family of functions gi : {0, 1}n ×
{0, 1}n → {0, 1, ⊥}, whose P-complexity is at most k(n), such that for some input distribution µn it holds
that Pr µn [ f (X,Y ) ≠ ∨_{i=1}^{t(n)} gi (X,Y ) ] ≤ 1/(3·t(n)). Then we call Π an M̃A-protocol of cost k(n) for f with
respect to µn .
    If for every input distribution µn there exists an M̃A-protocol of cost k(n) for f , then we say that the
M̃A-complexity of f , denoted by M̃A( f ), is at most k(n).
    We denote by M̃A the class of functions whose M̃A-complexity is at most poly-log(n).
   6 Note the slight inconsistency in our use of the term layer complexity and the parameter t that denotes it: most of the time

they stand for the actual number of rectangles that an input value belongs to, but occasionally—in particular, in the context of
MA′—we use them for an upper bound on the number of possible “witnesses.” The main reason for that is the lack of natural
correspondence between the BPP-protocols (intrinsic to the definition of MA′) and combinatorial rectangles.



    Note that

                                                MA ⊆ M̃A

follows from the definition and the previous discussion: M̃A( f ) ≤ MA′( f ) ∈ O((MA( f ))²).
                                                                                           

    Towards our goal to construct a communication model under AM as strong as we can analyse, let us
look at UAMcompl : together with MA these are, arguably, the two most natural (though not the strongest)
“sub-AM” models for which we have good lower bounds. Conceptually, the insightful lower bounds
given by Klauck [13] for MA and by Göös, Pitassi and Watson [10] for UAMcompl can be viewed as two
different approaches to analysing strong “sub-AM” models of communication complexity.
    On the one hand, the more recently defined UAMcompl has at least one important “AM-like” property
that MA lacks: AM puts no limitations on the layer complexity of protocols; MA limits the number of
“layers” over any input pair; UAMcompl and UAM only limit the 1-layer complexity (that is, they let the
0-layer complexity of a protocol be arbitrary, like AM and unlike MA). This difference seems to be rather
crucial:

    • While the lower-bound argument of [13] against MA can be generalised to work against a commu-
      nication model that would limit only the 0-layer complexity of a protocol, it does not seem to go
      through if only the 1-layer complexity is limited.

    • If we consider the natural (and the most common) situation when the target function is balanced
      with respect to its “hard” distribution (which is the case, for instance, for all functions with
      low discrepancy), then the “expected density” of the protocol's rectangles over the points in the
      (erroneously) accepted ε-fraction of f⁻¹(0) would be much higher than the density in the (rightly)
      accepted majority of f⁻¹(1). In other words, the expected number of the protocol's rectangles that
      an accepted (x, y) ∈ f⁻¹(0) belongs to would be considerably higher than the analogous value for
      (x, y) ∈ f⁻¹(1). Accordingly, limiting only the 1-layer complexity feels like a weaker restriction
      (i.e., resulting in a stronger model) than limiting only the 0-layer complexity (or both).

    On the other hand, even though the classes⁷ UAMcompl and UAM limit only the 1-layer complexity
of a protocol, the actual quantitative limitation that they put is way too strong: it is 1, as opposed to the
quasi-polynomial limitation on the (total) layer complexity of MA (as emphasised by Definition 3.2).
For instance, it has been shown in [10] that NP ⊄ UAM (note that NP ⊆ MA and UAMcompl ⊆ UAM). To
include NP, an “NP-like” class must allow protocols with super-constant 1-layer complexity.
    On the technical level, comparing the definitions of MA (Definition 3.2) and of UAMcompl (Defini-
tion 2.11), we can see that in both cases the membership of a function f implies existence of a family of
rectangles, whose union approximates f —that is, existence of good NP-approximations of f :

    • if f ∈ MA (in particular, if f ∈ MA₀), then for some t(n) ∈ N and every input distribution µn there
      exists an NP-protocol of cost at most poly-log(n) and layer complexity at most t(n) that solves f
      with error at most 1/(3t(n));
    7 Most of the time the distinction between the two classes is insignificant for the context of this work. Nevertheless, we will rather often explicitly mention both UAM and UAMcompl in the same context, as, in our opinion, the former is more naturally defined, while the latter is somewhat simpler to analyse and often allows for clearer intuition.



    • if f ∈ UAMcompl , then for every input distribution µn there exists an NP-protocol of cost at most
      poly-log(n) and 1-layer complexity 1 that solves f with perfect completeness and soundness error
      at most 1/2 with respect to µn .
Note that the above membership condition of UAMcompl is sufficient, and that of MA is just necessary.
   Let us use this intuition to define a new communication complexity class that includes both UAMcompl
and MA.
Definition 3.3 (Layered Arthur-Merlin, LAM). For every n ∈ N, let f : {0,1}^n × {0,1}^n → {0,1,⊥}.
    If for an input distribution µn there exists an NP-protocol Π of 1-layer complexity t that solves f with
completeness error at most 1/3 and soundness error at most 1/(3t), then we call Π a LAM-protocol for f
with respect to µn. If Π contains K rectangles, then its complexity is log(K).
    If with respect to every µn there exists a LAM-protocol for f of cost at most k(n), then we say that the
LAM-complexity of f , denoted by LAM( f ), is at most k(n).
    We denote by LAM the class of functions whose LAM-complexity is at most poly-log(n).
    It follows readily from the previous discussion that

                                 NP, MA₀, MA, UAMcompl ⊆ LAM ⊆ AM .

    To make it somewhat stronger and to simplify its definition, we have granted LAM a few additional
relaxations (not needed in order to include MA and UAMcompl ): most significantly, in LAM the layer
complexity bound t can be chosen per distribution µn, and it does not have to be error-independent, unlike
in the cases of both MA and UAMcompl (for the latter it equals 1).
    Let us further strengthen the model, so that the corresponding complexity class would include all
previously known subclasses of AM with strong lower bounds. The following definition can be viewed as
LAM with relaxed accuracy requirements.
Definition 3.4 (Small-advantage Layered Arthur-Merlin, SLAM). For every n ∈ N, let f : {0,1}^n × {0,1}^n → {0,1,⊥}.
    If for input distribution µn and some α > 0 there exists an NP-protocol Π of 1-layer complexity t
such that

        Pr_{(X,Y)∼µn}[ Π accepts (X,Y) | f(X,Y) = 1 ] ≥ α    and
        Pr_{(X,Y)∼µn}[ Π accepts (X,Y) | f(X,Y) = 0 ] ≤ α/(2t) ,

then we call Π a SLAM-protocol for f with respect to µn. If Π contains K rectangles, then its complexity
is log(K/α) (the value of α may depend on both n and µn).
     If with respect to every µn there exists a SLAM-protocol for f of cost at most k(n), then we say that
the SLAM-complexity of f , denoted by SLAM( f ), is at most k(n).
     We denote by SLAM the class of functions whose SLAM-complexity is at most poly-log(n).
    As any LAM-protocol of cost k is also a SLAM-protocol of cost k + log(3/2),

                                        SLAM( f ) < LAM( f ) + 1



holds for all f .
    Later (Section 3.2) we will see that SLAM indeed is a proper subclass of AM that includes all
previously known (as far as we are aware) subclasses of AM with strong lower bounds:

                      NP, MA₀, MA, UAMcompl , LAM, UAM, SBP ⊆ SLAM ⊂ AM ;

moreover, it is strictly stronger than their union:

                                           UAM ∪ SBP ⊂ SLAM .

3.1     Limitations of LAM and SLAM
Let us see that the SLAM-complexity is subject to the discrepancy bound.
Definition 3.5 (Discrepancy). For every n ∈ N, let f : {0,1}^n × {0,1}^n → {0,1,⊥} and µn be a
distribution on {0,1}^n × {0,1}^n.
    The discrepancy of f with respect to µn is defined as

        disc_µn( f ) = max_r | µn(r ∩ f⁻¹(1)) − µn(r ∩ f⁻¹(0)) | ,

where r ranges over the combinatorial rectangles over {0,1}^n × {0,1}^n.
    We denote disc( f ) = min_µ { disc_µ( f ) }.
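For a small function, disc_µ can be computed by brute force over all combinatorial rectangles. The following sketch (an illustration, not part of the paper) does this for the uniform distribution and the n-bit XOR function, for which the uniform discrepancy equals 1/4 for every n (by the character-sum argument, the best rectangle is even-parity rows times even-parity columns):

```python
from fractions import Fraction
from itertools import chain, combinations

def powerset(xs):
    """All subsets of xs (as tuples), including the empty set."""
    xs = list(xs)
    return chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))

def disc_uniform(f, n):
    """Brute-force disc_mu(f) for the uniform mu over {0,1}^n x {0,1}^n:
    maximise |mu(r ∩ f^-1(1)) - mu(r ∩ f^-1(0))| over rectangles r = S x T."""
    inputs = range(2 ** n)
    cell = Fraction(1, 4 ** n)            # uniform weight of one input pair
    best = Fraction(0)
    for S in powerset(inputs):
        for T in powerset(inputs):
            signed = sum(1 if f(x, y) == 1 else -1 for x in S for y in T)
            best = max(best, abs(signed) * cell)
    return best

# XOR_n(x, y) = parity of x XOR y; its uniform discrepancy is 1/4 for every n.
xor = lambda x, y: bin(x ^ y).count("1") % 2
print(disc_uniform(xor, 1))  # -> 1/4
print(disc_uniform(xor, 2))  # -> 1/4
```

Note that disc( f ) itself minimises over all distributions µ, so such a computation only yields an upper bound on disc( f ).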
Theorem 3.6. For any f : {0,1}^n × {0,1}^n → {0,1,⊥}:

        SLAM( f ) ∈ Ω( √( log(1/disc( f )) ) ) .

      That is,

                                               SLAM ⊆ PP ,

where PP is the class consisting of functions with high discrepancy. Along with other mentioned
properties, this implies

                        SBP ∪ UAM ⊂ SLAM ⊆ AM ∩ PP and SLAM ⊂ AM ,

as AM ⊄ PP is known [14].
Corollary 3.7. Any LAM- or SLAM-protocol for IP has cost Ω(√n).
      To prove the theorem we will use the following combinatorial lemma.
                                                      def Sm
Lemma 3.8. Let C1 , . . . ,Cm be finite sets and W =       i=1 Ci . Let t ∈ N, β > 1 and

                                 def         
                              W0 = w ∈ W 1 ≤ | i ∈ [m] w ∈ Ci | ≤ t ,
                                 def     
                              W1 = w ∈ W | i ∈ [m] w ∈ Ci | ≥ β · t .



Let µ be a distribution on W , such that

        µ(W₁)/µ(W₀) ≥ λ

for some λ > 0.⁸
    Then for any γ > λ there exists J ⊆ [m] of size at most k = ⌈ log_{(β+1)/2} (γ/λ) ⌉, such that for
C_J = ⋂_{j∈J} C_j it holds that

        µ(C_J ∩ W₁)/µ(C_J ∩ W₀) ≥ γ ,⁸                                                      (3.1)

and

        µ(C_J ∩ W₁) ≥ µ(W₁) · ( min{1, β−1} / (2m) )^k .
    Informally, the lemma says that for any family of sets C₁, . . . , C_m there exists C_J = ⋂_{j∈J} C_j that
“highlights” points that are contained in more than a certain threshold number of sets C_i; moreover, the
fraction of such points that end up in C_J ∩ W₁ is not too small.
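The averaging step that drives the proof of the lemma (finding one good set C_{i₀}) can be checked numerically. The sketch below (an illustration, not part of the paper; the instance is chosen for convenience) takes C_i = { w ∈ {1,…,2^m−1} : bit i of w is set } with the uniform distribution, so the membership count of w is its Hamming weight, and verifies the extracted inequality together with its two consequences, (3.2) and (3.3):

```python
from fractions import Fraction

# Illustrative instance: m = 6 sets over W = {1,...,63},
# C_i = { w : bit i of w is set }, so |{ i : w in C_i }| = popcount(w).
m, t = 6, 2
beta = Fraction(2)
C = [{w for w in range(1, 2 ** m) if (w >> i) & 1} for i in range(m)]

counts = {w: bin(w).count("1") for w in range(1, 2 ** m)}
W0 = {w for w, c in counts.items() if 1 <= c <= t}        # light points
W1 = {w for w, c in counts.items() if c >= beta * t}      # heavy points

mu = lambda s: Fraction(len(s), 2 ** m - 1)               # uniform on W

# Averaging step: some C_i0 must satisfy
#   mu(C_i0 ∩ W1)/mu(W1) - (beta+1)/2 * mu(C_i0 ∩ W0)/mu(W0) >= (beta-1)/(2m).
gaps = [mu(Ci & W1) / mu(W1) - (beta + 1) / 2 * mu(Ci & W0) / mu(W0)
        for Ci in C]
assert max(gaps) >= (beta - 1) / (2 * m)

# Consequences (3.2) and (3.3) for the maximising index:
i0 = gaps.index(max(gaps))
assert mu(C[i0] & W1) >= (beta - 1) / (2 * m) * mu(W1)
assert mu(C[i0] & W1) / mu(C[i0] & W0) >= (beta + 1) / 2 * (mu(W1) / mu(W0))
print("averaging step verified")
```

Here the inequality is a theorem, so it holds for every choice of family; the instance above merely makes the quantities easy to inspect by hand (e.g., |W₀| = 21 and |W₁| = 22).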

Proof. We will find a set C_{i₀} such that µ(C_{i₀} ∩ W₁) is not too small and, at the same time, the ratio
µ(C_{i₀} ∩ W₁)/µ(C_{i₀} ∩ W₀) is significantly larger than µ(W₁)/µ(W₀). The result will follow by induction
on t.
    The first part of the argument is the same for the base case (t = 1) and the inductive step (t ≥ 2). Let
C_i(·) denote the characteristic function of C_i, then

        E_{X∼µ}[ Σ_{i=1}^m C_i(X) | X ∈ W₁ ] ≥ β · E_{X∼µ}[ Σ_{i=1}^m C_i(X) | X ∈ W₀ ] ;
        Σ_{i=1}^m E_{X∼µ}[ C_i(X) | X ∈ W₁ ] ≥ β · Σ_{i=1}^m E_{X∼µ}[ C_i(X) | X ∈ W₀ ] ;
        Σ_{i=1}^m ( E_{X∼µ}[ C_i(X) | X ∈ W₁ ] − β · E_{X∼µ}[ C_i(X) | X ∈ W₀ ] ) ≥ 0 ;
        0 ≤ Σ_{i=1}^m ( (β+1)/(2β) · E_{X∼µ}[ C_i(X) | X ∈ W₁ ] − (β+1)/2 · E_{X∼µ}[ C_i(X) | X ∈ W₀ ] )
          = Σ_{i=1}^m ( E_{X∼µ}[ C_i(X) | X ∈ W₁ ] − (β−1)/(2β) · E_{X∼µ}[ C_i(X) | X ∈ W₁ ]
                        − (β+1)/2 · E_{X∼µ}[ C_i(X) | X ∈ W₀ ] ) ;
        Σ_{i=1}^m ( E_{X∼µ}[ C_i(X) | X ∈ W₁ ] − (β+1)/2 · E_{X∼µ}[ C_i(X) | X ∈ W₀ ] )
          ≥ (β−1)/(2β) · E_{X∼µ}[ Σ_{i=1}^m C_i(X) | X ∈ W₁ ]
          ≥ (β−1)/(2β) · β · t ≥ (β−1)/2 ;
        ∃ i₀ ∈ [m] : E_{X∼µ}[ C_{i₀}(X) | X ∈ W₁ ] − (β+1)/2 · E_{X∼µ}[ C_{i₀}(X) | X ∈ W₀ ] ≥ (β−1)/(2m) .

That is,

        µ(C_{i₀} ∩ W₁)/µ(W₁) − (β+1)/2 · µ(C_{i₀} ∩ W₀)/µ(W₀) ≥ (β−1)/(2m) ,

which implies

        µ(C_{i₀} ∩ W₁) ≥ (β−1)/(2m) · µ(W₁)                                                 (3.2)

and

        µ(C_{i₀} ∩ W₁)/µ(C_{i₀} ∩ W₀) ≥ (β+1)/2 · µ(W₁)/µ(W₀) ≥ (β+1)/2 · λ .              (3.3)

    8 Here let x/0 > y hold for any x, y > 0.
    At this point we check whether letting J = {i₀} would satisfy the statement of the lemma. Assume
that it would not; as γ > λ ⇒ k ≥ 1, this necessarily means that (3.3) is insufficient to guarantee (3.1). In
other words, it holds that

        (β+1)/2 · λ ≤ µ(C_{i₀} ∩ W₁)/µ(C_{i₀} ∩ W₀) < γ ,

where the latter inequality is the contrapositive of (3.1) with respect to J = {i₀}, and therefore

        (β+1)/2 < γ/λ .                                                                      (3.4)
    Denote

        C′_j = C_j ∩ C_{i₀}

for all j ≠ i₀ and

        C″_{i₀} = C_{i₀} \ ⋃_{j≠i₀} C_j .

Note that C″_{i₀} ⊆ W₀.
    How we continue from here depends on the value of t: first suppose that t = 1 (the base case for
the induction). Let i₁ ∈ [m] \ {i₀} be such that µ(C′_{i₁} ∩ W₁) is maximised. Let J = {i₀, i₁}. From (3.4) it
follows that

        k = ⌈ log_{(β+1)/2} (γ/λ) ⌉ ≥ 2 = |J| .


From (3.2) and the choice of i₁,

        µ(C_J ∩ W₁) ≥ 1/(m−1) · (β−1)/(2m) · µ(W₁) > µ(W₁) · ( min{1, β−1} / (2m) )² .

As t = 1,

        µ(C_J ∩ W₀) = 0 ,

which satisfies (3.1) (according to Footnote 8). This finishes the proof of the base case.
    Now suppose t ≥ 2. That is, we are inside the inductive step, so let us apply Lemma 3.8 inductively
to the family { C′_j | j ≠ i₀ } with parameters

        m′ = m − 1 ,   t′ = t − 1 ,   β′ = (β·t − 1)/(t − 1) ,   γ′ = γ .

Note that this choice corresponds to

        W′ = (W ∩ C_{i₀}) \ C″_{i₀} ,   W′₀ = (W₀ ∩ C_{i₀}) \ C″_{i₀} ,   W′₁ = W₁ ∩ C_{i₀} ,
        λ′ = (β+1)/2 · λ ,

where the last equality follows from (3.3). Note that λ′ < γ′ follows from (3.4).
    The lemma guarantees existence of (non-empty) J′ ⊆ [m] \ {i₀} of size at most

        k′ = ⌈ log_{(β′+1)/2} (γ/λ′) ⌉ ≤ ⌈ log_{(β+1)/2} (γ/λ′) ⌉ = ⌈ log_{(β+1)/2} (γ/λ) − 1 ⌉ = k − 1

(where the inequality follows from β′ > β), such that for C_{J′} = ⋂_{j∈J′} C′_j it holds that

        µ(C_{J′} ∩ C_{i₀} ∩ W₁)/µ(C_{J′} ∩ C_{i₀} ∩ W₀) ≥ µ(C_{J′} ∩ W′₁)/µ(C_{J′} ∩ W′₀) ≥ γ ;
        µ(C_{J′} ∩ C_{i₀} ∩ W₁) = µ(C_{J′} ∩ W′₁) ≥ µ(W′₁) · ( min{1, β′−1} / (2m) )^{k′}
            ≥ µ(C_{i₀} ∩ W₁) · ( min{1, β−1} / (2m) )^{k−1} ≥ µ(W₁) · ( min{1, β−1} / (2m) )^k ,

where the last inequality follows from (3.2). Letting J = J′ ∪ {i₀} finishes the proof.

    We are ready to prove the lower bound.

Proof. The argument is as follows. Recall that the core advantage of the models LAM and SLAM over
MA is that they allow arbitrarily high 0-layer complexity in efficient protocols: if the 0-layer complexity of a
given protocol Π is not above its 1-layer complexity, then Klauck's argument for MA limits Π's strength.


    The case when the 0-layer complexity is high was the main challenge of this work and the reason
why it was written. Here we further assume that the average 0-layer complexity of Π is noticeably higher
than its average 1-layer complexity (we notice that the other cases can be handled by a straightforward
adaptation of Klauck’s argument). The layer complexity measures the density of Π’s rectangles, and the
assumed difference in the average densities between f −1 (0) and f −1 (1) implies that the membership of
a pair of random variables (X,Y ) in “too many” rectangles makes the event [ f (X,Y ) = 0] more likely.
Lemma 3.8 gives us a “not too small” rectangle intersection—therefore a rectangle by itself—where
many elements belong to many (original) rectangles. The discrepancy assumption applied to this new
rectangle concludes the argument.
    Let µ be a distribution that achieves disc( f ) = disc_µ( f ) and

        Π = { r_i | i ∈ [2^{k(n)}] }

be a SLAM-protocol for f with respect to µ of cost k(n) + log(1/α(n)) that accepts an α(n)-fraction of the
elements of f⁻¹(1), and whose 1-layer complexity is t(n).
    By the definition of SLAM, the soundness error of Π in solving f with respect to µ is at most

        α(n) / (2·t(n)) .                                                                    (3.5)

    By the definition of disc_µ (and the fact that {0,1}^n × {0,1}^n is a rectangle),

        Pr_{(X,Y)∼µ}[ f(X,Y) = 0 ] , Pr_{(X,Y)∼µ}[ f(X,Y) = 1 ] ∈ 1/2 ± disc_µ( f )/2 = 1/2 ± disc( f )/2 .   (3.6)

    Let l₀ᵃᵛ(n) denote the average 0-layer complexity of Π, namely

        l₀ᵃᵛ(n) = E_{(X,Y)∼µ}[ |{ r ∈ Π : (X,Y) ∈ r }| | (X,Y) ∈ Π⁻¹(1) ∩ f⁻¹(0) ] ,

where “Π⁻¹(1)” denotes the set of input pairs accepted by Π.
    Fix n ∈ N. We will consider two cases, distinguished by the value of l₀ᵃᵛ(n).
    First suppose that

        l₀ᵃᵛ(n) ≤ 4·t(n)/3 .                                                                (3.7)
Then

        Σ_{r∈Π} µ(r ∩ f⁻¹(1)) ≥ Pr_{(X,Y)∼µ}[ (X,Y) ∈ Π⁻¹(1) ∩ f⁻¹(1) ]
                              = Pr_µ[ f(X,Y) = 1 ] · Pr_µ[ (X,Y) ∈ Π⁻¹(1) | f(X,Y) = 1 ]
                              ≥ ( 1/2 − disc( f )/2 ) · α(n) ,



where the last inequality follows from (3.6), and

        Σ_{r∈Π} µ(r ∩ f⁻¹(0)) = Σ_{r∈Π} Σ_{(x,y)∈r∩f⁻¹(0)} µ(x,y)
                              = Σ_{(x,y)∈f⁻¹(0)} µ(x,y) · |{ r ∈ Π : (x,y) ∈ r }|
                              = Pr_{(X,Y)∼µ}[ f(X,Y) = 0 ] · E_{(X,Y)∼µ}[ |{ r ∈ Π : (X,Y) ∈ r }| | f(X,Y) = 0 ]
                              = Pr_µ[ (X,Y) ∈ Π⁻¹(1) ∩ f⁻¹(0) ]
                                  · E_µ[ |{ r ∈ Π : (X,Y) ∈ r }| | (X,Y) ∈ Π⁻¹(1) ∩ f⁻¹(0) ]
                              = Pr_µ[ (X,Y) ∈ Π⁻¹(1) ∩ f⁻¹(0) ] · l₀ᵃᵛ(n)
                              ≤ Pr_µ[ f(X,Y) = 0 ] · α(n)/(2·t(n)) · 4·t(n)/3
                              ≤ ( 1/2 + disc( f )/2 ) · 2·α(n)/3 ,

where the first inequality follows from (3.5) and (3.7), and the last one from (3.6). Therefore,
                                                                                                 
                                       −1                       −1                1 5 · disc( f )
                        ∑ µ(r ∩ f           (1)) − µ(r ∩ f           (0)) ≥
                                                                                  6
                                                                                    −
                                                                                          6
                                                                                                    · α(n)
                       r∈Π

and for some r0 ∈ Π it holds that
                                                                                                  
                                       −1                       −1                 1 5 · disc( f )
               disc( f ) ≥ µ(r0 ∩ f         (1)) − µ(r0 ∩ f          (0)) ≥          −               · α(n) · 2−k(n)
                                                                                   6       6

and
                                                                             
                                            1                            1
                                k(n) + log                      ∈ log             − O (1) ,
                                           α(n)                       disc( f )

as required.
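The arithmetic combining the two bounds of this case can be confirmed with exact rational arithmetic; the following check (illustrative only, not part of the paper) verifies that (1/2 − d/2)·α − (1/2 + d/2)·(2α/3) equals (1/6 − 5d/6)·α identically:

```python
from fractions import Fraction

def gap(d, a):
    """Accepted 1-weight minus accepted 0-weight, per the two bounds above
    (d plays the role of disc(f), a the role of alpha(n))."""
    return (Fraction(1, 2) - d / 2) * a - (Fraction(1, 2) + d / 2) * (2 * a / 3)

# Exact identity: the gap equals (1/6 - 5d/6) * alpha for any d and alpha.
for d in (Fraction(0), Fraction(1, 10), Fraction(1, 2)):
    for a in (Fraction(1), Fraction(1, 3), Fraction(2, 7)):
        assert gap(d, a) == (Fraction(1, 6) - 5 * d / 6) * a
print("identity verified")
```

In particular, the gap is positive (so a high-discrepancy rectangle exists) exactly when d < 1/5.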
    Now suppose that

        l₀ᵃᵛ(n) > 4·t(n)/3 .

    Define

        A = { (x,y) : |{ r ∈ Π : (x,y) ∈ r }| ≥ 5·t(n)/4 } .



Let us see that µ(A) cannot be too small.

        4·t(n)/3 < l₀ᵃᵛ(n) = E_µ[ |{ r ∈ Π : (X,Y) ∈ r }| | (X,Y) ∈ Π⁻¹(1) ∩ f⁻¹(0) ]
                           ≤ Pr_µ[ (X,Y) ∈ A | (X,Y) ∈ Π⁻¹(1) ∩ f⁻¹(0) ] · 2^{k(n)}
                             + ( 1 − Pr_µ[ (X,Y) ∈ A | (X,Y) ∈ Π⁻¹(1) ∩ f⁻¹(0) ] ) · 5·t(n)/4
                           ≤ 5·t(n)/4 + 2^{k(n)} · Pr_µ[ (X,Y) ∈ A | (X,Y) ∈ Π⁻¹(1) ∩ f⁻¹(0) ] .

Therefore,

        Pr_µ[ (X,Y) ∈ A | (X,Y) ∈ Π⁻¹(1) ∩ f⁻¹(0) ] > t(n)·2^{−k(n)} / 12

and

        Pr_µ[ (X,Y) ∈ A ] > µ( Π⁻¹(1) ∩ f⁻¹(0) ) / ( 12·2^{k(n)} ) .
On the other hand,

        µ( Π⁻¹(1) ∩ f⁻¹(0) ) ≥ max_{r∈Π} µ( r ∩ f⁻¹(0) ) ≥ α(n)/2^{k(n)} · ( 1/2 − disc( f )/2 )² ,

where the last inequality follows from the fact that the µ-weight of the largest rectangle of Π is, due
to (3.6), at least

        α(n)/2^{k(n)} · ( 1/2 − disc( f )/2 ) ,

and the relative µ-weight of f⁻¹(0) in it is at least

        1/2 − disc( f )/2 .

Assuming disc( f ) ≤ 1/2 (otherwise the desired statement holds trivially), we get

        µ(A) ≥ α(n) / ( 192·2^{2·k(n)} ) .                                                  (3.8)
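The constant 192 in (3.8) comes from combining the factor 12 above with (1/2 − disc( f )/2)² ≥ 1/16, valid whenever disc( f ) ≤ 1/2; a quick exact check of this arithmetic (illustrative only, not part of the paper):

```python
from fractions import Fraction

half = Fraction(1, 2)
# For d = disc(f) <= 1/2, the quantity (1/2 - d/2)^2 / 12 is minimised at
# d = 1/2, where it equals exactly 1/192 -- the constant appearing in (3.8).
for k in range(9):
    d = Fraction(k, 16)                    # sample values of d in [0, 1/2]
    assert (half - d / 2) ** 2 / 12 >= Fraction(1, 192)
assert (half - Fraction(1, 2) / 2) ** 2 / 12 == Fraction(1, 192)
print("constant 192 verified")
```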
    Let

        B = { (x,y) : 1 ≤ |{ r ∈ Π : (x,y) ∈ r }| ≤ t(n) } ,

then

        µ(B) ≤ µ( Π⁻¹(1) ) < α(n) ,                                                          (3.9)

as Π accepts, with respect to µ, an α(n)-fraction of f⁻¹(1) and a smaller fraction of f⁻¹(0).
    Note that

        A ⊂ f⁻¹(0)   and   f⁻¹(1) ∩ Π⁻¹(1) ⊆ B ,                                            (3.10)

as follows from their definitions and the fact that the 1-layer complexity of Π is t(n).
    Let us make use of the difference in the “rectangle density” of A and B via applying Lemma 3.8.
              def          def      def       def        def
Namely, let m = 2k(n) , Ci = ri , t = t(n), β = 45 and γ = 2. Then the conditions of the lemma hold with
respect to W0 = B, W1 = A and λ = 192·212·k(n) . Then there exists some J ⊆ [2k(n) ], such that for

                                                           def \
                                                       s=          rj
                                                             j∈J


it holds that

                                     µ(s ∩ A) / µ(s ∩ B) ≥ 2

and

                                     µ(s ∩ A) ∈ α(n) · 2^{−O(k²(n))} ,

as follows from (3.8), (3.9) and the statement of the lemma.
    That is,

                    µ(s ∩ A) − µ(s ∩ B) ≥ µ(s ∩ A)/2 ∈ α(n) · 2^{−O(k²(n))} .
                                                              2

As s ⊆ Π⁻¹(1), it follows from (3.10) that

                    µ(s ∩ f⁻¹(0)) − µ(s ∩ f⁻¹(1)) ∈ α(n) · 2^{−O(k²(n))} ,

and since s is an intersection of rectangles and therefore a rectangle itself,

                                disc(f) ∈ α(n) · 2^{−O(k²(n))} ,

that is,

        O(k²(n) + log(1/α(n))) ≥ log(1/disc(f))  ⇒  k(n) + log(1/α(n)) ∈ Ω(√(log(1/disc(f)))) ,

where the implication holds because the larger of the two terms on the left must be at least a constant
fraction of the right-hand side, as required.


3.2     More about LAM and SLAM
When we defined the communication complexity class SLAM (Definition 3.4), we promised to show later
that

                             NP, MA, UAMcompl, UAM, SBP, LAM ⊆ SLAM ⊂ AM

and

                                                 UAM ∪ SBP ⊂ SLAM .

      The relations

                              NP, MA,UAMcompl ⊆ LAM ⊆ AM and LAM ⊆ SLAM

follow trivially from the definitions. Theorem 3.6 implies that

                                        SLAM ⊆ PP =⇒ SLAM ⊂ AM ,

as PP is the class consisting of functions with high discrepancy and AM ⊈ PP is known [14].
    It remains to see that

                                             UAM ∪ SBP ⊂ SLAM ⊆ AM ,

which will be implied by the upcoming Claims 3.9 and 3.13.
Claim 3.9. For any bipartite Boolean function f :

                                          AM( f ) ∈ O (SLAM( f ) + log n) .

Proof. The proof combines the “randomness sparsification” method of Goldwasser and Sipser [8] with
NP-witnessing.
   Assume SLAM( f ) = k(n). Then by von Neumann’s minimax principle [16] there exists a family
Π = {h1 , . . . , hm } for some m ∈ N, where every hi is a bipartite Boolean function computable by an
NP-protocol of cost at most k(n), such that

        ∀(x, y) :   |{ i ∈ [m] | hi (x, y) = 1 }|   ≥ α · m       if f (x, y) = 1 ,
                                                    ≤ (α/2) · m   if f (x, y) = 0 ,

for some α ≥ 2^{−k(n)}.9
    By a standard Bernstein-type concentration argument (e. g., [7], Lemma 1, Hoeffding inequality),
there exists l ∈ O(n/α) ⊆ O(2^{k(n)+log n}) such that for some Π′ ⊆ Π of size l it holds that

        ∀(x, y) :   |{ i ∈ [l] | hi (x, y) = 1 }|   ≥ (2α/3) · l   if f (x, y) = 1 ,
                                                    ≤ (α/3) · l    if f (x, y) = 0 ,
   9 Note that the actual value of t is insignificant for this argument, which is not surprising: if we modify the definition of
SLAM (Definition 3.4) by allowing arbitrary 1-layer complexity, then we end up with a definition of AM.



where we have assumed without loss of generality that Π′ = {h1 , . . . , hl }.
    By another application of the Hoeffding inequality, for some s ∈ O(n + 1/α) ⊆ O(2^{k(n)+log n}) and a
uniformly random function g : [l] → [s] it holds with positive probability that

        ∀(x, y) :   Pr_{Z∼[s]} [∃i ∈ [l] : hi (x, y) = 1 ∧ g(i) = Z]   ≥ 3/5   if f (x, y) = 1 ,        (3.11)
                                                                       ≤ 2/5   if f (x, y) = 0 .

Fix any such g.
    Consider the following AM-protocol (described below in a distribution-free regime, which is the dual
equivalent of Definition 2.6).

    • The players pick Z ∼ [s] uniformly and send it to Merlin.

    • Merlin responds with i ∈ [l] and w ∈ {0, 1}^{k(n)}.

    • The players accept if and only if hi (X, Y ) = 1 ∧ g(i) = Z, where the former is witnessed by w (recall
      that NP(hi ) ≤ k(n)).
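To see what the final test buys us, note that for any fixed g the acceptance probability over Z is exactly |g(S)|/s, where S ⊆ [l] is the set of indices i with hi (X, Y ) = 1: many accepting indices make the image g(S) large, few make it small. A minimal Python sketch of this counting observation (the function name and toy parameters are ours, not from the proof):

```python
import random

def acceptance_probability(accepting, g, s):
    """Pr over uniform Z in [s] that some accepting index i has g(i) = Z.
    For a fixed g this equals |g(S)| / s, where S is the accepting set."""
    return len({g[i] for i in accepting}) / s

random.seed(0)                              # a uniformly random g : [l] -> [s]
l, s = 60, 100
g = [random.randrange(s) for _ in range(l)]

many = list(range(40))                      # a "1-input": many accepting indices
few = list(range(6))                        # a "0-input": few accepting indices
p1 = acceptance_probability(many, g, s)
p0 = acceptance_probability(few, g, s)
assert p0 <= len(few) / s                   # never exceeds |S|/s
assert p1 >= p0                             # monotone in the accepting set
```

With the parameters of the actual proof, (3.11) guarantees that the two probabilities are separated by a constant.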

    By (3.11), this is an AM-protocol for f with error at most 2/5; repeating it several times and
taking the majority vote brings the error bound to at most 1/3. The cost of the resulting protocol is in
O (k(n) + log n), as required.
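The effect of the majority vote can be checked by direct calculation; the following Python sketch (helper name ours) computes the error of a majority over t independent runs:

```python
from math import comb

def majority_error(t, p):
    """Probability that the majority of t independent runs errs,
    when each single run errs with probability p (t odd)."""
    return sum(comb(t, k) * p**k * (1 - p)**(t - k)
               for k in range((t + 1) // 2, t + 1))

# with per-run error 2/5, five repetitions already bring the error
# below 1/3, at only a constant-factor cost in communication
assert majority_error(5, 2/5) < 1/3
```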

    To see that UAM ∪ SBP ⊂ SLAM, we prove a somewhat stronger separation:

                                             LAM ⊈ UAM ∪ SBP .

For that we will use several results from [10, 9].
Fact 3.10 (NP ⊈ UAM [10]). Let ¬Disj(x, y) def= 1 − Disj(x, y) for every (x, y) ∈ {0, 1}^n × {0, 1}^n. Then

                                            UAM(¬Disj) ∈ Ω (n) .

    The following partial function has been used to show that UAMcompl ⊈ SBP.

Definition 3.11 (Gut-IPn [10, 9]). For an even m ∈ N, let n = m² · ⌈200 · log m⌉. For any x ∈ {0, 1}^n
and i, j ∈ [m], let x_{i,j} denote the sub-string of x that starts from bit ⌈200 · log m⌉ · (m · (i − 1) + j − 1) + 1
and contains ⌈200 · log m⌉ bits. Denote

                      ∀i ∈ [m] :   ♯i def= |{ j | IP(x_{i,j} , y_{i,j} ) = 1 }| ,

then

                                      1   if ∀i : ♯i = 1 ;
                     Gut-IPn (x, y) = 0   if |{ i | ♯i = 0 }| = |{ i | ♯i = 2 }| = m/2 ;
                                      ⊥   otherwise.
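For concreteness, the definition above can be evaluated by a short routine. The following Python sketch is our own illustration (IP is taken as the inner product mod 2, and None stands for ⊥):

```python
import math

def gut_ip(x, y, m):
    """Illustrative evaluation of Gut-IP_n on bit lists x, y of length
    n = m^2 * ceil(200 * log2(m)); returns 1, 0, or None (for ⊥)."""
    b = math.ceil(200 * math.log2(m))        # block length ceil(200*log m)
    assert len(x) == len(y) == m * m * b

    def ip(u, v):                            # inner product mod 2
        return sum(ui & vi for ui, vi in zip(u, v)) % 2

    # sharp[i] = number of j with IP(x_{i,j}, y_{i,j}) = 1
    sharp = []
    for i in range(m):
        cnt = 0
        for j in range(m):
            start = b * (m * i + j)
            cnt += ip(x[start:start + b], y[start:start + b])
        sharp.append(cnt)

    if all(s == 1 for s in sharp):
        return 1
    if sharp.count(0) == sharp.count(2) == m // 2:
        return 0
    return None                              # promise violated: output is ⊥
```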



Fact 3.12 (UAMcompl ⊈ SBP [10, 9]).

                           UAMcompl (Gut-IPn ) ∈ O (log n) ;
                           SBP(Gut-IPn ) ∈ Ω(√m · log m) = Ω(n^{1/4} · log^{3/4} n) .
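The two asymptotic forms agree because n = m² · ⌈200 · log m⌉ gives n^{1/4} ≈ √m · (log m)^{1/4} and log n ≈ 2 · log m. A quick numeric sanity check of this conversion (an illustration of ours, not part of the statement):

```python
import math

def n_of(m):
    # n = m^2 * ceil(200 * log2 m), as in Definition 3.11
    return m * m * math.ceil(200 * math.log2(m))

# the ratio of sqrt(m)*log(m) to n^{1/4} * (log n)^{3/4}
# stays bounded between constants as m grows
for m in (2**10, 2**14, 2**18):
    n = n_of(m)
    ratio = (math.sqrt(m) * math.log2(m)) / (n**0.25 * math.log2(n)**0.75)
    assert 0.05 < ratio < 20
```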

Claim 3.13. For any n ∈ N such that Gut-IPn is defined and x1 , x2 , y1 , y2 ∈ {0, 1}^n, let

                          f ((x1 , x2 ), (y1 , y2 )) def= ¬Disj(x1 , y1 ) ∧ Gut-IPn (x2 , y2 ) .

Then

                                      LAM( f ), SLAM( f ) ∈ O(log² n) ;
                                      UAM( f ) ∈ Ω (n) ;
                                      SBP( f ) ∈ Ω(n^{1/4} · log^{3/4} n) .

Proof. Consider an input distribution µ that fixes X1 = Y1 = 1^n and makes the pair (X2 , Y2 ) come
from a hard distribution for Gut-IPn (X2 , Y2 ); then any SBP-protocol that solves f with respect to µ
must have complexity Ω(n^{1/4} · log^{3/4} n), according to Fact 3.12. Similarly, a distribution that fixes
(X2 , Y2 ) ∈ Gut-IPn⁻¹(1) arbitrarily and makes Disj(X1 , Y1 ) hard for UAM witnesses that UAM( f ) ∈ Ω (n),
according to Fact 3.10.
     To see that LAM( f ) ∈ O(log² n), let µ be any input distribution for f and let µ′ be the marginal
distribution of (X2 , Y2 ) when ((X1 , X2 ), (Y1 , Y2 )) ∼ µ. Consider a UAMcompl -protocol Π of complexity
O (log n) that solves Gut-IPn with perfect completeness and soundness error at most 1/2 with respect to µ′,
and let Π′ be its amplified version of complexity O(log² n) that solves Gut-IPn with soundness error at
most 1/(3n) with respect to µ′.
     Let Π″ ((X1 , X2 ), (Y1 , Y2 )) be a nondeterministic protocol for f that does the following:

    • emulates the behaviour of Π′ (X2 , Y2 ) ;

    • runs the trivial NP-protocol for ¬Disj(X1 , Y1 ) ;

    • accepts if and only if the two steps above have accepted.

The complexity of Π″ is in O(log² n).
     Since an NP-protocol for ¬Disj is exact (though nondeterministic), an error can come only from the
first step; since Π′ has perfect completeness, so does Π″. The soundness error of Π″ in solving f with
respect to µ equals that of Π′ in solving Gut-IPn with respect to µ′, which is at most 1/(3n). Since Π′ has
1-layer complexity 1, the 1-layer complexity of Π″ equals that of the NP-protocol for ¬Disj, which is n.
So, Π″ is a valid LAM-protocol for f with respect to µ, as required.

4    On proving super-√n lower bounds against MA

When Klauck [13] showed that MA(Disj), MA(IP) ∈ Ω(√n), many believed that the actual MA-complexity
of these problems was in Ω (n). So, it came as a surprise when Aaronson and Wigderson [1] demonstrated
MA-protocols for Disj and IP of cost O(√n · log n), later improved by Chen [6] to O(√(n · log n · log log n)).
That highlighted the importance of proving the “ultimate” lower bound of Ω (n) for the MA-complexity
of any explicit communication problem.
    We will define a communication model M̃A (Definition 4.1) that can be viewed as “non-uniform MA.”
Non-uniformity is the only possible source of advantage of M̃A over MA: we will see (Claim 4.2) that
imposing the “uniformity constraint” on M̃A-protocols makes them not stronger than MA-protocols. All
known lower bounds on MA( f ) readily translate to M̃A( f ). Intuitively, a lower bound argument that
exploits the uniformity of MA (as opposed to M̃A) must have a very unusual structure.
    We will see (Theorem 4.3) that for any f it holds that M̃A( f ) ∈ O(√(n · AM( f ))); in other words, any
lower bound of the form M̃A( f ) ∈ ω(√n) will have non-trivial consequences for AM( f ).10 Furthermore,
according to Claim 4.2, any lower bound of the form MA( f ) ∈ ω(√n) either should exploit the uniformity
of MA (the only difference between MA and M̃A), or it will have non-trivial consequences for AM( f ).
This partially explains why no such lower bound has been found yet.
                                           g For every n ∈ N, let f : {0, 1}n × {0, 1}n → {0, 1, ⊥}.
Definition 4.1 (Non-uniform Merlin-Arthur, MA).
    If for some k(n), every input distribution µn and every ε > 0 there      are functions h1 , . . . , h2k(n) :
{0, 1}n × {0, 1}n → {0, 1, ⊥}, whose P-complexity is in O k(n) · log ε1 , such that
                                                                       
                                                              
                                                                      k(n)
                                                                     2_
                                            Pr        f (X,Y ) 6=           hi (X,Y ) ≤ ε ,
                                         (X,Y )∼µn
                                                                     i=1

then we say that the MA-complexity
                     g                               g f ), is in O (k(n)) .
                                   of f , denoted by MA(
    The intuition behind the above formulation is the following.11 Any MA-protocol allows for error
reduction at the cost of (at most) a multiplicative factor of O(log(1/ε)), where ε is the “target error.” It has
been intuitively clear that this property of MA is important: in particular, it was used by Klauck [13]
to prove his lower bound on the MA-complexity. The concept of M̃A-complexity isolates this property,
effectively allowing for its direct analysis, which is the main subject of this part of our work.
    First of all, let us see that non-uniformity is the only possible source of advantage of M̃A over MA.

Claim 4.2. For every n ∈ N, let g1 , . . . , g_{2^{k(n)}} : {0, 1}^n × {0, 1}^n → {0, 1, ⊥} be such that for every input
distribution µn and every ε > 0 the conditions of Definition 4.1 hold, as well as the additional requirement
that

                   ∀i ∈ [2^{k(n)}] :   Pr_{(X,Y)∼µn} [hi (X, Y ) ≠ gi (X, Y )] ≤ ε .

Then the MA-complexity of f is in O (k(n)).
   10 It is not too hard to demonstrate AM( f ) ∈ Ω (log n) for an explicit f : for example, it holds for the so-called index function
Ind(x, i) def= xi ; however, it is not clear how to use such examples to obtain M̃A( f ) ∈ Ω(√(n · log n)).
  11 The author thanks the anonymous referee whose comment has resulted in the appearance of this paragraph.




    Note that the functions g1 , . . . , g_{2^{k(n)}} are fixed (for every n); in particular, they do not depend on µn
and ε. The statement says that in order to become sufficient for MA, the definition of M̃A should be restricted by
the additional requirement that all the hi are approximations of the corresponding gi . That is why we
view M̃A as a non-uniform modification of MA.

Proof. Assume M̃A( f ) ∈ O (k(n)). For every input distribution µ and ε > 0, let hi^{µ,ε} denote the function
hi corresponding to these µ and ε from the definition of M̃A( f ).
    Let ν be the uniform input distribution; then

                   ∀x, y :   ∨_{i=1}^{2^{k(n)}} hi^{ν,ε} (x, y)  −→_{ε→0}  ∨_{i=1}^{2^{k(n)}} gi (x, y)

and

                   ∀x, y :   ∨_{i=1}^{2^{k(n)}} hi^{ν,ε} (x, y)  −→_{ε→0}  f (x, y) ,

therefore

                                 f (x, y) ≡ ∨_{i=1}^{2^{k(n)}} gi (x, y) .                                    (4.1)

    By the definition of M̃A it must hold that P(hi^{µ,1/3}) ∈ O (k(n)) for every input distribution µ. On the
other hand,

                   ∀µ :   Pr_{(X,Y)∼µ} [hi^{µ,1/3} (X, Y ) ≠ gi (X, Y )] ≤ 1/3 ,

which means that BPP(gi ) ∈ O (k(n)). Together with (4.1) this implies MA( f ) ∈ O (k(n)).
    Next we claim that a super-√n lower bound on M̃A( f ) would have non-trivial consequences for
AM( f ).

Theorem 4.3. For any bipartite Boolean function f :

                                  M̃A( f ) ∈ O(√(n · AM( f ))) .

Proof. Let AM( f ) = k(n); that is, for every input distribution ν there is an NP-protocol of cost at most
k(n) that solves f with error at most 1/3 with respect to ν. Via the standard accuracy amplification technique
this implies that for any input distribution ν and ε > 0 there is an NP-protocol of cost O(k(n) · log(1/ε))
that solves f with error at most ε with respect to ν. In particular, for every input distribution ν there is an
NP-protocol Πν of cost O(√(n · k(n))) that solves f with error at most 2^{−√(n/k(n))} with respect to ν.
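The parameter choice is the usual balancing step: amplification to error ε costs a factor of O(log(1/ε)), and setting log(1/ε) = √(n/k(n)) makes the total cost k(n) · √(n/k(n)) = √(n · k(n)). A one-line arithmetic check (function name ours, for illustration only):

```python
import math

def amplified_cost(n, k):
    # cost k * log(1/eps) with eps = 2**(-sqrt(n/k)):
    # k * sqrt(n/k) = sqrt(n*k)
    return k * math.sqrt(n / k)

assert math.isclose(amplified_cost(10**6, 100), math.sqrt(10**6 * 100))
assert math.isclose(amplified_cost(10**4, 1), math.sqrt(10**4))
```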
    Let us see that

                                  M̃A( f ) ∈ O(√(n · k(n))) .                                    (4.2)



For n ∈ N, take any input distribution µn and any ε > 0.
    If ε ≤ 2^{−√(n/k(n))}, then let h1 = f : as its P-complexity is at most n ∈ O(√(n · k(n)) · log(1/ε)), the
“decomposition”

                                       f (X, Y ) = ∨_{i=1}^{1} hi (X, Y )

satisfies the requirements of Definition 4.1 with respect to (4.2).
    Now suppose that ε > 2^{−√(n/k(n))}. Then Πµ is an NP-protocol of cost O(√(n · k(n))) that solves f
with error less than ε with respect to µ. Let K ∈ 2^{O(√(n·k(n)))} be the number of rectangles contained in
Πµ , and denote their characteristic functions by h1 , . . . , hK . As the P-complexity of every such hi is 1 and

        Pr_{(X,Y)∼µn} [ f (X, Y ) ≠ ∨_{i=1}^{K} hi (X, Y ) ] = Pr_{µn} [ f (X, Y ) ≠ Πµ (X, Y )] < ε ,

the requirements of Definition 4.1 with respect to (4.2) are satisfied, and the result follows.


5    Conclusions
Among those communication complexity regimes that reside well beyond our current level of understand-
ing, the model of Arthur-Merlin (AM) may be the closest to us. The motivation of this work has been to
explore the “neighbourhood” of AM that we might be able to analyse.

    • We have defined and analysed a new communication complexity class, SLAM, strictly included in
      AM and strictly stronger than the union of all previously known subclasses of AM.
    • We have identified one possible source of hardness in proving ω(√n) lower bounds against MA:
      such a bound would either be of a “very special form,” or imply a non-trivial lower bound against
      AM.

    A few questions that have remained open can be viewed as possible further steps towards understand-
ing AM. For instance:

    • What is the SLAM-complexity of Disj? Note that even its UAM-complexity is not understood yet
      (see [10] for details).
    • Can we prove a lower bound of Ω(√(n · log n)) on the MA-complexity of an explicit function (see
      Footnote 10)?

    • What approaches to understanding AM look promising?

          – Shall we try hard to prove a lower bound of n1/2+Ω(1) on the MA-complexity of an explicit
            function?


          – Are there complexity classes inside AM with non-trivial advantage over SLAM (or
            incomparable to it), which we can analyse?

    Finally, we would like to mention a result that is somewhat dual to this work from the conceptual
point of view: Bouland, Chen, Holden, Thaler and Vasudevan [5] define communication complexity
classes NISZK^cc and SZK^cc, and show that

                            NISZK^cc ⊆ SZK^cc ⊆ AM   and   NISZK^cc ⊈ UPP ,

where UPP is the class consisting of functions with high sign-rank; UPP is known to strictly contain PP.
Accordingly, the quest of understanding AM is at least as hard as that of understanding NISZK^cc, and the
latter might be simpler if NISZK^cc ⊂ AM.


Acknowledgements
I am grateful to Thomas Watson for many useful comments and suggestions. I would also like to thank
the authors of helpful anonymous reviews. The most careful proofreading done by the editorial team of
Theory of Computing has contributed not just to the presentation quality of the results, but also to the
author’s confidence in them.


References
 [1] SCOTT AARONSON AND AVI WIGDERSON: Algebrization: A new barrier in complexity
     theory. ACM Trans. Comput. Theory, 1(1):1–54, 2009. Preliminary version in STOC’08.
     [doi:10.1145/1490270.1490272] 4, 23

 [2] HAROLD ABELSON: Lower bounds on information transfer in distributed computations. J. ACM,
     27(2):384–392, 1980. Preliminary version in FOCS’78. [doi:10.1145/322186.322200] 6

 [3] LÁSZLÓ BABAI, PÉTER FRANKL, AND JANOS SIMON: Complexity classes in communication
     complexity theory. In Proc. 27th FOCS, pp. 337–347. IEEE Comp. Soc., 1986.
     [doi:10.1109/SFCS.1986.15] 6

 [4] ELMAR BÖHLER, CHRISTIAN GLASSER, AND DANIEL MEISTER: Error-bounded probabilistic
     computations between MA and AM. J. Comput. System Sci., 72(6):1043–1076, 2006. Preliminary
     version in MFCS’03. [doi:10.1016/j.jcss.2006.05.001] 7

 [5] ADAM BOULAND, LIJIE CHEN, DHIRAJ HOLDEN, JUSTIN THALER, AND PRASHANT VASUDEVAN:
     On the power of statistical zero knowledge. SIAM J. Comput., 49(4):1–58, 2020. Preliminary
     version in FOCS’17. [doi:10.1137/17M1161749, arXiv:1609.02888] 26

 [6] LIJIE CHEN: On the hardness of approximate and exact (bichromatic) maximum inner
     product. Theory of Computing, 16(4):1–50, 2020. Preliminary version in CCC’18.
     [doi:10.4086/toc.2020.v016a004] 4, 23


 [7] EVGENY DRUKH AND YISHAY MANSOUR: Concentration bounds for unigram language models.
     JMLR, 6(42):1231–1264, 2005. 20

 [8] SHAFI GOLDWASSER AND MICHAEL SIPSER: Private coins versus public coins in interactive
     proof systems. In SILVIO MICALI, editor, Randomness in Computation, volume 5 of Advances in
     Computing Research, pp. 73–90. JAI Press, 1989. Preliminary version in STOC’86. 7, 20

 [9] MIKA GÖÖS, SHACHAR LOVETT, RAGHU MEKA, THOMAS WATSON, AND DAVID ZUCKERMAN:
     Rectangles are nonnegative juntas. SIAM J. Comput., 45(5):1835–1869, 2016. Preliminary version
     in STOC’15. [doi:10.1137/15M103145X] 3, 7, 8, 21, 22

[10] MIKA GÖÖS, TONIANN PITASSI, AND THOMAS WATSON: Zero-information protocols and
     unambiguity in Arthur–Merlin communication. Algorithmica, 76(3):684–719, 2016. Preliminary
     version in ITCS’15. [doi:10.1007/s00453-015-0104-9] 3, 4, 8, 10, 21, 22, 25

[11] MIKA GÖÖS, TONIANN PITASSI, AND THOMAS WATSON: The landscape of communication
     complexity classes. Comput. Complexity, 27(2):245–304, 2018. [doi:10.1007/s00037-018-0166-6]
     3, 5

[12] MAURICIO KARCHMER, ILAN NEWMAN, MIKE SAKS, AND AVI WIGDERSON: Non-deterministic
     communication complexity with few witnesses. J. Comput. System Sci., 49(2):247–257, 1994.
     Preliminary version in SCT(CCC)’92. [doi:10.1016/S0022-0000(05)80049-2] 8

[13] HARTMUT KLAUCK: Rectangle size bounds and threshold covers in communication complexity. In
     Proc. 18th IEEE Conf. on Comput. Complexity (CCC’03), pp. 118–134. IEEE Comp. Soc., 2003.
     [doi:10.1109/CCC.2003.1214415, arXiv:cs/0208006] 2, 4, 10, 23

[14] HARTMUT KLAUCK: On Arthur Merlin games in communication complexity. In Proc.
     26th IEEE Conf. on Comput. Complexity (CCC’11), pp. 189–199. IEEE Comp. Soc., 2011.
     [doi:10.1109/CCC.2011.33, arXiv:1101.0523] 3, 7, 12, 20

[15] EYAL KUSHILEVITZ AND NOAM NISAN: Communication Complexity. Cambridge Univ. Press,
     1997. [doi:10.1017/CBO9780511574948] 5

[16] JOHN VON NEUMANN: Zur Theorie der Gesellschaftsspiele. Mathematische Annalen, 100(1):295–
     320, 1928. EuDML. 6, 20

[17] ANDREW CHI-CHIH YAO: Some complexity questions related to distributive computing. In Proc.
     11th STOC, pp. 209–213. ACM Press, 1979. [doi:10.1145/800135.804414] 6





AUTHOR

    Dmitry Gavinsky
    Researcher
    Department of Mathematical Logic
     and Theoretical Computer Science
    Institute of Mathematics of the
     Czech Academy of Sciences
    Praha, Czech Republic
    gavinsky math cas cz
    https://users.math.cas.cz/~gavinsky/


ABOUT THE AUTHOR

    In the good old days DMITRY GAVINSKY studied at the Technion - Israel Institute of
        Technology and at the University of Calgary. Thanks to the support of his scientific
        advisers Nader Bshouty, Richard Cleve and John Watrous, he graduated from both
        institutions, and now he lives and works in the best city in the world. There he enjoys the
        best beer and loves good music and good books.



