                           THEORY OF COMPUTING, Volume 19 (9), 2023, pp. 1–35
                                        www.theoryofcomputing.org




          Optimal Composition Theorem for
           Randomized Query Complexity
                 Dmytro Gavinsky∗†   Troy Lee‡   Miklos Santha   Swagato Sanyal§
             Received January 11, 2021; Revised December 29, 2022; Published December 30, 2023




       Abstract. For any set 𝑆, any relation 𝑓 ⊆ {0, 1} 𝑛 × 𝑆, and any partial Boolean
       function 𝑔 defined on a subset of {0, 1} 𝑚 , we show that

                                        ℝ1/3 ( 𝑓 ◦ 𝑔 𝑛 ) ∈ Ω(ℝ4/9 ( 𝑓 ) · √ℝ1/3 (𝑔)),

       where ℝ𝜖 (·) stands for the bounded-error randomized query complexity with error at
       most 𝜖, and 𝑓 ◦ 𝑔 𝑛 ⊆ ({0, 1} 𝑚 )𝑛 × 𝑆 denotes the composition of 𝑓 with 𝑛 instances of
       𝑔. This result is new even in the special case when 𝑆 = {0, 1} and 𝑔 is a total function.
           We show that the new composition theorem is optimal for the general case of
       relations: A relation 𝑓0 and a partial Boolean function 𝑔0 are constructed, such that
       ℝ4/9 ( 𝑓0 ) ∈ Θ(√𝑛), ℝ1/3 (𝑔0 ) ∈ Θ(𝑛) and ℝ1/3 ( 𝑓0 ◦ 𝑔0𝑛 ) ∈ Θ(𝑛).
    A preliminary version of this paper appeared in the Proceedings of the 46th International Colloquium on
Automata, Languages, and Programming (ICALP’19) [10].
  ∗ In 2022 D. G. changed his first name, as explained on his Internet page.
  † Partially funded by the grant 19-27871X of GA ČR.
  ‡ Supported by the Australian Research Council (Grant No: DP200100950).
  § Supported by an ISIRD Grant from Sponsored Research and Industrial Consultancy, IIT Kharagpur.



ACM Classification: F.1.2, F.2.3
AMS Classification: 68Q17, 68W20
Key words and phrases: query complexity, randomized decision tree, composed function, lower
bound

© 2023 Dmytro Gavinsky, Troy Lee, Miklos Santha, and Swagato Sanyal
Licensed under a Creative Commons Attribution License (CC-BY)                         DOI: 10.4086/toc.2023.v019.a009

          The theorem is proved via introducing a new complexity measure, max-conflict
       complexity, denoted by 𝜒̄(·). Its investigation shows that 𝜒̄(𝑔) ∈ Ω(√ℝ1/3 (𝑔)) for any
       partial Boolean function 𝑔 and ℝ1/3 ( 𝑓 ◦ 𝑔 𝑛 ) ∈ Ω(ℝ4/9 ( 𝑓 ) · 𝜒̄(𝑔)) for any relation 𝑓 ,
       which readily implies the composition statement. It is further shown that 𝜒̄(𝑔) is
       always at least as large as the sabotage complexity of 𝑔 (introduced by Ben-David and
       Kothari in 2016).


1     Introduction
Let 𝑆 be a finite set, |𝑆| ≥ 2. An 𝑆-relation in 𝑛 Boolean variables is defined to be a subset
 𝑓 ⊆ {0, 1} 𝑛 × 𝑆 such that for every 𝑥 ∈ {0, 1} 𝑛 there exists an 𝑠 ∈ 𝑆 such that (𝑥, 𝑠) ∈ 𝑓 . (For
motivation see Section 2.) If 𝑆 = {0, 1}, we call the 𝑆-relation a Boolean relation. A partial Boolean
function in 𝑚 variables is a function 𝑔 : 𝑇 → {0, 1}, where 𝑇 ⊆ {0, 1} 𝑚 . We will identify 𝑔
with the Boolean relation {(𝑥, 𝑔(𝑥)) | 𝑥 ∈ 𝑇} ∪ {(𝑥, 𝑏) | 𝑥 ∉ 𝑇, 𝑏 ∈ {0, 1}} ⊆ {0, 1} 𝑚 × {0, 1}.
𝑔 is called a total Boolean function if 𝑇 = {0, 1} 𝑚 . The composition of an 𝑆-relation 𝑓 and a
partial Boolean function 𝑔 is the 𝑆-relation 𝑓 ◦ 𝑔 𝑛 ⊆ ({0, 1} 𝑚 )𝑛 × 𝑆 defined as follows. A tuple
(𝑥 1 , . . . , 𝑥 𝑛 , 𝑠) is in 𝑓 ◦ 𝑔 𝑛 if and only if one of the following two conditions holds:

    1. There exists an 𝑖 such that 𝑥 𝑖 is not a valid input of 𝑔, i. e., 𝑥 𝑖 ∉ 𝑇.

    2. Each 𝑥 𝑖 is a valid input of 𝑔 and ((𝑔(𝑥1 ), . . . , 𝑔(𝑥 𝑛 )), 𝑠) ∈ 𝑓 .

Note that if 𝑓 is a partial Boolean function in 𝑛 Boolean variables with domain 𝑅 ⊆ {0, 1} 𝑛 (i. e.,
 𝑓 : 𝑅 → {0, 1}), then 𝑓 ◦ 𝑔 𝑛 is a partial Boolean function in 𝑚𝑛 Boolean variables with domain
{(𝑥 1 , . . . , 𝑥 𝑛 ) ∈ 𝑇 𝑛 | (𝑔(𝑥 1 ), . . . , 𝑔(𝑥 𝑛 )) ∈ 𝑅} ⊆ ({0, 1} 𝑚 )𝑛 .
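As an illustration of the composition rule just defined, membership of ((𝑥1 , . . . , 𝑥𝑛 ), 𝑠) in 𝑓 ◦ 𝑔 𝑛 can be checked directly from the two conditions; the following sketch is ours (partial functions as dicts, relations as sets of pairs), not the paper's formalism:

```python
def compose_eval(f, g, xs, s):
    """Decide whether ((x_1, ..., x_n), s) is in f o g^n.
    g: dict mapping valid inputs to bits; f: set of (z, s) pairs (an S-relation)."""
    if any(x not in g for x in xs):    # condition 1: some x_i is invalid for g,
        return True                    # so every output s is acceptable
    z = tuple(g[x] for x in xs)        # condition 2: apply g coordinate-wise
    return (z, s) in f

# toy example: g is AND restricted to {00, 01, 11}; f is the OR relation on 2 bits
g = {(0, 0): 0, (0, 1): 0, (1, 1): 1}
f = {((a, b), a | b) for a in (0, 1) for b in (0, 1)}
```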
     A partial Boolean function models a decision task in computer science, whose domain is
the set of valid binary encodings of inputs to the task. An 𝑆-relation models a search problem,
where each input may correspond to more than one item in a search space (e. g., a Boolean
formula may have more than one satisfying assignment).
     Relating the complexity of 𝑓 ◦ 𝑔 𝑛 to the complexities of 𝑓 and 𝑔 is a natural research problem.
A query algorithm for computing an 𝑆-relation ℎ is allowed to query individual bits of the input 𝑥,
with the goal of outputting an element 𝑠 such that (𝑥, 𝑠) ∈ ℎ. The query complexity of an algorithm
is the maximum possible number of queries that it makes.
     Query algorithms can be deterministic, randomized or quantum, where the latter two classes
allow for (bounded) errors. The corresponding query complexity of an 𝑆-relation ℎ – denoted,
respectively, by 𝔻(ℎ), ℝ(ℎ) or ℚ(ℎ) – is the minimal query complexity of an algorithm that
belongs to the corresponding class and computes ℎ with error 1/3. 1 Section 2 contains formal
definitions of various query complexity measures.
     It is easy to see that 𝔻( 𝑓 ◦ 𝑔 𝑛 ) ≤ 𝔻( 𝑓 ) · 𝔻(𝑔). 2 For the cases of randomized and quantum
   1 ℝ𝜖 (ℎ) stands for the 𝜖-error randomized query complexity, and ℚ𝜖 (ℎ) denotes the 𝜖-error quantum query
complexity.
   2To compute 𝑓 ◦ 𝑔 𝑛 , one can simulate an optimal query algorithm for 𝑓 , serving every query of this algorithm by
running an optimal query algorithm for 𝑔.



query complexity the argument is slightly more subtle, though very similar conceptually; in
particular, both ℝ( 𝑓 ◦ 𝑔 𝑛 ) ∈ 𝑂(ℝ( 𝑓 ) · ℝ(𝑔) · log ℝ( 𝑓 )) and ℚ( 𝑓 ◦ 𝑔 𝑛 ) ∈ 𝑂(ℚ( 𝑓 ) · ℚ(𝑔)) hold. 3
    Showing strong lower bounds on the query complexity of 𝑓 ◦ 𝑔 𝑛 (preferably, matching the
trivial upper bound) is often more interesting; the corresponding statements are sometimes
called composition theorems. Such results can lead to further theoretical developments (e. g.,
separating complexity measures, as well as different classes in structural complexity).
    For deterministic query complexity it has been shown [14, 17] that
                                              𝔻( 𝑓 ◦ 𝑔 𝑛 ) = 𝔻( 𝑓 ) · 𝔻(𝑔),
which means that the trivial query algorithm for 𝑓 ◦ 𝑔 𝑛 described above is optimal. Similarly, for
bounded-error quantum query complexity it has been shown [12, 15] that
                                           ℚ( 𝑓 ◦ 𝑔 𝑛 ) ∈ Θ(ℚ( 𝑓 ) · ℚ(𝑔)).
   Prior to this work, the randomized query complexity of composition has remained an open
problem, even for the special case where 𝑓 ⊆ {0, 1} 𝑛 × {0, 1} and 𝑔 : {0, 1} 𝑚 → {0, 1} are total
Boolean functions. We partially solve it for the most general case of composition, namely, letting
𝑓 be an 𝑆-relation and 𝑔 be a partial Boolean function.
Theorem 1.1. For any 𝑆-relation 𝑓 ⊆ {0, 1} 𝑛 × 𝑆 and any partial Boolean function 𝑔 ⊆ {0, 1} 𝑚 × {0, 1},

                                     ℝ1/3 ( 𝑓 ◦ 𝑔 𝑛 ) ∈ Ω(ℝ4/9 ( 𝑓 ) · √ℝ1/3 (𝑔)).

    Note that the above lower bound does not match the trivial upper bound, so we address its
optimality separately. We do this by constructing an example where the above bound is tight.
In other words, while some incomparable lower bounds on ℝ1/3 ( 𝑓 ◦ 𝑔 𝑛 ) are conceivable, the
statement of Theorem 1.1 is the strongest possible in general. 4
Theorem 1.2. There exists a relation 𝑓0 ⊆ {0, 1} 𝑛 × {0, 1} 𝑛 and a partial Boolean function 𝑔0 ⊆
{0, 1} 𝑛 × {0, 1}, such that
                     ℝ4/9 ( 𝑓0 ) ∈ Θ(√𝑛), ℝ1/3 (𝑔0 ) ∈ Θ(𝑛) and ℝ1/3 ( 𝑓0 ◦ 𝑔0𝑛 ) ∈ Θ(𝑛).
    Theorem 1.2 shows that a perfect composition theorem does not hold when 𝑓 is an 𝑆-relation
for an arbitrary finite set 𝑆 and 𝑔 is any partial Boolean function. After the conference publication
of the current work, Ben-David and Blais [5] exhibited partial Boolean functions 𝑓 and 𝑔 such that
ℝ( 𝑓 ◦ 𝑔 𝑛 ) ∈ Õ(ℝ( 𝑓 )^{2/3} · ℝ(𝑔)^{2/3}); their example showed that a perfect composition theorem fails
to hold even when both 𝑓 and 𝑔 are partial Boolean functions. However, in their example, the
randomized query complexities grow logarithmically with the respective number of variables,
which leaves the following possibility open (mentioned as a conjecture in [5]).
    3The multiplicative factor of log ℝ( 𝑓 ) in the case of ℝ(ℎ) is due to the need to reduce the error in computing each
instance of 𝑔 to 𝑂(1/ℝ( 𝑓 )); in the quantum case this can be handled in a more elegant, “lossless” way.
    4The following construction also witnesses the possibility of ℝ( 𝑓 ◦ 𝑔 𝑛 ) ∈ 𝑂(ℝ(𝑔)) when ℝ( 𝑓 ) ∈ Ω(√𝑛) – in other
words, it is not true in general that composition with a “hard” Boolean relation makes a Boolean function harder for
randomized query algorithms.



Conjecture 1.3 (Ben-David and Blais [5]). Let 𝑓 and 𝑔 be any partial Boolean functions, and 𝑛 be the
number of variables of 𝑓 . Then
                                 ℝ( 𝑓 ◦ 𝑔 𝑛 ) = Ω(ℝ( 𝑓 ) · ℝ(𝑔) / log 𝑛).

1.1    Our approach
We introduce a new complexity measure of Boolean functions, the max-conflict complexity,
denoted by 𝜒̄(𝑔). We show that 𝜒̄(𝑔) is a quadratically tight lower bound on the randomized
query complexity of a (partial) function 𝑔.

Theorem 1.4. For any partial Boolean function 𝑔 ⊆ {0, 1} 𝑚 × {0, 1},
                                         𝜒̄(𝑔) ∈ Ω(√ℝ1/3 (𝑔)).

    Theorem 1.4 is tight for the function 𝑔0 in Theorem 1.2. The main technical ingredient of this
article is the following composition statement for the max-conflict complexity.

Theorem 1.5. For any 𝑆-relation 𝑓 ⊆ {0, 1} 𝑛 × 𝑆 and any partial Boolean function 𝑔 ⊆ {0, 1} 𝑚 × {0, 1},

                                   ℝ1/3 ( 𝑓 ◦ 𝑔 𝑛 ) ∈ Ω(ℝ4/9 ( 𝑓 ) · 𝜒̄(𝑔)).

      Theorem 1.4 and Theorem 1.5 together imply Theorem 1.1.

1.2    Prior work
In the special case of 𝑓 being a partial function and 𝑔 being a total one, significant progress has
been made by Ben-David and Kothari [7], who showed that
                              ℝ1/3 ( 𝑓 ◦ 𝑔 𝑛 ) ∈ Ω(ℝ1/3 ( 𝑓 ) · √(ℝ0 (𝑔) / log ℝ0 (𝑔))).               (1.1)
   To prove the above statement, the authors have introduced and investigated a new complexity
measure of Boolean functions, sabotage complexity, denoted by ℝ𝕊(𝑔). The notion has a very
natural definition and is of independent interest. The authors have shown that for any partial
functions 𝑓 , 𝑔, ℝ1/3 ( 𝑓 ◦ 𝑔 𝑛 ) = Ω(ℝ1/3 ( 𝑓 ) · ℝ𝕊(𝑔)). The authors have further shown that if 𝑔 is a
total Boolean function, then ℝ𝕊(𝑔) = Ω(√(ℝ0 (𝑔) / log ℝ0 (𝑔))). This implies (1.1). We note here that for
partial functions 𝑔, ℝ𝕊(𝑔) can be arbitrarily smaller than ℝ(𝑔); the authors have shown a family
of partial functions (the collision problem) for which ℝ𝕊(𝑔) = 𝑂(1), but ℝ(𝑔) = Θ(√𝑛), where 𝑛
is the size of the input.
    In this article we show that max-conflict complexity is always lower-bounded by the sabotage
complexity of the same function.


Theorem 1.6. For any partial Boolean function 𝑔 ⊆ {0, 1} 𝑚 × {0, 1},

                                                   𝜒̄(𝑔) ≥ ℝ𝕊(𝑔).

   Since 𝑓 is a Boolean function, ℝ1/3 ( 𝑓 ) = Θ(ℝ4/9 ( 𝑓 )); hence Theorem 1.6 along with Theo-
rem 1.5 imply (1.1).

1.3   Proof technique
At a high level, the proof of Theorem 1.5 follows the structure of the proof by Anshu et al. [3] and
Ben-David and Kothari [7]. We show that for every probability distribution 𝜂 over the input space
{0, 1} 𝑛 of 𝑓 , there exists a deterministic query algorithm 𝒜 that makes 𝑂(ℝ1/3 ( 𝑓 ◦ 𝑔 𝑛 )/√ℝ1/3 (𝑔))
queries in the worst case, and computes 𝑓 with high probability, Pr𝑧∼𝜂 [(𝑧, 𝒜(𝑧)) ∈ 𝑓 ] ≥ 5/9. By
the minimax principle (Fact 2.4) this implies Theorem 1.5.
    We do this by using a query algorithm for 𝑓 ◦ 𝑔 𝑛 to construct a query algorithm for 𝑓 .
We define a sampling procedure that for any 𝑧 ∈ {0, 1} 𝑛 samples 𝑥 = (𝑥 1 , . . . , 𝑥 𝑛 ) such that
(𝑧, 𝑠) ∈ 𝑓 if and only if (𝑥, 𝑠) ∈ 𝑓 ◦ 𝑔 𝑛 . This procedure is defined in terms of 𝒬, which is a
probability distribution over pairs of distributions (𝜇0 , 𝜇1 ), where 𝜇0 is supported on 𝑔 −1 (0) and
𝜇1 is supported on 𝑔 −1 (1). We define a distribution 𝛾𝜂 over ({0, 1} 𝑚 )𝑛 in terms of this sampling
process as follows:

   1. Sample 𝑧 = (𝑧 1 , . . . , 𝑧 𝑛 ) from {0, 1} 𝑛 according to 𝜂.
   2. Independently sample (𝜇0^(𝑖) , 𝜇1^(𝑖) ) from 𝒬 for 𝑖 = 1, . . . , 𝑛.

   3. Sample 𝑥 𝑖 = (𝑥 𝑖^(1) , . . . , 𝑥 𝑖^(𝑚) ) according to 𝜇𝑧 𝑖^(𝑖) for 𝑖 = 1, . . . , 𝑛. Return 𝑥 = (𝑥1 , . . . , 𝑥 𝑛 ).

Notice that steps (1) and (2) are independent and the order in which they are performed does
not matter. For future reference, for a fixed 𝑧 let 𝛾𝑧 (𝒬) be the probability distribution defined by
the last two steps.
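The three sampling steps above can be sketched directly in code; the representation of finite distributions as weighted lists and of 𝒬 as a weighted list of (𝜇0 , 𝜇1 ) pairs is our own illustration, not the paper's:

```python
import random

def draw(dist, rng):
    """Draw from a finite distribution given as a list of (prob, value) pairs."""
    r, acc = rng.random(), 0.0
    for p, v in dist:
        acc += p
        if r < acc:
            return v
    return v  # guard against floating-point rounding

def sample_gamma_eta(eta, Q, n, rng):
    """Sample (z, x) with x ~ gamma_eta, following steps (1)-(3)."""
    z = draw(eta, rng)                        # step 1: z ~ eta over {0,1}^n
    pairs = [draw(Q, rng) for _ in range(n)]  # step 2: (mu_0^(i), mu_1^(i)) ~ Q, independently
    # step 3: x_i ~ mu_{z_i}^(i); on valid inputs this forces g(x_i) = z_i
    return z, tuple(draw(pairs[i][z[i]], rng) for i in range(n))
```

With point-mass distributions this reproduces the defining property that (𝑧, 𝑠) ∈ 𝑓 if and only if (𝑥, 𝑠) ∈ 𝑓 ◦ 𝑔 𝑛 .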
    Now 𝛾𝜂 is simply a probability distribution over ({0, 1} 𝑚 )𝑛 . Thus by the minimax principle
(Fact 2.4 below), there is a deterministic query algorithm 𝒜′ of worst-case complexity at most
ℝ1/3 ( 𝑓 ◦ 𝑔 𝑛 ) such that Pr𝑥∼𝛾𝜂 [(𝑥, 𝒜′(𝑥)) ∈ 𝑓 ◦ 𝑔 𝑛 ] ≥ 2/3. We first use 𝒜′ to construct a randomized
query algorithm 𝑇 for 𝑓 with bounded expected query complexity and error at most 1/3. 𝑇 is
presented formally in Algorithm 3. The final algorithm 𝒜 will be a truncation of 𝑇 which has
bounded worst-case complexity and error at most 4/9.
    On input 𝑧, the algorithm 𝑇 seeks to sample a string 𝑥 from 𝛾𝑧 (𝒬), and run 𝒜′ on 𝑥. Put
another way, 𝛾𝑧 (𝒬) induces a probability distribution over the leaves of 𝒜′, and the goal of 𝑇 is
to sample a leaf of 𝒜′ according to this distribution. Since for each 𝑠 ∈ 𝑆, (𝑥, 𝑠) ∈ 𝑓 ◦ 𝑔 𝑛 if and
only if (𝑧, 𝑠) ∈ 𝑓 , and Pr𝑥∼𝛾𝜂 [(𝑥, 𝒜′(𝑥)) ∈ 𝑓 ◦ 𝑔 𝑛 ] ≥ 2/3, we have that Pr𝑧∼𝜂 [(𝑧, 𝑇(𝑧)) ∈ 𝑓 ] ≥ 2/3.
Thus 𝑇 meets the accuracy requirement.
    The catch, of course, is to specify how 𝑇 samples from 𝛾𝑧 (𝒬) without making too many queries
to 𝑧. To sample 𝑥 𝑖 from 𝜇𝑧 𝑖^(𝑖) seems to require knowledge of 𝑧 𝑖 , and thus 𝑇 would have to query
all of 𝑧.


    To bypass this problem, we remember that 𝒜′, being an efficient algorithm, will query only
a few bits of 𝑥. This allows us to sample the bits of 𝑥 one by one, as and when they are queried
by 𝒜′. To see this more clearly, consider a run of 𝑇 where the pairs of distributions
(𝜇0^(1) , 𝜇1^(1) ), . . . , (𝜇0^(𝑛) , 𝜇1^(𝑛) ) were chosen in step (2) of the sampling procedure. Suppose
that 𝑇 is trying to simulate 𝒜′ at a vertex 𝑣 where 𝑥 𝑖^(𝑗) is queried. To respond to this query, 𝑇 will
sample 𝑥 𝑖^(𝑗) from its marginal distribution according to 𝜇𝑧 𝑖^(𝑖) conditioned on the event 𝑥 ∈ 𝑣.
Let the following be the marginal distributions of 𝑥 𝑖^(𝑗) for the two possible values of 𝑧 𝑖 :

                          Pr𝑥 𝑖 ∼𝜇𝑧 𝑖^(𝑖) [𝑥 𝑖^(𝑗) = 0 | 𝑥 ∈ 𝑣]    Pr𝑥 𝑖 ∼𝜇𝑧 𝑖^(𝑖) [𝑥 𝑖^(𝑗) = 1 | 𝑥 ∈ 𝑣]

               𝑧𝑖 = 0                    𝑝0                                  1 − 𝑝0
               𝑧𝑖 = 1                    𝑝1                                  1 − 𝑝1

Without loss of generality, assume that 𝑝0 ≤ 𝑝1 . 𝑇 answers the query by the procedure Bitsampler
given in Algorithm 1. Note that the bit returned by Bitsampler has the desired distribution.

  Algorithm 1: Bitsampler (suppose 𝑝 0 ≤ 𝑝1 )
 1 Sample 𝑟 ∼ [0, 1] uniformly at random.
 2 if 𝑟 < 𝑝 0 then
 3      return 0.
 4
 5 else if 𝑟 > 𝑝 1 then
 6      return 1.
 7
 8 else
 9      query 𝑧 𝑖 .
 10     if 𝑟 ≤ 𝑝 𝑧 𝑖 then
 11         return 0.
 12     else
 13         return 1.


The step in which Bitsampler returns the bit depends on the value of 𝑟 sampled in step 1. In
particular, 𝑧 𝑖 is queried if and only if 𝑟 ∈ [𝑝 0 , 𝑝1 ], and the bit is returned in step 11 or 13. Such a
query to 𝑧 𝑖 contributes to the query complexity of 𝑇. Thus the probability that 𝑇 makes a query
when the underlying simulation of 𝒜 0 is at vertex 𝑣 is (𝑝 1 − 𝑝 0 ). We refer to this quantity as
Δ(𝑣). It plays an important role in our analysis (see Section 6 and Appendix 6.1).
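Algorithm 1 transcribes directly into code; making the random number 𝑟 an explicit argument is our addition, so that the three cases can be exercised deterministically (in the algorithm itself 𝑟 is drawn uniformly from [0, 1]):

```python
def bitsampler(p0, p1, z_i, r):
    """Return (bit, queried): the sampled bit and whether z_i had to be queried.
    p_b = Pr[bit = 0 | x in v] when z_i = b, with p0 <= p1 as in Algorithm 1."""
    assert 0.0 <= p0 <= p1 <= 1.0
    if r < p0:                 # steps 2-3: the bit is 0 under either value of z_i
        return 0, False
    if r > p1:                 # steps 5-6: the bit is 1 under either value of z_i
        return 1, False
    # steps 8-13: only for r in [p0, p1] does the answer depend on z_i, so query it
    p = p0 if z_i == 0 else p1
    return (0 if r <= p else 1), True
```

The query to 𝑧 𝑖 happens exactly when 𝑟 ∈ [𝑝0 , 𝑝1 ], i. e., with probability 𝑝1 − 𝑝0 = Δ(𝑣).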
    Our sampling procedure and the tools we use to bound its cost are reminiscent of the work of
Barak, Braverman, Chen and Rao [4] in communication complexity. They consider a communication
analog of our setting, in which two players try to sample a leaf in a communication protocol
while communicating as little as possible.


1.3.1   Conflict complexity and max-conflict complexity
Bounding the query complexity of 𝑇 naturally suggests the quantities that we define in this
article: the conflict complexity 𝜒(𝑔) and the max-conflict complexity 𝜒̄(𝑔) of a partial Boolean
function 𝑔. A formal definition can be found in Section 3; here we give the high-level idea and
motivation behind these quantities.
    Let us set aside 𝑇 for a moment and just consider a deterministic query algorithm ℬ
computing the partial function 𝑔 ⊆ {0, 1} 𝑚 × {0, 1}. Let 𝜇0 and 𝜇1 be distributions with support
on 𝑔 −1 (0) and 𝑔 −1 (1), respectively. For each vertex 𝑣 ∈ ℬ let 𝑝 0 (𝑣) and 𝑝 1 (𝑣) be the probability
that the answer to the query at 𝑣 is 0 on input 𝑥 ∼ 𝜇0 and 𝑥 ∼ 𝜇1 , respectively, conditioned
on 𝑥 reaching 𝑣. Now we can imagine a process 𝒫(ℬ, 𝜇0 , 𝜇1 ) that runs Bitsampler on the
tree ℬ: 𝒫(ℬ, 𝜇0 , 𝜇1 ) begins at the root, and at a vertex 𝑣 in ℬ it uniformly chooses a random
real number 𝑟 ∈ [0, 1]. If 𝑟 < min{𝑝 0 (𝑣), 𝑝1 (𝑣)} then the query is “answered” 0 and it moves
to the left child. If 𝑟 > max{𝑝 0 (𝑣), 𝑝1 (𝑣)} then the query is “answered” 1 and it moves to the
right child. If 𝑟 ∈ [min{𝑝0 (𝑣), 𝑝1 (𝑣)}, max{𝑝 0 (𝑣), 𝑝1 (𝑣)}] then the process halts. The conflict
complexity 𝜒(ℬ, (𝜇0 , 𝜇1 )) is the expected number of vertices this process visits before halting.
The conflict complexity of 𝑔 is defined to be

                                  𝜒(𝑔) = max_(𝜇0 ,𝜇1 ) min_𝑇 𝜒(𝑇, (𝜇0 , 𝜇1 )),


where the minimum is taken over trees 𝑇 that compute 𝑔. For max-conflict complexity we enlarge
the set over which we maximize. Let 𝒬 be a distribution with finite support over pairs of
distributions (𝜇0 , 𝜇1 ), where supp(𝜇0 ) ⊆ 𝑔 −1 (0), supp(𝜇1 ) ⊆ 𝑔 −1 (1) for each pair (𝜇0 , 𝜇1 ) in the
support of 𝒬. Let 𝜒(ℬ, 𝒬) = 𝔼(𝜇0 ,𝜇1 )∼𝒬 [𝜒(ℬ, (𝜇0 , 𝜇1 ))]. The max-conflict complexity 𝜒̄(𝑔) is
defined as
                                     𝜒̄(𝑔) = sup_𝒬 min_𝑇 𝜒(𝑇, 𝒬),

where the minimum is taken over trees 𝑇 that compute 𝑔. Clearly, the max-conflict complexity
is at least as large as the conflict complexity.
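On small examples the process 𝒫(ℬ, 𝜇0 , 𝜇1 ) admits a direct recursive computation of 𝜒(ℬ, (𝜇0 , 𝜇1 )): each vertex contributes 1 visit, the process halts there with probability max − min of the two conditional probabilities, and otherwise recurses with both distributions conditioned on the answer. The following sketch (our own illustration; distributions as dicts, trees as nested tuples) follows exactly those cases:

```python
def cond(mu, i, b):
    """Condition distribution mu (dict: tuple -> prob) on x[i] == b and renormalize."""
    kept = {x: p for x, p in mu.items() if x[i] == b}
    z = sum(kept.values())
    return {x: p / z for x, p in kept.items()}

def chi(tree, mu0, mu1):
    """Expected number of vertices P(tree, mu0, mu1) visits before halting.
    tree is None (a leaf) or (query_index, left_subtree, right_subtree)."""
    if tree is None:
        return 0.0
    i, left, right = tree
    p0 = sum(p for x, p in mu0.items() if x[i] == 0)  # Pr_{mu0}[x_i = 0] at this vertex
    p1 = sum(p for x, p in mu1.items() if x[i] == 0)  # Pr_{mu1}[x_i = 0] at this vertex
    lo, hi = min(p0, p1), max(p0, p1)
    e = 1.0  # this vertex is visited; the process halts here with probability hi - lo
    if lo > 0:       # r < lo: answer 0, move left, condition both distributions
        e += lo * chi(left, cond(mu0, i, 0), cond(mu1, i, 0))
    if hi < 1:       # r > hi: answer 1, move right
        e += (1 - hi) * chi(right, cond(mu0, i, 1), cond(mu1, i, 1))
    return e
```

For XOR on two bits with the tree querying bit 0 then bit 1, and 𝜇0 , 𝜇1 uniform on the 0- and 1-inputs, the root never halts (both conditional probabilities are 1/2) and either child halts immediately, giving 𝜒 = 2.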
    To motivate the max-conflict complexity, note that the query complexity of 𝑇 is the number
of times step 9 in Bitsampler is executed, i. e., the number of times the random number 𝑟 ∈ [𝑝0 , 𝑝1 ].
In the definition of 𝑇 we will choose 𝒬 to achieve close to the optimal value in the definition of 𝜒̄(𝑔).
Then intuitively one expects that for each 𝑖, 𝑇 queries 𝑧 𝑖 only after 𝒜′ makes about 𝜒̄(𝑔) queries
into 𝑥 𝑖 . By means of a direct sum theorem for max-conflict complexity we make this intuition
rigorous and prove that the expected query complexity of 𝑇 is at most ℝ1/3 ( 𝑓 ◦ 𝑔 𝑛 )/ 𝜒̄(𝑔). We
refer the reader to Section 5 for a formal proof.

1.3.2   𝜒̄(𝑔) and ℝ(𝑔)

Note that applying Theorem 1.5 with the outer function 𝑓 (𝑧) = 𝑧1 shows that ℝ1/3 (𝑔) ∈ Ω(𝜒̄(𝑔)). 5
We complete the proof of Theorem 1.1 by showing that max-conflict complexity is a quadratically

    5 ℝ1/3 (𝑔) ∈ Ω(𝜒̄(𝑔)) also follows fairly easily from the definitions of ℝ1/3 (𝑔) and 𝜒̄(𝑔).



tight lower bound on randomized query complexity, even for partial functions 𝑔. In fact, we
show the stronger result that this is true even for the conflict complexity.

Theorem 1.7. For any partial Boolean function 𝑔 ⊆ {0, 1} 𝑚 × {0, 1},
                                         𝜒(𝑔) ∈ Ω(√ℝ1/3 (𝑔)).

    Theorem 1.7 is proved in Section 6. At a high level, our proof is reminiscent of the result of
[4] on compressing communication protocols in that both look at a random sampling process to
navigate a tree, and relate the probability of this process needing to query or communicate at a
node to the amount of information that is learned at the node.
    To prove ℝ(𝑔) ∈ 𝑂(𝜒(𝑔)²), we again resort to the minimax principle; we show that for
each probability distribution 𝜇 over the valid inputs to 𝑔, there is an accurate and efficient
distributional query algorithm for 𝑔. For 𝑏 ∈ {0, 1}, let 𝜇𝑏 be the distribution obtained by
conditioning 𝜇 on the event 𝑔(𝑥) = 𝑏. By the definition of 𝜒(𝑔), there is a query algorithm ℬ
such that the following is true: if its queries are served by Bitsampler, step 9 is executed within
expected 𝜒(ℬ, (𝜇0 , 𝜇1 )) ≤ 𝜒(𝑔) queries. Note that at a vertex 𝑣 which queries 𝑖, the probability
that step 9 is executed is Δ(𝑣) = | Pr𝜇0 [𝑥 𝑖 = 0 | 𝑥 at 𝑣] − Pr𝜇1 [𝑥 𝑖 = 0 | 𝑥 at 𝑣]|. This roughly
implies that for a typical vertex 𝑣 of ℬ, Δ(𝑣) is at least about 1/𝜒(𝑔). By a technical claim that we
prove (Claim 6.4) this implies that the query outcome at 𝑣 carries about 1/𝜒(𝑔)² bits of information
about 𝑔(𝑥). Using the chain rule of mutual information, we can show that the mutual information
between 𝑔(𝑥) and the outcomes of the first 𝑂(𝜒(𝑔)²) queries by ℬ is Ω(1). This enables us
to conclude that we can infer the value of 𝑔(𝑥) with success probability 1/2 + Ω(1) from the
transcript of ℬ restricted to the first 𝑂(𝜒(𝑔)²) queries. The distributional algorithm of 𝑔 for 𝜇 is
simply the algorithm ℬ terminated after 𝑂(𝜒(𝑔)²) queries.

1.3.3   𝜒̄(𝑔) and ℝ𝕊(𝑔)

To see why 𝜒̄(𝑔) ≥ ℝ𝕊(𝑔), we first give an alternative characterization of ℝ𝕊(𝑔). For a
deterministic tree 𝑇 computing 𝑔 and strings 𝑥, 𝑦 such that 𝑔(𝑥) ≠ 𝑔(𝑦), let sep𝑇 (𝑥, 𝑦) be the
depth of the node 𝑣 in 𝑇 such that 𝑥 and 𝑦 both reach 𝑣 yet 𝑥 𝑞(𝑣) ≠ 𝑦 𝑞(𝑣) , where 𝑞(𝑣) is the index
queried at 𝑣. Let 𝒯 be a zero-error randomized protocol for 𝑔, i. e., 𝒯 is a probability distribution
supported on deterministic trees that compute 𝑔. Then we have (for a proof see Appendix B)

                         ℝ𝕊(𝑔) = min_𝒯 max_{𝑥,𝑦: 𝑔(𝑥)≠𝑔(𝑦)} 𝔼𝑇∼𝒯 [sep𝑇 (𝑥, 𝑦)].



By von Neumann’s minimax theorem [18], this is equal to

                              ℝ𝕊(𝑔) = max_𝑝 min_𝑇 𝔼(𝑥,𝑦)∼𝑝 [sep𝑇 (𝑥, 𝑦)].

Here, the max is taken over distributions 𝑝 on pairs (𝑥, 𝑦) where 𝑔(𝑥) ≠ 𝑔(𝑦), and the min is
taken over deterministic trees 𝑇 computing 𝑔.


      We have seen that the definition of 𝜒̄(𝑔) is

                              𝜒̄(𝑔) = sup_𝒬 min_𝑇 𝔼(𝜇0 ,𝜇1 )∼𝒬 [𝜒(𝑇, (𝜇0 , 𝜇1 ))],



where 𝒬 is a distribution with finite support over pairs (𝜇0 , 𝜇1 ) and 𝑇 is a deterministic tree
computing 𝑔. When (𝜇0 , 𝜇1 ) are taken to be singleton distributions, i. e., 𝜇0 puts all its weight
on a single 𝑥 with 𝑔(𝑥) = 0, and 𝜇1 puts all its weight on a single 𝑦 with 𝑔(𝑦) = 1, it is easy to
see that 𝜒(𝑇, (𝜇0 , 𝜇1 )) = sep𝑇 (𝑥, 𝑦) (see Claim 3.4). Since there are only finitely many such pairs
(𝑥, 𝑦), we have that 𝜒̄(𝑔) is at least as large as the sabotage complexity of 𝑔.

1.4     The conference version
A preliminary version of this paper appeared in the proceedings of ICALP 2019 [10]. Among
other revisions, some terminological confusion in the conference version is cleared up in this
paper.

1.5     Follow-up work
As mentioned before, after the conference publication of the present work [10], Ben-David and
Blais [5] exhibited partial Boolean functions 𝑓 and 𝑔 such that ℝ( 𝑓 ◦ 𝑔 𝑛 ) ∈ Õ(ℝ( 𝑓 )^{2/3} · ℝ(𝑔)^{2/3}),
ruling out the possibility of a perfect composition theorem even when both 𝑓 and 𝑔 are partial
Boolean functions. Later, Ben-David, Blais, Göös and Maystre [6] introduced the linearized
complexity measure LR(·). They showed the composition theorem ℝ( 𝑓 ◦ 𝑔 𝑛 ) = Ω(ℝ( 𝑓 ) · LR(𝑔))
for all partial Boolean functions 𝑓 and 𝑔. They further showed that LR(𝑔) is the asymptotically
largest measure for which such a composition statement holds. Thus their composition theorem
is an improvement on Theorem 1.5.


2     Preliminaries
For 𝑇 ⊆ {0, 1} 𝑚 , let 𝑔 : 𝑇 → {0, 1} be a partial Boolean function. For 𝑏 ∈ {0, 1}, 𝑔 −1 (𝑏) is defined
to be the set of strings {𝑥 ∈ 𝑇 | 𝑔(𝑥) = 𝑏}. We refer to 𝑇 as the set of valid inputs to 𝑔. For all
strings 𝑦 ∉ 𝑇, we say that 𝑔(𝑦) = ∗. All probability distributions 𝜇 over {0, 1} 𝑚 considered in
connection with 𝑔 are assumed to have support on 𝑇. Thus 𝑔(𝑥) is well-defined for any 𝑥 in the
support of 𝜇.
    Let 𝑆 be any finite set. Let ℎ ⊆ {0, 1} 𝑘 × 𝑆 be any 𝑆-relation (which could also be a partial
Boolean function). For the sake of simplicity throughout the paper we will assume that for every
𝑥 ∈ {0, 1} 𝑘 there exists an 𝑠 ∈ 𝑆 such that (𝑥, 𝑠) ∈ ℎ; the case in which there exists an 𝑥 ∈ {0, 1} 𝑘
such that (𝑥, 𝑠) ∉ ℎ for all 𝑠 ∈ 𝑆 can easily be handled by including all pairs in {(𝑥, 𝑠) | 𝑠 ∈ 𝑆} in
and produce an element of 𝑆 as output. We denote the output by 𝒜(𝑥).

Definition 2.1 (Deterministic query complexity). A deterministic query algorithm 𝒜 is said to
compute an 𝑆-relation ℎ if (𝑥, 𝒜(𝑥)) ∈ ℎ for all 𝑥 ∈ {0, 1} 𝑘 . The deterministic query complexity


𝔻(ℎ) of ℎ is the minimum over all deterministic query algorithms 𝒜 computing ℎ of the
maximum number of queries made by 𝒜 over 𝑥 ∈ {0, 1} 𝑘 .

Definition 2.2 (Bounded-error randomized query complexity). Let 𝜖 ∈ [0, 1/2). We say that a
randomized query algorithm 𝒜 computes an 𝑆-relation ℎ with error 𝜖 if Pr[(𝑥, 𝒜(𝑥)) ∈ ℎ] ≥ 1− 𝜖
for all 𝑥 ∈ {0, 1} 𝑘 . The bounded-error randomized query complexity ℝ𝜖 (ℎ) of ℎ is the minimum
over all randomized query algorithms 𝒜 computing ℎ with error 𝜖 of the maximum number of
queries made by 𝒜 over all 𝑥 ∈ {0, 1} 𝑘 and the internal randomness of 𝒜.

Definition 2.3 (Distributional query complexity). Let 𝜇 be a distribution on the input space {0, 1} 𝑘
of ℎ, and 𝜖 ∈ [0, 1/2). We say that a deterministic query algorithm 𝒜 computes an 𝑆-relation
ℎ with distributional error 𝜖 on 𝜇 if Pr𝑥∼𝜇 [(𝑥, 𝒜(𝑥)) ∈ ℎ] ≥ 1 − 𝜖. The distributional query
complexity 𝔻𝜖^𝜇 (ℎ) of ℎ is the minimum over deterministic algorithms 𝒜 computing ℎ with
distributional error 𝜖 on 𝜇 of the maximum over 𝑥 ∈ {0, 1} 𝑘 of the number of queries made by
𝒜 on 𝑥.

   We will use the minimax principle in our proofs to go between distributional and randomized
query complexity.

Fact 2.4 (Minimax principle). For any integer 𝑘 > 0, finite set 𝑆, and 𝑆-relation ℎ ⊆ {0, 1} 𝑘 × 𝑆,

                                         ℝ𝜖 (ℎ) = max_𝜇 𝔻𝜖^𝜇 (ℎ).


   We present a proof of Fact 2.4 in Appendix A.
   Let 𝜇 be a probability distribution over {0, 1} 𝑘 . We use supp(𝜇) to denote the support of 𝜇.
By 𝑥 ∼ 𝜇 we mean that 𝑥 is a random string drawn from 𝜇. Let 𝐶 ⊆ {0, 1}^𝑘 be an arbitrary set
such that Pr_{𝑥∼𝜇} [𝑥 ∈ 𝐶] = Σ_{𝑦∈𝐶} 𝜇(𝑦) > 0. Then 𝜇 | 𝐶 is defined to be the probability distribution
obtained by conditioning 𝜇 on the event that the sampled string belongs to 𝐶, i. e.,

                                 (𝜇 | 𝐶)(𝑥) = 0                          if 𝑥 ∉ 𝐶,
                                 (𝜇 | 𝐶)(𝑥) = 𝜇(𝑥) / Σ_{𝑦∈𝐶} 𝜇(𝑦)        if 𝑥 ∈ 𝐶.
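As a quick illustration of the conditioning operation, here is a minimal Python sketch (our own illustration, with distributions represented as dictionaries of exact rationals; the representation is not from the paper):

```python
from fractions import Fraction

def condition(mu, C):
    """Return the conditional distribution mu | C.

    mu is a dict mapping strings to probabilities; C is a set of strings
    whose total mass under mu must be positive.
    """
    mass = sum(p for x, p in mu.items() if x in C)
    assert mass > 0, "conditioning event must have positive probability"
    return {x: (p / mass if x in C else Fraction(0)) for x, p in mu.items()}

mu = {"00": Fraction(1, 2), "01": Fraction(1, 4), "11": Fraction(1, 4)}
cond = condition(mu, {"01", "11"})
# the event has mass 1/2, so each surviving string gets weight 1/2
```

Strings outside 𝐶 keep weight 0, and the surviving weights are rescaled so that they again sum to 1.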


   For a distribution 𝒬 over pairs of distributions (𝜇0 , 𝜇1 ), let supp0 (𝒬) = ∪{supp(𝜇0 ) :
∃𝜇1 , (𝜇0 , 𝜇1 ) ∈ supp(𝒬)}. Similarly let supp1 (𝒬) = ∪{supp(𝜇1 ) : ∃𝜇0 , (𝜇0 , 𝜇1 ) ∈ supp(𝒬)}. We
say that 𝒬 is consistent if supp0 (𝒬) and supp1 (𝒬) are disjoint sets. We say that 𝒬 is consistent
with a (partial) function 𝑔 if supp0 (𝒬) ⊆ 𝑔^{−1}(0) and supp1 (𝒬) ⊆ 𝑔^{−1}(1). All such distributions
𝒬 considered in this paper will be assumed to have finite support.

Definition 2.5 (Subcube, codimension). A subset ℂ ⊆ {0, 1} 𝑚 is called a subcube if there
exists a set 𝑆 ⊆ {1, . . . , 𝑚} of indices and an assignment function 𝜎 : 𝑆 → {0, 1} such that
ℂ = {𝑥 ∈ {0, 1} 𝑚 : ∀𝑖 ∈ 𝑆, 𝑥 𝑖 = 𝜎(𝑖)}. The codimension codim(ℂ) of ℂ is defined to be |𝑆|.
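A subcube and its codimension can be represented concretely as follows (a Python sketch under our own conventions, with the assignment 𝜎 stored as a dictionary of fixed positions):

```python
def in_subcube(x, sigma):
    """Membership of the bit string x in the subcube fixed by sigma.

    sigma maps (0-based) indices to bits; the subcube consists of all
    strings that agree with sigma on its domain.
    """
    return all(x[i] == b for i, b in sigma.items())

sigma = {0: "1", 2: "0"}   # the subcube 1*0* inside {0,1}^4
codim = len(sigma)         # codimension = number of fixed indices
```

Here `codim` equals 2, matching the two coordinates fixed by 𝜎.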

   The composition of an 𝑆-relation and a partial Boolean function plays a central role in this
paper. See Section 1 for a definition.


    We will often view a deterministic query algorithm as a binary decision tree. In each
vertex 𝑣 of the tree, an input variable is queried. Depending on the outcome of the query, the
computation goes to a child of 𝑣. The child of 𝑣 corresponding to outcome 𝑏 of the query is
denoted by 𝑣 𝑏 .
    The set of inputs that lead the computation of a decision tree to a certain vertex is a subcube.
We will use the same symbol (e. g., 𝑣) to refer to a vertex as well as the subcube associated with
it.
    The execution of a decision tree terminates at some leaf. If the tree computes some 𝑆-relation
ℎ ⊆ {0, 1} 𝑘 × 𝑆, the leaves are labelled by elements of 𝑆, and the tree outputs the label of the leaf
at which it terminates. We will also consider decision trees with unlabelled leaves (see Section 4).


3   Conflict complexity
In this section, we define the conflict complexity and max-conflict complexity of a partial Boolean
function 𝑔 on 𝑚 bits. For this, we will need to introduce some notation related to a deterministic
decision tree 𝑇. For a node 𝑣 ∈ 𝑇, let 𝜋(𝑣) = ⊥ if 𝑣 is the root and 𝜋(𝑣) be the parent of 𝑣
otherwise. Let 𝑞(𝑣) be the index that is queried at 𝑣 in 𝑇, and let 𝑑𝑇 (𝑣) be the number of vertices
on the unique path in 𝑇 from the root to 𝑣 (i. e., the depth of 𝑣). The depth of the root is 1.
    Now fix a partial function 𝑔 ⊆ {0, 1}^𝑚 × {0, 1} and probability distributions 𝜇0 , 𝜇1 over
𝑔^{−1}(0), 𝑔^{−1}(1), respectively. Let 𝑇 be a tree that computes 𝑔. For a node 𝑣 ∈ 𝑇 let
𝑝0 (𝑣) = Pr_{𝜇0} [𝑥_{𝑞(𝑣)} = 0 | 𝑥 ∈ 𝑣] and 𝑝1 (𝑣) = Pr_{𝜇1} [𝑥_{𝑞(𝑣)} = 0 | 𝑥 ∈ 𝑣], and
              𝑅(𝑣) = 1                                                                   if 𝑣 is the root,
              𝑅(𝑣) = 𝑅(𝜋(𝑣)) · min{Pr_{𝜇0} [𝑥 ∈ 𝑣 | 𝑥 ∈ 𝜋(𝑣)], Pr_{𝜇1} [𝑥 ∈ 𝑣 | 𝑥 ∈ 𝜋(𝑣)]}    otherwise.
Also define
                                      Δ(𝑣) = |𝑝 0 (𝑣) − 𝑝 1 (𝑣)| .
To gather intuition about these quantities, imagine a random walk on 𝑇 that begins at the root.
At a node 𝑣, this walk moves to the left child with probability min{𝑝 0 (𝑣), 𝑝1 (𝑣)}, and it moves to
the right child with probability 1 − max{𝑝 0 (𝑣), 𝑝1 (𝑣)}. With the remaining probability, Δ(𝑣), it
terminates at 𝑣. 𝑅(𝑣) is the probability that the walk reaches 𝑣. The walk always terminates
before it reaches a leaf of 𝑇. To see why, note that if the walk reaches a leaf before terminating,
then there are two inputs 𝑥 ∈ 𝑔 −1 (0), 𝑦 ∈ 𝑔 −1 (1) both of which lead the computation of 𝑇 to the
same leaf, which contradicts the assumption that 𝑇 computes 𝑔. Hence for any tree 𝑇 computing
𝑔 we have Σ_{𝑣∈𝑇} Δ(𝑣)𝑅(𝑣) = 1. In particular, this means that Σ_{𝑣∈𝑇} 𝑑𝑇 (𝑣)Δ(𝑣)𝑅(𝑣)—the expected
number of steps the walk takes before it terminates—is always at most the depth of the tree 𝑇.
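To make these quantities concrete, the following Python sketch (our own illustration, not code from the paper) computes Σ_{𝑣∈𝑇} Δ(𝑣)𝑅(𝑣) and Σ_{𝑣∈𝑇} 𝑑𝑇 (𝑣)Δ(𝑣)𝑅(𝑣) exactly for the two-bit XOR function, with 𝜇0 , 𝜇1 uniform on the 0- and 1-inputs:

```python
from fractions import Fraction as F

def walk_stats(tree, mu0, mu1, R=F(1), depth=1):
    """Accumulate (sum of Delta(v)R(v), sum of d(v)Delta(v)R(v)) for the
    random walk on a decision tree.  `tree` is either a leaf label or a
    triple (q, zero_child, one_child); mu0, mu1 are dicts of (possibly
    unnormalized) weights, already restricted to the current subcube."""
    if not isinstance(tree, tuple):        # the walk never reaches a leaf
        return F(0), F(0)
    q, t0, t1 = tree

    def p_zero(mu):                        # Pr[x_q = 0] under normalized mu
        total = sum(mu.values())
        return sum(w for x, w in mu.items() if x[q] == "0") / total

    p0, p1 = p_zero(mu0), p_zero(mu1)
    delta = abs(p0 - p1)                   # probability the walk stops here
    mass, steps = delta * R, depth * delta * R
    for bit, step in (("0", min(p0, p1)), ("1", min(1 - p0, 1 - p1))):
        if step == 0:
            continue                       # walk cannot move to this child
        n0 = {x: w for x, w in mu0.items() if x[q] == bit}
        n1 = {x: w for x, w in mu1.items() if x[q] == bit}
        child = t0 if bit == "0" else t1
        m, s = walk_stats(child, n0, n1, R * step, depth + 1)
        mass, steps = mass + m, steps + s
    return mass, steps

# XOR on two bits: query bit 0, then bit 1; leaves carry the value of g.
xor_tree = (0, (1, 0, 1), (1, 1, 0))
mu0 = {"00": F(1), "11": F(1)}             # uniform over g^{-1}(0)
mu1 = {"01": F(1), "10": F(1)}             # uniform over g^{-1}(1)
mass, chi = walk_stats(xor_tree, mu0, mu1)
```

Here `mass` comes out to 1 (the walk terminates with probability 1, as it must for a tree computing 𝑔) and `chi` comes out to 2, matching the depth of the tree.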
Definition 3.1 (Conflict complexity and max-conflict complexity). Let 𝑔 be a partial function.
For distributions 𝜇0 , 𝜇1 with supp(𝜇𝑏 ) ⊆ 𝑔 −1 (𝑏) for 𝑏 ∈ {0, 1}, and a deterministic decision tree
𝑇 computing 𝑔, define
                                 𝜒(𝑇, (𝜇0 , 𝜇1 )) = Σ_{𝑣∈𝑇} 𝑑𝑇 (𝑣)Δ(𝑣)𝑅(𝑣) .



The conflict complexity of 𝑔 is
                                   𝜒(𝑔) = max_{𝜇0 ,𝜇1} min_𝑇 𝜒(𝑇, (𝜇0 , 𝜇1 )) ,

where the maximum is over all pairs of distributions (𝜇0 , 𝜇1 ), where 𝜇0 and 𝜇1 are supported on
𝑔 −1 (0) and 𝑔 −1 (1), respectively, and the minimum is taken over all deterministic trees 𝑇 computing
𝑔. For 𝒬 a distribution with finite support over pairs of distributions satisfying supp𝑏 (𝒬) ⊆ 𝑔 −1 (𝑏)
for 𝑏 ∈ {0, 1}, and 𝑇 a deterministic tree computing 𝑔, let 𝜒(𝑇, 𝒬) = 𝔼(𝜇0 ,𝜇1 )∼𝒬 [𝜒(𝑇, (𝜇0 , 𝜇1 ))].
Finally, the max-conflict complexity of 𝑔 is
                                      𝜒̄(𝑔) = sup_𝒬 min_𝑇 𝜒(𝑇, 𝒬) ,

where the supremum is taken over 𝒬 with finite support such that supp𝑏 (𝒬) ⊆ 𝑔 −1 (𝑏) for
𝑏 ∈ {0, 1}, and the minimum is taken over deterministic trees 𝑇 computing 𝑔.
   We can extend the definition of conflict complexity and max-conflict complexity to more
general query processes that do not necessarily compute a function. We first need the notion of
FULL.
Definition 3.2. For a deterministic tree 𝑇 and a pair of distributions (𝜇0 , 𝜇1 ) with disjoint supports,
we say that (𝑇, (𝜇0 , 𝜇1 )) is FULL if Σ_{𝑣∈𝑇} Δ(𝑣)𝑅(𝑣) = 1, i. e., if the random walk described above
terminates with probability 1. We say that (𝑇, 𝒬) is FULL if (𝑇, (𝜇0 , 𝜇1 )) is FULL for each
(𝜇0 , 𝜇1 ) ∈ supp(𝒬).
Definition 3.3. For a deterministic tree 𝑇 and a pair of distributions (𝜇0 , 𝜇1 ) such that (𝑇, (𝜇0 , 𝜇1 ))
is FULL, define 𝜒(𝑇, (𝜇0 , 𝜇1 )) = Σ_{𝑣∈𝑇} 𝑑𝑇 (𝑣)Δ(𝑣)𝑅(𝑣). For a distribution 𝒬 such that (𝑇, 𝒬) is
FULL, define 𝜒(𝑇, 𝒬) = 𝔼_{(𝜇0 ,𝜇1 )∼𝒬} [𝜒(𝑇, (𝜇0 , 𝜇1 ))].

3.1   Comparison with other query measures
The definition of conflict complexity first appeared in [16], one of the two papers that were
merged into the present article (the other being [9]). Li [13] analyzed this
definition and showed that the conflict complexity of a total Boolean function 𝑔 is at least the
block sensitivity of 𝑔. Here we show that the max-conflict complexity of a (partial) function 𝑔
is at least as large as the sabotage complexity of 𝑔. For a total Boolean function 𝑔, Ben-David
and Kothari [7] show that the sabotage complexity of 𝑔 is at least as large as the fractional block
sensitivity of 𝑔 [1, 17, 11], which in turn is at least as large as the block sensitivity. They also show
examples where the sabotage complexity is much larger than the partition bound, quantum
query complexity and approximate polynomial degree, thus the same holds for max-conflict
complexity as well.
    We first need the following simple claim. Let 𝛿 𝑥 be the probability distribution that puts
weight 1 on the string 𝑥.
Claim 3.4. Let 𝑇 be a deterministic tree computing the partial function 𝑔 and let 𝑥, 𝑦 be such that
𝑔(𝑥) = 0, 𝑔(𝑦) = 1. Then
                                  𝜒(𝑇, (𝛿 𝑥 , 𝛿 𝑦 )) = sep𝑇 (𝑥, 𝑦) .


Proof. Let 𝑣1 be the root of 𝑇, and let 𝑣1 , 𝑣2 , . . . , 𝑣𝑡 be the longest sequence of vertices in 𝑇 that
are visited by the computations on both 𝑥 and 𝑦, i. e., 𝑥_{𝑞(𝑣𝑡 )} ≠ 𝑦_{𝑞(𝑣𝑡 )}. Since 𝑇 computes 𝑔, 𝑣𝑡 is not a leaf. For each
𝑖 = 1, . . . , 𝑡 − 1 we see that Δ(𝑣𝑖 ) = 0, while Δ(𝑣𝑡 ) = 1. Also 𝑅(𝑣𝑖 ) = 1 for each 𝑖 = 1, . . . , 𝑡, while
𝑅(𝑣) = 0 for any other vertex. Thus Σ_{𝑣∈𝑇} 𝑑𝑇 (𝑣)Δ(𝑣)𝑅(𝑣) = 𝑡 = sep_𝑇 (𝑥, 𝑦).                              □

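The quantity sep_𝑇 (𝑥, 𝑦) — the number of queries 𝑇 makes before the computations on 𝑥 and 𝑦 first diverge — can be sketched directly (a Python illustration under our own naming, with the same tree representation as before):

```python
def sep(tree, x, y):
    """Number of queries until inputs x and y first separate in `tree`.
    Nodes are triples (q, zero_child, one_child); leaves are labels."""
    t, v = 0, tree
    while isinstance(v, tuple):
        q, t0, t1 = v
        t += 1                       # one more common query made
        if x[q] != y[q]:
            return t                 # the queried bit differs: separation
        v = t0 if x[q] == "0" else t1
    raise ValueError("x and y reach the same leaf; tree cannot compute g")

xor_tree = (0, (1, 0, 1), (1, 1, 0))   # query bit 0, then bit 1
```

For the XOR tree, `sep(xor_tree, "00", "01")` is 2 (the inputs agree on bit 0 and separate at bit 1), while `sep(xor_tree, "01", "11")` is 1, in line with the path-counting argument in the proof of Claim 3.4.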
Theorem 3.5. Let 𝑔 ⊆ {0, 1}^𝑚 × {0, 1} be a partial Boolean function. Then 𝜒̄(𝑔) ≥ ℝ𝕊(𝑔).

Proof. By Theorem B.1 (Appendix B),

                                     ℝ𝕊(𝑔) = max_𝑝 min_𝑇 𝔼_{(𝑥,𝑦)∼𝑝} [sep_𝑇 (𝑥, 𝑦)] .

By definition of max-conflict complexity we have

                                   𝜒̄(𝑔) = sup_𝒬 min_𝑇 𝔼_{(𝜇0 ,𝜇1 )∼𝒬} [𝜒(𝑇, (𝜇0 , 𝜇1 ))] .


The distribution 𝑝 in sabotage complexity is a special case of 𝒬 where all the pairs of distributions
in the support are singleton distributions. Note that 𝑝 has finite support. The theorem now
follows from Claim 3.4.                                                                             


4    Query process
We now come to the most important definition of the paper, that of the query process 𝒫(ℬ, 𝒬).
Let 𝑡 > 0 be any integer and ℬ be any deterministic query algorithm that runs on inputs in
({0, 1}^𝑚 )^𝑡 . Let 𝑥 = (𝑥_𝑖^{(𝑗)} )_{𝑖=1,...,𝑡, 𝑗=1,...,𝑚} be a generic input to ℬ, and let 𝑥_𝑖 stand for (𝑥_𝑖^{(𝑗)} )_{𝑗=1,...,𝑚} . For a
vertex 𝑣 of ℬ, let 𝑣^{(𝑖)} denote the projection of the subcube 𝑣 to 𝑥_𝑖 , i. e., 𝑣 = 𝑣^{(1)} × . . . × 𝑣^{(𝑡)} . Recall
from Section 2 that 𝑣_𝑏 stands for the child of 𝑣 corresponding to the query outcome being 𝑏, for
𝑏 ∈ {0, 1}.
    The query process 𝒫(ℬ, 𝒬) runs on an input 𝑧 ∈ {0, 1}^𝑡 and uses the BITSAMPLER routine
(Algorithm 1) to simulate the queries of ℬ to 𝑥 when it can. This process is the heart of
how we will transform an algorithm for 𝑓 ◦ 𝑔^𝑛 into a query-efficient algorithm for 𝑓 .

Definition 4.1 (Query process 𝒫(ℬ, 𝒬)). Let ℬ be a decision tree that runs on inputs from
({0, 1} 𝑚 )𝑡 . Let 𝒬 be a consistent probability distribution with finite support over pairs of
distributions (𝜇0 , 𝜇1 ). The query process 𝒫(ℬ, 𝒬) is run on an input 𝑧 ∈ {0, 1} 𝑡 and is defined
by Algorithm 2.

     A few comments about Definition 4.1 are in order. First, we think of ℬ and 𝒫 as query procedures that
query input variables and terminate. In particular, they do not have to produce outputs, i. e.,
their leaves do not have to be labeled. Second, note that in Algorithm 2 the segment from line 9
to line 19 corresponds to the BITSAMPLER procedure in Algorithm 1. Queries to the input bits 𝑧_𝑖
are made in line 15, which corresponds to step 9 of BITSAMPLER. Finally, the variables ℕ_𝑖 are not
directly used by the algorithm, but are used in its analysis.
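In isolation, the branch structure of lines 9–19 of Algorithm 2 amounts to the following one-node decision (a Python sketch with our own names; `p0` and `p1` are the conditional probabilities that the queried bit equals 0 under 𝜇0 and 𝜇1 ):

```python
def bitsampler_step(p0, p1, z, r):
    """One node of the BITSAMPLER simulation (a sketch for illustration,
    not the paper's code).  z is the hidden input bit and r is a fresh
    uniform sample from [0, 1].  Returns (answered_bit, queried_z)."""
    lo, hi = min(p0, p1), max(p0, p1)
    if r < lo:
        return 0, False      # both distributions agree: answer 0 for free
    if r > hi:
        return 1, False      # both distributions agree: answer 1 for free
    # the distributions conflict on [lo, hi]: query z, answer as mu_z would
    pz = p0 if z == 0 else p1
    return (0 if r <= pz else 1), True
```

Either way the answered bit equals 0 with probability exactly `pz`, which is the single-node content of the Simulation Theorem (Theorem 4.2) below; a query to 𝑧 is paid only when the two distributions disagree on the outcome.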



  Algorithm 2: 𝒫(ℬ, 𝒬)
    Input: 𝑧 = (𝑧_1 , . . . , 𝑧_𝑡 ) ∈ {0, 1}^𝑡 .
  1 for 1 ≤ 𝑘 ≤ 𝑡 do
  2     QUERY_𝑘 ← 0.                                              // Indicates if 𝑧_𝑘 is queried.
  3     ℕ_𝑘 ← 0.                                  // Counts references to 𝑥_𝑘 till 𝑧_𝑘 is queried.
  4     Sample (𝜇0^{(𝑘)} , 𝜇1^{(𝑘)} ) from 𝒬.
  5 𝑣 ← Root of ℬ.                                                // Corresponds to ({0, 1}^𝑚 )^𝑡 .
  6 while 𝑣 is not a leaf of ℬ do
  7     Let 𝑞(𝑣) = (𝑖, 𝑗), the 𝑗-th coordinate of 𝑥_𝑖 .
  8     if QUERY_𝑖 = 0 then
  9         Sample a fresh real number 𝑟 ∼ [0, 1] uniformly at random.
 10         if 𝑟 < min_𝑏 Pr_{𝑥_𝑖 ∼𝜇_𝑏^{(𝑖)}} [𝑥_𝑖^{(𝑗)} = 0 | 𝑥_𝑖 ∈ 𝑣^{(𝑖)} ] then
 11             𝑣 ← 𝑣_0 .
 12         else if 𝑟 > max_𝑏 Pr_{𝑥_𝑖 ∼𝜇_𝑏^{(𝑖)}} [𝑥_𝑖^{(𝑗)} = 0 | 𝑥_𝑖 ∈ 𝑣^{(𝑖)} ] then
 13             𝑣 ← 𝑣_1 .
 14         else
 15             Query 𝑧_𝑖 . QUERY_𝑖 ← 1.
 16             if 𝑟 ≤ Pr_{𝑥_𝑖 ∼𝜇_{𝑧_𝑖}^{(𝑖)}} [𝑥_𝑖^{(𝑗)} = 0 | 𝑥_𝑖 ∈ 𝑣^{(𝑖)} ] then
 17                 𝑣 ← 𝑣_0 .
 18             else
 19                 𝑣 ← 𝑣_1 .
 20         ℕ_𝑖 ← ℕ_𝑖 + 1.
 21     else
 22         𝑏 ← 1 with probability Pr_{𝑥_𝑖 ∼𝜇_{𝑧_𝑖}^{(𝑖)}} [𝑥_𝑖^{(𝑗)} = 1 | 𝑥_𝑖 ∈ 𝑣^{(𝑖)} ],
            and 𝑏 ← 0 with probability Pr_{𝑥_𝑖 ∼𝜇_{𝑧_𝑖}^{(𝑖)}} [𝑥_𝑖^{(𝑗)} = 0 | 𝑥_𝑖 ∈ 𝑣^{(𝑖)} ].
 23         𝑣 ← 𝑣_𝑏 .

    We now present an important structural result about 𝒫(ℬ, 𝒬). In particular, it formally
proves that the BITSAMPLER procedure given in Algorithm 1 samples the bits from the right
distribution. The theorem should be intuitively clear, but we present a formal proof for
completeness. Recall the definition of 𝛾_𝑧 (𝒬) from Section 1.3.

Theorem 4.2 (Simulation Theorem). Let ℬ be a deterministic decision tree running on inputs from
({0, 1} 𝑚 )𝑡 , and let 𝑣 be a vertex in ℬ. Let 𝐴 𝑧 (𝑣, 𝒬) be the event that 𝒫(ℬ, 𝒬), when run on 𝑧, reaches
node 𝑣. Let 𝐵 𝑧 (𝑣, 𝒬) be the event that for a random input 𝑥 sampled from 𝛾𝑧 (𝒬), the computation of ℬ


reaches 𝑣. Then for every 𝑧 ∈ {0, 1} 𝑡 and each vertex 𝑣 of ℬ,

                                                   Pr[𝐴 𝑧 (𝑣, 𝒬)] = Pr[𝐵 𝑧 (𝑣, 𝒬)] .

Proof. To save writing, we fix 𝑧 ∈ {0, 1}^𝑡 and 𝒬, and let 𝐴(𝑣) := 𝐴_𝑧 (𝑣, 𝒬) be the event that
𝒫(ℬ, 𝒬) reaches node 𝑣 on input 𝑧, and 𝐵(𝑣) := 𝐵_𝑧 (𝑣, 𝒬) be the event that ℬ reaches node 𝑣
under the distribution 𝛾_𝑧 (𝒬). Additionally, we write (𝜇0 , 𝜇1 ) = ((𝜇0^{(1)} , 𝜇1^{(1)} ), . . . , (𝜇0^{(𝑡)} , 𝜇1^{(𝑡)} )) for a
𝑡-tuple of pairs of distributions. In the following, when we write 𝔼_{(𝜇0 ,𝜇1 )∼𝒬^𝑡} the expectation is
taken with respect to drawing each (𝜇0^{(𝑖)} , 𝜇1^{(𝑖)} ) independently from 𝒬.
   Now notice that Pr[𝐴(𝑣)] = 𝔼_{(𝜇0 ,𝜇1 )∼𝒬^𝑡} Pr[𝐴(𝑣) | (𝜇0 , 𝜇1 )] and Pr[𝐵(𝑣)] = 𝔼_{(𝜇0 ,𝜇1 )∼𝒬^𝑡}
Pr[𝐵(𝑣) | (𝜇0 , 𝜇1 )]. We prove by induction on 𝑑(𝑣), the depth of a node 𝑣, that

                                             Pr[𝐴(𝑣) | (𝜇0 , 𝜇1 )] = Pr[𝐵(𝑣) | (𝜇0 , 𝜇1 )]                                         (4.1)

for any (𝜇0 , 𝜇1 ). This will give the claim.
    Towards the aim of showing (4.1), fix an arbitrary (𝜇0 , 𝜇1 ).

Base case: 𝑑(𝑣) = 1, i. e., 𝑣 is the root of ℬ. Thus Pr[𝐴(𝑣) | (𝜇0 , 𝜇1 )] = Pr[𝐵(𝑣) | (𝜇0 , 𝜇1 )] = 1.

Inductive step: Assume that 𝑑(𝑣) ≥ 2, and that the statement is true for all vertices of depth at
     most 𝑑(𝑣) − 1. Since 𝑑(𝑣) ≥ 2, 𝑣 is not the root of ℬ. Let 𝑢 = 𝑢^{(1)} × . . . × 𝑢^{(𝑡)} be the parent of
     𝑣, and say variable 𝑥_𝑖^{(𝑗)} is queried at 𝑢. Without loss of generality we assume that 𝑣 = 𝑢_0 .
     We split the proof into the following two cases.

         • Case 1: Pr_{𝑥_𝑖 ∼𝜇_{𝑧_𝑖}^{(𝑖)}} [𝑥_𝑖^{(𝑗)} = 0 | 𝑥_𝑖 ∈ 𝑢^{(𝑖)} ] ≤ Pr_{𝑥_𝑖 ∼𝜇_{1−𝑧_𝑖}^{(𝑖)}} [𝑥_𝑖^{(𝑗)} = 0 | 𝑥_𝑖 ∈ 𝑢^{(𝑖)} ].

            Conditioned on 𝐴(𝑢), (𝜇0 , 𝜇1 ) and QUERY_𝑖 = 1, the probability that 𝒫 reaches 𝑣 is
            Pr_{𝑥_𝑖 ∼𝜇_{𝑧_𝑖}^{(𝑖)}} [𝑥_𝑖^{(𝑗)} = 0 | 𝑥_𝑖 ∈ 𝑢^{(𝑖)} ]. Also, conditioned on 𝐴(𝑢), (𝜇0 , 𝜇1 ) and QUERY_𝑖 = 0,
            the probability that 𝒫 reaches 𝑣 is exactly equal to the probability that the real
            number 𝑟 sampled at 𝑢 lies in [0, Pr_{𝑥_𝑖 ∼𝜇_{𝑧_𝑖}^{(𝑖)}} [𝑥_𝑖^{(𝑗)} = 0 | 𝑥_𝑖 ∈ 𝑢^{(𝑖)} ]], which is equal to
            Pr_{𝑥_𝑖 ∼𝜇_{𝑧_𝑖}^{(𝑖)}} [𝑥_𝑖^{(𝑗)} = 0 | 𝑥_𝑖 ∈ 𝑢^{(𝑖)} ]. Thus,

                         Pr[𝐴(𝑣) | (𝜇0 , 𝜇1 )] = Pr[𝐴(𝑢) | (𝜇0 , 𝜇1 )] · Pr[𝐴(𝑣) | 𝐴(𝑢), (𝜇0 , 𝜇1 )]
                                               = Pr[𝐴(𝑢) | (𝜇0 , 𝜇1 )] · Pr_{𝑥_𝑖 ∼𝜇_{𝑧_𝑖}^{(𝑖)}} [𝑥_𝑖^{(𝑗)} = 0 | 𝑥_𝑖 ∈ 𝑢^{(𝑖)} ].                (4.2)


            Now condition on 𝐵(𝑢) and (𝜇0 , 𝜇1 ). The probability that ℬ reaches 𝑣 is exactly
            equal to the probability that 𝑥_𝑖^{(𝑗)} = 0 when 𝑥 is sampled according to the distribution
            𝛾_𝑧 ((𝜇0 , 𝜇1 )) conditioned on the event that 𝑥 ∈ 𝑢. Note that in the distribution
            𝛾_𝑧 ((𝜇0 , 𝜇1 )), the 𝑥_𝑘 ’s are independently distributed. Thus,

                        Pr[𝐵(𝑣) | (𝜇0 , 𝜇1 )] = Pr[𝐵(𝑢) | (𝜇0 , 𝜇1 )] · Pr[𝐵(𝑣) | 𝐵(𝑢), (𝜇0 , 𝜇1 )]
                                              = Pr[𝐵(𝑢) | (𝜇0 , 𝜇1 )] · Pr_{𝑥_𝑖 ∼𝜇_{𝑧_𝑖}^{(𝑖)}} [𝑥_𝑖^{(𝑗)} = 0 | 𝑥_𝑖 ∈ 𝑢^{(𝑖)} ].      (4.3)


            By the inductive hypothesis, Pr[𝐴(𝑢) | (𝜇0 , 𝜇1 )] = Pr[𝐵(𝑢) | (𝜇0 , 𝜇1 )]. It follows
            from (4.2) and (4.3) that Pr[𝐴(𝑣) | (𝜇0 , 𝜇1 )] = Pr[𝐵(𝑣) | (𝜇0 , 𝜇1 )].

         • Case 2: Pr_{𝑥_𝑖 ∼𝜇_{𝑧_𝑖}^{(𝑖)}} [𝑥_𝑖^{(𝑗)} = 0 | 𝑥_𝑖 ∈ 𝑢^{(𝑖)} ] > Pr_{𝑥_𝑖 ∼𝜇_{1−𝑧_𝑖}^{(𝑖)}} [𝑥_𝑖^{(𝑗)} = 0 | 𝑥_𝑖 ∈ 𝑢^{(𝑖)} ]. Let 𝑣′ = 𝑢_1 . By an
            argument similar to Case 1, we have that

                                           Pr[𝐴(𝑣′) | (𝜇0 , 𝜇1 )] = Pr[𝐵(𝑣′) | (𝜇0 , 𝜇1 )].                   (4.4)
            Now,
                              Pr[𝐴(𝑣) | (𝜇0 , 𝜇1 )] = Pr[𝐴(𝑢) | (𝜇0 , 𝜇1 )] − Pr[𝐴(𝑣′) | (𝜇0 , 𝜇1 )]
                                                    = Pr[𝐵(𝑢) | (𝜇0 , 𝜇1 )] − Pr[𝐴(𝑣′) | (𝜇0 , 𝜇1 )]
                                                           (By the inductive hypothesis)
                                                    = Pr[𝐵(𝑢) | (𝜇0 , 𝜇1 )] − Pr[𝐵(𝑣′) | (𝜇0 , 𝜇1 )]
                                                           (By (4.4))
                                                    = Pr[𝐵(𝑣) | (𝜇0 , 𝜇1 )] .

                                                                                                                
    We will be interested in the number of queries 𝒫(ℬ, 𝒬) is able to simulate before making a
query to 𝑧_𝑖 . To this end, let the random variable 𝒩_𝑖 (ℬ, 𝑧, 𝒬) stand for the value of the variable
ℕ_𝑖 in Algorithm 2 after the termination of 𝒫(ℬ, 𝒬) on input 𝑧. Note that 𝒩_𝑖 depends on the
randomness in the choices of 𝑟 (step 9) and also on the randomness in 𝒬 in the choice of the
distributions (𝜇0^{(𝑘)} , 𝜇1^{(𝑘)} ) (step 4).

4.1   Relating 𝒫(ℬ, 𝒬) to max-conflict complexity
A key to our composition theorem will be relating the number of simulated queries made by
𝒫(ℬ, 𝒬) to max-conflict complexity, which we do in this section. Let ℬ be a query algorithm
taking inputs from {0, 1} 𝑚 . In this case, 𝒩1 (ℬ, 1, 𝒬) = 𝒩1 (ℬ, 0, 𝒬). This is because the behavior
of 𝒫(ℬ, 𝒬) on input 0 is exactly the same as the behavior on input 1 before a query to 𝑧 is made,
and after 𝑧 is queried the value of ℕ𝑖 does not change.
Claim 4.3. Let ℬ be an algorithm taking inputs from {0, 1} 𝑚 . Then (ℬ, 𝒬) is FULL if and only if
𝒫(ℬ, 𝒬) queries 𝑧 with probability 1. If (ℬ, 𝒬) is FULL then
                                           𝜒(ℬ, 𝒬) = 𝔼[𝒩1 (ℬ, 1, 𝒬)]


Proof. Note that until 𝑧 is queried, 𝒫(ℬ, (𝜇0 , 𝜇1 )) exactly executes the random walk described
in Section 3, and querying 𝑧 in 𝒫(ℬ, (𝜇0 , 𝜇1 )) corresponds to this random walk terminating.
The first part of the claim then follows as 𝒫(ℬ, 𝒬) queries 𝑧 with probability 1 if and only if
𝒫(ℬ, (𝜇0 , 𝜇1 )) queries 𝑧 with probability 1 for every (𝜇0 , 𝜇1 ) ∈ supp(𝒬).
    Also, because 𝒫(ℬ, (𝜇0 , 𝜇1 )) exactly executes the random walk described in Section 3, we
see that 𝜒(ℬ, (𝜇0 , 𝜇1 )) = 𝔼[𝒩1 (ℬ, 1, (𝜇0 , 𝜇1 ))]. The second part of the claim follows by taking
the expectation of this equality over (𝜇0 , 𝜇1 ) ∼ 𝒬.                                              
    The correspondence of Claim 4.3 prompts us to define FULL in a more general setting.
Definition 4.4 (FULL). Let ℬ be a query algorithm taking inputs from ({0, 1} 𝑚 )𝑡 . The pair
(ℬ, 𝒬) is said to be FULL if for every 𝑧 ∈ {0, 1} 𝑡 it holds that 𝒫(ℬ, 𝒬) queries 𝑧 𝑖 with probability
1, for every 𝑖 = 1, . . . , 𝑡.


5    The Composition Theorem
In this section we prove Theorem 1.5 (restated below).
Theorem 5.1. For any 𝑆-relation 𝑓 ⊆ {0, 1}^𝑛 × 𝑆 and any partial Boolean function 𝑔 ⊆ {0, 1}^𝑚 × {0, 1},

                                      ℝ1/3 ( 𝑓 ◦ 𝑔^𝑛 ) ∈ Ω( ℝ4/9 ( 𝑓 ) · 𝜒̄(𝑔) ).
    Our proof will make use of the following direct sum theorem.
Theorem 5.2. Let ℬ be a query algorithm acting on inputs from ({0, 1} 𝑚 )𝑡 . Let 𝒬 be a consistent
distribution with finite support over pairs of distributions (𝜇0 , 𝜇1 ) on 𝑚-bit strings. If (ℬ, 𝒬) is FULL
then for any 𝑧 ∈ {0, 1}^𝑡,

                                   Σ_{𝑖=1}^{𝑡} 𝔼[𝒩_𝑖 (ℬ, 𝑧, 𝒬)] ≥ 𝑡 · min_𝒞 𝜒(𝒞, 𝒬) ,
where the minimum is taken over deterministic trees 𝒞 acting on inputs from {0, 1} 𝑚 such that (𝒞, 𝒬) is
FULL.
Proof of Theorem 5.2. Towards a contradiction, assume that

                       Σ_{𝑖=1}^{𝑡} 𝔼[𝒩_𝑖 (ℬ, 𝑧, 𝒬)] < 𝑡 · min_𝒞 𝔼_{(𝜇0 ,𝜇1 )∼𝒬} [𝜒(𝒞, (𝜇0 , 𝜇1 ))] .                          (5.1)
By averaging, there exists a 𝑘 such that 𝔼[𝒩_𝑘 (ℬ, 𝑧, 𝒬)] < min_𝒞 𝔼_{(𝜇0 ,𝜇1 )∼𝒬} [𝜒(𝒞, (𝜇0 , 𝜇1 ))].
Let us focus on the expression on the left-hand side. Recall that there are two
kinds of randomness in this expectation, the choice of the random numbers 𝑟 in 𝒫(ℬ, 𝒬) and
the choice of (𝜇0 , 𝜇1 ) ∼ 𝒬^𝑡 . We separate out these two as follows:

                 𝔼[𝒩_𝑘 (ℬ, 𝑧, 𝒬)] = 𝔼_{(𝜇0 ,𝜇1 )∼𝒬^𝑡} 𝔼_𝑟 [𝒩_𝑘 (ℬ, 𝑧, (𝜇0 , 𝜇1 ))]
                                  = 𝔼_𝑟 𝔼_{(𝜇0 ,𝜇1 )∼𝒬^𝑡} [𝒩_𝑘 (ℬ, 𝑧, (𝜇0 , 𝜇1 ))]
                                  = 𝔼_𝑟 𝔼_{(𝜇0 ,𝜇1 )^{−(𝑘)} ∼𝒬^{𝑡−1}} 𝔼_{(𝜇0^{(𝑘)} ,𝜇1^{(𝑘)} )∼𝒬} [𝒩_𝑘 (ℬ, 𝑧, (𝜇0 , 𝜇1 ))] ,





where (𝜇0 , 𝜇1 )^{−(𝑘)} is the (𝑡 − 1)-tuple of pairs of distributions without the 𝑘-th coordinate. This
further means that there is a fixing of the randomness 𝑟 and of the (𝑡 − 1)-tuple of pairs of distributions
(𝜇0 , 𝜇1 )^{−(𝑘)} such that 𝔼_{(𝜇0^{(𝑘)} ,𝜇1^{(𝑘)} )∼𝒬} [𝒩_𝑘 (ℬ, 𝑧, (𝜇0 , 𝜇1 ))] < min_𝒞 𝔼_{(𝜇0 ,𝜇1 )∼𝒬} [𝜒(𝒞, (𝜇0 , 𝜇1 ))]. With such
a fixed setting, however, 𝒫(ℬ, 𝒬) creates a query process equivalent to 𝒫(ℬ′, 𝒬) run on 𝑧_𝑘 ∈ {0, 1}
for a deterministic query algorithm ℬ′ running on inputs from {0, 1}^𝑚 and such that (ℬ′, 𝒬) is
FULL. The distribution of 𝒩_1 (ℬ′, 1, (𝜇0^{(𝑘)} , 𝜇1^{(𝑘)} )) is the same as that of 𝒩_𝑘 (ℬ, 𝑧,
((𝜇0 , 𝜇1 )^{−(𝑘)} , (𝜇0^{(𝑘)} , 𝜇1^{(𝑘)} ))) conditioned on the earlier fixing of (𝜇0 , 𝜇1 )^{−(𝑘)} and the randomness 𝑟.
Thus 𝔼_{(𝜇0 ,𝜇1 )∼𝒬} [𝜒(ℬ′, (𝜇0 , 𝜇1 ))] < min_𝒞 𝔼_{(𝜇0 ,𝜇1 )∼𝒬} [𝜒(𝒞, (𝜇0 , 𝜇1 ))], a contradiction.           □
Proof of Theorem 1.5. We shall prove that for each distribution 𝜂 on the inputs to 𝑓 , there is a
randomized query algorithm 𝒜 making at most 18 ℝ1/3 ( 𝑓 ◦ 𝑔^𝑛 )/𝜒̄(𝑔) queries in the worst case, for
which Pr_{𝑧∼𝜂} [(𝑧, 𝒜(𝑧)) ∈ 𝑓 ] ≥ 5/9 holds. 𝒜 can be made deterministic with the same complexity
and accuracy guarantees by appropriately fixing its randomness. This will imply the theorem
by the minimax principle (Fact 2.4). To this end let us fix a distribution 𝜂 over {0, 1}^𝑛 .
    Let 𝒬 be a distribution with finite support which is consistent with 𝑔, such that for any
deterministic decision tree 𝒞 computing 𝑔 we have 𝜒(𝒞, 𝒬) ≥ 𝜒̄(𝑔) − 𝜖, where 𝜖 is to be set later.
We will use the distributions 𝜂 and 𝒬 to set up a distribution 𝛾_𝜂 over the input space of 𝑓 ◦ 𝑔^𝑛 . This
distribution is defined as follows:
   1. Sample 𝑧 = (𝑧_1 , . . . , 𝑧_𝑛 ) from 𝜂.

   2. Sample (𝜇0^{(𝑖)} , 𝜇1^{(𝑖)} ) independently from 𝒬 for 𝑖 = 1, . . . , 𝑛.

   3. Sample 𝑥_𝑖 from 𝜇_{𝑧_𝑖}^{(𝑖)} for 𝑖 = 1, . . . , 𝑛. Return 𝑥 = (𝑥_1 , . . . , 𝑥_𝑛 ).
Recall from Section 1.3 the observation that for each 𝑧, 𝑥 sampled as above, and for each 𝑠 ∈ 𝑆,
(𝑧, 𝑠) ∈ 𝑓 if and only if (𝑥, 𝑠) ∈ 𝑓 ◦ 𝑔^𝑛 .
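The three sampling steps above can be sketched directly (a Python illustration; the sampler interface with three caller-supplied callables is our own assumption, not the paper's):

```python
def sample_gamma_eta(eta_sample, Q_sample, mu_sample, n):
    """Draw one input x ~ gamma_eta for f o g^n: z from eta, one pair
    (mu0, mu1) per block independently from Q, then x_i from mu_{z_i}."""
    z = eta_sample()
    x = []
    for i in range(n):
        mu0, mu1 = Q_sample()                     # fresh pair for block i
        x.append(mu_sample(mu1 if z[i] == 1 else mu0))
    return z, x

# With deterministic stubs the plumbing is visible at a glance:
z, x = sample_gamma_eta(lambda: [0, 1], lambda: ("mu0", "mu1"), lambda m: m, 2)
```

Here the stubbed call returns `z == [0, 1]` and `x == ["mu0", "mu1"]`: block 1 is drawn from the 0-side distribution and block 2 from the 1-side distribution, as the definition of 𝛾_𝜂 prescribes.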
    Assume that ℝ1/3 ( 𝑓 ◦ 𝑔^𝑛 ) = 𝑐. The minimax principle (Fact 2.4) implies that there is a
deterministic query algorithm 𝒜′ for inputs from ({0, 1}^𝑚 )^𝑛 , that makes at most 𝑐 queries in
the worst case, such that Pr_{𝑥∼𝛾_𝜂} [(𝑥, 𝒜′(𝑥)) ∈ 𝑓 ◦ 𝑔^𝑛 ] ≥ 2/3. We will first use 𝒜′ to construct a
randomized algorithm 𝑇 for 𝑓 whose accuracy under the distribution 𝜂 is as desired and which,
for every input 𝑧, makes few queries in expectation. 𝑇 is described in Algorithm 3.

  Algorithm 3: 𝑇
    Input: 𝑧 ∈ {0, 1}^𝑛
  1 Run 𝒫(𝒜′, 𝒬) on 𝑧.
  2 Return the output of 𝒜′.


    First we bound the probability of error of 𝑇. By Theorem 4.2, we have that Pr[(𝑧, 𝑇(𝑧)) ∈
𝑓 ] = Pr_{𝑥∼𝛾_𝑧 (𝒬)} [(𝑥, 𝒜′(𝑥)) ∈ 𝑓 ◦ 𝑔^𝑛 ] for each 𝑧 ∈ {0, 1}^𝑛 . Thus, Pr_{𝑧∼𝜂} [(𝑧, 𝑇(𝑧)) ∈ 𝑓 ] =
Pr_{𝑥∼𝛾_𝜂} [(𝑥, 𝒜′(𝑥)) ∈ 𝑓 ◦ 𝑔^𝑛 ] ≥ 2/3.
    Next, we bound the worst-case expected number of queries made by 𝑇.


Claim 5.3. The expected number of queries made by 𝑇 on each input 𝑧 is at most 2𝑐/𝜒̄(𝑔).

Proof. Fix an input 𝑧 ∈ {0, 1}^𝑛 . For each leaf ℓ = ℓ^{(1)} × . . . × ℓ^{(𝑛)} of 𝒜′ and for each 𝑖 = 1, . . . , 𝑛
define ℰ′_{𝑖,ℓ} to be the event that the computation of 𝒫(𝒜′, 𝒬) finishes at ℓ with QUERY_𝑖 = 0. For
𝑖 = 1, . . . , 𝑛 define ℱ′_𝑖 to be the event that QUERY_𝑖 is set to 1 in 𝒫(𝒜′, 𝒬). Let 𝒬^𝑛 stand for the
(product) distribution of 𝑛 pairs of probability distributions each independently sampled from
𝒬. For 𝑖, ℓ such that 𝑔 is not constant on ℓ^{(𝑖)} , let 𝒟_{𝑖,ℓ} be the distribution given by the following
sampling procedure:

   1. Sample (𝜇0^{(1)} , 𝜇1^{(1)} ), . . . , (𝜇0^{(𝑛)} , 𝜇1^{(𝑛)} ) from 𝒬^𝑛 conditioned on ℰ′_{𝑖,ℓ} ,

   2. return (𝜇0^{(𝑖)} | ℓ^{(𝑖)} , 𝜇1^{(𝑖)} | ℓ^{(𝑖)} ).
Let ℬ_{𝑖,ℓ} be an optimal tree for 𝒟_{𝑖,ℓ} , i. e., 𝜒(ℬ_{𝑖,ℓ} , 𝒟_{𝑖,ℓ} ) = min_𝒞 𝜒(𝒞, 𝒟_{𝑖,ℓ} ), where the minimization
is over all algorithms 𝒞 that output 0 on supp0 (𝒟_{𝑖,ℓ} ) and output 1 on supp1 (𝒟_{𝑖,ℓ} ). Now, consider
the query algorithm 𝐻 defined in Algorithm 4. Note that (𝐻, 𝒬) is FULL. Next consider a

  Algorithm 4: 𝐻
    Input: 𝑥 ∈ ({0, 1}^𝑚 )^𝑛
  1 Run 𝒜′ on 𝑥.
  2 Let 𝒜′ terminate at leaf ℓ = ℓ^{(1)} × . . . × ℓ^{(𝑛)} .
  3 for 1 ≤ 𝑖 ≤ 𝑛 do
  4     if 𝑔 is not constant on ℓ^{(𝑖)} then
  5         Run ℬ_{𝑖,ℓ} on 𝑥_𝑖 .

run of the query process 𝒫(𝐻, 𝒬) on input 𝑧. Theorem 5.2 implies that Σ_{𝑖=1}^{𝑛} 𝔼[𝒩_𝑖 (𝐻, 𝑧, 𝒬)] ≥
𝑛 · min_𝒞 𝜒(𝒞, 𝒬) ≥ 𝑛 · (𝜒̄(𝑔) − 𝜖), by the choice of 𝒬.
    Let ℱ_𝑖 be the event that QUERY_𝑖 is set to 1 in 𝒫(𝐻, 𝒬) when it reaches a leaf of 𝒜′, and
for each leaf ℓ of 𝒜′ let ℰ_{𝑖,ℓ} be the event that 𝒫(𝐻, 𝒬) reaches ℓ and QUERY_𝑖 = 0 when it does.
Observe that for each 𝑖 = 1, . . . , 𝑛, the events {ℱ_𝑖 , (ℰ_{𝑖,ℓ} )_ℓ } are mutually exclusive and exhaustive.
    We have that

    𝑛 · (𝜒̄(𝑔) − 𝜖) ≤ Σ_{𝑖=1}^{𝑛} 𝔼[𝒩_𝑖 (𝐻, 𝑧, 𝒬)]
                   = Σ_{𝑖=1}^{𝑛} Σ_ℓ Pr[ℰ_{𝑖,ℓ} ] · 𝔼[𝒩_𝑖 (𝐻, 𝑧, 𝒬) | ℰ_{𝑖,ℓ} ] + Σ_{𝑖=1}^{𝑛} Pr[ℱ_𝑖 ] · 𝔼[𝒩_𝑖 (𝐻, 𝑧, 𝒬) | ℱ_𝑖 ]    (5.2)

Let 𝑑_𝑖 (ℓ ) be the number of queries into 𝑥_𝑖 made on the unique path from the root of 𝒜′ to ℓ . Now,
condition on the 𝑛 pairs of distributions (𝜇0^{(𝑗)} , 𝜇1^{(𝑗)} )_{𝑗=1,...,𝑛} that are used in 𝒫(𝐻, 𝒬). We have that

     𝔼[𝒩_𝑖 (𝐻, 𝑧, 𝒬) | ℰ_{𝑖,ℓ} , (𝜇0^{(𝑗)} , 𝜇1^{(𝑗)} )_{𝑗=1,...,𝑛} ] = 𝑑_𝑖 (ℓ ) + 𝔼[𝒩_1 (ℬ_{𝑖,ℓ} , 𝑧_𝑖 , (𝜇0^{(𝑖)} | ℓ^{(𝑖)} , 𝜇1^{(𝑖)} | ℓ^{(𝑖)} ))].   (5.3)



Averaging over (𝜇0^(𝑗), 𝜇1^(𝑗))_{𝑗=1,...,𝑛} we have from (5.3) that

        𝔼[𝒩𝑖(𝐻, 𝑧, 𝒬) | ℰ𝑖,ℓ] = 𝑑𝑖(ℓ^(𝑖)) + 𝔼[𝒩1(ℬ𝑖,ℓ, 𝑧𝑖, 𝒟𝑖,ℓ)]
                              = 𝑑𝑖(ℓ^(𝑖)) + min_𝒞 𝜒(𝒞, 𝒟𝑖,ℓ)    (by the choice of ℬ𝑖,ℓ)
                              ≤ 𝑑𝑖(ℓ^(𝑖)) + 𝜒̄(𝑔).    (5.4)
Observing that Σ_ℓ Pr[ℰ𝑖,ℓ] = 1 − Pr[ℱ𝑖], we have from (5.2) and (5.4) that

        𝑛 · (𝜒̄(𝑔) − 𝜖) ≤ Σ_{𝑖=1}^{𝑛} Σ_ℓ Pr[ℰ𝑖,ℓ] · (𝑑𝑖(ℓ^(𝑖)) + 𝜒̄(𝑔)) + Σ_{𝑖=1}^{𝑛} Pr[ℱ𝑖] · 𝔼[𝒩𝑖(𝐻, 𝑧, 𝒬) | ℱ𝑖]
                        = Σ_{𝑖=1}^{𝑛} (1 − Pr[ℱ𝑖]) · 𝜒̄(𝑔)
                            + Σ_{𝑖=1}^{𝑛} ( Σ_ℓ Pr[ℰ𝑖,ℓ] · 𝑑𝑖(ℓ^(𝑖)) + Pr[ℱ𝑖] · 𝔼[𝒩𝑖(𝐻, 𝑧, 𝒬) | ℱ𝑖] )

        ⇒ Σ_{𝑖=1}^{𝑛} Pr[ℱ𝑖] ≤ (1/𝜒̄(𝑔)) · Σ_{𝑖=1}^{𝑛} ( Σ_ℓ Pr[ℰ𝑖,ℓ] · 𝑑𝑖(ℓ^(𝑖)) + Pr[ℱ𝑖] · 𝔼[𝒩𝑖(𝐻, 𝑧, 𝒬) | ℱ𝑖] ) + 𝑛𝜖/𝜒̄(𝑔).    (5.5)
We will show that Σ_{𝑖=1}^{𝑛} ( Σ_ℓ Pr[ℰ𝑖,ℓ] · 𝑑𝑖(ℓ^(𝑖)) + Pr[ℱ𝑖] · 𝔼[𝒩𝑖(𝐻, 𝑧, 𝒬) | ℱ𝑖] ) ≤ 𝑐. We set 𝜖 ≤ 𝑐/𝑛. Since Σ_{𝑖=1}^{𝑛} Pr[ℱ𝑖] is exactly the expected number of queries made by 𝑇, the claim will follow from (5.5).
    Consider a run of 𝒫(𝐻, 𝒬) on input 𝑧, and let 𝑐𝑖 be a random variable denoting the number of times step 7 of Algorithm 2 (with ℬ = 𝐻) is a query into 𝑥𝑖 before a leaf of 𝒜′ is reached, for 𝑖 = 1, . . . , 𝑛. Thus Σ_{𝑖=1}^{𝑛} 𝔼[𝑐𝑖] ≤ 𝑐. Further, for each 𝑖, ℓ we have 𝑑𝑖(ℓ^(𝑖)) = 𝔼[𝑐𝑖 | ℰ𝑖,ℓ] and 𝔼[𝒩𝑖(𝐻, 𝑧, 𝒬) | ℱ𝑖] = 𝔼[𝒩𝑖(𝒜′, 𝑧, 𝒬) | ℱ𝑖] ≤ 𝔼[𝑐𝑖 | ℱ𝑖]. Thus,

        Σ_{𝑖=1}^{𝑛} ( Σ_ℓ Pr[ℰ𝑖,ℓ] · 𝑑𝑖(ℓ^(𝑖)) + Pr[ℱ𝑖] · 𝔼[𝒩𝑖(𝐻, 𝑧, 𝒬) | ℱ𝑖] )
            ≤ Σ_{𝑖=1}^{𝑛} ( Σ_ℓ Pr[ℰ𝑖,ℓ] · 𝔼[𝑐𝑖 | ℰ𝑖,ℓ] + Pr[ℱ𝑖] · 𝔼[𝑐𝑖 | ℱ𝑖] )
            = Σ_{𝑖=1}^{𝑛} 𝔼[𝑐𝑖] ≤ 𝑐.

                                                                                                       □
    Now we finish the proof of Theorem 1.5 by constructing the query algorithm 𝒜. Let 𝒜″ be the algorithm obtained by terminating 𝑇 after 18𝑐/𝜒̄(𝑔) queries. By Markov's inequality, for each


𝑧, the probability that 𝑇 makes more than 18𝑐/𝜒̄(𝑔) queries is at most 1/9. Thus 𝒜″ computes 𝑓 with probability at least 2/3 − 1/9 = 5/9 on a random input from 𝜂. Finally, 𝒜 is obtained by fixing the randomness of 𝒜″ appropriately so that the above probabilistic guarantee holds.    □


6    Conflict complexity and randomized query complexity
In this section, we will prove Theorem 1.7 (restated below). Our proof relates the conflict
complexity to the expected amount of information that is learned about the function value
through each query via Pinsker’s Inequality. At a high level, our proof is reminiscent of the result
of [4] on compressing communication protocols in that both look at a random sampling process
to navigate a tree, and relate the probability of this process needing to query or communicate at
a node to the amount of information that is learned at the node.

Theorem 6.1 (Restatement of Theorem 1.7). For any partial Boolean function 𝑔 ⊆ {0, 1}^𝑚 × {0, 1},

        𝜒(𝑔) ∈ Ω(√(ℝ1/3(𝑔))).

Proof. We will show that there exists a constant 𝜖 < 1/2 such that for each input distribution 𝜇, 𝔻^𝜇_𝜖(𝑔) ≤ 10𝜒(𝑔)². Theorem 1.7 will follow from the minimax principle (Fact 2.4), and the observation that the error can be brought down to 1/3 by a constant number of independent repetitions followed by taking the majority of the answers. It is enough to consider distributions 𝜇 supported on valid inputs of 𝑔. To this end, fix a distribution 𝜇 supported on 𝑔⁻¹(0) ∪ 𝑔⁻¹(1). Define 𝜇0 := 𝜇 | 𝑔⁻¹(0), 𝜇1 := 𝜇 | 𝑔⁻¹(1).
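The error-reduction step invoked here is standard; as a sanity check, the following sketch (Python; the error value 0.45 and the repetition counts are illustrative choices, not parameters from the paper) computes the exact error of a majority vote over independent repetitions.

```python
from math import comb

def majority_error(eps, reps):
    """Exact error probability of the majority vote of `reps` (odd)
    independent runs, each erring independently with probability `eps`:
    the vote errs iff more than half of the runs err."""
    t = reps // 2
    return sum(comb(reps, j) * eps ** j * (1 - eps) ** (reps - j)
               for j in range(t + 1, reps + 1))

# Any constant error below 1/2 drops below 1/3 after constantly many runs.
assert majority_error(0.45, 1) == 0.45
assert majority_error(0.45, 101) < 1 / 3
```

Since the per-run error is a constant bounded away from 1/2, a constant number of repetitions suffices, which is all the proof needs.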
    Let 𝜒(𝑔) = 𝑑. Let ℬ be a deterministic query algorithm for inputs in {0, 1} 𝑚 such that
(ℬ, 𝜇0 , 𝜇1 ) is FULL and 𝜒(𝜇0 , 𝜇1 ) = 𝜒(ℬ, 𝜇0 , 𝜇1 ). We call such a decision tree an optimal decision
tree for 𝜇0 , 𝜇1 . Thus in 𝒫(ℬ, 𝜇0 , 𝜇1 ), 𝔼[𝒩1 ] = 𝜒(𝜇0 , 𝜇1 ) ≤ 𝑑. Recall from Section 4 that the leaves
of ℬ can be labelled by bits such that ℬ computes 𝑔 on the supports of 𝜇0 and 𝜇1 . We assume
ℬ’s leaves to be labelled as such.
    Consider the following query algorithm ℬ′: Start simulating ℬ. Terminate the simulation if one of the following events occurs. The output in each case is specified below.

    1. If 10𝑑² queries have been made and 𝑣_{10𝑑²+1} ≠ ⊥, terminate and output arg max_𝑏 Pr_{𝑥∼𝜇}[𝑔(𝑥) = 𝑏 | 𝑥 ∈ 𝑣_{10𝑑²+1}].

    2. If ℬ terminates, terminate and output what ℬ outputs.

By construction, ℬ′ makes at most 10𝑑² queries in the worst case. The following claim bounds the error of ℬ′, and completes the proof of Theorem 1.7.

Claim 6.2. There exists a constant 𝜖 < 1/2 such that Pr_{𝑥∼𝜇}[ℬ′(𝑥) ≠ 𝑔(𝑥)] ≤ 𝜖. Furthermore, the constant 𝜖 is independent of 𝜇.

Proof of Claim 6.2. Let 𝑣𝑘 be the random vertex at which ℬ makes its 𝑘-th query when it is run on 𝑥; if ℬ terminates before making 𝑘 queries, define 𝑣𝑘 := ⊥. Let ℰ denote the


event that in at most 10𝑑² queries, the computation of ℬ does not reach a vertex 𝑣 such that Pr_{𝑥∼𝜇}[𝑔(𝑥) = 0 | 𝑥 ∈ 𝑣] · Pr_{𝑥∼𝜇}[𝑔(𝑥) = 1 | 𝑥 ∈ 𝑣] ≤ 1/9. Since ℬ computes 𝑔 on the supports of 𝜇0 and 𝜇1, if ℰ happens then the computation of ℬ does not reach a leaf within 10𝑑² queries. We split the proof into the following two cases.

Case 1: Pr[ℰ] < 3/4.

     Condition on the event that the computation reaches a vertex 𝑣 of ℬ for which Pr_{𝑥∼𝜇}[𝑔(𝑥) = 0 | 𝑥 ∈ 𝑣] · Pr_{𝑥∼𝜇}[𝑔(𝑥) = 1 | 𝑥 ∈ 𝑣] ≤ 1/9 holds. In this case, one of Pr_{𝑥∼𝜇}[𝑔(𝑥) = 0 | 𝑥 ∈ 𝑣] and Pr_{𝑥∼𝜇}[𝑔(𝑥) = 1 | 𝑥 ∈ 𝑣] is at most 1/3. Hence, |Pr_{𝑥∼𝜇}[𝑔(𝑥) = 0 | 𝑥 ∈ 𝑣] − Pr_{𝑥∼𝜇}[𝑔(𝑥) = 1 | 𝑥 ∈ 𝑣]| ≥ 1/3. Let 𝑤 be the random leaf of the subtree of ℬ′ rooted at 𝑣 at which the computation ends. The probability that ℬ′ errs is at most

         𝔼_𝑤 [ 1/2 − (1/2) · |Pr_{𝑥∼𝜇}[𝑔(𝑥) = 0 | 𝑥 ∈ 𝑤] − Pr_{𝑥∼𝜇}[𝑔(𝑥) = 1 | 𝑥 ∈ 𝑤]| ]
             ≤ 1/2 − (1/2) · |𝔼_𝑤 Pr_{𝑥∼𝜇}[𝑔(𝑥) = 0 | 𝑥 ∈ 𝑤] − 𝔼_𝑤 Pr_{𝑥∼𝜇}[𝑔(𝑥) = 1 | 𝑥 ∈ 𝑤]|
                 (by Jensen's inequality and linearity of expectation)
             = 1/2 − (1/2) · |Pr_{𝑥∼𝜇}[𝑔(𝑥) = 0 | 𝑥 ∈ 𝑣] − Pr_{𝑥∼𝜇}[𝑔(𝑥) = 1 | 𝑥 ∈ 𝑣]| ≤ 1/3.

     Thus we have shown that conditioned on ℰ̄, the probability that ℬ′ errs is at most 1/3. Since ℬ′ errs with probability at most 1/2 when ℰ happens, due to the decision in step 1, the probability that ℬ′ errs is at most (1/4) · (1/3) + (3/4) · (1/2) = 11/24 < 1/2.


Case 2: Pr[ℰ] ≥ 3/4.

     Let 𝑎𝑗 := (𝑖𝑗, 𝑥_{𝑖𝑗}) be the tuple formed by the index and value of the random input variable queried at the 𝑗-th step by ℬ′; if ℬ′ terminates before making 𝑗 queries (i. e., 𝑣𝑗 = ⊥) or 𝑣𝑗 is a leaf of ℬ, then define 𝑖𝑗, 𝑥_{𝑖𝑗} := ⊥. Note that the sequence (𝑎1, . . . , 𝑎_{10𝑑²}) uniquely specifies a leaf of ℬ′, and vice versa. Let I(·, ·) denote the mutual information. (See Appendix C for the definitions and results from information theory used in this paper.) We prove the following claim in Section 6.1.

     Claim 6.3. If Pr[ℰ] ≥ 3/4, then I(𝑎1, . . . , 𝑎_{10𝑑²} : 𝑔(𝑥)) ≥ 1/40.

     Thus if Pr[ℰ] ≥ 3/4, Claim 6.3 implies that

         H(𝑔(𝑥) | 𝑎1, . . . , 𝑎_{10𝑑²}) ≤ 1 − 1/40 = 39/40.    (6.1)
     Let ℒ be the set of leaves ℓ of ℬ′ such that H(𝑔(𝑥) | 𝑥 ∈ ℓ) ≤ 79/80. For each ℓ ∈ ℒ, min_𝑏 Pr_{𝑥∼𝜇}[𝑔(𝑥) = 𝑏 | 𝑥 ∈ ℓ] ≤ 9/20. Conditioned on (𝑎1, . . . , 𝑎_{10𝑑²}) ∈ ℒ, the probability that ℬ′ errs is at most 9/20. By Markov's inequality and (6.1), it follows that Pr[(𝑎1, . . . , 𝑎_{10𝑑²}) ∈ ℒ] ≥ 1/79. Thus ℬ′ errs with probability at most (1/79) · (9/20) + (78/79) · (1/2) < 1/2.




     This completes the proof of Theorem 1.7.                                                          □
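The entropy-to-error translation used in Case 2 above (H(𝑔(𝑥) | 𝑥 ∈ ℓ) ≤ 79/80 forcing min_𝑏 Pr ≤ 9/20) can be sanity-checked numerically; a minimal sketch (Python; the grid granularity is an arbitrary choice):

```python
from math import log2

def binary_entropy(p):
    """H(p) = -p log2(p) - (1-p) log2(1-p), with H(0) = H(1) = 0."""
    if p in (0, 1):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

# H is increasing on [0, 1/2], and H(9/20) > 79/80; contrapositively,
# H(g(x) | x in leaf) <= 79/80 forces min_b Pr[g(x)=b | x in leaf] <= 9/20.
assert binary_entropy(9 / 20) > 79 / 80
grid = [i / 1000 for i in range(1, 501)]
assert all(binary_entropy(a) < binary_entropy(b)
           for a, b in zip(grid, grid[1:]))
```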

6.1     Proof of Claim 6.3
Let 𝑣 be a vertex in ℬ, and let 𝑥𝑖 be the variable queried at 𝑣. Define Δ(𝑣) as follows.⁶

        Δ(𝑣) := |Pr_{𝑥∼𝜇0}[𝑥𝑖 = 0 | 𝑥 ∈ 𝑣] − Pr_{𝑥∼𝜇1}[𝑥𝑖 = 0 | 𝑥 ∈ 𝑣]|    if 𝑣 ≠ ⊥ and Pr_{𝑥∼𝜇𝑏}[𝑥 ∈ 𝑣] > 0 for 𝑏 ∈ {0, 1},
        Δ(𝑣) := 1    otherwise.

The following claim shows that if Δ(𝑣) is large, the query outcome of 𝑣 contains significant
information about 𝑔(𝑥).
Claim 6.4. Let 𝑣 be a vertex in ℬ. Let variable 𝑥𝑖 be queried at 𝑣. Then,

        I(𝑔(𝑥) : 𝑥𝑖 | 𝑥 ∈ 𝑣) ≥ 8 ( Pr_{𝑥∼𝜇}[𝑔(𝑥) = 0 | 𝑥 ∈ 𝑣] · Pr_{𝑥∼𝜇}[𝑔(𝑥) = 1 | 𝑥 ∈ 𝑣] · Δ(𝑣) )².

Proof. Define 𝑏 := 𝑔(𝑥). Condition on the event 𝑥 ∈ 𝑣. Recall from Appendix C that (𝑏 ⊗ 𝑥𝑖) is the distribution over pairs of bits where the first and the second bit are distributed independently according to the distributions of 𝑏 and 𝑥𝑖, respectively. Fact C.7 implies that I(𝑏 : 𝑥𝑖) = D((𝑏, 𝑥𝑖) || (𝑏 ⊗ 𝑥𝑖)).⁷ Now, Pinsker's inequality (Theorem C.9) implies that

        D((𝑏, 𝑥𝑖) || (𝑏 ⊗ 𝑥𝑖)) ≥ (1/2) · ||(𝑏, 𝑥𝑖) − (𝑏 ⊗ 𝑥𝑖)||₁².    (6.2)
Next, we bound ||(𝑏, 𝑥𝑖) − (𝑏 ⊗ 𝑥𝑖)||₁. To this end, we fix bits 𝑧1, 𝑧2 ∈ {0, 1} and bound |Pr[(𝑏, 𝑥𝑖) = (𝑧1, 𝑧2)] − Pr[(𝑏 ⊗ 𝑥𝑖) = (𝑧1, 𝑧2)]|. We have that,

                          Pr[(𝑏, 𝑥 𝑖 ) = (𝑧1 , 𝑧2 )] = Pr[𝑏 = 𝑧 1 ] Pr[𝑥 𝑖 = 𝑧 2 | 𝑏 = 𝑧 1 ].                     (6.3)

Now,

        Pr[(𝑏 ⊗ 𝑥𝑖) = (𝑧1, 𝑧2)] = Pr[𝑏 = 𝑧1] Pr[𝑥𝑖 = 𝑧2]
                                = Pr[𝑏 = 𝑧1] (Pr[𝑏 = 𝑧1] Pr[𝑥𝑖 = 𝑧2 | 𝑏 = 𝑧1] + Pr[𝑏 = 𝑧̄1] Pr[𝑥𝑖 = 𝑧2 | 𝑏 = 𝑧̄1]).    (6.4)

Taking the absolute difference of (6.4) and (6.3) we have that,

        |Pr[(𝑏, 𝑥𝑖) = (𝑧1, 𝑧2)] − Pr[(𝑏 ⊗ 𝑥𝑖) = (𝑧1, 𝑧2)]|
            = Pr[𝑏 = 𝑧1] · Pr[𝑏 = 𝑧̄1] · Δ(𝑣) = Pr[𝑏 = 0] · Pr[𝑏 = 1] · Δ(𝑣).    (6.5)

The claim follows by adding (6.5) over 𝑧1, 𝑧2 and using (6.2).    □
   ⁶Recall that we mentioned Δ(𝑣) in Section 1.3.
   ⁷See Appendix C for the definitions of the Kullback–Leibler divergence and the L1-distance.
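Claim 6.4 can also be checked directly on small examples. The sketch below (Python; the probability values in the grids are arbitrary test points, not values from the paper) computes I(𝑏 : 𝑥𝑖) from the joint distribution and compares it with 8(Pr[𝑏 = 0] · Pr[𝑏 = 1] · Δ)²; the mutual information is computed in bits, which only strengthens the inequality relative to nats.

```python
from math import log2

def mutual_information(p0, q0, q1):
    """I(b : x) in bits for a bit b with Pr[b=0] = p0 and a bit x with
    Pr[x=0 | b=0] = q0 and Pr[x=0 | b=1] = q1."""
    joint = {(0, 0): p0 * q0, (0, 1): p0 * (1 - q0),
             (1, 0): (1 - p0) * q1, (1, 1): (1 - p0) * (1 - q1)}
    pb = {0: p0, 1: 1 - p0}
    px = {x: joint[(0, x)] + joint[(1, x)] for x in (0, 1)}
    return sum(p * log2(p / (pb[b] * px[x]))
               for (b, x), p in joint.items() if p > 0)

# Claim 6.4 with Delta = |q0 - q1|, over a grid of test distributions.
for p0 in (0.1, 0.3, 0.5, 0.8):
    for q0 in (0.0, 0.2, 0.5, 0.9):
        for q1 in (0.1, 0.4, 0.7, 1.0):
            bound = 8 * (p0 * (1 - p0) * abs(q0 - q1)) ** 2
            assert mutual_information(p0, q0, q1) >= bound - 1e-12
```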



    Let ℬ be run on a random input 𝑥 sampled from 𝜇. The next claim proves a lower bound on the expected sum of Δ(𝑣) over the random vertices 𝑣 in the transcript of ℬ. Recall from the proof of Claim 6.2 that 𝑣𝑘 is the random vertex at which the 𝑘-th query is made; if ℬ terminates before making 𝑘 queries, define 𝑣𝑘 := ⊥. Note that if ℬ terminates before making 𝑡 queries, then 𝑣𝑡 = ⊥ and Δ(𝑣𝑡) = 1.

Claim 6.5. Let 𝑐 be any positive integer. Then,

        Σ_{𝑘=1}^{10𝑑𝑐} 𝔼[Δ(𝑣𝑘) | ℰ] ≥ 13𝑐/20.

    To prove Claim 6.5 we need the following claim.

Claim 6.6.

        Σ_{𝑘=1}^{10𝑑} 𝔼[Δ(𝑣𝑘) | ℰ] ≥ 13/20.

Proof of Claim 6.6. Let us sample vertices 𝑢𝑘 of ℬ as follows:

   1. Set 𝑧 = 1 with probability Pr_{𝑥∼𝜇}[𝑔(𝑥) = 1], and 𝑧 = 0 with probability Pr_{𝑥∼𝜇}[𝑔(𝑥) = 0].

   2. Run 𝒫(ℬ, 𝜇0, 𝜇1) on the 1-bit input 𝑧.

   3. Let 𝑢𝑘 be the vertex 𝑣 of ℬ in the beginning of the 𝑘-th iteration of the while loop of Algorithm 2. If the simulation stops before 𝑘 iterations, set 𝑢𝑘 := ⊥. Return (𝑢𝑘)_{𝑘=1,2,...}.

By Theorem 4.2, the transcripts (𝑢 𝑘 ) 𝑘=1,2,... and (𝑣 𝑘 ) 𝑘=1,2,... have the same distribution.
    Now, since 𝔼[𝒩1] ≤ 𝜒(𝑔) = 𝑑, we have by Markov's inequality that the probability that 𝒫(ℬ, 𝜇0, 𝜇1) sets QUERY1 to 1 within the first 10𝑑 iterations of the while loop is at least 9/10. Note that conditioned on the event that the computation of 𝒫(ℬ, 𝜇0, 𝜇1) is at a vertex 𝑣 of ℬ that queries the input bit 𝑥𝑖, the probability that the random real number 𝑟 generated in the same iteration lies in the interval [min_𝑏 Pr_{𝑥𝑖∼𝜇𝑏}[𝑥𝑖 = 0 | 𝑥 ∈ 𝑣], max_𝑏 Pr_{𝑥𝑖∼𝜇𝑏}[𝑥𝑖 = 0 | 𝑥 ∈ 𝑣]] is exactly Δ(𝑣). We have,

        Σ_{𝑘=1}^{10𝑑} 𝔼[Δ(𝑣𝑘) | ℰ] = Σ_{𝑘=1}^{10𝑑} 𝔼[Δ(𝑢𝑘) | ℰ]
            ≥ Pr[QUERY1 is set to 1 within the first 10𝑑 iterations | ℰ]    (by the union bound)
            ≥ Pr[QUERY1 is set to 1 within the first 10𝑑 iterations] − Pr[ℰ̄]
            ≥ 9/10 − 1/4 = 13/20.    □


    The following observation will be useful in the proof of Claim 6.5.

Observation 6.7. Let 𝑣 be any node of ℬ such that the associated subcube has non-empty intersection with the supports of both 𝜇0 and 𝜇1. Let 𝜇′0 := 𝜇0 | 𝑣 and 𝜇′1 := 𝜇1 | 𝑣. Let ℬ𝑣 denote the subtree of ℬ rooted at 𝑣. Then ℬ𝑣 is an optimal decision tree for 𝜇′0 and 𝜇′1.

Proof. If ℬ𝑣 were not an optimal decision tree for 𝜇′0 and 𝜇′1 then we could replace it by an optimal decision tree for 𝜇′0 and 𝜇′1, and for the resultant decision tree ℬ′, the expected value of 𝒩1 in 𝒫(ℬ′, 𝜇0, 𝜇1) would be smaller than that in 𝒫(ℬ, 𝜇0, 𝜇1). This would contradict the optimality of ℬ.    □


Proof of Claim 6.5. For 𝑖 = 0, . . . , 𝑐 − 1, let 𝑤 be any vertex at depth 10𝑖𝑑 + 1 consistent with ℰ, such that Pr_{𝑥∼𝜇0}[𝑥 ∈ 𝑤], Pr_{𝑥∼𝜇1}[𝑥 ∈ 𝑤] ≠ 0. Consider the subtree T of ℬ rooted at 𝑤. Let 𝑤1 := 𝑤 and let 𝑤ℓ be the random vertex at depth ℓ of T when T is run on a random input from 𝜇 | 𝑤, or ⊥ if T terminates before making ℓ queries. By Observation 6.7, T is an optimal decision tree for the distributions 𝜇′0 := 𝜇0 | 𝑤, 𝜇′1 := 𝜇1 | 𝑤. From Claim 6.6 we have that,

        Σ_{ℓ=1}^{10𝑑} 𝔼[Δ(𝑤ℓ) | ℰ] ≥ 13/20,    (6.6)

where Δ(𝑤ℓ) is with respect to the distributions 𝜇′0 and 𝜇′1. Since 𝜇′0 | 𝑤ℓ = 𝜇0 | 𝑤ℓ and 𝜇′1 | 𝑤ℓ = 𝜇1 | 𝑤ℓ, Δ(𝑤ℓ) in (6.6) is also with respect to the distributions 𝜇0 and 𝜇1. Now, when 𝑤 is the random vertex 𝑣_{10𝑖𝑑+1}, 𝑤ℓ is the random vertex 𝑣_{10𝑖𝑑+ℓ}. Thus from (6.6) we have that,

        Σ_{𝑘=10𝑖𝑑+1}^{10(𝑖+1)𝑑} 𝔼[Δ(𝑣𝑘) | ℰ] ≥ 13/20.    (6.7)

The claim follows by adding (6.7) over 𝑖 = 0, . . . , 𝑐 − 1.    □


    Now we are ready to prove Claim 6.3. By setting 𝑐 = 𝑑 and invoking Claim 6.5 we have,

        Σ_{𝑡=1}^{10𝑑²} 𝔼[Δ(𝑣𝑡) | ℰ] ≥ 13𝑑/20.    (6.8)


Let E𝑗 be the event Pr_{𝑥∼𝜇0}[𝑥 ∈ 𝑣𝑗] ≠ 0 ∧ Pr_{𝑥∼𝜇1}[𝑥 ∈ 𝑣𝑗] ≠ 0 ∧ 𝑣𝑗 ≠ ⊥ (i. e., 𝑣𝑗 is a vertex of ℬ and is not a leaf). Note that if 𝑣𝑗 ≠ ⊥ and 𝑣𝑗 is not a leaf of ℬ, then 𝑣𝑗 is determined by (𝑎1, . . . , 𝑎𝑗−1) and vice versa, and hence I(𝑎𝑗 : 𝑔(𝑥) | 𝑎1, . . . , 𝑎𝑗−1) = I(𝑥_{𝑖𝑗} : 𝑔(𝑥) | 𝑣𝑗). If 𝑣𝑗 = ⊥ or 𝑣𝑗 is a leaf of ℬ, then 𝑔(𝑥) is determined by (𝑎1, . . . , 𝑎𝑗−1), and 𝑥_{𝑖𝑗} = ⊥; thus, I(𝑎𝑗 : 𝑔(𝑥) | 𝑎1, . . . , 𝑎𝑗−1) = I(𝑥_{𝑖𝑗} : 𝑔(𝑥) | 𝑣𝑗) = 0.


Thus we have,

    I(𝑎1, . . . , 𝑎_{10𝑑²} : 𝑔(𝑥))
        = Σ_{𝑗=1}^{10𝑑²} I(𝑎𝑗 : 𝑔(𝑥) | 𝑎1, . . . , 𝑎𝑗−1)    (by the chain rule of mutual information (Theorem C.5))
        = Σ_{𝑗=1}^{10𝑑²} I(𝑥_{𝑖𝑗} : 𝑔(𝑥) | 𝑣𝑗)    (from the discussion above)
        ≥ 8 Σ_{𝑗=1}^{10𝑑²} 𝔼[ 1_{E𝑗} · (Pr[𝑔(𝑥) = 0 | 𝑥 ∈ 𝑣𝑗] · Pr[𝑔(𝑥) = 1 | 𝑥 ∈ 𝑣𝑗] · Δ(𝑣𝑗))² ]    (from Claim 6.4)    (6.9)
        ≥ 8 Σ_{𝑗=1}^{10𝑑²} Pr[ℰ] · 𝔼[ (Pr[𝑔(𝑥) = 0 | 𝑥 ∈ 𝑣𝑗] · Pr[𝑔(𝑥) = 1 | 𝑥 ∈ 𝑣𝑗] · Δ(𝑣𝑗))² | ℰ ]
            (conditioned on ℰ, E𝑗 happens with probability 1 for each 𝑗 ≤ 10𝑑²)
        ≥ 8 Σ_{𝑗=1}^{10𝑑²} (3/4) · (1/9) · 𝔼[Δ(𝑣𝑗)² | ℰ]    (by the assumption Pr[ℰ] ≥ 3/4)
        = (2/3) · Σ_{𝑗=1}^{10𝑑²} 𝔼[Δ(𝑣𝑗)² | ℰ]
        ≥ (2/3) · Σ_{𝑗=1}^{10𝑑²} (𝔼[Δ(𝑣𝑗) | ℰ])²    (by Jensen's inequality)
        ≥ (2/3) · (1/(10𝑑²)) · ( Σ_{𝑗=1}^{10𝑑²} 𝔼[Δ(𝑣𝑗) | ℰ] )²    (by the Cauchy–Schwarz inequality)
        ≥ 1/40.    (from (6.8))    (6.10)
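The closing numeric step can be double-checked with exact rational arithmetic; a minimal sketch (Python), verifying that (2/3) · (1/(10𝑑²)) · (13𝑑/20)² equals 169/6000 independently of 𝑑 and exceeds 1/40:

```python
from fractions import Fraction

# (2/3) * (1/(10 d^2)) * (13 d / 20)^2 = 169/6000 for every d >= 1.
for d in (1, 2, 5, 100):
    value = Fraction(2, 3) * Fraction(1, 10 * d * d) * Fraction(13 * d, 20) ** 2
    assert value == Fraction(169, 6000)
assert Fraction(169, 6000) >= Fraction(1, 40)
```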

7    Tightness

In this section we prove Theorem 1.2. We construct a Boolean relation 𝑓0 ⊆ {0, 1}^𝑛 × {0, 1}^𝑛 (i. e., 𝒮 = {0, 1}^𝑛) and a promise function 𝑔0 ⊆ {0, 1}^𝑛 × {0, 1} (i. e., 𝑚 = 𝑛), such that ℝ4/9(𝑓0) ∈ Θ(√𝑛), ℝ1/3(𝑔0) ∈ Θ(𝑛) and ℝ1/3(𝑓0 ◦ 𝑔0^𝑛) ∈ Θ(𝑛).


    For strings 𝑥 = (𝑥1, . . . , 𝑥𝑛), 𝑧 = (𝑧1, . . . , 𝑧𝑛) in {0, 1}^𝑛, let 𝑥 ⊕ 𝑧 be the string (𝑥1 ⊕ 𝑧1, . . . , 𝑥𝑛 ⊕ 𝑧𝑛) obtained by taking their bitwise XOR. Let |𝑥| stand for the Hamming weight |{𝑖 ∈ [𝑛] : 𝑥𝑖 = 1}| of 𝑥. We define 𝑓0 as follows:

        𝑓0 := { (𝑎, 𝑧) ∈ {0, 1}^𝑛 × {0, 1}^𝑛 : |𝑎 ⊕ 𝑧| ≤ 𝑛/2 − √𝑛 }.

Now we define 𝑔0 by specifying 𝑔0⁻¹(0) and 𝑔0⁻¹(1):

        𝑔0⁻¹(0) := { (𝑥, 0) : 𝑥 ∈ {0, 1}^𝑛, |𝑥| ≤ 𝑛/2 − √𝑛 },
        𝑔0⁻¹(1) := { (𝑥, 1) : 𝑥 ∈ {0, 1}^𝑛, |𝑥| ≥ 𝑛/2 + √𝑛 }.
𝑔0 is a gap-majority function with specific parameters. The two-party version of 𝑔0 is the Gap-Hamming-Distance problem, which has been studied before in the context of communication complexity [8]. We now determine the randomized query complexities of 𝑓0, 𝑔0 and 𝑓0 ◦ 𝑔0^𝑛.
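For concreteness, the instances of 𝑓0 and 𝑔0 can be written down directly; the following sketch (Python; representing inputs as bit-lists is our own encoding choice) implements membership in the relation 𝑓0 and the promise of 𝑔0.

```python
def hamming(x):
    """Hamming weight |x| of a bit-list."""
    return sum(x)

def f0_valid(a, z):
    """(a, z) belongs to the relation f0 iff |a xor z| <= n/2 - sqrt(n)."""
    n = len(a)
    w = sum(ai ^ zi for ai, zi in zip(a, z))
    return w <= n / 2 - n ** 0.5

def g0(x):
    """Value of g0 on the promise; None inside the gap (invalid input)."""
    n = len(x)
    if hamming(x) <= n / 2 - n ** 0.5:
        return 0
    if hamming(x) >= n / 2 + n ** 0.5:
        return 1
    return None

n = 16  # sqrt(n) = 4, so the promise thresholds are weights 4 and 12
assert g0([0] * n) == 0 and g0([1] * n) == 1
assert g0([1] * 8 + [0] * 8) is None           # weight 8 lies in the gap
assert f0_valid([0] * n, [0] * n) and not f0_valid([0] * n, [1] * n)
```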
Claim 7.1.   (i) ℝ4/9(𝑓0) ∈ Ω(√𝑛).

  (ii) ℝ1/3(𝑔0) ∈ Ω(𝑛).

 (iii) ℝ𝜖(𝑓0 ◦ 𝑔0^𝑛) ∈ 𝑂(𝑛) for any 𝜖 ∈ Ω(1).

    Theorem 1.2 follows from Theorem 1.5 and Claim 7.1 with 𝜖 set to 1/3.

Proof of Claim 7.1.   (i) Assume that a deterministic protocol of cost 𝑘 solves 𝑓0 with respect to the uniform input distribution with error at most 4/9. Such a protocol partitions {0, 1}^𝑛 into (at most) 2^𝑘 subcubes, each marked by some “answer” (an element from {0, 1}^𝑛). In particular, more than 2^𝑛 − 2^{𝑛−4} points belong to subcubes of size at least 2^{𝑛−𝑘−4} – in other words, to subcubes of codimension at most 𝑘 + 4. As more than a 15/16 fraction of all points belong to such subcubes and the total protocol error is at most 4/9, there exists at least one subcube of codimension 𝑘 + 4 on which the protocol errs with probability less than (4/9) · (16/15) < 1/2.

       The symmetry in the definition of 𝑓0 allows us to assume without loss of generality that the subcube is the set 𝜏 := 0^{𝑘+4} ◦ {0, 1}^{𝑛−𝑘−4}, where “◦” denotes string concatenation. It is easy to see that the “answer” that would minimize the error probability with respect to this subcube can be any binary string starting with “0^{𝑘+4}”, so let us assume that the actual label is 0^𝑛. Let 𝒰ℓ denote the uniform distribution on {0, 1}^ℓ. Then

            Pr[error | 𝑍 ∈ 𝜏] = Pr_{𝑍′∼𝒰_{𝑛−𝑘−4}}[ |𝑍′| > 𝑛/2 − √𝑛 ] < 1/2,

       which implies that 𝑘 + 4 ≥ 2√𝑛, as a uniformly-random binary string of length more than 𝑛 − 2√𝑛 would have more than 𝑛/2 − √𝑛 “ones” with probability at least 1/2.
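The weight-threshold fact concluding this argument can be verified exactly for a small 𝑛; the sketch below (Python; 𝑛 = 100 is an illustrative choice) computes Pr[|𝑍′| > 𝑛/2 − √𝑛] for string lengths on either side of 𝑛 − 2√𝑛.

```python
from math import comb

def prob_weight_exceeds(length, threshold):
    """Pr[|Z'| > threshold] for Z' uniform on {0,1}^length (exact)."""
    favourable = sum(comb(length, w) for w in range(length + 1)
                     if w > threshold)
    return favourable / 2 ** length

n = 100  # sqrt(n) = 10, so the threshold n/2 - sqrt(n) is 40
# At length n - 2*sqrt(n) = 80 the median weight is 40: error below 1/2.
assert prob_weight_exceeds(80, 40) < 0.5
# A longer string (i.e., codimension k + 4 < 2*sqrt(n)) errs too often.
assert prob_weight_exceeds(90, 40) > 0.5
```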


  (ii) A randomized query protocol of cost 𝑘 and error 1/3 for 𝑔0 would trivially imply the existence of a randomized communication protocol of cost at most 2𝑘 and error 1/3 for the bipartite problem Gap-Hamming-Distance:

            𝐺𝐻𝐷(𝑋, 𝑌) := 0 if |𝑋 ⊕ 𝑌| ≤ 𝑛/2 − √𝑛;
                          1 if |𝑋 ⊕ 𝑌| ≥ 𝑛/2 + √𝑛;
                          ∗ otherwise,

       and it has been demonstrated by Chakrabarti and Regev [8] that the complexity of this problem for any constant error is Ω(𝑛).
 (iii) Consider the following protocol for computing 𝑓0(𝑔0(𝑥1), . . . , 𝑔0(𝑥𝑛)), where 𝑥𝑖 ∈ {0, 1}^𝑛:

       For every 𝑖 ∈ [𝑛], let 𝑎𝑖 = 𝑥𝑖(𝑗𝑖), where 𝑗𝑖 is chosen uniformly at random from [𝑛] – that is, 𝑎𝑖 is a uniformly-random bit of 𝑥𝑖. Then 𝔼[|{𝑖 : 𝑎𝑖 = 𝑔0(𝑥𝑖)}|] – the expected number of “correctly guessed” values 𝑎𝑖 – is at least 𝑛/2 + √𝑛; intuitively, this means that the probability that 𝑎1, . . . , 𝑎𝑛 is a right answer to 𝑓0(𝑔0(𝑥1), . . . , 𝑔0(𝑥𝑛)) is “non-trivially high” – to “boost” this probability, we will use several “probes” from every 𝑥𝑖.
       For 𝑚 a power of 3, let

            𝜏𝑚 : {0, 1}^𝑚 → {0, 1}

       be the Boolean function represented by a complete ternary tree of depth log3 𝑚 with leaves labelled by the 𝑚 input variables, naturally ordered, and internal vertices computing the majority:

            Maj3(𝑧1, 𝑧2, 𝑧3) := (𝑧1 ∧ 𝑧2) ∨ (𝑧1 ∧ 𝑧3) ∨ (𝑧2 ∧ 𝑧3).



       Protocol: For an integer 𝑑 𝜖 (to be fixed later), independently choose uniformly random
       𝑗𝑖,𝑘 ∈ [𝑛] for 𝑖 ∈ [𝑛] and 𝑘 ∈ [3^{𝑑𝜖}]. Let 𝑎 𝑖 := 𝜏_{3^{𝑑𝜖}} (𝑥 𝑖 (𝑗𝑖,1 ), . . . , 𝑥 𝑖 (𝑗𝑖,3^{𝑑𝜖} ))
       and output “𝑎 1 , . . . , 𝑎 𝑛 ”.8
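       The recursive-majority estimator used by the protocol is easy to sketch in Python; the
       fragment below is our own illustration (the names `tau` and `probe_estimate` are ours) and
       mirrors the definitions above:

```python
import random

def tau(bits):
    """Recursive ternary majority; len(bits) must be a power of 3."""
    if len(bits) == 1:
        return bits[0]
    t = len(bits) // 3
    parts = (tau(bits[:t]), tau(bits[t:2 * t]), tau(bits[2 * t:]))
    return 1 if sum(parts) >= 2 else 0

def probe_estimate(x, d, rng):
    """One a_i: apply tau to 3**d uniformly random probes into the block x."""
    probes = [x[rng.randrange(len(x))] for _ in range(3 ** d)]
    return tau(probes)

rng = random.Random(0)
assert tau([1, 0, 1]) == 1 and tau([0, 0, 1]) == 0
assert tau([1, 1, 0, 0, 1, 0, 1, 1, 1]) == 1   # Maj3(Maj3(1,1,0), Maj3(0,1,0), Maj3(1,1,1))
assert probe_estimate([1] * 8, 2, rng) == 1    # an all-ones block is always guessed right
```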

       The behaviour of a single Maj3 (𝑥 𝑖 (𝑗𝑖,1 ), 𝑥 𝑖 (𝑗𝑖,2 ), 𝑥 𝑖 (𝑗𝑖,3 )) is as follows. For 𝛿 ∈ (0, 1/2]:

           Pr[ Maj3 (𝑥 𝑖 (𝑗𝑖,1 ), 𝑥 𝑖 (𝑗𝑖,2 ), 𝑥 𝑖 (𝑗𝑖,3 )) = 𝑔0 (𝑥 𝑖 )  |  |𝑥 𝑖 | − 𝑛/2 = 𝛿 · 𝑛 ]
               = (1/2 + 𝛿)³ + 3 · (1/2 + 𝛿)² · (1/2 − 𝛿)
               = 1/2 + 3𝛿/2 − 2𝛿³
               > min( 1/2 + 5𝛿/4, 3/4 ).
       The same analysis applies to every node in the tree representation of the function 𝜏, so:

           Pr[ 𝑎 𝑖 = 𝑔0 (𝑥 𝑖 )  |  |𝑥 𝑖 | − 𝑛/2 ≥ √𝑛 ] > min( 1/2 + (5/4)^{𝑑𝜖}/√𝑛, 3/4 ).
    8Here we are using 𝜏_{3^{𝑑𝜖}} (·) for approximating the value of 𝑔0 (𝑥 𝑖 ), instead of the majority function, which
would look more natural here (moreover, it is easy to see that the function 𝜏 is somewhat less efficient for the task).
However, the symmetry appearing in the tree representation of the function 𝜏 simplifies its analysis considerably,
while the resulting complexity is sufficient for our needs.


                          T HEORY OF C OMPUTING, Volume 19 (9), 2023, pp. 1–35                                     28
                 O PTIMAL C OMPOSITION T HEOREM FOR R ANDOMIZED Q UERY C OMPLEXITY
     As long as 𝑑 𝜖 ∈ 𝑜(log 𝑛), it holds that (5/4)^{𝑑𝜖} ≤ √𝑛/4 for sufficiently large 𝑛, and so:

                       Pr[ 𝑎 𝑖 = 𝑔0 (𝑥 𝑖 ) | 𝑔0 (𝑥 𝑖 ) ≠ ∗ ] > 1/2 + (5/4)^{𝑑𝜖}/√𝑛.

      Note that 𝑎 1 , . . . , 𝑎 𝑛 is a wrong answer to 𝑓0 (𝑔0 (𝑥 1 ), . . . , 𝑔0 (𝑥 𝑛 )) only if
      |{𝑖 | 𝑎 𝑖 = 𝑔0 (𝑥 𝑖 )}| < 𝑛/2 + √𝑛, so by a Bernstein-type tail bound, as given in [2], it
      follows that

                       Pr[ the protocol errs ] < exp( −1/2 · ((5/4)^{𝑑𝜖} − 1)² ).

     Accordingly, 𝑑 𝜖 ∈ 𝑂(1) suffices for any 𝜖 ∈ Ω(1), and the result follows.                           □

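As a sanity check on the amplification argument in item (iii) above, the single-node map 𝑝 ↦ 𝑝³ + 3𝑝²(1 − 𝑝) and the iterated bound min(1/2 + (5/4)^𝑑 · 𝛿, 3/4) can be verified numerically; the sketch below is our own and not part of the proof:

```python
def maj3_success(p):
    # probability that Maj3 of three i.i.d. correct-with-probability-p probes is correct
    return p ** 3 + 3 * p ** 2 * (1 - p)

# identity check: (1/2+d)^3 + 3(1/2+d)^2(1/2-d) = 1/2 + 3d/2 - 2d^3
for delta in (0.0, 0.05, 0.25, 0.5):
    assert abs(maj3_success(0.5 + delta) - (0.5 + 1.5 * delta - 2 * delta ** 3)) < 1e-12

# iterated bound: after d levels of the tree, success >= min(1/2 + (5/4)^d * delta, 3/4)
for delta in (0.001, 0.01, 0.1):
    p = 0.5 + delta
    for d in range(1, 12):
        p = maj3_success(p)
        assert p >= min(0.5 + (5 / 4) ** d * delta, 0.75) - 1e-12
```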
A    Minimax principle: proof of Fact 2.4
Fix an integer ℓ . Let 𝒟ℓ be the finite set of all deterministic query algorithms on 𝑘 bits with
worst-case complexity at most ℓ . Let ℋ𝑘 := {0, 1} 𝑘 . For algorithm A ∈ 𝒟ℓ and input 𝑥 ∈ ℋ𝑘 , let
𝔼(A, 𝑥) = 1 if (𝑥, A(𝑥)) ∉ ℎ (i. e., if A errs on 𝑥), and 𝔼(A, 𝑥) = 0 otherwise. By von Neumann’s
minimax principle,
        min_𝜎 max_𝜇  Σ_{A∈𝒟ℓ , 𝑥∈ℋ𝑘} 𝜎(A) 𝔼(A, 𝑥) 𝜇(𝑥)  =  max_𝜇 min_𝜎  Σ_{A∈𝒟ℓ , 𝑥∈ℋ𝑘} 𝜎(A) 𝔼(A, 𝑥) 𝜇(𝑥),      (A.1)

where 𝜎 and 𝜇 range over probability distributions over 𝒟ℓ and ℋ𝑘 , respectively. Note that in
equation (A.1), we can assume that the maximum in the left hand side is over point distributions
on ℋ𝑘 , i. e., distributions that assign weight 1 to some input 𝑥 ∈ ℋ𝑘 . Similarly we can assume
that the minimum in the right hand side is over point distributions on 𝒟ℓ . Thus we have that,
            min_𝜎 max_{𝑥∈ℋ𝑘}  Σ_{A∈𝒟ℓ} 𝜎(A) 𝔼(A, 𝑥)  =  max_𝜇 min_{A∈𝒟ℓ}  Σ_{𝑥∈ℋ𝑘} 𝔼(A, 𝑥) 𝜇(𝑥).      (A.2)

From equation (A.2) it follows that

        ℝ𝜖 (ℎ) = min { ℓ : min_𝜎 max_{𝑥∈ℋ𝑘} Σ_{A∈𝒟ℓ} 𝜎(A) 𝔼(A, 𝑥) ≤ 𝜖 }
                        (where 𝜎 ranges over all probability distributions on 𝒟ℓ )
               = min { ℓ : max_𝜇 min_{A∈𝒟ℓ} Σ_{𝑥∈ℋ𝑘} 𝔼(A, 𝑥) 𝜇(𝑥) ≤ 𝜖 }
                        (where 𝜇 ranges over all probability distributions on ℋ𝑘 )
               = max_𝜇 min { ℓ : min_{A∈𝒟ℓ} Σ_{𝑥∈ℋ𝑘} 𝔼(A, 𝑥) 𝜇(𝑥) ≤ 𝜖 }
               = max_𝜇 𝔻𝜖^𝜇 (ℎ).

In the third equality the minimization over ℓ and the maximization over 𝜇 can be exchanged
because the quantity min_{A∈𝒟ℓ} Σ_{𝑥∈ℋ𝑘} 𝔼(A, 𝑥) 𝜇(𝑥) is non-increasing in ℓ .

               D MYTRO G AVINSKY, T ROY L EE , M IKLOS S ANTHA , AND S WAGATO S ANYAL

B    Alternative characterization of sabotage complexity
We first go over the standard definition of sabotage complexity from [7]. Let 𝑔 ⊆ {0, 1} 𝑚 ×{0, 1} be
a partial function. From 𝑔, define a partial function 𝑔sab : 𝑃 → {∗, †}, where now 𝑃 ⊆ {0, 1, ∗, †} 𝑚
is defined in the following way. Let 𝑃 ∗ ⊆ {0, 1, ∗} 𝑚 be the largest set such that for all 𝑧 ∈ 𝑃 ∗ there
exist 𝑥, 𝑦 with 𝑔(𝑥) ≠ 𝑔(𝑦) such that both 𝑥 and 𝑦 are consistent with the non-star coordinates of 𝑧.
Define 𝑃 † ⊆ {0, 1, †} 𝑚 analogously, with † instead of ∗. Then 𝑃 = 𝑃 ∗ ∪ 𝑃 † . Finally, define 𝑔sab (𝑧) = ∗
if 𝑧 ∈ 𝑃 ∗ and 𝑔sab (𝑧) = † if 𝑧 ∈ 𝑃 † . The sabotage complexity of 𝑔 is defined as ℝ𝕊(𝑔) = ℝ0 (𝑔sab ).
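     For intuition, on small examples the set 𝑃 ∗ can be computed by brute force. The following Python sketch is our own illustration (the name `p_star` is ours); it enumerates 𝑃 ∗ for a partial function given as a dictionary from defined inputs to values:

```python
from itertools import product

def p_star(g, m):
    """All z in {0,1,*}^m whose non-star coordinates are consistent with two
    inputs x, y in the domain of g having g(x) != g(y)."""
    result = set()
    for z in product((0, 1, '*'), repeat=m):
        consistent = [x for x in g
                      if all(zi == '*' or zi == xi for zi, xi in zip(z, x))]
        values = {g[x] for x in consistent}
        if len(values) == 2:  # both a 0-input and a 1-input fit z
            result.add(z)
    return result

# Example: g = AND on 2 bits (a total function)
g_and = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
assert p_star(g_and, 2) == {('*', '*'), (1, '*'), ('*', 1)}
```

Note that a fully specified 𝑧 (no stars) never lies in 𝑃 ∗ , since only one input is consistent with it.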
     For a tree 𝑇 computing 𝑔, and strings 𝑥, 𝑦 such that 𝑔(𝑥) ≠ 𝑔(𝑦), let sep𝑇 (𝑥, 𝑦) denote the
depth of the unique node 𝑣 in 𝑇 that both 𝑥 and 𝑦 reach and at which 𝑥 𝑞(𝑣) ≠ 𝑦 𝑞(𝑣) , where 𝑞(𝑣) is
the index queried at node 𝑣. We have the following alternative characterization of sabotage
complexity.

Theorem B.1. Let 𝑔 ⊆ {0, 1} 𝑚 × {0, 1} be a partial function. Then

                        ℝ𝕊(𝑔) = min_𝒯  max_{𝑥,𝑦 : 𝑔(𝑥)≠𝑔(𝑦)}  𝔼_{𝑇∼𝒯} [sep𝑇 (𝑥, 𝑦)]
                              = max_𝑝  min_𝑇  𝔼_{(𝑥,𝑦)∼𝑝} [sep𝑇 (𝑥, 𝑦)].


    In the first equation the minimum is taken over zero-error randomized algorithms 𝒯 for
𝑔. In the second equation, the maximum is taken over distributions over pairs (𝑥, 𝑦) where
𝑔(𝑥) = 0, 𝑔(𝑦) = 1, and the minimum is taken over deterministic trees 𝑇 computing 𝑔.


Proof. That the right hand side of the first line is equal to the second line follows by von
Neumann’s minimax theorem [18].
    Now we focus on establishing the first line. We first show that ℝ𝕊(𝑔) is at most the right
hand side of the first line. Let 𝒯 ∗ achieve the minimum of the expression on the right hand
side. Let 𝑧 ∈ 𝑃 be any sabotaged input. Then there are 𝑥 ∗ , 𝑦 ∗ with 𝑔(𝑥 ∗ ) ≠ 𝑔(𝑦 ∗ ) such that 𝑥 ∗
and 𝑦 ∗ only differ where 𝑧 has special symbols. Thus any query that separates 𝑥 ∗ and 𝑦 ∗ will
also find a special symbol. The expected number of queries to separate 𝑥 ∗ and 𝑦 ∗ is at most
max𝑥,𝑦 𝔼𝑇∼𝒯 ∗ [sep𝑇 (𝑥, 𝑦)], thus the left hand side is at most the right hand side.
    For the other direction, let 𝒯 ∗ be an optimal zero-error randomized algorithm computing
𝑔sab . For any 𝑥, 𝑦 with 𝑔(𝑥) ≠ 𝑔(𝑦) we can create 𝑧 ∗ ∈ 𝑃 ∗ such that 𝑧 ∗ has ∗ in those positions
where 𝑥 and 𝑦 disagree, and 𝑧 ∗ agrees with 𝑥 and 𝑦 in those positions where they agree with each other.
Let 𝑧 † equal 𝑧 ∗ with ∗ replaced by †. Now 𝒯 ∗ is able to distinguish between 𝑧 ∗ and 𝑧 † using an
expected number of queries that is at most ℝ𝕊(𝑔). Any query that distinguishes 𝑧 ∗ and 𝑧 † is
also a query that separates 𝑥 and 𝑦, as 𝑧 ∗ and 𝑧 † only differ where 𝑥 and 𝑦 do. This means

                                    𝔼𝑇∼𝒯 ∗ [sep𝑇 (𝑥, 𝑦)] ≤ ℝ𝕊(𝑔) ,

showing that the right hand side is at most the left hand side.                                □


C    Information Theory
Let 𝑋 be a random variable supported on a finite set {𝑥 1 , . . . , 𝑥 𝑠 }. Let ℰ be any event in the
same probability space. Let ℙ[·] denote the probability of any event. The conditional entropy
H(𝑋 | ℰ) of 𝑋 conditioned on ℰ is defined as follows.
Definition C.1 (Conditional entropy).

                 H(𝑋 | ℰ) := Σ_{𝑖=1}^{𝑠} ℙ[𝑋 = 𝑥 𝑖 | ℰ] log2 ( 1 / ℙ[𝑋 = 𝑥 𝑖 | ℰ] ).

   An important special case is when ℰ is the entire sample space. In that case the above
conditional entropy is referred to as the entropy H(𝑋) of 𝑋.
Definition C.2 (Entropy).

                       H(𝑋) := Σ_{𝑖=1}^{𝑠} ℙ[𝑋 = 𝑥 𝑖 ] log2 ( 1 / ℙ[𝑋 = 𝑥 𝑖 ] ).

    Let 𝑌 be another random variable in the same probability space as 𝑋, taking values from a
finite set {𝑦1 , . . . , 𝑦𝑡 }. Then the conditional entropy of 𝑋 conditioned on 𝑌, H(𝑋 | 𝑌), is defined
as follows.
Definition C.3.

                       H(𝑋 | 𝑌) = Σ_{𝑖=1}^{𝑡} ℙ[𝑌 = 𝑦 𝑖 ] · H(𝑋 | 𝑌 = 𝑦 𝑖 ).

Definition C.4 (Mutual information). Let 𝑋, 𝑌 and 𝑍 be three random variables in the same
probability space, taking values from finite sets. The mutual information between 𝑋 and 𝑌
conditioned on 𝑍, I(𝑋; 𝑌 | 𝑍), is defined as follows.
                                 I(𝑋; 𝑌 | 𝑍) := H(𝑋 | 𝑍) − H(𝑋 | 𝑌, 𝑍).
It can be shown that I(𝑋; 𝑌 | 𝑍) is symmetric in 𝑋 and 𝑌: I(𝑋; 𝑌 | 𝑍) = I(𝑌; 𝑋 | 𝑍) = H(𝑌 |
𝑍) − H(𝑌 | 𝑋 , 𝑍).
Theorem C.5 (Chain rule of mutual information). Let 𝑋1 , . . . , 𝑋 𝑘 , 𝑌, 𝑍 be random variables in the
same probability space, taking values from finite sets. Then,

                 I(𝑋1 , . . . , 𝑋 𝑘 ; 𝑌 | 𝑍) = Σ_{𝑖=1}^{𝑘} I(𝑋𝑖 ; 𝑌 | 𝑍, 𝑋1 , . . . , 𝑋𝑖−1 ).
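The chain rule can be checked numerically on a small joint distribution by writing each mutual information as a combination of entropies; the sketch below is our own illustration (with a trivial conditioning variable 𝑍 omitted, and 𝑘 = 2):

```python
from math import log2
from itertools import product

# a small joint distribution of (X1, X2, Y): X1, X2 uniform, Y = X1 xor X2 with noise
joint = {}
for x1, x2, y in product((0, 1), repeat=3):
    joint[(x1, x2, y)] = 0.25 * (0.9 if y == x1 ^ x2 else 0.1)

def H(dist):
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def marginal(dist, coords):
    out = {}
    for atom, p in dist.items():
        key = tuple(atom[i] for i in coords)
        out[key] = out.get(key, 0.0) + p
    return out

def I(dist, a, b):  # I(A;B) = H(A) + H(B) - H(A,B)
    return H(marginal(dist, a)) + H(marginal(dist, b)) - H(marginal(dist, a + b))

lhs = I(joint, (0, 1), (2,))                   # I(X1,X2; Y)
rhs = I(joint, (0,), (2,)) + (                 # I(X1; Y) + I(X2; Y | X1), where
    H(marginal(joint, (0, 2))) + H(marginal(joint, (0, 1)))
    - H(marginal(joint, (0,))) - H(joint))     # I(X2;Y|X1) = H(X1,Y)+H(X1,X2)-H(X1)-H(X1,X2,Y)
assert abs(lhs - rhs) < 1e-12
```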

Definition C.6 (Kullback-Leibler Divergence). Given two probability distributions P and Q on a
finite set 𝒰, the Kullback-Leibler divergence from Q to P, denoted by 𝔻(P||Q), is defined as:

                       𝔻(P||Q) := Σ_{𝑢∈𝒰} P(𝑢) log ( P(𝑢) / Q(𝑢) ).



    Given two random variables, X and Y, taking values in a finite set 𝒰, let X ⊗ Y be the
distribution over ordered pairs of elements of 𝒰 (i. e., over elements of 𝒰 × 𝒰), where the first
and the second element are sampled independently according to the distributions of X and Y,
respectively. Let (X , Y) denote the joint distribution of X and Y. The following fact can be easily
verified.

Fact C.7. I(X; Y) = 𝔻((X , Y) || X ⊗ Y).
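Fact C.7 can be verified numerically on a small example; the sketch below is our own illustration, computing I(X; Y) via entropies and comparing it with the divergence, with logarithms in base 2 throughout:

```python
from math import log2

# joint distribution of (X, Y) on {0,1}^2; px, py below are its marginals
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
px = {0: 0.5, 1: 0.5}
py = {0: 0.5, 1: 0.5}

def H(dist):
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def kl(p, q):  # D(P || Q), in bits
    return sum(p[u] * log2(p[u] / q[u]) for u in p if p[u] > 0)

mutual_info = H(px) + H(py) - H(joint)                       # I(X;Y)
product_dist = {(x, y): px[x] * py[y] for x in px for y in py}  # X tensor Y
assert abs(mutual_info - kl(joint, product_dist)) < 1e-12
```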

Definition C.8. Given two probability distributions P and Q on a finite set 𝒰, the L1 -distance
between P and Q, denoted by ||P − Q|| 1 , is defined as:

                       ||P − Q|| 1 := Σ_{𝑢∈𝒰} |P(𝑢) − Q(𝑢)|.

   Pinsker’s inequality, stated below, bounds 𝔻(P||Q) from below in terms of the L1 -distance ||P − Q|| 1 .

Theorem C.9 (Pinsker’s inequality). Given two probability distributions P and Q on a finite set 𝒰,

                       𝔻(P||Q) ≥ (1/2) · ||P − Q|| 1².
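Pinsker’s inequality is easy to check numerically; the sketch below is our own illustration and uses base-2 logarithms, for which the inequality also holds (since log2 𝑒 > 1, the divergence in bits is at least the divergence in nats):

```python
from math import log2

def kl(p, q):  # D(P || Q) in bits, for full-support tuples p, q
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def l1(p, q):
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

P = (0.5, 0.3, 0.2)
Q = (1 / 3, 1 / 3, 1 / 3)
assert kl(P, Q) >= 0.5 * l1(P, Q) ** 2  # Pinsker's inequality
assert kl(P, P) == 0                    # divergence of a distribution from itself
```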

Acknowledgements. We thank Rahul Jain for useful discussions. We thank Srijita Kundu and
Jevgēnijs Vihrovs for their helpful comments on the manuscript. We thank Yuval Filmus for
suggesting to look at the min-max version of conflict complexity, which led to the development
of max-conflict complexity. We thank the anonymous reviewers for their helpful comments.
   This research was supported by the National Research Foundation, Singapore, and A*STAR
under its CQT Bridging Grant and its Quantum Engineering Programme under grant NRF2021-
QEP2-02-P05. Part of this work was conducted while T. L. and S. S. were at the Nanyang
Technological University and the Centre for Quantum Technologies, supported by the Singapore
National Research Foundation under NRF RF Award No. NRF-NRFF2013-13. Part of this
work was done while D.G. was visiting the Centre for Quantum Technologies at the National
University of Singapore.


References
 [1] Scott Aaronson: Quantum certificate complexity. J. Comput. System Sci., 74(3):313–322, 2008.
     Preliminary version in CCC’03. [doi:10.1016/j.jcss.2007.06.020, arXiv:quant-ph/0210020]
     12

 [2] Dana Angluin and Leslie G. Valiant: Fast probabilistic algorithms for Hamiltonian circuits
     and matchings. J. Comput. System Sci., 18(2):155–193, 1979. Preliminary version in STOC’77.
     [doi:10.1016/0022-0000(79)90045-X] 29

 [3] Anurag Anshu, Dmitry Gavinsky, Rahul Jain, Srijita Kundu, Troy Lee, Priyanka
     Mukhopadhyay, Miklos Santha, and Swagato Sanyal: A composition theorem for


    randomized query complexity. In Proc. 37th Found. Softw. Techn. Theoret. Comp. Sci.
    Conf. (FSTTCS’17), pp. 10:1–13. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, 2017.
    [doi:10.4230/LIPIcs.FSTTCS.2017.10, arXiv:1706.00335] 5
 [4] Boaz Barak, Mark Braverman, Xi Chen, and Anup Rao: How to compress interactive
     communication. SIAM J. Comput., 42(3):1327–1363, 2013. [doi:10.1137/100811969] 6, 8, 21
 [5] Shalev Ben-David and Eric Blais: A tight composition theorem for the randomized query
     complexity of partial functions. In Proc. 61st FOCS, pp. 240–246. IEEE Comp. Soc., 2020.
     [doi:10.1109/FOCS46700.2020.00031, arXiv:2002.10809] 3, 4, 9
 [6] Shalev Ben-David, Eric Blais, Mika Göös, and Gilbert Maystre: Randomised composi-
     tion and small-bias minimax. In Proc. 63rd FOCS, pp. 624–635. IEEE Comp. Soc., 2022.
     [doi:10.1109/FOCS54457.2022.00065, arXiv:2208.12896] 9
 [7] Shalev Ben-David and Robin Kothari: Randomized query complexity of sabotaged and
     composed functions. Theory of Computing, 14(5):1–27, 2018. Preliminary version in ICALP’16.
     [doi:10.4086/toc.2018.v014a005, arXiv:1605.09071] 4, 5, 12, 30
 [8] Amit Chakrabarti and Oded Regev: An optimal lower bound on the communication
     complexity of Gap-Hamming-Distance. SIAM J. Comput., 41(5):1299–1317, 2012. Preliminary
     version in STOC’11. [doi:10.1137/120861072, arXiv:1009.3460] 27, 28
 [9] Dmitry Gavinsky, Troy Lee, and Miklos Santha: On the randomised query complexity of
     composition, 2018. [arXiv:1801.02226] 12
[10] Dmitry Gavinsky, Troy Lee, Miklos Santha, and Swagato Sanyal: A composition theorem
     for randomized query complexity via max-conflict complexity. In Proc. 46th Internat. Colloq.
     on Automata, Languages, and Programming (ICALP’19), pp. 64:1–13. Schloss Dagstuhl–Leibniz-
     Zentrum fuer Informatik, 2019. [doi:10.4230/LIPIcs.ICALP.2019.64] 1, 9
[11] Justin Gilmer, Michael Saks, and Srikanth Srinivasan: Composition limits and separating
     examples for some Boolean function complexity measures. Combinatorica, 36(3):265–311,
     2016. Preliminary version in CCC’13. [doi:10.1007/s00493-014-3189-x, arXiv:1306.0630] 12
[12] Peter Høyer, Troy Lee, and Robert Špalek: Negative weights make adversaries stronger. In
     Proc. 39th STOC, pp. 526–535. ACM Press, 2007. [doi:10.1145/1250790.1250867, arXiv:quant-
     ph/0611054] 3
[13] Yaqiao Li: Conflict complexity is lower bounded by block sensitivity. Theoret. Comput. Sci.,
     856:169–172, 2021. [doi:10.1016/j.tcs.2020.12.038, arXiv:1810.08873] 12
[14] Ashley Montanaro: A composition theorem for decision tree complexity. Chicago J. Theoret.
     Comp. Sci., 2014(6):1–8. [doi:10.4086/cjtcs.2014.006] 3
[15] Ben W. Reichardt: Reflections for quantum query algorithms. In Proc. 22nd
     Ann. ACM–SIAM Symp. on Discrete Algorithms (SODA’11), pp. 560–569. SIAM, 2011.
     [doi:10.1137/1.9781611973082.44, arXiv:1005.1601] 3


[16] Swagato Sanyal: A composition theorem via conflict complexity, 2018. [arXiv:1801.03285]
     12

[17] Avishay Tal: Properties and applications of boolean function composition. In Proc.
     4th Innovations in Theoret. Comp. Sci. Conf. (ITCS’13), pp. 441–454. ACM Press, 2013.
     [doi:10.1145/2422436.2422485, ECCC:TR12-163] 3, 12

[18] John von Neumann: Zur Theorie der Gesellschaftsspiele. Mathematische Annalen, 100:295–
     320, 1928. EuDML. 8, 30


AUTHORS

     Dmytro Gavinsky
     Researcher
     Institute of Mathematics
     Czech Academy of Sciences
     115 67 Žitna 25, Praha 1, Czechia
      gavinsky math cas cz
     https://users.math.cas.cz/~gavinsky/


     Troy Lee
     Associate Professor
     Centre for Quantum Software and Information
     University of Technology Sydney
     troyjlee gmail com
     https://troylee.org/


     Miklos Santha
     Research Director emeritus
     CNRS, IRIF, Université Paris Cité, F-75013 Paris, France
     and
     Research Professor and Principal Investigator
     CQT, National University of Singapore
     Singapore 117543, Singapore
     miklos.santha gmail com
     https://www.irif.fr/~santha/





    Swagato Sanyal
    Assistant Professor
    Department of Computer Science and Engineering
    IIT Kharagpur
    India
    swagato cse iitkgp ac in
    http://cse.iitkgp.ac.in/~swagato/


ABOUT THE AUTHORS

    In the good old days Dmytro Gavinsky studied at the Technion - Israel Institute
        of Technology and at the University of Calgary. Thanks to the support of his
        scientific advisers Nader Bshouty, Richard Cleve and John Watrous, he graduated
        from both institutions and launched his own research activities.


    Troy Lee received his Ph. D. in 2006 from the University of Amsterdam, where
       his advisor was Harry Buhrman. He is currently an Associate Professor at the
       University of Technology Sydney.


    Miklos Santha received his Diploma in Mathematics in 1979 from Eötvös University
       in Budapest, and his Ph. D. in Mathematics in 1983 from the Université Paris 7.
       His advisor was Jacques Stern. After having been a CNRS researcher between
      1988 and 2021 he is currently Research Director emeritus at the Université Paris
       Cité. He is also Research Professor and Principal Investigator at CQT, National
       University of Singapore.


    Swagato Sanyal graduated from TIFR, Mumbai in 2017; his advisor was Prahladh
       Harsha. The work presented in this article was partly carried out when he was a
       postdoctoral fellow at Centre for Quantum Technologies, National University of
       Singapore. He is currently an Assistant Professor at the Department of Computer
       Science and Engineering, IIT Kharagpur.



