Theory of Computing, Volume 20 (7), 2024, pp. 1–62
www.theoryofcomputing.org




              Lower Bound Techniques in the
              Comparison-Query Model and
              Inversion Minimization on Trees

             Ivan Hu                 Dieter van Melkebeek               Andrew Morgan
                  Received March 18, 2023; Revised July 1, 2024; Published December 7, 2024




       Abstract. Given a rooted tree and a ranking of its leaves, what is the minimum
       number of inversions of the leaves that can be attained by ordering the tree? This
       variation of the well-known problem of counting inversions in arrays originated in
       mathematical psychology. It has the evaluation of the Mann–Whitney statistic for
       detecting differences between distributions as a special case.
           We study the complexity of the problem in the comparison-query model, the
       standard model for problems like sorting, selection, and heap construction. The
       complexity depends heavily on the shape of the tree: for trees of unit depth, the
       problem is trivial; for many other shapes, we establish lower bounds close to the
       strongest known in the model, namely the lower bound of log2 (𝑛!) for sorting 𝑛
       items. For trees with 𝑛 leaves we show, in increasing order of closeness to the sorting
       lower bound:
   A conference version of this paper appeared in the Proceedings of the 34th ACM-SIAM Symposium on Discrete
Algorithms (SODA’23) [14].


ACM Classification: Theory of computation → Computational complexity and cryptography
AMS Classification: 68Q17, 68Q25
Key words and phrases: query complexity, comparison-query model, lower bounds, permutahedron,
sensitivity, connectivity, inversions, trees, Mann–Whitney, Gaussian binomial coefficients,
Gaussian polynomials


© 2024 Ivan Hu, Dieter van Melkebeek, and Andrew Morgan
Licensed under a Creative Commons Attribution License (CC-BY)                 DOI: 10.4086/toc.2024.v020a007

       (a) log2((𝛼(1 − 𝛼)𝑛)!) − 𝑂(log 𝑛) queries are needed whenever the tree has a
           subtree that contains a fraction 𝛼 of the leaves. This implies a lower bound of
           log2((𝑘𝑛/(𝑘 + 1)²)!) − 𝑂(log 𝑛) for trees of degree 𝑘.

       (b) log2(𝑛!) − 𝑂(log 𝑛) queries are needed in case the tree is binary.

       (c) log2(𝑛!) − 𝑂(𝑘 log 𝑘) queries are needed for certain classes of trees of degree 𝑘,
           including perfect trees with even 𝑘.
          The lower bounds are obtained by developing two new techniques for any problem
      Π in the comparison-query model and applying them to inversion minimization on
      trees. Both techniques can be described in terms of the Cayley graph of the symmetric
      group with adjacent-rank transpositions as the generating set, or equivalently,
      in terms of the edge graph of the permutahedron, the polytope spanned by all
      permutations of the vector (1, 2, . . . , 𝑛). Consider the subgraph consisting of the
      edges between vertices with the same value under Π. We show that the size of any
      decision tree for Π must be at least

        (i) the number of connected components of this subgraph, and

       (ii) the factorial (or Γ function) of the average degree of the complementary
            subgraph, divided by 𝑛.
     Lower bounds on query complexity then follow by taking the base-2 logarithm.
     Technique (i) represents a discrete analog of a classical technique in algebraic
     complexity and allows us to establish (c) and a tight lower bound for counting cross
     inversions, as well as unify several of the known lower bounds in the comparison-
     query model. Technique (ii) represents an analog of sensitivity arguments in Boolean
     complexity and allows us to establish (a) and (b).
        Along the way to proving (b), we derive a tight upper bound on the maximum
     probability of the distribution of cross inversions, which is the distribution of the
     Mann–Whitney statistic in the case of the null hypothesis. Up to normalization, the
     probabilities appear in the literature as the coefficients of polynomials formed by the
     Gaussian binomial coefficients, also known as Gaussian polynomials.


1   Overview
The result of a hierarchical cluster analysis on a set 𝑋 of items can be thought of as an unordered
rooted tree 𝑇 with leaf set 𝑋. To visualize the tree, or to spell out the classification in text,
one needs to decide for every internal node of 𝑇 in which order to visit its children. Figure 1a
represents an example of a classification of eight body parts from the psychology literature
[6]. It is obtained by repeatedly clustering nearest neighbors where the distance between two
items is given by the number of people in a survey who put the items into different classes [22].
The ordering of the resulting binary tree in Figure 1a is the output produced by a particular
implementation of the clustering algorithm.





[Figure 1: Classification of body parts. (a) Initial tree ordering. (b) Optimal tree ordering, with the leaves in head-to-toe order: ear, cheek, mouth, chest, waist, thigh, knee, toe.]


    Another ordering is given in Figure 1b; black marks the nodes whose children have been
swapped from the ordering in Figure 1a. Figure 1b has the advantage over Figure 1a that the
leaves now appear in an interesting global order, namely head-to-toe: ear, cheek, mouth, chest,
waist, thigh, knee, toe. Indeed, Figure 1b makes apparent that the anatomical order correlates
perfectly with the clustering. In general, given a tree 𝑇 and a ranking 𝜌 of its leaves, one might
ask "how correlated" is 𝑇 with 𝜌? Degerman [6] suggests evaluating the orderings of 𝑇 in terms
of the number of inversions of the left-to-right ranking 𝜎 of the leaves with respect to the given
ranking 𝜌, and using the minimum number over all orderings as a measure of (non)correlation.
For further reference, we collect the formal definitions of the underlying notions.
Definition 1.1 (tree ordering, ranking, inversion, Inv·(·)). An ordering of a rooted tree 𝑇 consists
of an ordering, for every non-leaf 𝑣 of 𝑇, of the children of 𝑣. A ranking 𝜌 of a set 𝑋 of 𝑛 items is
a bijection from 𝑋 to [𝑛]. Given two rankings 𝜎 and 𝜌, an inversion of 𝜎 with respect to 𝜌 is a
pair of items 𝑥1, 𝑥2 ∈ 𝑋 such that 𝜌(𝑥1) < 𝜌(𝑥2) but 𝜎(𝑥1) > 𝜎(𝑥2). The number of inversions
is denoted by Inv𝜌(𝜎). An inversion in an array 𝐴 of distinct values is an inversion of 𝜎 with
respect to 𝜌 where 𝜎 denotes the ranking by array index and 𝜌 the ranking by value; in this
setting we write Inv(𝐴) for Inv𝜌(𝜎).
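As a concrete illustration of Definition 1.1, the following brute-force sketch (the function names are ours, not the paper's) counts inversions directly from the definition:

```python
from itertools import combinations

def inv(sigma, rho):
    """Inv_rho(sigma): pairs of items that the two rankings order differently."""
    return sum(1 for x1, x2 in combinations(sigma, 2)
               if (rho[x1] - rho[x2]) * (sigma[x1] - sigma[x2]) < 0)

def inv_array(a):
    """Inv(A) for an array of distinct values: sigma = rank by index, rho = rank by value."""
    return sum(1 for i, j in combinations(range(len(a)), 2) if a[i] > a[j])
```

For instance, inv_array([2, 4, 1, 3]) counts the out-of-order pairs (2, 1), (4, 1), and (4, 3), giving 3.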
    The minimum number of inversions over all tree orderings can be used to compare the quality
of different trees 𝑇 for a given ranking 𝜌, or of different rankings 𝜌 for a given tree 𝑇. This mimics
the use of the number of inversions in applications like collaborative filtering in recommender
systems, rank aggregation for meta searching the web, and Kendall’s test for dependencies
between two random variables. In particular, the Mann–Whitney test for differences between
random variables can be viewed as a special case of our optimization problem. The test is
widely used because of its nonparametric nature, meaning that no assumptions need to be made
about the distribution of the two variables; the distribution of the statistic in the case of the null
hypothesis (both variables have the same distribution) is always the same. The test achieves
this property by only considering the relative order of the samples. It takes a sequence 𝐴 of 𝑎
samples from a random variable 𝑌, a sequence 𝐵 of 𝑏 samples from another random variable 𝑍,
and computes the statistic 𝑈 = min(XInv(𝐴, 𝐵), XInv(𝐵, 𝐴)), the minimum of the number
XInv(𝐴, 𝐵) of cross inversions from 𝐴 to 𝐵 and vice versa, defined as follows.


Definition 1.2 (cross inversions, XInv·(·, ·)). Let 𝜌 be a ranking of 𝑋, and 𝐴, 𝐵 ⊆ 𝑋. A cross
inversion from 𝐴 to 𝐵 with respect to 𝜌 is a pair (𝑥1, 𝑥2) ∈ 𝐴 × 𝐵 that is out of order with respect
to 𝜌, i. e., such that 𝜌(𝑥1) > 𝜌(𝑥2). The number of cross inversions is denoted by XInv𝜌(𝐴, 𝐵).
For two arrays 𝐴 and 𝐵 of distinct values, a cross inversion from 𝐴 to 𝐵 is a cross inversion from
the set of entries in 𝐴 to the set of entries in 𝐵 where 𝜌 denotes the ranking by value; in this
setting we write XInv(𝐴, 𝐵) for XInv𝜌(𝐴, 𝐵).

    The statistic 𝑈 coincides with the optimum value of our optimization problem with the tree
𝑇 in Figure 2 as input. The leftmost 𝑎 leaves correspond to the samples 𝐴, the rightmost 𝑏 leaves
to the samples 𝐵, and the ranking 𝜌 to the value order of the combined 𝑎 + 𝑏 samples.
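In code, the statistic and Definition 1.2 can be sketched as follows (a brute-force illustration with names of our choosing, operating on arrays of distinct values):

```python
def xinv(a, b):
    """XInv(A, B): pairs (x1, x2) in A x B that are out of order, i.e., x1 > x2."""
    return sum(1 for x1 in a for x2 in b if x1 > x2)

def mann_whitney_u(a, b):
    """U = min(XInv(A, B), XInv(B, A))."""
    return min(xinv(a, b), xinv(b, a))
```

Note that for distinct values XInv(𝐴, 𝐵) + XInv(𝐵, 𝐴) = 𝑎𝑏, so the two counts determine each other.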




[Figure 2: Mann–Whitney instance — a root with a subtree of 𝑎 leaves on the left and a subtree of 𝑏 leaves on the right.]

   We mainly study the value version of our optimization problem, which we denote by MinInv.

Definition 1.3 (inversion minimization on trees, MinInv(·, ·), Π·). Inversion minimization on
trees is the computational problem with the following specification:

Input: A rooted tree 𝑇 with leaf set 𝑋 of size 𝑛, and a ranking 𝜌 of 𝑋.

Output: MinInv(𝑇, 𝜌), the minimum of Inv𝜌 (𝜎) over all possible orderings of 𝑇, where 𝜎 denotes
    the left-to-right ranking of 𝑋 induced by the ordering of 𝑇.

For any fixed tree 𝑇 with leaf set 𝑋, we use the shorthand Π𝑇 to denote the computational
problem that takes as input a ranking 𝜌 of 𝑋 and outputs MinInv(𝑇, 𝜌).

    Degerman [6] observes that the ordering at each internal node can be optimized independently
in a greedy fashion. In the setting of binary trees, for each node 𝑣, we can count the cross
inversions from the leaves in the left subtree of 𝑣 to the leaves in the right subtree of 𝑣. Between
the two possible orderings of the children of a node 𝑣, we choose the one that yields the smaller
number of cross inversions. Based on his observation, Degerman presents a polynomial-time
algorithm for the case of binary trees 𝑇. A more refined implementation and analysis yields
a running time of 𝑂(𝑑avg(𝑇) · 𝑛), where 𝑑avg(𝑇) denotes the average depth of a leaf in 𝑇. For
balanced binary trees the running time becomes 𝑂(𝑛 log 𝑛). All of this can be viewed as variants
of the well-known 𝑂(𝑛 log 𝑛) divide-and-conquer algorithm for counting inversions in arrays of
length 𝑛.
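Degerman's greedy observation translates into a merge-sort-style recursion for binary trees. The sketch below (our own illustration, not the paper's implementation) counts the cross inversions 𝑥 at each internal node during the merge and adds min(𝑥, ℓ𝑟 − 𝑥), where ℓ and 𝑟 are the leaf counts of the two subtrees:

```python
def min_inv(tree, rho):
    """MinInv(T, rho) for a binary tree given as nested pairs; leaves are items.
    Returns (minimum inversions, sorted ranks of the subtree's leaves)."""
    if not isinstance(tree, tuple):            # leaf: a single item
        return 0, [rho[tree]]
    cl, left = min_inv(tree[0], rho)
    cr, right = min_inv(tree[1], rho)
    x, merged, i, j = 0, [], 0, 0              # x = cross inversions left -> right
    while i < len(left) or j < len(right):
        if j == len(right) or (i < len(left) and left[i] < right[j]):
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
            x += len(left) - i                 # all remaining left ranks exceed right[j]
    # swapping the two children turns x into len(left)*len(right) - x
    return cl + cr + min(x, len(left) * len(right) - x), merged
```

For balanced trees this runs in 𝑂(𝑛 log 𝑛) time, mirroring the divide-and-conquer inversion counter mentioned above.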


    For trees of degree deg(𝑇) > 2, the local greedy optimization at each internal node becomes
more complicated, as there are many ways to order the children of each internal node. Exhaustive
search results in a running time of 𝑂((deg(𝑇)! + deg(𝑇) · 𝑑avg(𝑇)) · 𝑛), which can be improved to
𝑂((deg(𝑇)² 2^deg(𝑇) + deg(𝑇) · 𝑑avg(𝑇)) · 𝑛) using dynamic programming. The problem is closely
related to the classical problem of minimum feedback arc set, and becomes NP-hard without
any constraints on the degree. We refer to Section 10 for more details.



Query complexity. Rather than running time in the Turing machine model, our focus lies
on query complexity in the comparison-query model. There we can only access the ranking
𝜌 : 𝑋 → [𝑛] via queries of the form: Is 𝜌(𝑥1) < 𝜌(𝑥2)? For any fixed tree 𝑇, we want to determine
the minimum number of queries needed to solve the problem.
    The comparison-query model represents the standard model for analyzing problems like
sorting, selection, and heap construction. Sorting represents the hardest problem in the
comparison-query model as it is tantamount to knowing the entire ranking 𝜌. Its query
complexity has a well-known information-theoretic lower bound of log2(𝑛!) = 𝑛 log2(𝑛/e) +
(1/2) log2(𝑛) + 𝑂(1). Standard algorithms such as mergesort and heapsort yield an upper bound
of log2(𝑛!) + 𝑂(𝑛), which has been improved to log2(𝑛!) + 𝑜(𝑛) recently [26]. We refer to Section 2
for an overview of results and techniques for lower bounds in the model.
    Information theory only yields a very weak lower bound on the query complexity of inversion
minimization on trees: log2(𝑛(𝑛 − 1)/2) = 2 log2(𝑛) − 𝑂(1). The complexity of the problem critically
depends on the shape of the tree 𝑇 and can be significantly lower than the one for sorting. For
starters, the problem becomes trivial for trees of depth one as their leaves can be arranged freely
in any order. More precisely, the trees 𝑇 for which the answer is identically zero, irrespective of
the ranking 𝜌, are exactly those such that all root-to-leaf paths have only the root in common.
    Arguably, the simplest nontrivial instances of inversion minimization are for trees 𝑇 of the
Mann–Whitney type in Figure 2 with 𝑎 = 1 and 𝑏 = 𝑛 − 1. Depending on the rank 𝑟 of the
isolated leaf, an optimal ordering of 𝑇 is either the left or the right part in Figure 3, where the
label of each leaf is its rank under 𝜌.




[Figure 3: Rank instance — the isolated leaf of rank 𝑟 placed leftmost (left) or rightmost (right) of the leaves with ranks 1, 2, . . . , 𝑛.]

    As the ordering on the left has 𝑟 − 1 inversions and the one on the right 𝑛 − 𝑟, the answer is
min(𝑟 − 1, 𝑛 − 𝑟). Thus, this instance of inversion minimization on trees is essentially equivalent
to rank finding, which has query complexity exactly 𝑛 − 1.


1.1     Main results
We prove that for many trees 𝑇, inversion minimization on 𝑇 is nearly as hard as sorting. First,
we exhibit a common structure that guarantees high complexity, namely a subtree that contains
a fairly balanced fraction of the leaves. We make use of the following notation.

Definition 1.4 (leaf set, L(·), and subtree). For a tree 𝑇, the leaf set of 𝑇, denoted L(𝑇), is the set
of leaves of 𝑇. For a node 𝑣 of 𝑇, 𝑇𝑣 denotes the subtree of 𝑇 rooted at 𝑣.

    The quantitative statement references the gamma function Γ, which is a proxy for any convex
real function that interpolates the factorial function on the positive integers. More precisely, we
have that Γ(𝑛 + 1) = 𝑛! for every integer 𝑛 ≥ 1.

Theorem 1.5 (lower bound for general trees). Let 𝑇 be a tree with 𝑛 leaves, and 𝑣 a node with
|L(𝑇𝑣)| = ℓ. The query complexity of inversion minimization on 𝑇 is at least log2(Γ(ℓ(𝑛 − ℓ)/𝑛 + 1)).
In particular, the complexity is at least log2(Γ(𝑘𝑛/(𝑘 + 1)² + 1)) where 𝑘 denotes the degree of 𝑇.

    For trees of constant degree, Theorem 1.5 yields a lower bound that is as strong as the one
for sorting up to a constant multiplicative factor. For the important case of binary trees (like
the classification trees from the motivating example), we obtain a lower bound that is only a
logarithmic additive term shy of the lower bound for sorting.

Theorem 1.6 (lower bound for binary trees). For binary trees 𝑇 with 𝑛 leaves, the query complexity
of inversion minimization on 𝑇 is at least log2(𝑛!) − 𝑂(log 𝑛).

    The logarithmic loss can be reduced to a constant for certain restricted classes of trees. The
full statement is somewhat technical. First, it assumes that the tree has no nodes of degree 1. This
is without loss of generality, as we can short-cut all degree-1 nodes in the tree without affecting
the minimum number of inversions. For example, trivial trees for inversion minimization have
depth 1 without loss of generality. Second, the strength of the lower bound depends on the
maximum size of a leaf child set, defined as follows.

Definition 1.7 (leaf child set, LC(·)). The leaf child set LC(𝑣) of a vertex 𝑣 in a tree 𝑇 is the set
of all children of 𝑣 that are leaves in 𝑇.

   Most importantly, the result requires certain fragile parity conditions to hold. That said,
there are interesting classes satisfying all requirements, and the bounds are very tight.

Theorem 1.8 (lower bound for restricted classes). Let 𝑇 be a tree without nodes of degree 1 such that
the leaf child sets have size at most 𝑘, at most one of them has odd size, and if there exists an odd one, say
LC(𝑣*), then all ancestors of 𝑣* have empty leaf child sets. The query complexity of inversion minimization
on 𝑇 is at least log2(𝑛!) − 𝑂(𝑘 log 𝑘). In particular, the lower bound applies to:

      • perfect trees of even degree 𝑘, and

      • full binary (𝑘 = 2) trees with at most one leaf without a sibling leaf.


    Recall that a tree of degree 𝑘 is full if every node has degree 0 or 𝑘. It is perfect if it is full
and all leaves have the same depth.
    For the Mann–Whitney statistic, Theorem 1.5 provides an Ω(𝑛 log 𝑛) lower bound for
balanced instances, i. e., when 𝑎 and 𝑏 are Θ(𝑛). For unbalanced instances there is a more
efficient way to count cross inversions and thus evaluate the statistic: Sort the smaller of the two
sides, and then do a binary search for each item of the larger side to find its position within the
sorted smaller side so as to determine the number of cross inversions that it contributes. For
𝑎 ≤ 𝑏 the approach makes 𝑏 log2(𝑎) + 𝑂(𝑎 log 𝑎) comparisons. We establish a lower bound that
shows the approach is optimal up to a constant multiplicative factor.
Theorem 1.9 (lower bound for counting cross inversions). Counting cross inversions from a set 𝐴
of size 𝑎 to a set 𝐵 of size 𝑏 ≥ 𝑎 with respect to a ranking 𝜌 of 𝑋 = 𝐴 ⊔ 𝐵 requires Ω((𝑎 + 𝑏) log(𝑎))
queries in the comparison-query model, as does inversion minimization on the tree of Figure 2.
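The sort-and-binary-search strategy for unbalanced instances can be sketched as follows (an illustration under our own naming; the comparison count is implicit in the sort and the searches):

```python
from bisect import bisect_left

def xinv_sorted(a, b):
    """Cross inversions from A to B when A is the smaller side:
    sort A, then binary-search each element of B into the sorted A."""
    small = sorted(a)                          # O(a log a) comparisons
    # each y in B is out of order with exactly the elements of A above it
    return sum(len(small) - bisect_left(small, y) for y in b)
```

With |𝐴| = 𝑎 ≤ 𝑏 = |𝐵| and distinct values, each of the 𝑏 binary searches costs about log2(𝑎) comparisons, matching the 𝑏 log2(𝑎) + 𝑂(𝑎 log 𝑎) bound in the text.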

1.2   Techniques

We obtain our results by developing two new query lower bound techniques for problems Π
in the comparison-query model, and then instantiating them to the problem Π𝑇 of inversion
minimization on a fixed tree 𝑇. Although some of our techniques extend to relations, we restrict
attention to computational problems Π that are functions, just like the problem Π𝑇 that we
focus on.
Definition 1.10 (computational problem and algorithm in the comparison-query model). A
computational problem in the comparison-query model is a total function on the rankings of a
set 𝑋. An algorithm in the comparison-query model can access an input ranking 𝜌 : 𝑋 → [𝑛]
using comparison queries: For given 𝑥1, 𝑥2 ∈ 𝑋, test whether 𝜌(𝑥1) < 𝜌(𝑥2).
    Both of our techniques follow the common pattern of lower bounding the number of distinct
execution traces that any algorithm for Π needs to have.
Definition 1.11 (execution trace, complexity measures D(·) and Q(·)). Consider an algorithm 𝐴
for a problem Π in the comparison-query model. An execution trace of 𝐴 is the sequence of
comparisons that 𝐴 makes on some input 𝜌, together with the outcomes of the comparisons. The
complexity D(Π) is the minimum, over all possible algorithms for Π, of the number of distinct
traces the algorithm has over the set of all inputs 𝜌. The complexity Q(Π) is the minimum, over
all possible algorithms for Π, of the maximum number of comparisons that the algorithm makes
over the set of all inputs 𝜌.
    The complexity measure Q is what we refer to as query complexity. Since the maximum
number of queries that an algorithm 𝐴 makes is at least the base-2 logarithm of the number
of execution traces, we have that Q(Π) ≥ log2(D(Π)). Note that, in order to avoid confusion
with the tree 𝑇 specifying an instance of inversion minimization, we refrain from the common
terminology of decision trees in the context of the complexity measure D. In those terms, we
lower bound the number of leaves of any decision tree for Π, and use the fact that the depth of
this binary decision tree is at least the base-2 logarithm of the number of leaves.


   Both techniques proceed by considering the effect on the output of perturbations to the input
ranking 𝜌 that are hard for queries to observe. More specifically, we consider the following
perturbations:
Definition 1.12 (adjacent-rank transposition, affected items). An adjacent-rank transposition is
a permutation 𝜏 of [𝑛] of the form 𝜏 = (𝑟, 𝑟 + 1), where 𝑟 ∈ [𝑛 − 1] and 𝑛 denotes the number
of items. Given 𝜏 and a ranking 𝜌 : 𝑋 → [𝑛], the affected items are the two elements 𝑥 ∈ 𝑋 for
which 𝜏(𝜌(𝑥)) ≠ 𝜌(𝑥), i. e., the items with ranks 𝑟 and 𝑟 + 1 under 𝜌.
    As with any permutation of the set of ranks, the effect of 𝜏 on a ranking 𝜌 is the ranking 𝜏𝜌.
Adjacent-rank transpositions are the least noticeable perturbations one can apply to a ranking
in the following sense: If two rankings differ by an adjacent-rank transposition, then the only
query that distinguishes them is the query that compares the affected items.

Sensitivity. Our first technique turns this observation around to obtain a lower bound on
query complexity. We adopt the terminology of sensitivity from Boolean query complexity.

Definition 1.13 (sensitivity, average sensitivity, 𝑠(·)). Let Π be a computational problem in
the comparison-query model on a set 𝑋 of items. For a fixed ranking 𝜌 and adjacent-rank
transposition 𝜏, we say that Π is sensitive to 𝜏 at 𝜌 if Π(𝜌) ≠ Π(𝜏𝜌). The sensitivity of Π at 𝜌
is the number of adjacent-rank transpositions 𝜏 such that Π is sensitive to 𝜏 at 𝜌. The average
sensitivity of Π, denoted 𝑠(Π), is the average sensitivity of Π at 𝜌 when 𝜌 is drawn uniformly at
random from all rankings of 𝑋.
    On input a ranking 𝜌, any algorithm for Π needs to make a number of queries that is at
least the sensitivity of Π at 𝜌. Indeed, consider an adjacent-rank transposition 𝜏 to which Π is
sensitive at 𝜌. If the algorithm does not make the query that compares the affected items, then it
must output the same answer on input 𝜏𝜌 as on input 𝜌. Since the value of Π differs on the two
inputs, the algorithm makes a mistake on at least one of them. It follows that the
average number of queries that any algorithm for Π makes is at least the average sensitivity
𝑠(Π). A fortiori, Q(Π) ≥ 𝑠(Π).
    As sensitivity cannot exceed 𝑛 − 1, the best lower bound on query complexity that we can
establish based on the above basic observation alone is 𝑛 − 1. The following improvement
yields the lower bound D(Π) ≥ 𝑛!/𝑛 = (𝑛 − 1)!, and therefore Q(Π) ≥ log2(𝑛!) − log2(𝑛), for
problems Π of maximum average sensitivity 𝑠(Π) = 𝑛 − 1. The argument hinges on an efficient
encoding of rankings that share the same execution trace. See Section 3 for more details.
Lemma 1.14 (Sensitivity Lemma). For any problem Π in the comparison-query model with 𝑛 items,
D(Π) ≥ Γ(𝑠(Π) + 2)/𝑛.
    The lower bound for general trees 𝑇 in Theorem 1.5 and the strengthening for binary trees
in Theorem 1.6 follow from corresponding lower bounds on the average sensitivity 𝑠(Π𝑇).
Theorem 1.5 only requires a short analysis to establish the sensitivity lower bound needed for
the application of the Sensitivity Lemma; this illustrates the power of the lemma and of the
lower bound technique. Theorem 1.6 requires a more involved sensitivity analysis, but then


yields a very tight lower bound. Owing to the average-case nature of the underlying measure,
the technique also exhibits some degree of robustness. For the particular problem of inversion
minimization on trees, we show that small changes to the tree 𝑇 do not affect the average
sensitivity 𝑠(Π𝑇) by much. See Section 4 and Section 5.
    For sorting, counting inversions, and inversion parity, the average sensitivity reaches its
maximum value of 𝑛 − 1, and Lemma 1.14 recovers the standard lower bounds up to a small
loss. In contrast, for selection, the average sensitivity equals 1 for ranks 1 and 𝑛, and 2 for other
ranks, so the bound from Lemma 1.14 is of no use. This reflects that, just like in the Boolean
setting, (average) sensitivity is sometimes too rough a measure and not always capable of
proving strong lower bounds. Our second technique looks at a more delicate structural aspect,
which enables it to sometimes yield stronger lower bounds.
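For small 𝑛, average sensitivity can be checked by brute force against Definition 1.13. The sketch below is our own illustration (pi is any function of a rank tuple); it confirms, for instance, that counting inversions has average sensitivity 𝑛 − 1 while finding the item of rank 1 has average sensitivity 1:

```python
from itertools import permutations
from math import factorial

def avg_sensitivity(n, pi):
    """s(Pi) by brute force: rho is a tuple with rho[i] = rank of item i;
    tau = (r, r+1) swaps the two items holding ranks r and r+1."""
    total = 0
    for rho in permutations(range(1, n + 1)):
        for r in range(1, n):
            tau_rho = tuple(r + 1 if v == r else r if v == r + 1 else v for v in rho)
            if pi(rho) != pi(tau_rho):
                total += 1
    return total / factorial(n)

def inversions(rho):
    return sum(1 for i in range(len(rho)) for j in range(i + 1, len(rho))
               if rho[i] > rho[j])
```

Here avg_sensitivity(4, inversions) returns 3.0, and avg_sensitivity(4, lambda rho: rho.index(1)) returns 1.0, matching the values quoted above for counting inversions and for selection of rank 1.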

Permutahedron graph. Before introducing our second technique, we cast our first technique
in graph-theoretic terms. In fact, both our techniques can be expressed naturally in subgraphs
of the graph with the rankings as vertices and adjacent-rank transpositions as edges. The
latter graph can be viewed as the Cayley graph of the symmetric group with adjacent-rank
transpositions as the generating set. It is also the edge graph of the permutahedron, the convex
polytope spanned by all permutations of the vector (1, 2, . . . , 𝑛) in ℝ𝑛. The permutahedron
resides inside the hyperplane where the sum of the coordinates equals 𝑛(𝑛 + 1)/2, has positive
volume inside that hyperplane, and can thus be represented naturally in dimension 𝑛 − 1; see
Figure 4 for a rendering of the instance with 𝑛 = 4 [9].
    We think of coloring the vertices of the permutahedron with their values under Π and
make use of the subgraph with the same vertex set but only containing the monochromatic
edges, i. e., the edges whose end points have the same value under Π. We also consider the
complementary subgraph containing all bichromatic edges.

[Figure 4: Permutahedron for 𝑛 = 4 items]

Definition 1.15 (permutahedron graph, 𝐺(·), 𝐺̄(·)). Let Π be a computational problem in
the comparison-query model on a set 𝑋 of items. The permutahedron graph of Π, denoted
𝐺(Π), has the rankings of 𝑋 as vertices, and an edge between two rankings 𝜌1 and 𝜌2 if
Π(𝜌1) = Π(𝜌2) and there exists an adjacent-rank transposition 𝜏 such that 𝜌2 = 𝜏𝜌1. The
complementary permutahedron graph of Π, denoted 𝐺̄(Π), is defined similarly by replacing the
condition Π(𝜌1) = Π(𝜌2) by its complement, Π(𝜌1) ≠ Π(𝜌2).

    Our first technique looks at degrees in the complementary permutahedron graph 𝐺̄(Π), and
more specifically at the average degree degavg(𝐺̄(Π)) = 𝔼(deg𝐺̄(Π)(𝜌)), where the expectation is


with respect to a uniform choice of the ranking 𝜌. Our second technique looks at the connected
components of the permutahedron graph 𝐺(Π).

Connectivity. Our second technique is reminiscent of a result in algebraic complexity theory,
where the number of execution traces of an algorithm for a problem Π in the algebraic
comparison-query model is lower bounded in terms of the number of connected components
that Π induces in its input space ℝ𝑛 [1]. In the comparison-query setting, we obtain the following
lower bound.

Lemma 1.16 (Connectivity Lemma). For any problem Π in the comparison-query model, D(Π) is at
least the number of connected components of 𝐺(Π).

    The Connectivity Lemma allows for a simple and unified exposition of many of the known
lower bounds. For counting inversions and inversion parity the argument goes as follows. Every
adjacent-rank transposition changes the number of inversions by exactly one (up or down), and
therefore changes the output of Π, so all 𝑛! vertices in 𝐺(Π) are isolated. This means that any
algorithm for Π actually needs to sort and has to make at least log2(𝑛!) queries. See Section 7 for
a proof of the Connectivity Lemma and more applications to classical problems, including the
Ω(𝑛) lower bound for median finding.
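For small 𝑛, the Connectivity Lemma's bound can be verified directly by building 𝐺(Π) and counting its components with union-find; the sketch below uses our own naming:

```python
from itertools import permutations

def num_components(n, pi):
    """Connected components of the permutahedron graph G(Pi): vertices are
    rank tuples, edges join rho and tau*rho (tau an adjacent-rank
    transposition) whenever Pi assigns both the same value."""
    verts = list(permutations(range(1, n + 1)))
    parent = {v: v for v in verts}
    def find(v):                               # union-find with path halving
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v
    for rho in verts:
        for r in range(1, n):
            tau_rho = tuple(r + 1 if x == r else r if x == r + 1 else x for x in rho)
            if pi(rho) == pi(tau_rho):         # monochromatic edge
                parent[find(rho)] = find(tau_rho)
    return len({find(v) for v in verts})
```

For counting inversions on 𝑛 = 3 items every edge is bichromatic, so all 3! = 6 vertices are isolated and D(Π) ≥ 6, whereas a constant function leaves the whole graph connected.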
    The Connectivity Lemma also enables us to establish strong lower bounds for inversion
minimization on special types of trees 𝑇, namely those of Theorem 1.8 and the Mann–Whitney
instances in Theorem 1.9, closely related to counting inversions. Both theorems involve an
analysis of the size of the connected component of a random ranking 𝜌 in 𝐺(Π𝑇), and Theorem 1.8
uses the delicate parity conditions of its statement to keep 𝐺(Π𝑇) as sparse as possible. See
Section 8 for more details, including a more general property that guarantees the required
sparseness (the partition property from Definition 8.1) and the resulting lower bound for any
problem Π that satisfies the property (Lemma 8.4).
    The Mann–Whitney setting illustrates well the relative power of our techniques. In the
Mann–Whitney instances of inversion minimization, the leaves are naturally split between
a subtree containing π‘Ž of them and a subtree containing 𝑏 of them. The argument behind
                                         π‘Žπ‘
Theorem 1.5 yields a lower bound of π‘Ž+𝑏     on the sensitivity 𝑠(Π𝑇 ). The true sensitivity is just
𝑂(1) below the one for counting cross inversions, which is π‘Ž+𝑏
                                                             2π‘Žπ‘
                                                                 . The resulting lower bounds on
the query complexity in case π‘Ž ≀ 𝑏 are Θ(π‘Ž log π‘Ž), which roughly account for sorting the smaller
side but not for the 𝑏 log2 (π‘Ž) comparisons used in the subsequent binary searches for counting
cross inversions. Our approach based on the Connectivity Lemma yields a lower bound that
includes both terms. On the other hand, it is easier to estimate and obtain the lower bound via
the Sensitivity Lemma than to argue the query lower bound via the Connectivity Lemma or
from scratch.
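For intuition about where the 𝑏 log2 (π‘Ž) term comes from, here is a sketch (our own illustration, not the algorithm analyzed in the text) of counting cross inversions by sorting the smaller side 𝐴 and binary-searching each element of 𝐡 into it, for roughly π‘Ž log π‘Ž + 𝑏 log π‘Ž comparisons in total:

```python
from bisect import bisect_left

def cross_inversions(A, B):
    """Count pairs (x, y) with x in A, y in B, and x ranked above y.
    Sorting A takes ~ a*log(a) comparisons; each binary search ~ log(a)."""
    A_sorted = sorted(A)
    total = 0
    for y in B:
        pos = bisect_left(A_sorted, y)   # number of elements of A below y
        total += len(A_sorted) - pos     # number of elements of A above y
    return total

# agrees with the brute-force count over all pairs
A, B = [5, 1, 4], [2, 3, 6]
assert cross_inversions(A, B) == sum(x > y for x in A for y in B)
```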

1.3   Other results
Other modes of computation. We stated our lower bounds for the standard, deterministic
mode of computation. Both of our techniques provide lower bounds for the number of distinct

                     T HEORY OF C OMPUTING, Volume 20 (7), 2024, pp. 1–62                        10
                    L OWER B OUND T ECHNIQUES IN THE C OMPARISON -Q UERY M ODEL

execution traces that are needed to cover all input rankings, irrespective of whether these
execution traces derive from a single algorithm. Such execution traces can be viewed as
certificates or witnesses for the value of Π on a given input 𝜌, or as valid execution traces of a
nondeterministic algorithm for Ξ . We define the minimum number of traces needed to cover all
input rankings for a problem Ξ  as the nondeterministic complexity of Ξ  and denote it by N(Ξ ),
along the lines of the Boolean setting [15]. All of our lower bounds on D(Ξ ) actually hold for
N(Ξ ). See Remark 3.4 and Remark 7.2 for further discussion.
    Since randomized algorithms with zero error are also nondeterministic algorithms, all of our
lower bounds apply verbatim to the former mode of computation, as well. As for randomized
algorithms with bounded error, we argue in Section 6 that our lower bounds on the query complexity
of inversion minimization on trees that follow from the Sensitivity Lemma carry over modulo a
small loss in strength. We do so by showing in general that high average sensitivity implies
high query complexity against such algorithms (see Lemma 6.5).
    The fact that our techniques yield lower bounds on N(Ξ ) and not just D(Ξ ) also explains
why our approaches sometimes fail. For example, for the problem Ξ  of finding the minimum
of 𝑛 items, a total of 𝑛 certificates suffice and are needed, namely one for each possible item
being the minimum. This means that our techniques cannot give a lower bound on the query
complexity of Ξ  that is better than log2 (𝑛). In contrast, as reviewed in Section 2, D(Ξ ) = 2^{π‘›βˆ’1}
and the number of queries needed is 𝑛 βˆ’ 1.
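The gap is easy to see concretely in a minimal sketch of the standard scan (ours, for illustration): the scan always makes exactly 𝑛 βˆ’ 1 comparisons, yet the outcome is certified by one of only 𝑛 possible certificates, one per candidate minimum.

```python
def find_min(items):
    """Scan for the minimum while counting comparisons; uses exactly n - 1.
    The comparisons against the eventual minimum form its certificate."""
    comparisons = 0
    best = items[0]
    for x in items[1:]:
        comparisons += 1
        if x < best:
            best = x
    return best, comparisons

best, used = find_min([7, 3, 9, 1, 4])
assert best == 1 and used == 4   # n - 1 comparisons for n = 5
```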

Cross-inversion distribution. As a technical result in the sensitivity analysis for inversion
minimization on binary trees (Theorem 1.6), we need a strong upper bound on the probability
that the number of cross inversions XInv𝜌 (𝐴, 𝐡) takes on any particular value when the ranking 𝜌
of the set 𝑋 = 𝐴 ⊔ 𝐡 is chosen uniformly at random. This is the distribution of the Mann–Whitney
statistic under the null hypothesis. Mann and Whitney [20] argued that it converges to a normal
distribution with mean πœ‡ = π‘Žπ‘/2 and variance 𝜎² = π‘Žπ‘(π‘Ž + 𝑏 + 1)/12 as π‘Ž ≔ |𝐴| and 𝑏 ≔ |𝐡| grow
large. Since the normal distribution has a maximum density of 1/(√(2πœ‹) 𝜎), their result suggests
that the maximum of the underlying probability distribution is 𝑂(1/𝜎) = 𝑂(1/√(π‘Žπ‘(π‘Ž + 𝑏 + 1))).
TakΓ‘cs [28] managed to formally establish such a bound for all pairs (π‘Ž, 𝑏) with |π‘Ž βˆ’ 𝑏| = 𝑂(√(π‘Ž + 𝑏)),
Stanley and Zanello [27] for all pairs (π‘Ž, 𝑏) with min(π‘Ž, 𝑏) bounded, and Melczer, Panova, and
Pemantle [21] for all pairs (π‘Ž, 𝑏) with |π‘Ž βˆ’ 𝑏| ≀ 𝛼 Β· (π‘Ž + 𝑏) for some constant 𝛼 < 1. However, these
results do not cover all regimes and leave open a single bound of the same form that applies to
all pairs (π‘Ž, 𝑏), which is what we need for Theorem 1.6. We establish such a bound in Section 9.
The counts of the rankings 𝜌 with a particular value for XInv𝜌 (𝐴, 𝐡) appear as the coefficients of
the Gaussian polynomials. Our bound can be stated equivalently as a bound on those coefficients.
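These coefficients are easy to tabulate for small parameters. The following sketch (ours) computes the number of interleavings of 𝐴 and 𝐡 with exactly π‘˜ cross inversions via the standard recurrence for the coefficients of the Gaussian binomial, under one natural convention for which pairs count as inverted:

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def gaussian_coeff(a, b, k):
    """Number of interleavings of a items of A and b items of B with exactly
    k cross inversions: the coefficient of q^k in the Gaussian binomial."""
    if k < 0:
        return 0
    if a == 0 or b == 0:
        return 1 if k == 0 else 0
    # the last position holds an A-item (creating no new cross pairs) or a
    # B-item (which pairs with all a items of A preceding it)
    return gaussian_coeff(a - 1, b, k) + gaussian_coeff(a, b - 1, k - a)

a, b = 3, 4
# coefficients sum to the plain binomial, and are palindromic in k
assert sum(gaussian_coeff(a, b, k) for k in range(a * b + 1)) == comb(a + b, a)
assert all(gaussian_coeff(a, b, k) == gaussian_coeff(a, b, a * b - k)
           for k in range(a * b + 1))
```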


1.4     Organization
We have organized the material so as to provide a shortest route to a full proof of Theorem 1.5.
Here are the sections needed for the different main results:

      β€’ Theorem 1.5 (lower bound for general trees): 3, 4.

                      Ivan Hu, Dieter van Melkebeek, and Andrew Morgan

    β€’ Theorem 1.6 (lower bound for binary trees): 3, 5, 9.

    β€’ Theorem 1.8 (lower bound for restricted classes): 7, 8 up to 8.3 inclusive.

    β€’ Theorem 1.9 (lower bound for counting cross inversions): 7, 8 but not 8.2 nor 8.3.

In Section 2, we provide some background on known lower bounds in the comparison-query
model, several of which are unified by the Sensitivity Lemma and Connectivity Lemma. In
Section 6, we present our lower bounds against randomized algorithms with bounded error.
The tight bound on maximum probability of the cross-inversion distribution is covered in
Section 9. For completeness, we end in Section 10 with proofs of the results we stated on the
Turing complexity of inversion minimization on trees.


2    The comparison-query model
In this section we provide an overview of known results and techniques for lower bounds in the
comparison-query model. This section can be skipped without a significant loss in continuity.
    Tight bounds have been established for problems like sorting, selection, and heap construction.
    β€’ We already discussed the central problem of sorting in Section 1.

    β€’ In selection we are told a rank π‘Ÿ, and must identify the item with rank π‘Ÿ. The query
      complexity is known to be Θ(𝑛) [2, 7, 8]. There is also multiple selection, in which one is
      given multiple ranks π‘Ÿ1 , . . . , π‘Ÿ π‘˜ , and must identify each of the corresponding items. The
      query complexity of multiple selection is likewise known up to a Θ(𝑛) gap between the
      upper and lower bounds [16].

    β€’ In heap construction we must arrange the items as nodes in a complete binary tree such that
      every node has a rank no larger than those of its children. The query complexity is known to
      be Θ(𝑛) [5, 11].
    All the problems above can be cast as instantiations of a general framework known as partial
order production [25]. Here, in addition to query access to the ranking 𝜌 of the items, we are given
𝑛 slots and regular access to a partial order <slot on the slots. The objective is to put each item
into a slot, one item per slot, so that whenever two slots, 𝑠 1 and 𝑠 2 , are related by 𝑠 1 <slot 𝑠 2 , we
also have 𝜌(𝑠 1 ) < 𝜌(𝑠 2 ). Sorting coincides with the case where <slot is a total order. In selection
of rank π‘Ÿ, there is a designated slot 𝑠 βˆ— , and there are exactly π‘Ÿ βˆ’ 1 slots 𝑠 with 𝑠 <slot 𝑠 βˆ— and
exactly 𝑛 βˆ’ π‘Ÿ slots 𝑠 with 𝑠 βˆ— <slot 𝑠; there are no other relations in <slot (see the example at the
end of Section 7 for more details). Multiple selection is similar. For heap construction, <slot
matches the complete binary tree arrangement.
    Partial order production for a given <slot naturally decomposes into the same problem for
each of the connected components of the undirected graph underlying <slot . In the case of a
single connected component, an elementary adversary argument shows that Q(Ξ ) β‰₯ 𝑛 βˆ’ 1: Any
combination of fewer than 𝑛 βˆ’ 1 queries to 𝜌 leaves some pair of slots in <slot undetermined with
respect to 𝜌. Another lower bound is the information-theoretic limit. For each way of putting
items into slots, the number of input rankings 𝜌 for which that way is a correct answer is bounded
by 𝑒(<slot ), the number of ways to extend <slot to a total order. Therefore, there must be at least
𝑛!/𝑒(<slot ) distinct execution traces. Since each execution trace is determined by the outcomes of
its queries, and each query has only two outcomes, we conclude that πœ†(<slot ) ≔ log2 (𝑛!/𝑒(<slot ))
queries are necessary to solve partial order production. Complementing these lower bounds
there exists an upper bound of (1 + π‘œ(1)) Β· πœ†(<slot ) + 𝑐 Β· (𝑛 βˆ’ 1) queries for some universal constant
𝑐 [4]. For any instance Ξ  with partial order <slot it follows that Q(Ξ ) = Θ(πœ†(<slot ) + 𝑛 βˆ’ 𝛾(<slot )),
where 𝛾(<slot ) denotes the number of connected components of the undirected graph underlying
<slot .
     Not every problem of interest in the comparison model is an instance of partial order
production. Here are a few examples.
    β€’ In rank finding there is a designated item π‘₯ βˆ— , and we have to compute its rank. The rank can
      be computed by comparing π‘₯ βˆ— with each of the 𝑛 βˆ’ 1 other items. A similar elementary
      adversary argument as above shows that the query complexity is at least 𝑛 βˆ’ 1.

    β€’ In counting inversions the items are arranged in some known order 𝜎 and the objective is
      to count the number of inversions of 𝜎 with respect to 𝜌. As we reviewed in Section 1,
      counting inversions has exactly the same query complexity as sorting.

    β€’ The problem of inversion parity is the same as counting inversions except that one need only
      count the number of inversions modulo 2. This problem, as well as counting inversions
      modulo π‘š for any integer π‘š > 1, also has exactly the same complexity as sorting.
    For each of the three problems above, information theory does not provide a satisfactory
lower bound. For example, in the inversion parity problem there are only two possible outputs,
which yields a lower bound of log2 (2) = 1. It so happens that for each of the preceding three
examples, the query complexity is known quite precisely; however, the known arguments are
rather problem-specific.
    Inversion minimization on trees is another example that does not fit the framework of partial
order production, and for which information theory only yields a weak lower bound: log2 (𝑛 choose 2) =
2 log2 (𝑛) βˆ’ Θ(1). In contrast to the above examples, a strong lower bound does not seem to
follow from a simple ad-hoc argument nor from a literal equivalence to sorting.


3    Sensitivity Lemma
In this section we develop Lemma 1.14. We actually prove a somewhat stronger version.
Lemma 3.1 (Strong Sensitivity Lemma). Consider an algorithm 𝐴 in the comparison-based model
with 𝑛 items, color each vertex of the permutahedron with its execution trace under 𝐴, and let 𝐻 denote
the subgraph with the same vertex set but only containing the bichromatic edges. The number of distinct
execution traces of 𝐴 is at least 𝑔(degavg (𝐻) + 1)/𝑛, where 𝑔 : [1, ∞) β†’ ℝ is any convex function with
𝑔(π‘₯) = π‘₯! for π‘₯ ∈ [𝑛].


    The Sensitivity Lemma follows from Lemma 3.1 because the coloring with execution traces of
an algorithm 𝐴 for Π is a refinement of the coloring with Π, so every edge of the permutahedron
that is bichromatic under Π is also bichromatic under 𝐴, and

                           𝑠(Ξ ) = 𝔼(deg𝐺(Ξ ) (𝜌)) ≀ 𝔼(deg𝐻 (𝜌)) = degavg (𝐻).

Provided 𝑔 is nondecreasing, it follows that D(Ξ ) β‰₯ 𝑔(degavg (𝐻) + 1)/𝑛 β‰₯ 𝑔(𝑠(Ξ ) + 1)/𝑛.
    In the Sensitivity Lemma we set 𝑔(π‘₯) = Ξ“(π‘₯ + 1). An optimal (but less elegant) choice for 𝑔 is
the piece-wise linear function that interpolates the prescribed values at the integral points in
[𝑛], namely
                         𝑔(π‘₯)  (π‘₯ βˆ’ bπ‘₯c) Β· (dπ‘₯e!) + (1 βˆ’ (π‘₯ βˆ’ bπ‘₯c)) Β· (bπ‘₯c!).
   For the proof of Lemma 3.1 we take intuition from a similar result in the Boolean setting [23,
Exercise 8.43], where the hypercube plays the role of the permutahedron in our setting.

Fact 3.2. Let 𝐴 be a query algorithm on binary strings of length 𝑛. Color each vertex of the 𝑛-dimensional
hypercube by its execution trace under 𝐴, and let 𝐻 denote the subgraph with the same vertex set but
only containing the bichromatic edges. Then the number of distinct execution traces is at least 2^{degavg (𝐻)}.

    One way to argue Fact 3.2 is to think of assigning a weight 𝑀(π‘₯) to each π‘₯ ∈ {0, 1} 𝑛 so as to
maximize the total weight on all inputs, subject to the constraint that the total weight on each
individual execution trace is at most 1. Then the number of distinct execution traces must be at
least the sum of all the weights. If the weight only depends on the degree, i. e., if we can write
𝑀(π‘₯) = 𝑓 (deg𝐻 (π‘₯)) for some function 𝑓 : [0, ∞) β†’ ℝ, then we can lower bound the number π‘˜ of
distinct execution traces as follows:
              π‘˜ β‰₯ Ξ£π‘₯ 𝑀(π‘₯) = Ξ£π‘₯ 𝑓 (deg𝐻 (π‘₯)) β‰₯ 2^𝑛 Β· 𝑓 (𝔼(deg𝐻 (π‘₯))) = 2^𝑛 Β· 𝑓 (degavg (𝐻)),               (3.1)

where the last inequality holds provided 𝑓 is convex.
    In the Boolean setting, the set 𝑅 of inputs π‘₯ ∈ {0, 1} 𝑛 with a particular execution trace forms
a subcube of dimension 𝑛 βˆ’ β„“ , where β„“ denotes the length of the execution trace, i. e., the number
of queries. Each π‘₯ ∈ 𝑅 has degree β„“ in 𝐻; this is because a change in a single queried position
results in a different execution trace, and a change in an unqueried position does not. Therefore,
a natural choice for the weight of π‘₯ ∈ 𝑅 is 𝑀(π‘₯) = 𝑓 (β„“ ), where 𝑓 (π‘₯) ≔ 1/2^{π‘›βˆ’π‘₯}. It satisfies the
constraint that the total weight on 𝑅 is (at most) one, and 𝑓 is convex. We conclude by (3.1) that
the number of distinct execution traces is at least 2^𝑛 Β· 𝑓 (degavg (𝐻)) = 2^{degavg (𝐻)}, as desired.
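The bound of Fact 3.2 can be sanity-checked numerically. The sketch below (our illustration) uses the simple algorithm that scans a bit string until it sees the first 1, and computes by brute force the set of traces, the bichromatic degrees in 𝐻, and the bound π‘˜ β‰₯ 2^{degavg (𝐻)}:

```python
from itertools import product

def trace(x):
    """Trace of 'scan until the first 1': the queried positions and outcomes."""
    t = []
    for i, bit in enumerate(x):
        t.append((i, bit))
        if bit == 1:
            break
    return tuple(t)

def bichromatic_degree(x, n):
    """Number of single-bit flips of x that change the execution trace."""
    d = 0
    for i in range(n):
        y = list(x)
        y[i] ^= 1
        if trace(tuple(y)) != trace(x):
            d += 1
    return d

n = 4
inputs = list(product((0, 1), repeat=n))
num_traces = len({trace(x) for x in inputs})     # n + 1 distinct traces
avg_deg = sum(bichromatic_degree(x, n) for x in inputs) / len(inputs)
assert num_traces == n + 1 and num_traces >= 2 ** avg_deg
```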

Proof of Lemma 3.1. Let π‘˜ denote the number of distinct execution traces of 𝐴, and let 𝑅 1 , . . . , 𝑅 π‘˜
denote the corresponding sets of rankings. Following a similar strategy, we want to find a convex
function 𝑓 : [0, ∞) β†’ ℝ such that the weight function 𝑀(𝜌) = 𝑓 (deg𝐻 (𝜌)) does not assign weight
more than 1 to any one of the sets 𝑅 𝑖 . The following claim, to be proven later, is the crux of this.
Claim 3.3. Let 𝑅 denote the set of all rankings 𝜌 that follow a particular execution trace of 𝐴, and let
𝑑 ∈ {0, . . . , 𝑛 βˆ’ 1}. The number of rankings 𝜌 ∈ 𝑅 with deg𝐻 (𝜌) = 𝑑 is at most 𝑛!/(𝑑 + 1)!.


    Based on Claim 3.3, a natural choice for 𝑓 is any convex function that satisfies 𝑓 (π‘₯) = (1/𝑛) Β· (π‘₯ + 1)!/𝑛!
for π‘₯ ∈ {0, . . . , 𝑛 βˆ’ 1}. The factor of 1/𝑛 comes from the fact that there are 𝑛 terms to sum together
after the weights have been normalized. For every 𝑖 ∈ [π‘˜] we then have

      Ξ£_{πœŒβˆˆπ‘… 𝑖} 𝑀(𝜌) = Ξ£_{𝑑=0}^{π‘›βˆ’1} |{𝜌 ∈ 𝑅 𝑖 : deg𝐻 (𝜌) = 𝑑}| Β· 𝑓 (𝑑) ≀ Ξ£_{𝑑=0}^{π‘›βˆ’1} (𝑛!/(𝑑 + 1)!) Β· (1/𝑛) Β· (𝑑 + 1)!/𝑛! = Ξ£_{𝑑=0}^{π‘›βˆ’1} 1/𝑛 = 1.

Similar to (3.1) we conclude

   π‘˜ β‰₯ Ξ£_{𝑖=1}^{π‘˜} Ξ£_{πœŒβˆˆπ‘… 𝑖} 𝑀(𝜌) = Ξ£_𝜌 𝑀(𝜌) = Ξ£_𝜌 𝑓 (deg𝐻 (𝜌)) β‰₯ 𝑛! Β· 𝑓 (𝔼(deg𝐻 (𝜌))) = 𝑛! Β· 𝑓 (degavg (𝐻)). (3.2)

Setting 𝑓 (π‘₯) = (1/𝑛) Β· 𝑔(π‘₯ + 1)/𝑛! turns the requirements for 𝑓 into those for 𝑔 in the statement of the
lemma, and yields that π‘˜ β‰₯ 𝑛! Β· 𝑓 (degavg (𝐻)) = 𝑔(degavg (𝐻) + 1)/𝑛.                     

    We now turn to proving Claim 3.3. The comparisons and outcomes that constitute a
particular execution trace of 𝐴 can be thought of as directed edges between the items in 𝑋. We
refer to the resulting digraph on the vertex set 𝑋 as the comparison graph 𝐢. Since the outcomes
of the comparisons are consistent with some underlying ranking, the digraph 𝐢 is acyclic. The
rankings in 𝑅 are in one-to-one and onto correspondence with the linear orderings of the DAG
𝐢. For a given ranking 𝜌 ∈ 𝑅, the degree deg𝐻 (𝜌) equals the number of π‘Ÿ ∈ {2, . . . , 𝑛} such that
swapping ranks π‘Ÿ βˆ’ 1 and π‘Ÿ in 𝜌 results in a ranking 𝜌′ = 𝜏𝜌 that is not in 𝑅, where 𝜏 denotes the
adjacent-rank transposition (π‘Ÿ βˆ’ 1, π‘Ÿ). The ranking 𝜌′ not being in 𝑅 means that it is inconsistent
with the combined comparisons and outcomes of the underlying execution trace, which happens
exactly when there is a path in 𝐢 from the item πœŒβˆ’1 (π‘Ÿ βˆ’ 1) of rank π‘Ÿ βˆ’ 1 in 𝜌 to the item πœŒβˆ’1 (π‘Ÿ)
with rank π‘Ÿ in 𝜌. Thus, the degree deg𝐻 (𝜌) equals the number of π‘Ÿ ∈ {2, . . . , 𝑛} such that there
is a path from πœŒβˆ’1 (π‘Ÿ βˆ’ 1) to πœŒβˆ’1 (π‘Ÿ) in 𝐢. See Figure 5 for an illustration, where a squiggly edge
𝑒 { 𝑣 denotes that there exists a path from 𝑒 to 𝑣 in 𝐢. We only draw squiggly edges from one
position to the next, so deg𝐻 (𝜌) equals the number of squiggly edges in Figure 5.

              Figure 5: Ranking encoding (ranks 1, . . . , 𝑛 with the corresponding items πœŒβˆ’1 (1), . . . , πœŒβˆ’1 (𝑛))

    Our strategy is to give a compressed encoding of the rankings in 𝑅 such that there is more
compression as the number of squiggly edges increases. Our encoding is based on the well-
known algorithm to compute a linear order of a DAG. Algorithm 1 provides pseudocode for the
algorithm, which we refer to as BuildRanking.
    In our formulation, BuildRanking is nondeterministic: There is a choice to make in step 3
for each π‘Ÿ = 1, . . . , 𝑛. The possible executions of BuildRanking are in one-to-one and onto
correspondence with the linear orders of 𝐢, and thus with the rankings in 𝑅.


Algorithm 1 BuildRanking(𝐢)
Input: DAG 𝐢 on vertex set 𝑋
Output: ranking of 𝑋 that is a linear order of 𝐢
 1: 𝑇 ← βˆ…
 2: for π‘Ÿ = 1 to 𝑛 do
 3:     π‘₯ ← arbitrary element of 𝑆 ≔ {𝑣 ∈ 𝑋 \ 𝑇 | there is no 𝑒 ∈ 𝑋 \ 𝑇 with 𝑒 { 𝑣 in 𝐢}
 4:     πœŒβˆ’1 (π‘Ÿ) ← π‘₯
 5:     𝑇 ← 𝑇 βˆͺ {π‘₯}
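In executable form, with the nondeterministic choice in step 3 supplied by a caller-provided rule, the algorithm can be sketched as follows (our Python rendering; reachability is precomputed naively, which suffices for small inputs):

```python
def build_ranking(X, edges, choose=min):
    """BuildRanking with the step-3 choice made by `choose`.
    edges: pairs (u, v) meaning an edge u -> v in the comparison DAG C."""
    # transitive closure: reach contains (u, v) iff there is a path u ~> v
    reach = set(edges)
    for w in X:
        for u in X:
            for v in X:
                if (u, w) in reach and (w, v) in reach:
                    reach.add((u, v))
    placed, ranking = set(), []            # ranking[r - 1] = item of rank r
    for _ in range(len(X)):
        S = [v for v in X
             if v not in placed
             and not any((u, v) in reach for u in X if u not in placed)]
        x = choose(S)                      # the nondeterministic choice in step 3
        ranking.append(x)
        placed.add(x)
    return ranking

# the output is always a linear order of the DAG
order = build_ranking([1, 2, 3, 4], {(1, 2), (1, 3), (3, 4)})
assert all(order.index(u) < order.index(v) for (u, v) in {(1, 2), (1, 3), (3, 4)})
```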


    Our encoding is a compressed description of how to make the decisions in BuildRanking
such that the output is 𝜌. Note that if πœŒβˆ’1 (π‘Ÿ βˆ’ 1) { πœŒβˆ’1 (π‘Ÿ), then the item π‘₯ with rank π‘Ÿ cannot
enter the set 𝑆 before iteration π‘Ÿ. This is because until πœŒβˆ’1 (π‘Ÿ βˆ’ 1) is added to 𝑇 at the end
of iteration π‘Ÿ βˆ’ 1, the edge πœŒβˆ’1 (π‘Ÿ βˆ’ 1) { πœŒβˆ’1 (π‘Ÿ) prevents π‘₯ from being in 𝑆. Thus, whenever
πœŒβˆ’1 (π‘Ÿ βˆ’ 1) { πœŒβˆ’1 (π‘Ÿ), the item π‘₯ = πœŒβˆ’1 (π‘Ÿ) is lucky in the sense that it gets picked in step 3 as soon
as it enters the set 𝑆. In fact, the lucky items with respect to a ranking 𝜌 ∈ 𝑅 are exactly those for
which πœŒβˆ’1 (π‘Ÿ βˆ’ 1) { πœŒβˆ’1 (π‘Ÿ) for some π‘Ÿ ∈ {2, . . . , 𝑛}, as well as the item πœŒβˆ’1 (1) with rank 1. In
Figure 5 the lucky items are marked black. Their number equals deg𝐻 (𝜌) + 1.
    In order to generate a ranking 𝜌 using BuildRanking, it suffices to know:

  (a) the lucky items (as a set, not their relative ordering), and

  (b) the ordering of the non-lucky items (given which items they are).

This information suffices to make the correct choices in step 3 of Algorithm 1:

   β€’ If the set 𝑆 contains a lucky item, there will be a unique lucky item in 𝑆; pick it as the
     element π‘₯.

   β€’ Otherwise, pick for π‘₯ the first item in the ordering of the non-lucky items that is not yet in
     𝑇. Such an element will exist, and all the items that come after it in the ordering are not
     yet in 𝑇 either.
If 𝜌 has degree 𝑑 = deg𝐻 (𝜌), then there are 𝑑 + 1 lucky items, so there are at most (𝑛 choose 𝑑 + 1)
choices for (a), and at most (𝑛 βˆ’ 𝑑 βˆ’ 1)! choices for (b), resulting in a total of at most
(𝑛 choose 𝑑 + 1) Β· (𝑛 βˆ’ 𝑑 βˆ’ 1)! = 𝑛!/(𝑑 + 1)! choices. This proves Claim 3.3.

Remark 3.4. Suppose we allow an algorithm 𝐴 to have multiple valid execution traces on a
given input 𝜌, and let 𝑅 𝑖 denote the set of rankings on which the 𝑖-th execution trace is valid.
The proof of Claim 3.3 carries over as it considers the sets 𝑅 𝑖 individually, and only depends on the
DAG that the comparisons in 𝑅 𝑖 induce. The rest of the proof of Lemma 3.1 carries through
modulo the first equality in (3.2), which no longer holds as the sets 𝑅 𝑖 may overlap. However,
the equality can be replaced by the inequality β‰₯, which does hold and is sufficient for the
argument. This means that we can replace D(Ξ ) in the statement of the Sensitivity Lemma by
its nondeterministic variant N(Ξ ).


4     Sensitivity approach for general trees
In this section we analyze the average sensitivity of the problem Π𝑇 of inversion minimization on
a tree 𝑇 with a general shape. In Section 4.1 we show that the existence of a subtree containing
a fair fraction of the leaves implies high sensitivity. The lower bound on query complexity
for Π𝑇 in Theorem 1.5 then follows from the Sensitivity Lemma. In Section 4.3 we prove that
the average sensitivity measure is Lipschitz continuous. For the analysis, we make use of
the decomposition of the objective of inversion minimization on trees mentioned earlier. We
describe the decomposition in more detail in Section 4.2; it will be helpful in later parts of this
paper, as well.

4.1   Subtree-induced sensitivity
We first introduce a sensitivity bound for inversion
minimization based on the size of a subtree.

Lemma 4.1 (subtree-induced sensitivity). Consider a tree 𝑇 with 𝑛 ≔ |L(𝑇)| leaves, and some node 𝑣 in 𝑇 with
β„“ ≔ |L(𝑇𝑣 )| leaves. We have

                   𝑠(Π𝑇 ) β‰₯ β„“ (𝑛 βˆ’ β„“ )/𝑛 βˆ’ 1.

                                                            Figure 6: Subtree rooted at 𝑣

    Note that 𝑣 is not necessarily a direct child of the root, as shown in Figure 6.
   We now prove Lemma 4.1. Let 𝜌 be a ranking of
the leaves of 𝑇, and let 𝜎min be a tree ordering that
minimizes the number of inversions with respect to 𝜌.

Claim 4.2. 𝜌 is sensitive to the transposition 𝜏 = (π‘Ÿ, π‘Ÿ + 1) if 𝜎min (πœŒβˆ’1 (π‘Ÿ)) > 𝜎min (πœŒβˆ’1 (π‘Ÿ + 1)).

Proof. If 𝜎min (πœŒβˆ’1 (π‘Ÿ)) > 𝜎min (πœŒβˆ’1 (π‘Ÿ + 1)), then Inv𝜏𝜌 (𝜎min ) = Inv𝜌 (𝜎min ) βˆ’ 1. By definition,
Inv𝜌 (𝜎min ) = MinInv(𝑇, 𝜌), so this means that MinInv(𝑇, 𝜏𝜌) < MinInv(𝑇, 𝜌), or that 𝜌 is
sensitive to 𝜏.                                                                                   
    In the case of general trees, a tree ordering 𝜎 that minimizes the number of inversions with
respect to 𝜌 is difficult to find (see the discussion on NP-hardness in Section 10). Our strategy
is to find a lower bound on the number of π‘Ÿ for which 𝜎(πœŒβˆ’1 (π‘Ÿ)) > 𝜎(πœŒβˆ’1 (π‘Ÿ + 1)) that applies
regardless of 𝜎.

Claim 4.3. For any ordering 𝜎, the number of π‘Ÿ such that 𝜎(πœŒβˆ’1 (π‘Ÿ)) > 𝜎(πœŒβˆ’1 (π‘Ÿ + 1)) is at least one less
than the number of 𝑠 such that πœŒβˆ’1 (𝑠) ∈ L(𝑇𝑣 ) and πœŒβˆ’1 (𝑠 + 1) βˆ‰ L(𝑇𝑣 ).

Proof. For all except at most one value of 𝑠 (the maximum 𝑠 for which πœŒβˆ’1 (𝑠) ∈ L(𝑇𝑣 )), there exists
a minimal 𝑠′ > 𝑠 such that πœŒβˆ’1 (𝑠′) ∈ L(𝑇𝑣 ). We claim that at least one value of π‘Ÿ = 𝑠, . . . , 𝑠′ βˆ’ 1
satisfies 𝜎(πœŒβˆ’1 (π‘Ÿ)) > 𝜎(πœŒβˆ’1 (π‘Ÿ + 1)). If not, then 𝜎 would rank πœŒβˆ’1 (𝑠), πœŒβˆ’1 (𝑠 + 1), . . . , πœŒβˆ’1 (𝑠′) in
increasing order. Because 𝜎 is a tree ordering, the leaves of L(𝑇𝑣 ) must be mapped into a
contiguous range by 𝜎, as shown in Figure 7. However, we have πœŒβˆ’1 (𝑠), πœŒβˆ’1 (𝑠′) ∈ L(𝑇𝑣 ) but
πœŒβˆ’1 (𝑠 + 1) βˆ‰ L(𝑇𝑣 ), which violates this property since 𝜎 ranks a leaf outside L(𝑇𝑣 ) between two
leaves inside L(𝑇𝑣 ).
    Because each value of π‘Ÿ is found between consecutive pairs of values in L(𝑇𝑣 ), the values of π‘Ÿ
are distinct.                                                                                    


                  Figure 7: 𝜎 maps the leaves of L(𝑇𝑣 ) into a contiguous range.
       (a) Leaves in 𝜎-order, labeled with 𝜌-ranks; (b) corresponding plot of 𝜌, 𝜎 for each leaf.


Claim 4.4. Over a uniformly random 𝜌, the expected number of 𝑠 such that πœŒβˆ’1 (𝑠) ∈ L(𝑇𝑣 ) and
πœŒβˆ’1 (𝑠 + 1) βˆ‰ L(𝑇𝑣 ) is β„“ (𝑛 βˆ’ β„“ )/𝑛.

Proof. For 𝑠 = 1, . . . , 𝑛 βˆ’ 1, the probability that πœŒβˆ’1 (𝑠) ∈ L(𝑇𝑣 ) is β„“/𝑛, and the probability that
πœŒβˆ’1 (𝑠 + 1) βˆ‰ L(𝑇𝑣 ) given that πœŒβˆ’1 (𝑠) ∈ L(𝑇𝑣 ) is (𝑛 βˆ’ β„“ )/(𝑛 βˆ’ 1). Using linearity of expectation on
the indicator random variables for πœŒβˆ’1 (𝑠) ∈ L(𝑇𝑣 ) and πœŒβˆ’1 (𝑠 + 1) βˆ‰ L(𝑇𝑣 ), the expected number
of 𝑠 satisfying this property is

                                 (𝑛 βˆ’ 1) Β· β„“ (𝑛 βˆ’ β„“ )/(𝑛(𝑛 βˆ’ 1)) = β„“ (𝑛 βˆ’ β„“ )/𝑛.                             

      Combining Claim 4.2, Claim 4.3, and Claim 4.4, we conclude Lemma 4.1.

Bounded degree. We apply our analysis to the case of trees of degree π‘˜. Observe that for fixed
𝑛, Lemma 4.1 is strongest when β„“ = 𝑛/2. Not every tree 𝑇 has a subtree with exactly 𝑛/2 leaves,
but Lemma 4.1 still gives a useful bound for subtrees that do not contain too few or too many
leaves. In the case of trees of bounded degree, there always exists a subtree 𝑇𝑣 that contains a
fairly balanced fraction of the leaves. The following quantification is folklore, but we include a
proof for completeness.


Fact 4.5. If 𝑇 is a tree of degree π‘˜ with 𝑛 leaves, there exists a node 𝑣 in 𝑇 such that β„“ ≔ |L(𝑇𝑣 )| = 𝛼 Β· 𝑛,
where 1/(π‘˜ + 1) ≀ 𝛼 ≀ π‘˜/(π‘˜ + 1).
Proof. Let π‘Ÿ be the root of 𝑇 and construct a sequence 𝑣 1 = π‘Ÿ, 𝑣2 , 𝑣3 , . . . such that 𝑣 𝑖 is a child of
𝑣 π‘–βˆ’1 that maximizes β„“ 𝑖  |L(𝑇𝑣 𝑖 )|, with ties broken arbitrarily. Notice that {β„“ 𝑖 } is a decreasing
sequence, and since 𝑇 has degree π‘˜, β„“ 𝑖 ≀ π‘˜β„“ 𝑖+1 for all 𝑖. We claim that some 𝑣 𝑖 in this sequence
                                                                       π‘˜
satisfies the conditions of the claim. If not, then for some 𝑖, β„“ 𝑖 > π‘˜+1 Β· 𝑛 and β„“ 𝑖+1 < π‘˜+11
                                                                                                Β· 𝑛, which
contradicts the fact that β„“ 𝑖 ≀ π‘˜β„“ 𝑖+1 .                                                                 
   By choosing a subtree satisfying Fact 4.5, we can apply Lemma 4.1 and conclude that
𝑠(Π𝑇 ) β‰₯ π‘˜/(π‘˜ + 1)Β² Β· 𝑛 βˆ’ 1. The Sensitivity Lemma then gives the β€œin particular” part of Theorem 1.5.
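The walk in the proof of Fact 4.5 is directly implementable; a small sketch of it (the tree representation is ours):

```python
def balanced_subtree(children, root, k):
    """Descend from the root toward the child with the most leaves until the
    leaf count first drops to at most k/(k+1) of the total (the Fact 4.5 walk).
    `children` maps each internal node to its list of children; assumes the
    tree has degree at most k and more than one leaf."""
    def leaves(v):
        kids = children.get(v)
        return 1 if not kids else sum(leaves(c) for c in kids)
    n = leaves(root)
    v = root
    while leaves(v) > k * n / (k + 1):
        v = max(children[v], key=leaves)
    return v, leaves(v)

# a binary tree (k = 2) with 3 leaves: the walk stops at a node with 2 of them,
# so alpha = 2/3 lies in [1/(k+1), k/(k+1)] = [1/3, 2/3]
children = {"root": ["a", "b"], "a": ["c", "d"]}
v, ell = balanced_subtree(children, "root", 2)
assert (v, ell) == ("a", 2)
```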



4.2   Decomposition of the objective function
For use in this section as well as later parts of the paper, we now explain how the objective of
inversion minimization on trees decomposes. We introduce the notion of root inversion along
the way, and observe the effect of adjacent-rank transpositions on the decomposition.
    The objective MinInv(𝑇, 𝜌) can be written as the sum of contributions from each of the
individual nodes. A node 𝑣 contributes those inversions that reside in the subtree 𝑇𝑣 and go
through the root 𝑣 of 𝑇𝑣 . We refer to them as the root inversions in 𝑇𝑣 .
Definition 4.6 (root inversions, RInv(Β·, Β·, Β·), MinRInv(Β·, Β·)). Given a tree 𝑇, a ranking 𝜌 of the
leaves of 𝑇, and an ordering 𝜎 of 𝑇, a root inversion of 𝜎 with respect to 𝜌 is an inversion (β„“ 1 , β„“2 )
of 𝜎 with respect to 𝜌 for which the lowest common ancestor LCA(β„“ 1 , β„“2 ) is the root of 𝑇. The
number of root inversions of 𝜎 with respect to 𝜌 in 𝑇 is denoted by RInv(𝑇, 𝜌, 𝜎). The minimum
number of root inversions in 𝑇 with respect to 𝜌 is denoted
                                  MinRInv(𝑇, 𝜌) ≔ min𝜎 RInv(𝑇, 𝜌, 𝜎),                                    (4.1)

where 𝜎 ranges over all possible orderings of 𝑇.
    The only aspect of the ordering 𝜎 of 𝑇𝑣 that affects RInv(𝑇𝑣 , 𝜌, 𝜎) is the relative order of the
children of 𝑣. For a node 𝑣 with π‘˜ children 𝑒1 , . . . , 𝑒 π‘˜ , by abusing notation and using 𝜎 to also
denote the ranking of the children induced by the ordering of the tree, we have
                               RInv(𝑇, 𝜌, 𝜎) = Ξ£_{1≀𝑖<π‘—β‰€π‘˜} XInv𝜌 (𝐿𝜎(𝑖) , 𝐿𝜎(𝑗) ),                 (4.2)

where 𝐿𝑖 is a short-hand for the leaf set L(𝑇𝑒𝑖 ). The contributions of the nodes can be optimized
independently:
                                MinInv(𝑇, 𝜌) = Ξ£_𝑣 MinRInv(𝑇𝑣 , 𝜌),                         (4.3)
where 𝑣 ranges over all nodes of 𝑇 with degree deg𝑇 (𝑣) > 1.
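As a concrete illustration of the decomposition, here is a brute-force sketch that evaluates (4.3) directly (the nested-tuple tree encoding, dict-based rankings, and helper names are our own illustrative choices):

```python
from itertools import permutations

def leaves(t):
    return [t] if not isinstance(t, tuple) else [x for c in t for x in leaves(c)]

def xinv(rho, A, B):
    # cross inversions from A to B: pairs (a, b) with rho(a) > rho(b)
    return sum(1 for a in A for b in B if rho[a] > rho[b])

def min_rinv(rho, node):
    # minimum number of root inversions: optimize the order of the children
    kids = [leaves(c) for c in node]
    return min(
        sum(xinv(rho, Ls[i], Ls[j]) for i in range(len(Ls)) for j in range(i + 1, len(Ls)))
        for Ls in permutations(kids)
    )

def min_inv(rho, t):
    # decomposition (4.3): the contributions of the nodes add up independently
    if not isinstance(t, tuple):
        return 0
    return min_rinv(rho, t) + sum(min_inv(rho, c) for c in t)

rho = {i: i for i in range(1, 5)}          # ranking: leaf label = rank
assert min_inv(rho, ((1, 3), (2, 4))) == 1  # only the pair (2, 3) must invert
assert min_inv(rho, ((1, 2), (3, 4))) == 0
```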
    When we apply an adjacent-rank transposition 𝜏 to a ranking 𝜌, at most one of the terms in the
decomposition (4.3) can change, and the change is at most one unit. We capture this observation
for future reference as it will be helpful in several sensitivity analyses.

                       THEORY OF COMPUTING, Volume 20 (7), 2024, pp. 1–62

Proposition 4.7. Let 𝜌 be a ranking of the leaf set 𝑋 of a tree 𝑇, 𝜏 an adjacent-rank transposition, and β„“ 1
and β„“2 be the affected leaves. Then

                                   MinRInv(𝑇𝑣 , 𝜌) = MinRInv(𝑇𝑣 , 𝜏𝜌)

for all nodes 𝑣 in 𝑇 except possibly 𝑣 = LCA(β„“ 1 , β„“2 ). Moreover, the difference is at most 1 in absolute
value.

Proof. Since the ranks of β„“ 1 and β„“2 under 𝜌 are adjacent, for any leaf β„“ other than β„“ 1 and β„“ 2 , the
relative order of β„“ under 𝜌 is the same with respect to β„“ 1 as it is with respect to β„“ 2 . This means
that the adjacent-rank transposition 𝜏 does not affect whether a pair of leaves constitutes an
inversion unless that pair equals {β„“ 1 , β„“2 }. As a result, the only term on the right-hand side of
(4.3) that can be affected by the transposition 𝜏 is the one corresponding to the node 𝑣, and it
can change by at most one unit.                                                                      β–‘
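Proposition 4.7 lends itself to an exhaustive check on a small instance. The sketch below (the five-leaf tree and the brute-force min_inv are illustrative choices of our own) verifies that every adjacent-rank transposition changes the objective by at most one unit:

```python
from itertools import permutations

def leaves(t):
    return [t] if not isinstance(t, tuple) else [x for c in t for x in leaves(c)]

def min_inv(rho, t):
    # brute force over orderings: permute the children independently at each node
    if not isinstance(t, tuple):
        return 0
    kids = [leaves(c) for c in t]
    root = min(sum(1 for i in range(len(Ls)) for j in range(i + 1, len(Ls))
                   for a in Ls[i] for b in Ls[j] if rho[a] > rho[b])
               for Ls in permutations(kids))
    return root + sum(min_inv(rho, c) for c in t)

T = ((0, 1), (2, (3, 4)))
deltas = set()
for perm in permutations(range(5)):           # all rankings of the 5 leaves
    rho = dict(zip(range(5), perm))
    for r in range(4):                        # all adjacent-rank transpositions
        tau_rho = {x: {r: r + 1, r + 1: r}.get(v, v) for x, v in rho.items()}
        deltas.add(min_inv(tau_rho, T) - min_inv(rho, T))
assert deltas <= {-1, 0, 1}                   # Proposition 4.7
```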

4.3   Lipschitz continuity
Average-case notions typically do not change much under small changes to the input. This is
indeed the case for the average sensitivity when β€œsmall” is interpreted as affecting few of the
subtrees. The following lemma quantifies the property and can be viewed as a form of Lipschitz
continuity.

Lemma 4.8. Given a tree 𝑇, if a subtree 𝑇𝑣 βˆ— with β„“ leaves is replaced with a tree 𝑇′𝑣 βˆ— with the same
number of leaves, resulting in the tree 𝑇′, then

                                  |𝑠(Π𝑇 ) βˆ’ 𝑠(Π𝑇′ )| ≀ β„“(β„“ βˆ’ 1)/𝑛.

[Figure 8: Effects of changing 𝑇𝑣 βˆ— ]

Proof. We think of the leaf sets of 𝑇 and 𝑇′ as being the same set 𝑋 = L(𝑇) = L(𝑇′), and fix a
ranking 𝜌 of 𝑋. Consider an ordering of 𝑇 and the ranking 𝜎 of 𝑋 that it induces. Outside of 𝑇′𝑣 βˆ—
we can order 𝑇′ in the same way as 𝑇. Irrespective of how we order 𝑇′ inside 𝑇′𝑣 βˆ— , the induced
ranking πœŽβ€² of 𝑋 agrees with 𝜎 on all leaves in 𝑋 except possibly those in π‘Œ ≔ L(𝑇𝑣 βˆ— ) = L(𝑇′𝑣 βˆ— ).
Moreover, under both 𝜎 and πœŽβ€², the set π‘Œ gets mapped to the same contiguous interval. It follows
that for all pairs (β„“1 , β„“2 ) of distinct leaves of which at least one lies outside of π‘Œ, (β„“1 , β„“2 ) constitutes
an inversion of 𝜎 with respect to 𝜌 if and only if (β„“1 , β„“2 ) constitutes an inversion of πœŽβ€² with respect
to 𝜌. For any node 𝑣 outside of 𝑇𝑣 βˆ— , root inversions in 𝑇𝑣 cannot involve leaves that are both in
π‘Œ = L(𝑇𝑣 βˆ— ). See Figure 8 for an illustration. Thus, for such nodes 𝑣, RInv(𝑇𝑣 , 𝜌, 𝜎) = RInv(𝑇′𝑣 , 𝜌, πœŽβ€²).
By taking the minimum over all orderings, we conclude:

Claim 4.9. MinRInv(𝑇𝑣 , 𝜌) = MinRInv(𝑇′𝑣 , 𝜌) holds for every node 𝑣 outside of 𝑇𝑣 βˆ— (or equivalently,
outside of 𝑇′𝑣 βˆ— ).


     Consider a ranking 𝜌 and an adjacent-rank transposition 𝜏 = (π‘Ÿ, π‘Ÿ + 1). We claim that,
unless (β„“1 , β„“2 ) ≔ (πœŒβˆ’1 (π‘Ÿ), πœŒβˆ’1 (π‘Ÿ + 1)) ∈ π‘Œ Γ— π‘Œ, Π𝑇 is sensitive to 𝜏 at 𝜌 if and only if Π𝑇′ is
sensitive to 𝜏 at 𝜌. This is because by Proposition 4.7 the only term in the decomposition (4.3)
of MinInv(𝑇, 𝜌) that can be affected by 𝜏 is the contribution MinRInv(𝑇𝑣 , 𝜌) for 𝑣 = LCA(β„“1 , β„“2 ).
If at least one of β„“1 or β„“2 is not inside 𝑇𝑣 βˆ— , then 𝑣 is not inside 𝑇𝑣 βˆ— either, so by Claim 4.9,
MinRInv(𝑇𝑣 , 𝜌) = MinRInv(𝑇′𝑣 , 𝜌). By the same token, MinRInv(𝑇𝑣 , 𝜏𝜌) = MinRInv(𝑇′𝑣 , 𝜏𝜌). It
follows that MinInv(𝑇, 𝜌) β‰  MinInv(𝑇, 𝜏𝜌) if and only if MinInv(𝑇′, 𝜌) β‰  MinInv(𝑇′, 𝜏𝜌).
     We bound the expected number of values of π‘Ÿ for which (πœŒβˆ’1 (π‘Ÿ), πœŒβˆ’1 (π‘Ÿ + 1)) ∈ π‘Œ Γ— π‘Œ with
π‘Œ ≔ L(𝑇𝑣 βˆ— ) when 𝜌 is chosen uniformly at random. For π‘Ÿ ∈ [𝑛 βˆ’ 1], the probability that πœŒβˆ’1 (π‘Ÿ) ∈ π‘Œ
is β„“/𝑛, and the probability that πœŒβˆ’1 (π‘Ÿ + 1) ∈ π‘Œ given that πœŒβˆ’1 (π‘Ÿ) ∈ π‘Œ is (β„“ βˆ’ 1)/(𝑛 βˆ’ 1). Using
linearity of expectation on the indicators, the expected number of said π‘Ÿ is

                        (𝑛 βˆ’ 1) Β· β„“(β„“ βˆ’ 1)/(𝑛(𝑛 βˆ’ 1)) = β„“(β„“ βˆ’ 1)/𝑛.                                  β–‘
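The closing expectation can be replayed exactly on a toy instance. A sketch with 𝑛 = 5 and a 3-leaf set π‘Œ (the concrete values are our own illustrative choices):

```python
from fractions import Fraction
from itertools import permutations

n, Y = 5, {0, 1, 2}                    # a subtree with l = |Y| = 3 leaves
l = len(Y)
total = 0
for perm in permutations(range(n)):    # perm lists the leaves in rank order
    total += sum(1 for r in range(n - 1) if perm[r] in Y and perm[r + 1] in Y)
expected = Fraction(total, 120)        # average over all 5! rankings
assert expected == Fraction(l * (l - 1), n)   # = 6/5, matching l(l-1)/n
```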

    Lemma 4.8 helps to extend query lower bounds based on average sensitivity to larger classes.
Suppose we have established a good lower bound on the sensitivity 𝑠(Π𝑇 ) for a class 𝐢 of trees.
Consider a class 𝐢′ obtained by taking a tree 𝑇 in class 𝐢 and replacing some of the subtrees 𝑇𝑣
by other subtrees 𝑇′𝑣 on the same number of leaves. For this new class 𝐢′ the same lower bound
on the sensitivity of inversion minimization applies modulo the Lipschitz loss. For example,
Theorem 1.6 holds by virtue of a lower bound of the form 𝑠(Π𝑇 ) β‰₯ (𝑛 βˆ’ 1) βˆ’ 𝑐 log(𝑛) for every
binary tree 𝑇 with 𝑛 leaves, where 𝑐 is a universal constant. If we allow some of the subtrees of
𝑇 to be replaced by, say, freely arrangeable ones on the same leaves, applying Lemma 4.8 for
each of the modified subtrees in sequence shows that the resulting new tree 𝑇′ has

   𝑠(Π𝑇′ ) β‰₯ 𝑠(Π𝑇 ) βˆ’ 𝛼𝑛(𝛼𝑛 βˆ’ 1)/𝑛 β‰₯ (𝑛 βˆ’ 1) βˆ’ 𝑐 log(𝑛) βˆ’ 𝛼²(𝑛 βˆ’ 1) = (1 βˆ’ 𝛼²)(𝑛 βˆ’ 1) βˆ’ 𝑐 log(𝑛),

where 𝛼 denotes the fraction of leaves that belong to one of the replaced subtrees.
     In fact, the notion of average sensitivity is robust with respect to the following, more refined
type of surgery. From any tree 𝑇, let 𝑅 be a connected subset of 𝑇 that includes no leaves. Let
𝑇′ be the subtree rooted at the LCA of 𝑅 (𝑇′ contains all of 𝑅), and let 𝑇′1 , . . . , π‘‡β€²π‘˜ be the disjoint
maximal subtrees of 𝑇′ that are strictly below 𝑅. Let 𝑅′ be any tree that has π‘˜ leaves. Replace 𝑇′
by 𝑅′, and then replace the leaves of 𝑅′ by 𝑇′1 , . . . , π‘‡β€²π‘˜ .
     The effect is that the region 𝑅 has been β€œreshaped” to look like 𝑅′, but the rest of 𝑇 is
unaffected. The cost of such a surgery is at most (𝑛 βˆ’ 1) times the probability that a uniformly
random pair of distinct leaves has their LCA in 𝑅. The bound follows from thinking of sensitivity
as (𝑛 βˆ’ 1) times the probability that a uniformly random edge in the full permutahedron is
bichromatic. Provided the LCA of the affected leaves is outside 𝑅, we get sensitivity before
the surgery if and only if we get it after the surgery. Surgeries can be iterated, and the costs
accumulate additively. In combination with our strong lower bound on the average sensitivity
of binary trees (Lemma 5.7), this allows for a robust sense in which β€œmostly-binary” trees have
high average sensitivity.


5     Refined sensitivity approach for binary trees
In this section we show how to refine the sensitivity approach for lower bounds on the query
complexity of the problem Π𝑇 of inversion minimization on trees in the important special case
of binary trees 𝑇. In Section 5.1 we first develop a criterion for when a particular ranking 𝜌 is
sensitive to a particular adjacent-rank transposition 𝜏. We then analyze the root sensitivity of
binary trees in Section 5.2 and finally establish a strong lower bound on the average sensitivity
in Section 5.3. An application of the Sensitivity Lemma then yields Theorem 1.6.

5.1   Sensitivity criterion
Recall the decomposition of the objective function MinInv(𝑇, 𝜌) into contributions attributed to
each node 𝑣 of degree deg𝑇 (𝑣) > 1, as given by (4.3) in Section 4.2. In the case of binary trees,
the contribution of node 𝑣 can be calculated simply as

                          MinRInv(𝑇𝑣 , 𝜌) = min(XInv𝜌 (𝐿1 , 𝐿2 ), XInv𝜌 (𝐿2 , 𝐿1 )),                      (5.1)

where 𝑒1 and 𝑒2 denote the two children of 𝑣, and 𝐿1 ≔ L(𝑇𝑒1 ) and 𝐿2 ≔ L(𝑇𝑒2 ) their leaf sets.
This simplicity makes a precise analysis of sensitivity feasible, as we will see next.
       For a given ranking 𝜌 of 𝑇 and a given adjacent-rank transposition 𝜏, we would like to figure
out the effect of 𝜏 on the objective MinInv(𝑇, Β·), in particular when MinInv(𝑇, 𝜏𝜌) = MinInv(𝑇, 𝜌).
Let β„“ lo and β„“ hi denote the two leaves that are affected by the transposition 𝜏 on the ranking
𝜌, where the subscript β€œlo” indicates the lower of the two leaves with respect to 𝜌, and β€œhi”
the higher of the two. Let 𝑣 be the lowest common ancestor LCA(β„“ lo , β„“hi ). We use the same
subscripts β€œlo” and β€œhi” for the two children of 𝑣: 𝑒lo denotes the child whose subtree contains
β„“ lo , and 𝑒hi its sibling. Similarly, we denote by 𝐿lo the leaf set of 𝑇𝑒lo , and by 𝐿hi the leaf set of
𝑇𝑒hi . See Figure 9 for the subsequent analysis.

[Figure 9: Sensitivity analysis for binary trees. The node 𝑣 has children 𝑒lo and 𝑒hi , with leaf
sets 𝐿lo = {β„“lo } βŠ” 𝐴 and 𝐿hi ≔ {β„“hi } βŠ” 𝐡, where 𝐴 = 𝐴< βŠ” 𝐴> with 𝜌(𝐴< ) < 𝜌(β„“lo ) < 𝜌(𝐴> ),
and 𝐡 = 𝐡< βŠ” 𝐡> with 𝜌(𝐡< ) < 𝜌(β„“hi ) < 𝜌(𝐡> ).]

   By Proposition 4.7, the situation before and after the application of 𝜏 is as follows, where we
abbreviate π‘₯  XInv𝜌 (𝐿lo , 𝐿hi ) and 𝑦  XInv𝜌 (𝐿hi , 𝐿lo ).





        ranking RInv(𝑇𝑣 , Β·, 𝜎(β„“ lo ) < 𝜎(β„“ hi )) RInv(𝑇𝑣 , Β·, 𝜎(β„“ hi ) < 𝜎(β„“ lo )) MinRInv(𝑇𝑣 , Β·)
           𝜌                  π‘₯                                 𝑦                      min(π‘₯, 𝑦)
          𝜏𝜌                π‘¦βˆ’1                               π‘₯+1                   min(𝑦 βˆ’ 1, π‘₯ + 1)

   The objective function remains the same iff min(π‘₯, 𝑦) = min(𝑦 βˆ’ 1, π‘₯ + 1), which happens iff
π‘₯ βˆ’ 𝑦 = βˆ’1, or equivalently iff

                                   DInv𝜌 (𝐿lo , 𝐿hi ) = DInv𝜌 ({β„“ lo }, {β„“hi }),                              (5.2)

where we introduce the following short-hand:

Definition 5.1 (cross inversion difference, DInvΒ· (Β·, Β·)). For a ranking 𝜌 of a set 𝑋, and two subsets
𝐴, 𝐡 βŠ† 𝑋,
                            DInv𝜌 (𝐴, 𝐡) ≔ XInv𝜌 (𝐴, 𝐡) βˆ’ XInv𝜌 (𝐡, 𝐴).
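The manipulations that follow rely only on DInv being antisymmetric and additive over disjoint unions of either argument, which is immediate from the definition. A quick illustrative check (helper names are our own):

```python
from itertools import permutations

def xinv(rho, A, B):
    # cross inversions from A to B: pairs (a, b) with rho(a) > rho(b)
    return sum(1 for a in A for b in B if rho[a] > rho[b])

def dinv(rho, A, B):
    return xinv(rho, A, B) - xinv(rho, B, A)

# DInv is antisymmetric and splits over disjoint unions of either argument
for perm in permutations(range(4)):
    rho = dict(enumerate(perm))
    A, A2, B = [0], [1], [2, 3]
    assert dinv(rho, A, B) == -dinv(rho, B, A)
    assert dinv(rho, A + A2, B) == dinv(rho, A, B) + dinv(rho, A2, B)
```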

     We can split 𝐿lo as 𝐿lo = {β„“lo } βŠ” 𝐴 = 𝐴< βŠ” {β„“lo } βŠ” 𝐴> , where 𝐴< contains all leaves in 𝐿lo that
𝜌 ranks before β„“lo , and 𝐴> contains all the leaves in 𝐿lo that 𝜌 ranks after β„“lo . We similarly split
𝐿hi , as indicated in Figure 9. We have that

      DInv𝜌 (𝐿lo , 𝐿hi ) = DInv𝜌 ({β„“ lo }, {β„“hi }) + DInv𝜌 ({β„“ lo }, 𝐡) + DInv𝜌 (𝐴, {β„“hi }) + DInv𝜌 (𝐴, 𝐡).

Since the ranks of β„“ lo and β„“ hi under 𝜌 are adjacent, we have that DInv𝜌 ({β„“ lo }, 𝐡) = |𝐡< | βˆ’ |𝐡> |
and DInv𝜌 (𝐴, {β„“ hi }) = |𝐴> | βˆ’ |𝐴< |. Plugging everything into (5.2) we conclude:

Proposition 5.2. Let 𝑇 be a binary tree, 𝜌 a ranking of the leaves of 𝑇, 𝜏 an adjacent-rank transposition,
β„“ lo and β„“hi the two leaves affected by 𝜏 under 𝜌 such that 𝜌 ranks β„“ lo before β„“ hi . Referring to the notation
in Figure 9, we have that

             MinInv(𝑇, 𝜌) = MinInv(𝑇, 𝜏𝜌) ⇔ DInv𝜌 (𝐿lo , 𝐿hi ) = DInv𝜌 ({β„“ lo }, {β„“ hi })                     (5.3)
                                                 ⇔ DInv𝜌 (𝐴, 𝐡) = |𝐴< | βˆ’ |𝐴> | + |𝐡> | βˆ’ |𝐡< |.              (5.4)
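Proposition 5.2 can be verified exhaustively on a small binary tree. In the sketch below (the tree shape and helper names are illustrative choices of our own), note that DInv𝜌 ({β„“lo }, {β„“hi }) = βˆ’1 whenever 𝜌 ranks β„“lo just before β„“hi , so criterion (5.3) amounts to DInv𝜌 (𝐿lo , 𝐿hi ) = βˆ’1:

```python
from itertools import permutations

def leaves(t):
    return [t] if not isinstance(t, tuple) else [x for c in t for x in leaves(c)]

def xinv(rho, A, B):
    return sum(1 for a in A for b in B if rho[a] > rho[b])

def min_inv(rho, t):
    # binary trees: each node contributes the better of its two child orders, (5.1)
    if not isinstance(t, tuple):
        return 0
    L, R = leaves(t[0]), leaves(t[1])
    return min(xinv(rho, L, R), xinv(rho, R, L)) + min_inv(rho, t[0]) + min_inv(rho, t[1])

def lca_sides(t, x, y):
    # leaf sets of the two children of LCA(x, y), as (side of x, side of y)
    L, R = leaves(t[0]), leaves(t[1])
    if x in L and y in L:
        return lca_sides(t[0], x, y)
    if x in R and y in R:
        return lca_sides(t[1], x, y)
    return (L, R) if x in L else (R, L)

T = ((0, 1), (2, (3, 4)))
for perm in permutations(range(5)):
    rho = dict(zip(range(5), perm))         # leaf -> rank
    inv = {v: k for k, v in rho.items()}    # rank -> leaf
    for r in range(4):
        llo, lhi = inv[r], inv[r + 1]
        tau_rho = {**rho, llo: r + 1, lhi: r}
        Llo, Lhi = lca_sides(T, llo, lhi)
        unchanged = min_inv(rho, T) == min_inv(tau_rho, T)
        # criterion (5.3): unchanged iff DInv_rho(Llo, Lhi) = -1
        assert unchanged == (xinv(rho, Llo, Lhi) - xinv(rho, Lhi, Llo) == -1)
```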

5.2    Root sensitivity
Given a ranking 𝜌 and an adjacent-rank transposition 𝜏, we know by Proposition 4.7 that at
most one of the terms in the decomposition (4.3) of MinInv(𝑇, 𝜌) is affected by the transposition,
namely MinRInv(𝑇𝑣 , 𝜌) where 𝑣 is the lowest common ancestor of the affected leaves β„“ lo and β„“ hi .
It follows that we can write the average sensitivity of Π𝑇  MinInv(𝑇, Β·) as the following convex
combination:

𝑠(Π𝑇 ) = (𝑛 βˆ’ 1) Β· β„™[MinInv(𝑇, 𝜌) β‰  MinInv(𝑇, 𝜏𝜌)]
       = (𝑛 βˆ’ 1) Ξ£_𝑣 β„™[𝑣 = LCA(β„“lo , β„“hi )] Β· β„™[MinRInv(𝑇𝑣 , 𝜌) β‰  MinRInv(𝑇𝑣 , 𝜏𝜌) | 𝑣 = LCA(β„“lo , β„“hi )],
                                                                                              (5.5)



where the probability is over a uniformly random choice of the ranking 𝜌 and the adjacent-rank
transposition 𝜏, and β„“ lo and β„“hi denote the affected leaves. The conditional probability on the
right-hand side of (5.5) only depends on the subtree 𝑇𝑣 . The ranking 𝜌 of all leaves induces a
ranking πœŒβ€² of the leaves of 𝑇𝑣 that is uniform under the conditioning. Similarly, the adjacent-rank
transposition 𝜏 for 𝜌 induces an adjacent-rank transposition πœβ€² for πœŒβ€²; the distribution of πœβ€² under
the conditioning is independent of πœŒβ€² and uniform among all adjacent-rank transpositions such
that the affected leaves live in subtrees of different children of 𝑣. Thus, the probability on the
right-hand side of (5.5) coincides with the following notion for the subtree 𝑇𝑣 .

Definition 5.3 (root sensitivity). Let 𝑇 be a tree. The root sensitivity of 𝑇 is the probability that

                                  MinRInv(𝑇𝑣 , 𝜌) β‰  MinRInv(𝑇𝑣 , 𝜏𝜌)

when 𝜌 is a uniformly random ranking of L(𝑇), and 𝜏 a uniformly random adjacent-rank
transposition conditioned on the affected leaves being in subtrees of different children of the root of 𝑇.

    Note that the only nodes 𝑣 that need to be considered in the sum on the right-hand side of
(5.5) are those that can appear as the lowest common ancestor of two leaves, and such that 𝑇𝑣 is
not freely arrangeable. In the case of binary trees, this means that we only need to consider
nodes 𝑣 of degree 2 such that 𝑇𝑣 contains more than 2 leaves. In this section we prove a strong
lower bound on the root sensitivity of such trees 𝑇𝑣 .
    Consider the binary tree 𝑇 with root 𝑣 in Figure 9. The distribution underlying Definition 5.3
can be generated as follows: Pick a leaf on each side of the root 𝑣 uniformly at random, and let 𝜌
be a ranking of the leaves of 𝑇 that is uniformly random on the condition that the selected leaves
receive adjacent ranks; 𝜏 then is the adjacent-rank transposition that swaps the two selected
leaves. The root sensitivity of 𝑇 is the complement of the probability that the right-hand side
of (5.4) holds under this distribution. Let us analyze the left-hand side of (5.4) further. As
𝐴 = 𝐴< βŠ” 𝐴> and 𝐡 = 𝐡< βŠ” 𝐡> , we have that

       DInv𝜌 (𝐴, 𝐡) = DInv𝜌 (𝐴< , 𝐡< ) + DInv𝜌 (𝐴< , 𝐡> ) + DInv𝜌 (𝐴> , 𝐡< ) + DInv𝜌 (𝐴> , 𝐡> ).

By the defining properties of the sets involved (see Figure 9), we know that DInv𝜌 (𝐴< , 𝐡> ) =
βˆ’π‘Ž< 𝑏> and DInv𝜌 (𝐴> , 𝐡< ) = π‘Ž> 𝑏< , where π‘Ž< ≔ |𝐴< |, π‘Ž> ≔ |𝐴> |, 𝑏< ≔ |𝐡< |, and 𝑏> ≔ |𝐡> |. Thus,
we can rewrite criterion (5.4) as:

               DInv𝜌 (𝐴< , 𝐡< ) + DInv𝜌 (𝐴> , 𝐡> ) = π‘Ž < 𝑏 > βˆ’ π‘Ž > 𝑏 < + π‘Ž < βˆ’ π‘Ž > + 𝑏 > βˆ’ 𝑏 < .       (5.6)

A critical observation that helps us to bound the probability of (5.6) is that, conditioned on all
four values π‘Ž Β· and 𝑏 Β· , the right-hand side of (5.6) is fixed, but the left-hand side still contains a
lot of randomness. In fact, under the conditioning stated, the ranking that 𝜌 induces on 𝐴< βŠ” 𝐡<
is still distributed uniformly at random, the same holds for the ranking that 𝜌 induces on
𝐴> βŠ” 𝐡> , and both distributions are independent. This means that, under the same conditioning,
the left-hand side of (5.6) has the same distribution as the sum XInvπ‘Ž< ,𝑏< + XInvπ‘Ž> ,𝑏> of two
independent random variables of the following type:


Definition 5.4 (cross inversion distribution, XInvΒ·,Β· ). For nonnegative integers π‘Ž and 𝑏, XInvπ‘Ž,𝑏
denotes the random variable XInv(𝐴, 𝐡) that counts the number of cross inversions from 𝐴 to 𝐡,
where 𝐴 is an array of length π‘Ž, 𝐡 an array of length 𝑏, and the concatenation 𝐴𝐡 is a uniformly
random permutation of [π‘Ž + 𝑏].

   In Section 9 we establish the following upper bound on the probability that the number of
cross inversions takes on any specific value.

Lemma 5.5. There exists a constant 𝐢 such that for all integers π‘Ž, 𝑏 β‰₯ 1 and 0 ≀ π‘˜ ≀ π‘Žπ‘,

                                      β„™[XInvπ‘Ž,𝑏 = π‘˜] ≀ 𝐢 / √(π‘Žπ‘(π‘Ž + 𝑏)).                           (5.7)

    Using Lemma 5.5 we can establish an upper bound of the same form as the right-hand side
of (5.7) for the probability that (5.6) holds: For some constant 𝐢′,

        β„™[XInvπ‘Ž< ,𝑏< + XInvπ‘Ž> ,𝑏> = π‘Ž< 𝑏> βˆ’ π‘Ž> 𝑏< + π‘Ž< βˆ’ π‘Ž> + 𝑏> βˆ’ 𝑏< ] ≀ 𝐢′ / √(π‘Žπ‘(π‘Ž + 𝑏)),   (5.8)

where π‘Ž ≔ π‘Ž< + π‘Ž> β‰₯ 1 and 𝑏 ≔ 𝑏< + 𝑏> β‰₯ 1. We consider several cases based on the relative
sizes of π‘Ž< vs π‘Ž> , and 𝑏< vs 𝑏> .

  (i) In case both π‘Ž< β‰₯ πœ‚π‘Ž and 𝑏< β‰₯ πœ‚π‘, the bound (5.8) follows from (5.7) as long as 𝐢′ β‰₯ 𝐢/πœ‚^(3/2).

  (ii) By switching the order in (i), the same holds true in case both π‘Ž > β‰₯ πœ‚π‘Ž and 𝑏 > β‰₯ πœ‚π‘.

 (iii) In case π‘Ž > ≀ πœ‚π‘Ž and 𝑏 < ≀ πœ‚π‘, the left-hand side of (5.6) is at most 2πœ‚π‘Žπ‘, whereas the
       right-hand side is at least

                 (1 βˆ’ πœ‚)2 π‘Žπ‘ βˆ’ πœ‚2 π‘Žπ‘ + (1 βˆ’ πœ‚)π‘Ž βˆ’ πœ‚π‘Ž + (1 βˆ’ πœ‚)𝑏 βˆ’ πœ‚π‘ = (1 βˆ’ 2πœ‚)(π‘Žπ‘ + π‘Ž + 𝑏).

      As long as 2πœ‚ ≀ 1 βˆ’ 2πœ‚, or equivalently, πœ‚ ≀ 1/4, this case cannot occur.

 (iv) By switching the roles of 𝐴 and 𝐡 in (iii), the same holds true in case 𝑏 > ≀ πœ‚π‘ and π‘Ž < ≀ πœ‚π‘Ž.

As long as πœ‚ ≀ 1/2, it holds that either π‘Ž < β‰₯ πœ‚π‘Ž or π‘Ž > β‰₯ πœ‚π‘Ž, and either 𝑏 < β‰₯ πœ‚π‘ or 𝑏 > β‰₯ πœ‚π‘.
Distributing the β€œand” over the β€œor”, we obtain the four cases we considered, which are therefore
exhaustive. We conclude that (5.8) holds for 𝐢′ = 4^(3/2) 𝐢 whenever π‘Ž ≔ |𝐴| β‰₯ 1 and 𝑏 ≔ |𝐡| β‰₯ 1.
    In the case where π‘Ž = 0 and 𝑏 β‰₯ 1, the right-hand side of (5.4) vanishes, as do |𝐴< | and
|𝐴> |, so (5.4) holds if and only if |𝐡< | = |𝐡> |, or equivalently, the leaf β„“hi is ranked exactly in the
middle of the leaf set 𝐿hi . As the ranking 𝜌 is chosen uniformly at random, this happens with
probability 1/(𝑏 + 1) where 𝑏 ≔ |𝐡| = |𝐿hi | βˆ’ 1. The case where π‘Ž β‰₯ 1 and 𝑏 = 0 is symmetric.
The remaining case, π‘Ž = 𝑏 = 0, is one we do not need to consider as the tree 𝑇 then only has two
leaves. In all other cases we obtain a strong upper bound on the probability that (5.4) holds, and
by complementation a strong lower bound on the root sensitivity. We capture the lower bound
in the following single expression that holds for all cases under consideration.


Lemma 5.6. There exists a constant 𝑐 such that for every binary tree 𝑇 with at least 3 leaves and a root
of degree 2, the root sensitivity of 𝑇 is at least
                                      1 βˆ’ 𝑐 / √(𝑛1 𝑛2 (𝑛1 + 𝑛2 )),                           (5.9)
where 𝑛1 and 𝑛2 denote the number of leaves in the subtrees rooted by the two children of the root.

5.3   Average sensitivity
We are now ready to establish that, except for trivial cases, the average sensitivity of a binary
tree is close to maximal. The trivial cases are those where the tree has at most two leaves, in
which case the sensitivity is zero.
Lemma 5.7. The average sensitivity of Π𝑇 for binary trees 𝑇 with 𝑛 β‰₯ 2 leaves is 𝑛 βˆ’ 𝑂(1).
Proof. We use the expression (5.5) for the average sensitivity of Π𝑇 , where 𝑣 ranges over all
nodes of degree 2 such that 𝑇𝑣 contains at least two leaves. Consider a node 𝑣 of degree 2 such
that 𝑇𝑣 contains 𝑛𝑣,1 leaves on one side and 𝑛𝑣,2 leaves on the other side, where 𝑛𝑣,1 + 𝑛𝑣,2 β‰₯ 3.
If we choose the ranking 𝜌 and the adjacent-rank transposition 𝜏 uniformly at random, each of
the 𝑛(𝑛 βˆ’ 1)/2 pairs of leaves is equally likely to be the affected pair. As there are 𝑛𝑣,1 Β· 𝑛𝑣,2 choices
that result in 𝑣 as their lowest common ancestor, we have that β„™[𝑣 = LCA(β„“lo , β„“hi )] = 2𝑛𝑣,1 𝑛𝑣,2 /(𝑛(𝑛 βˆ’ 1)).
Combining this with the root sensitivity lower bound given by (5.9), we have that

    𝑠(Π𝑇 ) β‰₯ (𝑛 βˆ’ 1) Ξ£_𝑣 (2𝑛𝑣,1 𝑛𝑣,2 /(𝑛(𝑛 βˆ’ 1))) Β· (1 βˆ’ 𝑐 / √(𝑛𝑣,1 𝑛𝑣,2 (𝑛𝑣,1 + 𝑛𝑣,2 )))
           = (𝑛 βˆ’ 1) βˆ’ (2𝑐/𝑛) Ξ£_𝑣 √(𝑛𝑣,1 𝑛𝑣,2 /(𝑛𝑣,1 + 𝑛𝑣,2 )).

The following claim then completes the proof.                                                  β–‘
Claim 5.8. There is a constant 𝑐′ such that for all binary trees 𝑇 with 𝑛 leaves

                          Ξ£_𝑣 √(𝑛𝑣,1 𝑛𝑣,2 /(𝑛𝑣,1 + 𝑛𝑣,2 )) ≀ 𝑐′𝑛,                           (5.10)

where the sum ranges over all nodes 𝑣 of degree 2 such that 𝑇𝑣 contains at least 3 leaves.
Proof of Claim 5.8. We use structural induction to prove a somewhat stronger claim, namely that

                          Ξ£_𝑣 √(𝑛𝑣,1 𝑛𝑣,2 /(𝑛𝑣,1 + 𝑛𝑣,2 )) ≀ 𝑐′𝑛 βˆ’ π‘‘β€²βˆšπ‘›                      (5.11)

for some constants 𝑐′ and 𝑑′ to be determined. As the base case we consider binary trees 𝑇 with
at most two leaves. In this case, the left-hand side of (5.11) is zero and the right-hand side is
non-negative provided 𝑐′ β‰₯ 𝑑′, so (5.11) holds.


    For the inductive step, the case where the root of 𝑇 has degree 1 immediately follows
from the inductive hypothesis for the subtree 𝑇𝑒 rooted by the child 𝑒 of the root of 𝑇. The
remaining case is where the root of 𝑇 has degree 2. Let 𝑒1 and 𝑒2 be the two children of the root,
𝑛1 = |L(𝑇𝑒1 )|, and 𝑛2 = |L(𝑇𝑒2 )|. The sum on the left-hand side of (5.11) has three contributions:
√(𝑛1 𝑛2 /(𝑛1 + 𝑛2 )) from the root, and the contributions from 𝑇𝑒1 and 𝑇𝑒2 , to which we can individually
apply the inductive hypothesis. This gives us an upper bound of

  √(𝑛1 𝑛2 /(𝑛1 + 𝑛2 )) + (𝑐′𝑛1 βˆ’ π‘‘β€²βˆšπ‘›1 ) + (𝑐′𝑛2 βˆ’ π‘‘β€²βˆšπ‘›2 ) = √(𝑛1 𝑛2 /(𝑛1 + 𝑛2 )) + 𝑐′(𝑛1 + 𝑛2 ) βˆ’ 𝑑′(βˆšπ‘›1 + βˆšπ‘›2 ),

which we want to upper bound by

                          𝑐′𝑛 βˆ’ π‘‘β€²βˆšπ‘› = 𝑐′(𝑛1 + 𝑛2 ) βˆ’ π‘‘β€²βˆš(𝑛1 + 𝑛2 ).

Writing 𝑛1 = 𝛼𝑛 for some 𝛼 ∈ [0, 1] and rearranging terms, the upper bound holds if and only if

                          √(𝛼(1 βˆ’ 𝛼)) ≀ 𝑑′(βˆšπ›Ό + √(1 βˆ’ 𝛼) βˆ’ 1).

We claim that the upper bound holds for 𝑑′ = (√2 + 1)/2. Let

                          𝐹(𝛼) ≔ 𝑑′(βˆšπ›Ό + √(1 βˆ’ 𝛼) βˆ’ 1) βˆ’ √(𝛼(1 βˆ’ 𝛼)).

It suffices to show that 𝐹(𝛼) β‰₯ 0. Since 𝐹 is continuous on [0, 1], it attains a minimum on [0, 1].
On (0, 1), 𝐹 is differentiable. It can be verified that 𝐹′ has a unique zero in (0, 1/2), which needs
to be a maximum as 𝐹 is increasing at 𝛼 = 0. By the symmetry 𝐹(𝛼) = 𝐹(1 βˆ’ 𝛼), it follows
that the minimum of 𝐹 on [0, 1] is attained at the midpoint 𝛼 = 1/2 or at one of the endpoints
𝛼 = 0 or 𝛼 = 1. At all three points 𝐹(𝛼) = 0. We conclude that (5.11) holds for any constants
𝑑′ β‰₯ (√2 + 1)/2 and 𝑐′ β‰₯ 𝑑′.                                                                  β–‘
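For 𝑑′ = (√2 + 1)/2, the nonnegativity of 𝐹 can also be spot-checked numerically (an illustrative sketch; the grid resolution is an arbitrary choice):

```python
from math import sqrt

d = (sqrt(2) + 1) / 2

def F(alpha):
    return d * (sqrt(alpha) + sqrt(1 - alpha) - 1) - sqrt(alpha * (1 - alpha))

vals = [F(i / 1000) for i in range(1001)]
assert min(vals) >= -1e-12          # F >= 0 on [0, 1], up to rounding
assert abs(F(0.5)) < 1e-12          # the minimum value 0 is attained at 1/2
```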


6     Sensitivity approach for bounded error
In this section, we apply the sensitivity approach to obtain lower bounds on the query complexity
of problems in the comparison-query model against randomized algorithms with bounded error.
We derive a general result that query lower bounds against deterministic algorithms that are
based on the Sensitivity Lemma also hold against bounded-error randomized algorithms with a
small loss in strength. The approach works particularly well when we have linear lower bounds
on the average sensitivity, in which case there is only a constant-factor loss in the strength of
the query lower bound. Among others, this applies to the Ξ©(𝑛 log 𝑛) query lower bound for
inversion minimization on trees of bounded degree.
    Our approach is based on Yao’s minimax principle [29], which lower bounds worst-case
complexity against randomized algorithms with bounded error by average-case complexity
against deterministic algorithms with bounded distributional error. We view a deterministic
algorithm with small distributional error for a problem Ξ  as an exact deterministic algorithm


for a slightly modified problem Ξ 0. The idea is to then apply the sensitivity approach to Ξ 0, and
capitalize on the closeness of the average sensitivities of Ξ  and Ξ 0 to obtain a lower bound in
terms of the sensitivity of Ξ . By using the Sensitivity Lemma as a black box, the approach yields
a lower bound on the query complexity of bounded-error algorithms that is worst-case with
respect to the input and with respect to the randomness, i. e., the lower bound holds for some
input and some computation path on that input. By delving into the proof of the Sensitivity
Lemma, we are able to obtain a lower bound that is worst-case with respect to the input but
average-case with respect to the randomness, i. e., the lower bound holds for the expected
number of queries on some input.1
    We first define the notions of randomized complexity and distributional complexity.
Definition 6.1 (randomized query complexity, RQΒ· (Β·), and distributional query complexity,
DistQΒ· (Β·, Β·)). Let Ξ  be a problem in the comparison-query model and πœ€ ∈ [0, 1].
     A randomized algorithm 𝑅 for Ξ  is said to have error πœ€ if on every input 𝜌, the algorithm
outputs Ξ (𝜌) with probability at least 1 βˆ’ πœ€. The query complexity of 𝑅 is the maximum, over
all inputs 𝜌, of the expected number of queries that 𝑅 makes on input 𝜌. The πœ€-error randomized
query complexity of Ξ , denoted RQπœ€ (Ξ ), is the minimum query complexity of 𝑅 over all πœ€-error
randomized algorithms 𝑅 for Ξ .
     Let π’Ÿ be a probability distribution on the inputs 𝜌. A deterministic algorithm 𝐴 for Ξ  has
error πœ€ with respect to π’Ÿ if the probability that 𝐴(𝜌) = Ξ (𝜌) is at least 1 βˆ’ πœ€ where the input 𝜌 is
chosen according to π’Ÿ. The query complexity of 𝐴 with respect to π’Ÿ is the expected number of
queries that 𝐴 makes on input 𝜌 when 𝜌 is chosen according to π’Ÿ. The πœ€-error distributional query
complexity of Ξ  with respect to π’Ÿ, denoted DistQπœ€ (Ξ , π’Ÿ), is the minimum query complexity of
𝐴 with respect to π’Ÿ over all deterministic algorithms 𝐴 for Ξ  that have error πœ€ with respect to
π’Ÿ.
   The relationship between randomized complexity and distributional complexity is described
by Yao’s principle.
Lemma 6.2 (Yao’s minimax principle [29]). Let Ξ  be a problem in the comparison-query model,
πœ€ ∈ [0, 1/2], and π’Ÿ a distribution on the inputs 𝜌.
                                  RQπœ€ (Ξ ) β‰₯ (1/2) Β· DistQ2πœ€ (Ξ , π’Ÿ).                        (6.1)
   We now prove lower bounds on the distributional query complexity, and thus on randomized
query complexity, of comparison-query problems Ξ  based on average sensitivity bounds. For
these bounds, we always set π’Ÿ to be the uniform distribution, the distribution underlying the
notion of average sensitivity.
   We start by studying average-case query complexity, i. e., zero-error distributional query
complexity, and its relationship to the average sensitivity. We follow a strategy similar to the
   1In fact, the approach yields a lower bound that is average-case with respect to the input (chosen uniformly at
random) as well as the randomness. This follows because the proof of Yao’s minimax principle allows us to replace
the left-hand side of (6.1) by the average of the expected number of queries with respect to the distribution π’Ÿ, which
we pick to be uniform in our application of the principle.


                         T HEORY OF C OMPUTING, Volume 20 (7), 2024, pp. 1–62                                      28

one in the proof of the Sensitivity Lemma. Whereas a bound on deterministic complexity Q
follows purely from the number of execution traces D, here, the execution traces are weighted
by their depth and their probability of occurring.
    Recall that 𝑔 in the statement of the Strong Sensitivity Lemma denotes any convex function
𝑔 : [1, ∞) β†’ ℝ with 𝑔(π‘₯) = π‘₯! for π‘₯ ∈ [𝑛]; for deriving the Sensitivity Lemma from the Strong
Sensitivity Lemma we also need 𝑔 to be nondecreasing. One such function is 𝑔(π‘₯) = Ξ“(π‘₯ + 1). To
prove a lower bound on the zero-error distributional complexity, we need the function 𝑔 to be
not only convex, but log-convex, i. e., log2 𝑔(π‘₯) needs to be convex. The function 𝑔(π‘₯) = Ξ“(π‘₯ + 1)
satisfies this constraint as well.
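The log-convexity claim is easy to sanity-check numerically. The sketch below (ours) tests
midpoint log-convexity of 𝑔(π‘₯) = Ξ“(π‘₯ + 1) on a grid via math.lgamma, and checks that 𝑔 agrees
with the factorial at integer points.

```python
import math

def midpoint_log_convex(logf, xs, tol=1e-9):
    # midpoint convexity of log f on a grid:
    # log f((a+b)/2) <= (log f(a) + log f(b)) / 2 for all grid points a, b
    return all(logf((a + b) / 2) <= (logf(a) + logf(b)) / 2 + tol
               for a in xs for b in xs)

log_g = lambda x: math.lgamma(x + 1)  # log of g(x) = Gamma(x + 1)
```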

Proposition 6.3. Let Ξ  be a problem in the comparison-query model with 𝑛 items, π’Ÿ the uniform
distribution on the inputs 𝜌, and 𝑔 : [1, ∞) β†’ ℝ a nondecreasing log-convex function with 𝑔(π‘₯) = π‘₯!
for π‘₯ ∈ [𝑛].
                                DistQ0 (Ξ , π’Ÿ) β‰₯ log2 (𝑔(𝑠(Ξ ) + 1)/𝑛)

Proof. Let π‘˜ be the number of distinct execution traces of a deterministic algorithm 𝐴 for Ξ , and
let 𝑅 1 , . . . , 𝑅 π‘˜ denote the corresponding sets of rankings. Interpreting 𝐴 as a binary decision
tree, let 𝑑(𝑅 𝑖 ) be the depth of the execution trace corresponding to 𝑅 𝑖 . By Kraft’s inequality,

                                                    π‘˜
                                                    Γ•
                                                              2βˆ’π‘‘(𝑅 𝑖 ) ≀ 1.
                                                        𝑖=1

              𝑔(π‘₯+1)
Let 𝑓 (π‘₯) = 𝑛1 𝑛! and define the weight function 𝑀(𝜌) = 𝑓 (deg𝐻 (𝜌)), where 𝐻 refers to the
notation of the Strong Sensitivity Lemma: 𝐻 denotes the subgraph of the full permutahedron
that only consists of the bichromatic edges when the vertices are colored with their execution
trace under 𝐴. By Claim 3.3, the sum of the weights of all rankings 𝜌 in 𝑅 𝑖 is at most 1. Therefore,
                                                Γ•
                                                         2βˆ’π‘‘(𝜌) 𝑀(𝜌) ≀ 1.
                                                    𝜌


Dividing both sides by 𝑛! and taking the logarithm of both sides, we get that

                         log2 𝔼[2^{βˆ’π‘‘(𝜌)} 𝑀(𝜌)] ≀ log2 (1/𝑛!),                           (6.2)

where the expectation is with respect to a uniform distribution over the inputs 𝜌. By Jensen’s
inequality, since log is concave, we get
           log2 𝔼[2^{βˆ’π‘‘(𝜌)} 𝑀(𝜌)] β‰₯ 𝔼[log2 (2^{βˆ’π‘‘(𝜌)} 𝑀(𝜌))] = 𝔼[βˆ’π‘‘(𝜌)] + 𝔼[log2 𝑀(𝜌)],

which, in combination with (6.2), implies

                                   𝔼[log2 𝑀(𝜌)] ≀ 𝔼[𝑑(𝜌)] + log2 (1/𝑛!).


Since 𝑔 is log-convex, so is 𝑓 . Applying Jensen’s inequality again,

                     𝔼[log2 𝑀(𝜌)] = 𝔼[log2 𝑓 (deg𝐻 (𝜌))] β‰₯ log2 𝑓 (𝔼[deg𝐻 (𝜌)]),

implying
                log2 ( (1/𝑛) Β· 𝑔(𝔼[deg𝐻 (𝜌)] + 1)/𝑛! ) ≀ 𝔼[𝑑(𝜌)] + log2 (1/𝑛!),
or equivalently,
                                   𝔼[𝑑(𝜌)] β‰₯ log2 (𝑔(𝔼[deg𝐻 (𝜌)] + 1)/𝑛).
The result follows since 𝐴 is an arbitrary deterministic algorithm for Ξ , 𝔼[𝑑(𝜌)] equals the query
complexity of 𝐴 with respect to the uniform distribution π’Ÿ, 𝔼[deg𝐻 (𝜌)] β‰₯ 𝔼[deg𝐺(Ξ ) (𝜌)] = 𝑠(Ξ ),
and 𝑔 is nondecreasing.                                                                          
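    The bookkeeping in this proof can be replayed mechanically on a small example. In the
sketch below (ours; insertion sort stands in for the algorithm 𝐴), each ranking of 𝑛 items yields
an execution trace, and the trace depths satisfy Kraft’s inequality; for a sorting algorithm, every
ranking gets its own trace.

```python
from itertools import permutations
from math import factorial

def sort_trace(ranks):
    # insertion sort on items 0..n-1, recording every comparison outcome;
    # the recorded sequence is the execution trace of the run
    items, trace = list(range(len(ranks))), []
    for i in range(1, len(items)):
        j = i
        while j > 0:
            greater = ranks[items[j - 1]] > ranks[items[j]]
            trace.append((items[j - 1], items[j], greater))
            if not greater:
                break
            items[j - 1], items[j] = items[j], items[j - 1]
            j -= 1
    return tuple(trace)

def kraft(n):
    # sum of 2^(-depth) over the distinct execution traces
    traces = {sort_trace(p) for p in permutations(range(n))}
    return sum(2.0 ** -len(t) for t in traces), len(traces)
```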
    Proposition 6.3 allows us to prove a lower bound on the πœ€-error distributional query
complexity of Ξ  with respect to the uniform distribution. In order to do so, we view a
deterministic algorithm with distributional error πœ€ for Ξ  as an exact deterministic algorithm for
a modified problem Ξ 0, apply Proposition 6.3, and lower bound the sensitivity of Ξ 0 in terms of
the sensitivity of Ξ .

Proposition 6.4. Let Ξ  be a problem in the comparison-query model with 𝑛 items, π’Ÿ the uniform
distribution on the inputs 𝜌, πœ€ ∈ [0, 1], and 𝑔 : [1, ∞) β†’ ℝ a nondecreasing log-convex function with
𝑔(π‘₯) = π‘₯! for π‘₯ ∈ [𝑛].

                      DistQπœ€ (Ξ , π’Ÿ) β‰₯ log2 ( 𝑔(𝑠(Ξ ) + 1 βˆ’ 2(𝑛 βˆ’ 1)πœ€) / 𝑛 ).                  (6.3)

Proof. Consider any deterministic algorithm 𝐴 with error πœ€ for Ξ  with respect to π’Ÿ, or in other words, β„™[𝐴(𝜌) β‰  Ξ (𝜌)] ≀ πœ€ for 𝜌 chosen according to π’Ÿ. Let
Π𝐴 be the problem of determining the output of 𝐴. We prove that

                                         𝑠(Π𝐴 ) β‰₯ 𝑠(Ξ ) βˆ’ 2(𝑛 βˆ’ 1)πœ€,

which implies the desired result by Proposition 6.3, since 𝐴 is a deterministic algorithm for Π𝐴
and 𝑔 is nondecreasing.
    Let 𝐺 denote the full permutahedron graph for 𝑛 items. We use the fact that 𝑠(Ξ ) =
(𝑛 βˆ’ 1) Β· β„™π‘’βˆˆπΊ [𝑒 ∈ 𝐺(Ξ )], and similarly, 𝑠(Π𝐴 ) = (𝑛 βˆ’ 1) Β· β„™π‘’βˆˆπΊ [𝑒 ∈ 𝐺(Π𝐴 )], where all the
underlying distributions are uniform. Suppose the endpoints of 𝑒 are 𝜌1 and 𝜌2 . Note that if
𝑒 ∈ 𝐺 is picked uniformly at random, then the marginal distributions of both 𝜌1 and 𝜌2 are also
uniform. If 𝐴(𝜌1 ) = Ξ (𝜌1 ), 𝐴(𝜌2 ) = Ξ (𝜌2 ), and 𝑒 ∈ 𝐺(Π𝐴 ), then 𝑒 ∈ 𝐺(Ξ ), as well. By a union
bound, the probability that 𝐴(𝜌1 ) β‰  Ξ (𝜌1 ) or 𝐴(𝜌2 ) β‰  Ξ (𝜌2 ) is at most 2πœ€.

           β„™π‘’βˆˆπΊ [𝑒 ∈ 𝐺(Π𝐴 )] β‰₯ β„™π‘’βˆˆπΊ [𝑒 ∈ 𝐺(Ξ )] βˆ’ β„™π‘’βˆˆπΊ [𝐴(𝜌1 ) β‰  Ξ (𝜌1 ) or 𝐴(𝜌2 ) β‰  Ξ (𝜌2 )]
                             β‰₯ β„™π‘’βˆˆπΊ [𝑒 ∈ 𝐺(Ξ )] βˆ’ 2πœ€.

Multiplying both sides by 𝑛 βˆ’ 1 gives 𝑠(Π𝐴 ) β‰₯ 𝑠(Ξ ) βˆ’ 2(𝑛 βˆ’ 1)πœ€.                                   
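    To see how the right-hand side of (6.3) behaves, the following sketch (ours) evaluates it with
𝑔(π‘₯) = Ξ“(π‘₯ + 1) via math.lgamma. The choice 𝑠 = 𝑛 βˆ’ 1 in the test, corresponding to sorting
(where every adjacent-rank transposition changes the output), is our illustration.

```python
import math

def dist_q_bound(n, s, eps):
    # right-hand side of (6.3): log2( g(s + 1 - 2(n-1)eps) / n )
    # with g(x) = Gamma(x + 1), computed through lgamma
    arg = s + 1 - 2 * (n - 1) * eps
    assert arg >= 1, "outside the domain on which g was specified"
    return (math.lgamma(arg + 1) - math.log(n)) / math.log(2)
```

As the error πœ€ grows, the argument of 𝑔 shrinks and the bound degrades, which is why the
next step reduces the error first.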


    Since 𝑠(Ξ ) ≀ 𝑛 βˆ’ 1, Proposition 6.4 only yields nontrivial lower bounds for small πœ€. In order
to establish lower bounds for the standard πœ€ = 1/3, we first reduce the error using standard
techniques. Doing so such that the argument of 𝑔 on the right-hand side of (6.3) remains
Ξ©(𝑠(Ξ )), and picking 𝑔(π‘₯) = Ξ“(π‘₯ + 1), we conclude:

Lemma 6.5 (Bounded-Error Sensitivity Lemma). For any problem Ξ  in the comparison-query model
with 𝑛 items,
                              RQ1/3 (Ξ ) = Ξ©( 𝑠 log 𝑠 / log(2𝑛/𝑠) ),

where 𝑠 ≔ 𝑠(Ξ ).

Proof. By taking the majority vote of multiple independent runs and a standard analysis, e. g.,
based on Chernoff bounds, we have that RQπœ€ (Ξ ) = 𝑂(log(1/πœ€)) RQ1/3 (Ξ ) for any πœ€ ≀ 1/3.
Combining this with Lemma 6.2 and Proposition 6.4, we have:

     RQ1/3 (Ξ ) = Ξ©( RQπœ€ (Ξ ) / log(1/πœ€) ) = Ξ©( DistQ2πœ€ (Ξ , π’Ÿ) / log(1/πœ€) )
               = Ξ©( log(𝑔(𝑠 + 1 βˆ’ 4(𝑛 βˆ’ 1)πœ€)/𝑛) / log(1/πœ€) ).

Setting πœ€ such that 4π‘›πœ€ = 𝑠/2 yields
                                                                
                                            log(𝑔(𝑠/2 + 1)/𝑛)
                              RQ1/3 (Ξ ) = Ξ©                   .
                                                log(8𝑛/𝑠)
Picking 𝑔(π‘₯) = Ξ“(π‘₯ + 1) and using the Stirling bound Ξ“(π‘₯ + 1) β‰₯ √(2πœ‹π‘₯) (π‘₯/e)^π‘₯, we obtain

       RQ1/3 (Ξ ) = Ξ©( ((𝑠/2) log(𝑠/(2e)) βˆ’ log 𝑛) / log(8𝑛/𝑠) ) = Ξ©( 𝑠 log 𝑠 / log(2𝑛/𝑠) ),

where the simplification can be verified by considering the cases of large 𝑠 (say 𝑠 β‰₯ βˆšπ‘›) and
small 𝑠 separately.
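    The error-reduction step invoked at the start of the proof can be quantified exactly for small
parameters. The sketch below (ours) computes the probability that the majority vote of 𝑑
independent runs, each erring with probability πœ€, is wrong.

```python
from math import comb

def majority_error(t, eps):
    # probability that at least (t+1)/2 of t independent eps-error runs err
    # (t odd), i.e., that the majority vote is wrong
    return sum(comb(t, i) * eps ** i * (1 - eps) ** (t - i)
               for i in range((t + 1) // 2, t + 1))
```

The exact tail decays geometrically in 𝑑, matching the 𝑂(log(1/πœ€)) repetition count used in
the proof.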

   We can apply Lemma 6.5 to the sensitivity lower bounds of Lemma 4.1 and produce
randomized lower bounds for inversion minimization on bounded-degree trees. Using Fact 4.5
we obtain:

Theorem 6.6 (lower bound against bounded-error for inversion minimization on trees). Let 𝑇
be a tree with deg(𝑇) ≀ π‘˜. The query complexity of Π𝑇 for bounded-error randomized algorithms is
   𝑛 log(𝑛/π‘˜)
Ξ©( π‘˜ log(π‘˜) ).


7   Connectivity Lemma
In this section we establish Lemma 1.16 and use it to present some of the known lower bounds
in a unified framework. We actually prove the following somewhat stronger result.


Lemma 7.1 (Strong Connectivity Lemma). Consider an algorithm 𝐴 in the comparison-based model,
color each vertex of the permutahedron with its execution trace under 𝐴, and let 𝐻 denote the subgraph
with the same vertex set but only containing the monochromatic edges. The number of distinct execution
traces of 𝐴 equals the number of connected components of 𝐻.
    The Connectivity Lemma follows from Lemma 7.1 because the coloring with execution
traces of an algorithm 𝐴 for Π is a refinement of the coloring with Π. Note that the counterpart
of Lemma 7.1 in the Boolean setting is trivial. This is because an execution trace in the Boolean
setting is specified by values for a subset of the input bits, so the set of inputs that follow a
particular execution trace form a subcube of the hypercube, the Boolean counterpart of the
permutahedron. Subcubes are trivially connected inside the hypercube. In the comparison-
query model, the sets of inputs that follow a particular execution trace can be more complicated,
and their connectedness is no longer trivial but still holds.
Proof of Lemma 7.1. Two rankings 𝜌1 and 𝜌2 that have distinct execution traces under 𝐴 cannot
be connected because any path between them needs to contain at least one bichromatic edge.
For the remainder of the proof, we consider two rankings 𝜌1 and 𝜌2 that have the same execution
trace under 𝐴, and construct a path from 𝜌1 to 𝜌2 in 𝐻.
    If 𝜌1 = 𝜌2 , the empty path suffices.
    Otherwise, there exists a rank π‘Ÿ < 𝑛 such that 𝜌1 and 𝜌2 agree on ranks less than π‘Ÿ and
disagree on rank π‘Ÿ. We have the following situation, where the item π‘¦π‘Ÿ with rank π‘Ÿ under 𝜌2
has rank 𝑠 > π‘Ÿ under 𝜌1 .

              rank      1    Β·Β·Β·   π‘Ÿβˆ’1    π‘Ÿ    Β·Β·Β·   π‘ βˆ’1      𝑠       Β·Β·Β·   𝑛
              𝜌1^{βˆ’1}   π‘₯1   Β·Β·Β·   π‘₯π‘Ÿβˆ’1   π‘₯π‘Ÿ   Β·Β·Β·   π‘₯π‘ βˆ’1   π‘₯𝑠 = π‘¦π‘Ÿ   Β·Β·Β·
                        =          =      β‰ 
              𝜌2^{βˆ’1}   𝑦1   Β·Β·Β·   π‘¦π‘Ÿβˆ’1   π‘¦π‘Ÿ   Β·Β·Β·
     Considering ranking 𝜌1 , we have that 𝜌1 (π‘₯ π‘ βˆ’1 ) = 𝑠 βˆ’ 1 < 𝑠 = 𝜌1 (π‘₯ 𝑠 ). Considering ranking
𝜌2 , since π‘₯ π‘ βˆ’1 differs from 𝑦 𝑖 = π‘₯ 𝑖 for every 𝑖 ∈ [π‘Ÿ βˆ’ 1] and also differs from π‘¦π‘Ÿ , we have that
𝜌2 (π‘₯ π‘ βˆ’1 ) > π‘Ÿ = 𝜌2 (π‘¦π‘Ÿ ) = 𝜌2 (π‘₯ 𝑠 ). Thus, the relative ranks of π‘₯ π‘ βˆ’1 and π‘₯ 𝑠 under 𝜌1 and 𝜌2 differ. As
𝜌1 and 𝜌2 have the same execution trace, this means that the algorithm does not compare π‘₯ π‘ βˆ’1
and π‘₯ 𝑠 on either input, and on 𝜌1 in particular. Let 𝜌01 be the ranking obtained from ranking 𝜌1
by applying the adjacent-rank transposition 𝜏 = (𝑠 βˆ’ 1, 𝑠). Since the algorithm does not compare
the affected items, the execution trace for 𝜌01 and 𝜌1 are the same, so the edge from 𝜌01 to 𝜌1 is
monochromatic and in 𝐻. We use this edge as the first on the path from 𝜌1 to 𝜌2 in 𝐻. What
remains is to find a path from 𝜌01 to 𝜌2 in 𝐻. The situation is the same as the one depicted above
but with π‘Ÿ increased by one in case 𝑠 = π‘Ÿ + 1, and with the same π‘Ÿ and 𝑠 decreased by one,
otherwise. The proof then follows by induction on the ordered pair (π‘Ÿ, 𝑛 βˆ’ 𝑠).                             
Remark 7.2. Suppose we allow an algorithm 𝐴 to have multiple valid execution traces on a
given input 𝜌, and let 𝑅 denote the set of rankings on which a particular execution trace is valid.
The construction in the proof of Lemma 7.1 yields a path in the permutahedron between any
two rankings in 𝑅 such that the path entirely stays within 𝑅. This means that we can replace
D(Ξ ) in the statement of the Connectivity Lemma by its nondeterministic variant N(Ξ ).
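    Lemma 7.1 can be verified exhaustively for small 𝑛. The sketch below (ours) uses a simple
min-finding scan as the algorithm 𝐴, colors each vertex of the permutahedron by its execution
trace, keeps only the monochromatic edges, and checks that the number of connected components
equals the number of distinct traces.

```python
from itertools import permutations

def min_scan_trace(ranks):
    # deterministic min-finding: keep a champion and compare it with each
    # successive item; the tuple of comparison outcomes is the trace
    cur, trace = 0, []
    for i in range(1, len(ranks)):
        smaller = ranks[i] < ranks[cur]
        trace.append(smaller)
        if smaller:
            cur = i
    return tuple(trace)

def traces_vs_components(n, trace_fn):
    perms = list(permutations(range(n)))   # ranks[i] = rank of item i
    color = {p: trace_fn(p) for p in perms}
    parent = {p: p for p in perms}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for p in perms:
        for r in range(n - 1):
            # permutahedron neighbor: swap the items holding ranks r, r+1
            q = tuple(r + 1 if x == r else r if x == r + 1 else x for x in p)
            if color[p] == color[q]:       # monochromatic edge of H
                parent[find(p)] = find(q)
    n_components = len({find(p) for p in perms})
    return len(set(color.values())), n_components
```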


    The Connectivity Lemma captures all the prior lower bounds stated in Section 2 except the
elementary adversary argument (which is also based on connectivity considerations, but in
an undirected graph other than 𝐻, namely (𝑋 , 𝐸) where 𝐸 denotes the queries the algorithm
makes on a given input ranking 𝜌). It captures the information-theoretic lower bound because
input rankings with different outputs cannot belong to the same connected component of 𝐻.
We already explained in Section 1 how the Connectivity Lemma shows that counting inversions
and inversion parity amount to sorting, and require at least log(𝑛!) queries. We now illustrate
its use for a classical problem that is easier than sorting, namely, median finding.
    Let Ξ  denote the selection problem with rank π‘Ÿ = βŒˆπ‘›/2βŒ‰. For any ranking, the adjacent-rank
transpositions 𝜏 that change the item with rank π‘Ÿ are the two that involve rank π‘Ÿ: 𝜏 = (π‘Ÿ βˆ’ 1, π‘Ÿ)
and 𝜏 = (π‘Ÿ, π‘Ÿ + 1). Those transpositions are the ones that correspond to missing edges in the
permutahedron graph 𝐺(Π). As a result, for any two rankings, there exists a path between them
in 𝐺(Π) if and only if they have the same median as well as the same set of items with rank less
than π‘Ÿ (and also the same set of items with rank greater than π‘Ÿ). As there are 𝑛 possibilities
for the median and, for each median, (π‘›βˆ’1 choose π‘Ÿβˆ’1) possibilities for the set of items that have rank
less than π‘Ÿ, 𝐺(Ξ ) has 𝑛 Β· (π‘›βˆ’1 choose π‘Ÿβˆ’1) connected components. It follows that any algorithm for Ξ  has
at least 𝑛 Β· (π‘›βˆ’1 choose π‘Ÿβˆ’1) = Ξ©(βˆšπ‘› Β· 2^𝑛) distinct execution traces, and therefore needs to make at
least 𝑛 + (1/2) log(𝑛) βˆ’ 𝑂(1) queries.
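The component count in this analysis is small enough to check exhaustively. The sketch below
(ours) builds the permutahedron, keeps only the edges on which the output of the selection
problem is unchanged, and counts components.

```python
from itertools import permutations
from math import comb

def count_components(n, out):
    # components of G(Pi): permutahedron edges are kept only when the
    # adjacent-rank transposition leaves the output `out` unchanged
    perms = list(permutations(range(n)))   # p[i] = rank of item i (0-based)
    parent = {p: p for p in perms}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for p in perms:
        for r in range(n - 1):
            # swap the items holding ranks r, r+1
            q = tuple(r + 1 if x == r else r if x == r + 1 else x for x in p)
            if out(p) == out(q):
                parent[find(p)] = find(q)
    return len({find(p) for p in perms})
```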
    As a side note, this example clarifies a subtlety in the equivalence between ordinary
selection and the instantiation of partial order production that is considered equivalent to
selection. Whereas selection of rank π‘Ÿ ordinarily requires outputting only the item of rank π‘Ÿ, the
instantiation of partial order production additionally requires partitioning the remaining items
according to whether their ranks are less than or greater than π‘Ÿ. The above analysis implies
that it is impossible for the algorithm to know the item of rank π‘Ÿ without also knowing how
to partition the remaining items into those of rank less than and greater than π‘Ÿ. It follows
that, in the comparison-based model, ordinary selection and the instantiation of partial order
production are equivalent.



8   Connectivity approach

This section covers the connectivity approach for obtaining query lower bounds in the
comparison-query model. Our main focus is the problem Π𝑇 of inversion minimization
on a fixed tree 𝑇, for which we derive very strong query lower bounds in the case of the special
types of trees in Theorem 1.8. Some parts of the analysis carry through for a broader class of
problems Ξ , namely those that satisfy a certain partition property. We first develop the property
and apply the Connectivity Lemma to a general problem Ξ  with the property. We then present
sufficient conditions for the problem Π𝑇 to have the property and perform a detailed analysis,
leading to Theorem 1.8. Finally, we apply the same ideas to the problem of counting cross
inversions, for which we obtain the query lower bound of Theorem 1.9, as well as to the closely
related problem of inversion minimization on the Mann–Whitney trees of Figure 2.


8.1    Partition property
In order to obtain good lower bounds on D(Ξ ) using the Connectivity Lemma, it is sufficient to
find good upper bounds on the size of the connected components of a typical vertex in 𝐺(Π).
For the problem Π𝑇 , we can assume without loss of generality that 𝑇 has no internal nodes
of degree 1, i. e., no nodes with exactly one child. With that assumption, Π𝑇 is insensitive to
any adjacent-rank transposition 𝜏 at a ranking 𝜌 for which the affected leaves are siblings in 𝑇.
Thus, the corresponding edges from the permutahedron are always present in 𝐺(Π𝑇 ). From the
perspective of ensuring small connected components in 𝐺(Π𝑇 ), the ideal situation would be if
there were no other edges in 𝐺(Π𝑇 ). That is to say, Π𝑇 is sensitive at 𝜌 to every adjacent-rank
transposition 𝜏 except when the affected leaves are siblings. We will investigate conditions on 𝑇
that guarantee this situation in the next two subsections. In this subsection, we analyze the size
of the connected components of 𝐺(Π𝑇 ) when 𝑇 is of the desired type, and use it to obtain query
lower bounds via the Connectivity Lemma. Our analysis applies more generally to any problem
Ξ  with the following property.

Definition 8.1 (partition property). A computational problem Ξ  in the comparison-query model
on a set 𝑋 of 𝑛 items has the partition property if the set 𝑋 can be partitioned into sets 𝑋𝑖 such
that for any ranking 𝜌 of 𝑋 and adjacent-rank transposition 𝜏 = (π‘Ÿ, π‘Ÿ + 1) with π‘Ÿ ∈ [𝑛 βˆ’ 1],
Ξ (𝜌) = Ξ (𝜏𝜌) if and only if πœŒβˆ’1 (π‘Ÿ) and πœŒβˆ’1 (π‘Ÿ + 1) belong to the same partition class 𝑋𝑖 . If every
partition class 𝑋𝑖 has size at most π‘˜, we say that Ξ  has the partition property with class size at
most π‘˜.

    In other words, a problem Ξ  has the partition property if the underlying universe can be
partitioned in such a way that adjacent-rank transpositions that do not change the answer are
exactly those whose affected items fall within the same partition class. In the case of the problem
Π𝑇 , the partition classes 𝑋𝑖 correspond to the leaf child sets LC(𝑣) from Definition 1.7, where 𝑣
ranges over the leaf parents.
    Let us investigate the size of the connected components of 𝐺(Π) when Π satisfies the partition
property. Consider a walk in 𝐺(Π). As the only steps we can take correspond to adjacent-rank
transpositions 𝜏 that swap elements in the same partition class, the sets 𝑋𝑖 remain invariant,
irrespective of the ranking 𝜌 we start from. Depending on 𝜌, there may be more structure inside
each partition class 𝑋𝑖 ; the set 𝑋𝑖 may be broken up into smaller subsets that are each invariant.
For our analysis, we list the elements of each partition class in order of increasing rank under
𝜌, and include an edge between elements that have successive ranks. We introduce the term
β€œsuccessor graph” to capture this structure, viewed as a graph with the ranks as vertices.

Definition 8.2 (successor graph, 𝑆(Β·, Β·)). Let Ξ  be a computational problem in the comparison-
query model on a set 𝑋 of 𝑛 items, and 𝜌 a ranking of 𝑋. The successor graph of Ξ  on 𝜌, denoted
𝑆(Ξ , 𝜌), has vertex set [𝑛] and contains all edges of the form (π‘Ÿ, π‘Ÿ + 1) with π‘Ÿ ∈ [𝑛 βˆ’ 1] such that
Ξ (𝜌) = Ξ (𝜏𝜌), where 𝜏 denotes the adjacent-rank transposition (π‘Ÿ, π‘Ÿ + 1).

      We have the following connection.



Proposition 8.3. Let Ξ  be a computational problem in the comparison-query model on the set 𝑋, and let
𝜌 be a ranking of 𝑋. If Ξ  has the partition property, then the connected component of 𝜌 in 𝐺(Ξ ) has size
Ξ _𝑗 (𝑛𝑗 !), where the 𝑛𝑗 ’s denote the sizes of the connected components of 𝑆(Ξ , 𝜌).

Proof. The connected components of the successor graph 𝑆(Ξ , 𝜌) correspond to subsets of
the classes 𝑋𝑖 that each remain invariant under walks in 𝐺(Ξ ). Within each of the subsets,
independently for each subset, every possible ordering can be realized by such walks. This is
because for any adjacent-rank transposition 𝜏, the successor graphs 𝑆(Ξ , 𝜌) and 𝑆(Ξ , 𝜏𝜌) are
the same, and every ordering can be realized by a sequence of swaps of adjacent elements. It
follows that the number of rankings that can be reached from 𝜌 in 𝐺(Π) equals the product over
all connected components of 𝑆(Ξ , 𝜌) of the number of possible orderings of the elements in the
connected component.                                                                         
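    Proposition 8.3 can also be checked by brute force. The sketch below (ours) uses a toy
problem with the partition property, whose output is the pattern of partition classes in rank
order, over an arbitrarily chosen partition of 6 items; the BFS component size matches the
product of factorials of the successor-graph component sizes.

```python
from itertools import permutations, groupby
from math import factorial

# a toy problem with the partition property: output the pattern of partition
# classes in rank order; the class assignment below is an arbitrary example
cls = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 2}
n = 6

def output(p):                      # p[i] = rank of item i
    order = sorted(range(n), key=lambda i: p[i])
    return tuple(cls[i] for i in order)

def neighbor(p, r):                 # swap the items holding ranks r, r+1
    return tuple(r + 1 if x == r else r if x == r + 1 else x for x in p)

def component_size(p0):             # BFS inside G(Pi)
    seen, stack = {p0}, [p0]
    while stack:
        p = stack.pop()
        for r in range(n - 1):
            q = neighbor(p, r)
            if output(q) == output(p) and q not in seen:
                seen.add(q)
                stack.append(q)
    return len(seen)

def predicted_size(p):              # product of n_j! over successor-graph runs
    order = sorted(range(n), key=lambda i: p[i])
    sizes = [len(list(g)) for _, g in groupby(cls[i] for i in order)]
    prod = 1
    for m in sizes:
        prod *= factorial(m)
    return prod
```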
    Figure 10 depicts an example for a problem of type Π𝑇 and a partition consisting of 4 classes,
namely the leaf child sets LC1 , LC2 , LC3 and LC4 . The tree 𝑇 and ranking 𝜌 are represented in
Figure 10a. Figure 10b represents the part of the successor graph 𝑆(Π𝑇 , 𝜌) involving the leaf
child set LC3 and illustrates the subpartitioning into invariant subsets.



  [Figure 10: Connected component analysis. (a) Leaf child sets LC1 , LC2 , LC3 , LC4 and the leaf
  ranks under 𝜌; (b) inside leaf child set LC3 , showing the subpartitioning into invariant subsets.]

    If each of the partition classes 𝑋𝑖 has size at most π‘˜, then each of the connected components
of 𝑆(Ξ , 𝜌) has size 𝑛𝑗 ≀ π‘˜, irrespective of 𝜌. The maximum value that Ξ _𝑗 (𝑛𝑗 !) can take under
the constraints βˆ‘_𝑗 𝑛𝑗 = 𝑛 and 𝑛𝑗 ≀ π‘˜ is no more than (π‘˜!)^{𝑛/π‘˜}. By the Connectivity Lemma, we
conclude that D(Ξ ) β‰₯ 𝑛!/(π‘˜!)^{𝑛/π‘˜}, and that the query complexity is at least log2 (𝑛!) βˆ’ 𝑂(𝑛 log(π‘˜)).
    We can do better by observing that, for a random ranking 𝜌, the number of adjacent-rank
transpositions 𝜏 that do not jump from one partition class to another is not much larger than the
average size of the partition classes.

Lemma 8.4 (lower bound for problems with the partition property). Let Ξ  be a computational
problem in the comparison-query model on a set of size 𝑛. If Ξ  satisfies the partition property with class
size at most π‘˜, then D(Ξ ) β‰₯ 𝑛!/(2(π‘˜!)^2).

Proof. For any rank π‘Ÿ ∈ [𝑛 βˆ’ 1], the probability that 𝜌^{βˆ’1}(π‘Ÿ) and 𝜌^{βˆ’1}(π‘Ÿ + 1) belong to the same
partition class equals βˆ‘_𝑖 (|𝑋𝑖 |/𝑛) Β· ((|𝑋𝑖 | βˆ’ 1)/(𝑛 βˆ’ 1)), which is at most (π‘˜ βˆ’ 1)/(𝑛 βˆ’ 1) provided each partition class 𝑋𝑖 has size


at most π‘˜. It follows that the expected number of adjacent-rank transpositions 𝜏 that do not
change partition class, is at most π‘˜ βˆ’ 1, so for a fraction at least half of the rankings 𝜌 the number
is at most 2(π‘˜ βˆ’ 1).
    The number of adjacent-rank transpositions 𝜏 that do not change partition class for a given
ranking 𝜌 equals the number of edges in the successor graph 𝑆(Ξ , 𝜌). In terms of the sizes 𝑛𝑗 ,
this number equals βˆ‘_𝑗 (𝑛𝑗 βˆ’ 1). We are considering rankings 𝜌 for which the sum is at most
2(π‘˜ βˆ’ 1). The maximum of Ξ _𝑗 (𝑛𝑗 !) under the constraints that βˆ‘_𝑗 (𝑛𝑗 βˆ’ 1) ≀ 2(π‘˜ βˆ’ 1) and that each
individual 𝑛𝑗 ≀ π‘˜ is reached when two of the 𝑛𝑗 ’s equal π‘˜ and the rest are 1. Thus, if each of the
partition classes 𝑋𝑖 are of size at most π‘˜, for a fraction at least half of the rankings 𝜌, the size of
the connected component of 𝜌 in 𝐺(Ξ ) is at most (π‘˜!)2 . It follows that the number of connected
components of 𝐺(Ξ ) is at least 𝑛!/(2(π‘˜!)2 ). The Connectivity Lemma then yields the claimed
lower bound on D(Ξ ).                                                                                 

    Lemma 8.4 yields a lower bound of log(𝑛!) βˆ’ 𝑂(π‘˜ log(π‘˜)) on the query complexity of Ξ 
whenever Ξ  satisfies the partition property with class size at most π‘˜.
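    The gain over the earlier (π‘˜!)^{𝑛/π‘˜} count is easy to quantify; the sketch below (ours)
compares the two lower bounds on log2 D(Ξ ) numerically via math.lgamma.

```python
import math

def log2_fact(m):
    # log2(m!) through lgamma, avoiding overflow
    return math.lgamma(m + 1) / math.log(2)

def naive_bits(n, k):
    # log2( n! / (k!)^(n/k) ), the bound preceding Lemma 8.4
    return log2_fact(n) - (n / k) * log2_fact(k)

def improved_bits(n, k):
    # log2( n! / (2 (k!)^2) ), the bound of Lemma 8.4
    return log2_fact(n) - 1 - 2 * log2_fact(k)
```

The first bound loses Θ(𝑛 log π‘˜) bits relative to log2 (𝑛!), the second only 𝑂(π‘˜ log π‘˜).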
    Next we turn to sufficient conditions on the tree 𝑇 that guarantee the partition property for
Π𝑇 . For didactic reasons we first develop the conditions for binary trees, and then generalize
them to arbitrary trees.

8.2   Binary trees
In the case of binary trees 𝑇, the sensitivity analysis of Section 5.1 leads to a simple sufficient
condition for the partition property to hold for Π𝑇 . Recall that we are assuming without loss of
generality that 𝑇 has no internal nodes of degree 1, which in the case of binary trees is equivalent
to saying that the tree is full: Every internal node has the maximum degree of 2.
    Consider criterion 5.3 in Proposition 5.2. The right-hand side is always βˆ’1. As for the
left-hand side, we know the following.

Fact 8.5. For all disjoint sets 𝐴, 𝐡 βŠ† 𝑋 and any ranking 𝜌 of 𝑋, DInv𝜌 (𝐴, 𝐡) ≑ |𝐴| Β· |𝐡| (mod 2).

Proof. As every pair in 𝐴 Γ— 𝐡 constitutes a cross-inversion for either 𝐴 to 𝐡, or 𝐡 to 𝐴, we have
XInv𝜌 (𝐴, 𝐡) + XInv𝜌 (𝐡, 𝐴) = |𝐴| · |𝐡|. Thus,

          DInv𝜌 (𝐴, 𝐡) = XInv𝜌 (𝐴, 𝐡) βˆ’ XInv𝜌 (𝐡, 𝐴)
                       = (XInv𝜌 (𝐴, 𝐡) + XInv𝜌 (𝐡, 𝐴)) βˆ’ 2 XInv𝜌 (𝐡, 𝐴)
                       = |𝐴| Β· |𝐡| βˆ’ 2 XInv𝜌 (𝐡, 𝐴).                                     (8.1)

As XInv𝜌 (𝐡, 𝐴) is an integer, the claim follows.
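    The parity fact lends itself to an exhaustive check. The sketch below (ours) takes XInv𝜌 (𝐴, 𝐡)
to count the pairs (π‘Ž, 𝑏) ∈ 𝐴 Γ— 𝐡 in which π‘Ž outranks 𝑏; the parity claim is insensitive to the
orientation convention.

```python
from itertools import permutations

def xinv(rho, A, B):
    # cross-inversions from A to B under rho (rho[i] = rank of item i):
    # pairs (a, b) in A x B with a ranked above b
    return sum(1 for a in A for b in B if rho[a] > rho[b])

def dinv(rho, A, B):
    # difference of cross-inversions, as in Fact 8.5
    return xinv(rho, A, B) - xinv(rho, B, A)
```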

   Fact 8.5 implies that whenever at least one of the leaf sets 𝐿lo or 𝐿hi is of even cardinality,
then (5.3) fails to hold, and Π𝑇 is sensitive to the underlying 𝜏 at 𝜌. Thus, we can guarantee that
Π𝑇 satisfies the partition property provided that for any two siblings 𝑒1 and 𝑒2 in 𝑇 that are
not both leaves, at least one of |L(𝑇𝑒1 )| or |L(𝑇𝑒2 )| is even. We refer to the latter condition as the
product condition. In trees without nodes of degree 1, the product condition can be expressed


alternately in terms of the leaf child sets. We state and prove the result for arbitrary trees as it
will help us in the next subsection to generalize the analysis.
Proposition 8.6. Let 𝑇 be a tree without nodes of degree 1. The following two conditions are equivalent:
  (a) For any two siblings 𝑒1 and 𝑒2 that are not both leaves, at least one of |L(𝑇𝑒1 )| or |L(𝑇𝑒2 )| is even.

  (b) At most one leaf child set is odd, and if there exists a node 𝑣 βˆ— with an odd leaf child set LC(𝑣 βˆ— ), then
      all ancestors of 𝑣 βˆ— have an empty leaf child set.
In the case of binary trees, (b) can be simplified to: At most one leaf has a non-leaf sibling.
Proof. We establish the two directions of implication separately.
β‡’: We argue the contrapositive. Suppose that at least two of the leaf child sets are odd. Start
with the root of 𝑇 as the node 𝑣, and iterate the following: If 𝑣 has a child 𝑒 such that 𝑇𝑒 contains
at least two nodes with an odd leaf child set, replace 𝑣 by such a child 𝑒. When the process
ends, one of the following situations applies:
   β€’ There are two distinct children 𝑒1 and 𝑒2 of 𝑣 that are not leaves and each contain a single
     node with an odd leaf child set. In this case both 𝑇𝑒1 and 𝑇𝑒2 contain an odd number of
     leaves, violating (a).

   β€’ There exists a unique child 𝑒1 of 𝑣 that is not a leaf and contains a single node with an odd
     leaf child set, and 𝑣 itself has an odd number of leaf children. In this case, setting 𝑒2 to
     any one leaf child of 𝑣 (which exists as their number is odd), leads to a violation of (a).
Next, suppose that there exists a unique node 𝑣 βˆ— that has an odd leaf child set, and that an
ancestor 𝑣 of 𝑣 βˆ— has a leaf child 𝑒1 . Setting 𝑒2 to the child of 𝑣 that contains 𝑣 βˆ— in its subtree,
yields a violation of (a) as 𝑇𝑒2 contains an odd number of leaves.
⇐: If neither 𝑒1 nor 𝑒2 are leaves, the first condition of (b) guarantees that at most one of 𝑇𝑒1 or
𝑇𝑒2 contains an odd number of leaves. If 𝑒1 is a leaf and 𝑒2 is not, then the second condition of
(b) implies that 𝑇𝑒2 cannot contain a node with an odd leaf child set, and therefore has an even
number of leaves.
   In the case of binary trees, the first condition of (b) implies the second one, which can
therefore be dropped from the equivalence statement. Moreover, for binary trees the first
condition of (b) can be expressed as: At most one leaf has a non-leaf sibling.             
    By Proposition 8.6, in a binary tree the condition that at most one leaf has a non-leaf sibling is
equivalent to the product condition, which implies the partition property of Π𝑇 , so the lower
bound of Lemma 8.4 applies. This establishes Theorem 1.8 in the case of binary trees.
    As a side note, Fig. 11 shows the simplest example of a full binary tree 𝑇 and a ranking 𝜌 for
which there exists an adjacent-rank transposition 𝜏 to which Π𝑇 is insensitive at 𝜌 while the
affected leaves are not siblings. The tree has two leaves with non-leaf siblings, namely the ones
ranked 1 and 2. The adjacent-rank transposition (3, 4) acts on leaves that are not siblings, but
leaves the minimum number of inversions at 4.

                        T HEORY OF C OMPUTING, Volume 20 (7), 2024, pp. 1–62                                  37
                       I VAN H U , D IETER VAN M ELKEBEEK , AND A NDREW M ORGAN




[Figure: a full binary tree in which the leaves ranked 1 and 2 each have an internal sibling; the
two leaf children of those internal siblings are ranked 3, 6 and 4, 5, respectively.]

                       Figure 11: Tree insensitive to non-sibling transposition


8.3     General trees
For general trees 𝑇 the sensitivity analysis of Π𝑇 becomes more complicated than for binary
trees, and we do not know of a simple sensitivity criterion like Proposition 5.2, but we can
nevertheless extend the result for binary trees to arbitrary trees with similar constraints. For a
given ranking 𝜌 of L(𝑇) and a given adjacent-rank transposition 𝜏, we would like to figure out
the effect of 𝜏 on the objective MinInv(𝑇, Β·), in particular when MinInv(𝑇, 𝜏𝜌) = MinInv(𝑇, 𝜌).
Recall the decomposition (4.3) of MinInv(𝑇, Β·) from Section 4.2. By Proposition 4.7 the only term
on the right-hand side of (4.3) that can be affected by the transposition 𝜏 is

                                 MinRInv(T_v, ρ) ≔ min_σ RInv(T_v, ρ, σ)

corresponding to the node 𝑣 that is the least common ancestor LCA(β„“ lo , β„“hi ) of the two leaves
β„“ lo and β„“ hi that are affected by 𝜏 under 𝜌. In Section 5 we considered the two possible relative
orderings 𝜎1 and 𝜎2 of the children of 𝑣, and derived a criterion for when the lowest cost does
not change under 𝜏. More precisely, when

             min(RInv(𝑇, 𝜌, 𝜎1 ), RInv(𝑇, 𝜌, 𝜎2 )) = min(RInv(𝑇, 𝜏𝜌, 𝜎1 ), RInv(𝑇, 𝜏𝜌, 𝜎2 )).   (8.2)

There are two complications in generalizing this approach from binary to general trees.

      β€’ The expression (4.2) for RInv(𝑇, 𝜌, 𝜎) involves multiple terms instead of just one as in
        (5.1). This complicates probabilistic analyses like the one we did in Section 5 because the
        difference in cost of the two relative orderings of two children is also affected by parts of
        the tree outside of their combined subtrees. The issue did not matter for the analysis in
        Section 4. We will be able to manage it here, as well.

      • There are now not just two but multiple possible orderings σ, and it is not clear which
        pairs (σ1, σ2) we need to impose (8.2) on in order to guarantee that MinRInv(T_v, ρ) =
        MinRInv(T_v, τρ) but no more.

In Section 4 we circumvented the second issue by only considering sensitivities that decrease
the objective function, and establishing a lower bound on their occurrence independent of
the ordering 𝜎. Here we are also able to handle the second issue by shooting for a sufficient
condition for sensitivity rather than a criterion. We do so by requiring that condition (8.2) fail for
every pair of distinct orderings σ1 and σ2 (unless the two affected leaves are siblings). Similar
to the case of binary trees, we guarantee that (8.2) fails based on parity considerations given the
product condition.
    For the analysis we again assume without loss of generality that T has no nodes of degree 1.
We use the same notation as in Section 5: Let ℓ_lo denote the affected leaf that is smaller with
respect to ρ, and ℓ_hi the other affected leaf. Let L_i denote the leaf set L_i ≔ L(T_{u_i}), where u_1, ..., u_k
are the children of v = LCA(ℓ_lo, ℓ_hi). We also write u_lo for the child of v that contains ℓ_lo in its
subtree, and L_lo for the leaf set of the subtree rooted at u_lo, and define u_hi and L_hi similarly.
See Figure 12 for a sketch of the setting.

[Figure: the node v with its k children u_1, ..., u_k, among which u_lo and u_hi root the subtrees
containing ℓ_lo and ℓ_hi, respectively.]

                                 Figure 12: Sensitivity for general trees

    We slightly abuse notation and let 𝜎 denote both the ordering of the entire tree 𝑇 as well as
the ranking of the children of 𝑣. By the analysis of Section 5, we have that for two orderings 𝜎1
and σ2 the situation (8.2) can only occur if σ1(u_lo) < σ1(u_hi), σ2(u_lo) > σ2(u_hi), and

         RInv(T_v, ρ, σ1) − RInv(T_v, ρ, σ2) = −1 = DInv_ρ({ℓ_lo}, {ℓ_hi}).                      (8.3)
For any two disjoint sets of leaves A and B, (8.1) lets us write

              XInv_ρ(A, B) = (1/2) · DInv_ρ(A, B) + (1/2) · |A| · |B|.                           (8.4)
Applying (8.4) to all the terms involved in (4.2), we have

    RInv(T_v, ρ, σ) = Σ_{1≤i<j≤k} XInv_ρ(L_i, L_j) · 𝕀[σ(i) < σ(j)]
                        + Σ_{1≤i<j≤k} XInv_ρ(L_j, L_i) · 𝕀[σ(i) > σ(j)]
                    = (1/2) Σ_{1≤i<j≤k} ( DInv_ρ(L_i, L_j) · (−1)^{𝕀[σ(i) > σ(j)]} + |L_i| · |L_j| ),

and therefore

    RInv(T_v, ρ, σ1) − RInv(T_v, ρ, σ2) = Σ_{1≤i<j≤k : σ1(i)<σ1(j), σ2(i)>σ2(j)} DInv_ρ(L_i, L_j)
                                          − Σ_{1≤i<j≤k : σ1(i)>σ1(j), σ2(i)<σ2(j)} DInv_ρ(L_i, L_j).




By combining the last equation with (8.3) and separating out the term for (i, j) = (lo, hi), we
obtain the following necessary condition for (8.2) to hold:

    DInv_ρ(L_lo, L_hi) − DInv_ρ({ℓ_lo}, {ℓ_hi})
        = − Σ_{1≤i<j≤k : σ1(i)<σ1(j), σ2(i)>σ2(j), (i,j)≠(lo,hi)} DInv_ρ(L_i, L_j)
          + Σ_{1≤i<j≤k : σ1(i)>σ1(j), σ2(i)<σ2(j)} DInv_ρ(L_i, L_j).                             (8.5)

    In order for Π_T to have the partition property, it suffices to ensure that (8.5) fails whenever
u_lo and u_hi are not both leaves. By (8.1) each of the terms DInv_ρ(L_i, L_j) in (8.5) has the same
parity as |L_i| · |L_j|. Since DInv_ρ({ℓ_lo}, {ℓ_hi}) is odd, it follows that (8.5) fails whenever at most
one of the leaf sets L_i involved has odd size, which is condition (a) in Proposition 8.6. Switching
to the equivalent condition (b) from Proposition 8.6 allows us to conclude via Lemma 8.4:

Theorem 8.7. Let T be a tree without nodes of degree 1 such that the leaf child sets have size at most k, at
most one of them has odd size, and, if there exists an odd one, say LC(v*), then all ancestors of v* have empty
leaf child sets. Then D(Π_T) ≥ n!/(2(k!)²).

      Theorem 1.8 follows by taking the base-2 logarithm of the bound.

8.4     Counting cross inversions and evaluating the Mann–Whitney statistic
We now apply the connectivity approach that we captured in Proposition 8.3 to the problem
ΠXInv of computing the number of cross inversions between two disjoint sets 𝐴 and 𝐡 with
respect to a ranking ρ of X = A ⊔ B. Note that this problem is a refinement of evaluating the
Mann–Whitney statistic, or equivalently, of inversion minimization on the tree T of Figure 2: Any
algorithm that solves Π_XInv with q queries can be transformed into an algorithm for Π_T with q
queries, namely by transforming the output y of the algorithm for Π_XInv to min(y, |A| · |B| − y).
Viewed in the contrapositive, a lower bound for Ξ XInv is easier to obtain than one for Π𝑇 on the
tree 𝑇 of Figure 2. We first establish a lower bound for Ξ XInv and then see how it extends to Π𝑇 .
     One can think of Ξ XInv as inversion minimization on the Mann–Whitney tree without
allowing swapping the two children of the root. As a result, the problem Ξ XInv is sensitive to
every adjacent-rank transposition between non-siblings, and therefore automatically satisfies
the partition property (with 𝐴 and 𝐡 being the partition classes), so Proposition 8.3 applies. In
contrast, the problem Π𝑇 of minimizing inversions on the Mann–Whitney tree may not have the
partition property. This is why analyzing Ξ XInv is a bit simpler, and why we handle it first.
     Let π‘Ž  |𝐴| and 𝑏  |𝐡|. By the partition property, the average sensitivity of Ξ XInv equals
π‘Ž+𝑏 , which via the Sensitivity Lemma yields a query lower bound of Ξ©(π‘Ž log(π‘Ž)) for π‘Ž ≀ 𝑏. To
 2π‘Žπ‘

obtain the stronger lower bound of Ξ©((π‘Ž + 𝑏) log(π‘Ž)) we need a more detailed analysis of the
connectivity of the permutahedron graph 𝐺(ΠXInv ).
    For a given ranking ρ, let x_1, ..., x_a be the elements of A listed in increasing order, and
similarly y_1, ..., y_b for the elements of B. We define m_1, ..., m_{b+1} such that for each i, m_i is
the number of elements of A ranked between y_{i−1} and y_i. (Here, y_0 and y_{b+1} serve as sentinels
with an infinitely low and an infinitely high rank, respectively.) Similarly, we define n_1, ..., n_{a+1}
as the numbers of elements of B between successive elements of A. The numbers m_i and n_i are the sizes of
the connected components of the successor graph 𝑆(Ξ XInv , 𝜌) (possibly with some additional
zeroes). By Proposition 8.3, the connected component of 𝜌 in 𝐺(ΠXInv ) has size

                                      (π‘š1 )! Β· Β· Β· (π‘šπ‘+1 )!(𝑛1 )! Β· Β· Β· (𝑛 π‘Ž+1 )!.                    (8.6)

Depending on the values of π‘š1 , . . . , π‘šπ‘+1 , 𝑛1 , . . . , 𝑛 π‘Ž+1 , some connected components may be
much larger than others. We apply the Connectivity Lemma in a similar way as in Lemma 8.4
and only count the rankings 𝜌 that are in small connected components, which are the rankings
for which π‘š1 , . . . , π‘šπ‘+1 , 𝑛1 , . . . , 𝑛 π‘Ž+1 are bounded. Let π‘š βˆ— and 𝑛 βˆ— be the minimum integers for
which
                                                      3                                    3
                     β„™[π‘š1 , . . . , π‘šπ‘+1 ≀ π‘š βˆ— ] β‰₯       and β„™[𝑛1 , . . . , 𝑛 π‘Ž+1 ≀ 𝑛 βˆ— ] β‰₯ .
                                                      4                                    4
By a union bound, the probability that both of these events hold is at least 1/2. In other words,
there are least (π‘Ž + 𝑏)!/2 rankings 𝜌 for which π‘š1 , . . . , π‘šπ‘+1 ≀ π‘š βˆ— and 𝑛1 , . . . , 𝑛 π‘Ž+1 ≀ 𝑛 βˆ— .
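To make the definitions concrete, here is a small Python sketch (an illustration, not code from the paper; `gap_sequences` is a hypothetical helper name) that computes the gap sequences m_1, ..., m_{b+1} and n_1, ..., n_{a+1} of a ranking, together with the component size from (8.6):

```python
import math
import random

def gap_sequences(ranks_A, ranks_B):
    """Return (m, n): m[i] counts elements of A ranked between the i-th and
    (i+1)-st smallest elements of B (with sentinels at both ends), and n is
    the same with the roles of A and B swapped."""
    top = len(ranks_A) + len(ranks_B) + 1
    def gaps(inner, outer):
        bounds = [0] + sorted(outer) + [top]
        return [sum(1 for r in inner if lo < r < hi)
                for lo, hi in zip(bounds, bounds[1:])]
    return gaps(ranks_A, ranks_B), gaps(ranks_B, ranks_A)

random.seed(0)
a, b = 4, 6
perm = random.sample(range(1, a + b + 1), a + b)  # a uniform ranking of A and B
m, n = gap_sequences(perm[:a], perm[a:])
assert sum(m) == a and len(m) == b + 1
assert sum(n) == b and len(n) == a + 1

# Size of the connected component of this ranking in G(Π_XInv), per (8.6):
component = math.prod(math.factorial(g) for g in m + n)
```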
Proposition 8.8.

                D(Π_XInv) ≥ (a + b)! / ( 2 · (m*)!^{a/m*} · (n*)!^{b/n*} ).                      (8.7)



Proof. We consider the rankings for which m_1, ..., m_{b+1} ≤ m* and n_1, ..., n_{a+1} ≤ n*. We first
argue the following upper bound on the size (8.6) of the connected component in G(Π_XInv) of
any such ranking ρ:

              m_1! ··· m_{b+1}! · n_1! ··· n_{a+1}! ≤ (m*)!^{a/m*} · (n*)!^{b/n*}.               (8.8)

For positive integers n, the quantity n!^{1/n} is increasing in n. This can be seen by noticing that
log(n!^{1/n}) is the average of log(1), ..., log(n). As a result, for i ∈ [b + 1] we have
(m_i)!^{1/m_i} ≤ (m*)!^{1/m*}, or equivalently, (m_i)! ≤ (m*)!^{m_i/m*}. Using the fact that
m_1 + ··· + m_{b+1} = a,

              m_1! ··· m_{b+1}! ≤ (m*)!^{(m_1 + ··· + m_{b+1})/m*} = (m*)!^{a/m*}.

We can apply similar reasoning to get n_1! ··· n_{a+1}! ≤ (n*)!^{b/n*}. From this, we conclude that
the size of the connected component of any of the rankings under consideration is at most the
right-hand side of (8.8).
    Since there are at least (a + b)!/2 of the rankings under consideration, we derive the stated
bound on D(Π_XInv) by applying the Connectivity Lemma.                                           ∎
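The factorial estimate at the heart of the proof can be sanity-checked by brute force; the sketch below (an illustration, not the paper's code) verifies the A-side of inequality (8.8) over all compositions of a into b + 1 nonnegative parts, taking m* to be the largest gap:

```python
import math

def compositions(total, parts):
    """Yield all tuples of `parts` nonnegative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest

# Verify m_1! ... m_{b+1}! <= (m*)!^{a/m*}, with m* the largest gap (any
# upper bound on all the gaps works the same way).
a, b = 5, 3
for m in compositions(a, b + 1):
    m_star = max(max(m), 1)
    lhs = math.prod(math.factorial(g) for g in m)
    rhs = math.factorial(m_star) ** (a / m_star)
    assert lhs <= rhs + 1e-9  # tolerance for the fractional power
```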
    Now, we find concrete bounds on m* and n*.

Proposition 8.9.
                    max(1, a/(b + 1)) ≤ m* ≤ ((a + b)/b) ln(4(b + 1)).
Symmetrically,
                    max(1, b/(a + 1)) ≤ n* ≤ ((a + b)/a) ln(4(a + 1)).

                       T HEORY OF C OMPUTING, Volume 20 (7), 2024, pp. 1–62                              41
                      I VAN H U , D IETER VAN M ELKEBEEK , AND A NDREW M ORGAN

Proof. We first prove the upper bound on m*. Let k be a positive integer. We compute the
probability, over a uniformly random ranking ρ, that m_i > k for a specific i. Notice that there is a
one-to-one correspondence between the ranking ρ and the corresponding sequence of nonnegative
integers m_1, ..., m_{b+1} with m_1 + ··· + m_{b+1} = a, because the ranks of B can be uniquely
recovered as m_1 + 1, m_1 + m_2 + 2, ..., and the remaining ranks form A. By stars and bars, there
are C(a+b, b) such sequences. Now, if m_i > k, then m_i − (k + 1) is an arbitrary nonnegative
integer, and m_1 + ··· + (m_i − (k + 1)) + ··· + m_{b+1} = a − k − 1. By stars and bars, there are
C(a+b−k−1, b) such sequences. Therefore, the probability that m_i > k is C(a+b−k−1, b) / C(a+b, b).
Continuing,

    ℙ[m_i > k] = ((a + b − k − 1) ··· (a − k)) / ((a + b) ··· (a + 1))
               ≤ ((a + b − k − 1)/(a + b))^b ≤ exp( −b · (k + 1)/(a + b) ),

where the last step uses the bound 1 + x ≤ exp(x). By a union bound and taking the complement,

    ℙ[m_1, ..., m_{b+1} ≤ k] ≥ 1 − (b + 1) exp( −b · (k + 1)/(a + b) ).                          (8.9)

If k is such that the right-hand side of (8.9) is at least 3/4, we know that m* ≤ k. Solving for k
yields the stated bound on m*.
    For the lower bounds, m* ≥ 1 because at least one of m_1, ..., m_{b+1} is at least 1, and
m* ≥ a/(b + 1) because m_1 + ··· + m_{b+1} = a, and m* is greater than or equal to the average
term in the sum.                                                                                 ∎
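The stars-and-bars count and the exponential tail bound can be checked exactly on a small instance; the following sketch (illustrative only; the instance sizes are arbitrary choices) enumerates all placements of B's ranks:

```python
import math
from itertools import combinations

# Small instance; i indexes the gap m_i (1-based).
a, b, k, i = 5, 4, 2, 1

# Enumerate all C(a+b, b) placements of B's ranks within [a+b] and count
# those for which the i-th gap m_i exceeds k.
count = total = 0
for ranks_B in combinations(range(1, a + b + 1), b):
    bounds = (0,) + ranks_B + (a + b + 1,)
    m_i = bounds[i] - bounds[i - 1] - 1  # elements of A strictly in the i-th gap
    count += (m_i > k)
    total += 1

exact = count / total
stars_and_bars = math.comb(a + b - k - 1, b) / math.comb(a + b, b)
assert math.isclose(exact, stars_and_bars)
assert exact <= math.exp(-b * (k + 1) / (a + b))  # the tail bound in the proof
```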


    The first part of Theorem 1.9 now comes from taking the logarithm of D(Π_XInv) in
Proposition 8.8 and using the bounds in Proposition 8.9.

Proof of the first part of Theorem 1.9. We mainly make use of the following approximation based
on Stirling's formula:

                          ln(n!) = (n + 1/2) ln(n) − n + O(1).                                   (8.10)

In order to estimate ln D(Π_XInv), we need to estimate ln((a + b)!) − (a/m*) ln((m*)!) − (b/n*) ln((n*)!).
By (8.10),

                ln((a + b)!) = (a + b + 1/2) ln(a + b) − (a + b) + O(1)                          (8.11)

                (a/m*) ln((m*)!) = (a + a/(2m*)) ln(m*) − a + (a/m*) · O(1)                      (8.12)

                (b/n*) ln((n*)!) = (b + b/(2n*)) ln(n*) − b + (b/n*) · O(1)                      (8.13)

We can use the lower bounds in Proposition 8.9, namely that m* ≥ 1 and n* ≥ b/(a + 1), to simplify
the occurrences of m* and n* in the denominators of (8.12) and (8.13). Therefore,

    ln D(Π_XInv) ≥ a ln((a + b)/m*) + b ln((a + b)/n*) − (a/2) ln(m*) − ((a + 1)/2) ln(n*)
                       + (1/2) ln(a + b) − O(a)
                 = a ln( (a + b) / (m* √(m* n*)) ) + (b + 1/2) ln((a + b)/n*) − O(a).



Using the upper bounds in Proposition 8.9, absorbing low-order terms, and using the condition
that a ≤ b, we get

    log D(Π_XInv) ≥ Ω( a log( b √(ab) / (a + b) ) + b log(a) ) ≥ Ω( a log( √(ab) / 2 ) + b log(a) )
                  = Ω((a + b) log(a)).                                                           ∎

Evaluating the Mann–Whitney statistic. We now argue how the second part of Theorem 1.9
follows, i. e., that the lower bound of Ξ©((π‘Ž + 𝑏) log(π‘Ž)) holds for inversion minimization on the
Mann–Whitney tree 𝑇 of Figure 2 with π‘Ž ≀ 𝑏. We do so by tweaking our lower bound argument
for Ξ XInv to the setting of Π𝑇 .
    How does the permutahedron graph for Π𝑇 relate to the one for Ξ XInv ? The problem Π𝑇 is
a coarsening of the problem Ξ XInv : Output values 𝑦 and π‘Žπ‘ βˆ’ 𝑦 for Ξ XInv are both mapped to
min(𝑦, π‘Žπ‘ βˆ’ 𝑦) under Π𝑇 . This means that all edges present in 𝐺(Ξ XInv ) are also present in 𝐺(Π𝑇 ),
but there may be more, and some of the connected components in G(Π_XInv) corresponding to
output value y may be merged in G(Π_T) with some of the connected components of G(Π_XInv)
corresponding to output value ab − y. However, by the reasoning behind Proposition 4.7, edges
in G(Π_XInv) can only go between rankings whose values under Π_XInv differ by at most one. This
means that the above merging of connected components can only happen if the difference
between y and ab − y is 1, i. e., for the values ⌊ab/2⌋ and ⌈ab/2⌉, and only if ab is odd. In fact,
this is exactly the situation that we analyzed in Figure 9, where v coincides with the root of T.
    If we ignore the rankings with value ⌊ab/2⌋ or ⌈ab/2⌉ under Π_XInv, our lower bound argument
for Π_XInv carries over verbatim to Π_T, except that on the right-hand side of Proposition 8.8
the factor of 1/2 is replaced by 1/2 − W, where W represents the fraction of rankings with value
⌊ab/2⌋ or ⌈ab/2⌉ under Π_XInv. Lemma 5.5 tells us that W ≤ 2C/√(ab(a + b)), where C denotes
the constant from the lemma. Thus, we obtain a lower bound for D(Π_T) that is a negligible
fraction smaller than the one for D(Π_XInv). Taking logarithms, we obtain the same lower bound
for the query complexity up to an additive term. In particular, we obtain a query lower bound
of Ω((a + b) log(a)) for Π_T in case a ≤ b. This is the second part of Theorem 1.9.


9     Cross-inversion distribution
In this section we prove the upper bound we need for the proof of Theorem 1.6 in Section 5.2,
namely Lemma 5.5. Recall that XInvπ‘Ž,𝑏 denotes a random variable that counts the number of
cross inversions XInv(𝐴, 𝐡) from 𝐴 to 𝐡, where 𝐴 is an array of length π‘Ž, 𝐡 an array of length
b, and the concatenation AB is a random permutation of [a + b]. Lemma 5.5 states that for all
positive integers a and b, XInv_{a,b} takes on no value with probability more than C/√(ab(a + b)),
where C is a universal constant.

9.1   Approach
We establish Lemma 5.5 by going through the characteristic function of XInvπ‘Ž,𝑏 .


Definition 9.1 (characteristic function, φ_{·,·}). The characteristic function φ_Y of a random variable
Y is the function φ_Y : ℝ → ℂ : t ↦ 𝔼(e^{itY}). We denote by φ_{a,b} the characteristic function of
XInv_{a,b}.

    The characteristic function is well-defined for any random variable π‘Œ and the distribution
function of π‘Œ can always be retrieved from it (see, e.g., [24, Chapter VI]). In the case of integer-
valued random variables like XInvπ‘Ž,𝑏 , the characteristic function is periodic with period 2πœ‹ and
the distribution function can be retrieved via the inverse Fourier transform of the characteristic
function. In particular, the probabilities of XInvπ‘Ž,𝑏 can be expressed as the following integrals.

Fact 9.2. For any positive integers a, b and any integer k,

                ℙ[XInv_{a,b} = k] = (1/(2π)) ∫_{−π}^{π} φ_{a,b}(t) e^{−itk} dt.                  (9.1)

Proof. Observe that the characteristic function φ_{a,b} of XInv_{a,b} is a polynomial in z = e^{it}, where
the coefficient of degree k equals the probability of the outcome k.² Formula (9.1) then follows
by linearity from

                ∫_{−π}^{π} z^d e^{−ikt} dt = ∫_{−π}^{π} e^{i(d−k)t} dt = 2π if d = k, and 0 if d ≠ k.    ∎
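Fact 9.2 can be checked numerically on a small instance: the sketch below (illustrative; the instance a = b = 3 is an arbitrary choice) computes the distribution of XInv_{a,b} by enumeration, forms φ_{a,b}, and recovers the probabilities via the integral (9.1):

```python
import cmath
import math
from itertools import permutations

a, b = 3, 3

# Exact distribution of XInv_{a,b}: cross inversions from A to B, where A is
# the first a entries of a uniformly random permutation of [a+b], B the rest.
counts = {}
for perm in permutations(range(1, a + b + 1)):
    A, B = perm[:a], perm[a:]
    xinv = sum(1 for x in A for y in B if x > y)
    counts[xinv] = counts.get(xinv, 0) + 1
total = math.factorial(a + b)
prob = {k: c / total for k, c in counts.items()}

def phi(t):
    """The characteristic function E[e^{it XInv}]."""
    return sum(p * cmath.exp(1j * t * k) for k, p in prob.items())

def invert(k, steps=256):
    """Recover P[XInv = k] by numerically integrating (9.1) over [-pi, pi];
    the midpoint rule is exact here since the integrand is a trig polynomial."""
    h = 2 * math.pi / steps
    ts = [-math.pi + (j + 0.5) * h for j in range(steps)]
    return sum(phi(t) * cmath.exp(-1j * t * k) for t in ts).real * h / (2 * math.pi)

for k in range(a * b + 1):
    assert abs(invert(k) - prob.get(k, 0.0)) < 1e-9
```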

    The following lemma represents the essence of the proof of Lemma 5.5.

Lemma 9.3. There exists a constant C such that for all integers a, b with b ≥ a ≥ 2,

                ∫_{−π}^{π} |φ_{a,b}(t)| dt ≤ C/(b √a),                                           (9.2)

where φ_{a,b}(t) ≔ 𝔼(e^{it · XInv_{a,b}}).

Proof of Lemma 5.5. By symmetry, it suffices to consider the case where a ≤ b. In the case where
a = 1, the distribution of XInv_{a,b} is uniform over {0, ..., b}, so the maximum probability is
1/(b + 1) ≤ C/√(ab(a + b)) for any constant C ≥ √2. Otherwise, we have

    ℙ[XInv_{a,b} = k] = (1/(2π)) ∫_{−π}^{π} φ_{a,b}(t) e^{−itk} dt
                      ≤ (1/(2π)) ∫_{−π}^{π} |φ_{a,b}(t)| dt ≤ C/(2π b √a) ≤ C/√(ab(a + b))

by (9.1) and Lemma 9.3.                                                                          ∎

    ² After multiplication by the binomial coefficient C(a + b, a), the resulting polynomial is known as the Gaussian
polynomial with parameter (a, b).



    To establish Lemma 9.3, as |φ_{a,b}(t)| is an even function, it suffices to take the integral (9.2)
over the domain [0, π] and multiply by two:

                ∫_{−π}^{π} |φ_{a,b}(t)| dt = 2 ∫_{0}^{π} |φ_{a,b}(t)| dt.                        (9.3)

    We divide the domain of integration on the right-hand side of (9.3) into two regions: one
close to zero, and the rest. The integrand is well-behaved in the central region near zero, where
it is accurately approximated by a Gaussian curve. It is harder to analyze the behavior of the
function away from zero. In this region, a pole reduction lemma (captured by Lemma 9.13), which
hinges on a combinatorial matching result (Lemma 9.15), plays a crucial role in eliminating most
of the messy behavior of the function while still providing an effective bound.
    We first derive an expression for the characteristic function of XInvπ‘Ž,𝑏 in Section 9.2, bound
the central part of the integral in Section 9.3, the peripheral part in Section 9.4, and conclude
with the pole reduction lemma in Section 9.5.

9.2      The characteristic function πœ‘ π‘Ž,𝑏
Our derivation of the characteristic function πœ‘ π‘Ž,𝑏 of XInvπ‘Ž,𝑏 is based on a connection between
cross inversions and inversions in arrays.

Fact 9.4. Let 𝐴 and 𝐡 be arrays, and let 𝐴𝐡 be the concatenation of 𝐴 with 𝐡. Then

                                 Inv(𝐴𝐡) = Inv(𝐴) + Inv(𝐡) + XInv(𝐴, 𝐡).

Proof. Any inversion in 𝐴𝐡 is either between two elements of 𝐴, two elements of 𝐡, or one
element of 𝐴 and one element of 𝐡. In each case, the inversion is counted in Inv(𝐴), Inv(𝐡), or
XInv(𝐴, 𝐡), respectively.                                                                    
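A brute-force check of Fact 9.4 (an illustration, not part of the paper):

```python
from itertools import permutations

def inv(arr):
    """Number of inversions (out-of-order pairs) in arr."""
    return sum(1 for i in range(len(arr)) for j in range(i + 1, len(arr))
               if arr[i] > arr[j])

def xinv(A, B):
    """Cross inversions from A to B: pairs (x, y) in A x B with x > y."""
    return sum(1 for x in A for y in B if x > y)

# Check Inv(AB) = Inv(A) + Inv(B) + XInv(A, B) over all permutations of [5]
# and all ways of cutting them into a prefix A and a suffix B.
for perm in permutations(range(1, 6)):
    for cut in range(6):
        A, B = perm[:cut], perm[cut:]
        assert inv(perm) == inv(A) + inv(B) + xinv(A, B)
```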

      We also make use of the following handy property of characteristic functions.

Fact 9.5. For independent random variables π‘Œ, 𝑍,

                                             πœ‘π‘Œ+𝑍 (𝑑) = πœ‘π‘Œ (𝑑) πœ‘ 𝑍 (𝑑).

Proof.      πœ‘π‘Œ+𝑍 (𝑑) = 𝔼(e𝑖𝑑(π‘Œ+𝑍) ) = 𝔼(eπ‘–π‘‘π‘Œ e𝑖𝑑𝑍 ) = 𝔼(eπ‘–π‘‘π‘Œ )𝔼(e𝑖𝑑𝑍 ) = πœ‘π‘Œ (𝑑) πœ‘ 𝑍 (𝑑).           

   Let Invπ‘Ž be the random variable that counts the number Inv(𝐴) of inversions in an array 𝐴
that is a uniform permutation of [π‘Ž], and let πœ‘ π‘Ž (𝑑) be the characteristic function of Invπ‘Ž .

Claim 9.6.

                φ_a(t) = Π_{k=1}^{a} ( (e^{it(k−1)/2} / k) · (sin(kt/2) / sin(t/2)) ).           (9.4)



Proof. Consider the process of placing the elements 1, ..., a one by one, each time placing the
new element between two previously placed elements or at an end of the array, to form an array A.
    For k = 1, ..., a, consider the random variable that counts the number of new inversions
formed with k when k is placed. When k is placed, the number of new inversions is equal to
the number of elements to the left of k at the time of placement (only the elements 1, ..., k − 1
have been placed at this point). This means the random variable has a uniform distribution over
{0, ..., k − 1}, which we denote by U_{k−1}. Furthermore, this holds regardless of the placement of
the other elements, so U_{k−1} is independent of all the previous random variables U_{j−1} with j < k.
      Therefore, Invπ‘Ž can be written as the following sum of independent variables:

                                             Invπ‘Ž = π‘ˆ0 + Β· Β· Β· + π‘ˆ π‘Žβˆ’1 .

We can use Fact 9.5 to calculate the characteristic function:

        φ_a(t) = ∏_{k=1}^{a} 𝔼[e^{itU_{k−1}}] = ∏_{k=1}^{a} (1/k) ∑_{m=0}^{k−1} e^{itm},

from which (9.4) follows as

        ∑_{m=0}^{k−1} e^{itm} = (e^{itk} − 1)/(e^{it} − 1)
                              = (e^{itk/2}(e^{itk/2} − e^{−itk/2}))/(e^{it/2}(e^{it/2} − e^{−it/2}))
                              = e^{it(k−1)/2} · (2i sin(kt/2))/(2i sin(t/2)).
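As a numeric sanity check (ours, not part of the paper), the closed form (9.4) can be compared against the characteristic function computed directly from the uniform distribution on permutations:

```python
import cmath
import math
from itertools import permutations

def inv(arr):
    return sum(1 for i in range(len(arr)) for j in range(i + 1, len(arr))
               if arr[i] > arr[j])

def phi_bruteforce(a, t):
    """E[e^{it Inv_a}] computed directly over all permutations of [a]."""
    perms = list(permutations(range(a)))
    return sum(cmath.exp(1j * t * inv(p)) for p in perms) / len(perms)

def phi_closed(a, t):
    """The product formula (9.4)."""
    out = complex(1)
    for k in range(1, a + 1):
        out *= cmath.exp(1j * t * (k - 1) / 2) * math.sin(k * t / 2) / (k * math.sin(t / 2))
    return out

for a in range(1, 6):
    assert abs(phi_bruteforce(a, 0.7) - phi_closed(a, 0.7)) < 1e-12
```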

    Consider a random permutation of [a + b], let A be the array consisting of the first a elements,
and B the array consisting of the remaining b. Then Inv(A), Inv(B), Inv(AB), and XInv(A, B)
have the same distributions as Inv_a, Inv_b, Inv_{a+b}, and XInv_{a,b}, respectively. Moreover, the values
of Inv(A), Inv(B), and XInv(A, B) are independent. Hence, by Fact 9.4 and Fact 9.5 we have

                                         πœ‘ π‘Ž+𝑏 (𝑑) = πœ‘ π‘Ž (𝑑) πœ‘ 𝑏 (𝑑) πœ‘ π‘Ž,𝑏 (𝑑),

or

        φ_{a,b}(t) = φ_{a+b}(t)/(φ_a(t) φ_b(t)).
By (9.4), the arithmetic sum formula applied to the exponents of e, and algebraic simplifications
we conclude:

Proposition 9.7. For integers n ≥ 0, let s_n(t) = ∏_{k=1}^{n} sin(kt)/k. Then

        φ_{a,b}(t) = e^{itab/2} · s_{a+b}(t/2)/(s_a(t/2) s_b(t/2))    and    |φ_{a,b}(t)| = |s_{a+b}(t/2)/(s_a(t/2) s_b(t/2))|.

9.3   Center bound
For the first piece of the integral on the right-hand side of (9.3), we integrate |πœ‘ π‘Ž,𝑏 (𝑑)| over the
interval [0, 2πœ‹/(π‘Ž + 𝑏)]. For the sake of convenience, we substitute 𝑑 with 2𝑑 in order to avoid the
denominator of 2 in the sine terms of the integrand.


Lemma 9.8 (center bound). For integers b ≥ a ≥ 2,

        ∫_0^{2π/(a+b)} |φ_{a,b}(t)| dt = 2 ∫_0^{π/(a+b)} |φ_{a,b}(2t)| dt = O(1/(b√a)).

   We can write

        |φ_{a,b}(2t)| = ∏_{k=1}^{a} (k/(b+k)) · sin((b+k)t)/sin(kt)

as every term in the product on the right-hand side is nonnegative on this interval. We start
with the following estimates.

Claim 9.9. For positive integers k ≤ b and x ∈ [0, π/(b + k)],

        (k/(b+k)) · sin((b+k)x)/sin(kx) ≤ 1 − (b²/(2π²)) x².

    To prove this claim, we first prove two trigonometric bounds. Refer to Figure 13 for a plot of
the functions and bounds.

Claim 9.10. For positive integers k, and x ∈ [0, π/k],

        sin(kx) ≤ kx − (kx)³/π².
Proof. Let y = kx. It is enough to argue that sin(y) ≤ y − y³/π² in the range y ∈ [0, π].
    Let f(y) = sin(y) − y + y³/π². Notice that f(0) = f(π) = 0 and f′(π) > 0. We will argue that
there is a unique point y* ∈ (0, π) such that f′(y*) = 0, which will ensure that f(y) ≤ 0 for all
y ∈ [0, π].
    We can calculate that

        f′(y) = cos(y) − 1 + 3y²/π²
              = 3y²/π² − 2 sin²(y/2).

So f′(y) = 0 if and only if sin(y/2) = ±(√6/π) · (y/2), which is satisfied by one unique point
y* ∈ (0, π).

Claim 9.11. For positive integers k, and x ∈ (0, π/(2k)],

        cot(kx) ≤ 1/(kx).


Proof. Let y = kx. We will prove that cot(y) ≤ 1/y for all y ∈ (0, π/2]. It is enough to prove that
y cos(y) ≤ sin(y) for all y ∈ [0, π/2].
    The latter inequality follows because both sides are zero when 𝑦 = 0, and the derivative of
the left hand side is bounded above by the derivative of the right hand side when 𝑦 ∈ [0, πœ‹/2]:

                                   cos(𝑦) βˆ’ 𝑦 sin(𝑦) ≀ cos(𝑦).

                                                                                                

[Figure 13: Plots of the trigonometric bounds in (a) Claim 9.10 and (b) Claim 9.11.
Trigonometric functions are dotted, upper bounds are dashed.]
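The two bounds can also be confirmed numerically on a grid (a check of ours, not in the paper):

```python
import math

# Claim 9.10 with y = kx: sin(y) <= y - y^3 / pi^2 on [0, pi].
# Claim 9.11 with y = kx: cot(y) <= 1/y on (0, pi/2].
N = 2000
for i in range(1, N + 1):
    y = math.pi * i / N
    assert math.sin(y) <= y - y**3 / math.pi**2 + 1e-12
    z = (math.pi / 2) * i / N
    assert math.cos(z) / math.sin(z) <= 1 / z + 1e-12
```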

   Now we finish the proof of Claim 9.9.

Proof. Notice that

        sin((b + k)x)/sin(kx) = (sin(kx) cos(bx) + sin(bx) cos(kx))/sin(kx)
                              = cos(bx) + cot(kx) sin(bx).

    Of course, cos(bx) ≤ 1. Furthermore, from Claim 9.10 and Claim 9.11, we can see that in this
domain of x, sin(bx) ≤ bx − b³x³/π² and cot(kx) ≤ 1/(kx). Additionally, sin(bx) ≥ 0. Therefore, using
the fact that k ≤ b,

        (k/(b+k)) · sin((b+k)x)/sin(kx) ≤ (k/(b+k)) · (1 + (1/(kx)) (bx − b³x³/π²))
                                        ≤ 1 − b³x²/((b+k)π²) ≤ 1 − b²x²/(2π²).
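A grid check of Claim 9.9 over its stated domain (ours, not part of the paper):

```python
import math

def lhs(k, b, x):
    return (k / (b + k)) * math.sin((b + k) * x) / math.sin(k * x)

def rhs(b, x):
    return 1 - b**2 * x**2 / (2 * math.pi**2)

# Check the inequality of Claim 9.9 on a grid of (0, pi/(b+k)].
for b in range(2, 8):
    for k in range(1, b + 1):
        for i in range(1, 401):
            x = (math.pi / (b + k)) * i / 400
            assert lhs(k, b, x) <= rhs(b, x) + 1e-12
```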
   From this, we can now prove Lemma 9.8.


Proof. Recall that on this interval

        |φ_{a,b}(2t)| = ∏_{k=1}^{a} (k/(b+k)) · sin((b+k)t)/sin(kt).

    Claim 9.9 applies to all t in the domain because π/(b + a) ≤ π/(b + k) for all k.
    Therefore,

        ∫_0^{π/(a+b)} |φ_{a,b}(2t)| dt ≤ ∫_0^{π/(a+b)} (1 − b²t²/(2π²))^a dt
                                       ≤ ∫_0^{π/(a+b)} exp(−ab²t²/(2π²)) dt = O(1/(b√a)).
Here, we use the fact that 1 − x ≤ exp(−x) for all x, and the Gaussian integral: the integral of
exp(−t²) over ℝ is a constant, and by rescaling the argument, the integral of exp(−ct²) over ℝ is a
constant times c^{−1/2} for any parameter c > 0.

9.4     Peripheral bound
We now bound |πœ‘ π‘Ž,𝑏 (𝑑)| in the region away from 0, namely, the interval [2πœ‹/(π‘Ž + 𝑏), πœ‹]. As for
the other part of the integral on the right-hand side of (9.3), we substitute 𝑑 with 2𝑑 in order to
avoid the denominator of 2 in the sine terms of the integrand.
Lemma 9.12 (peripheral bound). For integers b ≥ a ≥ 2,

        ∫_{2π/(a+b)}^{π} |φ_{a,b}(t)| dt = 2 ∫_{π/(a+b)}^{π/2} |φ_{a,b}(2t)| dt = O(1/(b√a)).

    In this region, the main problem is that the denominator of |πœ‘ π‘Ž,𝑏 (2𝑑)| often goes to zero,
which could potentially blow up the integrand. However, the terms in the numerator always
cancel out these blowups. The following lemma will be our main tool for bounding |πœ‘ π‘Ž,𝑏 (2𝑑)| in
this region; it will allow terms in the numerator to cancel out bad terms in the denominator.
Lemma 9.13 (pole reduction). For every 𝑑 ∈ ℝ, there exists a bΔ³ection 𝛽 𝑑 : {1, . . . , π‘Ž} β†’ {𝑏 +
1, . . . , 𝑏 + π‘Ž} (depending on 𝑑) such that for every π‘˜ = 1, . . . , π‘Ž,

                                                  1              1
                                                    sin(π‘˜π‘‘) β‰₯         sin(𝛽 𝑑 (π‘˜)𝑑) .
                                                  π‘˜           𝛽 𝑑 (π‘˜)

    A basic bound can be found by applying Lemma 9.13 on |φ_{a,b}(2t)| for k = 2, . . . , a, resulting
in an upper bound of 1/(b |sin(t)|). This bound is usable, but is weak at points closer to 0. To
remedy this, we can divide the domain of integration into multiple parts, where the points
closer to 0 can safely include more terms in the denominator.
                                                                                                  πœ‹    πœ‹
      Let 2 ≀ 𝑛 ≀ π‘Ž be an integer. We split the domain of integration into three intervals: [ π‘Ž+𝑏    , 2𝑛 ],
  πœ‹ πœ‹      πœ‹           πœ‹   πœ‹ πœ‹
[ 2𝑛 , 2 βˆ’ 2𝑛 ], and [ 2 βˆ’ 2𝑛 , 2 ]. By selecting a good value of 𝑛, we can get a reasonable upper bound
on this region. Here, we will make use of Lemma 9.13 and the following linear approximation
to sine:


Fact 9.14. For a positive integer k and t ∈ [0, π/(2k)],

        sin(kt) ≥ 2kt/π.

Proof. Notice that sin(kt) = 2kt/π when t = 0 and t = π/(2k). This fact then follows since sine is
concave on this interval.
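This linear lower bound is easy to confirm on a grid (our check, not part of the paper):

```python
import math

# Grid check of Fact 9.14: sin(kt) >= 2kt/pi for t in [0, pi/(2k)].
for k in (1, 2, 3, 7):
    for i in range(401):
        t = (math.pi / (2 * k)) * i / 400
        assert math.sin(k * t) >= 2 * k * t / math.pi - 1e-12
```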

Region I. The first region of integration is [πœ‹/(π‘Ž + 𝑏), πœ‹/(2𝑛)].

                    πœ‹                                  πœ‹
                                                                     𝑛!           sin(𝛽 𝑑 (1)𝑑) Β· Β· Β· sin(𝛽 𝑑 (𝑛)𝑑)
               ∫                                  ∫
                    2𝑛                                 2𝑛
                             |πœ‘ π‘Ž,𝑏 (2𝑑)| 𝑑𝑑 ≀                                                                      𝑑𝑑
                    πœ‹                                  πœ‹    𝛽 𝑑 (1) Β· Β· Β· 𝛽 𝑑 (𝑛)     sin(𝑑) Β· Β· Β· sin(𝑛𝑑)
                   π‘Ž+𝑏                                π‘Ž+𝑏
                                                             πœ‹
                                                𝑛!
                                                        ∫
                                                             2𝑛                 1
                                              ≀ 𝑛                                          𝑑𝑑
                                               𝑏             πœ‹        sin(𝑑) Β· Β· Β· sin(𝑛𝑑)
                                                            π‘Ž+𝑏
                                                        ∫      πœ‹       πœ‹ 𝑛 1
                                                  1           2𝑛
                                              ≀                                         𝑑𝑑
                                                  𝑏𝑛         πœ‹            2        𝑑𝑛
                                                            π‘Ž+𝑏
                                                                                                      π‘›βˆ’1
                                                1  πœ‹ 𝑛         π‘Ž+𝑏
                                                                                          
                                                            1
                                              ≀ 𝑛 Β·      Β·     Β·
                                               𝑏    2      π‘›βˆ’1    πœ‹
                                                                                                 π‘›βˆ’1
                                                 1  πœ‹ 𝑛
                                                                                          
                                                               1    2𝑏
                                              ≀ 𝑛 Β·         Β·     Β·
                                                𝑏     2       π‘›βˆ’1   πœ‹
                                                    πœ‹
                                              ≀           .
                                                2𝑏(𝑛 βˆ’ 1)
    The first two steps involve applying Lemma 9.13 on all π‘˜ ∈ {𝑛 + 1, . . . , π‘Ž}, and then using
the fact that |sin(π‘₯)| ≀ 1 and 𝛽 𝑑 (π‘˜) β‰₯ 𝑏. The third step uses Fact 9.14 for π‘˜ ∈ {1, . . . , 𝑛}, which
applies on the interval [πœ‹/(π‘Ž + 𝑏), πœ‹/(2𝑛)] for these values of π‘˜. From there, we bound with the
left limit of integration.

                                                       πœ‹ πœ‹      πœ‹
Region II. We now bound |πœ‘ π‘Ž,𝑏 (2𝑑)| on the interval [ 2𝑛 , 2 βˆ’ 2𝑛 ]. We can use Lemma 9.13 to
eliminate all terms except π‘˜ ∈ {1, 2} this time.
                ∫ πœ‹βˆ’ πœ‹                                  ∫ πœ‹βˆ’ πœ‹
                         2    2𝑛                                 2    2𝑛             2        sin(𝛽 𝑑 (1)𝑑) sin(𝛽 𝑑 (2)𝑑)
                                   |πœ‘ π‘Ž,𝑏 (2𝑑)| 𝑑𝑑 ≀                                                                      𝑑𝑑
                    πœ‹
                    2𝑛
                                                            πœ‹
                                                            2𝑛
                                                                              𝛽 𝑑 (1) 𝛽 𝑑 (2)     sin(𝑑) sin(2𝑑)
                                                                 ∫ πœ‹βˆ’ πœ‹
                                                        2                 2   2𝑛          1
                                                  ≀                                               𝑑𝑑.
                                                        𝑏2           πœ‹
                                                                     2𝑛
                                                                                   sin(𝑑) sin(2𝑑)

    Because |sin(t)| is increasing here and |sin(2t)| is symmetric about π/4, the value of the integral
on the interval [π/(2n), π/4] exceeds the value on the interval [π/4, π/2 − π/(2n)]. Using Fact 9.14,

        (2/b²) ∫_{π/(2n)}^{π/4} dt/(sin(t) sin(2t)) ≤ (2/b²) ∫_{π/(2n)}^{π/4} (π/2)² · (1/(2t²)) dt ≤ (π²/(4b²)) · (2n/π) = πn/(2b²).




We have now established that

        ∫_{π/(2n)}^{π/2−π/(2n)} |φ_{a,b}(2t)| dt ≤ πn/b².

                                                                      πœ‹ πœ‹
Region III. Notice that |sin(𝑑)| is increasing on the interval [ πœ‹2 βˆ’ 2𝑛 , 2 ], so we can bound |sin(𝑑)|
         πœ‹       πœ‹
by |sin( 2 βˆ’ 2𝑛 )|. Similar to before, the first step follows from Lemma 9.13, this time applied to
π‘˜ ∈ {2, . . . , π‘Ž}.
                ∫ πœ‹                             ∫ πœ‹                                       ∫ πœ‹
                    2                              2     1 sin(𝛽 1 (𝑑))                        2      1
                           |πœ‘ π‘Ž,𝑏 (2𝑑)| 𝑑𝑑 ≀                            𝑑𝑑 ≀                                𝑑𝑑
                  πœ‹   πœ‹                          πœ‹   πœ‹ 𝛽 (1) sin(𝑑)                        πœ‹   πœ‹ |𝑏 sin(𝑑)|
                  2 βˆ’ 2𝑛                         2 βˆ’ 2𝑛 𝑑                                  2 βˆ’ 2𝑛
                                                πœ‹         1        πœ‹      1      πœ‹
                                          ≀       Β·       πœ‹   πœ‹ ≀             =         .
                                                2𝑛 𝑏 sin( 2 βˆ’ 2𝑛 ) 2𝑛 𝑏(1 βˆ’ ) 2𝑏(𝑛 βˆ’ 1)
                                                                            1
                                                                                           𝑛


Overall bound. Summing the above bounds, we can deduce that

        ∫_{π/(a+b)}^{π/2} |φ_{a,b}(2t)| dt ≤ π/(b(n−1)) + πn/b².

By choosing n = ⌈√a⌉ (keeping in mind that b ≥ a ≥ 2), we can deduce Lemma 9.12.
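The overall bound can be spot-checked numerically for one pair (a, b); this is our sketch, not part of the paper, using a simple midpoint rule (the integrand has only removable singularities and is bounded by 1):

```python
import math

def phi_abs(a, b, t):
    """|phi_{a,b}(2t)| via the product form from Proposition 9.7."""
    out = 1.0
    for k in range(1, a + 1):
        out *= (k / (b + k)) * abs(math.sin((b + k) * t)) / abs(math.sin(k * t))
    return out

a, b = 6, 8
n = math.ceil(math.sqrt(a))          # the choice n = ceil(sqrt(a))
lo, hi = math.pi / (a + b), math.pi / 2
N = 20000                            # midpoint rule avoids the removable singularities
h = (hi - lo) / N
integral = sum(phi_abs(a, b, lo + (j + 0.5) * h) for j in range(N)) * h
assert integral <= math.pi / (b * (n - 1)) + math.pi * n / b**2
```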

9.5   Pole reduction
Finally, we prove the pole reduction lemma (Lemma 9.13). The essence is an interval matching
strategy captured by Lemma 9.15. Here is the intuition.
    Recall that we want to upper bound factors of the form |sin(ℓt)/ℓ| by rescaled versions |sin(kt)/k|
of the same pattern, where ℓ ∈ {b + 1, . . . , b + a} and k ∈ {1, . . . , a} are matched. The matching
definitely needs to avoid situations like in Figure 14a, where |sin(kt)/k| vanishes at the point t while
|sin(ℓt)/ℓ| does not. Ideally, the period of the k-scaled version that contains the point t encloses
the period of the ℓ-scaled version that contains t, like in Figure 14b. As long as the pattern is
concave, this ensures that the k-scaled version is at least the ℓ-scaled version everywhere on
the encompassed period. (We will formally prove this in Claim 9.22.) Thus, if at every point t,
we can set up a matching such that the enclosing relationship holds for all matched pairs, we
are home free. Lemma 9.15 below does exactly this.
    Let us first introduce some notation. For every real number t and positive integer k, there is
a unique integer n such that t is contained in the half-open interval [n/k, (n + 1)/k). We call this
interval the k-interval of t. For positive integers k, ℓ, we say that the k-interval of t encloses
the ℓ-interval of t if the ℓ-interval of t is a subset of the k-interval of t; as shorthand, we say
that k encloses ℓ at t.

Lemma 9.15 (interval matching). Let a, b be positive integers. For any real t, there exists a bijection β_t
between {1, . . . , a} and {b + 1, . . . , b + a} such that for all k = 1, . . . , a, k encloses ℓ = β_t(k) at t.





[Figure 14: Enclosing intervals are needed for Lemma 9.13.
(a) Bad case: no interval enclosure. (b) Good case: interval enclosure.]


    Note that the bijection can be different depending on t. In fact, this is necessary, as otherwise
ℓ would need to be a multiple of k, which is not possible with a bijection between the sets
{1, . . . , a} and {b + 1, . . . , b + a}.
    We can interpret Lemma 9.15 as a matching on a bipartite graph by using Hall’s marriage
lemma.
Lemma 9.16 (Hall's marriage lemma). Let G be a bipartite graph with parts L, R of equal size. For any A ⊆ L,
let N(A) be the set of all vertices in R with at least one neighbor in A. The graph G admits a perfect
matching if and only if for all such A, |N(A)| ≥ |A|.
    For A ⊆ {1, . . . , a}, let N_t(A) be the set of all ℓ ∈ {b + 1, . . . , b + a} such that there exists
k ∈ A where k encloses ℓ at t. To produce the desired bijection β_t in Lemma 9.15, it is sufficient
to prove that |N_t(A)| ≥ |A| for all t and A.
    There are some values of ℓ that are always contained in N_t(A) regardless of the value of t.
Let N(A) be the set of all ℓ ∈ ℕ such that for all t, there exists k ∈ A where k encloses ℓ at t (this k
can vary depending on t). As N(A) ∩ {b + 1, . . . , b + a} ⊆ N_t(A), it is sufficient to prove that
|N(A) ∩ {b + 1, . . . , b + a}| ≥ |A| for all A and apply Lemma 9.16 to prove Lemma 9.15.
Example 9.17. 5 ∈ 𝑁({2, 3}).
Proof. We only consider 𝑑 ∈ [0, 1) for clarity. When 𝑑 ∈ [0, 0.4), the 2-interval for 𝑑 is [0, 0.5), while
the 5-interval for 𝑑 is either [0, 0.2) or [0.2, 0.4), which means 2 encloses 5. When 𝑑 ∈ [0.4, 0.6),
the 3-interval for 𝑑 is [1/3, 2/3), which encloses the 5-interval [0.4, 0.6). When 𝑑 ∈ [0.6, 1), the
2-interval is [0.5, 1) while the 5-interval is either [0.6, 0.8) or [0.8, 1), so 2 encloses 5. These
enclosures are shown in Figure 15, where each marked 5-interval is contained within the
correspondingly marked 2- or 3-interval.
    For all 𝑑, either 2 encloses 5, or 3 encloses 5. In other words, 5 ∈ 𝑁({2, 3}).                     
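The example can be verified exactly by enumeration (our check, not part of the paper); since all interval endpoints involved are multiples of 1/30 = 1/lcm(2, 3, 5), checking the 30 grid points of [0, 1) covers every t:

```python
import math
from fractions import Fraction

def k_interval(k, t):
    """The k-interval [n/k, (n+1)/k) containing t."""
    n = math.floor(k * t)
    return Fraction(n, k), Fraction(n + 1, k)

def encloses(k, l, t):
    """Does the k-interval of t contain the l-interval of t?"""
    lo_k, hi_k = k_interval(k, t)
    lo_l, hi_l = k_interval(l, t)
    return lo_k <= lo_l and hi_l <= hi_k

# Enclosure patterns are constant on each cell [j/30, (j+1)/30),
# so the 30 grid points cover all t in [0, 1).
assert all(encloses(2, 5, Fraction(j, 30)) or encloses(3, 5, Fraction(j, 30))
           for j in range(30))
```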
   We first consider a different characterization of 𝑁(𝐴).
Claim 9.18. Let β„“ be a positive integer. Then β„“ ∈ 𝑁(𝐴) if and only if every open interval 𝐼 βŠ‚ ℝ that
contains a fraction of denominator π‘˜ for every π‘˜ ∈ 𝐴 must also contain a fraction of denominator β„“ .
(These fractions do not have to be distinct or reduced.)


                                                                                   π‘˜=2

                                                                                   π‘˜=3

                                                                                   β„“ =5

                                         Figure 15: Example 9.17


    The idea is that if no k ∈ A encloses ℓ at some t, then for each k ∈ A an endpoint of the
k-interval of t is a fraction of denominator k that lies strictly within the ℓ-interval of t. As a result,
we have a contiguous open interval I that contains a fraction of denominator k for every k ∈ A and
is contained strictly within the ℓ-interval of t. As such, I cannot contain a fraction of denominator ℓ.
This is illustrated in Figure 16. The formal proof also considers the edge cases involving the
endpoints of the intervals.

[Figure 16: The k-intervals for k ∈ A, the ℓ-interval of t, and an interval I that contains a fraction
of denominator k for all k ∈ A, but no fraction of denominator ℓ.]

    For Claim 9.18, we prove the following proposition.

Proposition 9.19. The following statements are equivalent for any half-open interval [𝑐, 𝑑) and positive
integer π‘˜:

(1) [𝑐, 𝑑) is contained within a π‘˜-interval.

(2) (𝑐, 𝑑) is contained within a π‘˜-interval.

(3) (𝑐, 𝑑) contains no fraction of denominator π‘˜.

Proof. (1) =β‡’ (2). This follows as (𝑐, 𝑑) is a subset of [𝑐, 𝑑).
    (2) =⇒ (3). Suppose (c, d) is contained in the k-interval [n/k, (n + 1)/k). The interval (c, d)
cannot contain a fraction of denominator k, as otherwise said fraction would lie strictly between
the consecutive fractions n/k and (n + 1)/k, which is impossible.


   (3) =β‡’ (1). Let 𝑛/π‘˜ be the largest fraction of denominator π‘˜ less than or equal to 𝑐. It
must be true that (𝑛 + 1)/π‘˜ β‰₯ 𝑑, since (𝑛 + 1)/π‘˜ cannot be contained in (𝑐, 𝑑). Therefore, [𝑐, 𝑑) is
contained in the π‘˜-interval [𝑛/π‘˜, (𝑛 + 1)/π‘˜).                                                     
    From this, we can prove Claim 9.18.
Proof. By definition, the condition that β„“ ∈ 𝑁(𝐴) is that any β„“ -interval 𝐽 is contained in some
π‘˜-interval for some π‘˜ ∈ 𝐴. By condition (2) in Proposition 9.19, this is equivalent to saying Int(𝐽)
is contained in some π‘˜-interval, where Int(𝐽) is the interior of 𝐽. This is true if and only if any
open subinterval 𝐼 of Int(𝐽) is contained in some π‘˜-interval. Equivalently, if 𝐼 is an open interval
that is not contained in any π‘˜-interval, then 𝐼 is not contained in any β„“ -interval. Using condition
(3) in Proposition 9.19, this is finally equivalent to the condition that if 𝐼 contains a fraction of
denominator π‘˜ for every π‘˜ ∈ 𝐴, then 𝐼 contains a fraction of denominator β„“ .                       
    The characterization in Claim 9.18 allows us to prove the following key claim about 𝑁(𝐴).
Claim 9.20. If ℓ1, ℓ2 ∈ N(A), then ℓ1 + ℓ2 ∈ N(A).
Proof. By Claim 9.18, any interval I that contains a fraction of denominator k for every k ∈ A
must also contain two fractions x/ℓ1 and y/ℓ2. For positive integers a, b, c, d, define the
mediant of a/b and c/d to be (a + c)/(b + d). The mediant always lies between the two fractions:
if a/b ≤ c/d, then

                              a/b ≤ (a + c)/(b + d) ≤ c/d.

This fact can be proven with elementary algebra, as each of the two inequalities is equivalent
to a/b ≤ c/d.
    From this, we see that the mediant (x + y)/(ℓ1 + ℓ2) is contained in I, as it lies between
two elements of I. As this applies to every such I, we can use Claim 9.18 to conclude that
ℓ1 + ℓ2 ∈ N(A). Note that x/ℓ1 and y/ℓ2 do not have to be distinct: ℓ1 and ℓ2 might be equal,
or x/ℓ1 = y/ℓ2; the argument goes through regardless. □
    As π‘˜ encloses π‘˜ for every 𝑑, we have that 𝐴 βŠ† 𝑁(𝐴). Let Sums(𝐴) be the set of positive
integers that can be written as the sum of not necessarily distinct elements of 𝐴. Claim 9.20
implies that Sums(𝐴) βŠ† 𝑁(𝐴).
Claim 9.21. For nonnegative integers 𝑛,
                                |Sums(𝐴) ∩ {𝑛 + 1, . . . , 𝑛 + π‘Ž}| β‰₯ |𝐴|.
Proof. Let π‘š = max(𝐴). Because 𝐴 βŠ† Sums(𝐴), Sums(𝐴) contains every positive integer
congruent to an element of 𝐴 modulo π‘š, since we can repeatedly add π‘š to any element of
𝐴. As π‘Ž β‰₯ π‘š, the set {𝑛 + 1, . . . , 𝑛 + π‘Ž} contains at least one element for every residue mod π‘š.
Therefore, Sums(𝐴) contains at least |𝐴| elements of {𝑛 + 1, . . . , 𝑛 + π‘Ž}.                    
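Claim 9.21 is easy to sanity-check computationally. In the sketch below (our own code, with the set A and the bound a ≥ max(A) chosen arbitrarily for illustration), Sums(A) is computed by dynamic programming:

```python
def sums_upto(A, limit):
    """All positive integers <= limit expressible as sums of elements of A."""
    reachable = {0}  # the empty sum; removed before returning
    for t in range(1, limit + 1):
        if any(t - a in reachable for a in A):
            reachable.add(t)
    reachable.discard(0)
    return reachable

A = {2, 5}
a = 5  # any a >= max(A) works
for n in range(50):
    window = set(range(n + 1, n + a + 1))
    # every length-a window of positive integers meets Sums(A) in >= |A| points
    assert len(sums_upto(A, n + a) & window) >= len(A)
```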
    Putting it all together, we have that
          |𝐴| ≀ |Sums(𝐴) ∩ {𝑏 + 1, . . . , 𝑏 + π‘Ž}| ≀ |𝑁(𝐴) ∩ {𝑏 + 1, . . . , 𝑏 + π‘Ž}| ≀ |𝑁𝑑 (𝐴)|,
which proves Lemma 9.15 by our previous reasoning. To finish, we need to show that this
implies Lemma 9.13.



                                                                   𝑓 (π‘˜π‘₯)
                                                                      π‘˜

                                              𝑓 (β„“ π‘₯)
                                                 β„“


                                         𝑑
                                                                                       π‘₯
                                πœ†/β„“
                              πœ…/π‘˜

                              Figure 17: Proof of Claim 9.22: πœ†/β„“ ≀ πœ…/π‘˜


Claim 9.22. Lemma 9.15 implies Lemma 9.13.
Proof. We prove the following more general claim. Let f : ℝ → ℝ be a function that has period
1, is concave on [0, 1], and satisfies f(0) = f(1) = 0. For t ∈ [0, 1], let β_t be a bijection
that satisfies the conditions of Lemma 9.15, and for any k ∈ {1, . . . , a}, let ℓ = β_t(k). We
seek to prove that

                                  f(kt)/k ≥ f(ℓt)/ℓ

for all k = 1, . . . , a. Notice that f(t) = |sin(t)| satisfies all the conditions of the claim,
albeit after scaling the period from π to 1.
    To prove the general claim, let κ = kt − ⌊kt⌋ and λ = ℓt − ⌊ℓt⌋, noting that κ, λ ∈ [0, 1].
By the enclosing property, κ/k ≥ λ/ℓ: the two quantities represent the distances from t to the
left endpoints of the k- and ℓ-intervals, respectively, and k encloses ℓ at t. This can be seen
in Figure 17.
    Suppose κ ≤ λ. We have

                 f(kt) = f(κ) ≥ (κ/λ) · f(λ) ≥ (k/ℓ) · f(λ) = (k/ℓ) · f(ℓt).

The first step follows from the periodicity of f, the second from the concavity of f on [0, 1]
and the fact that f(0) = 0 (the point (κ, f(κ)) lies above the line segment connecting (0, 0)
and (λ, f(λ))), the third from the aforementioned inequality κ/k ≥ λ/ℓ, and the last from the
periodicity of f again.
    If λ < κ, we can instead consider the function f̃(t) = f(1 − t), for which κ̃ = 1 − κ and
λ̃ = 1 − λ satisfy κ̃ ≤ λ̃, and apply the previous reasoning. This proves the general claim. □
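The middle step of the chain, f(κ) ≥ (κ/λ) · f(λ) for κ ≤ λ, can be spot-checked numerically with the |sin| instance rescaled to period 1 (a small sketch of ours):

```python
import math

def f(x: float) -> float:
    # |sin| rescaled so the period is 1; concave on [0,1] with f(0) = f(1) = 0.
    return math.sin(math.pi * (x % 1.0))

# For concave f on [0,1] with f(0) = 0, the chord from (0,0) to (lam, f(lam))
# lies below the graph, so f(kap) >= (kap/lam) * f(lam) whenever kap <= lam.
for i in range(1, 100):
    for j in range(i, 100):
        kap, lam = i / 100, j / 100
        assert f(kap) >= (kap / lam) * f(lam) - 1e-12
```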


10     Turing complexity
This section serves as a supplementary discussion of the time (and space) complexity of
algorithms for computing the minimum number of inversions in trees. From the analysis in
Section 4.2, computing MinInv(T, ρ) amounts to computing MinRInv(T_v, ρ) independently for
each v ∈ T and summing the results:

                         MinInv(T, ρ) = Σ_v MinRInv(T_v, ρ).

In order to compute MinRInv(T_v, ρ) for each v, we need to check the orderings σ of the
children u_1, . . . , u_k of v and determine the minimum number of cross inversions between the
corresponding leaf sets L_1, . . . , L_k in the order given by σ:

                RInv(T_v, ρ, σ) = Σ_{1 ≤ i < j ≤ k} XInv_ρ(L_{σ(i)}, L_{σ(j)}).


    A natural approach for computing MinInv(𝑇, 𝜌) consists of two phases:

   1. In a first phase, we compute XInv_ρ between the leaf sets of every pair of siblings in T.
      This can be done in a bottom-up fashion similar to mergesort. More precisely, for a node
      v with sorted leaf sets L_1, . . . , L_k, the cross inversions between every pair of leaf
      sets can be calculated using a merge operation in O(k · (|L_1| + · · · + |L_k|)) time,
      which also sorts the concatenation of the leaf sets for use in later steps. Each leaf of
      depth d takes part in d such operations, which gives a total runtime of
      O(deg(T) · d_avg(T) · n).

   2. In a second phase, we check, for each node 𝑣 independently, which ordering 𝜎 of the children
      of 𝑣 minimizes RInv(𝑇𝑣 , 𝜌, 𝜎). We then output the sum of the values MinRInv(𝑇𝑣 , 𝜌).
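The merge operation in the first phase can be sketched as follows (a minimal illustration of ours, not the authors' code): while merging two sorted rank lists, each time an element of the second list is consumed, every element still pending in the first list is larger and contributes one cross inversion.

```python
def cross_inversions(A, B):
    """Count pairs (x, y) with x in A, y in B, x > y, for sorted rank lists
    A and B; also return the merged (sorted) concatenation for reuse."""
    merged, inv = [], 0
    i = j = 0
    while i < len(A) and j < len(B):
        if A[i] <= B[j]:
            merged.append(A[i]); i += 1
        else:
            inv += len(A) - i  # every remaining element of A exceeds B[j]
            merged.append(B[j]); j += 1
    merged.extend(A[i:]); merged.extend(B[j:])
    return inv, merged

# Example: with A = [1, 4, 6] before B = [2, 3, 5], the inverted pairs are
# (4,2), (4,3), (6,2), (6,3), (6,5).
inv, merged = cross_inversions([1, 4, 6], [2, 3, 5])
assert inv == 5 and merged == [1, 2, 3, 4, 5, 6]
```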

Exhaustively testing all orderings σ of the k children of v to compute MinRInv(T_v, ρ) takes
O(k · k!) time, as O(k) time is required to calculate RInv(T_v, ρ, σ), for each σ, from the
precomputed values of XInv. This results in a total of O(n · deg(T)!) time for the second
phase, and an overall running time of O((deg(T)! + deg(T) · d_avg(T)) · n).
    For any constant bound on the degree of the tree 𝑇, the basic algorithm runs in time
polynomial in 𝑛. The dependency of the running time on the degree can be improved. One way
to do so is by reducing the problem of the second phase to the closely-related and well-studied
problem of computing a minimum arc feedback set of a weighted directed graph.

Definition 10.1 (Minimum Feedback Arc Set). Given a directed graph G = (V, E) with an
ordering σ on the vertices v_1, . . . , v_n, a feedback arc is an edge from v_i to v_j such that
σ(v_i) > σ(v_j). The minimum feedback arc set problem asks for the minimum number of
feedback arcs induced by any ordering σ. In weighted minimum feedback arc set, each edge
from v_i to v_j has a weight w_ij, and the objective is to minimize
Σ_{(v_i, v_j)∈E} w_ij · 𝕀[σ(v_i) > σ(v_j)].

    We can encode the problem of computing MinRInv(T_v, ρ) as an instance of weighted minimum
feedback arc set in which each edge of the graph G has a positive weight. If the leaf sets of
the children of v are L_1, . . . , L_k, we construct a graph G with k vertices v_1, . . . , v_k.
For each pair of vertices v_i and v_j, if XInv_ρ(L_i, L_j) < XInv_ρ(L_j, L_i), we add an edge
from v_i to v_j of weight XInv_ρ(L_j, L_i) − XInv_ρ(L_i, L_j). The value of MinRInv(T_v, ρ)
then equals the weight of a minimum feedback arc set of G plus the constant
Σ_{i<j} min(XInv_ρ(L_i, L_j), XInv_ρ(L_j, L_i)).
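The extraction can be made explicit. The following toy sketch (our own code, brute-forcing both sides for a small k) checks that the minimum of RInv over orderings equals the pairwise minima plus the minimum total weight of "wrong-direction" pairs:

```python
from itertools import permutations

def min_rinv(xinv):
    """Brute-force min over orderings sigma of sum_{i before j} xinv[i][j]."""
    k = len(xinv)
    return min(
        sum(xinv[sig[p]][sig[q]] for p in range(k) for q in range(p + 1, k))
        for sig in permutations(range(k))
    )

def via_feedback_arcs(xinv):
    """Same value via the feedback-arc view: pairwise minima plus the cheapest
    total excess weight of wrong-direction pairs over all orderings."""
    k = len(xinv)
    base = sum(min(xinv[i][j], xinv[j][i])
               for i in range(k) for j in range(i + 1, k))
    excess = min(
        sum(max(0, xinv[sig[p]][sig[q]] - xinv[sig[q]][sig[p]])
            for p in range(k) for q in range(p + 1, k))
        for sig in permutations(range(k))
    )
    return base + excess

xinv = [[0, 3, 1], [5, 0, 2], [4, 2, 0]]  # arbitrary example values
assert min_rinv(xinv) == via_feedback_arcs(xinv)
```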


    As a consequence, we can use existing efficient algorithms for weighted minimum feedback
arc set to construct algorithms for inversion minimization on trees that are more efficient
than the basic algorithm we described. [3] gives two exact algorithms for weighted minimum
feedback arc set. One algorithm [3, Algorithm 1] is based on the Held–Karp algorithm for the
traveling salesman problem [13]; it uses dynamic programming to achieve a time complexity
of Θ(n^2 · 2^n) and a space complexity of Θ(2^n) for a graph on n vertices. Another algorithm
[3, Algorithm 2] uses a divide-and-conquer approach that achieves a time complexity of
O(poly(n) · 4^n), but has the advantage of needing only polynomial space.

Algorithm 2 MinRInv(T_v, ρ), Dynamic Programming
Input: Tree T_v with child leaf sets L_1, . . . , L_k, ranking ρ
  Initialize Cost[S] for all subsets S of {1, . . . , k}.
  Cost[∅] ← 0
  for i from 1 to k do
      for all sets S of size i do
          Cost[S] ← min_{s∈S} ( Cost[S \ {s}] + Σ_{j∈S, j≠s} XInv_ρ(L_j, L_s) )
  return Cost[{1, . . . , k}]
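Algorithm 2 can be transcribed directly (a hedged sketch of ours in Python; the table xinv[i][j] = XInv_ρ(L_{i+1}, L_{j+1}) is assumed precomputed), with Cost[S] interpreted as the cheapest way to order the children indexed by S, the minimum ranging over which s ∈ S is placed last:

```python
from itertools import combinations, permutations

def min_rinv_dp(xinv):
    """Held-Karp-style dynamic program of Algorithm 2 over a k x k XInv table."""
    k = len(xinv)
    cost = {frozenset(): 0}
    for size in range(1, k + 1):
        for subset in combinations(range(k), size):
            S = frozenset(subset)
            # Placing s last adds XInv(L_j, L_s) for every other j in S.
            cost[S] = min(
                cost[S - {s}] + sum(xinv[j][s] for j in S if j != s)
                for s in S
            )
    return cost[frozenset(range(k))]

# Cross-check against brute force over all k! orderings.
xinv = [[0, 3, 1, 7], [5, 0, 2, 1], [4, 2, 0, 6], [2, 9, 3, 0]]
brute = min(
    sum(xinv[sig[p]][sig[q]] for p in range(4) for q in range(p + 1, 4))
    for sig in permutations(range(4))
)
assert min_rinv_dp(xinv) == brute
```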

    An adaptation of the dynamic programming algorithm for calculating MinRInv is given in
Algorithm 2. Using this subroutine for computing MinRInv, we can improve the time complexity
of our basic algorithm to O((deg(T)^2 · 2^deg(T) + deg(T) · d_avg(T)) · n).
    The improved running time is still not efficient for trees with unrestricted degree. In
fact, the problem is NP-hard.

[Figure omitted: an edge e_k from v_i to v_j in the graph G, and its encoding in the depth-2
tree T, where v_i and v_j carry 2 deg_G(v_i) and 2 deg_G(v_j) leaves, respectively, and ρ
assigns the ranks −2k, 2k − 1, −2k + 1, 2k.]
                            Figure 18: Encoding of an edge e_k

Proposition 10.2. Computing MinInv(T, ρ) is NP-hard.

Proof. We show a reduction from the known NP-hard problem "Minimum feedback arc set" [17].
    For a graph G with n vertices v_1, . . . , v_n and m directed edges e_1, . . . , e_m, we
construct a depth-2 tree T and a ranking ρ of its leaves. We will assume that G has no isolated
vertices; this is without loss of generality, as isolated vertices can be dropped from an
instance of minimum feedback arc set without affecting the answer. We also assume that between
any two vertices at most one of the two directed edges is present; this is also without loss of
generality, since dropping both edges in case both are present reduces the answer by exactly
one. Finally, for ease of notation, we allow the ranking ρ to be an injective mapping into the
integers; this can be changed easily by replacing each integer by its rank in the range.


    In the first layer, the root of 𝑇 has 𝑛 children corresponding to 𝑣 1 , . . . , 𝑣 𝑛 . The second layer
has leaves with ranks encoding the edges of 𝐺. For each edge 𝑒 π‘˜ going from 𝑣 𝑖 to 𝑣 𝑗 , we add two
leaves under 𝑣 𝑖 with ranks βˆ’2π‘˜ and 2π‘˜ βˆ’ 1, and two leaves under 𝑣 𝑗 with ranks βˆ’2π‘˜ + 1 and 2π‘˜,
as shown in Figure 18. All ranks are distinct.
    Consider the number of inversions in 𝑇 induced by an ordering 𝜎 of 𝑣 1 , . . . , 𝑣 𝑛 . For each
edge 𝑒 π‘˜ , the number of inversions between the leaves of rank βˆ’2π‘˜ and 2π‘˜ βˆ’ 1 and the leaves
of rank βˆ’2π‘˜ + 1 and 2π‘˜ is 1 if 𝜎(𝑣 𝑖 ) < 𝜎(𝑣 𝑗 ) and 3 if 𝜎(𝑣 𝑖 ) > 𝜎(𝑣 𝑗 ). These four leaves also form
8(π‘š βˆ’ 1) inversions with all other leaves, keeping in mind that these inversions are counted
twice when summed up over all edges.
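The per-edge count is easy to confirm directly (a small sketch of ours): within each child the leaves are sorted, so only the relative order of v_i and v_j matters.

```python
def count_inversions(seq):
    """Number of pairs (p, q) with p < q and seq[p] > seq[q]."""
    n = len(seq)
    return sum(1 for p in range(n) for q in range(p + 1, n) if seq[p] > seq[q])

k = 1
under_vi = sorted([-2 * k, 2 * k - 1])   # leaves under the source v_i
under_vj = sorted([-2 * k + 1, 2 * k])   # leaves under the target v_j

assert count_inversions(under_vi + under_vj) == 1  # v_i ordered before v_j
assert count_inversions(under_vj + under_vi) == 3  # v_j ordered before v_i
```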
    Therefore, the minimum number of inversions in T is given by

      MinInv(T, ρ) = min_σ ( 4m(m − 1) + m + 2 · Σ_{e_k∈E} 𝕀[σ(v_i) > σ(v_j)] ),

where e_k denotes the edge from v_i to v_j. The size of the minimum feedback arc set is
precisely min_σ Σ_{e_k∈E} 𝕀[σ(v_i) > σ(v_j)], which can be extracted from MinInv(T, ρ) with
straightforward calculations. This completes the reduction. □

Approximation Algorithms. If we relax our requirements to an approximate answer, we can
approximate MinRInv in polynomial time using existing approximation algorithms for weighted
minimum feedback arc set. The best known such algorithm achieves an approximation ratio
of 𝑂(log 𝑛 log log 𝑛) on a graph with 𝑛 vertices [10]. Adapting this algorithm for minimizing
inversions in trees produces an approximation factor of 𝑂(log(deg(𝑇)) log log(deg(𝑇))) for
MinInv(𝑇, 𝜌). Under the unique games conjecture, there does not exist a constant-factor
approximation algorithm for minimum feedback arc set on arbitrary digraphs [12].
    In the special case of tournament graphs, which have exactly one edge of weight 1 between
every pair of vertices, there are efficient constant-factor approximation algorithms for
minimum feedback arc set. Some of these also apply to weighted tournaments, where for every
pair of vertices v_i, v_j, the nonnegative edge weights w_ij, w_ji satisfy w_ij + w_ji = 1.
This case corresponds to the scenario of computing MinRInv(T_v, ρ) where all leaf sets L_i of
siblings have the same size. [18] gives an exact algorithm with runtime O*(2^{O(√OPT)}), where
OPT denotes the optimal answer. [19] also gives an approximation algorithm for the case where
w_ij + w_ji ∈ [b, 1] for some b > 0: for any ε > 0, the algorithm produces a
(1 + ε)-approximation of OPT in time n · 2^{Õ(1/(εb)^12)}. For the problem of computing
MinRInv(T_v, ρ), the parameter b represents the ratio between the smallest and largest possible
values of |L_i| · |L_j|.

Wilcoxon test. As a final remark we point out an alternate way of computing Π_T(ρ) in the
special case of the Mann–Whitney trees of Figure 2. The number of cross inversions
XInv_ρ(A, B) can be written in terms of the rank sum W_B ≜ Σ_{y∈B} ρ(y) as follows, where
a ≜ |A| and b ≜ |B|:

                     XInv_ρ(A, B) = ab + b(b + 1)/2 − W_B.                     (10.1)


The quantity W_B is known as the Wilcoxon rank-sum statistic for detecting differences between
random variables. Because of the relationship (10.1), the Wilcoxon test is equivalent in power
to the Mann–Whitney test. However, the evaluation based on the efficient computation of cross
inversions (especially in the case of unbalanced set sizes a and b) is superior to the
evaluation based on the rank sum W_B, as the latter presumes sorting the combined set
X = A ⊔ B.
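Identity (10.1) can be verified on random rankings (a small sketch of ours):

```python
import random

def xinv(A, B):
    """Cross inversions with A placed before B: pairs (x, y), x in A, y in B, x > y."""
    return sum(1 for x in A for y in B if x > y)

random.seed(0)
for _ in range(100):
    a, b = random.randint(1, 8), random.randint(1, 8)
    ranks = random.sample(range(1, a + b + 1), a + b)  # rho on the combined set
    A, B = ranks[:a], ranks[a:]
    W_B = sum(B)
    assert xinv(A, B) == a * b + b * (b + 1) // 2 - W_B
```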


Acknowledgements
We would like to thank Greta Panova, Robin Pemantle, and Richard Stanley for pointers regarding
Gaussian polynomials, Stasys Jukna for answering questions about the complexity measure
𝑁, and the anonymous reviewers and copy editors for helpful suggestions. We appreciate the
partial support for this research by the U.S. National Science Foundation under Grants No.
2137424 and 2312540.


References
 [1] Michael Ben-Or: Lower bounds for algebraic computation trees. In Proc. 15th STOC, pp.
     80–86. ACM Press, 1983. [doi:10.1145/800061.808735] 10

 [2] Manuel Blum, Robert W. Floyd, Vaughan Pratt, Ronald L. Rivest, and Robert E. Tarjan:
     Time bounds for selection. J. Comput. System Sci., 7(4):448–461, 1973. [doi:10.1016/S0022-
     0000(73)80033-9] 12

 [3] Hans L. Bodlaender, Fedor V. Fomin, Arie M. C. A. Koster, Dieter Kratsch, and Dimitrios M.
     Thilikos: A note on exact algorithms for vertex ordering problems on graphs. Theory
     Computing Sys., 50(3):420–432, 2012. [doi:10.1007/s00224-011-9312-0] 57

 [4] Jean Cardinal, Samuel Fiorini, GwenaΓ«l Joret, RaphaΓ«l M. Jungers, and J. Ian Munro: An
     efficient algorithm for partial order production. SIAM J. Comput., 39(7):2927–2940, 2010.
     [doi:10.1137/090759860] 13

 [5] Svante Carlsson and Jingsen Chen: The complexity of heaps. In Proc. 3rd Ann. ACM–SIAM
     Symp. on Discrete Algorithms (SODA’92), pp. 393–402. SIAM, 1992. ACM DL. 12

 [6] Richard Degerman: Ordered binary trees constructed through an application of Kendall’s
     tau. Psychometrika, 47(4):523–527, 1982. [doi:10.1007/BF02293713] 2, 3, 4

 [7] Dorit Dor and Uri Zwick: Selecting the median. SIAM J. Comput., 28(5):1722–1758, 1999.
     [doi:10.1137/S0097539795288611] 12

 [8] Dorit Dor and Uri Zwick: Median selection requires (2 + πœ–)𝑛 comparisons. SIAM J. Discr.
     Math., 14(3):312–325, 2001. [doi:10.1137/S0895480199353895] 12

 [9] David Eppstein: A permutohedron, 2007. LINK, see also Wikipedia article, accessed
     11-07-2024. 9


[10] Guy Even, Joseph Naor, Baruch Schieber, and Madhu Sudan: Approximating mini-
     mum feedback sets and multicuts in directed graphs. Algorithmica, 20(2):151–174, 1998.
     [doi:10.1007/pl00009191] 58

[11] Gaston H. Gonnet and J. Ian Munro: Heaps on heaps. SIAM J. Comput., 15(4):964–971,
     1986. [doi:10.1137/0215068] 12

[12] Venkatesan Guruswami, Johan HΓ₯stad, Rajsekar Manokaran, Prasad Raghavendra,
     and Moses Charikar: Beating the random ordering is hard: Every ordering CSP is
     approximation resistant. SIAM J. Comput., 40(3):878–914, 2011. [doi:10.1137/090756144] 58

[13] Michael Held and Richard M. Karp: A dynamic programming approach to sequencing
     problems. J. SIAM, 10(1):196–210, 1962. [doi:10.1137/0110015] 57

[14] Ivan Hu, Dieter van Melkebeek, and Andrew Morgan: Query complexity of inversion
     minimization on trees. In Proc. 34th Ann. ACM–SIAM Symp. on Discrete Algorithms (SODA’23),
     pp. 2836–2866. SIAM, 2023. [doi:10.1137/1.9781611977554.ch107] 1

[15] Stasys Jukna, Alexander A. Razborov, Petr SavickΓ½, and Ingo Wegener: On P versus
     NP ∩ co-NP for decision trees and read-once branching programs. Comput. Complexity,
     8(4):357–370, 1999. [doi:10.1007/s000370050005] 11

[16] Kanela Kaligosi, Kurt Mehlhorn, J. Ian Munro, and Peter Sanders: Towards optimal
     multiple selection. In Proc. 32nd Internat. Colloq. on Automata, Languages, and Programming
     (ICALP’05), pp. 103–114. Springer, 2005. [doi:10.1007/11523468_9] 12

[17] Richard M. Karp: Reducibility among combinatorial problems. In Raymond E. Miller
     and James W. Thatcher, editors, Complexity of Computer Computations, pp. 85–103. Plenum
     Press/Springer, 1972. Available at ResearchGate. [doi:10.1007/978-1-4684-2001-2_9] 57

[18] Marek Karpinski and Warren Schudy: Faster algorithms for feedback arc set tournament,
     Kemeny rank aggregation and betweenness tournament. In Proc. Internat. Symp. on
     Algorithms and Computation (ISAAC’10), pp. 3–14. Springer, 2010. [doi:10.1007/978-3-642-
     17517-6_3] 58

[19] Claire Kenyon-Mathieu and Warren Schudy: How to rank with few errors. In Proc. 39th
     STOC, pp. 95–103. ACM Press, 2007. [doi:10.1145/1250790.1250806] 58

[20] Henry B. Mann and Donald R. Whitney: On a test of whether one of two random variables
     is stochastically larger than the other. Ann. Math. Stat., 18(1):50–60, 1947. JSTOR. 11

[21] Stephen Melczer, Greta Panova, and Robin Pemantle: Counting partitions inside a
     rectangle. SIAM J. Discr. Math., 34(4):2388–2410, 2020. [doi:10.1137/20M1315828] 11

[22] George A. Miller: Psychological method to investigate verbal concepts. J. Mathematical
     Psychology, 6(2):169–191, 1969. [doi:10.1016/0022-2496(69)90001-7] 2


[23] Ryan O’Donnell: Analysis of Boolean Functions.          Cambridge Univ. Press, 2014.
     [doi:10.1017/CBO9781139814782] 14

[24] AlfrΓ©d RΓ©nyi: Probability Theory. North-Holland 1970, Dover 2007. 44

[25] Arnold SchΓΆnhage: The production of partial orders. AstΓ©risque, 38–39:229–246, 1976.
     NumDam. 12

[26] Igor S. Sergeev: On the upper bound of the complexity of sorting. Comput. Math. and Math.
     Physics, 61(2):329–346, 2021. [doi:10.1134/S0965542521020111] 5

[27] Richard P. Stanley and Fabrizio Zanello: Some asymptotic results on π‘ž-binomial co-
     efficients. Annals of Combinatorics, 20(3):623–634, 2016. [doi:10.1007/s00026-016-0319-8]
     11

[28] Lajos TakΓ‘cs: Some asymptotic formulas for lattice paths. J. Stat. Planning and Inference,
     14(1):123–142, 1986. [doi:10.1016/0378-3758(86)90016-9] 11

[29] Andrew Chi-Chin Yao: Probabilistic computations: Toward a unified measure of complexity.
     In Proc. 18th FOCS, pp. 222–227. IEEE Comp. Soc., 1977. [doi:10.1109/SFCS.1977.24] 27, 28


AUTHORS

     Ivan Hu
     Ph. D. student
     Department of Computer Science
     University of Wisconsin – Madison
     Madison, Wisconsin, USA
     ilhu wisc edu
     https://pages.cs.wisc.edu/~ihu/


     Dieter van Melkebeek
     Professor
     Department of Computer Science
     University of Wisconsin – Madison
     Madison, Wisconsin, USA
     dieter cs wisc edu
     https://pages.cs.wisc.edu/~dieter/





    Andrew Morgan
    Software engineer
    Google
    amorgan cs wisc edu
    https://pages.cs.wisc.edu/~amorgan/


ABOUT THE AUTHORS

    Ivan Hu is a third year Ph. D. student at the University of Wisconsin–Madison under
       the supervision of Dieter van Melkebeek. He is currently studying complexity
       theory.


    Dieter van Melkebeek received his Ph. D. from the University of Chicago, under
       the supervision of Lance Fortnow. His thesis was awarded the ACM Doctoral
       Dissertation Award. After postdocs at DIMACS and the Institute for Advanced
       Study, he joined the faculty at the University of Wisconsin-Madison, where
       he currently is a full professor. His research interests include the power of
       randomness, lower bounds for NP-complete problems, and connections between
       derandomization and lower bounds.


    Andrew Morgan received his Ph. D. in 2022 from the University of Wiscon-
      sin–Madison under the supervision of Dieter van Melkebeek. After graduating,
      he became a software engineer at Google.



