Authors: Navin Goyal, Michael Saks
License: CC-BY-3.0
Theory of Computing, Volume 6 (2010), pp. 113–134
www.theoryofcomputing.org
Rounds vs. Queries Tradeoff
in Noisy Computation
Navin Goyal∗ Michael Saks†
Received: January 26, 2010; published: September 2, 2010.
Abstract: We show that a noisy parallel decision tree making O(n) queries needs Ω(log∗ n)
rounds to compute the OR of n bits. This answers a question of Newman [IEEE Conference
on Computational Complexity, 2004, 113–124]. We prove more general tradeoffs between
the number of queries and rounds. We also settle a similar question for computing MAX in
the noisy comparison tree model; these results bring out interesting differences among the
noise models.
ACM Classification: F.2.3
AMS Classification: 68Q13
Key words and phrases: noisy computation, tradeoff lower bound, decision trees
1 Introduction
Understanding the impact of noise on computation and communication is an important problem and it
has been studied in various fields. Within the theory of computing, noise has been studied in the context
of various models of computation: decision trees, e. g., [5, 9, 3, 2, 7], various kinds of communication
protocols, e. g., [6, 4, 7], and many others. In this paper we will be concerned with the noisy Boolean
decision tree and the noisy comparison tree models.
∗ Most of this author’s research was performed at Rutgers University supported in part by a Louis Bevier Fellowship of
Rutgers University and NSF grant CCR-9988526.
† Supported in part by NSF grants CCR-9988526,CCF-0515201 and CCF-0832787.
© 2010 Navin Goyal and Michael Saks
Licensed under a Creative Commons Attribution License DOI: 10.4086/toc.2010.v006a006
1.1 Noisy Boolean decision trees
A Boolean decision tree represents an algorithm for computing a Boolean function f (x1 , . . . , xn ) by
adaptively querying its variables. Several models have been proposed to formalize the notion of noise
in the context of decision trees. Among these models, perhaps the most popular and simplest is the
random noise model. In this model, there is a noise parameter ε ∈ [0, 1/2) and the answer given to each
query is incorrect with probability ε, independent of the answers to other queries and any randomness
used by the algorithm. We call a decision tree operating under the random noise model a noisy decision
tree (NDT). We say that an NDT computes f with error probability δ (for some δ ∈ [0, 1/2)) if on each
input x ∈ {0, 1}n , the NDT outputs f (x) with probability at least 1 − δ . For brevity, we say that an NDT
computes f , if it computes f with error probability at most 1/4.
A noise-free Boolean decision tree of depth d can be simulated by a noisy decision tree of depth
O(d log(d/δ )) with arbitrarily small error δ : Each query of a variable in the noise-free setting is simu-
lated by taking the majority of Θ(log(d/δ )) noisy queries of the same variable. Any n-variate function
can be computed with at most n queries in the noise-free setting, and thus can be computed with at
most O(n log n) queries in the random noise model. Feige et al. [5] showed that to compute MAJORITY
and PARITY in the noisy decision tree model Ω(n log n) queries are indeed necessary. In contrast, they
showed that OR can be computed in O(n) queries in the noisy model.
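To make the majority-vote simulation concrete, here is a small Python sketch (ours, not from the paper; all names are illustrative): each noise-free query is replaced by the majority of k independent ε-noisy queries, and a Chernoff bound makes the per-query error exp(−Ω(k)), so k = Θ(log(d/δ)) suffices for a depth-d tree.

```python
import random

def noisy_query(value: int, eps: float, rng: random.Random) -> int:
    """Return the bit `value`, flipped with probability eps (random noise model)."""
    return value ^ (rng.random() < eps)

def majority_query(value: int, k: int, eps: float, rng: random.Random) -> int:
    """Simulate one noise-free query by the majority of k noisy queries."""
    ones = sum(noisy_query(value, eps, rng) for _ in range(k))
    return 1 if 2 * ones > k else 0

# Empirical per-query error for eps = 1/4 and k = 25 repetitions.
rng = random.Random(0)
eps, k, trials = 0.25, 25, 2000
errors = sum(majority_query(1, k, eps, rng) != 1 for _ in range(trials))
print(errors / trials)  # a small fraction, decaying exponentially in k
```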
There is a natural notion of parallelism for decision trees [11]. In each step (or round), the decision
tree may make many queries (as opposed to a single query in the usual decision trees) and the decision
tree branches according to the ensemble of answers to all of the queries. We can think of the queries in
a single round as being made in parallel. In the noise-free model, such an algorithm is called a parallel
decision tree (PDT).
In this paper, we investigate parallel noisy decision trees (PNDTs), the parallel version of noisy decision trees; see Section 2 for a precise definition. PNDTs were introduced by Newman [7],
who used them to construct protocols in the noisy broadcast model of distributed computing. His work
raised the problem of understanding the tradeoff between the number of rounds and the total number
of queries needed to compute ORn (the OR of n input bits) for PNDTs. The O(n)-query NDT for ORn
of Feige et al. [5] gives an O(log n)-round PNDT. Newman reduced this to O(log∗ n) rounds with O(n)
queries, and asked: Is there a noisy decision tree for OR using O(n) queries and O(1) rounds? In
this paper we provide a negative answer to this question and show that Newman’s PNDT is essentially
optimal by proving a rounds-query tradeoff lower bound. The notation used in this section is defined in
Section 2.
Theorem 1.1. Let ε ∈ (0, 1/2). There is a positive integer C = C(ε) such that for all positive integers
n, r with r ≤ log∗_{1/ε} n − C, any PNDT that computes ORn in r rounds with error probability less than
1/4 under the random noise model with noise parameter ε requires at least (1/100) n log^{(r)}_{1/ε} n queries.
We note two special cases of the theorem: (1) r = 1 corresponds to the case when all queries are
made in parallel in the first round; the number of queries needed here is Ω(n log n). (2) Any PNDT that
computes ORn using O(n) queries needs log∗1/ε n −C rounds, where C is independent of n but depends
on ε.
It has been noted (in, e. g., [4]) that the unrealistic assumption of the random noise model, that
each query experiences independent noise with the same probability, allows for artificial algorithms that
exploit this regularity of noise. Other models of noise have been proposed [4] that are more robust in
that they model noise by a family of possible distributions rather than as a single distribution, and a
good PNDT for a problem must succeed against any distribution in the family. For example, a minimal
robustness requirement is that a good PNDT should still succeed if all noise is eliminated. Noise models
of this type are conveniently defined by viewing the noise as partially controlled by an adversary; the
more powerful the adversary the harder it is to design algorithms that are robust against it. In the
random noise model, the adversary has no power to influence the noise, in the static adversary model
the adversary has some power, and in the clairvoyant adversary model, the adversary has even more
power. These models are defined in Section 2.1.
For upper bounds, one seeks to design PNDTs that tolerate noise in the most robust model possible.
Since our lower bounds are proved for the least robust model, they hold for the other models as well.
Our lower bound for ORn is deduced from a lower bound on a related problem, the “Which Half
Problem” denoted WHPn . Working with this problem substantially simplifies the proof. The input to
WHPn is a 2n-bit vector with exactly one bit set to 1. The goal is to determine whether the 1 is in the first
or the second half of the bits. We prove that if the location of 1 is chosen uniformly at random then we
get a query-round trade-off as in Theorem 1.1. The proof is by induction on the number of rounds. The
induction step uses round elimination: We prove a lemma saying that after the first round of queries, with
high probability we are left with a problem that is at least as hard as a smaller (but not too much smaller)
version of the original problem. Round elimination as a general idea has been used in communication
complexity; we are not aware of similar applications in the context of noisy computation.
1.2 Noisy comparison decision trees
In the comparison tree model, the variables take values from an abstract totally ordered set U and a
query specifies two variables xi , x j and asks whether xi < x j or xi > x j . (We assume for simplicity that
the variables have distinct values.) The parallel version of the model, parallel comparison trees (PCTs),
was considered by Valiant [11].
The random noise model and the other more robust noise models have a natural analog for com-
parison trees; e. g., in the random noise model the answer to each comparison query is incorrect with
probability ε. Thus we can define parallel noisy comparison trees (PNCTs). Such trees were consid-
ered by Feige et al. [5]. Among other results, they showed that the function MAXn , which determines
the index of the maximum variable among n variables, can be computed in O(log n) rounds using O(n)
comparisons, even in the most robust noise model.
We are interested in the tradeoffs between the total number of comparisons and the number of rounds
used by a PNCT computing the maximum function MAX.
For deterministic trees, in the noise-free setting, Valiant showed that deterministic PCTs that make
O(n) comparisons per round require Ω(log log n) rounds. He gave a PCT with O(log log n) rounds and
O(n log log n) comparisons. This was improved to O(log log n) rounds and O(n) total comparisons by
Shiloach and Vishkin [10]. Thus, for deterministic PCTs using O(n) comparisons, Θ(log log n) rounds
are necessary and sufficient. We adapt the Shiloach and Vishkin upper bound to deterministic PNCTs:
Theorem 1.2. Let ε ∈ [0, 1/2). For any of the noise models of Section 2.1 there is a deterministic PNCT
computing MAXn in O(log log n) rounds and O(n) comparisons.
Any lower bound for deterministic PCTs in the noise-free case also applies to the static and clair-
voyant adversary models, because in those two models the adversary is allowed to eliminate all noise.
Therefore Valiant’s lower bound for the noise-free case applies to both of these models and so the result
of Theorem 1.2 is tight. As mentioned earlier, algorithms in the random noise model are potentially
more powerful than in the deterministic model (which is why the random noise model is considered to
be unsatisfactory when analyzing upper bounds), because the noise can be used to simulate randomized
algorithms that outperform any deterministic algorithm. Our results provide a specific example of this
phenomenon: after presenting a randomized PNCT for MAXn, we show in Corollary 1.4 that the randomized
PNCT for MAXn can be adapted to give a deterministic PNCT for MAXn in the random noise
model that beats Valiant's lower bound in the noise-free model.
For randomized PCTs, in the noise-free setting, Reischuk [8] gave an algorithm with O(1) rounds
and O(n) comparisons, which is clearly best possible up to constant factors. For the noisy case we obtain
the following tight queries-rounds tradeoff for randomized PNCTs computing MAXn .
Theorem 1.3. There exists a constant ε0 ∈ (0, 1) such that for noise parameter ε satisfying 0 < ε ≤ ε0 ,
Θ(log∗1/ε n) rounds are necessary and sufficient to compute MAXn by a randomized PNCT using O(n)
comparisons. This holds for each of the three noise models in Section 2.1.
The lower bound is proved by a reduction from the PNDT lower bound for OR; the upper bound
adapts Reischuk’s [8] algorithm to the noisy case. The resulting algorithm can be easily modified to get
a deterministic algorithm for the random noise model:
Corollary 1.4. There exists a constant ε0 ∈ (0, 1) such that for noise parameter ε satisfying 0 < ε ≤ ε0 ,
Θ(log∗1/ε n) rounds are necessary and sufficient to compute MAXn by a deterministic PNCT using O(n)
comparisons in the random noise model.
Organization of the paper. The rest of this paper is organized as follows. Section 2 contains a formal
description of the PNDT model and some definitions. In Section 3, we present a PNDT for ORn , different
from Newman’s, that meets the lower bound of Theorem 1.1, and provides intuition for why there is no
O(n) query, O(1) round protocol. In Section 4 we present the proof of Theorem 1.1 by reducing the
problem of lower bound for OR to lower bound for the “Which Half” problem. In Section 5 we present
results for the comparison tree model. Section 6 concludes with some open problems.
2 Preliminaries
A Boolean decision tree over Boolean variables {x1 , . . . , xn } is a rooted binary tree T where each internal
node of the tree is labeled by a variable and has two children, the two edges going out of a node have
labels 0 and 1, and each leaf is labeled 0 or 1. An execution of T determines a path from the root as
follows: Starting from the root, a query is made to the variable labeling that node, and the (possibly
noisy) answer to the query determines the edge to be followed. This is repeated until a leaf is reached.
The output is the label of the leaf.
In a noise-free execution, the response to each query is equal to the value of the queried variable and
thus the path followed is completely determined by the input.
In a noisy execution the response to each query may disagree with the input variable. Associated to
each node v is a noise random variable ηv which is 0 if the answer is correct (so that the query answer
agrees with the input variable) and is 1 if the answer is incorrect (so that the query answer disagrees
with the input variable). Thus the path followed is a random variable depending on the noise. In the
random noise model with parameter ε ∈ [0, 1/2), the ηv are each 1 with probability ε, mutually independent,
and independent of the input and of any randomness used by the algorithm.
We say that a decision tree computes a function f : {0, 1}n → {0, 1} for noise parameter ε if for
all inputs x, a noisy execution on the tree outputs f (x) with probability ≥ 3/4, where the probability is
taken over the random noise. Of course, 3/4 is an arbitrary constant, and we could choose it to be any
constant in (1/2, 1) using standard amplification, increasing the number of rounds and queries only by
a constant factor.
For noiseless executions, we generally assume without loss of generality that the tree does not query
the same variable more than once along any path. For noisy executions, this assumption is not made. A
decision tree that is being subjected to noisy execution (and may include multiple queries to the same
variable) is called a noisy decision tree (NDT).
A parallel Boolean decision tree (PDT) is similar to a Boolean decision tree, except that each internal
node v is labeled by a list of variables (possibly with repetition) corresponding to a collection of queries
made in parallel. There are 2^m branches coming from v (where m is the number of parallel queries made
at the node), corresponding to the possible responses to these queries. A noiseless execution is defined
in the obvious way. In a noisy execution, there are now noise random variables ηv,i corresponding to the
ith query at node v. The random noise model is defined in the obvious way. In the noiseless setting, the
list of queries labeling a node may be assumed to be distinct.
2.1 More robust noise models
As mentioned in the introduction, models of noise other than the random noise model have been pro-
posed; we will explain some relevant ones in this section. A noise distribution for the PNDT is any
joint distribution over the random variables ηv,i . A general noise model consists of a family of allowed
joint distributions over the noise variables. As mentioned above, in the random noise model, the family
consists of a single distribution in which each variable is chosen independently to be 1 with probability
ε.
In the static adversary model [4] (also called the fault tolerance model [7]), given the PNDT, the
allowed distributions are those for which the noise variables are mutually independent but the probability
of noise for each variable may be any number in [0, ε].
In the clairvoyant adversary model [4] the noise variables are not required to be mutually indepen-
dent. The allowed distributions are those that satisfy that for any noise variable and any setting of the
other noise variables, the conditional probability that the noise variable is 1 given that setting is at most
ε. The clairvoyant adversary model is the most robust; that is, the allowed distributions are more general
than the static adversary model.
Here is another way to think about the clairvoyant adversary model with parameter ε that is espe-
cially useful. In an execution, values are generated independently for all of these noise variables as in
the random noise model. The adversary then has the option of changing any noise variable from 1 to 0
(but not from 0 to 1). Effectively, the adversary may cancel any of the noise (and can do this with full
knowledge of the PDT, the input values and the values of all noise variables.)
This viewpoint is particularly useful for showing that certain algorithms are robust against the clairvoyant
adversary. Let B be any event that depends on the noise variables. We say that B is monotone
with respect to noise if, for any setting of the noise variables for which B occurs, B continues to occur
when any variables set to 0 are changed to 1. In particular, the adversary, who may only change 1s to 0s,
can never cause a monotone event to occur when it would not have occurred anyway. This notion is
useful for the analysis of algorithms because of
the following observation:
Lemma 2.1. Let B be an event that is monotone with respect to noise. If Pr[B] ≤ δ with respect to the
random noise model, then for any clairvoyant adversary, Pr[B] ≤ δ .
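Lemma 2.1 can be checked exhaustively on a toy example. The sketch below (ours; the event is an arbitrary illustration) takes an event that continues to occur when noise bits are changed from 0 to 1, and verifies that no way of cancelling noise (changing 1s to 0s), which is all the clairvoyant adversary may do, can make the event occur when it would not have occurred anyway:

```python
from itertools import product

N = 4  # number of noise variables; small enough to enumerate all settings

def event_B(noise) -> bool:
    """Illustrative monotone event: at least 3 noise bits are 1. Changing
    a 0 to a 1 clearly preserves occurrence."""
    return sum(noise) >= 3

def cancel(noise, mask):
    """Apply a cancellation pattern: each 1 in `noise` may be switched to 0."""
    return tuple(n & m for n, m in zip(noise, mask))

# Count cancellations that create the event where it did not already hold.
violations = sum(
    event_B(cancel(noise, mask)) and not event_B(noise)
    for noise in product((0, 1), repeat=N)
    for mask in product((0, 1), repeat=N)
)
print(violations)  # 0: cancelling noise never creates a monotone event
```

Since every adversary strategy only cancels noise, the probability of a monotone event under any clairvoyant adversary is at most its probability under the random noise model, which is the content of Lemma 2.1.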
2.2 Towers and iterated logarithms
Let b > 1 be a real number and k a nonnegative integer. The tower function Tower_b^{(k)}(·) is defined
recursively for s ≥ 1 by:

Tower_b^{(k)}(s) = s if k = 0, and Tower_b^{(k)}(s) = b^{Tower_b^{(k−1)}(s)} if k ≥ 1.

We define log∗_b(x) to be the least integer r such that Tower_b^{(r)}(1) ≥ x. The base-b k-iterated log function
is defined for real numbers x that satisfy log∗_b(x) ≥ k by:

log_b^{(k)}(x) = x if k = 0, and log_b^{(k)}(x) = log_b(log_b^{(k−1)}(x)) if k ≥ 1.

Observe that for each fixed k and b, Tower_b^{(k)}(·) and log_b^{(k)}(·) are inverse functions.
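These definitions translate directly into code. Here is a short Python sketch (ours) of Tower_b^{(k)}, log_b^{(k)}, and log∗_b, together with a check of the inverse relationship:

```python
import math

def tower(b: float, k: int, s: float) -> float:
    """Tower_b^(k)(s): s if k = 0, else b raised to Tower_b^(k-1)(s)."""
    for _ in range(k):
        s = b ** s
    return s

def iter_log(b: float, k: int, x: float) -> float:
    """log_b^(k)(x): x if k = 0, else log_b applied k times."""
    for _ in range(k):
        x = math.log(x, b)
    return x

def log_star(b: float, x: float) -> int:
    """log*_b(x): least r with Tower_b^(r)(1) >= x."""
    r = 0
    while tower(b, r, 1) < x:
        r += 1
    return r

# Tower_2^(3)(2): 2 -> 4 -> 16 -> 65536, and the 3-iterated base-2 log
# takes 65536 back to 2.
print(round(iter_log(2, 3, tower(2, 3, 2.0)), 6))  # -> 2.0
print(log_star(2, 65536))  # Tower_2^(4)(1) = 65536, so this is 4
```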
We state the following routine facts without proof:
Proposition 2.2. Let b > a > 1 be real numbers. There is a nonnegative integer T = T(a, b) with the
property that for all x ≥ 1 and 1 ≤ k ≤ log∗_a(x) − T,

1. log_a^{(k)}(x) ≤ (log_a b) · log_b^{(k)}(x).

2. log_a^{(k)}(x²) ≤ 2 log_a^{(k)}(x).
2.3 Other notation
For a bit vector u ∈ {0, 1}^n, |u| = ∑_i u_i. For 1 ≤ i ≤ n, let e_i = e_i^n denote the vector whose ith
bit is 1 and whose other bits are all 0. The all-0s vector is denoted 0 = 0^n. The input is denoted
x = (x_1, . . . , x_n). When the base is not specified, the base of the logarithm is 2. The notation O(·), Ω(·)
may hide factors depending on ε. Set [n] := {1, . . . , n}. For two sets A and B, denote their symmetric
difference by A △ B. X ∼ Y means that
the random variables X and Y are identically distributed.
3 Tradeoff upper bounds for OR
In this section, we present a PNDT that essentially matches the tradeoff lower bound stated in The-
orem 1.1. As mentioned earlier, Newman already gave such a PNDT. His PNDT has an additional
property that on non-zero input, with high probability it returns the minimum index i ∈ [n] such that
x[i] = 1.
Here we give a different PNDT called FAST OR. FAST OR does not have the above additional
property but has other advantages: (i) it is simpler, (ii) it reveals a computational bottleneck that leads to
our tight lower bound, and (iii) on any input x, with high probability it outputs a subset of indices that is
“close” to {i ∈ [n] : xi = 1}. We need this last property in the proof of Theorem 1.3 to design an efficient
PNCT for MAX. Both Newman’s PNDT and FAST OR are robust even against the most robust noise
model.
For our algorithm, we assume that the noise parameter ε is at most 1/16. Given an algorithm that
works with noise parameter ε ≤ 1/16, we can convert it to one that works for any noise parameter
ε < 1/2 by replacing each query to a variable by the majority of a suitably chosen constant number of
queries to that variable. This leaves the number of rounds unchanged and changes the number of queries
by a constant factor.
Given n and integer q ≥ 2, FAST ORn,q takes as input x1 , . . . , xn and uses nq queries and a small
number of rounds. The definition of FAST ORn,q depends on two sequences (q j : j ≥ 1) and (λ j : j ≥ 1)
of parameters. We define β := (1/ε)^{1/8} and, for j ≥ 1,

q_j := Tower_β^{(j−1)}(q/2) ,

λ_j := q / (q_{j+1} 2^{j+1}) .
The first sequence increases rapidly like a tower function, and the second decreases rapidly like the
reciprocal of the tower function. We now describe FAST ORn,q .
FAST ORn,q
FAST ORn,q constructs a sequence of sets [n] =: S0 ⊇ S1 ⊇ · · · . We write s j
for the size of S j . During round j for j ≥ 1, the algorithm determines S j from
S j−1 as follows: For each i ∈ S j−1 , the algorithm performs q j (noisy) queries
of xi and places i in S j if at least q j /2 answers were 1. At the end of round j,
FAST OR terminates with output 0 if s j = 0, and terminates with output 1 if
s j > λ j s j−1 , and otherwise it continues to round j + 1.
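The box above can be turned into a runnable sketch. The Python below is ours, not from the paper; the toy parameters n = 64, q = 8, ε = 1/16 are chosen only so that the tower-sized query counts q_j stay small. It implements the rounds of FAST OR_{n,q} with q_j = Tower_β^{(j−1)}(q/2) and λ_j = q/(q_{j+1} 2^{j+1}):

```python
import random

def tower(b: float, k: int, s: float) -> float:
    """Tower_b^(k)(s): apply s -> b**s, k times."""
    for _ in range(k):
        s = b ** s
    return s

def fast_or(x, q: int, eps: float, rng: random.Random) -> int:
    """Sketch of FAST_OR_{n,q} in the random noise model with parameter eps.
    Round j makes q_j noisy queries to each surviving index, keeps those with
    a majority of 1-answers, then stops with output 0 if no index survives,
    or with output 1 if the survivor set failed to shrink by a factor lambda_j."""
    beta = (1.0 / eps) ** 0.125
    S = list(range(len(x)))
    j = 0
    while True:
        j += 1
        q_j = max(2, round(tower(beta, j - 1, q / 2)))  # rounding is a toy-scale safeguard
        kept = []
        for i in S:
            # A noisy query answers x[i] correctly with probability 1 - eps.
            ones = sum(rng.random() < (1 - eps if x[i] else eps) for _ in range(q_j))
            if 2 * ones >= q_j:
                kept.append(i)
        lam_j = q / (tower(beta, j, q / 2) * 2 ** (j + 1))
        if not kept:
            return 0
        if len(kept) > lam_j * len(S):
            return 1
        S = kept

# Empirical error rate over repeated runs; it should be small.
rng = random.Random(1)
n, q, eps = 64, 8, 1 / 16
wrong = sum(fast_or([0] * n, q, eps, rng) != 0 for _ in range(50))
wrong += sum(fast_or([0] * 30 + [1] + [0] * 33, q, eps, rng) != 1 for _ in range(50))
print(wrong, "errors in 100 runs")
```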
The basic idea behind FAST ORn,q is that one can increase the number of queries when the number
of relevant variables is small, achieving small probability of error and not making too many queries
overall. This simple idea has appeared before in the context of noisy decision tree computation. The
new element here is the (tight) trade-off between the number of rounds and the number of queries. This
requires that we carefully choose the number of queries made in each round. Let us give some intuition
for FAST ORn,q . FAST ORn,q tries to approximate the set S := {i ∈ [n] : xi = 1} of indices whose input
bit is 1. The algorithm constructs a sequence of successive approximations S0 ⊇ S1 ⊇ S2 ⊇ S3 ⊇ · · ·
to S. The initial approximation S0 is the set [n]. In round i, the algorithm makes q_i queries to each
of the variables indexed by S_{i−1} and defines S_i to be the set of indices of S_{i−1} for which the majority of
queries return 1. Some indices may be misclassified, but for each individual bit the probability that it
is misclassified is small. If |S_i| is not much smaller than |S_{i−1}| (that is, if it has not shrunk by a factor
of λ_i), then we will deduce that, with high probability, S is well approximated by S_i, and in particular is
non-empty, so the OR is 1. Otherwise, we proceed to the (i + 1)st round of queries. Note that since
|S_i|/|S_{i−1}| ≤ λ_i, we can
afford to make a larger number of queries (qi ) to each bit without much increase in the overall number
of queries, and so the probability that a bit is misclassified decreases (exponentially in the number of
queries to it in the previous round).
For each fixed input x, the behavior of the algorithm depends on the random noise present in each
query response. As we will see below, the algorithm is guaranteed to terminate, irrespective of the
random noise. This is simply because if the algorithm does not stop in round j, then |S j | is much smaller
than |S j−1 |, and this cannot go on for too many rounds. Let R be the random variable representing the
total number of rounds.
For x ∈ {0, 1}n , define A = A(x) := {i : xi = 1} and B = B(x) := {i : xi = 0}.
Theorem 3.1. For each ε ∈ (0, 1/16), there is a constant C = C(ε) such that the following holds.
Let n, r, q be positive integers satisfying r ≤ log∗_{1/ε} n − C and q = ⌈80 log^{(r)}_{1/ε}(n)⌉ ≤ n. For any input
x ∈ {0, 1}n , FAST ORn,q uses at most qn queries and the number R of rounds is at most r. Furthermore,
the final set SR satisfies:
1. for each i ∈ A, Pr[i ∉ S_R] ≤ 1/40000, and

2. |S_R △ A| ≤ |A|/100 with probability at least 99/100.
This holds for all three noise models.
To compute OR, FAST OR outputs 1 if it finishes with nonempty S_R and outputs 0 otherwise. If all
input bits are 0, then A = ∅ and assertion (2) implies that S_R = ∅ with probability at least 99/100. If at
least one input bit is 1, then A ≠ ∅, and so for any i ∈ A, assertion (1) implies that i ∈ S_R with probability
at least 1 − 1/40000. In either case, the algorithm outputs ORn(x) with very small probability of error.
Proof. We first observe that it is enough to prove assertions (1) and (2) for the random noise model. It
is not hard to show that for each i ∈ A and each j ∈ [R], the event i ∉ S_j is monotone with respect to
noise (as defined in Subsection 2.1). Therefore, by Lemma 2.1, if (1) holds for the random noise model
then it holds against any clairvoyant adversary. Similarly, for i ∈ B and j ∈ [R], the event i ∈ S_j is monotone
with respect to noise. It follows that the event |S_R △ A| > |A|/100 is monotone with respect to noise,
and so it is enough to prove (2) for the random noise model.
Henceforth we assume the random noise model. The constant C(ε) that is stipulated in the theorem
will be specified by various conditions that arise in the argument.
For any j, if FAST OR has not terminated by the end of round j − 1 then:

1 ≤ s_{j−1} ≤ λ_{j−1} s_{j−2} ≤ λ_{j−1} n = qn / (q_j 2^j) . (3.1)
The number of queries in round j is at most q j s j−1 ≤ qn/2 j , and so the total number of queries is at
most qn.
We next show that FAST ORn,q uses at most r rounds. Suppose for contradiction that FAST ORn,q
has not terminated after r rounds. Then by (3.1) and the definition of q_{r+1}, qn ≥ q_{r+1} = Tower_β^{(r)}(q/2),
and so:

log_β^{(r)}(qn) ≥ q/2 ≥ 40 log^{(r)}_{1/ε}(n) . (3.2)
By choosing the constant C(ε) large enough we can ensure by Proposition 2.2(2) that for q, r, n as
hypothesized,
log_β^{(r)}(qn) ≤ log_β^{(r)}(n²) ≤ 2 log_β^{(r)}(n) .

Using Proposition 2.2(1) and again choosing the constant C(ε) large enough, this is at most 16 log^{(r)}_{1/ε}(n),
which contradicts (3.2).
Next we consider the properties of the final set SR . For j ≤ R and for each i ∈ S j−1 , FAST OR
computes the majority of q j queries to xi . Let p j denote the probability that the majority value disagrees
with xi . Then
p_j ≤ (q_j choose q_j/2) ε^{q_j/2} ≤ 2^{q_j} ε^{q_j/2} ≤ ε^{q_j/4} = 1/β^{2q_j} = 1/q_{j+1}² , (3.3)
where the third inequality uses ε ≤ 1/16.
For i ∈ A,
Pr[i ∉ S_R] ≤ ∑_{j≥1} p_j ≤ ∑_{j≥1} 1/q_{j+1}² ≤ 2/q_2² ≤ 1/40000
(with room to spare) since ε ≤ 1/16 and q ≥ 80. This completes the proof of the first assertion in the
theorem.
To prove the second assertion in the theorem, we now bound

Pr[|A △ S_R| ≥ |A|/100]

from above. Let A_j := A ∩ S_j and B_j := B ∩ S_j; also let a_j := |A_j| and b_j := |B_j|. For the event |A △ S_R| ≥
|A|/100 to occur, either |A_R| is too small compared to |A| (FAST OR has lost too many indices i with
x_i = 1) or B_R is too large. We will show that neither of these is likely.
Let C be the event that |A − A_R| ≥ |A|/200, and for j ≥ 1 let Bad_j be the event

(j ≤ R) ∧ (b_j > 200 · 2^j p_j b_{j−1}) .
This is the event that in round j of FAST OR the set B j does not shrink fast enough.
It suffices to prove the following two claims: (i) if none of the events in the set {C} ∪ {Bad_j : j ≥ 1}
occurs, then |A △ S_R| ≤ |A|/100; and (ii) Pr[∪_{j≥1} Bad_j] ≤ 1/200 and Pr[C] ≤ 1/200.
T HEORY OF C OMPUTING, Volume 6 (2010), pp. 113–134 121
NAVIN G OYAL AND M ICHAEL S AKS
To prove (i), assume that none of the given events occurs. Write |A △ S_R| = |A − A_R| + |B_R|. Since
C doesn't occur, |A − A_R| ≤ |A|/200. It now suffices to show |B_R| ≤ |A|/200. If FAST OR terminates
with S_R = ∅ then |B_R| ≤ |A|/200 is trivially satisfied. In the other case, the termination condition of
FAST OR and the assumption that Bad_R does not occur imply:

a_R + b_R ≥ λ_R (a_{R−1} + b_{R−1}) ≥ λ_R b_{R−1} ≥ (λ_R / (200 · 2^R p_R)) b_R > (q_{R+1} / (200 · 2^{2R+1})) b_R .
Here the last inequality uses the definition of λR , upper bound (3.3) on pR , and the fact that q > 1.
Using q ≥ 80 it is easy to see that q_{R+1}/(200 · 2^{2R+1}) is minimized when R = 1, in which case it equals
(1/ε)^{q/16}/1600. Using ε ≤ 1/16 and q ≥ 80, this expression is more than 201, and it follows immediately
that b_R ≤ a_R/200 ≤ |A|/200.
To prove (ii), note that E[|A − A_R|] ≤ |A|/40000 (a consequence of the first assertion of the theorem,
proved above); together with Markov's inequality this implies Pr[C] ≤ 1/200.
For each j ≥ 1, conditioned on the event that FAST OR executes at least j rounds and also condi-
tioned on the value of b j−1 , the expectation of b j is p j b j−1 . By Markov’s inequality:
Pr[Bad_j] ≤ 1/(200 · 2^j) ,

and therefore Pr[∪_{j≥1} Bad_j] ≤ 1/200.
4 Tradeoff lower bounds for OR
In round j, FAST OR keeps an approximation S j of the set of 1s. The problem for the next round
becomes the OR problem for the bits in S j . The bits in S j are treated similarly by FAST OR at the end
of round j because the answers to the queries to these bits have more 1s than 0s. We can think of S j
as representing the uncertainty FAST OR has about the input. In our lower bound proof, we basically
show that the size of S j is not too small if FAST OR has not made too many queries. More generally,
we will show that for any PNDT, in a round there is a set of indices which look the same, and this set
does not shrink too fast with the number of rounds.
Given this strategy, the most direct way of working it out would be to (1) use an input distribution
over the all-0 vector and the vectors with a single 1, the index of the 1 uniformly distributed over [n];
(2) show that when the input x is such that OR(x) = 1, the sets S_j are not too small, by showing that there
are many indices which look the same as the index whose bit is 1; and (3) show that when the input is the
all-0 vector, the distribution of the answers is not too far from that when there is one bit with value 1.
The last part of this strategy requires rather technical computations, which we can avoid by working
with the following problem instead:
Definition 4.1. The “Which Half Problem” (WHPn ).
Input: A 2n-bit vector with a single 1.
Output: 0, if the 1 appears among the first n bits, and 1 otherwise.
The advantage of working with WHP is that we only need to work with one distribution on the inputs
and PNDT lower bounds for WHPn immediately imply similar bounds for ORn :
Lemma 4.2. If there is a qn-query, r-round randomized PNDT for ORn with error probability at most δ
then there is a qn-query, r-round randomized PNDT for WHPn with error probability at most δ .
Proof. For a 2n-bit vector x with a single 1, WHPn (x) = ORn (xn+1 , . . . , x2n ), so WHPn (x) can be solved
using a PNDT for ORn .
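The reduction in the proof is nothing more than restricting attention to the second half of the input; in code (ours, noise-free for simplicity):

```python
def whp(x) -> int:
    """Which Half Problem on a 2n-bit vector with exactly one 1: output 0 if
    the 1 lies in the first half and 1 otherwise. As in Lemma 4.2, this is
    OR applied to the second half, so any OR_n procedure (noisy or not,
    parallel or not) can be plugged in here with the same query cost."""
    n = len(x) // 2
    return int(any(x[n:]))

assert whp([0, 0, 0, 1, 0, 0, 0, 0, 0, 0]) == 0  # the 1 is in the first half
assert whp([0, 0, 0, 0, 0, 0, 0, 1, 0, 0]) == 1  # the 1 is in the second half
print("reduction check passed")
```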
Let WHPUn denote WHPn with input chosen uniformly at random from e1 , . . . , e2n , where e j ∈
{0, 1}2n has a unique 1 in position j.
Theorem 4.3. Let ε ∈ (0, 1/2). There is a positive integer C = C(ε) such that for any positive integers
n, r with r ≤ log∗_{1/ε} n − C, any (possibly randomized) PNDT that solves WHPUn in r rounds with error
probability less than 1/4 requires at least (1/100) n log^{(r)}_{1/ε} n queries.
Theorem 1.1 follows immediately using Lemma 4.2.
It suffices to prove Theorem 4.3 for deterministic PNDTs, since a randomized PNDT can be viewed
as a probability distribution over deterministic PNDTs and the probability of error of a randomized
PNDT running on a fixed input distribution is a suitable average over deterministic trees run over the
same input distribution. (This is the easy direction of Yao’s lemma [13].)
4.1 The round elimination lemma
Let errn (r,t) be the minimum possible error probability of an r-round, deterministic PNDT for WHPUn
that uses at most t queries. The round elimination lemma relates err_n(r, t) to err_{n′}(r − 1, t) for some n′
that is not much smaller than n:
Lemma 4.4 (Round elimination). Let ε ∈ (0, 1/2) and θ ≥ 10. Let n, Q, r be positive integers such that
Q ≥ n > (1/ε)^{4θQ/n}, and let n′ be a positive integer satisfying n′ ≤ n ε^{2θQ/n}. Then:

err_n(r, Q) ≥ err_{n′}(r − 1, Q) · (1 − 5/θ) .
We remark that in the above inequality the total number of queries available for r rounds is equal to the total number of queries available for r − 1 rounds. In our application of this inequality, after the first round of queries fewer queries will be available for the remaining r − 1 rounds; however, the proof goes through without taking this into account.
Proof. Let X be the random input to WHPUn and let K be the (random) index such that X_K = 1. We want to show that any r-round, Q-query deterministic PNDT for WHPUn has error probability at least err_{n0}(r − 1, Q)(1 − 5/θ). The analysis is simplified by introducing an advisor who provides some additional information to the PNDT. We show that the desired lower bound holds for PNDTs with this advisor; since the advisor can only reduce the probability of error, the lower bound applies to PNDTs without advice.
THEORY OF COMPUTING, Volume 6 (2010), pp. 113–134
After the first round, the advisor reveals the values of X at all but n0 locations of each half of the input, and also provides some additional information about the unrevealed values. Let T be the event that the unique 1 is revealed, and let T̄ be the complementary event. The behavior of the advisor will ensure the following conditions:

1. Pr[T] ≤ 5/θ.

2. The conditional distribution of K given T̄ is uniform over the set of unrevealed locations.

These two conditions imply the desired bound: by Condition 2, the conditional probability that the PNDT makes an error given T̄ is at least err_{n0}(r − 1, Q), and thus by Condition 1 the probability of error is at least err_{n0}(r − 1, Q)(1 − 5/θ).
We now describe a strategy for the advisor that guarantees these two conditions. The first round of queries of a PNDT for WHPn is specified by a vector (m_1, . . . , m_{2n}), where m_i is the number of times X_i is queried. By hypothesis, m_1 + · · · + m_{2n} ≤ Q. Let B_1 (resp. B_2) consist of the ⌊n/θ⌋ indices among the first half (resp. second half) of the variables that are queried most often, breaking ties arbitrarily. Let A_1 := [n] − B_1, A_2 := [n + 1, 2n] − B_2, and A := A_1 ∪ A_2. Note that for j ∈ {1, 2} and i ∈ A_j, since m_i ≤ m_h for all h ∈ B_j, we have

m_i ≤ (1/(|B_j| + 1)) ∑_{h ∈ B_j ∪ {i}} m_h ≤ Qθ/n .

Define m := ⌊Qθ/n⌋; since the m_i are integers, we have m_i ≤ m for all i ∈ A.
We now describe the behavior of the advisor.

Step 1. The advisor reveals the values of (X_i : i ∈ B := B_1 ∪ B_2). If K ∈ B, the computation ends.

Step 2. For each i ∈ A, the advisor provides answers to m − m_i additional (noisy) queries to X_i. Thus for each i ∈ A, the PNDT has m noisy values for X_i.

Step 3. For i ∈ A, let v_i be the number of queries to X_i that reported 1. For j ∈ {1, 2}, let S_j := {i ∈ A_j : v_i = v_K} and let s_j := |S_j|. (S_j is the set of indices that “look the same” as K to the PNDT.) If s_1 < n0 or s_2 < n0, the advisor reveals the value at location K and the computation ends. Otherwise, the advisor reveals the values of the variables outside S := S_1 ∪ S_2.

Step 4. The advisor chooses |s_1 − s_2| indices at random from the larger of S_1, S_2 and reveals those values. Let R_1, R_2 be the subsets of S_1, S_2 that remain unrevealed.

Step 5. If K is unrevealed then |R_1| = |R_2| ≥ n0 and the conditional distribution of K is uniform over R_1 ∪ R_2. Finally, the advisor chooses a pair R′_1 ⊆ R_1 and R′_2 ⊆ R_2 of subsets uniformly at random from among the pairs of subsets of size exactly n0 such that K ∈ R′_1 ∪ R′_2, and reveals all the values of locations in (R_1 − R′_1) ∪ (R_2 − R′_2). This maintains the property that K is unrevealed and the location of K is uniformly distributed over R′_1 ∪ R′_2.
It is clear from this description that Condition 2 is satisfied, and it remains to verify Condition 1. For 1 ≤ i ≤ 5, let E_i be the event that the advisor reveals the location of K during Step i. The events E_i are disjoint and E_2 and E_5 are empty, so Pr[T] = Pr[E_1] + Pr[E_3] + Pr[E_4]. Since K is independent of the queries performed in the first round,

Pr[E_1] = Pr[K ∈ B] = |B|/(2n) ≤ 1/θ .

It remains to give an upper bound on Pr[E_3] + Pr[E_4]. For any event F containing E_3,

Pr[E_3] + Pr[E_4] ≤ Pr[F] + Pr[E_4 ∧ F̄] ≤ Pr[F] + Pr[E_4 | F̄] .

We will choose such an event F and bound the two terms of this final sum.
For a positive integer s and γ ∈ [0, 1], let B(s, γ) denote the sum of s independent 0-1 random variables, each equal to 1 with probability γ. Let

p(t) := Pr[B(m, ε) = t] = (m choose t) ε^t (1 − ε)^{m−t} .

Observe that for any t ∈ [0, m], p(t) ≥ ε^m, using the fact that ε ≤ 1/2. Define

w(t) := E[B(n − ⌊n/θ⌋, p(t))] = (n − ⌊n/θ⌋) p(t) .

Let W = w(v_K); thus W is a random variable depending on v_K. For j ∈ {1, 2} we define F_j to be the event that |s_j − W| > W/θ, and F to be the event F_1 ∨ F_2. To show that Condition 1 holds it now suffices to show:
(A1) E_3 ⊆ F.

(A2) Pr[F_j] ≤ 1/θ for j ∈ {1, 2}.

(A3) Pr[E_4 | F̄] ≤ 2/θ.
For (A1), if E_3 holds, then for some j ∈ {1, 2},

s_j < n0 ≤ n ε^{2m} ≤ n ε^m p(v_K) ≤ (1 − 1/θ) w(v_K) ,

which implies F. (The last inequality is very loose.)
Next, we bound Pr[F_j] for j ∈ {1, 2}. Fix k ∈ {1, . . . , 2n} and t ∈ [0, m] arbitrarily. We show that

Pr[F_j | (K = k) ∧ (v_K = t)] ≤ 1/θ ,

from which Pr[F_j] ≤ 1/θ follows immediately. Condition on the event (K = k) ∧ (v_K = t). For i ∈ [n], let Y_i be the indicator of the event i ∈ S_j; thus s_j = ∑_{i∈A_j} Y_i. We have Y_k = 1 and, for i ∈ A_j − {k}, Y_i ∼ B(1, p(t)). If k ∉ A_j then s_j ∼ B(n − ⌊n/θ⌋, p(t)), and otherwise s_j ∼ B(n − ⌊n/θ⌋ − 1, p(t)) + 1; in either case we can write s_j = Z + Y, where Z ∼ B(n − ⌊n/θ⌋, p(t)) and Y is a random variable satisfying
0 ≤ Y ≤ 1. Noting that Z has variance Var[Z] = (1 − p(t)) E[Z] ≤ E[Z] = w(t), and using Chebyshev's inequality, we get:

Pr[F_j | (K = k) ∧ (v_K = t)] = Pr[|s_j − w(t)| > w(t)/θ] ≤ Pr[|Z − w(t)| ≥ w(t)/θ − 1]
    ≤ Pr[|Z − w(t)| ≥ w(t)/(2θ)] ≤ 4 Var[Z] θ²/E[Z]² ≤ 4θ²/E[Z] ≤ 4θ²/(ε^m (n − ⌊n/θ⌋)) ≤ 1/θ .

The second inequality follows from w(t) ≥ 2θ, which in turn follows from w(t) ≥ (n − ⌊n/θ⌋) ε^{⌊θQ/n⌋} together with the hypotheses n ≥ (1/ε)^{4θQ/n}, θ ≥ 10, ε < 1/2, and Q ≥ n. The final inequality also follows easily from these hypotheses. Thus we have shown that Pr[F_1] = Pr[F_2] ≤ 1/θ.
It remains to upper bound Pr[E_4]. The event E_4 occurs if X_K is among the |s_1 − s_2| variables of S_1 ∪ S_2 that the advisor reveals in Step 4. This happens with probability at most |s_1 − s_2|/(s_1 + s_2). Given F̄, we have W(1 − 1/θ) ≤ s_1, s_2 ≤ W(1 + 1/θ), which implies |s_1 − s_2|/(s_1 + s_2) ≤ 2/θ, using θ ≥ 10. Summing the above three upper bounds, we conclude that Pr[T] ≤ 5/θ, as required to verify Condition 1.
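The only probabilistic tool used in (A2) is Chebyshev's inequality together with Var[Z] ≤ E[Z] for a binomial Z. The following numeric sanity check, with illustrative stand-in values for n − ⌊n/θ⌋ and p(t) (the function name and parameters are ours), shows how much room the bound leaves:

```python
from math import exp, lgamma, log

def binom_pmf(n, k, p):
    """Binomial pmf computed in log-space, safe for large n."""
    return exp(lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1 - p))

theta = 10
n, p = 200000, 0.5            # stand-ins for n - floor(n/theta) and p(t)
w = n * p                     # w(t) = E[Z]
lo, hi = w - w / (2 * theta), w + w / (2 * theta)
exact = sum(binom_pmf(n, k, p) for k in range(n + 1) if k < lo or k > hi)
cheb = 4 * theta ** 2 / w     # the bound 4 Var[Z] theta^2 / E[Z]^2 <= 4 theta^2 / E[Z]
print(exact <= cheb <= 1 / theta)   # True: the exact tail is far below the bound
```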
4.2 Proof of Theorem 4.3
In this section we prove Theorem 4.3 using Lemma 4.4. To do this we extend Lemma 4.4. Recall that
we already noted that Theorem 1.1 follows immediately from Theorem 4.3 and Lemma 4.2.
Lemma 4.5 (Multiple-round elimination). Let ε ∈ (0, 1/2) and θ ≥ 10. Let n, Q, r be positive integers with Q ≥ n. Define the sequence (n_j : j ≥ 0) by the recurrence n_0 = n and, for j ≥ 1,

n_j = ⌊n_{j−1} ε^{2^j θQ/n_{j−1}}⌋ .

For a natural number i, let I(i) denote the inequality

I(i): n_i ε^{2^{i+2} θQ/n_i} ≥ 1 .

If j is a natural number such that I(j − 1) holds then:

err_n(r, Q) ≥ err_{n_j}(r − j, Q) (1 − (10/θ)(1 − 2^{−j})) .
Proof. We proceed by induction on j; the case j = 1 follows immediately from Lemma 4.4. So assume j ≥ 2 and that I(j − 1) holds. Since n_j is a decreasing function of j, I(j − 2) also holds, and so by the induction hypothesis:

err_n(r, Q) ≥ err_{n_{j−1}}(r − j + 1, Q) (1 − (10/θ)(1 − 2^{−(j−1)})) .

Now by Lemma 4.4, with the parameters n, Q, θ, n0 of that lemma replaced by n_{j−1}, Q, 2^{j−1}θ, n_j, we have

err_{n_{j−1}}(r − j + 1, Q) ≥ err_{n_j}(r − j, Q) (1 − 10/(2^j θ)) .
Substituting this into the right-hand side of the previous inequality we get:

err_n(r, Q) ≥ err_{n_j}(r − j, Q) (1 − 10/(2^j θ)) (1 − (10/θ)(1 − 2^{−(j−1)}))
    ≥ err_{n_j}(r − j, Q) (1 − (10/θ)(1 − 2^{−j})) .
Setting j = r and θ = 20 in the lemma, and noting that err_n(0, Q) = 1/2 (since a 0-round algorithm has probability at most 1/2 of guessing the correct half), we obtain:

Corollary 4.6. Let ε ∈ (0, 1/2). Let n, Q, r be positive integers with Q ≥ n, and let θ = 20. Define the sequence (n_j : j ≥ 0) and the condition I(j) as in Lemma 4.5. If r is a natural number such that I(r − 1) holds then:

err_n(r, Q) ≥ (1/2)(1 − (1/2)(1 − 2^{−r})) ≥ 1/4 .
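To get a feel for why condition I(r − 1) limits r to roughly log* n, one can iterate the recurrence for n_j numerically. The sketch below uses our own illustrative parameter choices (ε = 1/4, θ = 20, Q = n) and is not part of the proof:

```python
def nj_sequence(n, Q, eps=0.25, theta=20):
    """Iterate n_j = floor(n_{j-1} * eps**(2**j * theta * Q / n_{j-1}))
    until the sequence hits zero, returning all terms."""
    ns = [n]
    j = 1
    while ns[-1] > 0:
        prev = ns[-1]
        ns.append(int(prev * eps ** (2 ** j * theta * Q / prev)))
        j += 1
    return ns

# Even for an astronomically large n the sequence survives only a couple of
# steps: each round's shrinkage is fed back into the next round's exponent.
print(nj_sequence(n=2 ** 128, Q=2 ** 128))
```

The sequence dies at tower speed, so surviving r rounds of elimination forces n to be tower-sized in r; this is exactly the log* threshold in Theorem 4.3.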
We are now ready to prove Theorem 4.3.
Proof of Theorem 4.3. Set θ = 20 as in Corollary 4.6 and define Q to be

Q = (1/160) n log^{(r)}_{1/ε}(n) .    (4.1)

If r is an integer satisfying

r ≤ log*_{1/ε}(n) − log*_{1/ε}(160) ,    (4.2)

then Q ≥ n. We will show further that condition I(r − 1) holds; then Corollary 4.6 implies err_n(r, Q) ≥ 1/4, as needed to prove the theorem.
From the recurrence for n_j we have:

n_j = ⌊n_{j−1} ε^{2^j θQ/n_{j−1}}⌋ ≥ n_{j−1} ε^{2^{j+1} θQ/n_{j−1}} .    (4.3)

Define q_j = 2^{j+3} θQ/n_j. Note that q_0 = 8θQ/n ≥ 160 since θ = 20 and Q ≥ n, and also note that (q_j) is an increasing sequence since (n_j) is a decreasing sequence. Condition I(r − 1) can be rewritten as:

2^{r+2} θQ ≥ q_{r−1} (1/ε)^{q_{r−1}/2} .

Since the left-hand side is larger than n and the right-hand side is smaller than (1/ε)^{q_{r−1}}, I(r − 1) follows from:

log_{1/ε} n ≥ q_{r−1} .    (4.4)

Substituting the definition of q_j into (4.3), we get that for j ≥ 1:

2^{j+3} θQ/q_j ≥ (2^{j+2} θQ/q_{j−1}) ε^{q_{j−1}/2} ,
which can be rewritten as:

q_j ≤ 2 q_{j−1} (1/ε)^{q_{j−1}/2} ≤ (1/ε)^{q_{j−1}} ,

from which we conclude that q_{r−1} ≤ Tower^{(r−1)}_{1/ε}(160Q/n). Substituting definition (4.1) for Q gives (4.4), as required.
5 Comparison trees
In this section we prove Theorems 1.2 and 1.3.
5.1 Deterministic comparison trees
Proof sketch for Theorem 1.2. Our goal is to give an algorithm that finds the maximum of n numbers
in O(log log n) rounds and a linear number of comparison queries, even against a clairvoyant adversary.
The algorithm we use is a fault tolerant version of the Shiloach-Vishkin algorithm [10] mentioned earlier.
First we describe a simple one-round algorithm, TRIVMAX, which takes as input a subset S of [n] and an odd integer parameter k, and attempts to output the index i ∈ S for which y_i is maximum. For each pair i, j of distinct indices in S, TRIVMAX performs the comparison of y_i and y_j k times, and declares the “winner” to be the one that is larger in a majority of the comparisons. If there is some index that is the winner against every other index then it is output as the maximum; otherwise TRIVMAX outputs an arbitrary index of S.
TRIVMAX performs k·(|S| choose 2) comparisons. For each pair i, j, the probability that the winner is selected incorrectly can be bounded above by C(ε)^k, where C(ε) < 1 is a constant depending only on ε (in fact, it is easy to show that C(ε) ≤ 2√(ε(1 − ε))). Thus the probability that the maximum is not correctly identified is at most |S| C(ε)^k.
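A minimal simulation of TRIVMAX under the random noise model may help make the tournament concrete. The oracle and function names below are ours, and this toy model does not capture the clairvoyant-adversary guarantee of Theorem 1.2:

```python
import random
from itertools import combinations

def noisy_less(y, i, j, eps):
    """Comparison oracle: reports whether y[i] < y[j], erring independently
    with probability eps (the random noise model)."""
    truth = y[i] < y[j]
    return truth if random.random() >= eps else not truth

def trivmax(y, S, k, eps=0.0):
    """One-round tournament: compare every pair in S k times (k odd), declare
    pairwise winners by majority, and output an index that beat all others."""
    S = list(S)
    beats = {i: 0 for i in S}
    for i, j in combinations(S, 2):
        j_wins = sum(noisy_less(y, i, j, eps) for _ in range(k))
        beats[j if 2 * j_wins > k else i] += 1
    for i in S:
        if beats[i] == len(S) - 1:   # i won against every other index
            return i
    return S[0]                       # arbitrary fallback

random.seed(1)
y = [random.random() for _ in range(20)]
print(trivmax(y, range(20), k=15, eps=0.1))   # usually the argmax of y
```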
We now define a parametrized multiround algorithm based on repeated use of TRIVMAX. The
algorithm maintains a subset of [n] whose members are candidates for the index at which the maximum
occurs. In each round, the algorithm reduces the set of candidates. During the round, the algorithm splits
the candidates into groups of (nearly) equal size and performs TRIVMAX on each group. The winners
for each group advance to the next round. For simplicity, let us assume that n is a power of 2. The
algorithm takes as parameters r (the number of rounds) and integers g_1, . . . , g_r, k_1, . . . , k_r, where each g_i is a power of 2 specifying the size of the groups in round i (with g_1 g_2 · · · g_r = n) and k_i is a positive odd integer specifying the number of times each comparison is performed in round i. For 0 ≤ j ≤ r, n_j := n / ∏_{i=1}^{j} g_i is then the number of candidates that remain after j rounds; note that n_0 = n and n_r = 1.
Since each candidate participates in k_i (g_i − 1) pairwise comparisons, the number of comparisons in round i is k_i n_{i−1} (g_i − 1)/2. The algorithm fails if the maximum is eliminated in some round; the probability that the maximum is eliminated in round i is bounded by g_i C(ε)^{k_i}, where C(ε) is the constant arising in the analysis of TRIVMAX (defined above).
It now remains to choose the parameters g_i, k_i so that the total number of comparisons is linear and the probability that the maximum is eliminated is kept below 1/4, while minimizing the number of rounds. By Valiant's lower bound the number of rounds must be Ω(log log n). Here is one way (of many possible ways) to match this bound. All logarithms in this description are to the base 2. Let h = ⌈log log n⌉.
The rounds are divided into two stages. The first stage consists of 2h rounds. For 1 ≤ i ≤ 2h, we take g_i = 2 and k_i = 2W_1 i + 1 for some suitably large constant integer W_1. This reduces the size of the candidate set to n_{2h} = n/4^h ≤ n/(log n)². The number of comparisons during round i is

(2W_1 i + 1) n_{i−1} / 2 = (2W_1 i + 1) n / 2^i ,

so the total number of comparisons in the first 2h rounds is O(n). Also, the probability that the maximum is eliminated in round i is bounded above by λ(W_1)^i, where λ(W_1) can be made arbitrarily small by choosing W_1 large enough. So the probability that the maximum is eliminated in the first 2h rounds can be made arbitrarily small.
Now let q be the least integer such that 2^{2^q − 1} ≥ n_{2h}; note q ≤ ⌈log log n⌉. The second stage consists of q rounds. For rounds 2h + i with 1 ≤ i ≤ q − 1, set g_{2h+i} := 2^{2^{i−1}}; this implies that n_{2h+i} = n_{2h}/2^{2^i − 1}. In particular, n_{2h+q−1} = n_{2h}/2^{2^{q−1} − 1} ≤ 2^{2^{q−1}} by the choice of q. Set g_{2h+q} := n_{2h+q−1}; this ensures that n = ∏_{i=1}^{2h+q} g_i, and so after 2h + q = O(log log n) rounds we have reduced to a single candidate.
Set k_{2h+i} = 2⌈W_2 log n⌉ + 1 for each 1 ≤ i ≤ q, where W_2 is a suitably large constant. The number of comparisons in round 2h + i is O(n_{2h+i−1} g_{2h+i} W_2 log n) = O(n/log n), so the total number of comparisons in the second stage is o(n). The probability that the maximum is eliminated in round 2h + i is at most g_{2h+i} C(ε)^{k_{2h+i}}, which is o(1/n) for an appropriate choice of W_2. Thus the chance that the maximum is eliminated in the second stage can be made arbitrarily small.
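The bookkeeping for this two-stage schedule can be checked mechanically. In the sketch below, the constants W1 = 4 and W2 = 3 are our stand-ins for the unspecified “suitably large” constants, and n is assumed to be a power of two:

```python
from math import ceil, log2

def schedule_comparisons(n, W1=4, W2=3):
    """Total comparisons used by the two-stage schedule (n a power of two)."""
    h = ceil(log2(log2(n)))
    total, n_cur = 0, n
    # Stage 1: 2h rounds of pairs (g_i = 2) with k_i = 2*W1*i + 1 repetitions.
    for i in range(1, 2 * h + 1):
        total += (2 * W1 * i + 1) * n_cur // 2
        n_cur //= 2
    # Stage 2: doubly-exponentially growing groups, k = 2*ceil(W2*log n) + 1.
    q = 1
    while 2 ** (2 ** q - 1) < n_cur:
        q += 1
    k = 2 * ceil(W2 * log2(n)) + 1
    for i in range(1, q):
        g = 2 ** (2 ** (i - 1))
        total += k * n_cur * (g - 1) // 2
        n_cur = max(n_cur // g, 1)
    total += k * n_cur * (n_cur - 1) // 2   # final group collapses the survivors
    return total

for e in (16, 20, 24):
    print(2 ** e, schedule_comparisons(2 ** e) / 2 ** e)
```

The printed ratio (comparisons per input) stays bounded by a constant depending on W1, matching the linear-query claim of the analysis above.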
5.2 Randomized comparison trees
Proof sketch for Theorem 1.3. Lower bound. As mentioned before, we prove the tradeoff lower bound for MAXn by reducing ORn to it. Suppose we have a PNCT that computes MAXn of n distinct inputs in the random noise model; we show how to use it to compute ORn on a given Boolean input x = (x_1, . . . , x_n). Define y_i := x_i + i/(2n); the y_i are then distinct. If we determine the index j such that y_j is maximum, then ORn(x) = x_j. We then finish the algorithm by reading x_j in parallel 2s + 1 times (for a suitably large constant s) and taking the majority value as the output.

To find MAXn(y) we simulate the comparison queries of the PNCT for MAXn using noisy Boolean queries: to compare y_i and y_j with i < j, we query x_i and x_j. Call the returned values x′_i and x′_j, and answer the simulated query with y_i > y_j if x′_i = 1 and x′_j = 0, and with y_j > y_i otherwise.
In the above simulation, the probability of error for the simulated comparison query depends on the values of x_i and x_j. For example, if n = 2 and (x_1, x_2) = (0, 0), then (y_1, y_2) = (1/4, 1/2), and so y_1 < y_2. The probability that the simulated query incorrectly answers y_2 < y_1 is ε(1 − ε), which corresponds to the event (x′_1, x′_2) = (1, 0). On the other hand, if (x_1, x_2) = (0, 1), then (y_1, y_2) = (1/4, 3/2), and so y_1 < y_2. But now the probability that the simulated query incorrectly answers y_2 < y_1 is ε², again corresponding to the event (x′_1, x′_2) = (1, 0).
Since our PNCT for MAXn is only guaranteed to work for the random noise model in which the
probability of error of each comparison is the same fixed value, we need to modify the above simulation
so that every comparison query has the same probability of error. To this end, we randomly perturb the
answers of the above simulation with small probabilities so that the errors for all comparison queries are
the same.
The modified simulation requires four perturbation probabilities q_00, q_01, q_10, q_11, which we specify below. To simulate the comparison of y_i and y_j for i < j, we again perform noisy queries to x_i and x_j to get x′_i and x′_j. The simulated comparison query answers y_i > y_j with probability q_{x′_i, x′_j} and y_j > y_i with probability 1 − q_{x′_i, x′_j}.
We want to choose the perturbation probabilities so that the probability of error α of the comparison queries is fixed and not too large. We have the following equations:

Equation (0, 0): α = q_00 (1 − ε)² + q_01 (1 − ε)ε + q_10 (1 − ε)ε + q_11 ε² ,
Equation (0, 1): α = q_00 (1 − ε)ε + q_01 (1 − ε)² + q_10 ε² + q_11 (1 − ε)ε ,
Equation (1, 0): α = (1 − q_00)(1 − ε)ε + (1 − q_01)ε² + (1 − q_10)(1 − ε)² + (1 − q_11)(1 − ε)ε ,
Equation (1, 1): α = q_00 ε² + q_01 (1 − ε)ε + q_10 (1 − ε)ε + q_11 (1 − ε)² .

Equation (u, v) corresponds to the case (x_i, x_j) = (u, v) and expresses the probability that the simulated comparison gives the incorrect answer. In each equation, there are four terms corresponding to the possible query results (x′_i, x′_j). For example, in Equation (0, 0), the first term comes from the fact that (x′_i, x′_j) = (0, 0) with probability (1 − ε)², and the probability that the response is then y_i > y_j is q_00.
Solving these equations for the q's gives:

q_00 = q_11 = (α − ε + ε² − 2αε + 2αε²)/(1 − 2ε)² ,
q_01 = (α + ε² − 4αε + 2αε²)/(1 − 2ε)² ,
q_10 = (1 − α − 2ε + ε² + 2αε²)/(1 − 2ε)² .
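These closed forms can be verified numerically by substituting them back into Equations (0, 0) through (1, 1); all four right-hand sides collapse to the same α. A quick check (the function names below are ours):

```python
def perturb_probs(eps, alpha):
    """Closed-form perturbation probabilities; note q11 = q00."""
    d = (1 - 2 * eps) ** 2
    q00 = (alpha - eps + eps**2 - 2*alpha*eps + 2*alpha*eps**2) / d
    q01 = (alpha + eps**2 - 4*alpha*eps + 2*alpha*eps**2) / d
    q10 = (1 - alpha - 2*eps + eps**2 + 2*alpha*eps**2) / d
    return q00, q01, q10, q00

def equation_lhs(eps, q00, q01, q10, q11):
    """The four error probabilities: Equations (0,0), (0,1), (1,0), (1,1)."""
    e = eps
    return (
        q00*(1-e)**2 + q01*(1-e)*e + q10*(1-e)*e + q11*e**2,
        q00*(1-e)*e + q01*(1-e)**2 + q10*e**2 + q11*(1-e)*e,
        (1-q00)*(1-e)*e + (1-q01)*e**2 + (1-q10)*(1-e)**2 + (1-q11)*(1-e)*e,
        q00*e**2 + q01*(1-e)*e + q10*(1-e)*e + q11*(1-e)**2,
    )

eps = 0.01
alpha = eps ** 0.5                       # the choice alpha = sqrt(eps)
qs = perturb_probs(eps, alpha)
print(all(0 <= q <= 1 for q in qs))                                  # True
print(all(abs(v - alpha) < 1e-12 for v in equation_lhs(eps, *qs)))   # True
```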
We want to choose α small enough that these are all in the range [0, 1]. For example, α = √ε gives:

q_00 = q_11 = √ε + O(ε) ,
q_01 = √ε + O(ε) ,
q_10 = 1 − √ε + O(ε) .

These are all in [0, 1] provided that ε is small enough. We have shown that there is a constant ε_1 ∈ (0, 1) such that for all positive ε ≤ ε_1 we can simulate a comparison query with a fixed probability of error √ε using two Boolean queries, each with probability of error ε.
Using Theorem 1.1, this shows that any ε-noisy comparison decision tree for MAXn with O(n) queries must use Ω(log*_{1/√ε} n) rounds. Since log*_{1/ε} n = Θ(log*_{1/√ε} n), an ε-noisy decision tree with O(n) queries needs Ω(log*_{1/ε} n) rounds, completing the proof of the lower bound part of Theorem 1.3.
Upper bound. Our PNCT for MAXn borrows the sampling idea used by Reischuk [8] to design a noise-
free PCT for MAXn with O(1) rounds and O(n) queries. Let y1 , . . . , yn denote the inputs, which are
distinct elements of the totally ordered set U.
The PNCT uses two subroutines TRIVMAX (which was defined in the previous subsection) and
APXM.
APXM takes as input a subset S of {1, . . . , n} and an index u ∈ {1, . . . , n}, and returns a subset R of S such that, with probability at least 1 − 1/90, max(S) ∈ R and |R △ S_{≥u}| ≤ |S_{≥u}|/100, where S_{≥u} is the set {i ∈ S : y_i ≥ y_u}.

APXM(S, u) is derived from the PNDT of Theorem 3.1: we create a Boolean input z = (z_1, . . . , z_n) for the PNDT by setting z_i = 1 if y_i > y_u, and z_i = 0 otherwise. We then run FAST OR on z. Theorem 3.1 implies that APXM(S, u) has the properties claimed with probability at least 1 − 1/100 − 1/40000 > 1 − 1/90.
APXM uses O(|S|) queries and O(log∗ |S|) rounds.
Now we present our PNCT for MAXn. The parameter k for TRIVMAX is K log n for a suitably large constant K. Let T_0 := [n].

PNCT for MAXn

1. Uniformly at random, choose T_1 ⊂ {1, . . . , n} of size n^{1/3}.
   m_1 = TRIVMAX(T_1).
   T_2 = APXM(T_0, m_1).

2. Uniformly at random, choose T_3 ⊂ T_2 of size n^{1/3}.
   m_2 = TRIVMAX(T_3).
   T_4 = APXM(T_2, m_2).
   If |T_4| > 1000 n^{1/3}, HALT with ERROR.

3. Output TRIVMAX(T_4).
We sketch the routine analysis. Since the elements of T_1 are chosen uniformly at random, with very high probability S_{>m_1} has at most 50 n^{2/3} elements, and the properties of APXM imply that with high probability T_2 contains the index of the overall maximum and has size at most 60 n^{2/3}. Similarly, with high probability T_4 contains the index of the overall maximum and has size at most 1000 n^{1/3}. The total number of comparisons is O(n), since TRIVMAX is run only on sets of size O(n^{1/3}) and APXM takes at most O(n) comparisons.
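The “at most 50 n^{2/3} elements” step is the standard sampling estimate: the number of inputs exceeding the maximum of a uniform n^{1/3}-element sample is about n^{2/3} in expectation, with an exponentially small chance of exceeding 50 n^{2/3}. A rank-based Monte Carlo check (parameters and names are ours):

```python
import random

def survivors_above_sample_max(n, s, trials):
    """For each trial, sample s of the n distinct ranks; the number of inputs
    exceeding the sample maximum is n - 1 - (largest sampled rank)."""
    return [n - 1 - max(random.sample(range(n), s)) for _ in range(trials)]

random.seed(2)
n = 10 ** 6
s = round(n ** (1 / 3))                       # sample of size n^(1/3) = 100
counts = survivors_above_sample_max(n, s, trials=1000)
print(max(counts) <= 50 * n ** (2 / 3))       # True except with tiny probability
```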
Finally we prove Corollary 1.4 which shows how to simulate a randomized PNCT for MAXn in the
random noise model by a deterministic PNCT.
Proof of Corollary 1.4. We adapt the above randomized PNCT for MAXn. The PNCT uses randomness for choosing the sets T_1 and T_3; for each of these it needs O(log (n choose n^{1/3})) = O(n^{1/3} log n) random bits. It can be simulated by an O(n)-query, O(log*_{1/ε} n)-round deterministic PNCT, because such a PNCT has access to O(n) biased random bits (it can just query a single variable repeatedly to generate them). This simulation can be done, for example, using von Neumann's procedure [12] for generating perfectly random bits from independent random bits with common but unknown bias: take noisy samples of x_1 in pairs; ignore pairs that return the same value; if a pair returns 01, interpret this as a 0, and if it returns 10, interpret this as a 1.
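von Neumann's pairing trick is easy to state in code; the function name and the bias value below are our illustration:

```python
import random

def von_neumann_extract(bits):
    """Scan i.i.d. biased bits in non-overlapping pairs; emit 0 for a 01 pair,
    1 for a 10 pair, and discard 00 and 11 pairs."""
    it = iter(bits)
    return [a for a, b in zip(it, it) if a != b]

random.seed(0)
biased = [int(random.random() < 0.9) for _ in range(100000)]   # bias 0.9
fair = von_neumann_extract(biased)
print(len(fair), sum(fair) / len(fair))   # about 9000 bits, roughly half ones
```

Since Pr[01] = Pr[10] for independent flips of common bias, the emitted bits are exactly unbiased; the cost is that only a 2p(1 − p) fraction of pairs yields output.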
6 Future work
There are several existing results in the literature on the NDT query complexity (without restriction on rounds) of some specific Boolean functions, e. g., MAJORITY and PARITY. This paper gives tradeoff results between rounds and queries for the OR function. It would be nice to obtain analogous results for other functions. More generally, one might hope to characterize the queries/rounds tradeoff for any Boolean function in terms of combinatorial properties of the function.

For the noisy comparison tree model, one can ask questions for SELECTION similar to the ones we studied above for MAX. Note that answers to these questions are known for the noise-free comparison tree model by the work of Ajtai et al. [1] and Reischuk [8], for deterministic and randomized algorithms respectively.
Acknowledgements
We would like to thank anonymous referees of previous versions for their helpful comments.
References
[1] MIKLÓS AJTAI, JÁNOS KOMLÓS, W. L. STEIGER, AND ENDRE SZEMERÉDI: Optimal parallel selection has complexity O(log log n). J. Comput. System Sci., 38(1):125–133, 1989. [doi:10.1016/0022-0000(89)90035-4]. 132

[2] WILLIAM EVANS AND NICHOLAS PIPPENGER: Average-case lower bounds for noisy Boolean decision trees. SIAM J. Comput., 28(2):433–446, 1998. [doi:10.1137/S0097539796310102]. 113

[3] URIEL FEIGE: On the complexity of finite random functions. Inform. Process. Lett., 44(6):295–296, 1992. [doi:10.1016/0020-0190(92)90102-2]. 113

[4] URIEL FEIGE AND JOE KILIAN: Finding OR in a noisy broadcast network. Inform. Process. Lett., 73(1-2):69–75, 2000. [doi:10.1016/S0020-0190(00)00002-8]. 113, 114, 115, 117

[5] URIEL FEIGE, PRABHAKAR RAGHAVAN, DAVID PELEG, AND ELI UPFAL: Computing with noisy information. SIAM J. Comput., 23(5):1001–1018, 1994. [doi:10.1137/S0097539791195877]. 113, 114, 115

[6] EYAL KUSHILEVITZ AND YISHAY MANSOUR: Computation in noisy radio networks. SIAM J. Discrete Math., 19(1):96–108, 2005. [doi:10.1137/S0895480103434063]. 113

[7] ILAN NEWMAN: Computing in fault tolerance broadcast networks. In Proc. IEEE Conf. on Comput. Complex., pp. 113–122. IEEE Comp. Soc. Press, 2004. [doi:10.1109/CCC.2004.1313813]. 113, 114, 117

[8] RÜDIGER REISCHUK: Probabilistic parallel algorithms for sorting and selection. SIAM J. Comput., 14(2):396–409, 1985. [doi:10.1137/0214030]. 116, 130, 132
[9] RÜDIGER REISCHUK AND BERND SCHMELTZ: Reliable computation with noisy circuits and decision trees—A general n log n lower bound. In Proc. 32nd FOCS, pp. 602–611. IEEE Comp. Soc. Press, 1991. [doi:10.1109/SFCS.1991.185425]. 113

[10] YOSSI SHILOACH AND UZI VISHKIN: Finding the maximum, merging, and sorting in a parallel computation model. J. Algorithms, 2(1):88–102, 1981. 115, 128

[11] LESLIE G. VALIANT: Parallelism in comparison problems. SIAM J. Comput., 4(3):348–355, 1975. [doi:10.1137/0204030]. 114, 115

[12] JOHN VON NEUMANN: Various techniques used in connection with random digits. In A. S. HOUSEHOLDER, G. E. FORSYTHE, AND H. H. GERMOND, editors, Monte Carlo Method, volume 12 of National Bureau of Standards Applied Mathematics Series, pp. 36–38. 1951. 131

[13] ANDREW CHI-CHIH YAO: Probabilistic computations: Toward a unified measure of complexity. In Proc. 18th FOCS, pp. 222–227. IEEE Comp. Soc. Press, 1977. [doi:10.1109/SFCS.1977.24]. 123
AUTHORS
Navin Goyal
Researcher
Microsoft Research India, Bangalore
navingo microsoft com
http://research.microsoft.com/en-us/people/navingo/
Michael Saks
Professor
Rutgers University
saks@math.rutgers.edu
http://www.math.rutgers.edu/~saks/
ABOUT THE AUTHORS
NAVIN GOYAL graduated from Rutgers University in 2005; his advisor was Mike Saks.
His thesis focused on information theoretic limitations on computation in models such
as communication protocols. He moved to his native India in 2009 after postdocs at
McGill University and Georgia Tech. He gets easily excited about many research topics
and has worked in several areas after his thesis, but continues to be interested in the
interplay between computation and information.
MICHAEL SAKS is a Professor of Mathematics at Rutgers University. He has worked
in a wide range of areas within theoretical computer science including circuit lower
bounds, decision tree complexity, communication complexity, derandomization, online
algorithms, and distributed computing. Most recently he has been thinking about com-
munication complexity lower bounds, sublinear time approximation algorithms, poly-
nomial identity testing, and algorithms for stable instances of NP-hard problems.