An Optimal Lower Bound for Monotonicity Testing over Hypergrids

Authors Deeparnab Chakrabarty, C. Seshadhri,
Plaintext
                        T HEORY OF C OMPUTING, Volume 10 (17), 2014, pp. 453–464
                                       www.theoryofcomputing.org




            An Optimal Lower Bound for
         Monotonicity Testing over Hypergrids
                            Deeparnab Chakrabarty                        C. Seshadhri∗
                Received April 2, 2014; Revised December 3, 2014; Published December 24, 2014




       Abstract: For positive integers n, d, the hypergrid [n]d is equipped with the coordinatewise
       product partial ordering denoted by ≺. A function f : [n]d → N is monotone if ∀x ≺ y,
       f (x) ≤ f (y). A function f is ε-far from monotone if at least an ε fraction of values must be
       changed to make f monotone. Given a parameter ε, a monotonicity tester must distinguish
       with high probability a monotone function from one that is ε-far.
           We prove that any (adaptive, two-sided) monotonicity tester for functions f : [n]d → N
       must make Ω(ε −1 d log n − ε −1 log ε −1 ) queries. Recent upper bounds show the existence
       of O(ε −1 d log n) query monotonicity testers for hypergrids. This closes the question of
       monotonicity testing for hypergrids over arbitrary ranges. The previous best lower bound for
       general hypergrids was a non-adaptive bound of Ω(d log n).
ACM Classification: F.1.2, F.2.2
AMS Classification: 68Q17, 68W20
Key words and phrases: lower bounds, property testing, monotonicity testing

1     Introduction
Given query access to a function f , the area of property testing [21, 17] deals with the problem of
determining properties of f without accessing all its inputs. Monotonicity testing [16] is a classic problem
     A conference version of this paper appeared in the Proceedings of the 17th Internat. Workshop on Randomization and
Computation (RANDOM 2013) [11].
   ∗ This work was funded by the DARPA Graph-theoretic Research in Algorithms and the Phenomenology of Social Networks

(GRAPHS) program and by the DOE ASCR Complex Interconnected Distributed Systems (CIDS) program, and Sandia’s
Laboratory Directed Research & Development (LDRD) program. Sandia National Laboratories is a multi-program laboratory
managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S.
Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.


© 2014 Deeparnab Chakrabarty and C. Seshadhri
c b Licensed under a Creative Commons Attribution License (CC-BY)                         DOI: 10.4086/toc.2014.v010a017
                              D EEPARNAB C HAKRABARTY AND C. S ESHADHRI

in property testing. Consider a function f : D → R, where D is a finite set equipped with a partial
order given by “≺,” and R is a set equipped with a total order. The function f is monotone if for all
x ≺ y (in D), f (x) ≤ f (y). The distance to monotonicity of f is the minimum fraction of values that
need to be modified to make f monotone. More precisely, let the distance between functions d( f , g) be
|{x : f (x) 6= g(x)}|/|D|, and let M be the set of all monotone functions. Then the distance to monotonicity
of f is ming∈M d( f , g). (This minimum always exists since D is finite.)
    A function is called ε-far from monotone if the distance to monotonicity is strictly greater than ε.
A property tester for monotonicity is a, possibly randomized, algorithm that takes as input a distance
parameter ε ∈ (0, 1), error parameter δ ∈ [0, 1], and query access to an arbitrary f . If f is monotone, then
the tester must accept with probability > 1 − δ . If it is ε-far from monotone, then the tester rejects with
probability > 1 − δ . If neither, then the tester is allowed to do anything. The aim is to design a property
tester making as few queries as possible to the function. A tester is called one-sided if it always accepts a
monotone function. A tester is called non-adaptive if the queries made do not depend on function values
returned in the previous queries. The most general tester is an adaptive, two-sided tester.
    Monotonicity testing has a rich history and the hypergrid domain, [n]d , has received special attention.
The boolean hypercube (n = 2) and the total order (d = 1) are special instances of hypergrids. Following
a long line of work [13, 16, 12, 19, 15, 1, 14, 18, 20, 2, 3, 4], previous work of the authors [10] shows
the existence of O(ε −1 d log n)-query monotonicity testers. The result in this paper is a matching lower
bound that is optimal in all parameters for functions of unbounded range.

Theorem 1.1. Any adaptive, two-sided monotonicity tester for functions f : [n]d → N requires

                                            d log n − log ε −1
                                                                
                                          Ω
                                                    ε

queries, assuming ε > n−d .

1.1   Previous work
The problem of monotonicity testing was introduced by Goldreich et al. [16], who demonstrated a O(n/ε)
tester for functions f : {0, 1}n → {0, 1}. The first tester for general hypergrids was given by Dodis et
al. [12]. The upper bound of O(ε −1 d log n) for monotonicity testing was recently proven in [10]. We
refer the interested reader to the introduction of [10] for a more detailed history of previous upper bounds.
     There have been numerous lower bounds for monotonicity testing. Following the work of Ergun
et al. [13] who demonstrated an Ω(log n) lower bound for non-adaptive monotonicity testers, for the
total order D = [n], Fischer [14] gave an Ω(log n) lower bound for adaptive
                                                                         √       monotonicity testers as well
over [n]. For the hypercube domain, Fischer et al. [15] proved a Ω( d) lower bound for non-adaptive,
one-sided testers (this lower bound holds even for {0, 1}-ranged functions), which was improved to a
Ω(d/ε) lower bound by Brïet et al. [8]. Using an ingenious reduction from communication complexity,
Blais, Brody and Matulef [4] proved an Ω(d) lower bound for adaptive, two sided testers. Honing this
reduction, Brody [9] improved it to an Ω(d/ε) lower bound. For the hypergrid domain, the only lower
bound known was an Ω(d log n) for non-adaptive testers by Blais, Raskhodnikova, and Yaroslavtsev [5]
using communication complexity techniques.

                    T HEORY OF C OMPUTING, Volume 10 (17), 2014, pp. 453–464                            454
              A N O PTIMAL L OWER B OUND FOR M ONOTONICITY T ESTING OVER H YPERGRIDS

    We note that our theorem only holds when the range is N, while some  √ previous results hold for
restricted ranges. The results of [4, 9] provide lower bounds for range [ d] and that of Blais et al. [5]
hold for the range [nd]. For these settings, the communication complexity reductions provide stronger
lower bounds than our result.

1.2    Preliminaries and main ideas
We start with a formal definition of a tester. Consider the family of functions f : D → R, where D is some
partial order, and R ⊆ N. We assume that f always takes distinct values, so ∀x, y, f (x) 6= f (y). Since we
are proving lower bounds, this is no loss of generality.

Definition 1.2. An algorithm A is a (t, ε, δ )-monotonicity tester if A has the following properties. For
any f : D → R, the algorithm A makes t (possibly randomized) queries to f and then outputs either
“accept” or “reject.” If f is monotone, then A accepts with probability > 1 − δ . If f is ε-far from
monotone, then A rejects with probability > 1 − δ .

    Given a positive integer s, let Ds be the s-fold Cartesian product of D. We define two symbols
acc and rej, and denote D0 = D ∪ {acc, rej}. Any (t, ε, δ )-tester can be completely specified by the
following family of functions. For all s ≤ t, x ∈ Ds , y ∈ D0 , we consider a function pyx : Rs → [0, 1],
with the semantics that for any a ∈ Rs , pyx (a) denotes the probability the tester queries y as the (s + 1)th
query, given that the first s queries are x1 , . . . , xs and f (xi ) = ai for 1 ≤ i ≤ s. These functions satisfy the
following properties.


                                  ∀s ≤ t, ∀x ∈ Ds , ∀a ∈ Rs ,      ∑ pyx (a) = 1 ,                                   (1.1)
                                                                  y∈D0

                                 ∀x ∈ Dt , ∀y ∈ D, ∀a ∈ Rt ,      pyx (a) = 0 .                                      (1.2)

(1.1) ensures the decisions of the tester at step (s + 1) must form a probability distribution. (1.2) implies
that the tester makes at most t queries. Any adaptive tester can be specified by these functions. The
important point to note is that these are finitely many functions; their number is at most t|D|t+1 .
    The starting point of this work is the result of Fischer [14] who proved an adaptive lower bound for
monotonicity testing for functions f : [n] → N. He shows that adaptive testers can be reduced to what we
call comparison-based testers ([14] calls them order-based testers). In plain English, comparison-based
testers are adaptive testers whose decision on where to query at time s + 1 depends only on the order of
the function values at the s-query points so far, and not on the value themselves. Such a reduction is done
using Ramsey theory arguments, in turn inspired by the work of Breslauer et al. [7]. Our starting point is
an observation that Fischer’s proof goes through for every partial order, and not just the total order [n]. To
define comparison-based testers formally, we need some notation.
    For any positive integer s, let R(s) denote the set of unordered subets of R of cardinality s. We
introduce new functions as follows. With each s, x ∈ Ds , y ∈ D0 , and each permutation σ : [s] → [s], we
associate functions qyx,σ : R(s) → [0, 1], with the semantics

             For any set S = (a1 < a2 < · · · < as ) ∈ R(s) ,        qyx,σ (S) := pyx (aσ (1) , . . . , aσ (s) ) .

                      T HEORY OF C OMPUTING, Volume 10 (17), 2014, pp. 453–464                                        455
                              D EEPARNAB C HAKRABARTY AND C. S ESHADHRI

    That is, qyx,σs (S) sorts the answers in S in increasing order, permutes them according to σ , and passes
the permuted ordered tuple to pyx . These q-functions allow us to formally define comparison-based testers.

Definition 1.3. A monotonicity tester A is comparison-based for functions f : D → R if for all s,
x ∈ Ds , y ∈ D0 , and permutations σ : [s] → [s], the function qyx,σ is a constant function on R(s) . In other
words, the (s + 1)th decision of the tester given that the first s questions is x, depends only on the ordering
of the answers received, and not on the values of the answers.

    It is not too hard to see that a comparison-based tester for the domain [n] can be easily converted to
a non-adaptive tester, for which an Ω(log n) bound was previously known [13]. This is not true for the
hypergrid domain in general. To circumvent this, we first focus on the hypercube domain. As is standard,
we define a distribution over functions, one of which is monotone and the others ε-far from monotone,
and show that any deterministic comparison-based tester making few queries cannot be correct most of
the time. Our monotone function is in fact the “decimal notation” of the binary vector which “mimics” a
total order from 0 to 2d − 1. This can now be used to argue that any comparison-based tester is essentially
non-adaptive for which a lower bound follows easily. Finally, for hypergrids, we give an easy reduction
to hypercubes.


2    The reduction to comparison-based testers
Theorem 2.1. Suppose there exists a (t, ε, δ )-monotonicity tester for functions f : D → N. Then there
exists a comparison-based (t, ε, 2δ )-monotonicity tester for functions f : D → N.

    As stated in the previous section, the above theorem is implicit in the work of Fischer [14] who
proved it only for D = [n]. We provide a proof for completeness. Call a monotonicity tester discrete if the
corresponding functions pyx satisfying constraints (1.1), (1.2) can only take values in {i/K : 0 ≤ i ≤ K}
for some finite K.

Lemma 2.2. Suppose there exists a (t, ε, δ )-monotonicity tester A for functions f : D → N. Then there
exists a discrete (t, ε, 2δ )-monotonicity tester for such functions.

Proof. We do a rounding on the p-functions. Let K = 100t|D|t /δ 2 . Start with the p-functions of the
(t, ε, δ )-tester A. For y ∈ D ∪ acc, x ∈ Ds , a ∈ Rs , let p̂yx (a) be the largest value in {i/K | 0 ≤ i ≤ K}
which is at most pyx (a). Set p̂x (a) so that (1.1) is maintained.
                                rej

     Note that for y ∈ D ∪ acc, if pyx (a) > 10t/(δ K), then
                                           
                                         δ
                                     1−       pyx (a) ≤ p̂yx (a) ≤ pyx (a) .
                                        10t
                rej         rej
Furthermore, p̂x (a) ≥ px (a).
     The p̂-functions describe a new discrete tester A0 that makes at most t queries. We argue that A0 is a
(t, ε, 2δ )-tester. Given a function f that is either monotone or ε-far from monotone, consider a sequence
of queries x = (x1 , . . . , xs ) after which A returns a correct decision µ. Call such a sequence good, and let

                      T HEORY OF C OMPUTING, Volume 10 (17), 2014, pp. 453–464                              456
             A N O PTIMAL L OWER B OUND FOR M ONOTONICITY T ESTING OVER H YPERGRIDS

p(x) denote the probability this occurs. We know that the sum of p(x) over all good query sequences is
at least (1 − δ ). Now,

              p(x) := px1 · pxx21 ( f (x1 )) · px(x31 ,x2 ) ( f (x1 ), f (x2 )) · · · · p(x1 ,...,xs ) ( f (x1 ), . . . , f (xs )) .
                                                                                            µ


Here px1 is the probability that the first point queried is x1 . Two cases arise. Suppose all of the multiplier
probabilities in the right-hand side above are ≥ 10t/δ K. Then, the probability of this good sequence
arising in A0 is at least (1 − δ /10t)t p(x) ≥ p(x)(1 − δ /10). Otherwise, suppose some probability in
the right-hand side is < 10t/δ K; call such good sequences deficient. The total probability mass of
querying deficient good sequences is at most 10t/δ K · |D|t ≤ δ /2. Therefore, the probability of querying
a good sequence in A0 is at least (1 − 3δ /2)(1 − δ /10) > 1 − 2δ , where the first term is the mass on
non-deficient, good sequences for A. Therefore, A0 is a (t, ε, 2δ ) tester.

    We introduce some Ramsey theory terminology. For any positive integer i, a finite coloring of N(i) is a
function coli : N(i) → {1, . . . ,C} for some finite number C. An infinite set X ⊆ N is called monochromatic
with respect to coli if for all sets A, B ∈ X (i) , coli (A) = coli (B). A k-wise finite coloring of N is a
collection of k colorings col1 , . . . , colk . (Note that each coloring is over different sized tuples.) An
infinite set X ⊆ N is k-wise monochromatic if X is monochromatic with respect to all the coli s.
    The following is a simple variant of Ramsey’s original theorem. (We closely follow the proof of
Ramsey’s theorem as given in Chap VI, Theorem 4 of [6].)

Theorem 2.3. For any k-wise finite coloring of N, there is an infinite k-wise monochromatic set X ⊆ N.

Proof. We proceed by induction on k. If k = 1, then this is trivially true since C is finite. We now iteratively
construct an infinite set of N. Let col1 , col2 , . . . , colk be a k-coloring of N. Start with a0 being the
minimum element in N. Consider the following (k − 1)-wise coloring of (N \ {a0 }) col01 , . . . , col0k−1 ,
where col0i (S) is defined to be coli+1 (S ∪ a0 ). By the induction hypothesis, there exists an infinite
(k − 1)-wise monochromatic set A0 ⊆ N \ {a0 } with respect to coloring col0i s. That is, for 2 ≤ i ≤ k,
and any set S, T ⊆ A0 with |S| = |T | = i − 1, we have coli (a0 ∪ S) = coli (a0 ∪ T ). Call this color Ci0 .
Denote the collection of these colors as a vector C0 = (C10 ,C20 , . . . ,Ck0 ) where C10 = col1 (a0 ).
      Subsequently, let a1 be the minimum element in A0 , and consider the (k − 1)-wise coloring col0 of
(A0 \ {a1 }) where col0i (S) = coli+1 (S ∪ {a1 }) for S ⊆ A0 \ {a1 }. Again, the induction hypothesis yields
an infinite (k − 1)-wise monochromatic set A1 as before, and similarly the vector C1 . Continuing this
procedure, we get an infinite sequence a0 , a1 , a2 , . . . of natural numbers, an infinite sequence of vectors of
k colors C0 , C1 , . . ., and an infinite nested sequence of infinite sets A0 ⊃ A1 ⊃ A2 . . .. Every Ar contains
as , ∀s > r and by construction, any set ({ar } ∪ S), S ⊆ Ar , |S| = i − 1, has color Cri . Since there are only
finitely many colors, some vector of colors occurs infinitely often as Cr1 , Cr2 , . . .. The corresponding
infinite sequence of elements ar1 , ar2 , . . . is k-wise monochromatic.

Proof of Theorem 2.1. Suppose there exists a (t, ε, δ )-tester for functions f : D → N. We need to show
there is a comparison-based (t, ε, 2δ )-tester for such functions.
    By Lemma 2.2, there is a discrete (t, ε, 2δ )-tester A. Equivalently, we have the functions qyx,σ as
described in the previous section. We now describe a t-wise finite coloring of N. Consider s ∈ [t]. Given
a set A ⊆ N(s) , cols (A) is a vector indexed by (y, x, σ ), where y ∈ D0 , x ∈ Ds , and σ is a permutation of

                       T HEORY OF C OMPUTING, Volume 10 (17), 2014, pp. 453–464                                                        457
                                D EEPARNAB C HAKRABARTY AND C. S ESHADHRI

[s]. The value of the vector at this entry is defined to be qyx,σ (A). The domain is finite, so the number
of dimensions is finite. Since the tester is discrete, the number of possible colors entries is also finite.
Applying Theorem 2.3, we know the existence of a t-wise monochromatic infinite set R ⊆ N. By
the monochromatic property, we get that for any y, x, σ , and any two sets A, B ∈ R(s) , s ≤ t, we have
qyx,σ (A) = qyx,σ (B). That is, the algorithm A is a comparison-based tester for functions f : D → R.
     Consider the strictly monotone map φ : N → R, where φ (b) is the bth element of R in sorted order.
Now given any function f : D → N, consider the function φ ◦ f : D → R. Consider an algorithm A0
which on input f runs A on φ ◦ f . More precisely, whenever A queries a point x, it gets answer φ ◦ f (x).
Observe that if f is monotone (or ε-far from monotone), then so is φ ◦ f , and therefore, the algorithm A0
is a (t, ε, 2δ )-tester of φ ◦ f . Since the range of φ ◦ f is R, A0 is comparison-based.


3     Lower bounds
We assume that n is a power of 2 and set ` := log2 n, and think of [n] as {0, 1, . . . , n − 1}. For any integer
0 ≤ z < n, we think of the binary representation of z as an `-bit vector (z1 , z2 , . . . , z` ), where z1 is the least
significant bit (although, z1 is leftmost in the way written).
    We first start with a map which allows us to reduce functions on hypergrids from those on hypercubes.
The map is the following natural one: φ : [n]d → {0, 1}d` . For any ~y = (y1 , y2 , . . . , yd ) ∈ [n]d , we
concatenate binary representations of the yi s in order to get a d`-bit vector φ (~y). Hence, we can transform
a function f : {0, 1}d` → N into a function fe : [n]d → N by defining fe(~y) := f (φ (~y)).
    In Section 3.1, we describe a distribution of functions over the hypercube with equal mass on
monotone and ε-far from monotone functions. The key property is that for a function drawn from
this distribution, any deterministic comparison based algorithm errs in classifying it with non-trivial
probability. This property will be used in conjunction with the above mapping to get our final lower
bound Section 3.2.

3.1    The hard distribution
We focus on functions f : {0, 1}m → N. (Eventually, we set m = d`.) Given any x ∈ {0, 1}m , we let
val(x) := ∑m  i=1 2
                    i−1 x denote the number for which x is the binary representation. Here, x denotes the
                         i                                                                   1
least significant bit of x.
    For convenience, we let ε be a power of 1/2. For k ∈ {1, . . . , 1/2ε}, we let

                               Sk := x : val(x) ∈ [2(k − 1)ε2m , 2kε2m − 1] .
                                     

Note that the sets Sk partition the hypercube, with each |Sk | = ε2m+1 . In fact, each Sk is a subhypercube
of dimension m0 := m + 1 − log(1/ε), with the minimal element having all zeros in the m0 least significant
bits, and the maximal element having all ones in those.
    We describe a distribution Fm,ε on functions. The support of Fm,ε consists of f (x) = 2val(x) and
m /(2ε) functions indexed as g j,k with j ∈ [m0 ] and k ∈ [1/(2ε)], defined as follows.
  0

                                      (
                                       2val(x) − 2 j − 1     if x j = 1 and x ∈ Sk ,
                          g j,k (x) =
                                       2val(x)               otherwise.

                      T HEORY OF C OMPUTING, Volume 10 (17), 2014, pp. 453–464                                    458
              A N O PTIMAL L OWER B OUND FOR M ONOTONICITY T ESTING OVER H YPERGRIDS

      The distribution Fm,ε puts probability mass 1/2 on the function f = 2val and ε/m0 on each of the
g j,k s. All these functions take distinct values on their domain. Note that 2val induces a total order on
{0, 1}m .


The distinguishing problem: Given query access to a random function f from Fm,ε , we want a
deterministic comparison-based algorithm that declares that f = 2val or f 6= 2val. We refer to any such
algorithm as a distinguisher. Naturally, we say that the distinguisher errs on f if its declaration is wrong.
We first prove a lower bound for non-adaptive distinguishers.

Lemma 3.1. Any deterministic, non-adaptive, comparison-based distinguisher A making fewer than
t ≤ m0 /(8ε) queries, errs with probability at least 1/8.

Proof. Let X be the set of points queried by the distinguisher. Set Xk = X ∩ Sk ; these form a partition
of X. We say that a pair of points (x, y) captures the (unique) coordinate j, if j is the largest coordinate
where x j 6= y j . (By largest coordinate, we refer to most significant bit.) For a set Y of points, we say Y
captures coordinate j if there is a pair in Y that captures j. The main technical argument is encapsulated
in the following two claims.

Claim 3.2. For any j, k, if the algorithm distinguishes between val and g j,k , then Xk captures j.

Proof. If the algorithm distinguishes between val and g j,k , there must exist (x, y) ∈ X such that val(x) <
val(y) and g j,k (x) > g j,k (y). We claim that x and y capture j; this will also imply they lie in the same Sk0
since the m − j most significant bit of x and y are the same.
    Firstly, observe that we must have y j = 1 and x j = 0; otherwise,

                                 g j,k (y) − g j,k (x) ≥ 2(val(y) − val(x)) > 0

contradicting the supposition. Now suppose (x, y) don’t capture j implying there exists i > j which is the
largest coordinate at which they differ. Since val(y) > val(x) we have yi = 1 and x j = 0. Therefore, we
have
           g j,k (y) − g j,k (x) ≥ 2(val(y) − val(x)) − 2 j − 1 ≥ (2i + 2 j ) − ∑ 2r − 2 j − 1 > 0 .
                                                                                1≤r<i

So, x, y capture j and lie in the same Sk0 . If k0 6= k, then again g j,k (y) − g j,k (x) = 2(val(y) − val(x)) > 0.
Therefore, Xk captures j.

Claim 3.3. A set Y of points captures at most |Y | − 1 coordinates.

Proof. We apply induction on |Y |. When |Y | = 2, this is trivially true. Otherwise, pick the largest
coordinate j captured by Y and let Y0 = {y : y j = 0} and Y1 = {y : y j = 1}. By induction, Y0 captures at
most |Y0 | − 1 coordinates, and Y1 captures at most |Y1 | − 1 coordinates. Pairs (x, y) ∈ Y0 ×Y1 only capture
coordinate j. Therefore, the total number of captured coordinates is at most

                                       |Y0 | − 1 + |Y1 | − 1 + 1 = |Y | − 1 .

                     T HEORY OF C OMPUTING, Volume 10 (17), 2014, pp. 453–464                                 459
                              D EEPARNAB C HAKRABARTY AND C. S ESHADHRI

    We now complete the proof of Lemma 3.1. If |X| ≤ m0 /8ε, then there exist at least 1/4ε values of
k such that |Xk | ≤ m0 /2. By Claim 3.2 and Claim 3.3, each such Xk captures at most m0 /2 coordinates.
Therefore, there exist at least
                                                 1 m0     m0
                                                     · =
                                                4ε 2      8ε
functions g j,k that are indistinguishable from the monotone function 2val to a comparison-based proce-
dure that queries X. This implies the distinguisher must err (make a mistake on either these g j,k s or 2val)
with probability at least
                                                ε m0 1
                                                          
                                                                1
                                         min      0
                                                    ·   ,     = .
                                                m (8ε) 2        8

   A basic proposition reduces adaptive distinguishers to non-adaptive ones. This crucially uses the total
order given by val(x).

Proposition 3.4. Suppose there exists a deterministic comparison-based distinguisher A that makes at
most t queries for inputs drawn from distribution Fm,ε . Then there exists a deterministic non-adaptive
comparison-based distinguisher A0 making at most t queries whose probability of error on inputs from
Fm,ε is at most that of A.

Proof. We represent A as a comparison tree. For any path in A, the total number of distinct domain
points involved in comparisons is at most t. Note that 2val(x) is a total order, since for any x, y either
val(x) < val(y) or vice versa. We say that a comparison between f (x) and f (y) is inconsistent with val
if f (x) < f (y), val(x) > val(y) or vice versa. We construct a comparison tree A0 where we simply reject
whenever a comparison is inconsistent with the total order, and otherwise mimics A. The comparison tree
of A0 has an error probability at most that of A since it never errs when A doesn’t err. Furthermore, the
tree is just a path and thus can be modeled as a non-adaptive distinguisher as follows. We simply query
upfront all the points involving points on this path, and make the relevant comparisons for the output.

      Our main lemma is a direct consequence of Proposition 3.4 and Lemma 3.1.

Lemma 3.5. Any deterministic comparison-based distinguisher that makes less than m0 /(8ε) queries
errs with probability at least 1/8 on a function drawn from Fε,m .

3.2     The final bound
Recall, given function f : {0, 1}d` → N, we have the function fe : [n]d → N by defining fe(~y) := f (φ (~y)).
We start with the following observation.

Proposition 3.6. The function 2val
                              ] is monotone and every gf
                                                       j,k is ε/2-far from being monotone.


Proof. Let ~u and ~v be elements in [n]d such that ~u ≺ ~v. We have val(φ (~u)) < val(φ (~v)), so 2val   ] is
                                                                                                 d`
monotone. For the latter, it suffices to exhibit a matching of violated pairs of cardinality ε2 for gf  j,k . This
is given by pairs (~u,~v) where φ (~u) and φ (~v) only differ in their jth coordinate, and are both contained in
Sk . Note that these pairs are comparable in [n]d and are violations.

                     T HEORY OF C OMPUTING, Volume 10 (17), 2014, pp. 453–464                                 460
            A N O PTIMAL L OWER B OUND FOR M ONOTONICITY T ESTING OVER H YPERGRIDS

Theorem 3.7. Any (t, ε/2, 1/16)-monotonicity tester for f : [n]d → N, must have

                                               d log n − log(1/ε)
                                        t≥                        .
                                                        8ε
Proof. By Theorem 2.1, it suffices to show this for comparison-based (t, ε/2, 1/8) testers. By Yao’s
minimax lemma, it suffices to produce a distribution D over functions f : [n]d → N such that any
deterministic comparison-based (t, ε/2, 1/8)-monotonicity tester for D must have t ≥ s, where

                                               d log n − log(1/ε)
                                        s :=                      .
                                                        8ε

    Consider the distribution D where we generate f from Fm,ε and output fe. Suppose t < s. By Proposi-
tion 3.6, the deterministic comparison based monotonicity tester acts as a deterministic comparison-based
distinguisher for Fm,ε making fewer than s queries, contradicting Lemma 3.1.



References
 [1] N IR A ILON AND B ERNARD C HAZELLE: Information theory in property testing and monotonicity
     testing in higher dimension. Inform. and Comput., 204(11):1704–1717, 2006. Preliminary version
     in STACS’05 and ECCC. [doi:10.1016/j.ic.2006.06.001] 454

 [2] N IR A ILON , B ERNARD C HAZELLE , S. C OMANDUR , AND D ING L IU: Estimating the distance to a
     monotone function. Random Structures Algorithms, 31(3):1704–1711, 2007. Preliminary version in
     RANDOM’04. [doi:10.1002/rsa.20167] 454

 [3] T U ĞKAN BATU , RONITT RUBINFELD , AND PATRICK W HITE: Fast approximate PCPs for multidi-
     mensional bin-packing problems. Inform. and Comput., 196(1):42–56, 2005. Preliminary version
     in APPROX’99. [doi:10.1016/j.ic.2004.10.001] 454

 [4] E RIC B LAIS , J OSHUA B RODY, AND K EVIN M ATULEF: Property testing lower bounds via commu-
     nication complexity. Comput. Complexity, 21(2):311–358, 2012. Preliminary version in CCC’11
     and ECCC. [doi:10.1007/s00037-012-0040-x] 454, 455

 [5] E RIC B LAIS , S OFYA R ASKHODNIKOVA , AND G RIGORY YAROSLAVTSEV: Lower bounds for test-
     ing properties of functions on hypergrid domains. In Proc. 29th IEEE Conf. on Computational Com-
     plexity (CCC’14), pp. 309–320, 2014. Preliminary version in ECCC. [doi:10.1109/CCC.2014.38]
     454, 455

 [6] B ÉLA B OLLOBÁS: Modern Graph Theory. Springer, 2000. 457

 [7] DANY B RESLAUER , A RTUR C ZUMAJ , D EVDATT P. D UBHASHI , AND F RIEDHELM M EYER AUF
     DER H EIDE: Transforming comparison model lower bounds to the parallel-random-access-machine.
     Inform. Process. Lett., 62(2):103–110, 1997. [doi:10.1016/S0020-0190(97)00032-X] 455

                   T HEORY OF C OMPUTING, Volume 10 (17), 2014, pp. 453–464                          461
                           D EEPARNAB C HAKRABARTY AND C. S ESHADHRI

 [8] J OP B RIËT, S OURAV C HAKRABORTY, DAVID G ARCÍA -S ORIANO , AND A RJEH M ATSLIAH:
     Monotonicity testing and shortest-path routing on the cube. Combinatorica, 32(1):35–53, 2012.
     Preliminary version in RANDOM’99 and ECCC. [doi:10.1007/s00493-012-2765-1] 454

 [9] J OSHUA B RODY: Personal communication, 2013. 454, 455

[10] D EEPARNAB C HAKRABARTY AND C. S ESHADHRI: Optimal bounds for monotonicity and Lips-
     chitz testing over hypercubes and hypergrids. In Proc. 45th STOC, pp. 419–428. ACM Press, 2013.
     Preliminary version in ECCC. [doi:10.1145/2488608.2488661] 454

[11] D EEPARNAB C HAKRABARTY AND C. S ESHADHRI: An optimal lower bound for monotonicity
     testing over hypergrids. In Proc. 16th Internat. Workshop on Approximation Algorithms for Combi-
     natorial Optimization Problems (APPROX’13), pp. 425–435, 2013. Preliminary version in ECCC.
     [doi:10.1007/978-3-642-40328-6_30] 453

[12] Y EVGENIY D ODIS , O DED G OLDREICH , E RIC L EHMAN , S OFYA R ASKHODNIKOVA , DANA RON ,
     AND A LEX S AMORODNITSKY: Improved testing algorithms for monotonicity. In Proc. 2nd Internat.
     Workshop on Approximation Algorithms for Combinatorial Optimization Problems (APPROX’99),
     pp. 97–108, 1999. Available at ACM-DL. Preliminary version in ECCC. 454

[13] F UNDA E RGÜN , S AMPATH K ANNAN , R AVI K UMAR , RONITT RUBINFELD , AND M AHESH
     V ISWANATHAN: Spot-checkers. J. Comput. System Sci., 60(3):717–751, 2000. Preliminary version
     in STOC’98. [doi:10.1006/jcss.1999.1692] 454, 456

[14] E LDAR F ISCHER: On the strength of comparisons in property testing. Inform. and Comput.,
     189(1):107–116, 2004. Preliminary version in ECCC. [doi:10.1016/j.ic.2003.09.003] 454, 455, 456

[15] E LDAR F ISCHER , E RIC L EHMAN , I LAN N EWMAN , S OFYA R ASKHODNIKOVA , RONITT RU -
     BINFELD , AND A LEX S AMORODNITSKY : Monotonicity testing over general poset domains. In
     Proceedings of the 34th Annual ACM Symposium on the Theory of Computing (STOC), pp. 474–483.
     ACM Press, 2002. [doi:10.1145/509907.509977] 454

[16] O DED G OLDREICH , S HAFI G OLDWASSER , E RIC L EHMAN , DANA RON , AND A LEX S AMOROD -
     NITSKY: Testing monotonicity. Combinatorica, 20:301–337, 2000. Preliminary version in FOCS’98.
     [doi:10.1007/s004930070011] 453, 454

[17] O DED G OLDREICH , S HAFI G OLDWASSER , AND DANA RON: Property testing and its connection
     to learning and approximation. J. ACM, 45(4):653–750, 1998. Preliminary version in FOCS’96 and
     ECCC. [doi:10.1145/285055.285060] 453

[18] S HIRLEY H ALEVY AND E YAL K USHILEVITZ: Testing monotonicity over graph prod-
     ucts. Random Structures Algorithms, 33(1):44–67, 2008. Preliminary version in ICALP’04.
     [doi:10.1002/rsa.20211] 454

[19] E RIC L EHMAN AND DANA RON: On disjoint chains of subsets. J. Combin. Theory Ser. A,
     94(2):399–404, 2001. [doi:10.1006/jcta.2000.3148] 454

                  T HEORY OF C OMPUTING, Volume 10 (17), 2014, pp. 453–464                       462
           A N O PTIMAL L OWER B OUND FOR M ONOTONICITY T ESTING OVER H YPERGRIDS

[20] M ICHAL PARNAS , DANA RON , AND RONITT RUBINFELD: Tolerant property testing and distance
     approximation. J. Comput. System Sci., 6(72):1012–1042, 2006. Preliminary vesion in ECCC.
     [doi:10.1016/j.jcss.2006.03.002] 454

[21] RONITT RUBINFELD AND M ADHU S UDAN:          Robust characterization of polynomi-
     als with applications to program testing. SIAM J. Comput., 25(2):647–668, 1996.
     [doi:10.1137/S0097539793255151] 453


AUTHORS

     Deeparnab Chakrabarty
     Researcher
     Microsoft Research
     9 Lavelle Road, Bangalore, India, 560001
     deeparnab gmail com
     http://research.microsoft.com/en-us/um/people/dechakr/


     C. Seshadhri
     Assistant Professor
     Department of Computer Science
     University of California, Santa Cruz
     scomandu ucsc edu
     csesha gmail com
     https://users.soe.ucsc.edu/~sesh/


ABOUT THE AUTHORS

     D EEPARNAB C HAKRABARTY received his B. Tech. from IIT Bombay in 2003 and Ph. D.
        from Georgia Institute of Technology in 2008. His graduate adviser was Vijay V. Vazirani.
        After stints at the universities of Waterloo and Pennsylvania, he crossed the pond again
        to join Microsoft Research in Bangalore where he has been a researcher since 2011. He
        is interested in understanding efficient algorithms using the lens of optimization and has
        worked on approximation algorithms, property testing, and on algorithmic questions
        arising in game theory and economics.




                  T HEORY OF C OMPUTING, Volume 10 (17), 2014, pp. 453–464                           463
                      D EEPARNAB C HAKRABARTY AND C. S ESHADHRI

C. S ESHADHRI (Seshadhri Comandur according to his passport, and Sesh according to his
   friends) follows the common naming style of South India, the native land of his parents.
   In three of his early papers, including one cited in this bibliography, his name appears
   as S. Comandur. The interested reader is invited to consult the OAQ (Occasionally
   Asked Questions) section of his website for more details. Sesh got his B. Tech. from IIT
   Kanpur in 2003 and his Ph. D. from Princeton University in 2008. His Ph. D. advisor
   was Bernard Chazelle. He spent two years at IBM Almaden, and in 2010 he became a
   member of technical staff at Sandia National Laboratories, Livermore, California. In
   2015 he joins the faculty of the University of California at Santa Cruz. This paper was
   written during his time at Sandia. His research focuses on how random sampling can be
   used for algorithms for massive inputs. In theory, this manifests itself as work in property
   testing. In practice, this has led to research on algorithms for massive real-world graphs.
   He is increasingly interested in algorithmics beyond the worst-case, and bridging the gap
   between theory and practice for big data applications.




             T HEORY OF C OMPUTING, Volume 10 (17), 2014, pp. 453–464                             464