                           THEORY OF COMPUTING, Volume 15 (1), 2019, pp. 1–55
                                        www.theoryofcomputing.org




                              Testing k-Monotonicity
                        The Rise and Fall of Boolean Functions
 Clément L. Canonne∗        Elena Grigorescu†        Siyao Guo‡        Akash Kumar§        Karl Wimmer
                  Received March 10, 2017; Revised August 18, 2018; Published April 22, 2019




       Abstract: A Boolean k-monotone function defined over a finite poset domain D alternates
       between the values 0 and 1 at most k times on any ascending chain in D. Therefore, k-
       monotone functions are natural generalizations of the classical monotone functions, which
       are the 1-monotone functions.
           Motivated by the recent interest in k-monotone functions in the context of circuit com-
       plexity and learning theory, and by the central role that monotonicity testing plays in the
       context of property testing, we initiate a systematic study of k-monotone functions, in the
       property testing model. In this model, the goal is to distinguish functions that are k-monotone
       (or are close to being k-monotone) from functions that are far from being k-monotone.
       Our results include the following.
          1. We demonstrate a separation between testing k-monotonicity and testing monotonicity,
             on the hypercube domain {0, 1}d , for k ≥ 3;
     An extended abstract of this paper appeared in the Proceedings of the 8th Innovations in Theoretical Computer Science
conference, 2017 [14].
   ∗ Work performed while a graduate student at Columbia University, supported by NSF CCF-1115703 and NSF CCF-1319788.
   † Research supported in part by NSF CCF-1649515.
   ‡ Part of the work was done at New York University and Simons Institute, UC Berkeley.
   § Research supported in part by NSF CCF-1649515.



ACM Classification: F.2, G.2
AMS Classification: 68Q32, 68W20, 68Q25, 68Q17
Key words and phrases: property testing, Boolean functions, monotonicity, learning


© 2019 Clément L. Canonne, Elena Grigorescu, Siyao Guo, Akash Kumar, and Karl Wimmer
Licensed under a Creative Commons Attribution License (CC-BY)                            DOI: 10.4086/toc.2019.v015a001

           2. We demonstrate a separation between testing and learning on {0, 1}^d, for k = ω(log d):
              testing k-monotonicity can be performed with exp(O(√d · log d · log(1/ε))) queries,
              while learning k-monotone functions requires exp(Ω(k · √d · 1/ε)) queries (Blais et al.
              (RANDOM 2015));
          3. We present a tolerant test for k-monotonicity of functions f : [n]d → {0, 1} with com-
             plexity independent of n. The test implies a tolerant test for monotonicity of functions
              f : [n]^d → [0, 1] in ℓ1 distance, which makes progress on a problem left open by Berman
             et al. (STOC 2014).
        Our techniques exploit the testing-by-learning paradigm, use novel applications of Fourier
        analysis on the grid [n]d , and draw connections to distribution testing techniques.


1     Introduction
A function f : D → {0, 1}, defined over a finite partially ordered domain (D, ⪯), is said to be k-monotone,
for some integer k ≥ 0, if there does not exist x1 ≺ x2 ≺ · · · ≺ xk+1 in D such that f(x1) = 1 and
f(xi) ≠ f(xi+1) for all i ∈ [k]. Note that 1-monotone functions are the classical monotone functions,
satisfying f(x1) ≤ f(x2) whenever x1 ⪯ x2.
      Monotone functions have been well-studied on multiple fronts in computational complexity due to
their natural structure. They have been studied for decades in the property testing literature [28, 22, 26,
12, 16, 18, 17], where we have recently witnessed exciting results [37, 19, 7], in the circuit complexity
literature, where we now have strong lower bounds [47, 48], and in computational learning, where we
now have learning algorithms in numerous learning models [13, 4, 36, 50, 44, 45].
      The generalized notion of k-monotonicity has also been studied in the context of circuit lower bounds
for more than 50 years. In particular, Markov [40] showed that any k-monotone function (even with
multiple outputs) can be computed using circuits containing only log k negation gates. The presence of
negation gates appears to be a challenge in proving circuit lower bounds: “the effect of such gates on
circuit size remains to a large extent a mystery” [33]. The recent results of Blais et al. [11] on learning
lower bounds have prompted renewed interest in understanding k-monotone functions from multiple
angles, including cryptography, circuit complexity, learning theory, and Fourier analysis ([49, 31, 30, 39]).
      Motivated by the exponential lower bounds on PAC learning k-monotone functions due to [11], we
initiate the study of k-monotonicity in the closely related Property Testing model. In this model, given
query access to a function, one must decide if the function is k-monotone, or is far from being k-monotone,
by querying the input only in a small number of places.


1.1     Our results
We focus on testing k-monotonicity of Boolean functions defined over the d-dimensional hypergrid [n]d ,
and the hypercube {0, 1}d . We begin our presentation with the results for the hypercube, in order to build
intuition about the difficulty of the problem, while comparing our results with the current literature on
testing monotonicity. Our stronger results concern the hypergrid [n]d .


1.1.1    Testing k-monotonicity on the hypercube {0, 1}d
In light of the recent results of [37] that provide an (optimal non-adaptive one-sided) Õ(√d)-query tester
for monotonicity, we first show that testing k-monotonicity is strictly harder than testing monotonicity on
{0, 1}^d, for k ≥ 3.

Theorem 1.1. For 1 ≤ k ≤ d^{1/4}/2, any one-sided non-adaptive tester for k-monotonicity of functions
f : {0, 1}^d → {0, 1} must make Ω((d/k^2)^{k/4}) queries.

     Both Theorem 1.1 and its proof generalize the Ω(d^{1/2}) lower bound for testing monotonicity, due to
Fischer et al. [26].1
     On the upper bounds side, while the monotonicity testing problem provides numerous potential
techniques for approaching this new problem [28, 22, 18, 12, 20, 37], most common techniques appear to
resist generalizations to k-monotonicity. However, our upper bounds demonstrate a separation between
testing and PAC learning k-monotonicity, for large enough values of k = ω(log d).

Theorem 1.2. There exists a one-sided non-adaptive tester for k-monotonicity of functions f : {0, 1}^d →
{0, 1} with query complexity

                             q(d, ε, k) = exp(O(√d · log d · log(1/ε))).

     Indeed, in the related PAC learning model, [11] shows that learning k-monotone functions on the
hypercube requires exp(Ω(k · √d · 1/ε)) many queries.
     We further observe that the recent non-adaptive and adaptive two-sided lower bounds of [19, 7] imply
the same bounds for k-monotonicity, using black box reductions. In Table 1, we summarize our results on
testing k-monotonicity over hypercube and best bounds on monotonicity testing (see Section 1.4.2) for
comparison.

           | upper bound                 | 1. s.-n. a. lower bound    | 2. s.-n. a. lower bound   | 2. s.-a. lower bound
     k = 1 | Õ(d^{1/2}/ε^2) [37]         | Ω(d^{1/2}) [26]            | Ω(d^{1/2−o(1)}) [19]      | Ω̃(d^{1/3}) [21]
     k ≥ 2 | d^{O(√d·log(1/ε))} Thm 1.2  | Ω((d/k^2)^{k/4}) Thm 1.1   | Ω(d^{1/2−o(1)}) Cor 4.7   | Ω̃(d^{1/3}) Cor 4.7

                        Table 1: Testing k-monotonicity of a function f : {0, 1}^d → {0, 1}.



1.1.2    Testing k-monotonicity on the hypergrid [n]d
In the remainder of the paper we discuss functions defined over the d-dimensional hypergrid domain [n]^d,
where we denote by (i1, i2, . . . , id) ⪯ (j1, j2, . . . , jd) the partial order in which i1 ≤ j1, i2 ≤ j2, . . . , id ≤ jd.
Testing monotonicity for functions over hypergrids has received a lot of attention [28, 23, 25, 6, 1, 32,
   1 More specifically, our theorem generalizes the first half of Fischer et al.’s argument, before their spanning tree improvement,
which yields an Ω(d^{1/2}) lower bound for monotonicity (instead of Ω(d^{1/4})). The reason for this is that this second part does
not extend beyond monotonicity, i. e., to k ≥ 2.



10, 16, 18, 17, 8]. As these papers demonstrate, the problem is well-understood. We refer the reader
to Table 4 in Section 1.4.2 for a detailed review on the state of the art in the area. We summarize our
results on testing k-monotonicity over [n]d in Table 2.

           | General k                           | k = 2                             | k = 1 (monotonicity)
     d = 1 | Θ(k/ε) 1. s.-n. a.,                 | O(1/ε) 1. s.-n. a.                | Θ(1/ε) 1. s.-n. a.
           | Õ(1/ε^7) 2. s.-n. a.                |                                   |
     d = 2 | Õ(k^2/ε^4) 1. s.-n. a.              | Θ(1/ε) 2. s.-a.                   | Θ((1/ε) log(1/ε)) 1. s.-n. a.,
           | (from the row below)                |                                   | Θ(1/ε) 2. s.-a.
     d ≥ 3 | Õ((1/ε^2)(5kd/ε)^d) 2. s.-n. a.,    | Õ((1/ε^2)(10d/ε)^d) 2. s.-n. a.,  | O((d/ε) log(d/ε)) 1. s.-n. a.
           | exp(Õ(k√d/ε^2)) 2. s.-n. a.         | exp(Õ(√d/ε^2)) 2. s.-n. a.        |

Table 2: Summary of our results: testing k-monotonicity of a function f : [n]^d → {0, 1} (first two columns).
The last column contains known bounds on monotonicity testing (see Section 1.4.2) and is provided for
comparison.



Testing k-monotonicity on the line and the 2-dimensional grid. We begin with a study of functions
f : [n] → {0, 1}. Note that one-sided testers should always accept k-monotone functions, and so, they must
accept unless they discover a violation of k-monotonicity in the form of a sequence x1 ≺ x2 ≺ · · · ≺ xk+1
in [n], such that f(x1) = 1 and f(xi) ≠ f(xi+1). Therefore, lower bounds for one-sided k-monotonicity
testing must grow at least linearly with k. We show that an even better lower bound of Ω(k/ε) holds
for both adaptive and non-adaptive testers, and moreover, we give a tight non-adaptive algorithm.
Consequently, our results demonstrate that adaptivity does not help in testing k-monotonicity with
one-sided error on the line domain.

Theorem 1.3. Any one-sided (possibly adaptive) tester for k-monotonicity of functions f : [n] → {0, 1}
must have query complexity Ω(k/ε).

    The upper bound generalizes the O(1/ε) tester for monotonicity on the line.

Theorem 1.4. There exists a one-sided non-adaptive tester for k-monotonicity of functions f : [n] → {0, 1}
with query complexity q(n, ε, k) = O(k/ε).

    Testing with two-sided error, however, does not require a dependence on k. In fact the problem
has been well-studied in the machine learning literature in the context of testing/learning “union of
intervals” [35, 5], and in testing geometric properties, in the context of testing surface area [38, 42],2
resulting in an O(1/ε^{7/2})-query algorithm. Namely, the starting point of [5] (later improved by [38]) is a
“Buffon’s Needle”-type argument. There, the crucial quantity to analyze is the noise sensitivity of the
function, that is, the probability that a randomly chosen pair of nearby points crosses a “boundary,” i. e.,
that the two points have different values. (Moreover, the algorithm of [5] works in the active testing setting:
it only requires a weaker access model than the standard query model.)
   2 We thank Eric Blais for mentioning the connection, and pointing us to these articles.




    We provide an alternate proof of a poly(1/ε) bound (albeit with a worse exponent) that reveals a
surprising connection with distribution testing, namely with the problem of estimating the support size of
a distribution.

Theorem 1.5. There exists a two-sided non-adaptive tester for k-monotonicity of functions f : [n] → {0, 1}
with query complexity q(n, ε, k) = Õ(1/ε^7), independent of k.


    An immediate implication of Theorem 1.5 is that one can test even n^{1−α}-monotonicity of f : [n] →
{0, 1}, for every α > 0, with a constant number of queries. Hence, there is a separation between one-sided
and two-sided testing, for k = ω(1).
    Turning to the 2-dimensional grid, we show that 2-monotone functions can be tested with the
minimum number of queries one could hope for:

Theorem 1.6. There exists a two-sided adaptive tester for 2-monotonicity of functions f : [n]2 → {0, 1}
with query complexity q(n, ε) = O(1/ε).

    We also discuss possible generalizations of Theorem 1.6 to general k or d in Section 6.2.

Testing k-monotonicity on [n]d , tolerant testing, and distance approximation. Moving to the gen-
eral grid domain [n]d , we show that k-monotonicity is testable with poly(1/ε, k) queries in constant-
dimension grids.

Theorem 1.7. There exists a non-adaptive two-sided tester for k-monotonicity of functions f : [n]^d →
{0, 1} with query complexity

                     q(n, d, ε, k) = min( Õ((1/ε^2) · (5kd/ε)^d), exp(Õ(k√d/ε^2)) ).

    In fact, we obtain more general testing algorithms than in Theorem 1.7; namely, our results hold for
tolerant testers (as we define next).
    The notion of tolerant testing was first introduced in [46] to account for the possibility of noisy data.
In this notion, a test should accept inputs that are ε1 -close to the property, and reject inputs that are ε2 -far
from the property, where ε1 and ε2 are given parameters. Tolerant testing is intimately connected to
the notion of distance approximation: given tolerant testers for every (ε1 , ε2 ), there exists an algorithm
that estimates the distance to the property within any (additive) ε, while incurring only an Õ(log(1/ε))
factor blow-up in the number of queries. Furthermore, [46] shows that both tolerant testing and distance
approximation are no harder than agnostic learning. We prove the following general result.

Theorem 1.8. There exist

    • a non-adaptive (fully) tolerant tester for k-monotonicity of functions f : [n]^d → {0, 1} with query
      complexity

                     q(n, d, ε1, ε2, k) = Õ( (1/(ε2 − ε1)^2) · (5kd/(ε2 − ε1))^d );



    • a non-adaptive tolerant tester for k-monotonicity of functions f : [n]^d → {0, 1} with query
      complexity

                     q(n, d, ε1, ε2, k) = exp( Õ( k√d/(ε2 − 3ε1)^2 ) ),

        under the restriction that ε2 > 3ε1 .

    To the best of our knowledge, the only previous results for tolerant testing for monotonicity on [n]d are
due to Fattal and Ron [24]. They give both additive and multiplicative distance-approximation algorithms,
and obtain O(d)-multiplicative and ε-additive approximations with query complexity poly(1/ε). While
very efficient, their results only give fully tolerant testers for dimensions d = 1 and d = 2. Our results
generalize results of [24] by showing the existence of tolerant testers for k-monotonicity (and hence for
monotonicity) for any dimension d ≥ 1, and any k ≥ 1, but paying the price in the query complexity.
    As a consequence to Theorem 1.8, we make progress on an open problem of Berman et al. [8], as
explained next.

Testing k-monotonicity under L p distance. The property of being a monotone Boolean function has a
natural extension to real-valued functions. Indeed, a real-valued function defined over a finite domain D is
monotone if f (x) ≤ f (y) whenever x  y. For real-valued functions the more natural notion of distance is
L p distance, rather than Hamming distance. The study of monotonicity has been extended to real-valued
functions in a recent article by Berman et al. [8]. They give tolerant testers for grids of dimension d = 1
and d = 2, and leave open the problem of extending the results to general d, as asked explicitly at the
recent Sublinear Algorithms Workshop 2016 [52].
     We make progress towards solving this open problem, by combining our Theorem 1.8 with a
reduction from L p testing to Hamming testing inspired by [8]. This reduction relates L1 -distance to
monotonicity of a function f : [n]d → [0, 1] to Hamming distance to monotonicity of a “rounded” function
fe: [n]d × [m] → {0, 1}, essentially trading the range for an extra dimension (where m is a rounding
parameter to be suitably chosen). Moreover, simulating query access to fe can be performed efficiently
given query access to f .
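For intuition, here is one plausible way to realize such a rounding (a hypothetical sketch: the precise construction and the choice of m in [8] and in the paper may differ). Each value f(x) ∈ [0, 1] is traded for a monotone column of m Boolean values over the extra coordinate.

```python
def round_to_boolean(f, m):
    """Build ftilde: D x [m] -> {0, 1} from f: D -> [0, 1] by
    thresholding: ftilde(x, i) = 1  iff  f(x) >= (m - i + 1)/m.

    Larger i means a lower threshold, so each column i -> ftilde(x, i)
    is nondecreasing in i, and if f is monotone then ftilde is monotone
    on D x [m].  Averaging a column recovers f(x) up to 1/m, which is
    how Hamming distance of ftilde tracks the L1 distance of f.
    """
    def ftilde(x, i):                    # i ranges over 1, ..., m
        return 1 if f(x) >= (m - i + 1) / m else 0
    return ftilde
```

For example, with f ≡ 0.5 and m = 4 the column reads [0, 0, 1, 1], whose average 0.5 recovers f(x) exactly; simulating a query to ftilde clearly costs one query to f.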

Theorem 1.9. There exists a non-adaptive tolerant L1-tester for monotonicity of functions f : [n]^d → [0, 1]
with query complexity

    • Õ( (1/(ε2 − ε1)^2) · (5d/(ε2 − ε1))^d ), for any 0 ≤ ε1 < ε2 ≤ 1;

    • exp( Õ( √d/(ε2 − 3ε1)^2 ) ), for any 0 ≤ 3ε1 < ε2 ≤ 1.

1.2     Proof overview and technical contribution
Structural properties and the separation between testing and learning on {0, 1}d . We first observe
that basic structural properties, such as extendability (i. e., the feature that a function that is monotone on
a sub-poset of [n]d can be extended into a monotone function on the entire poset domain), and properties
of the violation graph (i. e., the graph whose edges encode the violations to monotonicity), extend easily
to k-monotonicity (see Section 3). These properties help us to argue the separation between testing and


learning (Theorem 1.2). However, unlike the case of monotonicity testing, these properties do not seem
to be enough for showing upper bounds that grow polynomially in d.

Grid coarsening and testing by implicit/explicit learning. One technique, which underlies all the
hypergrid upper bounds in this article is that of gridding: i. e., partitioning the domain into “blocks”
whose size no longer depends on the main parameter of the problem, n. This technique generalizes the
approach of [24] who performed a similar gridding for dimension d = 2. By simulating query access to
the “coarsened” version of the unknown function (with regard to these blocks), we are able to leverage
methods such as testing-by-learning (either fully or partially learning the function), or reduce our testing
problem to a (related) question on these nicer “coarsenings.” (The main challenge here lies in providing
efficient and consistent oracle access to the said coarsenings.)
     At a high-level, the key aspect of k-monotonicity which makes this general approach possible is
reminiscent of the concept of heredity in property testing. Specifically, we rely upon the fact that “gridding
preserves k-monotonicity”: if f is k-monotone, then so is its coarsening g, but now g is much
simpler to handle. This allows us to trade the domain [n]^d for what is effectively [m]^d, with m ≪ n. We
point out that this differs from the usual paradigm of dimension reduction: indeed, the latter would reduce
the study of a property of functions on [n]^d to that of functions on [n]^{d′} for d′ ≪ d (usually even d′ = 1)
by projecting f on a lower-dimensional domain. In contrast, we do not take the dimension down, but
instead reduce the size of the alphabet. Moreover, it is worth noting that this gridding technique is also
orthogonal to that of range reduction, as used, e. g., in [22]. Indeed, the latter is a reduction of the range
of the function from [R] to {0, 1}, while gridding is only concerned about the domain size.
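On the line, the gridding idea can be sketched in a few lines (a toy version: the paper's oracle must in addition answer repeated queries consistently, and handle higher dimensions). Because the block representatives are read in increasing order, the coarsened value sequence is a subsequence of f's values, so coarsening can only decrease the number of alternations and thus preserves k-monotonicity.

```python
import random

def coarsen_line(f, n, m, rng=random):
    """Toy coarsening of f: [n] -> {0, 1} into g on m blocks.

    [n] is split into m contiguous blocks and g(b) is f queried at a
    random representative of block b.  Read left to right, g's values
    form a subsequence of f's values, so if f is k-monotone, g is too.
    """
    def g(b):                            # b in {0, ..., m - 1}
        lo, hi = b * n // m, (b + 1) * n // m
        return f(rng.randrange(lo, hi))
    return g
```

Note that each query to g costs one query to f and that m no longer depends on n, which is the point of the technique.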

Estimating the support of distributions. Our proof of the poly(1/ε) upper bound for testing k-
monotonicity on the line (Theorem 1.5) rests upon an unexpected connection to distribution testing,
namely to the question of support size estimation of a probability distribution. In more detail, we describe
how to reduce k-monotonicity testing to the support size estimation problem in (a slight modification of)
the Dual access model introduced by Canonne and Rubinfeld [15], where the tester is granted samples
from an unknown distribution as well as query access to its probability mass function.
    For our reduction to go through, we first describe how any function f : [n] → {0, 1} determines a
probability distribution D f (on [n]), whose effective support size is directly related to the k-monotonicity
of f . We then show how to implement dual access to this D f from queries to f : in order to avoid any
dependence on k and n in this step, we resort both to the gridding approach outlined above (allowing us to
remove n from the picture) and to a careful argument to “cap” the values of D f returned by our simulated
oracle. Indeed, obtaining the exact value of D f (x) for arbitrary x may require Ω(k) queries to f , which
we cannot afford; instead, we argue that only returning D f (x) whenever this value is “small enough”
is sufficient. Finally, we show that implementing this “capped” dual access oracle is possible with no
dependence on k whatsoever, and we can now invoke the support size estimation algorithm of [15] to
conclude.
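The flavor of the f ↦ D_f correspondence can be illustrated as follows. This is an illustrative global construction, not the paper's query-efficient dual-access implementation: give each maximal constant run of f mass proportional to its length, so that the support size of the resulting distribution is exactly one more than the number of alternations of f.

```python
def run_distribution(values):
    """Map f: [n] -> {0, 1} (given as its value list) to a distribution
    over its maximal constant runs, with mass |run| / n per run.

    The support size equals (#alternations of f) + 1, which is what
    links k-monotonicity testing to support-size estimation.  Computing
    one mass exactly may require scanning a whole run of length Omega(n/k),
    which is the cost the "capped" oracle in the text avoids paying.
    """
    n = len(values)
    masses, start = [], 0
    for i in range(1, n + 1):
        if i == n or values[i] != values[i - 1]:
            masses.append((i - start) / n)
            start = i
    return masses
```

Sampling from this distribution is cheap (draw a uniform x and report its run), whereas evaluating a mass exactly is not, which motivates the capping argument sketched above.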

Fourier analysis on the hypergrid. We give an algorithm for fully tolerantly testing k-monotonicity
whose query complexity is exponential in d. We also describe an alternate tester (with a slightly worse
tolerance guarantee) whose query complexity is instead exponential in Õ(k√d) for constant distance


parameters. As mentioned above, we use our gridding approach combined with tools from learning
theory. Specifically, we employ an agnostic learning algorithm of [34] using polynomial regression. Our
coarsening methods allow us to treat the domain as if it were [m]d for some m that is independent of n. To
prove that this agnostic learning algorithm will succeed, we turn to Fourier analysis over [m]d . We extend
the bound on average sensitivity of k-monotone functions over the Boolean hypercube from [11] to the
hypergrid, and we show that this result implies that the Fourier coefficients are concentrated on “simple”
functions.
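The regression step can be sketched as follows (a simplified least-squares stand-in for the polynomial-regression approach of [34], which analyzes the ℓ1 variant; the degree bound would come from the Fourier-concentration result above).

```python
import numpy as np
from itertools import combinations

def poly_regression_learn(X, y, degree):
    """Fit +/-1 labels y by a degree-<= `degree` multilinear polynomial
    over +/-1 inputs X (rows are examples) via least squares, then
    predict with the sign of the fitted polynomial -- the
    "polynomial regression" template behind agnostic learning from
    Fourier concentration.
    """
    def phi(Z):
        Z = np.asarray(Z, dtype=float)
        cols = [np.ones(len(Z))]
        for r in range(1, degree + 1):
            for S in combinations(range(Z.shape[1]), r):
                # multilinear monomial prod_{i in S} z_i (a Fourier character)
                cols.append(np.prod(Z[:, list(S)], axis=1))
        return np.column_stack(cols)
    w, *_ = np.linalg.lstsq(phi(X), np.asarray(y, dtype=float), rcond=None)
    return lambda Z: np.sign(phi(Z) @ w)
```

For instance, the parity of two bits is learned exactly at degree 2 (its Fourier mass sits on the monomial z1·z2), while no degree-1 polynomial correlates with it; Fourier concentration on "simple" functions is precisely what keeps the degree, and hence the sample complexity, small.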

1.3     Discussion and open problems
This is the first article to study k-monotonicity in the property testing setting, a natural and well-motivated
generalization of monotonicity. Hence these results open up many intriguing questions in the area of
property testing, with potential applications to learning theory, circuit complexity and cryptography.
    A natural question emerging from our results here is whether k-monotonicity on the hypercube {0, 1}^d
can be tested with poly(d^k) queries, as one may guess by “smoothly” generalizing monotonicity. As it
turns out, this is actually not the case: subsequent results [29] show that even testing 2-monotonicity,
non-adaptively and with one-sided error, requires 2^{Ω(√d)} many queries.
    Another important open question concerns the hypergrid domain, and in particular it pushes for a
significant strengthening of Theorem 1.7 and Theorem 1.9:

        Can k-monotonicity on the hypergrid [n]^d be (tolerantly) tested with 2^{o_k(√d)} queries?

Answering this question would imply further progress on the L1 -testing question for monotonicity, left
open in [8, 52].
    There also remains the question of establishing two-sided lower bounds that would go beyond those
of monotonicity. Specifically:

        Is there a d^{Ω(k)}-query two-sided lower bound for k-monotonicity on the hypercube {0, 1}^d?

    In this article we also show surprising connections to distribution testing (e. g., in the proof of The-
orem 1.5), and to testing union of intervals and testing surface area (as discussed in Section 5.2). An
intriguing direction is to generalize this connection to union of intervals and surface area in higher
dimensions, to leverage or gain insight on k-monotonicity on the d-dimensional hypergrid.
    Finally, while we only stated here a few directions, we emphasize that every question that is relevant
to monotonicity is relevant and interesting in the case of k-monotonicity as well.

1.4     Related work
1.4.1    Previous work on k-monotonicity
As mentioned, k-monotonicity has deep connections with the notion of negation complexity of functions,
which is the minimum number of negation gates needed in a circuit to compute a given function. The
power of negation gates is intriguing and far from being understood in the context of circuit lower bounds.


Quoting from Jukna’s book [33]: “the main difficulty in proving nontrivial lower bounds on the size of
circuits using AND, OR, and NOT is the presence of NOT gates: we already know how to prove even
exponential lower bounds for monotone functions if no NOT gates are allowed. The effect of such gates
on circuit size remains to a large extent a mystery.”
    This gap has motivated the study of circuits with few negations. Two notable works successfully
extend lower bounds in the monotone setting to the negation-limited setting: in [3], Amano and Maruoka
show superpolynomial circuit lower bounds for (1/6) log log n negations using the CLIQUE function; and
recently a breakthrough result of Rossman [49] establishes circuit lower bounds for NC^1 with roughly
(1/2) log n negations by drawing upon his lower bound for monotone NC^1.
    The divide between the understanding of monotone and non-monotone computation exists in general:
while we usually have a fairly good understanding of the monotone case, many things get murky or fail
to hold even when a single negation gate is allowed. In order to get a better grasp on negation-limited
circuits, a body of recent work has been considering this model in various contexts: Blais et al. [11] study
negation-limited circuits from a computational learning viewpoint, Guo et al. [31] study the possibility of
implementing cryptographic primitives using few negations, and Lin and Zhang [39] are interested in
verifying whether some classic Boolean function conjectures hold for the subset of functions computed
by negation-limited circuits.
    Many of these results implicitly or explicitly rely on a simple but powerful tool: the decomposition of
negation-limited circuits into a composition of some “nice” function with monotone components. Doing
so enables one to apply results on separate monotone components, and finally to carefully combine the
outcomes (e. g., [30]). Though these techniques can yield results for as many as O(log n) negations, they
also leave open surprisingly basic questions:

    • [11] Is there an efficient algorithm that weakly learns circuits with a single negation?

    • [31] Is there a pseudorandom generator computed with a single negation gate?

   In contexts where circuit size is not the quantity of interest, the equivalent notion of 2-monotone
functions is more natural than that of circuits with a single negation gate. Albeit seemingly simple,
even the class of 2-monotone functions remains largely a mystery: as exemplified above, many basic yet
non-trivial questions about k-monotone functions, ranging from the structure of their Fourier spectrum
to their expressive power, remain open.


1.4.2   Previous work on monotonicity testing

In this section, we summarize the state-of-the-art on monotonicity testing. We observe that this question
has been considered for functions over various domains (e. g., hypergrids, hypercubes and general posets)
and ranges (notably Boolean range {0, 1} and unbounded range N); as hypergrids and hypercubes are
arguably the domains that have received the most attention in the literature, we will in this overview
restrict ourselves to work on these, and refer readers to [26, 9] for other posets. We will also
focus on the Boolean range {0, 1}, which is most relevant to our work, and then briefly mention the best
known results (which are also tight) for the unbounded range N. At the end of this section, we include
known results for tolerant monotonicity testing.

                        Theory of Computing, Volume 15 (1), 2019, pp. 1–55
     Clément L. Canonne, Elena Grigorescu, Siyao Guo, Akash Kumar, and Karl Wimmer

     Before we go over those results, we recall some notation: namely, testers can make adaptive (a.) or
non-adaptive (n. a.) queries, and can have one-sided (1. s.) or two-sided (2. s.) error. The best one could
hope for would then be one-sided non-adaptive upper bounds complemented with two-sided adaptive
lower bounds. We note that all testers discussed below, except the tolerant ones, are one-sided (and
almost all of them non-adaptive) algorithms.

Hypercubes with Boolean Range. The problem of monotonicity testing was introduced by Goldreich et
al. [28] for functions f : {0, 1}^d → {0, 1}; they present a simple “edge tester” with query complexity
O(d/ε). A tester with O(d^{7/8}/ε^{3/2}) queries, the first improvement in terms of the dependence on d and
the first to “break” the linear barrier, was presented by Chakrabarty and Seshadhri [18], further improved
to Õ(d^{5/6}/ε^4) by Chen et al. [20]. Recently, an Õ(√d/ε^2) upper bound was established by Khot et al. [37],
which is optimal (for non-adaptive one-sided testers) up to logarithmic factors. All these upper bounds
are obtained for one-sided, non-adaptive testers.
     For one-sided non-adaptive testers, Fischer et al. [26] showed an Ω(√d) lower bound. For two-sided
non-adaptive testers, Chen et al. [20] obtained an Ω̃(d^{1/5}) lower bound, further improved by Chen et
al. [19] to Ω(d^{1/2−c}) (for any constant c > 0). As all these lower bounds apply to non-adaptive testers,
they only imply an Ω(log d) lower bound for adaptive ones. Recently, Belovs and Blais [7] showed an
Ω̃(d^{1/4}) lower bound for two-sided adaptive testers, i. e., an exponential improvement over the previous
bounds, further improved by Chen, Waingarten, and Xie [21] to Ω̃(d^{1/3}). All mentioned lower bounds
hold for constant ε > 0, and are summarized in Table 3.

                   Domain      Upper bound                        Lower bound
                   {0, 1}^d    Õ(d^{1/2}/ε^2) 1. s.-n. a. [37]    Ω(d^{1/2}) 1. s.-n. a. [26]
                                                                  Ω(d^{1/2−o(1)}) 2. s.-n. a. [19]
                                                                  Ω̃(d^{1/3}) 2. s.-a. [21]

                     Table 3: Testing monotonicity of a function f : {0, 1}^d → {0, 1}.


Hypergrids with Boolean Range. We remark that most previously known upper bounds for testing
monotonicity over hypergrids are for unbounded range, which will be the focus of the next section.
Here, we only mention the case of Boolean range, giving in each setting the current best known
results. For testing monotonicity over the line with Boolean range (i. e., the case d = 1), both a one-sided
non-adaptive O(1/ε) upper bound and a two-sided adaptive Ω(1/ε) lower bound are known (both of
them folklore). For d = 2, Berman et al. [8] showed a tight bound of Θ((log(1/ε))/ε) for one-sided
non-adaptive testers. Interestingly, they also prove that adaptivity helps in the d = 2 case: namely, they
establish a tight one-sided adaptive O(1/ε) upper bound, which beats the Ω((log(1/ε))/ε) lower bound
for one-sided non-adaptive testers. For general d, Berman et al. [8] give both a one-sided non-adaptive
tester with query complexity O((d/ε) log(d/ε)), and a one-sided adaptive tester with query complexity

                              O( d·2^d·log^{d−1}(1/ε) + (d^2 log d)/ε ).
The best known results can be found in Table 4.


         Domain    Upper bound                                             Lower bound
         [n]       O(1/ε) 1. s.-n. a.                                      Ω(1/ε) 2. s.-a.
         [n]^2     O((1/ε) log(1/ε)) 1. s.-n. a. [8]                       Ω((1/ε) log(1/ε)) 1. s.-n. a. [8]
                   O(1/ε) 1. s.-a. [8]                                     Ω(1/ε) 2. s.-a.
         [n]^d     O((d/ε) log(d/ε)) 1. s.-n. a. [8]                       Ω((1/ε) log(1/ε)) 1. s.-n. a. [8]
                   O(d·2^d·log^{d−1}(1/ε) + (d^2 log d)/ε) 1. s.-a. [8]    Ω(1/ε) 2. s.-a.

                       Table 4: Testing monotonicity of a function f : [n]^d → {0, 1}.


Unbounded Range. For unbounded range, tight upper and lower bounds are known for both hypergrid
and hypercube domains. Chakrabarty and Seshadhri [17] describe a one-sided non-adaptive tester with
O(d log n/ε) queries for the hypergrid [n]^d. Later, they show that O(d log n/ε) queries are essentially
optimal even for two-sided adaptive testers [16]. For the hypercube, [17] give a one-sided non-adaptive
tester making O(d/ε) queries, and a matching two-sided adaptive lower bound was proved by Joshua
Brody (mentioned as a private communication in [16]). We refer readers to [17, 16] for overviews of
previous results on testing monotonicity over the hypercube and hypergrid with unbounded range. The
best known results are summarized in Table 5.

         Domain      Upper bound                        Lower bound
         {0, 1}^d    O(d/ε) 1. s.-n. a. [17]            Ω(d/ε) 2. s.-a. [16]
         [n]^d       O((d log n)/ε) 1. s.-n. a. [17]    Ω((d log n)/ε − (1/ε) log(1/ε)) 2. s.-a. [16]

                         Table 5: Testing monotonicity of a function f : D → N.


Tolerant Testing. To the best of our knowledge, prior to our results, tolerant testers for monotonicity
of Boolean functions over the hypergrid were only known for dimension d ∈ {1, 2}. Specifically, an
O(ε2/(ε2 − ε1)^2)-query upper bound is known for d = 1, while an Õ(1/(ε2 − ε1)^4)-query one is known
for d = 2 [8, 24].

                   Domain    Upper bound                           Lower bound
                   [n]       O(ε2/(ε2 − ε1)^2) 2. s.-a. [8]        ?
                   [n]^2     Õ(1/(ε2 − ε1)^4) 2. s.-n. a. [24]     ?

                 Table 6: Tolerant testing of monotonicity of a function f : [n]^d → {0, 1}.


1.5   Organization of the paper
After recalling some notation and definitions in Section 2 and providing some insight into the structural
properties of functions far from k-monotonicity in Section 3, we consider the case of the Boolean
hypercube in Section 4, where we establish lower bounds on testing k-monotonicity of functions


 f : {0, 1}d → {0, 1} for both one- and two-sided algorithms, and provide an algorithm which “beats” the
testing-by-learning approach, showing that testing is provably easier than learning.
      Next, we establish our results for functions on the line in Section 5, starting with the lower and upper
bounds for one-sided testers before turning in Section 5.2 to the two-sided upper bound of Theorem 1.3.
We then describe in Section 6 our results for functions on the grid [n]2 , focusing on the case k = 2; and
discussing possible extensions in Section 6.2.
      Section 7 contains our general algorithms for k-monotonicity on the hypergrid [n]d , for arbitrary
k and d. We prove Theorem 1.8 in two parts. We establish its first item (a general tolerant testing
algorithm with exponential dependence on d) in Section 7.1 (Proposition 7.2). The second item (with
query complexity exponential in k√d) is proven in Section 7.2, where we analyze the Fourier-based
tolerant tester of Proposition 7.11. We then apply these results to the question of tolerant L1-testing of
monotonicity in Section 8, after describing a reduction between monotonicity of functions [n]^d → [0, 1]
and of [n]^{d+1} → {0, 1}.

   Except perhaps for Section 8, which depends on Section 7, all sections are independent and self-contained,
and the reader may choose to read them in any order.


2    Preliminaries
We denote by log the binary logarithm, and use Õ(·) to hide polylogarithmic factors in the argument (so
that Õ( f ) = O( f log^c f ) for some c ≥ 0).
    Given two functions f , g : X → Y on a finite domain X, we write dist( f , g) for the (normalized)
Hamming distance between them, i. e.,

                    dist( f , g) = (1/|X|) ∑_{x∈X} 1{ f (x) ≠ g(x)} = Pr_{x∼X} [ f (x) ≠ g(x) ]

where x ∼ X refers to x being drawn from the uniform distribution on X. A property of functions from X
to Y is a subset P ⊆ Y^X of these functions; we define the distance of a function f to P as the minimum
distance of f to any g ∈ P:
                               dist( f , P) = inf_{g∈P} dist( f , g) .

    For some of our applications, we will also use another notion of distance specific to real-valued
functions, the L1 distance. Testing under L1 distance is a natural notion that has been studied ever since
the first works in Property Testing (e. g., [27]).
    For f , g : X → [0, 1], we write

                L1( f , g) = (1/|X|) ∑_{x∈X} | f (x) − g(x)| = E_{x∼X} [ | f (x) − g(x)| ] ∈ [0, 1]

and extend the definition to L1( f , P), for P ⊆ [0, 1]^X, as before.
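For concreteness, both distances can be computed directly on a small finite domain. The following sketch is our own illustration (not from the paper): it represents a function as a dictionary over its domain, and computes dist( f , P) by brute-force enumeration, which is feasible only for tiny domains.

```python
from itertools import product

def hamming_dist(f, g, domain):
    """Normalized Hamming distance: the fraction of points where f and g disagree."""
    return sum(f[x] != g[x] for x in domain) / len(domain)

def l1_dist(f, g, domain):
    """L1 distance: the average absolute difference of two [0, 1]-valued functions."""
    return sum(abs(f[x] - g[x]) for x in domain) / len(domain)

def dist_to_property(f, domain, in_P):
    """Brute-force dist(f, P): minimum Hamming distance from f to any g in P,
    enumerating all 2^|domain| Boolean functions (only feasible for tiny domains)."""
    best = 1.0
    for values in product([0, 1], repeat=len(domain)):
        g = dict(zip(domain, values))
        if in_P(g, domain):
            best = min(best, hamming_dist(f, g, domain))
    return best

# Example: the alternating function on the line [4] vs. the monotone functions.
line = [0, 1, 2, 3]
f = {0: 1, 1: 0, 2: 1, 3: 0}
monotone = lambda g, dom: all(g[x] <= g[y] for x in dom for y in dom if x < y)
print(dist_to_property(f, line, monotone))  # 0.5
```

For instance, the alternating function (1, 0, 1, 0) on a line of four points is at distance 1/2 from monotone: two of its four values must be flipped.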

                         T HEORY OF C OMPUTING, Volume 15 (1), 2019, pp. 1–55                              12
                                        T ESTING k-M ONOTONICITY

Property testing. We recall the standard definition of testing algorithms, as well as some terminology:
Definition 2.1. Let P be a property of functions from X to Y. A q-query testing algorithm for P is a
randomized algorithm T which takes as input ε ∈ (0, 1] as well as query access to a function f : X → Y.
After making at most q(ε) queries to the function, T either outputs ACCEPT or REJECT, such that the
following holds:
    • if f ∈ P, then T outputs ACCEPT with probability at least 2/3;                        (Completeness)

    • if dist( f , P) ≥ ε, then T outputs REJECT with probability at least 2/3;                 (Soundness)
where the probability is taken over the algorithm’s randomness. If the algorithm only errs in the second
case but accepts any function f ∈ P with probability 1, it is said to be a one-sided tester; otherwise, it
is said to be two-sided. Moreover, if the queries made to the function can only depend on the internal
randomness of the algorithm, but not on the values obtained during previous queries, it is said to be
non-adaptive; otherwise, it is adaptive.
    Additionally, we will also be interested in tolerant testers—roughly, algorithms robust to a relaxation
of the first item above:
Definition 2.2. Let P, X, and Y be as above. A q-query tolerant testing algorithm for P is a randomized
algorithm T which takes as input 0 ≤ ε1 < ε2 ≤ 1, as well as query access to a function f : X → Y. After
making at most q(ε1 , ε2 ) calls to the oracle, T outputs either ACCEPT or REJECT, such that the following
holds:
    • if dist( f , P) ≤ ε1 , then T outputs ACCEPT with probability at least 2/3;           (Completeness)

    • if dist( f , P) ≥ ε2 , then T outputs REJECT with probability at least 2/3;               (Soundness)
where the probability is taken over the algorithm’s randomness. The notions of one-sidedness and
adaptivity of Definition 2.1 extend to tolerant testing algorithms as well.
    Note that as stated, in both cases the algorithm “knows” X, Y, and P; so that the query complexity q
can be parameterized by these quantities. More specifically, when considering X = [n]d and the property
P of k-monotonicity, we will allow q to depend on n, d, and k. Finally, we shall sometimes require a
success probability of 1 − δ instead of the (arbitrary) constant 2/3; by standard techniques, this can be
achieved at the cost of a multiplicative O(log(1/δ )) factor in the query complexity.
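The amplification just mentioned is standard; a minimal sketch of it (ours, with an arbitrary choice of the constant in the O(log(1/δ)) repetition count) runs the base tester independently and takes a majority vote, relying on a Chernoff bound:

```python
import math
import random

def amplify(base_tester, delta):
    """Boost a tester that is correct with probability >= 2/3 to one correct with
    probability >= 1 - delta: repeat it independently an odd number O(log(1/delta))
    of times and take the majority vote (standard Chernoff-bound argument)."""
    reps = 2 * math.ceil(18 * math.log(1 / delta)) + 1  # odd; the constant 18 is arbitrary
    votes = sum(bool(base_tester()) for _ in range(reps))
    return 2 * votes > reps  # ACCEPT iff a majority of the runs accept

# Toy base tester: simulates a correct run with probability 2/3.
base = lambda: random.random() < 2 / 3
print(amplify(base, delta=1e-6))  # True with probability >= 1 - 1e-6
```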

PAC and agnostic learning [51]. A learning algorithm A for a concept class C of functions f : X → Y
(under the uniform distribution) is given parameters ε, δ > 0 and sample access to some target function
 f ∈ C via labeled samples hx, f (x)i, where x is drawn uniformly at random from X. The algorithm should
output a hypothesis h : X → Y such that dist(h, f ) ≤ ε with probability at least 1 − δ . The algorithm
is efficient if it runs in time poly(n, 1/ε, 1/δ ). If A must output h ∈ C we say it is a proper learning
algorithm, otherwise, we say it is an improper learning one.
    Moreover, if A still succeeds when f does not actually belong to C, we say it is an agnostic learning
algorithm. Specifically, the hypothesis function h that it outputs must satisfy dist( f , h) ≤ OPT_f + ε with
probability at least 1 − δ , where OPT_f = min_{g∈C} dist( f , g).


3    Structural results
In this section, we prove that the distance to k-monotonicity of a Boolean function f can be expressed
in a combinatorial way, one which does not require measuring the distance between f and the closest
k-monotone function to f . We prove this for a general finite poset domain, building on the ideas
of [26]. In the rest of this section, we denote by P = (V, ⪯) an arbitrary poset, the underlying domain of
the function.

Definition 3.1. We define the violated pattern K as the sequence of alternating bits

                                             K = (b1 , b2 , · · · , bk , bk+1 )

of length (k + 1), where b1 = 1 and consecutive bits alternate, i. e., bi ≠ bi+1 for all i ∈ [k].

A function f : P → {0, 1} is k-monotone if and only if it avoids K; that is, if for any x1 ≺ x2 ≺ . . . ≺ xk+1 ∈ P
we have f (xi) ≠ K(i) for some i ∈ [k + 1].
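On a single chain, this characterization is checkable by a greedy left-to-right scan for the subsequence K. The sketch below is our own illustration, with the values of f listed along the chain:

```python
def contains_pattern(values, k):
    """Return True iff the f-values along a chain contain the violated pattern
    K = (1, 0, 1, ...) of length k + 1 as a subsequence. Greedy leftmost matching
    is optimal for subsequence containment of a fixed pattern."""
    matched, expected = 0, 1      # K starts with the bit 1
    for v in values:
        if v == expected:
            matched, expected = matched + 1, 1 - expected
            if matched == k + 1:
                return True
    return False

def is_k_monotone_on_chain(values, k):
    """f is k-monotone along this chain iff the pattern K does not appear."""
    return not contains_pattern(values, k)

print(is_k_monotone_on_chain([0, 1, 1, 0], 1))  # False: subsequence (1, 0) violates monotonicity
print(is_k_monotone_on_chain([0, 1, 1, 0], 2))  # True: no subsequence (1, 0, 1)
```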
    Using insights from the literature on monotonicity testing, we show that functions far from k-
monotonicity have a large matching of “violated hyperedges” in the “violation hypergraph” which we
define shortly. Let us first recall the definition of the “violation graph,” which has been extremely useful
in monotonicity testing, as seen in [23, 46, 32, 2, 26].

Definition 3.2 (Violation graph). Given a function f : P → {0, 1}, the violation graph of f is defined as
Gviol ( f ) = (P, E(Gviol )), where (x, y) ∈ E(Gviol ) if x, y ∈ P form a monotonicity-violating pair in f ,
that is, x ≺ y but f (x) > f (y).

    The following theorem about violation graphs, proved in [26], has been extremely useful in the
monotonicity testing literature.

Theorem 3.3. Let f : P → {0, 1} be a function that is ε-far from monotone. Then, there exists a matching
of edges in the violation graph for f of size at least ε |P| /2.

Now let us define a generalization of this concept, the violation hypergraph.

Definition 3.4 (Violation hypergraph). Given a function f : P → {0, 1}, the violation hypergraph of f is
Hviol ( f ) = (P, E(Hviol )) where (x1 , x2 , · · · , xk+1 ) ∈ E(Hviol ) if the ordered (k + 1)-tuple x1 ≺ x2 ≺ · · · ≺
xk+1 forms a violation to k-monotonicity in f .

Note that Hviol ( f ) is a (k + 1)-uniform hypergraph. For a (k + 1)-tuple (x1 ≺ x2 ≺ · · · ≺ xk+1 ) ∈ E(Hviol ),
we will sometimes refer to the tuple as a (k + 1)-uniform hyperedge. We now state the main theorem
we prove in this section, which offers the alternate characterization of the distance to k-monotonicity
that we seek. We recall that a set of hyperedges forms a matching in a hypergraph if any two distinct
hyperedges are disjoint.

Theorem 3.5. Let f : P → {0, 1} be a function that is ε-far from k-monotone. Then, there exists a
matching of (k + 1)-uniform hyperedges of size at least ε |P|/(k + 1) in the violation hypergraph.


     To prove this theorem we first exploit the key notion of extendability (also explored in the context of
testing monotonicity in [26]) which we define below. Later we will show that k-monotone functions are
extendable.
Definition 3.6 (Extendability). A property of Boolean functions is said to be extendable over a poset
domain if the following holds for any set X ⊆ P: given a function f : X → {0, 1} which has the property
(on X), it is possible to define a function g : P → {0, 1} such that g(x) = f (x), ∀x ∈ X and g has the
property.
In other words, a property is extendable if for any subset X ⊆ P, given a function defined over the set X
which respects the property, it is possible to fill in values outside X such that the new function obtained
continues to respect the property. Next, we show that k-monotonicity is an extendable property:
Lemma 3.7. k-monotonicity is an extendable property.
Proof of Lemma 3.7. Consider X ⊆ P and some function f : X → {0, 1} which is k-monotone over X.
That is, for any x1 ≺ x2 ≺ · · · ≺ xk+1 ∈ X there exists i ∈ [k + 1] such that f (xi ) ≠ K[i]. Take a minimal
point v ∈ P \ X: that is, a point such that for any other v′ ∈ P \ X, either v ⪯ v′ or v and v′ are
incomparable. We will use the following result:
Claim 3.8. There exists a function g : X ∪ {v} → {0, 1} such that g(x) = f (x) for all x ∈ X, and g respects
k-monotonicity over its domain.
    Before proving this claim, we show how it implies Lemma 3.7. Namely, starting from any function
f : X → {0, 1} which is k-monotone on its domain X, we repeatedly apply Claim 3.8 to a minimal point
of the remaining domain, until we obtain a function defined over the entire poset which respects
k-monotonicity.

Proof of Claim 3.8. We proceed by contradiction. Suppose there is no good assignment available
for g(v), i. e., both choices g(v) = 0 and g(v) = 1 lead to a violation of k-monotonicity in g.
Consider the choice g(v) = 0. Since it results in a violation of k-monotonicity, there is a
path P0 = (x1 ≺ x2 ≺ · · · ≺ xk+1 ) which realizes the violated pattern; clearly v ∈ P0 , so let i be
such that xi = v. Similarly, there is a path P1 = (y1 ≺ y2 ≺ · · · ≺ yk+1 ) corresponding to the choice
g(v) = 1 which also realizes the violated pattern, and some j such that yj = v. Thus g(xt ) = K[t] = g(yt )
for all t ∈ [k + 1], and the two paths P0 and P1 meet at v. We will see that one of the two paths
                             P0′ = (x1 ≺ x2 ≺ · · · ≺ xi−1 ≺ yi ≺ yi+1 ≺ · · · ≺ yk+1 )
or
                             P1′ = (y1 ≺ y2 ≺ · · · ≺ yj−1 ≺ xj ≺ xj+1 ≺ · · · ≺ xk+1 )
is already a violation of k-monotonicity in f . To see this, recall that v is the ith vertex of P0 and the
jth vertex of P1 , and note that i ≠ j: indeed, g(v) = K[i] = 0 by the choice defining P0 , while
g(v) = K[ j] = 1 by the choice defining P1 . Without loss of generality, suppose i < j. In this case, the
evaluations of f along the path P1′ form the violated pattern. Indeed, every vertex of P1′ is distinct
from v and thus lies in X, where f agrees with g; the values match K along the segment
(y1 ≺ y2 ≺ · · · ≺ yj−1 ) and along the segment (xj ≺ xj+1 ≺ · · · ≺ xk+1 ); and at the junction,
f (yj−1 ) = K[ j − 1] ≠ K[ j] = f (xj ). So the path P1′ indeed contains a violation of k-monotonicity, as
claimed. The other case, i > j, is analogous, using P0′ . Hence the claim follows.





    In the next lemma, we show that there is a nice characterization of distance to k-monotonicity in
terms of the size of the minimum vertex cover of the violation hypergraph.3

Lemma 3.9. Let Mk denote the set of k-monotone functions over the poset P, and let f : P → {0, 1}. Then
dist( f , Mk ) = εf if, and only if, the size of the minimum vertex cover of Hviol ( f ), denoted
|VCmin (Hviol ( f ))|, is εf |P|.

Proof of Lemma 3.9. We establish separately the two inequalities.
Claim 3.10. dist( f , Mk ) ≥ |VCmin (Hviol )| / |P|.
Proof. Suppose the distance to k-monotonicity is εf , and let g be a k-monotone function achieving
it, so that dist( f , g) = εf . Define X = { x ∈ P : f (x) ≠ g(x) } (thus, |X| = εf |P|). Since g is
k-monotone and agrees with f outside X, any hyperedge of Hviol ( f ) avoiding X entirely would form a
violation of k-monotonicity in g as well; hence every hyperedge in E(Hviol ) must contain a vertex of X.
Thus, X indeed forms a vertex cover of Hviol ( f ), and |VCmin (Hviol )| ≤ εf |P|.
Claim 3.11. dist( f , Mk ) ≤ |VCmin (Hviol )| / |P|.

Proof. Let C ⊆ P be a minimum vertex cover of the violation hypergraph, of size εf |P|. We show
that the distance of f to k-monotonicity is at most εf . Observe that deleting C from Hviol ( f ) removes all
the hyperedges; therefore, the function f restricted to X = P \ C is k-monotone. By the
extendability of k-monotone functions established in Lemma 3.7, this restriction can be
extended to the rest of the domain (by providing values on the cover C) so that the resulting function
respects k-monotonicity. Thus, the distance to k-monotonicity is at most |C| / |P| = εf .



    Having characterized the distance to k-monotonicity via the size of the smallest vertex cover in the
violation hypergraph, we are ready to establish Theorem 3.5. To do so, we will require the following
standard fact:

Fact 3.12. Let G = (V, E) be a t-uniform hypergraph, and let M be a maximum matching in G. Then
|M| ≤ |VCmin (G)| ≤ t |M|.
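A greedily constructed maximal matching already certifies the upper inequality of Fact 3.12: any hyperedge not hit by the matched vertices could have been added to the matching, so those vertices form a vertex cover of size at most t|M|. A small sketch of this argument (ours):

```python
def greedy_matching(edges):
    """Greedy maximal matching in a hypergraph (edges = list of vertex tuples):
    keep each hyperedge that is disjoint from everything chosen so far."""
    matching, used = [], set()
    for e in edges:
        if used.isdisjoint(e):
            matching.append(e)
            used.update(e)
    return matching, used

# A 3-uniform example (t = 3). 'used' is a vertex cover: a hyperedge missed by it
# would have been added to the matching, contradicting maximality.
edges = [(0, 1, 2), (2, 3, 4), (4, 5, 6), (1, 3, 5)]
matching, cover = greedy_matching(edges)
assert all(cover & set(e) for e in edges)   # 'cover' hits every hyperedge
print(len(matching), len(cover))            # 2 6: |M| <= |VC_min| <= t * |M|
```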

Proof of Theorem 3.5. By Lemma 3.9, the violation hypergraph Hviol ( f ) has a minimum vertex cover
of size at least εf |P|. By Fact 3.12, applied with t = k + 1, it therefore contains a matching of
(k + 1)-uniform hyperedges of size at least (εf /(k + 1)) |P|.
   3 Recall that a vertex cover in a hypergraph is just a set of vertices such that every hyperedge contains at least one of the
vertices from this set.



4     On the Boolean hypercube
In this section, we focus on k-monotonicity of Boolean functions over the hypercube {0, 1}^d. We begin
in Section 4.1 with a tester with query complexity 2^{Õ(√d)}, establishing a strict separation between
learning and testing. Section 4.2 is then dedicated to our lower bounds on k-monotonicity testing.

4.1     Upper bound: beating the learning approach
In this section, we prove the following theorem:4
Theorem 1.2. There exists a one-sided non-adaptive tester for k-monotonicity of functions f : {0, 1}^d →
{0, 1} with query complexity

                             q(d, ε, k) = exp( O( √d · log d · log(1/ε) ) ).
Let us recall the following standard fact, which we use in the proof.

Fact 4.1. There exists an absolute constant C > 0 such that the number of points of {0, 1}^d whose
Hamming weight does not lie in the middle levels

                        [ d/2 − √(d log(C/ε)) , d/2 + √(d log(C/ε)) ]

is at most ε2^{d−1}.
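Fact 4.1 is a standard Chernoff–Hoeffding tail bound for the binomial distribution. Since the constant C is left unspecified, the sketch below (ours) simply fixes C = 2 and, taking log to be the binary logarithm as in Section 2, verifies the count exactly for one small setting:

```python
import math

def outside_middle_levels(d, eps, C=2.0):
    """Count points of {0,1}^d whose Hamming weight falls outside
    [d/2 - sqrt(d * log2(C/eps)), d/2 + sqrt(d * log2(C/eps))]."""
    half_width = math.sqrt(d * math.log2(C / eps))
    lo, hi = d / 2 - half_width, d / 2 + half_width
    return sum(math.comb(d, i) for i in range(d + 1) if not lo <= i <= hi)

d, eps = 100, 0.5
# Fact 4.1 (for a suitable constant C): at most eps * 2^(d-1) points lie outside.
assert outside_middle_levels(d, eps) <= eps * 2 ** (d - 1)
print(outside_middle_levels(d, eps) / 2 ** d)  # a small fraction of the cube
```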
Proof of Theorem 1.2. Let us begin by describing the tester; we will later see that it has the claimed
query complexity.

   (1) Sample O(1/ε) points of the middle levels, uniformly at random.

   (2) For each point sampled in the first step, query all points with Hamming weight in the middle levels
       which fall in the subcube below and in the subcube above that point. We call each of
       these O(1/ε) collections of queries a superquery.

The key idea behind the tester is that a function far from k-monotone contains a big “matching of
violated hyperedges,” as seen in Theorem 3.5, and the tester tries to find one of these violated hyperedges.
We recall that the violation hypergraph of a function f over the hypercube has {0, 1}^d as its vertex set,
and a tuple (x1 < x2 < · · · < xk+1 ) of (k + 1) vertices forms a hyperedge iff it is a violation to k-monotonicity.
In the rest of the analysis, we assume that k is odd.5 In this case, given a function f that is ε-far from
k-monotone, we can assume without loss of generality that f equals 0 on points with Hamming weight
below d/2 − √(d log(C/ε)), and equals 1 on points with Hamming weight above d/2 + √(d log(C/ε)).
From Fact 4.1, it follows that the resulting function is still ε/2-far from being k-monotone.
    4 We note that this result is only interesting in the regime k ≤ √d: indeed, for k = Ω(√(d log(1/ε))), every function is ε-close
to k-monotone.
    5 The case of even k is similar; in that case, one may assume that f evaluates to 0 on all points outside the middle levels.




    Now, by Theorem 3.5, we know that there exists a matching M f of violations to k-monotonicity in f
of size
                                  (ε/2) · 2^d / (k + 1) = ε2^{d−1} / (k + 1).

The set of vertices that participate in these violations has cardinality

                                  (k + 1) · ε2^{d−1} / (k + 1) = ε2^{d−1}.

With constant probability, one of the O(1/ε) points sampled in the first step belongs to some violation
from M f . In that case, the second step finds the violation: every point of the violating chain is comparable
to the sampled point and has Hamming weight in the middle levels, hence is queried. Now we bound the
number of queries made in a single superquery. Since it involves only 2√(d log(C/ε)) levels of the cube,
the number of points queried in a single superquery is no more than b = d^{O(√(d log(1/ε)))}. The total query
complexity of the tester can therefore be upper bounded by O(b/ε) = d^{O(√(d log(1/ε)))}. We emphasize that
the number of queries made by this tester has no dependence on k.



Remark 4.2. We can design and analyze another standalone tester with an almost identical asymptotic
upper bound on query complexity. Our analysis of the performance of this tester does not use the
extendability property. The tester makes O(1/ε) iterations. In every iteration, it picks a random seed
$x \in \{0,1\}^d$ and queries all points on chains going through x in the truncated cube. The idea behind our
analysis is a notion of "rank" for points in the cube. We say $\mathrm{rank}(x) \geq \ell$ if there is some ascending
chain ending at x which has at least $\ell$ flips on it. Observe that if the function is indeed far, then there is at
least an ε fraction of points with rank at least k + 1. Thus, with constant probability, some iteration hits
a seed with large rank. And in the next step, the tester finds the violation. To upper bound the query
complexity, note that the number of chains in the truncated cube going through any initial seed x is at
most $b = d^{O(\sqrt{d \log(1/\varepsilon)})}$. This gives an upper bound of O(b/ε).
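For intuition, the rank of every point can be computed exactly by dynamic programming over the (full, untruncated) hypercube. The sketch below is purely illustrative (it is not part of the analysis, and it enumerates all $2^d$ points, so it is only feasible for small d):

```python
from itertools import product

def ranks(f, d):
    """rank(x) = maximum number of value flips of f along any ascending
    chain ending at x; computed bottom-up by Hamming weight."""
    r = {}
    for x in sorted(product((0, 1), repeat=d), key=sum):
        best = 0
        for i in range(d):
            if x[i] == 1:  # predecessor y: x with coordinate i cleared
                y = x[:i] + (0,) + x[i + 1:]
                best = max(best, r[y] + (f(x) != f(y)))
        r[x] = best
    return r

# The parity function flips on every edge of the cube, so the rank of a
# point equals its Hamming weight.
parity = lambda x: sum(x) % 2
r = ranks(parity, 4)
print(r[(1, 1, 1, 1)])  # → 4
```

In particular, under parity every point of weight at least k + 1 has rank at least k + 1, matching the intuition that parity is far from k-monotone for every k < d.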


4.2     Lower bounds
We now turn to lower bounds for testing k-monotonicity of Boolean functions over the hypercube {0, 1}d .
In Section 4.2.1, we show that for $1 \leq k \leq d^{1/4}/2$, any one-sided non-adaptive tester for k-monotonicity
requires $\Omega\big((d/k^2)^{k/4}\big)$ queries, generalizing the $\Omega(\sqrt{d})$ lower bound for monotonicity due to Fischer et
al. [26]. This bound suggests the problem becomes strictly harder when k increases: specifically, for
$2 < k \leq d^{1/4}/2$ testing k-monotonicity requires $\omega(\sqrt{d})$ queries, while an $\widetilde{O}(\sqrt{d}/\varepsilon^2)$ one-sided non-
adaptive upper bound holds for monotonicity testing [37].
     We then describe in Section 4.2.2 a general reduction from monotonicity testing to k-monotonicity
testing, for arbitrary constant k. This blackbox reduction allows us to carry any lower bound (possibly
two-sided or adaptive) for monotonicity testing to k-monotonicity. In particular, combining it with the
recent lower bounds [19, 7] for two-sided monotonicity testing, we obtain an Ω(d 1/2−o(1) ) lower bound
for non-adaptive k-monotonicity testers, and an Ω(d 1/4 ) lower bound for adaptive ones.


4.2.1   One-sided lower bounds
Theorem 1.1. For $1 \leq k \leq d^{1/4}/2$, any one-sided non-adaptive tester for k-monotonicity of functions
$f: \{0,1\}^d \to \{0,1\}$ must make $\Omega\big((d/k^2)^{k/4}\big)$ queries.

    One natural way to prove Theorem 1.1 is to extend the monotonicity lower bound in [26]. To this end,
we consider the $\binom{d}{k}$-sized family of functions $\{g_S : S \subseteq [d] \text{ of size } k\}$ where every single function in the
family is a truncated anti-parity over some subset S of size k. By an argument analogous to the one used
in [26], one can show an $\Omega(d^{k/4})$ lower bound on query complexity. However, functions in this family are
only $\Omega(2^{-k})$-far from being k-monotone. We refine this argument to obtain functions that are $\Omega(1)$-far
from being k-monotone, for all k.

Proof of Theorem 1.1. Consider the family of functions $\{f_S : S \subseteq [d] \text{ of size } t\}$ where $t \geq k$ is a parameter
to be determined later and $f_S$ is a truncated anti-parity over the t input coordinates indexed by S, namely,

$$f_S(x) = \begin{cases} \oplus_{i \in S}\, x_i & \text{if } \big|\,|x| - d/2\,\big| \leq \sqrt{d}, \\ 0 & \text{otherwise.} \end{cases}$$
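As a quick illustration (ours, not part of the proof), $f_S$ can be implemented directly; the instance below uses hypothetical small parameters:

```python
def make_f_S(S, d):
    """Truncated anti-parity: the parity of the coordinates indexed by S
    inside the middle band of Hamming weights, and 0 outside the band."""
    band = d ** 0.5
    def f_S(x):  # x: tuple in {0,1}^d
        if abs(sum(x) - d / 2) > band:
            return 0  # truncated region
        return sum(x[i] for i in S) % 2
    return f_S

# Hypothetical small instance: d = 9, S = {0, 2} (so t = 2).
f = make_f_S({0, 2}, 9)
print(f((1, 1, 0, 1, 1, 0, 0, 0, 0)))  # weight 4 lies in the band; x_0 XOR x_2 = 1
print(f((1,) * 9))                     # weight 9 is outside the band, so 0
```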

    The theorem immediately follows from the two claims below.
Claim 4.3. For any non-adaptive q-query algorithm A, there exists $f_S$ where $|S| = t$ such that A reveals
a violation on $f_S$ with probability at most

$$q^2 \cdot \binom{2\sqrt{d}}{k} \binom{t}{k} \Big/ \binom{d}{k} \,.$$
Claim 4.4. For $k \leq t \leq \sqrt{d}$, $f_S$ is $\Omega\Big(\sum_{i=0}^{\lfloor (t-k-1)/2 \rfloor} \binom{t}{i} \big/ 2^t\Big)$-far from any k-monotone function.
                                                       √
    In particular, let $t = 4k^2$. By Claim 4.4, for $t \leq \sqrt{d}$, $f_S$ is $\Omega(1)$-far from any k-monotone function,
and by Claim 4.3, to reject every $f_S$ with $\Omega(1)$ probability, q needs to be at least

$$d^{k/4} \cdot \left(\frac{k}{2e^2 t}\right)^{k/2} = \Omega\big((d/k^2)^{k/4}\big) \,.$$

Proof of Claim 4.3. Let Q be an arbitrary set of q queries. Without loss of generality, we assume every
query z in Q has Hamming weight $|z| \in [d/2 - \sqrt{d},\, d/2 + \sqrt{d}]$. We define $Q_{x,z} = \{\, y \in Q : x \preceq y \preceq z \,\}$
and $\mathrm{Rej}(Q) = \{\, f_S : Q \text{ contains a violation for } f_S \,\}$. Note that $\mathrm{Rej}(Q) = \bigcup_{(x,z) :\, x \preceq z \in Q} \mathrm{Rej}(Q_{x,z})$. Hence
we can bound the size of $\mathrm{Rej}(Q)$ by

$$|\mathrm{Rej}(Q)| \leq \sum_{(x,z) \in Q^2 :\, x \preceq z} |\mathrm{Rej}(Q_{x,z})| \,. \qquad (4.1)$$

Fix any $x \preceq z \in Q$. Fix any violation for $f_S$ in $Q_{x,z}$. Note that there exists $S' \subseteq S$ with $|S'| = k$ such that
$Q_{x,z}$ contains two points $x'$ and $z'$ with $x \preceq x' \preceq z' \preceq z$ and $x'_{S'} = 0^k$ and $z'_{S'} = 1^k$. Also, note that x and z
differ in at most $2\sqrt{d}$ coordinates so that there are at most $\binom{2\sqrt{d}}{k}$ distinct $S'$ on which $x_{S'}$ is $0^k$ and $z_{S'}$ is


$1^k$. Moreover, for each $S'$ there are at most $\binom{d-k}{t-k}$ distinct S such that $S' \subseteq S$. Therefore we can bound the
size of $\mathrm{Rej}(Q_{x,z})$ by

$$|\mathrm{Rej}(Q_{x,z})| \leq \binom{2\sqrt{d}}{k} \binom{d-k}{t-k} \,. \qquad (4.2)$$

Combining (4.1) and (4.2),

$$|\mathrm{Rej}(Q)| \leq q^2 \binom{2\sqrt{d}}{k} \binom{d-k}{t-k} \,.$$
It follows that for any non-adaptive algorithm making at most q queries,

$$\sum_{S \subseteq [d] : |S| = t} \Pr[A \text{ reveals a violation for } f_S] \leq \mathbb{E}[|\mathrm{Rej}(Q)|] \leq q^2 \binom{2\sqrt{d}}{k} \binom{d-k}{t-k} \,.$$

Hence there exists $f_S$ such that

$$\Pr[A \text{ reveals a violation for } f_S] \leq q^2\, \frac{\binom{2\sqrt{d}}{k} \binom{d-k}{t-k}}{\binom{d}{t}} = q^2\, \frac{\binom{2\sqrt{d}}{k} \binom{t}{k}}{\binom{d}{k}} \,.$$

Proof of Claim 4.4. Let $f'_S$ be the closest k-monotone function to $f_S$. Let Z denote the set $\{z \in \{0,1\}^{d-t} :
d/2 - \sqrt{d} \leq |z| \leq d/2 + \sqrt{d} - t\}$. For any $t \leq \sqrt{d}$,

$$\left[\frac{d-t}{2} - \frac{\sqrt{d-t}}{2},\; \frac{d-t}{2} + \frac{\sqrt{d-t}}{2}\right] \quad \text{is contained in} \quad \left[\frac{d}{2} - \sqrt{d},\; \frac{d}{2} + \sqrt{d} - t\right]$$

so that $|Z| = \Omega(2^{d-t})$.
    For any assignment $z \in Z$ on the coordinates indexed by $[d] \setminus S$, $f_S(\cdot, z)$ agrees with $\oplus_{i \in S}\, x_i$ and $f'_S(\cdot, z)$ is
k-monotone. To finish the proof, it suffices to show that an anti-parity over t inputs is $\Big(\sum_{i=0}^{\lfloor (t-k-1)/2 \rfloor} \binom{t}{i} \big/ 2^t\Big)$-
far from any k-monotone function over t inputs. Indeed, this will imply that, for every $z \in Z$, $f_S(\cdot, z)$
differs from $f'_S(\cdot, z)$ on at least $\sum_{i=0}^{\lfloor (t-k-1)/2 \rfloor} \binom{t}{i}$ points, and thus finally that $f_S$ differs from $f'_S$ on

$$|Z| \cdot \sum_{i=0}^{\lfloor \frac{t-k-1}{2} \rfloor} \binom{t}{i} = \Omega\left(2^{d-t} \cdot \sum_{i=0}^{\lfloor \frac{t-k-1}{2} \rfloor} \binom{t}{i}\right)$$

points.
    Now we show it is the case that an anti-parity function over t inputs $h(x_1, \ldots, x_t) = \oplus_{i=1}^{t} x_i$ is far from
any k-monotone function g (over $x_1, \ldots, x_t$). We begin by noting that we can sample a random point from
$\{0,1\}^t$ by first sampling a random chain $C = (x^0 = 0^t, x^1, \ldots, x^t = 1^t)$ (from all possible chains from $0^t$ to
$1^t$) and then outputting $x^i$ with probability $\binom{t}{i}/2^t$. Thus the distance between h and g, namely $\Pr_x[h(x) \neq g(x)]$,
is the same as $\Pr_{C,i}[h(x^i) \neq g(x^i)]$, which can be further expanded as follows:

$$\mathbb{E}_C\left[\sum_{i=0}^{t} \frac{\binom{t}{i}}{2^t} \cdot \mathbb{1}_{\{h(x^i) \neq g(x^i)\}}\right] = \mathbb{E}_C\left[\frac{1}{2^t} \sum_{i=1}^{t} \binom{t-1}{i-1} \cdot \big(\mathbb{1}_{\{h(x^i) \neq g(x^i)\}} + \mathbb{1}_{\{h(x^{i-1}) \neq g(x^{i-1})\}}\big)\right] \qquad (4.3)$$


                         T HEORY OF C OMPUTING, Volume 15 (1), 2019, pp. 1–55                                       20
                                           T ESTING k-M ONOTONICITY

where the last equality relies on the identity

$$\binom{t}{i} = \binom{t-1}{i-1} + \binom{t-1}{i} \,.$$

For any fixed chain C, because g alternates at most k times, there are at least $t - k$ choices of $i \in [t]$ such
that $g(x^{i-1}) = g(x^i)$, which implies $\mathbb{1}_{\{h(x^i) \neq g(x^i)\}} + \mathbb{1}_{\{h(x^{i-1}) \neq g(x^{i-1})\}} \geq 1$ (due to $h(x^{i-1}) \neq h(x^i)$). Thus

$$\sum_{i=1}^{t} \binom{t-1}{i-1} \cdot \big(\mathbb{1}_{\{h(x^i) \neq g(x^i)\}} + \mathbb{1}_{\{h(x^{i-1}) \neq g(x^{i-1})\}}\big)$$

is at least the sum of the smallest $t - k$ binomial coefficients among $\binom{t-1}{0}, \ldots, \binom{t-1}{t-1}$, which is

$$\sum_{i=0}^{\lfloor \frac{t-k-1}{2} \rfloor} \binom{t-1}{i} + \sum_{i=0}^{\lfloor \frac{t-k-2}{2} \rfloor} \binom{t-1}{t-1-i} = \sum_{i=0}^{\lfloor \frac{t-k-1}{2} \rfloor} \binom{t-1}{i} + \sum_{i=0}^{\lfloor \frac{t-k-2}{2} \rfloor} \binom{t-1}{i} \geq \sum_{i=0}^{\lfloor \frac{t-k-1}{2} \rfloor} \binom{t}{i} \,,$$

which implies

$$\Pr_x[h(x) \neq g(x)] \geq \frac{1}{2^t} \sum_{i=0}^{\lfloor \frac{t-k-1}{2} \rfloor} \binom{t}{i}$$

by combining the above with (4.3).
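The chain-sampling step used above can also be checked numerically: a uniformly random chain passes through a fixed point of weight i with probability $i!(t-i)!/t!$, so outputting $x^i$ with probability $\binom{t}{i}/2^t$ makes every point of $\{0,1\}^t$ equally likely. An illustrative verification (ours, not from the paper):

```python
from math import comb, factorial

t = 6
for i in range(t + 1):
    # Probability that a uniform chain from 0^t to 1^t passes through a
    # fixed point of Hamming weight i ...
    through = factorial(i) * factorial(t - i) / factorial(t)
    # ... times the probability of outputting level i:
    out = through * comb(t, i) / 2 ** t
    assert abs(out - 1 / 2 ** t) < 1e-15  # uniform over all 2^t points
print("chain sampling is uniform")
```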



Remark 4.5. One may observe that our lower bound is suboptimal (by a quadratic factor) for the case
k = 1, i. e., for monotonicity. In this case, the construction and argument of [26] indeed give an Ω(d 1/2 )
lower bound, while instantiating Theorem 1.1 only yields Ω(d 1/4 ). The reason is that, in their proof,
Fischer et al. apply a “spanning tree argument” on the set of violating edges to obtain this quadratic
improvement: however, it is not clear how this argument, particularly well-suited for their construction
(basically, for antidictator functions), would apply to our functions.

4.2.2   Two-sided lower bounds
The following theorem gives a construction that enables us to convert monotone functions into k-monotone
functions, and functions that are far from monotone into functions that are far from k-monotone.

Theorem 4.6. There exists an efficiently computable function $h: \{0,1\}^{d/2} \to \{0,1\}$ such that, for any
$g: \{0,1\}^{d/2} \to \{0,1\}$, the Boolean function $g\|h: \{0,1\}^d \to \{0,1\}$ (defined as $g\|h(x,y) := g(x) \oplus h(y)$
for any $x, y \in \{0,1\}^{d/2}$) satisfies the following.

    • if g is monotone, then g||h is a k-monotone function;

    • if g is ε-far from monotone, then g||h is Ω(ε/k)-far from being a k-monotone function.


     The above theorem reduces testing monotonicity to testing k-monotonicity (for arbitrary constant k)
with the same number of queries to the input function. As we will see, it carries any lower bound for
testing monotonicity over to testing k-monotonicity, while preserving the characteristics (two-sidedness,
adaptivity) of the original lower bound. In particular, combining it with the recent lower bounds of [19, 21],
we obtain the following corollary.
Corollary 4.7. For any $c > 0$ and $k \geq 1$, there exists $\varepsilon = \varepsilon(k, c) > 0$ such that any two-sided non-adaptive
algorithm for testing whether f is k-monotone or ε-far from it requires $\Omega(d^{1/2 - c})$ queries. Any two-sided
adaptive algorithm requires $\widetilde{\Omega}(d^{1/3})$ queries.
     To prove Theorem 4.6, we prove the following three claims. Claim 4.8 and Claim 4.9 assume the
existence of a (k − 1)-monotone function h which satisfies an interesting structural property. This property
is then exploited to establish Theorem 4.6. Finally in Claim 4.10, we establish the existence of such h,
and Theorem 4.6 follows.
Claim 4.8. Let h : {0, 1}d → {0, 1} be a (k − 1)-monotone function. Then for any monotone g : {0, 1}d →
{0, 1}, f = g||h is a k-monotone function.
Claim 4.9. Suppose there exists an $h: \{0,1\}^d \to \{0,1\}$ for which the following holds: there exist at least
M paths of length $k - 1$ such that (i) all paths are vertex disjoint and (ii) for every path $y_1 \prec \cdots \prec y_k$,
$h(y_1) = 0$ and $h(y_i) \neq h(y_{i+1})$ for $1 \leq i \leq k - 1$. Then, for any $g: \{0,1\}^d \to \{0,1\}$ which is ε-far from
being a monotone function, the function $f = g\|h$ is $\Omega((M/2^d) \cdot \varepsilon)$-far from being a k-monotone function.
Claim 4.10. For any constant k, there exists an efficiently computable $h: \{0,1\}^d \to \{0,1\}$ such that h is
a (k − 1)-monotone function and h contains at least $(1 - o_d(1)) 2^d / k$ paths of length k − 1 such that all
paths are vertex disjoint and, for every path $y_1 \prec \cdots \prec y_k$, $h(y_1) = 0$ and $h(y_i) \neq h(y_{i+1})$ for $1 \leq i \leq k - 1$.
Proof of Claim 4.8. Suppose $f = g\|h$ is not k-monotone. Then there exist $(x_1, y_1) \prec \cdots \prec (x_{k+1}, y_{k+1})$ such
that $f(x_1, y_1) = 1$ and $f(x_i, y_i) \neq f(x_{i+1}, y_{i+1})$ for every $1 \leq i \leq k$. Because
g is monotone, either g is constant on $x_1, \ldots, x_{k+1}$, or there exists an index $1 < j \leq k + 1$ such that
$g(x_i) = 0$ for $i < j$ and $g(x_i) = 1$ for $i \geq j$. In the first case, if g is the constant 0 on $x_1, \ldots, x_{k+1}$ (resp.
constant 1), then $h(y_i) = f(x_i, y_i)$ (resp. $h(y_i) = 1 - f(x_i, y_i)$) for every i, and h alternates exactly k times on the
points $y_1 \prec \cdots \prec y_{k+1}$. In the second case, $h(y_1) = f(x_1, y_1) = 1$, and h alternates exactly k − 1 times on
$y_1 \prec \cdots \prec y_{j-1} \prec y_{j+1} \prec \cdots \prec y_{k+1}$. Both cases contradict h being (k − 1)-monotone.
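Claim 4.8 is easy to sanity-check on small instances. The sketch below (illustrative code, not part of the proof) counts the maximum number of value changes along any ascending chain by dynamic programming, and confirms that $g\|h$ alternates at most twice when g and h are both monotone, i.e., the case k = 2 with a 1-monotone h:

```python
from itertools import product

def max_flips(f, d):
    """Maximum number of value changes of f along any ascending chain of
    {0,1}^d (a k-monotone function never exceeds k changes)."""
    best = {}
    for x in sorted(product((0, 1), repeat=d), key=sum):
        b = 0
        for i in range(d):
            if x[i] == 1:
                y = x[:i] + (0,) + x[i + 1:]
                b = max(b, best[y] + (f(x) != f(y)))
        best[x] = b
    return max(best.values())

d = 3
g = lambda x: int(sum(x) >= 2)      # monotone threshold
h = lambda y: int(sum(y) >= 1)      # monotone, i.e., 1-monotone
gh = lambda z: g(z[:d]) ^ h(z[d:])  # g||h on {0,1}^(2d)
print(max_flips(gh, 2 * d))  # → 2
```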
Proof of Claim 4.9. Let $M_h$ be a maximal set of paths of length $k - 1$ such that all paths are vertex disjoint
and, for every path $y_1 \prec \cdots \prec y_k$, $h(y_1) = 0$ and $h(y_i) \neq h(y_{i+1})$ for $1 \leq i \leq k - 1$. Let $M_g$ be a maximal
set of pairs such that all pairs are vertex disjoint and every pair $x_1 \prec x_2$ is a violation for g, i. e., $g(x_1) = 1$
and $g(x_2) = 0$. For each path $(y_1 \prec \cdots \prec y_k) \in M_h$ and any pair $(x_1 \prec x_2) \in M_g$, it is easy to see that the
following path is a violation of $f = g\|h$ being k-monotone:

$$(x_1, y_1), \ldots, (x_1, y_k), (x_2, y_k) \,.$$

Let $f'$ be the closest k-monotone function to f. For each violating path, f and $f'$ differ on at least one
point. Because both $M_h$ and $M_g$ are vertex disjoint, the violating paths constructed by taking every path in
$M_h$ and every pair in $M_g$ are vertex disjoint. Thus f and $f'$ differ on at least $|M_h| \times |M_g|$ points and f
is $(|M_h| \cdot |M_g| / 2^{2d})$-far from $f'$. It is known ([28]) that for any $g: \{0,1\}^d \to \{0,1\}$ which is ε-far from
monotone, $|M_g| \geq 2^{d-1} \varepsilon$. The desired conclusion follows.


Proof of Claim 4.10. Let $B_1, B_2, \ldots, B_k$ be consecutive blocks, each consisting of consecutive Hamming
levels of the hypercube. Letting $|B_i|$ denote the number of points in the ith block, note that it is possible to
have all blocks satisfy

$$\forall i \in [k], \qquad \left(1 - \frac{k}{\sqrt{d}}\right) \frac{2^d}{k} \leq |B_i| \leq \left(1 + \frac{k}{\sqrt{d}}\right) \frac{2^d}{k} \,.$$

Because k is a constant and every level contains at most $2^d/\sqrt{d}$ points, we can always greedily find
$B_1, \ldots, B_k$ one by one. Let h be a function such that h is constant over every block and alternates on
successive blocks, and which evaluates to 0 over all of $B_1$. Note that h is (k − 1)-monotone. Next we
inductively argue that for any $1 \leq j \leq k$, $B_1, \ldots, B_j$ contain at least $(1 - o_d(1)) 2^d / k$ vertex-disjoint paths
of length j − 1 such that the ith point on every path is in $B_i$. Claim 4.10 follows from the case j = k.
      For j = 1, the statement holds by taking all points in $B_1$. For $j > 1$, assume that $B_1, \ldots, B_{j-1}$ contain
a set $P_{j-1}$ of $(1 - o_d(1)) 2^d / k$ such vertex-disjoint paths of length j − 2. Let M be a maximum matching
between $B_{j-1}$ and $B_j$, and let $P_j$ be the set of paths of length j − 1 constructed in the following way: for each
path in $P_{j-1}$ with endpoint $y_{j-1}$, if there exists $y_j$ such that $(y_{j-1}, y_j) \in M$, we add the path appended
with $y_j$ to $P_j$. Because no point of $B_j$ is added to two different paths in $P_j$, and the paths in $P_{j-1}$ are vertex
disjoint, the paths in $P_j$ are vertex disjoint.
      Now we show that $|M| = \min(|B_{j-1}|, |B_j|)$, which implies $|P_j| \geq (1 - o_d(1)) 2^d / k$. Suppose $|B_{j-1}| \leq |B_j|$
(the argument is analogous in the other case). For any subset S of $B_{j-1}$, let $f_S$ be the indicator function of
the upper closure of S, denoted $N(S)$. It is not hard to check that $f_S$ is monotone, and thus

$$\Pr[f_S(x) = 1 \mid x \in B_j] \geq \Pr[f_S(x) = 1 \mid x \in B_{j-1}] \,.$$

It follows that

$$|N(S) \cap B_j| = |B_j| \Pr[f_S(x) = 1 \mid x \in B_j] \geq |B_{j-1}| \Pr[f_S(x) = 1 \mid x \in B_{j-1}] \geq |S| \,.$$

By Hall's theorem, $|M| = |B_{j-1}|$. By a similar argument, we can show $|M| = |B_j|$ when $|B_{j-1}| > |B_j|$. Thus
$|M| = \min(|B_{j-1}|, |B_j|)$.
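The greedy block construction can be sketched directly (illustrative code; for small d the deviation from $2^d/k$ is much larger than the $k/\sqrt{d}$ bound, which only kicks in for large d):

```python
from math import comb

def build_blocks(d, k):
    """Greedily group consecutive Hamming levels of {0,1}^d into k blocks,
    closing a block once it reaches size 2^d / k."""
    target = 2 ** d / k
    blocks, cur, size = [], [], 0
    for level in range(d + 1):
        cur.append(level)
        size += comb(d, level)
        if size >= target and len(blocks) < k - 1:
            blocks.append(cur)
            cur, size = [], 0
    blocks.append(cur)
    return blocks

def make_h(blocks):
    """h is constant on each block and alternates between successive blocks,
    starting at 0 on B1; such an h is (k-1)-monotone."""
    value = {lvl: j % 2 for j, blk in enumerate(blocks) for lvl in blk}
    return lambda x: value[sum(x)]

blocks = build_blocks(10, 3)
h = make_h(blocks)
print([sum(comb(10, lvl) for lvl in blk) for blk in blocks])  # sizes summing to 2^10
```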


5     On the line
In this section we prove our results on testing k-monotonicity on the line, that is, of functions $f: [n] \to
\{0,1\}$. We start with Theorem 1.4, which establishes that this can be done non-adaptively with one-sided
error, with only O(k/ε) queries; we then turn to Theorem 1.3, which shows that this is the best one can
hope for if we insist on one-sidedness. The last result of this section is Theorem 1.5, where we show
that—perhaps unexpectedly—two-sided algorithms, even non-adaptive ones, can break this barrier and test
k-monotonicity with no dependence on k.

5.1   Upper and lower bounds for one-sided testers
We first prove the upper bound, restated below:
Theorem 1.4. There exists a one-sided non-adaptive tester for k-monotonicity of functions f : [n] → {0, 1}
with query complexity q(n, ε, k) = O(k/ε).


                                                                                                   def
Proof. We assume that εn/(50k) is an integer,6 and partition the domain into K = 50k/ε intervals of
size εn/(50k), the consecutive “blocks” B1 , . . . , BK . We then define g : [n] → {0, 1, ∗} as the function
constant on each block Bi = {bi , . . . , bi+1 − 1}, such that

    • If f (bi ) = f (bi+1 − 1), then g( j) = f (bi ) for all j ∈ Bi ;

    • otherwise, g( j) = ∗ for all j ∈ Bi .

We say that a block $B_i$ such that $g|_{B_i} = *$ is a changepoint block for g. Clearly, given query access to
f one can obtain the value of g on any point $j \in [n]$ with only two queries to f. Moreover, defining
$\tilde{g}: [n] \to \{0,1\}$ to be the function obtained from g by replacing * by 0, we observe the following:

    • If f is k-monotone, then (i) so is $\tilde{g}$, and (ii) f and $\tilde{g}$ differ in at most k blocks (on any block where
      they differ, whether a changepoint block or otherwise, the flip count of f goes up by 1), so that
      $\mathrm{dist}(f, \tilde{g}) \leq k \cdot (1/K) = \varepsilon/50$;

    • If f is ε-far from k-monotone, then either (i) $\tilde{g}$ is not k-monotone, or (ii) $\mathrm{dist}(f, \tilde{g}) > \varepsilon$.

    We first discuss a natural idea, why it fails, and how to fix it. The above suggests a very simple
algorithm: first, learn the function $\tilde{g}$ (exactly) using O(k/ε) queries, and then distinguish between
$\mathrm{dist}(f, \tilde{g}) \leq \varepsilon/50$ and $\mathrm{dist}(f, \tilde{g}) > \varepsilon$ by random sampling, using O(1/ε) queries. This immediately results
in a non-adaptive testing algorithm with query complexity O(k/ε); however, this tester is inherently
two-sided, due to the second (distance estimation) step. To obtain one-sidedness, we thus take a slightly
different route, and consider the number of blocks on which f and $\tilde{g}$ differ instead of estimating the
distance. As finding at least k + 1 such blocks can only happen when f is not k-monotone, this will never
result in rejecting a k-monotone function; but this one-sidedness comes at the price of a slightly less
straightforward analysis.
    As mentioned above, we start by learning g (and thus $\tilde{g}$) exactly, using 2K = O(k/ε) non-adaptive
queries. Setting $m := C \cdot (k/\varepsilon)$, we also sample $m' \sim \mathrm{Poisson}(m)$ points⁷ $j_1, \ldots, j_{m'}$ independently and
uniformly from [n], where C > 0 is a constant to be determined in the course of the analysis, and query
the value of f (and g) on all of them. Then, we reject if either (i) $\tilde{g}$ is not k-monotone; or (ii) there exist at
least k + 1 distinct blocks which contain a sample $s_j$ such that $\tilde{g}(s_j) \neq f(s_j)$.
    By definition, this tester is non-adaptive; and it is not difficult to see it accepts any k-monotone
function with probability 1, since in that case f and g (and a fortiori $\tilde{g}$) differ in at most k blocks.
    It remains to argue soundness: we will show that if f is ε-far from k-monotone, the tester will reject
with probability at least 2/3. By the first check made, (i), we can assume in the following that $\tilde{g}$ is
k-monotone—as otherwise f is rejected with probability 1—and we need to show that (ii) will reject with
   ⁶ If not, we consider instead

$$\varepsilon' := \frac{50k}{n} \left\lfloor \frac{\varepsilon n}{50k} \right\rfloor > \varepsilon - \frac{50k}{n} > \frac{\varepsilon}{2}$$

if ε > 100k/n; while if ε ≤ 100k/n we query the entire function, for a total of n = O(k/ε) queries.
    7 The fact that we sample Poisson(m) instead of m is for ease of the analysis; note that due to the tight concentration of

Poisson random variables, with probability 1 − o(1) we will have m0 ≤ 2m. If this does not happen, the tester can output
ACCEPT, incurring only a small additional error probability (and not affecting the one-sidedness).



probability at least 2/3. For each block $B_i$ (where $i \in [K]$), let $p_i \in [0, 1/K]$ be defined as the (normalized)
number of points in $B_i$ on which f and $\tilde{g}$ differ (we henceforth refer to such a point as a giveaway point):

$$p_i := \frac{1}{n} \sum_{j \in B_i} \mathbb{1}_{\{f(j) \neq \tilde{g}(j)\}} \,.$$


Since f is ε-far from the k-monotone function $\tilde{g}$, we have $\sum_{i=1}^{K} p_i \geq \varepsilon$. Now, letting $Z_i$ be the indicator of
the event that among the $m'$ samples, at least one is a giveaway point from $B_i$, and $Z = \sum_{i=1}^{K} Z_i$, we can
write $Z_i = \mathbb{1}_{\{Y_i \neq 0\}}$, where the $(Y_i)_{i \in [K]}$ are independent Poisson random variables with $Y_i \sim \mathrm{Poisson}(m p_i)$.
The expected number of blocks in which a giveaway point is sampled is then

$$\mathbb{E}Z = \sum_{i=1}^{K} \mathbb{E}Z_i = \sum_{i=1}^{K} \Pr[Y_i \neq 0] = \sum_{i=1}^{K} \big(1 - e^{-m p_i}\big) \,.$$

Since for every $i \in [K]$ it holds that $e^{-m p_i} \leq 1 - (m/2) p_i$ (the inequality holding since $0 \leq m p_i \leq 1$, which
is verified for $m \leq K$), we get

$$\mathbb{E}Z = \sum_{i=1}^{K} \mathbb{E}Z_i \geq \sum_{i=1}^{K} \frac{m}{2} p_i \geq \frac{m \varepsilon}{2} \geq \frac{C}{2} k \,.$$

Moreover, by a Chernoff bound, we get that

$$\Pr\left[Z < \frac{C}{4} k\right] \leq e^{-\frac{Ck}{16}} \leq e^{-\frac{C}{16}}$$

which is less than 1/4 for C ≥ 23. Setting $C := 30$ satisfies both conditions that m ≤ K and C ≥ 23, and
results in a one-sided non-adaptive tester which rejects functions far from k-monotone with probability at
least 1 − 1/4 − o(1) ≥ 2/3.

Turning to the lower bounds against one-sided testers, we show the following:
Theorem 1.3. Any one-sided (possibly adaptive) tester for k-monotonicity of functions f : [n] → {0, 1}
must have query complexity Ω(k/ε).
Proof. Since a lower bound of Ω(1/ε) straightforwardly holds, we can restrict ourselves to k ≥ 8 and
ε < 1/12; moreover, we assume without loss of generality that εn/k is an integer, and partition the
domain into $K := k/\varepsilon$ intervals of size εn/k, the consecutive "blocks" $B_1, \ldots, B_K$. For $v \in \{0,1\}^{K/2}$, we
define $g_v: [n] \to \{0,1\}$ as the function which has constant value $v_i$ on block $B_{2i-1}$ and has constant value
1 on the remaining blocks.
    Consider the distribution over $\{g_v\}_v$ where each coordinate of v is independently set to 0 with
probability $p := 6\varepsilon$, and to 1 otherwise. We next show that $g_v$ is at least ε-far from any k-monotone
function with very high probability over the choice of v. By a Chernoff bound, with probability at least
$1 - e^{-pK/16} = 1 - e^{-3k/8}$, $g_v$ has at least pK/4 = 3k/2 blocks that are 0 blocks. Conditioned on this, it is
easy to see that $g_v$ is ε-far from k-monotone: indeed, to make it k-monotone one has to flip its value on at
least k blocks, and each block contains an ε/k fraction of the domain.
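The farness of such a $g_v$ can be verified exactly on small instances with a simple dynamic program (illustrative; it counts alternations without the directional constraint in the paper's definition of k-monotonicity, which can only make the distance smaller):

```python
def dist_to_k_alternating(seq, k):
    """Minimum fraction of entries of a 0/1 sequence that must be flipped so
    that the result changes value at most k times along the line."""
    INF = float("inf")
    # state: (current value v, alternations used a) -> min flips so far
    cost = {(0, 0): 0, (1, 0): 0}
    for b in seq:
        nxt = {}
        for (v, a), c in cost.items():
            # stay at value v ...
            c2 = c + (b != v)
            if c2 < nxt.get((v, a), INF):
                nxt[(v, a)] = c2
            # ... or alternate to 1 - v
            if a < k:
                c3 = c + (b != 1 - v)
                if c3 < nxt.get((1 - v, a + 1), INF):
                    nxt[(1 - v, a + 1)] = c3
        cost = nxt
    return min(cost.values()) / len(seq)

# g_v-style instance: 8 blocks of length 5; blocks 1, 3, 5 (1-indexed, odd) are 0.
seq = ([0] * 5 + [1] * 5) * 3 + [1] * 10
print(dist_to_k_alternating(seq, 2))  # → 0.25 (two full 0-blocks must be flipped)
```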


      Fix any deterministic adaptive algorithm with query complexity $q \leq k/(24\varepsilon)$, and denote by
$x_1, \ldots, x_q$ the sequence of queries made (when given query access to some function $g_v$). Without loss of
generality, we assume that $x_1, \ldots, x_q$ are made into distinct odd blocks. The queries $x_1, \ldots, x_q$ reveal a
violation if and only if the number of 0's in the corresponding answers is at least k/2 + 1.
      If $x_1, \ldots, x_q$ are non-adaptive queries, the probability that the number of 0's in $f(x_1), \ldots, f(x_q)$ is at
least k/2 + 1 can be upper bounded by a Chernoff bound in a straightforward way. Because each odd block
is sampled independently, the same bound can be obtained in the adaptive setting. Note that, for arbitrary
$a \in \{0,1\}^q$,

$$\Pr[f(x_1) = a_1, \ldots, f(x_q) = a_q] = \Pr[f(x_1) = a_1] \cdot \prod_{i=2}^{q} \Pr[f(x_i) = a_i \mid f(x_1) = a_1, \ldots, f(x_{i-1}) = a_{i-1}] \,.$$

Because $x_i$ is determined by $f(x_1) = a_1, \ldots, f(x_{i-1}) = a_{i-1}$ and each odd block is sampled inde-
pendently, for every i it holds that $\Pr[f(x_i) = a_i \mid f(x_1) = a_1, \ldots, f(x_{i-1}) = a_{i-1}] = 6\varepsilon$ if $a_i = 0$ and
$\Pr[f(x_i) = a_i \mid f(x_1) = a_1, \ldots, f(x_{i-1}) = a_{i-1}] = 1 - 6\varepsilon$ if $a_i = 1$. Thus

$$\Pr[f(x_1) = a_1, \ldots, f(x_q) = a_q] = (1 - 6\varepsilon)^{|a|} (6\varepsilon)^{q - |a|} \,. \qquad (5.1)$$

    Let $X_i$ be the indicator that $f(x_i) = 0$. We get that, writing F(i, N, p) for the cumulative distribution
function of a Binomial with parameters N and p,

$$\Pr\left[\sum_{i=1}^{q} X_i \geq \frac{k}{2} + 1\right] \leq \sum_{a \in \{0,1\}^q :\, |\bar{a}| \geq k/2} (1 - 6\varepsilon)^{|a|} (6\varepsilon)^{q - |a|} = \sum_{\ell=0}^{q - k/2} \binom{q}{\ell} (1 - 6\varepsilon)^{\ell} (6\varepsilon)^{q - \ell} = F\left(q - \frac{k}{2}, q, 1 - 6\varepsilon\right)$$
$$= F(q(1 - x), q, 1 - 6\varepsilon) \qquad \Big(x := \tfrac{k}{2q} \in (12\varepsilon, 1)\Big)$$
$$\leq e^{-q D(1 - x \,\|\, 1 - 6\varepsilon)} \qquad \text{(relative entropy Chernoff bound⁸)}$$

where

$$D(p \,\|\, p') := p \ln \frac{p}{p'} + (1 - p) \ln \frac{1 - p}{1 - p'} \,,$$

and we used the fact that $k/2 + 1 \leq q \leq k/(24\varepsilon)$. Rewriting the right-hand side slightly, we obtain

$$\Pr\left[\sum_{i=1}^{q} X_i \geq \frac{k}{2} + 1\right] \leq e^{-\frac{k}{2} \Phi(x)}$$

for                                                                         
                                              def 1          1−x           x
                                       Φ(x) =    (1 − x) ln        + x ln      .
                                              x             1 − 6ε        6ε
   ⁸ Recall that the relative entropy version of the Chernoff bound states that F(m, N, p) ≤ e^{−N·D(m/N ∥ p)} as long as 0 ≤ m/N ≤ p.



It is not hard to see that Φ is increasing on [6ε, 1): indeed, for x ∈ [6ε, 1) we have

      Φ′(x) = −(1/x²) · ln((1 − x)/(1 − 6ε))

and hence Φ′(x) ≥ 0. Since x ≥ 12ε, the right-hand side is at most e^{−(k/2)·Φ(12ε)}. It then suffices to
observe that, for ε ≤ 1/12, it holds that Φ(12ε) ≥ Φ(0) = ln 2 − 1/2 > 1/8 to conclude that

      Pr[ ∑_{i=1}^{q} X_i ≥ k/2 + 1 ] ≤ e^{−k/16}

and therefore obtain

      Pr[ f(x_1), ..., f(x_q) contains at least k/2 + 1 zeros ] ≤ e^{−k/16}.

Combining the two, this shows that the probability that x_1, ..., x_q does not reveal a violation for g_v, while
g_v is ε-far from k-monotone, is at least 1 − e^{−k/16} − e^{−3k/8} > 1/3 (since k ≥ 8). By Yao's principle, for
any (possibly randomized) non-adaptive algorithm A making at most k/(24ε) queries, there exists a fixed v such
that g_v is ε-far from k-monotone yet A rejects g_v with probability less than 2/3. The desired conclusion
follows.
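As a numerical sanity check (not part of the formal proof), one can verify that Φ(12ε) indeed stays above 1/8 throughout the range ε ≤ 1/12, approaching ln 2 − 1/2 ≈ 0.193 as ε → 0; a minimal sketch:

```python
import math

def phi(x, eps):
    """Phi(x) = (1/x) * ((1-x) ln((1-x)/(1-6 eps)) + x ln(x/(6 eps)))."""
    t = 0.0 if x >= 1.0 else (1 - x) * math.log((1 - x) / (1 - 6 * eps))
    return (t + x * math.log(x / (6 * eps))) / x

# Phi(12 eps) > 1/8 on a grid of eps in (0, 1/12]; the limit as eps -> 0 is ln 2 - 1/2.
for eps in [1e-6, 1e-4, 1e-3, 0.01, 0.03, 0.05, 0.07, 1 / 12]:
    assert phi(12 * eps, eps) > 1 / 8
```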

5.2     Upper bound for two-sided testers: proof of Theorem 1.5
In this section, we prove the two-sided non-adaptive upper bound of Theorem 1.5, restated below:

Theorem 1.5. There exists a two-sided non-adaptive tester for k-monotonicity of functions f : [n] → {0, 1}
with query complexity q(n, ε, k) = Õ(1/ε⁷), independent of k.


    In what follows, we assume that k > 20/ε, as otherwise we can use for instance the O(k/ε)-query
(non-adaptive, one-sided) tester of Theorem 1.4 to obtain an O(1/ε 2 ) query complexity.

5.2.1    Testing k-monotonicity over [Ck]
In this section, we give a poly(C/ε)-query tester for k-monotonicity over the domain [Ck], where C is a
parameter to be chosen (for our applications, we will eventually set C = poly(1/ε)).

Lemma 5.1. There exists a two-sided non-adaptive tester for k-monotonicity of functions f : [Ck] → {0, 1}
with query complexity O(C3 /ε 3 ).

    The tester proceeds by reducing to support size estimation and using (a slight variant of) an algorithm
of Canonne and Rubinfeld [15]. Let f : [Ck] → {0, 1}, and suppose f is s-monotone but not (s − 1)-
monotone. Then there is a unique partition of [Ck] into s + 1 disjoint intervals I1 , I2 , . . . , Is+1 such that f
is constant on each interval; note that this constant value alternates in consecutive intervals. We can then
define a distribution D f over [s + 1] such that D f (i) = |Ii | /(Ck).
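For concreteness, the interval decomposition and the distribution D_f can be computed as follows (an illustrative sketch; `f` is a 0/1 list over the domain, 0-indexed):

```python
def intervals(f):
    """Maximal monochromatic intervals I_1, ..., I_{s+1} of f,
    as (start, end) pairs (inclusive, 0-indexed)."""
    out, start = [], 0
    for i in range(1, len(f)):
        if f[i] != f[i - 1]:
            out.append((start, i - 1))
            start = i
    out.append((start, len(f) - 1))
    return out

def D_f(f):
    """D_f(i) = |I_i| / (Ck): the length of each interval, normalized by the domain size."""
    n = len(f)
    return [(b - a + 1) / n for (a, b) in intervals(f)]

f = [0, 0, 1, 1, 1, 0, 1, 1]   # 3-monotone but not 2-monotone: 4 intervals
assert intervals(f) == [(0, 1), (2, 4), (5, 5), (6, 7)]
assert D_f(f) == [2/8, 3/8, 1/8, 2/8]
```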
    Our next claims, Claim 5.2 and Observation 5.3, provide the basis for the reduction (from testing
k-monotonicity of f to support size estimation of D f ).


Claim 5.2. If f is ε-far from k-monotone, then it is not (1 + ε)k-monotone, and in particular |supp(D_f)| >
(1 + ε)k + 1.
    Also, note that if f is k-monotone, then |supp(D_f)| ≤ k + 1. This, together with Claim 5.2, furnishes
the following (for k > 20/ε):
Observation 5.3. In order to ε-test k-monotonicity of f , it suffices to estimate supp(D f ) to within
εk/10.
    The proofs of the above results are relatively straightforward, so we defer them to the end of this
section, and now proceed to use the algorithm of [15] for support size estimation. The algorithm
of [15] uses "dual access" to D: an oracle that provides a random sample from D, and an oracle that, given
an element of the support of D, returns the probability mass assigned to this element by D.
Theorem 5.4 ([15, Theorem 14 (rephrased)]). In the dual access model described above, there exists an
algorithm that, on input a threshold n ∈ N∗ and a parameter ε > 0, and given access to a distribution D
(over an arbitrary set) satisfying

      min_{x ∈ supp(D)} D(x) ≥ 1/n,

estimates the support size |supp(D)| up to an additive εn, with query complexity O(1/ε²).
    Note however that we only have access to D f through query access to f , and thus have to manage to
simulate (efficiently) access to the former. One difficulty is that, to access D f (i), we need to determine
where Ii lies in f . For example, finding D f (k/2) requires finding Ik/2 , which might require a large number
of queries to f . We circumvent this by weakening the “dual access” model in two ways, arguing for each
of these two relaxations that the algorithm of [15] can still be applied:
    • we rewrite the support size as in [15], as |supp(D_f)| = E_{x∼D_f}[1/D_f(x)]. We want to estimate it to
      within ±O(εk), which we can do by random sampling;

    • the quantity inside the expectation depends on D_f(x) but not on x itself, so "labels" are unnecessary
      for our random sampling. Thus, it will be sufficient to be able to compute D_f(x) (and thus 1/D_f(x))
      for a random x ∼ D_f, even without actually knowing x itself;

    • actually, even calculating D_f(x) exactly may be too expensive, so instead we will estimate

         E_{x∼D_f}[ 1/D̃_f(x) ],   where D̃_f(x) := min( 20/(εk), D_f(x) ).

      Note that D̃_f might no longer define a probability distribution; but this expectation is only off by at
      most εk/20, since 1/D̃_f(x) = max( εk/20, 1/D_f(x) ) and 1/D_f(x) is positive.
More details follow.
    First, we note as discussed above that the algorithm does not require knowing the “label” of any
element in the support of the distribution: the only access required is being able to randomly sample
elements according to D f , and evaluate the probability mass on the sampled points. This, in turn, can be
done, as the following two lemmata explain:


Lemma 5.5 (Sampling from D f ). Let i ∈ [n] be chosen uniformly at random, and let j be such that i ∈ I j .
Then, the distribution of j is exactly D f .

Lemma 5.6 (Evaluating D_f(j)). Suppose I_j = {a, a + 1, ..., b}. Given i such that i ∈ I_j, we can find I_j by
querying until f(i + 1) = f(i + 2) = ··· = f(b) and f(b + 1) ≠ f(b), as well as f(i − 1) = f(i − 2) = ··· = f(a)
and f(a − 1) ≠ f(a). The number of queries to f is at most b − a + 3 = |I_j| + 2.
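Lemmas 5.5 and 5.6 together can be sketched as follows (a toy simulation: a uniformly random point lands in I_j with probability exactly D_f(j), and a bidirectional walk recovers |I_j|; the function returns the mass D_f(j) directly, which is all the estimator needs):

```python
import random

def sample_interval_mass(f):
    """Pick i uniformly at random (Lemma 5.5), then walk left and right until the
    value changes (Lemma 5.6). Returns D_f(j) for the interval I_j containing i,
    using |I_j| + 2 queries to f."""
    n = len(f)
    i = random.randrange(n)
    a = i
    while a > 0 and f[a - 1] == f[i]:
        a -= 1
    b = i
    while b < n - 1 and f[b + 1] == f[i]:
        b += 1
    return (b - a + 1) / n
```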

    Here comes the second difficulty: if we straightforwardly use these approaches to emulate the required
oracles for estimating the support size of D_f, the number of queries is potentially very large. For instance,
if we attempt to query D_f(j) where |I_j| = Ω(k), we will need Ω(k) queries to f. This is where the second
relaxation comes in: specifically, we shall argue that it is enough for us to "cap" the size of the
interval (as per our next observation).

Observation 5.7 (Evaluating D_f(j) with a cap). Given i such that i ∈ I_j, we query f on every point
in [i − 20C/ε, i + 20C/ε]. If |I_j| ≤ 20C/ε, then I_j is determined by these queries; if these queries do
not determine I_j, we know that |I_j| > 20C/ε. Beyond querying i, this requires 40C/ε (non-adaptive) queries.
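A sketch of Observation 5.7 (the window radius 20C/ε is passed as the parameter `radius`; `None` signals an interval too long to determine):

```python
def capped_interval_length(f, i, radius):
    """Walk out from i, but give up after `radius` steps in either direction.
    Returns |I_j| if the interval containing i fits in the window, else None."""
    n = len(f)
    a = i
    while a > 0 and i - a < radius and f[a - 1] == f[i]:
        a -= 1
    b = i
    while b < n - 1 and b - i < radius and f[b + 1] == f[i]:
        b += 1
    if (a > 0 and f[a - 1] == f[i]) or (b < n - 1 and f[b + 1] == f[i]):
        return None  # interval extends beyond the window: |I_j| > radius
    return b - a + 1
```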

We now can put all the above pieces together and give the proof for Lemma 5.1:

Proof of Lemma 5.1. As previously discussed, we use the algorithm of [15] for estimating support size.
Inspecting their algorithm, we see that our cap of 20C/ε on interval length (and therefore of 20/(εk) on the
maximum probability reported) might result in further error in the estimate. The algorithm interacts with
the unknown function by estimating the expected value of 1/D_f(j) over random choices of j with respect
to D_f. Our cap affects this expectation by at most εk/20. Indeed, the algorithm works by
estimating the quantity

      E_{x∼D_f}[ (1/D_f(x)) · 1_{D_f(x) > τ} ]

for some suitable parameter τ > 0. By capping the value of 1/D_f(x), the estimate therefore changes by at
most (εk/20) · D_f({ x : D_f(x) > 20/(εk) }) ≤ εk/20.
     The condition for their algorithm to estimate support size to within ±εm is that all elements in the
support have a probability mass of at least 1/m. Since each nonempty interval has length at least 1, we
have min_j D_f(j) ≥ 1/(Ck). In order for their algorithm to report an estimate within ±εk/20 of the support
size, we set ε′ = ε/(20C) in their algorithm.
     The total error in support size is at most εk/20 + εk/20 = εk/10. By Observation 5.3, this suffices to
ε-test k-monotonicity of f.
     Using the algorithm of [15], we need O(1/ε′²) = O((C/ε)²) queries to D_f. For every query to D_f,
we need to make O(C/ε) queries to f, so the overall query complexity is O(C³/ε³).

Proof of Claim 5.2. The last part of the statement is immediate from the first, so it suffices to prove the
first implication. We prove the contrapositive: assuming f is (1 + ε)k-monotone, f can be turned into
a k-monotone function by changing at most εn points.
    Let ℓ* be the minimum integer ℓ for which f is ℓ-monotone: we can assume εk ≥ 1 and k < ℓ* ≤
(1 + ε)k (as if ℓ* ≤ k we are done). Consider as above the maximal consecutive monochromatic intervals
I_1, ..., I_{ℓ*+1}. Let S be the set of the (ℓ* − k) shortest intervals. Note that the intervals in S occupy at most an
(ℓ* − k)/(ℓ* + 1) ≤ εk/k = ε fraction of the domain. Thus it suffices to show that modifying the intervals in S
produces a k-monotone function.
    Observe that the partial function formed by the intervals not in S is k-monotone (as only k + 1 intervals
are involved). Let I_j be the first interval which is not in S. We go over the intervals in S in ascending order
and, for each such interval, if its index i is smaller than j, then we set the points in I_i to have the same value
as the points in I_j; otherwise, we set them to have the same value as the points in I_{i−1}. As no modification
increases the number of maximal consecutive monochromatic intervals, this procedure produces
a k-monotone function.
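A simplified greedy variant of this merging procedure can be sketched as follows (it repeatedly relabels a shortest interval with a neighbor's value rather than performing the single ascending pass described above, but the effect is the same: the number of maximal monochromatic intervals only decreases):

```python
def make_k_monotone(f, k):
    """Relabel a shortest maximal monochromatic interval with a neighbor's value,
    until at most k alternations (k + 1 intervals) remain."""
    f = list(f)
    def ivs():
        out, s = [], 0
        for i in range(1, len(f)):
            if f[i] != f[i - 1]:
                out.append((s, i - 1))
                s = i
        out.append((s, len(f) - 1))
        return out
    while len(ivs()) > k + 1:
        a, b = min(ivs(), key=lambda ab: ab[1] - ab[0])   # a shortest interval
        v = f[b + 1] if b + 1 < len(f) else f[a - 1]      # copy a neighbor's value
        for i in range(a, b + 1):
            f[i] = v
    return f

g = make_k_monotone([0, 1, 1, 0, 1, 0, 0, 0], 2)
assert sum(g[i] != g[i - 1] for i in range(1, len(g))) <= 2
```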

5.2.2   Reducing [n] → {0, 1} to [Ck] → {0, 1}
Now we show how to reduce ε-testing k-monotonicity of f : [n] → {0, 1} to ε 0 -testing k-monotonicity of a
function g : [Ck] → {0, 1} for C = poly(1/ε) and ε 0 = poly(ε), resulting in a poly(1/ε)-query algorithm
for ε-testing k-monotonicity.
    The first step is (as before) to divide [n] into blocks (disjoint intervals) of size εn/(4k) if ε > 8k/n
(again assuming without loss of generality that εn/(4k) is an integer), and into blocks of size 1 otherwise
(in which case n ≤ 8k/ε and we can directly apply the result of Lemma 5.1, with C = n/k ≤ 8/ε). Let
m = 4k/ε be the number of resulting blocks, and define f_m : [n] → {0, 1} as the m-block-coarsening of f:
namely, for any j ∈ B_i, we set

      f_m(j) = argmax_{b∈{0,1}} Pr_{x∈B_i}[ f(x) = b ].          (majority vote)

Ordering the blocks B_1, B_2, ..., B_m, we also define g : [m] → {0, 1} by setting g(i) to be the constant
value f_m takes over the i-th block B_i.
     It is easy to see that if f is k-monotone, then f has at most k non-constant blocks, and f_m is k-monotone:
the function f changes value at most k times, and for a block to be non-constant, it must contain a pair of
points witnessing a value change. We call a block variable if its minority points comprise
at least an ε/100-fraction of the block; formally, B is variable if min_{b∈{0,1}} Pr_{j∈B}[f(j) = b] ≥ ε/100.
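The coarsening and the variable-block criterion can be sketched as follows (a toy version using exact counts; ties in the majority vote are broken arbitrarily, and `m` is assumed to divide `n`):

```python
def coarsen(f, m):
    """m-block-coarsening f_m: each block of f is replaced by its majority value
    (ties broken towards 1, arbitrarily)."""
    n = len(f)
    B = n // m                        # block size; m is assumed to divide n
    fm = []
    for i in range(m):
        block = f[i * B:(i + 1) * B]
        fm += [int(2 * sum(block) >= len(block))] * B
    return fm

def is_variable(block, eps):
    """A block is variable if its minority value occupies at least an eps/100 fraction."""
    p1 = sum(block) / len(block)
    return min(p1, 1 - p1) >= eps / 100

assert coarsen([0, 0, 1, 1, 1, 0], 2) == [0, 0, 0, 1, 1, 1]
```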
     We need the following claims (their proofs are at the end of the section) to prove Theorem 1.5.

Claim 5.8. Suppose f has s variable blocks. Then dist( f , fm ) ≤ s/m + ε/100.

Claim 5.9. Suppose f is promised to be either (i) k-monotone, or (ii) such that f_m has more than (5/4)k
variable blocks. Then we can determine which is the case with O((1/ε²) log(1/ε)) queries, with probability 9/10.

Proof of Theorem 1.5. We use the estimation/test from the previous claim as the first part of our tester.
In more detail, note that in case (ii) of Claim 5.9 above, the test simply rejects f. Otherwise, in case (i),
either f is k-monotone or f_m has at most (5/4)k variable blocks. Thus, if f passes, we can assume that f_m has
at most (5/4)k variable blocks. By Claim 5.8,

      dist(f, f_m) ≤ (5k/4)·(1/m) + ε/100 = 5ε/16 + ε/100 ≤ ε/3.

This part takes O((1/ε²) log(1/ε)) queries.
                                   



     Now, we apply the tester of Lemma 5.1 (with probability of success amplified to 9/10 by standard
arguments) to (ε/6)-test k-monotonicity of g : [m] → {0, 1}, where g(i) is the constant value of f_m on
B_i, and m = 4k/ε. Let q be the query complexity of this tester, and set δ = 1/(10q); to query g(i), we
randomly query f on O((1/ε) log(1/δ)) points in B_i and take the majority vote. With probability at least
1 − δ, we get the correct value of g(i), and by a union bound all q simulated queries have the correct
value with probability at least 9/10.
     Therefore, each query to g costs O((log q)/ε) queries to f. In the context of our previous
section, we have C = 4/ε, so q = O(C³/ε³) = O(1/ε⁶), and the overall query complexity of this part
is O((q log q)/ε) = O((1/ε⁷) log(1/ε)). This dominates the query complexity of the other part of the
tester, from Claim 5.9, which is O((1/ε²) log(1/ε)). By a union bound over the part from Claim 5.9,
the simulation of g, and the call to the tester of Lemma 5.1, the algorithm is correct with probability at
least 1 − 3/10 > 2/3.
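The majority-vote simulation of a single query to g can be sketched as follows (the constant 8 in the sample size is an illustrative choice, not the one fixed by the analysis):

```python
import math
import random

def query_g(f, i, B, eps, delta):
    """Simulate one query to g(i): majority vote over O((1/eps) log(1/delta)) uniform
    samples from the i-th block of size B. The constant 8 below is illustrative."""
    t = math.ceil((8 / eps) * math.log(1 / delta))
    votes = sum(f[random.randrange(i * B, (i + 1) * B)] for _ in range(t))
    return int(2 * votes >= t)
```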

Proof of Claim 5.8. We will estimate the error of f_m in computing f on variable blocks and non-variable
blocks separately. Each non-variable block B can contribute error on at most ε |B| /100 points. Each
variable block B can contribute error on at most |B| = n/m points. The total number of errors is at most
εn/100 + s(n/m) = n(ε/100 + s/m), yielding the upper bound on dist( f , fm ).

Proof of Claim 5.9. We first note that given any fixed block B, it is easy to detect whether it is variable
(with probability of failure at most δ ) by making O((1/ε) log(1/δ )) uniformly distributed queries in B.
Doing so, a variable block will be labelled as such with probability at least 1 − δ , while a constant block
will never be marked as variable. (If a block is neither constant nor variable, then any answer will do.)
    Letting s denote the number of variable blocks, we then want to non-adaptively distinguish between

      s ≥ (5/4)k = (5ε/16)·m        and        s ≤ k = (ε/4)·m

(since if f were k-monotone, then f_m would have at most k variable blocks). Doing so with probability at
least 19/20 can be done by checking only q = O(1/ε) blocks chosen uniformly at random: by the above,
setting δ = 1/(20q), all of the q per-block checks will also yield the correct answer with probability no less
than 19/20, so by a union bound we will distinguish (i) and (ii) with probability at least 9/10. We conclude by
observing that all

      O( q · (1/ε) · log(1/δ) ) = O( (1/ε²) log(1/ε) )

queries are indeed non-adaptive.


6    On the grid
We now turn to the grid, and consider k-monotonicity of functions defined on [n]². More specifically,
in this section we prove Theorem 1.6, giving an adaptive tester for 2-monotonicity with optimal query
complexity, before discussing possible extensions of these ideas in Section 6.2.


6.1     The case k = 2
Theorem 1.6. There exists a two-sided adaptive tester for 2-monotonicity of functions f : [n]2 → {0, 1}
with query complexity q(n, ε) = O(1/ε).

Proof. At a high level, the algorithm relies on two key components. The first is the observation that
testing 2-monotonicity of f : [n]² → {0, 1}, under a suitable additional assumption on f, reduces to
(tolerantly) testing monotonicity of two one-dimensional functions (with a larger range) under the L1
norm. The second is that, given access to an arbitrary f, one can efficiently provide query access to some
function g which satisfies this additional assumption, and such that g will also be close to f whenever f
is truly 2-monotone.
     Combining the two then enables one to test this function g for 2-monotonicity, and then check whether
f and g are sufficiently close. The first step, by the above, can be done efficiently
by simulating query access to g, which (with some additional tricks) in turn allows us to simulate access
to the corresponding one-dimensional functions, on which we invoke the L1-tester of [8].
(The main challenge lies in performing this two-level simulation while keeping the number of
queries to f low enough, which we achieve by carefully amortizing the queries made overall.)

Details. Hereafter, we assume without loss of generality that f is identically 0 on the bottom and top
rows, that is f (1, j) = f (n, j) = 0 for all j ∈ [n]. (Indeed, we can ensure this is the case by adding two
extra rows, extending the domain of f to [n + 2] × [n]: note that f remains 2-monotone if it was already,
and can only decrease its distance to 2-monotonicity by O(1/n)).9 For the sake of the proof, we will
require the notion of 2-column-wise-monotonicity, defined below:
Definition 6.1. A function f : [n]² → {0, 1} is said to be 2-column-wise-monotone if, for every j ∈ [n], its
restriction f_j : [n] × {j} → {0, 1} is 2-monotone. Given such a function f, we define the two sequences
(∂f_j)_{j∈[n]} and (∂̄f_j)_{j∈[n]} as the sequences of "changepoints" in the columns. More formally, we define

      ∂f_j = min{ i ∈ [n] : f(i, j) ≠ f(1, j) } − 1,      ∂̄f_j = max{ i ∈ [n] : f(i, j) ≠ f(n, j) } + 1

for every j such that f|_{[n]×{j}} is not constant. If the column f|_{[n]×{1}} is constant, let ∂f_1 = ∂̄f_1 = n;
further, if for some j > 1 the column f|_{[n]×{j}} is constant, let ∂f_j = ∂f_{j−1} and ∂̄f_j = ∂̄f_{j−1}. Note that
∂f_j ≤ ∂̄f_j for every column j ∈ [n].
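For a single non-constant column, the changepoints of Definition 6.1 can be computed as follows (a sketch; `col` is the 0/1 column f(·, j) as a list, and positions are 1-indexed to match the definition):

```python
def changepoints(col):
    """Return (lower, upper) changepoints of one column:
    lower = min{i : col(i) != col(1)} - 1, upper = max{i : col(i) != col(n)} + 1,
    with 1-indexed positions; None for a constant column (handled by convention)."""
    n = len(col)
    lo = [i + 1 for i in range(n) if col[i] != col[0]]
    if not lo:
        return None
    hi = [i + 1 for i in range(n) if col[i] != col[-1]]
    return (min(lo) - 1, max(hi) + 1)

assert changepoints([0, 0, 1, 1, 0]) == (2, 5)
```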
      As it turns out, testing 2-monotonicity of functions guaranteed to be 2-column-wise-monotone reduces
to testing monotonicity of these two specific sequences:

Lemma 6.2. Let f : [n]² → {0, 1} be 2-column-wise-monotone. If f is 2-monotone, then both sequences
(∂f_j)_{j∈[n]} and (∂̄f_j)_{j∈[n]} are non-increasing. Moreover, if f is ε-far from 2-monotone, then at least one
of the two sequences is ε/2-far from non-increasing (in Hamming distance).

Proof. The first part of the lemma is straightforward (by contrapositive, if at least one of the two
sequences is not non-increasing, then we can find a violation of 2-monotonicity). We thus turn to the
second part and show its contrapositive; for this purpose, we require the following result:
   9 This will be used in the proof of Lemma 6.2 and Lemma 6.4.




Claim 6.3. If f : [n]² → {0, 1} is a 2-column-wise monotone function such that (i) f(1, j) = f(n, j) = 0
for all j ∈ [n] and (ii) both sequences (∂f_j)_{j∈[n]}, (∂̄f_j)_{j∈[n]} ⊆ [n] are non-increasing, then f is 2-monotone.

Proof. By contradiction, suppose there exists a 2-column-wise monotone function f satisfying (i) and (ii)
which is not 2-monotone. The latter implies that there exists a triple of comparable elements x = (i_x, j_x) ≺
y = (i_y, j_y) ≺ z = (i_z, j_z) constituting a violation, i. e., such that (f(x), f(y), f(z)) = (1, 0, 1). Moreover,
since (i) holds we must have 1 < i_x ≤ i_y ≤ i_z < n; more precisely, 1 ≤ ∂f_{j_x} < i_x ≤ i_y ≤ i_z < ∂̄f_{j_z} ≤ n. As
x ≺ y ≺ z, we have j_x ≤ j_y ≤ j_z, which by the non-increasing assumption (ii) implies that ∂f_{j_x} ≥ ∂f_{j_y}
and ∂̄f_{j_y} ≥ ∂̄f_{j_z}. But this is not possible, as altogether this leads to ∂f_{j_y} < i_y < ∂̄f_{j_y}, i. e., f(y) = 1.
      Assume both sequences (∂f_j)_{j∈[n]}, (∂̄f_j)_{j∈[n]} ⊆ [n] are ε/2-close to non-increasing, and let L, H ⊂ [n]
(respectively) be the sets of indices where the two sequences need to be changed in order to become
non-increasing. By assumption, |L|, |H| ≤ εn/2, so |L ∪ H| ≤ εn. But "fixing" a value of (∂f_j)_{j∈[n]} or
(∂̄f_j)_{j∈[n]} requires changing the values of the function f inside a single column only, and this can be done
while preserving its 2-column-wise-monotonicity, so that changing the value of f on at most n points per
fixed index is enough. It follows that making both (∂f_j)_{j∈[n]} and (∂̄f_j)_{j∈[n]} non-increasing requires
changing f on at most εn² points, and with Claim 6.3 this results in a function which is 2-monotone. Thus,
f is ε-close to 2-monotone.

     It is possible to refine the above statement to obtain, in the second case, a more precise characterization
in terms of the L1 distance of these two sequences to monotonicity. For conciseness, in the rest of this
section we denote by M_k^{(d)} the class of k-monotone functions on [n]^d, and will omit the subscript when
k = 1 (i. e., for monotone functions: M^{(d)} = M_1^{(d)}).

Lemma 6.4. Let f : [n]² → {0, 1} be 2-column-wise-monotone. If f is ε-far from 2-monotone, then at
least one of the two sequences is ε/2-far from non-increasing (in L1 distance). More precisely:

      dist( f, M_2^{(2)} ) ≤ L1(∂̄f, M^{(1)}) + L1(∂f, M^{(1)}).                  (6.1)
Proof. Recall that we aim at establishing Equation (6.1):

      dist( f, M_2^{(2)} ) ≤ L1(∂̄f, M^{(1)}) + L1(∂f, M^{(1)}).

For notational convenience, in this proof we view the sequences (∂f_j)_j, (∂̄f_j)_j as functions

      ∂f, ∂̄f : [n] → [n].

Let ℓ, h : [n] → [n] (for "low" and "high," respectively) be monotone functions achieving L1(∂f, M^{(1)})
and L1(∂̄f, M^{(1)}), respectively.

    • As ∂f(j) ≤ ∂̄f(j) for all j ∈ [n], we may assume ℓ(j) ≤ h(j) for all j. Otherwise, one can consider
      instead the functions ℓ′ = min(ℓ, h) and h′ = max(ℓ, h): both will still be monotone (non-increasing),
      and by construction

            |ℓ′(j) − ∂f(j)| + |h′(j) − ∂̄f(j)| ≤ |ℓ(j) − ∂f(j)| + |h(j) − ∂̄f(j)|

      for all j ∈ [n], so that L1(∂f, ℓ′) + L1(∂̄f, h′) ≤ L1(∂f, ℓ) + L1(∂̄f, h).


    • From ℓ and h, we can define a 2-column-wise monotone function g : [n]² → {0, 1} such that ∂g = ℓ
      and ∂̄g = h: that is,

            g(i, j) = 1 if ℓ(j) < i < h(j),   and   g(i, j) = 0 otherwise (i. e., if i ≤ ℓ(j) or i ≥ h(j)),

      for (i, j) ∈ [n]².

It is clear that g is 2-column-wise monotone with g(1, j) = g(n, j) = 0 for all j ∈ [n]; since by construction
∂g and ∂̄g are non-increasing, we can invoke Claim 6.3 to conclude that g is 2-monotone. It remains to bound
the distance between f and g: writing ∆_j ∈ {0, ..., n} for the number of points on which f and g differ in the
j-th column, we have

      dist( f, M_2^{(2)} ) ≤ dist(f, g) = (1/n²) ∑_{j=1}^{n} ∆_j ≤ (1/n²) ∑_{j=1}^{n} ( |ℓ(j) − ∂f(j)| + |h(j) − ∂̄f(j)| )
                           = (1/n²) ∑_{j=1}^{n} |ℓ(j) − ∂f(j)| + (1/n²) ∑_{j=1}^{n} |h(j) − ∂̄f(j)| = L1(∂f, ℓ) + L1(∂̄f, h)
                           ≤ L1(∂f, M^{(1)}) + L1(∂̄f, M^{(1)}),

which concludes the proof.
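The construction of g from ℓ and h, together with Claim 6.3, can be illustrated as follows (a brute-force spot-check on a small example with non-increasing ℓ ≤ h; enumerating all comparable triples is of course only feasible at toy scale):

```python
from itertools import product

def build_g(l, h, n):
    """g(i, j) = 1 iff l[j] < i < h[j] (i is 1-indexed), as in the proof of Lemma 6.4."""
    return [[int(l[j] < i < h[j]) for j in range(len(l))] for i in range(1, n + 1)]

# With non-increasing l <= h, Claim 6.3 says g is 2-monotone:
# check that no comparable triple takes the values (1, 0, 1).
n, l, h = 5, [3, 2, 2, 1, 1], [5, 5, 4, 4, 2]
g = build_g(l, h, n)
pts = list(product(range(n), range(n)))
for (i1, j1), (i2, j2), (i3, j3) in product(pts, pts, pts):
    if i1 <= i2 <= i3 and j1 <= j2 <= j3:
        assert (g[i1][j1], g[i2][j2], g[i3][j3]) != (1, 0, 1)
```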

    Having established these two lemmata, we turn to the proof of Theorem 1.6. We first describe a
non-optimal tester making O(1/ε²) queries, before explaining how to modify it in order to amortize the
number of queries and yield the desired O(1/ε) query complexity.
    Suppose we are given query access to an arbitrary function f : [n]² → {0, 1}. For simplicity, as before
we assume without loss of generality that εn/16 is an integer, and partition each column into K := 16/ε
intervals of size εn/16; that is, we partition the domain [n] × [n] into a grid [16/ε] × [n] (each column being
divided into K blocks B_1, ..., B_K of size εn/16). This uniquely defines a function g : [n]² → {0, 1, ∗}: for
any point x = (i, j) ∈ [n]², we let ℓ ∈ [K] be the index such that i ∈ B_ℓ = {b_ℓ, ..., b_{ℓ+1} − 1}, and set:

    • g(x) = f(b_ℓ, j), if f(b_ℓ, j) = f(b_{ℓ+1} − 1, j);

    • g(x) = ∗, if f(b_ℓ, j) ≠ f(b_{ℓ+1} − 1, j);

so that g is constant on any "block" B_ℓ × {j}. Note that we can provide query access to g, at the price of
an overhead of 2 queries (to f) per query (to g).
    However, this g may not itself be 2-column-wise-monotone; for this reason, we will instead work
with a "fixed" version of g which will, by construction, be 2-column-wise-monotone. In more detail, we
define g̃ to be the 2-column-wise-monotone function obtained by the following process.

    • First, we (arbitrarily) set the values ∗ to 0, so that g becomes a function g : [n]² → {0, 1}.


    • Define g̃ by its restriction to each column: letting (∂g_j)_{j∈[n]} and (∂̄g_j)_{j∈[n]} be as defined in Defini-
      tion 6.1 (observing that these quantities are well and uniquely defined even if g is not 2-column-wise-
      monotone), set

            g̃(i, j) = g(1, j) if i ≤ ∂g_j,   g̃(i, j) = 1 − g(1, j) if ∂g_j < i < ∂̄g_j,   g̃(i, j) = g(n, j) if i ≥ ∂̄g_j.

From this construction, it is clear that g̃ is 2-column-wise-monotone, and entirely and deterministically
determined by g (and therefore by f); moreover, we have g̃ = g whenever g is itself 2-column-wise-
monotone. Furthermore, any query to g̃ can straightforwardly be answered by making at most O(1/ε)
queries to g, and hence to f.
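The column-wise repair defining g̃ can be sketched as follows (operating on a single 0/1 column of g; 0-indexed analogues of the changepoints are computed inline):

```python
def repair_column(col):
    """Column-wise repair used to define g~: keep the first and last values, and make
    the column 2-monotone using its (well-defined) changepoints."""
    n = len(col)
    diff_lo = [i for i in range(n) if col[i] != col[0]]
    diff_hi = [i for i in range(n) if col[i] != col[-1]]
    if not diff_lo:
        return list(col)                          # constant column: unchanged
    lo, hi = min(diff_lo) - 1, max(diff_hi) + 1   # 0-indexed changepoint analogues
    return [col[0] if i <= lo else (col[-1] if i >= hi else 1 - col[0])
            for i in range(n)]

assert repair_column([0, 1, 0, 1, 0]) == [0, 1, 1, 1, 0]
```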

   • If f is 2-monotone, then g is ε/8-close
                                                 to f and so is ge; moreover, ge is 2-monotone as well.
                                                 (2)
     Therefore dist( f , ge) ≤ ε/8 and dist ge, M2 = 0.

   • If f is ε-far from 2-monotone, then g̃ is either (i) ε/4-far from f or (ii) 3ε/4-far from 2-monotone:
     if neither holds, then by the triangle inequality f is ε-close to 2-monotone.

The tester (first take). The algorithm now proceeds as follows:

   1. Simulate query access to g̃ (defined as above) to detect whether

                                       dist(g̃, M_2^(2)) ≥ ε,

      using Equation (6.1), with probability of failure 1/6. More precisely, test monotonicity in L1
      distance of both ∂̲g̃ and ∂̄g̃ with parameter ε/64, using the (non-adaptive) algorithm of [8], and
      reject if either of these two tests rejects.
                               
         • If dist( f , M_2^(2)) = 0 (and so in particular g̃ is 2-monotone as well, by the first bullet above),
           then
            (a) L1(∂̄g̃, M^(1)) = L1(∂̲g̃, M^(1)) = 0, and
            (b) estimating whether dist( f , g̃) ≤ ε/8 vs. dist( f , g̃) > ε/4 reveals the correct answer
                (the former) with probability at least 5/6,
           by Lemma 6.2, and both tests will accept.

         • If dist( f , M_2^(2)) > ε (and so in particular g̃ is possibly 3ε/4-far from 2-monotone), then
            (a) max(L1(∂̄g̃, M^(1)), L1(∂̲g̃, M^(1))) > 3ε/8, or
            (b) estimating whether dist( f , g̃) ≤ ε/8 vs. dist( f , g̃) > ε/4 reveals the correct answer
                (the latter) with probability at least 5/6.
           So, in this case, at least one of the two tests rejects.

   2. return ACCEPT if both of the two tests above passed, REJECT otherwise.


By a union bound, the tester is then correct with probability at least 2/3; its query complexity is

                            2t · Õ(1/ε) + t′ · O(1/ε) + O(1/ε),

where t and t′ are respectively the cost of simulating query access to ∂̄g, ∂̲g, and to g̃; indeed the first
step, testing in L1 for functions defined on the line [n], has query complexity Õ(1/ε) from [8]. Taking
t = t′ = O(1/ε) as discussed above then results in a query complexity of Õ(1/ε²).
    However, as mentioned previously this is not optimal: as we shall see, we can modify this approach to
obtain instead an O(1/ε) query complexity. In order to amortize the overall query complexity, we define
the following process, which specifies a 2-column-wise-monotone function g̊.
The Amortized Tester The idea here is to have our previous tester simulate query access to a new
     function g̊ instead of g̃. Simulating a single query to g̃ took O(1/ε) queries to f , and this is where
     amortizing queries using g̊ reduces the number of queries. Details follow.

          • (Initialization) Let j₁, . . . , j_{1/ε+1} ∈ [n] be the indices defined by j_ℓ = (ℓ − 1) · εn + 1 for
            ℓ ∈ [1/ε], and j_{1/ε+1} = n.
          • (g̊ on the j₁-st column) Obtain all the values of g̃ on the j₁-st column [n] × {j₁}, at a cost of
            O(1/ε) queries to f , to find ∂̄g̊_{j₁}, ∂̲g̊_{j₁}. Define g̊ on this column accordingly.
          • (g̊ on remaining columns) Assuming g̊ has been defined on the j_ℓ-th column, define it on the
            j_{ℓ+1}-th column: starting at the "vertical" positions of the two changepoints ∂̄g̊_{j_ℓ}, ∂̲g̊_{j_ℓ} of the
            previous column, query "downwards" the values of g on the j_{ℓ+1}-th column until candidate
            values for ∂̄g̊_{j_{ℓ+1}}, ∂̲g̊_{j_{ℓ+1}} consistent with g are found or the bottom of the column
            is reached (in which case the corresponding changepoint ∂̄g̊_{j_{ℓ+1}} or ∂̲g̊_{j_{ℓ+1}} is set to 1).
            After this, define g̊ on the j_{ℓ+1}-th column to be consistent with these (at most) two changepoints.

      Note that the queries made in this step are adaptive, and the (partial) function g̊ obtained at the
      end coincides with g̃ if f (and therefore g̃) is indeed 2-monotone. This is because in this case,
      by Lemma 6.2, the sequences (∂̄g̃_j)_j, (∂̲g̃_j)_j are non-decreasing, and therefore the process
      outlined above results in ∂̄g̃_{j_ℓ} = ∂̄g̊_{j_ℓ} and ∂̲g̃_{j_ℓ} = ∂̲g̊_{j_ℓ} for all 1 ≤ ℓ ≤ 1/ε + 1. Moreover, it
      is not difficult to see that the total number of queries made to f in this initialization step is
      O(1/ε): this is because we only search for the current changepoint ∂̄g̊_{j_ℓ} (resp. ∂̲g̊_{j_ℓ})
      starting at the position of the previous one ∂̄g̊_{j_{ℓ−1}} (resp. ∂̲g̊_{j_{ℓ−1}}), going downwards only. Since to
      obtain a changepoint we only query the positions of g (and therefore f ) at block endpoints, there
      are in total at most 1/ε positions to query, starting at one endpoint and then only going "down."
      Thus, the number of queries made for each column j_ℓ can be written as O(1) + m_ℓ, where

                                     ∑_{ℓ=1}^{1/ε+1} m_ℓ ≤ K = O(1/ε) .
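The amortization argument above can be illustrated by a toy model in which the true changepoint of each sampled column is given by an oracle (`changepoint_of` below is our stand-in for the probes to g, not part of the paper's algorithm); since the pointer only ever moves down, the total number of probes is bounded by the range of the changepoints plus one confirmation probe per column:

```python
def amortized_changepoints(changepoint_of, cols):
    """Toy model of the initialization scan: changepoint_of(j) plays the role
    of probing column j of g, and is assumed non-increasing along cols (the
    monotone-changepoints situation guaranteed by Lemma 6.2 when f is
    2-monotone). Returns the changepoints found and the probes spent."""
    probes = 0
    cur = changepoint_of(cols[0])  # first column: fully read, paid separately
    found = [cur]
    for j in cols[1:]:
        target = changepoint_of(j)
        while cur > target:        # scan downward from the previous changepoint
            cur -= 1
            probes += 1
        probes += 1                # one probe to confirm we can stop here
        found.append(cur)
    return found, probes
```

Over all columns the downward moves telescope, which is exactly the ∑ m_ℓ = O(1/ε) accounting above.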


Query time. When querying the value of g̊ on a point (i, j), first let ℓ be the index such that j_ℓ ≤ j < j_{ℓ+1}.
      Then define ∂̄g̊_j (resp. ∂̲g̊_j) by querying the value of g on the j-th column at all (at most 1/ε)
      block indices between ∂̄g̊_{j_ℓ} and ∂̄g̊_{j_{ℓ+1}} (resp. ∂̲g̊_{j_ℓ} and ∂̲g̊_{j_{ℓ+1}}) to find the corresponding candidate
      changepoints:

                  ∂̲g̊_j = min{ i : ∂̲g̊_{j_{ℓ+1}} ≤ i ≤ ∂̲g̊_{j_ℓ}, g(i, j) ≠ g(1, j) } − 1 ,
                  ∂̄g̊_j = max{ i : ∂̄g̊_{j_{ℓ+1}} ≤ i ≤ ∂̄g̊_{j_ℓ}, g(i, j) ≠ g(n, j) } + 1 .

Again, note that after each query the (partial) function g̊ obtained so far will coincide with g̃ if f (and
therefore g̃) is indeed 2-monotone; and g̊ is uniquely determined by f (and in particular does not depend
on the actual queries made, nor on their order). Finally, the function g̊ thus defined will, by construction,
always be 2-column-wise-monotone.
    The tester previously described can then be slightly modified to simulate access to g̊ instead of g̃: by
the above discussion, for the completeness case we will have the same guarantees, since then g̊ = g̃, while
the soundness case stays unchanged:
                                                                                                  
    • If f is 2-monotone, then g̊ = g̃ is ε/8-close to f ; so dist( f , g̊) ≤ ε/8 and dist(g̊, M_2^(2)) ≤ ε/8.

    • If f is ε-far from 2-monotone, then g̊ is either (i) ε/4-far from f or (ii) 3ε/4-far from 2-monotone.
Thus, the analysis of correctness of the tester carries through with this modification; it only remains to
bound the query complexity. We will show that the expected number of queries made is O(1/ε); a bound
on the worst-case query complexity will then follow from standard arguments,10 at the price of a constant
factor in the O(·).
    To give this bound on the expected number of queries, we first observe that the algorithm from [8] we
rely on in the first stage of the tester is non-adaptive, and moreover all the queries it makes are uniformly
distributed (as it works by a reduction, invoking the non-adaptive, one-sided, sample-based monotonicity
tester for functions [n] → {0, 1}). Similarly, all the queries made in the second stage are uniformly
distributed as well.
    Therefore, the expected number of queries a_ℓ made to columns with indices j_ℓ ≤ j < j_{ℓ+1} is the same
for each ℓ ∈ [1/ε + 1], namely

                                     a_ℓ = q/(1/ε) = O(1) ,

where q = O(1/ε) is the total number of queries made to g̊ (and/or to ∂̄g̊, ∂̲g̊) during the second phase,
"Query time." Now, letting m_ℓ = ∂̲g̊_{j_ℓ} − ∂̲g̊_{j_{ℓ+1}} and m′_ℓ = ∂̄g̊_{j_ℓ} − ∂̄g̊_{j_{ℓ+1}}, we have that the total expected
cost of simulating these queries (in terms of queries to f ) is upper bounded by

      ∑_{ℓ=1}^{1/ε} a_ℓ · (O(1) + m_ℓ + m′_ℓ) = O(1/ε) + O(1) · ∑_{ℓ=1}^{1/ε} m_ℓ + O(1) · ∑_{ℓ=1}^{1/ε} m′_ℓ = O(1/ε) .
Since the total number of queries made to f is the sum of the number of queries made to partly build g̊
during the "Initialization" phase (which is O(1/ε) by the foregoing discussion), the number of queries
made to simulate access to g̊ or ∂̄g̊, ∂̲g̊ during the "Query time" phase (which was just shown to be
O(1/ε) in expectation), and the number of queries directly made to f when testing the distance of f to g̊
(which is also O(1/ε)), the expected total number of queries is indeed O(1/ε), as claimed.
   ¹⁰ Namely, stopping the algorithm and outputting REJECT if the number of queries made exceeds C/ε for some absolute
constant C > 0, found by applying Markov's inequality.



6.2     Possible extensions
We now discuss two possible extensions of the techniques underlying Theorem 1.6, namely to (i) k-
monotonicity testing of functions over [n]2 , for general k; and (ii) 2-monotonicity testing of functions
over [n]d , for general d. (Note that we do provide a different tester for general k and d in the next
section, Section 7).

Extending to general k (for d = 2). A natural direction would be first to generalize Definition 6.1
to k-column-wise monotone functions f , defining the k sequences

                            (∂̲f_j^(1))_{j∈[n]} , . . . , (∂̲f_j^(k))_{j∈[n]}

of column changepoints, with ∂̲f_j^(1) ≤ · · · ≤ ∂̲f_j^(k) for all j ∈ [n]. The next step would then be to obtain
an analogue of the key lemma of the previous section, Lemma 6.4, in this setting. An issue is that it now
appears necessary to consider the L1 distance to (k − 1)-monotonicity of these k sequences, instead of to
monotonicity as before. Thus, taking this route requires generalizing the definition of k-monotonicity
to real-valued functions, but also developing L1-testers for k-monotonicity over the line.
    The testing algorithm now follows the same outline as in the previous section, with the same
"amortizing" idea when invoking this newly obtained L1-tester for k-monotonicity in parallel on the
k subsequences, each with probability of failure δ = 1/(10k) (for a union bound) and approximation
parameter ε′ = ε/(10k). (Note that some more optimizations may then help further reduce the query
complexity, by "sharing" the same set of queries between the k instances of the L1-testing algorithm.)

Extending to general d (for k = 2). At a very high level, the tester of Section 6.1 works by reducing
2-monotonicity testing of f : [n]² → {0, 1} to monotonicity L1-testing of ∂̲f , ∂̄f : [n] → [0, 1]. More
generally, one can hope to extend this approach to higher dimensions, reducing 2-monotonicity testing
of f : [n]^d → {0, 1} to monotonicity L1-testing of ∂̲f , ∂̄f : [n]^{d−1} → [0, 1]: that is, testing monotonicity
(in L1) of the two (d − 1)-dimensional "surfaces" of changepoints. This in turn could be done by invoking
the non-adaptive L1 tester of [8] for monotonicity over [n]^d, which has query complexity Õ(d/ε); this
may lead to a total query complexity of poly(d, 1/ε), that is, polynomial in the dimension. We leave this
possible extension as an interesting direction for future work.


7     On the high-dimensional grid
In this section, we give two algorithms for tolerant testing, that is, testing whether a function f : [n]^d →
{0, 1} is ε₁-close to k-monotone vs. ε₂-far from k-monotone, establishing Theorem 1.8. The first has
query complexity exponential in the dimension d and is fully tolerant, that is, it works for any setting of
0 ≤ ε₁ < ε₂ ≤ 1. The second applies whenever ε₂ > 3ε₁, and has (incomparable) query complexity
exponential in Õ(k√d/(ε₂ − 3ε₁)²). Both of these algorithms can be used for non-tolerant ("regular")
testing by setting ε₁ = 0 and ε₂ = ε, which implies Corollary 1.7.

Theorem 1.8. There exist


    • a non-adaptive (fully) tolerant tester for k-monotonicity of functions f : [n]^d → {0, 1} with query
      complexity

                   q(n, d, ε₁, ε₂, k) = Õ( (1/(ε₂ − ε₁)²) · (5kd/(ε₂ − ε₁))^d ) ;

    • a non-adaptive tolerant tester for k-monotonicity of functions f : [n]^d → {0, 1} with query
      complexity

                   q(n, d, ε₁, ε₂, k) = exp( Õ( k√d/(ε₂ − 3ε₁)² ) ) ,

      under the restriction that ε₂ > 3ε₁.

As a corollary, this implies Corollary 1.7, restated below.

Corollary 1.7. There exists a non-adaptive two-sided tester for k-monotonicity of functions f : [n]^d →
{0, 1} with query complexity

           q(n, d, ε, k) = min( Õ( (1/ε²) · (5kd/ε)^d ) , exp( Õ( k√d/ε² ) ) ) .

      For convenience, we will view in this part of the paper the set [n] as [n] = {0, 1, . . . , n − 1}. Assuming
that m divides n, we let B_{m,n} : [n]^d → [m]^d be the mapping such that B_{m,n}(y)_i = ⌊my_i/n⌋ for 1 ≤ i ≤ d.
For x ∈ [m]^d, we define the set B_{m,n}^{−1}(x) to be the inverse image of x. Specifically, B_{m,n}^{−1}(x) is the set of
points of the form (n/m) · x + [n/m]^d, with standard definitions for scalar multiplication and coordinate-wise
addition. That is, B_{m,n}^{−1}(x) is a "coset" of [n/m]^d in [n]^d, containing (n/m)^d points. To keep in line with
the notation of the other sections, we will call these cosets blocks, and will say a function h : [n]^d → {0, 1}
is an m-block function if it is constant on each block. Moreover, for clarity of presentation, we will omit the
subscripts on B and B^{−1} whenever they are not necessary.
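As a quick illustration of these conventions, here is a minimal Python sketch of the map B_{m,n} and of the block B^{−1}(x); the names `B` and `B_inv` are ours.

```python
from itertools import product


def B(y, m, n):
    """The map B_{m,n}: send y in [n]^d to its block index in [m]^d."""
    return tuple((m * yi) // n for yi in y)


def B_inv(x, m, n):
    """The block of x in [m]^d: the coset (n/m) * x + [n/m]^d inside [n]^d."""
    side = n // m
    base = tuple(side * xi for xi in x)
    return {tuple(b + r for b, r in zip(base, rs))
            for rs in product(range(side), repeat=len(x))}
```

Each block contains exactly (n/m)^d points, and B maps every point of B_inv(x) back to x.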
      We first establish a lemma that will be useful for the proofs of correctness of both algorithms.

Lemma 7.1. Suppose f : [n]d → {0, 1} is k-monotone. Then there is an m-block function h : [n]d → {0, 1}
such that dist( f , h) < kd/m.

Proof. Fix any k-monotone function f : [n]^d → {0, 1}. We partition [m]^d into chains of the form

            C_x = { x + ℓ · 1_d : ℓ ∈ ℕ } ,    for x ∈ [m]^d with x_i = 0 for some i.

There are m^d − (m − 1)^d ≤ dm^{d−1} of these chains; we will show that f can only be nonconstant on at
most k blocks of each chain.
    By contradiction, suppose there exists x ∈ [m]^d such that f is nonconstant on k + 1 different blocks
B^{−1}(z^(i)), where z^(1) ≺ z^(2) ≺ · · · ≺ z^(k) ≺ z^(k+1), and each z^(i) ∈ C_x. By construction, we have
B^{−1}(z^(i)) ≺ B^{−1}(z^(j)) for i < j. For each 1 ≤ i ≤ k + 1, there are two points v_∗^(i), v_{∗∗}^(i) ∈ B^{−1}(z^(i))
such that v_∗^(i) ≺ v_{∗∗}^(i) and f (v_∗^(i)) ≠ f (v_{∗∗}^(i)). By construction v_∗^(1) ≺ v_{∗∗}^(1) ≺ v_∗^(2) ≺ v_{∗∗}^(2) ≺ · · · ≺ v_∗^(k+1) ≺ v_{∗∗}^(k+1), and
there must be at least k + 1 pairs of consecutive points with differing function values. Out of these 2k + 2


many points, there is a chain of points v̄^(1) ≺ v̄^(2) ≺ · · · ≺ v̄^(k+1) where f (v̄^(i)) ≠ f (v̄^(i+1)) for 1 ≤ i ≤ k,
which is a violation of the k-monotonicity of f .
    Thus, in each of the dm^{d−1} many chains of blocks, there can only be k nonconstant blocks. It
follows that there are at most kdm^{d−1} nonconstant blocks in total. We now define h(y) to be equal
to f (y) if f is constant on the block containing y, and arbitrarily set h(y) = 0 otherwise. Each block
contains (n/m)^d = n^d · m^{−d} many points, and f is nonconstant on at most kdm^{d−1} blocks. It follows that
dist( f , h) ≤ kdm^{d−1} · m^{−d} = kd/m.
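The rounding used in this proof is easy to check numerically. The sketch below (our own illustration, not part of any algorithm in the paper) builds the m-block function h from the proof, and for a monotone (hence 1-monotone) function on [n]² one can verify dist( f , h) < kd/m:

```python
from itertools import product


def block_round(f, n, m, d):
    """The m-block function h from the proof of Lemma 7.1: h agrees with f on
    blocks where f is constant, and is (arbitrarily) 0 elsewhere."""
    side = n // m
    h = {}
    for x in product(range(m), repeat=d):
        block = [tuple(side * xi + ri for xi, ri in zip(x, r))
                 for r in product(range(side), repeat=d)]
        vals = {f(y) for y in block}
        v = vals.pop() if len(vals) == 1 else 0  # nonconstant block -> 0
        for y in block:
            h[y] = v
    return h
```

Only the blocks crossed by the "boundary" of f differ from f, which is where the kd/m bound comes from.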

7.1     Fully tolerant testing with O(kd/(ε₂ − ε₁))^d queries
Our first algorithm (Algorithm 1) then proceeds by essentially brute-force learning an m-block function
close to the unknown function, and establishes the first item of Theorem 1.8.

Algorithm 1 Fully tolerant testing with O(kd/(ε₂ − ε₁))^d queries.
Require: Query access to f : [n]^d → {0, 1}, ε₂ > ε₁ ≥ 0, a positive integer k
 1: α ← ε₂ − ε₁, m ← ⌈5kd/α⌉, t ← ⌈25 ln(6m^d)/(2α²)⌉
 2: ▷ Define a distribution D over [m]^d × {0, 1}.
 3: for x ∈ [m]^d do
 4:      Query f on t random points T_x ⊆ B^{−1}(x).
 5:      D(x, 0) ← Pr_{y∈T_x}[ f (y) = 0 ]/m^d
 6:      D(x, 1) ← Pr_{y∈T_x}[ f (y) = 1 ]/m^d
 7: end for
 8: ▷ Define a distribution D′ over [n]^d × {0, 1} such that D′(y, b) = D(B(y), b) · m^d/n^d.
 9: if there exists a k-monotone m-block function h such that Pr_{(y,b)∼D′}[ h(y) ≠ b ] ≤ ε₁ + α/2 then return
    ACCEPT
10: end if
11: return REJECT
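The sampling stage (steps 2–7) is straightforward to implement. The sketch below is only meant as an illustration of those steps under our conventions; the search over k-monotone m-block functions in step 9 is the expensive, query-free part and is omitted.

```python
import random
from itertools import product


def build_D(f, n, m, d, t, rng=random):
    """Steps 2-7 of Algorithm 1: the empirical distribution D on [m]^d x {0,1},
    built from t uniformly random queries inside each block."""
    side = n // m
    D = {}
    for x in product(range(m), repeat=d):
        pts = [tuple(side * xi + rng.randrange(side) for xi in x)
               for _ in range(t)]
        p1 = sum(f(y) for y in pts) / t  # empirical Pr[f(y) = 1] in block x
        D[(x, 0)] = (1 - p1) / m ** d
        D[(x, 1)] = p1 / m ** d
    return D
```

By construction the weights over all (x, b) sum to 1, so D is indeed a distribution over [m]^d × {0, 1}.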


Proposition 7.2. Algorithm 1 accepts all functions ε₁-close to k-monotone functions, and rejects all
functions ε₂-far from k-monotone (with probability at least 2/3). Its query complexity is

           O( (d/(ε₂ − ε₁)²) · ( (5kd/(ε₂ − ε₁))^d + 1 ) · log(kd/(ε₂ − ε₁)) ) .

Proof. The algorithm first estimates Pr_{y∈B^{−1}(x)}[ f (y) = b ] for every x ∈ [m]^d and b ∈ {0, 1} to within
±α/5. We use t = 25 ln(6m^d)/(2α²) points in each block to ensure (by an additive Chernoff bound) that
each estimate is correct except with probability at most m^{−d}/3. By a union bound, the probability that all
estimates are correct is at least 2/3. Conditioning on this event, we have

           E_{(x,b)∼D}[ Pr_{y∈B^{−1}(x)}[ f (y) ≠ b ] ] = Pr_{(y,b)∼D′}[ f (y) ≠ b ] ≤ α/5 .

In this probability experiment, the marginal distribution of D′ on y is uniform over [n]^d.


     Let f∗ : [n]^d → {0, 1} be a k-monotone function minimizing Pr[ f (y) ≠ f∗(y) ]. Lemma 7.1 ensures
that there is a k-monotone m-block function h : [n]^d → {0, 1} such that dist( f∗, h) < kd/m ≤ α/5. Let
h∗ : [n]^d → {0, 1} be a k-monotone m-block function minimizing dist( f∗, h∗).

Completeness. Suppose dist( f , f∗) ≤ ε₁. Then by the triangle inequality,

   Pr_{(y,b)∼D′}[ h∗(y) ≠ b ] ≤ Pr_{(y,b)∼D′}[ h∗(y) ≠ f∗(y) ] + Pr_{(y,b)∼D′}[ f∗(y) ≠ f (y) ]
                                + Pr_{(y,b)∼D′}[ f (y) ≠ b ] ≤ ε₁ + 2α/5 ,

where to bound the first term Pr_{(y,b)∼D′}[ h∗(y) ≠ f∗(y) ] by dist( f∗, h∗) ≤ α/5 we used the fact that the
marginal distribution of y is uniform when (y, b) ∼ D′. Thus, the algorithm will find a k-monotone
m-block function close to D (without using any queries to f ) and accept.

Soundness. Suppose dist( f , f∗) ≥ ε₂. Then by the triangle inequality,

        Pr_{(y,b)∼D′}[ h(y) ≠ b ] ≥ Pr_{(y,b)∼D′}[ h(y) ≠ f (y) ] − Pr_{(y,b)∼D′}[ f (y) ≠ b ]
                                  ≥ Pr_{(y,b)∼D′}[ f∗(y) ≠ f (y) ] − Pr_{(y,b)∼D′}[ f (y) ≠ b ] ≥ ε₂ − α/5

for every k-monotone m-block function h. Since ε₂ − 2α/5 ≥ ε₁ + 3α/5, the algorithm never finds a
k-monotone m-block function h with low error with respect to D, and the algorithm will reject.

Query complexity. The algorithm only makes queries in constructing D; the number of queries required
is

                 m^d · t = O( (d/α²) · ( (5kd/α)^d + 1 ) · log(kd/α) ) .

7.2     Tolerant testing via agnostic learning
We now present our second algorithm, Algorithm 2, proving the second item of Theorem 1.8. At its core
is the use of an agnostic learning algorithm for k-monotone functions, which we first describe.11
Proposition 7.3. There exists an agnostic learning algorithm for k-monotone functions f : [r]^d → {0, 1}
with excess error τ and sample complexity exp(Õ(k√d/τ²)).


    We start the proof of Proposition 7.3 by formally presenting the learning algorithm.
    To prove its correctness we will rely on tools from Fourier analysis. For this reason, it will be
convenient in this section to view the range as {−1, 1} instead of {0, 1}.
    The idea of the algorithm is to first attempt to learn the function, before checking whether what was
learned is indeed close to both the unknown function and the set of k-monotone functions. To perform the
   ¹¹ Recall that an agnostic learner with excess error τ for some class of functions C is an algorithm that, given an unknown
distribution D, an unknown arbitrary function f , and access to random labelled samples ⟨x, f (x)⟩ where x ∼ D, satisfies the
following. It outputs a hypothesis function ĥ such that Pr_{x∼D}[ f (x) ≠ ĥ(x) ] ≤ OPT_D + τ with probability at least 2/3, where
OPT_D = min_{h∈C} Pr_{x∼D}[ f (x) ≠ h(x) ] (i. e., it performs "almost as well as the best function in C").




Algorithm 2 Multiplicative approximation with exp(Õ(k√d/(ε₂ − 3ε₁)²)) queries
Require: Query access to f : [n]^d → {0, 1}, ε₂ > 3ε₁ ≥ 0, a positive integer k
 1: α ← ε₂ − 3ε₁, m ← ⌈6kd/α⌉, t ← ⌈3d(k + 1)/α · ln m + ln 100⌉
 2: Define D to be the distribution over [m]^d × {0, 1} such that D(x, b) = Pr_{y∈B^{−1}(x)}[ f (y) = b ]/m^d.
 3: ▷ A_D(τ, f ) denotes the output of an agnostic learner of k-monotone functions with respect to D,
    with excess error τ and probability of failure 1/10
 4: h : [m]^d → ℝ ← A_D(α/12, f ).
 5: Estimate Pr_{(x,b)∼D}[ h(x) ≠ b ] to within ±α/7 with probability of failure 1/10, using O(1/α²)
    queries.
 6: if the estimate is more than ε₁ + 5α/12 then return REJECT
 7: end if
 8: if dist(h, ℓ) = Pr_{x∈[m]^d}[ h(x) ≠ ℓ(x) ] ≤ 2ε₁ + 5α/12 for some k-monotone m-block function ℓ then return
    ACCEPT
 9: else return REJECT
10: end if


first step efficiently enough, we need to design an agnostic learning algorithm for k-monotone functions,
which is what most of this section will be dedicated to. In more detail, we will rely on a theorem of [34],
which very broadly speaking states that any class of functions C that can be well-approximated by linear
combinations of a small number of functions admits a good agnostic learner (see Theorem 7.10 for the
formal statement). Given this, our goal is to establish the existence of a small set of functions whose
linear combinations “cover” the class of k-monotone functions. To do so, we first generalize a theorem of
Blais et al. [11], to show that k-monotone functions on the hypergrid have small influence (Lemma 7.6).
We then use this in Lemma 7.8 to get that k-monotone functions have Fourier concentration on relatively
small degree coefficients, which we leverage in Lemma 7.9 to obtain our small set of functions.

7.3     Preliminary tools from Fourier analysis
Definition 7.4. For a Boolean function f : [r]^d → {−1, 1}, we define the influence of variable i on f as

                           Inf_i[ f ] = 2 Pr[ f (x) ≠ f (x^(i)) ] ,

where x = (x₁, x₂, . . . , x_d) is a uniformly random string over [r]^d, and

                  x^(i) = (x₁, x₂, . . . , x_{i−1}, x_i′, x_{i+1}, . . . , x_d)

for x_i′ drawn independently and uniformly from [r]. We also define the total influence of the variables on
f as Inf[ f ] = ∑_{i=1}^d Inf_i[ f ].
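For small r and d, the quantities in Definition 7.4 can be computed exactly by enumeration, which the following sketch (our illustration) does:

```python
from itertools import product


def influence(f, r, d, i):
    """Exact Inf_i[f] = 2 Pr[f(x) != f(x^(i))], by enumerating x and the
    redrawn coordinate x_i' (feasible only for small r and d)."""
    tot = diff = 0
    for x in product(range(r), repeat=d):
        for xi_new in range(r):
            xp = x[:i] + (xi_new,) + x[i + 1:]  # x with coordinate i redrawn
            tot += 1
            diff += (f(x) != f(xp))
    return 2 * diff / tot


def total_influence(f, r, d):
    return sum(influence(f, r, d, i) for i in range(d))
```

For instance, the monotone "dictator" on {0, 1}², which outputs ±1 according to its first coordinate, has total influence 1, consistent with the k√d bound of Proposition 7.5 for k = 1.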

      We first generalize the following result, due to Blais et al., to more general domains.

Proposition 7.5 ([11]). Let f : {0, 1}^d → {−1, 1} be a k-monotone function. Then Inf[ f ] ≤ k√d.

                                                                                              √
Lemma 7.6 (Generalization). Let f : [r]d → {−1, 1} be a k-monotone function. Then Inf[ f ] ≤ k d.
Proof. For any two strings y0 , y1 ∈ [r]d , let fy0 ,y1 : {0, 1}d → {−1, 1} be the function obtained by setting
fy0 ,y1 (x) = f (yx ), where yx ∈ [r]d is defined as
                                                (
                                                 min{y0i , y1i } if xi = 0,
                                          yxi =
                                                 max{y0i , y1i } if xi = 1.
                                                                               √
     Since f was a k-monotone function, so is fy0 ,y1 . Thus Inf[ fy0 ,y1 ] ≤ k d for every choice of y0 and y1 .
It is not hard to see that for any fixed i ∈ [d] the following two processes yield the same distribution over
[r]d × [r]d :
                                                                                           def
    • Draw z ∈ [r]d , z0i ∈ [r] independently and uar, set z0 = (z1 , . . . , zi−1 , z0i , zi+1 , . . . , zd ), and output
      (z, z0 );
                                                                                                            (i)
    • Draw y0 , y1 ∈ [r]d , x ∈ {0, 1}d independently and uar, and output (yx , yx ).
This implies that

    Inf[f] = ∑_{i=1}^d Inf_i[f] = ∑_{i=1}^d 2 Pr_{z∈[r]^d}[f(z) ≠ f(z′)]
           = ∑_{i=1}^d 2 E_{y^0,y^1∈[r]^d}[ Pr_{x∈{0,1}^d}[f(y^x) ≠ f(y^{x^{(i)}})] ]
           = E_{y^0,y^1∈[r]^d}[ ∑_{i=1}^d 2 Pr_{x∈{0,1}^d}[f(y^x) ≠ f(y^{x^{(i)}})] ]
           = E_{y^0,y^1∈[r]^d}[ ∑_{i=1}^d 2 Pr_{x∈{0,1}^d}[f_{y^0,y^1}(x) ≠ f_{y^0,y^1}(x^{(i)})] ]
           = E_{y^0,y^1}[Inf[f_{y^0,y^1}]] ≤ E_{y^0,y^1}[k√d] = k√d.
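As a toy sanity check of the distributional identity used above, the following self-contained Python sketch enumerates both processes exactly on a small grid and compares the resulting distributions; the values r = 3, d = 2 and the exact rational comparison are illustrative choices, not from the paper.

```python
import itertools
from collections import Counter

# Exhaustively verify, on a small grid, that the two sampling processes
# described above induce the same distribution on [r]^d x [r]^d.
# r and d are arbitrary small illustrative values.
r, d = 3, 2
grid = list(itertools.product(range(r), repeat=d))

def y_of_x(y0, y1, x):
    # y^x takes the coordinatewise min where x_i = 0 and max where x_i = 1
    return tuple(min(a, b) if xi == 0 else max(a, b)
                 for a, b, xi in zip(y0, y1, x))

ok = True
for i in range(d):
    # Process 1: resample coordinate i of a uniform point z
    p1 = Counter((z, z[:i] + (zi,) + z[i + 1:])
                 for z in grid for zi in range(r))
    # Process 2: output (y^x, y^{x^{(i)}}) for uniform y^0, y^1, x
    p2 = Counter()
    for y0, y1 in itertools.product(grid, repeat=2):
        for x in itertools.product((0, 1), repeat=d):
            x_flip = x[:i] + (1 - x[i],) + x[i + 1:]
            p2[(y_of_x(y0, y1, x), y_of_x(y0, y1, x_flip))] += 1
    # Compare the two distributions exactly via cross-multiplied counts
    n1, n2 = sum(p1.values()), sum(p2.values())
    ok = ok and all(p1.get(k, 0) * n2 == p2.get(k, 0) * n1
                    for k in set(p1) | set(p2))
```

The exact-count comparison avoids floating-point issues: both processes are finite, so equality of distributions can be checked as equality of rationals.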

    For two functions f, g : [r]^d → R, we define the inner product ⟨f, g⟩ = E_x[f(x)g(x)], where the expectation is taken with respect to the uniform distribution. It is known that for functions f : [r]^d → R, there is a “Fourier basis” of orthonormal functions. To construct such a basis, we can take any orthonormal basis {φ_0 ≡ 1, φ_1, . . . , φ_{r−1}} for functions f : [r] → R. Given such a basis, a Fourier basis is the collection of functions φ_α, where α ∈ [r]^d and φ_α(x) = ∏_{i=1}^d φ_{α_i}(x_i). Then every f : [r]^d → R has a unique representation f = ∑_{α∈[r]^d} f̂(α)φ_α, where f̂(α) = ⟨f, φ_α⟩ ∈ R.
    Many Fourier formulæ hold in arbitrary Fourier bases, an important example being Parseval’s Identity: ∑_{α∈[r]^d} f̂(α)² = ⟨f, f⟩, which equals 1 for {−1,1}-valued f. We will use the following property:

Lemma 7.7 ([43, Proposition 8.23]). For α ∈ [r]^d, let |α| denote the number of nonzero coordinates in α. Then we have

                                      Inf[f] = ∑_{α∈[r]^d} |α| f̂(α)².

Lemma 7.8. If Inf[f] ≤ k, then ∑_{α : |α| > k/ε} f̂(α)² ≤ ε.


                         Theory of Computing, Volume 15 (1), 2019, pp. 1–55
       Clément L. Canonne, Elena Grigorescu, Siyao Guo, Akash Kumar, and Karl Wimmer

Proof. If not, then

      Inf[f] = ∑_α |α| f̂(α)² ≥ ∑_{α : |α| > k/ε} |α| f̂(α)² ≥ (k/ε) ∑_{α : |α| > k/ε} f̂(α)² > (k/ε) · ε = k,

a contradiction.
Lemma 7.9. Let p be the function ∑_{α : |α| ≤ t} f̂(α)φ_α. Then
   (i) ‖p − f‖₂² = E_{x∈[r]^d}[(p(x) − f(x))²] = ∑_{α : |α| > t} f̂(α)²;

  (ii) p is expressible as a linear combination of real-valued functions over [r]d , each of which only
       depends on at most t coordinates;
 (iii) p is expressible as a degree-t polynomial over the rd indicator functions 1{x_i = j} for 1 ≤ i ≤ d and
       j ∈ [r].
Proof. The lemma follows immediately from Parseval’s identity and from the definitions.
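To make the Fourier machinery of Lemmata 7.7–7.9 concrete, here is a small Python sketch that builds an orthonormal basis for functions [r] → R by Gram–Schmidt (one arbitrary valid choice of basis), tensorizes it, and checks Parseval’s Identity together with the influence formula of Lemma 7.7 on a toy {−1,1}-valued function. All concrete parameters and the function f are illustrative.

```python
import itertools

# Build a Fourier basis for functions [r]^d -> R and check Parseval's
# identity and Lemma 7.7 on a small example (r, d and f are arbitrary).
r, d = 3, 2
grid = list(itertools.product(range(r), repeat=d))

def orthonormal_basis(r):
    """phi_0 = 1, phi_1, ..., phi_{r-1}, orthonormal for <f,g> = E_x[f g]."""
    basis = []
    for deg in range(r):
        v = [float(x ** deg) for x in range(r)]        # monomial x^deg
        for b in basis:                                # Gram-Schmidt step
            c = sum(vi * bi for vi, bi in zip(v, b)) / r
            v = [vi - c * bi for vi, bi in zip(v, b)]
        nrm = (sum(vi * vi for vi in v) / r) ** 0.5
        basis.append([vi / nrm for vi in v])
    return basis

phi = orthonormal_basis(r)

def phi_alpha(a, x):
    out = 1.0
    for ai, xi in zip(a, x):
        out *= phi[ai][xi]
    return out

f = {x: 1.0 if sum(x) >= 3 else -1.0 for x in grid}    # a {-1,1} function

alphas = list(itertools.product(range(r), repeat=d))
fhat = {a: sum(f[x] * phi_alpha(a, x) for x in grid) / len(grid)
        for a in alphas}

parseval = sum(c * c for c in fhat.values())           # should be E[f^2] = 1
inf_fourier = sum(sum(ai != 0 for ai in a) * c * c for a, c in fhat.items())

# Combinatorial influence: Inf_i[f] = 2 Pr[f(z) != f(z with coord i redrawn)]
inf_comb = 0.0
for i in range(d):
    diff = sum(f[z] != f[z[:i] + (zi,) + z[i + 1:]]
               for z in grid for zi in range(r))
    inf_comb += 2 * diff / (len(grid) * r)
```

For {−1,1}-valued f, the combinatorial quantity 2 Pr[f(z) ≠ f(z′)] coincides with E[Var_{x_i} f], which is why the two influence computations agree.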
Theorem 7.10 ([34, Theorem 5]). Let C be a class of Boolean functions over X and S a collection of
real-valued functions over X such that for every f : X → {−1, 1} in C, there exists a function p : X → R
such that p is expressible as a linear combination of functions from S and ‖p − f‖₂² ≤ τ². Then there is an
agnostic learning algorithm for C achieving excess error τ which has sample complexity poly(|S|, 1/τ).
    Importantly, this algorithm is still successful with inconsistent labelled samples (examples), as long
as they come from a distribution on X × {−1, 1}, where the marginal distribution on X is uniform.
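Under the uniform distribution, the learner of Theorem 7.10 admits a particularly simple instantiation: the L2-optimal linear combination of orthonormal basis functions has coefficients E[y · φ_α(x)], which can be estimated directly from samples, after which one outputs the sign of the resulting low-degree polynomial. The sketch below is this uniform-distribution special case, not the general algorithm of [34]; the parameters, the target function, and the 10% label noise (standing in for inconsistent labels) are all illustrative.

```python
import itertools
import random

# Low-degree regression learner, uniform-distribution special case.
# All concrete numbers (r, d, t, the target, the noise rate) are illustrative.
random.seed(0)
r, d, t = 3, 3, 1
grid = list(itertools.product(range(r), repeat=d))

def orthonormal_basis(r):   # Gram-Schmidt on monomials, as in the text
    basis = []
    for deg in range(r):
        v = [float(x ** deg) for x in range(r)]
        for b in basis:
            c = sum(vi * bi for vi, bi in zip(v, b)) / r
            v = [vi - c * bi for vi, bi in zip(v, b)]
        nrm = (sum(vi * vi for vi in v) / r) ** 0.5
        basis.append([vi / nrm for vi in v])
    return basis

phi = orthonormal_basis(r)

def phi_alpha(a, x):
    out = 1.0
    for ai, xi in zip(a, x):
        out *= phi[ai][xi]
    return out

target = lambda x: 1 if sum(x) >= 3 else -1    # unknown function to learn
low_deg = [a for a in itertools.product(range(r), repeat=d)
           if sum(ai != 0 for ai in a) <= t]

# Noisy samples: 10% of the labels are flipped ("inconsistent" labels)
samples = []
for _ in range(30000):
    x = random.choice(grid)
    y = target(x) if random.random() >= 0.1 else -target(x)
    samples.append((x, y))

# Under the uniform distribution, the L2-optimal degree-<=t approximation
# has coefficients E[y * phi_alpha(x)], estimated empirically here.
coef = {a: sum(y * phi_alpha(a, x) for x, y in samples) / len(samples)
        for a in low_deg}
h = lambda x: 1 if sum(c * phi_alpha(a, x) for a, c in coef.items()) >= 0 else -1

err = sum(h(x) != target(x) for x in grid) / len(grid)
```

Because the basis is orthonormal under the sampling distribution, no linear system needs to be solved; each coefficient is an independent empirical average.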


7.4     Finishing the proof of Proposition 7.3
Now we put all the pieces together. To agnostically learn a k-monotone function, we simply perform the
agnostic learning algorithm of [34] on the distribution D over [m]^d × {−1, 1} defined by

                                D(x, b) = Pr_{y∈B^{−1}(x)}[f(y) = b].

To generate a sample (x, b) from D, we draw a uniformly random string x ∈ [m]^d, and b is the result of
a query for the value of f(y) for a uniformly random y ∈ B^{−1}(x). From Lemma 7.9, we can take S to be
the set of (k√d/τ²)-way products of the rd indicator functions. It follows that

                        |S| = ( rd choose k√d/τ² ) = exp(Õ(k√d/τ²))

(this holds because r = O(k/(ε2 − 3ε1))).
Proposition 7.11. Algorithm 2 accepts all functions ε1 -close to k-monotone functions, and rejects all
functions ε2 -far from k-monotone, when ε2 > 3ε1 (with probability at least 2/3). Its query complexity is
                                    exp(Õ(k√d/(ε2 − 3ε1)²)).

Proof. By a union bound, we have that with probability at least 8/10 both Step 4 and Step 5 succeed. Let
us condition on this.


Completeness. Suppose f is ε1 -close to k-monotone. Lemma 7.1 and the triangle inequality imply
that there is a k-monotone m-block function g∗ such that dist( f , g∗ ) ≤ ε1 + α/6. The agnostic learning
algorithm thus returns a hypothesis h such that dist( f , h) ≤ ε1 + α/4. The algorithm estimates this
closeness to within α/7, so the estimate obtained in Step 5 is at most ε1 + α/4 + α/7 < ε1 + 5α/12
and the algorithm does not reject in this step. By the triangle inequality, h is (2ε1 + 5α/12)-close to
k-monotone, and the algorithm will accept. There is no estimation error here, since no queries to f are
required.

Soundness. Now suppose f is ε2 -far from k-monotone, where ε2 = 3ε1 + α for some α > 0. Suppose
the algorithm does not reject when estimating dist( f , h), where h is the hypothesis returned by the agnostic
learning algorithm. Then dist( f , h) ≤ ε1 + 5α/12 + α/7 < ε1 + 7α/12. By the triangle inequality, if
t is a k-monotone function, dist(h,t) ≥ dist( f ,t) − dist( f , h) > ε2 − (ε1 + 7α/12) = 2ε1 + 5α/12. The
algorithm will thus reject in the final step.

Query complexity. The query complexity of the algorithm is dominated by the query complexity of
the agnostic learning algorithm, which is
                          exp(Õ(k√d/α²)) = exp(Õ(k√d/(ε2 − 3ε1)²)).


8      Tolerant testing and applications to L1 -testing
We now show how our techniques can be applied to make progress on an open problem on L1 tolerant
testing of monotonicity, asked at the Sublinear Algorithms Workshop 2016 [52]. In particular, as detailed
in Theorem 1.9, we obtain tolerant L1-testers for monotonicity over the hypergrids with query complexity
depending only on the parameters ε1 and ε2 for small (constant) d.
     We start by describing a reduction lemma from L1 distance to monotonicity of functions in [0, 1]X
to Hamming distance to monotonicity of functions in {0, 1}X×[0,1] (that is, “trading the range for a
dimension”). We note that this idea appears in Berman et al. [8, Lemmata 2.1 and 2.3], although
formulated in a slightly different way. For convenience and completeness, we state and prove here the
version we shall use.
     In what follows, we let X be a discrete partially ordered domain equipped with a measure µ,¹² that is,
a tuple (X, ⪯, µ); and for a set Y ⊆ R we denote by M(X→Y) ⊆ Y^X the set of monotone functions from X
to Y.

Definition 8.1 (Analogue of [8, Definition 2.1]). For a function f : X → [0, 1], the threshold function
T∘f : X × [0, 1] → {0, 1} is defined by

                                                   ( 1 if f(x) ≥ 1 − t,
                      T∘f(x, t) = 1{f(x) ≥ 1−t} =  (
                                                   ( 0 otherwise.
    ¹² We will only require that (X, µ) be a measurable space with finite measure, that is, µ(X) < ∞, and shall henceforth only
concern ourselves with measurable functions.


                            T HEORY OF C OMPUTING, Volume 15 (1), 2019, pp. 1–55                                           45
     C L ÉMENT L. C ANONNE , E LENA G RIGORESCU , S IYAO G UO , A KASH K UMAR , AND K ARL W IMMER

The next fact is immediate from this definition:

Fact 8.2. For any f : X → [0, 1], it is the case that for every x ∈ X
                                        f(x) = ∫₀¹ T∘f(x, t) dt.


Moreover, f ∈ M(X→[0,1]) if, and only if, T ◦ f ∈ M(X×[0,1]→{0,1}) .
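The following small Python illustration (with an arbitrary grid poset, arbitrary functions, and a discretization of [0, 1]) checks both parts of Fact 8.2 numerically: a Riemann sum of the threshold slices recovers f(x), and f is monotone precisely when T∘f is monotone for the product order.

```python
import itertools

# Check Fact 8.2 numerically on a small grid poset (illustrative example).
n = 3
X = list(itertools.product(range(n), repeat=2))       # poset [n] x [n]
leq = lambda a, b: all(u <= v for u, v in zip(a, b))  # coordinatewise order

def T(f, x, t):
    """Threshold function T∘f(x, t) = 1{f(x) >= 1 - t}."""
    return 1 if f[x] >= 1 - t else 0

def monotone_f(f):
    return all(f[a] <= f[b] for a in X for b in X if leq(a, b))

def monotone_T(f, ts):
    # product order on X x [0,1]: (a, s) <= (b, t) iff a <= b and s <= t
    return all(T(f, a, s) <= T(f, b, t)
               for a in X for b in X if leq(a, b)
               for s in ts for t in ts if s <= t)

ts = [j / 100 for j in range(101)]                    # discretized [0, 1]
f1 = {x: (x[0] + x[1]) / 4 for x in X}                # monotone
f2 = {x: 1.0 - (x[0] + x[1]) / 4 for x in X}          # not monotone

# Riemann-sum version of the identity f(x) = integral of T∘f(x, t) dt
approx = {x: sum(T(f1, x, t) for t in ts) / len(ts) for x in X}
max_gap = max(abs(approx[x] - f1[x]) for x in X)
```

The discretization error of the Riemann sum is at most one grid step, which is what the tolerance in the check accounts for.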

    We begin with the following characterization, which is obtained immediately from a corresponding
theorem of Berman et al., before stating a slightly modified version that we shall rely upon. For
completeness, we give the proof of the former below.

Proposition 8.3 (Analogue of [8, Lemma 2.1]). For any f : X → [0, 1],

         L1(f, M(X→[0,1])) = L1(T∘f, M(X×[0,1]→{0,1})) = dist(T∘f, M(X×[0,1]→{0,1})).

(In particular, the RHS is well-defined, i. e., the mapping

                            t ∈ [0, 1] ↦ L1(T∘f(·, t), M(X→{0,1})),

where T∘f(·, t) ∈ {0, 1}^X is the function that maps x to T∘f(x, t), is measurable.)

Proof. We write ν := µ × Leb_[0,1] for the product measure on X × [0, 1] induced by µ and the Lebesgue
measure on [0, 1]; so that ν(X × [0, 1]) = µ(X) · 1 = µ(X).
    For any fixed t ∈ [0, 1], let g_t ∈ M(X→{0,1}) be any function with

                          L1(T∘f(·, t), g_t) = L1(T∘f(·, t), M(X→{0,1})),

and define g′ ∈ [0, 1]^X by

                                      g′(x) = ∫₀¹ g_t(x) dt

for all x ∈ X: note that g′ is then monotone by construction.¹³ Moreover, choose h ∈ M(X×[0,1]→{0,1}) as a
function achieving

                              L1(T∘f, h) = L1(T∘f, M(X×[0,1]→{0,1})).

    ¹³ Additionally, since we restrict ourselves to finite X, there are only finitely many distinct functions T∘f(·, t) (for t ∈ [0, 1]),
and therefore only finitely many distinct functions g_t.



Then we have

  L1(f, M(X→[0,1])) ≤ L1(f, g′) = (1/µ(X)) ∫_X µ(dx) |∫₀¹ dt (T∘f(x, t) − g_t(x))|
                    ≤ (1/µ(X)) ∫_X µ(dx) ∫₀¹ dt |T∘f(x, t) − g_t(x)|
                    = ∫₀¹ dt (1/µ(X)) ∫_X µ(dx) |T∘f(x, t) − g_t(x)| = ∫₀¹ dt L1(T∘f(·, t), g_t)
                    ≤ ∫₀¹ dt L1(T∘f(·, t), h(·, t)) = ∫₀¹ dt (1/µ(X)) ∫_X µ(dx) |T∘f(x, t) − h(x, t)|
                    = (1/ν(X × [0, 1])) ∫_{X×[0,1]} ν(dx, dt) |T∘f(x, t) − h(x, t)|
                    = L1(T∘f, h) = L1(T∘f, M(X×[0,1]→{0,1}))

where we applied Fact 8.2 (and the definition of g′ = ∫₀¹ g_t dt) for the first equality, and for the third
inequality the fact that h induces (for every fixed t ∈ [0, 1]) a monotone function h(·, t) ∈ M(X→{0,1}): so
that L1(T∘f(·, t), g_t) ≤ L1(T∘f(·, t), h(·, t)) for all t.
    For the other direction of the inequality, fix any f : X → [0, 1], and let g ∈ M(X→[0,1]) be (any) function
achieving L1(f, g) = L1(f, M(X→[0,1])). We can write, unrolling the definitions,

  L1(f, M(X→[0,1])) = (1/µ(X)) ∫_X µ(dx) |f(x) − g(x)|
                    = (1/µ(X)) ∫_X µ(dx) |∫₀¹ dt (T∘f(x, t) − T∘g(x, t))|
                    = (1/µ(X)) ∫_X µ(dx) ∫₀¹ dt [ (T∘f(x, t) − T∘g(x, t))1{f(x)>g(x)}
                                                   + (T∘g(x, t) − T∘f(x, t))1{g(x)>f(x)} ]
                    = (1/µ(X)) ∫_X ∫₀¹ dt µ(dx) [ (T∘f(x, t) − T∘g(x, t))1{f(x)>g(x)}
                                                   + (T∘g(x, t) − T∘f(x, t))1{g(x)>f(x)} ]
                    = (1/ν(X × [0, 1])) ∫_{X×[0,1]} ν(dx, dt) |T∘f(x, t) − T∘g(x, t)| = L1(T∘f, T∘g)
                    ≥ L1(T∘f, M(X×[0,1]→{0,1}))

where we applied Fact 8.2 for the second equality, the definition of L1 distance for the second-to-
last; and to handle the absolute values we used the fact that |a − b| = (a − b)1{a>b} + (b − a)1{b>a},


along with the observation that T∘f(x, t) > T∘g(x, t) can only hold if f(x) > g(x). Finally, we have
L1(T∘f, T∘g) ≥ L1(T∘f, M(X×[0,1]→{0,1})) since T∘g ∈ M(X×[0,1]→{0,1}), yielding the desired claim.
    Finally, the fact that

                    L1(T∘f, M(X×[0,1]→{0,1})) = dist(T∘f, M(X×[0,1]→{0,1}))

is immediate from the Boolean range, as |a − b| = 1{a≠b} for any a, b ∈ {0, 1}.

Proposition 8.4 (Rounding and Range-Dimension Tradeoff). For any f : X → [0, 1] and parameter
m ≥ 1, let Rm := {1/m, 2/m, . . . , 1}. We define the m-rounding of f as Φm∘f : X → Rm by

                             Φm∘f(x) = ⌈m f(x)⌉/m,   x ∈ X.

Then we have

   (i) |L1(f, M(X→[0,1])) − L1(Φm∘f, M(X→Rm))| ≤ 1/m;

  (ii) L1(Φm∘f, M(X→Rm)) = dist(T∘Φm∘f, M(X×Rm→{0,1})).

Proof of Proposition 8.4. Fix any m ≥ 1. We start the proof of part (i) with the simple observation
that if f ∈ M(X→[0,1]), then Φm∘f ∈ M(X→Rm) ⊆ M(X→[0,1]), that is, rounding preserves monotonicity;
and that Φm∘g = g for all g : X → Rm. This, along with the fact that for all f : X → [0, 1]

          L1(f, Φm∘f) = (1/µ(X)) ∫_X µ(dx) |Φm∘f(x) − f(x)| ≤ 1/m

(as the integrand is at most 1/m pointwise),


implies by the triangle inequality, for any g ∈ M(X→Rm) ⊆ M(X→[0,1]), that

  L1(f, M(X→[0,1])) ≤ L1(f, g) ≤ 1/m + L1(g, Φm∘g) + L1(Φm∘f, Φm∘g) = 1/m + 0 + L1(Φm∘f, g).

Taking g ∈ M(X→Rm) that achieves L1(Φm∘f, g) = L1(Φm∘f, M(X→Rm)), we get

                    L1(f, M(X→[0,1])) ≤ 1/m + L1(Φm∘f, M(X→Rm)).
For the other direction, we first note that for any two functions f, g : X → [0, 1], it is the case that
L1(f, g) ≥ L1(Φm∘f, Φm∘g) − 1/m (which is immediate from the definition of the rounding operator),
and taking g to be the closest monotone function to f this readily yields

          L1(f, M(X→[0,1])) ≥ L1(Φm∘f, Φm∘g) − 1/m ≥ L1(Φm∘f, M(X→Rm)) − 1/m.
    Finally, the proof of the second part, part (ii), is identical to that of Proposition 8.3, replacing
the Lebesgue measure on [0, 1] by the counting measure on Rm. (So that integrals over [0, 1] become sums
over Rm, normalized by |Rm| = m.)
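On a tiny chain domain, part (i) can be checked by brute force. The sketch below (with an arbitrary chain length, m, and test function) computes both L1 distances to monotonicity by exhaustive search, restricting the comparison function to a fine value grid containing all values of f; for L1 on a chain an optimal monotone function can be taken with values among those of f (a standard fact about L1 isotonic regression), so this restriction loses nothing here.

```python
import itertools
import math
from fractions import Fraction as F

# Brute-force check of Proposition 8.4(i) on a chain of length n:
# rounding to R_m changes the L1 distance to monotonicity by at most 1/m.
# All concrete numbers and the test function are arbitrary.
n, m, fine = 4, 4, 16

def l1(f, g):
    return sum(abs(a - b) for a, b in zip(f, g)) / n   # normalized L1

def dist_to_monotone(f, levels):
    # minimize L1(f, g) over nondecreasing g with values in `levels`
    return min(l1(f, g) for g in
               itertools.combinations_with_replacement(sorted(levels), n))

def phi_m(f):
    """The m-rounding: x -> ceil(m f(x)) / m."""
    return tuple(F(math.ceil(v * m), m) for v in f)

f = (F(3, 8), F(1, 8), F(7, 8), F(2, 8))               # not monotone
grid_full = [F(j, fine) for j in range(fine + 1)]      # proxy for [0, 1]
grid_m = [F(j, m) for j in range(m + 1)]               # R_m (plus 0)

d_orig = dist_to_monotone(f, grid_full)
d_round = dist_to_monotone(phi_m(f), grid_m)
gap = abs(d_orig - d_round)
```

Using `fractions.Fraction` keeps the two distances exact, so the 1/m bound is verified without floating-point slack.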


    Given Proposition 8.4, it is now easy to apply the results of Section 7 to obtain a tolerant L1 tester
for monotonicity of functions f : [n]^d → [0, 1]. Indeed, given parameters 0 < ε1 < ε2, one can set the
rounding parameter m to ⌈4/(ε2 − ε1)⌉; and from query access to f : [n]^d → [0, 1], simulate query access
to Φm∘f and therefore to g := T∘Φm∘f : [n]^d × Rm → {0, 1}. By Proposition 8.4 and our choice of m,
in order to distinguish

              L1(f, M(X→[0,1])) ≤ ε1    vs.    L1(f, M(X→[0,1])) ≥ ε2

it is enough to distinguish

        dist(g, M(X×Rm→{0,1})) ≤ ε1 + 1/m    vs.    dist(g, M(X×Rm→{0,1})) ≥ ε2 − 1/m.

By our choice of m, we also have

                          (ε2 − 1/m) − (ε1 + 1/m) ≥ (ε2 − ε1)/2.

    The last step is to observe that one can equivalently view g as a function g : [n]^d × [m] → {0, 1},
so that the algorithms of Corollary 1.7 apply.
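In code, this reduction reads as follows; the oracle f below is a hypothetical stand-in for the real input, and the monotonicity tester itself (e.g., that of Corollary 1.7) is not included.

```python
import itertools
import math

# Simulating query access to g = T∘Φm∘f : [n]^d x [m] -> {0,1} from query
# access to f : [n]^d -> [0,1], as in the reduction above.  The oracle f,
# n, d, and the proximity parameters are illustrative stand-ins.
n, d = 5, 2
eps1, eps2 = 0.1, 0.5
m = math.ceil(4 / (eps2 - eps1))           # rounding parameter from the text

def f(x):                                  # placeholder monotone oracle
    return (x[0] + x[1]) / (2 * (n - 1))

def g(x, j):
    """g(x, j) = 1 iff Φm∘f(x) >= 1 - j/m, for j in {1, ..., m}."""
    rounded = math.ceil(m * f(x)) / m      # Φm∘f(x), a multiple of 1/m
    return 1 if rounded >= 1 - j / m else 0

# Each g-query costs exactly one f-query.  Since the placeholder f is
# monotone, g must be monotone on the product poset [n]^d x [m]:
X = list(itertools.product(range(n), repeat=d))
leq = lambda a, b: all(u <= v for u, v in zip(a, b))
g_monotone = all(g(a, s) <= g(b, t)
                 for a in X for b in X if leq(a, b)
                 for s in range(1, m + 1) for t in range(s, m + 1))
```

A Hamming tester for monotonicity of g, run with proximity parameters ε1 + 1/m and ε2 − 1/m, then yields the tolerant L1 tester for f.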

Theorem 1.9. There exists a non-adaptive tolerant L1-tester for monotonicity of functions f : [n]^d → [0, 1]
with query complexity

    • Õ( (1/(ε2 − ε1)²) · (5d/(ε2 − ε1))^d ), for any 0 ≤ ε1 < ε2 ≤ 1;

    • exp(Õ(√d/(ε2 − 3ε1)²)), for any 0 ≤ 3ε1 < ε2 ≤ 1.

Acknowledgments. We would like to thank Eric Blais for helpful remarks on an earlier version of this
paper, and the anonymous reviewers for very detailed and insightful comments.


References
 [1] N IR A ILON AND B ERNARD C HAZELLE: Information theory in property testing and monotonicity
     testing in higher dimension. Inform. and Comput., 204(11):1704–1717, 2006. Preliminary version
     in STACS’05. [doi:10.1016/j.ic.2006.06.001] 4

 [2] N IR A ILON , B ERNARD C HAZELLE , S ESHADHRI C OMANDUR , AND D ING L IU: Estimating the
     distance to a monotone function. Random Structures Algorithms, 31(3):371–383, 2007. Preliminary
     version in RANDOM’04. [doi:10.1002/rsa.20167] 14

 [3] K AZUYUKI A MANO AND A KIRA M ARUOKA: A superpolynomial lower bound for a circuit
     computing the Clique function with at most (1/6) log log n negation gates. SIAM J. Comput.,
     35(1):201–216, 2005. Preliminary version in MFCS’98. [doi:10.1137/S0097539701396959] 9


 [4] DANA A NGLUIN: Queries and concept learning.           Machine Learning, 2(4):319–342, 1988.
     [doi:10.1007/BF00116828] 2

 [5] M ARIA -F LORINA BALCAN , E RIC B LAIS , AVRIM B LUM , AND L IU YANG: Active property testing.
     In Proc. 53rd FOCS, pp. 21–30. IEEE Computer Society, 2012. [doi:10.1109/FOCS.2012.64] 4

 [6] T U ĞKAN BATU , RONITT RUBINFELD , AND PATRICK W HITE: Fast approximate PCPs for multidi-
     mensional bin-packing problems. Inform. and Comput., 196(1):42–56, 2005. Preliminary version
     in RANDOM’99. [doi:10.1016/j.ic.2004.10.001] 4

 [7] A LEKSANDRS B ELOVS AND E RIC B LAIS: A polynomial lower bound for testing monotonicity. In
     Proc. 48th STOC, pp. 1021–1032. ACM, 2016. [doi:10.1145/2897518.2897567, arXiv:1511.05053]
     2, 3, 10, 18

 [8] P IOTR B ERMAN , S OFYA R ASKHODNIKOVA , AND G RIGORY YAROSLAVTSEV: L p -testing. In
     Proc. 46th STOC, pp. 164–173. ACM, 2014. [doi:10.1145/2591796.2591887] 4, 6, 8, 10, 11, 32,
     35, 36, 37, 38, 45, 46, 53

 [9] A RNAB B HATTACHARYYA , E LENA G RIGORESCU , K YOMIN J UNG , S OFYA R ASKHODNIKOVA ,
     AND DAVID P. W OODRUFF: Transitive-closure spanners. SIAM J. Comput., 41(6):1380–1425,
     2012. Preliminary version in SODA’09. [doi:10.1137/110826655, arXiv:0808.1787] 9

[10] E RIC B LAIS , J OSHUA B RODY, AND K EVIN M ATULEF: Property testing lower bounds via commu-
     nication complexity. Comput. Complexity, 21(2):311–358, 2012. Preliminary version in CCC’11.
     [doi:10.1007/s00037-012-0040-x] 4

[11] E RIC B LAIS , C LÉMENT L. C ANONNE , I GOR C. O LIVEIRA , ROCCO A. S ERVEDIO , AND
     L I -YANG TAN: Learning circuits with few negations. In Approximation, Randomization, and
     Combinatorial Optimization. Algorithms and Techniques (RANDOM 2015), volume 40, pp. 512–
     527. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, 2015. [doi:10.4230/LIPIcs.APPROX-
     RANDOM.2015.512, arXiv:1410.8420] 2, 3, 8, 9, 42

[12] J OP B RIËT, S OURAV C HAKRABORTY, DAVID G ARCÍA -S ORIANO , AND A RIE M ATSLIAH: Mono-
     tonicity testing and shortest-path routing on the cube. Combinatorica, 32(1):35–53, 2012. Prelimi-
     nary version in RANDOM’10. [doi:10.1007/s00493-012-2765-1] 2, 3

[13] NADER H. B SHOUTY AND C HRISTINO TAMON: On the Fourier spectrum of monotone functions.
     J. ACM, 43(4):747–770, 1996. Preliminary version in STOC’95. [doi:10.1145/234533.234564] 2

[14] C LÉMENT L. C ANONNE , E LENA G RIGORESCU , S IYAO G UO , A KASH K UMAR , AND
     K ARL W IMMER: Testing k-monotonicity. In Proc. 8th Symp. Innovations in Theoreti-
     cal Computer Science (ITCS’17). Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, 2017.
     [doi:10.4230/LIPIcs.ITCS.2017.29, arXiv:1609.00265] 1

[15] C LÉMENT L. C ANONNE AND RONITT RUBINFELD: Testing probability distributions underlying
     aggregated data. In Proc. 41st Internat. Colloq. on Automata, Languages and Programming


     (ICALP’14), volume 8572 of Lecture Notes in Computer Science, pp. 283–295. Springer, 2014.
     [doi:10.1007/978-3-662-43948-7_24, arXiv:1402.3835, ECCC:TR14-021] 7, 27, 28, 29

[16] D EEPARNAB C HAKRABARTY AND C. S ESHADHRI: Optimal bounds for monotonicity and
     Lipschitz testing over hypercubes and hypergrids. In Proc. 45th STOC, pp. 419–428, 2013.
     [doi:10.1145/2488608.2488661, arXiv:1204.0849] 2, 4, 11

[17] D EEPARNAB C HAKRABARTY AND C. S ESHADHRI: An optimal lower bound for monotonicity
     testing over hypergrids. Theory of Computing, 10(17):453–464, 2014. Preliminary version in
     RANDOM’13. [doi:10.4086/toc.2014.v010a017] 2, 4, 11

[18] D EEPARNAB C HAKRABARTY AND C. S ESHADHRI: An o(n) monotonicity tester for Boolean
     functions over the hypercube. SIAM J. Comput., 45(2):461–472, 2016. Preliminary version in
     STOC’13. [doi:10.1137/13092770X, arXiv:1302.4536] 2, 3, 4, 10

[19] X I C HEN , A NINDYA D E , ROCCO A. S ERVEDIO , AND L I -YANG TAN: Boolean function mono-
     tonicity testing requires (almost) n1/2 non-adaptive queries. In Proc. 47th STOC, pp. 519–528.
     ACM, 2015. [doi:10.1145/2746539.2746570, arXiv:1412.5657] 2, 3, 10, 18, 22

[20] X I C HEN , ROCCO A. S ERVEDIO , AND L I -YANG TAN: New algorithms and lower bounds
     for monotonicity testing. In Proc. 55th FOCS, pp. 286–295. IEEE Computer Society, 2014.
     [doi:10.1109/FOCS.2014.38, arXiv:1412.5655] 3, 10

[21] X I C HEN , E RIK WAINGARTEN , AND J INYU X IE: Beyond Talagrand functions: new lower
     bounds for testing monotonicity and unateness. In Proc. 49th STOC, pp. 523–536, 2017.
     [doi:10.1145/3055399.3055461, arXiv:1702.06997] 3, 10, 22

[22] Y EVGENIY D ODIS , O DED G OLDREICH , E RIC L EHMAN , S OFYA R ASKHODNIKOVA , DANA RON ,
     AND A LEX S AMORODNITSKY: Improved testing algorithms for monotonicity. In Proc. 3rd Internat.
     Workshop on Randomization and Computation (RANDOM’99), volume 1671 of Lecture Notes in
     Computer Science, pp. 97–108. Springer, 1999. [doi:10.1007/978-3-540-48413-4_10] 2, 3, 7

[23] F UNDA E RGÜN , S AMPATH K ANNAN , R AVI K UMAR , RONITT RUBINFELD , AND M AHESH
     V ISWANATHAN: Spot-checkers. J. Comput. System Sci., 60(3):717–751, 2000. Preliminary version
     in STOC’98. [doi:10.1006/jcss.1999.1692] 4, 14

[24] S HAHAR FATTAL AND DANA RON: Approximating the distance to monotonicity in high dimensions.
     ACM Trans. Algorithms, 6(3):52:1–37, 2010. [doi:10.1145/1798596.1798605] 6, 7, 11

[25] E LDAR F ISCHER: On the strength of comparisons in property testing. Inform. and Comput.,
     189(1):107–116, 2004. [doi:10.1016/j.ic.2003.09.003] 4

[26] E LDAR F ISCHER , E RIC L EHMAN , I LAN N EWMAN , S OFYA R ASKHODNIKOVA , RONITT RUBIN -
     FELD , AND A LEX S AMORODNITSKY : Monotonicity testing over general poset domains. In Proc.
     34th STOC, pp. 474–483. ACM, 2002. [doi:10.1145/509907.509977] 2, 3, 9, 10, 14, 15, 18, 19, 21


[27] P ETER G EMMELL , R ICHARD J. L IPTON , RONITT RUBINFELD , M ADHU S UDAN , AND AVI
     W IGDERSON: Self-testing/correcting for polynomials and for approximate functions. In Proc. 23rd
     STOC, pp. 32–42. ACM, 1991. [doi:10.1145/103418.103429] 12

[28] O DED G OLDREICH , S HAFI G OLDWASSER , E RIC L EHMAN , DANA RON , AND A LEX S AMOROD -
     NITSKY : Testing monotonicity. Combinatorica, 20(3):301–337, 2000. Preliminary version in
     FOCS’98. [doi:10.1007/s004930070011] 2, 3, 4, 10, 22

[29] E LENA G RIGORESCU , A KASH K UMAR , AND K ARL W IMMER: Flipping out with many flips: Hard-
     ness of testing k-monotonicity. In Approximation, Randomization, and Combinatorial Optimization.
     Algorithms and Techniques (RANDOM 2018), pp. 40:1–17. Schloss Dagstuhl–Leibniz-Zentrum fuer
     Informatik, 2018. [doi:10.4230/LIPIcs.APPROX-RANDOM.2018.40] 8

[30] S IYAO G UO AND I LAN KOMARGODSKI: Negation-limited formulas. Theoret. Comput. Sci.,
     660:75–85, 2017. Preliminary version in RANDOM’15. [doi:10.1016/j.tcs.2016.11.027] 2, 9

[31] S IYAO G UO , TAL M ALKIN , I GOR C. O LIVEIRA , AND A LON ROSEN: The power of negations in
     cryptography. In 12th Theory of Cryptography Conf. (TCC’15), volume 9014 of Lecture Notes in
     Computer Science, pp. 36–65. Springer, 2015. [doi:10.1007/978-3-662-46494-6_3] 2, 9

[32] S HIRLEY H ALEVY AND E YAL K USHILEVITZ: Testing monotonicity over graph prod-
     ucts. Random Structures Algorithms, 33(1):44–67, 2008. Preliminary version in ICALP’04.
     [doi:10.1002/rsa.20211] 4, 14

[33] S TASYS J UKNA: Boolean Function Complexity. Springer, 2012. [doi:10.1007/978-3-642-24508-4]
     2, 9

[34] A DAM TAUMAN K ALAI , A DAM R. K LIVANS , Y ISHAY M ANSOUR , AND ROCCO A. S ERVEDIO:
     Agnostically learning halfspaces. SIAM J. Comput., 37(6):1777–1805, 2008. Preliminary version in
     FOCS’05. [doi:10.1137/060649057] 8, 42, 44

[35] M ICHAEL J. K EARNS AND DANA RON: Testing problems with sublearning sample com-
     plexity. J. Comput. System Sci., 61(3):428–456, 2000. Preliminary version in COLT’98.
     [doi:10.1006/jcss.1999.1656] 4

[36] M ICHAEL J. K EARNS AND L ESLIE G. VALIANT: Cryptographic limitations on learning Boolean
     formulae and finite automata. J. ACM, 41(1):67–95, 1994. Preliminary version in STOC’89.
     [doi:10.1145/174644.174647] 2

[37] S UBHASH K HOT, D OR M INZER , AND M ULI S AFRA: On monotonicity testing and Boolean
     isoperimetric type theorems. In Proc. 56th FOCS, pp. 52–58. IEEE Computer Society, 2015.
     [doi:10.1109/FOCS.2015.13] 2, 3, 10, 18

[38] P RAVESH KOTHARI , A MIR NAYYERI , RYAN O’D ONNELL , AND C HENGGANG W U: Testing
     surface area. In Proc. 25th Ann. ACM-SIAM Symp. on Discrete Algorithms (SODA’14), pp. 1204–
     1214. SIAM, 2014. [doi:10.1137/1.9781611973402.89] 4


[39] C HENGYU L IN AND S HENGYU Z HANG: Sensitivity conjecture and log-rank conjecture for func-
     tions with small alternating numbers. In Proc. 44th Internat. Colloq. on Automata, Languages and
     Programming (ICALP’17), pp. 51:1–13. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, 2017.
     [doi:10.4230/LIPIcs.ICALP.2017.51, arXiv:1602.06627] 2, 9
[40] ANDREI A. MARKOV: On the inversion complexity of systems of functions. Doklady Akademii
     Nauk SSSR, N.S., 116:917–919, 1957. English translation in [41]. 2

[41] ANDREI A. MARKOV: On the inversion complexity of a system of functions. J. ACM, 5(4):331–334,
     1958. [doi:10.1145/320941.320945] 53

[42] JOE NEEMAN: Testing surface area with arbitrary accuracy. In Proc. 46th STOC, pp. 393–397.
     ACM, 2014. [doi:10.1145/2591796.2591807] 4

[43] RYAN O’DONNELL: Analysis of Boolean Functions. Cambridge University Press, 2014.
     [doi:10.1017/CBO9781139814782] 43

[44] RYAN O’DONNELL AND ROCCO A. SERVEDIO: Learning monotone decision trees in poly-
     nomial time. SIAM J. Comput., 37(3):827–844, 2007. Preliminary version in CCC’06.
     [doi:10.1137/060669309] 2

[45] RYAN O’DONNELL AND KARL WIMMER: KKL, Kruskal-Katona, and monotone nets. SIAM J.
     Comput., 42(6):2375–2399, 2013. Preliminary version in FOCS’09. [doi:10.1137/100787325] 2

[46] MICHAL PARNAS, DANA RON, AND RONITT RUBINFELD: Tolerant property testing and distance
     approximation. J. Comput. System Sci., 72(6):1012–1042, 2006. [doi:10.1016/j.jcss.2006.03.002]
     5, 14

[47] RAN RAZ AND AVI WIGDERSON: Monotone circuits for matching require linear depth. J. ACM,
     39(3):736–744, 1992. Preliminary version in STOC’90. [doi:10.1145/146637.146684] 2

[48] ALEXANDER A. RAZBOROV: Lower bounds on the monotone complexity of some Boolean
     functions. Doklady Akademii Nauk SSSR, N.S., 281(4):798–801, 1985. English translation in Soviet
     Math. Doklady, 31:354–357, 1985. 2

[49] BENJAMIN ROSSMAN: Correlation bounds against monotone NC^1. In Proc. 30th IEEE Conf.
     on Computational Complexity (CCC’15), pp. 392–411. Schloss Dagstuhl–Leibniz-Zentrum für
     Informatik, 2015. [doi:10.4230/LIPIcs.CCC.2015.392] 2, 9

[50] ROCCO A. SERVEDIO: On learning monotone DNF under product distributions. Inform. and
     Comput., 193(1):57–74, 2004. Preliminary version in COLT’01. [doi:10.1016/j.ic.2004.04.003] 2

[51] LESLIE G. VALIANT: A theory of the learnable. Communications of the ACM, 27(11):1134–1142,
     1984. Preliminary version in STOC’84. [doi:10.1145/1968.1972] 13

[52] GRIGORY YAROSLAVTSEV: List of open problems in sublinear algorithms: Problem 70.
     http://sublinear.info/70, 2016. Originally posed in [8]. 6, 8, 45



AUTHORS

    Clément L. Canonne
    Graduate student
    Columbia University, New York, NY
    ccanonne cs columbia edu
    http://www.cs.columbia.edu/~ccanonne/


    Elena Grigorescu
    Assistant professor
    Purdue University
    elena-g purdue edu
    https://www.cs.purdue.edu/homes/egrigore/


    Siyao Guo
    Assistant professor
    NYU Shanghai
    siyao guo nyu edu
    https://sites.google.com/site/siyaoguo/


    Akash Kumar
    Graduate student
    Purdue University
    akumar purdue edu
    https://www.cs.purdue.edu/homes/akumar/


    Karl Wimmer
    Associate professor
    Duquesne University
    wimmerk duq edu
    http://www.mathcs.duq.edu/~wimmer/


ABOUT THE AUTHORS

    CLÉMENT L. CANONNE is currently a Motwani Postdoctoral Fellow at Stanford University.
       He graduated from Columbia University in 2017, where his advisor was Rocco Servedio.
       His research focuses on the fields of property testing and sublinear algorithms; specifi-
       cally, on understanding the strengths and limitations of the standard models in property
       and distribution testing, as well as in related areas. He also really likes elephants.



ELENA GRIGORESCU is an Assistant Professor in the Computer Science Department of
   Purdue University. She graduated from MIT’s EECS department in 2010; her advisor was Madhu
   Sudan. Her research interests span many aspects of Theoretical Computer Science, with a
   focus on sublinear algorithms, coding theory, and complexity theory. She likes elephants
   too, but prefers Boo.


SIYAO GUO is an Assistant Professor in the Computer Science Department of NYU Shanghai.
    She graduated from the Chinese University of Hong Kong in 2014 where she was advised
    by Andrej Bogdanov. Her research interests include pseudorandomness, complexity and
    cryptography. She loves her softball teams BNU Phoenix and CUHK Phoenix.


AKASH KUMAR is a Ph. D. student at Purdue University. He is interested in Property Testing
   and Hardness of Approximation. He likes to imagine that he is good at the game of
   cricket. But really, he is a terrible player.


KARL WIMMER is an Associate Professor at Duquesne University. He graduated from
   CMU in 2009; he had the privilege of being the first Ph. D. graduate advised by Ryan
   O’Donnell. His research interest is, like, math and stuff, with a current focus on property
   testing and Fourier analysis. He enjoys spending time with his wonderful wife Cassondra
   and their three children. His family also greatly enjoys pets, but they have not yet taken
   in any elephants.



