                            THEORY OF COMPUTING, Volume 16 (3), 2020, pp. 1–36
                                         www.theoryofcomputing.org




   Optimal Unateness Testers for Real-Valued
         Functions: Adaptivity Helps
                       Roksana Baleshzar∗                         Deeparnab Chakrabarty†
   Ramesh Krishnan S. Pallavoor‡                             Sofya Raskhodnikova‡                       C. Seshadhri§
                   Received June 25, 2018; Revised July 2, 2020; Published September 16, 2020




       Abstract. We study the problem of testing unateness of functions f : {0, 1}d → R. A
       function f : {0, 1}d → R is unate if for every coordinate i ∈ [d], the function is either
       nonincreasing in the ith coordinate or nondecreasing in the ith coordinate. We give an
       O((d/ε) · log(d/ε))-query nonadaptive tester and an O(d/ε)-query adaptive tester and show
       that both testers are optimal for a fixed distance parameter ε. Previously known unateness
       testers worked only for Boolean functions, and their query complexity had worse dependence
       on the dimension d both for the adaptive and the nonadaptive case. Moreover, no lower
       bounds for testing unateness were known. (Concurrent work by Chen et al. (STOC’17) proved
       an Ω(d/log² d) lower bound on the nonadaptive query complexity of testing unateness of
       Boolean functions.) We also generalize our results to obtain optimal unateness testers for
       functions f : [n]d → R.
     An extended abstract of this paper appeared in the Proceedings of the 44th International Colloquium on Automata,
Languages, and Programming (ICALP), 2017 [4].
   ∗ Supported by NSF award CCF-1422975.
   † Supported by NSF award CCF-1813053.
   ‡ Supported by NSF awards CCF-1422975 and CCF-1909612.
   § Supported by NSF awards CCF-1740850, CCF-1813165, CCF-1909790, and ARO Award W911NF1910294.



ACM Classification: F.2.2
AMS Classification: 68Q17, 68W20
Key words and phrases: property testing, unate and monotone functions, hypercube, hypergrid


© 2020 Roksana Baleshzar, Deeparnab Chakrabarty, Ramesh Krishnan S. Pallavoor, Sofya Raskhodnikova, and C. Seshadhri
Licensed under a Creative Commons Attribution License (CC-BY)                                  DOI: 10.4086/toc.2020.v016a003

           Our results establish that adaptivity helps with testing unateness of real-valued functions
       on domains of the form {0, 1}d and, more generally, [n]d . This stands in contrast to the
       situation for monotonicity testing where, as shown by Chakrabarty and Seshadhri (Theory of
       Computing, 2014), there is no adaptivity gap for functions f : [n]d → R.


1    Introduction
We study the problem of testing whether a given real-valued function f on domain [n]d , where n, d ∈
N, is unate. A function f : [n]d → R is unate if for every coordinate i ∈ [d], the function is either
nonincreasing in the ith coordinate or nondecreasing in the ith coordinate. Unate functions naturally
generalize monotone functions, which are nondecreasing in all coordinates, and b-monotone functions,
which have a particular direction in each coordinate (either nonincreasing or nondecreasing), specified by
a bit-vector b ∈ {0, 1}d . More precisely, a function is b-monotone if it is nondecreasing in coordinates i
with bi = 0 and nonincreasing in the other coordinates. Observe that a function f is unate iff there exists
some b ∈ {0, 1}d for which f is b-monotone.
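
As a concrete illustration (our addition, not part of the paper), the definition can be checked by brute force on small domains. The sketch below, with the hypothetical name is_unate, takes f as a Python function on {0,1}^d tuples:

```python
from itertools import product

def is_unate(f, d):
    """Brute-force unateness check for f: {0,1}^d -> R.

    f is unate iff no coordinate i has both an increasing and a
    decreasing i-edge (a pair of points differing only in coordinate i).
    """
    for i in range(d):
        has_inc = has_dec = False
        for x in product((0, 1), repeat=d):
            if x[i] == 1:
                continue                      # enumerate each i-edge once
            y = x[:i] + (1,) + x[i + 1:]
            if f(x) < f(y):
                has_inc = True
            elif f(x) > f(y):
                has_dec = True
        if has_inc and has_dec:
            return False                      # violation in coordinate i
    return True
```

For example, f(x) = x₁ − x₂ is unate (nondecreasing in the first coordinate, nonincreasing in the second), while XOR of two coordinates is not.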
     A tester [59, 39] for a property P of a function f is an algorithm that gets a distance parameter
ε ∈ (0, 1) and query access to f . The (relative) distance from a function f to a property P is the smallest
fraction of values of f that must be modified to make f satisfy P. A function f is ε-far from P if the
distance from f to P is at least ε. A tester for P has to accept with probability at least 2/3 if f has
property P and reject with probability at least 2/3 if f is ε-far from P. A tester has one-sided error if it
always accepts a function satisfying P; it has two-sided error otherwise. A nonadaptive tester makes all
its queries at once, whereas an adaptive tester can make queries after seeing answers to previous queries.
     Testing various properties of functions, including monotonicity (see, e. g., [38, 52, 33, 34, 47, 36,
53, 35, 41, 1, 42, 5, 16, 12, 19, 22, 17, 11, 23, 24, 21, 27, 26, 45, 8, 9, 32, 49, 29, 7, 13, 25, 14, 50] and
recent surveys [54, 55, 20]), the Lipschitz property [43, 22, 31, 17, 2], bounded-derivative properties [21],
linearity [18, 6, 10, 44, 56], submodularity [51, 58, 60, 15], and unateness [38, 46], has been studied
extensively over the past two decades. Even though unateness testing was initially discussed in the
seminal paper by Goldreich et al. [38], which was one of the first papers to study property testing,
relatively little was known about testing this property. All previous work on unateness testing focused on
the special case of Boolean functions on domain {0, 1}d . The domain {0, 1}d is called the hypercube and
the more general domain [n]d is called the hypergrid. Goldreich et al. [38] provided an O(d 3/2 /ε)-query
nonadaptive tester for unateness of Boolean functions on the hypercube. More than a decade later, Khot
and Shinkar [46] improved the query complexity to O((d log d)/ε), albeit with an adaptive tester.
     In this paper, we improve upon both these works, and our results hold for a more general class of
functions. Specifically, we show that unateness of real-valued functions on hypercubes can be tested
nonadaptively with O((d/ε) log(d/ε)) queries and adaptively with O(d/ε) queries. More generally, we
describe an O((d/ε) · (log(d/ε) + log n))-query nonadaptive tester and an O((d log n)/ε)-query adaptive
tester for unateness of real-valued functions over hypergrids.
     In contrast to the state of knowledge for unateness testing, the complexity of testing monotonicity
of real-valued functions over the hypercube and the hypergrid has been resolved. For constant distance
parameter ε, it is known to be Θ(d log n). Moreover, this bound holds for all bounded-derivative
properties [21], a large class that includes b-monotonicity and some properties quite different from


monotonicity, such as the Lipschitz property. Amazingly, the upper bound for all these properties is
achieved by the same simple and, in particular, nonadaptive tester. Even though proving lower bounds for
adaptive testers has been challenging in general, a line of work, starting from Fischer [35] and including
[16, 23, 21], has established that adaptivity does not help for this large class of properties. Since unateness
is so closely related, it is natural to ask whether the same is true for testing unateness.
    We answer this in the negative: we prove that any nonadaptive unateness tester of real-valued functions
over the hypercube (for some constant distance parameter) must make Ω(d log d) queries. More generally,
it needs Ω(d(log d + log n)) queries for the hypergrid domain. These lower bounds complement our
algorithms, completing the picture for testing unateness of real-valued functions. From a property testing
standpoint, our results establish that unateness is different from monotonicity and, more generally, any
derivative-bounded property.

1.1    Formal statements and technical overview
Our testers are summarized in the following theorem, stated for functions over the hypergrid domains.
(Recall that the hypercube is a special case of the hypergrid with n = 2.)
Theorem 1.1. Consider functions f : [n]d → R and a distance parameter ε ∈ (0, 1/2).
   1. There is a nonadaptive unateness tester that makes

                                          O((d/ε) · (log(d/ε) + log n))

        queries.1

   2. There is an adaptive unateness tester that makes O((d log n)/ε) queries.
Both testers have one-sided error.
    Our main technical contribution is the proof that the extra Ω(log d) is needed for nonadaptive testers.
This result demonstrates a gap between adaptive and nonadaptive unateness testing.
Theorem 1.2. Any nonadaptive unateness tester (even with two-sided error) for real-valued functions
f : {0, 1}d → R with a distance parameter ε = 1/8 must make Ω(d log d) queries.
   The lower bound for adaptive testers is an easy adaptation of the monotonicity lower bound in [23].
We state this theorem for completeness and prove it in Section 4.1.
Theorem 1.3. Any unateness tester for functions f : [n]d → R with a distance parameter ε ∈ (0, 1/4)
must make
                                      Ω((d log n)/ε − (log(1/ε))/ε)
queries.
    1 For many properties, when the domain is extended from the hypercube to the hypergrid, testers incur an extra multiplicative
factor of log n in the query complexity. This is the case for our adaptive tester. However, the complexity of nonadaptive unateness
testing (for constant ε) is Θ(d(log d + log n)) rather than Θ(d log d log n).



   Theorems 1.2 and 1.3 directly imply that our nonadaptive tester is optimal for constant ε, even for the
hypergrid domain. The following theorem is proved in Section 4.2.

Theorem 1.4. Any nonadaptive unateness tester (even with two-sided error) for real-valued functions
f : [n]d → R must make Ω(d(log n + log d)) queries.

1.1.1   Overview of techniques
We first consider the hypercube domain. For each i ∈ [d], an i-edge of the hypercube is a pair (x, y)
of points in {0, 1}d , where xi = 0, yi = 1, and x j = y j for all j ∈ ([d] \ {i}). Given an input function
 f : {0, 1}d → R, we say an i-edge (x, y) is increasing if f (x) < f (y), decreasing if f (x) > f (y), and
constant if f (x) = f (y).
     Our nonadaptive unateness tester on the hypercube uses the work investment strategy from [11] (also
refer to Section 8.2.4 of Goldreich’s book [37]) to “guess” a good dimension in which to look for violations
of unateness (specifically, both increasing and decreasing edges). For all i ∈ [d], let αi be the fraction of
the i-edges that are decreasing, βi be the fraction of the i-edges that are increasing, and µi = min(αi , βi ).
The dimension reduction theorem from [21] implies that if the input function is ε-far from unate, then the
average of µi over all dimensions is at least ε/(4d). If the tester knew which dimension had µi = Ω(ε/d),
it could detect a violation with high probability by querying the endpoints of O(1/µi ) = O(d/ε) uniformly
random i-edges. Performing this test for every dimension would result in query complexity Θ(d 2 /ε).
The work investment strategy allows us to achieve query complexity O((d/ε) log(d/ε)) by repeatedly
choosing a uniformly random dimension and investing a specific number of queries in trying to find a
violation of unateness in that dimension. It proceeds in Θ(log(d/ε)) stages, doubling, in every stage, the
number of queries invested in each dimension. Intuitively, when all violations are in one dimension, a
tester has to try Θ(d) dimensions before it finds the bad one, but needs only Θ(1/ε) queries in the bad
dimension. In contrast, when all dimensions have µi = ε/(4d), only a constant number of attempts are
needed to find a dimension with violations, but Θ(d/ε) queries in one dimension are required to detect a
violation. (In this case, sampling Θ((d/ε) log(d/ε)) uniformly random edges would not be enough to
detect a violation.) The work investment strategy allows us to interpolate between these scenarios.
     With adaptivity, this search through Θ(log(d/ε)) different scenarios is not required. A pair of queries
in each dimension detects dimensions with many non-constant edges, and the algorithm focuses on
finding violations in those dimensions. This leads to the query complexity of O(d/ε), removing the
log(d/ε) factor.
     It is relatively easy to extend (both adaptive and nonadaptive) testers from hypercubes to hypergrids
by incurring an extra factor of log n in the query complexity. The role of i-edges is now played by i-lines.
An i-line is a set of n domain points that differ only in coordinate i. The domain [n] is called a line.
Monotonicity on the line (a.k.a. sortedness) was one of the first properties studied in the context of
property testing [34]. What we need is a nonadaptive tester for sortedness that has 1-sided error and is,
in addition, proximity oblivious: that is, it rejects a function f : [n] → R with probability proportional
to the distance from f to monotonicity. Proximity-oblivious testers (POTs) were defined by Goldreich
and Ron [40]. There are several POTs for sortedness that make O(log n) queries: the tree tester [34], the
spanners-based tester [12, 54], and the power of 2 tester [22]. Such testers can be easily modified to work
for unateness on the line and to output not just an accept/reject decision, but whether they found a pair of


points on which the function is strictly increasing and, similarly, whether they found a pair of points on
which the function is strictly decreasing (see Section 2.3 for details). To generalize our unateness testers
from the hypercube to the hypergrid domains, instead of sampling a random i-edge, we sample a random
i-line ℓ and run a modified POT for unateness on the restriction f|ℓ of function f to the line ℓ. This “direct
generalization” is optimal for adaptive testers, but, interestingly, not for nonadaptive testers.
     Intuitively, for nonadaptive testers, the “direct generalization” works with only O((d log n)/ε) queries
when there are enough lines that are, cumulatively, sufficiently far from unate. However, it could happen
that a function on the hypergrid is far from unate, but does not have any lines that are far from unate: the
distance from unateness arises because some lines are far from monotone and other lines in the same
dimension are far from antitone (that is, far from nonincreasing). We prove that each function f on the
line that is ε-far from monotone, but is not ε/2-far from unate, is strictly decreasing on an ε/4 fraction of
pairs in [n]. Symmetrically, if a function is far from antitone but close to unate, it is strictly increasing on a
large fraction of pairs. The dimension reduction allows us to use this statement to show that if the “direct
generalization” does not work well, then, intuitively, the average dimension has many pairs on which f
is increasing and many pairs on which f is decreasing. We again use the work investment strategy to
get a tester for this case that has the same complexity as our nonadaptive tester for the hypercube. The
resulting nonadaptive complexity (for constant ε) is O(d(log d + log n)), which we show is optimal.


The nonadaptive lower bound. Our most significant finding is the log d gap in the query complexity
between adaptive and nonadaptive testing of unateness. By techniques from previous work [35, 23], it
suffices to prove lower bounds for comparison-based testers, i. e., testers that can only perform compar-
isons of the function values at queried points, but cannot use the values themselves. Our main technical
contribution is the Ω(d log d) lower bound for nonadaptive comparison-based testers of unateness on
hypercube domains.
    Note that nonadaptivity must be critical in our lower bound construction, since we obtained an
O(d)-query adaptive tester for unateness. Another challenge in proving the lower bound is the existence
of a single, universal nonadaptive O(d)-tester for all b-monotonicity properties, proven in [21]. In
other words, there is a single distribution on O(d) queries that defines a nonadaptive property tester
for b-monotonicity, regardless of b. Since unateness is the union of all b-monotonicity properties, our
construction of hard inputs must be able to fool such algorithms. In particular, if b is fixed in advance,
the algorithm from [21] will work, so it should be hard to learn b for functions in the lower bound
construction. Once a tester finds a non-constant edge in each dimension, the problem reduces to testing
b-monotonicity for a vector b determined by the directions (increasing or decreasing) of the non-constant
edges. That is, intuitively, most edges in our construction must be constant. This is one of the main
technical challenges. The previous lower bound constructions for monotonicity testing [16, 23] crucially
used the fact that all edges in the hard functions were non-constant.
    We briefly describe how we overcome the problems mentioned above. By Yao’s minimax principle, it
suffices to construct two distributions, D+ and D− , on unate and on far-from-unate functions, respectively,
that a deterministic nonadaptive tester cannot distinguish. First, for some parameter m, we partition the
hypercube into m subcubes based on the first log₂ m most significant coordinates. Both distributions,
D+ and D− , sample a uniform k from [K], where K = Θ(log d), and a set R ⊆ [d] of cardinality 2k .
Furthermore, each subcube j ∈ [m] selects an “action dimension” r j ∈ R uniformly at random. For both


distributions, in any particular subcube j, the function value is completely determined by the coordinates
not in R, and the random coordinate r j ∈ R. Note that all the i-edges for i ∈ (R \ {r j }) are constant. Within
the subcube, the function is a linear function with exponentially increasing coefficients. In the distribution
D+ , any two subcubes j, j0 with the same action dimension orient the edges in that dimension the same
way (both increasing or both decreasing), whereas in the distribution D− each subcube decides on the
orientation independently. The former correlation maintains unateness while the latter independence
creates distance to unateness. We prove that to distinguish the distributions, any comparison-based
nonadaptive tester must find two distinct subcubes with the same action dimension r j and, furthermore,
make a specific query (in both) that reveals the coefficient of r j . We show that, with o(d log d) queries,
the probability of this event is negligible.


2      Upper bounds
In this section, we prove Parts 1 and 2 of Theorem 1.1, starting from the hypercube domain.
    Recall the definition of i-edges and i-lines from Section 1.1.1 and what it means for an edge to be
increasing, decreasing, and constant.
    The starting point for our algorithms is the dimension reduction theorem from [21]. It bounds
the distance of f : [n]d → R to monotonicity in terms of the average distances of restrictions of f to
one-dimensional functions.

Theorem 2.1 (Dimension Reduction, Theorem 1.8 in [21]). Fix a bit vector b ∈ {0, 1}d and a function
f : [n]d → R which is ε-far from b-monotonicity. For all i ∈ [d], let µi be the average distance of f|ℓ to
bi -monotonicity over all i-lines ℓ. Then
                                              d
                                             ∑ µi ≥ ε/4.
                                             i=1

     For the special case of the hypercube domains, i-lines become i-edges, and the average distance µi to
bi -monotonicity is the fraction of i-edges on which the function is not bi -monotone.

2.1     The nonadaptive tester over the hypercube
We now describe Algorithm 1, the nonadaptive tester for unateness over the hypercube domains.

    Algorithm 1: The Nonadaptive Unateness Tester over the Hypercube
      input : a distance parameter ε ∈ (0, 1/2); query access to a function f : {0, 1}d → R.
    1 for r = 1 to ⌈log(16d/ε)⌉ do
    2     repeat sr = ⌈(16d ln 4)/(ε · 2^r)⌉ times
    3         Sample a dimension i ∈ [d] uniformly at random.
    4         Sample 3 · 2^r i-edges uniformly and independently at random and reject if there exist an
                increasing edge and a decreasing edge among the sampled edges.
    5 accept
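
The pseudocode above can be rendered as the following runnable sketch (an illustration, not the authors' code; the helper random_i_edge and the use of a base-2 logarithm for the stage count are our assumptions, the latter matching the doubling schedule):

```python
import math
import random

def nonadaptive_unateness_tester(f, d, eps):
    """Sketch of Algorithm 1. Returns True (accept) or False (reject).

    One-sided error: a unate f is always accepted."""
    def random_i_edge(i):
        # Uniformly random i-edge: endpoints differ only in coordinate i.
        x = [random.randint(0, 1) for _ in range(d)]
        x[i] = 0
        y = x.copy()
        y[i] = 1
        return tuple(x), tuple(y)

    for r in range(1, math.ceil(math.log2(16 * d / eps)) + 1):
        s_r = math.ceil(16 * d * math.log(4) / (eps * 2 ** r))
        for _ in range(s_r):
            i = random.randrange(d)
            inc = dec = False
            for _ in range(3 * 2 ** r):
                x, y = random_i_edge(i)
                if f(x) < f(y):
                    inc = True
                elif f(x) > f(y):
                    dec = True
            if inc and dec:
                return False  # both orientations found in dimension i
    return True
```

On a monotone function such as f(x) = ∑ᵢ xᵢ the sketch always accepts; on XOR of two coordinates it rejects with probability at least 2/3 per run.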





   It is evident that Algorithm 1 is a nonadaptive, one-sided error tester. Furthermore, its query
complexity is O ((d/ε) log(d/ε)). It suffices to prove the following.
Lemma 2.2. If f is ε-far from unate, Algorithm 1 rejects with probability at least 2/3.
Proof. Recall that αi is the fraction of i-edges that are decreasing, βi is the fraction of i-edges that are
increasing and µi = min(αi , βi ).
     Define the d-dimensional bit vector b as follows: for each i ∈ [d], let bi = 0 if αi < βi and bi = 1
otherwise. Then the average distance of f to bi -monotonicity over a random i-edge is precisely µi . Since
 f is ε-far from being unate, f is also ε-far from being b-monotone. By Theorem 2.1, ∑i∈[d] µi ≥ ε/4.
Hence, Ei∈[d] [µi ] ≥ ε/(4d). We now apply the work investment strategy due to Berman et al. [11] to get
an upper bound on the probability that Algorithm 1 fails to reject.
Theorem 2.3 ([11]). For a random variable X ∈ [0, 1] with E[X] ≥ µ, let pr = Pr[X ≥ 2−r ] and δ ∈ (0, 1)
be the desired error probability. Let sr = (4 ln(1/δ))/(µ · 2^r). Then

                                          ⌈log(4/µ)⌉
                                              ∏      (1 − pr )^sr ≤ δ .
                                             r=1
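
As a numerical sanity check (our addition, assuming the logarithm in the number of stages is base 2, consistent with the thresholds 2−r), the product in Theorem 2.3 can be evaluated for any finite distribution:

```python
import math

def work_investment_product(dist, mu, delta):
    """Evaluate the left-hand side of Theorem 2.3 for a finite
    distribution dist = {value: probability} with E[X] >= mu."""
    big_r = math.ceil(math.log2(4 / mu))          # number of stages
    prod = 1.0
    for r in range(1, big_r + 1):
        p_r = sum(p for v, p in dist.items() if v >= 2 ** (-r))
        s_r = 4 * math.log(1 / delta) / (mu * 2 ** r)
        prod *= (1 - p_r) ** s_r
    return prod
```

For instance, for X equal to 1 with probability µ = 0.1 and 0 otherwise (so E[X] = µ), the product is already well below δ = 1/4.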

     Consider running Algorithm 1 on a function f that is ε-far from unate. Let X = µi where i is
sampled uniformly at random from [d]. Then E[X] ≥ ε/(4d). Applying the work investment strategy
(Theorem 2.3) on X with µ = ε/(4d), we get that the probability that, in some iteration, Step 3 samples
a dimension i such that µi ≥ 2−r is at least 1 − δ. We set δ = 1/4. Conditioned on sampling such a
dimension, the probability that Step 4 fails to obtain an increasing edge and a decreasing edge among its
3 · 2^r samples is at most 2(1 − 2−r)^(3·2^r) ≤ 2e^−3 < 1/9, as the fraction of both increasing and decreasing
edges in the dimension is at least 2−r. Hence, the probability that Algorithm 1 rejects f is at least
(3/4) · (8/9) = 2/3, which completes the proof of Lemma 2.2.

2.2   The adaptive tester over the hypercube
We now describe Algorithm 2, an adaptive tester for unateness over the hypercube domains with good
expected query complexity. The final tester is obtained by repeating this tester and accepting if the
number of queries exceeds a specified bound.
Claim 2.4. The expected number of queries made by Algorithm 2 is O(d/ε).
Proof. Consider one iteration of the repeat-loop in Step 1. We prove that the expected number of queries
in this iteration is 4d. The total number of queries in Step 3 is 2d, as 2 points per dimension are queried.
Let Ei be the event that edge ei is non-constant and Ti be the random variable for the number of i-edges
sampled in Step 5. Then
                                      E[Ti ] = 1/(αi + βi ) = 1/ Pr[Ei ].
Therefore, the expected number of all edges sampled in Step 5 is
                                  d                        d
                                  ∑ Pr[Ei ] · E[Ti ] = ∑ Pr[Ei ] · (1/ Pr[Ei ]) = d.
                                 i=1                      i=1




  Algorithm 2: The Adaptive Unateness Tester over the Hypercube
      input : a distance parameter ε ∈ (0, 1/2); query access to a function f : {0, 1}d → R.
  1 repeat ⌈8/ε⌉ times
  2    for i = 1 to d do
  3        Sample an i-edge ei uniformly at random.
  4        if ei is non-constant (i. e., increasing or decreasing) then
  5              Sample i-edges uniformly at random until we obtain a non-constant edge e′i .
  6              reject if one of the edges ei and e′i is increasing and the other is decreasing.
  7 accept
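
A runnable sketch of Algorithm 2 (illustrative only; the helper edge_sign and its +1/−1/0 encoding of increasing/decreasing/constant edges are our convention):

```python
import math
import random

def adaptive_unateness_tester(f, d, eps):
    """Sketch of Algorithm 2. Returns True (accept) or False (reject)."""
    def edge_sign(i):
        # Sample a uniformly random i-edge and report its orientation:
        # +1 increasing, -1 decreasing, 0 constant.
        x = [random.randint(0, 1) for _ in range(d)]
        x[i] = 0
        y = x.copy()
        y[i] = 1
        fx, fy = f(tuple(x)), f(tuple(y))
        return (fy > fx) - (fy < fx)

    for _ in range(math.ceil(8 / eps)):
        for i in range(d):
            s = edge_sign(i)
            if s != 0:                # e_i is non-constant
                t = 0
                while t == 0:         # resample until e'_i is non-constant
                    t = edge_sign(i)
                if s != t:
                    return False      # opposite orientations: reject
    return True
```

As in the analysis, the resampling loop terminates in 1/Pr[Ei] tries in expectation once a non-constant i-edge has been seen.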




Hence, the expected number of queries in Step 5 is 2d, and the total expected number of queries in one
iteration is 4d. Since there are ⌈8/ε⌉ iterations in Step 1, the expected number of queries in Algorithm 2
is O(d/ε).

Claim 2.5. If f is ε-far from unate, Algorithm 2 accepts with probability at most 1/6.

Proof. First, we bound the probability that a violation of unateness is detected in some dimension i ∈ [d]
in one iteration of the repeat-loop in Step 1. Consider the probability of finding a decreasing i-edge in
Step 3 and of finding an increasing i-edge in Step 5. The former is exactly αi , and the latter is βi /(αi + βi ).
Similarly, the probability of finding an increasing i-edge in Step 3 and of finding a decreasing i-edge
in Step 5 is βi and αi /(αi + βi ), respectively. Therefore, the probability we detect a violation from
dimension i is
                                       2 · αi βi /(αi + βi ) ≥ min(αi , βi ) = µi .
The probability that we fail to detect a violation in any of the d dimensions in one iteration is at most
                                   d                      d
                                  ∏ (1 − µi ) ≤ exp( − ∑ µi ) ≤ e^−ε/4 ,
                                  i=1                    i=1

where the last inequality follows from Theorem 2.1 (Dimension Reduction). Thus, the probability that
Algorithm 2 fails to reject in all iterations of Step 1 is at most (e^−ε/4)^⌈8/ε⌉ ≤ e^−2 < 1/6.

Proof of Theorem 1.1, Part 2 (for the special case of the hypercube domain). We run Algorithm 2, abort-
ing and accepting if the number of queries exceeds the expectation by a factor of 6. By Markov’s inequality,
the probability of aborting is at most 1/6. By Claim 2.5, if f is ε-far from unate, Algorithm 2 accepts
with probability at most 1/6. The theorem follows by a union bound.

2.3     Extension to hypergrids
We start by establishing terminology for lines and pairs. Consider a function f : [n]d → R. Recall the
definition of i-lines from Section 1.1.1. A pair of points that differ only in coordinate i is called an i-pair.
An i-pair {x, y} with xi < yi is called increasing if f (x) < f (y), decreasing if f (x) > f (y), and constant if


 f (x) = f (y). Recall that we call a function h : [n] → R monotone if h(x) ≤ h(y) for all x < y and antitone
if h(x) ≥ h(y) for all x < y.
     Recall that a proximity-oblivious tester (POT) for a property P rejects every input with probability
proportional to the distance from the input to P. As discussed in Section 1.1.1, there are several 1-sided
error, nonadaptive POTs for monotonicity on the line that we can use to extend Algorithms 1 and 2 to
work on hypergrids. One of them is the tree tester, designed by Ergun et al. [34], that simply picks a
point x ∈ [n] uniformly at random, queries all points visited in a binary search for x, and rejects iff x
forms a decreasing pair with one of the queried points. By symmetry, when this algorithm is modified
to reject iff it finds an increasing pair, it tests antitonicity. The tree tester (and any other 1-sided error
POT for sortedness) can be easily modified to return whether it found any increasing/decreasing edges by
including ↑ / ↓ in its output.

Lemma 2.6 ([34, 12, 22, 21]). There exists a nonadaptive algorithm Adir that gets query access to a
function h : [n] → R, makes O(log n) queries, and returns a set dir ⊆ {↑, ↓}.

     • If h is monotone, then ↓ is not in dir. The probability that dir contains ↓ is at least the distance from
       h to monotonicity.

     • Similarly, if h is antitone, then ↑ is not in dir. The probability that dir contains ↑ is at least the
       distance from h to antitonicity.
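
One possible instantiation of Adir is the tree tester, sketched below in 0-indexed form (an illustration under our conventions: the string labels 'inc'/'dec' stand in for ↑/↓):

```python
import random

def a_dir(h, n):
    """Sketch of A_dir (Lemma 2.6) via the tree tester: pick a uniform
    x in {0, ..., n-1}, compare h at x with h at every point on the
    binary-search path to x, and report the strict directions seen."""
    x = random.randrange(n)
    dirs = set()
    lo, hi = 0, n - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        a, b = min(mid, x), max(mid, x)
        if a != b:                    # compare the pair in position order
            if h(a) < h(b):
                dirs.add('inc')
            elif h(a) > h(b):
                dirs.add('dec')
        if mid < x:
            lo = mid + 1
        elif mid > x:
            hi = mid - 1
        else:
            break
    return dirs
```

If h is monotone, no comparison can yield 'dec', and symmetrically for antitone h, matching the one-sided guarantee of Lemma 2.6.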

    We now describe Algorithm 3, an adaptive tester for unateness over the hypergrid domains with
good expected query complexity. As in the case of the hypercube domains, the final tester is obtained by
repeating this tester and accepting if the number of queries exceeds a specified bound.

  Algorithm 3: The Adaptive Unateness Tester over the Hypergrid
     input : a distance parameter ε ∈ (0, 1/2); query access to a function f : [n]d → R.
 1 repeat ⌈8/ε⌉ times
 2    for i = 1 to d do
 3        Sample an i-line ℓi uniformly at random.
 4        Let diri be the output of Adir (from Lemma 2.6) on f|ℓi .
 5        if diri ≠ ∅ then
 6            repeat
 7                  Sample an i-line ℓ′i uniformly at random and let dir′i be the output of Adir on f|ℓ′i .
               until dir′i ≠ ∅
 8            If diri ∪ dir′i = {↑, ↓}, reject.
 9 accept




Proof of Theorem 1.1, Part 2. First, we explain how Lemma 2.6 and Theorem 2.1 are used in the analysis
of the adaptive tester. For a dimension i ∈ [d], let αi and βi denote the average distance of f|ℓ to
monotonicity and antitonicity, respectively, over all i-lines ℓ. Then µi := min(αi , βi ) is the average
fraction of points per i-line that needs to change to make f unate. Define the b-vector with bi = 0 if


αi < βi , and bi = 1 otherwise. By Theorem 2.1, if f is ε-far from unate, and thus ε-far from b-monotone,
then ∑i∈[d] µi ≥ ε/4. By Lemma 2.6, the probability that the output of Adir on the restriction of f to a
uniformly random i-line contains ↓ is at least αi , and the probability that it contains ↑ is at least βi . The rest of the analysis of
Algorithm 3 is similar to that in the hypercube case.
      In the proof of Claim 2.4, the expected number of edges sampled in one iteration of Algorithm 2 is
2d. Similarly, the expected number of lines sampled in one iteration of Algorithm 3 is 2d. Since Adir
makes O(log n) queries, the overall expected number of queries is O((d log n)/ε).
      The proof of Claim 2.5 carries over almost word for word. Fix a dimension i. The probability that
↓ ∈ diri in Step 4 is at least αi . The probability that ↑ ∈ dir′i in Step 7 is at least βi /(αi + βi ). The rest of
the calculation is identical to that of the proof of Claim 2.5.
      Finally, we run Algorithm 3 and abort and accept if the number of queries exceeds the expectation by a
factor of 6. As in the proof of Theorem 1.1, Part 2, the resulting algorithm always accepts unate functions
and accepts functions that are ε-far from unate with probability at most 1/3, completing the proof.

   Next we show how to modify any POT for sortedness in a black-box manner to obtain a POT for
unateness on the line with the same guarantees.

  Algorithm 4: The POT for Unateness over the Line
   input : query access to a function h : [n] → R.
  1 Run algorithm Adir on h and let dir be its output.
  2 Pick x ∈ [n] uniformly at random and query h on points 1, x, and n.
  3 If any pair in {1, x, n} is increasing, update dir ← dir ∪ {↑}.
  4 If any pair in {1, x, n} is decreasing, update dir ← dir ∪ {↓}.
  5 Reject if dir = {↑, ↓}; otherwise, accept.
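Algorithm 4 can be sketched in a few lines of Python. The helper `scan_dir` is our stand-in for Adir and simply scans all adjacent pairs (the real Adir achieves the guarantee of Lemma 2.6 with O(log n) queries), so this sketch illustrates the logic of the steps, not the query complexity; both function names are ours.

```python
import random

UP, DOWN = "up", "down"

def scan_dir(h):
    """Stand-in for A_dir on h: [n] -> R (h given as a list): scan adjacent
    pairs and report every direction of change seen."""
    dirs = set()
    for u, v in zip(h, h[1:]):
        if u < v:
            dirs.add(UP)
        elif u > v:
            dirs.add(DOWN)
    return dirs

def line_pot(h, rng=random):
    """Sketch of Algorithm 4: a proximity-oblivious test for unateness of a
    line function. Returns True (accept) or False (reject)."""
    dirs = scan_dir(h)
    x = rng.randrange(len(h))
    # Check the pairs among positions {1, x, n} (0-based: 0, x, n-1).
    for a, b in [(h[0], h[x]), (h[x], h[-1]), (h[0], h[-1])]:
        if a < b:
            dirs.add(UP)
        elif a > b:
            dirs.add(DOWN)
    return dirs != {UP, DOWN}
```

With this exhaustive stand-in, a monotone or antitone list is always accepted, while any list exhibiting both an increase and a decrease is always rejected; with the real Adir, Lemma 2.7 instead guarantees rejection with probability at least ε.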




Lemma 2.7. If h : [n] → R is ε-far from unate, then Algorithm 4 rejects with probability at least ε.

Proof. First, consider the case when h(1) = h(n). Since h is ε-far from unate, it is also ε-far from being
a constant function equal to h(1) on all points in [n]. Therefore, with probability at least ε, point x chosen
in Step 2 of the algorithm satisfies h(x) ≠ h(1). If this holds, then one of the pairs {1, x} and {x, n} is
increasing and the other is decreasing. So, Algorithm 4 indeed rejects with probability at least ε.
    Next, consider the case when h(1) < h(n). Then {1, n} is an increasing pair. Since h is ε-far from
monotone, by Lemma 2.6, the output of Adir contains ↓ with probability at least ε. So, again Algorithm 4
rejects with probability at least ε.
    Finally, the case when h(1) > h(n) is symmetric to the case when h(1) < h(n).

    Our nonadaptive hypergrid tester is stated in Algorithm 5. A crucial part of its analysis is Lemma 2.8,
which shows that for every function on the line that is far from monotone, either the function is also far
from unate or a large fraction of pairs are decreasing with respect to it.

Lemma 2.8. Consider a function h : [n] → R which is ε-far from monotone. At least one of the following
holds:



  Algorithm 5: The Nonadaptive Unateness Tester over the Hypergrid
   input : distance parameter ε ∈ (0, 1/2); query access to a function f : [n]d → R.
 1 repeat ⌈(24 ln 3)/ε⌉ times
 2     for i = 1 to d do
 3         Sample an i-line ℓ uniformly at random.
 4         Reject if Algorithm 4 rejects on input f|ℓ.
 5 for r = 1 to ⌈log(96d/ε)⌉ do
 6     repeat sr = ⌈(96d ln 4)/(ε · 2^r)⌉ times
 7         Sample a dimension i ∈ [d] uniformly at random.
 8         Sample 3 · 2^r i-pairs uniformly and independently at random.
 9         If we find an increasing and a decreasing pair among the sampled pairs, reject.
10 accept




   1. The function h is ε/2-far from unate.

   2. Pr_{u,v∈[n]}[pair {u, v} is decreasing] ≥ ε/4.

Proof. Suppose Item 1 does not hold, that is, there exists a set G ⊆ [n] of size greater than n(1 − ε/2)
on which h|G is unate and therefore antitone. (Unate means monotone or antitone, but h|G cannot be
monotone because h is ε-far from monotone.) Observe that if u, v ∈ G and h(u) ≠ h(v), then {u, v} is a
decreasing pair because h|G is antitone. For each point u ∈ G, let Du ⊆ G be the set of points on which h
differs from h(u). Since h is ε-far from monotone and, consequently, ε-far from constant, at least εn
points of [n] differ from h(u), and fewer than εn/2 of them lie outside G; hence |Du| ≥ εn/2 for all u ∈ G.
Thus,

                Pr_{u,v∈[n]}[pair {u, v} is decreasing] ≥ Pr[u ∈ G and v ∈ Du] ≥ (1 − ε/2) · (ε/2) ≥ ε/4.

The last inequality follows because ε ≤ 1. We proved that if Item 1 does not hold, then Item 2 must
hold.
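The dichotomy of Lemma 2.8 can be checked by brute force for small n. The sketch below computes the exact distances via a longest-nondecreasing-subsequence DP and exact rational arithmetic; all function and variable names are ours.

```python
import itertools
import random
from fractions import Fraction

def dist_to_monotone(h):
    """Fractional distance of list h to a nondecreasing list: the points
    outside a longest nondecreasing subsequence must change (O(n^2) DP)."""
    n = len(h)
    best = [1] * n
    for j in range(n):
        for i in range(j):
            if h[i] <= h[j] and best[i] + 1 > best[j]:
                best[j] = best[i] + 1
    return Fraction(n - max(best), n)

def dist_to_unate(h):
    # On a totally ordered domain, unate = nondecreasing or nonincreasing,
    # and h is nonincreasing iff its reversal is nondecreasing.
    return min(dist_to_monotone(h), dist_to_monotone(h[::-1]))

def frac_decreasing_pairs(h):
    """Probability over independent uniform u, v that {u, v} is decreasing."""
    n = len(h)
    dec = sum(1 for u, v in itertools.combinations(range(n), 2) if h[u] > h[v])
    return Fraction(2 * dec, n * n)

def lemma_2_8_holds(h):
    """Check the dichotomy with eps taken as the exact distance to monotone."""
    eps = dist_to_monotone(h)
    return (eps == 0 or dist_to_unate(h) >= eps / 2
            or frac_decreasing_pairs(h) >= eps / 4)
```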

Theorem 2.9. If f : [n]d → R is ε-far from unate, then Algorithm 5 rejects with probability at least 2/3.

Proof. Suppose f : [n]d → R is ε-far from unate. We will show that when Algorithm 5 runs on f , one of
the two loops (in Steps 1 and 5) rejects with probability at least 2/3.
    For every line ℓ, we define the following quantities.

   • αℓ: the distance of f|ℓ to monotonicity.

   • βℓ: the distance of f|ℓ to antitonicity.

   • µℓ = min(αℓ, βℓ): the distance of f|ℓ to unateness.

   • α′ℓ: the probability that a uniformly random pair in ℓ is decreasing.

   • β′ℓ: the probability that a uniformly random pair in ℓ is increasing.

    By Lemma 2.8 and symmetry, for every line ℓ,

                     µℓ + 2α′ℓ ≥ αℓ/2    and    µℓ + 2β′ℓ ≥ βℓ/2.                                    (2.1)

    Let Li be the set of i-lines. By Theorem 2.1,

                     (1/n^{d−1}) ∑_{i=1}^d min( ∑_{ℓ∈Li} αℓ, ∑_{ℓ∈Li} βℓ ) ≥ ε/4.                    (2.2)

    Define u = ∑_{i=1}^d ui, where

                     ui = (1/n^{d−1}) ∑_{ℓ∈Li} µℓ

is the average distance to unateness in dimension i. (In general, ui is different from the quantity µi used in
the analysis of the adaptive tester. Recall that µi was used to denote the minimum of the average distance
to monotonicity and the average distance to antitonicity in dimension i. We are using different notation
to avoid confusion.) Let α′i be the average of α′ℓ over all lines ℓ ∈ Li. Define β′i analogously, and let
µ′i = min(α′i, β′i). Let random variable X be µ′i, where i is sampled uniformly at random from [d]. Then

        u + 2d · E[X] = (1/n^{d−1}) ∑_{i=1}^d [ ∑_{ℓ∈Li} µℓ + 2 · min( ∑_{ℓ∈Li} α′ℓ, ∑_{ℓ∈Li} β′ℓ ) ]

                      = (1/n^{d−1}) ∑_{i=1}^d min( ∑_{ℓ∈Li} (µℓ + 2α′ℓ), ∑_{ℓ∈Li} (µℓ + 2β′ℓ) )

                      ≥ (1/n^{d−1}) ∑_{i=1}^d min( ∑_{ℓ∈Li} αℓ/2, ∑_{ℓ∈Li} βℓ/2 ) ≥ ε/8.

The last two inequalities follow from Equations (2.1) and (2.2), respectively. We conclude that at least
one of u and E[X] has to be large, specifically, u ≥ ε/24 or E[X] ≥ ε/(24d).
    Case 1: u ≥ ε/24. Consider the first loop of Algorithm 5 (in Step 1). By Lemma 2.7, the probability
that a uniformly random i-line is rejected by Algorithm 4 is at least ui. The probability that one iteration
of the loop in Step 1 fails to reject is

                     ∏_{i=1}^d (1 − ui) ≤ ∏_{i=1}^d exp(−ui) = exp(−∑_{i=1}^d ui) = exp(−u) ≤ exp(−ε/24).

Thus, all iterations of the loop fail to reject with probability at most (exp(−ε/24))^{(24 ln 3)/ε} = 1/3.
    Case 2: E[X] ≥ ε/(24d). Applying the work investment strategy (Theorem 2.3) on X with µ =
ε/(24d) and using a calculation analogous to that in the proof of Lemma 2.2, the probability that Step 9
rejects in some iteration is at least 2/3.
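The two loops of Algorithm 5 analyzed above can be sketched as follows. The line subroutine repeats the scan-based stand-in for Adir used for Algorithm 4 (the real Adir makes only O(log n) queries), and all function names are ours.

```python
import math
import random

UP, DOWN = "up", "down"

def restrict_to_line(f, d, n, i, rng):
    """Sample a uniformly random i-line of [n]^d and return f on it as a list."""
    base = [rng.randrange(n) for _ in range(d)]
    vals = []
    for t in range(n):
        base[i] = t
        vals.append(f(tuple(base)))
    return vals

def line_pot(h, rng):
    """Algorithm 4 with a scan-based stand-in for A_dir."""
    dirs = set()
    for u, v in zip(h, h[1:]):
        if u < v:
            dirs.add(UP)
        elif u > v:
            dirs.add(DOWN)
    x = rng.randrange(len(h))
    for a, b in [(h[0], h[x]), (h[x], h[-1]), (h[0], h[-1])]:
        if a < b:
            dirs.add(UP)
        elif a > b:
            dirs.add(DOWN)
    return dirs != {UP, DOWN}

def nonadaptive_grid_tester(f, d, n, eps, rng=random):
    """Sketch of Algorithm 5 for f: [n]^d -> R."""
    # First loop (Steps 1-4): the line POT on uniformly random i-lines.
    for _ in range(math.ceil(24 * math.log(3) / eps)):
        for i in range(d):
            if not line_pot(restrict_to_line(f, d, n, i, rng), rng):
                return False
    # Second loop (Steps 5-9): look for both an increasing and a decreasing
    # i-pair within one batch of samples from a random dimension i.
    for r in range(1, math.ceil(math.log2(96 * d / eps)) + 1):
        for _ in range(math.ceil(96 * d * math.log(4) / (eps * 2 ** r))):
            i = rng.randrange(d)
            seen = set()
            for _ in range(3 * 2 ** r):
                line = restrict_to_line(f, d, n, i, rng)
                u, v = sorted(rng.sample(range(n), 2))
                if line[u] < line[v]:
                    seen.add(UP)
                elif line[u] > line[v]:
                    seen.add(DOWN)
            if seen == {UP, DOWN}:
                return False
    return True
```

Because each batch of the second loop fixes a single dimension i, a unate function can never produce both an increasing and a decreasing pair in one batch, so unate functions are always accepted by this sketch.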


3     The lower bound for nonadaptive testers over the hypercube
In this section, we prove Theorem 1.2, which gives a lower bound for nonadaptive unateness testers for
functions over the hypercube.
     Fischer [35] showed that in order to prove lower bounds for a general class of properties on the line
domain, it is sufficient to consider a special class of testers called comparison-based testers. The properties
he looked at are called order-based properties (see Definition 3.2), and they include monotonicity and
unateness. A tester is comparison-based if it bases its decisions only on the order of the function values
at the points it queried, and not on the values themselves. Chakrabarty and Seshadhri [23] extended
Fischer’s proof to monotonicity on any partially-ordered domain for the case when all function values
are distinct. As we show in Section 3.1 below, Chakrabarty and Seshadhri’s proof goes through for
all order-based properties on partially-ordered domains. Moreover, the assumption of distinctness for
function values can be removed. We include this proof for completeness, filling in the details needed to
generalize the original proof.
     Our main technical contribution is the construction of a distribution of functions f : {0, 1}d → N on
which every nonadaptive comparison-based tester must query Ω(d log d) points to determine whether the
sampled function is unate or far from unate. We describe this construction in Section 3.2 and show its
correctness in Sections 3.3-3.4.

3.1   Reduction to comparison-based testers
In this section, we prove that if there exists a tester for an order-based property P of functions over
a partially-ordered domain, then there exists a comparison-based tester for P with the same query
complexity. This is stated in Theorem 3.3. Before stating the theorem, we introduce some definitions.

Definition 3.1. A (t, ε, δ )-tester for a property is a 2-sided error tester with distance parameter ε that
makes at most t queries and errs with probability at most δ .

Definition 3.2 (Order-based property). For an arbitrary partial order D and an arbitrary total order R, a
property P of functions f : D → R is order-based if, for all strictly increasing maps φ : R → R and all
functions f , we have dist( f , P) = dist(φ ◦ f , P).

    In particular, unateness is an order-based property. The following theorem is an extension of
Theorem 5 in [35] and Theorem 2.1 in [23]. Specifically, Theorem 2.1 in [23] was proved with the
assumption that the function values are distinct. We generalize the theorem by removing this assumption.

Theorem 3.3 (Generalization of [35, 23]). Let P be an order-based property of functions f : D → N over
a finite domain D. Suppose there exists a (t, ε, δ )-tester for P. Then there exists a comparison-based
(t, ε, 2δ )-tester for P.

    The rest of this section is devoted to proving Theorem 3.3. Our proof closely follows the proof of
Theorem 2.1 in [23]. The proof has two parts. The first part describes a reduction from a tester to a
discretized tester, and the second part describes a reduction from a discretized tester to a comparison-based
tester.


      Let P be a property of functions f : D → R for an arbitrary partial order D and an arbitrary total order
R ⊆ N. Let T be a (t, ε, δ )-tester for P. First, we define a family of probability functions that completely
characterizes T. Fix some s ∈ [t]. Consider the point in time in an execution of the tester T on some
input function f , where exactly s queries have been made. Suppose these queries are x1 , x2 , . . . , xs ∈ D
and the corresponding answers are a1 = f (x1 ), a2 = f (x2 ), . . . , as = f (xs ). Let the query vector X be
(x1 , . . . , xs ) and the answer vector A be (a1 , . . . , as ). The next action of the algorithm is either choosing
the (s + 1)st query from D or outputting accept or reject. For each action y ∈ D ∪ {accept, reject}, let
p^y_X(A) denote the probability that T chooses action y after making queries X and receiving answers A.
Since p^y_X(A) is a probability distribution,

                     ∀s < t, ∀X ∈ D^s, ∀A ∈ R^s,     ∑_{y ∈ D ∪ {accept, reject}} p^y_X(A) = 1.

Furthermore, the tester cannot make more than t queries, and so the (t + 1)st action must be either accept
or reject. Formally,

                     ∀X ∈ D^t, ∀A ∈ R^t,     ∑_{y ∈ {accept, reject}} p^y_X(A) = 1.

If a tester decides to accept or reject before making t queries, i. e., p^{y′}_X(A) = 1 for some X = (x1, . . . , xs),
A = (a1, . . . , as), where s < t and y′ ∈ {accept, reject}, then we fill in the values of p so that the action
of the tester is defined to be the same y′ at every subsequent step up to t + 1. Specifically, we set
p^{y′}_{X′}(A′) = 1 for all X′ ∈ D^{s′}, A′ ∈ R^{s′}, where s < s′ ≤ t and the first s queries (in X′) and their
corresponding answers (in A′) are x1, . . . , xs and a1, . . . , as, respectively.

Definition 3.4 (Discretized tester). A tester T is discretized if all p^y_X(A)-values associated with T come
from the set {i/K : i ∈ {0, 1, . . . , K}} for some positive integer K.

    Chakrabarty and Seshadhri [23] proved that if there exists a (t, ε, δ )-monotonicity tester T for
functions f : D → N, then there exists a discretized (t, ε, 2δ )-monotonicity tester T 0 for the same class
of functions. Both the statement and the proof in [23] hold not only for testers of monotonicity, but for
testers of all properties of functions f : D → R.

Lemma 3.5 (Implicit in [23, Lemma 2.2]). Suppose there exists a (t, ε, δ )-tester T for a property P of
functions f : D → R. Then there exists a (t, ε, 2δ )-discretized tester T 0 for P.

     This completes the first part of the proof.
     Next, we show how to transform a discretized tester into a comparison-based tester. Intuitively, a
tester is comparison-based if each action of the tester depends only on the ordering of the answers to
the previous queries, not on the values themselves. We define a family of probability functions q in
order to characterize comparison-based testers. The q-functions are defined in terms of p-functions,
but, in their definition, we decouple the set of values that were received as answers from their positions
in the answer vector. Specifically, the set of values that were received as answers (i. e., information
irrelevant for a comparison-based tester) is given as the argument to the q-functions. All the remaining
information is given as subscripts and superscripts. Let V represent the set {a1 , . . . , as } of answer values
(without duplicates). Let r be the number of (distinct) values in V . Note that r ≤ s. Suppose V is


{v1 , v2 , . . . , vr }, where v1 , . . . , vr ∈ R and v1 < v2 < . . . < vr . Let ρ be the map from positions of values
in the answer vector to their corresponding indices in V , that is, ρ : [s] → [r]. Observe that ρ is surjective.
The q-functions are defined as follows:

                     q^y_{X,ρ}(V) = p^y_X((v_{ρ(1)}, v_{ρ(2)}, . . . , v_{ρ(s)})).

    For any set S, let S^{(r)} denote the set of all subsets of S of size r.

Definition 3.6 (Comparison-based tester). A tester T for an order-based property P is comparison-based
for functions f : D → R if, for all r, s satisfying r ≤ s ≤ t, all X ∈ D^s, all y ∈ D ∪ {accept, reject},
and all surjections ρ : [s] → [r], the function q^y_{X,ρ} is constant on R^{(r)}. That is, for all V, V′ ∈ R^{(r)},
we have q^y_{X,ρ}(V) = q^y_{X,ρ}(V′).

     To complete the proof of Theorem 3.3, we show that if there exists a discretized tester T for an
order-based property P over the functions f : D → N, then there exists an infinite set R ⊆ N such that,
for functions f : D → R, the tester T is comparison-based. Specifically, the q-functions that describe
the tester do not depend on V , the specific set of answer values, as long as V ⊂ R. At the end of the
proof, we construct a new comparison-based tester that modifies the input function so that its range is R
(without changing the distance of the function to the property P) and runs T on the modified function.
The existence of the infinite set R is proved using Ramsey theory arguments.
     We introduce some Ramsey theory terminology. Consider a positive integer C, where [C] represents
a set of colors. For any positive integer i, a finite coloring of N^{(i)} is a function col_i : N^{(i)} → [C]. That
is, this function assigns one of C colors to each subset of N with i elements. An infinite set R ⊆ N is
monochromatic with respect to col_i if for all i-element subsets V, V′ ∈ R^{(i)}, we have col_i(V) = col_i(V′).
In other words, every i-element subset of R is assigned the same color by the coloring function col_i.
A k-wise finite coloring of N is a collection of k colorings col_1, col_2, . . . , col_k. Note that each coloring
function col_1, . . . , col_k is defined over subsets of a different size, and together they assign a color to
each subset with at most k elements. An infinite subset R ⊆ N is k-wise monochromatic with respect to
col_1, . . . , col_k if R is monochromatic with respect to col_i for every i ∈ [k]. That is, all subsets of R of the
same size are assigned the same color by the coloring functions.
     We use the following variant of Ramsey’s theorem which was also used in [35, 23].

Theorem 3.7 (Theorem 2.3 in [23]). For any k-wise finite coloring of N, there exists an infinite k-wise
monochromatic subset R ⊆ N.

Proof of Theorem 3.3. Suppose there exists a (t, ε, δ )-tester for property P of functions f : D → N. By
Lemma 3.5, there exists a (t, ε, 2δ )-discretized tester T for P. Consider the family of q-functions that
characterizes T.
    The main idea of the proof is to view the behavior of the tester T on each possible subset of answer
values V (with at most t elements) as the color of V . More precisely, the color of V is the corresponding
vector of all q-functions evaluated at V . If two sets V and V 0 are mapped to the same color, it means,
intuitively, that the tester T is ignoring whether the specific answer values are V or V 0 . We want to show
that there is a large subset R of values, such that the tester T ignores the answer values, as long as they
come from R. Since T is discretized, the set of colors is finite, and we are able to apply Theorem 3.7 to


get an infinite subset R of N which is t-wise monochromatic. That means that T ignores V , as long as
V ⊂ R. That is, T is already comparison-based on R. At the end, we use the fact that P is an order-based
property to get a comparison-based tester for the whole range.
     We define a t-wise finite coloring of N. For each r ∈ [t] and V ∈ N^{(r)}, the color col_r(V) is
defined as a vector of probability values q^y_{X,ρ}(V). The vector is indexed by (y, X, ρ) for each y ∈
D ∪ {accept, reject}, X ∈ D^s for an integer s satisfying r ≤ s ≤ t, and a surjection ρ : [s] → [r]. The
value at the index (y, X, ρ) in col_r(V) is equal to q^y_{X,ρ}(V). Note that there are finitely many possible
values for y and X, and surjections ρ, so the dimension of the vector col_r(V) is finite. Furthermore,
since the tester is discretized, the number of different values that the q-functions take is also finite. Hence,
the range of col_r is finite. Now we have a t-wise finite coloring col_1, . . . , col_t of N. By Theorem 3.7,
there exists an infinite t-wise monochromatic set R ⊆ N. Thus, for each r ∈ [t] and V, V′ ∈ R^{(r)}, we have
col_r(V) = col_r(V′), implying that q^y_{X,ρ}(V) = q^y_{X,ρ}(V′) for all y, X, and ρ. Thus, T is comparison-based
for functions f : D → R.
    Finally, we construct a comparison-based tester T 0 for the whole range N. Consider a strictly
monotone increasing map φ : N → R. Given any function f : D → N, consider φ ◦ f : D → R. Define
an algorithm T 0 , which on input f , runs T on φ ◦ f . Since P is order-based, dist( f , P) = dist(φ ◦ f , P).
Hence, T 0 is a (t, ε, 2δ )-tester for P. Moreover, since the tester T 0 just runs T on φ ◦ f : D → R, and T is
comparison-based for φ ◦ f , the tester T 0 is also comparison-based.

3.2    The hard distributions
Our main lower bound theorem is stated next. Together with Theorem 3.3, it implies Theorem 1.2.

Theorem 3.8 (Main). Any nonadaptive comparison-based unateness tester of functions f : {0, 1}d → N
that works with distance parameter ε = 1/8 and errs with probability at most 1/6 must make Ω(d log d)
queries.

    The proof of Theorem 3.8 is presented in Sections 3.2-3.4 and forms the core technical content of
this work. We will use Yao’s minimax principle [61], specifically, the version stated in [57, Claim 5]. It
asserts that to prove a lower bound for a randomized tester, it is sufficient to give two distributions, one
on positive and one on negative instances, that every deterministic tester fails to distinguish with high
probability. Next, we define two input distributions.
    It may be useful for the reader to recall the sketch of the main ideas given in Section 1.1.1. Without loss
of generality,² let d be an even power of 2 and d′ := d + log₂ d. We will focus on functions h : {0, 1}^{d′} → N,
and prove the lower bound of Ω(d log d) for this class of functions, as Ω(d log d) = Ω(d′ log d′).
    We partition {0, 1}^{d′} into d subcubes based on the log₂ d most significant bits. Specifically, for i ∈ [d],
the ith subcube is defined as

                     Ci := {x ∈ {0, 1}^{d′} | val(x_{d′} x_{d′−1} · · · x_{d+1}) = i − 1},

where val(z) := ∑_{i=1}^p z_i 2^{i−1} denotes the integer equivalent of the binary string z = z_p z_{p−1} . . . z_1.
     ² If the number of dimensions is d′′ and there is no positive integer d such that d′′ = d + log₂ d, then consider the largest
integer d′ such that d′ < d′′ and d′ = d + log₂ d for some positive integer d. We can construct any function over {0, 1}^{d′} and
extend it to {0, 1}^{d′′} by adding d′′ − d′ dummy variables. Furthermore, d′ = Θ(d′′), and hence, Ω(d′ log d′) = Ω(d′′ log d′′).




    Let m := d. For ease of notation, we denote the set of subcube indices by [m] and the set of dimensions
in a subcube by [d]. We use i, j ∈ [m] to index subcubes, and a, b ∈ [d] to index dimensions. We now
define a collection of random variables, where each variable may depend on the previous ones:
      • The logarithm of the number of action dimensions, k: a number picked uniformly at random from
        [(1/2) log2 d].

      • The set of all action dimensions, R: a uniformly random subset of [d] of size 2k .

      • The action dimension for each subcube, ri : for each i ∈ [m], ri is picked from R uniformly and
        independently at random.

      • The direction for each dimension (in D+ distribution), αb : for each b ∈ [d], αb is picked from
        {−1, +1} uniformly and independently at random. (Technically, αb will only be used for each
        b ∈ R. We define it for all b ∈ [d] so that it is independent of R.)

      • The direction of potential violations for each subcube (in D− distribution), βi : for each i ∈ [m], βi
        is picked from {−1, +1} uniformly and independently at random.
    We use T to refer to the entire collection (k, R, {ri | i ∈ [m]}, {αb | b ∈ [d]}, {βi | i ∈ [m]}) of random
variables. We denote the tuple (k, R, {ri | i ∈ [m]}) by S, also referred to as the shared randomness
common to the distributions D+ and D−. Given T, the distributions D+ and D− generate the functions
fT and gT, respectively, where

                     fT(x) := ∑_{b ∈ [d′]\R} x_b 3^b + α_{r_i} x_{r_i} 3^{r_i},

                     gT(x) := ∑_{b ∈ [d′]\R} x_b 3^b + β_i x_{r_i} 3^{r_i},

and i is the subcube containing x, i. e., i = val(x_{d′} x_{d′−1} · · · x_{d+1}) + 1.
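The construction above is easy to instantiate for a small even power of 2, say d = 4 (so d′ = 6), and to sanity-check by brute force; everything below (the function names, the 0-based subcube index) is our illustration, not the paper's notation.

```python
import itertools
import random

def sample_T(d, rng):
    """Sample the shared randomness (k, R, {r_i}) and the signs {alpha_b},
    {beta_i} for the hard distributions; d must be an even power of 2."""
    logd = d.bit_length() - 1                # log2(d)
    k = rng.randrange(1, logd // 2 + 1)      # uniform in [(1/2) log2(d)]
    R = set(rng.sample(range(1, d + 1), 2 ** k))
    r = [rng.choice(sorted(R)) for _ in range(d)]          # r_i per subcube
    alpha = {b: rng.choice([-1, 1]) for b in range(1, d + 1)}
    beta = [rng.choice([-1, 1]) for _ in range(d)]
    return R, r, alpha, beta

def make_fn(d, R, r, sign_of_subcube):
    """Build x -> sum over b in [d'] \\ R of x_b 3^b, plus
    sign(i) * x_{r_i} 3^{r_i}, where i is the (0-based) subcube of x
    determined by its top log2(d) bits."""
    dp = d + (d.bit_length() - 1)            # d' = d + log2(d)
    def h(x):                                # x = (x_1, ..., x_{d'})
        i = sum(x[b - 1] << (b - d - 1) for b in range(d + 1, dp + 1))
        total = sum(x[b - 1] * 3 ** b for b in range(1, dp + 1) if b not in R)
        return total + sign_of_subcube(i) * x[r[i] - 1] * 3 ** r[i]
    return h

def is_unate(h, dim):
    """Brute-force unateness check over {0,1}^dim."""
    for b in range(dim):
        signs = set()
        for x in itertools.product([0, 1], repeat=dim):
            if x[b] == 0:
                diff = h(x[:b] + (1,) + x[b + 1:]) - h(x)
                if diff:
                    signs.add(1 if diff > 0 else -1)
        if signs == {1, -1}:
            return False
    return True
```

Instantiating fT with the signs α_{r_i} always yields a unate function (Claim 3.13), while gT uses β_i; whenever two subcubes share the same action dimension but have opposite β-signs, gT is visibly non-unate.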

3.3     The sign function and the distance to unateness for hard distributions
We need to analyze when functions in our hard distributions are increasing or decreasing in a specified
dimension. More generally, since we are looking at comparison-based testers, we need to understand how
h(x) and h(y) compare for any points x, y in the hypercube {0, 1}^{d′} and any function h in the support of
the hard distributions. To help us with that, we define the sign function sgn_h(x, y) and analyze its behavior
on our hard distributions.
    Recall that val(z) := ∑_{i=1}^p z_i 2^{i−1} denotes the integer equivalent of the binary string z = z_p z_{p−1} . . . z_1.
We say that x <_val y if val(x) < val(y). Note that <_val forms a total order on {0, 1}^{d′}.

Definition 3.9. Given x, y ∈ {0, 1}^{d′} such that x <_val y, and a function h : {0, 1}^{d′} → N, define

                                    1    if h(x) < h(y),
                     sgn_h(x, y) =   0    if h(x) = h(y),
                                   −1    if h(x) > h(y).



    Next, we show that for every function in the support of our hard distributions and any two points
x <_val y in different subcubes, sgn_h(x, y) = 1.
    For a distribution D, let supp(D) denote the support of D.

Claim 3.10. For all h ∈ supp(D+) ∪ supp(D−), x ∈ Ci, and y ∈ Cj such that i, j ∈ [m] and i < j, we have

                     sgn_h(x, y) = 1.

Proof. Let b be the most significant coordinate on which x and y differ, i. e., x_b ≠ y_b. Since x ∈ Ci, y ∈ Cj,
and i < j, we know that x_b = 0, y_b = 1, and b > d. For all a > d, the coefficient of x_a in the definition of h
is 3^a, irrespective of whether h is fT or gT. Then,

                     h(x) ≤ ∑_{a ∈ [d′]} 3^a x_a ≤ ∑_{a > b} 3^a x_a + ∑_{a < b} 3^a;

                     h(y) ≥ ∑_{a ≥ b} 3^a y_a − ∑_{a < b} 3^a.

Thus, h(y) − h(x) ≥ 3^b − 2 ∑_{a < b} 3^a > 0.

    Now we analyze the sign function on points x, y in the same subcube. Its value is determined by the
most significant coordinate on which x and y differ that has a nonzero coefficient in the definitions of
hard functions fT and gT . We define this coordinate next.

Definition 3.11. For any setting of the shared randomness S, index i ∈ [m], and points x, y ∈ Ci, define
t^i_S(x, y) to be ⊥ if x and y agree on all coordinates in ([d] \ R) ∪ {ri}; otherwise, define t^i_S(x, y) to
be the most significant coordinate in ([d] \ R) ∪ {ri} on which x and y differ.

    Note that S determines R and {ri}. For any T that extends S, the restriction of each of fT and gT to
Ci is constant with respect to the coordinates in R \ {ri}. Thus, t^i_S(x, y) is ⊥ if x and y differ only on
coordinates that do not influence the value of the function in Ci; otherwise, it is the most significant
coordinate on which x and y differ that is influential in Ci.

Claim 3.12. Fix some shared randomness S, an index i ∈ [m], and points x, y ∈ Ci with x <_val y. Let
a = t^i_S(x, y). For any T that extends S,

    • if a = ⊥, then sgn_{fT}(x, y) = sgn_{gT}(x, y) = 0;

    • if a ∈ [d] \ R, then sgn_{fT}(x, y) = sgn_{gT}(x, y) = 1;

    • if a = ri, then sgn_{fT}(x, y) = α_a and sgn_{gT}(x, y) = βi.

Proof. Recall that

                                         fT (x) =           ∑ xb 3b + αr xr 3r ;
                                                                          i       i
                                                                                          i

                                                    b∈[d 0 ]\R

                                        gT (x) =            ∑ xb 3b + βi xr 3r .
                                                                              i
                                                                                      i

                                                    b∈[d 0 ]\R



                         T HEORY OF C OMPUTING, Volume 16 (3), 2020, pp. 1–36                                   18
          O PTIMAL U NATENESS T ESTERS FOR R EAL -VALUED F UNCTIONS : A DAPTIVITY H ELPS

    First, consider the case a = ⊥. Then xb = yb for all b ∈ ([d′] \ R) ∪ {ri}. By the definition of fT and
gT, we get fT(x) = fT(y) and gT(x) = gT(y). Hence, sgn fT(x, y) = sgn gT(x, y) = 0.
    Next, consider the case a ≠ ri. Then a ∉ R by the definition of tSi(x, y). Also, xb = yb for all b > a
such that b ∉ R. Since x <val y, we have xa = 0 and ya = 1. Thus, fT(y) − fT(x) ≥ 3^a − ∑_{b<a} 3^b > 0,
implying that sgn fT(x, y) = 1. The same argument holds for gT.
    Finally, consider the case a = ri. Thus, fT(y) − fT(x) = αa 3^a + ∑_{b<a, b∉R} (yb − xb) 3^b. Note that
−3^a < ∑_{b<a, b∉R} (yb − xb) 3^b < 3^a. Hence, fT(y) − fT(x) is positive when αa = 1 and negative when
αa = −1. This implies that sgn fT(x, y) = αa. By an analogous argument, sgn gT(x, y) = βi.
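The case analysis above can be checked mechanically on a toy instance. The sketch below fixes hypothetical parameters (the number of coordinates D, the action dimensions R, the subcube's action dimension r_i, and the signs α and β are all illustrative choices, not the construction's actual values) and evaluates fT and gT as written in the proof; the three asserted situations mirror the three cases of Claim 3.12.

```python
# Toy check of Claim 3.12 on hypothetical parameters. Coordinates are 0-indexed;
# a larger index is more significant, mirroring the weights 3^b in f_T and g_T.
D = 6                    # stand-in for d'
R = {2, 4}               # hypothetical action dimensions
r_i = 4                  # action dimension of the subcube C_i under consideration
alpha = {2: -1, 4: -1}   # hypothetical signs alpha_b for b in R
beta_i = +1              # hypothetical sign beta_i for C_i

def f_T(x):
    # f_T(x) = sum_{b not in R} x_b 3^b + alpha_{r_i} x_{r_i} 3^{r_i}
    return sum(x[b] * 3**b for b in range(D) if b not in R) \
        + alpha[r_i] * x[r_i] * 3**r_i

def g_T(x):
    # g_T(x) = sum_{b not in R} x_b 3^b + beta_i x_{r_i} 3^{r_i}
    return sum(x[b] * 3**b for b in range(D) if b not in R) \
        + beta_i * x[r_i] * 3**r_i

def sgn(h, x, y):        # sgn_h(x, y) for x <_val y
    d = h(y) - h(x)
    return (d > 0) - (d < 0)

# a = ⊥: x and y differ only on coordinate 2 ∈ R \ {r_i}
assert sgn(f_T, (0,0,0,0,0,0), (0,0,1,0,0,0)) == 0 == sgn(g_T, (0,0,0,0,0,0), (0,0,1,0,0,0))
# a = 5 ∉ R: both signs are +1
assert sgn(f_T, (1,1,1,0,0,0), (0,0,0,0,0,1)) == 1 == sgn(g_T, (1,1,1,0,0,0), (0,0,0,0,0,1))
# a = r_i: sgn f_T = alpha_{r_i} = -1 while sgn g_T = beta_i = +1
assert sgn(f_T, (1,1,0,0,0,0), (0,0,0,0,1,0)) == -1
assert sgn(g_T, (1,1,0,0,0,0), (0,0,0,0,1,0)) == +1
```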

   We complete Section 3.3 by using Claims 3.10 and 3.12 to analyze the distance to unateness for
functions distributed according to our hard distributions.

Claim 3.13. Every function fT ∈ supp(D+ ) is unate.

Proof. Fix some fT ∈ supp(D+). We need to prove that along each dimension b ∈ [d′], the function
fT is either nonincreasing or nondecreasing. Recall that a b-edge is a pair (x, y) of points in {0, 1}^{d′},
where xb = 0, yb = 1, and xa = ya for all a ∈ [d′] \ {b}. Consider a b-edge {x, y}. If b ∈ [d′] \ R, then, by
Claims 3.10 and 3.12, sgn fT(x, y) = 1, and hence fT is increasing in dimension b.
     If b ∈ R, then, by Claim 3.12, sgn fT(x, y) is 0 or αb. That is, if αb = 1, then fT is nondecreasing, and
if αb = −1, then fT is nonincreasing.

    We write f ∼ D to denote that f is sampled from distribution D.

Claim 3.14. For sufficiently large d, a function gT ∼ D− is 1/8-far from unate with probability at least
19/20.

Proof. For each r ∈ R, let Ar = {i ∈ [m] | ri = r} be the set of subcube indices with ri = r. Then

                                     ∑_{r∈R} |Ar| = m = d.                                            (3.1)

Hence, E[|Ar|] = d/|R| ≥ √d, since |R| ≤ √d. By Chernoff and union bounds, for all r ∈ R, we have
|Ar| ≥ √d/2 with probability at least 1 − √d · exp(−√d/8).
     Condition on the event that |Ar| ≥ √d/2 for all r ∈ R. Fix an r ∈ R. For each i ∈ Ar, there is a
random choice of βi. Partition Ar into Ar^+ and Ar^−, depending on whether βi is +1 or −1. Again, by
Chernoff and union bounds, for all r ∈ R, we have min(|Ar^+|, |Ar^−|) ≥ |Ar|/4 with probability at least
1 − √d · exp(−√d/32).
     Thus, for sufficiently large d and any choice of R,

     Pr[∀r ∈ R, min(|Ar^+|, |Ar^−|) ≥ |Ar|/4] ≥ 1 − d · (exp(−√d/8) + exp(−√d/32)) ≥ 19/20.

    Note that for all i ∈ [m], the size |Ci| = 2^d. By Claim 3.12, for any r-edge (x, y) that lies within
subcube Ci where i ∈ Ar, the sign sgn gT(x, y) = βi. Hence, in gT, for all i ∈ Ar^+, all r-edges in subcube
Ci are increasing, whereas, for all j ∈ Ar^−, all r-edges in Cj are decreasing. In any unate function, all these
r-edges are either nonincreasing or nondecreasing. Thus gT differs from a unate function on at least

                                  (2^d/2) · min(|Ar^+|, |Ar^−|) ≥ 2^d |Ar|/8

points. Overall, we need to change at least (2^d/8) ∑_{r∈R} |Ar| values. Since the Ar's partition the set of
subcube indices, (2^d/8) ∑_{r∈R} |Ar| = (2^d/8) · d = 2^{d′}/8.

3.4     Bad events
Fix a set of queries Q made by a deterministic, nonadaptive, comparison-based tester. We first identify
certain bad values of S, on which Q could potentially distinguish between fT and gT for any T that
extends S. In this section, we prove that, for a given Q, the probability of a bad S is small. In Section 3.5,
we show that the tester with queries Q cannot distinguish between fT and gT for any T that extends a
good S.
    Recall that when a comparison-based tester queries points x and y, it only sees sgnh(x, y). In order
for sgnh(x, y) to have a possibility of being different for h ∼ D+ and h ∼ D−, by Claims 3.10 and 3.12,
queries x and y have to be in the same subcube Ci for some i ∈ [m] and, moreover, tSi(x, y), the most
significant coordinate in ([d] \ R) ∪ {ri} on which x and y differ, has to be the action dimension ri for the
subcube Ci. Subcubes are fixed, whereas the action dimensions R and ri are chosen randomly. Intuitively,
the value of sgnh(x, y) is likely to be determined by the top several coordinates in [d] on which x and y
differ. (Specifically, considering the top 5 coordinates is sufficient to get a high enough probability.) Next,
we formalize this intuition in the definition of coordinates captured by (x, y) and, more generally, by a set
of query points.
Definition 3.15 (Captured coordinates). For points x, y ∈ {0, 1}^{d′}, define cap(x, y) to be

   1. the set of all coordinates on which x and y differ, if the size of this set is at most 5;

   2. the set of the 5 most significant coordinates on which x and y differ, otherwise.

We say that the pair (x, y) captures the set of coordinates cap(x, y). For a set of points P ⊆ {0, 1}^{d′}, define
cap(P) := ∪_{x,y∈P} cap(x, y) to be the set of all coordinates captured by the set P.
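Definition 3.15 translates directly into code. The following is a small sketch; the convention that the highest index is the most significant coordinate is an assumption of the sketch, not part of the definition.

```python
from itertools import combinations

def cap(x, y, c=5):
    # Coordinates captured by the pair (x, y): all coordinates on which x and y
    # differ if there are at most c of them, otherwise the c most significant
    # ones. Coordinate len(x) - 1 plays the role of the most significant one.
    diff = [b for b in range(len(x)) if x[b] != y[b]]
    return set(diff) if len(diff) <= c else set(diff[-c:])

def cap_set(P, c=5):
    # cap(P): the union of cap(x, y) over all pairs of points in P
    out = set()
    for x, y in combinations(P, 2):
        out |= cap(x, y, c)
    return out

assert cap((0,)*7, (1,)*7) == {2, 3, 4, 5, 6}   # 7 differing coordinates -> top 5
assert cap_set([(0, 0), (0, 1), (1, 0)]) == {0, 1}
```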
    For each i ∈ [m], let Qi := Q ∩Ci denote the set of the queried points that lie in the subcube Ci . We
define two bad events for S.
Definition 3.16 (Bad events). S is bad if at least one of the following bad events holds:
      • Abort Event A: There exist x, y ∈ Q such that |cap(x, y)| = 5 and cap(x, y) ⊆ R.
      • Collision Event C: There exist distinct i, j ∈ [m] with ri = r j , such that ri ∈ cap(Qi ) and
        r j ∈ cap(Q j ).
    If A does not occur, then for every pair (x, y), the sign sgnh(x, y) is determined by cap(x, y) for any
h ∈ supp(D+) ∪ supp(D−). If, in addition, C does not occur, then the tester cannot distinguish D+ from
D−. The heart of the analysis lies in proving that the bad events happen rarely.


Lemma 3.17. Let δ0 = 1/20000. If d is sufficiently large and |Q| ≤ δ0 d log d, then Pr[A] < 0.01.

Proof. Recall that k is the log of the number of action dimensions. Fix any choice of k (in S). For each
pair of points x, y ∈ Q such that |cap(x, y)| = 5,

      Pr[cap(x, y) ⊆ R] = (2^k/d) · ((2^k − 1)/(d − 1)) ··· ((2^k − 4)/(d − 4)) < (2^k/(d − 4))^5.

Since d − 4 ≥ d/2 for all d ≥ 8 and k ≤ (log d)/2, this probability is at most 32 d^{−5/2}. By the union
bound,

      Pr[A] < |Q × Q| · 32 d^{−5/2} < δ0^2 d^2 log^2 d · 32 d^{−5/2} = 32 δ0^2 d^{−1/2} log^2 d < 0.01

for all sufficiently large d.
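The product above is just the hypergeometric probability that 5 fixed coordinates all land in a uniformly random subset R of [d] of size 2^k. A quick numeric sanity check, with hypothetical values d = 1024 and k = 5 (chosen only to satisfy k ≤ (log d)/2), is:

```python
from math import comb

d, k = 1024, 5        # hypothetical values with k <= (log2 d)/2
s = 2**k              # |R| = 2^k

# product form from the proof: Pr[5 fixed coordinates all fall into R]
p = 1.0
for t in range(5):
    p *= (s - t) / (d - t)

# it agrees with the closed hypergeometric form ...
assert abs(p - comb(d - 5, s - 5) / comb(d, s)) < 1e-12 * p
# ... and sits below the bound (2^k/(d-4))^5, which is at most 32 d^{-5/2}
assert p < (s / (d - 4))**5 <= 32 * d**-2.5
```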

    To analyze the probability of the collision event C, we prove the following combinatorial lemma that
will be used to bound the number of coordinates Q captures.
Lemma 3.18. Let c, d ∈ N and let V be a set of d-dimensional vectors over an arbitrary alphabet. For all
x, y ∈ V,

   1. if x and y differ on more than c coordinates, then let capc(x, y) denote the set of the first c
      coordinates on which x and y differ;

   2. otherwise, let capc(x, y) denote the set of all coordinates on which x and y differ.

Define capc(V) := ∪_{x,y∈V} capc(x, y). Then |capc(V)| ≤ c(|V| − 1).
Proof. We construct c different edge-colored graphs G1 , . . . , Gc over the vertex set V with colors from
[d]. For every coordinate i ∈ capc (V ), add one edge of color i to exactly one of the graphs Gt . Since
i ∈ capc (V ), there exists at least one pair of vectors x, y such that i ∈ capc (x, y). Thinking of each
capc (x, y) as an ordered set, find one pair of vectors (x, y) where i appears “earliest” in capc (x, y). Let
the position of i in this capc (x, y) be denoted by t. Add the edge (x, y) to Gt and color it i. Note that each
edge (x, y) is added to Gt at most once, and hence each graph Gt is simple.
    We claim that each Gt is acyclic. Suppose not. Let C be a cycle in Gt. Let (x, y) be the edge in C
with the smallest color i. Clearly, xi ≠ yi, since i ∈ capc(x, y). There must exist another edge (u, v) in C
such that ui ≠ vi. Furthermore, the color of (u, v) is some j > i. Thus, j is the t-th entry in capc(u, v). Note
that i ∈ capc(u, v), since ui ≠ vi and j > i. It follows that i appears earlier in capc(u, v) than in capc(x, y).
So the edge colored i should not be in Gt, a contradiction.
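The bound of Lemma 3.18 is easy to stress-test numerically. The sketch below implements capc literally and checks |capc(V)| ≤ c(|V| − 1) on random ternary vectors; the parameters (dimension 12, alphabet {0, 1, 2}, |V| = 6, c = 3) are arbitrary illustrative choices.

```python
import itertools
import random

def cap_c(x, y, c):
    # the first c coordinates on which x and y differ (all of them if fewer)
    diff = [i for i in range(len(x)) if x[i] != y[i]]
    return diff[:c]

def cap_c_set(V, c):
    out = set()
    for x, y in itertools.combinations(V, 2):
        out.update(cap_c(x, y, c))
    return out

# empirical check of |cap_c(V)| <= c * (|V| - 1) on random vectors
random.seed(0)
for _ in range(200):
    V = [tuple(random.randrange(3) for _ in range(12)) for _ in range(6)]
    assert len(cap_c_set(V, 3)) <= 3 * (len(V) - 1)
```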

    Lemma 3.18 implies that a small Q can capture only a few coordinates. Next we bound the probability
of the collision event.
Lemma 3.19. If |Q| ≤ δ0 d log d, then Pr[C] < 0.04.

Proof. By Lemma 3.18,

               ∑_{i∈[m]} |cap(Qi)| ≤ 5 ∑_{i∈[m]} |Qi| = 5|Q| ≤ 5 δ0 d log d.                               (3.2)




    For each r ∈ [d], define Ar := {j ∈ [m] | r ∈ cap(Qj)} to be the set of indices of the subcubes in which
coordinate r is captured. Let ar := |Ar|. For each ℓ ∈ [log d], define nℓ := |{r ∈ [d] | ar ∈ (2^{ℓ−1}, 2^ℓ]}| to
be the number of coordinates that are captured by more than 2^{ℓ−1}, but at most 2^ℓ, subcubes. Observe that

               ∑_{ℓ∈[log d]} nℓ 2^ℓ ≤ 2 ∑_{r∈[d]} ar = 2 ∑_{i∈[m]} |cap(Qi)| ≤ 10 δ0 d log d,                              (3.3)

where the first inequality holds because if coordinate r is included in the count nℓ then 2^{ℓ−1} < ar, and the
last inequality follows from Equation (3.2).
     Fix the log of the number of action dimensions, k, to be some κ ∈ [(1/2) log d]. For each r ∈ [d], we say
the event Cr occurs if r ∈ R and there exist distinct i, j ∈ [m] such that ri = rj = r and r ∈ cap(Qi) ∩ cap(Qj).
By the union bound, Pr[C | k = κ] ≤ ∑_{r∈[d]} Pr[Cr | k = κ].
     Now, we compute Pr[Cr | k = κ]. Conditioned on k = κ, the size of the set of action dimensions,
|R|, is 2^κ. Only sets Qj with j ∈ Ar capture coordinate r. Event Cr occurs if at least two of the indices
i ∈ Ar have ri = r. Hence,

     Pr[Cr | k = κ] = Pr[r ∈ R] · Pr[∃ distinct i, j ∈ Ar such that ri = rj = r | r ∈ R]                (3.4)
                    = (2^κ/d) · ∑_{c=2}^{ar} Pr[there are exactly c subcube indices i ∈ Ar with ri = r | r ∈ R]
                    = (2^κ/d) · ∑_{c=2}^{ar} (ar choose c) (1/2^κ)^c (1 − 1/2^κ)^{ar−c},                  (3.5)

where we used Pr[r ∈ R] = 2^κ/d to get the second equality, and Pr[ri = r | r ∈ R] = 2^{−κ} for each i ∈ [m]
to get the third equality.
    If ar > 2^κ/4, by Equation (3.4),

                    Pr[Cr | k = κ] ≤ Pr[r ∈ R] = 2^κ/d.                                (3.6)

    If ar ≤ 2^κ/4, by Equation (3.5),

     Pr[Cr | k = κ] = (2^κ/d) · ∑_{c=2}^{ar} (ar choose c) (1/2^κ)^c (1 − 1/2^κ)^{ar−c}
                    < (2^κ/d) · (1 − 1/2^κ)^{ar} ∑_{c=2}^{ar} (ar · (1/2^κ) · (1 − 1/2^κ)^{−1})^c
                    ≤ (2^κ/d) · ∑_{c=2}^{ar} (ar/2^{κ−1})^c
                    < (2^κ/d) · 2 (ar/2^{κ−1})^2 = 8 ar^2/(2^κ d),                     (3.7)

where we used

          (ar choose c) < ar^c,        (1 − 2^{−κ})^{−1} ≤ 2,        and        ar/2^{κ−1} ≤ 1/2


to get the first, second and third inequalities, respectively. Using the union bound over all r ∈ [d] and
grouping according to nℓ, we get

     Pr[C | k = κ] ≤ ∑_{r=1}^{d} Pr[Cr | k = κ]
                   = ∑_{r∈[d]: ar>2^κ/4} Pr[Cr | k = κ] + ∑_{r∈[d]: ar≤2^κ/4} Pr[Cr | k = κ]
                   ≤ ∑_{r∈[d]: ar>2^{κ−2}} 2^κ/d + ∑_{r∈[d]: ar≤2^{κ−2}} 8 ar^2/(2^κ d)
                   ≤ (2^κ/d) ∑_{ℓ=max(κ−1,1)}^{log d} nℓ + (8/d) ∑_{ℓ=1}^{κ−2} nℓ 2^{2ℓ−κ},        (3.8)

where the second last inequality follows from Equations (3.6) and (3.7), and the last inequality holds
because if coordinate r is included in the count nℓ, then 2^{ℓ−1} < ar ≤ 2^ℓ.
   Averaging over all the values of k, we get

     Pr[C] = (2/log d) ∑_{κ=1}^{(log d)/2} Pr[C | k = κ]

           ≤ (16/(d log d)) ∑_{κ=1}^{(log d)/2} ( 2^κ ∑_{ℓ=max(κ−1,1)}^{log d} nℓ + ∑_{ℓ=1}^{κ−2} nℓ 2^{2ℓ−κ} )

           ≤ (16/(d log d)) ( ∑_{ℓ=1}^{log d} nℓ ∑_{κ=1}^{ℓ+1} 2^κ + ∑_{ℓ=1}^{(log d)/2−2} nℓ ∑_{κ=ℓ+2}^{(log d)/2} 2^{2ℓ−κ} ),        (3.9)

where the first inequality follows from Equation (3.8) and the last inequality is obtained by switching the
order of summations and rearranging the terms. Now,

     ∑_{κ=1}^{ℓ+1} 2^κ < 2^{ℓ+2}        and        ∑_{κ=ℓ+2}^{(log d)/2} 2^{2ℓ−κ} < 2^ℓ.

Using these inequalities to bound the sum in Equation (3.9) and then applying Equation (3.3), we get

     Pr[C] < (80/(d log d)) ∑_{ℓ=1}^{log d} nℓ 2^ℓ ≤ (80 · 10 δ0 d log d)/(d log d) = 800 δ0 = 0.04.

3.5   Indistinguishability of hard distributions
Proof of Theorem 3.8. Fix a deterministic, nonadaptive, comparison-based tester making a set of queries
Q whose size |Q| ≤ δ0 d log d. (Recall that δ0 = 1/20000.) The tester decides whether the input function
h is unate based on the (|Q| choose 2) comparisons between the function values at points in Q. We define
a labeled undirected graph G_h^Q to be the clique on the node set Q, where each edge {x, y} such that
x <val y is labeled by sgnh(x, y). The graph G_h^Q defines the tester's "view" of the function h after it has
made its queries Q.


    For any distribution D over functions f : {0, 1}^{d′} → N, let D-view denote the distribution of the
labeled graphs G_h^Q when h ∼ D. For two distributions D1 and D2, let D1 ≈α D2 denote that the statistical
distance between D1 and D2 is at most α. By the version of Yao's principle stated in [57, Claim 5], it is
sufficient to give two distributions, D1 and D2, on positive and negative instances, respectively, such that
the statistical distance between D1-view and D2-view is less than 1/6 for every deterministic tester that
makes at most δ0 d log d queries. Specifically, we show that for such testers, D1-view ≈3/19 D2-view.
    We first show that, conditioned on the bad events A and C not occurring, the view of the tester is
the same for both D+ and D− . For a distribution D and an event E, let D|E denote the distribution D
conditioned on the event E.
Lemma 3.20. For every deterministic, nonadaptive, comparison-based tester,

                                  D+-view|¬(A∪C) = D−-view|¬(A∪C).

Proof. Fix a value S of the shared randomness such that, for the queries Q, the bad events A and C do
not occur. Let ES denote the event that the collection of random variables T used to define the support of
D+ and D− extends S. We show that for any labeled clique G over the vertex set Q,

                             Pr_{f∼D+|ES}[G_f^Q = G] = Pr_{g∼D−|ES}[G_g^Q = G].

      Let F = supp(D+|ES) ∪ supp(D−|ES). Fix some G ∈ {G_h^Q | h ∈ F}. Consider an edge {x, y} ∈ G
with x <val y and x, y ∈ {0, 1}^{d′}. If x and y lie in different subcubes, then, by Claim 3.10, its label
sgnh(x, y) = 1 for all h ∈ F. Similarly, if x and y lie in the same subcube Ci for some i ∈ [m], but with
tSi(x, y) ≠ ri, then, by Claim 3.12, sgnh(x, y) is the same for all h ∈ F. Hence the label of the edge {x, y}
can potentially distinguish between D+ and D− only if x and y lie in the same subcube Ci for some
i ∈ [m] and tSi(x, y) = ri. We call such edges interesting. In the rest of this proof, we focus on interesting
edges, since the labels of non-interesting edges cannot be used to distinguish between D+ and D−.
      Fix i ∈ [m]. Recall that Qi = Q ∩Ci is the set of queried points that lie in the subcube Ci . Let G(Qi )
denote the subgraph of G induced on vertex set Qi . By Claim 3.12, all interesting edges in G(Qi ) have
the same label. Denote this label by `i .
      Let I = {i ∈ [m] | G(Qi ) contains an interesting edge} denote the set of indices of all subcubes with
at least one interesting edge.
      Now, we focus on g ∼ D−|ES. For any edge {x, y} ∈ G with x <val y, let ℓ(x, y) denote its label. The
probability

     Pr_{g∼D−|ES}[G_g^Q = G] = Pr[ ⋀_{i∈I} ⋀_{interesting {x,y}∈G(Qi)} ℓ(x, y) = sgn g(x, y) ]
                             = Pr[ ⋀_{i∈I} ⋀_{interesting {x,y}∈G(Qi)} ℓ(x, y) = βi ]      (by Claim 3.12)
                             = Pr[ ⋀_{i∈I} (ℓi = βi) ]
                             = ∏_{i∈I} Pr[ℓi = βi] = 2^{−|I|},                                          (3.10)



since each βi is chosen uniformly and independently at random from {+1, −1}.
    Similarly, for f ∼ D+|ES, the probability

                        Pr_{f∼D+|ES}[G_f^Q = G] = Pr[ ⋀_{i∈I} (ℓi = α_{ri}) ].                                        (3.11)

Note that unless there exist distinct i, j ∈ I such that ri = rj, the individual events in the probability
expression in Equation (3.11) are mutually independent. We show that if the bad events A and C do not
occur, then all subcubes Ci with interesting edges have distinct action dimensions ri.

Claim 3.21. If neither A nor C occurs, then ri ≠ rj for all distinct i, j ∈ I.

Proof. Assume for the sake of contradiction that there exist distinct i, j ∈ I with ri = rj. Since the
collision event C does not occur, if ri ∈ cap(Qi) and rj ∈ cap(Qj) for distinct i, j ∈ [m], then ri ≠ rj. Hence,
ri ∉ cap(Qi) or rj ∉ cap(Qj). Suppose without loss of generality that ri ∉ cap(Qi). Since i ∈ I, there must
exist an interesting edge {x, y} ∈ G(Qi) with x <val y. We know ri ∉ cap(Qi), so ri ∉ cap(x, y). By
Definition 3.15, if x and y differ on at most five coordinates, then ri ∈ cap(x, y). Hence, x and y differ on
at least six coordinates, and ri is not among the five most significant ones. Since the abort event
A does not occur, cap(x, y) ⊄ R. Hence, there exists a coordinate a ∈ cap(x, y) that is not in R and is
more significant than ri. Then tSi(x, y) ≥ a, contradicting the fact that tSi(x, y) = ri. Hence, ri ≠ rj for all
distinct i, j ∈ I.

    By Equation (3.11) and Claim 3.21,

                    Pr_{f∼D+|ES}[G_f^Q = G] = ∏_{i∈I} Pr[ℓi = α_{ri}] = 2^{−|I|},                                 (3.12)

since αb is chosen uniformly and independently at random from {−1, +1} for each b ∈ [d]. By Equations (3.10) and (3.12),

                             Pr_{f∼D+|ES}[G_f^Q = G] = Pr_{g∼D−|ES}[G_g^Q = G],

completing the proof of Lemma 3.20.

    We now wrap up the proof of Theorem 3.8. Let FAR denote the event that a function h ∼ D− is
1/8-far from unate. Define D̂− to be D−|FAR. Note that every function h ∼ D̂− is 1/8-far from unate.
By Claim 3.13, every function h ∼ D+ is unate. Hence, to complete the proof of this theorem, it suffices
to show that D+-view ≈3/19 D̂−-view.
    By Claim 3.14, Pr[FAR] ≥ 19/20. We use the following claim from [57] to show that D− and D̂−
are close.

Claim 3.22 (Claim 4 in [57]). Let E be an event that happens with probability at least α = 1 − 1/a under
the distribution D. Then D|E ≈α′ D, where α′ = 1/(a − 1).

    By Claim 3.22, D− ≈1/19 D̂−. Consequently,

                                    D−-view ≈1/19 D̂−-view.                                        (3.13)



      By Lemmas 3.17 and 3.19, the probability Pr[¬(A ∪ C)] ≥ 19/20. Using Claim 3.22, we get

                                   D+-view ≈1/19 D+-view|¬(A∪C),                                         (3.14)
                                   D−-view ≈1/19 D−-view|¬(A∪C).                                         (3.15)

      Using Equation (3.14), Lemma 3.20, Equation (3.15), and, finally, Equation (3.13), we get

         D+-view ≈1/19 D+-view|¬(A∪C) = D−-view|¬(A∪C) ≈1/19 D−-view ≈1/19 D̂−-view.

Hence D+-view ≈3/19 D̂−-view, completing the proof of Theorem 3.8.


4      Other lower bounds
4.1     The lower bound for adaptive testers over the hypergrid
We show that every unateness tester for functions f : [n]^d → R requires

                              Ω( (d log n)/ε − (log(1/ε))/ε )

queries for ε ∈ (0, 1/4) and prove Theorem 1.3.

Proof of Theorem 1.3. By Yao’s minimax principle and the reduction to testing with comparison-based
testers from [35, 23] (stated for completeness in Theorem 3.3), it is sufficient to give a hard input
distribution on which every deterministic comparison-based tester fails with probability at least 2/3. We
use the hard distribution constructed by Chakrabarty and Seshadhri [23], who used it to prove the same
lower bound for testing monotonicity. Their distribution is a mixture of two distributions, D+ and D− ,
on positive and negative instances, respectively. The positive instances for their problem are functions
that are monotone and, therefore, unate; the negative instances are functions that are ε-far from monotone.
We show that their distribution D− is supported on functions that are ε-far from unate, i. e., negative
instances for our problem. Then the required lower bound for unateness follows from the fact that every
deterministic comparison-based tester needs the stated number of queries to distinguish the distributions
D+ and D− with high enough probability.
     We start by describing the distributions D+ and D− used in [23]. We will define them as distributions
on functions over the hypercube domain. Next, we explain how to convert functions over hypercubes to
functions over hypergrids.
     Without loss of generality, assume n is a power of 2 and let ℓ := log2 n. For all z ∈ [n], let bin(z)
denote the binary representation of z − 1 as an ℓ-bit vector (z1, . . . , zℓ), where z1 is the least significant bit.
     We now describe the mapping used to convert functions on hypergrids to functions on hypercubes.
Let φ : [n]^d → {0, 1}^{dℓ} be the mapping that takes y ∈ [n]^d to the concatenation of bin(y1), . . . , bin(yd).
Any function f : {0, 1}^{dℓ} → R can be easily converted into a function f̃ : [n]^d → R, where f̃(y) := f(φ(y)).
     Let m := dℓ. For x ∈ {0, 1}^m, recall that val(x) = ∑_{i=1}^{m} xi 2^{i−1} denotes the integer equivalent of
the binary number represented by the vector x. For simplicity, assume 1/ε is a power of 2. Partition the set
of points x ∈ {0, 1}^m according to the most significant log(1/2ε) dimensions. That is, for k ∈ [1/2ε], let

                    Sk := {x : val(x) ∈ [(k − 1) · ε2^{m+1}, k · ε2^{m+1} − 1]}.


The hypercube is partitioned into 1/2ε sets Sk of equal size, and each Sk forms a subcube of dimension
m′ = m − log(1/ε) + 1.
   We now describe the distributions D+ and D− for functions on hypercubes. The distribution D+
consists of a single function f(x) = 2val(x). The distribution D− is the uniform distribution over m′/2ε
functions g_{j,k}, where j ∈ [m′] and k ∈ [1/2ε], defined as follows:

                    g_{j,k}(x) = 2val(x) − 2^j − 1     if xj = 1 and x ∈ Sk;
                    g_{j,k}(x) = 2val(x)               otherwise.

To get the distributions D+ and D− for the hypergrid, we convert f to f̃ and each function g_{j,k} to g̃_{j,k},
using the transformation defined before.
    Chakrabarty and Seshadhri [23] proved that f is monotone and each function g̃_{j,k} is ε-far from
monotone. It remains to show that the functions g̃_{j,k} are also ε-far from unate.

Claim 4.1. Each function gf
                          j,k is ε-far from unate.


Proof. To prove that g̃_{j,k} is ε-far from unate, it suffices to show that there exists a dimension i such that there are at least ε2^{dℓ} increasing i-pairs and at least ε2^{dℓ} decreasing i-pairs w.r.t. g̃_{j,k}, and that all of these i-pairs are disjoint. Let u, v ∈ [n]^d be two points such that φ(u) and φ(v) differ only in the jth bit. Clearly, u and v form an i-pair, where i = ⌈j/ℓ⌉. Now, if φ(u), φ(v) ∈ S_k and u ≺ v, then g̃_{j,k}(v) = g̃_{j,k}(u) − 1. So, the i-pair (u, v) is decreasing. The total number of such i-pairs is 2^{dℓ − log(1/(2ε)) − 1} = ε2^{dℓ}. If φ(u), φ(v) ∈ S_{k′}, where k′ ≠ k, then the i-pair (u, v) is increasing. Clearly, there are at least ε2^{dℓ} such i-pairs. All the i-pairs we mentioned are disjoint. Hence, g̃_{j,k} is ε-far from unate.
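For concreteness, the count of decreasing i-pairs unwinds as a routine calculation, using 2^{−log(1/(2ε))} = 2ε:

```latex
2^{d\ell-\log\frac{1}{2\varepsilon}-1}
  \;=\; 2^{d\ell}\cdot 2^{-\log\frac{1}{2\varepsilon}}\cdot 2^{-1}
  \;=\; 2^{d\ell}\cdot 2\varepsilon\cdot \tfrac{1}{2}
  \;=\; \varepsilon\, 2^{d\ell}.
```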


      This completes the proof of Theorem 1.3.


4.2     The lower bound for nonadaptive testers over the hypergrid

The lower bound for nonadaptive testers over hypergrids follows by combining the lower bound for nonadaptive testers over the hypercube with the lower bound for adaptive testers over hypergrids.

Proof of Theorem 1.4. Fix ε = 1/8. The proof consists of two parts. The lower bound for adaptive testers is also a lower bound for nonadaptive testers, and so the bound of Ω(d log n) holds. Next, we extend the Ω(d log d) lower bound for hypercubes. Assume that n is even. Define the function ψ : [n] → {0, 1} as ψ(a) := 1[a > n/2] for a ∈ [n]. For x = (x_1, x_2, . . . , x_d) ∈ [n]^d, define the mapping Ψ : [n]^d → {0, 1}^d as

                                      Ψ(x) := (ψ(x_1), ψ(x_2), . . . , ψ(x_d)).

Any function f : {0, 1}^d → R can be extended to f̃ : [n]^d → R using the mapping f̃(x) = f(Ψ(x)) for all x ∈ [n]^d. The proof of Theorem 3.8 goes through for hypergrids as well, and so we have an Ω(d log d) lower bound. Combining the two lower bounds, we get a bound of Ω(d · max{log n, log d}), which is asymptotically equal to Ω(d(log n + log d)).
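The embedding above is straightforward to state in code. The following sketch is illustrative only; the toy values n = 4 and the majority function used as f are our choices, not from the paper:

```python
n = 4  # toy even side length; the domain is [n]^d = {1, ..., n}^d

def psi(a):
    # psi(a) = 1[a > n/2] for a in [n]
    return 1 if a > n // 2 else 0

def Psi(x):
    # apply psi coordinate-wise: [n]^d -> {0,1}^d
    return tuple(psi(a) for a in x)

def extend(f):
    # extend f on {0,1}^d to f~ on [n]^d via f~(x) = f(Psi(x))
    return lambda x: f(Psi(x))

# example: majority on {0,1}^3, extended to [4]^3
maj = lambda y: int(sum(y) >= 2)
maj_ext = extend(maj)
print(maj_ext((3, 4, 1)), maj_ext((1, 2, 1)))  # -> 1 0
```

Since Ψ collapses each coordinate to two blocks of equal size, f̃ inherits the distance properties of f up to the hypercube-to-hypergrid correspondence used in the proof.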

                         Theory of Computing, Volume 16 (3), 2020, pp. 1–36

5    Conclusion and open questions
In this work, we give the first algorithms for testing unateness of real-valued functions over hypercube and hypergrid domains. We also show that our algorithms are optimal by proving matching lower bounds, thus resolving the query complexity of testing unateness of real-valued functions. Our results demonstrate that, for real-valued functions, adaptivity helps with testing unateness, which is not the case for monotonicity testing.
    Subsequent to the initial publication of this work [4], the problem of testing unateness of Boolean functions received significant attention. Concurrent with our work, Chen et al. [29] proved a lower bound of Ω(d^{2/3}/log^3 d) for adaptive unateness testers of Boolean functions over {0, 1}^d. Subsequently, Chen et al. [30] gave an adaptive unateness tester with query complexity Õ(d^{3/4}/ε^2) for the same class of functions. An exciting recent development is an Õ(d^{2/3}/ε^2)-query algorithm for this problem by Chen and Waingarten [28], which leaves only a polylogarithmic (in d) gap between the upper bound and the lower bound.
    Next, we discuss nonadaptive unateness testing of Boolean functions over {0, 1}^d. In a subsequent work, Baleshzar et al. [3] proved a lower bound of Ω(d/log d) for one-sided error testers for this problem. Since Boolean functions are a special case of real-valued functions, our nonadaptive algorithm over the hypercube also works for Boolean functions. This algorithm has one-sided error, and its query complexity of O((d/ε) log(d/ε)) is currently the best known upper bound for any (even two-sided error) nonadaptive algorithm. There is still a polylogarithmic (in d) gap between the upper bound and the lower bound. An interesting open question is to determine whether testers with two-sided error have better query complexity than testers with one-sided error in the nonadaptive setting.
    Finally, we mention some recent work on approximating the distance to unateness of Boolean functions over the hypercube domain. Levi and Waingarten [48] showed that every algorithm approximating the distance to unateness within a constant factor requires Ω̃(d) queries and strengthened their lower bound to Ω̃(d^{3/2}) queries for nonadaptive algorithms. Subsequently, Pallavoor et al. [50] proved that every nonadaptive algorithm approximating the distance to unateness within a d^{1/2−k} factor for k > 0 requires 2^{d^k} queries. No nontrivial upper bounds are currently known for this problem.


Acknowledgments. We thank Oded Goldreich for useful discussions and Meiram Murzabulatov for
participation in initial discussions on this work. We would like to thank Theory of Computing and László
Babai personally for the unparalleled copy editing service they provided. We have never seen anything
like this from other journals. The comments and suggestions we received helped us greatly improve our
paper. We are also pleased with its appearance.



References
 [1] Nir Ailon and Bernard Chazelle: Information theory in property testing and monotonicity
     testing in higher dimension. Inform. and Comput., 204(11):1704–1717, 2006. Preliminary version
     in STACS’05. [doi:10.1016/j.ic.2006.06.001] 2


 [2] Pranjal Awasthi, Madhav Jha, Marco Molinaro, and Sofya Raskhodnikova: Testing
     Lipschitz functions on hypergrid domains. Algorithmica, 74(3):1055–1081, 2016. Preliminary
     version in RANDOM’12. [doi:10.1007/s00453-015-9984-y] 2

 [3] Roksana Baleshzar, Deeparnab Chakrabarty, Ramesh Krishnan S. Pallavoor,
     Sofya Raskhodnikova, and C. Seshadhri: A lower bound for nonadaptive, one-sided
     error testing of unateness of Boolean functions over the hypercube. Electron. Colloq. on Comput.
     Complexity (ECCC), 2017. [ECCC:TR17-111, arXiv:1706.00053] 28

 [4] Roksana Baleshzar, Deeparnab Chakrabarty, Ramesh Krishnan S. Pallavoor,
     Sofya Raskhodnikova, and C. Seshadhri: Optimal unateness testers for real-valued func-
     tions: Adaptivity helps. In Proc. 44th Internat. Colloq. on Automata, Languages and Programming
     (ICALP’17), pp. 5:1–5:14. Springer, 2017. [doi:10.4230/LIPIcs.ICALP.2017.5, arXiv:1703.05199]
     1, 28

 [5] Tuğkan Batu, Ronitt Rubinfeld, and Patrick White: Fast approximate PCPs for multidi-
     mensional bin-packing problems. Inform. and Comput., 196(1):42–56, 2005. Preliminary version
     in RANDOM’99. [doi:10.1016/j.ic.2004.10.001] 2

 [6] Mihir Bellare, Don Coppersmith, Johan Håstad, Marcos A. Kiwi, and Madhu
     Sudan: Linearity testing in characteristic two. IEEE Trans. Inform. Theory, 42(6):1781–1795,
     1996. Preliminary version in FOCS’95. [doi:10.1109/18.556674] 2

 [7] Aleksandrs Belovs: Adaptive lower bound for testing monotonicity on the line. In
     Proc. 22nd Internat. Workshop on Randomization and Computation (RANDOM’18), pp. 31:1–
     31:10. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, 2018. [doi:10.4230/LIPIcs.APPROX-
     RANDOM.2018.31, arXiv:1801.08709] 2

 [8] Aleksandrs Belovs and Eric Blais: Quantum algorithm for monotonicity testing on
     the hypercube. Theory of Computing, 11(16):403–412, 2015. [doi:10.4086/toc.2015.v011a016,
     arXiv:1503.02868] 2

 [9] Aleksandrs Belovs and Eric Blais: A polynomial lower bound for testing monotonic-
     ity. In Proc. 48th STOC, pp. 1021–1032. ACM Press, 2016. [doi:10.1145/2897518.2897567,
     arXiv:1511.05053] 2

[10] Michael Ben-Or, Don Coppersmith, Michael Luby, and Ronitt Rubinfeld: Non-
     abelian homomorphism testing, and distributions close to their self-convolutions. Random Structures
     Algorithms, 32(1):49–70, 2008. Preliminary version in RANDOM’04. [doi:10.1002/rsa.20182] 2

[11] Piotr Berman, Sofya Raskhodnikova, and Grigory Yaroslavtsev: L_p-testing. In
     Proc. 46th STOC, pp. 164–173. ACM Press, 2014. [doi:10.1145/2591796.2591887] 2, 4, 7

[12] Arnab Bhattacharyya, Elena Grigorescu, Kyomin Jung, Sofya Raskhodnikova,
     and David P. Woodruff: Transitive-closure spanners. SIAM J. Comput., 41(6):1380–1425,
     2012. Preliminary version in SODA’09. [doi:10.1137/110826655, arXiv:0808.1787] 2, 4, 9


[13] Hadley Black, Deeparnab Chakrabarty, and C. Seshadhri: A o(d)·polylog(n)
     monotonicity tester for Boolean functions over the hypergrid [n]^d. In Proc. 29th ACM-
     SIAM Symp. on Discrete Algorithms (SODA’18), pp. 2133–2151. ACM Press, 2018.
     [doi:10.1137/1.9781611975031.139, arXiv:1710.10545] 2

[14] Hadley Black, Deeparnab Chakrabarty, and C. Seshadhri: Domain reduction
     for monotonicity testing: A o(d) tester for Boolean functions in d-dimensions. In Proc.
     31st ACM-SIAM Symp. on Discrete Algorithms (SODA’20), pp. 1975–1994. ACM Press, 2020.
     [doi:10.1137/1.9781611975994.122, arXiv:1811.01427] 2

[15] Eric Blais and Abhinav Bommireddi: Testing submodularity and other properties of valuation
     functions. In Proc. 8th Innovations in Theoret. Computer Science (ITCS’17), pp. 33:1–33:17.
     Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, 2017. [doi:10.4230/LIPIcs.ITCS.2017.33,
     arXiv:1611.07879] 2

[16] Eric Blais, Joshua Brody, and Kevin Matulef: Property testing lower bounds via commu-
     nication complexity. Comput. Complexity, 21(2):311–358, 2012. Preliminary version in CCC’11.
     [doi:10.1007/s00037-012-0040-x] 2, 3, 5

[17] Eric Blais, Sofya Raskhodnikova, and Grigory Yaroslavtsev: Lower bounds for
     testing properties of functions over hypergrid domains. In Proc. 29th IEEE Conf. on Computational
     Complexity (CCC’14), pp. 309–320. IEEE Comp. Soc. Press, 2014. [doi:10.1109/CCC.2014.38] 2

[18] Manuel Blum, Michael Luby, and Ronitt Rubinfeld: Self-testing/correcting with appli-
     cations to numerical problems. J. Comput. System Sci., 47(3):549–595, 1993. Preliminary version
     in STOC’90. [doi:10.1016/0022-0000(93)90044-W] 2

[19] Jop Briët, Sourav Chakraborty, David García-Soriano, and Arie Matsliah: Mono-
     tonicity testing and shortest-path routing on the cube. Combinatorica, 32(1):35–53, 2012. Prelimi-
     nary version in RANDOM’10. [doi:10.1007/s00493-012-2765-1] 2

[20] Deeparnab Chakrabarty: Monotonicity testing. In Encyclopedia of Algorithms, pp. 1352–
     1356. Springer, 2016. [doi:10.1007/978-1-4939-2864-4_699] 2

[21] Deeparnab Chakrabarty, Kashyap Dixit, Madhav Jha, and C. Seshadhri: Property
     testing on product distributions: Optimal testers for bounded derivative properties. ACM Trans.
     Algorithms, 13(2):20:1–20:30, 2017. Preliminary version in SODA’15. [doi:10.1145/3039241,
     arXiv:1404.0718] 2, 3, 4, 5, 6, 9

[22] Deeparnab Chakrabarty and C. Seshadhri: Optimal bounds for monotonicity and Lips-
     chitz testing over hypercubes and hypergrids. In Proc. 45th STOC, pp. 419–428. ACM Press, 2013.
     [doi:10.1145/2488608.2488661, arXiv:1204.0849] 2, 4, 9

[23] Deeparnab Chakrabarty and C. Seshadhri: An optimal lower bound for monotonicity
     testing over hypergrids. Theory of Computing, 10(17):453–464, 2014. Preliminary version in
     RANDOM’13. [doi:10.4086/toc.2014.v010a017] 2, 3, 5, 13, 14, 15, 26, 27


[24] Deeparnab Chakrabarty and C. Seshadhri: An o(n) monotonicity tester for Boolean
     functions over the hypercube. SIAM J. Comput., 45(2):461–472, 2016. Preliminary version in
     STOC’13. [doi:10.1137/13092770X, arXiv:1302.4536] 2

[25] Deeparnab Chakrabarty and C. Seshadhri: Adaptive Boolean monotonicity testing in total
     influence time. In Proc. 10th Innovations in Theoret. Computer Science (ITCS’19), pp. 20:1–20:7.
     Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, 2019. [doi:10.4230/LIPIcs.ITCS.2019.20,
     arXiv:1801.02816] 2

[26] Xi Chen, Anindya De, Rocco A. Servedio, and Li-Yang Tan: Boolean function mono-
     tonicity testing requires (almost) n^{1/2} non-adaptive queries. In Proc. 47th STOC, pp. 519–528. ACM
     Press, 2015. [doi:10.1145/2746539.2746570, arXiv:1412.5657] 2

[27] Xi Chen, Rocco A. Servedio, and Li-Yang Tan: New algorithms and lower bounds
     for monotonicity testing. In Proc. 55th FOCS, pp. 286–295. IEEE Comp. Soc. Press, 2014.
     [doi:10.1109/FOCS.2014.38, arXiv:1412.5655] 2

[28] Xi Chen and Erik Waingarten: Testing unateness nearly optimally. In Proc. 51st STOC, pp.
     547–558. ACM Press, 2019. [doi:10.1145/3313276.3316351, arXiv:1904.05309] 28

[29] Xi Chen, Erik Waingarten, and Jinyu Xie: Beyond Talagrand functions: new lower bounds
     for testing monotonicity and unateness. In Proc. 49th STOC, pp. 523–536. ACM Press, 2017.
     [doi:10.1145/3055399.3055461, arXiv:1702.06997] 2, 28

[30] Xi Chen, Erik Waingarten, and Jinyu Xie: Boolean unateness testing with Õ(n^{3/4})
     adaptive queries. In Proc. 58th FOCS, pp. 868–879. IEEE Comp. Soc. Press, 2017.
     [doi:10.1109/FOCS.2017.85, arXiv:1708.05786] 28

[31] Kashyap Dixit, Madhav Jha, Sofya Raskhodnikova, and Abhradeep Thakurta:
     Testing the Lipschitz property over product distributions with applications to data privacy. In Proc.
     10th Theory of Cryptography Conf. (TCC’13), pp. 418–436. Springer, 2013. [doi:10.1007/978-3-
     642-36594-2_24, arXiv:1209.4056] 2

[32] Kashyap Dixit, Sofya Raskhodnikova, Abhradeep Thakurta, and Nithin M. Varma:
     Erasure-resilient property testing. SIAM J. Comput., 47(2):295–329, 2018. Preliminary version in
     ICALP’16. [doi:10.1137/16M1075661, arXiv:1607.05786] 2

[33] Yevgeniy Dodis, Oded Goldreich, Eric Lehman, Sofya Raskhodnikova, Dana
     Ron, and Alex Samorodnitsky: Improved testing algorithms for monotonicity. In Proc. 3rd
     Internat. Workshop on Randomization and Computation (RANDOM’99), pp. 97–108. Springer,
     1999. [doi:10.1007/978-3-540-48413-4_10] 2

[34] Funda Ergün, Sampath Kannan, Ravi Kumar, Ronitt Rubinfeld, and Mahesh
     Viswanathan: Spot-checkers. J. Comput. System Sci., 60(3):717–751, 2000. Preliminary version
     in STOC’98. [doi:10.1006/jcss.1999.1692] 2, 4, 9


[35] Eldar Fischer: On the strength of comparisons in property testing. Inform. and Comput.,
     189(1):107–116, 2004. [doi:10.1016/j.ic.2003.09.003] 2, 3, 5, 13, 15, 26

[36] Eldar Fischer, Eric Lehman, Ilan Newman, Sofya Raskhodnikova, Ronitt Rubin-
     feld, and Alex Samorodnitsky: Monotonicity testing over general poset domains. In Proc.
     34th STOC, pp. 474–483. ACM Press, 2002. [doi:10.1145/509907.509977] 2

[37] Oded Goldreich: Introduction to Property Testing. Cambridge Univ. Press, 2017.
     [doi:10.1017/9781108135252] 4

[38] Oded Goldreich, Shafi Goldwasser, Eric Lehman, Dana Ron, and Alex Samorod-
     nitsky: Testing monotonicity. Combinatorica, 20(3):301–337, 2000. Preliminary version in
     FOCS’98. [doi:10.1007/s004930070011] 2

[39] Oded Goldreich, Shafi Goldwasser, and Dana Ron: Property testing and its connection
     to learning and approximation. J. ACM, 45(4):653–750, 1998. Preliminary version in FOCS’96.
     [doi:10.1145/285055.285060] 2

[40] Oded Goldreich and Dana Ron: On proximity-oblivious testing. SIAM J. Comput., 40(2):534–
     566, 2011. Preliminary version in STOC’09. [doi:10.1137/100789646] 4

[41] Shirley Halevy and Eyal Kushilevitz: Distribution-free property-testing. SIAM J. Comput.,
     37(4):1107–1138, 2007. Preliminary version in RANDOM’03. [doi:10.1137/050645804] 2

[42] Shirley Halevy and Eyal Kushilevitz: Testing monotonicity over graph prod-
     ucts. Random Structures Algorithms, 33(1):44–67, 2008. Preliminary version in ICALP’04.
     [doi:10.1002/rsa.20211] 2

[43] Madhav Jha and Sofya Raskhodnikova: Testing and reconstruction of Lipschitz functions
     with applications to data privacy. SIAM J. Comput., 42(2):700–731, 2013. Preliminary version in
     FOCS’11. [doi:10.1137/110840741] 2

[44] Tali Kaufman, Simon Litsyn, and Ning Xie: Breaking the ε-soundness bound of the linearity
     test over GF(2). SIAM J. Comput., 39(5):1988–2003, 2010. Preliminary version in RANDOM’08.
     [doi:10.1137/080715548] 2

[45] Subhash Khot, Dor Minzer, and Muli Safra: On monotonicity testing and Boolean
     isoperimetric-type theorems. SIAM J. Comput., 47(6):2238–2276, 2018. Preliminary version in
     FOCS’15. [doi:10.1137/16M1065872] 2

[46] Subhash Khot and Igor Shinkar: An Õ(n) queries adaptive tester for unateness. In
     Proc. 20th Internat. Workshop on Randomization and Computation (RANDOM’16), pp. 37:1–
     37:7. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, 2016. [doi:10.4230/LIPIcs.APPROX-
     RANDOM.2016.37, arXiv:1608.02451] 2

[47] Eric Lehman and Dana Ron: On disjoint chains of subsets. J. Combin. Theory Ser. A,
     94(2):399–404, 2001. [doi:10.1006/jcta.2000.3148] 2


[48] Amit Levi and Erik Waingarten: Lower bounds for tolerant junta and unateness test-
     ing via rejection sampling of graphs. In Proc. 10th Innovations in Theoret. Computer Sci-
     ence (ITCS’19), pp. 52:1–52:20. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, 2019.
     [doi:10.4230/LIPIcs.ITCS.2019.52, arXiv:1805.01074] 28

[49] Ramesh Krishnan S. Pallavoor, Sofya Raskhodnikova, and Nithin M. Varma:
     Parameterized property testing of functions. ACM Trans. Comput. Theory, 9(4):17:1–17:19, 2018.
     Preliminary version in ITCS’17. [doi:10.1145/3155296] 2

[50] Ramesh Krishnan S. Pallavoor, Sofya Raskhodnikova, and Erik Wain-
     garten: Approximating the distance to monotonicity of Boolean functions. In Proc. 31st
     ACM-SIAM Symp. on Discrete Algorithms (SODA’20), pp. 1995–2009. ACM Press, 2020.
     [doi:10.1137/1.9781611975994.123, arXiv:1911.06924] 2, 28

[51] Michal Parnas, Dana Ron, and Ronitt Rubinfeld: On testing convexity and sub-
     modularity. SIAM J. Comput., 32(5):1158–1184, 2003. Preliminary version in RANDOM’02.
     [doi:10.1137/S0097539702414026] 2

[52] Sofya Raskhodnikova: Monotonicity testing. Master’s thesis, Massachusetts Institute of
     Technology, Cambridge, MA, USA, 1999. LINK. 2

[53] Sofya Raskhodnikova: Property Testing: Theory and Applications. Ph. D. thesis, Massachusetts
     Institute of Technology, Cambridge, MA, USA, 2003. LINK. 2

[54] Sofya Raskhodnikova: Transitive-closure spanners: A survey. In Property Testing, volume
     6390 of LNCS, pp. 167–196. Springer, 2010. [doi:10.1007/978-3-642-16367-8_10] 2, 4

[55] Sofya Raskhodnikova: Testing if an array is sorted. In Encyclopedia of Algorithms, pp.
     2219–2222. Springer, 2016. [doi:10.1007/978-1-4939-2864-4_700] 2

[56] Sofya Raskhodnikova and Ronitt Rubinfeld: Linearity testing/testing Hadamard codes. In
     Encyclopedia of Algorithms, pp. 1107–1110. Springer, 2016. [doi:10.1007/978-1-4939-2864-4_202]
     2

[57] Sofya Raskhodnikova and Adam D. Smith: A note on adaptivity in testing properties of
     bounded degree graphs. Electron. Colloq. on Comput. Complexity (ECCC), 2006. [ECCC:TR06-
     089] 16, 24, 25

[58] Sofya Raskhodnikova and Grigory Yaroslavtsev: Learning pseudo-Boolean k-DNF
     and submodular functions. In Proc. 24th ACM-SIAM Symp. on Discrete Algorithms (SODA’13), pp.
     1356–1368. ACM Press, 2013. [doi:10.1137/1.9781611973105.98, arXiv:1208.2294] 2

[59] Ronitt Rubinfeld and Madhu Sudan: Robust characterizations of polynomi-
     als with applications to program testing. SIAM J. Comput., 25(2):252–271, 1996.
     [doi:10.1137/S0097539793255151] 2


[60] C. Seshadhri and Jan Vondrák: Is submodularity testable? Algorithmica, 69(1):1–25, 2014.
     Preliminary version in ICS’10. [doi:10.1007/s00453-012-9719-2, arXiv:1008.0831] 2

[61] Andrew Chi-Chih Yao: Probabilistic computations: Toward a unified measure of complex-
     ity (extended abstract). In Proc. 18th FOCS, pp. 222–227. IEEE Comp. Soc. Press, 1977.
     [doi:10.1109/SFCS.1977.24] 16


AUTHORS

     Roksana Baleshzar [about the author]
     Software engineer
     Google, Inc.
     Sunnyvale, CA, USA
     roksana baleshzar gmail com


     Deeparnab Chakrabarty [about the author]
     Assistant professor
     Department of Computer Science
     Dartmouth College
     Hanover, NH, USA
     deeparnab dartmouth edu
     https://www.cs.dartmouth.edu/~deepc/


     Ramesh Krishnan S. Pallavoor [about the author]
     Graduate student
     Department of Computer Science
     Boston University
     Boston, MA, USA
     rameshkp bu edu
     http://cs-people.bu.edu/rameshkp/


     Sofya Raskhodnikova [about the author]
     Professor
     Department of Computer Science
     Boston University
     Boston, MA, USA
     sofya bu edu
     http://cs-people.bu.edu/sofya/





    C. Seshadhri [about the author]
    Associate professor
    Department of Computer Science
    University of California, Santa Cruz
    Santa Cruz, CA, USA
    sesh ucsc edu
    https://users.soe.ucsc.edu/~sesh/


ABOUT THE AUTHORS

    ROKSANA BALESHZAR was a graduate student at Penn State while this work was done.
      She received her M. S. in Computer Science from Penn State under the supervision of
      Sofya Raskhodnikova, and her Bachelor’s degree in Information Technology Engineering
      from Sharif University of Technology. She currently works as a Software Engineer at
      Google, Inc.


    D EEPARNAB C HAKRABARTY is a faculty member of the Department of Computer Science
       at Dartmouth. Prior to this, he spent five lovely years at Microsoft Research in Bangalore.
       Deeparnab received his Ph. D. from the Georgia Institute of Technology under the
       supervision of Vijay Vazirani. He had postdoctoral stints at the University of Waterloo
       and the University of Pennsylvania. He gets excited by the application of optimization
       techniques in the design of algorithms, and is constantly surprised by the breadth of their
       applicability. He loves teaching undergraduate algorithms at Dartmouth, and hopes that
       the class sizes remain small.


    R AMESH K RISHNAN S. PALLAVOOR recently graduated with a Ph. D. from the Department
       of Computer Science at Boston University where he was advised by Sofya Raskhodnikova.
       Previously, he was a Ph. D. student at Penn State, advised by Sofya Raskhodnikova.
       Before that, he received his B. Tech. in Computer Engineering from the Indian Institute
       of Information Technology, Design and Manufacturing (IIITD&M), Kancheepuram.
       His research interests include sublinear algorithms (in particular, property testing), and
       differential privacy.
       He grew up in Chennai, a coastal city of 7 million in the southern part of India. His
       interest in math was partly spurred by his obsession with cricket statistics. During the
       initial part of his undergraduate studies, he was not too interested in the theory side of
       CS, and he thanks his teacher and mentor, Professor N. Sadagopan, for changing his
       outlook.





   S OFYA R ASKHODNIKOVA is a professor of Computer Science at Boston University. Pre-
      viously, she was a Professor of Computer Science and Engineering at Penn State. She
      received her Ph. D. from MIT. Prior to joining Penn State, she was a postdoctoral fellow
      at the Hebrew University of Jerusalem and the Weizmann Institute of Science. She
      has held visiting positions at the Institute for Pure and Applied Mathematics at UCLA,
      Boston University, Harvard University and at the Simons Institute for the Theory of
      Computing at Berkeley. Sofya works in the areas of randomized and approximation
      algorithms. Her main interest is the design and analysis of sublinear-time algorithms for
      combinatorial problems. She has also made important contributions to data privacy. As
      far as her hobbies go, recall that she works on privacy.


   C. S ESHADHRI (Sesh) is a faculty member of the Department of Computer Science and
      Engineering at the University of California, Santa Cruz. He received his Ph. D. from
      Princeton University under the supervision of Bernard Chazelle and did a postdoc at
      IBM Almaden. Prior to joining UCSC, he spent four years as a staff member at Sandia
      National Laboratories, Livermore. Sesh’s research interests are periodically sampled
      from a distribution that includes sublinear algorithms, social network analysis, and
      mathematical foundations for massive data analysis.
      Sesh got his B. Tech from the Computer Science and Engineering Department at the
      Indian Institute of Technology, Kanpur. He still remembers his first data structures course
      with Prof. Manindra Agrawal; in that course, he fell in love with TCS.
       Sesh used to have hobbies and was an all-around interesting person, but that was before
      his kids were born.



