                              T HEORY OF C OMPUTING, Volume 8 (2012), pp. 197–208
                                           www.theoryofcomputing.org




               The Communication Complexity of
                    Gap Hamming Distance
                                                Alexander A. Sherstov∗
                                    Received: August 8, 2011; published: May 17, 2012.




         Abstract: In the gap Hamming distance problem, two parties must determine whether
         their respective strings $x, y \in \{0,1\}^n$ are at Hamming distance less than $n/2 - \sqrt{n}$
         or greater than $n/2 + \sqrt{n}$. In a recent tour de force, Chakrabarti and Regev (2010)
         proved the long-conjectured $\Omega(n)$ lower bound on the randomized communication
         complexity of this problem. In follow-up work, Vidick (2010) discovered a simpler proof.
         We contribute a new proof, which is simpler yet and a page-and-a-half long.


∗ Supported by NSF CAREER award CCF-1149018.

ACM Classification: F.1.3
AMS Classification: 68Q17, 68Q25
Key words and phrases: Communication complexity, gap Hamming distance, Talagrand’s concentration
inequality

© 2012 Alexander A. Sherstov
Licensed under a Creative Commons Attribution License          DOI: 10.4086/toc.2012.v008a008


1      Introduction

The gap Hamming distance problem features two communicating parties, the first of which receives a
vector $x \in \{-1,+1\}^n$ and the second a vector $y \in \{-1,+1\}^n$. The two vectors are chosen such that the
Hamming distance between them is either noticeably smaller than $n/2$ or noticeably larger than $n/2$. The
objective is to reliably determine which is the case by exchanging as few bits of communication as possible.
Throughout this paper, communication is assumed to be randomized, and the communication protocol
is to produce the correct answer with probability at least 2/3. Formally, gap Hamming distance is the
communication problem that corresponds to the following partial function on $\{-1,+1\}^n \times \{-1,+1\}^n$:

    $$\mathrm{GHD}_n(x, y) = \begin{cases} -1 & \text{if } \langle x, y\rangle \le -\sqrt{n}\,, \\ +1 & \text{if } \langle x, y\rangle \ge \sqrt{n}\,, \end{cases} \tag{1.1}$$

where $\langle x, y\rangle = \sum x_i y_i$ is the usual inner product. Note that the function is left undefined when
$\langle x, y\rangle$ is small in absolute value. The choice of $\sqrt{n}$ for the gap is natural for reasons of measure
and represents the most difficult case from the standpoint of proof. When the gap is different in size or
simply shifted to a different location in $\{-n, \ldots, -1, 0, 1, \ldots, n\}$, one applies a padding argument [7]
to reduce the new problem to the above canonical case.
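
To fix ideas, here is a minimal Python sketch of the partial function (1.1); the function name ghd and
the sample parameters are illustrative rather than from the paper. Recall that for $x, y \in \{-1,+1\}^n$ the
inner product determines the Hamming distance via $\langle x, y\rangle = n - 2\,d_H(x, y)$, so an inner-product gap
of $\sqrt{n}$ corresponds to a Hamming-distance gap of $\sqrt{n}/2$ around $n/2$.

    import math
    import random

    def ghd(x, y):
        n = len(x)
        s = sum(a * b for a, b in zip(x, y))   # inner product <x, y>
        if s <= -math.sqrt(n):
            return -1
        if s >= math.sqrt(n):
            return +1
        return None  # undefined: the promise |<x, y>| >= sqrt(n) is violated

    n = 400
    x = [random.choice((-1, +1)) for _ in range(n)]
    y = [random.choice((-1, +1)) for _ in range(n)]
    print(ghd(x, y))  # -1, +1, or None, depending on the sampled pair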
    The gap Hamming distance problem, or GHD for short, was proposed by Indyk and Woodruff [8] as
a means to understand the complexity of several streaming tasks. More specifically, lower bounds on the
communication complexity of gap Hamming distance imply lower bounds on the memory requirements of
estimating the number of distinct elements in a data stream. GHD has since found applications to
several other streaming tasks, including the computation of frequency moments [18] and entropy [6].
    The communication complexity of gap Hamming distance has been the subject of much study over
the past few years. The original paper by Indyk and Woodruff [8] proved a linear lower bound on the
one-way communication complexity of a problem closely related to GHD. A linear lower bound on the
one-way communication complexity of GHD itself was obtained by Woodruff [18], with shorter and more
elementary proofs discovered subsequently by Jayram, Kumar, and Sivakumar [10], Woodruff [19], and
Brody and Chakrabarti [4]. A few years ago, Brody and Chakrabarti [4] obtained a linear lower bound
for constant-round protocols solving GHD. The dependence on the number of rounds was improved by
Brody et al. [5]. The communication complexity of GHD in the unbounded-round model, on the other
hand, proved to be a challenge to analyze. A lower bound of $\Omega(\sqrt{n})$ is immediate by a reduction from
the disjointness problem [11, 14]. Coincidentally, GHD has a quantum communication protocol with cost
$O(\sqrt{n}\log n)$, so that the many quantum methods discovered to date were of no use in moving beyond
the $\sqrt{n}$ barrier. Finally, in a recent tour de force, Chakrabarti and Regev [7] settled the problem
definitively with a linear lower bound on the unbounded-round communication complexity of gap Hamming
distance:

Theorem 1.1 (Chakrabarti and Regev). If a randomized communication protocol solves $\mathrm{GHD}_n$ with
probability 2/3 on every valid input, then it has communication complexity $\Omega(n)$.

   The proof in [7] is quite involved. In follow-up work, Vidick [17] discovered a simpler proof. This
paper contributes a new proof of Theorem 1.1, which is short and simpler than the proofs of Chakrabarti
and Regev [7] and Vidick [17]. In what follows, we give a detailed overview of previous work and our
approach.

1.1   Some Terminology
Let f : X × Y → {−1, +1} be a given communication problem. A common starting point in proving
lower bounds on randomized communication complexity is Yao’s minimax theorem [20]: to rule out a
randomized protocol for f with cost c and error probability at most ε, one defines a probability distribution
µ on $X \times Y$ and argues that with respect to µ, every deterministic protocol with cost c errs on more than
an ε fraction of the inputs. This approach is complete in that one can always prove a tight lower bound on
randomized communication in this manner.
    The challenge in Yao’s program is establishing the hardness of a given distribution µ for deterministic
communication protocols. By far the most common solution since the 1980s is the corruption method,
pioneered by Yao himself [20]. In more detail, a deterministic communication protocol with cost c gives
a partition $X \times Y = \bigcup_{i=1}^{2^c} R_i$, where each set $R_i$ is the Cartesian product of a subset of X
and a subset of Y. The sets $R_i$ are called rectangles, and the output of the deterministic protocol is
constant on each rectangle. To prove a lower bound on communication, one defines a probability measure µ
on $X \times Y$ and argues that every rectangle R with nontrivial measure is ε-corrupted by elements of
$f^{-1}(+1)$, in the sense that

    $$\mu(R \cap f^{-1}(+1)) > \varepsilon\, \mu(R \cap f^{-1}(-1)) \tag{1.2}$$

for some constant ε > 0. Provided that $f^{-1}(-1)$ has reasonable measure, (1.2) bounds from below
the total number of rectangles in any partition of $X \times Y$. By symmetry, the roles of +1 and −1 can be
interchanged throughout this argument. Furthermore, the argument applies unchanged to partial functions
f, whose domain is a proper subset of $X \times Y$.
      Over the years, many approaches have been used to prove (1.2). For the uniform distribution, a
particularly general method was discovered by Babai, Frankl, and Simon [2] almost thirty years ago. It
plays a key role in several subsequent papers and this work. In detail, let µ be the uniform measure on
$X \times Y$. For the sake of contradiction, suppose that R = A × B, with A ⊆ X and B ⊆ Y, is a large
rectangle that is not ε-corrupt. We may assume that no row of R is 2ε-corrupt, because any offending
rows can be discarded without affecting the size of R much. The proof is completed in two steps.


Step 1: Identifying a Hard Core.
         Using the hypothesis that A is large, one identifies elements $x_1, x_2, \ldots, x_k \in A$ that are “very
         dissimilar” and collectively “representative” of X. Naturally, what those words mean depends
         on the context, but one gets the right idea by thinking about k random elements of X. Typically
         k is tiny, exponentially smaller than |A|. We will call $\{x_1, x_2, \ldots, x_k\}$ a hard core of A
         because, at an intuitive level, these few elements capture the full complexity of A.

Step 2: Corruption.
         Using the hypothesis that B is large and $x_1, x_2, \ldots, x_k$ are representative, one shows that the
         rectangle $\{x_1, x_2, \ldots, x_k\} \times B$ is 2ε-corrupt, a contradiction to the fact that no row of
         R = A × B is 2ε-corrupt.


     This program is successful in practice because it is much easier to analyze the corruption of a rectangle
$\{x_1, x_2, \ldots, x_k\} \times B$ for a small and highly structured collection of elements $x_1, x_2, \ldots, x_k$.
Babai, Frankl, and Simon [2] used this approach to establish, with an exceedingly elegant and short proof,
an $\Omega(\sqrt{n})$ lower bound on the communication complexity of set disjointness. In that work, X and Y
both referred to the family of subsets of $\{1, 2, \ldots, n\}$ of cardinality $\sqrt{n}$, and the hard core used
in Step 1 was a collection of $k = \varepsilon\sqrt{n}$ subsets that are mostly disjoint.


1.2    Previous Work
We are now in a position to outline the proofs of Theorem 1.1 due to Chakrabarti and Regev [7] and
Vidick [17]. Both works study a continuous version of gap Hamming distance, in which the parties
receive inputs $x, y \in \mathbb{R}^n$ drawn according to the Gaussian measure and need to determine whether
their inner product is less than $-\sqrt{n}$ or greater than $\sqrt{n}$. It was shown earlier [5] that the discrete
and continuous versions of gap Hamming distance are equivalent from the standpoint of communication
complexity. The proof in [7] has two steps. Let R = A × B be a rectangle of nonnegligible Gaussian
measure. For reasons of measure, we may assume that the vectors in A, B have Euclidean norm $\sqrt{n}$,
up to a factor 1 ± ε.
Step 1: Identifying a Hard Core.
         Using the hypothesis that A has nontrivial measure, one identifies Ω(n) vectors $x_1, x_2, \ldots \in A$
         that are almost orthogonal. More precisely, the Euclidean norm of the projection of $x_i$ onto
         $\mathrm{span}\{x_1, \ldots, x_{i-1}\}$ is at most a small constant fraction of the norm of $x_i$.
In retrospect, a system of near-orthogonal vectors is a natural choice for a hard core because GHD is
defined in terms of inner products. That such a system of vectors can always be chosen from A was proven
by Raz [13], who used this fact to obtain a lower bound for another linear-algebraic communication
problem (deciding subspace membership). In the light of the program of Babai, Frankl, and Simon [2], it
is tempting to proceed to Step 2 and argue that the rectangle $\{x_1, x_2, \ldots\} \times B$ is heavily corrupted.
Unfortunately, gap Hamming distance does have rectangles that are large and almost uncorrupted, and
one cannot apply the corruption method directly. Instead, Chakrabarti and Regev [7] prove the following.
Step 2′: Anticoncentration.
         With probability Ω(1), a random pair $(x, y) \in \{x_1, x_2, \ldots\} \times B$ has $|\langle x, y\rangle| = \Omega(\sqrt{n})$.

     The two steps above immediately give the following statement: for any sets $A, B \subseteq \mathbb{R}^n$ of
nonnegligible measure, random vectors x ∈ A, y ∈ B obey $|\langle x, y\rangle| = \Omega(\sqrt{n})$ with constant
probability. This anticoncentration result is the technical centerpiece of Chakrabarti and Regev’s proof.
The authors actually derive a much stronger statement, giving a detailed characterization of the distribution
of $\langle x, y\rangle$. To complete the proof, they use a criterion for high communication complexity due to
Jain and Klauck [9], known as the smooth rectangle bound. Specifically, Chakrabarti and Regev use their
anticoncentration result to argue that in any partition of $\mathbb{R}^n \times \mathbb{R}^n$, only a small constant
measure of inputs can be covered by large uncorrupted rectangles. Settling this claim requires the
introduction of a second measure, call it λ, to account for covering by large rectangles. The smooth
rectangle bound [9] was discovered very recently and overcomes limitations of Yao’s corruption bound, at
the expense of being more challenging to use.
     In follow-up work, Vidick [17] discovered a simpler proof of the anticoncentration property for
$\langle x, y\rangle$, by taking a matrix-analytic view of the problem as opposed to the purely measure- and
information-theoretic treatment in [7]. Vidick first shows that for any $A \subseteq \mathbb{R}^n$ of nonnegligible
Gaussian measure, the matrix $M = \mathbf{E}_{x \in A}[xx^{\mathsf{T}}]$ has a relatively spread-out spectrum, with a
constant fraction of singular values on the order of Ω(1). Since $\mathbf{E}_{x \in A,\, y \in B}[\langle x, y\rangle^2] = \mathbf{E}_{y \in B}[y^{\mathsf{T}} M y]$,
the author of [17] is able to use this spectral property of M to prove anticoncentration for $\langle x, y\rangle$.
Vidick’s proof ingeniously exploits the rotation-invariance of Gaussian measure and requires just the
Bernstein inequality and the Berry-Esseen theorem
for independent Gaussian variables. With anticoncentration established, Vidick uses the Jain-Klauck
criterion to prove the lower bound on communication complexity.





      Figure 1: Reduction from gap orthogonality to gap Hamming distance (T = “true,” F = “false”).


1.3     Our Proof
This paper contributes a new proof of Theorem 1.1. Our approach departs from previous work on two
counts. First, we use Yao’s original corruption method for proving communication lower bounds, rather
than the recent and more involved criterion of Jain and Klauck. Second, the authors of [7, 17] work with
an extension of the problem to Gaussian space, whereas we are able to give a direct argument for the
hypercube. As we show, the discrete setting allows for a treatment that is much simpler both in formalism
and in substance; contrast the proof of Lemma 4.4 in [13] with that of Lemma 3.1 in this paper to get an
idea.
     Our main technical tool is Talagrand’s concentration inequality [15, 1, 16]. It states that for any given
subset $S \subseteq \{-1,+1\}^n$ of constant measure, nearly all the points of the hypercube lie at a short
Euclidean distance from the convex hull of S. Talagrand’s concentration inequality has yielded results
whose range and depth are out of proportion to the inequality’s easy proof [1, 16]. We use the following
well-known consequence of Talagrand’s inequality: the projection of a random vector $x \in \{-1,+1\}^n$
onto a given linear subspace $V \subseteq \mathbb{R}^n$ has Euclidean norm $\sqrt{\dim V} \pm O(1)$ almost surely.
     We now give a more detailed description of the proof. What we actually obtain is an Ω(n) lower
bound on the communication complexity of gap orthogonality, a problem in which the two parties
receive vectors $x, y \in \{-1,+1\}^n$ and need to reliably tell whether they are nearly orthogonal or far from
orthogonal. Formally, gap orthogonality is the partial Boolean function on $\{-1,+1\}^n \times \{-1,+1\}^n$ given
by

    $$\mathrm{ORT}_n(x, y) = \begin{cases} -1 & \text{if } |\langle x, y\rangle| \le \sqrt{n}\,, \\ +1 & \text{if } |\langle x, y\rangle| \ge 2\sqrt{n}\,, \end{cases} \tag{1.3}$$

Gap orthogonality readily reduces to gap Hamming distance, as suggested pictorially in Figure 1. Hence,
it suffices to prove an Ω(n) lower bound for gap orthogonality.
     It seems at first that nothing of substance is gained by switching from gap Hamming distance to gap
orthogonality. In actuality, the latter is preferable in that it allows the use of Yao’s corruption method.
Indeed, the corruption property for $\mathrm{ORT}_n$ is equivalent to the anticoncentration of $|\langle x, y\rangle|$. Thus, we
just need to establish the anticoncentration: for some absolute constant ε > 0 and any sets $A, B \subseteq \{-1,+1\}^n$
of uniform measure at least $2^{-\varepsilon n}$, the inner product $\langle x, y\rangle$ for random x ∈ A, y ∈ B cannot be
too concentrated around zero. We give a short proof of this result, which combines selected ideas of [7]
and [17] with some new elements.
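
As a quick Monte Carlo illustration (not part of the proof; the sample sizes are arbitrary), consider the
unrestricted case $A = B = \{-1,+1\}^n$. There $\langle x, y\rangle$ is a sum of n independent signs, so
$|\langle x, y\rangle| \ge \sqrt{n}/4$ already holds with constant probability; the point of Theorem 3.3 below is that
comparable anticoncentration survives restriction to any sets of measure $2^{-\varepsilon n}$.

    import math
    import random

    n, trials = 1024, 10000
    hits = 0
    for _ in range(trials):
        # <x, y> for uniformly random x, y is a sum of n independent signs
        s = sum(random.choice((-1, +1)) for _ in range(n))
        if abs(s) >= math.sqrt(n) / 4:
            hits += 1
    print(f"P[|<x,y>| >= sqrt(n)/4] ~ {hits / trials:.2f}")  # roughly 0.8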



Step 1: Identifying a Hard Core.
         One can select a family of Ω(n) near-orthogonal vectors $x_1, x_2, \ldots \in A$. Formally, the
         projection of $x_i$ onto $\mathrm{span}\{x_1, x_2, \ldots, x_{i-1}\}$ has Euclidean norm no greater than a
         third of the norm of $x_i$.

Step 2: Anticoncentration & Corruption.
         Fix the vectors so constructed. Then with probability exponentially close to 1, a random
         $y \in \{-1,+1\}^n$ will have nonnegligible inner product (absolute value at least $\sqrt{n}/4$) with one
         or more of the vectors $x_i$.

Step 1 is a trivial consequence of Talagrand’s concentration inequality; earlier works by Raz [13]
and Chakrabarti and Regev [7] used an analogue of this claim for the sphere $S^{n-1}$ with the uniform
measure, whose proof was more involved. To prove Step 2, we switch to the matrix-analytic view of
Vidick [17] but give a simpler and more direct argument. Specifically, we consider the matrix M with
rows $x_1, x_2, \ldots$, which by construction is close in norm to an orthogonal matrix. It follows that
a constant fraction of the singular values of M are large, on the order of $\Omega(\sqrt{n})$. Applying Talagrand a
second time, we get that a random vector $y \in \{-1,+1\}^n$ will have a constant fraction of its Euclidean
norm in the linear subspace corresponding to the large singular values of M, except with probability
exponentially small. This completes Step 2. The desired anticoncentration property falls out as a corollary,
for purely combinatorial reasons. This proves corruption for gap orthogonality.


2    Preliminaries
Notation
The symbol [k] stands for the set $\{1, 2, \ldots, k\}$. The inner product of $x, y \in \mathbb{R}^n$ is denoted
$\langle x, y\rangle = \sum x_i y_i$. Likewise, we let $\langle A, B\rangle = \sum A_{ij} B_{ij}$ for matrices $A = [A_{ij}]$ and $B = [B_{ij}]$.
The Boolean values “true” and “false” are represented in this paper by −1 and +1, respectively. In particular,
Boolean functions take on values ±1. A partial function f on X is a function whose domain of definition,
denoted dom f, is a proper subset of X. For a Boolean string x, the symbol $x^k$ stands for the concatenation
$xx\cdots x$ (k times).

Linear algebra
The Frobenius norm of a real matrix $M = [M_{ij}]$ is given by $\|M\|_F = (\sum M_{ij}^2)^{1/2}$. We denote the
singular values of M by $\sigma_1(M) \ge \sigma_2(M) \ge \cdots \ge 0$. It is straightforward to bound the inner product
of matrices A, B in terms of their singular values. Specifically, fixing a singular value decomposition
$A = \sum \sigma_i(A)\, u_i v_i^{\mathsf{T}}$ for some unit vectors $u_i, v_i$, one arrives at the following well-known bound:

    $$\langle A, B\rangle = \sum_i \sigma_i(A)\, \langle u_i v_i^{\mathsf{T}}, B\rangle = \sum_i \sigma_i(A)\, u_i^{\mathsf{T}} B v_i \le \sigma_1(B) \sum_i \sigma_i(A)\,. \tag{2.1}$$

Very precise methods are known for estimating the rth singular value of a matrix, including the Hoffman-
Wielandt inequality. For us, the following crude bound is all that is needed.


Fact 2.1. For all real matrices M, $\widetilde{M}$ and all natural numbers $r < \operatorname{rk} M$,

    $$\sigma_{r+1}(M) \ge \frac{1}{\operatorname{rk} M - r} \left( \frac{\langle M, \widetilde{M}\rangle}{\sigma_1(\widetilde{M})} - \|M\|_F \sqrt{r} \right).$$

Proof. Let $\sigma_i = \sigma_i(M)$. The r largest singular values of M sum to $\sigma_1 + \cdots + \sigma_r \le (\sigma_1^2 + \cdots + \sigma_r^2)^{1/2} \sqrt{r} \le \|M\|_F \sqrt{r}$,
and the remaining ones sum to at most $(\operatorname{rk} M - r)\, \sigma_{r+1}$. At the same time, $\sum \sigma_i \ge \langle M, \widetilde{M}\rangle / \sigma_1(\widetilde{M})$
by (2.1).
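
The following numerical sanity check of Fact 2.1 is an illustration only; the matrix sizes are arbitrary.
We take M with random sign rows, which are nearly orthogonal, and instantiate the bound with
$\widetilde{M} = M$, in the same spirit as the application in Lemma 3.2 below.

    import numpy as np

    rng = np.random.default_rng(0)
    m, n, r = 30, 2000, 10
    M = rng.choice((-1.0, 1.0), size=(m, n))
    sigma = np.linalg.svd(M, compute_uv=False)   # singular values, descending
    rank = np.linalg.matrix_rank(M)              # equals m almost surely
    bound = (np.sum(M * M) / sigma[0]            # <M, M~>/sigma_1(M~) with M~ = M
             - np.linalg.norm(M, 'fro') * np.sqrt(r)) / (rank - r)
    print(sigma[r], ">=", bound)                 # e.g. ~43 >= ~14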

    The Euclidean norm of a vector $x \in \mathbb{R}^n$ is denoted $\|x\| = (\sum x_i^2)^{1/2}$. The dimension of a linear
subspace V is denoted dim V. For a linear subspace $V \subseteq \mathbb{R}^n$ and a vector $x \in \mathbb{R}^n$, we let
$\mathrm{proj}_V\, x$ denote the orthogonal projection of x to V. The following fact is immediate from Talagrand’s
concentration inequality [1, Thm. 7.6.1].

Fact 2.2 (Talagrand). For every linear subspace $V \subseteq \mathbb{R}^n$ and every t > 0, one has

    $$\Pr_{x \in \{-1,+1\}^n}\left[\, \bigl|\, \|\mathrm{proj}_V\, x\| - \sqrt{\dim V}\, \bigr| > t \,\right] < 4 e^{-c t^2},$$

where c > 0 is an absolute constant.

    Fact 2.2 is entirely classical. The interested reader will find its short derivation in [16] and in Section 5
of this paper.
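
As an empirical illustration of Fact 2.2 (not part of the proof; the dimensions are arbitrary), one can
project random sign vectors onto a random 100-dimensional subspace of $\mathbb{R}^{1000}$ and observe that the
projection norm concentrates around $\sqrt{\dim V} = 10$ with O(1) fluctuations:

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 1000, 100
    Q, _ = np.linalg.qr(rng.standard_normal((n, d)))  # orthonormal basis of V
    x = rng.choice((-1.0, 1.0), size=(2000, n))       # 2000 random cube points
    norms = np.linalg.norm(x @ Q, axis=1)             # ||proj_V x|| per point
    print(np.sqrt(d), norms.mean(), norms.std())      # mean ~ 10, std = O(1)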

Communication complexity
Fix finite sets X, Y and let f be a (possibly partial) Boolean function on $X \times Y$. A randomized
communication protocol is said to compute f with error ε if for all $(x, y) \in \mathrm{dom}\, f$, the output of the
protocol on (x, y) is f(x, y) with probability at least 1 − ε. The least communication cost of such a protocol
is known as the ε-error communication complexity of f, denoted $R_\varepsilon(f)$. For all constants ε ∈ (0, 1/2),
one has $R_\varepsilon(f) = \Theta(R_{1/3}(f))$.
     A rectangle of $X \times Y$ is any set of the form A × B, where A ⊆ X, B ⊆ Y. One of the earliest and best
known criteria for high randomized communication complexity is Yao’s corruption bound [20, 2, 12].

Theorem 2.3 (Corruption bound). Let f be a (possibly partial) Boolean function on $X \times Y$. Given
ε, δ ∈ (0, 1), suppose that there is a distribution µ on $X \times Y$ such that

    $$\mu(R \cap f^{-1}(+1)) > \varepsilon\, \mu(R \cap f^{-1}(-1))$$

for every rectangle $R \subseteq X \times Y$ with µ(R) > δ. Then

    $$2^{R_\xi(f)} \ge \frac{1}{\delta} \left( \mu(f^{-1}(-1)) - \frac{\xi}{\varepsilon} \right).$$

    We gave an informal proof of Theorem 2.3 in the Introduction. See [3, Lem. 3.5] for a rigorous
treatment.


3    Corruption of Gap Orthogonality
We start by showing that any subset of $\{-1,+1\}^n$ of nontrivial size contains n/10 near-orthogonal
vectors.
Lemma 3.1. Let α > 0 be a sufficiently small constant. Fix $A \subseteq \{-1,+1\}^n$ with $|A| > 2^{(1-\alpha)n}$. Then
for $k = \lfloor n/10 \rfloor$ there exist $x_1, x_2, \ldots, x_k \in A$ such that for each i,

    $$\left\| \mathrm{proj}_{\mathrm{span}\{x_1, x_2, \ldots, x_i\}}\, x_{i+1} \right\| \le \frac{\sqrt{n}}{3}\,. \tag{3.1}$$

Proof. The proof is by induction. Having selected $x_1, x_2, \ldots, x_i \in A$, pick $x_{i+1} \in \{-1,+1\}^n$ uniformly
at random. Then $\Pr[x_{i+1} \in A] > 2^{-\alpha n}$. On the other hand, Fact 2.2 implies that the projection of $x_{i+1}$
to $\mathrm{span}\{x_1, x_2, \ldots, x_i\}$ has Euclidean norm at most $\sqrt{n}/3$, except with probability $2^{-\alpha n}$. Thus,
there exists $x_{i+1} \in A$ that obeys (3.1).
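
A sketch of the greedy selection behind Lemma 3.1, for illustration only: repeatedly draw candidates and
keep those whose projection onto the span of the vectors chosen so far obeys (3.1). Here candidates are
drawn from the whole cube for simplicity; in the lemma they are drawn from A, and the point is that the
$2^{-\alpha n}$ chance of landing in A exceeds the chance of violating (3.1).

    import numpy as np

    rng = np.random.default_rng(0)
    n, k = 500, 50
    chosen = []
    while len(chosen) < k:
        x = rng.choice((-1.0, 1.0), size=n)            # candidate vector
        if chosen:
            Q, _ = np.linalg.qr(np.array(chosen).T)    # orthonormal basis of span
            if np.linalg.norm(Q.T @ x) > np.sqrt(n) / 3:
                continue                               # violates (3.1); resample
        chosen.append(x)
    print(len(chosen), "near-orthogonal vectors selected")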

    Next, we show that given any family of near-orthogonal vectors in $\{-1,+1\}^n$, a random vector in
$\{-1,+1\}^n$ almost surely has a substantial inner product with some vector from the family. The proof uses
the lower bound in Talagrand’s theorem, as opposed to Lemma 3.1, in which we used the upper bound.
Lemma 3.2. Fix vectors $x_1, x_2, \ldots, x_m \in \{-1,+1\}^n$ that obey (3.1) for all i. Then

    $$\Pr_{y \in \{-1,+1\}^n}\left[ \max_{i=1,\ldots,m} |\langle y, x_i\rangle| \le \frac{\sqrt{n}}{4} \right] \le e^{-\beta m} \tag{3.2}$$

for some absolute constant β > 0.
Proof. Let $\tilde{x}_1, \ldots, \tilde{x}_m$ be the orthogonal vectors obtained from $x_1, \ldots, x_m$ by the Gram-Schmidt
process, i.e., $\tilde{x}_i = x_i - \mathrm{proj}_{\mathrm{span}\{x_1, \ldots, x_{i-1}\}}\, x_i$. Let M and $\widetilde{M}$ be the m × n
matrices with rows $x_1, \ldots, x_m$ and $\tilde{x}_1, \ldots, \tilde{x}_m$, respectively. Then

    $$\langle M, \widetilde{M}\rangle = \sum_{i=1}^m \langle x_i, \tilde{x}_i\rangle = \sum_{i=1}^m \|x_i\|^2 - \sum_{i=1}^m \left\| \mathrm{proj}_{\mathrm{span}\{x_1, \ldots, x_{i-1}\}}\, x_i \right\|^2 \ge \frac{8}{9}\, nm$$

by (3.1). Also $\sigma_1(\widetilde{M}) \le \sqrt{n}$ since $\widetilde{M}$ has orthogonal rows. Thus, Fact 2.1 gives
$\sigma_{\lceil m/4\rceil}(M) \ge 0.51\sqrt{n}$.
    Let $M = \sum_{i=1}^m \sigma_i(M)\, u_i v_i^{\mathsf{T}}$ be the singular value decomposition of M. Define
$V = \{v_i : \sigma_i(M) \ge 0.51\sqrt{n}\}$, so that $|V| \ge m/4$ by the previous paragraph. For all vectors y,

    $$\|My\|^2 = \sum_{i=1}^m \sigma_i(M)^2\, \langle y, v_i\rangle^2 \ge 0.26\, n \sum_{v \in V} \langle y, v\rangle^2 = 0.26\, n \left\| \mathrm{proj}_{\mathrm{span}\, V}\, y \right\|^2.$$

Since span V has dimension at least m/4, Fact 2.2 guarantees that $\|My\|^2 > mn/16$ with probability
$1 - \exp(-\Omega(m))$ for random $y \in \{-1,+1\}^n$. This implies (3.2).
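
A Monte Carlo check of Lemma 3.2, as an illustration with arbitrary parameters: for m much smaller
than n, random sign rows typically satisfy (3.1), and it is then rare for a random y to have
$|\langle y, x_i\rangle| \le \sqrt{n}/4$ simultaneously for every i.

    import numpy as np

    rng = np.random.default_rng(0)
    n, m, trials = 900, 40, 2000
    X = rng.choice((-1.0, 1.0), size=(m, n))   # rows are typically near-orthogonal
    misses = 0
    for _ in range(trials):
        y = rng.choice((-1.0, 1.0), size=n)
        if np.max(np.abs(X @ y)) <= np.sqrt(n) / 4:
            misses += 1
    print(misses / trials)  # typically 0.0 here; the lemma bounds it by e^{-beta m}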

    The main result of this section is immediate from the previous two lemmas for basic combinatorial
reasons; cf. the well-known proof of an $\Omega(\sqrt{n})$ lower bound on the communication complexity of
disjointness due to Babai, Frankl, and Simon [2].


Theorem 3.3. Let ε > 0 be a small enough constant. Let $A, B \subseteq \{-1,+1\}^n$ be given such that
$\Pr_{x \in A,\, y \in B}[x \perp y] \ge 1 - \varepsilon$, where $x \perp y$ is shorthand for $|\langle x, y\rangle| \le \sqrt{n}/4$ (“almost
orthogonal”). Then

    $$4^{-n}\, |A|\, |B| = \exp(-\Omega(n))\,.$$

Proof. Assume that $|A| > 2 \cdot 2^{(1-\alpha)n}$, where α > 0 is the constant from Lemma 3.1. We will show that a
random $y \in \{-1,+1\}^n$ occurs in B with probability at most exp(−Ω(n)).
      The argument is a combinatorial accounting for what kinds of elements arise in B and is closely
analogous to Theorem 8.3 in [2]. Define $A' = \{x \in A : \Pr_{y \in B}[x \perp y] \ge 1 - 2\varepsilon\}$, so that
$|A'| \ge \frac{1}{2}|A|$. Fix $x_1, x_2, \ldots, x_k \in A'$ to obey (3.1) for all i, where $k = \lfloor n/10\rfloor$. Define
$B' = \{y \in B : \Pr_{i \in [k]}[x_i \perp y] \ge 1 - 3\varepsilon\}$, so that $|B'| \ge \frac{1}{3}|B|$. Then $2^{-n}|B'|$ is a lower
bound on the probability that a random $y \in \{-1,+1\}^n$ has $|\langle y, x_i\rangle| \le \sqrt{n}/4$ for at least
$(1 - 3\varepsilon)k$ indices i. By Lemma 3.2 and the union bound, this probability cannot exceed
$\binom{k}{3\varepsilon k}\, e^{-\Omega(k)} \le e^{-\Omega(n)}$.
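
The arithmetic in the last step can be checked directly. The sketch below evaluates
$\log \binom{k}{3\varepsilon k} - \beta k$, which turns negative once ε is small, since
$\log\binom{k}{3\varepsilon k} \approx H(3\varepsilon)\, k$ and the entropy $H(3\varepsilon) \to 0$ as $\varepsilon \to 0$; the value
β = 0.1 is an assumed placeholder for the constant in Lemma 3.2.

    import math

    def log_union_bound_term(k, eps, beta):
        t = math.ceil(3 * eps * k)
        # log of C(k, 3*eps*k) * e^{-beta*k}; negative means exponentially small
        return math.log(math.comb(k, t)) - beta * k

    k, beta = 10**4, 0.1   # beta = 0.1 is an assumed placeholder
    for eps in (0.05, 0.01, 0.002):
        print(eps, log_union_bound_term(k, eps, beta))  # negative once eps is small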


4     Main Result
In this final section, we prove the sought Ω(n) lower bound on the communication complexity of gap
Hamming distance and gap orthogonality, defined in (1.1) and (1.3).

Main Theorem. $R_{1/3}(\mathrm{ORT}_n) = \Omega(n)$.

Proof. Consider the partial Boolean function $f_n$ on $\{-1,+1\}^n \times \{-1,+1\}^n$ defined as −1 when
$|\langle x, y\rangle| \le \sqrt{n}/8$ and as +1 when $|\langle x, y\rangle| \ge \sqrt{n}/4$. With µ the uniform distribution on
$\{-1,+1\}^n \times \{-1,+1\}^n$, Theorem 3.3 guarantees that $\mu(R \cap f_n^{-1}(+1)) > \varepsilon\, \mu(R)$ for all
rectangles R with $\mu(R) > 2^{-\varepsilon n}$, where ε > 0 is a small constant. Since $\mu(f_n^{-1}(-1)) = \Theta(1)$,
the corruption bound (Theorem 2.3) shows that $R_{1/3}(f_n) = \Omega(n)$. Finally, $f_n(x, y) = \mathrm{ORT}_{64n}(x^{64}, y^{64})$.
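
The final padding step can be verified mechanically: repeating both inputs 64 times scales the inner
product by 64, mapping the thresholds $\sqrt{n}/8$ and $\sqrt{n}/4$ of $f_n$ to $8\sqrt{n} = \sqrt{64n}$ and
$16\sqrt{n} = 2\sqrt{64n}$, exactly the thresholds of $\mathrm{ORT}_{64n}$. A hedged sketch, with illustrative
function names:

    import math
    import random

    def ort(x, y):  # the partial function (1.3)
        n = len(x)
        s = abs(sum(a * b for a, b in zip(x, y)))
        if s <= math.sqrt(n): return -1
        if s >= 2 * math.sqrt(n): return +1
        return None

    def f(x, y):    # the intermediate partial function f_n from the proof
        n = len(x)
        s = abs(sum(a * b for a, b in zip(x, y)))
        if s <= math.sqrt(n) / 8: return -1
        if s >= math.sqrt(n) / 4: return +1
        return None

    n = 256
    x = [random.choice((-1, +1)) for _ in range(n)]
    y = [random.choice((-1, +1)) for _ in range(n)]
    if f(x, y) is not None:                 # x * 64 concatenates x with itself
        assert f(x, y) == ort(x * 64, y * 64)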

Corollary. $R_{1/3}(\mathrm{GHD}_n) = \Omega(n)$.

Proof. Immediate from the reduction in Figure 1. Formally, for n a square,

    $$\mathrm{ORT}_n(x, y) = \mathrm{GHD}_{10n+15\sqrt{n}}\!\left(x^{10} (-1)^{15\sqrt{n}},\; y^{10} (+1)^{15\sqrt{n}}\right) \wedge \neg\, \mathrm{GHD}_{10n+15\sqrt{n}}\!\left(x^{10} (+1)^{15\sqrt{n}},\; y^{10} (+1)^{15\sqrt{n}}\right).$$
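
The reduction can likewise be checked on random inputs. In the sketch below, ghd and ort are
illustrative implementations of (1.1) and (1.3), −1 plays the role of “true,” and n is a perfect square as
required:

    import math
    import random

    def ghd(x, y):
        n = len(x)
        s = sum(a * b for a, b in zip(x, y))
        if s <= -math.sqrt(n): return -1
        if s >= math.sqrt(n): return +1
        return None

    def ort(x, y):
        n = len(x)
        s = abs(sum(a * b for a, b in zip(x, y)))
        if s <= math.sqrt(n): return -1
        if s >= 2 * math.sqrt(n): return +1
        return None

    n = 900
    r = int(math.sqrt(n))
    x = [random.choice((-1, +1)) for _ in range(n)]
    y = [random.choice((-1, +1)) for _ in range(n)]
    g1 = ghd(x * 10 + [-1] * (15 * r), y * 10 + [+1] * (15 * r))
    g2 = ghd(x * 10 + [+1] * (15 * r), y * 10 + [+1] * (15 * r))
    expected = ort(x, y)
    if expected is not None:
        # ORT is "true" (-1) iff the first GHD instance is true and the second false
        assert expected == (-1 if (g1 == -1 and g2 == +1) else +1)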


5     More on Talagrand’s Concentration Inequality
In this concluding section, we provide additional background on Talagrand’s concentration inequality
for the interested reader and include the well-known derivation of Fact 2.2. For a nonempty subset
$S \subseteq \mathbb{R}^n$ and a point $x \in \mathbb{R}^n$, let $\rho(x, S) = \inf_{y \in S} \|x - y\|$ be the Euclidean distance
from x to S. Talagrand’s inequality [15, 1] states that for any reasonably large subset S of the hypercube,
almost all the points of the hypercube lie at a short distance from the convex hull of S. In more detail:


Theorem 5.1 (Talagrand). For a fixed convex set $S \subseteq \mathbb{R}^n$ and a random $x \in \{-1,+1\}^n$,

    $$\Pr[x \in S] \cdot \Pr[\rho(x, S) > t] \le e^{-t^2/16}\,.$$

    We now explain how Fact 2.2 is a consequence of Talagrand’s concentration inequality. For any linear
subspace $V \subseteq \mathbb{R}^n$ and vector $x \in \mathbb{R}^n$, elementary linear algebra gives $\|\mathrm{proj}_V\, x\| = \rho(x, V^{\perp})$,
where $V^{\perp}$ is the orthogonal complement of V with $\dim V^{\perp} = n - \dim V$. This allows one to reformulate
Fact 2.2 in the language of Euclidean distances: for a fixed linear subspace $V \subseteq \mathbb{R}^n$ and a random
$x \in \{-1,+1\}^n$,

    $$\Pr\left[\, \bigl|\, \rho(x, V) - \sqrt{n - \dim V}\, \bigr| > t \,\right] \le 4 e^{-\Omega(t^2)}\,. \tag{5.1}$$

The advantage of this reformulation is that it can be easily proved using Talagrand’s concentration
inequality. We present here the proof from a recent expository article by Tao [16]. Fix a ≥ 0 and
consider the convex set $S = \{v \in \mathbb{R}^n : \rho(v, V) \le a\}$. Then $\Pr[\rho(x, V) \le a] \cdot \Pr[\rho(x, S) > t] \le \exp(-t^2/16)$
by Talagrand. But by the triangle inequality, $\rho(x, V) > a + t$ implies $\rho(x, S) > t$, whence

    $$\Pr[\rho(x, V) \le a] \cdot \Pr[\rho(x, V) > a + t] \le e^{-t^2/16} \tag{5.2}$$

for any a. Let m be the median value of $\rho(x, V)$. The two tail bounds

    $$\Pr[\rho(x, V) > m + t] \le 2 e^{-t^2/16}\,, \tag{5.3}$$
    $$\Pr[\rho(x, V) \le m - t] \le 2 e^{-t^2/16} \tag{5.4}$$

result from letting a = m and a = m − t, respectively, in (5.2). Consequently, $\rho(x, V)$ is sharply
concentrated around its median. The sharp concentration means, among other things, that the median is
within an additive constant of $\mathbf{E}[\rho(x, V)^2]^{1/2} = \sqrt{n - \dim V}$. Along with (5.3) and (5.4), this
settles (5.1).


Acknowledgments
The author is thankful to Amit Chakrabarti, Dmitry Gavinsky, Oded Regev, Thomas Vidick, Ronald de
Wolf, David Woodruff, and the anonymous reviewers for their valuable feedback on an earlier version of
this manuscript.


References

 [1] Noga Alon and Joel H. Spencer: The Probabilistic Method. John Wiley & Sons, 3rd edition,
     2008. [doi:10.1002/9780470277331]

 [2] László Babai, Péter Frankl, and János Simon: Complexity classes in communication
     complexity theory. In Proc. 27th FOCS, pp. 337–347. IEEE Comp. Soc. Press, 1986.
     [doi:10.1109/SFCS.1986.15]

 [3] Paul Beame, Toniann Pitassi, Nathan Segerlind, and Avi Wigderson: A strong direct product
     theorem for corruption and the multiparty communication complexity of disjointness.
     Comput. Complexity, 15(4):391–432, 2006. [doi:10.1007/s00037-007-0220-2]

 [4] Joshua Brody and Amit Chakrabarti: A multi-round communication lower bound for gap
     Hamming and some consequences. In Proc. 24th IEEE Conf. on Computational Complexity
     (CCC’09), pp. 358–368. IEEE Comp. Soc. Press, 2009. [doi:10.1109/CCC.2009.31]

 [5] Joshua Brody, Amit Chakrabarti, Oded Regev, Thomas Vidick, and Ronald de Wolf: Better
     gap-Hamming lower bounds via better round elimination. In Proc. 15th Internat. Workshop
     on Randomization and Computation (RANDOM’10), pp. 476–489. Springer, 2010.
     [doi:10.1007/978-3-642-15369-3_36]

 [6] Amit Chakrabarti, Graham Cormode, and Andrew McGregor: A near-optimal algorithm for
     estimating the entropy of a stream. ACM Trans. Algorithms, 6(3), 2010.
     [doi:10.1145/1798596.1798604]

 [7] Amit Chakrabarti and Oded Regev: An optimal lower bound on the communication complexity
     of gap-Hamming-distance. In Proc. 43rd STOC, pp. 51–60. ACM Press, 2011.
     [doi:10.1145/1993636.1993644]

 [8] Piotr Indyk and David P. Woodruff: Tight lower bounds for the distinct elements problem. In
     Proc. 44th FOCS, pp. 283–289. IEEE Comp. Soc. Press, 2003. [doi:10.1109/SFCS.2003.1238202]

 [9] Rahul Jain and Hartmut Klauck: The partition bound for classical communication complexity
     and query complexity. In Proc. 25th IEEE Conf. on Computational Complexity (CCC’10), pp.
     247–258. IEEE Comp. Soc. Press, 2010. [doi:10.1109/CCC.2010.31]

[10] T. S. Jayram, Ravi Kumar, and D. Sivakumar: The one-way communication complexity of
     Hamming distance. Theory of Computing, 4(1):129–135, 2008. [doi:10.4086/toc.2008.v004a006]

[11] Bala Kalyanasundaram and Georg Schnitger: The probabilistic communication complexity of
     set intersection. SIAM J. Discrete Math., 5(4):545–557, 1992. [doi:10.1137/0405044]

[12] Eyal Kushilevitz and Noam Nisan: Communication Complexity. Cambridge University Press,
     1997. [doi:10.1017/CBO9780511574948]

[13] Ran Raz: Exponential separation of quantum and classical communication complexity. In Proc.
     31st STOC, pp. 358–367. ACM Press, 1999. [doi:10.1145/301250.301343]

[14] Alexander A. Razborov: On the distributional complexity of disjointness. Theoret. Comput.
     Sci., 106(2):385–390, 1992. [doi:10.1016/0304-3975(92)90260-M]

[15] Michel Talagrand: Concentration of measure and isoperimetric inequalities in product spaces.
     Publications Mathématiques de L’IHÉS, 81(1):73–205, 1996. [doi:10.1007/BF02699376]

[16] Terence Tao: Talagrand’s concentration inequality. Weblog entry, 2009. http://terrytao.
     wordpress.com/2009/06/09/talagrands-concentration-inequality/.

[17] Thomas Vidick: A concentration inequality for the overlap of a vector on a large set with
     application to the communication complexity of the Gap-Hamming-Distance problem. Technical
     Report TR11-051, Electron. Colloq. on Comput. Complexity (ECCC), 2010. Revised, 2011.
     [ECCC:TR11-051]

[18] David P. Woodruff: Optimal space lower bounds for all frequency moments. In Proc. 15th Ann.
     ACM-SIAM Symp. on Discrete Algorithms (SODA’04), pp. 167–175, 2004. [ACM:982817]

[19] David P. Woodruff: The average-case complexity of counting distinct elements. In Proc. 12th
     Internat. Conf. on Database Theory (ICDT’09), pp. 284–295, 2009. [doi:10.1145/1514894.1514928]

[20] Andrew Chi-Chih Yao: Lower bounds by probabilistic arguments. In Proc. 24th FOCS, pp.
     420–428. IEEE Comp. Soc. Press, 1983. [doi:10.1109/SFCS.1983.30]


AUTHOR

     Alexander A. Sherstov
     Assistant Professor
     University of California, Los Angeles
     sherstov cs ucla edu
     http://www.cs.ucla.edu/~sherstov


ABOUT THE AUTHOR

     A LEXANDER A. S HERSTOV completed his Ph. D. in 2009 at the University of Texas at
        Austin under the direction of Adam Klivans. After a postdoc at Microsoft Research,
        Alexander recently joined UCLA as an assistant professor of computer science. His
        research interests include computational complexity theory, computational learning
        theory, and quantum computing.



