Authors Jacek Jendrej, Krzysztof Oleszkiewicz, Jakub O. Wojtaszczyk,
License CC-BY-3.0
THEORY OF COMPUTING, Volume 11 (18), 2015, pp. 445–469
www.theoryofcomputing.org
SPECIAL ISSUE: ANALYSIS OF BOOLEAN FUNCTIONS
On Some Extensions of the FKN Theorem
Jacek Jendrej Krzysztof Oleszkiewicz∗ Jakub O. Wojtaszczyk
Received January 19, 2013; Revised September 19, 2015; Published December 29, 2015
Abstract: Let S = a1 r1 + a2 r2 + · · · + an rn be a weighted Rademacher sum. Friedgut, Kalai,
and Naor have shown that if Var(|S|) is much smaller than Var(S), then the sum is largely
determined by one of the summands. We provide a simple and elementary proof of this
result, strengthen it, and extend it in various ways to a more general setting.
ACM Classification: G.3
AMS Classification: 60E15, 42C10
Key words and phrases: independent random variables, Rademacher variables, absolute value variation,
Fourier expansion, Irit Dinur PCP proof
1 Introduction
Consider a family of independent random variables (Xi )ni=1 . It is easy to prove that if the distribution of
their sum is supported on a set of cardinality 2, then all the Xi ’s but one are constant almost surely. In our
paper we investigate stability of this phenomenon. Namely, we prove that if the distribution of the sum is
concentrated around a two-point set, then there exists k ∈ {1, 2, . . . , n} such that ∑_{i: i≠k} Xi is concentrated
around some point. We provide various strict quantitative variants of this heuristic statement. One of
them is the following theorem:
Theorem 1.1. Let τ ≥ 1. Let (Xi)_{i=1}^n be a sequence of independent square-integrable random variables.
Assume that Var(Xi) ≤ τ · (E|Xi − EXi|)² for 1 ≤ i ≤ n. Then for some k ∈ {1, 2, . . . , n} we have

Var( ∑_{i≤n: i≠k} Xi ) ≤ K(τ) · inf_{x∈R} Var( |x + ∑_{i≤n} Xi| ) ,
where K(τ) is a constant which depends only on τ.
∗ Partially supported by Polish MNiSW grant N N201 397437.
© 2015 Jacek Jendrej, Krzysztof Oleszkiewicz and Jakub O. Wojtaszczyk
Licensed under a Creative Commons Attribution License (CC-BY)    DOI: 10.4086/toc.2015.v011a018
Note that always Var(Xi) − (E|Xi − EXi|)² = Var(|Xi − EXi|) ≥ 0, with equality only when Xi is either
constant almost surely, or uniformly distributed on a two-point set. For such Xi ’s, the comparison of
moments assumption is thus satisfied with τ = 1. In a sense, the closer τ is to one, the more ∑i Xi must
resemble a (shifted) weighted Rademacher sum.
This result for weighted Rademacher sums (i. e., in the case when for each i the random variable Xi
is symmetric and takes values ±ai , where a1 , a2 , . . . , an are some real numbers) was proved in [4] by
E. Friedgut, G. Kalai, and A. Naor, and was a part of the proof of their theorem on Boolean functions on
the discrete cube with Fourier coefficients concentrated at the first two levels.
FKN Theorem ([4], Theorem 1.1). There exists an absolute constant K such that for any f : {−1, 1}ⁿ → {−1, 1} and

ρ = ( ∑_{|A|≥2} |f̂(A)|² )^{1/2}

one of the following holds:

‖f(x1, x2, . . . , xn) − 1‖₂ ≤ Kρ ,  or
‖f(x1, x2, . . . , xn) + 1‖₂ ≤ Kρ ,  or
‖f(x1, x2, . . . , xn) − xi‖₂ ≤ Kρ for some i ∈ {1, 2, . . . , n} ,  or          (1.1)
‖f(x1, x2, . . . , xn) + xi‖₂ ≤ Kρ for some i ∈ {1, 2, . . . , n} .
Remark 1.2. Note that (1.1) can be equivalently formulated as follows: P(f ≠ 1) ≤ K²ρ²/4, or
P(f ≠ −1) ≤ K²ρ²/4, or P(f(x) ≠ xi) ≤ K²ρ²/4 for some i, or P(f(x) ≠ −xi) ≤ K²ρ²/4 for some i.
Indeed, for {−1, 1}-valued functions f and g we have ‖f − g‖₂² = 4 P(f ≠ g).
They gave two proofs of the result concerning Rademacher sums. One was a direct application
of a theorem of König et al. [7]. The other used a more elementary approach (Chernoff’s inequality),
but contained an omission—it worked only under the additional assumption that we already know that
Var(Xk) ≥ C Var(∑i Xi) for some C close to 1. This minor gap is by now well known, as are some
ways to fix it, for example by the use of the Berry–Esseen theorem—in fact, it has been fixed already by
Kindler and Safra [6], whose proof also yielded better asymptotic estimates than [4]. (Although [6] was
not formally published, as far as we know, it was widely circulated; the proof appeared also in Kindler’s
Ph. D. thesis [5].) The FKN theorem is a direct application of the above variance bound for Rademacher
sums. It was originally devised for applications in discrete combinatorics and social science, but turned
out to be useful also in theoretical computer science. In particular, the theorem is used in analyzing the
Long Code Test in the celebrated expander proof of the PCP theorem by Irit Dinur ([3]). With that in
mind, we hope that our easy, self-contained proof will simplify understanding of the PCP theorem’s
background. Hence we set out to give an elementary proof of Theorem 1.1 for weighted Rademacher
sums which does not refer to intricate results such as the Berry–Esseen inequality, [7], [2] or [1] (a proof
based on the Bonami–Beckner hypercontractive bounds was also known).
We think it is also interesting (although not very surprising) that the inequality still holds if we replace
the Rademacher variables by variables satisfying a moment comparison condition. Note that, in contrast
to the weighted Rademacher setting described above, in our results the sums do not need to be linear
combinations of an i. i. d. sequence (however, in the discrete cube setting we actually prove a stronger,
and essentially optimal, FKN type estimate in Theorem 5.3 by the use of yet another method; Ryan
O’Donnell obtained the same bound independently by a slightly different approach—see [11, Theorem
5.33]).
We also provide the following analogous result for symmetric random variables with no additional
assumption about moment comparison.
Theorem 1.3. Let (Xi)_{i=1}^n be a sequence of independent symmetric square-integrable random variables.
Then for some k ∈ {1, 2, . . . , n} we have

Var( ∑_{i≤n: i≠k} Xi ) ≤ C · inf_{x∈R} Var( |x + ∑_{i≤n} Xi| ) ,

where C is a universal constant. The result holds true with C = (7 + √17)/2.
For the sake of clarity, we start by showing in Section 2 that if a sum of independent random vectors is
concentrated around a two-point set, then by removing just one term we may make the sum of remaining
vectors concentrate around a single point. Then, in Section 3, we demonstrate how to use this observation
in the real-valued case. However, our results and methods can be quite easily adapted to a Banach space
setting, with concentration around a finite set of points, which leads to some nice geometric considerations.
We present them in Section 6. Only very basic knowledge of Banach space theory is needed, which can
be found, e. g., in [15].
Since we tried to make our proofs as transparent and “low-tech” as possible, in many cases our
estimates can easily be improved by some natural optimization.
Readers interested only in the Rademacher case may find it useful to restrict their attention to Section 4,
in which Theorem 1.3 is proved, and to the first two subsections of Section 5, in which the strengthening of the
FKN Theorem is described.
2 Splitting of the sum
We begin by analyzing the concentration in terms of probability rather than variance. In what follows, we
denote by µZ the distribution of a random variable Z. Readers not comfortable with the Banach space
formulation may simply replace V by R, and ‖ · ‖ by | · |.
Lemma 2.1. Let X, Y be independent random variables with values in a real separable Banach space V .
Assume that for δ ≥ 0 and a, b ∈ V we have
P( ‖X + Y − a‖ ≤ ε or ‖X + Y − b‖ ≤ ε ) ≥ 1 − δ ,    (2.1)

where 0 ≤ ε < ‖b − a‖/6. Then there exists some vector c ∈ V such that

P(‖X − c‖ > ε) ≤ δ + √δ   or   P(‖Y − c‖ > ε) ≤ δ + √δ .
Proof. Let v = b − a and

A_y = {x ∈ V : ‖x + y − a‖ ≤ ε} ∪ {x ∈ V : ‖x + y − b‖ ≤ ε}
for y ∈ V . Then from (2.1) and independence of the variables, by the Fubini theorem we get

1 − δ ≤ ∫_{V×V} 1_{A_y}(x) dµ_{(X,Y)}(x, y) = ∫_V µ_X(A_y) dµ_Y(y) ,

so in particular µ_X(A_y) ≥ 1 − δ for some y ∈ V , which means

∃ c1, c2 ∈ V, c2 − c1 = v :  P( ‖X − c1‖ ≤ ε or ‖X − c2‖ ≤ ε ) ≥ 1 − δ .

Similarly we prove that

∃ d1, d2 ∈ V, d2 − d1 = v :  P( ‖Y − d1‖ ≤ ε or ‖Y − d2‖ ≤ ε ) ≥ 1 − δ .

Let α = P(‖X − c1‖ ≤ ε), β = P(‖X − c2‖ ≤ ε), γ = P(‖Y − d1‖ ≤ ε), η = P(‖Y − d2‖ ≤ ε). Thus

P( ‖(X + Y) − (c1 + d1)‖ ≤ 2ε ) ≥ αγ ,
P( ‖(X + Y) − (c2 + d2)‖ ≤ 2ε ) ≥ βη .

Let B̄(x, r) denote the closed ball with center x and radius r. If α, γ, β, η > √δ, then αγ, βη > δ, and
(2.1) would imply that

B̄(c1 + d1, 2ε) ∩ (B̄(a, ε) ∪ B̄(b, ε)) ≠ ∅   and   B̄(c2 + d2, 2ε) ∩ (B̄(a, ε) ∪ B̄(b, ε)) ≠ ∅ .

Since (c2 + d2) − (c1 + d1) = (c2 − c1) + (d2 − d1) = 2v, the diameter of B̄(a, ε) ∪ B̄(b, ε) would satisfy
the inequalities

2‖v‖ − 4ε ≤ diam( B̄(a, ε) ∪ B̄(b, ε) ) ≤ ‖v‖ + 2ε ,

contradicting the assumption ε < ‖v‖/6. Without loss of generality we may therefore assume that α ≤ √δ.
Since α + β ≥ 1 − δ, this implies β ≥ 1 − δ − √δ, i. e., P(‖X − c2‖ > ε) ≤ δ + √δ.
Lemma 2.2. Let X1 , X2 , . . . , Xn be independent random variables with values in a separable Banach
space V . Let S = X1 + · · · + Xn , Si = S − Xi for 1 ≤ i ≤ n. Assume that for δ ≥ 0 and some vectors a, b ∈ V
we have
P( ‖S − a‖ ≤ ε or ‖S − b‖ ≤ ε ) ≥ 1 − δ ,    (2.2)

where ε < ‖b − a‖/6. Then there exist k ∈ {1, . . . , n} and c ∈ V such that

P( ‖Sk − c‖ > 2ε ) ≤ 2(√δ + δ) = O(√δ)  as δ → 0⁺ .

Additionally, if δ < 1/9, then

P( ‖Xk − (a − c)‖ > 3ε and ‖Xk − (b − c)‖ > 3ε ) ≤ δ/(1 − 2√δ − 2δ) = O(δ)  as δ → 0⁺ .
Proof. Let I be a minimal (in the sense of inclusion) subset of {1, . . . , n} such that

∀ c ∈ V :  P( ‖∑_{i∈I} Xi − c‖ > ε ) > δ + √δ

(if there is no such I, there is nothing to prove). Of course I ≠ ∅. Let k ∈ I. We have S = ∑_{i∈I} Xi + ∑_{i∉I} Xi,
and the two sums are obviously independent, so by our assumption about I, (2.2) and Lemma 2.1, for
some c1 ∈ V we get

P( ‖∑_{i∉I} Xi − c1‖ > ε ) ≤ δ + √δ .    (2.3)

But I was minimal, so for some c2 ∈ V

P( ‖∑_{i∈I\{k}} Xi − c2‖ > ε ) ≤ δ + √δ .    (2.4)

Sk = ∑_{i∉I} Xi + ∑_{i∈I\{k}} Xi, so that (2.3), (2.4) and the triangle inequality yield

P( ‖Sk − (c1 + c2)‖ > 2ε ) ≤ 2(√δ + δ) .

The second assertion of the lemma follows easily upon recalling that Sk and Xk are independent and
S = Sk + Xk:

(1 − 2√δ − 2δ) · P( ‖Xk − (a − c)‖ > 3ε and ‖Xk − (b − c)‖ > 3ε )
  ≤ P( ‖Sk − c‖ ≤ 2ε and ‖Xk − (a − c)‖ > 3ε and ‖Xk − (b − c)‖ > 3ε )
  ≤ P( ‖S − a‖ > ε and ‖S − b‖ > ε ) ≤ δ .

For δ ≥ 1/9 we have 1 − 2√δ − 2δ ≤ δ, which makes the arising probability bound trivial.
Remark 2.3. Both bounds are of optimal order for δ → 0⁺ even for V = R, as indicated by the following
example. Fix a = 0, b = 1, ε = 1/7 and some n ≥ 2. Let Xi ∼ Pois(√δ/n), so that S ∼ Pois(√δ) and

P(S ∈ {0, 1}) = e^{−√δ}(1 + √δ) ≥ 1 − δ .

Hence the assumptions of Lemma 2.2 are satisfied. Since for every k ≤ n there is

Sk ∼ Pois( ((n − 1)/n) √δ ) ,

we have P(Sk = 0)/√δ → ∞ and P(Sk = 1)/√δ → (n − 1)/n as δ → 0⁺, which shows that the O(√δ)
bound cannot be improved. Also, for every k ≤ n we have

P(Xk = 0)/δ → ∞ ,   P(Xk = 1)/δ → ∞ ,   and   P(Xk = 2)/δ → 1/(2n²)

as δ → 0⁺. Hence, for δ small enough, for every set A which is a union of two intervals of length 6/7
each, there is

P(Xk ∉ A) ≥ δ/(3n²) ,

which proves the optimality of the O(δ) bound.
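Since the example is completely explicit, the claimed limits are easy to inspect numerically. The following minimal Python sketch (the helper function and the parameter choices are ours, purely for illustration) prints P(S ∈ {0, 1}) against 1 − δ together with the two normalized probabilities whose limits are (n − 1)/n and 1/(2n²).

```python
# Numerical illustration of Remark 2.3 (names and parameter values are ours).
from math import exp, factorial, sqrt

def pois_pmf(lam, k):
    """P(Pois(lam) = k), computed directly from the definition."""
    return exp(-lam) * lam ** k / factorial(k)

n = 2
for delta in (1e-2, 1e-4, 1e-6):
    lam_S = sqrt(delta)                  # S   ~ Pois(sqrt(delta))
    lam_Sk = (n - 1) / n * sqrt(delta)   # S_k ~ Pois(((n-1)/n) sqrt(delta))
    lam_X = sqrt(delta) / n              # X_k ~ Pois(sqrt(delta)/n)
    p01 = pois_pmf(lam_S, 0) + pois_pmf(lam_S, 1)
    print(f"delta={delta:.0e}  P(S in {{0,1}})={p01:.10f}  1-delta={1 - delta:.10f}  "
          f"P(Sk=1)/sqrt(delta)={pois_pmf(lam_Sk, 1) / sqrt(delta):.4f}  "
          f"P(Xk=2)/delta={pois_pmf(lam_X, 2) / delta:.4f}")
```

As δ decreases, the last two columns approach 0.5 and 0.125, in line with the limits stated above.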
3 Proof of Theorem 1.1 (comparison of moments assumption)
We will now show how to use the facts from the previous section to give a proof of Theorem 1.1.
Concentration bounds in terms of probability will be translated into statements about variances by the use
of a Paley–Zygmund type inequality. We need a few simple and standard lemmas.
Lemma 3.1 (Khinchine inequality). Let r1, r2, . . . be independent symmetric ±1 random variables. There
exists a universal constant κ such that for every a1, a2, . . . , am ∈ R there is

κ · E| ∑_{i=1}^m ai ri | ≥ ( ∑_{i=1}^m ai² )^{1/2} .
Proof. The estimate with the optimal constant κ = √2 was proved by Szarek [16] (see [8] for a simpler
proof). For the reader’s convenience we provide a well-known simple argument which yields κ = √3.
Let S = ∑_{i=1}^m ai ri, so that ES² = ∑_{i=1}^m ai². Since

ES⁴ = ∑_{i=1}^m ai⁴ + 6 ∑_{1≤i<j≤m} ai² aj² ≤ 3 ( ∑_{i=1}^m ai² )( ∑_{j=1}^m aj² ) = 3 (ES²)² ,

by Hölder’s inequality we get

ES² = E( |S|^{4/3} · |S|^{2/3} ) ≤ (ES⁴)^{1/3} (E|S|)^{2/3} ≤ 3^{1/3} (ES²)^{2/3} (E|S|)^{2/3} .

Hence √3 · E|S| ≥ (ES²)^{1/2} = ( ∑_{i=1}^m ai² )^{1/2} .
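Because E|S| can be computed exactly for small m by summing over all 2^m sign patterns, the κ = √3 estimate is easy to sanity-check numerically; a minimal Python sketch (the coefficient vectors below are arbitrary choices of ours):

```python
# Exact check of sqrt(3) * E|S| >= (sum a_i^2)^{1/2} for a few coefficient vectors.
from itertools import product
from math import sqrt

def exact_abs_mean(a):
    """E|sum a_i r_i| for independent symmetric +-1 variables r_i, by enumeration."""
    m = len(a)
    total = sum(abs(sum(ai * ri for ai, ri in zip(a, signs)))
                for signs in product((-1, 1), repeat=m))
    return total / 2 ** m

for a in ([1.0], [1.0, 1.0], [3.0, 1.0, 0.5], [1.0] * 12):
    lhs = sqrt(3) * exact_abs_mean(a)
    rhs = sqrt(sum(ai * ai for ai in a))
    print(f"{lhs:.4f} >= {rhs:.4f}  ({lhs >= rhs})")
```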
Lemma 3.2. Let Y1 ,Y2 , . . . ,Ym be independent symmetric integrable random variables. Then
κ · E| ∑_{i=1}^m Yi | ≥ E( ( ∑_{i=1}^m Yi² )^{1/2} ) ,
where κ is the universal constant from Lemma 3.1.
Proof. Let (ri)_{i=1}^m be a sequence of independent symmetric ±1 random variables, independent of (Yi)_{i=1}^m,
so that (Yi · ri)_{i=1}^m has the same distribution as (Yi)_{i=1}^m. Hence

E| ∑_{i=1}^m Yi | = E| ∑_{i=1}^m Yi ri | = E( E( | ∑_{i=1}^m Yi ri |  ∣  Y1, Y2, . . . , Ym ) ) ≥ κ⁻¹ · E( ∑_{i=1}^m Yi² )^{1/2} ,
where we have used Lemma 3.1.
The following result may be traced back to the work of Marcinkiewicz and Zygmund [9] (see
Théorème 2 therein).
Corollary 3.3. Let ρ ≥ 1. Let Y1, Y2, . . . , Ym be independent symmetric square-integrable random variables
such that EYi² ≤ ρ · (E|Yi|)² for i ≤ m. Set Z = ∑_{i=1}^m Yi. Then EZ² ≤ κ²ρ · (E|Z|)², where κ is the
universal constant from Lemma 3.1.
Proof. Indeed, by Lemma 3.2 we have

κ · E|Z| = κ · E| ∑_{i=1}^m Yi | ≥ E‖(|Y1|, |Y2|, . . . , |Ym|)‖_{ℓ₂^m}
  ≥ ‖(E|Y1|, E|Y2|, . . . , E|Ym|)‖_{ℓ₂^m} = ( ∑_{i=1}^m (E|Yi|)² )^{1/2}
  ≥ ρ^{−1/2} · ( ∑_{i=1}^m EYi² )^{1/2} = ρ^{−1/2} · (EZ²)^{1/2} ,

where we have used Jensen’s inequality for the convex function y ↦ ‖y‖_{ℓ₂^m}.
Proof of Theorem 1.1. Obviously, by considering x + X1 instead of X1 we may reduce our task to proving
that

min_{k≤n} Var( ∑_{i≤n: i≠k} Xi ) ≤ K(τ) · Var( |∑_{i≤n} Xi| ) .
Let (Xi′)_{i=1}^n be an independent copy of (Xi)_{i=1}^n. Let S = ∑_{i=1}^n Xi and S′ = ∑_{i=1}^n Xi′, so that S′ is an
independent copy of S. Note that the random variables Yi = Xi − Xi′ (i ≤ n) are independent and symmetric.
By Jensen’s inequality,

E|Yi| = E E( |Xi − Xi′|  ∣  Xi ) ≥ E| E( Xi − Xi′ ∣ Xi ) | = E|Xi − EXi′| = E|Xi − EXi| .

Since EYi² = 2 Var(Xi), we have EYi² ≤ 2τ · (E|Yi|)² for i = 1, 2, . . . , n. Let ε = 18κ²τ(Var(|S|))^{1/2}, where
κ is the universal constant from Lemma 3.1. Let δ = κ⁻⁴τ⁻²/324, a = E|S|, and b = −a = −E|S|. We
consider two cases:
Case 1. Assume ε < |a − b|/6. By Chebyshev’s inequality,

P( |S − a| ≤ ε or |S − b| ≤ ε ) ≥ 1 − Var(|S|)ε⁻² = 1 − δ ,

so by Lemma 2.2 there exist c ∈ R and k ≤ n such that

P( |Sk − c| ≤ 2ε ) ≥ 1 − 2√δ − 2δ ,

where Sk = ∑_{i≤n: i≠k} Xi. Let Z = ∑_{i≤n: i≠k} Yi. Let Sk′ = ∑_{i≤n: i≠k} Xi′, so that Sk′ is an independent copy of Sk.
Obviously,

EZ² = E(Sk − Sk′)² = 2 Var(Sk) .

Note that

P(|Z| ≤ 4ε) = P(|Sk − Sk′| ≤ 4ε) ≥ P( |Sk − c| ≤ 2ε and |Sk′ − c| ≤ 2ε )
  = P(|Sk − c| ≤ 2ε) · P(|Sk′ − c| ≤ 2ε) ≥ (1 − 2√δ − 2δ)² ≥ 1 − 4√δ ,
so that P(|Z| > 4ε) ≤ 4√δ. Now we may apply a Paley–Zygmund type estimate:

κ⁻¹τ^{−1/2} · (Var(Sk))^{1/2} = κ⁻¹(2τ)^{−1/2}(EZ²)^{1/2} ≤ E|Z| = E|Z|1_{|Z|≤4ε} + E|Z|1_{|Z|>4ε}
  ≤ 4ε + (EZ²)^{1/2} · (P(|Z| > 4ε))^{1/2} ≤ 4ε + (2 Var(Sk))^{1/2} · (4√δ)^{1/2}
  = 72κ²τ · (Var(|S|))^{1/2} + (2/3) κ⁻¹τ^{−1/2} · (Var(Sk))^{1/2}

—we have used Corollary 3.3 (with m = n − 1 and ρ = 2τ) in the first inequality, and the second one
follows from the Cauchy–Schwarz inequality. Obvious cancellations yield Var(Sk) ≤ (6κ)⁶τ³ · Var(|S|).

Case 2. If ε ≥ |a − b|/6, then E|S| ≤ 3ε and

Var(S1) ≤ Var(S) ≤ ES² = Var(|S|) + (E|S|)²
  ≤ Var(|S|) + 9ε² = (1 + 2916κ⁴τ²) · Var(|S|) ≤ (6κ)⁶τ³ · Var(|S|)

because κ, τ ≥ 1. We have proved the theorem with K(τ) = (6κ)⁶τ³.
Corollary 3.4. Let ξ be a square-integrable random variable which is not constant a. s., and let ξ1 ,
ξ2 , . . . be its i. i. d. copies. Then there exists a constant Kξ , depending only on distribution of ξ , such that
for any real numbers a1 , a2 , . . . , an there is some k ≤ n for which
∑_{i≤n: i≠k} ai² ≤ Kξ · inf_{x∈R} Var( |x + ∑_{i≤n} ai ξi| ) .

Proof. It suffices to apply Theorem 1.1 with τ = Var(ξ)/(E|ξ − Eξ|)². Setting Kξ = K(τ)/Var(ξ) ends
the proof.
4 Proof of Theorem 1.3 (symmetric variant)
Now we will prove an analogue of Theorem 1.1 for real symmetric random variables (but with no
constraints on moments). It is possible to do it in a similar way as in the proof of Theorem 1.1, i. e., by
using Lemma 2.2, but to get better estimates we will adopt another, more direct approach. We will need a
lemma.
Lemma 4.1. Let X and Y be independent square-integrable random variables, at least one of them
symmetric. Then
min( Var(X), Var(Y) ) ≤ ((7 + √17)/4) · Var(|X + Y|) .
Proof. Obviously, E|X + Y|² = EX² + EY² because EXY = EX · EY = 0. Since |X + Y| and |X − Y| have
the same distribution, there is

E|X + Y| = (E|X + Y| + E|X − Y|)/2 = E(|X + Y| + |X − Y|)/2 = E max(|X|, |Y|) .
Hence

Var(|X + Y|) = E(X² + Y²) − (E|X + Y|)²
  = E( (max(|X|, |Y|))² + (min(|X|, |Y|))² ) − (E max(|X|, |Y|))²
  = Var(max(|X|, |Y|)) + Var(min(|X|, |Y|)) + (E min(|X|, |Y|))²
  ≥ (1/2) Var( max(|X|, |Y|) + min(|X|, |Y|) ) + (E min(|X|, |Y|))²
  = (1/2) Var(|X| + |Y|) + (E min(|X|, |Y|))²
  = (1/2) ( Var(|X|) + Var(|Y|) ) + (E min(|X|, |Y|))² .
We have used the fact that for any square-integrable random variables V and W there is
2Var(V ) + 2Var(W ) = Var(V +W ) + Var(V −W ) ≥ Var(V +W ) .
Thus, for σ = (Var(|X + Y|))^{1/2} and s = Var(|X|) + Var(|Y|), we have s ≤ 2σ² and

E min(|X|, |Y|) ≤ ( σ² − (1/2)s )^{1/2} .
The identity a + b − 2 min(a, b) = |a − b| yields a pointwise bound:

|X| + |Y| − 2 min(|X|, |Y|) = ||X| − |Y|| ≤ |E|X| − E|Y|| + |(|X| − E|X|) − (|Y| − E|Y|)|
  = E|X| + E|Y| − 2 min(E|X|, E|Y|) + |(|X| − |Y|) − E(|X| − |Y|)| .
By taking expectations of both sides we arrive at

min(E|X|, E|Y|) ≤ E min(|X|, |Y|) + (1/2) E|(|X| − |Y|) − E(|X| − |Y|)|
  ≤ E min(|X|, |Y|) + (1/2)(Var(|X| − |Y|))^{1/2} ≤ ( σ² − (1/2)s )^{1/2} + (1/2)√s ,
where we have bounded the L1 norm by the L2 norm and used the independence of |X| and |Y |. Therefore
min(Var(X), Var(Y)) ≤ min(EX², EY²) = min( Var(|X|) + (E|X|)², Var(|Y|) + (E|Y|)² )
  ≤ s + (min(E|X|, E|Y|))² ≤ σ² + (3/4)s + ( σ²s − (1/2)s² )^{1/2}
  ≤ σ² + sup_{t∈[0,2σ²]} ( (3/4)t + (σ²t − t²/2)^{1/2} ) = ((7 + √17)/4) · σ² .
Remark 4.2. An example of X with distribution

(1/8)δ₋₂ + (3/4)δ₀ + (1/8)δ₂

and ±1 symmetric Y indicates that the constant (7 + √17)/4 ≈ 2.78 in Lemma 4.1 cannot be replaced
by any number less than 16/7 ≈ 2.29.
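For the reader’s convenience, the computation behind this example: Var(X) = Var(Y) = 1, while |X + Y| takes the value 1 with probability 7/8 and the value 3 with probability 1/8, so E|X + Y| = 5/4, E|X + Y|² = 2 and Var(|X + Y|) = 2 − 25/16 = 7/16; hence min(Var(X), Var(Y))/Var(|X + Y|) = 16/7.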
Proof of Theorem 1.3. We prove the theorem with C = (7 + √17)/2 ≈ 5.56. For x ∈ R let
ξ1 = x + X1, and ξi = Xi for i ≥ 2. Set S = ∑_{i=1}^n ξi. For J ⊆ {1, 2, . . . , n} let SJ = ∑_{i∈J} ξi. Let I be a
minimal (in the sense of inclusion) subset of [n] = {1, 2, . . . , n} such that

Var(SI) > ((7 + √17)/4) · Var(|S|)

(if no subset satisfies this condition, then the assertion follows trivially). Obviously, I ≠ ∅. Choose any
k ∈ I. Certainly,

Var(S_{I\{k}}) ≤ ((7 + √17)/4) · Var(|S|) .

Since SI and S_{[n]\I} are independent and at least one of them is symmetric, Lemma 4.1 yields that

min( Var(SI), Var(S_{[n]\I}) ) ≤ ((7 + √17)/4) · Var(|S|) ,

so that

Var(S_{[n]\I}) ≤ ((7 + √17)/4) · Var(|S|) .

Therefore

Var( ∑_{i≤n: i≠k} Xi ) = Var(S_{I\{k}}) + Var(S_{[n]\I}) ≤ ((7 + √17)/2) · Var(|S|) .

Thus we have proved that

min_{k≤n} Var( ∑_{i≤n: i≠k} Xi ) ≤ ((7 + √17)/2) · inf_{x∈R} Var( |x + ∑_{i≤n} Xi| ) .
Remark 4.3. A simple example of n = 3 and X1 , X2 , X3 i. i. d. symmetric ±1 random variables indicates
that the constant C in Theorem 1.3 cannot be less than 8/3 ≈ 2.67 (it suffices to check it for x = 0).
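Indeed, with x = 0 and S = X1 + X2 + X3 we have Var(X1 + X2) = 2, while |S| equals 1 with probability 3/4 and 3 with probability 1/4, so that E|S| = 3/2, E|S|² = 3 and Var(|S|) = 3/4; the ratio is 2/(3/4) = 8/3.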
5 Harmonic analysis on product spaces
Below, we introduce assumptions and notation which will be used throughout Section 5. This is a natural
and convenient setting for harmonic analysis on product spaces, e. g., on the discrete cube with the
uniform measure, in which case (πi )ni=1 is the standard Rademacher system. This language will allow us
to state FKN type results on product spaces other than the discrete cube.
5.1 Assumptions and notation (A & N)
Let ξ1, ξ2, . . . , ξn be independent random variables satisfying Eξi = 0 and Eξi² = 1 for all i ≤ n. We
consider a Hilbert space L2 = L2(Rn, µ), where µ = µ_{ξ1} ⊗ µ_{ξ2} ⊗ · · · ⊗ µ_{ξn} is the joint distribution of the
random vector ξ = (ξ1, ξ2, . . . , ξn). It will be convenient to set ξ0 ≡ 1, so that (ξi)_{i=0}^n is an orthonormal
system in L2. Let A be the linear (finite-dimensional and thus closed) subspace of L2 consisting of all
affine real-valued functions on Rn . We define coordinate projection functions πi : Rn −→ R by πi (x) = xi
for 1 ≤ i ≤ n, and π0 ≡ 1. Let Aπ = {π0 , −π0 , π1 , −π1 , . . . , πn , −πn }. For a Boolean Borel (i. e., {−1, 1}-
valued and such that the preimage of {1} is a Borel set) function f on Rn , by fA : Rn −→ R we will
denote its orthogonal projection in L2 onto the subspace A:
fA(x) = a0 + a1 x1 + · · · + an xn ,   i. e.,   fA = ∑_{i=0}^n ai πi ,

where ai = ⟨f, πi⟩_{L2}. We may and will use the same notation for a Borel Boolean function f defined only
on the support of µ, since obviously it may be extended to a Borel Boolean function F on the whole
Rn, and FA does not depend on the choice of the extension. Let us define the sign function in a slightly
non-standard way as 1_{[0,∞)} − 1_{(−∞,0)}, to make the function Boolean (setting sign(0) = −1 would work
as well). Let ρ = dist_{L2}(f, A) = inf_{g∈A} ‖f − g‖_{L2} and d = dist_{L2}(f, Aπ) = inf_{g∈Aπ} ‖f − g‖_{L2}. Note that
d ≤ √2 because

‖f − π1‖²_{L2} + ‖f + π1‖²_{L2} = 2‖f‖²_{L2} + 2‖π1‖²_{L2} = 2 + 2 = 4 ,

so that

min( ‖f − π1‖_{L2}, ‖f + π1‖_{L2} ) ≤ √2 .

Obviously, ρ ≤ ‖f‖_{L2} = 1. Finally, let us define random variables S = fA(ξ) and R = (f − fA)(ξ).
5.2 Symmetric case
We will start with a theorem which recovers and extends the main result of [4] with a quite good, explicit
constant.
Theorem 5.1. Under A & N, if ξ1 , ξ2 , . . . , ξn are additionally symmetric, then there exists some k ∈
{0, 1, . . . , n} such that
ak² ≥ 1 − ((9 + √17)/2) · ρ² .

Also,

ρ ≤ d ≤ (9 + √17)^{1/2} · ρ ,   and   d ≤ ((9 + √17)/2)^{1/2} · ρ + o(ρ)

as ρ → 0⁺ (uniformly over Boolean functions).
Proof. Since |f| ≡ 1, the triangle inequality yields a pointwise bound

1 − |f − fA| ≤ |fA| ≤ 1 + |f − fA| ,   i. e.,   ||S| − 1| ≤ |R| ,

so Var(|S|) = E(|S| − 1)² − (E|S| − 1)² ≤ ER² = ‖f − fA‖²_{L2} = ρ². Let us consider independent random
variables (Xi)_{i=0}^n given by Xi = ai ξi for 1 ≤ i ≤ n, and with X0 being a symmetric ±a0 random variable.
The sum |∑_{i=0}^n Xi| has the same distribution (and thus the same variance) as |S|, so by using Theorem 1.3
(strictly speaking, its reformulation for n + 1 instead of n summands) with x = 0 we infer that for some
k ∈ {0, 1, . . . , n} there is
∑_{i∈{0,1,...,n}\{k}} ai² = Var( ∑_{i∈{0,1,...,n}\{k}} Xi ) ≤ ((7 + √17)/2) · ρ² .

Since by orthogonality we have

∑_{i=0}^n ai² = ‖fA‖²_{L2} = ‖f‖²_{L2} − ‖f − fA‖²_{L2} = 1 − ρ² ,
this ends the proof of the first assertion.
The inequality ρ ≤ d follows immediately from Aπ ⊆ A. Observe that

d² ≤ ‖f − sign(ak)πk‖²_{L2} = ‖f‖²_{L2} + ‖πk‖²_{L2} − 2 sign(ak)⟨f, πk⟩_{L2}
  = 1 + 1 − 2 sign(ak)ak = 2(1 − |ak|) = 2(1 − ak²)/(1 + |ak|) ≤ (9 + √17)ρ²/(1 + |ak|) .

Thus

d ≤ (9 + √17)^{1/2} · ρ .

The remaining assertion also follows easily because the first assertion implies |ak| ≥ 1 − O(ρ²), so that

d² ≤ ((9 + √17)/2) · ρ² + o(ρ²) .
Corollary 5.2. Under assumptions of Theorem 5.1 (A & N, and ξ1 , ξ2 , . . . , ξn are symmetric) there is
some k ∈ {0, 1, . . . , n} such that
min( ‖f − (sign ◦ πk)‖_{L2}, ‖f + (sign ◦ πk)‖_{L2} ) ≤ 2d ≤ 2(9 + √17)^{1/2} · ρ .
Proof. Let k be such that ‖f − πk‖_{L2} = d (or, alternatively, such that ‖f + πk‖_{L2} = d). Note that for any
s ∈ {−1, 1} and u ∈ R there is |s − u| ≥ |sign(u) − u| (and |s + u| ≥ |sign(u) − u|). Hence |f − πk| ≥
|(sign ◦ πk) − πk| (and |f + πk| ≥ |(sign ◦ πk) − πk|) pointwise, so that ‖(sign ◦ πk) − πk‖_{L2} ≤ d. The
assertion follows by the triangle inequality.
Now let us see how to strengthen the result of Friedgut, Kalai and Naor. For a function f defined
on the discrete cube {−1, 1}n we consider its standard Walsh-Fourier expansion ∑A fˆ(A)wA , where
wA (x) = ∏i∈A xi .
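For small n the Walsh-Fourier coefficients, and hence the quantity ρ appearing below, can be computed by brute force. A minimal Python sketch (the function names are ours, and the 3-bit majority serves only as a test case):

```python
# Brute-force Walsh-Fourier expansion of a Boolean function on {-1,1}^n.
from itertools import combinations, product
from math import prod

def walsh_fourier(f, n):
    """Return a dict {A: hat f(A)} over all subsets A of {0, ..., n-1}."""
    cube = list(product((-1, 1), repeat=n))
    return {A: sum(f(x) * prod(x[i] for i in A) for x in cube) / 2 ** n
            for size in range(n + 1) for A in combinations(range(n), size)}

def majority3(x):
    return 1 if sum(x) > 0 else -1

coeffs = walsh_fourier(majority3, 3)
rho_sq = sum(c ** 2 for A, c in coeffs.items() if len(A) >= 2)
print(coeffs)             # hat f({i}) = 1/2 for each i, hat f({0,1,2}) = -1/2
print("rho^2 =", rho_sq)  # 1/4; all squared coefficients sum to 1, since |f| = 1
```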
Theorem 5.3. There exists a universal constant L > 0 with the following property. For f : {−1, 1}n →
{−1, 1}, let
ρ = ( ∑_{A⊆[n]: |A|≥2} |f̂(A)|² )^{1/2} .

Then there exists some B ⊆ [n] with |B| ≤ 1 such that

∑_{A⊆[n]: |A|≤1, A≠B} |f̂(A)|² ≤ L · ρ⁴ ln(2/ρ)

and |f̂(B)|² ≥ 1 − ρ² − L · ρ⁴ ln(2/ρ).
Proof. Let ξ1, ξ2, . . . , ξn be independent symmetric ±1 random variables, so that the definition of ρ is
consistent with the one from A & N. We also have ai = ⟨f, πi⟩_{L2} = f̂({i}) for i ∈ [n], and a0 = f̂(∅). Let
us put

θ = ( 4 log₂(2/d) − 1 )⁻¹ .

Because d ≤ √2, always θ ∈ (0, 1].
Let k ∈ {0, 1, . . . , n} be such that d = ‖f − πk‖_{L2} (if the point of Aπ closest to f is of the form −πk,
then a similar reasoning works). Hence d² = ‖f‖²_{L2} + ‖πk‖²_{L2} − 2⟨f, πk⟩_{L2} = 2(1 − ak). Since h = f − πk
is a {−2, 0, 2}-valued function, we get

P_µ(h ≠ 0) = µ({x ∈ {−1, 1}ⁿ : h(x) ≠ 0}) = (1/4)‖h‖²_{L2} = (d/2)² .

Therefore

d⁴/2 = 4(d/2)^{4/(1+θ)} = 4 P_µ(h ≠ 0)^{2/(1+θ)} = ‖h‖²_{L_{1+θ}} ≥ ∑_{A⊆[n]} θ^{|A|} · |ĥ(A)|²
  ≥ θ · ∑_{A⊆[n]: |A|≤1} |ĥ(A)|² = θ · ( (1 − ak)² + ∑_{i∈{0,1,...,n}\{k}} ai² ) = θ · ( d⁴/4 + ∑_{i∈{0,1,...,n}\{k}} ai² ) ,

so that

∑_{i∈{0,1,...,n}\{k}} ai² ≤ (2θ⁻¹ − 1)d⁴/4 ≤ 2d⁴ log₂(2/d) .    (5.1)

The first inequality in the chain above is the classical L₂–L_{1+θ} hypercontractive Bonami–Beckner estimate ([2], [1]). Now
we have
∑_{i=0}^n ai² = (1 − d²/2)² + ∑_{i∈{0,1,...,n}\{k}} ai² ≤ (1 − d²/2)² + (1/4)(2θ⁻¹ − 1)d⁴
  = 1 − d² + (1/2)θ⁻¹d⁴ ≤ 1 − d² + 2d⁴ log₂(2/d)

and thus (recall that |f| ≡ 1, so ∑_{A⊆[n]} |f̂(A)|² = ‖f‖²_{L2} = 1) we get

ρ² = ∑_{A⊆[n]: |A|≥2} |f̂(A)|² = 1 − ∑_{i=0}^n ai² ≥ d² − 2d⁴ log₂(2/d) .    (5.2)

We finish the proof by observing that (5.1) and (5.2) yield

∑_{i∈{0,1,...,n}\{k}} ai² ≤ 2d⁴ log₂(2/d) ≤ 2( ρ² + 2d⁴ log₂(2/d) )² log₂(2/d)
  ≤ 2( ρ² + 2d⁴ log₂(2/ρ) )² log₂(2/ρ) = 2ρ⁴ log₂(2/ρ) + o(ρ⁵) ,

uniformly, as ρ → 0⁺. We have used the d = O(ρ) bound of Theorem 5.1.
For ρ ≤ 1/3, by Theorem 5.1 we have

d ≤ (1/3)(9 + √17)^{1/2} ≤ 2e^{−1/2}

and we may choose

θ = ( 4 ln(2/d) − 1 )⁻¹   instead of   ( 4 log₂(2/d) − 1 )⁻¹ ,

which yields, essentially with the same proof, a slightly better (asymptotically) bound of eρ⁴ ln(2/ρ) +
o(ρ⁵) as ρ → 0⁺.
Remark 5.4. The bound O(ρ⁴ ln(2/ρ)) in Theorem 5.3 is better than the O(ρ²) obtained in [4] and the ρ² +
o(ρ²) of [6] (see also Corollary 15.2 of [5]). Also, it is of the optimal order. Indeed, for 2 ≤ m ≤ n
consider the negated OR function

f(x) = 1 − 2^{1−m} ∏_{i=1}^m (1 + xi) .

Then ρ ≤ 2 · 2^{−m/2}, so that ρ⁴ log₂(2/ρ) ≤ 8m · 4^{−m}, whereas |f̂(A)|² ≥ 4 · 4^{−m} for every one-element
set A contained in {1, 2, . . . , m}.
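To verify these claims, note that ∏_{i=1}^m (1 + xi) = ∑_{A⊆[m]} wA(x), so that f̂(∅) = 1 − 2^{1−m}, f̂(A) = −2^{1−m} for every nonempty A ⊆ [m], and f̂(A) = 0 for all other A. Hence |f̂({i})|² = 4^{1−m} = 4 · 4^{−m} for every i ≤ m, while ρ² = (2^m − 1 − m) · 4^{1−m} ≤ 4 · 2^{−m}, i. e., ρ ≤ 2 · 2^{−m/2}.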
5.3 Absolute first moment assumption
Without the symmetry assumption we cannot hope to get the same assertion as in Theorem 5.1—if
n = 1, P(ξ1 = −2) = 1/5, P(ξ1 = 1/2) = 4/5 and f (x) = (3 + 4x)/5, then f ∈ A \ Aπ even though f
is Boolean. However, under an additional moment assumption, we still may prove that any Boolean
function which is close in L2 to A must be also at a comparable L2 -distance from an affine function of a
single coordinate.
Theorem 5.5. There exists a function C : (0, ∞) −→ (0, ∞) with the following property. Under A
& N, let η = mini∈[n] E|ξi |. Then there is some k ∈ [n] such that k f − (a0 + ak πk )kL2 ≤ C(η)ρ and
k f − sign(a0 + ak πk )kL2 ≤ 2C(η)ρ.
Proof. As in the proof of Theorem 5.1 we observe first that Var(|S|) ≤ ρ², where S = a0 + ∑_{i=1}^n ai ξi. By
Theorem 1.1 there is some k ∈ [n] such that ∑_{i∈[n]\{k}} ai² ≤ K(η⁻²)ρ². Thus

‖f − (a0 + ak πk)‖²_{L2} = ‖f − fA‖²_{L2} + ‖ ∑_{i∈[n]\{k}} ai πi ‖²_{L2} ≤ (1 + K(η⁻²)) ρ² ,

which proves the first assertion with C(η) = (1 + K(η⁻²))^{1/2}. The second assertion follows from the first
one in the same way in which Corollary 5.2 follows from Theorem 5.1.
5.4 General case
We will need two auxiliary lemmas.
Lemma 5.6. Let X and Y be independent square-integrable random variables. Assume E(|X + Y| − 1)² ≤
ρ² for some ρ ∈ [0, 1]. Then Var(X) ≤ 25ρ or Var(Y) ≤ 25ρ.
Proof. Let (X′, Y′) be an independent copy of the pair (X, Y). For a square-integrable random variable
Z let ‖Z‖₂ = (EZ²)^{1/2}. Also, let us define φ : R → {−2, 0, 2} by φ(u) = 2 for u > 1, φ(u) = −2 for
u < −1, and φ(u) = 0 otherwise. Note that |u − φ(u)| = dist(u, {−2, 0, 2}) for any u ∈ R. Finally, let

α = P(X − X′ > 1) = P(X − X′ < −1)   and   β = P(Y − Y′ > 1) = P(Y − Y′ < −1) .

By the assumptions of the lemma we have ‖dist(X + Y, {−1, 1})‖₂ ≤ ρ. Since X + Y and X′ + Y have
the same distribution, we obtain ‖dist(X′ + Y, {−1, 1})‖₂ ≤ ρ. Thus the pointwise bound

dist(X − X′, {−2, 0, 2}) ≤ dist(X + Y, {−1, 1}) + dist(X′ + Y, {−1, 1})

implies

‖(X − X′) − φ(X − X′)‖₂ = ‖dist(X − X′, {−2, 0, 2})‖₂ ≤ 2ρ .    (5.3)

In a similar way we prove that

‖(Y − Y′) − φ(Y − Y′)‖₂ = ‖dist(Y − Y′, {−2, 0, 2})‖₂ ≤ 2ρ .    (5.4)

The pointwise bound

dist(X + Y − X′ − Y′, {−2, 0, 2}) ≤ dist(X + Y, {−1, 1}) + dist(X′ + Y′, {−1, 1})

implies ‖dist((X − X′) + (Y − Y′), {−2, 0, 2})‖₂ ≤ 2ρ. Hence by (5.3), (5.4), and the triangle inequality
we obtain

‖dist( φ(X − X′) + φ(Y − Y′), {−2, 0, 2} )‖₂ ≤ 2ρ + 2ρ + 2ρ = 6ρ ,

so that

αβ ≤ (1/4) E dist²( φ(X − X′) + φ(Y − Y′), {−2, 0, 2} ) 1_{X−X′>1, Y−Y′>1} ≤ 9ρ² .    (5.5)

Assume Var(X), Var(Y) > 25ρ, so that ‖X − X′‖₂, ‖Y − Y′‖₂ > 7√ρ. Then by (5.3) we have

2√(2α) = ‖φ(X − X′)‖₂ > 7√ρ − 2ρ > 5√ρ ,

so that α > 3ρ. In a similar way from (5.4) we get β > 3ρ, and therefore αβ > 9ρ², which contradicts
(5.5).
Lemma 5.7. Let X1, X2, . . . , Xn be independent square-integrable random variables and let S = ∑_{i=1}^n Xi.
Assume E(|S| − 1)² ≤ ρ² for some ρ ∈ [0, 1]. Then there exists some k ∈ [n] such that Var(S − Xk) ≤ 50ρ.
Proof. Lemma 5.7 follows from Lemma 5.6 in a way similar to that in which we have deduced Theorem 1.3
from Lemma 4.1: we look for a minimal I ⊆ [n] = {1, 2, . . . , n} such that Var(∑_{i∈I} Xi) > 25ρ,
then we choose any k ∈ I and from Lemma 5.6 we infer that Var(∑_{i∈[n]\I} Xi) ≤ 25ρ. By the minimality
of I there is also Var(∑_{i∈I\{k}} Xi) ≤ 25ρ, which ends the proof.
Now we may finally state a result which does not use any additional properties of the marginal
distributions.
Theorem 5.8. Under A & N, there exists some k ∈ [n] such that
‖f − (a0 + ak πk)‖_{L2} ≤ 8√ρ   and   ‖f − sign(a0 + ak πk)‖_{L2} ≤ 16√ρ .
Proof. Let Xi = ai ξi for i ∈ [n], so that S = a0 + ∑_{i=1}^n Xi. Then

E(|S| − 1)² = ‖|fA| − 1‖²_{L2} ≤ ‖f − fA‖²_{L2} = ρ²

because of the pointwise inequality ||fA| − 1| ≤ |f − fA|. Lemma 5.7 easily implies that there is some
k ∈ [n] such that ∑_{i∈[n]\{k}} ai² ≤ 50ρ. Thus

‖f − (a0 + ak πk)‖²_{L2} = ‖f − fA‖²_{L2} + ‖ ∑_{i∈[n]\{k}} ai πi ‖²_{L2} ≤ ρ² + 50ρ ≤ 64ρ
(recall that ρ is always in [0, 1] under A & N), which proves the first assertion. The second assertion of the
theorem follows from the first one in the same way in which Corollary 5.2 follows from Theorem 5.1.
Under A & N, let n = 2 and P(ξi = √(β/α)) = α, P(ξi = −√(α/β)) = β for i ∈ {1, 2}, where
α ∈ (0, 1) and β = 1 − α. Then the β − √(αβ)ξi’s are {0, 1}-valued random variables, so that f(x) given by

2(β − √(αβ)x1)(β − √(αβ)x2) − 1 = (2β² − 1) − 2β^{3/2}α^{1/2}(x1 + x2) + 2αβ x1x2

is a Boolean function on {−√(α/β), √(β/α)}² equipped with the measure µ_{ξ1} ⊗ µ_{ξ2}. A simple analysis
shows that ρ = dist_{L2}(f, A) = 2αβ, while the L2-distance from f to any function of a single coordinate
is not less than 2β^{3/2}α^{1/2} = Θ(ρ^{1/2}) as α → 0⁺ (and, consequently, β → 1⁻ and ρ → 0⁺). Thus the
O(√ρ) general bound of Theorem 5.8 cannot be improved even on the two-dimensional biased discrete cube.
On the other hand, some FKN-type bounds were obtained in [6] and [5] for the biased discrete cube of
arbitrary dimension, in terms of the bias parameter. The approach used in the proof of Theorem 5.3 can
be effectively adapted to the case of the biased discrete cube if the Bonami–Beckner estimate is replaced
by the hypercontractive bounds of [12], see [10].
5.5 Boolean versus bounded
It is natural to look for an analogue of the FKN theorem for [−1, 1]-valued functions; however, there is
no hope for estimates as good as in the Boolean case. Recall that the FKN theorem states that a Boolean
function on the discrete cube (equipped with the uniform measure) which is close in L2 to A must also be
at a comparable L2-distance from a function which is both Boolean and affine (i. e., some function
from Aπ ).
Let A[−1,1] denote the set of [−1, 1]-valued affine functions on the cube:
A_{[−1,1]} = { ∑_{i=0}^n bi πi : ∑_{i=0}^n |bi| ≤ 1 } .
Let ψ(t) = 1 for t > 1, ψ(t) = −1 for t < −1, and ψ(t) = t for t ∈ [−1, 1], and let us define functions
f, g : {−1, 1}ⁿ → R by g(x) = s⁻¹n^{−1/2} ∑_{i=1}^n xi and f(x) = ψ(g(x)) for some positive parameter s. Note
that

dist²_{L2}( g, A_{[−1,1]} ) ≥ inf_{∑|bi|≤1} ∑_{i=1}^n (s⁻¹n^{−1/2} − bi)²
  ≥ inf_{∑|bi|≤1} ∑_{i=1}^n (s⁻²n⁻¹ − 2s⁻¹n^{−1/2}bi) ≥ s⁻² − 2s⁻¹n^{−1/2}

and dist_{L2}(g, A_{[−1,1]}) ≤ ‖g − 0‖_{L2} = s⁻¹, so that lim_{n→∞} dist_{L2}(g, A_{[−1,1]}) = s⁻¹. Thus for the [−1, 1]-
valued function f there is

lim_{n→∞} dist_{L2}(f, A) ≤ lim_{n→∞} ‖f − g‖_{L2} = ( E(|s⁻¹G| − 1)² 1_{|G|≥s} )^{1/2} = O(e^{−s²/4})

as s → ∞, where G denotes the standard N(0, 1) Gaussian variable, while we have only

dist_{L2}(f, A_{[−1,1]}) ≥ dist_{L2}(g, A_{[−1,1]}) − ‖f − g‖_{L2} ,

so that lim_{n→∞} dist_{L2}(f, A_{[−1,1]}) = Θ(s⁻¹) as s → ∞.
The above example, demonstrating the gap between O(e^{−s²/4}) and Θ(s⁻¹), is as bad as it gets—in [10]
Nayar proved that for s > 0 and every function f : {−1, 1}ⁿ → [−1, 1] with dist_{L2}(f, A) = e^{−s²/4} we
have dist_{L2}(f, A_{[−1,1]}) ≤ 36s⁻¹.
6 Banach space setting
The main result of this section, Theorem 6.10, is an advanced extension of Lemma 2.2. To prove it, we
will need to develop some new tools.
In what follows, (V, k · k) denotes a separable real Banach space, its continuous dual space (of bounded
linear functionals on V ) is denoted by V 0 , and + stands for Minkowski addition. By dist we will mean
the distance in the norm k · k. Readers unfamiliar with Banach spaces may find it convenient to assume
additionally that V is a finite-dimensional normed vector space and think about Euclidean spaces instead
of Hilbert spaces.
For a finite A ⊂ V , positive ∆, ε, δ and a V -valued random vector X we will say that:
• A is ∆-separated if either |A| ≥ 2 and for any two distinct x, y ∈ A there is ‖x − y‖ ≥ ∆, or |A| = 1; we
define the separation constant of A by ∆(A) = min{‖x − y‖ : x, y ∈ A and x ≠ y}. Clearly, ∆(A) = ∞
if |A| = 1;
• X is (ε, δ )-close to A if P(dist(X, A) > ε) ≤ δ ;
• X is (ε, δ)-present around A if there is P(‖X − a‖ ≤ ε) > δ for every a ∈ A.
Note that if X is (ε, δ )-close to A, then it is (ε, δ )-close to any finite set B containing A. Similarly, if X is
(ε, δ )-present around A, then it is (ε, δ )-present around any finite set B contained in A.
We need some simple lemmas. The first of them is obvious.
Lemma 6.1. Let X be a V -valued random vector which is (ε, δ )-close to some nonempty finite A ⊂ V
with some ε > 0 and δ ∈ [0, 1]. Then there is some a ∈ A such that P(kX − ak ≤ ε) ≥ (1 − δ )/|A|.
Lemma 6.2. Let A, B ⊂ V be finite and ∆-separated for some ∆ > 0 and assume that X is a V -valued
random vector which is both (ε1 , δ )-close to A and (ε2 , δ )-present around B for some δ > 0 and ε1 , ε2 > 0
such that ε1 + ε2 < ∆/2. Then |B| ≤ |A|.
Proof. Let b ∈ B. Since P(‖X − b‖ ≤ ε2) > δ and P(dist(X, A) > ε1) ≤ δ, there is P(‖X − b‖ ≤
ε2 and dist(X, A) ≤ ε1) > 0, so that dist(b, A) ≤ ε1 + ε2. Note that there is only one a_b ∈ A such that
‖b − a_b‖ ≤ ε1 + ε2 because we assume that A is ∆-separated and ε1 + ε2 < ∆/2. For a similar reason (B
is also ∆-separated), the mapping b ↦ a_b is injective. This ends the proof.
Lemma 6.3. Let X1, X2, . . . , Xm be independent V-valued random vectors and let S = ∑_{i=1}^m Xi. Assume
that S is (ε, δ )-close to a nonempty finite A ⊂ V for some ε, δ > 0. Then there exist vectors v1 , v2 , . . . ,
vm ∈ V such that Xi is (ε, δ )-close to vi + A for every i ∈ [m].
Proof. This follows from the Fubini theorem (see the beginning of the proof of Lemma 2.1).
Lemma 6.4. Let ∆ > 0 and let A and B be finite ∆-separated subsets of V with |A|, |B| ≥ 2. Then there
exists some C ⊆ A + B = {a + b; a ∈ A, b ∈ B} with |C| > max(|A|, |B|) which is also ∆-separated.
Proof. Without loss of generality we may assume that |A| ≥ |B|. Let b and b′ be arbitrary distinct elements
of B. Let ϕ ∈ V′ be such that ‖ϕ‖_{V′} = 1 and ϕ(b′ − b) = ‖b′ − b‖ (the existence of such a functional is
guaranteed by the Hahn–Banach theorem). Finally, let â = arg max_{a∈A} ϕ(a) (any maximizer will do if
there is more than one). Then C = (A + b) ∪ {â + b′} has more elements than A and is ∆-separated. These
two facts follow since, for a ∈ A,

‖(â + b′) − (a + b)‖ ≥ ϕ(â + b′ − a − b) = ϕ(â) − ϕ(a) + ϕ(b′ − b) ≥ ‖b′ − b‖ ≥ ∆ .
By an obvious induction we obtain the following corollary.
Corollary 6.5. Let ∆ > 0 and let A1 , . . . , Am be ∆-separated finite subsets of V with |Ai | ≥ 2 for i ∈ [m].
Then there exists some C ⊆ A1 + A2 + · · · + Am with |C| > m which is also ∆-separated.
The next corollary easily follows (we leave it as an exercise).
Corollary 6.6. Let ∆, ε1, . . . , εm, δ1, . . . , δm > 0 and let A1, . . . , Am be ∆-separated finite subsets of V
with |Ai| ≥ 2 for i ∈ [m]. Assume that X1, . . . , Xm are independent V-valued random vectors and Xi is
(εi, δi)-present around Ai for all i ∈ [m]. Then there exists some ∆-separated set C ⊆ A1 + · · · + Am with
|C| > m such that ∑_{i=1}^m Xi is (∑_{i=1}^m εi, ∏_{i=1}^m δi)-present around C.
Lemma 6.7. Let ∆ > 0 and let A and B be finite ∆-separated subsets of a Hilbert space H. Then there
exists some C ⊆ A + B with |C| ≥ |A| + |B| − 1 which is also ∆-separated.
Proof. Without loss of generality we may assume that 0 ∈ A. Let A = {0, u1, . . . , um} and B = {v1, . . . , vn}.
For u ∈ H let b(u) = arg max_{v∈B} ⟨u, v⟩ (any choice will do if there is more than one maximizer). Then we
can just take C = {v1, . . . , vn, u1 + b(u1), . . . , um + b(um)}. Obviously, C ⊆ A + B. By the definition of b(u_j),
for any i ∈ [n] and j ∈ [m] we have ⟨u_j, b(u_j)⟩ ≥ ⟨u_j, vi⟩, so that

‖u_j‖ ‖u_j + b(u_j) − vi‖ ≥ ⟨u_j, u_j + b(u_j) − vi⟩ ≥ ⟨u_j, u_j⟩ = ‖u_j‖²

and therefore ‖(u_j + b(u_j)) − vi‖ ≥ ‖u_j‖ ≥ ∆.
Similarly, for any distinct j, j′ ∈ [m] we have ⟨u_j, b(u_j)⟩ ≥ ⟨u_j, b(u_j′)⟩ and ⟨u_j′, b(u_j′)⟩ ≥ ⟨u_j′, b(u_j)⟩,
so that ⟨u_j − u_j′, b(u_j) − b(u_j′)⟩ ≥ 0. Hence

‖u_j − u_j′‖ ‖(u_j + b(u_j)) − (u_j′ + b(u_j′))‖ ≥ ⟨u_j − u_j′, u_j − u_j′ + b(u_j) − b(u_j′)⟩
  ≥ ⟨u_j − u_j′, u_j − u_j′⟩ = ‖u_j − u_j′‖²

and thus ‖(u_j + b(u_j)) − (u_j′ + b(u_j′))‖ ≥ ‖u_j − u_j′‖ ≥ ∆.
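The construction used in this proof is completely explicit, so it can be exercised numerically. Below is a minimal Python sketch for the Euclidean case (all function names and the random test sets are ours): it builds the set C from the proof and checks that it has at least |A| + |B| − 1 elements and keeps the separation constant.

```python
# Numerical check of the construction from the proof of Lemma 6.7 (Euclidean case).
import numpy as np

def min_separation(points):
    """Smallest pairwise distance within a finite set of points (rows of an array)."""
    pts = np.asarray(points, dtype=float)
    d = [np.linalg.norm(p - q) for i, p in enumerate(pts) for q in pts[i + 1:]]
    return min(d) if d else np.inf

def merge_separated(A, B):
    """Construction from the proof: a Delta-separated C inside A + B
    with at least |A| + |B| - 1 elements (Euclidean inner product)."""
    A, B = np.asarray(A, float), np.asarray(B, float)
    shift = A[0].copy()                # translate A so that it contains 0
    us = A[1:] - shift                 # the points u_1, ..., u_m
    b = lambda u: B[np.argmax(B @ u)]  # b(u) = argmax_{v in B} <u, v>
    C_shifted = [v for v in B] + [u + b(u) for u in us]
    return np.asarray(C_shifted) + shift   # shift back: C lies inside A + B

rng = np.random.default_rng(0)
for _ in range(200):
    A = np.unique(rng.integers(-4, 5, size=(4, 3)), axis=0) * 2.0
    B = np.unique(rng.integers(-4, 5, size=(3, 3)), axis=0) * 2.0
    delta = min(min_separation(A), min_separation(B))
    C = merge_separated(A, B)
    assert len(C) >= len(A) + len(B) - 1
    assert min_separation(C) >= delta - 1e-9
```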
Again, an obvious induction yields the following corollary.
Corollary 6.8. Let ∆ > 0 and A1, . . . , Am be nonempty, ∆-separated and finite subsets of a Hilbert space.
Then there exists some ∆-separated C ⊆ A1 + · · · + Am with |C| > ∑_{i=1}^m (|Ai| − 1).
The next corollary easily follows (we also leave it as an exercise).
Corollary 6.9. Let ∆, ε1, . . . , εm, δ1, . . . , δm > 0 and let A1, . . . , Am be finite ∆-separated subsets of a
Hilbert space H. Assume that X1, . . . , Xm are independent H-valued random vectors and Xi is (εi, δi)-
present around Ai for all i ∈ [m]. Then there exists some ∆-separated set C ⊆ A1 + · · · + Am with
|C| > ∑_{i=1}^m (|Ai| − 1) such that ∑_{i=1}^m Xi is (∑_{i=1}^m εi, ∏_{i=1}^m δi)-present around C.
Now we are in position to prove a structural theorem.
Theorem 6.10. Let A be a finite subset of V with |A| ≥ 2. Let ε and δ be positive numbers satisfying

ε < ∆(A)/(2(|A| + 1))   and   δ ≤ |A|^{−|A|} .

Assume that ξ1, . . . , ξn are independent V-valued random vectors such that S = ∑_{i=1}^n ξi is (ε, δ)-close to
A. For I ⊆ [n] let SI = ∑_{i∈I} ξi. Then there exist a nonnegative integer k < |A| and {i1, . . . , ik} ⊆ [n] such
that

P( ‖S_{[n]\{i1,...,ik}} − v‖ ≤ |A|ε ) ≥ ( 1 − δ − (|A| − 1)δ^{1/|A|} )^{|A|}

for some v ∈ V . Consequently,

P( ‖S_{[n]\{i1,...,ik}} − v‖ > |A|ε ) ≤ |A|(|A| − 1)δ^{1/|A|} + |A|δ ≤ |A|²δ^{1/|A|} .

Moreover, if V is a Hilbert space and k > 0, then there are vectors v1, . . . , vk ∈ V and nonempty sets
B1, . . . , Bk ⊆ A with ∑_{l=1}^k (|Bl| − 1) < |A| such that ξ_{il} is (ε, |A|δ^{1/k})-close, and thus also (ε, |A|δ^{1/(|A|−1)})-
close, to vl + Bl for every l ∈ [k].
Proof. We will call I ⊆ [n] relevant if there exist some two points x, y ∈ V such that ‖x − y‖ ≥ ∆(A),
P(‖SI − x‖ ≤ ε) > δ^{1/|A|} and P(‖SI − y‖ ≤ ε) > δ^{1/|A|}. Since ε < ∆(A)/2, all relevant sets are nonempty.
We will inductively construct a sequence of relevant sets I1, I2, . . .. Let I1 be a minimal (in the sense of
inclusion) relevant subset of [n]—any minimizer will do if there is more than one. Then, having defined
I1, . . . , Is, we choose for I_{s+1} a minimal relevant subset of [n] \ ⋃_{l=1}^s Il. We end up with a collection of
pairwise disjoint relevant sets I1, . . . , Ik, and we know that J = [n] \ ⋃_{l=1}^k Il does not contain any relevant
set and thus is not relevant.

Let us assume for a while that k ≥ |A|. By expressing S as

( ∑_{l=1}^{|A|} S_{Il} ) + S_{[n] \ ⋃_{l=1}^{|A|} Il}

and using Lemma 6.3 (with m = 2) we deduce that ∑_{l=1}^{|A|} S_{Il} is (ε, δ)-close to some shift of A, i. e., to a
∆(A)-separated set of cardinality |A|. On the other hand, the Il's are relevant so that the S_{Il}'s are (ε, δ^{1/|A|})-present
around some ∆-separated sets of cardinality greater than 1. Thus, by Corollary 6.6 the sum ∑_{l=1}^{|A|} S_{Il} is
(|A|ε, δ)-present around some ∆(A)-separated set of cardinality greater than |A| and hence Lemma 6.2
(used for ε2 = |A|ε) implies that |A| < |A|. This contradiction proves that k < |A|.

Select arbitrary i1 ∈ I1, . . . , ik ∈ Ik. The way in which we chose the Il's guarantees that the sets I1\{i1}, . . . , Ik\{ik}
are not relevant. By expressing S as SJ + S_{I1\{i1}} + · · · + S_{Ik\{ik}} + ξ_{i1} + · · · + ξ_{ik} and using again
Lemma 6.3 (now with m = 2k + 1) we prove that there exist vectors w, w1, . . . , wk, v1, . . . , vk such that
SJ is (ε, δ)-close to w + A, S_{Il\{il}} is (ε, δ)-close to wl + A, and ξ_{il} is (ε, δ)-close to vl + A for every
l ∈ [k]. On the other hand, the Il \ {il}'s are not relevant and the shifts of A are ∆(A)-separated, so that for
each l ∈ [k] there exists at most one al ∈ A such that P( ‖S_{Il\{il}} − wl − al‖ ≤ ε ) > δ^{1/|A|}. Thus

P( ‖S_{Il\{il}} − (wl + al)‖ > ε ) ≤ (|A| − 1)δ^{1/|A|} + δ

for all l ∈ [k] and similarly we prove that there exists some a ∈ A such that

P( ‖SJ − (w + a)‖ > ε ) ≤ (|A| − 1)δ^{1/|A|} + δ .

Hence for S_{[n]\{i1,...,ik}} = SJ + ∑_{l=1}^k S_{Il\{il}} and v = (w + a) + ∑_{l=1}^k (wl + al) we have

P( ‖S_{[n]\{i1,...,ik}} − v‖ ≤ (k + 1)ε ) ≥ ( 1 − (|A| − 1)δ^{1/|A|} − δ )^{k+1}

and the first assertion of the theorem easily follows from k < |A|. Note that δ ≤ |A|^{−|A|} implies (|A| −
1)δ^{1/|A|} + δ < 1.

Now let us additionally assume that k > 0 and V is a Hilbert space. For l ∈ [k] let

Bl = { a ∈ A : P( ‖ξ_{il} − (a + vl)‖ ≤ ε ) > δ^{1/k} } .

Note that δ ≤ |A|^{−|A|} implies (1 − δ)/|A| ≥ δ^{1/(|A|−1)} ≥ δ^{1/k}, so that by Lemma 6.1 we have Bl ≠ ∅. We
also easily see that ξ_{il} is (ε, δ + (|A| − 1)δ^{1/k})-close and thus also (ε, |A|δ^{1/k})-close to vl + Bl. Since ξ_{il} is
(ε, δ^{1/(|A|−1)})-present around vl + Bl for every l ∈ [k], by Corollary 6.9 the sum ∑_{l=1}^k ξ_{il} is (kε, δ^{k/(|A|−1)})-
present, and thus also ((|A| − 1)ε, δ)-present, around some ∆(A)-separated C ⊆ B1 + · · · + Bk + ∑_{l=1}^k vl
with |C| > ∑_{l=1}^k (|Bl| − 1). From Lemma 6.2, used for ε2 = (|A| − 1)ε, we get |A| ≥ |C|, which ends the
proof.
Remark 6.11. Interestingly, only a slightly worse quantitative description of the structure of the random
variables ξ_{il} is possible also without assuming that V is a Hilbert space—we will briefly sketch the
argument. First replace ξ_{il} by

Yl = ∑_{a∈A} 1_{‖ξ_{il} − (vl + a)‖ ≤ ε} · (vl + a) .
Note that the Yl's are independent again and P(‖ξ_{il} − Yl‖ > ε) ≤ δ. Now observe that all the Yl's take values in a
finite-dimensional space W which is the linear span of the set A and the vectors vl. Thus dim(W) ≤ |A| + k <
2|A|. Now it suffices to recall the classical theorem of F. John, which states that the Banach–Mazur distance
of any N-dimensional Banach space to ℓ₂^N is not greater than √N, deal with the Yl's in an appropriate Hilbert
space as before, and then transfer the obtained bounds back to the Banach space setting.
We finish by posing a problem which we were not able to solve.
Question 6.12. Is Lemma 6.7 valid in an arbitrary Banach (not only Hilbert) space?
7 Addendum
Most of the results and proofs contained in the present paper were presented at several conferences
and seminars, including the second-named author’s lecture at the Analysis of Boolean Functions: New
Directions and Applications Simons Symposium, February 5–11, 2012. During the second-named
author’s Fall 2013 visit to UC Berkeley’s Simons Institute for the Theory of Computing, he learned of
new results obtained independently by Aviad Rubinstein and Shmuel (Muli) Safra: Rubinstein’s MSc
thesis [13], written under the supervision of Safra, and their joint research paper [14]. Let us briefly
explain how their results relate to ours. We will need the following estimates.
Lemma 7.1. Let X and Y be independent square-integrable real random variables such that Var(X + Y) >
0. Assume that, for some ρ ∈ [0, 1], E(|X + Y| − 1)² ≤ ρ². Then Var(X) ≤ 800ρ²/Var(X + Y) or
Var(Y) ≤ 800ρ²/Var(X + Y).
Proof. Let us adapt the proof of Lemma 5.6. Using the same notation as there, let us also set σ =
(Var(X + Y))^{1/2}. Since Var(X) + Var(Y) = σ², without loss of generality we may and will assume that
Var(X) ≥ σ²/2. Now it suffices to show that Var(Y) ≤ 800ρ²σ⁻².

Indeed, if σ < 6√ρ, then

Var(Y) = σ² − Var(X) ≤ σ²/2 ≤ 6⁴ρ²σ⁻²/2 ≤ 800ρ²σ⁻² .

If, on the other hand, σ ≥ 6√ρ ≥ 6ρ, then by (5.3) we have

2√(2α) = ‖φ(X − X′)‖₂ ≥ ‖X − X′‖₂ − 2ρ = (2 Var(X))^{1/2} − 2ρ ≥ σ − 2ρ ≥ (2/3)σ ,
so that α ≥ σ²/18. Thus, by (5.5), β ≤ 9ρ²/α ≤ 162ρ²σ⁻². Since, by (5.4),

(2 Var(Y))^{1/2} = ‖Y − Y′‖₂ ≤ 2ρ + ‖φ(Y − Y′)‖₂ = 2ρ + 2√(2β) ,

we arrive at

(2 Var(Y))^{1/2} ≤ 2ρ + 36ρσ⁻¹ ≤ 40ρσ⁻¹ ,

the last inequality following from the fact that

σ ≤ ‖X + Y‖₂ ≤ 1 + ‖|X + Y| − 1‖₂ ≤ 1 + ρ ≤ 2 .

Therefore, Var(Y) ≤ 800ρ²σ⁻², and the proof is finished.
Lemma 7.2. Let X1, X2, . . . , Xn be independent square-integrable random variables and let S = ∑_{i=1}^n Xi.
Assume that E(|S| − 1)² ≤ ρ² for some ρ ∈ [0, 1], and that Var(S) > 0. Then there exists some k ∈ [n]
such that Var(S − Xk) ≤ 1600ρ²/Var(S).
Proof. Lemma 7.2 can be deduced from Lemma 7.1 in the same way in which Lemma 5.7 follows from
Lemma 5.6.
Now we are in position to state the following refinement of Theorem 5.8—note that in the case
Var(f) ≤ ρ the bound of Theorem 5.8 is trivial since always

‖f − a0 − ak πk‖_{L2} ≤ ‖f − a0‖_{L2} = (Var(f))^{1/2} ,

while for Var(f) ≥ ρ we have ρ/(Var(f))^{1/2} ≤ √ρ.
Corollary 7.3. Under A & N (see Subsection 5.1), there is some k ∈ [n] such that
‖f − (a0 + ak πk)‖_{L2} ≤ 41ρ/(Var(f))^{1/2} ,   and   ‖f − sign(a0 + ak πk)‖_{L2} ≤ 82ρ/(Var(f))^{1/2} .
Proof. Just as in the proof of Theorem 5.8, we arrive at

‖f − (a0 + ak πk)‖²_{L2} = ‖f − fA‖²_{L2} + ‖ ∑_{i∈[n]\{k}} ai πi ‖²_{L2}
  ≤ ρ² + 1600ρ²/Var(S) ≤ 1601ρ²/Var(S) ,

where we have used Lemma 7.2 and the fact that Var(S) = ∑_{i=1}^n ai² ≤ 1. The second assertion of the
corollary follows from the first one in the same way in which Corollary 5.2 follows from Theorem 5.1.
The above corollary should be compared to the main result of Rubinstein’s MSc thesis, [13, Corol-
lary 10]. Note that the setting introduced in Subsection 5.1 is slightly incompatible with Rubinstein’s
“pair-wise disjoint subsets of the inputs”—one would need to consider the product measure µ on a more
abstract probability space than just Rn to recover [13, Corollary 10] as a special case of Corollary 7.3.
Still, [13, Corollary 10] may be easily deduced from Lemma 7.2 by an obvious modification of the proof
of Corollary 7.3, in which the ai πi's should be replaced by Rubinstein's restrictions fi and a0 turned into
f̂(∅).
Acknowledgements
Research of the second-named author was partially carried out during his stay at the Fields Institute in
Toronto. He gratefully acknowledges the hospitality of the institute. The authors thank Prof. Stanisław
Kwapień for providing the reference to the work of Marcinkiewicz and Zygmund.
References
[1] WILLIAM BECKNER: Inequalities in Fourier analysis. Ann. of Math., 102(1):159–182, 1975.
[doi:10.2307/1970980] 446, 457

[2] ALINE BONAMI: Étude des coefficients de Fourier des fonctions de L^p(G). Ann. Inst. Fourier,
20(2):335–402, 1970. EuDML. 446, 457

[3] IRIT DINUR: The PCP theorem by gap amplification. J. ACM, 54(3:12), 2007. Preliminary versions
in STOC'06 and ECCC. [doi:10.1145/1236457.1236459] 446

[4] EHUD FRIEDGUT, GIL KALAI, AND ASSAF NAOR: Boolean functions whose Fourier transform is
concentrated on the first two levels. Adv. in Appl. Math., 29(3):427–437, 2002. [doi:10.1016/S0196-8858(02)00024-6] 446, 455, 458

[5] GUY KINDLER: Property Testing, PCP and Juntas. Ph. D. thesis, Tel Aviv University, 2002.
Available at author's website. 446, 458, 460

[6] GUY KINDLER AND SHMUEL SAFRA: Noise-resistant Boolean functions are juntas. Preprint,
2002. Available at author's website. 446, 458, 460

[7] HERMANN KÖNIG, CARSTEN SCHÜTT, AND NICOLE TOMCZAK-JAEGERMANN: Projection
constants of symmetric spaces and variants of Khintchine's inequality. J. Reine Angew. Math.,
1999(511):1–42, 1999. [doi:10.1515/crll.1999.511.1] 446

[8] RAFAŁ LATAŁA AND KRZYSZTOF OLESZKIEWICZ: On the best constant in the Khinchin-Kahane
inequality. Studia Math., 109(1):101–104, 1994. EuDML. 450

[9] JÓZEF MARCINKIEWICZ AND ANTONI ZYGMUND: Sur les fonctions indépendantes. Fund. Math.,
29(1):60–90, 1937. EuDML. 450

[10] PIOTR NAYAR: FKN theorem on the biased cube. Colloq. Math., 137(2):253–261, 2014.
[doi:10.4064/cm137-2-9, arXiv:1311.3179] 460, 461

[11] RYAN O'DONNELL: Analysis of Boolean Functions. Cambridge University Press, 2014. 447

[12] KRZYSZTOF OLESZKIEWICZ: On a nonsymmetric version of the Khinchine-Kahane inequality.
In Stochastic Inequalities and Applications, volume 56 of Progress in Probability, pp. 157–168.
Springer/Birkhäuser, 2003. [doi:10.1007/978-3-0348-8069-5_11] 460
[13] AVIAD RUBINSTEIN: Boolean functions whose Fourier transform is concentrated on pair-wise
disjoint subsets of the inputs. Master's thesis, School of Computer Science, Tel-Aviv University,
October 2012. 465, 466

[14] AVIAD RUBINSTEIN AND MULI SAFRA: Boolean functions whose Fourier transform is concen-
trated on pairwise disjoint subsets of the input, 2015. [arXiv:1512.09045] 465

[15] WALTER RUDIN: Functional Analysis. McGraw-Hill, Inc., New York, 2nd edition, 1991. 447

[16] STANISŁAW J. SZAREK: On the best constants in the Khinchin inequality. Studia Math., 58(2):197–
208, 1976. EuDML. 450
AUTHORS
Jacek Jendrej
Ph. D. student
Centre de Mathématiques Laurent Schwartz
Ecole Polytechnique
jacek jendrej polytechnique edu
http://jacek.jendrej.perso.math.cnrs.fr/
Krzysztof Oleszkiewicz
Professor
Institute of Mathematics
University of Warsaw
koles mimuw edu pl
Web page at U. Warsaw
Jakub O. Wojtaszczyk
Software engineer
Google
onufry google com
http://research.google.com/pubs/author58121.html
ABOUT THE AUTHORS
JACEK J ENDREJ is a Ph. D. student at Ecole Polytechnique advised by Yvan Martel and Frank
Merle. He is interested in nonlinear evolution partial differential equations, especially
in near-soliton dynamics for critical nonlinearities, energy estimates and compactness
methods.
K RZYSZTOF O LESZKIEWICZ is a professor at the University of Warsaw’s Institute of
Mathematics, where he received his Ph. D. in 1997 under the supervision of Stanisław
Kwapień. His research interests lie in probability theory and analysis, especially in
various inequalities, including tail and moment estimates for sums of independent random
variables, concentration bounds and hypercontractivity. He has no idea about Computer
Science.
JAKUB O NUFRY W OJTASZCZYK is a software engineer at Google. Besides developing
cluster-level operating systems, he has some knowledge of algorithms and of probability
theory.