THEORY OF COMPUTING, Volume 9 (29), 2013, pp. 889–896
www.theoryofcomputing.org

NOTE

SPECIAL ISSUE: ANALYSIS OF BOOLEAN FUNCTIONS



  Hypercontractivity Via the Entropy Method
                                      Eric Blais∗                   Li-Yang Tan†
                 Received January 15, 2013; Revised June 30, 2013; Published December 7, 2013




       Abstract: The Hypercontractive Inequality of Bonami (1968, 1970) and Gross (1975) is
       equivalent to the following statement: for every $q > 2$ and every function $f \colon \{-1,1\}^n \to \mathbb{R}$
       of Fourier degree at most $m$,
       $$\|f\|_q \le (q-1)^{m/2} \|f\|_2.$$
       The original proof of this inequality is analytical. Friedgut and Rödl (2001) gave an
       alternative proof of the slightly weaker Hypercontractive Inequality $\|f\|_4 \le 28^{m/4} \|f\|_2$ by
       combining tools from information theory and combinatorics. Specifically, they recast the
       problem as a statement about multi-hypergraphs, generalized Shearer's Lemma, and used
       probabilistic arguments to obtain the inequality.
           We show that Shearer's Lemma and elementary arguments about the entropy of random
       variables are sufficient to recover the optimal Hypercontractive Inequality for all even
       integers $q$.

ACM Classification: G.2, G.3, F.1.3
AMS Classification: 68Q87
Key words and phrases: Boolean functions, harmonic analysis, hypercontractivity, information theory,
entropy method
   *Research supported by a Simons Postdoctoral Fellowship. Part of this research was completed while the author was a postdoctoral research fellow at Carnegie Mellon University.
   †Part of this research was completed while the author was visiting CMU.



© 2013 Eric Blais and Li-Yang Tan
Licensed under a Creative Commons Attribution License (CC-BY)          DOI: 10.4086/toc.2013.v009a029

1     Introduction
The Hypercontractive Inequality plays a fundamental role in the analysis of Boolean functions. Discovered
independently¹ by Aline Bonami [3, 4] and several years later by Leonard Gross [14], the inequality
is concerned with the noise operator $T_\rho$ that acts on functions $f \colon \{-1,1\}^n \to \mathbb{R}$ via $T_\rho f(x) = \mathbf{E}[f(y)]$,
where $y$ is a $\rho$-correlated copy of $x$ (i.e., $y$ is drawn from the product distribution where $\mathbf{E}[y_i x_i] = \rho$ for
all $i \in [n]$). Intuitively, the noise operator smooths $f$ by replacing each $f(x)$ with the average of $f$'s values
in a neighborhood around $x$; this effect is also evident when considering the Fourier expansion
$$T_\rho f = \sum_{S \subseteq [n]} \rho^{|S|} \hat{f}(S)\,\chi_S, \qquad \text{where} \quad \chi_S(x) = \prod_{i \in S} x_i.$$
Here we see that each Fourier coefficient $\hat{f}(S)$ is attenuated by a factor $\rho^{|S|}$ that gets harsher the larger $|S|$
is. The Hypercontractive Inequality quantifies this smoothing effect by showing that applying $T_\rho$ to $f$
allows one to bound its 2-norm by a smaller norm:

Hypercontractive Inequality. Let $f \colon \{-1,1\}^n \to \mathbb{R}$ and $0 < \rho < 1$. Then $\|T_\rho f\|_2 \le \|f\|_{1+\rho^2}$.
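As a numerical sanity check (not part of the original argument), one can represent a function by its Fourier coefficients, apply $T_\rho$ by attenuating each coefficient, and compare the two norms by brute force over $\{-1,1\}^n$. The helper names below are our own illustration, not from the paper.

```python
import itertools
import math
import random

def evaluate(coeffs, x):
    # f(x) = sum_S fhat(S) * prod_{i in S} x_i
    return sum(c * math.prod(x[i] for i in S) for S, c in coeffs.items())

def norm(coeffs, n, p):
    # ||f||_p = (E_x |f(x)|^p)^(1/p), expectation over uniform x in {-1,1}^n
    total = sum(abs(evaluate(coeffs, x)) ** p
                for x in itertools.product([-1, 1], repeat=n))
    return (total / 2 ** n) ** (1 / p)

def noise(coeffs, rho):
    # T_rho f has Fourier coefficients rho^{|S|} * fhat(S)
    return {S: rho ** len(S) * c for S, c in coeffs.items()}

random.seed(0)
n, rho = 3, 0.5
coeffs = {S: random.uniform(-1, 1)
          for k in range(n + 1)
          for S in itertools.combinations(range(n), k)}
lhs = norm(noise(coeffs, rho), n, 2)      # ||T_rho f||_2
rhs = norm(coeffs, n, 1 + rho ** 2)       # ||f||_{1 + rho^2}
assert lhs <= rhs + 1e-12
```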
    The Hypercontractive Inequality is equivalent (via duality; see, e.g., [22]) to the following inequality.

Theorem 1.1. For any $f \colon \{-1,1\}^n \to \mathbb{R}$, let $f^{(=m)} = \sum_{|S|=m} \hat{f}(S)\chi_S$ denote the projection of $f$ to its
degree-$m$ part. Then for all $q > 2$,
$$\|f^{(=m)}\|_q \le (q-1)^{m/2} \cdot \|f^{(=m)}\|_2.$$
That is, the $2 \to q$ norm of the projection operator $P^{=m}$ that maps $f \colon \{-1,1\}^n \to \mathbb{R}$ to its degree-$m$ part
is at most $(q-1)^{m/2}$. We denote this as $\|P^{=m}\|_{2 \to q} \le (q-1)^{m/2}$.
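For small parameters the theorem can be checked by brute force. The sketch below (our own illustration, with arbitrary random coefficients) verifies the $q = 4$ case for a random degree-$m$ homogeneous multilinear polynomial, and also confirms Parseval's identity for the 2-norm.

```python
import itertools
import math
import random

random.seed(1)
n, m, q = 4, 2, 4
# a random projection to degree m: fhat(S) supported on |S| = m only
fhat = {S: random.uniform(-1, 1) for S in itertools.combinations(range(n), m)}

def f(x):
    return sum(c * math.prod(x[i] for i in S) for S, c in fhat.items())

points = list(itertools.product([-1, 1], repeat=n))

def norm(p):
    return (sum(abs(f(x)) ** p for x in points) / len(points)) ** (1 / p)

# Theorem 1.1: ||f^{(=m)}||_q <= (q-1)^{m/2} * ||f^{(=m)}||_2
assert norm(q) <= (q - 1) ** (m / 2) * norm(2) + 1e-12
# Parseval: ||f^{(=m)}||_2^2 equals the sum of squared Fourier coefficients
assert abs(norm(2) ** 2 - sum(c * c for c in fhat.values())) < 1e-9
```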
    First introduced into theoretical computer science by the celebrated work of Kahn, Kalai, and
Linial [15], the Hypercontractive Inequality has seen utility in a surprisingly wide variety of areas,
spanning distributed computing, random graphs, k-SAT, social choice, inapproximability, learning theory,
metric spaces, statistical physics, convex relaxation hierarchies, etc. [2, 6, 22, 8, 9, 10, 11, 5, 18, 17, 21,
13, 19, 1]. For almost every one of these results, no alternative proof avoiding hypercontractivity is
known. (See de Wolf's survey [23, Sec. 4] and O'Donnell's monograph [20, Ch. 9] for
more details on the Hypercontractive Inequality and its applications.)
    The well-known analytic proof of the Hypercontractive Inequality proceeds by induction on n. The
crux of the inductive step (sometimes referred to as the tensoring property of the Hypercontractive
Inequality) is Minkowski’s inequality, the triangle inequality for L p spaces; the base case is a two-
point inequality that reduces to standard analytic calculations but is nonetheless technically involved.
Considering the ubiquity of the Hypercontractive Inequality in discrete settings, there has been significant
interest in obtaining alternative proofs of the inequality.
    Ehud Friedgut and Vojtech Rödl [12] obtained one such alternative proof by exploiting a novel
connection between the Hypercontractive Inequality and Shannon entropy. Specifically, their main result
is an information-theoretic/combinatorial proof of the inequality
$$\|P^{=m}\|_{2 \to 4} \le (\sqrt{28})^{m/2}.$$
    ¹For a history of the discovery of the Hypercontractive Inequality we refer the reader to the Notes in [20, Ch. 9].




This is a slightly weaker version of the q = 4 special case of Theorem 1.1, but it is nonetheless qualitatively
sufficient for many applications of the Hypercontractive Inequality, with a slight loss in the corresponding
bounds. Friedgut and Rödl obtain this result by recasting the inequality in terms of multi-hypergraphs,
invoking a generalization of Shearer’s entropy lemma for such hypergraphs, and applying probabilistic
arguments to complete the proof.
    Friedgut and Rödl’s result raises two fundamental questions: Is the combinatorial/information
theoretic argument strong enough to recover the optimal value of kP=m k2→q ? And, can this result be
obtained directly by elementary information theoretic arguments? We give positive answers to both
questions: we show that Friedgut and Rödl’s argument can be simplified and sharpened by reasoning
directly about the Fourier spectrum of f , without requiring the translation to hypergraphs or generalization
of Shearer’s lemma. With this direct approach, we obtain a simple proof of the optimal Hypercontractive
Inequality
$$\|P^{=m}\|_{2 \to 2k} \le (2k-1)^{m/2}$$
for all $k \in \mathbb{N}$ (i.e., Theorem 1.1 for all even integers $q$) using only elementary information-theoretic facts.
    Let us briefly mention two other alternative proofs for special cases of the Hypercontractive Inequality.
First, there is a short and elegant inductive proof of the q = 4 special case of Theorem 1.1 that requires
only the Cauchy-Schwarz inequality. This proof first appeared in the literature in [19], although Bonami’s
original paper [4] contains a proof along similar lines. This proof, however, does not appear to generalize
beyond the q = 4 special case. Second, in independent recent work, Kauers et al. noted that the two-point
inequality in the standard analytic proof of the Hypercontractive Inequality becomes particularly simple
when q is an even integer [16]. However, this proof still uses the inductive step as a black-box (i. e., the
fact that the Hypercontractive Inequality tensorizes, requiring Minkowski’s inequality), and remains very
much an analytic proof.


2    Basics of Shannon entropy
Let $X$ be a (scalar- or vector-valued) random variable over the discrete sample space $\Omega$ and let $p \colon \Omega \to [0,1]$
be the probability mass function of $X$. The entropy of $X$ is
$$H(X) = -\sum_{x \in \Omega} p(x) \log p(x).$$
Here and throughout this note, logarithms are taken to base 2. The conditional entropy of $X$ given $Y$ is
$$H(X \mid Y) = \mathbf{E}_y\Big[-\sum_x p(x \mid y) \log p(x \mid y)\Big].$$

We use the following basic properties of entropy in our analysis.

Lemma 2.1 (Universal upper bound). For any random variable $X$ over the sample space $\Omega$,
$$H(X) \le \log|\mathrm{supp}(X)| \le \log|\Omega|.$$
The equality $H(X) = \log|\Omega|$ holds if and only if $X$ is uniformly distributed over $\Omega$.
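A quick illustration of Lemma 2.1 on a toy distribution of our own choosing: a biased distribution supported on 3 of 4 outcomes stays below $\log 3$, and the uniform distribution attains $\log|\Omega|$ exactly.

```python
import math

def H(dist):
    # Shannon entropy in bits of a pmf given as {outcome: probability}
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

omega = ['a', 'b', 'c', 'd']
biased = {'a': 0.5, 'b': 0.25, 'c': 0.25}     # support of size 3 inside omega
uniform = {x: 1 / len(omega) for x in omega}

assert H(biased) <= math.log2(3) <= math.log2(len(omega))
assert abs(H(uniform) - math.log2(len(omega))) < 1e-12
```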


Lemma 2.2 (Chain rule). The entropy of a sequence $X_1, \ldots, X_n$ of random variables satisfies
$$H(X_1, \ldots, X_n) = \sum_{i=1}^{n} H(X_i \mid X_1, \ldots, X_{i-1}).$$
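The two-variable chain rule can be verified directly from the definitions on a small joint distribution (chosen arbitrarily for illustration):

```python
import math
from collections import defaultdict

def H(dist):
    # Shannon entropy in bits of a pmf given as {outcome: probability}
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# a small joint pmf for (X1, X2)
joint = {(0, 0): 0.5, (0, 1): 0.25, (1, 1): 0.25}

# marginal of X1
pX1 = defaultdict(float)
for (x1, x2), p in joint.items():
    pX1[x1] += p

# conditional entropy H(X2 | X1) from the definition
HX2_given_X1 = 0.0
for x1, p1 in pX1.items():
    cond = {x2: p / p1 for (a, x2), p in joint.items() if a == x1}
    HX2_given_X1 += p1 * H(cond)

# chain rule: H(X1, X2) = H(X1) + H(X2 | X1)
assert abs(H(joint) - (H(pX1) + HX2_given_X1)) < 1e-12
```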

   Finally we recall Shearer's Lemma, a generalization of the subadditivity of entropy. (For a simple
proof, see [7].) For a sequence of random variables $X = (X_1, \ldots, X_n)$ and a set $S \subseteq [n]$, we write $X_S$ to
denote the projection of $X$ onto the coordinates in $S$, i.e., $X_S = (X_j)_{j \in S}$.

Lemma 2.3 (Shearer's Lemma). Let $X \in \Omega^n$ be a vector of random variables and let $S_1, \ldots, S_m \subseteq [n]$ be
a collection of sets that cover each element in $[n]$ at least $t$ times. Then
$$H(X) \le \frac{1}{t} \sum_{j=1}^{m} H(X_{S_j}).$$
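A sketch of Lemma 2.3 with $t = 2$, on an example of our own choosing: $X$ uniform over even-parity triples, covered by the three coordinate pairs. Here $H(X) = 2$ while the Shearer bound evaluates to 3.

```python
import math
from collections import defaultdict
from itertools import product

def H(dist):
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def marginal(dist, S):
    # distribution of X_S, the projection of X onto the coordinates in S
    out = defaultdict(float)
    for x, p in dist.items():
        out[tuple(x[j] for j in S)] += p
    return out

# X = (X1, X2, X3) uniform over triples with even parity
support = [x for x in product([0, 1], repeat=3) if (x[0] + x[1] + x[2]) % 2 == 0]
dist = {x: 1 / len(support) for x in support}

covers = [(0, 1), (1, 2), (0, 2)]   # each coordinate covered t = 2 times
bound = sum(H(marginal(dist, S)) for S in covers) / 2
assert H(dist) <= bound + 1e-12
```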


3    Hypercontractivity via the entropy method
In this section we prove Theorem 1.1 for all even integers q using the entropy method.
Theorem 3.1 (Special case of Theorem 1.1). For any $f \colon \{-1,1\}^n \to \mathbb{R}$ and any even integer $q > 2$,
$$\|f^{(=m)}\|_q \le (q-1)^{m/2} \cdot \|f^{(=m)}\|_2.$$

Proof. By a limiting argument, it suffices to prove the inequality for $f \colon \{-1,1\}^n \to \mathbb{Q}$. By homogeneity,
we may further assume that the Fourier coefficients of $f$ are integral (i.e., $\hat{f}(S) \in \mathbb{Z}$ for all $S \subseteq [n]$).
    For each $S \in \binom{[n]}{m}$, let $W_S$ denote a set of $|\hat{f}(S)|$ elements which we call witnesses for $S$. The witness
sets for any two distinct sets $S \ne T$ are disjoint, and we write $W = \bigcup_S W_S$ to denote the disjoint union
of all $\binom{n}{m}$ witness sets. We say that $q$ witnesses in $W$ are a legal $q$-tuple if their corresponding sets
$S_1, \ldots, S_q \in \binom{[n]}{m}$ satisfy $S_1 \triangle \cdots \triangle S_q = \emptyset$.
    Let $\mathbf{w} = (w_1, \ldots, w_q) \in W^q$ be a random variable drawn uniformly at random from the collection of
ordered legal $q$-tuples in $W^q$ (where repetitions are allowed) and let $\mathbf{S} = (S_1, \ldots, S_q) \in \binom{[n]}{m}^q$ be the sets
corresponding to the $q$ witnesses in $\mathbf{w}$. By the chain rule, the joint entropy of $\mathbf{w}$ and $\mathbf{S}$ satisfies
$$H(\mathbf{w}, \mathbf{S}) = H(\mathbf{w}) + H(\mathbf{S} \mid \mathbf{w}) = \log\Bigg(\sum_{\substack{S_1, \ldots, S_q \in \binom{[n]}{m} \\ S_1 \triangle \cdots \triangle S_q = \emptyset}} |\hat{f}(S_1) \cdots \hat{f}(S_q)|\Bigg) + 0 \ge \log\Bigg(\sum_{\substack{S_1, \ldots, S_q \in \binom{[n]}{m} \\ S_1 \triangle \cdots \triangle S_q = \emptyset}} \hat{f}(S_1) \cdots \hat{f}(S_q)\Bigg) = \log\big(\|f^{(=m)}\|_q^q\big), \quad (1)$$
where the final equality uses the fact that $q$ is even.
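The final equality in (1) can be checked mechanically for $q = 4$: expanding $\mathbf{E}[f^4]$ over the Fourier basis, exactly the tuples with empty symmetric difference contribute. A brute-force check of this identity, with small integral coefficients chosen arbitrarily for illustration:

```python
import itertools
import math

n, m, q = 4, 2, 4
sets = list(itertools.combinations(range(n), m))
fhat = dict(zip(sets, [1, -2, 1, 3, 0, 2]))   # integral coefficients, |S| = m

def f(x):
    return sum(c * math.prod(x[i] for i in S) for S, c in fhat.items())

def symdiff_empty(tup):
    out = set()
    for S in tup:
        out ^= set(S)
    return not out

# sum over q-tuples of sets whose symmetric difference is empty
lhs = sum(math.prod(fhat[S] for S in tup)
          for tup in itertools.product(sets, repeat=q) if symdiff_empty(tup))
points = list(itertools.product([-1, 1], repeat=n))
rhs = sum(f(x) ** q for x in points) / len(points)   # ||f^{(=m)}||_q^q
assert abs(lhs - rhs) < 1e-9
```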


    Let
$$\mathbf{T} = (T_{\{1,2\}}, T_{\{1,3\}}, \ldots, T_{\{q-1,q\}}) \in \binom{[n]}{\le m}^{\binom{q}{2}}$$
be a sequence of $\binom{q}{2}$ sets such that for every $i < j \in [q]$,
$$T_{\{i,j\}} = \Big\{ x \in [n] : x \in (S_i \cap S_j) \setminus (S_{i+1} \cup \cdots \cup S_{j-1}) \text{ and } \#\{\ell \in [q] : \ell < i,\ x \in S_\ell\} \text{ is even} \Big\}.$$
Recall that the symmetric difference of the sets in $\mathbf{S}$ is empty, and so every $x \in [n]$ occurs in an even
number of them. Therefore, if an element $x$ satisfies $\{i \in [q] : x \in S_i\} = \{i_1, \ldots, i_k\}$ where $i_1 \le \cdots \le i_k$,
then $x$ is added to the sets $T_{\{i_1,i_2\}}, T_{\{i_3,i_4\}}, \ldots, T_{\{i_{k-1},i_k\}}$.
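The claim that the blocks containing index $i$ partition $S_i$ can be sanity-checked on a concrete legal tuple (our own example with $q = 4$, $m = 2$; the code uses 0-indexed pairs $(i, j)$ in place of the text's 1-indexed $\{i, j\}$):

```python
from itertools import combinations

# a legal 4-tuple of 2-element sets: its symmetric difference is empty
S = [{1, 2}, {2, 3}, {3, 4}, {1, 4}]
q = len(S)

# build T_{i,j} following the definition in the text (0-indexed)
T = {}
for i, j in combinations(range(q), 2):
    T[(i, j)] = {x for x in S[i] & S[j]
                 if not any(x in S[l] for l in range(i + 1, j))
                 and sum(x in S[l] for l in range(i)) % 2 == 0}

# every x in S_i lies in exactly one block T_{i,j} with j != i,
# so the blocks of T_{i,*} partition S_i
for i in range(q):
    blocks = [T[(min(i, j), max(i, j))] for j in range(q) if j != i]
    assert set().union(*blocks) == S[i]
    assert sum(len(b) for b in blocks) == len(S[i])
```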
    Let
$$T_{\{i,*\}} := (T_{\{i,j\}})_{j \ne i} = (T_{\{1,i\}}, T_{\{2,i\}}, \ldots, T_{\{i-1,i\}}, T_{\{i,i+1\}}, \ldots, T_{\{i,q\}}).$$
This construction guarantees that for every $i \in [q]$, the sets in $T_{\{i,*\}}$ partition $S_i$ (since every $x \in S_i$ is
in exactly one set $T_{\{i,j\}}$ for some $j \ne i$). In particular, $\mathbf{S}$ determines $\mathbf{T}$ and vice versa, so $H(\mathbf{S}) = H(\mathbf{T})$.
With a different application of the chain rule to the joint entropy of $\mathbf{w}$ and $\mathbf{S}$, we obtain
$$H(\mathbf{w}, \mathbf{S}) = H(\mathbf{S}) + H(\mathbf{w} \mid \mathbf{S}) = H(\mathbf{T}) + H(\mathbf{w} \mid \mathbf{S}). \quad (2)$$

The chain rule also yields $H(\mathbf{w} \mid \mathbf{S}) = \sum_{i=1}^{q} H(w_i \mid w_1, \ldots, w_{i-1}, S_1, \ldots, S_q)$. Given $S_i$, the random variable
$w_i$ is uniformly distributed among the witnesses for $S_i$ and, in particular, is independent of $w_j$ and $S_j$ for
each $j \ne i$. So the conditional entropy of $\mathbf{w}$ given $\mathbf{S}$ satisfies
$$H(\mathbf{w} \mid \mathbf{S}) = \sum_{i=1}^{q} H(w_i \mid S_i). \quad (3)$$

The random variables $T_{\{1,*\}}, \ldots, T_{\{q,*\}}$ cover each set in $\{T_{\{i,j\}}\}_{i \ne j}$ twice, so by Shearer's Lemma
$$H(\mathbf{T}) \le \frac{1}{2} \sum_{i=1}^{q} H(T_{\{i,*\}}). \quad (4)$$

Since $T_{\{i,*\}}$ is a partition of $S_i$, $H(T_{\{i,*\}}) = H(S_i, T_{\{i,*\}})$. Also, there are at most $(q-1)^m$ possible
ordered partitions of the $m$ elements of $S_i$ into $q-1$ parts, so the universal upper bound yields
$$H(T_{\{i,*\}} \mid S_i) \le m \log(q-1).$$
These two observations imply that
$$H(T_{\{i,*\}}) = H(S_i, T_{\{i,*\}}) = H(S_i) + H(T_{\{i,*\}} \mid S_i) \le H(S_i) + m \log(q-1). \quad (5)$$

We can now combine (2)–(5) to obtain the upper bound
$$H(\mathbf{w}, \mathbf{S}) \le \frac{qm}{2} \log(q-1) + \frac{1}{2} \sum_{i=1}^{q} \big( H(S_i) + 2 H(w_i \mid S_i) \big). \quad (6)$$


    Finally, we observe that $H(S_i) + 2H(w_i \mid S_i) = H(w_i, w_i', S_i)$, where $w_i, S_i$ are as above and $w_i'$ is
another witness of $S_i$ chosen uniformly at random from $W_{S_i}$. By the universal upper bound,
$$H(S_i) + 2H(w_i \mid S_i) = H(w_i, w_i', S_i) \le \log\Bigg(\sum_{S \in \binom{[n]}{m}} |\hat{f}(S)|^2\Bigg) = \log\big(\|f^{(=m)}\|_2^2\big). \quad (7)$$
Combining (1), (6), and (7) and rearranging completes the proof.

Remark 3.2. For $f \colon \{-1,1\}^n \to \mathbb{R}$, define $f^{(\le m)} = \sum_{|S| \le m} \hat{f}(S)\chi_S$. The argument above (with only
minor changes in the choice of random variables) can also be used to show that
$$\|f^{(\le m)}\|_q \le (q-1)^{m/2} \cdot \|f^{(\le m)}\|_2$$
for every even integer $q > 2$, a slight generalization of Theorem 3.1.


Acknowledgements
We thank Ryan O’Donnell, Rocco Servedio, and Andrew Wan for helpful discussions. We also thank
Elchanan Mossel and the anonymous referees whose comments helped improve the presentation.


References
 [1] Boaz Barak, Fernando G. S. L. Brandão, Aram W. Harrow, Jonathan A. Kelner, David Steurer, and
     Yuan Zhou: Hypercontractivity, sum-of-squares proofs, and their applications. In Proc. 44th STOC,
     pp. 307–326. ACM Press, 2012. [doi:10.1145/2213977.2214006]

 [2] Michael Ben-Or and Nathan Linial: Collective coin flipping. In Silvio Micali and Franco Preparata,
     editors, Randomness and Computation, volume 5 of Advances in Computing Research: A research
     annual, pp. 91–115. JAI Press, 1989. Preliminary version in FOCS'85.

 [3] Aline Bonami: Ensembles Λ(p) dans le dual de D∞. Annales de l'Institut Fourier, 18(2):193–204,
     1968. EuDML.

 [4] Aline Bonami: Étude des coefficients de Fourier des fonctions de L^p(G). Annales de l'Institut
     Fourier, 20(2):335–402, 1970. EuDML.

 [5] Jean Bourgain: On the distribution of the Fourier spectrum of Boolean functions. Israel J. Math.,
     131(1):269–276, 2002. [doi:10.1007/BF02785861]

 [6] Jean Bourgain, Jeff Kahn, Gil Kalai, Yitzhak Katznelson, and Nathan Linial: The influence of
     variables in product spaces. Israel J. Math., 77(1-2):55–64, 1992. [doi:10.1007/BF02808010]

 [7] Thomas M. Cover and Joy A. Thomas: Elements of Information Theory. Wiley, 1991.

 [8] Uriel Feige and Joe Kilian: Zero knowledge and the chromatic number. J. Comput. System Sci.,
     57(2):187–199, 1998. Preliminary version in CCC'96. [doi:10.1006/jcss.1998.1587]

 [9] Ehud Friedgut: Boolean functions with low average sensitivity depend on few coordinates.
     Combinatorica, 18(1):27–35, 1998. [doi:10.1007/PL00009809]

[10] Ehud Friedgut: Sharp thresholds of graph properties, and the k-SAT problem. J. Amer. Math. Soc.,
     12(4):1017–1054, 1999. AMS.

[11] Ehud Friedgut, Gil Kalai, and Assaf Naor: Boolean functions whose Fourier transform is
     concentrated on the first two levels. Advances in Applied Mathematics, 29(3):427–437, 2002.
     [doi:10.1016/S0196-8858(02)00024-6]

[12] Ehud Friedgut and Vojtech Rödl: Proof of a hypercontractive estimate via entropy. Israel J. Math.,
     125(1):369–380, 2001. [doi:10.1007/BF02773387]

[13] Christophe Garban, Gábor Pete, and Oded Schramm: The Fourier spectrum of critical percolation.
     Acta Mathematica, 205(1):19–104, 2010. [doi:10.1007/s11511-010-0051-x]

[14] Leonard Gross: Logarithmic Sobolev inequalities. American Journal of Mathematics,
     97(4):1061–1083, 1975. JSTOR.

[15] Jeff Kahn, Gil Kalai, and Nathan Linial: The influence of variables on Boolean functions (extended
     abstract). In Proc. 29th FOCS, pp. 68–80. IEEE Comp. Soc. Press, 1988.
     [doi:10.1109/SFCS.1988.21923]

[16] Manuel Kauers, Ryan O'Donnell, Li-Yang Tan, and Yuan Zhou: Hypercontractivity inequalities via
     SOS, and the Frankl-Rödl graph. Technical report, 2012. To appear in SODA'14. [arXiv:1212.5324]

[17] Subhash Khot, Guy Kindler, Elchanan Mossel, and Ryan O'Donnell: Optimal inapproximability
     results for MAX-CUT and other 2-variable CSPs? SIAM J. Comput., 37(1):319–357, 2007.
     Preliminary version in FOCS'04. See also at ECCC. [doi:10.1137/S0097539705447372]

[18] Subhash Khot and Nisheeth K. Vishnoi: The Unique Games Conjecture, integrality gap for cut
     problems and embeddability of negative type metrics into ℓ1. In Proc. 46th FOCS, pp. 53–62.
     IEEE Comp. Soc. Press, 2005. [doi:10.1109/SFCS.2005.74]

[19] Elchanan Mossel, Ryan O'Donnell, and Krzysztof Oleszkiewicz: Noise stability of functions with
     low influences: invariance and optimality. Ann. of Math., 171(1):295–341, 2010. Preliminary
     version in FOCS'05. [doi:10.4007/annals.2010.171.295]

[20] Ryan O'Donnell: The Analysis of Boolean Functions. Cambridge Univ. Press, 2014, to appear.
     Preliminary version at analysisofbooleanfunctions.org.


[21] Ryan O'Donnell and Rocco A. Servedio: Learning monotone decision trees in polynomial time.
     SIAM J. Comput., 37(3):827–844, 2007. Preliminary version in CCC'06. [doi:10.1137/060669309]

[22] Michel Talagrand: On Russo's approximate zero-one law. Ann. Probab., 22(3):1576–1587, 1994.
     JSTOR.

[23] Ronald de Wolf: A Brief Introduction to Fourier Analysis on the Boolean Cube. Number 1 in
     Graduate Surveys. Theory of Computing Library, 2008. [doi:10.4086/toc.gs.2008.001]


AUTHORS

     Eric Blais
     Simons Postdoctoral Fellow
     CSAIL, MIT. Cambridge, MA
     eblais csail mit edu
     http://people.csail.mit.edu/eblais


     Li-Yang Tan
     Ph. D. Candidate
     Department of Computer Science, Columbia University. New York, NY
     liyang cs columbia edu
     http://www.cs.columbia.edu/~liyang


ABOUT THE AUTHORS

     E RIC B LAIS is a Simons Postdoctoral Fellow in the Theory of Computing group at MIT. He
         graduated from Carnegie Mellon University in 2012; his advisor was Ryan O’Donnell.
         His research interests include the complexity of Boolean functions and sublinear algo-
         rithms.


     L I -YANG TAN is a Ph. D. student at Columbia University, where he is fortunate to be
          advised by Rocco Servedio. His research interests lie in complexity theory, with a focus
          on discrete Fourier analysis, the complexity of Boolean functions, and computational
          learning theory.



