                               Theory of Computing, Volume 6 (2010), pp. 81–84
                                          www.theoryofcomputing.org

                                                                  NOTE



     Decision Trees and Influence: an Inductive
           Proof of the OSSS Inequality
                                                          Homin K. Lee
                                 Received: October 23, 2009; published: August 17, 2010.



         Abstract: We give a simple proof of the OSSS inequality (O’Donnell, Saks, Schramm,
         Servedio, FOCS 2005). The inequality states that for any decision tree T calculating a
         Boolean function f : {0, 1}n → {−1, 1}, we have Var[ f ] ≤ ∑i δi (T ) Infi ( f ), where δi (T ) is
         the probability that the input variable xi is read by T and Infi ( f ) is the influence of the ith
         variable on f .

ACM Classification: G.2, G.3, F.1.3
AMS Classification: 05A20, 60C05, 68R05, 68Q87
Key words and phrases: computational complexity, decision trees, influence

1      Introduction
Let T be a decision tree computing a function f : {0, 1}n → {−1, 1}. We write δi (T ) for the probability
that the variable xi is queried by the decision tree on a uniform random input, and we write:
                               ∆(T ) := ∑_{i=1}^{n} δi (T ) = E_{x∈{0,1}^n} [# coordinates T queries on x] .

∆(T ) can also be thought of as the average depth of the decision tree, or as a refinement of the notion of
the size of the decision tree, since ∆(T ) ≤ log(size(T )) [4].
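These quantities can be checked by brute force on small examples. The following Python sketch uses a tuple-based tree encoding of our own devising (not from the paper): a leaf is `('leaf', v)` with v ∈ {−1, 1}, and an internal node `('node', i, t0, t1)` queries bit i. The hard-coded tree `MAJ3` computing 3-bit majority is one hypothetical example.

```python
import itertools
import math

# A decision tree is either ('leaf', v) with v in {-1, 1}, or
# ('node', i, t0, t1): query bit i, recurse into t0 if x[i] == 0, else t1.
# This toy tree computes 3-bit majority: it reads x0 and x1, and reads x2
# only when they disagree (a tie).
MAJ3 = ('node', 0,
        ('node', 1, ('leaf', -1), ('node', 2, ('leaf', -1), ('leaf', 1))),
        ('node', 1, ('node', 2, ('leaf', -1), ('leaf', 1)), ('leaf', 1)))

def queried(tree, x):
    """Set of coordinates the tree reads on input x."""
    if tree[0] == 'leaf':
        return set()
    _, i, t0, t1 = tree
    return {i} | queried(t1 if x[i] else t0, x)

def delta(tree, n):
    """delta_i(T): probability that coordinate i is read on a uniform input."""
    inputs = list(itertools.product((0, 1), repeat=n))
    return [sum(i in queried(tree, x) for x in inputs) / len(inputs)
            for i in range(n)]

def avg_depth(tree, n):
    """Delta(T) = sum_i delta_i(T) = expected number of queries."""
    return sum(delta(tree, n))

def leaves(tree):
    """size(T): the number of leaves."""
    return 1 if tree[0] == 'leaf' else leaves(tree[2]) + leaves(tree[3])
```

For `MAJ3`, this gives δ(T ) = (1, 1, 1/2), so ∆(T ) = 5/2, which is indeed at most log₂ size(T ) = log₂ 6.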
    The influence of a variable xi on a Boolean function f is defined to be

                                 Infi ( f ) = Pr_{x∈{0,1}^n} [ f (x) ≠ f (x^(i) )] ,


    © 2010 Homin K. Lee
    Licensed under a Creative Commons Attribution License                                           DOI: 10.4086/toc.2010.v006a004

where x(i) denotes x with its i-th bit flipped [1]. Recall that the variance of a function f is Var[ f ] =
E[( f − E[ f ])2 ], and that the covariance of two functions f and g is Cov[ f , g] = E [( f − E f ) (g − E g)].
    O’Donnell et al. [3] proved the following inequality:

Theorem 1.1. Let f : {0, 1}n → {−1, 1} be a Boolean function, and let T be a decision tree computing f .
Then
                                                   Var[ f ] ≤ ∑_{i=1}^{n} δi (T ) Infi ( f ) .

    This inequality can be viewed as a refinement of the Efron–Stein Inequality [2, 5] for the discrete
cube (i.e., Var[ f ] ≤ ∑_{i=1}^{n} Infi ( f )) that takes into account the complexity of the function’s representation.
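Theorem 1.1 can be verified numerically on a small instance. The sketch below checks it for 3-bit majority with the natural tree that reads x₂ only on a tie; the encoding, the hand-computed `DELTA` values, and the choice of example are ours, for illustration only.

```python
import itertools

N = 3
INPUTS = list(itertools.product((0, 1), repeat=N))

def maj(x):
    """3-bit majority with range {-1, 1}."""
    return 1 if sum(x) >= 2 else -1

def influence(h, i):
    """Inf_i(h) = Pr[h(x) != h(x with bit i flipped)] over uniform x."""
    def flip(x):
        y = list(x); y[i] ^= 1; return tuple(y)
    return sum(h(x) != h(flip(x)) for x in INPUTS) / len(INPUTS)

def variance(h):
    """Var[h] = E[(h - E[h])^2] over uniform x."""
    mean = sum(h(x) for x in INPUTS) / len(INPUTS)
    return sum((h(x) - mean) ** 2 for x in INPUTS) / len(INPUTS)

# delta_i(T) for the tree that reads x0, x1, and then x2 only on a tie;
# these values are worked out by hand for this particular example.
DELTA = [1.0, 1.0, 0.5]

lhs = variance(maj)                                      # Var[f] = 1
rhs = sum(d * influence(maj, i) for i, d in enumerate(DELTA))
```

Here each Infᵢ(maj) = 1/2, so the right-hand side is 1 · 1/2 + 1 · 1/2 + 1/2 · 1/2 = 5/4 ≥ 1 = Var[maj].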
    O’Donnell et al. [3] also proved the following generalization of Theorem 1.1, which is what we
reprove in the next section.

Theorem 1.2. Let f , g : {0, 1}n → {−1, 1} be Boolean functions, and let T be a decision tree comput-
ing f . Then
                                                |Cov[ f , g]| ≤ ∑_{i=1}^{n} δi (T ) Infi (g) .


2    Inductive Proof
The original proofs of both Theorems 1.1 and 1.2 relied on some delicate probabilistic reasoning about
the independence of certain hybrid inputs to the decision tree. We will prove Theorem 1.2 using induc-
tion. To do so, we will consider the function’s behavior under the two cases when the root variable takes
the value 0 and the value 1. First we will review a fact from probability theory.
    For a function f : {0, 1}n → R, let

                               ci (x1 , . . . , xn ) = E[ f |(x1 , . . . , xi )] − E[ f |(x1 , . . . , xi−1 )]

for 1 ≤ i ≤ n so that E[ f ] + ∑i ci = f . The sequence {ci } is a martingale difference sequence. Let g
be another real-valued function, and let {di } be its martingale difference sequence. Then Cov[ f , g] =
∑_{i=1}^{n} E[ci di ]. We’ll prove this fact for the sake of completeness.
Fact 2.1. Let f , g : {0, 1}n → R be real-valued functions, with martingale difference sequences

      ci = E[ f |(x1 , . . . , xi )] − E[ f |(x1 , . . . , xi−1 )] and di = E[g|(x1 , . . . , xi )] − E[g|(x1 , . . . , xi−1 )]

for 1 ≤ i ≤ n. Then Cov[ f , g] = ∑_{i=1}^{n} E [ci di ].

Proof. For j < k, we have

              E[c_j d_k ] = E[ E[c_j d_k | (x1 , . . . , x_{k−1} )] ] = E[ c_j · E[d_k | (x1 , . . . , x_{k−1} )] ] = E[c_j · 0] = 0 .

Therefore Cov[ f , g] = E [( f − E f ) (g − E g)] = E [(∑_j c_j )(∑_k d_k )] = ∑_{i=1}^{n} E [ci di ].
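Fact 2.1 is easy to confirm numerically. The sketch below brute-forces the martingale decomposition for two small example functions of our choosing (majority and a dictator), with conditioning on a prefix implemented as averaging over the remaining bits.

```python
import itertools

N = 3
INPUTS = list(itertools.product((0, 1), repeat=N))

def f(x): return 1 if sum(x) >= 2 else -1   # majority, an arbitrary example
def g(x): return 1 if x[0] else -1          # dictator on the first bit

def cond_mean(h, prefix):
    """E[h | the first len(prefix) bits fixed to prefix], rest uniform."""
    rest = list(itertools.product((0, 1), repeat=N - len(prefix)))
    return sum(h(prefix + r) for r in rest) / len(rest)

def mdiff(h, x, i):
    """c_i(x) = E[h | (x_1,...,x_i)] - E[h | (x_1,...,x_{i-1})], i 1-indexed."""
    return cond_mean(h, x[:i]) - cond_mean(h, x[:i - 1])

def mean(vals):
    vals = list(vals)
    return sum(vals) / len(vals)

cov = (mean(f(x) * g(x) for x in INPUTS)
       - mean(map(f, INPUTS)) * mean(map(g, INPUTS)))
total = sum(mean(mdiff(f, x, i) * mdiff(g, x, i) for x in INPUTS)
            for i in range(1, N + 1))
```

For this pair, Cov[ f , g] = 1/2, and `total` agrees, as the Fact guarantees.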

    We now relate the last term of the martingale difference sequence to the influence of the variable xn .


Lemma 2.2. Let f , g : {0, 1}n → {−1, 1} be Boolean functions, and let

                         cn = f − E[ f |(x1 , . . . , xn−1 )] and         dn = g − E[g|(x1 , . . . , xn−1 )] .

Then E [cn dn ] ≤ Infn ( f ) and E [cn dn ] ≤ Infn (g).

Proof. Let f0 (x1 , . . . , xn ) denote f (x1 , . . . , xn−1 , 0), and let f1 , g0 , and g1 be defined similarly. Then we
have that
                                 cn = f − ( f0 + f1 )/2 and dn = g − (g0 + g1 )/2 .
We can rewrite E [cn dn ] as
                                           E[ ( f − (f0 + f1 )/2 ) ( g − (g0 + g1 )/2 ) ] .
If both f0 ≠ f1 and g0 ≠ g1 , the quantity inside the expectation is f · g ∈ {−1, 1}. Otherwise, the quantity
is 0.
     The influence of xn on f is just Infn ( f ) = Pr [ f0 (x) ≠ f1 (x)], and thus Infn ( f ) is an upper bound on
E[cn dn ] (as is Infn (g)). Note that this upper bound is an equality when we consider the special case of
 f = g, and we have that E[c_n^2 ] = Infn ( f ).
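Lemma 2.2 can likewise be checked by brute force. In the sketch below (example functions and names are ours), `last_mdiff` computes cₙ = f − ( f₀ + f₁)/2 directly from the two restrictions, and `inf_last` computes Infₙ as the probability that the restrictions disagree.

```python
import itertools

N = 3
INPUTS = list(itertools.product((0, 1), repeat=N))

def f(x): return 1 if sum(x) >= 2 else -1   # majority, as an example
def g(x): return 1 if x[2] else -1          # dictator on the last bit

def restrict(h, b):
    """h with the last input bit fixed to b (f_0 and f_1 in the text)."""
    return lambda x: h(x[:N - 1] + (b,))

def last_mdiff(h, x):
    """c_n(x) = h(x) - (h_0(x) + h_1(x)) / 2."""
    return h(x) - (restrict(h, 0)(x) + restrict(h, 1)(x)) / 2

def inf_last(h):
    """Inf_n(h) = Pr[h_0(x) != h_1(x)]."""
    h0, h1 = restrict(h, 0), restrict(h, 1)
    return sum(h0(x) != h1(x) for x in INPUTS) / len(INPUTS)

E_cndn = sum(last_mdiff(f, x) * last_mdiff(g, x) for x in INPUTS) / len(INPUTS)
E_cn2 = sum(last_mdiff(f, x) ** 2 for x in INPUTS) / len(INPUTS)
```

Here E[cₙdₙ] = 1/2 = Inf₃( f ) ≤ Inf₃(g) = 1, and E[cₙ²] = Inf₃( f ) exactly, as claimed for the f = g case.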

    We are now ready to prove Theorem 1.2.

Proof. We’ll prove the statement by induction on the number of variables. For the base case of one
variable, recall that both δi (T ) and Infi (g) are always non-negative, and that the covariance of two
functions with range {−1, 1} is a value in [−1, 1]. A Boolean function on only one variable is either
constant, the single variable x1 , or its negation. If either f or g is constant, then Cov[ f , g] = 0, and the
inequality holds. If neither f nor g is constant, then Inf1 (g) and δ1 (T ) must both be 1, and the inequality
holds.
      Now we’ll consider f and g on n variables. We can assume that f and g are non-constant, or the
inequality trivially holds as before. Thus, T must query at least one variable, and we will assume
without loss of generality that the root of T queries xn . Let T0 be the left subtree and let T1 be the right
subtree. Then for i 6= n, we have δi (T ) = (1/2)δi (T0 ) + (1/2)δi (T1 ). As in the proof of Lemma 2.2, let
 f0 (x1 , . . . , xn ) denote f (x1 , . . . , xn−1 , 0), and let f1 , g0 , and g1 be defined similarly. For i 6= n, we get the
following expression: Infi (g) = (1/2) Infi (g0 ) + (1/2) Infi (g1 ).
      By Fact 2.1, we can write Cov[ f , g] = ∑ni=1 E [ci di ]. Let

                                    ci,0 = E[ f0 |(x1 , . . . , xi )] − E[ f0 |(x1 , . . . , xi−1 )] ,

for 1 ≤ i ≤ n − 1, and define ci,1 , di,0 , and di,1 similarly. Then we have ci = (ci,0 + ci,1 )/2, di = (di,0 +
di,1 )/2, and we can write the covariance as:
     Cov[ f , g] = ∑_{i=1}^{n} E [ci di ] = (1/4) ∑_{i=1}^{n−1} ∑_{a,b∈{0,1}} E [ci,a di,b ] + E [cn dn ] = (1/4) ∑_{a,b∈{0,1}} Cov[ fa , gb ] + E [cn dn ] .

By the triangle inequality, |Cov[ f , g]| ≤ (1/4) ∑a,b∈{0,1} |Cov[ fa , gb ]| + |E [cn dn ]| .
   Since fa and gb are functions on n − 1 variables we can use the induction hypothesis, and we have:



          |Cov[ f , g]| ≤ (1/4) ∑_{a,b∈{0,1}} ∑_{i=1}^{n−1} δi (Ta ) Infi (gb ) + |E [cn dn ]| = ∑_{i=1}^{n−1} δi (T ) Infi (g) + |E [cn dn ]| .

As Infn (g) = Infn (−g), we have that |E [cn dn ]| ≤ Infn (g) by Lemma 2.2, and δn (T ) = 1 because xn is the
root of the tree. Thus, the inductive step holds.
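The full statement of Theorem 1.2 can also be verified end-to-end on a small instance. The sketch below (our own example: f is 3-bit majority computed by the tie-breaking tree, g is a dictator, and the `DELTA` values are worked out by hand) checks |Cov[ f , g]| against the bound ∑ᵢ δᵢ(T ) Infᵢ(g).

```python
import itertools

N = 3
INPUTS = list(itertools.product((0, 1), repeat=N))

def f(x): return 1 if sum(x) >= 2 else -1   # 3-bit majority
def g(x): return 1 if x[0] else -1          # dictator on x0

# delta_i(T) for the tree computing f that reads x0, x1, and then x2
# only on a tie (computed by hand for this example).
DELTA = [1.0, 1.0, 0.5]

def influence(h, i):
    """Inf_i(h) = Pr[h(x) != h(x with bit i flipped)]."""
    def flip(x):
        y = list(x); y[i] ^= 1; return tuple(y)
    return sum(h(x) != h(flip(x)) for x in INPUTS) / len(INPUTS)

def mean(vals):
    vals = list(vals)
    return sum(vals) / len(vals)

cov = (mean(f(x) * g(x) for x in INPUTS)
       - mean(map(f, INPUTS)) * mean(map(g, INPUTS)))
bound = sum(d * influence(g, i) for i, d in enumerate(DELTA))
```

For this pair, |Cov[ f , g]| = 1/2 while the bound is δ₀(T ) Inf₀(g) = 1, so the inequality holds with room to spare.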



References
[1] Michael Ben-Or and Nathan Linial: Collective coin flipping. In S. Micali, editor, Randomness
    and Computation, pp. 91–115. Academic Press, 1989. Preliminary version in Proc. FOCS’85.

[2] Bradley Efron and Charles Stein: The jackknife estimate of variance. Ann. Statist., 9:586–596,
    1981. [doi:10.1214/aos/1176345462]

[3] Ryan O’Donnell, Michael Saks, Oded Schramm, and Rocco A. Servedio: Every decision tree
    has an influential variable. In Proc. 46th FOCS, pp. 31–39. IEEE Computer Soc., 2005.
    [doi:10.1109/SFCS.2005.34]

[4] Ryan O’Donnell and Rocco A. Servedio: Learning monotone decision trees in polynomial
    time. SIAM J. Comput., 37(3):827–844, 2007. Preliminary version in Proc. CCC’06.
    [doi:10.1137/060669309]

[5] J. Michael Steele: An Efron–Stein inequality for nonsymmetric statistics. Ann. Statist., 14:753–
    758, 1986. [doi:10.1214/aos/1176349952]


AUTHOR

      Homin K. Lee
      Department of Computer Science
      University of Texas at Austin
      homin cs utexas edu
      http://www.cs.utexas.edu/~homin


ABOUT THE AUTHOR

      Homin K. Lee received his Ph. D. from the Department of Computer Science at Columbia
         University in 2009 under the supervision of Tal Malkin and Rocco A. Servedio. Homin
         is currently a postdoctoral researcher at the University of Texas at Austin. Homin’s
         research interests are in computational learning theory, combinatorics, computational
         complexity, and cryptography. Homin is also currently hungry.

