Theory of Computing, Volume 13 (12), 2017, pp. 1–50
www.theoryofcomputing.org
Special Issue: APPROX-RANDOM 2014
Pseudorandomness and Fourier-Growth
Bounds for Width-3 Branching Programs
Thomas Steinke∗ Salil Vadhan† Andrew Wan‡
Received February 23, 2015; Revised December 20, 2016; Published October 31, 2017
Abstract: We present an explicit pseudorandom generator for oblivious, read-once, width-3
branching programs, which can read their input bits in any order. The generator has seed
length Õ(log³ n). The best previously known seed length for this model is n^{1/2+o(1)} due
to Impagliazzo, Meka, and Zuckerman (FOCS’12). Our result generalizes a recent result
of Reingold, Steinke, and Vadhan (RANDOM’13) for permutation branching programs.
The main technical novelty underlying our generator is a new bound on the Fourier growth
of width-3, oblivious, read-once branching programs. Specifically, we show that for any
f : {0, 1}n → {0, 1} computed by such a branching program, and k ∈ [n],
\[ \sum_{s \subseteq [n] : |s| = k} \bigl| \hat{f}[s] \bigr| \le n^2 \cdot (O(\log n))^k, \]
where f̂[s] = E_U[f[U] · (−1)^{s·U}] is the standard Fourier transform over Z_2^n. The base O(log n) of the Fourier growth is tight up to a factor of log log n.
ACM Classification: F.1
AMS Classification: 65C10, 68Q87
Key words and phrases: pseudorandom generators, branching programs, Fourier analysis
A conference version of this paper appeared in the Proceedings of the 18th International Workshop on Randomization and
Computation (RANDOM’14) [46].
∗ Part of this work was done while the author was at Harvard University, supported by NSF grant CCF-1116616 and the Lord
Rutherford Memorial Research Fellowship.
† Supported in part by NSF grant CCF-1116616, US-Israel BSF grant 2010196, and a Simons Investigator Award.
‡ Part of this work was completed while at Harvard University.
© 2017 Thomas Steinke, Salil Vadhan, and Andrew Wan
Licensed under a Creative Commons Attribution License (CC-BY). DOI: 10.4086/toc.2017.v013a012
1 Introduction
1.1 Pseudorandom generators for space-bounded computation
A central problem in the theory of pseudorandomness is constructing an “optimal” pseudorandom
generator for space-bounded computation—that is, an explicit algorithm that stretches a uniformly random
seed of O(log n) bits to n bits that cannot be distinguished from uniform by any log n-space algorithm.
Such a generator would imply that every randomized algorithm can be derandomized with only a constant-
factor increase in space (RL = L), and would also have a variety of other applications, such as in streaming
algorithms [26], deterministic dimension reduction and SDP rounding [43, 16], hashing [13], hardness
amplification [23], almost k-wise independent permutations [27], and cryptographic pseudorandom
generator constructions [21].
To construct a pseudorandom generator for space-bounded algorithms using space s, it suffices to
construct a generator that is pseudorandom against ordered branching programs of width 2s . A read-once,
oblivious branching program B is a non-uniform model of space-bounded computation that reads one
input bit at a time, maintaining a state in [w] = {1, . . . , w}, where w is called the width of B. At each
time step i = 1, . . . , n, the program B can read a different input bit xπ(i) (for some permutation π) and
uses a different state transition function Ti : [w] × {0, 1} → [w]. It is often useful to think of a read-once,
oblivious branching program as a directed acyclic graph consisting of n + 1 groups of w vertices each,
where the ith group corresponds to the state space at time i. The transition function defines a bipartite
graph between consecutive state spaces (which we call a layer), where we connect state s in state space
i − 1 to states Ti (s, 0) and Ti (s, 1) in state space i (labeling those edges 0 and 1, respectively). (We usually
visualise the edges as going left to right.) Figure 1 gives an example illustration of a read-once, oblivious
branching program.
Figure 1: The graphical representation of a length-7 width-3 ordered branching program computing the
function f (x) = ((x1 ∧ x2 ∧ x3 ) ∨ (x4 ∧ x5 )) ⊕ x7 .
Most previous constructions of pseudorandom generators for space-bounded computations consider
ordered branching programs, where the input bits are read in order—that is, π(i) = i. The classic result of
Nisan [33] gave a generator stretching O(log² n) uniformly random bits to n bits that are pseudorandom
against ordered branching programs of polynomial width.1 Despite intensive study, this is the best
1 We say that a generator G : {0, 1}^ℓ → {0, 1}^n is ε-pseudorandom against a program P : {0, 1}^n → {0, 1} (or ε-fools P) if |E[P(G(U_ℓ))] − E[P(U_n)]| ≤ ε, where U_ℓ and U_n are uniform on {0, 1}^ℓ and on {0, 1}^n, respectively.
known seed length for ordered branching programs even of width 3, but a variety of results have shown
improvements for restricted classes of ordered branching programs such as width-2 programs [39, 5],
and “regular” or “permutation” ordered branching programs (of constant width) [9, 10, 28, 14, 45].2 For
width 3, hitting set generators (a relaxation of pseudorandom generators) have been constructed [42, 18].
The vast majority of these results are based on Nisan’s original generator or its variants by Impagliazzo,
Nisan, and Wigderson [25] and Nisan and Zuckerman [34], and adhere to a paradigm that seems unlikely
to yield generators against general logspace computations with seed length better than log^{1.99} n (see [10]).
All known analyses of Nisan’s generator and its variants rely on the order in which the output bits are
fed to the ordered branching program (given by the permutation π). Tzur [48, Corollary 3.18] showed
that this is inherent by constructing a permutation π and a constant-width ordered branching program that
can distinguish the output of an instantiation of Nisan’s generator when permuted according to π from
uniformly random bits. The search for new ideas leads us to ask: Can we construct a pseudorandom
generator whose analysis does not depend on the order in which the bits are read? Answering this
question will hopefully yield new techniques and insights that shed light on the ordered case as well.
Another motivation for considering unordered, read-once, oblivious branching programs is that they
also encompass natural classes of read-once formulas, like read-once CNFs and DNFs (which can be
simulated in width 3), read-once polynomials over GF(2) (simulable in width 4), and read-once AC0 or
even read-once ACC0 (both simulable in constant width).3
A recent line of work [6, 24, 36] has constructed pseudorandom generators for unordered, read-
once, oblivious branching programs (where the bits are fed to the branching program in an arbitrary,
fixed order); however, none match both the seed length and generality of Nisan’s result. For unordered
branching programs of length n and width w, Impagliazzo, Meka, and Zuckerman [24] give seed length
O((nw)^{1/2+o(1)}) improving on the linear seed length (1 − Ω(1)) · n of Bogdanov, Papakonstantinou, and
Wan [6].4 Reingold, Steinke, and Vadhan [36] achieve seed length O(w² log² n) for the restricted class
of permutation ordered branching programs, in which Ti (·, b) is a permutation on [w] for all i ∈ [n] and
b ∈ {0, 1}.
Recently, a new approach for constructing pseudorandom generators has been suggested in the paper
of Gopalan et al. [18]; they constructed pseudorandom generators for read-once CNF formulas and
combinatorial rectangles, and hitting set generators for ordered width-3 branching programs, all having
2 If the transition function T_i(·, b) is a permutation on [w] for all i ∈ [n] and b ∈ {0, 1}, we say that the ordered branching program is a permutation ordered branching program. Regular ordered branching programs are defined by a weaker property, namely, for every i ∈ [n] and u ∈ [w], there are exactly two (v, b) ∈ [w] × {0, 1} such that T_i(v, b) = u.
3 The fact that every read-once ACC0 circuit can be simulated by a read-once, oblivious, constant-width branching program can be proved by induction on the depth of the ACC0 circuit. Indeed, it also applies to a generalization of ACC0 where we replace the unbounded-fan-in mod m gates of ACC0 with unbounded-fan-in gates that compute products in any constant-size semigroup. (More precisely, such a gate takes input bits (b_1, . . . , b_t) and outputs 1 iff g_1^{(b_1)} g_2^{(b_2)} · · · g_t^{(b_t)} ∈ S, where g_1^{(0)}, g_1^{(1)}, g_2^{(0)}, g_2^{(1)}, . . . , g_t^{(0)}, g_t^{(1)} are semigroup elements associated with the gate and S is a subset of the semigroup associated with the gate.) Conversely, any read-once, oblivious, constant-width branching program can be simulated by a single semigroup-product gate, taking the semigroup to be the set of functions from [w] to [w] under composition, where w is the width of the branching program, taking g_i^{(b)} to be the transition function of the branching program at time step i upon reading a value b, and taking S to be the set of functions that map the start state to an accept state. Thus these two models are actually equivalent.
4 A generator with seed length Õ(√n log w) is given in [36]. The generator in [24] also extends to branching programs that read their inputs more than once and in an adaptively chosen order, which is more general than the model we consider.
seed length Õ(log n) (even for polynomially small error). Their basic generator (e. g., for read-once
CNF formulas) works by pseudorandomly partitioning the bits into several groups and assigning the bits
in each group using a small-bias generator [31]. A key insight in their analysis is that the small-bias
generator only needs to fool the function “on average,” where the average is taken over the possible
assignments to subsequent groups, which is a weaker requirement than fooling the original function or
even a random restriction of the original function. (For a more precise explanation, see Section 4.1.)
The analysis of Gopalan et al. [18] does not rely on the order in which the output bits are read, and
the previously mentioned results of Reingold, Steinke, and Vadhan [36] use Fourier analysis of regular,
read-once, oblivious branching programs to show that the generator of Gopalan et al. fools unordered,
read-once, oblivious permutation branching programs.
In this article, we further develop Fourier analysis of read-once, oblivious branching programs. Our
main result shows that the pseudorandom generator of Gopalan et al. with seed length Õ(log³ n) fools
width-3, read-once, oblivious branching programs.
Theorem 1.1 (Main result). There is an explicit pseudorandom generator
G : {0, 1}^{O(log³ n · log log n)} → {0, 1}^n
against oblivious, read-once (but unordered), branching programs of width 3 and length n.
The previous best seed length for this model is the aforementioned length of O(n^{1/2+o(1)}) given
in [24]. The construction of the generator in Theorem 1.1 is essentially the same as the generator of
Gopalan et al. [18] for read-once CNF formulas, which was used by Reingold et al. [36] for read-once,
oblivious, permutation branching programs. In our analysis, we give a new bound on the Fourier mass of
oblivious, read-once, width-3 branching programs.
1.2 Fourier growth of branching programs
For a function f : {0, 1}^n → R, let
\[ \hat{f}[s] = \mathbb{E}_U\bigl[ f[U] \cdot (-1)^{s \cdot U} \bigr] \]
be the standard Fourier transform over Z_2^n, where U is a random variable distributed uniformly over {0, 1}^n and s ⊆ [n] or, equivalently, s ∈ {0, 1}^n. The Fourier mass of f (also called the spectral norm of f), defined as L(f) := ∑_{s≠∅} |f̂[s]|, is a fundamental measure of complexity for Boolean functions (see, e. g.,
[19]), and its study has applications to learning theory [29, 30], communication complexity [20, 1, 47, 41],
and circuit complexity [8, 11, 12]. In the study of pseudorandomness, it is well known that small-bias
generators5 with bias ε/L (which can be sampled using a seed of length O(log(n · L/ε)) [31, 2]) will
ε-fool any function whose Fourier mass is at most L. Width-2, read-once, oblivious branching programs
have Fourier mass at most O(n) [5, 39] and are thus fooled by small-bias generators with bias ε/n.
Unfortunately, such a bound does not hold even for very simple width-3 programs. For example, the
“mod 3 function,” which indicates when the Hamming weight of its input is a multiple of 3, has Fourier
mass exponential in n.
5 A small-bias generator with bias µ outputs a random variable X ∈ {0, 1}^n such that |E_X[(−1)^{s·X}]| ≤ µ for every s ⊂ [n] with s ≠ ∅.
However, a more refined measure of Fourier mass is possible and often useful: let L^k(f) = ∑_{|s|=k} |f̂[s]|
be the level-k Fourier mass of f . A bound on the Fourier growth of f , or the rate at which Lk ( f ) grows
with k, was used by Mansour [30] to obtain an improved query algorithm for polynomial-size DNF; the
junta approximation results of Friedgut [17] and Bourgain [7] are proved using approximating functions
that have slow Fourier growth. This notion turns out to be useful in the analysis of pseudorandom
generators as well: Reingold et al. [36] show that the generator of Gopalan et al. [18] will work if there is
a good bound on the Fourier mass of low-order coefficients. More precisely, they show that for any class
C of functions computed by read-once, oblivious, branching programs that is closed under restrictions
and decompositions and satisfies L^k(f) ≤ poly(n) · c^k for every k and f ∈ C, there is a pseudorandom generator with seed length Õ(c · log² n) that fools every f ∈ C. They then bound the Fourier growth of
read-once, oblivious, permutation branching programs (and the even more general model of read-once,
oblivious, “regular” branching programs, where each layer between consecutive state spaces is a regular
bipartite graph) to obtain a pseudorandom generator for read-once, oblivious, permutation branching
programs. We state their result.
Theorem 1.2 (Reingold et al. [36, Theorem 1.4]). Let f : {0, 1}n → {0, 1} be computed by a length-n,
width-w, read-once, oblivious, regular branching program. Then, for all k ∈ [n], we have L^k(f) ≤ (2w²)^k.
In particular, the mod 3 function over O(k) bits, which is computed by an ordered permutation branching program of width 3, has Fourier mass 2^{Θ(k)} at level k (for k = o(n)). However, the Tribes function,6 which is also computed by an ordered, width-3 branching program, has Fourier mass Θ_k(log^k n)
at level k, so the bound in Theorem 1.2 does not hold for non-regular, read-once, oblivious branching
programs even of width 3.
The Coin Theorem of Brody and Verbin [10] implies a related result: essentially, a function f
computed by a width-w, length-n, read-once, oblivious branching program cannot distinguish product
distributions on {0, 1}^n any better than a function satisfying L^k(f) ≤ (log n)^{O(wk)} for all k. To be more precise, if X ∈ {0, 1}^n is n independent samples from a coin with bias β (that is, each bit has expectation (1 + β)/2), then E_X[f[X]] = ∑_s f̂[s] β^{|s|} for any f. Consequently, if L^k(f) ≤ (log n)^{O(wk)} for all k, then
\[ \Bigl| \mathbb{E}_X[f[X]] - \mathbb{E}_U[f[U]] \Bigr| = \Bigl| \sum_{s \ne 0} \hat{f}[s]\, \beta^{|s|} \Bigr| \le \sum_{k \in [n]} L^k(f)\, |\beta|^k \le O\bigl( |\beta| (\log n)^{O(w)} \bigr), \]
assuming |β| ≤ 1/(log n)^{O(w)}. Brody and Verbin prove that, if f is computed by a length-n, width-w, read-once, oblivious branching program, then |E_X[f[X]] − E_U[f[U]]| ≤ O(|β|(log n)^{O(w)}). This suggests
the following potential strengthening of their result.
Conjecture 1.3 (Reingold et al. [36, Conjecture 8.1]). For every constant w, the following holds. Let
f : {0, 1}n → {0, 1} be computed by a width-w, read-once, oblivious branching program. Then
L^k(f) ≤ n^{O(1)} · (log n)^{O(k)},
where the constants in the O(·) expressions may depend on w.
6 The Tribes function (introduced by Ben-Or and Linial [4]) is a DNF formula where all the terms are the same size and
every input appears exactly once. The size of the clauses in this case is chosen to give an asymptotically constant acceptance
probability on uniform input.
We now state the main contribution of this article, the proof of this conjecture for w = 3.
Theorem 1.4 (Fourier growth for width 3). Let f : {0, 1}n → {0, 1} be computed by a width-3, read-once,
oblivious branching program. Then, for all k ∈ [n],
\[ L^k(f) := \sum_{s : |s| = k} |\hat{f}[s]| \le n^2 \cdot (O(\log n))^k. \]
This bound, combined with the techniques of Reingold et al. [36], implies our main result (Theo-
rem 1.1). The Tribes function of [4] shows that the base of O(log n) of the Fourier growth in Theorem 1.4
is tight up to a factor of log log n.
We also prove Conjecture 1.3 with k = 1 for any constant width w. We state the result.
Theorem 1.5. Let f : {0, 1}n → {0, 1} be computed by a width-w, length-n, read-once, oblivious branch-
ing program. Then
\[ L^1(f) = \sum_{i \in [n]} |\hat{f}[\{i\}]| \le O(\log n)^{w-2}. \]
1.3 Techniques
The intuition behind our approach begins with two extreme cases of width-3, read-once, oblivious
branching programs: regular branching programs and branching programs in which every layer is a
non-regular layer.
Regular, ordered branching programs “mix” well: on a uniform random input, the distribution over
states gets closer to uniform (in `2 distance) within each “connected component” of that layer. The
connected components of each layer are the connected components of the undirected bipartite graph
corresponding to that layer; Figure 2 illustrates connected components of layers. We can combine this
Figure 2: Examples of possible width-3 layers with connected components highlighted.
fact with an inductive argument to achieve a bound of 2^{O(k)} on the level-k Fourier mass. This is the bound
of Theorem 1.2, which we use as part of our proof.
For ordered branching programs in which every layer is a non-regular layer, we can make use of an
argument of Brody and Verbin [10]: when we apply a random restriction (where each variable is kept free
with probability roughly 1/(k log n)) to such a branching program, the resulting program is “simple” in that
many of the surviving state spaces have collapsed to width two. (This collapse is analogous to Håstad’s
switching lemma [22], which applies a similar argument to circuits, rather than branching programs.)
The restricted branching program is now “almost” an ordered, width-2 branching program, which allows
us to use arguments tailored to width-2, ordered branching programs, which are much easier to handle. In
particular, we can use the same concept of mixing as used for regular, ordered branching programs.
To handle general width-3, ordered branching programs, which may contain an arbitrary mix of
regular and non-regular layers, we group the layers into “chunks” containing exactly one non-regular
layer each. We will then treat each chunk as if it was a single non-regular layer in the argument of Brody
and Verbin. Instead of using an ordinary random restriction as Brody and Verbin do, we consider a series
of carefully chosen restrictions similar to those in Steinberger’s “interwoven hybrids” technique [44],
which he used to give an alternative and tighter proof of the coin theorem of Brody and Verbin [10]. In
Section 3.1, we use such restrictions to show that the level-k Fourier mass of an arbitrary width-3 ordered
branching program can be bounded in terms of the level-k Fourier mass of a program D which has the
following form: D can be split into r ≤ n ordered branching programs D1 ◦ D2 ◦ · · · ◦ Dr , where each Di
has at most O(k) non-regular layers and the state space between consecutive blocks Di has width 2.
We then generalize the arguments used for width-2, ordered branching programs to such branching
programs. We can show that each chunk Di is either mixing or has small Fourier growth, which suffices
to bound the Fourier growth of D.
1.4 Organization
In Section 2 we introduce the definitions and tools we use in our proof. In Section 2.1 we formally define
the kinds of branching programs we consider and explain how to view them as matrix-valued functions,
which is very convenient for our analysis. In Sections 2.4 and 2.5 we define the matrix-valued Fourier
transform and explain how we use it.
We prove the upper bound on Fourier mass of oblivious, read-once, width-3 branching programs (i. e.,
Theorem 1.4) in Section 3 (Theorem 3.1). In Section 4 we construct and analyze our pseudorandom
generator, which proves the main result (Theorem 1.1). The optimality of our Fourier growth bound is
discussed in Section 5. We prove the bound on first-order coefficients (Theorem 1.5) in Section 6.
Finally, we discuss avenues for further research and the limitations of our techniques in Section 7.
2 Preliminaries
2.1 Branching programs as matrices
Rather than viewing ordered branching programs as graphs or as Boolean functions, we will usually
view them as matrix-valued functions, as in [45, 36]. The advantage of this view is that it allows the use
of linear-algebraic tools in addition to combinatorial tools. In particular, we can use linear algebra to
formulate the concept of mixing along the lines of the second eigenvalue, which has been used extensively
in pseudorandomness [35, 37].
We begin by recalling the standard combinatorial view of ordered branching programs and then
translate it to our linear-algebraic view. Combinatorially, we define a length-n, width-w program to be
a function B : {0, 1}n × [w] → [w], which takes a start state u ∈ [w] and an input string x ∈ {0, 1}n and
outputs a final state B[x](u).7 B reads one bit of the input at a time (rather than reading x all at once)
maintaining only a state in [w] = {1, 2, . . . , w} at each step. We capture this restriction that B computes by
applying a transition function after reading each bit by requiring that B is composed of smaller programs
as follows. The computation of a length-n, width-w ordered branching program B : {0, 1}n × [w] → [w]
can be broken into the composition of n length-1 programs B1, B2, . . . , Bn : {0, 1} × [w] → [w] such that8
B[x](u) = Bn [xn ](Bn−1 [xn−1 ](· · · B2 [x2 ](B1 [x1 ](u)) · · · )) .
We call Bi the transition function for the ith layer.9 Often we think of B as having a fixed start state u0
and a set S ⊂ [w] of accept states. Then B accepts x ∈ {0, 1}n if B[x](u0 ) ∈ S. We say that B computes
the Boolean function f : {0, 1}n → {0, 1} if f (x) = 1 if and only if B[x](u0 ) ∈ S.
Our linear-algebraic view of a program B : {0, 1}n × [w] → [w] is as a matrix-valued function B :
{0, 1}n → {0, 1}w×w : for each x ∈ {0, 1}n , we let B[x] ∈ {0, 1}w×w be a matrix defined by
B[x](u, v) = 1 ⇐⇒ B[x](u) = v .
In our applications, the input x is randomly (or pseudorandomly) chosen. For a random variable X
on {0, 1}n , we have EX [B[X]] ∈ [0, 1]w×w . Then the entry in the uth row and vth column EX [B[X]] (u, v)
is the probability that B takes the initial state u to the final state v when given a random input from the
distribution X—that is,
\[ \mathbb{E}_X[B[X]](u, v) = \mathbb{P}_X[B[X](u) = v]. \]
Let B and B′ be width-w programs of length n and n′, respectively. We define the concatenation B ◦ B′ of B and B′ to be a width-w program of length n + n′ defined by
\[ (B \circ B')[x \circ x'] := B[x] \cdot B'[x'], \]
where · denotes matrix multiplication and x ◦ x′ ∈ {0, 1}^{n+n′} denotes the concatenation of the strings x ∈ {0, 1}^n and x′ ∈ {0, 1}^{n′}. Combinatorially, we run B and B′ on separate inputs, but the final state of B becomes the start state of B′.
Then a length-n, width-w, ordered branching program (abbreviated OBP) is a program B that can
be written B = B1 ◦ B2 ◦ · · · ◦ Bn , where each Bi is a length-1 width-w program. We refer to Bi : {0, 1} →
{0, 1}w×w as the ith layer of B. We can relate the matrix-valued function of the ith layer to the ith transition
function by Bi [x](u, v) = 1 ⇐⇒ Bi [x](u) = v. We can now write the entire computation in terms of
matrix multiplication:
B[x] = B1 [x1 ] · B2 [x2 ] · · · · · Bn [xn ]
for all x ∈ {0, 1}n . We denote the subprogram of B from i to j by Bi... j := Bi ◦ Bi+1 ◦ · · · ◦ B j .
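To make the matrix view concrete, here is a minimal sketch (illustrative only; the layer matrices are arbitrary choices, not taken from the paper) that represents an ordered branching program by its layer matrices, evaluates B[x] as a product of 0/1 matrices, and computes E_U[B[U]] as the product of the averaged layers, which is valid because the bits of a uniform input are independent.

```python
import numpy as np

# A length-3, width-3 ordered branching program, given as layer matrices:
# layers[i][b] is the 0/1 transition matrix B_{i+1}[b] (rows = current state,
# columns = next state; exactly one 1 per row).  Arbitrary illustrative choices.
layers = [
    {0: np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]]),
     1: np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0]])},
    {0: np.array([[1, 0, 0], [1, 0, 0], [0, 0, 1]]),   # a non-regular layer
     1: np.array([[0, 1, 0], [0, 1, 0], [0, 0, 1]])},
    {0: np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]]),
     1: np.array([[0, 0, 1], [1, 0, 0], [0, 1, 0]])},
]

def evaluate(layers, x):
    """B[x] = B_1[x_1] · B_2[x_2] · ... · B_n[x_n]."""
    result = np.eye(3, dtype=int)
    for layer, bit in zip(layers, x):
        result = result @ layer[bit]
    return result

def expectation_uniform(layers):
    """E_U[B[U]] factors as the product of the per-layer averages,
    since the bits of a uniform U are independent."""
    result = np.eye(3)
    for layer in layers:
        result = result @ (0.5 * layer[0] + 0.5 * layer[1])
    return result

print(evaluate(layers, (1, 0, 1)))    # 0/1 matrix: state u is mapped to state B[x](u)
print(expectation_uniform(layers))    # stochastic matrix of transition probabilities
```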
A length-n, width-w, ordered branching program is often viewed as a directed acyclic graph, with
edges going left to right. The vertices are arranged into n + 1 state spaces each containing w vertices. The
edges are in layers going from one state space to the next and are specified by the transition functions. In
7 We use the notation B[x](u) rather than the more standard B(x, u) to clarify what is the input x (in square brackets) and what
is the state u (in round brackets).
8 Equivalently, B[x](u_0) = u_n, where we recursively define u_i = B_i[x_i](u_{i−1}) for i ∈ [n] = {1, . . . , n}.
9 B_i[x](u) = T_i(u, x) in the notation of the discussion in Section 1.1.
particular, there is an edge in layer i labelled x from vertex u in state space i − 1 to vertex Bi [x](u) in state
space i.
We use the following notational conventions when referring to layers and state spaces of a length-n
ordered branching program. Layers consist of edges and correspond to the Bi and are numbered from
1 to n. State spaces consist of vertices and correspond to the states between consecutive layers Bi and
are numbered from 0 to n. The edges in layer i (Bi ) go from vertices in state space i − 1 (on the left) to
vertices in state space i (on the right).
2.2 Classes of branching programs
An unordered, read-once, oblivious branching program is simply an ordered branching program composed
with a permutation of the input bits. Formally, a read-once, oblivious branching program B is an ordered
branching program B′ composed with a permutation π. That is,
\[ B[x] = B'[\pi(x)] = B'_1[x_{\pi(1)}] \cdot B'_2[x_{\pi(2)}] \cdots B'_n[x_{\pi(n)}], \]
where the ith bit of π(x) is the π(i)th bit of x.
For a program B and an arbitrary distribution X, the matrix E_X[B[X]] is stochastic—that is,
\[ \sum_v \mathbb{E}_X[B[X]](u, v) = 1 \]
for all u and EX [B[X]] (u, v) ≥ 0 for all u and v. A program B is called a regular program if the matrix
EU [B[U]] is doubly stochastic—that is, both EU [B[U]] and its transpose EU [B[U]]∗ are stochastic. A
program B is called a permutation program if B[x] is a permutation matrix for every x or, equivalently,
EX [B[X]] is doubly stochastic for every distribution X. Note that a permutation program is necessarily a
regular program and, if both B and B0 are regular or permutation programs, then so is their concatenation.
A read-once, oblivious, regular branching program is a branching program where each layer Bi is a
regular program and likewise for a permutation branching program. We will refer to layer i as regular if
Bi is a regular program and we say that layer i is non-regular otherwise.
Equivalently, a regular, read-once, oblivious branching program is one where, in the directed acyclic
graph, each vertex has in-degree 2 (in addition to having out-degree 2) except those in the first state
space—that is, each layer of edges is a regular graph (hence the name). A permutation, read-once,
oblivious branching program has the additional constraint that the incoming edges have distinct labels.
A regular program B has the property that the uniform distribution is a stationary distribution of the
Markov chain defined by the matrix EU [B[U]]—that is, if u is a uniformly random start vertex and U is a
uniformly random input, then B[U](u) is a uniformly random final state. Moreover, if B is a permutation
program, the uniform distribution is stationary for EX [B[X]] for any distribution X.
We also consider branching programs of varying width—some state spaces have more vertices than
others. The width wi of a particular state space i is the number of vertices in the ith state space. The
overall width of the branching program is the maximum width of any state space. This means that the
layers Bi may give non-square matrices. Namely, Bi [x] ∈ [0, 1]wi−1 ×wi for all i and x, where wi is the width
of state space i.
2.3 Norms
We are interested in constructing a random variable X (the output of the pseudorandom generator) such that
EX [B[X]] ≈ EU [B[U]], where U is uniform on {0, 1}n and B is a width-3, read-once, oblivious branching
program. Throughout we use U to denote the uniform distribution. The error of the pseudorandom
generator will be measured by a norm of the matrix EX [B[X]] − EU [B[U]].
For a matrix A ∈ R^{w×w′}, define the ρ operator norm of A by
\[ \|A\|_\rho = \max_v \frac{\|vA\|_\rho}{\|v\|_\rho}, \]
where ρ specifies a vector norm (usually the 1, 2, or ∞ norm). Define the Frobenius norm of A ∈ R^{w×w′} by
\[ \|A\|_{\mathrm{Fr}}^2 = \sum_{u,v} A(u, v)^2 = \operatorname{trace}(A^* A) = \sum_\lambda |\lambda|^2, \]
where A^* is the (conjugate) transpose of A and the last sum is over the singular values λ of A. Note that ‖A‖₂ ≤ ‖A‖_Fr for all A.
If f : {0, 1}n → {0, 1} is a function computed by a width-w, length-n ordered branching program B,
we can relate the error of f under pseudorandom input X versus uniformly random input U by
\[ \Bigl| \mathbb{P}_X[f(X) = 1] - \mathbb{P}_U[f(U) = 1] \Bigr| \le \sqrt{w} \cdot \Bigl\| \mathbb{E}_X[B[X]] - \mathbb{E}_U[B[U]] \Bigr\|_2. \tag{2.1} \]
In particular, if u_0 ∈ [w] is the start state of B and S ⊂ [w] is the set of accept states of B, then f(x) = ∑_{v∈S} B[x](u_0, v) for all x ∈ {0, 1}^n. Furthermore, if there is a single accept state (which can be assumed by collapsing all the accept states into one), the factor of √w can be omitted.
2.4 Fourier analysis
Let B : {0, 1}^n → R^{w×w′} be a matrix-valued function (such as given by a length-n, width-w, read-once, oblivious branching program). Then we define the Fourier transform of B as a matrix-valued function B̂ : {0, 1}^n → R^{w×w′} given by
\[ \hat{B}[s] := \mathbb{E}_U[B[U] \chi_s(U)], \]
where s ∈ {0, 1}^n (or, equivalently, s ⊂ [n]) and
\[ \chi_s(x) = (-1)^{\sum_i x(i) \cdot s(i)} = \prod_{i \in s} (-1)^{x(i)}. \]
We refer to B̂[s] as the sth Fourier coefficient of B. The order of a Fourier coefficient B̂[s] is |s|—the Hamming weight of s, which is the size of the set s or the number of 1s in the string s. Note that this is equivalent to taking the real-valued Fourier transform of each of the w · w′ entries of B[x] separately, but we will see below that this matrix-valued Fourier transform is nicely compatible with matrix algebra.
For a random variable X over {0, 1}^n we define its sth Fourier coefficient as
\[ \hat{X}(s) := \mathbb{E}_X[\chi_s(X)], \]
which, up to scaling, is the same as taking the real-valued Fourier transform of the probability mass
function of X. We have the following useful properties.
Lemma 2.1. Let A : {0, 1}^n → R^{w×w′} and B : {0, 1}^n → R^{w′×w″} be matrix-valued functions. Let X, Y, U be independent random variables over {0, 1}^n, where U is uniform. Let s, t ∈ {0, 1}^n. Then we have the following.
• Decomposition: If C[x ◦ y] = A[x] · B[y] for all x, y ∈ {0, 1}^n, then Ĉ[s ◦ t] = Â[s] · B̂[t].
• Expectation: E_X[B[X]] = ∑_s B̂[s] X̂(s).
• Fourier Inversion for Matrices: B[x] = ∑_s B̂[s] χ_s(x).
• Fourier Inversion for Distributions: P_X[X = x] = E_U[X̂(U) χ_U(x)].
• Convolution for Distributions: If Z = X ⊕ Y, then Ẑ(s) = X̂(s) · Ŷ(s).
• Parseval's Identity: ∑_{s∈{0,1}^n} ‖B̂[s]‖_Fr² = E_U[‖B[U]‖_Fr²].
• Convolution for Matrices: If, for all x ∈ {0, 1}^n, we have C[x] = E_U[A[U] · B[U ⊕ x]], then Ĉ[s] = Â[s] · B̂[s].
The Decomposition property is what makes the matrix-valued Fourier transform more convenient
than separately taking the Fourier transform of the matrix entries, as done by Bogdanov et al. [6]. If B is
a length-n, width-w, ordered branching program, then, for all s ∈ {0, 1}n ,
\[ \hat{B}[s] = \hat{B}_1[s_1] \cdot \hat{B}_2[s_2] \cdots \hat{B}_n[s_n]. \]
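As a sanity check on these definitions, here is a brute-force sketch (illustrative only; the two layers are arbitrary choices, not taken from the paper) that computes matrix-valued Fourier coefficients by direct summation over the uniform distribution and verifies the Decomposition property on a length-2 program.

```python
import itertools
import numpy as np

def fourier_coefficient(B, n, s):
    """B-hat[s] = E_U[ B[U] * chi_s(U) ], computed by brute force over {0,1}^n.
    B maps a tuple x in {0,1}^n to a numpy matrix."""
    total = 0
    for x in itertools.product((0, 1), repeat=n):
        chi = (-1) ** sum(xi * si for xi, si in zip(x, s))
        total = total + B(x) * chi
    return total / 2 ** n

# Layer matrices of a length-2, width-2 program (arbitrary illustrative choices).
layer1 = {0: np.array([[1, 0], [0, 1]]), 1: np.array([[0, 1], [1, 0]])}
layer2 = {0: np.array([[1, 0], [1, 0]]), 1: np.array([[0, 1], [0, 1]])}

B  = lambda x: layer1[x[0]] @ layer2[x[1]]
B1 = lambda x: layer1[x[0]]
B2 = lambda x: layer2[x[0]]

# Decomposition property: B-hat[s1 ◦ s2] = B1-hat[s1] · B2-hat[s2].
for s in itertools.product((0, 1), repeat=2):
    lhs = fourier_coefficient(B, 2, s)
    rhs = fourier_coefficient(B1, 1, s[:1]) @ fourier_coefficient(B2, 1, s[1:])
    assert np.allclose(lhs, rhs)
```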
2.5 Fourier mass
We analyze small bias distributions as pseudorandom generators for read-once, oblivious branching
programs using Fourier analysis. Intuitively, the Fourier transform of a branching program expresses that
program as a linear combination of linear functions (parities), which can then be fooled using a small-bias
space.
Define the Fourier mass of a matrix-valued function B to be
\[ L(B) := \sum_{s \ne 0} \bigl\| \hat{B}[s] \bigr\|_2. \]
Also, define the Fourier mass of B at level k as
\[ L^k(B) := \sum_{s \in \{0,1\}^n : |s| = k} \bigl\| \hat{B}[s] \bigr\|_2. \]
Note that L(B) = ∑_{k≥1} L^k(B). We define
\[ L^{\ge k}(B) := \sum_{k' \ge k} L^{k'}(B). \]
We analogously define the quantities L^{≤k}(B), L^{>k}(B), and L^{<k}(B).
The following lemma asserts that the Fourier mass is unaffected by order.
Lemma 2.2. Let B, B′ : {0, 1}^n → R^{w×w} be matrix-valued functions satisfying B[x] = B′[π(x)], where π : [n] → [n] is a permutation. Then, for all s ∈ {0, 1}^n, we have B̂[s] = B̂′[π(s)]. In particular, L(B) = L(B′) and L^k(B) = L^k(B′) for all k.
Lemma 2.2 implies that the Fourier mass of any read-once, oblivious branching program B is equal to
the Fourier mass of the corresponding ordered branching program B0 .
2.6 Small-bias distributions
The bias of a random variable X over {0, 1}^n is defined as
\[ \operatorname{bias}(X) := \max_{s \ne 0} \bigl| \hat{X}(s) \bigr|. \]
A distribution is ε-biased if it has bias at most ε. Note that a distribution has bias 0 if and only if it is
uniform. Thus a distribution with small bias is an approximation to the uniform distribution. We can
sample an ε-biased distribution X on {0, 1}n with seed length O(log(n/ε)) and using space O(log(n/ε))
[31, 2].
Small-bias distributions are useful pseudorandom generators: an ε-biased random variable X is indistinguishable from uniform by any F₂-linear function (a parity of a subset of the bits of X). That is, for any nonempty s ⊂ [n], we have |E_X[⊕_{i∈s} X_i] − 1/2| ≤ ε/2.
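The definition of bias can be checked directly on small examples. The sketch below (illustrative only; it is not a construction of a small-bias generator, which requires the constructions of [31, 2]) computes bias(X) by brute force from the probability mass function of X.

```python
import itertools

def bias(pmf, n):
    """bias(X) = max over nonzero s of |E_X[(-1)^{s.X}]| for a distribution given
    as a dict mapping each x in {0,1}^n (as a tuple) to its probability."""
    worst = 0.0
    for s in itertools.product((0, 1), repeat=n):
        if not any(s):
            continue  # skip s = 0
        corr = sum(p * (-1) ** sum(xi * si for xi, si in zip(x, s))
                   for x, p in pmf.items())
        worst = max(worst, abs(corr))
    return worst

# Example: the uniform distribution has bias 0; a distribution supported only on
# strings of even parity has bias 1 (the full parity distinguishes it perfectly).
n = 3
uniform = {x: 1 / 2 ** n for x in itertools.product((0, 1), repeat=n)}
even    = {x: (1 / 4 if sum(x) % 2 == 0 else 0.0)
           for x in itertools.product((0, 1), repeat=n)}
print(bias(uniform, n))  # 0.0
print(bias(even, n))     # 1.0
```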
If L(B) is small, then B is fooled by a small-bias distribution, as stated below.
Lemma 2.3. Let B be a length-n, width-w, read-once, oblivious branching program. Let X be an ε-biased
random variable on {0, 1}n . We have
\[ \Bigl\| \mathbb{E}_X[B[X]] - \mathbb{E}_U[B[U]] \Bigr\|_2 = \Bigl\| \sum_{s \ne 0} \hat{B}[s] \hat{X}(s) \Bigr\|_2 \le L(B)\,\varepsilon. \]
It is known that for length-n, width-2, read-once, oblivious branching programs B we have L(B) ≤
O(n) [5], so they are fooled by 1/poly(n)-biased generators (which have seed length O(log n)). But for a
length-n, width-3, permutation, read-once, oblivious branching program B we can have L(B) = 2^{Θ(n)}. For example, the program B_{mod 3} that computes the Hamming weight of its input modulo 3 has exponential Fourier mass and indeed is not fooled by the 2^{−Θ(n)}-biased distribution that is uniform on {x ∈ {0, 1}^n : |x|
mod 3 = 0}. This means we cannot apply small-bias pseudorandom generators directly. However, small-
bias distributions play a large part in our proof. The key is that, after some pre-processing (namely,
applying an “averaging restriction”), we can ensure that L(B) is small.
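To see the exponential Fourier mass of the mod 3 program concretely, the following sketch (an illustrative computation, not from the paper) builds the layer Fourier coefficients of B_{mod 3} and evaluates L(B) via the Decomposition property of Lemma 2.1; the mass grows roughly like 1.37^n.

```python
import itertools
import numpy as np

# Ordered width-3 permutation branching program computing (Hamming weight mod 3):
# each layer is the identity on input bit 0 and a cyclic shift on input bit 1.
I = np.eye(3)
C = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])
M = {0: (I + C) / 2,   # B_i-hat[0] = (B_i[0] + B_i[1]) / 2
     1: (I - C) / 2}   # B_i-hat[1] = (B_i[0] - B_i[1]) / 2

def fourier_mass(n):
    """L(B) = sum over nonzero s of ||B-hat[s]||_2, using B-hat[s] = prod_i B_i-hat[s_i]."""
    total = 0.0
    for s in itertools.product((0, 1), repeat=n):
        if not any(s):
            continue
        coeff = np.eye(3)
        for bit in s:
            coeff = coeff @ M[bit]
        total += np.linalg.norm(coeff, 2)
    return total

for n in (4, 8, 12):
    print(n, fourier_mass(n))   # grows roughly like 1.37^n, i.e. 2^{Theta(n)}
```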
3 Fourier analysis of width-3 branching programs
In this section we prove a bound on the low-order Fourier mass of width-3, read-once, oblivious branching
programs. This is key to the analysis of our pseudorandom generator.
Theorem 3.1. Let B be a length-n, width-3, read-once, oblivious (but unordered) branching program.
Then, for all k ∈ [n],
L^k(B) ≤ n² · O(log n)^k.
In terms of Boolean functions, if f : {0, 1}^n → {0, 1} is computed by a 3OBP, then, for all k ∈ [n], we have L^k(f) ≤ n² · (O(log n))^k. This holds because there is a 3OBP B such that f(x) = B[x](u, v) for some entry (u, v) ∈ [3] × [3], whence L^k(f) ≤ L^k(B). This yields Theorem 1.4.
The proof proceeds in two parts. The first part reduces the problem to one about branching programs
of a special form, namely many state spaces have collapsed to width 2. The second part uses the mixing
properties of such almost-width-2 programs to bound the Fourier mass.
3.1 Reduction of width by random restriction
The first part of our proof is a reduction from an arbitrary width-3, ordered branching program to one
with many state spaces collapsed to width 2. Formally, our reduction can be stated as follows.
Proposition 3.2. Let B be a length-n, width-3, ordered branching program (abbreviated 3OBP) with the
first and last state spaces having width at most 2. Then, for all k ≤ m ≤ n, there exist length-n 3OBPs
D_0, D_1, . . . , D_n such that
1. we have the bound
\[ L^k(B) \le \binom{m}{k} \cdot n \cdot \sum_{\ell=0}^{n} 2^{-(\ell-1)(m-k)} L^k(D_\ell) \]
and
2. each D_ℓ can be decomposed as
\[ D_\ell = D^\ell_1 \circ D^\ell_2 \circ \cdots \circ D^\ell_r \qquad (r \le n), \]
where each D^ℓ_i is a 3OBP with at most 3ℓk non-regular layers and first and last state spaces of width at most 2.
Before continuing, we preview how Proposition 3.2 will be used. In Section 3.2, we will prove L^k(D_ℓ) ≤ n · O(ℓ)^k. Invoking Proposition 3.2 with m = 2k + 1, we get
\[ L^k(B) \le \binom{m}{k} \cdot n \cdot \sum_{\ell=0}^{n} 2^{-(\ell-1)(m-k)}\, n \cdot O(\ell)^k \le n^2 \cdot O(k)^k. \]
Finally, in Section 3.3, we show that we may essentially assume k ≤ O(log n). Thus we get the Fourier growth bound L^k(B) ≤ n² · O(log n)^k of Theorem 3.1.
Now we turn our attention to proving Proposition 3.2. Our proof is based on those of Brody and
Verbin [10] and Steinberger [44] for the Coin Theorem. In particular, we will apply a careful random
restriction to B and argue that this gives a branching program of the form given by D` .
First we must define the notation for our random restrictions. For g ⊂ [n] and x ∈ {0, 1}n , define the
restriction of B to g using x (denoted B|g←x ) to be the branching program obtained by setting the inputs
(eliminating layers) of B outside g to values from x and leaving the inputs in g free. More formally,
\[ B|_{g \leftarrow x}[y] = B[\mathrm{Select}(g, y, x)], \]
where
\[ \mathrm{Select}(g, y, x)_i = \begin{cases} y_i & \text{if } i \in g, \\ x_i & \text{if } i \notin g. \end{cases} \]
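In code, the restriction is just a wrapper around Select. The sketch below (illustrative; B here is an arbitrary Boolean function and the indices are 0-based) makes the definition concrete.

```python
def select(g, y, x):
    """Select(g, y, x)_i = y_i if i in g, else x_i (0-based indices)."""
    return tuple(y[i] if i in g else x[i] for i in range(len(x)))

def restrict(B, g, x):
    """B|_{g <- x}: fix the bits outside g to the values from x,
    leave the bits in g free (they are read from the new input y)."""
    return lambda y: B(select(g, y, x))

# Illustrative example: B is any Boolean function of 4 bits.
B = lambda x: (x[0] and x[1]) ^ x[3]
g = {1, 3}                          # positions left free
x = (1, 0, 1, 0)                    # values for the restricted positions
B_restricted = restrict(B, g, x)
print(B_restricted((0, 1, 0, 1)))   # uses bits 1 and 3 of y, bits 0 and 2 of x
```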
We prove Proposition 3.2 by considering a restriction B|g←x for a carefully chosen g and a random x.
We show that it suffices to bound the Fourier growth of the restricted program B|g←x (Lemma 3.3) and
that the restricted program is of the desired form D` with high probability (Lemma 3.4).
In order to specify the permutations g that we use for our random restrictions, we must appropriately
decompose B as follows. Define a chunk to be a 3OBP with exactly one non-regular layer. We partition
B into chunks B = C1 ◦ · · · ◦Cq , where q is the number of non-regular layers in B. This partition is not
necessarily unique; we arbitrarily fix one such partitioning for each 3OBP and simply refer to the ith
chunk Ci . Let ci ⊂ [n] be the layers corresponding to Ci .
Our random restrictions will be defined in terms of chunks, so that each chunk is either entirely
restricted or entirely free. The choice of which chunks to restrict follows the “interwoven hybrids”
technique of Steinberger [44]. The sets of variables we restrict are as follows. For t ⊂ [m], define
\[ g_t := \bigcup_{i \in t,\; 0 \le j \le \frac{q-i}{m}} c_{i + m \cdot j} = \bigcup_{i \in [q] : (i \bmod m) \in t} c_i \]
and
\[ G_t^k := \{ s \subset g_t : |s| = k \}. \]
We refer to g_t as the t-th group of indices and G_t^k as the t-th group of (order-k) Fourier coefficients.
The following lemma tells us that we may bound the level-k Fourier weight of B by considering a
fixed subset t ⊂ [m] of size k and bounding the level-k Fourier weight of the branching program that
results by randomly restricting the variables outside of gt .
Lemma 3.3. Let B : {0, 1}^n → R^{w×w} be an arbitrary matrix-valued function, k ∈ [n], and m ≥ k. Define g_t as above. Then
\[ L^k(B) \le \binom{m}{k} \cdot \max_{t \subset [m] : |t| = k} \mathbb{E}_U\bigl[ L^k(B|_{g_t \leftarrow U}) \bigr]. \]
Proof. Every Fourier coefficient s ⊂ [n] of size k appears in G_t^k for some t ⊂ [m]. Note that some Fourier coefficients appear in more than one G_t^k. Thus
\[ L^k(B) = \sum_{s \subset [n] : |s| = k} \bigl\| \hat{B}[s] \bigr\|_2 \le \sum_{t : |t| = k} \sum_{s \in G_t^k} \bigl\| \hat{B}[s] \bigr\|_2. \]
For all s ∈ G_t^k, by the definition of the Fourier transform and by the convexity of ‖·‖₂,
\[ \bigl\| \hat{B}[s] \bigr\|_2 = \Bigl\| \mathbb{E}_U\bigl[ B[U] \chi_s(U) \bigr] \Bigr\|_2 = \Bigl\| \mathbb{E}_{U,U'}\bigl[ B|_{g_t \leftarrow U}[U'] \chi_s(U') \bigr] \Bigr\|_2 = \Bigl\| \mathbb{E}_U\bigl[ \widehat{B|_{g_t \leftarrow U}}[s] \bigr] \Bigr\|_2 \le \mathbb{E}_U\Bigl[ \bigl\| \widehat{B|_{g_t \leftarrow U}}[s] \bigr\|_2 \Bigr]. \]
Combining these inequalities yields
\[ L^k(B) \le \sum_{t : |t| = k} \sum_{s \in G_t^k} \mathbb{E}_U\Bigl[ \bigl\| \widehat{B|_{g_t \leftarrow U}}[s] \bigr\|_2 \Bigr] = \sum_{t : |t| = k} \mathbb{E}_U\bigl[ L^k(B|_{g_t \leftarrow U}) \bigr] \le \binom{m}{k} \max_{t \subset [m] : |t| = k} \mathbb{E}_U\bigl[ L^k(B|_{g_t \leftarrow U}) \bigr]. \]
Given Lemma 3.3, we may now prove Proposition 3.2 by giving an upper bound on
\[ \mathbb{E}_U\bigl[ L^k(B|_{g_t \leftarrow U}) \bigr] \]
for any fixed t ⊂ [m] with |t| = k. To do this, we prove that the random restriction B|_{g_t←U} is with high probability a branching program of the form D_ℓ specified by Proposition 3.2. We state this formally.
Lemma 3.4. Let B be a length-n 3OBP with width 2 in the first and last state spaces. Let k, ` ∈ [n] and
m ≥ k. Fix t ⊆ [m] with |t| = k. Then with probability at least 1 − n · 2^{−ℓ·(m−k)} over a random choice of
x ∈ {0, 1}n , we can write
B|gt ←x = D1 ◦ D2 ◦ · · · ◦ Dr ,
where r ∈ [n] and each Di is a 3OBP with at most 3`k non-regular layers and first and last state spaces of
width at most 2.
Proof. The key step in the proof is the following claim.
Claim 3.5. Let E_1, . . . , E_n be independent events and suppose P[E_i] ≥ 1/2 for all i. Then, for any a, with probability at least 1 − n · 2^{−a} there are no a consecutive events E_i, E_{i+1}, . . . , E_{i+a−1} none of which occur, i. e.,
\[ \mathbb{P}\Bigl[ \exists i \in [n - a + 1] \;\; \bar{E}_i \wedge \bar{E}_{i+1} \wedge \cdots \wedge \bar{E}_{i+a-1} \Bigr] \le n \cdot 2^{-a}. \]
Proof. By our assumptions on the events, P[Ē_i ∧ Ē_{i+1} ∧ · · · ∧ Ē_{i+a−1}] ≤ 2^{−a} for all i. The claim now follows from a union bound.
First decompose B into chunks B = C1 ◦ · · · ◦Cq .10 When we perform the restriction gt ← x, some of
the chunks “survive” the restriction (namely those Ci where i mod m ∈ t), while the rest are restricted;
some of the restricted chunks will collapse to width 2, meaning only one or two of the vertices in their
final state space will be reachable. We then regroup the surviving chunks as B|gt ←x = D1 ◦ D2 ◦ · · · ◦ Dr
splitting every time there is a collapse to width two. We simply have to verify that with high probability
no Di will contain more than 3`k non-regular layers.
The key to the proof is that, when a chunk is restricted, it will reduce the width to two with probability
at least one half. (This observation is also key to the results of Brody and Verbin [10] and Steinberger [44].)
This is because of the non-regular layer contained in the chunk. Graphically, this layer must contain a
right vertex with at most one incoming edge. With probability at least one half, this edge is not selected by
the restriction and the vertex becomes unreachable, thereby reducing the width. In more detail, write the
layers of Ci = Ba ◦ · · · ◦ Bb and let Bc be the non-regular layer; then EU [Bc [U]] has a column whose sum
is strictly less than 1. So there must exist a setting of xc ∈ {0, 1} such that Bc [xc ] has a column with sum
10 If B has no non-regular layers, then the result holds trivially. So we may restrict our attention to the case where there is at least one non-regular layer.
less than 1. Since this sum is a non-negative integer, this means the column sum is zero. When x is such
that Bc [xc ] has a zero column, then, for any d ∈ {c, c + 1, . . . , b}, the matrix Bc [xc ] · Bc+1 [xc+1 ] · · · · Bd [xd ]
also has a zero column by induction, since each B_d[x_d] has exactly one 1 in each row. (Graphically,
there is only one edge leaving each vertex after the restriction.) Moreover, the restriction is independent
across chunks.
Now we apply Claim 3.5. Each restricted chunk corresponds to one event, with the event being
the width of that chunk collapsing. Consider ` · m consecutive chunks of B. Of these, ` · (m − k) are
restricted—this is the number of consecutive events a = ` · (m − k) we consider. By Claim 3.5, with high
probability, at least one of these events will occur amongst every a consecutive events. That is, with high
probability, every ` · m consecutive chunks of B contains a chunk that is reduced to width two.
Assume that the conclusion of Claim 3.5 happens. It follows that every Di is comprised of at most
` · m chunks from B. (Otherwise we would split it into two blocks Di .) Now we can bound the number
of non-regular layers in each Di . Each of the ` · k surviving chunks contributes one non-regular layer.
In addition, the restricted chunks can contribute non-regular layers, but only one between each pair of
consecutive surviving chunks. Thus the number of non-regular layers is bounded by 3`k.
We may now complete the proof of Proposition 3.2.
Proof of Proposition 3.2. By Lemma 3.3, it suffices to bound, for every fixed t, the quantity
\[ \mathbb{E}_U\bigl[ L^k(B|_{g_t \leftarrow U}) \bigr]. \]
We compute the expectation of L^k(B|_{g_t←U}) using Lemma 3.4. To do so we must convert the tail bound of Lemma 3.4 into a bound on the expectation.
Let ℓ*(x) be the smallest ℓ such that we can write
\[ B|_{g_t \leftarrow x} = D^\ell_1 \circ D^\ell_2 \circ \cdots \circ D^\ell_r, \]
where each D^ℓ_i is a 3OBP with at most 3ℓk non-regular layers and the first and last state spaces have width at most two. Lemma 3.4 gives
\[ \mathbb{P}[\ell^*(U) > \ell] \le n \cdot 2^{-\ell \cdot (m-k)} \]
for all ℓ. Also P[ℓ*(U) > n] = 0 as there are at most n layers and hence at most n non-regular layers.
Thus
\[ \mathbb{E}_U\bigl[ L^k(B|_{g_t \leftarrow U}) \bigr] \le \sum_{\ell=0}^{n} \mathbb{P}[\ell^*(U) = \ell] \cdot \mathbb{E}_x\bigl[ L^k(B|_{g_t \leftarrow x}) \;\big|\; \ell^*(x) \le \ell \bigr] \le \sum_{\ell=0}^{n} n \cdot 2^{-(\ell-1)\cdot(m-k)} \cdot L^k(D_\ell), \]
where
\[ D_\ell = D^\ell_1 \circ D^\ell_2 \circ \cdots \circ D^\ell_r \]
is a branching program of length at most n that maximizes L^k(D_ℓ) subject to each D^ℓ_i having at most 3ℓk non-regular layers and having width two in the first and last state spaces of each D^ℓ_i.
3.2 Mixing in width 2
Now it remains to bound the Fourier mass of 3OBPs of the form given by Proposition 3.2.
Proposition 3.6. Let D_ℓ be a length-n 3OBP such that
\[ D_\ell = D^\ell_1 \circ D^\ell_2 \circ \cdots \circ D^\ell_r, \]
where each D^ℓ_i is a 3OBP with at most ℓ non-regular layers and width 2 in the first and last state spaces. Then
\[ L^k(D_\ell) \le 2n \cdot (10^4(\ell + 1))^k = n \cdot O(\ell)^k \]
for all k, ℓ ≥ 1.
Before we prove Proposition 3.6, we show how it implies the Fourier bound of Theorem 3.1 for
k ≤ O(log n).
Theorem 3.7. Let B be a length-n, width-3, read-once, oblivious branching program with width 2 in the
first and last state spaces. Then, for all k ∈ [n],
\[ L^k(B) := \sum_{s \in \{0,1\}^n : |s| = k} \bigl\| \hat{B}[s] \bigr\|_2 \le 16 n^2 \cdot \bigl( 10^6 k \bigr)^k = n^2 \cdot O(k)^k. \]
The difference between Theorems 3.7 and 3.1 is that here we have n² · O(k)^k, whereas we want n² · O(log n)^k. Later we will reduce the analysis of higher-order coefficients to the case of k ≤ O(log n).
Proof. Let B be a 3OBP and assume B has width 2 in the first and last state spaces. By Propositions 3.2
and 3.6, we have, setting m = 2k + 1,
\[
\begin{aligned}
L^k(B) &\le \binom{m}{k} \cdot n \cdot \sum_{\ell \ge 0} 2^{-(\ell-1)(m-k)} L^k(D_\ell) \\
&\le \binom{m}{k} \cdot n \cdot \sum_{\ell \ge 0} 2^{-(\ell-1)(m-k)} \cdot 2n \cdot \bigl(10^4(3\ell k + 1)\bigr)^k \\
&\le 4n^2 \binom{2k+1}{k} \sum_{\ell \ge 0} 2^{-(\ell-1)(k+1)} \bigl(10^4(3\ell k + 1)\bigr)^k \\
&\le 4n^2\, 4^k \sum_{\ell \ge 0} 2^{-(\ell-1)} \left( 10^4 \cdot \frac{3\ell k + 1}{2^{\ell-1}} \right)^k \\
&\le 4n^2\, 4^k \left( \sum_{\ell \ge 0} 2^{-(\ell-1)} \right) \cdot \bigl(10^4 \cdot 4k\bigr)^k \\
&\le 16 n^2 \cdot \bigl(10^4 \cdot 4 \cdot 4k\bigr)^k,
\end{aligned}
\]
as required.
A key notion in our proof of Proposition 3.6 is a measure of the extent to which an ordered branching
program (or subprogram) mixes, and the way this affects the Fourier spectrum. To measure the mixing,
we use a parameter that equals the second eigenvalue in the case of regular programs; this has been useful
as a measure of mixing in other work on pseudorandomness for space-bounded computation [35, 37, 38].
For an ordered branching program D : {0, 1}^n → {0, 1}^{w×w} of width w, define
\[ \lambda(D) = \max_{v \in \mathbb{R}^w : \sum_i v_i = 0} \frac{\| v \cdot \mathbb{E}_U[D[U]] \|_2}{\|v\|_2}. \tag{3.1} \]
Intuitively, if v is a distribution on start states of a branching program D, then v · E_U[D[U]] is the distribution on final states on uniformly random input. We are interested in how close v · E_U[D[U]] is to uniform relative to v. To do this we decompose v = u + v′ where u is the uniform distribution and v′ is orthogonal to the uniform distribution. Then
\[ v \cdot \mathbb{E}_U[D[U]] = u \cdot \mathbb{E}_U[D[U]] + v' \cdot \mathbb{E}_U[D[U]]. \]
When D is regular, we have u · E_U[D[U]] = u—that is, u is a stationary distribution for the Markov chain defined by the program D. Regardless of whether D is regular or not,
\[ \bigl\| v' \cdot \mathbb{E}_U[D[U]] \bigr\|_2 \le \lambda(D)\, \|v'\|_2. \]
So
\[ \bigl\| v \cdot \mathbb{E}_U[D[U]] - u \bigr\|_2 \le \lambda(D)\, \|v - u\|_2. \]
Thus, in the regular case, the quantity λ (D) measures how rapidly D mixes the states towards uniform,
measured in 2-norm.
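Since λ(D) depends only on the stochastic matrix M = E_U[D[U]], it can be computed numerically. The sketch below (illustrative; the two matrices are arbitrary choices, not from the paper) evaluates the maximum in (3.1) as the largest singular value of PM, where P projects row vectors onto the subspace {v : ∑_i v_i = 0}.

```python
import numpy as np

def mixing_parameter(M):
    """lambda from (3.1) applied to M = E_U[D[U]]: the maximum of ||v M||_2 / ||v||_2
    over row vectors v with sum(v) = 0, i.e. the largest singular value of P @ M,
    where P projects onto that mean-zero subspace."""
    w = M.shape[0]
    P = np.eye(w) - np.ones((w, w)) / w
    return np.linalg.norm(P @ M, 2)

# Illustrative stochastic matrices (arbitrary choices, not taken from the paper).
regular = np.array([[0.5, 0.5, 0.0],
                    [0.0, 0.5, 0.5],
                    [0.5, 0.0, 0.5]])    # doubly stochastic
non_regular = np.array([[1.0, 0.0, 0.0],
                        [0.0, 1.0, 0.0],
                        [1.0, 0.0, 0.0]])
print(mixing_parameter(regular))      # 0.5  -- at most 1 when E_U[D[U]] is doubly stochastic
print(mixing_parameter(non_regular))  # about 1.15 -- can exceed 1 for non-regular width-3 programs
```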
If D is regular, we have λ (D) ∈ [0, 1], where 0 corresponds to perfect mixing and 1 to no mixing. If
D is not regular, it is possible that λ (D) > 1. However, for width-w = 2—where EU [D[U]] is a 2 × 2
matrix—it turns out that λ (D) ≤ 1 even if D is non-regular. We now state this.
Lemma 3.8. Let
\[ M = \begin{pmatrix} 1-\alpha & \alpha \\ \beta & 1-\beta \end{pmatrix} \]
be a stochastic matrix. Then
\[ \lambda(M) = \frac{\|(1, -1) \cdot M\|_2}{\|(1, -1)\|_2} = |1 - \alpha - \beta| \le 1. \]
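For completeness, here is a brief verification sketch (added here; it is immediate from the definitions): any v ∈ R² with ∑_i v_i = 0 is a scalar multiple of (1, −1), and
\[ (1, -1) \cdot M = \bigl( (1-\alpha) - \beta,\; \alpha - (1-\beta) \bigr) = (1 - \alpha - \beta) \cdot (1, -1), \]
so the maximum in (3.1) equals |1 − α − β|, which is at most 1 since α, β ∈ [0, 1].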
Notice that |(1 − α) − β | = |(1 − β ) − α| measures how different the behaviour is between the two
possible start states. Thus λ (M) still captures some kind of mixing, even if it is not convergence to
uniform. This lemma is crucial to our analysis and is the main reason our results do not extend to higher
widths.11 Mixing also can be used to bound Fourier coefficients via the following lemma.
11 For further discussion of the limitations of our proof, see Section 7.
Lemma 3.9. Let D = D_1 ◦ D_2 : {0, 1}^{n_1+n_2} → {0, 1}^{w×w} be the composition of programs D_1 and D_2 of lengths n_1 and n_2 respectively. Consider the Fourier coefficient D̂[s ◦ 0^{n_2}] for s ∈ {0, 1}^{n_1}. Then
\[ \bigl\| \hat{D}[s \circ 0^{n_2}] \bigr\|_2 \le \bigl\| \hat{D}_1[s] \bigr\|_2 \cdot \lambda(D_2). \]
Proof. First we show that, for any s ≠ 0^{n_1}, the rows of D̂_1[s] sum to zero. Since D_1[x] is a stochastic matrix for all x ∈ {0, 1}^{n_1}, we have D_1[x] · \mathbf{1} = \mathbf{1} for all x ∈ {0, 1}^{n_1}, where \mathbf{1} is the all-ones vector. Thus, for s ≠ 0^{n_1},
\[ \hat{D}_1[s] \cdot \mathbf{1} = \mathbb{E}_U\bigl[ D_1[U] \cdot \mathbf{1}\, \chi_s(U) \bigr] = \mathbb{E}_U\bigl[ \mathbf{1}\, \chi_s(U) \bigr] = \mathbf{0}. \]
Thus
\[ \bigl\| \hat{D}[s \circ 0^{n_2}] \bigr\|_2 = \max_v \frac{\bigl\| v \cdot \hat{D}_1[s] \cdot \hat{D}_2[0^{n_2}] \bigr\|_2}{\|v\|_2} = \max_v \frac{\bigl\| v \cdot \hat{D}_1[s] \bigr\|_2}{\|v\|_2} \cdot \frac{\bigl\| v \cdot \hat{D}_1[s] \cdot \hat{D}_2[0^{n_2}] \bigr\|_2}{\bigl\| v \cdot \hat{D}_1[s] \bigr\|_2} \le \bigl\| \hat{D}_1[s] \bigr\|_2 \cdot \lambda(D_2), \]
as
\[ \sum_i \bigl( v \cdot \hat{D}_1[s] \bigr)_i = 0 \]
for all v.
Mixing is a very powerful tool on its own. For ordered branching programs B in which every layer is
mixing—that is λ (Bi ) ≤ 1 − γ < 1 for all i—Lemma 3.9 can be used with an inductive argument (simpler
than the proof below) to obtain a (1/γ)^{O(k)} bound on the level-k Fourier mass.
We will use mixing as part of our proof. We show that any Di with width two in the first and last state
spaces with few non-regular layers (the form given by Proposition 3.2) will either mix well or have small
Fourier mass after restriction.
To formalize this, we define the p-damped Fourier mass of a branching program B as
\[ L_p(B) = \sum_{k > 0} p^k L^k(B) = \sum_{s \ne 0} p^{|s|} \bigl\| \hat{B}[s] \bigr\|_2. \]
We can translate p-damped Fourier mass back to level-k Fourier mass: L^k(B) ≤ L_p(B) p^{−k} for all k and p.
The main lemma we prove in this section is the following.
Lemma 3.10. If D is a length-d 3OBP with at most k ≥ 1 non-regular layers that has only two vertices in the first and last state spaces, then
\[ \lambda(D) + L_p(D) \le 1 \]
for any p ≤ 1/(10^4(k + 1)).
First, we show that Lemma 3.10 implies Proposition 3.6.
Proof of Proposition 3.6. Let D_ℓ be a 3OBP of length at most n such that D_ℓ = D^ℓ_1 ◦ D^ℓ_2 ◦ · · · ◦ D^ℓ_r, where each D^ℓ_i is a 3OBP with at most ℓ non-regular layers and width 2 in the first and last state spaces. Let p = 1/(10^4(ℓ + 1)).
We inductively show that
\[ L_p(D^\ell_1 \circ \cdots \circ D^\ell_i) = L_p(D^\ell_{1\cdots i}) \le 2i, \]
and hence L_p(D_ℓ) ≤ 2r ≤ 2n, which implies the result. For i = 0 this is trivial. Now suppose it holds for i.
By decomposition, we have
\[
\begin{aligned}
L_p(D^\ell_{1\cdots i} \circ D^\ell_{i+1})
&= \sum_{(s,t) \ne 0} p^{|s|+|t|} \bigl\| \hat{D}^\ell_{1\cdots i}[s] \cdot \hat{D}^\ell_{i+1}[t] \bigr\|_2 && \text{(Lemma 2.1, Decomposition)} \\
&\le \sum_{s \ne 0} p^{|s|} \bigl\| \hat{D}^\ell_{1\cdots i}[s] \bigr\|_2 \sum_{t \ne 0} p^{|t|} \bigl\| \hat{D}^\ell_{i+1}[t] \bigr\|_2
 + \sum_{s \ne 0} p^{|s|} \bigl\| \hat{D}^\ell_{1\cdots i}[s] \cdot \hat{D}^\ell_{i+1}[0] \bigr\|_2
 + \bigl\| \hat{D}^\ell_{1\cdots i}[0] \bigr\|_2 \sum_{t \ne 0} p^{|t|} \bigl\| \hat{D}^\ell_{i+1}[t] \bigr\|_2 \\
&\le L_p(D^\ell_{1\cdots i}) \cdot L_p(D^\ell_{i+1}) + L_p(D^\ell_{1\cdots i})\,\lambda(D^\ell_{i+1}) + \bigl\| \hat{D}^\ell_{1\cdots i}[0] \bigr\|_2 \cdot L_p(D^\ell_{i+1}) && \text{(Lemma 3.9)} \\
&\le L_p(D^\ell_{1\cdots i}) \cdot 1 + \bigl\| \hat{D}^\ell_{1\cdots i}[0] \bigr\|_2 \cdot L_p(D^\ell_{i+1}) && \text{(Lemma 3.10)} \\
&\le 2i + 2,
\end{aligned}
\]
since ‖D̂^ℓ_{1···i}[0]‖₂ ≤ 2, as this is the norm of a 2 × 2 stochastic matrix. Thus, we have that
\[ L^k(D_\ell) \le p^{-k} L_p(D_\ell) \le 2n \cdot (10^4(\ell+1))^k, \]
as required.
Now we turn our attention to Lemma 3.10. We split into two cases. First, if λ (D) is far from 1,
i. e., λ (D) ≤ 0.99, then we need only ensure L p (D) ≤ 1/100. This is the “easy case” which is a simple
extension of the analysis of ordered regular branching programs of Reingold et al. [36]. Second, if
λ (D) = 1, then we will see that D is trivial—i. e., L p (D) = 0—and we are also done. The “hard case” is
when λ (D) is very close to 1, i. e., 0.99 ≤ λ (D) < 1.
3.2.1 Easy case—good mixing
We extend the analysis of Reingold et al. [36] to prove the following result.
Lemma 3.11. Let D be a 3OBP with at most k non-regular layers. If p ≤ 1/(10^4(k + 1)), then L_p(D) ≤ 1/100.
It immediately follows that, assuming λ(D) < 0.99 (i. e., good mixing), we have λ(D) + L_p(D) ≤ 1 when p ≤ 1/(10^4(k + 1)). This covers the “easy” case of Lemma 3.10.
We begin with the following lemma that forms the basis of the analysis of Reingold et al.
Lemma 3.12 (Braverman et al. [9, Lemma 4], reformulated by Reingold et al. [36, Lemma 3.1]). Let B
be a length-n, width-w, ordered, regular branching program. Then
\[ \sum_{1 \le i \le n} \bigl\| \hat{B}_{i\cdots n}[1 \circ 0^{n-i}] \bigr\|_2 \le 2w^2, \]
where Bi···n denotes the ordered branching program composed of layers i, i + 1, . . . , n from B.
The entry in the uth row and vth column of
\[ 2\,\hat{B}_{i\cdots n}[1 \circ 0^{n-i}] = (B_i[0] - B_i[1]) \cdot \mathbb{E}_U[B_{i+1\cdots n}[U]] \]
is the probability of reaching vertex v in state space n given that we reached vertex u in state space i − 1 and the ith input bit is 0, minus the same probability given that the ith input bit is 1. Thus the quantity ‖B̂_{i···n}[1 ◦ 0^{n−i}]‖₂ measures the correlation between the ith input bit and the final state of the program, which we call the
weight of bit i. Braverman et al. [9] proved Lemma 3.12 for a slightly different measure of weight. Their
result was translated into the above Fourier-analytic form by Reingold et al. [36].
We can add some non-regular layers to get the following.
Lemma 3.13. Let B be a length-n, width-w, ordered branching program with at most k non-regular
layers. Then
\[ \sum_{1 \le i \le n} \bigl\| \hat{B}_{i\cdots n}[1 \circ 0^{n-i}] \bigr\|_2 \le (2w^2 + 1)\sqrt{w}\,(k + 1). \]
Proof. The proof proceeds by induction on k. If k = 0, the result follows from Lemma 3.12. Suppose the
result holds for some k and let B be a length-n, width-w ordered branching program with k + 1 non-regular
layers. Let i∗ be the index of the first non-regular layer. Then
\[
\begin{aligned}
\sum_{1 \le i \le n} \bigl\| \hat{B}_{i\cdots n}[1 \circ 0^{n-i}] \bigr\|_2
&= \sum_{1 \le i < i^*} \bigl\| \hat{B}_{i\cdots(i^*-1)}[1 \circ 0^{i^*-i-1}] \cdot \hat{B}_{i^*\cdots n}[0] \bigr\|_2
 + \bigl\| \hat{B}_{i^*\cdots n}[1 \circ 0^{n-i^*}] \bigr\|_2
 + \sum_{i^* < i \le n} \bigl\| \hat{B}_{i\cdots n}[1 \circ 0^{n-i}] \bigr\|_2 \\
&\le \Biggl( \sum_{1 \le i < i^*} \bigl\| \hat{B}_{i\cdots(i^*-1)}[1 \circ 0^{i^*-i-1}] \bigr\|_2 \Biggr) \cdot \bigl\| \hat{B}_{i^*\cdots n}[0] \bigr\|_2
 + \sqrt{w} + (2w^2 + 1)\sqrt{w}\,(k + 1) \\
&\le 2w^2 \cdot \sqrt{w} + \sqrt{w} + (2w^2 + 1)\sqrt{w}\,(k + 1) \\
&\le (2w^2 + 1)\sqrt{w}\,(k + 2),
\end{aligned}
\]
where we use the fact that ‖B̂[s]‖₂ ≤ √w for any s and width-w, ordered branching program B.
This gives us the following bound on the Fourier mass.
Lemma 3.14. Let B be a length-n, width-w, ordered branching program with at most k non-regular
layers. Then, for all k′ ∈ [n],
\[ L^{k'}(B) := \sum_{s \in \{0,1\}^n : |s| = k'} \bigl\| \hat{B}[s] \bigr\|_2 \le \sqrt{w} \cdot \bigl( (2w^2 + 1)\sqrt{w}(k + 1) \bigr)^{k'} \le \sqrt{w} \cdot \bigl( 3w^{2.5}(k + 1) \bigr)^{k'}. \]
This result is proved using Lemma 3.13 analogously to how Theorem 1.2 was proved using
Lemma 3.12 by Reingold et al. [36, Theorem 3.2].
Proof. We perform an induction on k′. If k′ = 0, then there is only one Fourier coefficient to bound—namely,
\[ \hat{B}[0^n] = \mathbb{E}_U[B[U]]. \]
Since E_U[B[U]] is stochastic,
\[ \bigl\| \mathbb{E}_U[B[U]] \bigr\|_2 \le \sqrt{w} \]
and the base case follows. Now suppose the bound holds for k′ and consider k′ + 1. In the following
T HEORY OF C OMPUTING, Volume 13 (12), 2017, pp. 1–50 22
P SEUDORANDOMNESS AND F OURIER -G ROWTH B OUNDS FOR W IDTH -3 B RANCHING P ROGRAMS
calculation, we split the Fourier coefficients based on where the last 1 is.
$$\begin{aligned}
\sum_{s\in\{0,1\}^n:|s|=k'+1}\left\|\widehat{B}[s]\right\|_2
&= \sum_{1\le i\le n}\;\sum_{s\in\{0,1\}^{i-1}:|s|=k'}\left\|\widehat{B}[s\circ 1\circ 0^{n-i}]\right\|_2\\
&= \sum_{1\le i\le n}\;\sum_{s\in\{0,1\}^{i-1}:|s|=k'}\left\|\widehat{B_{1\cdots i-1}}[s]\cdot\widehat{B_{i\cdots n}}[1\circ 0^{n-i}]\right\|_2 && \text{(by Lemma 2.1 (Decomposition))}\\
&\le \sum_{1\le i\le n}\;\sum_{s\in\{0,1\}^{i-1}:|s|=k'}\left\|\widehat{B_{1\cdots i-1}}[s]\right\|_2\cdot\left\|\widehat{B_{i\cdots n}}[1\circ 0^{n-i}]\right\|_2\\
&\le \sum_{1\le i\le n}\sqrt{w}\cdot\left((2w^2+1)\sqrt{w}\,(k+1)\right)^{k'}\cdot\left\|\widehat{B_{i\cdots n}}[1\circ 0^{n-i}]\right\|_2 && \text{(by the induction hypothesis)}\\
&\le \sqrt{w}\cdot\left((2w^2+1)\sqrt{w}\,(k+1)\right)^{k'}\cdot(2w^2+1)\sqrt{w}\,(k+1) && \text{(by Lemma 3.13)}\\
&= \sqrt{w}\cdot\left((2w^2+1)\sqrt{w}\,(k+1)\right)^{k'+1},
\end{aligned}$$
as required.
Proof of Lemma 3.11. We have
$$L_p(D) = \sum_{k'\ge 1} p^{k'} L^{k'}(D) \le \sum_{k'\ge 1} p^{k'}\sqrt{3}\left((2\cdot 3^2+1)\sqrt{3}\,(k+1)\right)^{k'} \le \sqrt{3}\sum_{k'\ge 1}\left(\frac{19\sqrt{3}\,(k+1)}{10^4(k+1)}\right)^{k'} \le 1/100.$$
3.2.2 Hard case—poor mixing
Now we consider the case where λ (D) ∈ [0.99, 1].
Lemma 3.15. Let D be a 3OBP with at most k non-regular layers in which the first and last state spaces have width 2. Suppose λ(D) ∈ [0.99, 1]. If p ≤ 1/(352k + 252), then L_p(D) + λ(D) ≤ 1.
This covers the ‘hard’ case of Lemma 3.10 and, along with Lemma 3.11, completes the proof of Lemma 3.10. Since D has width 2 in the first and last state spaces, we view D[x] as a 2 × 2 matrix. Now write
$$D[x] = \begin{pmatrix} 1-f(x) & f(x)\\ g(x) & 1-g(x)\end{pmatrix},$$
where f, g : {0,1}^d → {0,1}. We can write the expectation (which is stochastic) as
$$\mathbb{E}_U\big[D[U]\big] = \begin{pmatrix} 1-\alpha & \alpha\\ \beta & 1-\beta\end{pmatrix},$$
where α = EU [ f (U)] and β = EU [g(U)]. By Lemma 3.8, λ (D) = |1 − α − β | ∈ [0.99, 1]. We can
assume (by permuting rows and columns) that α, β ∈ [0, 1/100].
We will show that L_p(f) ≤ O(kpα) and L_p(g) ≤ O(kpβ) for p ≤ 1/O(k). To prove Lemma 3.15, we choose p = 1/O(k) such that L_p(f) ≤ α/2 and L_p(g) ≤ β/2. Then, since
$$L_p(D) = \sum_{s\ne 0} p^{|s|}\left\|\begin{pmatrix}-\widehat{f}(s) & \widehat{f}(s)\\ \widehat{g}(s) & -\widehat{g}(s)\end{pmatrix}\right\|_2 \le 2L_p(f) + 2L_p(g) \le \alpha + \beta$$
and λ(D) = 1 − α − β, the result follows.
First, we give some intuition. We can view D as having two corresponding start and end states. The
probability that, starting in the first start state, we end in the first end state is 1 − α ≥ 0.99. Likewise,
the probability that, starting in the second start state, we end in the second end state is 1 − β ≥ 0.99.
The function f is computed by starting in the first start state and accepting if we end in the second
end state—that is, we “cross over.” Likewise, g computes the function telling us whether we will cross
over from the second start state to the first end state. Intuitively, there is a low probability (≤ 1/100)
of “crossing over,” so the program behaves like two disjoint programs. We will formalize this intuition
by showing that D can be partitioned into states corresponding to the first start+end states and states
corresponding to the second start+end states. This means that the width-3, read-once, oblivious branching
program can be decomposed in terms of width-2, read-once, oblivious branching programs. Then we can
apply tools tailored to width-2, read-once, oblivious branching programs to analyze this decomposition.
The plan is as follows.
1. Show that we can partition the states/vertices of D with O(k) edges crossing the partition such that
each state space has at least one vertex on each side of the partition. Intuitively, this partitions D
into two width-2, ordered branching programs with a few edges going between them.
2. Using this partition, we decompose D into width-2, ordered branching programs. We are able to
show that f can be expressed as a sum (or OR) of ANDs of regular, width-2, ordered branching
programs. That is, we can express f (x) = ∑s ∏ j fs, j (x j ), where each fs, j is a {0, 1}-valued function
computed by a regular, width-2, ordered branching program, the product is over O(k) terms, and
the x j form a partition of the input x.
3. Consider one term
$$f_s(x) = \prod_j f_{s,j}(x_j) \qquad\text{and}\qquad \alpha_s = \mathbb{E}_U\big[f_s(U)\big].$$
Using an analysis heavily tailored to ANDs of regular, width-2, ordered branching programs, we can show that L_p(f_s) ≤ O(kpα_s) for p ≤ 1/O(k). Then, by the triangle inequality,
$$L_p(f) \le \sum_s L_p(f_s) \le \sum_s O(kp\alpha_s) \le O(kp\alpha),$$
as required.
The same holds for g, which gives the result.
Step 1. Partitioning the states of a poorly mixing 3OBP with few non-regular layers
Lemma 3.16. Let D be a 3OBP with at most k non-regular layers and width-2 in the first and last state
spaces. Suppose λ (D) ∈ [0.99, 1]. Then there is a partition of the vertices of D such that each state space
has at least one vertex in each side of the partition and there are at most 43k + 31 edges that cross the
partition.
Proof. We assign each vertex of D a numerical “charge”. Let v0 = (1, −1). For i ∈ [d], let vi =
vi−1 · EU [Di [U]]. The charge of vertex u ∈ [w] in state space i ∈ {0, . . . , d} is vi (u).
In other words, the charge of a vertex (i, u) is the probability that a random execution of D starting
from the first start vertex reaches (i, u) minus the probability that an execution from the second start
vertex reaches (i, u).
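In code, the charge computation is just a sequence of vector–matrix products. The following is a minimal sketch (our own illustration; the function name and the representation of layers as pairs of 0/1 transition matrices are assumptions of the sketch, not notation from the paper):

```python
import numpy as np

def charges(layers):
    """Charge vectors v_0, ..., v_d from the proof of Lemma 3.16.

    `layers` is a list of pairs (D0, D1) of 0/1 transition matrices, one per
    edge label (entry [u, u'] is 1 iff the edge labelled b goes from state u
    to state u'); shapes may vary, e.g. 2x3 for the first layer and 3x2 for
    the last.  The averaged layer E_U[D_i[U]] is (D0 + D1)/2 and
    v_i = v_{i-1} . E_U[D_i[U]].
    """
    v = np.array([1.0, -1.0])  # v_0 = (1, -1) on the width-2 first state space
    vs = [v]
    for D0, D1 in layers:
        v = v @ ((np.asarray(D0, dtype=float) + np.asarray(D1, dtype=float)) / 2)
        vs.append(v)
    return vs

# Sanity checks from the proof: the charges sum to zero in every state space,
# and the l1-norm of the charge vector never increases from one space to the next.
```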
Claim 3.17. Every state space has a vertex with charge > 0.49 and a vertex with charge < −0.49.
Proof. The sum of absolute values of charges cannot increase from one state space to the next, as $\mathbb{E}_U[D_i[U]]$ is stochastic. That is, ‖v_i‖₁ ≤ ‖v_{i−1}‖₁ for all i. Moreover,
$$\|v_d\|_1 = \left\|(1,-1)\cdot\mathbb{E}_U\big[D[U]\big]\right\|_1 = \left\|(1,-1)\cdot\begin{pmatrix}1-\alpha & \alpha\\ \beta & 1-\beta\end{pmatrix}\right\|_1 = 2\lambda(D) \ge 2\cdot 0.99.$$
We also know that the charges sum to zero in every state space, as they are “conserved.” In particular,
$$\sum_u v_i(u) = v_i\cdot\vec{1} = v_{i-1}\cdot\mathbb{E}_U\big[D_i[U]\big]\cdot\vec{1} = v_{i-1}\cdot\vec{1} = \sum_u v_{i-1}(u).$$
Thus, in each state space, we have three charges whose sum is zero and whose sum of absolute values is at least 2 · 0.99. A simple case analysis shows that at least one charge must be larger than 0.49 and at least one must be smaller than −0.49.
Define the length of an edge in the ordered branching program to be the absolute difference between
the charges of its endpoints. More precisely, letting (i, u) denote the vertex corresponding to the state u in
state space i, the length of the edge (i − 1, u) → (i, u0 ) is |vi−1 (u) − vi (u0 )|. We will analyze the sum of
the lengths of the edges. To bound the sum of the edge lengths, we use the following.
Following the proof of Lemma 3.12 from [9], define the “potential” of a multiset of real numbers S as
$$\rho(S) = \sum_{a,b\in S}|a-b|.$$
Let S_i be the multiset that contains each point v_i(u) twice. We are interested in the potentials ρ(S_0), ..., ρ(S_d) of the state spaces.
Claim 3.18. For all i ∈ [d], if D_i is a regular layer, then
$$\sum_{(i-1,u)\to(i,u')}\left|v_{i-1}(u) - v_i(u')\right| \le \rho(S_{i-1}) - \rho(S_i).$$
Proof. We will change S_{i−1} to S_i two points at a time. At each step we argue that the potential decreases proportionally to the length of two edges. Summing these changes will prove the claim.
Each change is of the following form. For a vertex u in state space i, we look at the two incoming edges (i−1, u_1) → (i, u) and (i−1, u_2) → (i, u), and we change one copy of v_{i−1}(u_1) and one copy of v_{i−1}(u_2) to two copies of v_i(u). Since the layer is regular, doing this for every u will change S_{i−1} to S_i.
Now we must analyze what happens to the potential when doing this. Let S_{before u} and S_{after u} denote the multiset before and after changing the two points corresponding to u. We have
$$\rho(S_{\text{before } u}) - \rho(S_{\text{after } u}) = |v_{i-1}(u_1)-v_{i-1}(u_2)| + \sum_a\Big(|v_{i-1}(u_1)-a| + |v_{i-1}(u_2)-a| - 2|v_i(u)-a|\Big).$$
Since v_i(u) = (v_{i−1}(u_1) + v_{i−1}(u_2))/2, by the triangle inequality
$$|v_{i-1}(u_1)-a| + |v_{i-1}(u_2)-a| - 2|v_i(u)-a| \ge 0$$
for all a. Thus
$$\rho(S_{\text{before } u}) - \rho(S_{\text{after } u}) \ge |v_{i-1}(u_1)-v_{i-1}(u_2)| = |v_{i-1}(u_1)-v_i(u)| + |v_{i-1}(u_2)-v_i(u)|.$$
The right-hand side is simply the sum of the lengths of the two edges entering (i, u). Summing over all u proves the claim:
$$\rho(S_{i-1}) - \rho(S_i) = \sum_u\Big(\rho(S_{\text{before } u}) - \rho(S_{\text{after } u})\Big) \ge \sum_{(i-1,u)\to(i,u')}\left|v_{i-1}(u)-v_i(u')\right|.$$
Now we have a claim bounding the sum of edge lengths.
Claim 3.19.
$$\sum_{i\in[d]}\;\sum_{(i-1,u)\to(i,u')}\left|v_{i-1}(u)-v_i(u')\right| \le 12k + 30(k+1).$$
Proof. Firstly, 0 ≤ ρ(S_i) ≤ 30 for all i, as ‖v_i‖_∞ ≤ 1 and |S_i| ≤ 6. If D_i is a regular layer, then, by Claim 3.18,
$$\sum_{(i-1,u)\to(i,u')}\left|v_{i-1}(u)-v_i(u')\right| \le \rho(S_{i-1}) - \rho(S_i).$$
We use this to bound the contribution from the regular layers: if every layer were regular, the sum of edge lengths would telescope and be bounded by 30. We must, however, account for the non-regular layers. For a non-regular layer D_i, we have
$$\sum_{(i-1,u)\to(i,u')}\left|v_{i-1}(u)-v_i(u')\right| \le 12,$$
as there are 6 edges, each of length at most 2. A non-regular layer may also increase the potential by at most 30. Thus each non-regular layer contributes at most 12 to the sum of edge lengths with its own edges and at most 30 by raising the potential available to the regular layers.
Now we can define our partition. For t ∈ [−0.49, 0.49] define a partition (Q_t, Q̄_t) by whether the charge of each vertex is above or below the threshold t. Namely,
$$Q_t = \{(i,u)\in\{0,\dots,d\}\times[w] : v_i(u)\ge t\}.$$
An edge (i−1, u) → (i, u′) crosses the partition (Q_t, Q̄_t) if and only if
$$v_{i-1}(u)\ge t > v_i(u') \qquad\text{or}\qquad v_{i-1}(u) < t \le v_i(u').$$
If t ∈ [−0.49, 0.49] is chosen uniformly at random, the probability that an edge (i−1, u) → (i, u′) crosses the partition is at most
$$\frac{1}{0.49+0.49}\,\left|v_{i-1}(u)-v_i(u')\right|,$$
slightly more than its length. By linearity of expectation, the expected number of edges crossing the partition for a random t ∈ [−0.49, 0.49] is at most
$$\frac{1}{0.98}\sum_{i\in[d]}\;\sum_{(i-1,u)\to(i,u')}\left|v_{i-1}(u)-v_i(u')\right| \le \frac{42k+30}{0.98}.$$
In particular, there exists some fixed choice of t ∈ [−0.49, 0.49] such that at most 43k + 31 edges cross the partition (Q_t, Q̄_t).
By Claim 3.17, each state space has a vertex with charge greater than t and a vertex with charge less than t. Thus the partition has at least one vertex on each side in every state space.
Step 2. Decomposing a 3OBP into a sum of ANDs of regular, width-2, ordered branching programs.
Lemma 3.20. Let D be a length-d 3OBP with at most k non-regular layers and width-2 in the first and
last state spaces. Suppose λ (D) ≥ 0.99. Let f : {0, 1}n → {0, 1} be a function computed by D, namely
f (x) = D[x](1, 1). Then we can write f (x) = ∑s ∏ j fs, j (x j ), where each fs, j is computed by a regular,
width-2, ordered branching program and the x j ’s are a partition of x into at most 88k + 63 contiguous
parts.
Proof. Fix a partition of the vertices of D as promised by Lemma 3.16. Call a layer Di critical if it is
either non-regular or it has an edge crossing the partition. Let Γ ⊂ [d] denote the set of critical layers. By
Lemma 3.16, D has at most 44k + 31 critical layers.
Between critical layers, D is partitioned into two width-2, regular, ordered branching programs. That
is, if Di and D j are consecutive critical layers then Di+1··· j−1 is a regular ordered branching program and,
viewed as an undirected graph, it is disconnected; the connected components of Di+1··· j−1 are two ordered
branching programs of width at most 2 (or three width-1 ordered branching programs).
The value of f is entirely determined by what happens in the critical layers. In particular, it is
determined by (the parity of) the number of times the partition is crossed. To make use of this observation,
we consider all possibilities for what happens in the critical layers.
Formally, we define Γ̃ to be the set of possibilities for critical edges—that is, s ∈ Γ̃ specifies for each i ∈ Γ an edge s(i) in layer i (specifying one of three states to the left of layer i and the label, which is 0 or 1, of the edge taken). We can think of s ∈ Γ̃ as a function s : Γ → [3] × {0,1}. Let Γ̃₁ denote the set of possibilities that imply f(x) = 1.
Given a possibility s ∈ Γ̃ for what happens in the critical layers, we want to be able to compute whether or not that possibility happened. That is, for s ∈ Γ̃, we want to compute the indicator function f_s : {0,1}^d → {0,1} given by
$$f_s(x) = 1 \iff \text{the path in } D \text{ determined by input } x \text{ uses all the edges specified by } s.$$
Now
$$f(x) = \sum_{s\in\widetilde{\Gamma}_1} f_s(x),$$
as each path in D is consistent with exactly one s. So if we can compute f_s, we can compute f.
We claim that each f_s can be written as the conjunction of at most 2|Γ| + 1 functions computed by regular, width-2, ordered branching programs. In particular, there is one term for each critical layer and one term for each gap between critical layers. Once we prove this claim, the lemma follows.
Let i_1 < i_2 < ··· < i_{|Γ|} be an enumeration of Γ. (Also define i_0 = 0 and i_{|Γ|+1} = d + 1.) We will write
$$f_s(x) = \prod_{j=1}^{2|\Gamma|+1} f_{s,j}(x_j),$$
where the x_j form a partition of x as follows. For j ∈ [|Γ|], let x_{2j} ∈ {0,1} be coordinate i_j of x. For j ∈ [|Γ|+1], let x_{2j−1} ∈ {0,1}^{i_j − i_{j−1} − 1} contain the coordinates of x from i_{j−1} + 1 to i_j − 1. The function f_{s,j} checks the bits in x_j and is 1 if and only if the path is consistent with f_s = 1, assuming the bits outside of x_j are set consistently with f_s = 1. This is computable by a regular, width-2, ordered branching program: for a critical layer, f_{s,2j}(x_{2j}) is simply x_{2j} or ¬x_{2j}; between critical layers, D is partitioned into two regular, width-2, ordered branching programs, and f_{s,2j−1} simply simulates one of these.
Step 3. Tailored analysis of a sum of ANDs of regular, width-2, ordered branching programs
We use the following fact about regular, width-2, ordered branching programs—a very simple class of functions that permits a case analysis.
Lemma 3.21. Let f : {0, 1}n → {0, 1} be computed by a width-2 regular ordered branching program.
Then EU [ f (U)] ∈ {0, 1/2, 1} and L p ( f ) ≤ p/2 for all p ∈ [0, 1].
Proof. Every layer of a regular width-2 ordered branching program falls into one of three cases: trivial
layers (the input bit does not affect the state), a (negated) XOR (flips the state depending on the input bit),
or a (negated) dictator (sets the state based on the current input bit regardless of the previous state). Thus
any such branching program is either a constant function, which gives EU [ f (U)] ∈ {0, 1} and L p ( f ) = 0,
or is a (possibly negated) XOR of a subset of the input bits. In the latter case f has one non-trivial Fourier
coefficient of magnitude 1/2, which implies EU [ f (U)] = 1/2 and L p ( f ) ≤ p/2.
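The case analysis above can be checked by brute force on small examples. The sketch below uses our own helper functions (`fourier` and `L_p` are not names from the paper) to enumerate Fourier coefficients directly and verify that a parity of two bits has damped Fourier mass p²/2 ≤ p/2, as in Lemma 3.21:

```python
import itertools

def fourier(f, n):
    """All Fourier coefficients hat f[s] = E_U[f(U) * (-1)^(s.U)] of f : {0,1}^n -> R."""
    coeffs = {}
    for s in itertools.product((0, 1), repeat=n):
        total = 0.0
        for x in itertools.product((0, 1), repeat=n):
            total += f(x) * (-1) ** sum(si * xi for si, xi in zip(s, x))
        coeffs[s] = total / 2 ** n
    return coeffs

def L_p(f, n, p):
    """Damped Fourier mass L_p(f) = sum over s != 0 of p^{|s|} |hat f[s]|."""
    return sum(p ** sum(s) * abs(c) for s, c in fourier(f, n).items() if any(s))

# The parity of the first two of three bits has a single non-trivial Fourier
# coefficient of magnitude 1/2, so L_p equals p^2/2 <= p/2.
xor2 = lambda x: (x[0] + x[1]) % 2
assert abs(L_p(xor2, 3, 0.3) - 0.3 ** 2 / 2) < 1e-12
```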
Lemma 3.22. Let f : {0, 1}n → {0, 1} be of the form f (x) = ∏ j∈[k] f j (x j ), where the x j form a partition
of x and each f j is computed by a width-2, ordered, regular branching program. Then L p ( f ) ≤ 2kp ·
EU [ f (U)] for p ≤ 1/k.
Proof. Define α_j = E_U[f_j(U)]; then α := E_U[f(U)] = ∏_{j∈[k]} α_j. Now
$$\alpha + L_p(f) = \sum_s p^{|s|}\,\big|\widehat{f}[s]\big| = \sum_s p^{|s|}\prod_{j\in[k]}\big|\widehat{f_j}[s_j]\big| = \prod_{j\in[k]}\sum_{s_j} p^{|s_j|}\big|\widehat{f_j}[s_j]\big| = \prod_{j\in[k]}\big(\alpha_j + L_p(f_j)\big).$$
Since each f_j is computed by a width-2, regular, ordered branching program, α_j ∈ {0, 1/2, 1} by Lemma 3.21. If α_j = 0, then f_j = 0, whence f = 0 and the result is trivially true; so this case is covered and we do not consider it further. Moreover, if α_j = 1, then f_j = 1 and L_p(f_j) = 0. Thus we can ignore the factors with α_j = 1, as they are just 1. Let J = {j ∈ [k] : α_j = 1/2}. Thus we are left with
$$L_p(f) = \prod_{j\in J}\left(\frac{1}{2} + L_p(f_j)\right) - 2^{-|J|} = \alpha\cdot\left(\prod_{j\in J}\big(1 + 2L_p(f_j)\big) - 1\right) \le \alpha\cdot\left(\prod_{j\in J}(1+p) - 1\right) \le \alpha\cdot\big(e^{p|J|} - 1\big) \le \alpha\cdot 2p|J| \le 2\alpha pk,$$
as long as p|J| ≤ 1.
Now we complete the analysis of the poor mixing case.
Proof of Lemma 3.15. Let D be a 3OBP with at most k non-regular layers, where the first and last state spaces have width two and λ(D) ∈ [0.99, 1]. Fix p ≤ 1/(4(88k + 63)). Write
$$D[x] = \begin{pmatrix}1-f(x) & f(x)\\ g(x) & 1-g(x)\end{pmatrix}
\qquad\text{and}\qquad
\mathbb{E}_U\big[D[U]\big] = \begin{pmatrix}1-\alpha & \alpha\\ \beta & 1-\beta\end{pmatrix},$$
where f, g : {0,1}^d → {0,1} are functions and α = E_U[f(U)] and β = E_U[g(U)] are their expectations. By Lemma 3.8, λ(D) = |1 − α − β| ∈ [0.99, 1]. We can assume that α, β ∈ [0, 1/100].
By Lemma 3.20, we can write f(x) = ∑_s ∏_j f_{s,j}(x_j), where the product is over at most 88k + 63 terms, the x_j form a partition of x, and each f_{s,j} is computed by a width-2, regular, ordered branching program. Let f_s(x) = ∏_j f_{s,j}(x_j). Then, by Lemma 3.22,
$$L_p(f) \le \sum_s L_p(f_s) \le \sum_s 2(88k+63)p\cdot\mathbb{E}_U\big[f_s(U)\big] = 2(88k+63)p\,\alpha \le \alpha/2.$$
Likewise L_p(g) ≤ β/2. Now
$$L_p(D) = \sum_{s\ne 0} p^{|s|}\left\|\begin{pmatrix}-\widehat{f}(s) & \widehat{f}(s)\\ \widehat{g}(s) & -\widehat{g}(s)\end{pmatrix}\right\|_2 \le 2L_p(f) + 2L_p(g) \le \alpha + \beta$$
and
$$L_p(D) + \lambda(D) \le (\alpha+\beta) + (1-\alpha-\beta) \le 1,$$
as required.
3.3 Bootstrapping
Theorem 3.7 gives a bound on the Fourier growth of width-3, read-once, oblivious branching programs of the form L^k(B) ≤ n²·O(k)^k. The k^k term is inconvenient, but can be easily removed by “bootstrapping.” The following proposition shows that if we can bound the Fourier mass up to level k* = O(log n), then we can bound the Fourier mass at all levels k.
Proposition 3.23. Fix parameters k*, a, b, n with a·n ≤ 2^{k*}. Let B be a length-n, ordered branching program such that, for all i, j, k ∈ [n] with k ≤ 2k* and i ≤ j, we have L^k(B_{i···j}) ≤ a·b^k, where B_{i···j} denotes the branching program consisting of layers i through j of B. Then, for all i, j, k ∈ [n] with i ≤ j, we have L^k(B_{i···j}) ≤ a·(2b)^k.
Proof. Suppose the proposition is false and fix the smallest k ≥ 0 such that L^k(B_{i···j}) > a·(2b)^k for some i ≤ j. By assumption, if k ≤ 2k*, then L^k(B_{i···j}) ≤ a·b^k. Thus k > 2k*. Let k′ = k − k*. By the minimality of our choice of k and the fact that k′ < k, we have L^{k′}(B_{i···j}) ≤ a·(2b)^{k′} for all i ≤ j. Now
$$\begin{aligned}
L^k(B_{i\cdots j}) &= \sum_{s\subset\{1,\dots,j-i+1\},\,|s|=k}\left\|\widehat{B_{i\cdots j}}[s]\right\|_2\\
&\le \sum_{\ell=i}^{j+1}\;\sum_{\substack{s_1\subset\{1,\dots,\ell-i\},\,|s_1|=k'\\ s_2\subset\{\ell-i+1,\dots,j-i+1\},\,|s_2|=k^*}}\left\|\widehat{B_{i\cdots j}}[s_1\circ s_2]\right\|_2\\
&= \sum_{\ell=i}^{j+1}\;\sum_{s_1\subset\{1,\dots,\ell-i\},\,|s_1|=k'}\;\sum_{s_2\subset\{\ell-i+1,\dots,j-i+1\},\,|s_2|=k^*}\left\|\widehat{B_{i\cdots\ell-1}}[s_1]\cdot\widehat{B_{\ell\cdots j}}[s_2]\right\|_2\\
&\le \sum_{\ell=i}^{j+1}\;\sum_{s_1\subset\{1,\dots,\ell-i\},\,|s_1|=k'}\;\sum_{s_2\subset\{\ell-i+1,\dots,j-i+1\},\,|s_2|=k^*}\left\|\widehat{B_{i\cdots\ell-1}}[s_1]\right\|_2\cdot\left\|\widehat{B_{\ell\cdots j}}[s_2]\right\|_2\\
&= \sum_{\ell=i}^{j+1} L^{k'}(B_{i\cdots\ell-1})\cdot L^{k^*}(B_{\ell\cdots j})\\
&\le \sum_{\ell=i}^{j+1} a\cdot(2b)^{k'}\cdot a\cdot b^{k^*}
\;\le\; n\cdot a\cdot(2b)^{k'}\cdot a\cdot b^{k^*}
\;=\; a\cdot(2b)^k\cdot\frac{na}{2^{k^*}}.
\end{aligned}$$
Since na ≤ 2^{k*}, we have a contradiction, as we assumed L^k(B_{i···j}) > a·(2b)^k.
Now we combine Theorem 3.7 with Proposition 3.23 to prove Theorem 3.1.
Proof of Theorem 3.1. Let B be a length-n 3OBP with width 2 in the first and last state spaces. (The assumption of having width 2 in the first and last state spaces can be removed at the expense of a constant-factor increase in the bound, as each entry in the matrix B[x] can be equated with an entry in the matrix of a 3OBP with width 2 in the first and last state spaces by simply collapsing the irrelevant entries.) By Theorem 3.7, we have
$$L^k(B_{i\cdots j}) \le 16n^2\cdot\left(10^6 k\right)^k$$
for all i, j, k ∈ [n]. Now we set the “bootstrapping level” k* = ⌈log₂(16n³)⌉ = Θ(log n) and the Fourier-growth parameters a = 16n² and b = 10⁶·2k* to satisfy the hypotheses of Proposition 3.23. Thus, for any k ∈ [n], we have
$$L^k(B) \le 16n^2\cdot\left(2\cdot 10^6\cdot 2k^*\right)^k,$$
which implies the theorem.
4 The pseudorandom generator
Our main result Theorem 1.1 follows from plugging our Fourier growth bound (Theorem 3.1) into the
analysis of Reingold et al. [36]. Unfortunately, Reingold et al. do not have a general theorem linking
Fourier growth to a pseudorandom generator, although this is implicit. For completeness, we include a
general statement here and give a proof.
Theorem 4.1. Let C be a set of ordered branching programs of length at most n and width at most w that is closed under restrictions and subprograms—that is, if B ∈ C, then B|_{t←x} ∈ C for all t and x, and B_{i···j} ∈ C for all i and j. Suppose that, for all B ∈ C and k ∈ [n], we have L^k(B) ≤ ab^k, where b ≥ 2. Let ε > 0.
Then there exists a pseudorandom generator G_{a,b,n,w,ε} : {0,1}^{s_{a,b,n,w,ε}} → {0,1}^n with seed length
$$s_{a,b,n,w,\varepsilon} = O\!\left(b\cdot\log(b)\cdot\log(n)\cdot\log\!\left(\frac{abwn}{\varepsilon}\right)\right)$$
such that, for any length-n, width-w, read-once, oblivious (but unordered) branching program B that corresponds to an ordered branching program in C,^{12}
$$\left\|\mathbb{E}_{U_{s_{a,b,n,w,\varepsilon}}}\left[B\big[G_{a,b,n,w,\varepsilon}(U_{s_{a,b,n,w,\varepsilon}})\big]\right] - \mathbb{E}_U\big[B[U]\big]\right\|_2 \le \varepsilon.$$
Moreover, G_{a,b,n,w,ε} can be computed in space O(s_{a,b,n,w,ε}).
To prove Theorem 1.1 we set C to be the class of all 3OBPs of length at most n. Theorem 3.1
gives a bound corresponding to a = O(n2 ) and b = O(log n). This gives the required generator. The
statements of Theorems 1.1 and 4.1 differ in that Theorem 4.1 bounds the error of the pseudorandom
generator with respect to a matrix-valued function, while Theorem 1.1 bounds the error with respect to a
{0, 1}-valued function. These statements are equivalent as the {0, 1}-valued function is simply one entry
in the matrix-valued function; see (2.1).
The generator is the same as that used by Reingold et al. [36] and Gopalan et al. [18].
12 That is, there exists B0 ∈ C and a permutation of the bits π : {0, 1}n → {0, 1}n such that B[x] = B0 [π(x)] for all x.
4.1 Pseudorandom restrictions
Our pseudorandom generator repeatedly applies pseudorandom restrictions. For the analysis, we introduce
the concept of an averaging restriction as in Gopalan et al. [18] and Reingold et al. [36], which is subtly
different from the restrictions in Section 3.
Definition 4.2. For t ∈ {0,1}^n and a length-n branching program B, let B|_t be the (averaging) restriction of B to t—that is, B|_t : {0,1}^n → R^{w×w} is the matrix-valued function given by
$$B|_t[x] := \mathbb{E}_U\big[B[\mathrm{Select}(t,x,U)]\big],$$
where U is uniform on {0,1}^n.
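For concreteness, here is a brute-force sketch of the averaging restriction for a scalar-valued B (in the paper B is matrix-valued; the convention that Select takes x on the coordinates where t_i = 1 and u elsewhere is an assumption of this sketch, fixed formally earlier in the paper):

```python
import itertools

def select(t, x, u):
    """Select(t, x, u)_i = x_i where t_i = 1 and u_i where t_i = 0
    (the convention assumed in this sketch)."""
    return tuple(xi if ti else ui for ti, xi, ui in zip(t, x, u))

def averaging_restriction(B, t, x):
    """B|_t[x] = E_U[ B[Select(t, x, U)] ], computed by brute force over U."""
    n = len(t)
    values = [B(select(t, x, u)) for u in itertools.product((0, 1), repeat=n)]
    return sum(values) / len(values)

# Example: restricting the AND of three bits to t = (1, 0, 1) and averaging
# over the single free bit gives AND(x_0, x_2) / 2.
AND = lambda z: int(all(z))
assert averaging_restriction(AND, (1, 0, 1), (1, 1, 1)) == 0.5
```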
In this section we show that, for a pseudorandom T (generated using few random bits), if B has good
Fourier growth, then L(B|T ) is small with high probability. In particular, we generate T using an almost
O(log n)-wise independent distribution, which we now define.
Definition 4.3. A random variable X on Ω^n is δ-almost k-wise independent if, for every index set I = {i_1, i_2, ..., i_k} ⊂ [n] with |I| = k, the coordinates (X_{i_1}, X_{i_2}, ..., X_{i_k}) ∈ Ω^k are δ-close (in statistical distance) to being independent—that is, for all T ⊂ Ω^k,
$$\sum_{x\in T}\left(\mathbb{P}_X\big[(X_{i_1},X_{i_2},\dots,X_{i_k})=x\big] - \prod_{l\in[k]}\mathbb{P}_X\big[X_{i_l}=x_l\big]\right) \le \delta.$$
We say that X is k-wise independent if it is 0-almost k-wise independent.
We can sample a random variable X on {0,1}^n that is δ-almost k-wise independent such that each bit has expectation p = 2^{−d} using O(kd + log(1/δ) + d·log(nd)) random bits [36, Lemma B.2].
The following lemma, proved in essentially the same way as Lemma 5.3 in [36], tells us that L(B|T )
will be small for T chosen from a δ -almost k-wise distribution with appropriate parameters.
Lemma 4.4. Let B be a length-n, width-w, ordered branching program. Let T be a random variable over {0,1}^n where each bit has expectation p and the bits are δ-almost 2k-wise independent. Suppose that, for all i, j, k′ ∈ [n] such that k ≤ k′ < 2k, we have L^{k′}(B_{i···j}) ≤ a·b^{k′}. If we set p ≤ 1/(2b) and δ ≤ 1/(2b)^{2k}, then
$$\mathbb{P}_T\left[L^{\ge k}(B|_T) > 1\right] \le n^4\cdot\frac{2a}{2^k}.$$
(Recall that L^{≥k}(g) = ∑_{j=k}^{n} L^j(g).)
The proof is similar to that of Proposition 3.23.
Proof. Let k ≤ k′ < 2k. We have that, for all i and j,
$$\mathbb{E}_T\left[L^{k'}(B_{i\cdots j}|_T)\right] = \sum_{s\subset\{i,\dots,j\}:|s|=k'}\mathbb{P}_T\big[s\subset T\big]\,\left\|\widehat{B_{i\cdots j}}[s]\right\|_2 \le L^{k'}(B_{i\cdots j})\left(p^{k'}+\delta\right) \le ab^{k'}\left(\frac{1}{(2b)^{k'}}+\frac{1}{(2b)^{2k}}\right) \le \frac{2a}{2^k}.$$
Applying Markov’s inequality and a union bound, we have that, for all β > 0,
$$\mathbb{P}_T\left[\forall\, 1\le i\le j\le n\;\; L^{k'}(B_{i\cdots j}|_T) \le \beta\right] \ge 1 - n^2\,\frac{2a}{2^k\beta}.$$
Applying a union bound over values of k′ and setting β = 1/n, we obtain
$$\mathbb{P}_T\left[\forall\, k\le k'<2k\;\;\forall\, 1\le i\le j\le n\;\; L^{k'}(B_{i\cdots j}|_T)\le\frac{1}{n}\right] \ge 1 - n^4\cdot\frac{2a}{2^k}.$$
The result now follows from the following lemma.
Lemma 4.5 (Reingold et al. [36, Lemma 5.4]). Let B be a length-n, ordered branching program and t ∈ {0,1}^n. Suppose that, for all integers i, j, k′ with 1 ≤ i ≤ j ≤ n and k ≤ k′ < 2k, we have L^{k′}(B_{i···j}|_t) ≤ 1/n. Then, for all k″ ≥ k and all i and j, we have L^{k″}(B_{i···j}|_t) ≤ 1/n.
The proof of Lemma 4.5 is similar to that of Proposition 3.23.
4.2 Pseudorandom generator construction
The following lemma gives the basis of our pseudorandom generator. In our application, the Fourier growth parameters are a = O(n²) and b = O(log n), and the independence parameters are δ = 1/n^{O(log log n)} and k = O(log n).
Lemma 4.6. Let the parameters a and b and the class C be as in Theorem 4.1 and B ∈ C. Let ε ∈ (0,1). Let T be a random variable over {0,1}^n that is δ-almost 2k-wise independent and each bit has expectation p, where we require
$$k \ge \log_2\!\left(8an^4w/\varepsilon\right), \qquad p \le 1/(2b), \qquad \delta \le 1/(2b)^{2k}.$$
Let U be uniform over {0,1}^n. Let X be a µ-biased random variable over {0,1}^n with µ ≤ ε/(2ab^k). Then
$$\left\|\mathbb{E}_{T,X,U}\big[B[\mathrm{Select}(T,X,U)]\big] - \mathbb{E}_U\big[B[U]\big]\right\|_2 \le \varepsilon.$$
Thus Select(T, X, U) is a pseudorandom distribution. To complete the construction we will recursively generate Ũ and use Select(T, X, Ũ) as the pseudorandom output.
Proof. For a fixed t ∈ {0,1}^n, we have
$$\begin{aligned}
\left\|\mathbb{E}_{X,U}\big[B[\mathrm{Select}(t,X,U)]\big] - \mathbb{E}_U\big[B[U]\big]\right\|_2
&= \left\|\mathbb{E}_X\big[B|_t[X]\big] - \mathbb{E}_U\big[B[U]\big]\right\|_2\\
&= \left\|\sum_{s\ne 0}\widehat{B|_t}[s]\,\widehat{X}(s)\right\|_2\\
&\le \sum_{s\ne 0}\left\|\widehat{B|_t}[s]\right\|_2\,\big|\widehat{X}(s)\big|\\
&\le L(B|_t)\,\mu.
\end{aligned}$$
Conditioning on whether or not L^{≥k}(B|_T) > 1, we have
$$\begin{aligned}
&\left\|\mathbb{E}_{T,X,U}\big[B[\mathrm{Select}(T,X,U)]\big] - \mathbb{E}_U\big[B[U]\big]\right\|_2\\
&\quad\le \mathbb{P}_T\left[L^{\ge k}(B|_T) > 1\right]\cdot\max_t\left\|\mathbb{E}_{X,U}\big[B[\mathrm{Select}(t,X,U)]\big] - \mathbb{E}_U\big[B[U]\big]\right\|_2\\
&\qquad+ \mathbb{P}_T\left[L^{\ge k}(B|_T) \le 1\right]\cdot\mu\cdot\mathbb{E}_T\left[L(B|_T)\;\middle|\;L^{\ge k}(B|_T)\le 1\right].
\end{aligned}$$
We have
$$L^{<k}(B) \le \sum_{1\le k'<k} ab^{k'} = ab\,\frac{b^{k-1}-1}{b-1} \le ab^k - 1.$$
Thus $\mathbb{E}_T\left[L(B|_T)\mid L^{\ge k}(B|_T)\le 1\right] \le ab^k$. Lemma 4.4 gives
$$\mathbb{P}_T\left[L^{\ge k}(B|_T) > 1\right] \le n^4\cdot\frac{2a}{2^k}.$$
For all t, x, y, we have ‖B[Select(t,x,y)] − E_U[B[U]]‖₂ ≤ 2w. Thus
$$\left\|\mathbb{E}_{T,X,U}\big[B[\mathrm{Select}(T,X,U)]\big] - \mathbb{E}_U\big[B[U]\big]\right\|_2 \le n^4\cdot\frac{2a}{2^k}\cdot 2w + 1\cdot\mu\cdot ab^k \le \frac{4an^4w}{8an^4w/\varepsilon} + \frac{\varepsilon}{2ab^k}\,ab^k \le \varepsilon.$$
Now we use the above results to construct our pseudorandom generator.
The pseudorandom generator is formally defined as follows.
Algorithm for G_{a,b,n,ε} : {0,1}^{s_{a,b,n,ε}} → {0,1}^n.
Parameters: n ∈ N and ε > 0.
Input: A random seed of length s_{a,b,n,ε}.
1. Compute appropriate values of the parameters satisfying^{13}
$$\varepsilon' = \frac{\varepsilon p}{14w\log_2(n)}, \qquad k \ge \log_2\!\left(8an^4w/\varepsilon'\right), \qquad p \le 1/(2b), \qquad \delta \le \varepsilon'(p/2)^{2k}, \qquad \mu \le \varepsilon'/(2ab^k).$$
However, if the algorithm is being called recursively, use the parameter values from the previous level of recursion.^{14}
2. If n ≤ 320·⌈log₂(1/ε′)⌉/p, output n truly random bits and stop.
3. Sample T ∈ {0,1}^n where each bit has expectation p and the bits are δ-almost 2k-wise independent.
4. If |T| < pn/2, output 0^n and stop.
5. Recursively sample Ũ ∈ {0,1}^{⌊n(1−p/2)⌋}, i.e., Ũ = G_{a,b,⌊n(1−p/2)⌋,ε}(U).
6. Sample X ∈ {0,1}^n from a µ-biased distribution.
7. Output Select(T, X, Ũ) ∈ {0,1}^n.^{15}
^{13} Pick parameter values that are within a constant factor of these constraints.
^{14} For the purposes of the analysis we assume that ε′, k, p, δ, µ are the same at every level of recursion.
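The recursive structure of steps 2–7 can be summarized in the following sketch. It is only structural: `sample_T`, `sample_X`, and `rng` are placeholders for the δ-almost 2k-wise independent distribution (bit expectation p), the µ-biased distribution, and the truly random bits of the base case, each of which is driven by a short seed in the real construction.

```python
import math

def prg(n, p, eps1, sample_T, sample_X, rng):
    """Structural sketch of steps 2-7 of the generator G_{a,b,n,eps}."""
    if n <= 320 * math.ceil(math.log2(1 / eps1)) / p:      # step 2: base case
        return [rng.randint(0, 1) for _ in range(n)]        # truly random bits
    T = sample_T(n)                                          # step 3
    if sum(T) < p * n / 2:                                   # step 4: fail-safe
        return [0] * n
    m = math.floor(n * (1 - p / 2))                          # step 5: recurse
    U = prg(m, p, eps1, sample_T, sample_X, rng)
    U_bits = iter(U + [0] * n)                               # pad with zeros (footnote 15)
    X = sample_X(n)                                          # step 6
    # step 7: Select(T, X, U~) -- X on the positions of T, U~ elsewhere
    return [X[i] if T[i] else next(U_bits) for i in range(n)]

# Purely illustrative call with fully-random placeholder samplers (which of
# course defeats the purpose of a PRG; the real samplers use a short seed):
#   import random
#   out = prg(1000, 0.1, 0.01,
#             lambda n: [int(random.random() < 0.1) for _ in range(n)],
#             lambda n: [random.randint(0, 1) for _ in range(n)],
#             random)
```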
The analysis of the algorithm proceeds roughly as follows.
• We have
$$p = \Theta(1/b), \qquad \varepsilon' = \Theta\!\left(\frac{\varepsilon}{wb\log n}\right), \qquad k = \Theta(\log(abwn/\varepsilon)), \qquad \delta = 1/b^{\Theta(k)}, \qquad \mu = 1/b^{\Theta(k)}.$$
• Every time we recurse, n is decreased to ⌊n(1−p/2)⌋. After O(log(n)/p) recursions, n is reduced to O(1). So the maximum recursion depth is r = O(log(n)/p) = O(b log n).
• The probability of failing because |T| < pn/2 is small by a Chernoff bound for limited independence. (This requires that n is not too small and, hence, step 2.)
• The output is pseudorandom, as
$$\mathbb{E}_U\big[B[G_{a,b,n,\varepsilon}(U)]\big] = \mathbb{E}_{T,X,\widetilde{U}}\big[B[\mathrm{Select}(T,X,\widetilde{U})]\big] \approx \mathbb{E}_{T,X,U}\big[B[\mathrm{Select}(T,X,U)]\big] \approx \mathbb{E}_U\big[B[U]\big].$$
The first approximate equality holds because we inductively assume that Ũ is pseudorandom. The second approximate equality holds by Lemma 4.6.
• The total seed length is the seed length needed to sample X and T at each level of recursion and O(log(1/ε′)/p) = O(b log(bwn/ε)) truly random bits at the last level. Sampling X requires seed length O(log(n/µ)) = O(k log b) and sampling T requires seed length
$$O\big(k\log(1/p) + \log(1/\delta) + \log(1/p)\cdot\log(n\log(1/p))\big) = O(k\log b),$$
so the total seed length is
$$r\cdot O(k\log b) + O(b\log(bwn/\varepsilon)) = O\!\left(b\cdot\log(b)\cdot\log(n)\cdot\log\!\left(\frac{abwn}{\varepsilon}\right)\right).$$
Lemma 4.7. The probability that G_{a,b,n,ε} fails at step 4 is bounded by 3ε′—that is, P_T[|T| < pn/2] ≤ 3ε′.
Proof. We use the following extension of a standard result [40, Theorem 4].
^{15} Technically, we must pad Ũ with zeros in the locations specified by T (i.e., Ũ_i = 0 for i ∈ T) to obtain the right length.
Lemma 4.8 (Tail bound for limited independence). Let X_1, ..., X_ℓ be δ-almost k-wise independent random variables with X_i ∈ {0,1} for all i. Set X = ∑_i X_i and µ = ∑_i µ_i = ∑_i E_X[X_i], and suppose µ_i ≤ 1/2 for all i. If k ≤ µ/10 is even, then, for all α ∈ (0,1),
$$\mathbb{P}_X\big[|X-\mu|\ge\alpha\mu\big] \le \left(\frac{20k}{\alpha^2\mu}\right)^{\lfloor k/2\rfloor} + 2\delta\cdot\left(\frac{\ell}{\alpha\mu}\right)^{k}.$$
Proof of Lemma 4.8. Assume, without loss of generality, that k is even. It is well known [40, 32, 3] that, if the X_i are fully independent, then
$$\mathbb{E}\big[(X-\mu)^k\big] \le (20k\mu)^{k/2}.$$
This also holds when the X_i are only k-wise independent, as (X − µ)^k is a degree-k polynomial in the X_i. Here the X_i are δ-almost k-wise independent, which gives
$$\mathbb{E}\big[(X-\mu)^k\big] \le (20k\mu)^{k/2} + 2\delta\ell^k.$$
Thus we can apply Markov’s inequality to obtain the result, as follows:
$$\mathbb{P}_X\big[|X-\mu|\ge\alpha\mu\big] = \mathbb{P}_X\big[(X-\mu)^k \ge (\alpha\mu)^k\big] \le \frac{\mathbb{E}\big[(X-\mu)^k\big]}{(\alpha\mu)^k} \le \left(\frac{20k\mu}{\alpha^2\mu^2}\right)^{k/2} + 2\delta\left(\frac{\ell}{\alpha\mu}\right)^{k}.$$
By the Chernoff bound for limited independence (Lemma 4.8),
$$\mathbb{P}_T\big[|T| < pn/2\big] \le \left(\frac{20k'}{(1/2)^2\,pn}\right)^{\lfloor k'/2\rfloor} + 2\delta\cdot\left(\frac{n}{(1/2)pn}\right)^{k'},$$
where k′ ≤ 2k even is arbitrary. Set k′ = 2⌈log₂(1/ε′)⌉. Step 2 ensures that n > 160k′/p and our setting of δ gives δ ≤ ε′(p/2)^{k′}. Thus we have
$$\mathbb{P}_T\big[|T| < pn/2\big] \le 2^{-\log_2(1/\varepsilon')} + 2\varepsilon' \le 3\varepsilon'.$$
The following bounds the error of Ga,b,n,ε .
Lemma 4.9. Let B ∈ C. Then
$$\left\|\mathbb{E}_{U_{s_{a,b,n,\varepsilon}}}\left[B\big[G_{a,b,n,\varepsilon}(U_{s_{a,b,n,\varepsilon}})\big]\right] - \mathbb{E}_U\big[B[U]\big]\right\|_2 \le 7wr\varepsilon',$$
where r = O(log(n)/p) is the maximum recursion depth of G_{a,b,n,ε}.
Proof. For 0 ≤ i < r, let n_i, T_i, X_i, and Ũ_i be the values of n, T, X, and Ũ at recursion level i. We have n_{i+1} = ⌊n_i(1−p/2)⌋ ≤ n(1−p/2)^{i+1} and Ũ_{i−1} = Select(T_i, X_i, Ũ_i). Let ∆_i be the error of the output from the ith level of recursion—that is,
$$\Delta_i := \max_{B'\in C}\left\|\mathbb{E}_{T_i,X_i,\widetilde{U}_i}\left[B'[\mathrm{Select}(T_i,X_i,\widetilde{U}_i)]\right] - \mathbb{E}_U\big[B'[U]\big]\right\|_2.$$
Since the last level of recursion outputs uniform randomness, ∆_r = 0. For 0 ≤ i < r, we have, for some B′ ∈ C,
$$\begin{aligned}
\Delta_i &\le \left\|\mathbb{E}_{T_i,X_i,\widetilde{U}_i}\left[B'[\mathrm{Select}(T_i,X_i,\widetilde{U}_i)]\right] - \mathbb{E}_U\big[B'[U]\big]\right\|_2\cdot\mathbb{P}_T\big[|T|\ge pn/2\big] + 2w\cdot\mathbb{P}_T\big[|T|<pn/2\big]\\
&\le \left\|\mathbb{E}_{T_i,X_i,\widetilde{U}_i}\left[B'[\mathrm{Select}(T_i,X_i,\widetilde{U}_i)]\right] - \mathbb{E}_{T_i,X_i,U}\left[B'[\mathrm{Select}(T_i,X_i,U)]\right]\right\|_2\\
&\qquad+ \left\|\mathbb{E}_{T_i,X_i,U}\left[B'[\mathrm{Select}(T_i,X_i,U)]\right] - \mathbb{E}_U\big[B'[U]\big]\right\|_2 + 2w\cdot\mathbb{P}_T\big[|T|<pn/2\big].
\end{aligned}$$
By Lemma 4.6,
$$\left\|\mathbb{E}_{T_i,X_i,U}\left[B'[\mathrm{Select}(T_i,X_i,U)]\right] - \mathbb{E}_U\big[B'[U]\big]\right\|_2 \le \varepsilon'.$$
By Lemma 4.7, P_T[|T| < pn/2] ≤ 3ε′. We claim that
$$\left\|\mathbb{E}_{T_i,X_i,\widetilde{U}_i}\left[B'[\mathrm{Select}(T_i,X_i,\widetilde{U}_i)]\right] - \mathbb{E}_{T_i,X_i,U}\left[B'[\mathrm{Select}(T_i,X_i,U)]\right]\right\|_2 \le \Delta_{i+1}.$$
Before we prove the claim, we complete the proof. The claim gives ∆_i ≤ ∆_{i+1} + ε′ + 2w·3ε′. It follows that ∆_0 ≤ 7wrε′, as required.
To prove the claim, consider any fixed T_i = t and X_i = x; it suffices to show that
$$\left\|\mathbb{E}_{\widetilde{U}_i}\left[B'[\mathrm{Select}(t,x,\widetilde{U}_i)]\right] - \mathbb{E}_U\left[B'[\mathrm{Select}(t,x,U)]\right]\right\|_2 \le \Delta_{i+1}.$$
Consider B_{x,t}[y] := B′[Select(t, x, y)] as a function of y ∈ {0,1}^{n_i−|t|}. Then B_{x,t} is a width-3, read-once, oblivious branching program of length n_i − |t|. We inductively know that Ũ_i is pseudorandom for B_{x,t}—that is,
$$\left\|\mathbb{E}_{\widetilde{U}_i}\left[B_{x,t}[\widetilde{U}_i]\right] - \mathbb{E}_U\big[B_{x,t}[U]\big]\right\|_2 \le \Delta_{i+1}.$$
Thus
$$\left\|\mathbb{E}_{\widetilde{U}_i}\left[B'[\mathrm{Select}(t,x,\widetilde{U}_i)]\right] - \mathbb{E}_U\left[B'[\mathrm{Select}(t,x,U)]\right]\right\|_2 = \left\|\mathbb{E}_{\widetilde{U}_i}\left[B_{x,t}[\widetilde{U}_i]\right] - \mathbb{E}_U\big[B_{x,t}[U]\big]\right\|_2 \le \Delta_{i+1},$$
as required.
Proof of Theorem 4.1. Since ε′ ≤ ε/(7wr), Lemma 4.9 implies that G_{a,b,n,ε} has error at most ε. The seed length is
$$s_{a,b,n,\varepsilon} = O\!\left(b\cdot\log(b)\cdot\log(n)\cdot\log\!\left(\frac{abwn}{\varepsilon}\right)\right),$$
as required.
5 Optimality of the Fourier growth bound
The following result shows that Theorem 3.1 is close to optimal.
Proposition 5.1. There exists an infinite family of functions f_n : {0,1}^n → {0,1} that are computed by 3OBPs such that the following holds. Let q : N → R be an increasing function with 20 ≤ q(n) ≤ exp(exp(o(√(log n)))). For all sufficiently large n, there exists k ∈ [n] such that
$$L^k(f_n) > q(n)\cdot\left(\frac{\log n}{20\log\log q(n)}\right)^{k}.$$
Our main result shows that L^k(f_n) ≤ poly(n)·(O(log n))^k. Setting q(n) = poly(n), this proposition shows that the base O(log n) cannot be improved by more than a log log n factor. The log log n factor comes from the fact that we allow a polynomial factor q(n) in the bound. If we demand q(n) = O(1), the base O(log n) is optimal.
Proof. Let n = m·2^m, where m is an integer. Define f_n : {0,1}^n → {0,1} by
$$f_n(x) = \prod_{i\in[2^m]}\left(1 - \prod_{j\in[m]}x_{i,j}\right),$$
where we view x ∈ {0,1}^n as a 2^m × m matrix x ∈ {0,1}^{2^m\times m}. This is (up to a negation) the Tribes function [4]. This function can be computed by a 3OBP. Now we show that it has large Fourier growth.
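A direct implementation of f_n (a hypothetical helper of ours, with the matrix input flattened row by row) makes the block structure explicit:

```python
def tribes(x, m):
    """The function f_n of Proposition 5.1 on n = m * 2**m bits (a negated
    Tribes): f_n(x) = prod_i (1 - prod_j x[i*m + j]), i.e. f_n(x) = 1 iff no
    block of m consecutive bits is all ones.  Reading the input block by
    block, it suffices to remember whether the current block is still all
    ones and whether some earlier block was all ones -- three states."""
    blocks = [x[i * m:(i + 1) * m] for i in range(2 ** m)]
    return int(all(not all(block) for block in blocks))

# Example with m = 2, n = 8: the second block (bits 2, 3) is all ones, so f = 0.
assert tribes([0, 1, 1, 1, 0, 0, 1, 0], 2) == 0
assert tribes([0, 1, 1, 0, 0, 0, 1, 0], 2) == 1
```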
The Fourier coefficients of f_n are given as follows. For s ⊂ [n] (which we identify with s ∈ {0,1}^{2^m×m}),
$$\begin{aligned}
\widehat{f_n}[s] &= \mathbb{E}_U\big[f(U)\chi_s(U)\big]\\
&= \mathbb{E}_U\left[\prod_{i\in[2^m]}\left(1-\prod_{j\in[m]}U_{i,j}\right)\chi_{s_i}(U_i)\right]\\
&= \prod_{i\in[2^m]}\mathbb{E}_U\left[\chi_{s_i}(U_i) - \prod_{j\in[m]}U_{i,j}\,\chi_{s_i}(U_i)\right]\\
&= \prod_{i\in[2^m]}\left(\mathbb{I}(s_i=0) - 2^{-m}(-1)^{|s_i|}\right),
\end{aligned}$$
where s_i = (s_{i,1}, s_{i,2}, ..., s_{i,m}). The damped Fourier mass is also easy to compute. For p ∈ (0,1),
$$\begin{aligned}
L_p(f_n) + \big|\widehat{f_n}[0]\big| &= \sum_{s\in\{0,1\}^{2^m\times m}} p^{|s|}\,\big|\widehat{f_n}[s]\big|\\
&= \sum_{s\in\{0,1\}^{2^m\times m}}\;\prod_{i\in[2^m]} p^{|s_i|}\left|\mathbb{I}(s_i=0) - 2^{-m}(-1)^{|s_i|}\right|\\
&= \prod_{i\in[2^m]}\;\sum_{s_i\in\{0,1\}^m} p^{|s_i|}\left|\mathbb{I}(s_i=0) - 2^{-m}(-1)^{|s_i|}\right|\\
&= \prod_{i\in[2^m]}\left(1-2^{-m} + \sum_{s_i\ne 0} p^{|s_i|}\,2^{-m}\right)\\
&= \prod_{i\in[2^m]}\left(1-2^{-m} + 2^{-m}\big((1+p)^m - 1\big)\right)\\
&= \left(1 + \frac{(1+p)^m - 2}{2^m}\right)^{2^m}.
\end{aligned}$$
Set p = (1 + log(3 + log q(n)))/m. We have
$$(1+p)^m = \left(1 + \frac{1+\log(3+\log q(n))}{m}\right)^m = e^{1+\log(3+\log q(n))}(1-o(1)) \ge 3 + \log q(n)$$
for sufficiently large m. Thus, for sufficiently large m,
$$L_p(f_n) \ge \left(1 + \frac{(1+p)^m-2}{2^m}\right)^{2^m} - 1 \ge \left(1+\frac{1+\log q(n)}{2^m}\right)^{2^m} - 1 = e^{1+\log q(n)}(1-o(1)) - 1 \ge q(n).$$
Suppose for the sake of contradiction that L^k(f_n) ≤ q(n)·(log n/(20 log log q(n)))^k for all k ∈ [n]. We have
$$L_p(f_n) = \sum_{k\in[n]} p^k L^k(f_n) \le \sum_{k\in[n]}\left(\frac{1+\log(3+\log q(n))}{m}\right)^k\cdot q(n)\cdot\left(\frac{\log n}{20\log\log q(n)}\right)^k \le \sum_{k\in[n]} q(n)\,2^{-k} < q(n),$$
which is a contradiction.
A more careful analysis gives the following bound.
Proposition 5.2. There exists an infinite family of functions f_n : {0,1}^n → {0,1} that are computed by 3OBPs such that, for all k ∈ [n],
$$L^k(f_n) \ge \Omega\!\left(\frac{\log n}{\log k}\right)^{k}.$$
Proof. Let n, m, and f_n be as in the proof of Proposition 5.1. For s ∈ {0,1}^{2^m×m}, denote
$$\ell(s) = \big|\{i\in[2^m] : s_i\ne 0\}\big| = \big|\{i\in[2^m] : \exists j\in[m]\; s_{i,j}=1\}\big|.$$
Then, for all s ∈ {0,1}^{2^m×m}, we have
$$\big|\widehat{f_n}[s]\big| = \prod_{i\in[2^m]}\left|\mathbb{I}(s_i=0) - 2^{-m}(-1)^{|s_i|}\right| = (1-2^{-m})^{2^m-\ell(s)}\cdot(2^{-m})^{\ell(s)}.$$
Fix ℓ with k/ℓ ≤ m. Set h = ⌊k/ℓ⌋. Choose i, j ≥ 0 with i + j = ℓ and ih + j(h+1) = k. Then
$$\begin{aligned}
L^k(f_n) &\ge \sum_{|s|=k\,\wedge\,\ell(s)=\ell}\big|\widehat{f_n}[s]\big|\\
&\ge \binom{2^m}{\ell}\binom{m}{h}^i\binom{m}{h+1}^j\cdot(1-2^{-m})^{2^m-\ell}\cdot(2^{-m})^{\ell}\\
&\ge \left(\frac{2^m}{\ell}\right)^{\ell}\left(\frac{m}{h}\right)^{hi}\left(\frac{m}{h+1}\right)^{(h+1)j}\left(1-\frac{1}{2^m}\right)^{2^m}\left(\frac{1}{2^m}\right)^{\ell}\\
&\ge \frac{1}{4}\left(\frac{1}{\ell}\right)^{\ell}\left(\frac{m}{h}\right)^{hi}\left(\frac{m}{h+1}\right)^{(h+1)j}\\
&\ge \frac{1}{4}\left(\frac{1}{\ell}\right)^{\ell}\left(\frac{m}{h+1}\right)^{hi+(h+1)j}\\
&\ge \frac{1}{4}\cdot\left(\frac{1}{\ell}\right)^{\ell}\cdot\left(\frac{m}{k/\ell+1}\right)^{k}.
\end{aligned}$$
Suppose k ≤ 2^{m−1}. Setting ℓ = ⌈k/log₂(2k)⌉, we have
$$L^k(f_n) \ge \frac{1}{4}\left(\frac{1}{\ell}\right)^{\ell}\left(\frac{m}{k/\ell+1}\right)^{k} \ge \frac{1}{4}\cdot\frac{1}{2^{2k+2}}\cdot\left(\frac{m}{\log_2 k + 2}\right)^{k},$$
as
$$\log_2\!\left(\ell^{\ell}\right) = \ell\log_2\ell < \frac{k+\log_2 k}{\log_2 k}\,\log_2(k+\log_2 k) \le 2k+2.$$
Since m = Θ(log n), this gives the result for k ≤ 2^{m−1}. If k > 2^{m−1}, then log k = Θ(log n) and the result is trivial.
6 First-order Fourier coefficients of read-once, oblivious branching programs
In addition to proving a Fourier growth bound for read-once, oblivious branching programs of width 3,
our techniques give a bound on the first-order Fourier mass of all constant-width, read-once, oblivious
branching programs, which we now state.
Theorem 6.1. Let B be a width-w, length-n, read-once, oblivious branching program. Then
$$\sum_{i\in[n]}\left\|\widehat{B}[\{i\}]\right\|_2 \le O(\log n)^{w-2}.$$
The proof is similar to the proof of the Coin Theorem by Steinberger [44]. The main difference is that
we need a variant of the “collision lemma” of Brody and Verbin [10]. We call a layer Bi of an ordered
branching program trivial if Bi [0] = Bi [1] and nontrivial otherwise. We say that a layer Bi has a collision
if there exist two edges with the same label and the same endpoint, but different start points. A layer has
a collision if and only if it is not a permutation layer. If there is a collision in layer i, then with probability
at least 1/2 a random restriction of layer i reduces the width—this is what drives Lemma 3.4.
Lemma 6.2 (Collision lemma). Let f : {0,1}^n → {0,1} be a function computed by a width-w, ordered branching program B. Then there exists a function g computed by a width-w, ordered branching program B′ such that every nontrivial layer of B′ has a collision and
$$\sum_i \big|\widehat{f}[\{i\}]\big| \le \sum_i \big|\widehat{g}[\{i\}]\big|. \tag{6.1}$$
Lemma 6.2 says that, for any function f computed by an ordered branching program, we can assume
that the branching program has a collision in every nontrivial layer without decreasing the level-one
Fourier mass. In contrast, the original collision lemma of Brody and Verbin [10] says that, for any f
computed by an ordered branching program, we can assume that every nontrivial layer has a collision
without decreasing its ability to distinguish two coins of differing bias. That is, (6.1) is replaced with
P [ f (X) = 1] − P [ f (Y ) = 1] ≤ P [g(X) = 1] − P [g(Y ) = 1], where X is the distribution on {0, 1}n given
by n independent flips of a coin with bias 1/2 + ε and Y is n flips of a coin with bias 1/2 − ε.
Proof. We will construct B0 by flipping edge labels in B (viewed as a graph). Begin by ranking the vertices
in each state space by their acceptance probability—that is, the probability that f (U) = 1 conditioned on
passing through that vertex. (If two vertices have the same acceptance probability rank them arbitrarily.)
We will flip the edge labels in each layer such that for every vertex the 0-labelled edge leads to a
higher-ranked vertex than the 1-labelled edge (or they lead to the same vertex).
Note that flipping the edges in layer i only affects the corresponding Fourier coefficient. Thus we
need only show that flipping the edges in layer i does not decrease | fb[i]|.
Fix a layer i. For a vertex u on the left of layer i of B (i. e., in state space i − 1), let u0 and u1 be the
vertices led to by the 0- and 1-edges respectively and let pu be the probability that a random execution
of B reaches u. For a vertex v on the right of layer i of B (i. e., in state space i), let qv be the acceptance
probability of B under uniform input conditioned on passing through vertex v. Then
$$\widehat{f}[\{i\}] = \frac{1}{2}\sum_u p_u\,\big(q_{u_0} - q_{u_1}\big).$$
Flipping edge labels corresponds to flipping the signs of the terms in the above sum as we swap q_{u_0} and q_{u_1}. Clearly, |f̂[{i}]| is maximized if every term has the same sign. Our choice of flips ensures this is the case, as q_{u_0} ≥ q_{u_1}.
It remains to verify that every nontrivial layer of B0 must have a collision. Nonregular layers have a
collision (regardless of the edge labels). Thus we need only analyze regular layers. Consider a regular
layer i and let v be the highest ranked vertex on its right such that the incoming edges are from different
vertices on the left. Suppose, for the sake of contradiction, that the incoming edges have different labels.
Pick the edge labelled 1 and let u be its start point. Let u0 be the vertex reached from u by the edge
labelled 0. Then, by our choice of labels u0 is ranked higher than v—a contradiction, as v is the highest
ranked vertex with incoming edges with different start points. (The other incoming edge of u0 can’t also
come from u, as u has an edge going to v.)
Define ξ (n, w) to be the maximal first-order Fourier mass of any function computed by a length-n,
width-w, ordered branching program. We follow the structure of Steinberger’s proof [44] and inductively
bound ξ .
Lemma 6.3. For all n and w ≥ 3, we have ξ(n, w) ≤ ⌈1 + 2 log₂(n)⌉ · (ξ(n, w−1) + 1).
Proof. Let B be a length-n, width-w, ordered branching program that maximizes the first-order Fourier
mass of the function f it computes. By Lemma 6.2, we may assume that every nontrivial layer of B has
a collision. We may assume that there are no trivial layers. (Otherwise we can remove them without
affecting the Fourier mass.)
Let m = ⌈1 + 2 log₂ n⌉. Split the first-order Fourier coefficients into m groups of the form
$$G_{i_0} = \{i\in[n] : i \bmod m = i_0\}.$$
We bound the first-order Fourier mass of each group separately and add them up, i.e.,
$$\sum_{i\in[n]}\big|\widehat{f}[\{i\}]\big| = \sum_{i_0\in[m]}\;\sum_{i\in G_{i_0}}\big|\widehat{f}[\{i\}]\big|.$$
Fix one group G = G_{i_0}. We apply a random restriction to B to obtain the function f|_{G←U} computed by the branching program B|_{G←U}. As in Lemma 3.3, we have
$$\sum_{i\in G}\big|\widehat{f}[\{i\}]\big| \le \mathbb{E}_U\left[\sum_{i\in G}\Big|\widehat{f|_{G\leftarrow U}}[\{i\}]\Big|\right].$$
So it suffices to bound the first-order Fourier mass of f|_{G←U}.
We claim that B|_{G←U} is a width-(w−1), ordered branching program with probability at least 1 − n·2^{1−m} over the choice of U ∈ {0,1}^n. This implies that
$$\mathbb{E}_U\left[\sum_{i\in G}\Big|\widehat{f|_{G\leftarrow U}}[\{i\}]\Big|\right] \le \xi(n, w-1) + n\cdot 2^{1-m}\cdot n \le \xi(n, w-1) + 1.$$
Thus
$$\sum_{i\in[n]}\big|\widehat{f}[\{i\}]\big| \le \sum_{i_0\in[m]}\mathbb{E}_U\left[\sum_{i\in G_{i_0}}\Big|\widehat{f|_{G_{i_0}\leftarrow U}}[\{i\}]\Big|\right] \le \sum_{i_0\in[m]}\big(\xi(n,w-1)+1\big) = m\big(\xi(n,w-1)+1\big),$$
as required.
Now, to prove the claim, fix a surviving state space i ∈ G of B|G←U other than the last surviving
state space (which can always be assumed to have width 2 anyway). State space i is followed by m − 1
restricted layers. As in Lemma 3.4, with probability at least 1 − 2^{1−m}, at least one of these layers will
reduce the width (by selecting a pair of edges causing a collision). This reduces the number of reachable
vertices and, hence, the width of state space i. A union bound gives the required probability.
Lemma 6.4. For all n ≥ 1, we have ξ (n, 2) ≤ 5.
Proof. Let B be a length-n, width-2, ordered branching program. We may assume, without loss of generality, that B has no trivial layers, as trivial layers can be removed without affecting the other Fourier coefficients. As in Lemma 3.21, we can do a case analysis over all possible nontrivial layers to see that λ(B_i) ≤ 1/2 for all i ∈ [n]. We also have ‖B̂_i[0]‖₂ ≤ √2 (since B̂_i[0] is a 2 × 2 stochastic matrix) and ‖B̂_i[1]‖₂ ≤ 1 for all i ∈ [n]. As in Proposition 3.6, we can now inductively bound the damped Fourier mass using Lemmas 2.1 and 3.9:
$$L_p(B_{1\cdots i+1}) \le L_p(B_{1\cdots i})L_p(B_{i+1}) + L_p(B_{1\cdots i})\lambda(B_{i+1}) + \left\|\widehat{B_{1\cdots i}}[0]\right\|_2 L_p(B_{i+1}) \le L_p(B_{1\cdots i})\,p + L_p(B_{1\cdots i})\,\frac{1}{2} + \sqrt{2}\,p.$$
Setting p = 1/5 shows that, if L_p(B_{1···i}) ≤ 1, then L_p(B_{1···i+1}) ≤ 1. Induction then gives L_{1/5}(B) ≤ 1. This implies that L¹(B) ≤ 5 (since L¹(B) ≤ 5·L_{1/5}(B)), as required.
Solving the recurrence for ξ gives ξ(n, w) ≤ O(log n)^{w−2}, as required for Theorem 6.1.
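Concretely, the recurrence can be unrolled as follows. Writing A = ⌈1 + 2 log₂ n⌉ ≥ 2, Lemma 6.3 gives ξ(n, w) + A/(A−1) ≤ A(ξ(n, w−1) + A/(A−1)), so
$$\xi(n,w) + \frac{A}{A-1} \;\le\; A^{w-2}\left(\xi(n,2) + \frac{A}{A-1}\right) \;\le\; A^{w-2}\,(5+2) \;=\; 7A^{w-2},$$
and hence ξ(n, w) ≤ 7⌈1 + 2 log₂ n⌉^{w−2} = O(log n)^{w−2}.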
7 Further directions
7.1 Larger width
Our results hinge on the fact that “mixing” is well understood for regular ordered branching programs [9,
36, 28, 14, 45] and for (non-regular) width-2 ordered branching programs [5]. Indeed, understanding
mixing underpins most results for restricted models of branching programs.
The other main ingredient in our proof is using random restrictions to reduce from width 3 to “almost”
width 2 (Section 3.1). After applying the random restriction, we can exploit our understanding of mixing
for width-2 (Section 3.2).
What about width 4 and beyond? Using a random restriction we can reduce analyzing width 4 to
“almost” width 3—that is, Proposition 3.2 generalizes. Unfortunately, the reduction does not give a true
width-3, ordered branching program, but rather a concatenation of width-4 programs that begin and end
in width-3 state spaces and have few nonregular layers; thus we cannot repeat the reduction to width 2.
Moreover, we have a poor understanding of mixing for non-regular width-3, ordered branching programs,
which means we cannot use the same techniques that have worked for width-2, ordered branching
programs. Indeed, the biggest obstacle to extending our techniques to w > 3 is mixing (Lemma 3.10).
The problem is that the parameter λ (D) is no longer a useful measure of mixing for width-3 and above. In
particular, λ (D) > 1 is possible if EU [D[U]] is a 3 × 3 matrix. To extend our techniques, we need a better
notion of mixing. Using λ (D) is useful for regular ordered branching programs (it equals the second
eigenvalue for regular programs), but is of limited use for non-regular ordered branching programs. One
idea is to use the 1-norm, rather than the 2-norm. Define
$$\lambda_1(D) := \max_{v\,:\,\sum_u v(u)=0}\frac{\left\|v\,\mathbb{E}_U[D[U]]\right\|_1}{\|v\|_1}. \tag{7.1}$$
This would still satisfy λ1 (D) ∈ [0, 1] even for non-regular D of arbitrary width. The partition lemma
(Lemma 3.16) would also generalize to larger width with this definition. However, we do not know an
analog of Lemmas 3.21 or 3.22 for larger width.
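As a small numeric sketch of definition (7.1): for a row-stochastic matrix M = E_U[D[U]], the ratio ‖vM‖₁/‖v‖₁ over mean-zero v is maximized at v = (e_i − e_j)/2 (a standard convexity observation of ours, not a claim from the paper), which gives a simple way to compute λ₁:

```python
import itertools
import numpy as np

def lambda_1(M):
    """lambda_1 of (7.1) for a row-stochastic matrix M = E_U[D[U]].

    The ratio ||v M||_1 / ||v||_1 over nonzero v with sum(v) = 0 is a convex
    function of v, so it is maximized at a vertex of the polytope
    {||v||_1 <= 1, sum(v) = 0}; those vertices are (e_i - e_j)/2, giving
    max_{i != j} ||row_i(M) - row_j(M)||_1 / 2.
    """
    M = np.asarray(M, dtype=float)
    return max(np.abs(M[i] - M[j]).sum() / 2
               for i, j in itertools.combinations(range(M.shape[0]), 2))

# lambda_1 is 0 for a rank-one (perfectly mixing) stochastic matrix and 1 for
# the identity; for any row-stochastic M it lies in [0, 1].
assert abs(lambda_1(np.ones((3, 3)) / 3)) < 1e-12
assert abs(lambda_1(np.eye(3)) - 1.0) < 1e-12
```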
Our proof also uses a different notion of mixing—collisions. Namely, to prove Proposition 3.2, we
used the fact that a random restriction of a non-regular layer will with probability at least 1/2 result in the
width of the right side of the layer being reduced. This is a form of mixing, but it is not captured by λ .
Ideally, we want a notion of mixing that captures both λ and width-reduction under restrictions.
Our proofs combine the techniques of Braverman et al. [9] and Reingold et al. [36] and those of
Brody and Verbin [10] and Steinberger [44]. We would like to combine them more cleanly—presently
the proof is split into two parts (Proposition 3.2 and Lemma 3.10). This would likely involve developing
a deeper understanding of the notion of mixing.
7.2 Better seed length
Our seed length Õ(log³ n) is far from the optimal O(log n). Further improvement would require some new techniques. We could potentially relax our notion of Fourier growth to achieve better results. We bound L^k(f) in order to obtain a bound on L(f|_t) for an averaging restriction t, whence we can apply a small-bias generator to f|_t. However, it suffices to bound L(g), where g approximates f|_t, as stated below.
Proposition 7.1 (De et al. [15, Proposition 2.6]). Let f, f₊, f₋ : {0,1}^n → R satisfy f₋(x) ≤ f(x) ≤ f₊(x) for all x and E_U[f₊(U) − f₋(U)] ≤ δ. Then any ε-biased distribution X gives
$$\left|\mathbb{E}_X[f(X)] - \mathbb{E}_U[f(U)]\right| \le \delta + \varepsilon\cdot\max\{L(f_+), L(f_-)\}.$$
The functions f+ and f− are called sandwiching polynomials for f . This notion of sandwiching is in
fact a tight characterisation of small bias [15, Proposition 2.7]. That is, any function f fooled by all small
bias generators has sandwiching polynomials satisfying the hypotheses of Proposition 7.1.
Gopalan et al. [18] use sandwiching polynomials in the analysis of their generator for CNFs. This
allows them to set a constant fraction of the bits at each level of recursion (p = Ω(1)), while we set a
p = 1/O(log n) fraction at each level. We would like to similarly exploit sandwiching polynomials for
branching programs to improve the seed length of the generator.
A further avenue for improvement (also exploited by Gopalan et al.) would be to modify the generator
construction to have polyloglog(n)/p levels of recursion, rather than Θ(log(n)/p). Gopalan et al. analyze
this by showing that read-once CNFs simplify substantially after restricting all but a 1/polylog(n) fraction
of the variables. Unfortunately, we do not know of an appropriate notion of simplification to use for
arbitrary width-3 programs.
References
[1] Anil Ada, Omar Fawzi, and Hamed Hatami: Spectral norm of symmetric functions. In Proc. 16th Internat. Workshop on Randomization and Computation (RANDOM’12), volume 7408 of LNCS, pp. 338–349. Springer, 2012. [doi:10.1007/978-3-642-32512-0_29, arXiv:1205.5282] 4
[2] Noga Alon, Oded Goldreich, Johan Håstad, and René Peralta: Simple constructions of almost k-wise independent random variables. Random Structures Algorithms, 3(3):289–304, 1992. Preliminary version in FOCS’90. [doi:10.1002/rsa.3240030308] 4, 12
[3] Mihir Bellare and John Rompel: Randomness-efficient oblivious sampling. In Proc. 35th FOCS, pp. 276–287. IEEE Comp. Soc. Press, 1994. [doi:10.1109/SFCS.1994.365687] 36
[4] Michael Ben-Or and Nathan Linial: Collective coin flipping. Advances in Comput. Research, 5:91–115, 1989. Available at author’s website. Preliminary version in FOCS’85. 5, 6, 38
[5] Andrej Bogdanov, Zeev Dvir, Elad Verbin, and Amir Yehudayoff: Pseudorandomness for width-2 branching programs. Theory of Computing, 9(7):283–293, 2013. [doi:10.4086/toc.2013.v009a007] 3, 4, 12, 43
[6] Andrej Bogdanov, Periklis A. Papakonstantinou, and Andrew Wan: Pseudorandomness for read-once formulas. In Proc. 52nd FOCS, pp. 240–246. IEEE Comp. Soc. Press, 2011. [doi:10.1109/FOCS.2011.57] 3, 11
[7] Jean Bourgain: On the distribution of the Fourier spectrum of Boolean functions. Israel J. Mathematics, 131(1):269–276, 2002. [doi:10.1007/BF02785861] 5
[8] Yigal Brandman, Alon Orlitsky, and John L. Hennessy: A spectral lower bound technique for the size of decision trees and two level AND/OR circuits. IEEE Trans. Computers, 39(2):282–287, 1990. [doi:10.1109/12.45216] 4
[9] Mark Braverman, Anup Rao, Ran Raz, and Amir Yehudayoff: Pseudorandom generators for regular branching programs. SIAM J. Comput., 43(3):973–986, 2014. Preliminary version in FOCS’10. [doi:10.1137/120875673] 3, 21, 25, 43, 44
[10] Joshua Brody and Elad Verbin: The coin problem and pseudorandomness for branching programs. In Proc. 51st FOCS, pp. 30–39. IEEE Comp. Soc. Press, 2010. [doi:10.1109/FOCS.2010.10] 3, 5, 6, 7, 13, 15, 41, 44
[11] Jehoshua Bruck: Harmonic analysis of polynomial threshold functions. SIAM J. Discrete Math., 3(2):168–177, 1990. [doi:10.1137/0403015] 4
[12] Jehoshua Bruck and Roman Smolensky: Polynomial threshold functions, AC0 functions, and spectral norms. SIAM J. Comput., 21(1):33–42, 1992. Preliminary version in FOCS’90. [doi:10.1137/0221003] 4
[13] L. Elisa Celis, Omer Reingold, Gil Segev, and Udi Wieder: Balls and bins: Smaller hash families and faster evaluation. SIAM J. Comput., 42(3):1030–1050, 2013. Preliminary version in FOCS’11. [doi:10.1137/120871626] 2
[14] Anindya De: Pseudorandomness for permutation and regular branching programs. In Proc. 26th IEEE Conf. on Computational Complexity (CCC’11), pp. 221–231. IEEE Comp. Soc. Press, 2011. [doi:10.1109/CCC.2011.23] 3, 43
[15] Anindya De, Omid Etesami, Luca Trevisan, and Madhur Tulsiani: Improved pseudorandom generators for depth 2 circuits. In Proc. 14th Internat. Workshop on Randomization and Computation (RANDOM’10), volume 6302 of LNCS, pp. 504–517. Springer, 2010. [doi:10.1007/978-3-642-15369-3_38] 44
[16] Lars Engebretsen, Piotr Indyk, and Ryan O’Donnell: Derandomized dimensionality reduction with applications. In Proc. 13th Ann. ACM-SIAM Symp. on Discrete Algorithms (SODA’02), pp. 705–712. ACM/SIAM, 2002. ACM DL. 2
[17] Ehud Friedgut: Boolean functions with low average sensitivity depend on few coordinates. Combinatorica, 18(1):27–35, 1998. [doi:10.1007/PL00009809] 5
[18] Parikshit Gopalan, Raghu Meka, Omer Reingold, Luca Trevisan, and Salil P. Vadhan: Better pseudorandom generators from milder pseudorandom restrictions. In Proc. 53rd FOCS, pp. 120–129. IEEE Comp. Soc. Press, 2012. [doi:10.1109/FOCS.2012.77, arXiv:1210.0049] 3, 4, 5, 31, 32, 44
[19] Ben Green and Tom Sanders: Boolean functions with small spectral norm. Geometric and Functional Analysis, 18(1):144–162, 2008. [doi:10.1007/s00039-008-0654-y] 4
[20] Vince Grolmusz: On the power of circuits with gates of low ℓ1 norms. Theoret. Comput. Sci., 188(1-2):117–128, 1997. [doi:10.1016/S0304-3975(96)00290-3] 4
[21] Iftach Haitner, Danny Harnik, and Omer Reingold: On the power of the randomized iterate. SIAM J. Comput., 40(6):1486–1528, 2011. Preliminary version in CRYPTO’06. [doi:10.1137/080721820] 2
[22] Johan Håstad: Almost optimal lower bounds for small depth circuits. In Proc. 18th STOC, pp. 6–20. ACM Press, 1986. [doi:10.1145/12130.12132] 7
[23] Alexander Healy, Salil Vadhan, and Emanuele Viola: Using nondeterminism to amplify hardness. SIAM J. Comput., 35(4):903–931, 2006. Preliminary version in STOC’04. [doi:10.1137/S0097539705447281] 2
[24] Russell Impagliazzo, Raghu Meka, and David Zuckerman: Pseudorandomness from shrinkage. In Proc. 53rd FOCS, pp. 111–119. IEEE Comp. Soc. Press, 2012. [doi:10.1109/FOCS.2012.78] 3, 4
[25] Russell Impagliazzo, Noam Nisan, and Avi Wigderson: Pseudorandomness for network algorithms. In Proc. 26th STOC, pp. 356–364. ACM Press, 1994. [doi:10.1145/195058.195190] 3
[26] Piotr Indyk: Stable distributions, pseudorandom generators, embeddings, and data stream computation. J. ACM, 53(3):307–323, 2006. Preliminary version in FOCS’00. [doi:10.1145/1147954.1147955] 2
[27] Eyal Kaplan, Moni Naor, and Omer Reingold: Derandomized constructions of k-wise (almost) independent permutations. Algorithmica, 55(1):113–133, 2009. Preliminary versions in RANDOM’05 and ECCC. [doi:10.1007/s00453-008-9267-y] 2
[28] Michal Koucký, Prajakta Nimbhorkar, and Pavel Pudlák: Pseudorandom generators for group products. In Proc. 43rd STOC, pp. 263–272. ACM Press, 2011. Preliminary version in ECCC. [doi:10.1145/1993636.1993672] 3, 43
[29] Eyal Kushilevitz and Yishay Mansour: Learning decision trees using the Fourier spectrum. SIAM J. Comput., 22(6):1331–1348, 1993. Preliminary version in STOC’91. [doi:10.1137/0222080] 4
[30] Yishay Mansour: An O(n^{log log n}) learning algorithm for DNF under the uniform distribution. J. Comput. System Sci., 50(3):543–550, 1995. Preliminary version in COLT’92. [doi:10.1006/jcss.1995.1043] 4, 5
[31] Joseph Naor and Moni Naor: Small-bias probability spaces: Efficient constructions and applications. SIAM J. Comput., 22(4):838–856, 1993. Preliminary version in STOC’90. [doi:10.1137/0222053] 4, 12
[32] Jelani Nelson: Exponential tail bounds without the moment generating function. MathOverflow, 2013. http://mathoverflow.net/q/147239. 36
[33] Noam Nisan: RL ⊆ SC. Comput. Complexity, 4(1):1–11, 1994. Preliminary version in STOC’92. [doi:10.1007/BF01205052] 2
[34] Noam Nisan and David Zuckerman: More deterministic simulation in logspace. In Proc. 25th STOC, pp. 235–244. ACM Press, 1993. [doi:10.1145/167088.167162] 3
[35] Omer Reingold: Undirected connectivity in log-space. J. ACM, 55(4):17:1–17:24, 2008. Preliminary version in STOC’05. [doi:10.1145/1391289.1391291] 7, 18
[36] Omer Reingold, Thomas Steinke, and Salil P. Vadhan: Pseudorandomness for regular branching programs via Fourier analysis. In Proc. 17th Internat. Workshop on Randomization and Computation (RANDOM’13), volume 8096 of LNCS, pp. 655–670. Springer, 2013. [doi:10.1007/978-3-642-40328-6_45, arXiv:1306.3004] 3, 4, 5, 6, 7, 20, 21, 22, 31, 32, 33, 43, 44
[37] Omer Reingold, Luca Trevisan, and Salil P. Vadhan: Pseudorandom walks on regular digraphs and the RL vs. L problem. In Proc. 38th STOC, pp. 457–466. ACM Press, 2006. [doi:10.1145/1132516.1132583] 7, 18
[38] EYAL ROZENMAN AND SALIL P. VADHAN: Derandomized squaring of graphs. In Proc. 9th Internat. Workshop on Randomization and Computation (RANDOM’05), volume 3624 of LNCS, pp. 436–447. Springer, 2005. [doi:10.1007/11538462_37] 18
[39] MICHAEL E. SAKS AND SHIYU ZHOU: BPHSPACE(S) ⊆ DSPACE(S^{3/2}). J. Comput. System Sci., 58(2):376–403, 1999. Preliminary version in FOCS’95. [doi:10.1006/jcss.1998.1616] 3, 4
[40] JEANETTE P. SCHMIDT, ALAN SIEGEL, AND ARAVIND SRINIVASAN: Chernoff-Hoeffding bounds for applications with limited independence. SIAM J. Discrete Math., 8(2):223–250, 1995. Preliminary version in SODA’93. [doi:10.1137/S089548019223872X] 35, 36
[41] AMIR SHPILKA, AVISHAY TAL, AND BEN LEE VOLK: On the structure of Boolean functions with small spectral norm. Comput. Complexity, 26(1):229–273, 2017. Preliminary version in ITCS’14. [doi:10.1007/s00037-015-0110-y, arXiv:1304.0371] 4
[42] JIŘÍ ŠÍMA AND STANISLAV ŽÁK: A sufficient condition for sets hitting the class of read-once branching programs of width 3. In Internat. Conf. Theory and Practice of Comput. Sci. (SOFSEM’12), pp. 406–418, 2012. [doi:10.1007/978-3-642-27660-6_33] 3
[43] D. SIVAKUMAR: Algorithmic derandomization via complexity theory. In Proc. 34th STOC, pp. 619–626. ACM Press, 2002. Also in CCC’02. [doi:10.1145/509907.509996] 2
[44] JOHN P. STEINBERGER: The distinguishability of product distributions by read-once branching programs. In Proc. 28th IEEE Conf. on Computational Complexity (CCC’13), pp. 248–254. IEEE Comp. Soc. Press, 2013. [doi:10.1109/CCC.2013.33] 7, 13, 14, 15, 41, 42, 44
[45] THOMAS STEINKE: Pseudorandomness for permutation branching programs without the group theory. Electron. Colloq. on Comput. Complexity (ECCC), 19(83), 2012. Available at ECCC. 3, 7, 43
[46] THOMAS STEINKE, SALIL VADHAN, AND ANDREW WAN: Pseudorandomness and Fourier growth bounds for width-3 branching programs. In Proc. 18th Internat. Workshop on Randomization and Computation (RANDOM’14), volume 28 of LIPIcs, pp. 885–899. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, 2014. Preliminary version in ECCC. [doi:10.4230/LIPIcs.APPROX-RANDOM.2014.885, arXiv:1405.7028] 1
[47] HING YIN TSANG, CHUNG HOI WONG, NING XIE, AND SHENGYU ZHANG: Fourier sparsity, spectral norm, and the log-rank conjecture. In Proc. 54th FOCS, pp. 658–667. IEEE Comp. Soc. Press, 2013. [doi:10.1109/FOCS.2013.76, arXiv:1304.1245] 4
[48] YOAV TZUR: Notions of weak pseudorandomness and GF(2^n)-polynomials. Master’s thesis, Weizmann Institute, 2009. Available at ECCC. 3
AUTHORS
Thomas Steinke
Postdoctoral researcher
IBM Almaden Research Center
San Jose, CA
width3@thomas-steinke.net
http://www.thomas-steinke.net
Salil Vadhan
Professor
John A. Paulson School of Engineering and Applied Sciences
Harvard University, Cambridge, MA
salil@seas.harvard.edu
http://www.seas.harvard.edu/~salil
Andrew Wan
Research staff member
Institute for Defense Analyses, Alexandria, VA
atw12@columbia.edu
http://www1.cs.columbia.edu/~atw12
ABOUT THE AUTHORS
T HOMAS S TEINKE is a postdoctoral researcher at the IBM Almaden Research Center. He
completed his Ph. D. at Harvard University in 2016, advised by Salil Vadhan, and was
previously an undergraduate student at the University of Canterbury in New Zealand. His
research interests include pseudorandomness, data privacy, and adaptive data analysis.
When not proving theorems, he enjoys rock climbing with his wife Joy.
S ALIL VADHAN is the Vicky Joseph Professor of Computer Science and Applied Math-
ematics at the Harvard John A. Paulson School of Engineering & Applied Sciences.
He received his Ph. D. under the supervision of Shafi Goldwasser at MIT in 1999; his
dissertation was entitled “A Study of Statistical Zero-Knowledge Proofs.” Other research
interests include the theory of pseudorandomness and the theory and practice of data
privacy. He enjoys spending leisure time with his wife and two daughters, as well as
learning to surf when the Boston weather or travel destinations permit.
A NDREW WAN is currently a research staff member at the Institute for Defense Analyses.
Before this, he was a research fellow at the Simons Institute for the Theory of Computing,
a postdoctoral fellow at Harvard University, and an assistant professor at IIIS, Tsinghua
University. He received his doctorate from Columbia University in 2010 under the
supervision of Tal Malkin and Rocco Servedio. His interests include complexity theory,
cryptography, and machine learning. Before graduate school, he was a student of
philosophy at Columbia University and enjoyed playing the piano, the trumpet, and the
accordion. Although he still enjoys playing music, the PAC model rarely affords him the
time.