Improved Pseudorandom Generators from Pseudorandom Multi-switching Lemmas

Authors Rocco A. Servedio, Li-Yang Tan,
Plaintext
                           T HEORY OF C OMPUTING, Volume 18 (4), 2022, pp. 1–46
                                        www.theoryofcomputing.org

                          S PECIAL ISSUE : APPROX-RANDOM 2019



    Improved Pseudorandom Generators from
     Peudorandom Multi-switching Lemmas
                               Rocco A. Servedio*                      Li-Yang Tan†
                  Received June 30, 2019; Revised December 30, 2020; Published March 2, 2022




       Abstract. We give the best known pseudorandom generators for two touchstone classes
       in unconditional derandomization: small-depth circuits and sparse F2 polynomials. Our
       main results are an ε-PRG for the class of size-M depth-d AC0 circuits with seed length
       log(M)d+O(1)
                  √ · log(1/ε), and an ε-PRG for the class of S-sparse F2 polynomials with seed
       length 2 log S) · log(1/ε). These results bring the state of the art for unconditional deran-
                O(

       domization of these classes into sharp alignment with the state of the art for computational
       hardness for all parameter settings: substantially improving on the seed lengths of either PRG
       would require a breakthrough on longstanding and notorious circuit lower bound problems.
           The key enabling ingredient in our approach is a new pseudorandom multi-switching
       lemma. We derandomize recently developed multi-switching lemmas, which are powerful
       generalizations of Håstad’s switching lemma that deal with families of depth-two circuits.
       Our pseudorandom multi-switching lemma—a randomness-efficient algorithm for sampling
       restrictions that simultaneously simplify all circuits in a family—achieves the parameters
       obtained by the (full randomness) multi-switching lemmas of Impagliazzo, Matthews, and
     A preliminary version of this paper appeared in the Proceedings of the 23rd International Conference on Randomization
and Computation (RANDOM 2019) [68]
   * Supported by CCF-1420349 and CCF-1563155
   † Supported by CCF-1563122



ACM Classification: F.1.2
AMS Classification: 68Q17
Key words and phrases: pseudorandom generators, switching lemmas, circuit complexity


© 2022 Rocco A. Servedio and Li-Yang Tan
c b Licensed under a Creative Commons Attribution License (CC-BY)                           DOI: 10.4086/toc.2022.v018a004
                                   ROCCO A. S ERVEDIO AND L I -YANG TAN

        Paturi (SODA’12) and Håstad (SICOMP 2014). This optimality of our derandomization
        translates into the optimality (given current circuit lower bounds) of our PRGs for AC0 and
        sparse F2 polynomials.


1      Introduction
Switching lemmas. Switching lemmas, first established in breakthrough work in the 1980s [4, 32, 79, 37],
are fundamental results stating that depth-two circuits (ORs of ANDs or vice versa) simplify dramatically
when they are “hit with a random restriction.” They are a powerful technique in circuit complexity, and
are responsible for a remarkable suite of hardness results concerning small-depth Boolean circuits (AC0 ).
Switching lemmas are at the heart of several near-optimal bounds on AC0 circuits, such as essentially
optimal correlation bounds against the PARITY function [43, 38] and the worst-case and average-case
depth hierarchy theorems of [37, 42, 39]. Indeed, comparably strong results are lacking (and are major
open problems) for seemingly small extensions of AC0 , such as AC0 augmented with parity or mod-p
gates, for which switching lemmas do not apply; this gap highlights the importance of switching lemmas
as a proof technique.
    Switching lemmas are versatile as well as powerful: many results in circuit complexity rely on
sophisticated variants and generalizations of the “standard” switching lemmas. Recent examples include
the aforementioned correlation bounds and average-case depth hierarchy theorems, as well as powerful
lower bounds on the circuit complexity of the C LIQUE problem [14, 64], lower bounds on the small-depth
circuit complexity of ST-C ONNECTIVITY [28], and lower bounds against AC0 formulas [65]. Beyond
the immediate arena of circuit lower bounds, switching lemmas are also important tools in diverse areas
including propositional proof complexity [59, 47, 60, 40], computational learning theory [48], the design
of circuit satisfiability algorithms [16, 43], and coding theory [26, 12].
    This paper is about the role of switching lemmas in the study of unconditional pseudorandomness.
Switching lemmas have a long history in this area; indeed, arguably the first work in unconditional
derandomization, the seminal paper of Ajtai and Wigderson [6], was based on a pseudorandom switching
lemma, which they used to give the first non-trivial pseudorandom generator for AC0 . (Interestingly, after
many subsequent developments described in detail in Section 2, we come full circle in this paper and use
the [6] framework to give a new pseudorandom generator for AC0 that is essentially best possible without
improving longstanding circuit lower bounds.) One key contribution that we make in this paper is to bring
together two important generalizations of standard switching lemmas, one quite old and one very new:

    (i) pseudorandom switching lemmas (originating in [6]), which employ pseudorandom rather than
        “fully random” restrictions, and

    (ii) recently developed multi-switching lemmas [43, 38] which simultaneously simplify all of the
         depth-two circuits in a family of such circuits, rather than a single depth-two circuit as is the case
         for standard switching lemmas.

      Let us discuss each of these generalizations in turn.

                          T HEORY OF C OMPUTING, Volume 18 (4), 2022, pp. 1–46                               2
     I MPROVED P SEUDORANDOM G ENERATORS FROM P SEUDORANDOM M ULTI - SWITCHING L EMMAS

Pseudorandom switching lemmas. The (truly) random restrictions that are used in standard switching
lemmas make an independent random choice for each input variable x1 , . . . , xn of whether to map it
to 0, to 1, or to leave it unassigned (map it to ∗); standard switching lemmas show that a depth-two
circuit simplifies dramatically with very high probability when it is hit with such a random restriction.
Such “truly random” restrictions are inherently incompatible with unconditional derandomization, which
naturally motivates the notion of a pseudorandom switching lemma. Such a result defines a much smaller
probability space of “pseudorandom” restrictions, and proves that a restriction drawn randomly from this
space also has the effect of simplifying a depth-two circuit with high probability. While pseudorandom
switching lemmas have been the subject of much research since they were first introduced by Ajtai
and Wigderson [6, 5, 27, 3, 35, 43, 34, 73, 33], and have been applied in a range of different ways in
unconditional derandomization, they are not yet fully understood.
    The designer of a pseudorandom switching lemma faces an inherent tension between achieving strong
parameters—intuitively, having a depth-two circuit simplify as much as possible while keeping a large
fraction of variables alive—and using as little randomness as possible. Prior to the work of Trevisan and
Xue [73], known pseudorandom switching lemmas fell short of achieving the parameters of Håstad’s
influential “full randomness” switching lemma [37]. In particular, a parameter of central importance in
essentially all applications of switching lemmas is the probability that a given coordinate xi remains alive
under a random (or pseudorandom) restriction; this is often referred to as the “∗-probability” and denoted
by p. A crucial quantitative advantage of Håstad’s switching lemma over previous results is that it can
be applied even when p is as large as Ω(1/ log(n)) for poly(n)-size depth-two circuits—in contrast, the
earlier articles [4, 32, 79] required p = n−Ω(1) —and yields a very strong conclusion, namely that with
high probability the restricted circuit collapses to a shallow decision tree.1 (For example, while the recent
pseudorandom switching lemma of [34] is able to achieve a relatively large p, the conclusion of that
switching lemma is that the restricted depth-two circuit can w.h.p. be sandwiched by depth-two circuits
with small bottom fan-in, which is weaker than the aforementioned decision-tree conclusion.)
    Trevisan and Xue [73] give a pseudorandom switching lemma that is highly randomness efficient
and yet achieves the parameters of Håstad’s fully random switching lemma (i. e., [73] achieves the same
simplification, collapsing to a shallow decision tree, that follows from [37], with the same ∗-probability
p as [37]). The key conceptual ingredient enabling this is a beautiful idea of “fooling the proof” of
Håstad’s switching lemma, exploiting its “computational simplicity.” Trevisan and Xue leverage their
pseudorandom switching lemma to construct a new pseudorandom generator for AC0 , obtaining the first
improvement of Nisan’s celebrated PRG [57] in over two decades. We elaborate on Trevisan and Xue’s
ideas and how they obtain their PRG later in Section 2.1.

Multi-switching lemmas. The switching lemma shows that any width-k CNF formula collapses to a
shallow decision tree with high probability under a random restriction. Via a simple union bound it is of
course possible to extend this result to say that a family of width-k CNF formulas will all collapse to a
shallow decision tree with high probability under a random restriction; but this naive approach leads to
a quantitative loss in parameters if the argument is iterated, as it typically is, d − 1 times to analyze a
depth-d circuit. (The exact nature of this quantitative loss is important but somewhat subtle; see Section 3
   1 The first published version of the switching lemma with a decision tree conclusion is due to Cai [22]; several authors

subsequently noted that Håstad’s argument also yields such a conclusion.


                           T HEORY OF C OMPUTING, Volume 18 (4), 2022, pp. 1–46                                          3
                                        ROCCO A. S ERVEDIO AND L I -YANG TAN

for a detailed explanation.)
     Via an ingenious extension of the ideas underlying the original switching lemma, Håstad [38]
developed “multi-switching lemmas” that essentially bypass this quantitative loss in parameters that
results from iterating a naive union bound (see also the work of Impagliazzo, Matthews, and Paturi [43]
for closely related results). Roughly speaking, [38] shows that a family of width-k CNF formulas will
with high probability have a shallow common partial decision tree. Without explaining this structure in
detail here (again see Section 3 for a detailed explanation), this makes it possible to iterate the argument
and tackle depth-d circuits without incurring a quantitative loss in parameters. The savings thus achieved
is the key new ingredient that allowed [43, 38] to achieve essentially optimal correlation bounds for AC0
against the PARITY function, capping off a long line of work [4, 79, 37, 22, 10, 16]. These ideas have also
been leveraged to achieve new algorithmic results such as better-than-brute-force satisfiability algorithms
and distribution-free PAC learning algorithms for AC0 [16, 43, 66].


A pseudorandom multi-switching lemma. A core technical contribution of this paper is to bring
together these two lines of work, on pseudorandom switching lemmas and on multi-switching lemmas.
Since the precise statement of our pseudorandom multi-switching lemma, Theorem 4.2, is somewhat
involved we defer it to Section 4 and here merely make some remarks about it. In the spirit of Trevisan
and Xue’s derandomization of the original switching lemma, to obtain Theorem 4.2 we “fool the proof”
of Håstad’s multi-switching lemma [38], exploiting its “computational simplicity.” This enables us to
achieve optimal parameters in the same sense as [73], namely, that it establishes the same dramatic
simplification—now of the family F of depth-two circuits—as [38], while only requiring the same
∗-probability p as [38]. Our pseudorandom switching lemma is highly efficient in its use of randomness;
this randomness efficiency is crucial in the constructions of our pseudorandom generators for AC0 circuits
and sparse F2 polynomials using Theorem 4.2, which we describe in the next section.2



2     PRGs for AC0 and sparse F2 polynomials

We employ our pseudorandom multi-switching lemma to give the best known pseudorandom generators
for two canonical classes in unconditional derandomization: AC0 circuits and sparse F2 polynomials. As
we describe in this section, our results bring the state of the art for unconditional derandomization of these
classes into sharp alignment with the state of the art for computational hardness. In this sense, our results
are in the same spirit as those of Imagliazzo, Meka, and Zuckerman [44], which gave optimal (assuming
current circuit lower bounds) pseudorandom generators for various classes of Boolean formulas and
branching programs; however, our techniques are very different from those of [44].

    2 While our focus in this article is on unconditional derandomization, we briefly mention that recent work of Ball et al. [12]

establishes a new connection between pseudorandom switching lemmas and non-malleable codes in coding theory [30]. Using
this connection, [12] is able to leverage the randomness efficiency of Trevisan and Xue’s pseudorandom switching lemma [73]
in its design of new non-malleable codes for small-depth circuits. We leave the possibility of applying our techniques to obtain
further-improved non-malleable codes as an interesting avenue for future work.


                             T HEORY OF C OMPUTING, Volume 18 (4), 2022, pp. 1–46                                               4
      I MPROVED P SEUDORANDOM G ENERATORS FROM P SEUDORANDOM M ULTI - SWITCHING L EMMAS

2.1     PRGs for AC0 circuits
The class of small-depth Boolean circuits (AC0 ) is a class of central interest in unconditional deran-
domization, and has been the subject of intensive research in this area over the past 30 years [6, 49,
57, 58, 54, 53, 45, 72, 74, 13, 62, 21, 46, 29, 2, 1, 69, 51, 31, 35, 34, 73, 33, 70, 36, 24]. This highly
successful line of work on derandomizing AC0 has generated a wealth of ideas and techniques that
have become mainstays in the field of pseudorandomness. A prominent example is Nisan’s celebrated
PRG for AC0 circuits [57], which introduced ideas that enriched the surprising connections between
pseudorandomness and computational hardness [17, 78, 58]. The hardness-versus-randomness paradigm
asserts, qualitatively, that strong explicit PRGs exist if and only if strong explicit circuit lower bounds
exist. In the context of unconditional derandomization (the subject of this article), this strongly motivates
the goal of constructing, for every circuit class C , unconditional PRGs for C that are best possible
given the current best lower bounds for C . In other words, this is the goal of achieving an optimal
hardness-to-randomness conversion for C , converting “all the hardness” in our lower bounds for C into
pseudorandomness for C .
     For C being the class of n-variable size-M depth-d AC0 circuits, this amounts to constructing PRGs
with seed length logd−1 (Mn) log(1/ε); such seed length is essentially the best possible without improving
longstanding AC0 lower bounds that date back to the 1980s [37]. More precisely, it is well known
(see, e. g., the subsection “Barriers to further progress” in [73, Section 1]) that achieving seed length
logd−1.01 (Mn) log(1/ε) would yield exp(ω(n1/(d−1) )) size lower bounds against depth-d AC0 circuits;
this is a barrier that has stood for over 30 years even in the d = 3 case. We give the first PRG that achieves
a seed length of logd+O(1) (Mn) log(1/ε):

Theorem 2.1 (PRG for AC0 circuits). For every d ≥ 2, M ∈ N and ε > 0, there is an ε-PRG for the class
of n-variable size-M depth-d circuits with seed length logd+O(1) (Mn) log(1/ε).

    We note that (to the best of our knowledge) a seed length of logd+O(1) (Mn) + log(1/ε) would not
imply new circuit lower bounds, and neither would a seed length of logd−1 (Mn) log(1/ε). We leave the
design of pseudorandom generators achieving these seed lengths as interesting subjects for future work.

2.1.1    Background and prior PRGs for AC0 circuits
As noted above, there has been a significant body of work on PRGs for AC0 circuits, spanning over 30
years. In this section we give a brief overview of the history and prior state of the art for this touchstone
problem in unconditional derandomization.

Ajtai–Wigderson and Nisan. Ajtai and Wigderson, in their seminal paper [6] pioneering the study of
unconditional derandomization, constructed the first non-trivial PRG for AC0 circuits with an no(1) seed
length; we will discuss their techniques in detail later. The seed length of the [6] generator was improved
significantly in the celebrated work of Nisan [57], using what is now known as the Nisan–Wigderson
framework [58], which provides a generic template for converting correlation bounds against a circuit
class to PRGs for a closely related class (in the case of AC0 these two classes essentially coincide). Via
this approach Nisan showed how correlation bounds for AC0 against the PARITY function [37] yield a
PRG with seed length log2d+O(1) (Mn/ε).

                        T HEORY OF C OMPUTING, Volume 18 (4), 2022, pp. 1–46                                5
                                 ROCCO A. S ERVEDIO AND L I -YANG TAN

     We remark that the generality of the Nisan–Wigderson framework comes at a quantitative price: it is
straightforward to verify that a seed length of (logd (Mn) + log(1/ε))2 is the best that can be achieved via
this framework given current AC0 circuit lower bounds (see, e. g., [73, 36]). This is roughly quadratically
worse than the best that can be achieved assuming only current AC0 circuit lower bounds.

Bounded independence fools AC0 . Nisan’s seed length for fooling AC0 circuits stood unchallenged
for more than two decades. However, in this interim period there was significant progress on showing
that distributions with bounded independence fool AC0 , a well-known conjecture posed by Linial and
Nisan [49]. Braverman’s breakthrough, which built on an earlier result of Bazzi [13] and its subsequent
simplification by Razborov [62], showed that polylog(n)-wise independence fools AC0 . Using standard
                                                                                              2
constructions of k-wise independent distributions, this gave a PRG with seed length logO(d ) (Mn/ε);
this was subsequently sharpened to log3d+O(1) (Mn/ε) by Tal [70]. Recently, Harsha and Srinivasan [36]
further improved the seed length of Braverman’s generator to log3d+O(1) (Mn) log(1/ε), which is notable
for its optimal dependence on the error parameter ε.

The work of Trevisan and Xue. Recent work of Trevisan and Xue [73] makes a significant advance
towards achieving seed length logd−1 (Mn) log(1/ε): their work circumvents the “quadratic loss” as-
sociated with the Nisan–Wigderson framework with a PRG of seed length logd+O(1) (Mn/ε). This is
the first PRG to achieve a logd+O(1) (Mn) dependence, an exponent that is within an additive absolute
constant of the sought-for logd−1 (Mn), and is also the first strict improvement on Nisan’s seed length in
more than two decades. (Note, however, that, like in Nisan’s PRG, the dependence on ε is not optimal:
logd+O(1) (1/ε) instead of log(1/ε).)
    Rather than going through the Nisan–Wigderson framework—which, as noted above, carries with it
an associated quantitative loss in parameters—Trevisan and Xue construct their PRG by derandomizing
the proof of AC0 lower bounds, “opening up the black-box” of AC0 lower bounds, so to speak. At a high
level, Trevisan and Xue [73] adopt the strategy employed in the early work of Ajtai and Wigderson [6].
We describe this strategy in detail in Section 5, but roughly speaking, Ajtai and Wigderson introduced a
powerful and generic framework for constructing PRGs from pseudorandom switching lemmas. In [6],
they instantiated this framework with a derandomization of Ajtai’s switching lemma [4]—which underlies
his proof of the first superpolynomial lower bounds against AC0 —to obtain the first non-trivial PRG for
AC0 . Trevisan and Xue obtain their PRG by revisiting this early framework of [6], instantiating it with
their derandomization of Håstad’s switching lemma [37]. (And as we will soon discuss, in this paper we
obtain our PRG by instantiating the [6] framework with our derandomization of the [38] multi-switching
lemmas.)
PRGs via polarizing random walks. Finally, in recent exciting work, Chattopadhyay, Hatami, Hosseini,
and Lovett [24] have introduced an elegant new framework for obtaining pseudorandom generators which
has consequences for fooling AC0 . Their framework is based on a notion of “fractional” pseudorandom
generators, which are used as steps in a random walk which ultimately yields a (standard) pseudorandom
generator. Chattopadhyay et al. [24] show that if a class C of circuits is closed under restrictions and has
sufficiently strong Fourier concentration on low-degree coefficients, then almost k-wise independence
suffices to yield a fractional PRG, which their random walk approach can then convert into a standard
PRG against C . Using Tal’s sharp bounds [70] on the Fourier concentration of AC0 , they obtain a seed

                        T HEORY OF C OMPUTING, Volume 18 (4), 2022, pp. 1–46                              6
    I MPROVED P SEUDORANDOM G ENERATORS FROM P SEUDORANDOM M ULTI - SWITCHING L EMMAS

length of O(log(n/ε)(log(log(n)/ε)) log2d−2 M) for size-M depth-d circuits.

2.1.2   Our PRG and approach
To summarize, prior to our work there were three incomparable best known PRGs for AC0 , achieving
three different tradeoffs in the overall dependence on M, d and 1/ε. These were the PRG of Trevisan and
Xue [73], which has seed length logd+O(1) (Mn/ε); Harsha and Srinivasan’s improvement of Braverman’s
generator [36], which has seed length log3d+O(1) (Mn) log(1/ε); and the [24] PRG, which has seed length
O(log(n/ε)(log(log(n)/ε)) log2d−2 M), i. e., essentially log2d−1 (Mn) log2 (1/ε).
     Theorem 2.1 unifies and improves these three incomparable seed lengths. Our PRG achieves an essen-
tially optimal hardness-to-randomness conversion for AC0 : our seed length of logd+O(1) (Mn) log(1/ε)
comes very close to logd−1 (Mn) log(1/ε), which is best possible without improving longstanding AC0
circuit lower bounds that date back to the 1980s.
     Table 1 provides a comparison of the seed length of our PRG (and the techniques that underlie our
construction) and those of previous work.

 Reference            Seed length                  Techniques

 [6] AW               no(1) for M = poly(n)        derandomize [4] switching lemma

 [57] Nisan           log2d+O(1) (Mn/ε)            [58] framework, [37] correlation bounds
                         O(d 2 )
 [21] Braverman       log          (Mn/ε)          bounded independence

 [73] TX              logd+O(1) (Mn/ε)             [6] framework, derandomize [37] switching lemma

 [70] Tal             log3d+O(1) (Mn/ε)            bounded independence

 [36] HS              log3d+O(1) (Mn) log(1/ε)     bounded independence

                      (essentially)                almost bounded independence, fractional PRGs, po-
 [24] CHHL
                      log2d−1 (Mn) log2 (1/ε)      larizing random walks

                                                   [6] framework, derandomize [38] multi-switching
 This paper           logd+O(1) (Mn) log(1/ε)
                                                   lemma, bounded independence

                  Table 1: PRGs for ε-fooling n-variable size-M depth-d AC0 circuits.

Our approach. Our approach draws on and unifies ideas from articles [6, 73, 36] discussed above, which
we use in conjunction with our derandomization of Håstad’s multi-switching lemma [38] to obtain our
PRG.
    At a high level, we adopt the overall conceptual strategy of Ajtai and Wigderson [6] and Trevisan
and Xue [73], and obtain our PRG by derandomizing the proof of AC0 lower bounds. The key technical
ingredient in our PRG construction is our pseudorandom multi-switching lemma, a derandomization of
the multi-switching lemmas which underlie the optimal correlation bounds for AC0 against PARITY by

                       T HEORY OF C OMPUTING, Volume 18 (4), 2022, pp. 1–46                          7
                                       ROCCO A. S ERVEDIO AND L I -YANG TAN

Impagliazzo et al. and Håstad [43, 38]. Our pseudorandom multi-switching lemma improves both the
pseudorandom switching lemma of [73] (a derandomization of Håstad’s switching lemma [37] which
underlies his exponential lower bounds against AC0 ) and the pseudorandom switching lemma of [6]
(a derandomization of Ajtai’s switching lemma [4] which underlies his superpolynomial lower bounds
against AC0 ).
    Our derandomization of the [38] multi-switching lemma is largely influenced by Trevisan and Xue’s
derandomization of Håstad’s original switching lemma [37]. We describe our approach in detail in
Section 4, but highlight here the simple but ingenious new idea underlying the argument in [73]. Roughly
speaking, they derandomize Håstad’s switching lemma by “fooling its proof”: showing that Håstad’s proof
of his switching lemma “cannot δ -distinguish” between truly random restrictions and pseudorandom
restrictions drawn from polylog(n)-wise independent distributions. Since Håstad’s switching lemma
holds for truly random restrictions, it thus follows that it also holds for pseudorandom restrictions drawn
from polylog(n)-wise independent distributions (up to a δ additive loss in the failure probability).
    To accomplish this, Trevisan and Xue exploit the fact that Håstad’s proof of the switching lemma
is “computationally simple”: for a fixed k-CNF F, there is a small depth-3 circuit that takes as input
an encoding of a restriction ρ, and returns 1 iff ρ is a bad restriction for the desired conclusion of
Håstad’s switching lemma, contributing to its failure probability. (More precisely, the failure event is
that the “canonical decision tree” for F  ρ has large depth.) In a similar spirit, our derandomization
of the [38] multi-switching lemma also exploits the “computational simplicity” of its proof. In our
case, for a fixed family F of k-CNF formulas we construct a small depth-4 circuit for recognizing bad
restrictions. (The one additional layer of depth reflects the fact that multi-switching lemmas are, roughly
speaking, “one quantifier more complex” than switching lemmas.) To obtain optimal parameters in our
PRG constructions, we use the d = 4 case of Harsha and Srinivasan’s strengthening of Braverman’s
generator [36] to fool this depth-4 circuit, and hence show that Håstad’s proof [38] of the multi-switching
lemmas “cannot distinguish” between truly random and pseudorandom restrictions. The fact that [36]
achieves an optimal log(1/ε) dependence on the seed length plays a crucial role in enabling the optimal
log(1/ε) seed-length dependence of our PRG.


2.2    PRGs for sparse F2 polynomials
Our second main result deals with the class of sparse F2 polynomials. Like AC0 circuits, sparse F2
polynomials and low-degree F2 polynomials have been extensively studied in unconditional derandom-
ization [56, 8, 54, 18, 74, 50, 76, 19, 51, 52, 25].
    Via the hardness-versus-randomness paradigm, the problem of derandomizing F2 polynomials
is intimately related to that of proving correlation bounds for F2 polynomials. A prominent open
problem in the latter context—arguably the current flagship challenge in this area—is that of obtaining
superpolynomially small correlation bounds against F2 polynomials of degree log(n). Degree log(n)
represents the fundamental limit of our current suite of powerful techniques for proving F2 correlation
bounds [11, 20, 23, 77], and breaking this “degree-log(n) barrier” would constitute a significant technical
breakthrough3 . See Open Question 1 of Viola’s excellent survey [75] for a detailed discussion of this
   3 Breaking this “degree-log(n) barrier” is also well known (via a simple and beautiful observation by Håstad and Gold-

mann [41]) to be a prerequisite for breaking the notorious “log(n)-party barrier” in multi-party communication complexity [11],


                            T HEORY OF C OMPUTING, Volume 18 (4), 2022, pp. 1–46                                             8
     I MPROVED P SEUDORANDOM G ENERATORS FROM P SEUDORANDOM M ULTI - SWITCHING L EMMAS

important open problem and its relationship to other central challenges in complexity theory.
    As a second application of our pseudorandom
                                           √         multi-switching lemma, we give an ε-PRG for S-
sparse F2 polynomials with seed length 2O( log S) log(1/ε), which is best possible without breaking the
aforementioned “degree-log(n) barrier” for F2 correlation bounds.
                                                                               2
Theorem 2.2 (PRG√for sparse F2 polynomials). For every S = 2ω(log log(n)) and ε > 0 there is a PRG
with seed length 2O( log S) log(1/ε) that ε-fools the class of n-variable S-sparse F2 polynomials.

Background and prior PRGs for F2 polynomials. The first unconditional PRGs for F2 polynomials
were given in early influential work of Luby, Veličković, and Wigderson [54], who constructed a PRG
that ε-fools size-S SYM ◦ AND√ circuits—including S-sparse F2 polynomials as an important special
                           O( log(S/ε))
case—with seed length 2                 . To obtain their PRG, Luby et al. employed the Nisan–Wigderson
framework [58] together with multi-party number-on-the-forehead (NOF) communication complexity
lower bounds from  √ the seminal paper by Babai, Nisan, and Szegedy [11]. Viola [74] subsequently
                O( log(S/ε))
extended this 2              seed length to the broader class of SYM ◦ AC0 circuits with a more modular
proof.
    √ In recent work [67], the authors have improved the seed-length dependence on ε of [54, 74] to
  O( log(S))
2            + polylog(1/ε). We discuss the relation between our techniques and those of [67] in more
detail below.
    In a related line of work, PRGs for low-degree F2 polynomials have also been intensively studied.
Starting with the fundamental results of Naor and Naor [56] on ε-biased distributions (which resolved the
degree-1 case), this research continued through an exciting line of work on the degree k ≥ 2 case [18, 19]
and culminated in the breakthroughs by Lovett [50] and Viola [76] which are described in more detail
below. We note that prior to our work, the underlying techniques used for the sparse case (multi-party
communication complexity) were completely different from the techniques used for the low-degree case
(Fourier analysis).

Our PRG and approach. Theorem 2.2 gives an exponential and optimal improvement of the PRG
of [54] in terms of its dependence on the error parameter ε. Our PRG achieves an optimal hardness-to-
randomness conversion for F2 polynomials: since every F2 polynomial of degree log(n) has at most
nlog(n)
   √ monomials, it can be shown (using the simple Proposition 3.1 of [76]) that a PRG with seed length
2o( log S) log(1/ε) would break the degree-log(n) barrier.
     Our techniques for Theorem 2.2 are substantially different from the techniques of [67, 74]. As
summarized in Table 2, the basic approach of [67], like [74] and [54], is via the Nisan-Wigderson
paradigm using multi-party communication complexity bounds; the main point of departure between
[67] and [74] is that [67] leverages Håstad’s multi-switching lemma from [38] in place of his earlier [37]
switching lemma which was used in [74]. (We note, that, similarly to the situation for AC0 circuits, it is
straightforward to verify that our optimal log(1/ε) dependence is not achievable via the Nisan–Wigderson
framework without dramatic breakthroughs in correlation bounds for F2 polynomials, going well beyond
breaking the degree-log(n) barrier.) In contrast, we do not use the Nisan–Wigderson framework or
multi-party communication complexity lower bounds; instead, as for AC0 , our approach is based on the
[6] framework and our derandomization of the [38] multi-switching lemma. Indeed, our approach to
a longstanding open problem that has resisted attack for over three decades.


                            T HEORY OF C OMPUTING, Volume 18 (4), 2022, pp. 1–46                        9
                                 ROCCO A. S ERVEDIO AND L I -YANG TAN

obtaining Theorem 2.2 bridges the two previously disparate lines of work on pseudorandomness for sparse
and low-degree polynomials: roughly speaking, it can be viewed as a reduction from PRGs for S-sparse
                                               √
polynomials to PRGs for polynomials of degree log S. This allows us to leverage the result of Viola [76]
(building on the work of Lovett [50]), which gives PRGs for n-variable degree-k F2 polynomials with
seed length
                                      O(k log(n) + k2k log(1/ε)).
(We note that the lack of PRGs for SYM ◦ ANDd circuits of comparable strength to [76, 50] — in
particular, the log(1/ε) dependence — for general SYM gates beyond the parity function are a barrier for
extending our PRG for F2 polynomials to general depth-two SYM ◦ AND circuits.) More precisely, at
the heart of our reduction is a new pseudorandom switching lemma for sparse F2 polynomials, showing
that such a polynomial is very likely to collapse to a small-depth decision tree with low-degree F2
polynomials at its leaves under a suitable pseudorandom restriction. This is essentially a special case
of our pseudorandom multi-switching lemma. With this reduction in hand, we then exploit the strength
and generality of Viola’s result—roughly speaking, that the sum of k independent copies of a sufficiently
strong ε-biased distribution fools degree-k polynomials—to show that his PRG extends to fool not only
low-degree polynomials, but also small-depth decision trees with low-degree polynomials at their leaves.
    Table 2 provides a comparison of the seed length of our PRG (and the techniques that underlie our
construction) and those of previous work.

 Reference/
                   Seed length                     Techniques
 Class

 [54] LVW             √                            [58] framework, [11] multi-party NOF communica-
                   2O( log(S/ε))
 S-sparse                                          tion complexity

 [67] ST              √                            [58] framework, [11] multi-party NOF communica-
                   2O( log S) + (log(1/ε))4.01
 S-sparse                                          tion complexity, [38] multi-switching lemma

 [50] Lovett
                   O(2k log(n) + 4k log(1/ε))      Fourier analysis
 degree k

 [76] Viola
                   O(k log(n) + k2k log(1/ε))      Fourier analysis
 degree k

 This paper S         √                            [6] framework, derandomize [38] multi-switching
                   2O( log S) log(1/ε)
 sparse                                            lemma, Fourier analysis, bounded independence

                             Table 2: PRGs for ε-fooling F2 polynomials.


2.3   Organization
Section 2.4 recalls some basic preliminaries from unconditional pseudorandomness. We describe and
contrast the original Håstad switching lemma [37] versus the [38] multi-switching lemma in Section 3.

                      T HEORY OF C OMPUTING, Volume 18 (4), 2022, pp. 1–46                            10
      I MPROVED P SEUDORANDOM G ENERATORS FROM P SEUDORANDOM M ULTI - SWITCHING L EMMAS

Section 3.1 establishes some infrastructure towards derandomizing the [38] switching lemma, and the
actual derandomization is carried out in Section 4, culminating in the proof of Theorem 4.2. Section 5
describes a general framework for constructing pseudorandom generators that is implicit in the work
of Ajtai and Wigderson [6]; a crucial ingredient in this framework for constructing a pseudorandom
generator for a class C is a “pseudorandom simplification lemma” for C . In Section 6 we apply our
derandomized multi-switching lemma from Section 4 to obtain the required pseudorandom simplification
lemmas for AC0 circuits and for sparse F2 polynomials. Finally, Section 7 puts the pieces together and
establishes the PRGs for AC0 and for sparse F2 polynomials that are our main PRG results.

2.4    Preliminaries
All logarithms are in base 2. For r < n, we say that a distribution D over {0, 1}n can be sampled efficiently
with r random bits if (i) D is the uniform distribution over a multiset {z(1) , . . . , z(s) } of strings from
{0, 1}n where s ≤ 2r and (ii) there is a deterministic algorithm GenD which, given as input a uniform
random element of [s], runs in time poly(n, s) and returns a string drawn from D.
     For δ > 0 and a class C of functions from {0, 1}n to {0, 1}, we say that a distribution D over {0, 1}n
δ -fools C with seed length r if (a) D can be sampled efficiently with r random bits via algorithm GenD ,
and (b) for every function f ∈ C , we have
                                                                                
                                    E         f (GenD (z)) −      E           f (x) ≤ δ .
                                  z←[s]                        x←{0,1}n

Equivalently, we say that GenD is a δ -PRG for C with seed length r.
    Two kinds of distributions which are extremely useful in derandomization are δ -biased and k-
wise independent distributions. We say that a distribution D over {0, 1}n is δ -biased if it δ -fools
the class of all 2n parity functions {PARITYS }S⊆[n] , where PARITYS : {0, 1}n → {0, 1} is defined by
PARITYS (x) = ∑i∈S xi mod 2. We say that a distribution D over {0, 1}n is k-wise independent with
parameter p if for every 1 ≤ i1 < · · · < ik ≤ n and every (b1 , . . . , bk ) ∈ {0, 1}k , we have
                                                              k                    k
                       Pr xi1 = b1 and · · · and xik = bk = p∑ j=1 b j · (1 − p)k−∑ j=1 b j ,
                                                        
                      x←D

i. e., every subset of k coordinates is distributed identically to a product distribution with parameter p.
      A restriction ρ of variables x1 , . . . , xn is an element of {0, 1, ∗}n . We write supp(ρ) to denote the set
of coordinates that are fixed to 0 or 1 by ρ. Given a function f (x1 , . . . , xn ) and a restriction ρ, we write
 f  ρ to denote the function obtained by fixing xi to ρ(i) if ρ(i) ∈ {0, 1} and leaving xi unset if ρ(i) = ∗.
For ρ ∈ {0, 1}S with S ( [n], the notation “ f  ρ” refers to the restriction of f obtained by “completing”
ρ to an element of {0, 1, ∗}n by setting ρi = ∗ for all i ∈ [n] \ S. For two restrictions ρ, ρ 0 ∈ {0, 1, ∗}n ,
their composition, denoted ρρ 0 or ρ ◦ ρ 0 , is the restriction (element of {0, 1, ∗}n ) defined by
                                                       
                                                 0         ρi if ρi ∈ {0, 1}
                                          (ρρ )i =
                                                           ρi0 otherwise.

Given a collection F = { f1 , . . . , fM } of functions and a restriction ρ we write F  ρ to denote the family
{ f1  ρ, . . . , fM  ρ}.

                        T HEORY OF C OMPUTING, Volume 18 (4), 2022, pp. 1–46                                   11
                                 ROCCO A. S ERVEDIO AND L I -YANG TAN

     Given an AC0 circuit, we define its size to include the input variables (along with the number of gates
in the circuit). We adopt this convention for notational convenience, since we may then always assume
that the size M of an n-variable circuit is always at least n. (We do not adopt this convention for F2
polynomials: as is standard, we define the sparsity of an F2 polynomial to be the number of monomials
in its support.)
     Finally, if g is a Boolean function and C is a class of circuits, we say that g is computed by a
(t, C )-decision tree if g is computed by a decision tree of depth t (with single Boolean variables xi at
internal nodes as usual) in which each leaf is labeled by a function from C .


3    Multi-switching lemmas
At the heart of almost all applications of Håstad’s original switching lemma [37] is a powerful structural
fact about AC0 circuits: every AC0 circuit “collapses” (i. e., simplifies dramatically) to a depth-t decision
tree with probability at least 1 − ε under a random restriction that randomly fixes a (1 − p)-fraction of
coordinates. In the precise quantitative statement of this fact, both t and p depend on ε: as the desired
failure probability ε tends to 0, the ∗-probability p tends to 0 (more coordinates are fixed) and t tends to
n (the resulting decision tree is of larger depth). It is easy to see that this dependence is inherent given
the statement of the [37] switching lemma, and indeed this will be clear from the discussion later in this
section.
    The recent multi-switching lemma of Håstad [38] (see also [43]) achieves a remarkable strengthening
of the above: essentially the same structural fact about AC0 holds (in terms of the quantitative relation
between the decision tree depth t and the failure probability ε) with the ∗-probability p being independent
of ε. This is the key qualitative difference underlying the optimal AC0 correlation bounds for PARITY
obtained in [43, 38]; likewise, in this work, this is the key qualitative difference underlying the optimal
ε-dependence in the seed lengths of our PRGs for AC0 circuits and sparse F2 polynomials.
    Let R p denote the random restriction which independently sets each variable xi to 0 with probability
(1 − p)/2, to 1 with probability (1 − p)/2, and to ∗ with probability p. We first recall the original
switching lemma from [37]:

Theorem 3.1 (Håstad’s switching lemma). Let F be a k-CNF. Then for all t ≥ 1, we have that

                       Pr [F  ρ does not have a decision tree of depth t ] ≤ (5pk)t .
                     ρ←R p


    In the context of AC0 circuits the switching lemma is used to achieve depth reduction under random
restrictions: we apply Theorem 3.1 separately to each of the bottom-layer depth-2 subcircuits, choosing t
appropriately so that all of them “switch” to depth-t decision trees with high probability. The following
corollary is what is typically used:

Corollary 3.2 (AC0 depth reduction via Theorem 3.1). Let C be a size-M depth-d AC0 circuit with bottom
fan-in k, and let p = 1/(10k). Then for all ε > 0,

        Pr C  ρ is not computed by a depth-(d − 1) circuit with bottom fan-in log(M/ε) ≤ ε.
                                                                                        
      ρ←R p


                       T HEORY OF C OMPUTING, Volume 18 (4), 2022, pp. 1–46                               12
     I MPROVED P SEUDORANDOM G ENERATORS FROM P SEUDORANDOM M ULTI - SWITCHING L EMMAS

Proof. This follows from applying Theorem 3.1 with t = log(M/ε) to each of the bottom-layer depth-2
subcircuits of C (at most M of them), along with the basic fact that a depth-t decision tree can be expressed
as both a t-DNF as well as a t-CNF.
    The same argument is then repeated again on the (k = log(M/ε))-DNFs at the bottom two layers
of the new circuit (applying the dual form of the switching lemma for k-DNFs rather than k-CNFs) to
further reduce the depth to d − 2. However, observe that in this second application of the switching
lemma (and in later applications as well), in order to use Corollary 3.2, the parameter p of the random
restriction must now depend on ε, since we must now take p < 1/(5k) = 1/(5 log(M/ε)) in order to
get a nontrivial bound in Theorem 3.1. This is why standard applications of the [37] switching lemma
(involving d − 1 iterative applications of Corollary 3.2) show that every size-M depth-d AC0 circuit
collapses to depth-(t = log(M/ε)) decision tree with high probability, at least 1 − ε, under a random
restriction with ∗-probability p = Θ(1/ logd−1 (M/ε)). Note that t and p both depend on ε.
    As alluded to above, the recent multi-switching lemma of [38] shows, remarkably, that essentially the
same simplification holds under a random restriction with ∗-probability p = Θ(1/ logd−1 (M)), indepen-
dent of ε. Let us establish some terminology and notation to present these results.
Definition 3.3 (Common partial decision tree). Let F = {F1 , . . . , FM } be a collection of Boolean functions.
We say that a decision tree T is a common `-partial decision tree for F if every Fi ∈ F can be expressed
as T with depth-` decision trees at its leaves. (Equivalently, for every Fi ∈ F and root-to-leaf path π in T ,
we have that Fi  π is computed by a depth-` decision tree.)
    The multi-switching lemma of [38] is as follows:
Theorem 3.4 (Multi-switching lemma, Lemma 3.8 of [38]). Let F = {F1 , . . . , FM } be a collection of
k-CNFs and ` := log(2M). Then for all t ≥ 1,
                Pr [ F  ρ does not have a common `-partial DT of depth t ] ≤ M(24pk)t .
              ρ←R p

    The following corollary should be contrasted with Corollary 3.2:
Corollary 3.5 (AC0 depth reduction via Theorem 3.4; c.f. Corollary 3.2). Let C be a size-M depth-d AC0
circuit with bottom fan-in k, and let p = 1/(48k). Then for all ε > 0,
 Pr C  ρ is not computed by a ((log(M/ε), AC0 (depth d − 1, bottom fan-in log(2M))-decision tree) ≤ ε.
                                                                                                     
ρ←R p

Proof. This follows by applying Theorem 3.4 with F being the bottom-layer depth-2 subcircuits of C
and t = log(M/ε), along with the fact that a depth-` decision tree can be expressed as both a `-DNF and
an `-CNF.
    We highlight a crucial qualitative aspect of Corollary 3.5: while the depth t = log(M/ε) of the
decision tree does depend on ε, the depth-(d −1) AC0 circuits at its leaves have bottom fan-in k = log(2M)
which does not depend on ε. This means that in successive application of Corollary 3.5, the values
of p = 1/(48k) = Θ(1/ log M) will remain independent of ε. This leads to much better quantitative
bounds than can be obtained through repeated applications of Corollary 3.2: d − 1 iterative applications of
Corollary 3.5 imply that every size-M depth-d AC0 circuit collapses to a depth-O(2d log(M/ε)) decision
tree with high probability, at least 1−ε, under a random restriction with ∗-probability p = Θ(1/ logd−1 M).
Note that the overall ∗-probability p is independent of ε.

                        T HEORY OF C OMPUTING, Volume 18 (4), 2022, pp. 1–46                                13
                                   ROCCO A. S ERVEDIO AND L I -YANG TAN

Multi-switching lemmas and sparse F2 polynomials. The qualitative advantage of multi-switching
lemmas—in particular, the crucial role of a common partial decision tree—can also be seen within the
context of F2 polynomials.
    Let P be an S-sparse F2 polynomial. It is an easy observation that P becomes a low-degree polynomial
with high probability when hit with a random restriction: for all ε, p ∈ (0, 1) and k ∈ N,
                                                            
                                                      ε      w k
        Pr [ P  ρ is not a degree-k polynomial ] ≤ + S          p where w = Θ(log(S/ε)).           (3.1)
      ρ←R p                                           2      k
             2


(The proof follows by considering each monomial of P individually and taking a union bound over all S
of them. For a fixed monomial, the probability that more than Ω(log(S/ε)) variables survive a random
restriction from R 1 is at most ε/(2S); next, the probability that at least k variables in a width-w monomial
                   2
survive a random restriction from R p is at most wk pk .) The failure probability of (3.1) can be made at
                                                      

most ε by choosing p and k appropriately, but note that at least one of p (the ∗-probability) or k (the
degree of the resulting polynomial) must depend on ε.
    Using a slight extension of the ideas in the multi-switching lemmas of [38], we can instead bound
the probability that P  ρ becomes a depth-t decision tree with degree-k polynomials at its leaves.
While this provides weaker structural information than the simple observation above (cf. Corollary 3.2
vs. Corollary 3.5 in the context of AC0 ), the crucial win will come from the fact that p and k can both be
taken to be independent of the failure probability ε (and only t will depend on ε).

3.1     Canonical common `-partial decision trees
An important concept in the proof of Theorem 3.4 is that of a canonical common `-partial decision tree
for an ordered collection F of k-CNFs, which we define in this section.
    Given a k-CNF formula F (which we view as an ordered sequence of width-k clauses C1 ∧C2 ∧ · · · ),
we recall the notion of the canonical decision tree for F, denoted CDT(F). This is a decision tree which
computes F and is obtained as follows:

      • If any clause Ci is identically 0, then the tree is the constant 0.

      • If every clause Ci is identically 1, then the tree is the constant 1.

      • Otherwise, let Ci1 be the first clause that is not identically 1, and let κ ∈ [k] be the number of
        variables in Ci1 . The first κ levels of CDT(F) exhaustively query these κ variables. At each of the
        2κ resulting leaves of the tree (each one corresponding to some restriction η ∈ {0, 1}κ fixing those
        κ variables), recursively put down the canonical decision tree CDT(F  η).

We observe that the tree CDT(F) is unique given a fixed ordering C1 ,C2 , . . . of the clauses in F.
    Håstad’s proof of his original switching lemma (Theorem 3.1) actually shows that if F is a k-CNF,
then the canonical decision tree CDT(F  ρ) is shallow w.h.p. over ρ ← R p . This is crucially important
for the arguments of Trevisan and Xue [73], who give a derandomized version of Håstad’s original
switching lemma: they construct a pseudorandom distribution over restrictions to take the place of R p ,
and show that with high probability a restriction drawn from this pseudorandom distribution causes

                         T HEORY OF C OMPUTING, Volume 18 (4), 2022, pp. 1–46                             14
      I MPROVED P SEUDORANDOM G ENERATORS FROM P SEUDORANDOM M ULTI - SWITCHING L EMMAS

a k-CNF to collapse to a small-depth decision tree. Their argument uses the structure of a canonical
decision tree in an essential way.
    Turning to Håstad’s multi-switching lemma [38], we observe that analogous to his original switching
lemma, the proof of Theorem 3.4 given in [38] implicitly establishes a stronger statement: F  ρ has a
small-depth canonical common `-partial decision tree w.h.p. over ρ ← R p (we will define the notion of a
canonical common `-partial decision tree below). In fact, we will use the fact that it actually establishes
an even stronger statement: w.h.p. over ρ ← R p , every canonical common `-partial decision tree for
F  ρ is shallow—as we explain below, there is more than one canonical common `-partial decision tree
for a sequence F of CNFs.
    Let us explain what a canonical common `-partial decision tree for a sequence of CNFs F is. We
will see that there is a set of canonical common `-partial decision trees for a given F rather than just one
tree; and this is the case even though we assume a fixed ordering F1 , F2 , . . . on the elements of F as well
as on the clauses within each CNF. (Observe the contrast with the case of a canonical decision tree for
a single formula F, where we assume a fixed ordering on the clauses of F; in that setting, as explained
above there is a single canonical decision tree CDT(F).)
    We need a preliminary definition to handle a technical issue related to the final segment of paths
through a canonical decision tree.

Definition 3.6 (Full paths in the CDT). Let F = C1 ∧C2 ∧ · · · be a k-CNF and consider the canonical
decision tree CDT(F) for F. Every path η in CDT(F) can be written as the the disjoint union of segments
η = η (1) ◦η (2) ◦· · ·◦η (u) , where for all j ∈ [u], the segment η ( j) is an assignment to the surviving variables
in the restricted clause Ci j  η (1) ◦ · · · ◦ η ( j−1) , and Ci j is the first clause in F  η (1) ◦ · · · ◦ η ( j−1) that is
not identically 1.
     Furthermore, note that for j ∈ [u − 1], the segment η ( j) is in fact an assignment fixing all the surviving
variables in Ci j  η (1) ◦ · · · ◦ η ( j−1) . We say that η is full if this is also the case for the final segment: η is
full if η (u) is an assignment fixing all the surviving variables in Ciu  η (1) ◦ · · · ◦ η (u−1) .

Observation 1. Let F be a k-CNF and suppose depth(CDT(F)) > `. Then there is a full path η of length
|η| ∈ {` + 1, . . . , ` + k} in CDT(F).

    To help minimize confusion, we will reserve “η” for paths or segments of paths in CDTs, and “π” for
paths (or segments of paths) in CCDTs.
    We are now ready to define the set of canonical common `-partial decision trees:

Definition 3.7 (Canonical common `-partial DT). Let F = (F1 , . . . , FM ) be an ordered collection of
k-CNFs. The set of all canonical common `-partial decision trees for F , which we denote CCDT` (F ),
is defined inductively as follows:

   0. If M = 0 (i. e., F is an empty collection of k-CNFs) then CCDT` (F ) contains a single tree, the
      empty tree with no nodes. (Note that otherwise M ≥ 1, so there is some first formula F1 in F .)

   1. If depth(CDT(F1 )) ≤ `, then CCDT` (F ) is simply CCDT` (F 0 ), where F 0 = (F2 , . . . , FM ). (Note
      that in this case, since inductively each tree in CCDT` (F 0 ) is a common `-partial DT for F 0 , each
      such tree is also a common `-partial DT for F .)

                           T HEORY OF C OMPUTING, Volume 18 (4), 2022, pp. 1–46                                             15
                                   ROCCO A. S ERVEDIO AND L I -YANG TAN

    2. Otherwise, since depth(CDT(F1 )) > ` there must be a witnessing full path η of length between
       ` + 1 and ` + k in CDT(F1 ), and there are at most 2`+k such witnessing full paths. Let P be the set
       of all such witnessing full paths. For each path η ∈ P, let Tη be the tree of depth |η| obtained by
       exhaustively querying all the variables in η in the first |η| levels. Recurse at the end of each path
       in Tη : for each path π in Tη , attach a tree T 0 from CCDT` (F  π) at the end of the path. So in this
       case CCDT` (F ) is the set of all trees that can be obtained in this way (across all possible choices
       of η ∈ P and all possible choices of a tree T 0 ∈ CCDT` (F  π) for each path π ∈ Tη ).
We write depth(CCDT` (F )) to denote the maximum depth of any tree in the set CCDT` (F ).
   The following slight variant of Theorem 3.4 can be extracted, with some effort, from a slight
modification of the proof given in [38], which we provide in Appendix A:
Theorem 3.8 (Slight variant of Håstad’s multi-switching lemma. Theorem 3.4). Let F = (F1 , . . . , FM )
be an ordered collection of k-CNFs. Then for all `,t ≥ 1,

                            Pr [ depth(CCDT` (F  ρ)) ≥ t ] ≤ M dt/`e (32pk)t .
                           ρ←R p


A comparison of Theorem 3.4 (Håstad’s multi-switching lemma) and Theorem 3.8 (our variant
of it). We emphasize that the differences are technical in nature, and all the ideas in our proof of
Theorem 3.8 are from [38]. First, we observe that ` is now a free parameter rather than being fixed to
log(2M); this flexibility will be necessary in our PRG construction for sparse F2 polynomials (where we
           √
take ` = Θ( log M)). Second, our notion of a canonical common partial decision tree differs slightly
from the one that is implicit in [38]: in case 2 of Definition 3.7, we query a witnessing full path of length
between ` + 1 and ` + k, whereas [38] queries any witnessing path of length greater than `.


4    A pseudorandom multi-switching lemma
As suggested earlier, the crux of our PRG construction is a derandomization of the multi-switching lemma
(Theorem 3.8): we devise a suitable pseudorandom distribution over random restrictions in place of R p
(the truly random distribution over restrictions) and show that a random restriction ρ drawn from this
pseudorandom distribution satisfies a gurantee similar to the one in Theorem 3.8.
    Our derandomization of Theorem 3.8 is largely influenced by Trevisan and Xue’s [73] ingenious
derandomization of Håstad’s original switching lemma (Theorem 3.1). Roughly speaking, we will
derandomize the multi-switching lemma of Theorem 3.8 by “fooling its proof”: we will show that the proof
of Theorem 3.8 (given in Appendix A, which we again emphasize is only a slight technical modification
of Håstad’s proof of his multi-switching lemma, Theorem 3.4) “cannot δ -distinguish” between truly
random restrictions and pseudorandom restrictions drawn from polylog(n)-wise independent distributions.
Since Theorem 3.8 holds for truly random restrictions, it thus follows that it also holds for pseudorandom
restrictions drawn from polylog(n)-wise independent distributions (up to a δ additive loss in the failure
probability).
    To accomplish this, we exploit the “computational simplicity” of Theorem 3.8’s proof: for a fixed
family F of k-CNF formulas, we will show that there is a small AC0 circuit that takes as input an encoding

                        T HEORY OF C OMPUTING, Volume 18 (4), 2022, pp. 1–46                               16
      I MPROVED P SEUDORANDOM G ENERATORS FROM P SEUDORANDOM M ULTI - SWITCHING L EMMAS

of a restriction ρ, and returns 1 iff ρ is a bad restriction for the desired conclusion of Theorem 3.8,
contributing to its failure probability (i. e., iff depth(CCDT` (F  ρ)) > t). As alluded to in Section 3.1,
this relies on the fact that Theorem 3.8 does not simply bound the depth of the optimal common `-partial
decision tree for F  ρ, but instead the depth of any canonical common `-partial decision tree for F  ρ.
Indeed, this “constructive” aspect of the proof is crucial for our derandomization strategy: it is not at all
clear that there is a small circuit for checking if the optimal common `-partial decision tree for F  ρ has
depth greater than t.
    It will be convenient for us to represent restrictions ρ ∈ {0, 1, ∗}n as bitstrings (ρ, y) ∈ {0, 1}n×q ×
{0, 1}n := {0, 1}Yq , where q ∈ N is a parameter and Yq = (q + 1)n.
Definition 4.1 (Representing restrictions as bitstrings). We associate with each string (ρ, y) ∈ {0, 1}Yq
the restriction ρ(ρ, y) ∈ {0, 1, ∗}n defined as follows:
                                              (
                                                ∗ if ρi,1 = · · · = ρi,q = 1
                                   ρ(ρ, y)i =
                                                yi otherwise.
      The following observation explains the role of q:
Observation 2. Let (ρ, y) be drawn from the uniform distribution over {0, 1}Yq . Then the random
restriction ρ(ρ, y) ∈ {0, 1, ∗}n is distributed according to R p where p = 2−q .
      Our main result in this section is a pseudorandom multi-switching lemma:
Theorem 4.2 (Derandomized version of Theorem 3.8). Let F = (F1 , . . . , FM ) be an ordered list of Q-
clause k-CNFs. Let δ , p ∈ (0, 1) and define q = log(1/p). Let ` ≥ k and t ∈ N, and D be any distribution
over {0, 1}Yq that (δ /(M dt/`e nO(t) ))-fools the class of depth-4 circuits of size M(nO(`) QO(`) 2O(kq) ). Then
                           depth(CCDT` (F  ρ(η, z))) ≥ t ≤ 16t+` M dt/`e (32pk)t + δ .
                                                                
                    Pr
                  (η,z)←D


4.1     Bad restrictions and the structure of witnessing paths
Fix F = (F1 , . . . , FM ). We say that a restriction ρ ∈ {0, 1, ∗}n is bad if
                                          depth(CCDT` (F  ρ)) ≥ t.
Fix ρ to be a bad restriction. Recalling our definition of the set of canonical common partial decision
trees (Definition 3.7), there exists a tree T ∈ CCDT` (F  ρ) and a path Π of length exactly t through T .
Furthermore, we have that
   1. There exist indices 1 ≤ i1 ≤ i2 ≤ · · · ≤ iu ≤ M where u ≤ dt/`e, and
   2. Π = π (1) ◦ · · · ◦ π (u) , where for all j ∈ [u], we have that supp(π ( j) ) = supp(η ( j) ) where η ( j) is a
      path through the canonical decision tree
                                           CDT(Fi j  ρ ◦ π (1) ◦ · · · ◦ π ( j−1) ).
        Furthermore, for every j ∈ [u − 1] we have that η ( j) is a full path of length between ` + 1 and ` + k
        through the CDT, and η (u) is a path of length exactly t − ∑u−1             ( j)
                                                                        j=1 |supp(η )|. (Note that η
                                                                                                      (u) is not

        necessarily a full path.)

                         T HEORY OF C OMPUTING, Volume 18 (4), 2022, pp. 1–46                                     17
                                    ROCCO A. S ERVEDIO AND L I -YANG TAN

(Note that by (2), these subpaths π ( j) of Π are supported on mutually disjoint sets of coordinates.) With
this structure of Π in mind, we make the following definition:
Definition 4.3 (F -traversal). Let F = (F1 , . . . , FM ) be an ordered list of CNFs. An `-segmented F -
traversal of length t is a tuple P = (I , (S1 , . . . , Su ), Π, H) comprising:
   1. An ordered list of indices I = (i1 , . . . , iu ) where 1 ≤ i1 ≤ · · · ≤ iu ≤ M and u ≤ dt/`e,
   2. For each index i j ∈ I , a subset S j ⊆ [n] such that
         (a) These sets are mutually disjoint: S j ∩ S j0 = 0/ for all j 6= j0 .
         (b) For 1 ≤ j ≤ u − 1, each S j has size between ` + 1 and ` + k, and Su has size exactly t −
             ∑u−1         ( j)
               j=1 |supp(η )|.

        (Consequently |S1 ∪ · · · ∪ Su | = t.)
   3. An assignment Π = π (1) ◦ · · · ◦ π (u) to the variables in S1 ∪ · · · ∪ Su , where
                                       π ( j) : {0, 1}S j → {0, 1}           for 1 ≤ j ≤ u.

   4. An assignment H = η (1) ◦ · · · ◦ η (u) to the variables in S1 ∪ · · · ∪ Su , where again
                                       η ( j) : {0, 1}S j → {0, 1}           for 1 ≤ j ≤ u.

    By our discussion above, for any restriction ρ ∈ {0, 1, ∗}n and any tree T ∈ CCDT` (F  ρ), every
path Π of length t through CCDT` (F  ρ) uniquely induces an `-segmented F -traversal P of length t.
We say that P occurs in CCDT` (F  ρ) if it is induced by some path Π of length t through T for some
T ∈ CCDT` (F  ρ).
    Definition 4.3 immediately yields the following:
Proposition 4.4 (Number of F -traversals). Fix an ordered list F = (F1 , . . . , FM ) of k-CNFs, and let
PF ,`,t denote the collection of all `-segmented F -traversals of length t. Then

                                                 |PF ,`,t | ≤ M dt/`e nO(t) .

4.2     A small AC0 circuit for recognizing bad restrictions
We begin by showing that for every F -traversal P = (I , (S1 , . . . , Su ), Π, H), there is a small circuit CP
over {0, 1}Yq that returns 1 on input (ρ, y) ∈ {0, 1}Yq iff P occurs in CCDT` (F  ρ(ρ, y)). Since
      ρ(ρ, y) is bad ⇐⇒ depth(CCDT` (F  ρ(ρ, y))) ≥ t
                     ⇐⇒ ∃ `-segmented F -traversal P of length t occurring in CCDT` (F  ρ(ρ, y)),
by considering
                                         CF ,`,t (ρ, y) :=                CP (ρ, y)
                                                                 _
                                                                                                         (4.1)
                                                              P∈PF ,`,t

we have that
                                       ρ(ρ, y) is bad ⇐⇒ CF ,`,t (ρ, y) = 1.

                          T HEORY OF C OMPUTING, Volume 18 (4), 2022, pp. 1–46                              18
     I MPROVED P SEUDORANDOM G ENERATORS FROM P SEUDORANDOM M ULTI - SWITCHING L EMMAS

Claim 4.5 (Circuit for a single F -traversal). Let P = (I , (S1 , . . . , Su ), Π, H) be an `-segmented F -
traversal of length t. There is a depth-4 AND-OR-AND-OR circuit CP : {0, 1}Yq → {0, 1} of size
M(nO(`) QO(`) 2O(kq) ) such that

                ∀ (ρ, y) ∈ {0, 1}Yq : CP (ρ, y) = 1 ⇐⇒ P occurs in CCDT` (F  ρ(ρ, y))

Proof. Our circuit CP will be the AND of M many depth-3 subcircuits, one for each k-CNF F ∈ F . As
we will explain later, each of these subcircuits is one of two types. We first describe these two types of
“candidate subcircuits,” and then explain precisely which M subcircuits of each type are AND-ed together
to give CP . (Both these types of circuits are implicit in the work of [73].)

   1. First type: Circuits checking that a particular restriction η is a segment of a path in a
      particular CDT. We claim that for any Q-clause k-CNF F 0 = C1 ∧ · · · ∧CQ and restriction      η, there
                                                                                       Q 
      is a depth-3 OR-AND-OR circuit G over {0, 1}Yq with fan-in sequence ( ≤|η|           , Q2O(kq) , O(kq))
      that returns 1 on input (ρ, y) iff η is the initial segment of a path in CDT(F 0  ρ(ρ, y)).
      For each i ∈ [Q], we write Fixedi to denote the set

                                  { j ∈ [n] : j ∈ η −1 ({0, 1}) and x j occurs in Ci }

      of all variables that are fixed by η and occur in Ci , and we write η (i) ∈ {0, 1}Fixedi to denote η
      restricted to the coordinates in Fixedi . We note that η is a path in CDT(F 0  ρ(ρ, y)) if and only if
      there is a set I ⊆ [Q] of indices (indices of the clauses that contribute to η in CDT(F 0  ρ(ρ, y)))
      where |I| ≤ |η|, such that for all i ∈ Q:

                 / I, the clause Ci is satisfied by the restriction ρ ◦ η (<i) , where η (<i) denotes the composi-
        (a) If i ∈
                        0
            tion of η (i ) for all i0 ∈ I such that i0 < i. (This clause does not contribute to CDT(F 0  ρ(ρ, y));
            it is “skipped” in the canonical decision tree construction process);
        (b) Otherwise, if i ∈ I and i < max(I),
             (i) for every j ∈ [n] such that x j occurs in Ci ,
                                                       (
                                                         =∗       if j ∈ Fixedi
                                             ρ(ρ, y) j
                                                         ∈ {0, 1} otherwise;

             (ii) ρ ◦ η (<i) does not satisfy any literal in Ci (and hence this clauses contributes to CDT(F 0 
                  ρ(ρ, y)));
            (iii) ρ ◦ η (i) satisfies Ci .
        (c) Finally, if i = max(I),
             (i) for every j ∈ [n] such that x j occurs in Ci , we have ρ(ρ, y) j = ∗ if j ∈ Fixedi ;
            (ii) ρ ◦ η (<i) does not satisfy any literal in Ci .

      We now argue that for any fixed set I ⊆ [Q], the above conditions can be checked by a Q2O(kq) -
      clause O(kq)-width CNF, from which our overall claim about circuits of the first type follows by

                        T HEORY OF C OMPUTING, Volume 18 (4), 2022, pp. 1–46                                   19
                                     ROCCO A. S ERVEDIO AND L I -YANG TAN

                                    Q 
        taking an OR over all ≤|η|      possibilities for I. All three conditions (a), (b), and (c) depend only on
        the coordinates of ρ(ρ, y) that occur in Ci ; there are at most k such coordinates since Ci has width
        at most k, and hence these conditions depend on at most k(q + 1) coordinates of (ρ, y) ∈ {0, 1}Yq .
        Consequently it is clear that all three conditions can be checked by a 2O(kq) -clause O(kq)-CNF
        over {0, 1}Yq . The CNF is simply the AND of all Q many of these CNFs, one for each clause Ci of
        F 0 , and hence is itself a Q2O(kq) -clause O(kq)-width CNF.

   2. Second type: Circuits checking that a particular CDT has depth at most `. Next, we claim
      that for every Q-clause k-CNF F 0 , there is a depth-3 AND-OR-AND circuit with fan-in sequence
      ((2n)`+1 QO(`) , Q2O(kq) , O(kq)) that returns 1 on input (ρ, y) iff depth(CDT(F 0  ρ(ρ, y))) ≤ `.
        We establish this by showing that there is a depth-3 OR-AND-OR circuit Σ with the claimed fan-in
        sequence that returns 1 on input (ρ, y) if depth(CDT(F 0  ρ(ρ, y))) > `; given such a circuit Σ, the
        desired AND-OR-AND circuit is obtained by negating Σ and using de Morgan’s law. Certainly
        depth(CDT(F 0  ρ(ρ, y))) > ` iff there is a path η of length ` + 1 in CDT(F 0  ρ(ρ, y)). There
        are at most (2n)`+1 many possible paths of length ` + 1 (every path is simply an ordered list
        of literals), and as argued in (1) above, for everysuch path η there is a depth-3 OR-AND-OR
                                                         Q
        circuit over {0, 1}Yq with fan-in sequence ( ≤`+1    , Q2O(kq) , O(kq)) that checks if η is a path in
        CDT(F 0  ρ(ρ, y)). The overall circuit Σ is simply the OR of at most (2n)`+1 such circuits, one
        for each path η.

    With these two types of circuits in hand the overall circuit CP is now easy to describe. CP is the AND
of depth-3 subcircuits, one for each k-CNF F ∈ F :

      • For each of the indices i j ∈ I , a circuit of the first type that checks that η ( j) is a path in CDT(Fi j 
        ρ(ρ, y) ◦ π (1) ◦ · · · ◦ π ( j−1) ) (recall from Definition 4.3 that η ( j) is H restricted to the variables in
        S j ). If i < u, we also check that η ( j) is a full path.

      • For all other indices i ∈ [M] \ I , defining i− = max{ j ∈ [u] : i j < i}, if i− < u we include a
                                                                                                 −
        circuit of the second type that checks that depth(CDT(Fi  ρ(ρ, y) ◦ π (1) ◦ · · · ◦ π (i ) )) ≤ `, where
        i− = max{ j ∈ [u] : i j < i}.

The bound on the size of this overall circuit follows from a union bound over the sizes of the subcircuits
given in (1) and (2) above.

4.3     Putting the pieces together: Proof of Theorem 4.2
Recalling the definition (4.1) of CF ,`,t ,

                                          CF ,`,t (ρ, y) :=               CP (ρ, y),
                                                                 _

                                                              P∈PF ,`,t


Proposition 4.4 giving a bound on its top fan-in, and Claim 4.5 giving a bound on the size of its subcircuits,
we have shown the following:

                           T HEORY OF C OMPUTING, Volume 18 (4), 2022, pp. 1–46                                      20
     I MPROVED P SEUDORANDOM G ENERATORS FROM P SEUDORANDOM M ULTI - SWITCHING L EMMAS

Claim 4.6 (Circuit for recognizing bad restrictions). Let F = (F1 , . . . , FM ) be an ordered list of Q-clause
k-CNFs, and let `,t ≥ 1. There is a depth-5 circuit CF ,`,t over {0, 1}Yq such that



                       CF ,`,t (ρ, y) = 1   ⇐⇒      depth(CCDT` (F  ρ(ρ, y))) ≥ t.



This circuit CF ,`,t is the OR of M d`/te nO(t) many depth-4 circuits of size M(nO(`) QO(`) 2O(kq) ).



    The following observation will be useful for us:



Observation 3. Let F = (F1 , . . . , FM ) be an ordered collection of k-CNFs. For ` ≥ k, the total number of
paths Π such that Π is a path of length exactly t in some tree T ∈ CCDT` (F ) is at most (2`+k ·2`+k )dt/`e ≤
16t+` . Consequently, if (ρ, y) ∈ {0, 1}Yq is such that CF ,`,t (ρ, y) = 1, then CP (ρ, y) = 1 for (at least one)
and at most 16t+` many `-segmented F -traversals P of length t.



Proof. This follows by inspection of the recursive construction of the set CCDT` (F ) of canonical
common `-partial decision trees for F . Each time case (2) of the definition is reached, the set P of
witnessing full paths has size at most 2`+k , and for each path in P there are at most 2`+k possible
assignments to the variables on the path. Finally, there are at most dt/`e levels of recursive calls.




    With Claim 4.6 and Observation 3 in hand, we are now ready to prove our main result of this
section (Theorem 4.2), a derandomized version of the multi-switching lemma (Theorem 3.8). We restate
Theorem 4.2 here for the reader’s convenience:



Theorem 4.7. Let F = (F1 , . . . , FM ) be an ordered list of Q-clause k-CNFs. Let δ , p ∈ (0, 1) and define
q = log(1/p). Let ` ≥ k and t ∈ N, and D be any distribution over {0, 1}Yq that (δ /(M dt/`e nO(t) ))-fools
the class of depth-4 circuits of size M(nO(`) QO(`) 2O(kq) ). Then



                             depth(CCDT` (F  ρ(η, z))) ≥ t ≤ 16t+` M dt/`e (32pk)t + δ .
                                                          
                    Pr
                  (η,z)←D


                         T HEORY OF C OMPUTING, Volume 18 (4), 2022, pp. 1–46                                 21
                                          ROCCO A. S ERVEDIO AND L I -YANG TAN

Proof.
                                                                       
                  Pr  depth(CCDT` (F  ρ(η, z))) ≥ t
             (η,z)←D

                          CF ,`,t (η, z)
                                        
             = E                                                                                       (Claim 4.6)
                (η,z)←D

                                    CP (η, z)
                                             
             ≤ ∑           E                                                                       (union bound)
                  P∈PF ,`,t (η,z)←D
                                                                           
                                                                 δ
              ≤        ∑            E     [ CP (ρ, y) ] +                        (D (δ /(M dt/`e nO(t) ))-fools CP )
                  P∈PF ,`,t     (ρ,y)←U                     M dt/`e nO(t)
                                   "                        #
              ≤δ+           E             ∑    CP (ρ, y)                                        (Proposition 4.4 )
                       (ρ,y)←U P∈P
                                  F ,`,t

              ≤ δ + 16t+`         [ CF ,`,t (ρ, y) ]
                                    E                                                             (Observation 3)
                                  (ρ,y)←U

              = δ + 16t+` Pr
                                                                  
                                    depth(CCDT` (F  ρ(ρ, y))) ≥ t                                     (Claim 4.6)
                          (ρ,y)←U
                      t+`
                                                          
              = δ + 16     Pr depth(CCDT` (F  ρ)) ≥ t                                            (Observation 2)
                                  ρ←R p

              ≤ δ + 16t+` M dt/`e (32pk)t .                                                        (Theorem 3.8)




5 Applying our pseudorandom multi-switching lemma: the Ajtai–Wigderson
  framework for PRG constructions
Implicit in the early work of Ajtai–Wigderson [6] giving the first PRG for AC0 circuits is a powerful,
generic framework for constructing PRGs from “pseudorandom simplification lemmas.” In this section
we give an explicit description of their framework in general terms. Our work shows that this framework
is fairly versatile: both our PRGs, for AC0 circuits and sparse F2 polynomials, are obtained within it
(albeit with specialized pseudorandom simplification lemmas for each class). Variants of these ideas
from [6] are also present in the more recent PRG constructions of [35, 44, 63, 73, 55].

    • Let C be the function class of interest, the class for which we would like to design a PRG. For us
      C will either be the class of size-M depth-d AC0 circuits, or the class of S-sparse F2 polynomials.
      (Our analysis will assume that C is closed under restrictions, which holds for natural function
      classes including our two classes of interest.)
    • Let Csimple be a class of “simple” functions. We will describe the relationship between C and
      Csimple in detail shortly, but we mention here that this approach relies on the simplicity of the
      functions in Csimple enabling PRGs of short seed length. For us, when C is the class of AC0 circuits,
      Csimple will be the class of small-depth decision trees; when C is the class of sparse F2 polynomials,
      Csimple will be the class of small-depth decision trees with low-degree F2 polynomials at its leaves.
      (Note that we do not require that Csimple be a subclass of C .)

                            T HEORY OF C OMPUTING, Volume 18 (4), 2022, pp. 1–46                                 22
     I MPROVED P SEUDORANDOM G ENERATORS FROM P SEUDORANDOM M ULTI - SWITCHING L EMMAS

     At a high level, the plan is to give a randomness-efficient reduction from the task of fooling C to that of
fooling Csimple ; we obtain a pseudorandom distribution D over {0, 1}n that fools C by “pseudorandomly
                                                                                                    0
stitching together” independent copies of a pseudorandom distribution Dsimple over {0, 1}n that fools
Csimple (for some n0 ≤ n). In more detail, the plan is to fool C recursively in stages, where in each stage
we employ two pseudorandom constructs:

   1. A PRG for Csimple , and

   2. A “pseudorandom C -to-Csimple simplification lemma.”
      Roughly speaking, such a simplification lemma says the following: there is a pseudorandom
      distribution R over restrictions such that for all C ∈ C , with high probability over ρ ← R the
      randomly restricted function C  ρ belongs to Csimple . This pseudorandom distribution R over the
      space of restrictions {0, 1, ∗}n should have the following structure:

        (a) The set of “live” positions L ⊆ [n] (i. e., the set of ∗’s) can be sampled with seed length sSL .
            We write L ← Rstars to denote a draw from this pseudorandom distribution over subsets of [n].
        (b) Non-live positions [n] \ L are filled in independently and uniformly with {0, 1}, and do not
            count against the seed length sSL . We write ρ ← {0, 1}[n]\L to denote a draw of such a
            restriction.

      We will require each subset L ∈ supp(Rstars ) to have size at least pn for some not-too-small
      p ∈ (0, 1) (equivalently, we will require R to be supported on restrictions that leave at least a p
      fraction of coordinates unfixed); as we will soon see, this ensures that we “make good progress” in
      each stage.
      The guarantee that we will require of this pseudorandom C -to-Csimple simplification lemma is as
      follows: for every C ∈ C ,
                                                                      
                                                             / Csimple ≤ δSL ,
                                                                     
                                  E         Pr       (C  ρ) ∈                                  (5.1)
                                  L←Rstars   ρ←{0,1}[n]\L


      where the failure probability δSL is as small as possible.

An aside about applying Theorem 4.2 within this framework. The astute reader may have noticed that
our pseudorandom multi-switching lemma (Thereom 4.2) from the previous section is established for a
distribution over restrictions that does not have the structure prescribed above: rather than a pseudorandom
choice of live variables L ⊆ [n] and a fully random choice of bits as values for the non-live variables [n] \ L,
Theorem 4.2 is established for a distribution over restrictions where both choices are pseudorandom.
(Recalling Definition 4.1, we see that η in the statement of Theorem 4.2 corresponds to the choice of
L ⊆ [n], and z to the choice of bits for the coordinates in [n] \ L; in the proof of Theorem 4.2 this pair (η, z)
is sampled from a single pseudorandom distribution over Yq .) However, this suggests that Theorem 4.2
is “stronger than it has to be,” since it is more randomness efficient than necessary for this application.
Indeed, in Proposition 6.2 we formalize this intuition, showing that our proof of Theorem 4.2 also extends
to hold for distributions over restrictions with the prescribed structure.

                        T HEORY OF C OMPUTING, Volume 18 (4), 2022, pp. 1–46                                  23
                                       ROCCO A. S ERVEDIO AND L I -YANG TAN

One stage of the PRG construction. Going back to the general framework, we next describe how the
two pseudorandom constructs described above—a PRG for Csimple and a pseudorandom C -to-Csimple
simplification lemma—are employed together within a single stage of the PRG construction for C .
    For L ⊆ [n] let us write δ (L) to denote the probability Prρ←{0,1}[n]\L [ (C  ρ) ∈/ Csimple ]; by (5.1) we
have that EL←Rstars [δ (L)] ≤ δSL . Fix an L ⊆ [n]. Let Dsimple be a distribution that δPRG -fools Csimple , and
suppose Dsimple can be sampled with sPRG many random bits. A simple but crucial fact from [6] is the
following: the distribution over {0, 1}n where

   1. The coordinates in [n] \ L are filled in with uniform random bits;

   2. The coordinates in L are filled in according to the pseudorandom distribution Dsimple ,

(δ (L) + δPRG )-fools C . That is, for all C ∈ C ,

                                           C(x[n]\L , yL ) = E C(x) ± (δ (L) + δPRG ).
                                                                
                                  E
                               x←U                          x←U
                             y←Dsimple


Taking expectations over L ← Rstars and using (5.1), we get that
                                                                 

                                                  C(x[n]\L , yL )  = E C(x) ± (δSL + δPRG ).
                                                                        
                         E                E                                                               (5.2)
                                 
                                 
                      L←Rstars         x←U                        x←U
                                     y←Dsimple


Consider the distribution Rgentle over the space of restrictions {0, 1, ∗}n defined as follows: to make a draw
π ← Rgentle , first make draws L ← Rstars and y ← Dsimple , and then output the restriction π ∈ {0, 1, ∗}n
where                                    (
                                           y if i ∈ L
                                   πi = i                     for all i ∈ [n].
                                           ∗ otherwise.

In words, π is the restriction that fixes the coordinates in L according to y. With this definition of Rgentle
in hand, we can rewrite (5.2) as
                                                      
                                          E (C  π)(x) = E C(x) ± (δSL + δPRG ).
                                                            
                              E                                                                           (5.3)
                         π←Rgentle        x←U                   x←U


Note that a draw π ← Rgentle can be sampled with sSL + sPRG random bits. (We need sSL random bits to
make a draw L ← Rstars , and sPRG random bits to make a draw y ← Dsimple .)
    We emphasize that the restriction π is supported on L (i. e., π −1 ({0, 1}) = L), rather than [n] \ L.
For this reason we may view Rgentle as being “dual” to the distribution R that yields a C -to-Csimple
simplification lemma: while R is supported on restrictions that leave at least a p fraction of coordinates
unfixed, Rgentle is supported on restrictions that fix at least a p fraction of coordinates. This explains
why, as alluded to above, we require the pseudorandom simplification lemma to be such that every
L ∈ supp(Rstars ) has size at least pn for some not-too-small p ∈ (0, 1).

                        T HEORY OF C OMPUTING, Volume 18 (4), 2022, pp. 1–46                                 24
     I MPROVED P SEUDORANDOM G ENERATORS FROM P SEUDORANDOM M ULTI - SWITCHING L EMMAS

Fooling C recursively: the overall PRG construction and its analysis. We have sketched the
construction of a distribution Rgentle over restrictions in {0, 1, ∗}n that preserves C’s bias up to an error
of (δSL + δPRG ) in the sense of (5.3); furthermore, Rgentle is supported on restrictions that fix at least
a p fraction of coordinates. Since an ε-PRG is simply a distribution over assignments in {0, 1}n that
preserves C’s bias up to an error of ε, we see that we have made a “p-fraction of progress” towards a
PRG, while incurring (δSL + δPRG ) out of the total ε amount of error allowed.
    Our PRG construction will recurse on C  π for all π ∈ supp(Rgentle ), all of which are functions
over at most (1 − p)n variables. (Since C is closed under restrictions, we note that C  π belongs to
C and so we can indeed apply the same argument recursively.) By fixing at least a p fraction of the
remaining coordinates in each stage, we ensure that there are at most p−1 ln n stages in total, after which
n coordinates will have been fixed. Hence, as long as

                                                                         ε
                                           δSL + δPRG ≤                          ,
                                                                      p−1 ln n

i. e., the total error incurred across all stages is at most ε, we will have that the final distribution over
{0, 1}n does indeed ε-fool C.
      As noted above, the seed length required to sample from Rgentle in each stage is sSL + sPRG . Since
there are at most p−1 ln n stages in total, the overall seed length of this PRG construction is

                                           (sSL + sPRG ) · p−1 ln n.

    The following theorem summarizes the upshot of our discussion in this section:

Theorem 5.1 (PRGs from pseudorandom simplification lemmas; implicit in [6]). Let C and Csimple be
two function classes over {0, 1}n , and suppose we have

   1. A δPRG -PRG for Csimple with seed length sPRG (δPRG ) for all δPRG > 0, and

   2. A pseudorandom C -to-Csimple simplification lemma with the following parameters: for all δSL > 0,
      there is a distribution Rstars over subsets of [n] such that

       (a) A draw L ← Rstars can be sampled with sSL (δSL ) random bits.
       (b) Every L ∈ supp(Rstars ) satisfies |L| ≥ pn for some p ∈ (0, 1).
        (c) For all C ∈ C , we have that
                                                                                          
                                                                              / Csimple
                                                                                         
                                       E               Pr             (C  ρ) ∈                ≤ δSL .
                                    L←Rstars       ρ←{0,1}[n]\L


Then for all ε > 0, there is an ε-PRG for C with seed length
                                  εp             ε p 
                                 sSL        + sPRG          · p−1 ln n.
                                     2 ln n         2 ln n

                       T HEORY OF C OMPUTING, Volume 18 (4), 2022, pp. 1–46                               25
                                     ROCCO A. S ERVEDIO AND L I -YANG TAN


6 Pseudorandom simplification lemmas for AC0 circuits and sparse F2
  polynomials
In order to apply Theorem 4.2, we need a PRG that can fool depth-4 circuits (to play the role of D in that
theorem). We recall a very recent result of Harsha and Srinivasan giving the first PRG for fooling AC0
with a seed length whose ε-dependence is log(1/ε); we state this result, specialized to the notation of
Section 4, below.
Theorem 6.1 ([36]). The class of size-S depth-d circuits over {0, 1}Yq is δ -fooled by rHS -wise indepen-
dence where
                              rHS (S, d, δ ) = log3d+O(1) (S) · log(1/δ ).
    We will need an elementary fact that states, roughly speaking, that if D is a distribution that fools
a class F that is closed under restrictions, then the distribution obtained by replacing a subset of its
coordinates with fully random bits also fools F . Specialized to our context, we state this fact as follows:
Proposition 6.2. Let Dr-wise be an rHS -wise independent distribution over {0, 1}Yq where rHS (S, d, δ )
is as defined in Theorem 6.1. Consider the distribution Dmix over {0, 1}Yq where a draw from Dmix is
(η, y) ∈ {0, 1}n×q × {0, 1}n where
   1. (Pseudorandom stars) η is drawn from the marginal distribution of Dr-wise on {0, 1}n×q , and
   2. (Non-stars filled in fully randomly) y is an independent uniform string drawn from {0, 1}n .
Then like Dr-wise , this distribution Dmix also δ -fools the class of size-S depth-d circuits over {0, 1}Yq .
Proof. This follows from the observation that Dmix , like Dr-wise , is rHS -wise independent (or from the
simple argument that gives Fact 9 of [73]).

   We can now state the pseudorandom multi-switching lemma that we will use for both our pseudoran-
dom simplification lemmas (for AC0 circuits and for sparse F2 polynomials):
Lemma 6.3 (Stars chosen pseudorandomly, non-stars filled in fully randomly). Let F = (F1 , . . . , FM ) be
an ordered list of Q-clause k-CNFs. Let ` ≥ k, t ∈ N and δ , p ∈ (0, 1), and define q = log(1/p). There is
a distribution Rstars over subsets of [n] such that the following hold:
   1. A draw L ← Rstars can be sampled with O(r log(n)) random bits, where
                                                                           
                                           O(`) O(`) O(kq)
                                                                   δ
                              r = rHS M n Q 2               , 4, dt/`e O(t)
                                                                M     n
      and rHS (·, ·, ·) is as defined in Theorem 6.1.
   2. Rstars is p-regular: PrL←Rstars i ∈ L = p for all i ∈ [n].
                                            

   3. A multi-switching lemma holds with respect to Rstars :

                               depth(CCDT` (F  ρ)) ≥ t ≤ 16t+` M dt/`e (32pk)t + δ .
                                                            
                        Pr                                                                                (6.1)
                       L←Rstars
                      ρ←{0,1}[n]\L


                        T HEORY OF C OMPUTING, Volume 18 (4), 2022, pp. 1–46                                    26
      I MPROVED P SEUDORANDOM G ENERATORS FROM P SEUDORANDOM M ULTI - SWITCHING L EMMAS

Proof. Let D be an r-wise independent distribution over {0, 1}Yq ; standard constructions [7] show that D
can be sampled with O(r log |Yq |) = O(r log(n)) random bits. The marginal of D on {0, 1}n×q naturally
induces a distribution Rstars over subsets of [n] via Definition 4.1, where a draw L ← Rstars is defined to
be ρ(ρ, z)−1 (∗) (i. e., for all coordinates i ∈ [n], i ∈ L iff ρ i,1 = ρ i,2 = · · · = ρ i,q = 1). Since D is r-wise
independent for r  q, we have that

                                                  ρ i,1 = ρ i,2 = · · · = ρ i,q = 1 = 2−q = p,
                                                                                 
                       Pr i ∈ L = Pr
                     L←Rstars             (ρ,z)←D

which establishes the second claim. The third claim follows by combining Theorem 4.2, Theorem 6.1,
and Proposition 6.2.

6.1    Pseudorandom simplification lemma for AC0 circuits
We will use the following instantiation of Lemma 6.3 in our construction of a PRG for AC0 circuits:

Corollary 6.4. There is a universal constant c > 0 such that the following holds. Let F = (F1 , . . . , FM )
be an ordered list of Q-clause k-CNFs with log M ≥ k and ε0 ∈ (0, 1). There is a distribution Rstars over
subsets of [n] such that:

   1. A draw L ← Rstars can be sampled with s = logc (MQ) log(1/ε0 ) random bits.

   2. Rstars is p-regular for p = Ω(1/k).

   3. A multi-switching lemma holds with respect to Rstars :

                                  depth(CCDTlog M (F  ρ)) ≥ log(2M 5 /ε0 ) ≤ ε0 .
                                                                          
                           Pr
                            L←Rstars
                           ρ←{0,1}[n]\L


Proof. Applying Lemma 6.3 with ` = log M, we see that the failure probability (6.1) can be bounded by

                      16t+` M dt/`e (32pk)t + δ ≤ 16t M 4 M dt/`e (64)−t + δ < M 5 2−t + δ

by choosing p = Ω(1/k). We make this at most ε0 by choosing t = log(2M 5 /ε0 ) and δ = ε0 /2. The
bound on s follows from the d = 3 case of Theorem 6.1 and our setting of parameters, and this completes
the proof.

   Following the standard bottom-up approach to AC0 circuit lower bounds, we compose d − 1 iterative
applications of the pseudorandom multi-switching lemma of Corollary 6.4 to obtain our pseudorandom
simplification lemma for AC0 :

Lemma 6.5 (Pseudorandom simplification lemma for AC0 ). There is a universal constant C > 0 such
that the following holds. Let C be a size-M depth-d Boolean circuit over {0, 1}n (so recall that M ≥ n)
and ε1 ∈ (0, 1). There is a distribution Rstars over subsets of [n] such that

   1. A draw L ← Rstars can be sampled with s = O(2d logC (M) log(1/ε1 )) random bits.

                         T HEORY OF C OMPUTING, Volume 18 (4), 2022, pp. 1–46                                     27
                                            ROCCO A. S ERVEDIO AND L I -YANG TAN

   2. Rstars is p-regular for p = Ω(1/ logd−1 (M)).

   3. The following simplification lemma holds with respect to Rstars :

                               C  ρ is not a decision tree of depth O(2d log(M/ε1 )) ≤ ε1 .
                                                                                    
                      Pr
                              L←Rstars
                             ρ←{0,1}[n]\L


Proof. Fix t := log(2dM 5 /ε1 ).
Preprocessing stage: We begin with a zeroth stage of preprocessing to trim the bottom fan-in of C:
applying Corollary 6.4 with F being the bottom layer gates of C (viewed as depth-2 circuits of size
                                                                                    (0)              (0)
Q ≤ n and bottom fan-in k = 1) and ε0 = ε1 /d, we get that there is a distribution Rstars such that Rstars
can be sampled with s0 := logc (Mn) log(d/ε1 ) random bits (where c is the universal constant from
                 (0)
Corollary 6.4), Rstars is p0 -regular for p0 = Ω(1), and
                                                                                                          ε1
                                 C  ρ (0) is not a (t, AC0 (depth d, bottom fan-in log M))-decision tree ≤ .
                             
              Pr
              (0)
           L←Rstars                                                                                        d
        ρ ←{0,1}[n]\L
         (0)



First stage: Let T (0) be any good outcome of the zeroth stage above, a (t, AC0 (depth d, bottom fan-in
log M))-decision tree. Note that there are at most 2t many AC0 (depth d, bottom fan-in log M) circuits at
the leaves of this depth-t decision tree T (0) , each of size at most M. Fix any such circuit C0 . Applying
Corollary 6.4 to C0 , with F being all its bottom layer depth-2 subcircuits of bottom fan-in log M (so
                                                                      (1)            (1)
Q ≤ M) and ε0 = ε1 /(d2t ), we get that there is a distribution Rstars such that Rstars can be sampled with
                                              (1)
s1 := logc (M 2 ) log(d2t /ε1 ) random bits, Rstars is p1 -regular for p1 = Ω(1/ log M), and
                                                                                                           ε1
                           C0  ρ (1) is not a (2t, AC0 (depth d − 1, bottom fan-in log M))-decision tree ≤ t .
                                                                                                        
         Pr
            (1)
       L←Rstars                                                                                            d2
   ρ (1) ←{0,1}[n]\L


Taking a union bound over all the circuits at the leaves of T (0) (at most 2t of them), we get that
                                                                                                            ε1
                      T (0)  ρ (1) is not a (t + 2t, AC0 (depth d − 1, bottom fan-in log M))-decision tree ≤ .
                  
      Pr
       (1)
    L←Rstars                                                                                                 d
 ρ ←{0,1}[n]\L
  (1)



Let T (1) be any good outcome of the above, and consider any circuit C00 at a leaf of this depth-3t decision
tree. We note a subtlety at this point (this same subtlety is present in applications of the standard switching
lemma): while C00 has at most M gates in total from levels 1 to d − 2 (indeed, its number of gates in those
layers is at most that of C), each of its bottom layer depth-2 subcircuits may have size as large as M 2 .
This is because the M-way AND of depth-(log M) decision trees, when expressed as depth-2 circuit, can
have size as large as M · 2log M = M 2 . (And of course the same is true for the M-way OR.) Therefore
from the second stage onwards, we will always apply Corollary 6.4 with F being a family of M many
M 2 -clause (log M)-CNFs (or DNFs), and so Q = M 2 .

                                   T HEORY OF C OMPUTING, Volume 18 (4), 2022, pp. 1–46                           28
      I MPROVED P SEUDORANDOM G ENERATORS FROM P SEUDORANDOM M ULTI - SWITCHING L EMMAS

The i-th stage: We repeat for d − 2 more stages, where in the i-th stage we consider a good outcome
T (i−1) of the previous stage, a ((2i − 1)t, AC0 (depth d − i + 1, bottom fan-in log M))-decision tree. Fix
any subcircuit C000 of at a leaf of this depth-((2i − 1)t) decision tree T (i−1) . Applying Corollary 6.4 to C000 ,
with F being all its bottom layer depth-2 subcircuits of bottom fan-in log M (as noted above, we take
Q = M 2 ) and
                                                          ε1
                                                 ε0 = (2i −1)t ,
                                                       d2
                                                  (i)               (i)
we get that there is a distribution Dstars such that Rstars can be sampled with

                                           si := logc (M 3 ) log(1/ε0 ) = 2i · O(t logc (M))
                         (i)
random bits, Rstars is pi -regular for pi = Ω(1/ log M), and
                                                                                                                  ε1
                        C000  ρ (i) is not a (2it, AC0 (depth d − i, bottom fan-in log M))-decision tree ≤
                                                                                                        
      Pr                                                                                                           i   .
          (i)
     L←Rstars                                                                                                 d 2 −1)t
                                                                                                                 (2

ρ (i) ←{0,1}[n]\L

(We have used the fact that log(2M 5 /ε0 ) = (2i − 1)t + log(2dM 5 /ε1 ) = 2it.) Taking a union bound over
                                                      i
all the circuits at the leaves of T (i−1) (at most 2(2 −1)t of them), we get that
                                                                                                                   ε1
                    T (i−1)  ρ (i) is not a ((2(i+1) − 1)t, AC0 (depth d − i, bottom fan-in log M))-decision tree ≤ .
                
     Pr
       (i)
    L←Rstars                                                                                                        d
ρ ←{0,1}[n]\L
 (i)


The overall distribution. Composing all d stages described above (including the zeroth preprocessing
stage), we get an overall distribution Rstars where a draw L ← Rstars is simply
                                                                             (i)
                               L = L(0) ∩ L(1) ∩ · · · ∩ L(d−1) ,   L(i) ← Rstars for all 0 ≤ i ≤ d − 1.

This distribution Rstars can be sampled with
                                                 d−1
                                                  ∑ si = O(2d logC (M) log(1/ε1 ))
                                                  i=0

random bits for some constant C > 0, Rstars is p-regular for
                                                        d−1
                                                  p = ∏ pi = (Ω(1/ log(M)))d−1 ,
                                                        i=0

and by a union bound over the d many failure probabilities of ε1 /d from each of the d stages, we have
that indeed
                               C  ρ is not a depth-((2d − 1)t) decision tree ≤ ε1 .
                                                                            
                      Pr
                                 L←Rstars
                                ρ←{0,1}[n]\L

Since (2d − 1)t = O(2d log(M/ε1 )) (using d ≤ M so log(2dM 5 /ε1 ) ≤ log(2M 6 /ε1 )), this completes the
proof.

                                   T HEORY OF C OMPUTING, Volume 18 (4), 2022, pp. 1–46                               29
                                       ROCCO A. S ERVEDIO AND L I -YANG TAN

6.2    Pseudorandom simplification lemma for sparse F2 polynomials
To motivate the parameter settings used in this subsection, we recall the discussion about multi-switching
lemmas and sparse F2 polynomials right before Section 3.1; observe that both the ∗-probability p and the
degree of the F2 polynomials obtained below are independent of the failure probability ε2 .
Lemma 6.6 (Pseudorandom simplification lemma for sparse F2 polynomials). There is a universal
constant C > 0 such that the following holds. Let P be an S-sparse F2 polynomial and ε2 ∈ (0, 1). There
is a distribution Rstars over subsets of [n] such that
   1. A draw L ← Rstars can be sampled with s = logC (Sn) log(1/ε2 ) random bits.
                                             √
   2. Rstars is p-regular for p = 2−O( log S) .
   3. The following simplification lemma holds with respect to Rstars :
                                     p                                 p                 
            Pr      P  ρ is not a O( log S) + log(2/ε2 ), F2 (degree log S) -decision tree ≤ ε2 .
            L←Rstars
           ρ←{0,1}[n]\L

Proof. We observe that an S-sparse F2 polynomial is simply a PAR ◦ AND circuit with S many bottom
layer gates of unbounded fan-in. With this point of view in mind, we apply Lemma 6.3 with F being
     √                          √                           √ size Q ≤ n and bottom fan-in k = 1) and
this family of S many AND gates (viewed as depth-2 circuits of
` = log S. By choosing t = A · log S + log(2/ε2 ), p = 2−B log S , and δ = ε2 /2, we get that the failure
probability (6.1) can be bounded by
                                                                                                      √
       t+` dt/`e          t
                                            √
                                       (A+1) log S+log(2/ε2 )        1+A
                                                                            √
                                                                             log S·log(2/ε2 )     32A· log S+log(2/ε2 )  ε2
      16   S       (32pk) + δ = 16                              ·S         ·2                   · AB B√log S·log(2/ε ) +
                                                                                                 S ·2                  2 2
                                  < ε2 ,
where the inequality holds for a suitable choice of absolute constant values A, B. The bound on s follows
from the d = 4 case of Theorem 6.1 and our setting of parameters, and this completes the proof.


7 PRGs for AC0 and sparse F2 polynomials from pseudorandom simplifi-
  cation lemmas
We will need the following easy fact for both our PRG constructions: we can derive from a p-regular
distribution Rstars satisfying a pseudorandom simplification lemma (in the sense of our main results in
the previous section, Lemmas 6.5 and 6.6) a distribution R0stars supported entirely on sets of size at least
(pn)/2, such that R0stars also satisfies a pseudorandom simplification lemma with only a slightly worse
failure probability. More precisely, and in more generality:
Proposition 7.1 (Condition on having sufficiently many stars). Fix any property Φ : {0, 1, ∗}n → {0, 1}
of restrictions. Let Rstars be a p-regular distribution over subsets of [n] and suppose
                                                              
                                             Pr       Φ(ρ) = 1 ≤ τ
                                               L←Rstars
                                              ρ←{0,1}[n]\L


                              T HEORY OF C OMPUTING, Volume 18 (4), 2022, pp. 1–46                                            30
      I MPROVED P SEUDORANDOM G ENERATORS FROM P SEUDORANDOM M ULTI - SWITCHING L EMMAS

for some τ > 0. Let R0stars be the distribution of L ← Rstars conditioned on L satisfying |L| ≥ (pn)/2. Then
                                                                        2τ
                                                     Pr0        Φ(ρ) = 1 ≤ .
                                                 L←Rstars                  p
                                                ρ←{0,1}[n]\L


Proof. Since Rstars is p-regular we have that EL←Rstars |L| = pn, and so
                                                        

                                                                              h      pn i p
                                           L ∈ supp(R0stars ) =
                                                            
                                 Pr                                     Pr     |L| ≥     ≥ .
                               L←Rstars                              L←Rstars        2    2

Hence

                                                              Φ(ρ) = 1 | L ∈ supp(R0stars )
                                                                                        
             Pr0         Φ(ρ) = 1 =               Pr
          L←Rstars                             L←Rstars
         ρ←{0,1}[n]\L                         ρ←{0,1}[n]\L
                                                                                         1                  2τ
                                          ≤       Pr          Φ(ρ) = 1 ·                           0        ≤    .
                                               L←Rstars                      PrL←Rstars [L ∈ supp(Rstars )]    p
                                              ρ←{0,1}[n]\L



7.1    PRGs for AC0 circuits
Theorem 7.2. For every d ≥ 2, M ≥ n, and ε > 0, there is an ε-PRG for the class C of n-variable size-M
depth-d circuits with seed length 2d logd+O(1) (M) log(1/ε).

Proof. Applying Proposition 7.1 to the pseudorandom simplification lemma of Lemma 6.5, we get that
for all ε1 > 0, there is a distribution R0stars over subsets of [n] such that

   1. A draw L ← R0stars can be sampled with sSL = O(2d logC (M) log(1/ε1 ) random bits, where C > 0
      is the universal constant from Lemma 6.5.

   2. Every L ∈ supp(R0stars ) satisfies |L| ≥ pn where p = Ω(1/ logd−1 (M)).

   3. For all C ∈ C ,
                                                                                              ε1
                                       C  ρ is not a decision tree of depth O(2d log(M/ε1 )) ≤ .
                                   
                         Pr0
                    L←Rstars                                                                   p
                   ρ←{0,1}[n]\L


Setting ε1 = ε p2 /(2 ln n) and taking Csimple to be the class of depth-t decision trees where

                                       t = O(2d log(M/ε1 )) = O(d 2d log(M/ε)),

we get that a draw L ← R0stars can be sampled with

                   sSL = O(2d logC (M) log(1/ε1 )) = O(d2d logC (M) log((log M)/ε))

                          T HEORY OF C OMPUTING, Volume 18 (4), 2022, pp. 1–46                                       31
                                       ROCCO A. S ERVEDIO AND L I -YANG TAN

random bits, and R0stars satisfies
                                                                                        
                                                                                                  εp
                                                                            / Csimple
                                                                                       
                                     E0              Pr             (C  ρ) ∈                ≤
                                  L←Rstars       ρ←{0,1}[n]\L                                    2 ln n

for all C ∈ C . Since Csimple is 0-fooled by any t-wise independent distribution, we get from Theorem 5.1
that there is an ε-PRG for C with seed length

                               O(sSL + t log(n)) · p−1 ln n = 2d logd+O(1) (M) log(1/ε),

and this completes the proof.

7.2   PRGs for sparse F2 polynomials
                                                       2
                                            √ ε > 0 there is an ε-PRG for the class C of n-variable
Theorem 7.3. For every S = 2ω(log log(n)) and
S-sparse F2 polynomials with seed length 2 log S) log(1/ε).
                                          O(


Proof. Applying Proposition 7.1 to the pseudorandom simplification lemma of Lemma 6.6, we get that
for all ε2 > 0, there is a distribution R0stars over subsets of [n] such that
   1. A draw L ← R0stars can be sampled with sSL = logC (Sn) log(1/ε2 ) random bits, where C > 0 is the
      universal constant from Lemma 6.6.
                                                                                   √
   2. Every L ∈ supp(R0stars ) satisfies |L| ≥ pn where p = 2−O( log S) .

   3. For all P ∈ C ,
                                            p                               p                     ε2
             Pr0            P  ρ is not a O( log S) + log(2/ε2 ), F2 (degree log S) -decision tree ≤ .
          L←Rstars                                                                                   p
         ρ←{0,1}[n]\L

                                                                                √
Setting ε2 = ε p2 /(2 ln n) and taking Csimple to be the class of (t, F2 (degree log S))-decision trees where
                                   p                            p
                            t = O( log S) + log(2/ε2 ) = O( log S) + log(1/ε)
                                                                    2
(where the second equality uses S = 2ω(log log(n)) ), we get that a draw L ← R0stars can be sampled with
                                                                     1
                                sSL = logC (Sn) log(1/ε2 ) = O logC+ 2 (Sn) log(1/ε)
                                                                                    

random bits, and R0stars satisfies
                                                                                        
                                                                                                  εp
                                                                            / Csimple
                                                                                       
                                     E               Pr             (P  ρ) ∈                ≤
                                  L←Rstars       ρ←{0,1}[n]\L                                    2 ln n

for all P ∈ C .
    We claim that the class of (t, F2 (degree k))-decision trees can be δ -fooled with seed length

                                   sPRG (δ ) = k · O(t + 2k log(1/δ )) + O(t log(n));

                              T HEORY OF C OMPUTING, Volume 18 (4), 2022, pp. 1–46                        32
     I MPROVED P SEUDORANDOM G ENERATORS FROM P SEUDORANDOM M ULTI - SWITCHING L EMMAS

we defer the proof of this claim to the next subsection (see Lemma 7.8). Recalling our definition of
                      √                           √
Csimple where t = O( log S) + log(1/ε) and k = log S, it follows from this claim that Csimple can be
(ε p/(2 ln n))-fooled with seed length
              √                                 √                    p
  sPRG = 2O( log S) log(1/ε) + O(t log(n)) = 2O( log S) log(1/ε) + O( log S log(n)) + log(1/ε) log(n)
              √
        = 2O( log S) log(1/ε)
                                              2
(where we have again used S = 2ω(log log(n)) ). Now applying Theorem 5.1, we get that there is an ε-PRG
for C with seed length
                                           1                   √                    √
       (sSL + sPRG ) · p−1 ln n = O logC+ 2 (Sn) log(1/ε) + 2O( log S) log(1/ε) · 2O( log S) ln n
                                                          
                                      √
                                = 2O( log S) log(1/ε)
                                                           2
(where the last equality yet again uses S = 2ω(log log(n)) ), and this completes the proof.

7.2.1   Fooling depth-t decision trees with degree-k F2 polynomials at its leaves
We recall the following well-known result of Viola:
                                               1 2      k−1
Theorem 7.4 ([76]). The sum of k independent ( 16 δ           )-biased distributions δ -fools the class of degree-k
F2 polynomials.
    Earlier work of Lovett [50] proved the weaker statement with 2k independent copies instead of k. We
note that Lovett’s result suffices for our purposes.
    We will need a few simple facts about distributions. Recall that a distribution D is a mixture of
component distributions D(1) , . . . , D(`) if there exist non-negative weights w1 , . . . , w` summing to 1 such
that making a draw from D corresponds to first drawing i ∈ [`] with probability wi and then making a
draw from D(i) .
Fact 7.5. Let C be a class of functions and suppose that distributions D(1) , . . . , D(`) each δ -fool C . Then
any mixture D of distributions D(1) , . . . , D(`) also δ -fools C .
    We say that a class C of Boolean functions is closed under shifts if for all f ∈ C and y ∈ {0, 1}n , the
function g(x) := f (x + y) is also in C (where addition is coordinatewise over F2 ). An easy consequence
of Fact 7.5 is the following:
Fact 7.6. Let C be a class of functions, closed under shifts, that is δ -fooled by a distribution D. Let D0
be any other independent distribution. Then the distribution D + D0 , where a draw from D + D0 is x + y
where x ← D and y ← D0 , also δ -fools C .
    Finally we recall the following which is an easy consequence of the definition of a δ -biased distribu-
tion:
Fact 7.7 (Conditioning a δ -biased distribution). Let D be a δ -biased distribution over {0, 1}n . Fix i ∈ [n]
and b ∈ {0, 1}, and let D0 denote the distribution of x ← D conditioned on xi = b. Then the marginal
distribution of D0 on the coordinates in [n] \ {i} is 2δ /(1 − δ ) ≤ 4δ biased.

                        T HEORY OF C OMPUTING, Volume 18 (4), 2022, pp. 1–46                                    33
                                  ROCCO A. S ERVEDIO AND L I -YANG TAN

                                                                                                (1)       (k)
Lemma 7.8 (Fooling decision trees with low-degree polynomials at leaves). Let Dδ 0 -biased , . . . , Dδ 0 -biased
                                                      1 2k−1
be k independent δ 0 -biased distribution where δ 0 = 16 δ   · 4−t . Let Dt-wise be an independent t-wise
independent distribution. Then the sum
                                          (1)                  (k)
                                 D := Dδ 0 -biased + · · · + Dδ 0 -biased + Dt-wise

δ -fools the class of depth-t decision trees with degree-k polynomials at its leaves. Since δ 0 -biased distri-
butions can be generated with seed length O(log(n) + log(1/δ 0 )), and t-wise independent distributions
with seed length O(t log(n)), we get that we can sample from D using

                                     k · O(t + 2k log(1/δ )) + O(t log(n))

random bits.
    The intuition underlying Lemma 7.8 is as follows:

   1. Dt-wise ensures that every branch of the decision tree is taken with the right probability.
                                                      k−1                       k−1
   2. By Fact 7.7, each Dδ 0 -biased remains ( 16
                                                1 2
                                                  δ 4−t ) · 4t = 16
                                                                  1 2
                                                                    δ -biased even when conditioned on
      a length-t branch. By Theorem 7.4, their sum δ -fools the degree-k polynomial at the leaf.

Proof. Let F be computed by a depth-t decision tree T with degree-k polynomials at its leaves. We begin
by noting that every branch π of T is taken with the right probability under a random draw from D:
                                                                                    
                 E F(y) = 1 = ∑ Pr [ y follows π ] · E (F  π)(y) | y follows π
                y←D                 π∈T y←D                          y←D
                                                                                                   
                                  = ∑ Pr [ x follows π ] · E                (F  π)(y) | y follows π ,
                                    π∈T x←U                         y←D
                                                                       (since D is t-wise independent and |π| ≤ t)

so it remains to show that
                                                        
                  E (F  π)(y) | y follows π = E (F  π)(x) ± δ                         for all π ∈ T .
                y←D                                   x←U

Since for all π ∈ T F  π is a degree-k polynomial over the coordinates in [n] \ supp(π), it suffices to
show that D  π, the distribution of y ← D conditioned on y following π, δ -fools the class of degree-k
polynomials over the coordinates in [n] \ supp(π).
    Fix π ∈ T and let S denote supp(π). We will express D  π as a mixture of distributions, and argue
that each component distribution in the mixture δ -fools the class of degree-k polynomials over the
coordinates in [n] \ S. Recall that D is the sum of k + 1 many independent distributions
                                         (1)                   (k)
                                 D = Dδ 0 -biased + · · · + Dδ 0 -biased + Dt-wise ,

and so a draw y = z(1) + · · · + z(k+1) is consistent with π iff
                                               (1)          (k+1)
                                            zS + · · · + zS          = πS .

                        T HEORY OF C OMPUTING, Volume 18 (4), 2022, pp. 1–46                                    34
     I MPROVED P SEUDORANDOM G ENERATORS FROM P SEUDORANDOM M ULTI - SWITCHING L EMMAS

Therefore, D  π is a mixture of component distributions each of which is the sum of k + 1 indepen-
dent distributions. Each component distribution is specified by a (k + 1)-tuple (π (1) , . . . , π (k+1) ) where
supp(π (i) ) = S for all i ∈ [k + 1] and
                                                           M      (i)
                                                                 πS = πS .
                                                       i∈[k+1]


Given such a (k + 1)-tuple (π (1) , . . . , π (k+1) ), the corresponding component distribution is
                                                                                       
                      (1)                            (k)
                     Dδ 0 -biased  π (1) + · · · + Dδ 0 -biased  π (k) + Dt-wise  π (k+1) .            (7.1)

(The values of the mixing weights for the components are By Fact 7.7, the marginal distribution of each
  (i)
Dδ -biased  π (i) on the coordinates in [n] \ S is
                                                                    
                                    0   |π (i) |           1 2k−1 −t           1 2k−1
                                 δ ·4              ≤          δ  4     · 4t =    δ
                                                           16                 16

biased, and hence by Viola’s theorem (Theorem 7.4) their sum
                                                                                  
                                  (1)                            (k)
                                 Dδ 0 -biased  π (1) + · · · + Dδ 0 -biased  π (k)

δ -fools the class of degree-k polynomials over the coordinates in [n] \ S. By Fact 7.6, so does the
distribution in (7.1). By Fact 7.5 the mixture distribution D  π likewise δ -fools the class of degree-k
polynomials over the coordinates in [n] \ S, and the proof is complete.


A     Proof sketch of Theorem 3.8
We sketch a proof of the following:

Theorem A.1. Let F = (F1 , . . . , FM ) be an ordered collection of k-CNFs. Then for all t, ` ∈ N,

                             Pr [ depth(CCDT` (F  ρ) ≥ t ] ≤ M dt/`e (32pk)t .
                            ρ←R p


    Our proof sketch of Theorem A.1 is carried out in the “encoding-decoding” framework of Razborov’s
alternative proof [61] of Håstad’s original switching lemma [37], Theorem 3.1. (For a detailed exposition
of Razborov’s proof technique see [15, 71] and Chapter §14 of [9].) We emphasize that the ideas in
our proof of Theorem A.1 are all from [38], but in our view the encoding–decoding presentation is
more amenable to the derandomization that we ultimately require than the conditioning-based inductive
argument given in [38]. We also note that a similar proof based on the encoding–decoding framework,
achieving a very similar bound, appears in [70, Section 7].

                        T HEORY OF C OMPUTING, Volume 18 (4), 2022, pp. 1–46                                 35
                                   ROCCO A. S ERVEDIO AND L I -YANG TAN

A.1    Bad restrictions and the structure of witnessing paths
Fix F = (F1 , . . . , FM ) and consider the set B ⊆ {0, 1, ∗}n of all bad restrictions ρ, namely the ones such
that
                                          depth(CCDT` (F  ρ)) ≥ t.
Fix any bad restriction ρ ∈ B. Recalling our definition of the set of canonical common partial decision
trees (Definition 3.7), there exists a canonical common `-partial decision tree T ∈ CCDT` (F  ρ) and a
path Π of length exactly t through T . Furthermore, we have that
   1. There exist indices 1 ≤ i1 ≤ i2 ≤ · · · ≤ iu ≤ M where u ≤ dt/`e, and

   2. Π = π (1) ◦ · · · ◦ π (u) , where for all j ∈ [u], we have that supp(π ( j) ) = supp(η ( j) ) where η ( j) is a
      path through the canonical decision tree

                                           CDT(Fi j  ρ ◦ π (1) ◦ · · · ◦ π ( j−1) ).

      Furthermore, for every j ∈ [u − 1] we have that η ( j) is a full path of length between ` + 1 and ` + k
      through the CDT, and η (u) is a path of length exactly t − ∑u−1             ( j)
                                                                      j=1 |supp(η )|. (Note that η
                                                                                                    (u) is not

      necessarily a full path.)
(Note that by (2), these restrictions π ( j) are supported on mutually disjoint sets of coordinates.)

A.2    Encoding bad restrictions ρ
Recalling the statement of Theorem A.1, our goal is to bound Prρ←R p [ ρ ∈ B ], the weight of the set B of
bad restrictions under R p . To do so, we define an encoding of each bad restriction ρ ∈ B as a different
restriction ρ 0 ∈ {0, 1, ∗}n and a small amount (say at most m bits) of “auxiliary information”:

                                       encode : B → {0, 1, ∗}n × {0, 1}m
                                   encode(ρ) = (ρ 0 , auxiliary information)

This encoding should satisfy two key properties. First, it should be uniquely decodable, meaning that one
is always able to recover ρ given ρ 0 and the auxiliary information; equivalently, the function encode(·)
is an injection. Second, ρ 0 should extend ρ by exactly t bits, meaning that supp(ρ) ⊆ supp(ρ 0 ) and
|supp(ρ 0 ) \ supp(ρ)| = t. From this second property we get that
                                      Prρ←R p [ ρ = ρ 0 ]     1− p t
                                                                  
                                                          =           ,
                                       Prρ←R p [ ρ = ρ ]       2p

i. e., that the weight of ρ 0 under R p is larger than that of ρ by a O(p)−t multiplicative factor. It is not
hard to see that together, these two properties imply that the total weight of all bad restrictions with the
same auxiliary information is at most O(p)t . To complete the proof of Theorem A.1, we then bound the
overall weight of B via a union bound over all 2m possible strings of auxiliary information, giving us a
failure probability of
                                                        2p t
                                                           
                                                  m
                                                2 ·            .                                       (A.1)
                                                      1− p

                         T HEORY OF C OMPUTING, Volume 18 (4), 2022, pp. 1–46                                     36
     I MPROVED P SEUDORANDOM G ENERATORS FROM P SEUDORANDOM M ULTI - SWITCHING L EMMAS

    We now describe the encoding in more detail. Given a bad restriction ρ ∈ B, the extension ρ 0 of ρ
will be
                              ρ 0 = ρ ◦ σ (1) ◦ · · · ◦ σ (u) , u ≤ dt/`e                       (A.2)
where σ ( j) is a restriction that is supported on the same coordinates as π ( j) for all j ∈ [u]. (Hence these
restrictions σ ( j) ’s are supported on mutually disjoint sets of coordinates, every σ ( j) has length between
` + 1 and ` + k, except σ (u) which has length t − ∑u−1                ( j)
                                                           j=1 |supp(σ )| and is not necessarily a full path.)
      We now define these restrictions σ ( j) . Recall that supp(π ( j) ) = supp(η ( j) ) where η ( j) is a full path
of length between ` + 1 and ` + k through the canonical decision tree CDT(Fi j  ρ ◦ π (1) ◦ · · · ◦ π ( j−1) ),
a full path witnessing that fact that CDT(Fi j  ρ ◦ π (1) ◦ · · · ◦ π ( j−1) ) > `. That is, η ( j) is a full path
witnessing the fact that ρ ◦ π (1) ◦ · · · ◦ π ( j−1) is a bad restriction for the usual switching lemma, and
π ( j) is an assignment to the variables in supp(η ( j) ). (Once again this is with the possible exception
of the segment π (u) of Π, which has length t − ∑u−1                    ( j)
                                                            j=1 |supp(π )| and is not necessarily a full path.)
Razborov’s encoding–decoding proof of the usual switching lemma defines an encoding of this bad
restriction ρ ◦ π (1) ◦ · · · ◦ π ( j−1) to an extension

                                           ρ ( j) := ρ ◦ π (1) ◦ · · · ◦ π ( j−1) ◦ σ ( j)

where σ ( j) is supported on the same ` coordinates as η ( j) (and hence π ( j) as well). Razborov’s proof
hinges on the fact that given the k-CNF F, this encoding ρ ( j) , and a small amount of auxiliary information,
one is able to recover the bad restriction ρ ◦ π (1) ◦ · · · ◦ π ( j−1) ; that is, one is able to “undo” σ ( j) in ρ ( j) ,
flipping the coordinates in supp(σ ( j) ) from {0, 1} back to ∗. This restriction σ ( j) as defined in Razborov’s
proof is precisely the σ ( j) we will use in our encoding (A.2).
     We summarize the discussion above in the following fact:

Fact A.2 (Main lemma in encoding–decoding proof of the usual switching lemma, notation specialized
to our current context). Let Fi j be a k-CNF, ρ ◦ π (1) ◦ · · · π ( j−1) be a restriction, and η ( j) be a path in
CDT(Fi j  ρ ◦ π (1) ◦ · · · ◦ π ( j−1) ). There is a restriction σ ( j) to the coordinates in supp(η ( j) ) such that
given

   1. The k-CNF Fi j ,

   2. The restriction ρ ( j) = ρ ◦ π (1) ◦ · · · ◦ π ( j−1) ◦ σ ( j) ,

   3. |supp(η ( j) )| · (2 + log k) bits of auxiliary information ι(ρ ◦ π (1) ◦ · · · ◦ π ( j−1) , Fi j , η j ),

a decoder is able to recover the restriction π (1) ◦ · · · ◦ π ( j−1) .
    Furthermore, if η ( j) is a full path in CDT(Fi j  ρ ◦ π (1) ◦ · · · ◦ π ( j−1) ) (recall the definition of a full
path given in Definition 3.6) then given

   1. The k-CNF Fi j ,

   2. Any extension ρ ( j) of the restriction ρ ( j) = ρ ◦ π (1) ◦ · · · ◦ π ( j−1) ◦ σ ( j) ,

   3. |supp(η ( j) )| · (2 + log k) bits of auxiliary information ι(ρ ◦ π (1) ◦ · · · ◦ π ( j−1) , Fi j , η j ),

                           T HEORY OF C OMPUTING, Volume 18 (4), 2022, pp. 1–46                                        37
                                       ROCCO A. S ERVEDIO AND L I -YANG TAN

a decoder is able to “undo” σ ( j) in ρ ( j) , by which we mean that she is able to recover the restriction ρ̄ ( j)
where                                            (
                                        ( j)       ∗      if i ∈ supp(σ ( j) )
                                      ρ̄i =          ( j)
                                                   ρi     otherwise.

A.3    Our auxiliary information
We will provide the decoder with

   1. u log M bits of information specifying the u indices i1 , . . . , iu ∈ [M].

   2. The auxiliary information ι(ρ ◦ π (1) ◦ · · · ◦ π ( j−1) , Fi j , η j ) for all j ∈ [u] (as defined in Fact A.2), a
      total of
                                          u
                                         ∑ |supp(η ( j) )| · (2 + log k) = t · (2 + log k)
                                         j=1

       many bits.

   3. t bits of information specifying the length-t path Π = π (1) ◦ · · · ◦ π (u) through CCDT` (F  ρ).

This is a total of
                                                m := u log M + t log k + 3t
bits of auxiliary information; recalling equation (A.1) and the preceding discussion, to establish Theo-
rem A.1 it remains to argue that the map encode(ρ) = (ρ 0 , auxiliary information) is indeed invertible.

A.4    Decoding
Fix F = (F1 , . . . , FM ) and consider a bad restriction ρ ∈ B, one such that

                                               depth(CCDT` (F  ρ)) ≥ t.

Let Π = π (1) ◦ · · · ◦ π (u) be a path of length t through a canonical common `-partial decision tree
T ∈ CCDT` (F  ρ) that witnesses the badness of ρ. We claim that for all j ∈ [u], given

   1. The family of k-CNFs F ,

   2. The “hybrid” restriction ρ ( j) := ρ ◦ π (1) ◦ · · · ◦ π ( j−1) ◦ σ ( j) ◦ · · · ◦ σ (u) ,

   3. The auxiliary information described in Section A.3,

the decoder can recover the “next” hybrid restriction ρ ( j+1) := ρ ◦π (1) ◦· · · π ( j) ◦σ ( j+1) ◦· · ·◦σ (u) . Before
justifying this claim, we note that from this claim we get that the map

                                       encode(ρ) = (ρ 0 , auxiliary information)                                  (A.3)

is indeed invertible, i. e., that given ρ 0 as defined in (A.2) and the auxiliary information described above,
we can recover ρ (this would complete our proof of Theorem A.1). To see this, we first observe that ρ 0

                           T HEORY OF C OMPUTING, Volume 18 (4), 2022, pp. 1–46                                       38
     I MPROVED P SEUDORANDOM G ENERATORS FROM P SEUDORANDOM M ULTI - SWITCHING L EMMAS

is simply ρ (1) . Applying the claim u times the decoder is able to iteratively recover ρ (2) , . . . , ρ (u+1) =
ρ ◦π (1) ◦· · ·◦π (u) , and having done so she will have identified supp(π (1) ◦· · ·◦π (u) ). With this information
she is then able to recover ρ from ρ (u+1) (simply by flipping the bits in supp(π (1) ◦ · · · ◦ π (u) ) back to ∗’s).
     We now show how the decoder obtains ρ ( j+1) from ρ ( j) for all j ∈ [u]. First, since the auxiliary
information specifies i j ∈ [M] she is able to identify Fi j within F . Next,

    • For j ∈ [u − 1], we recall that η ( j) is a full path in CDT(Fi j  ρ ◦ π (1) ◦ · · · ◦ π ( j−1) ) (whose length
      is between ` + 1 and ` + k and hence is “approximately known” to the decoder), and hence we
      may apply the “Furthermore” part of Fact A.2 to “undo” σ ( j) in ρ ( j) and obtain the restriction
      ρ ◦ π (1) ◦ · · · π ( j−1) ◦ σ ( j+1) ◦ · · · ◦ σ (u) .

    • For j = u, while η (u) is not necessarily a full path in CDT(Fiu  ρ ◦ π (1) ◦ · · · ◦ π (u−1) ) we observe
      that ρ (u) is simply ρ (u) , and hence we may apply the first part of Fact A.2 to obtain the restriction
      ρ ◦ π (1) ◦ · · · ◦ π (u−1) .

    In either case, since our auxiliary information to the decoder specifies the values of π ( j) on supp(σ ( j) ),
the decoder is able to fill in these coordinates accordingly to obtain ρ ( j+1) .


Acknowledgements
We thank Prahladh Harsha and Srikanth Srinivasan for helpful discussions.


References
 [1] S COTT A ARONSON: A counterexample to the generalized Linial–Nisan conjecture. Electron.
     Colloq. Comput. Complexity, TR10-109, 2010. [ECCC] 5

 [2] S COTT A ARONSON: BQP and the polynomial hierarchy. In Proc. 42nd STOC, pp. 141–150. ACM
     Press, 2010. [doi:10.1145/1806689.1806711] 5

 [3] M ANINDRA AGRAWAL , E RIC A LLENDER , RUSSELL I MPAGLIAZZO , T ONIANN P ITASSI , AND
     S TEVEN RUDICH: Reducing the complexity of reductions. Comput. Complexity, 10(2):117–138,
     2001. [doi:10.1007/s00037-001-8191-1] 3

 [4] M IKLÓS A JTAI: Σ11 -formulae on finite structures. Ann. Pure Appl. Logic, 24(1):1–48, 1983.
     [doi:10.1016/0168-0072(83)90038-6] 2, 3, 4, 6, 7, 8

 [5] M IKLÓS A JTAI: Geometric properties of sets defined by constant depth circuits. In Combinatorics,
     Paul Erdős is Eighty, Vol. 1, Bolyai Soc. Math. Studies, pp. 19–31. J. Bolyai Math. Soc., Budapest,
     1993. 3

 [6] M IKLÓS A JTAI AND AVI W IGDERSON: Deterministic simulation of probabilistic constant depth
     circuits. Adv. Comput. Res., 5:199–222, 1989. Preliminary version in FOCS’85. 2, 3, 5, 6, 7, 8, 9,
     10, 11, 22, 24, 25

                         T HEORY OF C OMPUTING, Volume 18 (4), 2022, pp. 1–46                                      39
                                ROCCO A. S ERVEDIO AND L I -YANG TAN

 [7] N OGA A LON , L ÁSZLÓ BABAI , AND A LON I TAI: A fast and simple randomized algorithm for
     the maximal independent set problem. J. Algorithms, 7(4):567–583, 1986. [doi:10.1016/0196-
     6774(86)90019-2] 27
 [8] N OGA A LON , O DED G OLDREICH , J OHAN H ÅSTAD , AND R ENÉ P ERALTA: Simple constructions
     of almost k-wise independent random variables. Random Struct. Algor., 3(3):289–304, 1992.
     Preliminary version in FOCS’90. [doi:10.1002/rsa.3240030308] 8
 [9] S ANJEEV A RORA AND B OAZ BARAK: Computational Complexity: A modern approach. Cam-
     bridge Univ. Press, 2009. [doi:10.1017/CBO9780511804090] 35
[10] L ÁSZLÓ BABAI: Random oracles separate PSPACE from the polynomial-time hierarchy. Inform.
     Process. Lett., 26(1):51–53, 1987. [doi:10.1016/0020-0190(87)90036-6] 4
[11] L ÁSZLÓ BABAI , N OAM N ISAN , AND M ÁRIÓ S ZEGEDY: Multiparty protocols, pseudorandom
     generators for logspace, and time-space trade-offs. J. Comput. System Sci., 45(2):204–232, 1992.
     [doi:10.1016/0022-0000(92)90047-M] 8, 9, 10
[12] M ARSHALL BALL , DANA DACHMAN -S OLED , S IYAO G UO , TAL M ALKIN , AND L I -YANG TAN:
     Non-malleable codes for small-depth circuits. In Proc. 59th FOCS, pp. 826–837. IEEE Comp. Soc.,
     2018. [doi:10.1109/FOCS.2018.00083] 2, 4
[13] L OUAY M. J. BAZZI: Polylogarithmic independence can fool DNF formulas. SIAM J. Comput.,
     38(6):2220–2272, 2009. [doi:10.1137/070691954] 5, 6
[14] PAUL B EAME: Lower bounds for recognizing small cliques on CRCW PRAM’s. Discr. Appl. Math.,
     29(1):3–20, 1990. [doi:10.1016/0166-218X(90)90079-R] 2
[15] PAUL B EAME: A switching lemma primer. Technical Report UW-CSE-95-07-01, Univ. Washington,
     1994. LINK. 35
[16] PAUL B EAME , RUSSELL I MPAGLIAZZO , AND S RIKANTH S RINIVASAN: Approximating AC0 by
     small height decision trees and a deterministic algorithm for #AC0 -SAT. In Proc. 27th IEEE Conf. on
     Comput. Complexity (CCC’12), pp. 117–125. IEEE Comp. Soc., 2012. [doi:10.1109/CCC.2012.40]
     2, 4
[17] M ANUEL B LUM AND S ILVIO M ICALI: How to construct cryptographically strong sequences of
     pseudorandom bits. SIAM J. Comput., 13(4):850–864, 1984. Preliminary version in FOCS’82.
     [doi:10.1137/0213053] 5
[18] A NDREJ B OGDANOV: Pseudorandom generators for low degree polynomials. In Proc. 37th STOC,
     pp. 21–30. ACM Press, 2005. [doi:10.1145/1060590.1060594] 8, 9
[19] A NDREJ B OGDANOV AND E MANUELE V IOLA: Pseudorandom bits for polynomials. SIAM J.
     Comput., 39(6):2464–2486, 2010. [doi:10.1137/070712109] 8, 9
[20] J EAN B OURGAIN: Estimation of certain exponential sums arising in complexity theory. C. R. Math.
     Acad. Sci. Paris, 340(9):627–631, 2005. [doi:10.1016/j.crma.2005.03.008] 8

                      T HEORY OF C OMPUTING, Volume 18 (4), 2022, pp. 1–46                            40
    I MPROVED P SEUDORANDOM G ENERATORS FROM P SEUDORANDOM M ULTI - SWITCHING L EMMAS

[21] M ARK B RAVERMAN: Polylogarithmic independence fools AC0 circuits. J. ACM, 57(5):1–10, 2010.
     [doi:10.1145/1754399.1754401] 5, 7

[22] J IN -Y I C AI: With probability one, a random oracle separates PSPACE from the polynomial-
     time hierarchy. J. Comput. System Sci., 38(1):68–85, 1989. Preliminary version in STOC’86.
     [doi:10.1016/0022-0000(89)90033-0] 3, 4

[23] A RKADEV C HATTOPADHYAY: Discrepancy and the power of bottom fan-in in depth-three cir-
     cuits. In Proc. 48th FOCS, pp. 449–458. IEEE Comp. Soc., 2007. [doi:10.1109/FOCS.2007.30,
     ECCC:TR07-050] 8

[24] E SHAN C HATTOPADHYAY, P OOYA H ATAMI , K AAVE H OSSEINI , AND S HACHAR L OVETT: Pseu-
     dorandom generators from polarizing random walks. Theory of Computing, 15(10):1–26, 2019.
     Preliminary version in CCC’18. [doi:10.4086/toc.2019.v015a010] 5, 6, 7

[25] E SHAN C HATTOPADHYAY, P OOYA H ATAMI , S HACHAR L OVETT, AND AVISHAY TAL: Pseudoran-
     dom generators from the second Fourier level and applications to AC0 with parity gates. In Proc.
     10th Innovations in Theoret. Comp. Sci. conf. (ITCS’19), pp. 22:1–15. Schloss Dagstuhl–Leibniz-
     Zentrum fuer Informatik, 2019. [doi:10.4230/LIPIcs.ITCS.2019.22] 8

[26] E SHAN C HATTOPADHYAY AND X IN L I: Non-malleable codes and extractors for small-depth
     circuits, and affine functions. In Proc. 49th STOC, pp. 1171–1184. ACM Press, 2017.
     [doi:10.1145/3055399.3055483] 2

[27] S HIVA C HAUDHURI AND JAIKUMAR R ADHAKRISHNAN: Deterministic restrictions in circuit
     complexity. In Proc. 28th STOC, pp. 30–36. ACM Press, 1996. [doi:10.1145/237814.237824,
     ECCC:TR96-004] 3

[28] X I C HEN , I GOR C ARBONI O LIVEIRA , ROCCO A. S ERVEDIO , AND L I -YANG TAN: Near-optimal
     small-depth lower bounds for small distance connectivity. In Proc. 48th STOC, pp. 612–625. ACM
     Press, 2016. [doi:10.1145/2897518.2897534] 2

[29] A NINDYA D E , O MID E TESAMI , L UCA T REVISAN , AND M ADHUR T ULSIANI: Improved pseudo-
     random generators for depth 2 circuits. In Proc. 14th Internat. Workshop on Randomization and
     Computation (RANDOM’10), pp. 504–517. Springer, 2010. [doi:10.1007/978-3-642-15369-3_38] 5

[30] S TEFAN D ZIEMBOWSKI , K RZYSZTOF P IETRZAK , AND DANIEL W ICHS: Non-malleable codes. J.
     ACM, 65(4):20:1–32, 2018. [doi:10.1145/3178432] 4

[31] B ILL F EFFERMAN , RONEN S HALTIEL , C HRISTOPHER U MANS , AND E MANUELE V IOLA: On
     beating the hybrid argument. Theory of Computing, 9(26):809–843, 2013. Preliminary version in
     ITCS’12. [doi:10.4086/toc.2013.v009a026, ECCC:TR10-186] 5

[32] M ERRICK F URST, JAMES S AXE , AND M ICHAEL S IPSER: Parity, circuits, and the polynomial-time
     hierarchy. Math. Sys. Theory, 17(1):13–27, 1984. [doi:10.1007/BF01744431] 2, 3

                     T HEORY OF C OMPUTING, Volume 18 (4), 2022, pp. 1–46                         41
                              ROCCO A. S ERVEDIO AND L I -YANG TAN

[33] O DED G OLDREICH AND AVI W IDGERSON: On derandomizing algorithms that err extremely rarely.
     In Proc. 46th STOC, pp. 109–118. ACM Press, 2014. [doi:10.1145/2591796.2591808] 3, 5

[34] PARIKSHIT G OPALAN , R AGHU M EKA , AND O MER R EINGOLD: DNF sparsification and a faster
     deterministic counting algorithm. Comput. Complexity, 22(2):275–310, 2013. [doi:10.1007/s00037-
     013-0068-6] 3, 5

[35] PARIKSHIT G OPALAN , R AGHU M EKA , O MER R EINGOLD , L UCA T REVISAN , AND S ALIL P.
     VADHAN: Better pseudorandom generators from milder pseudorandom restrictions. In Proc. 53rd
     FOCS, pp. 120–129. IEEE Comp. Soc., 2012. [doi:10.1109/FOCS.2012.77, arXiv:1210.0049,
     ECCC:TR12-123] 3, 5, 22

[36] P RAHLADH H ARSHA AND S RIKANTH S RINIVASAN: On polynomial approximations to
     AC0 . Random Struct. Algor., 54(2):289–303, 2019. Preliminary version in RANDOM’16.
     [doi:10.1002/rsa.20786] 5, 6, 7, 8, 26

[37] J OHAN H ÅSTAD: Almost optimal lower bounds for small depth circuits. In Proc. 18th STOC, pp.
     6–20. ACM Press, 1986. [doi:10.1145/12130.12132] 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 35

[38] J OHAN H ÅSTAD: On the correlation of parity and small-depth circuits. SIAM J. Comput.,
     43(5):1699–1708, 2014. [doi:10.1137/120897432, ECCC:TR12-137] 2, 4, 6, 7, 8, 9, 10, 11,
     12, 13, 14, 15, 16, 35

[39] J OHAN H ÅSTAD: An average-case depth hierarchy theorem for higher depths. In Proc. 57th FOCS,
     pp. 79–88. IEEE Comp. Soc., 2016. [doi:10.1109/FOCS.2016.18, ECCC:TR16-041] 2

[40] J OHAN H ÅSTAD: On small-depth Frege proofs for Tseitin for grids. J. ACM, 68(1):1–31, 2020.
     Preliminary version in FOCS’17. [doi:10.1145/3425606, ECCC:TR17-142] 2

[41] J OHAN H ÅSTAD AND M IKAEL G OLDMANN: On the power of small-depth threshold circuits.
     Comput. Complexity, 1(2):113–129, 1991. [doi:10.1007/BF01272517] 8

[42] J OHAN H ÅSTAD , B ENJAMIN ROSSMAN , ROCCO A. S ERVEDIO , AND L I -YANG TAN: An average-
     case depth hierarchy theorem for Boolean circuits. J. ACM, 64(5):1–27, 2017. Preliminary version
     in FOCS’15. [doi:10.1145/3095799, ECCC:TR15-065] 2

[43] RUSSELL I MPAGLIAZZO , W ILLIAM M ATTHEWS , AND R AMAMOHAN PATURI: A satisfiability
     algorithm for AC0 . In Proc. 23rd Ann. ACM–SIAM Symp. on Discrete Algorithms (SODA’12), pp.
     961–972. SIAM, 2012. [doi:10.1137/1.9781611973099.77] 2, 3, 4, 8, 12

[44] RUSSELL I MPAGLIAZZO , R AGHU M EKA , AND DAVID Z UCKERMAN: Pseudorandomness from
     shrinkage. J. ACM, 66(2):11:1–16, 2019. [doi:10.1145/3230630] 4, 22

[45] A DAM K LIVANS: On the derandomization of constant depth circuits. In Proc. 5th Internat.
     Workshop on Randomization and Computation (RANDOM’01), pp. 249–260. Springer, 2001.
     [doi:10.1007/3-540-44666-4_28] 5

                     T HEORY OF C OMPUTING, Volume 18 (4), 2022, pp. 1–46                         42
    I MPROVED P SEUDORANDOM G ENERATORS FROM P SEUDORANDOM M ULTI - SWITCHING L EMMAS

[46] A DAM K LIVANS , H OMIN L EE , AND A NDREW WAN: Mansour’s conjecture is true for random
     DNF formulas. In Proc. 23rd Ann. Conf. on Learning Theory (COLT’10), pp. 368–380. Springer,
     2010. [ECCC:TR10-023] 5

[47] JAN K RAJÍ ČEK , PAVEL P UDLÁK , AND A LAN W OODS: An exponential lower bound to the size of
     bounded depth Frege proofs of the pigeonhole principle. Random Struct. Algor., 7(1):15–39, 1995.
     [doi:10.1002/rsa.3240070103, ECCC:TR94-018] 2

[48] NATHAN L INIAL , Y ISHAY M ANSOUR , AND N OAM N ISAN: Constant depth circuits, Fourier
     transform and learnability. J. ACM, 40(3):607–620, 1993. Preliminary version in FOCS’89.
     [doi:10.1145/174130.174138] 2

[49] NATHAN L INIAL AND N OAM N ISAN: Approximate inclusion-exclusion. Combinatorica,
     10(4):349–365, 1990. Preliminary version in STOC’90. [doi:10.1007/BF02128670] 5, 6

[50] S HACHAR L OVETT: Unconditional pseudorandom generators for low-degree polynomials. Theory
     of Computing, 5(3):69–82, 2009. [doi:10.4086/toc.2009.v005a003] 8, 9, 10, 33

[51] S HACHAR L OVETT AND S RIKANTH S RINIVASAN: Correlation bounds for poly-size AC0 circuits
     with n1−o(1) symmetric gates. In Proc. 15th Internat. Workshop on Randomization and Computation
     (RANDOM’11), pp. 640–651. Springer, 2011. [doi:10.1007/978-3-642-22935-0_54] 5, 8

[52] C HI -J EN L U: Hitting set generators for sparse polynomials over any finite fields. In Proc.
     27th IEEE Conf. on Comput. Complexity (CCC’12), pp. 280–286. IEEE Comp. Soc., 2012.
     [doi:10.1109/CCC.2012.20] 8

[53] M ICHAEL L UBY AND B OBAN V ELI ČKOVI Ć: On deterministic approximation of DNF. Algo-
     rithmica, 16(4–5):415–433, 1996. Preliminary version in STOC’91. [doi:10.1007/BF01940873]
     5

[54] M ICHAEL L UBY, B OBAN V ELI ČKOVI Ć , AND AVI W IGDERSON: Deterministic approximate
     counting of depth-2 circuits. In Proc. 2nd Isr. Symp. Theory Comp. Sys. (ISTCS’93), pp. 18–24.
     IEEE Comp. Soc., 1993. [doi:10.1109/ISTCS.1993.253488] 5, 8, 9, 10

[55] R AGHU M EKA , O MER R EINGOLD , AND AVISHAY TAL: Pseudorandom generators for
     width-3 branching programs.      In Proc. 51st STOC, pp. 626–637. ACM Press, 2019.
     [doi:10.1145/3313276.3316319, arXiv:1806.04256] 22

[56] J OSEPH NAOR AND M ONI NAOR: Small-bias probability spaces: Efficient constructions and
     applications. SIAM J. Comput., 22(4):838–856, 1993. [doi:10.1137/0222053] 8, 9

[57] N OAM N ISAN: Pseudorandom bits for constant depth circuits. Combinatorica, 11(1):63–70, 1991.
     [doi:10.1007/BF01375474] 3, 5, 7

[58] N OAM N ISAN AND AVI W IGDERSON: Hardness vs. randomness. J. Comput. System Sci., 49(2):149–
     167, 1994. [doi:10.1016/S0022-0000(05)80043-1] 5, 7, 9, 10

                     T HEORY OF C OMPUTING, Volume 18 (4), 2022, pp. 1–46                         43
                               ROCCO A. S ERVEDIO AND L I -YANG TAN

[59] T ONIANN P ITASSI , PAUL B EAME , AND RUSSELL I MPAGLIAZZO: Exponential lower bounds for
     the pigeonhole principle. Comput. Complexity, 3(2):97–140, 1993. [doi:10.1007/BF01200117] 2

[60] T ONIANN P ITASSI , B ENJAMIN ROSSMAN , ROCCO A. S ERVEDIO , AND L I -YANG TAN: Poly-
     logarithmic Frege depth lower bounds via an expander switching lemma. In Proc. 48th STOC, pp.
     644–657. ACM Press, 2016. [doi:10.1145/2897518.2897637] 2

[61] A LEXANDER A. R AZBOROV: Bounded arithmetic and lower bounds in Boolean complexity. In
     Feasible Mathematics II, pp. 344–386. Springer, 1995. [doi:10.1007/978-1-4612-2566-9_12] 35

[62] A LEXANDER A. R AZBOROV: A simple proof of Bazzi’s theorem. ACM Trans. Comput. Theory,
     1(1):3:1–5, 2009. [doi:10.1145/1490270.1490273] 5, 6

[63] O MER R EINGOLD , T HOMAS S TEINKE , AND S ALIL VADHAN: Pseudorandomness for regular
     branching programs via Fourier analysis. In Proc. 17th Internat. Workshop on Randomization and
     Computation (RANDOM’13), pp. 655–670. Springer, 2013. [doi:10.1007/978-3-642-40328-6_45]
     22

[64] B ENJAMIN ROSSMAN: On the constant-depth complexity of k-clique. In Proc. 40th STOC, pp.
     721–730. ACM Press, 2008. [doi:10.1145/1374376.1374480] 2

[65] B ENJAMIN ROSSMAN: The average sensitivity of bounded-depth formulas. Comput. Complexity,
     27(2):209–223, 2018. [doi:10.1007/s00037-017-0156-0] 2

[66] ROCCO A. S ERVEDIO AND L I -YANG TAN: What circuit classes can be learned with nontrivial
     savings? In Proc. 8th Innovations in Theoret. Comp. Sci. conf. (ITCS’17). Schloss Dagstuhl–Leibniz-
     Zentrum fuer Informatik, 2017. [doi:10.4230/LIPIcs.ITCS.2017.30] 4

[67] ROCCO A. S ERVEDIO AND L I -YANG TAN: Luby–Veličković–Wigderson revisited: Improved
     correlation bounds and pseudorandom generators for depth-two circuits. In Proc. 22nd Internat.
     Conf. on Randomization and Computation (RANDOM’18), pp. 56:1–20. Schloss Dagstuhl–Leibniz-
     Zentrum fuer Informatik, 2018. [doi:10.4230/LIPIcs.APPROX-RANDOM.2018.56] 9, 10

[68] ROCCO A. S ERVEDIO AND L I -YANG TAN: Improved pseudorandom generators from pseudo-
     random multi-switching lemmas. In Proc. 23rd Internat. Conf. on Randomization and Compu-
     tation (RANDOM’19), pp. 45:1–23. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, 2019.
     [doi:10.4230/LIPIcs.APPROX-RANDOM.2019.45] 1

[69] J IRÍ S ÍMA AND S TANISLAV Z ÁK: A polynomial time construction of a hitting set for read-once
     branching programs of width 3, 2010/2021. [ECCC:TR10-088, arXiv:2101.01151] 5

[70] AVISHAY TAL: Tight bounds on the Fourier spectrum of AC0 . In Proc. 32nd Comput. Com-
     plexity Conf. (CCC’17), pp. 15:1–31. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, 2017.
     [doi:10.4230/LIPIcs.CCC.2017.15] 5, 6, 7, 35

[71] N EIL T HAPEN: Notes on switching lemmas, 2009. Posted on arXiv in 2022. [arXiv:2202.05651]
     35

                      T HEORY OF C OMPUTING, Volume 18 (4), 2022, pp. 1–46                           44
    I MPROVED P SEUDORANDOM G ENERATORS FROM P SEUDORANDOM M ULTI - SWITCHING L EMMAS

[72] L UCA T REVISAN: A note on approximate counting for k-DNF. In Proc. 8th Internat. Workshop on
     Randomization and Computation (RANDOM’04), pp. 417–425. Springer, 2004. [doi:10.1007/978-
     3-540-27821-4_37] 5

[73] L UCA T REVISAN AND T ONGKE X UE: A derandomized switching lemma and an improved
     derandomization of AC0 . In Proc. 28th IEEE Conf. on Comput. Complexity (CCC’13), pp. 242–247.
     IEEE Comp. Soc., 2013. [doi:10.1109/CCC.2013.32, ECCC:TR12-116] 3, 4, 5, 6, 7, 8, 14, 16, 19,
     22, 26

[74] E MANUELE V IOLA: Pseudorandom bits for constant-depth circuits with few arbitrary symmetric
     gates. SIAM J. Comput., 36(5):1387–1403, 2007. [doi:10.1137/050640941] 5, 8, 9

[75] E MANUELE V IOLA: On the power of small-depth computation. Found. Trends Theor. Comp. Sci.,
     5(1):1–72, 2009. [doi:10.1561/0400000033] 8

[76] E MANUELE V IOLA: The sum of d small-bias generators fools polynomials of degree d. Comput.
     Complexity, 18(2):209–217, 2009. [doi:10.1007/s00037-009-0273-5] 8, 9, 10, 33

[77] E MANUELE V IOLA AND AVI W IGDERSON: Norms, XOR lemmas, and lower bounds for polyno-
     mials and protocols. Theory of Computing, 4(7):137–168, 2008. [doi:10.4086/toc.2008.v004a007]
     8

[78] A NDREW YAO: Theory and applications of trapdoor functions. In Proc. 23rd FOCS, pp. 80–91.
     IEEE Comp. Soc., 1982. [doi:10.1109/SFCS.1982.45] 5

[79] A NDREW YAO: Separating the polynomial-time hierarchy by oracles. In Proc. 26th FOCS, pp.
     1–10. IEEE Comp. Soc., 1985. [doi:10.1109/SFCS.1985.49] 2, 3, 4


AUTHORS

     Rocco A. Servedio
     Professor
     Columbia University
     rocco cs columbia edu
     http://www.cs.columbia.edu/~rocco


     Li-Yang Tan
     Assistant Professor
     Stanford University
     liyang cs stanford edu
     http://theory.stanford.edu/~liyang




                     T HEORY OF C OMPUTING, Volume 18 (4), 2022, pp. 1–46                       45
                             ROCCO A. S ERVEDIO AND L I -YANG TAN

ABOUT THE AUTHORS

    ROCCO S ERVEDIO is a professor in the Department of Computer Science at Columbia Uni-
      versity. He graduated from Harvard University in 2001, where his Ph. D. was supervised
      by Les Valiant. He is interested in computational complexity theory, computational
      learning theory, randomness in computing, property testing, and other topics.


    L I -YANG TAN is an assistant professor of computer science at Stanford University. He
         received his Ph. D. in Computer Science from Columbia University in 2014, advised by
         Rocco Servedio. His research is in theoretical computer science, with an emphasis on
         complexity theory.




                   T HEORY OF C OMPUTING, Volume 18 (4), 2022, pp. 1–46                         46