A Stochastic Calculus Approach to the Oracle Separation of $\mathsf{BQP}$ and $\mathsf{PH}$

Authors Xinyu Wu,
                          Theory of Computing, Volume 18 (17), 2022, pp. 1–11
                                       www.theoryofcomputing.org

                                                         NOTE



    A Stochastic Calculus Approach to the
      Oracle Separation of BQP and PH
                                                      Xinyu Wu
                   Received February 7, 2020; Revised June 17, 2022; Published June 23, 2022




       Abstract. Recently, Ran Raz and Avishay Tal proved that in some relativized
       world, BQP is not contained in the polynomial-time hierarchy (STOC’19). It has been
       suggested that some aspects of the proof may be simplified by stochastic calculus.
       In this note, we describe such a simplification.


1     Introduction
A recent landmark result by Ran Raz and Avishay Tal [8] shows that there exists an oracle $A$
such that $\mathsf{BQP}^A \not\subseteq \mathsf{PH}^A$. It has been suggested by several people, including Ryan O’Donnell,
James Lee, and Avishay Tal, that some aspects of the proof may be simplified by stochastic
calculus. We describe such a simplification.
    As Aaronson [1] points out, there is a classical correspondence between the relativized
complexity of PH and the size of bounded-depth, unbounded fan-in Boolean circuits (Furst,
Saxe, Sipser [6]). Using this correspondence, the oracle separation reduces to upper bounds on
the statistical difference between two distributions. Concretely, it suffices to show that there
exists a distribution $\mathcal{D}$ over $\{-1, 1\}^{2N}$ such that

ACM Classification: F.1.3, G.3
AMS Classification: 68Q15, 81P68
Key words and phrases: quantum complexity, stochastic calculus


© 2022 Xinyu Wu
Licensed under a Creative Commons Attribution License (CC-BY). DOI: 10.4086/toc.2022.v018a017

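    1. For any $f : \{-1, 1\}^{2N} \to \{0, 1\}$ computable by a quasipolynomial-size bounded-depth
       circuit,
       $$\left|\mathbb{E}[f(\mathcal{D})] - \mathbb{E}[f(\mathcal{U}_{2N})]\right| \le \frac{\mathrm{polylog}(N)}{\sqrt{N}}, \tag{1}$$
       where $\mathcal{U}_{2N}$ is the uniform distribution over $\{-1, 1\}^{2N}$. The notation $\mathbb{E}[f(\mathcal{D})]$ means
       $\mathbb{E}_{\mathbf{x} \sim \mathcal{D}}[f(\mathbf{x})]$. In fact, a weaker upper bound of $1/(\log N)^{\omega(1)}$ would suffice for the oracle
       separation.
    2. There exists a quantum algorithm $Q$ that queries the input once and runs in $O(\log N)$
       time, such that
       $$\left|\mathbb{E}[Q(\mathcal{D})] - \mathbb{E}[Q(\mathcal{U}_{2N})]\right| \ge \Omega\!\left(\frac{1}{\log N}\right). \tag{2}$$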
For an explanation of why these two items suffice for the separation result, we refer to Raz
and Tal’s paper [8]. In their paper, Raz and Tal use a truncated Gaussian for 𝒟. Moreover,
they take 𝑄 to be the Forrelation query algorithm (first introduced as “Fourier checking” by
Aaronson [1], and further analyzed in [2]). In this note, we will describe a construction of a
related but different distribution 𝒟 based on Brownian motion, which simplifies certain details
of the analysis. The resulting analysis gives the same bounds as Raz and Tal, up to constant
factors.


2     Overview
2.1   Strategy to construct the distribution 𝒟
For convenience, we shall refer to the distribution called $\mathcal{D}$ in the Introduction as $\mathcal{D}'$. We shall
use $\mathcal{D}$ to denote an auxiliary distribution from which we construct $\mathcal{D}'$ (see step 2 below).
   We will first give an overview of the strategy to construct the distribution $\mathcal{D}'$. In this section
we will not formalize stochastic calculus concepts, deferring this to Section 3.
   The construction has two main steps:
  1. Use a stopped Brownian motion to define a distribution $\mathcal{D}$ on $[-1/2, 1/2]^{2N}$.
  2. Round $\mathcal{D}$ to a distribution $\mathcal{D}'$ on $\{-1, 1\}^{2N}$ which has the same expectation on Boolean
      functions. (This step is identical to the one in [8].)
   We will define the distribution 𝒟 by describing how to sample from it.
   Let $\mathbf{x}_t$ be a standard $N$-dimensional Brownian motion, where “standard” means its covariance
matrix is $I_N$, the $N \times N$ identity matrix. Let $\mathbf{y}_t = H_N \mathbf{x}_t$, where $H_N$ is the $N \times N$ normalized
Walsh–Hadamard matrix (the W-H matrix divided by $\sqrt{N}$). Finally, let $\mathbf{B}_t$ be the $2N \times 1$ column
matrix formed by $\mathbf{x}_t$ on top of $\mathbf{y}_t$; for typographical convenience we write this as $\mathbf{B}_t = (\mathbf{x}_t, \mathbf{y}_t)$.

Proposition 2.1. $\mathbf{B}_t$ is a $2N$-dimensional Brownian motion with covariance matrix

$$\Phi := \begin{pmatrix} I_N & H_N \\ H_N & I_N \end{pmatrix}.$$


We defer the proof to Proposition 3.3.
   Let 𝜀 > 0 and consider the random variable 𝜏 defined by

$$\tau := \min\{\varepsilon,\ \text{first exit time of } \mathbf{B}_t \text{ from } [-1/2, 1/2]^{2N}\}. \tag{3}$$

    Let $\mathbf{B}_\tau$ denote the random variable obtained by stopping the Brownian motion $\mathbf{B}_t$ at the
random time $t = \tau$ (for appropriately chosen $\varepsilon$, see the line before Equation (12)). Let $\mathcal{D}$ be the
distribution of the random variable $\mathbf{B}_\tau$.
    Finally, use the method of Raz and Tal to round $\mathcal{D}$ to obtain a distribution $\mathcal{D}'$ on $\{-1, 1\}^{2N}$
such that the following holds for every Boolean function $f : \{-1, 1\}^{2N} \to \{0, 1\}$:

$$\mathop{\mathbb{E}}_{\mathbf{z} \sim \mathcal{D}}[\tilde{f}(\mathbf{z})] = \mathop{\mathbb{E}}_{\mathbf{z}' \sim \mathcal{D}'}[f(\mathbf{z}')], \tag{4}$$

where $\tilde{f}$ denotes the (unique) multilinear polynomial $\tilde{f} : \mathbb{R}^{2N} \to \mathbb{R}$ that extends $f$. We shall
describe the method and prove Equation (4) in Proposition 4.2.
   Therefore, if $\mathcal{D}$ satisfies Equations (1) and (2), then so does $\mathcal{D}'$. Hence, it will suffice to
analyze $\mathcal{D}$ instead of $\mathcal{D}'$.
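
The sampling procedure of step 1 can be sketched numerically. The following is only an illustrative discretization and not part of the construction in this note: the continuous Brownian motion is replaced by a Gaussian random walk on a time grid, and the walk is frozen at the last grid point that is still inside the cube. The names `normalized_hadamard` and `sample_stopped_point` and the parameter `steps` are hypothetical.

```python
import math
import random


def normalized_hadamard(n):
    """Return the n x n Walsh-Hadamard matrix divided by sqrt(n) (n a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    scale = 1.0 / math.sqrt(n)
    # Entry (i, j) of the unnormalized matrix is (-1)^(number of common 1-bits of i and j).
    return [[scale * (-1) ** bin(i & j).count("1") for j in range(n)] for i in range(n)]


def sample_stopped_point(n, eps, steps=400, rng=None):
    """Approximate one sample of B_tau = (x_tau, y_tau) on a grid of `steps` time steps."""
    rng = random.Random() if rng is None else rng
    h = normalized_hadamard(n)
    step_std = math.sqrt(eps / steps)  # standard deviation of one increment of x
    x = [0.0] * n
    point = [0.0] * (2 * n)  # B_0 = 0
    for _ in range(steps):
        x_next = [xi + rng.gauss(0.0, step_std) for xi in x]
        y_next = [sum(h[i][j] * x_next[j] for j in range(n)) for i in range(n)]
        candidate = x_next + y_next
        if max(abs(c) for c in candidate) > 0.5:
            break  # grid approximation of the first exit: keep the last point inside
        x, point = x_next, candidate
    return point
```

Freezing the walk one grid step before the exit keeps every returned point inside $[-1/2, 1/2]^{2N}$, at the cost of a small discretization bias that shrinks as `steps` grows.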

2.2   Sketch of the quantum algorithm
The quantum algorithm $Q$ used in Equation (2) is very simple: since $H_N$ can be implemented
by a quantum circuit of depth $O(\log N)$, computing $\langle \mathbf{x}, H_N \mathbf{y} \rangle$ will only take a single query.
This value will have a large positive expectation when $(\mathbf{x}, \mathbf{y})$ is drawn from $\mathcal{D}$ but will have
expectation 0 when $(\mathbf{x}, \mathbf{y})$ is drawn from $\mathcal{U}_{2N}$. This will be proven in Section 6.

2.3   Comparison with the proof of Raz and Tal
The proof here essentially follows the structure of Raz and Tal’s proof [8], and uses many of the same technical ideas.
The main differences are in the proof of the lower bound against bounded-depth circuits, while
the quantum algorithm stays the same and has only a minor difference in its analysis.
    Our main contribution lies in simplifying some aspects of Sections 5 and 7 from [8]. Because
our $\mathcal{D}$ is defined to be bounded within $[-1, 1]^{2N}$, there is no need to analyze a truncation
function applied to $\mathcal{D}$, as in [8, Claims 5.2 and 5.3]. Theorem 4.4 reproves [8, Theorem 7.4], using
some ideas from [8, Claim 7.2]. [8, Claim 7.3] is replaced with Lemma 4.3, while the analysis of
the random walk in [8, Theorem 7.4] is replaced with an application of Dynkin’s formula. Also
using Dynkin’s formula, we directly prove Theorem 4.4 using bounds on the second derivatives
of the function $f$, instead of relying on Isserlis’s theorem for moments of multivariate Gaussians.


3     Technical preliminaries
We briefly review some probability and stochastic calculus concepts. See for instance [7, Chapters
2.1 and 7] for more details. We shall use the common notation (Ω, ℱ , P) to denote a probability
space, where Ω is the sample space, ℱ is the 𝜎-algebra of measurable sets, and P is the probability


measure. A random variable is an $\mathcal{F}$-measurable function $\mathbf{X} : \Omega \to \mathbb{R}^N$. A random variable $\mathbf{X}$
induces a distribution which is a probability measure on $\mathbb{R}^N$, defined by $\mu_{\mathbf{X}}(B) = \mathbb{P}(\mathbf{X}^{-1}(B))$.

Definition 3.1. A stochastic process is a parametrized collection of random variables $\{\mathbf{X}_t\}_{t \in T}$
defined on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and assuming values in $\mathbb{R}^N$.

    Typically, the parameter space is $T = [0, \infty)$. For each $t$, we have a random variable
$\omega \mapsto \mathbf{X}_t(\omega)$. On the other hand, for a fixed $\omega$ we can consider the function $t \mapsto \mathbf{X}_t(\omega)$, a trajectory
of the process.

Definition 3.2. Let $K$ be a positive semidefinite symmetric $N \times N$ real matrix. An $N$-dimensional
Brownian motion $\{\mathbf{B}_t\}_{t \in [0, \infty)}$ with mean 0 and covariance $K$ is a stochastic process characterized
by the following:
   (i) $\mathbf{B}_0 = 0$ almost surely.
  (ii) For $u, t \ge 0$, the increment $\mathbf{B}_{t+u} - \mathbf{B}_t$ is independent of the past, $\{\mathbf{B}_s\}_{s<t}$.
 (iii) For $u, t \ge 0$, the increment $\mathbf{B}_{t+u} - \mathbf{B}_t$ is distributed as an $N$-dimensional Gaussian with
       mean 0 and covariance matrix $uK$.
  (iv) Almost all trajectories are continuous.
We say that $\mathbf{B}$ is a standard Brownian motion if $K$ is the identity matrix.

   We refer to [7, Section 2.2] for a proof that such a stochastic process exists on some underlying
probability space (Ω, ℱ , P).
   We now make an observation.

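Proposition 3.3 (Restatement of Proposition 2.1). Let $\mathbf{x}_t$ be a standard $N$-dimensional Brownian
motion, and $\mathbf{y}_t = H_N \mathbf{x}_t$. Let $\mathbf{B}_t = (\mathbf{x}_t, \mathbf{y}_t)$. Then $\mathbf{B}_t$ is a $2N$-dimensional Brownian motion with
covariance matrix
$$\Phi := \begin{pmatrix} I_N & H_N \\ H_N & I_N \end{pmatrix}.$$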

Proof. We will check items (i)–(iv) in the definition of Brownian motion. First, we note that
$\mathbf{B}_t = (\mathbf{x}_t, \mathbf{y}_t) = (I_N, H_N)^T \mathbf{x}_t$, and so (i), (ii), and (iv) hold for $\mathbf{B}$ since they hold for $\mathbf{x}$.
    Now we show (iii), that is, for fixed $t, u$, we want to show that the random variable
$\mathbf{B}_{t+u} - \mathbf{B}_t = (\mathbf{x}_{t+u}, \mathbf{y}_{t+u}) - (\mathbf{x}_t, \mathbf{y}_t)$ is distributed as a Gaussian with mean 0 and covariance
$u\Phi$. Using property (iii) of $\mathbf{x}$, we see that $\mathbf{x}_{t+u} - \mathbf{x}_t$ is a Gaussian with mean 0 and covariance
$uI_N$, so $(\mathbf{x}_{t+u}, \mathbf{y}_{t+u}) - (\mathbf{x}_t, \mathbf{y}_t) = (I_N, H_N)^T (\mathbf{x}_{t+u} - \mathbf{x}_t) = \sqrt{u}\,(I_N, H_N)^T G_N$, where $G_N$ is a standard
$N$-dimensional Gaussian. Using the fact that for an $n \times d$ matrix $A$, the random variable $A G_d$
is an $n$-dimensional Gaussian with mean 0 and covariance $A A^T$, we obtain that $\mathbf{B}_{t+u} - \mathbf{B}_t$ is a
Gaussian with mean 0 and covariance $u\,(I_N, H_N)^T (I_N, H_N) = u\Phi$, where the last equality uses that
$H_N$ is symmetric and $H_N^2 = I_N$. □
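
The matrix identity behind the last step can be checked directly for small $N$. The sketch below is an illustration only, with hypothetical helper names; it builds the normalized Walsh–Hadamard matrix from the Sylvester sign pattern (an assumption about the ordering of rows, which does not affect the identity) and computes $A A^T$ for $A = (I_N, H_N)^T$.

```python
import math


def normalized_hadamard(n):
    """Return the n x n Walsh-Hadamard matrix divided by sqrt(n) (n a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    scale = 1.0 / math.sqrt(n)
    return [[scale * (-1) ** bin(i & j).count("1") for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a):
    return [list(row) for row in zip(*a)]


def covariance_of_stacked_process(n):
    """Return A A^T for the 2n x n matrix A formed by I_n stacked on top of H_n."""
    h = normalized_hadamard(n)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    a = identity + h  # list concatenation stacks the rows of H_n under those of I_n
    return matmul(a, transpose(a))
```

For $N = 4$ the result has identity blocks on the diagonal and $H_4$ in both off-diagonal blocks, matching $\Phi$.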

    We now define stopping times.

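Definition 3.4. Let $\mathbf{X} = \{\mathbf{X}_t\}_{t \in [0, \infty)}$ be a stochastic process on $(\Omega, \mathcal{F}, \mathbb{P})$. A random variable
$\tau : \Omega \to [0, \infty)$ is a stopping time for $\mathbf{X}$ if for any $t \in [0, \infty)$, the event $\{\tau \le t\}$ is independent of
$\{\mathbf{X}_s\}_{s>t}$. The stopped stochastic process $\mathbf{X}_\tau$ is a random variable defined via $\mathbf{X}_\tau(\omega) := \mathbf{X}_{\tau(\omega)}(\omega)$.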


   In particular, stopping times may be applied to a Brownian motion to produce a stopped
Brownian motion. For example, any constant $\tau_0 := t_0$ is a stopping time; the stopped Brownian
motion has distribution $\mathbf{B}_{t_0}$. The first time that $\mathbf{B}_t$ exits the cube $[-1, 1]^N$,
$\tau_1 := \inf\{t \ge 0 \mid \mathbf{B}_t \notin [-1, 1]^N\}$, is also a stopping time. The minimum or maximum of any two stopping times is a
stopping time. In particular, $\tau$ in Equation (3) is a stopping time.
   We will need to use the following fact about Brownian motion.

Proposition 3.5. Let $\mathbf{B}_t$ be an $N$-dimensional standard Brownian motion, and $\tau$ be a bounded stopping
time. Then, $\mathbb{E}[\|\mathbf{B}_\tau\|^2] = \mathbb{E}[\tau]$.

Proof. This follows from a few well-known facts about Brownian motion. First, $\|\mathbf{B}_t\|^2 - t$
is a martingale [9, Proposition II.1.2(ii)]. Given a bounded stopping time $\tau$,
$\mathbb{E}[\|\mathbf{B}_\tau\|^2 - \tau] = \mathbb{E}[\|\mathbf{B}_0\|^2] = 0$ [9, Proposition II.1.4], and so $\mathbb{E}[\|\mathbf{B}_\tau\|^2] = \mathbb{E}[\tau]$. □
    The main stochastic calculus tool we will use is Dynkin’s formula, which, for a function
$f : \mathbb{R}^N \to \mathbb{R}$ and a bounded stopping time $\tau$, relates $\mathbb{E}[f(\mathbf{B}_\tau)]$ to the second partial derivatives of $f$.

Theorem 3.6 (Dynkin’s formula, [7, Theorem 7.4.1]). Let $\mathbf{B}$ be an $N$-dimensional Brownian motion
with mean 0 and covariance matrix $K$, let $\tau$ be a bounded stopping time, and let $f : \mathbb{R}^N \to \mathbb{R}$ be a twice
continuously differentiable function. The following holds:

$$\mathbb{E}[f(\mathbf{B}_\tau)] = f(0) + \mathbb{E}\left[\int_0^\tau \frac{1}{2} \sum_{i,j \in [N]} K_{ij}\, (\partial_{ij} f)(\mathbf{B}_s)\, ds\right], \tag{5}$$

where $\partial_{ij} = \frac{\partial^2}{\partial x_i\, \partial x_j}$.


    We will also require the following tail bound on Brownian motion.

Proposition 3.7 ([9, Proposition II.1.8]). Let $B$ be a standard 1-dimensional Brownian motion. For
$a, t > 0$,
$$\Pr\left[\sup_{0 \le s \le t} |B_s| \ge at\right] \le 2e^{-a^2 t/2}.$$


4   Reduction to a Fourier bound
The main technical part of Raz and Tal’s proof [8] shows that, for a Boolean function
$f : \{-1, 1\}^{2N} \to \{-1, 1\}$ computable by a bounded-depth, quasipolynomial-size circuit, and a
multivariate Gaussian distribution $\mathcal{Z}$ over $\mathbb{R}^{2N}$,

$$\left|\mathbb{E}[f(\mathrm{trnc}(\mathcal{Z}))] - \mathbb{E}[f(\mathcal{U}_{2N})]\right| \le O(\gamma \cdot \mathrm{polylog}(N)), \tag{6}$$

where $\gamma$ is a bound on the (pairwise) covariances of the coordinates of $\mathcal{Z}$, $\mathrm{trnc}$ truncates $\mathcal{Z}$
so that the resulting random variable is within $[-1, 1]^{2N}$, and $\mathcal{U}_{2N}$ is the uniform distribution
over $\{-1, 1\}^{2N}$. This is based on the $k = 2$ case of Tal’s fundamental result [10] that gives a


$\mathrm{polylog}(m)$ upper bound on the level-$k$ Fourier coefficients of Boolean functions computable
by a Boolean circuit of bounded depth and size $m$. We state Tal’s exact bound as Theorem 5.1
below.
    Another natural way of viewing a multivariate Gaussian distribution is as the result of an
𝑁-dimensional Brownian motion stopped at a fixed time. We can also build the truncation into
the stopping time. This allows us to use tools from stochastic calculus to analyze the distribution.
    We first recall the definition of restrictions of Boolean functions.

Definition 4.1 (restriction). Let $f : \{-1, 1\}^N \to \mathbb{R}$ and let $\rho \in \{-1, 1, *\}^N$. Let $\mathrm{free}(\rho)$ be the set
of coordinates with $*$’s. We define the restriction of $f$ by $\rho$ as $f_\rho : \{-1, 1\}^N \to \mathbb{R}$, where $f_\rho(x)$ is
$f$ evaluated at $\rho$ with the coordinates of $x$ replacing¹ the $*$’s in $\rho$.

  Henceforth, we also identify functions on a Boolean domain, $f : \{-1, 1\}^N \to \mathbb{R}$, with their
multilinear polynomial representations (or Fourier expansions)
$$f(x) = \sum_{S \subseteq [N]} \hat{f}(S) \prod_{i \in S} x_i. \tag{7}$$


    The following result has been extracted from the proof of Equation (2) in [8, Sec. 5].

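Proposition 4.2. Let $\mathcal{D}$ be a distribution on $[-1, 1]^N$. Let $\mathbf{z}' \sim \mathcal{D}'$ be sampled by first drawing $\mathbf{z} \sim \mathcal{D}$
and then, independently for each $i \in [N]$, setting $\mathbf{z}'_i = 1$ with probability $(1 + \mathbf{z}_i)/2$ and $\mathbf{z}'_i = -1$ with
probability $(1 - \mathbf{z}_i)/2$. For any function $f : \{-1, 1\}^N \to \mathbb{R}$, after identifying $f$ with its multilinear
polynomial representation, we have

$$\mathop{\mathbb{E}}_{\mathbf{z} \sim \mathcal{D}}[f(\mathbf{z})] = \mathop{\mathbb{E}}_{\mathbf{z}' \sim \mathcal{D}'}[f(\mathbf{z}')].$$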

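Proof. The proof follows from the Fourier expansion. First, fix $z \in [-1, 1]^N$, and then draw
$\mathbf{z}' \in \{-1, 1\}^N$ using the procedure above. The coordinates of $\mathbf{z}'$ are independent given $z$, and
$\mathbb{E}[\mathbf{z}'_i \mid z] = z_i$, so

$$\mathbb{E}[f(\mathbf{z}') \mid z] = \mathbb{E}\left[\sum_{S \subseteq [N]} \hat{f}(S) \prod_{i \in S} \mathbf{z}'_i \,\middle|\, z\right] = \sum_{S \subseteq [N]} \hat{f}(S) \prod_{i \in S} \mathbb{E}[\mathbf{z}'_i \mid z] = f(z).$$

Taking the expectation of $z$ over $\mathcal{D}$, we infer $\mathbb{E}_{\mathbf{z} \sim \mathcal{D}}[f(\mathbf{z})] = \mathbb{E}_{\mathbf{z}' \sim \mathcal{D}'}[f(\mathbf{z}')]$. □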
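
The conditional identity in this proof can be verified exactly for a small example by enumerating all outcomes of the rounding. The sketch below is an illustration under stated assumptions: a multilinear polynomial is represented as a dictionary from index sets to coefficients, and the function names are hypothetical.

```python
import itertools
import math


def multilinear_eval(coeffs, point):
    """Evaluate sum_S coeffs[S] * prod_{i in S} point[i]; each S is a frozenset of indices."""
    return sum(c * math.prod(point[i] for i in s) for s, c in coeffs.items())


def rounded_expectation(coeffs, z):
    """Exact E[f(z')] when z'_i = +1 with probability (1 + z_i)/2, independently per coordinate."""
    total = 0.0
    for signs in itertools.product((-1, 1), repeat=len(z)):
        # Probability of this sign pattern: (1 + z_i)/2 for +1 and (1 - z_i)/2 for -1.
        prob = math.prod((1 + s * zi) / 2 for s, zi in zip(signs, z))
        total += prob * multilinear_eval(coeffs, signs)
    return total
```

For any fixed $z \in [-1, 1]^N$, `rounded_expectation(coeffs, z)` agrees with `multilinear_eval(coeffs, z)` up to floating-point error, which is the statement $\mathbb{E}[f(\mathbf{z}') \mid z] = f(z)$.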

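    We make some observations about Fourier coefficients. First, the Fourier coefficients of $f_\rho$
satisfy $\widehat{f_\rho}(S) = 0$ for all $S \not\subseteq \mathrm{free}(\rho)$. We also have that

$$\hat{f}(S) = (\partial_S f)(0), \tag{8}$$

where $\partial_S = \prod_{i \in S} \partial_i$ and $\partial_i = \frac{\partial}{\partial x_i}$ is the partial derivative.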

   ¹ Although the domain of $f_\rho$ is $\{-1, 1\}^N$, it only depends on the coordinates in $\mathrm{free}(\rho)$.



    Further, because $f$ is multilinear, for any $h \in \mathbb{R} \setminus \{0\}$ and any standard basis vector $e_i$ we
have
$$(\partial_i f)(x) = \frac{f(x + h e_i) - f(x)}{h}. \tag{9}$$
    The following lemma is similar to [5, Claim A.5], which first appeared in [3] and [4, Claim
3.3].

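Lemma 4.3. Let $f : \mathbb{R}^N \to \mathbb{R}$ be a multilinear polynomial. For any $x \in [-1/2, 1/2]^N$, there exists a
distribution $\mathcal{R}_x$ over restrictions $\rho \in \{-1, 1, *\}^N$, such that for any $i, j \in [N]$,

$$(\partial_{ij} f)(x) = 4 \mathop{\mathbb{E}}_{\rho \sim \mathcal{R}_x}\left[(\partial_{ij} f_\rho)(0)\right]. \tag{10}$$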


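Proof. We define $\mathcal{R}_x$ as follows: for each coordinate $i \in [N]$ we independently set $\rho_i$ to be 1 with
probability $\frac{1}{4} + \frac{x_i}{2}$, to be $-1$ with probability $\frac{1}{4} - \frac{x_i}{2}$, and to be $*$ with probability $\frac{1}{2}$.
   Using that $f$ is a multilinear polynomial, and that the coordinates are independent, we
deduce that for any $y \in \mathbb{R}^N$, $f(x + y) = \mathbb{E}_{\rho \sim \mathcal{R}_x}[f_\rho(2y)]$. Then, using Equation (9),

$$\begin{aligned}
(\partial_{ij} f)(x) &= f(x + e_i + e_j) - f(x + e_i) - f(x + e_j) + f(x) \\
&= \mathop{\mathbb{E}}_{\rho \sim \mathcal{R}_x}\left[f_\rho(2e_i + 2e_j) - f_\rho(2e_j) - f_\rho(2e_i) + f_\rho(0)\right] = 4 \mathop{\mathbb{E}}_{\rho \sim \mathcal{R}_x}\left[(\partial_{ij} f_\rho)(0)\right]. \qquad \square
\end{aligned}$$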
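
Equation (10) can be checked exactly on a small multilinear polynomial by enumerating all $3^N$ restrictions with their probabilities under $\mathcal{R}_x$. The sketch below is only an illustration for $i \ne j$: the dictionary representation of $f$, the marker `STAR`, and all function names are hypothetical, and mixed partial derivatives are computed with the finite-difference identity of Equation (9) using $h = 1$.

```python
import itertools
import math

STAR = None  # marks a free coordinate of a restriction


def multilinear_eval(coeffs, point):
    """Evaluate sum_S coeffs[S] * prod_{i in S} point[i]; each S is a frozenset of indices."""
    return sum(c * math.prod(point[i] for i in s) for s, c in coeffs.items())


def restrict(coeffs, rho):
    """Return f_rho as a function of y: fixed coordinates use rho, free ones use y."""
    def f_rho(y):
        merged = [yk if rk is STAR else rk for rk, yk in zip(rho, y)]
        return multilinear_eval(coeffs, merged)
    return f_rho


def mixed_partial(func, point, i, j):
    """(d_ij func)(point) for a multilinear func and i != j, via second differences with h = 1."""
    def shifted(*indices):
        p = list(point)
        for k in indices:
            p[k] += 1.0
        return func(p)
    return shifted(i, j) - shifted(i) - shifted(j) + shifted()


def lemma_rhs(coeffs, x, i, j):
    """4 * E_{rho ~ R_x}[(d_ij f_rho)(0)], by exact enumeration over {1, -1, *}^N."""
    n = len(x)
    total = 0.0
    for rho in itertools.product((1, -1, STAR), repeat=n):
        # P[rho_k = *] = 1/2, P[rho_k = +1] = 1/4 + x_k/2, P[rho_k = -1] = 1/4 - x_k/2.
        prob = math.prod(0.5 if r is STAR else 0.25 + r * xk / 2 for r, xk in zip(rho, x))
        total += prob * mixed_partial(restrict(coeffs, rho), [0.0] * n, i, j)
    return 4 * total
```

Comparing `mixed_partial(lambda p: multilinear_eval(coeffs, p), list(x), i, j)` with `lemma_rhs(coeffs, x, i, j)` for a point $x \in [-1/2, 1/2]^N$ reproduces both sides of Equation (10).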

    We now show our main result, which is analogous to [5, Theorem A.7] and [8, Theorem 2.4].

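Theorem 4.4. Let $f : \{-1, 1\}^N \to \{-1, 1\}$ be a Boolean function, and let $L > 0$ be such that for any
restriction $\rho$,
$$\sum_{\substack{S \subseteq [N] \\ |S| = 2}} |\widehat{f_\rho}(S)| \le L.$$

Let $\gamma > 0$ and let $\mathbf{B}$ be an $N$-dimensional Brownian motion with mean 0 and covariance matrix $K$.
Further assume that $|K_{ij}| \le \gamma$ for $i \ne j$.
    Let $\varepsilon > 0$ and define the stopping time

$$\tau := \min\{\varepsilon,\ \text{first time that } \mathbf{B}_t \text{ exits } [-1/2, 1/2]^N\}.$$

Then, identifying $f$ with its multilinear representation, we have

$$\left|\mathbb{E}[f(\mathbf{B}_\tau)] - \mathbb{E}[f(\mathcal{U}_N)]\right| \le 2\varepsilon\gamma L.$$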

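Proof. First, we note that $\mathbb{E}[f(\mathcal{U}_N)] = f(0)$. Note that $\mathbf{B}_\tau$ is always within $[-1/2, 1/2]^N$. We can
apply Theorem 3.6 to obtain

$$\mathbb{E}[f(\mathbf{B}_\tau)] - f(0) = \mathbb{E}\left[\int_0^\tau \frac{1}{2} \sum_{i,j \in [N]} K_{ij}\, (\partial_{ij} f)(\mathbf{B}_s)\, ds\right]. \tag{11}$$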


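Then, we use the upper bound $\tau \le \varepsilon$, and that $(\partial_{ii} f) = 0$ for all $i \in [N]$ because $f$ is multilinear,
to get

$$\begin{aligned}
\left|\mathbb{E}[f(\mathbf{B}_\tau)] - f(0)\right|
&\le \varepsilon\, \mathbb{E}\left[\sup_{s \in [0, \tau]} \left|\frac{1}{2} \sum_{i,j \in [N]} K_{ij}\, (\partial_{ij} f)(\mathbf{B}_s)\right|\right] \\
&\le \frac{\varepsilon\gamma}{2} \sup_{x \in [-1/2, 1/2]^N} \sum_{i \ne j} \left|(\partial_{ij} f)(x)\right| \\
&= 2\varepsilon\gamma \sup_{x \in [-1/2, 1/2]^N} \sum_{i \ne j} \left|\mathop{\mathbb{E}}_{\rho \sim \mathcal{R}_x}\left[(\partial_{ij} f_\rho)(0)\right]\right| && \text{(Lemma 4.3)} \\
&\le 2\varepsilon\gamma \sup_{x \in [-1/2, 1/2]^N} \mathop{\mathbb{E}}_{\rho \sim \mathcal{R}_x}\left[\sum_{i \ne j} \left|(\partial_{ij} f_\rho)(0)\right|\right] \\
&\le 2\varepsilon\gamma \sup_{x \in [-1/2, 1/2]^N} \mathop{\mathbb{E}}_{\rho \sim \mathcal{R}_x}\left[\sum_{\substack{S \subseteq \mathrm{free}(\rho) \\ |S| = 2}} \left|\widehat{f_\rho}(S)\right|\right] && \text{(Equation (8))} \\
&\le 2\varepsilon\gamma L. && \square
\end{aligned}$$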


5   Bound for classical circuits
We now construct the distribution $\mathcal{D}$ as described in Section 2.1. Due to Proposition 2.1, we can
take $\mathcal{D}$ to be the distribution defined by $\mathbf{B}_\tau$, where $\mathbf{B}$ is a $2N$-dimensional Brownian motion
with covariance matrix $\Phi$, $\tau$ is defined as in Theorem 4.4, and $\varepsilon$ will be chosen appropriately
before Equation (12) below.
   Following Raz–Tal, we first use the following result of Tal [10] on the Fourier weight of
Boolean circuits.

Theorem 5.1 (Theorem 37(3) of [10]). There exists a constant $C > 0$ such that the following holds.
Let $f : \{-1, 1\}^{2N} \to \{-1, 1\}$ be a Boolean function computed by a Boolean circuit of depth $d$ and size
$m > 1$. Then for all $k$,
$$\sum_{S : |S| = k} |\hat{f}(S)| \le 2\left(C (\log m)^{d-1}\right)^k.$$

    In particular, if $f$ is computed by a bounded-depth circuit of $\mathrm{quasipoly}(N)$ size, then
$$\sum_{S : |S| = 2} |\hat{f}(S)| \le \mathrm{polylog}(N).$$
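
To make the quantity bounded in Theorem 5.1 concrete, the level-$k$ Fourier weight of a small Boolean function can be computed by brute force. The sketch below is purely illustrative and says nothing about circuits: it enumerates the whole cube, so it is only usable for very small $n$, and the function names (including the example `majority3`) are hypothetical.

```python
import itertools
import math


def fourier_coefficient(f, n, subset):
    """hat f(S) = E_x[f(x) * prod_{i in S} x_i] for uniform x in {-1, 1}^n."""
    total = 0.0
    for x in itertools.product((-1, 1), repeat=n):
        total += f(x) * math.prod(x[i] for i in subset)
    return total / 2 ** n


def level_k_weight(f, n, k):
    """Sum of |hat f(S)| over all subsets S of size k."""
    return sum(
        abs(fourier_coefficient(f, n, subset))
        for subset in itertools.combinations(range(n), k)
    )


def majority3(x):
    """Majority of three +/-1 bits."""
    return 1 if sum(x) > 0 else -1
```

For instance, the majority of three bits has the expansion $\frac{1}{2}(x_1 + x_2 + x_3) - \frac{1}{2} x_1 x_2 x_3$, so `level_k_weight(majority3, 3, 2)` evaluates to 0 while the level-1 weight is 1.5.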


    Since restriction does not increase circuit size or depth, we can apply Theorem 4.4 with


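$\varepsilon = 1/(8 \ln(2N))$ and $\gamma = \frac{1}{\sqrt{N}}$, to deduce that

$$\left|\mathbb{E}[f(\mathbf{B}_\tau)] - f(0)\right| \le \frac{\mathrm{polylog}(N)}{\sqrt{N}}, \tag{12}$$

where $\mathbf{B}_\tau$ is defined as in Theorem 4.4, justifying Equation (1). □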
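
As a sanity check on these parameters, the bound of Theorem 4.4 can be written out explicitly. This is a routine substitution spelled out for convenience; here $L$ stands for the $\mathrm{polylog}(N)$ bound supplied by Theorem 5.1 for $f$ and all of its restrictions, and the choice of $\gamma$ uses that every off-diagonal entry of $\Phi$ is either 0 or an entry $\pm 1/\sqrt{N}$ of $H_N$.

```latex
\[
2\varepsilon\gamma L
  = 2 \cdot \frac{1}{8\ln(2N)} \cdot \frac{1}{\sqrt{N}} \cdot L
  = \frac{L}{4\sqrt{N}\,\ln(2N)}
  \le \frac{\mathrm{polylog}(N)}{\sqrt{N}} .
\]
```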


6    Quantum algorithm
Finally, we show that a 1-query 𝑂(log 𝑁)-time quantum algorithm can distinguish 𝒟 from the
uniform distribution. This is virtually identical to the argument in [8, Section 6], but we can
again use some stochastic calculus tools on the stopping time built into our, slightly different,
distribution.
    The Forrelation query algorithm [1, 2] is an $O(\log N)$-time quantum algorithm with inputs
$x, y \in \{-1, 1\}^N$ which accepts with probability $(1 + \phi(x, y))/2$, where
$$\phi(x, y) := \frac{1}{N} \langle x, H_N y \rangle. \tag{13}$$
When the pair $(x, y)$ is drawn from the uniform distribution, $\mathbb{E}[\phi(x, y)] = 0$. The quantum
algorithm is to prepare the uniform superposition over the basis states $|1\rangle, \ldots, |N\rangle$, query $x$, apply
the Walsh–Hadamard transform, query $y$, apply the Walsh–Hadamard transform again, then
measure in the computational basis and accept if the outcome is $|1\rangle$. The quantum algorithm is
described in more detail in [1, Section 3.2].
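
The quantity $\phi(x, y)$ from Equation (13) is easy to evaluate classically for small $N$, which gives a quick check that its average over the uniform distribution vanishes. The sketch below is a classical illustration only and does not simulate the quantum algorithm; the function names are hypothetical, and the Hadamard matrix is built from the Sylvester sign pattern.

```python
import itertools
import math


def normalized_hadamard(n):
    """Return the n x n Walsh-Hadamard matrix divided by sqrt(n) (n a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    scale = 1.0 / math.sqrt(n)
    return [[scale * (-1) ** bin(i & j).count("1") for j in range(n)] for i in range(n)]


def forrelation(x, y):
    """phi(x, y) = <x, H_N y> / N for x, y in {-1, 1}^N."""
    n = len(x)
    h = normalized_hadamard(n)
    hy = [sum(h[i][j] * y[j] for j in range(n)) for i in range(n)]
    return sum(xi * v for xi, v in zip(x, hy)) / n


def average_over_uniform(n):
    """Exact average of phi over all pairs (x, y) in {-1, 1}^n x {-1, 1}^n."""
    cube = list(itertools.product((-1, 1), repeat=n))
    return sum(forrelation(x, y) for x in cube for y in cube) / len(cube) ** 2
```

With $N = 4$, `average_over_uniform(4)` is 0, whereas a correlated pair such as $x = y = (1, 1, 1, 1)$ gives $\phi(x, y) = 1/2$.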
    We show the following inequality [8, Claim 6.3], which shows that the quantum algorithm
distinguishes 𝒟 from the uniform distribution with sufficiently high probability, justifying
Equation (2).
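Proposition 6.1. $\mathbb{E}_{(\mathbf{x}_\tau, \mathbf{y}_\tau) \sim \mathcal{D}}[\phi(\mathbf{x}_\tau, \mathbf{y}_\tau)] \ge \frac{\varepsilon}{4}$.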
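Proof. Since $\mathbf{y}_\tau = H_N \mathbf{x}_\tau$ and $H_N^2 = I_N$, we have
$$\mathop{\mathbb{E}}_{(\mathbf{x}_\tau, \mathbf{y}_\tau) \sim \mathcal{D}}[\phi(\mathbf{x}_\tau, \mathbf{y}_\tau)] = \frac{1}{N} \mathbb{E}[\langle \mathbf{x}_\tau, H_N \mathbf{y}_\tau \rangle] = \frac{1}{N} \mathbb{E}[\langle \mathbf{x}_\tau, H_N^2 \mathbf{x}_\tau \rangle] = \frac{1}{N} \mathbb{E}[\|\mathbf{x}_\tau\|^2].$$
Each coordinate of $\mathbf{x}_t$ is a standard one-dimensional Brownian motion, so applying Proposition 3.5
to each of the $N$ coordinates and summing, we have
$$\frac{1}{N} \mathbb{E}[\|\mathbf{x}_\tau\|^2] = \mathbb{E}[\tau]. \tag{14}$$
By Markov’s inequality,
$$\mathbb{E}[\tau] \ge \frac{\varepsilon}{2} \Pr\left[\tau > \frac{\varepsilon}{2}\right].$$
If $\tau \le \frac{\varepsilon}{2}$, it must be the case that the path exits $[-1/2, 1/2]^{2N}$ no later than $\frac{\varepsilon}{2}$. Hence, by the
union bound, we have
$$\Pr\left[\tau \le \frac{\varepsilon}{2}\right] \le 2N \cdot \Pr\left[\text{1st coordinate of } (\mathbf{x}_t, \mathbf{y}_t) \text{ exits } \left[-\tfrac{1}{2}, \tfrac{1}{2}\right] \text{ not later than } \tfrac{\varepsilon}{2}\right]. \tag{15}$$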

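   Each coordinate of $(\mathbf{x}_t, \mathbf{y}_t)$ is a standard 1D Brownian motion since $K_{ii} = 1$ for all $i$. Let $B_t^{(1)}$
denote the first coordinate of $\mathbf{x}_t$. Applying Proposition 3.7 with time horizon $\varepsilon/2$ and $a = 1/\varepsilon$,
$$\Pr\left[\sup_{0 \le t \le \varepsilon/2} |B_t^{(1)}| \ge \frac{1}{2}\right] \le 2e^{-1/(4\varepsilon)} = 2e^{-2\ln(2N)} \le \frac{1}{4N} \quad \text{for } N \ge 4. \tag{16}$$

Therefore, $\Pr[\tau \le \frac{\varepsilon}{2}] \le \frac{1}{2}$, so $\mathbb{E}[\tau] \ge \frac{\varepsilon}{4}$. □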
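
The arithmetic behind Equation (16) and the final union bound can be checked numerically. The sketch below simply evaluates the expressions for a given $N$ with $\varepsilon = 1/(8 \ln(2N))$; the function name is hypothetical.

```python
import math


def exit_probability_bounds(n):
    """Return (eps, per-coordinate bound 2*exp(-1/(4*eps)), union bound over 2n coordinates)."""
    eps = 1.0 / (8.0 * math.log(2 * n))
    per_coordinate = 2.0 * math.exp(-1.0 / (4.0 * eps))  # simplifies to 1 / (2 n^2)
    return eps, per_coordinate, 2 * n * per_coordinate
```

The per-coordinate bound simplifies to $2/(2N)^2 = 1/(2N^2)$, so the union bound over the $2N$ coordinates is $1/N$, which is at most $1/2$ in the stated range $N \ge 4$.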


7    Acknowledgments
I would like to thank Ryan O’Donnell and Avishay Tal for helpful discussions and their
suggestions concerning an early draft. Thanks also to Gregory Rosenthal and anonymous
reviewers for helpful comments.


References
 [1] Scott Aaronson: BQP and the Polynomial Hierarchy. In Proc. 42nd STOC, pp. 141–150.
     ACM Press, 2010. [doi:10.1145/1806689.1806711]

 [2] Scott Aaronson and Andris Ambainis: Forrelation: a problem that optimally separates
     quantum from classical computing. SIAM J. Comput., 47(3):982–1038, 2018. Preliminary
     version in STOC’15. [doi:10.1137/15M1050902]

 [3] Boaz Barak and Jarosław Błasiok: On the Raz-Tal oracle separation of BQP and PH, 2018.
     windowsontheory.org.

 [4] Eshan Chattopadhyay, Pooya Hatami, Kaave Hosseini, and Shachar Lovett: Pseudorandom
     generators from polarizing random walks. Theory of Computing, 15(10):1–26, 2019.
     Preliminary version in CCC’18. [doi:10.4086/toc.2019.v015a010]

 [5] Eshan Chattopadhyay, Pooya Hatami, Shachar Lovett, and Avishay Tal: Pseudorandom
     generators from the second Fourier level and applications to AC0 with parity gates. In
     Proc. 10th Innovations in Theoret. Comp. Sci. Conf. (ITCS’19), pp. 22:1–22:15. Schloss Dagstuhl–
     Leibniz-Zentrum fuer Informatik, 2019. [doi:10.4230/LIPIcs.ITCS.2019.22, ECCC:TR18-155]

 [6] Merrick Lee Furst, James B. Saxe, and Michael Sipser: Parity, circuits, and the polynomial-time
     hierarchy. Math. Systems Theory, 17(1):13–27, 1984. Preliminary version in FOCS’81.
     [doi:10.1007/BF01744431]

 [7] Bernt Øksendal: Stochastic Differential Equations. Universitext. Springer, 6th edition, 2003.
     [doi:10.1007/978-3-642-14394-6]


 [8] Ran Raz and Avishay Tal: Oracle separation of BQP and PH. In Proc. 51st STOC, pp. 13–23.
     ACM Press, 2019. [doi:10.1145/3313276.3316315, ECCC:TR18-107]

 [9] Daniel Revuz and Marc Yor: Continuous Martingales and Brownian Motion. Volume 293 of
     Grundlehren der Math. Wiss. Springer, 3rd edition, 1999. [doi:10.1007/978-3-662-06400-9]

[10] Avishay Tal: Tight bounds on the Fourier spectrum of AC0. In Proc. 32nd Comput. Complexity
     Conf. (CCC’17), pp. 15:1–15:31. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, 2017.
     [doi:10.4230/LIPIcs.CCC.2017.15, ECCC:TR14-174]


AUTHOR

     Xinyu Wu
     Graduate student
     Computer Science Department
     Carnegie Mellon University
     Pittsburgh, PA, USA
     xinyuwu cmu edu
     https://www.andrew.cmu.edu/user/xinyuw1/


ABOUT THE AUTHOR

     Xinyu Wu is a Ph.D. student at Carnegie Mellon University, advised by Ryan
        O’Donnell and Pravesh Kothari. Her research interests are in spectral graph
        theory, free probability, and their applications in understanding average-case
        problems and quantum computing. She grew up in Singapore, and did her
        undergraduate degree in math and computer science also at Carnegie Mellon.
        She also enjoys cooking, cycling, and cat videos.



