                           THEORY OF COMPUTING, Volume 11 (1), 2015, pp. 1–34
                                        www.theoryofcomputing.org




        The Complexity of Deciding Statistical Properties of Samplable Distributions
                                                  Thomas Watson∗
              Received December 10, 2013; Revised February 13, 2015; Published March 12, 2015




       Abstract: We consider the problems of deciding whether the joint distribution sampled
       by a given circuit has certain statistical properties such as being i. i. d., being exchangeable,
       being pairwise independent, having two coordinates with identical marginals, having two
       uncorrelated coordinates, and many other variants. We give a proof that simultaneously shows
       all these problems are C=P-complete, by showing that the following promise problem (which
       is a restriction of all the above problems) is C=P-complete: Given a circuit, distinguish the
       case where the output distribution is uniform and the case where every pair of coordinates
       is neither uncorrelated nor identically distributed. This completeness result holds even for
       samplers that are depth-3 circuits.
            We also consider circuits that are d-local, in the sense that each output bit depends on at
       most d input bits. We give linear-time algorithms for deciding whether a 2-local sampler's
       joint distribution is fully independent, and whether it is exchangeable.
            We also show that for general circuits, certain approximation versions of the problems of
       deciding full independence and exchangeability are SZK-complete.
            We also introduce a bounded-error version of C=P, which we call BC=P, and we
       investigate its structural properties.

ACM Classification: F.1.3
AMS Classification: 68Q17, 68Q15
Key words and phrases: complexity theory, algorithms, complexity classes, completeness, samplable
distributions
     An extended abstract of this paper appeared in the Proceedings of the 31st International Symposium on Theoretical Aspects
of Computer Science (STACS), 2014 [39].
   ∗ Supported by funding from NSERC.



© 2015 Thomas Watson
Licensed under a Creative Commons Attribution License (CC-BY)                               DOI: 10.4086/toc.2015.v011a001

1     Introduction
Testing for independence of random variables is a fundamental problem in statistics. Theoretical
computer scientists have studied this and other analogous problems from two main viewpoints. The
first viewpoint is property testing of distributions, which is a black-box model in which a tester is
given samples and tries to distinguish between some statistical property being “close” or “far” from
satisfied. Some important works giving upper and lower bounds for property testing of distributions
include [6, 5, 4, 7, 29, 2, 28, 35, 30, 26, 14, 13].
      The other viewpoint is the white-box model in which a tester is given a description of a distribution
(from which it could generate its own samples). This could potentially make some problems easier,
but there are complexity-theoretic results showing that several such problems are computationally hard,
particularly when the input is a succinct description of a distribution. One of the most general and
natural ways to succinctly specify a distribution is to give the code of an efficient algorithm that takes
“pure” randomness and transforms it into a sample from the distribution. (This gives a polynomial-size
specification of a distribution over a potentially exponential-size set.) For arbitrary circuit samplers, the
papers [31, 20, 21, 40] contain completeness results for various approximation problems concerning
statistical distance, Shannon entropy, and min-entropy. See [22] for a survey of both the black-box and
the white-box viewpoints.
      In this paper we consider a wide array of “exact” problems concerning statistical properties of the joint
distribution produced by a given sampler. Such problems include deciding whether the joint distribution is
i. i. d., exchangeable, pairwise independent, and many other variants. Exchangeability is a very important
and useful concept with many different applications in pure and applied probability [25, 1], but it has been
less-often studied in the theoretical computer science community. A joint distribution over a finite domain
is called exchangeable if it is invariant under permuting the coordinates. It is fairly straightforward to
see that a finite distribution is exchangeable iff it is a mixture of distributions that arise from drawing a
sequence of colored balls without replacement from an urn1 [16]. When each coordinate is a single bit,
exchangeability is equivalent to the probability of a string only depending on the Hamming weight. We
feel it is natural to pose complexity-theoretic questions about exchangeability.
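
For single-bit coordinates, the equivalence just described is easy to check by brute force on small examples. The following sketch is ours, not from the paper; it assumes a distribution is given explicitly as a dict from outcome tuples to probabilities:

```python
from itertools import permutations

def is_exchangeable(dist, n, tol=1e-12):
    """Check invariance under permuting coordinates; `dist` maps every
    outcome tuple in the support to its probability."""
    for outcome, p in dist.items():
        for perm in permutations(range(n)):
            permuted = tuple(outcome[i] for i in perm)
            if abs(dist.get(permuted, 0.0) - p) > tol:
                return False
    return True

def is_weight_determined(dist, n, tol=1e-12):
    """For single-bit coordinates: the probability of a string depends
    only on its Hamming weight. Assumes outcomes of equal weight are
    either all present in `dist` or all absent."""
    by_weight = {}
    for outcome, p in dist.items():
        w = sum(outcome)
        if w in by_weight and abs(by_weight[w] - p) > tol:
            return False
        by_weight[w] = p
    return True
```

On fully specified bit distributions the two tests agree, matching the equivalence stated above.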
    We prove that the aforementioned wide array of problems, and more generally a single problem we
call PANOPTIC-STATS which is at most as hard as any of those problems, are complete for the complexity
class C=P. This class was introduced in [38] as part of the counting hierarchy, and it can be viewed
as a class that captures "exact counting" of NP witnesses. The class C=P is at least as hard as the
polynomial-time hierarchy, since PH ⊆ BP·C=P [33] and even PH ⊆ ZP·C=P [32]. It is at most as hard
as "threshold counting," since C=P ⊆ PP, and it is not substantially easier, since PP ⊆ NP^C=P. The class
C=P has been given several names and characterizations; it equals the classes2 coNQP [18] and ES [11].
      In many areas of complexity theory, when arbitrary small-size circuits are too unwieldy to reason
about, we restrict our attention to more stringent complexity measures that are combinatorially simple
enough to reason about and obtain unconditional results. The two major categories of such complexity
    1 This is a finite analogue of de Finetti's theorem.
    2 NQP is a nondeterministic version of quantum polynomial time. It differs from QMA, an alternative such version, in that
NQP is defined in terms of zero vs. non-zero probability of acceptance by a quantum algorithm (like the nondeterminism-based
definition of NP), whereas QMA is defined in terms of a quantum algorithm given a quantum state as a witness (like the
verification-based definition of NP). These two classes are not known to be comparable.



measures are parallel time, and space. One model of efficient parallel time computation is AC0 (constant-
depth unbounded fan-in circuits with AND, OR, and NOT gates). Papers that study AC0 circuits that
sample distributions include [36, 27, 37, 8]. Another (generally more restrictive) model of efficient parallel
time computation is locally-computable functions, where each output bit depends on at most a bounded
number of input bits. Papers that study locally-computable functions as samplers include [36, 17, 15, 37,
40] as well as a large collection of papers investigating the possibility of implementing pseudorandom
generators locally. (See [15] for an extensive list of past work on the power of locally-computable
functions, including whether they can implement PRGs, one-way functions, and extractors.) The most
common model for logarithmic-space samplers is one with streaming/one-way access to the pure random
input bits. Topics that have been studied concerning such logspace samplers include compression [34],
extraction [24], and min-entropy estimation [40]. One more paper worth mentioning is [10], which
considers Markov random fields as succinct descriptions of distributions (though these descriptions would
not be considered “samplers”).
    We prove that our C=P-completeness results hold even when restricted to samplers that are AC0-type
circuits with depth 3 and top fan-in 2 (i. e., each output gate has fan-in at most 2). We also consider
2-local samplers (where each output bit depends on at most 2 of the pure random input bits) such that each
coordinate of the sampled joint distribution is a single bit. We give polynomial-time (in fact, linear-time)
algorithms for deciding whether such a sampler’s distribution is fully independent, and whether it is
exchangeable. These seem to be the first-of-a-kind algorithmic results on deciding statistical properties of
succinctly described distributions.
    We also consider approximate versions of the problems discussed above: deciding whether the
joint distribution of a given sampler is statistically close to or far from satisfying a property. It was
shown in [20] that for the property of being uniform, the problem is complete for the class NISZK
(non-interactive statistical zero-knowledge). It was shown in [31] that the problem of deciding whether a
pair of samplable distributions are statistically close or far is complete for the class SZK (statistical zero
knowledge). We prove that with suitable parameters, the approximate versions of the full independence
and exchangeability problems (for general circuit samplers) are also SZK-complete.
    In this paper we also consider a "bounded-error" version of C=P, which we call BC=P and which
does not seem to have been defined or studied in the literature before. Although it does not appear
to be directly relevant to statistical properties of samplable distributions, we take the opportunity to
study this class and prove that it is closed under several operations (disjunction, conjunction, union, and
intersection).


2    Results
If D is a joint distribution over ({0,1}^k)^n, we let Di (for i ∈ {1, . . . , n}) denote the ith coordinate, which is
marginally distributed over {0,1}^k. For each of the computational problems we consider, the input is a
circuit S : {0,1}^r → ({0,1}^k)^n (and we assume that the values of k and n are part of the description of the
circuit). We call such a circuit a (k, n)-sampler, and if it has size ≤ s we also call it a (k, n, s)-sampler.
Plugging a uniformly random string into S yields a joint output distribution, which we denote by S(U).
We formulate computational problems using the framework of promise problems. Throughout this paper,
when we talk about reductions and completeness, we are always referring to Karp reductions (polynomial-time
many-one reductions). We refer to the texts [3, 19] for expositions of standard complexity classes
and completeness.
    We state our completeness results for exact problems in Section 2.1 and prove them in Section 3. We
state our algorithmic results for exact problems in Section 2.2 and prove them in Section 4. We state our
completeness results for approximate problems in Section 2.3 and prove them in Section 5. We consider
a new complexity class, BC=P, in Section 6, and we list some open problems in Section 7.

2.1   Exact completeness results
For a joint distribution D over ({0,1}^k)^n, we say that Di, Dj are uncorrelated if they have covariance 0, in
other words E(Di · Dj) = E(Di) · E(Dj) (when {0,1}^k is interpreted as binary representations of integers
from 0 to 2^k − 1). Being uncorrelated is the same as being independent when k = 1. We consider the
following extreme notion of a distribution being nonuniform.

Definition 2.1. A joint distribution is discordant if there are ≥ 2 coordinates and every pair of coordinates
is neither uncorrelated nor identically distributed.
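
As a concrete illustration (our own sketch, not part of the paper), discordance of a small sampler can be tested by exhausting its random inputs; here the sampler is assumed to be given as a Python function from input bit tuples to output tuples of integers:

```python
from itertools import product

def output_distribution(sampler, r):
    """Joint output distribution of a sampler S : {0,1}^r -> tuples of
    integers, obtained by exhausting all 2^r random inputs."""
    dist = {}
    for bits in product((0, 1), repeat=r):
        out = tuple(sampler(bits))
        dist[out] = dist.get(out, 0.0) + 2.0 ** (-r)
    return dist

def is_discordant(dist, n, tol=1e-12):
    """Definition 2.1: at least 2 coordinates, and every pair of
    coordinates is neither uncorrelated nor identically distributed."""
    if n < 2:
        return False
    for i in range(n):
        for j in range(i + 1, n):
            e_i = sum(p * w[i] for w, p in dist.items())
            e_j = sum(p * w[j] for w, p in dist.items())
            e_ij = sum(p * w[i] * w[j] for w, p in dist.items())
            marg_i, marg_j = {}, {}
            for w, p in dist.items():
                marg_i[w[i]] = marg_i.get(w[i], 0.0) + p
                marg_j[w[j]] = marg_j.get(w[j], 0.0) + p
            identical = all(abs(marg_i.get(v, 0.0) - marg_j.get(v, 0.0)) <= tol
                            for v in set(marg_i) | set(marg_j))
            if abs(e_ij - e_i * e_j) <= tol or identical:
                return False
    return True
```

For example, the 2-coordinate sampler mapping random bits (b1, b2) to (b1, b1 OR b2) is discordant: its coordinates are correlated and have different marginals.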

Definition 2.2. PANOPTIC-STATS is the following promise problem.

                       PANOPTIC-STATS_YES = { S : S(U) is uniform },
                        PANOPTIC-STATS_NO = { S : S(U) is discordant }.

    We say that promise problem Π is a generalization of promise problem Π′, or that Π′ is a restriction
of Π, if Π′_YES ⊆ Π_YES and Π′_NO ⊆ Π_NO.

Fact 2.3. PANOPTIC-STATS is generalized by all the following languages, which are defined in a natural
way.

      UNIFORM, IID, FULLY-INDEPENDENT, IDENTICALLY-DISTRIBUTED, EXCHANGEABLE,
      K-WISE-UNIFORM, K-WISE-INDEPENDENT, K-WISE-EXCHANGEABLE,
      2-WISE-UNCORRELATED, K-EXISTS-UNIFORM, K-EXISTS-INDEPENDENT,
      K-EXISTS-IDENTICALLY-DISTRIBUTED, K-EXISTS-EXCHANGEABLE,
      2-EXISTS-UNCORRELATED, NON-DISCORDANT.

For example, S ∈ UNIFORM ⇐⇒ S(U) is uniform. Also, K ≥ 2 is any constant (unrelated to k). Technical
caveat: To ensure the K-WISE- and K-EXISTS- problems generalize PANOPTIC-STATS, they are defined
in terms of a property holding for every or some (respectively) set of min(K, n) coordinates.

    We prove that PANOPTIC-STATS and all the languages listed in Fact 2.3 are complete for the com-
plexity class C=P (defined below). In fact, the C=P-hardness of each of the individual languages in
Fact 2.3 is fairly simple to prove, but the C=P-hardness of PANOPTIC-STATS shows two things: (1) that
this phenomenon is very robust, not dependent on some fragile aspects of the properties being decided,
and (2) that only one proof is needed to show the C=P-hardness of all the languages in Fact 2.3.
    To prove the C=P-hardness of PANOPTIC-STATS, it suffices to prove hardness for the case n = 2.
However, hardness for n = 2 does not seem to directly imply hardness for a larger number of coordinates;


it is desirable to prove hardness even when restricted to samplers that are small in terms of the number of
coordinates n. We formalize this by introducing a new parameter m and viewing k, n, s as functions of m.
Thus m can be thought of as indexing a family of parameter settings.

Definition 2.4. We say that a triple of functions κ(m), ν(m), σ (m) : N → N is polite if the functions are
monotonically nondecreasing, polynomially bounded in m, computable in time polynomial in m, and
σ (m) ≥ m.

Definition 2.5. PANOPTIC-STATS^{κ,ν,σ} is the restriction of PANOPTIC-STATS to (k, n, s)-samplers with
k = κ(m), n = ν(m), and s ≤ σ(m) for some m, where κ, ν, σ is assumed to be polite.

   We now state the definition of our central complexity class, C=P. We use a standard model of
computation in which randomized algorithms have access to independent unbiased coin flips.

Definition 2.6. prC=P is the class of all promise problems for which there exists a polynomial-time
randomized algorithm M that accepts with probability 1/2 on YES instances, and accepts with probability
≠ 1/2 on NO instances. Also, C=P is defined as the class of languages in prC=P.

Proposition 2.7. prC=P is the class of all promise problems Karp-reducible to the following promise
problem, UNIFORM-BIT.

              UNIFORM-BIT_YES = { S : S is a (1, 1)-sampler and S(U) is uniform },
               UNIFORM-BIT_NO = { S : S is a (1, 1)-sampler and S(U) is nonuniform }.

Proof. Suppose Π ∈ prC=P is witnessed by M taking an input x and a uniformly random string y of
some polynomial length. To reduce Π to UNIFORM-BIT, map x to Sx where Sx(y) = M(x, y). Conversely,
suppose Π reduces to UNIFORM-BIT. Then Π ∈ prC=P is witnessed by M that takes x, runs the reduction
to get a (1, 1)-sampler Sx, and runs Sx on a uniformly random input.

Theorem 2.8. PANOPTIC-STATS^{κ,ν,σ} is prC=P-hard for every polite κ, ν, σ with κν ≤ o(σ).

Theorem 2.9. PANOPTIC-STATS^{κ,ν,σ} is prC=P-hard even when restricted to samplers that are AC0-type
circuits with depth 3 and top fan-in 2, for every polite κ, ν, σ with κν + ν² ≤ o(σ).

Theorem 2.10. All the languages listed in Fact 2.3 are in C=P.

    Consequently, all the languages listed in Fact 2.3 are C=P-complete, even when restricted to (κ, ν, σ)-
samplers (like in Definition 2.5) with polite κ, ν, σ satisfying κν ≤ o(σ) (for general circuit samplers) or
satisfying κν + ν² ≤ o(σ) (for depth-3 circuits with top fan-in 2).
    We mention that some problems concerning conditional independence are also C=P-complete. For
example, deciding whether the first n − 1 coordinates of S(U) are fully independent conditioned on
the last coordinate is at least as hard as the corresponding non-conditional problem. Another problem
concerning conditional independence is whether S(U) forms a (time-inhomogeneous) Markov chain
(assuming n ≥ 3). The construction in our proof of Theorem 2.8 also shows that this problem is C=P-hard.
Both these problems are in C=P by the same techniques used in the proof of Theorem 2.10.


2.2   Exact algorithmic results
We say a (k, n, s)-sampler is d-local if each of the kn output bits depends on at most d of the uniformly
random input bits. For d-local samplers, if dk ≤ O(log s) then some statistical properties, such as being
pairwise independent or having identically distributed marginals, can be decided trivially in polynomial
time. We now prove that some other properties, namely being fully independent or being exchangeable,
can be decided in polynomial time when d = 2 and k = 1. (Admittedly, our algorithms are not very
“algorithmic”; we prove combinatorial characterizations for which it is trivial to check whether a given
sampler satisfies the characterization.)
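
To illustrate why such properties are trivially decidable for small locality, here is a sketch of ours (the representation of an output bit as a dependency list plus truth table is our assumption, not the paper's formalism) that decides independence of one pair of output bits of a d-local (1, n)-sampler by brute force over the at most 2d relevant input bits:

```python
from itertools import product

def bits_independent(deps_i, table_i, deps_j, table_j):
    """Decide whether two output bits of a d-local sampler are independent.
    Each output bit is given by the indices of the input bits it reads
    (`deps`) and a truth table mapping assignments of those bits to an
    output bit. Brute force over the at most 2d relevant input bits."""
    relevant = sorted(set(deps_i) | set(deps_j))
    total = 2 ** len(relevant)
    pi = pj = pij = 0
    for bits in product((0, 1), repeat=len(relevant)):
        assign = dict(zip(relevant, bits))
        vi = table_i[tuple(assign[v] for v in deps_i)]
        vj = table_j[tuple(assign[v] for v in deps_j)]
        pi += vi
        pj += vj
        pij += vi * vj
    # Pr[both 1] == Pr[i is 1] * Pr[j is 1]; for 0/1-valued variables this
    # single equation is equivalent to independence of the pair.
    return pij * total == pi * pj
```

Running this check over all pairs of output bits gives the trivial quadratic-time algorithm for pairwise independence mentioned above.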

Theorem 2.11. There exists a linear-time algorithm for deciding whether the joint distribution of a given
2-local (1, n)-sampler is fully independent.

Theorem 2.12. There exists a linear-time algorithm for deciding whether the joint distribution of a given
2-local (1, n)-sampler is exchangeable.

    When d = 2 and k = 1, we can also improve the efficiency of the trivial quadratic-time algorithm for
deciding pairwise independence.

Theorem 2.13. There exists a linear-time Karp reduction from the problem of deciding whether the joint
distribution of a given 2-local (1, n)-sampler is pairwise independent, to the element distinctness problem.
Hence the former problem can be solved in deterministic O(n log n) time and in zero-error randomized
expected linear time.

     One can also consider logspace samplers that have streaming/one-way access to their random input
bits, and which are usually modeled as layered read-once branching programs representing a certain type
of (time-inhomogeneous) Markov chain. For logspace samplers, some statistical properties, such as being
pairwise independent or having identically distributed marginals, can be decided in polynomial time by
straightforward dynamic programming algorithms; the complexities of deciding full independence and
exchangeability remain open.

2.3   Approximate completeness results
We quantify approximation in terms of statistical distance (also known as total variation distance).

Definition 2.14. The statistical distance between two distributions D(1), D(2) over the same set is defined
as

              ||D(1) − D(2)|| = max_{events E} ( Pr[D(1) ∈ E] − Pr[D(2) ∈ E] )
                              = max_{events E} | Pr[D(1) ∈ E] − Pr[D(2) ∈ E] |
                              = (1/2) · Σ_{outcomes w} | Pr[D(1) = w] − Pr[D(2) = w] | .

We say D(1), D(2) are c-close if ||D(1) − D(2)|| ≤ c, and f-far if ||D(1) − D(2)|| ≥ f.
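
The third expression for statistical distance has a direct implementation (a sketch of ours, not from the paper), assuming distributions are given as dicts from outcomes to probabilities:

```python
def statistical_distance(d1, d2):
    """Total variation distance between two distributions given as
    dicts mapping outcomes to probabilities (an outcome missing from
    a dict has probability 0)."""
    outcomes = set(d1) | set(d2)
    return 0.5 * sum(abs(d1.get(w, 0.0) - d2.get(w, 0.0)) for w in outcomes)
```

For instance, a point mass is at distance 1/2 from the uniform distribution on two outcomes, and every distribution is at distance 0 from itself.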


    We prove that for appropriate parameters, approximate versions of the full independence and ex-
changeability problems are prSZK-complete (for arbitrary circuit samplers). We do not reproduce the
original definition of prSZK, but we make use of the characterization of this class proved by Sahai and
Vadhan [31]. The following is our general formulation of the approximate full independence problem.

Definition 2.15. For functions 0 ≤ c(k, n, s) < f(k, n, s) ≤ 1, FULLY-INDEPENDENT^{c,f} is the following
promise problem.3

    FULLY-INDEPENDENT^{c,f}_YES = { S : S is a (k, n, s)-sampler and S(U) is c(k, n, s)-close
                                        to some fully independent distribution over ({0,1}^k)^n },
    FULLY-INDEPENDENT^{c,f}_NO  = { S : S is a (k, n, s)-sampler and S(U) is f(k, n, s)-far
                                        from every fully independent distribution over ({0,1}^k)^n }.

Theorem 2.16. FULLY-INDEPENDENT^{c,f} is prSZK-hard for all constants 0 < c < f < 1/4.

Theorem 2.17. FULLY-INDEPENDENT^{c,f} ∈ prSZK where c = c′/(n + 1), for all constants 0 < c′ < f² < 1.

    We have, for example, that FULLY-INDEPENDENT^{0.05/(n+1), 0.24} is prSZK-complete. The contain-
ment in prSZK follows from Theorem 2.17. Although the prSZK-hardness does not follow from the
statement of Theorem 2.16, the proof indeed yields this; we stated Theorem 2.16 using constants for the
sake of simplicity and clarity. (It is open to prove Theorem 2.17 with constant c.)

Definition 2.18. For functions 0 ≤ c(k, n, s) < f(k, n, s) ≤ 1, EXCHANGEABLE^{c,f} is the following
promise problem.

    EXCHANGEABLE^{c,f}_YES = { S : S is a (k, n, s)-sampler and S(U) is c(k, n, s)-close
                                   to some exchangeable distribution over ({0,1}^k)^n },
    EXCHANGEABLE^{c,f}_NO  = { S : S is a (k, n, s)-sampler and S(U) is f(k, n, s)-far
                                   from every exchangeable distribution over ({0,1}^k)^n }.

Theorem 2.19. EXCHANGEABLE^{c,f} is prSZK-hard for all constants 0 < c < f < 1/2.

Theorem 2.20. EXCHANGEABLE^{c,f} ∈ prSZK for all constants 0 < 2c < f² < 1.

     Consequently, for example, EXCHANGEABLE^{0.12, 0.49} is prSZK-complete.


3     Proofs of exact completeness results
We prove a key lemma in Section 3.1. Then we use the key lemma to prove Theorem 2.8 and Theorem 2.9
in Section 3.2. Then we prove Theorem 2.10 in Section 3.3.
    3 The superscripts have a different meaning than the superscripts in Definition 2.5.




3.1     The key lemma
The following is the key lemma in the proof of Theorem 2.8. It can be interpreted qualitatively as a
certain type of amplification.

Lemma 3.1. There is an algorithm that takes as input a (1, 1, s)-sampler S and an integer n ≥ 2, runs in
time O(n + s), and outputs a (1, n, O(n + s))-sampler T such that the following both hold.

                                  S(U) is uniform    =⇒ T(U) is uniform ,
                                  S(U) is nonuniform =⇒ T(U) is discordant .

Proof. Let T perform the following computation.


      run S and let b be its output
      choose bits a_1, a_2, . . . , a_n uniformly at random
      if there exists an ℓ < n such that a_ℓ = 0 then
           let ℓ* be the least such ℓ
           output a_1, . . . , a_{ℓ*}, b, a_{ℓ*+2}, . . . , a_n
      else output a_1, . . . , a_n
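
The computation of T can be sketched deterministically (0-indexed; our own illustration, not part of the paper) so that exhausting all inputs verifies the uniform-to-uniform direction exactly:

```python
from itertools import product
from collections import Counter

def T_output(b, a):
    """One evaluation of the sampler T from the proof, 0-indexed:
    b is the output bit of S, a is the tuple of n uniform bits.
    The least position ell < n-1 with a[ell] == 0 (if any) determines
    where b is embedded: b replaces a[ell + 1]."""
    a = list(a)
    for ell in range(len(a) - 1):
        if a[ell] == 0:
            a[ell + 1] = b
            break
    return tuple(a)
```

Exhausting all (b, a) pairs with uniform b confirms that every output string occurs equally often, while fixing b = 1 (a maximally nonuniform S) visibly skews the joint distribution.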


    It is straightforward to see that if S(U) is uniform then T(U) is uniform. Now suppose S(U)
is nonuniform, say Pr[S(U) = 1] = p ≠ 1/2. For brevity we define D = T(U). Consider any two
coordinates Di and Dj where i < j. For technical reasons in the analysis below, if ℓ* does not exist then
we define ℓ* to be an arbitrary value > n.
    We first show that Di and Dj are not identically distributed. If i > 1 then

      Pr[Di = 1] = Pr[Di = 1 | ℓ* = i − 1] · Pr[ℓ* = i − 1] + Pr[Di = 1 | ℓ* ≠ i − 1] · Pr[ℓ* ≠ i − 1]
                 = p · (1/2^{i−1}) + (1/2) · (1 − 1/2^{i−1}) .

Similarly,
                                      Pr[Dj = 1] = p · (1/2^{j−1}) + (1/2) · (1 − 1/2^{j−1}) .

Since p ≠ 1/2, and since Pr[Di = 1] and Pr[Dj = 1] are different convex combinations of p and 1/2, they
are not equal. More formally,

                           Pr[Di = 1] − Pr[Dj = 1] = (p − 1/2) · (1/2^{i−1} − 1/2^{j−1}) ≠ 0 .

On the other hand, suppose i = 1. Then Pr[Di = 1] = 1/2, and Pr[Dj = 1] is a nontrivial convex
combination of p and 1/2 and is thus not equal to Pr[Di = 1]. In either case, Di and Dj are not identically
distributed.


                                                                                           
    Now we show that Di and Dj are correlated. Suppose j = i + 1. Then Pr[Dj = 1 | Di = 1] = 1/2, and

              Pr[Dj = 1 | Di = 0] = Pr[Dj = 1 | ℓ* = i, Di = 0] · Pr[ℓ* = i | Di = 0]
                                    + Pr[Dj = 1 | ℓ* < i, Di = 0] · Pr[ℓ* < i | Di = 0]
                                  = p · Pr[ℓ* = i | Di = 0] + (1/2) · (1 − Pr[ℓ* = i | Di = 0]) .

(Technically Pr[Dj = 1 | ℓ* < i, Di = 0] is undefined if i = 1, but then 1 − Pr[ℓ* = i | Di = 0] = 0 anyway
so the final equation above still holds.) It follows that

            Pr[Dj = 1 | Di = 0] − Pr[Dj = 1 | Di = 1] = (p − 1/2) · Pr[ℓ* = i | Di = 0] ≠ 0

since p ≠ 1/2 and Pr[ℓ* = i | Di = 0] > 0. On the other hand, suppose j > i + 1. Then

                                        Pr[Dj = 1 | Di = 0] = 1/2 ,
and
         Pr[Dj = 1 | Di = 1] = Pr[Dj = 1 | ℓ* = j − 1, Di = 1] · Pr[ℓ* = j − 1 | Di = 1]
                               + Pr[Dj = 1 | ℓ* ≠ j − 1, Di = 1] · Pr[ℓ* ≠ j − 1 | Di = 1]
                             = p · Pr[ℓ* = j − 1 | Di = 1] + (1/2) · (1 − Pr[ℓ* = j − 1 | Di = 1]) .

It follows that

         Pr[Dj = 1 | Di = 1] − Pr[Dj = 1 | Di = 0] = (p − 1/2) · Pr[ℓ* = j − 1 | Di = 1] ≠ 0

since p ≠ 1/2 and Pr[ℓ* = j − 1 | Di = 1] > 0. In either case, Di and Dj are correlated since

                            Pr[Dj = 1 | Di = 0] ≠ Pr[Dj = 1 | Di = 1] .

Lemma 3.2. Lemma 3.1 holds even when T is required to be an AC0-type circuit with depth 3 and top
fan-in 2, except that the size of T and the running time of the algorithm both become O(n² + s).
Proof. The construction and analysis are the same as in the proof of Lemma 3.1, but we need more
care in implementing T. First, we use a standard reduction to convert S into a 3-CNF F that accepts
the same number of inputs as S (but has more input bits). Thus, for some polynomially large q, S
accepts a uniformly random input with probability 1/2 iff F accepts a uniformly random input with
probability 1/2^q. Let x_1, x_2, . . . , x_r denote the input bits of F. Construct a new CNF F′ with input bits
x_0, x_1, . . . , x_r by taking F and including x_0 in each of the clauses (yielding a 4-CNF), then adding a new
clause (¬x_0 ∨ x_1 ∨ · · · ∨ x_q). Since

                           Pr[F′ accepts] = (1/2) · Pr[F accepts] + (1/2) · Pr[(x_1 ∨ · · · ∨ x_q) accepts]

it follows that F accepts with probability 1/2^q iff F′ accepts with probability 1/2. Now to implement
T, we include a copy of F′ as well as the random input bits a_1, a_2, . . . , a_n. The 1st output bit of T
is just a_1. For the ith output bit when i > 1, we have a multiplexer that selects the output of F′ if
(a_1 ∧ a_2 ∧ · · · ∧ a_{i−2} ∧ ¬a_{i−1}) is true, and selects a_i otherwise. Overall, T is an OR-AND-OR circuit (with
negations pushed to the inputs) where each output gate has fan-in at most 2.


3.2 prC=P-hardness
We need one final corollary before we are ready to put the pieces together to prove Theorem 2.8 and
Theorem 2.9.
Corollary 3.3. Lemma 3.1 and Lemma 3.2 also hold when the algorithm is additionally given an integer
k ≥ 1 and is required to output a (k, n)-sampler T, except that the size of T and the running time of the
algorithm both become O(kn + s) (for Lemma 3.1) or O(kn + n² + s) (for Lemma 3.2).

Proof. If T′ is the output of the algorithm from Lemma 3.1 or Lemma 3.2, we can trivially modify it into
a sampler T that prepends independent uniformly random bit strings of length k − 1 to the n coordinates.
In the YES case, T(U) is still uniform. Consider the NO case. The property that no two coordinates
are identically distributed is inherited from T′. To see that coordinates T(U)_i, T(U)_j are still correlated,
abbreviate T(U) as D and T′(U) as D′, and let Di = D′i + I and Dj = D′j + J where I, J are independent
uniformly random even numbers in the range {0, . . . , 2^k − 2}, and note that

                     E(Di Dj) = E(D′i D′j) + E(D′i J) + E(I D′j) + E(I J)
                              = E(D′i D′j) + E(D′i) E(J) + E(I) E(D′j) + E(I) E(J)
                              ≠ E(D′i) E(D′j) + E(D′i) E(J) + E(I) E(D′j) + E(I) E(J)
                              = (E(D′i) + E(I)) (E(D′j) + E(J))
                              = E(Di) E(Dj) .

Proof of Theorem 2.8. We reduce UNIFORM-BIT to PANOPTIC-STATS^{κ,ν,σ}. Let c be the constant factor
in the big O in Corollary 3.3. Given a (1, 1, s)-sampler S, we first find the smallest m such that
c · (κ(m)ν(m) + s) ≤ σ(m). Such an m exists and is O(s) because κν ≤ o(σ) and σ(m) ≥ m for all m.
Then we run the algorithm from Corollary 3.3 (based on Lemma 3.1) with k = κ(m) and n = ν(m) to get
T of size at most c · (κ(m)ν(m) + s) ≤ σ(m). Thus the following both hold.

                        S ∈ UNIFORM-BIT_YES =⇒ T ∈ PANOPTIC-STATS^{κ,ν,σ}_YES ,
                        S ∈ UNIFORM-BIT_NO  =⇒ T ∈ PANOPTIC-STATS^{κ,ν,σ}_NO .

The reduction's running time is polynomial since m, κ(m), ν(m), σ(m) are all polynomially bounded
in s and computable in time polynomial in s, and since the algorithm from Corollary 3.3 runs in time
O(kn + s).

Proof of Theorem 2.9. We reduce UNIFORM-BIT to PANOPTIC-STATS^{κ,ν,σ} restricted as in the statement
of Theorem 2.9. Let c be the constant factor in the big O in Corollary 3.3. Given a (1, 1, s)-sampler
S, we first find the smallest m such that c · (κ(m)ν(m) + ν(m)² + s) ≤ σ(m). Such an m exists and is
O(s) because κν + ν² ≤ o(σ) and σ(m) ≥ m for all m. Then we run the algorithm from Corollary 3.3
(based on Lemma 3.2) with k = κ(m) and n = ν(m) to get a depth-3, top fan-in 2 circuit T of size at most
c · (κ(m)ν(m) + ν(m)² + s) ≤ σ(m). Thus the following both hold.

                        S ∈ UNIFORM-BIT_YES =⇒ T ∈ PANOPTIC-STATS^{κ,ν,σ}_YES ,
                        S ∈ UNIFORM-BIT_NO  =⇒ T ∈ PANOPTIC-STATS^{κ,ν,σ}_NO .



The reduction’s running time is polynomial since m, κ(m), ν(m), σ (m) are all polynomially bounded
in s and computable in time polynomial in s, and since the algorithm from Corollary 3.3 runs in time
O(kn + n2 + s).

3.3   Containment in C=P
In the proof of Theorem 2.10 we use the following lemma, which states that C=P is closed under
exponential conjunctions and polynomial disjunctions. We supply a folklore proof of this lemma in
Section A.1.

Lemma 3.4. If L ∈ C=P then both of the following hold.

   • ∀_q L ∈ C=P for every polynomial q, where ∀_q L = { x : (x, y) ∈ L for all y ∈ {0, 1}^{q(|x|)} }.

   • ∨L ∈ C=P where ∨L = { (x_1, . . . , x_ℓ) : x_i ∈ L for some i }.

Proof of Theorem 2.10. The arguments are very similar, so we just give three representative examples:
FULLY-INDEPENDENT, K-WISE-EXCHANGEABLE, and 2-EXISTS-UNCORRELATED. First we mention
a useful tool: If S_1, S_2 are (1, 1)-samplers, then we define Equ(S_1, S_2) to be a (1, 1)-sampler that
picks i ∈ {1, 2} uniformly at random, runs S_i, and negates the output if i = 2. Hence Equ(S_1, S_2)(U) is
uniform iff S_1(U), S_2(U) are identically distributed.
    Now we prove that FULLY-INDEPENDENT ∈ C=P. Note that FULLY-INDEPENDENT = ∀_q L where,
if we view S as (say) a (k, n)-sampler, and y as (an appropriately encoded description of) an element of
({0, 1}^k)^n (so q is linear in the size of S), then

                           (S, y) ∈ L ⇐⇒ Pr[S(U) = y] = ∏_{i=1}^n Pr[S(U)_i = y_i] .

Thus by Lemma 3.4 it suffices to show that L ∈ C=P. A reduction from L to UNIFORM-BIT just outputs
Equ(S_1, S_2), where S_1 runs S and accepts iff the output is y, and S_2 runs S n times and accepts iff for
all i, the ith coordinate of the output of the ith run is y_i.
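The Equ tool admits a simple sanity check: since Equ(S_1, S_2)(U) picks i uniformly and negates the output when i = 2, its acceptance probability is (p_1 + (1 − p_2))/2, which equals 1/2 exactly when p_1 = p_2. A minimal Python sketch of this arithmetic; the encoding of a circuit as a function of an integer seed is our own illustration, not the paper's formalism:

```python
from fractions import Fraction

def pr_one(circuit, seed_bits):
    """Exact Pr[circuit(U) = 1] over uniform random seeds of the given length."""
    total = sum(circuit(seed) for seed in range(2 ** seed_bits))
    return Fraction(total, 2 ** seed_bits)

def equ_pr_one(p1, p2):
    """Acceptance probability of Equ(S1, S2): with probability 1/2 run S1,
    otherwise run S2 and negate.  Equals 1/2 iff p1 == p2."""
    return (p1 + (1 - p2)) / 2
```

For example, a circuit outputting the low bit of its seed has acceptance probability exactly 1/2, and equ_pr_one returns 1/2 precisely when the two input probabilities agree.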
     Now we prove that K-WISE-EXCHANGEABLE ∈ C=P. Note that K-WISE-EXCHANGEABLE = ∀_q L
where, if we view S as (say) a (k, n)-sampler, and y = (I, π, w) as (an appropriately encoded description
of) a subset I ⊆ {1, . . . , n} of size min(K, n), a permutation π on {1, . . . , min(K, n)}, and an element
w ∈ ({0, 1}^k)^{min(K,n)} (so q is certainly polynomial in the size of S), then

                        (S, (I, π, w)) ∈ L ⇐⇒ Pr[S(U)_I = w] = Pr[S(U)_I = π(w)]

where S(U)_I is the restriction to coordinates indexed by I, and π(w) ∈ ({0, 1}^k)^{min(K,n)} is obtained by
permuting the coordinates of w by π. Thus by Lemma 3.4 it suffices to show that L ∈ C=P. A reduction
from L to UNIFORM-BIT just outputs Equ(S_1, S_2), where S_1 runs S and accepts iff the output restricted
to I is w, and S_2 runs S and accepts iff the output restricted to I is π(w).
     Now we prove that 2-EXISTS-UNCORRELATED ∈ C=P. Note that if we define the language

                           L = { (S, i, j) : S(U)_i and S(U)_j are uncorrelated } ,

then 2-EXISTS-UNCORRELATED reduces to ∨L by mapping a (k, n)-sampler S to

                           ( (S, 1, 2), (S, 1, 3), (S, 1, 4), . . . , (S, n − 1, n) ) .

Thus by Lemma 3.4 it suffices to show that L ∈ C=P. A reduction from L to UNIFORM-BIT just outputs
Equ(S_1, S_2), where S_1 runs S yielding some y ∈ ({0, 1}^k)^n and accepts with probability (1/2^{2k}) · y_i · y_j so
that

                                 Pr[S_1(U) = 1] = (1/2^{2k}) · E(S(U)_i · S(U)_j) ,

and S_2 runs S twice (independently) yielding some y^{(1)} and y^{(2)} and accepts with probability (1/2^{2k}) ·
y_i^{(1)} · y_j^{(2)} so that

                                 Pr[S_2(U) = 1] = (1/2^{2k}) · E(S(U)_i) · E(S(U)_j) .

4     Proofs of exact algorithmic results
We prove Theorem 2.11, Theorem 2.12, and Theorem 2.13 in Section 4.1, Section 4.2, and Section 4.3,
respectively.
    First we introduce some terminology to describe 2-local samplers. Each output bit depends on either
zero, one, or two input bits. Output bits that depend on zero input bits are constants (0 or 1). The
nonconstant output bits can be modeled with an undirected graph (multi-edges and self-loops allowed) as
follows. The input bits are indexed by the nodes. Each output bit depending on one input bit is a self-loop,
labeled with a function from {0, 1} to {0, 1} (either the identity or negation). Each output bit depending
on two input bits is an edge between those two nodes, labeled with a function from {0, 1}2 to {0, 1}.
There are three types of such functions that depend on both bits: AND-type (accepting one of the four
inputs), XOR-type (accepting two of the four inputs), and OR-type (accepting three of the four inputs).
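The three-way classification of edge labels can be read directly off a gate's truth table: AND-type gates accept one of the four assignments, XOR-type gates accept two, and OR-type gates accept three. A small illustrative sketch; the dictionary encoding of a gate is hypothetical, and like the text it assumes the gate genuinely depends on both input bits:

```python
def edge_type(table):
    """Classify a 2-input gate (given as a dict from the four input pairs to
    output bits) by how many of the four assignments it accepts.  Assumes the
    function depends on both bits, as in the text's taxonomy."""
    ones = sum(table.values())
    if ones == 1:
        return "AND-type"
    if ones == 2:
        return "XOR-type"
    if ones == 3:
        return "OR-type"
    return "constant"  # depends on neither bit; excluded by assumption
```

For instance, the standard AND, XOR, and OR truth tables are classified as expected.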

4.1     Full independence for 2-local samplers
We prove Theorem 2.11. Consider a 2-local (1, n)-sampler S, and assume without loss of generality that S
has no constant output bits. We claim that S(U) is fully independent iff both of the following conditions
hold.

    (i) The graph is a forest, ignoring self-loops.

    (ii) Each connected component of the graph has at most one of the following: a self-loop, an AND-type
         edge, or an OR-type edge.

It is trivial to check in linear time whether these conditions hold.
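One possible implementation sketch: condition (i) is a standard acyclicity check and condition (ii) is a per-component count of non-XOR-type edges. The union-find version below runs in near-linear (inverse-Ackermann) time; a DFS traversal would give the strictly linear bound claimed in the text. The edge encoding is our own:

```python
def is_fully_independent(n_nodes, edges):
    """Check conditions (i) and (ii): ignoring self-loops the graph is a forest,
    and each component has at most one self-loop/AND-type/OR-type edge.
    edges: list of (u, v, kind) with kind in {"XOR", "AND", "OR", "LOOP"};
    a self-loop has u == v."""
    parent = list(range(n_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v, kind in edges:
        if u != v:
            ru, rv = find(u), find(v)
            if ru == rv:
                return False  # cycle among non-loop edges: condition (i) fails
            parent[ru] = rv

    non_xor = {}  # component root -> count of non-XOR-type edges
    for u, v, kind in edges:
        if kind != "XOR":
            r = find(u)
            non_xor[r] = non_xor.get(r, 0) + 1
            if non_xor[r] > 1:
                return False  # condition (ii) fails
    return True
```

A path of XOR-type edges passes, while any cycle or a component with two non-XOR-type edges is rejected.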
     First we assume that (i) and (ii) both hold, and show that S(U) is fully independent. The different
connected components of the graph are certainly fully independent of each other, so we can focus on
showing that the coordinates of a single connected component are fully independent. If there is a self-loop,
an AND-type edge, or an OR-type edge in the connected component, then let e be that edge. Otherwise,
let e be any edge in the connected component. We show that conditioned on e evaluating to any particular


bit, the joint distribution of the remaining edges in e’s connected component is uniform. This implies that
the whole joint distribution of the connected component is fully independent.
     Suppose e is a self-loop at some node v, so we are conditioning on v being some particular bit.
Ignoring e itself, we can view e’s connected component as a tree rooted at v with only XOR-type
edges. After the conditioning, there is a bijection between the set of all assignments of values to the
edges (excluding e) and the set of all assignments of values to the nodes (excluding v) in e’s connected
component: An assignment to nodes (together with the conditioned value of v) determines an assignment
to edges. Furthermore, every assignment to edges arises from some assignment to nodes, because for any
assignment to edges, we can start at v and work our way downward to the leaves, uniquely specifying the
value of each node in terms of the values of its parent and the edge to its parent. Since the sets have the
same size, we have exhibited a bijection between them. This means that conditioned on either value of e,
the joint distribution of all the other edges in e’s connected component is uniform.
     Now suppose e = {u, v} is not a self-loop. We show that, in fact, conditioned on any one of the
four assignments of values to the pair u, v, the joint distribution of all the other edges in e’s connected
component is uniform. Removing e results in two new connected components, each of which is a tree
of XOR-type edges, one rooted at u and the other rooted at v. Let U denote the set of nodes in u’s
new connected component excluding u itself, and let V denote the set of nodes in v’s new connected
component excluding v itself. By the argument from the previous paragraph (when e was a self-loop),
a uniformly random assignment to U induces a uniformly random assignment to the edges in u’s new
connected component, and similarly for V . Since assignments to U and V are chosen independently of
each other, this means that the values of all the edges in e’s original connected component (except e itself)
are jointly uniformly distributed (conditioned on any particular assignment to u, v, and hence conditioned
on any particular assignment to e).
     Now we prove the converse by assuming that (i) and (ii) do not both hold, and showing that S(U) is
not fully independent. Let us refer to self-loops, AND-type edges, and OR-type edges as non-XOR-type
edges. If (i) and (ii) do not both hold, then at least one of the following conditions holds.

 (A) There is a cycle consisting entirely of XOR-type edges.

 (B) There is a cycle with exactly one AND-type edge or OR-type edge.

 (C) There is a path between two non-XOR-type edges.

Suppose (A) holds. Let e be an edge on the cycle. Then e's marginal distribution is uniform, but
conditioning on any particular values of the other edges on the cycle determines whether or not e's
endpoints are the same bit as each other, and thus fixes the value of e. Hence S(U) is not fully independent.
Suppose (B) holds. Let ℓ denote the number of nodes on the cycle. Then the probability that all edges on
the cycle evaluate to 1 must be an integer multiple of 1/2^ℓ (since they only depend on ℓ input bits), but the
product of the marginal probabilities that each edge on the cycle evaluates to 1 must be either 1/2^{ℓ+1} (if
there is an AND-type edge) or 3/2^{ℓ+1} (if there is an OR-type edge). Hence S(U) is not fully independent.
Suppose (C) holds. Without loss of generality, all intermediate edges on the path are XOR-type. Let e_1
and e_2 be the two non-XOR-type edges, which we consider to be part of the path. Let ℓ denote the number
of nodes on the path. Then the probability that all edges on the path evaluate to 1 must be an integer
multiple of 1/2^ℓ (since they only depend on ℓ input bits), but the product of the marginal probabilities


that each edge on the path evaluates to 1 must be either 1/2^{ℓ+1} (if neither e_1 nor e_2 is OR-type) or 3/2^{ℓ+1}
(if exactly one of e_1, e_2 is OR-type) or 9/2^{ℓ+1} (if both e_1 and e_2 are OR-type). Hence S(U) is not fully
independent.

4.2     Exchangeability for 2-local samplers
We prove Theorem 2.12. We begin with a lemma.

Lemma 4.1. A joint distribution D over ({0, 1}^1)^n is exchangeable iff both of the following conditions
hold.

  (1) The marginals D_i are all identically distributed.

  (2) For all i ≠ j, if Pr[D_i ≠ D_j] > 0 then the joint distribution of the other n − 2 coordinates is the
      same when conditioned on (D_i = 1, D_j = 0) as it is when conditioned on (D_i = 0, D_j = 1).

Proof of Lemma 4.1. Suppose D is exchangeable. Then (1) holds trivially. To see that (2) holds, consider
i ≠ j such that Pr[D_i ≠ D_j] > 0. First note that since D_i, D_j are identically distributed,

                                 Pr[D_i = 1, D_j = 0] = Pr[D_i = 0, D_j = 1] > 0 .

For some arbitrary particular bits b_h (for h ∉ {i, j}), let E denote the event that D_h = b_h for all h ∉ {i, j}.
Then we have

                        Pr[E | D_i = 1, D_j = 0] = Pr[E and D_i = 1, D_j = 0] / Pr[D_i = 1, D_j = 0]
                                                 = Pr[E and D_i = 0, D_j = 1] / Pr[D_i = 0, D_j = 1]
                                                 = Pr[E | D_i = 0, D_j = 1] ,

where

                           Pr[E and D_i = 1, D_j = 0] = Pr[E and D_i = 0, D_j = 1]

holds by exchangeability. This shows that (2) holds.
    For the converse, suppose (1) and (2) both hold. Since every permutation is a composition of
transpositions, it suffices to show that the joint distribution is invariant under transposing coordinates. Let
D′ be obtained from D by transposing some coordinates i ≠ j. For some arbitrary particular bits b_h (for
h ∈ {1, . . . , n}), let E denote the event that D_h = b_h for all h ∈ {1, . . . , n}, and let E′ denote the event that
D′_h = b_h for all h ∈ {1, . . . , n}. To show that D and D′ are equal as distributions, we just need to show
that Pr[E] = Pr[E′]. If b_i = b_j then this certainly holds since E and E′ are the same event. If b_i ≠ b_j and
Pr[D_i ≠ D_j] = 0 then Pr[E] = Pr[E′] = 0. Finally, suppose b_i ≠ b_j and Pr[D_i ≠ D_j] > 0. Assume b_i = 1
and b_j = 0; the other case is symmetric. Since D_i, D_j are identically distributed by (1), it follows that

                     Pr[D_i = 1, D_j = 0] = Pr[D_i = 0, D_j = 1] = Pr[D′_i = 1, D′_j = 0] > 0 .


Also note that

                               Pr[E | D_i = 1, D_j = 0] = Pr[E′ | D′_i = 1, D′_j = 0]

by (2) and the definition of D′. Putting the pieces together, we have

                          Pr[E] = Pr[E | D_i = 1, D_j = 0] · Pr[D_i = 1, D_j = 0]
                                = Pr[E′ | D′_i = 1, D′_j = 0] · Pr[D′_i = 1, D′_j = 0]
                                = Pr[E′] .

This finishes the proof of Lemma 4.1.
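For small, explicitly given distributions, the transposition argument in the proof can be checked directly: exchangeability is equivalent to invariance under every transposition of coordinates, since transpositions generate all permutations. A brute-force sketch, assuming the joint distribution is handed to us as a dictionary from outcome tuples to probabilities:

```python
from fractions import Fraction

def is_exchangeable(dist, n):
    """Direct check of exchangeability for a distribution over n binary
    coordinates: every transposition of coordinates must leave the
    probability of every outcome unchanged."""
    for i in range(n):
        for j in range(i + 1, n):
            for x, p in dist.items():
                y = list(x)
                y[i], y[j] = y[j], y[i]  # transpose coordinates i and j
                if dist.get(tuple(y), Fraction(0)) != p:
                    return False
    return True
```

The uniform distribution on Hamming-weight-1 strings (which arises in Case 1/4 of the analysis below) passes this check, while any distribution concentrated on a single asymmetric string fails it.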

    Now we present the proof of Theorem 2.12. Consider a 2-local (1, n)-sampler S. If condition (1)
from Lemma 4.1 does not hold for S(U), then we can reject outright. Otherwise, there are five cases
corresponding to the marginal probability that any particular coordinate is 1.

Case 0:    If all output bits of S are constant 0 then S(U) is trivially exchangeable.

Case 1/4: In this case, each edge of the graph is AND-type. When two AND-type edges share an
endpoint, we say that they agree on the endpoint if the unique assignments that make the two edges
evaluate to 1 agree on the value of the node. We assume without loss of generality that the graph has no
nodes of degree 0. We claim that S(U) is exchangeable iff at least one of the following conditions holds.

  (i) The edges are all disjoint.

  (ii) The graph is a star, and all edges agree on the central node.4

 (iii) The graph is a triangle, and there is agreement at all nodes.5

 (iv) The graph is a triangle, and there is disagreement at all nodes.

  (v) There are only three nodes u, v, w, and there are no {u, w} edges, there are at most two {u, v} edges
      and they agree on v and disagree on u, there are at most two {v, w} edges and they agree on v and
      disagree on w, and the {u, v} edges disagree with the {v, w} edges on v.

 (vi) There are only two nodes, and no two edges agree on both nodes.

(vii) There are only two nodes, and all edges agree on both nodes.

It is trivial to check in linear time whether at least one of these conditions holds.
     First we assume at least one of the conditions holds, and argue that S(U) is exchangeable. If (i) holds
then S(U) is i. i. d. where each coordinate has 1/4 probability of being 1, and this is exchangeable. If (ii)
holds then S(U) is a uniform mixture of the uniform distribution and the constant all 0’s distribution, so
S(U) is exchangeable since it is a mixture of i. i. d.’s. If (iii) holds then outputs of Hamming weight 1 each
  4 A star does not include multi-edges.
  5 A triangle does not include multi-edges.




have probability 1/8, and outputs of Hamming weight 2 each have probability 0, so S(U) is exchangeable
since the probability of an output only depends on the Hamming weight. If (iv) or (v) or (vi) holds then
an edge evaluating to 1 forces all other edges to evaluate to 0, so S(U) is all 0’s with probability 1 − (n/4)
and is otherwise uniformly distributed on strings of Hamming weight 1, so S(U) is exchangeable. If (vii)
holds then S(U) is all 1’s with probability 1/4 and all 0’s with probability 3/4, which is exchangeable.
    We prove the converse with two lemmas, which show that the following conditions are the only
obstacles to exchangeability. We write ∃e1 , e2 , e3 with the tacit assumption that these are three distinct
edges.

 (A) ∃e1 , e2 , e3 such that e1 , e3 are not disjoint, and e2 , e3 are disjoint.

 (B) ∃e1 , e2 , e3 all sharing an endpoint on which e1 , e2 disagree, and such that e3 does not share its other
     endpoint with e1 or with e2 .

 (C) ∃e1 , e2 , e3 such that e1 , e3 share both endpoints, and e2 shares exactly one endpoint with them, and
     they all agree on the common node.

 (D) ∃e1 , e2 , e3 forming a triangle, such that e1 , e3 agree and e2 , e3 disagree.

  (E) ∃e1 , e2 , e3 such that e1 , e3 share and agree on both endpoints, and e2 either does not share both
      endpoints or does not agree on both endpoints.

Lemma 4.2. If none of (i)–(vii) hold then at least one of (A)–(E) holds.

Proof. Assume none of (i)–(vii) hold. Suppose the graph is not connected. Then since (i) fails, there are
two edges that are not disjoint, and there is also another edge not in their connected component, so (A)
holds. Henceforth suppose the graph is connected.
    Suppose there are at least four nodes. If there is a simple path of length three, then (A) holds, so
suppose there is no such path. Then the graph is “star-like,” meaning it would be a star if multi-edges
were replaced with single edges. If there is not complete agreement on the central node then (B) holds;
otherwise, since (ii) fails, there must be multi-edges and so (C) holds.
    Now suppose there are exactly three nodes and there is a triangle. If there are no multi-edges, then
since (iii) and (iv) fail, (D) holds. If there is a multi-edge pair, then either these two edges disagree on
some endpoint, in which case (D) holds, or they agree on both endpoints, in which case (E) holds.
    Now suppose there are exactly three nodes and there is no triangle. Since the graph is connected,
there is a length-2 path, say {u, v}, {v, w}. Since (v) fails, either there are two {u, v} edges that disagree
on v, in which case (B) holds, or there are two {u, v} edges that agree on both u and v, in which case (E)
holds, or analogous situations happen with {v, w}, or all edges agree on v and there are either two {u, v}
edges or two {v, w} edges, in which case (C) holds. Note that there cannot be just one {u, v} edge and
just one {v, w} edge, since if they agreed then (ii) would hold, and if they disagreed then (v) would hold.
    Finally, if there are exactly two nodes then since (vi) and (vii) fail, (E) holds.

Lemma 4.3. If at least one of (A)–(E) holds, then S(U) is not exchangeable.


Proof. Assuming at least one of (A)–(E) holds, we use condition (2) from Lemma 4.1 to refute exchange-
ability of S(U) by showing that the marginal probability that e3 = 1 (more precisely, the random variable
indexed by e3 evaluates to 1) changes when we go from conditioning on (e1 = 1, e2 = 0) to conditioning
on (e1 = 0, e2 = 1).
    If (A) holds and e1 , e3 share both endpoints then it goes either from 1 to 0 (if e1 , e3 agree on both
endpoints) or from 0 to 1/3. If (A) holds and e1 , e3 share only one endpoint and e1 , e2 are disjoint then it
goes either from 1/2 to 1/6 (if e1 , e3 agree) or from 0 to 1/3. If (A) holds and e3 , e1 , e2 form a simple
path then it goes either from 1/2 to 0 (if e1 , e3 agree and e1 , e2 agree) or from 1/2 to 1/4 (if e1 , e3 agree
and e1 , e2 disagree) or from 0 to 1/2 (if e1 , e3 disagree and e1 , e2 agree) or from 0 to 1/4.
    If (B) holds then it goes either from 1/2 to 0 (if e1 , e3 agree) or from 0 to 1/2. If (C) holds then it
goes either from 1 to 0 (if e1 , e3 agree on both endpoints) or from 0 to 1. If (D) holds then it goes either
from 1 to 0 (if e1 , e2 agree) or from 1/2 to 0. If (E) holds then it goes from 1 to 0.

    Interestingly, the above analysis shows that in Case 1/4, S(U) is “globally exchangeable” iff it is
“locally exchangeable” in the sense that every set of three coordinates is exchangeable.

Case 1/2: In this case, each edge of the graph is either XOR-type or a self-loop. We say that XOR-type
multi-edges agree if they compute the same function, and similarly we say that multi-self-loops agree if
they compute the same function. We assume without loss of generality that the graph has no nodes of
degree 0. We claim that S(U) is exchangeable iff at least one of the following conditions holds.
  (i) The graph is a forest ignoring self-loops, and there is at most one self-loop per connected compo-
      nent.
  (ii) The graph is a simple cycle.6
 (iii) The graph is a simple path but with two self-loops, one at each end.
 (iv) There are only two nodes, no self-loops, and all edges agree.
  (v) There are only two nodes, no self-loops, and only two edges, which disagree.
 (vi) There is only one node, with agreeing self-loops.
 (vii) There is only one node, with only two self-loops, which disagree.
It is trivial to check in linear time whether at least one of these conditions holds.
     First we assume at least one of the conditions holds, and argue that S(U) is exchangeable. If (i) holds
then S(U) is uniform by the characterization in the proof of Theorem 2.11. If (ii) or (iii) holds then S(U)
is the same as conditioning the uniform distribution on having a particular parity, so S(U) is invariant
under permuting coordinates (by commutativity and associativity of addition over GF(2)). If (iv) or (vi)
holds then S(U) is all 1’s with probability 1/2 and all 0’s with probability 1/2, which is exchangeable. If
(v) or (vii) holds then S(U) is uniform over the two possibilities 01 and 10, which is exchangeable.
     We prove the converse with two lemmas, which show that the following conditions are the only
obstacles to exchangeability.
   6 A simple cycle does not include multi-edges or self-loops, unless it has length 2, in which case it is a multi-edge pair.




 (A) ∃ a cycle C of XOR-type edges, and an edge e that is either a self-loop or has at least one endpoint
     not on C.

 (B) ∃ a path P (of zero or more XOR-type edges), two self-loops with one at each end of P, and an
     edge e at least one of whose nodes is not on P.

 (C) ∃ three XOR-type edges all sharing both endpoints, such that some but not all of these edges agree.

 (D) ∃ three self-loops all sharing the same node, such that some but not all of these edges agree.

Lemma 4.4. If none of (i)–(vii) hold then at least one of (A)–(D) holds.

Proof. Assume none of (i)–(vii) hold. If there is only one node, then since (vi) and (vii) fail, (D) holds.
Henceforth suppose there are at least two nodes. Since (i) fails, the graph has either a cycle of XOR-type
edges, or two self-loops in the same connected component.
     If it has a cycle of XOR-type edges, then let C be a shortest such cycle. Since (ii) fails, there exists
another edge. If there exists another edge e that is either a self-loop or has at least one endpoint not on
C, then (A) holds. Otherwise, all other edges are XOR-type with both endpoints on C. If C had length
at least three, this would contradict the minimal nature of C. Hence there are only two nodes, with no
self-loops and with at least three XOR-type edges (two of them forming C). Then since (iv) and (v) fail,
(C) holds.
     On the other hand, if the graph is a forest ignoring self-loops but has two self-loops in the same
connected component, then consider two such self-loops that are closest, and let P be the unique path
between them. If P has length zero (so the self-loops are at the same node), then (B) holds since we
are assuming there are at least two nodes (and the other node is incident to some edge e). Otherwise,
since (iii) fails, there exists another edge e. If e were XOR-type with both endpoints on P, then there
would be a cycle of XOR-type edges (contradicting the assumption that the graph is a forest ignoring
self-loops), and if e were another self-loop on P then this would contradict the minimal nature of P (since
P has length ≥ 1). Thus at least one of e’s nodes is not on P, so (B) holds.

Lemma 4.5. If at least one of (A)–(D) holds, then S(U) is not exchangeable.

Proof. Assuming at least one of (A)–(D) holds, we use condition (2) from Lemma 4.1 to refute exchange-
ability of S(U) by exhibiting edges e1 , e2 for which the joint distribution of the evaluations of the other
edges changes when we go from conditioning on (e1 = 1, e2 = 0) to conditioning on (e1 = 0, e2 = 1).
    If (A) holds then let e1 = e and e2 be any edge on C. The joint distribution of the other edges on
C (besides e2 ) goes from being uniform conditioned on having a particular parity to being uniform
conditioned on having the opposite parity.
    If (B) holds then let e1 = e and e2 be one of the two self-loops at the ends of P. The joint distribution
of the other edges on P, together with the self-loop at the other end of P, goes from being uniform
conditioned on having a particular parity to being uniform conditioned on having the opposite parity.
    If (C) or (D) holds then call the edges e1 , e2 , e3 where e1 , e2 disagree and e2 , e3 agree. Then the
marginal probability that e3 evaluates to 1 goes from 0 to 1.


Case 3/4: In this case, each edge of the graph is OR-type. Let S̄ denote the circuit obtained from S by
negating every output bit. Then S(U) is exchangeable iff S̄(U) is exchangeable. Every edge of the graph
for S̄ is AND-type, so we can use the characterization from Case 1/4 to decide in linear time whether
S̄(U) is exchangeable.


Case 1:     If all output bits of S are constant 1 then S(U) is trivially exchangeable.


4.3     Pairwise independence for 2-local samplers
We prove Theorem 2.13. Consider a 2-local (1, n)-sampler S, and assume without loss of generality
that S has no constant output bits. We claim that S(U) is pairwise independent iff both of the following
conditions hold.

    (i) The graph has no multi-edges.

    (ii) For each node v of the graph, there is at most one of the following among the edges incident to v: a
         self-loop, an AND-type edge, or an OR-type edge.

It is trivial to check in linear time whether condition (ii) holds. Condition (i) is an instance of the element
distinctness problem, which is the problem of deciding whether a list of numbers (encoding pairs of nodes,
in our situation) has no duplicates. The element distinctness problem can be solved in deterministic
O(n log n) time by sorting, and it can be solved in zero-error randomized expected linear time.7 We
supply a folklore proof of the following lemma in Section A.2.

Lemma 4.6. The element distinctness problem has a zero-error randomized expected linear-time algo-
rithm.
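The idea behind the lemma can be sketched briefly: under the model of footnote 7, a hash table supports expected-constant-time insertions and lookups, so duplicates can be detected in expected linear time with no possibility of error. This Python sketch leans on the built-in hash-based set rather than the explicit random hashing of the folklore proof in Section A.2:

```python
def has_duplicate(values):
    """Zero-error element-distinctness sketch: insert each value into a
    hash-based set and report the first collision.  Expected linear time
    under the usual constant-time hashing assumption; always correct."""
    seen = set()
    for v in values:
        if v in seen:
            return True
        seen.add(v)
    return False
```

In our situation the values would be the (sorted) endpoint pairs of the graph's edges, so a duplicate corresponds exactly to a multi-edge.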

    We now verify that (i) and (ii) characterize pairwise independence. First we assume that (i) and (ii)
both hold, and show that the evaluations of two arbitrary edges e1 , e2 are independent. If e1 , e2 are disjoint
then this is immediate; otherwise they share a node v. Since (i) and (ii) hold, the characterization in
the proof of Theorem 2.11 implies that the edges incident to v are fully independent of each other; in
particular e1 , e2 are independent. Conversely, suppose (i) and (ii) do not both hold. A simple case analysis
shows that if two edges form a multi-edge pair, or if they share a node and neither is XOR-type, then they
cannot be independent, and so S(U) is not pairwise independent.
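Putting the two conditions together gives a simple decision procedure: the multi-edge test of condition (i) is element distinctness on endpoint pairs (here done with a hash set, matching the expected-linear-time bound of Lemma 4.6), and condition (ii) is a per-node count of incident non-XOR-type edges. The edge encoding below is our own illustration:

```python
def is_pairwise_independent(n_nodes, edges):
    """Check conditions (i) and (ii) for a 2-local sampler's graph: no
    multi-edges, and each node has at most one incident
    self-loop/AND-type/OR-type edge.
    edges: list of (u, v, kind) with kind in {"XOR", "AND", "OR", "LOOP"};
    a self-loop has u == v."""
    seen = set()
    non_xor_at = [0] * n_nodes
    for u, v, kind in edges:
        key = (min(u, v), max(u, v))
        if key in seen:
            return False  # multi-edge: condition (i) fails
        seen.add(key)
        if kind != "XOR":
            for w in {u, v}:  # a self-loop counts once at its node
                non_xor_at[w] += 1
                if non_xor_at[w] > 1:
                    return False  # condition (ii) fails
    return True
```

A path mixing one AND-type edge with XOR-type edges passes, while a multi-edge pair or two non-XOR-type edges meeting at a node is rejected.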


5      Proofs of approximate completeness results
We prove Theorem 2.16 and Theorem 2.17 in Section 5.1, and we prove Theorem 2.19 and Theorem 2.20
in Section 5.2.

    7 As usual, we assume a model of computation where arithmetic operations and array look-ups take constant time.




Definition 5.1. For functions 0 ≤ c(s) < f(s) ≤ 1, STATISTICAL-DISTANCE^{c,f} is the following promise
problem.

      STATISTICAL-DISTANCE^{c,f}_YES = { S : S is a (k, 2, s)-sampler and S(U)_1, S(U)_2 are c(s)-close } ,
      STATISTICAL-DISTANCE^{c,f}_NO  = { S : S is a (k, 2, s)-sampler and S(U)_1, S(U)_2 are f(s)-far } .

Without loss of generality, S(U) is independent.

      Sahai and Vadhan [31] proved the following two theorems.

Theorem 5.2. STATISTICAL-DISTANCE^{c,f} is prSZK-hard for all constants 0 < c < f < 1.

Theorem 5.3. STATISTICAL-DISTANCE^{c,f} ∈ prSZK for all constants 0 < c < f² < 1.

     More generally, Sahai and Vadhan proved that for all functions c, f computable in time polynomial in
s, the problem STATISTICAL-DISTANCE^{c,f} is prSZK-hard if c = 2^{−s^{o(1)}} and f = 1 − 2^{−s^{o(1)}}, and is in prSZK
if c ≤ f² − s^{−O(1)}. This can be used to improve the parameters (as functions of s) in our theorems. For
example, in Theorem 2.16, c can be 2^{−s^{o(1)}} and f can be (1/4) − 2^{−s^{o(1)}}. We chose to state our theorems
using constants (except Theorem 2.17, where the 1/(n + 1) factor is needed) for simplicity and clarity. It
is awkward to handle reductions when the c, f functions depend on the size of one circuit for one problem
but on the size of a different circuit for the other problem.
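For intuition, the statistical (total variation) distance underlying these problems can be computed directly from its definition when the distributions are given explicitly. A minimal sketch, with distributions represented as probability dictionaries (a representation assumed here purely for illustration):

```python
def tv_distance(p, q):
    """Total variation distance: (1/2) * sum over the joint support of
    |p(x) - q(x)|. Distributions are dicts mapping outcomes to masses."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)
```

Of course, the distributions sampled by circuits can have exponentially large supports, so this brute-force computation is only feasible for toy examples; the completeness results in this paper indicate that no efficient procedure of this kind is expected to exist in general.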

5.1     Approximate full independence
We now prove Theorem 2.16 and Theorem 2.17.

Proof of Theorem 2.16. We reduce STATISTICAL-DISTANCE^{2c,4f} (which is prSZK-hard according to
Theorem 5.2) to FULLY-INDEPENDENT^{c,f}. Given a (k, 2)-sampler S, let D = S(U). The reduction
outputs a sampler S′ that outputs (b, w) where b ∈ {1, 2} is chosen uniformly at random and w ∈ {0,1}^k
is a sample from Db. This yields a distribution D′ = S′(U) over {1, 2} × {0,1}^k, but D′ can be viewed as
a distribution over ({0,1}^k)² by embedding {1, 2} into {0,1}^k.
     First we show that if S ∈ STATISTICAL-DISTANCE^{2c,4f}_YES then S′ ∈ FULLY-INDEPENDENT^{c,f}_YES. Sup-
pose ‖D1 − D2‖ ≤ 2c. Let D* be the independent distribution whose first coordinate is uniform over
{1, 2} and whose second coordinate is D1. For any event E ⊆ {1, 2} × {0,1}^k and b ∈ {1, 2}, let
Eb = { z ∈ {0,1}^k : (b, z) ∈ E }. We have

    Pr[D′ ∈ E] = (1/2)·Pr[D1 ∈ E1] + (1/2)·Pr[D2 ∈ E2]   and   Pr[D* ∈ E] = (1/2)·Pr[D1 ∈ E1] + (1/2)·Pr[D1 ∈ E2].

This implies that

    |Pr[D′ ∈ E] − Pr[D* ∈ E]| = (1/2)·|Pr[D2 ∈ E2] − Pr[D1 ∈ E2]| ≤ (1/2)·2c = c

where the inequality follows by ‖D1 − D2‖ ≤ 2c. Hence ‖D′ − D*‖ ≤ c.


    Now we show that if S ∈ STATISTICAL-DISTANCE^{2c,4f}_NO then S′ ∈ FULLY-INDEPENDENT^{c,f}_NO. Suppose
‖D′ − D*‖ < f for some distribution D* (not necessarily the same as above) that is independent. Assume
Pr[D*_1 = 1] ≤ Pr[D*_1 = 2] (the other case is symmetric). For any event E ⊆ {0,1}^k, we have

    Pr[D1 ∈ E] = 2·Pr[D′ ∈ {1} × E]
               < 2·(Pr[D* ∈ {1} × E] + f)
               ≤ 2·(Pr[D* ∈ {2} × E] + f)
               < 2·(Pr[D′ ∈ {2} × E] + 2f)
               = 2·((1/2)·Pr[D2 ∈ E] + 2f)
               = Pr[D2 ∈ E] + 4f

where the third line follows by independence and by the assumption that Pr[D*_1 = 1] ≤ Pr[D*_1 = 2]. Hence
by the second equality in Definition 2.14, ‖D1 − D2‖ < 4f.
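The sampler S′ built by this reduction is simple enough to render directly. A hedged sketch: the function name is hypothetical, and S is modeled as a Python callable returning one joint sample (w1, w2), an assumption made for illustration.

```python
import random

def make_s_prime(sample_S):
    """Given a sampler for a pair (D1, D2), return a sampler for
    D' = (b, w): b is uniform in {1, 2} and w is drawn from D_b."""
    def sample_S_prime():
        w1, w2 = sample_S()  # one joint sample; only coordinate b is kept
        b = random.choice((1, 2))
        return (b, w1 if b == 1 else w2)
    return sample_S_prime
```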

      The proof of Theorem 2.17 uses the following lemma.

Lemma 5.4. Suppose D is a distribution over ({0,1}^k)^n. If D is c-close to some fully independent
distribution D*, then D is (n + 1)c-close to the distribution D′ that is fully independent and has the same
marginals as D.

   Lemma 5.4 can be proven using a simple hybrid argument. The case n = 2 was proven in [5], but the
same argument works for general n; we omit the details.
Proof of Theorem 2.17. We reduce FULLY-INDEPENDENT^{c,f} to STATISTICAL-DISTANCE^{c′,f} (which is
in prSZK by Theorem 5.3). Given a (k, n)-sampler S, construct a (k, n)-sampler S′ that runs S indepen-
dently n times and outputs (w1, …, wn) where wi is the ith coordinate of the output of the ith run. Let
D = S(U) and D′ = S′(U), and note that D′ is as in the statement of Lemma 5.4.
    The reduction outputs a (kn, 2)-sampler Ŝ whose first coordinate is a sample from D and whose
second coordinate is a sample from D′. If S ∈ FULLY-INDEPENDENT^{c,f}_YES then by Lemma 5.4,

    ‖D − D′‖ ≤ (n + 1)c = c′

and hence Ŝ ∈ STATISTICAL-DISTANCE^{c′,f}_YES. If S ∈ FULLY-INDEPENDENT^{c,f}_NO then ‖D − D′‖ ≥ f since
D′ is fully independent, and hence Ŝ ∈ STATISTICAL-DISTANCE^{c′,f}_NO.
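The sampler S′ in this proof also admits a direct rendering; a sketch under the assumption (made for illustration) that S is a Python callable returning an n-tuple:

```python
def make_independent_version(sample_S, n):
    """Run S independently n times and keep the i-th coordinate of the
    i-th run. The result is fully independent with the same marginals
    as S(U), matching the distribution D' of Lemma 5.4."""
    def sample_S_prime():
        return tuple(sample_S()[i] for i in range(n))
    return sample_S_prime
```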

5.2     Approximate exchangeability
We now prove Theorem 2.19 and Theorem 2.20.

Proof of Theorem 2.19. We reduce STATISTICAL-DISTANCE^{c,2f} (which is prSZK-hard by Theorem 5.2)
to EXCHANGEABLE^{c,f}. Given a (k, 2)-sampler S and letting D = S(U) where, without loss of generality,
D1, D2 are independent, the reduction is the identity map.
     First we show that if S ∈ STATISTICAL-DISTANCE^{c,2f}_YES then S ∈ EXCHANGEABLE^{c,f}_YES. Suppose
‖D1 − D2‖ ≤ c. Let D* be the independent distribution over ({0,1}^k)² both of whose marginals are
D1. Since D* is i. i. d., it is exchangeable. For any event E ⊆ ({0,1}^k)² and y ∈ {0,1}^k, let
E_y = { z ∈ {0,1}^k : (y, z) ∈ E }. By independence, we have

    Pr[D ∈ E] = Σ_{y∈{0,1}^k} Pr[D1 = y]·Pr[D2 ∈ E_y]

and

    Pr[D* ∈ E] = Σ_{y∈{0,1}^k} Pr[D1 = y]·Pr[D1 ∈ E_y].

This implies that

    |Pr[D ∈ E] − Pr[D* ∈ E]| ≤ Σ_{y∈{0,1}^k} Pr[D1 = y]·|Pr[D2 ∈ E_y] − Pr[D1 ∈ E_y]|
                             ≤ Σ_{y∈{0,1}^k} Pr[D1 = y]·c
                             = c

where the second inequality follows by ‖D1 − D2‖ ≤ c. Hence ‖D − D*‖ ≤ c.
    Now we show that if S ∈ STATISTICAL-DISTANCE^{c,2f}_NO then S ∈ EXCHANGEABLE^{c,f}_NO. Suppose
‖D − D*‖ < f for some distribution D* (not necessarily the same as above) that is exchangeable. In
particular, D*_1, D*_2 are identically distributed. We trivially have ‖D1 − D*_1‖ ≤ ‖D − D*‖ and ‖D2 − D*_2‖ ≤
‖D − D*‖. Thus by the triangle inequality, ‖D1 − D2‖ ≤ ‖D1 − D*_1‖ + ‖D2 − D*_2‖ < 2f.

      The proof of Theorem 2.20 uses the following lemma.

Lemma 5.5. Suppose D is a distribution over ({0,1}^k)^n. If D is c-close to some exchangeable distribution
D*, then D is 2c-close to the distribution D′ obtained by drawing a sample from D then permuting the
coordinates according to a uniformly random permutation.
Proof of Lemma 5.5. For a multiset W ⊆ {0,1}^k of size n, we say that w ∈ ({0,1}^k)^n is an ordering of W
if the multiset { wi : i ∈ {1, …, n} } equals W. Let Ord(W) denote the set of all orderings of W. Let d*+_W
be the sum of Pr[D = w] − Pr[D* = w] over all w ∈ Ord(W) such that Pr[D = w] − Pr[D* = w] > 0, and
let d*−_W be the sum of Pr[D* = w] − Pr[D = w] over all w ∈ Ord(W) such that Pr[D* = w] − Pr[D = w] > 0.
Then by the third equality in Definition 2.14, we have

    ‖D − D*‖ = Σ_{multisets W ⊆ {0,1}^k of size n} (d*+_W + d*−_W)/2.                    (5.1)

Letting d′+_W and d′−_W be the analogous quantities with D′ instead of D*, we have

    ‖D − D′‖ = Σ_{multisets W ⊆ {0,1}^k of size n} (d′+_W + d′−_W)/2.                    (5.2)



Now fix some W. Note that since D* is exchangeable, all elements of Ord(W) have the same probability
under D*; call this probability p*_W. If w is an element of Ord(W) then permuting the coordinates of w
uniformly at random yields a uniformly random element of Ord(W). Thus all elements of Ord(W) have
the same probability under D′, namely

    p′_W = (1/|Ord(W)|)·Σ_{w∈Ord(W)} Pr[D = w].

If p′_W ≥ p*_W then d′+_W ≤ d*+_W by definition. If p′_W ≤ p*_W then d′−_W ≤ d*−_W by definition. We also have

    0 = (Σ_{w∈Ord(W)} Pr[D = w]) − |Ord(W)|·p′_W
      = Σ_{w∈Ord(W)} (Pr[D = w] − p′_W)
      = d′+_W − d′−_W

which implies that d′+_W = d′−_W ≤ max(d*+_W, d*−_W). Hence d′+_W + d′−_W ≤ 2·max(d*+_W, d*−_W) ≤ 2·(d*+_W + d*−_W).
Since this holds for all W, we get ‖D − D′‖ ≤ 2·‖D − D*‖ by (5.1) and (5.2).

    We mention that the constant factor of 2 in Lemma 5.5 is tight, by the following example. Suppose
k = 1, and suppose D is uniformly distributed over a set of n strings, one of which has Hamming weight
1 and the other n − 1 of which have Hamming weight n − 1. Let D* be uniformly distributed over the
strings of Hamming weight n − 1. Note that D* is exchangeable, and ‖D − D*‖ = 1/n. However, D′
has probability 1/n² on each string of Hamming weight 1, and probability (n − 1)/n² on each string of
Hamming weight n − 1, and thus

    ‖D − D′‖ = 2·(1 − 1/n)·(1/n) = 2·(1 − 1/n)·‖D − D*‖.
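This example can be verified exactly for small n by brute force. A sketch using exact rational arithmetic (helper names are illustrative; D uses the weight-1 string e(0) and the complements of e(0), …, e(n−2), though any n − 1 of the weight-(n−1) strings give the same distances):

```python
import math
from fractions import Fraction
from itertools import permutations

def tv(p, q):
    """Exact total variation distance between two distribution dicts."""
    keys = set(p) | set(q)
    return sum(abs(p.get(w, Fraction(0)) - q.get(w, Fraction(0))) for w in keys) / 2

def tightness_example(n):
    """Build D, D*, D' from the example (k = 1) and return the pair
    (||D - D'||, ||D - D*||)."""
    e = lambda i: tuple(1 if j == i else 0 for j in range(n))  # weight 1
    comp = lambda w: tuple(1 - b for b in w)                   # weight n-1
    D = {w: Fraction(1, n) for w in [e(0)] + [comp(e(i)) for i in range(n - 1)]}
    Dstar = {comp(e(i)): Fraction(1, n) for i in range(n)}
    Dprime = {}  # D with coordinates permuted uniformly at random
    for w, mass in D.items():
        for pi in permutations(range(n)):
            v = tuple(w[pi[i]] for i in range(n))
            Dprime[v] = Dprime.get(v, Fraction(0)) + mass / math.factorial(n)
    return tv(D, Dprime), tv(D, Dstar)
```

For instance, n = 3 gives ‖D − D′‖ = 4/9 and ‖D − D*‖ = 1/3, and indeed 4/9 = 2·(1 − 1/3)·(1/3).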

Proof of Theorem 2.20. For any constant c′ such that 2c < c′ < f², we reduce EXCHANGEABLE^{c,f} to
STATISTICAL-DISTANCE^{c′,f} (which is in prSZK by Theorem 5.3). Given a (k, n)-sampler S, construct a
(k, n)-sampler S′′ that performs the following computation.

    for i = 1, …, ⌈log₂(1/(c′ − 2c))⌉ do
        choose π ∈ {0,1}^⌈log₂(n!)⌉ uniformly at random
        if π < n! then
            interpret π as a permutation on {1, …, n}
            run S to get w = (w1, …, wn)
            halt and output (wπ(1), …, wπ(n))
        end
    end
    halt and output the all 0’s element of ({0,1}^k)^n
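This pseudocode can be rendered as executable code. A sketch assuming, for illustration, that k = 1 (so the all-0's fallback is an n-tuple of zeros), that S is a Python callable returning an n-tuple, and that π is decoded into a permutation via the factorial number system (one standard choice; the proof only needs some bijection):

```python
import math
import random

def sample_S_double_prime(sample_S, n, iters):
    """One run of S'': rejection-sample a uniform permutation pi from
    unbiased bits, then output a sample of S with permuted coordinates.
    `iters` plays the role of ceil(log2(1/(c' - 2c)))."""
    n_fact = math.factorial(n)
    bits = max(1, math.ceil(math.log2(n_fact)))
    for _ in range(iters):
        pi = random.getrandbits(bits)
        if pi < n_fact:
            # Decode pi as a permutation via the factorial number system.
            perm, elems = [], list(range(n))
            for radix in range(n, 0, -1):
                pi, idx = divmod(pi, radix)
                perm.append(elems.pop(idx))
            w = sample_S()
            return tuple(w[perm[i]] for i in range(n))
    return tuple(0 for _ in range(n))  # failed every iteration: all 0's
```

Each iteration succeeds with probability more than 1/2, so the fallback output is reached with probability less than 2^(−iters), matching the analysis below.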



    Let D = S(U), let D′ be as in the statement of Lemma 5.5, and let D′′ = S′′(U). Conditioned on
halting inside the for loop, D′′ has the same distribution as D′. In each iteration, there is > 1/2 probability
that π < n! and the computation of S′′ halts. Hence the probability the computation halts on the last line
(after failing to halt inside the for loop) is < c′ − 2c. This implies that ‖D′ − D′′‖ < c′ − 2c.
    The reduction outputs a (kn, 2)-sampler Ŝ whose first coordinate is a sample from D and whose
second coordinate is a sample from D′′. If S ∈ EXCHANGEABLE^{c,f}_YES then by Lemma 5.5, ‖D − D′‖ ≤ 2c,
so by the triangle inequality ‖D − D′′‖ ≤ ‖D − D′‖ + ‖D′ − D′′‖ < 2c + (c′ − 2c) = c′ and hence Ŝ ∈
STATISTICAL-DISTANCE^{c′,f}_YES. If S ∈ EXCHANGEABLE^{c,f}_NO then ‖D − D′′‖ ≥ f since D′′ is exchangeable,
and hence Ŝ ∈ STATISTICAL-DISTANCE^{c′,f}_NO.


6    BC=P

We now consider the bounded-error version of C=P, which does not seem to have been defined or studied
in the literature before.⁸

Definition 6.1. prBC=P is the class of all promise problems Karp-reducible to the following promise
problem, BOUNDED-UNIFORM-BIT.

    BOUNDED-UNIFORM-BIT_YES = { S : S is a (1, 1)-sampler and S(U) is uniform },
    BOUNDED-UNIFORM-BIT_NO  = { S : S is a (1, 1)-sampler and |Pr[S(U) = 1] − 1/2| ≥ 1/4 }.

BC=P is defined as the class of languages in prBC=P.

    We now investigate the structural properties of BC=P. We begin with the following amplification
result for BC=P, which is somewhat less trivial than usual amplification results. Then we apply this
lemma to obtain closure properties of BC=P.

Lemma 6.2. For all languages L, the following are equivalent.

  (1) For some polynomial q, there is a Karp reduction that takes x and outputs a (1, 1)-sampler S such
      that the following both hold.⁹

          x ∈ L =⇒ S(U) is uniform.
          x ∉ L =⇒ |Pr[S(U) = 1] − 1/2| ≥ 1/q(|x|).

  (2) L ∈ BC=P.

    ⁸ A more extreme version, in which YES instances have acceptance probability 1/2 and NO instances have acceptance
probability 0, has been studied before and is known as C=P[half] [9] and HalfP [12].
    ⁹ It is not convenient to phrase this as a reduction to a problem, because the bound 1/q(|x|) depends on the size of x, not the
size of S.



  (3) For every polynomial Q, there is a Karp reduction that takes x and outputs a (1, 1)-sampler S such
      that the following both hold.

          x ∈ L =⇒ S(U) is uniform.
          x ∉ L =⇒ Pr[S(U) = 1] ≤ 1/2^{Q(|x|)}.
Proof. Clearly (3) ⇒ (2) ⇒ (1), so we just need to demonstrate (1) ⇒ (3). Assume (1). By the standard
trick for making C=P have “1-sided error” (see Section A.1), it follows that there is a similar reduction
mapping x to some (different) (1, 1)-sampler S that achieves

    Pr[S(U) = 1] ≤ 1/2 − 1/q(|x|)^2

in the case x ∉ L. Construct a new circuit S′ that performs the following computation.

    let m = 2q(|x|)^4·Q(|x|) and t = (1/2 − 1/(2q(|x|)^2))·m
    choose a uniformly random bit b
    if b = 0 then run S independently m times, and accept iff ≥ t of these runs accept
    else accept with probability 1 − (1/2^m)·Σ_{i=⌈t⌉}^{m} (m choose i)

    Note that the values of m, t, and Σ_{i=⌈t⌉}^{m} (m choose i) can be precomputed in polynomial time and hard-wired
into S′. If S accepts with probability 1/2 then the probability that ≥ t of m runs of S accept is

    (1/2^m)·Σ_{i=⌈t⌉}^{m} (m choose i).

Hence if x ∈ L then S′ accepts with probability 1/2. If S accepts with probability

    ≤ 1/2 − 1/q(|x|)^2

then by a standard concentration bound, the probability that ≥ t of m runs of S accept is

    ≤ 2^{−m/(2q(|x|)^4)} = 1/2^{Q(|x|)}.

Also by a standard concentration bound,

    1 − (1/2^m)·Σ_{i=⌈t⌉}^{m} (m choose i) ≤ 2^{−m/(2q(|x|)^4)} = 1/2^{Q(|x|)}.

Hence if x ∉ L then S′ accepts with probability

    ≤ (1/2)·(1/2^{Q(|x|)}) + (1/2)·(1/2^{Q(|x|)}) = 1/2^{Q(|x|)}.

The reduction for (3) just outputs S′.
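The hard-wired constants in this construction can be precomputed exactly. A small sketch (the function name is illustrative, and `q_val`, `Q_val` stand for q(|x|) and Q(|x|)):

```python
import math

def amplification_parameters(q_val, Q_val):
    """Compute m = 2*q^4*Q, the threshold t = (1/2 - 1/(2q^2))*m, and the
    binomial tail sum over i in [ceil(t), m], as hard-wired into S'."""
    m = 2 * q_val ** 4 * Q_val
    t = (0.5 - 1 / (2 * q_val ** 2)) * m
    tail = sum(math.comb(m, i) for i in range(math.ceil(t), m + 1))
    return m, t, tail
```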


    Note that coRP ⊆ BC=P. The 1-sided error property in part (3) of Lemma 6.2 implies that BC=P ⊆
BPP. Thus BC=P is presumably closed under complement (since presumably P = BC=P = BPP [23]),
but proving this seems out of reach. We now apply Lemma 6.2 to prove that BC=P is closed under union,
intersection, disjunction (i. e., ∨L ∈ BC=P if L ∈ BC=P, where ∨L = { (x1, …, xℓ) : xi ∈ L for some i }),
and conjunction (i. e., ∧L ∈ BC=P if L ∈ BC=P, where ∧L = { (x1, …, xℓ) : xi ∈ L for all i }). The proof
of Lemma 3.4 in Section A.1 showing that C=P is closed under disjunction does not work to show that
BC=P is closed under disjunction.
Theorem 6.3. BC=P is closed under disjunction.

Proof. Assuming L ∈ BC=P, we exhibit a reduction witnessing ∨L ∈ BC=P. Given (x1, …, xℓ), by
padding we may assume without loss of generality that the xi’s all have the same length n ≥ ℓ. By
(2) ⇒ (3) in Lemma 6.2, there is a reduction that takes xi and outputs Ci such that the following both
hold.

    xi ∈ L =⇒ Pr[Ci(U) = 1] = 1/2.
    xi ∉ L =⇒ Pr[Ci(U) = 1] ≤ 1/(4n).

Our reduction witnessing ∨L ∈ BC=P runs the above reduction for L on each xi to obtain the circuits Ci,
then outputs a circuit S that runs each Ci independently and combines their results with a parity gate. If
(x1, …, xℓ) ∈ ∨L then S is taking the parity of ℓ independent bits at least one of which is uniform, so S(U)
is uniform. If (x1, …, xℓ) ∉ ∨L then letting z1, …, zℓ denote S’s random input, we have

    Pr[S(U) = 1] = Pr_{z1,…,zℓ}[⊕_{i=1}^{ℓ} Ci(zi) = 1]
                 ≤ Pr_{z1,…,zℓ}[Ci(zi) = 1 for some i]
                 ≤ Σ_{i=1}^{ℓ} Pr_{zi}[Ci(zi) = 1]
                 ≤ ℓ·(1/(4n))
                 ≤ 1/4.
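The circuit S in this proof just XORs the output bits of the Ci's. A minimal sketch, with the Ci's modeled as zero-argument Python callables returning bits (an assumption made for illustration):

```python
from functools import reduce

def parity_sampler(bit_samplers):
    """Run each C_i on its own randomness and output the XOR of the
    resulting bits; if any C_i(U) is uniform, so is the parity."""
    def sample():
        return reduce(lambda a, b: a ^ b, (s() for s in bit_samplers), 0)
    return sample
```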
Theorem 6.4. BC=P is closed under conjunction.

Proof. Assuming L ∈ BC=P, we exhibit a reduction witnessing ∧L ∈ BC=P. We are given (x1, …, xℓ).
Lemma 6.2 implies that BC=P can have 1-sided error, so there is a reduction that takes xi and outputs Ci
such that the following both hold.

    xi ∈ L =⇒ Pr[Ci(U) = 1] = 1/2.
    xi ∉ L =⇒ Pr[Ci(U) = 1] ≤ 1/4.

Our reduction witnessing ∧L ∈ BC=P runs the above reduction for L on each xi to obtain the circuits Ci,
then outputs a circuit S that chooses i ∈ {1, …, ℓ} uniformly at random and outputs the same bit as an
execution of Ci. Thus

    Pr[S(U) = 1] = (1/ℓ)·Σ_{i=1}^{ℓ} Pr[Ci(U) = 1].

If (x1, …, xℓ) ∈ ∧L then S(U) is uniform since Pr[Ci(U) = 1] = 1/2 for all i. If (x1, …, xℓ) ∉ ∧L then

    Pr[S(U) = 1] ≤ ((ℓ − 1)/ℓ)·(1/2) + (1/ℓ)·(1/4) = 1/2 − 1/(4ℓ).

By (1) ⇒ (2) in Lemma 6.2, ∧L ∈ BC=P.
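The circuit S here can also be modeled directly; a sketch with the Ci's as zero-argument callables returning bits (an illustrative assumption):

```python
import random

def average_sampler(bit_samplers):
    """Pick i uniformly and output one run of C_i, so the acceptance
    probability is the average of the C_i's acceptance probabilities."""
    def sample():
        return random.choice(bit_samplers)()
    return sample
```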

Corollary 6.5. BC=P is closed under union and intersection.

Proof. This follows by the same proof techniques used for Theorem 6.3 and Theorem 6.4. Alternatively,
this follows in a black-box fashion from Theorem 6.3 and Theorem 6.4: Assuming L1 ∈ BC=P and
L2 ∈ BC=P, we also have L ∈ BC=P where L = { (x, i) : i ∈ {1, 2} and x ∈ Li }, and thus ∨L ∈ BC=P
and ∧L ∈ BC=P. But L1 ∪ L2 reduces to ∨L, and L1 ∩ L2 reduces to ∧L, both by the same trivial reduction
that maps x to ((x, 1), (x, 2)). Hence L1 ∪ L2 ∈ BC=P and L1 ∩ L2 ∈ BC=P.


7    Open problems
There are plenty of open problems concerning the complexity of deciding statistical properties of joint
distributions with low-complexity samplers. To the best of our knowledge, none of these problems
has ever been studied before, so the best bounds are what hold trivially or follow directly from results
discussed in this work.
     What can be said about depth-2 circuits? For example, for each of the languages listed in Fact 2.3, it
is consistent with current knowledge that, when restricted to depth-2 circuit samplers, the language could
be in P or could be C=P-complete. What about d-local samplers when d > 2 or k > 1? For example, is there a
polynomial-time algorithm for deciding whether the joint distribution of a given 3-local (1, n)-sampler
is fully independent? What are the complexities of deciding full independence and exchangeability for
logspace samplers? What about samplers where each input bit influences at most a bounded number of
output bits?
     There are also plenty of open problems concerning the complexity of approximately deciding
statistical properties of samplable distributions. Can quantitative improvements in our results be obtained
(e. g., improvements in the bounds of 1/4 in Theorem 2.16 and 1/2 in Theorem 2.19)? What is the
complexity of the exact-versus-far (as opposed to close-versus-far) versions of these problems? What
about deciding whether a joint distribution is close or far from being pairwise independent? What can be
said about versions of these problems where k is constrained to be 1? What about approximate problems
restricted to low-complexity samplers? Can we prove any interesting approximate algorithmic results?
     For general circuit samplers, what is the complexity of deciding whether or not a distribution is a
mixture of i. i. d. distributions? It is not clear whether this problem is in C=P. Of course, i. i. d. corresponds


to drawing balls from an urn with replacement, and when changed to without replacement the problem
becomes equivalent to exchangeability, which is decidable in C=P.
    Are there other interesting structural properties or applications of BC=P?


Acknowledgments
I thank László Babai and anonymous reviewers for their comments.


A     Folklore proofs
We use this appendix to supply some folklore proofs.


A.1    Closure of C=P

We supply a folklore proof of Lemma 3.4, which is used in the proof of Theorem 2.10. Assume L ∈ C=P.
First we prove that ∀q L ∈ C=P.
    There is a deterministic polynomial-time algorithm that takes as input x and outputs a circuit C(y, z)
where |y| = q(|x|), such that the following both hold.

    x ∈ ∀q L =⇒ ∀y : Pr_z[C(y, z) = 1] = 1/2,
    x ∉ ∀q L =⇒ ∃y : Pr_z[C(y, z) = 1] ≠ 1/2.

The circuit C runs the reduction witnessing L ∈ C=P on (x, y), then runs the output of the reduction on z.
Now let C̄ be the same as C but with the output negated. We construct a new circuit A(y, z′) that for all y
satisfies

    Pr_{z′}[A(y, z′) = 1] = (1/2)·(Pr_z[C(y, z) = 1]² + Pr_z[C̄(y, z) = 1]²)

by randomly choosing either C or C̄ and running it twice independently. We also construct a new circuit
B(y, z′) that for all y satisfies

    Pr_{z′}[B(y, z′) = 1] = (1/2)·(2·Pr_z[C(y, z) = 1]·Pr_z[C̄(y, z) = 1])

by running both C and C̄ independently. Thus we have

    Pr_{z′}[A(y, z′) = 1] − Pr_{z′}[B(y, z′) = 1] = (1/2)·(Pr_z[C(y, z) = 1] − Pr_z[C̄(y, z) = 1])²



which implies the following.
                    x ∈ ∀q L =⇒ ∀y : Prz0 [A(y, z0 ) = 1] − Prz0 [B(y, z0 ) = 1] = 0 .
                                 
                                                   0                      0
                                 ∃y : Prz0 [A(y, z ) = 1] − Prz0 [B(y, z ) = 1] > 0
                                 
                    x 6∈ ∀q L =⇒   and
                                 
                                  ∀y : Prz0 [A(y, z0 ) = 1] − Prz0 [B(y, z0 ) = 1] ≥ 0 .
                                 

Now we construct a circuit S(y, z00 ) such that
                                               1
                     Pr00 [S(y, z00 ) = 1] =     · Pr0 [A(y, z0 ) = 1] + (1 − Pr0 [B(y, z0 ) = 1])
                                                                                                  
                     z                         2 z                            z

by randomly choosing either A or B to run, and negating the output if B was chosen. This implies the
following.
$$x \in \forall q\, L \implies \Big(\forall y:\ \Pr_{z''}[S(y,z'')=1] = \tfrac{1}{2}\Big) \implies \Pr_{y,z''}[S(y,z'')=1] = \tfrac{1}{2}\,.$$

$$x \not\in \forall q\, L \implies \left(\begin{array}{l}\exists y:\ \Pr_{z''}[S(y,z'')=1] > \tfrac{1}{2}\\ \text{and}\\ \forall y:\ \Pr_{z''}[S(y,z'')=1] \ge \tfrac{1}{2}\end{array}\right) \implies \Pr_{y,z''}[S(y,z'')=1] > \tfrac{1}{2}\,.$$

The reduction witnessing ∀q L ∈ C=P just outputs S, which has y, z'' as the random input bits.
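The coin-flip-and-negate combiner used to build S recurs below; as a small sketch (names hypothetical, not from the paper), if S flips a fair coin and then either runs A, or runs B and negates its output, then Pr[S = 1] equals 1/2 exactly when the two acceptance probabilities agree:

```python
# Hypothetical sketch of the coin-flip/negation combiner: given acceptance
# probabilities pA and pB, the combined circuit S accepts with probability
# (pA + (1 - pB)) / 2.  Hence Pr[S=1] = 1/2 iff pA = pB, and
# Pr[S=1] > 1/2 iff pA > pB.
def pr_S(pA, pB):
    return 0.5 * (pA + (1 - pB))

assert pr_S(0.75, 0.75) == 0.5   # equal probabilities -> exactly 1/2
assert pr_S(0.5, 0.25) > 0.5     # pA > pB -> strictly above 1/2
```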
Now we prove that ∨L ∈ C=P. Given x_1, ..., x_ℓ, we run the reduction witnessing L ∈ C=P on each x_i to get circuits C_i(z) such that x_i ∈ L ⟺ Pr_z[C_i(z) = 1] = 1/2. We define C_i^{(0)} = C_i and let C_i^{(1)} be the same as C_i but with the output negated. Then we have

$$(x_1, \ldots, x_\ell) \in \vee L \iff \prod_{i=1}^{\ell}\Big(\Pr_z\big[C_i^{(0)}(z)=1\big] - \Pr_z\big[C_i^{(1)}(z)=1\big]\Big) = 0\,.$$

We construct a new circuit A(z') that satisfies

$$\Pr_{z'}[A(z')=1] \;=\; \frac{1}{2^{\ell-1}}\cdot \sum_{\substack{b \in \{0,1\}^\ell\\ \text{even parity}}}\ \prod_{i=1}^{\ell}\Pr_z\big[C_i^{(b_i)}(z)=1\big]$$

by randomly choosing an even-parity b then running each C_i^{(b_i)} independently. We also construct a new circuit B(z') that satisfies

$$\Pr_{z'}[B(z')=1] \;=\; \frac{1}{2^{\ell-1}}\cdot \sum_{\substack{b \in \{0,1\}^\ell\\ \text{odd parity}}}\ \prod_{i=1}^{\ell}\Pr_z\big[C_i^{(b_i)}(z)=1\big]$$

by randomly choosing an odd-parity b then running each C_i^{(b_i)} independently. This implies that

$$(x_1, \ldots, x_\ell) \in \vee L \iff \Pr_{z'}[A(z')=1] - \Pr_{z'}[B(z')=1] = 0\,.$$



Now we construct a circuit $S(z'')$ such that

$$\Pr_{z''}[S(z'')=1] \;=\; \frac{1}{2}\cdot\Big(\Pr_{z'}[A(z')=1] + \big(1 - \Pr_{z'}[B(z')=1]\big)\Big)$$

by randomly choosing either A or B to run, and negating the output if B was chosen. This implies that (x_1, ..., x_ℓ) ∈ ∨L ⟺ Pr_{z''}[S(z'')=1] = 1/2. The reduction witnessing ∨L ∈ C=P just outputs S.
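The even/odd parity trick rests on the sign expansion of a product, which can be checked directly (a hypothetical sketch, not from the paper; p_i stands for Pr_z[C_i(z) = 1]):

```python
from itertools import product

# Hypothetical sketch of the identity behind the construction: writing
# q_i(0) = p_i and q_i(1) = 1 - p_i (the negated circuit), the expansion
#   prod_i (q_i(0) - q_i(1)) = sum_{even-parity b} prod_i q_i(b_i)
#                            - sum_{odd-parity b}  prod_i q_i(b_i)
# shows that Pr[A=1] - Pr[B=1] is, up to the 1/2^(l-1) factor, exactly the
# product that is 0 iff some p_i = 1/2.
def parity_sums(ps):
    even = odd = 0.0
    for b in product([0, 1], repeat=len(ps)):
        term = 1.0
        for bi, p in zip(b, ps):
            term *= p if bi == 0 else 1 - p
        if sum(b) % 2 == 0:
            even += term
        else:
            odd += term
    return even, odd

ps = [0.25, 0.5, 0.75]          # contains 0.5, so the product vanishes
even, odd = parity_sums(ps)
prod = 1.0
for p in ps:
    prod *= p - (1 - p)
assert abs((even - odd) - prod) < 1e-12
```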

A.2     The element distinctness problem
We supply a folklore proof of Lemma 4.6, which is used in the proof of Theorem 2.13. Consider the
following algorithm.

     Input: x_1, x_2, ..., x_n ∈ {1, 2, ..., M}
     Output: are x_1, x_2, ..., x_n distinct?

 1  execute the following two while-loop computations in parallel until one of them halts:
 2  while true do
 3      choose a hash function f : {1, ..., M} → {1, ..., n} pairwise independently
 4      if the number of pairs i < j such that f(x_i) = f(x_j) is ≤ 2n then
 5          if ∃ (i < j) such that f(x_i) = f(x_j) and x_i = x_j then halt and output "no",
            else halt and output "yes"
 6      end
 7  end
 8  while true do
 9      choose a pair of indices i < j uniformly at random
10      if x_i = x_j then halt and output "no"
11  end



    Note that the algorithm never outputs an incorrect answer. If the number of pairs $i<j$ such that
$x_i = x_j$ is $\ge n/2$, then the second while loop halts after $\le \binom{n}{2}/(n/2) = n-1$ iterations in expectation.
Otherwise, the expectation (over the choice of $f$) of the number of pairs $i<j$ such that $f(x_i) = f(x_j)$ is
$< (n/2) + \binom{n}{2}\cdot(1/n) < n$, so by Markov's inequality the first while loop has probability $> 1/2$ of
halting in each iteration, and thus halts after $< 2$ iterations in expectation. Line 4 takes linear time by building
lists of input elements that hash to each bucket, and Line 5 takes linear time by brute force (or sorting) on each
bucket, so the first while loop takes linear time in expectation.



AUTHOR

     Thomas Watson
     Postdoctoral researcher
     University of Toronto, Toronto, ON
     thomasw cs toronto edu
     http://www.cs.toronto.edu/~thomasw/


ABOUT THE AUTHOR

     T HOMAS WATSON is a postdoc at the University of Toronto, supervised by Toniann Pitassi.
        He received his Ph. D. in computer science from the University of California, Berkeley,
        supervised by Luca Trevisan. He received his undergraduate degree in computer science,
        math, and computer engineering from the University of Wisconsin-Madison, where his
        interest in theoretical computer science research was sparked and fostered by Dieter van
        Melkebeek. His research interests include most of complexity theory, but recently he has
        been working on communication complexity.



