Almost $k$-Wise vs. $k$-Wise Independent Permutations, and Uniformity for General Group Actions

Authors Noga Alon, Shachar Lovett

License CC-BY-3.0

                         THEORY OF COMPUTING, Volume 9 (15), 2013, pp. 559–577
                                       www.theoryofcomputing.org

                          SPECIAL ISSUE: APPROX-RANDOM 2012



               Almost k-Wise vs. k-Wise
            Independent Permutations, and
         Uniformity for General Group Actions∗
                                   Noga Alon†                       Shachar Lovett‡
                  Received August 19, 2012; Revised January 14, 2013; Published May 30, 2013




       Abstract: A family of permutations in Sn is k-wise independent if a uniform permutation
       chosen from the family maps any sequence of k distinct elements to any sequence of k distinct
       elements with equal probability. Efficient constructions of k-wise independent permutations
       are known for k = 2 and k = 3 based on multiply transitive permutation groups but are
       unknown for k ≥ 4. In fact, it is known that there are no nontrivial subgroups of Sn for n ≥ 25
       which are 4-wise independent (“4-transitive”). Faced with this obstacle, research has turned
       towards constructing almost k-wise independent families, where small errors are allowed.
       Constructions of almost k-wise independent families of permutations, with optimal size up
       to polynomial factors, have been achieved by several authors.
           Motivated by this problem, we give several results relating almost k-wise and k-wise
       distributions over permutations.
   ∗ An earlier version of this paper appeared in the Proceedings of the 16th International Workshop on Randomization and Computation (RANDOM ’12), pages 350–361, 2012.
  † Supported in part by an ERC advanced grant and by NSF grant DMS-0835373.
  ‡ Supported by NSF grant DMS-0835373.



ACM Classification: G.3
AMS Classification: 68W20,68Q25
Key words and phrases: local independence, permutations, groups, representation theory


© 2013 Noga Alon and Shachar Lovett
Licensed under a Creative Commons Attribution License (CC-BY)                             DOI: 10.4086/toc.2013.v009a015

        1. Any almost k-wise independent distribution, with small enough error, is close in
           statistical distance to a k-wise independent distribution.
        2. A uniformly random set of n^{O(k)} permutations supports, with high probability, a
           distribution which is k-wise independent.
        3. Derandomizing this, we show that any family which is almost 2k-wise independent,
           with small enough error, supports a distribution which is k-wise independent.

          These results allow for simplified analysis of randomized algorithms. For example,
      our results show that one can analyze an algorithm assuming access to k-wise independent
      permutation families, but then use it with only almost k-wise independent families, with a
      provable correctness guarantee.
           In fact, we prove all of these results in the general setting of group actions. Let G be
      a group acting on a set X. The case of k-wise permutations corresponds to G = Sn and X
      the set of sequences of k distinct elements. A subset S of G is X-uniform if for any x, y ∈ X,
      the probability over a uniform g ∈ S that g(x) = y is the same as when g is chosen uniformly
      from G. It is approximately X-uniform if these probabilities are close. We prove all the
      above results in this general setting, relating almost X-uniform and X-uniform distributions.
      Our proof is based on basic tools from the representation theory of finite groups.


1    Introduction
Small probability spaces of limited independence are widely used in many applications. Specifically, if
the analysis of a randomized algorithm depends only on the assumption that the random bits used are
only k-wise independent, one can replace the random tape by a tape selected from a k-wise independent
distribution. One application of this is a derandomization of the algorithm by enumerating over all
possible random strings. Another application is when the random string needs to be saved, for example in
data structures, where using k-wise independence allows one to maintain a succinct data structure.
     The case of k-wise independent distributions over {0, 1}^n has been widely studied, and there are
optimal constructions of k-wise independent probability spaces of size n^{O(k)} (see e. g., [4]). Moreover,
these constructions are strongly explicit: given an index of an element i ∈ [n^{O(k)}] and an index of a bit
j ∈ [n], one can compute the j-th bit of the i-th string in time O(k log n). This is crucial for several
applications, for example for streaming algorithms and cryptography, where operations need to be
performed in poly-logarithmic time.
     Another widely studied case is that of k-wise independent permutations of n elements. This problem
is motivated by cryptographic applications, as k-wise independent permutations allow perfect secrecy
even if one allows k oracle queries to the encryption. For more details on the role of k-wise independent
permutations in cryptography, see, e. g., [21, 25, 26, 27].
     Here, the situation is much less understood. For k = 2 the group of invertible affine transformations
x 7→ ax + b over a finite field F yields a 2-wise independent family; and for k = 3 the group of Möbius
transformations x 7→ (ax + b)/(cx + d) with ad − bc = 1 over the projective line F ∪ {∞} yields a 3-wise
independent family.
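As a quick sanity check of the k = 2 case, the following sketch (the prime p = 5 is an arbitrary illustrative choice) verifies that the affine maps x 7→ ax + b with a ≠ 0 are sharply 2-transitive, hence form a 2-wise independent family:

```python
from collections import Counter

# Sanity check: the affine maps x -> a*x + b over F_p (a != 0) send each
# ordered pair of distinct points to each ordered pair of distinct points
# with equal probability (in fact, under exactly one map each).
p = 5                                  # illustrative small prime
maps = [(a, b) for a in range(1, p) for b in range(p)]

x1, x2 = 0, 1                          # any fixed pair of distinct sources
hits = Counter(((a * x1 + b) % p, (a * x2 + b) % p) for a, b in maps)

assert len(hits) == p * (p - 1)        # every distinct target pair occurs...
assert set(hits.values()) == {1}       # ...exactly once: sharp 2-transitivity
```

There are p(p − 1) maps and p(p − 1) ordered pairs of distinct targets, so "each target hit exactly once" is precisely the 2-wise independence of the uniform distribution over this family.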


     For k ≥ 4 (and n large enough) the situation is dramatically different. Until recently, no k-wise
independent families of permutations were known other than the full symmetric group Sn and the alternating
group An . In fact, it is known (cf., e. g., [7], Theorem 5.2) that for n ≥ 25 and k ≥ 4 there are no other
subgroups of Sn which form a k-wise independent family.1 Recently, two works broke this barrier. The
first is by Finucane, Peled and Yaari [9] who gave an explicit construction of a k-wise independent family
of permutations of size k^{2n}. The second is by Kuperberg, Lovett and Peled [17] who proved the existence
of k-wise independent families of permutations of size n^{O(k)}. Still, no explicit construction of k-wise
independent families of permutations of size smaller than exponential in n is known.
     Faced with this problem, research has turned towards constructing families of permutations which are
almost k-wise independent, allowing for small errors. There has been much research towards constructing
explicit almost k-wise independent families of minimal size. This was achieved, up to polynomial factors,
by Kaplan, Naor and Reingold [12], who gave a construction of such a family of size n^{O(k)}. Alternatively,
such a family can also be obtained from the construction of Kassabov [14] of a constant size expanding
set of Sn by considering all words of length O(k log n). Both of these constructions are also strongly
explicit: given an index of a permutation i ∈ [n^{O(k)}] and an element j ∈ [n], one can compute the image
of the i-th permutation on j in time O(k log n). Again, this is crucial for applications such as streaming
algorithms or cryptography.
     For many applications, almost k-wise independent families are just as good as k-wise independent
families. However, the analysis must take the error into account, which in some cases is not trivial. Our
first result shows that by choosing the error small enough, one can analyze an algorithm using k-wise
independent permutations, and then apply almost k-wise independent permutations to achieve almost the
same results.

Theorem 1.1. Let µ be a distribution taking values in Sn which is almost k-wise independent with error
ε. Then there exists a distribution µ′ over permutations which is k-wise independent, and such that the
statistical distance between µ and µ′ is at most O(ε · n^{4k}).

    A similar result for k-wise independent hash functions was obtained by Alon, Goldreich and Man-
sour [5], and more generally over product spaces by Rubinfeld and Xie [20]. Our proof technique is
similar in spirit, although technically more involved. This allows for an oblivious derandomization of
randomized algorithms (with two-sided error) which “work” given any k-wise independent distribution
over permutations: let f be a boolean function, and let A be a randomized algorithm such that

                                                 Pr_{π∼µ} [A(x, π) = f(x)] ≥ 2/3

for any k-wise independent distribution over permutations µ. Then A can be derandomized by letting π
be chosen uniformly from an almost k-wise independent distribution with error ε ≤ O(n^{−4k}). Since such
distributions can be generated strongly explicitly, the overhead (in terms of the number of bits needed to
sample from the distribution) is just O(k log n).
     Another relaxation of the problem of constructing small families of k-wise independent permutations
is that of considering weighted families, or equivalently distributions of small support, which are k-wise
   1 In the language of group theory, these are k-transitive groups. The currently known proof of this fact is hard, as it requires the classification of finite simple groups.



independent. The relaxation here is that the elements can have unequal weights. We note that such
distributions are typically useless for derandomization purposes as the weights could require exponentially
small precision, hence requiring polynomially many random bits to compute. Still, it is interesting to
inspect what is known for distributions.
     Contrary to the case of families, it is simple to establish that there exist distributions of small support
which are k-wise independent. First, note that given a family S of permutations, it is easy to decide if
there exists a distribution µ supported on S which is k-wise independent using linear programming: for a
permutation π define the matrix Mk (π) to be the permutation on sequences of k distinct elements induced
by π. It is an (n)_k × (n)_k permutation matrix, where (n)_k := ∏_{i=0}^{k−1} (n − i). Let U denote the uniform matrix
all of whose elements are (n − k)!/n!. Then there exists a k-wise independent distribution supported
on S iff U belongs to the convex hull of {Mk (π) : π ∈ S}. The latter condition can be easily verified
using linear programming. Now, starting with any set of permutations which supports k-wise independent
distributions (for example the set of all permutations), one can apply Carathéodory's theorem [8] and
deduce that U lies in the convex hull of at most n^{2k} permutations. That is, there exist k-wise independent
distributions which are supported on at most n^{2k} permutations. Moreover, and somewhat surprisingly, one
can algorithmically find a k-wise independent distribution with small support in a weakly explicit manner
(i. e., in time n^{O(k)}) using the ideas of Karp and Papadimitriou [13] and Koller and Megiddo [15].2
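To make the convex-hull condition concrete, here is a minimal sketch (with the small illustrative parameters n = 4, k = 2, in exact rational arithmetic) of its trivial instance: the average of Mk (π) over all of Sn equals the uniform matrix U, so U indeed lies in the convex hull of {Mk (π) : π ∈ Sn}.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

n, k = 4, 2                                      # small illustrative instance
seqs = list(permutations(range(n), k))           # the (n)_k sequences of distinct elements
idx = {s: i for i, s in enumerate(seqs)}
N = len(seqs)

# Average of the induced permutation matrices M_k(pi) over all of S_n.
avg = [[Fraction(0)] * N for _ in range(N)]
for pi in permutations(range(n)):
    for s in seqs:
        t = tuple(pi[x] for x in s)              # action of pi on the sequence s
        avg[idx[s]][idx[t]] += Fraction(1, factorial(n))

u = Fraction(factorial(n - k), factorial(n))     # the uniform entry (n-k)!/n!
assert all(avg[i][j] == u for i in range(N) for j in range(N))
```

The check works because for any two sequences s, t of k distinct elements, exactly (n − k)! permutations map s to t.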
     We consider the problem of constructing small explicit sets which support k-wise independent
distributions. First, we establish that most small sets support k-wise independent distributions.

Theorem 1.2. Let S be a uniformly random subset of Sn of size c · n^{6k} for an appropriately chosen absolute
constant c. Then with high probability (w. h. p., for short) over the choice of S, there exists a distribution
µ supported on S which is k-wise independent.

    A similar result for k-wise independent hash functions was obtained by Austrin and Håstad [6]. Our
result implies a somewhat surprising consequence for search algorithms which “work” given any k-wise
independent distribution over permutations, which allows us to transform weak guarantees to strong
guarantees. Let f be a function and A an algorithm, such that for any k-wise independent distribution µ,

                                                  Pr_{π∼µ} [A(x, π) = f(x)] > 0 .


Then since almost all sets of size n^{O(k)} support such a distribution, we must have that A has a noticeable
fraction of witnesses in Sn ,
                                       Pr_{π∈Sn} [A(x, π) = f(x)] ≥ n^{−O(k)} .

   We also show that almost 2k-wise independent permutations give an explicit construction of a set
which supports k-wise independence, thus derandomizing Theorem 1.2.

Theorem 1.3. Let S be a subset of Sn such that S is almost 2k-wise independent with error ε ≤ O(n^{−7k}).
Then there exists a distribution µ supported on S which is k-wise independent.
   2 Essentially, the linear program for finding µ has n! variables and n^{O(k)} constraints. Its dual has n^{O(k)} variables and n! constraints. The dual problem can be solved efficiently using the ellipsoid method since it has an efficient separating-hyperplane oracle.



    We are not aware of a similar result, even in the case of k-wise independent hash functions. This
allows for an oblivious derandomization of search algorithms which “work” given any k-wise independent
distribution over permutations: let f be a function, and let A be a randomized algorithm such that

                                                  Pr_{π∼µ} [A(x, π) = f(x)] > 0

for any k-wise independent distribution µ over permutations. Then taking S to be an almost 2k-wise
independent family of permutations with error O(n^{−7k}), we get that there exists π ∈ S for which A(x, π) =
f(x), achieving an oblivious derandomization of A with overhead (measured in bits, as before) O(k log n).
     Here is a toy example illustrating the way the last theorem and the discussion preceding it can be
applied. Let G = (V, E) be a graph on a set V of n vertices, and suppose that each vertex v ∈ V has a real
positive weight w(v). Let d(v) be the degree of v, and assume all degrees are bounded by k. We claim
that G contains an independent set U ⊂ V of total weight W (U) = ∑u∈U w(u) at least

                                                  ∑_{v∈V} w(v)/(d(v) + 1) .

To prove it, let π be a random permutation of the set of vertices V , and let U consist of all vertices
u so that π(u) precedes π(v) for every neighbor v of u. It is clear that U is an independent set, and
for any vertex u ∈ V the probability that u ∈ U is exactly 1/(d(u) + 1), as this is the probability that u
precedes all its neighbors. By linearity of expectation, the expected value of the total weight of U is
∑v∈V w(v)/(d(v) + 1) and hence there exists an independent set U of total weight at least as claimed.
     The above proof clearly works even if π is only assumed to be (k + 1)-wise independent.3 Therefore,
the discussion preceding Theorem 1.3 implies that if π is chosen uniformly at random, then the probability
it provides a set U satisfying W(U) ≥ ∑_{v∈V} w(v)/(d(v) + 1) is at least n^{−O(k)}. The theorem itself shows
that the support of any set of almost (2k + 2)-wise independent permutations with sufficiently small error
must contain a permutation π that provides an independent set U as above.
     A similar reasoning can be applied to other arrangement problems. Given a k-uniform hypergraph
with a weight for each permutation of the vertices in each of its edges, one may want to find a permutation
maximizing the total weight of all orders induced on the sets of vertices in the edges. Problems of this
type are called k-CSP-rank problems (see, e. g., [1]), and include Betweenness and Feedback Arc Set. In
most of these problems, finding the precise optimum is NP-hard, and the reasoning above provides some
insight about algorithms for the (much easier) problem of finding a permutation in which the total weight
is at least as large as the expected weight in a uniformly random permutation.

1.1   Group action uniformity vs. almost uniformity
We actually prove all the aforementioned results in the general setting of group actions, of which k-wise
independent permutations as well as k-wise independent random variables form specific instances. A
group G acts on a set X if G acts as a group of permutations on X. That is, g : X → X is a permutation
of X for all g ∈ G, and (gh)(x) = g(h(x)) for all g, h ∈ G and x ∈ X. This gives a general framework:
k-wise independent permutations correspond to the case of G = Sn the group of permutations, and
  3 In fact, it suffices if π is (k + 1)-minwise independent.




X = [n]_k = {(i_1, . . . , i_k) : i_1, . . . , i_k ∈ [n] distinct} is the set of sequences of k distinct elements, where the action of G
on X is straightforward. The case of k-wise independent distributions over {0, 1}^n corresponds to G = F_2^n
and X = [n]_k × F_2^k, where the action of g = (g_1, . . . , g_n) ∈ F_2^n on x = ((i_1, . . . , i_k), (b_1, . . . , b_k)) ∈ [n]_k × F_2^k
is given by g(x) = ((i_1, . . . , i_k), (b_1 + g_{i_1}, . . . , b_k + g_{i_k})). Similarly, one can obtain in this way distributions
supporting k-wise independent random variables, even when each variable is distributed over a different
domain.
     We now introduce some definitions. If G acts on X, a distribution µ over G is X-uniform if

                                               Pr_{g∼µ} [g(x) = y] = Pr_{g∈G} [g(x) = y]

for all x, y ∈ X; and is almost X-uniform with error ε if

                                            | Pr_{g∼µ} [g(x) = y] − Pr_{g∈G} [g(x) = y] | ≤ ε

for all x, y ∈ X. These definitions coincide with k-wise independence and almost k-wise independence for
permutations when G = Sn and X = [n]_k . Theorem 1.1, Theorem 1.2 and Theorem 1.3 are immediate
corollaries of the following general theorems, when applied to G = Sn and X = [n]_k .
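As a tiny sanity check of these definitions (with the illustrative choice n = 5), the family of cyclic shifts in Sn is X-uniform for X = [n], since it is sharply transitive on points, yet it is far from X-uniform on sequences of 2 distinct elements:

```python
from itertools import permutations

n = 5
shifts = [tuple((x + s) % n for x in range(n)) for s in range(n)]

# X = [n]: the family is X-uniform, since each point is mapped to each
# point by exactly one of the n shifts (probability 1/n, as under all of S_n).
for x in range(n):
    for y in range(n):
        assert sum(1 for g in shifts if g[x] == y) == 1

# X = sequences of 2 distinct elements: far from X-uniform. The pair (0, 1)
# reaches only n of the n*(n-1) possible target pairs; all others have
# probability 0 instead of 1/(n*(n-1)).
targets = {(g[0], g[1]) for g in shifts}
assert len(targets) == n
```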
    First, we show that distributions over G which are almost X-uniform with small enough error, are
close in statistical distance to distributions which are X-uniform.
Theorem 1.4. Let µ be a distribution on G which is almost X-uniform with error ε. Then there exists a
distribution µ′ on G which is X-uniform, and such that the statistical distance between µ and µ′ is at
most 3ε|X|^4 .
    Second, we show that a small random subset of G supports w. h. p. an X-uniform distribution.
Theorem 1.5. Let S ⊂ G be a uniformly random set of size c|X|^6 for an appropriately chosen absolute
constant c. Then with high probability over the choice of S, there exists a distribution µ supported on S
which is X-uniform.
    Finally, we derandomize Theorem 1.5. Recall that if G acts on X, then G also acts on X × X in
the obvious manner, i. e., g((x1 , x2 )) = (g(x1 ), g(x2 )). We show that if a distribution over G is almost
X × X-uniform with a small enough error, then it must support an X-uniform distribution.
Theorem 1.6. Let µ be a distribution supported on a set S ⊂ G which is almost (X × X)-uniform with
error ε < 0.5|X|^{−7} . Then there exists a distribution µ′ supported on S which is X-uniform.
    The proof of Theorem 1.5 is by a counting argument using the symmetry of the group action. The
proofs of Theorem 1.4 and Theorem 1.6 rely on representation theory of finite groups. In the language
of the Fourier analysis literature, we prove results regarding quadrature rules for the representations
appearing in the action of G on X. Technically, our arguments involving representation theory are quite
basic, and as such are similar in spirit to several known results in the Fourier analysis literature. In
particular, Theorem 1.5 is similar to theorems established in [16, 2]. However, our proof is arguably
simpler, as it applies the Carathéodory theorem instead of a more involved second moment argument.
Also, some technical parts used in the proof of Theorem 1.6 are related to known results in the Fourier
analysis literature, e. g., in [18, 19].


Paper organization We present preliminary definitions in Section 2. Theorem 1.4 is proved in Section 3,
Theorem 1.5 in Section 4 and Theorem 1.6 in Section 5. We conclude with some open problems in
Section 6. Throughout the paper we do not attempt to optimize constants.


2    Preliminaries
For an integer t ≥ 1 we denote [t] := {1, 2, . . . , t}. The statistical (or total variation) distance between two
distributions µ, µ′ is given by dist(µ, µ′) = ∑_x |µ(x) − µ′(x)|. The expression δ_{i,j} is equal to 1 if i = j
and is equal to 0 otherwise.

Group action and uniformity A group G acts on a set X if there is a homomorphism from G to
the permutation group on X. That is, each g ∈ G is a permutation on X, and (gh)(x) = g(h(x)) for all
g, h ∈ G, x ∈ X. We denote by UG the uniform distribution over G. For a distribution µ over G we denote
by g ∼ µ a random element chosen according to µ. We abbreviate Pr_{g∈G} [·] = Pr_{g∼UG} [·].
     We recall some definitions from the introduction: a distribution µ over G is X-uniform if

                                           Pr_{g∼µ} [g(x) = y] = Pr_{g∈G} [g(x) = y]

for all x, y ∈ X; and a distribution µ is almost X-uniform with error ε if

                                        | Pr_{g∼µ} [g(x) = y] − Pr_{g∈G} [g(x) = y] | ≤ ε

for all x, y ∈ X. A family S ⊂ G is X-uniform (almost X-uniform, accordingly) if the uniform distribution
over S is such. We will need some basic facts in linear algebra, geometry and representation theory, which
are presented below.

Linear algebra Let A = (a_{i,j}) be a complex matrix. The L∞ norm of A is ‖A‖_∞ = max |a_{i,j}| (not to be
confused with the operator norm of A on L∞ , which is not used in this paper). The Frobenius norm of A is
‖A‖_Fr = √(∑ |a_{i,j}|²). For every A, ‖A‖_∞ ≤ ‖A‖_Fr . A matrix A is unitary if A·Ā^t = I, where Ā^t is the
conjugate transpose of A. The Frobenius norm of a matrix is invariant under any unitary basis change.
That is, if U is unitary then ‖U^{−1}AU‖_Fr = ‖A‖_Fr . The tensor product of a d1 × d1 matrix A1 and a
d2 × d2 matrix A2 , denoted A1 ⊗ A2 , is a (d1 d2) × (d1 d2) matrix, whose entries are given by
(A1 ⊗ A2)_{(i,i′),(j,j′)} = (A1)_{i,j} (A2)_{i′,j′} . The tensor product of unitary matrices is also unitary.

Geometry Let X = {x_1, . . . , x_N} be a set of points in R^d . The convex hull of X is the minimal convex
set containing X; equivalently, it is the set of all points
                                 { ∑_{i=1}^{N} λ_i x_i : λ_1, . . . , λ_N ≥ 0, ∑ λ_i = 1 } .

Fact 2.1 (Carathéodory theorem [8]). Let X be a finite set of points in Rd , and let y be a point in the
convex hull of X. Then there exists a subset Y ⊂ X of size |Y | ≤ d + 1 such that y is in the convex hull of
Y.


     Any hyperplane H partitions a set X of points into two sets: if H = {x : ⟨a, x⟩ = b} then the sets are
{x ∈ X : ⟨a, x⟩ ≥ b} and {x ∈ X : ⟨a, x⟩ < b}. We need the following bound on the maximal number of
ways a set can be partitioned by hyperplanes.4

Fact 2.2 (Harding [11]). Let X be a set of N points in R^d . The number of different ways to partition X
into two sets by a hyperplane is at most ∑_{i=0}^{d} (N−1 choose i) ≤ N^d .

Representation theory Let G be a finite group. A representation of G (over C) is a homomorphism
R : G → GL(d, C). That is, R(g) for g ∈ G is a d × d nonsingular complex matrix, and R(gh) = R(g)R(h)
for every g, h ∈ G. The dimension of the representation R is d. Two representations R, R′ of G of
dimension d are equivalent if there exists an invertible matrix A such that R′(g) = A^{−1}R(g)A for all g ∈ G.
This is denoted as R ≡ R′.
    A representation R is unitary if R(g) is unitary for all g ∈ G.
Fact 2.3. Any representation of G is equivalent to a unitary representation.
    We will restrict our attention only to unitary representations in this paper. We note that if R, R′ are
unitary and equivalent, then there exists a unitary matrix A such that R′(g) = A^{−1}R(g)A.
    Let G be a group which acts on a set X, that is, g : X → X is a permutation of X for all g ∈ G, and
(gh)(x) = g(h(x)) for all g, h ∈ G and x ∈ X. The associated representation RX maps each g ∈ G to the
permutation matrix it induces on the set X. That is, RX (g) is an |X| × |X| matrix, indexed by x, y ∈ X,
defined as (RX (g))x,y = 1 if g(x) = y and (RX (g))x,y = 0 otherwise. Note that RX is always a unitary
representation.
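A minimal numerical sketch of RX, for S3 acting on {0, 1, 2}. One bookkeeping detail: the sketch indexes the matrix by M[x][y] = 1 iff g(y) = x (the column convention), an illustrative choice made so that matrix multiplication lines up with the composition rule (gh)(x) = g(h(x)).

```python
from itertools import permutations

def mat(g):
    # Permutation matrix with M[x][y] = 1 iff g(y) = x (column convention,
    # chosen so that mat(g) times mat(h) represents x -> g(h(x))).
    n = len(g)
    return [[1 if g[y] == x else 0 for y in range(n)] for x in range(n)]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def compose(g, h):                        # (gh)(x) = g(h(x))
    return tuple(g[h[x]] for x in range(len(g)))

I = [[1 if i == j else 0 for j in range(3)] for i in range(3)]
for g in permutations(range(3)):
    M = mat(g)
    MT = [list(r) for r in zip(*M)]       # real 0/1 matrix: adjoint = transpose
    assert matmul(M, MT) == I             # each R_X(g) is unitary
    for h in permutations(range(3)):
        assert mat(compose(g, h)) == matmul(mat(g), mat(h))   # homomorphism
```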
    The sum of two representations R1 : G → GL(d1 , C) and R2 : G → GL(d2 , C) is a representation
R : G → GL(d1 + d2 , C) where R(g) is defined as a block diagonal matrix with two blocks, given by
R1 (g) and R2 (g). For e ≥ 1 let eR := R + · · · + R where the sum is over e copies of R. A representation
R is reducible if it is equivalent to the sum of two representations. Otherwise, the representation R is
irreducible. We summarize a few basic properties of representations below. For details we refer the reader
to any standard book on representation theory, e. g., [10].
Fact 2.4 (Maschke’s theorem). Any representation R of G is equivalent to a sum of irreducible
representations R ≡ e1 R1 + · · · + et Rt , where R1 , . . . , Rt are nonequivalent irreducible representations,
and ei ≥ 1 is the multiplicity of Ri . We have ∑ ei dim(Ri) = dim(R).
Fact 2.5 (Schur’s lemma). Let R be a unitary irreducible representation of G of dimension d. Then for
any i, j, k, ℓ ∈ [d],
                                  (1/|G|) ∑_{g∈G} R(g)_{i,j} · \overline{R(g)_{k,ℓ}} = (1/d) δ_{i,k} δ_{j,ℓ} .
Let R′, R″ be two non-equivalent unitary irreducible representations of G of dimensions d′, d″. Then for
any i, j ∈ [d′] and k, ℓ ∈ [d″],
                                         (1/|G|) ∑_{g∈G} R′(g)_{i,j} · \overline{R″(g)_{k,ℓ}} = 0 .
   4 A quick way to prove a slightly weaker estimate is as follows: the VC-dimension [24] of halfspaces is d + 1. Hence by the VC-dimension theorem [24, 22, 23] the number of partitions is at most ∑_{i=0}^{d+1} (N choose i) ≤ N^{d+1}.




    The trivial representation is given by 1(g) = 1 for all g ∈ G. An immediate corollary of Schur’s
lemma is that for every representation R which is irreducible and nontrivial, we have ∑_{g∈G} R(g) = 0.
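Both consequences can be checked directly on the sign representation of S3, a 1-dimensional nontrivial irreducible (a toy verification, not part of the paper's argument):

```python
from itertools import permutations

def sign(p):
    # Parity of a permutation, computed by settling it with transpositions.
    s, p = 1, list(p)
    for i in range(len(p)):
        while p[i] != i:
            j = p[i]
            p[i], p[j] = p[j], p[i]
            s = -s
    return s

G = list(permutations(range(3)))
# Nontrivial irreducible => entries sum to zero over the group.
assert sum(sign(g) for g in G) == 0
# Schur's lemma for d = 1: (1/|G|) sum_g R(g) * conj(R(g)) = 1/d = 1.
assert sum(sign(g) * sign(g) for g in G) == len(G)
```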
    The group algebra C[G] is the linear space of functions µ : G → C. It is often written as µ =
∑_{g∈G} µ(g) · g. Note that the distributions over G form a subset of C[G]. For µ ∈ C[G] and a representation
R of G, let R(µ) := ∑_{g∈G} µ(g)R(g). If µ is a distribution, this is equivalent to R(µ) = E_{g∼µ}[R(g)].
Note that in this notation, the statistical distance between two distributions µ, µ′ on G is dist(µ, µ′) =
‖µ − µ′‖_1 .


3    Almost X-uniform distributions are statistically close to X-uniform dis-
     tributions
We prove in this section Theorem 1.4, which states that almost X-uniform distributions with small enough
error are statistically close to X-uniform distributions. We recall the statement of the theorem.
Theorem 1.4 Let µ be a distribution on G which is almost X-uniform with error ε. Then there exists
a distribution µ′ on G which is X-uniform, and such that the statistical distance between µ and µ′ is at
most 3ε|X|^4 .
    We first rephrase the conditions for a distribution to be X-uniform, or almost X-uniform, in terms of
representations. Let RX be the representation of the action of G on X, i. e., RX(g)_{x,y} = 1_{g(x)=y} . Let UG
denote the uniform distribution over G.
Proposition 3.1. Let µ be a distribution on G. Then
    1. µ is X-uniform iff RX(µ) = RX(UG);
    2. µ is almost X-uniform with error ε iff ‖RX(µ) − RX(UG)‖_∞ ≤ ε.
Proof. The claim is immediate from the definitions of X-uniform and almost X-uniform distributions,
since RX(µ)_{x,y} = Pr_{g∼µ}[g(x) = y] and RX(UG)_{x,y} = Pr_{g∈G}[g(x) = y].

     The first step is to decompose RX into its irreducible representations. Let RX ≡ e0·1 + e1 R1 + · · · +
et Rt , where R1 , . . . , Rt are unitary nonequivalent non-trivial irreducible representations, and ei is the
multiplicity of Ri in RX . We next transform the conditions of Proposition 3.1 to the basis of the irreducible
representations.
Proposition 3.2. Let µ be a distribution on G. Then
    1. µ is X-uniform iff Ri(µ) = 0 for all i ∈ [t];
    2. if µ is almost X-uniform with error ε then ‖Ri(µ)‖_∞ ≤ ε|X| for all i ∈ [t].
Proof. As µ is a distribution, 1(µ) = ∑_{g∈G} µ(g) = 1, and the same holds for UG . Hence always
1(µ) = 1(UG). Thus, RX(µ) = RX(UG) iff Ri(µ) = Ri(UG) for all i ∈ [t]. The first item follows since
Ri(UG) = 0 for all i ∈ [t]. To see that, note that by Schur’s lemma

                           Ri(UG)_{j,k} = (1/|G|) ∑_{g∈G} Ri(g)_{j,k} = (1/|G|) ∑_{g∈G} Ri(g)_{j,k} · 1(g) = 0


since Ri and 1 are nonequivalent unitary irreducible representations. To prove the second item, let µ be an
almost X-uniform distribution with error ε. By Proposition 3.1 this is equivalent to kRX (µ)−RX (UG )k∞ ≤
ε. The L∞ norm is not convenient for the basis change required to switch to the basis of the irreducible
representations. We thus switch to the Frobenius norm, which is trivially bounded by

                                                 kRX (µ) − RX (UG )kFr ≤ ε|X| .

Note that the Frobenius norm is invariant under unitary change of basis, and as both RX and R1, . . . , Rt are
unitary, the basis change can also be assumed to be unitary. We thus have

    ( ∑_{i=1}^{t} ei ‖Ri(µ) − Ri(UG)‖²Fr )^{1/2} = ‖RX(µ) − RX(UG)‖Fr ≤ ε|X| ,

which combined with the fact that Ri(UG) = 0 implies that ‖Ri(µ)‖∞ ≤ ε|X|.
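To make Proposition 3.2 concrete, consider the abelian special case G = Z_n acting on itself by translation: the nontrivial irreducible representations are the one-dimensional characters χ_j(g) = e^{2πijg/n}, and χ_j(µ) is the j-th Fourier coefficient of µ. A minimal Python sketch (the function names and the n = 4 example are ours, purely for illustration):

```python
import cmath

def char(j, g, n):
    # Character chi_j(g) = e^{2*pi*i*j*g/n} of the cyclic group Z_n.
    return cmath.exp(2j * cmath.pi * j * g / n)

def fourier_coeff(mu, j, n):
    # chi_j(mu) = sum_g mu(g) * chi_j(g): the analogue of Ri(mu)
    # for the 1-dimensional representation chi_j.
    return sum(mu[g] * char(j, g, n) for g in range(n))

n = 4
uniform = [1.0 / n] * n
biased = [0.4, 0.2, 0.2, 0.2]

# Item 1 of Proposition 3.2: the uniform distribution kills every
# nontrivial irreducible representation.
assert all(abs(fourier_coeff(uniform, j, n)) < 1e-12 for j in range(1, n))
# A non-uniform distribution leaves some nontrivial coefficient alive.
assert any(abs(fourier_coeff(biased, j, n)) > 1e-3 for j in range(1, n))
```

In this case X-uniformity is exactly the vanishing of all nontrivial Fourier coefficients.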

    The main idea in the proof of Theorem 1.4 is to “correct” each entry of Ri(µ) to zero by making
a small statistical change to µ, without affecting the other entries of Ri or those of any other Ri′. This
is analogous to the proof idea of [5] for almost k-wise independent bits (see also [3]). Performing all
these local changes sequentially over all entries of Ri, i ∈ [t], will shift µ into an X-uniform distribution.
Actually, as a first step we will only get an element of C[G], which we then fix up to be a distribution.
    Let Ri be one of the irreducible representations, and let di = dim(Ri) be its dimension. For j, k ∈ [di]
we define ∆i,j,k ∈ C[G] as

    ∆i,j,k(g) = (di/|G|) · R̄i(g)_{j,k} ,

where the bar denotes complex conjugation.
    We consider how shifting µ by a small multiple of ∆i, j,k affects the entries of R1 , . . . , Rt .

Proposition 3.3. Let i ∈ [t], j, k ∈ [di] and i′ ∈ [t], j′, k′ ∈ [di′]. For any δ ∈ R we have

    Ri′(µ + δ∆i,j,k)_{j′,k′} = Ri′(µ)_{j′,k′} + δ · 1_{(i,j,k)=(i′,j′,k′)} .

Proof. First, by additivity,

    Ri′(µ + δ∆i,j,k)_{j′,k′} = Ri′(µ)_{j′,k′} + δ · Ri′(∆i,j,k)_{j′,k′} .

The claim follows from the orthogonality of the entries of the irreducible representations. By Schur's
lemma,

    Ri′(∆i,j,k)_{j′,k′} = (di/|G|) ∑_{g∈G} Ri′(g)_{j′,k′} R̄i(g)_{j,k} = 1_{(i,j,k)=(i′,j′,k′)} .
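The Schur orthogonality used here can be checked directly in the abelian case, where the matrix entries are just the characters of Z_n; a small sketch (the choice n = 6 is ours):

```python
import cmath

n = 6  # an arbitrary small cyclic group Z_6

def chi(j, g):
    return cmath.exp(2j * cmath.pi * j * g / n)

def inner(j1, j2):
    # (1/|G|) * sum_g chi_{j1}(g) * conj(chi_{j2}(g)); Schur orthogonality
    # says this equals 1 if j1 == j2 and 0 otherwise.
    return sum(chi(j1, g) * chi(j2, g).conjugate() for g in range(n)) / n

assert all(abs(inner(a, b) - (1 if a == b else 0)) < 1e-12
           for a in range(n) for b in range(n))
```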

    We will also need the following proposition, which asserts that 1(∆i,j,k) = 0 and that ‖∆i,j,k‖∞ is
bounded.

Proposition 3.4. Let i ∈ [t], j, k ∈ [di ]. Then

   1. 1(∆i, j,k ) = 0;


   2. ‖∆i,j,k‖∞ ≤ |X|/|G|.
Proof. The first item follows because ∑g∈G Ri (g) j,k = 0 by Schur’s lemma, since Ri is a nontrivial
irreducible representation. The second item follows because di ≤ |X| and because |Ri (g) j,k | ≤ 1 since
Ri (g) is a unitary matrix.

    Applying Proposition 3.3 and Proposition 3.4 iteratively over all elements of R1 , . . . , Rt , we obtain the
following corollary.
Corollary 3.5. Let µ be a distribution over G which is almost X-uniform with error ε. Define ∆ ∈ C[G]
by

    ∆(g) = − ∑_{i∈[t]} ∑_{j,k∈[di]} Ri(µ)_{j,k} · ∆i,j,k(g) .

Then

   1. RX(µ + ∆) = RX(UG);

   2. ‖∆‖∞ ≤ ε|X|^4/|G|.

Proof. The first item holds since Ri(µ + ∆)_{j,k} = Ri(UG)_{j,k} for all i ∈ [t] and j, k ∈ [di] by Proposition 3.3,
and since 1(µ + ∆) = 1(UG) = 1 by the first item in Proposition 3.4. The second item holds since
∑ di^2 ≤ |X|^2 as dim(RX) = |X|, |Ri(µ)_{j,k}| ≤ ε|X| by Proposition 3.2, and |∆i,j,k(g)| ≤ |X|/|G| by the
second item in Proposition 3.4.
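In the abelian case G = Z_n acting on itself, the correction of Corollary 3.5 has a familiar shape: the entries Ri(µ) are Fourier coefficients, ∆i,j,k(g) reduces to conj(χ_j(g))/n (all di = 1), and subtracting each coefficient times its ∆ simply zeroes out the nontrivial part of the Fourier spectrum. A hedged sketch with a toy distribution of our own choosing:

```python
import cmath

def chi(j, g, n):
    return cmath.exp(2j * cmath.pi * j * g / n)

def correct(mu):
    # Abelian analogue of Corollary 3.5: Delta(g) = -sum_{j != 0}
    # chi_j(mu) * conj(chi_j(g))/n (every character has d_i = 1),
    # so mu + Delta has all nontrivial Fourier coefficients equal to 0.
    n = len(mu)
    coeffs = [sum(mu[g] * chi(j, g, n) for g in range(n)) for j in range(n)]
    delta = [-sum(coeffs[j] * chi(j, g, n).conjugate() / n
                  for j in range(1, n)) for g in range(n)]
    return [mu[g] + delta[g].real for g in range(n)]

mu = [0.5, 0.3, 0.1, 0.1]   # a toy non-uniform distribution on Z_4
fixed = correct(mu)
# For the regular action of an abelian group the only X-uniform
# distribution is the uniform one, so the correction recovers it exactly.
assert all(abs(v - 0.25) < 1e-12 for v in fixed)
```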

    We are nearly done. The only problem is that µ + ∆ may not be a distribution: it may be complex,
or have negative values. This can be fixed without increasing the statistical distance by much, which
concludes the proof of Theorem 1.4.

Proof of Theorem 1.4. Let λ = |G| · ‖∆‖∞ ≤ ε|X|^4. Define

    µ′ = (1 − λ)(µ + (∆ + ∆̄)/2) + λ·UG ,

where ∆̄ is the entrywise complex conjugate of ∆. We claim that µ′ is a distribution which is X-uniform.
This will establish the result, as

    ‖µ − µ′‖₁ ≤ λ‖µ‖₁ + (1 − λ)‖∆‖₁ + λ‖UG‖₁ ≤ 3λ ≤ 3ε|X|^4 .

    First let us show that RX(µ′) = RX(UG). We already know by Corollary 3.5 that RX(µ + ∆) = RX(UG).
Conjugating this equality, since RX is a real representation (i. e., all entries of RX(g) are real), and since
µ, UG ∈ R[G] are also real, we obtain that also

    RX(µ + ∆̄) = RX(UG) .

Thus RX(µ′) = RX(UG), since RX(µ′) is a convex combination of RX(µ + ∆), RX(µ + ∆̄) and RX(UG).
    To conclude we need to show that µ′ is a distribution, i. e., that it is real, nonnegative and sums to one.
By definition µ′ is real, and since RX(µ′) = RX(UG) we have ∑_{g∈G} µ′(g) = 1(µ′) = 1(UG) = 1. The
bound µ′(g) ≥ 0 for all g ∈ G follows by an elementary calculation from µ(g) ≥ 0, |∆(g)| ≤ λ/|G| and
UG(g) = 1/|G|.
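The elementary nonnegativity calculation can be checked numerically: writing Re ∆(g) ≥ −λ/|G|, the worst case of the mixture is (1 − λ)(0 − λ/|G|) + λ/|G| = λ²/|G| ≥ 0. A quick sketch (the parameter values are arbitrary choices of ours):

```python
import random

random.seed(2)

def mixed(mu_g, re_delta, lam, G):
    # mu'(g) = (1 - lam) * (mu(g) + Re(Delta(g))) + lam/|G|; the worst
    # case mu(g) = 0, Re(Delta(g)) = -lam/|G| still gives lam^2/|G| >= 0.
    return (1 - lam) * (mu_g + re_delta) + lam / G

G, lam = 100, 0.3
samples = [mixed(random.uniform(0, 2.0 / G),
                 random.uniform(-lam / G, lam / G), lam, G)
           for _ in range(1000)]
assert mixed(0.0, -lam / G, lam, G) >= 0   # exact worst case
assert min(samples) >= 0
```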


4    Uniformly random sets support X-uniform distributions
We establish Theorem 1.5 in this section, which states that w. h. p. a uniformly random set of size |X|^{O(1)}
supports an X-uniform distribution. We recall the statement of the theorem.

Theorem 1.5. Let S ⊂ G be a uniformly random set of size k for k = O(|X|^6). Then with high probability
over the choice of S, there exists a distribution µ supported on S which is X-uniform.
     Recall that a distribution µ is X-uniform if Prg∼µ [g(x) = y] = Prg∈G [g(x) = y] for all x, y ∈ X. We
say a set S supports X-uniformity if there exists a distribution supported on S which is X-uniform. We
first establish that this is a purely geometric property of S.
     Let RX be the representation of the action of G on X, that is,

                                              RX (g)x,y = 1g(x)=y .

Let U = RX(UG) = E_{g∈G}[RX(g)] denote the matrix which corresponds to the action on X of the uniform
distribution over G. We consider these matrices as points in R^d for d = |X|^2.

Proposition 4.1. A set S ⊂ G supports X-uniformity iff the convex hull of the matrices {RX (g) : g ∈ S}
contains the matrix U.

Proof. A point in the convex hull is given by M = ∑g∈S µ(g) · RX (g) where µ(g) ≥ 0 and ∑g∈S µ(g) = 1.
Thus, each point in the convex hull corresponds to a distribution µ over S, and vice versa. Note that
Mx,y = Prg∼µ [g(x) = y], hence an X-uniform distribution corresponds to the matrix U.
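As a toy instance of Proposition 4.1, take G = S3 acting on X = {0, 1, 2}. The cyclic subgroup C3 already works as a support: uniform weights on its three permutation matrices average exactly to U, the all-1/3 matrix. A sketch in exact rational arithmetic (the choice of S3 and C3 is ours):

```python
from fractions import Fraction

def perm_matrix(p):
    # R_X(g) as a 0/1 matrix: entry (x, y) is 1 iff g(x) = y.
    n = len(p)
    return [[1 if p[x] == y else 0 for y in range(n)] for x in range(n)]

# The cyclic subgroup C3 = {e, (0 1 2), (0 2 1)} of S3, as tuples g(x) = p[x].
S = [(0, 1, 2), (1, 2, 0), (2, 0, 1)]
w = Fraction(1, 3)
avg = [[sum(w * perm_matrix(p)[x][y] for p in S) for y in range(3)]
       for x in range(3)]
# The convex hull of the three matrices contains U = R_X(U_G), the
# all-1/3 matrix, so this 3-element set supports an X-uniform distribution.
assert avg == [[Fraction(1, 3)] * 3 for _ in range(3)]
```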

    Let S ⊂ G be a uniformly random set. By Proposition 4.1 it is enough to show that the matrix U lies in
the convex hull of {RX (g) : g ∈ S}. Suppose this is not the case; then there must exist a hyperplane H in
R^d which passes through U and such that all matrices {RX (g) : g ∈ S} lie on one side of H. We first show
that any hyperplane which passes through U has a noticeable fraction of the matrices {RX (g) : g ∈ G} on
both sides.

Proposition 4.2. Let H be a hyperplane which passes through U. The number of matrices {RX(g) : g ∈ G}
on any side of H is at least |G|/(|X|^2 + 1).

Proof. Let H⁺ denote a halfspace defined by H, and let G⁺ = {g ∈ G : RX(g) ∈ H⁺} denote the set of
permutations whose corresponding matrices lie in H⁺. By Carathéodory's theorem, the matrix U can be
written as a convex combination of d + 1 matrices RX(g0), . . . , RX(gd). We claim that for any h ∈ G,
the matrix U also belongs to the convex hull of RX(g0 h), . . . , RX(gd h). This follows since RX(gi h) =
RX(gi)RX(h) and U·RX(h) = U. As U lies on H, a convex hull containing U cannot avoid the closed
halfspace H⁺; hence at least one of g0 h, . . . , gd h must lie in G⁺, for any choice of h ∈ G. This concludes
the proof since, for a randomly chosen h,

    1 = Pr_{h∈G}[∃i, gi h ∈ G⁺] ≤ ∑_{i=0}^{d} Pr_{h∈G}[gi h ∈ G⁺] = (d + 1) · |G⁺|/|G| .

    We now establish Theorem 1.5.


Proof of Theorem 1.5. Let S ⊂ G be a uniformly random set of N elements, chosen with repetitions.
Let K ◁ G be the normal subgroup of G which acts trivially on X, i. e., K = {g ∈ G : g(x) = x for all x ∈ X}.
Observe that the quotient group G/K also acts on X, and that {RX(g) : g ∈ G} = {RX(g) : g ∈ G/K}.
Thus the number of distinct matrices RX(g) is bounded by |G/K| ≤ |X|!, and by Fact 2.2 the number of
ways to partition this set of matrices by a hyperplane, and in particular one which passes through U, is
bounded by (|X|!)^d. Fix such a partition. The number of matrices {RX(g) : g ∈ G} which lie on each
side of the partition is at least |G|/(d + 1) by Proposition 4.2. Hence, the probability that S is contained
in one side of the partition is bounded by 2(1 − 1/(d + 1))^N. Thus, by the union bound, the probability
that there exists a hyperplane passing through U such that S is contained in one side of it is at most

    (|X|!)^d · 2(1 − 1/(d + 1))^N ≤ 2 exp(−N/(d + 1) + d log(|X|!)) ,

which is at most 0.01 (say) for N = O(d^2 log(|X|!)) ≤ O(|X|^6).
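The final estimate is easy to sanity-check numerically. In the sketch below C = 3 stands in for the constant hidden in the O(·); that value is our assumption for illustration, not one from the paper:

```python
import math

def failure_bound(n_X, C=3):
    # Union-bound estimate from the proof: with d = |X|^2 and
    # N = C * d^2 * log(|X|!), the probability that some hyperplane
    # through U has all of S on one side is at most
    # 2 * exp(-N/(d+1) + d * log(|X|!)).
    d = n_X ** 2
    log_fact = math.lgamma(n_X + 1)   # log(|X|!)
    N = C * d * d * log_fact
    return 2 * math.exp(-N / (d + 1) + d * log_fact)

# Already for tiny X the bound drops far below 0.01.
assert failure_bound(3) < 0.01
assert failure_bound(5) < 0.01
```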


5    Almost X-uniform distributions support X-uniform distributions
We prove in this section Theorem 1.6, which states that if µ is an almost (X × X)-uniform distribution with
small enough error, then there exists an X-uniform distribution µ′ supported on the support of µ. We
recall the statement of the theorem.

Theorem 1.6. Let µ be a distribution supported on a set S ⊂ G which is almost (X × X)-uniform with
error ε < 0.5|X|^{-7}. Then there exists a distribution µ′ supported on S which is X-uniform.
     Fix such a distribution µ, and let S denote its support, S = {g : µ(g) > 0}. Let RX be the representation
of G acting on X. By Proposition 4.1, S supports an X-uniform distribution iff RX (UG ) = Eg∈G [RX (g)]
lies in the convex hull of {RX (g) : g ∈ S}. Assume this is not the case; then there exists a hyperplane H
which passes through RX (UG ) and such that all {RX (g) : g ∈ S} lie on one side of H.
     We first project H into a hyperplane with a simpler representation. Let RX ≡ e0 1 + e1 R1 + · · · + et Rt
denote the decomposition of RX into unitary nonequivalent irreducible representations, and let di = dim(Ri)
denote the dimension of each irreducible representation. Essentially, we will project H to “use” only one
copy from each nontrivial irreducible representation. That is, we will show that H can be projected to a
hyperplane separating 0 from {R1 (g) × · · · × Rt (g) : g ∈ S}.

Proposition 5.1. There exists a map L : G → R given by

    L(g) := ∑_{i∈[t]} ∑_{j,k∈[di]} λi,j,k · Ri(g)_{j,k}


for some coefficients {λi, j,k ∈ C : i ∈ [t], j, k ∈ [di ]} such that

    1. Eg∈G [L(g)] = 0;

    2. for all g ∈ S, L(g) > 0.


Proof. The existence of a hyperplane H separating RX(UG) from {RX(g) : g ∈ S} implies that there exists
a map L₀ : G → R, defined as L₀(g) = ∑_{x,y∈X} αx,y RX(g)_{x,y} for some real coefficients {αx,y ∈ R : x, y ∈ X},
such that

    L₀(g) > E_{h∈G}[L₀(h)]

for all g ∈ S. Applying the linear transformation mapping RX into the basis of irreducible representations,
we get that L₀(g) can be expressed as

    L₀(g) = ∑_{ℓ∈[e0]} β0,ℓ · 1(g) + ∑_{i∈[t]} ∑_{j,k∈[di]} ∑_{ℓ∈[ei]} βi,j,k,ℓ · Ri(g)_{j,k} ,

where β0,ℓ, βi,j,k,ℓ ∈ C are obtained by a linear transformation (over C) of the αx,y. Observe that E_{g∈G}[L₀(g)] =
∑_{ℓ∈[e0]} β0,ℓ by Schur's lemma, and define L(g) := L₀(g) − E[L₀(g)]. Note that L : G → R is real since
L₀ : G → R was real, that E[L] = 0, and that L(g) > 0 for all g ∈ S. The coefficient λi,j,k is given by the
sum of all βi,j,k,ℓ over ℓ ∈ [ei].

      We may assume without loss of generality that E_{g∈G}[L^2(g)] = 1, by multiplying all coefficients
λi,j,k by an appropriate factor. The main idea is to show that if µ is almost (X × X)-uniform, then
E_{g∼µ}[L^2(g)] ≈ E_{g∈G}[L^2(g)] = 1 while E_{g∼µ}[L(g)] ≈ E_{g∈G}[L(g)] = 0. Combined with a bound on
‖L‖∞, a simple calculation then shows that L(g) > 0 cannot hold for all g in the support of µ.
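The "simple calculation" alluded to is the inequality E_µ[L²] ≤ ‖L‖∞ · E_µ[L], valid whenever L > 0 on the support of µ; a quick randomized check (the toy data is ours):

```python
import random

random.seed(0)
support = range(10)
weights = [random.random() for _ in support]
total = sum(weights)
mu = [p / total for p in weights]                  # a toy distribution
L = [random.uniform(0.1, 5.0) for _ in support]    # strictly positive L

# If L > 0 on the support of mu, each term mu(g)*L(g)^2 is at most
# mu(g) * ||L||_inf * L(g), so E_mu[L^2] <= ||L||_inf * E_mu[L].
first = sum(p * v for p, v in zip(mu, L))
second = sum(p * v * v for p, v in zip(mu, L))
assert second <= max(L) * first + 1e-12
```

If the second moment is close to 1 while the first moment and ‖L‖∞ are small, this inequality is violated, which is exactly the contradiction driving the proof.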
      The first step is to show that the coefficients λi, j,k cannot be very large.

Proposition 5.2.

    ∑_{i∈[t]} ∑_{j,k∈[di]} |λi,j,k|^2 / di = 1 .

In particular, |λi,j,k| ≤ |X|^{1/2} for all i, j, k.

Proof. We assumed E[L^2] = 1, which, since L is real, implies E[|L|^2] = 1. Using Schur's lemma we get

    1 = E_{g∈G}[L(g) · L̄(g)]
      = ∑_{i,i′∈[t]} ∑_{j,k∈[di]} ∑_{j′,k′∈[di′]} λi,j,k λ̄i′,j′,k′ · E_{g∈G}[Ri(g)_{j,k} R̄i′(g)_{j′,k′}]
      = ∑_{i∈[t]} ∑_{j,k∈[di]} |λi,j,k|^2 / di ,

and in particular |λi,j,k|^2 ≤ di ≤ |X|.
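In the abelian case the identity of Proposition 5.2 is just Parseval's identity (every character has di = 1); a sketch with arbitrary coefficients of our choosing:

```python
import cmath, random

random.seed(1)
n = 8
# lambda_0 = 0 corresponds to E_g[L] = 0; the rest are arbitrary reals.
lam = [0.0] + [random.gauss(0, 1) for _ in range(n - 1)]

def L(g):
    # L(g) = sum_j lambda_j * chi_j(g) over the characters of Z_n.
    return sum(lam[j] * cmath.exp(2j * cmath.pi * j * g / n)
               for j in range(n))

# Parseval: E_g[|L(g)|^2] = sum_j |lambda_j|^2 (all d_i = 1 here).
second_moment = sum(abs(L(g)) ** 2 for g in range(n)) / n
assert abs(second_moment - sum(x * x for x in lam)) < 1e-9
```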

    An immediate corollary is that L(g) can never be very large.

Corollary 5.3. |L(g)| ≤ |X|^{2.5} for all g ∈ G.

Proof. We have |Ri(g)_{j,k}| ≤ 1 since Ri is unitary, hence |L(g)| ≤ ∑_{i∈[t]} ∑_{j,k∈[di]} |λi,j,k| ≤ |X|^{2.5},
since there are at most ∑ di^2 ≤ |X|^2 terms and each |λi,j,k| ≤ |X|^{1/2} by Proposition 5.2.


     The bound on |λi,j,k|, together with the assumption that µ is almost (X × X)-uniform, implies that the
first and second moments of L are approximately the same under µ and under the uniform distribution
over G.

Proposition 5.4. Let µ be a distribution which is almost (X × X)-uniform with error ε. Then

   1. |E_{g∼µ}[L(g)]| ≤ ε|X|^{4.5};

   2. |E_{g∼µ}[L^2(g)] − 1| ≤ ε|X|^7.
Proof. We have

    |E_{g∼µ}[L(g)]| ≤ ∑_{i∈[t]} ∑_{j,k∈[di]} |λi,j,k| · |E_{g∼µ}[Ri(g)_{j,k}]| .

The bound on the first moment follows since ∑ di^2 ≤ |X|^2; since |λi,j,k| ≤ |X|^{1/2} by Proposition 5.2; and
since µ is in particular almost X-uniform with error ε|X|, so that by Proposition 3.2 we have |E_{g∼µ}[Ri(g)_{j,k}]| ≤
ε|X|^2. The bound on the second moment is proved in a similar way. Recall that we proved the identity
∑ |λi,j,k|^2/di = 1 in Proposition 5.2. So

    E_{g∼µ}[|L(g)|^2] − 1 = ∑_{i,j,k} |λi,j,k|^2 · ( E_{g∼µ}[|Ri(g)_{j,k}|^2] − 1/di )
                          + ∑_{(i,j,k)≠(i′,j′,k′)} λi,j,k λ̄i′,j′,k′ · E_{g∼µ}[Ri(g)_{j,k} R̄i′(g)_{j′,k′}] .

To conclude the proof we need to show that E_{g∼µ}[|Ri(g)_{j,k}|^2] ≈ 1/di and that E_{g∼µ}[Ri(g)_{j,k} R̄i′(g)_{j′,k′}] ≈ 0.
    The condition that µ is almost (X × X)-uniform with error ε is equivalent to

    ‖RX×X(µ) − RX×X(UG)‖∞ ≤ ε .

Switching to the Frobenius norm, this implies

    ‖RX×X(µ) − RX×X(UG)‖Fr ≤ ε|X|^2 .

We now decompose RX×X into simpler representations, coming from the irreducible representations of
RX. We have RX×X = RX ⊗ RX, which, since RX is real, also gives RX×X = RX ⊗ R̄X. Now, if
RX ≡ e0 1 + ∑_{i=1}^{t} ei Ri is the decomposition of RX into irreducible unitary nonequivalent representations,
we have

    RX×X ≡ e0^2 1 + ∑_{i=1}^{t} e0 ei (Ri + R̄i) + ∑_{i,i′=1}^{t} ei ei′ (Ri ⊗ R̄i′) .

Note that this is not the decomposition of RX×X into irreducible representations, since Ri ⊗ R̄i′ is reducible!
Nevertheless, we observe that as the basis change for RX was unitary, so is the basis change for RX×X
(since the tensor product of two unitary matrices is again unitary). In particular, we get that

    ‖(Ri ⊗ R̄i′)(µ) − (Ri ⊗ R̄i′)(UG)‖Fr ≤ ε|X|^2 ,

which implies that

    ‖(Ri ⊗ R̄i′)(µ) − (Ri ⊗ R̄i′)(UG)‖∞ ≤ ε|X|^2 .


The matrix Ri ⊗ R̄i′ is indexed by pairs ((j, j′), (k, k′)), where (Ri ⊗ R̄i′)(g)_{(j,j′),(k,k′)} = Ri(g)_{j,k} R̄i′(g)_{j′,k′}.
We thus get that for any i, j, k, i′, j′, k′ we have

    | E_{g∼µ}[Ri(g)_{j,k} R̄i′(g)_{j′,k′}] − E_{g∈G}[Ri(g)_{j,k} R̄i′(g)_{j′,k′}] | ≤ ε|X|^2 .

To conclude, by Schur's lemma

    E_{g∈G}[Ri(g)_{j,k} R̄i′(g)_{j′,k′}] = (1/di) · 1_{(i,j,k)=(i′,j′,k′)} .

The bound for the second moment now follows by elementary calculations analogous to the ones for the
first moment.

    We are now ready to prove Theorem 1.6.

Proof of Theorem 1.6. Let µ be almost (X × X)-uniform with error ε < 0.5|X|^{-7}. Summarizing Corol-
lary 5.3 and Proposition 5.4, we have

   1. ‖L‖∞ ≤ |X|^{2.5};

   2. |E_{g∼µ}[L(g)]| ≤ ε|X|^{4.5};

   3. E_{g∼µ}[L(g)^2] ≥ 1 − ε|X|^7.

However, since we assumed towards a contradiction that L(g) > 0 for all g in the support of µ, we have

    E_{g∼µ}[L(g)^2] ≤ ‖L‖∞ · E_{g∼µ}[L(g)] ≤ |X|^{2.5} · ε|X|^{4.5} ,

i. e., we have

    1 − ε|X|^7 ≤ ε|X|^7 ,

which is false whenever ε < 0.5|X|^{-7}.


6     Summary and open problems
We showed that almost X-uniform (or almost (X × X)-uniform) distributions are close to X-uniform
distributions in two ways: they are statistically close to some X-uniform distribution µ′, and they support
an X-uniform distribution µ″. It may be that both can be realized by the same X-uniform distribution,
i. e., that µ′ = µ″. We leave this as an open problem.
    Another interesting combinatorial problem is to construct small families which are uniform. Even
in the special case of k-wise independent permutations, such families are known only explicitly with
exponential size [9] or non-explicitly by a probabilistic argument [17]. It would be interesting to determine
whether the probabilistic argument can be adapted to efficiently construct such families.

Acknowledgements We thank Avi Wigderson for helpful discussions and for the reference to the work
of Karp and Papadimitriou [13], and Laci Babai for helpful comments.


References

 [1] Nir Ailon and Noga Alon: Hardness of fully dense problems. Inform. and Comput., 205(8):1117–1129,
     2007. [doi:10.1016/j.ic.2007.02.006]

 [2] Gorjan Alagic and Alexander Russell: Spectral concentration of positive functions on compact groups.
     J. Fourier Anal. Appl., 17(3):355–373, 2011. [doi:10.1007/s00041-011-9174-5]

 [3] Noga Alon, Alexandr Andoni, Tali Kaufman, Kevin Matulef, Ronitt Rubinfeld, and Ning Xie: Testing
     k-wise and almost k-wise independence. In Proc. 39th STOC, pp. 496–505. ACM Press, 2007.
     [doi:10.1145/1250790.1250863]

 [4] Noga Alon, László Babai, and Alon Itai: A fast and simple randomized parallel algorithm for the
     maximal independent set problem. J. Algorithms, 7(4):567–583, 1986. [doi:10.1016/0196-6774(86)90019-2]

 [5] Noga Alon, Oded Goldreich, and Yishay Mansour: Almost k-wise independence versus k-wise
     independence. Inform. Process. Lett., 88(3):107–110, 2003. [doi:10.1016/S0020-0190(03)00359-4]

 [6] Per Austrin and Johan Håstad: Randomly supported independence and resistance. SIAM J. Comput.,
     40(1):1–27, 2011. Preliminary version in STOC'09. [doi:10.1137/100783534]

 [7] Peter J. Cameron: Permutation groups. In Handbook of Combinatorics, Vol. 1, pp. 611–645. Elsevier,
     Amsterdam, 1995.

 [8] Constantin Carathéodory: Über den Variabilitätsbereich der Fourier'schen Konstanten von positiven
     harmonischen Funktionen. Rendiconti del Circolo Matematico di Palermo, 32(1):193–217, 1911.
     [doi:10.1007/BF03014795]

 [9] Hilary Finucane, Ron Peled, and Yariv Yaari: A recursive construction of t-wise uniform permutations.
     Technical report, 2012. Random Structures and Algorithms, to appear. [arXiv:1201.4960]

[10] William Fulton and Joe Harris: Representation Theory: A First Course. Volume 129 of Graduate
     Texts in Mathematics. Springer, 1st edition, 1991.

[11] Edward Frank Harding: The number of partitions of a set of N points in k dimensions induced by
     hyperplanes. Proc. Edinburgh Math. Soc. (2), 15(4):285–289, 1967. [doi:10.1017/S0013091500011925]

[12] Eyal Kaplan, Moni Naor, and Omer Reingold: Derandomized constructions of k-wise (almost)
     independent permutations. Algorithmica, 55(1):113–133, 2009. Preliminary version in RANDOM'05.
     [doi:10.1007/s00453-008-9267-y]

[13] Richard M. Karp and Christos H. Papadimitriou: On linear characterizations of combinatorial
     optimization problems. SIAM J. Comput., 11(4):620–632, 1982. Preliminary version in FOCS'80.
     [doi:10.1137/0211053]

[14] Martin Kassabov: Symmetric groups and expander graphs. Inventiones Mathematicae, 170(2):327–354,
     2007. [doi:10.1007/s00222-007-0065-y]

[15] Daphne Koller and Nimrod Megiddo: Constructing small sample spaces satisfying given constraints.
     SIAM J. Discrete Math., 7(2):260–274, 1994. Preliminary version in STOC'93.
     [doi:10.1137/S0895480192228140]

[16] Ka-Lam Kueh, Timothy Olson, Daniel Rockmore, and Ki-Seng Tan: Nonlinear approximation theory
     on compact groups. J. Fourier Anal. Appl., 7(3):257–281, 2001. [doi:10.1007/BF02511813]

[17] Greg Kuperberg, Shachar Lovett, and Ron Peled: Probabilistic existence of rigid combinatorial
     structures. In Proc. 44th STOC, pp. 1091–1106. ACM Press, 2012. [doi:10.1145/2213977.2214075]

[18] David Maslen: Efficient computation of Fourier transforms on compact groups. J. Fourier Anal.
     Appl., 4(1):19–52, 1998. [doi:10.1007/BF02475926]

[19] Aidan Roy and Andrew J. Scott: Unitary designs and codes. Designs, Codes and Cryptography,
     53(1):13–31, 2009. [doi:10.1007/s10623-009-9290-2]

[20] Ronitt Rubinfeld and Ning Xie: Robust characterizations of k-wise independence over product spaces
     and related testing results. Random Structures & Algorithms, 2012 (online). Preliminary version in
     ICALP'10. [doi:10.1002/rsa.20423]

[21] Alexander Russell and Hong Wang: How to fool an unbounded adversary with a short key. IEEE
     Trans. Inform. Theory, 52(3):1130–1140, 2006. Preliminary version in EUROCRYPT'02.
     [doi:10.1109/TIT.2005.864438]

[22] Norbert Sauer: On the density of families of sets. J. Combin. Theory Ser. A, 13(1):145–147, 1972.
     [doi:10.1016/0097-3165(72)90019-2]

[23] Saharon Shelah: A combinatorial problem; stability and order for models and theories in infinitary
     languages. Pacific J. Math., 41(1):247–261, 1972. Project Euclid.

[24] Vladimir N. Vapnik and Alexey Ya. Chervonenkis: On the uniform convergence of relative frequencies
     of events to their probabilities. Theory Probab. Appl., 16(2):264–280, 1971. [doi:10.1137/1116025]

[25] Serge Vaudenay: Provable security for block ciphers by decorrelation. In Proc. 15th Symp. Theoretical
     Aspects of Comp. Sci. (STACS'98), pp. 249–275. Springer, 1998. [doi:10.1007/BFb0028566]

[26] Serge Vaudenay: Adaptive-attack norm for decorrelation and super-pseudorandomness. In Proc. 6th
     Ann. Internat. Workshop on Selected Areas in Cryptography (SAC'99), pp. 49–61. Springer, 1999.
     [doi:10.1007/3-540-46513-8_4]

[27] Serge Vaudenay: Decorrelation: A theory for block cipher security. J. Cryptology, 16(4):249–286,
     2003. [doi:10.1007/s00145-003-0220-6]

AUTHORS
     Noga Alon
     Professor
     Tel Aviv University, Israel
     nogaa tau ac il
     http://www.tau.ac.il/~nogaa

     Shachar Lovett
     Assistant Professor
     University of California, San Diego CA
     slovett ucsd edu
     http://cse.ucsd.edu/~slovett

ABOUT THE AUTHORS
     Noga Alon received his Ph. D. in Mathematics at the Hebrew University of Jerusalem
        under the supervision of Micha Perles. He is a Baumritter Professor of Mathematics and
        Computer Science at Tel Aviv University, and frequently visits the Institute for Advanced
        Study in Princeton. He works in Combinatorics, Graph Theory and their applications
        in Theoretical Computer Science, focusing on combinatorial algorithms, combinatorial
        geometry, combinatorial number theory, and algebraic and probabilistic methods in Combi-
        natorics, and has also been working on Circuit Complexity, Streaming algorithms, and
        topological methods in Combinatorics. He is a member of the Israel National Academy
        of Sciences and of Academia Europaea, and has received several awards, including the Pólya
        Prize, the Gödel Prize, the Israel Prize and the EMET Prize. He is married to Nurit and
        has three daughters. More details can be found at Noga Alon's Home Page.

     Shachar Lovett received his Ph. D. from the Weizmann Institute of Science in 2010
        under the supervision of Omer Reingold and Ran Raz. He was a member of the Institute
        for Advanced Study School of Mathematics between 2010 and 2012. He is now an
        assistant professor at the University of California, San Diego School of Computer Science
        and Engineering. He works in Theoretical Computer Science with special emphasis
        on Computational Complexity, Coding theory, Randomness and Pseudo-randomness,
        Algebraic techniques and applications of Additive Combinatorics. He is married to Iris
        and has one daughter and one son. More details can be found at Shachar Lovett's Home
        Page.

