Authors David G. Harris, Aravind Srinivasan,
License CC-BY-3.0
Theory of Computing, Volume 13 (17), 2017, pp. 1–41
www.theoryofcomputing.org
A Constructive Lovász Local Lemma for
Permutations∗
David G. Harris† Aravind Srinivasan‡
Received July 27, 2015; Revised December 30, 2016; Published December 21, 2017
Abstract: While there has been significant progress on algorithmic aspects of the Lovász
Local Lemma (LLL) in recent years, a noteworthy exception is when the LLL is used
in the context of random permutations. The breakthrough algorithm of Moser & Tardos
only works in the setting of independent variables, and does not apply in this context. We
resolve this by developing a randomized polynomial-time algorithm for such applications. A
noteworthy application is for Latin transversals: the best general result known here (Bissacot
et al., improving on Erdős and Spencer) states that any n × n matrix in which each value
appears at most (27/256)n times has a Latin transversal. We present the first polynomial-time
algorithm to construct such a transversal. We also develop RNC algorithms for Latin
transversals, rainbow Hamiltonian cycles, strong chromatic number, and hypergraph packing.
In addition to efficiently finding a configuration which avoids bad events, the algo-
rithm of Moser & Tardos has many powerful extensions and properties. These include a
well-characterized output distribution, parallel algorithms, and a partial
∗ An extended abstract of this paper has appeared in the Proceedings of the 25th ACM-SIAM Symposium on Discrete
Algorithms, 2014 [19].
† Research supported in part by NSF Awards CNS-1010789 and CCF-1422569.
‡ Research supported in part by NSF Awards CNS-1010789 and CCF-1422569, and by a research award from Adobe, Inc.
ACM Classification: F.2.2, G.3
AMS Classification: 68W20, 60C05, 05B15
Key words and phrases: Lovász Local Lemma, Lopsided Lovász Local Lemma, random permutations,
Moser–Tardos algorithm, Latin transversals, rainbow Hamiltonian cycles, strong chromatic number,
hypergraph packing
© 2017 David G. Harris and Aravind Srinivasan
Licensed under a Creative Commons Attribution License (CC-BY) DOI: 10.4086/toc.2017.v013a017
resampling variant. We show that our algorithm has nearly all of the same useful properties
as the Moser–Tardos algorithm, and present a comparison of this aspect with recent work on
the LLL in general probability spaces.
1 Introduction
Recent years have seen substantial progress in developing algorithmic versions of the Lovász Local
Lemma (LLL) and some of its generalizations, starting with the breakthrough by Moser & Tardos [31],
see, e. g., [16, 18, 25, 32]. However, one major relative of the LLL that has eluded constructive versions
is the “lopsided” version of the LLL (with the single exception of the CNF-SAT problem [31]). A natural
setting for the lopsided LLL is where we have one or many random permutations [13, 27, 30]. This
approach has been used for Latin transversals [9, 13, 36], hypergraph packing [28], graph coloring [10],
and certain error-correcting codes [24]. However, current techniques do not give constructive versions
in this context. We develop a randomized polynomial-time algorithm to construct such permutation(s)
whose existence is guaranteed by the lopsided LLL, leading to several algorithmic applications in
combinatorics. Furthermore, since the appearance of the conference version of this work [19], related
papers, including [1, 20, 26], have been published; we make a comparison to these in Sections 1.2 and 6.3,
detailing which of our contributions do not appear to follow from the frameworks of [1, 20, 26].
1.1 The Lopsided Lovász Local Lemma and random permutations
Suppose we want to select N permutations π1 , . . . , πN , where each πk is a permutation on the set [nk ] =
{1, . . . , nk }, which satisfy a given list of side constraints. The Lopsided Lovász Local Lemma (LLLL)
can be used to prove that such permutations exist, under suitable conditions. To do so, we define the
probability space Ω, which is the uniform distribution on Sn1 × · · · × SnN , i. e., each permutation πk is
chosen independently and uniformly. For every constraint on the permutations, there is an associated
“bad” event in the probability space Ω that the permutations violate the constraint. We then wish to show
that there is positive probability that no bad event occurs, i. e., permutations exist satisfying the list of
constraints.
We restrict our attention to a limited class of constraints, in which each bad event B has the form
B ≡ πk1 (x1 ) = y1 ∧ · · · ∧ πkr (xr ) = yr
for some list of tuples {(k1 , x1 , y1 ), . . . , (kr , xr , yr )}. (More complex constraints can usually be decomposed
into such conjunctions, so this does not lose much generality.) We frequently abuse notation to identify B
with the set of tuples describing it, so we write B = {(k1 , x1 , y1 ), . . . , (kr , xr , yr )} and say that B is true on π
if πk1 (x1 ) = y1 ∧ · · · ∧ πkr (xr ) = yr . We will assume that no bad event contains two tuples (k, x, y), (k, x, y′)
where y ≠ y′, or two tuples (k, x, y), (k, x′, y) where x ≠ x′; such a bad event would be impossible and
could be ignored.
To apply the LLLL in this setting, we need to define a dependency graph with respect to these
bad events. We connect two bad events B, B′ by an edge if they overlap in one slice of the domain
or range, that is, if there are k, x, y1 , y2 with (k, x, y1 ) ∈ B, (k, x, y2 ) ∈ B′ or there are k, x1 , x2 , y with
(k, x1 , y) ∈ B, (k, x2 , y) ∈ B′. We write this B ∼ B′; note that B ∼ B. The following notation will be useful:
for pairs (x1 , y1 ), (x2 , y2 ), we write (x1 , y1 ) ∼ (x2 , y2 ) if x1 = x2 or y1 = y2 (or both). Thus, another way to
write B ∼ B′ is that “there are (k, x, y) ∈ B, (k, x′, y′) ∈ B′ with (x, y) ∼ (x′, y′).” At various points we use
the notation (k, x, ∗) to mean any (or all) triples of the form (k, x, y), and similarly for (k, ∗, y), or (x, ∗)
etc. Therefore, yet another way to write the condition B ∼ B′ is that there are (k, x, ∗) ∈ B, (k, x, ∗) ∈ B′ or
(k, ∗, y) ∈ B, (k, ∗, y) ∈ B′.
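The relation ∼ is simple enough to state in code. Below is a minimal Python sketch (our own illustration; representing a bad event as a set of (k, x, y) triples follows the notation above, but the function name is ours):

```python
def overlaps(B1, B2):
    """Return True if B1 ~ B2: the two bad events contain triples from the
    same permutation k that agree in the domain (x) or range (y) coordinate."""
    return any(k1 == k2 and (x1 == x2 or y1 == y2)
               for (k1, x1, y1) in B1
               for (k2, x2, y2) in B2)
```

Note that overlaps(B, B) is True for any nonempty B, matching the remark that B ∼ B.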
With these definitions, one can show that in the space Ω the probability of avoiding a bad event B can
only be increased by avoiding other bad events B′ ≁ B [28]. Thus, in the language of the lopsided LLL,
the relation ∼ defines a negative-dependence graph among the bad events. (See [27, 28, 30] for a study
of the connection between negative dependence, random injections/permutations, and the LLLL.) Hence,
the standard LLLL criterion is as follows.
Theorem 1.1 ([28]). Suppose some function x : B → (0, 1) satisfies, for every B ∈ B, the condition
PΩ (B) ≤ x(B) ∏_{B′ ∼ B, B′ ≠ B} (1 − x(B′)) .
Then the random process of selecting each πk uniformly at random and independently has a positive
probability of selecting permutations that avoid all the bad events.
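For a finite family of bad events, the hypothesis of Theorem 1.1 can be checked mechanically. A Python sketch (our own helper, not from the paper; it takes the probabilities PΩ(B) and the values x(B) as precomputed lists):

```python
def lll_condition_holds(events, p, x):
    """Check the criterion of Theorem 1.1: for every event B = events[i],
    p[i] <= x[i] * prod over {j != i : events[j] ~ B} of (1 - x[j])."""
    def sim(B1, B2):  # the ~ relation: a shared (k, x, *) or (k, *, y)
        return any(k1 == k2 and (x1 == x2 or y1 == y2)
                   for (k1, x1, y1) in B1 for (k2, x2, y2) in B2)
    for i, B in enumerate(events):
        rhs = x[i]
        for j, B2 in enumerate(events):
            if j != i and sim(B, B2):
                rhs *= 1.0 - x[j]
        if p[i] > rhs:
            return False
    return True
```

Here each event is a set of (k, x, y) triples, p[i] = PΩ(events[i]), and x[i] = x(events[i]).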
The “positive probability” of Theorem 1.1 is, however, typically exponentially small, as is standard for
the LLL. As mentioned above, a variety of papers have used the framework of Theorem 1.1 to prove
the existence of various combinatorial structures. Unfortunately, the algorithms for the LLL, such as
Moser–Tardos resampling [31], do not apply in this setting. The problem is that such algorithms have a
more restrictive notion of when two bad events are dependent, namely, that they share variables. (The
Moser–Tardos algorithm allows for a restricted type of dependence called lopsidependence, wherein two
bad events which share a variable but always agree on that value are counted as independent. This is
not strong enough to generate permutations.) So we do not have an efficient algorithm to generate such
permutations; we can merely show that they exist.
We develop an algorithmic analogue of the LLL for permutations. The necessary conditions for
our Swapping Algorithm are the same as for the LLL (Theorem 1.1); however, we will construct such
permutations in randomized polynomial (typically linear or near-linear) time. Our setting is far more
complex than [31], and requires many intermediate results first. The main complication is that when we
encounter a bad event involving “πk (x) = y,” and perform our algorithm’s random swap associated with
it, we could potentially change any entry of πk . In contrast, when we resample a variable in [31, 18], all
the changes are confined to that variable. There is a further technical issue: the current witness-tree-based
algorithmic versions of the LLL, such as [31, 18], identify, for each bad event B in the witness tree τ,
some necessary event occurring with probability at most PΩ (B). This is not the proof we employ here;
there are significant additional terms (“(nk − A′k )!/nk !”—see the proof of Lemma 3.1) that are gradually
“discharged” over time.
We also develop RNC versions of our algorithms. Going from serial to parallel is fairly direct in [31];
our main bottleneck here is that when we resample an “independent” set of bad events, they could still
influence each other.
(Note: we distinguish in this paper between the probability of events which occur in our algorithm,
which we denote simply by P, and the probabilities of events within the space Ω, which we denote by
PΩ .)
1.2 Comparison with other LLLL algorithms
Building on an earlier version of this article [19], several papers have developed generic frameworks
for variations of the Moser–Tardos algorithm applied to other probability spaces. In [1], Achlioptas &
Iliopoulos gave an algorithm which is based on a compression analysis for a random walk; this was
improved for permutations and matchings by Kolmogorov [26]. In [20], Harvey & Vondrák gave a
probabilistic analysis similar to the parallel Moser–Tardos algorithm. These frameworks both include
the permutation LLL as well as some other combinatorial applications. These papers give much simpler
proofs that the Swapping Algorithm terminates quickly.
The Moser–Tardos algorithm has many other powerful properties and extensions, beyond the fact
that it efficiently finds a configuration avoiding bad events. These properties include a well-characterized
output distribution at the end of the resampling process, a corresponding efficient
parallel (RNC) algorithm, a partial-resampling variant (as developed in [18]), and an arbitrary (even
adversarial) choice of which bad event to resample. All of these properties follow from the Witness Tree
Lemma we show for our Swapping Algorithm. The more generalized LLLL frameworks of [1, 20] have a
limited ability to show such extensions.
We will discuss the relationship between this paper and the other LLLL frameworks further in
Section 6.3. As one example of the power of our proof method, we develop a parallel Swapping
Algorithm in Section 7; we emphasize that such a parallel algorithm cannot be shown using the results
of [1] or [20]. A second example is provided by Theorem 8.2, which we do not see how to develop using
the frameworks of [1, 20, 26].
One of the main goals of our paper is to provide a model for what properties a generalized LLLL
algorithm should have. In our view, there has been significant progress toward this goal but there remain
many missing pieces toward a true generalization of the Moser–Tardos algorithm. We will discuss this
more in a concluding section, Section 9.
1.3 Applications
We present algorithmic applications for four classical combinatorial problems: Latin transversals, rainbow
Hamiltonian cycles, strong chromatic number, and edge-disjoint hypergraph packing. In addition to the
improved bounds, we wish to highlight that our algorithmic approach can go beyond Theorem 1.1. As we
will see shortly, one of our asymptotically optimal algorithmic results on Latin transversals could not
even have been shown non-constructively using the lopsided LLL prior to this work.
The study of Latin squares and the closely related Latin transversals is a classical area of combinatorics,
going back to Euler and earlier [23]. Given an m × n matrix A with m ≤ n, a transversal of A is a choice
of m elements from A, one from each row and at most one from any column. Perhaps the major open
problem here is: given an integer s, under what conditions will A have an s-transversal, that is, a
transversal in which no value appears more than s times [9, 12, 13, 35, 36]? The usual type of sufficient
condition sought here is an upper bound ∆ on the number of occurrences of any given value in A. Thus
we ask: what is the maximum ∆ such that any m × n matrix A in which each value appears at most ∆
times is guaranteed to have an s-transversal? We denote this quantity by L(s; m, n).
The case s = 1 is perhaps most studied, and 1-transversals are also called Latin transversals. The case
m = n is also commonly studied (and includes Latin squares as a special case), and we will also focus on
these. It is well-known that L(1; n, n) ≤ n − 1 [35]. In perhaps the first application of the LLLL to random
permutations, Erdős & Spencer essentially proved a result very similar to Theorem 1.1, and used it to
show that L(1; n, n) ≥ n/(4e) [13]. (Their paper shows that L(1; n, n) ≥ n/16; the n/(4e) lower bound
follows easily from their technique.) To our knowledge, this is the first Ω(n) lower bound on L(1; n, n).
Alon asked if there is a constructive version of this result [4]. Building on [13] and using the connections
to the LLL from [33, 34], Bissacot et al. showed non-constructively that L(1; n, n) ≥ (27/256)n [9]. Our
result makes these results constructive.
The lopsided LLL has also been used to study the case s > 1 [36]. Here, we prove a result that is
asymptotically optimal for large s, except for the lower-order O(√s) term: we show (algorithmically)
that L(s; n, n) ≥ (s − O(√s)) · n. An interesting fact is that this was not known even non-constructively
before—Theorem 1.1 roughly gives L(s; n, n) ≥ (s/e) · n. We also give faster serial and perhaps the first
RNC algorithms with good bounds for the strong chromatic number. Strong coloring is quite well studied
[5, 8, 14, 21, 22], and is in turn useful in covering a matrix with Latin transversals [7].
1.4 Outline
In Section 2 we introduce our Swapping Algorithm, a variant of the Moser–Tardos resampling algorithm.
In it, we randomly select our initial permutations; as long as some bad event is currently true, we perform
certain random swaps to randomize (or resample) them.
Section 3 introduces the key analytic tools to understand the behavior of the Swapping Algorithm,
namely the witness tree and the witness subdag. The construction for witness trees follows [31]; it
provides an explanation or history for the random choices used in each resampling. The witness subdag is
a related concept, which is new here; it provides a history not for each resampling, but for each individual
swapping operation performed during the resamplings.
In Section 4, we show how these witness subdags may be used to deduce partial information about the
permutations. As the Swapping Algorithm proceeds in time, the witness subdags can also be considered
to evolve over time. At each stage of this process, the current value of the witness subdags provides
information about the current values of the permutations. In Section 5, we use this process to make
probabilistic predictions for certain swaps made by the Swapping Algorithm. Whenever the witness
subdags change, the swaps must be highly constrained so that the permutations still conform to them. We
calculate the probability that the swaps satisfy these constraints.
Section 6 puts the analyses of Sections 3, 4, 5 together, to prove that our Swapping Algorithm
terminates in polynomial time under the same conditions as those of Theorem 1.1; also, as mentioned in
Section 1.2, Section 6.3 discusses certain contributions that our approach leads to that do not appear to
follow from [1, 20, 26].
In Section 7, we introduce a parallel (RNC) algorithm corresponding to the Swapping Algorithm.
This is similar in spirit to the Parallel Resampling Algorithm of Moser & Tardos. In the latter algorithm,
one repeatedly selects a maximal independent set (MIS) of bad events which are currently true, and
resamples them in parallel. In our setting, bad events which are “independent” in the LLL sense (that is,
they are not connected via ∼), may still influence each other; a great deal of care must be taken to avoid
these conflicts.
Section 8 describes a variety of combinatorial problems to which our Swapping Algorithm can
be applied, including Latin transversals, strong chromatic number, and hypergraph packing. Finally,
we conclude in Section 9 with a discussion of future goals for the construction of a generalized LLL
algorithm.
2 The Swapping Algorithm
We will analyze the following Swapping Algorithm to find a satisfactory π1 , . . . , πN .
1. Generate the permutations π1 , . . . , πN uniformly at random and independently.
2. While there is some true bad event,
3. Choose some true bad event B ∈ B arbitrarily. For each permutation that is involved in B, we
perform a swapping of all the relevant entries. (We will describe the swapping subroutine
“Swap” shortly.) We refer to this step as a resampling of the bad event B.
Each permutation involved in B is swapped independently, but if B involves multiple entries
from a single permutation, then all such entries are swapped simultaneously. For example, if
B consisted of triples (k1 , x1 , y1 ), (k2 , x2 , y2 ), (k2 , x3 , y3 ), then we would perform Swap(πk1 ; x1 )
and Swap(πk2 ; x2 , x3 ), where the “Swap” procedure is given next.
The swapping subroutine Swap(π; x1 , . . . , xr ) for a permutation π : [t] → [t] is defined as follows.
Repeat the following for i = 1, . . . , r:
• Select x′i uniformly at random among [t] − {x1 , . . . , xi−1 }.
• Swap entries xi and x′i of π.
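In code, the subroutine looks as follows. This is our own Python sketch of the Swap procedure just defined, representing a permutation of [t] as a list indexed by 0, . . . , t − 1:

```python
import random

def swap(pi, xs):
    """Swap(pi; x1, ..., xr): for i = 1, ..., r, pick x_i' uniformly from
    [t] - {x1, ..., x_{i-1}} and exchange entries x_i and x_i' of pi."""
    t = len(pi)
    for i, x in enumerate(xs):
        # Note: x_i itself is a legal choice, in which case the swap is a no-op.
        x_prime = random.choice([z for z in range(t) if z not in xs[:i]])
        pi[x], pi[x_prime] = pi[x_prime], pi[x]
    return pi
```

Each call randomizes the selected entries while, at every step, keeping pi a permutation.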
At every stage of this algorithm all the πk are permutations, and if this algorithm terminates, then the
πk must avoid all the bad events. So our task will be to show that the algorithm terminates in polynomial
time. We measure time in terms of a single iteration of the main loop of the Swapping Algorithm: each
time we run one such iteration, we increment the time by one. We will use the notation πkT to denote the
value of permutation πk after time T . The initial sampling of the permutation (after Step (1)) generates
πk0 .
The swapping subroutine seems strange; it would appear more natural to allow x′i to be uniformly
selected among [t]. However, the swapping subroutine is nothing more than the Fisher–Yates shuffle
for generating uniformly random permutations. If we allowed x′i to be chosen from all of [t], then the
resulting permutation would be biased. The goal is to change πk in a minimal way to ensure that
πk (x1 ), . . . , πk (xr ) and πk⁻¹ (y1 ), . . . , πk⁻¹ (yr ) are adequately randomized.
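To see the Fisher–Yates connection concretely: if Swap is applied to every position of the identity permutation in order, then at step i the candidate set [t] − {x1 , . . . , xi−1 } is exactly {i, . . . , t}, and the subroutine reduces to the textbook Fisher–Yates shuffle. A short sketch of this special case (our own illustration):

```python
import random

def swap_everything(t, rng=random):
    """Swap applied to the identity with xs = [0, ..., t-1]: at step i the
    candidate set [t] - {0, ..., i-1} equals {i, ..., t-1}, so selecting
    uniformly from it is exactly the Fisher-Yates shuffle."""
    pi = list(range(t))
    for i in range(t):
        j = rng.randrange(i, t)  # uniform over the remaining positions
        pi[i], pi[j] = pi[j], pi[i]
    return pi
```

This is why the output of the initial sampling step is a uniformly random permutation.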
There are alternative methods for generating random permutations, and many of these can replace
the Swapping subroutine without changing our analysis. We discuss a variety of such equivalencies
in Appendix A, which are used in various parts of our proofs. One class of algorithms that has a very
different behavior is the commonly used method to generate random reals ri ∈ [0, 1], and then form the
permutation by sorting these reals. When encountering a bad event, one would resample the affected
reals ri . In our setting, where the bad events are defined in terms of specific values of the permutation,
this is not a good swapping method because a single swap can drastically change the permutation.
When bad events are defined in terms of the relative rankings of the permutation (e. g., a bad event is
π(x1 ) < π(x2 ) < π(x3 )), then this is a better method and can be analyzed in the framework of the ordinary
Moser–Tardos algorithm.
3 Witness trees and witness subdags
To analyze the Swapping Algorithm, following the Moser–Tardos approach [31], we introduce the concept
of an execution log and a witness tree. The execution log consists of listing every resampled bad event, in
the order that they are resampled. We form a witness tree to justify the resampling at time t. We start
with the resampled bad event B corresponding to time t, and create a single node in our tree labeled by
this event. We move backward in time; for each bad event B we encounter, we add it to the witness tree if
B ∼ B′ for some event B′ already in the tree. We choose such a B′ that has the maximum depth in the
current tree (breaking ties arbitrarily), and make B a child of this B′ (there could be many nodes labeled
B′). If B ≁ B′ for all B′ in the current tree, we ignore this B and keep moving backward in time. To make
this discussion simpler we say that the root of the tree is at the “top” and the deep layers of the tree are at
the “bottom.” The top of the tree corresponds to later events, the bottom of the tree to the earliest events.
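The backward construction just described is straightforward to implement. A hedged Python sketch (our own representation: the execution log is a list of bad events, each a set of (k, x, y) triples, and tree nodes are small dicts):

```python
def build_witness_tree(log, T):
    """Build the witness tree justifying the resampling at (0-indexed) time T,
    scanning the execution log backward as described above."""
    def sim(B1, B2):  # the ~ relation: a shared (k, x, *) or (k, *, y)
        return any(k1 == k2 and (x1 == x2 or y1 == y2)
                   for (k1, x1, y1) in B1 for (k2, x2, y2) in B2)
    root = {"event": log[T], "depth": 0, "children": []}
    nodes = [root]
    for B in reversed(log[:T]):
        related = [v for v in nodes if sim(B, v["event"])]
        if related:  # attach B below a ~-related node of maximum depth
            parent = max(related, key=lambda v: v["depth"])
            child = {"event": B, "depth": parent["depth"] + 1, "children": []}
            parent["children"].append(child)
            nodes.append(child)
        # otherwise B is ignored and we keep moving backward in time
    return root
```

Ties in depth are broken here by the order of the log; the analysis permits any tie-breaking rule.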
We will use the term “witness tree” in two closely related senses in the following proof. First, when
we run the Swapping Algorithm, we produce a witness tree τ̂ T ; this is a random variable. Second, we
might want to fix some labeled tree τ, and discuss hypothetically under what conditions it could be
produced or what properties it has; in this sense, τ is a specific object. We will always use the notation
τ̂ T to denote the specific witness tree produced by running the Swapping Algorithm, corresponding to
resampling time T . We write τ̂ as shorthand for τ̂ T where T is understood from context (or irrelevant).
We say that a witness tree τ appears if τ̂ T = τ for some T ≥ 0.
The critical lemma that allows us to analyze the behavior of this algorithm is the following Witness
Tree Lemma.
Lemma 3.1 (Witness Tree Lemma). Let τ be a witness tree, with nodes labeled B1 , . . . , Bs . Then
P(τ appears) ≤ PΩ (B1 ) · · · PΩ (Bs ) .
Note that the probability of the event B within the space Ω can be computed as follows: if B contains
r1 , . . . , rN elements from the permutations 1, . . . , N respectively (and B is not impossible), then
PΩ (B) = (n1 − r1 )!/n1 ! · · · (nN − rN )!/nN ! .
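This formula is easy to evaluate directly. A small Python sketch (our own helper; B is again a set of (k, x, y) triples, and n[k] is the size of permutation k):

```python
from math import factorial

def prob_omega(B, n):
    """P_Omega(B) = prod_k (n_k - r_k)! / n_k!, where r_k is the number of
    triples of B involving permutation k (B assumed not impossible)."""
    r = {}
    for (k, _, _) in B:
        r[k] = r.get(k, 0) + 1
    p = 1.0
    for k, rk in r.items():
        p *= factorial(n[k] - rk) / factorial(n[k])
    return p
```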
This lemma is superficially similar to the corresponding lemma of Moser & Tardos [31]. However,
the proof will be far more complex, and we will require many intermediate results first. The main
complication is that when we encounter a bad event involving πk (x) = y, and we perform the random
swap associated with it, then we could potentially change any entry of πk . By contrast, when the Moser–
Tardos algorithm resamples a variable, all the changes are confined to that variable. However, as we will
see, the witness tree will leave us with enough clues about which swap was actually performed that we
will be able to narrow down the possible impact of the swap.
The analysis in the next sections can be very complicated. We have two recommendations to make
these proofs easier. First, the basic idea behind how to form and analyze these trees comes from [31]; the
reader should consult that paper for results and examples which we omit here. Second, one can get most
of the intuition behind these proofs by considering the situation in which there is a single permutation,
and every bad event has the form π(xi ) = yi . In this case, the witness subdags (defined later) are more or
less equivalent to the witness tree. (The main point of the witness subdag concept is, in effect, to reduce
bad events to their individual elements.) When reading the following proofs, it is a good idea to keep this
special case in mind. In several places, we will discuss how certain results simplify in that setting.
The following proposition is the main reason the witness tree encodes sufficient information about
the sequence of swaps.
Proposition 3.2. Suppose that at some time t0 we have πkt0 (X) ≠ Y , and at some later time t2 > t0 we
have πkt2 (X) = Y . Then some bad event including (k, X, ∗) or (k, ∗,Y ) must have occurred at some
intermediate time t1 .
Proof. Let t1 ∈ [t0 ,t2 − 1] denote the earliest time at which we had πkt1 +1 (X) = Y ; this must be due to
encountering some bad event including the elements (k, x1 , y1 ), . . . , (k, xr , yr ) (and possibly other elements
from other permutations). Suppose that πk (X) = Y was first caused by swapping entry xi , which at that
time had πk (xi ) = y′i , with some entry x″ whose value at that time was y″ = πk (x″).
After this swap, we have πk (xi ) = y″ and πk (x″) = y′i . Evidently x″ = X or xi = X. In the second case,
the bad event at time t1 included (k, X, ∗) as desired and we are done. So suppose x″ = X and y′i = Y .
So at the time of the swap, we had πk (xi ) = Y . The only earlier swaps in this resampling were with
x1 , . . . , xi−1 ; so at the beginning of time t1 , we must have had πkt1 (x j ) = Y for some j ≤ i. This implies
that y j = Y , so that the bad event at time t1 included (k, ∗,Y ) as desired.
To explain some of the intuition behind Lemma 3.1, we note that Proposition 3.2 implies Lemma 3.1
for a singleton witness tree.
Corollary 3.3. Suppose that τ is a singleton node labeled by B. Then P(τ appears) ≤ PΩ (B).
Proof. Suppose τ̂ T = τ. We claim that B must have been true of the initial configuration. For suppose
that (k, x, y) ∈ B but in the initial configuration we have πk (x) ≠ y. At some later point in time t ≤ T , the
event B must become true. By Proposition 3.2, there is then some time t′ < t at which we encounter a bad
event B′ including (k, x, ∗) or (k, ∗, y). This bad event B′ occurs earlier than B, and B′ ∼ B. Hence, we
would have placed B′ below B in the witness tree τ̂ T , contradicting the assumption that τ is a singleton.
So B holds in the initial configuration, an event of probability PΩ (B).
In proving Lemma 3.1, we will not need to analyze the interactions between the separate permutations,
but rather we will be able to handle each permutation in a completely independent way. For a permutation
πk , we define the witness subdag for permutation πk ; this is a relative of the witness tree, but which only
includes the information for a single permutation at a time.
Definition 3.4 (Witness subdags). For a permutation πk , a witness subdag for πk is defined to be a
directed acyclic simple graph whose nodes are labeled with pairs of the form (x, y). If a node v is labeled
by (x, y), we write v ≈ (x, y). This graph must in addition satisfy the following conditions:
1. If any pair of nodes overlaps in a coordinate, that is, v ≈ (x, y) ∼ (x′, y′) ≈ v′, then the nodes v, v′ must
be comparable (that is, either there is a path from v to v′ or vice versa).
2. Every node of G has in-degree at most two and out-degree at most two.
We may also label the nodes with some auxiliary information; for example, we will record that the
nodes of a witness subdag correspond to bad events or nodes in a witness tree τ.
We refer to vertices close to the source nodes of G (appearing earlier in time) as the “bottom” and
vertices close to the sink nodes (appearing later in time) as the “top” of G.
The witness subdags that we will be interested in are derived from witness trees in the following
manner.
Definition 3.5 (Projection of a witness tree). For a witness tree τ, we define the projection of τ onto
permutation πk which we denote Projk (τ), as follows.
Consider a node v ∈ τ labeled by some bad event B = {(k1 , x1 , y1 ), . . . , (kr , xr , yr )}. For each i with
ki = k, we create a corresponding node v′i ≈ (xi , yi ) in the graph Projk (τ). We also include some auxiliary
information indicating that these nodes came from bad event B, and in particular that all such nodes are
part of the same bad event.
The edges of Projk (τ) are formed as follows. For each node v′ ∈ Projk (τ), labeled by (x, y) and
corresponding to v ∈ τ, we find the node wx ∈ τ (if any) which satisfies the following conditions:
(P1) The depth of wx is smaller than the depth of v.
(P2) wx is labeled by some bad event B′ which contains (k, x, ∗).
(P3) Among all vertices satisfying (P1), (P2), the depth of wx is maximal.
If this node wx ∈ τ exists, then it corresponds to a node w′x ∈ Projk (τ) labeled (k, x, ∗); we construct
an edge from v′ to w′x . Note that, since the levels of the witness tree are independent under ∼, there can
be at most one such wx and at most one such w′x .
We similarly define a node wy satisfying:
(P1’) The depth of wy is smaller than the depth of v.
(P2’) wy is labeled by some bad event B′ which contains (k, ∗, y).
(P3’) Among all vertices satisfying (P1’), (P2’), the depth of wy is maximal.
If this node exists, we create an edge from v′ to the corresponding w′y ∈ Projk (τ) labeled (k, ∗, y).
Note that since edges in Projk (τ) correspond to strictly smaller depth in τ, the graph Projk (τ) is
acyclic. Also, note that it is possible that wx = wy ; in this case we only add a single edge to Projk (τ).
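The projection can also be written out in code. A simplified Python sketch of Definition 3.5 (our own representation: the witness tree is flattened to a list of (depth, event) pairs, and DAG nodes are (depth, x, y) triples; this uses the fact, noted above, that each level contains at most one node matching a given (k, x, ∗) or (k, ∗, y)):

```python
def project(tree_nodes, k):
    """Proj_k(tau): tree_nodes is a list of (depth, event) pairs with each
    event a set of (k', x, y) triples.  Returns the DAG nodes and edges; an
    edge runs from each node v to the matching node of maximal depth smaller
    than v's depth, per conditions (P1)-(P3) and (P1')-(P3')."""
    nodes = [(d, x, y) for (d, B) in tree_nodes for (kk, x, y) in B if kk == k]
    edges = set()
    for (d, x, y) in nodes:
        cand_x = [w for w in nodes if w[0] < d and w[1] == x]  # shares domain x
        cand_y = [w for w in nodes if w[0] < d and w[2] == y]  # shares range y
        if cand_x:
            edges.add(((d, x, y), max(cand_x, key=lambda w: w[0])))
        if cand_y:  # if w_x = w_y, the edge set keeps only a single edge
            edges.add(((d, x, y), max(cand_y, key=lambda w: w[0])))
    return nodes, edges
```

Since every edge points to a node of strictly smaller depth, the resulting graph is acyclic, as noted above.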
Expository remark. In the special case when each bad event contains a single element, the witness
subdag is a “flattening” of the tree structure. Each node in the tree corresponds to a node in the witness
subdag, and each node in the witness subdag points to the next highest occurrence of the domain and
range variables.
Basically, the projection of τ onto k tells us all of the swaps of πk that occur. It also gives us some
of the temporal information about these swaps that would have been available from τ. If there is a path
from v to v′ in Projk (τ), then we know that the swap corresponding to v must come before the swap
corresponding to v′. It is possible that there is a pair of nodes in Projk (τ) which are incomparable, yet in
τ there was enough information to deduce which event came first (because the nodes would have been
connected through some other permutation). So Projk (τ) does discard some information from τ, but it
turns out that we will not need this information.
To prove Lemma 3.1, we will prove (almost) the following claim: Let τ be a witness tree whose
nodes are labeled with bad events B1 , . . . , Bs . Then the probability that there is some T > 0 such that
Projk (τ) = Projk (τ̂ T ) is at most Pk (B1 ) · · · Pk (Bs ), where, for a bad event B, we define Pk (B) in a manner
similar to PΩ (B); namely, if the bad event B contains rk elements from permutation k, then we define
Pk (B) = (nk − rk )!/nk !.
Unfortunately, proving this directly runs into technical complications regarding the order of condi-
tioning. It is simpler to just sidestep these issues. However, the reader should bear this in mind as the
informal motivation for the analysis in Section 4.
4 The conditions on a permutation πk∗ over time
In Section 4, we will fix a value k∗ , and we will describe conditions that the permutation πk∗ must satisfy
at various times t during the execution of the Swapping Algorithm. In this section, we are only analyzing a single
permutation k∗ . To simplify notation, the dependence on k∗ will be hidden henceforth; we will discuss
simply π, Proj(τ), and so forth.
This analysis can be divided into three phases.
1. We define the future-subgraph at time t, denoted Gt . This is a kind of graph which encodes
necessary conditions on π t , in order for τ to appear, that is, for τ̂ T = τ for some T > 0. Importantly,
these conditions, and Gt itself, are independent of the precise value of T . We define and describe
some structural properties of these graphs.
2. We analyze how a future-subgraph Gt imposes conditions on the corresponding permutation π t ,
and how these conditions change over time.
3. We compute the probability that the swapping satisfies these conditions.
We will prove 1. and 2. in Section 4. In Section 5 we will put this together to prove 3. for all the
permutations.
4.1 The future-subgraph
Suppose we have fixed a target graph G, which could hypothetically have been produced as the projection
of τ̂ T onto k∗ . We begin the execution of the Swapping Algorithm and see if, so far, it is still possible that
G = Projk∗ (τ̂ T ), or if G has been disqualified somehow. Suppose we are at time t of this process; we will
show that certain swaps must have already occurred at past times t′ < t, and certain other swaps must
occur at future times t′ > t.
We define the future-subgraph of G at time t, denoted Gt , which tells us all the future swaps that must
occur.
Definition 4.1 (The future-subgraph). We define the future-subgraphs Gt inductively. Initially G0 = G.
When we run the Swapping Algorithm, as we encounter a bad event (k1 , x1 , y1 ), . . . , (kr , xr , yr ) at time t,
we form Gt+1 from Gt as follows:
1. Suppose that ki = k∗ , and Gt has a source node labeled (xi , y′′) where y′′ ≠ yi , or one labeled (x′′, yi ) where x′′ ≠ xi .
Then, as will be shown in Proposition 4.2, we can immediately conclude that G is impossible; we set
Gt+1 = ⊥, and we can abort the execution of the Swapping Algorithm.
2. Suppose that Gt contains source nodes labeled (ki , xi , yi ); then Gt+1 is obtained from Gt by removing
all such nodes.
3. Otherwise, we set Gt+1 = Gt .
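These three update rules are mechanical enough to state in code. The sketch below is ours, not the paper's: it represents a future-subgraph as an explicitly labeled DAG, writes `BOTTOM` for ⊥, and invents the helper names `sources` and `update`.

```python
# Sketch (ours) of the future-subgraph update rules of Definition 4.1.
# A future-subgraph is a labeled DAG; BOTTOM stands for the disqualified graph ⊥.

BOTTOM = None

class FutureSubgraph:
    def __init__(self, labels, edges):
        # labels: node id -> (x, y); edges: set of (u, v) with u preceding v
        self.labels = dict(labels)
        self.edges = set(edges)

    def sources(self):
        # a source node has no incoming edge
        targets = {v for (_, v) in self.edges}
        return {u for u in self.labels if u not in targets}

def update(G, bad_event, k_star):
    """Process a bad event [(k_1, x_1, y_1), ..., (k_r, x_r, y_r)] at time t."""
    if G is BOTTOM:
        return BOTTOM
    src = G.sources()
    for (k, x, y) in bad_event:
        if k != k_star:
            continue
        for u in src:
            ux, uy = G.labels[u]
            # Rule 1: a half-matching source (x, y'') or (x'', y) disqualifies G.
            if (ux == x and uy != y) or (ux != x and uy == y):
                return BOTTOM
    # Rule 2: remove every source node labeled (x_i, y_i) with k_i = k_star.
    hit = {u for u in src
           if any(k == k_star and G.labels[u] == (x, y) for (k, x, y) in bad_event)}
    if not hit:
        return G  # Rule 3: G_{t+1} = G_t
    return FutureSubgraph(
        {u: l for u, l in G.labels.items() if u not in hit},
        {(u, v) for (u, v) in G.edges if u not in hit and v not in hit})

# Tiny demo: node a = (1, 2) precedes b = (3, 4) in permutation k* = 0.
G = FutureSubgraph({'a': (1, 2), 'b': (3, 4)}, {('a', 'b')})
G1 = update(G, [(0, 1, 2)], 0)       # rule 2: the source a is removed
G_bad = update(G1, [(0, 3, 5)], 0)   # rule 1: source (3, 4) conflicts with (3, 5)
```

Resampling a bad event at time t then corresponds to one call of `update`, applying rules 1–3 in order.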
Proposition 4.2. For any time t ≥ 0, let τ̂ T≥t denote the witness tree built for the event at time T , but
using only the execution log from time t onwards. Then if Proj(τ̂ T ) = G, we also have Proj(τ̂ T≥t ) = Gt .
Note that if Gt = ⊥, the latter condition is obviously impossible; in this case, we are asserting that
whenever Gt = ⊥, it is impossible to have Proj(τ̂ T ) = G.
Proof. We omit T from the notation, as usual. We prove this by induction on t. When t = 0, this is
obviously true as τ̂≥0 = τ̂ and G0 = G.
Suppose Proj(τ̂) = G; at time t we encounter a bad event B = (k1 , x1 , y1 ), . . . , (kr , xr , yr ). By inductive
hypothesis, Proj(τ̂≥t ) = Gt .
Suppose first that τ̂≥t+1 does not contain any bad event B′ ∼ B. Then, by our rule for building the
witness tree, we have τ̂≥t = τ̂≥t+1 , and hence Gt = Proj(τ̂≥t+1 ). The graph Proj(τ̂≥t+1 ) cannot have any
source node labeled (k, x, y) with (x, y) ∼ (xi , yi ), as such a node would be labeled with some B′ ∼ B. Hence,
according to our rules for updating Gt , we have Gt+1 = Gt . So in this case we have τ̂≥t = τ̂≥t+1 ,
Gt = Gt+1 , and Proj(τ̂≥t ) = Gt ; it follows that Proj(τ̂≥t+1 ) = Gt+1 , as desired.
Next, suppose τ̂≥t+1 does contain some B′ ∼ B. Then the bad event B will be added to τ̂≥t , placed below any
such B′. When we project τ̂≥t , then for each i with ki = k∗ we add a node (xi , yi ) to Proj(τ̂≥t ). Each
such node is necessarily a source node; for if such a node (xi , yi ) had a predecessor (x′′, y′′) ∼ (xi , yi ), then
the node (x′′, y′′) would correspond to an event B′′ ∼ B placed below B. Hence we see that Proj(τ̂≥t ) is
obtained from Proj(τ̂≥t+1 ) by adding source nodes (xi , yi ) for each (k∗ , xi , yi ) ∈ B.
So Proj(τ̂≥t ) = Proj(τ̂≥t+1 ) plus the added source nodes. By the inductive
hypothesis, Gt = Proj(τ̂≥t ), so that Gt = Proj(τ̂≥t+1 ) plus source nodes for each (k∗ , xi , yi ). Now our rule
for updating Gt+1 from Gt is to remove all such source nodes, so it is clear that Gt+1 = Proj(τ̂≥t+1 ), as
desired.
Note that in this proof, we assumed that Proj(τ̂) = G, and we never encountered the case in which
Gt+1 = ⊥. This confirms our claim that whenever Gt+1 = ⊥ it is impossible to have Proj(τ̂) = G.
By Proposition 4.2, the witness subdag G and the future-subgraphs Gt have a similar shape; they are
all produced by projecting witness trees of (possibly truncated) execution logs. Note that if G = Proj(τ)
for some tree τ, then for any bad event B ∈ τ, either B is not represented in G, or all the pairs of the form
(k∗ , x, y) ∈ B are represented in G and are incomparable there.
The following structural decomposition of a witness subdag G will be critical.
Definition 4.3 (Alternating paths). Given a witness subdag G, we define an alternating path in G to be a
simple path which alternately proceeds forward and backward along the directed edges of G. For a vertex
v ∈ G, the forward path of v in G is the maximal alternating path which includes v and all the forward
edges emanating from v. The backward path of v in G is defined analogously. Because each vertex of G has in-degree and
out-degree at most two, every vertex v has a unique forward and backward path (up to reflection); this
justifies our reference to “the” forward and backward path. These paths may be even-length cycles.
Note that if v is a source node, then its backward path contains just v itself. This is an important type
of alternating path which should always be taken into account in our definitions.
One type of alternating path, which is referred to as the W-configuration, plays a particularly important
role.
Definition 4.4 (The W-configuration). Suppose v ≈ (x, y) has in-degree at most one, and the backward
path of v contains an even number of edges, terminating at a vertex v′ ≈ (x′, y′). We refer to this alternating path
as a W-configuration. (See Figure 1.)
Any W-configuration can be written (in one of its two orientations) as a path of vertices labeled
(x0 , y1 ), (x1 , y1 ), (x1 , y2 ), . . . , (xs , ys ), (xs , ys+1 );
here the vertices (x1 , y1 ), . . . , (xs , ys ) are at the “base” of the W-configuration. Note here that we have
written the path so that the x-coordinate changes, then the y-coordinate, then x, and so on. When written
this way, we refer to (x0 , ys+1 ) as the endpoints of the W-configuration.
If v ≈ (x, y) is a source node, then it defines a W-configuration with endpoints (x, y). This should not
be considered a triviality or degeneracy; rather, it will be the most important type of W-configuration.
Figure 1: The vertices labeled (x0 , y1 ), (x1 , y1 ), . . . , (x4 , y5 ) form a W-configuration of length 9 with
endpoints (x0 , y5 ). Note that the vertex (x0 , y0 ) is not part of this W-configuration.
4.2 The conditions on π t encoded by Gt
At any time t, the future-subgraph Gt gives certain necessary conditions on π in order for some putative τ
to appear. Proposition 4.5 describes a certain set of conditions that plays a key role in the analysis.
Proposition 4.5. For a witness subdag G and integers t ≤ T , the following condition is necessary to have
G = Proj(τ̂ T≥t ): for every W-configuration in G with endpoints (x0 , ys+1 ), we must have π t (x0 ) = ys+1 .
For example, if v ≈ (x, y) is a source node of G, then π t (x) = y.
Proof. We prove this by induction on s. The base case is s = 0; in this case we have a source node (x, y).
Suppose π t (x) ≠ y. In order for τ̂ T to contain some bad event containing (k∗ , x, y), we must at some point
t′ > t have π t′ (x) = y; let t′ be the minimal such time. By Proposition 3.2, we must encounter a bad
event containing (k∗ , x, ∗) or (k∗ , ∗, y) at some intervening time t′′ < t′. If this bad event contains (k∗ , x, y),
then necessarily π t′′ (x) = y, contradicting the minimality of t′. So there is a bad event B containing either
(k∗ , x, ≠y) or (k∗ , ≠x, y), earlier than the earliest occurrence of π(x) = y. This event B corresponds to a
source node (x, ≠y) or (≠x, y) in Proj(τ̂ T≥t ). So (x, y) cannot also be a source node of G.
We now prove the induction step. Consider a W-configuration with base (x1 , y1 ), . . . , (xs , ys ), whose
endpoints are the vertices v, v′ labeled (x0 , y1 ) and (xs , ys+1 ), respectively.
At some future time t′ ≥ t we must encounter a bad event B involving some subset of the source
nodes; say that B includes (xi1 , yi1 ), . . . , (xir , yir ) for 1 ≤ r ≤ s. As these were necessarily source nodes
in Proj(τ̂ T≥t′ ), we had π t′ (xi1 ) = yi1 , . . . , π t′ (xir ) = yir . After the swaps, these source nodes are removed,
and so the updated Proj(τ̂ T≥t′+1 ) has r + 1 new W-configurations, whose lengths are all smaller than s. By
the inductive hypothesis, the updated permutation π t′+1 must then satisfy
π t′+1 (x0 ) = yi1 , π t′+1 (xi1 ) = yi2 , . . . , π t′+1 (xir ) = ys+1 .
By Proposition A.2, we may suppose without loss of generality that the resampling of the bad event
first swaps xi1 , . . . , xir in that order. Let π′ denote the result of these swaps; there may be additional swaps
to other elements of the permutation, but we must have π t′+1 (xiℓ ) = π′(xiℓ ) for ℓ = 1, . . . , r. Evidently
xi1 swapped with xi2 , then xi2 swapped with xi3 , and so on, until eventually xir was swapped with
x′′ = (π t′ )−1 (ys+1 ). At this point, we have π′(x′′) = yi1 . Later swaps during time t′ may swap x′′ with some
other x, where (x, y) ∈ B. Thus, at time t′ + 1 we either have π t′+1 (x′′) = yi1 or π t′+1 (x) = yi1 where
(x, y) ∈ B. Recall that π t′+1 (x0 ) = yi1 ; thus either x′′ = x0 or x = x0 .
In the latter case, (x0 , y) ∈ B. This implies that, when we encounter the bad event B at time t′,
there is a source node labeled (x0 , y) ∈ Proj(τ̂ T≥t′ ). This node (x0 , y) would also occur in Proj(τ̂ T≥t ). So
(x0 , y1 ), (x1 , y1 ), . . . , (xs , ys+1 ) cannot be a W-configuration in Proj(τ̂ T≥t ), although it is a W-configuration
in G.
Thus, we conclude that x′′ = x0 . So (π t′ )−1 (ys+1 ) = x′′ = x0 , or equivalently π t′ (x0 ) = ys+1 . This in turn
implies that π t (x0 ) = ys+1 . For, by Proposition 3.2, otherwise we would have encountered a bad event
involving (x0 , ∗) or (∗, ys+1 ); these would imply an additional in-neighbor of v or v′, respectively, which
contradicts that it is part of a W-configuration of Proj(τ̂ T≥t ).
Proposition 4.5 can be viewed equally as a definition:
Definition 4.6 (Active conditions of a witness subdag). We refer to the conditions implied by Proposition 4.5 as the active conditions of the witness subdag G. More formally, we define
Active(G) = {(x, y) | (x, y) are the endpoints of a W-configuration of G} .
We also define Atk to be the cardinality of Active(Gk,t ), that is, the number of active conditions of permutation πk at time t. (The subscript k may be omitted in context, as usual.)
Lemma 4.7. Suppose G is a witness subdag which has source nodes v1 ≈ (x1 , y1 ), . . . , vr ≈ (xr , yr )
(plus possibly some additional source nodes). Let H = G − v1 − · · · − vr . Then there is a set Z ⊆
{(x1 , y1 ), . . . , (xr , yr )} with the following properties:
1. There is an injective function f : Z → Active(H), with the property that (x, y) ∼ f ((x, y)) for all
(x, y) ∈ Z .
2. | Active(H)| = | Active(G)| − (r − |Z|) .
Intuitively, we are saying that every node (x, y) we are removing is either explicitly constrained in
an “independent way” by some new condition in the graph H (corresponding to Z), or it is almost totally
unconstrained.
Expository remark. We have recommended bearing in mind the special case when each bad event
consists of a single element. In this case, we would have r = 1, and the stated theorem would be that either
| Active(H)| = | Active(G)| − 1, or | Active(H)| = | Active(G)| and (x1 , y1 ) ∼ (x′1 , y′1 ) for some (x′1 , y′1 ) ∈ Active(H).
Proof. Let Hi = G − v1 − · · · − vi . We will recursively build up a set Z i and functions f i : Z i → Active(Hi ),
where Z i ⊆ {(x1 , y1 ), . . . , (xi , yi )}, and which satisfy the given conditions up to stage i.
We remove the source node vi from Hi−1 . Observe that (xi , yi ) ∈ Active(Hi−1 ), but (unless there
is some other vertex with the same label in G) (xi , yi ) ∉ Active(Hi ). Thus, the most obvious change
when we remove vi is that we destroy the active condition (xi , yi ). This may add or subtract other active
conditions as well.
We will need to update Z i−1 , f i−1 . Most importantly, f i−1 may have mapped (xj , yj ) for j < i to an
active condition of Hi−1 which is destroyed when vi is removed. In this case, we must re-map this to a
new active condition. Note that we cannot have f i−1 (xj , yj ) = (xi , yi ) for j < i, as xi ≠ xj and yi ≠ yj .
There are now a variety of cases depending on the forward path of vi in Hi−1 .
1. This forward path consists of a cycle, or the forward path terminates on both sides in forward
edges. This is the easiest case. Then no more active conditions of Hi−1 are created or destroyed.
We update Z i = Z i−1 , f i = f i−1 . One active condition is removed, in net, from Hi−1 ; hence
| Active(Hi )| = | Active(Hi−1 )| − 1.
2. This forward path contains a forward edge on one side and a backward edge on the other. For
example, suppose the path has the form (X1 ,Y1 ), (X1 ,Y2 ), (X2 ,Y2 ), . . . , (Xs ,Ys+1 ), where the vertices
(X1 ,Y1 ), . . . , (Xs ,Ys ) are at the base, and the node (X1 ,Y1 ) has out-degree 1, and the node (Xs ,Ys+1 )
has in-degree 1. Suppose that (xi , yi ) = (X j ,Y j ) for some j ∈ {1, . . . , s}. (See Figure 2.) In this
case, we do not destroy any W-configurations, but we create a new W-configuration with endpoints
(X j ,Ys+1 ) = (xi ,Ys+1 ).
We now update Z i = Z i−1 ∪ {(xi , yi )}. We define f i = f i−1 plus we map (xi , yi ) to the new active
condition (xi ,Ys+1 ). In net, no active conditions were added or removed, and | Active(Hi )| =
| Active(Hi−1 )|.
Figure 2: When we remove (X2 ,Y2 ), we create a new W-configuration with endpoints (X2 ,Y5 ).
3. This forward path was a W-configuration (X0 ,Y1 ), (X1 ,Y1 ), . . . , (Xs ,Ys ), (Xs ,Ys+1 ) with the pairs
(X1 ,Y1 ), . . . , (Xs ,Ys ) on the base, and we had (xi , yi ) = (X j ,Y j ). This is the most complicated
situation; in this case, we destroy the original W-configuration with endpoints (X0 ,Ys+1 ) but create
two new W-configurations with endpoints (X0 ,Y j ) and (X j ,Ys+1 ). We update Z i = Z i−1 ∪ {(xi , yi )}.
We will set f i = f i−1 , except for a few small changes as follows.
If ( f i−1 )−1 (X0 ,Ys+1 ) = ∅, then simply set f i (xi , yi ) = (X0 ,Yj ). Otherwise, we have f i−1 (xℓ , yℓ ) =
(X0 ,Ys+1 ) for some ℓ < i; so either xℓ = X0 or yℓ = Ys+1 . If it is the former, set f i (xℓ , yℓ ) =
(X0 ,Yj ) and f i (xi , yi ) = (Xj ,Ys+1 ). If it is the latter, set f i (xℓ , yℓ ) = (Xj ,Ys+1 ) and f i (xi , yi ) = (X0 ,Yj ).
In any case, f i is updated appropriately, and in net no active conditions are added or removed,
so | Active(Hi )| = | Active(Hi−1 )|.
5 The probability that the swaps are all successful
In the previous sections, we determined necessary conditions for the permutations πkt , depending on the
graphs Gk,t . In this section, we finish by computing the probability that the swapping subroutine causes
the permutations to, in fact, satisfy all such conditions.
Proposition 5.1 states the key randomness condition satisfied by the swapping subroutine. The
basic intuition is as follows: suppose π : [n] → [n] is a fixed permutation with π(xi ) = yi for i = 1, . . . , r, and π′ =
Swap(π; x1 , . . . , xr ). Then π′(x1 ) has a uniform distribution over [n]. Similarly, π′−1 (y1 ) has a uniform
distribution over [n]. However, the joint distribution is not uniform; there is essentially only one degree of
freedom for the two values. In general, any subset of the variables π′(x1 ), . . . , π′(xr ), π′−1 (y1 ), . . . , π′−1 (yr )
has the uniform distribution, as long as the subset does not simultaneously contain π′(xi ), π′−1 (yi )
for some i ∈ [r].
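The marginal uniformity of π′(x1 ) can be checked empirically. The paper defines Swap earlier in the text; the implementation below, a partial Fisher–Yates shuffle over the listed positions, is our assumption of its form, and the counting experiment is only an illustration.

```python
import random
from collections import Counter

def swap(pi, xs, rng):
    # Assumed form of Swap(pi; x1, ..., xr): at step i, position xs[i] is
    # exchanged with a uniformly random position among those not yet fixed.
    pi = list(pi)
    order = list(xs) + [p for p in range(len(pi)) if p not in set(xs)]
    for i in range(len(xs)):
        j = rng.randrange(i, len(pi))
        a, b = order[i], order[j]
        pi[a], pi[b] = pi[b], pi[a]
    return pi

rng = random.Random(0)
n, trials = 6, 60000
# pi'(x1) should be (close to) uniform: each of the n values near trials/n.
counts = Counter(swap(range(n), [0, 1, 2], rng)[0] for _ in range(trials))
assert all(abs(counts[v] - trials // n) < 1000 for v in range(n))
```

The first step already makes π′(x1 ) uniform, since x1 is exchanged with a position chosen uniformly among all n and is never touched again; the experiment merely confirms this.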
Proposition 5.1. Suppose n, r, s, q are non-negative integers obeying the following constraints:
1. 0 ≤ s ≤ min(q, r).
2. q + (r − s) ≤ n.
Let π be a fixed permutation of [n]. Let x1 , . . . , xr ∈ [n] be distinct, and let yi = π(xi ) for i = 1, . . . , r.
Define π 0 = Swap(π; x1 , . . . , xr ).
Consider a list (x′1 , y′1 ), . . . , (x′q , y′q ) satisfying the following properties:
3. All x′i are distinct; all y′i are distinct.
4. For i = 1, . . . , s we have (xi , yi ) ∼ (x′i , y′i ).
Then we have the bound
P(π′(x′1 ) = y′1 ∧ · · · ∧ π′(x′q ) = y′q ) ≤ (n − r)!(n − q)! / (n!(n − q − r + s)!) .
Expository remark. Consider the special case when each bad event contains a single element. In that
case, we only need to use this result for r = 1. There are two possibilities for s: either s = 0, in which case
the bound on the right is 1 − q/n (i. e., the probability that π′(x1 ) ∉ {y′1 , . . . , y′q }); or s = 1, in which
case the bound is 1/n (i. e., the probability that π′(x1 ) = y′1 ).
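These two special cases can be verified numerically against the bound of Proposition 5.1; the helper `g` below mirrors the function defined in the proof, with illustrative values of n and q chosen by us.

```python
from math import factorial

def g(n, r, s, q):
    # The bound of Proposition 5.1: (n-r)!(n-q)! / (n!(n-q-r+s)!)
    return (factorial(n - r) * factorial(n - q)) / (factorial(n) * factorial(n - q - r + s))

n, q = 10, 4
# r = 1, s = 0: the bound equals 1 - q/n (avoid all of y'_1, ..., y'_q)
assert abs(g(n, 1, 0, q) - (1 - q / n)) < 1e-12
# r = 1, s = 1: the bound equals 1/n (hit one prescribed value)
assert abs(g(n, 1, 1, q) - 1 / n) < 1e-12
```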
Proof. Define the function
g(n, r, s, q) = (n − r)!(n − q)! / (n!(n − q − r + s)!) .
We will prove this proposition by induction on s, r, considering a number of separate cases.
1. Suppose s > 0 and x1 = x′1 . Then, in order to satisfy the desired conditions, we must swap x1
to x′′ = π −1 (y′1 ); this occurs with probability 1/n. The subsequent r − 1 swaps starting with the
permutation π(x1 x′′) must now satisfy the conditions π′(x′2 ) = y′2 , . . . , π′(x′q ) = y′q . We claim that
(xi , π(x1 x′′)xi ) ∼ (x′i , y′i ) for i = 2, . . . , s. If x′′ ∉ {x2 , . . . , xs }, this is immediate. Otherwise, suppose
x′′ = xj . If xj = x′j , then we again still have (xj , π(x1 x′′)xj ) ∼ (x′j , y′j ). If yj = y′j , then this implies
that y′1 = yj = y′j , which contradicts the distinctness of the y′i .
So we apply the induction hypothesis to π(x1 x′′); in the induction, we subtract one from each of n, q, r, s.
This gives
P(π′(x′1 ) = y′1 ∧ · · · ∧ π′(x′q ) = y′q ) ≤ (1/n) g(n − 1, r − 1, s − 1, q − 1) = g(n, r, s, q)
as desired.
2. Similarly, suppose s > 0 and y1 = y′1 . By Proposition A.3, we would obtain the same
distribution if we executed (π′)−1 = Swap(π −1 ; y1 , . . . , yr ), so
P(π′(x′1 ) = y′1 ∧ · · · ∧ π′(x′q ) = y′q ) = P((π′)−1 (y′1 ) = x′1 ∧ · · · ∧ (π′)−1 (y′q ) = x′q ) .
Now the right-hand side has swapped the roles of x1 and y1 ; in particular, it now falls under
case (1), already proved, and so the right-hand side is at most g(n, r, s, q), as desired.
3. Suppose s = 0 and there are indices i ∈ [r], j ∈ [q] with (xi , yi ) ∼ (x′j , y′j ). By Proposition A.2, we
can assume without loss of generality that (x1 , y1 ) ∼ (x′1 , y′1 ). So, in this case, we are really in the
case s = 1, which is covered by case (1) or case (2) above. Thus, we have
P(π′(x′1 ) = y′1 ∧ · · · ∧ π′(x′q ) = y′q ) ≤ g(n, r, 1, q) = g(n, r, 0, q) / (n − q − r + 1) ≤ g(n, r, s, q) .
Here we are using the hypothesis that n ≥ q + (r − s) = q + r.
4. Finally, suppose s = 0, and x1 , . . . , xr are distinct from x′1 , . . . , x′q , and y1 , . . . , yr are distinct from
y′1 , . . . , y′q . In this case, a necessary condition to have π′(x′1 ) = y′1 , . . . , π′(x′q ) = y′q is that there
are some y′′1 , . . . , y′′r , distinct from each other and distinct from y′1 , . . . , y′q , with the property that
π′(xi ) = y′′i for i = 1, . . . , r. By the union bound, we have
P(π′(x′1 ) = y′1 ∧ · · · ∧ π′(x′q ) = y′q ) ≤ ∑y′′1 ,...,y′′r P(π′(x1 ) = y′′1 ∧ · · · ∧ π′(xr ) = y′′r ) .
For each individual summand, we apply the induction hypothesis; the summand has probability
at most g(n, r, r, q). As there are (n − q)!/(n − q − r)! possible values for y′′1 , . . . , y′′r , the total
probability is at most (n − q)!/(n − q − r)! × g(n, r, r, q) = g(n, r, s, q).
We apply Proposition 5.1 to upper-bound the probability that the Swapping Algorithm successfully
swaps when it encounters a bad event.
Proposition 5.2. Suppose we encounter a bad event B at time t containing elements (k, x1 , y1 ), . . . ,
(k, xr , yr ) from permutation k (and perhaps other elements from other permutations). Then the probability
that πkt+1 satisfies all the active conditions of its future-subgraph, conditional on all past events and all
other swappings at time t, is at most
P(πkt+1 satisfies Active(Gk,t+1 )) ≤ Pk (B) (nk − At+1k )! / (nk − Atk )! .
Recall that we have defined Atk = | Active(Gk,t )| and we have defined Pk (B) = (nk − r)!/nk ! .
Expository remark. Consider the special case when each bad event consists of a single element. In
this case, Pk (B) = 1/n, and the stated theorem is now: either At+1 = At , in which case the probability
that π satisfies its swapping condition is 1/n; or At+1 = At − 1, in which case the probability that π
satisfies its swapping condition is 1 − At+1 /n.
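As with Proposition 5.1, the two cases in this remark can be checked by direct computation; the helper below (ours, with illustrative numbers) evaluates the bound of Proposition 5.2 for a single-element bad event.

```python
from math import factorial

def prop52_bound(n, r, A_t, A_t1):
    # Bound of Proposition 5.2: P_k(B) * (n - A^{t+1})! / (n - A^t)!,
    # where P_k(B) = (n - r)!/n!.
    return (factorial(n - r) / factorial(n)) * factorial(n - A_t1) / factorial(n - A_t)

n, A = 10, 5
# single-element bad event (r = 1): the two cases of the remark
assert abs(prop52_bound(n, 1, A, A) - 1 / n) < 1e-12                  # A^{t+1} = A^t
assert abs(prop52_bound(n, 1, A, A - 1) - (1 - (A - 1) / n)) < 1e-12  # A^{t+1} = A^t - 1
```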
Proof. Let H denote the future-subgraph Gk,t+1 after removing the source nodes corresponding to the
pairs (x1 , y1 ), . . . , (xr , yr ). Using the notation of Lemma 4.7, we set s = |Z| and q = At+1k . We have
Active(H) = {(x′1 , y′1 ), . . . , (x′q , y′q )}.
For each (x, y) ∈ Z, we have y = πkt (x), and there is an injective function f : Z → Active(H) with
(x, y) ∼ f ((x, y)). By Proposition A.2, we can assume without loss of generality that Z = {(x1 , y1 ), . . . , (xs , ys )}
and f (xi , yi ) = (x′i , y′i ). In order to satisfy the active conditions on Gk,t+1 , the swapping must cause
πkt+1 (x′i ) = y′i for i = 1, . . . , q.
By Lemma 4.7, we have Atk = At+1k + (r − s) = q + (r − s). Since Atk ≤ nk , all the conditions of
Proposition 5.1 are satisfied. Thus this probability is at most
(nk − r)!/nk ! × (nk − q)!/(nk − q − r + s)! = (nk − r)!(nk − At+1k )! / (nk !(nk − Atk )!) .
We finally have all the pieces necessary to prove Lemma 3.1.
Lemma 3.1. Let τ be a witness tree, with nodes labeled B1 , . . . , Bs . Then
P(τ appears) ≤ PΩ (B1 ) · · · PΩ (Bs ) .
Proof. The Swapping Algorithm, as we have defined it, begins by selecting the permutations uniformly
at random. One may also consider fixing the permutations to some arbitrary (not random) value, and
allowing the Swapping Algorithm to execute from that point onward. We refer to this as starting at
an arbitrary state of the Swapping Algorithm. We will prove the following by induction on τ′: the
probability, starting at an arbitrary state of the Swapping Algorithm, that the subsequent swaps cause
the subtree τ′ to appear, is at most
P(τ̂ T = τ′ for some T ≥ 0) ≤ ∏B∈τ′ PΩ (B) × ∏Nk=1 nk ! / (nk − | Active(Projk (τ′))|)! . (5.1)
When τ′ = ∅, the RHS of (5.1) is equal to one, so the claim is vacuously true.
To show the induction step, note that in order for τ′ to appear, it must be that some B is resampled,
where some node v ∈ τ′ is labeled by B. Suppose we condition on v being the first such node, resampled
at time t. A necessary condition to have τ̂ T = τ′ for some T ≥ t is that each πkt+1 satisfies all the active
conditions on Gk,t+1 . By Proposition 5.2, this has probability at most
∏k Pk (B) (nk − At+1k )! / (nk − Atk )! .
Next, if this event occurs, then subsequent resamplings must cause τ̂ T≥t+1 = τ′ − v. We bound this
probability using the induction hypothesis. Note that the induction hypothesis gives a bound conditional
on any starting configuration of the Swapping Algorithm, so we may multiply these probabilities to get
P(τ̂ T = τ′ for some T > 0)
≤ ∏k Pk (B) (nk − At+1k )!/(nk − Atk )! × ∏B∈τ′−v PΩ (B) × ∏Nk=1 nk !/(nk − | Active(Projk (τ′ − v))|)!
= ∏B∈τ′ PΩ (B) × ∏k (nk − At+1k )! nk ! / ((nk − Atk )! (nk − | Active(Projk (τ′ − v))|)!)
= ∏B∈τ′ PΩ (B) × ∏k nk !/(nk − Atk )!   as At+1k = | Active(Projk (τ′ − v))| ,
completing the induction argument.
We now consider the necessary conditions to produce the entire witness tree τ, and not just fragments
of it. First, the original permutations πk0 must satisfy the active conditions of the respective witness
subdags Projk (τ). For each permutation k, this occurs with probability
(nk − A0k )! / nk ! .
Next, the subsequent sampling must be compatible with τ; by (5.1) this has probability at most
∏B∈τ PΩ (B) × ∏Nk=1 nk !/(nk − A0k )! .
Again, note that the bound in (5.1) is conditional on any starting position of the Swapping Algorithm,
hence we may multiply these probabilities. In total we have
P(τ̂ T = τ for some T ≥ 0) ≤ ∏k (nk − A0k )!/nk ! × ∏B∈τ PΩ (B) × ∏Nk=1 nk !/(nk − A0k )! = ∏B∈τ PΩ (B) .
We note a counterintuitive aspect to this proof. The natural way of proving this lemma would be
to identify, for each bad event B ∈ τ, some necessary event occurring with probability at most PΩ (B).
This is the general strategy in Moser & Tardos [31] and related constructive LLL variants such as [18],
[1], [20]. This is not the proof we employ here; there is an additional factor of (nk − A0k )!/nk ! which is
present for the original permutation and is gradually “discharged” as active conditions disappear from the
future-subgraphs.
6 The constructive LLL for permutations
Now that we have proved the Witness Tree Lemma, the remainder of the analysis is essentially the same
as for the Moser–Tardos algorithm [31]. Using arguments and proofs from [31] with our key lemma, we
can now easily show our key theorem:
Theorem 6.1. Suppose some function x : B → (0, 1) satisfies, for every B ∈ B, the condition
PΩ (B) ≤ x(B) ∏B′∼B, B′≠B (1 − x(B′)) .
Then the Swapping Algorithm terminates with probability one. The expected number of iterations in
which we resample B is at most x(B)/(1 − x(B)).
In the “symmetric” case, this gives us the well-known LLL criterion:
Corollary 6.2. Suppose each bad event B ∈ B has probability at most p, and is dependent with at most d
bad events. Then if ep(d + 1) ≤ 1, the Swapping Algorithm terminates with probability one; the expected
number of resamplings of each bad event is O(1).
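A direct numerical reading of Corollary 6.2, with illustrative (not paper-supplied) values of p and d:

```python
from math import e

def symmetric_lll_holds(p, d):
    """Check the symmetric criterion of Corollary 6.2: e * p * (d + 1) <= 1."""
    return e * p * (d + 1) <= 1

# Hypothetical instance: bad events of probability 1e-4, each dependent
# with at most 1000 others; the criterion holds with plenty of room.
assert symmetric_lll_holds(1e-4, 1000)       # e * 1e-4 * 1001 ≈ 0.27 <= 1
assert not symmetric_lll_holds(1e-2, 1000)   # e * 1e-2 * 1001 ≈ 27.2 > 1
```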
Some extensions of the LLL, such as the Moser–Tardos distribution bounds shown in [16], the
cluster-expansion LLL criterion [9] and its connection to the Moser–Tardos algorithm [32], or the partial
resampling of [18], follow almost immediately here. There are a few extensions which require slightly
more discussion.
6.1 Lopsidependence
As in [31], it is possible to slightly restrict the notion of dependence. We can redefine the relation ∼ on
bad events by setting B ∼ B′ iff
1. B = B′, or
2. there is some (k, x, y) ∈ B and (k, x′, y′) ∈ B′ with either x = x′, y ≠ y′ or x ≠ x′, y = y′.
In particular, bad events which share the same triple (k, x, y) are not thereby made dependent.
Proving that the Swapping Algorithm still works in this setting requires only a slight change in
our definition of Projk (τ). Now, the tree τ may have multiple copies of any given triple (k, x, y) on a
single level. When this occurs, we create the corresponding nodes v ≈ (x, y) ∈ Projk (τ); edges are added
between such nodes in an arbitrary (but consistent) way. The remainder of the proof remains as before.
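The restricted relation is easy to state in code. In the sketch below (ours), a bad event is a frozenset of (k, x, y) triples:

```python
def lopsidependent(B1, B2):
    """B1 ~ B2 under the restricted relation of Section 6.1: either the events
    are identical, or they contain triples from the same permutation k that
    agree in exactly one of the two coordinates x, y."""
    if B1 == B2:
        return True
    for (k, x, y) in B1:
        for (k2, x2, y2) in B2:
            if k == k2 and ((x == x2 and y != y2) or (x != x2 and y == y2)):
                return True
    return False

A = frozenset({(0, 1, 2)})
B = frozenset({(0, 1, 3)})              # same permutation, same x, different y
C = frozenset({(0, 4, 2)})              # same permutation, same y, different x
D = frozenset({(1, 1, 2)})              # different permutation entirely
E = frozenset({(0, 1, 2), (0, 5, 6)})   # shares the exact triple (0, 1, 2) with A

assert lopsidependent(A, B) and lopsidependent(A, C)
assert not lopsidependent(A, D)
assert not lopsidependent(A, E)  # sharing an identical triple creates no dependence
```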
6.2 LLL for injective functions
The analysis of [28] considers a slightly more general setting for the LLL, in which we select random
injections fk : [mk ] → [nk ], where mk ≤ nk . In fact, our Swapping Algorithm can be extended to this case.
We simply define a permutation πk on [nk ], where the entries πk (mk + 1), . . . , πk (nk ) are “dummies” which
do not participate in any bad events. The LLL criterion for the extended permutation πk is exactly the
same as the corresponding LLL criterion for the injection fk . Because all of the dummy entries have
the same behavior, it is not necessary for the Swapping Algorithm to keep track of the dummy entries
exactly; they are needed only for the analysis.
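A minimal sketch of this reduction (names and representation ours): sampling a uniformly random permutation of [nk ] and restricting it to the first mk positions yields a uniformly random injection, with the remaining entries acting as the dummies.

```python
import random

def random_injection(m, n, rng):
    """Sample a uniformly random injection f: [m] -> [n] by restricting a
    uniformly random permutation of [n] to its first m positions; the entries
    pi(m), ..., pi(n-1) are the 'dummies' that appear in no bad event."""
    pi = list(range(n))
    rng.shuffle(pi)
    return pi[:m]

rng = random.Random(0)
f = random_injection(3, 10, rng)
assert len(f) == 3 and len(set(f)) == 3   # injective
assert all(0 <= v < 10 for v in f)        # maps into [n]
```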
6.3 Comparison with the approaches of Achlioptas & Iliopoulos and Harvey & Vondrák
Achlioptas & Iliopoulos [1] and Harvey & Vondrák [20] gave generic frameworks for analyzing variants
of the Moser–Tardos algorithm, applicable to different types of combinatorial configurations. These
frameworks can include vertex colorings, permutations, Hamiltonian cycles of graphs, spanning trees,
matchings, and other settings. For the case of permutations, both of these frameworks give a version of
the Swapping Algorithm and show that it terminates under the same conditions as we do, which in turn
are the same conditions as the LLL (Theorem 1.1).
The key difference between our approach and [1, 20] is that they enumerate the entire history of
all resamplings to the permutations. In contrast, our proof is based on the Witness Tree Lemma; this
is a much more succinct structure that ignores most of the resamplings, and only enumerates the few
resamplings that are necessary to justify a single item in the execution log. Their proofs are much simpler
than ours; a major part of the complexity of our proof lies in the need to argue that the bad events which
were ignored by the witness tree do not affect the probabilities. (The ignored bad events do interact with
the variables we need to track for the witness tree, but do so in a “neutral” way.)
If our only goal is to prove that the Swapping Algorithm terminates in polynomial time, then the other
two frameworks give a better and simpler approach. However, the Witness Tree Lemma allows much
more precise estimates for many types of events. The main reason for this precision is the following:
suppose we want to show that some event E has a low probability of occurring during or after the
execution of the Swapping Algorithm. The proof strategy of Moser & Tardos is to take a union bound
over all witness trees that correspond to this event. In this case, we show a probability bound which is
proportional to the total weight of all such witness trees. This can be a relatively small number as only
the witness trees connected to E are relevant. Our analysis, which is also based on witness trees, is able
to show similar types of bounds.
However, the analysis of Achlioptas & Iliopoulos and Harvey & Vondrák is not based on witness trees,
but the much larger set of full execution logs. The number of possible execution logs can be exponentially
larger than the number of witness trees. It is very inefficient to take a union bound over all such logs.
Hence, Achlioptas & Iliopoulos and Harvey & Vondrák give bounds which are exponentially weaker (in
a certain technical sense) than the ones we provide.
Many properties of the Swapping Algorithm depend on the fine degree of control provided by the
Witness Tree Lemma, and it seems difficult to obtain them from the alternate LLLL approaches. We list a
few of these properties here.
The LLL criterion without slack. As a simple example of the problems caused by taking a union bound
over execution logs, suppose that we satisfy the LLL criterion without slack, say ep(d + 1) = 1; here,
as usual, p and d are bounds, respectively on the probability of any bad event and the degree of any
bad event in the dependency graph. In this case, we show that the expected time for our Swapping
Algorithm to terminate is O(m). In contrast, in Achlioptas & Iliopoulos or Harvey & Vondrák, they
require satisfying the LLL criterion with slack ep(1 + ε)(d + 1) = 1, and achieve a termination time of
O(m/ε). They require this slack term in order to damp the exponential growth in the number of execution
logs. (Harvey & Vondrák show that if the symmetric LLL criterion is satisfied without slack, then the
Shearer criterion [34] is satisfied with slack ε = Ω(1/m). Thus, they would achieve a running time of
O(m2 ) without slack.)
Arbitrary choice of which bad event to resample. The Swapping Algorithm as we have stated it is
actually underdetermined, in that the choice of which bad event to resample is arbitrary. In contrast,
in both Achlioptas & Iliopoulos and Harvey & Vondrák, there is a fixed priority on the bad events.
(Kolmogorov [26] has shown that this restriction can be removed in certain special cases of the Achlioptas
& Iliopoulos setting, including for random permutations and matchings.) This freedom can be quite
useful. For example, in Section 7 we consider a parallel implementation of our Swapping Algorithm.
We will select which bad events to resample in a quite complicated and randomized way. However, the
correctness of the parallel algorithm will follow from the fact that it simulates some serial implementation
of the Swapping Algorithm.
The Moser–Tardos distribution. The Witness Tree Lemma allows us to analyze the so-called “Moser–
Tardos (MT) distribution,” first discussed by [16]. The LLL and its algorithms ensure that bad events
B cannot possibly occur. In other words, we know that the configuration produced by the LLL has the
property that no B ∈ B is true. In many applications of the LLL, we may wish to know more about such
configurations than that they exist.
We give two examples where this is useful for the ordinary, variable-based LLL. First, suppose that
we have weights for the values of our variables, and we define the objective function of a solution to be
$\sum_i w(X_i)$; in this case, if we are able to estimate the probability that a variable $X_i$ takes on value $j$ in the
output of the LLL (or Moser–Tardos algorithm), then we may be able to show that configurations with a
good objective function exist. A second example is when the number of bad events becomes too large,
perhaps exponentially large. In this case, the Moser–Tardos algorithm cannot test them all. However, we
may still be able to ignore a subset of the bad events, and argue that the probability that they are true at
the end of the Moser–Tardos algorithm is small even though they were never checked.
The Witness Tree Lemma gives us an extremely powerful result concerning this MT distribution,
which carries over to the Swapping Algorithm.
Proposition 6.3. Let $E \equiv \pi_{k_1}(x_1) = y_1 \wedge \cdots \wedge \pi_{k_r}(x_r) = y_r$. Then the probability that E is true in the
output of the Swapping Algorithm is at most
$$P_\Omega(E) \prod_{B \sim E} (1 - x(B))^{-1}\,.$$
Proof. See [16] for the proof of this for the ordinary MT algorithm; the extension to the Swapping
Algorithm is straightforward.
Bounds on the depth of the resampling process. One key requirement for parallel variants of the Moser–
Tardos algorithm appears to be that the resampling process has logarithmic depth. This is equivalent to
showing that there are no deep witness trees. This follows easily from the Witness Tree Lemma, along
the same lines as in the original paper of Moser & Tardos, but appears to be very difficult in the other
LLLL frameworks.
Partial resampling. In [18], a partial resampling variant of the Moser–Tardos algorithm was developed.
In this variant, one only resamples a small, random subset of the variables (or, in our case, permutation
elements) which determine a bad event. To analyze this variant, [18] developed an alternate type of
witness tree, which only records the variables which were actually resampled. Ignoring the other variables
can drastically prune the space of witness trees. Again, this does not seem to be possible in other
LLLL frameworks in which the full execution log must be recorded. We will see an example of this in
Theorem 8.2; we do not know of any way to show results such as Theorem 8.2 using the frameworks of
either Achlioptas & Iliopoulos or Harvey & Vondrák.
7 A parallel version of the Swapping Algorithm
The Moser–Tardos resampling algorithm for the ordinary LLL can be transformed into an RNC algorithm
by allowing a slight slack in the LLL’s sufficient condition [31]. The basic idea is that in every round, we
select a maximal independent set (MIS) of bad events to resample. Using the known distributed/parallel
algorithms for MIS, this can be done in RNC; the number of resampling rounds is then shown to be
logarithmic w. h. p. (“with high probability”), in [31].
In this section, we will describe a parallel algorithm for the Swapping Algorithm, which runs along
the same lines. However, everything is more complicated than in the case of the ordinary LLL. In the
Moser–Tardos algorithm, events which are not connected to each other cannot affect each other in any
way. For the permutation LLL, such events can interfere with each other, but do so rarely. Consider
the following example. Suppose that at some point we have two active bad events, “πk (1) = 1” and
“πk (2) = 2,” and so we decide to resample them simultaneously (since they are not connected to each
other, and hence constitute an independent set). When we resample the bad event πk (1) = 1, we may
swap 1 with 2; this automatically fixes the second bad event as well. The sequential algorithm, in this
case, would only swap a single element. The parallel algorithm should likewise not perform a second
swap for the second bad event, or else it would be oversampling. Avoiding this type of conflict is quite
tricky.
Let n = n_1 + · · · + n_N; since the output of the algorithm will be the contents of the permutations
π_1, . . . , π_N, this algorithm should be measured in terms of n, and we must show that this algorithm runs in
polylog(n) time. Our algorithm will require that |B|, the total number of bad events, is polynomial in n,
and that every element B ∈ B has size |B| ≤ polylog(n); these conditions hold in many cases.
We describe the following Parallel Swapping Algorithm:
1. In parallel, generate the permutations π_1, . . . , π_N uniformly at random.
2. We proceed through a series of rounds while there is some true bad event. In round i (i = 1, 2, . . .)
do the following:
3. Let V_{i,1} ⊆ B denote the set of bad events which are currently true at the beginning of round i.
We will attempt to fix the bad events in V_{i,1} through a series of subrounds. This may introduce
new bad events, but we will not fix any newly created bad events until round i + 1.
Repeat the following process for j = 1, 2, . . . as long as V_{i,j} ≠ ∅:
4. Let I_{i,j} be an MIS of V_{i,j}.
5. For each bad event B = {(k_1, x_1, y_1), . . . , (k_r, x_r, y_r)} ∈ I_{i,j}, choose the swaps corresponding
to B. We select each z_ℓ ∈ [n_{k_ℓ}], which is the element to be swapped with π_{k_ℓ}(x_ℓ),
according to procedure Swap. Do not perform the indicated swaps at this time though!
We refer to (k_1, x_1), . . . , (k_r, x_r) as the swap-sources of B and (k_1, z_1), . . . , (k_r, z_r) as
the swap-mates of B.
6. Select a random ordering ρ_{i,j} of the elements of I_{i,j}. Consider the graph G_{i,j} whose
vertices correspond to elements of I_{i,j}, with an edge on (B, B′) if ρ_{i,j}(B) < ρ_{i,j}(B′)
and one of the swap-mates of B is a swap-source of B′. Generate I′_{i,j} ⊆ I_{i,j} as the
lexicographically first MIS (LFMIS) of the resulting graph G_{i,j}, with respect to the vertex
ordering ρ_{i,j}.
7. For each permutation π_k, enumerate all the transpositions (x z) corresponding to elements
of I′_{i,j}, arranged in order of ρ_{i,j}. Say these transpositions are, in order, (x_1, z_1), . . . , (x_ℓ, z_ℓ).
Compute, in parallel for all π_k, the composition π′_k = π_k (x_1 z_1) · · · (x_ℓ z_ℓ).
8. Update V_{i,j+1} from V_{i,j} by removing all elements which are either no longer true for the
current permutation, or are connected via ∼ to some element of I′_{i,j}.
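The conflict graph of step (6) can be sketched serially as follows. This is an illustration only: the dictionaries `swap_sources` and `swap_mates` are hypothetical names standing in for the output of procedure Swap, and the paper's algorithm performs this step in parallel.

```python
import random

def conflict_graph(events, swap_sources, swap_mates):
    """Sketch of step (6): draw a uniformly random ordering rho of the chosen
    bad events, and put a directed edge B -> B2 whenever rho(B) < rho(B2) and
    some swap-mate of B coincides with a swap-source of B2.  Here
    swap_sources[B] and swap_mates[B] are sets of (permutation, position)
    pairs; these names are illustrative, not from the paper."""
    rho = list(events)
    random.shuffle(rho)
    pos = {b: i for i, b in enumerate(rho)}   # rank of each event under rho
    succ = {b: [] for b in events}
    for b in events:
        for b2 in events:
            if pos[b] < pos[b2] and swap_mates[b] & swap_sources[b2]:
                succ[b].append(b2)
    return succ, rho
```

The set I′_{i,j} is then the LFMIS of this graph with respect to ρ, extracted by the greedy source-removal procedure described below.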
Most steps of this algorithm can be implemented using standard parallel methods. For example, step
(1) can be performed simply by having each element of [n_k] choose a random real and then executing a
parallel sort. The independent set I_{i,j} can be found in polylogarithmic time using [6, 29].
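The sort-based sampling of step (1) can be sketched serially (a minimal illustration; in the algorithm itself the sort would be a parallel computation):

```python
import random

def random_permutation(n):
    """Each element of [n] independently draws a random real key; sorting by
    the keys yields a uniformly random permutation (ties occur with
    probability 0).  Here pi[i] is the rank of element i's key."""
    keys = [random.random() for _ in range(n)]
    order = sorted(range(n), key=keys.__getitem__)
    pi = [0] * n
    for rank, i in enumerate(order):
        pi[i] = rank
    return pi

assert sorted(random_permutation(10)) == list(range(10))
```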
The difficult step to parallelize is selecting the LFMIS I′_{i,j}. In general, the problem of finding the
LFMIS is P-complete [11], hence we do not expect a generic parallel algorithm for it. However, what
saves us is that the ordering ρ_{i,j} and the graph G_{i,j} are constructed in a highly random fashion.
This allows us to use the following greedy algorithm to construct I′_{i,j}, the LFMIS of G_{i,j}:
1. Let H_1 be the directed graph obtained by orienting all edges of G_{i,j} in the direction of ρ_{i,j}. Repeat
the following for s = 1, 2, . . .:
2. If H_s = ∅, terminate.
3. Find all source nodes of H_s. Add these to I′_{i,j}.
4. Construct H_{s+1} by removing all source nodes and all successors of source nodes from H_s.
The output of this algorithm is the LFMIS I′_{i,j}. Each step can be implemented in parallel time O(log n).
The number of iterations of this algorithm is at most the length of the longest directed path in G_{i,j}. So it
suffices to show that, w. h. p., all directed paths in G_{i,j} have length polylog(n).
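The greedy layered procedure above can be sketched as follows (serial Python; each iteration of the while-loop corresponds to one parallel step s):

```python
def lfmis(succ):
    """Compute the lexicographically first MIS of a DAG by repeated source
    removal, as in steps (1)-(4): every current source joins the MIS, and
    sources together with their successors are deleted before the next layer.
    succ[v] lists the out-neighbors of v under the orientation by rho."""
    alive = set(succ)
    mis = []
    while alive:
        indeg = {v: 0 for v in alive}
        for v in alive:
            for w in succ[v]:
                if w in alive:
                    indeg[w] += 1
        sources = [v for v in alive if indeg[v] == 0]
        mis.extend(sources)
        # delete the sources and everything they point to
        dead = set(sources)
        for v in sources:
            dead |= set(succ[v]) & alive
        alive -= dead
    return sorted(mis)

# chain 1 -> 2 -> 3 plus isolated vertex 4: the LFMIS is {1, 3, 4}
assert lfmis({1: [2], 2: [3], 3: [], 4: []}) == [1, 3, 4]
```

The number of while-iterations equals the length of the longest directed path, which is exactly the quantity bounded by Proposition 7.1.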
Proposition 7.1. Let I ⊆ B be an arbitrary independent set of true bad events, and suppose all
elements of B have size ≤ M. Let G = G_{i,j} be the graph constructed in Step (6) of the Parallel Swapping
Algorithm.
Then w. h. p., every directed path in G has length O(M + log n).
Proof. One of the main ideas below is to show that for typical B_1, . . . , B_ℓ ∈ I, where ℓ = 5(M + log n),
the probability that B_1, . . . , B_ℓ form a directed path is small. Suppose we select B_1, . . . , B_ℓ ∈ I uniformly
at random without replacement. Let us analyze how these could form a directed path in G. (We may
assume |I| > ℓ, as otherwise the result holds trivially.)
First, it must be the case that ρ(B_1) < ρ(B_2) < · · · < ρ(B_ℓ). This occurs with probability 1/ℓ!.
Next, the swap-mates of B_s must overlap the swap-sources of B_{s+1}, for s = 1, . . . , ℓ − 1. Now, B_s
has O(M) swap-mates; each such swap-mate can overlap with at most one element of I, since I is an
independent set. Conditional on having chosen B_1, . . . , B_s, there remain |I| − s choices for B_{s+1}. This
gives that the probability of having B_s with an edge to B_{s+1}, conditional on the previous events, is at most
M/(|I| − s). (The fact that swap-mates are chosen randomly does not give too much of an advantage
here.)
Putting this all together, the total probability that there is a directed path on B_1, . . . , B_ℓ is
$$P(\text{directed path } B_1, \ldots, B_\ell) \le \frac{M^{\ell-1}\,(|I|-\ell)!}{(|I|-1)!\,\ell!}\,.$$
The above was for random B_1, . . . , B_ℓ, so the probability that there is some such length-ℓ path is at
most
$$P(\text{some directed path}) \le \frac{|I|!}{(|I|-\ell)!} \times \frac{M^{\ell-1}\,(|I|-\ell)!}{(|I|-1)!\,\ell!} = |I| \times \frac{M^{\ell-1}}{\ell!} \le n \times \frac{M^{\ell-1}}{(\ell/e)^{\ell}} \le n^{-\Omega(1)}\,,$$
since ℓ = 5(M + log n).
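As a quick numeric sanity check (parameter values chosen here purely for illustration), the final bound is indeed tiny for moderate parameters:

```python
import math

# With M = 10 and n = 1000, take l = ceil(5 * (M + ln n)); the tail bound
# n * M**(l-1) / (l/e)**l from the proof is then astronomically small.
# (Whether log is natural or base 2 only changes constants.)
M, n = 10, 1000
l = math.ceil(5 * (M + math.log(n)))
bound = n * M ** (l - 1) / (l / math.e) ** l
assert bound < 1e-30
```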
Proposition 7.2. Suppose |B| ≤ poly(n) and all elements B ∈ B have size |B| ≤ M. Then w. h. p. we
have V_{i,j} = ∅ for some j = O(M log² n).
Proof. We begin by showing that, if B ∈ I, where I is an arbitrary independent set of B, then with
probability at least 1/(2M ln n) we have B ∈ I′ as well, where I′ is the LFMIS associated with I.
Observe that if there is no B′ ∈ I such that ρ(B′) < ρ(B) and such that a swap-mate of B′ overlaps with
a swap-source of B, then B ∈ I′ (this condition is sufficient, but not necessary). We will analyze the ordering ρ using
the standard trick, in which each element B ∈ I chooses a rank W(B) ∼ Uniform[0, 1], independently and
identically. The ordering ρ is then formed by sorting in increasing order of W. In this way, we are
able to avoid the dependencies induced by the rankings. For the moment, suppose that the rank W(B) is
fixed at some real value w. We will then count how many B′ ∈ I satisfy W(B′) < w and have a swap-mate
overlapping a swap-source of B.
So consider some swap-source s of B in permutation k, and consider some B′_j ∈ I which has r′_j other
elements in permutation k. For ℓ = 1, . . . , r′_j, there are n_k − ℓ + 1 possible choices for the ℓth swap-mate
from B′_j, and hence the total expected number of swap-mates of B′_j which overlap s is at most
$$\mathbb{E}[\#\text{ swap-mates of } B'_j \text{ overlapping } s] \le \sum_{\ell=1}^{r'_j} \frac{1}{n_k - \ell + 1} \le \int_{\ell=1}^{r'_j+1} \frac{d\ell}{n_k - \ell + 1} = \ln\Bigl(\frac{n_k}{n_k - r'_j}\Bigr)\,.$$
Next, sum over all B′_j ∈ I. Since I is an independent set, we must have ∑_j r′_j ≤ n_k − 1. Thus, by
concavity of the ln function,
$$\mathbb{E}[\#\text{ swap-mates of some } B'_j \text{ overlapping } s] \le \sum_j \ln\Bigl(\frac{n_k}{n_k - r'_j}\Bigr) \le \ln\Bigl(\frac{n_k}{n_k - \sum_j r'_j}\Bigr) \le \ln n_k \le \ln n\,.$$
Summing over all swap-sources of B, the expected number of B′ with ρ(B′) < ρ(B) and with a
swap-mate overlapping a swap-source of B is at most w|B| ln n ≤ wM ln n. By Markov's inequality,
$$P(B \in I' \mid W(B) = w) \ge 1 - wM \ln n\,.$$
Integrating over w (and using the trivial bound that the left-hand side is nonnegative when w ≥ 1/(M ln n)) gives
$$P(B \in I') \ge \frac{1}{2M \ln n}\,.$$
Now, using this fact, we show that V_{i,j} decreases quickly in size. For, suppose B ∈ V_{i,j}. Then B ∼ B′
for some B′ ∈ I_{i,j}, as I_{i,j} is a maximal independent set (possibly B = B′). We will remove B from V_{i,j+1} if
B′ ∈ I′_{i,j}, which occurs with probability at least 1/(2M ln n). As B was an arbitrary element of V_{i,j},
this shows that
$$\mathbb{E}\bigl[\,|V_{i,j+1}| \mid V_{i,j}\,\bigr] \le \Bigl(1 - \frac{1}{2M \ln n}\Bigr)|V_{i,j}|\,.$$
For j = Ω(M log² n), this implies that
$$\mathbb{E}\bigl[\,|V_{i,j}|\,\bigr] \le \Bigl(1 - \frac{1}{2M \ln n}\Bigr)^{\Omega(M \log^2 n)} |V_{i,1}| \le n^{-\Omega(1)}\,.$$
This in turn implies that V_{i,j} = ∅ with high probability, for j = Ω(M log² n).
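To see the quantitative content of the recursion E[|V_{i,j+1}| | V_{i,j}] ≤ (1 − 1/(2M ln n))|V_{i,j}|, here is a small numeric check (illustrative parameters, not from the paper):

```python
import math

# After j = 10 * M * (ln n)^2 subrounds, the expected size of V drops below 1
# even if it started as large as n, since (1 - p)^j <= exp(-p*j) for
# p = 1/(2*M*ln n).
M, n = 10, 1000
p = 1 / (2 * M * math.log(n))
j = math.ceil(10 * M * math.log(n) ** 2)
assert n * (1 - p) ** j < 1e-6
```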
To finish the proof, we must show that the number of rounds is itself bounded w. h. p. We begin by
showing that the Witness Tree Lemma remains valid in the parallel setting.
Proposition 7.3. When we execute the Parallel Swapping Algorithm, we may generate an “execution log”
according to the following rule: suppose that we resample B in round i, j and B′ in round i′, j′. Then we
place B before B′ iff:
1. i < i′; OR
2. i = i′ AND j < j′; OR
3. i = i′ AND j = j′ AND ρ_{i,j}(B) < ρ_{i′,j′}(B′);
that is, we order the resampled bad events lexicographically by round, subround, and then rank ρ.
Given such an execution log, we may also generate witness trees in the same manner as for the sequential
algorithm. For any witness tree τ, this procedure ensures that
$$P(\tau \text{ appears}) \le \prod_{B \in \tau} P_\Omega(B)\,.$$
Proof. Observe that the choice of swaps for a bad event B at round i, subround j, and rank ρ_{i,j}(B) is
only affected by the events in earlier rounds and subrounds, as well as by other B′ ∈ I_{i,j} with ρ_{i,j}(B′) < ρ_{i,j}(B).
Thus, we can view this parallel algorithm as simulating the sequential algorithm, with a particular rule for
selecting the bad event to resample. Namely, we keep track of the sets V_i and I_{i,j} as we do for the parallel
algorithm, and within each subround we resample the bad event in I_{i,j} with the minimum value of ρ_{i,j}(B).
This is why it is critical in step (6) that we select I′_{i,j} to be the lexicographically first MIS; this means that
the presence of B ∈ I′_{i,j} cannot be affected by any B′ with ρ(B′) > ρ(B).
Proposition 7.4. Let B be any resampling performed in the ith round of the Parallel Swapping Algorithm
(that is, B ∈ I′_{i,j} for some integer j > 0). Then the witness tree corresponding to the resampling of B has
height exactly i.
Proof. First, note that if we have B ∼ B′ in the execution log, where B occurs earlier in time, and the
witness tree corresponding to B has height i, then the witness tree corresponding to B′ must have height
i + 1. So it will suffice to show that if B ∈ I′_{i,j}, then we must have B ∼ B′ for some B′ ∈ I′_{i−1,j′}.
The bad event B must be true at the beginning of round i. By Proposition 3.2, either B was already
true at the beginning of round i − 1, or some bad event B′ ∼ B was resampled at round i − 1. In the
latter case, we are done. If B was true at the beginning of round i − 1, then B ∈ V_{i−1,1}. In order for B to have
been removed from V_{i−1}, either we had B ∼ B′ ∈ I′_{i−1,j′}, in which case we are also done, or after
some subround j′ the event B was no longer true. But again by Proposition 3.2, in order for B to become
true again at the beginning of round i, there must have been some bad event B′ ∼ B encountered later in
round i − 1.
This gives us the key bound on the running time of the Parallel Swapping Algorithm. We give only a
sketch of the proof, since the argument is identical to that of [31].
Proposition 7.5. Suppose that ε > 0 and that some function x : B → (0, 1) satisfies, for every B ∈ B, the
condition
$$P_\Omega(B)(1+\varepsilon) \le x(B) \prod_{\substack{B' \sim B \\ B' \ne B}} (1 - x(B'))\,.$$
Then, w. h. p., the Parallel Swapping Algorithm terminates after
$$\frac{\operatorname{polylog}(n) \cdot \sum_{B} \frac{x(B)}{1-x(B)}}{\varepsilon}$$
rounds.
Proof. Consider the event that some B ∈ B is resampled after i rounds of the Parallel Swapping Algorithm.
In this case, the corresponding witness tree τ̂ has height i. As shown in [31], the sum, over all witness trees of some height h, of the
product of the probabilities of the constituent events, decreases exponentially in h.
So, for any fixed B, the probability that this occurs is exponentially small; this remains true after taking a
union bound over the polynomially many B ∈ B.
We can put this analysis all together to show the following result.
Theorem 7.6. Suppose |B| ≤ poly(n) and that every B ∈ B has size |B| ≤ polylog(n). Suppose also
that ε > 0 and that some function x : B → (0, 1) satisfies, for every B ∈ B, the condition
$$P_\Omega(B)(1+\varepsilon) \le x(B) \prod_{\substack{B' \sim B \\ B' \ne B}} (1 - x(B'))\,.$$
Then, w. h. p., the Parallel Swapping Algorithm terminates after
$$\frac{\operatorname{polylog}(n) \cdot \sum_{B} \frac{x(B)}{1-x(B)}}{\varepsilon}$$
time.
8 Algorithmic Applications
The LLL for permutations plays a role in diverse combinatorial constructions. Using our algorithm,
nearly all of these constructions become algorithmic. We examine a few selected applications now.
8.1 Latin transversals
Consider an n × n matrix A whose entries come from a set C, whose elements are referred to as colors. A Latin
transversal of this matrix is a permutation π ∈ S_n such that no color appears twice among the entries
A(i, π(i)); that is, there are no i ≠ j with A(i, π(i)) = A(j, π(j)). A typical question in this area is the
following: suppose each color appears at most ∆ times in the matrix. How large can ∆ be so as to
guarantee the existence of a Latin transversal? In [13], a proof using the probabilistic form of the LLL
was given, showing that ∆ ≤ n/(4e) suffices. This was the first application of the LLL to permutations.
This bound was subsequently improved by [9] to the criterion ∆ ≤ (27/256)n; this uses a variant of
the probabilistic Local Lemma which is essentially equivalent to Pegden’s variant on the constructive
Local Lemma. Our algorithmic LLL can almost immediately transform the existential proof of [9] into a
constructive algorithm. To our knowledge, this is the first polynomial-time algorithm for constructing
such a transversal.
Theorem 8.1. Suppose ∆ ≤ (27/256)n. Then there is a Latin transversal of the matrix. Furthermore,
the Swapping Algorithm selects such a transversal in polynomial time.
Proof. For any quadruple i, j, i′, j′ with A(i, j) = A(i′, j′), we have a bad event π(i) = j ∧ π(i′) = j′.
Such an event has probability 1/(n(n − 1)). We apply Pegden's criterion [32], assigning the weight
µ(B) = α to every bad event B, where α is a scalar to be determined.
This bad event can have up to four types of neighbors (i_1, j_1, i′_1, j′_1), which overlap on one of the four
coordinates i, j, i′, j′; as discussed in [9], all the neighbors of any type are themselves neighbors in the
dependency graph. Since these cases are all symmetric, we will analyze just the first type of neighbor, one which
shares the same value of i, that is, i_1 = i. We may now choose any value for j_1 (n choices). At this point,
the color A(i_1, j_1) is determined, so there are ∆ − 1 remaining choices for i′_1, j′_1.
By Lemma 3.1 and Pegden's criterion [32], a sufficient condition for the convergence of the Swapping
Algorithm is that
$$\alpha \ge \frac{1}{n(n-1)}\bigl(1 + n(\Delta - 1)\alpha\bigr)^4\,.$$
Routine algebra shows that this has a positive real root α when ∆ ≤ (27/256)n.
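The root-finding can be spot-checked numerically. Writing β = n²α and ∆ = cn, the criterion becomes β ≥ (1 + cβ)⁴ in the large-n limit, and c = 27/256 is exactly the tangency point (a sketch of this computation, not taken from the paper):

```python
# At c = 27/256, beta = 256/81 satisfies beta = (1 + c*beta)^4 exactly
# (c*beta = 1/3, and (4/3)^4 = 256/81), so a positive real root exists.
c, beta = 27 / 256, 256 / 81
assert abs((1 + c * beta) ** 4 - beta) < 1e-9

# For any c' > 27/256 the curve (1 + c'*beta)^4 stays strictly above beta
# on a fine grid, so the criterion fails: 27/256 is the threshold.
c_bad = 27 / 256 + 0.01
assert all((1 + c_bad * b / 100) ** 4 > b / 100 for b in range(0, 10001))
```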
In [36], Szabó considered a generalization of this question: suppose that we seek a transversal, such
that no color appears more than s times. When s = 1, this is asking for a Latin transversal. Szabó gave
similar criteria “∆ ≤ γs n” for s a small constant. Such bounds can be easily obtained constructively using
the permutation LLL as well. By combining the permutation LLL with the partial resampling approach
of [18], we can provide asymptotically optimal bounds for large s.
Theorem 8.2. Suppose ∆ ≤ (s − c√s)n, where c is a sufficiently large constant. Then there is a transversal
of the matrix in which each color appears no more than s times. This transversal can be constructed in
polynomial time.
Proof. For each set of s appearances of any color, we have a bad event. We use the partial resampling
framework, associating the fractional hitting set which assigns weight $\binom{s}{r}^{-1}$ to any r appearances of a
color, where r = ⌈√s⌉.
We first compute the probability of selecting a given r-set X. From the fractional hitting set, this has
probability $\binom{s}{r}^{-1}$. In addition, the probability of selecting the indicated cells is (n − r)!/n!. So we have
$$p \le \binom{s}{r}^{-1}\frac{(n-r)!}{n!}\,.$$
Next, we compute the dependency of the set X. First, we may select another X′ which overlaps with
X in a row or column; the number of such sets is $2rn\binom{\Delta}{r-1}$. Next, we may select any other r-set with the
same color as X (this is the dependency due to the relation ⋈ in the partial resampling framework; see [18] for more
details). The number of such sets is $\binom{\Delta}{r}$.
So the LLL criterion is satisfied if
$$e \times \binom{s}{r}^{-1}\frac{(n-r)!}{n!} \times \left(2rn\binom{\Delta}{r-1} + \binom{\Delta}{r}\right) \le 1\,.$$
Simple calculus now shows that this can be satisfied when ∆ ≤ (s − O(√s))n. Also, it is easy to
detect a true bad event and resample it in polynomial time, so this gives a polynomial-time algorithm.
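The criterion can also be checked numerically for concrete parameters. The sketch below evaluates the left-hand side exactly with integer binomials; the choices s = 100, n = 1000 and the constant c = 8 are illustrative only (the theorem does not pin down its constant c):

```python
import math

def lll_lhs(n, s, delta):
    """Left-hand side of the displayed LLL criterion, with r = ceil(sqrt(s));
    (n-r)!/n! is computed as the reciprocal of a falling factorial."""
    r = math.isqrt(s)
    if r * r < s:
        r += 1
    falling = math.prod(range(n - r + 1, n + 1))  # n * (n-1) * ... * (n-r+1)
    deps = 2 * r * n * math.comb(delta, r - 1) + math.comb(delta, r)
    return math.e * deps / (math.comb(s, r) * falling)

# s = 100 (so r = 10), n = 1000, Delta = (s - 8*sqrt(s)) * n = 20000: holds.
assert lll_lhs(1000, 100, 20000) < 1
# By contrast, the much larger Delta = s*n = 100000 violates the criterion.
assert lll_lhs(1000, 100, 100000) > 1
```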
Our result depends on the Swapping Algorithm in a fundamental way—it does not follow from
Theorem 1.1 (which would roughly require ∆ ≤ (s/e)n). Hence, prior to this paper, we would not have
been able to even show the existence of such transversals; here we provide an efficient algorithm as well.
To see that our bound is asymptotically optimal, consider a matrix in which the first s + 1 rows all contain
a given color, a total multiplicity of ∆ = (s + 1)n. Then the transversal must contain that color at least
s + 1 times.
8.2 Rainbow Hamiltonian cycles and related structures
The problem of finding Hamiltonian cycles in the complete graph Kn , with edges of distinct colors, was
first studied in [17]. This problem is typically phrased in the language of graphs and edges, but we can
rephrase it in the language of Latin transversals, with the additional property that the permutation π is a
full cycle. How often can a color appear in the matrix A for this to be possible? In [3], it was shown that
such a transversal exists if each color appears at most ∆ = n/32 times.1 This proof is based on applying
the non-constructive LLLL to the probability space induced by a random choice of full-cycle permutation.
This result was later generalized in [15], which showed that if each color appears at most ∆ ≤ c0 n times
for a certain constant c0 > 0, then not only is there a full-cycle Latin transversal, but there are also cycles
of each length 3 ≤ k ≤ n. The constant c0 was somewhat small, and this result was also non-constructive.
Theorem 8.3 uses the Swapping Algorithm to construct Latin transversals with essentially arbitrary cycle
structures; this generalizes [15] and [3] quite a bit.
Theorem 8.3. Suppose that each color appears at most ∆ ≤ 0.027n times in the matrix A, and n is
sufficiently large. Let τ be any permutation on n letters, whose cycle structure contains no fixed points
nor swaps (2-cycles). Then there is a Latin transversal π which is conjugate to τ (i. e., has the same cycle
structure); furthermore the Swapping Algorithm finds it in polynomial time. Also, the Parallel Swapping
Algorithm finds it in time polylog(n).
Proof. We cannot apply the Swapping Algorithm directly to the permutation π, because we will not be
able to control its cycle structure. Rather, we will set π = σ^{−1}τσ, and apply the Swapping Algorithm to σ.
A bad event is that A(x, π(x)) = A(x′, π(x′)) for some x ≠ x′. Using the fact that τ has no fixed
points or 2-cycles, we can see that this is equivalent to one of the following two situations: (A) There
are i, i′, x, y, x′, y′ such that σ(x) = i, σ(y) = τ(i), σ(x′) = i′, σ(y′) = τ(i′), and x, y, x′, y′ are distinct,
and i, i′, τ(i), τ(i′) are distinct, and A(x, y) = A(x′, y′); or (B) There are i, x, y, z with σ(x) = i, σ(y) =
τ(i), σ(z) = τ²(i), and all of x, y, z are distinct, and A(x, y) = A(y, z). We will refer to the first type of bad
event as an event of type A led by i (such an event is also led by i′); we will refer to the second type of
bad event as type B led by i.
1 The terminology used for rainbow Hamilton cycles is slightly different from that of Latin transversals. In the context of
Hamilton cycles, one often assumes that the matrix A is symmetric. Furthermore, since A(x, y) and A(y, x) always have the same
color, one only counts this as a single occurrence of that color. Thus, for example, in [3], the stated criterion is that the matrix A
is symmetric and a color appears at most ∆/64 times.
T HEORY OF C OMPUTING, Volume 13 (17), 2017, pp. 1–41 29
DAVID G. H ARRIS AND A RAVIND S RINIVASAN
Note that in an A-event, the color is repeated in distinct column and rows, and in a B-event the column
of one coordinate is the row of another. So, to an extent, these events are mutually exclusive. Much of the
complexity of the proof lies in balancing the two configurations. To a first approximation, the worst case
occurs when A-events are maximized and B-events are impossible. This intuition should be kept in mind
during the following proof.
We will define the function µ for Pegden’s criterion as follows. Each event of type A is assigned the
same weight µA , and each event of type B is assigned weight µB . The event of type A has probability
(n − 4)!/n! and each event of type B has probability (n − 3)!/n!. In the following proof, we shall need to
compare the relative magnitudes of µ_A and µ_B. In order to make this concrete, we set
$$\mu_A = 2.83036\,n^{-4}, \qquad \mu_B = 1.96163\,n^{-3}\,.$$
(In deriving this proof, we left these constant coefficients undetermined until the end of the computation,
and we then verified that all desired inequalities held.)
To apply Pegden’s criterion [32] for the convergence of the Swapping Algorithm, we will need to
analyze the independent sets of neighbors of each bad event. To count these, it will be convenient to
define the following sums. We let t denote the sum of µ(X) over all bad events X involving some fixed
term σ (x). Let s denote the sum of µ(X) over all bad events X (of type either A or B) led by any fixed
value i, and let b denote the sum of µ(X) over B-events X led by any fixed value i. Recall that each bad
event of type A is led by i and also by i0 .
We now examine how to compute the term t. We will enumerate all the bad events that involve σ (x)
for some fixed value x. These correspond to color-repetitions involving either row or column x in the
matrix A. Let ci denote the number of occurrences of color i in column x of the matrix, excluding A(x, y);
similarly, let ri denote the number of occurrences of color i in row y of the matrix, again excluding A(x, y).
A repeated color may have one of the following three forms:
1. A(y, x) = A(x, y0 ) where y 6= y0 ;
2. A(x, y) = A(x0 , y0 ), where x 6= x0 , y 6= y0 ;
3. A(y, x) = A(y0 , x0 ), where x 6= x0 , y 6= y0 .
If v_1, v_2, v_3 denote the number of such repetitions, then we see that
$$v_1 \le \sum_i c_i r_i\,, \qquad v_2 \le \sum_i c_i(\Delta - c_i - r_i)\,, \qquad v_3 \le \sum_i r_i(\Delta - c_i - r_i)\,.$$
A repetition of the first type must correspond to a B-event, in which σ(y) = i, σ(x) = τ(i), σ(y′) =
τ²(i) for some i. For a repetition of the second type, if x′ ≠ y this corresponds to an A-event in which
σ(x) = i, σ(y) = τ(i), σ(x′) = i′, σ(y′) = τ(i′) for some i, i′, or alternatively if x′ = y it corresponds to a
B-event in which σ(x) = i, σ(y) = τ(i), σ(y′) = τ²(i) for some i. The third type of repetition is similar
to the second type. Summing the three cases gives
$$t \le v_1 n\mu_B + v_2 \max(n^2\mu_A, n\mu_B) + v_3 \max(n^2\mu_A, n\mu_B) = v_1 n\mu_B + v_2 n^2\mu_A + v_3 n^2\mu_A$$
$$\le \sum_j \bigl(c_j r_j n\mu_B + c_j(\Delta - c_j - r_j)n^2\mu_A + r_j(\Delta - c_j - r_j)n^2\mu_A\bigr)\,.$$
The RHS of this expression is maximized when there are n distinct colors with c_j = 1 and n distinct
colors with r_j = 1. For, suppose that a color has (say) c_j > 1. If we decrement c_j by 1 while adding a
new color with c_{j′} = 1, this changes the RHS by (−1 + 2(c_j + r_j) − ∆)n²µ_A + (−1 + ∆ − r_j)nµ_B ≥ 0.
This gives us
$$t \le 2n^3 \Delta \mu_A\,.$$
Similarly, let us consider s. Given i, we choose some y with σ(y) = τ(i), and we list all color
repetitions A(x, y) = A(x′, y′) or A(x, y) = A(y, z). The number of the former is at most ∑_j c_j(∆ − c_j − r_j)
and the number of the latter is at most ∑_j c_j r_j. As before, this is maximized when each color appears
once in the column, leading to
$$s \le n^3 \Delta \mu_A\,.$$
The term b is maximized when there are 2n/∆ colors, each appearing ∆/2 times in column y and ∆/2
times in row y; this yields
$$b \le n^2 (\Delta/2) \mu_B\,.$$
Now fix some bad event of type A, with parameters i, i′, x, y, x′, y′, and let us enumerate its independent
sets of neighbors. This could have one or zero events involving σ(x), and similarly for σ(y), σ(x′), σ(y′);
this gives a total contribution of (1 + t)⁴. The neighbor could also overlap on i; the total set of possibilities
is either zero such events, a B-event led by i − 2, a B-event led by i − 2 and an event led by i, an event led
by i − 1, an event led by i − 1 and an event led by i + 1, an event led by i, or an event led by i + 1. There is
an identical factor for the contributions of bad events led by i′ − 2, . . . , i′ + 1. In total, Pegden's criterion is
$$\mu_A \ge \frac{(n-4)!}{n!}(1 + t)^4 (1 + b + sb + s + s^2 + s + s)^2\,.$$
Applying the same type of analysis to an event of type B gives the criterion
$$\mu_B \ge \frac{(n-3)!}{n!}(1 + t)^3 (1 + b + sb + sb + s + s^2 + s^2 + s + s^2 + s)\,.$$
Putting all these constraints together gives a complicated system of polynomial equations, which can
be solved using a symbolic algebra package. Indeed, the stated values of µA , µB satisfy these conditions
when ∆ ≤ 0.027n and n is sufficiently large.
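As a sanity check, one can verify numerically that the stated µ_A, µ_B satisfy both inequalities in the large-n limit. The sketch below substitutes the bounds on t, s, b with ∆ = 0.027n; the normalized names a = n⁴µ_A and m = n³µ_B are ours, not from the paper:

```python
# In the n -> infinity limit, (n-4)!/n! ~ n^-4 and (n-3)!/n! ~ n^-3, so with
# a = n^4 * muA and m = n^3 * muB the two criteria lose their n-dependence.
a, m = 2.83036, 1.96163          # stated coefficients of muA, muB
t = 2 * 0.027 * a                # t <= 2 n^3 Delta muA
s = 0.027 * a                    # s <=   n^3 Delta muA
b = 0.0135 * m                   # b <= n^2 (Delta/2) muB

# type-A criterion: muA >= (n-4)!/n! (1+t)^4 (1 + b + s*b + 3s + s^2)^2
assert a >= (1 + t) ** 4 * (1 + b + s * b + 3 * s + s ** 2) ** 2
# type-B criterion: muB >= (n-3)!/n! (1+t)^3 (1 + b + 2*s*b + 3s + 3*s^2)
assert m >= (1 + t) ** 3 * (1 + b + 2 * s * b + 3 * s + 3 * s ** 2)
```

Both inequalities hold with only a small margin, consistent with ∆ ≤ 0.027n being near the limit of this argument.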
Hence the Swapping Algorithm terminates, resulting in the desired permutation π = σ −1 τσ . It is
easy to see that the Parallel Swapping Algorithm works as well.
We note that for certain cycle structures, such as the full cycle σ = (1 2 3 . . . n−1 n) and the product of n/2
transpositions σ = (1 2)(3 4) . . . (n−1 n), the LLLL can be applied directly to the permutation π. This gives
a qualitatively similar condition, of the form ∆ ≤ cn, but the constant term is slightly better than ours.
For some of these settings, one can also apply a variant of the Moser–Tardos algorithm to find such
permutations [1]. However, these results do not apply to general cycle structures, and they do not give
parallel algorithms.
8.3 Strong chromatic number
Consider a graph G, whose vertices are partitioned into r blocks each of size b, i. e., V = V1 ⊔ · · · ⊔ Vr .
We would like to b-color the vertices, such that every block has exactly b colors, and such that no edge
has both endpoints with the same color (i. e., it is a proper vertex coloring). This is referred to as a strong
coloring of the graph. If this is possible for any such partition of the vertices into blocks of size b, then
we say that the graph G has strong chromatic number b.
A series of papers [5, 8, 14, 21] have provided bounds on the strong chromatic number of graphs,
typically in terms of their maximum degree ∆. In [22], it is shown that when b ≥ (11/4)∆ + Ω(1),
such a coloring exists; this is the best bound currently known. Furthermore, the constant 11/4 cannot
be improved to any number strictly less than 2. The methods used in most of these papers are highly
non-constructive, and do not provide algorithms for generating such colorings.
In this section, we examine two routes to constructing strong colorings. The first proof, based
on [2], builds up the coloring vertex by vertex, using the ordinary LLL. The second proof uses the
permutation LLL to build the strong coloring directly. The latter appears to be the first RNC algorithm
with a reasonable bound on b.
We first discuss a concept related to strong coloring, known as an independent transversal. In an
independent transversal, we choose a single vertex from each block, so that the selected vertices form an
independent set of the graph.
Proposition 8.4. Suppose b ≥ 4∆. Then G has an independent transversal, which can be found in
expected time O(n∆).
Furthermore, let v ∈ G be any fixed vertex. Then G has an independent transversal which includes v,
which can be found in expected time O(n∆²).
Proof. Use the ordinary LLL to select a single vertex uniformly from each block. See [9], [18] for more
details. This shows that, under the condition b ≥ 4∆, an independent transversal exists and is found in
expected time O(n∆).
To find an independent transversal including v, we imagine assigning a weight 1 to vertex v and
weight zero to all other vertices. As described in [18], the expected weight of the independent transversal
returned by the Moser–Tardos algorithm is at least Ω(w(V )/∆), where w(V ) is the total weight of all
vertices. This implies that vertex v is selected with probability Ω(1/∆). Hence, after running the
Moser–Tardos algorithm for O(∆) separate independent executions, one finds an independent transversal
including v.
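For concreteness, the resampling procedure can be sketched as follows. This is our own illustrative Moser–Tardos-style code (the function name, data layout, and conflict-scanning loop are ours, not taken from [9] or [18]):

```python
import random

def independent_transversal(blocks, adj, max_iters=100000):
    """Moser-Tardos-style sketch: pick one uniformly random vertex per block;
    while some edge joins two selected vertices, resample those two blocks.
    blocks: list of lists of vertex ids; adj: dict vertex -> set of neighbors."""
    choice = [random.choice(B) for B in blocks]
    for _ in range(max_iters):
        # Find a "bad event": an edge inside the current selection.
        bad = next(((i, j) for i in range(len(blocks))
                    for j in range(i + 1, len(blocks))
                    if choice[j] in adj.get(choice[i], set())), None)
        if bad is None:
            return choice  # the selected vertices form an independent set
        i, j = bad
        choice[i] = random.choice(blocks[i])
        choice[j] = random.choice(blocks[j])
    raise RuntimeError("did not converge; b >= 4*Delta guarantees fast convergence")
```

Under b ≥ 4∆ the LLL analysis above guarantees this loop terminates quickly in expectation.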
Theorem 8.5. If b ≥ 5∆, then G has a strong coloring, which can be found in expected time O(n²∆²).
Proof. (This proof is almost identical to the proof of Theorem 5.3 of [2].) We maintain a partial coloring
of the graph G, in which some vertices are colored with a color from {1, . . . , b} and some vertices are uncolored.
Initially all vertices are uncolored. We require that in a block, no vertices have the same color, and no
adjacent vertices have the same color.
Now, suppose the partial coloring is not yet complete; say, without loss of generality, that block 1 is
missing color 1 and contains an uncolored vertex w. In each block i = 1, . . . , r, we will select some vertex vi
to have color 1. If the block does not have such a vertex already, we will simply assign vi to have color
1. If the block i already had some vertex ui with color 1, we will swap the colors of vi and ui (if vi was
previously uncolored, then ui will become uncolored).
We need to ensure three things. First, the vertices v1 , . . . , vr must form an independent transversal of
G. Second, if we select vertex vi and swap its color with ui , this cannot cause ui to have any conflicts
with its neighbors. Third, we will select v1 = w.
A vertex ui will have conflicts with its neighbors if vi currently has the same color as one of the
neighbors of ui . In each block, there are at least b − ∆ possible choices of vi that avoid this; we must
select an independent transversal among these vertices which also includes the designated vertex w.
By Proposition 8.4, this can be done in time O(n∆²) as long as b − ∆ ≥ 4∆. Whenever we select the
independent transversal v1 , . . . , vr , the total number of colored vertices increases by at least one: the
vertex w, which was previously uncolored, becomes colored, and in every other block the number of colored
vertices does not decrease. So, after n iterations, the entire graph has a strong coloring; the total time is
O(n²∆²).
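The augmentation loop above can be sketched in code. This is our own illustrative implementation, with a brute-force search standing in for the LLL-based independent-transversal subroutine of Proposition 8.4 (so it is only practical for tiny instances):

```python
import itertools

def strong_coloring(blocks, adj, b):
    """Sketch of the augmentation argument: pick an uncolored vertex w, choose
    a color c missing from its block, and install an independent transversal
    of color-c vertices that includes w, swapping out any previous c-holders."""
    color = {v: None for B in blocks for v in B}
    block_of = {v: i for i, B in enumerate(blocks) for v in B}

    def candidates(i, c, forced=None):
        # v_i may take color c provided the current holder u_i of c (if any),
        # upon receiving v_i's old color, does not conflict with its neighbors.
        ui = next((u for u in blocks[i] if color[u] == c), None)
        out = []
        for v in blocks[i]:
            if forced is not None and v != forced:
                continue
            if ui is not None and v != ui:
                cv = color[v]  # u_i would receive this color (possibly None)
                if cv is not None and any(color[u] == cv for u in adj.get(ui, set())):
                    continue
            out.append(v)
        return out

    def find_transversal(cand):
        # Brute force: one candidate per block, pairwise non-adjacent.
        for sel in itertools.product(*cand):
            if all(sel[j] not in adj.get(sel[i], set())
                   for i in range(len(sel)) for j in range(i + 1, len(sel))):
                return list(sel)
        return None

    while any(color[v] is None for v in color):
        w = next(v for v in color if color[v] is None)
        k = block_of[w]
        c = next(col for col in range(b)
                 if col not in {color[u] for u in blocks[k]})
        sel = find_transversal([candidates(i, c, forced=(w if i == k else None))
                                for i in range(len(blocks))])
        if sel is None:
            raise RuntimeError("no transversal found; need b >= 5*Delta")
        for i, v in enumerate(sel):
            ui = next((u for u in blocks[i] if color[u] == c), None)
            if ui is not None and ui != v:
                color[ui] = color[v]
            color[v] = c
    return color
```

Each pass colors the chosen vertex w without decreasing the number of colored vertices elsewhere, so at most n passes are needed, mirroring the termination argument above.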
The algorithm based on the ordinary LLL is slow and is inherently sequential. The permutation LLL
gives a more direct and faster construction; however, the hypothesis of the theorem will need to be slightly
stronger.
Theorem 8.6. Suppose G is a graph of maximum degree ∆, whose vertices are partitioned into blocks
of size b. Then if b ≥ (256/27)∆, it is possible to strongly color graph G in expected time O(n∆). If
b ≥ (256/27 + ε)∆ for some constant ε > 0, there is an RNC algorithm to construct such a strong
coloring.
Proof. For each block, we identify both the vertices and the colors with the set [b]. Then any proper
coloring of a block corresponds to a permutation in Sb . When we discuss the color of a vertex v, we refer
to πk (v), where k is the block containing vertex v.
For each edge f = (u, v) ∈ G and any color c ∈ {1, . . . , b}, we have a bad event that both u and v have
color c. (Note that we cannot specify simply that u and v have the same color, because of the restricted
class of bad events we consider.)
Each bad event has probability 1/b². We apply Pegden’s criterion using the weight function µ(B) = α
for every bad event B, where α is a scalar to be determined. Each such event (u, v, c) is dependent with
four other types of bad events:
1. An event u, v′, c′ where v′ is connected to vertex u;
2. An event u′, v, c′ where u′ is connected to vertex v;
3. An event u′, v′, c where u′ is in the block of u and v′ is connected to u′;
4. An event u′, v′, c where v′ is in the block of v and u′ is connected to v′.
There are b∆ neighbors of each type. For any of these four types, all the neighbors are themselves
connected to each other. Hence an independent set of neighbors can contain one or zero of each of the
four types of bad events. Using Lemma 3.1 and Pegden’s criterion [32], a sufficient condition for the
convergence of the Swapping Algorithm is that
α ≥ (1/b²) · (1 + b∆α)⁴.
When b ≥ (256/27)∆, this has a real positive root α* (which is a complicated algebraic expression).
Furthermore, in this case the expected number of swaps of each permutation is at most b²∆α* ≤
(256/81)∆. So the Swapping Algorithm terminates in expected time O(n∆). A similar argument applies
to the parallel Swapping Algorithm.
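As a sanity check on the constant (our own verification, not from the paper): substituting β = b∆α rewrites the condition as (b/∆)β ≥ (1 + β)⁴, which first becomes satisfiable at the ratio b/∆ = 256/27, with the tight point at β = 1/3 (so that b²∆α* = b/3 = (256/81)∆).

```python
def margin(beta, c):
    # Feasibility margin of the rescaled condition c*beta >= (1 + beta)^4,
    # where c = b/Delta and beta = b*Delta*alpha.
    return c * beta - (1 + beta) ** 4

c_star = 256 / 27
# At the threshold, the condition is tight exactly at beta = 1/3 ...
assert abs(margin(1 / 3, c_star)) < 1e-9
# ... and fails at every other sampled beta:
assert all(margin(k / 1000, c_star) < 0 for k in range(1, 3000))
# Slightly above the threshold, a feasible beta appears near 1/3:
assert margin(1 / 3, c_star + 0.01) > 0
```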
8.4 Hypergraph packing
In [28], the following packing problem was considered. Suppose we are given two r-uniform hypergraphs
H1 , H2 and an integer n. Is it possible to find two injections φi : V (Hi ) → [n] with the property that φ1 (H1 )
is edge-disjoint to φ2 (H2 )? (That is, there are no edges e1 ∈ H1 , e2 ∈ H2 with {φ1 (v) | v ∈ e1 } = {φ2 (v) |
v ∈ e2 }.) A sufficient condition on H1 , H2 , n was given using the LLLL. We achieve this algorithmically
as well.
Theorem 8.7. Suppose that H1 and H2 have m1 and m2 edges, respectively. Suppose that each edge of
Hi intersects with at most di other edges of Hi , and suppose that
(d1 + 1)m2 + (d2 + 1)m1 < \binom{n}{r}/e.
Then the Swapping Algorithm finds injections φi : V (Hi ) → [n] such that φ1 (H1 ) is edge-disjoint to
φ2 (H2 ). Suppose further that r ≤ polylog(n) and
(d1 + 1)m2 + (d2 + 1)m1 < (1 − ε)\binom{n}{r}/e.
Then the Parallel Swapping Algorithm finds such injections with high probability in polylog(n)/ε
time and using poly(m1 , m2 , n) processors.
Proof. [28] proves this fact using the LLLL, and the proof immediately applies to the Swapping Algorithm
as well. We review the proof briefly: we may assume without loss of generality that the vertex set of H1 is
[n], that the vertex set of H2 has cardinality n, and that φ1 is the identity permutation; then we only need to
select the bijection φ2 : V (H2 ) → [n]. For each pair of edges e1 = {u1 , . . . , ur } ∈ H1 , e2 = {v1 , . . . , vr } ∈ H2 ,
and each ordering σ ∈ Sr , there is a separate bad event φ2 (v1 ) = uσ(1) ∧ · · · ∧ φ2 (vr ) = uσ(r) . Now observe
that the LLL criterion is satisfied for these bad events, under the stated hypothesis.
The proof for the Parallel Swapping Algorithm is almost immediate. There is one slight complication:
the total number of bad events is m1 m2 r!, which could be superpolynomial. However, it is easy to see
that the total number of bad events which are true at any one time is at most m1 m2 , since each pair of
edges e1 , e2 can have at most one σ with φ2 (v1 ) = uσ(1) ∧ · · · ∧ φ2 (vr ) = uσ(r) . It is not hard to see that
Theorem 7.6 still holds under this condition.
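The counting observation here is easy to check directly: a true bad event determines σ, since the entries of e1 are distinct. A small illustrative snippet (our own code, with made-up edges and vertex names):

```python
from itertools import permutations

def true_bad_events(e1, e2, phi2):
    """All orderings sigma of e1 with phi2(v_j) = u_{sigma(j)} for every j.
    e1, e2: tuples of (distinct) vertices; phi2: dict on the vertices of e2."""
    r = len(e1)
    return [sigma for sigma in permutations(range(r))
            if all(phi2[e2[j]] == e1[sigma[j]] for j in range(r))]
```

If φ2 maps e2 onto e1 there is exactly one such σ; otherwise there is none, so at most m1·m2 bad events hold at any one time.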
9 Conclusion
The original formulation of the LLLL [13] applies in a natural way to general probability spaces. There
has been great progress over the last few years in developing constructive algorithms, which find in
polynomial time the combinatorial structures in these probability spaces whose existence is guaranteed
by the LLL. These algorithms have been developed in great generality, encompassing the Swapping
Algorithm as a special case.
However, the Moser–Tardos algorithm has uses beyond simply finding an object which avoids the bad
events. In many ways, the Moser–Tardos algorithm is more powerful than the LLL. We have already seen
problems that feature its extensions: e. g., Theorem 8.2 requires the use of the Partial Resampling variant
of the Moser–Tardos algorithm, and Proposition 8.4 requires the use of the Moser–Tardos distribution
(albeit in the context of the original Moser–Tardos algorithm, not the Swapping Algorithm).
While the algorithmic frameworks of Achlioptas & Iliopoulos and Harvey & Vondrák achieve the
main goal of a generalized constructive LLL algorithm, they do not match the full power of the Moser–
Tardos algorithm. However, our analysis shows that the Swapping Algorithm matches nearly all of the
additional features of the Moser–Tardos algorithm. In our view, one main goal of our paper is to serve
as a roadmap to the construction of a true generalized LLL algorithm. Behind all the difficult technical
analysis, there is the underlying theme: even complicated probability spaces such as permutations can
be reduced to “variables” (the domain and range elements of the permutation) which interact in a somewhat
“independent” fashion.
Encouragingly, there has been progress toward this goal. For example, one main motivation of [1, 20]
was to generalize the Swapping Algorithm. Then, Kolmogorov noted that our Swapping Algorithm had
the nice property that the choice of which bad event to resample can be made arbitrarily, a property
missing from the analysis of [1]; this led to Kolmogorov’s work [26], where he partially generalized that
property (to which he refers as commutativity).
At the current time, we do not even know how to define a truly generalized LLL algorithm, let alone
analyze it. But we hope that we have at least provided an example approach toward such an algorithm.
10 Acknowledgments
We would like to thank the anonymous reviewers of the conference and journal versions of this paper, for
their helpful comments and suggestions.
A Symmetry properties of the swapping subroutine
In this appendix, we show a variety of symmetry properties of the swapping subroutine. This analysis
will use simple results and notations of group theory. We let Sn denote the symmetric group on n letters,
which we identify with the set of permutations of [n]. We let (a b) denote the permutation (of whatever
dimension is appropriate) that swaps a and b and is the identity otherwise. We write multiplications on the
right, so that στ denotes the permutation which maps x to σ(τ(x)). Finally, we will sometimes write σx
instead of the more cumbersome σ(x).
Proposition A.1. For any permutations τ, σ we have
P(Swap(π; x1 , . . . , xr ) = σ) = P(Swap(πτ; τ⁻¹x1 , . . . , τ⁻¹xr ) = στ)
and
P(Swap(π; x1 , . . . , xr ) = σ) = P(Swap(τπ; x1 , . . . , xr ) = τσ).
(Less formally, the swapping subroutine is invariant under permutations of the domain or range.)
Proof. We prove this by induction on r. The following equivalence will be useful. We can view a single
call to Swap as follows: we select a random x′1 and swap x1 with x′1 ; let π′ = π · (x1 x′1 ) denote the
permutation after this swap. Now consider the permutation on n − 1 letters obtained by removing x1 from
the domain and π′(x1 ) from the range of π′ ; we use the notation π′ − (x1 , ∗) to denote this restriction of
domain/range. We then recursively call Swap(π′ − (x1 , ∗); x2 , . . . , xr ).
Now, in order to have Swap(πτ; τ⁻¹x1 , . . . , τ⁻¹xr ) = στ, we must first swap τ⁻¹x1 with x′1 =
τ⁻¹π⁻¹σx1 ; this occurs with probability 1/n. Then we would have

P(Swap(πτ; τ⁻¹x1 , . . . , τ⁻¹xr ) = στ)
= (1/n) P(Swap(πτ(τ⁻¹x1 τ⁻¹π⁻¹σx1 ) − (τ⁻¹x1 , ∗); τ⁻¹x2 , . . . , τ⁻¹xr ) = στ − (τ⁻¹x1 , ∗))
= (1/n) P(Swap(πτ(τ⁻¹x1 τ⁻¹π⁻¹σx1 )τ⁻¹ − (x1 , ∗); τ⁻¹τx2 , . . . , τ⁻¹τxr ) = σττ⁻¹ − (x1 , ∗))  (by inductive hypothesis)
= (1/n) P(Swap(π(x1 π⁻¹σx1 ) − (x1 , ∗); x2 , . . . , xr ) = σ − (x1 , ∗))
= P(Swap(π; x1 , x2 , . . . , xr ) = σ).
A similar argument applies for permutation of the range (i. e., post-composition by τ).
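Propositions A.1 and A.2 can be verified mechanically on small cases by computing the exact output distribution of the swapping subroutine. The following is our own illustrative code (permutations as tuples, multiplication on the right as in this appendix), not part of the paper:

```python
from fractions import Fraction

def compose(s, t):
    # Multiplication on the right: (s t)(x) = s(t(x)); permutations as tuples.
    return tuple(s[t[x]] for x in range(len(s)))

def inverse(s):
    inv = [0] * len(s)
    for x, y in enumerate(s):
        inv[y] = x
    return tuple(inv)

def swap_dist(pi, xs):
    """Exact output distribution of Swap(pi; x_1, ..., x_r): at step i the
    partner x'_i is uniform over [n] - {x_1, ..., x_{i-1}}, and the entries
    of pi at x_i and x'_i are swapped.  Returns {permutation: probability}."""
    n = len(pi)
    dist = {tuple(pi): Fraction(1)}
    for i, x in enumerate(xs):
        allowed = [v for v in range(n) if v not in xs[:i]]
        new = {}
        for perm, p in dist.items():
            for xp in allowed:
                q = list(perm)
                q[x], q[xp] = q[xp], q[x]
                new[tuple(q)] = new.get(tuple(q), Fraction(0)) + p / len(allowed)
        dist = new
    return dist
```

For instance, with π, τ ∈ S4 one can check that swap_dist(π, (0, 2)) assigns to each σ the same probability that swap_dist(πτ, (τ⁻¹0, τ⁻¹2)) assigns to στ (Proposition A.1), and that reordering the swapped positions leaves the distribution unchanged (Proposition A.2).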
Proposition A.2. Let π ∈ Sn , let x1 , . . . , xr ∈ [n], and let ρ ∈ Sr . For any permutation σ ∈ Sn we have
P(Swap(π; x1 , . . . , xr ) = σ) = P(Swap(π; xρ(1) , . . . , xρ(r) ) = σ).
Proof. We prove this by induction on r. We assume ρ(1) ≠ 1, or else this follows immediately from
induction. We compute:

P(Swap(π; xρ(1) , . . . , xρ(r) ) = σ)
= (1/n) P(Swap(π(xρ(1) π⁻¹σxρ(1) ); xρ(2) , . . . , xρ(r) ) = σ)
= (1/n) P(Swap(π(xρ(1) π⁻¹σxρ(1) ); x1 , x2 , . . . , x̂ρ(1) , . . . , xr ) = σ)  (by inductive hypothesis)
= (1/(n(n−1))) P(Swap(π(xρ(1) π⁻¹σxρ(1) )(x1 (π(xρ(1) π⁻¹σxρ(1) ))⁻¹σx1 ); x2 , . . . , x̂ρ(1) , . . . , xr ) = σ)
= (1/(n(n−1))) P(Swap(π(xρ(1) π⁻¹σxρ(1) )(x1 (xρ(1) π⁻¹σxρ(1) )π⁻¹σx1 ); x2 , . . . , x̂ρ(1) , . . . , xr ) = σ),

where we have used the notation . . . , x̂ρ(1) , . . . to denote the list x1 , . . . , xr with the term xρ(1) omitted.
At this point, consider the following simple fact about permutations: for any a1 , a2 , b1 , b2 ∈ [ℓ] with
a1 ≠ a2 and b1 ≠ b2 , we have

(a2 b2 )(a1 (a2 b2 )b1 ) = (a1 b1 )(a2 (a1 b1 )b2 ).
This fact is simple to prove by case analysis considering which of the letters a1 , a2 , b1 , b2 are equal to
each other.
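This case analysis can also be confirmed by brute force over a small ground set; the following check is our own illustration (transpositions represented as tuples, multiplication on the right), not part of the paper:

```python
from itertools import product

def transpose(a, b, n):
    """The transposition (a b) on range(n), as a tuple (the identity if a == b)."""
    t = list(range(n))
    t[a], t[b] = t[b], t[a]
    return tuple(t)

def compose(s, t):
    # Multiplication on the right: (s t)(x) = s(t(x)).
    return tuple(s[t[x]] for x in range(len(s)))

def identity_holds(n):
    """Check (a2 b2)(a1 (a2 b2)b1) = (a1 b1)(a2 (a1 b1)b2) for all
    a1 != a2, b1 != b2 in range(n)."""
    for a1, a2, b1, b2 in product(range(n), repeat=4):
        if a1 == a2 or b1 == b2:
            continue
        lhs = compose(transpose(a2, b2, n), transpose(a1, transpose(a2, b2, n)[b1], n))
        rhs = compose(transpose(a1, b1, n), transpose(a2, transpose(a1, b1, n)[b2], n))
        if lhs != rhs:
            return False
    return True
```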
We now apply this fact using a1 = x1 , a2 = xρ(1) , b1 = π⁻¹σa1 , b2 = π⁻¹σa2 ; this gives us

P(Swap(π; xρ(1) , . . . , xρ(r) ) = σ)
= (1/(n(n−1))) P(Swap(π(x1 π⁻¹σx1 )(xρ(1) (x1 π⁻¹σx1 )π⁻¹σxρ(1) ); x2 , . . . , x̂ρ(1) , . . . , xr ) = σ)
= (1/n) P(Swap(π(x1 π⁻¹σx1 ); xρ(1) , x2 , . . . , x̂ρ(1) , . . . , xr ) = σ)
= (1/n) P(Swap(π(x1 π⁻¹σx1 ); x2 , . . . , xr ) = σ)  (by inductive hypothesis)
= P(Swap(π; x1 , . . . , xr ) = σ).
In our analysis and algorithm, we will seek to maintain the symmetry between the “domain” and
“range” of the permutation. The swapping subroutine seems to break this symmetry, inasmuch as the
swaps are all based on the domain of the permutation. However, this symmetry breaking is only superficial
as shown in Proposition A.3.
Proposition A.3. Define the alternate swapping subroutine, which we denote Swap2(π; y1 , . . . , yr ), as
follows:
1. Repeat the following for i = 1, . . . , r:
2. Select y′i uniformly at random among [n] − {y1 , . . . , yi−1 }.
3. Swap entries π⁻¹(yi ) and π⁻¹(y′i ) of the permutation π.
More compactly:
Swap2(π; y1 , . . . , yr ) = Swap(π⁻¹; y1 , . . . , yr )⁻¹.
If π(x1 ) = y1 , . . . , π(xr ) = yr , then for any permutation σ we have
P(Swap(π; x1 , . . . , xr ) = σ ) = P(Swap2(π; y1 , . . . , yr ) = σ ) .
Proof. A similar recursive definition applies to Swap2 as for Swap: we select x′1 uniformly at random,
swap x1 /x′1 , and then call Swap2(π(x1 x′1 ) − (∗, y1 ); y2 , . . . , yr ). The main difference is that we remove
the image point (∗, y1 ) instead of the domain point (x1 , ∗).
Now, in order to have Swap2(π; y1 , . . . , yr ) = σ we must first swap x1 with x′1 = π⁻¹σx1 ; this occurs
with probability 1/n. Next, we recursively call Swap2 on the permutation π(x1 x′1 ) − (∗, y1 ), yielding:

P(Swap2(π; y1 , . . . , yr ) = σ)
= (1/n) P(Swap2(π(x1 x′1 ) − (∗, y1 ); y2 , . . . , yr ) = σ − (∗, y1 ))
= (1/n) P(Swap(π(x1 x′1 ) − (∗, y1 ); (x1 x′1 )π⁻¹y2 , . . . , (x1 x′1 )π⁻¹yr ) = σ − (∗, y1 ))  (by inductive hypothesis)
= (1/n) P(Swap(π − (x1 , y1 ); x2 , . . . , xr ) = σ(x1 x′1 ) − (x1 , y1 ))  (by Proposition A.1, pre-composing with (x1 x′1 ))
= (1/n) P(Swap((σx1 σx′1 )π − (x1 , ∗); x2 , . . . , xr ) = (σx1 σx′1 )σ(x1 x′1 ) − (x1 , ∗))  (by Proposition A.1, post-composing with (σx1 σx′1 ))
= (1/n) P(Swap(π(x1 x′1 ) − (x1 , ∗); x2 , . . . , xr ) = σ − (x1 , ∗))
= P(Swap(π; x1 , . . . , xr ) = σ).
References

[1] Dimitris Achlioptas and Fotis Iliopoulos: Random walks that find perfect objects and the Lovász Local Lemma. J. ACM, 63(3):22:1–29, 2016. Preliminary version in FOCS’14. [doi:10.1145/2818352, arXiv:1406.0242]

[2] Ron Aharoni, Eli Berger, and Ran Ziv: Independent systems of representatives in weighted graphs. Combinatorica, 27(3):253–267, 2007. [doi:10.1007/s00493-007-2086-y]

[3] Michael H. Albert, Alan M. Frieze, and Bruce A. Reed: Multicoloured Hamilton cycles. Electr. J. Combin., 2(R10), 1995.

[4] Noga Alon: Probabilistic proofs of existence of rare events. In Geometric Aspects of Functional Analysis, volume 1376 of Lect. Notes in Math., pp. 186–201. Springer, 1989. [doi:10.1007/BFb0090055]

[5] Noga Alon: The strong chromatic number of a graph. Random Structures Algorithms, 3(1):1–7, 1992. [doi:10.1002/rsa.3240030102]

[6] Noga Alon, László Babai, and Alon Itai: A fast and simple randomized parallel algorithm for the maximal independent set problem. J. Algorithms, 7(4):567–583, 1986. [doi:10.1016/0196-6774(86)90019-2]

[7] Noga Alon, Joel H. Spencer, and Prasad Tetali: Covering with Latin transversals. Discr. Appl. Math., 57(1):1–10, 1995. [doi:10.1016/0166-218X(93)E0136-M]

[8] Maria Axenovich and Ryan R. Martin: On the strong chromatic number of graphs. SIAM J. Discrete Math., 20(3):741–747, 2006. [doi:10.1137/050633056, arXiv:1605.06574]

[9] Rodrigo Bissacot, Roberto Fernández, Aldo Procacci, and Benedetto Scoppola: An improvement of the Lovász Local Lemma via cluster expansion. Combin. Probab. Comput., 20(5):709–719, 2011. [doi:10.1017/S0963548311000253, arXiv:0910.1824]

[10] Julia Böttcher, Yoshiharu Kohayakawa, and Aldo Procacci: Properly coloured copies and rainbow copies of large graphs with small maximum degree. Random Structures Algorithms, 40(4):425–436, 2012. [doi:10.1002/rsa.20383, arXiv:1007.3767]

[11] Stephen A. Cook: A taxonomy of problems with fast parallel algorithms. Inf. Control, 64(1-3):2–22, 1985. [doi:10.1016/S0019-9958(85)80041-3]

[12] Paul Erdős, Dean R. Hickerson, Donald A. Norton, and Sherman K. Stein: Has every Latin square of order n a partial Latin transversal of size n − 1? Amer. Math. Monthly, 95(5):428–430, 1988. [doi:10.2307/2322477]

[13] Paul Erdős and Joel H. Spencer: Lopsided Lovász Local Lemma and Latin transversals. Discr. Appl. Math., 30(2-3):151–154, 1991. [doi:10.1016/0166-218X(91)90040-4]

[14] Michael R. Fellows: Transversals of vertex partitions in graphs. SIAM J. Discrete Math., 3(2):206–215, 1990. [doi:10.1137/0403018]

[15] Alan M. Frieze and Michael Krivelevich: On rainbow trees and cycles. Electr. J. Combin., 15(R59), 2008.

[16] Bernhard Haeupler, Barna Saha, and Aravind Srinivasan: New constructive aspects of the Lovász Local Lemma. J. ACM, 58(6):28:1–28, 2011. [doi:10.1145/2049697.2049702, arXiv:1001.1231]

[17] Geňa Hahn and Carsten Thomassen: Path and cycle sub-Ramsey numbers and an edge-colouring conjecture. Discrete Math., 62(1):29–33, 1986. [doi:10.1016/0012-365X(86)90038-5]

[18] David G. Harris and Aravind Srinivasan: The Moser-Tardos framework with partial resampling. In Proc. 54th FOCS, pp. 469–478. IEEE Comp. Soc. Press, 2013. [doi:10.1109/FOCS.2013.57, arXiv:1406.5943]

[19] David G. Harris and Aravind Srinivasan: A constructive algorithm for the Lovász Local Lemma on permutations. In Proc. 25th Ann. ACM-SIAM Symp. on Discrete Algorithms (SODA’14), pp. 907–925. ACM Press, 2014. [doi:10.1137/1.9781611973402.68, arXiv:1612.02663]

[20] Nicholas J. A. Harvey and Jan Vondrák: An algorithmic proof of the Lovász Local Lemma via resampling oracles. In Proc. 56th FOCS, pp. 1327–1346. IEEE Comp. Soc. Press, 2015. [doi:10.1109/FOCS.2015.85, arXiv:1504.02044]

[21] Penny E. Haxell: On the strong chromatic number. Combin. Probab. Comput., 13(6):857–865, 2004. [doi:10.1017/S0963548304006157]

[22] Penny E. Haxell: An improved bound for the strong chromatic number. J. Graph Theory, 58(2):148–158, 2008. [doi:10.1002/jgt.20300]

[23] A. Donald Keedwell and József Dénes: Latin Squares and Their Applications. 2nd ed. Elsevier, 2015.

[24] Peter Keevash and Cheng Yeaw Ku: A random construction for permutation codes and the covering radius. Designs, Codes and Cryptography, 41(1):79–86, 2006. [doi:10.1007/s10623-006-0017-3]

[25] Kashyap Babu Rao Kolipaka and Mario Szegedy: Moser and Tardos meet Lovász. In Proc. 43rd STOC, pp. 235–244. ACM Press, 2011. [doi:10.1145/1993636.1993669]

[26] Vladimir Kolmogorov: Commutativity in the algorithmic Lovász Local Lemma. In Proc. 57th FOCS, pp. 780–787. IEEE Comp. Soc. Press, 2016. [doi:10.1109/FOCS.2016.88, arXiv:1506.08547]

[27] Linyuan Lu, Austin Mohr, and László Székely: Quest for negative dependency graphs. In Recent Advances in Harmonic Analysis and Applications, pp. 243–258. Springer, 2012. [doi:10.1007/978-1-4614-4565-4_21]

[28] Linyuan Lu and László Székely: Using Lovász Local Lemma in the space of random injections. Electr. J. Combin., 14(R63), 2007.

[29] Michael Luby: A simple parallel algorithm for the maximal independent set problem. SIAM J. Comput., 15(4):1036–1053, 1986. Preliminary version in STOC’85. [doi:10.1137/0215074]

[30] Austin Mohr: Applications of the lopsided Lovász Local Lemma regarding hypergraphs. Ph.D. thesis, University of South Carolina, 2013. Available at author’s webpage.

[31] Robin A. Moser and Gábor Tardos: A constructive proof of the general Lovász Local Lemma. J. ACM, 57(2):11:1–15, 2010. [doi:10.1145/1667053.1667060, arXiv:0903.0544]

[32] Wesley Pegden: An extension of the Moser-Tardos algorithmic local lemma. SIAM J. Discrete Math., 28(2):911–917, 2014. [doi:10.1137/110828290, arXiv:1102.2853]

[33] Alexander D. Scott and Alan D. Sokal: The repulsive lattice gas, the independent-set polynomial, and the Lovász Local Lemma. J. Stat. Phys., 118(5-6):1151–1261, 2005. [doi:10.1007/s10955-004-2055-4, arXiv:cond-mat/0309352]

[34] James B. Shearer: On a problem of Spencer. Combinatorica, 5(3):241–245, 1985. [doi:10.1007/BF02579368]

[35] Sherman K. Stein: Transversals of Latin squares and their generalizations. Pacific J. Math., 59(2):567–575, 1975. [doi:10.2140/pjm.1975.59.567]

[36] Sándor Szabó: Transversals of rectangular arrays. Acta Math. Univ. Comenianae, 77(2):279–284, 2008. Available at EMIS.
AUTHORS

David Harris
Affiliate
University of Maryland
College Park, MD
davidgharris29@gmail.com
https://sites.google.com/site/davidgharriswebsite/home

Aravind Srinivasan
Professor
University of Maryland
College Park, MD
srin@cs.umd.edu
https://www.cs.umd.edu/~srin/
ABOUT THE AUTHORS

David Harris received his Ph.D. from the University of Maryland in 2015; his advisor
was Aravind Srinivasan. His main research focus is on probability in computing and
the Lovász Local Lemma. He lives in Silver Spring, MD with his family and their dog
Ziggy.

Aravind Srinivasan is a Professor of Computer Science at the University of Maryland,
College Park. His Ph.D. from Cornell University, received in 1993, was advised by
David Shmoys. Aravind’s research interests include algorithms, randomness, networks,
and computing in the service of society (see his Nature 2004 and ACM IHI (Internat. Health
Informatics) 2012 articles).