The Minimum Bisection in the Planted Bisection Model

Authors Amin Coja-Oghlan, Oliver Cooley, Mihyun Kang, Kathrin Skubch,
Plaintext
                           T HEORY OF C OMPUTING, Volume 13 (8), 2017, pp. 1–22
                                        www.theoryofcomputing.org

                          S PECIAL ISSUE : APPROX-RANDOM 2015




                    The Minimum Bisection in the
                      Planted Bisection Model
               Amin Coja-Oghlan∗                     Oliver Cooley†                    Mihyun Kang‡
                                                  Kathrin Skubch
             Received December 19, 2015; Revised August 31, 2016; Published September 23, 2017




       Abstract: In the planted bisection model a random graph G(n, p+ , p− ) with n vertices is
       created by partitioning the vertices randomly into two classes of equal size (up to ±1). Any
       two vertices that belong to the same class are linked by an edge with probability p+ and any
       two that belong to different classes with probability p− < p+ independently. The planted
       bisection model has been used extensively to benchmark graph partitioning algorithms. If
       p± = 2d± /n for numbers 0 ≤ d− < d+ that remain fixed as n → ∞, then w. h. p. the “planted”
       bisection (the one used to construct the graph) will not be a minimum bisection. In this paper
       we derive an√ asymptotic formula for the minimum bisection width under the assumption that
       d+ − d− > c d+ ln d+ for a certain constant c > 0.

ACM Classification: G.2.2
AMS Classification: 05C80, 05C15
Key words and phrases: random graphs, minimum bisection, planted bisection, belief propagation

     A preliminary version of this paper appeared in the Proceedings of the 19th International Workshop on Randomization and
Computation (RANDOM 2015) [11].
   ∗ The research leading to these results has received funding from the European Research Council under the European Union’s

Seventh Framework Programme (FP/2007-2013) / ERC Grant Agreement n. 278857–PTCC.
   † Supported by Austrian Science Fund (FWF): P26826 and W1230.
   ‡ Supported by Austrian Science Fund (FWF): P26826 and W1230.



© 2017 Amin Coja-Oghlan, Oliver Cooley, Mihyun Kang, and Kathrin Skubch
c b Licensed under a Creative Commons Attribution License (CC-BY)                              DOI: 10.4086/toc.2017.v013a008
              A MIN C OJA -O GHLAN , O LIVER C OOLEY, M IHYUN K ANG , AND K ATHRIN S KUBCH

1     Introduction
1.1    Background and motivation
The minimum bisection problem is a well-known NP-hard problem [21], in which given a graph G one
aims to find a partition of the vertex set of G into two classes of equal size (up to ±1) so as to minimise
the number of crossing edges between the two classes, called the bisection width. We denote by bis(G)
the minimum bisection width of G.
     Since the early days of computational complexity, graph partitioning problems, e. g., the minimum
bisection problem, have played a central role in computer science [21, 28]. Over the years they have
inspired some of the most important algorithmic techniques that we have at our disposal today, such as
network flows or semidefinite programming [3, 18, 22, 29, 42].
     In the context of the probabilistic analysis of algorithms, it is hard to think of a more intensely
studied problem than the planted bisection model. In this model a random graph G = G(n, p+1 , p−1 )
on vertex set [n] = {1, . . . , n} is created by choosing a map σ : V → {−1, 1} uniformly at random
subject to |σ −1 (1)| − |σ −1 (−1)| ≤ 1 and connecting any two vertices v 6= w with probability pσ (v)σ (w)
independently, where 0 ≤ p−1 < p+1 ≤ 1. To ease notation, we often write p+ for p+1 and p− for p−1 ,
and handle subscripts similarly for other parameters.
     Given the random graph G (but not the planted bisection σ ), the task is to find a minimum bisection
of G, i. e., to partition the vertices into two disjoint sets S, S̄ = [n] \ S whose sizes satisfy |S| − |S̄| ≤ 1
such that the number of S-S̄-edges is minimum. The planted bisection model has been employed to gauge
algorithms based on spectral, semidefinite programming, flow and local search techniques, to name but a
few [5, 6, 7, 8, 9, 12, 15, 16, 17, 26, 27, 31, 33, 35].
     Remarkably, for a long time the algorithm with the widest range of n, p± for which a minimum
bisection can be found efficiently was one of the earliest ones, namely Boppana’s spectral algorithm [6].1
It succeeds if
                                                            p
                                            n(p+ − p− ) ≥ c np+ ln n

for a certain constant c > 0. Under this assumption the planted bisection is minimum w. h. p. (with high
probability, meaning with probability tending to one as n → ∞). In fact, recently the critical value c∗ > 0
for which this statement is true was identified explicitly [1, 38]. In particular, for

                                              n(p+ − p− ) > c∗
                                                                   p
                                                                    np+ ln n

the minimum bisection width simply equals (1/4 + o(1))n2 p− w. h. p.
                             √
    But if n(p+ − p− ) < c∗ np+ ln n, then the minimum bisection width will be strictly smaller than the
width of the planted bisection w. h. p. Yet there is another spectral algorithm [9] that finds a minimum
bisection w. h. p. under the weaker assumption that
                                                               p
                                           n(p+ − p− ) ≥ c      np+ ln(np+ ) ,                                           (1.1)
    1 Boppana’s paper, which only appeared in conference proceedings, did not contain the proof of an important statement about

the spectrum of the adjacency matrix. A similar result could, however, be proved along the lines of the work of Friedman, Kahn
and Szemerédi [20]; see also [19, 44].


                            T HEORY OF C OMPUTING, Volume 13 (8), 2017, pp. 1–22                                             2
                     T HE M INIMUM B ISECTION IN THE P LANTED B ISECTION M ODEL

for a certain constant c > 0, and even certifies the optimality of its solution. However, [9] does not answer
what is arguably the most immediate question: what is the typical value of the minimum bisection width
bis(G)?
    In this paper we derive the value to which the (suitably scaled) minimum bisection width converges
in probability (Theorem 1.1). We confine ourselves to the case that (n/2)p± = d± remain fixed as n → ∞.
Hence, the random graph G has bounded average degree. This is arguably the most interesting case
because the discrepancy between the planted and the minimum bisection gets larger as the graphs get
sparser. In fact, it is easy to see that in the case of fixed (n/2)p± = d± the difference between the planted
and the minimum bisection width is Θ(n) as the planted bisection is not even locally optimal w. h. p.
    Although we build upon some of the insights from [9], it seems difficult to prove our main result by
tracing the fairly complicated algorithm from that paper. Instead, our main tool is an elegant message
passing algorithm called Warning Propagation that plays an important role in the study of random
constraint satisfaction problems via ideas from statistical physics [36]. Running Warning Propagation on
G naturally corresponds to a fixed-point problem on the 2-simplex, and the minimum bisection width can
be cast as a function of the fixed point.

1.2   The main result
To state the fixed-point problem, we consider the functions
                                                     
                                                     −1
                                                               if x < −1,
                                ψ : R → R,       x 7→ x         if − 1 ≤ x ≤ 1,
                                                     
                                                      1         if x > 1,
                                                     

and
                                                       (
                                                        −1         if x ≤ −1,
                                   ψ̃ : R → R,    x 7→
                                                        1          if x > −1.

Let P({−1, 0, 1}) be the set of probability measures on {−1, 0, 1}. Clearly, we can identify P({−1, 0, 1})
with the set of all maps p : {−1, 0, 1} → [0, 1] such that p(−1) + p(0) + p(1) = 1, i. e., the 2-simplex. For
a {−1, 0, 1}-valued random variable Z we denote by L(Z) ∈ P({−1, 0, 1}) the distribution of Z. Given
p ∈ P({−1, 0, 1}), let (η p,i )i≥1 be a family of i. i. d. {−1, 0, 1}-valued random variables with distribution
p. Moreover, let γ± = Po(d± ) be Poisson variables that are independent of each other and of the η p,i . Let

                                                   γ+           γ+ +γ−
                                      Z p,d+ ,d− := ∑ η p,i −     ∑ η p,i                                (1.2)
                                                   i=1          i=γ+ +1

and define a map

                                  Td+ ,d− : P({−1, 0, 1}) → P({−1, 0, 1}),                               (1.3)
                                           p 7→ L(ψ(Z p,d+ ,d− ))


                        T HEORY OF C OMPUTING, Volume 13 (8), 2017, pp. 1–22                                 3
             A MIN C OJA -O GHLAN , O LIVER C OOLEY, M IHYUN K ANG , AND K ATHRIN S KUBCH

that maps p ∈ P({−1, 0, 1}) to the distribution of ψ(Z p,d+ ,d− ). Further, with (η p,i )i≥1 and γ± as before,
let

            ϕd+ ,d− : P({−1, 0, 1}) → R,
                              "                                                        #
                                γ+                         γ+ +γ− 
                          1
                      p 7→ E ∑ 1 η p,i = −ψ̃(Z p,d+ ,d− ) + ∑ 1 η p,i = ψ̃(Z p,d+ ,d− ) .
                          2 i=1                            i=γ+ +1

                                                              −10
Moreover, let us call p ∈ P({−1, 0, 1}) skewed if p(1) ≥ 1 − d+   .
Theorem 1.1. There exists a constant c > 0 such that for any d± > 0 satisfying
                                                              p
                            d+ ≥ 2      and      d+ − d− ≥ c d+ ln d+

the map Td+ ,d− has a unique skewed fixed point p∗ and n−1 bis(G) converges in probability to ϕd+ ,d− (p∗ ).
   Note that Td+ ,d− may have further fixed points besides p∗ for example the Dirac measure (0, 1, 0), but
 ∗
p is the only fixed point which is skewed. We also note that the condition d+ ≥ 2 is not optimised—any
constant larger than 1 would do as a lower bound, but then in any case the condition d+ ≥ 2 follows from
the lower bound on d+ − d− for sufficiently large c.
    In the following sections we will use that the assumptions of Theorem 1.1 allow us to assume that
also d+ is sufficiently large.

1.3   Further related work
Determining the minimum bisection width of a graph is NP-hard [21] and there is evidence that the
problem does not even admit a PTAS [30]. On the positive side, it is possible to approximate the minimum
bisection width within a factor of O(ln n) for graphs on n vertices in polynomial time [42].
    The planted bisection model has been studied in statistics under the name “stochastic block
model” [23]. However, in the context of statistical inference the aim is to recover the planted parti-
tion σ as closely as possible given G rather than to determine the minimum bisection width. Recently
there has been a lot of progress, much of it inspired by non-rigorous work [13], on the statistical inference
problem. The current status of the problem is that matching upper and lower bounds are known for the
values of d± for which it is possible to obtain a partition that is non-trivially correlated with σ [34, 37, 40].
Furthermore, there are algorithms that recover a best possible approximation to σ under certain conditions
on d± [1, 39, 38]. But since our objective is different, the methods employed in the present paper are
somewhat different and, indeed, rather simpler.
    Finally, there has been recent progress on determining the minimum bisection width on the Erdős-
Rényi random graph. Although its precise asymptotics remain unknown in the case of bounded average
degrees d, it was proved in [14] that the main correction term corresponds to the “Parisi formula” in the
Sherrington-Kirkpartrick model [43]. Additionally, regarding the case of very sparse random graphs (i. e.,
with constant average degree), there is a sharp threshold (at np = ln 4) for the minimum bisection width
to be linear in n [32].
    Generally speaking, the approach that we pursue is somewhat related to the notion of “local weak
convergence” of graph sequences as it was used in [2]. More specifically, we are going to argue that

                         T HEORY OF C OMPUTING, Volume 13 (8), 2017, pp. 1–22                                  4
                     T HE M INIMUM B ISECTION IN THE P LANTED B ISECTION M ODEL

the minimum bisection width of G is governed by the “limiting local structure” of the graph, which is
a two-type Galton–Watson tree. The fixed-point problem in Theorem 1.1 mirrors the execution of a
message passing algorithm on the Galton–Watson tree. The study of this fixed-point problem, for which
we use the contraction method [41], is the key technical ingredient of our proof. We believe that this
strategy provides an elegant framework for tackling many other problems in the theory of random graphs
as well. In fact, in a recent paper [10] we combined Warning Propagation with a fixed point analysis on
Galton–Watson trees to the k-core problem. A similar approach was used in [24] in the context of random
constraint satisfaction problems. Further, in [4] Warning Propagation was applied to the random graph
coloring problem.



2     Outline
From here on √we keep the notation and the assumptions of Theorem 1.1. In particular, we assume that
d+ − d− ≥ c d+ ln d+ for a large enough constant c > 0 and that d± remain fixed as n → ∞. Furthermore
we assume that d+ is bounded from below by a large enough constant. Throughout the paper all graphs
will be locally finite and of countable size.


Three main insights enable the proof of Theorem 1.1. The first one, which we borrow from [9], is that
w. h. p. G features a fairly large set C of vertices such that for any two optimal bisections τ1 , τ2 of G
(i. e., maps τ1 , τ2 : V (G) → {±1}), we either have τ1 (v) = τ2 (v) for all v ∈ C or τ1 (v) = −τ2 (v) for all
v ∈ C (Lemma 2.1). In the language of random constraint satisfaction problems, the vertices in C are
“frozen.” While there remain Ω(n) unfrozen vertices, the subgraph that they induce is subcritical, i. e., all
components are of size O(ln n) and indeed most are of bounded size.
     The second main ingredient is an efficient message passing algorithm called Warning Propagation,
(cf. [36, Chapter 19]). We will show that a bounded number of Warning Propagation iterations suffice
to arrange almost all of the unfrozen vertices optimally (i. e., to assign almost all of the vertices to two
classes such that there is a minimum bisection respecting this assignment) and thus to obtain a very good
approximation to the minimum bisection w. h. p. (Proposition 2.2). This insight reduces our task to tracing
Warning Propagation for a bounded number of rounds.
     This last problem can be solved by studying Warning Propagation on a suitable Galton–Watson tree,
because G only contains a negligible number of short cycles w. h. p. (Lemma 2.3). Thus, the analysis of
Warning Propagation on the random tree is the third main ingredient of the proof. This task will turn out
to be equivalent to studying the fixed-point problem from Section 1.2 (Proposition 2.5). We proceed to
outline the three main components of the proof.


2.1   The core

Given a vertex u of a graph G let ∂G u denote the neighbourhood of u in G. We sometimes omit the
subscript G when the graph is clear from the context. More particularly, in the random graph G, let ∂± u
denote the set of all neighbours w of u in G with σ (w)σ (u) = ±1. Following [9], we define C as the

                        T HEORY OF C OMPUTING, Volume 13 (8), 2017, pp. 1–22                                5
                 A MIN C OJA -O GHLAN , O LIVER C OOLEY, M IHYUN K ANG , AND K ATHRIN S KUBCH

largest subset U ⊂ [n] such that
                                     cp
                     |∂± u| − d± ≤     d+ ln d+            and        |∂ u \U| ≤ 100       for all u ∈ U .              (2.1)
                                     4
Clearly, the set C, which we call the core, is uniquely defined because any union of sets U that satisfy (2.1)
also has the property. Let σ C : C → {±1}, v 7→ σ (v) be the restriction of the “planted assignment” σ to
C.
    Furthermore, for a graph G, a set U ⊂ V (G) and a map σ : U → {−1, 1} we let
                                
                                              1 − τ(v)τ(w)
              cut(G, σ ) := min        ∑
                                   {v,w}∈E(G)
                                                    2
                                                                                             
                                      τ : V (G) → {±1} satisfies τ(v) = σU (v) for all v ∈ U .

Note that U does not appear explicitly in the notation cut(G, σ ) despite being integral to the definition—it
is however implicit in the notation since U is the domain of σ .
    In words, cut(G, σ ) is the smallest number of edges in a cut of G that separates the vertices in
U ∩ σ −1 (−1) from those in U ∩ σ −1 (1). In particular, cut(G, σC ) is the smallest cut of G that separates
the vertices in the core C that are frozen to −1 from those that are frozen to 1. Finally, for any vertex v
we define a set Cv = Cv (G, σ ) of vertices via the following process.
           (0)
C1 Let Cv = {v} ∪ ∂ v.
                         (t+1)      (t)                                         (t)
C2 Inductively, let Cv           = Cv ∪                ∂ u and let Cv =    t≥0 Cv .
                                          S                               S
                                                 (t)
                                            u∈Cv \C

Lemma 2.1 ([9], Proposition 19 and Section 3.6). We have
                                                                                           −100
                         bis(G) = cut(G, σ C ) + O(1)             and         |C| ≥ n(1 − d+    )

w. h. p. Furthermore, for any ε > 0 there exists ω > 0 such that w. h. p.

                                              ∑ |Cv | · 1 {|Cv | ≥ ω} ≤ εn .
                                              v∈[n]


2.2    Warning Propagation
To calculate cut(G, σ C ) we adopt the Warning Propagation (“WP”) message passing algorithm.2 Let us
first introduce WP for a generic graph G = (V (G), E(G)) and a map σ : U ⊂ V (G) → {−1, 1}. At each
time t ≥ 0, WP sends a “message” µv→w (t | G, σ ) ∈ {−1, 0, 1} from v to w for any edge {v, w} ∈ E(G).
The messages are directed objects, i. e., µv→w (t | G, σ ) and µw→v (t | G, σ ) may differ. They are defined
inductively by
                                                     (
                                                        σ (v) if v ∈ U,
                                 µv→w (0 | G, σ ) :=
                                                        0       otherwise,
   2 A discussion of Warning Propagation in the context of the “cavity method” from statistical physics can be found in [36].



                           T HEORY OF C OMPUTING, Volume 13 (8), 2017, pp. 1–22                                             6
                      T HE M INIMUM B ISECTION IN THE P LANTED B ISECTION M ODEL

and for t ≥ 0
                                                                                 !
                             µv→w (t + 1 | G, σ ) := ψ        ∑ µu→v (t | G, σ ) .                           (2.2)
                                                          u∈∂ v\w

Again, U is implicit in the notation µv→w (t | G, σ ) since U is the domain of σ .
    Thus, the WP messages are initialised according to σ : U → {−1, 1}. Subsequently, v sends message
±1 to w if it receives more ±1 than ∓1 messages from its neighbours u 6= w. If there is a tie, v sends out
0. Finally, for t ≥ 0 define
                                   µv (t | G, σ ) := ∑ µw→v (t | G, σ ) .
                                                      w∈∂ v

    The intuition is that the message µv→w which v sends to w indicates which class v is most likely to be
in based on the current local information it receives from its other neighbours. To minimise the cut, we
would like to place v into the class in which most of its neighbours lie. The initialisation is given by the
set U, which we will choose to be the core.

Proposition 2.2. For any ε > 0 there exists t0 = t0 (ε, d+ , d− ) such that for all t ≥ t0 w. h. p.

                                   1
                  cut(G, σ C ) −      ∑ ∑ 1{µw→v (t | G, σ ) = −ψ̃ (µv (t | G, σ ))} ≤ εn .
                                   2 v∈[n] w∈∂ v

We defer the proof of Proposition 2.2 to Section 3.

2.3   The local structure
Proposition 2.2 shows that w. h. p. in order to approximate cut(G, σ C ) up to a small error of εn we merely
need to run WP for a number t0 of rounds that is bounded in terms of ε. The upshot is that the WP
messages µw→v (t | G, σ ) that are required to figure out the minimum bisection width are determined by
the local structure of G. We show that the local structure of G “converges to” a suitable Galton–Watson
tree. For this purpose, for simplicity we always say that the number of potential neighbours of any vertex
in each class is n/2. This ignores the fact that if n is odd the classes do not have quite this size and the
fact that a vertex cannot be adjacent to itself. However, ignoring these difficulties will not affect our
calculations in any significant way.
    Our task boils down to studying WP on that Galton–Watson tree. Specifically, let T = T d+ ,d− be the
Galton–Watson tree with two types +1, −1 and offspring matrix
                                                              
                                              Po(d+ ) Po(d− )
                                                                  .                                    (2.3)
                                              Po(d− ) Po(d+ )

Hence, a vertex of type ±1 spawns Po(d+ ) vertices of type ±1 and independently Po(d− ) vertices of type
∓1. Moreover, the type of the root vertex rT is chosen uniformly at random. Let τ = τ d+ ,d− : V (T ) →
{±1} assign each vertex of T its type.
     The random graph (G, σ ) “converges to” (T , τ) in the following sense. For two triples (G, r, σ ),
(G0 , r0 , σ 0 ) of graphs G, G0 , root vertices r ∈ V (G), r0 ∈ V (G0 ) and maps σ : V (G) → {±1}, σ 0 : V (G0 ) →

                         T HEORY OF C OMPUTING, Volume 13 (8), 2017, pp. 1–22                                    7
             A MIN C OJA -O GHLAN , O LIVER C OOLEY, M IHYUN K ANG , AND K ATHRIN S KUBCH

{±1} we write (G, r, σ ) ∼= (G0 , r0 , σ 0 ) if there is a graph isomorphism ϕ : G → G0 such that ϕ(r) = r0 and
       0
σ = σ ◦ ϕ. Further, we denote by ∂ t (G, r, σ ) the rooted graph obtained from (G, r, σ ) by deleting all
vertices at distance greater than t from r together with the restriction of σ to this subgraph. The following
lemma characterises the local structure of (G, σ ).
Lemma 2.3. Let t > 0 be an integer and let T be any tree with root r and map τ : V (T ) → {±1}. Then
       1                                              n→∞
               1 ∂ t (G, v, σ ) ∼
                                = ∂ t (T, r, τ)                P ∂ t (T , rT , τ) ∼
                                                                                  = ∂ t (T, r, τ)
                                                                                               
          ∑                                          −−−→                                            in probability.
       n v∈[n]

Furthermore, w. h. p. G does not contain more than ln n vertices v such that ∂ t (G, v, σ ) contains a cycle.

Proof. Given a tree T with root r and map τ : V (T ) → {±1}, let
                                                      1
                                                              1 ∂ t (G, v, σ ) ∼
                                                                               = ∂ t (T, r, τ) .
                                                               
                             Xt = Xt (T, r, τ) :=        ∑
                                                      n v∈[n]

Similarly, for a set C of isomorphism classes of rooted {±1}-marked trees let

                                   qt = qt (C) := P ∂ t (T , rT , τ) ∈ C .
                                                                       

The proof proceeds by induction on t. If t = 0, pick a vertex v ∈ [n] uniformly at random, then

                 X0 = Pv (σ (v) = τ(r)) = 1/2               and        p0 = PT (τ(rT ) = τ(r)) = 1/2

for any τ(r) ∈ {±1}. To proceed from t to t + 1, let d denote the number of children v1 , . . . , vd of r in
T . For each i = 1, . . . , d, let Ti denote the tree rooted at vi in the forest obtained from T by removing
r and let τi : V (Ti ) → {±1} denote the restriction of τ to the vertex set of Ti . Finally, let C1 , . . . ,Cd˜
for some d˜ ≤ d denote the distinct isomorphism classes among {∂ t (Ti , vi , τi ) : i = 1, . . . , d}, and let
c j = |{i : ∂ t (Ti , vi , τi ) ∈ C j }|. Let v ∈ [n] be an arbitrary vertex in G. Our aim is to determine the
probability of the event {∂ t+1 (G, v, σ ) ∼      = ∂ t+1 (T, r, τ)}. Therefore, we think of G as being created in
three rounds. First, partition [n] into two classes. Second, randomly insert edges between vertices in
[n] \ {v} according to their planted sign. Finally, reveal the neighbours of v. For the above event to
happen, v must have d neighbours in G. Since |∂± v| are independent binomially distributed random
variables with parameters n/2 and p± and because (n/2)p± = d± , we may approximate |∂± v| with a
Poisson distribution, and v has degree d with probability

                                                     (d+ + d− )d
                                                                   + o(1) ,
                                                  d! exp(d+ + d− )

where the error term is bounded by (d+ + d− )/n by Le Cam’s inequality. Conditioned on v having degree
d, by induction v is adjacent to precisely c j vertices with neighbourhood isomorphic to ∂ t (Ti , vi , τi ) ∈ C j
with probability
                                                     d˜
                                             d
                                                          qt (C j ) + o(1) .
                                         c1 . . . cd˜ ∏
                                                      j=1


                           T HEORY OF C OMPUTING, Volume 13 (8), 2017, pp. 1–22                                        8
                      T HE M INIMUM B ISECTION IN THE P LANTED B ISECTION M ODEL

The number of cycles of length ` ≤ 2t + 3 in G is stochastically bounded by the number of such cycles in
G(n, d+ /n) (the Erdős-Rényi graph). For each `, this number tends in distribution to a Poisson variable
with bounded mean (see, e. g., Theorem 3.19 in [25]) and so the total number of such cycles is bounded
w. h. p. Thus all the pairwise distances (in G − v) between neighbours of v are at least 2t + 1 w. h. p. (and
in particular this proves the second part of the lemma). Therefore
                                                                    d˜
                                         (d+ + d− )d
                                                      
                                                           d
                         EG [Xt+1 ] =                                   qt (C j ) + o(1) .
                                      d! exp(d+ + d− ) c1 . . . cd˜ ∏
                                                                    j=1

By definition of T , we obtain E[Xt+1 ] = qt+1 + o(1). To apply Chebyshev’s inequality, it remains to
determine E[Xt+12 ]. Let v, w ∈ [n] be two randomly choosen vertices. Then w. h. p. v and w have distance

at least 2t + 3 in G, conditioned on which ∂ t+1 (G, v, σ ) and ∂ t+1 (G, w, σ ) are independent. Therefore
we obtain

             Pv,w ∂ t+1 (G, v, σ ) ∼
                                   = ∂ t+1 (T, r, τ) ∧ ∂ t+1 (G, w, σ ) ∼
                                                                        = ∂ t+1 (T, r, τ)
                                                                                         

                = Pv ∂ t+1 (G, v, σ ) ∼
                                      = ∂ t+1 (T, r, τ) Pw ∂ t+1 (G, w, σ ) ∼  = ∂ t+1 (T, r, τ) + o(1) ,
                                                                                               


and finally
                2
                   ] =EG Pv ∂ t+1 (G, v, σ ) ∼
                                             = ∂ t+1 (T, r, τ) Pw ∂ t+1 (G, w, σ ) ∼
                                                                                   = ∂ t+1 (T, r, τ)
                                                                                                  
          EG [Xt+1
                        1
                      + EG [Xt+1 ] + o(1)
                        n
                     =EG [Xt+1 ]2 + o(1) .

The first assertion follows from Chebyshev’s inequality.

2.4   The fixed point
Let (T, r, τ) be a rooted tree together with a map τ : V (T ) → {±1}. Then for any pair v, w of adjacent
vertices we have the WP messages µv→w (t | T, τ), t ≥ 0, as defined in (2.2). Since we are going to be
particularly interested in the messages directed towards the root, we introduce the following notation.
Given the root r, any vertex v 6= r of T has a unique parent vertex u (the neighbour of v on the unique
path from v to r). Initially, let
                                           µv↑ (0 | T, r, τ) = τ(v)                                 (2.4)
and define
                                         µv↑ (t | T, r, τ) = µv→u (t | T, τ)                             (2.5)
for t > 0. In addition, set µr↑ (0 | T, r, τ) = τ(r) and let
                                                                          !
                          µr↑ (t + 1 | T, r, τ) = ψ   ∑ µv↑ (t | T, r, τ)        (t ≥ 0)                 (2.6)
                                                      v∈∂ r

be the message that r would send to its parent if there was one.

                         T HEORY OF C OMPUTING, Volume 13 (8), 2017, pp. 1–22                               9
            A MIN C OJA -O GHLAN , O LIVER C OOLEY, M IHYUN K ANG , AND K ATHRIN S KUBCH

    For p = (p(−1), p(0), p(1)) ∈ P({−1, 0, 1}) we let p̄ = (p(1), p(0), p(−1)). Recalling the map

                         T = Td+ ,d− : P({−1, 0, 1}) → P({−1, 0, 1}) ,
                                         p 7→ T(p(−1), p(0), p(1)) = L(ψ(Z p,d+ ,d− ))

defined by (1.3) in Section 1.2 and writing Tt for its t-fold iteration, we observe the following.

Lemma 2.4. Let pt := Tt (0, 0, 1).

   1. Given that τ(rT ) = +1, the message µrT ↑ (t | T , rT , τ) has distribution pt .

   2. Given that τ(rT ) = −1, the message µrT ↑ (t | T , rT , τ) has distribution p̄t .

Proof. The proof is by induction on t. In the case t = 0 the assertion holds because µrT ↑ (0 | T , rT , τ) =
τ(rT ). Now assume that the assertion holds for t. To prove it for t + 1, let C± be the set of all children v
of rT with τ(rT )τ(v) = ±1. By construction, |C± | has distribution Po(d± ). Furthermore, let (T v , v, τv )
signify the subtree pending on a child v of rT . Because T is a Galton–Watson tree, the random subtrees
T v are mutually independent. Moreover, each T v is distributed as a Galton–Watson tree with offspring
matrix (2.3) and a root vertex of type ±τ(rT ) for each v ∈ C± . Therefore, by induction, the message
µv↑ (t | T v , v, τ v ) has distribution pt if τ(v) = 1 and p̄t if τ(v) = −1. As a consequence,
                                                                                                      !
                µrT ↑ (t + 1 | T , rT , τ) := ψ   ∑ µv↑ (t | T v , v, τ v ) + ∑ µv↑ (t | T v , v, τ v )
                                                  v∈C+                       v∈C−


has distribution pt+1 if τ(rT ) = 1 and p̄t+1 otherwise.

    Lemma 2.4 shows that the operator T mimics WP on the Galton–Watson tree (T , rT , τ). Hence, to
understand the behaviour of WP after a large enough number of iterations we need to investigate the fixed
point to which Tt (0, 0, 1) converges as t → ∞. In Section 4 we will establish the following.

Proposition 2.5. The operator T has a unique skewed fixed point p∗ and limt→∞ Tt (0, 0, 1) = p∗ .

Proof of Theorem 1.1. Consider the random variables

                             1
                        Xn :=  bis(G) ,
                             n
                        (t)  11
                       Yn :=      ∑ ∑ 1{µw→v (t | G, σ ) = −ψ̃ (µv (t | G, σ ))} .
                             2 n v∈[n] w∈∂ v  G


Then Lemma 2.1 and Proposition 2.2 imply that for any ε > 0,
                                                 h          i
                                                       (t)
                                        lim lim P Xn −Yn > ε = 0 .                                        (2.7)
                                        t→∞ n→∞


                        T HEORY OF C OMPUTING, Volume 13 (8), 2017, pp. 1–22                                10
                            T HE M INIMUM B ISECTION IN THE P LANTED B ISECTION M ODEL

By Definition (2.2), µw→v (t | G, σ ) and µv (t | G, σ ) are both determined by ∂Gt v and the initialisation
µu→w (0 | G, σ ) for all u, w ∈ ∂Gt v, {u, w} ∈ E(G). Since (2.5) and (2.6) match the recursive definition (2.2)
of µw→v (t | G, σ ) and µv (t | G, σ ), Lemma 2.3 implies that for any fixed t > 0 (as n tends to infinity),
                                "                                                     #
     (t)   n→∞        (t)     1       
    Yn     →      x         := E ∑ 1 µw↑ (t | T , rT , τ) = −ψ̃(µrT (t | T , rT , τ))                in probability.
                              2 w∈∂ r
                                       T T
                                                                                                                  (2.8)

Now let p∗ denote the unique skewed fixed point of T guaranteed by Proposition 2.5. Since each child of
rT can be considered a root of an independent instance of T to which we can apply Lemma 2.4, we obtain
that given (τ(w))w∈∂ rT the sequence (µw↑ (t | T , rT , τ))w∈∂ rT converges to a sequence of independent
random variables (ηw )w∈∂ rT with distribution p∗ (if τ(w) = 1) and p̄∗ (if τ(w) = −1). By definition
µrT (t | T , rT , τ) converges to
                                        ∑ ηw +              ∑        ηw
                                         w∈∂ rT ,τ(w)=1          w∈∂ rT ,τ(w)=−1

Considering the offspring distributions of rT in both cases, i. e., τ(rT ) = ±1, we obtain from ϕd+ ,d− (p) =
ϕd+ ,d− ( p̄) for all p ∈ P({−1, 0, 1}) that

                                                 lim x(t) = ϕd+ ,d− (p∗ ) .                                       (2.9)
                                                 t→∞

Finally, combining (2.7)–(2.9) completes the proof.


3    Proof of Proposition 2.2
Lemma 3.1. If v ∈ C and w ∈ ∂G v, then µv→w (t | G, σ ) = σ (v) = µv→w (t | G, σ C ) for all t ≥ 0.

Proof. We proceed by induction on t. For t = 0 the assertion is immediate from the initialisation of the
messages. To go from t to t + 1, consider v ∈ C and w ∈ ∂G v. We may assume without loss of generality
that σ (v) = 1. By the definition of the WP message,
                                                                                !
                 µv→w (t + 1 | G, σ ) = ψ              ∑       µu→v (t | G, σ )     = ψ (S+ + S− + S0 )           (3.1)
                                                  u∈∂G v\{w}

where

                                      S+ :=            ∑               µu→v (t | G, σ ) ,
                                              u∈C∩σ −1 (+1)∩∂G v\{w}

                                      S− :=            ∑               µu→v (t | G, σ ) ,
                                              u∈C∩σ −1 (−1)∩∂G v\{w}

                                      S0 :=        ∑           µu→v (t | G, σ ) .
                                              u∈∂G v\(C∪{w})



                             T HEORY OF C OMPUTING, Volume 13 (8), 2017, pp. 1–22                                      11
             A MIN C OJA -O GHLAN , O LIVER C OOLEY, M IHYUN K ANG , AND K ATHRIN S KUBCH

Now, (2.1) ensures that
                  cp                                     cp                                cp
     S+ ≥ d+ −      d+ ln d+ ,         S− ≥ −d− −          d+ ln d+ ,      |S0 | ≤ 100 ≤     d+ ln d+ ,     (3.2)
                  4                                      4                                 4
provided that the constant c > 0 is chosen large enough. Combining (3.1) and (3.2), we see that
S+ + S− + S0 ≥ 1 and thus µv→w (t + 1 | G, σ ) = 1. The same argument works for µv→w (t + 1 | G, σ C ) =
1.

     Let Gv denote the subgraph of G induced on Cv . To prove Proposition 2.2, fix s > 0 large enough.
                                                                √
Let S = S(s) be the set of all vertices such that either |Cv | > s or Gv is cyclic. Then Lemma 2.1 (with
slightly smaller ε) and Lemma 2.3 imply that |S| ≤ εn w. h. p. For the rest of this section, let v 6∈ S be
fixed.
     For w ∈ Cv \ {v} we let w↑v be the neighbour of w on the path from w to v. We define Gw→v as the
component of w in the graph obtained from Gv by removing the edge {w, w↑v }. The vertex set of Gw→v
will be denoted by Cw→v . Further, hw→v is the maximum distance between w and any other vertex in
Gw→v . Additionally, hv is the maximum distance between v and any other vertex in Gv . Finally, let
σ v : Cv → {±1}, w 7→ σ (w) and let σ C,v : Cv ∩ C → {±1}, w 7→ σ C (w).

Lemma 3.2.

(1) For any w ∈ Cv \ {v} and any t > hw→v we have

                       µw→w↑v (t | G, σ ) = µw→w↑v (hw→v + 1 | G, σ ) = µw→w↑v (t | G, σ C ) .

(2) For any t ≥ hv we have µv (t | G, σ ) = µv (hv + 1 | G, σ ) = µv (t | G, σ C ).

Proof. The proof of (1) proceeds by induction on hw→v . The construction C1–C2 of Cv in Section 2.1
ensures that any w ∈ Cv with hw→v = 0 either belongs to C or has no neighbour besides w↑v . Hence for
the first case the assumption follows from Lemma 3.1. If ∂G w \ {w↑v } = 0/ we obtain that

                                  µw→w↑v (t | G, σ ) = µw→w↑v (t | G, σ C ) = 0

for all t ≥ 1 by the definition of the WP messages. Now, assume that hw→v > 0 and let t > hw→v . Then
all neighbours u 6= w↑v of w in Gw→v satisfy hu→v < hw→v . Thus, by induction
                                                                    

        µw→w↑v (t | G, σ ) = ψ        ∑           µu→w (t − 1 | G, σ )
                                   u∈∂G w\{w↑v }
                                                                          

                          =ψ          ∑           µu→w (hu→v + 1 | G, σ ) = µw→w↑v (hw→v + 1 | G, σ ) .
                                   u∈∂G w\{w↑v }


An analogous argument applies to µw→w↑v (t | G, σ C ). The proof of (2) is similar.

                        T HEORY OF C OMPUTING, Volume 13 (8), 2017, pp. 1–22                                  12
                     T HE M INIMUM B ISECTION IN THE P LANTED B ISECTION M ODEL

   For each vertex w ∈ Cv , w 6= v, let µw→v
                                          ∗  = µw→w↑v (s | G, σ ). Further, let µw∗ = µw (s | G, σ ). In
addition, for z ∈ {±1} let
                                                              (
                                                                 z      if u = w,
                   σ w→v : Cw→v ∩ ({w} ∪ C) → {±1} ,
                     z
                                                         u 7→
                                                                 σ (u) otherwise.

In words, σ zw→v freezes w to z and all other u ∈ Cw→v that belong to the core to σ (u). Analogously, let
                                                               (
                                                                z        if u = v,
                      σ v : Cv ∩ ({v} ∪ C) → {±1} ,
                        z
                                                         u 7→
                                                                σ (u) otherwise.

Lemma 3.3. Suppose that u ∈ Cv \ {v} is such that hu→v ≥ 1.
            ∗
(1) If z = µu→v ∈ {−1, 1}, then

                                    cut(Gu→v , σ zu→v ) < cut(Gu→v , σ −z
                                                                       u→v ) .                         (3.3)

    Similarly, if z = ψ(µv∗ ) ∈ {−1, 1}, then

                                          cut(Gv , σ zv ) < cut(Gv , σ −z
                                                                       v ).                            (3.4)

        ∗
(2) If µu→v = 0, then
                                                                      −1
                                    cut(Gu→v , σ +1
                                                 u→v ) = cut(Gu→v , σ u→v ) .                          (3.5)
    Similarly, if µv∗ = 0, then
                                                                     −1
                                         cut(Gv , σ +1
                                                    v ) = cut(Gv , σ v ) .                             (3.6)

Proof. We prove (3.3) and (3.5) by induction on hu→v . If hu→v = 1 then we have that all neighbours
                       ∗
w ∈ ∂Cu→v u of u with µu→v 6= 0 are in C, i. e., fixed under σ zu→v . Since Cu→v = ∂G u \ {u↑v } ∪ {u}, we
obtain


                        cut(Cu→v , σ −z                   z
                                     u→v ) − cut(Cu→v , σ u→v ) =        ∑            ∗
                                                                                     µw→v              (3.7)
                                                                     w∈∂G u\{u↑v }


by definition of z. By the induction hypothesis and because Gu→v is a tree (as v 6∈ S) we have that (3.7)
holds for hu→v > 1 as well. A similar argument yields (3.4) and (3.6).

    Now, let Uv be the set of all w ∈ Cv such that µw→v
                                                    ∗   6= 0. Furthermore, let
                                                              (
                                                               ψ̃(µv∗ ) if w = v,
                     σ ↑v : Uv ∪ {v} → {−1, +1} ,       w 7→     ∗
                                                                                                       (3.8)
                                                                µw→v    otherwise.

Thus, σ ↑v sets all w ∈ Cv ∩ C \ {v} to their planted sign and all w ∈ Uv \ C to µw→v
                                                                                  ∗   . Moreover, σ ↑v sets v
       ∗          ∗
to ψ(µv ) if ψ(µv ) 6= 0 and to 1 if there is a tie.

                        T HEORY OF C OMPUTING, Volume 13 (8), 2017, pp. 1–22                              13
             A MIN C OJA -O GHLAN , O LIVER C OOLEY, M IHYUN K ANG , AND K ATHRIN S KUBCH

Corollary 3.4. We have cut(Gv , σ C ) = cut(Gv , σ ↑v ).

Proof. This is immediate from Lemma 3.3.


      Hence, in order to determine an optimal cut of Gv we merely need to figure out the assignment of the
vertices in Cv \ ({v} ∪ Uv ). Suppose that σ ∗↑v : Cv → {±1} is an optimal extension of σ ↑v to a cut of Gv ,
i. e.,
                                                            1
                              cut(Gv , σ ↑v ) =     ∑ 2 (1 − σ ∗↑v (u)σ ∗↑v (w)) .
                                                {u,w}∈E(G )v



Corollary 3.5. It holds that

                                 1
                          ∑ 2 (1 − σ ∗↑v (v)σ ∗↑v (w)) = ∑ 1{µw→v
                                                              ∗
                                                                  = −ψ̃ (µv )} .
                        w∈∂G v                                 w∈∂G v


Proof. Part (2) of Lemma 3.3 implies that σ ∗↑v (v)σ ∗↑v (w) = 1 for all w ∈ ∂G v such that µw→v
                                                                                             ∗   = 0. The
                                       ∗
claim follows from the definition of σ ↑v and (3.8).


Proof of Proposition 2.2. Given ε > 0 choose δ = δ (ε, d+ , d− ) sufficiently small and s = s(ε, δ , d+ , d− ) >
0 sufficiently large. In particular, pick s large enough so that

                                                  P (|S| ≥ δ n) < ε .                                        (3.9)

Provided that δ is sufficiently small, the Chernoff bound implies that for large n
                                                                         !
                                             1
                                     P         ∑ |∂G v| ≥ εn |S| < δ n
                                             2 v∈S
                                                                             <ε.                            (3.10)


    Now, suppose that σ ∗C is an optimal extension of σ C to a cut of G and let v 6∈ S. Then using the
definition of Cv , Corollary 3.4 implies that

                            ∑ (1 − σ ∗C (v)σ ∗C (w)) = ∑ (1 − σ ∗↑v (v)σ ∗↑v (w)) .
                           w∈∂G v                          w∈∂G v


Therefore, we obtain
                                                           !                                      !
                           1            ∗      ∗                                   1
        P    cut(G, σ C ) − ∑ ∑ (1 − σ ↑v (v)σ ↑v (w)) ≥ εn ≤ P                      ∑ |∂G v| ≥ εn ≤ 2ε .
                           2 v6∈S w∈∂ v                                            2 v∈S
                                         G



The assertion follows from Lemma 3.2 for t ≥ s.

                        T HEORY OF C OMPUTING, Volume 13 (8), 2017, pp. 1–22                                   14
                     T HE M INIMUM B ISECTION IN THE P LANTED B ISECTION M ODEL

4    Proof of Proposition 2.5
We continue to denote the set of probability measures on X ⊂ Rk by P(X). For an X-valued random
variable X we denote by L(X) ∈ P(X) the distribution of X. Furthermore, if p, q ∈ P(X), then we write
P p,q (X) to denote the set of all probability measures µ on X × X such that the marginal distribution of
the first component coincides with p and the marginal distribution of the second component coincides
with q. The space P({−1, 0, 1}) is complete with respect to the L1 -Wasserstein metric that is defined by
         `1 (p, q) := inf E|X −Y | : X,Y are random variables with L(X,Y ) ∈ P p,q ({−1, 0, 1}) .
                         

In words, the infimum of E|X −Y | is over all couplings (X,Y ) of the distributions p, q. Such a coupling
(X,Y ) is optimal if `1 (p, q) = E|X −Y |. Finally, let P∗ ({−1, 0, 1}) be the set of all skewed probability
measures on {−1, 0, 1}, that is, P∗ ({−1, 0, 1}) is the set of all p = (p(−1), p(0), p(1)) ∈ P({−1, 0, 1})
                          −10
satisfying p(1) ≥ 1 − d+      . Being a closed subset of P({−1, 0, 1}), P∗ ({−1, 0, 1}) is complete with
respect to `1 ( · , · ).
    As in the definition (1.2)-(1.3) of the operator T = Td+ ,d− for p ∈ P({−1, 0, 1}) we let (η p,i )i≥1 be a
family of independent random variables with distribution p. Further, let γ± = Po(d± ) be independent of
each other and of the (η p,i )i≥1 . We introduce the shorthands
                                                     γ+                      γ+ +γ−
                      Z p := Z p,d+ ,d− ,   Z p,+ := ∑ η p,i ,    Z p,− :=     ∑ η p,i ,
                                                     i=1                     i=γ+ +1

so that
                                          Z p = Z p,+ − Z p,− .
              √
Also set λ = c d+ ln d+ and recall that c > 0 is a constant that we assume to be sufficiently large.
Lemma 4.1. The operator T maps P∗ ({−1, 0, 1}) into itself.
Proof. Suppose that p ∈ P({−1, 0, 1}) is skewed. Then
                                                                         
                                               λ −1                    λ −1
                 P (Z p < 1) ≤ P Z p,+ ≤ d+ −         + P Z p,− ≥ d− +        .                         (4.1)
                                                 2                       2
Since |η p,i | ≤ 1 for all i, we can bound the second summand from above by √
                                                                            invoking the Chernoff bound
on a binomial approximation of the Poisson distribution. Using that d− + c d+ ln d+ ≤ d+ we obtain
                                                                      2
                                                              
                                             cp              1      −c   1 −10
                               P γ− ≥ d− +        d+ ln d+ −     ≤ d+ 2 < d+                       (4.2)
                                             2               2           3
provided c is large enough.
    To bound the other summand from above we use that (η p,i )i≥1 is a sequence of independent skewed
random variables, whence by the Chernoff bound
                                         
                                    λ −1
                   P Z p,+ ≤ d+ −
                                      2
                                                                                   
                                                             λ −1
                   ≤ P (|γ+ − d+ | > λ /8) + P Z p,− ≤ d+ −          γ+ ≥ d+ − λ /8
                                                               2
                      1 −10                         −10
                                                                        2 −10
                   ≤ d+     + P Bin(d+ − λ /8, 1 − d+    ) ≤ d+ − λ /7 < d+      ,               (4.3)
                      3                                                   3

                        T HEORY OF C OMPUTING, Volume 13 (8), 2017, pp. 1–22                               15
              A MIN C OJA -O GHLAN , O LIVER C OOLEY, M IHYUN K ANG , AND K ATHRIN S KUBCH

provided that c is sufficiently large. Combining (4.1)–(4.3) completes the proof.

Lemma 4.2. The operator T is strictly `1 -contracting on P∗ ({−1, 0, 1}).

Proof. Let p, q ∈ P∗ ({−1, 0, 1}). We aim to show that `1 (T(p), T(q)) ≤ (1/2) `1 (p, q). To this end, we
let (η p,i , ηq,i )i≥1 be a family of random variables such that the (η p,i )i≥1 are independent with distribution
p and the (ηq,i )i≥1 are independent with distribution q but the pair (η p,i , ηq,i ) is an optimal coupling for
every i. Then by the definition of `1 ( · , · ),

                                       `1 (T(p), T(q)) ≤ E ψ(Z p ) − ψ(Zq ) .                                     (4.4)

To estimate the r. h. s., let η̃ p,i = 1{η p,i = 1} and η̃q,i = 1{ηq,i = 1}. Further, let Fi be the σ -algebra gen-
erated by η̃ p,i , η̃q,i and let F be the σ -algebra generated by γ+ , γ− and the random variables (η̃ p,i , η̃q,i )i≥1 .
Additionally, let γ = γ+ + γ− and consider the three events
                      (                       )
                      γ
            A1 :=    ∑ η̃ p,i η̃q,i ≥ γ − 10 ,            A2 := {γ ≥ 2d+ } ,          A3 := {γ+ − γ− ≤ 20} .
                     i=1

We are going to bound |ψ(Z p ) − ψ(Zq )| on A := A1 \ (A2 ∪ A3 ), C := A1 ∪ A2 ∪ A3 , A2 and A3 \ A2
separately. The bound on the first event is immediate: if A occurs, then ψ(Z p ) = ψ(Zq ) = 1 with certainty.
Hence,

                                             E [|ψ(Z p ) − ψ(Zq )| 1A ] = 0 .                                     (4.5)

    Let us turn to the second event C. Because the pairs (η p,i , ηq,i )i≥1 are mutually independent, we find

                           E [|η p,i − ηq,i | | F] = E [|η p,i − ηq,i | | Fi ]     for all i ≥ 1.                 (4.6)

Clearly, if η̃ p,i η̃q,i = 1, then η p,i − ηq,i = 0. Consequently,

                                                          E|η p,i − ηq,i |    E|η p,1 − ηq,1 |
                           E [|η p,i − ηq,i | | Fi ] ≤                      =                    .                (4.7)
                                                         P[η̃ p,i η̃q,i = 0] P[η̃ p,1 η̃q,1 = 0]

Since the events A1 , A2 , A3 are F-measurable and because Ā2 ensures that γ < 2d+ , (4.6) and (4.7) yield

                                                                    2d+ E|η p,1 − ηq,1 |
                              E[|ψ(Z p ) − ψ(Zq )| | F]1C ≤                              · 1C .                   (4.8)
                                                                     P[η̃ p,1 η̃q,1 = 0]

Further, because the pairs (η p,i , ηq,i )i≥1 are independent and because p, q are skewed,
                                                               !
                                              γ
                  P (C) ≤ P γ ≤ 2d+ , ∑ η̃ p,i η̃q,i ≤ γ − 10             ≤ (2d+ P (η̃ p,1 η̃q,1 = 0))10 .        (4.9)
                                             i=1

Combining (4.8) and (4.9), we obtain

                 E [E [|ψ(Z p ) − ψ(Zq )| | F] 1C ] ≤ (2d+ )11 P (η̃ p,1 η̃q,1 = 0)9 E|η p,1 − ηq,1 | .          (4.10)


                           T HEORY OF C OMPUTING, Volume 13 (8), 2017, pp. 1–22                                      16
                      T HE M INIMUM B ISECTION IN THE P LANTED B ISECTION M ODEL

                                                                      −10
Since p, q are skewed, we furthermore obtain P (η̃ p,1 η̃q,1 = 0) ≤ 2d+    . Therefore
                                                 h                                       i
                    E [|ψ(Z p ) − ψ(Zq )|1C ] = E E [|ψ(Z p ) − ψ(Zq )| | F] 1A1 ∪A2 ∪A3
                                                      −79
                                               ≤ 220 d+   E|η p,1 − ηq,1 | .

With respect to A2 , the triangle inequality yields

                            E[|ψ(Z p ) − ψ(Zq )|1A2 ] ≤ 2E|η p,1 − ηq,1 | · E[γ1A2 ] .              (4.11)
                                                                             −1
Further, since γ = Po(d+ + d− ), the Chernoff bound entails that E[γ1A2 ] ≤ d+  if the constant c is chosen
large enough. Combining this estimate with (4.11), we get
                                                             −1
                               E[|ψ(Z p ) − ψ(Zq )|1A2 ] ≤ 2d+  E|η p,1 − ηq,1 | .                  (4.12)

    Finally, on A3 \ A2 we have

                    E[|ψ(Z p ) − ψ(Zq )|1A3 \A2 ] ≤ 4d+ E|η p,1 − ηq,1 | P [γ+ − γ− ≤ 20] .         (4.13)
                                                                                   −2
Since γ± = Po(d± ) and d+ − d− ≥ λ , the Chernoff bound yields P [γ+ − γ− ≤ 20] ≤ d+  , if c is large
enough. Hence, (4.13) implies
                                                               −1
                             E[|ψ(Z p ) − ψ(Zq )|1A3 \A2 ] ≤ 4d+  E|η p,1 − ηq,1 | .                (4.14)

Finally, the assertion follows from (4.4), (4.5), (4.10), (4.12) and (4.14).

Proof of Proposition 2.5. The assertion follows from Lemmas 4.1 and 4.2 and the Banach fixed point
theorem.


References
 [1] E MMANUEL A BBE , A FONSO S. BANDEIRA , AND G EORGINA H ALL: Exact recov-
     ery in the stochastic block model. IEEE Trans. Inform. Theory, 62(1):471–487, 2016.
     [doi:10.1109/TIT.2015.2490670, arXiv:1405.3267] 2, 4
 [2] DAVID A LDOUS AND J OHN M. S TEELE: The objective method: Probabilistic combinatorial
     optimization and local weak convergence. In Probability on Discrete Structures, pp. 1–72. Springer,
     2004. [doi:10.1007/978-3-662-09444-0_1] 4
 [3] S ANJEEV A RORA , S ATISH R AO , AND U MESH V. VAZIRANI: Expander flows, geometric embed-
     dings and graph partitioning. J. ACM, 56(2):5:1–5:37, 2009. Preliminary version in STOC’04.
     [doi:10.1145/1502793.1502794] 2
 [4] V ICTOR BAPST, A MIN C OJA -O GHLAN , S AMUEL H ETTERICH , F ELICIA R ASSMANN , AND
     DAN V ILENCHIK: The condensation phase transition in random graph coloring. Communica-
     tions in Mathematical Physics, 341(2):543–606, 2016. Preliminary version in RANDOM’14.
     [doi:10.1007/s00220-015-2464-z, arXiv:1404.5513] 5

                        T HEORY OF C OMPUTING, Volume 13 (8), 2017, pp. 1–22                            17
           A MIN C OJA -O GHLAN , O LIVER C OOLEY, M IHYUN K ANG , AND K ATHRIN S KUBCH

 [5] B ÉLA B OLLOBÁS AND A LEX D. S COTT: Max cut for random graphs with a planted partition.
     Combin. Probab. Comput., 13(4-5):451–474, 2004. [doi:10.1017/S0963548304006303] 2

 [6] R AVI B. B OPPANA: Eigenvalues and graph bisection: An average-case analysis. In Proc. 28th
     FOCS, pp. 280–285. IEEE Comp. Soc. Press, 1987. [doi:10.1109/SFCS.1987.22] 2

 [7] T HANG N GUYEN B UI , S OMA C HAUDHURI , F RANK T HOMSON L EIGHTON , AND M ICHAEL
     S IPSER: Graph bisection algorithms with good average case behavior. Combinatorica, 7(2):171–191,
     1987. Preliminary version in FOCS’84. [doi:10.1007/BF02579448] 2

 [8] T ED C ARSON AND RUSSELL I MPAGLIAZZO: Hill-climbing finds random planted bisections. In
     Proc. 12th Ann. ACM-SIAM Symp. on Discrete Algorithms (SODA’01), pp. 903–909. ACM Press,
     2001. ACM DL. 2

 [9] A MIN C OJA -O GHLAN: A spectral heuristic for bisecting random graphs. Random Structures
     Algorithms, 29(3):351–398, 2006. Preliminary version in SODA’05. [doi:10.1002/rsa.20116] 2, 3,
     5, 6

[10] A MIN C OJA -O GHLAN , O LIVER C OOLEY, M IHYUN K ANG , AND K ATHRIN S KUBCH: How does
     the core sit inside the mantle? Electronic Notes in Discrete Mathematics, 49:489 – 496, 2015.
     [doi:10.1016/j.endm.2015.06.068, arXiv:1503.09030] 5

[11] A MIN C OJA -O GHLAN , O LIVER C OOLEY, M IHYUN K ANG , AND K ATHRIN S KUBCH: The
     minimum bisection in the planted bisection model.  In Proc. 19th Internat. Workshop
     on Randomization and Computation (RANDOM’15), pp. 710–725. Schloss Dagstuhl, 2015.
     [doi:10.4230/LIPIcs.APPROX-RANDOM.2015.710, arXiv:1505.02985] 1

[12] A NNE C ONDON AND R ICHARD M. K ARP: Algorithms for graph partitioning on the planted
     partition model. Random Structures Algorithms, 18(2):116–140, 2001. Preliminary version in
     RANDOM’99. [doi:10.1002/1098-2418(200103)18:2<116::AID-RSA1001>3.0.CO;2-2] 2

[13] AURELIEN D ECELLE , F LORENT K RZAKALA , C RISTOPHER M OORE , AND L ENKA Z DEBOROVÁ:
     Asymptotic analysis of the stochastic block model for modular networks and its algorithmic
     applications. Physical Review E, 84(6):066106, 2011. [doi:10.1103/PhysRevE.84.066106,
     arXiv:1109.3041] 4

[14] A MIR D EMBO , A NDREA M ONTANARI , AND S UBHABRATA S EN: Extremal cuts of sparse random
     graphs. Ann. Probab., 45(2):1190–1217, 2017. [doi:10.1214/15-AOP1084, arXiv:1503.03923] 4

[15] TASSOS D IMITRIOU AND RUSSELL I MPAGLIAZZO: Go with the winners for graph bisection. In
     Proc. 9th Ann. ACM-SIAM Symp. on Discrete Algorithms (SODA’98), pp. 510–520. ACM Press,
     1998. ACM DL. 2

[16] M ARTIN E. DYER AND A LAN M. F RIEZE: The solution of some random NP-hard problems in
     polynomial expected time. J. Algorithms, 10(4):451–489, 1989. [doi:10.1016/0196-6774(89)90001-
     1] 2

                      T HEORY OF C OMPUTING, Volume 13 (8), 2017, pp. 1–22                         18
                    T HE M INIMUM B ISECTION IN THE P LANTED B ISECTION M ODEL

[17] U RIEL F EIGE AND J OE K ILIAN: Heuristics for semirandom graph problems. J. Comput. System
     Sci., 63(4):639–671, 2001. [doi:10.1006/jcss.2001.1773] 2

[18] U RIEL F EIGE AND ROBERT K RAUTHGAMER: A polylogarithmic approximation of the minimum
     bisection. SIAM J. Comput., 31(4):1090–1118, 2002. Preliminary version in FOCS’00. See also
     SIAM Rev. 2006. [doi:10.1137/S0097539701387660] 2

[19] U RIEL F EIGE AND E RAN O FEK: Spectral techniques applied to sparse random graphs. Random
     Structures Algorithms, 27(2):251–275, 2005. [doi:10.1002/rsa.20089] 2

[20] J OEL F RIEDMAN , J EFF K AHN , AND E NDRE S ZEMERÉDI: On the second eigenvalue of random
     regular graphs. In Proc. 21st STOC, pp. 587–598. ACM Press, 1989. [doi:10.1145/73007.73063] 2

[21] M ICHAEL R. G AREY, DAVID S. J OHNSON , AND L ARRY J. S TOCKMEYER: Some simplified
     NP-complete graph problems. Theoret. Comput. Sci., 1(3):237–267, 1976. Preliminary version in
     STOC’74. [doi:10.1016/0304-3975(76)90059-1] 2, 4

[22] M ICHEL X. G OEMANS AND DAVID P. W ILLIAMSON: Improved approximation algorithms for
     maximum cut and satisfiability problems using semidefinite programming. J. ACM, 42(6):1115–
     1145, 1995. Preliminary version in STOC’94. [doi:10.1145/227683.227684] 2

[23] PAUL W. H OLLAND , K ATHRYN L ASKEY, AND S AMUEL L EINHARDT: Stochastic blockmodels:
     First steps. Social networks, 5(2):109–137, 1983. [doi:10.1016/0378-8733(83)90021-7] 4

[24] M ORTEZA I BRAHIMI , YASH K ANORIA , M ATT K RANING , AND A NDREA M ONTANARI: The set of
     solutions of random XORSAT formulae. Ann. Appl. Probab., 25(5):2743–2808, 2015. Preliminary
     version in SODA’12. [doi:10.1214/14-AAP1060, arXiv:1107.5377] 5

[25] S VANTE JANSON , T OMASZ Ł UCZAK , AND A NDRZEJ RUCI ŃSKI: Random Graphs. John Wiley &
     Sons, 2000. [doi:10.1002/9781118032718] 9

[26] M ARK J ERRUM AND G REGORY B. S ORKIN: The Metropolis algorithm for graph bisection.
     Discrete Applied Mathematics, 82(1-3):155–175, 1998. [doi:10.1016/S0166-218X(97)00133-9] 2

[27] A RI J UELS: Topics in black-box combinatorial function optimization. Ph. D. thesis, UC Berkeley,
     1996. Available at Research Gate. 2

[28] R ICHARD K ARP: Reducibility among combinatorial problems. In Complexity of Computer Compu-
     tations, pp. 85–103. Plenum Press, 1972. [doi:10.1007/978-1-4684-2001-2_9] 2

[29] M AREK K ARPINSKI: Approximability of the minimum bisection problem: An algorithmic chal-
     lenge. In Proc. 27th Internat. Symp. Math. Found. Comput. Sci. (MFCS’02), volume 2420 of LNCS,
     pp. 59–67. Springer, 2002. [doi:10.1007/3-540-45687-2_4] 2

[30] S UBHASH K HOT: Ruling out PTAS for graph min-bisection, dense k-subgraph, and bipar-
     tite clique. SIAM J. Comput., 36(4):1025–1071, 2006. Preliminary version in FOCS’04.
     [doi:10.1137/S0097539705447037] 4

                      T HEORY OF C OMPUTING, Volume 13 (8), 2017, pp. 1–22                         19
           A MIN C OJA -O GHLAN , O LIVER C OOLEY, M IHYUN K ANG , AND K ATHRIN S KUBCH

[31] L UD ĚK K U ČERA: Expected complexity of graph partitioning problems. Discr. Appl. Math., 57(2-
     3):193–212, 1995. [doi:10.1016/0166-218X(94)00103-K] 2
[32] M ALWINA J. L UCZAK AND C OLIN M C D IARMID: Bisecting sparse random graphs. Ran-
     dom Structures Algorithms, 18(1):31–38, 2001. [doi:10.1002/1098-2418(200101)18:1<31::AID-
     RSA3>3.0.CO;2-1] 4
[33] KONSTANTIN M AKARYCHEV, Y URY M AKARYCHEV, AND A RAVINDAN V IJAYARAGHAVAN:
     Approximation algorithms for semi-random partitioning problems. In Proc. 44th STOC, pp. 367–
     384. ACM Press, 2012. [doi:10.1145/2213977.2214013, arXiv:1205.2234] 2
[34] L AURENT M ASSOULIÉ: Community detection thresholds and the weak Ramanujan prop-
     erty. In Proc. 46th STOC, pp. 694–703. ACM Press, 2014. [doi:10.1145/2591796.2591857,
     arXiv:1311.3085] 4
[35] F RANK M C S HERRY: Spectral partitioning of random graphs. In Proc. 42nd FOCS, pp. 529–537.
     IEEE Comp. Soc. Press, 2001. [doi:10.1109/SFCS.2001.959929] 2
[36] M ARC M ÉZARD AND A NDREA M ONTANARI: Information, Physics, and Computation. Oxford
     Univ. Press, 2009. [doi:10.1093/acprof:oso/9780198570837.001.0001] 3, 5, 6
[37] E LCHANAN M OSSEL , J OE N EEMAN , AND A LLAN S LY: Stochastic block models and reconstruc-
     tion, 2012. [arXiv:1202.1499] 4
[38] E LCHANAN M OSSEL , J OE N EEMAN , AND A LLAN S LY: Consistency thresholds for the planted
     bisection model. In Proc. 47th STOC, pp. 69–75. ACM Press, 2015. [doi:10.1145/2746539.2746603,
     arXiv:1407.1591] 2, 4
[39] E LCHANAN M OSSEL , J OE N EEMAN , AND A LLAN S LY: Belief propagation, robust reconstruction
     and optimal recovery of block models. Ann. Appl. Probab., 26(4):2211–2256, 2016. Preliminary
     version in COLT’14. [doi:10.1214/15-AAP1145, arXiv:1309.1380] 4
[40] E LCHANAN M OSSEL , J OE N EEMAN , AND A LLAN S LY: A proof of the block model threshold
     conjecture. Combinatorica, pp. 1–44, 2017. [doi:10.1007/s00493-016-3238-8, arXiv:1311.4115] 4
[41] R ALPH N EININGER AND L UDGER RÜSCHENDORF: A general limit theorem for recur-
     sive algorithms and combinatorial structures. Ann. Appl. Probab., 14(1):378–418, 2004.
     [doi:10.1214/aoap/1075828056] 5
[42] H ARALD R ÄCKE: Optimal hierarchical decompositions for congestion minimization in networks.
     In Proc. 40th STOC, pp. 255–264. ACM Press, 2008. [doi:10.1145/1374376.1374415] 2, 4
[43] M ICHEL TALAGRAND: The Parisi formula.                  Ann. Math., 163(1):221–263, 2006.
     [doi:10.4007/annals.2006.163.221] 4
[44] VAN H. V U: Spectral norm of random matrices. Combinatorica, 27(6):721–736, 2007. Preliminary
     version in STOC’05. [doi:10.1007/s00493-007-2190-z] 2


                      T HEORY OF C OMPUTING, Volume 13 (8), 2017, pp. 1–22                         20
                 T HE M INIMUM B ISECTION IN THE P LANTED B ISECTION M ODEL

AUTHORS

    Amin Coja-Oghlan
    Professor
    Goethe University Frankfurt, Germany
    acoghlan math uni-frankfurt de
    http://www.uni-frankfurt.de/52107833


    Oliver Cooley
    Assistant professor
    Graz University of Technology, Austria
    cooley math tugraz at


    Mihyun Kang
    Professor
    Graz University of Technology, Austria
    kang math tugraz at
    http://www.math.tugraz.at/~kang/


    Kathrin Skubch
    Ph. D. student
    Goethe University Frankfurt, Germany
    skubch math uni-frankfurt de
    http://www.uni-frankfurt.de/53778787


ABOUT THE AUTHORS

    A MIN C OJA -O GHLAN studied Mathematics and Computer Science in Hamburg and Berlin.
       He graduated from the University of Hamburg with a Ph. D. in Mathematics in 2001
       under the supervision of Johannes Michaliček and obtained a Habilitation from Humboldt
       University Berlin in 2005. After visiting Carnegie Mellon University in 2007, he held
       faculty positions at the University of Edinburgh and the University of Warwick before
       joining Goethe University Frankfurt in 2012.


    O LIVER C OOLEY received a Bachelor and Master of Mathematics from the University of
       Cambridge in 2004 and 2005. He graduated from the University of Birmingham with a
       Ph. D. in Mathematics in 2010 under the supervision of Daniela Kühn and Deryk Osthus.
       His research interests include random graphs and hypergraphs, extremal graph theory
       and extremal hypergraph theory.



                    T HEORY OF C OMPUTING, Volume 13 (8), 2017, pp. 1–22                         21
     A MIN C OJA -O GHLAN , O LIVER C OOLEY, M IHYUN K ANG , AND K ATHRIN S KUBCH

M IHYUN K ANG graduated with a Ph. D. in Mathematics from KAIST (Korea Advanced
   Institute of Science and Technology) in 2001 under the supervision of Geon Ho Choe.
   Her research fields include random graphs, random hypergraphs and random graphs on
   surfaces.


K ATHRIN S KUBCH is a Ph. D. student of Amin Coja-Oghlan at Goethe University Frankfurt.
   Her research fields include probabilistic combinatorics, random graphs and hypergraphs
   and phase transitions in discrete structures.




               T HEORY OF C OMPUTING, Volume 13 (8), 2017, pp. 1–22                         22