                        THEORY OF COMPUTING, Volume 11 (17), 2015, pp. 413–443
                                       www.theoryofcomputing.org

                          SPECIAL ISSUE: APPROX-RANDOM 2013



                    Conditional Random Fields,
                   Planted Constraint Satisfaction,
                     and Entropy Concentration
                             Emmanuel Abbe                          Andrea Montanari∗
             Received October 27, 2013; Revised November 30, 2014; Published December 29, 2015




       Abstract: This paper studies a class of probabilistic models on graphs, where edge variables
       depend on incident node variables through a fixed probability kernel. The class includes
       planted constraint satisfaction problems (CSPs), as well as other structures motivated by
       coding theory and community detection problems. It is shown that under mild assumptions
       on the kernel and for sparse random graphs, the conditional entropy of the node variables
       given the edge variables concentrates. This implies in particular concentration results for the
       number of solutions in a broad class of planted CSPs, the existence of a threshold function
       for the disassortative stochastic block model, and the proof of a conjecture on parity check
       codes. It also establishes new connections among coding, clustering and satisfiability.

ACM Classification: G.3, H.1.1, G.2.2
AMS Classification: 68Q87, 68P30, 05C80
Key words and phrases: planted SAT, planted CSP, clustering, stochastic block model, graph-based
codes, entropy

    A preliminary version of this paper appeared in the Proc. of RANDOM, Berkeley, 2013.
    ∗ Supported by NSF CCF-1319979, DMS-11-06627 and AFOSR FA9550-13-1-0036.

© 2015 Emmanuel Abbe and Andrea Montanari
Licensed under a Creative Commons Attribution License (CC-BY)                 DOI: 10.4086/toc.2015.v011a017


1     Introduction

This paper studies a class of probabilistic models on graphs encompassing models from random com-
binatorial optimization, coding theory and machine learning. Depending on the context, the class may
be described as a family of planted constraint satisfaction problems (CSPs), encoded channels or
conditional random fields. We start by providing motivation in terms of CSPs.
      CSPs are key components in the theory of computational complexity as well as important mathematical
models in various applications of computer science, engineering and physics. In a CSP, a set of variables
x1 , . . . , xn is required to satisfy a collection of constraints, each involving a subset of the variables. In
many cases of interest, the variables are Boolean and the constraints are all of a common type: e. g., in
k-SAT, the constraints require the OR of k Boolean variables or their negations to be TRUE, whereas
in k-XORSAT, the XOR of the variables or their negations must equal zero. Given a set of constraints
and a number of variables, the problem is to decide whether there exists a satisfying assignment. In
random CSPs, the constraints are drawn at random from a given ensemble, keeping the constraint density1
constant. In this setting, it is of interest to estimate the probability that a random instance is satisfiable.
One of the fascinating phenomena occurring for random instances is the phase transition, which makes
the task of estimating this probability much easier in the limit. For a large class of CSPs, and as n tends
to infinity, the probability of being satisfiable tends to a step function, jumping from 1 to 0 when the
constraint density crosses a critical threshold. For random k-XORSAT the existence of such a critical
threshold is proved2 in [24]. For random k-SAT, k ≥ 3, the existence of a threshold that depends on n is
proved in [32]. However, it remains open to show that this threshold converges as n → ∞. Upper and lower
bounds are known to match up to a term that is of relative order k 2^{−k} as k increases [9, 19]. Phase
transition phenomena in other types of CSPs are also investigated in [8, 49, 9].
      In planted random CSPs, a “planted assignment” is first drawn, and the constraints are then drawn
at random so as to keep that planted assignment a satisfying one. Planted ensembles were investigated
in [12, 37, 7, 6, 38, 5], and at high density in [10, 20, 28]. In the planted setting, the probability of being
satisfiable is always equal to one by construction, and a more relevant question is to determine the actual
number of satisfying assignments. One would expect that this problem becomes easier in the limit as
n → ∞ due to an asymptotic phenomenon. This paper shows that, indeed, a concentration phenomenon
occurs: for a large class of planted CSPs (including SAT, NAE-SAT and XOR-SAT) the normalized
logarithm of the number of satisfying assignments concentrates (with respect to the graph of the CSP)
around a deterministic number. Moreover, we emphasize that this number is independent of n, unlike the
SAT threshold.
      It is worth comparing the result obtained in this paper for planted CSPs with the one obtained
in [4] for non-planted CSPs. In that case, the number of solutions is zero with positive probability and
therefore the logarithm of the number of solutions does not have a finite expectation. Technically, standard
martingale methods do not allow one to prove concentration, even around an n-dependent value. In [4] an
interpolation method [36] is used to prove the existence of the limit of a “regularized” quantity, namely
the logarithm of the number of solutions plus one, divided by the number of variables. A technical
consequence of this approach is that the concentration of this quantity around a value independent of n
can only be proved when the probability of unsatisfiability is known to be O(1/ log(n)^{1+ε}).
      This paper shows that in the planted case the concentration around an n-independent value holds
unconditionally. We again use the interpolation technique [36, 30, 31, 53, 13, 4] but with an interesting
twist. While in all the cited references, the entropy (or log-partition function) is shown to be superadditive,
in the present setting it turns out to be subadditive. This flip from superadditivity to subadditivity has an
intriguing information-theoretic interpretation. Considering for instance the application to sparse-graph
codes, it corresponds to the fact that more information can be transmitted reliably using, say, a code of
block length (n1 + n2 ) than two codes of block lengths n1 and n2 , respectively. While this intuition is
folklore among practitioners, the present analysis provides an elegant formalization.
   1 The ratio of the expected number of constraints to the number of variables.
   2 Reference [24] contains a statement of the result, but the proofs were omitted from the proceedings version; they can be
found as appendices in the arXiv version [25].
   Let us also mention that a fruitful line of work has addressed the relation between planted random
CSPs and their non-planted counterparts in the satisfiable phase [5, 40, 58]. These papers show that,
when the number of solutions is sufficiently concentrated, planting does not play a critical role in the
model. It would be interesting to use these ideas to “export” the concentration result obtained here to
non-planted models.
    In this paper, we pursue a different type of approach. Motivated by applications,3 in particular in
coding theory and clustering, we consider extensions of the standard planted CSPs to a setting allowing
soft probabilistic constraints. Within the setting of soft CSPs, the planted solution is an unknown vector
to be reconstructed, and the constraints are regarded as noisy observations of this unknown vector. For
instance one can recover the case of planted random k-SAT as follows. Each clause is generated by
selecting first k variable indices i1 , . . . , ik uniformly at random, providing a random hyperedge. Then
a clause is drawn uniformly among the ones that are satisfied by the variables xi1 , . . . , xik appearing in
the planted assignment. The clause can hence be regarded as a noisy observation of xi1 , . . . , xik . More
generally the formula can be seen as a noisy observation of the planted assignment.
    Our framework extends the above to include numerous examples from coding theory, machine
learning and statistics. Within LDPC or LDGM codes [55], encoding is performed by evaluating the
modulo 2 sum of a random subset of information bits and transmitting it through a noisy communication
channel. The selection of the information bits is described by a graph, drawn at random for the code
construction, and the transmission of these bits leads to a noisy observation of the graph variables.
Similarly, a community detection block model [34] can be seen as a random graph model, whereby each
edge is a noisy observation of the community assignments of the incident nodes. Definitions will be made
precise in the next section.
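To make the coding-theoretic setting concrete, here is a minimal sketch (Python; the function ldgm_encode
and all parameter values are hypothetical choices of ours, not code from the paper) of the encoding just
described: each edge variable is the modulo-2 sum of a random k-subset of the information bits, observed
through a binary symmetric channel.

    import random

    def ldgm_encode(x, num_checks, k, flip_prob, rng):
        """XOR random k-subsets of the bits x, then flip each output bit
        with probability flip_prob (a binary symmetric channel)."""
        n = len(x)
        edges, y = [], []
        for _ in range(num_checks):
            I = tuple(sorted(rng.sample(range(n), k)))   # random hyperedge
            bit = sum(x[i] for i in I) % 2               # modulo-2 sum of the selected bits
            if rng.random() < flip_prob:                 # noisy observation of the edge variable
                bit ^= 1
            edges.append(I)
            y.append(bit)
        return edges, y

    rng = random.Random(0)
    x = [rng.randint(0, 1) for _ in range(16)]           # information bits (the planted vector)
    edges, y = ldgm_encode(x, num_checks=32, k=3, flip_prob=0.1, rng=rng)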
    The conditional probability of the unknown vector given the noisy observations takes the form of
a graphical model, i. e., factorizes according to a hypergraph whose nodes correspond to variables and
hyperedges correspond to noisy observations. Such graphical models have been studied by several authors
in machine learning [43] under the name of “conditional random fields,” and in [48] in the context of
LDPC and LDGM codes. The conditional entropy of the unknown vector given the observations is
used here to quantify the residual uncertainty of the vector. This is equivalent to considering the mutual
information between the node and edge variables. In such a general setting, we prove that the conditional
entropy per variable tends to a limit. This framework allows a unified treatment of a large class of random
combinatorial optimization problems, raises new connections among them, and opens up new models.
We obtain in particular a proof of a conjecture posed in [42] on low-density parity-check codes, and the
existence of a threshold function for the disassortative stochastic block model [23].


   3 Planted models are also appealing for cryptographic applications, as hard instances with known solutions provide good
one-way functions [35, 16].



2     The model
Let k and n be two positive integers with n ≥ k.

     • Let V = [n] and g = (V, E(g)) be a hypergraph with vertex set V and edge set E(g) ⊆ Ek (V ), where
       Ek (V ) denotes the set of all (n choose k) possible hyperedges of order k on the vertex set V . We will
       often drop the prefix “hyper.” For the purpose of this paper, one can equivalently work with the model
       in which the hyperedges are drawn from all n^k hyperedges, without replacement of the vertices.

     • Let X and Y be two finite sets called respectively the input and output alphabets. Let Q(·|·) be
       a probability transition function (or channel) from X^k to Y, i. e., for each u ∈ X^k , Q(·|u) is a
       probability distribution on4 Y.

     • To each vertex in V , we assign a node-variable taking values in X, and to each edge in E(g),
       we assign an edge-variable taking values in Y. We now define a type of factor model where the
       probability of the edge-variables given the node-variables is given by

                                     Pg,Q (y|x) ≡    ∏    Q(yI | x[I]) ,       x ∈ X^V , y ∈ Y^{E(g)} ,      (2.1)
                                                   I∈E(g)

        where yI denotes the edge-variable attached to edge I, and x[I] denotes the k node-variables attached
        to the vertices incident with edge I (where the entries of x[I] are placed in the increasing order of I,
        although we will only consider kernels that are invariant under this ordering).

     The above is a type of factor or graphical model, or a planted constraint satisfaction problem with soft
probabilistic constraints. For each x ∈ X^V , Pg,Q (· | x) is a product measure on the set of edge-variables. We
call Pg,Q a graphical channel with graph g and kernel Q. We next put the uniform probability distribution
on the set of node-variables X^V , and define the a posteriori probability distribution (or reverse channel) by

                     Rg,Q (x | y) ≡ (1/Sg,Q (y)) Pg,Q (y | x) |X|^{−n} ,   x ∈ X^V , y ∈ Y^{E(g)} ,        (2.2)

where y belongs to the support of Pg,Q (· | x) and

                                       Sg,Q (y) ≡   ∑     Pg,Q (y | x) |X|^{−n}                            (2.3)
                                                  x∈X^V

is the marginal distribution of the edge variables y.

Example: planted SAT model. We now show how the previous model reduces to planted k-SAT for a
specific choice of the kernel. First, we take X = {0, 1}, i. e., the variables in planted SAT are Boolean,
and x ∈ {0, 1}^n represents the planted assignment. We want to generate a random k-CNF formula with
clauses that keep x a satisfying assignment. Notice that a k-clause is specified by two entities, (i) the
variables that are in the clause, i. e., an index set I ⊆ [n] of cardinality |I| = k, (ii) the negations that
    4 Note that we will allow Y to depend on k.




come on some of these variables, which can be encoded by a binary vector yI ∈ {0, 1}^k . For example,
the 3-clause u2 ∧ ū7 ∧ ū12 (where we use u for the formula’s variables, which are dummies in contrast
to the planted assignment x) is specified by the index set (2, 7, 12) and by the negation pattern (0, 1, 1).
Requiring that this clause is satisfied is equivalent to requiring that (u2 , u7 , u12 ) ≠ (0, 1, 1) (or, in general,
that u[I] ≠ yI ). This means that the triplet (u2 , u7 , u12 ) must differ from the triplet (0, 1, 1), though a
strict subset of its components may be equal. In the model that we defined above, the edges of the
graph play the role of the index sets I, and the output yI of the kernel Q plays the role of the negation
pattern (in particular the output alphabet is Y = {0, 1}^k ). The planted formula is hence defined by the pair
(g, y), where g is the graph and y ∈ Y^{|E(g)|} collects the negation patterns. In non-planted SAT, the
negation patterns yI typically depend neither on a specific x nor on I, and yI is drawn uniformly. For
planted SAT, however, yI depends on the values x[I] of the planted assignment, and this dependency is captured
by the kernel Q. For a uniform planted SAT model, which we consider in this paper, the following kernel
is used
                                    Qsat (yI | x[I]) = (1/(2^k − 1)) · 1(yI ≠ x[I]) .
In words, we draw the negation pattern uniformly at random among all patterns that keep the planted
assignment a satisfying one.
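As an illustration, the following minimal sketch (Python; sample_planted_ksat and the instance sizes are
hypothetical choices) samples a planted k-SAT formula through this kernel: each clause draws an index set I
and then a negation pattern uniform over the 2^k − 1 patterns different from x[I].

    import random

    def sample_planted_ksat(n, k, num_clauses, rng):
        """Draw a planted assignment x uniformly, then clauses (I, y_I) with
        y_I uniform over the 2^k - 1 patterns != x[I] (the kernel Q_sat)."""
        x = [rng.randint(0, 1) for _ in range(n)]
        clauses = []
        for _ in range(num_clauses):
            I = tuple(sorted(rng.sample(range(n), k)))
            while True:                                   # rejection step implements Q_sat
                y = tuple(rng.randint(0, 1) for _ in range(k))
                if y != tuple(x[i] for i in I):
                    break
            clauses.append((I, y))                        # the clause reads u[I] != y
        return x, clauses

    rng = random.Random(1)
    x, clauses = sample_planted_ksat(n=10, k=3, num_clauses=20, rng=rng)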
     We now define two probability distributions on the hypergraph g, which are equivalent for the
purposes of this paper:

      • A sparse Erdős-Rényi distribution, where each edge is drawn independently with probability
        p = αn/(n choose k), where α > 0 is the edge density.

      • A sparse Poisson distribution, where for each I ∈ Ek (V ), a number of edges mI is drawn inde-
        pendently from a Poisson distribution of parameter p = αn/(n choose k). Note that mI takes values in Z+ ,
        hence G is now a multi-edge hypergraph. To cope with this more general setting, we allow the
        edge-variable yI to take values in Y^{mI} , i. e., yI = (yI (1), . . . , yI (mI )), and define (with a slight abuse
        of notation)
                                                             mI
                                             Q(yI | x[I]) = ∏ Q(yI (i) | x[I]) .                                 (2.4)
                                                             i=1

        This means that for each I, if mI ≥ 1, then mI i. i. d. outputs are drawn from the kernel Q, and if
        mI = 0, no edge is drawn. We denote by Pk (α, n) this distribution on (multi-edge) hypergraphs.

    Since p = αn/(n choose k), the total number of edges concentrates around its expectation αn, and the two
models can be shown to be equivalent for our purposes in the limit as n → ∞.
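A corresponding sketch of the Poisson variant (Python; parameters again hypothetical): each k-set receives
an independent Poisson number of parallel edges, and to each parallel edge an i. i. d. output of the kernel Q
would be attached.

    import itertools, math, random

    def poisson_sample(lam, rng):
        """Knuth's method; adequate for the tiny rates used here."""
        L, m, p = math.exp(-lam), 0, 1.0
        while True:
            p *= rng.random()
            if p <= L:
                return m
            m += 1

    def sample_Pk(n, k, alpha, rng):
        """Sample from P_k(alpha, n): m_I ~ Poisson(alpha*n / C(n,k)) for each k-set I."""
        rate = alpha * n / math.comb(n, k)
        multiplicities = {}
        for I in itertools.combinations(range(n), k):
            m = poisson_sample(rate, rng)
            if m:
                multiplicities[I] = m     # m_I i.i.d. kernel outputs would be attached to I
        return multiplicities

    rng = random.Random(2)
    G = sample_Pk(n=12, k=3, alpha=1.5, rng=rng)
    # the total number of edges concentrates around alpha*n = 18 as n grows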


3     Main Results
3.1     Preliminaries
We start by reviewing some basic definitions of information measures used in this paper. Recall that
for a random variable Z having distribution PZ supported on a finite set Z, the entropy (in bits) of Z or


equivalently of PZ is defined by

                                   H(Z) = H(PZ ) = − ∑ PZ (z) log2 PZ (z) .
                                                        z∈Z

In particular, 0 ≤ H(Z) ≤ log2 |Z|, where the first inequality is tight only for Z deterministic and the second
inequality is tight only for Z uniform on Z. If W is a random variable having distribution PW supported on
a finite set W, and (W, Z) has a joint probability distribution PW,Z on W × Z, then the conditional entropy
of Z given that W = w ∈ W is defined by

                 H(Z | W = w) = H(PZ|W =w ) = − ∑ PZ|W =w (z | w) log2 PZ|W =w (z | w) ,
                                                       z∈Z

where
                                     PZ|W =w (z | w) = PW,Z (w, z)/PW (w)
is the conditional distribution of Z given that W = w. In particular, 0 ≤ H(Z | W = w) ≤ log(|Z|), where
the first inequality is tight only if Z is deterministic given that W = w and the second inequality is tight
only if Z is uniform on Z given that W = w. The conditional entropy of Z given W is then given by

                                    H(Z | W ) = ∑ PW (w)H(Z | W = w)
                                                 w∈W

and this is a scalar (not a random variable, in contrast to the conditional expectation E(Z | W )). In
particular, 0 ≤ H(Z | W ) ≤ log(|Z|), where the first inequality is tight only if Z is a deterministic function of
W and the second inequality is tight only if Z is independent of W and uniformly distributed on Z. Note
that if W1 ,W2 are two random variables jointly distributed with Z under PW1 ,W2 ,Z on W1 × W2 × Z, then

                 H(Z | W1 ,W2 = w2 ) =      ∑      H(Z | W1 = w1 ,W2 = w2 ) PW1 |W2 =w2 (w1 | w2 ) .
                                          w1 ∈W1

Finally, the mutual information between W and Z is given by

                           I(W ; Z) = H(W ) − H(W | Z) = H(Z) − H(Z | W ) .

We refer to [22] for more details on these quantities.
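For concreteness, these quantities can be computed directly from a joint distribution; a minimal sketch
(Python, on a hypothetical toy joint distribution of a pair (W, Z)):

    import math
    from collections import defaultdict

    def entropy(p):
        """H(P) = -sum_z P(z) log2 P(z) for a distribution given as a dict."""
        return -sum(q * math.log2(q) for q in p.values() if q > 0)

    def conditional_entropy(joint):
        """H(Z | W) = sum_{w,z} P(w,z) log2(P(w)/P(w,z)), joint given as {(w,z): prob}."""
        pw = defaultdict(float)
        for (w, _), q in joint.items():
            pw[w] += q
        return sum(q * math.log2(pw[w] / q) for (w, _), q in joint.items() if q > 0)

    joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
    pz = defaultdict(float)
    for (_, z), q in joint.items():
        pz[z] += q
    mutual_information = entropy(pz) - conditional_entropy(joint)   # I(W;Z) = H(Z) - H(Z|W)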
    In the sequel, we let X be uniformly distributed in X^n and G be a random sparse hypergraph drawn from
the Pk (α, n) ensemble independently of X. For a realization G = g, we let Yg be the output of X through
the graphical channel Pg,Q defined in (2.1) for a kernel Q. We also define

        Hg (X | Yg ) ≡ H(X | Yg , G = g) = −|X|^{−n} ∑_{x∈X^V} ∑_{y∈Y^{E(g)}} Pg,Q (y | x) log Rg,Q (x | y) ,   (3.1)

where Pg,Q and Rg,Q are defined in (2.1) and (2.2) respectively. Note that Hg (X | Yg ) is precisely the
conditional entropy of X given Yg and G = g as defined in the previous paragraph. We use the notation
Hg (X | Yg ) with the subscript g rather than the notation H(X | Yg , G = g) as it allows us to let G be
a random graph and consider HG (X | YG ) as a random variable (which is a function of G). Instead,


H(X | YG , G) denotes the expectation of H(X | Yg , G = g) over g as traditionally used in information
theory (see previous paragraph), i. e.,

                                   H(X | YG , G) = E_G HG (X | YG ) .                                   (3.2)

We will often drop the subscript g (or G) in Yg (or YG ) as the Y variable is always implicitly defined from
the underlying graph.
    Finally, we denote the mutual information between the node and edge variables by

                                     Ig (X;Yg ) = n log |X| − Hg (X | Yg )

and its expected value by
                                     I(X;Y, G) = n log |X| − H(X | Y, G) .

Example: conditional entropy for planted SAT. We now explain the meaning of the conditional
entropy for the planted-SAT model. Recall the definition of planted k-SAT in terms of graphical channels
discussed in the previous section. The alphabets are X = {0, 1}, Y = {0, 1}^k , the input x ∈ X^n represents
the planted assignment, and for each edge (or index set) I ∈ [n]^k , the kernel Qsat takes the variables x[I]
and outputs the negation pattern yI uniformly at random in {0, 1}^k \ {x[I]}, so that the I-clause is given
by u[I] ≠ yI (where u are the formula’s variables) and the formula is specified by (g, y). Assuming that
x is drawn uniformly at random in X^n , we can write the reverse distribution of the planted assignment
x ∈ X^n given a negation pattern y ∈ Y^{|E(g)|} on the graph g, which is simply the uniform distribution on all
elements of X^n (the potential planted assignments) that satisfy the planted formula (g, y). This is formally
shown by taking the definition of the planted-SAT kernel,
                                    Qsat (yI | x[I]) = (1/(2^k − 1)) · 1(yI ≠ x[I])                     (3.3)
and plugging it into the definition of Rg,Q (x | y) (see (2.2)):

         Rg,Qsat (x | y) = ( ∏_{I∈E(g)} 1(x[I] ≠ yI ) ) / ( ∑_u ∏_{I∈E(g)} 1(u[I] ≠ yI ) )              (3.4)

                         = 0            if x is not a satisfying assignment of the formula (g, y),
                         = 1/Zg (y)     otherwise,                                                      (3.5)

where
                                  Zg (y) = ∑_u ∏_{I∈E(g)} 1(u[I] ≠ yI )

is the number of satisfying assignments of the formula (g, y). Hence, for a fixed formula, the conditional
entropy H(X | Yg = y, G = g) is simply the logarithm of the number of satisfying assignments. Moreover,
for a random graph G (e. g., under the models of the previous section), H(X | YG , G) is the expected logarithm
of the number of satisfying assignments of a random planted formula. Since H(X | Yg = y, G = g) ∈ [0, n],
the normalized conditional entropy belongs to [0, 1] and captures the exponent of the exponential growth of
the number of solutions.
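A brute-force illustration of this correspondence on a toy instance (Python; the three clauses below are
hypothetical and chosen so that the all-zero assignment is a valid planted x):

    import itertools, math

    def count_solutions(n, clauses):
        """Z_g(y): the number of u in {0,1}^n with u[I] != y_I for every clause."""
        return sum(
            all(tuple(u[i] for i in I) != y for I, y in clauses)
            for u in itertools.product((0, 1), repeat=n)
        )

    # clauses are pairs (index set I, negation pattern y_I); x = (0,0,0,0,0) satisfies all
    clauses = [((0, 1, 2), (0, 1, 1)), ((1, 3, 4), (1, 0, 0)), ((0, 2, 4), (1, 1, 1))]
    Z = count_solutions(5, clauses)
    cond_entropy_bits = math.log2(Z)    # H(X | Y_g = y, G = g) in bits; Z >= 1 by planting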


    Note that in this paper, we are interested in models that go beyond CSPs, such as for coding or
clustering problems, which we cast as soft CSPs induced by a kernel Q that is typically supported on all
possible inputs. In such models, the reverse channel is typically not uniform on a subset of “satisfying”
inputs as for planted SAT, and all “planted assignments” have non-zero probability. Hence, the number
of satisfying assignments is not relevant here (it is the entire space); instead, we are interested in how
likely the different assignments are given the channel output. In particular, the conditional entropy captures
how much information the model output contains about the input: if H(X | YG , G)/n is close to 0, then
the input (i. e., the planted assignment) is determined by the output up to an o(n) entropy, whereas if
H(X | YG , G)/n tends to 1, then the output contains no significant trace (i. e., it only removes an o(n)
entropy) about the input. For the case of clustering, this captures how much information the graph reveals
about the clusters. In particular, the limit of the normalized conditional entropy is strictly less than 1 if
and only if the clusters can be detected (i. e., recoverable with better asymptotic accuracy than what a
random guess provides [46, 51]). Similarly, for coding problems, the conditional entropy leads directly to
the mutual information, which captures how much information the underlying channel carries [22].
    We next introduce an operator which plays a crucial role in proving that H(X | Y, G)/n admits a limit.

Definition 3.1. We denote by M1 (X^l ) the set of probability measures on X^l . For a kernel Q from X^k to Y,
we define

        Γl : M1 (X^l ) → R ,                                                                            (3.6)

        ν ↦ Γl (ν) = (1/|Y|) ∑_{u^(1),...,u^(l) ∈ X^k} ∑_{y∈Y} [ ∏_{r=1}^{l} (1 − Q(y | u^(r))) ] ∏_{i=1}^{k} ν(u_i^(1) , . . . , u_i^(l) ) .   (3.7)

Hypothesis H. A kernel Q is said to satisfy Hypothesis H if Γl is convex for any l ≥ 1.

    Despite the lengthy expression, it is important to note that the definition of Γl depends solely on the
kernel Q. The operator can be interpreted as follows. Let U = (U1 , . . . ,Uk ) be i. i. d. random vectors
taking values in X^l under the distribution νl . Each vector Ui , i ∈ [k], is thus distributed under νl and has
l components denoted by Ui^(1) , . . . , Ui^(l) . Denote U^(r) = (U1^(r) , . . . , Uk^(r) ) for r ∈ [l]. For Y uniformly
drawn in Y, the operator Γl is equivalently defined by the map that sends νl to

                                 E_{Y,U} ∏_{r=1}^{l} ( 1 − Q(Y | U^(r)) ) .
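For small alphabets the operator can be evaluated directly. The sketch below (Python, with X = {0, 1}; the
two test measures are arbitrary hypothetical choices) computes Γl (ν) from the expectation form above and
spot-checks midpoint convexity for the planted-SAT kernel. This is a numerical sanity check at a single pair
of measures, not a proof that Hypothesis H holds.

    import itertools

    def gamma_l(nu, Q, Y, k, l):
        """Gamma_l(nu) = E_{Y,U} prod_{r=1}^l (1 - Q(Y | U^(r))), with U_1,...,U_k
        i.i.d. ~ nu on X^l and Y uniform on the output alphabet Y."""
        X = (0, 1)
        total = 0.0
        for cols in itertools.product(itertools.product(X, repeat=l), repeat=k):
            w = 1.0                          # cols[i] = U_i = (U_i^(1),...,U_i^(l))
            for col in cols:
                w *= nu[col]
            for y in Y:
                prod = 1.0
                for r in range(l):
                    u_r = tuple(cols[i][r] for i in range(k))    # U^(r)
                    prod *= 1.0 - Q(y, u_r)
                total += w * prod
        return total / len(Y)

    def Q_sat(y, u):
        """Planted k-SAT kernel: y uniform over the 2^k - 1 patterns != u."""
        return 0.0 if y == u else 1.0 / (2 ** len(u) - 1)

    k, l = 3, 2
    Y = list(itertools.product((0, 1), repeat=k))
    pts = list(itertools.product((0, 1), repeat=l))
    nu0 = {p: 1.0 / len(pts) for p in pts}
    nu1 = {p: (0.7 if p == (0,) * l else 0.3 / (len(pts) - 1)) for p in pts}
    mid = {p: 0.5 * (nu0[p] + nu1[p]) for p in pts}
    assert gamma_l(mid, Q_sat, Y, k, l) <= (
        0.5 * (gamma_l(nu0, Q_sat, Y, k, l) + gamma_l(nu1, Q_sat, Y, k, l)) + 1e-12)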

    The role of Γl appears in the proof of Theorem 3.2, to establish that f (n) = H(X | Y, G) is a sub-
additive function, i. e., that f (n) ≤ f (n1 ) + f (n2 ), where n1 + n2 = n. We provide here a high-level
insight. Consider for example the case of Q corresponding to a planted CSP, say k-SAT. In order
to show that f (n) ≤ f (n1 ) + f (n2 ), one would like to compare the number of solutions Z(α, n) of a
random k-CNF formula with density α on n variables, with the number of solutions of two independent
random k-CNF formulae with density α on n1 and n2 variables respectively (where n1 + n2 = n). More
precisely, the problem is to determine whether a general inequality holds between Z(α, n) and the product
Z(α, n1 ) · Z(α, n2 ), i. e., between f (n) and f (n1 ) + f (n2 ). It is delicate to compare these quantities. For
example, the inequality does not go in the same direction for k-SAT and planted k-SAT as demonstrated


in this paper. At a very high level, the proof manages to express the sub-additivity, which can be
thought of as a form of concavity of f with respect to n = n1 + n2 , in terms of a formal convexity
property of the operator Γl . This operator acts on the empirical distribution of l solutions drawn uniformly
at random from a random k-CNF formula and corresponding to the variables U^(1) , . . . ,U^(l) previously
defined. By definition, the empirical distribution counts the occurrence of the {0, 1}-patterns that appear
in the l solutions, and this counting can be done separately in the two subsystems of n1 and n2 variables,
and be added to obtain the counting in the global system of n variables. In other words, denoting by nP
the count in the full system and by n1 P1 and n2 P2 the counts in the subsystems, the empirical distribution
is additive: nP = n1 P1 + n2 P2 . The proof expresses the gap f (n) − f (n1 ) − f (n2 ) in terms of the gaps
nΓl (P) − n1 Γl (P1 ) − n2 Γl (P2 ), for l ≥ 1, hence allowing us to conclude with a formal convexity argument
when Hypothesis H holds.
     We will see in Section 6 that a large variety of kernels satisfy this hypothesis, including kernels
corresponding to parity-check encoded channels, planted SAT, NAE-SAT, XORSAT (for even clause
size), and disassortative stochastic block models. There are also natural examples of kernels for which
Hypothesis H is not satisfied, for example 3-XORSAT. We do not expect that Hypothesis H is necessary
to obtain convergence of the normalized conditional entropy. It appears to be a technical requirement
dictated by our proof technique.

3.2     Results
We first show the sub-additivity property for the expected conditional entropy of graphical channels.
Theorem 3.2. Let Q be a kernel satisfying Hypothesis H. Let n1 , n2 ≥ k and n = n1 + n2 . We denote by
Gn a hypergraph drawn from the ensemble Pk (α, n) and Gni two hypergraphs drawn independently from
the ensembles Pk (α, ni ), respectively for i = 1, 2. Define f (n) = H(X | Y, Gn ), and f (ni ) = H(X | Y, Gni ),
i = 1, 2. Then

                                            f (n) ≤ f (n1 ) + f (n2 ) .                                    (3.8)

   The proof of this theorem is outlined in Section 4. Since f is sub-additive, it follows from Fekete's
lemma (see for example [54]) that f (n)/n converges (to a finite value, since f is lower-bounded by 0).
Corollary 3.3. Let Q be a kernel satisfying Hypothesis H and Gn be a random hypergraph drawn from
the ensemble Pk (α, n). There exists a constant Ck (α, Q) such that
                              lim_{n→∞} (1/n) H(X | Y, Gn ) = Ck (α, Q) .                               (3.9)
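For completeness, the Fekete step can be spelled out as follows (a LaTeX sketch of the standard argument,
in the notation above):

    % Fekete's lemma applied to f(n) = H(X | Y, G_n), which satisfies
    % f(n1 + n2) <= f(n1) + f(n2) for n1, n2 >= k by Theorem 3.2:
    \[
       \lim_{n\to\infty} \frac{f(n)}{n} \;=\; \inf_{n \ge k} \frac{f(n)}{n} \;=:\; C_k(\alpha, Q),
    \]
    % the infimum is finite and nonnegative since 0 <= f(n) <= n log |X|.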

       The following is obtained using the previous corollary and the Azuma-Hoeffding inequality.
Theorem 3.4. Let Q be a kernel satisfying Hypothesis H and Gn be a random hypergraph drawn from
the ensemble Pk (α, n). Then, almost surely,
                              lim_{n→∞} (1/n) HGn (X | Y ) = Ck (α, Q) ,                               (3.10)

with Ck (α, Q) as in Corollary 3.3.


    Note that the almost sure convergence is with respect to the graph Gn . Hence, the result says that
with high probability on the drawing of Gn , the conditional entropy HGn (X | Y ) is close to a deterministic
number Ck (α, Q). The proof is given in Section 5.
    In Section 6, we consider three applications of the preceding results:

    • In Section 6.1 we consider planted CSPs such as planted k-SAT, k-NAE-SAT and k-XORSAT and
      show that the normalized logarithm of the number of solutions concentrates with respect to the
      drawing of the random graph to an n-independent limit.

    • In Section 6.2, we consider the stochastic block model with two communities, or equivalently,
      the planted partition model. We show that for the disassortative case, the normalized conditional
      entropy of the node variables (the community assignments) given the graph converges to a fixed
      value. In other words, the uncertainty on the community assignments after observing the graph is
       asymptotically determined by a fixed value depending on the connectivity parameters.

    • In Section 6.3, we consider LDGM binary codes over symmetric channels and prove the conjecture
      of [42] on the concentration of the mutual information for k even.



4    Proof of Theorem 3.2: Interpolation method for graphical channels

Notation. Recall the components of the model: X is a uniform random vector of dimension n with
entries valued in X; G is a random graph on the vertex set [n] with edges of order k; and Y is the vector of
edge variables (a random vector of dimension |E(G)| with components valued in the finite set Y
and indexed by the edges of G). Note that X and G are drawn independently, while Y depends on both G
and X, i. e., Y is the output of X on the graphical channel PG,Q . While X, G and Y all depend on n, we
emphasize the dependency on n only through G in what follows, to simplify the notation. We also keep
writing Y instead of YG even though Y changes if G changes. When carrying expansions where G does
not change, we may write H(X | Y ) instead of H(X | Y, G).
    For a random variable Z with distribution PZ , we use the notation

                                          E f (Z) = ∑ f (z)PZ (z) ,                                    (4.1)
                                          Z           z


and we may simply write E f (Z) when there is no ambiguity about the expectation. For two random
variables W, Z with conditional distribution P{W = w | Z = z} = PW |Z (w | z), we use the notation

                                    E g(W, Z) = ∑ g(w, Z)PW |Z (w | Z) ,                               (4.2)
                                   W |Z           w


note that E_{W|Z} g(W, Z) is a function of Z, hence a random variable.


    Our goal is to show the sub-additivity of H(X | Y, Gn ) as a function5 of n, namely

                                    H(X | Y, Gn ) ≤ H(X | Y, Gn1 ) + H(X | Y, Gn2 ) ,                                       (4.3)

where n1 + n2 = n. To understand the meaning of this inequality, consider a partition of the set of vertices
[n] into two disjoint sets of sizes n1 and n2 with n1 + n2 = n. Let g be a graph on the vertex set [n], and
denote by g1 and g2 the restrictions of g to each of these subsets. Then the following is obtained simply
from the fact that conditioning reduces entropy:

                                      H(X | Y, g) ≤ H(X | Y, g1 ) + H(X | Y, g2 ) .                                         (4.4)

Hence, the above is also true for a random graph G drawn from the ensemble Pk (α, n). However, the
random graph obtained by restricting Gn to a subset of ni vertices is not equivalent to Gni , which is drawn
from the ensemble Pk (α, ni ), i = 1, 2. The issue is that when taking the restriction, the probability of
an edge stays at αn/(n choose k), whereas it should be6 αni /(ni choose k) under the ensemble Pk (α, ni ). Consequently, the
above does not imply
                             H(X | Y, Gn ) ≤ H(X | Y, Gn1 ) + H(X | Y, Gn2 ) .

To obtain the proper term on the right hand side, one should add the edges lost in the splitting of the
vertices (e. g., using a coupling argument), but this gives a lower bound on the right hand side of (4.4)
instead of an upper bound. This also shows that it may not be obvious to guess the direction of the
inequality (i. e., sub- versus super-additivity). We rely on an interpolation method to compare the right
quantities.
    The interpolation method was first introduced in [36] for the Sherrington-Kirkpatrick model. This is a
model for a spin-glass (i. e., a spin model with random couplings) on a complete graph. It was subsequently
shown in [30, 31, 53] that the same ideas can be generalized to models on random sparse graphs, and
applications in coding theory and random combinatorial optimization were proposed in [47, 41] and [13, 4].
We next develop an interpolation method to estimate the conditional entropy of general graphical channels.
Interestingly, we will see that the planting flips the behaviour of the entropy from super- to sub-additive.

Definition 4.1. We define a more general Poisson model for the random graph, where a parameter εI ≥ 0
is attached to each I ∈ Ek (V ) (where Ek (V ) denotes the set of all (n choose k) edges of order k on V = [n]), and
the number of edges mI (εI ) is drawn from a Poisson distribution of parameter εI . This defines a random
multi-edge hypergraph whose edge probability is not homogeneous but depends on the parameters εI .
Denoting by ε the collection of all (n choose k) parameters εI , we denote this ensemble by Pk (ε, n). If for all I,
εI = αn/(n choose k), then Pk (ε, n) reduces to Pk (α, n) as previously defined.

   5 The reader will notice that this corresponds to super-additivity of the mutual information

                                            I(X;Y, Gn ) ≥ I(X;Y, Gn1 ) + I(X;Y, Gn2 ) .

This corresponds to the intuition that more information is conveyed using a graph of size (n1 + n2 ) than two graphs of sizes n1
and n2 .
   6 When working with the Poisson model, the probabilities αni /(ni choose k) should be multiplied by exp(−αni /(ni choose k)).




Lemma 4.2. Let X be uniformly drawn over X^n , G(ε) be a random hypergraph drawn from the ensemble
Pk (ε, n) independently of X, and let Y be the output of X through PG(ε),Q defined in (2.1) for the kernel Q.
Define also
                               H(Q) ≡ −2^{−k}     ∑       Q(z | u) log Q(z | u) .
                                              u∈X^k ,z∈Y

Then
                         (∂/∂εI ) H(X | Y, G(ε)) = H(Q) − H(YI′ | Y, G(ε)) ,                           (4.5)

where YI′ and Y are independent conditionally on X (i. e., YI′ is drawn under Q(· | X[I]) and Y is drawn
independently under PG(ε),Q (· | X)).

Proof. Note that for a random variable Zε which is Poisson distributed with parameter ε, and a function f ,

                            (∂/∂ε) E f (Zε ) = E f (Zε + 1) − E f (Zε ) .                              (4.6)
Since

              H(X | Y, G(ε)) =         ∑           H(X | Y, G(ε) = g) P{G(ε) = g}                       (4.7)
                                g∈Z+^{(n choose k)}

                              =        ∑           H(X | Y, G(ε) = g)      ∏      P{ZεI = gI } ,        (4.8)
                                g∈Z+^{(n choose k)}                     I∈Ek (V )

where ZεI ∼ Poisson(εI ), hence using (4.6),

                     (∂/∂εI ) H(X | Y, G(ε)) = H(X | Y, G(ε),YI′ ) − H(X | Y, G(ε)) ,                   (4.9)

where YI′ is an extra output drawn independently of Y but conditionally on the same X. Recall also
that for any two random variables A and B, H(A) − H(A | B) = H(B) − H(B | A). We then have

                  (∂/∂εI ) H(X | Y, G(ε)) = H(YI′ | X,Y, G(ε)) − H(YI′ | Y, G(ε))                      (4.10)
                                          = H(YI′ | X[I],Y, G(ε)) − H(YI′ | Y, G(ε)) ,                 (4.11)

where we used the fact that YI′ depends only on the components of X indexed by I. Finally, recall the
dependency among these variables: X and G(ε) are drawn independently, Y is drawn from PG,Q (· | X),
and YI′ is drawn independently from Q(· | X[I]). Therefore, YI′ is independent of (Y, G(ε)) conditionally
on X[I] and

                              H(YI′ | X[I],Y, G(ε)) = H(YI′ | X[I]) = H(Q) ,                           (4.12)

where H(Q) = −2^{−k} ∑_{u∈X^k ,z∈Y} Q(z | u) log Q(z | u).
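The Poisson identity (4.6) driving this computation is easy to verify numerically; a minimal sketch (Python,
with a hypothetical bounded test function f):

    import math

    def E_f_poisson(lam, f, tol=1e-12):
        """E f(Z) for Z ~ Poisson(lam), by truncating the series sum_m f(m) e^{-lam} lam^m / m!."""
        total, m, term = 0.0, 0, math.exp(-lam)
        while term > tol or m < 10 * (1 + lam):
            total += f(m) * term
            m += 1
            term *= lam / m
        return total

    f = lambda m: 1.0 / (1 + m)              # any bounded f will do
    lam, h = 0.5, 1e-6
    lhs = (E_f_poisson(lam + h, f) - E_f_poisson(lam - h, f)) / (2 * h)
    rhs = E_f_poisson(lam, lambda m: f(m + 1)) - E_f_poisson(lam, f)
    assert abs(lhs - rhs) < 1e-6             # (d/d eps) E f(Z_eps) = E f(Z_eps + 1) - E f(Z_eps)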


    We define a path as a differentiable map t ↦ ε(t), with t ∈ [0, T ] for some T ≥ 0. We say that a path
is balanced if
                                         ∑        (dεI /dt)(t) = 0 .                                   (4.13)
                                      I∈Ek (V )


We define G(t) = G(ε(t)). Since H(Q) in Lemma 4.2 is constant, we have the following.

Corollary 4.3. For a balanced path

                     (d/dt) H(X | Y, G(t)) = −      ∑       H(YI′ | Y, G(t)) (dεI /dt)(t) .            (4.14)
                                                 I∈Ek (V )



    Given a partition V = V1 ⊔ V2 , we define the associated canonical path ε : t ∈ [0, 1] ↦ ε(t) ∈ [0, 1]^{Ek (V )}
as follows. Let ni = |Vi |, mi = |Ek (Vi )|, i ∈ {1, 2}, and m = |Ek (V )|. We define
                                  εI (0) ≡ αn/m ,           ∀I ∈ Ek (V ),                              (4.15)

                                           ⎧ αn1 /m1        if I ∈ Ek (V1 ),
                                  εI (1) ≡ ⎨ αn2 /m2        if I ∈ Ek (V2 ),                           (4.16)
                                           ⎩ 0              otherwise,
and

                                          ε(t) = (1 − t)ε(0) + tε(1) .                                    (4.17)

   Note that the canonical path is balanced. Moreover, at time t = 0, Pk (ε(0), n) reduces to the original
ensemble Pk (α, n), and at time t = 1, Pk (ε(1), n) reduces to two independent copies of the original
ensemble on the subsets of n1 and n2 variables: Pk (α, n1 ) × Pk (α, n2 ).
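A small sketch of this construction (Python, with hypothetical parameters) checks that the canonical path is
balanced, i. e., that the total rate ∑I εI (t) stays equal to αn for every t:

    import itertools, math

    def canonical_path(n, n1, k, alpha, t):
        """eps(t) = (1 - t) eps(0) + t eps(1) for the split V1 = {0,...,n1-1}, V2 the rest."""
        V1 = set(range(n1))
        m = math.comb(n, k)
        m1, m2 = math.comb(n1, k), math.comb(n - n1, k)
        eps = {}
        for I in itertools.combinations(range(n), k):
            e0 = alpha * n / m
            if set(I) <= V1:
                e1 = alpha * n1 / m1
            elif set(I).isdisjoint(V1):
                e1 = alpha * (n - n1) / m2
            else:
                e1 = 0.0                     # edges across the cut vanish at t = 1
            eps[I] = (1 - t) * e0 + t * e1
        return eps

    n, n1, k, alpha = 8, 5, 3, 1.2
    for t in (0.0, 0.3, 1.0):
        assert abs(sum(canonical_path(n, n1, k, alpha, t).values()) - alpha * n) < 1e-9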
   Applying Corollary 4.3, and using the chain rule for the derivative, we obtain the following.

Corollary 4.4. For the canonical path

   (d/dt) H(X | Y, G(t)) = αn E_I H(YI′ | Y, G(t)) − αn1 E_{I1} H(YI′1 | Y, G(t)) − αn2 E_{I2} H(YI′2 | Y, G(t)) ,   (4.18)

where I is uniformly drawn in Ek (V ), and Ii are drawn uniformly in Ek (Vi ), i = 1, 2, i. e.,

                  E_I H(YI′ | Y, G(t)) =      ∑       H(YI′ | Y, G(t)) · (1/|Ek (V )|) ,               (4.19)
                                           I∈Ek (V )

                  E_{Ii} H(YI′i | Y, G(t)) =     ∑        H(YI′i | Y, G(t)) · (1/|Ek (Vi )|) ,   i = 1, 2.   (4.20)
                                             Ii ∈Ek (Vi )




    Recall that

            H(YI′ | Y, G(t)) = − E_{Y,G(t),YI′} log ∑_x Q(YI′ | x[I]) RG(t),Q (x | Y )                 (4.21)

                             = − E_{Y,G(t),YI′} log E_{X|Y,G(t)} Q(YI′ | X[I]) ,                       (4.22)

where Y is the output of PG(t),Q .
Lemma 4.5.
   (1/(α|Y|)) (d/dt) H(X | Y, G(t)) = − ∑_{l=2}^{∞} (1/(l(l − 1))) E_{X^(1),...,X^(l)} [ nΓ̃l (V ) − n1 Γ̃l (V1 ) − n2 Γ̃l (V2 ) ] ,   (4.23)


where
                          Γ̃l (V ) ≡ E_{I,WI} ∏_{r=1}^{l} ( 1 − Q(WI | X^(r)[I]) ) ,                    (4.24)

I is uniformly drawn in Ek (V ), WI is uniformly drawn in Y, and X^(1) , . . . , X^(l) are drawn under the
probability distribution

   P{X^(1) = x^(1) , . . . , X^(l) = x^(l) } = ∑_y [ ∏_{i=1}^{l} RG(t),Q (x^(i) | y) ] ∑_u PG(t),Q (y | u) 2^{−n} .   (4.25)

    This means that X^(1) , . . . , X^(l) are drawn i. i. d. from the channel RG(t),Q given a hidden output Y ;
these are the “replica” variables, which are exchangeable but not (unconditionally) i. i. d.

Proof of Lemma 4.5. By definition

                    H(YI′ | Y, G(t)) = − E_{Y,G(t),YI′} log E_{X|Y,G(t)} Q(YI′ | X[I]) ,                (4.26)

and expanding the logarithm in its power series (the argument in the log is between 0 and 1),
          log E_{X|Y,G(t)} Q(YI′ | X[I]) = − ∑_{l=1}^{∞} (1/l) ( E_{X|Y,G(t)} [ 1 − Q(YI′ | X[I]) ] )^l .   (4.27)

We now introduce the “replicas” X^(1) , . . . , X^(l) distributed as follows. Recall that X is drawn independently
of G(t) and Y is drawn from PG(t),Q (· | X). The replicas are drawn independently from the reverse channel
RG(t),Q (· | Y ) defined in (2.2). In other words, we have the Markov chain

                                   X −P→ Y −R→ (X^(1) , . . . , X^(l) ) .                              (4.28)

Defining the kernel Q̃ = 1 − Q, we can rewrite (4.27) as

     log E_{X|Y,G(t)} Q(YI′ | X[I]) = − ∑_{l=1}^{∞} (1/l) E_{X^(1),...,X^(l)|Y,G(t)} ∏_{r=1}^{l} Q̃(YI′ | X^(r)[I]) .   (4.29)



Collecting terms we have

  H(YI′ | Y, G(t)) = E_X E_{Y,G(t)|X} E_{YI′|X} ∑_{l=1}^{∞} (1/l) E_{X^(1),...,X^(l)|Y,G(t)} ∏_{r=1}^{l} Q̃(YI′ | X^(r)[I])   (4.30)

                   = ∑_{l=1}^{∞} (1/l) E_{X,X^(1),...,X^(l)} E_{YI′|X} ∏_{r=1}^{l} Q̃(YI′ | X^(r)[I]) .                       (4.31)

We now use a manipulation that is specific to the planted framework. In the non-planted framework,
super-additivity is achieved by showing that each term weighted by 1/l is convex in the empirical
distribution of the replicas. In the planted framework, this is no longer the case in the above expression.
We rely on an additional step which will flip the behaviour from super- to sub-additivity. We switch
measure in the expectation E_{YI′|X} , defining WI to be uniformly distributed over Y, and write

  H(YI′ | Y, G(t)) = |Y| ∑_{l=1}^{∞} (1/l) E_{X,X^(1),...,X^(l)} E_{WI} [ ∏_{r=1}^{l} Q̃(WI | X^(r)[I]) ] Q(WI | X[I]) .   (4.32)

Renaming X as X^(0) , and using the fact that X^(0) , X^(1) , . . . , X^(l) are exchangeable (they are i. i. d. condi-
tioned on Y ), we can write

  H(YI′ | Y, G(t)) = |Y| ∑_{l=1}^{∞} (1/l) E_{X^(0),X^(1),...,X^(l)} ( E_{WI} ∏_{r=1}^{l} Q̃(WI | X^(r)[I]) − E_{WI} ∏_{r=0}^{l} Q̃(WI | X^(r)[I]) )   (4.33)

                   = |Y| E_{X^(1)} E_{WI} Q̃(WI | X^(1)[I]) − |Y| ∑_{l=2}^{∞} (1/(l(l − 1))) E_{X^(1),...,X^(l)} E_{WI} ∏_{r=1}^{l} Q̃(WI | X^(r)[I]) .   (4.34)

Moreover, E_{X^(1)} E_{WI} Q̃(WI | X^(1)[I]) does not depend on n. Recalling from Corollary 4.4 that

   (d/dt) H(X | Y, G(t)) = αn E_I H(YI′ | Y, G(t)) − αn1 E_{I1} H(YI′1 | Y, G(t)) − αn2 E_{I2} H(YI′2 | Y, G(t)) ,

we can express each term in the above using (4.34). The first term in (4.34) is the same for the global
system and for the subsystems (since it does not depend on n, n1 or n2 ), hence it cancels out in the above
sum as n = n1 + n2 .
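The passage from (4.33) to (4.34) is a telescoping (Abel-summation) rearrangement; a quick numerical
check (Python) with a generic decaying stand-in for the bounded moments:

    # sum_{l>=1} (1/l)(a_l - a_{l+1}) = a_1 - sum_{l>=2} a_l / (l(l-1)),
    # valid for bounded sequences such as a_l = E prod_{r<=l} (1 - Q(W_I | X^(r)[I])).
    a = lambda l: 0.8 ** l                   # hypothetical stand-in for the moments
    L = 2000                                 # truncation; the tail is negligible here
    lhs = sum((a(l) - a(l + 1)) / l for l in range(1, L))
    rhs = a(1) - sum(a(l) / (l * (l - 1)) for l in range(2, L))
    assert abs(lhs - rhs) < 1e-12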

    Finally, the following corollary is a change of notation from the previous lemma.

Corollary 4.6.
   (1/(α|Y|)) (d/dt) H(X | Y, G(t)) = − ∑_{l=2}^{∞} (1/(l(l − 1))) E_{X^(1),...,X^(l)} [ nΓl (νl ) − n1 Γl (νl,1 ) − n2 Γl (νl,2 ) ] ,   (4.35)



where νl (respectively νl,i ) is the empirical distribution of {X^(1) [i], . . . , X^(l) [i]}i∈V over X^l (respectively
of {X^(1) [i], . . . , X^(l) [i]}i∈Vi over X^l , i = 1, 2), and Γl (·) is as defined in (3.7) (for which Hypothesis H
requires convexity).


Proof. Recall that V = [n] and V1 and V2 are the subsystems of cardinality n1 and n2 respectively.
              Γ̃l (V ) = E_{I,WI} ∏_{r=1}^{l} ( 1 − Q(WI | X^(r)[I]) )                                  (4.36)

                       = (1/|Y|) E_I [ ∑_{y∈Y} ∏_{r=1}^{l} (1 − Q(y | X^(r)[I])) ]                      (4.37)

                       = (1/|Y|) ∑_{u^(1),...,u^(l) ∈ X^k} [ ∑_{y∈Y} ∏_{r=1}^{l} (1 − Q(y | u^(r))) ] η(u^(1) , . . . , u^(l) ) ,   (4.38)


where the last equality holds by defining η(u^(1) , . . . , u^(l) ) as the empirical distribution of

                                   {X^(1) [I], . . . , X^(l) [I]}I∈[n]^k

over X^{kl} . Moreover, since the index set I is drawn uniformly at random in [n]^k ,

                        η(u^(1) , . . . , u^(l) ) = ∏_{i=1}^{k} νl (u_i^(1) , . . . , u_i^(l) ) ,       (4.39)

where νl is the empirical distribution of {X^(1) [i], . . . , X^(l) [i]}i∈[n] over X^l . Hence

     Γ̃l (V ) = (1/|Y|) ∑_{u^(1),...,u^(l) ∈ X^k} [ ∑_{y∈Y} ∏_{r=1}^{l} (1 − Q(y | u^(r))) ] ∏_{i=1}^{k} νl (u_i^(1) , . . . , u_i^(l) )   (4.40)

              = Γl (νl ) .

Proof of Theorem 3.2. Hypothesis H ensures that Γl is convex for any distribution on X^l , hence in
particular for the empirical distribution of the replicas. Moreover, since V = V1 ∪ V2 and V1 ∩ V2 = ∅, the
empirical distribution over V can be additively computed from the empirical distributions over V1 and V2 .
Formally,
Formally,

                                                        nνl = n1 νl,1 + n2 νl,2 ,                                (4.41)

for any l ≥ 1, and the convexity of Γl together with Corollary 4.6 imply the theorem.


5    Proof of Theorem 3.4
Proof. Since we know that H_G(X | Y)/n converges in expectation, it is sufficient to show that it concen-
trates around its expectation. Indeed we claim that there exists B > 0 such that

    P( |H_G(X | Y) − H(X | Y)| ≥ n∆ ) ≤ 2 e^{−nB∆²} ,                                                                 (5.1)

whence our thesis follows from the Borel-Cantelli lemma.


   The proof of equation (5.1) is a direct application of the Azuma-Hoeffding inequality [11]. We
condition on the number m ∈ [(α − δ)n, (α + δ)n] of hyperedges in G and prove that

    P( |H_G(X | Y) − H(X | Y)| ≥ n∆ | |E(G)| = m ) ≤ 2 e^{−nB∆²} .                                                    (5.2)

The claim follows from summing this inequality over m ∈ [(α − δ)n, (α + δ)n] and noting that |E(G)| ∉
[(α − δ)n, (α + δ)n] with exponentially small probability, by standard concentration of binomial random
variables.
    In order to prove the bound (5.2), we regard H_G(X | Y) as a function of the choice of the m
hyperedges: (e_1, e_2, ..., e_m) ↦ H_G(X | Y). Since the hyperedges are independent, it is sufficient to
prove that this function is Lipschitz continuous. Indeed, we claim that |H_G(X | Y) − H_{G′}(X | Y)| ≤ 2C, for
some constant C, if G and G′ differ in only one of their hyperedges.
    In order to prove the last claim, let G′ = G + a denote the graph G to which hyperedge a = (i_1, ..., i_k)
has been added. Then, writing explicitly the component of Y corresponding to hyperedge a as Y_a, we
need to prove that |H_{G+a}(X | Y, Y_a) − H_G(X | Y)| ≤ C. We have, dropping the subscripts and superscript
for the sake of simplicity,

    0 ≤ H(X | Y) − H(X | Y, Y_a) = H(X | Y) − H(X, Y_a | Y) + H(Y_a | Y)                                              (5.3)
                                 = H(Y_a | Y) − H(Y_a | X, Y)                                                         (5.4)
                                 ≤ log_2 |Y| ,                                                                        (5.5)

where the last inequality follows from the fact that Y_a takes values in the finite set Y.
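
The bounded-difference step above is easy to probe numerically on small instances. The following is a
minimal sketch (our illustration, not part of the paper's argument; all function names are ours) for
planted 3-XOR-SAT: it computes H_G(X | Y = y) = log_2 Z_G(y) by brute-force enumeration and checks that
resampling a single planted-consistent hyperedge changes the entropy by at most 2 log_2 |Y| = 2 bits,
matching the 2C bound used above.

    import itertools, math, random

    def count_solutions(n, clauses):
        # Z_G(y): number of x in {0,1}^n with XOR(x[I]) = y_I for every clause (I, y_I)
        return sum(1 for x in itertools.product([0, 1], repeat=n)
                   if all(sum(x[i] for i in I) % 2 == y for I, y in clauses))

    random.seed(0)
    n, m, k = 10, 12, 3
    planted = [random.randint(0, 1) for _ in range(n)]

    def rand_clause():
        I = tuple(random.sample(range(n), k))
        return (I, sum(planted[i] for i in I) % 2)    # y_I consistent with the planted x

    clauses = [rand_clause() for _ in range(m)]
    H = math.log2(count_solutions(n, clauses))
    for _ in range(5):
        swapped = list(clauses)
        swapped[random.randrange(m)] = rand_clause()  # resample one hyperedge
        H_swapped = math.log2(count_solutions(n, swapped))
        assert abs(H - H_swapped) <= 2.0 + 1e-9       # |ΔH| ≤ 2 log2|Y| = 2 bits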


6     Applications to specific models
We next present three applications of the general model and results described in the previous section.
While planted CSPs and parity-check codes are directly derived as particular cases of our model, the
stochastic block model is obtained through a limiting argument. Note that the general model described in the
previous section also allows one to generate new hybrid structures. For example, one may consider codes
which are not linear but which rely on OR gates as in SAT, community structures whose connectivity relies
on collections of k nodes, or network models which have censored edges. The latter case was recently
considered in [2, 1].

6.1   Planted constraint satisfaction problems
Definition 6.1. A CSP kernel is given by

    Q(y | u) = (1/|A(u)|) 1(y ∈ A(u)) ,    u ∈ X^k , y ∈ Y ,                                                          (6.1)

where A(u) is the subset of Y containing the values that y can be assigned to, with the property that
|A(u)| is constant (it may depend on k but not on u).


    We will next show that a graphical channel with a CSP kernel corresponds to a planted CSP. We
first derive a few known examples of CSPs.

    • For planted k-SAT, Y = {0,1}^k and A(u) = {0,1}^k \ {u}. Hence |A(u)| = 2^k − 1. With this kernel, the
      planted assignment x generates for every selected edge I ∈ E_k(V) an edge variable y_I drawn
      uniformly in {0,1}^k \ {x[I]}. See Section 2 for an interpretation of this model as the usual planted SAT model.

    • For planted k-NAE-SAT, Y = {0,1}^k and A(u) = {0,1}^k \ {u, ū}, where ū is the vector obtained by
      flipping each component of u. Hence |A(u)| = 2^k − 2.

    • For k-XOR-SAT, Y = {0,1} and A(u) = {⊕_{i=1}^k u_i}, hence |A(u)| = 1.

A graphical channel with graph g and kernel Q as in (6.1) leads to a planted CSP where the constraints
are given by y_I ∈ A(u[I]) for every I ∈ E(g). For example, for planted k-SAT the constraints are equivalent
to u[I] ≠ y_I, whereas for planted k-NAE-SAT the constraints are equivalent to u[I] ∉ {y_I, ȳ_I}. If y is
drawn from the output marginal distribution S_{g,Q} (cf. (2.3)), then there exists a satisfying assignment
by construction. If X is drawn uniformly at random, then the posterior distribution is also uniformly
distributed on its support, and we have the following lemma.
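
As an illustration of how these kernels generate planted instances, here is a small sketch (ours; the
function name and interface are hypothetical) that samples the edge variable y_I for the three kernels
above, given the restriction x[I] of a planted assignment. Rejection sampling is adequate here since
A(u) occupies all but one or two of the 2^k points.

    import random

    def sample_edge_variable(kernel, x_I):
        # x_I: tuple of k bits read off the planted assignment at hyperedge I
        k = len(x_I)
        if kernel == "ksat":          # y_I uniform in {0,1}^k \ {x[I]}
            while True:
                y = tuple(random.randint(0, 1) for _ in range(k))
                if y != x_I:
                    return y
        if kernel == "nae":           # y_I uniform in {0,1}^k \ {x[I], complement}
            x_bar = tuple(1 - b for b in x_I)
            while True:
                y = tuple(random.randint(0, 1) for _ in range(k))
                if y not in (x_I, x_bar):
                    return y
        if kernel == "xor":           # y_I deterministic: |A(u)| = 1
            return sum(x_I) % 2
        raise ValueError(kernel)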

Lemma 6.2. For a graphical channel with graph g and CSP kernel Q as in (6.1), and for y in the support
of Sg,Q ,

                                          Hg (X | Y = y) = log Zg (y)                                      (6.2)

where Zg (y) is the number of satisfying assignments of the planted CSP with graph g and constraints
specified by y (where the structure of constraints is specified by Q).

Proof of Lemma 6.2. We use the notation x ∼ y to say that x is a satisfying assignment for the clauses
specified by y and the kernel Q. We have

    P_{g,Q}(y | x) = ∏_{I∈E(g)} Q(y_I | x[I])                                                                         (6.3)

                   = ∏_{I∈E(g)} (1/|A|) 1(y_I ∈ A(x[I]))                                                              (6.4)

                   = { |A|^{−|E(g)|}   if x ∼ y,
                     { 0               otherwise.                                                                     (6.5)

Hence for a given x, P_{g,Q}(· | x) is uniform on the set of all y's verifying x, which has cardinality |A|^{|E(g)|}.
Since X is uniform, for a given y, R_{g,Q}(· | y) is a uniform measure on a set of cardinality

    ∑_{x∈X^n} ∏_{I∈E(g)} 1(y_I ∈ A(x[I])) = |{x ∈ X^n : y_I ∈ A(x[I]), ∀ I ∈ E(g)}| = Z_g(y) .                        (6.6)

Therefore H_g(X | Y = y) = log Z_g(y).
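
Lemma 6.2 can be checked numerically: conditionally on Y = y the posterior of X is uniform on the solution
set, so the conditional entropy (in bits) is log_2 of the solution count. A brute-force sketch (ours; it
assumes the hypothetical sample_edge_variable helper above) for planted 3-SAT:

    import itertools, math, random

    random.seed(2)
    n, m, k = 8, 10, 3
    x_star = [random.randint(0, 1) for _ in range(n)]
    edges = [tuple(random.sample(range(n), k)) for _ in range(m)]
    y = {I: sample_edge_variable("ksat", tuple(x_star[i] for i in I)) for I in edges}

    # Z_g(y): assignments x with x[I] != y_I for every hyperedge (planted k-SAT)
    Z = sum(1 for x in itertools.product([0, 1], repeat=n)
            if all(tuple(x[i] for i in I) != y[I] for I in edges))
    print("Z_g(y) =", Z, "  H_g(X | Y = y) =", math.log2(Z), "bits")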



Corollary 6.3. For a graphical channel with CSP kernel Q as in (6.1), and for a graph G drawn from
the ensemble P(α, n),

    H(X | Y, G) = E_{G,Y} log Z_G(Y) ,                                                                                (6.7)

where Z_G(Y) is the number of satisfying assignments of the corresponding random planted CSP.

Remark 6.4. Together with Corollary 3.3, this result gives the convergence of the normalized expected
logarithm of the number of solutions for any edge density α. This is to be put in contrast with [4], which
obtains the convergence of the same quantity (where the number of solutions is shifted by 1 to avoid
taking the logarithm of 0) only for a regime of α small enough, essentially where the probability of being
unsatisfiable decays faster than 1/log^{1+ε}(n) for some ε > 0. One should note that an unconditional result
of the kind of Corollary 6.3 for the non-planted setting would imply the satisfiability conjecture (the
existence of an n-independent threshold for k-SAT). Since [32] provides an n-dependent threshold for
k-SAT, the convergence of

    (1/n) E log Z_{F_k(n,α)}

to a limit φ(α) allows one to freeze the limit of Friedgut's threshold, using standard analytical arguments (see
the proof of Theorem 2, Section 5 in [4]).

      We now verify that several planted CSPs satisfy Hypothesis H.

Lemma 6.5. For any k ≥ 1, and for the CSP kernel corresponding to planted k-SAT, the operator Γl is
convex for any l ≥ 1.

Proof of Lemma 6.5. For planted k-SAT, Y = X^k = {0,1}^k,

    Q(z | u) = (1/(2^k − 1)) 1(z ≠ u)                                                                                 (6.8)

and

    Γ_l(ν_l) = (1/2^k) (1/(2^k − 1))^l ∑_{u^{(1)},...,u^{(l)}∈X^k} [ ∑_{z∈Y} ∏_{r=1}^l 1(z = u^{(r)}) ] ∏_{i=1}^k ν_l(u_i^{(1)},...,u_i^{(l)})   (6.9)

             = (1/2^k) (1/(2^k − 1))^l ∑_{u∈X^k} ∏_{i=1}^k ν_l(u_i,...,u_i)                                           (6.10)

             = (1/2^k) (1/(2^k − 1))^l ∑_{u_1,...,u_k∈X} ∏_{i=1}^k ν_l(u_i,...,u_i)                                   (6.11)

             = (1/2^k) (1/(2^k − 1))^l ( ∑_{u_1∈X} ν_l(u_1,...,u_1) )^k ,                                             (6.12)

which is convex in ν_l for any k, l ≥ 1, being the k-th power of a nonnegative linear functional of ν_l.
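
The collapse from (6.9) to (6.12) (the inner sum over z keeps only tuples with u^{(1)} = ··· = u^{(l)})
can be verified numerically; a small sketch (ours), comparing the two expressions on a random ν_l:

    import itertools, random

    X = [0, 1]

    def gamma_direct(nu, k, l):
        # (6.9): the sum over z in Y = {0,1}^k keeps only u^(1) = ... = u^(l)
        c = (1 / 2 ** k) * (1 / (2 ** k - 1)) ** l
        tot = 0.0
        for us in itertools.product(itertools.product(X, repeat=k), repeat=l):
            if all(u == us[0] for u in us):
                p = 1.0
                for i in range(k):
                    p *= nu[tuple(u[i] for u in us)]
                tot += p
        return c * tot

    def gamma_closed(nu, k, l):
        # (6.12): same constant times ( Σ_{u1} ν_l(u1,...,u1) )^k
        c = (1 / 2 ** k) * (1 / (2 ** k - 1)) ** l
        return c * sum(nu[(u1,) * l] for u1 in X) ** k

    rng = random.Random(8)
    k, l = 3, 2
    pts = list(itertools.product(X, repeat=l))
    w = [rng.random() for _ in pts]
    nu = {p: wi / sum(w) for p, wi in zip(pts, w)}
    assert abs(gamma_direct(nu, k, l) - gamma_closed(nu, k, l)) < 1e-12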


Lemma 6.6. For any k ≥ 1, and for the CSP kernel corresponding to planted k-NAE-SAT, the operator
Γl is convex for any l ≥ 1.

Proof of Lemma 6.6. For planted k-NAE-SAT, Y = X^k = {0,1}^k,

    Q(z | u) = (1/(2^k − 2)) 1(z ∉ {u, ū})                                                                            (6.13)

and

    Γ_l(ν_l) = (1/2^k) (1/(2^k − 2))^l ∑_{u^{(1)},...,u^{(l)}∈X^k} [ ∑_{z∈Y} ∏_{r=1}^l 1(u^{(r)} ∈ {z, z̄}) ] ∏_{i=1}^k ν_l(u_i^{(1)},...,u_i^{(l)})   (6.14)

             = (1/2^k) (1/(2^k − 2))^l ∑_{b_1,...,b_l∈X} ∑_{u∈X^k} ∏_{i=1}^k ν_l(u_i ⊕ b_1,...,u_i ⊕ b_l)             (6.15)

             = (1/2^k) (1/(2^k − 2))^l ∑_{b_1,...,b_l∈X} ( ∑_{u_1∈X} ν_l(u_1 ⊕ b_1,...,u_1 ⊕ b_l) )^k ,               (6.16)

which is convex in ν_l for any k, l ≥ 1.

Lemma 6.7. For any k even, and for the CSP kernel corresponding to planted k-XOR-SAT, the operator
Γl is convex for any l ≥ 1.

Proof of Lemma 6.7. This is a special case of Lemma 6.12 for s = 1, d = −1.

    Note that for k-XOR-SAT with k odd, Γ_l may not be convex. For example, for k = 3,

        Γ_2(ν_2) = 1 + F(ν_2)^3(1,1) ,

which is not convex.
   Using Theorem 3.4 and the previous lemmas, the following is obtained.

Corollary 6.8. Let Z(F_n) denote the number of solutions of a random planted formula F_n = (G_n, Y) with
graph G_n and a k-SAT, k-NAE-SAT, or k-XOR-SAT (k even) kernel as in Definition 6.1. Then

    (1/n) E_Y log Z(F_n)

converges almost surely.

    Note that the almost sure convergence is over G_n only, and not over the negation patterns Y, which
remain under the expectation. It remains open to show that concentration holds with respect to both G_n and Y.
Note also that the limit depends only on k and α, and is not n-dependent.
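
For the XOR-SAT kernel the quantity in Corollary 6.8 can even be computed exactly instance by instance,
since the solution set is an affine subspace: Z(F_n) = 2^{n − rank(M)} with M the F_2 constraint matrix,
independently of the right-hand sides Y. A Monte Carlo sketch (ours; the rank routine is a plain greedy
XOR basis over F_2) estimating (1/n) E log_2 Z(F_n) for planted 4-XOR-SAT:

    import random

    def gf2_rank(rows):
        # Rank over F_2; rows are int bitmasks reduced against a greedy XOR basis.
        basis = []
        for v in rows:
            for b in basis:
                v = min(v, v ^ b)
            if v:
                basis.append(v)
        return len(basis)

    def avg_log2_solutions(n, alpha, k=4, trials=200, rng=random.Random(3)):
        total = 0.0
        for _ in range(trials):
            rows = []
            for _ in range(int(alpha * n)):
                mask = 0
                for i in rng.sample(range(n), k):
                    mask |= 1 << i
                rows.append(mask)
            total += (n - gf2_rank(rows)) / n     # (1/n) log2 Z = 1 - rank/n
        return total / trials

    print(avg_log2_solutions(n=200, alpha=0.7))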


6.2    Stochastic block model
The problem of community detection is to divide a set of vertices in a network (or graph) into groups
having "similar behavior." This may refer to clustering the nodes into subgroups having higher edge
density inside the subgroups (assortative case), or across the subgroups (disassortative case). In this paper,
we use the terms clustering and community detection interchangeably. Community detection is
a fundamental problem in modern statistics, machine learning, and data mining, with a
broad range of applications in population genetics, image processing, biology and social science. A large
variety of models have been proposed for community detection problems. We refer to [52, 29, 34] for
surveys on the subject.
      At an algorithmic level, the problem of finding the smallest cut in a graph with two equally sized
groups, i. e., the min-bisection problem, is well known to be NP-hard [26]. Concerning average-case
complexity, various random graph models have been proposed for community detection. The Erdős-
Rényi random graph is typically a very bad model for community structures, since each node is equally
likely to be connected to any other node and no communities are typically formed. The stochastic block
model is a natural extension of the Erdős-Rényi model with a community structure. Although the model is
fairly simple (communities emerge but the average degree is still constant^7), it is a fascinating model with
several fundamental questions still open.
      We now describe the stochastic block model (SBM) with two groups and symmetric parameters, also
called the planted bisection model. Let V = [n] be the vertex set and a, b be two positive real numbers.
For a uniformly drawn assignment X ∈ {0,1}^V on the vertices, an edge is drawn between vertices i and j
with probability a/n if X_i = X_j and with probability b/n if X_i ≠ X_j, each edge being drawn independently
conditionally on the vertex variables. We denote this model by G(n, a, b). Note that the average degree of
a vertex is (a + b)/2; more precisely, a 0-labeled node is connected in expectation to a/2 − a/n nodes that
have a 0-label and to b/2 nodes that have a 1-label.
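
A sampler for G(n, a, b) takes a few lines; the sketch below (ours) makes explicit that the edges are
conditionally independent given the labels:

    import random

    def sample_sbm(n, a, b, rng=random.Random(4)):
        # Planted bisection model G(n, a, b): uniform labels, then independent
        # edges with probability a/n within a group and b/n across groups.
        x = [rng.randint(0, 1) for _ in range(n)]
        edges = [(i, j) for i in range(n) for j in range(i + 1, n)
                 if rng.random() < (a / n if x[i] == x[j] else b / n)]
        return x, edges

    x, edges = sample_sbm(n=1000, a=3.0, b=7.0)     # disassortative: b > a
    print(len(edges), "edges; expected about", 1000 * (3.0 + 7.0) / 4)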
      This model was studied in [15, 21, 26, 56] for the recovery of the clusters in dense graph regimes,
and in [17, 57] for sparser regimes with logarithmic degrees. The threshold for the logarithmic degree
regime was recently obtained in [3], with recovery possible if and only if √α − √β ≥ √2, where
α = pn/log(n), β = qn/log(n), and p (respectively, q) is the intra- (respectively, extra-) cluster edge probability
(with α > β > 0). The attention to the sparse regime described above was initiated in [18] and later
in [23, 50]. In particular, [23] conjectured a phase transition phenomenon, with the detection of clusters
(i. e., obtaining a reconstruction positively correlated with the true one) possible if (a − b)² > 2(a + b) and
impossible otherwise. In [50], a proof of the impossibility part was obtained, and in [46, 51] the conjecture
was settled.
      There are at least two ways to define the SBM as a graphical channel. The direct way is simply to
take G to be the complete graph and, for each pair of vertices, to use the kernel

    Q(y_{ij} | x_i, x_j) = P(y_{ij} | x_i ⊕ x_j) ,

where P(1 | 0) = a/n and P(1 | 1) = b/n. With this approach, however, the channel Q depends on the
number of vertices n, whereas the edge probability of the graph is constant. We next show how we
can "move" the sparsity from the kernel to the graph, obtaining a graphical channel with a sparse graph

as in the previous section and a fixed (asymmetric) channel. The resulting model accurately approximates
the original model, as shown next. Note that this creates a strong connection between coding
and clustering, since it expresses the latter problem as a particular type of code (a simple degree-2 LDGM
code) on a very noisy binary input/output channel.

   7 Models with corrected degrees have been proposed in [39].
Definition 6.9. For a, b ∈ [0,1], an SBM(a, b) kernel is given by X = Y = {0,1} and

    Q(1 | u_1, u_2) = { a   if u_1 = u_2 ,
                      { b   if u_1 ≠ u_2 ,                                                                            (6.17)

for all u_1, u_2 ∈ {0,1}.
Lemma 6.10. There exist n_0 = n_0(γ, a, b) and C = C(a, b) such that the following holds. Let X be
uniformly drawn on {0,1}^V, Y be the output (the graph) of a sparse stochastic block model with parameters
a, b, and Y_γ be the output of a graphical channel with graph G_γ drawn from the ensemble P(γ, n) and
kernel SBM(a/γ, b/γ) (cf. Definition 6.9). Then, for all n ≥ n_0,

    |H(X | Y) − H(X | G_γ, Y_γ)| ≤ Cn/γ .                                                                             (6.18)
Lemma 6.11. For the SBM kernel given by (6.17), a ≤ b (disassortative case) and γ large enough, the
operator Γl is convex for any l ≥ 1.
Proof of Lemma 6.11. This is a special case of Lemma 6.12 below, with s = (a + b)/γ and d = (a − b)/γ.
If γ is large enough then s ≤ 1, and if a ≤ b then d ≤ 0, so all the coefficients in (6.26) are nonnegative.

    We denote by

        F(ν_l)(w) = ∑_{x∈F_2^l} (−1)^{x·w} ν_l(x)

the Fourier-Walsh transform of ν_l evaluated at w ∈ F_2^l (where x · w denotes the dot product of x and w).
Lemma 6.12. If X = Y = {0,1},

    Q(y | x_1, ..., x_k) = W(y | ⊕_{i=1}^k x_i)

and W is an arbitrary binary input/output channel, then

    Γ_l(ν_l) = (1/2) ∑_{w∈F_2^l} d^{|w|} [ s^{l−|w|} + (−1)^{|w|} (2 − s)^{l−|w|} ] F(ν_l)^k(w) ,                     (6.19)

where s = W(1 | 0) + W(1 | 1), d = W(1 | 0) − W(1 | 1), and |w| = ∑_{i=1}^l w_i.
    Note that F(ν_l)(w) is linear in ν_l, hence:

    • For s = 1, i. e., for symmetric channels,

          Γ_l(ν_l) = ∑_{w∈F_2^l : |w| even} d^{|w|} F(ν_l)^k(w)                                                       (6.20)

      and Γ_l is convex when k is even.


    • If s ≥ 1 and d ≥ 0, or if s ≤ 1 and d ≤ 0, then Γ_l is convex when k is even.
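
The convexity claims can be sanity-checked numerically from (6.19): for random distributions ν, ν′ on F_2^l
one verifies the midpoint inequality Γ_l((ν + ν′)/2) ≤ (Γ_l(ν) + Γ_l(ν′))/2. A sketch (ours), in the
regime s ≤ 1, d ≤ 0 with k even:

    import itertools, random

    def gamma_l(nu, k, s, d, l):
        # Γ_l(ν) = (1/2) Σ_w d^|w| [ s^(l-|w|) + (-1)^|w| (2-s)^(l-|w|) ] F(ν)^k(w)
        total = 0.0
        for w in itertools.product([0, 1], repeat=l):
            hw = sum(w)
            F = sum(((-1) ** sum(a * b for a, b in zip(x, w))) * nu[x] for x in nu)
            coef = d ** hw * (s ** (l - hw) + (-1) ** hw * (2 - s) ** (l - hw))
            total += coef * F ** k
        return total / 2

    def rand_dist(l, rng):
        pts = list(itertools.product([0, 1], repeat=l))
        w = [rng.random() for _ in pts]
        return {p: wi / sum(w) for p, wi in zip(pts, w)}

    rng, l, k, s, d = random.Random(5), 3, 4, 0.8, -0.3   # W(1|0)=0.25, W(1|1)=0.55
    for _ in range(1000):
        nu1, nu2 = rand_dist(l, rng), rand_dist(l, rng)
        mid = {p: (nu1[p] + nu2[p]) / 2 for p in nu1}
        assert gamma_l(mid, k, s, d, l) <= (gamma_l(nu1, k, s, d, l)
                                            + gamma_l(nu2, k, s, d, l)) / 2 + 1e-12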
Proof of Lemma 6.12. We have

    Γ_l(ν_l) = (1/2) ∑_{u^{(1)},...,u^{(l)}∈F_2^k} [ ∑_{y∈F_2} ∏_{r=1}^l (1 − P(y | u^{(r)})) ] ∏_{i=1}^k ν_l(u_i^{(1)},...,u_i^{(l)})   (6.21)

and, using the fact that P(y | u^{(r)}) = W(y | ⊕_{i=1}^k u_i^{(r)}),

    Γ_l(ν_l) = (1/2) ∑_{v^{(1)},...,v^{(l)}∈F_2} [ ∑_{y∈F_2} ∏_{r=1}^l (1 − W(y | v^{(r)})) ] ν_l^{⋆k}(v^{(1)},...,v^{(l)})            (6.22)

             = (1/2) ∑_{v∈F_2^l} γ(v) ν_l^{⋆k}(v) ,                                                                   (6.23)

where ν_l^{⋆k} denotes the k-fold convolution power of ν_l on F_2^l and

    γ(v) ≡ ∑_{y∈F_2} ∏_{r=1}^l (1 − W(y | v^{(r)})) = (1 − a)^{l−|v|} (1 − b)^{|v|} + a^{l−|v|} b^{|v|}               (6.24)

with a = W(1 | 0), b = W(1 | 1). Note that the Fourier transform of a^{l−|v|} b^{|v|} is given by

    a^{l−|v|} b^{|v|}  →^F  (a + b)^{l−|w|} (a − b)^{|w|} ,                                                           (6.25)

hence

    F(γ)(w) = (2 − (a + b))^{l−|w|} (a − b)^{|w|} (−1)^{|w|} + (a + b)^{l−|w|} (a − b)^{|w|} .                        (6.26)


Proof of (6.25). To show that we have the Fourier pair

    F_2^l ∋ v ↦ ρ^{|v|}   →^F   F_2^l ∋ w ↦ (1 + ρ)^{l−|w|} (1 − ρ)^{|w|} ,                                           (6.27)

note that the identity is true when l = 1 and assume it to be true for l. Then for l + 1,

    ∑_{v∈F_2^{l+1}} ρ^{|v|} (−1)^{v·w} = ∑_{v∈F_2^l} [ ρ^{|v|} (−1)^{v·w_1^l} + ρ^{|v|+1} (−1)^{v·w_1^l} (−1)^{w_{l+1}} ]   (6.28)

                                       = ∑_{v∈F_2^l} ρ^{|v|} (−1)^{v·w_1^l} (1 + ρ(−1)^{w_{l+1}})                     (6.29)

                                       = (1 + ρ)^{l−|w_1^l|} (1 − ρ)^{|w_1^l|} (1 + ρ(−1)^{w_{l+1}}) ,

where w_1^l denotes the first l components of w; since 1 + ρ(−1)^{w_{l+1}} equals 1 + ρ if w_{l+1} = 0 and
1 − ρ if w_{l+1} = 1, this completes the induction.
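
The pair (6.27) can also be verified directly for small l by computing the transform by enumeration
(our sketch):

    import itertools

    def walsh(f, l):
        # F(f)(w) = Σ_v f(v) (−1)^(v·w) over F_2^l
        return {w: sum(f[v] * (-1) ** sum(a * b for a, b in zip(v, w)) for v in f)
                for w in itertools.product([0, 1], repeat=l)}

    l, rho = 4, 0.37
    f = {v: rho ** sum(v) for v in itertools.product([0, 1], repeat=l)}
    for w, val in walsh(f, l).items():
        hw = sum(w)
        assert abs(val - (1 + rho) ** (l - hw) * (1 - rho) ** hw) < 1e-12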
Corollary 6.13. For the disassortative SBM, the limit of H(X | Y)/n exists and satisfies

    lim_{n→∞} (1/n) H(X | Y) = lim_{γ→∞} lim_{n→∞} (1/n) H(X | G_γ, Y_γ) .                                            (6.30)

    In a work in progress, the assortative case is investigated with a different proof technique. The
regime of a, b where the above limit is strictly below 1 is expected to reflect the phase transition of the
SBM [46, 51].


6.3    Parity-check encoded channels
Shannon’s coding theorem states that for a discrete memoryless channel W from X to Y, the largest rate at
which reliable communication can take place is given by the capacity I(W ) = maxX I(X;Y ), where I(X;Y )
is the mutual information of the channel W with a random input X. To show that rates up to capacity
are achievable, Shannon used random codebooks, relying on a probabilistic argument. Shortly after,
Elias [27] showed8 that random linear codes allow to achieve capacity, reducing the encoding complexity
from exponential to quadratic in the code dimension. However, Berlekamp, McEliece, and Van Tilborg
showed in [14] that the maximum likelihood decoding of unstructured linear codes is NP-hard.
     In order to reduce the complexity of the decoder, Gallager proposed the use of sparse linear codes [33],
giving birth to LDPC codes, with sparse parity-check matrices, and LDGM codes, with sparse
generator matrices. The various types of LDPC/LDGM codes correspond to various types of row and column
degree distributions. Perhaps the most basic class of such codes is the LDGM code with constant right
degree, which corresponds to a generator matrix whose columns have a fixed number k of ones. This
means that each codeword bit is the XOR of k uniformly selected information bits. In other words, this is a
graph-based code drawn from an Erdős-Rényi or Poisson ensemble P_k(α, n). The code can also be seen
as a planted k-XOR-SAT formula. The dimension of the code is m = αn and the rate is r = 1/α.
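
Concretely, such a code draws, for each of the m = αn codeword coordinates, a uniform k-subset of the n
information bits and XORs them. A minimal encoder sketch (ours; the names are hypothetical):

    import random

    def ldgm_encode(info_bits, alpha, k, rng=random.Random(6)):
        # Constant right-degree LDGM code: each of the m = alpha*n codeword
        # bits is the XOR of k information bits chosen uniformly at random.
        n = len(info_bits)
        edges = [rng.sample(range(n), k) for _ in range(int(alpha * n))]
        codeword = [sum(info_bits[i] for i in I) % 2 for I in edges]
        return edges, codeword

    x = [random.randint(0, 1) for _ in range(100)]
    G, c = ldgm_encode(x, alpha=2.0, k=3)
    print(len(c), "codeword bits; rate =", len(x) / len(c))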
     Despite the long history of research on LDPC and LDGM codes, and their success in practical
communication applications, there are still many open questions concerning the behaviour of these
codes. In particular, even for the simple code described above, it is still open to show that the mutual
information (1/n)I(X^n; Y^m) concentrates, with the exception of the binary erasure channel for which
much more is known [44, 45]. In the case of dense random codes, standard probability arguments show
that concentration occurs with a transition at capacity for any discrete memoryless channel. But for
sparse codes, the traditional arguments fail. Recently, the following was conjectured in [42] for constant
right-degree LDGM codes G and binary input symmetric output channels:^9

    P_G( (1/m) I_G(X; Y) < I(W) ) → { 0   if α < C_k(W),
                                    { 1   if α > C_k(W),                                                              (6.31)

where C_k(W) is a constant depending on k and W.
   We provide next a concentration result for this model, which implies the above conjecture for even
degrees.
Definition 6.14. An encoded symmetric kernel is given by

    Q(z | u) = W(z | ⊕_{i=1}^k u_i) ,                                                                                 (6.32)

where W is a binary input symmetric output (BISO) channel from X to Y.
    Note that this corresponds to the output of a BISO channel W when the input to the channel is the XOR of
k information bits. This corresponds also to the constant right-degree LDGM codes considered in the
conjecture of [42].
   8 The result of Elias is originally for binary-input channels.
    9 This means that the channel output can be obtained by drawing at random a BSC among a finite list of given BSCs and then
drawing an output from the selected BSC. In terms of the matrix representing the channel, each column is either constant or
comes in a pair with another column in which the top and bottom components are interchanged.



Lemma 6.15. For an encoded symmetric kernel with k even, the operator Γl is convex for any l ≥ 1.

Proof. We represent the channel W as a 2 × |Y| stochastic matrix. By definition of BISO channels, this
matrix can be decomposed into pairs of columns which are symmetric, of the form

    ( c  d )
    ( d  c )                                                                                                          (6.33)

with c, d ≥ 0, or into single columns which have constant values. Let us assume that W contains m such
pairs and s such constant columns. We have

    Γ_l(ν_l) = (1/|Y|) ∑_{u^{(1)},...,u^{(l)}∈F_2^k} [ ∑_{y∈Y} ∏_{r=1}^l (1 − P(y | u^{(r)})) ] ∏_{i=1}^k ν_l(u_i^{(1)},...,u_i^{(l)})   (6.34)

             = (1/|Y|) ∑_{v^{(1)},...,v^{(l)}∈F_2} [ ∑_{y∈Y} ∏_{r=1}^l (1 − W(y | v^{(r)})) ] ν_l^{⋆k}(v^{(1)},...,v^{(l)})            (6.35)

             = (1/|Y|) ∑_{v∈F_2^l} g(v) ν_l^{⋆k}(v)                                                                   (6.36)

             = (1/|Y|) ∑_{w∈F_2^l} F(g)(w) F(ν_l)^k(w) ,                                                              (6.37)

where

    g(v) = ∑_{i=1}^m ( C_i^{l−|v|} D_i^{|v|} + D_i^{l−|v|} C_i^{|v|} ) + ∑_{i=1}^s E_i^l ,                            (6.38)

for some positive constants C_i, D_i, i ∈ [m], and E_i, i ∈ [s]. Moreover, using (6.25),

    C^{l−|v|} D^{|v|} + D^{l−|v|} C^{|v|}  →^F  (C + D)^{l−|w|} (C − D)^{|w|} + (C + D)^{l−|w|} (D − C)^{|w|}         (6.39)

                                               = (C + D)^{l−|w|} (C − D)^{|w|} (1 + (−1)^{|w|}) ,                     (6.40)

and F(g)(w) has only nonnegative coefficients, since only the terms with |w| even survive. Hence Γ_l is convex
when k is even.

Corollary 6.16. Let X be uniformly drawn in {0,1}^n, U = XG be the output of a k-degree LDGM code G
of dimension αn, and Y be the output of U on a BISO channel W. (Note that we use G for both the graph
and the generator matrix. This abuse of notation is however explained by the fact that the generator
matrix is indeed the incidence matrix of the graph G.) Then

    (1/n) I_G(X; Y)

converges almost surely to a constant C_k(α, W).


    Note that for any realization of G, and dropping the subscript G, we have

        (1/m) I(X; Y) = (1/m) ( H(Y) − H(Y | X) ) ,

where H(Y | X) is the conditional entropy of the forward channel. Since the forward channel factorizes
over the m edges, we have H(Y | X) = mH(W), where H(W) denotes the conditional entropy of the
kernel W. Hence

        (1/m) I(X; Y) < 1 − H(W)   ≡   (1/m) H(Y) < 1 .

Since (1/m)H(Y) converges by the previous corollary, and since the limit must be decreasing in α
(increasing in r), the conjecture (6.31) follows.
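
On a toy instance the decomposition (1/m)I(X; Y) = (1/m)H(Y) − H(W) can be checked exactly by
enumeration; the sketch below (ours) takes a BSC(p) as the BISO channel and computes the output law of a
tiny constant right-degree LDGM code by brute force:

    import itertools, math, random

    def output_law(n, edges, p):
        # Exact P(y) = avg over x of Π_e W(y_e | c_e(x)) for the BSC(p)
        m = len(edges)
        dist = {y: 0.0 for y in itertools.product([0, 1], repeat=m)}
        for x in itertools.product([0, 1], repeat=n):
            c = [sum(x[i] for i in I) % 2 for I in edges]
            for y in dist:
                pr = 1.0
                for ye, ce in zip(y, c):
                    pr *= (1 - p) if ye == ce else p
                dist[y] += pr / 2 ** n
        return dist

    def entropy(dist):
        return -sum(q * math.log2(q) for q in dist.values() if q > 0)

    random.seed(7)
    n, m, k, p = 4, 5, 3, 0.1
    edges = [random.sample(range(n), k) for _ in range(m)]
    H_Y = entropy(output_law(n, edges, p))
    H_W = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))   # H(W) for BSC(p)
    print("I(X;Y)/m =", H_Y / m - H_W, "   H(Y)/m =", H_Y / m)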


7    Open problems
Some technical requirements resulting from Hypothesis H, such as k being even for XOR-SAT or
disassortativity for the stochastic block model, should be removed. It would also be interesting to
compute the value of the limit for various models such as SAT, LDGM codes and the stochastic block
model (even though it may not be an explicit formula). Finally, it would be interesting to obtain
concentration of the number of solutions for planted CSPs with respect to both the drawing of the random
graph and of the clause variables (e. g., the negation variables in SAT).


References
 [1] E MMANUEL A BBE , A FONSO S. BANDEIRA , A NNINA B RACHER , AND A MIT S INGER: Decoding
     binary node labels from censored edge measurements: Phase transition and efficient recovery. To
     appear in the IEEE Transactions on Network Science and Engineering, 2014. [arXiv:1404.4749]
     429

 [2] E MMANUEL A BBE , A FONSO S. BANDEIRA , A NNINA B RACHER , AND A MIT S INGER: Lin-
     ear inverse problems on Erdös-Rényi graphs: Information-theoretic limits and efficient re-
     covery. In IEEE Internat. Symp. on Information Theory (ISIT 2014), pp. 1251–1255, 2014.
     [doi:10.1109/ISIT.2014.6875033] 429

 [3] E MMANUEL A BBE , A FONSO S. BANDEIRA , AND G EORGINA H ALL: Exact recovery in the
     stochastic block model. 2014. [arXiv:1405.3267] 433

 [4] E MMANUEL A BBE AND A NDREA M ONTANARI: On the concentration of the number of solu-
     tions of random satisfiability formulas. Random Structures Algorithms, 45(3):362–382, 2014.
     [doi:10.1002/rsa.20501, arXiv:1006.3786v1] 414, 423, 431

 [5] D IMITRIS ACHLIOPTAS: Algorithmic barriers from phase transitions in graphs. In Graph Theoretic
     Concepts in Computer Science, volume 6410 of LNCS, pp. 1–1. Springer, 2010. Preliminary version
     in FOCS’08. [doi:10.1007/978-3-642-16926-7_1] 414, 415


 [6] D IMITRIS ACHLIOPTAS , H AIXIA J IA , AND C RISTOPHER M OORE: Hiding satisfying assignments:
     Two are better than one. J. Artif. Intell. Res. (JAIR), 24:623–639, 2005. Preliminary version in
     AAAI’04. [doi:10.1613/jair.1681] 414

 [7] DIMITRIS ACHLIOPTAS, CARLA GOMES, HENRY KAUTZ, AND BART SELMAN: Generating
     satisfiable problem instances. In Proc. AAAI, pp. 256–261, 2000. Available at AAAI. 414

 [8] D IMITRIS ACHLIOPTAS , J EONG H AN K IM , M ICHAEL K RIVELEVICH , AND P RASAD T ETALI:
     Two-coloring random hypergraphs. Random Structures Algorithms, 20(2):249–259, 2002.
     [doi:10.1002/rsa.997] 414

 [9] D IMITRIS ACHLIOPTAS , A SSAF NAOR , AND Y UVAL P ERES: Rigorous Location of Phase Transi-
     tions in Hard Optimization Problems. Nature, 435(7043):759–764, 2005. [doi:10.1038/nature03602]
     414

[10] FABRIZIO A LTARELLI , R EMI M ONASSON , AND F RANCESCO Z AMPONI: Can rare SAT formulas
     be easily recognized? On the efficiency of message passing algorithms for k-SAT at large clause-to-
     variable ratios. 2006. [arXiv:cs/0609101] 414

[11] K AZUOKI A ZUMA: Weighted sums of certain dependent random variables. Tohoku Mathematical
     Journal, 19(3):357–367, 1967. [doi:10.2748/tmj/1178243286] 429

[12] W OLFGANG BARTHEL , A LEXANDER K. H ARTMANN , M ICHELE L EONE , F EDERICO R ICCI -
     T ERSENGHI , M ARTIN W EIGT, AND R ICCARDO Z ECCHINA: Hiding solutions in random sat-
     isfiability problems: A statistical mechanics approach. Phys. Rev. Lett., 88(18):188701, 2002.
     [doi:10.1103/PhysRevLett.88.188701] 414

[13] M OHSEN BAYATI , DAVID G AMARNIK , AND P RASAD T ETALI: Combinatorial approach to the
     interpolation method and scaling limits in sparse random graphs. Ann. Probab., 41(6):4080–4115,
     2013. Preliminary version in STOC’10. [doi:10.1214/12-AOP816] 414, 423

[14] E LWYN R. B ERLEKAMP, ROBERT J. M C E LIECE , AND H ENK C.A. VAN T ILBORG: On the
     inherent intractability of certain coding problems (corresp.). IEEE Trans. Inform. Theory, 24(3):384–
     386, 1978. [doi:10.1109/TIT.1978.1055873] 436

[15] PETER J. BICKEL AND AIYOU CHEN: A nonparametric view of network models and Newman-
     Girvan and other modularities. Proceedings of the National Academy of Sciences, 106(50):21068–
     21073, 2009. [doi:10.1073/pnas.0907096106] 433

[16] A NDREJ B OGDANOV AND YOUMING Q IAO: On the security of Goldreich’s one-way function. Com-
     put. Complexity, 21(1):83–127, 2012. Preliminary version in RANDOM’09. [doi:10.1007/s00037-
     011-0034-0] 415

[17] R AVI B. B OPPANA: Eigenvalues and graph bisection: An average-case analysis. In Proc. 28th
     FOCS, pp. 280–285. IEEE Comp. Soc. Press, 1987. [doi:10.1109/SFCS.1987.22] 433


[18] A MIN C OJA -O GHLAN: Graph partitioning via adaptive spectral techniques. Combin. Probab.
     Comput., 19(2):227–284, 2010. [doi:10.1017/S0963548309990514] 433

[19] A MIN C OJA -O GHLAN: The asymptotic k-SAT threshold. In Proc. 46th STOC, pp. 804–813. ACM
     Press, 2014. ACM DL. 414

[20] A MIN C OJA -O GHLAN , M ICHAEL K RIVELEVICH , AND DAN V ILENCHIK: Why almost all sat-
     isfiable k-CNF formulas are easy. In Proc. 13th Int. Conf. Analysis of Algorithms (AofA’07), pp.
     89–102, 2007. Available at DMTCS. 414

[21] A NNE C ONDON AND R ICHARD M. K ARP: Algorithms for graph partitioning on the planted
     partition model. Random Structures Algorithms, 18(2):116–140, 2001. Preliminary version in
     RANDOM’99. [doi:10.1002/1098-2418(200103)18:2<116::AID-RSA1001>3.0.CO;2-2] 433

[22] T HOMAS M. C OVER AND J OY A. T HOMAS: Elements of Information Theory. Wiley Interscience,
     New York, 1991. 418, 420

[23] AURELIEN D ECELLE , F LORENT K RZAKALA , C RISTOPHER M OORE , AND L ENKA Z DEBOROVÁ:
     Asymptotic analysis of the stochastic block model for modular networks and its algorithmic
     applications. Phys. Rev., 84(6):066106, 2011. [doi:10.1103/PhysRevE.84.066106] 415, 433

[24] M ARTIN D IETZFELBINGER , A NDREAS G OERDT, M ICHAEL M ITZENMACHER , A NDREA M ONTA -
     NARI , R ASMUS PAGH , AND M ICHAEL R INK: Tight thresholds for cuckoo hashing via XORSAT. In
     Proc. 37th Internat. Colloq. on Automata, Languages and Programming (ICALP’10), pp. 213–225.
     Springer, 2010. [doi:10.1007/978-3-642-14165-2_19, arXiv:0912.0287] 414

[25] M ARTIN D IETZFELBINGER , A NDREAS G OERDT, M ICHAEL M ITZENMACHER , A NDREA M ON -
     TANARI , R ASMUS PAGH , AND M ICHAEL R INK: Tight thresholds for cuckoo hashing via XORSAT.
     2010. [arXiv:0912.0287v1] 414

[26] M ARTIN E. DYER AND A LAN M. F RIEZE: The solution of some random NP-hard problems in
     polynomial expected time. J. Algorithms, 10(4):451–489, 1989. Preliminary version in FOCS’86.
     [doi:10.1016/0196-6774(89)90001-1] 433

[27] P ETER E LIAS: Coding for noisy channels. IRE Convention Record, 4:37–46, 1955. 436

[28] U RIEL F EIGE , E LCHANAN M OSSEL , AND DAN V ILENCHIK: Complete convergence of message
     passing algorithms for some satisfiability problems. Theory of Computing, 9(19):617–651, 2013.
     Preliminary version in RANDOM’06. [doi:10.4086/toc.2013.v009a019] 414

[29] S ANTO F ORTUNATO: Community detection in graphs. Physics Reports, 486(3-5):75–174, 2010.
     [doi:10.1016/j.physrep.2009.11.002, arXiv:0906.0612] 433

[30] S ILVIO F RANZ AND M ICHELE L EONE: Replica bounds for optimization problems and diluted spin
     systems. J. Stat. Phys., 111(3-4):535, 2003. [doi:10.1023/A:1022885828956] 414, 423


[31] S ILVIO F RANZ , M ICHELE L EONE , AND FABIO L UCIO T ONINELLI: Replica bounds for diluted
     non-Poissonian spin systems. J. Phys. A, 36(43):10967, 2003. [doi:10.1088/0305-4470/36/43/021]
     414, 423
[32] E HUD F RIEDGUT: Sharp thresholds of graph properties, and the k-sat problem. J. Amer. Math. Soc.,
     12:1017–1054, 1999. Appendix by Jean Bourgain. [doi:10.1090/S0894-0347-99-00305-7] 414, 431
[33] ROBERT G. GALLAGER: Low-Density Parity-Check Codes. MIT Press, Cambridge, Massachusetts,
     1963. 436
[34] A NNA G OLDENBERG , A LICE X. Z HENG , S TEPHEN E. F IENBERG , AND E DOARDO M. A IROLDI:
     A survey of statistical network models. Foundations and Trends in Machine Learning, 2(2):129–233,
     2010. [doi:10.1561/2200000005] 415, 433
[35] ODED GOLDREICH: Candidate one-way functions based on expander graphs. In Studies in
     Complexity and Cryptography, volume 6650 of LNCS, pp. 76–87. Springer, 2011.
     [doi:10.1007/978-3-642-22670-0_10] 415
[36] F RANCESCO G UERRA AND FABIO L UCIO T ONINELLI: The thermodynamic limit in mean field
     spin glasses. Commun. Math. Phys., 230(1):71–79, 2002. [doi:10.1007/s00220-002-0699-y] 414,
     423
[37] H ARRI H AANPÄÄ , M ATTI J ÄRVISALO , P ETTERI K ASKI , AND I LKKA N IEMELÄ: Hard satisfiable
     clause sets for benchmarking equivalence reasoning techniques. Journal on Satisfiability, Boolean
     Modeling and Computation, 2(1-4):27–46, 2005. Available at JSAT. 414
[38] H AIXIA J IA , C RISTOPHER M OORE , AND D OUG S TRAIN: Generating hard satisfiable formulas by
     hiding solutions deceptively. J. Artif. Intell. Res. (JAIR), 28:107–118, 2007. Preliminary version in
     AAAI’05. [doi:10.1613/jair.2039, arXiv:cs/0503044] 414
[39] B RIAN K ARRER AND M ARK E.J. N EWMAN: Stochastic blockmodels and community structure in
     networks. Phys. Rev. E, 83(1):016107, 2011. [doi:10.1103/PhysRevE.83.016107] 433
[40] F LORENT K RZAKALA AND L ENKA Z DEBOROVÁ: Hiding quiet solutions in random constraint sat-
     isfaction problems. Phys. Rev. Lett., 102(23):238701, 2009. [doi:10.1103/PhysRevLett.102.238701,
     arXiv:0901.2130] 415
[41] S HRINIVAS K UDEKAR AND N ICOLAS M ACRIS: Sharp bounds for optimal decoding of
     Low-Density Parity-Check codes. IEEE Trans. Inform. Theory, 55(10):4635–4650, 2009.
     [doi:10.1109/TIT.2009.2027523, arXiv:0807.3065] 423
[42] K. RAJ KUMAR, PAYAM PAKZAD, AMIR HESAM SALAVATI, AND MOHAMMAD AMIN
     SHOKROLLAHI: Phase transitions for mutual information. In Proc. 6th Internat. Symp. on Turbo
     Codes and Iterative Information Processing (ISTC 2010), pp. 137–141, 2010. 415, 422, 436
[43] J OHN D. L AFFERTY, A NDREW M C C ALLUM , AND F ERNANDO C. N. P EREIRA: Conditional
     random fields: Probabilistic models for segmenting and labeling sequence data. In Proc. 18th Int.
     Conf. on Machine Learning (ICML’01), pp. 282–289, 2001. ACM DL. 415


[44] M ICHAEL L UBY, M ICHAEL M ITZENMACHER , M OHAMMAD A MIN S HOKROLLAHI , AND
     DANIEL A. S PIELMAN: Efficient erasure correcting codes. IEEE Trans. Inform. Theory, 47(2):569–
     584, 2001. [doi:10.1109/18.910575] 436

[45] M ICHAEL L UBY, M ICHAEL M ITZENMACHER , M OHAMMAD A MIN S HOKROLLAHI , DANIEL A.
     S PIELMAN , AND VOLKER S TEMANN: Practical loss-resilient codes. In Proc. 29th STOC, pp.
     150–159. ACM Press, 1997. [doi:10.1145/258533.258573] 436

[46] L AURENT M ASSOULIÉ: Community detection thresholds and the weak Ramanujan property. In
     Proc. 46th STOC, pp. 694–703. ACM Press, 2014. ACM DL. [arXiv:1311.3085] 420, 433, 435

[47] A NDREA M ONTANARI: Tight bounds for LDPC and LDGM codes under MAP decod-
     ing. IEEE Trans. Inform. Theory, 51(9):3221–3246, 2005. [doi:10.1109/TIT.2005.853320,
     arXiv:cs.IT/0407060] 423

[48] A NDREA M ONTANARI: Estimating random variables from random sparse observations. European
     Transactions on Telecommunications, 19(4):385–403, 2008. ACM DL. [arXiv:0709.0145] 415

[49] A NDREA M ONTANARI , R ICARDO R ESTREPO , AND P RASAD T ETALI: Reconstruction and Clus-
     tering in Random Constraint Satisfaction Problems. SIAM J. Discrete Math., 25(2):771–808, 2011.
     [doi:10.1137/090755862, arXiv:0904.2751] 414

[50] E LCHANAN M OSSEL , J OE N EEMAN , AND A LLAN S LY: Stochastic Block Models and Reconstruc-
     tion. Prob. Theor. Rel. Fields, 2012. To appear. [arXiv:1202.1499] 433

[51] E LCHANAN M OSSEL , J OE N EEMAN , AND A LLAN S LY: A proof of the block model threshold
     conjecture. 2014. [arXiv:1311.4115] 420, 433, 435

[52] M ARK E. J. N EWMAN: Communities, modules and large-scale structure in networks. Nature
     Physics, 8(1):25–31, 2011. [doi:10.1038/nphys2162] 433

[53] D MITRY PANCHENKO AND M ICHEL TALAGRAND: Bounds for diluted mean-field spin glass
     models. Prob. Theor. Rel. Fields, 130(3):319–336, 2004. [doi:10.1007/s00440-004-0342-2] 414,
     423

[54] G EORGE P OLYA AND G ABOR S ZEGÖ: Problems and Theorems in Analysis I. Springer, Berlin,
     1998. 421

[55] T OM R ICHARDSON AND R ÜDIGER U RBANKE: Modern Coding Theory. Cambridge Univ. Press,
     Cambridge, 2008. 415

[56] T OM A.B. S NIJDERS AND K RZYSZTOF N OWICKI: Estimation and Prediction for Stochastic
     Blockmodels for Graphs with Latent Block Structure. Journal of Classification, 14(1):75–100, 1997.
     [doi:10.1007/s003579900004] 433

[57] E NDRE S ZEMERÉDI: Regular partitions of graphs. Colloq. Internat. CNRS: Problèmes combina-
     toires et théorie des graphes, 260:399–401, 1978. 433


[58] L ENKA Z DEBOROVÁ AND F LORENT K RZAKALA: Quiet planting in the locked constraint sat-
     isfaction problems. SIAM J. Discrete Math., 25(2):750–770, 2011. [doi:10.1137/090750755,
     arXiv:0902.4185] 415


AUTHORS

     Emmanuel Abbe
     Assistant professor
     Princeton University, Princeton, NJ
     eabbe princeton edu
     http://www.princeton.edu/~eabbe


     Andrea Montanari
     Associate professor
     Stanford University, Stanford, CA
     montanari stanford edu
     http://web.stanford.edu/~montanar


ABOUT THE AUTHORS

     E MMANUEL A BBE received his Ph. D. from the EECS department at M.I.T., under the
        supervision of Lizhong Zheng, and his M. Sc. degree from the Mathematics Department at
        EPFL. He is currently an assistant professor in the Department of Electrical Engineering
        and in the Program for Applied and Computational Mathematics at Princeton University.
        His research interests are in coding theory, random graphs, and in the interplay between
        those fields.


     A NDREA M ONTANARI is an associate professor in the Departments of Electrical Engineering
        and of Statistics, Stanford University. He received the Laurea degree in physics in 1997,
        and the Ph. D. in theoretical physics in 2001, both from Scuola Normale Superiore, Pisa,
        Italy. He has been a Postdoctoral Fellow with the Laboratoire de Physique Théorique
        of Ecole Normale Supérieure (LPTENS), Paris, France, and the Mathematical Sciences
         Research Institute, Berkeley, CA. From 2002 to 2010 he was Chargé de Recherche at
        LPTENS. In September 2006, he joined the faculty of Stanford University. Dr. Montanari
        was co-awarded the ACM SIGMETRICS Best Paper Award in 2008. He received the
        CNRS Bronze Medal for Theoretical Physics in 2006, the National Science Foundation
        CAREER award in 2008, the Okawa Foundation Research Grant in 2013, and the Best
        Publication Award of the Applied Probability Society in 2015. His research focuses on
        algorithms on graphs, graphical models, statistical inference and estimation.


