                           THEORY OF COMPUTING, Volume 14 (4), 2018, pp. 1–29
                                        www.theoryofcomputing.org




     Explicit Rateless Codes for Memoryless
    Binary-Input Output-Symmetric Channels
                   Benny Applebaum∗                         Liron David†             Guy Even
                Received September 8, 2015; Revised October 29, 2017; Published April 26, 2018




       Abstract: A rateless code encodes a finite-length information word into an infinitely long
       codeword such that longer prefixes of the codeword can tolerate a larger fraction of errors. A
       rateless code approaches capacity for a family of channels if, for every channel in the family,
       reliable communication is obtained by a prefix of the code whose rate is arbitrarily close to
       the channel’s capacity. The encoder is universal in the sense that the same encoder is used for all
       channels in the family.
            So far, all known constructions of rateless codes were randomized, giving rise to ensem-
       bles of such codes. In this paper, we construct the first explicit rateless code for memoryless
       binary-input output-symmetric (MBIOS) channels. Our code achieves an almost exponen-
       tially small error probability (e. g., exp(−Ω(k/ log∗ k)) for a k-bit information word), and
       can be encoded in almost constant time per-bit (e. g., O(log∗ k)). Over binary symmetric
       channels, the running time of decoding is similar. Previous ensemble-based rateless codes
       for the binary symmetric channel have polynomial asymptotic error probabilities and the
       running time of decoding is polynomial only under some conditions.
    A preliminary version of this paper appeared in the Proceedings of the 6th Conference on Innovations in Theoretical
Computer Science, ITCS 2015 [1].
  ∗ Supported by the European Union’s Horizon 2020 Programme (ERC-StG-2014-2020) under grant agreement no. 639813

ERC-CLC, by an ICRC grant and by the Check Point Institute for Information Security.
  † Supported by Israel Ministry of Science and Technology (grant 3-9094).



ACM Classification: E.4
AMS Classification: 94B05
Key words and phrases: coding theory, error-correcting codes, rateless codes, memoryless binary-input
output-symmetric channel, binary symmetric channel, Gaussian channel


© 2018 Benny Applebaum, Liron David, and Guy Even
    Licensed under a Creative Commons Attribution License (CC-BY)                        DOI: 10.4086/toc.2018.v014a004

          Our main technical contribution is a constructive proof for the existence of an infinite
      generating matrix with the property that each of its prefixes induces a weight distribution
      that approximates the expected weight distribution of a random linear code.


1     Introduction
Consider a transmitter who wishes to send an information word m ∈ {0, 1}k over a Binary Symmetric
Channel with crossover probability p ∈ (0, 1/2) (hereafter denoted as BSC(p)). By Shannon’s theorem,
using an error-correcting code, reliable communication can be achieved by encoding the information word
into an n-bit codeword, where n = k/(1 − H(p) − δ ). Here H(p) denotes the binary entropy function,
and δ > 0 is an arbitrarily small constant. Furthermore, there are explicit capacity-achieving codes in
which decoding and encoding can be performed efficiently in polynomial or even linear time (cf. [5, 6]).
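As a quick numerical illustration of Shannon's bound, the required blocklength can be computed directly from p and δ. The following Python sketch is ours (the helper names `binary_entropy` and `required_blocklength` are not from the paper):

```python
import math

def binary_entropy(p):
    """H(p) = -p*log2(p) - (1-p)*log2(1-p), the binary entropy function."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def required_blocklength(k, p, delta):
    """n = k / (1 - H(p) - delta): the codeword length needed to send k
    information bits over BSC(p) at gap delta below capacity."""
    capacity = 1.0 - binary_entropy(p)
    assert 0 < delta < capacity, "delta must lie in (0, 1 - H(p))"
    return math.ceil(k / (capacity - delta))
```

For instance, at p = 0.11 the capacity 1 − H(p) is roughly 1/2, so with δ = 0.05 sending k = 1000 bits requires a prefix of a bit over 2200 bits.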
     The task of reliable communication is more challenging when the transmitter does not know the noise
level p. Such a situation arises, for example, when the transmitter wishes to broadcast the same message
to many receivers B1 , B2 , . . ., where each receiver Bi experiences a different crossover probability pi (e. g.,
due to a different distance from the transmitter). Naively, one would use a code which is tailored to
the noisiest channel with parameter pmax . However, this solution adds an unnecessary communication
overhead for receivers Bi with pi < pmax . Furthermore, in absence of an upper bound pmax < 1/2, even
this naive solution is not possible.
     The problem of encoding without knowing the channel’s noise level (also studied in [7, 30]) can be
solved by a rateless code. In a rateless code, the encoder maps the information word m ∈ {0, 1}k into an
infinitely long sequence of bits {ci }i∈N such that the longer the prefix of the codeword, the higher level
of noise can be corrected. That is, for every value of p ∈ (0, 1/2), and every constant gap-to-capacity
δ > 0, reliable communication is achieved over BSC(p) based on a prefix of length k/(1 − H(p) − δ )
(as k tends to infinity).
     Rateless codes have been extensively studied under various names [18, 16, 9, 14, 7, 30, 24, 8, 13,
26, 15, 22, 23]. Information-theoretically, the problem of rateless transmission is well understood [28],
and, for many noise models, random codes achieve nearly-optimal rate. The task of constructing
computationally efficient rateless codes, which provide polynomial-time encoding and decoding, is much
more challenging. Currently, only a few examples of efficient capacity-achieving rateless codes are
known for several important channels such as erasure channels, Gaussian channels, and binary symmetric
channels [17, 27, 10, 20]. Interestingly, all known constructions are probabilistic in the sense that the
code is chosen randomly from an ensemble. Hence, the encoder and the decoder must share randomness
(e. g., by an error-free side-channel or public randomness). This limitation raises the natural question of
whether one can construct explicit rateless codes.

1.1   Our results
In this paper, we answer the question in the affirmative by constructing deterministic efficient rateless
codes which achieve the capacity over Memoryless Binary-Input Output-Symmetric (MBIOS) channels.
To simplify the exposition, we state the results for the special case of the BSC family. (A generalization
to arbitrary MBIOS channels appears in Section 3.)


Theorem 1.1 (Rateless codes for BSC). Fix some super-constant function β (k) = ω(1). There exists a
deterministic rateless encoding algorithm Enc and a deterministic rateless decoding algorithm Dec such
that for every k-bit information word m the following holds.

      • (Capacity achieving) For every crossover probability p ∈ (0, 1/2), and prefix length

                                             n = k/(1 − H(p) − δ)

        where 0 < δ < 1 − H(p) is an arbitrary constant, the n-long prefix of Enc(m) is decoded correctly
        by Dec over BSC(p) with probability 1 − 2^{−Ω(k/β³)}.

      • (Efficiency) For every n > k, the n-long prefix of Enc(m) can be computed in time n · β , and
        decoding is performed in time n · β . Both algorithms can be implemented in parallel by circuits of
        depth O(β + log n).

    Letting β be a slowly increasing function (e. g., log∗ k), we obtain an “almost” exponential error and
“almost” linear time encoding and decoding.
    One may also consider a weaker form of capacity achieving rateless codes in which the encoding
is allowed to depend on the gap to capacity δ . (This effectively puts an a priori upper-bound on the
noise probability which makes things easier.) In this setting we can obtain an asymptotically optimal
construction with linear time encoding and decoding and exponentially small error. (See Section 3 for a
formal statement.)

Comparison with spinal codes. Prior to our work, spinal codes [19, 20, 2] were the only known
efficient (randomized) rateless codes for the BSC.1 Apart from being deterministic, our construction
has several important theoretical advantages over spinal codes. The upper bound on the decoding error
of spinal codes is only inverse polynomial in k, and these codes only weakly achieve the capacity (i. e.,
the encoding depends on the gap δ to capacity). Moreover, the decoding complexity is polynomial (as
opposed to linear or quasilinear in our codes), and both encoding and decoding are highly sequential
as they require Ω(k) sequential steps. It should be mentioned however that, while spinal codes were
reported to be highly practical, we currently do not know whether our codes perform well in practice.
The question of directly derandomizing spinal codes (while keeping high practical performance) is left
for future research.

1.2     Overview of our construction
Our starting point is a simple (but inefficient and randomized) construction based on a random linear
code. Assume that both the encoder and decoder have access to an infinite sequence of random k-bit row
vectors {Ri }i∈N . To encode the message m ∈ {0, 1}k , viewed as a k-bit column vector, the encoder sends
the sequence {Ri · m}i∈N of inner products over the binary field. To decode a noisy n-bit prefix of the
codeword, we will employ the maximum-likelihood decoder (ML) for the code generated by the n × k
   1 We emphasize that, despite the use of the term “de-randomization” in [2], the construction of spinal codes is randomized.
In particular, it employs a hash function chosen uniformly from a family of pairwise independent hash functions.



matrix R = (R1 , . . . , Rn ). A classical result in coding theory asserts that such a code achieves the capacity
of the BSC. Namely, as long as the gap from capacity δ = 1 − H(p) − k/n is positive, a uniformly random
matrix R ← {0, 1}^{n×k} is likely to generate a code with exponentially small (in k) maximum-likelihood
decoding error over BSC(p).
    This construction has two important drawbacks: It is probabilistic and it does not support efficient
decoding. For now, let us ignore computational limitations, and attempt to derandomize the construction.
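At toy scale, this random-linear baseline is easy to simulate. The sketch below is our own illustration (not the construction analyzed in this paper): it encodes by inner products with the rows R_i, transmits the n-bit prefix over BSC(p), and decodes by exhaustive maximum likelihood; a systematic identity prefix is included only to guarantee injectivity at this tiny blocklength.

```python
import itertools
import random

def encode(R, m):
    """The i-th codeword bit is the inner product <R_i, m> over GF(2)."""
    return [sum(r & b for r, b in zip(row, m)) % 2 for row in R]

def bsc(c, p, rng):
    """BSC(p): flip each bit independently with probability p."""
    return [bit ^ (rng.random() < p) for bit in c]

def ml_decode(R, y, k):
    """ML decoding over BSC(p < 1/2): the information word whose encoding
    under R is closest to y in Hamming distance (exhaustive search)."""
    return list(min(itertools.product([0, 1], repeat=k),
                    key=lambda m: sum(a != b for a, b in zip(encode(R, m), y))))

rng = random.Random(2024)
k, n, p = 6, 40, 0.05
# systematic prefix (identity rows) makes m -> encode(R, m) injective
R = [[int(i == j) for j in range(k)] for i in range(k)]
R += [[rng.randrange(2) for _ in range(k)] for _ in range(n - k)]
m = [1, 0, 1, 1, 0, 0]
y = bsc(encode(R, m), p, rng)
m_hat = ml_decode(R, y, k)   # equals m with high probability at this rate
```

The exhaustive search makes the inefficiency concrete: decoding enumerates all 2^k information words, which is exactly the cost the concatenation step of Section 1.2.2 later confines to short inner blocks.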

1.2.1    Derandomization
We would like to deterministically generate an infinite number of rows {Ri }i∈N such that every n-row
prefix matrix R[1 : n] = (R1 , . . . , Rn ) has a low ML-decoding error of, say 0.01, for every p for which
1 − H(p) − k/n is greater than, say, 0.01.2
    Although we know that, for every n, almost all n × k matrices satisfy this condition, it is not a priori
clear that every such low-error matrix can be extended to a larger matrix while preserving low error.
    To solve this problem, we identify a property of good matrices which, on the one hand, guarantees
low decoding error, and, on the other hand, is extendable in the sense that every good matrix can be
augmented by some row while preserving its goodness. We will base our notion of goodness on the
weight distribution of the matrix R.
    Let Wi,n denote the set of information words which are mapped by the matrix R[1 : n] to codewords
of Hamming weight i, and let wi,n denote the size of this set. The sets (W1,n , . . . ,Wn,n ) form a partition
of {0, 1}k , and the vector (wi,n )i=1,...,n is called the weight distribution of the code. When a row Rn+1 is
added, the weight of all information words which are orthogonal to Rn+1 remains the same, while the
weight of non-orthogonal words grows by 1. Thus Rn+1 splits Wi,n into two parts: the orthogonal vectors
which “remain” in Wi,n+1 , and the non-orthogonal vectors which are “elevated” to Wi+1,n+1 . A random
row Rn+1 is therefore expected to split Wi,n into two equal parts.
    If in each step we could choose such an “ideal” row which simultaneously halves each set Wi,n ,
we would get an “ideal” weight distribution in which w∗_i(n, k) = (n choose i) · 2^{k−n} , as expected in a random
linear code. Such an ideal weight distribution guarantees a low ML decoding error over BSC(p) when
1 − H(p) < k/n (cf. [21, 29, 4]).
    While we do not know how to choose such an ideal row (in fact it is not clear that such a row exists),
a probabilistic argument shows that we can always find a row Rn+1 which approximately splits every
sufficiently large Wi,n simultaneously. Furthermore, by keeping track of the small sets and choosing Rn+1
which elevates a constant fraction of the lightest vectors, we make sure that the distance of the code is
not too small, e. g., Wi,n is empty for all i < Ω((n − k)/ log n). Using these properties we show that the
resulting code has low ML decoding error. (See Section 4.)
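The row-selection step can be mimicked at toy scale. The sketch below is a simplification of ours: it searches for a next row that approximately halves every current weight class, but omits the bookkeeping of small sets and lightest vectors described above.

```python
import itertools
import random

def weight_classes(R, k):
    """Group the nonzero information words by the Hamming weight of their
    encoding under the current n x k prefix R (the sets W_{i,n})."""
    classes = {}
    for m in itertools.product([0, 1], repeat=k):
        if any(m):
            w = sum(sum(r & b for r, b in zip(row, m)) % 2 for row in R)
            classes.setdefault(w, []).append(m)
    return classes

def find_splitting_row(R, k, slack=0.25, tries=500, rng=None):
    """Search for a row R_{n+1} that 'elevates' close to half of every
    class W_{i,n} (the words non-orthogonal to the new row)."""
    rng = rng or random.Random(0)
    classes = weight_classes(R, k)
    for _ in range(tries):
        row = [rng.randrange(2) for _ in range(k)]
        if all(abs(sum(sum(r & b for r, b in zip(row, m)) % 2 for m in W)
                   - len(W) / 2) <= slack * len(W) + 1
               for W in classes.values()):
            return row
    return None
```

The probabilistic argument in the text guarantees that, for sufficiently large classes, such a row always exists; the sketch simply finds one by sampling.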

1.2.2    Making the code efficient
The above approach gives rise to a deterministic rateless code which achieves the capacity of the BSC
with a subexponential error of ε = 2^{−Ω(β/ log β)} , where β is the length of the information word. However,
   2 We use small constants to simplify the presentation; the discussion remains valid when the constants are replaced with a
function that decreases with k.



the time complexity of encoding/decoding the n-bit prefix of a codeword is n · 2^{O(β)} . We solve this
problem by noting that Forney’s concatenation technique [11] naturally extends to the rateless setting.
We sketch the construction below. (Full details appear in Section 6, see also Figure 1 and Figure 2.)
    Let k denote the length of the information word. The construction uses the inefficient rateless code
as an “inner code” Cin : {0, 1}^β → {0, 1}^∗ , and, in addition, employs a standard efficient outer code
Cout : B^{kout} → B^{nout} , where B ≜ {0, 1}^β and kout ≜ k/β .
    To encode a message m ∈ {0, 1}^k , we parse it as M ∈ B^{kout} , apply the outer code to obtain a codeword
C ≜ (C1 , . . . , Cnout ), and then apply the inner code to each of the symbols of C in parallel. Namely, each
symbol Ci is encoded by the code Cin into an infinitely long column vector. The (nin · nout)-bit prefix of the
concatenated encoding is obtained by collecting the binary vectors (X1 , . . . , Xnout ), where Xi denotes the
prefix of length nin of the inner codeword that corresponds to Ci .
    Decoding proceeds in the natural way. Let Y = (Y1 , . . . ,Ynout ) denote the noisy nin · nout prefix of the
encoding of the message m. First, maximum likelihood decoding is employed to decode each of the inner
codewords Yi into X̂i . Next, the decoder of the outer code recovers an information word M from the noisy
codeword (X̂1 , . . . , X̂nout ).
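The concatenation scheme can be expressed generically. In the sketch below, the stand-in codes are trivial repetition codes chosen only to make the example self-contained; they are NOT the outer and inner codes used in this paper.

```python
def concat_encode(m_bits, beta, outer_enc, inner_enc, n_in):
    """Parse m into beta-bit symbols, apply the outer code, then encode
    each outer symbol with the inner code, keeping an n_in-bit prefix."""
    M = [tuple(m_bits[i:i + beta]) for i in range(0, len(m_bits), beta)]
    return [inner_enc(c)[:n_in] for c in outer_enc(M)]

def concat_decode(Y, inner_dec, outer_dec):
    """Decode every inner block, then run the outer decoder on the result."""
    return outer_dec([inner_dec(y) for y in Y])

# toy stand-ins: 3-fold symbol repetition (outer), 5-fold bit repetition (inner)
outer_enc = lambda M: M * 3
inner_enc = lambda c: [b for b in c for _ in range(5)]
inner_dec = lambda y: tuple(int(sum(y[5 * i:5 * i + 5]) >= 3)
                            for i in range(len(y) // 5))

def outer_dec(S):
    """Symbol-wise majority vote over the three repetitions."""
    n = len(S) // 3
    out = [a if a in (b, c) else b
           for a, b, c in zip(S[:n], S[n:2 * n], S[2 * n:])]
    return [bit for sym in out for bit in sym]
```

A noiseless round trip through these stand-ins recovers the message, and so does one with a single corrupted codeword bit (absorbed by the inner repetition).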
    In order to prove Theorem 1.1, we need a somewhat non-standard setting of the parameters. To
avoid having to fix the gap to the channel’s capacity ahead of time, we use an outer code whose
rate tends to 1 (i. e., nout = kout (1 + o(1))). Set β = ω(1). For concreteness, take an outer code
Cout : B^{kout} → B^{nout} with nout = kout + kout/poly(β ), and assume that the code can be decoded from a
fraction of ε′ = Ω(1/poly(β )) errors in time nout · poly(β ) and can be encoded with similar complexity.3
A standard application of Chernoff’s bound shows that the decoding error of a p-noisy codeword of
length n ≥ k/(1 − H(p) − δ ) is 2^{−Ω(nout ·(ε′−ε)²)} , where ε < ε′ denotes the decoding-error probability
of a single inner block; under our choice of parameters, this simplifies to 2^{−Ω(k/poly(β ))} . For a slowly
increasing β = ω(1), we derive an almost-exponential error, and an almost-linear encoding/decoding time
complexity of nout · β + n · 2^{O(β )} .
    If the gap-to-capacity δ is a priori given, we can take β to be a constant (which depends on δ ). As a
result the rate of the outer code is bounded away from 1, but the error becomes exponentially small and
both encoding and decoding can be performed in linear time.

Remark 1.2 (An alternative approach). As already mentioned, our analysis shows that a good approxi-
mation of the weight distribution guarantees low decoding error. In particular, to achieve the error bounds
of Theorem 1.1, it suffices to design a code whose n-bit prefix has (1) distance of Ω((n − k)/ log n), and
(2) at most w∗_i(n, k) · 2^{o(n)} + poly(n) codewords of weight i. Once n > 2^k these conditions become
trivial to satisfy: the first condition requires distance of Ω((n − k)/k) (i. e., every k rows should increase
the distance by 1), and the second condition becomes vacuous (since the total number of codewords is at
most 2^k < poly(n)). In particular, if the first 2^k rows of the generating matrix are “good,” they can be
easily extended by concatenating the identity matrix over and over again. This observation (suggested to
us by an anonymous referee) implies that a rateless code can be deterministically constructed in time
double-exponential in k by enumerating over all 2^k × k matrices. Of course, this is significantly slower
than our construction which (naively) achieves exponential complexity in k and, after concatenation,
yields an “almost linear-time” construction.
   3 Such a code can be obtained based on expander graphs, e. g., [32, 31, 12]. In fact, we will employ the code of [12] which

achieves a smaller alphabet of absolute size β . This is not a real issue as we can increase the alphabet to 2^β by parsing β / log β
symbols as a single symbol without affecting the properties of the code. See Section 6.



1.3     Discussion
One of the main conceptual contributions of this work is a formalization of rateless codes from an
algorithmic point of view (see Section 2.2). This formulation raises a more general research problem:

        Is it possible to gradually generate an infinite combinatorial object O = {O_i}_{i=1}^∞ via a
        deterministic algorithm?

Note that the question may be interesting even for inefficient algorithms as it may be infeasible, in general,
to decide whether a finite sequence O1 , . . . , On is a prefix of some good infinite sequence O. (This is
very different from the standard finite setting, where inefficient derandomization is trivially achievable by
exhaustive search.) It will be interesting to further explore other instances of this question (e. g., for some
families of graphs).
     The problem of deterministically constructing a rateless code can be formulated as follows. Refer
to a generating matrix as “pseudo-random-weight” if the weight distribution of the code it generates
is “close” to the expected weight distribution of random linear codes. Our main technical contribution
is a deterministic construction of an infinite generating matrix, every finite prefix of which is “pseudo-
random-weight.”
     An interesting open problem is to obtain stronger approximations for the “ideal” weight distribution.
Specifically, it should be possible to improve the code’s distance from sublinear (Ω((n − k)/ log n)) to
linear (Ω(n − k)) in the redundancy. Such a code would be able to resist adversarial errors (e. g., in
the Hamming model). It is also natural to try and approximate the ideal weight distribution not only for
prefixes but also for consecutive blocks. Namely, is it possible to construct a rateless code which, for
every restriction to n consecutive bits, achieves the capacity of the BSC? Getting back to our motivating
story of noisy multicast, such a rateless code would allow the receivers to dynamically join the multicast.


2     Preliminaries
2.1     MBIOS Channels
Let x ≜ (x1 , . . . , xk ) denote a k-bit string. Let W : {0, 1} → Y denote a binary-input output-symmetric
channel, where Y denotes the output alphabet of the channel.4 Let W^n : {0, 1}^n → Y^n denote the memoryless
binary-input output-symmetric (MBIOS) channel obtained by n independent copies of W . Let W^n (z)
denote the random variable (taking values in Y^n ) that describes the channel’s output on input z ∈ {0, 1}^n .
An example of an MBIOS channel is the binary symmetric channel with crossover probability p ∈ (0, 1/2)
(denoted by BSC(p)).
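The output-symmetry condition of footnote 4 is easy to verify mechanically for BSC(p) with the bijection π(y) = 1 − y. The following small check of the definition is ours:

```python
def bsc_law(p):
    """Transition probabilities Pr[W(b) = y] of BSC(p), with Y = {0, 1}."""
    return {(b, y): 1 - p if y == b else p for b in (0, 1) for y in (0, 1)}

def is_output_symmetric(law, pi, outputs):
    """Check the MBIOS condition Pr[W(b) = y] == Pr[W(1-b) = pi(y)]."""
    return all(abs(law[(b, y)] - law[(1 - b, pi(y))]) < 1e-12
               for b in (0, 1) for y in outputs)

assert is_output_symmetric(bsc_law(0.1), lambda y: 1 - y, (0, 1))
```

By contrast, a Z-channel (which corrupts only the input 1) fails the same check, so it is not output-symmetric.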

2.2     Rateless codes
We begin with a syntactic definition of rateless codes over binary-input channels. We denote the output
alphabet by Y.
    4 A binary-input channel is output-symmetric if there is a bijection π : Y → Y such that Pr(W (b) = y) = Pr(W (1 − b) = π(y)),

for every y ∈ Y and b ∈ {0, 1}.



Definition 2.1 (rateless code). A rateless code is a pair of functions, (Enc, Dec), satisfying the following
conditions.
   1. The encoder Enc : {0, 1}∗ × N → {0, 1} takes an information word x ∈ {0, 1}∗ and an index i ∈ N,
      and outputs the i-th bit of the encoding of x. (Equivalently, the encoding of x is an infinite sequence
      of bits (Enc(x, i))i∈N .)
   2. The decoder Dec : Y∗ × N → {0, 1}∗ maps a noisy codeword y ∈ Y∗ and an integer k (which
      corresponds to the length of the information word) to an information word x̂ ∈ {0, 1}k .
    In our construction, the encoder and the decoder are computed by deterministic algorithms. Construc-
tions based on ensembles require an additional input whose purpose is to serve as shared randomness that
selects the specific code from the ensemble. (See Section 2.3 for a formal definition.)

Conventions. We let Enc(x, [1 : n]) denote the first n bits of the codeword that encodes the information
word x. Namely, Enc(x, [1 : n]) is the binary string c = (c1 , . . . , cn ), where ci = Enc(x, i). A rateless code
defines (n, k) codes for every n and k via
                                         Cn,k ≜ {Enc(x, [1 : n]) | x ∈ {0, 1}^k } .                                        (2.1)
When the information word length is clear from the context, we abbreviate Dec(y, k) to Dec(y).
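The interface of Definition 2.1 and the induced codes Cn,k of (2.1) can be sketched with a toy encoder. The parity rule below is ours and deliberately not capacity-achieving; it is chosen only to make Enc total and systematic.

```python
import itertools

def enc(x, i):
    """Enc(x, i): the i-th bit of the (infinite) encoding of x.
    Toy rule: systematic bits first, then cyclic prefix parities."""
    k = len(x)
    if i <= k:
        return x[i - 1]
    return sum(x[:(i - k - 1) % k + 1]) % 2

def enc_prefix(x, n):
    """Enc(x, [1:n]), the first n codeword bits."""
    return [enc(x, i) for i in range(1, n + 1)]

def code(n, k):
    """C_{n,k} = { Enc(x, [1:n]) : x in {0,1}^k }."""
    return {tuple(enc_prefix(list(x), n))
            for x in itertools.product([0, 1], repeat=k)}
```

Since this encoder is systematic, every prefix of length n ≥ k determines x, so |Cn,k| = 2^k.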
Remark 2.2 (Additional features). In some scenarios it is beneficial to have a rateless code with the
following additional features.
    • (Linearity) A rateless code is linear if Enc is a linear function. Namely, for x ∈ GF(2)^k , we have
                                                        Enc(x, i) = Ri · x ,
       where {R_i}_{i=1}^∞ is an infinite sequence of row vectors R_i ∈ GF(2)^k . We refer to the infinite matrix
       G = {R_i}_{i=1}^∞ as the generator matrix of the code.5


    • (Systematic) An encoding is systematic if, for every x ∈ {0, 1}k , we have Enc(x, [1 : k]) = x.
    We define the (maximum) error function of a rateless code (Enc, Dec) over a channel W with respect
to information word length k and codeword length n by
                     Perr(W, Enc, Dec, k, n) ≜ max_{x∈{0,1}^k} Pr[Dec(W^n (Enc(x, [1 : n])), k) ≠ x] .

    Let C(W ) denote the Shannon capacity of a channel W . Let F denote a family of MBIOS channels.
Definition 2.3 (capacity-achieving rateless code with a universal encoder and decoder). A rateless code
(Enc, Dec) achieves capacity with respect to a family of channels F if, for every channel W ∈ F and every
δ ∈ (0, C(W)), if n(k) ≜ k/(C(W) − δ), then
                                           lim_{k→∞} Perr(W, Enc, Dec, k, n(k)) = 0 .                                          (2.2)
   5 We use the convention that the generating matrix of Cn,k is an n × k matrix G and the information word x is a column
vector in GF(2)^k . A codeword y is obtained by the multiplication y = Gx.



In Definition 2.3, we assume that neither the encoder nor the decoder knows which channel W is chosen
from F. One can consider the variation of this definition in which only the encoder is oblivious to the
channel. In such a variation, the decoder depends on W . To emphasize this dependency we denote the
decoder by DecW .

Definition 2.4 (capacity-achieving rateless code with a universal encoder and channel-dependent
decoder). A rateless code (Enc, {DecW }W ∈F ) achieves capacity if, for every channel W ∈ F and every
δ ∈ (0, C(W)), if n(k) ≜ k/(C(W) − δ), then

                                       lim_{k→∞} Perr(W, Enc, DecW , k, n(k)) = 0 .                                 (2.3)


2.3   Randomized rateless codes
For the sake of future reference, we extend the definitions of rateless codes to the setting of randomized
constructions. (These definitions are not used in this paper since all our constructions are deterministic,
and the reader may safely skip them.) Roughly speaking, we assume that to encode/decode the n-bit
prefix of the codeword the encoder and the decoder consume a shared random string ρ of length `(k, n)
where k is the length of the information word. (Equivalently, the two algorithms have an access to an
infinite string and they use its `(k, n) prefix to generate/decode the n-bit prefix of the code.) The error
probability is measured naturally with respect to a randomly chosen string ρ. We proceed with formal
definitions. Below we consider a binary-input channel with an output alphabet Y.

Definition 2.5 (randomized rateless code). A randomized rateless code with randomness complexity of
` : N × N → N is a pair of functions (Enc, Dec) such that:

   1. The encoder Enc : {0, 1}∗ × N × {0, 1}∗ → {0, 1} takes an information word x ∈ {0, 1}∗ , an index
      i ∈ N, and a random string ρ ∈ {0, 1}`(|x|,i) and outputs the i-th bit of the encoding of x.

   2. The decoder Dec : Y∗ × N × {0, 1}∗ → {0, 1}∗ maps a noisy codeword y ∈ Y∗ , an integer k (which
      corresponds to the length of the information word) and a random string ρ ∈ {0, 1}`(k,|y|) to an
      information word x̂ ∈ {0, 1}k .

We always assume that the randomness complexity function ` : N × N → N is monotone increasing in
both arguments.

    We let Encρ (x, [1 : n]) denote the first n bits of the codeword that encodes the information word x using
the (same) random string ρ ∈ {0, 1}`(|x|,n) . Namely, Encρ (x, [1 : n]) is the binary string c = (c1 , . . . , cn ),
where ci = Enc(x, i, ρ[1 : `(k, i)]). We also write Decρ (y, k) to denote Dec(y, k, ρ).
    We define the error function of a randomized rateless code (Enc, Dec) over a channel W with respect
to information word length k and codeword length n by

                  Perr(W, Enc, Dec, k, n) ≜ max_{x∈{0,1}^k} Pr_ρ[Decρ (W^n (Encρ (x, [1 : n])), k) ≠ x] ,

where the probability is taken over the distribution of the channel and over a uniform choice of
ρ ← {0, 1}^{`(k,n)} . A randomized rateless code (Enc, Dec) achieves capacity with respect to a family


F of channels with a universal encoder and decoder (or a universal encoder and channel-dependent
decoder) if its error function satisfies Definition 2.3 (Definition 2.4, resp.).
    One may study variants of the above definition in which the encoder/decoder have some limited
access to the randomness. For example, it may be the case that the infinite random string is partitioned
into short blocks and the i-th bit of the codeword can be generated solely based on the i-th block. (The
encoder does not have to remember “old parts” of the random string.) Indeed, this property is satisfied by
the “random matrix construction” in which the i-th bit is generated by sending the inner product of x with
a random k-bit vector Ri . The study of other variants is left for future research.
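The block-structured variant can be sketched by deriving each row R_i from the i-th block of the shared randomness; below we stand in for the shared string with a seeded PRG (an illustration of the interface, not a cryptographic-quality choice, and the derivation rule is ours):

```python
import random

def enc_bit(x, i, shared_seed):
    """Randomized rateless encoder: the i-th codeword bit is <R_i, x> over
    GF(2), where R_i depends only on block i of the shared randomness."""
    rng = random.Random(shared_seed * 1_000_003 + i)  # block i only
    R_i = [rng.randrange(2) for _ in range(len(x))]
    return sum(r & b for r, b in zip(R_i, x)) % 2
```

Because each bit depends only on its own randomness block, the encoder keeps no memory of "old parts" of the string, and a decoder holding the same seed regenerates R_i exactly.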


3    Our results
We present a capacity-achieving rateless code for MBIOS channels with a universal encoder. The code is
parameterized by β = β (k), an arbitrary slowly growing function of the information word
length k. The decoder DecW of an MBIOS channel W uses maximum likelihood (ML) decoding as
a subroutine. (However, computational efficiency is obtained by applying ML decoding only to short
blocks of the inner code of a concatenated code.) Let MBIOS denote the family of all MBIOS channels.
Theorem 3.1 (Main Theorem). There exists a systematic linear rateless code Cβ = (Enc, {DecW }W ∈MBIOS )
such that for every channel W ∈ MBIOS and every δ ∈ (0,C(W )), the following hold.
    1. If β (k) tends to infinity with k, then
                                   Perr(W, Enc, DecW , k, k/(C(W) − δ)) = 2^{−Ω(k/β³)} .

    2. Furthermore, if β and k are fixed and n tends to infinity then

                                     Perr(W, Enc, DecW , k, n) = 2^{−Ω(n/ log n)} .

The running time of the encoder is O(β · 2^{2β}) per codeword bit and can be computed by a circuit of depth
O(β + log n). The running time of the decoder is O(β + (R/β) · T_W^{ML}(β/R)) per codeword bit, where
R = k/n and T_W^{ML}(`) denotes the running time of ML-decoding over the channel W of a linear [`, β]-code.
Decoding can be computed in parallel by a circuit of depth O(PT_W^{ML}(β/R) + β + log n), where PT_W^{ML}(`)
denotes the parallel time of ML-decoding over the channel W of a linear [`, β]-code.

The “furthermore” part shows that the error decreases quickly even for fixed length inputs (and fixed β )
provided that the prefix is sufficiently long. If the gap-to-capacity δ is fixed then the construction can be
tweaked to work with some constant β = β (δ ). The resulting error function decreases exponentially as
follows.
Corollary 3.2. For every fixed δ > 0, there exists a constant β such that, for every W ∈ MBIOS with
C(W ) > δ , the rateless code Cβ satisfies
                              Perr(W, Enc, DecW , k, k/(C(W) − δ)) = 2^{−Ω(k)} .                 (3.1)


The running time of the encoder is constant per codeword bit and O(log n) in parallel. The running time
of the decoder is

                               O((C(W) − δ) · T_W^{ML}(O(1/(C(W) − δ))))

per codeword bit.

Universal Decoding. We remark that Theorem 3.1 and Corollary 3.2 can be restricted to families
of MBIOS channels F in which ML-decoding does not depend on the channel W ∈ F. Examples of
such families are: (1) BSC(p), for p ∈ (0, 1/2), where ML-decoding finds a closest codeword in the
L1 distance (Hamming distance). (2) Binary-Input Additive White Gaussian Noise with noise variance
σ² (AWGN(σ)) for σ > 0, where ML-decoding finds a closest codeword in the L2 distance (Euclidean
distance). As the dependency of our decoder DecW on the channel W stems only from the need to
compute ML-decoding, it follows that in such families of channels the decoder is universal and does not
depend on the channel.
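For the BSC, this universal decoder is simply nearest-codeword search in Hamming distance. The following brute-force sketch (function names are ours, chosen for illustration) makes this concrete; it is exponential in the codebook size, like the explicit construction below.

```python
def hamming(a, b):
    """Hamming (L1) distance between two binary tuples."""
    return sum(ai != bi for ai, bi in zip(a, b))

def ml_decode_bsc(y, codewords):
    """Universal ML-decoding for any BSC(p) with p < 1/2: return the
    codeword closest to the received word y in Hamming distance.
    The same decoder works for every p, i.e., it is channel-independent."""
    return min(codewords, key=lambda c: hamming(c, y))
```

For example, for the 3-bit repetition code {000, 111}, any single flipped bit is corrected regardless of the (unknown) crossover probability p.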


4     Inefficient explicit rateless code for BSC
In this section we present a computationally inefficient6 explicit construction of a rateless code that
achieves capacity with respect to the family of all binary symmetric channels BSC(p), where p ∈ (0, 1/2).
The decoder in this construction is universal and is simply a maximum-likelihood decoder (i. e., minimizes
the L1 -distance). This code will be later used as the inner code of our final construction.
Theorem 4.1. There exists a systematic linear rateless code (Enc, ML) with the following properties.
Capacity achieving and decaying error: For every p ∈ (0, 1/2) and δ ∈ (0, C(BSC(p))),

          Perr(BSC(p), Enc, ML, k, k/(C(BSC(p)) − δ)) = 2^{−Ω(k/ log k)}      (as k → ∞),          (4.1)
          Perr(BSC(p), Enc, ML, k, n) = 2^{−Ω(n/ log n)}                      (fixed k, n → ∞).    (4.2)

Complexity: Encoding and decoding of k-bit information words and n-bit codewords can be done in
    time O(nk · 2^{2k}).
    The encoder multiplies the generating matrix by the information word. Each row of the generating
matrix can be computed in time O(k · 2^{2k}). Hence, the generating matrix of Cn,k can be computed in time
O(nk · 2^{2k}). Both the encoder and the decoder require the generating matrix. Once the generating matrix of
Cn,k has been computed, the running times of the encoding and the decoding are as follows.
     • The encoding Enc(m, [n : 1]) of m ∈ {0, 1}^k can be computed in time O(n · k).
     • Maximum-likelihood decoding of y ∈ {0, 1}^n can be done in time O(n · k · 2^k).
    In the following sections we describe the construction of the generating matrix of the code and
analyze the error of the maximum likelihood decoder.
    6 Namely, the running time required to compute the rows of the generating matrix and decoding is exponential in k.




4.1    Computing the generating matrix
Our goal is to construct an infinite generating matrix G with k columns. Let Ri ∈ {0, 1}^k denote the
ith row of the generating matrix. Let Gn denote the n × k matrix whose rows are (Ri)i=1...n. Let
Cn,k denote the code generated by Gn. The generating matrix G begins with the k × k identity matrix,
and hence each code Cn,k is systematic. Subsequent rows Ri (for i > k) of the generating matrix are
constructed one by one. Let

                                  Wi,n ≜ {x ∈ {0, 1}^k : wt(Gn · x) = i}

denote the ith weight class of Cn,k. The rows are chosen so that the weight distribution (|W1,n|, . . . , |Wn,n|)
of Cn,k is close to that of a random [n, k]-linear code C∗n,k. Note that when a row vector Rn+1 is added, if
x ∈ {0, 1}^k is orthogonal to Rn+1, then wt(Gn+1 · x) = wt(Gn · x); otherwise, wt(Gn+1 · x) = wt(Gn · x) + 1.
Thus Rn+1 splits each weight class Wi,n into two parts: the orthogonal vectors, which “remain” in Wi,n+1,
and the non-orthogonal vectors, which are “elevated” to Wi+1,n+1.

Definition 4.2. A vector R ∈ GF(2)^k ε-splits a set S ⊆ GF(2)^k if

                    (1/2 − ε) · |S| ≤ |{s ∈ S : s · R = 1}| ≤ (1/2 + ε) · |S| .

A vector R ∈ GF(2)^k ε-elevates a set S ⊆ GF(2)^k if

                                  |{s ∈ S : s · R = 1}| ≥ ε · |S| .
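Definition 4.2 translates directly into two predicates over 0/1 tuples; a small sketch (the helper names are ours):

```python
def dot(r, s):
    """Inner product over GF(2); vectors are 0/1 tuples."""
    return sum(ri & si for ri, si in zip(r, s)) % 2

def splits(R, S, eps):
    """R eps-splits S: the number of s in S with s . R = 1 lies within
    an eps fraction of |S|/2 on either side."""
    ones = sum(dot(R, s) for s in S)
    return (0.5 - eps) * len(S) <= ones <= (0.5 + eps) * len(S)

def elevates(R, S, eps):
    """R eps-elevates S: at least an eps fraction of S is non-orthogonal to R."""
    return sum(dot(R, s) for s in S) >= eps * len(S)
```

For instance, over the seven nonzero vectors of GF(2)³, the vector R = (1,0,0) is non-orthogonal to exactly four of them, so it (1/8)-splits and (1/8)-elevates that set.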

     The algorithm for computing the rows Ri of G for i > k is listed as Algorithm 1 (following the high-level
intuition provided in Section 1.2). Roughly speaking, the algorithm ε-splits all “large” weight
classes (which contain at least 2n² information words), and employs a marking strategy to deal with
“small” weight classes Wi,n (which contain fewer than 2n² information words). Initially, all the nonzero
information words are unmarked. Once an information word becomes a member of a small weight class,
it is marked, and remains marked forever (even if it later belongs to a large weight class). The unmarked
vectors in Wi,n are denoted by Ŵi,n. By definition, the set Ŵi,n is either empty or large, and so, by a simple
probabilistic argument, there exists a vector Rn+1 which simultaneously ε-splits all the sets Ŵi,n, for
1 ≤ i ≤ n. In addition, Rn+1 is required to elevate the set of nonzero codewords of minimum weight.
As we will later see, the distance of the resulting code grows sufficiently fast as a function of n, and its
weight distribution is sufficiently close to the expected weight distribution of a random linear code.
     We remark that (according to the analysis) the 1/8-elevation of Wd,n can be skipped if Ŵd,n ≠ ∅
(namely, the elevation is required only if every vector in Wd,n is marked). It is not hard to verify that
Algorithm 1 can compute the first n rows in time O(nk · 2^{2k}).

4.2    Success in finding a new row
In this section we prove that Algorithm 1 succeeds in finding a new row for the generating matrix.

Lemma 4.3. Algorithm 1 always finds a suitable vector Rn+1 in Line 3d.

We begin with the following claim.


Algorithm 1 Compute-Generating-Matrix - An algorithm for computing rows Rn of the generating matrix
of the rateless code for n > k.
   1. Let (R1, . . . , Rk) be the rows of the k × k identity matrix.

   2. Initialize the set of marked information words M ← ∅.

   3. For n = k to ∞ do

        (a) For 1 ≤ i ≤ n, let Wi,n be the set of information words that are encoded by a codeword of
            weight i.
        (b) Let d > 0 be the minimal positive integer for which Wd,n is non-empty.
        (c) For every i, if |Wi,n \ M| < 2n², then mark all the information words in Wi,n by setting
            M ← M ∪ Wi,n. Let Ŵi,n ≜ Wi,n \ M denote the unmarked vectors in Wi,n.
        (d) Let Rn+1 be the lexicographically first vector in GF(2)^k that simultaneously 1/(2√n)-splits
            every unmarked weight class Ŵi,n and 1/8-elevates Wd,n.
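One iteration of step 3 can be rendered as the following brute-force sketch, exponential in k as in the O(nk · 2^{2k}) bound above (all function names and the toy parameters are ours):

```python
from itertools import product

def dot(r, s):
    """Inner product over GF(2)."""
    return sum(ri & si for ri, si in zip(r, s)) % 2

def splits(R, S, eps):
    ones = sum(dot(R, s) for s in S)
    return (0.5 - eps) * len(S) <= ones <= (0.5 + eps) * len(S)

def elevates(R, S, eps):
    return sum(dot(R, s) for s in S) >= eps * len(S)

def next_row(rows, k, marked, threshold, eps):
    """One iteration of step 3 of Algorithm 1: partition the nonzero
    information words into weight classes, mark small classes, and return
    the lexicographically first vector that eps-splits every unmarked
    class and 1/8-elevates the minimum-weight class."""
    classes = {}
    for x in product((0, 1), repeat=k):
        if any(x):
            w = sum(dot(row, x) for row in rows)      # wt(G_n * x)
            classes.setdefault(w, []).append(x)
    d = min(classes)                                  # minimal positive weight
    for cls in classes.values():                      # marking step (3c)
        if len([x for x in cls if x not in marked]) < threshold:
            marked.update(cls)
    hats = [[x for x in cls if x not in marked] for cls in classes.values()]
    for R in product((0, 1), repeat=k):               # lexicographic search (3d)
        if all(splits(R, hat, eps) for hat in hats if hat) \
                and elevates(R, classes[d], 1 / 8):
            return R
```

With k = 2 and the identity rows as a starting point, every class is small and gets marked, and the search returns the first vector elevating the minimum-weight class.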



Claim 4.4. For every set W ⊆ {0, 1}^k \ {0^k} of size at least 2n², there are more than 2^k · (1 − 1/(2n))
vectors that 1/(2√n)-split W.

Proof. Let W = {x1, . . . , xm}, where m ≥ 2n². Let R denote a random vector chosen uniformly in
{0, 1}^k. This uniform distribution induces 0–1 random variables Z1, . . . , Zm defined by Zi = R · xi. The
expectation of each random variable Zi is 1/2, and the variance of each Zi is 1/4. (However, they are not
independent.) Since the elements of W are distinct and nonzero, the random variables {Zi}i are pairwise
independent. By Chebyshev’s Inequality,

                    Pr[ |(1/m) · ∑_{i=1}^m Zi − 1/2| > 1/(2√n) ] < 1/(2n) .                    (4.3)

Finally, note that R is a 1/(2√n)-splitter for W if and only if

                              |(1/m) · ∑_{i=1}^m Zi − 1/2| ≤ 1/(2√n) .
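The pairwise independence used above can be checked exhaustively for small k: for any two distinct nonzero words x1, x2, the pair (R · x1, R · x2) is uniform over {0, 1}² when R is uniform. A quick sanity check (the specific words are an arbitrary illustration):

```python
from itertools import product
from collections import Counter

def dot(r, x):
    """Inner product over GF(2)."""
    return sum(ri & xi for ri, xi in zip(r, x)) % 2

k = 4
x1, x2 = (1, 0, 1, 0), (0, 1, 1, 0)   # distinct nonzero information words
counts = Counter((dot(R, x1), dot(R, x2)) for R in product((0, 1), repeat=k))
# each of the four outcome pairs appears equally often (2^k / 4 times each),
# i.e., Z1 = R.x1 and Z2 = R.x2 are pairwise independent
assert counts == {(0, 0): 4, (0, 1): 4, (1, 0): 4, (1, 1): 4}
```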

Proof of Lemma 4.3. If |Wi,n \ M| < 2n², then all the information words in Wi,n are marked, and Ŵi,n is
empty. Therefore, Ŵi,n is either empty or of size at least 2n². It follows, by a union bound, that more than
half of the k-bit vectors simultaneously 1/(2√n)-split each set Ŵi,n, for 1 ≤ i ≤ n. Therefore, to prove
the lemma it suffices to show that at least half of the vectors R 1/8-elevate the set Wd,n.
    Note that any 3/8-splitter of Wd,n is also a 1/8-elevator of this set. In case |Wd,n| ≤ 8, pick a vector
x ∈ Wd,n. Half of the vectors are not orthogonal to x, and hence at least half of the vectors are 1/8-elevators of
Wd,n. If |Wd,n| ≥ 9, we can apply the argument of the above Claim 4.4 and get that at least half of the
vectors R 1/8-elevate Wd,n. This completes the proof of Lemma 4.3.



4.3    Weight distribution
In this section we analyze the weight distribution of the linear code Cn,k. We let wi,n be the size of Wi,n,
the set of information words whose encoding under Cn,k has Hamming weight i. We will show that wi,n is
not far from the expected weight distribution w∗i(n,k) ≜ (n choose i) · (2^k − 1) · 2^{−n} of a random [n, k] linear code.
Claim 4.5. After n iterations, the number of marked information words is less than 2n⁴.
Proof. For every i, n′ ≤ n, the set Wi,n′ contributes fewer than 2n² information words to the set M of marked
words. Hence there are at most 2n⁴ marked vectors after Rn is chosen.

Claim 4.6.
                              |Ŵi,k+t| ≤ (2/3)^t · (k+t choose i) .                                 (4.4)

Proof. By induction on t. The case of t = 0 holds because |Wi,k| = (k choose i). To prove the induction step for
t + 1, recall that the choice of Rk+t+1 splits each Ŵi,k+t so that

                    |Ŵi,k+t+1| ≤ (2/3) · (|Ŵi−1,k+t| + |Ŵi,k+t|) .

The induction hypothesis for t now implies that

                    |Ŵi,k+t+1| ≤ (2/3)^{t+1} · ((k+t choose i−1) + (k+t choose i))
                               = (2/3)^{t+1} · (k+t+1 choose i) ,

and the claim follows.
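The last equality is Pascal's rule, (k+t choose i−1) + (k+t choose i) = (k+t+1 choose i), which can be sanity-checked directly (the parameters are arbitrary):

```python
import math

# Pascal's rule, used in the induction step of Claim 4.6
k, t = 5, 3
assert all(
    math.comb(k + t, i - 1) + math.comb(k + t, i) == math.comb(k + t + 1, i)
    for i in range(1, k + t + 2)
)
```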


Claim 4.7. For every n and i, we have that wi,n ≤ 2n⁴ + w∗i(n,k) · Πk,n, where

                    Πk,n ≜ ∏_{j=k+1}^{n−1} (1 + 1/√j) ≤ e^{2(√n − √k)} .

Proof. By Claim 4.5, it suffices to bound the number of unmarked vectors by

                              |Ŵi,n| ≤ w∗i(n,k) · Πk,n .                                         (4.5)

Indeed, |Ŵi,n| and w∗i(n,k) satisfy the following recurrences:

                    w∗i(n,k) = (1/2) · (w∗i−1(n−1,k) + w∗i(n−1,k)) ,

                    |Ŵi,n| ≤ (1 + 1/√(n−1)) · (1/2) · (|Ŵi−1,n−1| + |Ŵi,n−1|) .



We can now prove equation (4.5) by induction on n ≥ k. Indeed, wi,k = w∗i,k, and

          |Ŵi,n| ≤ (1 + 1/√(n−1)) · (1/2) · (w∗i−1(n−1,k) · Πk,n−1 + w∗i(n−1,k) · Πk,n−1)
                 = (1/2) · (w∗i−1(n−1,k) + w∗i(n−1,k)) · Πk,n
                 = w∗i(n,k) · Πk,n .

The claim follows.
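The bound Πk,n ≤ e^{2(√n − √k)} follows from 1 + x ≤ e^x together with ∑_{j=k+1}^{n−1} 1/√j ≤ ∫_k^n dx/√x = 2(√n − √k); it can also be checked numerically (the parameter pairs below are arbitrary):

```python
import math

def Pi(k, n):
    """The product Pi_{k,n} from Claim 4.7."""
    return math.prod(1 + 1 / math.sqrt(j) for j in range(k + 1, n))

for k, n in [(4, 50), (10, 200), (25, 1000)]:
    assert Pi(k, n) <= math.exp(2 * (math.sqrt(n) - math.sqrt(k)))
```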

We will also need to prove that the distance of Cn,k is sufficiently large.

Claim 4.8. For every n > k, the minimum distance of the code Cn,k is greater than (n − k)/(55 · log n).

Proof. It is easier to view the evolution of the weight distribution of Cn,k as a process of shifting balls
in n bins. A ball represents a nonzero information word, and a bin corresponds to a weight class. We
assume that bin(1) is positioned on the left, and bin(n) is positioned on the right. Moving (or shifting) a
ball one bin to the right means that the augmentation of the generating matrix by a new row increases the
weight of the encoding of the information word by one. Note that, as the generating matrix is augmented
by a new row, a ball either stays in the same bin or is shifted by one bin to the right.
Step t of the process corresponds to the weight distribution of Cn′,k for n′ = t + k. Let bin_t(i) denote
the set of balls in bin(i) after step t. By Algorithm 1, the process treats marked balls and unmarked balls
differently.
    Let

                    α ≜ 2/ log₂(8/7) < 11        and        ∆ ≜ (n − k)/(α log(2n⁴)) .

Using these terms, we prove a slightly stronger minimum distance bound, namely,

                              bin_{n−k}(i) = ∅ ,    ∀i ≤ ∆ .                                (4.6)

    The proof is divided into two parts. First we consider the unmarked balls, and then we consider the
marked balls. We begin by proving that

                              bin_{(n−k)/2}(i) \ M = ∅ ,    ∀i ≤ ∆ .                             (4.7)

Namely, after (n − k)/2 iterations of Algorithm 1, the bins bin_{(n−k)/2}(1), . . . , bin_{(n−k)/2}(∆) may contain
only marked balls. Note that, as balls never move left, these bins remain without unmarked balls in all
subsequent steps.
     The proof of equation (4.7) is based on Claim 4.6. For t = (n − k)/2 and i ≤ ∆, the RHS of
equation (4.4) is smaller than 1, and so equation (4.7) follows.
     To prove that bin_{n−k}(i) ∩ M = ∅ for every i ≤ ∆, let t(i) ≜ (n − k)/2 + i · log_{8/7}(2n⁴). Note that
t(∆) = n − k. We wish to prove, by induction on i, that the leftmost bin with a marked ball after t(i)
iterations is bin(i + 1). After log_{8/7}(2n⁴) additional iterations, also bin(i + 1) lacks marked balls. In this


manner, after (n − k) iterations all the marked balls are pushed to the right of bin(∆). Formally, we claim
that
                              bin_{t(i)}(j) ∩ M = ∅ ,    ∀j ≤ i .                                (4.8)

Equation (4.8) suffices because t(∆) = n − k, and hence it implies that bin_{n−k}(j) = ∅ for every j ≤ ∆,
as required. The proof of equation (4.8) is by induction on i. For i = 0 the claim is trivial (because the
code is systematic, every nonzero information word is encoded to a nonzero word). The induction step for
i > 0 is as follows. For every t(i − 1) < t ≤ t(i), if bin_t(i) contains a marked ball, then, by the induction
hypothesis, it is the leftmost bin that contains a marked ball. Hence, each new row Rt+1 of the generating
matrix 1/8-elevates bin_t(i). Since bin_t(i) consists only of marked balls, by Claim 4.5, it follows that
|bin_{t(i−1)}(i)| < 2n⁴. Hence, after log_{8/7}(2n⁴) steps, the bin is emptied, namely, bin_{t(i)}(i) = ∅, as required.
    We proved that bin_{n−k}(i) is empty if i ≤ ∆, and the claim follows.

    Overall, Claim 4.7 and Claim 4.8 imply that Cn,k is close to an “average” code in the following sense.
Let α ≜ 2/ log₂(8/7) < 11.

Lemma 4.9. The weight distribution of the constructed code Cn,k satisfies the following bound:

                    wi,n ≤ { 0                              if 0 < i ≤ (n − k)/(α log(2n⁴)) ,
                           { 2n⁴ + w∗i(n,k) · Πk,n          if i > (n − k)/(α log(2n⁴)) .           (4.9)




4.4    Analysis of the ML decoding error
In this section we complete the proof of Theorem 4.1. Let ML be the maximum-likelihood decoder which,
given a noisy codeword y ∈ {0, 1}^n and k, finds a closest (in L1 distance) codeword ŷ ∈ Cn,k and outputs
the message m ∈ {0, 1}^k for which Gn · m = ŷ.

Proof of Theorem 4.1. Fix p and δ, and consider n and k such that n ≥ k/(C(BSC(p)) − δ). Let δGV be
the root x ∈ (0, 1/2) of the equation H(x) = 1 − k/n. Since the code is linear, we may assume without
loss of generality that the all-zero codeword was transmitted. Our goal is to upper-bound the probability of
the event that ŷ, the codeword computed by the ML-decoder, is nonzero. We divide the analysis into two
cases based on the Hamming weight of ŷ.

Case 1: ŷ is of positive weight smaller than δGV · n. For a fixed codeword z of weight i > 0, erroneous
decoding to z (instead of to the all-zero codeword) corresponds to the event that the BSC(p) flipped
at least i/2 bits in the support of z. (The support of z is the set {j : zj = 1}.) This event happens with
probability

                    Pi ≜ ∑_{j=⌈i/2⌉}^{i} (i choose j) · p^j · (1 − p)^{i−j} .

By a union-bound, we can upper-bound the probability of the event that 0 < wt(ŷ) < δGV · n by

          ∑_{i=1}^{δGV·n−1} wi,n · Pi ≤ ∑_{i=(n−k)/(55 log n)}^{δGV·n−1} (2n⁴ + e^{2√n}) · Pi .        (4.10)




Indeed, Lemma 4.9 implies that (i) wi,n = 0 if i ≤ (n − k)/(55 log n), and (ii) if i/n < δGV, then
wi,n ≤ 2n⁴ + e^{2√n} (since w∗i(n,k) < 1 if i/n < δGV). Below, we show that

                                        Pi ≤ 2^{−β·i} ,                                          (4.11)

where β ≜ −(1/2) · log₂(4p(1 − p)) is positive since p ∈ (0, 1/2). It follows that the error probability
(equation (4.10)) is upper-bounded by

                    (2n⁴ + e^{2√n}) · ∑_{i=(n−k)/(55 log n)}^{δGV·n} 2^{−β·i} ≤ e^{−Ω(n/ log n)} .

It is left to prove equation (4.11). Indeed, by definition, Pi satisfies

     Pi ≜ ∑_{j=⌈i/2⌉}^{i} (i choose j) · p^j · (1 − p)^{i−j} ≤ p^{i/2} · (1 − p)^{i/2} · ∑_{j=⌈i/2⌉}^{i} (i choose j) ≤ p^{i/2} · (1 − p)^{i/2} · 2^i ,

and the right-hand side can be written as (4p(1 − p))^{i/2}. Because p < 1/2, it follows that β > 0, and Pi ≤ 2^{−β·i}, as
required.
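The bound Pi ≤ 2^{−β·i} = (4p(1 − p))^{i/2} can be verified numerically over a range of parameters (the specific values of p and i below are an arbitrary illustration):

```python
import math

def P(i, p):
    """Probability that BSC(p) flips at least ceil(i/2) of i given bits,
    i.e., the tail P_i from Case 1."""
    return sum(math.comb(i, j) * p ** j * (1 - p) ** (i - j)
               for j in range(math.ceil(i / 2), i + 1))

for p in (0.05, 0.1, 0.25, 0.4):
    beta = -0.5 * math.log2(4 * p * (1 - p))
    assert beta > 0                                       # since p < 1/2
    assert all(P(i, p) <= 2 ** (-beta * i) for i in range(1, 60))
```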

Case 2: ŷ is of weight greater than δGV · n. In this regime, the spectrum of our code is sufficiently
close to that of a random linear code, and so the error of the ML-decoding can be analyzed via (an
extension of) Poltyrev’s bound [21] (see also [29]). The extension bounds the probability of the event
that ML-decoding returns a “heavy” word. Note that no assumption is made on the minimum distance of
the code. The proof is based on an analysis in [3] and is deferred to Section 7.
Theorem 4.10 (extension of Thm. 1 of [21]). Let p ∈ (0, 1/2) be a constant, δ > 0 be a constant such
that k/n < C(BSC(p)) − δ, and τ ∈ [0, 1] be a threshold parameter. There exists a constant α > 0 for
which the following holds. If C is an [n, k] linear code whose weight distribution {wi(Cn)}i satisfies

                    wi ≤ 2^{(δ/3)n} · w∗i(n,k)        for every i ≥ τn,

then the probability over BSC(p) that the all-zero word with noise is ML-decoded to a codeword of
weight at least τn is at most 2^{−αn}.
     Observe that the weight distribution of our code satisfies Poltyrev’s criterion for codewords of weight at
least τn = δGV · n. Indeed, in this regime w∗i(n,k) ≥ 1, and so the upper bound wi,n ≤ 2n⁴ + w∗i(n,k) · Πk,n
from Lemma 4.9 simplifies to wi,n ≤ w∗i(n,k) · 2^{o(n)}. We therefore conclude that the decoding error in case
(2) is 2^{−Ω(n)}.
     By combining the two cases, we conclude that the error probability is at most 2^{−Ω(n/ log n)}, as
required.


5    Extension to MBIOS channels
In this section we prove that the systematic linear rateless code presented in Section 4 achieves capacity
for MBIOS channels. The only modification that is required is that the decoder is an ML-decoder with
respect to the channel (hence the encoder is universal but the decoder depends on the channel).
    Let MLW denote an ML-decoder with respect to the channel W . For a channel W , let (Enc, MLW )
denote the systematic linear rateless code where Enc is the encoder presented in Section 4.


Theorem 5.1. For every W ∈ MBIOS and every δ ∈ (0, C(W)),

          Perr(W, Enc, MLW, k, k/(C(W) − δ)) = 2^{−Ω(k/ log k)}        (as k → ∞),          (5.1)
          Perr(W, Enc, MLW, k, n) = 2^{−Ω(n/ log n)}                   (fixed k, n → ∞).    (5.2)

Proof. The proof is a consequence of [25, Appendix], which states that

          Perr(W, Enc, MLW, k, n) ≤ n · Perr(BSC(p), Enc, MLBSC(p), k, n)
                                      + H(Perr(BSC(p), Enc, MLBSC(p), k, n))

if C(W) ≤ C(BSC(p)) (where H(x) denotes the binary entropy function).7 Note that H(x) ≤ 4√x for
x ∈ [0, 1], hence the theorem follows from Theorem 4.1.


6     Efficient rateless code for MBIOS channels
In this section we prove Theorem 3.1 by constructing a computationally efficient capacity-achieving
rateless code with a universal encoder for MBIOS channels. The construction is based on a concatenated
code to obtain computationally efficient encoding and decoding as well as amplification of the probability
of successful decoding. The inner code is the computationally inefficient rateless code described in
Section 4. The outer code is taken from [12, Lemma 1]. The rate of the outer code tends to one as the
block length increases. The outer code is resilient to an adversarial channel that corrupts at most Θ(r)
symbols, where r is the redundancy. Encoding and decoding of the outer code require almost linear time.
    We define the rateless code (Enc′, {Dec′_W}_W) via its restriction to information words of length k and
codewords of length n. The code is the concatenation of an [nout, kout] outer code Cout and an [nin, kin]
inner code Cin, defined as follows.

Inner Code. The inner code Cin is the computationally inefficient rateless code described in Section 4,
restricted to input length kin and output length nin. Recall that this is an [nin, kin] linear systematic code
over {0, 1} which can be encoded in time O(nin · kin · 2^{2kin}) and in parallel time O(kin). The error function
of ML-decoding is bounded by 2^{−Ω(nin/ log nin)} as long as kin/nin ≤ C(W) − δ.

Outer Code. The outer code Cout is taken from [12, Lemma 1]. It is an [nout, kout] linear systematic
code over an alphabet Σout with nout = kout · (1 + |Σout|^{−1/2}). Hence, the rate of the outer code tends to
one as the alphabet Σout grows. The outer code can be encoded in time O(nout · |Σout|^{1/2}). Decoding
in time O(nout · |Σout|) is successful as long as the fraction of errors is bounded by εout = Θ(|Σout|^{−1}).
Furthermore, the code can be encoded and decoded in parallel time O(log(nout · |Σout|)).
     We emphasize that the length k of the information word together with a parameter β determine the
outer code as well as the length kin of the information word in the inner code. For fixed k and β, the outer
code is fixed, and the rate of the inner code decreases as nin increases.
    7 The proof in [25] is with respect to the average error probability. However, the code is linear and the channel is symmetric,

hence the average and maximum error probabilities are equal.



Construction 6.1 (The concatenated code C^β_{n,k} = (Enc′, {Dec′_W}_W)). For lengths k and n, and a parameter
β, let

     |Σout| = kin = β ,    kout = k/ log₂ |Σout| ,    Lin = (nout · log₂ |Σout|)/kin ,    nin = n/Lin .

    • The encoder Enc′ of the concatenated code C^β_{n,k} maps a k-bit information word to an n-bit codeword as
      follows (see Figure 1).

              F_2^k ↪ (Σout)^kout → (Σout)^nout ↪ (F_2^kin)^Lin → (F_2^nin)^Lin .

      The four steps of the encoder are: (1) A message m ∈ {0, 1}^k is parsed as the message mout ∈
      (Σout)^kout. Namely, Σout = {0, 1}^{log₂ β}, and the message m is broken into kout blocks of length
      log₂ |Σout|. (2) The encoder of the outer code maps mout to a codeword cout ∈ (Σout)^nout. (3) The
      outer codeword cout is parsed as Lin messages (m_in^1, . . . , m_in^Lin), each over {0, 1}^kin. (4) The encoder
      of the inner code maps each message m_in^j to an inner codeword c_in^j ∈ {0, 1}^nin.

    • The decoder Dec′_W for channel W of the concatenated code C^β_{n,k} maps a noisy n-bit codeword to a
      k-bit information word as follows (see Figure 2).

              (F_2^nin)^Lin → (F_2^kin)^Lin ↪ (Σout)^nout → (Σout)^kout ↪ F_2^k .

      The four steps of the decoder correspond to the encoding steps in reversed order: (4) The decoder
      of the inner code is an ML-decoder and is applied in parallel to each noisy inner codeword ĉ_in^j ∈
      {0, 1}^nin. We denote the ML-decoding of ĉ_in^j ∈ {0, 1}^nin by m̂_in^j. (3) The Lin (inner) information
      words (m̂_in^1, . . . , m̂_in^Lin), each over {0, 1}^kin, are parsed as a noisy codeword ĉout ∈ (Σout)^nout of the
      outer code. (2) The decoder of the outer code maps the noisy codeword ĉout ∈ (Σout)^nout to a
      message m̂out ∈ (Σout)^kout. (1) The message m̂out is parsed as a message m̂ ∈ {0, 1}^k.
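The four encoding steps can be sketched with placeholder outer and inner encoders (the real codes come from [12] and Section 4; the function and parameter names here are ours, for illustration only):

```python
def concat_encode(m_bits, outer_enc, inner_enc, log_sigma, k_in):
    """Steps (1)-(4) of the encoder of Construction 6.1, with outer_enc and
    inner_enc supplied as stand-in black boxes."""
    # (1) parse the k message bits into outer symbols of log2|Sigma_out| bits
    m_out = [tuple(m_bits[i:i + log_sigma]) for i in range(0, len(m_bits), log_sigma)]
    # (2) encode with the outer code
    c_out = outer_enc(m_out)
    # (3) re-parse the outer codeword as L_in inner messages of k_in bits each
    flat = [b for sym in c_out for b in sym]
    m_in = [flat[i:i + k_in] for i in range(0, len(flat), k_in)]
    # (4) encode each inner message with the inner code
    return [inner_enc(block) for block in m_in]

# toy stand-ins: identity outer code, 2x repetition inner code
blocks = concat_encode([0, 1, 1, 0], lambda s: s, lambda b: b + b, 2, 2)
```

Interleaving the resulting inner codewords "row by row," as described next, is what turns this fixed-length concatenation into a rateless code.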
    The encoder of the rateless code (when n is not predetermined) outputs the encoding of m_in^1, . . . , m_in^Lin
“row by row.” Namely, after the ith bit of each inner encoding is output, the encoder outputs bit i + 1 of each
inner codeword. Hence, the code C^β_{n,k} is a prefix of the code C^β_{n′,k} for n < n′, and so the construction
defines a rateless code. Also note that the code is systematic, and the complexity of encoding is

                    O(nout · |Σout|^{1/2} + Lin · nin · kin · 2^{2kin}) = O(n · β · 2^{2β})

and the complexity of decoding is

     O(nout · |Σout| + Lin · nin · kin · 2^{2kin} + Lin · T_W^ML(nin)) = O(n · β · 2^{2β} + Lin · T_W^ML(nin)) .

(We assume pessimistically that the encoder and the decoder need to compute the generating matrix.)
Furthermore, encoding can be performed in parallel time O(kin + log(nout · |Σout|)) = O(β + log n)
and decoding in parallel time O(β + log n + PT_W^ML(nin)).
    In the following proof of Theorem 3.1, the error function is analyzed with respect to two scenarios:
(1) an arbitrarily slowly growing function β = β(k) and rate approaching the capacity; (2) fixed k and β
(and hence a fixed outer code), but decreasing rate of the inner code.
    In the following proof of Theorem 3.1, the error function is analyzed with respect to two scenarios:
(1) An arbitrarily slow growing function β = β (k) and rate approaching the capacity. (2) Fixed k and β
(and hence the outer code is fixed), but decreasing rate of the inner code.


Proof of Theorem 3.1. Let ĉin = (ĉ_in^1, . . . , ĉ_in^Lin) denote the noisy prefix of length n = nin · Lin of the encoding
of the message m. Let ê denote the fraction of the inner-code information words that are incorrectly
decoded by the ML-decoder. The decoder of the outer code is successful as long as ê < εout. (Note that
each decoded inner information word is parsed into kin/ log₂ |Σout| symbols of the outer code. Hence, the
fraction of erroneous outer-code symbols is also bounded by ê.) We bound the probability of the event
that ê ≥ εout using an additive Chernoff bound. Let εin denote the probability of erroneous decoding of a
noisy inner codeword ĉ_in^j. As the ML-decoding errors are Lin independent random events, we conclude
that
                                                                               2
                                        Pr[ê ≥ εout ] ≤ 2−2Lin (εout −εin ) .
    By Theorem 5.1, ε_in = e^{−Ω(k_in / log k_in)} if the rate of the inner code is less than C(W). Indeed, the rate
of the outer code is

                                               1/(1 + 1/√β) −→ 1   as β → ∞ .

Hence, if β is sufficiently large, then the rate of the inner code is less than C(W) − δ/2, as required.
   As k_in = β, it follows that ε_in = o(1/β), and ε_out − ε_in = Θ(1/β). By definition,

                                L_in = (n_out · log β)/β > (k_out · log β)/β = k/β ,

and so the bound on the error probability simplifies to 2^{−Ω(k/β³)}.
    In the second setting, when everything but n_in is fixed (i. e., k, β, and hence the outer code and k_in are
fixed), we bound the probability of the event that ê ≥ ε_out by a union bound over all ε_out-fractions of L_in.
Namely,

                                 Pr(ê ≥ ε_out) ≤ \binom{L_in}{ε_out · L_in} · ε_in^{ε_out · L_in} .

Recall that ε_out and L_in are fixed. By Theorem 5.1, ε_in = e^{−Ω(n_in / log n_in)}. Because n_in is linear in n, the
probability of the event is bounded by 2^{−Ω(n/ log n)}, as required.
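The two tail bounds used in this proof are easy to evaluate numerically. The Python sketch below compares the additive Chernoff bound with the union bound; the values chosen for L_in, ε_out, and ε_in are illustrative placeholders, not values derived from the construction:

```python
import math

def chernoff_bound(L_in: int, eps_out: float, eps_in: float) -> float:
    # Additive Chernoff bound: Pr[e-hat >= eps_out] <= 2^(-2 L_in (eps_out - eps_in)^2).
    return 2.0 ** (-2 * L_in * (eps_out - eps_in) ** 2)

def union_bound(L_in: int, eps_out: float, eps_in: float) -> float:
    # Union bound over all eps_out-fractions of the L_in inner blocks:
    # C(L_in, eps_out * L_in) * eps_in^(eps_out * L_in).
    t = math.floor(eps_out * L_in)
    return math.comb(L_in, t) * eps_in ** t

# Illustrative placeholder parameters.
L_in, eps_out, eps_in = 1000, 0.05, 0.01
print(chernoff_bound(L_in, eps_out, eps_in))
print(union_bound(L_in, eps_out, eps_in))
```

For these placeholder values, where ε_in is much smaller than ε_out, the union bound is far below the Chernoff bound, reflecting why it is the bound of choice in the second setting.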

     The proof of Corollary 3.2 is similar, except that now, when we are given the gap to capacity δ ahead
of time, we can set β to be a sufficiently large constant.


7    An extension of Poltyrev’s theorem (Theorem 4.10)
In this section we prove Theorem 4.10, restated here for the convenience of the reader. (Recall that
δ_GV(n, k) is the root x ∈ (0, 1/2) of the equation H(x) = 1 − k/n and that w^∗_i(n, k) ≜ \binom{n}{i} · 2^{k−n} denotes the
expected weight distribution of a random linear [n, k] code.)
Theorem 7.1 (Theorem 4.10 restated). Let p ∈ (0, 1/2) be a constant, δ > 0 be a constant such that
k/n < 1 − H(p) − δ, and τ ∈ [0, 1] be a threshold parameter. There exists a constant α > 0 for which the
following holds. If C is an [n, k] linear code whose weight distribution {w_i(C)}_i satisfies

                                   w_i ≤ 2^{(δ/3)n} · w^∗_i(n, k)    for every i ≥ τn ,

then the probability over BSC(p) that the all-zero word is ML-decoded to a codeword of weight at least
τn is at most 2^{−αn}.



[Figure 1 shows the encoder as a four-stage pipeline: (1) the message m ∈ {0, 1}^k is parsed as m_out = (m^1_out, . . . , m^{k_out}_out) over Σ_out; (2) m_out is encoded by the outer code into c_out = (c^1_out, . . . , c^{n_out}_out); (3) c_out is re-parsed as the inner information words m^1_in, . . . , m^{L_in}_in of k_in bits each; (4) each m^j_in is encoded into an inner codeword c^j_in.]

Figure 1: Encoder: concatenation of the outer code and the inner code. Recall that |Σ_out| = k_in = β,
k_out = k/log₂|Σ_out|, L_in = (n_out · log₂|Σ_out|)/k_in and n_in = n/L_in.


    Our proof will follow an analysis given by [3] (for a related statement). Let us first collect some
useful facts.

7.1   A technical lemma
The extension of the binomial coefficients to reals is defined by

                                     \binom{n}{k} = Γ(n + 1) / (Γ(k + 1) · Γ(n − k + 1)) ,                          (7.1)

where Γ(x) is the Gamma function that extends the factorial function to the real numbers. (In particular,
Γ(x) is monotone increasing for x ≥ 1 and Γ(x + 1) = x · Γ(x).)
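As a quick sanity check (a sketch, not part of the proof), the Gamma-function extension in equation (7.1) can be compared against the integer binomial coefficient:

```python
from math import comb, gamma

def real_binom(n: float, k: float) -> float:
    # Equation (7.1): the binomial coefficient extended to the reals via Gamma.
    return gamma(n + 1) / (gamma(k + 1) * gamma(n - k + 1))

# Agrees with the integer binomial coefficient at integer arguments.
for n in range(15):
    for k in range(n + 1):
        assert abs(real_binom(n, k) - comb(n, k)) <= 1e-6 * comb(n, k)
```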

Lemma 7.2. Let 0 < a ≤ b. Define the function f : [0, b] → ℝ by

                                         f(x) ≜ \binom{a}{x} · \binom{b}{x} .




[Figure 2 shows the decoder reversing the four stages of Figure 1: the noisy inner codewords ĉ^1_in, . . . , ĉ^{L_in}_in received over BSC(p) are each ML-decoded to an inner information word m^j_in (stage 4); these are reassembled into c_out (stage 3), which is decoded by the outer decoder to m_out (stage 2) and parsed as the message m ∈ {0, 1}^k (stage 1).]

Figure 2: Decoder of the concatenated code uses ML-decoding for the inner code and the decoder of the
outer code.

Then,

                           | arg max_x f(x) − (ab − 1)/(a + b + 2) | ≤ 1 ,                                           (7.2)

                           a² ≥ a + b  =⇒  max_x f(x) ≤ f( ab/(a + b) ) · (ab)⁴/(a + b)⁴ .                           (7.3)

Proof. Let t ≜ (ab − 1)/(a + b + 2). We first prove the following “discrete” monotonicity property.

   1. If i ≤ t, then f (i) ≤ f (i + 1).

   2. If i ≥ t, then f (i) ≥ f (i + 1).

The proof of this monotonicity property is by evaluating the quotient

                              Q ≜ f(i)/f(i + 1) = ( \binom{a}{i} \binom{b}{i} ) / ( \binom{a}{i+1} \binom{b}{i+1} ) .



It is easy to check that Q ≤ 1 if i ≤ t, and Q ≥ 1 if i ≥ t.
     Let x∗ ≜ arg max f(x). The monotonicity property implies that t ≤ x∗ ≤ t + 1, which proves the first
part of the lemma.
     Let y ≜ ab/(a + b). Note that y is also between t and t + 1. This implies that |x∗ − y| ≤ 1. Note
also that y ≥ 1, (a − y) = a²/(a + b) ≥ 1, and (b − y) = b²/(a + b) ≥ 1. Hence, by the properties of the
Gamma function, we obtain:
                  f(x∗)/f(y) = ( Γ²(y + 1) · Γ(a + 1 − y) · Γ(b + 1 − y) ) / ( Γ²(x∗ + 1) · Γ(a + 1 − x∗) · Γ(b + 1 − x∗) )
                             ≤ ( Γ²(y + 1) · Γ(a + 1 − y) · Γ(b + 1 − y) ) / ( Γ²(y) · Γ(a − y) · Γ(b − y) )
                             = y² · (a − y) · (b − y)
                             = ( ab/(a + b) )² · ( a²/(a + b) ) · ( b²/(a + b) )
                             = (ab)⁴/(a + b)⁴ .
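Both parts of Lemma 7.2 can be checked numerically. The sketch below evaluates f on a fine grid of [0, a]; the values of a and b are illustrative (chosen so that a² ≥ a + b), and lgamma is used to avoid overflow:

```python
from math import exp, lgamma

def lbinom(n: float, k: float) -> float:
    # Log of the real-valued binomial coefficient of equation (7.1).
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

def f(a: float, b: float, x: float) -> float:
    # f(x) = C(a, x) * C(b, x) from Lemma 7.2.
    return exp(lbinom(a, x) + lbinom(b, x))

a, b = 20.0, 50.0                       # illustrative; 0 < a <= b and a^2 >= a + b
t = (a * b - 1) / (a + b + 2)
y = a * b / (a + b)

# Approximate arg max of f by a grid search over [0, a].
grid = [a * i / 2000 for i in range(2001)]
x_star = max(grid, key=lambda x: f(a, b, x))

assert abs(x_star - t) <= 1                                      # part (7.2)
assert f(a, b, x_star) <= f(a, b, y) * (a * b / (a + b)) ** 4    # part (7.3)
```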

7.2     Proof of Theorem 4.10
The following proof is based on [3]. Let y be the received word when the all-zero word is transmitted (i. e.,
{y_i}_i are independent Bernoulli variables with probability p). Let ŷ denote the codeword computed by
the ML-decoder with respect to the input y. Our goal is to upper-bound the probability of the event that ŷ
has weight at least ℓ ≜ τn.
     Let ε > 0 denote a sufficiently small constant that depends only on p and δ; in particular, ε satisfies

                                            ε ≤ min{ 1/2 − p , p } .                                       (7.4)
      We divide the analysis into two cases based on the Hamming weight of y:

                            Pr_y(wt(ŷ) ≥ ℓ) ≤ Pr_y[ |wt(y) − np| > εn ]
                                             + Pr_y[ wt(ŷ) ≥ ℓ & |wt(y) − np| ≤ εn ] .


Case 1: The weight of y is far from np, i. e., |wt(y) − np| > εn.
By the additive Chernoff-Hoeffding inequality we know that

                                  Pr[ |wt(y) − np| > εn ] ≤ 2 · e^{−2ε²·n} = 2^{−Ω(n)} .
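Case 1 can be verified exactly for small n by summing the binomial tail and comparing it with the Hoeffding bound; the parameters below are illustrative:

```python
from math import comb, exp

def tail(n: int, p: float, eps: float) -> float:
    # Exact Pr[|wt(y) - n p| > eps n] for y with n i.i.d. Bernoulli(p) bits.
    return sum(comb(n, w) * p ** w * (1 - p) ** (n - w)
               for w in range(n + 1) if abs(w - n * p) > eps * n)

n, p, eps = 200, 0.1, 0.05             # illustrative parameters
assert tail(n, p, eps) <= 2 * exp(-2 * eps ** 2 * n)
```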

Case 2: The weight of y is close to np, i. e., |wt(y) − np| ≤ εn.
Let r , wt(y). Note that,
                                             pn − εn ≤ r ≤ pn + εn .                                   (7.5)



Let P_{ℓ,r} denote the following probability

                                         P_{ℓ,r} ≜ Pr_y[ wt(ŷ) ≥ ℓ & wt(y) = r ] .

Because all words y of weight r are equiprobable, we have

                          Pr_y[ wt(ŷ) ≥ ℓ | wt(y) = r ] = |{y : wt(y) = r, wt(ŷ) ≥ ℓ}| / |{y : wt(y) = r}| .

Hence,

                               P_{ℓ,r} = Pr_y[wt(y) = r] · |{y : wt(y) = r, wt(ŷ) ≥ ℓ}| / |{y : wt(y) = r}|

                                       ≤ ∑_{i=ℓ}^{n} ∑_{c∈C : wt(c)=i} |{y : wt(y) = r, ŷ = c}| / |{y : wt(y) = r}| .      (7.6)

Let

                                            α_{c,r} ≜ |{y : ŷ = c & wt(y) = r}| .

    Fix a codeword c ∈ C of weight i. A word y of weight r is ML-decoded to c only if dist(y, c) ≤ r.
Without loss of generality c = 1^i ◦ 0^{n−i} (i. e., c consists of i ones followed by n − i zeros). Note that
wt(y) = dist(y, 0^n). Let y′ and y″ denote the prefix of length i of y and the suffix of length n − i of y,
respectively. Because dist(y, c) ≤ r, it follows that 0 ≤ r − dist(y, c) = dist(y, 0^n) − dist(y, c). But

             dist(y, 0^n) − dist(y, c) = dist(y′, 0^i) + dist(y″, 0^{n−i}) − ( dist(y′, 1^i) + dist(y″, 0^{n−i}) )
                                       = dist(y′, 0^i) − dist(y′, 1^i) .

Namely, in the prefix y′, the majority of the bits are ones. We conclude that at least i/2 of the coordinates
of the support of y have to be chosen from the coordinates of the support of c. Hence,
                                          α_{c,r} ≤ ∑_{w=i/2}^{r} \binom{i}{w} \binom{n−i}{r−w} .                    (7.7)

      Because \binom{i}{w} ≤ \binom{i}{i/2}, we can upper-bound equation (7.7) by

                                          α_{c,r} ≤ \binom{i}{i/2} ∑_{w=0}^{r−i/2} \binom{n−i}{w} .

Because ε ≤ 1/2 − p, the maximal summand is \binom{n−i}{r−i/2}, and we get an upper bound of

                                          α_{c,r} ≤ n · \binom{i}{i/2} \binom{n−i}{r−i/2} .                          (7.8)
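The passage from equation (7.7) to equation (7.8) can be checked numerically. The sketch below compares the exact sum with the bound for illustrative parameters (even codeword weights i, and r chosen near pn so that r − i/2 ≤ (n − i)/2):

```python
from math import comb

def alpha_sum(n: int, i: int, r: int) -> int:
    # The sum in (7.7): weight-r words meeting the support of the
    # weight-i codeword c in at least i/2 coordinates.
    return sum(comb(i, w) * comb(n - i, r - w)
               for w in range(i // 2, min(i, r) + 1))

def alpha_bound(n: int, i: int, r: int) -> int:
    # The bound in (7.8): n * C(i, i/2) * C(n - i, r - i/2).
    return n * comb(i, i // 2) * comb(n - i, r - i // 2)

n, r = 200, 20                          # illustrative: r = pn with p = 0.1
for i in range(2, 2 * r + 1, 2):        # even weights i with i/2 <= r
    assert alpha_sum(n, i, r) <= alpha_bound(n, i, r)
```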



Substituting equation (7.8) in equation (7.6), we get

                                 P_{ℓ,r} ≤ ∑_{i=ℓ}^{n} w_{i,n} · ( n \binom{i}{i/2} \binom{n−i}{r−i/2} ) / \binom{n}{r} .

The weight distribution w_{i,n} satisfies w_{i,n} ≤ 2^{(δ/3)n} · w^∗_i(n, k), therefore

                         P_{ℓ,r} ≤ n \binom{n}{r}^{−1} ∑_{i=ℓ}^{n} 2^{(δ/3)n} · w^∗_i(n, k) · \binom{i}{i/2} \binom{n−i}{r−i/2} .

Recall that the average weight distribution w^∗_i(n, k) satisfies

                                           w^∗_i(n, k) = 2^{k−n} \binom{n}{i} ,

therefore

                         P_{ℓ,r} ≤ n \binom{n}{r}^{−1} ∑_{i=ℓ}^{n} 2^{k−n+(δ/3)n} \binom{n}{i} \binom{i}{i/2} \binom{n−i}{r−i/2} .      (7.9)

Now, we show that

                         \binom{n}{i} \binom{i}{i/2} \binom{n−i}{r−i/2} = \binom{n}{r} \binom{r}{i/2} \binom{n−r}{i/2} .              (7.10)
    The combinatorial proof proceeds by counting, in two ways, the number of possibilities of dividing
students into two classes and choosing committee members. Consider n students that we wish to partition
into two classes, one of size i and the other of size n − i. We want to choose a committee of r students that
consists of i/2 students from the first class and r − i/2 students from the second class. The left-hand side
in equation (7.10) counts the number of possible partitions into two classes and choices of committee
members as follows. First partition the students by choosing the members of the first class, then choose
the committee members from each class. The right-hand side in equation (7.10) counts the same number
of possibilities by first choosing the committee members (before dividing the students into classes). Only
then are the i/2 committee members of the first class chosen. Finally, the remaining i/2 non-committee
members of the first class are chosen from the n − r non-committee students.
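Identity (7.10) can also be verified exhaustively for small parameters, e.g., in Python:

```python
from math import comb

# Check C(n,i) C(i,i/2) C(n-i, r-i/2) == C(n,r) C(r,i/2) C(n-r,i/2)
# for all admissible small parameters (i even, i/2 <= r <= n - i/2).
for n in range(2, 40):
    for i in range(2, n + 1, 2):
        for r in range(i // 2, n - i // 2 + 1):
            lhs = comb(n, i) * comb(i, i // 2) * comb(n - i, r - i // 2)
            rhs = comb(n, r) * comb(r, i // 2) * comb(n - r, i // 2)
            assert lhs == rhs
```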
    Plugging equation (7.10) into equation (7.9), we get

                                     P_{ℓ,r} ≤ n ∑_{i=ℓ}^{n} 2^{k−n+(δ/3)n} \binom{r}{i/2} \binom{n−r}{i/2} .

By Lemma 7.2,

                          \binom{r}{i/2} \binom{n−r}{i/2} ≤ \binom{r}{r(n−r)/n} \binom{n−r}{r(n−r)/n} · ( r(n−r)/n )⁴

                                                          ≤ \binom{r}{r(n−r)/n} \binom{n−r}{r(n−r)/n} · n⁴ .



It follows that

                           P_{ℓ,r} ≤ n⁶ · 2^{k−n+(δ/3)n} · \binom{r}{r(n−r)/n} \binom{n−r}{r(n−r)/n} .               (7.11)

Let p̂ ≜ r/n. By equation (7.5) it follows that

                           P_{ℓ,r} ≤ n⁶ · 2^{k−n+(δ/3)n} · \binom{p̂n}{p̂(1−p̂)n} \binom{(1−p̂)n}{p̂(1−p̂)n} .          (7.12)
Because

                                                  \binom{n}{k} ≤ 2^{nH(k/n)} ,

it follows that

                               P_{ℓ,r} ≤ n⁶ · 2^{k−n+(δ/3)n} · 2^{p̂nH(1−p̂)} · 2^{(1−p̂)nH(p̂)} .

Because H(p̂) = H(1 − p̂), we get

                                          P_{ℓ,r} ≤ n⁶ · 2^{k−n+(δ/3)n+nH(p̂)} .

Our goal now is to prove that the exponent k − n + (δ/3)n + nH(p̂) is at most −δ · n/3. Indeed,

                        k − n + (δ/3)n + nH(p̂) = −n · ( −k/n + 1 − δ/3 − H(p̂) )

                                                ≤ −n · ( H(p) − H(p̂) + (2/3) · δ ) .

To complete the proof, it suffices to show that |H(p̂) − H(p)| < δ/3. Indeed, |p − p̂| ≤ ε, and hence, by
continuity, this holds if ε is sufficiently small (as a function of p and δ).
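The final exponent calculation can be sanity-checked numerically; the parameters below are illustrative choices satisfying k/n < 1 − H(p) − δ:

```python
from math import log2

def H(x: float) -> float:
    # Binary entropy function.
    return -x * log2(x) - (1 - x) * log2(1 - x)

p, delta, n = 0.1, 0.06, 10_000           # illustrative parameters
k = int(n * (1 - H(p) - delta - 0.001))   # ensures k/n < 1 - H(p) - delta

# For p_hat close to p, the exponent k - n + (delta/3) n + n H(p_hat)
# is at most -(delta/3) n, giving the 2^{-Omega(n)} bound.
for p_hat in (p - 0.001, p, p + 0.001):
    exponent = k - n + (delta / 3) * n + n * H(p_hat)
    assert exponent <= -(delta / 3) * n
```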

Acknowledgments. We thank Uri Erez, Meir Feder, Nissim Halabi, Simon Litsyn, Ronny Roth,
and Rami Zamir for useful conversations. We thank Rüdiger Urbanke for pointing out reference [25,
Appendix]. We thank the anonymous referees for their helpful comments and, specifically, for pointing
out Remark 1.2.


References
 [1] BENNY APPLEBAUM, LIRON DAVID, AND GUY EVEN: Deterministic rateless codes for BSC. In
     Proc. 6th Innovations in Theoretical Computer Science Conf. (ITCS’15), pp. 31–40. ACM Press,
     2015. [doi:10.1145/2688073.2688117, arXiv:1406.0157] 1

 [2] HARI BALAKRISHNAN, PETER IANNUCCI, JONATHAN PERRY, AND DEVAVRAT SHAH: De-randomizing
     Shannon: The design and analysis of a capacity-achieving rateless code, 2012.
     [arXiv:1206.0418] 3


 [3] ALEXANDER BARG: Lecture notes ENEE 739C: Advanced topics in signal processing: Coding
     theory (lecture 4), 2003. Available from author’s website. 16, 20, 22

 [4] ALEXANDER BARG AND GEORGE DAVID FORNEY, JR.: Random codes: Minimum
     distances and error exponents. IEEE Trans. Inform. Theory, 48(9):2568–2573, 2002.
     [doi:10.1109/TIT.2002.800480] 4

 [5] ALEXANDER BARG AND GILLES ZÉMOR: Error exponents of expander codes. IEEE Trans. Inform.
     Theory, 48(6):1725–1729, 2002. Preliminary version in ISIT’01. [doi:10.1109/TIT.2002.1003853]
     2

 [6] ALEXANDER BARG AND GILLES ZÉMOR: Error exponents of expander codes under linear-
     complexity decoding. SIAM J. Comput., 17(3):426–445, 2004. [doi:10.1137/S0895480102403799]
     2

 [7] JOHN W. BYERS, MICHAEL LUBY, MICHAEL MITZENMACHER, AND ASHUTOSH REGE: A
     digital fountain approach to reliable distribution of bulk data. SIGCOMM Comput. Commun. Rev.,
     28(4):56–67, 1998. [doi:10.1145/285243.285258] 2

 [8] GIUSEPPE CAIRE AND DANIELA TUNINETTI: The throughput of hybrid-ARQ protocols
     for the Gaussian collision channel. IEEE Trans. Inform. Theory, 47(5):1971–1988, 2001.
     [doi:10.1109/18.930931] 2

 [9] DAVID CHASE: Code combining–a maximum-likelihood decoding approach for combining
     an arbitrary number of noisy packets. IEEE Trans. Commun., 33(5):385–393, 1985.
     [doi:10.1109/TCOM.1985.1096314] 2

[10] URI EREZ, MITCHELL D. TROTT, AND GREGORY W. WORNELL: Rateless coding for Gaussian
     channels. IEEE Trans. Inform. Theory, 58(2):530–547, 2012. [doi:10.1109/TIT.2011.2173242,
     arXiv:0708.2575] 2

[11] GEORGE DAVID FORNEY, JR.: Concatenated Codes. MIT Press, 1966. 5

[12] VENKATESAN GURUSWAMI AND PIOTR INDYK: Linear-time encodable/decodable codes with
     near-optimal rate. IEEE Trans. Inform. Theory, 51(10):3393–3400, 2005. Preliminary version in
     STOC’02. [doi:10.1109/TIT.2005.855587] 5, 17

[13] JEONGSEOK HA, JAEHONG KIM, AND STEVEN W. MCLAUGHLIN: Rate-compatible puncturing
     of low-density parity-check codes. IEEE Trans. Inform. Theory, 50(11):2824–2836, 2004.
     [doi:10.1109/TIT.2004.836667] 2

[14] JOACHIM HAGENAUER: Rate-compatible punctured convolutional codes (RCPC codes) and their
     applications. IEEE Trans. Commun., 36(4):389–400, 1988. [doi:10.1109/26.2763] 2

[15] TINGFANG JI AND WAYNE STARK: Rate-adaptive transmission over correlated fading channels.
     IEEE Trans. Commun., 53(10):1663–1670, 2005. [doi:10.1109/TCOMM.2005.857147] 2


[16] SHU LIN, DANIEL COSTELLO, AND MICHAEL MILLER: Automatic-repeat-request
     error-control schemes. IEEE Communications Magazine, 22(12):5–17, 1984.
     [doi:10.1109/MCOM.1984.1091865] 2

[17] MICHAEL LUBY: LT codes. In Proc. 43rd FOCS, pp. 271–280. IEEE Comp. Soc. Press, 2002.
     [doi:10.1109/SFCS.2002.1181950] 2

[18] DAVID MANDELBAUM: An adaptive-feedback coding scheme using incremental redundancy. IEEE
     Trans. Inform. Theory, 20(3):388–389, 1974. [doi:10.1109/TIT.1974.1055215] 2

[19] JONATHAN PERRY, HARI BALAKRISHNAN, AND DEVAVRAT SHAH: Rateless spinal codes. In
     Proc. 10th ACM Workshop on Hot Topics in Networks (HotNets-X’11), pp. 6:1–6:6. ACM Press,
     2011. [doi:10.1145/2070562.2070568] 3

[20] JONATHAN PERRY, PETER A. IANNUCCI, KERMIN E. FLEMING, HARI BALAKRISHNAN, AND
     DEVAVRAT SHAH: Spinal codes. In Proc. 2012 Conf. on Applications, Technologies, Architectures,
     and Protocols for Comput. Commun. (SIGCOMM’12), pp. 49–60. ACM Press, 2012.
     [doi:10.1145/2342356.2342363] 2, 3

[21] GREGORY POLTYREV: Bounds on the decoding error probability of binary linear codes via their
     spectra. IEEE Trans. Inform. Theory, 40(4):1284–1292, 1994. [doi:10.1109/18.335935] 4, 16

[22] DORON RAJWAN: Method of encoding and transmitting data over a communication medium
     through division and segmentation, 2007. US Patent 7,304,990. 2

[23] DORON RAJWAN, EYAL LUBETZKY, AND JOSEPH YOSSI AZAR: Data streaming, 2008. US
     Patent 7,327,761. 2

[24] DOUGLAS N. ROWITCH AND LAURENCE B. MILSTEIN: On the performance of hybrid FEC/ARQ
     systems using rate compatible punctured turbo (RCPT) codes. IEEE Trans. Commun., 48(6):948–
     959, 2000. [doi:10.1109/26.848555] 2

[25] EREN ŞAŞOĞLU: Polar Coding Theorems for Discrete Systems. Ph. D. thesis, EPFL, 2011. 17, 25

[26] STEFANIA SESIA, GIUSEPPE CAIRE, AND GUILLAUME VIVIER: Incremental redundancy hybrid
     ARQ schemes based on low-density parity-check codes. IEEE Trans. Commun., 52(8):1311–1321,
     2004. [doi:10.1109/TCOMM.2004.833022] 2

[27] AMIN SHOKROLLAHI: Raptor codes. IEEE Trans. Inform. Theory, 52(6):2551–2567, 2006.
     [doi:10.1109/TIT.2006.874390] 2

[28] NADAV SHULMAN: Communication over an Unknown Channel via Common Broadcasting. Ph. D.
     thesis, Tel Aviv University, 2003. LINK at SemanticScholar. 2

[29] NADAV SHULMAN AND MEIR FEDER: Random coding techniques for nonrandom codes. IEEE
     Trans. Inform. Theory, 45(6):2101–2104, 1999. [doi:10.1109/18.782147] 4, 16


[30] NADAV S HULMAN AND M EIR F EDER: Static broadcasting. In Proc. 2000 IEEE Internat. Symp. on
     Inform. Theory (ISIT’00), p. 23. IEEE Comp. Soc. Press, 2000. [doi:10.1109/ISIT.2000.866313] 2

[31] DANIEL A. S PIELMAN: Computationally efficient error-correcting codes and holographic proofs.
     Ph. D. thesis, MIT, 1996. Available at DSpace@MIT. 5

[32] DANIEL A. S PIELMAN: Linear-time encodable and decodable error-correcting codes. IEEE Trans.
     Inform. Theory, 42(6):1723–1731, 1996. Preliminary version in STOC’95. [doi:10.1109/18.556668]
     5


AUTHORS

     Benny Applebaum
     Associate professor
     Tel Aviv University
     Tel Aviv, Israel
     bennyap@post.tau.ac.il
     https://www.eng.tau.ac.il/~bennyap


     Liron David
     Ph. D. candidate
     Tel Aviv University
     Tel Aviv, Israel
     lirondavid@gmail.com


     Guy Even
     Professor
     Tel Aviv University
     Tel Aviv, Israel
     guy@eng.tau.ac.il
     http://hyde.eng.tau.ac.il/wordpress/


ABOUT THE AUTHORS

      Benny Applebaum is an Associate Professor of Electrical Engineering at Tel-Aviv Uni-
         versity. He received his B. Sc. from the Hebrew University of Jerusalem in 2002, and
         his Ph. D. from the Computer Science Department of the Technion in 2007 under the
         supervision of Yuval Ishai and Eyal Kushilevitz. Before joining Tel-Aviv he was a
         postdoc at Princeton University and the Weizmann Institute of Science. He is interested
         in the theory of computation, mainly in cryptography and computational complexity.
         He enjoys spending time with his family and playing the guitar.


 Liron David is a Ph. D. candidate in the Electrical Engineering Department at Tel-Aviv
     University, under the supervision of Avishai Wool. She received a Weinstein paper prize
     in 2018, a Weinstein award for excellence in studies in 2017, and a Rector’s award
     for excellence in teaching for the academic year 2015–2016. She completed her
     M. Sc. in 2014 under the supervision of Benny Applebaum and Guy Even in the
     Electrical Engineering Department at Tel-Aviv University, and owes much of her interest
     in research to her M. Sc. advisors. She obtained her B. Sc. in Electrical and Electronics
     Engineering and Computer Science at Tel-Aviv University in 2011. Liron’s hobby is
     playing the piano. She completed four years of piano studies at the Israel Conservatory
     of Music, Tel-Aviv.


 Guy Even received his B. Sc. in Mathematics and Computer Science from the Hebrew
    University in 1988, and his M. Sc. and D. Sc. in Computer Science from the Technion
    in 1991 and 1994, respectively. He spent his post-doctorate at the University of the
    Saarland during 1995–1997 with Wolfgang Paul. Since 1997, he has been a faculty
    member in the School of Electrical Engineering at Tel-Aviv University. He is interested in
    algorithms and their applications in various fields. He has published papers on the theory
    of VLSI, approximation algorithms, computer arithmetic, online algorithms, frequency
    assignment, scheduling, packet routing, linear-programming decoding of LDPC codes,
    and rateless codes. He is on the editorial board of “Theory of Computing Systems.”
    Together with Moti Medina, Guy wrote a digital hardware textbook, “Digital Logic
    Design: A Rigorous Approach,” published in 2012 by Cambridge University Press.
