On the Hardness of Learning With Errors with Binary Secrets

Authors Daniele Micciancio,
Plaintext
                           T HEORY OF C OMPUTING, Volume 14 (13), 2018, pp. 1–17
                                        www.theoryofcomputing.org




     On the Hardness of Learning With Errors
               with Binary Secrets
                                                Daniele Micciancio∗
             Received September 17, 2017; Revised October 13, 2018; Published November 30, 2018




       Abstract: We give a simple proof that the decisional Learning With Errors (LWE) problem
       with binary secrets (and an arbitrary polynomial number of samples) is at least as hard as
       the standard LWE problem (with unrestricted, uniformly random secrets, and a bounded,
       quasi-linear number of samples). This proves that the binary-secret LWE distribution is
       pseudorandom, under standard worst-case complexity assumptions on lattice problems. Our
       results are similar to those proved by Brakerski, Langlois, Peikert, Regev and Stehlé (STOC
       2013), but provide a shorter, more direct proof, and a small improvement in the noise growth
       of the reduction.


1     Introduction
The Learning With Errors (LWE) problem [21, 22] plays a central role in lattice cryptography, its secure
instantiation, and its most advanced applications. The usefulness of LWE in cryptography is due in large
part to its pseudorandomness properties, captured by the standard decisional LWE problem defined as
follows. An LWE instance is described by a matrix A ∈ Zm×nq   (chosen uniformly at random) and a vector
    ∗ Research supported in part by the Defense Advanced Research Project Agency (DARPA) and the U.S. Army Research

Office under the SafeWare program, and the National Science Foundation (NSF) under grant CNS-1528068. Any opinions,
findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect
reflect the views, position or policy of the Government.


ACM Classification: F.2.2, F.1.3
AMS Classification: 68Q17, 52C07, 11H06
Key words and phrases: complexity theory, cryptography, pseudorandomness, lattice, learning, LWE


© 2018 Daniele Micciancio
c b Licensed under a Creative Commons Attribution License (CC-BY)                                 DOI: 10.4086/toc.2018.v014a013
                                                   DANIELE M ICCIANCIO

b ∈ Zmq which may be chosen either uniformly at random, or as b = As + e (mod q), where s ∈ Zq is a
                                                                                                         n
                              m
random secret and e ∈ Z is a “small” error vector, typically chosen with independent discrete Gaussian
                                       √
entries of standard deviation σ ≈ n. The (Decisional) LWE problem asks to distinguish between these
two cases.
    Several variants of LWE exist in the literature, depending on how s and e are chosen, all motivated
by specific cryptographic applications. In the most standard formulation of LWE, the secret s ∈ Znq is
chosen uniformly at random. But this is often undesirable in many cryptographic applications, e. g., those
making use of modulus-switching techniques, where large secrets result in substantial ciphertext quality
degradation. Ideally, it would be best to choose s ∈ {0, 1}n as a vector with binary entries, as used for
example in many Fully Homomorphic Encryption schemes (e. g., see [9, 8]). This binary-secret LWE
also plays a fundamental role in theoretical studies, like the proof that LWE is leakage resilient [11], and
the proof that LWE with polynomial modulus q is at least as hard as worst-case lattice problems under
classical (i. e., non-quantum) reductions [6].
    This last work [6] is the best currently known hardness result for binary-secret LWE, and gives a
reduction from (standard) LWE with arbitrary secret in Znq , to LWE with secret in {0, 1}n log q , i. e., the
secret can be restricted to binary vectors at the cost of increasing the dimension1 from n to n log q. This
reduction is a major part of the main result of [6] on the classical hardness of LWE, and takes a good half
of that paper, going through a careful hybrid argument involving some technical (“first-is-errorless” and
“extended-LWE”) problem variants.
    In this paper we present a direct and substantially shorter proof of this important result. In fact, while
the proof of this result given in [6] is over 7 pages long, spanning multiple subsections, and involving a
number of intermediate problems, our proof has a more direct structure and it is much shorter. A key
insight leading to our simpler proof is the formulation of the binary LWE problem (denoted LWE± )
using secrets in {±1}n , rather than {0, 1}n . This is easily seen (for odd2 modulus q) to be equivalent to
the more common {0, 1} formulation via the affine transformation s 7→ (2s − 1), but has the technical
advantage that all secrets have exactly the same Euclidean length, simplifying the application of discrete
Gaussian convolution theorems. Given the equivalence between the two problems, we will keep referring
to LWE± informally as the binary LWE problem. Other than presenting a simpler and shorter proof, we
do not claim any new results over previous work: our results, and the range of parameters for which we
reduce LWE to LWE± , are essentially the same as in [6, Theorem√4.1], except possibly for reducing
some constants,
           √        e. g., in our reduction the error grows by a factor 2 n + 1, while in [6, Theorem 4.1] it
grows by 10n.
    Given the important role played by binary LWE in many cryptographic applications, we hope that
our simplified treatment will make the theoretical hardness of this problem more easily accessible, and
stimulate further research.

Related work. The LWE problem with small secret was first formally considered by Applebaum, Cash,
Peikert and Sahai in [3], who proved that, without loss of generality, one may assume that the secret
   1 As remarked in [6], this increase seems unavoidable, as it preserves the bit-length of the secret.
   2 Hardness results for binary LWE with even modulus q are easily obtained by modulus switching, i. e., scaling and (randomly)

rounding each entry x ∈ Zq of A or b bx · ((q − 1)/q)e$ ∈ Zq−1 . This increases the error roughly by an additive term O(ksk),
which is small because s is a binary secret.


                           T HEORY OF C OMPUTING, Volume 14 (13), 2018, pp. 1–17                                              2
                  O N THE H ARDNESS OF L EARNING W ITH E RRORS WITH B INARY S ECRETS

                                                                                                        √
follows the same distribution as the LWE errors. This allows the secret coordinates to be as small as n,
but not as small as {0, 1}. For a list of applications using LWE with small secrets see [1].
    Reducing LWE to have a binary secret was first considered by Goldwasser, Kalai, Peikert and
Vaikuntanathan in [11], motivated by questions in leakage-resilient cryptography, where the problem is
proved hard using “noise-flooding” techniques. A stronger reduction is given by Brakerski, Langlois,
Peikert, Regev and Stehlé in [6], in the context of proving classical hardness results for LWE.
    A different (and much harder) problem is that of proving that LWE is computationally hard when the
error (and not just the secret) follows the binary distribution [17, 7]. In fact, LWE with small errors can
be efficiently solved when sufficiently many samples are available [4, 2, 13]. In this paper, we do not
study LWE with binary errors.
    Attacks against LWE with binary secret (and Gaussian errors) are considered in [1, 5]. Theoretically,
the secret can be assumed binary by increasing the LWE dimension to n log q [6], but experimental results
in [5] suggest that, heuristically, increasing the secret dimension by a log log n factor may already be
enough to counter the best known cryptanalytic attacks for common parameter settings.

Paper organization. In Section 2 we introduce the notation used in this paper, provide a formal
definition of the LWE problem, and present some background results, including a simple lemma on the
projection of discrete Gaussians (Lemma 2.6), and the construction of a gadget matrix needed in our main
reduction (Lemma 2.7). The proof that LWE with binary secrets is pseudorandom is given in Section 3.
Section 4 concludes with a discussion of open problems.


2    Preliminaries
We use bold lowercase letters a for vectors, and bold uppercase A for matrices. Probability distributions
are denoted using calligraphic letters A. We write vectors as columns v ∈ Zn = Zn×1 . The transpose of
a vector or matrix A is denoted At . We write [A1 , . . . , An ] for the horizontal concatenation of matrices
Ai ∈ Zk×mi , and use transpose notation [A1 , . . . , An ]t for the vertical concatenation of At1 , . . . , Atn . Let
e1 , . . . , en be the standard basis of Zn , I = [e1 , . . . , en ] the n × n identity matrix, and u = ∑i ei the all-ones
vector. The Euclidean norm of a vector is
                                                                 r
                                                     kxk = ∑ xi2 ,
                                                                i

and the max norm is kxk∞ = maxi |xi |.
    For any vector z = (z1 , . . . , zn ) ∈ Zn , we write diag(z) = [z1 · e1 , . . . , zn · en ] for the diagonal matrix
with the entries of z along the diagonal. So, for example, diag(u) = I. For any integer matrix Q ∈ Zn×m
and for any positive integer k ≤ m, we write Q[k] for the matrix consisting of the first k columns of Q,
and Q]k[ for the matrix obtained by removing the first k columns from Q. So, Q = [Q[k] , Q]k[ ] where
Q[k] ∈ Zn×k and Q]k[ ∈ Zn×(m−k) .
    For any integer matrix Q ∈ Zk×m , we write ker(Q) = {x ∈ Zm : Qx = 0} for the kernel of Q : Zm → Zk
as an integer linear map. We say that a matrix Q ∈ Zk×m is primitive if QZm = Zk , i. e., if Q : Zm → Zk
is surjective. As a special case, a row vector wt ∈ Z1×k is primitive if and only if the greatest common
divisor of its entries equals gcd(w) = 1.

                          T HEORY OF C OMPUTING, Volume 14 (13), 2018, pp. 1–17                                         3
                                             DANIELE M ICCIANCIO

2.1   Probabilities and asymptotics
We use standard asymptotic notation, O(·), Ω(·) and ω(·), and all asymptotics refer to a (possibly implicit)
integer variable n. For example, we may write nO(1) for an arbitrary polynomially bounded function of n,
and n−ω(1) for a negligible function. Other parameters defining the size of a problem instance are always
assumed to be polynomial in n. So, if A ∈ Zk×m is a matrix with integer entries, the number of rows
k = nO(1) , the number of columns m = nO(1) , and the bitsize maxi, j log |ai, j | = nO(1) of the matrix entries
are all assumed to be (at most) polynomial in n.
    A probability ensemble is a sequence An of probability distributions over sets An , for n ∈ N =
{1, 2, . . .}. We always assume that all elements of An ⊆ {0, 1}`(n) can be represented by strings of some
fixed length `(n). We write x ← A for the operation of sampling an element x according to distribution A,
and Pr{x ← A} for the probability of x under A. The uniform distribution over a set A is denoted U(A).
    The statistical distance between two distributions A, A0 over a set A is

                                             1
                               ∆(A, A0 ) =     ∑ | Pr{x ← A} − Pr{x ← A0 }| .
                                             2 x∈A

Two distribution ensembles An , A0n are statistically close (written An ≈ A0n ) if the statistical distance
∆(An , A0n ) = n−ω(1) is negligible. Two ensembles An , A0n are computationally indistinguishable if for any
efficient (probabilistic polynomial-time computable) predicate P, P(An ) ≈ P(A0n ). The gap

                              ∆(P(An ), P(A0n )) = | Pr{P(An )} − Pr{P(A0n )}|

is called the advantage of P in distinguishing between the two distributions. An ensemble An over sets
An is pseudorandom if it is computationally indistinguishable from the uniform distributions U(An ). If
An ≈ U(An ) are statistically close, then we say that An is almost uniform or statistically pseudorandom.
    We typically leave the parameter n implicit, and talk about individual distributions A over a single
set A, but all asymptotic statements should be interpreted as referring to ensembles An parameterized
by an integer n in some obvious way. For example, we may say that a distribution A over a set A is
pseudorandom if no efficient algorithm can distinguish A from U(A) with better than negligible advantage.
More precisely, an efficiently sampleable ensemble {An }n>0 over the sets {An }n>0 is pseudorandom if
any predicate P computable in probabilistic polynomial time nO(1) has at most negligible advantage

                                  | Pr{P(An )} − Pr{P(U(An ))}| ≤ n−ω(1)

in distinguishing An from the uniform distribution U(An ).
    We write Z for the set of integers, and Zq = Z/(qZ) for the integers modulo q. We will need the
following version of the leftover hash lemma, and a bound on the probability that a random vector is
primitive modulo q.

Lemma 2.1 (Leftover Hash Lemma, [12]). For any odd integer q, positive real ε > 0 and integers k
and n ≥ log2 (qk /ε 2 ), the distribution X = {(A, Az) : A ← U(Zk×n
                                                                q ), z ← U({±1} )} is within statistical
                                                                                  n

distance ∆(X, U) ≤ ε from the uniform distribution U = U(Zk×n         k
                                                               q × Zq ). In particular, if n ≥ k log2 (q) +
ω(log n), then X ≈ U is (statistically) pseudorandom.

                        T HEORY OF C OMPUTING, Volume 14 (13), 2018, pp. 1–17                                 4
                  O N THE H ARDNESS OF L EARNING W ITH E RRORS WITH B INARY S ECRETS

                                                                           O(1)
Lemma 2.2 (Primitive Vectors). For any positive integers q = 2n and k = ω(log n), if w ∈ Zkq is chosen
uniformly at random, then gcd(w, q) = 1 except with negligible probability.
Proof. The probability that gcd(w, q) 6= 1 is at most

                                   ∑ p−k ≤ (log q)/2k ≤ nO(1) /nω(1) = n−ω(1)
                                   p|q

where the summation is over all prime factors of q. We used the fact that all prime factors are at least
p ≥ 2, and there are at most log2 q of them. Better bounds are possible, but this crude estimate is more
than enough for the purposes of this paper.

2.2    Gaussian distributions
Let ρ(x) = exp(−πx2 ) be the Gaussian function with total mass x∈R ρ(x) dx = 1, and ρσ (x) = ρ(x/σ )
                                                                             R

its scaling by a factor σ > 0. For a set A, we write ρσ (A) as a shorthand for ∑x∈A ρσ (A). The discrete
Gaussian distribution of parameter σ , denoted3 Dσ , picks each integer x ∈ Z with probability proportional
to ρσ (x), i. e., Pr{x ← Dσ } = ρσ (x)/ρσ (Z). The product distribution Dkσ selects each x ∈ Zk with
probability proportional to ρσ (x) = ρσ (kxk) = ∏i ρσ (xi ). If x and y are orthogonal vectors (xt y = 0),
then by the Pythagorean theorem ρσ (x + y) = ρσ (x) · ρσ (y).
     A rank-n integer lattice is the set Λ = BZn ⊆ Zd of all integer linear combinations of n linearly
independent vectors B = [b1 , . . . , bn ] in Zd . The last successive minimum of a rank-n lattice Λ is the
smallest positive real λn such that Λ contains n linearly independent vectors of length at most λn . Another
standard quantity associated to a lattice is the smoothing parameter ηε (Λ), which is parameterized by a
positive real ε > 0. In this paper, all we need to know about the smoothing parameter are the following
two bounds.
Lemma 2.3 (See [18, Lemma 4.1] and [10, Lemma 2.4]). For any lattice Λ, ε ∈ (0, 1), and vector c in
the linear span of Λ, if σ > ηε (Λ), then ρσ (Λ + c) ∈ [(1 − ε)/(1 + ε), 1] · ρσ (Λ).
Lemma 2.4 (Smoothing Parameter Bound, [18, Lemma 3.3]). For any rank-n lattice Λ and positive real
ε > 0, the smoothing parameter is at most
                                           r
                                              ln(2n(1 + 1/ε))
                                  ηε (Λ) ≤                       · λn (Λ) .                          (2.1)
                                                      π
                         √
In particular, for any ω( log n) function there is a negligible function ε(n) = n−ω(1) such that ηε (Λ) ≤
   √
ω( log n) · λn (Λ).
    When the smoothing parameter η(Λ) is written without specifying the value of ε, it is assumed that
ε = n−ω(1) is an arbitrary negligible functionpof the asymptotic variable n. For example, the smoothing
                                                                        √
parameter of the integer lattice is η(Z) ≤ ln(2(1 + 1/ε)/π) = ω( log n). We will also need the
following convolution theorems for discrete Gaussians.
  3 In the literature, D is often used for the continuous Gaussian distribution over the real numbers R, while the discrete
                        σ
Gaussian is denoted DZ,σ . Since here we do not use continuous Gaussians, for brevity we use Dσ to denote the discrete
Gaussian distribution over the integers.


                          T HEORY OF C OMPUTING, Volume 14 (13), 2018, pp. 1–17                                          5
                                                DANIELE M ICCIANCIO

Lemma
√      2.5 (Convolution, [17, Theorem 3]). For any primitive vector v ∈ Zm and positive reals σi ≥
    p∞ η(Z), if yi ← Dσi for i = 1, . . . , m, then the sum y = ∑i vi · yi is statistically close to Dσ , where
  2kvk
σ = ∑i (vi σi )2 .

Lemma 2.6 (Gaussian Projection). For any primitive matrix T ∈ Zk×m , positive reals α, σ > 0, and
                                                                        σ ) ≈ Dασ .
negligible ε = n−ω(1) , if T · Tt = α 2 · I and η(ker(T)) ≤ σ , then T(Dm      k


Proof. Let y ∈ Zk be an arbitrary integer vector, and let x ∈ Zm be such that Tx = y. By linearity, any
other z ∈ Zm maps to Tz = y if and only if z ∈ x + ker(T). So, by definition, the probability of y = Tx
under T(Dm                                                          t    2     m                       m
            σ ) is proportional to ρσ (x + ker(T)). Let x1 = T y/α ∈ R and x0 = x − x1 ∈ R , so that
x = x0 + x1 , and x0 is orthogonal to the rows of T. It follows that x1 is orthogonal to x0 and ker(T).
Therefore, ρσ (x + ker(T)) = ρσ (x1 ) · ρσ (x0 + ker(T)). Since x0 belongs to the linear span of ker(T),
and σ ≥ η(ker(T)), by Lemma 2.3 the Gaussian mass ρσ (x0 + ker(T)) is essentially independent of x0 ,
up to a negligible relative error. So, up to this error, the probability of y is proportional to ρσ (x1 ). Finally,
we observe that kx1 k2 = yt TTt y/α 4 = kyk2 /α 2 , and therefore ρσ (x1 ) = ρσ (kyk/α) = ρασ (y). This
proves that T(Dm   σ ) is statistically close to the discrete Gaussian distribution Dασ .
                                                                                      k




2.3   A gadget matrix construction
Our main proof requires an integer matrix satisfying some special properties. In the following lemma,
we state the required properties and give a simple construction. We recall that notation Q[n] (resp. Q]n[ )
stands for the matrix obtained by taking (resp. dropping) the first n columns of a matrix Q. In particular,
Q]1[ is the matrix Q without its first column.

Lemma 2.7. There is an efficiently computable matrix Q ∈ Zn×(2n+3) such that Q[n] is invertible, ut Q[n] =
                                              √
et1 , the vector vt = ut Q]n[ has norm kvk = 2 n, kvk∞ = 2, and the matrix T = Q]1[ satisfies T(D2n+2  )≈
                       √                                                                          σ
Dn2σ for all σ ≥ ω( log n).

Proof. Define the matrix
                                                                           
                                                              −1
                               n−1
                                                                  ..      
                                                         1             .
                                     (ei+1 − ei ) · eti =                  ∈ Zn×(n−1) .
                                                                           
                          X= ∑                                    ..      
                               i=1                                   . −1 
                                                                         1

The idea is to start with the square matrix Q̄ = Q̄[n] = [e1 , X], which is unitriangular (i. e., triangular,
with unit elements along the diagonal, and, therefore, invertible), and it satisfies ut Q̄ = et1 . We would
like to use Lemma 2.6 to analyze the distribution Q̄]1[ (Dm             m
                                                             σ ) = X(Dσ ). However, X is not primitive
                                     t     2
and does not satisfy the property XX = α I required by Lemma 2.6 because adjacent rows of X have
scalar product −1. Other pairs of rows are orthogonal, so XXt is tridiagonal (i. e., with nonzero entries
only on or immediately next to the main diagonal), but not diagonal. To fix this, we extend Q̄ to

                         T HEORY OF C OMPUTING, Volume 14 (13), 2018, pp. 1–17                                   6
                 O N THE H ARDNESS OF L EARNING W ITH E RRORS WITH B INARY S ECRETS

Q̃ = [Q̄, Y] = [e1 , X, Y] with a block of (n − 1) coordinates
                                                                                 
                                                                  1
                                 n−1
                                                                     ..     
                                                              1            .
                                       (ei+1 + ei ) · eti =                  ∈ Zn×(n−1)
                                                                            
                            Y= ∑                                     ..     
                                 i=1                                    . 1 
                                                                           1

where adjacent rows have scalar product 1, and cancel out with X. This time Q̃]1[ = [X, Y] has pairwise
orthogonal rows, but the first and last rows have a different norm than the rest. So, [X, Y][X, Y]t is
diagonal, but it is still not a scalar matrix α 2 I. We complete the construction by adding 4 more columns
to make each row of Q]1[ contain precisely 4 nonzero ±1 entries. Our final construction is

                                            Q = [e1 , X, −en , Y, en , e1 , e1 ]

where the position and sign of the new columns have been chosen to highlight the (square) unitriangular
blocks X̃ = [X, −en ], Ỹ = [Y, en ] ∈ Zn×n . Notice that Ỹ = X̃ + 2I, and therefore the two blocks commute,
i. e., X̃Ỹ = ỸX̃.
      We already know that Q[n] = [e1 , X] is invertible, ut Q[n] = et1 , and it is immediate to verify that the
                                       √
vector vt = ut Q]n[ satisfies kvk = 2 n and kvk∞ = 2. It remains to analyze T(D2n+2    σ    ), where

                                                T = Q]1[ = [X̃, Ỹ, e1 , e1 ] .

This matrix is primitive because it starts with a unitriangular block, and it satisfies TTt = 4I by con-
struction. In order to apply Lemma 2.6, and conclude that T(D2n+2
                                                                σ   ) ≈ D2σ , we only need to bound the
smoothing parameter of Λ = ker(T). This lattice is defined by a system Tx = 0 of n linearly independent
equations in 2n + 2 variables. So, Λ is a rank-(n + 2) lattice. Moreover, it contains (n + 2) vectors of
length at most 2 given by the columns of the matrix
                                             
                                     Ỹ e1
                                   −X̃ −e1
                                               ∈ Z(2n+2)×(n+2) .
                                              
                                V=
                                        1  1 
                                         1 −1

The columns are linearly independent because the matrix
                                                                 
                                                I I
                                 W=                    1 1  ∈ Z(n+2)×(2n+2)
                                                        1 −1

satisfies WV = 2I. So V has rank n + 2. This proves that λn+2 (Λ) ≤ 2, and therefore, by Lemma 2.4,
           √
η(Λ) ≤ ω( log n) ≤ σ . So, all hypotheses of Lemma 2.6 are satisfied and T(D2n+2
                                                                              σ   ) ≈ Dn2σ .


                        T HEORY OF C OMPUTING, Volume 14 (13), 2018, pp. 1–17                                 7
                                           DANIELE M ICCIANCIO

2.4   Computational problems and LWE
All computational problems considered in this paper are decision problems about pseudorandom distribu-
tions. Specifically, for any distribution ensemble An over sets An , the An -assumption is the assumption
that An is pseudorandom, and the An -problem is the computational problem of distinguishing An from the
uniform distribution U(An ) with non-negligible advantage. So, all problems will be implicitly specified
simply by defining an appropriate set of distributions An .
     A reduction between (the decision problems associated to) two distributions An and A0n over sets
An and A0n (from An to A0n ) is an efficient (probabilistic polynomial-time) algorithm that solves problem
An (i. e., distinguishes An from the uniform distribution with non-negligible advantage) given access to
any oracle that solves A0n with (possibly different, but still) non-negligible advantage. In the simplest
settings (e. g., see Lemmas 2.12 and 2.13) a reduction may be specified just by an efficient (probabilistic
polynomial-time computable) function ϕ such that ϕ(An ) ≈ A0n and ϕ(U(An )) ≈ U(A0n ). Most of our
reductions are more complex, and make use of hybrid arguments (see Lemma 2.9) that require oracle
calls on distributions other than A0n or U(A0n ).
     In this paper, it is convenient to consider a version of the Learning With Errors (LWE) problem where
the secret is a matrix S, rather than a vector, defined as follows.

Definition 2.8. For any positive integers q, n, k, m and real σ , let LWE(q, n × k, m, σ ) be the LWE
distribution with modulus q, number of samples m, secret dimension n × k, and error parameter σ , i. e.,
the distribution of
                                                        m×(n+k)
                                       [A, AS + E] ∈ Zq
obtained by picking A ← U(Zm×n
                           q   ) and S ← U(Zn×k
                                            q ) uniformly at random, and E ← Dσ
                                                                              m×k with discrete

Gaussian distribution.

    When k = 1, the secret is just a vector s ∈ Znq , and this is the standard version of LWE, which
we write LWE(q, n, m, σ ) instead of LWE(q, n × 1, m, σ ). The m rows of the LWE can be viewed as
random noisy labeled samples from a hard-to-learn linear function defined by the secret S. Worst-case
to average-case reductions [21, 19, 6, 20] support the conjecture that the LWE problem is hard for an
arbitrary (polynomially bounded) number of samples m = nO(1) , and some reductions require this extra
flexibility. (E. g., the LWE search-to-decision reduction in [21], but see also [16] for a sample-preserving
reduction.) This version of the problem is denoted LWE(q, n, σ ). The modulus q is always assumed
to have bit-size polynomial in n (i. e., log2 q ≤ nO(1) ), but in most cryptographic applications it is just a
small polynomial (e. g., q ≤ n2 ), and integers modulo q are represented with O(log n) bits.
    The vector and matrix variants of LWE are easily seen to be equivalent via a standard hybrid argument.

Lemma 2.9. There is a polynomial-time reduction from LWE(q, n, m, σ ) to LWE(q, n × k, m, σ ).

Proof. The intuition behind the proof is that the LWE distribution with secret matrix S ∈ Zn×k
                                                                                           q   may
be regarded as k copies of the standard LWE distribution with secret vectors given by the columns
of S, all using the same public random A. More technically, the reduction considers the sequence
of hybrid distributions Ai = (A, [ASi + Ei , Bi ]) where A ← U(Zm×n
                                                                q   ), Si ← U(Zn×i
                                                                               q ), Ei ← Dσ
                                                                                            m×i and
           m×(k−i)
Bi ← U(Zq         ), for i = 0, . . . , k. Each pair of neighboring hybrids Ai , Ai+1 can be generated by
starting from an LWE(q, n, m, σ ) challenge sample (A, b), and then computing A = (A, [AS + E, b, B])

                       T HEORY OF C OMPUTING, Volume 14 (13), 2018, pp. 1–17                                8
                O N THE H ARDNESS OF L EARNING W ITH E RRORS WITH B INARY S ECRETS

                                                m×(k−i−1)
where S ← U(Zn×i
               q ), E ← Dσ
                             m×i and B ← U(Z
                                              q        ). The resulting distribution equals A = Ai if b
is random, and A = Ai+1 if b = As + e is pseudorandom. So, any distinguisher with advantage ε against
LWE(q, n × k, m, σ ) will achieve advantage ε/k against LWE(q, n, m, σ ).

Definition 2.10. The LWE0,1 (q, n, m, σ ) distribution (and associated decision problem and pseudoran-
domness assumption) is defined just like LWE(q, n, m, σ ), except that the secret s ← U({0, 1}n ) is chosen
with random binary entries.

Definition 2.11. The LWE± (q, n, m, σ ) distribution (and associated decision problem and pseudoran-
domness assumption) is defined just like LWE(q, n, m, σ ), except that the secret s ← U({±1}n ) is chosen
with random unit entries.

    We remark that LWE0,1 and LWE± could also be generalized to secret matrices S, and proved
equivalent to the single-vector version exactly as in Lemma 2.9. But this is not used in this paper, so,
for simplicity, we only define the secret-vector version of the problems. The next two lemmas show that
LWE0,1 and LWE± are essentially the same problem. We remark that the lemmas are even more general
than stated, and they apply to LWE problems with arbitrary error distribution, not just discrete Gaussians.
All parameters (including the number of samples, and the exact error distribution) are preserved by the
reductions, showing that the two problems are equivalent in a very strong sense.

Lemma 2.12. For any odd integer q, there is a polynomial-time reduction from the LWE0,1 (q, n, m, σ )
problem to the LWE± (q, n, m, σ ) problem.

Proof. On input an LWE0,1 instance (A, b), the reduction outputs ϕ(A, b) = (A/2, b0 = b − (A/2)u)
where A/2 is computed modulo q, and u = (1, . . . , 1) ∈ Znq . Notice that, since q is odd, the factor 2 is
invertible modulo q, and A/2 is uniformly distributed. If b is uniform, then b0 is also uniform. On the
other hand, if b = As + e, then b0 = (A/2)s0 + e where s0 = 2s − u is uniformly random in {±1}n .

Lemma 2.13. For any odd integer q, there is a polynomial-time reduction from the LWE± (q, n, m, σ )
problem to the LWE0,1 (q, n, m, σ ) problem.

Proof. On input an LWE± instance (A, b), the reduction outputs ϕ(A, b) = (2A, b0 = b + Au) where
u ∈ {1}n is the all-ones vector. Notice that, since q is odd, the factor 2 is invertible modulo q, and 2A is
uniformly distributed. If b is uniform, then b0 is also uniform. On the other hand, if b = As + e, then
b0 = (2A)s0 + e where s0 = (s + u)/2 is uniformly random in {0, 1}n .


3    Pseudorandomness of binary LWE
In this section we present a proof that the binary-secret LWE distribution LWE± is pseudorandom.
The idea is to define a simple (efficiently computable) randomized transformation ϕ with the following
properties:

    • If the input to ϕ is uniformly distributed, then the output ϕ(U) equals (or is statistically close to)
      the binary LWE distribution LWE± (q, n, m, σ̂ ) for some σ̂ .

                       T HEORY OF C OMPUTING, Volume 14 (13), 2018, pp. 1–17                              9
                                                    DANIELE M ICCIANCIO

    • There are two pseudorandom distributions B, B̂ such that ϕ(B) equals (or is statistically close to)
      B̂.

Since ϕ is efficiently computable, the pseudorandomness of B implies that ϕ(U) ≈ LWE± (q, n, m, σ̂ ) is
computationally indistinguishable from ϕ(B) ≈ B̂. By transitivity, since B̂ is pseudorandom, it follows
that LWE± (q, n, m, σ̂ ) is also pseudorandom.
    As our aim is to give a reduction from the standard LWE problem to binary LWE, we set B and B̂ to
two pseudorandom distributions related to LWE. Specifically, we use the distributions
                                                        (n−1)×k                                  (n−1)×m
              B = {(AS + E)t | A ← U(Zq                           ), S ← U(Zqk×m ), E ← Dσ                 } and
                                                    (n+1)×(k+1)            (k+1)×m          (n+1)×m
              B̂ = {(ÂŜ + Ê)       t
                                          | Â ← U(Zq           ), Ŝ ← U(Zq       ), Ê ← D2σ      }

for some σ related to σ̂ . In other words B and B̂ are the (transposed) “label” component of the LWE
distributions LWE(q, k × m, n − 1, σ ) and LWE(q, (k + 1) × m, n + 1, 2σ ). Notice that any distinguisher
between B and the uniform distribution can be immediately transformed into an LWE distinguisher that
on input (A, B = AS + E) simply discards A, and then runs the original distinguishing procedure on Bt .
So, B is pseudorandom under the standard LWE assumption, and similarly for B̂.
    Before getting into the details of the transformation, notice the difference between the high level
structure of the proof presented here, and a typical reduction between variants of LWE. A typical
reduction would map standard LWE samples to binary LWE samples, and uniform samples to uniform
samples. Here, instead, on the one hand the standard LWE distribution is mapped again to a standard
LWE distribution (with slightly different parameters). On the other hand, the uniform distribution is
mapped to binary LWE.
    Our randomized transformation ϕ is shown in Figure 1. The transformation uses, as randomness, both
a uniform secret vector s and a binary secret vector z. Informally, the intuition is that by simultaneously
multiplying by s (on the left) and by z (on the right), the same transformation is able to produce (depending
on how the input B was chosen) either

    • a binary LWE distribution with secret z (when B is uniform), or

    • a (transposed4 ) standard LWE distribution with secret [s, St ]t (when B = (AS + E)t ).

Intuitively, one may think of ϕ as mapping B to [B, Bz + e]. So, when B is uniformly random, ϕ outputs
the binary LWE distribution by construction. On the other hand, if B = (AS + E)t = St At + Et , the
transformation outputs [B, Bz + e] = St [At , At z] + [Et , Et z + e], which looks like a standard (transposed)
LWE label matrix. In fact, by the Leftover Hash Lemma, one may argue that [At , At z] is statistically
close to a uniformly distributed matrix. Unfortunately, the error matrix [Et , Et z + e] does not follow the
Gaussian distribution5 required by LWE. So, in order to address this and other technical difficulties, the
actual transformation ϕ is a bit more complex. The details of the transformation are somewhat technical,
   4 The LWE distributions B and B̂ are transposed to allow for the multiplication of the uniform secret s on the left.
   5 In particular, the second part of the error matrix Et z + e will typically have much larger entries than Ê, and also be somehow

correlated to the first part Et . One may try to address the error imbalance by noise flooding techniques (i. e., by adding a large
perturbation term to the first part of the output B), but this would result in much larger noise and still not remove correlations.


                            T HEORY OF C OMPUTING, Volume 14 (13), 2018, pp. 1–17                                                10
                  O N THE H ARDNESS OF L EARNING W ITH E RRORS WITH B INARY S ECRETS

                                                        Randomness:               Dimensions:
             Transformation ϕ(B; z, s, a, e, G):           z ← U({±1}n )                   m×(n−1)
                                                                                     B ∈ Zq
             Input: B                                      s ← U(Zmq)                x ∈ Zm
                                                                                          q
                 x = s+e
                                                           a ← U(Zn−1
                                                                   q )               Q ∈ Zq
                                                                                           n×(2n+3)
                 Y = [s, s · at + B]
                                                           e ← Dm
                                                                2σ                   Y ∈ Zm×n
                 X = [Y, G]Qt · diag(z)                          m×(n+3)
                                                                                           q
             Output [X, x]                                 G ← Dσ                    X ∈ Zm×n
                                                                                           q




Figure 1: Transformation proving the pseudorandomness of binary LWE, where Q is the matrix specified
in Lemma 2.7.

and they are primarily motivated by all the cancellations needed for the proof to work and obtain the
proper LWE Gaussian error distribution.
     One way to gain additional insight into the construction is to notice that the transformation ϕ(B)
always outputs a pair [X, x] such that Xz = s + Gv ≈ s + e = x. (See proof of Claim 3.2 for details.) So,
distribution B̂ = ϕ(B) will also satisfy this property with high probability: there must be a small vector
ẑ ∈ {±1}n+1 such that (ÂŜ + Ê)t ẑ ≈ 0. This shows that the pseudorandom matrix B̂ = (ÂŜ + Ê)t is
already somehow close to a binary LWE instance because there is a ±1 combination of the first n columns
of B̂ that is close to the last column. In fact, something very similar can be proved directly, without using
ϕ: matrix Ât maps a set {0, 1}n+1 of size 2n > qk+1 to a set Zk+1      q    of size qk+1 . So, by the pigeon-hole
principle, there exist two binary inputs such that Ât ẑ0 = Ât ẑ1 , or, equivalently, a small vector ẑ = ẑ0 − ẑ1
(with kzk∞ = 1) such that B̂ẑ = Êt ẑ ≈ 0. An informal interpretation of this argument (which, in fact, is
closely related to the proof that LWE is robust with respect to the secret distribution [11]) is that matrix
Ât hashes the binary secret ẑ to an almost uniform (smaller dimensional) secret Ât ẑ with entries in Zq .
But, as before, the problem with this intuitive approach is that the error distribution Êt ẑ is not Gaussian,
and it is correlated with the secret ẑ.
     Our theorem below solves these technical problems using a carefully designed gadget matrix Q
(described in Lemma 2.7) which efficiently adjusts the error distribution using some extra randomness
G. Notice how, in the process of transforming LWE into binary LWE, the number of samples n − 1 in
the presumed hard LWE(q, k × m, n − 1, σ ) instance becomes the size n of the secret in the final binary
LWE instance LWE± (q, n, m, σ̂ ). Similarly, the number of columns m (i. e., the number of parallel LWE
instances) in the presumed hard LWE(q, (k + 1) × m, n + 1, 2σ ) instance becomes the number of samples
in the final binary LWE instance.
Theorem 3.1. Assume the distributions LWE(q, k × m, n − 1, σ ) and LWE(q, (k + 1) × m, n + 1, 2σ ) are
                        O(1)           √
pseudorandom. If q ≤ 2n , σ ≥ ω( log n), k ≥ ω(log n), and n √   ≥ (k + 1) · log2 (q) + ω(log n), then the
distribution LWE± (q, n, m, σ̂ ) is also pseudorandom for σ̂ = 2σ n + 1.
Proof. We use Z as a shorthand for the diagonal matrix diag(z). We first show that the transformation ϕ
maps the uniform distribution to the binary LWE distribution.
                           m×(n−1)
Claim 3.2. If B ← U(Zq            ) is chosen uniformly at random, then ϕ(B) ∈ Zm×n
                                                                                q   × Zm
                                                                                       q is statistically
close to the LWE± (q, n, m, σ̂ ) distribution.

                         T HEORY OF C OMPUTING, Volume 14 (13), 2018, pp. 1–17                                    11
                                              DANIELE M ICCIANCIO

Proof. We show that for any fixed values of a ∈ Zn−1   q   and z ∈ {±1}n , the output of the transformation
[X, x] = ϕ(B) is statistically close to the LWE± distribution with secret z, i. e., X ∈ Zm×n
                                                                                           q    is uniformly
random, and the conditional distribution of the noise vector ê = x − Xz (given X and z) is statistically
close to Dm
          σ̂ . All this is over the probability space defined by the random choice of B, s, e, G. The claim
follows by averaging over a and z.
    Let Q = [Q[n] , Q]n[ ] be the matrix defined in Lemma 2.7, and recall that Q[n] ∈ Zn×n is invertible,
                                                                        √
u Q[n] = et1 , and the vector vt = ut Q]n[ ∈ Zn+3 has norm kvk = 2 n, kvk∞ ≤ 2. Since s and B are
 t

uniformly random, the matrix Y is also uniformly distributed, and independent of e and G. Since Qt[n]
and Z are invertible, the matrix

                               X = [Y, G][Q[n] , Q]n[ ]t Z = Y(Qt[n] Z) + (GQt]n[ Z)

is also uniformly distributed, independently of G, e. It remains to analyze the conditional distribution of
the error vector ê = x−Xz. Using Z·z = u and YQt[n] u = Ye1 = s, we get Xz = YQt[n] u+GQt]n[ u = s+Gv.
So, the error vector equals ê = (s + e) − Xz = e − Gv. Since the entries of G and e are independent
discrete Gaussians of parameter σ and 2σ (respectively), the coordinates of ê are independent identically
distributed random variables, each following the distribution D2σ − ∑i vi Dσ . By Lemma 2.5, this
distribution is statistically close to Dσ̂ for
                                 r                      q                √
                           σ̂ = (2σ )2 + ∑(vi σ )2 = σ 4 + kvk2 = 2σ n + 1 .
                                              i

    Next, consider the output [X, x] when B follows distribution B.
Claim 3.3. The distribution ϕ(B) is statistically close to B̂.
                                             (n−1)×k                                      (n−1)×m
Proof. Let B = (AS + E)t for A ← U(Zq         ), S ← U(Zk×m
                                                         q  ) and E ← Dσ                            . By linearity, we can
                t
write Y = [s, sa + B] = Ys + Ye as the sum of two matrices

                                 Ys = [s, sat + St At ]       and       Ye = [0, Et ] .

Similarly, we can also decompose ϕ(B) = [X, x] = [Xs , s] + [Xe , e] as a sum where

                        Xs = Ys Qt[n] Z      and          Xe = [Ye , G]Qt Z = [Et , G]Qt]1[ Z .

Our goal is to show that [Xs , s]t = ÂŜ and [Xe , e]t = Ê for Â, Ŝ, Ê distributed as in the definition of B̂.
    We first look at the distribution of the error matrix Êt = [Xe , e]. The last column e is a discrete
                                                                                                m×(2n+2)
Gaussian of parameter 2σ by construction. Since [Et , G] has Gaussian distribution Dσ                    , the rest of
the matrix is distributed according to
                                                  (2n+2)×m
                               Xte ← Zt Q]1[ (Dσ                    2σ ) = D2σ ,
                                                             ) ≈ Z(Dn×m     n×m


for any fixed value of z, where we have used the property Q]1[ (D2n+2
                                                                 σ    ) ≈ Dn2σ from Lemma 2.7, and the
symmetry ZDn2σ = Dn2σ . This proves that Ê has Gaussian distribution of parameter 2σ , and it depends
only on E, G and e.

                         T HEORY OF C OMPUTING, Volume 14 (13), 2018, pp. 1–17                                         12
                   O N THE H ARDNESS OF L EARNING W ITH E RRORS WITH B INARY S ECRETS

     We now look at the distribution of [Xs , s] = (ÂŜ)t over the random choice of a, s, z, A and S. The idea
                                                                               (k+1)×m
is to set Ŝ = [s, St ]t , so that Ŝ is distributed uniformly at random over Zq       . But, in order to properly
randomize Â, we define                                         t 
                                                            −1    s
                                                      Ŝ = W
                                                                  S
                                                                 (k+1)×(k+1)
where W is a uniformly random invertible matrix in Zq                          . Since W is invertible, Ŝ is still
uniformly distributed, and independent of W. Next, define

                                                                                    1 0t
                                                                                        
                       I                    (n+1)×(k+1)                                           n×(k+1)
            Â =                ZQ[n] HW ∈ Zq               where        H=                    ∈ Zq         .
                       zt                                                           a A

Using the identities zt ZQ[n] = ut Q[n] = et1 and HWŜ = H[s, St ]t = Yts , we see that our choice of Â, Ŝ
satisfies (ÂŜ)t = [Xs , s] as desired. All that is left to do is to prove that Â is statistically close to uniform,
independently of Ŝ. We first look at HW. Let wt be the first row of W. That’s also the first row of HW.
The remaining rows of HW are [a, A]W. The first row wt is distributed uniformly at random among
all primitive vectors in Zk+1 q , i. e., all vectors such that gcd(w, q) = 1. So, by Lemma 2.2, the vector w
is within negligible statistical distance from the uniform distribution over Zk+1        q . Finally, since [a, A] is
uniform by construction, and W is invertible, the bottom rows ([a, A]W) of HW are uniform too, and
                                                                           n×(k+1)
independent of w. So, HW is statistically close to uniform over Zq                  . The matrix Ǎ = (ZQ[n] HW)t
is also statistically close to uniform (and independent of z) because Q[n] and Z are invertible. Finally,
using the Leftover Hash Lemma (Lemma 2.1) and the assumption n ≥ (k + 1) log2 (q) + ω(log n), we
see that Ât = Ǎ[I, z] = [Ǎ, Ǎz] is also statistically close to uniform. This concludes the proof that
ϕ(B) = (ÂŜ + Ê)t where Â, Ŝ and Ê follow the LWE distribution as in the definition of B̂.

     We are now ready to prove the theorem. It follows from pseudorandomness of the LWE(q, k × m, n −
1, σ ) that the distribution B is computationally indistinguishable from the uniform distribution U over
  m×(n−1)
Zq         . Since ϕ is efficiently computable, the distributions ϕ(B) and ϕ(U) are also computationally
indistinguishable. By Claim 3.2, ϕ(U) is statistically close to LWE± (q, n × 1, m, σ̂ ). Similarly, by
Claim 3.3, ϕ(B) is statistically close to B̂. So, LWE± (q, n × 1, m, σ̂ ) is computationally indistinguishable
from B̂. Finally, from the pseudorandomness of LWE(q, (k + 1) × m, n + 1, 2σ ), we know that the
                                                                                            m×(n+1)
distribution B̂ is computationally indistinguishable from the uniform distribution over Zq          . It follows
by transitivity that the binary LWE distribution LWE± (q, n, m, σ̂ ) is computationally indistinguishable
                                       m×(n+1)
from the uniform distribution over Zq           , i. e., LWE± (q, n, m, σ̂ ) is pseudorandom.

    The statement in the above theorem can be simplified using Lemma 2.9 to rephrase it in terms of the
basic LWE problem, and by noticing that LWE(q, k, n, σ ) does not get any easier when k and σ grow, or
when n gets smaller.
                                                                                                                O(1)
Corollary 3.4. Assume the distribution LWE(q, k, n + 1, σ ) is pseudorandom for some q ≤ 2n , σ ≥
   √                                                                                         O(1) , σ̂ )
ω( log n), k ≥ ω(log n), and (n+1)
                                √ ≥ (k +1)·(log2 (q)+1). Then the distribution LWE± (q, n, n
is also pseudorandom for σ̂ = 2σ n + 1.

                            T HEORY OF C OMPUTING, Volume 14 (13), 2018, pp. 1–17                                      13
                                          DANIELE M ICCIANCIO

Proof. Notice that, under the assumptions in the corollary statement,

                            n ≥ (k + 1) log2 q + k ≥ (k + 1) log2 q + ω(log n)

as required by Theorem 3.1. In order to invoke the theorem, we also need to verify the pseudorandomness
conditions. Assume LWE(q, k, n + 1, σ ) is pseudorandom. Dropping the last two rows from the samples
[A, b] ← LWE(q, k, n + 1, σ ) shows that LWE(q, k, n − 1, σ ) is also pseudorandom. The samples [A, b]
can also be mapped to LWE(q, k + 1, n + 1, 2σ ) by performing the following two operations:

    • Add an extra Gaussian error term e ← Dn+1
                                            √
                                             3σ
                                                to b. By Lemma 2.5, this has the effect of increasing
                       √
      the error rate to σ 2 + 3σ 2 = 2σ .

    • Append an extra column a to A and add a random multiple a · s to b. This has the effect of extending
      the secret with an extra coordinate s.

Since this transformation also preserves the uniform distribution, it provides a reduction from

                       LWE(q, k, n + 1, σ )      to     LWE(q, k + 1, n + 1, 2σ ) ,

and proves that LWE(q, k + 1, n + 1, 2σ ) is pseudorandom. Finally, by Lemma 2.9,

                 LWE(q, k × m, n − 1, σ )      and      LWE(q, (k + 1) × m, n + 1, 2σ )

are also pseudorandom, as required by Theorem 3.1.

     Notice how Corollary 3.4 establishes the pseudorandomness of LWE± for any polynomial number
of samples m = nO(1) , using, as an assumption, only the pseudorandomness of LWE for a fixed number
(n+1 ≈ k log q) of samples. (This property is also implicit in [6].) We remark that we phrased Theorem 3.1
and Corollary 3.4 asymptotically (in terms of polynomial-time distinguishers achieving at most negligible
advantage ε = n−ω(1) ) only for simplicity. All statements and proofs are easily adapted to other settings,
e. g., to prove hardness of binary LWE against adversaries running in subexponential time.


4    Conclusion
We presented a simple proof that the LWE problem with binary secret of size n = O(k log2 q) is as hard
as LWE with uniformly random secret in Zkq . More specifically, if LWE with secrets in Zkq and n ≈ k log q
samples is pseudorandom, then LWE with secrets in {0, 1}n or {±1}n (and an arbitrary polynomial
number of samples nO(1) ) is also pseudorandom. As already observed in [6], the growth in the dimension
of the secret is seemingly optimal, because it approximately preserves the bit-size of the secret, and the
cost of a brute force attack. Starting from LWE with a fixed number of samples m ≈ k log q = O(k log k)
(for typical modulus q = kO(1) polynomial in the LWE secret dimension k) is potentially useful for
cryptanalysis, as it allows to generate and publish fixed-size random challenges for any value of k. (By
contrast, the general LWE problem would require to give to the adversary access to an LWE sampling
oracle that can be called an arbitrary number of times.) An interesting question is whether a reduction

                      T HEORY OF C OMPUTING, Volume 14 (13), 2018, pp. 1–17                             14
                O N THE H ARDNESS OF L EARNING W ITH E RRORS WITH B INARY S ECRETS

can be given starting from LWE with an even smaller number of samples, e. g., m = O(k) linear in the
secret dimension.
     An important open problem is whether similar results can be proved for the structured variants of
LWE based on algebraic lattices [15, 14]. The use of structured lattices is of primary importance to make
lattice cryptography efficient in practice, and the use of LWE with binary secrets plays an important role
in some applications, like Fully Homomorphic Encryption schemes [9, 8], to control the noise growth
when computing on encrypted data. We remark that the use of binary secrets and errors does not seem to
pose any difficulty in the setting of one-way hash functions based on structured lattices [15]. However,
for LWE [21, 14], it is unclear how to adapt the proof in this paper to the algebraic lattice setting. We
hope our simple proof for unstructured lattices will bring more attention to this problem, and serve as a
possible starting point to establish similar results for ring LWE.


Acknowledgments. The author thanks the anonymous Theory of Computing referees and editor Oded
Regev for useful comments on earlier drafts of this paper.


References
 [1] M ARTIN R. A LBRECHT: On dual lattice attacks against small-secret LWE and parameter choices
     in HElib and SEAL. In Proc. 36th Ann. Internat. Conf. on the Theory and Applications of Cryp-
     tographic Techniques (EUROCRYPT’17), pp. 103–129. Springer, 2017. [doi:10.1007/978-3-319-
     56614-6_4] 3

 [2] M ARTIN R. A LBRECHT, C ARLOS C ID , J EAN -C HARLES FAUGÈRE , ROBERT F ITZPATRICK , AND
     L UDOVIC P ERRET: Algebraic algorithms for LWE problems. ACM Comm. Computer Algebra,
     49(2):62, 2015. [doi:10.1145/2815111.2815158] 3

 [3] B ENNY A PPLEBAUM , DAVID C ASH , C HRIS P EIKERT, AND A MIT S AHAI: Fast cryptographic
     primitives and circular-secure encryption based on hard learning problems. In Proc. 29th Ann.
     Internat. Crypto. Conf. (CRYPTO’09), pp. 595–618. Springer, 2009. [doi:10.1007/978-3-642-03356-
     8_35] 2

 [4] S ANJEEV A RORA AND RONG G E: New algorithms for learning in presence of errors. In Proc. 38th
     Internat. Colloq. on Automata, Languages and Programming (ICALP’11), pp. 403–415. Springer,
     2011. [doi:10.1007/978-3-642-22006-7_34] 3

 [5] S HI BAI AND S TEVEN D. G ALBRAITH: Lattice decoding attacks on binary LWE. In Proc. 19th
     Australasian Conf. on Information Security and Privacy (ACISP’14), pp. 322–337. Springer, 2014.
     [doi:10.1007/978-3-319-08344-5_21] 3

 [6] Z VIKA B RAKERSKI , A DELINE L ANGLOIS , C HRIS P EIKERT, O DED R EGEV, AND DAMIEN
     S TEHLÉ: Classical hardness of learning with errors. In Proc. 45th STOC, pp. 575–584. ACM Press,
     2013. [doi:10.1145/2488608.2488680, arXiv:1306.0281] 2, 3, 8, 14

                      T HEORY OF C OMPUTING, Volume 14 (13), 2018, pp. 1–17                            15
                                        DANIELE M ICCIANCIO

 [7] J OHANNES A. B UCHMANN , F LORIAN G ÖPFERT, R ACHEL P LAYER , AND T HOMAS W UNDERER:
     On the hardness of LWE with binary error: Revisiting the hybrid lattice-reduction and meet-in-the-
     middle attack. In Proc. 8th Internat. Conf. on Progress in Cryptology (AFRICACRYPT’16), pp.
     24–43. Springer, 2016. [doi:10.1007/978-3-319-31517-1_2] 3

 [8] I LARIA C HILLOTTI , N ICOLAS G AMA , M ARIYA G EORGIEVA , AND M ALIKA I ZABACHÈNE: Faster
     fully homomorphic encryption: Bootstrapping in less than 0.1 seconds. In Proc. 22nd Internat.
     Conf. on the Theory and Application of Cryptology and Information Security (ASIACRYPT’16), pp.
     3–33. Springer, 2016. [doi:10.1007/978-3-662-53887-6_1] 2, 15

 [9] L ÉO D UCAS AND DANIELE M ICCIANCIO: FHEW: Bootstrapping homomorphic encryption in less
     than a second. In Proc. 34th Ann. Internat. Conf. on the Theory and Applications of Cryptographic
     Techniques (EUROCRYPT’15), pp. 617–640. Springer, 2015. [doi:10.1007/978-3-662-46800-5_24]
     2, 15

[10] C RAIG G ENTRY, C HRIS P EIKERT, AND V INOD VAIKUNTANATHAN: Trapdoors for hard lattices
     and new cryptographic constructions. In Proc. 40th STOC, pp. 197–206. ACM Press, 2008.
     [doi:10.1145/1374376.1374407] 5

[11] S HAFI G OLDWASSER , YAEL TAUMAN K ALAI , C HRIS P EIKERT, AND V INOD VAIKUNTANATHAN:
     Robustness of the learning with errors assumption. In Proc. 1st Conf. on Innovations in Theoret.
     Computer Science (ITCS’10), pp. 230–240. Tsinghua University Press, 2010. 2, 3, 11

[12] J OHAN H ÅSTAD , RUSSELL I MPAGLIAZZO , L EONID A. L EVIN , AND M ICHAEL L UBY: A pseu-
     dorandom generator from any one-way function. SIAM J. Comput., 28(4):1364–1396, 1999.
     [doi:10.1137/S0097539793244708] 4

[13] PAUL K IRCHNER AND P IERRE -A LAIN F OUQUE: An improved BKW algorithm for LWE with
     applications to cryptography and lattices. In Proc. 35th Ann. Internat. Crypto. Conf. (CRYPTO’15),
     pp. 43–62. Springer, 2015. [doi:10.1007/978-3-662-47989-6_3, arXiv:1506.02717] 3

[14] VADIM LYUBASHEVSKY, C HRIS P EIKERT, AND O DED R EGEV: On ideal lattices and learning
     with errors over rings. J. ACM, 60(6):43:1–43:35, 2013. Preliminary version in EUROCRYPT’10.
     [doi:10.1145/2535925] 15

[15] DANIELE M ICCIANCIO: Generalized compact knapsacks, cyclic lattices, and efficient one-
     way functions. Comput. Complexity, 16(4):365–411, 2007. Preliminary version in FOCS’02.
     [doi:10.1007/s00037-007-0234-9] 15

[16] DANIELE M ICCIANCIO AND P ETROS M OL: Pseudorandom knapsacks and the sample complexity
     of LWE search-to-decision reductions. In Proc. 31st Ann. Internat. Crypto. Conf. (CRYPTO’11), pp.
     465–484. Springer, 2011. [doi:10.1007/978-3-642-22792-9_26] 8

[17] DANIELE M ICCIANCIO AND C HRIS P EIKERT: Hardness of SIS and LWE with small parameters. In
     Proc. 33rd Ann. Internat. Crypto. Conf. (CRYPTO’13), pp. 21–39. Springer, 2013. [doi:10.1007/978-
     3-642-40041-4_2] 3, 6

                     T HEORY OF C OMPUTING, Volume 14 (13), 2018, pp. 1–17                          16
               O N THE H ARDNESS OF L EARNING W ITH E RRORS WITH B INARY S ECRETS

[18] DANIELE M ICCIANCIO AND O DED R EGEV: Worst-case to average-case reductions based on
     Gaussian measures. SIAM J. Comput., 37(1):267–302, 2007. Preliminary version in FOCS’04.
     [doi:10.1137/S0097539705447360] 5

[19] C HRIS P EIKERT: Public-key cryptosystems from the worst-case shortest vector problem: extended
     abstract. In Proc. 41st STOC, pp. 333–342. ACM Press, 2009. [doi:10.1145/1536414.1536461] 8

[20] C HRIS P EIKERT, O DED R EGEV, AND N OAH S TEPHENS -DAVIDOWITZ: Pseudorandomness of
     ring-LWE for any ring and modulus. In Proc. 49th STOC, pp. 461–473. ACM Press, 2017.
     [doi:10.1145/3055399.3055489] 8

[21] O DED R EGEV: On lattices, learning with errors, random linear codes, and cryptography. J. ACM,
     56(6):34:1–34:40, 2009. Preliminary version in STOC’05. [doi:10.1145/1568318.1568324] 1, 8, 15

[22] O DED R EGEV: The learning with errors problem (invited survey). In Proc. 25th Conf.
     on Computational Complexity (CCC’10), pp. 191–204. IEEE Comp. Soc. Press, 2010.
     [doi:10.1109/CCC.2010.26] 1


AUTHOR

      Daniele Micciancio
      Professor
      University of California at San Diego (UCSD)
      La Jolla, CA, USA
      daniele cs ucsd edu
      http://cseweb.ucsd.edu/~daniele/


ABOUT THE AUTHOR

      DANIELE M ICCIANCIO got his Ph. D. in Computer Science from M.I.T. in 1998, under
        the supervision of Shafi Goldwasser. He is a full professor in CSE department at the
        University of California, San Diego (UCSD), where he has worked since 1999. He
        is broadly interested in theoretical computer science and cryptography. His research
        focuses on lattice cryptography and the complexity of lattice and coding problems.




                     T HEORY OF C OMPUTING, Volume 14 (13), 2018, pp. 1–17                       17