Yet Another Introductory Number Theory Textbook (Cryptology Emphasis Version)

Authors Jonathan A. Poritz,
Plaintext
             Yet Another
Introductory Number Theory Textbook
        (Cryptology Emphasis Version)


              Jonathan A. Poritz

                  after Wissam Raji

        Department of Mathematics and Physics
           Colorado State University, Pueblo
                  2200 Bonforte Blvd.
               Pueblo, CO 81001, USA
          E-mail: jonathan.poritz@gmail.com
            Web: www.poritz.net/jonathan




                                                07 M AY 2014 11:04MDT
                                              Preface

     This is a first draft of a free (as in speech, not as in beer, [Sta02]) (although it is free as in
beer as well) undergraduate number theory textbook. It was used for Math 319 at Colorado
State University –Pueblo in the spring semester of 2014. Thanks are hereby offered to the
students in that class – Megan Bissell, Tennille Candelaria, Ariana Carlyle, Michael De-
graw, Daniel Fisher, Aaron Griffin, Lindsay Harder, Graham Harper, Helen Huang, Daniel
Nichols, and Arika Waldrep – who offered many useful suggestions and found numerous
typos. I am also grateful to the students in my Math 242 Introduction to Mathematical Pro-
gramming class in that same spring semester of 2014 – Stephen Ciruli, Jamen Cox, Graham
Harper, Joel Kienitz, Matthew Klamm, Christopher Martin, Corey Sullinger, James Todd,
and Shelby Whalen – whose various programming projects produced code that I adapted
to make some of the figures and examples in the text.
     The author gratefully acknowledges the work An Introductory Course in Elementary
Number Theory by Wissam Raji [see www.saylor.org/books/] from which this
was initially adapted. Raji’s text was released under the Creative Commons CC BY 3.0
license, see creativecommons.org/licenses/by/3.0 .
This work is instead released under a CC BY-SA 4.0 license, see
creativecommons.org/licenses/by-sa/4.0 . (The
difference is that if you build future works off of this one, you must also release your
derivative works with a license that allows further remixes over which you have no control.)
     This version: 07 May 2014 11:04MDT. Note this text will be frequently updated and
improved as the author has time, particularly during and immediately after semesters in
which it is being used in a class. Therefore please check back often to the website, which
is www.poritz.net/jonathan/share/yaintt.
     This work is dedicated to my insanely hardworking colleagues at Colorado State Uni-
versity – Pueblo whose dedication to their students, their scholarship, and their communi-
ties is an inspiration. While I was working on the first version of this book, those colleagues
stood up to some of the most benighted, ignorant administrative nonsense I have seen in
the more than thirty years I have been involved in higher education. As MLK said, “The
arc of the moral universe is long, but it bends towards justice.” – It is selfless, intelligent,
hard work like yours that is doing the bending.
                                            Jonathan A. Poritz, 7 May 2014, Pueblo, CO, USA

                                                  iii
                                           Release Notes

    This version of YAINTT has a particular emphasis on connections to cryptology. The
cryptologic material appears in Chapter 4 and §§ 5.5 and 5.6, arising naturally (I hope)
out of the ambient number theory. The main cryptologic applications – being the RSA
cryptosystem, Diffie-Hellman key exchange, and the ElGamal cryptosystem – come out
so naturally from considerations of Euler’s Theorem, primitive roots, and indices that it
renders quite ironic G.H. Hardy’s assertion [Har05] of the purity and eternal inapplicability
of number theory.
    Note, however, that once we broach the subject of these cryptologic algorithms, we
take the time to make careful definitions for many cryptological concepts and to develop
some related ideas of cryptology which have much more tenuous connections to the topic
of number theory. This material therefore has something of a different flavor from the
rest of the text – as is true of all scholarly work in cryptology (indeed, perhaps in all of
computer science), which is clearly a discipline with a different culture from that of “pure”
mathematics. Obviously, these sections could be skipped by an uninterested reader, or
remixed away by an instructor for her own particular class approach.

           Caution: In good Bourbaki1 style, where this symbol appears in the text below,
           it indicates a place where the reasoning is intricate and difficult to follow, or calls
           attention to a common misinterpretation of some point.


    This version, in PDF form, can be found at
       http://www.poritz.net/jonathan/share/yaintt.pdf
while all the files to create custom versions can be found at
       http://www.poritz.net/jonathan/share/yaintt/
– have fun with it, that’s the point of the Creative Commons!




    1
     A fictional mathematician and author of many (non-fictional – they really exist) fine mathematics texts,
such as [Bou04]
                                                     v
                                     Contents

Preface                                                            iii

Release Notes                                                       v

Chapter 1. Well-Ordering and Division                               1
  1.1. The Well-Ordering Principle and Mathematical Induction       1
  1.2. Algebraic Operations with Integers                           5
  1.3. Divisibility and the Division Algorithm                      6
  1.4. Representations of Integers in Different Bases               9
  1.5. The Greatest Common Divisor                                 13
  1.6. The Euclidean Algorithm                                     17

Chapter 2. Congruences                                             21
  2.1. Introduction to Congruences                                 21
  2.2. Linear Congruences                                          27
  2.3. The Chinese Remainder Theorem                               30
  2.4. Another Way to Work with Congruences: Equivalence Classes   33
  2.5. Euler’s φ Function                                          37

Chapter 3. Prime Numbers                                           41
  3.1. Basics and the FTA                                          41
  3.2. Wilson’s Theorem                                            45
  3.3. Multiplicative Order and Applications                       47
  3.4. Another Approach to Fermat’s Little and Euler’s Theorems    51

Chapter 4. Cryptology                                              55
  4.1. Some Speculative History                                    55
  4.2. The Caesar Cipher and Its Variants                          60
  4.3. First Steps into Cryptanalysis: Frequency Analysis          64
  4.4. Public-Key Crypto: the RSA Cryptosystem                     73
  4.5. Digital Signatures                                          81
  4.6. Man-in-the-Middle Attacks, Certificates, and Trust          86

Chapter 5. Indices = Discrete Logarithms                           89
  5.1. More Properties of Multiplicative Order                     91
                                          vii
viii                                       CONTENTS

       5.2.   A Necessary Digression: Gauss’s Theorem on Sums of Euler’s Function    94
       5.3.   Primitive Roots                                                        97
       5.4.   Indices                                                               103
       5.5.   Diffie-Hellman Key Exchange                                           107
       5.6.   The ElGamal Cryptosystem                                              111

Bibliography                                                                        115

Index                                                                               117
                                        CHAPTER 1


                             Well-Ordering and Division

            1.1. The Well-Ordering Principle and Mathematical Induction
    In this chapter, we present three basic tools that will often be used in proving properties
of the integers. We start with a very important property of integers called the well-ordering
principle. We then state what is known as the pigeonhole principle, and then we proceed to
present an important method called mathematical induction.

   1.1.1. The Well-Ordering Principle.

   D EFINITION 1.1.1. Given a set S of numbers (of any kind), we say that ℓ ∈ S is a least
element of S if ∀x ∈ S, either x = ℓ or ℓ < x.

    T HE W ELL -O RDERING P RINCIPLE . Every non-empty set of natural numbers has a
least element.

   This principle is often taken as an axiom.

   1.1.2. The Pigeonhole Principle.

    T HEOREM 1.1.2. The Pigeonhole Principle: Let s, k ∈ N satisfy s > k. If s objects
are placed in k boxes, then at least one box contains more than one object.

    P ROOF. Suppose that none of the boxes contains more than one object. Then there are
at most k objects. This leads to a contradiction with the fact that there are s objects for
s > k.                                                                                   

   1.1.3. The Principle of Mathematical Induction. We now present a valuable tool for
proving results about integers. This tool is the principle of mathematical induction.

    T HEOREM 1.1.3. The First Principle of Mathematical Induction: Let S ⊂ N be a set
satisfying the following two properties:
     (1) 1 ∈ S; and
     (2) ∀k ∈ N, k ∈ S ⇒ k + 1 ∈ S.
Then S = N.
More generally, if P(n) is a property of natural numbers which may or may not be true for
any particular n ∈ N, satisfying
     (1) P(1) is true; and
                                              1
2                                 1. WELL-ORDERING AND DIVISION

     (2) ∀k ∈ N, P(k) ⇒ P(k + 1)
then ∀n ∈ N, P(n) is true.

     P ROOF. We use the well-ordering principle to prove this first principle of mathematical
induction.
     Let S be the set from the first part of the theorem and let T be the set of natural numbers
not in S. We will use a proof by contradiction, so assume T is non-empty.
     Then, by the well-ordering principle, T contains a least element ℓ.
     Note that 1 ∈ S, so 1 ∈/ T and thus ℓ > 1. Therefore ℓ − 1 is a natural number. Since ℓ
is the least element of T , ℓ − 1 is not in T , it is therefore in S.
     But by the defining properties of S, since ℓ − 1 ∈ S, ℓ = ℓ − 1 + 1 ∈ S, which
contradicts the fact that ℓ is a least element of T , so in T , so not in S.
     This contradiction implies that the assumption that T is non-empty is false, hence S =
N.
     For the second part of the theorem, let S = {n ∈ N | P(n) is true} and apply the first
part.                                                                                         

    E XAMPLE 1.1.4. We use mathematical induction to show that ∀n ∈ N
                                 n
                                X        n(n + 1)
(1.1.1)                             j=            .
                                j=1
                                            2
First note that
                                           1
                                           X               1·2
                                                 j=1=
                                           j=1
                                                            2
and thus the the statement is true for n = 1. For the remaining inductive step, suppose that
                                                      P
the formula holds for some particular n ∈ N, that is nj=1 j = n(n+1)
                                                                   2
                                                                      . We show that
                                    n+1
                                    X            (n + 1)(n + 2)
                                           j=                   .
                                     j=1
                                                       2
to complete the proof by induction. Indeed
            n+1        n
            X          X                     n(n + 1)             (n + 1)(n + 2)
                  j=         j + (n + 1) =            + (n + 1) =                ,
            j=1        j=1
                                                2                       2
and the result follows.

    E XAMPLE 1.1.5. Now we use mathematical induction to prove that n! ≤ nn ∀n ∈ N.
    Note that 1! = 1 ≤ 11 = 1. We now present the inductive step. Suppose that
                                                 n! ≤ nn
for some n ∈ N, we prove that (n + 1)! ≤ (n + 1)n+1. Note that
          (n + 1)! = (n + 1)n! ≤ (n + 1).nn < (n + 1)(n + 1)n = (n + 1)n+1 .
             1.1. THE WELL-ORDERING PRINCIPLE AND MATHEMATICAL INDUCTION                    3

This completes the proof.

    T HEOREM 1.1.6. The Second Principle of Mathematical Induction: Let S ⊂ N be a
set satisfying the following two properties:
     (1) 1 ∈ S; and
     (2) ∀k ∈ N, 1, . . . , k ∈ S ⇒ k + 1 ∈ S.
Then S = N.
More generally, if P(n) is a property of natural numbers which may or may not be true for
any particular n ∈ N, satisfying
     (1) P(1) is true; and
     (2) ∀k ∈ N, if P(1), . . . , P(k) are all true, then P(k + 1) is also true
then ∀n ∈ N, P(n) is true.

     P ROOF. To prove the second principle of induction, we use the first principle of induc-
tion.
     Let S be a set of integers as in the first part of the theorem. For n ∈ N, let P(n)
be the mathematical property “1, . . . , n ∈ S”. Then we can apply the First Principle of
Mathematical Induction to prove that ∀n ∈ N P(n) is true, which means S = N. [Details
left to the reader.]
     The second part of the theorem follows from the first in exactly the same way that the
second part of the First Principle of Mathematical Induction followed from the first.      
4                           1. WELL-ORDERING AND DIVISION

    Exercises for §1.1.

    E XERCISE 1.1.1. Prove using mathematical induction that n < 3n for all positive inte-
gers n.
                               P
    E XERCISE 1.1.2. Show that nj=1 j 2 = n(n+1)(2n+1)
                                                6
                                                       .
                                                             P
    E XERCISE 1.1.3. Use mathematical induction to prove that nj=1(−1)j−1 j 2 = (−1)n−1 n(n+
1)/2.
                                                              P
    E XERCISE 1.1.4. Use mathematical induction to prove that nj=1 j 3 = [n(n + 1)/2]2
for every positive integer n.
                                                              P
    E XERCISE 1.1.5. Use mathematical induction to prove that nj=1 (2j − 1) = n2 .

    E XERCISE 1.1.6. Use mathematical induction to prove that 2n < n! for n ≥ 4.

    E XERCISE 1.1.7. Use mathematical induction to prove that n2 < n! for n ≥ 4.
                         1.2. ALGEBRAIC OPERATIONS WITH INTEGERS                             5

                        1.2. Algebraic Operations with Integers
   On Z, the set of integers, there are two basic binary operations, namely addition (de-
noted by +) and multiplication (denoted by ·), which satisfy the following well known
properties:
     (1) Commutativity of addition and multiplication
                                ∀a, b ∈ Z :      a+b= b+a
                                                  a·b= b·a
     (2) Associativity of addition and multiplication
                         ∀a, b, c ∈ Z :     (a + b) + c = a + (b + c)
                                              (a · b) · c = a · (b · c)
     (3) Distributivity of multiplication over addition
                  ∀a, b, c ∈ Z :    a · (b + c) = a · b + a · c.
In the set Z there are identity elements for the two operations + and ·, and these are the
elements 0 and 1 respectively, that satisfy the basic properties
                               ∀a ∈ Z :      a+0=0+a=a
                                              a·1=1·a=a.
    The set Z allows additive inverses for its elements, in the sense that for every a ∈ Z
there exists another integer in Z, denoted by −a, such that
(1.2.1)                                   a + (−a) = 0.
While for multiplication, only the integer 1 has a multiplicative inverse in the sense that
1 is the only integer a such that there exists another integer, denoted by a−1 or by 1/a,
(namely 1 itself in this case) such that
(1.2.2)                                    a · a−1 = 1.
    From the operations of addition and multiplication one can define two other operations
on Z, namely subtraction (denoted by −) and division (denoted by /). Subtraction is a
binary operation on Z, i.e., defined for any two integers in Z, while division is not a binary
operation and thus is defined only for some specific pairs of integers in Z. Subtraction and
division are defined as follows:
     (1) ∀a, b ∈ Z, a − b is defined to be a + (−b)
     (2) Given a, b ∈ Z, where b 6= 0, if ∃c ∈ Z such that a = b · c then a/b is defined to
         be c.
6                                1. WELL-ORDERING AND DIVISION

                       1.3. Divisibility and the Division Algorithm
    We now discuss the concept of divisibility and its properties.
    1.3.1. Integer Divisibility.
    D EFINITION 1.3.1. If a and b are integers such that a 6= 0, then we say a divides b and
write a | b if there exists an integer k such that b = ka. That is, given a, b ∈ Z such that
a 6= 0, we write a | b if ∃k ∈ Z. s.t. b = ka.
    If a divides b, we also say a is a factor [or divisor] of b, and b is a multiple of a. If a
does not divide b, we write a ∤ b.
    E XAMPLE 1.3.2. For example, 2 | 4 and 7 | 63, while 5 ∤ 26.
    D EFINITION 1.3.3. Given a ∈ Z, we say a is even if 2 | a, i.e., if ∃k ∈ Z s.t. a = 2k.
In contrast, given a ∈ Z, we say a is odd if 2 ∤ a.
   It is a consequence of the Division Algorithm, below, that if a is odd then ∃k ∈ Z s.t.
a = 2k + 1.
    P ROPOSITION 1.3.4. ∀a ∈ Z we have a | 0.
    P ROPOSITION 1.3.5. If b ∈ Z is such that |b| < a, and b 6= 0, then a ∤ b.
    P ROPOSITION 1.3.6. Given a, b ∈ Z, a | b ⇔ a | |b|.
    T HEOREM 1.3.7. If a, b and c are integers such that a | b and b | c, then a | c.
   P ROOF. Since a | b and b | c, we know ∃k1 , k2 ∈ Z such that b = k1 a and c = k2 b.
Hence c = k1 k2 a and so a | c.                                                     
    E XAMPLE 1.3.8. Since 6 | 18 and 18 | 36, then 6 | 36.
   The following theorem states that if an integer divides two other integers then it divides
any linear combination of these integers.
    T HEOREM 1.3.9. ∀a, b, c, m, n ∈ Z, if c | a and c | b then c | (ma + nb).
    P ROOF. Since c | a and c | b, ∃k1 , k2 ∈ Z such that a = k1 c and b = k2 c. Thus
                         ma + nb = mk1 c + nk2 c = c(mk1 + nk2 ),
and hence c | (ma + nb).                                                                       
    Theorem 1.3.9 can be generalized to any finite linear combination as follows. If
                    n ∈ N, a, b1 , . . . , bn ∈ Z and a | b1 , a | b2 , . . . , a | bn
then
                                                  n
                                                  X
(1.3.1)                                      a|         kj bj
                                                  j=1

∀k1 , . . . , kn ∈ Z. It would be a nice exercise to prove this generalization by induction.
                       1.3. DIVISIBILITY AND THE DIVISION ALGORITHM                         7

   1.3.2. The Division Algorithm.

    T HEOREM 1.3.10. The Division Algorithm Given a, b ∈ Z such that b > 0, there exist
unique q, r ∈ Z such that a = qb + r and 0 ≤ r < b. This q is called the quotient and r
the remainder when a is divided by b.

    P ROOF. Consider the set A = {a − bk ≥ 0 | k ∈ Z}. Note that A is nonempty since
for k < a/b, a − bk > 0. By the well-ordering principle, A has a least element r = a − qb
for some q ∈ Z. Notice that r ≥ 0 by construction. Now if r ≥ b then (since b > 0)
                       r > r − b = a − qb − b = a − (q + 1)b ≥ 0.
This leads to a contradiction since r is assumed to be the least positive integer of the form
r = a − qb. As a result we have 0 ≤ r < b.
   We will show that q and r are unique. Suppose that a = q1 b + r1 and a = q2 b + r2 with
0 ≤ r1 < b and 0 ≤ r2 < b. Then we have
               a − a = q1 b + r1 − (q2 b + r2 ) = (q1 − q2 )b + (r1 − r2 ) = 0.
As a result we have
                                   (q1 − q2 )b = r2 − r1 .
Thus we get that
                                        b | (r2 − r1 ).
And since − max(r1 , r2 ) ≤ |r2 − r1 | ≤ max(r1 , r2 ), and b > max(r1 , r2 ), then r2 − r1
must be 0, i.e. r2 = r1 . And since bq1 + r1 = bq2 + r2 , we also get that q1 = q2 . This
proves uniqueness.                                                                       

   E XAMPLE 1.3.11. If a = 71 and b = 6, then 71 = 6 · 11 + 5. Here q = 11 and r = 5.
8                             1. WELL-ORDERING AND DIVISION

    Exercises for §1.3.

    E XERCISE 1.3.1. Show that 5 | 25, 19 | 38 and 2 | 98.

   E XERCISE 1.3.2. Use the division algorithm to find the quotient and the remainder
when 76 is divided by 13.

   E XERCISE 1.3.3. Use the division algorithm to find the quotient and the remainder
when -100 is divided by 13.

    E XERCISE 1.3.4. Show that if a, b, c and d are integers with a and c nonzero, such that
a | b and c | d, then ac | bd.

    E XERCISE 1.3.5. Show that if a and b are positive integers and a | b, then a ≤ b.

    E XERCISE 1.3.6. Prove that the sum of two even integers is even, the sum of two odd
integers is even and the sum of an even integer and an odd integer is odd.

   E XERCISE 1.3.7. Show that the product of two even integers is even, the product of
two odd integers is odd and the product of an even integer and an odd integer is even.

    E XERCISE 1.3.8. Show that if m is an integer then 3 divides m3 − m.

    E XERCISE 1.3.9. Show that the square of every odd integer is of the form 8m + 1.

    E XERCISE 1.3.10. Show that the square of any integer is of the form 3m or 3m + 1 but
not of the form 3m + 2.

    E XERCISE 1.3.11. Show that if ac | bc, then a | b.

    E XERCISE 1.3.12. Show that if a | b and b | a then a = ±b.
                   1.4. REPRESENTATIONS OF INTEGERS IN DIFFERENT BASES                        9

                   1.4. Representations of Integers in Different Bases
    In this section, we show how any positive integer can be written in terms of any positive
base integer expansion in a unique way. Normally we use decimal notation to represent
integers, we will show how to convert an integer from decimal notation into any other
positive base integer notation and vise versa. Using the decimal notation in daily life is
more traditional probably only because we have ten fingers. (“What about our toes?” you
cry. I don’t know. And apparently the Babylonians had 30 fingers on each hand, or 15 on
each hand and each foot, since they used base 60.)
    Notation An integer a written in base b expansion is denoted by (a)b .

   T HEOREM 1.4.1. Let b ∈ Z satisfy b > 1. Then ∀m ∈ N, ∃l ∈ N and ∃a1 , . . . , al ∈ Z
such that

                           m = al bl + al−1 bl−1 + · · · + a1 b + a0 ,
                            0 ≤ aj < b for j = 0, 1, . . . , l, and
                                             al 6= 0.

   P ROOF. Fix an m ∈ N. We start by dividing m by b and we get

                                m = q0 b + a0 , 0 ≤ a0 < b.

If q0 6= 0 then we continue to divide q0 by b and we get

                                q0 = q1 b + a1 , 0 ≤ a1 < b.

We continue this process and hence we get

                              q1 = q2 b + a2 , 0 ≤ a2 < b,
                                 .
                                 .
                                 .
                            ql−2 = ql−1 b + al−1 , 0 ≤ al−1 < b,
                            ql−1 = 0 · b + al , 0 ≤ al < b.

Note that the sequence q0 , q1 , . . . is a decreasing sequence of non-negative integers with a
last term ql that must be 0.
    Now substituting the equation q0 = q1 b + a1 in m = q0 b + a0 , we get

                         m = (q1 b + a1 )b + a0 = q1 b2 + a1 b + a0 ,
10                               1. WELL-ORDERING AND DIVISION

Successively substituting the equations in m, we get
                           m = q2 b3 + a2 b2 + a1 b + a0 ,
                             .
                             .
                             .
                              = ql−1 bl + al−1 bl−1 + · · · + a1 b + a0 ,
                              = al bl + al−1 bl−1 + · · · + a1 b + a0 .
What remains to prove is that the representation is unique. Suppose now that
         m = al bl + al−1 bl−1 + · · · + a1 b + a0 = cl bl + cl−1 bl−1 + · · · + c1 b + c0
where if the number of terms is different in one expansion, we add zero coefficients to make
the number of terms agree. Subtracting the two expansions, we get
            (al − cl )bl + (al−1 − cl−1 )bl−1 + · · · + (a1 − c1 )b + (a0 − c0 ) = 0.
If the two expansions are different, then there exists 0 ≤ j ≤ l such that cj 6= aj . As a
result, we get
                 bj ((al − cl )bl−j + · · · + (aj+1 − cj+1)b + (aj − cj )) = 0
and since b 6= 0, we get
                   (al − cl )bl−j + · · · + (aj+1 − cj+1 )b + (aj − cj ) = 0.
We now get
                       aj − cj = (al − cl )bl−j + · · · + (aj+1 − cj+1 )b,
and as a result, b | (aj − cj ). Since 0 ≤ aj < b and 0 ≤ cj < b, we get that aj = cj . This is
a contradiction and hence the expansion is unique.                                           

      D EFINITION 1.4.2. Given b ∈ Z satisfying b > 1. For m ∈ N, let ℓ ∈ N and
a1 , . . . , aℓ ∈ Z be as in the above theorem (1.4.1). Then the base b expression for m
is the sequences of digits mb = aℓ . . . a1 . If b ≥ 10, we often use some other single
symbols to represent the possible values from 10 to b − 1 of the ai ’s. For example,
                                            10 ! A
                                            11 ! B
                                            12 ! C
                                               etc.
Base 2 representation of integers is called binary representation. Binary representation
is useful for computers: the coefficients a0 , . . . , al of a binary representation all satisfy
0 ≤ aj < 2, hence they are 0 or 1. Thus to represent an integer on l wires, one can have
                   1.4. REPRESENTATIONS OF INTEGERS IN DIFFERENT BASES                      11

each wire either have voltage (1) or not (0). (In fact, the word bit is a contraction of binary
digit.)
    Computer programmers also frequently use base 8 and base 16, called octal and hexa-
decimal or hex, respectively. The Babylonians used base 60, called sexagesimal.

   E XAMPLE 1.4.3. To find the expansion of 214 base 3: we do the following
                                      214 = 3 · 71 + 1
                                       71 = 3 · 23 + 2
                                       23 = 3 · 7 + 2
                                         7=3·2+1
                                         2=3·0+2


As a result, to obtain a base 3 expansion of 214, we take the remainders of divisions and
we get that (214)10 = (21221)3.

   E XAMPLE 1.4.4. To find the base 10 expansion, i.e., the decimal expansion, of (364)7:
We do the following: 4 · 70 + 6 · 71 + 3 · 72 = 4 + 42 + 147 = 193.
12                            1. WELL-ORDERING AND DIVISION

     1.4.1. Exercises for §1.4.

     E XERCISE 1.4.1. Convert (7482)10 to base 6 notation.

     E XERCISE 1.4.2. Convert (98156)10 to base 8 notation.

     E XERCISE 1.4.3. Convert (101011101)2 to decimal notation.

     E XERCISE 1.4.4. Convert (AB6C7D)16 to decimal notation.

     E XERCISE 1.4.5. Convert (9A0B)16 to binary notation.
                             1.5. THE GREATEST COMMON DIVISOR                                13

                           1.5. The Greatest Common Divisor
    In this section we define the greatest common divisor (gcd) of two integers and discuss
its properties. We also prove that the greatest common divisor of two integers is a linear
combination of these integers.
    Two integers a and b, not both 0, can have only finitely many divisors (see Exer-
cise 1.3.5), and thus can have only finitely many divisors in common. In this section,
we are interested in the greatest of these common divisors.

    D EFINITION 1.5.1. Given a, b ∈ Z, not both zero, the greatest common divisor is the
largest integer that divides both a and b, and is written gcd(a, b) (or sometimes just (a, b)).
    When it makes some formulæ simpler, we will write gcd(0, 0) = 0.

   E XAMPLE 1.5.2. The greatest common divisor of 24 and 18 is 6. In other words
gcd(24, 18) = 6.

    D EFINITION 1.5.3. a, b ∈ Z are said to be relatively prime if gcd(a, b) = 1.

    E XAMPLE 1.5.4. The greatest common divisor of 9 and 16 is 1, thus they are relatively
prime.

    Note that every integer has positive and negative divisors. If a is a positive divisor of m,
then −a is also a divisor of m. Therefore by our definition of the greatest common divisor,
we can see that gcd(a, b) = gcd(|a|, |b|).
    We can use the gcd of two integers to make relatively prime integers:

    T HEOREM 1.5.5. If a, b ∈ Z have gcd(a, b) = d then gcd(a/d, b/d) = 1.

    P ROOF. Fix a, b ∈ Z. We will show that a/d and b/d have no common positive divisors
other than 1. Let k ∈ N be a divisor of both a/d and b/d, so ∃m, n ∈ N such that
                                 a/d = km and b/d = kn
Thus we get that
                                  a = kmd and b = knd.
Hence kd is a common divisor of both a and b. Also, kd ≥ d. However, d is the greatest
common divisor of a and b. As a result, we get that k = 1.                          
   The next theorem shows that the greatest common divisor of two integers does not
change when we add a multiple of one of the two integers to the other.

    T HEOREM 1.5.6. Let a, b, c ∈ Z. Then gcd(a, b) = gcd(a + cb, b).

    P ROOF. We will show that every divisor of a and b is also a divisor of a + cb and b and
vise versa. Hence they have exactly the same divisors. So we get that the greatest common
divisor of a and b will also be the greatest common divisor of a + cb and b. Let k be a
14                             1. WELL-ORDERING AND DIVISION

common divisor of a and b. By Theorem 1.3.9, k | (a + cb) and hence k is a divisor of
a + cb. Now assume that l is a common divisor of a + cb and b. Also by Theorem 1.3.9 we
have,
                                 l | ((a + cb) − cb) = a.
As a result, l is a common divisor of a and b and the result follows.                      

     E XAMPLE 1.5.7. Notice that gcd(4, 14) = gcd(4, 14 − 3 · 4) = gcd(4, 2) = 2.

    We now present a theorem which proves that the greatest common divisor of two inte-
gers can be written as a linear combination of the two integers.

   T HEOREM 1.5.8. Let a, b ∈ Z not both be zero. Then gcd(a, b) is the smallest natural
number which is of the form d = ma + nb for some m, n ∈ Z.

    P ROOF. Assume without loss of generality that a, b ∈ N are positive integers. Consider
the set
                  S = {d ∈ N | d = ma + nb for some m, n ∈ Z} .
S is non-empty since a = 1 · a + 0 · b and b = 0 · a + 1 · b are both in S. Let d ∈ N be the
least element of S, whose existence is guaranteed by the well-ordering principle. Notice
d = ma + nb for some m, n ∈ Z, since d ∈ S. We still must prove that d divides both a
and b and that it is the greatest such common divisor.
    By the division algorithm, ∃q, r ∈ Z such that
                                  a = qd + r, 0 ≤ r < d.
Thus we have
                    r = a − qd = a − q(ma + nb) = (1 − qm)a − qnb.
We then have that r is a linear combination of a and b. Since 0 ≤ r < d and d is the
least positive integer which is a linear combination of a and b, we must have r = 0 and so
a = qd. Hence d | a.
    The same sort of argument will show that d | b.
    Now notice that if there is a divisor c that divides both a and b. Then c divides any
linear combination of a and b by Theorem 1.3.9. Hence c | d. This proves that any common
divisor of a and b divides d. Hence c ≤ d, and d is the greatest common divisor.        
     There is a simple application of this which will be very useful in the future:

   C OROLLARY 1.5.9. If a, b ∈ Z are relatively prime, then ∃m, n ∈ Z such that ma +
nb = 1.

   D EFINITION 1.5.10. For some n ∈ N, let a1 , a2 , . . . , an ∈ Z not be all 0. The greatest
common divisor of these integers is the largest integer that divides all of them, and is
denoted gcd(a1 , . . . , an ).
                             1.5. THE GREATEST COMMON DIVISOR                                15

    D EFINITION 1.5.11. For some n ∈ N, a1 , a2 , . . . , an ∈ Z are said to be mutually
relatively prime if gcd(a1 , a2 , . . . , an ) = 1.

    E XAMPLE 1.5.12. The integers 3, 6, 7 are mutually relatively prime since (3, 6, 7) = 1
although (3, 6) = 3.

   D EFINITION 1.5.13. For some n ∈ N, a1 , a2 , . . . , an ∈ Z are called pairwise relatively
prime if ∀i, j ∈ N such that i ≤ n, j ≤ n, and i 6= j, we have gcd(ai , aj ) = 1.

    E XAMPLE 1.5.14. The integers 3, 14, 25 are pairwise relatively prime. Notice also that
these integers are mutually relatively prime.

    P ROPOSITION 1.5.15. For n ∈ N and a1 , . . . , an ∈ Z, if a1 , a2 , . . . , an are pairwise
relatively prime then they are mutually relatively prime.
16                           1. WELL-ORDERING AND DIVISION

     Exercises for §1.5.

     E XERCISE 1.5.1. Find the greatest common divisor of 15 and 35.

     E XERCISE 1.5.2. Find the greatest common divisor of 100 and 104.

     E XERCISE 1.5.3. Find the greatest common divisor of -30 and 95.

     E XERCISE 1.5.4. Let m ∈ N. Find the greatest common divisor of m and m + 1.

     E XERCISE 1.5.5. Let m ∈ N, find the greatest common divisor of m and m + 2.

    E XERCISE 1.5.6. Show that if m, n ∈ Z have gcd(m, n) = 1, then gcd(m+n, m−n) =
1 or 2.

     E XERCISE 1.5.7. Show that if m ∈ N, then 3m + 2 and 5m + 3 are relatively prime.

    E XERCISE 1.5.8. Show that if a, b ∈ Z are relatively prime, then gcd(a+2b, 2a+b) = 1
or 3.

   E XERCISE 1.5.9. Show that if a1 , a2 , . . . , an ∈ Z are not all 0 and c ∈ N, then
gcd(ca1 , ca2 , . . . , can ) = c · gcd(a1 , a2 , . . . , an ).
                                   1.6. THE EUCLIDEAN ALGORITHM                              17

                              1.6. The Euclidean Algorithm
    In this section we describe a systematic method that determines the greatest common
divisor of two integers, due to Euclid and thus called the Euclidean algorithm.

    L EMMA 1.6.1. If a, b, q, r ∈ Z and a = qb + r, then gcd(a, b) = gcd(r, b).

    P ROOF. Note that by theorem 8, we have gcd(bq + r, b) = gcd(b, r).                      

    Now to the Euclidean algorithm in its general form, which basically states that the
greatest common divisor of two integers is the last non zero remainder of successive divi-
sions.

    T HEOREM 1.6.2. Let a, b ∈ N and assume a ≥ b. Define r0 = a, r1 = b, s0 = 1,
s1 = 0, t0 = 0, and t1 = 1. Then apply the division algorithm successively to obtain
quotients and remainders qj , rj ∈ N satisfying rj = rj+1qj+1 + rj+2 and 0 ≤ rj+2 < rj+1
for all j = 0, 1, . . . , n − 2 where n is defined so that rn+1 = 0. Along the way, also define
sj+1 = sj−1 − qj+1 sj and tj+1 = tj−1 − qj+1 tj . Then gcd(a, b) = rn = sn+1 a + tn+1 b.

    P ROOF. By applying the division algorithm, we see that

                            r0 = q1 r1 + r2     0 ≤ r2 < r1 ,
                            r1 = q2 r2 + r3     0 ≤ r3 < r2 ,
                               .
                               .
                               .
                          rn−2 = qn−1 rn−1 + rn      0 ≤ rn < rn−1 ,
                          rn−1 = qn rn .

Notice that, we will have a remainder of 0 eventually since all the remainders are integers
and every remainder in the next step is less than the remainder in the previous one. By
Lemma 1.6.1, we see that

              gcd(a, b) = gcd(b, r2 ) = gcd(r2 , r3 ) = · · · = gcd(rn , 0) = rn .

                                                                                             

Note: The full version of this theorem, with the sj ’s and tj , is called the extended Eu-
clidean Algorithm, while a simpler version without those coefficients is know as Eu-
clidean Algorithm.
     The attentive reader will have seen that We did not actually prove that the sj ’s and tj ’s
can be used, as claimed, to write the gcd as a linear combination of a and b. This proof is
left as an exercise, below.
18                         1. WELL-ORDERING AND DIVISION

   E XAMPLE 1.6.3. We will find the greatest common divisor of 4147 and 10672:
Note that
                               10672 = 4147 · 2 + 2378,
                                4147 = 2378 · 1 + 1769,
                                2378 = 1769 · 1 + 609,
                                1769 = 609 · 2 + 551,
                                 609 = 551 · 1 + 58,
                                 551 = 58 · 9 + 29,
                                  58 = 29 · 2,


Hence gcd(4147, 10672) = 29.
                            1.6. THE EUCLIDEAN ALGORITHM                            19

   Exercises for §1.6.

   E XERCISE 1.6.1. Use the Euclidean algorithm to find the greatest common divisor of
412 and 32 and express it in terms of the two integers.

   E XERCISE 1.6.2. Use the Euclidean algorithm to find the greatest common divisor of
780 and 150 and express it in terms of the two integers.

   E XERCISE 1.6.3. Find the greatest common divisor of 70, 98, 108.

   E XERCISE 1.6.4. Let a, b ∈ N be even. Prove that gcd(a, b) = 2 gcd(a/2, b/2).

   E XERCISE 1.6.5. Show that if a ∈ N is even and b ∈ N is odd, then gcd(a, b) =
gcd(a/2, b).

   E XERCISE 1.6.6. Prove the extended part of the Extended Euclidean Algorithm.
                                      CHAPTER 2


                                     Congruences

    A congruence is nothing more than a statement about divisibility. The theory of con-
gruences was introduced by Carl Friedrich Gauss, in his monumental Disquisitiones Arith-
meticae (published in 1801, when he was 24;a translation is [Gau86]).
    We start by introducing congruences and their properties. We then present solutions to
linear congruences which will serve as an introduction to the Chinese Remainder Theorem
that follows.

                           2.1. Introduction to Congruences
   As we mentioned in the introduction, the theory of congruences was developed by
Gauss at the beginning of the nineteenth century.

    D EFINITION 2.1.1. Given a, b ∈ Z and n ∈ N, we say that a is congruent to b modulo
n if n | (a − b), i.e., if ∃k ∈ Z such that a = b + kn. If a is congruent to b modulo n, we
write a ≡ b (mod n).

   E XAMPLE 2.1.2. 19 ≡ 5 (mod 7). Similarly 2k + 1 ≡ 1 (mod 2) which means every
odd number is congruent to 1 modulo 2.

   Congruence is much like equality in many ways. For example:

   T HEOREM 2.1.3. Given a, b, c, d ∈ Z and n ∈ N. Then
     (1) If a ≡ b (mod n), then b ≡ a (mod n).
     (2) If a ≡ b (mod n) and b ≡ c (mod n), then a ≡ c (mod m).
     (3) If a ≡ b (mod n), then a + c ≡ b + c (mod n).
     (4) If a ≡ b (mod n), then a − c ≡ b − c (mod n).
     (5) If a ≡ b (mod n), then ac ≡ bc (mod n).
     (6) If c > 0 and a ≡ b (mod n), then ac ≡ bc (mod nc).
     (7) If a ≡ b (mod n) and c ≡ d (mod n) then a + c ≡ b + d (mod n).
     (8) If a ≡ b (mod n) and c ≡ d (mod n) then a − c ≡ b − d (mod n).
     (9) If a ≡ b (mod n) and c ≡ d (mod n) then ac ≡ bd (mod n).

   P ROOF.
     (1) If a ≡ b (mod n), then n | (a − b). Thus ∃k ∈ Z such that a − b = kn. This
         implies b − a = (−k)n and thus n | (b − a). Consequently b ≡ a (mod n).
                                            21
22                                   2. CONGRUENCES

     (2) Since a ≡ b (mod n) and b ≡ c (mod n), n | (a − b) and n | (b − c). As a
         result, there ∃k, l ∈ Z such that a = b + kn and b = c + ln, which imply that
         a = c + (k + l)n. In other words, a = c (mod n).
     (3) Since a ≡ b (mod n), n | (a − b). So if we add and subtract c we get

                                 n | ((a + c) − (b + c))

        which means that
                                a+c ≡b+c        (mod n).
     (4) Since a ≡ b (mod n), n | (a − b) so we can subtract and add c to get

                                 n | ((a − c) − (b − c))

        so
                                a−c ≡b−c        (mod n).
     (5) If a ≡ b (mod n), n | (a − b). Thus there ∃k ∈ Z such that a − b = kn and as a
         result ac − bc = (kc)n. Therefore

                                      n | (ac − bc)

        and hence
                                  ac ≡ bc    (mod n).
     (6) If a ≡ b (mod n), n | (a − b). Thus there ∃k ∈ Z such that a − b = kn and as a
         result
                                    ac − bc = (k)cn.
        Thus
                                     nc | (ac − bc)
        and hence
                                  ac ≡ bc   (mod nc).
     (7) Since a ≡ b (mod n) and c ≡ d (mod n), n | (a − b) and n | (c − d). As a result,
         there ∃k, l ∈ Z such that a − b = kn and c − d = ln. Note that

                    (a − b) + (c − d) = (a + c) − (b + d) = (k + l)n.

        As a result,
                                 n | ((a + c) − (b + d)),
        hence
                               a + c ≡ b + d (mod n).
                             2.1. INTRODUCTION TO CONGRUENCES                               23

     (8) If a = b + kn and c = d + ln for k, l ∈ Z, we have
                    (a − b) − (c − d) = (a − c) − (b − d) = (k − l)n.
         As a result,
                                     n | ((a − c) − (b − d)),
         hence
                                  a − c ≡ b − d (mod n).
     (9) ∃k, l ∈ Z such that such that a − b = kn and c − d = ln and thus ca − cb = (ck)n
         and bc − bd = (bl)n. Note that
                        (ca − cb) + (bc − bd) = ac − bd = (kc − lb)n.
         As a result,
                                         n | (ac − bd),
         hence
                                      ac ≡ bd    (mod n).
                                                                                           

   Here is a technical result which will be useful later:

    T HEOREM 2.1.4. Given a, b, c ∈ Z, if a | c, b | c, and a and b are relatively prime, then
ab | c.

   P ROOF. By Corollary 1.5.9, we know ∃m, n ∈ Z such that ma + nb = 1. Also,
because of the divisibility hypotheses, we also know ∃p, q ∈ Z such that c = pa and
c = qb. Compute:
         c = c · 1 = c(ma + nb) = mca + ncb = mqba + npab = (mq + np)ab .
But this means ab | c, as desired.                                                         

   E XAMPLES 2.1.5.
     (1) 14 ≡ 8 (mod 6) so 8 ≡ 14 (mod 6).
     (2) Because 22 ≡ 10 (mod 6) and 10 ≡ 4 (mod 6), it is also true that 22 ≡ 4
         (mod 6).
     (3) 50 ≡ 20 (mod 15) so 50 + 5 = 55 ≡ 20 + 5 = 25 (mod 15).
     (4) 50 ≡ 20 (mod 15) so 50 − 5 = 45 ≡ 20 − 5 = 15 (mod 15).
     (5) 19 ≡ 16 (mod 3) so 2(19) = 38 ≡ 2(16) = 32 (mod 3).
     (6) 19 ≡ 16 (mod 3) so 2(19) = 38 ≡ 2(16) = 32 (mod 2 · 3) or 38 ≡ 2(16) = 32
         (mod 6).
     (7) Because 19 ≡ 3 (mod 8) and 17 ≡ 9 (mod 8), we have 19 + 17 = 36 ≡ 3 + 9 =
         12 (mod 8).
24                                      2. CONGRUENCES

      (8) Because 19 ≡ 3 (mod 8) and 17 ≡ 9 (mod 8), we have 19 − 17 = 2 ≡ 3 − 9 =
          −6 (mod 8).
      (9) Because 19 ≡ 3 (mod 8) and 17 ≡ 9 (mod 8), we have 19(17) = 323 ≡ 3(9) =
          27 (mod 8).

    Here is a result which at first seems very simple, but turns out to be immensely useful
– so useful it has a name.

    L EMMA 2.1.6. Euclid’s Lemma: Given x, y, z ∈ Z, if x | yz and gcd(x, y) = 1 then
x | z.

    P ROOF. From Corollary 1.5.9, we know ∃m, n ∈ Z such that mx + ny = 1. Multiply-
ing by z, we get mxz + nyz = z. But we’ve assumed that x | yz, so x | nyz, and certainly
x | mxz, so x | mxz + nyz, i.e., x | z.                                               

    We now present a theorem that will show one difference between equations and congru-
ences: in equations, if we divide both sides of the equation by a non-zero number, equality
holds. However, in congruences, this is not necessarily true. In other words, dividing both
sides of a congruence by the same integer does not necessarily preserve the congruence.

     T HEOREM 2.1.7.
      (1) Given a, b, c ∈ Z and n ∈ N, define d = gcd(a, n) ∈ N. If ab ≡ ac (mod n) then
          b ≡ c (mod n/d).
      (2) In particular, if gcd(a, n) = 1 then


                      b = c (mod n)           ⇔        ab ≡ ac    (mod n).

     P ROOF. For Part 1, if ab ≡ ac (mod n), then

                                   n | (ab − ac) = a(b − c).

Hence ∃k ∈ Z such that a(b − c) = kn. Dividing both sides by d, we get (a/d)(b − c) =
k(n/d) or (n/d) | (a/d)(b − c). Now, by Theorem 1.5.5 gcd(a/d, n/d) = 1 so Euclid’s
Lemma 2.1.6 tells us that (n/d) | (b − c). Hence b ≡ c (mod n/d).
    For Part 2, the direction ⇒ is part 5 of Theorem 2.1.3, while ⇐ is a special case of
Part 1.                                                                               

     E XAMPLE 2.1.8. 38 ≡ 10 (mod 7). Since gcd(2, 7) = 1, we have 19 ≡ 5 (mod 7).

     One last technical result is worth stating clearly at this point:

    T HEOREM 2.1.9. Given n, d ∈ N such that d | n, there are exactly d values x ∈ Z, up
to congruence modulo n, satisfying x ≡ 0 (mod n/d).
                            2.1. INTRODUCTION TO CONGRUENCES                                25

    P ROOF. Let xj = j(n/d) for j = 0, . . . , (d − 1). Certainly each of these d values xj is
a multiple of n/d and so solves x ≡ 0 (mod n/d). All we must show, then, is that every
solution x of x ≡ 0 (mod n/d) is congruent, modulo n, to one of these xj .
    Let x be such a solution, so ∃k ∈ Z such that x = k(n/d). Use the Division Algorithm
for x divided by n, getting x = qn + r for some q, r ∈ Z with 0 ≤ r < n. But
                    r = x − qn = k(n/d) − qd(n/d) = (k − qd)(n/d)
so r is a multiple of (n/d) which lies in the range [0, n). The only such multiples are the
x0 , . . . , x(d−1) defined above; say r = xj . Then x = qn + r = qn + xj ≡ xj (mod xj ),
so every solution x is congruent modulo n to exactly one of the d particular solutions
x1 , . . . , x(d−1) .                                                                    
26                                        2. CONGRUENCES

     Exercises for §2.1.

     E XERCISE 2.1.1. Determine whether 3 and 99 are congruent modulo 7 or not.

     E XERCISE 2.1.2. Show that if x is an odd integer, then x2 ≡ 1 (mod 8).

   E XERCISE 2.1.3. Show that if a, b ∈ Z and m, n ∈ N are such that n | m and a ≡ b
(mod m), then a ≡ b (mod n).

   E XERCISE 2.1.4. Show that if n, k ∈ N and {a1 , . . . , ak , b1 , . . . , bk } ⊂ Z satisfy ai ≡ bi
                                  P         P
(mod n) for i = 1, 2, ..., k, then ki=1 ai ≡ ki=1 bi (mod n).

     E XERCISE 2.1.5. For which n ∈ N is it true that 1 + 2 + ... + (n − 1) ≡ 0 (mod n)?
                                  2.2. LINEAR CONGRUENCES                                   27

                                 2.2. Linear Congruences
    Because congruence is analogous to equality, it is natural to ask about the analogues
of linear equations, the simplest equations one can solve in algebra, but using congruence
rather than equality. In this section, we discuss linear congruences of one variable and their
solutions.
    We start with a definition:

   D EFINITION 2.2.1. Given constants a, b ∈ Z and n ∈ Z, a congruence of the form
ax ≡ b (mod n) where x ∈ Z is unknown is called a linear congruence in one variable.

   If a linear congruence has one solution, then it has infinitely many:

    T HEOREM 2.2.2. Given constants a, b ∈ Z, n ∈ Z, and a solution x ∈ Z to the linear
congruence ax ≡ b (mod n), any other x′ ∈ Z satisfying x′ ≡ x (mod n) is also a
solution of the same congruence.

   [Note: in the early history of number theory, before Gauss, one talked about

    D EFINITION 2.2.3. An algebraic equation whose constants and variables are all inte-
gers is called a Diophantine equation.

Then the modern linear congruence ax ≡ b (mod n), for a, b ∈ Z and n ∈ N is equivalent
to the linear Diophantine equation ax − ny = b in the two unknowns x and y.]
    The following gives a fairly complete characterization of solutions of linear congru-
ences:

   T HEOREM 2.2.4. Let a, b ∈ Z and n ∈ N and consider the linear congruence
                                    ax ≡ b (mod n) .
Setting d = gcd(a, n), we have
     (1) If d ∤ b, then the congruence has no solutions.
     (2) If d | b, then the congruence has exactly d solutions which are distinct modulo n.

     P ROOF. For Part 1, we prove its contrapositive. So assume the congruence has solu-
tions, meaning ∃x, k ∈ Z such that ax − b = kn, or ax − kn = b. But since d = gcd(a, n)
is a common divisor of a and n, it divides the linear combination ax − kn = b. Hence d | b.
     Now for Part 2, assume d | b, so ∃k ∈ Z such that kd = b. But from Theorem 1.5.8
we know ∃p, q ∈ Z such that d = pa + qn. This means that kpa + kqn = kd = b or,
rearranging, a(kp) − b = (−kq)n. Hence n | a(kp) = b, i.e., a(kp) ≡ b (mod n) and thus
x = kp is one solution to the linear congruence ax ≡ b (mod n).
     Finally, let us prove that there are the correct number of solutions, mod n, of the con-
gruence equation. We have just seen that there is at least one x ∈ Z satisfying ax ≡ b
(mod n). Let y ∈ Z be any other solution. Then ax ≡ b ≡ ay (mod n). By part (2) of
28                                    2. CONGRUENCES

Theorem 2.1.7, x ≡ y (mod n/d), or δ = y − x ≡ 0 (mod n/d). Now by Theorem 2.1.9,
there are exactly d possibilities, modulo n, for this δ. Thus there are d solutions of ax ≡ b
(mod n) of the form x + δ.                                                                 

     R EMARK 2.2.5. Notice that if a ∈ Z and n ∈ N are relatively prime, then ∀b ∈ Z there
is a unique solution modulo n to the equation ax ≡ b (mod n).

    E XAMPLE 2.2.6. Let us find all the solutions of the congruence 3x ≡ 12 (mod 6).
Notice that gcd(3, 6) = 3 and 3 | 12. Thus there are three incongruent solutions modulo 6.
Using the Euclidean Algorithm to find the solution of the equation 3x − 6y = 12 we get
a solution x0 = 6. Thus the three incongruent (modulo 6) solutions are given by x1 = 6
(mod 6), x1 = 6 + 2 = 2 (mod 6) and x2 = 6 + 4 = 4 (mod 6).

   As we mentioned in Remark 2.2.5, the congruence ax ≡ b (mod n) for a, b ∈ Z and
n ∈ N has a unique (modulo n) solution if gcd(a, n) = 1. This will allow us to talk about
modular inverses.

    D EFINITION 2.2.7. Given a ∈ Z and n ∈ N, a solution to the congruence ax ≡ 1
(mod n) for (a, n) = 1 is called the inverse of a modulo n. We denote such an inverse by
a−1 , with the n to be understood from context.

Stating formally what was just recalled from Remark 2.2.5, we have

    C OROLLARY 2.2.8. Given a ∈ Z and n ∈ N which are relatively prime, the modular
inverse a−1 exists and is unique modulo n.

    E XAMPLE 2.2.9. The modular inverse 7−1 of 7 modulo 48 is 7. Notice that a solution
of 7x ≡ 1 (mod 48) is x ≡ 7 (mod 48).
                               2.2. LINEAR CONGRUENCES                              29

   Exercises for §2.2.

   E XERCISE 2.2.1. Find all solutions of 3x ≡ 6 (mod 9).

   E XERCISE 2.2.2. Find all solutions of 3x ≡ 2 (mod 7).

   E XERCISE 2.2.3. Find inverses modulo 13 of 2 and of 11.

   E XERCISE 2.2.4. Given a ∈ Z and n ∈ N, show that if a−1 is the inverse of a modulo
n and b−1 is the inverse of b modulo n, then a−1 b−1 is the inverse of ab modulo n.
30                                     2. CONGRUENCES

                         2.3. The Chinese Remainder Theorem
    In this section, we discuss solutions of systems of congruences having different moduli.
An example of this kind of systems is the following: find a number that leaves a remainder
of 1 when divided by 2, a remainder of 2 when divided by three and a remainder of 3 when
divided by 5. We shall see that there is a systematic way of solving this kind of system.

      T HEOREM 2.3.1. The Chinese Remainder Theorem: Fix a k ∈ N. Then given
b1 , . . . , bk ∈ Z and n1 , . . . , nk ∈ N, the system of congruences

                                     x ≡ b1     (mod n1 )
                                     x ≡ b2     (mod n2 )
                                      ..
                                       .
                                     x ≡ bk     (mod nk )

has a solution x ∈ Z if the n1 , n2 , . . . , nk are pairwise relatively prime. The solution is
unique modulo N = n1 n2 . . . nk .

    P ROOF. For j = 1, . . . , k, let Nj = N/nj . Since the moduli nj are pairwise relatively
prime, gcd(Nj , nj ) = 1 – after all, Nj is the product of all the moduli except nj . Hence by
Corollary 2.2.8, ∃yj = Nj−1 modulo nj , satisfying Nj yj ≡ 1 (mod nj ). Consider now
                                            k
                                            X
                                       x=           bj Nj yj
                                              j=1

Since
                                Nj ≡ 0 (mod ni ) ∀i 6= j,
we see that
                               x ≡ bj Nj yj ≡ bj        (mod nj ).
Hence x is a solution to the system of congruences.
    We have to show now that any two solutions are congruent modulo N. Suppose that
x and y are both solutions of the system of congruences. Then x ≡ bj ≡ y (mod nj ), or
nj | x−y, for all 1 ≤ j ≤ k. But then, since the moduli are pairwise relatively prime, using
Theorem 2.1.4 k times (formally, this needs to be done by induction!), we can conclude that
N = n1 . . . nk | x − y or x ≡ y (mod N).                                                 

     E XAMPLE 2.3.2. Solve the system

                                      x ≡ 1 (mod 2)
                                      x ≡ 2 (mod 3)
                                      x ≡ 3 (mod 5).
                         2.3. THE CHINESE REMAINDER THEOREM                       31

We have N = 2 · 3 · 5 = 30. Also
               N1 = 30/2 = 15, N2 = 30/3 = 10, and N3 = 30/5 = 6.
So we have to solve now 15y1 ≡ 1 (mod 2) – a solution is y1 ≡ 1 (mod 2). In the same
way, we find that y2 ≡ 1 (mod 3) and y3 ≡ 1 (mod 5). Therefore
              x = 1 · 15 · 1 + 2 · 10 · 1 + 3 · 6 · 1 = 53 ≡ 23   (mod 30).
32                                     2. CONGRUENCES

     Exercises for §2.3.

    E XERCISE 2.3.1. Find an integer that leaves a remainder of 2 when divided by either 3
or 5, but that is divisible by 4.

    E XERCISE 2.3.2. Find all integers that leave a remainder of 4 when divided by 11 and
leaves a remainder of 3 when divided by 17.

   E XERCISE 2.3.3. Find all integers that leave a remainder of 1 when divided by 2, a
remainder of 2 when divided by 3 and a remainder of 3 when divided by 5.

    E XERCISE 2.3.4. A band of 17 pirates steal some gold bars. When they try to divide
the spoils equally, 3 bars are left over – so a fight breaks out, killing one. This immediately
brings calm as they see if the gold can now be evenly shared. Unfortunately, there are now
10 bars left out, so they fight again. After the inevitable (single) further death, a perfect
division is now possible. What is the minimum number of gold bars the pirates could have
started with? [This is apparently an ancient Chinese problem.]
           2.4. ANOTHER WAY TO WORK WITH CONGRUENCES: EQUIVALENCE CLASSES                  33

         2.4. Another Way to Work with Congruences: Equivalence Classes
    In this section, we shall consider another way to work with congruences, based upon
the following:

     D EFINITION 2.4.1. Let S be a set and ∼ = a relation defined on S. (That is, ∀x, y ∈ S,
the statement “x ∼ = y” may be true or false.) If ∼
                                                  = satisfies the following three properties,
it is called an equivalence relation:
        • [Reflexivity] ∀x ∈ S, x ∼= x.
        • [Symmetry] ∀x, y ∈ S, x ∼  =y⇔y∼    = x.
                                         ∼
        • [Transitivity] ∀x, y, z ∈ S, x = y and y ∼
                                                   =z⇒x∼     = z.
    If ∼
       = is an equivalence relation on the set S and x ∈ S, then the set [x] = {y ∈ S |
y = x} ⊆ S is called the equivalence class of x. We write S/ ∼
  ∼                                                           = for the set of equivalence
classes in S. And if C ∈ S, then any r ∈ S such that C = [r] is called a representative of
the equivalence class C.

   T HEOREM 2.4.2. Let S be a set and ∼
                                      = an equivalence relation defined on S. Then
     (1) ∀x ∈ S, x ∈ [x].
     (2) ∀x, y ∈ S, either [x] = [y] or [x] ∩ [y] = ∅, but not both.

    P ROOF. (1): This is just the reflexivity of ∼
                                                 =.
                                                ∼
    (2): Suppose z ∈ [x] ∩ [y]. This means z = x and z ∼ = y, so by symmetry y ∼
                                                                               = z and by
                ∼
transitivity, x = y.
    Now if a ∈ [x] and b ∈ [y], then a ∼   = x and b ∼
                                                     = y. By transitivity, a ∼=x∼=y ∼ = b.
Thus a ∈ [y] and, by symmetry, b ∼  = x so b ∈ [x].
    Therefore [x] ⊆ [y] and [y] ⊆ [x], and thus [x] = [y].
    Since the above construction only relied on the existence of some element z ∈ [x] ∩ [y],
we see that the only way we could fail to have [x] = [y] is if [x] ∩ [y] = ∅.             

    E XAMPLE 2.4.3. On the set S = {(n, m) | n, m ∈ Z, m 6= 0} we can define the
                = (c, d) if ad = cb. Then S/ ∼
relation (a, b) ∼                            = is nothing other than the rational numbers,
Q!

   Now let us specialize the concept of equivalence class to the case of congruences:

   P ROPOSITION 2.4.4. Given n ∈ N, the relation on Z defined by
                                a∼
                                 =n b ⇔ a ≡ b (mod n)
(which we shall write a ∼
                        = b if the n is clear from context) is an equivalence relation.

    D EFINITION 2.4.5. Given n ∈ N and a ∈ Z, the equivalence class of a under the
above equivalence relation is called the congruence class of a mod n and is written [a]n
(or, by abuse of notation, merely [a] if the n is understood from the context). The set of
34                                     2. CONGRUENCES

equivalence classes Z/ ∼
                       =n is called the integers mod n and is written Z/nZ (or, by some
authors, Z/n or Zn ).

    T HEOREM 2.4.6. Given n ∈ N, Z/nZ has n elements, with representatives 0, . . . , n−1
of these distinct equivalence classes. That is,

                               Z/nZ = {[0]n , . . . , [n − 1]n } .

     P ROOF. Given n ∈ N and a ∈ Z, the Division Algorithm tells us there is a unique
pair q, r ∈ Z such that a = qn + r and 0 ≤ r < n. Note that a ≡ r (mod n) or,
equivalently, a ∈ [r]. Thus every a ∈ Z is an element of a unique equivalence class
[r] for r ∈ {0, . . . , n − 1}. Since each such r ∈ [r], uniquely!, the equivalence classes
[0], . . . , [n − 1] are all distinct.                                                   

    The remarkable thing about these Z/nZ is that we can do much of the usual integer
arithmetic in them; in fact, sometimes we can do a bit more than the usual.

    D EFINITION 2.4.7. Given n ∈ N and C, D ∈ Z/nZ, define C + D = [a + b] and
C · D = [a · m] where a and b are any representatives of the congruence classes C and D,
respectively.

    T HEOREM 2.4.8. The operations + and · on Z/nZ are well-defined. That is, they are
independent of the choices of representatives of the congruence classes.

     P ROOF. Say n ∈ N and C, D ∈ Z/nZ. Let a, p, b, q ∈ Z be such that C = [a] = [p]
and D = [b] = [q]. Then p ≡ a (mod n) and q ≡ b (mod n). But then Theorem 2.1.3
tells us that a + b ≡ p + q (mod n) and a · b ≡ p · q (mod n), so [a + b] = [p + q] and
[a · b] = [p · q]. This means that C + D can be defined either as [a + b] or as [p + q] and it
will be the same thing, as asserted, and likewise for C · D.                               

     These new operations are quite nice:

     T HEOREM 2.4.9. Given n ∈ N, the addition and multiplication on Z/nZ
      (1) are commutative and associative;
      (2) multiplication distributes over addition;
      (3) both operations have an identity – [0] for addition and [1] for multiplication;
      (4) every element of Z/nZ has an additive inverse – the inverse of [a] is [−a] (or
          [n − a], another name for the same thing); and
      (5) an element [a] ∈ Z/nZ has a multiplicative inverse [a]−1 if and only if gcd(a, n) =
          1; this inverse is unique when it exists.

    P ROOF. Left to the reader. Note the last point is basically Corollary 2.2.8 restated in
the language of congruence classes.                                                       
           2.4. ANOTHER WAY TO WORK WITH CONGRUENCES: EQUIVALENCE CLASSES                    35

    We can also restate many of our other, earlier results beside just Corollary 2.2.8 in terms
of congruence classes. Most of these shall be left to the reader, but here is one example:

   T HEOREM 2.4.10. Given a, b ∈ Z and n ∈ N, let d = gcd(a, n). Consider the equation
                                         [a] · x = [b]
where x ∈ Z/nZ. Then
     (1) If d ∤ b, there are no solutions.
     (2) If d | b, this equation has exactly d solutions in Z/nZ.

   P ROOF. Left to the reader; this is really just Theorem 2.2.4 stated differently.        
36                                   2. CONGRUENCES

     Exercises for §2.4.

     E XERCISE 2.4.1. When the rational numbers Q are described as in Example 2.4.3, we
define addition by [(n, m)] + [(p, q)] = [(nq + mp, mq)] and multiplication by [(n, m)] ·
[(p, q)] = [(np, mq)], where n, m, p, q ∈ Z and neither m nor q is 0. Prove a version of
Theorems 2.4.8 for this version of Q.
    What are the additive and multiplicative identities in this Q? Does every element (or
nearly every element) of Q have an additive and multiplicative inverse – if so, give a for-
mula for those inverses; if not, why not?

    E XERCISE 2.4.2. Restate the Chinese Remainder Theorem in terms of congruence
classes and equations in various Z/nZ’s rather than congruences.

    E XERCISE 2.4.3. Prove the statements in this section which have proofs “left to the
reader.”

   E XERCISE 2.4.4. State and prove congruence class versions of any results about con-
gruences from sections 2.1 and 2.2 which do not have versions in this section.
                                  2.5. EULER’S φ FUNCTION                                  37

                                 2.5. Euler’s φ Function
   Euler made the following definition, and it was good.

   D EFINITION 2.5.1. Given n ∈ N,

                 φ(n) = # ({m ∈ Z | 0 ≤ m < n and gcd(m, n) = 1}) .

 In other words, φ(n) counts the number of non-negative integers less than n which are
relatively prime to n.
    This is called Euler’s φ function, or Euler’s totient function (“totient” rhymes with
“quotient”; this name was give to it by the English mathematician Sylvester).

   Here’s one thing it is good for:

   T HEOREM 2.5.2. Given n ∈ N, φ(n) is the number of elements of Z/nZ which are
(multiplicatively) invertible.

   P ROOF. This is just Theorem 2.4.9 part (5)                                             

    One quite surprising fact about Euler’s totient function is that it is multiplicative, at
least for relatively prime numbers:

   T HEOREM 2.5.3. Given n, m ∈ Z, if gcd(n, m) = 1 then φ(nm) = φ(n)φ(m).

    P ROOF. This is actually a nice application of the Chinese Remainder Theorem, as we
shall see.
    Fix n, m ∈ N which are relatively prime. For k ∈ N, let

                         (Z/kZ)∗ = {x ∈ Z/kZ | x is invertible}

so φ(k) = #((Z/kZ)∗ ). (This notation means the number of things in the set (Z/kZ)∗ .)
    Now define the set of pairs

           (Z/nZ)∗ × (Z/mZ)∗ = {(a, b) | a ∈ (Z/nZ)∗ and b ∈ (Z/nZ)∗ } .

Notice that #((Z/nZ)∗ × (Z/mZ)∗ ) = #((Z/nZ)∗ ) · #((Z/mZ)∗ ) = φ(n)φ(m), since
each part of the pair is free to be whichever element it wants of the respective set, so the
total number of pairs is the size of the first set times the size of the second. Therefore,
if we can prove that (Z/nZ)∗ × (Z/mZ)∗ can be put into bijective correspondence with
(Z/(nm)Z)∗ , then we will have

          φ(n)φ(m) = # ((Z/nZ)∗ × (Z/mZ)∗ ) = #((Z/(nm)Z)∗ ) = φ(nm)

as desired, since bijective sets have the same number of elements.
    The correspondence is given by the function

            F : (Z/(nm)Z)∗ → (Z/nZ)∗ × (Z/mZ)∗ : [x]nm 7→ ([x]n , [x]m ) .
38                                      2. CONGRUENCES

    We must show F is well-defined, 1-1, and onto. Well-defined means that for some
[x]nm , if y ∈ Z is another representative of the congruence class [x]nm , then ([x]n , [x]m ) =
([y]n , [y]m ) so that F is defined only by the class [x]nm , not by its choice of representative
x. But this is easy, since y ∈ [x]nm means y ∼  =nm x so ∃k ∈ Z such that y − x = k(nm) =
(km)n = (kn)m. These last two versions mean that y ∼          =n x and y ∼ =m x, so [x]n = [y]n
and [x]m = [y]m , which means F is well-defined.
    Now let us assume that x, y ∈ Z have F ([x]nm ) = F ([y]nm), i.e., [x]n = [y]n and
[x]m = [y]m . That is, z = x − y ∈ Z solves the system
                                      z≡0      (mod n)
                                      z≡0      (mod m) .
But z = 0 also solves this system, and the Chinese Remainder Theorem tells us that so-
lutions of this system (which is an appropriate system for the CRT since gcd(n, m) = 1)
are unique modulo nm. Therefore, x − y ≡ 0 (mod nm), so x ≡ y (mod nm). Hence
[x]nm = [y]nm , and thus F is 1-1.
    Now, given any pair ([x]n , [y]m ) ∈ (Z/nZ)∗ × (Z/mZ)∗ , consider the system of con-
gruences
                                      z ≡ x (mod n)
                                      z≡y      (mod m) ,
which system is again CRT-ready since gcd(n, m) = 1. Therefore there exists a solution to
this system, call it z ∈ Z, and F ([z]nm ) = ([x]n , [y]m ). Thus F is onto as well.   
                                2.5. EULER’S φ FUNCTION                               39

   Exercises for §2.5.

    E XERCISE 2.5.1. Compute φ(n) for n = 2, 3, 5, 7, 11, 13, 17. Make a general conjec-
ture. Can you prove it?

   E XERCISE 2.5.2. Compute φ(n) for n = 2, 4, 8, 16, 32, 64. Make a conjecture about
   k
φ(2 ) for k ∈ N. Prove it!
                                      CHAPTER 3


                                   Prime Numbers

    Primes are the atoms out of which the more complicated, composite integers (the
molecules, in this metaphor) are built. In this chapter we study some of their basic prop-
erties, prove the aptly named Fundamental Theorem of Arithmetic, and go on to Wilson’s
Theorem.

                                3.1. Basics and the FTA
   First of all, we make the
   D EFINITION 3.1.1. We say p ∈ N is prime if p > 1 and the only natural numbers
which divide p are 1 and p.
    E XAMPLE 3.1.2. Some primes are 2, 3, 5, 7, 11, 13, and 17. Notice 2 is the only even
prime (clearly – any other would be a multiple of 2 and hence could not be prime), and has
some unusual properties – the joke is that “2 is the oddest prime.”
   The largest prime known to humans at the time of this writing is
                                       257,885,161 − 1
which was proven to be prime in January of 2013 by a distributed computer program called
GIMPS [the Great Internet Mersenne Prime Search] running on hundreds of machines
across the Internet.
   In contrast, we also use the following term
   D EFINITION 3.1.3. A number c ∈ N which is greater than 1 and not prime is called
composite.
   How far does a naive, brute-force check have to go in order to see if a number is
composite?
                                                                                       √
   T HEOREM 3.1.4. If n is a composite, then it has a positive divisor d satisfying d ≤ n.
     P ROOF. Suppose n is composite. Then it has some divisor a ∈ N. Notice that n = a· na ,
                                                                  √
so na ∈ N is also a divisor. But a and na cannot both be less than n, because if they were
we would have
                                          n √ √
                                 n=a· < n· n=n
                                          a
which would be a contradiction. Hence either a or na is the divisor d promised by the
theorem statement.                                                                      
                                             41
42                                         3. PRIME NUMBERS

   Euclid’s Lemma (Lemma 2.1.6) takes a particularly nice form if the divisor involved is
prime:

     P ROPOSITION 3.1.5. Suppose p is a prime and a, b ∈ Z. If p | ab then p | or p | b.

    P ROOF. Notice that gcd(p, a) | p, therefore gcd(p, a) is either 1 or p since p is prime.
But also gcd(a, p) | a, so either p | a or gcd(a, p) = 1. If p | a, we are done. If not, since
therefore gcd(a, p) = 1, Euclid’s Lemma 2.1.6 tells us that p | b.                          

     A more general form of this is

     C OROLLARY 3.1.6. Suppose p is a prime, k ∈ N, and a1 , . . . , ak ∈ Z. Then if p |
a1 . . . ak , it follows that p divides at least one of the aj .

     P ROOF. Left to the reader (use induction on k).                                                  

     This leads to the aptly named

    T HEOREM 3.1.7. The Fundamental Theorem of Arithmetic: Let n ∈ N, n ≥ 2. Then
∃k ∈ N and primes p1 , . . . , pk such that n = p1 . . . pk . Furthermore, if l ∈ N and q1 , . . . , ql
are also primes such that n = q1 . . . ql , then l = k and the factorization in terms of the q’s
is merely a reordering of that in term s of the p’s.

    P ROOF. We use the Second Principle of Mathematical induction for the existence part.
The general statement we are proving is ∀n ∈ Z, n > 1 ⇒ S(n), where S(n) is the
statement “∃k ∈ N and primes p1 , . . . , pk such that n = p1 . . . pk .”
    As the base case, say n = 2. Then k = 1 and p1 = 2 works.
    Now assume S(k) is true for all k < n. If n is prime, then k = 1 and p1 = n works.
So suppose n is instead composite, with divisor d 6= 1, n. Then both d, nd < n, so by the
inductive hypothesis ∃k, k ′ ∈ N and primes p1 , . . . , pk , p′1 , . . . , p′k′ such that d = p1 . . . pk
and nd = p′1 . . . p′k′ . So then n is the product

                                       n = p1 . . . pk · p′1 . . . p′k′

of the k + k ′ primes. Hence S(n) is true as well, and so prime factorizations always exist.
      Now suppose n ∈ Z satisfies n > 1 and ∃k, l ∈ N and both primes p1 , . . . pk and
q1 , . . . , ql such that
                                      p1 . . . pk = n = q1 . . . ql .
Certainly p1 divides the left hand side of these dual expressions for n. Then by Corol-
lary 3.1.6, p1 divides one of the qj , which means it must be that p1 = qj since they are
prime. Removing the p1 from the left and the qj from the right, we get

                              p2 . . . pk = n = q1 . . . qj−1 · qj+1 . . . ql .
                                  3.1. BASICS AND THE FTA                                 43

Continuing in this way, either we get the uniqueness statement in the theorem, or we run
out of p’s or q’s. However, we cannot run of of primes on one side before the other, because
that would make a product of primes on one side equal to 1, which is impossible.          
44                                    3. PRIME NUMBERS

     Exercises for §3.1.

     E XERCISE 3.1.1. Provide all the details of the proof of Corollary 3.1.6.

     E XERCISE 3.1.2. State and prove a theorem about the prime factorizations of numbers
a, b ∈ N and of their gcd.

    E XERCISE 3.1.3. A number n ∈ Z, n > 1 is called square-free if it is not divisible by
the square of any natural number other than 1. Prove that an n ∈ Z, n ≥ 1 is square-free if
and only if it is the product of distinct primes.
                                     3.2. WILSON’S THEOREM                                      45

                                    3.2. Wilson’s Theorem
    In this section, we prove a nice theorem usually named after an 18th century English
mathematician ... although it was actually first stated by Ibn al-Haytham nearly 800 years
earlier.
    First, we need a

     L EMMA 3.2.1. Let p be a prime. Then n ∈ N equals its own inverse mod p if and only
if p | n + 1 or p | n − 1, i.e., iff n ≡ ±1 (mod p).

    P ROOF. Say p is a prime and n ∈ N equals its own inverse mod p. That means that
n = n · n ≡ 1 (mod p). By definition, p | n2 − 1 = (n + 1)(n − 1). By Proposition 3.1.5,
 2

this means that either p | n + 1 or p | n − 1.
    For the converse, suppose n ≡ 1 (mod p) or n ≡ −1 (mod p). Then, by Theo-
rem 2.1.3, n2 ≡ (±1)2 = 1 (mod p), so n equals its own inverse mod p.                 

     This Lemma is a key step in

    T HEOREM 3.2.2. Wilson’s Theorem Given p ∈ N such that p ≥ 2, p is prime iff
(p − 1)! ≡ −1 (mod p).

    P ROOF. Suppose p is prime. Consider the terms of (p − 2)!: every one has an inverse
mod p by Corollary 2.2.8, and only 1 equals its own inverse mod p (so would p − 1, but it is
not in the product (p − 2)! by Lemma 3.2.1. Hence in (p − 1)! we can group the terms into
pairs ((p − 3)/2 of these pairs) which are inverses mod p, leaving out only 1 and (p − 1).
That is
                    (p − 1)! ≡ 1(p−3)/2 (p − 1) ≡ p − 1 ≡ −1 (mod p)
    Conversely, suppose p ∈ N satisfies p ≥ 2 and (p − 1)! ≡ −1 (mod p). We can rewrite
that congruence as (p − 1)! + 1 ≡ 0 (mod p), or p | ((p − 1)! + 1).
    Now let d ∈ N be a divisor of p such that d 6= p. It follows that d is one of the numbers
in the product (p − 1)!, so d | (p − 1)!; also, since d | p and p | ((p − 1)! + 1), it follows
that d | ((p − 1)! + 1). Therefore, d | ((p − 1)! + 1) − (p − 1)! = 1 by Theorem 1.3.9,
which means d = 1.
    In other words, if d ∈ N is a divisor of p, it must be either p or 1, so p is prime.    

    E XAMPLE 3.2.3. Let’s work through one direction of the proof for a simple case, say
of p = 7. We start by finding all the inverses of the numbers 2, . . . , 6 (by brute force, in this
small case): 2 · 4 ≡ 1 (mod 7) and 3 · 5 ≡ 1 (mod 7), meaning that 2−1 ≡ 4 (or4−1 ≡ 2)
and 3−1 ≡ 5 (or 5−1 ≡ 3).
    Then

          (p − 2)! = 5! = 5 · 4 · 3 · 2 · 1 = (5 · 3) · (4 · 2) ≡ 1 · 1 ≡ 1   (mod 7)
46                                  3. PRIME NUMBERS

which in turn means that
              (p − 1)! = 6! = 6 · 5! ≡ 6 · 1 ≡ 6 ≡ 7 − 1 ≡ −1     (mod 7) .

    Notice that Wilson’s Theorem can be used to build a test for primality: see if a number
n satisfies (n − 1)! ≡ −1 (mod n) and, if so, n is prime. This is an entirely impractical
test, but it is our first example of a simple computational congruence involving an integer
which can tell us that that integer is prime.
                        3.3. MULTIPLICATIVE ORDER AND APPLICATIONS                          47

                       3.3. Multiplicative Order and Applications
    In this section we prove two very useful results called Euler’s Theorem and Fermat’s
Little Theorem (a special case of Euler’s). We do not follow the proof strategy of Euler and
Fermat, however, instead using an approach inspired by abstract algebra and Lagrange’s
Theorem in that subject.
    First we need the
   D EFINITION 3.3.1. Suppose n ∈ N and a ∈ Z satisfy n ≥ 2 and gcd(a, n) = 1.
Then we define the multiplicative order of a in mod n (called just the order when the
multiplicative and n can be understood from context) to be the smallest k ∈ N such that
ak ≡ 1 (mod n). The order of a in mod n is written ordn (a).
   Let us verify something which should probably always be checked for a new definition:
   P ROPOSITION 3.3.2. Given relatively prime numbers n ∈ N and a ∈ Z with n ≥ 2,
ordn (a) is well-defined.
    P ROOF. The problem in the definition of ordn (a) might be that there might not be any
value of k ∈ N at all for which ak ≡ 1 (mod n).
    But notice that this is a congruence, so we are really only concerned with the elements
 k
a up to their congruence class.
    So look at {[ap ]n | p ∈ N} and imagine putting the natural numbers p ∈ N into different
boxes based on what is the corresponding congruence class [ap ] ∈ Z/nZ. Since there are
infinitely many elements in N and only n elements in Z/nZ – n boxes – by the Pigeonhole
Principle (Theorem 1.1.2) there must be two (actually, infinitely many pairs of) distinct
values p, q ∈ N such that p and q end up in the same box, meaning [ap ] = [aq ]. Assume
without loss of generality that p > q, so p − q ∈ N.
    We are given that gcd(a, n) = 1, so by Corollary 2.2.8, a−1 exists in mod n. Thus
                    ap−q ≡ ap (a−1 )q ≡ aq (a−1 )q ≡ a0 ≡ 1 (mod n)
(replacing ap with aq in the middle of this congruence since we know that [ap ] = [aq ],
which means ap ≡ aq (mod n)) and therefore the set of k ∈ N for which ak ≡ 1 (mod n)
is non-empty. The order is the smallest such value, which exists by the Well-Ordering
Principle.                                                                          
    Here now is a theorem from abstract algebra (Lagrange’s Theorem) translated into the
current context:
   T HEOREM 3.3.3. Given relatively prime numbers n ∈ N and a ∈ Z, ordn (a) | φ(n).
    P ROOF. Start by looking at the congruence classes [a], [a2 ], [a3 ] . . . . They go up to
[aordn (a) ] = [1] and then start to repeat, so the set
                             hai = {[aj ] | j ∈ N, j ≤ ordn (a)}
48                                    3. PRIME NUMBERS

is a set of ordn (a) elements of Z/nZ (in group theory, this set hai is called the cyclic
subgroup of (Z/nZ)∗ generated by a). Notice that because aordn (a) ≡ 1 (mod n), [1] ∈
hai. Also, since if a−1 is the inverse of a mod n, (a−1 )k is the inverse of ak for k ∈ N, we
see that hai ⊆ (Z/nZ)∗ .
    To finish, we shall show that (Z/nZ)∗ is made up of some number, say m, of pieces,
which we shall call cosets, each of which is bijective with hai. These pieces will thus all
have ordn (a) elements, so m · ordn (a) = # ((Z/nZ)∗ ) = φ(n), from which our desired
result follows.
    A coset of hai is a set of the form

                     x hai = {[x] · [aj ] = [x aj ] | j ∈ N, j ≤ ordn (a)}

where [x] ∈ (Z/nZ)∗ . We have observed that [1] ∈ hai, so ∀[x] ∈ (Z/nZ)∗ , [x] ∈ x hai,
meaning that every congruence class [x] ∈ (Z/nZ)∗ is in some coset, i.e.,
                                           [
                              Z/nZ ⊆             x hai .
                                           [x]∈(Z/nZ)∗

    In fact, each coset x hai is bijective with hai. The bijection is the map fx : hai → x hai
defined by fx ([aj ]) = [xaj ] for j ∈ N with j ≤ ordn (a). This map is a bijection because it
has an inverse fx−1 , coming from the inverse mod n of x.
    Now suppose x hai and y hai are two cosets. I claim that either x hai = y hai or
      T                                         T                               T
x hai y hai = ∅. For this, suppose x hai y hai =         6 ∅, so ∃[z] ∈ x hai y hai. That
means that ∃j, k ∈ N such that j ≤ ordn (a), k ≤ ordn (a), and [xaj ] = [z] = [yak ]. In
other words,
                                  xaj ≡ yak      (mod n) .
Without loss of generality, assume j ≤ k. Then if we multiply both sides of the above
congruence by (a−1 )j , we have x ≡ yak−j (mod n). But this means that every element
[xap ] ∈ x hai is can be expressed as [yap+k−j ] ∈ y hai, and every element [yaq ] ∈ y hai is
can be expressed as [xaq+j−k ] ∈ x hai. Thus x hai = y hai.                                

   That was a lot of work, but now we get the famous theorems of Fermat’s Little and of
Euler very easily.

     T HEOREM 3.3.4. Euler’s Theorem Say a ∈ Z and n ∈ N satisfy gcd(a, n) = 1. Then
 φ(n)
a      ≡ 1 (mod n).

    P ROOF. The previous theorem, 3.3.3, told us that ordn (a) | φ(n), so ∃m ∈ Z such that
m · ordn (a) = φ(n). By the definition of order, this means
                                                m
                 aφ(n) ≡ am ordn (a) ≡ aordn (a) ≡ 1m ≡ 1 (mod n).

                                                                                           
                        3.3. MULTIPLICATIVE ORDER AND APPLICATIONS                          49

   C OROLLARY 3.3.5. Fermat’s Little Theorem If p is a prime and a ∈ Z satisfies
gcd(p, a) = 1 then ap−1 ≡ 1 (mod p).

    P ROOF. We have seen that for primes p, φ(p) = p − 1, so there is very little to do
here.                                                                                
   Sometimes one sees Fermat’s Little Theorem in the following, different form:

   T HEOREM 3.3.6. If p is a prime then ∀a ∈ Z, ap ≡ a (mod p).

    P ROOF. If gcd(a, p) = 1, then by Fermat’s Little, ap−1 ≡ 1 (mod p). Multiplying
both sides of this congruence by a yields the desired result.
    If instead gcd(a, p) 6= 1, it must be that gcd(a, p) = p since that gcd is a divisor of p,
which is prime. The gcd is also a divisor of a, so in fact p | a. But then a and all its powers
are congruent mod p to 0, so ap ≡ 0 ≡ a (mod p).                                             
50                                   3. PRIME NUMBERS

     Exercises for §3.3.

   E XERCISE 3.3.1. What are the remainder when 15! is divided by 17 and the remainder
when 2 · (26!) is divided by 29?

    E XERCISE 3.3.2. We know 17 is prime (it’s just a wonderful number, isn’t it?). But
just to be sure, use Wilson’s Theorem to prove it.

     E XERCISE 3.3.3. Can we build a test for primality out of Fermat’s Little Theorem? If
so, would it be better (more efficient) than one based on Wilson’s Theorem? How would it
work? Write a clear, formal statement of your proposed primality test.
    If you know a programming language, write some code to try out your primality test.
If not (or, in any case, after the programming), do a little research to see if this question
has already been investigated and, if so, what was the conclusion. Give either a formal
statement of the test and formal result describing its efficacy or a counter-example to your
test that you found in the literature.

   E XERCISE 3.3.4. Suppose p is an odd prime. Prove that if the quadratic congruence
 2
x + 1 ≡ 0 (mod p) has a solution, then p ≡ 1 (mod 4). [Hint: Apply Fermat’s Little
Theorem to a solution a of the congruence, then multiply and divide by 2 in the power.]
            3.4. ANOTHER APPROACH TO FERMAT’S LITTLE AND EULER’S THEOREMS                  51

           3.4. Another Approach to Fermat’s Little and Euler’s Theorems
   There is another way to think about these theorems which we shall explain here, be-
cause it usefully fills out our understanding of multiplication in Z/nZ.
   In this case, we shall consider the theorems in the reverse order to what we used above
– which is in fact more historically accurate.

   T HEOREM 3.4.1. Fermat’s Little Theorem, Redux If p is a prime and a ∈ Z satisfies
gcd(p, a) = 1 then ap−1 ≡ 1 (mod p).

   P ROOF. Let’s start by proving that the set

                         Ma = {[0]p , [a]p , [2a]p , . . . , [(p − 1)a]p }

of multiples of [a]p in Z/pZ has p elements.
    There are only p congruence classes named between the { and }, so the set cannot have
more than p elements.
    Now look at two elements in this set, call them [ja]p and [ka]p with 0 ≤ j, k < p,
and suppose they are equal. That means that ja ≡ ka (mod p), or p | (j − k)a. By
Proposition 3.1.5, this means that either p | (j − k) or p | a. Since gcd(p, a) = 1, we
cannot have p | a. Thus p | (j − k).
    But 0 ≤ j, k < p tells us that −p + 1 < j − k < p − 1 and the only multiple of p in this
range is 0. Therefore j = k, and this means that all of the elements in the above description
of Ma are distinct, so indeed #(Ma ) = p. But we already know that Z/pZ itself has p
elements, which means Ma above is simply another way of describing Z/pZ.
    For our next step, let’s multiply together all the nonzero elements of Ma , or of Z/pZ
since we know it’s the same thing:

                 a · 2a · · · · · (p − 1)a ≡ 1 · ·2 · · · · · (p − 1) (mod p) .

Rearranging these terms, we see

                            ap−1 (p − 1)! ≡ (p − 1)!        (mod p)

or p | (ap−1 − 1) (p − 1)!. Since p being prime means gcd(p, (p − 1)!) = 1, it follows by
Proposition 3.1.5 that p | ap−1 − 1. That is, ap−1 ≡ 1 (mod p), as desired.            

    The above is quite similar to the proof that Euler gave of Fermat’s Little Theorem
(actually, Fermat gave no proof at all – not unlike his famous Last “Theorem”). A very
similar strategy can also prove his own theorem:

   T HEOREM 3.4.2. Euler’s Theorem, Redux Say a ∈ Z and n ∈ N satisfy gcd(a, n) = 1.
Then aφ(n) ≡ 1 (mod n).
52                                          3. PRIME NUMBERS

      P ROOF. Recall that the set (Z/nZ)∗ of (multiplicatively) invertible elements in Z/nZ
can be written as {[b1 ]n , . . . , [bφ(n) ]n } where the numbers b1 , . . . , bφ(n) , satisfying 1 ≤ b1 <
· · · < bφ(n) < n, are all relatively prime to n.
      We claim that we can also describe (Z/nZ)∗ as the set
                                               
                                      Ma∗ = [b1 a]n , . . . , [bφ(n) ]n .
Note first that since each bj is relatively prime to n, and so is a, it follows that bj a is
relatively prime to n as well – the primes which divide bj a divide either bj or a, and there
are no such primes which also divide n. Therefore Ma∗ ⊆ (Z/nZ)∗ .
    Now notice that if [bj a]n = [bk a]n for 1 ≤ j, k ≤ φ(n) then n | (bj − bk )a. By Euclid’s
Lemma 2.1.6, since gcd(a, n) = 1, we must have n | (bj − bk ).
    However, since 1 ≤ b1 < · · · < bφ(n) < n, it follows that −n + 1 < bj − bk < n − 1,
and the only multiple of n in that range is 0. Thus bj = bk and we conclude that all
of the elements in the above description of Ma∗ are distinct. Since there are φ(n) such
elements and Ma∗ is a subset of (Z/nZ)∗ , which has only φ(n) elements, it must be that
Ma∗ = (Z/nZ)∗ .
    As in the previous proof, we conclude that the products of all the elements in Ma∗ and
in (Z/nZ)∗ must be the same:
                          b1 a · · · · · bφ(n) a ≡ b1 · · · · · bφ(n)    (mod n) .
Regrouping terms again, we see
                         aφ(n) b1 · · · · · bφ(n) ≡ b1 · · · · · bφ(n)    (mod n) .
Now the b’s are all invertible mod n, so we can multiply by the inverses, one by one, until
we are left with
                                   aφ(n) ≡ 1 (mod n)
which finishes Euler’s Theorem.                                                                        
            3.4. ANOTHER APPROACH TO FERMAT’S LITTLE AND EULER’S THEOREMS                53

   Exercises for §3.4.

   E XERCISE 3.4.1. Let n, a, and b1 , . . . , bφ(n) be as in the proof of Euler’s Theorem.
Show that b1 · · · · · bφ(n) ≡ ±1 (mod n).
                                       CHAPTER 4


                                       Cryptology

   Here are some Greek roots:
      kryptos, κρυπτ oς: secret, hidden
      logos, λóγoς: word, study, speech
      graph, γράϕω: write, written
   From these (and others), English gets the words
      cryptosystem: a set of algorithms for protecting secrets
      cryptography: work done to make cryptosystems
      cryptanalysis: work done to circumvent the protections of cryptosystems
      cryptology: the union of cryptography and cryptanalysis,
                  often abbreviated simply to crypto.
   Beware that cryptography is widely (but inappropriately!) used as a synecdoche for
cryptology. (This is not unlike the widely understood incorrect usage of the word hacker.)
We will try to use these words more carefully.


    With that understood, we start with a little elementary cryptology in this chapter. There
will be very little number theory, but we will set up some terminology and simple examples
of cryptography and the corresponding cryptanalysis, with an emphasis on the old, historic,
systems which are no longer viable in the modern age. Later chapters will come around
quickly to modern, number theoretic techniques in crypto.


                             4.1. Some Speculative History
    Perhaps there was a form of deception that preceded language – certainly many a house
pet has feigned innocence despite the clear evidence of involvement in stealing treats. And
even apiologists may not know if some lazy bees make up a story about a long excursion
to a new flower patch when their Queen demands an accounting.
    But among homo sapiens, probably as soon as there was language, there was lying.
Of course, when two humans are face to face, both parties have some control, such as:
the listener can make an attempt to evaluate the trustworthiness of speaker, they can both
form their own judgements of the other’s identity and therefore choose what they wish to
share with each other, and the words of the speaker pass directly from their lips to the
                                             55
56                                            4. CRYPTOLOGY

listener’s ears without the possibility of change of meaning in flight (absent considerations
of ambient noise and so on).
     A great deal changed with the invention of writing more than 5000 years ago. Words
frozen in physical form, and the ideas they represent, can be taken and shared with a wide
range of parties other than those with whom the original author wanted to communicate.
In addition, if an author is not able to hand her work directly to the intended reader and
instead the written words are out in the world on their own for a while, then both intended
communicants can no longer be sure that the other is who the writing claims them to be nor
that the writing remains the unchanged symbols that the other party originally set down.
     Let us formalize some of these issues of information security (as it is called now), in
the context of a message to be sent from someone named Alice to someone named Bob.
The role of the possibly disruptive and overly intrusive environment is played in our little
drama by Eve. (Traditionally one skips directly to a character whose name starts with E to
symbolize both the environment and also someone who is potentially an eavesdropper 1.)
     Here is some basic terminology:
    D EFINITION 4.1.1. Confidentiality means that only the intended recipient can extract
the content of the message – Alice wants only Bob to get her message and not Eve, no
matter if she listens on the eavesdrop or intercepts the message as it travels in some form
from Alice to Bob.
   D EFINITION 4.1.2. Message integrity means that the recipient can be sure the message
was not altered – Bob wants to know that what he gets is what Alice wrote, not what the
mischievous Eve intended to change it into.
    D EFINITION 4.1.3. Sender authentication means that the recipient can determine from
the message the identity of the sender – Bob wants to be sure this message did in fact
originate with Alice.
    D EFINITION 4.1.4. Sender non-repudiation means that the sender should not be able
to deny sending that message – Bob wants to be able to hold Alice to her promises.
    Note that Alice and Bob may actually be the same person sending a message from a
past self to their future self. For example, someone may want to keep records which must
be confidential and whose integrity must be reliable, even if there is a break-in to the site
keeping those records.


    One of the earliest schemes attempting to achieve confidentiality was probably used
around the 7th century BCE in Greece by military commanders who wanted to get dis-
patches from far away soldiers, and to send them orders, in a way that even if the messenger
     1
      Eavesdropper apparently comes from the Old English yfesdrype, meaning literally one who stands on
the eavesdrop [ground where water drips from the eaves of the roof] to listen to conversations inside a house.
                              4.1. SOME SPECULATIVE HISTORY                                57

is captured, the message they carry will not be readable by the enemy. They used a device
called a scytale (the “c” is hard and the word rhymes with “Italy”), which was a cylindrical
stick – maybe a spear shaft – (σκυτ άη means baton in ancient Greek), around which a long
strip of parchment was wrapped in a spiral. A message could then be written in parallel
rows along the length of the scytale, so that when the strip was unwound, the letters were
all jumbled and the message was unreadable.




                             F IGURE 4.1.1. A scytale in use.2


    On the other end, assuming the receiver also has a scytale, when the strip is wound in a
spiral and read lengthwise, the message reappears, as if by magic.


   Before we go on, a few more technical terms:

   D EFINITION 4.1.5. The message that Alice wishes to send to Bob, in its actual original
(and final) form is called the plaintext.

    D EFINITION 4.1.6. An algorithm that Alice uses to transform (obfuscate) the plaintext
into a form that will remain confidential even if observed by Eve while in flight is called a
cipher. We also say that Alice encrypts the (plaintext) message.

   D EFINITION 4.1.7. After the message has been encrypted, it is called the ciphertext.

   D EFINITION 4.1.8. When Bob receives the ciphertext, he applies another algorithm to
decrypt it and recover the plaintext.



   2
    Image by DMGualtieri, CC-BY-SA-3.0 http://creativecommons.org/licenses/by-sa/3.0, via
Wikimedia Commons, downloaded from https://commons.wikimedia.org/wiki/File%3AScytale.png
58                                       4. CRYPTOLOGY

     Graphically:
            Basic crypto terminology:
                  Alice           on public network                       Bob
          plaintext/cleartext
              message m
           encrypts m to c
              transmits c        ֌ ciphertext c ֌                     receives c
                                                                     decrypts c to
                                                                  recover plaintext m

   Suppose a spy sees scouts in the field wrapping strips of parchment around their spears
and writing down the length of the spear: meaning Eve learns the encryption algorithm.
Although the algorithm is known, confidentiality may be preserved:

   D EFINITION 4.1.9. Additional information (some of which is) used in encryption and
(some of) which is necessary for successful decryption is called a key.

    In the case of encryption with a scytale, the diameter of the scytale is the key. In fact,
there is some conjecture that the scytale was also a simple from of authentication: only
Alice had the matching scytale to the Bob’s, so if Bob could read any coherent sequence of
letters at all along his scytale, he knew Alice had sent it.
    Even with a key, the vulnerabilities of this cryptosystem are fairly clear. And Athens
did lose the Peloponnesian War.


    Many hundreds of years of cryptology have shown that in fact it is better to reveal the
algorithms of your cryptosystem to the public, and only to keep the key for a particularly
communication channel (between Alice and Bob, say) secret. There are many reasons for
this, the primary one probably being that the author of a cryptosystem can never be sure
they have thought of all the possible methods of cryptanalysis which will be used against
it. So the system’s author is better off letting the general crypto community do its best
against the system, and only the systems which have withstood such assault should be
used. After all, this is basic to the scientific method itself: scientists publish all the details
of their experiments, so others can try them as well and give independent verification or
find something to criticize.
This idea – that the security of a cryptosystem should be based on the secrecy of the key
but not of the algorithm – has come to be called Kerckhoff’s Principle, after a set of
cryptosystem design ideas written down in 1883 by a French military cryptography. It is
held in opposition to cryptosystems, thought of as very weak, which rely upon no one ever
finding out the algorithms: such systems rely instead on the ill-advised security through
obscurity paradigm.
                               4.1. SOME SPECULATIVE HISTORY                                 59

   Exercises for §4.1.

    E XERCISE 4.1.1. Suppose for some k ∈ N, the cleartext Alice wishes to send consists
of some symbols m1 , . . . , mk . Assume that the scytale she uses has a diameter such that
s ∈ N letters can be written on each turn of the spiral of parchment, when it is wrapped
around the scytale.
    Write one or more formulæ describing what the letters c1 , . . . , ck of the ciphertext will
be. You should assume s < k and, if you like, that they have any particular useful relation-
ship such as s | k.

    E XERCISE 4.1.2. Even if the scytale cryptosystem were strong, there may be problems
using it for authentication. Describe how authentication might fail depending upon the
message being sent – give an example or two of “bad” messages for authentication.

   E XERCISE 4.1.3. The scytale cryptosystem seems weak. Describe how you would
cryptanalyze it.
60                                       4. CRYPTOLOGY

                         4.2. The Caesar Cipher and Its Variants
     Another system which dates to ancient times was supposedly used by Julius Caesar.

    D EFINITION 4.2.1. Alice takes her message, removes all spaces and punctuation, and
puts it all in one case (maybe upper case). Then she moves each letter k places down the
alphabet, wrapping around from Z to A if necessary, where k ∈ Z is a fixed number known
to both Alice and Bob but no one else, called the key.
    To decrypt, Bob simply moves each letter k places earlier in the alphabet, wrapping
past A to Z if necessary: i.e. Bob encrypts the ciphertext with key −k to get the plaintext.
    This is called the Caesar cryptosystem.

    Apparently Julius Caesar usually used the key value k = 3. His nephew Octavian, who
later became the emperor Augustus, liked to use k = −1.
    When using what is called the Latin alphabet (which is not what was used in ancient
Rome, though), with its 26 letters, there is one particularly nice key value: 13. The nice
thing about that value is encryption and decryption are exactly the same transformation. In
modern times, this transformation is called ROT13, and it has a small role in the modern
history of the Internet. In particular, posts on early chat rooms and bulletin boards would
sometimes want to have a bit of content that should not be automatically available to anyone
who looks at the post, but would be there for the determined reader (such as, for example,
a spoiler in a review of some popular new game, book, or film). These were often included
in the post, but only after they had been run through ROT13.
    A few commercial products used ROT13 for actual security, despite it actually being
completely insecure, such as certain parts of some versions of the Windows operating sys-
tem.


    The Caesar ciphers were completely broken (in a number of ways; see §4.3, below) be-
fore 1000CE, but a descendent was developed in the late Middle Ages and was considered
the state of the cryptological art through the early modern period.

    D EFINITION 4.2.2. Once again, Alice takes her message, removes all spaces and punc-
tuation, and puts it all in one case (maybe upper case). This time, the key ~k = (k1 , . . . , kℓ )
which Alice and Bob share is an ℓ-tuple, for ℓ ∈ N, of integers k1 , . . . , kℓ ∈ Z.
    To encrypt, Alice steps through her plaintext message and key sequence both one letter
at a time, moving each plaintext letter forward (wrapping from A to Z if necessary) by
the number of letters specified by the corresponding key number. If she runs out of key
numbers, she simply starts again at the beginning of the key sequence.
    To decrypt, Bob simply moves each letter k places earlier in the alphabet, wrapping
past A to Z if necessary: i.e. to decrypt, Bob encrypts with key −~k = (−k1 , . . . , −kℓ ).
    This is called the Vigenère cryptosystem.
                          4.2. THE CAESAR CIPHER AND ITS VARIANTS                          61

    Traditionally, the key in Vigenère is a written down, memorized, and shared between
Alice and Bob in the form of a word. Then when encrypting or decrypting, the letters of
the keyword are “added” to the message letters, as described above, under the convention
that A = 1, B = 2, etc..
    Notice that Vigenère with key length ℓ is essentially ℓ Caesars in parallel – in fact,
if ℓ = 1, they are exactly the same cryptosystem. It therefore turns out that Vigenère is
essentially ℓ + 1 times harder to crack than Caesar, the “ + 1” coming from the fact that
Eve doesn’t even know ℓ. Nevertheless, after a couple of hundred years in which it was
considered unbreakable and used as the principle diplomatic cipher in the courts of Europe,
approaches to breaking Vigenère were developed.


   As one final variant of the Vigenère cipher, suppose we go in the opposite direction
from the ℓ = 1 extreme, and instead take ℓ as long as possible.

    D EFINITION 4.2.3. A one-time pad is a Vigenère cryptosystem in which the key is
as long as the message, chosen randomly, and never re-used. [The key is also called the
one-time pad in this cryptosystem3.]

    (A one-time pad is sometimes called – inappropriately, given the true intellectual his-
tory of the cryptosystem – a Vernam Cipher .)
    It is important to have a good key sequence in a one-time pad cryptosystem. The good
news is that one can prove that with a truly random one-time pad, the resulting cryptosys-
tem is in fact perfectly secure. In computer science, this is called information theoretically
secure , because the proof of security does not rely upon any assumptions about the com-
putational resources available to the attacker. [See [Sha49] and [Sha48] for this proof.]
Other cryptosystems we shall meet below are only secure if we assume the attacker has
access to a computer of a particular type (a probabilistic polynomial-time Turing machine
is the usual assumption; this is the subject within computer science called computational
complexity, see, e.g., [AB09]).
    It is also important never to use the same pad more than once. If so, an attacker can
take the letter-by-letter difference of the two ciphertexts and this will completely remove
the pad from the calculation. In fact, at that point the message can usually be determined
quickly.
    In modern times, after the advent of digital communications networks, messages are
written as computer data, so everything is stored (and transmitted) as bits, meaning 1s and
0s. With bits, when we do the equivalent of what was described above as shifting a letter
along the alphabet, wrapping from Z to A if necessary, we are either doing nothing, if the
shift is by an even amount, or simply switching 0 and 1 otherwise.

   3Another synecdoche!
62                                     4. CRYPTOLOGY

    Another way of saying that is to say that is that if the message and the key are both
written all out in bits – think of them as the elements [0], [1] ∈ Z/2Z – then the encryption
consists exactly of adding the corresponding bits mod 2. A good one-time pad is therefore
a very random, and very long, string of bits (congruence classes in Z/2Z).
    The big problem with one-time pads is key distribution: Alice and Bob must share very
large one-time pads before they ever start to communicate, large enough to have as many
letters (or bits, in the computer age) as all of their future messages.
    This does suggest an interesting approach to cryptography: find a way that Alice and
Bob can share a small secret out of which they can generate arbitrarily long sequences of
bits, both getting the same sequence, which seem to any attacker to be entirely random.
If this were possible, they would use these pseudorandom sequences (called this because
they are not actually random – after all, they can be determined in advance by both Alice
and Bob using their small starting secret – they only appear so to all attackers) as one-time
pads. [See [Lub96] for more about pseudorandomness.]
                           4.2. THE CAESAR CIPHER AND ITS VARIANTS                               63

    Exercises for §4.2.

   E XERCISE 4.2.1. ROT13 was supposedly nice because it reduced the amount of coding
which needed to be done: the same program would decrypt as the one which encrypts, since
doing ROT13 a second time undoes the first ROT13’s transformation.
   Is this true for other keyed Caesar ciphers? Make a conjecture about whether, given a
Caesar cipher key k ∈ Z and a message m, there always exists an n ∈ N such that
                                         n times
                                     z      }|       {
                                     eCk ◦ · · · ◦ eC
                                                    k (m) = m

where eCk is the function which does the Caesar encryption with key k. If so, find an
expression for the smallest such n, which depends (if necessary) on k, m, and the size of
the alphabet in which m is written.

    E XERCISE 4.2.2. Continuing the previous exercise: Suppose now ~k = (k1 , . . . , kℓ ) is
an ℓ-tuple, for ℓ ∈ N, of integers k1 , . . . , kℓ ∈ Z and e~Vk is the Vigenère encryption function
with key ~k. Find if for all messages m, there exists an n ∈ N such that
                                         n times
                                     z       }|        {
                                     e~Vk ◦ · · · ◦ e~Vk (m) = m
and, if so, find an expression for the smallest such n, which depends (if necessary) on ~k,
m, and the size of the alphabet in which m is written.

    E XERCISE 4.2.3. Suppose Eve has a twin Vev, and they are both looking at an inter-
cepted ciphertext c sent by Alice to Bob. They think that those crazy lovebirds Alice and
Bob have used a one-time pad to encrypt their communications, but Eve and Vev both do
lots of calculations and think they have managed to correctly decrypt c. Unfortunately, they
have different (non-gibberish!) values me and mv which Eve and Vev, respectively, think
were the plaintext that Alice meant to send to Bob.
    What must have been the one-time pads pe and pv which they were somehow calculating
(or guessing) that Alice used for their respective encryptions?
    Could they have come up with arbitrary proposed decryptions me and mv , or is there
some relationship between the proposed decryptions, the proposed pads pe and pv , the true
plaintext, and Alice’s actual one-time pad?
    Work out a particular, concrete example: if Alice’s original cleartext message m was
aardvark, would it be possible for Eve to think the message was me = iloveyou
while Vev thinks it is mv = ihateyou? If so, give an example of what would be the
corresponding ciphertext c, Alice and Bob’s one-time pad pAB , the proposed decrypts me
and mv , and the proposed one-time pads pe and pv .
64                                       4. CRYPTOLOGY

                4.3. First Steps into Cryptanalysis: Frequency Analysis
     The Caesar cipher seems very weak. But looking at a ciphertext, such as
                   wrehruqrwwrehwkdwlvwkhtxhvwlrq
                   zkhwkhuwlvqreohulqwkhplqgwrvxiihu
                   wkhvolqjvdqgduurzvrirxwudjhrxviruwxqh
                   ruwrwdnhdupvdjdlqvwdvhdriwurxeohv
                   dqgebrssrvlqjhqgwkhp
it is hard to know where to start – this hardly seems to be English at all.
     Perhaps we should start with the Caesar cipher itself, assuming (anachronistically) that
Caesar was following Kerckhoff’s Principle, or that (more chronistically) spies had deter-
mined the cryptosystem but not the key.
     One thing we notice right away is that what we do not know, that key, is such a small
thing to be giving us such trouble. In fact, if we tried a random choice of key, nearly four
times out of 100 (using the modern “Latin alphabet” of 26 letters – in Caesar’s time, the
actual Latin alphabet had only 23, so even better!) would, purely by dumb luck, give us a
correct decryption. Formally, we are talking about

     D EFINITION 4.3.1. The set of all valid keys for some cryptosystem is called its keyspace.

     E XAMPLE 4.3.2. The keyspace of the Caesar cipher cryptosystem is the alphabet.

    E XAMPLE 4.3.3. The keyspace of the Vigenère cipher cryptosystem with a key length
of ℓ is all sequences of ℓ letters in the alphabet; therefore, it has N ℓ elements, if the alphabet
is of size N. If the precise key length is not known, but it is known to be less than or equal
                                          P
to some bound L ∈ N, then there are Lj=1 N j possible keys.

   E XAMPLE 4.3.4. Any sequence of letters of length M is a possible key (pad) for a one-
time pad encryption of a message of length M. Therefore there are N M possible one-time
pads for a message of length M over an alphabet of size N.

    We computed the sizes of these various keyspaces because one thing that seemed weak
about the Caesar cipher was how small was its keyspace. In fact, a better strategy than
simply guessing at random would be to try all possible keys – there are only 26 (or 23
in Julius’s time)! – and see which one gives the correct decryption. Such an approach
is called a brute-force attack [or exhaustive search]. Even in Caesar’s time, the Caesar
cipher keyspace is so small that Eve could check all possible keys and see which yielded
the cleartext of a message from Alice to Bob.
    Things are a little more difficult for the Vigenère cipher. For example, there are nearly
12 million keys to try if the key length is known to be five. Before the computer age, this
would have been completely intractable. Even in the 21st century, where it would be nearly
instantaneous to human eyes for a computer to generate all of those possible decryption,
                   4.3. FIRST STEPS INTO CRYPTANALYSIS: FREQUENCY ANALYSIS                            65

there is a problem: those human eyes would have to look over the 12 million possibilities
and pick out which was the valid decryption.
    Amazon’s Mechanical Turk (see https://www.mturk.com/mturk/welcome)
might be able to marshal enough human eyes to solve a single brute force Vigenère de-
cryption. But a better approach would be to write a computer program that can distinguish
gibberish from a real message Alice might have sent to Bob – then, in a matter of moments,
a brute force attack would succeed.
    If we are to distinguish gibberish from a real message, we need to know what possible
real messages are. This motivates the following

    D EFINITION 4.3.5. The set of possible messages for an encrypted communication is
called the message space.

    Without some structure for the message space, cryptanalysis can become nearly impos-
sible. For example, if Alice is e-mailing an encrypted version the combination of a safe to
Bob, the message space is not too large but has no identifiable features: a brute force attack
would have no way of telling which potential decryption was the actual combination.
    But suppose we assume that the message space of Alice and Bob’s communication is a
set of short (or long, which would be better) English texts. There is quite a bit of structure
in English texts – we human speakers of English certainly recognize valid English. The
question is whether we can automate the process of detection of valid English texts.
    Looking back at the example Caesar encryption given at the beginning of this section,
we noticed that it did not look like English – but in what way? One thing was, of course,
that it does not have any English words in it. Someone who knows English knows most of
the words of that language and can recognize that this text has none of them, except maybe
“up” and “i” (several times).
    This suggests an approach: take a list of all words in English, with all variant and
derivative forms, add in some possible short random sequences that may have been included
in otherwise standard English (where Alice was quoting some nonsense she’d heard, or was
writing down a graffito she saw on the side of a subway car, etc.), and compare them to
possible decryptions of the ciphertext. This scheme is not very practical. But if the message
space were small enough, something like this would be reasonable.
    Looking back at the message again, another thing that immediately strikes the eye of a
reader of English is that it does not seem to have the right letters, or the right combinations
of letters. For example, there do not seem to be enough vowels. Looking at how often
letters occur in a text compared to what we expect is called frequency analysis. Its use in
cryptanalysis was apparently first described by the Muslim philosopher and mathematician
al-Kindi4 in his 9th century work Manuscript on Deciphering Cryptographic Messages.
    4
     Al-Kindi is a very interesting figure from early Muslim intellectual history: e.g., he seems to have
brought Hindu numbers, with their place-value notation, into the Muslim world.
66                                     4. CRYPTOLOGY

    Here is a table of the frequencies of the letters in a bit of standard English (some of the
plays of Shakespeare):




         F IGURE 4.3.1. Frequency counts of letters in several Shakespeare plays




    Notice that the letter ‘e’ is the most common, by a bit. So a first cryptanalytic approach
using frequency analysis would simply be to use the Caesar decryption key which moved
the most frequent letter in the ciphertext to be ‘e’. With the ciphertext at the beginning of
this section, that would yield the decryption:

                  ezmpzcyzeezmpesletdespbfpdetzy
                  hspespcetdyzmwpctyespxtyoezdfqqpc
                  espdwtyrdlyolcczhdzqzfeclrpzfdqzcefyp
                  zcezelvplcxdlrltydeldplzqeczfmwpd
                  lyomjzaazdtyrpyoespx

Not so good. Apparently looking only at the most frequent letter is insufficient: the most
frequent letter in the plaintext of this message was not ’e’.
    Instead let us look at the entire frequency table for this ciphertext.
                 4.3. FIRST STEPS INTO CRYPTANALYSIS: FREQUENCY ANALYSIS                    67




             F IGURE 4.3.2. Letter frequencies in a sample Caesar ciphertext


    Unfortunately, this is not exactly the standard (Shakespearean) English frequency dis-
tribution, nor is it even a shifted (with wrapping past 25 back to 0) version of Shakespeare’s
distribution. But perhaps one of its shifts is fairly close to Shakespeare.
    This suggest the following more refined cryptanalytic approach: try all possible decryp-
tions (which is not so bad for the Caesar cipher, because of its small keyspace) and choose
the one whose entire letter frequency table is closest to the standard English letter frequency
table – so, not looking only at the most frequent letter, but looking at the whole frequency
distribution; and not looking for exact match, but merely for the best approximate match.
    Which brings up the question of what is the best way to measure the distance between
two distributions. One commonly used approach is to measure the

   D EFINITION 4.3.6. The total square error is a distance between two distributions f
and g given by
                                            25
                                            X
                                d(f, g) =         (f (j) − g(j))2
                                            j=0

where we are always working with our distributions defined on the letters in the form ‘a’=0,
‘b’=1, etc.).

    Minimizing this distance is equivalent to least squares in statistics or linear algebra.
    Following this strategy with the ciphertext from the beginning of this section, we get
the following
68                                     4. CRYPTOLOGY




     F IGURE 4.3.3. Square errors with all decryption keys for example Caesar ciphertext




    The minimal square error will then come from decryption key 23 (corresponding to
an original encryption key of 3; this was done by Julius himself, anachronistically). The
corresponding cleartext then would have been

                   tobeornottobethatisthequestion
                   whethertisnoblerinthemindtosuffer
                   theslingsandarrowsofoutrageousfortune
                   ortotakearmsagainstaseaoftroubles
                   andbyopposingendthem

which looks pretty good.
    Interestingly, this approach to non-gibberish detection is fairly robust, even with quite
short texts which might be expected to have frequency distributions rather far away from
the English standard. For example, with ciphertext

                          rkkrtbrkeffespkyvtifjjifruj

we have
                 4.3. FIRST STEPS INTO CRYPTANALYSIS: FREQUENCY ANALYSIS                    69




       F IGURE 4.3.4. Square errors with all decryption keys for Caesar ciphertext
       rkkrtbrkeffespkyvtifjjifruj

from which we conclude that the decryption key is 9, corresponding to an encryption key
of 17. Thus the cleartext must have been
                          attackatnoonbythecrossroads
which seems like the kind of thing Julius might have said.


     So much for cryptanalysis of the Caesar cipher. Moving on to Vigenère, the first prob-
lem we face is finding out the key length. Because if we knew that key length ℓ, we would
divide the ciphertext into ℓ shorter texts consisting of every ℓth character starting with the
first, then with the second, etc., up to starting with the ℓth . Applying the automatic Caesar
cracker described above, we would have the ℓ digits of the key and thus the plaintext.
     To make things easier, suppose we take all of Hamlet and encrypt it with Vigenère
using the keyword hippopotomonstrosesquipedaliophobia, corresponding to
numerical key
                k = (7, 8, 15, 15, 14, 15, 14, 19, 14, 12, 14, 13, 18, 19, 17, 14,
                    18, 4, 18, 16, 20, 8, 15, 4, 3, 0, 11, 8, 14, 15, 7, 14, 1, 8, 0)
For example, the part of the ciphertext corresponding to the quote used above in Caesar
cracking would be
70                                      4. CRYPTOLOGY

                   hdxajnioomjvdtfvstrqitmfozlxjj
                   kcmiosgpenjjbgxmcmtfzltmaudofpmwg
                   odsntxuuhwjywmrjpnieosoqlfbpjoqyyljia
                   cmbdaozawminabtdhrtyndlncugkflswh
                   vjrwgdwddoeicznymcyl
If Eve was hoping this was only Caesar-encrypted, so the Vigenère key length was ℓ = 1,
she would have the following letter frequencies:




       F IGURE 4.3.5. Letter frequencies in the Vigenère encryption of Hamlet
       using key hippopotomonstrosesquipedaliophobia

Notice that compared to Figure 4.3.2, the values show much less variation – in fact, there
are no zero values and no values which are much larger than all others. The reason is that
this distribution is the average of 16 different shifts, coming from the 16 different letters in
the keyword hippopotomonstrosesquipedaliophobia, of the standard English
distribution Figure 4.3.1. This averaging smooths out all of the characteristic shape of the
distribution useful in identifying when its decrypting shift is correct.
     Nevertheless, the Caesar cracker will give some key value for each one of the choices
Eve makes for a possible value of ℓ – there will always be a value of the shift for each letter
which minimizes the square error. The minimum will not be very small, though, because
the frequency distribution for the letters chosen as all locations in the ciphertext congruent
to k (mod ℓ) for any k ∈ N and k ≤ ℓ, where ℓ is less than the true key size, will be quite
flat and therefore far away (in the square error distance) from the distribution of standard
English.
                  4.3. FIRST STEPS INTO CRYPTANALYSIS: FREQUENCY ANALYSIS                          71

    For example, with the encrypted Hamlet we have been examining, if Eve guesses that
ℓ = 5 and looks at every letter in a location congruent to 1 (mod 5), the Caesar cracker
will have the following graph of square error:




        F IGURE 4.3.6. Square errors with all decryption keys for letters at lo-
        cations congruent to 1 (mod 5) in Hamlet Vigenère encrypted with key
        hippopotomonstrosesquipedaliophobia

      The minimum is clearly at a decryption key of 23, but it is not significantly less than
 other possibilities – and all the errors are around ten times higher than we saw in earlier
 such graphs such as Figure 4.3.3.
      So here is the Vigenère cryptanalytic approach: Try all key lengths ℓ up to some bound
 L (determined by how much computer time is available, and not so large that taking every
 ℓth letter of the ciphertext results in too few letters to do reasonable frequency analysis). For
 each of the ℓ substrings consisting of every ℓth letter starting at locations 1, . . . , ℓ, apply the
 Caesar cracker. This will result in an optimal Vigenère key ~kℓ = (kℓ,1 , . . . , kℓ,ℓ) of length ℓ.
 Then for each such ~kℓ in the range 1, . . . , L, compute the square error distance dℓ between
 the ciphertext decrypted with ~kℓ and the standard English letter distribution. Report ℓ and
~kℓ as the Vigenère keylength and key if dℓ is the smallest square error computed.
72                                     4. CRYPTOLOGY

     Exercises for §4.3.

    E XERCISE 4.3.1. Can you think of any Caesar ciphertexts which would have two de-
cryptions that would both seem like valid English during an brute-force attack? If so, give
precise examples, with decryption keys.

    E XERCISE 4.3.2. Would it be harder or easier to do the previous exercise (4.3.1) for
the Vigenère cipher? Give reasoning and examples.

    E XERCISE 4.3.3. It seems like more encryption would be better, in terms of designing
a secure cryptosystem – so how about encrypting a cleartext using one cryptosystem, then
re-encrypting the resulting ciphertext to make a super-ciphertext?
    We have two cryptosystems so far with short keys (we also have the one-time pad,
but it is already perfectly secure and has big keys, those pads, so we will not consider it):
Caesar and Vigenère. Suppose I make a new cryptosystem which does some combination
of Caesar and Vigenère encryptions, one after the other, all with different keys. Will this
result in a substantially more secure cryptosystem? Why or why not?
                      4.4. PUBLIC-KEY CRYPTO: THE RSA CRYPTOSYSTEM                        73

                   4.4. Public-Key Crypto: the RSA Cryptosystem
     Suppose Alice and Bob never had a chance to meet in person, and they nevertheless
want to exchange messages which will be secret from Eve. What can they do?
     This seems impossible within the context of cryptosystems we have been discussing so
far. Let us tease out the crucial part that makes this so difficult
    D EFINITION 4.4.1. A symmetric cipher (or symmetric cryptosystem) consists of the
following parts, all known to both the communicating parties and the public in general:
      • a message space M
      • a keyspace K
      • an encryption algorithm which creates a ciphertext c = ek (m) for any choice of
        k ∈ K and m ∈ M.
      • a decryption algorithm which generates a message dk (c) ∈ M given a ciphertext
        c and a k ∈ K, which satisfies dk (ek (m)) = m ∀k ∈ K, m ∈ M.
Alice and Bob can use such a symmetric cipher by agreeing in private upon a key k ∈ K
they will use for both encryption and decryption. For this reason, symmetric cryptosystems
are also called private key cryptosystems.
   Graphically:

     Basic private key crypto set-up and notation:
                                private communication
         Alice           ←→ of shared key k ∈ K       ←→                     Bob
                                      on public network
      message mA ∈ M
     compute cA = ek (mA )
         transmit cA                ֌ ciphertext cA       ֌          receive cA
                                                               compute mA = dk (cA )
                                                                 message mB ∈ M
                                                               compute cB = ek (mB )
         receive cB                 ֋ ciphertext cB       ֋         transmit cB
    compute mB = dk (cB )

                          etc....


     It is important for the security of the above cryptosystem that the keyspace K either
be quite large, or that the whole system be structured in such a way that it is hard to tell
if a particular possible decryption is valid or not (or both). Otherwise Eve can perform
the brute-force attack of simply trying all the possible keys k ∈ K and seeing which dk (c)
makes sense as the decryption.
74                                     4. CRYPTOLOGY

    In any case, because of the initial private key exchange, symmetric cryptosystems are
not appropriate for the kind of situation we proposed at the start of this section. Instead,
one would need a different kind of cryptosystem:

    D EFINITION 4.4.2. An asymmetric cipher (or asymmetric cryptosystem) consists of
the following parts, all known to any interested party, licit or il-:
       • a message space M
       • an encryption keyspace Ke
       • a decryption keyspace Kd
       • a way of associating encryption keys to decryption keys, by a one-to-one function
         E : Kd → Ke
       • an encryption algorithm which creates a ciphertext c = eke (m) for any choice of
         ke ∈ Ke and m ∈ M.
       • a decryption algorithm which generates a message dkd (c) ∈ M given a ciphertext
         c and a kd ∈ Kd , which satisfies dkd (eE(kd ) (m)) = m ∀kd ∈ Kd , m ∈ M.
To use such a cryptosystem, Bob picks a decryption key kd , without telling it to anyone –
but he makes the corresponding encryption key ke = E(kd ) publicly available (probably on
his website). For this reason, ke is also called his public key and kd his private key, while
the whole system is called a public-key cryptosystem. .

     Graphically:

      Basic public key crypto set-up and notation:
            Alice                on public network                     Bob
                                                                  pick kd ∈ Kd
                                                             compute ke = E(kd ) ∈ Ke
         download ke            ֋    public key ke     ֋           publish ke
       message m ∈ M
      compute c = eke (m)
          transmit c             ֌    ciphertext c   ֌              receive c
                                                                compute m = dkd (c)

     As before, the security of this cryptosystem against brute-force attacks relies upon the
size of Kd . In addition, since the encryption key and algorithm are known to Eve, it is
important that the message space M not be too small. If M were small, Eve could simply
compute all encryptions eke (m) as m ran over M, and compare them to a ciphertext c she
intercepted.
    There is a way to prevent this attack on the message space, by making it artificially
larger. In fact, one usually adds some
                         4.4. PUBLIC-KEY CRYPTO: THE RSA CRYPTOSYSTEM                                  75

    D EFINITION 4.4.3. Cryptographic salt is random data added to a message Alice de-
sires to send to Bob before she encrypts it, which is automatically removed at Bob’s end.

Once Alice salts her messages, Eve must search the entire space of messages together with
salt, which can be much larger than M.
    As an added benefit, salting messages makes the ciphertext change with each secret
transmission – even if the plaintext is the same. Therefore even though Eve will of course
know there is communication going on between Alice and Bob, her traffic analysis will no
longer even be able to track when the same message is being sent on separate occasions.
Absent this salt, a meticulous Eve might be able to correlate the message ciphertext with
actions in the real world, and thus effectively learn their decryption even without cracking
the cryptosystem. With salt, this is impossible.
    Since Bob’s encryption key ke = D(kd ) is public, the entire security of this cryptosys-
tem falls apart if Eve can figure out the inverse of the function E : Kd → Ke . On the other
hand, Bob must have been able to compute E itself, at least during the set-up phase of the
cryptosystem. There is a name for this kind of function

    D EFINITION 4.4.4. A function f : A → B which is one-to-one and which can be
computed in a reasonable amount of time on a standard (classical) computer but whose
inverse cannot feasibly be computed is called a [cryptographic] one-way function.


    Multiplying two numbers, even very large numbers, can be done quite quickly on a
computer. It is even surprisingly fast to tell if very large numbers are prime or composite
(see [AKS04] for this amazing recent result in the long history of primality testing).
    But no one has found a way to factor large numbers, even when they are known to be
composite, in a reasonable amount of time.5 For example, one of the largest ever to be fac-
tored, a famous challenge problem known as RSA-768, consisting of a 232-digit (768 bit)
composite number, was finally factored in 2009 after a network of hundreds of computers
working together (although not exclusively on this problem) had been processing for about
two years. Larger composite numbers are easy to make and astronomically harder to factor.
    Therefore, the function which multiplies two large primes together to get a very large
composite is a good candidate for a cryptographic one-way function.
    In 1978, Ron Rivest, Adi Shamir and Leonard Adleman described ([RSA78]) the fol-
lowing public-key cryptosystem based on this one-way function:

    D EFINITION 4.4.5. Bob starts by picking two very large primes p and q. Their prod-
uct n = pq is known as the corresponding RSA modulus. Bob also picks a number
e, typically a number which when written in base two has very few 1’s, such as 3 or
    5
      On a classical computer – there is an efficient algorithm known for factoring on a quantum computer,
see [Sho94] and [NC10].
76                                      4. CRYPTOLOGY

65537 = 100000000000000012, which satisfies gcd(e, φ(n)) = 1. This number is called
the RSA exponent.
    The RSA public [encryption] key ke consists of the pair (n, e); the set of possible such
pairs is Ke .
    The RSA private [decryption] key kd consists of the pair (n, d), where
                                   d = e−1     (mod φ(n)),
and Kd is the set of such pairs.
   The association of encryption to decryption keys is by E : (n, e) 7→ (n, e−1 (mod φ(n))).
   The message space of this cryptosystem is M = {m ∈ Z | 0 ≤ m < n}.
   Encryption is by
                              c = e(n,e) (m) = me (mod n) .
     Decryption is
                                  d(n,d) (c) = cd   (mod n) .
     All together, the above is called the RSA cryptosystem.

     Graphically:

     RSA cryptosystem set-up and notation:
          Alice               on public network                            Bob
                                                                pick large primes p and q
                                                             compute RSA modulus n = pq
                                                               pick RSA exponent e ∈ N
                                                                    with gcd(e, φ(n))
       download ke             ֋ public key ke       ֋             publish ke = (n, e)
                                                              compute d = e−1 (mod φ(n))
   message m ∈ M
 compute c = e(n,e) (m)
    = me (mod n)
      transmit c               ֌ ciphertext c       ֌                  receive c
                                                                  compute m = d(n,d) (c)
                                                                     = cd (mod n)

    The first thing we need to see is that this satisfies the basic condition to be a functioning
cryptosystem:

   P ROPOSITION 4.4.6. Let notation be as in the above definition of the RSA cryptosystem.
Then for any message m ∈ M, d(n,d) (e(n,e) (m)) = m.

   P ROOF. Any m ∈ M represents a class in Z/nZ, and we will blur the distinction
between the class and its representative m satisfying 0 ≤ m < n.
                      4.4. PUBLIC-KEY CRYPTO: THE RSA CRYPTOSYSTEM                         77

   We have built d as d = e−1 (mod φ(n)). This means e · d ≡ 1 (mod φ(n)), so ∃k ∈ Z
such that e · d = 1 + kφ(n).
   Then by Euler’s Theorem 3.3.4, ∀m ∈ M such that gcd(m, n) = 1
d(n,d) (e(n,e) (m)) ≡ (me )d = med = m1+kφ(n) ≡ m · (mφ(n) )k ≡ m · (1)k ≡ m       (mod n)
as desired. The case where gcd(m, n) 6= 1 is Exercise 4.4.1 in this section.              
     Beyond basic functionality, we need to see how practical RSA is. So let us go through
it carefully, picking out the algorithms needed at each step.
     (1) Bob must choose the large primes p and q. As we have said, there are algorithms
         which can tell if a number is prime in a reasonable amount of time. More precisely,
         this means that a (classical) computer can determine if a number k is prime in an
         amount of time which is less than a polynomial function of log(k). [That’s the
         usual meaning of feasible computation in cryptology.]
     (2) Furthermore, there are enough primes that Eve will not be able to do a brute force
         search through all the possibilities. We can tell this because The Prime Number
         Theorem tells us the (asymptotic) number of primes:

            T HEOREM 4.4.7. Given x ∈ R, x > 0, let π(x) = #{p ∈ N|p ≤ n} be the
         prime counting function. Then
                                         π(x)
                                     lim         = 1.
                                    x→∞ x/ ln(x)

         This theorem is one of the gems of analytic number theory. It was proven in 1896
         by Jacques Hadamard and Charles Jean de la Vallée-Poussin, although important
         pieces were known before then by people like Euler, Legendre, and Riemann. The
         proof is hard, even an elementary one from 1948 by Atle Selberg and Paul Erdös.
              One consequence of the Prime Number Theorem is that the probability that
         a randomly chosen number k is prime is approximately 1/ln(k). For example,
         looking for a prime with 200 digits, we pick a random number of that size, and
         then we will have to test on the order of ln(10200 ) = 461 numbers before we have
         a prime. This is not an unreasonable amount of computation.
     (3) Having an RSA modulus n = pq, we can compute the Euler totient function
         φ(n) = (p − 1)(q − 1) nearly instantly by Theorem 2.5.3.
     (4) Finding the RSA exponent e which is relatively prime to φ(n) is not hard, since
         there are many such – in fact, in a real sense, the probability that two randomly
         chosen numbers are relatively prime is π62 (for a precise statement of this, and its
         proof, see [HW79]). Testing candidates for e is also efficient because, as we shall
         see below in Exercise 4.4.3, the Euclidean Algorithm takes an amount of time
         which is less than a polynomial function of the log’s of the numbers involved – it
         is feasible.
78                                     4. CRYPTOLOGY

     (5) Next, computing d = e−1 (mod φ(n)) is very fast using the Extended Euclidean
         Algorithm (Theorem 1.6.2) which is certainly slower than the basic Euclidean
         Algorithm, but is still feasible.
     (6) RSA encryption and decryption both consist of raising a number to a power in
         (mod n). Fortunately, there is an algorithm called fast modular exponentiation
         which again does this in an amount of time less than a polynomial function of the
         log’s of the given numbers.
     (7) One last practical point is how Alice actually uses RSA to send the message she
         wants, and not merely a number m ∈ Z such that 0 ≤ m < n.
             This is a standard problem in mathematical cryptography and has standard
         solutions. Typically, Alice must take her message, character by character, and
         transcribe it into a sequence of numbers using some accepted standard. This has
         been done since 1960 using the ASCII code [American Standard Code for Infor-
         mation Interchange], which gives 7 bit binary numbers for the 26 letters of the
         (modern) Latin alphabet, both upper and lower case, as well as for a set of com-
         monly found symbols and a few modern additions, usually no-printing, such as
         111102 Record Separator, 1112 Bell, 10112 Vertical Tab, etc. These ASCII codes
         are today always stored in one byte (8 bits), padded with a leading 0 bit.
             Actually, as the World Wide Web has become a global phenomenon, there
         has been an increasing need for more alphabets and even writing systems (like
         Chinese) which do not use alphabets. This has resulted in a system called Unicode
         which as, in version 6.3 as of 2013, more than 110,000 characters. Unicode is
         stored in a variety encodings, of which the most common are UTF-8 and UTF-
         16, which use between two and four bytes to store each character.
             What Alice typically does, then, is to write out her message in either ASCII or
         Unicode, simply attach all the bytes in sequence, and then cut the entire message
         into blocks of b bits. Here b ∈ N is chosen to be the largest value such that 2b < n,
         so that those blocks of b bits can be thought of as representing an integer m ∈ Z
         such that 0 ≤ m < n, so RSA can now be used on the message, a block at a time.
             (We are glossing over many details here, such as how to leave space for some
         salt, other approaches and issues in the structure of the blocks, etc. See any stan-
         dard reference on cryptography, such as [FS03] or [MVOV96].)
                       4.4. PUBLIC-KEY CRYPTO: THE RSA CRYPTOSYSTEM                          79

   Exercises for §4.4.

    E XERCISE 4.4.1. In this problem, you will finish the proof of Proposition 4.4.6, which
states that RSA decryption functions correctly in all cases.
    As a first step, formally state and prove a lemma that for distinct primes p and q, two
integers r and s are congruent mod pq if and only if they are congruent mod p and mod q.
[Hint: the Chinese Remainder Theorem.]
    Now, using the above, prove the missing case of Proposition 4.4.6: Given distinct
primes p and q, define n = pq. Let e ∈ N satisfy gcd(e, φ(n)) = 1 and d ∈ N be an
inverse of e mod φ(n). Then, for any m ∈ Z such that 0 ≤ m < n and gcd(m, n) 6= 1, we
have med ≡ m (mod n).

     E XERCISE 4.4.2. How much computation is required to compute ak (mod n) for a ∈
Z and k, n ∈ N with n ≥ 2?
    Multiplying two numbers a, b ∈ Z is a basic instruction in most modern computers: it
can be thought of as taking one unit of time. (Or you can do a number of smaller manip-
ulations which is approximately a polynomial function of the number of digits in a and b;
it will make no difference for the rest of this problem.) Similarly, division with remainder
is either a single machine instruction (e.g. on a processor in the Pentium family) or can be
done by elementary school methods in time approximated by a polynomial function of the
sizes of the inputs.
    Let us then describe the work to do a multiplication followed by reducing (mod n)
as being bounded by a polynomial function p(d), where d is the number of digits in the
operands.
    If we then simply multiply a by itself k times, reducing (mod n) each time, the time
to do this will be approximately kp(d) = c1 ec2 d p(d), for some constants c1 , c2 ∈ R – it will
increase exponentially.
    See if you can come up with a much fast exponentiation algorithm, polynomial rather
than exponential.
    [Hints: (1) Make a table of successive squares of a in (mod n), being a2 , a4 , a6 , etc.
(2) write out k in base two, and expand ak using this binary version of k, the usual rules for
exponentiation, and the table. End up with a description of how to compute ak (mod n),
and check carefully how much time it would take to do your computation.]

    E XERCISE 4.4.3. For k ∈ N, play the following game:

     (1) use the division algorithm on k for division by two, getting quotient q and remain-
         der r;
     (2) replace k ← q;
     (3) if k > 0, repeat from step (1); otherwise, terminate.
80                                     4. CRYPTOLOGY

Give a bound on how many steps does it take for this game to finish – answer in terms of a
function of k, or a function of the number of bits required to write k down in base two, or
a function of the number of digits require to write k in base 10.
    Show that for every two steps of the Euclidean Algorithm, the remainder terms ri de-
crease by a least a faction of 1/2. In other words, if we use the notation of Theorem 1.6.2
then rj+2 < 12 rj for all j ∈ Z satisfying j ≥ 0.
    Explain how this means the Euclidean Algorithm requires at most log2 (b) steps to com-
pute gcd(a, b) and it therefore requires at most seven times the number of digits of b. [Hint:
what is log2 (10)?]
    Explain how this means that the Euclidean Algorithm is a cryptologically feasible com-
putation, in the sense of this section.
                                   4.5. DIGITAL SIGNATURES                                  81

                                  4.5. Digital Signatures
    Public-key cryptosystems allow several use-cases which symmetric cryptosystems do
not. One which has come to have more and more importance in the modern digital economy
is the creation of digital signatures – these are parts of electronic documents which are
supposed to have something of the qualities of a physical signature in that are hard for an
imposter to forge. Such a signed document might be needed, for example, if Bob from
the last section (whose RSA public key is on his website) wished to send a legally binding
contract via e-mail. Perhaps Alice and Bob wish to e-mail to their future landlord Larry a
signed lease for an apartment that they will share. When Larry gets an e-mail from Bob
saying “I agree to be bound by the terms of this lease,” Larry needs to have confidence that
this e-mail did originate from Bob, which he can if there is a digital signature.
    Here’s what Bob can do: he takes a copy of the lease, adds a section at the end stating
his agreement to its terms and giving some personally identifying information (perhaps
a scan of his driver’s license). Call this whole chunk of data m. Then Bob applies his
decryption algorithm, using his private (decryption) key kd , yielding s = dkd (m) – this s is
called Bob’s signature on the message m. He then e-mails both m and s to Larry.
    When Larry receives this signed message, the first thing he does is detach the signature
s and compute its encryption eke (s) using the public key he got off Bob’s website. Since
eke and dkd are inverses and it does not matter in which order they are applied, the result
should be m. If that is so, Larry can be sure that whoever sent the message also had access
to Bob’s secret key and so presumably is Bob himself.
    Graphically:

   Basic digital signatures:
          Larry                        on public network                      Bob
                                                                          pick kd ∈ Kd
                                                                      compute ke = E(kd )
       download ke                  ֋ public key ke        ֋                publish ke
                                                                        message m ∈ M
                                                                      compute s = dkd (m)
      receive (m, s)           ֋ signed message (m, s)         ֋         transmit (m, s)
  if eke (s) = m
      ACCEPT
  otherwise,
       REJECT


     One problem with this scheme is that it has effectively doubled the size of the message.
The way to make a smaller, more efficient signature is for it to consist of the decryption not
of all of m but instead of some function h(m). Here the function h should take a message
82                                     4. CRYPTOLOGY

of arbitrary size and produce a small, digested piece of data ... which nevertheless depends
upon every part of the input m. After, all, if h(m) depended only upon the first 100 bits
of m, for example, then a malicious Eve could alter the message in transit, and her change
would go undetected as long as she did not change the first 100 bits of the message.
    Cryptologists have a name for functions like this h.

     D EFINITION 4.5.1. A function h which takes as input arbitrary length strings of bits
and produces output bit strings of a fixed length is called a cryptographic hash function
if it satisfies
      ease of computation: it is feasible to compute the h(m) for any m;
      pre-image resistance: given a hash value t, it is infeasible to find an m such that
        h(m) = t;
      second pre-image resistance: given a specific input m1 , it is infeasible to find an-
        other m2 such that h(m2 ) = h(m1 );
      collision resistance: it is infeasible to find two messages m1 and m2 such that h(m2 ) =
        h(m1 ).
The words feasible and infeasible here have the same meaning here as in the previous
section that it is, or is not, possible to complete the computation in an amount of time
bounded by a polynomial function of the size of the inputs.

    Notice that since a hash function takes inputs of arbitrary length but has a fixed output
size, there will necessarily be an infinite number of collisions
    The creation of cryptographic hash functions is something of a black art. It turns out
that if one builds a candidate hash function with some clear structure (usually mathemati-
cal) – particularly if it is one that is fast to compute – a way to break one of the resistance
requirements is usually found by the cryptological community. For this reason, the algo-
rithms currently in wide use tend to be very ad hoc computations that just seem messy and
have resisted attempts at inversion or breaking resistance.

    E XAMPLE 4.5.2. For around a decade starting in the early 1990s, the most widely used
cryptographic hash function was called md5. This algorithm was developed by Ron Rivest
and published in 1992. The output size of md5 is 128 bits.
    While md5 was thought to be flawed since the middle 1990s, a real attack was not
published until 2004, when it was shown not to be collision resistant [WY05]. However,
md5 is still used extensively today to verify that a large data transfer has not suffered a
transmission error – i.e., it is still a useful tool to test for non-malicious data corruption.
(In this context of providing evidence for data integrity against non-malicious corruption,
a hash function is frequently called a fingerprint.)
                                 4.5. DIGITAL SIGNATURES                                83

    E XAMPLE 4.5.3. The most widely used cryptographic hash function from the late
1990s until recently, and one which is built into many widely accepted and standardized
cryptographic protocols, is SHA-1, with an output size of 160 bits.
    SHA-1 was developed by US National Security Agency (NSA) in a semi-public pro-
cess, and was adopted by the US National Institute of Standards and Technology (NIST) as
part of several US Federal Information Processing Standards .
    In 2004, some work was published which indicated that SHA-1 might be vulnerable
to certain kinds of attack. (See [PS04].) For this reason, NIST required in 2010 many US
federal data protection applications to move to another hash function.

    E XAMPLE 4.5.4. At the time of this writing, most security conscious users and organi-
zations recommend SHA-2, usually in its “SHA-256” variant, which has an output size of
256 bits. Given recent revelations of the NSA’s involvement – and weakening of – crypto-
graphic protocols, it might be a cause of concern that NSA participated in the development
of SHA-2.
   Graphically:

  RSA digital signatures:
       Charlie                        on public network                     Bob
                                                                    do RSA set-up, make
                                                                      ke = (n, e) and d
       download ke               ֋ verification key ke      ֋             publish ke
                                                                      message m ∈ M
                                                                    compute s = (h(m))d
       receive (m, s)          ֋    signed message (m, s)    ֋         transmit (m, s)
  if h(m) = se (mod n)
      ACCEPT
  otherwise, REJECT
   With this understood, we can describe digital signatures more formally.

    D EFINITION 4.5.5. Suppose Bob sets up the RSA cryptosystem, as in Definition 4.4.5,
and chooses once and for all a cryptographic hash function h which he publishes on his
web page along with his public encryption key ke .
    If Bob now wants to sign a message m, he appends to m the RSA digital signature
s = ddk (h(m)) before transmitting it to any third party, say Charlie.
    When Charlie receives a signed message (m, s) which claims to be from Bob, he goes
to Bob’s website and downloads the public key ke and description of the cryptographic
hash function h. At this point, Charlie computes eke (s) and compares it to h(m). If they
are equal, he accepts the signature; if not, he rejects.
84                                    4. CRYPTOLOGY

    In this context, Bob’s private/decryption key kd is called the signing key and his pub-
lic/encryption key ke is called the verification key.
                                 4.5. DIGITAL SIGNATURES                                 85

   Exercises for §4.5.

    E XERCISE 4.5.1. Use the Pigeonhole Principle (Theorem 1.1.2 ) to prove that there
will always be an infinite number of collisions for a cryptographic hash function [although
they may not be feasibly computable].

   E XERCISE 4.5.2. Give an example of a hash function h whose output size is one bit
and a specific input m1 for which there is no second pre-image; that is, ∄m2 such that
h(m2 ) = h(m1 ).
86                                    4. CRYPTOLOGY

               4.6. Man-in-the-Middle Attacks, Certificates, and Trust
     While public-key crypto can seem an unalloyed benefit to the networked world, close
examination of the details of the last two sections shows a dangerous gap between a casual
statement of the properties of these cryptographic tools and their reality. The distinction
which at first goes unnoticed is between Bob, the person, and the bits arriving over the wire
to Alice or Larry which claim to be from Bob. This has little effect if Eve is, as we have
mostly been assuming her to be, a passive observer of the communications between Alice
and Bob (and sometimes Larry). But if Eve has even more control over the network and
can replace all communications with her own versions, new attacks are possible.
     Suppose Alice wants to send her secret message to Bob, again without ever having met
him to exchange with him a key for a symmetric cryptosystem. She hopes that he has a
public key, so she gets on the web and downloads his home page.
     Here is where Eve springs into action, intercepting Bob’s (web server’s) response to the
web request. She keeps a copy of Bob’s public key keB for herself, but substitutes into the
web page data in its place an RSA encryption key of her own, keE – for which she alone
knows the corresponding decryption key kdE – and transmits the modified page on to Alice.
     Alice composes her cleartext m, and transmits its corresponding ciphertext cA = ekeE (m)
to Bob, or so she thinks. Instead, Eve intercepts that ciphertext, decrypts and stores
m = dkdE (cA ) = dkdE (ekeE (m)). Then, in order to make Alice and Bob think everything is
working normally (so they’ll keep up their revealing chatter), Eve transmits cE = ekeB (m)
on to Bob.
     From Bob’s point of view, he gets an e-mail which seems to be from Alice and which,
when decrypted with his private key, does make perfect sense. Furthermore, if he interacts
with Alice off-line, she behaves as if she sent that message – in fact, she did, but not in
the particular encrypted form that Bob received. Eve has completely violated the confiden-
tiality of Alice and Bob’s communications, and she could violate the message integrity any
time she liked, still making it appear to come legitimately from Alice.
     The above scenario is called a man-in-the-middle attack (pardon the non-gender-
neutral terminology).
     Here is a graphical depiction of this attack:
                 4.6. MAN-IN-THE-MIDDLE ATTACKS, CERTIFICATES, AND TRUST                    87

        Man-in-the-middle attack:
             Alice                            Eve                         Bob
                                                                    generate keys:
                                                                 public keB , private kdB
                                          intercept keB ֋             publish keB
                                       generate keys:
                                   public keE , private kdE
         download keE             ֋ keE , spoof origin
        message m ∈ M
      compute cA = ekeE (m)
          transmit cA              ֌ cA , intercept
                                     extract the cleartext
                                        m = dkdE (cA )
                                   change to m′ if desired
                                   compute cE = ekeB (m′ )
                                          spoof origin cE ֌            receive cE
                                                                    read the message
                                                                      m′ = dkdB (cE )

    So it seems that symmetric cryptosystems actually have one nice thing built in: when
the parties meet in that perfect, prelapsarian moment to exchange their symmetric key, they
presumably can have confidence in the identity of the person they’re talking to – if not, they
wouldn’t exchange the symmetric key until they had seen lots of official-looking identifi-
cation materials. Asymmetric cryptosystems have the fundamental difficulty to overcome
of establishing a trustworthy connection between a real person’s identity and a public key
on the Internet which purports to be from that person.
    The technique from last section, §4.5, can help transfer the trust, at least. Suppose Alice
wants to engage in secret communication with Bob, but does not know if she can trust the
public key which appears to be on Bob’s web page. If that key came with an accompanying
digital signature issued by a trusted third party [TTP] of which Alice had the public key,
she could verify that the key was Bob’s, at least as far as the TTP knew.
    Here is a formal definition.

    D EFINITION 4.6.1. Individuals or organizations who want to use asymmetric cryp-
tography can go to a trusted third party called a certificate authority [CA] to request a
[digital] certificate for their public keys. This certificate would be a digital signature on
the public key itself, signed by the CA’s signing key. The CA’s verification key would be
assumed widely disseminated across the Internet or perhaps built into the basic operating
system software or computer hardware. Then anyone who wanted to use a public key could
88                                      4. CRYPTOLOGY

first check the validity of the associated certificate and have confidence that the intended
party did own the key in question.
     The entire aggregate of certificates, CAs, pre-installed or widely distributed verification
keys, etc., is called a public key infrastructure or PKI.

    In actual practice, the CA often cannot do more than certify that a certain public key
is owned by an individual who has access to a certain e-mail address, or the webmaster
password to a certain site, or some such purely Internet-based token. That much is usually
quite easy – connecting this with a real, external identity would require checking some
form of government-issued ID, probably, and is rarely done. Although it would be useful:
perhaps the government should act as a CA, and government IDs should have a built-in
RFID (recent US passports do!) which can transmit a certificate on the ID owner’s public
key.
    There is one other approach to figuring out whether to have faith in a particular public
key, favored by those who mistrust authority but are willing to trust some others they know
personally. In this approach, individuals who know each other personally and have faith in
each other’s good cryptologic habits can each sign each other’s public keys, adding their
digital signatures to those which have already been collected. Then when you want to use
someone’s public key, you can unwrap a chain of digital signatures, each signing the next,
until you find one by a person whom you know personally, have met, and whose public key
you have verified.
    This approach has become known as the web of trust, and is strongly supported by
GnuPG and other OpenPGP-compatible organizations, see gnupg.org.
    By the way, to be useful, a web of trust must include as many individuals and their keys
as possible, and have as many connections (where individual X knows and signs individual
Y’s public key) as possible. One way to get this going quickly is to have a key-signing
party. If you are standing next to someone whom you know well who says “sure, this
other person is someone I know well and trust”, then you might be willing to sign both of
their keys right there. In practice, when you sign keys, you are standing there with someone
whom you trust, so it usually suffices to check the md5 fingerprint of their key and not the
whole thing – the md5 is shorter and standing there with this person you trust, presumably
you do not think that anyone there has devoted large computational resources to finding a
second md5 pre-image the fingerprint.
                                      CHAPTER 5


                          Indices = Discrete Logarithms

   We talked about the cyclic subgroup hai of (Z/nZ)∗ generated by an element a in the
proof of our version of Lagrange’s Theorem 3.3.3, on the way to Euler’s Theorem 3.3.4.
Recall it consisted of all powers of a (well, of [a]n ) in (Z/nZ)∗ . Here are some examples:

 n φ(n) (Z/nZ)∗               a     hai                                    ordn (a)
 2  1   {1}                   1     {1}                                       1
 3  2   {1, 2}                1     {1}                                       1
                                         2
                              2     {2, 2 ≡ 1}                                2
 4    2    {1, 3}             1     {1}                                       1
                                         2
                              3     {3, 3 ≡ 1}                                2
 5    4    {1, 2, 3, 4}       1     {1}                                       1
                                         2       3       4
                              2     {2, 2 ≡ 4, 2 ≡ 3, 2 ≡ 1}                  4
                              3     {3, 32 ≡ 4, 33 ≡ 2, 34 ≡ 1}               4
                                         2
                              4     {4, 4 ≡ 1}                                2
 6    2    {1, 5}             1     {1}                                       1
                                         2
                              5     {5, 5 ≡ 1}                                2
 7    6    {1, 2, 3, 4, 5, 6} 1     {1}                                       1
                                         2       3
                              2     {2, 2 ≡ 4, 2 ≡ 1}                         3
                                         2       3       4      5    6
                              3     {3, 3 ≡ 2, 3 ≡ 6, 3 ≡ 4, 3 ≡ 5, 3 ≡ 1}    6
                                         2       3
                              4     {4, 4 ≡ 2, 4 ≡ 1}                         3
                                         2       3       4      5    6
                              5     {5, 5 ≡ 4, 5 ≡ 6, 5 ≡ 2, 5 ≡ 3, 5 ≡ 1}    6
                                         2
                              6     {6, 6 ≡ 1}                                2
 8    4    {1, 3, 5, 7}       1     {1}                                       1
                                         2
                              3     {3, 3 ≡ 1}                                2
                                         2
                              5     {5, 5 ≡ 1}                                2
                                         2
                              7     {7, 7 ≡ 1}                                2

              TABLE 5.0.1. Cyclic subgroups of (Z/nZ)∗ for n = 2, . . . , 8


Each of these cyclic subgroups hai has a size – being the order ordn (a) of its generator
– which divides the size of the ambient (Z/nZ)∗ , as we know will happen from Euler’s
Theorem 3.3.4.
                                             89
90                          5. INDICES = DISCRETE LOGARITHMS

    Some random things we also notice, which might or might not hold true in general,
given the small amount of evidence in this table, are:
      • frequently there is an element a which generates the biggest possible cyclic sub-
        group: hai = (Z/nZ)∗ ;
      • there seem always to be those big cyclic subgroups in Z/nZ when n is a prime;
      • even some composite n give a Z/nZ which contains a big cyclic subgroup, except
        for the case of the largest power of 2, which was n = 23 , that we tried.
    Just to see if those possible general statements hold a bit further, let’s compute two
more examples. For these we give only the sizes of (Z/nZ)∗ and hai, not complete lists of
their elements, to conserve space and since the elements of (Z/nZ)∗ can be found in the
column labelled “a”:

                           n φ(n) a ∈ (Z/nZ)∗ ordn (a)
                           16 8        1         1
                                       3         4
                                       5         4
                                       7         2
                                       9         2
                                       11        4
                                       13        4
                                       15        2
                           17 16       1         1
                                       2         8
                                       3        16
                                       4         4
                                       5        16
                                       6        16
                                       7        16
                                       8         8
                                       9         8
                                       10       16
                                       11       16
                                       12       16
                                       13        4
                                       14       16
                                       15        8
                                       16        2

          TABLE 5.0.2. Cyclic subgroups of (Z/nZ)∗ for n = 16 and n = 17
                        5.1. MORE PROPERTIES OF MULTIPLICATIVE ORDER                         91

Our guesses (large powers of 2 not so good; other n’s, particularly prime, are good) still
hold true. Of course, any finite amount of evidence for a general mathematical statement is
very inconclusive....
   Now to the formal analysis.

                      5.1. More Properties of Multiplicative Order
     But first, we need some more facts about [multiplicative] order.

  T HEOREM 5.1.1. Suppose n ∈ N and a ∈ Z satisfy n ≥ 2 and gcd(a, n) = 1. Then
 k
a ≡ 1 (mod n) for k ∈ N iff ordn (a) | k.

    P ROOF. One direction is very easy: if k ∈ N satisfies ordn (a) | k then ∃d ∈ N such
that k = ordn (a) · d and thus

                     ak = aordn (a)·d = (aordn (a) )d ≡ 1d = 1 (mod n) .

    Conversely, suppose k ∈ N is such that ak ≡ 1 (mod n). Apply the Division Algo-
rithm to get q, r ∈ N such that k = ordn (a) · q + r and 0 ≤ r < ordn (a). Then

             1 ≡ ak = aordn (a)·q+r = (aordn (a) )q · ar ≡ 1q · ar ≡ ar     (mod n) .

But the definition of the order ordn (a) is that it is the smallest positive integer such that a
to that power is congruent to 1. The only way for ar ≡ 1 (mod n) and 0 ≤ r < ordn (a)
to be true, then, is if r = 0. Thus k = ordn (a) · q and ordn (a) | k, as desired.            

    What this will mean is that when we work in the cyclic subgroup of (Z/nZ)∗ gener-
ated by some element a, we should work with the powers of a as if the powers lived in
Z/(ordn (a))Z. That is:

  T HEOREM 5.1.2. Suppose n ∈ N and a ∈ Z satisfy n ≥ 2 and gcd(a, n) = 1. Then
a ≡ ak (mod n) for j, k ∈ N iff j ≡ k (mod ordn (a)).
 j


    P ROOF. Again, one direction is very easy: if j, k ∈ N satisfy j ≡ k (mod ordn (a))
then ∃ℓ ∈ Z such that j = k + ordn (a) · ℓ from which we compute

               aj = ak+ordn (a)·ℓ = ak · (aordn (a) )ℓ ≡ ak · 1ℓ = ak     (mod n) .

    Conversely, suppose j, k ∈ N are such that aj ≡ ak (mod n) and, without loss of
generality, j ≥ k. Multiplying both sides of this congruence by (a−1 )k , which exists since
gcd(a, n) = 1, we get aj−k ≡ 1 (mod n). Then by Theorem 5.1.1 we have ordn (a) |
(j − k) or j ≡ k (mod ordn (a)).                                                          

   E XAMPLE 5.1.3. The rows in Table 5.0.1 bear out Theorems 5.1.1 and 5.1.2: each
cyclic subgroup (row) has a number of elements which divides the corresponding φ(n),
and powers of the generator a are only defined up to ordn (a).
92                            5. INDICES = DISCRETE LOGARITHMS

    It also seems that some of the smaller cyclic subgroups sometimes occur as a subset of
a larger cyclic subgroup. So in mod n = 7, if a = 3 and b = a2 ≡ 2, then we have the
containment hbi ⊂ hai, where hbi consists of half of the elements of hai, namely the even
powers of a:
                                                      
     hbi = b, b2 , b3 = {2, 4, 1} = 32 , 34 , 36 ⊂ 3, 32 , 33 , 34 , 35 , 36 = hai
Looking instead at c = 33 ≡ 6, we still have hci ⊂ hai, but now hci consists of one third of
the elements of hai, the powers of a which are multiples of 3:
                                                
         hci = c, c2 = {6, 1} = 33 , 36 ⊂ 3, 32 , 33 , 34 , 35 , 36 = hai
    The last part of this example seems to be asking for a general statement about what size
subset of a cyclic subgroup hai is formed of the cyclic subgroup ak for some k ∈ N. This
size will just be the order of that element ak , so we need the following
   T HEOREM 5.1.4. Suppose n ∈ N and a ∈ Z satisfy n ≥ 2 and gcd(a, n) = 1. Then
                        ordn (a)
∀k ∈ N, ordn (ak ) = gcd(ordn (a),k)

   P ROOF. Fix some k ∈ N and let r = ord(ak ) and s = ord(a); with this notation, what
                                      s
we are trying to prove is that r =           .
                                   gcd(s, k)
   We shall repeatedly use in the proof the fact that since a gcd is a divisor, gcd(s, k) | s
                                                     s          k
and gcd(s, k) | k, which means in turn that both           ,           ∈ N.
                                                 gcd(s, k) gcd(s, k)
   Now to the proof.
   The definition of order is that r is the smallest natural number such that (ak )r ≡ 1
(mod n). Notice that
                          s             k          k
                  (ak ) gcd(s,k) = (as ) gcd(s,k) = 1 gcd(s,k) ≡ 1 (mod n) .
                                                            
                                                     s
 By Theorem 5.1.1, we conclude that r                          .
                                                 gcd(s, k)
   Smallest or not, since akr = (ak )r ≡ 1 (mod n), s | kr by Theorem 5.1.1, so ∃m ∈ N
such that ms = kr. Dividing both sides by gcd(s, k), we get an equation of natural numbers
                                                               
                                       s                     k
                              m                   =                 r;
                                   gcd(s, k)            gcd(s, k)
which means                                           
                                  s              k
                                                         r.
                              gcd(s, k)      gcd(s, k)
                                             
                            s           k
By Theorem 1.5.5, gcd             ,             = 1, so Euclid’s Lemma 2.1.6 tells us that
                        gcd(s, k) gcd(s, k)
   s
           r.
gcd(s, k)
                                  s
   We have shown that r and              divide each other. But that means they are equal,
                              gcd(s, k)
which is what we were trying to prove.                                                  
                      5.1. MORE PROPERTIES OF MULTIPLICATIVE ORDER                         93

   Exercises for §5.1.

   E XERCISE 5.1.1. Suppose n ∈ N satisfies n ≥ 2. Prove that if there exists a ∈
(Z/nZ)∗ such that ordn (a) = n − 1, then n is prime.

   E XERCISE 5.1.2. Suppose p is an odd prime and a ∈ (Z/pZ)∗ . Prove that if ∃k ∈ N
such that ordp (a) = 2k then ak ≡ −1 (mod p). [Hint: look at the proof of Lemma 3.2.1.]

    E XERCISE 5.1.3. Suppose n ∈ N satisfies n ≥ 2 and a, b ∈ Z/nZ are both relatively
prime to n. Prove that
                             ordn (ab) | ordn (a) ordn (b) .

     E XERCISE 5.1.4. Prove that there are infinitely many primes congruent to 1 mod 4, by
filling in the details of the following outline:


    (1) Prove: if an odd prime p and n ∈ Z are such that n2 ≡ −1 (mod p), then 4 | φ(p)
        [use a theorem in this section]. Why is it necessary to exclude the even prime
        here?
    (2) Therefore for n ∈ Z, the odd prime divisors of n2 + 1 are congruent to 1 mod 4.
    (3) Now assume for contradiction’s sake that there are only finitely many primes
        p1 , . . . , pn congruent to 1 mod 4 and consider the number (2p1 · · · · · pn )2 + 1.
        Apply the previous step....
94                             5. INDICES = DISCRETE LOGARITHMS

     5.2. A Necessary Digression: Gauss’s Theorem on Sums of Euler’s Function
     Later in this chapter we will need a fact first proved by Gauss about Euler’s φ function:
                                         X
     T HEOREM 5.2.1. For all n ∈ N,            φ(d) = n .
                                           d∈N
                                         s.t. d|n

     We’ll give two proofs, which illustrate different features of this situation:

    P ROOF 1. Fix n ∈ N.
    Let’s define some subsets of {1, . . . , n}, dependent upon a choice of a positive divisor
d | n, as follows

                       Sd = {k ∈ N | 1 ≤ k ≤ n and gcd(k, n) = d} .

 These sets are disjoint since for each k ∈ N such that 1 ≤ k ≤ n, d = gcd(k, n) has a
specific value and k is only in that Sd .
     For k ∈ Sd , gcd(k, n) = d or, by Theorem 1.5.5, gcd(k/d, n/d) = 1. Thus #(Sd ) is
the number of elements ℓ ∈ N in the range 1 ≤ ℓ ≤ n/d which are relatively prime to n/d,
i.e., #(Sd ) = φ(n/d).
     Every k ∈ Z in the range 1 ≤ k ≤ n is in exactly one of these Sd , for d a positive
divisor of n. Therefore
                                                     [
                                    {1, . . . , n} =   Sd
                                                            d∈N
                                                          s.t. d|n
so                                                   
                                      [       X            X
          n = # ({1, . . . , n}) = # 
                                        S d
                                             =
                                                 # (S d ) =   φ(n/d) .
                                         d∈N                     d∈N        d∈N
                                       s.t. d|n                s.t. d|n   s.t. d|n

But as d runs over the positive divisors of n, so does n/d; in other words

         {d | d ∈ N, 1 ≤ d ≤ n and d | n} = {n/d | d ∈ N, 1 ≤ d ≤ n and d | n}

     We can therefore rewrite the last big equation as
                                    X                 X
                              n=          φ(n/d) =      φ(d)
                                      d∈N                        d∈N
                                    s.t. d|n                   s.t. d|n

which is what we were trying to prove.                                                     

     P ROOF 2. This approach will concentrate on the function
                                            X
                                   F (n) =       φ(d) ,
                                                      d∈N
                                                    s.t. d|n

which has some very nice properties.
        5.2. A NECESSARY DIGRESSION: GAUSS’S THEOREM ON SUMS OF EULER’S FUNCTION                 95

     For example, suppose p is a prime and k ∈ N. Then the divisors of pk are 1, p, p2, . . . , pk ,
so
F (pk ) = φ(1)+φ(p)+φ(p2)+· · ·+φ(pk ) = 1+(p−1)+(p2 −p)+· · ·+(pk −pk−1 ) = pk .
   Furthermore, if p and q are distinct primes, then the divisors of pq are 1, p, q, and pq.
Also, gcd(p, q) = 1, so by Theorem 2.5.3
                           F (pq) = φ(1) + φ(p) + φ(q) + φ(pq)
                                   = φ(1) + φ(p) + φ(q) + φ(p)φ(q)
                                   = (1 + φ(p))(1 + φ(q))
                                   = F (p)F (q) ,
From this fact, one can prove (it’s an exercise below) that F (ab) = F (a)F (b) whenever
a, b ∈ N are relatively prime. This means that F has the same sort of multiplicative property
for relatively prime factors as does φ.
    We are ready to finish the proof. So let n ∈ N be any number such that n > 1 (for
n = 1, the theorem is trivial). Suppose the prime decomposition of n is given by
                                        n = pk11 · · · · · pkr r ,
                                                                            k
where the primes {p1 , . . . , pr } are distinct. Therefore gcd(pki i , pj j ) = 1 if i 6= j and so
we can use the multiplicative property of F and our calculation of F for prime powers to
conclude
                                  F (n) = F (pk11 · · · · · pkr r )
                                         = F (pk11 ) · · · · · F (pkr r )
                                         = pk11 · · · · · pkr r
                                         =n,
which is what we wanted to prove.                                                                
96                         5. INDICES = DISCRETE LOGARITHMS

     Exercises for §5.2.

    E XERCISE 5.2.1. Finish the second proof of Gauss’s Theorem 5.2.1 by proving that
F (ab) = F (a)F (b) for all a, b ∈ N which are relatively prime.
                                     5.3. PRIMITIVE ROOTS                                    97

                                    5.3. Primitive Roots
   Now back to those elements which generate large cyclic subgroups – which have a
name:
   D EFINITION 5.3.1. Given n ∈ N such that n ≥ 2, an element a ∈ (Z/nZ)∗ is called a
primitive root mod n if ordn (a) = φ(n). We shall also call an integer x ∈ Z a primitive
root mod n if [x]n is a primitive root in the sense just defined.
    E XAMPLE 5.3.2. From the two tables in the introduction to this chapter we can read
off the following primitive roots mod their respective n’s:

                        n    Primitive Roots mod n     φ(n) φ(φ(n))
                        2    1                          1      1
                        3    2                          2      1
                        4    3                          2      1
                        5    2, 3                       4      2
                        6    5                          2      1
                        7    3, 5                       6      2
                        8    none                       4      2
                        ..
                         .
                       16 none                          8         4
                       17 3, 5, 6, 7, 10, 11, 12, 14    16        8

                   TABLE 5.3.1. Primitive roots for n = 2, . . . , 8, 16, 17


We have included the column of φ(n) since that is the order that each primitive root must
have. And then we added the column of φ(φ(n)) as well, since by some strange magic it
appears frequently to compute the number of primitive roots.
   Let us state formally, and prove, the general result noticed in this example:
    T HEOREM 5.3.3. Given n ∈ N satisfying n ≥ 2, if n has any primitive roots than it
has exactly φ(φ(n)) primitive roots.
   P ROOF. Let a ∈ (Z/nZ)∗ be a primitive root. That means that
                            
                      hai = a, a2 , . . . aordn (a) = (Z/nZ)∗
since ordn (a) = φ(n) = # ((Z/nZ)∗ ). Any other primitive root b, being an element of
(Z/nZ)∗ must be one of these powers of a. In other words, ∃k ∈ N such that b = ak .
    In order for this b to be a primitive root, its order must be φ(n). But from Theorem 5.1.4,
we know that
                                          ord(a)              φ(n)
                           ord(b) =                    =                .
                                     gcd(ord(a), k)       gcd(φ(n), k)
98                           5. INDICES = DISCRETE LOGARITHMS

 Therefore b = ak will be a primitive root if and only if gcd(φ(n), k) = 1. From the
definition of Euler’s φ function, this happens for exactly φ(φ(n)) values of k.   
    Let’s see if we can prove that in some circumstances, we will have the first primitive
root that the above theorem requires. The easiest situation in which that might happen will
probably be when the modulus is prime, as we have the strongest tools to work with in that
case.
    Finding elements of a particular order d amounts (in part) to finding solutions to the
equation xd − 1 ≡ 0 (mod p). The first step toward this is the following more general
theorem about polynomials in mod p due to Lagrange:

     T HEOREM 5.3.4. For n ∈ N and a0 , . . . , an ∈ Z, the polynomial
                          an xn + · · · + a1 x + a0 ≡ 0 (mod p) ,
where an 6≡ 0 (mod p), has at most n solutions in Z/pZ if p is prime.

    P ROOF. We use induction on the degree n. The base case n = 1 amounts to solving
the linear congruence a1 x + a0 ≡ 0 (mod p) where a1 6≡ 0 (mod p). Since p is prime,
this means that gcd(p, a1 ) = 1, and therefore the linear congruence has a unique solution
mod p by the version of Theorem 2.2.4 as stated in Remark 2.2.5.
    Now for the inductive step, assuming that the theorem is true for some n ∈ N, we shall
prove it for the case n + 1. So let a0 , . . . , an+1 ∈ Z be such that an+1 6≡ 0 (mod p). If
                        a(x) = an+1 xn+1 + an xn + · · · + a1 x + a0
has no zeros in mod p, then the theorem is certainly true for this polynomial of degree n+1:
the number of solutions to a(x) ≡ 0 (mod p) is 0 < n + 1.
    If instead a(x) has at least one zero z1 in mod p, do long division of polynomials to get
                                a(x) = b(x)(x − z1 ) + r(x)
where b(x) and r(x) are polynomials with integral coefficients and such that
                            0 ≤ deg(r(x)) < deg(x − z1 ) = 1 .
It therefore follows that r(x) is a constant polynomial, say with value r1 ∈ Z.
     Plug z1 into the above formula resulting from division of polynomials to get
                     0 ≡ a(z1 ) ≡ b(z1 )(z1 − z1 ) + r1 ≡ r1   (mod p) .
Hence r1 ≡ 0 (mod p), which means that our polynomial a(x) ≡ b(x)(x − z1 ) (mod p).
But an+1 equals (one times) the leading coefficient of b(x), so that leading coefficient must
not be congruent to 0 mod p.
    Hence the inductive hypothesis applies to b(x) and tells us that it has no more than n
roots in mod p. These roots of b(x), plus the single root of (x − z1 ), total no more than
n + 1 roots for a(z) = b(x)(x − z1 ).
                                    5.3. PRIMITIVE ROOTS                                    99

    All we need to do to finish is to see that any root of a(x) must in fact be a root of b(a)
or (x − z1 ). So suppose r ∈ Z is a root, so
                            a(r) = b(r)(r − z1 ) ≡ 0    (mod p),
which means in turn that p | b(r)(r − z1 ). Then by Proposition 3.1.5 we must have p | b(r)
or p | (r − z1 ). In other words, b(r) ≡ 0 (mod p) or (r − z1 ) ≡ 0 (mod p), as we hoped.
[This amounts to using what is called in basic algebra the “zero product property”, which
is true in Z/pZ for p prime by Proposition 3.1.5; in abstract algebra, we say that Z/pZ is a
domain if p is prime.]
     Thus the roots of a(x) all come from roots of b(x) or (x − z1 ), and so number at most
n + 1, which means we have proven the inductive step.                                     

    So much for a maximum number of roots. In the following particular case, we can get
the exact number of roots:

   T HEOREM 5.3.5. If p is prime and d | p − 1, then there are exactly d solutions, up to
congruence mod p, of the congruence
                                    xd = 1 (mod p) .

    P ROOF. Let p and d be as in the statement, and define m = (p − 1)/d ∈ N. We use a
clever factorization, defining
                       a(x) = xd − 1,
                       b(x) = xdm−d + xdm−2d + · · · + xd + 1, and
                       c(x) = xp−1 − 1
so that c(x) = a(x)b(x).
    Notice that by Fermat’s Little Theorem, c(x) has exactly p − 1 roots which are distinct
in mod p, being {1, . . . , p − 1}. Also, by the previous Theorem 5.3.4, b(x) has at most
deg(b(x)) = dm − d = p − 1 − d roots. Since, as in the proof of that Theorem 5.3.4 (the
part where we mentioned the “zero product property”), roots of c(x) correspond exactly to
the roots of a(x) and those of b(x), there must be at least d roots of a(x) to add to these no
more than p − 1 − d roots of b(x), making the exactly p − 1 roots of p − 1.                 

   We are now in a position to quantify exactly the congruence classes in Z/pZ, for p a
prime, of particular orders:

   T HEOREM 5.3.6. If p is prime and d | p − 1, then there are exactly φ(d) distinct
congruence classes of order d in Z/pZ.

   P ROOF. Let p and d be as in the statement and write
                 ψ(k) = # ({ℓ ∈ Z | 1 ≤ ℓ ≤ p − 1 and ordp (ℓ) = k}) .
100                           5. INDICES = DISCRETE LOGARITHMS

By our version of Lagrange’s Theorem 3.3.3, any ℓ ∈ Z such that 1 ≤ ℓ ≤ p − 1 has an
order which divides φ(p) = p − 1, so
                                   X
                                      ψ(k) = p − 1 .
                                      k∈N
                                   s.t. k|p−1

In addition, by Gauss’s Theorem 5.2.1,
                                  X
                                                φ(k) = p − 1 .
                                      k∈N
                                   s.t. k|p−1

      Therefore
                                 X                     X
                                            φ(k) =                ψ(k) .
                                  k∈N                   k∈N
                               s.t. k|p−1            s.t. k|p−1

Our goal now is to show that ∀k ∈ N such that k | p − 1, we have ψ(k) ≤ ψ(k). If we can
do this, then in fact for all such k, we would have to have ψ(k) = ψ(k) because if for one
k we had ψ(k) < ψ(k) then there would be no way for another k to give ψ(k) > ψ(k) so
      P             P
that ψ(k) = φ(k) could still hold.
     So let k ∈ N satisfy k | p − 1. If ψ(k) = 0, meaning that there are no elements of order
k, then certainly ψ(k) ≤ φ(k) as φ(k) is a non-negative function.
     Suppose ψ(k) > 0, meaning there exists at least one element, call it a, of order k in
Z/pZ. Notice that for any n ∈ N, (an )k = (ak )n ≡ 1n ≡ 1 (mod p), which means that
a, . . . , ak are distinct elements of Z/pZ which all satisfy
                                   xk − 1 ≡ 1 (mod p) .
By Theorem 5.3.5, there are exactly k solutions of this congruence equation. These so-
lutions include all of the elements of Z/pZ of order k, and maybe other elements, whose
order is a divisor of k, in which we are not so interested.
    But in fact, by Theorem 5.1.4, we know exactly the orders of these elements a, . . . , ak :
the order of aℓ is
                                   ordp (a)             k
                                                 =            .
                                gcd(ordp (a), ℓ)    gcd(k, ℓ)

   Thus the elements of Z/pZ of order k are those elements aℓ for ℓ ∈ Z satisfying 1 ≤
ℓ ≤ k for which gcd(k, ℓ) = 1. There are therefore φ(k); in other words, ψ(k) = φ(k). 
      So, the punch line:

      C OROLLARY 5.3.7. If p is prime, there are φ(p − 1) primitive roots in Z/pZ.

      P ROOF. Use k = p − 1 in the previous theorem.                                        
    Just to finish the thread of conjectures with which we started this chapter and section,
let us prove the
                                         5.3. PRIMITIVE ROOTS                                       101

    T HEOREM 5.3.8. Let k ∈ N satisfy n ≥ 3. Then n = 2k has no primitive roots.

    P ROOF. We shall prove that for all odd numbers a,
                                          k−2
                                        a2        ≡ 1 (mod 2k )
by induction on k.
    Start with k = 3 and just look at all the congruence classes in Z/8Z with odd represen-
tatives, and the squares of these classes:
[1]28 = [1]8 ,   [3]28 = [9]8 = [1]8 ,            [5]28 = [25]8 = [1]8 ,   and   [7]28 = [49]8 = [1]8 .
So the base case k = 3 is established.
    Now assume the statement holds for the value k, meaning that for all odd a,
                                         k−2
                                    a2            ≡ 1 (mod 2k ) .
In other words, ∃ℓ ∈ Z such that
                                               k−2
                                             a2      = 2k ℓ + 1 .
Squaring both sides of this, we get
       k−1
              k−2 2               2                                       
    a2 = a2             = 2k ℓ + 1 = 22k ℓ2 + 2 · 2k ℓ + 1 = 2k+1 2k−1 ℓ2 + ℓ + 1,
meaning that
                              (k+1)−2             k−1
                           a2            = a2           ≡1   (mod 2(k+1) ) .
    The inductive proof of the statement with which we began this proof is done. But
notice that this means that for all odd numbers a, or, otherwise, for all a ∈ (Z/2k Z)∗ ,
ord2k (a) ≤ 2k−2 < 2k−1 . Thus n = 2k has no primitive roots, which would all have to be
odd numbers of order φ(2k ) = 2k−1.                                                   
    As we thought we noticed, based on the few examples we had calculated, large powers
of two do not have primitive roots.
102                            5. INDICES = DISCRETE LOGARITHMS

      Exercises for §5.3.

      E XERCISE 5.3.1. Express each of the primitive roots of 17 as a power of one of them.

   E XERCISE 5.3.2. Find all of the primitive roots for the primes 11 and 13 and express
them each as a power of one of them.

      E XERCISE 5.3.3. Find all of the elements of Z/13Z which have each possible order.

   E XERCISE 5.3.4. By expressing everything as powers of single primitive root, use
Corollary 5.3.7 to prove one direction of Wilson’s Theorem

    E XERCISE 5.3.5. If r is a primitive root of the odd prime p, prove that r (p−1)/2 ≡ −1
(mod p). Also, prove that if s is any other primitive root of p, then rs cannot be a primitive
root.

      E XERCISE 5.3.6. Prove that the inverse of a primitive root is always a primitive root.

    E XERCISE 5.3.7. If p is a prime congruent to 1 mod 4, prove that for all primitive
roots r of p, −r is also a primitive root. If instead p is a prime congruent to 3 mod 4,
prove that ordp (−r) = (p − 1)/2.
                                       5.4. INDICES                                    103

                                      5.4. Indices
   So for some values of n ∈ N, there exists a primitive root a ∈ (Z/nZ)∗ , in which case
we have seen that
                             
                       hai = a, a2 , . . . aordn (a) = (Z/nZ)∗ .

That means that any b ∈ (Z/nZ)∗ is some power of a. “But which power?” you cry, so we
make the

    D EFINITION 5.4.1. If n ∈ N has a primitive root a, then for any b ∈ Z such that
gcd(b, n) = 1 the smallest k ∈ N such that b ≡ ak (mod n) is called the index of b
relative to a and is denoted inda (b). We shall also sometimes talk about inda (x) where
x ∈ (Z/nZ)∗ , meaning the index relative to a of any representative b of the congruence
class x.

   E XAMPLE 5.4.2. Working from the tables above (5.0.1, 5.0.2, and 5.3.1), it is easy to
compute a number of examples. First, for the smaller values of the moduli, where there are
few primitive roots:


                     n a b inda (b)
                                                      n a b inda (b)
                     2 1 1    1
                                                      7 3 1    6
                     3 2 1    2
                                                          2    2
                         2    1
                                                          3    1
                     4 3 1    2
                                                          4    4
                         3    1
                                                          5    5
                     5 2 1    4
                                                          6    3
                         2    1
                                                        5 1    6
                         3    3
                                                          2    4
                         4    2
                                                          3    5
                     5 3 1    4
                                                          4    2
                         2    3
                                                          5    1
                         3    1
                                                          6    3
                         4    2

                TABLE 5.4.1. Index values for moduli n = 2, 3, 4, 5, 7



   For the two larger moduli we worked out above, only n = 17 has primitive roots.
Since (Z/17Z)∗ has 16 elements and 8 primitive roots, we make a larger table for just this
modulus:
104                            5. INDICES = DISCRETE LOGARITHMS

                                                          b
      inda (b)
                 1    2 3 4 5 6 7 8 9 10 11                                12   13   14   15 16
            3    16   14 1 12 5 15 11 10 2 3 7                             13   4    9    6 8
            5    16   6 13 12 1 3 15 2 10 7 11                             9    4    5    14 8
            6    16   2 15 4 11 1 5 6 14 13 9                              3    12   7    10 8
            7    16   10 3 4 15 13 1 14 6 9 5                              7    12   11   2 8
      a
            10   16   10 11 4 7 5 9 14 6 1 13                              15   12   3    2 8
            11   16   2 7 4 3 9 13 6 14 5 1                                11   12   15   10 8
            12   16   6 5 12 9 11 7 2 10 15 3                              1    4    13   14 8
            14   16   14 9 12 13 7 3 10 2 11 16                            5    4    1    6 8

                        TABLE 5.4.2. Index values for modulus n = 17

    Some things are natural to conjecture, given these examples and the simple definition
of index; many of them are very easy to prove (and are exercises):
      T HEOREM 5.4.3. Let n ∈ N have a primitive root a. Then
       (1) inda (1) = ordn (a) = φ(n); equivalently, inda (1) ≡ 0 (mod φ(n))
       (2) inda (a) = 1
       (3) ∀b, c ∈ Z such that gcd(b, n) = gcd(c, n) = 1, we have
                          inda (bc) ≡ inda (b) + inda (c) (mod φ(n))
       (4) ∀b ∈ Z such that gcd(b, n) = 1 and ∀k ∈ N, we have
                              inda (bk ) ≡ k · inda (b)       (mod φ(n))
     These properties of inda are strikingly similar to the basic properties of a logarithm
with base a. For this reason, indices are called discrete logarithms in the computer science
literature. For cryptological applications, it is important to note that exponentiation in some
(Z/nZ)∗ is an excellent candidate one-way function:
          • Given n, a, and k, fast modular exponentiation is a feasible computation of b = ak
            in mod n.
          • Given n, a, and b, no known feasible algorithm finds a k = inda (b) such that
            ak ≡ b (mod n).
We will discuss some discrete logarithm-based cryptological protocols in the next sections.
    Before turning to cryptology, we explore some pure mathematical applications of in-
dices. Like the applications of logarithms in basic algebra, the usefulness of indices comes
from the above Theorem 5.4.3 enabling convenient algebraic manipulations of powers and
multiplications. Here’s an
      E XAMPLE 5.4.4. We use indices to solve the congruence
                                       3x5 ≡ 4 (mod 7)
                                        5.4. INDICES                                     105

by first taking ind5 of both sides
                         ind5 (3) + 5 ind5 (x) ≡ ind5 (4) (mod 6)
and applying all the rules in Theorem 5.4.3. Looking in Table 5.4.1, we translate that into
                               5 + 5 · ind5 (x) ≡ 2 (mod 6)
or, solving:
                         ind5 (x) ≡ 5−1 · 3 ≡ 5 · 3 ≡ 3   (mod 6)
which means, searching through the table again, that x = 6.
   Just to check, notice that 3 · 65 = 23328 ≡ 4 (mod 7).
   What if we solve the same equation, but use a different primitive root in our indices?
Compute
                        ind3 (3) + 5 ind3 (x) ≡ ind3 (4) (mod 6)
so
                               1 + 5 · ind5 (x) ≡ 4 (mod 6)
and this yields the same equation
                             ind5 (x) ≡ 5−1 · 3 ≡ 3    (mod 6)
and thus the same solution x = 6.

Caution: A common mistake with indices is to use the same modulus with the indices as
with the original congruence. Instead, when the congruence is in mod n, for n ∈ N, the
index congruence is in mod φ(n)!
106                          5. INDICES = DISCRETE LOGARITHMS

      Exercises for §5.4.

      E XERCISE 5.4.1. In the modulus n = 17, use indices to solve the congruences
                     (a) 4x ≡ 11      (b) 5x6 ≡ 7     (c) x12 ≡ 13
                     (d) 8x5 ≡ 10      (e) 9x8 ≡ 8    (f) 7x ≡ 7

   E XERCISE 5.4.2. The logarithm rules in Theorem 5.4.3 are very similar to rules for the
usual logarithm, except one is missing: the change of base formula. Figure out what that
should be in the context of indices, make a formal statement, and prove it.

      E XERCISE 5.4.3. Let p be an odd prime and a a primitive root mod p.
       (a) Prove that inda (−1) = (p − 1)/2
       (b) If x, y ∈ (Z/pZ)∗ satisfy xy ≡ 1 (mod p), then what is the relationship between
           inda (x) and inda (y)? Prove it!
       (c) If x, y ∈ (Z/pZ)∗ satisfy x + y ≡ 0 (mod p), then what is the relationship
           between inda (x) and inda (y)? Prove it!
                                5.5. DIFFIE-HELLMAN KEY EXCHANGE                                    107

                              5.5. Diffie-Hellman Key Exchange
   About a year before the RSA cryptosystem was invented, Whitfield Diffie and Martin
Hellman published New directions in cryptography [DH76], the first full description of a
working public key cryptosystem in the open scientific literature.1 They defined in this
paper something which has since been named after them:

   D EFINITION 5.5.1. The following protocol is called Diffie-Hellman key exchange
[DHKE]:
     (1) Alice and Bob agree upon a large prime p and a primitive root r ∈ (Z/pZ)∗ and
         publish both.
     (2) Alice chooses an α ∈ Z satisfying 1 ≤ α ≤ p − 1, computes A = r α (mod p),
         keeps α secret, but publishes A.
     (3) Bob chooses a β ∈ Z satisfying 1 ≤ β ≤ p − 1, computes B = r β (mod p),
         keeps β secret, and publishes B.
     (4) Alice gets the public value B and computes S = B α .
     (5) Bob gets A and computes the same value S = Aβ .
     (6) Both Alice and Bob use the shared secret S for future communications encrypted
         with some symmetric cryptosystem upon which they had previously agreed.
The value S which both Alice and Bob have [but Eve does not] is called their shared key
or shared secret.

    Here is a graphical representation:
             Diffie-Hellman key exchange:
                    Alice           on public network                         Bob
                                      pick a prime p
                                  find a primitive root r
              pick α, compute                                          pick β, compute
             A = r α (mod p)                                           B = r β (mod p)
                 publish A          ֌A           B֋                       publish B
              get B, compute                                            get A, compute
             S = B α (mod p)                                           S = Aβ (mod p)
                                        assymteric crypto with key S
                               ⇐=======================⇒

    P ROPOSITION 5.5.2. When Alice and Bob follow the DHKE protocol, they both com-
pute the same shared key; i.e., DHKE works.

    1Although in fact, some workable public key crypto had been invented earlier within the US/UK in-

telligence community, and not shared with the public. Since quite early in the Cold War, there have been
mathematical theorems and proofs held in secret by large governments.
108                           5. INDICES = DISCRETE LOGARITHMS

    P ROOF. There’s very little to check here: using the notation of the definition and, in
the very middle, the commutativity of multiplication, we have
                                  α
                       B α = r β = r β·α = r α·β = (r α )β = Aβ .
The left end of this equality is the S that Alice computes, while the right end is the one that
Bob computes, and they are the same.                                                         
   E XAMPLE 5.5.3. Alice and Bob agree in open, public discussion to use the prime
p = 617 and its primitive root r = 17.
   Alice privately chooses her secret value α = 19 and sends the value
                                A = 1719 ≡ 385 (mod 617)
to Bob by insecure e-mail. (All unencrypted e-mail is insecure, of course.)
    Bob chooses his secret value β = 13 and sends the value
                                B = 1713 ≡ 227 (mod 617)
to Alice by insecure e-mail.
    Alice computes the shared secret
                            S = B α = 22719 ≡ 127 (mod 617) .
      Bob computes the same shared secret instead by
                            S = Aβ = 38513 ≡ 127 (mod 617) .
      They then can use this value to do symmetric crypto for the rest of this communication.
    What about the practicality of DHKE? As was discussed in §4.4, finding a the [large]
prime p (we continue using the notation in Definition 5.5.1) is computationally feasible, as
are the several modular exponentiations in the DHKE protocol. There remains the question
of finding the primitive root r.
    One way is to avoid searching for r more than once. After one prime p and an associated
primitive root r for p are found, each user can choose their secret (Alice’s α and Bob’s β),
without overlap or conflict. This is the approach suggested in some Internet standards, see,
e.g., [HC] and [LK08].
    Another, more mathematical, strategy is based on the following definition, which we
give because of its independent mathematical interest and the amazing life of its namesake.
   D EFINITION 5.5.4. A prime p with the property that 2p + 1 is also a prime is called a
Sophie Germain prime.
    It is not known how many Sophie Germain primes there are, although it is conjectured
that there are an infinite number. In fact, there are precise conjectures on the asymptotic
density of such primes, as well as algorithmic techniques to generate them efficiently; see
[Sho09].
                                 5.5. DIFFIE-HELLMAN KEY EXCHANGE                                      109

    E XAMPLE 5.5.5. The first seventeen Sophie Germain primes are
         2, 3, 5, 11, 23, 29, 41, 53, 83, 89, 113, 131, 173, 179, 191, 233, 239
The sequence of all Sophie Germain primes is sequence A005384 at the On-Line Encyclo-
pedia of Integer Sequences, oeis.org.
   The largest known Sophie Germain prime, at the time of this writing, is
                                   18543637900515 × 2666667 − 1
discovered in 2012 by Philipp Bliedung and a large distributed network of computers.

    The reader is asked to investigate the usefulness of Sophie Germain primes in DHKE
is explored in the exercises, below.
     As mentioned in the last section 5.4, DHKE depends for its security on modular expo-
nentiation be a one-way function: feasible to compute forwards (by fast modular exponen-
tiation) but infeasible to computer backwards (which is an index).
     If discrete logs could be computed by a feasible algorithm, then Eve could completely
break the security of DHKE. Starting with the public values of A and B, she would compute
α = indr (A) and β = indr (B). She could then compute S as either B α , Aβ , or, directly,
as r α·β . Having S, she could decrypt all of Alice and Bob’s communications as they go by.
     Actually, it is not necessary for Eve to be able to compute discrete logs, as long as she
can solve a specific computation problem.

   D EFINITION 5.5.6. The Diffie-Hellman problem [DHP] consists of the following
question: Given
        • a prime p,
        • a primitive root r ∈ (Z/pZ)∗ , and
        • two elements a, b ∈ (Z/pZ)∗ which are known to be of the form a = r x and
          b = r y for x, y ∈ N, although the x and y are not known
compute
        • r xy .

    An efficient way to compute discrete logs would of course result in a solution of the
Diffie-Hellman problem, but it is not known if there might be a solution of the DHP which
is does not consist of a full-blown algorithm to compute discrete logs. Since this question
is open as of this writing, it makes sense to make the most precise statement: breaking
DHKE amounts of solving the DHP, which is not feasible on a classical computer2
    DHKE is one of the most widely used cryptologic protocols on the Internet. It is part
of SSL, TLS, SSH, IPsec, and many VPNs, among others.


    2
      As is the case with factoring, there is a known algorithm for a quantum computer which solves the DHP
efficiently, see [Sho94].
110                         5. INDICES = DISCRETE LOGARITHMS

      Exercises for §5.5.

    E XERCISE 5.5.1. Suppose you had an efficient algorithm to generate large Sophie Ger-
main primes. Describe how you would use this to do the initial choice of public parameters
for DHKE – go through all of the set-up stages of this protocol, and explain how each can
be done feasibly, based on either past discussions of calculations we can do feasibly or on
new ideas you develop here.

    E XERCISE 5.5.2. The number p = 11717 is prime, and r = 103 is a primitive root of
this p. Playing the role of Alice, your instructor computed A = 5123.
    Please play the role of Bob and do what is necessary to establish a shared key with your
instructor: you compute S as in DHKE, and send your B by e-mail so that your instructor
can also compute S. Then await further instructions by return e-mail, encrypted with the
shared secret.
    You may need to perform calculations that are hard to do on a hand calculator. If you
have access to, and are familiar with, some computer system like Octave, Matlab, or
Mathematica, you should be able to do the calculations that way. Otherwise, you might
try using your favorite search engine on the phrase “fast modular exponentiation applet”,
or using wolframalpha.com.

    E XERCISE 5.5.3. Spell out in precise, mathematical detail all of the steps which would
be used in a man-in-the-middle attack against DHKE.
                              5.6. THE ELGAMAL CRYPTOSYSTEM                                111

                            5.6. The ElGamal Cryptosystem
    As mentioned in the previous section, exponentiation in mod p, where p is a prime
known to the public, is a good candidate one-way function: It is fast (feasible) in the
forward direction; its inverse, being discrete log with respect to a primitive root r mod p is
thought to be infeasible – even when that root is known to the public. This was behind the
security of DHKE, and now we discuss how to use this one-way function to set up a more
usual public-key cryptosystem: the ElGamal Cryptosystem.
    The RSA public-key cryptosystem did its actual encryption by exponentiation in mod
n = pq. The decryption then was by exponentiation with the multiplicative inverse in mod
φ(n) of the encryption exponent. The owner of the private key – (p, q) – knows also the
decryption exponent, entirely because φ(n) can be computed.
    Similarly, ElGamal uses an elementary arithmetic operation – multiplication of a nu-
merical form of the message by a random number in some mod – to do the scrambling
needed for encryption. Enough information is also passed along in the ciphertext so that
the intended recipient, who knows the value of a certain discrete log, can cancel out this
scrambling multiplication. Here are the details:

    D EFINITION 5.6.1. To start, Alice picks a large prime p, a primitive root r mod p, and
a secret value α ∈ N satisfying 2 ≤ α ≤ p − 1. She computes the value a = r α and then
posts her ElGamal public [encryption] key (p, r, a) on her website.
    Alice’s ElGamal private [decryption] key is (p, r, α). The association of decryption
to encryption keys is by E : (p, r, α) 7→ (p, r, r α ).
    The message space is M = {m ∈ Z | 2 ≤ m ≤ p − 1}, which we will assume can be
interpreted as meaningful messages encoded numerically by some widely known scheme.
    Say Bob wishes to send Alice the cleartext m ∈ M. For each new such message m, he
generates a random number β ∈ N such that 2 ≤ β ≤ p − 2 and builds the ciphertext for
ElGamal encryption as the two pieces

                    c = e(p,r,a) (m) = (r β (mod p), m · aβ (mod p)) .

   When Alice gets the ciphertext c = (c1 , c2 ), she can recover the cleartext by ElGamal
decryption
                               d(p,r,α)(c1 , c2 ) = c2 · cp−1−α
                                                          1     .

   All of the above parts together form the ElGamal cryptosystem.

   We first need to know this is correct, in the sense that

   P ROPOSITION 5.6.2. With the notation as above in Definition 5.6.1 we have

                           d(p,r,α) (e(p,r,a) (m)) = m   ∀m ∈ M .
112                            5. INDICES = DISCRETE LOGARITHMS

      P ROOF. Just compute:

                 d(p,r,α) (e(p,r,a) (m)) ≡ m · aβ (mod p) · (r β )p−1−α   (mod p)
                                      ≡ m · (r α )β · (r β )p−1−α   (mod p)
                                      ≡ m · r αβ+β(p−1)−αβ      (mod p)
                                      ≡ m · (r p−1)β    (mod p)
                                      ≡ m · 1β     (mod p)
                                      ≡m      (mod p)

Note that the power p − 1 − α as the exponent of the c1 term in decryption is to make (c−1
                                                                                        1 )
                                                                                            α

without using negative powers, by applying Theorem 5.1.2.                                  

      Graphically:
          ElGamal cryptosystem:
                    Alice          on public network                       Bob
               pick a prime p
           find a primitive root r
        choose α | 2 ≤ α ≤ p − 1
         compute a = r α (mod p)
             publish public key      ֌ (p, r, a)                  download public key
                                                                 given message m ∈ M
                                                                   (so 2 ≤ m ≤ p − 1)
                                                               choose β | 2 ≤ β ≤ p − 2
                                                                compute c1 = r β (mod p)
                                                                and c2 = m · aβ (mod p)
             receive ciphertext               (c1 , c2 ) ֋          transmit ciphertext
             compute cleartext
              m = c2 · cp−1−α
                        1


      There is a nice digital signature algorithm associated with ElGamal:

     D EFINITION 5.6.3. Suppose Alice has ElGamal private key (p, r, α) and wishes to dig-
itally sign the message m ∈ M. She first chooses a random γ ∈ N satisfying 1 < γ < p−1
and gcd(γ, p − 1) = 1.
     The digital signature on m is (r γ (mod p), γ −1 · (m − αr γ ) (mod p − 1)), where the
inverse is taken in mod p − 1.
     To verify the signature (x, y) on message m using Alice’s public key (p, r, a), Bob
checks to see if ax xy ≡ r m (mod p): if so, he accepts; if not, he rejects.

      Again, we would like to know this does the right thing:
                            5.6. THE ELGAMAL CRYPTOSYSTEM                                               113

   P ROPOSITION 5.6.4. Using the notation as above, Bob will accept all signed messages
produced by Alice.

   P ROOF. Assuming the signed message (m, x, y) was produced by Alice as above, we
compute:
                                   γ            −1 (m−αr γ )
                        ax xy ≡ ar (r γ )γ                       (mod p)
                                           γ          −1 (m−αr γ )
                             ≡ (r α )r (r γ )γ                       (mod p)
                                       γ       −1 (m−αr γ )
                             ≡ r αr +γγ                         (mod p)
                                       γ          γ
                             ≡ r αr +m−αr                (mod p)
                             ≡ rm          (mod p)
So Bob will accept.                                                                                     
   Graphically:
    ElGamal digital signatures:
                   Alice              on public network                             Bob
             find prime p and
              primitive root r
      choose α | 2 ≤ α ≤ p − 1
       compute a = r α (mod p)
            publish public key         ֌ (p, r, a)                         download public key
         given message m ∈ M
           (so 2 ≤ m ≤ p − 1)
   pick random γ s.t. 1 < γ < p − 1
         and gcd(γ, p − 1) = 1
     compute s1 = r γ (mod p) and
  s2 = γ −1 · (m − αr γ ) (mod p − 1)
        transmit signed message         ֌ (m, s1 , s2 )                    receive signed message
                                                                          if as1 · ss12 ≡ r m (mod p)
                                                                               ACCEPT
                                                                           otherwise,
                                                                                REJECT
114                           5. INDICES = DISCRETE LOGARITHMS

      Exercises for §5.6.

    E XERCISE 5.6.1. You instructor still likes the prime p = 11717 with primitive root
r = 103 from an earlier exercise 5.5.2 on DHKE. In addition, your instructor has calculated
the value a = 1020 to complete an ElGamal public key (p, r, a) = (11717, 103, 1020).
    Using this public key, you want to send a message to your instructor, which should
consist of the number 42 (it is, after all, the answer to “life, the universe, and everything”).
What ciphertext will you send? Show your work!

   E XERCISE 5.6.2. Now your instructor wants to send you your grade on a recent test
by e-mail, and to prove that this e-mail does in fact originate with your instructor, the
email contains both the score value of 97 and the addendum “This score value signed with
an ElGamal Digital signature using my public key [the same instructor’s public key as
above in exercise 5.6.1, being (p, r, a) = (11717, 103, 1020)]; the signature has the value
(6220, 10407).”
   Do you accept this as truly coming from your instructor? Show your work!

    E XERCISE 5.6.3. Create an ElGamal public key and e-mail it to your instructor. Wait
for a reply message which is ElGamal encrypted, then mail the cleartext back to your
instructor.
    Also, use your public key to sign the number (=message) 17. Send the signed number
to your instructor and wait to hear if the signature is accepted or not.
                                          Bibliography

[AB09]   Sanjeev Arora and Boaz Barak, Computational Complexity: A Modern Approach, Cambridge
         University Press, 2009.
[AKS04] Manindra Agrawal, Neeraj Kayal, and Nitin Saxena, Primes is in p, Annals of mathematics
         (2004), 781–793.
[Bou04]  Nicolas Bourbaki, Theory of sets, Springer, 2004.
[DH76]   Whitfield Diffie and Martin E Hellman, New directions in cryptography, Information Theory,
         IEEE Transactions on 22 (1976), no. 6, 644–654.
[FS03]   Niels Ferguson and Bruce Schneier, Practical cryptography, vol. 23, Wiley New York, 2003.
[Gau86]  Carl Friedrich Gauß, Disquisitiones Arithmeticae, 1801. English translation by Arthur A. Clarke,
         1986.
[Har05]  Godfrey Harold Hardy, A Mathematician’s Apology, 2005, First electronic edition, available at
         http://www.math.ualberta.ca/mss.
[HC]     Dan Harkins and Dave Carrel, RFC 2409: The Internet Key Exchange (IKE), November 1998,
         Status: Proposed Standard.
[HW79]   Godfrey Harold Hardy and Edward Maitland Wright, An introduction to the theory of numbers,
         Oxford University Press, 1979.
[LK08]   M Lepinski and S Kent, RFC 5114-Additional Diffie-Hellman Groups for Use with IETF Stan-
         dards, 2008.
[Lub96]  Michael George Luby, Pseudorandomness and Cryptographic Applications, Princeton Univer-
         sity Press, 1996.
[MVOV96] Alfred J Menezes, Paul C Van Oorschot, and Scott A Vanstone, Handbook of applied cryptog-
         raphy, CRC press, 1996.
[NC10]   Michael A Nielsen and Isaac L Chuang, Quantum computation and quantum information, Cam-
         bridge university press, 2010.
[PS04]   Jonathan A Poritz and Morton Swimmer, Hash woes, Virus Bulletin (2004), 14–16.
[RSA78]  Ronald L Rivest, Adi Shamir, and Len Adleman, A method for obtaining digital signatures and
         public-key cryptosystems, Communications of the ACM 21 (1978), no. 2, 120–126.
[Sha48]  C. E. Shannon, A mathematical theory of communication, Bell Systems Technical Journal 27
         (1948), 379–423, 623–656.
[Sha49]          , Communication theory of secrecy systems, Bell System Technical Journal 28 (1949),
         no. 4, 656–715.
[Sho94]  Peter W Shor, Algorithms for quantum computation: discrete logarithms and factoring, Foun-
         dations of Computer Science, 1994 Proceedings., 35th Annual Symposium on, IEEE, 1994,
         pp. 124–134.
[Sho09]  Victor Shoup, A computational introduction to number theory and algebra, Cambridge Univer-
         sity Press, 2009, on-line at http://shoup.net/ntb/.



                                                  115
116                                      BIBLIOGRAPHY

[Sta02]   Richard Stallman, Free Software, Free Society: Selected Essays of Richard M. Stallman, Lulu.
          com, 2002.
[WY05]    Xiaoyun Wang and Hongbo Yu, How to break md5 and other hash functions, Advances in
          Cryptology–EUROCRYPT 2005, Springer, 2005, pp. 19–35.
                                                  Index



additive inverse, 5                                      congruences
Adleman, Leonard, 75                                       dividing both sides, 24
al-Haytham, Ibn, 45                                        existence and number of solutions, 27, 28
al-Kindi, 65                                               number of solutions when RHS is 0, 24
ASCII, 78                                                congruent
associativity of addition and multiplication, 5            basic properties, 21
asymmetric cipher/cryptosystem/encryption, 74              definition, 21
Augustus, 60                                             coset, 48
authentication, 56                                       Creative Commons, iii
                                                         cryptanalysis, 55
base b, 10
                                                         crypto, 55
binary representation, 10
                                                         cryptographic hash function, 82
bit, 11
                                                         cryptography, 55
brute-force attack, 64
                                                         cryptology, 55
CA, 87                                                   cryptosystem, 55
Caesar cipher                                            cyclic subgroup
   cracking, 67                                            defined, 48
   definition, 60                                          used, 89, 97
certificate authority, 87
Chinese Remainder Theorem, 30, 38, 79                    Diffie, Whitfield, 107
cipher, 57                                               Diffie-Hellman key exchange, v, 107, 111
ciphertext, 57                                           Diffie-Hellman problem, 109
cleartext, 57                                            digital certificate, 87
collision resistance, 82                                 digital signature
commutativity of addition and multiplication, 5             ElGamal, 112
composite                                                   RSA, 83
   definition, 41                                        Diophantine equation, 27
   size of smallest factor, 41                           discrete log, 111
confidentiality, 56                                      discrete logarithm, 104
congruence classes                                       Disquisitiones Arithmeticae, 21
   addition, 34                                          distance between frequency distributions, 67
   addition and multiplication well-defined, 34          distributivity of multiplication over addition, 5
   definition, 33                                        divisibility
   multiplication, 34                                       definition, 6
   representatives, 34                                      of linear combinations, 6
                                                   117
118                                                  INDEX

  transitivity, 6                                            Federal Information Processing Standards, US,
  with relatively prime factors, 23                                83
Division Algorithm                                           Fermat’s Little Theorem
  statement, 7                                                  alternate statement, 49
  used, 6, 9, 25, 34, 79, 91                                    statement, 49, 51
divisor, 6                                                      used, 50, 99
domain, 99                                                   fingerprint, 82
                                                             frequency analysis, 65
ElGamal cryptosystem, v, 111                                 Fundamental Theorem of Arithmetic
encryption, 57                                                  statement, 42
equivalence classes                                             used, 95
  definition, 33
  disjoint or equal, 33                                      Gauss’s Theorem
  example: Q, 33                                               statement, 94
  example: congruence classes, 33                              used, 100
  representative, 33                                         Gauss, Carl Friedrich, 21
equivalence relation                                         GnuPG, 88
  definition, 33                                             graph [γράϕω, Greek root], 55
  reflexivity, 33                                            greatest common divisor
  symmetry, 33                                                 after them division algorithm, 17
  transitivity, 33                                             definition, 13
Euclid’s Lemma                                                 definition for more than two integers, 14
  statement, 24                                                examples, 13, 14, 16, 18, 19, 24, 28
  used, 24, 42, 52, 92                                         properties, 13, 14, 16, 19
Euclidean Algorithm                                            used, 13, 15, 24, 27, 30, 34, 35, 37, 38, 42,
  statement, 17                                                   47–49, 51, 52, 76, 77, 79, 80, 91, 92, 94, 95,
  used, 28, 77, 78, 80                                            98, 100, 103, 104, 112
Euler’s φ/totient function
                                ∗                            hacker, 55
  counts elements of (Z/nZ) , 37
                                                             Hellman, Martin, 107
  definition, 37
                                                             hex, 11
  is multiplicative for relatively prime integers,
                                                             hexadecimal, 11
     37
  used, 37, 48, 49, 51, 76–79, 89–91, 97–100
                                                             index, v
  values, 39, 77
                                                                basic properties, 104
Euler’s Theorem, v
                                                                definition, 103
  statement, 48, 51
                                                                examples, n = 17, 104
  used, 77, 89
                                                                examples, small n, 103
even integer, 6
                                                             information security, 56
exhaustive search, 64                                        information theoretically secure, 61
                                                             integrity, 56
factor of an integer, 6
fast modular exponentiation, 78, 104, 108, 109               Julius Caesar, 60
feasible computation
   definition, 77                                            Kerckhoff’s Principle, 58
   used, 77, 78, 80, 82, 104, 108–111                        key, 58
                                                  INDEX                                                 119

key distribution, 62                                      pairwise relatively prime, 30
key-signing party, 88                                        definition, 15
keyspace, 64                                              Pigeonhole Principle
kryptos [κρυπτ oς, Greek root], 55                           statement, 1
                                                             used, 47, 85
Lagrange’s Theorem, 89                                    PKI, 88
   statement, 47                                          plaintext, 57
   used, 100                                              polynomials in mod p, 98–100
least element                                             pre-image resistance, 82
   definition, 1                                          prime
least squares, 67                                            definition, 41
letter frequencies, English, 66                              dividing a product, 42, 99
linear congruence, 98                                     prime counting function, 77
   definition, 27                                         Prime Number Theorem, 77
   unique solution, 28, 98                                primitive root, v, 97, 98, 100, 107, 108, 111
   used, 98                                               Principle of Mathematical Induction, first version
logos [λóγoς, Greek root], 55                               statement, 1
                                                             used, 2
man-in-the-middle attack, 86                              Principle of Mathematical Induction, second
md5, 82                                                         version
mechanical turk, 65                                          statement, 3
messagespace, 65                                             used, 42
multiple, 6                                               private key, 74
multiplicative inverse                                    probabilistic polynomial-time Turing machine, 61
 in Z, 5                                                  pseudorandom, 62
 mod n, 28, 45                                            public key, 74
multiplicative order in mod n, 91, 97–100                 public key infrastructure, 88
 definition, 47                                           public-key cryptosystem, 74
 divides φ(n), 47
                                                          quantum computer, 75, 109
 examples, 89, 90
                                                          quotient, 7
 well-defined, 47
mutually relatively prime, 15                             relatively prime, 13
                                                             definition, 13
National Institute of Standards and Technology,              linear combination giving 1, 14
     US [NIST], 83                                        remainder
National Security Agency, US [NSA], 83                       definition, 7
non-repudiation, 56                                       Rivest, Ron, 75, 82
                                                          ROT13, 60
octal, 11                                                 RSA
Octavian (Augustus), 60                                      cryptosystem, v, 75, 107
odd integer, 6                                               exponent, 76
one-time pad, 61                                             modulus, 75
one-way function, 75, 104, 109, 111
OpenPGP, 88                                               scytale, 57
order, see also multiplicative order in mod n             second pre-image resistance, 82
120                                            INDEX

security through obscurity, 58
sexagesimal, 11
SHA-1, 83
SHA-2, 83
SHA-256, 83
Shamir, Adi, 75
signing key, 84
Sophie Germain prime
   definition, 108
   examples, 109
square error, 67
symmetric cipher/cryptosystem/encryption, 73

trusted third party, 87

Unicode, 78

verification key, 84
Vernam Cipher, 61
Vigenère cipher, 60

web of trust, 88
Well-Ordering Principle
  statement, 1
  used, 2, 7, 47
Wilson’s Theorem, 45, 102

yfesdrype, 56

zero product property, 99