Authors Jonathan A. Poritz
License CC-BY-SA-4.0
Yet Another Introductory Number Theory Textbook (Cryptology Emphasis Version) Jonathan A. Poritz after Wissam Raji Department of Mathematics and Physics Colorado State University, Pueblo 2200 Bonforte Blvd. Pueblo, CO 81001, USA E-mail: jonathan.poritz@gmail.com Web: www.poritz.net/jonathan 07 M AY 2014 11:04MDT Preface This is a first draft of a free (as in speech, not as in beer, [Sta02]) (although it is free as in beer as well) undergraduate number theory textbook. It was used for Math 319 at Colorado State University –Pueblo in the spring semester of 2014. Thanks are hereby offered to the students in that class – Megan Bissell, Tennille Candelaria, Ariana Carlyle, Michael De- graw, Daniel Fisher, Aaron Griffin, Lindsay Harder, Graham Harper, Helen Huang, Daniel Nichols, and Arika Waldrep – who offered many useful suggestions and found numerous typos. I am also grateful to the students in my Math 242 Introduction to Mathematical Pro- gramming class in that same spring semester of 2014 – Stephen Ciruli, Jamen Cox, Graham Harper, Joel Kienitz, Matthew Klamm, Christopher Martin, Corey Sullinger, James Todd, and Shelby Whalen – whose various programming projects produced code that I adapted to make some of the figures and examples in the text. The author gratefully acknowledges the work An Introductory Course in Elementary Number Theory by Wissam Raji [see www.saylor.org/books/] from which this was initially adapted. Raji’s text was released under the Creative Commons CC BY 3.0 license, see creativecommons.org/licenses/by/3.0 . This work is instead released under a CC BY-SA 4.0 license, see creativecommons.org/licenses/by-sa/4.0 . (The difference is that if you build future works off of this one, you must also release your derivative works with a license that allows further remixes over which you have no control.) This version: 07 May 2014 11:04MDT. Note this text will be frequently updated and improved as the author has time, particularly during and immediately after semesters in which it is being used in a class. Therefore please check back often to the website, which is www.poritz.net/jonathan/share/yaintt. This work is dedicated to my insanely hardworking colleagues at Colorado State Uni- versity – Pueblo whose dedication to their students, their scholarship, and their communi- ties is an inspiration. While I was working on the first version of this book, those colleagues stood up to some of the most benighted, ignorant administrative nonsense I have seen in the more than thirty years I have been involved in higher education. As MLK said, “The arc of the moral universe is long, but it bends towards justice.” – It is selfless, intelligent, hard work like yours that is doing the bending. Jonathan A. Poritz, 7 May 2014, Pueblo, CO, USA iii Release Notes This version of YAINTT has a particular emphasis on connections to cryptology. The cryptologic material appears in Chapter 4 and §§ 5.5 and 5.6, arising naturally (I hope) out of the ambient number theory. The main cryptologic applications – being the RSA cryptosystem, Diffie-Hellman key exchange, and the ElGamal cryptosystem – come out so naturally from considerations of Euler’s Theorem, primitive roots, and indices that it renders quite ironic G.H. Hardy’s assertion [Har05] of the purity and eternal inapplicability of number theory. Note, however, that once we broach the subject of these cryptologic algorithms, we take the time to make careful definitions for many cryptological concepts and to develop some related ideas of cryptology which have much more tenuous connections to the topic of number theory. This material therefore has something of a different flavor from the rest of the text – as is true of all scholarly work in cryptology (indeed, perhaps in all of computer science), which is clearly a discipline with a different culture from that of “pure” mathematics. Obviously, these sections could be skipped by an uninterested reader, or remixed away by an instructor for her own particular class approach. Caution: In good Bourbaki1 style, where this symbol appears in the text below, it indicates a place where the reasoning is intricate and difficult to follow, or calls attention to a common misinterpretation of some point. This version, in PDF form, can be found at http://www.poritz.net/jonathan/share/yaintt.pdf while all the files to create custom versions can be found at http://www.poritz.net/jonathan/share/yaintt/ – have fun with it, that’s the point of the Creative Commons! 1 A fictional mathematician and author of many (non-fictional – they really exist) fine mathematics texts, such as [Bou04] v Contents Preface iii Release Notes v Chapter 1. Well-Ordering and Division 1 1.1. The Well-Ordering Principle and Mathematical Induction 1 1.2. Algebraic Operations with Integers 5 1.3. Divisibility and the Division Algorithm 6 1.4. Representations of Integers in Different Bases 9 1.5. The Greatest Common Divisor 13 1.6. The Euclidean Algorithm 17 Chapter 2. Congruences 21 2.1. Introduction to Congruences 21 2.2. Linear Congruences 27 2.3. The Chinese Remainder Theorem 30 2.4. Another Way to Work with Congruences: Equivalence Classes 33 2.5. Euler’s φ Function 37 Chapter 3. Prime Numbers 41 3.1. Basics and the FTA 41 3.2. Wilson’s Theorem 45 3.3. Multiplicative Order and Applications 47 3.4. Another Approach to Fermat’s Little and Euler’s Theorems 51 Chapter 4. Cryptology 55 4.1. Some Speculative History 55 4.2. The Caesar Cipher and Its Variants 60 4.3. First Steps into Cryptanalysis: Frequency Analysis 64 4.4. Public-Key Crypto: the RSA Cryptosystem 73 4.5. Digital Signatures 81 4.6. Man-in-the-Middle Attacks, Certificates, and Trust 86 Chapter 5. Indices = Discrete Logarithms 89 5.1. More Properties of Multiplicative Order 91 vii viii CONTENTS 5.2. A Necessary Digression: Gauss’s Theorem on Sums of Euler’s Function 94 5.3. Primitive Roots 97 5.4. Indices 103 5.5. Diffie-Hellman Key Exchange 107 5.6. The ElGamal Cryptosystem 111 Bibliography 115 Index 117 CHAPTER 1 Well-Ordering and Division 1.1. The Well-Ordering Principle and Mathematical Induction In this chapter, we present three basic tools that will often be used in proving properties of the integers. We start with a very important property of integers called the well-ordering principle. We then state what is known as the pigeonhole principle, and then we proceed to present an important method called mathematical induction. 1.1.1. The Well-Ordering Principle. D EFINITION 1.1.1. Given a set S of numbers (of any kind), we say that ℓ ∈ S is a least element of S if ∀x ∈ S, either x = ℓ or ℓ < x. T HE W ELL -O RDERING P RINCIPLE . Every non-empty set of natural numbers has a least element. This principle is often taken as an axiom. 1.1.2. The Pigeonhole Principle. T HEOREM 1.1.2. The Pigeonhole Principle: Let s, k ∈ N satisfy s > k. If s objects are placed in k boxes, then at least one box contains more than one object. P ROOF. Suppose that none of the boxes contains more than one object. Then there are at most k objects. This leads to a contradiction with the fact that there are s objects for s > k. 1.1.3. The Principle of Mathematical Induction. We now present a valuable tool for proving results about integers. This tool is the principle of mathematical induction. T HEOREM 1.1.3. The First Principle of Mathematical Induction: Let S ⊂ N be a set satisfying the following two properties: (1) 1 ∈ S; and (2) ∀k ∈ N, k ∈ S ⇒ k + 1 ∈ S. Then S = N. More generally, if P(n) is a property of natural numbers which may or may not be true for any particular n ∈ N, satisfying (1) P(1) is true; and 1 2 1. WELL-ORDERING AND DIVISION (2) ∀k ∈ N, P(k) ⇒ P(k + 1) then ∀n ∈ N, P(n) is true. P ROOF. We use the well-ordering principle to prove this first principle of mathematical induction. Let S be the set from the first part of the theorem and let T be the set of natural numbers not in S. We will use a proof by contradiction, so assume T is non-empty. Then, by the well-ordering principle, T contains a least element ℓ. Note that 1 ∈ S, so 1 ∈/ T and thus ℓ > 1. Therefore ℓ − 1 is a natural number. Since ℓ is the least element of T , ℓ − 1 is not in T , it is therefore in S. But by the defining properties of S, since ℓ − 1 ∈ S, ℓ = ℓ − 1 + 1 ∈ S, which contradicts the fact that ℓ is a least element of T , so in T , so not in S. This contradiction implies that the assumption that T is non-empty is false, hence S = N. For the second part of the theorem, let S = {n ∈ N | P(n) is true} and apply the first part. E XAMPLE 1.1.4. We use mathematical induction to show that ∀n ∈ N n X n(n + 1) (1.1.1) j= . j=1 2 First note that 1 X 1·2 j=1= j=1 2 and thus the the statement is true for n = 1. For the remaining inductive step, suppose that P the formula holds for some particular n ∈ N, that is nj=1 j = n(n+1) 2 . We show that n+1 X (n + 1)(n + 2) j= . j=1 2 to complete the proof by induction. Indeed n+1 n X X n(n + 1) (n + 1)(n + 2) j= j + (n + 1) = + (n + 1) = , j=1 j=1 2 2 and the result follows. E XAMPLE 1.1.5. Now we use mathematical induction to prove that n! ≤ nn ∀n ∈ N. Note that 1! = 1 ≤ 11 = 1. We now present the inductive step. Suppose that n! ≤ nn for some n ∈ N, we prove that (n + 1)! ≤ (n + 1)n+1. Note that (n + 1)! = (n + 1)n! ≤ (n + 1).nn < (n + 1)(n + 1)n = (n + 1)n+1 . 1.1. THE WELL-ORDERING PRINCIPLE AND MATHEMATICAL INDUCTION 3 This completes the proof. T HEOREM 1.1.6. The Second Principle of Mathematical Induction: Let S ⊂ N be a set satisfying the following two properties: (1) 1 ∈ S; and (2) ∀k ∈ N, 1, . . . , k ∈ S ⇒ k + 1 ∈ S. Then S = N. More generally, if P(n) is a property of natural numbers which may or may not be true for any particular n ∈ N, satisfying (1) P(1) is true; and (2) ∀k ∈ N, if P(1), . . . , P(k) are all true, then P(k + 1) is also true then ∀n ∈ N, P(n) is true. P ROOF. To prove the second principle of induction, we use the first principle of induc- tion. Let S be a set of integers as in the first part of the theorem. For n ∈ N, let P(n) be the mathematical property “1, . . . , n ∈ S”. Then we can apply the First Principle of Mathematical Induction to prove that ∀n ∈ N P(n) is true, which means S = N. [Details left to the reader.] The second part of the theorem follows from the first in exactly the same way that the second part of the First Principle of Mathematical Induction followed from the first. 4 1. WELL-ORDERING AND DIVISION Exercises for §1.1. E XERCISE 1.1.1. Prove using mathematical induction that n < 3n for all positive inte- gers n. P E XERCISE 1.1.2. Show that nj=1 j 2 = n(n+1)(2n+1) 6 . P E XERCISE 1.1.3. Use mathematical induction to prove that nj=1(−1)j−1 j 2 = (−1)n−1 n(n+ 1)/2. P E XERCISE 1.1.4. Use mathematical induction to prove that nj=1 j 3 = [n(n + 1)/2]2 for every positive integer n. P E XERCISE 1.1.5. Use mathematical induction to prove that nj=1 (2j − 1) = n2 . E XERCISE 1.1.6. Use mathematical induction to prove that 2n < n! for n ≥ 4. E XERCISE 1.1.7. Use mathematical induction to prove that n2 < n! for n ≥ 4. 1.2. ALGEBRAIC OPERATIONS WITH INTEGERS 5 1.2. Algebraic Operations with Integers On Z, the set of integers, there are two basic binary operations, namely addition (de- noted by +) and multiplication (denoted by ·), which satisfy the following well known properties: (1) Commutativity of addition and multiplication ∀a, b ∈ Z : a+b= b+a a·b= b·a (2) Associativity of addition and multiplication ∀a, b, c ∈ Z : (a + b) + c = a + (b + c) (a · b) · c = a · (b · c) (3) Distributivity of multiplication over addition ∀a, b, c ∈ Z : a · (b + c) = a · b + a · c. In the set Z there are identity elements for the two operations + and ·, and these are the elements 0 and 1 respectively, that satisfy the basic properties ∀a ∈ Z : a+0=0+a=a a·1=1·a=a. The set Z allows additive inverses for its elements, in the sense that for every a ∈ Z there exists another integer in Z, denoted by −a, such that (1.2.1) a + (−a) = 0. While for multiplication, only the integer 1 has a multiplicative inverse in the sense that 1 is the only integer a such that there exists another integer, denoted by a−1 or by 1/a, (namely 1 itself in this case) such that (1.2.2) a · a−1 = 1. From the operations of addition and multiplication one can define two other operations on Z, namely subtraction (denoted by −) and division (denoted by /). Subtraction is a binary operation on Z, i.e., defined for any two integers in Z, while division is not a binary operation and thus is defined only for some specific pairs of integers in Z. Subtraction and division are defined as follows: (1) ∀a, b ∈ Z, a − b is defined to be a + (−b) (2) Given a, b ∈ Z, where b 6= 0, if ∃c ∈ Z such that a = b · c then a/b is defined to be c. 6 1. WELL-ORDERING AND DIVISION 1.3. Divisibility and the Division Algorithm We now discuss the concept of divisibility and its properties. 1.3.1. Integer Divisibility. D EFINITION 1.3.1. If a and b are integers such that a 6= 0, then we say a divides b and write a | b if there exists an integer k such that b = ka. That is, given a, b ∈ Z such that a 6= 0, we write a | b if ∃k ∈ Z. s.t. b = ka. If a divides b, we also say a is a factor [or divisor] of b, and b is a multiple of a. If a does not divide b, we write a ∤ b. E XAMPLE 1.3.2. For example, 2 | 4 and 7 | 63, while 5 ∤ 26. D EFINITION 1.3.3. Given a ∈ Z, we say a is even if 2 | a, i.e., if ∃k ∈ Z s.t. a = 2k. In contrast, given a ∈ Z, we say a is odd if 2 ∤ a. It is a consequence of the Division Algorithm, below, that if a is odd then ∃k ∈ Z s.t. a = 2k + 1. P ROPOSITION 1.3.4. ∀a ∈ Z we have a | 0. P ROPOSITION 1.3.5. If b ∈ Z is such that |b| < a, and b 6= 0, then a ∤ b. P ROPOSITION 1.3.6. Given a, b ∈ Z, a | b ⇔ a | |b|. T HEOREM 1.3.7. If a, b and c are integers such that a | b and b | c, then a | c. P ROOF. Since a | b and b | c, we know ∃k1 , k2 ∈ Z such that b = k1 a and c = k2 b. Hence c = k1 k2 a and so a | c. E XAMPLE 1.3.8. Since 6 | 18 and 18 | 36, then 6 | 36. The following theorem states that if an integer divides two other integers then it divides any linear combination of these integers. T HEOREM 1.3.9. ∀a, b, c, m, n ∈ Z, if c | a and c | b then c | (ma + nb). P ROOF. Since c | a and c | b, ∃k1 , k2 ∈ Z such that a = k1 c and b = k2 c. Thus ma + nb = mk1 c + nk2 c = c(mk1 + nk2 ), and hence c | (ma + nb). Theorem 1.3.9 can be generalized to any finite linear combination as follows. If n ∈ N, a, b1 , . . . , bn ∈ Z and a | b1 , a | b2 , . . . , a | bn then n X (1.3.1) a| kj bj j=1 ∀k1 , . . . , kn ∈ Z. It would be a nice exercise to prove this generalization by induction. 1.3. DIVISIBILITY AND THE DIVISION ALGORITHM 7 1.3.2. The Division Algorithm. T HEOREM 1.3.10. The Division Algorithm Given a, b ∈ Z such that b > 0, there exist unique q, r ∈ Z such that a = qb + r and 0 ≤ r < b. This q is called the quotient and r the remainder when a is divided by b. P ROOF. Consider the set A = {a − bk ≥ 0 | k ∈ Z}. Note that A is nonempty since for k < a/b, a − bk > 0. By the well-ordering principle, A has a least element r = a − qb for some q ∈ Z. Notice that r ≥ 0 by construction. Now if r ≥ b then (since b > 0) r > r − b = a − qb − b = a − (q + 1)b ≥ 0. This leads to a contradiction since r is assumed to be the least positive integer of the form r = a − qb. As a result we have 0 ≤ r < b. We will show that q and r are unique. Suppose that a = q1 b + r1 and a = q2 b + r2 with 0 ≤ r1 < b and 0 ≤ r2 < b. Then we have a − a = q1 b + r1 − (q2 b + r2 ) = (q1 − q2 )b + (r1 − r2 ) = 0. As a result we have (q1 − q2 )b = r2 − r1 . Thus we get that b | (r2 − r1 ). And since − max(r1 , r2 ) ≤ |r2 − r1 | ≤ max(r1 , r2 ), and b > max(r1 , r2 ), then r2 − r1 must be 0, i.e. r2 = r1 . And since bq1 + r1 = bq2 + r2 , we also get that q1 = q2 . This proves uniqueness. E XAMPLE 1.3.11. If a = 71 and b = 6, then 71 = 6 · 11 + 5. Here q = 11 and r = 5. 8 1. WELL-ORDERING AND DIVISION Exercises for §1.3. E XERCISE 1.3.1. Show that 5 | 25, 19 | 38 and 2 | 98. E XERCISE 1.3.2. Use the division algorithm to find the quotient and the remainder when 76 is divided by 13. E XERCISE 1.3.3. Use the division algorithm to find the quotient and the remainder when -100 is divided by 13. E XERCISE 1.3.4. Show that if a, b, c and d are integers with a and c nonzero, such that a | b and c | d, then ac | bd. E XERCISE 1.3.5. Show that if a and b are positive integers and a | b, then a ≤ b. E XERCISE 1.3.6. Prove that the sum of two even integers is even, the sum of two odd integers is even and the sum of an even integer and an odd integer is odd. E XERCISE 1.3.7. Show that the product of two even integers is even, the product of two odd integers is odd and the product of an even integer and an odd integer is even. E XERCISE 1.3.8. Show that if m is an integer then 3 divides m3 − m. E XERCISE 1.3.9. Show that the square of every odd integer is of the form 8m + 1. E XERCISE 1.3.10. Show that the square of any integer is of the form 3m or 3m + 1 but not of the form 3m + 2. E XERCISE 1.3.11. Show that if ac | bc, then a | b. E XERCISE 1.3.12. Show that if a | b and b | a then a = ±b. 1.4. REPRESENTATIONS OF INTEGERS IN DIFFERENT BASES 9 1.4. Representations of Integers in Different Bases In this section, we show how any positive integer can be written in terms of any positive base integer expansion in a unique way. Normally we use decimal notation to represent integers, we will show how to convert an integer from decimal notation into any other positive base integer notation and vise versa. Using the decimal notation in daily life is more traditional probably only because we have ten fingers. (“What about our toes?” you cry. I don’t know. And apparently the Babylonians had 30 fingers on each hand, or 15 on each hand and each foot, since they used base 60.) Notation An integer a written in base b expansion is denoted by (a)b . T HEOREM 1.4.1. Let b ∈ Z satisfy b > 1. Then ∀m ∈ N, ∃l ∈ N and ∃a1 , . . . , al ∈ Z such that m = al bl + al−1 bl−1 + · · · + a1 b + a0 , 0 ≤ aj < b for j = 0, 1, . . . , l, and al 6= 0. P ROOF. Fix an m ∈ N. We start by dividing m by b and we get m = q0 b + a0 , 0 ≤ a0 < b. If q0 6= 0 then we continue to divide q0 by b and we get q0 = q1 b + a1 , 0 ≤ a1 < b. We continue this process and hence we get q1 = q2 b + a2 , 0 ≤ a2 < b, . . . ql−2 = ql−1 b + al−1 , 0 ≤ al−1 < b, ql−1 = 0 · b + al , 0 ≤ al < b. Note that the sequence q0 , q1 , . . . is a decreasing sequence of non-negative integers with a last term ql that must be 0. Now substituting the equation q0 = q1 b + a1 in m = q0 b + a0 , we get m = (q1 b + a1 )b + a0 = q1 b2 + a1 b + a0 , 10 1. WELL-ORDERING AND DIVISION Successively substituting the equations in m, we get m = q2 b3 + a2 b2 + a1 b + a0 , . . . = ql−1 bl + al−1 bl−1 + · · · + a1 b + a0 , = al bl + al−1 bl−1 + · · · + a1 b + a0 . What remains to prove is that the representation is unique. Suppose now that m = al bl + al−1 bl−1 + · · · + a1 b + a0 = cl bl + cl−1 bl−1 + · · · + c1 b + c0 where if the number of terms is different in one expansion, we add zero coefficients to make the number of terms agree. Subtracting the two expansions, we get (al − cl )bl + (al−1 − cl−1 )bl−1 + · · · + (a1 − c1 )b + (a0 − c0 ) = 0. If the two expansions are different, then there exists 0 ≤ j ≤ l such that cj 6= aj . As a result, we get bj ((al − cl )bl−j + · · · + (aj+1 − cj+1)b + (aj − cj )) = 0 and since b 6= 0, we get (al − cl )bl−j + · · · + (aj+1 − cj+1 )b + (aj − cj ) = 0. We now get aj − cj = (al − cl )bl−j + · · · + (aj+1 − cj+1 )b, and as a result, b | (aj − cj ). Since 0 ≤ aj < b and 0 ≤ cj < b, we get that aj = cj . This is a contradiction and hence the expansion is unique. D EFINITION 1.4.2. Given b ∈ Z satisfying b > 1. For m ∈ N, let ℓ ∈ N and a1 , . . . , aℓ ∈ Z be as in the above theorem (1.4.1). Then the base b expression for m is the sequences of digits mb = aℓ . . . a1 . If b ≥ 10, we often use some other single symbols to represent the possible values from 10 to b − 1 of the ai ’s. For example, 10 ! A 11 ! B 12 ! C etc. Base 2 representation of integers is called binary representation. Binary representation is useful for computers: the coefficients a0 , . . . , al of a binary representation all satisfy 0 ≤ aj < 2, hence they are 0 or 1. Thus to represent an integer on l wires, one can have 1.4. REPRESENTATIONS OF INTEGERS IN DIFFERENT BASES 11 each wire either have voltage (1) or not (0). (In fact, the word bit is a contraction of binary digit.) Computer programmers also frequently use base 8 and base 16, called octal and hexa- decimal or hex, respectively. The Babylonians used base 60, called sexagesimal. E XAMPLE 1.4.3. To find the expansion of 214 base 3: we do the following 214 = 3 · 71 + 1 71 = 3 · 23 + 2 23 = 3 · 7 + 2 7=3·2+1 2=3·0+2 As a result, to obtain a base 3 expansion of 214, we take the remainders of divisions and we get that (214)10 = (21221)3. E XAMPLE 1.4.4. To find the base 10 expansion, i.e., the decimal expansion, of (364)7: We do the following: 4 · 70 + 6 · 71 + 3 · 72 = 4 + 42 + 147 = 193. 12 1. WELL-ORDERING AND DIVISION 1.4.1. Exercises for §1.4. E XERCISE 1.4.1. Convert (7482)10 to base 6 notation. E XERCISE 1.4.2. Convert (98156)10 to base 8 notation. E XERCISE 1.4.3. Convert (101011101)2 to decimal notation. E XERCISE 1.4.4. Convert (AB6C7D)16 to decimal notation. E XERCISE 1.4.5. Convert (9A0B)16 to binary notation. 1.5. THE GREATEST COMMON DIVISOR 13 1.5. The Greatest Common Divisor In this section we define the greatest common divisor (gcd) of two integers and discuss its properties. We also prove that the greatest common divisor of two integers is a linear combination of these integers. Two integers a and b, not both 0, can have only finitely many divisors (see Exer- cise 1.3.5), and thus can have only finitely many divisors in common. In this section, we are interested in the greatest of these common divisors. D EFINITION 1.5.1. Given a, b ∈ Z, not both zero, the greatest common divisor is the largest integer that divides both a and b, and is written gcd(a, b) (or sometimes just (a, b)). When it makes some formulæ simpler, we will write gcd(0, 0) = 0. E XAMPLE 1.5.2. The greatest common divisor of 24 and 18 is 6. In other words gcd(24, 18) = 6. D EFINITION 1.5.3. a, b ∈ Z are said to be relatively prime if gcd(a, b) = 1. E XAMPLE 1.5.4. The greatest common divisor of 9 and 16 is 1, thus they are relatively prime. Note that every integer has positive and negative divisors. If a is a positive divisor of m, then −a is also a divisor of m. Therefore by our definition of the greatest common divisor, we can see that gcd(a, b) = gcd(|a|, |b|). We can use the gcd of two integers to make relatively prime integers: T HEOREM 1.5.5. If a, b ∈ Z have gcd(a, b) = d then gcd(a/d, b/d) = 1. P ROOF. Fix a, b ∈ Z. We will show that a/d and b/d have no common positive divisors other than 1. Let k ∈ N be a divisor of both a/d and b/d, so ∃m, n ∈ N such that a/d = km and b/d = kn Thus we get that a = kmd and b = knd. Hence kd is a common divisor of both a and b. Also, kd ≥ d. However, d is the greatest common divisor of a and b. As a result, we get that k = 1. The next theorem shows that the greatest common divisor of two integers does not change when we add a multiple of one of the two integers to the other. T HEOREM 1.5.6. Let a, b, c ∈ Z. Then gcd(a, b) = gcd(a + cb, b). P ROOF. We will show that every divisor of a and b is also a divisor of a + cb and b and vise versa. Hence they have exactly the same divisors. So we get that the greatest common divisor of a and b will also be the greatest common divisor of a + cb and b. Let k be a 14 1. WELL-ORDERING AND DIVISION common divisor of a and b. By Theorem 1.3.9, k | (a + cb) and hence k is a divisor of a + cb. Now assume that l is a common divisor of a + cb and b. Also by Theorem 1.3.9 we have, l | ((a + cb) − cb) = a. As a result, l is a common divisor of a and b and the result follows. E XAMPLE 1.5.7. Notice that gcd(4, 14) = gcd(4, 14 − 3 · 4) = gcd(4, 2) = 2. We now present a theorem which proves that the greatest common divisor of two inte- gers can be written as a linear combination of the two integers. T HEOREM 1.5.8. Let a, b ∈ Z not both be zero. Then gcd(a, b) is the smallest natural number which is of the form d = ma + nb for some m, n ∈ Z. P ROOF. Assume without loss of generality that a, b ∈ N are positive integers. Consider the set S = {d ∈ N | d = ma + nb for some m, n ∈ Z} . S is non-empty since a = 1 · a + 0 · b and b = 0 · a + 1 · b are both in S. Let d ∈ N be the least element of S, whose existence is guaranteed by the well-ordering principle. Notice d = ma + nb for some m, n ∈ Z, since d ∈ S. We still must prove that d divides both a and b and that it is the greatest such common divisor. By the division algorithm, ∃q, r ∈ Z such that a = qd + r, 0 ≤ r < d. Thus we have r = a − qd = a − q(ma + nb) = (1 − qm)a − qnb. We then have that r is a linear combination of a and b. Since 0 ≤ r < d and d is the least positive integer which is a linear combination of a and b, we must have r = 0 and so a = qd. Hence d | a. The same sort of argument will show that d | b. Now notice that if there is a divisor c that divides both a and b. Then c divides any linear combination of a and b by Theorem 1.3.9. Hence c | d. This proves that any common divisor of a and b divides d. Hence c ≤ d, and d is the greatest common divisor. There is a simple application of this which will be very useful in the future: C OROLLARY 1.5.9. If a, b ∈ Z are relatively prime, then ∃m, n ∈ Z such that ma + nb = 1. D EFINITION 1.5.10. For some n ∈ N, let a1 , a2 , . . . , an ∈ Z not be all 0. The greatest common divisor of these integers is the largest integer that divides all of them, and is denoted gcd(a1 , . . . , an ). 1.5. THE GREATEST COMMON DIVISOR 15 D EFINITION 1.5.11. For some n ∈ N, a1 , a2 , . . . , an ∈ Z are said to be mutually relatively prime if gcd(a1 , a2 , . . . , an ) = 1. E XAMPLE 1.5.12. The integers 3, 6, 7 are mutually relatively prime since (3, 6, 7) = 1 although (3, 6) = 3. D EFINITION 1.5.13. For some n ∈ N, a1 , a2 , . . . , an ∈ Z are called pairwise relatively prime if ∀i, j ∈ N such that i ≤ n, j ≤ n, and i 6= j, we have gcd(ai , aj ) = 1. E XAMPLE 1.5.14. The integers 3, 14, 25 are pairwise relatively prime. Notice also that these integers are mutually relatively prime. P ROPOSITION 1.5.15. For n ∈ N and a1 , . . . , an ∈ Z, if a1 , a2 , . . . , an are pairwise relatively prime then they are mutually relatively prime. 16 1. WELL-ORDERING AND DIVISION Exercises for §1.5. E XERCISE 1.5.1. Find the greatest common divisor of 15 and 35. E XERCISE 1.5.2. Find the greatest common divisor of 100 and 104. E XERCISE 1.5.3. Find the greatest common divisor of -30 and 95. E XERCISE 1.5.4. Let m ∈ N. Find the greatest common divisor of m and m + 1. E XERCISE 1.5.5. Let m ∈ N, find the greatest common divisor of m and m + 2. E XERCISE 1.5.6. Show that if m, n ∈ Z have gcd(m, n) = 1, then gcd(m+n, m−n) = 1 or 2. E XERCISE 1.5.7. Show that if m ∈ N, then 3m + 2 and 5m + 3 are relatively prime. E XERCISE 1.5.8. Show that if a, b ∈ Z are relatively prime, then gcd(a+2b, 2a+b) = 1 or 3. E XERCISE 1.5.9. Show that if a1 , a2 , . . . , an ∈ Z are not all 0 and c ∈ N, then gcd(ca1 , ca2 , . . . , can ) = c · gcd(a1 , a2 , . . . , an ). 1.6. THE EUCLIDEAN ALGORITHM 17 1.6. The Euclidean Algorithm In this section we describe a systematic method that determines the greatest common divisor of two integers, due to Euclid and thus called the Euclidean algorithm. L EMMA 1.6.1. If a, b, q, r ∈ Z and a = qb + r, then gcd(a, b) = gcd(r, b). P ROOF. Note that by theorem 8, we have gcd(bq + r, b) = gcd(b, r). Now to the Euclidean algorithm in its general form, which basically states that the greatest common divisor of two integers is the last non zero remainder of successive divi- sions. T HEOREM 1.6.2. Let a, b ∈ N and assume a ≥ b. Define r0 = a, r1 = b, s0 = 1, s1 = 0, t0 = 0, and t1 = 1. Then apply the division algorithm successively to obtain quotients and remainders qj , rj ∈ N satisfying rj = rj+1qj+1 + rj+2 and 0 ≤ rj+2 < rj+1 for all j = 0, 1, . . . , n − 2 where n is defined so that rn+1 = 0. Along the way, also define sj+1 = sj−1 − qj+1 sj and tj+1 = tj−1 − qj+1 tj . Then gcd(a, b) = rn = sn+1 a + tn+1 b. P ROOF. By applying the division algorithm, we see that r0 = q1 r1 + r2 0 ≤ r2 < r1 , r1 = q2 r2 + r3 0 ≤ r3 < r2 , . . . rn−2 = qn−1 rn−1 + rn 0 ≤ rn < rn−1 , rn−1 = qn rn . Notice that, we will have a remainder of 0 eventually since all the remainders are integers and every remainder in the next step is less than the remainder in the previous one. By Lemma 1.6.1, we see that gcd(a, b) = gcd(b, r2 ) = gcd(r2 , r3 ) = · · · = gcd(rn , 0) = rn . Note: The full version of this theorem, with the sj ’s and tj , is called the extended Eu- clidean Algorithm, while a simpler version without those coefficients is know as Eu- clidean Algorithm. The attentive reader will have seen that We did not actually prove that the sj ’s and tj ’s can be used, as claimed, to write the gcd as a linear combination of a and b. This proof is left as an exercise, below. 18 1. WELL-ORDERING AND DIVISION E XAMPLE 1.6.3. We will find the greatest common divisor of 4147 and 10672: Note that 10672 = 4147 · 2 + 2378, 4147 = 2378 · 1 + 1769, 2378 = 1769 · 1 + 609, 1769 = 609 · 2 + 551, 609 = 551 · 1 + 58, 551 = 58 · 9 + 29, 58 = 29 · 2, Hence gcd(4147, 10672) = 29. 1.6. THE EUCLIDEAN ALGORITHM 19 Exercises for §1.6. E XERCISE 1.6.1. Use the Euclidean algorithm to find the greatest common divisor of 412 and 32 and express it in terms of the two integers. E XERCISE 1.6.2. Use the Euclidean algorithm to find the greatest common divisor of 780 and 150 and express it in terms of the two integers. E XERCISE 1.6.3. Find the greatest common divisor of 70, 98, 108. E XERCISE 1.6.4. Let a, b ∈ N be even. Prove that gcd(a, b) = 2 gcd(a/2, b/2). E XERCISE 1.6.5. Show that if a ∈ N is even and b ∈ N is odd, then gcd(a, b) = gcd(a/2, b). E XERCISE 1.6.6. Prove the extended part of the Extended Euclidean Algorithm. CHAPTER 2 Congruences A congruence is nothing more than a statement about divisibility. The theory of con- gruences was introduced by Carl Friedrich Gauss, in his monumental Disquisitiones Arith- meticae (published in 1801, when he was 24;a translation is [Gau86]). We start by introducing congruences and their properties. We then present solutions to linear congruences which will serve as an introduction to the Chinese Remainder Theorem that follows. 2.1. Introduction to Congruences As we mentioned in the introduction, the theory of congruences was developed by Gauss at the beginning of the nineteenth century. D EFINITION 2.1.1. Given a, b ∈ Z and n ∈ N, we say that a is congruent to b modulo n if n | (a − b), i.e., if ∃k ∈ Z such that a = b + kn. If a is congruent to b modulo n, we write a ≡ b (mod n). E XAMPLE 2.1.2. 19 ≡ 5 (mod 7). Similarly 2k + 1 ≡ 1 (mod 2) which means every odd number is congruent to 1 modulo 2. Congruence is much like equality in many ways. For example: T HEOREM 2.1.3. Given a, b, c, d ∈ Z and n ∈ N. Then (1) If a ≡ b (mod n), then b ≡ a (mod n). (2) If a ≡ b (mod n) and b ≡ c (mod n), then a ≡ c (mod m). (3) If a ≡ b (mod n), then a + c ≡ b + c (mod n). (4) If a ≡ b (mod n), then a − c ≡ b − c (mod n). (5) If a ≡ b (mod n), then ac ≡ bc (mod n). (6) If c > 0 and a ≡ b (mod n), then ac ≡ bc (mod nc). (7) If a ≡ b (mod n) and c ≡ d (mod n) then a + c ≡ b + d (mod n). (8) If a ≡ b (mod n) and c ≡ d (mod n) then a − c ≡ b − d (mod n). (9) If a ≡ b (mod n) and c ≡ d (mod n) then ac ≡ bd (mod n). P ROOF. (1) If a ≡ b (mod n), then n | (a − b). Thus ∃k ∈ Z such that a − b = kn. This implies b − a = (−k)n and thus n | (b − a). Consequently b ≡ a (mod n). 21 22 2. CONGRUENCES (2) Since a ≡ b (mod n) and b ≡ c (mod n), n | (a − b) and n | (b − c). As a result, there ∃k, l ∈ Z such that a = b + kn and b = c + ln, which imply that a = c + (k + l)n. In other words, a = c (mod n). (3) Since a ≡ b (mod n), n | (a − b). So if we add and subtract c we get n | ((a + c) − (b + c)) which means that a+c ≡b+c (mod n). (4) Since a ≡ b (mod n), n | (a − b) so we can subtract and add c to get n | ((a − c) − (b − c)) so a−c ≡b−c (mod n). (5) If a ≡ b (mod n), n | (a − b). Thus there ∃k ∈ Z such that a − b = kn and as a result ac − bc = (kc)n. Therefore n | (ac − bc) and hence ac ≡ bc (mod n). (6) If a ≡ b (mod n), n | (a − b). Thus there ∃k ∈ Z such that a − b = kn and as a result ac − bc = (k)cn. Thus nc | (ac − bc) and hence ac ≡ bc (mod nc). (7) Since a ≡ b (mod n) and c ≡ d (mod n), n | (a − b) and n | (c − d). As a result, there ∃k, l ∈ Z such that a − b = kn and c − d = ln. Note that (a − b) + (c − d) = (a + c) − (b + d) = (k + l)n. As a result, n | ((a + c) − (b + d)), hence a + c ≡ b + d (mod n). 2.1. INTRODUCTION TO CONGRUENCES 23 (8) If a = b + kn and c = d + ln for k, l ∈ Z, we have (a − b) − (c − d) = (a − c) − (b − d) = (k − l)n. As a result, n | ((a − c) − (b − d)), hence a − c ≡ b − d (mod n). (9) ∃k, l ∈ Z such that such that a − b = kn and c − d = ln and thus ca − cb = (ck)n and bc − bd = (bl)n. Note that (ca − cb) + (bc − bd) = ac − bd = (kc − lb)n. As a result, n | (ac − bd), hence ac ≡ bd (mod n). Here is a technical result which will be useful later: T HEOREM 2.1.4. Given a, b, c ∈ Z, if a | c, b | c, and a and b are relatively prime, then ab | c. P ROOF. By Corollary 1.5.9, we know ∃m, n ∈ Z such that ma + nb = 1. Also, because of the divisibility hypotheses, we also know ∃p, q ∈ Z such that c = pa and c = qb. Compute: c = c · 1 = c(ma + nb) = mca + ncb = mqba + npab = (mq + np)ab . But this means ab | c, as desired. E XAMPLES 2.1.5. (1) 14 ≡ 8 (mod 6) so 8 ≡ 14 (mod 6). (2) Because 22 ≡ 10 (mod 6) and 10 ≡ 4 (mod 6), it is also true that 22 ≡ 4 (mod 6). (3) 50 ≡ 20 (mod 15) so 50 + 5 = 55 ≡ 20 + 5 = 25 (mod 15). (4) 50 ≡ 20 (mod 15) so 50 − 5 = 45 ≡ 20 − 5 = 15 (mod 15). (5) 19 ≡ 16 (mod 3) so 2(19) = 38 ≡ 2(16) = 32 (mod 3). (6) 19 ≡ 16 (mod 3) so 2(19) = 38 ≡ 2(16) = 32 (mod 2 · 3) or 38 ≡ 2(16) = 32 (mod 6). (7) Because 19 ≡ 3 (mod 8) and 17 ≡ 9 (mod 8), we have 19 + 17 = 36 ≡ 3 + 9 = 12 (mod 8). 24 2. CONGRUENCES (8) Because 19 ≡ 3 (mod 8) and 17 ≡ 9 (mod 8), we have 19 − 17 = 2 ≡ 3 − 9 = −6 (mod 8). (9) Because 19 ≡ 3 (mod 8) and 17 ≡ 9 (mod 8), we have 19(17) = 323 ≡ 3(9) = 27 (mod 8). Here is a result which at first seems very simple, but turns out to be immensely useful – so useful it has a name. L EMMA 2.1.6. Euclid’s Lemma: Given x, y, z ∈ Z, if x | yz and gcd(x, y) = 1 then x | z. P ROOF. From Corollary 1.5.9, we know ∃m, n ∈ Z such that mx + ny = 1. Multiply- ing by z, we get mxz + nyz = z. But we’ve assumed that x | yz, so x | nyz, and certainly x | mxz, so x | mxz + nyz, i.e., x | z. We now present a theorem that will show one difference between equations and congru- ences: in equations, if we divide both sides of the equation by a non-zero number, equality holds. However, in congruences, this is not necessarily true. In other words, dividing both sides of a congruence by the same integer does not necessarily preserve the congruence. T HEOREM 2.1.7. (1) Given a, b, c ∈ Z and n ∈ N, define d = gcd(a, n) ∈ N. If ab ≡ ac (mod n) then b ≡ c (mod n/d). (2) In particular, if gcd(a, n) = 1 then b = c (mod n) ⇔ ab ≡ ac (mod n). P ROOF. For Part 1, if ab ≡ ac (mod n), then n | (ab − ac) = a(b − c). Hence ∃k ∈ Z such that a(b − c) = kn. Dividing both sides by d, we get (a/d)(b − c) = k(n/d) or (n/d) | (a/d)(b − c). Now, by Theorem 1.5.5 gcd(a/d, n/d) = 1 so Euclid’s Lemma 2.1.6 tells us that (n/d) | (b − c). Hence b ≡ c (mod n/d). For Part 2, the direction ⇒ is part 5 of Theorem 2.1.3, while ⇐ is a special case of Part 1. E XAMPLE 2.1.8. 38 ≡ 10 (mod 7). Since gcd(2, 7) = 1, we have 19 ≡ 5 (mod 7). One last technical result is worth stating clearly at this point: T HEOREM 2.1.9. Given n, d ∈ N such that d | n, there are exactly d values x ∈ Z, up to congruence modulo n, satisfying x ≡ 0 (mod n/d). 2.1. INTRODUCTION TO CONGRUENCES 25 P ROOF. Let xj = j(n/d) for j = 0, . . . , (d − 1). Certainly each of these d values xj is a multiple of n/d and so solves x ≡ 0 (mod n/d). All we must show, then, is that every solution x of x ≡ 0 (mod n/d) is congruent, modulo n, to one of these xj . Let x be such a solution, so ∃k ∈ Z such that x = k(n/d). Use the Division Algorithm for x divided by n, getting x = qn + r for some q, r ∈ Z with 0 ≤ r < n. But r = x − qn = k(n/d) − qd(n/d) = (k − qd)(n/d) so r is a multiple of (n/d) which lies in the range [0, n). The only such multiples are the x0 , . . . , x(d−1) defined above; say r = xj . Then x = qn + r = qn + xj ≡ xj (mod xj ), so every solution x is congruent modulo n to exactly one of the d particular solutions x1 , . . . , x(d−1) . 26 2. CONGRUENCES Exercises for §2.1. E XERCISE 2.1.1. Determine whether 3 and 99 are congruent modulo 7 or not. E XERCISE 2.1.2. Show that if x is an odd integer, then x2 ≡ 1 (mod 8). E XERCISE 2.1.3. Show that if a, b ∈ Z and m, n ∈ N are such that n | m and a ≡ b (mod m), then a ≡ b (mod n). E XERCISE 2.1.4. Show that if n, k ∈ N and {a1 , . . . , ak , b1 , . . . , bk } ⊂ Z satisfy ai ≡ bi P P (mod n) for i = 1, 2, ..., k, then ki=1 ai ≡ ki=1 bi (mod n). E XERCISE 2.1.5. For which n ∈ N is it true that 1 + 2 + ... + (n − 1) ≡ 0 (mod n)? 2.2. LINEAR CONGRUENCES 27 2.2. Linear Congruences Because congruence is analogous to equality, it is natural to ask about the analogues of linear equations, the simplest equations one can solve in algebra, but using congruence rather than equality. In this section, we discuss linear congruences of one variable and their solutions. We start with a definition: D EFINITION 2.2.1. Given constants a, b ∈ Z and n ∈ Z, a congruence of the form ax ≡ b (mod n) where x ∈ Z is unknown is called a linear congruence in one variable. If a linear congruence has one solution, then it has infinitely many: T HEOREM 2.2.2. Given constants a, b ∈ Z, n ∈ Z, and a solution x ∈ Z to the linear congruence ax ≡ b (mod n), any other x′ ∈ Z satisfying x′ ≡ x (mod n) is also a solution of the same congruence. [Note: in the early history of number theory, before Gauss, one talked about D EFINITION 2.2.3. An algebraic equation whose constants and variables are all inte- gers is called a Diophantine equation. Then the modern linear congruence ax ≡ b (mod n), for a, b ∈ Z and n ∈ N is equivalent to the linear Diophantine equation ax − ny = b in the two unknowns x and y.] The following gives a fairly complete characterization of solutions of linear congru- ences: T HEOREM 2.2.4. Let a, b ∈ Z and n ∈ N and consider the linear congruence ax ≡ b (mod n) . Setting d = gcd(a, n), we have (1) If d ∤ b, then the congruence has no solutions. (2) If d | b, then the congruence has exactly d solutions which are distinct modulo n. P ROOF. For Part 1, we prove its contrapositive. So assume the congruence has solu- tions, meaning ∃x, k ∈ Z such that ax − b = kn, or ax − kn = b. But since d = gcd(a, n) is a common divisor of a and n, it divides the linear combination ax − kn = b. Hence d | b. Now for Part 2, assume d | b, so ∃k ∈ Z such that kd = b. But from Theorem 1.5.8 we know ∃p, q ∈ Z such that d = pa + qn. This means that kpa + kqn = kd = b or, rearranging, a(kp) − b = (−kq)n. Hence n | a(kp) = b, i.e., a(kp) ≡ b (mod n) and thus x = kp is one solution to the linear congruence ax ≡ b (mod n). Finally, let us prove that there are the correct number of solutions, mod n, of the con- gruence equation. We have just seen that there is at least one x ∈ Z satisfying ax ≡ b (mod n). Let y ∈ Z be any other solution. Then ax ≡ b ≡ ay (mod n). By part (2) of 28 2. CONGRUENCES Theorem 2.1.7, x ≡ y (mod n/d), or δ = y − x ≡ 0 (mod n/d). Now by Theorem 2.1.9, there are exactly d possibilities, modulo n, for this δ. Thus there are d solutions of ax ≡ b (mod n) of the form x + δ. R EMARK 2.2.5. Notice that if a ∈ Z and n ∈ N are relatively prime, then ∀b ∈ Z there is a unique solution modulo n to the equation ax ≡ b (mod n). E XAMPLE 2.2.6. Let us find all the solutions of the congruence 3x ≡ 12 (mod 6). Notice that gcd(3, 6) = 3 and 3 | 12. Thus there are three incongruent solutions modulo 6. Using the Euclidean Algorithm to find the solution of the equation 3x − 6y = 12 we get a solution x0 = 6. Thus the three incongruent (modulo 6) solutions are given by x1 = 6 (mod 6), x1 = 6 + 2 = 2 (mod 6) and x2 = 6 + 4 = 4 (mod 6). As we mentioned in Remark 2.2.5, the congruence ax ≡ b (mod n) for a, b ∈ Z and n ∈ N has a unique (modulo n) solution if gcd(a, n) = 1. This will allow us to talk about modular inverses. D EFINITION 2.2.7. Given a ∈ Z and n ∈ N, a solution to the congruence ax ≡ 1 (mod n) for (a, n) = 1 is called the inverse of a modulo n. We denote such an inverse by a−1 , with the n to be understood from context. Stating formally what was just recalled from Remark 2.2.5, we have C OROLLARY 2.2.8. Given a ∈ Z and n ∈ N which are relatively prime, the modular inverse a−1 exists and is unique modulo n. E XAMPLE 2.2.9. The modular inverse 7−1 of 7 modulo 48 is 7. Notice that a solution of 7x ≡ 1 (mod 48) is x ≡ 7 (mod 48). 2.2. LINEAR CONGRUENCES 29 Exercises for §2.2. E XERCISE 2.2.1. Find all solutions of 3x ≡ 6 (mod 9). E XERCISE 2.2.2. Find all solutions of 3x ≡ 2 (mod 7). E XERCISE 2.2.3. Find inverses modulo 13 of 2 and of 11. E XERCISE 2.2.4. Given a ∈ Z and n ∈ N, show that if a−1 is the inverse of a modulo n and b−1 is the inverse of b modulo n, then a−1 b−1 is the inverse of ab modulo n. 30 2. CONGRUENCES 2.3. The Chinese Remainder Theorem In this section, we discuss solutions of systems of congruences having different moduli. An example of this kind of systems is the following: find a number that leaves a remainder of 1 when divided by 2, a remainder of 2 when divided by three and a remainder of 3 when divided by 5. We shall see that there is a systematic way of solving this kind of system. T HEOREM 2.3.1. The Chinese Remainder Theorem: Fix a k ∈ N. Then given b1 , . . . , bk ∈ Z and n1 , . . . , nk ∈ N, the system of congruences x ≡ b1 (mod n1 ) x ≡ b2 (mod n2 ) .. . x ≡ bk (mod nk ) has a solution x ∈ Z if the n1 , n2 , . . . , nk are pairwise relatively prime. The solution is unique modulo N = n1 n2 . . . nk . P ROOF. For j = 1, . . . , k, let Nj = N/nj . Since the moduli nj are pairwise relatively prime, gcd(Nj , nj ) = 1 – after all, Nj is the product of all the moduli except nj . Hence by Corollary 2.2.8, ∃yj = Nj−1 modulo nj , satisfying Nj yj ≡ 1 (mod nj ). Consider now k X x= bj Nj yj j=1 Since Nj ≡ 0 (mod ni ) ∀i 6= j, we see that x ≡ bj Nj yj ≡ bj (mod nj ). Hence x is a solution to the system of congruences. We have to show now that any two solutions are congruent modulo N. Suppose that x and y are both solutions of the system of congruences. Then x ≡ bj ≡ y (mod nj ), or nj | x−y, for all 1 ≤ j ≤ k. But then, since the moduli are pairwise relatively prime, using Theorem 2.1.4 k times (formally, this needs to be done by induction!), we can conclude that N = n1 . . . nk | x − y or x ≡ y (mod N). E XAMPLE 2.3.2. Solve the system x ≡ 1 (mod 2) x ≡ 2 (mod 3) x ≡ 3 (mod 5). 2.3. THE CHINESE REMAINDER THEOREM 31 We have N = 2 · 3 · 5 = 30. Also N1 = 30/2 = 15, N2 = 30/3 = 10, and N3 = 30/5 = 6. So we have to solve now 15y1 ≡ 1 (mod 2) – a solution is y1 ≡ 1 (mod 2). In the same way, we find that y2 ≡ 1 (mod 3) and y3 ≡ 1 (mod 5). Therefore x = 1 · 15 · 1 + 2 · 10 · 1 + 3 · 6 · 1 = 53 ≡ 23 (mod 30). 32 2. CONGRUENCES Exercises for §2.3. E XERCISE 2.3.1. Find an integer that leaves a remainder of 2 when divided by either 3 or 5, but that is divisible by 4. E XERCISE 2.3.2. Find all integers that leave a remainder of 4 when divided by 11 and leaves a remainder of 3 when divided by 17. E XERCISE 2.3.3. Find all integers that leave a remainder of 1 when divided by 2, a remainder of 2 when divided by 3 and a remainder of 3 when divided by 5. E XERCISE 2.3.4. A band of 17 pirates steal some gold bars. When they try to divide the spoils equally, 3 bars are left over – so a fight breaks out, killing one. This immediately brings calm as they see if the gold can now be evenly shared. Unfortunately, there are now 10 bars left out, so they fight again. After the inevitable (single) further death, a perfect division is now possible. What is the minimum number of gold bars the pirates could have started with? [This is apparently an ancient Chinese problem.] 2.4. ANOTHER WAY TO WORK WITH CONGRUENCES: EQUIVALENCE CLASSES 33 2.4. Another Way to Work with Congruences: Equivalence Classes In this section, we shall consider another way to work with congruences, based upon the following: D EFINITION 2.4.1. Let S be a set and ∼ = a relation defined on S. (That is, ∀x, y ∈ S, the statement “x ∼ = y” may be true or false.) If ∼ = satisfies the following three properties, it is called an equivalence relation: • [Reflexivity] ∀x ∈ S, x ∼= x. • [Symmetry] ∀x, y ∈ S, x ∼ =y⇔y∼ = x. ∼ • [Transitivity] ∀x, y, z ∈ S, x = y and y ∼ =z⇒x∼ = z. If ∼ = is an equivalence relation on the set S and x ∈ S, then the set [x] = {y ∈ S | y = x} ⊆ S is called the equivalence class of x. We write S/ ∼ ∼ = for the set of equivalence classes in S. And if C ∈ S, then any r ∈ S such that C = [r] is called a representative of the equivalence class C. T HEOREM 2.4.2. Let S be a set and ∼ = an equivalence relation defined on S. Then (1) ∀x ∈ S, x ∈ [x]. (2) ∀x, y ∈ S, either [x] = [y] or [x] ∩ [y] = ∅, but not both. P ROOF. (1): This is just the reflexivity of ∼ =. ∼ (2): Suppose z ∈ [x] ∩ [y]. This means z = x and z ∼ = y, so by symmetry y ∼ = z and by ∼ transitivity, x = y. Now if a ∈ [x] and b ∈ [y], then a ∼ = x and b ∼ = y. By transitivity, a ∼=x∼=y ∼ = b. Thus a ∈ [y] and, by symmetry, b ∼ = x so b ∈ [x]. Therefore [x] ⊆ [y] and [y] ⊆ [x], and thus [x] = [y]. Since the above construction only relied on the existence of some element z ∈ [x] ∩ [y], we see that the only way we could fail to have [x] = [y] is if [x] ∩ [y] = ∅. E XAMPLE 2.4.3. On the set S = {(n, m) | n, m ∈ Z, m 6= 0} we can define the = (c, d) if ad = cb. Then S/ ∼ relation (a, b) ∼ = is nothing other than the rational numbers, Q! Now let us specialize the concept of equivalence class to the case of congruences: P ROPOSITION 2.4.4. Given n ∈ N, the relation on Z defined by a∼ =n b ⇔ a ≡ b (mod n) (which we shall write a ∼ = b if the n is clear from context) is an equivalence relation. D EFINITION 2.4.5. Given n ∈ N and a ∈ Z, the equivalence class of a under the above equivalence relation is called the congruence class of a mod n and is written [a]n (or, by abuse of notation, merely [a] if the n is understood from the context). The set of 34 2. CONGRUENCES equivalence classes Z/ ∼ =n is called the integers mod n and is written Z/nZ (or, by some authors, Z/n or Zn ). T HEOREM 2.4.6. Given n ∈ N, Z/nZ has n elements, with representatives 0, . . . , n−1 of these distinct equivalence classes. That is, Z/nZ = {[0]n , . . . , [n − 1]n } . P ROOF. Given n ∈ N and a ∈ Z, the Division Algorithm tells us there is a unique pair q, r ∈ Z such that a = qn + r and 0 ≤ r < n. Note that a ≡ r (mod n) or, equivalently, a ∈ [r]. Thus every a ∈ Z is an element of a unique equivalence class [r] for r ∈ {0, . . . , n − 1}. Since each such r ∈ [r], uniquely!, the equivalence classes [0], . . . , [n − 1] are all distinct. The remarkable thing about these Z/nZ is that we can do much of the usual integer arithmetic in them; in fact, sometimes we can do a bit more than the usual. D EFINITION 2.4.7. Given n ∈ N and C, D ∈ Z/nZ, define C + D = [a + b] and C · D = [a · m] where a and b are any representatives of the congruence classes C and D, respectively. T HEOREM 2.4.8. The operations + and · on Z/nZ are well-defined. That is, they are independent of the choices of representatives of the congruence classes. P ROOF. Say n ∈ N and C, D ∈ Z/nZ. Let a, p, b, q ∈ Z be such that C = [a] = [p] and D = [b] = [q]. Then p ≡ a (mod n) and q ≡ b (mod n). But then Theorem 2.1.3 tells us that a + b ≡ p + q (mod n) and a · b ≡ p · q (mod n), so [a + b] = [p + q] and [a · b] = [p · q]. This means that C + D can be defined either as [a + b] or as [p + q] and it will be the same thing, as asserted, and likewise for C · D. These new operations are quite nice: T HEOREM 2.4.9. Given n ∈ N, the addition and multiplication on Z/nZ (1) are commutative and associative; (2) multiplication distributes over addition; (3) both operations have an identity – [0] for addition and [1] for multiplication; (4) every element of Z/nZ has an additive inverse – the inverse of [a] is [−a] (or [n − a], another name for the same thing); and (5) an element [a] ∈ Z/nZ has a multiplicative inverse [a]−1 if and only if gcd(a, n) = 1; this inverse is unique when it exists. P ROOF. Left to the reader. Note the last point is basically Corollary 2.2.8 restated in the language of congruence classes. 2.4. ANOTHER WAY TO WORK WITH CONGRUENCES: EQUIVALENCE CLASSES 35 We can also restate many of our other, earlier results beside just Corollary 2.2.8 in terms of congruence classes. Most of these shall be left to the reader, but here is one example: T HEOREM 2.4.10. Given a, b ∈ Z and n ∈ N, let d = gcd(a, n). Consider the equation [a] · x = [b] where x ∈ Z/nZ. Then (1) If d ∤ b, there are no solutions. (2) If d | b, this equation has exactly d solutions in Z/nZ. P ROOF. Left to the reader; this is really just Theorem 2.2.4 stated differently. 36 2. CONGRUENCES Exercises for §2.4. E XERCISE 2.4.1. When the rational numbers Q are described as in Example 2.4.3, we define addition by [(n, m)] + [(p, q)] = [(nq + mp, mq)] and multiplication by [(n, m)] · [(p, q)] = [(np, mq)], where n, m, p, q ∈ Z and neither m nor q is 0. Prove a version of Theorems 2.4.8 for this version of Q. What are the additive and multiplicative identities in this Q? Does every element (or nearly every element) of Q have an additive and multiplicative inverse – if so, give a for- mula for those inverses; if not, why not? E XERCISE 2.4.2. Restate the Chinese Remainder Theorem in terms of congruence classes and equations in various Z/nZ’s rather than congruences. E XERCISE 2.4.3. Prove the statements in this section which have proofs “left to the reader.” E XERCISE 2.4.4. State and prove congruence class versions of any results about con- gruences from sections 2.1 and 2.2 which do not have versions in this section. 2.5. EULER’S φ FUNCTION 37 2.5. Euler’s φ Function Euler made the following definition, and it was good. D EFINITION 2.5.1. Given n ∈ N, φ(n) = # ({m ∈ Z | 0 ≤ m < n and gcd(m, n) = 1}) . In other words, φ(n) counts the number of non-negative integers less than n which are relatively prime to n. This is called Euler’s φ function, or Euler’s totient function (“totient” rhymes with “quotient”; this name was give to it by the English mathematician Sylvester). Here’s one thing it is good for: T HEOREM 2.5.2. Given n ∈ N, φ(n) is the number of elements of Z/nZ which are (multiplicatively) invertible. P ROOF. This is just Theorem 2.4.9 part (5) One quite surprising fact about Euler’s totient function is that it is multiplicative, at least for relatively prime numbers: T HEOREM 2.5.3. Given n, m ∈ Z, if gcd(n, m) = 1 then φ(nm) = φ(n)φ(m). P ROOF. This is actually a nice application of the Chinese Remainder Theorem, as we shall see. Fix n, m ∈ N which are relatively prime. For k ∈ N, let (Z/kZ)∗ = {x ∈ Z/kZ | x is invertible} so φ(k) = #((Z/kZ)∗ ). (This notation means the number of things in the set (Z/kZ)∗ .) Now define the set of pairs (Z/nZ)∗ × (Z/mZ)∗ = {(a, b) | a ∈ (Z/nZ)∗ and b ∈ (Z/nZ)∗ } . Notice that #((Z/nZ)∗ × (Z/mZ)∗ ) = #((Z/nZ)∗ ) · #((Z/mZ)∗ ) = φ(n)φ(m), since each part of the pair is free to be whichever element it wants of the respective set, so the total number of pairs is the size of the first set times the size of the second. Therefore, if we can prove that (Z/nZ)∗ × (Z/mZ)∗ can be put into bijective correspondence with (Z/(nm)Z)∗ , then we will have φ(n)φ(m) = # ((Z/nZ)∗ × (Z/mZ)∗ ) = #((Z/(nm)Z)∗ ) = φ(nm) as desired, since bijective sets have the same number of elements. The correspondence is given by the function F : (Z/(nm)Z)∗ → (Z/nZ)∗ × (Z/mZ)∗ : [x]nm 7→ ([x]n , [x]m ) . 38 2. CONGRUENCES We must show F is well-defined, 1-1, and onto. Well-defined means that for some [x]nm , if y ∈ Z is another representative of the congruence class [x]nm , then ([x]n , [x]m ) = ([y]n , [y]m ) so that F is defined only by the class [x]nm , not by its choice of representative x. But this is easy, since y ∈ [x]nm means y ∼ =nm x so ∃k ∈ Z such that y − x = k(nm) = (km)n = (kn)m. These last two versions mean that y ∼ =n x and y ∼ =m x, so [x]n = [y]n and [x]m = [y]m , which means F is well-defined. Now let us assume that x, y ∈ Z have F ([x]nm ) = F ([y]nm), i.e., [x]n = [y]n and [x]m = [y]m . That is, z = x − y ∈ Z solves the system z≡0 (mod n) z≡0 (mod m) . But z = 0 also solves this system, and the Chinese Remainder Theorem tells us that so- lutions of this system (which is an appropriate system for the CRT since gcd(n, m) = 1) are unique modulo nm. Therefore, x − y ≡ 0 (mod nm), so x ≡ y (mod nm). Hence [x]nm = [y]nm , and thus F is 1-1. Now, given any pair ([x]n , [y]m ) ∈ (Z/nZ)∗ × (Z/mZ)∗ , consider the system of con- gruences z ≡ x (mod n) z≡y (mod m) , which system is again CRT-ready since gcd(n, m) = 1. Therefore there exists a solution to this system, call it z ∈ Z, and F ([z]nm ) = ([x]n , [y]m ). Thus F is onto as well. 2.5. EULER’S φ FUNCTION 39 Exercises for §2.5. E XERCISE 2.5.1. Compute φ(n) for n = 2, 3, 5, 7, 11, 13, 17. Make a general conjec- ture. Can you prove it? E XERCISE 2.5.2. Compute φ(n) for n = 2, 4, 8, 16, 32, 64. Make a conjecture about k φ(2 ) for k ∈ N. Prove it! CHAPTER 3 Prime Numbers Primes are the atoms out of which the more complicated, composite integers (the molecules, in this metaphor) are built. In this chapter we study some of their basic prop- erties, prove the aptly named Fundamental Theorem of Arithmetic, and go on to Wilson’s Theorem. 3.1. Basics and the FTA First of all, we make the D EFINITION 3.1.1. We say p ∈ N is prime if p > 1 and the only natural numbers which divide p are 1 and p. E XAMPLE 3.1.2. Some primes are 2, 3, 5, 7, 11, 13, and 17. Notice 2 is the only even prime (clearly – any other would be a multiple of 2 and hence could not be prime), and has some unusual properties – the joke is that “2 is the oddest prime.” The largest prime known to humans at the time of this writing is 257,885,161 − 1 which was proven to be prime in January of 2013 by a distributed computer program called GIMPS [the Great Internet Mersenne Prime Search] running on hundreds of machines across the Internet. In contrast, we also use the following term D EFINITION 3.1.3. A number c ∈ N which is greater than 1 and not prime is called composite. How far does a naive, brute-force check have to go in order to see if a number is composite? √ T HEOREM 3.1.4. If n is a composite, then it has a positive divisor d satisfying d ≤ n. P ROOF. Suppose n is composite. Then it has some divisor a ∈ N. Notice that n = a· na , √ so na ∈ N is also a divisor. But a and na cannot both be less than n, because if they were we would have n √ √ n=a· < n· n=n a which would be a contradiction. Hence either a or na is the divisor d promised by the theorem statement. 41 42 3. PRIME NUMBERS Euclid’s Lemma (Lemma 2.1.6) takes a particularly nice form if the divisor involved is prime: P ROPOSITION 3.1.5. Suppose p is a prime and a, b ∈ Z. If p | ab then p | or p | b. P ROOF. Notice that gcd(p, a) | p, therefore gcd(p, a) is either 1 or p since p is prime. But also gcd(a, p) | a, so either p | a or gcd(a, p) = 1. If p | a, we are done. If not, since therefore gcd(a, p) = 1, Euclid’s Lemma 2.1.6 tells us that p | b. A more general form of this is C OROLLARY 3.1.6. Suppose p is a prime, k ∈ N, and a1 , . . . , ak ∈ Z. Then if p | a1 . . . ak , it follows that p divides at least one of the aj . P ROOF. Left to the reader (use induction on k). This leads to the aptly named T HEOREM 3.1.7. The Fundamental Theorem of Arithmetic: Let n ∈ N, n ≥ 2. Then ∃k ∈ N and primes p1 , . . . , pk such that n = p1 . . . pk . Furthermore, if l ∈ N and q1 , . . . , ql are also primes such that n = q1 . . . ql , then l = k and the factorization in terms of the q’s is merely a reordering of that in term s of the p’s. P ROOF. We use the Second Principle of Mathematical induction for the existence part. The general statement we are proving is ∀n ∈ Z, n > 1 ⇒ S(n), where S(n) is the statement “∃k ∈ N and primes p1 , . . . , pk such that n = p1 . . . pk .” As the base case, say n = 2. Then k = 1 and p1 = 2 works. Now assume S(k) is true for all k < n. If n is prime, then k = 1 and p1 = n works. So suppose n is instead composite, with divisor d 6= 1, n. Then both d, nd < n, so by the inductive hypothesis ∃k, k ′ ∈ N and primes p1 , . . . , pk , p′1 , . . . , p′k′ such that d = p1 . . . pk and nd = p′1 . . . p′k′ . So then n is the product n = p1 . . . pk · p′1 . . . p′k′ of the k + k ′ primes. Hence S(n) is true as well, and so prime factorizations always exist. Now suppose n ∈ Z satisfies n > 1 and ∃k, l ∈ N and both primes p1 , . . . pk and q1 , . . . , ql such that p1 . . . pk = n = q1 . . . ql . Certainly p1 divides the left hand side of these dual expressions for n. Then by Corol- lary 3.1.6, p1 divides one of the qj , which means it must be that p1 = qj since they are prime. Removing the p1 from the left and the qj from the right, we get p2 . . . pk = n = q1 . . . qj−1 · qj+1 . . . ql . 3.1. BASICS AND THE FTA 43 Continuing in this way, either we get the uniqueness statement in the theorem, or we run out of p’s or q’s. However, we cannot run of of primes on one side before the other, because that would make a product of primes on one side equal to 1, which is impossible. 44 3. PRIME NUMBERS Exercises for §3.1. E XERCISE 3.1.1. Provide all the details of the proof of Corollary 3.1.6. E XERCISE 3.1.2. State and prove a theorem about the prime factorizations of numbers a, b ∈ N and of their gcd. E XERCISE 3.1.3. A number n ∈ Z, n > 1 is called square-free if it is not divisible by the square of any natural number other than 1. Prove that an n ∈ Z, n ≥ 1 is square-free if and only if it is the product of distinct primes. 3.2. WILSON’S THEOREM 45 3.2. Wilson’s Theorem In this section, we prove a nice theorem usually named after an 18th century English mathematician ... although it was actually first stated by Ibn al-Haytham nearly 800 years earlier. First, we need a L EMMA 3.2.1. Let p be a prime. Then n ∈ N equals its own inverse mod p if and only if p | n + 1 or p | n − 1, i.e., iff n ≡ ±1 (mod p). P ROOF. Say p is a prime and n ∈ N equals its own inverse mod p. That means that n = n · n ≡ 1 (mod p). By definition, p | n2 − 1 = (n + 1)(n − 1). By Proposition 3.1.5, 2 this means that either p | n + 1 or p | n − 1. For the converse, suppose n ≡ 1 (mod p) or n ≡ −1 (mod p). Then, by Theo- rem 2.1.3, n2 ≡ (±1)2 = 1 (mod p), so n equals its own inverse mod p. This Lemma is a key step in T HEOREM 3.2.2. Wilson’s Theorem Given p ∈ N such that p ≥ 2, p is prime iff (p − 1)! ≡ −1 (mod p). P ROOF. Suppose p is prime. Consider the terms of (p − 2)!: every one has an inverse mod p by Corollary 2.2.8, and only 1 equals its own inverse mod p (so would p − 1, but it is not in the product (p − 2)! by Lemma 3.2.1. Hence in (p − 1)! we can group the terms into pairs ((p − 3)/2 of these pairs) which are inverses mod p, leaving out only 1 and (p − 1). That is (p − 1)! ≡ 1(p−3)/2 (p − 1) ≡ p − 1 ≡ −1 (mod p) Conversely, suppose p ∈ N satisfies p ≥ 2 and (p − 1)! ≡ −1 (mod p). We can rewrite that congruence as (p − 1)! + 1 ≡ 0 (mod p), or p | ((p − 1)! + 1). Now let d ∈ N be a divisor of p such that d 6= p. It follows that d is one of the numbers in the product (p − 1)!, so d | (p − 1)!; also, since d | p and p | ((p − 1)! + 1), it follows that d | ((p − 1)! + 1). Therefore, d | ((p − 1)! + 1) − (p − 1)! = 1 by Theorem 1.3.9, which means d = 1. In other words, if d ∈ N is a divisor of p, it must be either p or 1, so p is prime. E XAMPLE 3.2.3. Let’s work through one direction of the proof for a simple case, say of p = 7. We start by finding all the inverses of the numbers 2, . . . , 6 (by brute force, in this small case): 2 · 4 ≡ 1 (mod 7) and 3 · 5 ≡ 1 (mod 7), meaning that 2−1 ≡ 4 (or4−1 ≡ 2) and 3−1 ≡ 5 (or 5−1 ≡ 3). Then (p − 2)! = 5! = 5 · 4 · 3 · 2 · 1 = (5 · 3) · (4 · 2) ≡ 1 · 1 ≡ 1 (mod 7) 46 3. PRIME NUMBERS which in turn means that (p − 1)! = 6! = 6 · 5! ≡ 6 · 1 ≡ 6 ≡ 7 − 1 ≡ −1 (mod 7) . Notice that Wilson’s Theorem can be used to build a test for primality: see if a number n satisfies (n − 1)! ≡ −1 (mod n) and, if so, n is prime. This is an entirely impractical test, but it is our first example of a simple computational congruence involving an integer which can tell us that that integer is prime. 3.3. MULTIPLICATIVE ORDER AND APPLICATIONS 47 3.3. Multiplicative Order and Applications In this section we prove two very useful results called Euler’s Theorem and Fermat’s Little Theorem (a special case of Euler’s). We do not follow the proof strategy of Euler and Fermat, however, instead using an approach inspired by abstract algebra and Lagrange’s Theorem in that subject. First we need the D EFINITION 3.3.1. Suppose n ∈ N and a ∈ Z satisfy n ≥ 2 and gcd(a, n) = 1. Then we define the multiplicative order of a in mod n (called just the order when the multiplicative and n can be understood from context) to be the smallest k ∈ N such that ak ≡ 1 (mod n). The order of a in mod n is written ordn (a). Let us verify something which should probably always be checked for a new definition: P ROPOSITION 3.3.2. Given relatively prime numbers n ∈ N and a ∈ Z with n ≥ 2, ordn (a) is well-defined. P ROOF. The problem in the definition of ordn (a) might be that there might not be any value of k ∈ N at all for which ak ≡ 1 (mod n). But notice that this is a congruence, so we are really only concerned with the elements k a up to their congruence class. So look at {[ap ]n | p ∈ N} and imagine putting the natural numbers p ∈ N into different boxes based on what is the corresponding congruence class [ap ] ∈ Z/nZ. Since there are infinitely many elements in N and only n elements in Z/nZ – n boxes – by the Pigeonhole Principle (Theorem 1.1.2) there must be two (actually, infinitely many pairs of) distinct values p, q ∈ N such that p and q end up in the same box, meaning [ap ] = [aq ]. Assume without loss of generality that p > q, so p − q ∈ N. We are given that gcd(a, n) = 1, so by Corollary 2.2.8, a−1 exists in mod n. Thus ap−q ≡ ap (a−1 )q ≡ aq (a−1 )q ≡ a0 ≡ 1 (mod n) (replacing ap with aq in the middle of this congruence since we know that [ap ] = [aq ], which means ap ≡ aq (mod n)) and therefore the set of k ∈ N for which ak ≡ 1 (mod n) is non-empty. The order is the smallest such value, which exists by the Well-Ordering Principle. Here now is a theorem from abstract algebra (Lagrange’s Theorem) translated into the current context: T HEOREM 3.3.3. Given relatively prime numbers n ∈ N and a ∈ Z, ordn (a) | φ(n). P ROOF. Start by looking at the congruence classes [a], [a2 ], [a3 ] . . . . They go up to [aordn (a) ] = [1] and then start to repeat, so the set hai = {[aj ] | j ∈ N, j ≤ ordn (a)} 48 3. PRIME NUMBERS is a set of ordn (a) elements of Z/nZ (in group theory, this set hai is called the cyclic subgroup of (Z/nZ)∗ generated by a). Notice that because aordn (a) ≡ 1 (mod n), [1] ∈ hai. Also, since if a−1 is the inverse of a mod n, (a−1 )k is the inverse of ak for k ∈ N, we see that hai ⊆ (Z/nZ)∗ . To finish, we shall show that (Z/nZ)∗ is made up of some number, say m, of pieces, which we shall call cosets, each of which is bijective with hai. These pieces will thus all have ordn (a) elements, so m · ordn (a) = # ((Z/nZ)∗ ) = φ(n), from which our desired result follows. A coset of hai is a set of the form x hai = {[x] · [aj ] = [x aj ] | j ∈ N, j ≤ ordn (a)} where [x] ∈ (Z/nZ)∗ . We have observed that [1] ∈ hai, so ∀[x] ∈ (Z/nZ)∗ , [x] ∈ x hai, meaning that every congruence class [x] ∈ (Z/nZ)∗ is in some coset, i.e., [ Z/nZ ⊆ x hai . [x]∈(Z/nZ)∗ In fact, each coset x hai is bijective with hai. The bijection is the map fx : hai → x hai defined by fx ([aj ]) = [xaj ] for j ∈ N with j ≤ ordn (a). This map is a bijection because it has an inverse fx−1 , coming from the inverse mod n of x. Now suppose x hai and y hai are two cosets. I claim that either x hai = y hai or T T T x hai y hai = ∅. For this, suppose x hai y hai = 6 ∅, so ∃[z] ∈ x hai y hai. That means that ∃j, k ∈ N such that j ≤ ordn (a), k ≤ ordn (a), and [xaj ] = [z] = [yak ]. In other words, xaj ≡ yak (mod n) . Without loss of generality, assume j ≤ k. Then if we multiply both sides of the above congruence by (a−1 )j , we have x ≡ yak−j (mod n). But this means that every element [xap ] ∈ x hai is can be expressed as [yap+k−j ] ∈ y hai, and every element [yaq ] ∈ y hai is can be expressed as [xaq+j−k ] ∈ x hai. Thus x hai = y hai. That was a lot of work, but now we get the famous theorems of Fermat’s Little and of Euler very easily. T HEOREM 3.3.4. Euler’s Theorem Say a ∈ Z and n ∈ N satisfy gcd(a, n) = 1. Then φ(n) a ≡ 1 (mod n). P ROOF. The previous theorem, 3.3.3, told us that ordn (a) | φ(n), so ∃m ∈ Z such that m · ordn (a) = φ(n). By the definition of order, this means m aφ(n) ≡ am ordn (a) ≡ aordn (a) ≡ 1m ≡ 1 (mod n). 3.3. MULTIPLICATIVE ORDER AND APPLICATIONS 49 C OROLLARY 3.3.5. Fermat’s Little Theorem If p is a prime and a ∈ Z satisfies gcd(p, a) = 1 then ap−1 ≡ 1 (mod p). P ROOF. We have seen that for primes p, φ(p) = p − 1, so there is very little to do here. Sometimes one sees Fermat’s Little Theorem in the following, different form: T HEOREM 3.3.6. If p is a prime then ∀a ∈ Z, ap ≡ a (mod p). P ROOF. If gcd(a, p) = 1, then by Fermat’s Little, ap−1 ≡ 1 (mod p). Multiplying both sides of this congruence by a yields the desired result. If instead gcd(a, p) 6= 1, it must be that gcd(a, p) = p since that gcd is a divisor of p, which is prime. The gcd is also a divisor of a, so in fact p | a. But then a and all its powers are congruent mod p to 0, so ap ≡ 0 ≡ a (mod p). 50 3. PRIME NUMBERS Exercises for §3.3. E XERCISE 3.3.1. What are the remainder when 15! is divided by 17 and the remainder when 2 · (26!) is divided by 29? E XERCISE 3.3.2. We know 17 is prime (it’s just a wonderful number, isn’t it?). But just to be sure, use Wilson’s Theorem to prove it. E XERCISE 3.3.3. Can we build a test for primality out of Fermat’s Little Theorem? If so, would it be better (more efficient) than one based on Wilson’s Theorem? How would it work? Write a clear, formal statement of your proposed primality test. If you know a programming language, write some code to try out your primality test. If not (or, in any case, after the programming), do a little research to see if this question has already been investigated and, if so, what was the conclusion. Give either a formal statement of the test and formal result describing its efficacy or a counter-example to your test that you found in the literature. E XERCISE 3.3.4. Suppose p is an odd prime. Prove that if the quadratic congruence 2 x + 1 ≡ 0 (mod p) has a solution, then p ≡ 1 (mod 4). [Hint: Apply Fermat’s Little Theorem to a solution a of the congruence, then multiply and divide by 2 in the power.] 3.4. ANOTHER APPROACH TO FERMAT’S LITTLE AND EULER’S THEOREMS 51 3.4. Another Approach to Fermat’s Little and Euler’s Theorems There is another way to think about these theorems which we shall explain here, be- cause it usefully fills out our understanding of multiplication in Z/nZ. In this case, we shall consider the theorems in the reverse order to what we used above – which is in fact more historically accurate. T HEOREM 3.4.1. Fermat’s Little Theorem, Redux If p is a prime and a ∈ Z satisfies gcd(p, a) = 1 then ap−1 ≡ 1 (mod p). P ROOF. Let’s start by proving that the set Ma = {[0]p , [a]p , [2a]p , . . . , [(p − 1)a]p } of multiples of [a]p in Z/pZ has p elements. There are only p congruence classes named between the { and }, so the set cannot have more than p elements. Now look at two elements in this set, call them [ja]p and [ka]p with 0 ≤ j, k < p, and suppose they are equal. That means that ja ≡ ka (mod p), or p | (j − k)a. By Proposition 3.1.5, this means that either p | (j − k) or p | a. Since gcd(p, a) = 1, we cannot have p | a. Thus p | (j − k). But 0 ≤ j, k < p tells us that −p + 1 < j − k < p − 1 and the only multiple of p in this range is 0. Therefore j = k, and this means that all of the elements in the above description of Ma are distinct, so indeed #(Ma ) = p. But we already know that Z/pZ itself has p elements, which means Ma above is simply another way of describing Z/pZ. For our next step, let’s multiply together all the nonzero elements of Ma , or of Z/pZ since we know it’s the same thing: a · 2a · · · · · (p − 1)a ≡ 1 · ·2 · · · · · (p − 1) (mod p) . Rearranging these terms, we see ap−1 (p − 1)! ≡ (p − 1)! (mod p) or p | (ap−1 − 1) (p − 1)!. Since p being prime means gcd(p, (p − 1)!) = 1, it follows by Proposition 3.1.5 that p | ap−1 − 1. That is, ap−1 ≡ 1 (mod p), as desired. The above is quite similar to the proof that Euler gave of Fermat’s Little Theorem (actually, Fermat gave no proof at all – not unlike his famous Last “Theorem”). A very similar strategy can also prove his own theorem: T HEOREM 3.4.2. Euler’s Theorem, Redux Say a ∈ Z and n ∈ N satisfy gcd(a, n) = 1. Then aφ(n) ≡ 1 (mod n). 52 3. PRIME NUMBERS P ROOF. Recall that the set (Z/nZ)∗ of (multiplicatively) invertible elements in Z/nZ can be written as {[b1 ]n , . . . , [bφ(n) ]n } where the numbers b1 , . . . , bφ(n) , satisfying 1 ≤ b1 < · · · < bφ(n) < n, are all relatively prime to n. We claim that we can also describe (Z/nZ)∗ as the set Ma∗ = [b1 a]n , . . . , [bφ(n) ]n . Note first that since each bj is relatively prime to n, and so is a, it follows that bj a is relatively prime to n as well – the primes which divide bj a divide either bj or a, and there are no such primes which also divide n. Therefore Ma∗ ⊆ (Z/nZ)∗ . Now notice that if [bj a]n = [bk a]n for 1 ≤ j, k ≤ φ(n) then n | (bj − bk )a. By Euclid’s Lemma 2.1.6, since gcd(a, n) = 1, we must have n | (bj − bk ). However, since 1 ≤ b1 < · · · < bφ(n) < n, it follows that −n + 1 < bj − bk < n − 1, and the only multiple of n in that range is 0. Thus bj = bk and we conclude that all of the elements in the above description of Ma∗ are distinct. Since there are φ(n) such elements and Ma∗ is a subset of (Z/nZ)∗ , which has only φ(n) elements, it must be that Ma∗ = (Z/nZ)∗ . As in the previous proof, we conclude that the products of all the elements in Ma∗ and in (Z/nZ)∗ must be the same: b1 a · · · · · bφ(n) a ≡ b1 · · · · · bφ(n) (mod n) . Regrouping terms again, we see aφ(n) b1 · · · · · bφ(n) ≡ b1 · · · · · bφ(n) (mod n) . Now the b’s are all invertible mod n, so we can multiply by the inverses, one by one, until we are left with aφ(n) ≡ 1 (mod n) which finishes Euler’s Theorem. 3.4. ANOTHER APPROACH TO FERMAT’S LITTLE AND EULER’S THEOREMS 53 Exercises for §3.4. E XERCISE 3.4.1. Let n, a, and b1 , . . . , bφ(n) be as in the proof of Euler’s Theorem. Show that b1 · · · · · bφ(n) ≡ ±1 (mod n). CHAPTER 4 Cryptology Here are some Greek roots: kryptos, κρυπτ oς: secret, hidden logos, λóγoς: word, study, speech graph, γράϕω: write, written From these (and others), English gets the words cryptosystem: a set of algorithms for protecting secrets cryptography: work done to make cryptosystems cryptanalysis: work done to circumvent the protections of cryptosystems cryptology: the union of cryptography and cryptanalysis, often abbreviated simply to crypto. Beware that cryptography is widely (but inappropriately!) used as a synecdoche for cryptology. (This is not unlike the widely understood incorrect usage of the word hacker.) We will try to use these words more carefully. With that understood, we start with a little elementary cryptology in this chapter. There will be very little number theory, but we will set up some terminology and simple examples of cryptography and the corresponding cryptanalysis, with an emphasis on the old, historic, systems which are no longer viable in the modern age. Later chapters will come around quickly to modern, number theoretic techniques in crypto. 4.1. Some Speculative History Perhaps there was a form of deception that preceded language – certainly many a house pet has feigned innocence despite the clear evidence of involvement in stealing treats. And even apiologists may not know if some lazy bees make up a story about a long excursion to a new flower patch when their Queen demands an accounting. But among homo sapiens, probably as soon as there was language, there was lying. Of course, when two humans are face to face, both parties have some control, such as: the listener can make an attempt to evaluate the trustworthiness of speaker, they can both form their own judgements of the other’s identity and therefore choose what they wish to share with each other, and the words of the speaker pass directly from their lips to the 55 56 4. CRYPTOLOGY listener’s ears without the possibility of change of meaning in flight (absent considerations of ambient noise and so on). A great deal changed with the invention of writing more than 5000 years ago. Words frozen in physical form, and the ideas they represent, can be taken and shared with a wide range of parties other than those with whom the original author wanted to communicate. In addition, if an author is not able to hand her work directly to the intended reader and instead the written words are out in the world on their own for a while, then both intended communicants can no longer be sure that the other is who the writing claims them to be nor that the writing remains the unchanged symbols that the other party originally set down. Let us formalize some of these issues of information security (as it is called now), in the context of a message to be sent from someone named Alice to someone named Bob. The role of the possibly disruptive and overly intrusive environment is played in our little drama by Eve. (Traditionally one skips directly to a character whose name starts with E to symbolize both the environment and also someone who is potentially an eavesdropper 1.) Here is some basic terminology: D EFINITION 4.1.1. Confidentiality means that only the intended recipient can extract the content of the message – Alice wants only Bob to get her message and not Eve, no matter if she listens on the eavesdrop or intercepts the message as it travels in some form from Alice to Bob. D EFINITION 4.1.2. Message integrity means that the recipient can be sure the message was not altered – Bob wants to know that what he gets is what Alice wrote, not what the mischievous Eve intended to change it into. D EFINITION 4.1.3. Sender authentication means that the recipient can determine from the message the identity of the sender – Bob wants to be sure this message did in fact originate with Alice. D EFINITION 4.1.4. Sender non-repudiation means that the sender should not be able to deny sending that message – Bob wants to be able to hold Alice to her promises. Note that Alice and Bob may actually be the same person sending a message from a past self to their future self. For example, someone may want to keep records which must be confidential and whose integrity must be reliable, even if there is a break-in to the site keeping those records. One of the earliest schemes attempting to achieve confidentiality was probably used around the 7th century BCE in Greece by military commanders who wanted to get dis- patches from far away soldiers, and to send them orders, in a way that even if the messenger 1 Eavesdropper apparently comes from the Old English yfesdrype, meaning literally one who stands on the eavesdrop [ground where water drips from the eaves of the roof] to listen to conversations inside a house. 4.1. SOME SPECULATIVE HISTORY 57 is captured, the message they carry will not be readable by the enemy. They used a device called a scytale (the “c” is hard and the word rhymes with “Italy”), which was a cylindrical stick – maybe a spear shaft – (σκυτ άη means baton in ancient Greek), around which a long strip of parchment was wrapped in a spiral. A message could then be written in parallel rows along the length of the scytale, so that when the strip was unwound, the letters were all jumbled and the message was unreadable. F IGURE 4.1.1. A scytale in use.2 On the other end, assuming the receiver also has a scytale, when the strip is wound in a spiral and read lengthwise, the message reappears, as if by magic. Before we go on, a few more technical terms: D EFINITION 4.1.5. The message that Alice wishes to send to Bob, in its actual original (and final) form is called the plaintext. D EFINITION 4.1.6. An algorithm that Alice uses to transform (obfuscate) the plaintext into a form that will remain confidential even if observed by Eve while in flight is called a cipher. We also say that Alice encrypts the (plaintext) message. D EFINITION 4.1.7. After the message has been encrypted, it is called the ciphertext. D EFINITION 4.1.8. When Bob receives the ciphertext, he applies another algorithm to decrypt it and recover the plaintext. 2 Image by DMGualtieri, CC-BY-SA-3.0 http://creativecommons.org/licenses/by-sa/3.0, via Wikimedia Commons, downloaded from https://commons.wikimedia.org/wiki/File%3AScytale.png 58 4. CRYPTOLOGY Graphically: Basic crypto terminology: Alice on public network Bob plaintext/cleartext message m encrypts m to c transmits c ciphertext c receives c decrypts c to recover plaintext m Suppose a spy sees scouts in the field wrapping strips of parchment around their spears and writing down the length of the spear: meaning Eve learns the encryption algorithm. Although the algorithm is known, confidentiality may be preserved: D EFINITION 4.1.9. Additional information (some of which is) used in encryption and (some of) which is necessary for successful decryption is called a key. In the case of encryption with a scytale, the diameter of the scytale is the key. In fact, there is some conjecture that the scytale was also a simple from of authentication: only Alice had the matching scytale to the Bob’s, so if Bob could read any coherent sequence of letters at all along his scytale, he knew Alice had sent it. Even with a key, the vulnerabilities of this cryptosystem are fairly clear. And Athens did lose the Peloponnesian War. Many hundreds of years of cryptology have shown that in fact it is better to reveal the algorithms of your cryptosystem to the public, and only to keep the key for a particularly communication channel (between Alice and Bob, say) secret. There are many reasons for this, the primary one probably being that the author of a cryptosystem can never be sure they have thought of all the possible methods of cryptanalysis which will be used against it. So the system’s author is better off letting the general crypto community do its best against the system, and only the systems which have withstood such assault should be used. After all, this is basic to the scientific method itself: scientists publish all the details of their experiments, so others can try them as well and give independent verification or find something to criticize. This idea – that the security of a cryptosystem should be based on the secrecy of the key but not of the algorithm – has come to be called Kerckhoff’s Principle, after a set of cryptosystem design ideas written down in 1883 by a French military cryptography. It is held in opposition to cryptosystems, thought of as very weak, which rely upon no one ever finding out the algorithms: such systems rely instead on the ill-advised security through obscurity paradigm. 4.1. SOME SPECULATIVE HISTORY 59 Exercises for §4.1. E XERCISE 4.1.1. Suppose for some k ∈ N, the cleartext Alice wishes to send consists of some symbols m1 , . . . , mk . Assume that the scytale she uses has a diameter such that s ∈ N letters can be written on each turn of the spiral of parchment, when it is wrapped around the scytale. Write one or more formulæ describing what the letters c1 , . . . , ck of the ciphertext will be. You should assume s < k and, if you like, that they have any particular useful relation- ship such as s | k. E XERCISE 4.1.2. Even if the scytale cryptosystem were strong, there may be problems using it for authentication. Describe how authentication might fail depending upon the message being sent – give an example or two of “bad” messages for authentication. E XERCISE 4.1.3. The scytale cryptosystem seems weak. Describe how you would cryptanalyze it. 60 4. CRYPTOLOGY 4.2. The Caesar Cipher and Its Variants Another system which dates to ancient times was supposedly used by Julius Caesar. D EFINITION 4.2.1. Alice takes her message, removes all spaces and punctuation, and puts it all in one case (maybe upper case). Then she moves each letter k places down the alphabet, wrapping around from Z to A if necessary, where k ∈ Z is a fixed number known to both Alice and Bob but no one else, called the key. To decrypt, Bob simply moves each letter k places earlier in the alphabet, wrapping past A to Z if necessary: i.e. Bob encrypts the ciphertext with key −k to get the plaintext. This is called the Caesar cryptosystem. Apparently Julius Caesar usually used the key value k = 3. His nephew Octavian, who later became the emperor Augustus, liked to use k = −1. When using what is called the Latin alphabet (which is not what was used in ancient Rome, though), with its 26 letters, there is one particularly nice key value: 13. The nice thing about that value is encryption and decryption are exactly the same transformation. In modern times, this transformation is called ROT13, and it has a small role in the modern history of the Internet. In particular, posts on early chat rooms and bulletin boards would sometimes want to have a bit of content that should not be automatically available to anyone who looks at the post, but would be there for the determined reader (such as, for example, a spoiler in a review of some popular new game, book, or film). These were often included in the post, but only after they had been run through ROT13. A few commercial products used ROT13 for actual security, despite it actually being completely insecure, such as certain parts of some versions of the Windows operating sys- tem. The Caesar ciphers were completely broken (in a number of ways; see §4.3, below) be- fore 1000CE, but a descendent was developed in the late Middle Ages and was considered the state of the cryptological art through the early modern period. D EFINITION 4.2.2. Once again, Alice takes her message, removes all spaces and punc- tuation, and puts it all in one case (maybe upper case). This time, the key ~k = (k1 , . . . , kℓ ) which Alice and Bob share is an ℓ-tuple, for ℓ ∈ N, of integers k1 , . . . , kℓ ∈ Z. To encrypt, Alice steps through her plaintext message and key sequence both one letter at a time, moving each plaintext letter forward (wrapping from A to Z if necessary) by the number of letters specified by the corresponding key number. If she runs out of key numbers, she simply starts again at the beginning of the key sequence. To decrypt, Bob simply moves each letter k places earlier in the alphabet, wrapping past A to Z if necessary: i.e. to decrypt, Bob encrypts with key −~k = (−k1 , . . . , −kℓ ). This is called the Vigenère cryptosystem. 4.2. THE CAESAR CIPHER AND ITS VARIANTS 61 Traditionally, the key in Vigenère is a written down, memorized, and shared between Alice and Bob in the form of a word. Then when encrypting or decrypting, the letters of the keyword are “added” to the message letters, as described above, under the convention that A = 1, B = 2, etc.. Notice that Vigenère with key length ℓ is essentially ℓ Caesars in parallel – in fact, if ℓ = 1, they are exactly the same cryptosystem. It therefore turns out that Vigenère is essentially ℓ + 1 times harder to crack than Caesar, the “ + 1” coming from the fact that Eve doesn’t even know ℓ. Nevertheless, after a couple of hundred years in which it was considered unbreakable and used as the principle diplomatic cipher in the courts of Europe, approaches to breaking Vigenère were developed. As one final variant of the Vigenère cipher, suppose we go in the opposite direction from the ℓ = 1 extreme, and instead take ℓ as long as possible. D EFINITION 4.2.3. A one-time pad is a Vigenère cryptosystem in which the key is as long as the message, chosen randomly, and never re-used. [The key is also called the one-time pad in this cryptosystem3.] (A one-time pad is sometimes called – inappropriately, given the true intellectual his- tory of the cryptosystem – a Vernam Cipher .) It is important to have a good key sequence in a one-time pad cryptosystem. The good news is that one can prove that with a truly random one-time pad, the resulting cryptosys- tem is in fact perfectly secure. In computer science, this is called information theoretically secure , because the proof of security does not rely upon any assumptions about the com- putational resources available to the attacker. [See [Sha49] and [Sha48] for this proof.] Other cryptosystems we shall meet below are only secure if we assume the attacker has access to a computer of a particular type (a probabilistic polynomial-time Turing machine is the usual assumption; this is the subject within computer science called computational complexity, see, e.g., [AB09]). It is also important never to use the same pad more than once. If so, an attacker can take the letter-by-letter difference of the two ciphertexts and this will completely remove the pad from the calculation. In fact, at that point the message can usually be determined quickly. In modern times, after the advent of digital communications networks, messages are written as computer data, so everything is stored (and transmitted) as bits, meaning 1s and 0s. With bits, when we do the equivalent of what was described above as shifting a letter along the alphabet, wrapping from Z to A if necessary, we are either doing nothing, if the shift is by an even amount, or simply switching 0 and 1 otherwise. 3Another synecdoche! 62 4. CRYPTOLOGY Another way of saying that is to say that is that if the message and the key are both written all out in bits – think of them as the elements [0], [1] ∈ Z/2Z – then the encryption consists exactly of adding the corresponding bits mod 2. A good one-time pad is therefore a very random, and very long, string of bits (congruence classes in Z/2Z). The big problem with one-time pads is key distribution: Alice and Bob must share very large one-time pads before they ever start to communicate, large enough to have as many letters (or bits, in the computer age) as all of their future messages. This does suggest an interesting approach to cryptography: find a way that Alice and Bob can share a small secret out of which they can generate arbitrarily long sequences of bits, both getting the same sequence, which seem to any attacker to be entirely random. If this were possible, they would use these pseudorandom sequences (called this because they are not actually random – after all, they can be determined in advance by both Alice and Bob using their small starting secret – they only appear so to all attackers) as one-time pads. [See [Lub96] for more about pseudorandomness.] 4.2. THE CAESAR CIPHER AND ITS VARIANTS 63 Exercises for §4.2. E XERCISE 4.2.1. ROT13 was supposedly nice because it reduced the amount of coding which needed to be done: the same program would decrypt as the one which encrypts, since doing ROT13 a second time undoes the first ROT13’s transformation. Is this true for other keyed Caesar ciphers? Make a conjecture about whether, given a Caesar cipher key k ∈ Z and a message m, there always exists an n ∈ N such that n times z }| { eCk ◦ · · · ◦ eC k (m) = m where eCk is the function which does the Caesar encryption with key k. If so, find an expression for the smallest such n, which depends (if necessary) on k, m, and the size of the alphabet in which m is written. E XERCISE 4.2.2. Continuing the previous exercise: Suppose now ~k = (k1 , . . . , kℓ ) is an ℓ-tuple, for ℓ ∈ N, of integers k1 , . . . , kℓ ∈ Z and e~Vk is the Vigenère encryption function with key ~k. Find if for all messages m, there exists an n ∈ N such that n times z }| { e~Vk ◦ · · · ◦ e~Vk (m) = m and, if so, find an expression for the smallest such n, which depends (if necessary) on ~k, m, and the size of the alphabet in which m is written. E XERCISE 4.2.3. Suppose Eve has a twin Vev, and they are both looking at an inter- cepted ciphertext c sent by Alice to Bob. They think that those crazy lovebirds Alice and Bob have used a one-time pad to encrypt their communications, but Eve and Vev both do lots of calculations and think they have managed to correctly decrypt c. Unfortunately, they have different (non-gibberish!) values me and mv which Eve and Vev, respectively, think were the plaintext that Alice meant to send to Bob. What must have been the one-time pads pe and pv which they were somehow calculating (or guessing) that Alice used for their respective encryptions? Could they have come up with arbitrary proposed decryptions me and mv , or is there some relationship between the proposed decryptions, the proposed pads pe and pv , the true plaintext, and Alice’s actual one-time pad? Work out a particular, concrete example: if Alice’s original cleartext message m was aardvark, would it be possible for Eve to think the message was me = iloveyou while Vev thinks it is mv = ihateyou? If so, give an example of what would be the corresponding ciphertext c, Alice and Bob’s one-time pad pAB , the proposed decrypts me and mv , and the proposed one-time pads pe and pv . 64 4. CRYPTOLOGY 4.3. First Steps into Cryptanalysis: Frequency Analysis The Caesar cipher seems very weak. But looking at a ciphertext, such as wrehruqrwwrehwkdwlvwkhtxhvwlrq zkhwkhuwlvqreohulqwkhplqgwrvxiihu wkhvolqjvdqgduurzvrirxwudjhrxviruwxqh ruwrwdnhdupvdjdlqvwdvhdriwurxeohv dqgebrssrvlqjhqgwkhp it is hard to know where to start – this hardly seems to be English at all. Perhaps we should start with the Caesar cipher itself, assuming (anachronistically) that Caesar was following Kerckhoff’s Principle, or that (more chronistically) spies had deter- mined the cryptosystem but not the key. One thing we notice right away is that what we do not know, that key, is such a small thing to be giving us such trouble. In fact, if we tried a random choice of key, nearly four times out of 100 (using the modern “Latin alphabet” of 26 letters – in Caesar’s time, the actual Latin alphabet had only 23, so even better!) would, purely by dumb luck, give us a correct decryption. Formally, we are talking about D EFINITION 4.3.1. The set of all valid keys for some cryptosystem is called its keyspace. E XAMPLE 4.3.2. The keyspace of the Caesar cipher cryptosystem is the alphabet. E XAMPLE 4.3.3. The keyspace of the Vigenère cipher cryptosystem with a key length of ℓ is all sequences of ℓ letters in the alphabet; therefore, it has N ℓ elements, if the alphabet is of size N. If the precise key length is not known, but it is known to be less than or equal P to some bound L ∈ N, then there are Lj=1 N j possible keys. E XAMPLE 4.3.4. Any sequence of letters of length M is a possible key (pad) for a one- time pad encryption of a message of length M. Therefore there are N M possible one-time pads for a message of length M over an alphabet of size N. We computed the sizes of these various keyspaces because one thing that seemed weak about the Caesar cipher was how small was its keyspace. In fact, a better strategy than simply guessing at random would be to try all possible keys – there are only 26 (or 23 in Julius’s time)! – and see which one gives the correct decryption. Such an approach is called a brute-force attack [or exhaustive search]. Even in Caesar’s time, the Caesar cipher keyspace is so small that Eve could check all possible keys and see which yielded the cleartext of a message from Alice to Bob. Things are a little more difficult for the Vigenère cipher. For example, there are nearly 12 million keys to try if the key length is known to be five. Before the computer age, this would have been completely intractable. Even in the 21st century, where it would be nearly instantaneous to human eyes for a computer to generate all of those possible decryption, 4.3. FIRST STEPS INTO CRYPTANALYSIS: FREQUENCY ANALYSIS 65 there is a problem: those human eyes would have to look over the 12 million possibilities and pick out which was the valid decryption. Amazon’s Mechanical Turk (see https://www.mturk.com/mturk/welcome) might be able to marshal enough human eyes to solve a single brute force Vigenère de- cryption. But a better approach would be to write a computer program that can distinguish gibberish from a real message Alice might have sent to Bob – then, in a matter of moments, a brute force attack would succeed. If we are to distinguish gibberish from a real message, we need to know what possible real messages are. This motivates the following D EFINITION 4.3.5. The set of possible messages for an encrypted communication is called the message space. Without some structure for the message space, cryptanalysis can become nearly impos- sible. For example, if Alice is e-mailing an encrypted version the combination of a safe to Bob, the message space is not too large but has no identifiable features: a brute force attack would have no way of telling which potential decryption was the actual combination. But suppose we assume that the message space of Alice and Bob’s communication is a set of short (or long, which would be better) English texts. There is quite a bit of structure in English texts – we human speakers of English certainly recognize valid English. The question is whether we can automate the process of detection of valid English texts. Looking back at the example Caesar encryption given at the beginning of this section, we noticed that it did not look like English – but in what way? One thing was, of course, that it does not have any English words in it. Someone who knows English knows most of the words of that language and can recognize that this text has none of them, except maybe “up” and “i” (several times). This suggests an approach: take a list of all words in English, with all variant and derivative forms, add in some possible short random sequences that may have been included in otherwise standard English (where Alice was quoting some nonsense she’d heard, or was writing down a graffito she saw on the side of a subway car, etc.), and compare them to possible decryptions of the ciphertext. This scheme is not very practical. But if the message space were small enough, something like this would be reasonable. Looking back at the message again, another thing that immediately strikes the eye of a reader of English is that it does not seem to have the right letters, or the right combinations of letters. For example, there do not seem to be enough vowels. Looking at how often letters occur in a text compared to what we expect is called frequency analysis. Its use in cryptanalysis was apparently first described by the Muslim philosopher and mathematician al-Kindi4 in his 9th century work Manuscript on Deciphering Cryptographic Messages. 4 Al-Kindi is a very interesting figure from early Muslim intellectual history: e.g., he seems to have brought Hindu numbers, with their place-value notation, into the Muslim world. 66 4. CRYPTOLOGY Here is a table of the frequencies of the letters in a bit of standard English (some of the plays of Shakespeare): F IGURE 4.3.1. Frequency counts of letters in several Shakespeare plays Notice that the letter ‘e’ is the most common, by a bit. So a first cryptanalytic approach using frequency analysis would simply be to use the Caesar decryption key which moved the most frequent letter in the ciphertext to be ‘e’. With the ciphertext at the beginning of this section, that would yield the decryption: ezmpzcyzeezmpesletdespbfpdetzy hspespcetdyzmwpctyespxtyoezdfqqpc espdwtyrdlyolcczhdzqzfeclrpzfdqzcefyp zcezelvplcxdlrltydeldplzqeczfmwpd lyomjzaazdtyrpyoespx Not so good. Apparently looking only at the most frequent letter is insufficient: the most frequent letter in the plaintext of this message was not ’e’. Instead let us look at the entire frequency table for this ciphertext. 4.3. FIRST STEPS INTO CRYPTANALYSIS: FREQUENCY ANALYSIS 67 F IGURE 4.3.2. Letter frequencies in a sample Caesar ciphertext Unfortunately, this is not exactly the standard (Shakespearean) English frequency dis- tribution, nor is it even a shifted (with wrapping past 25 back to 0) version of Shakespeare’s distribution. But perhaps one of its shifts is fairly close to Shakespeare. This suggest the following more refined cryptanalytic approach: try all possible decryp- tions (which is not so bad for the Caesar cipher, because of its small keyspace) and choose the one whose entire letter frequency table is closest to the standard English letter frequency table – so, not looking only at the most frequent letter, but looking at the whole frequency distribution; and not looking for exact match, but merely for the best approximate match. Which brings up the question of what is the best way to measure the distance between two distributions. One commonly used approach is to measure the D EFINITION 4.3.6. The total square error is a distance between two distributions f and g given by 25 X d(f, g) = (f (j) − g(j))2 j=0 where we are always working with our distributions defined on the letters in the form ‘a’=0, ‘b’=1, etc.). Minimizing this distance is equivalent to least squares in statistics or linear algebra. Following this strategy with the ciphertext from the beginning of this section, we get the following 68 4. CRYPTOLOGY F IGURE 4.3.3. Square errors with all decryption keys for example Caesar ciphertext The minimal square error will then come from decryption key 23 (corresponding to an original encryption key of 3; this was done by Julius himself, anachronistically). The corresponding cleartext then would have been tobeornottobethatisthequestion whethertisnoblerinthemindtosuffer theslingsandarrowsofoutrageousfortune ortotakearmsagainstaseaoftroubles andbyopposingendthem which looks pretty good. Interestingly, this approach to non-gibberish detection is fairly robust, even with quite short texts which might be expected to have frequency distributions rather far away from the English standard. For example, with ciphertext rkkrtbrkeffespkyvtifjjifruj we have 4.3. FIRST STEPS INTO CRYPTANALYSIS: FREQUENCY ANALYSIS 69 F IGURE 4.3.4. Square errors with all decryption keys for Caesar ciphertext rkkrtbrkeffespkyvtifjjifruj from which we conclude that the decryption key is 9, corresponding to an encryption key of 17. Thus the cleartext must have been attackatnoonbythecrossroads which seems like the kind of thing Julius might have said. So much for cryptanalysis of the Caesar cipher. Moving on to Vigenère, the first prob- lem we face is finding out the key length. Because if we knew that key length ℓ, we would divide the ciphertext into ℓ shorter texts consisting of every ℓth character starting with the first, then with the second, etc., up to starting with the ℓth . Applying the automatic Caesar cracker described above, we would have the ℓ digits of the key and thus the plaintext. To make things easier, suppose we take all of Hamlet and encrypt it with Vigenère using the keyword hippopotomonstrosesquipedaliophobia, corresponding to numerical key k = (7, 8, 15, 15, 14, 15, 14, 19, 14, 12, 14, 13, 18, 19, 17, 14, 18, 4, 18, 16, 20, 8, 15, 4, 3, 0, 11, 8, 14, 15, 7, 14, 1, 8, 0) For example, the part of the ciphertext corresponding to the quote used above in Caesar cracking would be 70 4. CRYPTOLOGY hdxajnioomjvdtfvstrqitmfozlxjj kcmiosgpenjjbgxmcmtfzltmaudofpmwg odsntxuuhwjywmrjpnieosoqlfbpjoqyyljia cmbdaozawminabtdhrtyndlncugkflswh vjrwgdwddoeicznymcyl If Eve was hoping this was only Caesar-encrypted, so the Vigenère key length was ℓ = 1, she would have the following letter frequencies: F IGURE 4.3.5. Letter frequencies in the Vigenère encryption of Hamlet using key hippopotomonstrosesquipedaliophobia Notice that compared to Figure 4.3.2, the values show much less variation – in fact, there are no zero values and no values which are much larger than all others. The reason is that this distribution is the average of 16 different shifts, coming from the 16 different letters in the keyword hippopotomonstrosesquipedaliophobia, of the standard English distribution Figure 4.3.1. This averaging smooths out all of the characteristic shape of the distribution useful in identifying when its decrypting shift is correct. Nevertheless, the Caesar cracker will give some key value for each one of the choices Eve makes for a possible value of ℓ – there will always be a value of the shift for each letter which minimizes the square error. The minimum will not be very small, though, because the frequency distribution for the letters chosen as all locations in the ciphertext congruent to k (mod ℓ) for any k ∈ N and k ≤ ℓ, where ℓ is less than the true key size, will be quite flat and therefore far away (in the square error distance) from the distribution of standard English. 4.3. FIRST STEPS INTO CRYPTANALYSIS: FREQUENCY ANALYSIS 71 For example, with the encrypted Hamlet we have been examining, if Eve guesses that ℓ = 5 and looks at every letter in a location congruent to 1 (mod 5), the Caesar cracker will have the following graph of square error: F IGURE 4.3.6. Square errors with all decryption keys for letters at lo- cations congruent to 1 (mod 5) in Hamlet Vigenère encrypted with key hippopotomonstrosesquipedaliophobia The minimum is clearly at a decryption key of 23, but it is not significantly less than other possibilities – and all the errors are around ten times higher than we saw in earlier such graphs such as Figure 4.3.3. So here is the Vigenère cryptanalytic approach: Try all key lengths ℓ up to some bound L (determined by how much computer time is available, and not so large that taking every ℓth letter of the ciphertext results in too few letters to do reasonable frequency analysis). For each of the ℓ substrings consisting of every ℓth letter starting at locations 1, . . . , ℓ, apply the Caesar cracker. This will result in an optimal Vigenère key ~kℓ = (kℓ,1 , . . . , kℓ,ℓ) of length ℓ. Then for each such ~kℓ in the range 1, . . . , L, compute the square error distance dℓ between the ciphertext decrypted with ~kℓ and the standard English letter distribution. Report ℓ and ~kℓ as the Vigenère keylength and key if dℓ is the smallest square error computed. 72 4. CRYPTOLOGY Exercises for §4.3. E XERCISE 4.3.1. Can you think of any Caesar ciphertexts which would have two de- cryptions that would both seem like valid English during an brute-force attack? If so, give precise examples, with decryption keys. E XERCISE 4.3.2. Would it be harder or easier to do the previous exercise (4.3.1) for the Vigenère cipher? Give reasoning and examples. E XERCISE 4.3.3. It seems like more encryption would be better, in terms of designing a secure cryptosystem – so how about encrypting a cleartext using one cryptosystem, then re-encrypting the resulting ciphertext to make a super-ciphertext? We have two cryptosystems so far with short keys (we also have the one-time pad, but it is already perfectly secure and has big keys, those pads, so we will not consider it): Caesar and Vigenère. Suppose I make a new cryptosystem which does some combination of Caesar and Vigenère encryptions, one after the other, all with different keys. Will this result in a substantially more secure cryptosystem? Why or why not? 4.4. PUBLIC-KEY CRYPTO: THE RSA CRYPTOSYSTEM 73 4.4. Public-Key Crypto: the RSA Cryptosystem Suppose Alice and Bob never had a chance to meet in person, and they nevertheless want to exchange messages which will be secret from Eve. What can they do? This seems impossible within the context of cryptosystems we have been discussing so far. Let us tease out the crucial part that makes this so difficult D EFINITION 4.4.1. A symmetric cipher (or symmetric cryptosystem) consists of the following parts, all known to both the communicating parties and the public in general: • a message space M • a keyspace K • an encryption algorithm which creates a ciphertext c = ek (m) for any choice of k ∈ K and m ∈ M. • a decryption algorithm which generates a message dk (c) ∈ M given a ciphertext c and a k ∈ K, which satisfies dk (ek (m)) = m ∀k ∈ K, m ∈ M. Alice and Bob can use such a symmetric cipher by agreeing in private upon a key k ∈ K they will use for both encryption and decryption. For this reason, symmetric cryptosystems are also called private key cryptosystems. Graphically: Basic private key crypto set-up and notation: private communication Alice ←→ of shared key k ∈ K ←→ Bob on public network message mA ∈ M compute cA = ek (mA ) transmit cA ciphertext cA receive cA compute mA = dk (cA ) message mB ∈ M compute cB = ek (mB ) receive cB ciphertext cB transmit cB compute mB = dk (cB ) etc.... It is important for the security of the above cryptosystem that the keyspace K either be quite large, or that the whole system be structured in such a way that it is hard to tell if a particular possible decryption is valid or not (or both). Otherwise Eve can perform the brute-force attack of simply trying all the possible keys k ∈ K and seeing which dk (c) makes sense as the decryption. 74 4. CRYPTOLOGY In any case, because of the initial private key exchange, symmetric cryptosystems are not appropriate for the kind of situation we proposed at the start of this section. Instead, one would need a different kind of cryptosystem: D EFINITION 4.4.2. An asymmetric cipher (or asymmetric cryptosystem) consists of the following parts, all known to any interested party, licit or il-: • a message space M • an encryption keyspace Ke • a decryption keyspace Kd • a way of associating encryption keys to decryption keys, by a one-to-one function E : Kd → Ke • an encryption algorithm which creates a ciphertext c = eke (m) for any choice of ke ∈ Ke and m ∈ M. • a decryption algorithm which generates a message dkd (c) ∈ M given a ciphertext c and a kd ∈ Kd , which satisfies dkd (eE(kd ) (m)) = m ∀kd ∈ Kd , m ∈ M. To use such a cryptosystem, Bob picks a decryption key kd , without telling it to anyone – but he makes the corresponding encryption key ke = E(kd ) publicly available (probably on his website). For this reason, ke is also called his public key and kd his private key, while the whole system is called a public-key cryptosystem. . Graphically: Basic public key crypto set-up and notation: Alice on public network Bob pick kd ∈ Kd compute ke = E(kd ) ∈ Ke download ke public key ke publish ke message m ∈ M compute c = eke (m) transmit c ciphertext c receive c compute m = dkd (c) As before, the security of this cryptosystem against brute-force attacks relies upon the size of Kd . In addition, since the encryption key and algorithm are known to Eve, it is important that the message space M not be too small. If M were small, Eve could simply compute all encryptions eke (m) as m ran over M, and compare them to a ciphertext c she intercepted. There is a way to prevent this attack on the message space, by making it artificially larger. In fact, one usually adds some 4.4. PUBLIC-KEY CRYPTO: THE RSA CRYPTOSYSTEM 75 D EFINITION 4.4.3. Cryptographic salt is random data added to a message Alice de- sires to send to Bob before she encrypts it, which is automatically removed at Bob’s end. Once Alice salts her messages, Eve must search the entire space of messages together with salt, which can be much larger than M. As an added benefit, salting messages makes the ciphertext change with each secret transmission – even if the plaintext is the same. Therefore even though Eve will of course know there is communication going on between Alice and Bob, her traffic analysis will no longer even be able to track when the same message is being sent on separate occasions. Absent this salt, a meticulous Eve might be able to correlate the message ciphertext with actions in the real world, and thus effectively learn their decryption even without cracking the cryptosystem. With salt, this is impossible. Since Bob’s encryption key ke = D(kd ) is public, the entire security of this cryptosys- tem falls apart if Eve can figure out the inverse of the function E : Kd → Ke . On the other hand, Bob must have been able to compute E itself, at least during the set-up phase of the cryptosystem. There is a name for this kind of function D EFINITION 4.4.4. A function f : A → B which is one-to-one and which can be computed in a reasonable amount of time on a standard (classical) computer but whose inverse cannot feasibly be computed is called a [cryptographic] one-way function. Multiplying two numbers, even very large numbers, can be done quite quickly on a computer. It is even surprisingly fast to tell if very large numbers are prime or composite (see [AKS04] for this amazing recent result in the long history of primality testing). But no one has found a way to factor large numbers, even when they are known to be composite, in a reasonable amount of time.5 For example, one of the largest ever to be fac- tored, a famous challenge problem known as RSA-768, consisting of a 232-digit (768 bit) composite number, was finally factored in 2009 after a network of hundreds of computers working together (although not exclusively on this problem) had been processing for about two years. Larger composite numbers are easy to make and astronomically harder to factor. Therefore, the function which multiplies two large primes together to get a very large composite is a good candidate for a cryptographic one-way function. In 1978, Ron Rivest, Adi Shamir and Leonard Adleman described ([RSA78]) the fol- lowing public-key cryptosystem based on this one-way function: D EFINITION 4.4.5. Bob starts by picking two very large primes p and q. Their prod- uct n = pq is known as the corresponding RSA modulus. Bob also picks a number e, typically a number which when written in base two has very few 1’s, such as 3 or 5 On a classical computer – there is an efficient algorithm known for factoring on a quantum computer, see [Sho94] and [NC10]. 76 4. CRYPTOLOGY 65537 = 100000000000000012, which satisfies gcd(e, φ(n)) = 1. This number is called the RSA exponent. The RSA public [encryption] key ke consists of the pair (n, e); the set of possible such pairs is Ke . The RSA private [decryption] key kd consists of the pair (n, d), where d = e−1 (mod φ(n)), and Kd is the set of such pairs. The association of encryption to decryption keys is by E : (n, e) 7→ (n, e−1 (mod φ(n))). The message space of this cryptosystem is M = {m ∈ Z | 0 ≤ m < n}. Encryption is by c = e(n,e) (m) = me (mod n) . Decryption is d(n,d) (c) = cd (mod n) . All together, the above is called the RSA cryptosystem. Graphically: RSA cryptosystem set-up and notation: Alice on public network Bob pick large primes p and q compute RSA modulus n = pq pick RSA exponent e ∈ N with gcd(e, φ(n)) download ke public key ke publish ke = (n, e) compute d = e−1 (mod φ(n)) message m ∈ M compute c = e(n,e) (m) = me (mod n) transmit c ciphertext c receive c compute m = d(n,d) (c) = cd (mod n) The first thing we need to see is that this satisfies the basic condition to be a functioning cryptosystem: P ROPOSITION 4.4.6. Let notation be as in the above definition of the RSA cryptosystem. Then for any message m ∈ M, d(n,d) (e(n,e) (m)) = m. P ROOF. Any m ∈ M represents a class in Z/nZ, and we will blur the distinction between the class and its representative m satisfying 0 ≤ m < n. 4.4. PUBLIC-KEY CRYPTO: THE RSA CRYPTOSYSTEM 77 We have built d as d = e−1 (mod φ(n)). This means e · d ≡ 1 (mod φ(n)), so ∃k ∈ Z such that e · d = 1 + kφ(n). Then by Euler’s Theorem 3.3.4, ∀m ∈ M such that gcd(m, n) = 1 d(n,d) (e(n,e) (m)) ≡ (me )d = med = m1+kφ(n) ≡ m · (mφ(n) )k ≡ m · (1)k ≡ m (mod n) as desired. The case where gcd(m, n) 6= 1 is Exercise 4.4.1 in this section. Beyond basic functionality, we need to see how practical RSA is. So let us go through it carefully, picking out the algorithms needed at each step. (1) Bob must choose the large primes p and q. As we have said, there are algorithms which can tell if a number is prime in a reasonable amount of time. More precisely, this means that a (classical) computer can determine if a number k is prime in an amount of time which is less than a polynomial function of log(k). [That’s the usual meaning of feasible computation in cryptology.] (2) Furthermore, there are enough primes that Eve will not be able to do a brute force search through all the possibilities. We can tell this because The Prime Number Theorem tells us the (asymptotic) number of primes: T HEOREM 4.4.7. Given x ∈ R, x > 0, let π(x) = #{p ∈ N|p ≤ n} be the prime counting function. Then π(x) lim = 1. x→∞ x/ ln(x) This theorem is one of the gems of analytic number theory. It was proven in 1896 by Jacques Hadamard and Charles Jean de la Vallée-Poussin, although important pieces were known before then by people like Euler, Legendre, and Riemann. The proof is hard, even an elementary one from 1948 by Atle Selberg and Paul Erdös. One consequence of the Prime Number Theorem is that the probability that a randomly chosen number k is prime is approximately 1/ln(k). For example, looking for a prime with 200 digits, we pick a random number of that size, and then we will have to test on the order of ln(10200 ) = 461 numbers before we have a prime. This is not an unreasonable amount of computation. (3) Having an RSA modulus n = pq, we can compute the Euler totient function φ(n) = (p − 1)(q − 1) nearly instantly by Theorem 2.5.3. (4) Finding the RSA exponent e which is relatively prime to φ(n) is not hard, since there are many such – in fact, in a real sense, the probability that two randomly chosen numbers are relatively prime is π62 (for a precise statement of this, and its proof, see [HW79]). Testing candidates for e is also efficient because, as we shall see below in Exercise 4.4.3, the Euclidean Algorithm takes an amount of time which is less than a polynomial function of the log’s of the numbers involved – it is feasible. 78 4. CRYPTOLOGY (5) Next, computing d = e−1 (mod φ(n)) is very fast using the Extended Euclidean Algorithm (Theorem 1.6.2) which is certainly slower than the basic Euclidean Algorithm, but is still feasible. (6) RSA encryption and decryption both consist of raising a number to a power in (mod n). Fortunately, there is an algorithm called fast modular exponentiation which again does this in an amount of time less than a polynomial function of the log’s of the given numbers. (7) One last practical point is how Alice actually uses RSA to send the message she wants, and not merely a number m ∈ Z such that 0 ≤ m < n. This is a standard problem in mathematical cryptography and has standard solutions. Typically, Alice must take her message, character by character, and transcribe it into a sequence of numbers using some accepted standard. This has been done since 1960 using the ASCII code [American Standard Code for Infor- mation Interchange], which gives 7 bit binary numbers for the 26 letters of the (modern) Latin alphabet, both upper and lower case, as well as for a set of com- monly found symbols and a few modern additions, usually no-printing, such as 111102 Record Separator, 1112 Bell, 10112 Vertical Tab, etc. These ASCII codes are today always stored in one byte (8 bits), padded with a leading 0 bit. Actually, as the World Wide Web has become a global phenomenon, there has been an increasing need for more alphabets and even writing systems (like Chinese) which do not use alphabets. This has resulted in a system called Unicode which as, in version 6.3 as of 2013, more than 110,000 characters. Unicode is stored in a variety encodings, of which the most common are UTF-8 and UTF- 16, which use between two and four bytes to store each character. What Alice typically does, then, is to write out her message in either ASCII or Unicode, simply attach all the bytes in sequence, and then cut the entire message into blocks of b bits. Here b ∈ N is chosen to be the largest value such that 2b < n, so that those blocks of b bits can be thought of as representing an integer m ∈ Z such that 0 ≤ m < n, so RSA can now be used on the message, a block at a time. (We are glossing over many details here, such as how to leave space for some salt, other approaches and issues in the structure of the blocks, etc. See any stan- dard reference on cryptography, such as [FS03] or [MVOV96].) 4.4. PUBLIC-KEY CRYPTO: THE RSA CRYPTOSYSTEM 79 Exercises for §4.4. E XERCISE 4.4.1. In this problem, you will finish the proof of Proposition 4.4.6, which states that RSA decryption functions correctly in all cases. As a first step, formally state and prove a lemma that for distinct primes p and q, two integers r and s are congruent mod pq if and only if they are congruent mod p and mod q. [Hint: the Chinese Remainder Theorem.] Now, using the above, prove the missing case of Proposition 4.4.6: Given distinct primes p and q, define n = pq. Let e ∈ N satisfy gcd(e, φ(n)) = 1 and d ∈ N be an inverse of e mod φ(n). Then, for any m ∈ Z such that 0 ≤ m < n and gcd(m, n) 6= 1, we have med ≡ m (mod n). E XERCISE 4.4.2. How much computation is required to compute ak (mod n) for a ∈ Z and k, n ∈ N with n ≥ 2? Multiplying two numbers a, b ∈ Z is a basic instruction in most modern computers: it can be thought of as taking one unit of time. (Or you can do a number of smaller manip- ulations which is approximately a polynomial function of the number of digits in a and b; it will make no difference for the rest of this problem.) Similarly, division with remainder is either a single machine instruction (e.g. on a processor in the Pentium family) or can be done by elementary school methods in time approximated by a polynomial function of the sizes of the inputs. Let us then describe the work to do a multiplication followed by reducing (mod n) as being bounded by a polynomial function p(d), where d is the number of digits in the operands. If we then simply multiply a by itself k times, reducing (mod n) each time, the time to do this will be approximately kp(d) = c1 ec2 d p(d), for some constants c1 , c2 ∈ R – it will increase exponentially. See if you can come up with a much fast exponentiation algorithm, polynomial rather than exponential. [Hints: (1) Make a table of successive squares of a in (mod n), being a2 , a4 , a6 , etc. (2) write out k in base two, and expand ak using this binary version of k, the usual rules for exponentiation, and the table. End up with a description of how to compute ak (mod n), and check carefully how much time it would take to do your computation.] E XERCISE 4.4.3. For k ∈ N, play the following game: (1) use the division algorithm on k for division by two, getting quotient q and remain- der r; (2) replace k ← q; (3) if k > 0, repeat from step (1); otherwise, terminate. 80 4. CRYPTOLOGY Give a bound on how many steps does it take for this game to finish – answer in terms of a function of k, or a function of the number of bits required to write k down in base two, or a function of the number of digits require to write k in base 10. Show that for every two steps of the Euclidean Algorithm, the remainder terms ri de- crease by a least a faction of 1/2. In other words, if we use the notation of Theorem 1.6.2 then rj+2 < 12 rj for all j ∈ Z satisfying j ≥ 0. Explain how this means the Euclidean Algorithm requires at most log2 (b) steps to com- pute gcd(a, b) and it therefore requires at most seven times the number of digits of b. [Hint: what is log2 (10)?] Explain how this means that the Euclidean Algorithm is a cryptologically feasible com- putation, in the sense of this section. 4.5. DIGITAL SIGNATURES 81 4.5. Digital Signatures Public-key cryptosystems allow several use-cases which symmetric cryptosystems do not. One which has come to have more and more importance in the modern digital economy is the creation of digital signatures – these are parts of electronic documents which are supposed to have something of the qualities of a physical signature in that are hard for an imposter to forge. Such a signed document might be needed, for example, if Bob from the last section (whose RSA public key is on his website) wished to send a legally binding contract via e-mail. Perhaps Alice and Bob wish to e-mail to their future landlord Larry a signed lease for an apartment that they will share. When Larry gets an e-mail from Bob saying “I agree to be bound by the terms of this lease,” Larry needs to have confidence that this e-mail did originate from Bob, which he can if there is a digital signature. Here’s what Bob can do: he takes a copy of the lease, adds a section at the end stating his agreement to its terms and giving some personally identifying information (perhaps a scan of his driver’s license). Call this whole chunk of data m. Then Bob applies his decryption algorithm, using his private (decryption) key kd , yielding s = dkd (m) – this s is called Bob’s signature on the message m. He then e-mails both m and s to Larry. When Larry receives this signed message, the first thing he does is detach the signature s and compute its encryption eke (s) using the public key he got off Bob’s website. Since eke and dkd are inverses and it does not matter in which order they are applied, the result should be m. If that is so, Larry can be sure that whoever sent the message also had access to Bob’s secret key and so presumably is Bob himself. Graphically: Basic digital signatures: Larry on public network Bob pick kd ∈ Kd compute ke = E(kd ) download ke public key ke publish ke message m ∈ M compute s = dkd (m) receive (m, s) signed message (m, s) transmit (m, s) if eke (s) = m ACCEPT otherwise, REJECT One problem with this scheme is that it has effectively doubled the size of the message. The way to make a smaller, more efficient signature is for it to consist of the decryption not of all of m but instead of some function h(m). Here the function h should take a message 82 4. CRYPTOLOGY of arbitrary size and produce a small, digested piece of data ... which nevertheless depends upon every part of the input m. After, all, if h(m) depended only upon the first 100 bits of m, for example, then a malicious Eve could alter the message in transit, and her change would go undetected as long as she did not change the first 100 bits of the message. Cryptologists have a name for functions like this h. D EFINITION 4.5.1. A function h which takes as input arbitrary length strings of bits and produces output bit strings of a fixed length is called a cryptographic hash function if it satisfies ease of computation: it is feasible to compute the h(m) for any m; pre-image resistance: given a hash value t, it is infeasible to find an m such that h(m) = t; second pre-image resistance: given a specific input m1 , it is infeasible to find an- other m2 such that h(m2 ) = h(m1 ); collision resistance: it is infeasible to find two messages m1 and m2 such that h(m2 ) = h(m1 ). The words feasible and infeasible here have the same meaning here as in the previous section that it is, or is not, possible to complete the computation in an amount of time bounded by a polynomial function of the size of the inputs. Notice that since a hash function takes inputs of arbitrary length but has a fixed output size, there will necessarily be an infinite number of collisions The creation of cryptographic hash functions is something of a black art. It turns out that if one builds a candidate hash function with some clear structure (usually mathemati- cal) – particularly if it is one that is fast to compute – a way to break one of the resistance requirements is usually found by the cryptological community. For this reason, the algo- rithms currently in wide use tend to be very ad hoc computations that just seem messy and have resisted attempts at inversion or breaking resistance. E XAMPLE 4.5.2. For around a decade starting in the early 1990s, the most widely used cryptographic hash function was called md5. This algorithm was developed by Ron Rivest and published in 1992. The output size of md5 is 128 bits. While md5 was thought to be flawed since the middle 1990s, a real attack was not published until 2004, when it was shown not to be collision resistant [WY05]. However, md5 is still used extensively today to verify that a large data transfer has not suffered a transmission error – i.e., it is still a useful tool to test for non-malicious data corruption. (In this context of providing evidence for data integrity against non-malicious corruption, a hash function is frequently called a fingerprint.) 4.5. DIGITAL SIGNATURES 83 E XAMPLE 4.5.3. The most widely used cryptographic hash function from the late 1990s until recently, and one which is built into many widely accepted and standardized cryptographic protocols, is SHA-1, with an output size of 160 bits. SHA-1 was developed by US National Security Agency (NSA) in a semi-public pro- cess, and was adopted by the US National Institute of Standards and Technology (NIST) as part of several US Federal Information Processing Standards . In 2004, some work was published which indicated that SHA-1 might be vulnerable to certain kinds of attack. (See [PS04].) For this reason, NIST required in 2010 many US federal data protection applications to move to another hash function. E XAMPLE 4.5.4. At the time of this writing, most security conscious users and organi- zations recommend SHA-2, usually in its “SHA-256” variant, which has an output size of 256 bits. Given recent revelations of the NSA’s involvement – and weakening of – crypto- graphic protocols, it might be a cause of concern that NSA participated in the development of SHA-2. Graphically: RSA digital signatures: Charlie on public network Bob do RSA set-up, make ke = (n, e) and d download ke verification key ke publish ke message m ∈ M compute s = (h(m))d receive (m, s) signed message (m, s) transmit (m, s) if h(m) = se (mod n) ACCEPT otherwise, REJECT With this understood, we can describe digital signatures more formally. D EFINITION 4.5.5. Suppose Bob sets up the RSA cryptosystem, as in Definition 4.4.5, and chooses once and for all a cryptographic hash function h which he publishes on his web page along with his public encryption key ke . If Bob now wants to sign a message m, he appends to m the RSA digital signature s = ddk (h(m)) before transmitting it to any third party, say Charlie. When Charlie receives a signed message (m, s) which claims to be from Bob, he goes to Bob’s website and downloads the public key ke and description of the cryptographic hash function h. At this point, Charlie computes eke (s) and compares it to h(m). If they are equal, he accepts the signature; if not, he rejects. 84 4. CRYPTOLOGY In this context, Bob’s private/decryption key kd is called the signing key and his pub- lic/encryption key ke is called the verification key. 4.5. DIGITAL SIGNATURES 85 Exercises for §4.5. E XERCISE 4.5.1. Use the Pigeonhole Principle (Theorem 1.1.2 ) to prove that there will always be an infinite number of collisions for a cryptographic hash function [although they may not be feasibly computable]. E XERCISE 4.5.2. Give an example of a hash function h whose output size is one bit and a specific input m1 for which there is no second pre-image; that is, ∄m2 such that h(m2 ) = h(m1 ). 86 4. CRYPTOLOGY 4.6. Man-in-the-Middle Attacks, Certificates, and Trust While public-key crypto can seem an unalloyed benefit to the networked world, close examination of the details of the last two sections shows a dangerous gap between a casual statement of the properties of these cryptographic tools and their reality. The distinction which at first goes unnoticed is between Bob, the person, and the bits arriving over the wire to Alice or Larry which claim to be from Bob. This has little effect if Eve is, as we have mostly been assuming her to be, a passive observer of the communications between Alice and Bob (and sometimes Larry). But if Eve has even more control over the network and can replace all communications with her own versions, new attacks are possible. Suppose Alice wants to send her secret message to Bob, again without ever having met him to exchange with him a key for a symmetric cryptosystem. She hopes that he has a public key, so she gets on the web and downloads his home page. Here is where Eve springs into action, intercepting Bob’s (web server’s) response to the web request. She keeps a copy of Bob’s public key keB for herself, but substitutes into the web page data in its place an RSA encryption key of her own, keE – for which she alone knows the corresponding decryption key kdE – and transmits the modified page on to Alice. Alice composes her cleartext m, and transmits its corresponding ciphertext cA = ekeE (m) to Bob, or so she thinks. Instead, Eve intercepts that ciphertext, decrypts and stores m = dkdE (cA ) = dkdE (ekeE (m)). Then, in order to make Alice and Bob think everything is working normally (so they’ll keep up their revealing chatter), Eve transmits cE = ekeB (m) on to Bob. From Bob’s point of view, he gets an e-mail which seems to be from Alice and which, when decrypted with his private key, does make perfect sense. Furthermore, if he interacts with Alice off-line, she behaves as if she sent that message – in fact, she did, but not in the particular encrypted form that Bob received. Eve has completely violated the confiden- tiality of Alice and Bob’s communications, and she could violate the message integrity any time she liked, still making it appear to come legitimately from Alice. The above scenario is called a man-in-the-middle attack (pardon the non-gender- neutral terminology). Here is a graphical depiction of this attack: 4.6. MAN-IN-THE-MIDDLE ATTACKS, CERTIFICATES, AND TRUST 87 Man-in-the-middle attack: Alice Eve Bob generate keys: public keB , private kdB intercept keB publish keB generate keys: public keE , private kdE download keE keE , spoof origin message m ∈ M compute cA = ekeE (m) transmit cA cA , intercept extract the cleartext m = dkdE (cA ) change to m′ if desired compute cE = ekeB (m′ ) spoof origin cE receive cE read the message m′ = dkdB (cE ) So it seems that symmetric cryptosystems actually have one nice thing built in: when the parties meet in that perfect, prelapsarian moment to exchange their symmetric key, they presumably can have confidence in the identity of the person they’re talking to – if not, they wouldn’t exchange the symmetric key until they had seen lots of official-looking identifi- cation materials. Asymmetric cryptosystems have the fundamental difficulty to overcome of establishing a trustworthy connection between a real person’s identity and a public key on the Internet which purports to be from that person. The technique from last section, §4.5, can help transfer the trust, at least. Suppose Alice wants to engage in secret communication with Bob, but does not know if she can trust the public key which appears to be on Bob’s web page. If that key came with an accompanying digital signature issued by a trusted third party [TTP] of which Alice had the public key, she could verify that the key was Bob’s, at least as far as the TTP knew. Here is a formal definition. D EFINITION 4.6.1. Individuals or organizations who want to use asymmetric cryp- tography can go to a trusted third party called a certificate authority [CA] to request a [digital] certificate for their public keys. This certificate would be a digital signature on the public key itself, signed by the CA’s signing key. The CA’s verification key would be assumed widely disseminated across the Internet or perhaps built into the basic operating system software or computer hardware. Then anyone who wanted to use a public key could 88 4. CRYPTOLOGY first check the validity of the associated certificate and have confidence that the intended party did own the key in question. The entire aggregate of certificates, CAs, pre-installed or widely distributed verification keys, etc., is called a public key infrastructure or PKI. In actual practice, the CA often cannot do more than certify that a certain public key is owned by an individual who has access to a certain e-mail address, or the webmaster password to a certain site, or some such purely Internet-based token. That much is usually quite easy – connecting this with a real, external identity would require checking some form of government-issued ID, probably, and is rarely done. Although it would be useful: perhaps the government should act as a CA, and government IDs should have a built-in RFID (recent US passports do!) which can transmit a certificate on the ID owner’s public key. There is one other approach to figuring out whether to have faith in a particular public key, favored by those who mistrust authority but are willing to trust some others they know personally. In this approach, individuals who know each other personally and have faith in each other’s good cryptologic habits can each sign each other’s public keys, adding their digital signatures to those which have already been collected. Then when you want to use someone’s public key, you can unwrap a chain of digital signatures, each signing the next, until you find one by a person whom you know personally, have met, and whose public key you have verified. This approach has become known as the web of trust, and is strongly supported by GnuPG and other OpenPGP-compatible organizations, see gnupg.org. By the way, to be useful, a web of trust must include as many individuals and their keys as possible, and have as many connections (where individual X knows and signs individual Y’s public key) as possible. One way to get this going quickly is to have a key-signing party. If you are standing next to someone whom you know well who says “sure, this other person is someone I know well and trust”, then you might be willing to sign both of their keys right there. In practice, when you sign keys, you are standing there with someone whom you trust, so it usually suffices to check the md5 fingerprint of their key and not the whole thing – the md5 is shorter and standing there with this person you trust, presumably you do not think that anyone there has devoted large computational resources to finding a second md5 pre-image the fingerprint. CHAPTER 5 Indices = Discrete Logarithms We talked about the cyclic subgroup hai of (Z/nZ)∗ generated by an element a in the proof of our version of Lagrange’s Theorem 3.3.3, on the way to Euler’s Theorem 3.3.4. Recall it consisted of all powers of a (well, of [a]n ) in (Z/nZ)∗ . Here are some examples: n φ(n) (Z/nZ)∗ a hai ordn (a) 2 1 {1} 1 {1} 1 3 2 {1, 2} 1 {1} 1 2 {2, 22 ≡ 1} 2 4 2 {1, 3} 1 {1} 1 3 {3, 32 ≡ 1} 2 5 4 {1, 2, 3, 4} 1 {1} 1 2 {2, 22 ≡ 4, 23 ≡ 3, 24 ≡ 1} 4 3 {3, 32 ≡ 4, 33 ≡ 2, 34 ≡ 1} 4 4 {4, 42 ≡ 1} 2 6 2 {1, 5} 1 {1} 1 5 {5, 52 ≡ 1} 2 7 6 {1, 2, 3, 4, 5, 6} 1 {1} 1 2 {2, 22 ≡ 4, 23 ≡ 1} 3 3 {3, 32 ≡ 2, 33 ≡ 6, 34 5 6 ≡ 4, 3 ≡ 5, 3 ≡ 1} 6 4 {4, 42 ≡ 2, 43 ≡ 1} 3 5 {5, 52 ≡ 4, 53 ≡ 6, 54 5 6 ≡ 2, 5 ≡ 3, 5 ≡ 1} 6 6 {6, 62 ≡ 1} 2 8 4 {1, 3, 5, 7} 1 {1} 1 3 {3, 32 ≡ 1} 2 5 {5, 52 ≡ 1} 2 7 {7, 72 ≡ 1} 2 TABLE 5.0.1. Cyclic subgroups of (Z/nZ)∗ for n = 2, . . . , 8 Each of these cyclic subgroups hai has a size – being the order ordn (a) of its generator – which divides the size of the ambient (Z/nZ)∗ , as we know will happen from Euler’s Theorem 3.3.4. 89 90 5. INDICES = DISCRETE LOGARITHMS Some random things we also notice, which might or might not hold true in general, given the small amount of evidence in this table, are: • frequently there is an element a which generates the biggest possible cyclic sub- group: hai = (Z/nZ)∗ ; • there seem always to be those big cyclic subgroups in Z/nZ when n is a prime; • even some composite n give a Z/nZ which contains a big cyclic subgroup, except for the case of the largest power of 2, which was n = 23 , that we tried. Just to see if those possible general statements hold a bit further, let’s compute two more examples. For these we give only the sizes of (Z/nZ)∗ and hai, not complete lists of their elements, to conserve space and since the elements of (Z/nZ)∗ can be found in the column labelled “a”: n φ(n) a ∈ (Z/nZ)∗ ordn (a) 16 8 1 1 3 4 5 4 7 2 9 2 11 4 13 4 15 2 17 16 1 1 2 8 3 16 4 4 5 16 6 16 7 16 8 8 9 8 10 16 11 16 12 16 13 4 14 16 15 8 16 2 TABLE 5.0.2. Cyclic subgroups of (Z/nZ)∗ for n = 16 and n = 17 5.1. MORE PROPERTIES OF MULTIPLICATIVE ORDER 91 Our guesses (large powers of 2 not so good; other n’s, particularly prime, are good) still hold true. Of course, any finite amount of evidence for a general mathematical statement is very inconclusive.... Now to the formal analysis. 5.1. More Properties of Multiplicative Order But first, we need some more facts about [multiplicative] order. T HEOREM 5.1.1. Suppose n ∈ N and a ∈ Z satisfy n ≥ 2 and gcd(a, n) = 1. Then k a ≡ 1 (mod n) for k ∈ N iff ordn (a) | k. P ROOF. One direction is very easy: if k ∈ N satisfies ordn (a) | k then ∃d ∈ N such that k = ordn (a) · d and thus ak = aordn (a)·d = (aordn (a) )d ≡ 1d = 1 (mod n) . Conversely, suppose k ∈ N is such that ak ≡ 1 (mod n). Apply the Division Algo- rithm to get q, r ∈ N such that k = ordn (a) · q + r and 0 ≤ r < ordn (a). Then 1 ≡ ak = aordn (a)·q+r = (aordn (a) )q · ar ≡ 1q · ar ≡ ar (mod n) . But the definition of the order ordn (a) is that it is the smallest positive integer such that a to that power is congruent to 1. The only way for ar ≡ 1 (mod n) and 0 ≤ r < ordn (a) to be true, then, is if r = 0. Thus k = ordn (a) · q and ordn (a) | k, as desired. What this will mean is that when we work in the cyclic subgroup of (Z/nZ)∗ gener- ated by some element a, we should work with the powers of a as if the powers lived in Z/(ordn (a))Z. That is: T HEOREM 5.1.2. Suppose n ∈ N and a ∈ Z satisfy n ≥ 2 and gcd(a, n) = 1. Then a ≡ ak (mod n) for j, k ∈ N iff j ≡ k (mod ordn (a)). j P ROOF. Again, one direction is very easy: if j, k ∈ N satisfy j ≡ k (mod ordn (a)) then ∃ℓ ∈ Z such that j = k + ordn (a) · ℓ from which we compute aj = ak+ordn (a)·ℓ = ak · (aordn (a) )ℓ ≡ ak · 1ℓ = ak (mod n) . Conversely, suppose j, k ∈ N are such that aj ≡ ak (mod n) and, without loss of generality, j ≥ k. Multiplying both sides of this congruence by (a−1 )k , which exists since gcd(a, n) = 1, we get aj−k ≡ 1 (mod n). Then by Theorem 5.1.1 we have ordn (a) | (j − k) or j ≡ k (mod ordn (a)). E XAMPLE 5.1.3. The rows in Table 5.0.1 bear out Theorems 5.1.1 and 5.1.2: each cyclic subgroup (row) has a number of elements which divides the corresponding φ(n), and powers of the generator a are only defined up to ordn (a). 92 5. INDICES = DISCRETE LOGARITHMS It also seems that some of the smaller cyclic subgroups sometimes occur as a subset of a larger cyclic subgroup. So in mod n = 7, if a = 3 and b = a2 ≡ 2, then we have the containment hbi ⊂ hai, where hbi consists of half of the elements of hai, namely the even powers of a: hbi = b, b2 , b3 = {2, 4, 1} = 32 , 34 , 36 ⊂ 3, 32 , 33 , 34 , 35 , 36 = hai Looking instead at c = 33 ≡ 6, we still have hci ⊂ hai, but now hci consists of one third of the elements of hai, the powers of a which are multiples of 3: hci = c, c2 = {6, 1} = 33 , 36 ⊂ 3, 32 , 33 , 34 , 35 , 36 = hai The last part of this example seems to be asking for a general statement about what size subset of a cyclic subgroup hai is formed of the cyclic subgroup ak for some k ∈ N. This size will just be the order of that element ak , so we need the following T HEOREM 5.1.4. Suppose n ∈ N and a ∈ Z satisfy n ≥ 2 and gcd(a, n) = 1. Then ordn (a) ∀k ∈ N, ordn (ak ) = gcd(ordn (a),k) P ROOF. Fix some k ∈ N and let r = ord(ak ) and s = ord(a); with this notation, what s we are trying to prove is that r = . gcd(s, k) We shall repeatedly use in the proof the fact that since a gcd is a divisor, gcd(s, k) | s s k and gcd(s, k) | k, which means in turn that both , ∈ N. gcd(s, k) gcd(s, k) Now to the proof. The definition of order is that r is the smallest natural number such that (ak )r ≡ 1 (mod n). Notice that s k k (ak ) gcd(s,k) = (as ) gcd(s,k) = 1 gcd(s,k) ≡ 1 (mod n) . s By Theorem 5.1.1, we conclude that r . gcd(s, k) Smallest or not, since akr = (ak )r ≡ 1 (mod n), s | kr by Theorem 5.1.1, so ∃m ∈ N such that ms = kr. Dividing both sides by gcd(s, k), we get an equation of natural numbers s k m = r; gcd(s, k) gcd(s, k) which means s k r. gcd(s, k) gcd(s, k) s k By Theorem 1.5.5, gcd , = 1, so Euclid’s Lemma 2.1.6 tells us that gcd(s, k) gcd(s, k) s r. gcd(s, k) s We have shown that r and divide each other. But that means they are equal, gcd(s, k) which is what we were trying to prove. 5.1. MORE PROPERTIES OF MULTIPLICATIVE ORDER 93 Exercises for §5.1. E XERCISE 5.1.1. Suppose n ∈ N satisfies n ≥ 2. Prove that if there exists a ∈ (Z/nZ)∗ such that ordn (a) = n − 1, then n is prime. E XERCISE 5.1.2. Suppose p is an odd prime and a ∈ (Z/pZ)∗ . Prove that if ∃k ∈ N such that ordp (a) = 2k then ak ≡ −1 (mod p). [Hint: look at the proof of Lemma 3.2.1.] E XERCISE 5.1.3. Suppose n ∈ N satisfies n ≥ 2 and a, b ∈ Z/nZ are both relatively prime to n. Prove that ordn (ab) | ordn (a) ordn (b) . E XERCISE 5.1.4. Prove that there are infinitely many primes congruent to 1 mod 4, by filling in the details of the following outline: (1) Prove: if an odd prime p and n ∈ Z are such that n2 ≡ −1 (mod p), then 4 | φ(p) [use a theorem in this section]. Why is it necessary to exclude the even prime here? (2) Therefore for n ∈ Z, the odd prime divisors of n2 + 1 are congruent to 1 mod 4. (3) Now assume for contradiction’s sake that there are only finitely many primes p1 , . . . , pn congruent to 1 mod 4 and consider the number (2p1 · · · · · pn )2 + 1. Apply the previous step.... 94 5. INDICES = DISCRETE LOGARITHMS 5.2. A Necessary Digression: Gauss’s Theorem on Sums of Euler’s Function Later in this chapter we will need a fact first proved by Gauss about Euler’s φ function: X T HEOREM 5.2.1. For all n ∈ N, φ(d) = n . d∈N s.t. d|n We’ll give two proofs, which illustrate different features of this situation: P ROOF 1. Fix n ∈ N. Let’s define some subsets of {1, . . . , n}, dependent upon a choice of a positive divisor d | n, as follows Sd = {k ∈ N | 1 ≤ k ≤ n and gcd(k, n) = d} . These sets are disjoint since for each k ∈ N such that 1 ≤ k ≤ n, d = gcd(k, n) has a specific value and k is only in that Sd . For k ∈ Sd , gcd(k, n) = d or, by Theorem 1.5.5, gcd(k/d, n/d) = 1. Thus #(Sd ) is the number of elements ℓ ∈ N in the range 1 ≤ ℓ ≤ n/d which are relatively prime to n/d, i.e., #(Sd ) = φ(n/d). Every k ∈ Z in the range 1 ≤ k ≤ n is in exactly one of these Sd , for d a positive divisor of n. Therefore [ {1, . . . , n} = Sd d∈N s.t. d|n so [ X X n = # ({1, . . . , n}) = # S d = # (S d ) = φ(n/d) . d∈N d∈N d∈N s.t. d|n s.t. d|n s.t. d|n But as d runs over the positive divisors of n, so does n/d; in other words {d | d ∈ N, 1 ≤ d ≤ n and d | n} = {n/d | d ∈ N, 1 ≤ d ≤ n and d | n} We can therefore rewrite the last big equation as X X n= φ(n/d) = φ(d) d∈N d∈N s.t. d|n s.t. d|n which is what we were trying to prove. P ROOF 2. This approach will concentrate on the function X F (n) = φ(d) , d∈N s.t. d|n which has some very nice properties. 5.2. A NECESSARY DIGRESSION: GAUSS’S THEOREM ON SUMS OF EULER’S FUNCTION 95 For example, suppose p is a prime and k ∈ N. Then the divisors of pk are 1, p, p2, . . . , pk , so F (pk ) = φ(1)+φ(p)+φ(p2)+· · ·+φ(pk ) = 1+(p−1)+(p2 −p)+· · ·+(pk −pk−1 ) = pk . Furthermore, if p and q are distinct primes, then the divisors of pq are 1, p, q, and pq. Also, gcd(p, q) = 1, so by Theorem 2.5.3 F (pq) = φ(1) + φ(p) + φ(q) + φ(pq) = φ(1) + φ(p) + φ(q) + φ(p)φ(q) = (1 + φ(p))(1 + φ(q)) = F (p)F (q) , From this fact, one can prove (it’s an exercise below) that F (ab) = F (a)F (b) whenever a, b ∈ N are relatively prime. This means that F has the same sort of multiplicative property for relatively prime factors as does φ. We are ready to finish the proof. So let n ∈ N be any number such that n > 1 (for n = 1, the theorem is trivial). Suppose the prime decomposition of n is given by n = pk11 · · · · · pkr r , k where the primes {p1 , . . . , pr } are distinct. Therefore gcd(pki i , pj j ) = 1 if i 6= j and so we can use the multiplicative property of F and our calculation of F for prime powers to conclude F (n) = F (pk11 · · · · · pkr r ) = F (pk11 ) · · · · · F (pkr r ) = pk11 · · · · · pkr r =n, which is what we wanted to prove. 96 5. INDICES = DISCRETE LOGARITHMS Exercises for §5.2. E XERCISE 5.2.1. Finish the second proof of Gauss’s Theorem 5.2.1 by proving that F (ab) = F (a)F (b) for all a, b ∈ N which are relatively prime. 5.3. PRIMITIVE ROOTS 97 5.3. Primitive Roots Now back to those elements which generate large cyclic subgroups – which have a name: D EFINITION 5.3.1. Given n ∈ N such that n ≥ 2, an element a ∈ (Z/nZ)∗ is called a primitive root mod n if ordn (a) = φ(n). We shall also call an integer x ∈ Z a primitive root mod n if [x]n is a primitive root in the sense just defined. E XAMPLE 5.3.2. From the two tables in the introduction to this chapter we can read off the following primitive roots mod their respective n’s: n Primitive Roots mod n φ(n) φ(φ(n)) 2 1 1 1 3 2 2 1 4 3 2 1 5 2, 3 4 2 6 5 2 1 7 3, 5 6 2 8 none 4 2 .. . 16 none 8 4 17 3, 5, 6, 7, 10, 11, 12, 14 16 8 TABLE 5.3.1. Primitive roots for n = 2, . . . , 8, 16, 17 We have included the column of φ(n) since that is the order that each primitive root must have. And then we added the column of φ(φ(n)) as well, since by some strange magic it appears frequently to compute the number of primitive roots. Let us state formally, and prove, the general result noticed in this example: T HEOREM 5.3.3. Given n ∈ N satisfying n ≥ 2, if n has any primitive roots than it has exactly φ(φ(n)) primitive roots. P ROOF. Let a ∈ (Z/nZ)∗ be a primitive root. That means that hai = a, a2 , . . . aordn (a) = (Z/nZ)∗ since ordn (a) = φ(n) = # ((Z/nZ)∗ ). Any other primitive root b, being an element of (Z/nZ)∗ must be one of these powers of a. In other words, ∃k ∈ N such that b = ak . In order for this b to be a primitive root, its order must be φ(n). But from Theorem 5.1.4, we know that ord(a) φ(n) ord(b) = = . gcd(ord(a), k) gcd(φ(n), k) 98 5. INDICES = DISCRETE LOGARITHMS Therefore b = ak will be a primitive root if and only if gcd(φ(n), k) = 1. From the definition of Euler’s φ function, this happens for exactly φ(φ(n)) values of k. Let’s see if we can prove that in some circumstances, we will have the first primitive root that the above theorem requires. The easiest situation in which that might happen will probably be when the modulus is prime, as we have the strongest tools to work with in that case. Finding elements of a particular order d amounts (in part) to finding solutions to the equation xd − 1 ≡ 0 (mod p). The first step toward this is the following more general theorem about polynomials in mod p due to Lagrange: T HEOREM 5.3.4. For n ∈ N and a0 , . . . , an ∈ Z, the polynomial an xn + · · · + a1 x + a0 ≡ 0 (mod p) , where an 6≡ 0 (mod p), has at most n solutions in Z/pZ if p is prime. P ROOF. We use induction on the degree n. The base case n = 1 amounts to solving the linear congruence a1 x + a0 ≡ 0 (mod p) where a1 6≡ 0 (mod p). Since p is prime, this means that gcd(p, a1 ) = 1, and therefore the linear congruence has a unique solution mod p by the version of Theorem 2.2.4 as stated in Remark 2.2.5. Now for the inductive step, assuming that the theorem is true for some n ∈ N, we shall prove it for the case n + 1. So let a0 , . . . , an+1 ∈ Z be such that an+1 6≡ 0 (mod p). If a(x) = an+1 xn+1 + an xn + · · · + a1 x + a0 has no zeros in mod p, then the theorem is certainly true for this polynomial of degree n+1: the number of solutions to a(x) ≡ 0 (mod p) is 0 < n + 1. If instead a(x) has at least one zero z1 in mod p, do long division of polynomials to get a(x) = b(x)(x − z1 ) + r(x) where b(x) and r(x) are polynomials with integral coefficients and such that 0 ≤ deg(r(x)) < deg(x − z1 ) = 1 . It therefore follows that r(x) is a constant polynomial, say with value r1 ∈ Z. Plug z1 into the above formula resulting from division of polynomials to get 0 ≡ a(z1 ) ≡ b(z1 )(z1 − z1 ) + r1 ≡ r1 (mod p) . Hence r1 ≡ 0 (mod p), which means that our polynomial a(x) ≡ b(x)(x − z1 ) (mod p). But an+1 equals (one times) the leading coefficient of b(x), so that leading coefficient must not be congruent to 0 mod p. Hence the inductive hypothesis applies to b(x) and tells us that it has no more than n roots in mod p. These roots of b(x), plus the single root of (x − z1 ), total no more than n + 1 roots for a(z) = b(x)(x − z1 ). 5.3. PRIMITIVE ROOTS 99 All we need to do to finish is to see that any root of a(x) must in fact be a root of b(a) or (x − z1 ). So suppose r ∈ Z is a root, so a(r) = b(r)(r − z1 ) ≡ 0 (mod p), which means in turn that p | b(r)(r − z1 ). Then by Proposition 3.1.5 we must have p | b(r) or p | (r − z1 ). In other words, b(r) ≡ 0 (mod p) or (r − z1 ) ≡ 0 (mod p), as we hoped. [This amounts to using what is called in basic algebra the “zero product property”, which is true in Z/pZ for p prime by Proposition 3.1.5; in abstract algebra, we say that Z/pZ is a domain if p is prime.] Thus the roots of a(x) all come from roots of b(x) or (x − z1 ), and so number at most n + 1, which means we have proven the inductive step. So much for a maximum number of roots. In the following particular case, we can get the exact number of roots: T HEOREM 5.3.5. If p is prime and d | p − 1, then there are exactly d solutions, up to congruence mod p, of the congruence xd = 1 (mod p) . P ROOF. Let p and d be as in the statement, and define m = (p − 1)/d ∈ N. We use a clever factorization, defining a(x) = xd − 1, b(x) = xdm−d + xdm−2d + · · · + xd + 1, and c(x) = xp−1 − 1 so that c(x) = a(x)b(x). Notice that by Fermat’s Little Theorem, c(x) has exactly p − 1 roots which are distinct in mod p, being {1, . . . , p − 1}. Also, by the previous Theorem 5.3.4, b(x) has at most deg(b(x)) = dm − d = p − 1 − d roots. Since, as in the proof of that Theorem 5.3.4 (the part where we mentioned the “zero product property”), roots of c(x) correspond exactly to the roots of a(x) and those of b(x), there must be at least d roots of a(x) to add to these no more than p − 1 − d roots of b(x), making the exactly p − 1 roots of p − 1. We are now in a position to quantify exactly the congruence classes in Z/pZ, for p a prime, of particular orders: T HEOREM 5.3.6. If p is prime and d | p − 1, then there are exactly φ(d) distinct congruence classes of order d in Z/pZ. P ROOF. Let p and d be as in the statement and write ψ(k) = # ({ℓ ∈ Z | 1 ≤ ℓ ≤ p − 1 and ordp (ℓ) = k}) . 100 5. INDICES = DISCRETE LOGARITHMS By our version of Lagrange’s Theorem 3.3.3, any ℓ ∈ Z such that 1 ≤ ℓ ≤ p − 1 has an order which divides φ(p) = p − 1, so X ψ(k) = p − 1 . k∈N s.t. k|p−1 In addition, by Gauss’s Theorem 5.2.1, X φ(k) = p − 1 . k∈N s.t. k|p−1 Therefore X X φ(k) = ψ(k) . k∈N k∈N s.t. k|p−1 s.t. k|p−1 Our goal now is to show that ∀k ∈ N such that k | p − 1, we have ψ(k) ≤ ψ(k). If we can do this, then in fact for all such k, we would have to have ψ(k) = ψ(k) because if for one k we had ψ(k) < ψ(k) then there would be no way for another k to give ψ(k) > ψ(k) so P P that ψ(k) = φ(k) could still hold. So let k ∈ N satisfy k | p − 1. If ψ(k) = 0, meaning that there are no elements of order k, then certainly ψ(k) ≤ φ(k) as φ(k) is a non-negative function. Suppose ψ(k) > 0, meaning there exists at least one element, call it a, of order k in Z/pZ. Notice that for any n ∈ N, (an )k = (ak )n ≡ 1n ≡ 1 (mod p), which means that a, . . . , ak are distinct elements of Z/pZ which all satisfy xk − 1 ≡ 1 (mod p) . By Theorem 5.3.5, there are exactly k solutions of this congruence equation. These so- lutions include all of the elements of Z/pZ of order k, and maybe other elements, whose order is a divisor of k, in which we are not so interested. But in fact, by Theorem 5.1.4, we know exactly the orders of these elements a, . . . , ak : the order of aℓ is ordp (a) k = . gcd(ordp (a), ℓ) gcd(k, ℓ) Thus the elements of Z/pZ of order k are those elements aℓ for ℓ ∈ Z satisfying 1 ≤ ℓ ≤ k for which gcd(k, ℓ) = 1. There are therefore φ(k); in other words, ψ(k) = φ(k). So, the punch line: C OROLLARY 5.3.7. If p is prime, there are φ(p − 1) primitive roots in Z/pZ. P ROOF. Use k = p − 1 in the previous theorem. Just to finish the thread of conjectures with which we started this chapter and section, let us prove the 5.3. PRIMITIVE ROOTS 101 T HEOREM 5.3.8. Let k ∈ N satisfy n ≥ 3. Then n = 2k has no primitive roots. P ROOF. We shall prove that for all odd numbers a, k−2 a2 ≡ 1 (mod 2k ) by induction on k. Start with k = 3 and just look at all the congruence classes in Z/8Z with odd represen- tatives, and the squares of these classes: [1]28 = [1]8 , [3]28 = [9]8 = [1]8 , [5]28 = [25]8 = [1]8 , and [7]28 = [49]8 = [1]8 . So the base case k = 3 is established. Now assume the statement holds for the value k, meaning that for all odd a, k−2 a2 ≡ 1 (mod 2k ) . In other words, ∃ℓ ∈ Z such that k−2 a2 = 2k ℓ + 1 . Squaring both sides of this, we get k−1 k−2 2 2 a2 = a2 = 2k ℓ + 1 = 22k ℓ2 + 2 · 2k ℓ + 1 = 2k+1 2k−1 ℓ2 + ℓ + 1, meaning that (k+1)−2 k−1 a2 = a2 ≡1 (mod 2(k+1) ) . The inductive proof of the statement with which we began this proof is done. But notice that this means that for all odd numbers a, or, otherwise, for all a ∈ (Z/2k Z)∗ , ord2k (a) ≤ 2k−2 < 2k−1 . Thus n = 2k has no primitive roots, which would all have to be odd numbers of order φ(2k ) = 2k−1. As we thought we noticed, based on the few examples we had calculated, large powers of two do not have primitive roots. 102 5. INDICES = DISCRETE LOGARITHMS Exercises for §5.3. E XERCISE 5.3.1. Express each of the primitive roots of 17 as a power of one of them. E XERCISE 5.3.2. Find all of the primitive roots for the primes 11 and 13 and express them each as a power of one of them. E XERCISE 5.3.3. Find all of the elements of Z/13Z which have each possible order. E XERCISE 5.3.4. By expressing everything as powers of single primitive root, use Corollary 5.3.7 to prove one direction of Wilson’s Theorem E XERCISE 5.3.5. If r is a primitive root of the odd prime p, prove that r (p−1)/2 ≡ −1 (mod p). Also, prove that if s is any other primitive root of p, then rs cannot be a primitive root. E XERCISE 5.3.6. Prove that the inverse of a primitive root is always a primitive root. E XERCISE 5.3.7. If p is a prime congruent to 1 mod 4, prove that for all primitive roots r of p, −r is also a primitive root. If instead p is a prime congruent to 3 mod 4, prove that ordp (−r) = (p − 1)/2. 5.4. INDICES 103 5.4. Indices So for some values of n ∈ N, there exists a primitive root a ∈ (Z/nZ)∗ , in which case we have seen that hai = a, a2 , . . . aordn (a) = (Z/nZ)∗ . That means that any b ∈ (Z/nZ)∗ is some power of a. “But which power?” you cry, so we make the D EFINITION 5.4.1. If n ∈ N has a primitive root a, then for any b ∈ Z such that gcd(b, n) = 1 the smallest k ∈ N such that b ≡ ak (mod n) is called the index of b relative to a and is denoted inda (b). We shall also sometimes talk about inda (x) where x ∈ (Z/nZ)∗ , meaning the index relative to a of any representative b of the congruence class x. E XAMPLE 5.4.2. Working from the tables above (5.0.1, 5.0.2, and 5.3.1), it is easy to compute a number of examples. First, for the smaller values of the moduli, where there are few primitive roots: n a b inda (b) n a b inda (b) 2 1 1 1 7 3 1 6 3 2 1 2 2 2 2 1 3 1 4 3 1 2 4 4 3 1 5 5 5 2 1 4 6 3 2 1 5 1 6 3 3 2 4 4 2 3 5 5 3 1 4 4 2 2 3 5 1 3 1 6 3 4 2 TABLE 5.4.1. Index values for moduli n = 2, 3, 4, 5, 7 For the two larger moduli we worked out above, only n = 17 has primitive roots. Since (Z/17Z)∗ has 16 elements and 8 primitive roots, we make a larger table for just this modulus: 104 5. INDICES = DISCRETE LOGARITHMS b inda (b) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 3 16 14 1 12 5 15 11 10 2 3 7 13 4 9 6 8 5 16 6 13 12 1 3 15 2 10 7 11 9 4 5 14 8 6 16 2 15 4 11 1 5 6 14 13 9 3 12 7 10 8 7 16 10 3 4 15 13 1 14 6 9 5 7 12 11 2 8 a 10 16 10 11 4 7 5 9 14 6 1 13 15 12 3 2 8 11 16 2 7 4 3 9 13 6 14 5 1 11 12 15 10 8 12 16 6 5 12 9 11 7 2 10 15 3 1 4 13 14 8 14 16 14 9 12 13 7 3 10 2 11 16 5 4 1 6 8 TABLE 5.4.2. Index values for modulus n = 17 Some things are natural to conjecture, given these examples and the simple definition of index; many of them are very easy to prove (and are exercises): T HEOREM 5.4.3. Let n ∈ N have a primitive root a. Then (1) inda (1) = ordn (a) = φ(n); equivalently, inda (1) ≡ 0 (mod φ(n)) (2) inda (a) = 1 (3) ∀b, c ∈ Z such that gcd(b, n) = gcd(c, n) = 1, we have inda (bc) ≡ inda (b) + inda (c) (mod φ(n)) (4) ∀b ∈ Z such that gcd(b, n) = 1 and ∀k ∈ N, we have inda (bk ) ≡ k · inda (b) (mod φ(n)) These properties of inda are strikingly similar to the basic properties of a logarithm with base a. For this reason, indices are called discrete logarithms in the computer science literature. For cryptological applications, it is important to note that exponentiation in some (Z/nZ)∗ is an excellent candidate one-way function: • Given n, a, and k, fast modular exponentiation is a feasible computation of b = ak in mod n. • Given n, a, and b, no known feasible algorithm finds a k = inda (b) such that ak ≡ b (mod n). We will discuss some discrete logarithm-based cryptological protocols in the next sections. Before turning to cryptology, we explore some pure mathematical applications of in- dices. Like the applications of logarithms in basic algebra, the usefulness of indices comes from the above Theorem 5.4.3 enabling convenient algebraic manipulations of powers and multiplications. Here’s an E XAMPLE 5.4.4. We use indices to solve the congruence 3x5 ≡ 4 (mod 7) 5.4. INDICES 105 by first taking ind5 of both sides ind5 (3) + 5 ind5 (x) ≡ ind5 (4) (mod 6) and applying all the rules in Theorem 5.4.3. Looking in Table 5.4.1, we translate that into 5 + 5 · ind5 (x) ≡ 2 (mod 6) or, solving: ind5 (x) ≡ 5−1 · 3 ≡ 5 · 3 ≡ 3 (mod 6) which means, searching through the table again, that x = 6. Just to check, notice that 3 · 65 = 23328 ≡ 4 (mod 7). What if we solve the same equation, but use a different primitive root in our indices? Compute ind3 (3) + 5 ind3 (x) ≡ ind3 (4) (mod 6) so 1 + 5 · ind5 (x) ≡ 4 (mod 6) and this yields the same equation ind5 (x) ≡ 5−1 · 3 ≡ 3 (mod 6) and thus the same solution x = 6. Caution: A common mistake with indices is to use the same modulus with the indices as with the original congruence. Instead, when the congruence is in mod n, for n ∈ N, the index congruence is in mod φ(n)! 106 5. INDICES = DISCRETE LOGARITHMS Exercises for §5.4. E XERCISE 5.4.1. In the modulus n = 17, use indices to solve the congruences (a) 4x ≡ 11 (b) 5x6 ≡ 7 (c) x12 ≡ 13 (d) 8x5 ≡ 10 (e) 9x8 ≡ 8 (f) 7x ≡ 7 E XERCISE 5.4.2. The logarithm rules in Theorem 5.4.3 are very similar to rules for the usual logarithm, except one is missing: the change of base formula. Figure out what that should be in the context of indices, make a formal statement, and prove it. E XERCISE 5.4.3. Let p be an odd prime and a a primitive root mod p. (a) Prove that inda (−1) = (p − 1)/2 (b) If x, y ∈ (Z/pZ)∗ satisfy xy ≡ 1 (mod p), then what is the relationship between inda (x) and inda (y)? Prove it! (c) If x, y ∈ (Z/pZ)∗ satisfy x + y ≡ 0 (mod p), then what is the relationship between inda (x) and inda (y)? Prove it! 5.5. DIFFIE-HELLMAN KEY EXCHANGE 107 5.5. Diffie-Hellman Key Exchange About a year before the RSA cryptosystem was invented, Whitfield Diffie and Martin Hellman published New directions in cryptography [DH76], the first full description of a working public key cryptosystem in the open scientific literature.1 They defined in this paper something which has since been named after them: D EFINITION 5.5.1. The following protocol is called Diffie-Hellman key exchange [DHKE]: (1) Alice and Bob agree upon a large prime p and a primitive root r ∈ (Z/pZ)∗ and publish both. (2) Alice chooses an α ∈ Z satisfying 1 ≤ α ≤ p − 1, computes A = r α (mod p), keeps α secret, but publishes A. (3) Bob chooses a β ∈ Z satisfying 1 ≤ β ≤ p − 1, computes B = r β (mod p), keeps β secret, and publishes B. (4) Alice gets the public value B and computes S = B α . (5) Bob gets A and computes the same value S = Aβ . (6) Both Alice and Bob use the shared secret S for future communications encrypted with some symmetric cryptosystem upon which they had previously agreed. The value S which both Alice and Bob have [but Eve does not] is called their shared key or shared secret. Here is a graphical representation: Diffie-Hellman key exchange: Alice on public network Bob pick a prime p find a primitive root r pick α, compute pick β, compute A = r α (mod p) B = r β (mod p) publish A A B publish B get B, compute get A, compute S = B α (mod p) S = Aβ (mod p) assymteric crypto with key S ⇐=======================⇒ P ROPOSITION 5.5.2. When Alice and Bob follow the DHKE protocol, they both com- pute the same shared key; i.e., DHKE works. 1Although in fact, some workable public key crypto had been invented earlier within the US/UK in- telligence community, and not shared with the public. Since quite early in the Cold War, there have been mathematical theorems and proofs held in secret by large governments. 108 5. INDICES = DISCRETE LOGARITHMS P ROOF. There’s very little to check here: using the notation of the definition and, in the very middle, the commutativity of multiplication, we have α B α = r β = r β·α = r α·β = (r α )β = Aβ . The left end of this equality is the S that Alice computes, while the right end is the one that Bob computes, and they are the same. E XAMPLE 5.5.3. Alice and Bob agree in open, public discussion to use the prime p = 617 and its primitive root r = 17. Alice privately chooses her secret value α = 19 and sends the value A = 1719 ≡ 385 (mod 617) to Bob by insecure e-mail. (All unencrypted e-mail is insecure, of course.) Bob chooses his secret value β = 13 and sends the value B = 1713 ≡ 227 (mod 617) to Alice by insecure e-mail. Alice computes the shared secret S = B α = 22719 ≡ 127 (mod 617) . Bob computes the same shared secret instead by S = Aβ = 38513 ≡ 127 (mod 617) . They then can use this value to do symmetric crypto for the rest of this communication. What about the practicality of DHKE? As was discussed in §4.4, finding a the [large] prime p (we continue using the notation in Definition 5.5.1) is computationally feasible, as are the several modular exponentiations in the DHKE protocol. There remains the question of finding the primitive root r. One way is to avoid searching for r more than once. After one prime p and an associated primitive root r for p are found, each user can choose their secret (Alice’s α and Bob’s β), without overlap or conflict. This is the approach suggested in some Internet standards, see, e.g., [HC] and [LK08]. Another, more mathematical, strategy is based on the following definition, which we give because of its independent mathematical interest and the amazing life of its namesake. D EFINITION 5.5.4. A prime p with the property that 2p + 1 is also a prime is called a Sophie Germain prime. It is not known how many Sophie Germain primes there are, although it is conjectured that there are an infinite number. In fact, there are precise conjectures on the asymptotic density of such primes, as well as algorithmic techniques to generate them efficiently; see [Sho09]. 5.5. DIFFIE-HELLMAN KEY EXCHANGE 109 E XAMPLE 5.5.5. The first seventeen Sophie Germain primes are 2, 3, 5, 11, 23, 29, 41, 53, 83, 89, 113, 131, 173, 179, 191, 233, 239 The sequence of all Sophie Germain primes is sequence A005384 at the On-Line Encyclo- pedia of Integer Sequences, oeis.org. The largest known Sophie Germain prime, at the time of this writing, is 18543637900515 × 2666667 − 1 discovered in 2012 by Philipp Bliedung and a large distributed network of computers. The reader is asked to investigate the usefulness of Sophie Germain primes in DHKE is explored in the exercises, below. As mentioned in the last section 5.4, DHKE depends for its security on modular expo- nentiation be a one-way function: feasible to compute forwards (by fast modular exponen- tiation) but infeasible to computer backwards (which is an index). If discrete logs could be computed by a feasible algorithm, then Eve could completely break the security of DHKE. Starting with the public values of A and B, she would compute α = indr (A) and β = indr (B). She could then compute S as either B α , Aβ , or, directly, as r α·β . Having S, she could decrypt all of Alice and Bob’s communications as they go by. Actually, it is not necessary for Eve to be able to compute discrete logs, as long as she can solve a specific computation problem. D EFINITION 5.5.6. The Diffie-Hellman problem [DHP] consists of the following question: Given • a prime p, • a primitive root r ∈ (Z/pZ)∗ , and • two elements a, b ∈ (Z/pZ)∗ which are known to be of the form a = r x and b = r y for x, y ∈ N, although the x and y are not known compute • r xy . An efficient way to compute discrete logs would of course result in a solution of the Diffie-Hellman problem, but it is not known if there might be a solution of the DHP which is does not consist of a full-blown algorithm to compute discrete logs. Since this question is open as of this writing, it makes sense to make the most precise statement: breaking DHKE amounts of solving the DHP, which is not feasible on a classical computer2 DHKE is one of the most widely used cryptologic protocols on the Internet. It is part of SSL, TLS, SSH, IPsec, and many VPNs, among others. 2 As is the case with factoring, there is a known algorithm for a quantum computer which solves the DHP efficiently, see [Sho94]. 110 5. INDICES = DISCRETE LOGARITHMS Exercises for §5.5. E XERCISE 5.5.1. Suppose you had an efficient algorithm to generate large Sophie Ger- main primes. Describe how you would use this to do the initial choice of public parameters for DHKE – go through all of the set-up stages of this protocol, and explain how each can be done feasibly, based on either past discussions of calculations we can do feasibly or on new ideas you develop here. E XERCISE 5.5.2. The number p = 11717 is prime, and r = 103 is a primitive root of this p. Playing the role of Alice, your instructor computed A = 5123. Please play the role of Bob and do what is necessary to establish a shared key with your instructor: you compute S as in DHKE, and send your B by e-mail so that your instructor can also compute S. Then await further instructions by return e-mail, encrypted with the shared secret. You may need to perform calculations that are hard to do on a hand calculator. If you have access to, and are familiar with, some computer system like Octave, Matlab, or Mathematica, you should be able to do the calculations that way. Otherwise, you might try using your favorite search engine on the phrase “fast modular exponentiation applet”, or using wolframalpha.com. E XERCISE 5.5.3. Spell out in precise, mathematical detail all of the steps which would be used in a man-in-the-middle attack against DHKE. 5.6. THE ELGAMAL CRYPTOSYSTEM 111 5.6. The ElGamal Cryptosystem As mentioned in the previous section, exponentiation in mod p, where p is a prime known to the public, is a good candidate one-way function: It is fast (feasible) in the forward direction; its inverse, being discrete log with respect to a primitive root r mod p is thought to be infeasible – even when that root is known to the public. This was behind the security of DHKE, and now we discuss how to use this one-way function to set up a more usual public-key cryptosystem: the ElGamal Cryptosystem. The RSA public-key cryptosystem did its actual encryption by exponentiation in mod n = pq. The decryption then was by exponentiation with the multiplicative inverse in mod φ(n) of the encryption exponent. The owner of the private key – (p, q) – knows also the decryption exponent, entirely because φ(n) can be computed. Similarly, ElGamal uses an elementary arithmetic operation – multiplication of a nu- merical form of the message by a random number in some mod – to do the scrambling needed for encryption. Enough information is also passed along in the ciphertext so that the intended recipient, who knows the value of a certain discrete log, can cancel out this scrambling multiplication. Here are the details: D EFINITION 5.6.1. To start, Alice picks a large prime p, a primitive root r mod p, and a secret value α ∈ N satisfying 2 ≤ α ≤ p − 1. She computes the value a = r α and then posts her ElGamal public [encryption] key (p, r, a) on her website. Alice’s ElGamal private [decryption] key is (p, r, α). The association of decryption to encryption keys is by E : (p, r, α) 7→ (p, r, r α ). The message space is M = {m ∈ Z | 2 ≤ m ≤ p − 1}, which we will assume can be interpreted as meaningful messages encoded numerically by some widely known scheme. Say Bob wishes to send Alice the cleartext m ∈ M. For each new such message m, he generates a random number β ∈ N such that 2 ≤ β ≤ p − 2 and builds the ciphertext for ElGamal encryption as the two pieces c = e(p,r,a) (m) = (r β (mod p), m · aβ (mod p)) . When Alice gets the ciphertext c = (c1 , c2 ), she can recover the cleartext by ElGamal decryption d(p,r,α)(c1 , c2 ) = c2 · cp−1−α 1 . All of the above parts together form the ElGamal cryptosystem. We first need to know this is correct, in the sense that P ROPOSITION 5.6.2. With the notation as above in Definition 5.6.1 we have d(p,r,α) (e(p,r,a) (m)) = m ∀m ∈ M . 112 5. INDICES = DISCRETE LOGARITHMS P ROOF. Just compute: d(p,r,α) (e(p,r,a) (m)) ≡ m · aβ (mod p) · (r β )p−1−α (mod p) ≡ m · (r α )β · (r β )p−1−α (mod p) ≡ m · r αβ+β(p−1)−αβ (mod p) ≡ m · (r p−1)β (mod p) ≡ m · 1β (mod p) ≡m (mod p) Note that the power p − 1 − α as the exponent of the c1 term in decryption is to make (c−1 1 ) α without using negative powers, by applying Theorem 5.1.2. Graphically: ElGamal cryptosystem: Alice on public network Bob pick a prime p find a primitive root r choose α | 2 ≤ α ≤ p − 1 compute a = r α (mod p) publish public key (p, r, a) download public key given message m ∈ M (so 2 ≤ m ≤ p − 1) choose β | 2 ≤ β ≤ p − 2 compute c1 = r β (mod p) and c2 = m · aβ (mod p) receive ciphertext (c1 , c2 ) transmit ciphertext compute cleartext m = c2 · cp−1−α 1 There is a nice digital signature algorithm associated with ElGamal: D EFINITION 5.6.3. Suppose Alice has ElGamal private key (p, r, α) and wishes to dig- itally sign the message m ∈ M. She first chooses a random γ ∈ N satisfying 1 < γ < p−1 and gcd(γ, p − 1) = 1. The digital signature on m is (r γ (mod p), γ −1 · (m − αr γ ) (mod p − 1)), where the inverse is taken in mod p − 1. To verify the signature (x, y) on message m using Alice’s public key (p, r, a), Bob checks to see if ax xy ≡ r m (mod p): if so, he accepts; if not, he rejects. Again, we would like to know this does the right thing: 5.6. THE ELGAMAL CRYPTOSYSTEM 113 P ROPOSITION 5.6.4. Using the notation as above, Bob will accept all signed messages produced by Alice. P ROOF. Assuming the signed message (m, x, y) was produced by Alice as above, we compute: γ −1 (m−αr γ ) ax xy ≡ ar (r γ )γ (mod p) γ −1 (m−αr γ ) ≡ (r α )r (r γ )γ (mod p) γ +γγ −1 (m−αr γ ) ≡ r αr (mod p) γ +m−αr γ ≡ r αr (mod p) ≡ rm (mod p) So Bob will accept. Graphically: ElGamal digital signatures: Alice on public network Bob find prime p and primitive root r choose α | 2 ≤ α ≤ p − 1 compute a = r α (mod p) publish public key (p, r, a) download public key given message m ∈ M (so 2 ≤ m ≤ p − 1) pick random γ s.t. 1 < γ < p − 1 and gcd(γ, p − 1) = 1 compute s1 = r γ (mod p) and s2 = γ −1 · (m − αr γ ) (mod p − 1) transmit signed message (m, s1 , s2 ) receive signed message if as1 · ss12 ≡ r m (mod p) ACCEPT otherwise, REJECT 114 5. INDICES = DISCRETE LOGARITHMS Exercises for §5.6. E XERCISE 5.6.1. You instructor still likes the prime p = 11717 with primitive root r = 103 from an earlier exercise 5.5.2 on DHKE. In addition, your instructor has calculated the value a = 1020 to complete an ElGamal public key (p, r, a) = (11717, 103, 1020). Using this public key, you want to send a message to your instructor, which should consist of the number 42 (it is, after all, the answer to “life, the universe, and everything”). What ciphertext will you send? Show your work! E XERCISE 5.6.2. Now your instructor wants to send you your grade on a recent test by e-mail, and to prove that this e-mail does in fact originate with your instructor, the email contains both the score value of 97 and the addendum “This score value signed with an ElGamal Digital signature using my public key [the same instructor’s public key as above in exercise 5.6.1, being (p, r, a) = (11717, 103, 1020)]; the signature has the value (6220, 10407).” Do you accept this as truly coming from your instructor? Show your work! E XERCISE 5.6.3. Create an ElGamal public key and e-mail it to your instructor. Wait for a reply message which is ElGamal encrypted, then mail the cleartext back to your instructor. Also, use your public key to sign the number (=message) 17. Send the signed number to your instructor and wait to hear if the signature is accepted or not. Bibliography [AB09] Sanjeev Arora and Boaz Barak, Computational Complexity: A Modern Approach, Cambridge University Press, 2009. [AKS04] Manindra Agrawal, Neeraj Kayal, and Nitin Saxena, Primes is in p, Annals of mathematics (2004), 781–793. [Bou04] Nicolas Bourbaki, Theory of sets, Springer, 2004. [DH76] Whitfield Diffie and Martin E Hellman, New directions in cryptography, Information Theory, IEEE Transactions on 22 (1976), no. 6, 644–654. [FS03] Niels Ferguson and Bruce Schneier, Practical cryptography, vol. 23, Wiley New York, 2003. [Gau86] Carl Friedrich Gauß, Disquisitiones Arithmeticae, 1801. English translation by Arthur A. Clarke, 1986. [Har05] Godfrey Harold Hardy, A Mathematician’s Apology, 2005, First electronic edition, available at http://www.math.ualberta.ca/mss. [HC] Dan Harkins and Dave Carrel, RFC 2409: The Internet Key Exchange (IKE), November 1998, Status: Proposed Standard. [HW79] Godfrey Harold Hardy and Edward Maitland Wright, An introduction to the theory of numbers, Oxford University Press, 1979. [LK08] M Lepinski and S Kent, RFC 5114-Additional Diffie-Hellman Groups for Use with IETF Stan- dards, 2008. [Lub96] Michael George Luby, Pseudorandomness and Cryptographic Applications, Princeton Univer- sity Press, 1996. [MVOV96] Alfred J Menezes, Paul C Van Oorschot, and Scott A Vanstone, Handbook of applied cryptog- raphy, CRC press, 1996. [NC10] Michael A Nielsen and Isaac L Chuang, Quantum computation and quantum information, Cam- bridge university press, 2010. [PS04] Jonathan A Poritz and Morton Swimmer, Hash woes, Virus Bulletin (2004), 14–16. [RSA78] Ronald L Rivest, Adi Shamir, and Len Adleman, A method for obtaining digital signatures and public-key cryptosystems, Communications of the ACM 21 (1978), no. 2, 120–126. [Sha48] C. E. Shannon, A mathematical theory of communication, Bell Systems Technical Journal 27 (1948), 379–423, 623–656. [Sha49] , Communication theory of secrecy systems, Bell System Technical Journal 28 (1949), no. 4, 656–715. [Sho94] Peter W Shor, Algorithms for quantum computation: discrete logarithms and factoring, Foun- dations of Computer Science, 1994 Proceedings., 35th Annual Symposium on, IEEE, 1994, pp. 124–134. [Sho09] Victor Shoup, A computational introduction to number theory and algebra, Cambridge Univer- sity Press, 2009, on-line at http://shoup.net/ntb/. 115 116 BIBLIOGRAPHY [Sta02] Richard Stallman, Free Software, Free Society: Selected Essays of Richard M. Stallman, Lulu. com, 2002. [WY05] Xiaoyun Wang and Hongbo Yu, How to break md5 and other hash functions, Advances in Cryptology–EUROCRYPT 2005, Springer, 2005, pp. 19–35. Index additive inverse, 5 congruences Adleman, Leonard, 75 dividing both sides, 24 al-Haytham, Ibn, 45 existence and number of solutions, 27, 28 al-Kindi, 65 number of solutions when RHS is 0, 24 ASCII, 78 congruent associativity of addition and multiplication, 5 basic properties, 21 asymmetric cipher/cryptosystem/encryption, 74 definition, 21 Augustus, 60 coset, 48 authentication, 56 Creative Commons, iii cryptanalysis, 55 base b, 10 crypto, 55 binary representation, 10 cryptographic hash function, 82 bit, 11 cryptography, 55 brute-force attack, 64 cryptology, 55 CA, 87 cryptosystem, 55 Caesar cipher cyclic subgroup cracking, 67 defined, 48 definition, 60 used, 89, 97 certificate authority, 87 Chinese Remainder Theorem, 30, 38, 79 Diffie, Whitfield, 107 cipher, 57 Diffie-Hellman key exchange, v, 107, 111 ciphertext, 57 Diffie-Hellman problem, 109 cleartext, 57 digital certificate, 87 collision resistance, 82 digital signature commutativity of addition and multiplication, 5 ElGamal, 112 composite RSA, 83 definition, 41 Diophantine equation, 27 size of smallest factor, 41 discrete log, 111 confidentiality, 56 discrete logarithm, 104 congruence classes Disquisitiones Arithmeticae, 21 addition, 34 distance between frequency distributions, 67 addition and multiplication well-defined, 34 distributivity of multiplication over addition, 5 definition, 33 divisibility multiplication, 34 definition, 6 representatives, 34 of linear combinations, 6 117 118 INDEX transitivity, 6 Federal Information Processing Standards, US, with relatively prime factors, 23 83 Division Algorithm Fermat’s Little Theorem statement, 7 alternate statement, 49 used, 6, 9, 25, 34, 79, 91 statement, 49, 51 divisor, 6 used, 50, 99 domain, 99 fingerprint, 82 frequency analysis, 65 ElGamal cryptosystem, v, 111 Fundamental Theorem of Arithmetic encryption, 57 statement, 42 equivalence classes used, 95 definition, 33 disjoint or equal, 33 Gauss’s Theorem example: Q, 33 statement, 94 example: congruence classes, 33 used, 100 representative, 33 Gauss, Carl Friedrich, 21 equivalence relation GnuPG, 88 definition, 33 graph [γράϕω, Greek root], 55 reflexivity, 33 greatest common divisor symmetry, 33 after them division algorithm, 17 transitivity, 33 definition, 13 Euclid’s Lemma definition for more than two integers, 14 statement, 24 examples, 13, 14, 16, 18, 19, 24, 28 used, 24, 42, 52, 92 properties, 13, 14, 16, 19 Euclidean Algorithm used, 13, 15, 24, 27, 30, 34, 35, 37, 38, 42, statement, 17 47–49, 51, 52, 76, 77, 79, 80, 91, 92, 94, 95, used, 28, 77, 78, 80 98, 100, 103, 104, 112 Euler’s φ/totient function ∗ hacker, 55 counts elements of (Z/nZ) , 37 Hellman, Martin, 107 definition, 37 hex, 11 is multiplicative for relatively prime integers, hexadecimal, 11 37 used, 37, 48, 49, 51, 76–79, 89–91, 97–100 index, v values, 39, 77 basic properties, 104 Euler’s Theorem, v definition, 103 statement, 48, 51 examples, n = 17, 104 used, 77, 89 examples, small n, 103 even integer, 6 information security, 56 exhaustive search, 64 information theoretically secure, 61 integrity, 56 factor of an integer, 6 fast modular exponentiation, 78, 104, 108, 109 Julius Caesar, 60 feasible computation definition, 77 Kerckhoff’s Principle, 58 used, 77, 78, 80, 82, 104, 108–111 key, 58 INDEX 119 key distribution, 62 pairwise relatively prime, 30 key-signing party, 88 definition, 15 keyspace, 64 Pigeonhole Principle kryptos [κρυπτ oς, Greek root], 55 statement, 1 used, 47, 85 Lagrange’s Theorem, 89 PKI, 88 statement, 47 plaintext, 57 used, 100 polynomials in mod p, 98–100 least element pre-image resistance, 82 definition, 1 prime least squares, 67 definition, 41 letter frequencies, English, 66 dividing a product, 42, 99 linear congruence, 98 prime counting function, 77 definition, 27 Prime Number Theorem, 77 unique solution, 28, 98 primitive root, v, 97, 98, 100, 107, 108, 111 used, 98 Principle of Mathematical Induction, first version logos [λóγoς, Greek root], 55 statement, 1 used, 2 man-in-the-middle attack, 86 Principle of Mathematical Induction, second md5, 82 version mechanical turk, 65 statement, 3 messagespace, 65 used, 42 multiple, 6 private key, 74 multiplicative inverse probabilistic polynomial-time Turing machine, 61 in Z, 5 pseudorandom, 62 mod n, 28, 45 public key, 74 multiplicative order in mod n, 91, 97–100 public key infrastructure, 88 definition, 47 public-key cryptosystem, 74 divides φ(n), 47 quantum computer, 75, 109 examples, 89, 90 quotient, 7 well-defined, 47 mutually relatively prime, 15 relatively prime, 13 definition, 13 National Institute of Standards and Technology, linear combination giving 1, 14 US [NIST], 83 remainder National Security Agency, US [NSA], 83 definition, 7 non-repudiation, 56 Rivest, Ron, 75, 82 ROT13, 60 octal, 11 RSA Octavian (Augustus), 60 cryptosystem, v, 75, 107 odd integer, 6 exponent, 76 one-time pad, 61 modulus, 75 one-way function, 75, 104, 109, 111 OpenPGP, 88 scytale, 57 order, see also multiplicative order in mod n second pre-image resistance, 82 120 INDEX security through obscurity, 58 sexagesimal, 11 SHA-1, 83 SHA-2, 83 SHA-256, 83 Shamir, Adi, 75 signing key, 84 Sophie Germain prime definition, 108 examples, 109 square error, 67 symmetric cipher/cryptosystem/encryption, 73 trusted third party, 87 Unicode, 78 verification key, 84 Vernam Cipher, 61 Vigenère cipher, 60 web of trust, 88 Well-Ordering Principle statement, 1 used, 2, 7, 47 Wilson’s Theorem, 45, 102 yfesdrype, 56 zero product property, 99