Authors Albert R Meyer Eric Lehman F Thomson Leighton
License CC-BY-SA-3.0
“mcs” — 2017/3/10 — 22:22 — page i — #1

Mathematics for Computer Science
revised Friday 10th March, 2017, 22:22

Eric Lehman
Google Inc.

F Thomson Leighton
Department of Mathematics and the Computer Science and AI Laboratory, Massachusetts Institute of Technology; Akamai Technologies

Albert R Meyer
Department of Electrical Engineering and Computer Science and the Computer Science and AI Laboratory, Massachusetts Institute of Technology

2016, Eric Lehman, F Tom Leighton, Albert R Meyer. This work is available under the terms of the Creative Commons Attribution-ShareAlike 3.0 license.

Contents

I Proofs
Introduction
  0.1 References
1 What is a Proof?
  1.1 Propositions
  1.2 Predicates
  1.3 The Axiomatic Method
  1.4 Our Axioms
  1.5 Proving an Implication
  1.6 Proving an “If and Only If”
  1.7 Proof by Cases
  1.8 Proof by Contradiction
  1.9 Good Proofs in Practice
  1.10 References
2 The Well Ordering Principle
  2.1 Well Ordering Proofs
  2.2 Template for Well Ordering Proofs
  2.3 Factoring into Primes
  2.4 Well Ordered Sets
3 Logical Formulas
  3.1 Propositions from Propositions
  3.2 Propositional Logic in Computer Programs
  3.3 Equivalence and Validity
  3.4 The Algebra of Propositions
  3.5 The SAT Problem
  3.6 Predicate Formulas
  3.7 References
4 Mathematical Data Types
  4.1 Sets
  4.2 Sequences
  4.3 Functions
  4.4 Binary Relations
  4.5 Finite Cardinality
5 Induction
  5.1 Ordinary Induction
  5.2 Strong Induction
  5.3 Strong Induction vs. Induction vs. Well Ordering
6 State Machines
  6.1 States and Transitions
  6.2 The Invariant Principle
  6.3 Partial Correctness & Termination
  6.4 The Stable Marriage Problem
7 Recursive Data Types
  7.1 Recursive Definitions and Structural Induction
  7.2 Strings of Matched Brackets
  7.3 Recursive Functions on Nonnegative Integers
  7.4 Arithmetic Expressions
  7.5 Games as a Recursive Data Type
  7.6 Induction in Computer Science
8 Infinite Sets
  8.1 Infinite Cardinality
  8.2 The Halting Problem
  8.3 The Logic of Sets
  8.4 Does All This Really Work?

II Structures
Introduction
9 Number Theory
  9.1 Divisibility
  9.2 The Greatest Common Divisor
  9.3 Prime Mysteries
  9.4 The Fundamental Theorem of Arithmetic
  9.5 Alan Turing
  9.6 Modular Arithmetic
  9.7 Remainder Arithmetic
  9.8 Turing’s Code (Version 2.0)
  9.9 Multiplicative Inverses and Cancelling
  9.10 Euler’s Theorem
  9.11 RSA Public Key Encryption
  9.12 What has SAT got to do with it?
  9.13 References
10 Directed graphs & Partial Orders
  10.1 Vertex Degrees
  10.2 Walks and Paths
  10.3 Adjacency Matrices
  10.4 Walk Relations
  10.5 Directed Acyclic Graphs & Scheduling
  10.6 Partial Orders
  10.7 Representing Partial Orders by Set Containment
  10.8 Linear Orders
  10.9 Product Orders
  10.10 Equivalence Relations
  10.11 Summary of Relational Properties
11 Communication Networks
  11.1 Routing
  11.2 Routing Measures
  11.3 Network Designs
12 Simple Graphs
  12.1 Vertex Adjacency and Degrees
  12.2 Sexual Demographics in America
  12.3 Some Common Graphs
  12.4 Isomorphism
  12.5 Bipartite Graphs & Matchings
  12.6 Coloring
  12.7 Simple Walks
  12.8 Connectivity
  12.9 Forests & Trees
  12.10 References
13 Planar Graphs
  13.1 Drawing Graphs in the Plane
  13.2 Definitions of Planar Graphs
  13.3 Euler’s Formula
  13.4 Bounding the Number of Edges in a Planar Graph
  13.5 Returning to $K_5$ and $K_{3,3}$
  13.6 Coloring Planar Graphs
  13.7 Classifying Polyhedra
  13.8 Another Characterization for Planar Graphs

III Counting
Introduction
14 Sums and Asymptotics
  14.1 The Value of an Annuity
  14.2 Sums of Powers
  14.3 Approximating Sums
  14.4 Hanging Out Over the Edge
  14.5 Products
  14.6 Double Trouble
  14.7 Asymptotic Notation
15 Cardinality Rules
  15.1 Counting One Thing by Counting Another
  15.2 Counting Sequences
  15.3 The Generalized Product Rule
  15.4 The Division Rule
  15.5 Counting Subsets
  15.6 Sequences with Repetitions
  15.7 Counting Practice: Poker Hands
  15.8 The Pigeonhole Principle
  15.9 Inclusion-Exclusion
  15.10 Combinatorial Proofs
  15.11 References
16 Generating Functions
  16.1 Infinite Series
  16.2 Counting with Generating Functions
  16.3 Partial Fractions
  16.4 Solving Linear Recurrences
  16.5 Formal Power Series
  16.6 References

IV Probability
Introduction
17 Events and Probability Spaces
  17.1 Let’s Make a Deal
  17.2 The Four Step Method
  17.3 Strange Dice
  17.4 The Birthday Principle
  17.5 Set Theory and Probability
  17.6 References
18 Conditional Probability
  18.1 Monty Hall Confusion
  18.2 Definition and Notation
  18.3 The Four-Step Method for Conditional Probability
  18.4 Why Tree Diagrams Work
  18.5 The Law of Total Probability
  18.6 Simpson’s Paradox
  18.7 Independence
  18.8 Mutual Independence
  18.9 Probability versus Confidence
19 Random Variables
  19.1 Random Variable Examples
  19.2 Independence
  19.3 Distribution Functions
  19.4 Great Expectations
  19.5 Linearity of Expectation
20 Deviation from the Mean
  20.1 Markov’s Theorem
  20.2 Chebyshev’s Theorem
  20.3 Properties of Variance
  20.4 Estimation by Random Sampling
  20.5 Confidence in an Estimation
  20.6 Sums of Random Variables
  20.7 Really Great Expectations
21 Random Walks
  21.1 Gambler’s Ruin
  21.2 Random Walks on Graphs

V Recurrences
Introduction
22 Recurrences
  22.1 The Towers of Hanoi
  22.2 Merge Sort
  22.3 Linear Recurrences
  22.4 Divide-and-Conquer Recurrences
  22.5 A Feel for Recurrences

Bibliography
Glossary of Symbols
Index

I Proofs

Introduction

This text explains how to use mathematical models and methods to analyze problems that arise in computer science. Proofs play a central role in this work because the authors share a belief with most mathematicians that proofs are essential for genuine understanding.
Proofs also play a growing role in computer science; they are used to certify that software and hardware will always behave correctly, something that no amount of testing can do.

Simply put, a proof is a method of establishing truth. Like beauty, “truth” sometimes depends on the eye of the beholder, and it should not be surprising that what constitutes a proof differs among fields. For example, in the judicial system, legal truth is decided by a jury based on the allowable evidence presented at trial. In the business world, authoritative truth is specified by a trusted person or organization, or maybe just your boss. In fields such as physics or biology, scientific truth is confirmed by experiment.¹ In statistics, probable truth is established by statistical analysis of sample data.

¹ Actually, only scientific falsehood can be demonstrated by an experiment, namely when the experiment fails to behave as predicted. But no amount of experiment can confirm that the next experiment won’t fail. For this reason, scientists rarely speak of truth, but rather of theories that accurately predict past, and anticipated future, experiments.

Philosophical proof involves careful exposition and persuasion typically based on a series of small, plausible arguments. The best example begins with “Cogito ergo sum,” a Latin sentence that translates as “I think, therefore I am.” This phrase comes from the beginning of a 17th century essay by the mathematician/philosopher, René Descartes, and it is one of the most famous quotes in the world: do a web search for it, and you will be flooded with hits.

Deducing your existence from the fact that you’re thinking about your existence is a pretty cool and persuasive-sounding idea. However, with just a few more lines of argument in this vein, Descartes goes on to conclude that there is an infinitely beneficent God.
Whether or not you believe in an infinitely beneficent God, you’ll probably agree that any very short “proof” of God’s infinite beneficence is bound to be far-fetched. So even in masterful hands, this approach is not reliable.

Mathematics has its own specific notion of “proof.”

Definition. A mathematical proof of a proposition is a chain of logical deductions leading to the proposition from a base set of axioms.

The three key ideas in this definition are highlighted: proposition, logical deduction, and axiom. Chapter 1 examines these three ideas along with some basic ways of organizing proofs. Chapter 2 introduces the Well Ordering Principle, a basic method of proof; later, Chapter 5 introduces the closely related proof method of induction.

If you’re going to prove a proposition, you’d better have a precise understanding of what the proposition means. To avoid ambiguity and uncertain definitions in ordinary language, mathematicians use language very precisely, and they often express propositions using logical formulas; these are the subject of Chapter 3.

The first three chapters assume the reader is familiar with a few mathematical concepts like sets and functions. Chapters 4 and 8 offer a more careful look at such mathematical data types, examining in particular properties and methods for proving things about infinite sets. Chapter 7 goes on to examine recursively defined data types.

0.1 References

[12], [46], [1]

1 What is a Proof?

1.1 Propositions

Definition. A proposition is a statement (communication) that is either true or false.

For example, both of the following statements are propositions. The first is true, and the second is false.

Proposition 1.1.1. $2 + 3 = 5$.

Proposition 1.1.2. $1 + 1 = 3$.
Being true or false doesn’t sound like much of a limitation, but it does exclude statements such as “Wherefore art thou Romeo?” and “Give me an A!” It also excludes statements whose truth varies with circumstance such as, “It’s five o’clock,” or “the stock market will rise tomorrow.”

Unfortunately it is not always easy to decide if a claimed proposition is true or false:

Claim 1.1.3. For every nonnegative integer $n$, the value of $n^2 + n + 41$ is prime.

(A prime is an integer greater than 1 that is not divisible by any other integer greater than 1. For example, 2, 3, 5, 7, 11 are the first five primes.) Let’s try some numerical experimentation to check this proposition. Let

$$p(n) ::= n^2 + n + 41.\ ^1 \tag{1.1}$$

We begin with $p(0) = 41$, which is prime; then

$$p(1) = 43,\ p(2) = 47,\ p(3) = 53,\ \ldots,\ p(20) = 461$$

are each prime. Hmmm, starts to look like a plausible claim. In fact we can keep checking through $n = 39$ and confirm that $p(39) = 1601$ is prime.

But $p(40) = 40^2 + 40 + 41 = 41 \cdot 41$, which is not prime. So Claim 1.1.3 is false since it’s not true that $p(n)$ is prime for all nonnegative integers $n$. In fact, it’s not hard to show that no polynomial with integer coefficients can map all nonnegative numbers into prime numbers, unless it’s a constant (see Problem 1.26). But this example highlights the point that, in general, you can’t check a claim about an infinite set by checking a finite sample of its elements, no matter how large the sample.

¹ The symbol $::=$ means “equal by definition.” It’s always ok simply to write “=” instead of $::=$, but reminding the reader that an equality holds by definition can be helpful.

By the way, propositions like this about all numbers or all items of some kind are so common that there is a special notation for them.
With this notation, Claim 1.1.3 would be

$$\forall n \in \mathbb{N}.\ p(n) \text{ is prime}. \tag{1.2}$$

Here the symbol $\forall$ is read “for all.” The symbol $\mathbb{N}$ stands for the set of nonnegative integers: 0, 1, 2, 3, . . . (ask your instructor for the complete list). The symbol “$\in$” is read as “is a member of,” or “belongs to,” or simply as “is in.” The period after the $\mathbb{N}$ is just a separator between phrases.

Here are two even more extreme examples:

Conjecture. [Euler] The equation

$$a^4 + b^4 + c^4 = d^4$$

has no solution when $a, b, c, d$ are positive integers.

Euler (pronounced “oiler”) conjectured this in 1769. But the conjecture was proved false 218 years later by Noam Elkies at a liberal arts school up Mass Ave. The solution he found was $a = 95800$, $b = 217519$, $c = 414560$, $d = 422481$.

In logical notation, Euler’s Conjecture could be written,

$$\forall a \in \mathbb{Z}^+\ \forall b \in \mathbb{Z}^+\ \forall c \in \mathbb{Z}^+\ \forall d \in \mathbb{Z}^+.\ a^4 + b^4 + c^4 \neq d^4.$$

Here, $\mathbb{Z}^+$ is a symbol for the positive integers. Strings of $\forall$’s like this are usually abbreviated for easier reading:

$$\forall a, b, c, d \in \mathbb{Z}^+.\ a^4 + b^4 + c^4 \neq d^4.$$

Here’s another claim which would be hard to falsify by sampling: the smallest possible $x, y, z$ that satisfy the equality each have more than 1000 digits!

False Claim. $313(x^3 + y^3) = z^3$ has no solution when $x, y, z \in \mathbb{Z}^+$.

It’s worth mentioning a couple of further famous propositions whose proofs were sought for centuries before finally being discovered:

Proposition 1.1.4 (Four Color Theorem). Every map can be colored with 4 colors so that adjacent² regions have different colors.

² Two regions are adjacent only when they share a boundary segment of positive length. They are not considered to be adjacent if their boundaries meet only at a few points.

Several incorrect proofs of this theorem have been published, including one that stood for 10 years in the late 19th century before its mistake was found.
A laborious proof was finally found in 1976 by mathematicians Appel and Haken, who used a complex computer program to categorize the four-colorable maps. The program left a few thousand maps uncategorized, which were checked by hand by Haken and his assistants, among them his 15-year-old daughter.

There was reason to doubt whether this was a legitimate proof: the proof was too big to be checked without a computer. No one could guarantee that the computer calculated correctly, nor was anyone enthusiastic about exerting the effort to recheck the four-colorings of thousands of maps that were done by hand. Two decades later a mostly intelligible proof of the Four Color Theorem was found, though a computer is still needed to check four-colorability of several hundred special maps.³

³ The story of the proof of the Four Color Theorem is told in a well-reviewed popular (non-technical) book: “Four Colors Suffice. How the Map Problem was Solved.” Robin Wilson. Princeton Univ. Press, 2003, 276pp. ISBN 0-691-11533-8.

Proposition 1.1.5 (Fermat’s Last Theorem). There are no positive integers $x$, $y$ and $z$ such that

$$x^n + y^n = z^n$$

for some integer $n > 2$.

In a book he was reading around 1630, Fermat claimed to have a proof for this proposition, but not enough space in the margin to write it down. Over the years, the Theorem was proved to hold for all $n$ up to 4,000,000, but we’ve seen that this shouldn’t necessarily inspire confidence that it holds for all $n$. There is, after all, a clear resemblance between Fermat’s Last Theorem and Euler’s false Conjecture. Finally, in 1994, British mathematician Andrew Wiles gave a proof, after seven years of working in secrecy and isolation in his attic. His proof did not fit in any margin.⁴

⁴ In fact, Wiles’ original proof was wrong, but he and several collaborators used his ideas to arrive at a correct proof a year later. This story is the subject of the popular book, Fermat’s Enigma by Simon Singh, Walker & Company, November, 1997.

Lastly, let’s mention another simply stated proposition whose truth remains unknown.

Conjecture 1.1.6 (Goldbach). Every even integer greater than 2 is the sum of two primes.

Goldbach’s Conjecture dates back to 1742. It is known to hold for all numbers up to $10^{18}$, but to this day, no one knows whether it’s true or false.

For a computer scientist, some of the most important things to prove are the correctness of programs and systems, that is, whether a program or system does what it’s supposed to. Programs are notoriously buggy, and there’s a growing community of researchers and practitioners trying to find ways to prove program correctness. These efforts have been successful enough in the case of CPU chips that they are now routinely used by leading chip manufacturers to prove chip correctness and avoid some notorious past mistakes.

Developing mathematical methods to verify programs and systems remains an active research area. We’ll illustrate some of these methods in Chapter 5.

1.2 Predicates

A predicate can be understood as a proposition whose truth depends on the value of one or more variables. So “$n$ is a perfect square” describes a predicate, since you can’t say if it’s true or false until you know what the value of the variable $n$ happens to be. Once you know, for example, that $n$ equals 4, the predicate becomes the true proposition “4 is a perfect square”. Remember, nothing says that the proposition has to be true: if the value of $n$ were 5, you would get the false proposition “5 is a perfect square.”

Like other propositions, predicates are often named with a letter. Furthermore, a function-like notation is used to denote a predicate supplied with specific variable values. For example, we might use the name “$P$” for the predicate above:

$$P(n) ::= \text{“}n \text{ is a perfect square”},$$

and repeat the remarks above by asserting that $P(4)$ is true, and $P(5)$ is false.
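For readers who think in code, a predicate behaves like a Boolean-valued function. The following is a minimal sketch, not part of the original text: the function names `P`, `p`, and `is_prime` are chosen here for illustration. It evaluates the perfect-square predicate at 4 and 5 and reruns the numerical experiment on Claim 1.1.3 from Section 1.1.

```python
from math import isqrt


def P(n: int) -> bool:
    """Predicate P(n): 'n is a perfect square'."""
    return n >= 0 and isqrt(n) ** 2 == n


def p(n: int) -> int:
    """Ordinary function p(n) = n^2 + n + 41 from equation (1.1)."""
    return n * n + n + 41


def is_prime(k: int) -> bool:
    """Trial division; adequate for the small values used here."""
    if k < 2:
        return False
    return all(k % d != 0 for d in range(2, isqrt(k) + 1))


# P(4) is a true proposition, P(5) a false one.
print(P(4), P(5))                               # True False

# "p(n) is prime" holds for every n from 0 through 39 ...
print(all(is_prime(p(n)) for n in range(40)))   # True

# ... but fails at n = 40, where p(40) = 41 * 41.
print(is_prime(p(40)), p(40) == 41 * 41)        # False True
```

Note that `P` returns a truth value while `p` returns a number. The finite check over `range(40)` is evidence, not a proof, and in this case the claim it appears to support is false.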
This notation for predicates is confusingly similar to ordinary function notation. If $P$ is a predicate, then $P(n)$ is either true or false, depending on the value of $n$. On the other hand, if $p$ is an ordinary function, like $n^2 + 1$, then $p(n)$ is a numerical quantity. Don’t confuse these two!

1.3 The Axiomatic Method

The standard procedure for establishing truth in mathematics was invented by Euclid, a mathematician working in Alexandria, Egypt around 300 BC. His idea was to begin with five assumptions about geometry, which seemed undeniable based on direct experience. (For example, “There is a straight line segment between every pair of points”.) Propositions like these that are simply accepted as true are called axioms.

Starting from these axioms, Euclid established the truth of many additional propositions by providing “proofs.” A proof is a sequence of logical deductions from axioms and previously proved statements that concludes with the proposition in question. You probably wrote many proofs in high school geometry class, and you’ll see a lot more in this text.

There are several common terms for a proposition that has been proved. The different terms hint at the role of the proposition within a larger body of work.

• Important true propositions are called theorems.
• A lemma is a preliminary proposition useful for proving later propositions.
• A corollary is a proposition that follows in just a few logical steps from a theorem.

These definitions are not precise. In fact, sometimes a good lemma turns out to be far more important than the theorem it was originally used to prove.

Euclid’s axiom-and-proof approach, now called the axiomatic method, remains the foundation for mathematics today. In fact, just a handful of axioms, called the Zermelo-Fraenkel with Choice axioms (ZFC), together with a few logical deduction rules, appear to be sufficient to derive essentially all of mathematics.
We’ll examine these in Chapter 8.

1.4 Our Axioms

The ZFC axioms are important in studying and justifying the foundations of mathematics, but for practical purposes, they are much too primitive. Proving theorems in ZFC is a little like writing programs in byte code instead of a full-fledged programming language; by one reckoning, a formal proof in ZFC that $2 + 2 = 4$ requires more than 20,000 steps! So instead of starting with ZFC, we’re going to take a huge set of axioms as our foundation: we’ll accept all familiar facts from high school math.

This will give us a quick launch, but you may find this imprecise specification of the axioms troubling at times. For example, in the midst of a proof, you may start to wonder, “Must I prove this little fact or can I take it as an axiom?” There really is no absolute answer, since what’s reasonable to assume and what requires proof depends on the circumstances and the audience. A good general guideline is simply to be up front about what you’re assuming.

1.4.1 Logical Deductions

Logical deductions, or inference rules, are used to prove new propositions using previously proved ones.

A fundamental inference rule is modus ponens. This rule says that a proof of $P$ together with a proof that $P$ IMPLIES $Q$ is a proof of $Q$.

Inference rules are sometimes written in a funny notation. For example, modus ponens is written:

Rule.
$$\frac{P, \quad P \text{ IMPLIES } Q}{Q}$$

When the statements above the line, called the antecedents, are proved, then we can consider the statement below the line, called the conclusion or consequent, to also be proved.

A key requirement of an inference rule is that it must be sound: an assignment of truth values to the letters $P$, $Q$, . . . , that makes all the antecedents true must also make the consequent true. So if we start off with true axioms and apply sound inference rules, everything we prove will also be true.
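Because soundness is a statement about all truth assignments, and there are only finitely many of them for a rule with finitely many letters, it can be checked mechanically. The sketch below is an illustration under stated assumptions rather than anything from the text: the helper names `implies` and `is_sound` are invented here, and the second rule (inferring $Q$ IMPLIES $P$ from $P$ IMPLIES $Q$) is a deliberately unsound example added for contrast.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    """Truth value of 'p IMPLIES q'."""
    return (not p) or q


def is_sound(antecedents, consequent, num_vars: int) -> bool:
    """Return True if every truth assignment that makes all the
    antecedents true also makes the consequent true."""
    for values in product([True, False], repeat=num_vars):
        if all(a(*values) for a in antecedents) and not consequent(*values):
            return False  # found an assignment that breaks the rule
    return True


# Modus ponens: from P and (P IMPLIES Q), conclude Q.
modus_ponens_sound = is_sound(
    antecedents=[lambda P, Q: P, lambda P, Q: implies(P, Q)],
    consequent=lambda P, Q: Q,
    num_vars=2,
)

# Hypothetical unsound rule (not from the text):
# from (P IMPLIES Q), conclude (Q IMPLIES P).
converse_sound = is_sound(
    antecedents=[lambda P, Q: implies(P, Q)],
    consequent=lambda P, Q: implies(Q, P),
    num_vars=2,
)

print(modus_ponens_sound)  # True
print(converse_sound)      # False
```

The second check fails on the assignment where $P$ is false and $Q$ is true: the antecedent holds but the consequent does not.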
There are many other natural, sound inference rules, for example:

Rule.
$$\frac{P \text{ IMPLIES } Q, \quad Q \text{ IMPLIES } R}{P \text{ IMPLIES } R}$$

Rule.
$$\frac{\text{NOT}(P) \text{ IMPLIES } \text{NOT}(Q)}{Q \text{ IMPLIES } P}$$

On the other hand,

Non-Rule.
$$\frac{\text{NOT}(P) \text{ IMPLIES } \text{NOT}(Q)}{P \text{ IMPLIES } Q}$$

is not sound: if $P$ is assigned T and $Q$ is assigned F, then the antecedent is true and the consequent is not.

As with axioms, we will not be too formal about the set of legal inference rules. Each step in a proof should be clear and “logical”; in particular, you should state what previously proved facts are used to derive each new conclusion.

1.4.2 Patterns of Proof

In principle, a proof can be any sequence of logical deductions from axioms and previously proved statements that concludes with the proposition in question. This freedom in constructing a proof can seem overwhelming at first. How do you even start a proof?

Here’s the good news: many proofs follow one of a handful of standard templates. Each proof has its own details, of course, but these templates at least provide you with an outline to fill in. We’ll go through several of these standard patterns, pointing out the basic idea and common pitfalls and giving some examples. Many of these templates fit together; one may give you a top-level outline while others help you at the next level of detail. And we’ll show you other, more sophisticated proof techniques later on.

The recipes below are very specific at times, telling you exactly which words to write down on your piece of paper. You’re certainly free to say things your own way instead; we’re just giving you something you could say so that you’re never at a complete loss.

1.5 Proving an Implication

Propositions of the form “If $P$, then $Q$” are called implications.
This implication is often rephrased as “$P$ IMPLIES $Q$.”

Here are some examples:

• (Quadratic Formula) If $ax^2 + bx + c = 0$ and $a \neq 0$, then
$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}.$$
• (Goldbach’s Conjecture 1.1.6 rephrased) If $n$ is an even integer greater than 2, then $n$ is a sum of two primes.
• If $0 \le x \le 2$, then $-x^3 + 4x + 1 > 0$.

There are a couple of standard methods for proving an implication.

1.5.1 Method #1

In order to prove that $P$ IMPLIES $Q$:

1. Write, “Assume $P$.”
2. Show that $Q$ logically follows.

Example

Theorem 1.5.1. If $0 \le x \le 2$, then $-x^3 + 4x + 1 > 0$.

Before we write a proof of this theorem, we have to do some scratchwork to figure out why it is true.

The inequality certainly holds for $x = 0$; then the left side is equal to 1 and $1 > 0$. As $x$ grows, the $4x$ term (which is positive) initially seems to have greater magnitude than $-x^3$ (which is negative). For example, when $x = 1$, we have $4x = 4$, but $-x^3 = -1$ only. In fact, it looks like $-x^3$ doesn’t begin to dominate until $x > 2$. So it seems the $-x^3 + 4x$ part should be nonnegative for all $x$ between 0 and 2, which would imply that $-x^3 + 4x + 1$ is positive.

So far, so good. But we still have to replace all those “seems like” phrases with solid, logical arguments. We can get a better handle on the critical $-x^3 + 4x$ part by factoring it, which is not too hard:

$$-x^3 + 4x = x(2 - x)(2 + x)$$

Aha! For $x$ between 0 and 2, all of the terms on the right side are nonnegative. And a product of nonnegative terms is also nonnegative. Let’s organize this blizzard of observations into a clean proof.

Proof. Assume $0 \le x \le 2$. Then $x$, $2 - x$ and $2 + x$ are all nonnegative. Therefore, the product of these terms is also nonnegative. Adding 1 to this product gives a positive number, so:

$$x(2 - x)(2 + x) + 1 > 0$$

Multiplying out on the left side proves that

$$-x^3 + 4x + 1 > 0$$

as claimed. ∎
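The scratchwork stage is a natural place for a quick numerical sanity check. The sketch below is illustrative only: the function names `cubic` and `factored` and the choice of sample grid are assumptions made here. It uses exact rational arithmetic to confirm, at 201 evenly spaced points in $[0, 2]$, that the factored and expanded forms agree and that the expression stays positive.

```python
from fractions import Fraction


def cubic(x):
    """Left side of Theorem 1.5.1: -x^3 + 4x + 1."""
    return -x**3 + 4 * x + 1


def factored(x):
    """Factored form used in the proof: x(2 - x)(2 + x) + 1."""
    return x * (2 - x) * (2 + x) + 1


# Exact sample points 0, 0.01, 0.02, ..., 2.00 (no floating-point error).
samples = [Fraction(k, 100) for k in range(0, 201)]

identity_holds = all(cubic(x) == factored(x) for x in samples)
all_positive = all(cubic(x) > 0 for x in samples)
smallest = min(cubic(x) for x in samples)

print(identity_holds)  # True
print(all_positive)    # True
print(smallest)        # 1
```

A check like this can catch an algebra slip, such as a dropped sign in the factoring, but as Section 1.1 warns, sampling finitely many points does not establish the claim for every $x$ in the interval. That is the job of the proof.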
There are a couple points here that apply to all proofs:

• You’ll often need to do some scratchwork while you’re trying to figure out the logical steps of a proof. Your scratchwork can be as disorganized as you like: full of dead-ends, strange diagrams, obscene words, whatever. But keep your scratchwork separate from your final proof, which should be clear and concise.
• Proofs typically begin with the word “Proof” and end with some sort of delimiter like ∎ or “QED.” The only purpose for these conventions is to clarify where proofs begin and end.

1.5.2 Method #2 - Prove the Contrapositive

An implication (“$P$ IMPLIES $Q$”) is logically equivalent to its contrapositive

$$\text{NOT}(Q) \text{ IMPLIES } \text{NOT}(P).$$

Proving one is as good as proving the other, and proving the contrapositive is sometimes easier than proving the original statement. If so, then you can proceed as follows:

1. Write, “We prove the contrapositive:” and then state the contrapositive.
2. Proceed as in Method #1.

Example

Theorem 1.5.2. If $r$ is irrational, then $\sqrt{r}$ is also irrational.

A number is rational when it equals a quotient of integers, that is, if it equals $m/n$ for some integers $m$ and $n$. If it’s not rational, then it’s called irrational. So we must show that if $r$ is not a ratio of integers, then $\sqrt{r}$ is also not a ratio of integers. That’s pretty convoluted! We can eliminate both not’s and simplify the proof by using the contrapositive instead.

Proof. We prove the contrapositive: if $\sqrt{r}$ is rational, then $r$ is rational.

Assume that $\sqrt{r}$ is rational. Then there exist integers $m$ and $n$ such that:

$$\sqrt{r} = \frac{m}{n}$$

Squaring both sides gives:

$$r = \frac{m^2}{n^2}$$

Since $m^2$ and $n^2$ are integers, $r$ is also rational. ∎

1.6 Proving an “If and Only If”

Many mathematical theorems assert that two statements are logically equivalent; that is, one holds if and only if the other does.
Here is an example that has been known for several thousand years:

Two triangles have the same side lengths if and only if two side lengths and the angle between those sides are the same.

The phrase “if and only if” comes up so often that it is often abbreviated “iff.”

1.6.1 Method #1: Prove Each Statement Implies the Other

The statement “$P$ IFF $Q$” is equivalent to the two statements “$P$ IMPLIES $Q$” and “$Q$ IMPLIES $P$.” So you can prove an “iff” by proving two implications:

1. Write, “We prove $P$ implies $Q$ and vice-versa.”
2. Write, “First, we show $P$ implies $Q$.” Do this by one of the methods in Section 1.5.
3. Write, “Now, we show $Q$ implies $P$.” Again, do this by one of the methods in Section 1.5.

1.6.2 Method #2: Construct a Chain of Iffs

In order to prove that $P$ is true iff $Q$ is true:

1. Write, “We construct a chain of if-and-only-if implications.”
2. Prove $P$ is equivalent to a second statement which is equivalent to a third statement and so forth until you reach $Q$.

This method sometimes requires more ingenuity than the first, but the result can be a short, elegant proof.

Example

The standard deviation of a sequence of values $x_1, x_2, \ldots, x_n$ is defined to be:

$$\sqrt{\frac{(x_1 - \mu)^2 + (x_2 - \mu)^2 + \cdots + (x_n - \mu)^2}{n}} \tag{1.3}$$

where $\mu$ is the average or mean of the values:

$$\mu ::= \frac{x_1 + x_2 + \cdots + x_n}{n}$$

Theorem 1.6.1. The standard deviation of a sequence of values $x_1, \ldots, x_n$ is zero iff all the values are equal to the mean.

For example, the standard deviation of test scores is zero if and only if everyone scored exactly the class average.

Proof. We construct a chain of “iff” implications, starting with the statement that the standard deviation (1.3) is zero:

$$\sqrt{\frac{(x_1 - \mu)^2 + (x_2 - \mu)^2 + \cdots + (x_n - \mu)^2}{n}} = 0. \tag{1.4}$$

Now since zero is the only number whose square root is zero, equation (1.4) holds iff

$$(x_1 - \mu)^2 + (x_2 - \mu)^2 + \cdots + (x_n - \mu)^2 = 0. \tag{1.5}$$

Squares of real numbers are always nonnegative, so every term on the left-hand side of equation (1.5) is nonnegative. This means that (1.5) holds iff

Every term on the left-hand side of (1.5) is zero. (1.6)

But a term $(x_i - \mu)^2$ is zero iff $x_i = \mu$, so (1.6) is true iff

Every $x_i$ equals the mean. ∎

1.7 Proof by Cases

Breaking a complicated proof into cases and proving each case separately is a common, useful proof strategy. Here’s an amusing example.

Let’s agree that given any two people, either they have met or not. If every pair of people in a group has met, we’ll call the group a club. If every pair of people in a group has not met, we’ll call it a group of strangers.

Theorem. Every collection of 6 people includes a club of 3 people or a group of 3 strangers.

Proof. The proof is by case analysis.⁵ Let $x$ denote one of the six people. There are two cases:

1. Among the 5 other people besides $x$, at least 3 have met $x$.
2. Among the 5 other people, at least 3 have not met $x$.

Now, we have to be sure that at least one of these two cases must hold,⁶ but that’s easy: we’ve split the 5 people into two groups, those who have met $x$ and those who have not, so one of the groups must have at least half the people.

⁵ Describing your approach at the outset helps orient the reader.

⁶ Part of a case analysis argument is showing that you’ve covered all the cases. This is often obvious, because the two cases are of the form “$P$” and “not $P$.” However, the situation above is not stated quite so simply.

Case 1: Suppose that at least 3 people did meet $x$.

This case splits into two subcases:

Case 1.1: No pair among those people met each other. Then these people are a group of at least 3 strangers. The theorem holds in this subcase.
Case 1.2: Some pair among those people have met each other. Then that pair, together with $x$, form a club of 3 people. So the theorem holds in this subcase.

This implies that the theorem holds in Case 1.

Case 2: Suppose that at least 3 people did not meet $x$.

This case also splits into two subcases:

Case 2.1: Every pair among those people met each other. Then these people are a club of at least 3 people. So the theorem holds in this subcase.

Case 2.2: Some pair among those people have not met each other. Then that pair, together with $x$, form a group of at least 3 strangers. So the theorem holds in this subcase.

This implies that the theorem also holds in Case 2, and therefore holds in all cases. ∎

1.8 Proof by Contradiction

In a proof by contradiction, or indirect proof, you show that if a proposition were false, then some false fact would be true. Since a false fact by definition can’t be true, the proposition must be true.

Proof by contradiction is always a viable approach. However, as the name suggests, indirect proofs can be a little convoluted, so direct proofs are generally preferable when they are available.

Method: In order to prove a proposition $P$ by contradiction:

1. Write, “We use proof by contradiction.”
2. Write, “Suppose $P$ is false.”
3. Deduce something known to be false (a logical contradiction).
4. Write, “This is a contradiction. Therefore, $P$ must be true.”

Example

We’ll prove by contradiction that $\sqrt{2}$ is irrational. Remember that a number is rational if it is equal to a ratio of integers; for example, $3.5 = 7/2$ and $0.1111\ldots = 1/9$ are rational numbers.

Theorem 1.8.1. $\sqrt{2}$ is irrational.

Proof. We use proof by contradiction. Suppose the claim is false, and $\sqrt{2}$ is rational. Then we can write $\sqrt{2}$ as a fraction $n/d$ in lowest terms.

Squaring both sides gives $2 = n^2/d^2$ and so $2d^2 = n^2$. This implies that $n$ is a multiple of 2 (see Problems 1.15 and 1.16).
Therefore $n^2$ must be a multiple of 4. But since $2d^2 = n^2$, we know $2d^2$ is a multiple of 4 and so $d^2$ is a multiple of 2. This implies that $d$ is a multiple of 2.

So, the numerator and denominator have 2 as a common factor, which contradicts the fact that $n/d$ is in lowest terms. Thus, $\sqrt{2}$ must be irrational.

1.9 Good Proofs in Practice

One purpose of a proof is to establish the truth of an assertion with absolute certainty, and mechanically checkable proofs of enormous length or complexity can accomplish this. But humanly intelligible proofs are the only ones that help someone understand the subject. Mathematicians generally agree that important mathematical results can’t be fully understood until their proofs are understood. That is why proofs are an important part of the curriculum.

To be understandable and helpful, more is required of a proof than just logical correctness: a good proof must also be clear. Correctness and clarity usually go together; a well-written proof is more likely to be a correct proof, since mistakes are harder to hide.

In practice, the notion of proof is a moving target. Proofs in a professional research journal are generally unintelligible to all but a few experts who know all the terminology and prior results used in the proof. Conversely, proofs in the first weeks of a beginning course like 6.042 would be regarded as tediously long-winded by a professional mathematician. In fact, what we accept as a good proof later in the term will be different from what we consider good proofs in the first couple of weeks of 6.042. But even so, we can offer some general tips on writing good proofs:

State your game plan. A good proof begins by explaining the general line of reasoning, for example, “We use case analysis” or “We argue by contradiction.”

Keep a linear flow.
Sometimes proofs are written like mathematical mosaics, with juicy tidbits of independent reasoning sprinkled throughout. This is not good. The steps of an argument should follow one another in an intelligible order.

A proof is an essay, not a calculation. Many students initially write proofs the way they compute integrals. The result is a long sequence of expressions without explanation, making it very hard to follow. This is bad. A good proof usually looks like an essay with some equations thrown in. Use complete sentences.

Avoid excessive symbolism. Your reader is probably good at understanding words, but much less skilled at reading arcane mathematical symbols. Use words where you reasonably can.

Revise and simplify. Your readers will be grateful.

Introduce notation thoughtfully. Sometimes an argument can be greatly simplified by introducing a variable, devising a special notation, or defining a new term. But do this sparingly, since you’re requiring the reader to remember all that new stuff. And remember to actually define the meanings of new variables, terms, or notations; don’t just start using them!

Structure long proofs. Long programs are usually broken into a hierarchy of smaller procedures. Long proofs are much the same. When your proof needs facts that are easily stated, but not readily proved, those facts are best pulled out as preliminary lemmas. Also, if you are repeating essentially the same argument over and over, try to capture that argument in a general lemma, which you can cite repeatedly instead.

Be wary of the “obvious.” When familiar or truly obvious facts are needed in a proof, it’s OK to label them as such and to not prove them. But remember that what’s obvious to you may not be, and typically is not, obvious to your reader. Most especially, don’t use phrases like “clearly” or “obviously” in an attempt to bully the reader into accepting something you’re having trouble proving.
Also, go on the alert whenever you see one of these phrases in someone else’s proof.

Finish. At some point in a proof, you’ll have established all the essential facts you need. Resist the temptation to quit and leave the reader to draw the “obvious” conclusion. Instead, tie everything together yourself and explain why the original claim follows.

Creating a good proof is a lot like creating a beautiful work of art. In fact, mathematicians often refer to really good proofs as being “elegant” or “beautiful.” It takes practice and experience to write proofs that merit such praise, but to get you started in the right direction, we will provide templates for the most useful proof techniques.

Throughout the text there are also examples of bogus proofs, that is, arguments that look like proofs but aren’t. Sometimes a bogus proof can reach false conclusions because of missteps or mistaken assumptions. More subtle bogus proofs reach correct conclusions, but do so in improper ways such as circular reasoning, leaping to unjustified conclusions, or saying that the hard part of the proof is “left to the reader.” Learning to spot the flaws in improper proofs will hone your skills at seeing how each proof step follows logically from prior steps. It will also enable you to spot flaws in your own proofs.

The analogy between good proofs and good programs extends beyond structure. The same rigorous thinking needed for proofs is essential in the design of critical computer systems. When algorithms and protocols only “mostly work” due to reliance on hand-waving arguments, the results can range from problematic to catastrophic. An early example was the Therac 25, a machine that provided radiation therapy to cancer victims, but occasionally killed them with massive overdoses due to a software race condition.
A further example of a dozen years ago (August 2004) involved a single faulty command to a computer system used by United and American Airlines that grounded the entire fleet of both companies, and all their passengers!

It is a certainty that we’ll all one day be at the mercy of critical computer systems designed by you and your classmates. So we really hope that you’ll develop the ability to formulate rock-solid logical arguments that a system actually does what you think it should do!

1.10 References

[12], [1], [46], [16], [20]

Problems for Section 1.1

Class Problems

Problem 1.1.
Albert announces to his class that he plans to surprise them with a quiz sometime next week.

His students first wonder if the quiz could be on Friday of next week. They reason that it can’t: if Albert didn’t give the quiz before Friday, then by midnight Thursday, they would know the quiz had to be on Friday, and so the quiz wouldn’t be a surprise any more.

Next the students wonder whether Albert could give the surprise quiz Thursday. They observe that if the quiz wasn’t given before Thursday, it would have to be given on the Thursday, since they already know it can’t be given on Friday. But having figured that out, it wouldn’t be a surprise if the quiz was on Thursday either. Similarly, the students reason that the quiz can’t be on Wednesday, Tuesday, or Monday. Namely, it’s impossible for Albert to give a surprise quiz next week. All the students now relax, having concluded that Albert must have been bluffing. And since no one expects the quiz, that’s why, when Albert gives it on Tuesday next week, it really is a surprise!

What, if anything, do you think is wrong with the students’ reasoning?

Problem 1.2.
The Pythagorean Theorem says that if $a$ and $b$ are the lengths of the sides of a right triangle, and $c$ is the length of its hypotenuse, then

\[ a^2 + b^2 = c^2. \]

This theorem is so fundamental and familiar that we generally take it for granted. But just being familiar doesn’t justify calling it “obvious”: witness the fact that people have felt the need to devise different proofs of it for millennia.⁷ In this problem we’ll examine a particularly simple “proof without words” of the theorem.

⁷ Over a hundred different proofs are listed on the mathematics website http://www.cut-the-knot.org/pythagoras/.

Here’s the strategy. Suppose you are given four different colored copies of a right triangle with sides of lengths $a$, $b$ and $c$, along with a suitably sized square, as shown in Figure 1.1.

[Figure 1.1: Right triangles and square.]

(a) You will first arrange the square and four triangles so they form a $c \times c$ square. From this arrangement you will see that the square is $(b - a) \times (b - a)$.

(b) You will then arrange the same shapes so they form two squares, one $a \times a$ and the other $b \times b$.

You know that the area of an $s \times s$ square is $s^2$. So appealing to the principle that

Area is Preserved by Rearranging,

you can now conclude that $a^2 + b^2 = c^2$, as claimed.

This really is an elegant and convincing proof of the Pythagorean Theorem, but it has some worrisome features. One concern is that there might be something special about the shape of these particular triangles and square that makes the rearranging possible. For example, suppose $a = b$?

(c) How would you respond to this concern?

(d) Another concern is that a number of facts about right triangles, squares and lines are being implicitly assumed in justifying the rearrangements into squares. Enumerate some of these assumed facts.

Problem 1.3.
What’s going on here?!

\[ 1 = \sqrt{1} = \sqrt{(-1)(-1)} = \sqrt{-1}\,\sqrt{-1} = \left(\sqrt{-1}\right)^2 = -1. \]

(a) Precisely identify and explain the mistake(s) in this bogus proof.
(b) Prove (correctly) that if $1 = -1$, then $2 = 1$.

(c) Every positive real number $r$ has two square roots, one positive and the other negative. The standard convention is that the expression $\sqrt{r}$ refers to the positive square root of $r$. Assuming familiar properties of multiplication of real numbers, prove that for positive real numbers $r$ and $s$,

\[ \sqrt{rs} = \sqrt{r}\,\sqrt{s}. \]

Problem 1.4.
Identify exactly where the bugs are in each of the following bogus proofs.⁸

⁸ From [45], Twenty Years Before the Blackboard by Michael Stueben and Diane Sandford.

(a) Bogus Claim: $1/8 > 1/4$.

Bogus proof.

\begin{align*}
3 &> 2 \\
3 \log_{10}(1/2) &> 2 \log_{10}(1/2) \\
\log_{10}(1/2)^3 &> \log_{10}(1/2)^2 \\
(1/2)^3 &> (1/2)^2,
\end{align*}

and the claim now follows by the rules for multiplying fractions.

(b) Bogus proof: $1\text{¢} = \$0.01 = (\$0.1)^2 = (10\text{¢})^2 = 100\text{¢} = \$1.$

(c) Bogus Claim: If $a$ and $b$ are two equal real numbers, then $a = 0$.

Bogus proof.

\begin{align*}
a &= b \\
a^2 &= ab \\
a^2 - b^2 &= ab - b^2 \\
(a - b)(a + b) &= (a - b)b \\
a + b &= b \\
a &= 0.
\end{align*}

Problem 1.5.
It’s a fact that the Arithmetic Mean is at least as large as the Geometric Mean, namely,

\[ \frac{a + b}{2} \geq \sqrt{ab} \]

for all nonnegative real numbers $a$ and $b$. But there’s something objectionable about the following proof of this fact. What’s the objection, and how would you fix it?

Bogus proof.

\begin{align*}
\frac{a + b}{2} &\stackrel{?}{\geq} \sqrt{ab}, && \text{so} \\
a + b &\stackrel{?}{\geq} 2\sqrt{ab}, && \text{so} \\
a^2 + 2ab + b^2 &\stackrel{?}{\geq} 4ab, && \text{so} \\
a^2 - 2ab + b^2 &\stackrel{?}{\geq} 0, && \text{so} \\
(a - b)^2 &\geq 0 && \text{which we know is true.}
\end{align*}

The last statement is true because $a - b$ is a real number, and the square of a real number is never negative. This proves the claim.

Practice Problems

Problem 1.6.
Why does the “surprise” paradox of Problem 1.1 present a philosophical problem but not a mathematical one?

Problems for Section 1.5

Homework Problems

Problem 1.7.
Show that $\log_7 n$ is either an integer or irrational, where $n$ is a positive integer. Use whatever familiar facts about integers and primes you need, but explicitly state such facts.
Problems for Section 1.7

Practice Problems

Problem 1.8.
Prove by cases that

\[ \max(r, s) + \min(r, s) = r + s \tag{*} \]

for all real numbers $r, s$.

Class Problems

Problem 1.9.
If we raise an irrational number to an irrational power, can the result be rational? Show that it can by considering $\sqrt{2}^{\sqrt{2}}$ and arguing by cases.

Problem 1.10.
Prove by cases that

\[ |r + s| \leq |r| + |s| \tag{1} \]

for all real numbers $r, s$.⁹

⁹ The absolute value $|r|$ of $r$ equals whichever of $r$ or $-r$ is not negative.

Homework Problems

Problem 1.11.
(a) Suppose that

\[ a + b + c = d, \]

where $a, b, c, d$ are nonnegative integers.

Let $P$ be the assertion that $d$ is even. Let $W$ be the assertion that exactly one among $a, b, c$ is even, and let $T$ be the assertion that all three are even.

Prove by cases that

\[ P \text{ IFF } [W \text{ OR } T]. \]

(b) Now suppose that

\[ w^2 + x^2 + y^2 = z^2, \]

where $w, x, y, z$ are nonnegative integers. Let $P$ be the assertion that $z$ is even, and let $R$ be the assertion that all three of $w, x, y$ are even. Prove by cases that

\[ P \text{ IFF } R. \]

Hint: An odd number equals $2m + 1$ for some integer $m$, so its square equals $4(m^2 + m) + 1$.

Exam Problems

Problem 1.12.
Prove that there is an irrational number $a$ such that $a^{\sqrt{3}}$ is rational.

Hint: Consider $\sqrt[3]{2}^{\sqrt{3}}$ and argue by cases.

Problems for Section 1.8

Practice Problems

Problem 1.13.
Prove that for any $n > 0$, if $a^n$ is even, then $a$ is even.

Hint: Contradiction.

Problem 1.14.
Prove that if $a \cdot b = n$, then either $a$ or $b$ must be $\leq \sqrt{n}$, where $a$, $b$, and $n$ are nonnegative real numbers.

Hint: by contradiction, Section 1.8.

Problem 1.15.
Let $n$ be a nonnegative integer.

(a) Explain why if $n^2$ is even (that is, a multiple of 2), then $n$ is even.

(b) Explain why if $n^2$ is a multiple of 3, then $n$ must be a multiple of 3.

Problem 1.16.
Give an example of two distinct positive integers $m, n$ such that $n^2$ is a multiple of $m$, but $n$ is not a multiple of $m$. How about having $m$ be less than $n$?
Class Problems

Problem 1.17.
How far can you generalize the proof of Theorem 1.8.1 that $\sqrt{2}$ is irrational? For example, how about $\sqrt{3}$?

Problem 1.18.
Prove that $\log_4 6$ is irrational.

Problem 1.19.
Prove by contradiction that $\sqrt{3} + \sqrt{2}$ is irrational.

Hint: $(\sqrt{3} + \sqrt{2})(\sqrt{3} - \sqrt{2})$

Problem 1.20.
Here is a generalization of Problem 1.17 that you may not have thought of:

Lemma. Let the coefficients of the polynomial

\[ a_0 + a_1 x + a_2 x^2 + \cdots + a_{m-1} x^{m-1} + x^m \]

be integers. Then any real root of the polynomial is either integral or irrational.

(a) Explain why the Lemma immediately implies that $\sqrt[m]{k}$ is irrational whenever $k$ is not an $m$th power of some integer.

(b) Carefully prove the Lemma.

You may find it helpful to appeal to:

Fact. If a prime $p$ is a factor of some power of an integer, then it is a factor of that integer.

You may assume this Fact without writing down its proof, but see if you can explain why it is true.

Exam Problems

Problem 1.21.
Prove that $\log_9 12$ is irrational.

Problem 1.22.
Prove that $\log_{12} 18$ is irrational.

Problem 1.23.
A familiar proof that $\sqrt[3]{7^2}$ is irrational depends on the fact that a certain equation among those below is unsatisfiable by integers $a, b > 0$. Note that more than one is unsatisfiable. Indicate the equation that would appear in the proof, and explain why it is unsatisfiable. (Do not assume that $\sqrt[3]{7^2}$ is irrational.)

i. $a^2 = 7^2 + b^2$
ii. $a^3 = 7^2 + b^3$
iii. $a^2 = 7^2 b^2$
iv. $a^3 = 7^2 b^3$
v. $a^3 = 7^3 b^3$
vi. $(ab)^3 = 7^2$

Homework Problems

Problem 1.24.
The fact that there are irrational numbers $a, b$ such that $a^b$ is rational was proved in Problem 1.9 by cases. Unfortunately, that proof was nonconstructive: it didn’t reveal a specific pair $a, b$ with this property. But in fact, it’s easy to do this: let $a ::= \sqrt{2}$ and $b ::= 2 \log_2 3$.

We know $a = \sqrt{2}$ is irrational, and $a^b = 3$ by definition.
Finish the proof that these values for $a, b$ work, by showing that $2 \log_2 3$ is irrational.

Problem 1.25.
Here is a different proof that $\sqrt{2}$ is irrational, taken from the American Mathematical Monthly, v.116, #1, Jan. 2009, p.69:

Proof. Suppose for the sake of contradiction that $\sqrt{2}$ is rational, and choose the least integer $q > 0$ such that $\left(\sqrt{2} - 1\right) q$ is a nonnegative integer. Let $q' ::= \left(\sqrt{2} - 1\right) q$. Clearly $0 < q' < q$. But an easy computation shows that $\left(\sqrt{2} - 1\right) q'$ is a nonnegative integer, contradicting the minimality of $q$.

(a) This proof was written for an audience of college teachers, and at this point it is a little more concise than desirable. Write out a more complete version which includes an explanation of each step.

(b) Now that you have justified the steps in this proof, do you have a preference for one of these proofs over the other? Why? Discuss these questions with your teammates for a few minutes and summarize your team’s answers on your whiteboard.

Problem 1.26.
For $n = 40$, the value of polynomial $p(n) ::= n^2 + n + 41$ is not prime, as noted in Section 1.1. But we could have predicted based on general principles that no nonconstant polynomial can generate only prime numbers.

In particular, let $q(n)$ be a polynomial with integer coefficients, and let $c ::= q(0)$ be the constant term of $q$.

(a) Verify that $q(cm)$ is a multiple of $c$ for all $m \in \mathbb{Z}$.

(b) Show that if $q$ is nonconstant and $c > 1$, then as $n$ ranges over the nonnegative integers $\mathbb{N}$ there are infinitely many $q(n) \in \mathbb{Z}$ that are not primes.

Hint: You may assume the familiar fact that the magnitude of any nonconstant polynomial $q(n)$ grows unboundedly as $n$ grows.

(c) Conclude that for every nonconstant polynomial $q$ there must be an $n \in \mathbb{N}$ such that $q(n)$ is not prime. Hint: Only one easy case remains.

2 The Well Ordering Principle

Every nonempty set of nonnegative integers has a smallest element.
This statement is known as The Well Ordering Principle. Do you believe it? Seems sort of obvious, right? But notice how tight it is: it requires a nonempty set; it’s false for the empty set, which has no smallest element because it has no elements at all. And it requires a set of nonnegative integers; it’s false for the set of negative integers and also false for some sets of nonnegative rationals, for example, the set of positive rationals. So, the Well Ordering Principle captures something special about the nonnegative integers.

While the Well Ordering Principle may seem obvious, it’s hard to see offhand why it is useful. But in fact, it provides one of the most important proof rules in discrete mathematics. In this chapter, we’ll illustrate the power of this proof method with a few simple examples.

2.1 Well Ordering Proofs

We actually have already taken the Well Ordering Principle for granted in proving that $\sqrt{2}$ is irrational. That proof assumed that for any positive integers $m$ and $n$, the fraction $m/n$ can be written in lowest terms, that is, in the form $m'/n'$ where $m'$ and $n'$ are positive integers with no common prime factors. How do we know this is always possible?

Suppose to the contrary that there are positive integers $m$ and $n$ such that the fraction $m/n$ cannot be written in lowest terms. Now let $C$ be the set of positive integers that are numerators of such fractions. Then $m \in C$, so $C$ is nonempty. Therefore, by Well Ordering, there must be a smallest integer $m_0 \in C$. So by definition of $C$, there is an integer $n_0 > 0$ such that

the fraction $\dfrac{m_0}{n_0}$ cannot be written in lowest terms.

This means that $m_0$ and $n_0$ must have a common prime factor, $p > 1$. But

\[ \frac{m_0/p}{n_0/p} = \frac{m_0}{n_0}, \]

so any way of expressing the left-hand fraction in lowest terms would also work for $m_0/n_0$, which implies

the fraction $\dfrac{m_0/p}{n_0/p}$ cannot be written in lowest terms either.

So by definition of $C$, the numerator $m_0/p$ is in $C$. But $m_0/p < m_0$, which contradicts the fact that $m_0$ is the smallest element of $C$.

Since the assumption that $C$ is nonempty leads to a contradiction, it follows that $C$ must be empty. That is, there are no numerators of fractions that can’t be written in lowest terms, and hence there are no such fractions at all.

We’ve been using the Well Ordering Principle on the sly from early on!

2.2 Template for Well Ordering Proofs

More generally, there is a standard way to use Well Ordering to prove that some property $P(n)$ holds for every nonnegative integer $n$. Here is how to organize such a well ordering proof:

To prove that “$P(n)$ is true for all $n \in \mathbb{N}$” using the Well Ordering Principle:

• Define the set $C$ of counterexamples to $P$ being true. Specifically, define

\[ C ::= \{ n \in \mathbb{N} \mid \text{NOT}(P(n)) \text{ is true} \}. \]

(The notation $\{ n \mid Q(n) \}$ means “the set of all elements $n$ for which $Q(n)$ is true.” See Section 4.1.4.)

• Assume for proof by contradiction that $C$ is nonempty.

• By the Well Ordering Principle, there will be a smallest element $n$ in $C$.

• Reach a contradiction somehow, often by showing that $P(n)$ is actually true or by showing that there is another member of $C$ that is smaller than $n$. This is the open-ended part of the proof task.

• Conclude that $C$ must be empty, that is, no counterexamples exist.

2.2.1 Summing the Integers

Let’s use this template to prove

Theorem 2.2.1.

\[ 1 + 2 + 3 + \cdots + n = n(n + 1)/2 \tag{2.1} \]

for all nonnegative integers $n$.

First, we’d better address a couple of ambiguous special cases before they trip us up:

• If $n = 1$, then there is only one term in the summation, and so $1 + 2 + 3 + \cdots + n$ is just the term 1. Don’t be misled by the appearance of 2 and 3 or by the suggestion that 1 and $n$ are distinct terms!

• If $n = 0$, then there are no terms at all in the summation. By convention, the sum in this case is 0.
So, while the three dots notation, which is called an ellipsis, is convenient, you have to watch out for these special cases where the notation is misleading. In fact, whenever you see an ellipsis, you should be on the lookout to be sure you understand the pattern, watching out for the beginning and the end.

We could have eliminated the need for guessing by rewriting the left side of (2.1) with summation notation:

\[ \sum_{i=1}^{n} i \qquad \text{or} \qquad \sum_{1 \leq i \leq n} i. \]

Both of these expressions denote the sum of all values taken by the expression to the right of the sigma as the variable $i$ ranges from 1 to $n$. Both expressions make it clear what (2.1) means when $n = 1$. The second expression makes it clear that when $n = 0$, there are no terms in the sum, though you still have to know the convention that a sum of no numbers equals 0 (the product of no numbers is 1, by the way).

OK, back to the proof:

Proof. By contradiction. Assume that Theorem 2.2.1 is false. Then, some nonnegative integers serve as counterexamples to it. Let’s collect them in a set:

\[ C ::= \left\{ n \in \mathbb{N} \;\middle|\; 1 + 2 + 3 + \cdots + n \neq \frac{n(n + 1)}{2} \right\}. \]

Assuming there are counterexamples, $C$ is a nonempty set of nonnegative integers. So, by the Well Ordering Principle, $C$ has a minimum element, which we’ll call $c$. That is, among the nonnegative integers, $c$ is the smallest counterexample to equation (2.1).

Since $c$ is the smallest counterexample, we know that (2.1) is false for $n = c$ but true for all nonnegative integers $n < c$. But (2.1) is true for $n = 0$, so $c > 0$. This means $c - 1$ is a nonnegative integer, and since it is less than $c$, equation (2.1) is true for $c - 1$. That is,

\[ 1 + 2 + 3 + \cdots + (c - 1) = \frac{(c - 1)c}{2}. \]

But then, adding $c$ to both sides, we get

\[ 1 + 2 + 3 + \cdots + (c - 1) + c = \frac{(c - 1)c}{2} + c = \frac{c^2 - c + 2c}{2} = \frac{c(c + 1)}{2}, \]

which means that (2.1) does hold for $c$, after all! This is a contradiction, and we are done.
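The shape of this argument can be mirrored in a few lines of Python. The sketch below is only a finite sanity check, not a replacement for the proof: it collects the counterexamples to (2.1) below an arbitrary bound and asks for the least one. The function names and the bound `limit` are illustrative assumptions, not notation from the text.

```python
def is_counterexample(n: int) -> bool:
    """Return True when 1 + 2 + ... + n differs from n(n+1)/2."""
    # range(1, n + 1) is empty for n = 0, and the empty sum is 0 by convention.
    return sum(range(1, n + 1)) != n * (n + 1) // 2


def smallest_counterexample(limit: int):
    """Return the least n in [0, limit) that violates (2.1), or None."""
    # C is the set of counterexamples, restricted to a finite search range.
    counterexamples = [n for n in range(limit) if is_counterexample(n)]
    # A nonempty set of nonnegative integers would have a least element;
    # here the set is empty, so there is no least element to report.
    return min(counterexamples) if counterexamples else None


print(smallest_counterexample(1000))  # prints None
```

The search can only report that no counterexample exists below the chosen bound; it is the well ordering argument above that rules out counterexamples for every nonnegative integer.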
2.3 Factoring into Primes

We’ve previously taken for granted the Prime Factorization Theorem, also known as the Unique Factorization Theorem and the Fundamental Theorem of Arithmetic, which states that every integer greater than one has a unique¹ expression as a product of prime numbers. This is another of those familiar mathematical facts which are taken for granted but are not really obvious on closer inspection. We’ll prove the uniqueness of prime factorization in a later chapter, but well ordering gives an easy proof that every integer greater than one can be expressed as some product of primes.

¹ . . . unique up to the order in which the prime factors appear

Theorem 2.3.1. Every positive integer greater than one can be factored as a product of primes.

Proof. The proof is by well ordering.

Let $C$ be the set of all integers greater than one that cannot be factored as a product of primes. We assume $C$ is not empty and derive a contradiction.

If $C$ is not empty, there is a least element $n \in C$ by well ordering. This $n$ can’t be prime, because a prime by itself is considered a (length one) product of primes and no such products are in $C$.

So $n$ must be a product of two integers $a$ and $b$ where $1 < a, b < n$. Since $a$ and $b$ are smaller than the smallest element in $C$, we know that $a, b \notin C$. In other words, $a$ can be written as a product of primes $p_1 p_2 \cdots p_k$ and $b$ as a product of primes $q_1 \cdots q_l$. Therefore, $n = p_1 \cdots p_k q_1 \cdots q_l$ can be written as a product of primes, contradicting the claim that $n \in C$. Our assumption that $C$ is not empty must therefore be false.

2.4 Well Ordered Sets

A set of numbers is well ordered when each of its nonempty subsets has a minimum element. The Well Ordering Principle says, of course, that the set of nonnegative integers is well ordered, but so are lots of other sets, such as every finite set, or the sets $r\mathbb{N}$ of numbers of the form $rn$, where $r$ is a positive real number and $n \in \mathbb{N}$.
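As a concrete illustration connecting this definition to Theorem 2.3.1, the case split in that proof (either $n$ is prime, or $n = ab$ with $1 < a, b < n$) can be read as a recursive procedure. The Python sketch below is one possible reading under stated assumptions: the helper `smallest_divisor` and the use of trial division are illustrative choices, not part of the text. The recursion halts because every recursive call receives a strictly smaller integer greater than one, and those integers form a well ordered set.

```python
def smallest_divisor(n: int) -> int:
    """Return the least divisor d of n with d > 1 (n itself when n is prime)."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n


def prime_factors(n: int) -> list:
    """Factor an integer n > 1 into primes, following the proof's case split."""
    if n <= 1:
        raise ValueError("n must be an integer greater than one")
    a = smallest_divisor(n)
    if a == n:
        # n is prime: a prime by itself is a length-one product of primes.
        return [n]
    b = n // a
    # Here 1 < a, b < n, so both recursive calls get strictly smaller inputs.
    return prime_factors(a) + prime_factors(b)


print(prime_factors(360))  # prints [2, 2, 2, 3, 3, 5]
```

As in the theorem, this only exhibits some factorization into primes; it says nothing about uniqueness.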
Well ordering commonly comes up in computer science as a method for proving that computations won’t run forever. The idea is to assign a value to the successive steps of a computation so that the values get smaller at every step. If the values are all from a well ordered set, then the computation can’t run forever, because if it did, the values assigned to its successive steps would define a subset with no minimum element. You’ll see several examples of this technique applied in Chapter 6 to prove that various state machines will eventually terminate.

Notice that a set may have a minimum element but not be well ordered. The set of nonnegative rational numbers is an example: it has a minimum element zero, but it also has nonempty subsets that don’t have minimum elements, the positive rationals, for example.

The following theorem is a tiny generalization of the Well Ordering Principle.

Theorem 2.4.1. For any nonnegative integer $n$ the set of integers greater than or equal to $-n$ is well ordered.

This theorem is just as obvious as the Well Ordering Principle, and it would be harmless to accept it as another axiom. But repeatedly introducing axioms gets worrisome after a while, and it’s worth noticing when a potential axiom can actually be proved. We can easily prove Theorem 2.4.1 using the Well Ordering Principle:

Proof. Let $S$ be any nonempty set of integers $\geq -n$. Now add $n$ to each of the elements in $S$; let’s call this new set $S + n$. Now $S + n$ is a nonempty set of nonnegative integers, and so by the Well Ordering Principle, it has a minimum element $m$. But then it’s easy to see that $m - n$ is the minimum element of $S$.

The definition of well ordering immediately implies that every subset of a well ordered set is well ordered, and this yields two convenient, immediate corollaries of Theorem 2.4.1:

Definition 2.4.2.
A lower bound (respectively, upper bound) for a set $S$ of real numbers is a number $b$ such that $b \leq s$ (respectively, $b \geq s$) for every $s \in S$.

Note that a lower or upper bound of set $S$ is not required to be in the set.

Corollary 2.4.3. Any set of integers with a lower bound is well ordered.

Proof. A set of integers with a lower bound $b \in \mathbb{R}$ will also have the integer $n = \lfloor b \rfloor$ as a lower bound, where $\lfloor b \rfloor$, called the floor of $b$, is gotten by rounding down $b$ to the nearest integer. So Theorem 2.4.1 implies the set is well ordered.

Corollary 2.4.4. Any nonempty set of integers with an upper bound has a maximum element.

Proof. Suppose a set $S$ of integers has an upper bound $b \in \mathbb{R}$. Now multiply each element of $S$ by $-1$; let’s call this new set of elements $-S$. Now, of course, $-b$ is a lower bound of $-S$. So $-S$ has a minimum element $-m$ by Corollary 2.4.3. But then it’s easy to see that $m$ is the maximum element of $S$.

2.4.1 A Different Well Ordered Set (Optional)

Another example of a well ordered set of numbers is the set $F$ of fractions that can be expressed in the form $n/(n + 1)$:

\[ \frac{0}{1}, \frac{1}{2}, \frac{2}{3}, \frac{3}{4}, \ldots, \frac{n}{n + 1}, \ldots. \]

The minimum element of any nonempty subset of $F$ is simply the one with the minimum numerator when expressed in the form $n/(n + 1)$.

Now we can define a very different well ordered set by adding nonnegative integers to numbers in $F$. That is, we take all the numbers of the form $n + f$ where $n$ is a nonnegative integer and $f$ is a number in $F$. Let’s call this set of numbers, you guessed it, $\mathbb{N} + F$. There is a simple recipe for finding the minimum number in any nonempty subset of $\mathbb{N} + F$, which explains why this set is well ordered:

Lemma 2.4.5. $\mathbb{N} + F$ is well ordered.

Proof. Given any nonempty subset $S$ of $\mathbb{N} + F$, look at all the nonnegative integers $n$ such that $n + f$ is in $S$ for some $f \in F$. This is a nonempty set of nonnegative integers, so by the WOP, there is a minimum one; call it $n_S$.

By definition of $n_S$, there is some $f \in F$ such that $n_S + f$ is in the set $S$.
So the set of all fractions $f$ such that $n_S + f \in S$ is a nonempty subset of $F$, and since $F$ is well ordered, this nonempty set contains a minimum element; call it $f_S$. Now it is easy to verify that $n_S + f_S$ is the minimum element of $S$ (Problem 2.20).

The set $\mathbb{N} + F$ is different from the earlier examples. In all the earlier examples, each element was greater than only a finite number of other elements. In $\mathbb{N} + F$, every element greater than or equal to 1 can be the first element in strictly decreasing sequences of elements of arbitrary finite length. For example, the following decreasing sequences of elements in $\mathbb{N} + F$ all start with 1:

$1, 0.$
$1, \frac{1}{2}, 0.$
$1, \frac{2}{3}, \frac{1}{2}, 0.$
$1, \frac{3}{4}, \frac{2}{3}, \frac{1}{2}, 0.$
$\vdots$

Nevertheless, since $\mathbb{N} + F$ is well ordered, it is impossible to find an infinite decreasing sequence of elements in $\mathbb{N} + F$, because the set of elements in such a sequence would have no minimum.

Problems for Section 2.2

Practice Problems

Problem 2.1.
For practice using the Well Ordering Principle, fill in the template of an easy to prove fact: every amount of postage that can be assembled using only 10 cent and 15 cent stamps is divisible by 5.

In particular, let the notation “$j \mid k$” indicate that integer $j$ is a divisor of integer $k$, and let $S(n)$ mean that exactly $n$ cents postage can be assembled using only 10 and 15 cent stamps. Then the proof shows that

\[ S(n) \text{ IMPLIES } 5 \mid n, \quad \text{for all nonnegative integers } n. \tag{2.2} \]

Fill in the missing portions (indicated by “. . . ”) of the following proof of (2.2).

Let $C$ be the set of counterexamples to (2.2), namely

\[ C ::= \{ n \mid \ldots \} \]

Assume for the purpose of obtaining a contradiction that $C$ is nonempty. Then by the WOP, there is a smallest number $m \in C$. This $m$ must be positive because . . . .

But if $S(m)$ holds and $m$ is positive, then $S(m - 10)$ or $S(m - 15)$ must hold, because . . . .

So suppose $S(m - 10)$ holds.
Then $5 \mid (m - 10)$, because . . .

But if $5 \mid (m - 10)$, then obviously $5 \mid m$, contradicting the fact that $m$ is a counterexample.

Next, if $S(m - 15)$ holds, we arrive at a contradiction in the same way.

Since we get a contradiction in both cases, we conclude that . . .

which proves that (2.2) holds.

Problem 2.2.
The Fibonacci numbers $F(0), F(1), F(2), \ldots$ are defined as follows:

\[ F(n) ::= \begin{cases} 0 & \text{if } n = 0, \\ 1 & \text{if } n = 1, \\ F(n - 1) + F(n - 2) & \text{if } n > 1. \end{cases} \]

Exactly which sentence(s) in the following bogus proof contain logical errors? Explain.

False Claim. Every Fibonacci number is even.

Bogus proof. Let all the variables $n, m, k$ mentioned below be nonnegative integer valued.

1. The proof is by the WOP.
2. Let $EF(n)$ mean that $F(n)$ is even.
3. Let $C$ be the set of counterexamples to the assertion that $EF(n)$ holds for all $n \in \mathbb{N}$, namely,
\[ C ::= \{ n \in \mathbb{N} \mid \text{NOT}(EF(n)) \}. \]
4. We prove by contradiction that $C$ is empty. So assume that $C$ is not empty.
5. By WOP, there is a least nonnegative integer $m \in C$.
6. Then $m > 0$, since $F(0) = 0$ is an even number.
7. Since $m$ is the minimum counterexample, $F(k)$ is even for all $k < m$.
8. In particular, $F(m - 1)$ and $F(m - 2)$ are both even.
9. But by the definition, $F(m)$ equals the sum $F(m - 1) + F(m - 2)$ of two even numbers, and so it is also even.
10. That is, $EF(m)$ is true.
11. This contradicts the condition in the definition of $m$ that $\text{NOT}(EF(m))$ holds.
12. This contradiction implies that $C$ must be empty. Hence, $F(n)$ is even for all $n \in \mathbb{N}$.

Problem 2.3.
In Chapter 2, the Well Ordering Principle was used to show that all positive rational numbers can be written in “lowest terms,” that is, as a ratio of positive integers with no common prime factor. Below is a different proof which also arrives at this correct conclusion, but this proof is bogus. Identify every step at which the proof makes an unjustified inference.

Bogus proof.
Suppose to the contrary that there was a positive rational $q$ such that $q$ cannot be written in lowest terms. Now let $C$ be the set of such rational numbers that cannot be written in lowest terms. Then $q \in C$, so $C$ is nonempty. So there must be a smallest rational $q_0 \in C$. So since $q_0/2 < q_0$, it must be possible to express $q_0/2$ in lowest terms, namely,

\[ \frac{q_0}{2} = \frac{m}{n} \tag{2.3} \]

for positive integers $m, n$ with no common prime factor. Now we consider two cases:

Case 1: [$n$ is odd]. Then $2m$ and $n$ also have no common prime factor, and therefore

\[ q_0 = 2 \cdot \frac{m}{n} = \frac{2m}{n} \]

expresses $q_0$ in lowest terms, a contradiction.

Case 2: [$n$ is even]. Any common prime factor of $m$ and $n/2$ would also be a common prime factor of $m$ and $n$. Therefore $m$ and $n/2$ have no common prime factor, and so

\[ q_0 = \frac{m}{n/2} \]

expresses $q_0$ in lowest terms, a contradiction.

Since the assumption that $C$ is nonempty leads to a contradiction, it follows that $C$ is empty; that is, there are no counterexamples.

Class Problems

Problem 2.4.
Use the Well Ordering Principle² to prove that

\[ \sum_{k=0}^{n} k^2 = \frac{n(n+1)(2n+1)}{6} \tag{2.4} \]

for all nonnegative integers $n$.

Problem 2.5.
Use the Well Ordering Principle to prove that there is no solution over the positive integers to the equation:

\[ 4a^3 + 2b^3 = c^3. \]

Problem 2.6.
You are given a series of envelopes, respectively containing $1, 2, 4, \ldots, 2^m$ dollars. Define

Property $m$: For any nonnegative integer less than $2^{m+1}$, there is a selection of envelopes whose contents add up to exactly that number of dollars.

Use the Well Ordering Principle (WOP) to prove that Property $m$ holds for all nonnegative integers $m$.

Hint: Consider two cases: first, when the target number of dollars is less than $2^m$ and second, when the target is at least $2^m$.

Homework Problems

Problem 2.7.
Use the Well Ordering Principle to prove that any integer greater than or equal to 8 can be represented as the sum of nonnegative integer multiples of 3 and 5.

Problem 2.8.
Use the Well Ordering Principle to prove that any integer greater than or equal to 50 can be represented as the sum of nonnegative integer multiples of 7, 11, and 13.

² Proofs by other methods such as induction or by appeal to known formulas for similar sums will not receive full credit.

Problem 2.9.
Euler’s Conjecture in 1769 was that there are no positive integer solutions to the equation

\[ a^4 + b^4 + c^4 = d^4. \]

Integer values for $a, b, c, d$ that do satisfy this equation were first discovered in 1986. So Euler guessed wrong, but it took more than two centuries to demonstrate his mistake.

Now let’s consider Lehman’s equation, similar to Euler’s but with some coefficients:

\[ 8a^4 + 4b^4 + 2c^4 = d^4 \tag{2.5} \]

Prove that Lehman’s equation (2.5) really does not have any positive integer solutions.

Hint: Consider the minimum value of $a$ among all possible solutions to (2.5).

Problem 2.10.
Use the Well Ordering Principle to prove that

\[ n \le 3^{n/3} \tag{2.6} \]

for every nonnegative integer $n$.

Hint: Verify (2.6) for $n \le 4$ by explicit calculation.

Problem 2.11.
A winning configuration in the game of Mini-Tetris is a complete tiling of a $2 \times n$ board using only the three shapes shown below:

[Figure: the three tile shapes.]

For example, here are several possible winning configurations on a $2 \times 5$ board:

[Figure: sample winning configurations on a $2 \times 5$ board.]

(a) Let $T_n$ denote the number of different winning configurations on a $2 \times n$ board. Determine the values of $T_1$, $T_2$ and $T_3$.

(b) Express $T_n$ in terms of $T_{n-1}$ and $T_{n-2}$ for $n > 2$.

(c) Use the Well Ordering Principle to prove that the number of winning configurations on a $2 \times n$ Mini-Tetris board is:³

\[ T_n = \frac{2^{n+1} + (-1)^n}{3} \tag{*} \]

Problem 2.12.
Mini-Tetris is a game whose objective is to provide a complete “tiling” of a $2 \times n$ board using tiles of specified shapes. In this problem we consider the following set of five tiles:

[Figure: the five tile shapes.]

For example, there are two possible tilings of a $2 \times 1$ board:

[Figure: the two tilings of a $2 \times 1$ board.]

Also, here are three tilings for a $2 \times 2$ board:

[Figure: three tilings of a $2 \times 2$ board.]

Note that tiles may not be rotated, which is why the second and third of the above tilings count as different, even though one is a 180° rotation of the other. (A 90° rotation of these shapes would not count as a tiling at all.)

(a) There are four more $2 \times 2$ tilings in addition to the three above. What are they?

Let $T_n$ denote the number of different tilings of a $2 \times n$ board. We know that $T_1 = 2$ and $T_2 = 7$. Also, $T_0 = 1$ because there is exactly one way to tile a $2 \times 0$ board: don’t use any tiles.

(b) $T_n$ can be specified in terms of $T_{n-1}$ and $T_{n-2}$ as follows:

\[ T_n = 2T_{n-1} + 3T_{n-2} \tag{2.7} \]

for $n \ge 2$. Briefly explain how to justify this equation.

(c) Use the Well Ordering Principle to prove that for $n \ge 0$, the number $T_n$ of tilings of a $2 \times n$ Mini-Tetris board is:

\[ \frac{3^{n+1} + (-1)^n}{4}. \]

³ A good question is how someone came up with equation (*) in the first place. A simple Well Ordering proof gives no hint about this, but it should be absolutely convincing anyway.

Exam Problems

Problem 2.13.
Except for an easily repaired omission, the following proof using the Well Ordering Principle shows that every amount of postage that can be paid exactly using only 10 cent and 15 cent stamps is divisible by 5.

Namely, let the notation “$j \mid k$” indicate that integer $j$ is a divisor of integer $k$, and let $S(n)$ mean that exactly $n$ cents postage can be assembled using only 10 and 15 cent stamps. Then the proof shows that

\[ S(n) \text{ IMPLIES } 5 \mid n, \quad \text{for all nonnegative integers } n. \tag{2.8} \]

Fill in the missing portions (indicated by “. . . ”) of the following proof of (2.8), and at the end, identify the minor mistake in the proof and how to fix it.
Let $C$ be the set of counterexamples to (2.8), namely

\[ C ::= \{ n \mid S(n) \text{ and } \text{NOT}(5 \mid n) \} \]

Assume for the purpose of obtaining a contradiction that $C$ is nonempty. Then by the WOP, there is a smallest number $m \in C$. Then $S(m - 10)$ or $S(m - 15)$ must hold, because the $m$ cents postage is made from 10 and 15 cent stamps, so we remove one.

So suppose $S(m - 10)$ holds. Then $5 \mid (m - 10)$, because . . . But if $5 \mid (m - 10)$, then $5 \mid m$, because . . . contradicting the fact that $m$ is a counterexample.

Next suppose $S(m - 15)$ holds. Then the proof for $m - 10$ carries over directly for $m - 15$ to yield a contradiction in this case as well.

Since we get a contradiction in both cases, we conclude that $C$ must be empty. That is, there are no counterexamples to (2.8), which proves that (2.8) holds.

The proof makes an implicit assumption about the value of $m$. State the assumption and justify it in one sentence.

Problem 2.14.
(a) Prove using the Well Ordering Principle that, using 6¢, 14¢, and 21¢ stamps, it is possible to make any amount of postage over 50¢. To save time, you may assume without proof that 50¢, 51¢, . . . , 100¢ are all makeable, but you should clearly indicate which of these assumptions your proof depends on.

(b) Show that 49¢ is not makeable.

Problem 2.15.
We’ll use the Well Ordering Principle to prove that for every positive integer $n$, the sum of the first $n$ odd numbers is $n^2$, that is,

\[ \sum_{i=0}^{n-1} (2i + 1) = n^2, \tag{2.9} \]

for all $n > 0$.

Assume to the contrary that equation (2.9) failed for some positive integer $n$. Let $m$ be the least such number.

(a) Why must there be such an $m$?

(b) Explain why $m \ge 2$.
(c) Explain why part (b) implies that

\[ \sum_{i=1}^{m-1} (2(i-1) + 1) = (m-1)^2. \tag{2.10} \]

(d) What term should be added to the left-hand side of (2.10) so the result equals

\[ \sum_{i=1}^{m} (2(i-1) + 1)? \]

(e) Conclude that equation (2.9) holds for all positive integers $n$.

Problem 2.16.
Use the Well Ordering Principle (WOP) to prove that

\[ 2 + 4 + \cdots + 2n = n(n+1) \tag{2.11} \]

for all $n > 0$.

Problem 2.17.
Prove by the Well Ordering Principle that for all nonnegative integers $n$:

\[ 0^3 + 1^3 + 2^3 + \cdots + n^3 = \left( \frac{n(n+1)}{2} \right)^2. \tag{2.12} \]

Problem 2.18.
Use the Well Ordering Principle to prove that

\[ 1 \cdot 2 + 2 \cdot 3 + 3 \cdot 4 + \cdots + n(n+1) = \frac{n(n+1)(n+2)}{3} \tag{*} \]

for all integers $n \ge 1$.

Problem 2.19.
Say a number of cents is makeable if it is the value of some set of 6 cent and 15 cent stamps. Use the Well Ordering Principle to show that every integer that is a multiple of 3 and greater than or equal to twelve is makeable.

Problems for Section 2.4

Homework Problems

Problem 2.20.
Complete the proof of Lemma 2.4.5 by showing that the number $n_S + f_S$ is the minimum element in $S$.

Practice Problems

Problem 2.21.
Indicate which of the following sets of numbers have a minimum element and which are well ordered. For those that are not well ordered, give an example of a subset with no minimum element.

(a) The integers $\ge -\sqrt{2}$.

(b) The rational numbers $\ge \sqrt{2}$.

(c) The set of rationals of the form $1/n$ where $n$ is a positive integer.

(d) The set $G$ of rationals of the form $m/n$ where $m, n > 0$ and $n \le g$, where $g$ is a googol $10^{100}$.

(e) The set $\mathbb{F}$ of fractions of the form $n/(n+1)$:

\[ \frac{0}{1}, \frac{1}{2}, \frac{2}{3}, \frac{3}{4}, \ldots \]

(f) Let $W ::= \mathbb{N} \cup \mathbb{F}$ be the set consisting of the nonnegative integers along with all the fractions of the form $n/(n+1)$. Describe a length 5 decreasing sequence of elements of $W$ starting with 1, . . . a length 50 decreasing sequence, . . . length 500.

Problem 2.22.
Use the Well Ordering Principle to prove that every finite, nonempty set of real numbers has a minimum element.

Class Problems

Problem 2.23.
Prove that a set $R$ of real numbers is well ordered iff there is no infinite decreasing sequence of numbers in $R$. In other words, there is no set of numbers $r_i \in R$ such that

\[ r_0 > r_1 > r_2 > \cdots. \tag{2.13} \]

3 Logical Formulas

It is amazing that people manage to cope with all the ambiguities in the English language. Here are some sentences that illustrate the issue:

    “You may have cake, or you may have ice cream.”
    “If pigs can fly, then your account won’t get hacked.”
    “If you can solve any problem we come up with, then you get an A for the course.”
    “Every American has a dream.”

What precisely do these sentences mean? Can you have both cake and ice cream or must you choose just one dessert? Pigs can’t fly, so does the second sentence say anything about the security of your account? If you can solve some problems we come up with, can you get an A for the course? And if you can’t solve a single one of the problems, does it mean you can’t get an A? Finally, does the last sentence imply that all Americans have the same dream (say, of owning a house), or might different Americans have different dreams (say, Eric dreams of designing a killer software application, Tom of being a tennis champion, Albert of being able to sing)?

Some uncertainty is tolerable in normal conversation. But when we need to formulate ideas precisely, as in mathematics and programming, the ambiguities inherent in everyday language can be a real problem. We can’t hope to make an exact argument if we’re not sure exactly what the statements mean. So before we start into mathematics, we need to investigate the problem of how to talk about mathematics.
To get around the ambiguity of English, mathematicians have devised a special language for talking about logical relationships. This language mostly uses ordinary English words and phrases such as “or,” “implies,” and “for all.” But mathematicians give these words precise and unambiguous definitions which don’t always match common usage.

Surprisingly, in the midst of learning the language of logic, we’ll come across the most important open problem in computer science, a problem whose solution could change the world.

3.1 Propositions from Propositions

In English, we can modify, combine, and relate propositions with words such as “not,” “and,” “or,” “implies,” and “if-then.” For example, we can combine three propositions into one like this:

    If all humans are mortal and all Greeks are human, then all Greeks are mortal.

For the next while, we won’t be much concerned with the internals of propositions (whether they involve mathematics or Greek mortality) but rather with how propositions are combined and related. So, we’ll frequently use variables such as $P$ and $Q$ in place of specific propositions such as “All humans are mortal” and “$2 + 3 = 5$.” The understanding is that these propositional variables, like propositions, can take on only the values T (true) and F (false). Propositional variables are also called Boolean variables after their inventor, the nineteenth century mathematician George (you guessed it) Boole.

3.1.1 NOT, AND, and OR

Mathematicians use the words NOT, AND and OR for operations that change or combine propositions. The precise mathematical meaning of these special words can be specified by truth tables.
For example, if $P$ is a proposition, then so is “NOT($P$),” and the truth value of the proposition “NOT($P$)” is determined by the truth value of $P$ according to the following truth table:

    P | NOT(P)
    T |   F
    F |   T

The first row of the table indicates that when proposition $P$ is true, the proposition “NOT($P$)” is false. The second line indicates that when $P$ is false, “NOT($P$)” is true. This is probably what you would expect.

In general, a truth table indicates the true/false value of a proposition for each possible set of truth values for the variables. For example, the truth table for the proposition “$P$ AND $Q$” has four lines, since there are four settings of truth values for the two variables:

    P  Q | P AND Q
    T  T |    T
    T  F |    F
    F  T |    F
    F  F |    F

According to this table, the proposition “$P$ AND $Q$” is true only when $P$ and $Q$ are both true. This is probably the way you ordinarily think about the word “and.”

There is a subtlety in the truth table for “$P$ OR $Q$”:

    P  Q | P OR Q
    T  T |   T
    T  F |   T
    F  T |   T
    F  F |   F

The first row of this table says that “$P$ OR $Q$” is true even if both $P$ and $Q$ are true. This isn’t always the intended meaning of “or” in everyday speech, but this is the standard definition in mathematical writing. So if a mathematician says, “You may have cake, or you may have ice cream,” he means that you could have both.

If you want to exclude the possibility of having both cake and ice cream, you should combine them with the exclusive-or operation, XOR:

    P  Q | P XOR Q
    T  T |    F
    T  F |    T
    F  T |    T
    F  F |    F

3.1.2 If and Only If

Mathematicians commonly join propositions in an additional way that doesn’t arise in ordinary speech. The proposition “$P$ if and only if $Q$” asserts that $P$ and $Q$ have the same truth value. Either both are true or both are false.

    P  Q | P IFF Q
    T  T |    T
    T  F |    F
    F  T |    F
    F  F |    T

For example, the following if-and-only-if statement is true for every real number $x$:

\[ x^2 - 4 \ge 0 \quad \text{IFF} \quad |x| \ge 2. \]

For some values of $x$, both inequalities are true.
For other values of $x$, neither inequality is true. In every case, however, the IFF proposition as a whole is true.

3.1.3 IMPLIES

The combining operation whose technical meaning is least intuitive is “implies.” Here is its truth table, with the lines labeled so we can refer to them later.

    P  Q | P IMPLIES Q
    T  T |      T        (tt)
    T  F |      F        (tf)
    F  T |      T        (ft)
    F  F |      T        (ff)

The truth table for implications can be summarized in words as follows:

    An implication is true exactly when the if-part is false or the then-part is true.

This sentence is worth remembering; a large fraction of all mathematical statements are of the if-then form!

Let’s experiment with this definition. For example, is the following proposition true or false?

    If Goldbach’s Conjecture is true, then $x^2 \ge 0$ for every real number $x$.

We already mentioned that no one knows whether Goldbach’s Conjecture, Proposition 1.1.6, is true or false. But that doesn’t prevent us from answering the question! This proposition has the form $P$ IMPLIES $Q$ where the hypothesis $P$ is “Goldbach’s Conjecture is true” and the conclusion $Q$ is “$x^2 \ge 0$ for every real number $x$.” Since the conclusion is definitely true, we’re on either line (tt) or line (ft) of the truth table. Either way, the proposition as a whole is true!

Now let’s figure out the truth of one of our original examples:

    If pigs fly, then your account won’t get hacked.

Forget about pigs, we just need to figure out whether this proposition is true or false. Pigs do not fly, so we’re on either line (ft) or line (ff) of the truth table. In both cases, the proposition is true!

False Hypotheses

This mathematical convention, that an implication as a whole is considered true when its hypothesis is false, contrasts with common cases where implications are supposed to have some causal connection between their hypotheses and conclusions.
For example, we could agree (or at least hope) that the following statement is true:

    If you followed the security protocol, then your account won’t get hacked.

We regard this implication as unproblematic because of the clear causal connection between security protocols and account hackability. On the other hand, the statement:

    If pigs could fly, then your account won’t get hacked,

would commonly be rejected as false, or at least silly, because porcine aeronautics have nothing to do with your account security. But mathematically, this implication counts as true.

It’s important to accept the fact that mathematical implications ignore causal connections. This makes them a lot simpler than causal implications, but useful nevertheless. To illustrate this, suppose we have a system specification which consists of a series of, say, a dozen rules,¹

    If the system sensors are in condition 1, then the system takes action 1.
    If the system sensors are in condition 2, then the system takes action 2.
    ⋮
    If the system sensors are in condition 12, then the system takes action 12.

Letting $C_i$ be the proposition that the system sensors are in condition $i$, and $A_i$ be the proposition that the system takes action $i$, the specification can be restated more concisely by the logical formulas

    $C_1$ IMPLIES $A_1$,
    $C_2$ IMPLIES $A_2$,
    ⋮
    $C_{12}$ IMPLIES $A_{12}$.

Now the proposition that the system obeys the specification can be nicely expressed as a single logical formula by combining the formulas together with ANDs:

\[ [C_1 \text{ IMPLIES } A_1] \text{ AND } [C_2 \text{ IMPLIES } A_2] \text{ AND } \cdots \text{ AND } [C_{12} \text{ IMPLIES } A_{12}]. \tag{3.1} \]

¹ Problem 3.16 concerns just such a system.

For example, suppose only conditions $C_2$ and $C_5$ are true, and the system indeed takes the specified actions $A_2$ and $A_5$. So in this case, the system is behaving
according to specification, and we accordingly want formula (3.1) to come out true. The implications $C_2$ IMPLIES $A_2$ and $C_5$ IMPLIES $A_5$ are both true because both their hypotheses and their conclusions are true. But in order for (3.1) to be true, we need all the other implications, all of whose hypotheses are false, to be true. This is exactly what the rule for mathematical implications accomplishes.

3.2 Propositional Logic in Computer Programs

Propositions and logical connectives arise all the time in computer programs. For example, consider the following snippet, which could be either C, C++, or Java:

    if ( x > 0 || (x <= 0 && y > 100) )
        ⋮
    (further instructions)

Java uses the symbol || for “OR,” and the symbol && for “AND.” The further instructions are carried out only if the proposition following the word if is true. On closer inspection, this big expression is built from two simpler propositions. Let $A$ be the proposition that x > 0, and let $B$ be the proposition that y > 100. Then we can rewrite the condition as

\[ A \text{ OR } (\text{NOT}(A) \text{ AND } B). \tag{3.2} \]

3.2.1 Truth Table Calculation

A truth table calculation reveals that the more complicated expression (3.2) always has the same truth value as

\[ A \text{ OR } B. \tag{3.3} \]

We begin with a table with just the truth values of $A$ and $B$:

    A  B | A  OR  (NOT(A)  AND  B) | A OR B
    T  T |                         |
    T  F |                         |
    F  T |                         |
    F  F |                         |

These values are enough to fill in two more columns:

    A  B | A  OR  (NOT(A)  AND  B) | A OR B
    T  T |           F             |   T
    T  F |           F             |   T
    F  T |           T             |   T
    F  F |           T             |   F

Now we have the values needed to fill in the AND column:

    A  B | A  OR  (NOT(A)  AND  B) | A OR B
    T  T |           F      F      |   T
    T  F |           F      F      |   T
    F  T |           T      T      |   T
    F  F |           T      F      |   F

and this provides the values needed to fill in the remaining column for the first OR:

    A  B | A  OR  (NOT(A)  AND  B) | A OR B
    T  T |    T      F      F      |   T
    T  F |    T      F      F      |   T
    F  T |    T      T      T      |   T
    F  F |    F      T      F      |   F

Expressions whose truth values always match are called equivalent.
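The column-by-column calculation above can also be checked mechanically. The following is a minimal Python sketch, not part of the original text: the function names `original_condition`, `simplified_condition`, `truth_table` and `columns_match` are our own, and Python’s `or`, `and`, `not` stand in for OR, AND, NOT.

```python
from itertools import product


def original_condition(a: bool, b: bool) -> bool:
    """Expression (3.2): A OR (NOT(A) AND B)."""
    return a or ((not a) and b)


def simplified_condition(a: bool, b: bool) -> bool:
    """Expression (3.3): A OR B."""
    return a or b


def truth_table():
    """One row (A, B, value of (3.2), value of (3.3)) per truth assignment.

    Rows come out in the same order as the tables in the text:
    TT, TF, FT, FF.
    """
    return [
        (a, b, original_condition(a, b), simplified_condition(a, b))
        for a, b in product([True, False], repeat=2)
    ]


def columns_match() -> bool:
    """True when the two expressions agree on every row of the table."""
    return all(left == right for _, _, left, right in truth_table())


if __name__ == "__main__":
    for row in truth_table():
        print(row)
    print("columns match:", columns_match())
```

With only two variables there are just four rows to compare, so the exhaustive check is immediate; the same idea becomes expensive as the number of variables grows.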
Since the column of truth values under the first OR of (3.2) and the column for (3.3) are the same, the two expressions are equivalent. So we can simplify the code snippet without changing the program’s behavior by replacing the complicated expression with an equivalent simpler one:

    if ( x > 0 || y > 100 )
        ⋮
    (further instructions)

The equivalence of (3.2) and (3.3) can also be confirmed by reasoning by cases:

$A$ is T. An expression of the form (T OR anything) is equivalent to T. Since $A$ is T, both (3.2) and (3.3) in this case are of this form, so they have the same truth value, namely, T.

$A$ is F. An expression of the form (F OR anything) will have the same truth value as anything. Since $A$ is F, (3.3) has the same truth value as $B$. An expression of the form (T AND anything) is equivalent to anything, as is any expression of the form F OR anything. So in this case $A$ OR (NOT($A$) AND $B$) is equivalent to (NOT($A$) AND $B$), which in turn is equivalent to $B$. Therefore both (3.2) and (3.3) will have the same truth value in this case, namely, the value of $B$.

Simplifying logical expressions has real practical importance in computer science. Expression simplification in programs like the one above can make a program easier to read and understand. Simplified programs may also run faster, since they require fewer operations. In hardware, simplifying expressions can decrease the number of logic gates on a chip because digital circuits can be described by logical formulas (see Problems 3.6 and 3.7). Minimizing the logical formulas corresponds to reducing the number of gates in the circuit. The payoff of gate minimization is potentially enormous: a chip with fewer gates is smaller, consumes less power, has a lower defect rate, and is cheaper to manufacture.

3.2.2 Cryptic Notation

Java uses symbols like “&&” and “||” in place of AND and OR. Circuit designers use “·” and “+,” and actually refer to AND as a product and OR as a sum.
Mathematicians use still other symbols, given in the table below.

    English         Symbolic Notation
    NOT(P)          $\neg P$ (alternatively, $\overline{P}$)
    P AND Q         $P \wedge Q$
    P OR Q          $P \vee Q$
    P IMPLIES Q     $P \rightarrow Q$
    if P then Q     $P \rightarrow Q$
    P IFF Q         $P \longleftrightarrow Q$
    P XOR Q         $P \oplus Q$

For example, using this notation, “If $P$ AND NOT($Q$), then $R$” would be written:

\[ (P \wedge \overline{Q}) \rightarrow R. \]

The mathematical notation is concise but cryptic. Words such as “AND” and “OR” are easier to remember and won’t get confused with operations on numbers. We will often use $\overline{P}$ as an abbreviation for NOT($P$), but aside from that, we mostly stick to the words, except when formulas would otherwise run off the page.

3.3 Equivalence and Validity

3.3.1 Implications and Contrapositives

Do these two sentences say the same thing?

    If I am hungry, then I am grumpy.
    If I am not grumpy, then I am not hungry.

We can settle the issue by recasting both sentences in terms of propositional logic. Let $P$ be the proposition “I am hungry” and $Q$ be “I am grumpy.” The first sentence says “$P$ IMPLIES $Q$” and the second says “NOT($Q$) IMPLIES NOT($P$).” Once more, we can compare these two statements in a truth table:

    P  Q | P IMPLIES Q | NOT(Q)  IMPLIES  NOT(P)
    T  T |      T      |   F        T       F
    T  F |      F      |   T        F       F
    F  T |      T      |   F        T       T
    F  F |      T      |   T        T       T

Sure enough, the columns under the two IMPLIES operators, which show the truth values of these two statements, are the same. A statement of the form “NOT($Q$) IMPLIES NOT($P$)” is called the contrapositive of the implication “$P$ IMPLIES $Q$.” The truth table shows that an implication and its contrapositive are equivalent: they are just different ways of saying the same thing.

In contrast, the converse of “$P$ IMPLIES $Q$” is the statement “$Q$ IMPLIES $P$.” The converse to our example is:

    If I am grumpy, then I am hungry.
This sounds like a rather different contention, and a truth table confirms this suspicion:

    P  Q | P IMPLIES Q | Q IMPLIES P
    T  T |      T      |      T
    T  F |      F      |      T
    F  T |      T      |      F
    F  F |      T      |      T

Now the last two columns differ in the second and third rows, confirming that an implication is generally not equivalent to its converse.

One final relationship: an implication and its converse, taken together, are equivalent to a single iff statement. For example,

    If I am grumpy then I am hungry, and if I am hungry then I am grumpy.

is equivalent to the single statement:

    I am grumpy iff I am hungry.

Once again, we can verify this with a truth table.

    P  Q | (P IMPLIES Q)  AND  (Q IMPLIES P) | P IFF Q
    T  T |       T         T         T       |    T
    T  F |       F         F         T       |    F
    F  T |       T         F         F       |    F
    F  F |       T         T         T       |    T

The fourth column giving the truth values of

    ($P$ IMPLIES $Q$) AND ($Q$ IMPLIES $P$)

is the same as the sixth column giving the truth values of $P$ IFF $Q$, which confirms that the AND of the implications is equivalent to the IFF statement.

3.3.2 Validity and Satisfiability

A valid formula is one which is always true, no matter what truth values its variables may have. The simplest example is

    $P$ OR NOT($P$).

You can think about valid formulas as capturing fundamental logical truths. For example, a property of implication that we take for granted is that if one statement implies a second one, and the second one implies a third, then the first implies the third. The following valid formula confirms the truth of this property of implication.

    [($P$ IMPLIES $Q$) AND ($Q$ IMPLIES $R$)] IMPLIES ($P$ IMPLIES $R$).

Equivalence of formulas is really a special case of validity. Namely, statements $F$ and $G$ are equivalent precisely when the statement ($F$ IFF $G$) is valid. For example, the equivalence of the expressions (3.3) and (3.2) means that

    ($A$ OR $B$) IFF ($A$ OR (NOT($A$) AND $B$))

is valid. Of course, validity can also be viewed as an aspect of equivalence. Namely, a formula is valid iff it is equivalent to T.
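Validity can be tested by brute force over all truth assignments, and equivalence then reduces to validity of an IFF. The sketch below is only an illustration under our own naming: `implies`, `is_valid` and the four example formulas are hypothetical helpers, not notation from the text. Formulas are represented as Python functions of Boolean arguments, and `==` on Booleans plays the role of IFF.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    """True exactly when the if-part is false or the then-part is true."""
    return (not p) or q


def is_valid(formula, num_vars: int) -> bool:
    """Check the formula on every one of the 2**num_vars truth assignments."""
    return all(
        formula(*values) for values in product([True, False], repeat=num_vars)
    )


def excluded_middle(p):
    # P OR NOT(P)
    return p or (not p)


def transitivity(p, q, r):
    # [(P IMPLIES Q) AND (Q IMPLIES R)] IMPLIES (P IMPLIES R)
    return implies(implies(p, q) and implies(q, r), implies(p, r))


def contrapositive_iff(p, q):
    # (P IMPLIES Q) IFF (NOT(Q) IMPLIES NOT(P))
    return implies(p, q) == implies(not q, not p)


def converse_iff(p, q):
    # (P IMPLIES Q) IFF (Q IMPLIES P); this one should fail to be valid
    return implies(p, q) == implies(q, p)


if __name__ == "__main__":
    print("excluded middle valid:", is_valid(excluded_middle, 1))
    print("transitivity valid:", is_valid(transitivity, 3))
    print("contrapositive equivalent:", is_valid(contrapositive_iff, 2))
    print("converse equivalent:", is_valid(converse_iff, 2))
```

The converse check fails on the rows where $P$ and $Q$ differ, which are the second and third rows of the converse truth table above.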
A satisfiable formula is one which can sometimes be true; that is, there is some assignment of truth values to its variables that makes it true. One way satisfiability comes up is when there is a collection of system specifications. The job of the system designer is to come up with a system that follows all the specs. This means that the AND of all the specs must be satisfiable or the designer’s job will be impossible (see Problem 3.16).

There is also a close relationship between validity and satisfiability: a statement $P$ is satisfiable iff its negation NOT($P$) is not valid.

3.4 The Algebra of Propositions

3.4.1 Propositions in Normal Form

Every propositional formula is equivalent to a “sum-of-products” or disjunctive form. More precisely, a disjunctive form is simply an OR of AND-terms, where each AND-term is an AND of variables or negations of variables, for example,

\[ (A \text{ AND } B) \text{ OR } (A \text{ AND } C). \tag{3.4} \]

You can read a disjunctive form for any propositional formula directly from its truth table. For example, the formula

\[ A \text{ AND } (B \text{ OR } C) \tag{3.5} \]

has truth table:

    A  B  C | A AND (B OR C)
    T  T  T |       T
    T  T  F |       T
    T  F  T |       T
    T  F  F |       F
    F  T  T |       F
    F  T  F |       F
    F  F  T |       F
    F  F  F |       F

The formula (3.5) is true in the first row when $A$, $B$ and $C$ are all true, that is, where $A \text{ AND } B \text{ AND } C$ is true. It is also true in the second row where $A \text{ AND } B \text{ AND } \overline{C}$ is true, and in the third row when $A \text{ AND } \overline{B} \text{ AND } C$ is true, and that’s all. So (3.5) is true exactly when

\[ (A \text{ AND } B \text{ AND } C) \text{ OR } (A \text{ AND } B \text{ AND } \overline{C}) \text{ OR } (A \text{ AND } \overline{B} \text{ AND } C) \tag{3.6} \]

is true.

The expression (3.6) is a disjunctive form where each AND-term is an AND of every one of the variables or their complements in turn. An expression of this form is called a disjunctive normal form (DNF). A DNF formula can often be simplified into a smaller disjunctive form. For example, the DNF (3.6) further simplifies to the equivalent disjunctive form (3.4) above.
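Reading a DNF off a truth table is a purely mechanical procedure: emit one AND-term for each row on which the formula is true. Here is a minimal Python sketch of that procedure, assuming formulas are given as Python functions; the names `formula_3_5` and `dnf_terms` are our own, and negation is printed as `NOT(X)` rather than with an overbar.

```python
from itertools import product


def formula_3_5(a: bool, b: bool, c: bool) -> bool:
    """Formula (3.5): A AND (B OR C)."""
    return a and (b or c)


def dnf_terms(formula, names):
    """Return one AND-term (as a string) per truth-table row where formula is true.

    Rows are visited in the order used in the text: all-T first, all-F last.
    Each term mentions every variable, negated exactly when its value in
    that row is F.
    """
    terms = []
    for values in product([True, False], repeat=len(names)):
        if formula(*values):
            literals = [
                name if value else f"NOT({name})"
                for name, value in zip(names, values)
            ]
            terms.append("(" + " AND ".join(literals) + ")")
    return terms


if __name__ == "__main__":
    # Joining the terms with OR gives the disjunctive normal form.
    print(" OR ".join(dnf_terms(formula_3_5, ["A", "B", "C"])))
```

For (3.5) this produces the three AND-terms of (3.6), in the same order as rows one to three of the truth table.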
Applying the same reasoning to the F entries of a truth table yields a conjunctive form for any formula: an AND of OR-terms in which the OR-terms are OR’s only of variables or their negations. For example, formula (3.5) is false in the fourth row of its truth table (Section 3.4.1) where $A$ is T, $B$ is F and $C$ is F. But this is exactly the one row where $(\overline{A} \text{ OR } B \text{ OR } C)$ is F! Likewise, (3.5) is false in the fifth row, which is exactly where $(A \text{ OR } \overline{B} \text{ OR } \overline{C})$ is F. This means that (3.5) will be F whenever the AND of these two OR-terms is false. Continuing in this way with the OR-terms corresponding to the remaining three rows where (3.5) is false, we get a conjunctive normal form (CNF) that is equivalent to (3.5), namely,

\[
\begin{array}{l}
(\overline{A} \text{ OR } B \text{ OR } C) \text{ AND } (A \text{ OR } \overline{B} \text{ OR } \overline{C}) \text{ AND } (A \text{ OR } \overline{B} \text{ OR } C) \text{ AND} \\
(A \text{ OR } B \text{ OR } \overline{C}) \text{ AND } (A \text{ OR } B \text{ OR } C)
\end{array}
\]

The methods above can be applied to any truth table, which implies

Theorem 3.4.1. Every propositional formula is equivalent to both a disjunctive normal form and a conjunctive normal form.

3.4.2 Proving Equivalences

A check of equivalence or validity by truth table runs out of steam pretty quickly: a proposition with $n$ variables has a truth table with $2^n$ lines, so the effort required to check a proposition grows exponentially with the number of variables. For a proposition with just 30 variables, that’s already over a billion lines to check!

An alternative approach that sometimes helps is to use algebra to prove equivalence. A lot of different operators may appear in a propositional formula, so a useful first step is to get rid of all but three: AND, OR and NOT. This is easy because each of the operators is equivalent to a simple formula using only these three. For example, $A$ IMPLIES $B$ is equivalent to NOT($A$) OR $B$. Formulas using only AND, OR and NOT for the remaining operators are left to Problem 3.17.

We list below a bunch of equivalence axioms with the symbol “$\longleftrightarrow$” between equivalent formulas.
These axioms are important because they are all that’s needed to prove every possible equivalence. We’ll start with some equivalences for AND’s that look like the familiar ones for multiplication of numbers:

\[ A \text{ AND } B \longleftrightarrow B \text{ AND } A \qquad \text{(commutativity of AND)} \tag{3.7} \]
\[ (A \text{ AND } B) \text{ AND } C \longleftrightarrow A \text{ AND } (B \text{ AND } C) \qquad \text{(associativity of AND)} \tag{3.8} \]
\[ \mathbf{T} \text{ AND } A \longleftrightarrow A \qquad \text{(identity for AND)} \]
\[ \mathbf{F} \text{ AND } A \longleftrightarrow \mathbf{F} \qquad \text{(zero for AND)} \]
\[ A \text{ AND } (B \text{ OR } C) \longleftrightarrow (A \text{ AND } B) \text{ OR } (A \text{ AND } C) \qquad \text{(distributivity of AND over OR)} \tag{3.9} \]

Associativity (3.8) justifies writing $A \text{ AND } B \text{ AND } C$ without specifying whether it is parenthesized as $A \text{ AND } (B \text{ AND } C)$ or $(A \text{ AND } B) \text{ AND } C$. Both ways of inserting parentheses yield equivalent formulas.

Unlike arithmetic rules for numbers, there is also a distributivity law for “sums” over “products”:

\[ A \text{ OR } (B \text{ AND } C) \longleftrightarrow (A \text{ OR } B) \text{ AND } (A \text{ OR } C) \qquad \text{(distributivity of OR over AND)} \tag{3.10} \]

Three more axioms that don’t directly correspond to number properties are

\[ A \text{ AND } A \longleftrightarrow A \qquad \text{(idempotence for AND)} \]
\[ A \text{ AND } \overline{A} \longleftrightarrow \mathbf{F} \qquad \text{(contradiction for AND)} \tag{3.11} \]
\[ \text{NOT}(\overline{A}) \longleftrightarrow A \qquad \text{(double negation)} \tag{3.12} \]

There is a corresponding set of equivalences for OR which we won’t bother to list, except for the OR rule corresponding to contradiction for AND (3.11):

\[ A \text{ OR } \overline{A} \longleftrightarrow \mathbf{T} \qquad \text{(validity for OR)} \tag{3.13} \]

Finally, there are De Morgan’s Laws, which explain how to distribute NOT’s over AND’s and OR’s:

\[ \text{NOT}(A \text{ AND } B) \longleftrightarrow \overline{A} \text{ OR } \overline{B} \qquad \text{(De Morgan for AND)} \tag{3.14} \]
\[ \text{NOT}(A \text{ OR } B) \longleftrightarrow \overline{A} \text{ AND } \overline{B} \qquad \text{(De Morgan for OR)} \tag{3.15} \]

All of these axioms can be verified easily with truth tables.

These axioms are all that’s needed to convert any formula to a disjunctive normal form. We can illustrate how they work by applying them to turn the negation of formula (3.4), which is equivalent to the negation of (3.5),

\[ \text{NOT}((A \text{ AND } B) \text{ OR } (A \text{ AND } C)), \tag{3.16} \]

into disjunctive normal form.

We start by applying De Morgan’s Law for OR (3.15) to (3.16) in order to move the NOT deeper into the formula.
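This gives

$$\text{NOT}(A \text{ AND } B) \text{ AND } \text{NOT}(A \text{ AND } C).$$

Now applying De Morgan’s Law for AND (3.14) to the two innermost AND-terms gives

$$(\overline{A} \text{ OR } \overline{B}) \text{ AND } (\overline{A} \text{ OR } \overline{C}). \tag{3.17}$$

At this point NOT only applies to variables, and we won’t need De Morgan’s Laws any further.

Now we will repeatedly apply (3.9), distributivity of AND over OR, to turn (3.17) into a disjunctive form. To start, we’ll distribute the factor $(\overline{A} \text{ OR } \overline{B})$ across the OR in $(\overline{A} \text{ OR } \overline{C})$ to get

$$((\overline{A} \text{ OR } \overline{B}) \text{ AND } \overline{A}) \text{ OR } ((\overline{A} \text{ OR } \overline{B}) \text{ AND } \overline{C}).$$

Using distributivity over both AND’s we get

$$((\overline{A} \text{ AND } \overline{A}) \text{ OR } (\overline{B} \text{ AND } \overline{A})) \text{ OR } ((\overline{A} \text{ AND } \overline{C}) \text{ OR } (\overline{B} \text{ AND } \overline{C})).$$

By the way, we’ve implicitly used commutativity (3.7) here to justify distributing over an AND from the right. Now applying idempotence to remove the duplicate occurrence of $\overline{A}$ we get

$$(\overline{A} \text{ OR } (\overline{B} \text{ AND } \overline{A})) \text{ OR } ((\overline{A} \text{ AND } \overline{C}) \text{ OR } (\overline{B} \text{ AND } \overline{C})).$$

Associativity now allows dropping the parentheses around the terms being OR’d to yield the following disjunctive form for (3.16):

$$\overline{A} \text{ OR } (\overline{B} \text{ AND } \overline{A}) \text{ OR } (\overline{A} \text{ AND } \overline{C}) \text{ OR } (\overline{B} \text{ AND } \overline{C}). \tag{3.18}$$

The last step is to turn each of these AND-terms into a disjunctive normal form with all three variables $A$, $B$ and $C$. We’ll illustrate how to do this for the second AND-term $(\overline{B} \text{ AND } \overline{A})$. This term needs to mention $C$ to be in normal form. To introduce $C$, we use validity for OR and identity for AND to conclude that

$$(\overline{B} \text{ AND } \overline{A}) \longleftrightarrow (\overline{B} \text{ AND } \overline{A}) \text{ AND } (C \text{ OR } \overline{C}).$$

Now distributing $(\overline{B} \text{ AND } \overline{A})$ over the OR yields the disjunctive normal form

$$(\overline{B} \text{ AND } \overline{A} \text{ AND } C) \text{ OR } (\overline{B} \text{ AND } \overline{A} \text{ AND } \overline{C}).$$

Doing the same thing to the other AND-terms in (3.18) finally gives a disjunctive normal form for (3.16):

$$(\overline{A} \text{ AND } B \text{ AND } C) \text{ OR } (\overline{A} \text{ AND } B \text{ AND } \overline{C}) \text{ OR } (\overline{A} \text{ AND } \overline{B} \text{ AND } C) \text{ OR } (\overline{A} \text{ AND } \overline{B} \text{ AND } \overline{C}) \text{ OR}$$
$$(\overline{B} \text{ AND } \overline{A} \text{ AND } C) \text{ OR } (\overline{B} \text{ AND } \overline{A} \text{ AND } \overline{C}) \text{ OR}$$
$$(\overline{A} \text{ AND } \overline{C} \text{ AND } B) \text{ OR } (\overline{A} \text{ AND } \overline{C} \text{ AND } \overline{B}) \text{ OR}$$
$$(\overline{B} \text{ AND } \overline{C} \text{ AND } A) \text{ OR } (\overline{B} \text{ AND } \overline{C} \text{ AND } \overline{A}).$$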
Using commutativity to sort the terms and OR-idempotence to remove duplicates finally yields a unique sorted DNF:

$$(A \text{ AND } \overline{B} \text{ AND } \overline{C}) \text{ OR } (\overline{A} \text{ AND } B \text{ AND } C) \text{ OR } (\overline{A} \text{ AND } B \text{ AND } \overline{C}) \text{ OR } (\overline{A} \text{ AND } \overline{B} \text{ AND } C) \text{ OR } (\overline{A} \text{ AND } \overline{B} \text{ AND } \overline{C}).$$

This example illustrates a strategy for applying these equivalences to convert any formula into disjunctive normal form, and conversion to conjunctive normal form works similarly, which explains:

Theorem 3.4.2. Any propositional formula can be transformed into a disjunctive normal form or a conjunctive normal form using the equivalences listed above.

What has this got to do with equivalence? That’s easy: to prove that two formulas are equivalent, convert them both to disjunctive normal form over the set of variables that appear in the terms. Then use commutativity to sort the variables and AND-terms so they all appear in some standard order. We claim the formulas are equivalent iff they have the same sorted disjunctive normal form. This is obvious if they do have the same disjunctive normal form. But conversely, the way we read off a disjunctive normal form from a truth table shows that two different sorted DNF’s over the same set of variables correspond to different truth tables and hence to inequivalent formulas. This proves

Theorem 3.4.3 (Completeness of the propositional equivalence axioms). Two propositional formulas are equivalent iff they can be proved equivalent using the equivalence axioms listed above.

The benefit of the axioms is that they leave room for ingeniously applying them to prove equivalences with less effort than the truth table method. Theorem 3.4.3 then adds the reassurance that the axioms are guaranteed to prove every equivalence, which is a great punchline for this section.
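The equivalence test described above (compare sorted disjunctive normal forms) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `sorted_dnf` and `equivalent` are hypothetical, formulas are represented as Python functions on a dictionary of truth values, and the sketch reads the DNF off the truth table rather than deriving it by applying the axioms step by step.

```python
from itertools import product

def sorted_dnf(formula, variables):
    """Return the sorted DNF of `formula` as a tuple of AND-terms.

    Each AND-term is a tuple of (variable, value) pairs, one per variable,
    read off from a truth-table row in which the formula is true.
    """
    names = sorted(variables)
    terms = []
    for values in product([True, False], repeat=len(names)):
        env = dict(zip(names, values))
        if formula(env):
            terms.append(tuple(sorted(env.items())))
    return tuple(sorted(terms))

def equivalent(f, g, variables):
    """Two formulas are equivalent iff their sorted DNFs coincide."""
    return sorted_dnf(f, variables) == sorted_dnf(g, variables)

# The two sides of distributivity of AND over OR (3.9).
f = lambda e: e["A"] and (e["B"] or e["C"])
g = lambda e: (e["A"] and e["B"]) or (e["A"] and e["C"])
# A different formula, for contrast.
h = lambda e: e["A"] or (e["B"] and e["C"])

print(equivalent(f, g, "ABC"))  # True
print(equivalent(f, h, "ABC"))  # False
```

Note that the loop visits all $2^n$ rows, so this sketch costs as much as building the truth table.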
But we don’t want to mislead you: it’s important to realize that using the strategy we gave for applying the axioms involves essentially the same effort it would take to construct truth tables, and there is no guarantee that applying the axioms will generally be any easier than using truth tables.

3.5 The SAT Problem

Determining whether or not a more complicated proposition is satisfiable is not so easy. How about this one?

$$(P \text{ OR } Q \text{ OR } R) \text{ AND } (\overline{P} \text{ OR } \overline{Q}) \text{ AND } (\overline{P} \text{ OR } \overline{R}) \text{ AND } (\overline{R} \text{ OR } \overline{Q})$$

The general problem of deciding whether a proposition is satisfiable is called SAT. One approach to SAT is to construct a truth table and check whether or not a T ever appears, but as with testing validity, this approach quickly bogs down for formulas with many variables because truth tables grow exponentially with the number of variables.

Is there a more efficient solution to SAT? In particular, is there some brilliant procedure that determines, in a number of steps that grows polynomially (like $n^2$ or $n^{14}$) instead of exponentially ($2^n$), whether any given proposition of size $n$ is satisfiable or not? No one knows. And an awful lot hangs on the answer.

The general definition of an “efficient” procedure is one that runs in polynomial time, that is, that runs in a number of basic steps bounded by a polynomial in $s$, where $s$ is the size of an input. It turns out that an efficient solution to SAT would immediately imply efficient solutions to many other important problems involving scheduling, routing, resource allocation, and circuit verification across multiple disciplines including programming, algebra, finance, and political theory. This would be wonderful, but there would also be worldwide chaos. Decrypting coded messages would also become an easy task, so online financial transactions would be insecure and secret communications could be read by everyone. Why this would happen is explained in Section 9.12.
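The truth-table approach to SAT described above can be sketched as a brute-force search. The following is a minimal illustration, not an efficient solver: the function name `brute_force_sat` and the clause encoding (a list of OR-terms, each a list of `(variable, wanted_value)` literals) are assumptions made for this example, and the sample clause list assumes the example formula has the shape (P OR Q OR R) AND (NOT P OR NOT Q) AND (NOT P OR NOT R) AND (NOT R OR NOT Q).

```python
from itertools import product

def brute_force_sat(clauses, variables):
    """Return a satisfying assignment as a dict, or None if there is none.

    Tries all 2**n assignments, so the running time grows exponentially
    with the number of variables.
    """
    for values in product([True, False], repeat=len(variables)):
        env = dict(zip(variables, values))
        # A conjunctive form is satisfied when every OR-term has a true literal.
        if all(any(env[v] == wanted for v, wanted in clause) for clause in clauses):
            return env
    return None

# Assumed encoding of the example formula (negations written as False).
example = [
    [("P", True), ("Q", True), ("R", True)],
    [("P", False), ("Q", False)],
    [("P", False), ("R", False)],
    [("R", False), ("Q", False)],
]
# P AND NOT(P): an unsatisfiable formula, for contrast.
unsat = [[("P", True)], [("P", False)]]

print(brute_force_sat(example, ["P", "Q", "R"]))  # {'P': True, 'Q': False, 'R': False}
print(brute_force_sat(unsat, ["P"]))              # None
```

The search stops at the first satisfying row, but an unsatisfiable formula forces it through all $2^n$ rows.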
Of course, the situation is the same for validity checking, since you can check for validity by checking for satisfiability of a negated formula. This also explains why the simplification of formulas mentioned in Section 3.2 would be hard: validity testing is a special case of determining if a formula simplifies to T.

Recently there has been exciting progress on SAT-solvers for practical applications like digital circuit verification. These programs find satisfying assignments with amazing efficiency even for formulas with millions of variables. Unfortunately, it’s hard to predict which kind of formulas are amenable to SAT-solver methods, and for formulas that are unsatisfiable, SAT-solvers are generally much less effective.

So no one has a good idea how to solve SAT in polynomial time, or how to prove that it can’t be done; researchers are completely stuck. The problem of determining whether or not SAT has a polynomial time solution is known as the “P vs. NP” problem.² It is the outstanding unanswered question in theoretical computer science. It is also one of the seven Millennium Problems: the Clay Institute will award you $1,000,000 if you solve the P vs. NP problem.

3.6 Predicate Formulas

3.6.1 Quantifiers

The “for all” notation $\forall$ has already made an early appearance in Section 1.1. For example, the predicate “$x^2 \ge 0$” is always true when $x$ is a real number. That is,

$$\forall x \in \mathbb{R}.\ x^2 \ge 0$$

is a true statement. On the other hand, the predicate “$5x^2 - 7 = 0$” is only sometimes true; specifically, when $x = \pm\sqrt{7/5}$. There is a “there exists” notation $\exists$ to indicate that a predicate is true for at least one, but not necessarily all, objects. So

$$\exists x \in \mathbb{R}.\ 5x^2 - 7 = 0$$

is true, while

$$\forall x \in \mathbb{R}.\ 5x^2 - 7 = 0$$

is not true.

There are several ways to express the notions of “always true” and “sometimes true” in English. The table below gives some general formats on the left and specific examples using those formats on the right.
You can expect to see such phrases hundreds of times in mathematical writing!

Always True

| General format | Example |
|---|---|
| For all $x \in D$, $P(x)$ is true. | For all $x \in \mathbb{R}$, $x^2 \ge 0$. |
| $P(x)$ is true for every $x$ in the set $D$. | $x^2 \ge 0$ for every $x \in \mathbb{R}$. |

Sometimes True

| General format | Example |
|---|---|
| There is an $x \in D$ such that $P(x)$ is true. | There is an $x \in \mathbb{R}$ such that $5x^2 - 7 = 0$. |
| $P(x)$ is true for some $x$ in the set $D$. | $5x^2 - 7 = 0$ for some $x \in \mathbb{R}$. |
| $P(x)$ is true for at least one $x \in D$. | $5x^2 - 7 = 0$ for at least one $x \in \mathbb{R}$. |

All these sentences “quantify” how often the predicate is true. Specifically, an assertion that a predicate is always true is called a universal quantification, and an assertion that a predicate is sometimes true is an existential quantification. Sometimes the English sentences are unclear with respect to quantification:

If you can solve any problem we come up with, then you get an A for the course. (3.19)

The phrase “you can solve any problem we come up with” could reasonably be interpreted as either a universal or existential quantification:

you can solve every problem we come up with, (3.20)

or maybe

you can solve at least one problem we come up with. (3.21)

To be precise, let Probs be the set of problems we come up with, $\text{Solves}(x)$ be the predicate “You can solve problem $x$,” and $G$ be the proposition, “You get an A for the course.” Then the two different interpretations of (3.19) can be written as follows:

$$(\forall x \in \text{Probs}.\ \text{Solves}(x)) \text{ IMPLIES } G, \quad \text{for (3.20)},$$
$$(\exists x \in \text{Probs}.\ \text{Solves}(x)) \text{ IMPLIES } G, \quad \text{for (3.21)}.$$

² P stands for problems whose instances can be solved in time that grows polynomially with the size of the instance. NP stands for nondeterministic polynomial time, but we’ll leave an explanation of what that is to texts on the theory of computational complexity.

3.6.2 Mixing Quantifiers

Many mathematical statements involve several quantifiers.
For example, we already described Goldbach’s Conjecture 1.1.6: Every even integer greater than 2 is the sum of two primes. Let’s write this out in more detail to be precise about the quantification:

For every even integer $n$ greater than 2, there exist primes $p$ and $q$ such that $n = p + q$.

Let Evens be the set of even integers greater than 2, and let Primes be the set of primes. Then we can write Goldbach’s Conjecture in logic notation as follows:

$$\underbrace{\forall n \in \text{Evens}}_{\text{for every even integer } n > 2}\ \underbrace{\exists p \in \text{Primes}\ \exists q \in \text{Primes}.}_{\text{there exist primes } p \text{ and } q \text{ such that}}\ n = p + q.$$

3.6.3 Order of Quantifiers

Swapping the order of different kinds of quantifiers (existential or universal) usually changes the meaning of a proposition. For example, let’s return to one of our initial, confusing statements: “Every American has a dream.”

This sentence is ambiguous because the order of quantifiers is unclear. Let $A$ be the set of Americans, let $D$ be the set of dreams, and define the predicate $H(a, d)$ to be “American $a$ has dream $d$.” Now the sentence could mean there is a single dream that every American shares, such as the dream of owning their own home:

$$\exists d \in D\ \forall a \in A.\ H(a, d)$$

Or it could mean that every American has a personal dream:

$$\forall a \in A\ \exists d \in D.\ H(a, d)$$

For example, some Americans may dream of a peaceful retirement, while others dream of continuing practicing their profession as long as they live, and still others may dream of being so rich they needn’t think about work at all.

Swapping quantifiers in Goldbach’s Conjecture creates a patently false statement that every even number greater than 2 is the sum of the same two primes:

$$\underbrace{\exists p \in \text{Primes}\ \exists q \in \text{Primes}}_{\text{there exist primes } p \text{ and } q \text{ such that}}\ \underbrace{\forall n \in \text{Evens}.}_{\text{for every even integer } n > 2}\ n = p + q.$$

3.6.4 Variables Over One Domain

When all the variables in a formula are understood to take values from the same nonempty set $D$, it’s conventional to omit mention of $D$.
For example, instead of $\forall x \in D\ \exists y \in D.\ Q(x, y)$ we’d write $\forall x \exists y.\ Q(x, y)$. The unnamed nonempty set that $x$ and $y$ range over is called the domain of discourse, or just plain domain, of the formula.

It’s easy to arrange for all the variables to range over one domain. For example, Goldbach’s Conjecture could be expressed with all variables ranging over the domain $\mathbb{N}$ as

$$\forall n.\ n \in \text{Evens} \text{ IMPLIES } (\exists p\, \exists q.\ p \in \text{Primes} \text{ AND } q \in \text{Primes} \text{ AND } n = p + q).$$

3.6.5 Negating Quantifiers

There is a simple relationship between the two kinds of quantifiers. The following two sentences mean the same thing:

Not everyone likes ice cream.

There is someone who does not like ice cream.

The equivalence of these sentences is an instance of a general equivalence that holds between predicate formulas:

$$\text{NOT}(\forall x.\ P(x)) \quad \text{is equivalent to} \quad \exists x.\ \text{NOT}(P(x)). \tag{3.22}$$

Similarly, these sentences mean the same thing:

There is no one who likes being mocked.

Everyone dislikes being mocked.

The corresponding predicate formula equivalence is

$$\text{NOT}(\exists x.\ P(x)) \quad \text{is equivalent to} \quad \forall x.\ \text{NOT}(P(x)). \tag{3.23}$$

Note that the equivalence (3.23) follows directly by negating both sides of the equivalence (3.22).

The general principle is that moving a NOT to the other side of an “$\exists$” changes it into “$\forall$,” and vice versa.

These equivalences are called De Morgan’s Laws for Quantifiers because they can be understood as applying De Morgan’s Laws for propositional formulas to an infinite sequence of AND’s and OR’s. For example, we can explain (3.22) by supposing the domain of discourse is $\{d_0, d_1, \ldots, d_n, \ldots\}$. Then $\exists x.\ \text{NOT}(P(x))$ means the same thing as the infinite OR:

$$\text{NOT}(P(d_0)) \text{ OR } \text{NOT}(P(d_1)) \text{ OR } \cdots \text{ OR } \text{NOT}(P(d_n)) \text{ OR } \cdots \tag{3.24}$$

Applying De Morgan’s rule to this infinite OR yields the equivalent formula

$$\text{NOT}[P(d_0) \text{ AND } P(d_1) \text{ AND } \cdots \text{ AND } P(d_n) \text{ AND } \cdots]. \tag{3.25}$$
But (3.25) means the same thing as

$$\text{NOT}[\forall x.\ P(x)].$$

This explains why $\exists x.\ \text{NOT}(P(x))$ means the same thing as $\text{NOT}[\forall x.\ P(x)]$, which confirms (3.22).

3.6.6 Validity for Predicate Formulas

The idea of validity extends to predicate formulas, but to be valid, a formula now must evaluate to true no matter what the domain of discourse may be, no matter what values its variables may take over the domain, and no matter what interpretations its predicate variables may be given. For example, the equivalence (3.22) that gives the rule for negating a universal quantifier means that the following formula is valid:

$$\text{NOT}(\forall x.\ P(x)) \text{ IFF } \exists x.\ \text{NOT}(P(x)). \tag{3.26}$$

Another useful example of a valid assertion is

$$\exists x \forall y.\ P(x, y) \text{ IMPLIES } \forall y \exists x.\ P(x, y). \tag{3.27}$$

Here’s an explanation why this is valid:

Let $D$ be the domain for the variables and $P_0$ be some binary predicate³ on $D$. We need to show that if

$$\exists x \in D.\ \forall y \in D.\ P_0(x, y) \tag{3.28}$$

holds under this interpretation, then so does

$$\forall y \in D\ \exists x \in D.\ P_0(x, y). \tag{3.29}$$

So suppose (3.28) is true. Then by definition of $\exists$, this means that some element $d_0 \in D$ has the property that

$$\forall y \in D.\ P_0(d_0, y).$$

By definition of $\forall$, this means that $P_0(d_0, d)$ is true for all $d \in D$. So given any $d \in D$, there is an element in $D$, namely $d_0$, such that $P_0(d_0, d)$ is true. But that’s exactly what (3.29) means, so we’ve proved that (3.29) holds under this interpretation, as required.

³ That is, a predicate that depends on two variables.

We hope this is helpful as an explanation, but we don’t really want to call it a “proof.” The problem is that with something as basic as (3.27), it’s hard to see what more elementary axioms are ok to use in proving it. What the explanation above did was translate the logical formula (3.27) into English and then appeal to the meaning, in English, of “for all” and “there exists” as justification.
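As a sanity check on the explanation of (3.27), the formula can be tested exhaustively on small finite domains. The sketch below is an illustration only: the function name `check_327` is hypothetical, and passing this finite check does not establish validity, which requires the formula to hold for every domain and every interpretation of the predicate.

```python
from itertools import product

def check_327(domain_size):
    """Check (3.27) for every binary predicate on a domain of the given size.

    Returns True if 'exists x forall y. P(x, y)' implies
    'forall y exists x. P(x, y)' for all 2**(size*size) predicates P.
    """
    domain = range(domain_size)
    pairs = [(x, y) for x in domain for y in domain]
    for bits in product([False, True], repeat=len(pairs)):
        P = dict(zip(pairs, bits))  # one interpretation of the predicate
        exists_forall = any(all(P[x, y] for y in domain) for x in domain)
        forall_exists = all(any(P[x, y] for x in domain) for y in domain)
        if exists_forall and not forall_exists:
            return False  # this interpretation would falsify (3.27)
    return True

print(all(check_327(n) for n in (1, 2, 3)))  # True
```

The Python built-ins `any` and `all` play the roles of the existential and universal quantifiers over a finite domain.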
In contrast to (3.27), the formula

$$\forall y \exists x.\ P(x, y) \text{ IMPLIES } \exists x \forall y.\ P(x, y) \tag{3.30}$$

is not valid. We can prove this just by describing an interpretation where the hypothesis $\forall y \exists x.\ P(x, y)$ is true but the conclusion $\exists x \forall y.\ P(x, y)$ is not true. For example, let the domain be the integers and $P(x, y)$ mean $x > y$. Then the hypothesis would be true because, given a value $n$ for $y$, we could choose the value of $x$ to be $n + 1$, for example. But under this interpretation the conclusion asserts that there is an integer that is bigger than all integers, which is certainly false. An interpretation like this that falsifies an assertion is called a counter-model to that assertion.

3.7 References

[19]

Problems for Section 3.1

Practice Problems

Problem 3.1.
Some people are uncomfortable with the idea that from a false hypothesis you can prove everything, and instead of having $P \text{ IMPLIES } Q$ be true when $P$ is false, they want $P \text{ IMPLIES } Q$ to be false when $P$ is false. This would lead to IMPLIES having the same truth table as what propositional connective?

Problem 3.2.
Your class has a textbook and a final exam. Let $P$, $Q$ and $R$ be the following propositions:

$P ::=$ You get an A on the final exam.
$Q ::=$ You do every exercise in the book.
$R ::=$ You get an A in the class.

Translate the following assertions into propositional formulas using $P$, $Q$, $R$ and the propositional connectives AND, NOT, IMPLIES.

(a) You get an A in the class, but you do not do every exercise in the book.

(b) You get an A on the final, you do every exercise in the book, and you get an A in the class.

(c) To get an A in the class, it is necessary for you to get an A on the final.

(d) You get an A on the final, but you don’t do every exercise in this book; nevertheless, you get an A in this class.

Class Problems

Problem 3.3.
When the mathematician says to his student, “If a function is not continuous, then it is not differentiable,” then letting $D$ stand for “differentiable” and $C$ for “continuous,” the only proper translation of the mathematician’s statement would be

$$\text{NOT}(C) \text{ IMPLIES } \text{NOT}(D),$$

or equivalently,

$$D \text{ IMPLIES } C.$$

But when a mother says to her son, “If you don’t do your homework, then you can’t watch TV,” then letting $T$ stand for “can watch TV” and $H$ for “do your homework,” a reasonable translation of the mother’s statement would be

$$\text{NOT}(H) \text{ IFF } \text{NOT}(T),$$

or equivalently,

$$H \text{ IFF } T.$$

Explain why it is reasonable to translate these two IF-THEN statements in different ways into propositional formulas.

Homework Problems

Problem 3.4.
Describe a simple procedure which, given a positive integer argument, $n$, produces a width $n$ array of truth-values whose rows would be all the possible truth-value assignments for $n$ propositional variables. For example, for $n = 2$, the array would be:

T T
T F
F T
F F

Your description can be in English, or a simple program in some familiar language such as Python or Java. If you do write a program, be sure to include some sample output.

Problem 3.5.
Sloppy Sam is trying to prove a certain proposition $P$. He defines two related propositions $Q$ and $R$, and then proceeds to prove three implications:

$$P \text{ IMPLIES } Q, \quad Q \text{ IMPLIES } R, \quad R \text{ IMPLIES } P.$$

He then reasons as follows:

If $Q$ is true, then since I proved $(Q \text{ IMPLIES } R)$, I can conclude that $R$ is true. Now, since I proved $(R \text{ IMPLIES } P)$, I can conclude that $P$ is true. Similarly, if $R$ is true, then $P$ is true and so $Q$ is true. Likewise, if $P$ is true, then so are $Q$ and $R$. So any way you look at it, all three of $P$, $Q$ and $R$ are true.

(a) Exhibit truth tables for

$$(P \text{ IMPLIES } Q) \text{ AND } (Q \text{ IMPLIES } R) \text{ AND } (R \text{ IMPLIES } P) \tag{*}$$

and for

$$P \text{ AND } Q \text{ AND } R. \tag{**}$$

Use these tables to find a truth assignment for $P$, $Q$, $R$ so that (*) is T and (**) is F.
(b) You show these truth tables to Sloppy Sam and he says “OK, I’m wrong that $P$, $Q$ and $R$ all have to be true, but I still don’t see the mistake in my reasoning. Can you help me understand my mistake?” How would you explain to Sammy where the flaw lies in his reasoning?

Problems for Section 3.2

Class Problems

Problem 3.6.
Propositional logic comes up in digital circuit design using the convention that T corresponds to 1 and F to 0. A simple example is a 2-bit half-adder circuit. This circuit has 3 binary inputs, $a_1, a_0$ and $b$, and 3 binary outputs, $c, s_1, s_0$. The 2-bit word $a_1 a_0$ gives the binary representation of an integer $k$ between 0 and 3. The 3-bit word $c s_1 s_0$ gives the binary representation of $k + b$. The third output bit $c$ is called the final carry bit.

So if $k$ and $b$ were both 1, then the value of $a_1 a_0$ would be 01 and the value of the output $c s_1 s_0$ would be 010, namely, the 3-bit binary representation of $1 + 1$.

In fact, the final carry bit equals 1 only when all three binary inputs are 1, that is, when $k = 3$ and $b = 1$. In that case, the value of $c s_1 s_0$ is 100, namely, the binary representation of $3 + 1$.

This 2-bit half-adder could be described by the following formulas:

$$c_0 = b$$
$$s_0 = a_0 \text{ XOR } c_0$$
$$c_1 = a_0 \text{ AND } c_0 \quad \text{(the carry into column 1)}$$
$$s_1 = a_1 \text{ XOR } c_1$$
$$c_2 = a_1 \text{ AND } c_1 \quad \text{(the carry into column 2)}$$
$$c = c_2.$$

(a) Generalize the above construction of a 2-bit half-adder to an $(n+1)$-bit half-adder with inputs $a_n, \ldots, a_1, a_0$ and $b$ and outputs $c, s_n, \ldots, s_1, s_0$. That is, give simple formulas for $s_i$ and $c_i$ for $0 \le i \le n+1$, where $c_i$ is the carry into column $i+1$, and $c = c_{n+1}$.

(b) Write similar definitions for the digits and carries in the sum of two $(n+1)$-bit binary numbers $a_n \ldots a_1 a_0$ and $b_n \ldots b_1 b_0$.

Visualized as digital circuits, the above adders consist of a sequence of single-digit half-adders or adders strung together in series.
These circuits mimic ordinary pencil-and-paper addition, where a carry into a column is calculated directly from the carry into the previous column, and the carries have to ripple across all the columns before the carry into the final column is determined. Circuits with this design are called ripple-carry adders. Ripple-carry adders are easy to understand and remember and require a nearly minimal number of operations. But the higher-order output bits and the final carry take time proportional to $n$ to reach their final values.

(c) How many of each of the propositional operations does your adder from part (b) use to calculate the sum?

Homework Problems

Problem 3.7.
As in Problem 3.6, a digital circuit is called an $(n+1)$-bit half-adder when it has $n+2$ inputs

$$a_n, \ldots, a_1, a_0, b$$

and $n+2$ outputs

$$c, s_n, \ldots, s_1, s_0.$$

The input-output specification of the half-adder is that, if the 0-1 values of inputs $a_n, \ldots, a_1, a_0$ are taken to be the $(n+1)$-bit binary representation of an integer $k$, then the 0-1 values of the outputs $c, s_n, \ldots, s_1, s_0$ are supposed to be the $(n+2)$-bit binary representation of $k + b$.

For example, suppose $n = 2$ and the values of $a_2 a_1 a_0$ were 101. This is the binary representation of $k = 5$. Now if the value of $b$ was 1, then the output should be the 4-bit representation of $5 + 1 = 6$. Namely, the values of $c s_2 s_1 s_0$ would be 0110.

There are many different circuit designs for half adders. The most straightforward one is the “ripple carry” design described in Problem 3.6. We will now develop a different design for a half-adder circuit called a parallel-design or “look-ahead carry” half-adder. This design works by computing the values of higher-order digits for both a carry of 0 and a carry of 1, in parallel. Then, when the carry from the low-order digits finally arrives, the pre-computed answer can be quickly selected.
We’ll illustrate this idea by working out a parallel design for an $(n+1)$-bit half-adder.

Parallel-design half-adders are built out of parallel-design circuits called add1-modules. The input-output behavior of an add1-module is just a special case of a half-adder, where instead of adding an input $b$ to the input, the add1-module always adds 1. That is, an $(n+1)$-bit add1-module has $(n+1)$ binary inputs

$$a_n, \ldots, a_1, a_0,$$

and $n+2$ binary outputs

$$c, p_n, \ldots, p_1, p_0.$$

If $a_n \ldots a_1 a_0$ are taken to be the $(n+1)$-bit representation of an integer $k$, then $c p_n \ldots p_1 p_0$ is supposed to be the $(n+2)$-bit binary representation of $k + 1$.

So a 1-bit add1-module just has input $a_0$ and outputs $c, p_0$ where

$$p_0 ::= a_0 \text{ XOR } 1 \quad (\text{or more simply, } p_0 ::= \text{NOT}(a_0)),$$
$$c ::= a_0.$$

In the ripple-carry design, a double-size half-adder with $2(n+1)$ inputs takes twice as long to produce its output values as an $(n+1)$-input ripple-carry circuit. With parallel-design add1-modules, a double-size add1-module produces its output values nearly as fast as a single-size add1-module. To see how this works, suppose the inputs of the double-size module are

$$a_{2n+1}, \ldots, a_1, a_0$$

and the outputs are

$$c, p_{2n+1}, \ldots, p_1, p_0.$$

We will build the double-size add1-module by having two single-size add1-modules work in parallel. The setup is illustrated in Figure 3.1.

Namely, the first single-size add1-module handles the first $n+1$ inputs. The inputs to this module are the low-order $n+1$ input bits $a_n, \ldots, a_1, a_0$, and its outputs will serve as the first $n+1$ outputs $p_n, \ldots, p_1, p_0$ of the double-size module. Let $c_{(1)}$ be the remaining carry output from this module.

The inputs to the second single-size module are the higher-order $n+1$ input bits $a_{2n+1}, \ldots, a_{n+2}, a_{n+1}$. Call its first $n+1$ outputs $r_n, \ldots, r_1, r_0$ and let $c_{(2)}$ be its carry.
(a) Write a formula for the carry $c$ of the double-size add1-module solely in terms of carries $c_{(1)}$ and $c_{(2)}$ of the single-size add1-modules.

(b) Complete the specification of the double-size add1-module by writing propositional formulas for the remaining outputs $p_{n+i}$ for $1 \le i \le n+1$. The formula for $p_{n+i}$ should only involve the variables $a_{n+i}$, $r_{i-1}$ and $c_{(1)}$.

(c) Explain how to build an $(n+1)$-bit parallel-design half-adder from an $(n+1)$-bit add1-module by writing a propositional formula for the half-adder output $s_i$ using only the variables $a_i$, $p_i$ and $b$.

(d) The speed or latency of a circuit is determined by the largest number of gates on any path from an input to an output. In an $n$-bit ripple carry circuit (Problem 3.6), there is a path from an input to the final carry output that goes through about $2n$ gates. In contrast, parallel half-adders are exponentially faster than ripple-carry half-adders. Confirm this by determining the largest number of propositional operations, that is, gates, on any path from an input to an output of an $n$-bit add1-module. (You may assume $n$ is a power of 2.)

Figure 3.1 Structure of a Double-size add1 Module. [Diagram: two $(n+1)$-bit add1 modules inside the double-size module; one takes inputs $a_{2n+1}, \ldots, a_{n+1}$ and produces $r_n, \ldots, r_0$ with carry $c_{(2)}$, the other takes inputs $a_n, \ldots, a_0$ and produces carry $c_{(1)}$; the overall outputs are $c, p_{2n+1}, \ldots, p_0$.]

Exam Problems

Problem 3.8.
Claim. There are exactly two truth environments (assignments) for the variables $M, N, P, Q, R, S$ that satisfy the following formula:

$$\underbrace{(\overline{P} \text{ OR } Q)}_{\text{clause (1)}} \text{ AND } \underbrace{(\overline{Q} \text{ OR } R)}_{\text{clause (2)}} \text{ AND } \underbrace{(\overline{R} \text{ OR } S)}_{\text{clause (3)}} \text{ AND } \underbrace{(\overline{S} \text{ OR } P)}_{\text{clause (4)}} \text{ AND } M \text{ AND } \overline{N}$$

(a) This claim could be proved by truth-table. How many rows would the truth table have?

(b) Instead of a truth-table, prove this claim with an argument by cases according to the truth value of $P$.

Problem 3.9.
An $n$-bit AND-circuit has 0-1 valued inputs $a_0, a_1, \ldots, a_{n-1}$ and one output $c$ whose value will be

$$c = a_0 \text{ AND } a_1 \text{ AND } \cdots \text{ AND } a_{n-1}.$$

There are various ways to design an $n$-bit AND-circuit. A serial design is simply a series of AND-gates, each with one input being a circuit input $a_i$ and the other input being the output of the previous gate, as shown in Figure 3.2.

We can also use a tree design. A 1-bit tree design is just a wire, that is, $c ::= a_0$. Assuming for simplicity that $n$ is a power of two, an $n$-input tree circuit for $n > 1$ simply consists of two $n/2$-input tree circuits whose outputs are AND’d to produce output $c$, as in Figure 3.3. For example, a 4-bit tree design circuit is shown in Figure 3.4.

(a) How many AND-gates are in the $n$-input serial circuit?

(b) The “speed” or latency of a circuit is the largest number of gates on any path from an input to an output. Briefly explain why the tree circuit is exponentially faster than the serial circuit.

(c) Assume $n$ is a power of two. Prove that the $n$-input tree circuit has $n - 1$ AND-gates.

Figure 3.2 A serial AND-circuit.

Figure 3.3 An $n$-bit AND-tree circuit.

Figure 3.4 A 4-bit AND-tree circuit.

Problems for Section 3.3

Practice Problems

Problem 3.10.
Indicate whether each of the following propositional formulas is valid (V), satisfiable but not valid (S), or not satisfiable (N). For the satisfiable ones, indicate a satisfying truth assignment.

$$M \text{ IMPLIES } Q$$
$$M \text{ IMPLIES } (\overline{P} \text{ OR } \overline{Q})$$
$$M \text{ IMPLIES } [M \text{ AND } (P \text{ IMPLIES } M)]$$
$$(P \text{ OR } Q) \text{ IMPLIES } Q$$
$$(P \text{ OR } Q) \text{ IMPLIES } (\overline{P} \text{ AND } \overline{Q})$$
$$(P \text{ OR } Q) \text{ IMPLIES } [M \text{ AND } (P \text{ IMPLIES } M)]$$
$$(P \text{ XOR } Q) \text{ IMPLIES } Q$$
$$(P \text{ XOR } Q) \text{ IMPLIES } (\overline{P} \text{ OR } \overline{Q})$$
$$(P \text{ XOR } Q) \text{ IMPLIES } [M \text{ AND } (P \text{ IMPLIES } M)]$$

Problem 3.11.
Show truth tables that verify the equivalence of the following two propositional formulas:

$$(P \text{ XOR } Q),$$
$$\text{NOT}(P \text{ IFF } Q).$$

Problem 3.12.
Prove that the propositional formulas

$$P \text{ OR } Q \text{ OR } R$$

and

$$(P \text{ AND } \text{NOT}(Q)) \text{ OR } (Q \text{ AND } \text{NOT}(R)) \text{ OR } (R \text{ AND } \text{NOT}(P)) \text{ OR } (P \text{ AND } Q \text{ AND } R)$$

are equivalent.

Problem 3.13.
Prove by truth table that OR distributes over AND, namely,

$$P \text{ OR } (Q \text{ AND } R) \quad \text{is equivalent to} \quad (P \text{ OR } Q) \text{ AND } (P \text{ OR } R) \tag{3.31}$$

Exam Problems

Problem 3.14.
The formula

$$\text{NOT}(\overline{A} \text{ IMPLIES } B) \text{ AND } A \text{ AND } C$$
$$\text{IMPLIES}$$
$$D \text{ AND } E \text{ AND } F \text{ AND } G \text{ AND } H \text{ AND } I \text{ AND } J \text{ AND } K \text{ AND } L \text{ AND } M$$

turns out to be valid.

(a) Explain why verifying the validity of this formula by truth table would be very hard for one person to do with pencil and paper (no computers).

(b) Verify that the formula is valid, reasoning by cases according to the truth value of $A$.

Proof.
Case: ($A$ is True).
Case: ($A$ is False).

Class Problems

Problem 3.15.
(a) Verify by truth table that

$$(P \text{ IMPLIES } Q) \text{ OR } (Q \text{ IMPLIES } P)$$

is valid.

(b) Let $P$ and $Q$ be propositional formulas. Describe a single formula $R$ using only AND’s, OR’s, NOT’s, and copies of $P$ and $Q$, such that $R$ is valid iff $P$ and $Q$ are equivalent.

(c) A propositional formula is satisfiable iff there is an assignment of truth values to its variables (an environment) that makes it true. Explain why $P$ is valid iff $\text{NOT}(P)$ is not satisfiable.

(d) A set of propositional formulas $P_1, \ldots, P_k$ is consistent iff there is an environment in which they are all true. Write a formula $S$ such that the set $P_1, \ldots, P_k$ is not consistent iff $S$ is valid.

Problem 3.16.
This problem⁴ examines whether the following specifications are satisfiable:

1. If the file system is not locked, then
(a) new messages will be queued.
(b) new messages will be sent to the messages buffer.
(c) the system is functioning normally, and conversely, if the system is functioning normally, then the file system is not locked.

2. If new messages are not queued, then they will be sent to the messages buffer.

3.
New messages will not be sent to the message buffer.

(a) Begin by translating the five specifications into propositional formulas using four propositional variables:

$L ::=$ file system locked,
$Q ::=$ new messages are queued,
$B ::=$ new messages are sent to the message buffer,
$N ::=$ system functioning normally.

(b) Demonstrate that this set of specifications is satisfiable by describing a single truth assignment for the variables $L, Q, B, N$ and verifying that under this assignment, all the specifications are true.

(c) Argue that the assignment determined in part (b) is the only one that does the job.

⁴ Revised from Rosen, 5th edition, Exercise 1.1.36

Problems for Section 3.4

Practice Problems

Problem 3.17.
A half dozen different operators may appear in propositional formulas, but just AND, OR, and NOT are enough to do the job. That is because each of the operators is equivalent to a simple formula using only these three operators. For example, $A \text{ IMPLIES } B$ is equivalent to $\text{NOT}(A) \text{ OR } B$. So all occurrences of IMPLIES in a formula can be replaced using just NOT and OR.

(a) Write formulas using only AND, OR, NOT that are equivalent to each of $A \text{ IFF } B$ and $A \text{ XOR } B$. Conclude that every propositional formula is equivalent to an AND-OR-NOT formula.

(b) Explain why you don’t even need AND.

(c) Explain how to get by with the single operator NAND where $A \text{ NAND } B$ is equivalent by definition to $\text{NOT}(A \text{ AND } B)$.

Class Problems

Problem 3.18.
The propositional connective NOR is defined by the rule

$$P \text{ NOR } Q ::= (\text{NOT}(P) \text{ AND } \text{NOT}(Q)).$$

Explain why every propositional formula, possibly involving any of the usual operators such as IMPLIES, XOR, . . . , is equivalent to one whose only connective is NOR.

Problem 3.19.
Explain how to read off a conjunctive form for a propositional formula directly from a disjunctive form for its complement.

Problem 3.20.
Let $P$ be the proposition depending on propositional variables $A, B, C, D$ whose truth values for each truth assignment to $A, B, C, D$ are given in the table below. Write out both a disjunctive and a conjunctive normal form for $P$.

A  B  C  D  |  P
T  T  T  T  |  T
T  T  T  F  |  F
T  T  F  T  |  T
T  T  F  F  |  F
T  F  T  T  |  T
T  F  T  F  |  T
T  F  F  T  |  T
T  F  F  F  |  T
F  T  T  T  |  T
F  T  T  F  |  F
F  T  F  T  |  T
F  T  F  F  |  F
F  F  T  T  |  F
F  F  T  F  |  F
F  F  F  T  |  T
F  F  F  F  |  T

Homework Problems

Problem 3.21.
Use the equivalence axioms of Section 3.4.2 to convert the formula
\[ A \text{ XOR } B \text{ XOR } C \]
(a) ... to disjunctive (OR of AND's) form,
(b) ... to conjunctive (AND of OR's) form.

Problems for Section 3.5

Class Problems

Problem 3.22.
The circuit-SAT problem is the problem of determining, for any given digital circuit with one output wire, whether there are truth values that can be fed into the circuit input wires which will lead the circuit to give output T.

It's easy to see that any efficient way of solving the circuit-SAT problem would yield an efficient way to solve the usual SAT problem for propositional formulas (Section 3.5). Namely, for any formula $F$, just construct a circuit $C_F$ that computes the values of the formula. Then there are inputs for which $C_F$ gives output true iff $F$ is satisfiable. Constructing $C_F$ from $F$ is easy, using a binary gate in $C_F$ for each propositional connective in $F$. So an efficient circuit-SAT procedure leads to an efficient SAT procedure.

Conversely, there is a simple recursive procedure that will construct, given $C$, a formula $E_C$ that is equivalent to $C$ in the sense that the truth value of $E_C$ and the output of $C$ are the same for every truth assignment of the variables. The difficulty is that, in general, the “equivalent” formula $E_C$ will be exponentially larger than $C$.
For the purposes of showing that satisfiability of circuits and satisfiability of formulas take roughly the same effort to solve, spending an exponential time translating one problem to the other swamps any benefit in switching from one problem to the other.

So instead of a formula $E_C$ that is equivalent to $C$, we aim instead for a formula $F_C$ that is “equisatisfiable” with $C$. That is, there will be input values that make $C$ output True iff there is a truth assignment that satisfies $F_C$. (In fact, $F_C$ and $C$ need not even use the same variables.) But now we make sure that the amount of computation needed to construct $F_C$ is not much larger than the size of the circuit $C$. In particular, the size of $F_C$ will also not be much larger than $C$.

The idea behind the construction of $F_C$ is that, given any digital circuit $C$ with binary gates and one output, we can assign a distinct variable to each wire of $C$. Then for each gate of $C$, we can set up a propositional formula that represents the constraints that the gate places on the values of its input and output wires. For example, for an AND gate with input wire variables $P$ and $Q$ and output wire variable $R$, the constraint proposition would be
\[ (P \text{ AND } Q) \text{ IFF } R. \tag{3.32} \]

(a) Given a circuit $C$, explain how to easily find a formula $F_C$ of size proportional to the number of wires in $C$ such that $F_C$ is satisfiable iff $C$ gives output T for some set of input values.

(b) Conclude that any efficient way of solving SAT would yield an efficient way to solve circuit-SAT.

Homework Problems

Problem 3.23.
A 3-conjunctive form (3CF) formula is a conjunctive form formula in which each OR-term is an OR of at most 3 variables or negations of variables.
Although it may be hard to tell if a propositional formula $F$ is satisfiable, it is always easy to construct a formula $\mathcal{C}(F)$ that is in 3-conjunctive form, has at most 24 times as many occurrences of variables as $F$, and is satisfiable iff $F$ is satisfiable.

To construct $\mathcal{C}(F)$, introduce a different new variable for each operator that occurs in $F$. For example, if $F$ was
\[ ((P \text{ XOR } Q) \text{ XOR } R) \text{ OR } (P \text{ AND } S) \tag{3.33} \]
we might use new variables $X_1$, $X_2$, $O$ and $A$ corresponding to the operator occurrences as follows:
\[ ((P \underbrace{\text{XOR}}_{X_1} Q) \underbrace{\text{XOR}}_{X_2} R) \underbrace{\text{OR}}_{O} (P \underbrace{\text{AND}}_{A} S). \]
Next we write a formula that constrains each new variable to have the same truth value as the subformula determined by its corresponding operator. For the example above, these constraining formulas would be
\[
\begin{array}{l}
X_1 \text{ IFF } (P \text{ XOR } Q), \\
X_2 \text{ IFF } (X_1 \text{ XOR } R), \\
A \text{ IFF } (P \text{ AND } S), \\
O \text{ IFF } (X_2 \text{ OR } A)
\end{array}
\]

(a) Explain why the AND of the four constraining formulas above along with a fifth formula consisting of just the variable $O$ will be satisfiable iff (3.33) is satisfiable.

(b) Explain why each constraining formula will be equivalent to a 3CF formula with at most 24 occurrences of variables.

(c) Using the ideas illustrated in the previous parts, explain how to construct $\mathcal{C}(F)$ for an arbitrary propositional formula $F$.

Problems for Section 3.6

Practice Problems

Problem 3.24.
For each of the following propositions:

1. $\forall x\, \exists y.\ 2x - y = 0$
2. $\forall x\, \exists y.\ x - 2y = 0$
3. $\forall x.\ x < 10 \text{ IMPLIES } (\forall y.\ y < x \text{ IMPLIES } y < 9)$
4. $\forall x\, \exists y.\ [y > x \wedge \exists z.\ y + z = 100]$

determine which propositions are true when the variables range over:

(a) the nonnegative integers.
(b) the integers.
(c) the real numbers.

Problem 3.25.
Let $Q(x, y)$ be the statement

“$x$ has been a contestant on television show $y$.”

The universe of discourse for $x$ is the set of all students at your school and for $y$ is the set of all quiz shows that have ever been on television.
Determine whether or not each of the following expressions is logically equivalent to the sentence:

“No student at your school has ever been a contestant on a television quiz show.”

(a) $\forall x\, \forall y.\ \text{NOT}(Q(x, y))$
(b) $\exists x\, \exists y.\ \text{NOT}(Q(x, y))$
(c) $\text{NOT}(\forall x\, \forall y.\ Q(x, y))$
(d) $\text{NOT}(\exists x\, \exists y.\ Q(x, y))$

Problem 3.26.
Find a counter-model showing the following is not valid.
\[ \exists x.\ P(x) \text{ IMPLIES } \forall x.\ P(x) \]
(Just define your counter-model. You do not need to verify that it is correct.)

Problem 3.27.
Find a counter-model showing the following is not valid.
\[ [\exists x.\ P(x) \text{ AND } \exists x.\ Q(x)] \text{ IMPLIES } \exists x.\ [P(x) \text{ AND } Q(x)] \]
(Just define your counter-model. You do not need to verify that it is correct.)

Problem 3.28.
Which of the following are valid? For those that are not valid, describe a counter-model.

(a) $\exists x\, \exists y.\ P(x, y) \text{ IMPLIES } \exists y\, \exists x.\ P(x, y)$
(b) $\forall x\, \exists y.\ Q(x, y) \text{ IMPLIES } \exists y\, \forall x.\ Q(x, y)$
(c) $\exists x\, \forall y.\ R(x, y) \text{ IMPLIES } \forall y\, \exists x.\ R(x, y)$
(d) $\text{NOT}(\exists x\, S(x)) \text{ IFF } \forall x\, \text{NOT}(S(x))$

Problem 3.29.
(a) Verify that the propositional formula
\[ (P \text{ IMPLIES } Q) \text{ OR } (Q \text{ IMPLIES } P) \]
is valid.

(b) The valid formula of part (a) leads to a sound proof method: to prove that an implication is true, just prove that its converse is false.$^{5}$ For example, from elementary calculus we know that the assertion

If a function is continuous, then it is differentiable

is false. This allows us to reach the correct conclusion that its converse,

If a function is differentiable, then it is continuous

is true, as indeed it is.

But wait a minute! The implication

If a function is differentiable, then it is not continuous

is completely false. So we could conclude that its converse

If a function is not continuous, then it is differentiable,

should be true, but in fact the converse is also completely false. So something has gone wrong here. Explain what.

$^{5}$ This problem was stimulated by the discussion of the fallacy in [3].
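Part (a) of Problem 3.29 can also be checked mechanically, in the same spirit as a truth table. The following is a minimal Python sketch, not part of the original text; the helper names `implies` and `is_valid` are illustrative assumptions. It enumerates every truth assignment and reports whether a formula is true under all of them. It says nothing about what goes wrong in part (b).

```python
from itertools import product

def implies(p, q):
    """Truth-functional IMPLIES: false only when p is true and q is false."""
    return (not p) or q

def is_valid(formula, num_vars):
    """Return True iff `formula` is true under every truth assignment."""
    return all(formula(*env) for env in product([True, False], repeat=num_vars))

def part_a(p, q):
    """The formula (P IMPLIES Q) OR (Q IMPLIES P) from Problem 3.29(a)."""
    return implies(p, q) or implies(q, p)

print(is_valid(part_a, 2))   # the formula of part (a)
print(is_valid(implies, 2))  # a single implication, for contrast
```

The second call is included only for contrast: a single implication is false when its hypothesis is true and its conclusion is false, so it is not valid on its own.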
Class Problems

Problem 3.30.
A media tycoon has an idea for an all-news television network called LNN: The Logic News Network. Each segment will begin with a definition of the domain of discourse and a few predicates. The day's happenings can then be communicated concisely in logic notation. For example, a broadcast might begin as follows:

THIS IS LNN. The domain of discourse is
\[ \{\text{Albert}, \text{Ben}, \text{Claire}, \text{David}, \text{Emily}\}. \]
Let $D(x)$ be a predicate that is true if $x$ is deceitful. Let $L(x, y)$ be a predicate that is true if $x$ likes $y$. Let $G(x, y)$ be a predicate that is true if $x$ gave gifts to $y$.

Translate the following broadcasts in logic notation into (English) statements.

(a)
\[ \text{NOT}(D(\text{Ben}) \text{ OR } D(\text{David})) \text{ IMPLIES } (L(\text{Albert}, \text{Ben}) \text{ AND } L(\text{Ben}, \text{Albert})). \]

(b)
\[
\begin{array}{l}
\forall x.\ ((x = \text{Claire} \text{ AND } \text{NOT}(L(x, \text{Emily}))) \text{ OR } (x \neq \text{Claire} \text{ AND } L(x, \text{Emily}))) \\
\quad \text{AND} \\
\forall x.\ ((x = \text{David} \text{ AND } L(x, \text{Claire})) \text{ OR } (x \neq \text{David} \text{ AND } \text{NOT}(L(x, \text{Claire}))))
\end{array}
\]

(c)
\[ \text{NOT}(D(\text{Claire})) \text{ IMPLIES } (G(\text{Albert}, \text{Ben}) \text{ AND } \exists x.\ G(\text{Ben}, x)) \]

(d)
\[ \forall x\, \exists y\, \exists z\ (y \neq z) \text{ AND } L(x, y) \text{ AND } \text{NOT}(L(x, z)). \]

(e) How could you express “Everyone except for Claire likes Emily” using just propositional connectives without using any quantifiers ($\forall$, $\exists$)? Can you generalize to explain how any logical formula over this domain of discourse can be expressed without quantifiers? How big would the formula in the previous part be if it was expressed this way?

Problem 3.31.
The goal of this problem is to translate some assertions about binary strings into logic notation. The domain of discourse is the set of all finite-length binary strings: $\lambda$, 0, 1, 00, 01, 10, 11, 000, 001, .... (Here $\lambda$ denotes the empty string.) In your translations, you may use all the ordinary logic symbols (including =), variables, and the binary symbols $\mathtt{0}$, $\mathtt{1}$ denoting 0, 1.

A string like $\mathtt{01}x\mathtt{0}y$ of binary symbols and variables denotes the concatenation of the symbols and the binary strings represented by the variables.
For example, if the value of $x$ is 011 and the value of $y$ is 1111, then the value of $\mathtt{01}x\mathtt{0}y$ is the binary string 0101101111.

Here are some examples of formulas and their English translations. Names for these predicates are listed in the third column so that you can reuse them in your solutions (as we do in the definition of the predicate NO-1S below).

Meaning                            Formula                               Name
$x$ is a prefix of $y$             $\exists z\ (xz = y)$                 PREFIX($x, y$)
$x$ is a substring of $y$          $\exists u\, \exists v\ (uxv = y)$    SUBSTRING($x, y$)
$x$ is empty or a string of 0's    NOT(SUBSTRING($\mathtt{1}, x$))       NO-1S($x$)

(a) $x$ consists of three copies of some string.

(b) $x$ is an even-length string of 0's.

(c) $x$ does not contain both a 0 and a 1.

(d) $x$ is the binary representation of $2^k + 1$ for some integer $k \geq 0$.

(e) An elegant, slightly trickier way to define NO-1S($x$) is:
\[ \text{PREFIX}(x, \mathtt{0}x). \tag{*} \]
Explain why (*) is true only when $x$ is a string of 0's.

Problem 3.32.
For each of the logical formulas, indicate whether or not it is true when the domain of discourse is $\mathbb{N}$ (the nonnegative integers 0, 1, 2, ...), $\mathbb{Z}$ (the integers), $\mathbb{Q}$ (the rationals), $\mathbb{R}$ (the real numbers), and $\mathbb{C}$ (the complex numbers). Add a brief explanation to the few cases that merit one.
\[
\begin{array}{l}
\exists x.\ x^2 = 2 \\
\forall x.\ \exists y.\ x^2 = y \\
\forall y.\ \exists x.\ x^2 = y \\
\forall x \neq 0.\ \exists y.\ xy = 1 \\
\exists x.\ \exists y.\ x + 2y = 2 \text{ AND } 2x + 4y = 5
\end{array}
\]

Problem 3.33.
Show that
\[ (\forall x\, \exists y.\ P(x, y)) \longrightarrow \forall z.\ P(z, z) \]
is not valid by describing a counter-model.

Homework Problems

Problem 3.34.
Express each of the following predicates and propositions in formal logic notation. The domain of discourse is the nonnegative integers, $\mathbb{N}$. Moreover, in addition to the propositional operators, variables and quantifiers, you may define predicates using addition, multiplication, and equality symbols, and nonnegative integer constants (0, 1, ...), but no exponentiation (like $x^y$).
For example, the predicate “$n$ is an even number” could be defined by either of the following formulas:
\[ \exists m.\ (2m = n), \qquad \exists m.\ (m + m = n). \]

(a) $m$ is a divisor of $n$.

(b) $n$ is a prime number.

(c) $n$ is a power of a prime.

Problem 3.35.
Translate the following sentence into a predicate formula:

There is a student who has e-mailed at most two other people in the class, besides possibly himself.

The domain of discourse should be the set of students in the class; in addition, the only predicates that you may use are equality and $E(x, y)$, meaning that “$x$ has sent e-mail to $y$.”

Problem 3.36.
(a) Translate the following sentence into a predicate formula:

There is a student who has e-mailed at most $n$ other people in the class, besides possibly himself.

The domain of discourse should be the set of students in the class; in addition, the only predicates that you may use are equality and $E(x, y)$, meaning that “$x$ has sent e-mail to $y$.”

(b) Explain how you would use your predicate formula (or some variant of it) to express the following two sentences.

1. There is a student who has e-mailed at least $n$ other people in the class, besides possibly himself.
2. There is a student who has e-mailed exactly $n$ other people in the class, besides possibly himself.

Exam Problems

Problem 3.37.
For each of the logic formulas below, indicate the smallest domain in which it is true, among

$\mathbb{N}$ (nonnegative integers), $\mathbb{Z}$ (integers), $\mathbb{Q}$ (rationals), $\mathbb{R}$ (reals), $\mathbb{C}$ (complex numbers),

or state “none” if it is not true in any of them.

i. $\forall x\, \exists y.\ y = 3x$
ii. $\forall x\, \exists y.\ 3y = x$
iii. $\forall x\, \exists y.\ y^2 = x$
iv. $\forall x\, \exists y.\ y < x$
v. $\forall x\, \exists y.\ y^3 = x$
vi. $\forall x \neq 0.\ \exists y, z.\ y \neq z \text{ AND } y^2 = x = z^2$

Problem 3.38.
The following predicate logic formula is invalid:
\[ \forall x, \exists y.\ P(x, y) \longrightarrow \exists y, \forall x.\ P(x, y) \]
Which of the following are counter models for it?

1. The predicate $P(x, y)$ = ‘$yx = 1$’ where the domain of discourse is $\mathbb{Q}$.
2.
The predicate $P(x, y)$ = ‘$y < x$’ where the domain of discourse is $\mathbb{R}$.
3. The predicate $P(x, y)$ = ‘$yx = 2$’ where the domain of discourse is $\mathbb{R}$ without 0.
4. The predicate $P(x, y)$ = ‘$yxy = x$’ where the domain of discourse is the set of all binary strings, including the empty string.

Problem 3.39.
Some students from a large class will be lined up left to right. There will be at least two students in the line. Translate each of the following assertions into predicate formulas with the set of students in the class as the domain of discourse. The only predicates you may use are equality and $F(x, y)$, meaning that “$x$ is somewhere to the left of $y$ in the line.” For example, in the line “CDA”, both $F(C, A)$ and $F(C, D)$ are true.

Once you have defined a formula for a predicate $P$ you may use the abbreviation “$P$” in further formulas.

(a) Student $x$ is in the line.

(b) Student $x$ is first in line.

(c) Student $x$ is immediately to the right of student $y$.

(d) Student $x$ is second.

Problem 3.40.
We want to find predicate formulas about the nonnegative integers $\mathbb{N}$ in which $\leq$ is the only predicate that appears, and no constants appear.

For example, there is such a formula defining the equality predicate:
\[ [x = y] ::= [x \leq y \text{ AND } y \leq x]. \]
Once a predicate is shown to be expressible solely in terms of $\leq$, it may then be used in subsequent translations. For example,
\[ [x > 0] ::= \exists y.\ \text{NOT}(x = y) \text{ AND } y \leq x. \]

(a) $[x = 0]$.

(b) $[x = y + 1]$. Hint: If an integer is bigger than $y$, then it must be $\geq x$.

(c) $[x = 3]$.

Problem 3.41.
Predicate formulas whose only predicate symbol is equality are called “pure equality” formulas. For example,
\[ \forall x\, \forall y.\ x = y \tag{1-element} \]
is a pure equality formula. Its meaning is that there is exactly one element in the domain of discourse.$^{6}$

$^{6}$ Remember, a domain of discourse is not allowed to be empty.

Another such formula is
\[ \exists a\, \exists b\, \forall x.\ x = a \text{ OR } x = b. \tag{$\leq$ 2-elements} \]
Its meaning is that there are at most two elements in the domain of discourse.

A formula that is not a pure equality formula is
\[ x \leq y. \tag{not-pure} \]
Formula (not-pure) uses the less-than-or-equal predicate $\leq$ which is not allowed.$^{7}$

(a) Describe a pure equality formula that means that there are exactly two elements in the domain of discourse.

(b) Describe a pure equality formula that means that there are exactly three elements in the domain of discourse.

$^{7}$ In fact, formula (not-pure) only makes sense when the domain elements are ordered, while pure equality formulas make sense over every domain.

4 Mathematical Data Types

We have assumed that you've already been introduced to the concepts of sets, sequences, and functions, and we've used them informally several times in previous sections. In this chapter, we'll now take a more careful look at these mathematical data types. We'll quickly review the basic definitions, add a few more such as “images” and “inverse images” that may not be familiar, and end the chapter with some methods for comparing the sizes of sets.

4.1 Sets

Informally, a set is a bunch of objects, which are called the elements of the set. The elements of a set can be just about anything: numbers, points in space, or even other sets. The conventional way to write down a set is to list the elements inside curly-braces. For example, here are some sets:
\[
\begin{array}{ll}
A = \{\text{Alex}, \text{Tippy}, \text{Shells}, \text{Shadow}\} & \text{dead pets} \\
B = \{\text{red}, \text{blue}, \text{yellow}\} & \text{primary colors} \\
C = \{\{a, b\}, \{a, c\}, \{b, c\}\} & \text{a set of sets}
\end{array}
\]
This works fine for small finite sets. Other sets might be defined by indicating how to generate a list of them:
\[ D ::= \{1, 2, 4, 8, 16, \ldots\} \qquad \text{the powers of 2} \]
The order of elements is not significant, so $\{x, y\}$ and $\{y, x\}$ are the same set written two different ways.
Also, any object is, or is not, an element of a given set; there is no notion of an element appearing more than once in a set.$^{1}$ So, writing $\{x, x\}$ is just indicating the same thing twice: that $x$ is in the set. In particular, $\{x, x\} = \{x\}$.

The expression “$e \in S$” asserts that $e$ is an element of set $S$. For example, $32 \in D$ and $\text{blue} \in B$, but $\text{Tailspin} \notin A$ (yet).

Sets are simple, flexible, and everywhere. You'll find some set mentioned in nearly every section of this text.

$^{1}$ It's not hard to develop a notion of multisets in which elements can occur more than once, but multisets are not ordinary sets and are not covered in this text.

4.1.1 Some Popular Sets

Mathematicians have devised special symbols to represent some common sets.

symbol          set                     elements
$\emptyset$     the empty set           none
$\mathbb{N}$    nonnegative integers    $\{0, 1, 2, 3, \ldots\}$
$\mathbb{Z}$    integers                $\{\ldots, -3, -2, -1, 0, 1, 2, 3, \ldots\}$
$\mathbb{Q}$    rational numbers        $\frac{1}{2}$, $-\frac{5}{3}$, $16$, etc.
$\mathbb{R}$    real numbers            $\pi$, $e$, $-9$, $\sqrt{2}$, etc.
$\mathbb{C}$    complex numbers         $i$, $\frac{19}{2}$, $\sqrt{2} - 2i$, etc.

A superscript “$+$” restricts a set to its positive elements; for example, $\mathbb{R}^+$ denotes the set of positive real numbers. Similarly, $\mathbb{Z}^-$ denotes the set of negative integers.

4.1.2 Comparing and Combining Sets

The expression $S \subseteq T$ indicates that set $S$ is a subset of set $T$, which means that every element of $S$ is also an element of $T$. For example, $\mathbb{N} \subseteq \mathbb{Z}$ because every nonnegative integer is an integer; $\mathbb{Q} \subseteq \mathbb{R}$ because every rational number is a real number, but $\mathbb{C} \not\subseteq \mathbb{R}$ because not every complex number is a real number.

As a memory trick, think of the “$\subseteq$” symbol as being like the “$\leq$” sign with the smaller set or number on the left-hand side. Notice that just as $n \leq n$ for any number $n$, also $S \subseteq S$ for any set $S$.

There is also a relation $\subset$ on sets like the “less than” relation $<$ on numbers. $S \subset T$ means that $S$ is a subset of $T$, but the two are not equal. So just as $n \not< n$ for every number $n$, also $A \not\subset A$, for every set $A$.
“$S \subset T$” is read as “$S$ is a strict subset of $T$.”

There are several basic ways to combine sets. For example, suppose
\[ X ::= \{1, 2, 3\}, \qquad Y ::= \{2, 3, 4\}. \]

Definition 4.1.1.
The union of sets $A$ and $B$, denoted $A \cup B$, includes exactly the elements appearing in $A$ or $B$ or both. That is,
\[ x \in A \cup B \quad\text{IFF}\quad x \in A \text{ OR } x \in B. \]
So $X \cup Y = \{1, 2, 3, 4\}$.

The intersection of $A$ and $B$, denoted $A \cap B$, consists of all elements that appear in both $A$ and $B$. That is,
\[ x \in A \cap B \quad\text{IFF}\quad x \in A \text{ AND } x \in B. \]
So, $X \cap Y = \{2, 3\}$.

The set difference of $A$ and $B$, denoted $A - B$, consists of all elements that are in $A$, but not in $B$. That is,
\[ x \in A - B \quad\text{IFF}\quad x \in A \text{ AND } x \notin B. \]
So, $X - Y = \{1\}$ and $Y - X = \{4\}$.

Often all the sets being considered are subsets of a known domain of discourse $D$. Then for any subset $A$ of $D$, we define $\overline{A}$ to be the set of all elements of $D$ not in $A$. That is,
\[ \overline{A} ::= D - A. \]
The set $\overline{A}$ is called the complement of $A$. So
\[ \overline{A} = \emptyset \quad\text{IFF}\quad A = D. \]
For example, if the domain we're working with is the integers, the complement of the nonnegative integers is the set of negative integers:
\[ \overline{\mathbb{N}} = \mathbb{Z}^-. \]
We can use complement to rephrase subset in terms of equality:
\[ A \subseteq B \quad\text{is equivalent to}\quad A \cap \overline{B} = \emptyset. \]

4.1.3 Power Set

The set of all the subsets of a set $A$ is called the power set $\text{pow}(A)$ of $A$. So
\[ B \in \text{pow}(A) \quad\text{IFF}\quad B \subseteq A. \]
For example, the elements of $\text{pow}(\{1, 2\})$ are $\emptyset$, $\{1\}$, $\{2\}$ and $\{1, 2\}$.

More generally, if $A$ has $n$ elements, then there are $2^n$ sets in $\text{pow}(A)$; see Theorem 4.5.5. For this reason, some authors use the notation $2^A$ instead of $\text{pow}(A)$.

4.1.4 Set Builder Notation

An important use of predicates is in set builder notation. We'll often want to talk about sets that cannot be described very well by listing the elements explicitly or by taking unions, intersections, etc., of easily described sets. Set builder notation often comes to the rescue.
The idea is to define a set using a predicate; in particular, the set consists of all values that make the predicate true. Here are some examples of set builder notation:
\[
\begin{array}{l}
A ::= \{n \in \mathbb{N} \mid n \text{ is a prime and } n = 4k + 1 \text{ for some integer } k\}, \\
B ::= \{x \in \mathbb{R} \mid x^3 - 3x + 1 > 0\}, \\
C ::= \{a + bi \in \mathbb{C} \mid a^2 + 2b^2 \leq 1\}, \\
D ::= \{L \in \text{books} \mid L \text{ is cited in this text}\}.
\end{array}
\]
The set $A$ consists of all nonnegative integers $n$ for which the predicate

“$n$ is a prime and $n = 4k + 1$ for some integer $k$”

is true. Thus, the smallest elements of $A$ are:
\[ 5, 13, 17, 29, 37, 41, 53, 61, 73, \ldots. \]
Trying to indicate the set $A$ by listing these first few elements wouldn't work very well; even after ten terms, the pattern is not obvious. Similarly, the set $B$ consists of all real numbers $x$ for which the predicate
\[ x^3 - 3x + 1 > 0 \]
is true. In this case, an explicit description of the set $B$ in terms of intervals would require solving a cubic equation. Set $C$ consists of all complex numbers $a + bi$ such that:
\[ a^2 + 2b^2 \leq 1 \]
This is an oval-shaped region around the origin in the complex plane. Finally, the members of set $D$ can be determined by filtering out journal articles from the list of references in the Bibliography 22.5.

4.1.5 Proving Set Equalities

Two sets are defined to be equal if they have exactly the same elements. That is, $X = Y$ means that $z \in X$ if and only if $z \in Y$, for all elements $z$.$^{2}$

$^{2}$ This is actually the first of the ZFC axioms for set theory mentioned at the end of Section 1.3 and discussed further in Section 8.3.2.

So, set equalities can be formulated and proved as “iff” theorems. For example:

Theorem 4.1.2. [Distributive Law for Sets] Let $A$, $B$ and $C$ be sets. Then:
\[ A \cap (B \cup C) = (A \cap B) \cup (A \cap C) \tag{4.1} \]

Proof. The equality (4.1) is equivalent to the assertion that
\[ z \in A \cap (B \cup C) \quad\text{iff}\quad z \in (A \cap B) \cup (A \cap C) \tag{4.2} \]
for all $z$. Now we'll prove (4.2) by a chain of iff's.
Now we have
\begin{align*}
z \in A \cap (B \cup C)
 &\text{ iff } (z \in A) \text{ AND } (z \in B \cup C) && \text{(def of } \cap\text{)} \\
 &\text{ iff } (z \in A) \text{ AND } (z \in B \text{ OR } z \in C) && \text{(def of } \cup\text{)} \\
 &\text{ iff } (z \in A \text{ AND } z \in B) \text{ OR } (z \in A \text{ AND } z \in C) && \text{(AND distributivity (3.9))} \\
 &\text{ iff } (z \in A \cap B) \text{ OR } (z \in A \cap C) && \text{(def of } \cap\text{)} \\
 &\text{ iff } z \in (A \cap B) \cup (A \cap C) && \text{(def of } \cup\text{)}
\end{align*}

The proof of Theorem 4.1.2 illustrates a general method for proving a set equality involving the basic set operations by checking that a corresponding propositional formula is valid. As a further example, from De Morgan's Law (3.14) for propositions
\[ \text{NOT}(P \text{ AND } Q) \quad\text{is equivalent to}\quad \overline{P} \text{ OR } \overline{Q} \]
we can derive (Problem 4.5) a corresponding De Morgan's Law for set equality:
\[ \overline{A \cap B} = \overline{A} \cup \overline{B}. \tag{4.3} \]

Despite this correspondence between two kinds of operations, it's important not to confuse propositional operations with set operations. For example, if $X$ and $Y$ are sets, then it is wrong to write “$X \text{ AND } Y$” instead of “$X \cap Y$.” Applying AND to sets will cause your compiler, or your grader, to throw a type error, because an operation that is only supposed to be applied to truth values has been applied to sets. Likewise, if $P$ and $Q$ are propositions, then it is a type error to write “$P \cup Q$” instead of “$P \text{ OR } Q$.”

4.2 Sequences

Sets provide one way to group a collection of objects. Another way is in a sequence, which is a list of objects called its components, members, or elements. Short sequences are commonly described by listing the elements between parentheses; for example, the sequence $(a, b, c)$ has three components. It would also be referred to as a three element sequence or a sequence of length three. These phrases are all synonyms; sequences are so basic that they appear everywhere and there are a lot of ways to talk about them.

While both sets and sequences perform a gathering role, there are several differences.

The elements of a set are required to be distinct, but elements in a sequence can be the same.
Thus, $(a, b, a)$ is a valid sequence of length three, but $\{a, b, a\}$ is a set with two elements, not three.

The elements in a sequence have a specified order, but the elements of a set do not. For example, $(a, b, c)$ and $(a, c, b)$ are different sequences, but $\{a, b, c\}$ and $\{a, c, b\}$ are the same set.

Texts differ on notation for the empty sequence; we use $\lambda$ for the empty sequence.

The product operation is one link between sets and sequences. A Cartesian product of sets, $S_1 \times S_2 \times \cdots \times S_n$, is a new set consisting of all sequences where the first component is drawn from $S_1$, the second from $S_2$, and so forth. Length two sequences are called pairs.$^{3}$ For example, $\mathbb{N} \times \{a, b\}$ is the set of all pairs whose first element is a nonnegative integer and whose second element is an $a$ or a $b$:
\[ \mathbb{N} \times \{a, b\} = \{(0, a), (0, b), (1, a), (1, b), (2, a), (2, b), \ldots\} \]
A product of $n$ copies of a set $S$ is denoted $S^n$. For example, $\{0, 1\}^3$ is the set of all 3-bit sequences:
\[ \{0, 1\}^3 = \{(0,0,0), (0,0,1), (0,1,0), (0,1,1), (1,0,0), (1,0,1), (1,1,0), (1,1,1)\} \]

$^{3}$ Some texts call them ordered pairs.

4.3 Functions

4.3.1 Domains and Images

A function assigns an element of one set, called the domain, to an element of another set, called the codomain. The notation
\[ f : A \to B \]
indicates that $f$ is a function with domain $A$ and codomain $B$. The familiar notation “$f(a) = b$” indicates that $f$ assigns the element $b \in B$ to $a$. Here $b$ would be called the value of $f$ at argument $a$.

Functions are often defined by formulas, as in:
\[ f_1(x) ::= \frac{1}{x^2} \]
where $x$ is a real-valued variable, or
\[ f_2(y, z) ::= y\mathtt{10}yz \]
where $y$ and $z$ range over binary strings, or
\[ f_3(x, n) ::= \text{the length } n \text{ sequence } \underbrace{(x, \ldots, x)}_{n\ x\text{'s}} \]
where $n$ ranges over the nonnegative integers.

A function with a finite domain could be specified by a table that shows the value of the function at each element of the domain.
For example, a function $f_4(P, Q)$ where $P$ and $Q$ are propositional variables is specified by:

P  Q  |  $f_4(P, Q)$
T  T  |  T
T  F  |  F
F  T  |  T
F  F  |  T

Notice that $f_4$ could also have been described by a formula:
\[ f_4(P, Q) ::= [P \text{ IMPLIES } Q]. \]

A function might also be defined by a procedure for computing its value at any element of its domain, or by some other kind of specification. For example, define $f_5(y)$ to be the length of a left to right search of the bits in the binary string $y$ until a 1 appears, so
\[
\begin{array}{l}
f_5(0010) = 3, \\
f_5(100) = 1, \\
f_5(0000) \text{ is undefined.}
\end{array}
\]
Notice that $f_5$ does not assign a value to any string of just 0's. This illustrates an important fact about functions: they need not assign a value to every element in the domain. In fact this came up in our first example $f_1(x) = 1/x^2$, which does not assign a value to 0. So in general, functions may be partial functions, meaning that there may be domain elements for which the function is not defined. If a function is defined on every element of its domain, it is called a total function.

It's often useful to find the set of values a function takes when applied to the elements in a set of arguments. So if $f : A \to B$, and $S$ is a subset of $A$, we define $f(S)$ to be the set of all the values that $f$ takes when it is applied to elements of $S$. That is,
\[ f(S) ::= \{b \in B \mid f(s) = b \text{ for some } s \in S\}. \]
For example, if we let $[r, s]$ denote the set of numbers in the interval from $r$ to $s$ on the real line, then $f_1([1, 2]) = [1/4, 1]$.

For another example, let's take the “search for a 1” function $f_5$. If we let $X$ be the set of binary words which start with an even number of 0's followed by a 1, then $f_5(X)$ would be the odd nonnegative integers.
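The definitions of $f_5$ and of $f(S)$ can be tried out on small finite examples. The following is a minimal Python sketch, not part of the original text. It assumes binary strings are represented as Python strings of the characters 0 and 1, and that "undefined" is modeled by returning `None`; the names `f5`, `apply_to_set`, and `sample_X` are illustrative.

```python
def f5(y):
    """Number of bits scanned left to right until a 1 appears.

    Returns None when y contains no 1, modeling the fact that f5 is a
    partial function (this use of None is an assumption of the sketch).
    """
    pos = y.find("1")
    return None if pos == -1 else pos + 1

def apply_to_set(f, S):
    """The set f(S) of values that f takes on those elements of S where it is defined."""
    return {f(s) for s in S if f(s) is not None}

print(f5("0010"), f5("100"), f5("0000"))

# A finite sample of X: words starting with an even number of 0's followed by a 1.
sample_X = {"1", "001", "0011", "00001", "0000001"}
print(sorted(apply_to_set(f5, sample_X)))
```

On this sample the values are 1, 3, 5 and 7, in line with the claim that $f_5(X)$ consists of odd numbers. A finite sample can only illustrate the claim, not prove it.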
Applying $f$ to a set $S$ of arguments is referred to as “applying $f$ pointwise to $S$”, and the set $f(S)$ is referred to as the image of $S$ under $f$.$^{4}$ The set of values that arise from applying $f$ to all possible arguments is called the range of $f$. That is,
\[ \text{range}(f) ::= f(\text{domain}(f)). \]
Some authors refer to the codomain as the range of a function, but they shouldn't. The distinction between the range and codomain will be important later in Section 4.5 when we relate sizes of sets to properties of functions between them.

$^{4}$ There is a picky distinction between the function $f$ which applies to elements of $A$ and the function which applies $f$ pointwise to subsets of $A$, because the domain of $f$ is $A$, while the domain of pointwise-$f$ is $\text{pow}(A)$. It is usually clear from context whether $f$ or pointwise-$f$ is meant, so there is no harm in overloading the symbol $f$ in this way.

4.3.2 Function Composition

Doing things step by step is a universal idea. Taking a walk is a literal example, but so is cooking from a recipe, executing a computer program, evaluating a formula, and recovering from substance abuse.

Abstractly, taking a step amounts to applying a function, and going step by step corresponds to applying functions one after the other. This is captured by the operation of composing functions. Composing the functions $f$ and $g$ means that first $f$ is applied to some argument, $x$, to produce $f(x)$, and then $g$ is applied to that result to produce $g(f(x))$.

Definition 4.3.1. For functions $f : A \to B$ and $g : B \to C$, the composition, $g \circ f$, of $g$ with $f$ is defined to be the function from $A$ to $C$ defined by the rule:
\[ (g \circ f)(x) ::= g(f(x)), \]
for all $x \in A$.

Function composition is familiar as a basic concept from elementary calculus, and it plays an equally basic role in discrete mathematics.

4.4 Binary Relations

Binary relations define relations between two objects.
For example, “less-than” on the real numbers relates every real number $a$ to a real number $b$, precisely when $a < b$. Similarly, the subset relation relates a set $A$ to another set $B$ precisely when $A \subseteq B$. A function $f : A \to B$ is a special case of binary relation in which an element $a \in A$ is related to an element $b \in B$ precisely when $b = f(a)$.

In this section we'll define some basic vocabulary and properties of binary relations.

Definition 4.4.1. A binary relation $R$ consists of a set $A$, called the domain of $R$, a set $B$ called the codomain of $R$, and a subset of $A \times B$ called the graph of $R$.

A relation whose domain is $A$ and codomain is $B$ is said to be “between $A$ and $B$”, or “from $A$ to $B$.” As with functions, we write $R : A \to B$ to indicate that $R$ is a relation from $A$ to $B$. When the domain and codomain are the same set $A$ we simply say the relation is “on $A$.” It's common to use “$a \mathrel{R} b$” to mean that the pair $(a, b)$ is in the graph of $R$.$^{5}$

Notice that Definition 4.4.1 is exactly the same as the definition in Section 4.3 of a function, except that it doesn't require the functional condition that, for each domain element $a$, there is at most one pair in the graph whose first coordinate is $a$. As we said, a function is a special case of a binary relation.

$^{5}$ Writing the relation or operator symbol between its arguments is called infix notation. Infix expressions like “$m < n$” or “$m + n$” are the usual notation used for things like the less-than relation or the addition operation rather than prefix notation like “$< (m, n)$” or “$+(m, n)$.”

The “in-charge of” relation Chrg for MIT in Spring '10 subjects and instructors is a handy example of a binary relation. Its domain Fac is the names of all the MIT faculty and instructional staff, and its codomain is the set SubNums of subject numbers in the Fall '09–Spring '10 MIT subject listing.
The graph of Chrg contains precisely the pairs of the form

$$(\langle\text{instructor-name}\rangle, \langle\text{subject-num}\rangle)$$

such that the faculty member named $\langle\text{instructor-name}\rangle$ is in charge of the subject with number $\langle\text{subject-num}\rangle$ that was offered in Spring ’10. So graph(Chrg) contains pairs like

(T. Eng, 6.UAT)
(G. Freeman, 6.011)
(G. Freeman, 6.UAT)
(G. Freeman, 6.881)
(G. Freeman, 6.882)
(J. Guttag, 6.00)
(A. R. Meyer, 6.042)          (4.4)
(A. R. Meyer, 18.062)
(A. R. Meyer, 6.844)
(T. Leighton, 6.042)
(T. Leighton, 18.062)
...

Some subjects in the codomain SubNums do not appear among this list of pairs; that is, they are not in range(Chrg). These are the Fall term-only subjects. Similarly, there are instructors in the domain Fac who do not appear in the list because they are not in charge of any Spring term subjects.

4.4.1 Relation Diagrams

Some standard properties of a relation can be visualized in terms of a diagram. The diagram for a binary relation $R$ has points corresponding to the elements of the domain appearing in one column (a very long column if domain($R$) is infinite). All the elements of the codomain appear in another column which we’ll usually picture as being to the right of the domain column. There is an arrow going from a point $a$ in the left-hand, domain column to a point $b$ in the right-hand, codomain column, precisely when the corresponding elements are related by $R$. For example, here are diagrams for two functions:

[Figure: two relation diagrams, each with a domain column $A$ of lettered points on the left, a codomain column $B$ of numbered points on the right, and arrows from $A$ to $B$.]

Being a function is certainly an important property of a binary relation. What it means is that every point in the domain column has at most one arrow coming out of it. So we can describe being a function as the “$\le 1$ arrow out” property. There are four more standard properties of relations that come up all the time. Here are all five properties defined in terms of arrows:

Definition 4.4.2.
A binary relation $R$ is:

- a function when it has the $[\le 1 \text{ arrow out}]$ property.
- surjective when it has the $[\ge 1 \text{ arrows in}]$ property. That is, every point in the right-hand, codomain column has at least one arrow pointing to it.
- total when it has the $[\ge 1 \text{ arrows out}]$ property.
- injective when it has the $[\le 1 \text{ arrow in}]$ property.
- bijective when it has both the $[= 1 \text{ arrow out}]$ and the $[= 1 \text{ arrow in}]$ property.

From here on, we’ll stop mentioning the arrows in these properties and, for example, just write $[\le 1 \text{ in}]$ instead of $[\le 1 \text{ arrows in}]$.

So in the diagrams above, the relation on the left has the $[= 1 \text{ out}]$ and $[\ge 1 \text{ in}]$ properties, which means it is a total, surjective function. But it does not have the $[\le 1 \text{ in}]$ property because element 3 has two arrows going into it; it is not injective.

The relation on the right has the $[= 1 \text{ out}]$ and $[\le 1 \text{ in}]$ properties, which means it is a total, injective function. But it does not have the $[\ge 1 \text{ in}]$ property because element 4 has no arrow going into it; it is not surjective.

The arrows in a diagram for $R$ correspond, of course, exactly to the pairs in the graph of $R$. Notice that the arrows alone are not enough to determine, for example, if $R$ has the $[\ge 1 \text{ out}]$, total, property. If all we knew were the arrows, we wouldn’t know about any points in the domain column that had no arrows out. In other words, graph($R$) alone does not determine whether $R$ is total: we also need to know what domain($R$) is.

Example 4.4.3. The function defined by the formula $1/x^2$ has the $[\ge 1 \text{ out}]$ property if its domain is $\mathbb{R}^+$, but not if its domain is some set of real numbers including 0. It has the $[= 1 \text{ in}]$ and $[= 1 \text{ out}]$ property if its domain and codomain are both $\mathbb{R}^+$, but it has neither the $[\le 1 \text{ in}]$ nor the $[\ge 1 \text{ out}]$ property if its domain and codomain are both $\mathbb{R}$.

4.4.2 Relational Images

The idea of the image of a set under a function extends directly to relations.

Definition 4.4.4.
The image of a set $Y$ under a relation $R$, written $R(Y)$, is the set of elements of the codomain $B$ of $R$ that are related to some element in $Y$. In terms of the relation diagram, $R(Y)$ is the set of points with an arrow coming in that starts from some point in $Y$.

For example, the set of subject numbers that Meyer is in charge of in Spring ’10 is exactly Chrg(A. R. Meyer). To figure out what this is, we look for all the arrows in the Chrg diagram that start at “A. R. Meyer,” and see which subject-numbers are at the other end of these arrows. Looking at the list (4.4) of pairs in graph(Chrg), we see that these subject-numbers are {6.042, 18.062, 6.844}. Similarly, to find the subject numbers that either Freeman or Eng are in charge of, we can collect all the arrows that start at either “G. Freeman,” or “T. Eng” and, again, see which subject-numbers are at the other end of these arrows. This is Chrg({G. Freeman, T. Eng}). Looking again at the list (4.4), we see that

Chrg({G. Freeman, T. Eng}) = {6.011, 6.881, 6.882, 6.UAT}.

Finally, Fac is the set of all in-charge instructors, so Chrg(Fac) is the set of all the subjects listed for Spring ’10.

Inverse Relations and Images

Definition 4.4.5. The inverse, $R^{-1}$, of a relation $R : A \to B$ is the relation from $B$ to $A$ defined by the rule

$$b \mathrel{R^{-1}} a \quad\text{IFF}\quad a \mathrel{R} b.$$

In other words, $R^{-1}$ is the relation you get by reversing the direction of the arrows in the diagram of $R$.

Definition 4.4.6. The image of a set under the relation $R^{-1}$ is called the inverse image of the set. That is, the inverse image of a set $X$ under the relation $R$ is defined to be $R^{-1}(X)$.

Continuing with the in-charge example above, the set of instructors in charge of 6.UAT in Spring ’10 is exactly the inverse image of {6.UAT} under the Chrg relation. From the list (4.4), we see that Eng and Freeman are both in charge of 6.UAT, that is,

$$\{\text{T. Eng}, \text{G. Freeman}\} \subseteq \text{Chrg}^{-1}(\{\text{6.UAT}\}).$$

We can’t assert equality here because there may be additional pairs further down the list showing that additional instructors are co-in-charge of 6.UAT.

Now let Intro be the set of introductory course 6 subject numbers. These are the subject numbers that start with “6.0.” So the set of names of the instructors who were in charge of introductory course 6 subjects in Spring ’10 is $\text{Chrg}^{-1}(\text{Intro})$. From the part of the Chrg list shown in (4.4), we see that Meyer, Leighton, Freeman, and Guttag were among the instructors in charge of introductory subjects in Spring ’10. That is,

$$\{\text{Meyer}, \text{Leighton}, \text{Freeman}, \text{Guttag}\} \subseteq \text{Chrg}^{-1}(\text{Intro}).$$

Finally, $\text{Chrg}^{-1}(\text{SubNums})$ is the set of all instructors who were in charge of a subject listed for Spring ’10.

4.5 Finite Cardinality

A finite set is one that has only a finite number of elements. This number of elements is the “size” or cardinality of the set:

Definition 4.5.1. If $A$ is a finite set, the cardinality $|A|$ of $A$ is the number of elements in $A$.

A finite set may have no elements (the empty set), or one element, or two elements, and so on, so the cardinality of finite sets is always a nonnegative integer.

Now suppose $R : A \to B$ is a function. This means that every element of $A$ contributes at most one arrow to the diagram for $R$, so the number of arrows is at most the number of elements in $A$. That is, if $R$ is a function, then

$$|A| \ge \#\text{arrows}.$$

If $R$ is also surjective, then every element of $B$ has an arrow into it, so there must be at least as many arrows in the diagram as the size of $B$. That is,

$$\#\text{arrows} \ge |B|.$$

Combining these inequalities implies that if $R$ is a surjective function, then $|A| \ge |B|$.

In short, if we write $A \text{ surj } B$ to mean that there is a surjective function from $A$ to $B$, then we’ve just proved a lemma: if $A \text{ surj } B$ for finite sets $A, B$, then $|A| \ge |B|$.
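The arrow-counting argument can be tried out on small finite relations. The following is a minimal sketch, assuming a relation is stored as its three parts (domain, codomain, and graph as a set of pairs). The helper names `arrow_counts`, `properties`, and `image`, and the sample relation itself, are hypothetical and chosen only for illustration; they are not taken from the text.

```python
def arrow_counts(domain, codomain, graph):
    """Count arrows out of each domain point and into each codomain point."""
    out_count = {a: 0 for a in domain}
    in_count = {b: 0 for b in codomain}
    for a, b in graph:  # each pair (a, b) in the graph is one arrow
        out_count[a] += 1
        in_count[b] += 1
    return out_count, in_count


def properties(domain, codomain, graph):
    """Evaluate the four arrow properties of Definition 4.4.2."""
    out_count, in_count = arrow_counts(domain, codomain, graph)
    return {
        "function": all(n <= 1 for n in out_count.values()),    # [<= 1 out]
        "total": all(n >= 1 for n in out_count.values()),       # [>= 1 out]
        "surjective": all(n >= 1 for n in in_count.values()),   # [>= 1 in]
        "injective": all(n <= 1 for n in in_count.values()),    # [<= 1 in]
    }


def image(graph, subset):
    """Relational image: everything some element of `subset` points to."""
    return {b for a, b in graph if a in subset}


# Hypothetical sample relation: a total surjective function that is not injective.
A = {"a", "b", "c", "d", "e"}
B = {1, 2, 3, 4}
G = {("a", 1), ("b", 2), ("c", 3), ("d", 3), ("e", 4)}

props = properties(A, B, G)
assert props["function"] and props["surjective"]
# For a surjective function: |A| >= #arrows >= |B|.
assert len(A) >= len(G) >= len(B)
assert image(G, {"c", "d"}) == {3}
```

Note that `properties` needs the domain and codomain as well as the graph: dropping a point with no arrows from either column would change the answer for “total” or “surjective” without changing the arrows.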
The following definition and lemma list this statement and two similar rules relating domain and codomain size to relational properties.

Definition 4.5.2. Let $A, B$ be (not necessarily finite) sets. Then

1. $A \text{ surj } B$ iff there is a surjective function from $A$ to $B$.
2. $A \text{ inj } B$ iff there is an injective total relation from $A$ to $B$.
3. $A \text{ bij } B$ iff there is a bijection from $A$ to $B$.

Lemma 4.5.3. For finite sets $A, B$:

1. If $A \text{ surj } B$, then $|A| \ge |B|$.
2. If $A \text{ inj } B$, then $|A| \le |B|$.
3. If $A \text{ bij } B$, then $|A| = |B|$.

Proof. We’ve already given an “arrow” proof of implication 1. Implication 2 follows immediately from the fact that if $R$ has the $[\le 1 \text{ out}]$, function property, and the $[\ge 1 \text{ in}]$, surjective property, then $R^{-1}$ is total and injective, so $A \text{ surj } B$ iff $B \text{ inj } A$. Finally, since a bijection is both a surjective function and a total injective relation, implication 3 is an immediate consequence of the first two.

Lemma 4.5.3.1 has a converse: if the size of a finite set $A$ is greater than or equal to the size of another finite set $B$ then it’s always possible to define a surjective function from $A$ to $B$. In fact, the surjection can be a total function. To see how this works, suppose for example that

$$A = \{a_0, a_1, a_2, a_3, a_4, a_5\}, \qquad B = \{b_0, b_1, b_2, b_3\}.$$

Then define a total function $f : A \to B$ by the rules

$$f(a_0) ::= b_0, \quad f(a_1) ::= b_1, \quad f(a_2) ::= b_2, \quad f(a_3) = f(a_4) = f(a_5) ::= b_3.$$

More concisely,

$$f(a_i) ::= b_{\min(i, 3)},$$

for $0 \le i \le 5$. Since $5 \ge 3$, this $f$ is a surjection.

So we have figured out that if $A$ and $B$ are finite sets, then $|A| \ge |B|$ if and only if $A \text{ surj } B$. All told, this argument wraps up the proof of a theorem that summarizes the whole finite cardinality story:

Theorem 4.5.4. [Mapping Rules] For finite sets $A, B$,

$$|A| \ge |B| \quad\text{iff}\quad A \text{ surj } B, \tag{4.5}$$
$$|A| \le |B| \quad\text{iff}\quad A \text{ inj } B, \tag{4.6}$$
$$|A| = |B| \quad\text{iff}\quad A \text{ bij } B. \tag{4.7}$$

4.5.1 How Many Subsets of a Finite Set?
As an application of the bijection mapping rule (4.7), we can give an easy proof of:

Theorem 4.5.5. There are $2^n$ subsets of an $n$-element set. That is,

$$|A| = n \quad\text{implies}\quad |\text{pow}(A)| = 2^n.$$

For example, the three-element set $\{a_1, a_2, a_3\}$ has eight different subsets:

$$\emptyset \quad \{a_1\} \quad \{a_2\} \quad \{a_1, a_2\} \quad \{a_3\} \quad \{a_1, a_3\} \quad \{a_2, a_3\} \quad \{a_1, a_2, a_3\}$$

Theorem 4.5.5 follows from the fact that there is a simple bijection from subsets of $A$ to $\{0, 1\}^n$, the $n$-bit sequences. Namely, let $a_1, a_2, \ldots, a_n$ be the elements of $A$. The bijection maps each subset $S \subseteq A$ to the bit sequence $(b_1, \ldots, b_n)$ defined by the rule that

$$b_i = 1 \quad\text{iff}\quad a_i \in S.$$

For example, if $n = 10$, then the subset $\{a_2, a_3, a_5, a_7, a_{10}\}$ maps to a 10-bit sequence as follows:

subset:   {    a2, a3,    a5,    a7,        a10 }
sequence: ( 0, 1,  1,  0, 1,  0, 1,  0,  0, 1   )

Now by the bijection case of the Mapping Rules 4.5.4.(4.7),

$$|\text{pow}(A)| = |\{0, 1\}^n|.$$

But every computer scientist knows[6] that there are $2^n$ $n$-bit sequences! So we’ve proved Theorem 4.5.5!

[6] In case you’re someone who doesn’t know how many $n$-bit sequences there are, you’ll find the $2^n$ explained in Section 15.2.2.

Problems for Section 4.1

Practice Problems

Problem 4.1. For any set $A$, let $\text{pow}(A)$ be its power set, the set of all its subsets; note that $A$ is itself a member of $\text{pow}(A)$. Let $\emptyset$ denote the empty set.

(a) The elements of $\text{pow}(\{1, 2\})$ are:
(b) The elements of $\text{pow}(\{\emptyset, \{\emptyset\}\})$ are:
(c) How many elements are there in $\text{pow}(\{1, 2, \ldots, 8\})$?

Problem 4.2. Express each of the following assertions about sets by a formula of set theory.[7] Expressions may use abbreviations introduced earlier (so it is now legal to use “$=$” because we just defined it).

(a) $x = \emptyset$.
(b) $x = \{y, z\}$.
(c) $x \subseteq y$. ($x$ is a subset of $y$ that might equal $y$.)

Now we can explain how to express “$x$ is a proper subset of $y$” as a set theory formula using things we already know how to express.
Namely, letting “$x \ne y$” abbreviate $\text{NOT}(x = y)$, the expression

$$(x \subseteq y \text{ AND } x \ne y)$$

describes a formula of set theory that means $x \subset y$. From here on, feel free to use any previously expressed property in describing formulas for the following:

(d) $x = y \cup z$.
(e) $x = y - z$.
(f) $x = \text{pow}(y)$.
(g) $x = \bigcup_{z \in y} z$. This means that $y$ is supposed to be a collection of sets, and $x$ is the union of all of them. A more concise notation for “$\bigcup_{z \in y} z$” is simply “$\bigcup y$.”

[7] See Section 8.3.2.

Class Problems

Problem 4.3. Set Formulas and Propositional Formulas.

(a) Verify that the propositional formula $(P \text{ AND } \overline{Q}) \text{ OR } (P \text{ AND } Q)$ is equivalent to $P$.
(b) Prove that

$$A = (A - B) \cup (A \cap B)$$

for all sets, $A, B$, by showing

$$x \in A \quad\text{IFF}\quad x \in (A - B) \cup (A \cap B)$$

for all elements $x$ using the equivalence of part (a) in a chain of IFF’s.

Problem 4.4. Prove

Theorem (Distributivity of union over intersection).

$$A \cup (B \cap C) = (A \cup B) \cap (A \cup C) \tag{4.8}$$

for all sets, $A, B, C$, by using a chain of iff’s to show that

$$x \in A \cup (B \cap C) \quad\text{IFF}\quad x \in (A \cup B) \cap (A \cup C)$$

for all elements $x$. You may assume the corresponding propositional equivalence 3.10.

Problem 4.5. Prove De Morgan’s Law for set equality

$$\overline{A \cap B} = \overline{A} \cup \overline{B} \tag{4.9}$$

by showing with a chain of IFF’s that $x \in$ the left-hand side of (4.9) iff $x \in$ the right-hand side. You may assume the propositional version (3.14) of De Morgan’s Law.

Problem 4.6. Powerset Properties. Let $A$ and $B$ be sets.

(a) Prove that

$$\text{pow}(A \cap B) = \text{pow}(A) \cap \text{pow}(B).$$

(b) Prove that

$$(\text{pow}(A) \cup \text{pow}(B)) \subseteq \text{pow}(A \cup B),$$

with equality holding iff one of $A$ or $B$ is a subset of the other.

Problem 4.7. Subset take-away[8] is a two player game played with a finite set $A$ of numbers. Players alternately choose nonempty subsets of $A$ with the conditions that a player may not choose the whole set $A$, or any set containing a set that was named earlier.
The first player who is unable to move loses the game.

For example, if the size of $A$ is one, then there are no legal moves and the second player wins. If $A$ has exactly two elements, then the only legal moves are the two one-element subsets of $A$. Each is a good reply to the other, and so once again the second player wins.

The first interesting case is when $A$ has three elements. This time, if the first player picks a subset with one element, the second player picks the subset with the other two elements. If the first player picks a subset with two elements, the second player picks the subset whose sole member is the third element. In both cases, these moves lead to a situation that is the same as the start of a game on a set with two elements, and thus leads to a win for the second player.

Verify that when $A$ has four elements, the second player still has a winning strategy.[9]

[8] From Christenson & Tilford, David Gale’s Subset Takeaway Game, American Mathematical Monthly, Oct. 1997.

[9] David Gale worked out some of the properties of this game and conjectured that the second player wins the game for any set $A$. This remains an open problem.

Homework Problems

Problem 4.8. Let $A$, $B$ and $C$ be sets. Prove that

$$A \cup B \cup C = (A - B) \cup (B - C) \cup (C - A) \cup (A \cap B \cap C) \tag{4.10}$$

using a chain of IFF’s as in Section 4.1.5.

Problem 4.9. Union distributes over the intersection of two sets:

$$A \cup (B \cap C) = (A \cup B) \cap (A \cup C) \tag{4.11}$$

(see Problem 4.4). Use (4.11) and the Well Ordering Principle to prove the Distributive Law of union over the intersection of $n$ sets:

$$A \cup (B_1 \cap \cdots \cap B_{n-1} \cap B_n) = (A \cup B_1) \cap \cdots \cap (A \cup B_{n-1}) \cap (A \cup B_n) \tag{4.12}$$

Extending formulas to an arbitrary number of terms is a common (if mundane) application of the WOP.

Exam Problems

Problem 4.10. You’ve seen how certain set identities follow from corresponding propositional equivalences.
For example, you proved by a chain of iff’s that

$$(A - B) \cup (A \cap B) = A$$

using the fact that the propositional formula $(P \text{ AND } \overline{Q}) \text{ OR } (P \text{ AND } Q)$ is equivalent to $P$.

State a similar propositional equivalence that would justify the key step in a proof for the following set equality organized as a chain of iff’s:

$$\overline{A - B} = \left(\overline{A} - \overline{C}\right) \cup (B \cap C) \cup \left(\left(\overline{A} \cup B\right) \cap \overline{C}\right) \tag{4.13}$$

(You are not being asked to write out an iff-proof of the equality or to write out a proof of the propositional equivalence. Just state the equivalence.)

Problem 4.11. You’ve seen how certain set identities follow from corresponding propositional equivalences. For example, you proved by a chain of iff’s that

$$(A - B) \cup (A \cap B) = A$$

using the fact that the propositional formula $(P \text{ AND } \overline{Q}) \text{ OR } (P \text{ AND } Q)$ is equivalent to $P$.

State a similar propositional equivalence that would justify the key step in a proof for the following set equality organized as a chain of iff’s:

$$\overline{A \cap B \cap C} = \overline{A} \cup \left(\overline{B} - \overline{A}\right) \cup \overline{C}.$$

(You are not being asked to write out an iff-proof of the equality or to write out a proof of the propositional equivalence. Just state the equivalence.)

Problem 4.12. The set equation

$$\overline{A \cap B} = \overline{A} \cup \overline{B}$$

follows from a certain equivalence between propositional formulas.

(a) What is the equivalence?
(b) Show how to derive the equation from this equivalence.

Problems for Section 4.2

Homework Problems

Problem 4.13. Prove that for any sets $A$, $B$, $C$ and $D$, if the Cartesian products $A \times B$ and $C \times D$ are disjoint, then either $A$ and $C$ are disjoint or $B$ and $D$ are disjoint.

Problem 4.14.

(a) Give a simple example where the following result fails, and briefly explain why:

False Theorem. For sets $A$, $B$, $C$ and $D$, let

$$L ::= (A \cup B) \times (C \cup D),$$
$$R ::= (A \times C) \cup (B \times D).$$

Then $L = R$.

(b) Identify the mistake in the following proof of the False Theorem.

Bogus proof.
Since $L$ and $R$ are both sets of pairs, it’s sufficient to prove that $(x, y) \in L \longleftrightarrow (x, y) \in R$ for all $x, y$.

The proof will be a chain of iff implications:

$(x, y) \in R$
iff $(x, y) \in (A \times C) \cup (B \times D)$
iff $(x, y) \in A \times C$, or $(x, y) \in B \times D$
iff ($x \in A$ and $y \in C$) or else ($x \in B$ and $y \in D$)
iff either $x \in A$ or $x \in B$, and either $y \in C$ or $y \in D$
iff $x \in A \cup B$ and $y \in C \cup D$
iff $(x, y) \in L$.

(c) Fix the proof to show that $R \subseteq L$.

Problems for Section 4.4

Practice Problems

Problem 4.15. The inverse $R^{-1}$ of a binary relation $R$ from $A$ to $B$ is the relation from $B$ to $A$ defined by:

$$b \mathrel{R^{-1}} a \quad\text{iff}\quad a \mathrel{R} b.$$

In other words, you get the diagram for $R^{-1}$ from $R$ by “reversing the arrows” in the diagram describing $R$. Now many of the relational properties of $R$ correspond to different properties of $R^{-1}$. For example, $R$ is total iff $R^{-1}$ is a surjection.

Fill in the remaining entries in this table:

R is            iff   R^{-1} is
total                 a surjection
a function
a surjection
an injection
a bijection

Hint: Explain what’s going on in terms of “arrows” from $A$ to $B$ in the diagram for $R$.

Problem 4.16. Describe a total injective function $[= 1 \text{ out}]$, $[\le 1 \text{ in}]$, from $\mathbb{R} \to \mathbb{R}$ that is not a bijection.

Problem 4.17. For a binary relation $R : A \to B$, some properties of $R$ can be determined from just the arrows of $R$, that is, from graph($R$), and others require knowing if there are elements in the domain $A$ or the codomain $B$ that don’t show up in graph($R$). For each of the following possible properties of $R$, indicate whether it is always determined by

1. graph($R$) alone,
2. graph($R$) and $A$ alone,
3. graph($R$) and $B$ alone,
4. all three parts of $R$.

Properties:

(a) surjective
(b) injective
(c) total
(d) function
(e) bijection

Problem 4.18. For each of the following real-valued functions on the real numbers, indicate whether it is a bijection, a surjection but not a bijection, an injection but not a bijection, or neither an injection nor a surjection.

(a) $x \to x + 2$
(b) $x \to 2x$
(c) $x \to x^2$
(d) $x \to x^3$
(e) $x \to \sin x$
(f) $x \to x \sin x$
(g) $x \to e^x$

Problem 4.19. Let $f : A \to B$ and $g : B \to C$ be functions and $h : A \to C$ be their composition, namely, $h(a) ::= g(f(a))$ for all $a \in A$.

(a) Prove that if $f$ and $g$ are surjections, then so is $h$.
(b) Prove that if $f$ and $g$ are bijections, then so is $h$.
(c) If $f$ is a bijection, then so is $f^{-1}$.

Problem 4.20. Give an example of a relation $R$ that is a total injective function from a set $A$ to itself but is not a bijection.

Problem 4.21. Let $R : A \to B$ be a binary relation. Each of the following formulas expresses the fact that $R$ has a familiar relational “arrow” property such as being surjective or being a function. Identify the relational property expressed by each of the following relational expressions. Explain your reasoning.

(a) $R \circ R^{-1} \subseteq \text{Id}_B$
(b) $R^{-1} \circ R \subseteq \text{Id}_A$
(c) $R^{-1} \circ R \supseteq \text{Id}_A$
(d) $R \circ R^{-1} \supseteq \text{Id}_B$

Class Problems

Problem 4.22.

(a) Prove that if $A \text{ surj } B$ and $B \text{ surj } C$, then $A \text{ surj } C$.
(b) Explain why $A \text{ surj } B$ iff $B \text{ inj } A$.
(c) Conclude from (a) and (b) that if $A \text{ inj } B$ and $B \text{ inj } C$, then $A \text{ inj } C$.
(d) Explain why $A \text{ inj } B$ iff there is a total injective function ($[= 1 \text{ out}, \le 1 \text{ in}]$) from $A$ to $B$.[10]

Problem 4.23. Five basic properties of binary relations $R : A \to B$ are:

1. $R$ is a surjection $[\ge 1 \text{ in}]$
2. $R$ is an injection $[\le 1 \text{ in}]$
3. $R$ is a function $[\le 1 \text{ out}]$
4. $R$ is total $[\ge 1 \text{ out}]$
5. $R$ is empty $[= 0 \text{ out}]$

Below are some assertions about a relation $R$. For each assertion, write the numbers of all the properties above that the relation $R$ must have; write “none” if $R$ might not have any of these properties. For example, you should write “(1), (4)” next to the first assertion.

Variables $a, a_1, \ldots$ range over $A$ and $b, b_1, \ldots$ range over $B$.

(a) $\forall a\, \forall b.\ a \mathrel{R} b$. (1), (4)
(b) $\text{NOT}(\forall a\, \forall b.\ a \mathrel{R} b)$.
(c) $\forall a\, \forall b.\ \text{NOT}(a \mathrel{R} b)$.
(d) $\forall a\, \exists b.\ a \mathrel{R} b$.
(e) $\forall b\, \exists a.\ a \mathrel{R} b$.
(f) $R$ is a bijection.
(g) $\forall a\, \exists b_1.\ (a \mathrel{R} b_1) \land \forall b.\ (a \mathrel{R} b \text{ IMPLIES } b = b_1)$.
(h) $\forall a, b.\ a \mathrel{R} b \text{ OR } a \ne b$.
(i) $\forall b_1, b_2, a.\ (a \mathrel{R} b_1 \text{ AND } a \mathrel{R} b_2) \text{ IMPLIES } b_1 = b_2$.
(j) $\forall a_1, a_2, b.\ (a_1 \mathrel{R} b \text{ AND } a_2 \mathrel{R} b) \text{ IMPLIES } a_1 = a_2$.
(k) $\forall a_1, a_2, b_1, b_2.\ (a_1 \mathrel{R} b_1 \text{ AND } a_2 \mathrel{R} b_2 \text{ AND } a_1 \ne a_2) \text{ IMPLIES } b_1 \ne b_2$.
(l) $\forall a_1, a_2, b_1, b_2.\ (a_1 \mathrel{R} b_1 \text{ AND } a_2 \mathrel{R} b_2 \text{ AND } b_1 \ne b_2) \text{ IMPLIES } a_1 \ne a_2$.

[10] The official definition of inj is with a total injective relation ($[\ge 1 \text{ out}, \le 1 \text{ in}]$).

Homework Problems

Problem 4.24. Let $f : A \to B$ and $g : B \to C$ be functions.

(a) Prove that if the composition $g \circ f$ is a bijection, then $f$ is a total injection and $g$ is a surjection.
(b) Show there is a total injection $f$ and a bijection, $g$, such that $g \circ f$ is not a bijection.

Problem 4.25. Let $A$, $B$ and $C$ be nonempty sets, and let $f : B \to C$ and $g : A \to B$ be functions. Let $h ::= f \circ g$ be the composition function of $f$ and $g$, namely, the function with domain $A$ and codomain $C$ such that $h(x) = f(g(x))$.

(a) Prove that if $h$ is surjective and $f$ is total and injective, then $g$ must be surjective.
Hint: contradiction.
(b) Suppose that $h$ is injective and $f$ is total. Prove that $g$ must be injective and provide a counterexample showing how this claim could fail if $f$ was not total.

Problem 4.26. Let $A$, $B$ and $C$ be sets, and let $f : B \to C$ and $g : A \to B$ be functions. Let $h : A \to C$ be the composition $f \circ g$; that is, $h(x) ::= f(g(x))$ for $x \in A$. Prove or disprove the following claims:

(a) If $h$ is surjective, then $f$ must be surjective.
(b) If $h$ is surjective, then $g$ must be surjective.
(c) If $h$ is injective, then $f$ must be injective.
(d) If $h$ is injective and $f$ is total, then $g$ must be injective.

Problem 4.27. Let $R$ be a binary relation on a set $D$. Let $x, y$ be variables ranging over $D$. Indicate the expressions below whose meaning is that $R$ is an injection $[\le 1 \text{ in}]$. Remember that $R$ is not necessarily total or a function.
1. $R(x) = R(y) \text{ IMPLIES } x = y$
2. $R(x) \cap R(y) = \emptyset \text{ IMPLIES } x \ne y$
3. $R(x) \cap R(y) \ne \emptyset \text{ IMPLIES } x \ne y$
4. $R(x) \cap R(y) \ne \emptyset \text{ IMPLIES } x = y$
5. $R^{-1}(R(x)) = \{x\}$
6. $R^{-1}(R(x)) \subseteq \{x\}$
7. $R^{-1}(R(x)) \supseteq \{x\}$
8. $R(R^{-1}(x)) = x$

Problem 4.28. The language of sets and relations may seem remote from the practical world of programming, but in fact there is a close connection to relational databases, a very popular software application building block implemented by such software packages as MySQL. This problem explores the connection by considering how to manipulate and analyze a large data set using operators over sets and relations. Systems like MySQL are able to execute very similar high-level instructions efficiently on standard computer hardware, which helps programmers focus on high-level design.

Consider a basic Web search engine, which stores information on Web pages and processes queries to find pages satisfying conditions provided by users. At a high level, we can formalize the key information as:

- A set $P$ of pages that the search engine knows about
- A binary relation $L$ (for link) over pages, defined such that $p_1 \mathrel{L} p_2$ iff page $p_1$ links to $p_2$
- A set $E$ of endorsers, people who have recorded their opinions about which pages are high-quality
- A binary relation $R$ (for recommends) between endorsers and pages, such that $e \mathrel{R} p$ iff person $e$ has recommended page $p$
- A set $W$ of words that may appear on pages
- A binary relation $M$ (for mentions) between pages and words, where $p \mathrel{M} w$ iff word $w$ appears on page $p$

Each part of this problem describes an intuitive, informal query over the data, and your job is to produce a single expression using the standard set and relation operators, such that the expression can be interpreted as answering the query correctly, for any data set.
Your answers should use only the set and relation symbols given above, in addition to terms standing for constant elements of $E$ or $W$, plus the following operators introduced in the text:

- set union $\cup$.
- set intersection $\cap$.
- set difference $-$.
- relational image, for example, $R(A)$ for some set $A$, or $R(a)$ for some specific element $a$.
- relational inverse ${}^{-1}$.
- . . . and one extra: relational composition, which generalizes composition of functions:

$$a \mathrel{(R \circ S)} c ::= \exists b \in B.\ (a \mathrel{S} b) \text{ AND } (b \mathrel{R} c).$$

In other words, $a$ is related to $c$ in $R \circ S$ if starting at $a$ you can follow an $S$ arrow to the start of an $R$ arrow and then follow the $R$ arrow to get to $c$.[11]

[11] Note the reversal of $R$ and $S$ in the definition; this is to make relational composition work like function composition. For functions, $f \circ g$ means you apply $g$ first. That is, if we let $h$ be $f \circ g$, then $h(x) = f(g(x))$.

Here is one worked example to get you started:

Search description: The set of pages containing the word “logic”
Solution expression: $M^{-1}(\text{“logic”})$

Find similar solutions for each of the following searches:

(a) The set of pages containing the word “logic” but not the word “predicate”
(b) The set of pages containing the word “set” that have been recommended by “Meyer”
(c) The set of endorsers who have recommended pages containing the word “algebra”
(d) The relation that relates endorser $e$ and word $w$ iff $e$ has recommended a page containing $w$
(e) The set of pages that have at least one incoming or outgoing link
(f) The relation that relates word $w$ and page $p$ iff $w$ appears on a page that links to $p$
(g) The relation that relates word $w$ and endorser $e$ iff $w$ appears on a page that links to a page that $e$ recommends
(h) The relation that relates pages $p_1$ and $p_2$ iff $p_2$ can be reached from $p_1$ by following a sequence of exactly 3 links

Exam Problems

Problem 4.29.
Let $A$ be the set containing the five sets: $\{a\}, \{b, c\}, \{b, d\}, \{a, e\}, \{e, f\}$, and let $B$ be the set containing the three sets: $\{a, b\}, \{b, c, d\}, \{e, f\}$. Let $R$ be the “is subset of” binary relation from $A$ to $B$ defined by the rule:

$$X \mathrel{R} Y \quad\text{IFF}\quad X \subseteq Y.$$

(a) Fill in the arrows so the following figure describes the graph of the relation, $R$:

A          arrows          B
{a}
                           {a, b}
{b, c}
                           {b, c, d}
{b, d}
                           {e, f}
{a, e}

{e, f}

(b) Circle the properties below possessed by the relation $R$:

function   total   injective   surjective   bijective

(c) Circle the properties below possessed by the relation $R^{-1}$:

function   total   injective   surjective   bijective

Problem 4.30.

(a) Five assertions about a binary relation $R : A \to B$ are bulleted below. There are nine predicate formulas that express some of these assertions. Write the numbers of the formulas next to the assertions they express. For example, you should write “4” next to the last assertion, since formula (4) expresses the assertion that $R$ is the identity relation.

Variables $a, a_1, \ldots$ range over the domain $A$ and $b, b_1, \ldots$ range over the codomain $B$. More than one formula may express one assertion.

- $R$ is a surjection
- $R$ is an injection
- $R$ is a function
- $R$ is total
- $R$ is the identity relation.

1. $\forall b.\ \exists a.\ a \mathrel{R} b$.
2. $\forall a.\ \exists b.\ a \mathrel{R} b$.
3. $\forall a.\ a \mathrel{R} a$.
4. $\forall a, b.\ a \mathrel{R} b \text{ IFF } a = b$.
5. $\forall a, b.\ a \mathrel{R} b \text{ OR } a \ne b$.
6. $\forall b_1, b_2, a.\ (a \mathrel{R} b_1 \text{ AND } a \mathrel{R} b_2) \text{ IMPLIES } b_1 = b_2$.
7. $\forall a_1, a_2, b.\ (a_1 \mathrel{R} b \text{ AND } a_2 \mathrel{R} b) \text{ IMPLIES } a_1 = a_2$.
8. $\forall a_1, a_2, b_1, b_2.\ (a_1 \mathrel{R} b_1 \text{ AND } a_2 \mathrel{R} b_2 \text{ AND } a_1 \ne a_2) \text{ IMPLIES } b_1 \ne b_2$.
9. $\forall a_1, a_2, b_1, b_2.\ (a_1 \mathrel{R} b_1 \text{ AND } a_2 \mathrel{R} b_2 \text{ AND } b_1 \ne b_2) \text{ IMPLIES } a_1 \ne a_2$.

(b) Give an example of a relation $R$ that satisfies three of the properties surjection, injection, total, and function (you indicate which) but is not a bijection.

Problem 4.31. Prove that if relation $R : A \to B$ is a total injection, $[\ge 1 \text{ out}]$, $[\le 1 \text{ in}]$, then

$$R^{-1} \circ R = \text{Id}_A,$$

where $\text{Id}_A$ is the identity function on $A$. (A simple argument in terms of “arrows” will do the job.)

Problem 4.32. Let $R : A \to B$ be a binary relation.

(a) Prove that $R$ is a function iff $R \circ R^{-1} \subseteq \text{Id}_B$.

Write similar containment formulas involving $R^{-1} \circ R$, $R \circ R^{-1}$, $\text{Id}_A$, $\text{Id}_B$ equivalent to the assertion that $R$ has each of the following properties. No proof is required.

(b) total.
(c) a surjection.
(d) an injection.

Problem 4.33. Let $R : A \to B$ and $S : B \to C$ be binary relations such that $S \circ R$ is a bijection and $|A| = 2$. Give an example of such $R, S$ where neither $R$ nor $S$ is a function. Indicate exactly which properties (total, surjection, function, and injection) your examples of $R$ and $S$ have.

Hint: Let $|B| = 4$.

Problem 4.34. The set $\{1, 2, 3\}^\omega$ consists of the infinite sequences of the digits 1, 2, and 3, and likewise $\{4, 5\}^\omega$ is the set of infinite sequences of the digits 4, 5. For example

$$123123123\ldots \in \{1, 2, 3\}^\omega,$$
$$222222222222\ldots \in \{1, 2, 3\}^\omega,$$
$$4554445554444\ldots \in \{4, 5\}^\omega.$$

(a) Give an example of a total injective function

$$f : \{1, 2, 3\}^\omega \to \{4, 5\}^\omega.$$

(b) Give an example of a bijection $g : (\{1, 2, 3\}^\omega \times \{1, 2, 3\}^\omega) \to \{1, 2, 3\}^\omega$.
(c) Explain why there is a bijection between $\{1, 2, 3\}^\omega \times \{1, 2, 3\}^\omega$ and $\{4, 5\}^\omega$. (You need not explicitly define the bijection.)

Problems for Section 4.5

Practice Problems

Problem 4.35. Assume $f : A \to B$ is a total function, and $A$ is finite. Replace the $\star$ with one of $\le, =, \ge$ to produce the strongest correct version of the following statements:

(a) $|f(A)| \star |B|$.
(b) If $f$ is a surjection, then $|A| \star |B|$.
(c) If $f$ is a surjection, then $|f(A)| \star |B|$.
(d) If $f$ is an injection, then $|f(A)| \star |A|$.
(e) If $f$ is a bijection, then $|A| \star |B|$.

Class Problems

Problem 4.36.
Let $A = \{a_0, a_1, \ldots, a_{n-1}\}$ be a set of size $n$, and $B = \{b_0, b_1, \ldots, b_{m-1}\}$ a set of size $m$. Prove that $|A \times B| = mn$ by defining a simple bijection from $A \times B$ to the nonnegative integers from 0 to $mn - 1$.

Problem 4.37. Let $R : A \to B$ be a binary relation. Use an arrow counting argument to prove the following generalization of the Mapping Rule 1.

Lemma. If $R$ is a function, and $X \subseteq A$, then

$$|X| \ge |R(X)|.$$

5 Induction

Induction is a powerful method for showing a property is true for all nonnegative integers. Induction plays a central role in discrete mathematics and computer science. In fact, its use is a defining characteristic of discrete (as opposed to continuous) mathematics. This chapter introduces two versions of induction, Ordinary and Strong, and explains why they work and how to use them in proofs. It also introduces the Invariant Principle, which is a version of induction specially adapted for reasoning about step-by-step processes.

5.1 Ordinary Induction

To understand how induction works, suppose there is a professor who brings a bottomless bag of assorted miniature candy bars to her large class. She offers to share the candy in the following way. First, she lines the students up in order. Next she states two rules:

1. The student at the beginning of the line gets a candy bar.
2. If a student gets a candy bar, then the following student in line also gets a candy bar.

Let’s number the students by their order in line, starting the count with 0, as usual in computer science. Now we can understand the second rule as a short description of a whole sequence of statements:

- If student 0 gets a candy bar, then student 1 also gets one.
- If student 1 gets a candy bar, then student 2 also gets one.
- If student 2 gets a candy bar, then student 3 also gets one.
$\vdots$

Of course, this sequence has a more concise mathematical description:

If student $n$ gets a candy bar, then student $n + 1$ gets a candy bar, for all nonnegative integers $n$.

So suppose you are student 17. By these rules, are you entitled to a miniature candy bar? Well, student 0 gets a candy bar by the first rule. Therefore, by the second rule, student 1 also gets one, which means student 2 gets one, which means student 3 gets one as well, and so on. By 17 applications of the professor's second rule, you get your candy bar! Of course the rules really guarantee a candy bar to every student, no matter how far back in line they may be.

5.1.1 A Rule for Ordinary Induction

The reasoning that led us to conclude that every student gets a candy bar is essentially all there is to induction.

The Induction Principle.
Let $P$ be a predicate on nonnegative integers. If

- $P(0)$ is true, and
- $P(n)$ IMPLIES $P(n + 1)$ for all nonnegative integers $n$,

then

- $P(m)$ is true for all nonnegative integers $m$.

Since we're going to consider several useful variants of induction in later sections, we'll refer to the induction method described above as ordinary induction when we need to distinguish it. Formulated as a proof rule as in Section 1.4.1, this would be

Rule. Induction Rule

$$\frac{P(0), \quad \forall n \in \mathbb{N}.\ P(n) \text{ IMPLIES } P(n + 1)}{\forall m \in \mathbb{N}.\ P(m)}$$

This Induction Rule works for the same intuitive reason that all the students get candy bars, and we hope the explanation using candy bars makes it clear why the soundness of ordinary induction can be taken for granted. In fact, the rule is so obvious that it's hard to see what more basic principle could be used to justify it. [footnote 1] What's not so obvious is how much mileage we get by using it.

Footnote 1: But see Section 5.3.

5.1.2 A Familiar Example

Below is the formula (5.1) for the sum of the nonnegative integers up to $n$.
The formula holds for all nonnegative integers, so it is the kind of statement to which induction applies directly. We've already proved this formula using the Well Ordering Principle (Theorem 2.2.1), but now we'll prove it by induction, that is, using the Induction Principle.

Theorem 5.1.1. For all $n \in \mathbb{N}$,

$$1 + 2 + 3 + \cdots + n = \frac{n(n + 1)}{2} \tag{5.1}$$

To prove the theorem by induction, define predicate $P(n)$ to be the equation (5.1). Now the theorem can be restated as the claim that $P(n)$ is true for all $n \in \mathbb{N}$. This is great, because the Induction Principle lets us reach precisely that conclusion, provided we establish two simpler facts:

- $P(0)$ is true.
- For all $n \in \mathbb{N}$, $P(n)$ IMPLIES $P(n + 1)$.

So now our job is reduced to proving these two statements.

The first statement follows because of the convention that a sum of zero terms is equal to 0. So $P(0)$ is the true assertion that a sum of zero terms is equal to $0(0 + 1)/2 = 0$.

The second statement is more complicated. But remember the basic plan from Section 1.5 for proving the validity of any implication: assume the statement on the left and then prove the statement on the right. In this case, we assume $P(n)$, namely equation (5.1), in order to prove $P(n + 1)$, which is the equation

$$1 + 2 + 3 + \cdots + n + (n + 1) = \frac{(n + 1)(n + 2)}{2}. \tag{5.2}$$

These two equations are quite similar; in fact, adding $(n + 1)$ to both sides of equation (5.1) and simplifying the right side gives the equation (5.2):

$$1 + 2 + 3 + \cdots + n + (n + 1) = \frac{n(n + 1)}{2} + (n + 1) = \frac{(n + 2)(n + 1)}{2}$$

Thus, if $P(n)$ is true, then so is $P(n + 1)$. This argument is valid for every nonnegative integer $n$, so this establishes the second fact required by the induction proof. Therefore, the Induction Principle says that the predicate $P(m)$ is true for all nonnegative integers, $m$. The theorem is proved.
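As a numerical sanity check on the argument above (a minimal sketch, not a substitute for the proof), the following Python snippet compares the term-by-term sum with the closed form in (5.1) and tests the algebra of the inductive step for a range of $n$. The function names `direct_sum`, `closed_form`, and `inductive_step_holds` are illustrative choices and do not come from the text.

```python
def direct_sum(n: int) -> int:
    """Compute 1 + 2 + ... + n term by term; the empty sum (n == 0) is 0."""
    return sum(range(1, n + 1))


def closed_form(n: int) -> int:
    """Right side of equation (5.1): n(n + 1)/2, which is always an integer."""
    return n * (n + 1) // 2


def inductive_step_holds(n: int) -> bool:
    """Check the algebra used to go from P(n) to P(n + 1):
    n(n + 1)/2 + (n + 1) == (n + 1)(n + 2)/2."""
    return closed_form(n) + (n + 1) == closed_form(n + 1)


if __name__ == "__main__":
    # Base case P(0): both sides are zero.
    assert direct_sum(0) == closed_form(0) == 0
    # Spot-check P(n) and the inductive step for small n.
    for n in range(500):
        assert direct_sum(n) == closed_form(n)
        assert inductive_step_holds(n)
    print("equation (5.1) checked for n = 0..499")
```

A finite check like this can only catch mistakes; it is the induction argument that covers every nonnegative integer.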
5.1.3 A Template for Induction Proofs

The proof of equation (5.1) was relatively simple, but even the most complicated induction proof follows exactly the same template. There are five components:

1. State that the proof uses induction. This immediately conveys the overall structure of the proof, which helps your reader follow your argument.

2. Define an appropriate predicate $P(n)$. The predicate $P(n)$ is called the induction hypothesis. The eventual conclusion of the induction argument will be that $P(n)$ is true for all nonnegative $n$. A clearly stated induction hypothesis is often the most important part of an induction proof, and its omission is the largest source of confused proofs by students.

   In the simplest cases, the induction hypothesis can be lifted straight from the proposition you are trying to prove, as we did with equation (5.1). Sometimes the induction hypothesis will involve several variables, in which case you should indicate which variable serves as $n$.

3. Prove that $P(0)$ is true. This is usually easy, as in the example above. This part of the proof is called the base case or basis step.

4. Prove that $P(n)$ implies $P(n + 1)$ for every nonnegative integer $n$. This is called the inductive step. The basic plan is always the same: assume that $P(n)$ is true and then use this assumption to prove that $P(n + 1)$ is true. These two statements should be fairly similar, but bridging the gap may require some ingenuity. Whatever argument you give must be valid for every nonnegative integer $n$, since the goal is to prove that all the following implications are true:

   $$P(0) \to P(1),\ P(1) \to P(2),\ P(2) \to P(3),\ \ldots$$

5. Invoke induction. Given these facts, the induction principle allows you to conclude that $P(n)$ is true for all nonnegative $n$. This is the logical capstone to the whole argument, but it is so standard that it's usual not to mention it explicitly.
Always be sure to explicitly label the base case and the inductive step. Doing so will make your proofs clearer and will decrease the chance that you forget a key step, like checking the base case.

5.1.4 A Clean Writeup

The proof of Theorem 5.1.1 given above is perfectly valid; however, it contains a lot of extraneous explanation that you won't usually see in induction proofs. The writeup below is closer to what you might see in print and should be prepared to produce yourself.

Revised proof of Theorem 5.1.1. We use induction. The induction hypothesis $P(n)$ will be equation (5.1).

Base case: $P(0)$ is true, because both sides of equation (5.1) equal zero when $n = 0$.

Inductive step: Assume that $P(n)$ is true, that is, equation (5.1) holds for some nonnegative integer $n$. Then adding $n + 1$ to both sides of the equation implies that

$$1 + 2 + 3 + \cdots + n + (n + 1) = \frac{n(n + 1)}{2} + (n + 1) = \frac{(n + 1)(n + 2)}{2} \quad \text{(by simple algebra)}$$

which proves $P(n + 1)$.

So it follows by induction that $P(n)$ is true for all nonnegative $n$.

It probably bothers you that induction led to a proof of this summation formula but did not provide an intuitive way to understand it nor did it explain where the formula came from in the first place. [footnote 2] This is both a weakness and a strength. It is a weakness when a proof does not provide insight. But it is a strength that a proof can provide a reader with a reliable guarantee of correctness without requiring insight.

5.1.5 A More Challenging Example

During the development of MIT's famous Stata Center, as costs rose further and further beyond budget, some radical fundraising ideas were proposed. One rumored plan was to install a big square courtyard divided into unit squares.
The big square would be $2^n$ units on a side for some undetermined nonnegative integer $n$, and one of the unit squares in the center [footnote 3] would be occupied by a statue of a wealthy potential donor, whom the fund raisers privately referred to as "Bill." The $n = 3$ case is shown in Figure 5.1.

Footnote 2: Methods for finding such formulas are covered in Part III of the text.

Footnote 3: In the special case $n = 0$, the whole courtyard consists of a single central square; otherwise, there are four central squares.

Figure 5.1 A $2^n \times 2^n$ courtyard for $n = 3$.

Figure 5.2 The special L-shaped tile.

A complication was that the building's unconventional architect, Frank Gehry, was alleged to require that only special L-shaped tiles (shown in Figure 5.2) be used for the courtyard. For $n = 2$, a courtyard meeting these constraints is shown in Figure 5.3. But what about for larger values of $n$? Is there a way to tile a $2^n \times 2^n$ courtyard with L-shaped tiles around a statue in the center? Let's try to prove that this is so.

Figure 5.3 A tiling using L-shaped tiles for $n = 2$ with Bill in a center square.

Theorem 5.1.2. For all $n \ge 0$ there exists a tiling of a $2^n \times 2^n$ courtyard with Bill in a central square.

Proof. (doomed attempt) The proof is by induction. Let $P(n)$ be the proposition that there exists a tiling of a $2^n \times 2^n$ courtyard with Bill in the center.

Base case: $P(0)$ is true because Bill fills the whole courtyard.

Inductive step: Assume that there is a tiling of a $2^n \times 2^n$ courtyard with Bill in the center for some $n \ge 0$. We must prove that there is a way to tile a $2^{n+1} \times 2^{n+1}$ courtyard with Bill in the center . . . .

Now we're in trouble! The ability to tile a smaller courtyard with Bill in the center isn't much help in tiling a larger courtyard with Bill in the center. We haven't figured out how to bridge the gap between $P(n)$ and $P(n + 1)$.
So if we're going to prove Theorem 5.1.2 by induction, we're going to need some other induction hypothesis than simply the statement about $n$ that we're trying to prove.

When this happens, your first fallback should be to look for a stronger induction hypothesis; that is, one which implies your previous hypothesis. For example, we could make $P(n)$ the proposition that for every location of Bill in a $2^n \times 2^n$ courtyard, there exists a tiling of the remainder.

This advice may sound bizarre: "If you can't prove something, try to prove something grander!" But for induction arguments, this makes sense. In the inductive step, where you have to prove $P(n)$ IMPLIES $P(n + 1)$, you're in better shape because you can assume $P(n)$, which is now a more powerful statement. Let's see how this plays out in the case of courtyard tiling.

Proof (successful attempt). The proof is by induction. Let $P(n)$ be the proposition that for every location of Bill in a $2^n \times 2^n$ courtyard, there exists a tiling of the remainder.

Base case: $P(0)$ is true because Bill fills the whole courtyard.

Inductive step: Assume that $P(n)$ is true for some $n \ge 0$; that is, for every location of Bill in a $2^n \times 2^n$ courtyard, there exists a tiling of the remainder. Divide the $2^{n+1} \times 2^{n+1}$ courtyard into four quadrants, each $2^n \times 2^n$. One quadrant contains Bill (B in the diagram below). Place a temporary Bill (X in the diagram) in each of the three central squares lying outside this quadrant as shown in Figure 5.4.

Figure 5.4 Using a stronger inductive hypothesis to prove Theorem 5.1.2.

Now we can tile each of the four quadrants by the induction assumption. Replacing the three temporary Bills with a single L-shaped tile completes the job. This proves that $P(n)$ implies $P(n + 1)$ for all $n \ge 0$. Thus $P(m)$ is true for all $m \in \mathbb{N}$, and the theorem follows as a special case where we put Bill in a central square.

This proof has two nice properties.
First, not only does the argument guarantee that a tiling exists, but also it gives an algorithm for finding such a tiling. Second, we have a stronger result: if Bill wanted a statue on the edge of the courtyard, away from the pigeons, we could accommodate him!

Strengthening the induction hypothesis is often a good move when an induction proof won't go through. But keep in mind that the stronger assertion must actually be true; otherwise, there isn't much hope of constructing a valid proof. Sometimes finding just the right induction hypothesis requires trial, error, and insight. For example, mathematicians spent almost twenty years trying to prove or disprove the conjecture that every planar graph is 5-choosable. [footnote 4] Then, in 1994, Carsten Thomassen gave an induction proof simple enough to explain on a napkin. The key turned out to be finding an extremely clever induction hypothesis; with that in hand, completing the argument was easy!

Footnote 4: 5-choosability is a slight generalization of 5-colorability. Although every planar graph is 4-colorable and therefore 5-colorable, not every planar graph is 4-choosable. If this all sounds like nonsense, don't panic. We'll discuss graphs, planarity, and coloring in Part II of the text.

5.1.6 A Faulty Induction Proof

If we have done a good job in writing this text, right about now you should be thinking, "Hey, this induction stuff isn't so hard after all: just show $P(0)$ is true and that $P(n)$ implies $P(n + 1)$ for any number $n$." And, you would be right, although sometimes when you start doing induction proofs on your own, you can run into trouble. For example, we will now use induction to "prove" that all horses are the same color, just when you thought it was safe to skip class and work on your robot program instead. Sorry!

False Theorem. All horses are the same color.
Notice that no $n$ is mentioned in this assertion, so we're going to have to reformulate it in a way that makes an $n$ explicit. In particular, we'll (falsely) prove that

False Theorem 5.1.3. In every set of $n \ge 1$ horses, all the horses are the same color.

This is a statement about all integers $n \ge 1$ rather than $n \ge 0$, so it's natural to use a slight variation on induction: prove $P(1)$ in the base case and then prove that $P(n)$ implies $P(n + 1)$ for all $n \ge 1$ in the inductive step. This is a perfectly valid variant of induction and is not the problem with the proof below.

Bogus proof. The proof is by induction on $n$. The induction hypothesis $P(n)$ will be

In every set of $n$ horses, all are the same color. (5.3)

Base case: ($n = 1$). $P(1)$ is true, because in a size-1 set of horses, there's only one horse, and this horse is definitely the same color as itself.

Inductive step: Assume that $P(n)$ is true for some $n \ge 1$. That is, assume that in every set of $n$ horses, all are the same color. Now suppose we have a set of $n + 1$ horses:

$$h_1, h_2, \ldots, h_n, h_{n+1}.$$

We need to prove these $n + 1$ horses are all the same color.

By our assumption, the first $n$ horses are the same color:

$$\underbrace{h_1, h_2, \ldots, h_n}_{\text{same color}},\ h_{n+1}$$

Also by our assumption, the last $n$ horses are the same color:

$$h_1,\ \underbrace{h_2, \ldots, h_n, h_{n+1}}_{\text{same color}}$$

So $h_1$ is the same color as the remaining horses besides $h_{n+1}$, that is, $h_2, \ldots, h_n$. Likewise, $h_{n+1}$ is the same color as the remaining horses besides $h_1$, that is, $h_2, \ldots, h_n$, again. Since $h_1$ and $h_{n+1}$ are the same color as $h_2, \ldots, h_n$, all $n + 1$ horses must be the same color, and so $P(n + 1)$ is true. Thus, $P(n)$ implies $P(n + 1)$.

By the principle of induction, $P(n)$ is true for all $n \ge 1$.

We've proved something false! Does this mean that math is broken and we should all take up poetry instead? Of course not! It just means that this proof has a mistake.
The mistake in this argument is in the sentence that begins "So $h_1$ is the same color as the remaining horses besides $h_{n+1}$, that is, $h_2, \ldots, h_n$ . . . ." The ellipsis notation ("$\ldots$") in the expression "$h_1, h_2, \ldots, h_n, h_{n+1}$" creates the impression that there are some remaining horses, namely $h_2, \ldots, h_n$, besides $h_1$ and $h_{n+1}$. However, this is not true when $n = 1$. In that case, $h_1, h_2, \ldots, h_n, h_{n+1}$ is just $h_1, h_2$ and there are no "remaining" horses for $h_1$ to share a color with. And of course, in this case $h_1$ and $h_2$ really don't need to be the same color.

This mistake knocks a critical link out of our induction argument. We proved $P(1)$ and we correctly proved $P(2) \to P(3)$, $P(3) \to P(4)$, etc. But we failed to prove $P(1) \to P(2)$, and so everything falls apart: we cannot conclude that $P(2)$, $P(3)$, etc., are true. And naturally, these propositions are all false; there are sets of $n$ horses of different colors for all $n \ge 2$.

Students sometimes explain that the mistake in the proof is because $P(n)$ is false for $n \ge 2$, and the proof assumes something false, $P(n)$, in order to prove $P(n + 1)$. You should think about how to help such a student understand why this explanation would get no credit on a Math for Computer Science exam.

5.2 Strong Induction

A useful variant of induction is called strong induction. Strong induction and ordinary induction are used for exactly the same thing: proving that a predicate is true for all nonnegative integers. Strong induction is useful when a simple proof that the predicate holds for $n + 1$ does not follow just from the fact that it holds at $n$, but from the fact that it holds for other values $\le n$.

5.2.1 A Rule for Strong Induction

Principle of Strong Induction.
Let $P$ be a predicate on nonnegative integers. If

- $P(0)$ is true, and
- for all $n \in \mathbb{N}$, $P(0)$, $P(1)$, . . . , $P(n)$ together imply $P(n + 1)$,

then $P(m)$ is true for all $m \in \mathbb{N}$.
The only change from the ordinary induction principle is that strong induction allows you to make more assumptions in the inductive step of your proof! In an ordinary induction argument, you assume that $P(n)$ is true and try to prove that $P(n + 1)$ is also true. In a strong induction argument, you may assume that $P(0)$, $P(1)$, . . . , and $P(n)$ are all true when you go to prove $P(n + 1)$. So you can assume a stronger set of hypotheses, which can make your job easier.

Formulated as a proof rule, strong induction is

Rule. Strong Induction Rule

$$\frac{P(0), \quad \forall n \in \mathbb{N}.\ \bigl(P(0) \text{ AND } P(1) \text{ AND } \ldots \text{ AND } P(n)\bigr) \text{ IMPLIES } P(n + 1)}{\forall m \in \mathbb{N}.\ P(m)}$$

Stated more succinctly, the rule is

Rule.

$$\frac{P(0), \quad [\forall k \le n \in \mathbb{N}.\ P(k)] \text{ IMPLIES } P(n + 1)}{\forall m \in \mathbb{N}.\ P(m)}$$

The template for strong induction proofs is identical to the template given in Section 5.1.3 for ordinary induction except for two things:

- you should state that your proof is by strong induction, and
- you can assume that $P(0)$, $P(1)$, . . . , $P(n)$ are all true instead of only $P(n)$ during the inductive step.

5.2.2 Fibonacci numbers

The Fibonacci numbers arose out of the Italian mathematician Fibonacci's models of population growth at the beginning of the thirteenth century. Fibonacci numbers turn out to describe the growth of lots of interesting biological quantities such as the shape of pineapple sprouts or pine cones, and they also come up regularly in Computer Science, where they describe the growth of various data structures and computation times of algorithms.

To generate the list of successive Fibonacci numbers, you start by writing 0, 1 and then keep adding another element to the list by summing the two previous ones:

$$0, 1, 1, 2, 3, 5, 8, 13, 21, \ldots$$

Another way to describe this process is to define the $n$th Fibonacci number $F(n)$ by the equations:

$$F(0) ::= 0,$$
$$F(1) ::= 1,$$
$$F(n) ::= F(n - 1) + F(n - 2) \quad \text{for } n \ge 2.$$
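The defining equations translate directly into a short program. The sketch below is illustrative only: the name `fib` and the choice of an iterative loop are assumptions, not something taken from the text. It keeps the two most recent values and repeatedly sums them, as in the list-building description above.

```python
def fib(n: int) -> int:
    """Return F(n), where F(0) = 0, F(1) = 1, and F(n) = F(n-1) + F(n-2)."""
    if n < 0:
        raise ValueError("F(n) is defined only for nonnegative n")
    prev, curr = 0, 1  # F(0) and F(1), the two starting values
    for _ in range(n):
        # Slide the window forward: (F(k), F(k+1)) becomes (F(k+1), F(k+2)).
        prev, curr = curr, prev + curr
    return prev


if __name__ == "__main__":
    print([fib(n) for n in range(9)])  # [0, 1, 1, 2, 3, 5, 8, 13, 21]
```

The loop needs both starting values in hand before it can take its first step, which is the programming counterpart of supplying $F(0)$ and $F(1)$ in the definition.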
Note that because the general rule for finding the Fibonacci number $F(n)$ refers to the two previous values $F(n - 1)$ and $F(n - 2)$, we needed to know the two values $F(0)$ and $F(1)$ in order to get started.

One simple property of Fibonacci numbers is that the even/odd pattern of Fibonacci numbers repeats in a cycle of length three. A nice way to say this is that for all $n \ge 0$,

$$F(n) \text{ is even IFF } F(n + 3) \text{ is even}. \tag{5.4}$$

We will verify the equivalence (5.4) by induction, but strong induction is called for because properties of $F(n)$ depend not just on $F(n - 1)$ but also on $F(n - 2)$.

Proof. The (strong) induction hypothesis $P(n)$ will be (5.4).

Base cases:

- ($n = 0$). $F(0) = 0$ and $F(3) = 2$ are both even.
- ($n = 1$). $F(1) = 1$ and $F(4) = 3$ are both not even.

Induction step: For $n \ge 1$, we want to prove $P(n + 1)$ is true assuming by strong induction that $P(n)$ and $P(n - 1)$ are true.

Now it is easy to verify that for all integers $k, m$,

$$m + k \text{ is even IFF } [m \text{ is even IFF } k \text{ is even}]. \tag{*}$$

So for $n \ge 1$,

$F(n + 1)$ is even
  IFF $F(n) + F(n - 1)$ is even (def of $F(n + 1)$)
  IFF [$F(n)$ is even IFF $F(n - 1)$ is even] (by (*))
  IFF [$F(n + 3)$ is even IFF $F(n + 2)$ is even] (by strong ind. hyp. $P(n)$, $P(n - 1)$)
  IFF $F(n + 3) + F(n + 2)$ is even (by (*))
  IFF $F(n + 4)$ is even (by def of $F(n + 4)$).

This shows that

$$F(n + 1) \text{ is even IFF } F(n + 4) \text{ is even},$$

which means that $P(n + 1)$ is true, as required.

There is a long standing community of Fibonacci number enthusiasts who have been captivated by the many extraordinary properties of these numbers; a few further illustrative properties appear in Problems 5.8, 5.25, and 5.30.

5.2.3 Products of Primes

We can use strong induction to re-prove Theorem 2.3.1, which we previously proved using Well Ordering.

Theorem. Every integer greater than 1 is a product of primes.

Proof.
We will prove the Theorem by strong induction, letting the induction hypothesis $P(n)$ be

$n$ is a product of primes.

So the Theorem will follow if we prove that $P(n)$ holds for all $n \ge 2$.

Base Case: ($n = 2$): $P(2)$ is true because 2 is prime, so it is a length one product of primes by convention.

Inductive step: Suppose that $n \ge 2$ and that every number from 2 to $n$ is a product of primes. We must show that $P(n + 1)$ holds, namely, that $n + 1$ is also a product of primes. We argue by cases:

If $n + 1$ is itself prime, then it is a length one product of primes by convention, and so $P(n + 1)$ holds in this case.

Otherwise, $n + 1$ is not prime, which by definition means $n + 1 = k \cdot m$ for some integers $k, m$ between 2 and $n$. Now by the strong induction hypothesis, we know that both $k$ and $m$ are products of primes. By multiplying these products, it follows immediately that $k \cdot m = n + 1$ is also a product of primes. Therefore, $P(n + 1)$ holds in this case as well.

So $P(n + 1)$ holds in any case, which completes the proof by strong induction that $P(n)$ holds for all $n \ge 2$.

5.2.4 Making Change

The country Inductia, whose unit of currency is the Strong, has coins worth 3Sg (3 Strongs) and 5Sg. Although the Inductians have some trouble making small change like 4Sg or 7Sg, it turns out that they can collect coins to make change for any number that is at least 8 Strongs.

Figure 5.5 One way to make 26 Sg using Strongian currency

Strong induction makes this easy to prove for $n + 1 \ge 11$, because then $(n + 1) - 3 \ge 8$, so by strong induction the Inductians can make change for exactly $(n + 1) - 3$ Strongs, and then they can add a 3Sg coin to get $(n + 1)$Sg. So the only thing to do is check that they can make change for all the amounts from 8 to 10Sg, which is not too hard to do.

Here's a detailed writeup using the official format:

Proof. We prove by strong induction that the Inductians can make change for any amount of at least 8Sg.
The induction hypothesis $P(n)$ will be:

There is a collection of coins whose value is $n + 8$ Strongs.

We now proceed with the induction proof:

Base case: $P(0)$ is true because a 3Sg coin together with a 5Sg coin makes 8Sg.

Inductive step: We assume $P(k)$ holds for all $k \le n$, and prove that $P(n + 1)$ holds. We argue by cases:

Case ($n + 1 = 1$): We have to make $(n + 1) + 8 = 9$Sg. We can do this using three 3Sg coins.

Case ($n + 1 = 2$): We have to make $(n + 1) + 8 = 10$Sg. Use two 5Sg coins.

Case ($n + 1 \ge 3$): Then $0 \le n - 2 \le n$, so by the strong induction hypothesis, the Inductians can make change for $(n - 2) + 8$Sg. Now by adding a 3Sg coin, they can make change for $(n + 1) + 8$Sg, so $P(n + 1)$ holds in this case.

Since $n \ge 0$, we know that $n + 1 \ge 1$ and thus that the three cases cover every possibility. Since $P(n + 1)$ is true in every case, we can conclude by strong induction that for all $n \ge 0$, the Inductians can make change for $n + 8$ Strong. That is, they can make change for any number of eight or more Strong.

5.2.5 The Stacking Game

Here is another exciting game that's surely about to sweep the nation!

You begin with a stack of $n$ boxes. Then you make a sequence of moves. In each move, you divide one stack of boxes into two nonempty stacks. The game ends when you have $n$ stacks, each containing a single box. You earn points for each move; in particular, if you divide one stack of height $a + b$ into two stacks with heights $a$ and $b$, then you score $ab$ points for that move. Your overall score is the sum of the points that you earn for each move. What strategy should you use to maximize your total score?

As an example, suppose that we begin with a stack of $n = 10$ boxes. Then the game might proceed as shown in Figure 5.6. Can you find a better strategy?

Analyzing the Game

Let's use strong induction to analyze the unstacking game.
We'll prove that your score is determined entirely by the number of boxes; your strategy is irrelevant!

Theorem 5.2.1. Every way of unstacking $n$ blocks gives a score of $n(n - 1)/2$ points.

There are a couple of technical points to notice in the proof:

- The template for a strong induction proof mirrors the one for ordinary induction.
- As with ordinary induction, we have some freedom to adjust indices. In this case, we prove $P(1)$ in the base case and prove that $P(1), \ldots, P(n)$ imply $P(n + 1)$ for all $n \ge 1$ in the inductive step.

Stack Heights            Score
10
5 5                      25 points
5 3 2                    6
4 3 2 1                  4
2 3 2 1 2                4
2 2 2 1 2 1              2
1 2 2 1 2 1 1            1
1 1 2 1 2 1 1 1          1
1 1 1 1 2 1 1 1 1        1
1 1 1 1 1 1 1 1 1 1      1
Total Score              = 45 points

Figure 5.6 An example of the stacking game with $n = 10$ boxes. On each line, the underlined stack is divided in the next step.

Proof. The proof is by strong induction. Let $P(n)$ be the proposition that every way of unstacking $n$ blocks gives a score of $n(n - 1)/2$.

Base case: If $n = 1$, then there is only one block. No moves are possible, and so the total score for the game is $1(1 - 1)/2 = 0$. Therefore, $P(1)$ is true.

Inductive step: Now we must show that $P(1)$, . . . , $P(n)$ imply $P(n + 1)$ for all $n \ge 1$. So assume that $P(1)$, . . . , $P(n)$ are all true and that we have a stack of $n + 1$ blocks. The first move must split this stack into substacks with positive sizes $a$ and $b$ where $a + b = n + 1$ and $0 < a, b \le n$. Now the total score for the game is the sum of points for this first move plus points obtained by unstacking the two resulting substacks:

$$\begin{aligned}
\text{total score} &= (\text{score for 1st move}) + (\text{score for unstacking } a \text{ blocks}) + (\text{score for unstacking } b \text{ blocks}) \\
&= ab + \frac{a(a - 1)}{2} + \frac{b(b - 1)}{2} && \text{by } P(a) \text{ and } P(b) \\
&= \frac{(a + b)^2 - (a + b)}{2} = \frac{(a + b)((a + b) - 1)}{2} \\
&= \frac{(n + 1)n}{2}
\end{aligned}$$

This shows that $P(1)$, $P(2)$, . . . , $P(n)$ imply $P(n + 1)$.
Therefore, the claim is true by strong induction.

5.3 Strong Induction vs. Induction vs. Well Ordering

Strong induction looks genuinely "stronger" than ordinary induction; after all, you can assume a lot more when proving the induction step. Since ordinary induction is a special case of strong induction, you might wonder why anyone would bother with ordinary induction.

But strong induction really isn't any stronger, because a simple text manipulation program can automatically reformat any proof using strong induction into a proof using ordinary induction, just by decorating the induction hypothesis with a universal quantifier in a standard way. Still, it's worth distinguishing these two kinds of induction, since which you use will signal whether the inductive step for $n + 1$ follows directly from the case for $n$ or requires cases smaller than $n$, and that is generally good for your reader to know.

The template for the two kinds of induction rules looks nothing like the one for the Well Ordering Principle, but this chapter included a couple of examples where induction was used to prove something already proved using well ordering. In fact, this can always be done. As the examples may suggest, any well ordering proof can automatically be reformatted into an induction proof. So theoretically, no one need bother with the Well Ordering Principle either.

But it's equally easy to go the other way, and automatically reformat any strong induction proof into a Well Ordering proof. The three proof methods (well ordering, induction, and strong induction) are simply different formats for presenting the same mathematical reasoning!

So why three methods? Well, sometimes induction proofs are clearer because they don't require proof by contradiction. Also, induction proofs often provide recursive procedures that reduce large inputs to smaller ones.
On the other hand, well ordering can come out slightly shorter and sometimes seem more natural and less worrisome to beginners.

So which method should you use? There is no simple recipe. Sometimes the only way to decide is to write up a proof using more than one method and compare how they come out. But whichever method you choose, be sure to state the method up front to help a reader follow your proof.

Problems for Section 5.1

Practice Problems

Problem 5.1.
Prove by induction that every nonempty finite set of real numbers has a minimum element.

Problem 5.2.
Frank Gehry has changed his mind. Instead of the L-shaped tiles shown in Figure 5.3, he wants to use an odd offset pattern of tiles (or its mirror-image reflection), as shown in Figure 5.7. To prove this is possible, he uses reasoning similar to the proof in Section 5.1.5. However, unlike the proof in the text, this proof is flawed. Which part of the proof below contains a logical error?

Figure 5.7 Gehry's new tile.

False Claim. The proof is by induction. Let $P(n)$ be the proposition that for every location of Bill in a $2^n \times 2^n$ courtyard, there exists a tiling of the remainder with the offset tile pattern.

False proof. Base case: $P(0)$ is true because Bill fills the whole courtyard.

Inductive step: Assume that $P(n)$ is true for some $n \ge 0$; that is, for every location of Bill in a $2^n \times 2^n$ courtyard, there exists a tiling of the remainder. Divide the $2^{n+1} \times 2^{n+1}$ courtyard into four quadrants, each $2^n \times 2^n$. One quadrant contains Bill (B in the diagram below). Place a temporary Bill (X in the diagram) in each of the three squares lying near this quadrant as shown in Figure 5.8.

Figure 5.8 The induction hypothesis for the false theorem.

We can tile each of the four quadrants by the induction assumption.
Replacing the three temporary Bills with a single offset tile completes the job. This proves that $P(n)$ implies $P(n + 1)$ for all $n \ge 0$. Thus $P(m)$ is true for all $m \in \mathbb{N}$, and the ability to place Bill in the center of the courtyard follows as a special case where we put Bill in a central square.

Class Problems

Problem 5.3.
Use induction to prove that

$$1^3 + 2^3 + \cdots + n^3 = \left(\frac{n(n + 1)}{2}\right)^2 \tag{5.5}$$

for all $n \ge 1$.

Remember to formally

1. Declare proof by induction.
2. Identify the induction hypothesis $P(n)$.
3. Establish the base case.
4. Prove that $P(n) \Rightarrow P(n + 1)$.
5. Conclude that $P(n)$ holds for all $n \ge 1$.

as in the five part template.

Problem 5.4.
Prove by induction on $n$ that

$$1 + r + r^2 + \cdots + r^n = \frac{r^{n+1} - 1}{r - 1} \tag{5.6}$$

for all $n \in \mathbb{N}$ and numbers $r \ne 1$.

Problem 5.5.
Prove by induction:

$$1 + \frac{1}{4} + \frac{1}{9} + \cdots + \frac{1}{n^2} < 2 - \frac{1}{n}, \tag{5.7}$$

for all $n > 1$.

Problem 5.6.
(a) Prove by induction that a $2^n \times 2^n$ courtyard with a $1 \times 1$ statue of Bill in a corner can be covered with L-shaped tiles. (Do not assume or reprove the (stronger) result of Theorem 5.1.2 that Bill can be placed anywhere. The point of this problem is to show a different induction hypothesis that works.)

(b) Use the result of part (a) to prove the original claim that there is a tiling with Bill in the middle.

Problem 5.7.
We've proved in two different ways that

$$1 + 2 + 3 + \cdots + n = \frac{n(n + 1)}{2}$$

But now we're going to prove a contradictory theorem!

False Theorem. For all $n \ge 0$,

$$2 + 3 + 4 + \cdots + n = \frac{n(n + 1)}{2}$$

Proof. We use induction. Let $P(n)$ be the proposition that $2 + 3 + 4 + \cdots + n = n(n + 1)/2$.

Base case: $P(0)$ is true, since both sides of the equation are equal to zero. (Recall that a sum with no terms is zero.)

Inductive step: Now we must show that $P(n)$ implies $P(n + 1)$ for all $n \ge 0$. So suppose that $P(n)$ is true; that is, $2 + 3 + 4 + \cdots + n = n(n + 1)/2$.
Then we can reason as follows:

    $2 + 3 + 4 + \cdots + n + (n+1) = [2 + 3 + 4 + \cdots + n] + (n+1)$
    $\qquad = \dfrac{n(n+1)}{2} + (n+1)$
    $\qquad = \dfrac{(n+1)(n+2)}{2}$

Above, we group some terms, use the assumption $P(n)$, and then simplify. This shows that $P(n)$ implies $P(n+1)$. By the principle of induction, $P(n)$ is true for all $n \in \mathbb{N}$.

Where exactly is the error in this proof?

Homework Problems

Problem 5.8.
The Fibonacci numbers $F(n)$ are described in Section 5.2.2. Prove by induction that for all $n \ge 1$,

    $F(n-1) \cdot F(n+1) - F(n)^2 = (-1)^n$.    (5.8)

Problem 5.9.
For any binary string $\alpha$ let $\text{num}(\alpha)$ be the nonnegative integer it represents in binary notation. For example, $\text{num}(10) = 2$, and $\text{num}(0101) = 5$.

An $(n+1)$-bit adder adds two $(n+1)$-bit binary numbers. More precisely, an $(n+1)$-bit adder takes two length-$(n+1)$ binary strings

    $\alpha_n ::= a_n \ldots a_1 a_0$,
    $\beta_n ::= b_n \ldots b_1 b_0$,

and a binary digit $c_0$ as inputs, and produces a length-$(n+1)$ binary string

    $\sigma_n ::= s_n \ldots s_1 s_0$,

and a binary digit $c_{n+1}$ as outputs, and satisfies the specification:

    $\text{num}(\alpha_n) + \text{num}(\beta_n) + c_0 = 2^{n+1} c_{n+1} + \text{num}(\sigma_n)$.    (5.9)

There is a straightforward way to implement an $(n+1)$-bit adder as a digital circuit: an $(n+1)$-bit ripple-carry circuit has $1 + 2(n+1)$ binary inputs

    $a_n, \ldots, a_1, a_0, b_n, \ldots, b_1, b_0, c_0$,

and $n + 2$ binary outputs,

    $c_{n+1}, s_n, \ldots, s_1, s_0$.

As in Problem 3.6, the ripple-carry circuit is specified by the following formulas:

    $s_i ::= a_i \text{ XOR } b_i \text{ XOR } c_i$    (5.10)
    $c_{i+1} ::= (a_i \text{ AND } b_i) \text{ OR } (a_i \text{ AND } c_i) \text{ OR } (b_i \text{ AND } c_i)$,    (5.11)

for $0 \le i \le n$.

(a) Verify that definitions (5.10) and (5.11) imply that

    $a_n + b_n + c_n = 2c_{n+1} + s_n$    (5.12)

for all $n \in \mathbb{N}$.

(b) Prove by induction on $n$ that an $(n+1)$-bit ripple-carry circuit really is an $(n+1)$-bit adder, that is, its outputs satisfy (5.9).

Hint: You may assume that, by definition of binary representation of integers,

    $\text{num}(\alpha_{n+1}) = a_{n+1} 2^{n+1} + \text{num}(\alpha_n)$.    (5.13)

Problem 5.10.
Divided Equilateral Triangles^5 (DETs) can be built up as follows:

A single equilateral triangle counts as a DET whose only subtriangle is itself.

If $T$ is a DET, then the equilateral triangle $T'$ built out of four copies of $T$ as shown in Figure 5.9 is also a DET, and the subtriangles of $T'$ are exactly the subtriangles of each of the copies of $T$.

^5 Adapted from [46].

Figure 5.9 DET $T'$ from Four Copies of DET $T$

Figure 5.10 Trapezoid from Three Triangles

(a) Define the length of a DET to be the number of subtriangles with an edge on its base. Prove by induction on length that the total number of subtriangles of a DET is the square of its length.

(b) Show that a DET with one of its corner subtriangles removed can be tiled with trapezoids built out of three subtriangles as in Figure 5.10.

Problem 5.11.
The Math for Computer Science mascot, Theory Hippotamus, made a startling discovery while playing with his prized collection of unit squares over the weekend. Here is what happened.

First, Theory Hippotamus put his favorite unit square down on the floor as in Figure 5.11 (a). He noted that the length of the periphery of the resulting shape was 4, an even number. Next, he put a second unit square down next to the first so that the two squares shared an edge as in Figure 5.11 (b). He noticed that the length of the periphery of the resulting shape was now 6, which is also an even number. (The periphery of each shape in the figure is indicated by a thicker line.) Theory Hippotamus continued to place squares so that each new square shared an edge with at least one previously-placed square and no squares overlapped. Eventually, he arrived at the shape in Figure 5.11 (c). He realized that the length of the periphery of this shape was 36, which is again an even number.

Our plucky porcine pal is perplexed by this peculiar pattern.
Use induction on the number of squares to prove that the length of the periphery is always even, no matter how many squares Theory Hippotamus places or how he arranges them.

Figure 5.11 Some shapes that Theory Hippotamus created: panels (a), (b), (c).

Problem 5.12.
Prove the Distributive Law of intersection over the union of $n$ sets by induction:

    $A \cap \bigcup_{i=1}^{n} B_i = \bigcup_{i=1}^{n} (A \cap B_i)$.    (5.14)

Hint: Theorem 4.1.2 gives the $n = 2$ case.

Problem 5.13.
Here is an interesting construction of a geometric object known as the Koch snowflake. Define a sequence of polygons $S_0, S_1, \ldots$ recursively, starting with $S_0$ equal to an equilateral triangle with unit sides. We construct $S_{n+1}$ by removing the middle third of each edge of $S_n$ and replacing it with two line segments of the same length, as illustrated in Figure 5.12.

Let $a_n$ be the area of $S_n$. Observe that $a_0$ is just the area of the unit equilateral triangle, which by elementary geometry is $\sqrt{3}/4$.

Prove by induction that for $n \ge 0$, the area of the $n$th snowflake is given by:

    $a_n = a_0 \left( \dfrac{8}{5} - \dfrac{3}{5} \left( \dfrac{4}{9} \right)^n \right)$.    (5.15)

Figure 5.12 $S_0$, $S_1$, $S_2$ and $S_3$.

Exam Problems

Problem 5.14.
Prove by induction that

    $\sum_{k=1}^{n} k \cdot k! = (n+1)! - 1$.    (5.16)

Problem 5.15.
Prove by induction:

    $0^3 + 1^3 + 2^3 + \cdots + n^3 = \left( \dfrac{n(n+1)}{2} \right)^2, \quad \forall n \ge 0$,

using the equation itself as the induction hypothesis $P(n)$.

(a) Prove the base case $(n = 0)$.

(b) Now prove the inductive step.

Problem 5.16.
Suppose $P(n)$ is a predicate on nonnegative numbers, and suppose

    $\forall k.\ P(k) \text{ IMPLIES } P(k+2)$.    (5.17)

For $P$'s that satisfy (5.17), some of the assertions below Can hold for some, but not all, such $P$, other assertions Always hold no matter what the $P$ may be, and some Never hold for any such $P$. Indicate which case applies for each of the assertions and briefly explain why.
(a) $\forall n \ge 0.\ P(n)$
(b) $\text{NOT}(P(0)) \text{ AND } \forall n \ge 1.\ P(n)$
(c) $\forall n \ge 0.\ \text{NOT}(P(n))$
(d) $(\forall n \le 100.\ P(n)) \text{ AND } (\forall n > 100.\ \text{NOT}(P(n)))$
(e) $(\forall n \le 100.\ \text{NOT}(P(n))) \text{ AND } (\forall n > 100.\ P(n))$
(f) $P(0) \text{ IMPLIES } \forall n.\ P(n+2)$
(g) $[\exists n.\ P(2n)] \text{ IMPLIES } \forall n.\ P(2n+2)$
(h) $P(1) \text{ IMPLIES } \forall n.\ P(2n+1)$
(i) $[\exists n.\ P(2n)] \text{ IMPLIES } \forall n.\ P(2n+2)$
(j) $\exists n.\ \exists m > n.\ [P(2n) \text{ AND NOT}(P(2m))]$
(k) $[\exists n.\ P(n)] \text{ IMPLIES } \forall n.\ \exists m > n.\ P(m)$
(l) $\text{NOT}(P(0)) \text{ IMPLIES } \forall n.\ \text{NOT}(P(2n))$

Problem 5.17.
We examine a series of propositional formulas $F_1, F_2, \ldots, F_n, \ldots$ containing propositional variables $P_1, P_2, \ldots, P_n, \ldots$ constructed as follows:

    $F_1(P_1) ::= P_1$
    $F_2(P_1, P_2) ::= P_1 \text{ IMPLIES } P_2$
    $F_3(P_1, P_2, P_3) ::= (P_1 \text{ IMPLIES } P_2) \text{ IMPLIES } P_3$
    $F_4(P_1, P_2, P_3, P_4) ::= ((P_1 \text{ IMPLIES } P_2) \text{ IMPLIES } P_3) \text{ IMPLIES } P_4$
    $F_5(P_1, P_2, P_3, P_4, P_5) ::= (((P_1 \text{ IMPLIES } P_2) \text{ IMPLIES } P_3) \text{ IMPLIES } P_4) \text{ IMPLIES } P_5$
    $\vdots$

Let $T_n$ be the number of different true/false settings of the variables $P_1, P_2, \ldots, P_n$ for which $F_n(P_1, P_2, \ldots, P_n)$ is true. For example, $T_2 = 3$ since $F_2(P_1, P_2)$ is true for 3 different settings of the variables $P_1$ and $P_2$:

    P_1   P_2   F_2(P_1, P_2)
    T     T     T
    T     F     F
    F     T     T
    F     F     T

(a) Explain why

    $T_{n+1} = 2^{n+1} - T_n$.    (5.18)

(b) Use induction to prove that

    $T_n = \dfrac{2^{n+1} + (-1)^n}{3}$    (*)

for $n \ge 1$.

Problem 5.18.
You are given $n$ envelopes, numbered $0, 1, \ldots, n-1$. Envelope 0 contains $2^0 = 1$ dollar, Envelope 1 contains $2^1 = 2$ dollars, ..., and Envelope $n-1$ contains $2^{n-1}$ dollars. Let $P(n)$ be the assertion that:

    For all nonnegative integers $k < 2^n$, there is a subset of the $n$ envelopes whose contents total to exactly $k$ dollars.

Prove by induction that $P(n)$ holds for all integers $n \ge 1$.

Problem 5.19.
Prove by induction that

    $1 \cdot 2 + 2 \cdot 3 + 3 \cdot 4 + \cdots + n(n+1) = \dfrac{n(n+1)(n+2)}{3}$    (5.19)

for all integers $n \ge 1$.
Figure 5.13 OR-circuit from AND-circuit (an AND-circuit with a NOT-gate on each input and on the output).

Problem 5.20.
A $k$-bit AND-circuit is a digital circuit that has $k$ 0-1 valued inputs^6 $d_0, d_1, \ldots, d_{k-1}$ and one 0-1-valued output variable whose value will be

    $d_0 \text{ AND } d_1 \text{ AND } \cdots \text{ AND } d_{k-1}$.

OR-circuits are defined in the same way, with "OR" replacing "AND."

^6 Following the usual conventions for digital circuits, we're using 1 for the truth value T and 0 for F.

(a) Suppose we want an OR-circuit but only have a supply of AND-circuits and some NOT-gates ("inverters") that have one 0-1 valued input and one 0-1 valued output. We can turn an AND-circuit into an OR-circuit by attaching a NOT-gate to each input of the AND-circuit and also attaching a NOT-gate to the output of the AND-circuit. This is illustrated in Figure 5.13. Briefly explain why this works.

Large digital circuits are built by connecting together smaller digital circuits as components. One of the most basic components is a two-input/one-output AND-gate that produces an output value equal to the AND of its two input values. So according to the definition above, a single AND-gate is a 2-bit AND-circuit.

We can build up larger AND-circuits out of a collection of AND-gates in several ways. For example, one way to build a 4-bit AND-circuit is to connect three AND-gates as illustrated in Figure 5.14.

More generally, a depth-$n$ tree-design AND-circuit ("depth-$n$ circuit" for short) has $2^n$ inputs and is built from two depth-$(n-1)$ circuits by using the outputs of the two depth-$(n-1)$ circuits as inputs to a single AND-gate. This is illustrated in Figure 5.15. So the 4-bit AND-circuit in Figure 5.14 is a depth-2 circuit. A depth-1 circuit is defined simply to be a single AND-gate.

Figure 5.14 A 4-bit AND-circuit.

Figure 5.15 A depth-$n$ tree-design AND-circuit.
(b) Let $\text{gate\#}(n)$ be the number of AND-gates in a depth-$n$ circuit. Prove by induction that

    $\text{gate\#}(n) = 2^n - 1$    (5.20)

for all $n \ge 1$.

Problems for Section 5.2

Practice Problems

Problem 5.21.
Some fundamental principles for reasoning about nonnegative integers are:

1. The Induction Principle,
2. The Strong Induction Principle,
3. The Well Ordering Principle.

Identify which, if any, of the above principles is captured by each of the following inference rules.

(a) $\dfrac{P(0), \quad \forall m.\ (\forall k \le m.\ P(k)) \text{ IMPLIES } P(m+1)}{\forall n.\ P(n)}$

(b) $\dfrac{P(b), \quad \forall k \ge b.\ P(k) \text{ IMPLIES } P(k+1)}{\forall k \ge b.\ P(k)}$

(c) $\dfrac{\exists n.\ P(n)}{\exists m.\ [P(m) \text{ AND } (\forall k.\ P(k) \text{ IMPLIES } k \ge m)]}$

(d) $\dfrac{P(0), \quad \forall k > 0.\ P(k) \text{ IMPLIES } P(k+1)}{\forall n.\ P(n)}$

(e) $\dfrac{\forall m.\ (\forall k < m.\ P(k)) \text{ IMPLIES } P(m)}{\forall n.\ P(n)}$

Problem 5.22.
The Fibonacci numbers $F(n)$ are described in Section 5.2.2.

Indicate exactly which sentence(s) in the following bogus proof contain logical errors. Explain.

False Claim. Every Fibonacci number is even.

Bogus proof. Let all the variables $n, m, k$ mentioned below be nonnegative integer valued. Let $\text{Even}(n)$ mean that $F(n)$ is even. The proof is by strong induction with induction hypothesis $\text{Even}(n)$.

base case: $F(0) = 0$ is an even number, so $\text{Even}(0)$ is true.

inductive step: We may assume the strong induction hypothesis

    $\text{Even}(k)$ for $0 \le k \le n$,

and we must prove $\text{Even}(n+1)$.

Then by strong induction hypothesis, $\text{Even}(n)$ and $\text{Even}(n-1)$ are true, that is, $F(n)$ and $F(n-1)$ are both even. But by the definition, $F(n+1)$ equals the sum $F(n) + F(n-1)$ of two even numbers, and so it is also even. This proves $\text{Even}(n+1)$ as required.

Hence, $F(m)$ is even for all $m \in \mathbb{N}$ by the Strong Induction Principle.

Problem 5.23.
Alice wants to prove by induction that a predicate $P$ holds for certain nonnegative integers.
She has proven that for all nonnegative integers $n = 0, 1, \ldots$

    $P(n) \text{ IMPLIES } P(n+3)$.

(a) Suppose Alice also proves that $P(5)$ holds. Which of the following propositions can she infer?

1. $P(n)$ holds for all $n \ge 5$
2. $P(3n)$ holds for all $n \ge 5$
3. $P(n)$ holds for $n = 8, 11, 14, \ldots$
4. $P(n)$ does not hold for $n < 5$
5. $\forall n.\ P(3n+5)$
6. $\forall n > 2.\ P(3n-1)$
7. $P(0) \text{ IMPLIES } \forall n.\ P(3n+2)$
8. $P(0) \text{ IMPLIES } \forall n.\ P(3n)$

(b) Which of the following could Alice prove in order to conclude that $P(n)$ holds for all $n \ge 5$?

1. $P(0)$
2. $P(5)$
3. $P(5)$ and $P(6)$
4. $P(0)$, $P(1)$ and $P(2)$
5. $P(5)$, $P(6)$ and $P(7)$
6. $P(2)$, $P(4)$ and $P(5)$
7. $P(2)$, $P(4)$ and $P(6)$
8. $P(3)$, $P(5)$ and $P(7)$

Problem 5.24.
Prove that every amount of postage of 12 cents or more can be formed using just 4-cent and 5-cent stamps.

Class Problems

Problem 5.25.
The Fibonacci numbers are described in Section 5.2.2.

Prove, using strong induction, the following closed-form formula for the Fibonacci numbers.^7

    $F(n) = \dfrac{p^n - q^n}{\sqrt{5}}$

where $p = \dfrac{1 + \sqrt{5}}{2}$ and $q = \dfrac{1 - \sqrt{5}}{2}$.

Hint: Note that $p$ and $q$ are the roots of $x^2 - x - 1 = 0$, and so $p^2 = p + 1$ and $q^2 = q + 1$.

^7 This mind-boggling formula is known as Binet's formula. We'll explain in Chapter 16, and again in Chapter 22, how it comes about.

Problem 5.26.
A sequence of numbers is weakly decreasing when each number in the sequence is $\ge$ the numbers after it. (This implies that a sequence of just one number is weakly decreasing.)

Here's a bogus proof of a very important true fact: every integer greater than 1 is a product of a unique weakly decreasing sequence of primes (a pusp, for short).

Explain what's bogus about the proof.

Lemma. Every integer greater than 1 is a pusp.

For example, $252 = 7 \cdot 3 \cdot 3 \cdot 2 \cdot 2$, and no other weakly decreasing sequence of primes will have a product equal to 252.

Bogus proof.
We will prove the lemma by strong induction, letting the induction hypothesis $P(n)$ be

    $n$ is a pusp.

So the lemma will follow if we prove that $P(n)$ holds for all $n \ge 2$.

Base Case ($n = 2$): $P(2)$ is true because 2 is prime, and so it is a length one product of primes, and this is obviously the only sequence of primes whose product can equal 2.

Inductive step: Suppose that $n \ge 2$ and that $i$ is a pusp for every integer $i$ where $2 \le i < n + 1$. We must show that $P(n+1)$ holds, namely, that $n + 1$ is also a pusp. We argue by cases:

If $n + 1$ is itself prime, then it is the product of a length one sequence consisting of itself. This sequence is unique, since by definition of prime, $n + 1$ has no other prime factors. So $n + 1$ is a pusp, that is $P(n+1)$ holds in this case.

Otherwise, $n + 1$ is not prime, which by definition means $n + 1 = km$ for some integers $k, m$ such that $2 \le k, m < n + 1$. Now by the strong induction hypothesis, we know that $k$ and $m$ are pusps. It follows that by merging the unique prime sequences for $k$ and $m$, in sorted order, we get a unique weakly decreasing sequence of primes whose product equals $n + 1$. So $n + 1$ is a pusp, in this case as well.

So $P(n+1)$ holds in any case, which completes the proof by strong induction that $P(n)$ holds for all $n \ge 2$.

Problem 5.27.
Define the potential $p(S)$ of a stack of blocks $S$ to be $k(k-1)/2$ where $k$ is the number of blocks in $S$. Define the potential $p(A)$ of a set of stacks $A$ to be the sum of the potentials of the stacks in $A$.

Generalize Theorem 5.2.1 about scores in the stacking game to show that for any set of stacks $A$, if a sequence of moves starting with $A$ leads to another set of stacks $B$, then $p(A) \ge p(B)$, and the score for this sequence of moves is $p(A) - p(B)$.

Hint: Try induction on the number of moves to get from $A$ to $B$.

Homework Problems

Problem 5.28.
A group of $n \ge 1$ people can be divided into teams, each containing either 4 or 7 people.
What are all the possible values of $n$? Use induction to prove that your answer is correct.

Problem 5.29.
The following Lemma is true, but the proof given for it below is defective. Pinpoint exactly where the proof first makes an unjustified step and explain why it is unjustified.

Lemma. For any prime $p$ and positive integers $n, x_1, x_2, \ldots, x_n$, if $p \mid x_1 x_2 \cdots x_n$, then $p \mid x_i$ for some $1 \le i \le n$.

Bogus proof. Proof by strong induction on $n$. The induction hypothesis $P(n)$ is that Lemma holds for $n$.

Base case $n = 1$: When $n = 1$, we have $p \mid x_1$, therefore we can let $i = 1$ and conclude $p \mid x_i$.

Induction step: Now assuming the claim holds for all $k \le n$, we must prove it for $n + 1$.

So suppose $p \mid x_1 x_2 \cdots x_{n+1}$. Let $y_n = x_n x_{n+1}$, so $x_1 x_2 \cdots x_{n+1} = x_1 x_2 \cdots x_{n-1} y_n$. Since the right-hand side of this equality is a product of $n$ terms, we have by induction that $p$ divides one of them. If $p \mid x_i$ for some $i < n$, then we have the desired $i$. Otherwise $p \mid y_n$. But since $y_n$ is a product of the two terms $x_n, x_{n+1}$, we have by strong induction that $p$ divides one of them. So in this case $p \mid x_i$ for $i = n$ or $i = n + 1$.

Exam Problems

Problem 5.30.
The Fibonacci numbers $F(n)$ are described in Section 5.2.2.

These numbers satisfy many unexpected identities, such as

    $F(0)^2 + F(1)^2 + \cdots + F(n)^2 = F(n) F(n+1)$.    (5.21)

Equation (5.21) can be proved to hold for all $n \in \mathbb{N}$ by induction, using the equation itself as the induction hypothesis $P(n)$.

(a) Prove the base case $(n = 0)$.

(b) Now prove the inductive step.

Problem 5.31.
Use strong induction to prove that $n \le 3^{n/3}$ for every integer $n \ge 0$.

Problem 5.32.
A class of any size of 18 or more can be assembled from student teams of sizes 4 and 7. Prove this by induction (of some kind), using the induction hypothesis:

    $S(n) ::=$ a class of $n + 18$ students can be assembled from teams of sizes 4 and 7.

Problem 5.33.
Any amount of ten or more cents postage that is a multiple of five can be made using only 10¢ and 15¢ stamps. Prove this by induction (ordinary or strong, but say which) using the induction hypothesis

    $S(n) ::= (5n + 10)$¢ postage can be made using only 10¢ and 15¢ stamps.

6 State Machines

State machines are a simple, abstract model of step-by-step processes. Since computer programs can be understood as defining step-by-step computational processes, it's not surprising that state machines come up regularly in computer science. They also come up in many other settings such as designing digital circuits and modeling probabilistic processes. This section introduces Floyd's Invariant Principle, which is a version of induction tailored specifically for proving properties of state machines.

One of the most important uses of induction in computer science involves proving that one or more desirable properties continue to hold at every step in a process. A property that is preserved through a series of operations or steps is known as a preserved invariant.

Examples of desirable invariants include properties such as a variable never exceeding a certain value, the altitude of a plane never dropping below 1,000 feet without the wingflaps being deployed, and the temperature of a nuclear reactor never exceeding the threshold for a meltdown.

6.1 States and Transitions

Formally, a state machine is nothing more than a binary relation on a set, except that the elements of the set are called "states," the relation is called the transition relation, and an arrow in the graph of the transition relation is called a transition. A transition from state $q$ to state $r$ will be written $q \longrightarrow r$. The transition relation is also called the state graph of the machine. A state machine also comes equipped with a designated start state.
A simple example is a bounded counter, which counts from 0 to 99 and overflows at 100. This state machine is pictured in Figure 6.1, with states pictured as circles, transitions by arrows, and with start state 0 indicated by the double circle.

Figure 6.1 State transitions for the 99-bounded counter.

To be precise, what the picture tells us is that this bounded counter machine has

    states ::= $\{0, 1, \ldots, 99, \text{overflow}\}$,
    start state ::= $0$,
    transitions ::= $\{n \longrightarrow n + 1 \mid 0 \le n < 99\} \cup \{99 \longrightarrow \text{overflow},\ \text{overflow} \longrightarrow \text{overflow}\}$.

This machine isn't much use once it overflows, since it has no way to get out of its overflow state.

State machines for digital circuits and string pattern matching algorithms, for instance, usually have only a finite number of states. Machines that model continuing computations typically have an infinite number of states. For example, instead of the 99-bounded counter, we could easily define an "unbounded" counter that just keeps counting up without overflowing. The unbounded counter has an infinite state set, the nonnegative integers, which makes its state diagram harder to draw.

State machines are often defined with labels on states and/or transitions to indicate such things as input or output values, costs, capacities, or probabilities. Our state machines don't include any such labels because they aren't needed for our purposes. We do name states, as in Figure 6.1, so we can talk about them, but the names aren't part of the state machine.

6.2 The Invariant Principle

6.2.1 A Diagonally-Moving Robot

Suppose we have a robot that starts at the origin and moves on an infinite 2-dimensional integer grid. The state of the robot at any time can be specified by the integer coordinates $(x, y)$ of the robot's current position. So the start state is $(0, 0)$. At each step, the robot may move to a diagonally adjacent grid point, as illustrated in Figure 6.2.
To be precise, the robot's transitions are:

    $\{(m, n) \longrightarrow (m \pm 1, n \pm 1) \mid m, n \in \mathbb{Z}\}$.

For example, after the first step, the robot could be in states $(1, 1)$, $(1, -1)$, $(-1, 1)$ or $(-1, -1)$. After two steps, there are 9 possible states for the robot, including $(0, 0)$. The question is, can the robot ever reach position $(1, 0)$?

Figure 6.2 The Diagonally Moving Robot.

If you play around with the robot a bit, you'll probably notice that the robot can only reach positions $(m, n)$ for which $m + n$ is even, which of course means that it can't reach $(1, 0)$. This follows because the evenness of the sum of the coordinates is a property that is preserved by transitions. This is an example of a preserved invariant.

This once, let's go through this preserved invariant argument, carefully highlighting where induction comes in. Specifically, define the even-sum property of states to be:

    $\text{Even-sum}((m, n)) ::= [m + n \text{ is even}]$.

Lemma 6.2.1. For any transition $q \longrightarrow r$ of the diagonally-moving robot, if Even-sum($q$), then Even-sum($r$).

This lemma follows immediately from the definition of the robot's transitions: $(m, n) \longrightarrow (m \pm 1, n \pm 1)$. After a transition, the sum of coordinates changes by $(\pm 1) + (\pm 1)$, that is, by 0, 2, or $-2$. Of course, adding 0, 2 or $-2$ to an even number gives an even number. So by a trivial induction on the number of transitions, we can prove:

Theorem 6.2.2. The sum of the coordinates of any state reachable by the diagonally-moving robot is even.

Figure 6.3 Can the Robot get to $(1, 0)$?

Proof. The proof is induction on the number of transitions the robot has made.
The induction hypothesis is

    $P(n) ::=$ if $q$ is a state reachable in $n$ transitions, then Even-sum($q$).

Base case: $P(0)$ is true since the only state reachable in 0 transitions is the start state $(0, 0)$, and $0 + 0$ is even.

Inductive step: Assume that $P(n)$ is true, and let $r$ be any state reachable in $n + 1$ transitions. We need to prove that Even-sum($r$) holds.

Since $r$ is reachable in $n + 1$ transitions, there must be a state $q$ reachable in $n$ transitions such that $q \longrightarrow r$. Since $P(n)$ is assumed to be true, Even-sum($q$) holds, and so by Lemma 6.2.1, Even-sum($r$) also holds. This proves that $P(n) \text{ IMPLIES } P(n+1)$ as required, completing the proof of the inductive step.

We conclude by induction that for all $n \ge 0$, if $q$ is reachable in $n$ transitions, then Even-sum($q$). This implies that every reachable state has the Even-sum property.

Corollary 6.2.3. The robot can never reach position $(1, 0)$.

Proof. By Theorem 6.2.2, we know the robot can only reach positions with coordinates that sum to an even number, and thus it cannot reach position $(1, 0)$.

6.2.2 Statement of the Invariant Principle

Using the Even-sum invariant to understand the diagonally-moving robot is a simple example of a basic proof method called The Invariant Principle. The Principle summarizes how induction on the number of steps to reach a state applies to invariants.

A state machine execution describes a possible sequence of steps a machine might take.

Definition 6.2.4. An execution of the state machine is a (possibly infinite) sequence of states with the property that

    it begins with the start state, and
    if $q$ and $r$ are consecutive states in the sequence, then $q \longrightarrow r$.

A state is called reachable if it appears in some execution.

Definition 6.2.5. A preserved invariant of a state machine is a predicate $P$ on states, such that whenever $P(q)$ is true of a state $q$ and $q \longrightarrow r$ for some state $r$, then $P(r)$ holds.
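The robot argument and the two definitions above can be spot-checked by brute force. The following is a minimal Python sketch, not part of the text's proof: the names `robot_moves`, `even_sum`, and `reachable_within`, and the cutoff of 6 transitions, are illustrative choices. It enumerates the states that appear in executions of bounded length and tests the Even-sum predicate on each. A finite search like this only illustrates Theorem 6.2.2; the induction is what proves it for every reachable state.

```python
from itertools import product


def robot_moves(state):
    """All four diagonal transitions (m, n) -> (m +/- 1, n +/- 1)."""
    m, n = state
    return [(m + dm, n + dn) for dm, dn in product((1, -1), repeat=2)]


def even_sum(state):
    """The Even-sum predicate: the coordinates add up to an even number."""
    m, n = state
    return (m + n) % 2 == 0


def reachable_within(start, steps):
    """States that appear in some execution of at most `steps` transitions."""
    seen = {start}
    frontier = {start}
    for _ in range(steps):
        frontier = {r for q in frontier for r in robot_moves(q)} - seen
        seen |= frontier
    return seen


states = reachable_within((0, 0), 6)

# Even-sum holds at every state found, and (1, 0) is not among them.
print(all(even_sum(q) for q in states))
print((1, 0) in states)

# Preservation (Lemma 6.2.1), checked on the sampled states only.
print(all(even_sum(r) for q in states if even_sum(q) for r in robot_moves(q)))
```

Separating `even_sum` (the predicate) from `robot_moves` (the transition relation) mirrors Definition 6.2.5: preservation is a property of the pair, checked one transition at a time.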
The Invariant Principle

    If a preserved invariant of a state machine is true for the start state, then it is true for all reachable states.

The Invariant Principle is nothing more than the Induction Principle reformulated in a convenient form for state machines. Showing that a predicate is true in the start state is the base case of the induction, and showing that a predicate is a preserved invariant corresponds to the inductive step.^1

^1 Preserved invariants are commonly just called "invariants" in the literature on program correctness, but we decided to throw in the extra adjective to avoid confusion with other definitions. For example, other texts (as well as another subject at MIT) use "invariant" to mean "predicate true of all reachable states." Let's call this definition "invariant-2." Now invariant-2 seems like a reasonable definition, since unreachable states by definition don't matter, and all we want to show is that a desired property is invariant-2. But this confuses the objective of demonstrating that a property is invariant-2 with the method of finding a preserved invariant (which is preserved even at unreachable states) to show that it is invariant-2.

Robert W. Floyd

The Invariant Principle was formulated by Robert W. Floyd at Carnegie Tech in 1967. (Carnegie Tech was renamed Carnegie-Mellon University the following year.) Floyd was already famous for work on the formal grammars that transformed the field of programming language parsing; that was how he got to be a professor even though he never got a Ph.D. (He had been admitted to a Ph.D. program as a teenage prodigy, but flunked out and never went back.)

In that same year, Albert R. Meyer was appointed Assistant Professor in the Carnegie Tech Computer Science Department, where he first met Floyd.
Floyd and Meyer were the only theoreticians in the department, and they were both delighted to talk about their shared interests. After just a few conversations, Floyd's new junior colleague decided that Floyd was the smartest person he had ever met.

Naturally, one of the first things Floyd wanted to tell Meyer about was his new, as yet unpublished, Invariant Principle. Floyd explained the result to Meyer, and Meyer wondered (privately) how someone as brilliant as Floyd could be excited by such a trivial observation. Floyd had to show Meyer a bunch of examples before Meyer understood Floyd's excitement: not at the truth of the utterly obvious Invariant Principle, but rather at the insight that such a simple method could be so widely and easily applied in verifying programs.

Floyd left for Stanford the following year. He won the Turing award (the "Nobel prize" of computer science) in the late 1970s, in recognition of his work on grammars and on the foundations of program verification. He remained at Stanford from 1968 until his death in September, 2001. You can learn more about Floyd's life and work by reading the eulogy at

    http://oldwww.acm.org/pubs/membernet/stories/floyd.pdf

written by his closest colleague, Don Knuth.

6.2.3 The Die Hard Example

The movie Die Hard 3: With a Vengeance includes an amusing example of a state machine. The lead characters played by Samuel L. Jackson and Bruce Willis have to disarm a bomb planted by the diabolical Simon Gruber:

Simon: On the fountain, there should be 2 jugs, do you see them? A 5-gallon and a 3-gallon. Fill one of the jugs with exactly 4 gallons of water and place it on the scale and the timer will stop. You must be precise; one ounce more or less will result in detonation. If you're still alive in 5 minutes, we'll speak.

Bruce: Wait, wait a second. I don't get it. Do you get it?

Samuel: No.

Bruce: Get the jugs.
Obviously, we can't fill the 3-gallon jug with 4 gallons of water.

Samuel: Obviously.

Bruce: All right. I know, here we go. We fill the 3-gallon jug exactly to the top, right?

Samuel: Uh-huh.

Bruce: Okay, now we pour this 3 gallons into the 5-gallon jug, giving us exactly 3 gallons in the 5-gallon jug, right?

Samuel: Right, then what?

Bruce: All right. We take the 3-gallon jug and fill it a third of the way...

Samuel: No! He said, "Be precise." Exactly 4 gallons.

Bruce: Sh - -. Every cop within 50 miles is running his a - - off and I'm out here playing kids games in the park.

Samuel: Hey, you want to focus on the problem at hand?

Fortunately, they find a solution in the nick of time. You can work out how.

The Die Hard 3 State Machine

The jug-filling scenario can be modeled with a state machine that keeps track of the amount $b$ of water in the big jug, and the amount $l$ in the little jug. With the 3 and 5 gallon water jugs, the states formally will be pairs $(b, l)$ of real numbers such that $0 \le b \le 5$, $0 \le l \le 3$. (We can prove that the reachable values of $b$ and $l$ will be nonnegative integers, but we won't assume this.) The start state is $(0, 0)$, since both jugs start empty.

Since the amount of water in the jug must be known exactly, we will only consider moves in which a jug gets completely filled or completely emptied. There are several kinds of transitions:

1. Fill the little jug: $(b, l) \longrightarrow (b, 3)$ for $l < 3$.

2. Fill the big jug: $(b, l) \longrightarrow (5, l)$ for $b < 5$.

3. Empty the little jug: $(b, l) \longrightarrow (b, 0)$ for $l > 0$.

4. Empty the big jug: $(b, l) \longrightarrow (0, l)$ for $b > 0$.

5. Pour from the little jug into the big jug: for $l > 0$,

    $(b, l) \longrightarrow \begin{cases} (b + l,\ 0) & \text{if } b + l \le 5, \\ (5,\ l - (5 - b)) & \text{otherwise.} \end{cases}$

6. Pour from big jug into little jug: for $b > 0$,

    $(b, l) \longrightarrow \begin{cases} (0,\ b + l) & \text{if } b + l \le 3, \\ (b - (3 - l),\ 3) & \text{otherwise.} \end{cases}$
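The six transition rules above translate almost line for line into code, which makes it easy to search for the solution that Bruce and Samuel find. The following is a minimal Python sketch under stated assumptions: the names `moves` and `shortest_path` are illustrative, and the search starts from $(0, 0)$ so it only ever meets integer states, although the model itself allows real-valued states. The two pour rules are written with `min`, which is equivalent to the two-case definitions in rules 5 and 6.

```python
from collections import deque


def moves(state, big=5, little=3):
    """Successor states of (b, l) under the six transition rules."""
    b, l = state
    result = set()
    if l < little:
        result.add((b, little))              # 1. fill the little jug
    if b < big:
        result.add((big, l))                 # 2. fill the big jug
    if l > 0:
        result.add((b, 0))                   # 3. empty the little jug
    if b > 0:
        result.add((0, l))                   # 4. empty the big jug
    if l > 0:                                # 5. pour little into big
        poured = min(l, big - b)
        result.add((b + poured, l - poured))
    if b > 0:                                # 6. pour big into little
        poured = min(b, little - l)
        result.add((b - poured, l + poured))
    return result


def shortest_path(start, goal_test, big=5, little=3):
    """Breadth-first search; returns a shortest execution reaching a goal state."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        q = queue.popleft()
        if goal_test(q):
            path = []
            while q is not None:
                path.append(q)
                q = parent[q]
            return path[::-1]
        for r in sorted(moves(q, big, little)):
            if r not in parent:
                parent[r] = q
                queue.append(r)
    return None  # no reachable state satisfies goal_test


# Look for any state with exactly 4 gallons in the big jug.
path = shortest_path((0, 0), lambda s: s[0] == 4)
print(path)
# [(0, 0), (5, 0), (2, 3), (2, 0), (0, 2), (5, 2), (4, 3)]
```

Read as an execution in the sense of Definition 6.2.4, the printed path is: fill the big jug, pour into the little jug, empty the little jug, pour again, refill the big jug, and pour once more.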
Note that in contrast to the 99-counter state machine, there is more than one possible transition out of states in the Die Hard machine. Machines like the 99-counter with at most one transition out of each state are called deterministic. The Die Hard machine is nondeterministic because some states have transitions to several different states.

The Die Hard 3 bomb gets disarmed successfully because the state $(4, 3)$ is reachable.

Die Hard Permanently

The Die Hard series is getting tired, so we propose a final Die Hard Permanently. Here, Simon's brother returns to avenge him, posing the same challenge, but with the 5 gallon jug replaced by a 9 gallon one. The state machine has the same specification as the Die Hard 3 version, except all occurrences of "5" are replaced by "9."

Now, reaching any state of the form $(4, l)$ is impossible. We prove this using the Invariant Principle. Specifically, we define the preserved invariant predicate $P((b, l))$ to be that $b$ and $l$ are nonnegative integer multiples of 3.

To prove that $P$ is a preserved invariant of the Die Hard Permanently machine, we assume $P(q)$ holds for some state $q ::= (b, l)$ and that $q \longrightarrow r$. We have to show that $P(r)$ holds. The proof divides into cases, according to which transition rule is used.

One case is a "fill the little jug" transition. This means $r = (b, 3)$. But $P(q)$ implies that $b$ is an integer multiple of 3, and of course 3 is an integer multiple of 3, so $P(r)$ still holds.

Another case is a "pour from big jug into little jug" transition. For the subcase when there isn't enough room in the little jug to hold all the water, that is, when $b + l > 3$, we have $r = (b - (3 - l), 3)$. But $P(q)$ implies that $b$ and $l$ are integer multiples of 3, which means $b - (3 - l)$ is too, so in this case too, $P(r)$ holds.

We won't bother to crank out the remaining cases, which can all be checked just as easily.
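The remaining cases can also be checked mechanically. With a 9-gallon and a 3-gallon jug, only eight states satisfy $P$ (namely $b \in \{0, 3, 6, 9\}$ and $l \in \{0, 3\}$), so preservation can be tested on every one of them. The following Python sketch does this; the names `moves`, `P`, and `reachable`, and the constants `BIG` and `LITTLE`, are illustrative assumptions rather than anything defined in the text.

```python
BIG, LITTLE = 9, 3  # jug capacities for the Die Hard Permanently machine


def moves(state):
    """Successor states of (b, l) under the six jug transition rules."""
    b, l = state
    result = set()
    if l < LITTLE:
        result.add((b, LITTLE))              # fill the little jug
    if b < BIG:
        result.add((BIG, l))                 # fill the big jug
    if l > 0:
        result.add((b, 0))                   # empty the little jug
        poured = min(l, BIG - b)             # pour little into big
        result.add((b + poured, l - poured))
    if b > 0:
        result.add((0, l))                   # empty the big jug
        poured = min(b, LITTLE - l)          # pour big into little
        result.add((b - poured, l + poured))
    return result


def P(state):
    """The invariant: b and l are nonnegative integer multiples of 3."""
    b, l = state
    return b >= 0 and l >= 0 and b % 3 == 0 and l % 3 == 0


def reachable(start):
    """All states that appear in some execution from `start`."""
    seen, stack = {start}, [start]
    while stack:
        q = stack.pop()
        for r in moves(q):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return seen


# Every state within the jug capacities that satisfies P.
p_states = [(b, l) for b in range(0, BIG + 1, 3) for l in range(0, LITTLE + 1, 3)]

# Preservation: every transition out of a P-state lands on a P-state.
print(all(P(r) for q in p_states for r in moves(q)))

# Consequence: no reachable state has 4 gallons in the big jug.
print(any(b == 4 for b, _ in reachable((0, 0))))
```

The first check covers every transition rule at once, which is the exhaustive version of the case analysis above; the second is what the Invariant Principle then guarantees.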
Now by the Invariant Principle, we conclude that every reachable state satisfies $P$. But since no state of the form $(4, l)$ satisfies $P$, we have proved rigorously that Bruce dies once and for all!

By the way, notice that the state (1,0), which satisfies NOT($P$), has a transition to (0,0), which satisfies $P$. So the negation of a preserved invariant may not be a preserved invariant.

6.3 Partial Correctness & Termination

Floyd distinguished two required properties to verify a program. The first property is called partial correctness; this is the property that the final results, if any, of the process must satisfy system requirements.

You might suppose that if a result was only partially correct, then it might also be partially incorrect, but that's not what Floyd meant. The word "partial" comes from viewing a process that might not terminate as computing a partial relation. Partial correctness means that when there is a result, it is correct, but the process might not always produce a result, perhaps because it gets stuck in a loop.

The second correctness property, called termination, is that the process does always produce some final value.

Partial correctness can commonly be proved using the Invariant Principle. Termination can commonly be proved using the Well Ordering Principle. We'll illustrate this by verifying a Fast Exponentiation procedure.

6.3.1 Fast Exponentiation

Exponentiating

The most straightforward way to compute the $b$th power of a number $a$ is to multiply $a$ by itself $b - 1$ times. But the solution can be found in considerably fewer multiplications by using a technique called Fast Exponentiation. The register machine program below defines the fast exponentiation algorithm. The letters $x, y, z, r$ denote registers that hold numbers. An assignment statement has the form "$z := a$" and has the effect of setting the number in register $z$ to be the number $a$.
A Fast Exponentiation Program

Given inputs $a \in \mathbb{R}$, $b \in \mathbb{N}$, initialize registers $x, y, z$ to $a, 1, b$ respectively, and repeat the following sequence of steps until termination:

    if $z = 0$ return $y$ and terminate
    $r := \text{remainder}(z, 2)$
    $z := \text{quotient}(z, 2)$
    if $r = 1$, then $y := xy$
    $x := x^2$

We claim this program always terminates and leaves $y = a^b$.

To begin, we'll model the behavior of the program with a state machine:

1. states $::= \mathbb{R} \times \mathbb{R} \times \mathbb{N}$,
2. start state $::= (a, 1, b)$,
3. transitions are defined by the rule
$$(x, y, z) \to \begin{cases} (x^2, y, \text{quotient}(z, 2)) & \text{if } z \text{ is nonzero and even,} \\ (x^2, xy, \text{quotient}(z, 2)) & \text{if } z \text{ is nonzero and odd.} \end{cases}$$

The preserved invariant $P((x, y, z))$ will be
$$z \in \mathbb{N} \text{ AND } y x^z = a^b. \tag{6.1}$$

To prove that $P$ is preserved, assume $P((x, y, z))$ holds and that $(x, y, z) \to (x_t, y_t, z_t)$. We must prove that $P((x_t, y_t, z_t))$ holds, that is,
$$z_t \in \mathbb{N} \text{ AND } y_t x_t^{z_t} = a^b. \tag{6.2}$$

Since there is a transition from $(x, y, z)$, we have $z \ne 0$, and since $z \in \mathbb{N}$ by (6.1), we can consider just two cases:

If $z$ is even, then we have that $x_t = x^2$, $y_t = y$, $z_t = z/2$. Therefore, $z_t \in \mathbb{N}$ and
$$y_t x_t^{z_t} = y (x^2)^{z/2} = y x^{2 \cdot z/2} = y x^z = a^b \quad \text{(by (6.1)).}$$

If $z$ is odd, then we have that $x_t = x^2$, $y_t = xy$, $z_t = (z - 1)/2$. Therefore, $z_t \in \mathbb{N}$ and
$$y_t x_t^{z_t} = xy (x^2)^{(z-1)/2} = y x^{1 + 2 \cdot (z-1)/2} = y x^{1 + (z - 1)} = y x^z = a^b \quad \text{(by (6.1)).}$$

So in both cases, (6.2) holds, proving that $P$ is a preserved invariant.

Now it's easy to prove partial correctness: if the Fast Exponentiation program terminates, it does so with $a^b$ in register $y$. This works because $1 \cdot a^b = a^b$, which means that the start state $(a, 1, b)$ satisfies $P$. By the Invariant Principle, $P$ holds for all reachable states. But the program only stops when $z = 0$. If a terminated state $(x, y, 0)$ is reachable, then $y = y x^0 = a^b$ as required.

Ok, it's partially correct, but what's fast about it?
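Before answering that, here is a minimal Python sketch of the register program, with invariant (6.1) checked by an assertion on every pass through the loop. It is an illustration under stated assumptions rather than the text's own code: the name `fast_exp` and the multiplication counter `mults` are hypothetical, and `a` is assumed to be an integer so that the invariant check is exact (floating-point inputs would be subject to rounding).

```python
def fast_exp(a, b):
    """Return (a**b, number of multiplications) for a nonnegative integer b."""
    x, y, z = a, 1, b            # start state (a, 1, b)
    target = a ** b              # used only to check the invariant
    mults = 0
    while True:
        # Preserved invariant (6.1): z is a nonnegative integer and y * x^z = a^b.
        assert z >= 0 and y * x ** z == target
        if z == 0:
            return y, mults      # terminated state (x, y, 0), so y = a^b
        r = z % 2                # r := remainder(z, 2)
        z = z // 2               # z := quotient(z, 2)
        if r == 1:
            y = x * y            # y := xy
            mults += 1
        x = x * x                # x := x^2
        mults += 1

print(fast_exp(3, 13))
```

The counter makes no difference to the result; it is there so the number of multiplications can be compared with the naive $b - 1$.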
The answer is that the number of multiplications it performs to compute $a^b$ is roughly the length of the binary representation of $b$. That is, the Fast Exponentiation program uses roughly $\log b$ multiplications (see footnote 2), compared to the naive approach of multiplying by $a$ a total of $b - 1$ times.

More precisely, it requires at most $2(\lceil \log b \rceil + 1)$ multiplications for the Fast Exponentiation algorithm to compute $a^b$ for $b > 1$. The reason is that the number in register $z$ is initially $b$, and gets at least halved with each transition. So it can't be halved more than $\lceil \log b \rceil + 1$ times before hitting zero and causing the program to terminate. Since each of the transitions involves at most two multiplications, the total number of multiplications until $z = 0$ is at most $2(\lceil \log b \rceil + 1)$ for $b > 0$ (see Problem 6.6).

Footnote 2: As usual in computer science, $\log b$ means the base two logarithm $\log_2 b$. We use $\ln b$ for the natural logarithm $\log_e b$, and otherwise write the logarithm base explicitly, as in $\log_{10} b$.

6.3.2 Derived Variables

The preceding termination proof involved finding a nonnegative integer-valued measure to assign to states. We might call this measure the "size" of the state. We then showed that the size of a state decreased with every state transition. By the Well Ordering Principle, the size can't decrease indefinitely, so when a minimum size state is reached, there can't be any transitions possible: the process has terminated.

More generally, the technique of assigning values to states (not necessarily nonnegative integers and not necessarily decreasing under transitions) is often useful in the analysis of algorithms. Potential functions play a similar role in physics. In the context of computational processes, such value assignments for states are called derived variables.

For example, for the Die Hard machines we could have introduced a derived variable $f : \text{states} \to$
$\mathbb{R}$ for the amount of water in both buckets, by setting $f((a, b)) ::= a + b$. Similarly, in the robot problem, the position of the robot along the x-axis would be given by the derived variable x-coord, where x-coord$((i, j)) ::= i$.

There are a few standard properties of derived variables that are handy in analyzing state machines.

Definition 6.3.1. A derived variable $f : \text{states} \to \mathbb{R}$ is strictly decreasing iff
$$q \to q' \text{ IMPLIES } f(q') < f(q).$$
It is weakly decreasing iff
$$q \to q' \text{ IMPLIES } f(q') \le f(q).$$
Strictly increasing and weakly increasing derived variables are defined similarly (see footnote 3).

We confirmed termination of the Fast Exponentiation procedure by noticing that the derived variable $z$ was nonnegative-integer-valued and strictly decreasing. We can summarize this approach to proving termination as follows:

Theorem 6.3.2. If $f$ is a strictly decreasing $\mathbb{N}$-valued derived variable of a state machine, then the length of any execution starting at state $q$ is at most $f(q)$.

Of course, we could prove Theorem 6.3.2 by induction on the value of $f(q)$, but think about what it says: "If you start counting down at some nonnegative integer $f(q)$, then you can't count down more than $f(q)$ times." Put this way, it's obvious.

Footnote 3: Weakly increasing variables are often also called nondecreasing. We will avoid this terminology to prevent confusion between nondecreasing variables and variables with the much weaker property of not being a decreasing variable.

6.3.3 Termination with Well ordered Sets (Optional)

Theorem 6.3.2 generalizes straightforwardly to derived variables taking values in a well ordered set (Section 2.4).

Theorem 6.3.3. If there exists a strictly decreasing derived variable whose range is a well ordered set, then every execution terminates.

Theorem 6.3.3 follows immediately from the observation that a set of numbers is well ordered iff it has no infinite decreasing sequences (Problem 2.23).
Note that the existence of a weakly decreasing derived variable does not guarantee that every execution terminates. An infinite execution could proceed through states in which a weakly decreasing variable remained constant.

6.3.4 A Southeast Jumping Robot (Optional)

Here's a simple, contrived example of a termination proof based on a variable that is strictly decreasing over a well ordered set. Let's think about a robot that travels around the nonnegative integer quadrant $\mathbb{N}^2$.

If the robot is at some position $(x, y)$ different from the origin $(0, 0)$, the robot must make a move, which may be

- a unit distance West, that is, $(x, y) \to (x - 1, y)$ for $x > 0$, or
- a unit distance South combined with an arbitrary jump East, that is, $(x, y) \to (z, y - 1)$ for $z \ge x$,

providing the move does not leave the quadrant.

Claim 6.3.4. The robot will always get stuck at the origin.

If we think of the robot as a nondeterministic state machine, then Claim 6.3.4 is a termination assertion. The Claim may seem obvious, but it really has a different character than termination based on nonnegative integer-valued variables. That's because, even knowing that the robot is at position $(0, 1)$, for example, there is no way to bound the time it takes for the robot to get stuck. It can delay getting stuck for as many seconds as it wants by making its next move to a distant point in the Far East. This rules out proving termination using Theorem 6.3.2.

So does Claim 6.3.4 still seem obvious?

Well it is if you see the trick. Define a derived variable $v$ mapping robot states to the numbers in the well ordered set $\mathbb{N} + \mathbb{F}$ of Lemma 2.4.5. In particular, define $v : \mathbb{N}^2 \to \mathbb{N} + \mathbb{F}$ as follows
$$v(x, y) ::= y + \frac{x}{x + 1}.$$

[Figure 6.4: Preferences for four people. Both men like Angelina best and both women like Brad best.]

Now it's easy to check that if $(x, y) \to$
$(x', y')$ is a legitimate robot move, then $v((x', y')) < v((x, y))$. In particular, $v$ is a strictly decreasing derived variable, so Theorem 6.3.3 implies that the robot always gets stuck, even though we can't say how many moves it will take until it does.

6.4 The Stable Marriage Problem

Suppose we have a population of men and women in which each person has preferences of the opposite-gender person they would like to marry: each man has his preference list of all the women, and each woman has her preference list of all of the men.

The preferences don't have to be symmetric. That is, Jennifer might like Brad best, but Brad doesn't necessarily like Jennifer best. The goal is to marry everyone: every man must marry exactly one woman and vice versa, with no polygamy and heterosexual marriages only (see footnote 4). Moreover, we would like to find a matching between men and women that is stable in the sense that there is no pair of people who prefer one another to their spouses.

For example, suppose Brad likes Angelina best, and Angelina likes Brad best, but Brad and Angelina are married to other people, say Jennifer and Billy Bob. Now Brad and Angelina prefer each other to their spouses, which puts their marriages at risk. Pretty soon, they're likely to start spending late nights together working on problem sets!

This unfortunate situation is illustrated in Figure 6.4, where the digits "1" and "2" near a man show which of the two women he ranks first and second, respectively, and similarly for the women.

Footnote 4: Same-sex marriage is an interesting but separate case.

More generally, in any matching, a man and woman who are not married to each other and who like each other better than their spouses are called a rogue couple. In the situation shown in Figure 6.4, Brad and Angelina would be a rogue couple.

Having a rogue couple is not a good thing, since it threatens the stability of the marriages.
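The definition of a rogue couple translates directly into a check on a given matching. The sketch below is illustrative: the names `men_prefs`, `women_prefs` and `rogue_couples` are assumptions, and the preference lists encode Figure 6.4 as described in its caption (both men rank Angelina first, both women rank Brad first).

```python
# Preference lists from Figure 6.4, most preferred first.
men_prefs = {
    "Brad": ["Angelina", "Jennifer"],
    "Billy Bob": ["Angelina", "Jennifer"],
}
women_prefs = {
    "Jennifer": ["Brad", "Billy Bob"],
    "Angelina": ["Brad", "Billy Bob"],
}

def rogue_couples(matching, men_prefs, women_prefs):
    """Return every (man, woman) rogue couple; matching maps man -> wife."""
    husband = {w: m for m, w in matching.items()}
    rogues = []
    for m, prefs in men_prefs.items():
        for w in prefs:
            if w == matching[m]:
                break                       # women after this point rank below his wife
            # m likes w better than his wife; does w like m better than her husband?
            ranking = women_prefs[w]
            if ranking.index(m) < ranking.index(husband[w]):
                rogues.append((m, w))
    return rogues

# Brad and Angelina married to other people, as in the example above.
print(rogue_couples({"Brad": "Jennifer", "Billy Bob": "Angelina"},
                    men_prefs, women_prefs))
```

A matching for which this function returns an empty list has no rogue couples.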
On the other hand, if there are no rogue couples, then for any man and woman who are not married to each other, at least one likes their spouse better than the other, and so there won't be any mutual temptation to start an affair.

Definition 6.4.1. A stable matching is a matching with no rogue couples.

The question is, given everybody's preferences, can you find a stable set of marriages? In the example consisting solely of the four people in Figure 6.4, we could let Brad and Angelina both have their first choices by marrying each other. Now neither Brad nor Angelina prefers anybody else to their spouse, so neither will be in a rogue couple. This leaves Jen not-so-happily married to Billy Bob, but neither Jen nor Billy Bob can entice somebody else to marry them, and so this is a stable matching.

It turns out there always is a stable matching among a group of men and women. We don't know of any immediate way to recognize this, and it seems surprising. In fact, in the apparently similar same-sex or "buddy" matching problem where people are supposed to be paired off as buddies, regardless of gender, a stable matching may not be possible. An example of preferences among four people where there is no stable buddy match is given in Problem 6.22. But when men are only allowed to marry women, and vice versa, then there is a simple procedure to produce a stable matching and the concept of preserved invariants provides an elegant way to understand and verify the procedure.

6.4.1 The Mating Ritual

The procedure for finding a stable matching can be described in a memorable way as a Mating Ritual that takes place over several days. On the starting day, each man has his full preference list of all the women, and likewise each woman has her full preference list of all the men. Then the following events happen each day:

Morning: Each man stands under the balcony of the woman on the top of his list, that is, the woman he prefers above all the other remaining women.
Then he serenades her. He is said to be her suitor. If a man has no women left on his list, he stays home and does his math homework.

Afternoon: Each woman who has one or more suitors says to her favorite among them, "We might get engaged. Please stay around." To the other suitors, she says, "No. I will never marry you! Take a hike!"

Evening: Any man who is told by a woman to take a hike crosses that woman off his preference list.

Termination condition: When a day arrives in which every woman has at most one suitor, the ritual ends with each woman marrying her suitor, if she has one.

There are a number of facts about this Mating Ritual that we would like to prove:

- The Ritual eventually reaches the termination condition.
- Everybody ends up married.
- The resulting marriages are stable.

To prove these facts, it will be helpful to recognize the Ritual as the description of a state machine. The state at the start of any day is determined by knowing for each man, which woman, if any, he will serenade that day, that is, the woman at the top of his preference list after he has crossed out all the women who have rejected him on earlier days.

Mating Ritual at Akamai

The Internet infrastructure company Akamai, cofounded by Tom Leighton, also uses a variation of the Mating Ritual to assign web traffic to its servers.

In the early days, Akamai used other combinatorial optimization algorithms that got to be too slow as the number of servers (over 65,000 in 2010) and requests (over 800 billion per day) increased. Akamai switched to a Ritual-like approach, since a Ritual is fast and can be run in a distributed manner.

In this case, web requests correspond to women and web servers correspond to men. The web requests have preferences based on latency and packet loss, and the web servers have preferences based on cost of bandwidth and co-location.
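The Ritual itself (morning, afternoon, evening, and the termination condition) can be simulated directly as a state machine whose state is each man's remaining list. The following Python sketch is one possible rendering, assuming equally many men and women with complete preference lists; the names `mating_ritual`, `men_prefs` and `women_prefs` are illustrative and not from the text, and the example preferences are those of Figure 6.4.

```python
def mating_ritual(men_prefs, women_prefs):
    """Run the Ritual day by day; return (matching man -> wife, number of days)."""
    # State: each man's remaining list, most preferred woman first.
    lists = {m: list(prefs) for m, prefs in men_prefs.items()}
    # rank[w][m] is m's position on w's list (0 = her favorite).
    rank = {w: {m: i for i, m in enumerate(prefs)}
            for w, prefs in women_prefs.items()}
    day = 0
    while True:
        day += 1
        # Morning: each man with a nonempty list serenades the woman at its top.
        suitors = {}
        for m, remaining in lists.items():
            if remaining:
                suitors.setdefault(remaining[0], []).append(m)
        # Termination condition: every woman has at most one suitor.
        if all(len(s) <= 1 for s in suitors.values()):
            return {s[0]: w for w, s in suitors.items()}, day
        # Afternoon: each woman keeps her favorite suitor.
        # Evening: every rejected suitor crosses her off his list.
        for w, s in suitors.items():
            favorite = min(s, key=lambda m: rank[w][m])
            for m in s:
                if m != favorite:
                    lists[m].remove(w)

men_prefs = {"Brad": ["Angelina", "Jennifer"],
             "Billy Bob": ["Angelina", "Jennifer"]}
women_prefs = {"Jennifer": ["Brad", "Billy Bob"],
               "Angelina": ["Brad", "Billy Bob"]}
print(mating_ritual(men_prefs, women_prefs))
```

On the Figure 6.4 preferences, both men serenade Angelina on the first day, Billy Bob is rejected, and the termination condition is met on the second day.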
6.4.2 There is a Marriage Day

It's easy to see why the Mating Ritual has a terminal day when people finally get married. Every day on which the ritual hasn't terminated, at least one man crosses a woman off his list. (If the ritual hasn't terminated, there must be some woman serenaded by at least two men, and at least one of them will have to cross her off his list.) If we start with $n$ men and $n$ women, then each of the $n$ men's lists initially has $n$ women on it, for a total of $n^2$ list entries. Since no woman ever gets added to a list, the total number of entries on the lists decreases every day that the Ritual continues, and so the Ritual can continue for at most $n^2$ days.

6.4.3 They All Live Happily Ever After...

We will prove that the Mating Ritual leaves everyone in a stable marriage. To do this, we note one very useful fact about the Ritual: if on some morning a woman has any suitor, then her favorite suitor will still be serenading her the next morning, since his list won't have changed. So she is sure to have today's favorite suitor among her suitors tomorrow. That means she will be able to choose a favorite suitor tomorrow who is at least as desirable to her as today's favorite. So day by day, her favorite suitor can stay the same or get better, never worse. This sounds like an invariant, and it is. Namely, let $P$ be the predicate

    For every woman $w$ and man $m$, if $w$ is crossed off $m$'s list, then $w$ has a suitor whom she prefers over $m$.

Lemma 6.4.2. $P$ is a preserved invariant for The Mating Ritual.

Proof. Woman $w$ gets crossed off $m$'s list only when $w$ has a suitor she prefers to $m$. Thereafter, her favorite suitor doesn't change until one she likes better comes along. So if her favorite suitor was preferable to $m$, then any new favorite suitor will be as well.

Notice that the invariant $P$ holds vacuously at the beginning since no women are crossed off to start.
So by the Invariant Principle, $P$ holds throughout the Ritual. Now we can prove:

Theorem 6.4.3. Everyone is married at the end of the Mating Ritual.

Proof. Assume to the contrary that on the last day of the Mating Ritual, some man (call him Bob) is not married. This means Bob can't be serenading anybody, that is, his list must be empty. So every woman must have been crossed off his list and, since $P$ is true, every woman has a suitor whom she prefers to Bob. In particular, every woman has some suitor, and since it is the last day, they have only one suitor, and this is who they marry. But there are an equal number of men and women, so if all women are married, so are all men, contradicting the assumption that Bob is not married.

Theorem 6.4.4. The Mating Ritual produces a stable matching.

Proof. Let Brad and Jen be any man and woman, respectively, that are not married to each other on the last day of the Mating Ritual. We will prove that Brad and Jen are not a rogue couple, and thus that all marriages on the last day are stable. There are two cases to consider.

Case 1: Jen is not on Brad's list by the end. Then by invariant $P$, we know that Jen has a suitor (and hence a husband) whom she prefers to Brad. So she's not going to run off with Brad: Brad and Jen cannot be a rogue couple.

Case 2: Jen is on Brad's list. Since Brad picks women to serenade by working down his list, his wife must be higher on his preference list than Jen. So he's not going to run off with Jen: once again, Brad and Jen are not a rogue couple.

6.4.4 ...Especially the Men

Who is favored by the Mating Ritual, the men or the women? The women seem to have all the power: each day they choose their favorite suitor and reject the rest. What's more, we know their suitors can only change for the better as the Ritual progresses.
Similarly, a man keeps serenading the woman he most prefers among those on his list until he must cross her off, at which point he serenades the next most preferred woman on his list. So from the man's perspective, the woman he is serenading can only change for the worse. Sounds like a good deal for the women.

But it's not! We will show that the men are by far the favored gender under the Mating Ritual.

While the Mating Ritual produces one stable matching, stable matchings need not be unique. For example, reversing the roles of men and women will often yield a different stable matching among them. So a man may have different wives in different sets of stable marriages. In some cases, a man can stably marry every one of the women, but in most cases, there are some women who cannot be a man's wife in any stable matching. For example, given the preferences shown in Figure 6.4, Jennifer cannot be Brad's wife in any stable matching because if he was married to her, then he and Angelina would be a rogue couple. It is not feasible for Jennifer to be stably married to Brad.

Definition 6.4.5. Given a set of preferences for the men and women, one person is a feasible spouse for another person when there is a stable matching in which these two people are married.

Definition 6.4.6. Let $Q$ be the predicate: for every woman $w$ and man $m$, if $w$ is crossed off $m$'s list, then $w$ is not a feasible spouse for $m$.

Lemma 6.4.7. $Q$ is a preserved invariant (see footnote 5) for The Mating Ritual.

Proof. Suppose $Q$ holds at some point in the Ritual and some woman, Alice, is about to be crossed off the list of some man, Bob. We claim that Alice must not be feasible for Bob. Therefore $Q$ will still hold after Alice is crossed off, proving that $Q$ is invariant.

To verify the claim, notice that when Alice gets crossed off Bob's list, it's because Alice has a suitor, Ted, she prefers to Bob.
What's more, since $Q$ holds, all Ted's feasible wives are still on his list, and Alice is at the top. So Ted likes Alice better than all his other feasible spouses. Now if Alice could be married to Bob in some set of stable marriages, then Ted must be married to a wife he likes less than Alice. Since Alice prefers Ted to Bob, this makes Alice and Ted a rogue couple, contradicting stability. So Alice can't be married to Bob, that is, Alice is not a feasible wife for Bob, as claimed.

Definition 6.4.8. A person's optimal spouse is their most preferred feasible spouse. A person's pessimal spouse is their least preferred feasible spouse.

Everybody has an optimal and a pessimal spouse, since we know there is at least one stable matching, namely, the one produced by the Mating Ritual. Lemma 6.4.7 implies a key property of the Mating Ritual:

Theorem 6.4.9. The Mating Ritual marries every man to his optimal spouse and every woman to her pessimal spouse.

Proof. If Bob is married to Alice on the final day of the Ritual, then everyone above Alice on Bob's preference list was crossed off, and by property $Q$, all these crossed off women were infeasible for Bob. So Alice is Bob's highest ranked feasible spouse, that is, his optimal spouse.

Further, since Bob likes Alice better than any other feasible wife, Alice and Bob would be a rogue couple if Alice was married to a husband she liked less than Bob. So Bob must be Alice's least preferred feasible husband.

Footnote 5: We appeal to $P$ in justifying $Q$, so technically it is $P$ AND $Q$ which is actually the preserved invariant. But let's not be picky.

6.4.5 Applications

The Mating Ritual was first announced in a paper by D. Gale and L.S. Shapley in 1962, but ten years before the Gale-Shapley paper was published, and unknown to them, a similar algorithm was being used to assign residents to hospitals by the National Resident Matching Program (NRMP).
The NRMP has, since the turn of the twentieth century, assigned each year's pool of medical school graduates to hospital residencies (formerly called "internships"), with hospitals and graduates playing the roles of men and women (see footnote 6). Before the Ritual-like algorithm was adopted, there were chronic disruptions and awkward countermeasures taken to preserve unstable assignments of graduates to residencies. The Ritual resolved these problems so successfully that it was used essentially without change at least through 1989 (see footnote 7). For this and related work, Shapley was awarded the 2012 Nobel prize in Economics.

Not surprisingly, the Mating Ritual is also used by at least one large online dating agency. Of course there is no serenading going on; everything is handled by computer.

Footnote 6: In this case there may be multiple women married to one man, but this is a minor complication, see Problem 6.23.

Footnote 7: Much more about the Stable Marriage Problem can be found in the very readable mathematical monograph by Dan Gusfield and Robert W. Irving, [25].

Problems for Section 6.3

Practice Problems

Problem 6.1.
Which states of the Die Hard 3 machine below have transitions to exactly two states?

Die Hard Transitions

1. Fill the little jug: $(b, l) \to (b, 3)$ for $l < 3$.
2. Fill the big jug: $(b, l) \to (5, l)$ for $b < 5$.
3. Empty the little jug: $(b, l) \to (b, 0)$ for $l > 0$.
4. Empty the big jug: $(b, l) \to (0, l)$ for $b > 0$.
5. Pour from the little jug into the big jug: for $l > 0$,
$$(b, l) \to \begin{cases} (b + l, 0) & \text{if } b + l \le 5, \\ (5, l - (5 - b)) & \text{otherwise.} \end{cases}$$
6. Pour from big jug into little jug: for $b > 0$,
$$(b, l) \to \begin{cases} (0, b + l) & \text{if } b + l \le 3, \\ (b - (3 - l), 3) & \text{otherwise.} \end{cases}$$

Homework Problems

Problem 6.2.
In the late 1960s, the military junta that ousted the government of the small republic of Nerdia completely outlawed built-in multiplication operations, and also forbade division by any number other than 3.
Fortunately, a young dissident found a way to help the population multiply any two nonnegative integers without risking persecution by the junta. The procedure he taught people is:

    procedure multiply(x, y: nonnegative integers)
        r := x;
        s := y;
        a := 0;
        while s ≠ 0 do
            if 3 | s then
                r := r + r + r;
                s := s/3;
            else if 3 | (s − 1) then
                a := a + r;
                r := r + r + r;
                s := (s − 1)/3;
            else
                a := a + r + r;
                r := r + r + r;
                s := (s − 2)/3;
        return a;

We can model the algorithm as a state machine whose states are triples of nonnegative integers $(r, s, a)$. The initial state is $(x, y, 0)$. The transitions are given by the rule that for $s > 0$:
$$(r, s, a) \to \begin{cases} (3r, s/3, a) & \text{if } 3 \mid s, \\ (3r, (s - 1)/3, a + r) & \text{if } 3 \mid (s - 1), \\ (3r, (s - 2)/3, a + 2r) & \text{otherwise.} \end{cases}$$

(a) List the sequence of steps that appears in the execution of the algorithm for inputs $x = 5$ and $y = 10$.

(b) Use the Invariant Method to prove that the algorithm is partially correct, that is, if $s = 0$, then $a = xy$.

(c) Prove that the algorithm terminates after at most $1 + \log_3 y$ executions of the body of the do statement.

Problem 6.3.
A robot named Wall-E wanders around a two-dimensional grid. He starts out at $(0, 0)$ and is allowed to take four different types of steps:

1. $(+2, -1)$
2. $(+1, -2)$
3. $(+1, +1)$
4. $(-3, 0)$

Thus, for example, Wall-E might walk as follows. The types of his steps are listed above the arrows.
$$(0, 0) \xrightarrow{1} (2, -1) \xrightarrow{3} (3, 0) \xrightarrow{2} (4, -2) \xrightarrow{4} (1, -2) \to \cdots$$

Wall-E's true love, the fashionable and high-powered robot, Eve, awaits at $(0, 2)$.

(a) Describe a state machine model of this problem.

(b) Will Wall-E ever find his true love? Either find a path from Wall-E to Eve, or use the Invariant Principle to prove that no such path exists.

Problem 6.4.
A hungry ant is placed on an unbounded grid. Each square of the grid either contains a crumb or is empty.
The squares containing crumbs form a path in which, except at the ends, every crumb is adjacent to exactly two other crumbs. The ant is placed at one end of the path and on a square containing a crumb. For example, the figure below shows a situation in which the ant faces North, and there is a trail of food leading approximately Southeast. The ant has already eaten the crumb upon which it was initially placed.

The ant can only smell food directly in front of it. The ant can only remember a small number of things, and what it remembers after any move only depends on what it remembered and smelled immediately before the move. Based on smell and memory, the ant may choose to move forward one square, or it may turn right or left. It eats a crumb when it lands on it.

The above scenario can be nicely modelled as a state machine in which each state is a pair consisting of the "ant's memory" and "everything else", for example, information about where things are on the grid. Work out the details of such a model state machine; design the ant-memory part of the state machine so the ant will eat all the crumbs on any finite path at which it starts and then signal when it is done. Be sure to clearly describe the possible states, transitions, and inputs and outputs (if any) in your model. Briefly explain why your ant will eat all the crumbs.

Note that the last transition is a self-loop; the ant signals done for eternity. One could also add another end state so that the ant signals done only once.

Problem 6.5.
Suppose that you have a regular deck of cards arranged as follows, from top to bottom:

    A♥ 2♥ ... K♥  A♠ 2♠ ... K♠  A♣ 2♣ ... K♣  A♦ 2♦ ... K♦

Only two operations on the deck are allowed: inshuffling and outshuffling. In both, you begin by cutting the deck exactly in half, taking the top half into your
right-hand and the bottom into your left. Then you shuffle the two halves together so that the cards are perfectly interlaced; that is, the shuffled deck consists of one card from the left, one from the right, one from the left, one from the right, etc. The top card in the shuffled deck comes from the right-hand in an outshuffle and from the left-hand in an inshuffle.

(a) Model this problem as a state machine.

(b) Use the Invariant Principle to prove that you cannot make the entire first half of the deck black through a sequence of inshuffles and outshuffles.

Note: Discovering a suitable invariant can be difficult! This is the part of a correctness proof that generally requires some insight, and there is no simple recipe for finding invariants. A standard initial approach is to identify a bunch of reachable states and then look for a pattern, some feature that they all share.

Problem 6.6.
Prove that the fast exponentiation state machine of Section 6.3.1 will halt after
$$\lceil \log_2 n \rceil + 1 \tag{6.3}$$
transitions starting from any state where the value of $z$ is $n \in \mathbb{Z}^+$.
Hint: Strong induction.

Class Problems

Problem 6.7.
In this problem you will establish a basic property of a puzzle toy called the Fifteen Puzzle using the method of invariants. The Fifteen Puzzle consists of sliding square tiles numbered $1, \ldots, 15$ held in a $4 \times 4$ frame with one empty square. Any tile adjacent to the empty square can slide into it.

The standard initial position is

     1   2   3   4
     5   6   7   8
     9  10  11  12
    13  14  15

We would like to reach the target position (known in the oldest author's youth as "the impossible"):

    15  14  13  12
    11  10   9   8
     7   6   5   4
     3   2   1

A state machine model of the puzzle has states consisting of a $4 \times 4$ matrix with 16 entries consisting of the integers $1, \ldots, 15$ as well as one "empty" entry, like each of the two arrays above.
The state transitions correspond to exchanging the empty square and an adjacent numbered tile. For example, an empty at position $(2, 2)$ can exchange position with the tile above it, namely, at position $(1, 2)$:

    n1   n2   n3   n4          n1        n3   n4
    n5        n6   n7          n5   n2   n6   n7
    n8   n9   n10  n11   -->   n8   n9   n10  n11
    n12  n13  n14  n15         n12  n13  n14  n15

We will use the invariant method to prove that there is no way to reach the target state starting from the initial state.

We begin by noting that a state can also be represented as a pair consisting of two things:

1. a list of the numbers $1, \ldots, 15$ in the order in which they appear, reading rows left-to-right from the top row down, ignoring the empty square, and
2. the coordinates of the empty square, where the upper left square has coordinates $(1, 1)$, the lower right $(4, 4)$.

(a) Write out the "list" representation of the start state and the "impossible" state.

Let $L$ be a list of the numbers $1, \ldots, 15$ in some order. A pair of integers is an out-of-order pair in $L$ when the first element of the pair both comes earlier in the list and is larger than the second element of the pair. For example, the list $1, 2, 4, 5, 3$ has two out-of-order pairs: (4,3) and (5,3). The increasing list $1, 2, \ldots, n$ has no out-of-order pairs.

Let a state $S$ be a pair $(L, (i, j))$ described above. We define the parity of $S$ to be 0 or 1 depending on whether the sum of the number of out-of-order pairs in $L$ and the row-number of the empty square is even or odd. That is, writing $p(L)$ for the number of out-of-order pairs in $L$,
$$\text{parity}(S) ::= \begin{cases} 0 & \text{if } p(L) + i \text{ is even,} \\ 1 & \text{otherwise.} \end{cases}$$

(b) Verify that the parity of the start state and the target state are different.

(c) Show that the parity of a state is preserved under transitions. Conclude that "the impossible" is impossible to reach.

By the way, if two states have the same parity, then in fact there is a way to get from one to the other. If you like puzzles, you'll enjoy working this out on your own.

Problem 6.8.
The Massachusetts Turnpike Authority is concerned about the integrity of the new Zakim bridge. Their consulting architect has warned that the bridge may collapse if more than 1000 cars are on it at the same time. The Authority has also been warned by their traffic consultants that the rate of accidents from cars speeding across bridges has been increasing.

Both to lighten traffic and to discourage speeding, the Authority has decided to make the bridge one-way and to put tolls at both ends of the bridge (don't laugh, this is Massachusetts). So cars will pay tolls both on entering and exiting the bridge, but the tolls will be different. In particular, a car will pay \$3 to enter onto the bridge and will pay \$2 to exit. To be sure that there are never too many cars on the bridge, the Authority will let a car onto the bridge only if the difference between the amount of money currently at the entry toll booth and the amount at the exit toll booth is strictly less than a certain threshold amount of $T_0$ dollars.

The consultants have decided to model this scenario with a state machine whose states are triples $(A, B, C)$ of nonnegative integers, where

$A$ is an amount of money at the entry booth,
$B$ is an amount of money at the exit booth, and
$C$ is a number of cars on the bridge.

Any state with $C > 1000$ is called a collapsed state, which the Authority dearly hopes to avoid. There will be no transition out of a collapsed state.

Since the toll booth collectors may need to start off with some amount of money in order to make change, and there may also be some number of "official" cars already on the bridge when it is opened to the public, the consultants must be ready to analyze the system started at any uncollapsed state. So let $A_0$ be the initial number of dollars at the entrance toll booth, $B_0$ the initial number of dollars at the exit toll booth, and $C_0 \le 1000$ the number of official cars on the bridge when it is opened.
You should assume that even official cars pay tolls on exiting or entering the bridge after the bridge is opened.

(a) Give a mathematical model of the Authority's system for letting cars on and off the bridge by specifying a transition relation between states of the form $(A, B, C)$ above.

(b) Characterize each of the following derived variables

$$A,\quad B,\quad A + B,\quad A - B,\quad 3C - A,\quad 2A - 3B,\quad B + 3C,\quad 2A - 3B - 6C,\quad 2A - 2B - 3C$$

as one of the following

constant: C
strictly increasing: SI
strictly decreasing: SD
weakly increasing but not constant: WI
weakly decreasing but not constant: WD
none of the above: N

and briefly explain your reasoning.

The Authority has asked their engineering consultants to determine $T_0$ and to verify that this policy will keep the number of cars from exceeding 1000.

The consultants reason that if $C_0$ is the number of official cars on the bridge when it is opened, then an additional $1000 - C_0$ cars can be allowed on the bridge. So as long as $A - B$ has not increased by $3(1000 - C_0)$, there shouldn't be more than 1000 cars on the bridge. So they recommend defining

$$T_0 ::= 3(1000 - C_0) + (A_0 - B_0), \tag{6.4}$$

where $A_0$ is the initial number of dollars at the entrance toll booth and $B_0$ is the initial number of dollars at the exit toll booth.

(c) Use the results of part (b) to define a simple predicate $P$ on states of the transition system which is satisfied by the start state (that is, $P(A_0, B_0, C_0)$ holds), is not satisfied by any collapsed state, and is a preserved invariant of the system. Explain why your $P$ has these properties. Conclude that the traffic won't cause the bridge to collapse.

(d) A clever MIT intern working for the Turnpike Authority agrees that the Turnpike's bridge management policy will be safe: the bridge will not collapse. But she warns her boss that the policy will lead to deadlock, a situation where traffic can't move on the bridge even though the bridge has not collapsed.
Explain more precisely in terms of system transitions what the intern means, and briefly, but clearly, justify her claim.

Problem 6.9.
Start with 102 coins on a table, 98 showing heads and 4 showing tails. There are two ways to change the coins:

(i) flip over any ten coins, or

(ii) let $n$ be the number of heads showing. Place $n + 1$ additional coins, all showing tails, on the table.

For example, you might begin by flipping nine heads and one tail, yielding 90 heads and 12 tails, then add 91 tails, yielding 90 heads and 103 tails.

(a) Model this situation as a state machine, carefully defining the set of states, the start state, and the possible state transitions.

(b) Explain how to reach a state with exactly one tail showing.

(c) Define the following derived variables:

$C ::=$ the number of coins on the table,
$H ::=$ the number of heads,
$T ::=$ the number of tails,
$C_2 ::= \operatorname{remainder}(C/2)$,
$H_2 ::= \operatorname{remainder}(H/2)$,
$T_2 ::= \operatorname{remainder}(T/2)$.

Which of these variables is

1. strictly increasing
2. weakly increasing
3. strictly decreasing
4. weakly decreasing
5. constant

(d) Prove that it is not possible to reach a state in which there is exactly one head showing.

Problem 6.10.
A classroom is designed so students sit in a square arrangement. An outbreak of beaver flu sometimes infects students in the class; beaver flu is a rare variant of bird flu that lasts forever, with symptoms including a yearning for more quizzes and the thrill of late night problem set sessions.

Here is an illustration of a $6 \times 6$-seat classroom with seats represented by squares. The locations of infected students are marked with an asterisk.

Outbreaks of infection spread rapidly step by step.
A student is infected after a step if either

the student was infected at the previous step (since beaver flu lasts forever), or

the student was adjacent to at least two already-infected students at the previous step.

Here adjacent means the students' individual squares share an edge (front, back, left or right); they are not adjacent if they only share a corner point. So each student is adjacent to 2, 3 or 4 others.

In the example, the infection spreads as shown below.

[Figure: successive stages of the outbreak in the example classroom.]

In this example, over the next few time-steps, all the students in class become infected.

Theorem. If fewer than $n$ students among those in an $n \times n$ arrangement are initially infected in a flu outbreak, then there will be at least one student who never gets infected in this outbreak, even if students attend all the lectures.

Prove this theorem.

Hint: Think of the state of an outbreak as an $n \times n$ square above, with asterisks indicating infection. The rules for the spread of infection then define the transitions of a state machine. Find a weakly decreasing derived variable that leads to a proof of this theorem.

Exam Problems

Problem 6.11.
Token replacing-1-2 is a single player game using a set of tokens, each colored black or white. Except for color, the tokens are indistinguishable. In each move, a player can replace one black token with two white tokens, or replace one white token with two black tokens.

We can model this game as a state machine whose states are pairs $(n_b, n_w)$ where $n_b \ge 0$ equals the number of black tokens, and $n_w \ge 0$ equals the number of white tokens.

(a) List the numbers of the following predicates that are preserved invariants.

$$\operatorname{rem}(n_b + n_w, 3) \ne 2 \tag{6.5}$$
$$\operatorname{rem}(n_w - n_b, 3) = 2 \tag{6.6}$$
$$\operatorname{rem}(n_b - n_w, 3) = 2 \tag{6.7}$$
$$n_b + n_w > 5 \tag{6.8}$$
$$n_b + n_w < 5 \tag{6.9}$$

Now assume the game starts with a single black token, that is, the start state is $(1, 0)$.
(b) List the numbers of the predicates above that are true for all reachable states:

(c) Define the predicate $T(n_b, n_w)$ by the rule:

$$T(n_b, n_w) ::= \operatorname{rem}(n_w - n_b, 3) = 2.$$

We will now prove the following:

Claim. If $T(n_b, n_w)$, then state $(n_b, n_w)$ is reachable.

Note that this claim is different from the claim that $T$ is a preserved invariant.

The proof of the Claim will be by induction on $n$ using induction hypothesis

$$P(n) ::= \forall (n_b, n_w).\ [(n_b + n_w = n) \text{ AND } T(n_b, n_w)] \text{ IMPLIES } (n_b, n_w) \text{ is reachable}.$$

The base cases will be when $n \le 2$.

Assuming that the base cases have been verified, complete the Inductive Step.

Now verify the Base Cases: $P(n)$ for $n \le 2$.

Problem 6.12.
Token Switching is a process for updating a set of black and white tokens. The process starts with a single black token. At each step,

(i) one black token can be replaced with two white tokens, or

(ii) if the numbers of white and black tokens are not the same, the colors of all the tokens can be switched: all the black tokens become white, and the white tokens become black.

We can model Token Switching as a state machine whose states are pairs $(b, w)$ of nonnegative integers, where $b$ equals the number of black tokens, and $w$ equals the number of white tokens. So the start state is $(1, 0)$.

(a) Indicate which of the following states can be reached from the start state in exactly two steps:

$$(0, 0),\ (1, 0),\ (0, 1),\ (1, 1),\ (0, 2),\ (2, 0),\ (2, 1),\ (1, 2),\ (0, 3),\ (3, 0)$$

(b) Define the predicate $F(b, w)$ by the rule:

$$F(b, w) ::= (b - w) \text{ is not a multiple of } 3.$$

Prove the following

Claim. If $F(b, w)$, then state $(b, w)$ is reachable from the start state.

(c) Explain why state $\left(11^{6^{7777}}, 5^{10^{88}}\right)$ is not a reachable state.

Hint: Do not assume $F$ is a preserved invariant without proving it.

Problem 6.13.
Token replacing-1-3 is a single player game using a set of tokens, each colored black or white.
In each move, a player can replace a black token with three white tokens, or replace a white token with three black tokens. We can model this game as a state machine whose states are pairs $(b, w)$ of nonnegative integers, where $b$ is the number of black tokens and $w$ the number of white ones.

The game has two possible start states: $(5, 4)$ or $(4, 3)$.

We call a state $(b, w)$ eligible when

$$\operatorname{rem}(b - w, 4) = 1, \text{ AND} \tag{6.10}$$
$$\min\{b, w\} \ge 3. \tag{6.11}$$

This problem examines the connection between eligible states and states that are reachable from either of the possible start states.

(a) Give an example of a reachable state that is not eligible.

(b) Show that the derived variable $b + w$ is strictly increasing. Conclude that state $(3, 2)$ is not reachable.

(c) Suppose $(b, w)$ is eligible and $b \ge 6$. Verify that $(b - 3, w + 1)$ is eligible.

For the rest of the problem, you may (and should) assume the following Fact:

Fact. If $\max\{b, w\} \le 5$ and $(b, w)$ is eligible, then $(b, w)$ is reachable.

(This is easy to verify since there are only nine states with $b, w \in \{3, 4, 5\}$, but don't waste time doing this.)

(d) Define the predicate $P(n)$ to be:

$$\forall (b, w).\ [b + w = n \text{ AND } (b, w) \text{ is eligible}] \text{ IMPLIES } (b, w) \text{ is reachable}.$$

Prove that $P(n - 1)$ IMPLIES $P(n + 1)$ for all $n \ge 1$.

(e) Conclude that all eligible states are reachable.

(f) Prove that $(4^7 + 1, 4^5 + 2)$ is not reachable.

(g) Verify that $\operatorname{rem}(3b + w, 8)$ is a derived variable that is constant. Conclude that no state is reachable from both start states.

Problem 6.14.
There is a bucket containing more blue balls than red balls. As long as there are more blues than reds, any one of the following rules may be applied to add and/or remove balls from the bucket:

(i) Add a red ball.

(ii) Remove a blue ball.

(iii) Add two reds and one blue.

(iv) Remove two blues and one red.
(a) Starting with 10 reds and 16 blues, what is the largest number of balls the bucket will contain by applying these rules?

Let $b$ be the number of blue balls and $r$ be the number of red balls in the bucket at any given time.

(b) Prove that $b - r \ge 0$ is a preserved invariant of the process of adding and removing balls according to rules (i)–(iv).

(c) Prove that no matter how many balls the bucket contains, repeatedly applying rules (i)–(iv) will eventually lead to a state where no further rule can be applied.

Problem 6.15.
The following problem is a twist on the Fifteen-Puzzle analyzed in Problem 6.7.

Let $A$ be a sequence consisting of the numbers $1, \dots, n$ in some order. A pair of integers in $A$ is called an out-of-order pair when the first element of the pair both comes earlier in the sequence, and is larger, than the second element of the pair. For example, the sequence $(1, 2, 4, 5, 3)$ has two out-of-order pairs: $(4, 3)$ and $(5, 3)$. We let $t(A)$ equal the number of out-of-order pairs in $A$. For example, $t((1, 2, 4, 5, 3)) = 2$.

The elements in $A$ can be rearranged using the Rotate-Triple operation, in which three consecutive elements of $A$ are rotated to move the smallest of them to be first. For example, in the sequence $(2, 4, 1, 5, 3)$, the Rotate-Triple operation could rotate the consecutive numbers $4, 1, 5$ into $1, 5, 4$ so that

$$(2, 4, 1, 5, 3) \longrightarrow (2, 1, 5, 4, 3).$$

The Rotate-Triple could also rotate the consecutive numbers $2, 4, 1$ into $1, 2, 4$ so that

$$(2, 4, 1, 5, 3) \longrightarrow (1, 2, 4, 5, 3).$$

We can think of a sequence $A$ as a state of a state machine whose transitions correspond to possible applications of the Rotate-Triple operation.

(a) Argue that the derived variable $t$ is weakly decreasing.

(b) Prove that having an even number of out-of-order pairs is a preserved invariant of this machine.
(c) Starting with

$$S ::= (2014, 2013, 2012, \dots, 2, 1),$$

explain why it is impossible to reach

$$T ::= (1, 2, \dots, 2012, 2013, 2014).$$

Problems for Section 6.4

Practice Problems

Problem 6.16.
Four Students want separate assignments to four VI-A Companies. Here are their preference rankings:

Student: Companies
Albert: HP, Bellcore, AT&T, Draper
Sarah: AT&T, Bellcore, Draper, HP
Tasha: HP, Draper, AT&T, Bellcore
Elizabeth: Draper, AT&T, Bellcore, HP

Company: Students
AT&T: Elizabeth, Albert, Tasha, Sarah
Bellcore: Tasha, Sarah, Albert, Elizabeth
HP: Elizabeth, Tasha, Albert, Sarah
Draper: Sarah, Elizabeth, Tasha, Albert

(a) Use the Mating Ritual to find two stable assignments of Students to Companies.

(b) Describe a simple procedure to determine whether any given stable marriage problem has a unique solution, that is, only one possible stable matching. Briefly explain why it works.

Problem 6.17.
Suppose that Harry is one of the boys and Alice is one of the girls in the Mating Ritual. Which of the properties below are preserved invariants? Why?

a. Alice is the only girl on Harry's list.

b. There is a girl who does not have any boys serenading her.

c. If Alice is not on Harry's list, then Alice has a suitor that she prefers to Harry.

d. Alice is crossed off Harry's list, and Harry prefers Alice to anyone he is serenading.

e. If Alice is on Harry's list, then she prefers Harry to any suitor she has.

Problem 6.18.
Prove that in a stable set of marriages, every man is the pessimal husband of his optimal wife.

Hint: Follows directly from the definition of "rogue couple."

Problem 6.19.
In the Mating Ritual for stable marriages between an equal number of boys and girls, explain why there must be a girl to whom no boy proposes (serenades) until the last day.

Class Problems

Problem 6.20.
The preferences among 4 boys and 4 girls are partially specified in the following table:

B1: G1 G2 – –
B2: G2 G1 – –
B3: – – G4 G3
B4: – – G3 G4
G1: B2 B1 – –
G2: B1 B2 – –
G3: – – B3 B4
G4: – – B4 B3

(a) Verify that

$$(B1, G1),\ (B2, G2),\ (B3, G3),\ (B4, G4)$$

will be a stable matching whatever the unspecified preferences may be.

(b) Explain why the stable matching above is neither boy-optimal nor boy-pessimal and so will not be an outcome of the Mating Ritual.

(c) Describe how to define a set of marriage preferences among $n$ boys and $n$ girls which have at least $2^{n/2}$ stable assignments.

Hint: Arrange the boys into a list of $n/2$ pairs, and likewise arrange the girls into a list of $n/2$ pairs of girls. Choose preferences so that the $k$th pair of boys ranks the $k$th pair of girls just below the previous pairs of girls, and likewise for the $k$th pair of girls. Within the $k$th pairs, make sure each boy's first choice girl in the pair prefers the other boy in the pair.

Problem 6.21.
The Mating Ritual of Section 6.4.1 for finding stable marriages works even when the numbers of men and women are not equal. As before, a set of (monogamous) marriages between men and women is called stable when it has no "rogue couples."

(a) Extend the definition of rogue couple so it covers the case of unmarried men and women. Verify that in a stable set of marriages, either all the men are married or all the women are married.

(b) Explain why even in the case of unequal numbers of men and women, applying the Mating Ritual will yield a stable matching.

Homework Problems

Problem 6.22.
Suppose we want to assign pairs of "buddies," who may be of the same sex, where each person has a preference rank for who they would like to be buddies with. For the preference ranking given in Figure 6.5, show that there is no stable buddy assignment. In this figure Mergatroid's preferences aren't shown because they don't even matter.
Problem 6.23.
The most famous application of stable matching was in assigning graduating medical students to hospital residencies. Each hospital has a preference ranking of students, and each student has a preference ranking of hospitals, but unlike finding stable marriages between an equal number of boys and girls, hospitals generally have differing numbers of available residencies, and the total number of residencies may not equal the number of graduating students.

Explain how to adapt the Stable Matching problem with an equal number of boys and girls to this more general situation. In particular, modify the definition of stable matching so it applies in this situation, and explain how to adapt the Mating Ritual to handle it.

[Figure 6.5: Some preferences with no stable buddy matching. The diagram shows Alex, Robin, Bobby Joe, and Mergatroid, with numbered preference ranks.]

Problem 6.24.
Give an example of a stable matching between 3 boys and 3 girls where no person gets their first choice. Briefly explain why your matching is stable. Can your matching be obtained from the Mating Ritual or the Ritual with boys and girls reversed?

Problem 6.25.
In a stable matching between an equal number of boys and girls produced by the Mating Ritual, call a person lucky if they are matched up with someone in the top half of their preference list. Prove that there must be at least one lucky person.

Hint: The average number of times a boy gets rejected by girls.

Problem 6.26.
Suppose there are two stable sets of marriages. So each man has a first wife and a second wife, and likewise each woman has a first husband and a second husband.

Someone in a given marriage is a winner when they prefer their current spouse to their other spouse, and they are a loser when they prefer their other spouse to their current spouse. (If someone has the same spouse in both of their marriages, then they will be neither a winner nor a loser.)
We will show that

In every marriage, someone is a winner iff their spouse is a loser. (WL)

This will lead to an alternative proof of Theorem 6.4.9 that when men are married to their optimal spouses, women must be married to their pessimal spouses. This alternative proof does not depend on the Mating Ritual of Section 6.4.1.

(a) The left to right direction of (WL) is equivalent to the assertion that married partners cannot both be winners. Explain why this follows directly from the definition of rogue couple.

The right to left direction of (WL) is equivalent to the assertion that a married couple cannot both be losers. This will follow by comparing the number of winners and losers among the marriages.

(b) Explain why the number of winners must equal the number of losers among the two sets of marriages.

(c) Complete the proof of (WL) by showing that if some married couple were both losers, then there must be another couple who were both winners.

(d) Conclude that in a stable set of marriages, someone's spouse is optimal iff they are pessimal for their spouse.

Problem 6.27.
Suppose there are two stable sets of marriages, a first set and a second set. So each man has a first wife and a second wife (they may be the same), and likewise each woman has a first husband and a second husband. We can form a third set of marriages by matching each man with the wife he prefers among his first and second wives.

(a) Prove that this third set of marriages is an exact matching: no woman is married to two men.

(b) Prove that this third marriage set is stable.

Hint: You may assume the following fact from Problem 6.26.

In every marriage, someone is a winner iff their spouse is a loser. (SL)

Problem 6.28.
A state machine has commuting transitions if for any states $p, q, r$

$$(p \to q \text{ AND } p \to r) \text{ IMPLIES } \exists t.\ q \to t \text{ AND } r \to t.$$

The state machine is confluent if

$$(p \to^* q \text{ AND } p \to^* r) \text{ IMPLIES } \exists t.\ q \to^* t \text{ AND } r \to^* t,$$

where $\to^*$ means that the second state is reachable from the first in zero or more transitions.

(a) Prove that if a state machine has commuting transitions, then it is confluent.

Hint: By induction on the number of moves from $p$ to $q$ plus the number from $p$ to $r$.

(b) A final state of a state machine is one from which no transition is possible. Explain why, if a state machine is confluent, then at most one final state is reachable from the start state.

Problem 6.29.
According to the day-by-day description of the Mating Ritual of Section 6.4.1, at the end of each day, every man's list is updated to remove the name of the woman who rejected him. But it's easier, and more flexible, simply to let one woman reject one suitor at a time.

In particular, the states of this Flexible Mating Ritual state machine will be the same as for the day-by-day Ritual: a state will be a list, for each man, of the women who have not rejected him. But now a transition will be to choose two men who are serenading the same woman (that is, who have the same woman at the top of their current lists) and then have the woman reject whichever of the two she likes less. So the only change in state is that the name of the serenaded woman gets deleted from the top of the list of the man she liked less among two of her serenaders; everything else stays the same.

It's a worthwhile review to verify that the same preserved invariants used to establish the properties of the Mating Ritual will apply to the Flexible Mating Ritual. This ensures that the Flexible Ritual will also terminate with a stable set of marriages. But now a new issue arises: we know that there can be many possible sets of stable marriages for the same set of men/women preferences. So it seems possible that the Flexible Ritual might terminate with different stable marriage sets, depending on which choice of transition was made at each state.
But this does not happen: the Flexible Ritual will always terminate with the same set of stable marriages as the day-by-day Ritual.

To prove this, we begin with a definition: a state machine has commuting transitions if for any states $p, q, r$,

$$(p \to q \text{ AND } p \to r) \text{ IMPLIES } \exists t.\ q \to t \text{ AND } r \to t.$$

(a) Verify that the Flexible Mating Ritual has commuting transitions.

(b) Now conclude from Problem 6.28 that the Flexible Mating Ritual always terminates with the same set of stable marriages as the day-by-day Ritual.

Exam Problems

Problem 6.30.
Four unfortunate children want to be adopted by four foster families of ill repute. A child can only be adopted by one family, and a family can only adopt one child. Here are their preference rankings (most-favored to least-favored):

Child: Families
Bottlecap: Hatfields, McCoys, Grinches, Scrooges
Lucy: Grinches, Scrooges, McCoys, Hatfields
Dingdong: Hatfields, Scrooges, Grinches, McCoys
Zippy: McCoys, Grinches, Scrooges, Hatfields

Family: Children
Grinches: Zippy, Dingdong, Bottlecap, Lucy
Hatfields: Zippy, Bottlecap, Dingdong, Lucy
Scrooges: Bottlecap, Lucy, Dingdong, Zippy
McCoys: Lucy, Zippy, Bottlecap, Dingdong

(a) Exhibit two different stable matchings of Children and Families.

Family: Child in 1st match, Child in 2nd match
Grinches:
Hatfields:
Scrooges:
McCoys:

(b) Examine the matchings from part (a), and explain why these matchings are the only two possible stable matchings between Children and Families.

Hint: In general, there may be many more than two stable matchings for the same set of preferences.

Problem 6.31.
The Mating Ritual 6.4.1 for finding stable marriages works without change when there are at least as many, and possibly more, men than women. You may assume this.
So the Ritual ends with all the women married and no rogue couples for these marriages, where an unmarried man and a married woman who prefers him to her spouse is also considered to be a "rogue couple."

Let Alice be one of the women, and Bob be one of the men. Indicate which of the properties below are preserved invariants of the Mating Ritual 6.4.1 when there are at least as many men as women. Briefly explain your answers.

(a) Alice has a suitor (man who is serenading her) whom she prefers to Bob.

(b) Alice is the only woman on Bob's list.

(c) Alice has no suitor.

(d) Bob prefers Alice to the women he is serenading.

(e) Bob is serenading Alice.

(f) Bob is not serenading Alice.

(g) Bob's list of women to serenade is empty.

Problem 6.32.
We want a stable matching between $n$ boys and $n$ girls for a positive integer $n$.

(a) Explain how to define preference rankings for the boys and the girls that allow only one possible stable matching. Briefly justify your answer.

(b) Mark each of the following predicates about the Stable Marriage Ritual P if it is a Preserved Invariant, N if it is not, and "U" if you are very unsure. "Bob's list" refers to the list of the women he has not crossed off.

(i) Alice is not on Bob's list.
(ii) No girl is on Bob's list.
(iii) Bob is the only boy serenading Alice.
(iv) Bob has fewer than 5 girls on his list.
(v) Bob prefers Alice to his favorite remaining girl.
(vi) Alice prefers her favorite current suitor to Bob.
(vii) Bob is serenading his optimal spouse.
(viii) Bob is serenading his pessimal spouse.
(ix) Alice's optimal spouse is serenading her.
(x) Alice's pessimal spouse is serenading her.

7 Recursive Data Types

Recursive data types play a central role in programming, and induction is really all about them.
Recursive data types are specified by recursive definitions, which say how to construct new data elements from previous ones. Along with each recursive data type there are recursive definitions of properties or functions on the data type. Most importantly, based on a recursive definition, there is a structural induction method for proving that all data of the given type have some property.

This chapter examines a few examples of recursive data types and recursively defined functions on them: strings of characters, "balanced" strings of brackets, the nonnegative integers, arithmetic expressions, and two-player games with perfect information.

7.1 Recursive Definitions and Structural Induction

We'll start off illustrating recursive definitions and proofs using the example of character strings. Normally we'd take strings of characters for granted, but it's informative to treat them as a recursive data type. In particular, strings are a nice first example because you will see recursive definitions of things that are easy to understand, or that you already know, so you can focus on how the definitions work without having to figure out what they are supposed to mean.

Definitions of recursive data types have two parts:

Base case(s) specifying that some known mathematical elements are in the data type, and

Constructor case(s) that specify how to construct new data elements from previously constructed elements or from base elements.

The definition of strings over a given character set $A$ follows this pattern:

Definition 7.1.1. Let $A$ be a nonempty set called an alphabet, whose elements are referred to as characters (also called letters, symbols, or digits). The recursive data type $A^*$ of strings over alphabet $A$ is defined as follows:

Base case: the empty string $\lambda$ is in $A^*$.

Constructor case: If $a \in A$ and $s \in A^*$, then the pair $\langle a, s \rangle \in A^*$.

So $\{0, 1\}^*$ are the binary strings.
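As a rough illustration of Definition 7.1.1, the following Python sketch builds strings as nested pairs. The names `EMPTY`, `cons`, `from_chars`, `to_chars`, and `is_string` are hypothetical helpers chosen for this example, and using `None` for the empty string is an assumption made here, not something the definition prescribes.

```python
# Assumption for this sketch: the empty string (lambda) is represented by None.
EMPTY = None

def cons(a, s):
    """Constructor case: form the pair <a, s> from a character a and a string s."""
    return (a, s)

def from_chars(chars):
    """Build the nested-pair string whose characters are those of `chars`, in order."""
    s = EMPTY
    for a in reversed(chars):
        s = cons(a, s)
    return s

def to_chars(s):
    """Flatten a nested-pair string back into a Python list of characters."""
    out = []
    while s is not EMPTY:
        a, s = s
        out.append(a)
    return out

def is_string(s, alphabet):
    """Check membership in A* by peeling off constructor applications."""
    while s is not EMPTY:
        if not (isinstance(s, tuple) and len(s) == 2 and s[0] in alphabet):
            return False
        s = s[1]
    return True  # reached the base case

if __name__ == "__main__":
    s = from_chars("10")
    print(s)
    print(is_string(s, {"0", "1"}))
```

The check in `is_string` mirrors the two cases of the definition: either the value is the base element, or it is a pair whose first component is a character and whose second component is again a string.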
The usual way to treat binary strings is as sequences of 0's and 1's. For example, we have identified the length-4 binary string 1011 as a sequence of bits, the 4-tuple $(1, 0, 1, 1)$. But according to the recursive Definition 7.1.1, this string would be represented by nested pairs, namely

$$\langle 1, \langle 0, \langle 1, \langle 1, \lambda \rangle \rangle \rangle \rangle.$$

These nested pairs are definitely cumbersome and may also seem bizarre, but they actually reflect the way that such lists of characters would be represented in programming languages like Scheme or Python, where $\langle a, s \rangle$ would correspond to $\text{cons}(a, s)$.

Notice that we haven't said exactly how the empty string is represented. It really doesn't matter, as long as we can recognize the empty string and not confuse it with any nonempty string.

Continuing the recursive approach, let's define the length of a string.

Definition 7.1.2. The length $|s|$ of a string $s$ is defined recursively based on Definition 7.1.1.

Base case: $|\lambda| ::= 0$.

Constructor case: $|\langle a, s \rangle| ::= 1 + |s|$.

This definition of length follows a standard pattern: functions on recursive data types can be defined recursively using the same cases as the data type definition. Specifically, to define a function $f$ on a recursive data type, define the value of $f$ for the base cases of the data type definition, then define the value of $f$ in each constructor case in terms of the values of $f$ on the component data items.

Let's do another example: the concatenation $s \cdot t$ of the strings $s$ and $t$ is the string consisting of the letters of $s$ followed by the letters of $t$. This is a perfectly clear mathematical definition of concatenation (except maybe for what to do with the empty string), and in terms of Scheme/Python lists, $s \cdot t$ would be the list $\text{append}(s, t)$. Here's a recursive definition of concatenation.

Definition 7.1.3.
The concatenation $s \cdot t$ of the strings $s, t \in A^*$ is defined recursively based on Definition 7.1.1:

Base case: $\lambda \cdot t ::= t$.

Constructor case: $\langle a, s \rangle \cdot t ::= \langle a, s \cdot t \rangle$.

7.1.1 Structural Induction

Structural induction is a method for proving that all the elements of a recursively defined data type have some property. A structural induction proof has two parts corresponding to the recursive definition:

Prove that each base case element has the property.

Prove that each constructor case element has the property, when the constructor is applied to elements that have the property.

For example, in the base case of the definition of concatenation 7.1.3, we defined concatenation so the empty string was a "left identity," namely, $\lambda \cdot s ::= s$. We want the empty string also to be a "right identity," namely, $s \cdot \lambda = s$. Being a right identity is not part of Definition 7.1.3, but we can prove it easily by structural induction:

Lemma 7.1.4. $s \cdot \lambda = s$ for all $s \in A^*$.

Proof. The proof is by structural induction on the recursive definition 7.1.3 of concatenation. The induction hypothesis will be

$$P(s) ::= [s \cdot \lambda = s].$$

Base case: ($s = \lambda$).

$$\begin{aligned} s \cdot \lambda &= \lambda \cdot \lambda \\ &= \lambda && (\lambda \text{ is a left identity by Def 7.1.3}) \\ &= s. \end{aligned}$$

Constructor case: ($s = \langle a, t \rangle$).

$$\begin{aligned} s \cdot \lambda &= \langle a, t \rangle \cdot \lambda \\ &= \langle a, t \cdot \lambda \rangle && (\text{constructor case of Def 7.1.3}) \\ &= \langle a, t \rangle && (\text{by induction hypothesis } P(t)) \\ &= s. \end{aligned}$$

So $P(s)$ holds. This completes the proof of the constructor case, and we conclude by structural induction that Lemma 7.1.4 holds for all $s \in A^*$.

We can also verify properties of recursive functions by structural induction on their definitions. For example, let's verify the familiar fact that the length of the concatenation of two strings is the sum of their lengths:

Lemma. $|s \cdot t| = |s| + |t|$ for all $s, t \in A^*$.

Proof. By structural induction on the definition of $s \in A^*$.
The induction hypothesis is

\[ P(s) ::= \forall t \in A^*.\ |s \cdot t| = |s| + |t|. \]

Base case ($s = \lambda$):

\begin{align*}
|s \cdot t| &= |\lambda \cdot t| \\
&= |t| && \text{(base case of Def 7.1.3 of concatenation)} \\
&= 0 + |t| \\
&= |s| + |t| && \text{(Def of $|\lambda|$).}
\end{align*}

Constructor case: ($s ::= \langle a, r \rangle$).

\begin{align*}
|s \cdot t| &= |\langle a, r \rangle \cdot t| \\
&= |\langle a, r \cdot t \rangle| && \text{(constructor case of Def of concat)} \\
&= 1 + |r \cdot t| && \text{(constructor case of def of length)} \\
&= 1 + (|r| + |t|) && \text{(ind. hyp. $P(r)$)} \\
&= (1 + |r|) + |t| \\
&= |\langle a, r \rangle| + |t| && \text{(constructor case, def of length)} \\
&= |s| + |t|.
\end{align*}

This proves that $P(s)$ holds, completing the constructor case. By structural induction, we conclude that $P(s)$ holds for all strings $s \in A^*$.

These proofs illustrate the general principle:

The Principle of Structural Induction. Let $P$ be a predicate on a recursively defined data type $R$. If

• $P(b)$ is true for each base case element $b \in R$, and

• for all two-argument constructors $c$, $[P(r) \text{ AND } P(s)] \text{ IMPLIES } P(c(r, s))$ for all $r, s \in R$, and likewise for all constructors taking other numbers of arguments,

then $P(r)$ is true for all $r \in R$.

7.2 Strings of Matched Brackets

Let $\{\texttt{]}, \texttt{[}\}^*$ be the set of all strings of square brackets. For example, the following two strings are in $\{\texttt{]}, \texttt{[}\}^*$:

    []][[[[[]]   and   [[[]][]][]        (7.1)

A string $s \in \{\texttt{]}, \texttt{[}\}^*$ is called a matched string if its brackets “match up” in the usual way. For example, the left-hand string above is not matched because its second right bracket does not have a matching left bracket. The string on the right is matched.

We’re going to examine several different ways to define and prove properties of matched strings using recursively defined sets and functions. These properties are pretty straightforward, and you might wonder whether they have any particular relevance in computer science. The honest answer is “not much relevance any more.” The reason for this is one of the great successes of computer science, as explained in the text box below.
Expression Parsing

During the early development of computer science in the 1950’s and 60’s, creation of effective programming language compilers was a central concern. A key aspect in processing a program for compilation was expression parsing. One significant problem was to take an expression like

\[ x + y * z^2 \div y + 7 \]

and put in the brackets that determined how it should be evaluated. Should it be

\[ [[x + y] * z^2 \div y] + 7, \text{ or, } \quad x + [y * z^2 \div [y + 7]], \text{ or, } \quad [x + [y * z^2]] \div [y + 7], \text{ or } \ldots? \]

The Turing award (the “Nobel Prize” of computer science) was ultimately bestowed on Robert W. Floyd, for, among other things, discovering simple procedures that would insert the brackets properly.

In the 70’s and 80’s, this parsing technology was packaged into high-level compiler-compilers that automatically generated parsers from expression grammars. This automation of parsing was so effective that the subject no longer demanded attention. It had largely disappeared from the computer science curriculum by the 1990’s.

The matched strings can be nicely characterized as a recursive data type:

Definition 7.2.1. Recursively define the set RecMatch of strings as follows:

Base case: $\lambda \in \text{RecMatch}$.

Constructor case: If $s, t \in \text{RecMatch}$, then

\[ [\, s \,]\, t \in \text{RecMatch}. \]

Here $[\, s \,]\, t$ refers to the concatenation of strings which would be written in full as

\[ [ \cdot (s \cdot (] \cdot t)). \]

From now on, we’ll usually omit the “$\cdot$’s.”

Using this definition, $\lambda \in \text{RecMatch}$ by the base case, so letting $s = t = \lambda$ in the constructor case implies

\[ [\,\lambda\,]\,\lambda = [\,] \in \text{RecMatch}. \]

Now,

\begin{align*}
[\,\lambda\,]\,[\,] &= [\,]\,[\,] \in \text{RecMatch} && \text{(letting $s = \lambda$, $t = [\,]$)} \\
[\,[\,]\,]\,\lambda &= [\,[\,]\,] \in \text{RecMatch} && \text{(letting $s = [\,]$, $t = \lambda$)} \\
& \phantom{{}={}} [\,[\,]\,]\,[\,] \in \text{RecMatch} && \text{(letting $s = [\,]$, $t = [\,]$)}
\end{align*}

are also strings in RecMatch by repeated applications of the constructor case; and so on.
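The constructor case of Definition 7.2.1 can be read backwards as a membership test: a nonempty string is in RecMatch exactly when it splits as a left bracket, a string s in RecMatch, a right bracket, and a string t in RecMatch. The following is a minimal sketch in Python, assuming the input contains only the two bracket characters; the function name in_rec_match is a hypothetical helper and not part of the text.

```python
def in_rec_match(s: str) -> bool:
    """Test membership in RecMatch for a string of '[' and ']' only."""
    # Base case: the empty string is in RecMatch.
    if s == "":
        return True
    # Constructor case: the string must have the form [ s1 ] t.
    if s[0] != "[":
        return False
    depth = 0
    for i, ch in enumerate(s):
        depth += 1 if ch == "[" else -1
        if depth == 0:
            # Position i holds the ']' that closes the leading '['.
            return in_rec_match(s[1:i]) and in_rec_match(s[i + 1:])
    # The leading '[' is never closed.
    return False


print(in_rec_match("[]][[[[[]]"))  # left-hand string of (7.1)
print(in_rec_match("[[[]][]][]"))  # right-hand string of (7.1)
```

The split point is the first position at which the running bracket depth returns to zero, which is the only place the right bracket of the constructor case can sit when s is itself in RecMatch.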
It’s pretty obvious that in order for brackets to match, there had better be an equal number of left and right ones. For further practice, let’s carefully prove this from the recursive definitions, beginning with a recursive definition of the number $\#_c(s)$ of occurrences of the character $c \in A$ in a string $s$:

Definition 7.2.2.

Base case: $\#_c(\lambda) ::= 0$.

Constructor case:

\[ \#_c(\langle a, s \rangle) ::= \begin{cases} \#_c(s) & \text{if } a \ne c, \\ 1 + \#_c(s) & \text{if } a = c. \end{cases} \]

The following Lemma follows directly by structural induction on Definition 7.2.2. We’ll leave the proof for practice (Problem 7.9).

Lemma 7.2.3.

\[ \#_c(s \cdot t) = \#_c(s) + \#_c(t). \]

Lemma. Every string in RecMatch has an equal number of left and right brackets.

Proof. The proof is by structural induction with induction hypothesis

\[ P(s) ::= \left[ \#_{[}(s) = \#_{]}(s) \right]. \]

Base case: $P(\lambda)$ holds because

\[ \#_{[}(\lambda) = 0 = \#_{]}(\lambda) \]

by the base case of Definition 7.2.2 of $\#_c(\cdot)$.

Constructor case: By structural induction hypothesis, we assume $P(s)$ and $P(t)$ and must show $P([\, s \,]\, t)$:

\begin{align*}
\#_{[}([\, s \,]\, t) &= \#_{[}([) + \#_{[}(s) + \#_{[}(]) + \#_{[}(t) && \text{(Lemma 7.2.3)} \\
&= 1 + \#_{[}(s) + 0 + \#_{[}(t) && \text{(def of $\#_{[}(\cdot)$)} \\
&= 1 + \#_{]}(s) + 0 + \#_{]}(t) && \text{(by $P(s)$ and $P(t)$)} \\
&= 0 + \#_{]}(s) + 1 + \#_{]}(t) \\
&= \#_{]}([) + \#_{]}(s) + \#_{]}(]) + \#_{]}(t) && \text{(def of $\#_{]}(\cdot)$)} \\
&= \#_{]}([\, s \,]\, t) && \text{(Lemma 7.2.3)}
\end{align*}

This completes the proof of the constructor case. We conclude by structural induction that $P(s)$ holds for all $s \in \text{RecMatch}$.

Warning: When a recursive definition of a data type allows the same element to be constructed in more than one way, the definition is said to be ambiguous. We were careful to choose an unambiguous definition of RecMatch to ensure that functions defined recursively on its definition would always be well-defined. Recursively defining a function on an ambiguous data type definition usually will not work. To illustrate the problem, here’s another definition of the matched strings.

Definition 7.2.4.
Define the set, $\text{AmbRecMatch} \subseteq \{\texttt{]}, \texttt{[}\}^*$ recursively as follows:

Base case: $\lambda \in \text{AmbRecMatch}$,

Constructor cases: if $s, t \in \text{AmbRecMatch}$, then the strings $[\, s \,]$ and $st$ are also in AmbRecMatch.

It’s pretty easy to see that the definition of AmbRecMatch is just another way to define RecMatch, that is, AmbRecMatch = RecMatch (see Problem 7.19). The definition of AmbRecMatch is arguably easier to understand, but we didn’t use it because it’s ambiguous, while the trickier definition of RecMatch is unambiguous. Here’s why this matters. Let’s define the number of operations $f(s)$ to construct a matched string $s$ recursively on the definition of $s \in \text{AmbRecMatch}$:

\begin{align*}
f(\lambda) &::= 0, && \text{($f$ base case)} \\
f([\, s \,]) &::= 1 + f(s), \\
f(st) &::= 1 + f(s) + f(t). && \text{($f$ concat case)}
\end{align*}

This definition may seem ok, but it isn’t: $f(\lambda)$ winds up with two values, and consequently:

\begin{align*}
0 &= f(\lambda) && \text{($f$ base case)} \\
&= f(\lambda \cdot \lambda) && \text{(concat def, base case)} \\
&= 1 + f(\lambda) + f(\lambda) && \text{($f$ concat case)} \\
&= 1 + 0 + 0 = 1 && \text{($f$ base case).}
\end{align*}

This is definitely not a situation we want to be in!

7.3 Recursive Functions on Nonnegative Integers

The nonnegative integers can be understood as a recursive data type.

Definition 7.3.1. The set $\mathbb{N}$ is a data type defined recursively as:

• $0 \in \mathbb{N}$.

• If $n \in \mathbb{N}$, then the successor $n + 1$ of $n$ is in $\mathbb{N}$.

The point here is to make it clear that ordinary induction is simply the special case of structural induction on the recursive Definition 7.3.1. This also justifies the familiar recursive definitions of functions on the nonnegative integers.

7.3.1 Some Standard Recursive Functions on $\mathbb{N}$

Example 7.3.2. The factorial function. This function is often written “$n!$.” You will see a lot of it in later chapters. Here, we’ll use the notation fac(n):

• $\text{fac}(0) ::= 1$.

• $\text{fac}(n + 1) ::= (n + 1) \cdot \text{fac}(n)$ for $n \ge 0$.

Example 7.3.3. Summation notation. Let “$S(n)$” abbreviate the expression “$\sum_{i=1}^{n} f(i)$.” We can recursively define $S(n)$ with the rules

• $S(0) ::= 0$.
• $S(n + 1) ::= f(n + 1) + S(n)$ for $n \ge 0$.

7.3.2 Ill-formed Function Definitions

There are some other blunders to watch out for when defining functions recursively. The main problems come when recursive definitions don’t follow the recursive definition of the underlying data type. Below are some function specifications that resemble good definitions of functions on the nonnegative integers, but really aren’t.

\[ f_1(n) ::= 2 + f_1(n - 1). \tag{7.2} \]

This “definition” has no base case. If some function $f_1$ satisfied (7.2), so would a function obtained by adding a constant to the value of $f_1$. So equation (7.2) does not uniquely define an $f_1$.

\[ f_2(n) ::= \begin{cases} 0, & \text{if } n = 0, \\ f_2(n + 1) & \text{otherwise.} \end{cases} \tag{7.3} \]

This “definition” has a base case, but still doesn’t uniquely determine $f_2$. Any function that is 0 at 0 and constant everywhere else would satisfy the specification, so (7.3) also does not uniquely define anything.

In a typical programming language, evaluation of $f_2(1)$ would begin with a recursive call of $f_2(2)$, which would lead to a recursive call of $f_2(3)$, . . . with recursive calls continuing without end. This “operational” approach interprets (7.3) as defining a partial function $f_2$ that is undefined everywhere but 0.

\[ f_3(n) ::= \begin{cases} 0, & \text{if $n$ is divisible by 2,} \\ 1, & \text{if $n$ is divisible by 3,} \\ 2, & \text{otherwise.} \end{cases} \tag{7.4} \]

This “definition” is inconsistent: it requires $f_3(6) = 0$ and $f_3(6) = 1$, so (7.4) doesn’t define anything.

Mathematicians have been wondering for a while about this function specification, known as the Collatz conjecture:

\[ f_4(n) ::= \begin{cases} 1, & \text{if } n \le 1, \\ f_4(n/2) & \text{if $n > 1$ is even,} \\ f_4(3n + 1) & \text{if $n > 1$ is odd.} \end{cases} \tag{7.5} \]

For example, $f_4(3) = 1$ because

\[ f_4(3) ::= f_4(10) ::= f_4(5) ::= f_4(16) ::= f_4(8) ::= f_4(4) ::= f_4(2) ::= f_4(1) ::= 1. \]
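The chain of arguments visited when (7.5) is unwound for $f_4(3)$ can be reproduced mechanically. Below is a minimal sketch in Python of the “operational” reading of (7.5); the name f4_trace is a hypothetical helper, and whether its loop stops for every nonnegative integer is exactly the open question, so this is an illustration and not a definition.

```python
def f4_trace(n: int) -> list[int]:
    """Arguments visited while unwinding specification (7.5) from n."""
    trace = [n]
    while n > 1:
        if n % 2 == 0:
            n = n // 2       # case: n > 1 is even
        else:
            n = 3 * n + 1    # case: n > 1 is odd
        trace.append(n)
    return trace


print(f4_trace(3))  # [3, 10, 5, 16, 8, 4, 2, 1]
```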
The constant function equal to 1 will satisfy (7.5), but it’s not known if another function does as well. The problem is that the third case specifies $f_4(n)$ in terms of $f_4$ at arguments larger than $n$, and so cannot be justified by induction on $\mathbb{N}$. It’s known that any $f_4$ satisfying (7.5) equals 1 for all $n$ up to over $10^{18}$.

A final example is the Ackermann function, which is an extremely fast-growing function of two nonnegative arguments. Its inverse is correspondingly slow-growing: it grows slower than $\log n$, $\log \log n$, $\log \log \log n$, . . . , but it does grow unboundedly. This inverse actually comes up in analyzing a useful, highly efficient procedure known as the Union-Find algorithm. This algorithm was conjectured to run in a number of steps that grew linearly in the size of its input, but turned out to be “linear” with a slow-growing coefficient nearly equal to the inverse Ackermann function. This means that pragmatically, Union-Find is linear, since the theoretically growing coefficient is less than 5 for any input that could conceivably come up.

The Ackermann function can be defined recursively as the function $A$ given by the following rules:

\begin{align*}
A(m, n) &= 2n && \text{if $m = 0$ or $n \le 1$,} \tag{7.6} \\
A(m, n) &= A(m - 1, A(m, n - 1)) && \text{otherwise.} \tag{7.7}
\end{align*}

Now these rules are unusual because the definition of $A(m, n)$ involves an evaluation of $A$ at arguments that may be a lot bigger than $m$ and $n$. The definition of $f_2$ above showed how definitions of function values at small argument values in terms of larger ones can easily lead to nonterminating evaluations. The definition of the Ackermann function is actually ok, but proving this takes some ingenuity (see Problem 7.25).

7.4 Arithmetic Expressions

Expression evaluation is a key feature of programming languages, and recognition of expressions as a recursive data type is a key to understanding how they can be processed.
To illustrate this approach we’ll work with a toy example: arithmetic expressions like $3x^2 + 2x + 1$ involving only one variable, “$x$.” We’ll refer to the data type of such expressions as Aexp. Here is its definition:

Definition 7.4.1.

Base cases:

– The variable $x$ is in Aexp.

– The arabic numeral k for any nonnegative integer $k$ is in Aexp.

Constructor cases: If $e, f \in \text{Aexp}$, then

– $[\, e + f \,] \in \text{Aexp}$. The expression $[\, e + f \,]$ is called a sum. The Aexp’s $e$ and $f$ are called the components of the sum; they’re also called the summands.

– $[\, e * f \,] \in \text{Aexp}$. The expression $[\, e * f \,]$ is called a product. The Aexp’s $e$ and $f$ are called the components of the product; they’re also called the multiplier and multiplicand.

– $-[\, e \,] \in \text{Aexp}$. The expression $-[\, e \,]$ is called a negative.

Notice that Aexp’s are fully bracketed, and exponents aren’t allowed. So the Aexp version of the polynomial expression $3x^2 + 2x + 1$ would officially be written as

\[ [\,[\, 3 * [\, x * x \,]\,] + [\,[\, 2 * x \,] + 1 \,]\,]. \tag{7.8} \]

These brackets and $*$’s clutter up examples, so we’ll often use simpler expressions like “$3x^2 + 2x + 1$” instead of (7.8). But it’s important to recognize that $3x^2 + 2x + 1$ is not an Aexp; it’s an abbreviation for an Aexp.

7.4.1 Evaluation and Substitution with Aexp’s

Evaluating Aexp’s

Since the only variable in an Aexp is $x$, the value of an Aexp is determined by the value of $x$. For example, if the value of $x$ is 3, then the value of $3x^2 + 2x + 1$ is 34. In general, given any Aexp $e$ and an integer value $n$ for the variable $x$ we can evaluate $e$ to find its value $\text{eval}(e, n)$. It’s easy, and useful, to specify this evaluation process with a recursive definition.

Definition 7.4.2. The evaluation function, $\text{eval} : \text{Aexp} \times \mathbb{Z} \to \mathbb{Z}$, is defined recursively on expressions $e \in \text{Aexp}$ as follows. Let $n$ be any integer.

Base cases:

$\text{eval}(x, n) ::= n$ (value of variable $x$ is $n$), (7.9)

$\text{eval}(k, n) ::= k$ (value of numeral k is $k$, regardless of $x$.)
(7.10)

Constructor cases:

\begin{align*}
\text{eval}([\, e_1 + e_2 \,], n) &::= \text{eval}(e_1, n) + \text{eval}(e_2, n), \tag{7.11} \\
\text{eval}([\, e_1 * e_2 \,], n) &::= \text{eval}(e_1, n) \cdot \text{eval}(e_2, n), \tag{7.12} \\
\text{eval}(-[\, e_1 \,], n) &::= -\text{eval}(e_1, n). \tag{7.13}
\end{align*}

For example, here’s how the recursive definition of eval would arrive at the value of $3 + x^2$ when $x$ is 2:

\begin{align*}
\text{eval}([\, 3 + [\, x * x \,]\,], 2) &= \text{eval}(3, 2) + \text{eval}([\, x * x \,], 2) && \text{(by Def 7.4.2, (7.11))} \\
&= 3 + \text{eval}([\, x * x \,], 2) && \text{(by Def 7.4.2, (7.10))} \\
&= 3 + (\text{eval}(x, 2) \cdot \text{eval}(x, 2)) && \text{(by Def 7.4.2, (7.12))} \\
&= 3 + (2 \cdot 2) && \text{(by Def 7.4.2, (7.9))} \\
&= 3 + 4 = 7.
\end{align*}

Substituting into Aexp’s

Substituting expressions for variables is a standard operation used by compilers and algebra systems. For example, the result of substituting the expression $3x$ for $x$ in the expression $x(x - 1)$ would be $3x(3x - 1)$. We’ll use the general notation $\text{subst}(f, e)$ for the result of substituting an Aexp $f$ for each of the $x$’s in an Aexp $e$. So as we just explained,

\[ \text{subst}(3x, x(x - 1)) = 3x(3x - 1). \]

This substitution function has a simple recursive definition:

Definition 7.4.3. The substitution function from $\text{Aexp} \times \text{Aexp}$ to Aexp is defined recursively on expressions $e \in \text{Aexp}$ as follows. Let $f$ be any Aexp.

Base cases:

$\text{subst}(f, x) ::= f$ (subbing $f$ for variable $x$ just gives $f$,) (7.14)

$\text{subst}(f, k) ::= k$ (subbing into a numeral does nothing.)
(7.15)

Constructor cases:

\begin{align*}
\text{subst}(f, [\, e_1 + e_2 \,]) &::= [\, \text{subst}(f, e_1) + \text{subst}(f, e_2) \,] \tag{7.16} \\
\text{subst}(f, [\, e_1 * e_2 \,]) &::= [\, \text{subst}(f, e_1) * \text{subst}(f, e_2) \,] \tag{7.17} \\
\text{subst}(f, -[\, e_1 \,]) &::= -[\, \text{subst}(f, e_1) \,]. \tag{7.18}
\end{align*}

Here’s how the recursive definition of the substitution function would find the result of substituting $3x$ for $x$ in the expression $x(x - 1)$:

\begin{align*}
& \text{subst}(3x, x(x - 1)) \\
&= \text{subst}([\, 3 * x \,], [\, x * [\, x + -[\, 1 \,]\,]\,]) && \text{(unabbreviating)} \\
&= [\, \text{subst}([\, 3 * x \,], x) * \text{subst}([\, 3 * x \,], [\, x + -[\, 1 \,]\,]) \,] && \text{(by Def 7.4.3, (7.17))} \\
&= [\,[\, 3 * x \,] * \text{subst}([\, 3 * x \,], [\, x + -[\, 1 \,]\,]) \,] && \text{(by Def 7.4.3, (7.14))} \\
&= [\,[\, 3 * x \,] * [\, \text{subst}([\, 3 * x \,], x) + \text{subst}([\, 3 * x \,], -[\, 1 \,]) \,]\,] && \text{(by Def 7.4.3, (7.16))} \\
&= [\,[\, 3 * x \,] * [\,[\, 3 * x \,] + -[\, \text{subst}([\, 3 * x \,], 1) \,]\,]\,] && \text{(by Def 7.4.3, (7.14) \& (7.18))} \\
&= [\,[\, 3 * x \,] * [\,[\, 3 * x \,] + -[\, 1 \,]\,]\,] && \text{(by Def 7.4.3, (7.15))} \\
&= 3x(3x - 1) && \text{(abbreviation)}
\end{align*}

Now suppose we have to find the value of $\text{subst}(3x, x(x - 1))$ when $x = 2$. There are two approaches.

First, we could actually do the substitution above to get $3x(3x - 1)$, and then we could evaluate $3x(3x - 1)$ when $x = 2$, that is, we could recursively calculate $\text{eval}(3x(3x - 1), 2)$ to get the final value 30. This approach is described by the expression

\[ \text{eval}(\text{subst}(3x, x(x - 1)), 2). \tag{7.19} \]

In programming jargon, this would be called evaluation using the Substitution Model. With this approach, the formula $3x$ appears twice after substitution, so the multiplication $3 \cdot 2$ that computes its value gets performed twice.

The second approach is called evaluation using the Environment Model. Here, to compute the value of (7.19), we evaluate $3x$ when $x = 2$ using just 1 multiplication to get the value 6. Then we evaluate $x(x - 1)$ when $x$ has this value 6 to arrive at the value $6 \cdot 5 = 30$.
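Both models can be tried out directly. The following is a minimal sketch in Python of Definitions 7.4.2 and 7.4.3, under an assumed encoding that is not from the text: the variable is the string "x", a numeral is a Python int, and the constructors are tuples tagged "+", "*" and "-". The names eval_aexp, subst, three_x and x_times_x_minus_1 are hypothetical.

```python
def eval_aexp(e, n):
    """Value of the Aexp e when x has the integer value n (Def 7.4.2)."""
    if e == "x":                       # base case (7.9)
        return n
    if isinstance(e, int):             # base case (7.10)
        return e
    op = e[0]
    if op == "+":                      # sum (7.11)
        return eval_aexp(e[1], n) + eval_aexp(e[2], n)
    if op == "*":                      # product (7.12)
        return eval_aexp(e[1], n) * eval_aexp(e[2], n)
    if op == "-":                      # negative (7.13)
        return -eval_aexp(e[1], n)
    raise ValueError(f"not an Aexp: {e!r}")


def subst(f, e):
    """Result of substituting the Aexp f for each x in the Aexp e (Def 7.4.3)."""
    if e == "x":                       # base case (7.14)
        return f
    if isinstance(e, int):             # base case (7.15)
        return e
    op = e[0]
    if op in ("+", "*"):               # sum (7.16) and product (7.17)
        return (op, subst(f, e[1]), subst(f, e[2]))
    if op == "-":                      # negative (7.18)
        return ("-", subst(f, e[1]))
    raise ValueError(f"not an Aexp: {e!r}")


three_x = ("*", 3, "x")                               # [3 * x]
x_times_x_minus_1 = ("*", "x", ("+", "x", ("-", 1)))  # [x * [x + -[1]]]

# Substitution Model: substitute first, then evaluate at x = 2.
substitution_model = eval_aexp(subst(three_x, x_times_x_minus_1), 2)

# Environment Model: evaluate 3x at x = 2, then use that value for x.
environment_model = eval_aexp(x_times_x_minus_1, eval_aexp(three_x, 2))

print(substitution_model, environment_model)  # 30 30
```

The last assignment follows the Environment Model: the inner expression $3x$ is evaluated once, and its value 6 then serves as the value of $x$ in the outer evaluation.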
This approach is described by the expression

\[ \text{eval}(x(x - 1), \text{eval}(3x, 2)). \tag{7.20} \]

The Environment Model only computes the value of $3x$ once, and so it requires one fewer multiplication than the Substitution Model to compute (7.20).

This is a good place to stop and work this example out yourself (Problem 7.26).

The fact that the final integer values of (7.19) and (7.20) agree is no surprise. The Substitution Model and the Environment Model will always produce the same final value. We can prove this by structural induction directly following the definitions of the two approaches. More precisely, what we want to prove is

Theorem 7.4.4. For all expressions $e, f \in \text{Aexp}$ and $n \in \mathbb{Z}$,

\[ \text{eval}(\text{subst}(f, e), n) = \text{eval}(e, \text{eval}(f, n)). \tag{7.21} \]

Proof. The proof is by structural induction on $e$.¹

¹ This is an example of why it’s useful to notify the reader what the induction variable is: in this case it isn’t $n$.

Base cases:

Case[$x$]. The left-hand side of equation (7.21) equals $\text{eval}(f, n)$ by this base case in Definition 7.4.3 of the substitution function; the right-hand side also equals $\text{eval}(f, n)$ by this base case in Definition 7.4.2 of eval.

Case[k]. The left-hand side of equation (7.21) equals $k$ by this base case in Definitions 7.4.3 and 7.4.2 of the substitution and evaluation functions. Likewise, the right-hand side equals $k$ by two applications of this base case in Definition 7.4.2 of eval.

Constructor cases:

Case[$[\, e_1 + e_2 \,]$]. By the structural induction hypothesis (7.21), we may assume that for all $f \in \text{Aexp}$ and $n \in \mathbb{Z}$,

\[ \text{eval}(\text{subst}(f, e_i), n) = \text{eval}(e_i, \text{eval}(f, n)) \tag{7.22} \]

for $i = 1, 2$. We wish to prove that

\[ \text{eval}(\text{subst}(f, [\, e_1 + e_2 \,]), n) = \text{eval}([\, e_1 + e_2 \,], \text{eval}(f, n)). \tag{7.23} \]

The left-hand side of (7.23) equals

\[ \text{eval}([\, \text{subst}(f, e_1) + \text{subst}(f, e_2) \,], n) \]

by Definition 7.4.3.(7.16) of substitution into a sum expression.
But this equals

\[ \text{eval}(\text{subst}(f, e_1), n) + \text{eval}(\text{subst}(f, e_2), n) \]

by Definition 7.4.2.(7.11) of eval for a sum expression. By induction hypothesis (7.22), this in turn equals

\[ \text{eval}(e_1, \text{eval}(f, n)) + \text{eval}(e_2, \text{eval}(f, n)). \]

Finally, this last expression equals the right-hand side of (7.23) by Definition 7.4.2.(7.11) of eval for a sum expression. This proves (7.23) in this case.

Case[$[\, e_1 * e_2 \,]$]. Similar.

Case[$-[\, e_1 \,]$]. Even easier.

This covers all the constructor cases, and so completes the proof by structural induction.

7.5 Games as a Recursive Data Type

Chess, Checkers, Go, and Nim are examples of two-person games of perfect information. These are games where two players, Player-1 and Player-2, alternate moves, and “perfect information” means that the situation at any point in the game is completely visible to both players. In Chess, for example, the visible positions of the pieces on the chess board completely determine how the rest of the game can be played by each player. By contrast, most card games are not games of perfect information because neither player can see the other’s hand.

In this section we’ll examine the win-lose two-person games of perfect information, WL-2PerGm. We will define WL-2PerGm as a recursive data type, and then we will prove, by structural induction, a fundamental theorem about winning strategies for these games. The idea behind the recursive definition is to recognize that the situation at any point during game play can itself be treated as the start of a new game. This is clearest for the game of Nim.

A Nim game starts with several piles of stones. A move in the game consists of removing some positive number of stones from a single pile. Player-1 and Player-2 alternate making moves, and whoever takes the last stone wins. So if there is only one pile, then the first player to move wins by taking the whole pile.
On the other hand, if the game starts with just two piles, each with the same number of stones, then the player who moves second can guarantee a win simply by mimicking the first player. For example, this means that if the first player removes three stones from one pile, then the second player removes three stones from the other pile. At this point, it’s worth thinking for a moment about why the mimicking strategy guarantees a win for the second player.

We can think of the first move in a Nim game as simply picking another Nim game with different piles of stones to play next. For the Nim game $\text{Nim}_{\langle 3,4,5 \rangle}$ that starts with piles of 3, 4 and 5 stones, the first player can remove between one and three stones from the first pile, leading to three possible piles of stones

\[ \langle 2, 4, 5 \rangle, \quad \langle 1, 4, 5 \rangle, \quad \langle 4, 5 \rangle. \]

Similarly, the first player has five possible ways to remove stones from the last pile, leading to five possible piles of stones

\[ \langle 3, 4, 4 \rangle, \quad \langle 3, 4, 3 \rangle, \quad \langle 3, 4, 2 \rangle, \quad \langle 3, 4, 1 \rangle, \quad \langle 3, 4 \rangle. \]

So all the properties of $\text{Nim}_{\langle 3,4,5 \rangle}$ are captured by the set of $3 + 4 + 5 = 12$ Nim games that can result from the first move.

With this idea in mind, we now give the formal definition.

Definition 7.5.1. The class WL-2PerGm of two-person win-lose games of perfect information is defined recursively as follows:

Base case: win and lose are WL-2PerGm’s.

Constructor case: If $G$ is a nonempty set of WL-2PerGm’s, then $G$ is a WL-2PerGm game. Each game $M \in G$ is called a possible first move of $G$.

A play of a WL-2PerGm game is a sequence of moves that ends with a win or loss for the first player, or goes on forever without arriving at an outcome.² More formally:

Definition. A play of a WL-2PerGm game $G$ and its outcome is defined recursively on the definition of WL-2PerGm:

Base case: ($G$ = win). The sequence $\langle \text{win} \rangle$ of length one is a play of $G$. Its outcome is a win.

Base case: ($G$ = lose).
The sequence $\langle \text{lose} \rangle$ of length one is a play of $G$. Its outcome is a loss.

Constructor case: ($G$ is a nonempty set of WL-2PerGm’s). A play of $G$ is a sequence that starts with $G$ followed by a play $P_M$ of some game $M \in G$. The outcome of the play, if any, is the outcome of $P_M$.

² In English, “Nim game” might refer to the rules that define the game, but it might also refer to a particular play of the game, as in the once famous third game in the 1961 movie Last Year at Marienbad. It’s usually easy to figure out which way the phrase is being used, and we won’t worry about it.

The basic rules of some games do allow plays that go on forever. In Chess for example, a player might just keep moving the same piece back and forth, and if his opponent did the same, the play could go on forever.³ But the recursive definition of WL-2PerGm games actually rules out the possibility of infinite play.

Lemma 7.5.2. Every play of a game $G \in \text{WL-2PerGm}$ has an outcome.

Proof. We prove Lemma 7.5.2 by structural induction, using the statement of the Lemma as the induction hypothesis.

Base case: ($G$ = win). There is only one play of $G$, namely the length one play $\langle \text{win} \rangle$, whose outcome is a win.

Base case: ($G$ = lose). Likewise with the outcome being a loss.

Constructor case: ($G$ is a nonempty set of WL-2PerGm’s). A play of $G$ by definition consists of $G$ followed by a play $P_M$ for some $M \in G$. By structural induction, $P_M$ must be a sequence of some finite length $n$ that ends with an outcome. So this play of $G$ is a length $n + 1$ sequence that finishes with the same outcome.

Among the games of Checkers, Chess, Go and Nim, only Nim is genuinely a win-lose game. The other games might end in a tie (draw, stalemate, jigo) rather than a win or loss. However, by treating a tie in these games as a loss for the first player, the results about win-lose games will apply to games with ties.
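The recursive definitions of games and plays translate almost word for word into code. Here is a minimal sketch in Python, under an assumed encoding that is not from the text: the base games are the strings "win" and "lose", and a constructed game is a frozenset of games. The names plays, nim_first_moves and example are hypothetical. The sketch enumerates every play of a small game, and separately lists the positions reachable by one move in Nim.

```python
WIN, LOSE = "win", "lose"


def plays(game):
    """Yield every play of `game` as a list ending in its outcome."""
    if game in (WIN, LOSE):
        # Base cases: the only play is the length-one sequence.
        yield [game]
        return
    # Constructor case: the play starts with `game`, followed by a play
    # of some possible first move.
    for move in game:
        for rest in plays(move):
            yield [game] + rest


def nim_first_moves(piles):
    """Pile tuples reachable from `piles` by one Nim move."""
    result = []
    for i, size in enumerate(piles):
        for take in range(1, size + 1):
            left = size - take
            # An emptied pile is dropped from the tuple.
            middle = (left,) if left > 0 else ()
            result.append(piles[:i] + middle + piles[i + 1:])
    return result


# A game with two possible first moves: WIN, or a game whose own
# possible first moves are LOSE and WIN.
example = frozenset({WIN, frozenset({LOSE, WIN})})

for play in plays(example):
    print(len(play), play[-1])

print(len(nim_first_moves((3, 4, 5))))  # 12
```

Every play produced this way is a finite list whose last entry is win or lose, as Lemma 7.5.2 requires, and the count for piles of 3, 4 and 5 stones agrees with the $3 + 4 + 5 = 12$ first moves counted earlier.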
7.5.1 Game Strategies

A strategy for a player is a rule that tells the player which move to make whenever it is their turn. More precisely, a strategy $s$ is a function from games to games with the property that $s(G) \in G$ for all games $G$. A pair of strategies for the two players determines exactly which moves the players choose, and so it determines a unique play of the game, depending on who moves first.

A key question about a game is what strategy will ensure that a player will win. Player-1 wants a strategy whose outcome is guaranteed to be a win, and Player-2 wants a strategy whose outcome is guaranteed to be a loss for Player-1.

³ Real chess tournaments rule this out by setting an advance limit on the number of moves, or by forbidding repetitions of the same position more than twice.

7.5.2 Fundamental Theorem for Win-Lose Games

The Fundamental Theorem for WL-2PerGm games says that one of the players always has a fixed “winning” strategy that guarantees a win against every possible opponent strategy.

Thinking about Chess for instance, this seems surprising. Serious chess players are typically secretive about their intended play strategies, believing that an opponent could take advantage of knowing their strategy. It seems to them that for any strategy they choose, their opponent can tailor a strategy to beat it. But the Fundamental Theorem says otherwise. In theory, in any win-lose-tie game like Chess or Checkers, each of the players will have a strategy that guarantees a win or a stalemate, even if the strategy is known to their opponent. That is,

• there is a winning strategy for one of the players, or

• both players have strategies that guarantee them at worst a draw.

Even though the Fundamental Theorem reveals a profound fact about games, it has a very simple proof by structural induction.

Theorem 7.5.3.
[Fundamental Theorem for Win-Lose Games] For any WL-2PerGm game $G$, one of the players has a winning strategy.

Proof. The proof is by structural induction on the definition of a $G \in \text{WL-2PerGm}$. The induction hypothesis is that one of the players has a winning strategy for $G$.

Base case: ($G$ = win or lose). Then there is only one possible strategy for each player, namely, do nothing and finish with outcome $G$.

Constructor case: ($G$ is a nonempty set of WL-2PerGm’s). By structural induction we may assume that for each $M \in G$ one of the players has a winning strategy. Notice that since players alternate moves, the first player in $G$ becomes the second player in $M$.

Now if there is a move $M_0 \in G$ where the second player in $M_0$ has a winning strategy, then the first player in $G$ has a simple winning strategy: pick $M_0$ as the first move, and then follow the second player’s winning strategy for $M_0$.

On the other hand, if no $M \in G$ has a winning strategy for the second player in $M$, then we can conclude by induction that every $M \in G$ has a winning strategy for the first player in $M$. Now the second player in $G$ has a simple winning strategy, namely if the first player in $G$ makes the move $M$, then the second player in $G$ should follow the winning strategy for the first player in $M$.

Infinite Games

So where do we come upon games with an infinite number of first moves? Well, suppose we play a tournament of $n$ chess games for some positive integer $n$. This tournament will be a WL-2PerGm if we agree on a rule for combining the payoffs of the $n$ individual chess games into a final payoff for the whole tournament.

There still are only a finite number of possible moves at any stage of the $n$-game chess tournament, but we can define a meta-chess-tournament, whose first move is a choice of any positive integer $n$, after which we play an $n$-game tournament.
Now the meta-chess-tournament has an infinite number of first moves.

Of course only the first move in the meta-chess-tournament is infinite, but then we could set up a tournament consisting of $n$ meta-chess-tournaments. This would be a game with $n$ possible infinite moves. And then we could have a meta-meta-chess-tournament whose first move was to choose how many meta-chess-tournaments to play. This meta-meta-chess-tournament will have an infinite number of infinite moves. Then we could move on to meta-meta-meta-chess-tournaments . . . .

As silly or weird as these meta games may seem, their weirdness doesn’t disqualify the Fundamental Theorem: in each of these games, one of the players will have a winning strategy.

Notice that although Theorem 7.5.3 guarantees a winning strategy, its proof gives no clue which player has it. For the Subset Takeaway Game of Problem 4.7 and most familiar 2PerGm’s like Chess, Go, . . . , no one knows which player has a winning strategy.⁴

⁴ Checkers used to be in this list, but there has been a recent announcement that each player has a strategy that forces a tie. (reference TBA)

7.6 Induction in Computer Science

Induction is a powerful and widely applicable proof technique, which is why we’ve devoted two entire chapters to it. Strong induction and its special case of ordinary induction are applicable to any kind of thing with nonnegative integer sizes, which is an awful lot of things, including all step-by-step computational processes.

Structural induction then goes beyond number counting, and offers a simple, natural approach to proving things about recursive data types and recursive computation.

In many cases, a nonnegative integer size can be defined for a recursively defined datum, such as the length of a string, or the number of operations in an Aexp. It is then possible to prove properties of data by ordinary induction on their size. But
this approach often produces more cumbersome proofs than structural induction.

In fact, structural induction is theoretically more powerful than ordinary induction. However, it’s only more powerful when it comes to reasoning about infinite data types (like infinite trees, for example), so this greater power doesn’t matter in practice. What does matter is that for recursively defined data types, structural induction is a simple and natural approach. This makes it a technique every computer scientist should embrace.

Problems for Section 7.1

Practice Problems

Problem 7.1.
The set OBT of Ordered Binary Trees is defined recursively as follows:

Base case: $\langle \text{leaf} \rangle$ is an OBT, and

Constructor case: if $R$ and $S$ are OBT’s, then $\langle \text{node}, R, S \rangle$ is an OBT.

If $T$ is an OBT, let $n_T$ be the number of node labels in $T$ and $l_T$ be the number of leaf labels in $T$.

Prove by structural induction that for all $T \in \text{OBT}$,

\[ l_T = n_T + 1. \tag{7.24} \]

Class Problems

Problem 7.2.
Prove by structural induction on the recursive definition (7.1.1) of $A^*$ that concatenation is associative:

\[ (r \cdot s) \cdot t = r \cdot (s \cdot t) \tag{7.25} \]

for all strings $r, s, t \in A^*$.

Problem 7.3.
The reversal of a string is the string written backwards, for example, rev(abcde) = edcba.

(a) Give a simple recursive definition of $\text{rev}(s)$ based on the recursive definitions 7.1.1 of $s \in A^*$ and of the concatenation operation 7.1.3.

(b) Prove that

\[ \text{rev}(s \cdot t) = \text{rev}(t) \cdot \text{rev}(s), \tag{7.26} \]

for all strings $s, t \in A^*$. You may assume that concatenation is associative:

\[ (r \cdot s) \cdot t = r \cdot (s \cdot t) \]

for all strings $r, s, t \in A^*$ (Problem 7.2).

Problem 7.4.
The Elementary 18.01 Functions (F18’s) are the set of functions of one real variable defined recursively as follows:

Base cases:

• The identity function $\text{id}(x) ::= x$ is an F18,

• any constant function is an F18,

• the sine function is an F18,

Constructor cases: If $f, g$ are F18’s, then so are

1. $f + g$, $fg$, $2^g$,
2.
the inverse function f 1, 3. the composition f ı g. (a) Prove that the function 1=x is an F18. 1 Warning: Don’t confuse 1=x D x 1 with the inverse id of the identity function id.x/. The inverse id 1 is equal to id. (b) Prove by Structural Induction on this definition that the Elementary 18.01 Functions are closed under taking derivatives. That is, show that if f .x/ is an F18, then so is f 0 WWD df =dx. (Just work out 2 or 3 of the most interesting constructor cases; you may skip the less interesting ones.) Problem 7.5. Here is a simple recursive definition of the set E of even integers: “mcs” — 2017/3/10 — 22:22 — page 229 — #237 7.6. Induction in Computer Science 229 Definition. Base case: 0 2 E. Constructor cases: If n 2 E, then so are n C 2 and n. Provide similar simple recursive definitions of the following sets: (a) The set S WWD f2k 3m 5n 2 N j k; m; n 2 Ng. (b) The set T WWD f2k 32kCm 5mCn 2 N j k; m; n 2 Ng. (c) The set L WWD f.a; b/ 2 Z2 j .a b/ is a multiple of 3g. Let L0 be the set defined by the recursive definition you gave for L in the previous part. Now if you did it right, then L0 D L, but maybe you made a mistake. So let’s check that you got the definition right. (d) Prove by structural induction on your definition of L0 that L0 L: (e) Confirm that you got the definition right by proving that L L0 : (f) See if you can give an unambiguous recursive definition of L. Problem 7.6. Definition. The recursive data type binary-2PG of binary trees with leaf labels L is defined recursively as follows: Base case: hleaf; li 2 binary-2PG, for all labels l 2 L. 
Constructor case: If $G_1, G_2 \in \text{binary-2PG}$, then $\langle \text{bintree}, G_1, G_2 \rangle \in \text{binary-2PG}$.

The size $|G|$ of $G \in \text{binary-2PG}$ is defined recursively on this definition by:

Base case: $|\langle \text{leaf}, l \rangle| ::= 1$, for all $l \in L$.

Constructor case: $|\langle \text{bintree}, G_1, G_2 \rangle| ::= |G_1| + |G_2| + 1$.

Figure 7.1 A picture of a binary tree $G$.

For example, the size of the binary-2PG $G$ pictured in Figure 7.1 is 7.

(a) Write out (using angle brackets and labels bintree, leaf, etc.) the binary-2PG $G$ pictured in Figure 7.1.

The value of $\text{flatten}(G)$ for $G \in \text{binary-2PG}$ is the sequence of labels in $L$ of the leaves of $G$. For example, for the binary-2PG $G$ pictured in Figure 7.1,
$$\text{flatten}(G) = (\text{win}, \text{lose}, \text{win}, \text{win}).$$

(b) Give a recursive definition of flatten. (You may use the operation of concatenation (append) of two sequences.)

(c) Prove by structural induction on the definitions of flatten and size that
$$2 \cdot \text{length}(\text{flatten}(G)) = |G| + 1. \tag{7.27}$$

Homework Problems

Problem 7.7.
The string reversal function, $\text{rev} : A^* \to A^*$, has a simple recursive definition.

Base case: $\text{rev}(\lambda) ::= \lambda$.

Constructor case: $\text{rev}(as) ::= \text{rev}(s)a$ for $s \in A^*$ and $a \in A$.

A string $s$ is a palindrome when $\text{rev}(s) = s$. The palindromes also have a simple recursive definition as the set RecPal.

Base cases: $\lambda \in \text{RecPal}$ and $a \in \text{RecPal}$ for $a \in A$.

Constructor case: If $s \in \text{RecPal}$, then $asa \in \text{RecPal}$ for $a \in A$.

Verifying that the two definitions agree offers a nice exercise in structural induction and also induction on length of strings. The verification rests on three basic properties of concatenation and reversal proved in separate problems 7.2 and 7.3.

Fact.
$$(rs = uv \text{ AND } |r| = |u|) \text{ IFF } (r = u \text{ AND } s = v) \tag{7.28}$$
$$r \cdot (s \cdot t) = (r \cdot s) \cdot t \tag{7.29}$$
$$\text{rev}(st) = \text{rev}(t)\,\text{rev}(s) \tag{7.30}$$

(a) Prove that $s = \text{rev}(s)$ for all $s \in \text{RecPal}$.

(b) Prove conversely that if $s = \text{rev}(s)$, then $s \in \text{RecPal}$.
Hint: By induction on $n = |s|$.

Problem 7.8.
Let $m, n$ be integers, not both zero. Define a set of integers, $L_{m,n}$, recursively as follows:

Base cases: $m, n \in L_{m,n}$.

Constructor cases: If $j, k \in L_{m,n}$, then
1. $-j \in L_{m,n}$,
2. $j + k \in L_{m,n}$.

Let $L$ be an abbreviation for $L_{m,n}$ in the rest of this problem.

(a) Prove by structural induction that every common divisor of $m$ and $n$ also divides every member of $L$.

(b) Prove that any integer multiple of an element of $L$ is also in $L$.

(c) Show that if $j, k \in L$ and $k \neq 0$, then $\text{rem}(j, k) \in L$.

(d) Show that there is a positive integer $g \in L$ that divides every member of $L$.
Hint: The least positive integer in $L$.

(e) Conclude that $g$ from part (d) is $\gcd(m, n)$, the greatest common divisor of $m$ and $n$.

Problem 7.9.
Definition. Define the number $\#_c(s)$ of occurrences of the character $c \in A$ in the string $s$ recursively on the definition of $s \in A^*$:

base case: $\#_c(\lambda) ::= 0$.

constructor case:
$$\#_c(\langle a, s \rangle) ::= \begin{cases} \#_c(s) & \text{if } a \neq c, \\ 1 + \#_c(s) & \text{if } a = c. \end{cases}$$

Prove by structural induction that for all $s, t \in A^*$ and $c \in A$,
$$\#_c(s \cdot t) = \#_c(s) + \#_c(t).$$

Problem 7.10.
Fractals are an example of mathematical objects that can be defined recursively. In this problem, we consider the Koch snowflake. Any Koch snowflake can be constructed by the following recursive definition.

Base case: An equilateral triangle with a positive integer side length is a Koch snowflake.

Constructor case: Let $K$ be a Koch snowflake, and let $l$ be a line segment on the snowflake. Remove the middle third of $l$, and replace it with two line segments, each of the same length $|l|/3$ as the removed third, as is done in Figure 7.2. The resulting figure is also a Koch snowflake.

Figure 7.2 Constructing the Koch Snowflake.

Prove by structural induction that the area inside any Koch snowflake is of the form $q\sqrt{3}$, where $q$ is a rational number.

Problem 7.11.
The set RBT of Red-Black Trees is defined recursively as follows:

Base cases: $\langle \text{red} \rangle \in \text{RBT}$, and $\langle \text{black} \rangle \in \text{RBT}$.

Constructor cases: If $A, B$ are RBT's, then
if $A, B$ start with black, then $\langle \text{red}, A, B \rangle$ is an RBT;
if $A, B$ start with red, then $\langle \text{black}, A, B \rangle$ is an RBT.

For any RBT $T$, let
$r_T$ be the number of red labels in $T$,
$b_T$ be the number of black labels in $T$, and
$n_T ::= r_T + b_T$ be the total number of labels in $T$.

Prove that
$$\text{If } T \text{ starts with a red label, then } \frac{n_T}{3} \le r_T \le \frac{2n_T + 1}{3}. \tag{7.31}$$

Hint: $n/3 \le r$ IFF $(2/3)n \ge n - r$.

Exam Problems

Problem 7.12.
The Arithmetic Trig Functions (Atrig's) are the set of functions of one real variable defined recursively as follows:

Base cases:
The identity function $\text{id}(x) ::= x$ is an Atrig,
any constant function is an Atrig,
the sine function is an Atrig.

Constructor cases: If $f, g$ are Atrig's, then so are
1. $f + g$,
2. $f \cdot g$,
3. the composition $f \circ g$.

Prove by structural induction on this definition that if $f(x)$ is an Atrig, then so is $f' ::= df/dx$.

Problem 7.13.
Definition. The set RAF of rational functions of one real variable is the set of functions defined recursively as follows:

Base cases:
The identity function, $\text{id}(r) ::= r$ for $r \in \mathbb{R}$ (the real numbers), is an RAF,
any constant function on $\mathbb{R}$ is an RAF.

Constructor cases: If $f, g$ are RAF's, then so is $f \circledast g$, where $\circledast$ is one of the operations
1. addition $+$,
2. multiplication $\cdot$, or
3. division $/$.

(a) Describe how to construct functions $e, f, g \in \text{RAF}$ such that
$$e \circ (f + g) \neq (e \circ f) + (e \circ g). \tag{7.32}$$

(b) Prove that for all real-valued functions $e, f, g$ (not just those in RAF):
$$(e \circledast f) \circ g = (e \circ g) \circledast (f \circ g). \tag{7.33}$$
Hint: $(e \circledast f)(x) ::= e(x) \circledast f(x)$.

(c) Let predicate $P(h)$ be the following predicate on functions $h \in \text{RAF}$:
$$P(h) ::= \forall g \in \text{RAF}.\ h \circ g \in \text{RAF}.$$
Prove by structural induction on the definition of RAF that $P(h)$ holds for all $h \in \text{RAF}$.
Make sure to indicate explicitly
each of the base cases, and
each of the constructor cases.

Problem 7.14.
The 2-3-averaged numbers are a subset, N23, of the real interval $[0, 1]$ defined recursively as follows:

Base cases: $0, 1 \in \text{N23}$.

Constructor case: If $a, b$ are in N23, then so is $L(a, b)$ where
$$L(a, b) ::= \frac{2a + 3b}{5}.$$

(a) Use ordinary induction or the Well-Ordering Principle to prove that
$$\left(\frac{3}{5}\right)^n \in \text{N23}$$
for all nonnegative integers $n$.

(b) Prove by Structural Induction that the product of two 2-3-averaged numbers is also a 2-3-averaged number.
Hint: Prove by structural induction on $c$ that, if $d \in \text{N23}$, then $cd \in \text{N23}$.

Problem 7.15.
This problem is about binary strings $s \in \{0, 1\}^*$.

Let's call a recursive definition of a set of strings cat-OK when all its constructors are defined as concatenations of strings.⁵

⁵ The concatenation of two strings $x$ and $y$, written $xy$, is the string obtained by appending $x$ to the left end of $y$. For example, the concatenation of 01 and 101 is 01101.

For example, the set, One1, of strings with exactly one 1 has the cat-OK definition:

Base case: The length-one string 1 is in One1.

Constructor case: If $s$ is in One1, then so is $0s$ and $s0$.

(a) Give a cat-OK definition of the set $E$ of even length strings consisting solely of 0's.

(b) Let $\text{rev}(s)$ be the reversal of the string $s$. For example, $\text{rev}(001) = 100$. A palindrome is a string $s$ such that $s = \text{rev}(s)$. For example, 11011 and 010010 are palindromes.
Give a cat-OK definition of the palindromes.

(c) Give a cat-OK definition of the set $P$ of strings consisting solely of 0's whose length is a power of two.

Problems for Section 7.2

Practice Problems

Problem 7.16.
Define the sets $F_1$ and $F_2$ recursively:

$F_1$:
– $5 \in F_1$,
– if $n \in F_1$, then $5n \in F_1$.

$F_2$:
– $5 \in F_2$,
– if $n, m \in F_1$, then $nm \in F_2$.

(a) Show that one of these definitions is technically ambiguous.
(Remember that “ambiguous recursive definition” has a technical mathematical meaning which does not imply that the ambiguous definition is unclear.)

(b) Briefly explain what advantage unambiguous recursive definitions have over ambiguous ones.

(c) A way to prove that $F_1 = F_2$ is to show first that $F_1 \subseteq F_2$ and second that $F_2 \subseteq F_1$. One of these containments follows easily by structural induction. Which one? What would be the induction hypothesis? (You do not need to complete a proof.)

Problem 7.17.
(a) To prove that the set RecMatch of matched strings of Definition 7.2.1 equals the set AmbRecMatch of ambiguous matched strings of Definition 7.2.4, you could first prove that
$$\forall r \in \text{RecMatch}.\ r \in \text{AmbRecMatch},$$
and then prove that
$$\forall u \in \text{AmbRecMatch}.\ u \in \text{RecMatch}.$$
Of these two statements, indicate the one that would be simpler to prove by structural induction directly from the definitions.

(b) Suppose structural induction was being used to prove that $\text{AmbRecMatch} \subseteq \text{RecMatch}$. Indicate the one predicate below that would fit the format for a structural induction hypothesis in such a proof.

$P_0(n) ::= |s| \le n \text{ IMPLIES } s \in \text{RecMatch}$.
$P_1(n) ::= |s| \le n \text{ IMPLIES } s \in \text{AmbRecMatch}$.
$P_2(s) ::= s \in \text{RecMatch}$.
$P_3(s) ::= s \in \text{AmbRecMatch}$.
$P_4(s) ::= (s \in \text{RecMatch} \text{ IMPLIES } s \in \text{AmbRecMatch})$.

(c) The recursive definition AmbRecMatch is ambiguous because it allows the $s \cdot t$ constructor to apply when $s$ or $t$ is the empty string. But even fixing that, ambiguity remains. Demonstrate this by giving two different derivations for the string [ ] [ ] [ ] according to AmbRecMatch but only using the $s \cdot t$ constructor when $s \neq \lambda$ and $t \neq \lambda$.

Class Problems

Problem 7.18.
Let $p$ be the string [ ]. A string of brackets is said to be erasable iff it can be reduced to the empty string by repeatedly erasing occurrences of $p$.
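As an informal illustration (not part of the problem), the erasing process just defined can be sketched in Python. The names `erase_all` and `is_erasable` are hypothetical, brackets are written as the characters `[` and `]` with no spaces, and the sketch leans on the fact, mentioned in the footnote to this problem, that the final string does not depend on the order in which occurrences of $p$ are erased.

```python
P = "[]"  # the string p from the problem statement (an assumption: no spaces)


def erase_all(s: str) -> str:
    """Erase occurrences of p repeatedly until none remain; return what is left."""
    while P in s:
        # str.replace removes every non-overlapping occurrence of p in one pass.
        s = s.replace(P, "")
    return s


def is_erasable(s: str) -> bool:
    """A string is erasable iff erasing reduces it to the empty string."""
    return erase_all(s) == ""


if __name__ == "__main__":
    for word in ["[[[]][]][]", "[]][[[[[]]"]:
        print(repr(word), "->", repr(erase_all(word)), is_erasable(word))
```

Each pass of the loop corresponds to one round of erasing all currently visible occurrences of $p$; a string on which the loop gets stuck with leftover brackets is not erasable.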
For example, to erase the string

[[[]][]][],

start by erasing the three occurrences of $p$ to obtain

[[]].

Then erase the single occurrence of $p$ to obtain

[],

which can now be erased to obtain the empty string $\lambda$.

On the other hand the string

[]][[[[[]]    (7.34)

is not erasable, because when we try to erase, we get stuck. Namely, start by erasing the two occurrences of $p$ in (7.34) to obtain

][[[[].

Then erase the one remaining occurrence of $p$ to obtain

][[[.

At this point we are stuck with no remaining occurrences of $p$.⁶

⁶ Notice that there are many ways to erase a string, depending on when and which occurrences of $p$ are chosen to be erased. It turns out that given any initial string, the final string reached after performing all possible erasures will be the same, no matter how erasures are performed. We take this for granted here, although it is not altogether obvious. (See Problem 6.28 for a proof.)

Let Erasable be the set of erasable strings of brackets. Let RecMatch be the recursive data type of strings of matched brackets given in Definition 7.2.1.

(a) Use structural induction to prove that
$$\text{RecMatch} \subseteq \text{Erasable}.$$

(b) Supply the missing parts (labeled by “(*)”) of the following proof that
$$\text{Erasable} \subseteq \text{RecMatch}.$$

Proof. We prove by strong induction that every length $n$ string in Erasable is also in RecMatch. The induction hypothesis is
$$P(n) ::= \forall x \in \text{Erasable}.\ |x| = n \text{ IMPLIES } x \in \text{RecMatch}.$$

Base case:
(*) What is the base case? Prove that $P$ is true in this case.

Inductive step: To prove $P(n + 1)$, suppose $|x| = n + 1$ and $x \in \text{Erasable}$. We need to show that $x \in \text{RecMatch}$.

Let's say that a string $y$ is an erase of a string $z$ iff $y$ is the result of erasing a single occurrence of $p$ in $z$.

Since $x \in \text{Erasable}$ and has positive length, there must be an erase, $y \in \text{Erasable}$, of $x$. So $|y| = n - 1 \ge 0$, and since $y \in \text{Erasable}$, we may assume by induction hypothesis that $y \in \text{RecMatch}$.
Now we argue by cases:

Case ($y$ is the empty string):
(*) Prove that $x \in \text{RecMatch}$ in this case.

Case ($y = [\,s\,]\,t$ for some strings $s, t \in \text{RecMatch}$): Now we argue by subcases.

Subcase ($x = py$):
(*) Prove that $x \in \text{RecMatch}$ in this subcase.

Subcase ($x$ is of the form $[\,s'\,]\,t$ where $s$ is an erase of $s'$):
Since $s \in \text{RecMatch}$, it is erasable by part (a), which implies that $s' \in \text{Erasable}$. But $|s'| < |x|$, so by induction hypothesis, we may assume that $s' \in \text{RecMatch}$. This shows that $x$ is the result of the constructor step of RecMatch, and therefore $x \in \text{RecMatch}$.

Subcase ($x$ is of the form $[\,s\,]\,t'$ where $t$ is an erase of $t'$):
(*) Prove that $x \in \text{RecMatch}$ in this subcase.

(*) Explain why the above cases are sufficient.

This completes the proof by strong induction on $n$, so we conclude that $P(n)$ holds for all $n \in \mathbb{N}$. Therefore $x \in \text{RecMatch}$ for every string $x \in \text{Erasable}$. That is, $\text{Erasable} \subseteq \text{RecMatch}$. Combined with part (a), we conclude that
$$\text{Erasable} = \text{RecMatch}.$$

Problem 7.19.
(a) Prove that the set RecMatch of matched strings of Definition 7.2.1 is closed under string concatenation. Namely, if $s, t \in \text{RecMatch}$, then $s \cdot t \in \text{RecMatch}$.

(b) Prove $\text{AmbRecMatch} \subseteq \text{RecMatch}$, where AmbRecMatch is the set of ambiguous matched strings of Definition 7.2.4.

(c) Prove that $\text{RecMatch} = \text{AmbRecMatch}$.

Homework Problems

Problem 7.20.
One way to determine if a string has matching brackets, that is, if it is in the set RecMatch of Definition 7.2.1, is to start with 0 and read the string from left to right, adding 1 to the count for each left bracket and subtracting 1 from the count for each right bracket. For example, here are the counts for two sample strings:

String: [ ] ] [ [ [ [ [ ] ] ] ]
Counts: 0 1 0 −1 0 1 2 3 4 3 2 1 0

String: [ [ [ ] ] [ ] ] [ ]
Counts: 0 1 2 3 2 1 2 1 0 1 0

A string has a good count if its running count never goes negative and ends with 0.
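The running count described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `running_counts` and `has_good_count` are hypothetical, and the input is assumed to be written with the characters `[` and `]` and no spaces.

```python
from typing import List


def running_counts(s: str) -> List[int]:
    """Return the counts seen while reading s left to right, starting from 0."""
    counts = [0]
    for ch in s:
        if ch == "[":
            counts.append(counts[-1] + 1)  # left bracket: add 1
        elif ch == "]":
            counts.append(counts[-1] - 1)  # right bracket: subtract 1
        else:
            raise ValueError(f"not a bracket: {ch!r}")
    return counts


def has_good_count(s: str) -> bool:
    """Good count: the running count never goes negative and ends with 0."""
    counts = running_counts(s)
    return min(counts) >= 0 and counts[-1] == 0


if __name__ == "__main__":
    for word in ["[]][[[[[]]]]", "[[[]][]][]"]:
        print(word, running_counts(word), has_good_count(word))
```

The two strings in the `__main__` block are the two sample strings from the problem statement, so the printed counts can be compared against the rows given there.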
So the second string above has a good count, but the first one does not because its count went negative at the third step. Let
$$\text{GoodCount} ::= \{s \in \{\,]\,, [\,\}^* \mid s \text{ has a good count}\}.$$

The empty string has a length 0 running count we'll take as a good count by convention, that is, $\lambda \in \text{GoodCount}$. The matched strings can now be characterized precisely as this set of strings with good counts.

(a) Prove that GoodCount contains RecMatch by structural induction on the definition of RecMatch.

(b) Conversely, prove that RecMatch contains GoodCount.
Hint: By induction on the length of strings in GoodCount. Consider when the running count equals 0 for the second time.

Problem 7.21.
Divided Equilateral Triangles (DETs) were defined in Problem 5.10 as follows:

Base case: A single equilateral triangle is a DET whose only subtriangle is itself.

If $T$ is a DET, then the equilateral triangle $T'$ built out of four copies of $T$ as shown in Figure 7.3 is also a DET, and the subtriangles of $T'$ are exactly the subtriangles of each of the copies of $T$.

Figure 7.3 DET $T'$ from Four Copies of DET $T$

Figure 7.4 Trapezoid from Three Triangles

Properties of DETs were proved earlier by induction on the length of a side of the triangle. Recognizing that the definition of DETs is recursive, we can instead prove properties of DETs by structural induction.

(a) Prove by structural induction that a DET with one of its corner subtriangles removed can be tiled with trapezoids built out of three subtriangles as in Figure 7.4.

(b) Explain why a DET with a triangle removed from the middle of one side can also be tiled by trapezoids.

(c) In tiling a large square using L-shaped blocks as described in Section 5.1.5, there was a tiling with any single subsquare removed.
Part (b) indicates that trapezoid-tilings are possible for DETs with a non-corner subtriangle removed, so it's natural to make the mistaken guess that DETs have a corresponding property:

False Claim. A DET with any single subtriangle removed can be trapezoid-tiled.

We can try to prove the claim by structural induction as in part (a).

Bogus proof. The claim holds vacuously in the base case of a DET with a single subtriangle.

Now let $T'$ be a DET made of four copies of a DET $T$, and suppose we remove an arbitrary subtriangle from $T'$.

The removed subtriangle must be a subtriangle of one of the copies of $T$. The copies are the same, so for definiteness we assume the subtriangle was removed from copy 1. Then by structural induction hypothesis, copy 1 can be trapezoid-tiled, and then the other three copies of $T$ can be trapezoid-tiled exactly as in the solution to part (a). This yields a complete trapezoid-tiling of $T'$ with the arbitrary subtriangle removed.

We conclude by structural induction that any DET with any subtriangle removed can be trapezoid-tiled.

What's wrong with the proof?
Hint: Find a counter-example and show where the proof breaks down.

We don't know if there is a simple characterization of exactly which subtriangles can be removed to allow a trapezoid tiling.

Problem 7.22.
A binary word is a finite sequence of 0's and 1's. In this problem, we'll simply call them “words.” For example, $(1, 1, 0)$ and $(1)$ are words of length three and one, respectively. We usually omit the parentheses and commas in the descriptions of words, so the preceding binary words would just be written as 110 and 1.

The basic operation of placing one word immediately after another is called concatenation. For example, the concatenation of 110 and 1 is 1101, and the concatenation of 110 with itself is 110110.

We can extend this basic operation on words to an operation on sets of words.
To emphasize the distinction between a word and a set of words, from now on we'll refer to a set of words as a language. Now if $R$ and $S$ are languages, then $R \cdot S$ is the language consisting of all the words you can get by concatenating a word from $R$ with a word from $S$. That is,
$$R \cdot S ::= \{rs \mid r \in R \text{ AND } s \in S\}.$$
For example,
$$\{0, 00\} \cdot \{00, 000\} = \{000, 0000, 00000\}.$$

Another example is $D \cdot D$, abbreviated as $D^2$, where $D ::= \{1, 0\}$.
$$D^2 = \{00, 01, 10, 11\}.$$
In other words, $D^2$ is the language consisting of all the length-two words. More generally, $D^n$ will be the language of length-$n$ words.

If $S$ is a language, the language you can get by concatenating any number of copies of words in $S$ is called $S^*$, pronounced “S star.” (By convention, the empty word $\lambda$ is always included in $S^*$.) For example, $\{0, 11\}^*$ is the language consisting of all the words you can make by stringing together 0's and 11's. This language could also be described as consisting of the words whose blocks of 1's are always of even length. Another example is $(D^2)^*$, which consists of all the even length words. Finally, the language $B$ of all binary words is just $D^*$.

The Concatenation-Definable (C-D) languages are defined recursively:

Base case: Every finite language is a C-D.

Constructor cases: If $L$ and $M$ are C-D's, then
$$L \cdot M, \quad L \cup M, \quad \text{and} \quad \overline{L}$$
are C-D's.

Note that the $*$-operation is not allowed. For this reason, the C-D languages are also called the “star-free languages” [33].

Lots of interesting languages turn out to be concatenation-definable, but some very simple languages are not. This problem ends with the conclusion that the language $\{00\}^*$ of even length words whose bits are all 0's is not a C-D language.

(a) Show that the set $B$ of all binary words is C-D.
Hint: The empty set is finite.
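For readers who like to experiment, the language operations defined at the start of this problem can be sketched in Python with finite sets of strings. This is only an illustrative sketch: the names `concat`, `power` and `star_up_to` are hypothetical, words are Python strings of `0` and `1`, and since $S^*$ is usually infinite the last function only enumerates the words of $S^*$ up to a given length.

```python
def concat(R, S):
    """R . S: every word rs with r in R and s in S."""
    return {r + s for r in R for s in S}


def power(S, n):
    """S^n: the concatenation of n copies of S, with S^0 = {empty word}."""
    result = {""}
    for _ in range(n):
        result = concat(result, S)
    return result


def star_up_to(S, max_len):
    """The words of S* whose length is at most max_len."""
    words = {""}      # the empty word is in S* by convention
    frontier = {""}   # words found in the latest round
    while frontier:
        longer = {w for w in concat(frontier, S) if len(w) <= max_len}
        frontier = longer - words
        words |= frontier
    return words


if __name__ == "__main__":
    print(sorted(concat({"0", "00"}, {"00", "000"})))
    print(sorted(power({"1", "0"}, 2)))
    print(sorted(star_up_to({"0", "11"}, 3)))
```

The first two calls reproduce the two worked examples from the text; the third lists the short words of $\{0, 11\}^*$, in which every block of 1's has even length.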
Now a more interesting example of a C-D set is the language of all binary words that include three consecutive 1's:
$$B111B.$$
Notice that the proper expression here is “$B \cdot \{111\} \cdot B$.” But it causes no confusion and helps readability to omit the dots in concatenations and the curly braces for sets with only one element.

(b) Show that the language consisting of the binary words that start with 0 and end with 1 is C-D.

(c) Show that $0^*$ is C-D.

(d) Show that if $R$ and $S$ are C-D, then so is $R \cap S$.

(e) Show that $\{01\}^*$ is C-D.

Let's say a language $S$ is 0-finite when it includes only a finite number of words whose bits are all 0's, that is, when $S \cap 0^*$ is a finite set of words. A language $S$ is 0-boring (boring, for short) when either $S$ or $\overline{S}$ is 0-finite.

(f) Explain why $\{00\}^*$ is not boring.

(g) Verify that if $R$ and $S$ are boring, then so is $R \cup S$.

(h) Verify that if $R$ and $S$ are boring, then so is $R \cdot S$.
Hint: By cases: whether $R$ and $S$ are both 0-finite, whether $R$ or $S$ contains no all-0 words at all (including the empty word $\lambda$), and whether neither of these cases hold.

(i) Conclude by structural induction that all C-D languages are boring.

So we have proved that the set $(00)^*$ of even length all-0 words is not a C-D language.

Problem 7.23.
We can explain in a simple and precise way how digital circuits work, and gain the powerful proof method of structural induction to verify their properties, by defining digital circuits as a recursive data type DigCirc. The definition is a little easier to state if all the gates in the circuit take two inputs, so we will use the two-input NOR gate rather than a one-input NOT, and let the set of gates be
$$\text{Gates} ::= \{\text{NOR}, \text{AND}, \text{OR}, \text{XOR}\}.$$

A digital circuit will be a recursively defined list of gate connections of the form $(x, y, G, I)$ where $G$ is a gate, $x$ and $y$ are the input wires, and $I$ is the set of wires that the gate output feeds into as illustrated in Figure 7.5.
Formally, we let $W$ be a set $w_0, w_1, \ldots$ whose elements are called wires, and $O \notin W$ be an object called the output.

Definition. The set of digital circuits DigCirc, and their inputs and internal wires, are defined recursively as follows:

Figure 7.5 Digital Circuit Constructor Step

Base case: If $x, y \in W$, then $C \in \text{DigCirc}$, where
$$C = \text{list}((x, y, G, \{O\})) \text{ for some } G \in \text{Gates},$$
$$\text{inputs}(C) ::= \{x, y\},$$
$$\text{internal}(C) ::= \emptyset.$$

Constructor cases: If
$$C \in \text{DigCirc},$$
$$I \subseteq \text{inputs}(C),\ I \neq \emptyset,$$
$$x, y \in W - (I \cup \text{internal}(C)),$$
then $D \in \text{DigCirc}$, where
$$D = \text{cons}((x, y, G, I), C) \text{ for some } G \in \text{Gates},$$
$$\text{inputs}(D) ::= \{x, y\} \cup (\text{inputs}(C) - I),$$
$$\text{internal}(D) ::= \text{internal}(C) \cup I.$$

For any circuit $C$ define
$$\text{wires}(C) ::= \text{inputs}(C) \cup \text{internal}(C) \cup \{O\}.$$

A wire assignment for $C$ is a function
$$\alpha : \text{wires}(C) \to \{\mathbf{T}, \mathbf{F}\}$$
such that for each gate connection $(x, y, G, I) \in C$,
$$\alpha(i) = (\alpha(x)\ G\ \alpha(y)) \text{ for all } i \in I.$$

(a) Define an environment for $C$ to be a function $e : \text{inputs}(C) \to \{\mathbf{T}, \mathbf{F}\}$. Prove that if two wire assignments for $C$ are equal for each wire in $\text{inputs}(C)$, then the wire assignments are equal for all wires.

Part (a) implies that for any environment $e$ for $C$, there is a unique wire assignment $\alpha_e$ such that
$$\alpha_e(w) = e(w) \text{ for all } w \in \text{inputs}(C).$$
So for any input environment $e$, the circuit computes a unique output
$$\text{eval}(C, e) ::= \alpha_e(O).$$

Now suppose $F$ is a propositional formula whose propositional variables are the input wires of some circuit $C$. Then $C$ and $F$ are defined to be equivalent iff
$$\text{eval}(C, e) = \text{eval}(F, e)$$
for all environments $e$ for $C$.

(b) Define a function $E(C)$ recursively on the definition of circuit $C$, such that $E(C)$ is a propositional formula equivalent to $C$. Then verify the recursive definition by proving the equivalence using structural induction.

(c) Give examples where $E(C)$ is exponentially larger than $C$.

Exam Problems

Problem 7.24.
Let $P$ be a propositional variable.

(a) Show how to express $\text{NOT}(P)$ using $P$ and a selection from among the constant True, and the connectives XOR and AND.

The use of the constant True above is essential. To prove this, we begin with a recursive definition of XOR-AND formulas that do not use True, called the PXA formulas.

Definition. Base case: The propositional variable $P$ is a PXA formula.

Constructor cases: If $R, S \in \text{PXA}$, then
$R \text{ XOR } S$,
$R \text{ AND } S$
are PXA's.

For example,
$$(((P \text{ XOR } P) \text{ AND } P) \text{ XOR } (P \text{ AND } P)) \text{ XOR } (P \text{ XOR } P)$$
is a PXA.

(b) Prove by structural induction on the definition of PXA that every PXA formula $A$ is equivalent to $P$ or to False.

Problems for Section 7.3

Homework Problems

Problem 7.25.
One version of the Ackermann function $A : \mathbb{N}^2 \to \mathbb{N}$ is defined recursively by the following rules:
$$A(m, n) ::= 2n \quad \text{if } m = 0 \text{ or } n \le 1, \tag{A-base}$$
$$A(m, n) ::= A(m - 1, A(m, n - 1)) \quad \text{otherwise}. \tag{AA}$$

Prove that if $B : \mathbb{N}^2 \to \mathbb{N}$ is a partial function that satisfies this same definition, then $B$ is total and $B = A$.

Problems for Section 7.4

Practice Problems

Problem 7.26.
(a) Write out the evaluation of
$$\text{eval}(\text{subst}(3x, x(x - 1)), 2)$$
according to the Environment Model and the Substitution Model, indicating where the rule for each case of the recursive definitions of $\text{eval}(\,,)$ and $[:=]$ or substitution is first used. Compare the number of arithmetic operations and variable lookups.

(b) Describe an example along the lines of part (a) where the Environment Model would perform 6 fewer multiplications than the Substitution Model. You need not carry out the evaluations.

(c) Describe an example along the lines of part (a) where the Substitution Model would perform 6 fewer multiplications than the Environment Model. You need not carry out the evaluations.

Class Problems

Problem 7.27.
In this problem we'll need to be careful about the propositional operations that apply to truth values and the corresponding symbols that appear in formulas. We'll restrict ourselves to formulas with symbols And and Not that correspond to the operations AND, NOT. We will also allow the constant symbols True and False.

(a) Give a simple recursive definition of propositional formula $F$ and the set $\text{pvar}(F)$ of propositional variables that appear in it.

Let $V$ be a set of propositional variables. A truth environment $e$ over $V$ assigns truth values to all these variables. In other words, $e$ is a total function,
$$e : V \to \{\mathbf{T}, \mathbf{F}\}.$$

(b) Give a recursive definition of the truth value, $\text{eval}(F, e)$, of propositional formula $F$ in an environment $e$ over a set of variables $V \supseteq \text{pvar}(F)$.

Clearly the truth value of a propositional formula only depends on the truth values of the variables in it. How could it be otherwise? But it's good practice to work out a rigorous definition and proof of this assumption.

(c) Give an example of a propositional formula containing the variable $P$ but whose truth value does not depend on $P$. Now give a rigorous definition of the assertion that “the truth value of propositional formula $F$ does not depend on propositional variable $P$.”
Hint: Let $e_1, e_2$ be two environments whose values agree on all variables other than $P$.

(d) Give a rigorous definition of the assertion that “the truth value of a propositional formula only depends on the truth values of the variables that appear in it,” and then prove it by structural induction on the definition of propositional formula.

(e) Now we can formally define $F$ being valid. Namely, $F$ is valid iff
$$\forall e.\ \text{eval}(F, e) = \mathbf{T}.$$
Give a similar formal definition of formula $G$ being unsatisfiable. Then use the definition of eval to prove that a formula $F$ is valid iff $\text{Not}(F)$ is unsatisfiable.

Homework Problems

Problem 7.28.
(a) Give a recursive definition of a function $\text{erase}(e)$ that erases all the symbols in $e \in \text{Aexp}$ but the brackets. For example,

erase([ [ 3 * [ x * x ] ] + [ [ 2 * x ] + 1 ] ]) = [ [ [ ] ] [ [ ] ] ].

(b) Prove that $\text{erase}(e) \in \text{RecMatch}$ for all $e \in \text{Aexp}$.

(c) Give an example of a small string $s \in \text{RecMatch}$ such that $[\,s\,] \neq \text{erase}(e)$ for any $e \in \text{Aexp}$.

Problems for Section 7.5

Practice Problems

Problem 7.29.
In the game tree for the game Tic-Tac-Toe, the root has nine children corresponding to the nine boxes that the first player could mark with an “X”. Each of these nine nodes will have eight children in the second level of the tree, indicating where the second player can mark his “O”, giving a total of 72 nodes. Answer the following questions about the game tree for Tic-Tac-Toe.

(a) How many nodes will be in the third level of the tree?

(b) What is the first level where this simple pattern of calculating nodes stops working?

Homework Problems

Problem 7.30.
We're going to characterize a large category of games as a recursive data type and then prove, by structural induction, a fundamental theorem about game strategies. We are interested in two person games of perfect information that end with a numerical score. Chess and Checkers would count as value games using the values $1, -1, 0$ for a win, loss or draw for the first player. The game of Go really does end with a score based on the number of white and black stones that remain at the end.

Here's the formal definition:

Definition. Let $V$ be a nonempty set of real numbers. The class VG of $V$-valued two-person deterministic games of perfect information is defined recursively as follows:

Base case: A value $v \in V$ is a VG known as a payoff.

Constructor case: If $G$ is a nonempty set of VG's, then $G$ is a VG. Each game $M \in G$ is called a possible first move of $G$.
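To make the recursive definition concrete, here is a small Python sketch of VG's as nested data. It is only one possible encoding, chosen for illustration: a payoff is a plain number, a game with moves is a `frozenset` of games, and the function names `is_vg`, `first_moves` and `payoffs` are hypothetical rather than taken from the text.

```python
def is_vg(game, values) -> bool:
    """Check that `game` is built from payoffs in `values` by the two rules."""
    if isinstance(game, frozenset):
        # Constructor case: a nonempty set of VG's is a VG.
        return len(game) > 0 and all(is_vg(move, values) for move in game)
    # Base case: a payoff v in V.
    return game in values


def first_moves(game):
    """The possible first moves of a game; a bare payoff has none."""
    return set(game) if isinstance(game, frozenset) else set()


def payoffs(game):
    """All payoffs that some play of the game could end with."""
    if isinstance(game, frozenset):
        return set().union(*(payoffs(move) for move in game))
    return {game}


if __name__ == "__main__":
    V = {1, -1, 0}
    # Either stop immediately with payoff 1, or move to a game
    # whose two possible moves are the payoffs -1 and 0.
    G = frozenset({1, frozenset({-1, 0})})
    print(is_vg(G, V), first_moves(G), payoffs(G))
```

Each function recurses on the two cases of the definition, which is the same shape a structural induction over VG's would take.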
A strategy for a player is a rule that tells the player which move to make whenever it is their turn. That is, a strategy is a function $s$ from games to games with the property that $s(G) \in G$ for all games $G$. Given which player has the first move, a pair of strategies for the two players determines exactly which moves the players will choose. So the strategies determine a unique play of the game and a unique payoff.⁷

⁷ We take for granted the fact that no VG has an infinite play. The proof of this by structural induction is essentially the same as that for win-lose games given in Lemma 7.5.2.

The max-player wants a strategy that guarantees as high a payoff as possible, and the min-player wants a strategy that guarantees as low a payoff as possible.

The Fundamental Theorem for deterministic games of perfect information says that in any game, each player has an optimal strategy, and these strategies lead to the same payoff. More precisely,

Theorem (Fundamental Theorem for VG's). Let $V$ be a finite set of real numbers and $G$ be a $V$-valued VG. Then there is a value $v \in V$, called a max-value $\max_G$ for $G$, such that if the max-player moves first,
the max-player has a strategy that will finish with a payoff of at least $\max_G$, no matter what strategy the min-player uses, and
the min-player has a strategy that will finish with a payoff of at most $\max_G$, no matter what strategy the max-player uses.

It's worth a moment for the reader to observe that the definition of $\max_G$ implies that if there is one for $G$, it is unique. So if the max-player has the first move, the Fundamental Theorem means that there's no point in playing the game: the min-player may just as well pay the max-value to the max-player.

(a) Prove the Fundamental Theorem for VG's.
Hint: VG's are a recursively defined data type, so the basic method for proving that all VG's have some property is structural induction on the definition of VG. Since the min-player moves first in whichever game the max-player picks for their first move, the induction hypothesis will need to cover that case as well.

(b) (OPTIONAL). State some reasonable generalization of the Fundamental Theorem to games with an infinite set $V$ of possible payoffs.

Problem 7.31.
Nim is a two-person game that starts with some piles of stones. A player's move consists of removing one or more stones from a single pile. The players alternate making moves, and whoever takes the last stone wins. It turns out there is a winning strategy for one of the players that is easy to carry out but is not so obvious.

To explain the winning strategy, we need to think of a number in two ways: as a nonnegative integer and as the bit string equal to the binary representation of the number, possibly with leading zeroes. Specifically, the XOR of numbers $r, s, \ldots$ is defined in terms of their binary representations: combine the corresponding bits of the binary representations of $r, s, \ldots$ using XOR, and then interpret the resulting bit-string as a number. For example,
$$2 \text{ XOR } 7 \text{ XOR } 9 = 12$$
because, taking XOR's down the columns, we have

0 0 1 0 (binary rep of 2)
0 1 1 1 (binary rep of 7)
1 0 0 1 (binary rep of 9)
1 1 0 0 (binary rep of 12)

This is the same as doing binary addition of the numbers, but throwing away the carries (see Problem 3.6).

The XOR of the numbers of stones in the piles is called their Nim sum. In this problem we will verify that if the Nim sum is not zero on a player's turn, then the player has a winning strategy. For example, if the game starts with five piles of equal size, then the first player has a winning strategy, but if the game starts with four equal-size piles, then the second player can force a win.

(a) Prove that if the Nim sum of the piles is zero, then any one move will leave a nonzero Nim sum.
(b) Prove that if there is a pile with more stones than the Nim sum of all the other piles, then there is a move that makes the Nim sum equal to zero.

(c) Prove that if the Nim sum is not zero, then one of the piles is bigger than the Nim sum of all the other piles.

Hint: Notice that the largest pile may not be the one that is bigger than the Nim sum of the others; three piles of sizes 2, 2, 1 is an example.

(d) Conclude that if the game begins with a nonzero Nim sum, then the first player has a winning strategy.

Hint: Describe a preserved invariant that the first player can maintain.

(e) (Extra credit) Nim is sometimes played with winners and losers reversed, that is, the person who takes the last stone loses. This is called the misère version of the game. Use ideas from the winning strategy above for regular play to find one for misère play.

8 Infinite Sets

This chapter is about infinite sets and some challenges in proving things about them.

Wait a minute! Why bring up infinity in a Mathematics for Computer Science text? After all, any data set in a computer is limited by the size of the computer’s memory, and there is a bound on the possible size of computer memory, for the simple reason that the universe is (or at least appears to be) bounded. So why not stick with finite sets of some large, but bounded, size? This is a good question, but let’s see if we can persuade you that dealing with infinite sets is inevitable.

You may not have noticed, but up to now you’ve already accepted the routine use of the integers, the rationals and irrationals, and sequences of them, infinite sets all. Further, do you really want Physics or the other sciences to give up the real numbers on the grounds that only a bounded number of bounded measurements can be made in a bounded universe?
It’s pretty convincing, and a lot simpler, to ignore such big and uncertain bounds (the universe seems to be getting bigger all the time) and accept theories using real numbers.

Likewise in computer science, it’s implausible to think that writing a program to add nonnegative integers with up to as many digits as, say, the stars in the sky (billions of galaxies each with billions of stars) would be different from writing a program that would add any two integers, no matter how many digits they had. The same is true in designing a compiler: it’s neither useful nor sensible to make use of the fact that in a bounded universe, only a bounded number of programs will ever be compiled.

Infinite sets also provide a nice setting to practice proof methods, because it’s harder to sneak in unjustified steps under the guise of intuition. And there has been a truly astonishing outcome of studying infinite sets. Their study led to the discovery of fundamental, logical limits on what computers can possibly do. For example, in Section 8.2, we’ll use reasoning developed for infinite sets to prove that it’s impossible to have a perfect type-checker for a programming language.

So in this chapter, we ask you to bite the bullet and start learning to cope with infinity.

8.1 Infinite Cardinality

In the late nineteenth century, the mathematician Georg Cantor was studying the convergence of Fourier series and found some series that he wanted to say converged “most of the time,” even though there were an infinite number of points where they didn’t converge. As a result, Cantor needed a way to compare the size of infinite sets. To get a grip on this, he got the idea of extending the Mapping Rule Theorem 4.5.4 to infinite sets: he regarded two infinite sets as having the “same size” when there was a bijection between them. Likewise, an infinite set $A$ should be considered “as big as” a set $B$ when $A$ surj $B$.
So we could consider $A$ to be “strictly smaller” than $B$, which we abbreviate as $A$ strict $B$, when $A$ is not “as big as” $B$:

Definition 8.1.1. $A$ strict $B$ iff NOT($A$ surj $B$).

On finite sets, this strict relation really does mean “strictly smaller.” This follows immediately from the Mapping Rule Theorem 4.5.4.

Corollary 8.1.2. For finite sets $A, B$,

$$A \text{ strict } B \quad\text{iff}\quad |A| < |B|.$$

Proof.

$$\begin{array}{rll}
A \text{ strict } B & \text{iff } \text{NOT}(A \text{ surj } B) & \text{(Def 8.1.1)} \\
 & \text{iff } \text{NOT}(|A| \geq |B|) & \text{(Theorem 4.5.4.(4.5))} \\
 & \text{iff } |A| < |B|. &
\end{array}$$

Cantor got diverted from his study of Fourier series by his effort to develop a theory of infinite sizes based on these ideas. His theory ultimately had profound consequences for the foundations of mathematics and computer science. But Cantor made a lot of enemies in his own time because of his work: the general mathematical community doubted the relevance of what they called “Cantor’s paradise” of unheard-of infinite sizes.

A nice technical feature of Cantor’s idea is that it avoids the need for a definition of what the “size” of an infinite set might be: all it does is compare “sizes.”

Warning: We haven’t, and won’t, define what the “size” of an infinite set is. The definition of infinite “sizes” requires the definition of some infinite sets called ordinals with special well-ordering properties. The theory of ordinals requires getting deeper into technical set theory than we want to go, and we can get by just fine without defining infinite sizes. All we need are the “as big as” and “same size” relations, surj and bij, between sets.

But there’s something else to watch out for: we’ve referred to surj as an “as big as” relation and bij as a “same size” relation on sets. Of course, most of the “as big as” and “same size” properties of surj and bij on finite sets do carry over to infinite sets, but some important ones don’t, as we’re about to show.
So you have to be careful: don’t assume that surj has any particular “as big as” property on infinite sets until it’s been proved.

Let’s begin with some familiar properties of the “as big as” and “same size” relations on finite sets that do carry over exactly to infinite sets:

Lemma 8.1.3. For any sets $A, B, C$,

1. $A$ surj $B$ iff $B$ inj $A$.
2. If $A$ surj $B$ and $B$ surj $C$, then $A$ surj $C$.
3. If $A$ bij $B$ and $B$ bij $C$, then $A$ bij $C$.
4. $A$ bij $B$ iff $B$ bij $A$.

Part 1. follows from the fact that $R$ has the $[\leq 1 \text{ out}, \geq 1 \text{ in}]$ surjective function property iff $R^{-1}$ has the $[\geq 1 \text{ out}, \leq 1 \text{ in}]$ total, injective property. Part 2. follows from the fact that compositions of surjections are surjections. Parts 3. and 4. follow from the first two parts because $R$ is a bijection iff $R$ and $R^{-1}$ are surjective functions. We’ll leave verification of these facts to Problem 4.22.

Another familiar property of finite sets carries over to infinite sets, but this time some real ingenuity is needed to prove it:

Theorem 8.1.4. [Schröder-Bernstein] For any sets $A, B$, if $A$ surj $B$ and $B$ surj $A$, then $A$ bij $B$.

That is, the Schröder-Bernstein Theorem says that if $A$ is at least as big as $B$ and conversely, $B$ is at least as big as $A$, then $A$ is the same size as $B$. Phrased this way, you might be tempted to take this theorem for granted, but that would be a mistake. For infinite sets $A$ and $B$, the Schröder-Bernstein Theorem is actually pretty technical. Just because there is a surjective function $f : A \to B$ (which need not be a bijection) and a surjective function $g : B \to A$ (which also need not be a bijection), it’s not at all clear that there must be a bijection $e : A \to B$. The idea is to construct $e$ from parts of both $f$ and $g$. We’ll leave the actual construction to Problem 8.10.

Another familiar set property is that for any two sets, either the first is at least as big as the second, or vice-versa. For finite sets this follows trivially from the Mapping Rule.
It’s actually still true for infinite sets, but assuming it was obvious would be mistaken again.

Theorem 8.1.5. For all sets $A, B$,

$$A \text{ surj } B \quad\text{OR}\quad B \text{ surj } A.$$

Theorem 8.1.5 lets us prove that another basic property of finite sets carries over to infinite ones:

Lemma 8.1.6.

$$A \text{ strict } B \quad\text{AND}\quad B \text{ strict } C \tag{8.1}$$

implies

$$A \text{ strict } C$$

for all sets $A, B, C$.

Proof. (of Lemma 8.1.6) Suppose (8.1) holds, and assume for the sake of contradiction that NOT($A$ strict $C$), which means that $A$ surj $C$. Now since $B$ strict $C$, Theorem 8.1.5 lets us conclude that $C$ surj $B$. So we have

$$A \text{ surj } C \quad\text{AND}\quad C \text{ surj } B,$$

and Lemma 8.1.3.2 lets us conclude that $A$ surj $B$, contradicting the fact that $A$ strict $B$.

We’re omitting a proof of Theorem 8.1.5 because proving it involves technical set theory (typically the theory of ordinals again) that we’re not going to get into. But since proving Lemma 8.1.6 is the only use we’ll make of Theorem 8.1.5, we hope you won’t feel cheated not to see a proof.

8.1.1 Infinity is different

A basic property of finite sets that does not carry over to infinite sets is that adding something new makes a set bigger. That is, if $A$ is a finite set and $b \notin A$, then $|A \cup \{b\}| = |A| + 1$, and so $A$ and $A \cup \{b\}$ are not the same size. But if $A$ is infinite, then these two sets are the same size!

Lemma 8.1.7. Let $A$ be a set and $b \notin A$. Then $A$ is infinite iff $A$ bij $A \cup \{b\}$.

Proof. Since $A$ is not the same size as $A \cup \{b\}$ when $A$ is finite, we only have to show that $A \cup \{b\}$ is the same size as $A$ when $A$ is infinite. That is, we have to find a bijection between $A \cup \{b\}$ and $A$ when $A$ is infinite.

Here’s how: since $A$ is infinite, it certainly has at least one element; call it $a_0$. But since $A$ is infinite, it has at least two elements, and one of them must not be equal to $a_0$; call this new element $a_1$. But since $A$ is infinite, it has at least three elements, one of which must differ from both $a_0$ and $a_1$; call this new element $a_2$.
Continuing in this way, we conclude that there is an infinite sequence $a_0, a_1, a_2, \ldots, a_n, \ldots$ of different elements of $A$. Now it’s easy to define a bijection $e : A \cup \{b\} \to A$:

$$\begin{aligned}
e(b) &::= a_0, \\
e(a_n) &::= a_{n+1} && \text{for } n \in \mathbb{N}, \\
e(a) &::= a && \text{for } a \in A - \{b, a_0, a_1, \ldots\}.
\end{aligned}$$

8.1.2 Countable Sets

A set $C$ is countable iff its elements can be listed in order, that is, the elements in $C$ are precisely the elements in the sequence

$$c_0, c_1, \ldots, c_n, \ldots.$$

Assuming no repeats in the list, saying that $C$ can be listed in this way is formally the same as saying that the function $f : \mathbb{N} \to C$ defined by the rule that $f(i) ::= c_i$ is a bijection.

Definition 8.1.8. A set $C$ is countably infinite iff $\mathbb{N}$ bij $C$. A set is countable iff it is finite or countably infinite. A set is uncountable iff it is not countable.

We can also make an infinite list using just a finite set of elements if we allow repeats. For example, we can list the elements in the three-element set $\{2, 4, 6\}$ as

$$2, 4, 6, 6, 6, \ldots.$$

This simple observation leads to an alternative characterization of countable sets that does not make separate cases of finite and infinite sets. Namely, a set $C$ is countable iff there is a list

$$c_0, c_1, \ldots, c_n, \ldots$$

of the elements of $C$, possibly with repeats.

Lemma 8.1.9. A set $C$ is countable iff $\mathbb{N}$ surj $C$. In fact, a nonempty set $C$ is countable iff there is a total surjective function $g : \mathbb{N} \to C$.

The proof is left to Problem 8.11.

The most fundamental countably infinite set is the set $\mathbb{N}$ itself. But the set $\mathbb{Z}$ of all integers is also countably infinite, because the integers can be listed in the order:

$$0, -1, 1, -2, 2, -3, 3, \ldots. \tag{8.2}$$

In this case, there is a simple formula for the $n$th element of the list (8.2). That is, the bijection $f : \mathbb{N} \to \mathbb{Z}$ such that $f(n)$ is the $n$th element of the list can be defined as:

$$f(n) ::= \begin{cases} n/2 & \text{if } n \text{ is even,} \\ -(n+1)/2 & \text{if } n \text{ is odd.} \end{cases}$$

There is also a simple way to list all pairs of nonnegative integers, which shows that $(\mathbb{N} \times \mathbb{N})$ is also countably infinite (Problem 8.17). From this, it’s a small step to reach the conclusion that the set $\mathbb{Q}^{\geq 0}$ of nonnegative rational numbers is countable. This may be a surprise: after all, the rationals densely fill up the space between integers, and for any two, there’s another in between. So it might seem as though you couldn’t write out all the rationals in a list, but Problem 8.9 illustrates how to do it. More generally, it is easy to show that countable sets are closed under unions and products (Problems 8.16 and 8.17), which implies the countability of a bunch of familiar sets:

Corollary 8.1.10. The following sets are countably infinite:

$$\mathbb{Z}^+,\ \mathbb{Z},\ \mathbb{N} \times \mathbb{N},\ \mathbb{Q}^+,\ \mathbb{Z} \times \mathbb{Z},\ \mathbb{Q}.$$

A small modification of the proof of Lemma 8.1.7 shows that countably infinite sets are the “smallest” infinite sets. Namely,

Lemma 8.1.11. If $A$ is an infinite set, and $B$ is countable, then $A$ surj $B$.

We leave the proof to Problem 8.8.

Also, since adding one new element to an infinite set doesn’t change its size, you can add any finite number of elements without changing the size by simply adding one element after another. Something even stronger is true: you can add a countably infinite number of new elements to an infinite set and still wind up with just a set of the same size (Problem 8.13).

By the way, it’s a common mistake to think that, because you can add any finite number of elements to an infinite set and have a bijection with the original set, you can also throw in infinitely many new elements. In general it isn’t true that just because it’s OK to do something any finite number of times, it’s also OK to do it an infinite number of times.
For example, starting from 3, you can increment by 1 any finite number of times, and the result will be some integer greater than or equal to 3. But if you increment an infinite number of times, you don’t get an integer at all.

8.1.3 Power sets are strictly bigger

Cantor’s astonishing discovery was that not all infinite sets are the same size. In particular, he proved that for any set $A$ the power set $\text{pow}(A)$ is “strictly bigger” than $A$. That is,

Theorem 8.1.12. [Cantor] For any set $A$,

$$A \text{ strict } \text{pow}(A).$$

Proof. To show that $A$ is strictly smaller than $\text{pow}(A)$, we have to show that if $g$ is a function from $A$ to $\text{pow}(A)$, then $g$ is not a surjection. To do this, we’ll simply find a subset $A_g \subseteq A$ that is not in the range of $g$. The idea is, for any element $a \in A$, to look at the set $g(a) \subseteq A$ and ask whether or not $a$ happens to be in $g(a)$. First, define

$$A_g ::= \{a \in A \mid a \notin g(a)\}.$$

$A_g$ is now a well-defined subset of $A$, which means it is a member of $\text{pow}(A)$. But $A_g$ can’t be in the range of $g$, because if it were, we would have

$$A_g = g(a_0)$$

for some $a_0 \in A$, so by definition of $A_g$,

$$a \in g(a_0) \quad\text{iff}\quad a \in A_g \quad\text{iff}\quad a \notin g(a)$$

for all $a \in A$. Now letting $a = a_0$ yields the contradiction

$$a_0 \in g(a_0) \quad\text{iff}\quad a_0 \notin g(a_0).$$

So $g$ is not a surjection, because there is an element in the power set of $A$, specifically the set $A_g$, that is not in the range of $g$.

Cantor’s Theorem immediately implies:

Corollary 8.1.13. $\text{pow}(\mathbb{N})$ is uncountable.

Proof. By Lemma 8.1.9, $U$ is uncountable iff $\mathbb{N}$ strict $U$. Cantor’s Theorem gives $\mathbb{N}$ strict $\text{pow}(\mathbb{N})$, so $\text{pow}(\mathbb{N})$ is uncountable.

The bijection between subsets of an $n$-element set and the length $n$ bit-strings $\{0,1\}^n$ used to prove Theorem 4.5.5 carries over to a bijection between subsets of a countably infinite set and the infinite bit-strings $\{0,1\}^\omega$. That is,

$$\text{pow}(\mathbb{N}) \text{ bij } \{0,1\}^\omega.$$

This immediately implies

Corollary 8.1.14. $\{0,1\}^\omega$ is uncountable.
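The construction of $A_g$ in the proof of Theorem 8.1.12 applies to finite sets too, so it can be checked exhaustively on a small example. The sketch below is an illustration only: it assumes a function $g$ is encoded as a Python dictionary from elements of $A$ to frozensets, and the helper names `diagonal_set` and `powerset` are hypothetical.

```python
from itertools import combinations, product


def diagonal_set(A, g):
    """Return A_g = {a in A | a not in g(a)}, where g maps A to subsets of A."""
    return frozenset(a for a in A if a not in g[a])


def powerset(A):
    """List all subsets of A as frozensets."""
    items = list(A)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]


A = [0, 1, 2]
subsets = powerset(A)  # the 8 subsets of A

# Enumerate every function g: A -> pow(A) (there are 8**3 of them) and check
# that the diagonal set A_g is never one of the values g(a).
all_missed = all(
    diagonal_set(A, dict(zip(A, images))) not in images
    for images in product(subsets, repeat=len(A))
)
print(all_missed)

# One specific g, to see the construction at work.
g = {0: frozenset({0}), 1: frozenset(), 2: frozenset({0, 1})}
print(diagonal_set(A, g))
```

The exhaustive check only covers a three-element set, of course. The point of Cantor's proof is that the same argument works for every set $A$, finite or infinite.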
More Countable and Uncountable Sets

Once we have a few sets we know are countable or uncountable, we can get lots more examples using Lemma 8.1.3. In particular, we can appeal to the following immediate corollary of the Lemma:

Corollary 8.1.15.
(a) If $U$ is an uncountable set and $A$ surj $U$, then $A$ is uncountable.
(b) If $C$ is a countable set and $C$ surj $A$, then $A$ is countable.

For example, now that we know that the set $\{0,1\}^\omega$ of infinite bit strings is uncountable, it’s a small step to conclude that

Corollary 8.1.16. The set $\mathbb{R}$ of real numbers is uncountable.

To prove this, think about the infinite decimal expansion of a real number:

$$\begin{aligned}
\sqrt{2} &= 1.4142\ldots, \\
5 &= 5.000\ldots, \\
1/10 &= 0.1000\ldots, \\
1/3 &= 0.333\ldots, \\
1/9 &= 0.111\ldots, \\
4\tfrac{1}{99} &= 4.010101\ldots.
\end{aligned}$$

Let’s map any real number $r$ to the infinite bit string $b(r)$ equal to the sequence of bits in the decimal expansion of $r$, starting at the decimal point. If the decimal expansion of $r$ happens to contain a digit other than 0 or 1, leave $b(r)$ undefined. For example,

$$\begin{aligned}
b(5) &= 000\ldots, \\
b(1/10) &= 1000\ldots, \\
b(1/9) &= 111\ldots, \\
b(4\tfrac{1}{99}) &= 010101\ldots, \\
b(\sqrt{2}),\ b(1/3) &\text{ are undefined.}
\end{aligned}$$

Now $b$ is a function from real numbers to infinite bit strings.[1] It is not a total function, but it clearly is a surjection. This shows that

$$\mathbb{R} \text{ surj } \{0,1\}^\omega,$$

and the uncountability of the reals now follows by Corollary 8.1.15.(a).

For another example, let’s prove

Corollary 8.1.17. The set $(\mathbb{Z}^+)^*$ of all finite sequences of positive integers is countable.

To prove this, think about the prime factorization of a nonnegative integer:

$$\begin{aligned}
20 &= 2^2 \cdot 3^0 \cdot 5^1 \cdot 7^0 \cdot 11^0 \cdot 13^0 \cdots, \\
6615 &= 2^0 \cdot 3^3 \cdot 5^1 \cdot 7^2 \cdot 11^0 \cdot 13^0 \cdots.
\end{aligned}$$

Let’s map any nonnegative integer $n$ to the finite sequence $e(n)$ of nonzero exponents in its prime factorization.
For example,

$$\begin{aligned}
e(20) &= (2, 1), \\
e(6615) &= (3, 1, 2), \\
e(5^{13} \cdot 11^{9} \cdot 47^{817} \cdot 103^{44}) &= (13, 9, 817, 44), \\
e(1) &= \lambda \quad \text{(the empty string)}, \\
e(0) &\text{ is undefined.}
\end{aligned}$$

Now $e$ is a function from $\mathbb{N}$ to $(\mathbb{Z}^+)^*$. It is defined on all positive integers, and it clearly is a surjection. This shows that

$$\mathbb{N} \text{ surj } (\mathbb{Z}^+)^*,$$

and the countability of the finite strings of positive integers now follows by Corollary 8.1.15.(b).

[1] Some rational numbers can be expanded in two ways: as an infinite sequence ending in all 0’s or as an infinite sequence ending in all 9’s. For example,

$$5 = 5.000\ldots = 4.999\ldots, \qquad \tfrac{1}{10} = 0.1000\ldots = 0.0999\ldots.$$

In such cases, define $b(r)$ to be the sequence that ends with all 0’s.

Larger Infinities

There are lots of different sizes of infinite sets. For example, starting with the infinite set $\mathbb{N}$ of nonnegative integers, we can build the infinite sequence of sets

$$\mathbb{N} \text{ strict } \text{pow}(\mathbb{N}) \text{ strict } \text{pow}(\text{pow}(\mathbb{N})) \text{ strict } \text{pow}(\text{pow}(\text{pow}(\mathbb{N}))) \text{ strict } \ldots.$$

By Cantor’s Theorem 8.1.12, each of these sets is strictly bigger than all the preceding ones. But that’s not all: the union of all the sets in the sequence is strictly bigger than each set in the sequence (see Problem 8.24). In this way you can keep going indefinitely, building “bigger” infinities all the way.

8.1.4 Diagonal Argument

Theorem 8.1.12 and similar proofs are collectively known as “diagonal arguments” because of a more intuitive version of the proof described in terms of an infinite square array. Namely, suppose there was a bijection between $\mathbb{N}$ and $\{0,1\}^\omega$. If such a relation existed, we would be able to display it as a list of the infinite bit strings in some countable order or another. Once we’d found a viable way to organize this list, any given string in $\{0,1\}^\omega$ would appear in a finite number of steps, just as any integer you can name will show up a finite number of steps from 0.
This hypothetical list would look something like the one below, extending to infinity both vertically and horizontally:

$$\begin{array}{rccccccc}
A_0 = & 1 & 0 & 0 & 0 & 1 & 1 & \cdots \\
A_1 = & 0 & 1 & 1 & 1 & 0 & 1 & \cdots \\
A_2 = & 1 & 1 & 1 & 1 & 1 & 1 & \cdots \\
A_3 = & 0 & 1 & 0 & 0 & 1 & 0 & \cdots \\
A_4 = & 0 & 0 & 1 & 0 & 0 & 0 & \cdots \\
A_5 = & 1 & 0 & 0 & 1 & 1 & 1 & \cdots \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots
\end{array}$$

But now we can exhibit a sequence that’s missing from our allegedly complete list of all the sequences. Look at the diagonal in our sample list:

$$\begin{array}{rccccccc}
A_0 = & \mathbf{1} & 0 & 0 & 0 & 1 & 1 & \cdots \\
A_1 = & 0 & \mathbf{1} & 1 & 1 & 0 & 1 & \cdots \\
A_2 = & 1 & 1 & \mathbf{1} & 1 & 1 & 1 & \cdots \\
A_3 = & 0 & 1 & 0 & \mathbf{0} & 1 & 0 & \cdots \\
A_4 = & 0 & 0 & 1 & 0 & \mathbf{0} & 0 & \cdots \\
A_5 = & 1 & 0 & 0 & 1 & 1 & \mathbf{1} & \cdots \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots
\end{array}$$

Here is why the diagonal argument has its name: we can form a sequence $D$ consisting of the bits on the diagonal.

$$D = 1\ 1\ 1\ 0\ 0\ 1\ \cdots,$$

Then, we can form another sequence by switching the 1’s and 0’s along the diagonal. Call this sequence $C$:

$$C = 0\ 0\ 0\ 1\ 1\ 0\ \cdots.$$

Now if the $n$th term of $A_n$ is 1 then the $n$th term of $C$ is 0, and vice versa, which guarantees that $C$ differs from $A_n$. In other words, $C$ has at least one bit different from every sequence on our list. So $C$ is an element of $\{0,1\}^\omega$ that does not appear in our list; our list can’t be complete!

This diagonal sequence $C$ corresponds to the set $\{a \in A \mid a \notin g(a)\}$ in the proof of Theorem 8.1.12. Both are defined in terms of a countable subset of the uncountable infinity in a way that excludes them from that subset, thereby proving that no countable subset can be as big as the uncountable set.

8.2 The Halting Problem

Although towers of larger and larger infinite sets are at best a romantic concern for a computer scientist, the reasoning that leads to these conclusions plays a critical role in the theory of computation. Diagonal arguments are used to show that lots of problems can’t be solved by computation, and there is no getting around it.

This story begins with a reminder that having procedures operate on programs is a basic part of computer science technology.
For example, compilation refers to taking any given program text written in some “high level” programming language like Java, C++, Python, . . . , and then generating a program of low-level instructions that does the same thing but is targeted to run well on available hardware. Similarly, interpreters or virtual machines are procedures that take a program text designed to be run on one kind of computer and simulate it on another kind of computer. Routine features of compilers involve “type-checking” programs to ensure that certain kinds of run-time errors won’t happen, and “optimizing” the generated programs so they run faster or use less memory.

The fundamental thing that just can’t be done by computation is a perfect job of type-checking, optimizing, or any kind of analysis of the overall run time behavior of programs. In this section, we’ll illustrate this with a basic example known as the Halting Problem. The general Halting Problem for some programming language is, given an arbitrary program, to determine whether the program will run forever if it is not interrupted. If the program does not run forever, it is said to halt. Real programs may halt in many ways, for example, by returning some final value, aborting with some kind of error, or by awaiting user input. But it’s easy to detect when any given program will halt: just run it on a virtual machine and wait till it stops. The problem comes when the given program does not halt: you may wind up waiting indefinitely without realizing that the wait is fruitless. So how could you detect that the program does not halt? We will use a diagonal argument to prove that if an analysis program tries to recognize the non-halting programs, it is bound to give wrong answers, or no answers, for an infinite number of the programs it is supposed to be able to analyze!
To be precise about this, let’s call a programming procedure (written in your favorite programming language: C++, or Java, or Python) a string procedure when it is applicable to strings in the set $\text{ASCII}^*$ of strings over the 256 character ASCII alphabet.

As a simple example, you might think about how to write a string procedure that halts precisely when it is applied to a double letter string in $\text{ASCII}^*$, namely, a string in which every character occurs twice in a row. For example, aaCC33 and zz++ccBB are double letter strings, but aa;bb, b33, and AAAAA are not.

If the computation that happens when a procedure is applied to a string eventually comes to a halt, the procedure is said to recognize the string. In this context, a set of strings is commonly called a (formal) language. We let $\text{lang}(P)$ be the language recognized by procedure $P$:

$$\text{lang}(P) ::= \{s \in \text{ASCII}^* \mid P \text{ applied to } s \text{ halts}\}.$$

A language is called recognizable when it equals $\text{lang}(P)$ for some string procedure $P$. For example, we’ve just agreed that the set of double letter strings is recognizable.

There is no harm in assuming that every program can be written as a string in $\text{ASCII}^*$; they usually are. When a string $s \in \text{ASCII}^*$ is actually the ASCII description of some string procedure, we’ll refer to that string procedure as $P_s$. You can think of $P_s$ as the result of compiling $s$ into something executable.[2] It’s technically helpful to treat every string in $\text{ASCII}^*$ as a program for a string procedure. So when a string $s \in \text{ASCII}^*$ doesn’t parse as a proper string procedure, we’ll define $P_s$ to be some default string procedure, say one that never halts on anything it is applied to.

[2] The string $s \in \text{ASCII}^*$ and the procedure $P_s$ have to be distinguished to avoid a type error: you can’t apply a string to a string. For example, let $s$ be the string that you wrote as your program to recognize the double letter strings. Applying $s$ to a string argument, say aabbccdd, should throw a type exception; what you need to do is compile $s$ to the procedure $P_s$ and then apply $P_s$ to aabbccdd.

Focusing just on string procedures, the general Halting Problem is to decide, given strings $s$ and $t$, whether or not the procedure $P_s$ recognizes $t$. We’ll show that the general problem can’t be solved by showing that a special case can’t be solved, namely, whether or not $P_s$ recognizes $s$.

Definition 8.2.1.

$$\text{No-halt} ::= \{s \mid P_s \text{ applied to } s \text{ does not halt}\} = \{s \mid s \notin \text{lang}(P_s)\}. \tag{8.3}$$

We’re going to prove

Theorem 8.2.2. No-halt is not recognizable.

We’ll use an argument just like Cantor’s in the proof of Theorem 8.1.12.

Proof. By definition,

$$s \in \text{No-halt} \quad\text{IFF}\quad s \notin \text{lang}(P_s), \tag{8.4}$$

for all strings $s \in \text{ASCII}^*$.

Now suppose to the contrary that No-halt was recognizable. This means there is some procedure $P_{s_0}$ that recognizes No-halt, that is,

$$\text{No-halt} = \text{lang}(P_{s_0}).$$

Combined with (8.4), we get

$$s \in \text{lang}(P_{s_0}) \quad\text{iff}\quad s \notin \text{lang}(P_s) \tag{8.5}$$

for all $s \in \text{ASCII}^*$. Now letting $s = s_0$ in (8.5) yields the immediate contradiction

$$s_0 \in \text{lang}(P_{s_0}) \quad\text{iff}\quad s_0 \notin \text{lang}(P_{s_0}).$$

This contradiction implies that No-halt cannot be recognized by any string procedure.

So that does it: it’s logically impossible for programs in any particular language to solve just this special case of the general Halting Problem for programs in that language. And having proved that it’s impossible to have a procedure that figures out whether an arbitrary program halts, it’s easy to show that it’s impossible to have a procedure that is a perfect recognizer for any overall run time property.[3]

[3] The weasel word “overall” creeps in here to rule out some run time properties that are easy to recognize because they depend only on part of the run time behavior. For example, the set of programs that halt after executing at most 100 instructions is recognizable.
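The contrast between halting and bounded halting can be sketched in code. The model below is an assumption made only for illustration: a string procedure is a Python generator function, each `yield` counts as one step, and returning from the generator counts as halting. The names `double_letter` and `halts_within` are hypothetical, and the double letter test reads the definition pairwise (characters 0 and 1 match, characters 2 and 3 match, and so on).

```python
def double_letter(s):
    """Sketch of a string procedure that halts iff s is a double letter string.

    Modeled as a generator: every `yield` is one step of computation.
    """
    if len(s) % 2 != 0:
        while True:          # odd length: never halt
            yield
    for i in range(0, len(s), 2):
        yield                # one step per pair of characters examined
        if s[i] != s[i + 1]:
            while True:      # mismatched pair: never halt
                yield
    # Falling off the end of the generator models halting.


def halts_within(proc, s, limit):
    """Run proc on s for at most `limit` steps; report whether it halted."""
    run = proc(s)
    for _ in range(limit):
        try:
            next(run)
        except StopIteration:
            return True      # the procedure halted within the step budget
    return False             # budget exhausted; says nothing about later halting


print(halts_within(double_letter, "aaCC33", 100))
print(halts_within(double_letter, "b33", 100))
```

A `True` answer from `halts_within` is conclusive, which is why bounded halting is easy to decide. A `False` answer only means the step budget ran out; the procedure might still halt later. No choice of budget turns this check into a recognizer for No-halt.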
For example, most compilers do “static” type-checking at compile time to ensure that programs won’t make run-time type errors. A program that type-checks is guaranteed not to cause a run-time type-error. But since it’s impossible to recognize perfectly when programs won’t cause type-errors, it follows that the type-checker must be rejecting programs that really wouldn’t cause a type-error. The conclusion is that no type-checker is perfect: you can always do better!

It’s a different story if we think about the practical possibility of writing program analyzers. The fact that it’s logically impossible to analyze perfectly arbitrary programs does not mean that you can’t do a very good job analyzing interesting programs that come up in practice. In fact, these “interesting” programs are commonly intended to be analyzable in order to confirm that they do what they’re supposed to do.

In the end, it’s not clear how much of a hurdle this theoretical limitation implies in practice. But the theory does provide some perspective on claims about general analysis methods for programs. The theory tells us that people who make such claims either

are exaggerating the power (if any) of their methods, perhaps to make a sale or get a grant, or

are trying to keep things simple by not going into technical limitations they’re aware of, or

perhaps most commonly, are so excited about some useful practical successes of their methods that they haven’t bothered to think about the limitations which must be there.

So from now on, if you hear people making claims about having general program analysis/verification/optimization methods, you’ll know they can’t be telling the whole story.

One more important point: there’s no hope of getting around this by switching programming languages.
Our proof covered programs written in some given programming language like Java, for example, and concluded that no Java program can perfectly analyze all Java programs. Could there be a C++ analysis procedure that successfully takes on all Java programs? After all, C++ does allow more intimate manipulation of computer memory than Java does. But there is no loophole here: it’s possible to write a virtual machine for C++ in Java, so if there were a C++ procedure that analyzed Java programs, the Java virtual machine would be able to do it too, and that’s impossible. These logical limitations on the power of computation apply no matter what kinds of programs or computers you use.

8.3 The Logic of Sets

8.3.1 Russell’s Paradox

Reasoning naively about sets turns out to be risky. In fact, one of the earliest attempts to come up with precise axioms for sets, made in the late nineteenth century by the logician Gottlob Frege, was shot down by a three line argument known as Russell’s Paradox,[4] which reasons in nearly the same way as the proof of Cantor’s Theorem 8.1.12. This was an astonishing blow to efforts to provide an axiomatic foundation for mathematics:

Russell’s Paradox

Let $S$ be a variable ranging over all sets, and define

$$W ::= \{S \mid S \notin S\}.$$

So by definition,

$$S \in W \quad\text{iff}\quad S \notin S,$$

for every set $S$. In particular, we can let $S$ be $W$, and obtain the contradictory result that

$$W \in W \quad\text{iff}\quad W \notin W.$$

The simplest reasoning about sets crashes mathematics! Russell and his colleague Whitehead spent years trying to develop a set theory that was not contradictory, but would still do the job of serving as a solid logical foundation for all of mathematics.

Actually, a way out of the paradox was clear to Russell and others at the time: it’s unjustified to assume that $W$ is a set. The step in the proof where we let $S$ be $W$ has no justification, because $S$ ranges over sets, and $W$ might not be a set.
In fact, the paradox implies that $W$ had better not be a set!

But denying that $W$ is a set means we must reject the very natural axiom that every mathematically well-defined collection of sets is actually a set. The problem faced by Frege, Russell and their fellow logicians was how to specify which well-defined collections are sets. Russell and his Cambridge University colleague Whitehead immediately went to work on this problem. They spent a dozen years developing a huge new axiom system in an even huger monograph called Principia Mathematica, but for all intents and purposes, their approach failed. It was so cumbersome no one ever used it, and it was subsumed by a much simpler, and now widely accepted, axiomatization of set theory by the logicians Zermelo and Fraenkel.

[4] Bertrand Russell was a mathematician/logician at Cambridge University at the turn of the twentieth century. He reported that when he felt too old to do mathematics, he began to study and write about philosophy, and when he was no longer smart enough to do philosophy, he began writing about politics. He was jailed as a conscientious objector during World War I. For his extensive philosophical and political writing, he won a Nobel Prize for Literature.

8.3.2 The ZFC Axioms for Sets

A formula of set theory [5] is a predicate formula that only talks about membership in sets. That is, a first-order formula of set theory is built using logical connectives and quantifiers starting solely from expressions of the form “$x \in y$.” The domain of discourse is the collection of sets, and “$x \in y$” is interpreted to mean that $x$ and $y$ are variables that range over sets, and $x$ is one of the elements in $y$.
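To make the idea of a formula built solely from membership concrete, here is a minimal sketch in Python. It rests on assumptions of our own: the real domain of discourse is the collection of all sets, whereas the sketch lets quantifiers range over a tiny finite `UNIVERSE` of sets built from the empty set, and the helper names `powerset`, `is_empty`, and `has_empty_member` are hypothetical. Each function uses nothing but membership tests, quantifiers, and propositional connectives.

```python
from itertools import combinations

def powerset(elements):
    """All subsets of a finite collection, each as a frozenset."""
    items = list(elements)
    return {frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)}

# A tiny stand-in for the domain of discourse (an assumption of this
# sketch): every set obtainable from {emptyset} by three rounds of
# taking power sets. There are 16 such sets.
UNIVERSE = {frozenset()}
for _ in range(3):
    UNIVERSE = powerset(UNIVERSE)

def is_empty(x):
    """forall w. NOT(w in x), with w ranging over UNIVERSE."""
    return all(w not in x for w in UNIVERSE)

def has_empty_member(x):
    """exists y. (y in x AND forall w. NOT(w in y))."""
    return any(y in x and is_empty(y) for y in UNIVERSE)

empty = frozenset()
one = frozenset({empty})
print(len(UNIVERSE))           # 16
print(is_empty(empty))         # True
print(has_empty_member(one))   # True
print(has_empty_member(empty)) # False
```

The finite universe works here because every member of a set in `UNIVERSE` is again in `UNIVERSE`, so the bounded quantifiers see all the elements that matter for these two formulas.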
Formulas of set theory are not even allowed to have the equality symbol “$=$,” but sets are equal iff they have the same elements, so there is an easy way to express equality of sets purely in terms of membership:
$$(x = y) ::= \forall z.\, (z \in x \text{ IFF } z \in y). \tag{8.6}$$
Similarly, the subset symbol “$\subseteq$” is not allowed in formulas of set theory, but we can also express subset purely in terms of membership:
$$(x \subseteq y) ::= \forall z.\, (z \in x \text{ IMPLIES } z \in y). \tag{8.7}$$
So formulas using symbols “$=, \subseteq$,” in addition to “$\in$” can be understood as abbreviations for formulas only using “$\in$.” We won’t worry about this distinction between formulas and abbreviations for formulas; we’ll now just call them all “formulas of set theory.” For example,
$$x = y \text{ IFF } [x \subseteq y \text{ AND } y \subseteq x]$$
is a formula of set theory that explains a basic connection between set equality and set containment.

It’s generally agreed that essentially all of mathematics can be derived from a few formulas of set theory, called the Axioms of Zermelo-Fraenkel Set Theory with Choice (ZFC), using a few simple logical deduction rules.

[5] Technically this is called a pure first-order formula of set theory.

We’re not going to be studying the axioms of ZFC in this text, but we thought you might like to see them and, while you’re at it, get some more practice reading and writing quantified formulas:

Extensionality. Two sets are equal iff they are members of the same sets:
$$x = y \text{ IFF } (\forall z.\, x \in z \text{ IFF } y \in z).$$

Pairing. For any two sets $x$ and $y$, there is a set $\{x, y\}$ with $x$ and $y$ as its only elements:
$$\forall x, y\, \exists u\, \forall z.\, [z \in u \text{ IFF } (z = x \text{ OR } z = y)]$$

Union. The union $u$ of a collection $z$ of sets is also a set:
$$\forall z\, \exists u\, \forall x.\, (x \in u) \text{ IFF } (\exists y.\, x \in y \text{ AND } y \in z)$$

Infinity. There is an infinite set. Specifically, there is a nonempty set $x$ such that for any set $y \in x$, the set $\{y\}$ is also a member of $x$.

Subset.
Given any set $x$ and any definable property of sets, there is a set $y$ containing precisely those elements in $x$ that have the property.
$$\forall x\, \exists y\, \forall z.\, z \in y \text{ IFF } [z \in x \text{ AND } \phi(z)]$$
where $\phi(z)$ is a formula of set theory. [6]

Power Set. All the subsets of a set form another set:
$$\forall x\, \exists p\, \forall u.\, u \subseteq x \text{ IFF } u \in p.$$

Replacement. Suppose a formula $\phi$ of set theory defines the graph of a total function on a set $s$, that is,
$$\forall x \in s\ \exists y.\, \phi(x, y),$$
and
$$\forall x \in s\ \forall y, z.\, [\phi(x, y) \text{ AND } \phi(x, z)] \text{ IMPLIES } y = z.$$
Then the image of $s$ under that function is also a set $t$. Namely,
$$\exists t\, \forall y.\, y \in t \text{ IFF } [\exists x \in s.\, \phi(x, y)].$$

[6] This axiom is more commonly called the Comprehension Axiom.

Foundation. The aim is to forbid any infinite sequence of sets of the form
$$\cdots \in x_n \in \cdots \in x_1 \in x_0$$
in which each set is a member of the next one. This can be captured by saying every nonempty set has a “member-minimal” element. Namely, define
$$\text{member-minimal}(m, x) ::= [m \in x \text{ AND } \forall y \in x.\, y \notin m].$$
Then the Foundation Axiom [7] is
$$\forall x.\, x \neq \emptyset \text{ IMPLIES } \exists m.\, \text{member-minimal}(m, x).$$

Choice. Let $s$ be a set of nonempty, disjoint sets. Then there is a set $c$ consisting of exactly one element from each set in $s$. The formula is given in Problem 8.30.

8.3.3 Avoiding Russell’s Paradox

These modern ZFC axioms for set theory are much simpler than the system Russell and Whitehead first came up with to avoid paradox. In fact, the ZFC axioms are as simple and intuitive as Frege’s original axioms, with one technical addition: the Foundation axiom. Foundation captures the intuitive idea that sets must be built up from “simpler” sets in certain standard ways. And in particular, Foundation implies that no set is ever a member of itself. So the modern resolution of Russell’s paradox goes as follows: since $S \notin S$ for all sets $S$, it follows that $W$, defined above, contains every set. This means $W$ can’t be a set, or it would be a member of itself.

8.4 Does All This Really Work?
So this is where mainstream mathematics stands today: there is a handful of ZFC axioms from which virtually everything else in mathematics can be logically derived. This sounds like a rosy situation, but there are several dark clouds, suggesting that the essence of truth in mathematics is not completely resolved.

[7] This axiom is also called the Regularity Axiom.

The ZFC axioms weren’t etched in stone by God. Instead, they were mostly made up by Zermelo, who may have been a brilliant logician, but was also a fallible human being; probably some days he forgot his house keys. So maybe Zermelo, just like Frege, didn’t get his axioms right and will be shot down by some successor to Russell who will use his axioms to prove a proposition $P$ and its negation $\overline{P}$. Then math as we understand it would be broken. This may sound crazy, but it has happened before.

In fact, while there is broad agreement that the ZFC axioms are capable of proving all of standard mathematics, the axioms have some further consequences that sound paradoxical. For example, the Banach-Tarski Theorem says that, as a consequence of the axiom of choice, a solid ball can be divided into six pieces and then the pieces can be rigidly rearranged to give two solid balls of the same size as the original!

Some basic questions about the nature of sets remain unresolved. For example, Cantor raised the question whether there is a set whose size is strictly between the smallest infinite set $\mathbb{N}$ (see Problem 8.8) and the strictly larger set $\text{pow}(\mathbb{N})$. Cantor guessed not:

Cantor’s Continuum Hypothesis: There is no set $A$ such that
$$\mathbb{N} \text{ strict } A \text{ strict } \text{pow}(\mathbb{N}).$$

The Continuum Hypothesis remains an open problem a century later.
Its difficulty arises from one of the deepest results in modern Set Theory, discovered in part by Gödel in the 1930’s and Paul Cohen in the 1960’s: namely, the ZFC axioms are not sufficient to settle the Continuum Hypothesis. There are two collections of sets, each obeying the laws of ZFC, and in one collection the Continuum Hypothesis is true, and in the other it is false. Until a mathematician with a deep understanding of sets can extend ZFC with persuasive new axioms, the Continuum Hypothesis will remain undecided.

But even if we use more or different axioms about sets, there are some unavoidable problems. In the 1930’s, Gödel proved that, assuming that an axiom system like ZFC is consistent (meaning you can’t prove both $P$ and $\overline{P}$ for any proposition $P$), the very proposition that the system is consistent (which is not too hard to express as a logical formula) cannot be proved in the system. In other words, no consistent system is strong enough to verify itself.

8.4.1 Large Infinities in Computer Science

If the romance of different-size infinities and continuum hypotheses doesn’t appeal to you, not knowing about them is not going to limit you as a computer scientist. These abstract issues about infinite sets rarely come up in mainstream mathematics, and they don’t come up at all in computer science, where the focus is generally on “countable,” and often just finite, sets. In practice, only logicians and set theorists have to worry about collections that are “too big” to be sets. That’s part of the reason that the 19th century mathematical community made jokes about “Cantor’s paradise” of obscure infinities. But the challenge of reasoning correctly about this far-out stuff led directly to the profound discoveries about the logical limits of computation described in Section 8.2, and that really is something every computer scientist should understand.
Problems for Section 8.1

Practice Problems

Problem 8.1.
Show that the set $\{0, 1\}^*$ of finite binary strings is countable.

Problem 8.2.
Describe an example of two uncountable sets $A$ and $B$ such that there is no bijection between $A$ and $B$.

Problem 8.3.
Indicate which of the following assertions (there may be more than one) are equivalent to
$$A \text{ strict } \mathbb{N}.$$
$|A|$ is undefined.
$A$ is countably infinite.
$A$ is uncountable.
$A$ is finite.
$\mathbb{N} \text{ surj } A$.
$\forall n \in \mathbb{N},\ |A| \leq n$.
$\forall n \in \mathbb{N},\ |A| \geq n$.
$\exists n \in \mathbb{N}.\ |A| \leq n$.
$\exists n \in \mathbb{N}.\ |A| < n$.

Problem 8.4.
Prove that if there is a total injective ($[\geq 1\ \text{out}]$, $[\leq 1\ \text{in}]$) relation from $S$ to $\mathbb{N}$, then $S$ is countable.

Problem 8.5.
Prove that if $S$ is an infinite set, then $\text{pow}(S)$ is uncountable.

Problem 8.6.
Let $A$ be some infinite set and $B$ be some countable set. We know from Lemma 8.1.7 that
$$A \text{ bij } (A \cup \{b_0\})$$
for any element $b_0 \in B$. An easy induction implies that
$$A \text{ bij } (A \cup \{b_0, b_1, \dots, b_n\}) \tag{8.8}$$
for any finite subset $\{b_0, b_1, \dots, b_n\} \subseteq B$.

Students sometimes think that (8.8) shows that $A \text{ bij } (A \cup B)$. Now it’s true that $A \text{ bij } (A \cup B)$ for any infinite set $A$ and countable set $B$ (Problem 8.13), but the facts above do not prove it.

To explain this, let’s say that a predicate $P(C)$ is finitely discontinuous when $P(A \cup F)$ is true for every finite subset $F \subseteq B$, but $P(A \cup B)$ is false. The hole in the claim that (8.8) implies $A \text{ bij } (A \cup B)$ is the assumption (without proof) that the predicate
$$P_0(C) ::= [A \text{ bij } C]$$
is not finitely discontinuous. This assumption about $P_0$ is correct, but it’s not completely obvious and takes some proving.

To illustrate this point, let $A$ be the nonnegative integers and $B$ be the nonnegative rational numbers, and remember that both $A$ and $B$ are countably infinite. Some of the predicates $P(C)$ below are finitely discontinuous and some are not. Indicate which is which.

1. $C$ is finite.
2.
$C$ is countable.
3. $C$ is uncountable.
4. $C$ contains only finitely many non-integers.
5. $C$ contains the rational number 2/3.
6. There is a maximum non-integer in $C$.
7. There is an $\epsilon > 0$ such that any two elements of $C$ are $\geq \epsilon$ apart.
8. $C$ is countable.
9. $C$ is uncountable.
10. $C$ has no infinite decreasing sequence $c_0 > c_1 > \cdots$.
11. Every nonempty subset of $C$ has a minimum element.
12. $C$ has a maximum element.
13. $C$ has a minimum element.

Class Problems

Problem 8.7.
Show that the set $\mathbb{N}^*$ of finite sequences of nonnegative integers is countable.

Problem 8.8.
(a) Several students felt the proof of Lemma 8.1.7 was worrisome, if not circular. What do you think?
(b) Use the proof of Lemma 8.1.7 to show that if $A$ is an infinite set, then $A \text{ surj } \mathbb{N}$, that is, every infinite set is “at least as big as” the set of nonnegative integers.

Problem 8.9.
The rational numbers fill the space between integers, so a first thought is that there must be more of them than the integers, but it’s not true. In this problem you’ll show that there are the same number of positive rationals as positive integers. That is, the positive rationals are countable.

(a) Define a bijection between the set $\mathbb{Z}^+$ of positive integers, and the set $(\mathbb{Z}^+ \times \mathbb{Z}^+)$ of all pairs of positive integers:
$$\begin{array}{llllll}
(1,1), & (1,2), & (1,3), & (1,4), & (1,5), & \dots \\
(2,1), & (2,2), & (2,3), & (2,4), & (2,5), & \dots \\
(3,1), & (3,2), & (3,3), & (3,4), & (3,5), & \dots \\
(4,1), & (4,2), & (4,3), & (4,4), & (4,5), & \dots \\
(5,1), & (5,2), & (5,3), & (5,4), & (5,5), & \dots \\
\vdots
\end{array}$$
(b) Conclude that the set $\mathbb{Q}^+$ of all positive rational numbers is countable.

Problem 8.10.
This problem provides a proof of the [Schröder-Bernstein] Theorem:
$$\text{If } A \text{ inj } B \text{ and } B \text{ inj } A, \text{ then } A \text{ bij } B. \tag{8.9}$$
Since $A \text{ inj } B$ and $B \text{ inj } A$, there are total injective functions $f : A \to B$ and $g : B \to A$.

Assume for simplicity that $A$ and $B$ have no elements in common.
Let’s picture the elements of $A$ arranged in a column, and likewise $B$ arranged in a second column to the right, with left-to-right arrows connecting $a$ to $f(a)$ for each $a \in A$ and likewise right-to-left arrows for $g$. Since $f$ and $g$ are total functions, there is exactly one arrow out of each element. Also, since $f$ and $g$ are injections, there is at most one arrow into any element.

So starting at any element, there is a unique and unending path of arrows going forwards (it might repeat). There is also a unique path of arrows going backwards, which might be unending, or might end at an element that has no arrow into it. These paths are completely separate: if two ran into each other, there would be two arrows into the element where they ran together.

This divides all the elements into separate paths of four kinds:
(i) paths that are infinite in both directions,
(ii) paths that are infinite going forwards starting from some element of $A$,
(iii) paths that are infinite going forwards starting from some element of $B$,
(iv) paths that are unending but finite.

(a) What do the paths of the last type (iv) look like?
(b) Show that for each type of path, either
(i) the $f$-arrows define a bijection between the $A$ and $B$ elements on the path, or
(ii) the $g$-arrows define a bijection between $B$ and $A$ elements on the path, or
(iii) both sets of arrows define bijections.
For which kinds of paths do both sets of arrows define bijections?
(c) Explain how to piece these bijections together to form a bijection between $A$ and $B$.
(d) Justify the assumption that $A$ and $B$ are disjoint.

Problem 8.11.
(a) Prove that if a nonempty set $C$ is countable, then there is a total surjective function $f : \mathbb{N} \to C$.
(b) Conversely, suppose that $\mathbb{N} \text{ surj } D$, that is, there is a not necessarily total surjective function $f : \mathbb{N} \to D$. Prove that $D$ is countable.

Problem 8.12.
(a) For each of the following sets, indicate whether it is finite, countably infinite, or uncountable.
(i) The set of even integers greater than $10^{100}$.
(ii) The set of “pure” complex numbers of the form $ri$ for nonzero real numbers $r$.
(iii) The powerset of the integer interval $[10..10^{10}]$.
(iv) The complex numbers $c$ such that $\exists m, n \in \mathbb{Z}.\ (m + nc)c = 0$.
Let $U$ be an uncountable set, $C$ be a countably infinite subset of $U$, and $D$ be a countably infinite set.
(v) $U \cup D$.
(vi) $U \cap C$.
(vii) $U - D$.
(b) Give examples of sets $A$ and $B$ such that
$$\mathbb{R} \text{ strict } A \text{ strict } B.$$
Recall that $A \text{ strict } B$ means that $A$ is not “as big as” $B$.

Homework Problems

Problem 8.13.
Prove that if $A$ is an infinite set and $B$ is a countably infinite set that has no elements in common with $A$, then
$$A \text{ bij } (A \cup B).$$
Reminder: You may assume any of the results from class or text as long as you state them explicitly.

Problem 8.14.
In this problem you will prove a fact that may surprise you, or make you even more convinced that set theory is nonsense: the half-open unit interval is actually the “same size” as the nonnegative quadrant of the real plane! [8] Namely, there is a bijection from $(0, 1]$ to $[0, \infty) \times [0, \infty)$.

(a) Describe a bijection from $(0, 1]$ to $[0, \infty)$.
Hint: $1/x$ almost works.
(b) An infinite sequence of the decimal digits $\{0, 1, \dots, 9\}$ will be called long if it does not end with all 0’s. An equivalent way to say this is that a long sequence is one that has infinitely many occurrences of nonzero digits. Let $L$ be the set of all such long sequences. Describe a bijection from $L$ to the half-open real interval $(0, 1]$.
Hint: Put a decimal point at the beginning of the sequence.
(c) Describe a surjective function from $L$ to $L^2$ that involves alternating digits from two long sequences.
Hint: The surjection need not be total.
(d) Prove the following lemma and use it to conclude that there is a bijection from $L^2$ to $(0, 1]^2$.

Lemma 8.4.1.
Let $A$ and $B$ be nonempty sets. If there is a bijection from $A$ to $B$, then there is also a bijection from $A \times A$ to $B \times B$.

(e) Conclude from the previous parts that there is a surjection from $(0, 1]$ to $(0, 1]^2$. Then appeal to the Schröder-Bernstein Theorem to show that there is actually a bijection from $(0, 1]$ to $(0, 1]^2$.
(f) Complete the proof that there is a bijection from $(0, 1]$ to $[0, \infty)^2$.

[8] The half-open unit interval $(0, 1]$ is $\{r \in \mathbb{R} \mid 0 < r \leq 1\}$. Similarly, $[0, \infty) ::= \{r \in \mathbb{R} \mid r \geq 0\}$.

Exam Problems

Problem 8.15.
(a) For each of the following sets, indicate whether it is finite, countably infinite, or uncountable.
(i) The set of even integers greater than $10^{100}$.
(ii) The set of “pure” complex numbers of the form $ri$ for nonzero real numbers $r$.
(iii) The powerset of the integer interval $[10..10^{10}]$.
(iv) The complex numbers $c$ such that $c$ is the root of a quadratic with integer coefficients, that is,
$$\exists m, n, p \in \mathbb{Z},\ m \neq 0.\ \ mc^2 + nc + p = 0.$$
Let $U$ be an uncountable set, $C$ be a countably infinite subset of $U$, and $D$ be a countably infinite set.
(v) $U \cup D$.
(vi) $U \cap C$.
(vii) $U - D$.
(b) Give an example of sets $A$ and $B$ such that
$$\mathbb{R} \text{ strict } A \text{ strict } B.$$

Problem 8.16.
Prove that if $A_0, A_1, \dots, A_n, \dots$ is an infinite sequence of countable sets, then so is
$$\bigcup_{n=0}^{\infty} A_n.$$

Problem 8.17.
Let $A$ and $B$ be countably infinite sets:
$$A = \{a_0, a_1, a_2, a_3, \dots\}$$
$$B = \{b_0, b_1, b_2, b_3, \dots\}$$
Show that their product $A \times B$ is also a countable set by showing how to list the elements of $A \times B$. You need only show enough of the initial terms in your sequence to make the pattern clear; a half dozen or so terms usually suffice.

Problem 8.18.
Let $\{0, 1\}^*$ be the set of finite binary sequences, $\{0, 1\}^\omega$ be the set of infinite binary sequences, and $F$ be the set of sequences in $\{0, 1\}^\omega$ that contain only a finite number of occurrences of 1’s.
(a) Describe a simple surjective function from $\{0, 1\}^*$ to $F$.
(b) The set $\overline{F} ::= \{0, 1\}^\omega - F$ consists of all the infinite binary sequences with infinitely many 1’s. Use the previous problem part to prove that $\overline{F}$ is uncountable.
Hint: We know that $\{0, 1\}^*$ is countable and $\{0, 1\}^\omega$ is not.

Problem 8.19.
Let $\{0, 1\}^\omega$ be the set of infinite binary strings, and let $B \subset \{0, 1\}^\omega$ be the set of infinite binary strings containing infinitely many occurrences of 1’s. Prove that $B$ is uncountable. (We have already shown that $\{0, 1\}^\omega$ is uncountable.)
Hint: Define a suitable function from $\{0, 1\}^\omega$ to $B$.

Problem 8.20.
A real number is called quadratic when it is a root of a degree two polynomial with integer coefficients. Explain why there are only countably many quadratic reals.

Problem 8.21.
Describe which of the following sets have bijections between them:
$\mathbb{Z}$ (integers),
$\mathbb{R}$ (real numbers),
$\mathbb{C}$ (complex numbers),
$\mathbb{Q}$ (rational numbers),
$\text{pow}(\mathbb{Z})$ (all subsets of integers),
$\text{pow}(\emptyset)$,
$\text{pow}(\text{pow}(\emptyset))$,
$\{0, 1\}^*$ (finite binary sequences),
$\{0, 1\}^\omega$ (infinite binary sequences),
$\{\mathbf{T}, \mathbf{F}\}$ (truth values),
$\text{pow}(\{\mathbf{T}, \mathbf{F}\})$,
$\text{pow}(\{0, 1\}^\omega)$.

Problem 8.22.
Prove that the set $(\mathbb{Z}^+)^*$ of all finite sequences of positive integers is countable.
Hint: If $s \in (\mathbb{Z}^+)^*$, let sum($s$) be the sum of the successive integers in $s$.

Problems for Section 8.2

Class Problems

Problem 8.23.
Let $\mathbb{N}^\omega$ be the set of infinite sequences of nonnegative integers. For example, some sequences of this kind are:
$$(0, 1, 2, 3, 4, \dots),$$
$$(2, 3, 5, 7, 11, \dots),$$
$$(3, 1, 4, 5, 9, \dots).$$
Prove that this set of sequences is uncountable.

Problem 8.24.
There are lots of different sizes of infinite sets. For example, starting with the infinite set $\mathbb{N}$ of nonnegative integers, we can build the infinite sequence of sets
$$\mathbb{N} \text{ strict } \text{pow}(\mathbb{N}) \text{ strict } \text{pow}(\text{pow}(\mathbb{N})) \text{ strict } \text{pow}(\text{pow}(\text{pow}(\mathbb{N}))) \text{ strict } \cdots$$
where each set is “strictly smaller” than the next one by Theorem 8.1.12.
Let $\text{pow}^n(\mathbb{N})$ be the $n$th set in the sequence, and
$$U ::= \bigcup_{n=0}^{\infty} \text{pow}^n(\mathbb{N}).$$
(a) Prove that
$$U \text{ surj } \text{pow}^n(\mathbb{N}), \tag{8.10}$$
for all $n > 0$.
(b) Prove that
$$\text{pow}^n(\mathbb{N}) \text{ strict } U$$
for all $n \in \mathbb{N}$.

Now of course, we could take $U, \text{pow}(U), \text{pow}(\text{pow}(U)), \dots$ and keep on in this way building still bigger infinities indefinitely.

Homework Problems

Problem 8.25.
For any sets $A$ and $B$, let $[A \to B]$ be the set of total functions from $A$ to $B$. Prove that if $A$ is not empty and $B$ has more than one element, then $\text{NOT}(A \text{ surj } [A \to B])$.
Hint: Suppose that $\sigma$ is a function from $A$ to $[A \to B]$ mapping each element $a \in A$ to a function $\sigma_a : A \to B$. Pick any two elements of $B$; call them 0 and 1. Then define
$$\text{diag}(a) ::= \begin{cases} 0 & \text{if } \sigma_a(a) = 1, \\ 1 & \text{otherwise.} \end{cases}$$

Problem 8.26.
String procedures are one-argument procedures that apply to strings over the ASCII alphabet. If application of procedure $P$ to string $s$ results in a computation that eventually halts, we say that $P$ recognizes $s$. We define lang($P$) to be the set of strings, or language, recognized by $P$:
$$\text{lang}(P) ::= \{s \in \text{ASCII}^* \mid P \text{ recognizes } s\}.$$
A language is unrecognizable when it is not equal to lang($P$) for any procedure $P$.

A string procedure declaration is a text $s \in \text{ASCII}^*$ that conforms to the grammatical rules for programs. The declaration defines a procedure $P_s$, which we can think of as the result of compiling $s$ into an executable object. If $s \in \text{ASCII}^*$ is not a grammatically well-formed procedure declaration, we arbitrarily define $P_s$ to be the string procedure that fails to halt when applied to any string. Now every string defines a string procedure, and every string procedure is $P_s$ for some $s \in \text{ASCII}^*$.

An easy diagonal argument in Section 8.2 showed that
$$\text{No-halt} ::= \{s \mid P_s \text{ applied to } s \text{ does not halt}\} = \{s \mid s \notin \text{lang}(P_s)\}$$
is not recognizable.

It may seem pretty weird to apply a procedure to its own declaration. Are there any less weird examples of unrecognizable sets?
The answer is “many more.” In this problem, we’ll show three more:
$$\text{No-halt-}\lambda ::= \{s \mid P_s \text{ applied to } \lambda \text{ does not halt}\} = \{s \mid \lambda \notin \text{lang}(P_s)\},$$
$$\text{Finite-halt} ::= \{s \mid \text{lang}(P_s) \text{ is finite}\},$$
$$\text{Always-halt} ::= \{s \mid \text{lang}(P_s) = \text{ASCII}^*\}.$$

Let’s begin by showing how we could use a recognizer for No-halt-$\lambda$ to define a recognizer for No-halt. That is, we will “reduce” the weird problem of recognizing No-halt to the more understandable problem of recognizing No-halt-$\lambda$. Since there is no recognizer for No-halt, it follows that there can’t be one for No-halt-$\lambda$ either.

Here’s how this reduction would work: suppose we want to recognize when a given string $s$ is in No-halt. Revise $s$ to be the declaration of a slightly modified procedure $P_{s'}$ which behaves as follows:

$P_{s'}$ applied to argument $t \in \text{ASCII}^*$ ignores $t$, and simulates $P_s$ applied to $s$.

So, if $P_s$ applied to $s$ halts, then $P_{s'}$ halts on every string it is applied to, and if $P_s$ applied to $s$ does not halt, then $P_{s'}$ does not halt on any string it is applied to. That is,
$$s \in \text{No-halt} \text{ IMPLIES } \text{lang}(P_{s'}) = \emptyset \text{ IMPLIES } \lambda \notin \text{lang}(P_{s'}) \text{ IMPLIES } s' \in \text{No-halt-}\lambda,$$
$$s \notin \text{No-halt} \text{ IMPLIES } \text{lang}(P_{s'}) = \text{ASCII}^* \text{ IMPLIES } \lambda \in \text{lang}(P_{s'}) \text{ IMPLIES } s' \notin \text{No-halt-}\lambda.$$
In short,
$$s \in \text{No-halt} \text{ IFF } s' \in \text{No-halt-}\lambda.$$
So to recognize when $s \in$ No-halt all you need to do is recognize when $s' \in$ No-halt-$\lambda$. As already noted above (but we know that remark got by several students, so we’re repeating the explanation), this means that if No-halt-$\lambda$ were recognizable, then No-halt would be as well. Since we know that No-halt is unrecognizable, No-halt-$\lambda$ must also be unrecognizable, as claimed.

(a) Conclude that Finite-halt is unrecognizable.
Hint: Same $s'$.

Next, let’s see how a reduction of No-halt to Always-halt would work. Suppose we want to recognize when a given string $s$ is in No-halt.
Revise $s$ to be the declaration of a slightly modified procedure $P_{s''}$ which behaves as follows:

When $P_{s''}$ is applied to argument $t \in \text{ASCII}^*$, it simulates $P_s$ applied to $s$ for up to $|t|$ “steps” (executions of individual machine instructions). If $P_s$ applied to $s$ has not halted in $|t|$ steps, then the application of $P_{s''}$ to $t$ halts. If $P_s$ applied to $s$ has halted within $|t|$ steps, then the application of $P_{s''}$ to $t$ runs forever.

(b) Conclude that Always-halt is unrecognizable.
Hint: Explain why
$$s \in \text{No-halt} \text{ IFF } s'' \in \text{Always-halt}.$$
(c) Explain why $\overline{\text{Finite-halt}}$ is unrecognizable.
Hint: Same $s''$.

Note that it’s easy to recognize when $P_s$ does halt on $s$: just simulate the application of $P_s$ to $s$ until it halts. This shows that $\overline{\text{No-halt}}$ is recognizable. We’ve just concluded that Finite-halt is nastier: neither it nor its complement is recognizable.

Problem 8.27.
There is a famous paradox about describing numbers which goes as follows:

There are only so many possible definitions of nonnegative integers that can be written out in English using no more than 161 characters from the Roman alphabet, punctuation symbols, and spaces. So there have to be an infinite number of nonnegative integers that don’t have such short definitions. By the Well Ordering Principle, there must be a least nonnegative integer $n$ that has no such short definition. But wait a minute,

“The least nonnegative integer that cannot be defined in English using at most 161 characters from the Roman alphabet, punctuation symbols, and spaces.”

is a definition of $n$ that uses 161 characters (count ’em). So $n$ can’t exist, and the Well Ordering Principle is unsound!

Now this “paradox” doesn’t stand up to reason because it rests on the decidedly murky concept of a “definition in English.” As usual, when you don’t know what you’re talking about, reaching contradictory conclusions is to be expected.
But we can extract from this paradox a well-defined and interesting theorem about definability in predicate logic. The method we use is essentially the same as the one used to prove Cantor’s Theorem 8.1.12, and it leads to many other important results about the logical limits of mathematics and computer science. In particular, we’ll present a simple and precise description of a set of binary strings that can’t be described by ordinary logical formulas. In other words, we will give a precise description of an undescribable set of strings, which sounds paradoxical, but won’t be when we look at it more closely.

Let’s start by illustrating how a logical formula can describe the set of binary strings that do not contain a 1:
$$\text{NOT}[\exists y.\, \exists z.\, s = y1z]. \tag{no-1s}$$
So the strings $s$ described by formula (no-1s) are exactly the strings consisting solely of 0’s.

Formula (no-1s) is an example of a “string formula” of the kind we will use to describe properties of binary strings. More precisely, an atomic string formula is a formula, like “$s = y1z$” above, that is of the general form
$$\text{“}xy \dots z = uv \dots w\text{”}$$
where $x, y, \dots, z, u, v, \dots, w$ may be the constants 0, 1, or may be variables ranging over the set, $\{0, 1\}^*$, of finite binary strings. A string formula in general is one like (no-1s), built up from atomic formulas using quantifiers and propositional connectives.

When $G(s)$ is a string formula, we’ll use the notation desc($G$) for the set of binary strings $s$ that satisfy $G$. That is,
$$\text{desc}(G) ::= \{s \in \{0, 1\}^* \mid G(s)\}.$$
A set of binary strings is describable if it equals desc($G$) for some string formula $G$. For example, the set $0^*$ of finite strings of 0’s is describable because
$$\text{desc}(\text{(no-1s)}) = 0^*.$$

The next important idea comes from the observation that a string formula itself is a syntactic object, consisting of a string of characters over some standard character alphabet.
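Returning briefly to formula (no-1s): the claim that it describes exactly the strings of 0’s can be spot-checked by brute force. The following Python sketch is only an illustration under stated assumptions. The helper names `binary_strings` and `no_1s` and the bound `MAX_LEN` are ours, and the check covers only strings up to that length rather than all of $\{0, 1\}^*$.

```python
from itertools import product

def binary_strings(max_len):
    """Yield every binary string of length 0..max_len."""
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            yield "".join(bits)

def no_1s(s):
    """NOT[exists y. exists z. s = y1z].

    Any witnesses y and z must be no longer than s, so searching the
    strings of length at most len(s) is enough to decide the quantifiers.
    """
    candidates = list(binary_strings(len(s)))
    return not any(s == y + "1" + z for y in candidates for z in candidates)

MAX_LEN = 4  # arbitrary small bound chosen for this illustration
described = {s for s in binary_strings(MAX_LEN) if no_1s(s)}
all_zeros = {"0" * n for n in range(MAX_LEN + 1)}
print(described == all_zeros)  # True
```

Bounding the search for $y$ and $z$ by the length of $s$ is what turns the quantifiers, which range over infinitely many strings, into a finite check.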
Now coding characters of an alphabet into binary strings is a familiar idea. For example, the characters of the ASCII alphabet have a standard coding into the length eight binary strings. Once its individual characters are coded into binary, a complete string formula can be coded into a binary string by concatenating the binary codes of its consecutive characters, a very familiar idea to a computer scientist.

Now suppose $x$ is a binary string that codes some formula $G_x$. The details of how we extract $G_x$ from its code $x$ don’t matter much; we only require that there is some procedure to actually display the string formula $G_x$ given its code $x$.

It’s technically convenient to treat every string as the code of a string formula, so if $x$ is not a binary string we would get from a string formula, we’ll arbitrarily define $G_x$ to be the formula (no-1s).

Now we have just the kind of situation where a Cantor-style diagonal argument can be applied, namely, we’ll ask whether a string describes a property of itself! That may sound like a mind-bender, but all we’re asking is whether $G_x(x)$ is true, or equivalently whether
$$x \in \text{desc}(G_x).$$
For example, using character-by-character translations of formulas into binary, neither the string 0000 nor the string 10 would be the binary representation of a formula, so our convention implies that
$$G_{0000} = G_{10} ::= \text{formula (no-1s)}.$$
So
$$\text{desc}(G_{0000}) = \text{desc}(G_{10}) = 0^*.$$
This means that
$$0000 \in \text{desc}(G_{0000}) \quad \text{and} \quad 10 \notin \text{desc}(G_{10}).$$

Now we are in a position to give a precise mathematical description of an “undescribable” set of binary strings, namely:

Theorem. Define
$$U ::= \{x \in \{0, 1\}^* \mid x \notin \text{desc}(G_x)\}. \tag{8.11}$$
The set $U$ is not describable.

Use reasoning similar to Cantor’s Theorem 8.1.12 to prove this Theorem.
Hint: Suppose $U = \text{desc}(G_{x_U})$.

Exam Problems

Problem 8.28.
Let $\{1, 2, 3\}^\omega$ be the set of infinite sequences containing only the numbers 1, 2, and 3.
For example, some sequences of this kind are:
$$(1, 1, 1, 1, \dots),$$
$$(2, 2, 2, 2, \dots),$$
$$(3, 2, 1, 3, \dots).$$
Prove that $\{1, 2, 3\}^\omega$ is uncountable.
Hint: One approach is to define a surjective function from $\{1, 2, 3\}^\omega$ to the power set $\text{pow}(\mathbb{N})$.

Problems for Section 8.3

Class Problems

Problem 8.29.
Forming a pair $(a, b)$ of items $a$ and $b$ is a mathematical operation that we can safely take for granted. But when we’re trying to show how all of mathematics can be reduced to set theory, we need a way to represent the pair $(a, b)$ as a set.

(a) Explain why representing $(a, b)$ by $\{a, b\}$ won’t work.
(b) Explain why representing $(a, b)$ by $\{a, \{b\}\}$ won’t work either.
Hint: What pair does $\{\{1\}, \{2\}\}$ represent?
(c) Define
$$\text{pair}(a, b) ::= \{a, \{a, b\}\}.$$
Explain why representing $(a, b)$ as pair($a, b$) uniquely determines $a$ and $b$.
Hint: Sets can’t be indirect members of themselves: $a \in a$ never holds for any set $a$, and neither can $a \in b \in a$ hold for any $b$.

Problem 8.30.
The axiom of choice says that if $s$ is a set whose members are nonempty sets that are pairwise disjoint (that is, no two sets in $s$ have an element in common), then there is a set $c$ consisting of exactly one element from each set in $s$.

In formal logic, we could describe $s$ with the formula
$$\text{pairwise-disjoint}(s) ::= \forall x \in s.\, x \neq \emptyset \text{ AND } \forall x, y \in s.\, x \neq y \text{ IMPLIES } x \cap y = \emptyset.$$
Similarly we could describe $c$ with the formula
$$\text{choice-set}(c, s) ::= \forall x \in s.\, \exists! z.\, z \in c \cap x.$$
Here “$\exists!\, z.$” is fairly standard notation for “there exists a unique $z$.”

Now we can give the formal definition:

Definition (Axiom of Choice).
$$\forall s.\, \text{pairwise-disjoint}(s) \text{ IMPLIES } \exists c.\, \text{choice-set}(c, s).$$
The only issue here is that set theory is technically supposed to be expressed in terms of pure formulas in the language of sets, which means formulas that use only the membership relation $\in$, propositional connectives, the two quantifiers $\forall$ and $\exists$, and variables ranging over all sets. Verify that the axiom of choice can be expressed as a pure formula, by explaining how to replace all impure subformulas above with equivalent pure formulas.

For example, the formula $x = y$ could be replaced with the pure formula $\forall z.\ z \in x \text{ IFF } z \in y$.

Problem 8.31. Let $R : A \to A$ be a binary relation on a set $A$. If $a_1 \mathrel{R} a_0$, we'll say that $a_1$ is "$R$-smaller" than $a_0$. $R$ is called well founded when there is no infinite "$R$-decreasing" sequence:

$$\cdots \mathrel{R} a_n \mathrel{R} \cdots \mathrel{R} a_1 \mathrel{R} a_0, \tag{8.12}$$

of elements $a_i \in A$.

For example, if $A = \mathbb{N}$ and $R$ is the $<$-relation, then $R$ is well founded because if you keep counting down with nonnegative integers, you eventually get stuck at zero:

$$0 < \cdots < n-1 < n.$$

But you can keep counting up forever, so the $>$-relation is not well founded:

$$\cdots > n > \cdots > 1 > 0.$$

Also, the $\le$-relation on $\mathbb{N}$ is not well founded because a constant sequence of, say, 2's, gets $\le$-smaller forever:

$$\cdots \le 2 \le \cdots \le 2 \le 2.$$

(a) If $B$ is a subset of $A$, an element $b \in B$ is defined to be $R$-minimal in $B$ iff there is no $R$-smaller element in $B$. Prove that $R : A \to A$ is well founded iff every nonempty subset of $A$ has an $R$-minimal element.

A logic formula of set theory has only predicates of the form "$x \in y$" for variables $x, y$ ranging over sets, along with quantifiers and propositional operations. For example,

$$\mathrm{isempty}(x) ::= \forall w.\ \text{NOT}(w \in x)$$

is a formula of set theory that means that "$x$ is empty."

(b) Write a formula $\text{member-minimal}(u, v)$ of set theory that means that $u$ is $\in$-minimal in $v$.

(c) The Foundation axiom of set theory says that $\in$ is a well founded relation on sets. Express the Foundation axiom as a formula of set theory.
You may use "member-minimal" and "isempty" in your formula as abbreviations for the formulas defined above.

(d) Explain why the Foundation axiom implies that no set is a member of itself.

Homework Problems

Problem 8.32. In writing formulas, it is OK to use abbreviations introduced earlier (so it is now legal to use "$=$" because we just defined it).

(a) Explain how to write a formula, $\mathrm{Subset}_n(x, y_1, y_2, \ldots, y_n)$, of set theory[9] that means $x \subseteq \{y_1, y_2, \ldots, y_n\}$.

(b) Now use the formula $\mathrm{Subset}_n$ to write a formula, $\mathrm{Atmost}_n(x)$, of set theory that means that $x$ has at most $n$ elements.

(c) Explain how to write a formula $\mathrm{Exactly}_n$ of set theory that means that $x$ has exactly $n$ elements. Your formula should only be about twice the length of the formula $\mathrm{Atmost}_n$.

(d) The direct way to write a formula $D_n(y_1, \ldots, y_n)$ of set theory that means that $y_1, \ldots, y_n$ are distinct elements is to write an AND of subformulas "$y_i \neq y_j$" for $1 \le i < j \le n$. Since there are $n(n-1)/2$ such subformulas, this approach leads to a formula $D_n$ whose length grows proportional to $n^2$. Describe how to write such a formula $D_n(y_1, \ldots, y_n)$ whose length only grows proportional to $n$.

Hint: Use $\mathrm{Subset}_n$ and $\mathrm{Exactly}_n$.

Exam Problems

Problem 8.33.

(a) Explain how to write a formula $\mathrm{Members}(p, a, b)$ of set theory[10] that means $p = \{a, b\}$.

Hint: Say that everything in $p$ is either $a$ or $b$. It's OK to use subformulas of the form "$x = y$," since we can regard "$x = y$" as an abbreviation for a genuine set theory formula.

A pair $(a,b)$ is simply a sequence of length two whose first item is $a$ and whose second is $b$. Sequences are a basic mathematical data type we take for granted, but when we're trying to show how all of mathematics can be reduced to set theory, we need a way to represent the ordered pair $(a,b)$ as a set.

[9] See Section 8.3.2.
[10] See Section 8.3.2.
One way that will work[11] is to represent $(a,b)$ as

$$\mathrm{pair}(a,b) ::= \{a, \{a,b\}\}.$$

(b) Explain how to write a formula $\mathrm{Pair}(p, a, b)$ of set theory[12] that means $p = \mathrm{pair}(a,b)$.

Hint: Now it's OK to use subformulas of the form "$\mathrm{Members}(p, a, b)$."

(c) Explain how to write a formula $\mathrm{Second}(p, b)$ of set theory that means $p$ is a pair whose second item is $b$.

[11] Some similar ways that don't work are described in Problem 8.29.
[12] See Section 8.3.2.

Problems for Section 8.4

Homework Problems

Problem 8.34. In this problem, structural induction and the Foundation Axiom of set theory provide simple proofs about some utterly infinite objects.

Definition. The class of "recursive set-like" objects, Recs, is defined recursively as follows:

Base case: The empty set $\emptyset$ is a Recs.

Constructor step: If $S$ is a nonempty set of Recs's, then $S$ is a Recs.

(a) Prove that Recs satisfies the Foundation Axiom: there is no infinite sequence of Recs, $r_0, r_1, \ldots, r_{n-1}, r_n, \ldots$ such that

$$\cdots r_n \in r_{n-1} \in \cdots r_1 \in r_0. \tag{8.13}$$

Hint: Structural induction.

(b) Prove that every pure set is a Recs.[13]

Hint: Use the Foundation axiom.

[13] A "pure" set is empty or is a set whose elements are all pure sets.

(c) Every Recs $R$ defines a special kind of two-person game of perfect information called a uniform game. The initial "board position" of the game is $R$ itself. A player's move consists of choosing any member of $R$. The two players alternate moves, with the player whose turn it is to move called the Next player. The Next player's move determines a game in which the other player, called the Previous player, moves first.

The game is called "uniform" because the two players have the same objective: to leave the other player stuck with no move to make. That is, whoever moves to the empty set is a winner, because then the next player has no move.
Prove that in every uniform game, either the Previous player or the Next player has a winning strategy.

Problem 8.35. For any set $x$, define $\mathrm{next}(x)$ to be the set consisting of all the elements of $x$, along with $x$ itself:

$$\mathrm{next}(x) ::= x \cup \{x\}.$$

So by definition,

$$x \in \mathrm{next}(x) \quad \text{and} \quad x \subseteq \mathrm{next}(x). \tag{8.14}$$

Now we give a recursive definition of a collection Ord of sets called ordinals that provide a way to count infinite sets. Namely,

Definition.

$$\emptyset \in \mathrm{Ord},$$
$$\text{if } \nu \in \mathrm{Ord}, \text{ then } \mathrm{next}(\nu) \in \mathrm{Ord},$$
$$\text{if } S \subseteq \mathrm{Ord}, \text{ then } \bigcup_{\nu \in S} \nu \in \mathrm{Ord}.$$

There is a method for proving things about ordinals that follows directly from the way they are defined. Namely, let $P(x)$ be some property of sets. The Ordinal Induction Rule says that to prove that $P(\nu)$ is true for all ordinals $\nu$, you need only show two things:

If $P$ holds for all the members of $\mathrm{next}(x)$, then it holds for $\mathrm{next}(x)$, and

if $P$ holds for all members of some set $S$, then it holds for their union.

That is:

Rule. Ordinal Induction

$$\frac{\forall x.\ (\forall y \in \mathrm{next}(x).\ P(y)) \ \text{ IMPLIES } \ P(\mathrm{next}(x)), \qquad \forall S.\ (\forall x \in S.\ P(x)) \ \text{ IMPLIES } \ P\Bigl(\bigcup_{x \in S} x\Bigr)}{\forall \nu \in \mathrm{Ord}.\ P(\nu)}$$

The intuitive justification for the Ordinal Induction Rule is similar to the justification for strong induction. We will accept the soundness of the Ordinal Induction Rule as a basic axiom.

(a) A set $x$ is closed under membership if every element of $x$ is also a subset of $x$, that is

$$\forall y \in x.\ y \subseteq x.$$

Prove that every ordinal $\nu$ is closed under membership.

(b) A sequence

$$\cdots \in \nu_{n+1} \in \nu_n \in \cdots \in \nu_1 \in \nu_0 \tag{8.15}$$

of ordinals $\nu_i$ is called a member-decreasing sequence starting at $\nu_0$. Use Ordinal Induction to prove that no ordinal starts an infinite member-decreasing sequence.[14]

[14] Do not assume the Foundation Axiom of ZFC (Section 8.3.2), which says that there isn't any set that starts an infinite member-decreasing sequence. Even in versions of set theory in which the Foundation Axiom does not hold, there cannot be any infinite member-decreasing sequence of ordinals.
II Structures

Introduction

The properties of the set of integers are the subject of Number Theory. This part of the text starts with a chapter on this topic because the integers are a very familiar mathematical structure that has lots of easy-to-state and interesting-to-prove properties. This makes Number Theory a good place to start serious practice with the methods of proof outlined in Part I. Moreover, Number Theory has turned out to have multiple applications in computer science. For example, most modern data encryption methods are based on Number Theory.

We study numbers as a "structure" that has multiple parts of different kinds. One part is, of course, the set of all the integers. A second part is the collection of basic integer operations: addition, multiplication, exponentiation, and so on. Other parts are the important subsets of integers, like the prime numbers, out of which all integers can be built using multiplication.

Structured objects more generally are fundamental in computer science. Whether you are writing code, solving an optimization problem, or designing a network, you will be dealing with structures.

Graphs, also known as networks, are a fundamental structure in computer science. Graphs can model associations between pairs of objects; for example, two exams that cannot be given at the same time, two people that like each other, or two subroutines that can be run independently. In Chapter 10, we study directed graphs which model one-way relationships such as being bigger than, loving (sadly, it's often not mutual), and being a prerequisite for. A highlight is the special case of acyclic digraphs (DAGs) that correspond to a class of relations called partial orders. Partial orders arise frequently in the study of scheduling and concurrency.
Digraphs as models for data communication and routing problems are the topic of Chapter 11.

In Chapter 12 we focus on simple graphs that represent mutual or symmetric relationships, such as being in conflict, being compatible, being independent, being capable of running in parallel. Planar Graphs (simple graphs that can be drawn in the plane) are examined in Chapter 13, the final chapter of Part II. The impossibility of placing 50 geocentric satellites in orbit so that they uniformly blanket the globe will be one of the conclusions reached in this chapter.

9 Number Theory

Number theory is the study of the integers. Why anyone would want to study the integers may not be obvious. First of all, what's to know? There's 0, there's 1, 2, 3, and so on, and, oh yeah, $-1$, $-2$, .... Which one don't you understand? What practical value is there in it?

The mathematician G. H. Hardy delighted in its impracticality. He wrote:

    [Number theorists] may be justified in rejoicing that there is one science, at any rate, and that their own, whose very remoteness from ordinary human activities should keep it gentle and clean.

Hardy was especially concerned that number theory not be used in warfare; he was a pacifist. You may applaud his sentiments, but he got it wrong: number theory underlies modern cryptography, which is what makes secure online communication possible. Secure communication is of course crucial in war, leaving poor Hardy spinning in his grave. It's also central to online commerce. Every time you buy a book from Amazon, use a certificate to access a web page, or use a PayPal account, you are relying on number theoretic algorithms.

Number theory also provides an excellent environment for us to practice and apply the proof techniques that we developed in previous chapters.
We'll work out properties of greatest common divisors (gcd's) and use them to prove that integers factor uniquely into primes. Then we'll introduce modular arithmetic and work out enough of its properties to explain the RSA public key crypto-system.

Since we'll be focusing on properties of the integers, we'll adopt the default convention in this chapter that variables range over the set $\mathbb{Z}$ of integers.

9.1 Divisibility

The nature of number theory emerges as soon as we consider the divides relation.

Definition 9.1.1. $a$ divides $b$ (notation $a \mid b$) iff there is an integer $k$ such that

$$ak = b.$$

The divides relation comes up so frequently that multiple synonyms for it are used all the time. The following phrases all say the same thing:

$a \mid b$,
$a$ divides $b$,
$a$ is a divisor of $b$,
$a$ is a factor of $b$,
$b$ is divisible by $a$,
$b$ is a multiple of $a$.

Some immediate consequences of Definition 9.1.1 are that for all $n$

$$n \mid 0, \qquad n \mid n, \qquad \text{and} \qquad \pm 1 \mid n.$$

Also,

$$0 \mid n \ \text{ IMPLIES } \ n = 0.$$

Dividing seems simple enough, but let's play with this definition. The Pythagoreans, an ancient sect of mathematical mystics, said that a number is perfect if it equals the sum of its positive integral divisors, excluding itself. For example, $6 = 1 + 2 + 3$ and $28 = 1 + 2 + 4 + 7 + 14$ are perfect numbers. On the other hand, 10 is not perfect because $1 + 2 + 5 = 8$, and 12 is not perfect because $1 + 2 + 3 + 4 + 6 = 16$. Euclid characterized all the even perfect numbers around 300 BC (Problem 9.2). But is there an odd perfect number? More than two thousand years later, we still don't know! All numbers up to about $10^{300}$ have been ruled out, but no one has proved that there isn't an odd perfect number waiting just over the horizon.

So a half-page into number theory, we've strayed past the outer limits of human knowledge. This is pretty typical; number theory is full of questions that are easy to pose, but incredibly difficult to answer.
We'll mention a few more such questions in later sections.[1]

[1] Don't Panic: we're going to stick to some relatively benign parts of number theory. These super-hard unsolved problems rarely get put on problem sets.

9.1.1 Facts about Divisibility

The following lemma collects some basic facts about divisibility.

Lemma 9.1.2.

1. If $a \mid b$ and $b \mid c$, then $a \mid c$.
2. If $a \mid b$ and $a \mid c$, then $a \mid sb + tc$ for all $s$ and $t$.
3. For all $c \neq 0$, $a \mid b$ if and only if $ca \mid cb$.

Proof. These facts all follow directly from Definition 9.1.1. To illustrate this, we'll prove just part 2:

Given that $a \mid b$, there is some $k_1 \in \mathbb{Z}$ such that $ak_1 = b$. Likewise, there is some $k_2 \in \mathbb{Z}$ such that $ak_2 = c$, so

$$sb + tc = s(k_1 a) + t(k_2 a) = (sk_1 + tk_2)a.$$

Therefore $sb + tc = k_3 a$ where $k_3 ::= (sk_1 + tk_2)$, which means that

$$a \mid sb + tc.$$

A number of the form $sb + tc$ is called an integer linear combination of $b$ and $c$, or, since in this chapter we're only talking about integers, just a linear combination. So Lemma 9.1.2.2 can be rephrased as

    If $a$ divides $b$ and $c$, then $a$ divides every linear combination of $b$ and $c$.

We'll be making good use of linear combinations, so let's get the general definition on record:

Definition 9.1.3. An integer $n$ is a linear combination of numbers $b_0, \ldots, b_k$ iff

$$n = s_0 b_0 + s_1 b_1 + \cdots + s_k b_k$$

for some integers $s_0, \ldots, s_k$.

9.1.2 When Divisibility Goes Bad

As you learned in elementary school, if one number does not evenly divide another, you get a "quotient" and a "remainder" left over. More precisely:

Theorem 9.1.4. [Division Theorem][2] Let $n$ and $d$ be integers such that $d \neq 0$. Then there exists a unique pair of integers $q$ and $r$, such that

$$n = q \cdot d + r \ \text{ AND } \ 0 \le r < |d|. \tag{9.1}$$

[2] This theorem is often called the "Division Algorithm," but we prefer to call it a theorem since it does not actually describe a division procedure for computing the quotient and remainder.
The number $q$ is called the quotient and the number $r$ is called the remainder of $n$ divided by $d$. We use the notation $\mathrm{qcnt}(n, d)$ for the quotient and $\mathrm{rem}(n, d)$ for the remainder.

The absolute value notation $|d|$ used above is probably familiar from introductory calculus, but for the record, let's define it.

Definition 9.1.5. For any real number $r$, the absolute value $|r|$ of $r$ is:[3]

$$|r| ::= \begin{cases} r & \text{if } r \ge 0,\\ -r & \text{if } r < 0. \end{cases}$$

[3] The absolute value of $r$ could be defined as $\sqrt{r^2}$, which works because of the convention that square root notation always refers to the nonnegative square root (see Problem 1.3). Absolute value generalizes to complex numbers where it is called the norm. For $a, b \in \mathbb{R}$,

$$|a + bi| ::= \sqrt{a^2 + b^2}.$$

So by definition, the remainder $\mathrm{rem}(n, d)$ is nonnegative regardless of the sign of $n$ and $d$. For example, $\mathrm{rem}(-11, 7) = 3$, since $-11 = (-2) \cdot 7 + 3$.

"Remainder" operations built into many programming languages can be a source of confusion. For example, the expression "32 % 5" will be familiar to programmers in Java, C, and C++; it evaluates to $\mathrm{rem}(32, 5) = 2$ in all three languages. On the other hand, these and other languages are inconsistent in how they treat remainders like "32 % -5" or "-32 % 5" that involve negative numbers. So don't be distracted by your familiar programming language's behavior on remainders, and stick to the mathematical convention that remainders are nonnegative.

The remainder on division by $d$ by definition is a number in the (integer) interval from 0 to $|d| - 1$. Such integer intervals come up so often that it is useful to have a simple notation for them. For $k \le n \in \mathbb{Z}$,

$$\begin{aligned}
(k..n) &::= \{i \mid k < i < n\},\\
(k..n] &::= (k..n) \cup \{n\},\\
[k..n) &::= \{k\} \cup (k..n),\\
[k..n] &::= \{k\} \cup (k..n) \cup \{n\} = \{i \mid k \le i \le n\}.
\end{aligned}$$

9.1.3 Die Hard

Die Hard 3 is just a B-grade action movie, but we think it has an inner message: everyone should learn at least a little number theory. In Section 6.2.3, we formalized a state machine for the Die Hard jug-filling problem using 3 and 5 gallon jugs, and also with 3 and 9 gallon jugs, and came to different conclusions about bomb explosions. What's going on in general? For example, how about getting 4 gallons from 12- and 18-gallon jugs, getting 32 gallons with 899- and 1147-gallon jugs, or getting 3 gallons into a jug using just 21- and 26-gallon jugs?

It would be nice if we could solve all these silly water jug questions at once. This is where number theory comes in handy.

A Water Jug Invariant

Suppose that we have water jugs with capacities $a$ and $b$ with $b \ge a$. Let's carry out some sample operations of the state machine and see what happens, assuming the $b$-jug is big enough:

$$\begin{array}{lll}
(0, 0) & \to (a, 0) & \text{fill first jug}\\
 & \to (0, a) & \text{pour first into second}\\
 & \to (a, a) & \text{fill first jug}\\
 & \to (2a - b, b) & \text{pour first into second (assuming } 2a \ge b\text{)}\\
 & \to (2a - b, 0) & \text{empty second jug}\\
 & \to (0, 2a - b) & \text{pour first into second}\\
 & \to (a, 2a - b) & \text{fill first}\\
 & \to (3a - 2b, b) & \text{pour first into second (assuming } 3a \ge 2b\text{)}
\end{array}$$

What leaps out is that at every step, the amount of water in each jug is a linear combination of $a$ and $b$. This is easy to prove by induction on the number of transitions:

Lemma 9.1.6 (Water Jugs). In the Die Hard state machine of Section 6.2.3 with jugs of sizes $a$ and $b$, the amount of water in each jug is always a linear combination of $a$ and $b$.

Proof. The induction hypothesis $P(n)$ is the proposition that after $n$ transitions, the amount of water in each jug is a linear combination of $a$ and $b$.

Base case ($n = 0$): $P(0)$ is true, because both jugs are initially empty, and $0 \cdot a + 0 \cdot b = 0$.

Inductive step: Suppose the machine is in state $(x, y)$ after $n$ steps, that is, the little jug contains $x$ gallons and the big one contains $y$ gallons. There are two cases:

If we fill a jug from the fountain or empty a jug into the fountain, then that jug is empty or full.
The amount in the other jug remains a linear combination of $a$ and $b$. So $P(n + 1)$ holds.

Otherwise, we pour water from one jug to another until one is empty or the other is full. By our assumption, the amounts $x$ and $y$ in the two jugs are linear combinations of $a$ and $b$ before we begin pouring. After pouring, one jug is either empty (contains 0 gallons) or full (contains $a$ or $b$ gallons). Thus, the other jug contains either $x + y$, $x + y - a$ or $x + y - b$ gallons, all of which are linear combinations of $a$ and $b$ since $x$ and $y$ are. So $P(n + 1)$ holds in this case as well.

Since $P(n + 1)$ holds in any case, this proves the inductive step, completing the proof by induction.

So we have established that the jug problem has a preserved invariant, namely, the amount of water in every jug is a linear combination of the capacities of the jugs. Lemma 9.1.6 has an important corollary:

Corollary. In trying to get 4 gallons from 12- and 18-gallon jugs, and likewise to get 32 gallons from 899- and 1147-gallon jugs, Bruce will die!

Proof. By the Water Jugs Lemma 9.1.6, with 12- and 18-gallon jugs, the amount in any jug is a linear combination of 12 and 18. This is always a multiple of 6 by Lemma 9.1.2.2, so Bruce can't get 4 gallons. Likewise, the amount in any jug using 899- and 1147-gallon jugs is a multiple of 31, so he can't get 32 either.

But the Water Jugs Lemma doesn't tell the complete story. For example, it leaves open the question of getting 3 gallons into a jug using just 21- and 26-gallon jugs: the only positive factor of both 21 and 26 is 1, and of course 1 divides 3, so the Lemma neither rules out nor confirms the possibility of getting 3 gallons.

A bigger issue is that we've just managed to recast a pretty understandable question about water jugs into a technical question about linear combinations. This might not seem like a lot of progress.
Fortunately, linear combinations are closely related to something more familiar, greatest common divisors, and will help us solve the general water jug problem.

9.2 The Greatest Common Divisor

A common divisor of $a$ and $b$ is a number that divides them both. The greatest common divisor of $a$ and $b$ is written $\gcd(a, b)$. For example, $\gcd(18, 24) = 6$.

As long as $a$ and $b$ are not both 0, they will have a gcd. The gcd turns out to be very valuable for reasoning about the relationship between $a$ and $b$ and for reasoning about integers in general. We'll be making lots of use of gcd's in what follows.

Some immediate consequences of the definition of gcd are that for $n > 0$,

$$\gcd(n, n) = n, \qquad \gcd(n, 1) = 1, \qquad \gcd(n, 0) = n,$$

where the last equality follows from the fact that everything is a divisor of 0.

9.2.1 Euclid's Algorithm

The first thing to figure out is how to find gcd's. A good way called Euclid's algorithm has been known for several thousand years. It is based on the following elementary observation.

Lemma 9.2.1. For $b \neq 0$,

$$\gcd(a, b) = \gcd(b, \mathrm{rem}(a, b)).$$

Proof. By the Division Theorem 9.1.4,

$$a = qb + r \tag{9.2}$$

where $r = \mathrm{rem}(a, b)$. So $a$ is a linear combination of $b$ and $r$, which implies that any divisor of $b$ and $r$ is a divisor of $a$ by Lemma 9.1.2.2. Likewise, $r$ is a linear combination $a - qb$ of $a$ and $b$, so any divisor of $a$ and $b$ is a divisor of $r$. This means that $a$ and $b$ have the same common divisors as $b$ and $r$, and so they have the same greatest common divisor.

Lemma 9.2.1 is useful for quickly computing the greatest common divisor of two numbers.
For example, we could compute the greatest common divisor of 1147 and 899 by repeatedly applying it:

$$\begin{aligned}
\gcd(1147, 899) &= \gcd(899, \underbrace{\mathrm{rem}(1147, 899)}_{=248})\\
&= \gcd(248, \mathrm{rem}(899, 248) = 155)\\
&= \gcd(155, \mathrm{rem}(248, 155) = 93)\\
&= \gcd(93, \mathrm{rem}(155, 93) = 62)\\
&= \gcd(62, \mathrm{rem}(93, 62) = 31)\\
&= \gcd(31, \mathrm{rem}(62, 31) = 0)\\
&= 31
\end{aligned}$$

This calculation that $\gcd(1147, 899) = 31$ was how we figured out that with water jugs of sizes 1147 and 899, Bruce dies trying to get 32 gallons.

On the other hand, applying Euclid's algorithm to 26 and 21 gives

$$\gcd(26, 21) = \gcd(21, 5) = \gcd(5, 1) = 1,$$

so we can't use the reasoning above to rule out Bruce getting 3 gallons into the big jug. As a matter of fact, because the gcd here is 1, Bruce will be able to get any number of gallons into the big jug up to its capacity. To explain this, we will need a little more number theory.

Euclid's Algorithm as a State Machine

Euclid's algorithm can easily be formalized as a state machine. The set of states is $\mathbb{N}^2$ and there is one transition rule:

$$(x, y) \to (y, \mathrm{rem}(x, y)), \tag{9.3}$$

for $y > 0$. By Lemma 9.2.1, the gcd stays the same from one state to the next. That means the predicate

$$\gcd(x, y) = \gcd(a, b)$$

is a preserved invariant on the states $(x, y)$. This preserved invariant is, of course, true in the start state $(a, b)$. So by the Invariant Principle, if $y$ ever becomes 0, the invariant will be true and so

$$x = \gcd(x, 0) = \gcd(a, b).$$

Namely, the value of $x$ will be the desired gcd.

What's more, $x$ shrinks quickly, and therefore $y$ gets to be 0 pretty fast. To see why, note that starting from $(x, y)$, two transitions lead to a state whose first coordinate is $\mathrm{rem}(x, y)$, which is at most half the size of $x$.[4] Since $x$ starts off equal to $a$ and gets halved or smaller every two steps, it will reach its minimum value, which is $\gcd(a, b)$, after at most $2 \log a$ transitions (with logarithms taken base 2). After that, the algorithm takes at most one more transition to terminate.
In other words, Euclid's algorithm terminates after at most $1 + 2 \log a$ transitions.[5]

[4] In other words,

$$\mathrm{rem}(x, y) \le x/2 \quad \text{for } 0 < y \le x. \tag{9.4}$$

This is immediate if $y \le x/2$, since the remainder of $x$ divided by $y$ is less than $y$ by definition. On the other hand, if $y > x/2$, then $\mathrm{rem}(x, y) = x - y < x/2$.

[5] A tighter analysis shows that at most $\log_\varphi(a)$ transitions are possible where $\varphi$ is the golden ratio $(1 + \sqrt{5})/2$; see Problem 9.14.

9.2.2 The Pulverizer

We will get a lot of mileage out of the following key fact:

Theorem 9.2.2. The greatest common divisor of $a$ and $b$ is a linear combination of $a$ and $b$. That is,

$$\gcd(a, b) = sa + tb,$$

for some integers $s$ and $t$.[6]

We already know from Lemma 9.1.2.2 that every linear combination of $a$ and $b$ is divisible by any common factor of $a$ and $b$, so it is certainly divisible by the greatest of these common divisors. Since any constant multiple of a linear combination is also a linear combination, Theorem 9.2.2 implies that any multiple of the gcd is a linear combination, giving:

Corollary 9.2.3. An integer is a linear combination of $a$ and $b$ iff it is a multiple of $\gcd(a, b)$.

We'll prove Theorem 9.2.2 directly by explaining how to find $s$ and $t$. This job is tackled by a mathematical tool that dates back to sixth-century India, where it was called kuttaka, which means "the Pulverizer." Today, the Pulverizer is more commonly known as the "Extended Euclidean Gcd Algorithm," because it is so close to Euclid's algorithm.
For example, following Euclid's algorithm, we can compute the gcd of 259 and 70 as follows:

$$\begin{aligned}
\gcd(259, 70) &= \gcd(70, 49) && \text{since } \mathrm{rem}(259, 70) = 49\\
&= \gcd(49, 21) && \text{since } \mathrm{rem}(70, 49) = 21\\
&= \gcd(21, 7) && \text{since } \mathrm{rem}(49, 21) = 7\\
&= \gcd(7, 0) && \text{since } \mathrm{rem}(21, 7) = 0\\
&= 7.
\end{aligned}$$

The Pulverizer goes through the same steps, but requires some extra bookkeeping along the way: as we compute $\gcd(a, b)$, we keep track of how to write each of the remainders (49, 21, and 7, in the example) as a linear combination of $a$ and $b$. This is worthwhile, because our objective is to write the last nonzero remainder, which is the gcd, as such a linear combination. For our example, here is this extra bookkeeping:

[6] This result is often referred to as Bezout's lemma, which is a misattribution since it was first published in the West 150 years earlier by someone else, and was described a thousand years before that by Indian mathematicians Aryabhata and Bhaskara.

$$\begin{array}{rrrcl}
x & y & \mathrm{rem}(x, y) & = & x - q \cdot y\\
\hline
259 & 70 & 49 & = & a - 3 \cdot b\\
70 & 49 & 21 & = & b - 1 \cdot 49\\
 & & & = & b - 1 \cdot (a - 3 \cdot b)\\
 & & & = & -1 \cdot a + 4 \cdot b\\
49 & 21 & 7 & = & 49 - 2 \cdot 21\\
 & & & = & (a - 3 \cdot b) - 2 \cdot (-1 \cdot a + 4 \cdot b)\\
 & & & = & \boxed{3 \cdot a - 11 \cdot b}\\
21 & 7 & 0 & &
\end{array}$$

We began by initializing two variables, $x = a$ and $y = b$. In the first two columns above, we carried out Euclid's algorithm. At each step, we computed $\mathrm{rem}(x, y)$ which equals $x - \mathrm{qcnt}(x, y) \cdot y$. Then, in this linear combination of $x$ and $y$, we replaced $x$ and $y$ by equivalent linear combinations of $a$ and $b$, which we already had computed. After simplifying, we were left with a linear combination of $a$ and $b$ equal to $\mathrm{rem}(x, y)$, as desired. The final solution is boxed.

This should make it pretty clear how and why the Pulverizer works. If you have doubts, you may work through Problem 9.13, where the Pulverizer is formalized as a state machine and then verified using an invariant that is an extension of the one used for Euclid's algorithm.
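The bookkeeping is mechanical enough to sketch in code. The Python below is a minimal sketch and not the state-machine formalization of Problem 9.13: the function name `pulverizer` and all variable names are our own, and it assumes nonnegative inputs that are not both zero. Alongside each of $x$ and $y$ it carries a pair of coefficients expressing that value as a linear combination of $a$ and $b$.

```python
def pulverizer(a, b):
    """Return (g, s, t) with g = gcd(a, b) = s*a + t*b.

    Illustrative sketch: assumes a >= 0, b >= 0, not both zero.
    """
    # Invariant: x == s_x*a + t_x*b  and  y == s_y*a + t_y*b.
    x, s_x, t_x = a, 1, 0
    y, s_y, t_y = b, 0, 1
    while y != 0:
        q = x // y                       # qcnt(x, y)
        # Euclid's transition (x, y) -> (y, rem(x, y)), where
        # rem(x, y) = x - q*y, applied to the values and coefficients alike.
        x, y = y, x - q * y
        s_x, s_y = s_y, s_x - q * s_y
        t_x, t_y = t_y, t_x - q * t_y
    return x, s_x, t_x


g, s, t = pulverizer(259, 70)
print(g, s, t)   # the gcd and one pair of coefficients
```

For the example worked by hand, `pulverizer(259, 70)` returns `(7, 3, -11)`, which agrees with $7 = 3 \cdot a - 11 \cdot b$.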
Since the Pulverizer requires only a little more computation than Euclid's algorithm, you can "pulverize" very large numbers very quickly by using this algorithm. As we will soon see, its speed makes the Pulverizer a very useful tool in the field of cryptography.

Now we can restate the Water Jugs Lemma 9.1.6 in terms of the greatest common divisor:

Corollary 9.2.4. Suppose that we have water jugs with capacities $a$ and $b$. Then the amount of water in each jug is always a multiple of $\gcd(a, b)$.

For example, there is no way to form 4 gallons using 3- and 6-gallon jugs, because 4 is not a multiple of $\gcd(3, 6) = 3$.

9.2.3 One Solution for All Water Jug Problems

Corollary 9.2.3 says that 3 can be written as a linear combination of 21 and 26, since 3 is a multiple of $\gcd(21, 26) = 1$. So the Pulverizer will give us integers $s$ and $t$ such that

$$3 = s \cdot 21 + t \cdot 26. \tag{9.5}$$

The coefficient $s$ could be either positive or negative. However, we can readily transform this linear combination into an equivalent linear combination

$$3 = s' \cdot 21 + t' \cdot 26 \tag{9.6}$$

where the coefficient $s'$ is positive. The trick is to notice that if in equation (9.5) we increase $s$ by 26 and decrease $t$ by 21, then the value of the expression $s \cdot 21 + t \cdot 26$ is unchanged overall. Thus, by repeatedly increasing the value of $s$ (by 26 at a time) and decreasing the value of $t$ (by 21 at a time), we get a linear combination $s' \cdot 21 + t' \cdot 26 = 3$ where the coefficient $s'$ is positive. (Of course $t'$ must then be negative; otherwise, this expression would be much greater than 3.)

Now we can form 3 gallons using jugs with capacities 21 and 26: We simply repeat the following steps $s'$ times:

1. Fill the 21-gallon jug.
2. Pour all the water in the 21-gallon jug into the 26-gallon jug. If at any time the 26-gallon jug becomes full, empty it out, and continue pouring the 21-gallon jug into the 26-gallon jug.
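The two steps above can be simulated directly. The following Python is a minimal sketch under stated assumptions: the function name `run_jug_strategy` and its return format are hypothetical, and the round count 15 is one positive choice of $s'$ satisfying (9.6), since $15 \cdot 21 - 12 \cdot 26 = 3$. The sketch also counts how often the big jug is emptied.

```python
def run_jug_strategy(small, big, rounds):
    """Repeat 'fill the small jug, pour it into the big jug' `rounds` times.

    Whenever the big jug fills up while water remains in the small jug,
    the big jug is emptied and pouring continues.
    Returns (gallons in small jug, gallons in big jug, times big was emptied).
    """
    in_small, in_big, empties = 0, 0, 0
    for _ in range(rounds):
        in_small = small                          # step 1: fill the small jug
        while in_small > 0:                       # step 2: pour small into big
            poured = min(in_small, big - in_big)
            in_small -= poured
            in_big += poured
            if in_big == big and in_small > 0:
                in_big = 0                        # big jug full: empty it out
                empties += 1
    return in_small, in_big, empties


print(run_jug_strategy(21, 26, 15))
```

With these inputs the simulation ends with 3 gallons in the 26-gallon jug after emptying it 12 times, consistent with $15 \cdot 21 - 12 \cdot 26 = 3$.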
At the end of this process, we must have emptied the 26-gallon jug exactly $|t'|$ times. Here's why: we've taken $s' \cdot 21$ gallons of water from the fountain, and we've poured out some multiple of 26 gallons. If we emptied fewer than $|t'|$ times, then by (9.6), the big jug would be left with at least $3 + 26$ gallons, which is more than it can hold; if we emptied it more times, the big jug would be left containing at most $3 - 26$ gallons, which is nonsense. But once we have emptied the 26-gallon jug exactly $|t'|$ times, equation (9.6) implies that there are exactly 3 gallons left.

Remarkably, we don't even need to know the coefficients $s'$ and $t'$ in order to use this strategy! Instead of repeating the outer loop $s'$ times, we could just repeat until we obtain 3 gallons, since that must happen eventually. Of course, we have to keep track of the amounts in the two jugs so we know when we're done. Here's the solution using this approach starting with empty jugs, that is, at $(0, 0)$. In each row, $F$ means "fill 21," $P$ means "pour 21 into 26," and $E$ means "empty 26":

$$\begin{array}{l}
\xrightarrow{F} (21, 0) \xrightarrow{P} (0, 21)\\
\xrightarrow{F} (21, 21) \xrightarrow{P} (16, 26) \xrightarrow{E} (16, 0) \xrightarrow{P} (0, 16)\\
\xrightarrow{F} (21, 16) \xrightarrow{P} (11, 26) \xrightarrow{E} (11, 0) \xrightarrow{P} (0, 11)\\
\xrightarrow{F} (21, 11) \xrightarrow{P} (6, 26) \xrightarrow{E} (6, 0) \xrightarrow{P} (0, 6)\\
\xrightarrow{F} (21, 6) \xrightarrow{P} (1, 26) \xrightarrow{E} (1, 0) \xrightarrow{P} (0, 1)\\
\xrightarrow{F} (21, 1) \xrightarrow{P} (0, 22)\\
\xrightarrow{F} (21, 22) \xrightarrow{P} (17, 26) \xrightarrow{E} (17, 0) \xrightarrow{P} (0, 17)\\
\xrightarrow{F} (21, 17) \xrightarrow{P} (12, 26) \xrightarrow{E} (12, 0) \xrightarrow{P} (0, 12)\\
\xrightarrow{F} (21, 12) \xrightarrow{P} (7, 26) \xrightarrow{E} (7, 0) \xrightarrow{P} (0, 7)\\
\xrightarrow{F} (21, 7) \xrightarrow{P} (2, 26) \xrightarrow{E} (2, 0) \xrightarrow{P} (0, 2)\\
\xrightarrow{F} (21, 2) \xrightarrow{P} (0, 23)\\
\xrightarrow{F} (21, 23) \xrightarrow{P} (18, 26) \xrightarrow{E} (18, 0) \xrightarrow{P} (0, 18)\\
\xrightarrow{F} (21, 18) \xrightarrow{P} (13, 26) \xrightarrow{E} (13, 0) \xrightarrow{P} (0, 13)\\
\xrightarrow{F} (21, 13) \xrightarrow{P} (8, 26) \xrightarrow{E} (8, 0) \xrightarrow{P} (0, 8)\\
\xrightarrow{F} (21, 8) \xrightarrow{P} (3, 26) \xrightarrow{E} (3, 0) \xrightarrow{P} (0, 3)
\end{array}$$

The same approach works regardless of the jug capacities and even regardless of the amount we're trying to produce! Simply repeat these two steps until the desired amount of water is obtained:

1. Fill the smaller jug.
2. Pour all the water in the smaller jug into the larger jug. If at any time the larger jug becomes full, empty it out, and continue pouring the smaller jug into the larger jug.

By the same reasoning as before, this method eventually generates every multiple, up to the size of the larger jug, of the greatest common divisor of the jug capacities: all the quantities we can possibly produce. No ingenuity is needed at all!

So now we have the complete water jug story:

Theorem 9.2.5. Suppose that we have water jugs with capacities $a$ and $b$. For any $c \in [0..a]$, it is possible to get $c$ gallons in the size $a$ jug iff $c$ is a multiple of $\gcd(a, b)$.

9.2.4 Properties of the Greatest Common Divisor

It can help to have some basic gcd facts on hand:

Lemma 9.2.6.

a) $\gcd(ka, kb) = k \cdot \gcd(a, b)$ for all $k > 0$.
b) $(d \mid a \text{ AND } d \mid b) \text{ IFF } d \mid \gcd(a, b)$.
c) If $\gcd(a, b) = 1$ and $\gcd(a, c) = 1$, then $\gcd(a, bc) = 1$.
d) If $a \mid bc$ and $\gcd(a, b) = 1$, then $a \mid c$.

Showing how all these facts follow from Theorem 9.2.2 that gcd is a linear combination is a good exercise (Problem 9.11).

These properties are also simple consequences of the fact that integers factor into primes in a unique way (Theorem 9.4.1). But we'll need some of these facts to prove unique factorization in Section 9.4, so proving them by appeal to unique factorization would be circular.

9.3 Prime Mysteries

Some of the greatest mysteries and insights in number theory concern properties of prime numbers:

Definition 9.3.1.
A prime is a number greater than 1 that is divisible only by itself and 1. A number other than 0, 1, and $-1$ that is not a prime is called composite.[7]

[7] So 0, 1, and $-1$ are the only integers that are neither prime nor composite.

Here are three famous mysteries:

Twin Prime Conjecture. There are infinitely many primes $p$ such that $p + 2$ is also a prime. In 1966, Chen showed that there are infinitely many primes $p$ such that $p + 2$ is the product of at most two primes. So the conjecture is known to be almost true!

Conjectured Inefficiency of Factoring. Given the product of two large primes $n = pq$, there is no efficient procedure to recover the primes $p$ and $q$. That is, no polynomial time procedure (see Section 3.5) is guaranteed to find $p$ and $q$ in a number of steps bounded by a polynomial in the length of the binary representation of $n$ (not $n$ itself). The length of the binary representation is at most $1 + \log_2 n$.

The best algorithm known is the "number field sieve," which runs in time proportional to:

\[ e^{1.9 (\ln n)^{1/3} (\ln \ln n)^{2/3}}. \]

This number grows more rapidly than any polynomial in $\log n$ and is infeasible when $n$ has 300 digits or more.

Efficient factoring is a mystery of particular importance in computer science, as we'll explain later in this chapter.

Goldbach's Conjecture. We've already mentioned Goldbach's Conjecture 1.1.6 several times: every even integer greater than two is equal to the sum of two primes. For example, $4 = 2 + 2$, $6 = 3 + 3$, $8 = 3 + 5$, etc. In 1939, Schnirelman proved that every even number can be written as the sum of not more than 300,000 primes, which was a start. Today, we know that every even number is the sum of at most 6 primes.

Primes show up erratically in the sequence of integers.
In fact, their distribution seems almost random:

\[ 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, \ldots \]

One of the great insights about primes is that their density among the integers has a precise limit. Namely, let $\pi(n)$ denote the number of primes up to $n$:

Definition 9.3.2.
\[ \pi(n) ::= |\{p \in [2..n] \mid p \text{ is prime}\}|. \]

For example, $\pi(1) = 0$, $\pi(2) = 1$ and $\pi(10) = 4$, because 2, 3, 5, and 7 are the primes less than or equal to 10. Step by step, $\pi$ grows erratically according to the erratic spacing between successive primes, but its overall growth rate is known to smooth out to be the same as the growth of the function $n / \ln n$:

Theorem 9.3.3 (Prime Number Theorem).
\[ \lim_{n \to \infty} \frac{\pi(n)}{n / \ln n} = 1. \]

Thus, primes gradually taper off. As a rule of thumb, about 1 integer out of every $\ln n$ in the vicinity of $n$ is a prime.

The Prime Number Theorem was conjectured by Legendre in 1798 and proved a century later by de la Vallée Poussin and Hadamard in 1896. However, after his death, a notebook of Gauss was found to contain the same conjecture, which he apparently made in 1791 at age 15. (You have to feel sorry for all the otherwise "great" mathematicians who had the misfortune of being contemporaries of Gauss.)

A proof of the Prime Number Theorem is beyond the scope of this text, but there is a manageable proof (see Problem 9.22) of a related result that is sufficient for our applications:

Theorem 9.3.4 (Chebyshev's Theorem on Prime Density). For $n > 1$,
\[ \pi(n) > \frac{n}{3 \ln n}. \]

9.4 The Fundamental Theorem of Arithmetic

There is an important fact about primes that you probably already know: every positive integer has a unique prime factorization. So every positive integer can be built up from primes in exactly one way. These quirky prime numbers are the building blocks for the integers.
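To make "built up from primes" concrete, here is a minimal Python sketch that factors a positive integer by trial division and lists the prime factors from largest to smallest. The function name `prime_factors_desc` is hypothetical, and trial division is chosen only for simplicity; it is far too slow for the large numbers discussed under factoring above.

```python
def prime_factors_desc(n):
    """Return the prime factors of a positive integer n, with repetition,
    ordered from largest to smallest. The integer 1 yields an empty list."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:   # divide out each prime as often as it occurs
            factors.append(d)
            n //= d
        d += 1
    if n > 1:               # whatever remains is a single prime factor
        factors.append(n)
    factors.reverse()       # factors were collected smallest first
    return factors


print(prime_factors_desc(12))
print(prime_factors_desc(75237393))
```

Multiplying the returned list back together always recovers the input, and sorting the factors by size removes the ambiguity that comes from reordering a product.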
Since the value of a product of numbers is the same if the numbers appear in a different order, there usually isn't a unique way to express a number as a product of primes. For example, there are three ways to write 12 as a product of primes:

\[ 12 = 2 \cdot 2 \cdot 3 = 2 \cdot 3 \cdot 2 = 3 \cdot 2 \cdot 2. \]

What's unique about the prime factorization of 12 is that any product of primes equal to 12 will have exactly one 3 and two 2's. This means that if we sort the primes by size, then the product really will be unique.

Let's state this more carefully. A sequence of numbers is weakly decreasing when each number in the sequence is at least as big as the numbers after it. Note that a sequence of just one number, as well as a sequence of no numbers (the empty sequence), is weakly decreasing by this definition.

Theorem 9.4.1. [Fundamental Theorem of Arithmetic] Every positive integer is a product of a unique weakly decreasing sequence of primes.

A Prime for Google

In late 2004 a billboard appeared in various locations around the country:

{ first 10-digit prime found in consecutive digits of e } . com

Substituting the correct number for the expression in curly-braces produced the URL for a Google employment page. The idea was that Google was interested in hiring the sort of people that could and would solve such a problem.

How hard is this problem? Would you have to look through thousands or millions or billions of digits of $e$ to find a 10-digit prime? The rule of thumb derived from the Prime Number Theorem says that among 10-digit numbers, about 1 in $\ln 10^{10} \approx 23$ is prime. This suggests that the problem isn't really so hard! Sure enough, the first 10-digit prime in consecutive digits of $e$, namely 7427466391, appears quite early:

e = 2.718281828459045235360287471352662497757247093699959574966
9676277240766303535475945713821785251664274274663919320030
599218174135966290435729003342952605956307381323286279434 ...
For example, 75237393 is the product of the weakly decreasing sequence of primes

\[ 23, 17, 17, 11, 7, 7, 7, 3, \]

and no other weakly decreasing sequence of primes will give 75237393.[8]

Notice that the theorem would be false if 1 were considered a prime; for example, 15 could be written as $5 \cdot 3$, or $5 \cdot 3 \cdot 1$, or $5 \cdot 3 \cdot 1 \cdot 1$, ....

There is a certain wonder in unique factorization, especially in view of the prime number mysteries we've already mentioned. It's a mistake to take it for granted, even if you've known it since you were in a crib. In fact, unique factorization actually fails for many integer-like sets of numbers, such as the complex numbers of the form $n + m\sqrt{-5}$ for $m, n \in \mathbb{Z}$ (see Problem 9.25).

The Fundamental Theorem is also called the Unique Factorization Theorem, which is a more descriptive and less pretentious name, but we really want to get your attention to the importance and non-obviousness of unique factorization.

9.4.1 Proving Unique Factorization

The Fundamental Theorem is not hard to prove, but we'll need a couple of preliminary facts.

Lemma 9.4.2. If $p$ is a prime and $p \mid ab$, then $p \mid a$ or $p \mid b$.

Lemma 9.4.2 follows immediately from Unique Factorization: the primes in the product $ab$ are exactly the primes from $a$ and from $b$. But proving the lemma this way would be cheating: we're going to need this lemma to prove Unique Factorization, so it would be circular to assume it. Instead, we'll use the properties of gcd's and linear combinations to give an easy, noncircular way to prove Lemma 9.4.2.

Proof. One case is if $\gcd(a, p) = p$. Then the claim holds, because $a$ is a multiple of $p$.

Otherwise, $\gcd(a, p) \neq p$. In this case $\gcd(a, p)$ must be 1, since 1 and $p$ are the only positive divisors of $p$. Now $\gcd(a, p)$ is a linear combination of $a$ and $p$, so we have $1 = sa + tp$ for some $s, t$. Then $b = s(ab) + (tb)p$, that is, $b$ is a linear combination of $ab$ and $p$.
Since $p$ divides both $ab$ and $p$, it also divides their linear combination $b$.

[8] The "product" of just one number is defined to be that number, and the product of no numbers is by convention defined to be 1. So each prime $p$ is uniquely the product of the primes in the length-one sequence consisting solely of $p$, and 1, which you will remember is not a prime, is uniquely the product of the empty sequence.

A routine induction argument extends this statement to:

Lemma 9.4.3. Let $p$ be a prime. If $p \mid a_1 a_2 \cdots a_n$, then $p$ divides some $a_i$.

Now we're ready to prove the Fundamental Theorem of Arithmetic.

Proof. Theorem 2.3.1 showed, using the Well Ordering Principle, that every positive integer can be expressed as a product of primes. So we just have to prove this expression is unique. We will use Well Ordering to prove this too.

The proof is by contradiction: assume, contrary to the claim, that there exist positive integers that can be written as products of primes in more than one way. By the Well Ordering Principle, there is a smallest integer with this property. Call this integer $n$, and let

\begin{align*}
n &= p_1 \cdot p_2 \cdots p_j, \\
  &= q_1 \cdot q_2 \cdots q_k,
\end{align*}

where both products are in weakly decreasing order and $p_1 \le q_1$.

If $q_1 = p_1$, then $n / q_1$ would also be the product of different weakly decreasing sequences of primes, namely,

\[ p_2 \cdots p_j, \qquad q_2 \cdots q_k. \]

Since $n / q_1 < n$, this can't be true, so we conclude that $p_1 < q_1$.

Since the $p_i$'s are weakly decreasing, all the $p_i$'s are less than $q_1$. But

\[ q_1 \mid n = p_1 \cdot p_2 \cdots p_j, \]

so Lemma 9.4.3 implies that $q_1$ divides one of the $p_i$'s, which contradicts the fact that $q_1$ is bigger than all of them.

9.5 Alan Turing

The man pictured in Figure 9.1 is Alan Turing, the most important figure in the history of computer science. For decades, his fascinating life story was shrouded by government secrecy, societal taboo, and even his own deceptions.
At age 24, Turing wrote a paper entitled On Computable Numbers, with an Application to the Entscheidungsproblem. The crux of the paper was an elegant way to model a computer in mathematical terms. This was a breakthrough, because it allowed the tools of mathematics to be brought to bear on questions of computation. For example, with his model in hand, Turing immediately proved that there exist problems that no computer can solve, no matter how ingenious the programmer. Turing's paper is all the more remarkable because he wrote it in 1936, a full decade before any electronic computer actually existed.

[Figure 9.1: Alan Turing]

The word "Entscheidungsproblem" in the title refers to the decision problem posed by David Hilbert in 1928 as a challenge to mathematicians of the 20th century. Turing knocked that one off in the same paper. And perhaps you've heard of the "Church-Turing thesis"? Same paper. So Turing was a brilliant guy who generated lots of amazing ideas. But this lecture is about one of Turing's less-amazing ideas. It involved codes. It involved number theory. And it was sort of stupid.

Let's look back to the fall of 1937. Nazi Germany was rearming under Adolf Hitler, world-shattering war looked imminent, and, like us, Alan Turing was pondering the usefulness of number theory. He foresaw that preserving military secrets would be vital in the coming conflict and proposed a way to encrypt communications using number theory. This is an idea that has ricocheted up to our own time. Today, number theory is the basis for numerous public-key cryptosystems, digital signature schemes, cryptographic hash functions, and electronic payment systems. Furthermore, military funding agencies are among the biggest investors in cryptographic research. Sorry, Hardy!
Soon after devising his code, Turing disappeared from public view, and half a century would pass before the world learned the full story of where he'd gone and what he did there. We'll come back to Turing's life in a little while; for now, let's investigate the code Turing left behind. The details are uncertain, since he never formally published the idea, so we'll consider a couple of possibilities.

9.5.1 Turing's Code (Version 1.0)

The first challenge is to translate a text message into an integer so we can perform mathematical operations on it. This step is not intended to make a message harder to read, so the details are not too important. Here is one approach: replace each letter of the message with two digits (A = 01, B = 02, C = 03, etc.) and string all the digits together to form one huge number. For example, the message "victory" could be translated this way:

     v  i  c  t  o  r  y
 →  22 09 03 20 15 18 25

Turing's code requires the message to be a prime number, so we may need to pad the result with some more digits to make a prime. The Prime Number Theorem indicates that padding with relatively few digits will work. In this case, appending the digits 13 gives the number 2209032015182513, which is prime.

Here is how the encryption process works. In the description below, $m$ is the unencoded message (which we want to keep secret), $\widehat{m}$ is the encrypted message (which the Nazis may intercept), and $k$ is the key.

Beforehand. The sender and receiver agree on a secret key, which is a large prime $k$.

Encryption. The sender encrypts the message $m$ by computing:
\[ \widehat{m} = m \cdot k \]

Decryption. The receiver decrypts $\widehat{m}$ by computing:
\[ \frac{\widehat{m}}{k} = m. \]

For example, suppose that the secret key is the prime number $k = 22801763489$ and the message $m$ is "victory." Then the encrypted message is:

\begin{align*}
\widehat{m} &= m \cdot k \\
  &= 2209032015182513 \cdot 22801763489 \\
  &= 50369825549820718594667857
\end{align*}

There are a couple of basic questions to ask about Turing's code.
1. How can the sender and receiver ensure that $m$ and $k$ are prime numbers, as required?

The general problem of determining whether a large number is prime or composite has been studied for centuries, and tests for primes that worked well in practice were known even in Turing's time. In the past few decades, very fast primality tests have been found as described in the text box below.

Primality Testing

It's easy to see that an integer $n$ is prime iff it is not divisible by any number from 2 to $\lfloor \sqrt{n} \rfloor$ (see Problem 1.14). Of course this naive way to test if $n$ is prime can take about $\sqrt{n}$ steps, which is exponential in the size of $n$ measured by the number of digits in the decimal or binary representation of $n$. Through the early 1970's, no prime testing procedure was known that would never blow up like this.

In 1974, Volker Strassen invented a simple, fast probabilistic primality test. Strassen's test gives the right answer when applied to any prime number, but has some probability of giving a wrong answer on a nonprime number. However, the probability of a wrong answer on any given number is so tiny that relying on the answer is the best bet you'll ever make.

Still, the theoretical possibility of a wrong answer was intellectually bothersome, even if the probability of being wrong was a lot less than the probability of an undetectable computer hardware error leading to a wrong answer. Finally in 2002, in a breakthrough paper beginning with a quote from Gauss emphasizing the importance and antiquity of primality testing, Manindra Agrawal, Neeraj Kayal, and Nitin Saxena presented an amazing, thirteen line description of a polynomial time primality test.

This definitively places primality testing way below the exponential effort apparently needed for SAT and similar problems. The polynomial bound on the Agrawal et al.
test had degree 12, and subsequent research has reduced the degree to 5, but this is still too large to be practical, and probabilistic primality tests remain the method used in practice today. It's plausible that the degree bound can be reduced a bit more, but matching the speed of the known probabilistic tests remains a daunting challenge.

2. Is Turing's code secure?

The Nazis see only the encrypted message $\widehat{m} = m \cdot k$, so recovering the original message $m$ requires factoring $\widehat{m}$. Despite immense efforts, no really efficient factoring algorithm has ever been found. It appears to be a fundamentally difficult problem. So, although a breakthrough someday can't be ruled out, the conjecture that there is no efficient way to factor is widely accepted. In effect, Turing's code puts to practical use his discovery that there are limits to the power of computation. Thus, provided $m$ and $k$ are sufficiently large, the Nazis seem to be out of luck!

This all sounds promising, but there is a major flaw in Turing's code.

9.5.2 Breaking Turing's Code (Version 1.0)

Let's consider what happens when the sender transmits a second message using Turing's code and the same key. This gives the Nazis two encrypted messages to look at:

\[ \widehat{m_1} = m_1 \cdot k \quad \text{and} \quad \widehat{m_2} = m_2 \cdot k \]

The greatest common divisor of the two encrypted messages, $\widehat{m_1}$ and $\widehat{m_2}$, is the secret key $k$. And, as we've seen, the gcd of two numbers can be computed very efficiently. So after the second message is sent, the Nazis can recover the secret key and read every message!

A mathematician as brilliant as Turing is not likely to have overlooked such a glaring problem, and we can guess that he had a slightly different system in mind, one based on modular arithmetic.
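The Version 1.0 scheme and the gcd attack on it fit in a few lines of Python. This is a minimal sketch under stated assumptions: the function names `encode`, `encrypt`, and `decrypt` are hypothetical, `encode` handles only the letters a through z, the padding digits 13 and the key are the ones from the "victory" example, and the second message `m2` is simply the well-known prime 1000000007 standing in for another padded message.

```python
from math import gcd


def encode(text):
    """Translate letters to two-digit codes (a = 01, ..., z = 26) and
    string the digits together into one integer. Letters only."""
    return int("".join(f"{ord(ch) - ord('a') + 1:02d}" for ch in text.lower()))


def encrypt(m, k):
    """Version 1.0 encryption: ordinary multiplication by the key."""
    return m * k


def decrypt(m_hat, k):
    """Version 1.0 decryption: ordinary division by the key."""
    quotient, remainder = divmod(m_hat, k)
    if remainder != 0:
        raise ValueError("ciphertext is not a multiple of the key")
    return quotient


k = 22801763489                        # secret key from the example
m1 = encode("victory") * 100 + 13      # pad with the digits 13
m2 = 1000000007                        # stand-in for a second prime message (assumption)

c1, c2 = encrypt(m1, k), encrypt(m2, k)
recovered_key = gcd(c1, c2)            # the eavesdropper needs only c1 and c2
print(c1)
print(recovered_key == k, decrypt(c1, recovered_key) == m1)
```

The attack never factors anything: because `m1` and `m2` share no common factor, the only thing the two ciphertexts have in common is the key.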
9.6 Modular Arithmetic

On the first page of his masterpiece on number theory, Disquisitiones Arithmeticae, Gauss introduced the notion of "congruence." Now, Gauss is another guy who managed to cough up a half-decent idea every now and then, so let's take a look at this one. Gauss said that $a$ is congruent to $b$ modulo $n$ iff $n \mid (a - b)$. This is written

\[ a \equiv b \pmod{n}. \]

For example:

\[ 29 \equiv 15 \pmod{7} \quad \text{because } 7 \mid (29 - 15). \]

It's not useful to allow a modulus $n \le 1$, and so we will assume from now on that moduli are greater than 1.

There is a close connection between congruences and remainders:

Lemma 9.6.1 (Remainder).
\[ a \equiv b \pmod{n} \quad \text{iff} \quad \operatorname{rem}(a, n) = \operatorname{rem}(b, n). \]

Proof. By the Division Theorem 9.1.4, there exist unique pairs of integers $q_1, r_1$ and $q_2, r_2$ such that:

\begin{align*}
a &= q_1 n + r_1 \\
b &= q_2 n + r_2,
\end{align*}

where $r_1, r_2 \in [0..n)$. Subtracting the second equation from the first gives:

\[ a - b = (q_1 - q_2) n + (r_1 - r_2), \]

where $r_1 - r_2$ is in the interval $(-n, n)$. Now $a \equiv b \pmod{n}$ if and only if $n$ divides the left-hand side of this equation. This is true if and only if $n$ divides the right-hand side, which holds if and only if $r_1 - r_2$ is a multiple of $n$. But the only multiple of $n$ in $(-n, n)$ is 0, so this holds if and only if $r_1 - r_2 = 0$, that is, when $r_1 ::= \operatorname{rem}(a, n)$ equals $r_2 ::= \operatorname{rem}(b, n)$.

So we can also see that

\[ 29 \equiv 15 \pmod{7} \quad \text{because } \operatorname{rem}(29, 7) = 1 = \operatorname{rem}(15, 7). \]

Notice that even though "(mod 7)" appears on the end, the $\equiv$ symbol isn't any more strongly associated with the 15 than with the 29. It would probably be clearer to write $29 \equiv_{\text{mod } 7} 15$, for example, but the notation with the modulus at the end is firmly entrenched, and we'll just live with it.

The Remainder Lemma 9.6.1 explains why the congruence relation has properties like an equality relation. In particular, the following properties[9] follow immediately:

Lemma 9.6.2.
\begin{align*}
a &\equiv a \pmod{n} && \text{(reflexivity)} \\
a \equiv b \ &\text{IFF}\ b \equiv a \pmod{n} && \text{(symmetry)} \\
(a \equiv b \ \text{AND}\ b \equiv c) \ &\text{IMPLIES}\ a \equiv c \pmod{n} && \text{(transitivity)}
\end{align*}

[9] Binary relations with these properties are called equivalence relations, see Section 10.10.

We'll make frequent use of another immediate corollary of the Remainder Lemma 9.6.1:

Corollary 9.6.3.
\[ a \equiv \operatorname{rem}(a, n) \pmod{n} \]

Still another way to think about congruence modulo $n$ is that it defines a partition of the integers into $n$ sets so that congruent numbers are all in the same set. For example, suppose that we're working modulo 3. Then we can partition the integers into 3 sets as follows:

\begin{align*}
\{ \ldots, -6, -3, 0, 3, 6, 9, \ldots \} \\
\{ \ldots, -5, -2, 1, 4, 7, 10, \ldots \} \\
\{ \ldots, -4, -1, 2, 5, 8, 11, \ldots \}
\end{align*}

according to whether their remainders on division by 3 are 0, 1, or 2. The upshot is that when arithmetic is done modulo $n$, there are really only $n$ different kinds of numbers to worry about, because there are only $n$ possible remainders. In this sense, modular arithmetic is a simplification of ordinary arithmetic.

The next most useful fact about congruences is that they are preserved by addition and multiplication:

Lemma 9.6.4 (Congruence). If $a \equiv b \pmod{n}$ and $c \equiv d \pmod{n}$, then

\begin{align*}
a + c &\equiv b + d \pmod{n}, \tag{9.7} \\
ac &\equiv bd \pmod{n}. \tag{9.8}
\end{align*}

Proof. Let's start with (9.7). Since $a \equiv b \pmod{n}$, we have by definition that $n \mid (b - a) = (b + c) - (a + c)$, so

\[ a + c \equiv b + c \pmod{n}. \]

Since $c \equiv d \pmod{n}$, the same reasoning leads to

\[ b + c \equiv b + d \pmod{n}. \]

Now transitivity (Lemma 9.6.2) gives

\[ a + c \equiv b + d \pmod{n}. \]

The proof for (9.8) is virtually identical, using the fact that if $n$ divides $(b - a)$, then it certainly also divides $(bc - ac)$.

9.7 Remainder Arithmetic

The Remainder Lemma 9.6.1 says that two numbers are congruent iff their remainders are equal, so we can understand congruences by working out arithmetic with remainders.
And if all we want is the remainder modulo $n$ of a series of additions, multiplications, and subtractions applied to some numbers, we can take remainders at every step so that the entire computation only involves numbers in the range $[0..n)$.

General Principle of Remainder Arithmetic

To find the remainder on division by $n$ of the result of a series of additions and multiplications, applied to some integers

- replace each integer operand by its remainder on division by $n$,

- keep each result of an addition or multiplication in the range $[0..n)$ by immediately replacing any result outside that range by its remainder on division by $n$.

For example, suppose we want to find

\[ \operatorname{rem}\big((44427^{3456789} + 15555858^{5555}) \cdot 403^{6666666},\ 36\big). \tag{9.9} \]

This looks really daunting if you think about computing these large powers and then taking remainders. For example, the decimal representation of $44427^{3456789}$ has about 20 million digits, so we certainly don't want to go that route. But remembering that integer exponents specify a series of multiplications, we follow the General Principle and replace the numbers being multiplied by their remainders. Since $\operatorname{rem}(44427, 36) = 3$, $\operatorname{rem}(15555858, 36) = 6$, and $\operatorname{rem}(403, 36) = 7$, we find that (9.9) equals the remainder on division by 36 of

\[ (3^{3456789} + 6^{5555}) \cdot 7^{6666666}. \tag{9.10} \]

That's a little better, but $3^{3456789}$ has about a million digits in its decimal representation, so we still don't want to compute that. But let's look at the remainders of the first few powers of 3:

\begin{align*}
\operatorname{rem}(3, 36) &= 3 \\
\operatorname{rem}(3^2, 36) &= 9 \\
\operatorname{rem}(3^3, 36) &= 27 \\
\operatorname{rem}(3^4, 36) &= 9.
\end{align*}

We got a repeat of the second step, $\operatorname{rem}(3^2, 36)$, after just two more steps. This means that starting at $3^2$, the sequence of remainders of successive powers of 3 will keep repeating every 2 steps. So a product of an odd number of at least three 3's will have the same remainder on division by 36 as a product of just three 3's.
Therefore,

\[ \operatorname{rem}(3^{3456789}, 36) = \operatorname{rem}(3^3, 36) = 27. \]

What a win! Powers of 6 are even easier because $\operatorname{rem}(6^2, 36) = 0$, so 0's keep repeating after the second step. Powers of 7 repeat after six steps, but on the fifth step you get a 1, that is $\operatorname{rem}(7^6, 36) = 1$, so (9.10) successively simplifies to be the remainders of the following terms:

\begin{align*}
&(3^{3456789} + 6^{5555}) \cdot 7^{6666666} \\
&(3^3 + 6^2 \cdot 6^{5553}) \cdot (7^6)^{1111111} \\
&(3^3 + 0 \cdot 6^{5553}) \cdot 1^{1111111} \\
&= 27.
\end{align*}

Notice that it would be a disastrous blunder to replace an exponent by its remainder. The general principle applies to numbers that are operands of plus and times, whereas the exponent is a number that controls how many multiplications to perform. Watch out for this.

9.7.1 The ring $\mathbb{Z}_n$

It's time to be more precise about the general principle and why it works. To begin, let's introduce the notation $+_n$ for doing an addition and then immediately taking a remainder on division by $n$, as specified by the general principle; likewise for multiplying:

\begin{align*}
i +_n j &::= \operatorname{rem}(i + j, n), \\
i \cdot_n j &::= \operatorname{rem}(ij, n).
\end{align*}

Now the General Principle is simply the repeated application of the following lemma.

Lemma 9.7.1.
\begin{align*}
\operatorname{rem}(i + j, n) &= \operatorname{rem}(i, n) +_n \operatorname{rem}(j, n), \tag{9.11} \\
\operatorname{rem}(ij, n) &= \operatorname{rem}(i, n) \cdot_n \operatorname{rem}(j, n). \tag{9.12}
\end{align*}

Proof. By Corollary 9.6.3, $i \equiv \operatorname{rem}(i, n)$ and $j \equiv \operatorname{rem}(j, n)$, so by the Congruence Lemma 9.6.4

\[ i + j \equiv \operatorname{rem}(i, n) + \operatorname{rem}(j, n) \pmod{n}. \]

By Corollary 9.6.3 again, the remainders on each side of this congruence are equal, which immediately gives (9.11). An identical proof applies to (9.12).

The set of integers in the range $[0..n)$ together with the operations $+_n$ and $\cdot_n$ is referred to as $\mathbb{Z}_n$, the ring of integers modulo $n$.
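The two operations just defined, and the worked example (9.9), can be checked mechanically. The sketch below is illustrative only: the names `add_n` and `mul_n` are hypothetical, and Python's built-in three-argument `pow` is used as a convenient way to multiply repeatedly while keeping every intermediate result reduced modulo n, which is what the General Principle prescribes. For a positive modulus, Python's `%` operator returns a value in the range [0..n), so it plays the role of rem.

```python
def add_n(i, j, n):
    """Addition followed immediately by taking the remainder on division by n."""
    return (i + j) % n


def mul_n(i, j, n):
    """Multiplication followed immediately by taking the remainder on division by n."""
    return (i * j) % n


# The example (9.9): reduce as we go instead of building huge powers.
n = 36
example = mul_n(
    add_n(pow(44427, 3456789, n), pow(15555858, 5555, n), n),
    pow(403, 6666666, n),
    n,
)
print(example)

# Lemma 9.7.1 checked on a small range of operands, negatives included.
lemma_holds = all(
    (i + j) % n == add_n(i % n, j % n, n) and (i * j) % n == mul_n(i % n, j % n, n)
    for i in range(-50, 50)
    for j in range(-50, 50)
)
print(lemma_holds)

# The blunder warned about above: an exponent must not be replaced by its remainder.
print(pow(2, 10, 7), pow(2, 10 % 7, 7))
```

The last line prints two different values, which is exactly why exponents are off limits: they count multiplications rather than take part in them.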
As a consequence of Lemma 9.7.1, the familiar rules of arithmetic hold in $\mathbb{Z}_n$, for example:

\[ (i \cdot_n j) \cdot_n k = i \cdot_n (j \cdot_n k). \]

These subscript-$n$'s on arithmetic operations really clog things up, so instead we'll just write "($\mathbb{Z}_n$)" on the side to get a simpler looking equation:

\[ (i \cdot j) \cdot k = i \cdot (j \cdot k) \quad (\mathbb{Z}_n). \]

In particular, all of the following equalities[10] are true in $\mathbb{Z}_n$:

\begin{align*}
(i \cdot j) \cdot k &= i \cdot (j \cdot k) && \text{(associativity of } \cdot\text{)}, \\
(i + j) + k &= i + (j + k) && \text{(associativity of } +\text{)}, \\
1 \cdot k &= k && \text{(identity for } \cdot\text{)}, \\
0 + k &= k && \text{(identity for } +\text{)}, \\
k + (-k) &= 0 && \text{(inverse for } +\text{)}, \\
i + j &= j + i && \text{(commutativity of } +\text{)}, \\
i \cdot (j + k) &= (i \cdot j) + (i \cdot k) && \text{(distributivity)}, \\
i \cdot j &= j \cdot i && \text{(commutativity of } \cdot\text{)}.
\end{align*}

[10] A set with addition and multiplication operations that satisfy these equalities is known as a commutative ring. In addition to $\mathbb{Z}_n$, the integers, rationals, reals, and polynomials with integer coefficients are all examples of commutative rings. On the other hand, the set $\{T, F\}$ of truth values with OR for addition and AND for multiplication is not a commutative ring because it fails to satisfy one of these equalities. The $n \times n$ matrices of integers are not a commutative ring because they fail to satisfy another one of these equalities.

Associativity implies the familiar fact that it's safe to omit the parentheses in products:

\[ k_1 \cdot k_2 \cdots k_m \]

comes out the same in $\mathbb{Z}_n$ no matter how it is parenthesized.

The overall theme is that remainder arithmetic is a lot like ordinary arithmetic. But there are a couple of exceptions we're about to examine.

9.8 Turing's Code (Version 2.0)

In 1940, France had fallen before Hitler's army, and Britain stood alone against the Nazis in western Europe. British resistance depended on a steady flow of supplies brought across the north Atlantic from the United States by convoys of ships.
These convoys were engaged in a cat-and-mouse game with German "U-boats" (submarines), which prowled the Atlantic, trying to sink supply ships and starve Britain into submission. The outcome of this struggle pivoted on a balance of information: could the Germans locate convoys better than the Allies could locate U-boats, or vice versa?

Germany lost.

A critical reason behind Germany's loss was not made public until 1974: Germany's naval code, Enigma, had been broken by the Polish Cipher Bureau,[11] and the secret had been turned over to the British a few weeks before the Nazi invasion of Poland in 1939. Throughout much of the war, the Allies were able to route convoys around German submarines by listening in to German communications. The British government didn't explain how Enigma was broken until 1996. When the story was finally released (by the US), it revealed that Alan Turing had joined the secret British codebreaking effort at Bletchley Park in 1939, where he became the lead developer of methods for rapid, bulk decryption of German Enigma messages. Turing's Enigma deciphering was an invaluable contribution to the Allied victory over Hitler.

Governments are always tight-lipped about cryptography, but the half-century of official silence about Turing's role in breaking Enigma and saving Britain may be related to some disturbing events after the war; more on that later. Let's get back to number theory and consider an alternative interpretation of Turing's code. Perhaps we had the basic idea right (multiply the message by the key), but erred in using conventional arithmetic instead of modular arithmetic. Maybe this is what Turing meant:

Beforehand. The sender and receiver agree on a large number $n$, which may be made public. (This will be the modulus for all our arithmetic.) As in Version 1.0, they also agree that some prime number $k < n$ will be the secret key.

Encryption. As in Version 1.0, the message $m$ should be another prime in $[0..n)$.
The sender encrypts the message $m$ to produce $\widehat{m}$ by computing $mk$, but this time modulo $n$:

\[ \widehat{m} ::= m \cdot k \quad (\mathbb{Z}_n) \tag{9.13} \]

Decryption. (Uh-oh.)

The decryption step is a problem. We might hope to decrypt in the same way as before by dividing the encrypted message $\widehat{m}$ by the key $k$. The difficulty is that $\widehat{m}$ is the remainder when $mk$ is divided by $n$. So dividing $\widehat{m}$ by $k$ might not even give us an integer!

[11] See http://en.wikipedia.org/wiki/Polish_Cipher_Bureau.

This decoding difficulty can be overcome with a better understanding of when it is ok to divide by $k$ in modular arithmetic.

9.9 Multiplicative Inverses and Cancelling

The multiplicative inverse of a number $x$ is another number $x^{-1}$ such that

\[ x^{-1} \cdot x = 1. \]

From now on, when we say "inverse," we mean multiplicative (not relational) inverse.

For example, over the rational numbers, $1/3$ is, of course, an inverse of 3, since,

\[ \frac{1}{3} \cdot 3 = 1. \]

In fact, with the sole exception of 0, every rational number $n/m$ has an inverse, namely, $m/n$. On the other hand, over the integers, only 1 and $-1$ have inverses. Over the ring $\mathbb{Z}_n$, things get a little more complicated. For example, 2 is a multiplicative inverse of 8 in $\mathbb{Z}_{15}$, since

\[ 2 \cdot 8 = 1 \quad (\mathbb{Z}_{15}). \]

On the other hand, 3 does not have a multiplicative inverse in $\mathbb{Z}_{15}$. We can prove this by contradiction: suppose there was an inverse $j$ for 3, that is

\[ 1 = 3 \cdot j \quad (\mathbb{Z}_{15}). \]

Then multiplying both sides of this equality by 5 leads directly to the contradiction $5 = 0$:

\[ 5 = 5 \cdot (3 \cdot j) = (5 \cdot 3) \cdot j = 0 \cdot j = 0 \quad (\mathbb{Z}_{15}). \]

So there can't be any such inverse $j$.

So some numbers have inverses modulo 15 and others don't. This may seem a little unsettling at first, but there's a simple explanation of what's going on.
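Because the ring in these examples has only 15 elements, the claims about 8 and 3 can be checked by brute force. The following Python sketch (with a hypothetical helper name, `inverses_in`) tries every pair of elements and records which ones multiply to 1.

```python
def inverses_in(n):
    """Map each k in [0..n) that has a multiplicative inverse modulo n
    to that inverse, found by exhaustive search."""
    return {
        k: j
        for k in range(n)
        for j in range(n)
        if (k * j) % n == 1
    }


table = inverses_in(15)
print(table)
print(8 in table, 3 in table)
```

Exhaustive search is only reasonable for tiny moduli like this one; it is meant to show which elements are invertible, not how to find inverses efficiently.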
9.9.1 Relative Primality

Integers that have no prime factor in common are called relatively prime (footnote 12). This is the same as having no common divisor (prime or not) greater than 1. It's also equivalent to saying $\gcd(a, b) = 1$.

For example, 8 and 15 are relatively prime, since $\gcd(8, 15) = 1$. On the other hand, 3 and 15 are not relatively prime, since $\gcd(3, 15) = 3 \neq 1$. This turns out to explain why 8 has an inverse over $\mathbb{Z}_{15}$ and 3 does not.

Lemma 9.9.1. If $k \in [0..n)$ is relatively prime to $n$, then $k$ has an inverse in $\mathbb{Z}_n$.

Proof. If $k$ is relatively prime to $n$, then $\gcd(n, k) = 1$ by definition of gcd. This means we can use the Pulverizer from section 9.2.2 to find a linear combination of $n$ and $k$ equal to 1:

$$sn + tk = 1.$$

So applying the General Principle of Remainder Arithmetic (Lemma 9.7.1), we get

$$(\text{rem}(s, n) \cdot \text{rem}(n, n)) + (\text{rem}(t, n) \cdot \text{rem}(k, n)) = 1 \ (\mathbb{Z}_n).$$

But $\text{rem}(n, n) = 0$, and $\text{rem}(k, n) = k$ since $k \in [0..n)$, so we get

$$\text{rem}(t, n) \cdot k = 1 \ (\mathbb{Z}_n).$$

Thus, $\text{rem}(t, n)$ is a multiplicative inverse of $k$.

By the way, it's nice to know that when they exist, inverses are unique. That is,

Lemma 9.9.2. If $i$ and $j$ are both inverses of $k$ in $\mathbb{Z}_n$, then $i = j$.

Proof.

$$i = i \cdot 1 = i \cdot (k \cdot j) = (i \cdot k) \cdot j = 1 \cdot j = j \ (\mathbb{Z}_n).$$

So the proof of Lemma 9.9.1 shows that for any $k$ relatively prime to $n$, the inverse of $k$ in $\mathbb{Z}_n$ is simply the remainder of a coefficient we can easily find using the Pulverizer.

Working with a prime modulus is attractive here because, like the rational and real numbers, when $p$ is prime, every nonzero number has an inverse in $\mathbb{Z}_p$. But arithmetic modulo a composite is really only a little more painful than working modulo a prime, though you may think this is like the doctor saying, "This is only going to hurt a little," before he jams a big needle in your arm.

Footnote 12: Other texts call them coprime.
9.9.2 Cancellation

Another sense in which real numbers are nice is that it's ok to cancel common factors. In other words, if we know that $tr = ts$ for real numbers $r, s, t$, then as long as $t \neq 0$, we can cancel the $t$'s and conclude that $r = s$. In general, cancellation is not valid in $\mathbb{Z}_n$. For example,

$$3 \cdot 10 = 3 \cdot 5 \ (\mathbb{Z}_{15}), \tag{9.14}$$

but cancelling the 3's leads to the absurd conclusion that 10 equals 5.

The fact that multiplicative terms cannot be cancelled is the most significant way in which $\mathbb{Z}_n$ arithmetic differs from ordinary integer arithmetic.

Definition 9.9.3. A number $k$ is cancellable in $\mathbb{Z}_n$ iff

$$k \cdot a = k \cdot b \text{ implies } a = b \ (\mathbb{Z}_n)$$

for all $a, b \in [0..n)$.

If a number is relatively prime to 15, it can be cancelled by multiplying by its inverse. So cancelling works for numbers that have inverses:

Lemma 9.9.4. If $k$ has an inverse in $\mathbb{Z}_n$, then it is cancellable.

But 3 is not relatively prime to 15, and that's why it is not cancellable. More generally, if $k$ is not relatively prime to $n$, then we can show it isn't cancellable in $\mathbb{Z}_n$ in the same way we showed that 3 is not cancellable in (9.14).

To summarize, we have

Theorem 9.9.5. The following are equivalent for $k \in [0..n)$:

$\gcd(k, n) = 1$;
$k$ has an inverse in $\mathbb{Z}_n$;
$k$ is cancellable in $\mathbb{Z}_n$.

9.9.3 Decrypting (Version 2.0)

Multiplicative inverses are the key to decryption in Turing's code. Specifically, we can recover the original message by multiplying the encoded message by the $\mathbb{Z}_n$-inverse $j$ of the key:

$$\hat{m} \cdot j = (m \cdot k) \cdot j = m \cdot (k \cdot j) = m \cdot 1 = m \ (\mathbb{Z}_n).$$

So all we need to decrypt the message is to find an inverse of the secret key $k$, which will be easy using the Pulverizer, providing $k$ has an inverse. But $k$ is positive and less than the modulus $n$, so one simple way to ensure that $k$ is relatively prime to the modulus is to have $n$ be a prime number.
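The decryption recipe above can be sketched in a few lines. This is an illustration under stated assumptions, not the book's own code: the function names `pulverizer` and `inverse_mod` and the toy values $n = 101$, $k = 37$, $m = 53$ are chosen here for the example, and the linear combination is found with the standard extended Euclidean iteration.

```python
def pulverizer(a, b):
    """Return (g, s, t) with g = gcd(a, b) and s*a + t*b == g."""
    s0, t0 = 1, 0   # coefficients expressing the current a
    s1, t1 = 0, 1   # coefficients expressing the current b
    while b != 0:
        q, r = divmod(a, b)
        a, b = b, r
        s0, s1 = s1, s0 - q * s1
        t0, t1 = t1, t0 - q * t1
    return a, s0, t0


def inverse_mod(k, n):
    """Inverse of k in Z_n, following the proof of Lemma 9.9.1."""
    g, s, t = pulverizer(n, k)          # s*n + t*k == g
    if g != 1:
        raise ValueError("k is not relatively prime to n")
    return t % n                        # rem(t, n)


# Toy parameters (assumptions for this sketch): prime modulus, prime key,
# prime message, all far too small for real use.
n, k, m = 101, 37, 53

m_hat = (m * k) % n                     # encryption (9.13)
j = inverse_mod(k, n)                   # Z_n-inverse of the key
recovered = (m_hat * j) % n             # decryption

print(m_hat, j, recovered)
```

Because $n$ is prime here, every key in $(0..n)$ is relatively prime to it, so `inverse_mod` never raises for a valid key.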
9.9.4 Breaking Turing's Code (Version 2.0)

The Germans didn't bother to encrypt their weather reports with the highly secure Enigma system. After all, so what if the Allies learned that there was rain off the south coast of Iceland? But amazingly, this practice provided the British with a critical edge in the Atlantic naval battle during 1941.

The problem was that some of those weather reports had originally been transmitted using Enigma from U-boats out in the Atlantic. Thus, the British obtained both unencrypted reports and the same reports encrypted with Enigma. By comparing the two, the British were able to determine which key the Germans were using that day and could read all other Enigma-encoded traffic. Today, this would be called a known-plaintext attack.

Let's see how a known-plaintext attack would work against Turing's code. Suppose that the Nazis know both the plain text $m$ and its encrypted form $\hat{m}$. Now

$$\hat{m} = m \cdot k \ (\mathbb{Z}_n),$$

and since $m$ is positive and less than the prime $n$, the Nazis can use the Pulverizer to find the $\mathbb{Z}_n$-inverse $j$ of $m$. Now

$$j \cdot \hat{m} = j \cdot (m \cdot k) = (j \cdot m) \cdot k = 1 \cdot k = k \ (\mathbb{Z}_n).$$

So by computing $j \cdot \hat{m} = k \ (\mathbb{Z}_n)$, the Nazis get the secret key and can then decrypt any message!

This is a huge vulnerability, so Turing's hypothetical Version 2.0 code has no practical value. Fortunately, Turing got better at cryptography after devising this code; his subsequent deciphering of Enigma messages surely saved thousands of lives, if not the whole of Britain.

9.9.5 Turing Postscript

A few years after the war, Turing's home was robbed. Detectives soon determined that a former homosexual lover of Turing's had conspired in the robbery. So they arrested him (that is, they arrested Alan Turing) because at that time in Britain, homosexuality was a crime punishable by up to two years in prison. Turing was sentenced to a hormonal "treatment" for his homosexuality: he was given estrogen injections.
He began to develop breasts.

Three years later, Alan Turing, the founder of computer science, was dead. His mother explained what happened in a biography of her own son. Despite her repeated warnings, Turing carried out chemistry experiments in his own home. Apparently, her worst fear was realized: by working with potassium cyanide while eating an apple, he poisoned himself.

However, Turing remained a puzzle to the very end. His mother was a devout woman who considered suicide a sin. And, other biographers have pointed out, Turing had previously discussed committing suicide by eating a poisoned apple. Evidently, Alan Turing, who founded computer science and saved his country, took his own life in the end, and in just such a way that his mother could believe it was an accident.

Turing's last project before he disappeared from public view in 1939 involved the construction of an elaborate mechanical device to test a mathematical conjecture called the Riemann Hypothesis. This conjecture first appeared in a sketchy paper by Bernhard Riemann in 1859 and is now one of the most famous unsolved problems in mathematics.

9.10 Euler's Theorem

The RSA cryptosystem examined in the next section, and other current schemes for encoding secret messages, involve computing remainders of numbers raised to large powers. A basic fact about remainders of powers follows from a theorem due to Euler about congruences.

Definition 9.10.1. For $n > 0$, define (footnote 13)

$\phi(n)$ ::= the number of integers in $[0..n)$ that are relatively prime to $n$.

This function is known as Euler's $\phi$ function (footnote 14).

For example, $\phi(7) = 6$ because all 6 positive numbers in $[0..7)$ are relatively prime to the prime number 7. Only 0 is not relatively prime to 7. Also, $\phi(12) = 4$ since 1, 5, 7, and 11 are the only numbers in $[0..12)$ that are relatively prime to 12.
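Definition 9.10.1 translates directly into a counting loop. The sketch below is a literal, unoptimized reading of the definition; the name `phi` is an assumption for this illustration, and `math.gcd` is Python's standard-library gcd.

```python
from math import gcd


def phi(n):
    """Count the integers in [0..n) that are relatively prime to n."""
    return sum(1 for k in range(n) if gcd(k, n) == 1)


print(phi(7))    # 6, as in the text
print(phi(12))   # 4, as in the text
print([k for k in range(12) if gcd(k, 12) == 1])   # [1, 5, 7, 11]
```

The loop takes time proportional to $n$, so it is useful for checking small cases rather than for the large moduli used in cryptography.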
More generally, if $p$ is prime, then $\phi(p) = p - 1$ since every positive number in $[0..p)$ is relatively prime to $p$. When $n$ is composite, however, the $\phi$ function gets a little complicated. We'll get back to it in the next section.

Euler's Theorem is traditionally stated in terms of congruence:

Theorem (Euler's Theorem). If $n$ and $k$ are relatively prime, then

$$k^{\phi(n)} \equiv 1 \pmod{n}. \tag{9.15}$$

Footnote 13: Since 0 is not relatively prime to any $n > 1$, $\phi(n)$ could equivalently be defined for such $n$ using the interval $(0..n)$ instead of $[0..n)$.

Footnote 14: Some texts call it Euler's totient function.

The Riemann Hypothesis

The formula for the sum of an infinite geometric series says:

$$1 + x + x^2 + x^3 + \cdots = \frac{1}{1 - x}.$$

Substituting $x = \frac{1}{2^s}$, $x = \frac{1}{3^s}$, $x = \frac{1}{5^s}$, and so on for each prime number gives a sequence of equations:

$$1 + \frac{1}{2^s} + \frac{1}{2^{2s}} + \frac{1}{2^{3s}} + \cdots = \frac{1}{1 - 1/2^s}$$
$$1 + \frac{1}{3^s} + \frac{1}{3^{2s}} + \frac{1}{3^{3s}} + \cdots = \frac{1}{1 - 1/3^s}$$
$$1 + \frac{1}{5^s} + \frac{1}{5^{2s}} + \frac{1}{5^{3s}} + \cdots = \frac{1}{1 - 1/5^s}$$
$$\vdots$$

Multiplying together all the left-hand sides and all the right-hand sides gives:

$$\sum_{n=1}^{\infty} \frac{1}{n^s} = \prod_{p \in \text{primes}} \left( \frac{1}{1 - 1/p^s} \right).$$

The sum on the left is obtained by multiplying out all the infinite series and applying the Fundamental Theorem of Arithmetic. For example, the term $1/300^s$ in the sum is obtained by multiplying $1/2^{2s}$ from the first equation by $1/3^s$ in the second and $1/5^{2s}$ in the third. Riemann noted that every prime appears in the expression on the right. So he proposed to learn about the primes by studying the equivalent, but simpler expression on the left. In particular, he regarded $s$ as a complex number and the left side as a function $\zeta(s)$. Riemann found that the distribution of primes is related to values of $s$ for which $\zeta(s) = 0$, which led to his famous conjecture:

Definition 9.9.6. The Riemann Hypothesis: Every nontrivial zero of the zeta function $\zeta(s)$ lies on the line $s = 1/2 + ci$ in the complex plane.

A proof would immediately imply, among other things, a strong form of the Prime Number Theorem.
Researchers continue to work intensely to settle this conjecture, as they have for over a century. It is another of the Millennium Problems whose solver will earn \$1,000,000 from the Clay Institute.

Things get simpler when we rephrase Euler's Theorem in terms of $\mathbb{Z}_n$.

Definition 9.10.2. Let $\mathbb{Z}_n^*$ be the integers in $(0..n)$ that are relatively prime to $n$ (footnote 15):

$$\mathbb{Z}_n^* ::= \{k \in (0..n) \mid \gcd(k, n) = 1\}. \tag{9.16}$$

Consequently,

$$\phi(n) = |\mathbb{Z}_n^*|.$$

Theorem 9.10.3 (Euler's Theorem for $\mathbb{Z}_n$). For all $k \in \mathbb{Z}_n^*$,

$$k^{\phi(n)} = 1 \ (\mathbb{Z}_n). \tag{9.17}$$

Theorem 9.10.3 will follow from two very easy lemmas. Let's start by observing that $\mathbb{Z}_n^*$ is closed under multiplication in $\mathbb{Z}_n$:

Lemma 9.10.4. If $j, k \in \mathbb{Z}_n^*$, then $j \cdot_n k \in \mathbb{Z}_n^*$.

There are lots of easy ways to prove this (see Problem 9.67).

Definition 9.10.5. For any element $k$ and subset $S$ of $\mathbb{Z}_n$, let

$$kS ::= \{k \cdot_n s \mid s \in S\}.$$

Lemma 9.10.6. If $k \in \mathbb{Z}_n^*$ and $S \subseteq \mathbb{Z}_n$, then

$$|kS| = |S|.$$

Proof. Since $k \in \mathbb{Z}_n^*$, by Theorem 9.9.5 it is cancellable. Therefore,

$$[ks = kt \ (\mathbb{Z}_n)] \text{ implies } s = t.$$

So multiplying by $k$ in $\mathbb{Z}_n$ maps all the elements of $S$ to distinct elements of $kS$, which implies $S$ and $kS$ are the same size.

Corollary 9.10.7. If $k \in \mathbb{Z}_n^*$, then

$$k\mathbb{Z}_n^* = \mathbb{Z}_n^*.$$

Proof. A product of elements in $\mathbb{Z}_n^*$ remains in $\mathbb{Z}_n^*$ by Lemma 9.10.4. So if $k \in \mathbb{Z}_n^*$, then $k\mathbb{Z}_n^* \subseteq \mathbb{Z}_n^*$. But by Lemma 9.10.6, $k\mathbb{Z}_n^*$ and $\mathbb{Z}_n^*$ are the same size, so they must be equal.

Footnote 15: Some other texts use the notation $n^*$ for $\mathbb{Z}_n^*$.

Now we can complete the proof of Euler's Theorem 9.10.3 for $\mathbb{Z}_n$:

Proof. Let

$$P ::= k_1 \cdot k_2 \cdots k_{\phi(n)} \ (\mathbb{Z}_n)$$

be the product in $\mathbb{Z}_n$ of all the numbers in $\mathbb{Z}_n^*$. Let

$$Q ::= (k \cdot k_1) \cdot (k \cdot k_2) \cdots (k \cdot k_{\phi(n)}) \ (\mathbb{Z}_n)$$

for some $k \in \mathbb{Z}_n^*$. Factoring out $k$'s immediately gives

$$Q = k^{\phi(n)} P \ (\mathbb{Z}_n).$$

But $Q$ is the same as the product of the numbers in $k\mathbb{Z}_n^*$, and $k\mathbb{Z}_n^* = \mathbb{Z}_n^*$, so we realize that $Q$ is the product of the same numbers as $P$, just in a different order.
Altogether, we have

$$P = Q = k^{\phi(n)} P \ (\mathbb{Z}_n).$$

Furthermore, $P \in \mathbb{Z}_n^*$ by Lemma 9.10.4, and so it can be cancelled from both sides of this equality, giving

$$1 = k^{\phi(n)} \ (\mathbb{Z}_n).$$

Euler's theorem offers another way to find inverses modulo $n$: if $k$ is relatively prime to $n$, then $k^{\phi(n)-1}$ is a $\mathbb{Z}_n$-inverse of $k$, and we can compute this power of $k$ efficiently using fast exponentiation. However, this approach requires computing $\phi(n)$. In the next section, we'll show that computing $\phi(n)$ is easy if we know the prime factorization of $n$. But we know that finding the factors of $n$ is generally hard to do when $n$ is large, and so the Pulverizer remains the best approach to computing inverses modulo $n$.

Fermat's Little Theorem

For the record, we mention a famous special case of Euler's Theorem that was known to Fermat a century earlier.

Corollary 9.10.8 (Fermat's Little Theorem). Suppose $p$ is a prime and $k$ is not a multiple of $p$. Then

$$k^{p-1} \equiv 1 \pmod{p}.$$

9.10.1 Computing Euler's $\phi$ Function

RSA works using arithmetic modulo the product of two large primes, so we begin with an elementary explanation of how to compute $\phi(pq)$ for primes $p$ and $q$:

Lemma 9.10.9.

$$\phi(pq) = (p - 1)(q - 1)$$

for primes $p \neq q$.

Proof. Since $p$ and $q$ are prime, any number that is not relatively prime to $pq$ must be a multiple of $p$ or a multiple of $q$. Among the $pq$ numbers in $[0..pq)$, there are precisely $q$ multiples of $p$ and $p$ multiples of $q$. Since $p$ and $q$ are relatively prime, the only number in $[0..pq)$ that is a multiple of both $p$ and $q$ is 0. Hence, there are $p + q - 1$ numbers in $[0..pq)$ that are not relatively prime to $pq$. This means that

$$\phi(pq) = pq - (p + q - 1) = (p - 1)(q - 1),$$

as claimed (footnote 16).

The following theorem provides a way to calculate $\phi(n)$ for arbitrary $n$.

Theorem 9.10.10.
(a) If $p$ is a prime, then $\phi(p^k) = p^k - p^{k-1}$ for $k \geq 1$.
(b) If $a$ and $b$ are relatively prime, then $\phi(ab) = \phi(a)\phi(b)$.
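The two rules of Theorem 9.10.10 suggest a simple way to compute $\phi(n)$ once $n$ is factored, and the result can then be used to check Euler's Theorem and to compute inverses as $k^{\phi(n)-1}$. The sketch below is an illustration only: the function names are assumptions, and factoring by trial division is practical only for small $n$, which is exactly the limitation the text points out.

```python
from math import gcd


def prime_power_factors(n):
    """Return {p: k} such that n is the product of the p**k (trial division)."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:                       # whatever remains is a single prime
        factors[n] = factors.get(n, 0) + 1
    return factors


def phi_from_factors(n):
    """phi(n) via Theorem 9.10.10: multiply p**k - p**(k-1) over prime powers."""
    result = 1
    for p, k in prime_power_factors(n).items():
        result *= p**k - p**(k - 1)
    return result


def inverse_by_euler(k, n):
    """Z_n-inverse of k computed as k**(phi(n) - 1), using fast exponentiation."""
    if gcd(k, n) != 1:
        raise ValueError("k is not relatively prime to n")
    return pow(k, phi_from_factors(n) - 1, n)


print(phi_from_factors(15))         # (3 - 1) * (5 - 1) = 8, matching Lemma 9.10.9
print(inverse_by_euler(8, 15))      # 2, the inverse of 8 in Z_15

# Spot-check Euler's Theorem for every k relatively prime to n = 15.
print(all(pow(k, phi_from_factors(15), 15) == 1
          for k in range(1, 15) if gcd(k, 15) == 1))
```

Python's built-in three-argument `pow` performs the modular exponentiation without ever forming the full power.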
Here's an example of using Theorem 9.10.10 to compute $\phi(300)$:

$$\begin{aligned}
\phi(300) &= \phi(2^2 \cdot 3 \cdot 5^2) \\
&= \phi(2^2) \cdot \phi(3) \cdot \phi(5^2) && \text{(by Theorem 9.10.10.(b))} \\
&= (2^2 - 2^1)(3^1 - 3^0)(5^2 - 5^1) && \text{(by Theorem 9.10.10.(a))} \\
&= 80.
\end{aligned}$$

Note that Lemma 9.10.9 also follows as a special case of Theorem 9.10.10.(b), since we know that $\phi(p) = p - 1$ for any prime $p$.

To prove Theorem 9.10.10.(a), notice that every $p$th number among the $p^k$ numbers in $[0..p^k)$ is divisible by $p$, and only these are divisible by $p$. So $1/p$ of these numbers are divisible by $p$ and the remaining ones are not, which makes the remaining ones exactly the numbers relatively prime to $p^k$. That is,

$$\phi(p^k) = p^k - (1/p)p^k = p^k - p^{k-1}.$$

We'll leave a proof of Theorem 9.10.10.(b) to Problem 9.61.

As a consequence of Theorem 9.10.10, we have

Corollary 9.10.11. For any number $n$, if $p_1, p_2, \ldots, p_j$ are the (distinct) prime factors of $n$, then

$$\phi(n) = n \left(1 - \frac{1}{p_1}\right) \left(1 - \frac{1}{p_2}\right) \cdots \left(1 - \frac{1}{p_j}\right).$$

We'll give another proof of Corollary 9.10.11 based on rules for counting in Section 15.9.5.

Footnote 16: This proof previews a kind of counting argument that we will explore more fully in Part III.

9.11 RSA Public Key Encryption

Turing's code did not work as he hoped. However, his essential idea (using number theory as the basis for cryptography) succeeded spectacularly in the decades after his death.

In 1977, Ronald Rivest, Adi Shamir, and Leonard Adleman at MIT proposed a highly secure cryptosystem, called RSA, based on number theory. The purpose of the RSA scheme is to transmit secret messages over public communication channels. As with Turing's codes, the messages transmitted are nonnegative integers of some fixed size.

Moreover, RSA has a major advantage over traditional codes: the sender and receiver of an encrypted message need not meet beforehand to agree on a secret key. Rather, the receiver has both a private key, which they guard closely, and a public key, which they distribute as widely as possible.
A sender wishing to transmit a secret message to the receiver encrypts their message using the receiver's widely distributed public key. The receiver can then decrypt the received message using their closely held private key. The use of such a public key cryptography system allows you and Amazon, for example, to engage in a secure transaction without meeting up beforehand in a dark alley to exchange a key.

Interestingly, RSA does not operate modulo a prime, as Turing's hypothetical Version 2.0 may have, but rather modulo the product of two large primes, typically primes that are hundreds of digits long. Also, instead of encrypting by multiplication with a secret key, RSA exponentiates to a secret power, which is why Euler's Theorem is central to understanding RSA.

The scheme for RSA public key encryption appears in the box.

If the message $m$ is relatively prime to $n$, then a simple application of Euler's Theorem implies that this way of decoding the encrypted message indeed reproduces the original unencrypted message. In fact, the decoding always works, even in the (highly unlikely) case that $m$ is not relatively prime to $n$. The details are worked out in Problem 9.81.

The RSA Cryptosystem

A Receiver who wants to be able to receive secret numerical messages creates a private key, which they keep secret, and a public key, which they make publicly available. Anyone with the public key can then be a Sender who can publicly send secret messages to the Receiver, even if they have never communicated or shared any information besides the public key.

Here is how they do it:

Beforehand: The Receiver creates a public key and a private key as follows.

1. Generate two distinct primes, $p$ and $q$. These are used to generate the private key, and they must be kept hidden. (In current practice, $p$ and $q$ are chosen to be hundreds of digits long.)
2. Let $n ::= pq$.
3. Select an integer $e \in [0..n)$ such that $\gcd(e, (p - 1)(q - 1)) = 1$. The public key is the pair $(e, n)$. This should be distributed widely.
4. Let the private key $d \in [0..n)$ be the inverse of $e$ in the ring $\mathbb{Z}_{(p-1)(q-1)}$. This private key can be found using the Pulverizer. The private key $d$ should be kept hidden!

Encoding: To transmit a message $m \in [0..n)$ to Receiver, a Sender uses the public key to encrypt $m$ into a numerical message

$$\hat{m} ::= m^e \ (\mathbb{Z}_n).$$

The Sender can then publicly transmit $\hat{m}$ to the Receiver.

Decoding: The Receiver decrypts message $\hat{m}$ back to message $m$ using the private key:

$$m = \hat{m}^d \ (\mathbb{Z}_n).$$

Why is RSA thought to be secure? It would be easy to figure out the private key $d$ if you knew $p$ and $q$; you could do it the same way the Receiver does using the Pulverizer. But assuming the conjecture that it is hopelessly hard to factor a number that is the product of two primes with hundreds of digits, an effort to factor $n$ is not going to break RSA.

Could there be another approach to reverse engineer the private key $d$ from the public key that did not involve factoring $n$? Not really. It turns out that given just the private and the public keys, it is easy to factor $n$ (footnote 17); a proof of this is sketched in Problem 9.83. So if we are confident that factoring is hopelessly hard, then we can be equally confident that finding the private key just from the public key will be hopeless.

But even if we are confident that an RSA private key won't be found, this doesn't rule out the possibility of decoding RSA messages in a way that sidesteps the private key. It is an important unproven conjecture in cryptography that any way of cracking RSA, not just by finding the secret key, would imply the ability to factor. This would be a much stronger theoretical assurance of RSA security than is presently known.
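For concreteness, the key generation, encoding, and decoding steps of the RSA scheme can be traced with toy numbers. Everything specific in this sketch is an assumption made for illustration: the function names, the tiny primes $p = 61$ and $q = 53$, the exponent $e = 17$, and the message $m = 65$. The private key is computed with Python's built-in `pow(e, -1, modulus)` (available from Python 3.8) as a stand-in for the Pulverizer.

```python
from math import gcd


def make_keys(p, q, e):
    """Return ((e, n), d) following steps 1-4 of the scheme (toy sizes only)."""
    n = p * q
    totient = (p - 1) * (q - 1)
    if gcd(e, totient) != 1:
        raise ValueError("e must be relatively prime to (p-1)(q-1)")
    d = pow(e, -1, totient)         # inverse of e in Z_{(p-1)(q-1)}
    return (e, n), d


def encode(m, public_key):
    """Sender's step: m_hat ::= m**e (Z_n)."""
    e, n = public_key
    return pow(m, e, n)


def decode(m_hat, d, n):
    """Receiver's step: m = m_hat**d (Z_n)."""
    return pow(m_hat, d, n)


public_key, d = make_keys(61, 53, 17)
e, n = public_key
m = 65
m_hat = encode(m, public_key)
print(n, d, m_hat, decode(m_hat, d, n))
```

Primes this small can be factored instantly, so the example shows only the arithmetic of the scheme and none of its security.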
But the real reason for confidence is that RSA has withstood all attacks by the world's most sophisticated cryptographers for nearly 40 years. Despite decades of these attacks, no significant weakness has been found. That's why the mathematical, financial, and intelligence communities are betting the family jewels on the security of RSA encryption.

You can hope that with more studying of number theory, you will be the first to figure out how to do factoring quickly and, among other things, break RSA. But be further warned that even Gauss worked on factoring for years without a lot to show for his efforts, and if you do figure it out, you might wind up meeting some humorless fellows working for a Federal agency in charge of security....

Footnote 17: In practice, for this reason, the public and private keys should be randomly chosen so that neither is "too small."

9.12 What has SAT got to do with it?

So why does society, or at least everybody's secret codes, fall apart if there is an efficient test for satisfiability (SAT), as we claimed in Section 3.5? To explain this, remember that RSA can be managed computationally because multiplication of two primes is fast, but factoring a product of two primes seems to be overwhelmingly demanding.

Let's begin with the observation from Section 3.2 that a digital circuit can be described by a bunch of propositional formulas of about the same total size as the circuit. So testing circuits for satisfiability is equivalent to the SAT problem for propositional formulas (see Problem 3.22).

Now designing digital multiplication circuits is completely routine. We can easily build a digital "product checker" circuit out of AND, OR, and NOT gates with 1 output wire and $4n$ digital input wires.
The first $n$ inputs are for the binary representation of an integer $i$, the next $n$ inputs for the binary representation of an integer $j$, and the remaining $2n$ inputs for the binary representation of an integer $k$. The output of the circuit is 1 iff $ij = k$ and $i, j > 1$. A straightforward design for such a product checker uses a number of gates proportional to $n^2$.

Now here's how to factor any number $m$ with a length $2n$ binary representation using a SAT solver. First, fix the last $2n$ digital inputs (the ones for the binary representation of $k$) so that $k$ equals $m$.

Next, set the first of the $n$ digital inputs for the representation of $i$ to be 1. Do a SAT test to see if there is a satisfying assignment of values for the remaining $2n - 1$ inputs used for the $i$ and $j$ representations. That is, see if the remaining inputs for $i$ and $j$ can be filled in to cause the circuit to give output 1. If there is such an assignment, fix the first $i$-input to be 1, otherwise fix it to be 0. So now we have set the first $i$-input equal to the first digit of the binary representation of an $i$ such that $ij = m$.

Now do the same thing to fix the second of the $n$ digital inputs for the representation of $i$, and then the third, proceeding in this way through all the $n$ inputs for the number $i$. At this point, we have the complete $n$-bit binary representation of an $i > 1$ such that $ij = m$ for some $j > 1$. In other words, we have found an integer $i$ that is a factor of $m$. We can now find $j$ by dividing $m$ by $i$.

So after $n$ SAT tests, we have factored $m$. This means that if SAT for digital circuits with $4n$ inputs and about $n^2$ gates could be determined by a procedure taking a number of steps bounded above by a degree $d$ polynomial in $n$, then $2n$ digit numbers can be factored in $n$ times this many steps, that is, with a number of steps bounded by a polynomial of degree $d + 1$ in $n$. So if SAT could be solved in polynomial time, then so could factoring, and consequently RSA would be "easy" to break.
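The bit-by-bit procedure above can be sketched in code. Since no efficient SAT solver is assumed to exist, the function `sat_test` below is a brute-force stand-in for one: it simply tries every completion of the unfixed inputs. All names are hypothetical, the product checker is modelled as an ordinary function rather than a gate-level circuit, and "first input" is read as the most significant bit of $i$.

```python
from itertools import product


def product_checker(i, j, k):
    """Model of the circuit: output 1 iff i*j == k and i, j > 1."""
    return i > 1 and j > 1 and i * j == k


def bits_to_int(bits):
    """Interpret a list of bits, most significant first, as an integer."""
    value = 0
    for b in bits:
        value = 2 * value + b
    return value


def sat_test(n, m, i_prefix):
    """Stand-in SAT oracle (exponential time): can the remaining i-inputs and
    all n j-inputs be filled in so that the checker outputs 1 with k = m?"""
    free = n - len(i_prefix)
    for rest in product((0, 1), repeat=free):
        i = bits_to_int(list(i_prefix) + list(rest))
        for j in range(2 ** n):
            if product_checker(i, j, m):
                return True
    return False


def factor_with_sat(m, n):
    """Fix the n bits of i one at a time using n SAT tests.

    Returns (i, j) with i*j == m and i, j > 1, or None when m has no
    factorization into two n-bit numbers greater than 1.
    """
    i_bits = []
    for _ in range(n):
        # Try 1 for the next bit; keep it if the circuit is still satisfiable.
        i_bits.append(1 if sat_test(n, m, i_bits + [1]) else 0)
    i = bits_to_int(i_bits)
    if i <= 1:
        return None
    return i, m // i


print(factor_with_sat(35, 3))
print(factor_with_sat(13, 3))
```

The outer loop makes exactly $n$ oracle calls, which is the point of the reduction; the exponential cost here lives entirely inside the stand-in oracle.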
9.13 References

[2], [42]

Problems for Section 9.1

Practice Problems

Problem 9.1.
Prove that a linear combination of linear combinations of integers $a_0, \ldots, a_n$ is a linear combination of $a_0, \ldots, a_n$.

Class Problems

Problem 9.2.
A number is perfect if it is equal to the sum of its positive divisors, other than itself. For example, 6 is perfect, because $6 = 1 + 2 + 3$. Similarly, 28 is perfect, because $28 = 1 + 2 + 4 + 7 + 14$. Explain why $2^{k-1}(2^k - 1)$ is perfect when $2^k - 1$ is prime (footnote 18).

Footnote 18: Euclid proved this 2300 years ago. About 250 years ago, Euler proved the converse: every even perfect number is of this form (for a simple proof see http://primes.utm.edu/notes/proofs/EvenPerfect.html). It is not known if there are any odd perfect numbers at all. It is also not known if there are an infinite number of even perfect numbers. One of the charms of number theory is that simple results like those given in this problem lie at the brink of the unknown.

Problems for Section 9.2

Practice Problems

Problem 9.3.
Let

$$x ::= 21212121, \qquad y ::= 12121212.$$

Use the Euclidean algorithm to find the GCD of $x$ and $y$. Hint: Looks scary, but it's not.

Problem 9.4.
Let

$$x ::= 17^{88} \cdot 31^{5} \cdot 37^{2} \cdot 59^{1000}$$
$$y ::= 19^{(9^{22})} \cdot 37^{12} \cdot 53^{3678} \cdot 59^{29}.$$

(a) What is $\gcd(x, y)$?
(b) What is $\text{lcm}(x, y)$? ("lcm" is least common multiple.)

Problem 9.5.
Show that there is an integer $x$ such that

$$ax \equiv b \pmod{n}$$

iff

$$\gcd(a, n) \mid b.$$

Problem 9.6.
Prove that

$$\gcd(a^5, b^5) = \gcd(a, b)^5$$

for all $a, b \in \mathbb{Z}$.

Class Problems

Problem 9.7.
Use the Euclidean Algorithm to prove that

$$\gcd(13a + 8b, 5a + 3b) = \gcd(a, b).$$

Problem 9.8.
(a) Use the Pulverizer to find integers $x, y$ such that

$$x \cdot 30 + y \cdot 22 = \gcd(30, 22).$$

(b) Now find integers $x', y'$ with $0 \leq y' < 30$ such that

$$x' \cdot 30 + y' \cdot 22 = \gcd(30, 22).$$

Problem 9.9.
(a) Use the Pulverizer to find $\gcd(84, 108)$.
(b) Find integers $x$, $y$ with $0 \leq y < 84$ such that

$$x \cdot 84 + y \cdot 108 = \gcd(84, 108).$$

(c) Is there a multiplicative inverse of 84 in $\mathbb{Z}_{108}$? If not, briefly explain why; otherwise find it.

Problem 9.10.
Indicate true or false for the following statements about the greatest common divisor, and provide counterexamples for those that are false.
(a) If $\gcd(a, b) \neq 1$ and $\gcd(b, c) \neq 1$, then $\gcd(a, c) \neq 1$. (true / false)
(b) If $a \mid bc$ and $\gcd(a, b) = 1$, then $a \mid c$. (true / false)
(c) $\gcd(a^n, b^n) = (\gcd(a, b))^n$. (true / false)
(d) $\gcd(ab, ac) = a \gcd(b, c)$. (true / false)
(e) $\gcd(1 + a, 1 + b) = 1 + \gcd(a, b)$. (true / false)
(f) If an integer linear combination of $a$ and $b$ equals 1, then so does some integer linear combination of $a$ and $b^2$. (true / false)
(g) If no integer linear combination of $a$ and $b$ equals 2, then neither does any integer linear combination of $a^2$ and $b^2$. (true / false)

Problem 9.11.
For nonzero integers $a$, $b$, prove the following properties of divisibility and GCDs. (You may use Theorem 9.2.2 that $\gcd(a, b)$ is an integer linear combination of $a$ and $b$. You may not appeal to uniqueness of prime factorization, Theorem 9.4.1, because some of these properties are needed to prove unique factorization.)
(a) Every common divisor of $a$ and $b$ divides $\gcd(a, b)$.
(b) $\gcd(ka, kb) = k \cdot \gcd(a, b)$ for all $k > 0$.
(c) If $a \mid bc$ and $\gcd(a, b) = 1$, then $a \mid c$.
(d) If $p \mid bc$ for some prime $p$ then $p \mid b$ or $p \mid c$.
(e) Let $m$ be the smallest integer linear combination of $a$ and $b$ that is positive. Show that $m = \gcd(a, b)$.

Homework Problems

Problem 9.12.
Here is a game you can analyze with number theory and always beat me. We start with two distinct, positive integers written on a blackboard. Call them $a$ and $b$. Now we take turns. (I'll let you decide who goes first.) On each turn, the player must write a new positive integer on the board that is the difference of two numbers that are already there.
If a player cannot play, then they lose.

For example, suppose that 12 and 15 are on the board initially. Your first play must be 3, which is $15 - 12$. Then I might play 9, which is $12 - 3$. Then you might play 6, which is $15 - 9$. Then I can't play, so I lose.
(a) Show that every number on the board at the end of the game is a multiple of $\gcd(a, b)$.
(b) Show that every positive multiple of $\gcd(a, b)$ up to $\max(a, b)$ is on the board at the end of the game.
(c) Describe a strategy that lets you win this game every time.

Problem 9.13.
Define the Pulverizer State machine to have:

states ::= $\mathbb{N}^6$
start state ::= $(a, b, 0, 1, 1, 0)$ (where $a \geq b > 0$)
transitions ::= $(x, y, s, t, u, v) \longrightarrow (y, \text{rem}(x, y), u - sq, v - tq, s, t)$ (for $q = \text{qcnt}(x, y)$, $y > 0$).

(a) Show that the following properties are preserved invariants of the Pulverizer machine:

$$\gcd(x, y) = \gcd(a, b), \tag{Inv1}$$
$$sa + tb = y, \text{ and} \tag{Inv2}$$
$$ua + vb = x. \tag{Inv3}$$

(b) Conclude that the Pulverizer machine is partially correct.
(c) Explain why the machine terminates after at most the same number of transitions as the Euclidean algorithm.

Problem 9.14.
Prove that the smallest positive integers $a \geq b$ for which, starting in state $(a, b)$, the Euclidean state machine will make $n$ transitions are $F(n + 1)$ and $F(n)$, where $F(n)$ is the $n$th Fibonacci number.
Hint: Induction.
In a later chapter, we'll show that $F(n) \leq \varphi^n$ where $\varphi$ is the golden ratio $(1 + \sqrt{5})/2$. This implies that the Euclidean algorithm halts after at most $\log_{\varphi}(a)$ transitions. This is somewhat smaller than the $2 \log_2 a$ bound derived from equation (9.4).

Problem 9.15.
Let's extend the jug filling scenario of Section 9.1.3 to three jugs and a receptacle. Suppose the jugs can hold $a$, $b$ and $c$ gallons of water, respectively. The receptacle can be used to store an unlimited amount of water, but has no measurement markings. Excess water can be dumped into the drain. Among the possible moves are:

1. fill a bucket from the hose,
2. pour from the receptacle to a bucket until the bucket is full or the receptacle is empty, whichever happens first,
3. empty a bucket to the drain,
4. empty a bucket to the receptacle, and
5. pour from one bucket to another until either the first is empty or the second is full.

(a) Model this scenario with a state machine. (What are the states? How does a state change in response to a move?)
(b) Prove that Bruce can get $k \in \mathbb{N}$ gallons of water into the receptacle using the above operations if $\gcd(a, b, c) \mid k$.

Problem 9.16.
The Binary GCD state machine computes the GCD of integers $a, b > 0$ using only division by 2 and subtraction, which makes it run very efficiently on hardware that uses binary representation of numbers. In practice, it runs more quickly than the more famous Euclidean algorithm described in Section 9.2.1.

states ::= $\mathbb{N}^3$
start state ::= $(a, b, 1)$
transitions ::= if $\min(x, y) > 0$, then $(x, y, e) \longrightarrow$
  $(x/2, y/2, 2e)$ (if $2 \mid x$ and $2 \mid y$) (i1)
  $(x/2, y, e)$ (else if $2 \mid x$) (i2)
  $(x, y/2, e)$ (else if $2 \mid y$) (i3)
  $(x - y, y, e)$ (else if $x > y$) (i4)
  $(y - x, x, e)$ (else if $y > x$) (i5)
  $(1, 0, ex)$ (otherwise ($x = y$)). (i6)

(a) Use the Invariant Principle to prove that if this machine stops, that is, reaches a state $(x, y, e)$ in which no transition is possible, then $e = \gcd(a, b)$.
(b) Prove that rule (i1)

$$(x, y, e) \longrightarrow (x/2, y/2, 2e)$$

is never executed after any of the other rules is executed.
(c) Prove that the machine reaches a final state in at most $1 + 3(\log a + \log b)$ transitions. (This is a coarse bound; you may be able to get a better one.)

Problem 9.17.
Extend the binary gcd procedure of Problem 9.16 to obtain a new pulverizer that uses only division by 2 and subtraction.
Hint: After the binary gcd procedure has factored out 2's, it starts computing the $\gcd(a, b)$ for numbers $a, b$ at least one of which is odd.
It does this by successively updating a pair of numbers $(x, y)$ such that $\gcd(x, y) = \gcd(a, b)$. Extend the procedure to find and update coefficients $u_x, v_x, u_y, v_y$ such that

$$u_x a + v_x b = x \text{ and } u_y a + v_y b = y.$$

To see how to update the coefficients when at least one of $a$ and $b$ is odd and $ua + vb$ is even, show that either $u$ and $v$ are both even, or else $u - b$ and $v + a$ are both even.

Problem 9.18.
For any set $A$ of integers,

$\gcd(A)$ ::= the greatest common divisor of the elements of $A$.

The following useful property of gcd's of sets is easy to take for granted:

Theorem.

$$\gcd(A \cup B) = \gcd(\gcd(A), \gcd(B)), \tag{AuB}$$

for all finite sets $A, B \subset \mathbb{Z}$.

The theorem has an easy proof as a Corollary of the Unique Factorization Theorem. In this problem we develop a proof by induction of Theorem (AuB) just making repeated use of Lemma 9.2.6.b:

$$(d \mid a \text{ AND } d \mid b) \text{ IFF } d \mid \gcd(a, b). \tag{gcddiv}$$

The key to proving (AuB) will be generalizing (gcddiv) to finite sets.

Definition. For any subset $A \subseteq \mathbb{Z}$,

$$d \mid A ::= \forall a \in A.\ d \mid a. \tag{divdef}$$

Lemma.

$$d \mid A \text{ IFF } d \mid \gcd(A), \tag{dAdgA}$$

for all $d \in \mathbb{Z}$ and finite sets $A \subset \mathbb{Z}$.

(a) Prove that

$$\gcd(a, \gcd(b, c)) = \gcd(\gcd(a, b), c) \tag{gcd-associativity}$$

for all integers $a, b, c$.

From here on we write "$a \cup A$" as an abbreviation for "$\{a\} \cup A$."

(b) Prove that

$$d \mid (a \cup b \cup C) \text{ IFF } d \mid (\gcd(a, b) \cup C) \tag{abCgcd}$$

for all $a, b, d \in \mathbb{Z}$, and $C \subseteq \mathbb{Z}$.

Proof.
$d \mid (a \cup b \cup C)$ IFF $(d \mid a)$ AND $(d \mid b)$ AND $(d \mid C)$ (def (divdef) of divides)
IFF $(d \mid \gcd(a, b))$ AND $(d \mid C)$ (by (gcddiv))
IFF $d \mid (\gcd(a, b) \cup C)$ (def (divdef) of divides).

(c) Using parts (a) and (b), prove by induction on the size of $A$, that

$$d \mid (a \cup A) \text{ IFF } d \mid \gcd(a, \gcd(A)), \tag{divauA}$$

for all integers $a, d$ and finite sets $A \subset \mathbb{Z}$. Explain why this proves Lemma (dAdgA).
(d) Prove Theorem (AuB).
(e) Conclude that $\gcd(A)$ is an integer linear combination of the elements in $A$.

Exam Problems

Problem 9.19.
Prove that $\gcd(mb + r, b) = \gcd(b, r)$ for all integers $m, b, r$.

Problem 9.20.
The Stata Center's delicate balance depends on two buckets of water hidden in a secret room. The big bucket has a volume of 25 gallons, and the little bucket has a volume of 10 gallons. If at any time a bucket contains exactly 13 gallons, the Stata Center will collapse. There is an interactive display where tourists can remotely fill and empty the buckets according to certain rules. We represent the buckets as a state machine.

The state of the machine is a pair $(b, l)$, where $b$ is the volume of water in the big bucket, and $l$ is the volume of water in the little bucket.

(a) We informally describe some of the legal operations tourists can perform below. Represent each of the following operations as a transition of the state machine. The first is done for you as an example.

1. Fill the big bucket.
\[ (b, l) \longrightarrow (25, l). \]
2. Empty the little bucket.
3. Pour the big bucket into the little bucket. You should have two cases defined in terms of the state $(b, l)$: if all the water from the big bucket fits in the little bucket, then pour all the water. If it doesn't, pour until the little bucket is full, leaving some water remaining in the big bucket.

(b) Use the Invariant Principle to show that, starting with empty buckets, the Stata Center will never collapse. That is, the state $(13, x)$ is unreachable. (In verifying your claim that the invariant is preserved, you may restrict to the representative transitions of part (a).)

Problem 9.21.
Let
\begin{align*}
m &= 2^{9}\, 5^{24}\, 7^{4}\, 11^{7}, \\
n &= 2^{3}\, 7^{22}\, 11^{211}\, 19^{7}, \\
p &= 2^{5}\, 3^{4}\, 7^{6042}\, 19^{30}.
\end{align*}

(a) What is the $\gcd(m, n, p)$?

(b) What is the least common multiple $\operatorname{lcm}(m, n, p)$?

Let $\nu_k(n)$ be the largest power of $k$ that divides $n$, where $k > 1$. That is,
\[ \nu_k(n) ::= \max\{ i \mid k^i \text{ divides } n \}. \]
If $A$ is a nonempty set of nonnegative integers, define
\[ \nu_k(A) ::= \{ \nu_k(a) \mid a \in A \}. \]

(c) Express $\nu_k(\gcd(A))$ in terms of $\nu_k(A)$.

(d) Let $p$ be a prime number.
Express $\nu_p(\operatorname{lcm}(A))$ in terms of $\nu_p(A)$.

(e) Give an example of integers $a, b$ where $\nu_6(\operatorname{lcm}(a, b)) > \max(\nu_6(a), \nu_6(b))$.

(f) Let $\prod A$ be the product of all the elements in $A$. Express $\nu_p(\prod A)$ in terms of $\nu_p(A)$.

(g) Let $B$ also be a nonempty set of nonnegative integers. Conclude that
\[ \gcd(A \cup B) = \gcd(\gcd(A), \gcd(B)). \tag{9.18} \]
Hint: Consider $\nu_p()$ of the left and right-hand sides of (9.18). You may assume
\[ \min(A \cup B) = \min(\min(A), \min(B)). \tag{9.19} \]

Problems for Section 9.3

Homework Problems

Problem 9.22.
TBA: Chebyshev lower bound in prime density, based on Shoup pp. 75–76

Problems for Section 9.4

Practice Problems

Problem 9.23.
Let $p$ be a prime number and $a_1, \dots, a_n$ integers. Prove the following Lemma by induction:

Lemma.
\[ \text{If } p \text{ divides a product } a_1 \cdot a_2 \cdots a_n, \text{ then } p \text{ divides some } a_i. \tag{*} \]
You may assume the case for $n = 2$ which was given by Lemma 9.4.2.

Be sure to clearly state and label your Induction Hypothesis, Base case(s), and Induction step.

Class Problems

Problem 9.24.
(a) Let $m = 2^{9}\, 5^{24}\, 11^{7}\, 17^{12}$ and $n = 2^{3}\, 7^{22}\, 11^{211}\, 13^{1}\, 17^{9}\, 19^{2}$. What is the $\gcd(m, n)$? What is the least common multiple $\operatorname{lcm}(m, n)$ of $m$ and $n$? Verify that
\[ \gcd(m, n) \cdot \operatorname{lcm}(m, n) = mn. \tag{9.20} \]

(b) Describe in general how to find the $\gcd(m, n)$ and $\operatorname{lcm}(m, n)$ from the prime factorizations of $m$ and $n$. Conclude that equation (9.20) holds for all positive integers $m, n$.

Homework Problems

Problem 9.25.
The set of complex numbers that are equal to $m + n\sqrt{-5}$ for some integers $m, n$ is called $\mathbb{Z}[\sqrt{-5}]$. It will turn out that in $\mathbb{Z}[\sqrt{-5}]$, not all numbers have unique factorizations.

A sum or product of numbers in $\mathbb{Z}[\sqrt{-5}]$ is in $\mathbb{Z}[\sqrt{-5}]$, and since $\mathbb{Z}[\sqrt{-5}]$ is a subset of the complex numbers, all the usual rules for addition and multiplication are true for it. But some weird things do happen. For example, the prime 29 has factors:

(a) Find $x, y \in \mathbb{Z}[\sqrt{-5}]$ such that $xy = 29$ and $x \ne \pm 1 \ne y$.
On the other hand, the number 3 is still a “prime” even in $\mathbb{Z}[\sqrt{-5}]$. More precisely, a number $p \in \mathbb{Z}[\sqrt{-5}]$ is called irreducible over $\mathbb{Z}[\sqrt{-5}]$ iff when $xy = p$ for some $x, y \in \mathbb{Z}[\sqrt{-5}]$, either $x = \pm 1$ or $y = \pm 1$.

Claim. The numbers $3$, $2 + \sqrt{-5}$, and $2 - \sqrt{-5}$ are irreducible over $\mathbb{Z}[\sqrt{-5}]$.

In particular, this Claim implies that the number 9 factors into irreducibles over $\mathbb{Z}[\sqrt{-5}]$ in two different ways:
\[ 3 \cdot 3 = 9 = (2 + \sqrt{-5})(2 - \sqrt{-5}). \]
So $\mathbb{Z}[\sqrt{-5}]$ is an example of what is called a non-unique factorization domain.

To verify the Claim, we'll appeal (without proof) to a familiar technical property of complex numbers given in the following Lemma.

Definition. For a complex number $c = r + si$ where $r, s \in \mathbb{R}$ and $i$ is $\sqrt{-1}$, the norm $|c|$ of $c$ is $\sqrt{r^2 + s^2}$.

Lemma. For $c, d \in \mathbb{C}$,
\[ |cd| = |c|\,|d|. \]

(b) Prove that $|x|^2 \ne 3$ for all $x \in \mathbb{Z}[\sqrt{-5}]$.

(c) Prove that if $x \in \mathbb{Z}[\sqrt{-5}]$ and $|x| = 1$, then $x = \pm 1$.

(d) Prove that if $|xy| = 3$ for some $x, y \in \mathbb{Z}[\sqrt{-5}]$, then $x = \pm 1$ or $y = \pm 1$.
Hint: $|z|^2 \in \mathbb{N}$ for $z \in \mathbb{Z}[\sqrt{-5}]$.

(e) Complete the proof of the Claim.

Problems for Section 9.6

Practice Problems

Problem 9.26.
Prove that if $a \equiv b \pmod{14}$ and $a \equiv b \pmod{5}$, then $a \equiv b \pmod{70}$.

Class Problems

Problem 9.27.
(a) Prove if $n$ is not divisible by 3, then $n^2 \equiv 1 \pmod{3}$.

(b) Show that if $n$ is odd, then $n^2 \equiv 1 \pmod{8}$.

(c) Conclude that if $p$ is a prime greater than 3, then $p^2 - 1$ is divisible by 24.

Problem 9.28.
The values of polynomial $p(n) ::= n^2 + n + 41$ are prime for all the integers from 0 to 39 (see Section 1.1). Well, $p$ didn't work, but are there any other polynomials whose values are always prime? No way! In fact, we'll prove a much stronger claim.

Definition. The set $P$ of integer polynomials can be defined recursively:

Base cases:
• the identity function $\mathrm{Id}_{\mathbb{Z}}(x) ::= x$ is in $P$.
• for any integer $m$ the constant function $c_m(x) ::= m$ is in $P$.

Constructor cases. If $r, s \in P$, then $r + s$ and $r \cdot s \in P$.
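The recursive definition of $P$ above can be mirrored directly as a recursive data type. The following Python sketch is only an illustration: the class names `Ident`, `Const`, `Sum`, `Prod` and the helper `evaluate` are hypothetical names chosen here, not notation from the text.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Ident:
    """Base case: the identity function Id_Z(x) ::= x."""


@dataclass(frozen=True)
class Const:
    """Base case: the constant function c_m(x) ::= m."""
    m: int


@dataclass(frozen=True)
class Sum:
    """Constructor case: r + s for polynomials r, s."""
    r: "Poly"
    s: "Poly"


@dataclass(frozen=True)
class Prod:
    """Constructor case: r * s for polynomials r, s."""
    r: "Poly"
    s: "Poly"


Poly = Union[Ident, Const, Sum, Prod]


def evaluate(q: Poly, x: int) -> int:
    """Evaluate q at x by recursion on the structure of q."""
    if isinstance(q, Ident):
        return x
    if isinstance(q, Const):
        return q.m
    if isinstance(q, Sum):
        return evaluate(q.r, x) + evaluate(q.s, x)
    if isinstance(q, Prod):
        return evaluate(q.r, x) * evaluate(q.s, x)
    raise TypeError(f"not an integer polynomial: {q!r}")


# p(n) ::= n^2 + n + 41, built only from the base and constructor cases.
X = Ident()
P41 = Sum(Sum(Prod(X, X), X), Const(41))
```

For instance, `evaluate(P41, 40)` gives $1681 = 41^2$, so the value at 40 is not prime. Because `evaluate` recurses on the same cases as the definition, it has the same shape as a structural induction over $P$.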
(a) Using the recursive definition of integer polynomials given above, prove by structural induction that for all $q \in P$,
\[ j \equiv k \pmod{n} \quad\text{IMPLIES}\quad q(j) \equiv q(k) \pmod{n}, \]
for all integers $j, k, n$ where $n > 1$.

Be sure to clearly state and label your Induction Hypothesis, Base case(s), and Constructor step.

(b) We'll say that $q$ produces multiples if, for every integer greater than one in the range of $q$, there are infinitely many different multiples of that integer in the range. For example, if $q(4) = 7$ and $q$ produces multiples, then there are infinitely many different multiples of 7 in the range of $q$, and of course, except for 7 itself, none of these multiples is prime.

Prove that if $q$ has positive degree and positive leading coefficient, then $q$ produces multiples. You may assume that every such polynomial is strictly increasing for large arguments.

Part (b) implies that an integer polynomial with positive leading coefficient and degree has infinitely many nonprimes in its range. This fact no longer holds true for multivariate polynomials. An amazing consequence of Matiyasevich's [32] solution to Hilbert's Tenth Problem is that multivariate polynomials can be understood as general purpose programs for generating sets of integers. If a set of nonnegative integers can be generated by any program, then it equals the set of nonnegative integers in the range of a multivariate integer polynomial! In particular, there is an integer polynomial $p(x_1, \dots, x_7)$ whose nonnegative values as $x_1, \dots, x_7$ range over $\mathbb{N}$ are precisely the set of all prime numbers!

Problems for Section 9.7

Practice Problems

Problem 9.29.
List the numbers of all statements below that are equivalent to
\[ a \equiv b \pmod{n}, \]
where $n > 1$ and $a$ and $b$ are integers. Briefly explain your reasoning.
i) $2a \equiv 2b \pmod{n}$
ii) $2a \equiv 2b \pmod{2n}$
iii) $a^3 \equiv b^3 \pmod{n}$
iv) $\operatorname{rem}(a, n) = \operatorname{rem}(b, n)$
v) $\operatorname{rem}(n, a) = \operatorname{rem}(n, b)$
vi) $\gcd(a, n) = \gcd(b, n)$
vii) $\gcd(n, a - b) = n$
viii) $(a - b)$ is a multiple of $n$
ix) $\exists k \in \mathbb{Z}.\ a = b + nk$

Problem 9.30.
What is $\operatorname{remainder}(3^{101}, 21)$?

Homework Problems

Problem 9.31.
Prove that congruence is preserved by arithmetic expressions. Namely, prove that if
\[ a \equiv b \pmod{n}, \tag{9.21} \]
then
\[ \operatorname{eval}(e, a) \equiv \operatorname{eval}(e, b) \pmod{n}, \tag{9.22} \]
for all $e \in \text{Aexp}$ (see Section 7.4).

Problem 9.32.
A commutative ring is a set $R$ of elements along with two binary operations $\oplus$ and $\otimes$ from $R \times R$ to $R$. There is an element in $R$ called the zero-element, $0$, and another element called the unit-element, $1$. The operations in a commutative ring satisfy the following ring axioms for $r, s, t \in R$:

$(r \otimes s) \otimes t = r \otimes (s \otimes t)$ (associativity of $\otimes$),
$(r \oplus s) \oplus t = r \oplus (s \oplus t)$ (associativity of $\oplus$),
$r \oplus s = s \oplus r$ (commutativity of $\oplus$),
$r \otimes s = s \otimes r$ (commutativity of $\otimes$),
$0 \oplus r = r$ (identity for $\oplus$),
$1 \otimes r = r$ (identity for $\otimes$),
$\exists r' \in R.\ r \oplus r' = 0$ (inverse for $\oplus$),
$r \otimes (s \oplus t) = (r \otimes s) \oplus (r \otimes t)$ (distributivity).

(a) Show that the zero-element is unique, that is, show that if $z \in R$ has the property that
\[ z \oplus r = r, \tag{9.23} \]
then $z = 0$.

(b) Show that additive inverses are unique, that is, show that
\[ r \oplus r_1 = 0 \quad\text{and} \tag{9.24} \]
\[ r \oplus r_2 = 0 \tag{9.25} \]
implies $r_1 = r_2$.

(c) Show that multiplicative inverses are unique, that is, show that
\[ r \otimes r_1 = 1, \qquad r \otimes r_2 = 1 \]
implies $r_1 = r_2$.

Problem 9.33.
This problem will use elementary properties of congruences to prove that every positive integer divides infinitely many Fibonacci numbers.

A function $f : \mathbb{N} \to \mathbb{N}$ that satisfies
\[ f(n) = c_1 f(n - 1) + c_2 f(n - 2) + \cdots + c_d f(n - d) \tag{9.26} \]
for some $c_i \in \mathbb{N}$ and all $n \ge d$ is called a degree $d$ linear-recursive function.

A function $f : \mathbb{N} \to$
$\mathbb{N}$ has a degree $d$ repeat modulo $m$ at $n$ and $k$ when it satisfies the following repeat congruences:
\begin{align*}
f(n) &\equiv f(k) \pmod{m}, \\
f(n - 1) &\equiv f(k - 1) \pmod{m}, \\
&\ \,\vdots \\
f(n - (d - 1)) &\equiv f(k - (d - 1)) \pmod{m},
\end{align*}
for $k > n \ge d - 1$.

For the rest of this problem, assume linear-recursive functions and repeats are degree $d > 0$.

(a) Prove that if a linear-recursive function has a repeat modulo $m$ at $n$ and $k$, then it has one at $n + 1$ and $k + 1$.

(b) Prove that for all $m > 1$, every linear-recursive function repeats modulo $m$ at $n$ and $k$ for some $n, k \in [d - 1, d + m^d)$.

(c) A linear-recursive function is reverse-linear if its $d$th coefficient $c_d = \pm 1$. Prove that if a reverse-linear function repeats modulo $m$ at $n$ and $k$ for some $n \ge d$, then it repeats modulo $m$ at $n - 1$ and $k - 1$.

(d) Conclude that every reverse-linear function must repeat modulo $m$ at $d - 1$ and $(d - 1) + j$ for some $j > 0$.

(e) Conclude that if $f$ is a reverse-linear function and $f(k) = 0$ for some $k \in [0, d)$, then every positive integer is a divisor of $f(n)$ for infinitely many $n$.

(f) Conclude that every positive integer is a divisor of infinitely many Fibonacci numbers.
Hint: Start the Fibonacci sequence with the values 0, 1 instead of 1, 1.

Class Problems

Problem 9.34.
Find
\[ \operatorname{remainder}\left( 9876^{3456789} \left( 9^{99} \right)^{5555} - 6789^{3414259},\ 14 \right). \tag{9.27} \]

Problem 9.35.
The following properties of equivalence mod $n$ follow directly from its definition and simple properties of divisibility. See if you can prove them without looking up the proofs in the text.

(a) If $a \equiv b \pmod{n}$, then $ac \equiv bc \pmod{n}$.

(b) If $a \equiv b \pmod{n}$ and $b \equiv c \pmod{n}$, then $a \equiv c \pmod{n}$.

(c) If $a \equiv b \pmod{n}$ and $c \equiv d \pmod{n}$, then $ac \equiv bd \pmod{n}$.

(d) $\operatorname{rem}(a, n) \equiv a \pmod{n}$.

Problem 9.36.
(a) Why is a number written in decimal evenly divisible by 9 if and only if the sum of its digits is a multiple of 9? Hint: $10 \equiv 1 \pmod{9}$.

(b) Take a big number, such as 37273761261. Sum the digits, where every other one is negated:
3 + (−7) + 2 + (−7) + 3 + (−7) + 6 + (−1) + 2 + (
−6) + 1 = −11
Explain why the original number is a multiple of 11 if and only if this sum is a multiple of 11.

Problem 9.37.
At one time, the Guinness Book of World Records reported that the “greatest human calculator” was a guy who could compute 13th roots of 100-digit numbers that were 13th powers. What a curious choice of tasks. . . .

In this problem, we prove
\[ n^{13} \equiv n \pmod{10} \tag{9.28} \]
for all $n$.

(a) Explain why (9.28) does not follow immediately from Euler's Theorem.

(b) Prove that
\[ d^{13} \equiv d \pmod{10} \tag{9.29} \]
for $0 \le d < 10$.

(c) Now prove the congruence (9.28).

Problem 9.38.
(a) Ten pirates find a chest filled with gold and silver coins. There are twice as many silver coins in the chest as there are gold. They divide the gold coins in such a way that the difference in the number of coins given to any two pirates is not divisible by 10. They will only take the silver coins if it is possible to divide them the same way. Is this possible, or will they have to leave the silver behind? Prove your answer.

(b) There are also 3 sacks in the chest, containing 5, 49, and 51 rubies respectively. The treasurer of the pirate ship is bored and decides to play a game with the following rules:

• He can merge any two piles together into one pile, and
• he can divide a pile with an even number of rubies into two piles of equal size.

He makes one move every day, and he will finish the game when he has divided the rubies into 105 piles of one. Is it possible for him to finish the game?

Exam Problems

Problem 9.39.
The sum of the digits of the base 10 representation of an integer is congruent modulo 9 to that integer. For example,
\[ 763 \equiv 7 + 6 + 3 \pmod{9}. \]
We can say that “9 is a good modulus for base 10.”
More generally, we'll say “$k$ is a good modulus for base $b$” when, for any nonnegative integer $n$, the sum of the digits of the base $b$ representation of $n$ is congruent to $n$ modulo $k$. So 2 is not a good modulus for base 10 because
\[ 763 \not\equiv 7 + 6 + 3 \pmod{2}. \]

(a) What integers $k > 1$ are good moduli for base 10?

(b) Show that if $b \equiv 1 \pmod{k}$, then $k$ is good for base $b$.

(c) Prove conversely, that if $k$ is good for some base $b \ge 2$, then $b \equiv 1 \pmod{k}$.
Hint: The base $b$ representation of $b$.

(d) Exactly which integers $k > 1$ are good moduli for base 106?

Problem 9.40.
We define the sequence of numbers
\[ a_n = \begin{cases} 1, & \text{for } n \le 3, \\ a_{n-1} + a_{n-2} + a_{n-3} + a_{n-4}, & \text{for } n > 3. \end{cases} \]
Use strong induction to prove that $\operatorname{remainder}(a_n, 3) = 1$ for all $n \ge 0$.

Problems for Section 9.8

Exam Problems

Problem 9.41.
Definition. The set $P$ of single variable integer polynomials can be defined recursively:

Base cases:
• the identity function, $\mathrm{Id}_{\mathbb{Z}}(x) ::= x$ is in $P$.
• for any integer $m$ the constant function, $c_m(x) ::= m$ is in $P$.

Constructor cases. If $r, s \in P$, then $r + s$ and $r \cdot s \in P$.

Prove by structural induction that for all $q \in P$,
\[ j \equiv k \pmod{n} \quad\text{IMPLIES}\quad q(j) \equiv q(k) \pmod{n}, \]
for all integers $j, k, n$ where $n > 1$.

Be sure to clearly state and label your Induction Hypothesis, Base case(s), and Constructor step.

Problems for Section 9.9

Practice Problems

Problem 9.42.
(a) Given inputs $m, n \in \mathbb{Z}^{+}$, the Pulverizer will produce $x, y \in \mathbb{Z}$ such that:

(b) Assume $n > 1$. Explain how to use the numbers $x, y$ to find the inverse of $m$ modulo $n$ when there is an inverse.

Problem 9.43.
What is the multiplicative inverse (mod 7) of 2? Reminder: by definition, your answer must be an integer between 0 and 6.

Problem 9.44.
(a) Find integer coefficients $x$, $y$ such that $25x + 32y = \gcd(25, 32)$.

(b) What is the inverse (mod 25) of 32?

Problem 9.45.
(a) Use the Pulverizer to find integers $s, t$ such that
\[ 40s + 7t = \gcd(40, 7). \]

(b) Adjust your answer to part (a) to find an inverse modulo 40 of 7 in $[1, 40)$.

Class Problems

Problem 9.46.
Two nonparallel lines in the real plane intersect at a point. Algebraically, this means that the equations
\begin{align*}
y &= m_1 x + b_1 \\
y &= m_2 x + b_2
\end{align*}
have a unique solution $(x, y)$, provided $m_1 \ne m_2$. This statement would be false if we restricted $x$ and $y$ to the integers, since the two lines could cross at a noninteger point.

However, an analogous statement holds if we work over the integers modulo a prime $p$. Find a solution to the congruences
\begin{align*}
y &\equiv m_1 x + b_1 \pmod{p} \\
y &\equiv m_2 x + b_2 \pmod{p}
\end{align*}
when $m_1 \not\equiv m_2 \pmod{p}$. Express your solution in the form $x \equiv\ ? \pmod{p}$ and $y \equiv\ ? \pmod{p}$ where the ?'s denote expressions involving $m_1$, $m_2$, $b_1$ and $b_2$. You may find it helpful to solve the original equations over the reals first.

Problems for Section 9.10

Practice Problems

Problem 9.47.
Prove that $k \in [0, n)$ has an inverse modulo $n$ iff it has an inverse in $\mathbb{Z}_n$.

Problem 9.48.
What is $\operatorname{rem}(24^{79}, 79)$?
Hint: You should not need to do any actual multiplications!

Problem 9.49.
(a) Prove that $22^{12001}$ has a multiplicative inverse modulo 175.

(b) What is the value of $\phi(175)$, where $\phi$ is Euler's function?

(c) What is the remainder of $22^{12001}$ divided by 175?

Problem 9.50.
How many numbers between 1 and 6042 (inclusive) are relatively prime to 3780? Hint: 53 is a factor.

Problem 9.51.
How many numbers between 1 and 3780 (inclusive) are relatively prime to 3780?

Problem 9.52.
(a) What is the probability that an integer from 1 to 360 selected with uniform probability is relatively prime to 360?

(b) What is the value of $\operatorname{rem}(7^{98}, 360)$?

Class Problems

Problem 9.53.
Find the remainder of $26^{1818181}$ divided by 297.
Hint: $1818181 = (180 \cdot 10101) + 1$; use Euler's theorem.

Problem 9.54.
Find the last digit of $7^{7^{7}}$.

Problem 9.55.
Prove that $n$ and $n^5$ have the same last digit. For example:
\[ 2^5 = 32, \qquad 79^5 = 3077056399. \]

Problem 9.56.
Use Fermat's theorem to find the inverse $i$ of 13 modulo 23 with $1 \le i < 23$.

Problem 9.57.
Let $\phi$ be Euler's function.

(a) What is the value of $\phi(2)$?

(b) What are three nonnegative integers $k > 1$ such that $\phi(k) = 2$?

(c) Prove that $\phi(k)$ is even for $k > 2$.
Hint: Consider whether $k$ has an odd prime factor or not.

(d) Briefly explain why $\phi(k) = 2$ for exactly three values of $k$.

Problem 9.58.
Suppose $a, b$ are relatively prime and greater than 1. In this problem you will prove the Chinese Remainder Theorem, which says that for all $m, n$, there is an $x$ such that
\begin{align*}
x &\equiv m \bmod a, \tag{9.30} \\
x &\equiv n \bmod b. \tag{9.31}
\end{align*}
Moreover, $x$ is unique up to congruence modulo $ab$, namely, if $x'$ also satisfies (9.30) and (9.31), then
\[ x' \equiv x \bmod ab. \]

(a) Prove that for any $m, n$, there is some $x$ satisfying (9.30) and (9.31).
Hint: Let $b^{-1}$ be an inverse of $b$ modulo $a$ and define $e_a ::= b^{-1} b$. Define $e_b$ similarly. Let $x = m e_a + n e_b$.

(b) Prove that
\[ [x \equiv 0 \bmod a \ \text{ AND } \ x \equiv 0 \bmod b] \quad\text{implies}\quad x \equiv 0 \bmod ab. \]

(c) Conclude that
\[ [x \equiv x' \bmod a \ \text{ AND } \ x \equiv x' \bmod b] \quad\text{implies}\quad x \equiv x' \bmod ab. \]

(d) Conclude that the Chinese Remainder Theorem is true.

(e) What about the converse of the implication in part (c)?

Problem 9.59.
The order of $k \in \mathbb{Z}_n$ is the smallest positive $m$ such that $k^m = 1\ (\mathbb{Z}_n)$.

(a) Prove that
\[ k^m = 1\ (\mathbb{Z}_n) \quad\text{IMPLIES}\quad \operatorname{ord}(k, n) \mid m. \]
Hint: Take the remainder of $m$ divided by the order.

Now suppose $p > 2$ is a prime of the form $2^s + 1$. For example, $2^1 + 1$, $2^2 + 1$, $2^4 + 1$ are such primes.

(b) Conclude from part (a) that if $0 < k < p$, then $\operatorname{ord}(k, p)$ is a power of 2.

(c) Prove that $\operatorname{ord}(2, p) = 2s$ and conclude that $s$ is a power of 2.$^{19}$
Hint: $2^k - 1$ for $k \in [1..r]$ is positive but too small to equal $0\ (\mathbb{Z}_p)$.

Homework Problems

Problem 9.60.
This problem is about finding square roots modulo a prime $p$.

(a) Prove that $x^2 \equiv y^2 \pmod{p}$ if and only if $x \equiv y \pmod{p}$ or $x \equiv -y \pmod{p}$.
Hint: $x^2 - y^2 = (x + y)(x - y)$

An integer $x$ is called a square root of $n$ mod $p$ when
\[ x^2 \equiv n \pmod{p}. \]
An integer with a square root is called a square mod $p$. For example, if $n$ is congruent to 0 or 1 mod $p$, then $n$ is a square and it is its own square root.

So let's assume that $p$ is an odd prime and $n \not\equiv 0 \pmod{p}$. It turns out there is a simple test we can perform to see if $n$ is a square mod $p$:

Euler's Criterion
i. If $n$ is a square modulo $p$, then $n^{(p-1)/2} \equiv 1 \pmod{p}$.
ii. If $n$ is not a square modulo $p$ then $n^{(p-1)/2} \equiv -1 \pmod{p}$.

(b) Prove Case (i) of Euler's Criterion. Hint: Use Fermat's theorem.

(c) Prove Case (ii) of Euler's Criterion. Hint: Use part (a)

(d) Suppose that $p \equiv 3 \pmod{4}$, and $n$ is a square mod $p$. Find a simple expression in terms of $n$ and $p$ for a square root of $n$.
Hint: Write $p$ as $p = 4k + 3$ and use Euler's Criterion. You might have to multiply two sides of an equation by $n$ at one point.

Footnote 19 (to Problem 9.59(c)): Numbers of the form $2^{2^k} + 1$ are called Fermat numbers, so we can rephrase this conclusion as saying that any prime of the form $2^s + 1$ must actually be a Fermat number. The Fermat numbers are prime for $k = 1, 2, 3, 4$, but not for $k = 5$. In fact, it is not known if any Fermat number with $k > 4$ is prime.

Problem 9.61.
Suppose $a, b$ are relatively prime integers greater than 1. In this problem you will prove that Euler's function is multiplicative, that is, that
\[ \phi(ab) = \phi(a)\phi(b). \]
The proof is an easy consequence of the Chinese Remainder Theorem (Problem 9.58).

(a) Conclude from the Chinese Remainder Theorem that the function $f : [0..ab) \to [0..a) \times [0..b)$ defined by
\[ f(x) ::= (\operatorname{rem}(x, a), \operatorname{rem}(x, b)) \]
is a bijection.

(b) For any positive integer $k$ let $\mathbb{Z}_k^{*}$ be the integers in $[0..k)$ that are relatively prime to $k$.
Prove that the function $f$ from part (a) also defines a bijection from $\mathbb{Z}_{ab}^{*}$ to $\mathbb{Z}_a^{*} \times \mathbb{Z}_b^{*}$.

(c) Conclude from the preceding parts of this problem that
\[ \phi(ab) = \phi(a)\phi(b). \tag{9.32} \]

(d) Prove Corollary 9.10.11: for any number $n > 1$, if $p_1$, $p_2$, . . . , $p_j$ are the (distinct) prime factors of $n$, then
\[ \phi(n) = n \left(1 - \frac{1}{p_1}\right) \left(1 - \frac{1}{p_2}\right) \cdots \left(1 - \frac{1}{p_j}\right). \]

Problem 9.62.
Definition. Define the order of $k$ over $\mathbb{Z}_n$ to be
\[ \operatorname{ord}(k, n) ::= \min\{ m > 0 \mid k^m = 1\ (\mathbb{Z}_n) \}. \]
If no positive power of $k$ equals 1 in $\mathbb{Z}_n$, then $\operatorname{ord}(k, n) ::= \infty$.

(a) Show that $k \in \mathbb{Z}_n^{*}$ iff $k$ has finite order in $\mathbb{Z}_n$.

(b) Prove that for every $k \in \mathbb{Z}_n^{*}$, the order of $k$ over $\mathbb{Z}_n$ divides $\phi(n)$.
Hint: Let $m = \operatorname{ord}(k, n)$. Consider the quotient and remainder of $\phi(n)$ divided by $m$.

Problem 9.63.
The general version of the Chinese Remainder Theorem (see Problem 9.58) extends to more than two relatively prime moduli. Namely,

Theorem (General Chinese Remainder). Suppose $a_1, \dots, a_k$ are integers greater than 1 and each is relatively prime to the others. Let $n ::= a_1 \cdot a_2 \cdots a_k$. Then for any integers $m_1, m_2, \dots, m_k$, there is a unique $x \in [0..n)$ such that
\[ x \equiv m_i \pmod{a_i}, \]
for $1 \le i \le k$.

The proof is a routine induction on $k$ using a fact that follows immediately from unique factorization: if a number is relatively prime to some other numbers, then it is relatively prime to their product.

The General Chinese Remainder Theorem is the basis for an efficient approach to performing a long series of additions and multiplications on “large” numbers. Namely, suppose $n$ was large, but each of the factors $a_i$ was small enough to be handled by cheap and available arithmetic hardware units. Suppose a calculation requiring many additions and multiplications needs to be performed.
To do a single multiplication or addition of two large numbers $x$ and $y$ in the usual way in this setting would involve breaking up the $x$ and $y$ into pieces small enough to be handled by the arithmetic units, using the arithmetic units to perform additions and multiplications on (many) pairs of small pieces, and then reassembling the pieces into an answer. Moreover, the order in which these operations on pieces can be performed is constrained by dependence among the pieces, because of “carries,” for example. And this process of breakup and reassembly has to be performed for each addition and multiplication that needs to be performed on large numbers.

Explain how the General Chinese Remainder Theorem can be applied to perform a long series of additions and multiplications on “large” numbers much more efficiently than the usual way described above.

Problem 9.64.
In this problem we'll prove that for all integers $a, m$ where $m > 1$,
\[ a^m \equiv a^{m - \phi(m)} \pmod{m}. \tag{9.33} \]
Note that $a$ and $m$ need not be relatively prime.

Assume $m = p_1^{k_1} \cdots p_n^{k_n}$ for distinct primes, $p_1, \dots, p_n$ and positive integers $k_1, \dots, k_n$.

(a) Show that if $p_i$ does not divide $a$, then
\[ a^{\phi(m)} \equiv 1 \pmod{p_i^{k_i}}. \]

(b) Show that if $p_i \mid a$ then
\[ a^{m - \phi(m)} \equiv 0 \pmod{p_i^{k_i}}. \tag{9.34} \]

(c) Conclude (9.33) from the facts above.
Hint: $a^m - a^{m - \phi(m)} = a^{m - \phi(m)} (a^{\phi(m)} - 1)$.

Problem 9.65.
The Generalized Postage Problem

Several other problems (2.7, 2.1, 5.32) work out which amounts of postage can be formed using two stamps of given denominations. In this problem, we generalize this to two stamps with arbitrary positive integer denominations $a$ and $b$ cents. Let's call an amount of postage that can be made from $a$ and $b$ cent stamps a makeable amount.

Lemma. (Generalized Postage) If $a$ and $b$ are relatively prime positive integers, then any integer greater than $ab - a - b$ is makeable.
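The Lemma can be sanity-checked by brute force for small denominations before attempting a proof. The Python sketch below is a finite check under stated assumptions, not a proof: the function names `makeable_amounts` and `largest_unmakeable` are hypothetical, and the scan bound of $ab$ is chosen only because the Lemma claims everything above $ab - a - b$ is makeable.

```python
from math import gcd


def makeable_amounts(a, b, limit):
    """Return the set of amounts s*a + t*b in [0, limit] with s, t >= 0."""
    return {
        s * a + t * b
        for s in range(limit // a + 1)
        for t in range((limit - s * a) // b + 1)
    }


def largest_unmakeable(a, b):
    """Largest non-makeable amount in [0, a*b], or -1 if every amount is makeable.

    Only amounts up to a*b are scanned, so this checks the Lemma's bound on a
    finite range rather than proving it.
    """
    limit = a * b
    makeable = makeable_amounts(a, b, limit)
    return max((n for n in range(limit + 1) if n not in makeable), default=-1)


# Compare against ab - a - b for every relatively prime pair in a small range.
for a in range(2, 13):
    for b in range(2, 13):
        if gcd(a, b) == 1:
            assert largest_unmakeable(a, b) == a * b - a - b
```

With 3 and 5 cent stamps, for example, the amounts 1, 2, 4 and 7 are the only ones that cannot be made, and $7 = 3 \cdot 5 - 3 - 5$.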
To prove the Lemma, consider the following array with $a$ infinite rows:
\[
\begin{array}{lllll}
0 & a & 2a & 3a & \dots \\
b & b + a & b + 2a & b + 3a & \dots \\
2b & 2b + a & 2b + 2a & 2b + 3a & \dots \\
3b & 3b + a & 3b + 2a & 3b + 3a & \dots \\
\vdots & \vdots & \vdots & \vdots & \\
(a - 1)b & (a - 1)b + a & (a - 1)b + 2a & (a - 1)b + 3a & \dots
\end{array}
\]
Note that every element in this array is clearly makeable.

(a) Suppose that $n$ is at least as large as, and also congruent mod $a$ to, the first element in some row of this array. Explain why $n$ must appear in the array.

(b) Prove that every integer from 0 to $a - 1$ is congruent modulo $a$ to one of the integers in the first column of this array.

(c) Complete the proof of the Generalized Postage Lemma by using parts (a) and (b) to conclude that every integer $n > ab - a - b$ appears in the array, and hence is makeable.
Hint: Suppose $n$ is congruent mod $a$ to the first element in some row. Assume $n$ is less than that element, and then show that $n \le ab - a - b$.

(d) (Optional) What's more, $ab - a - b$ is not makeable. Prove it.

(e) Explain why the following even more general lemma follows directly from the Generalized Lemma and part (d).

Lemma. (Generalized$^2$ Postage) If $m$ and $n$ are positive integers and $g ::= \gcd(m, n) > 1$, then with $m$ and $n$ cent stamps, you can only make amounts of postage that are multiples of $g$. You can actually make any amount of postage greater than $(mn/g) - m - n$ that is a multiple of $g$, but you cannot make $(mn/g) - m - n$ cents postage.

(f) Optional and possibly unknown. Suppose you have three denominations of stamps, $a, b, c$ and $\gcd(a, b, c) = 1$. Give a formula for the smallest number $n_{abc}$ such that you can make every amount of postage $\ge n_{abc}$.

Exam Problems

Problem 9.66.
What is the remainder of $63^{9601}$ divided by 220?

Problem 9.67.
Prove that if $k_1$ and $k_2$ are relatively prime to $n$, then so is $k_1 \cdot_n k_2$,

(a) . . . using the fact that $k$ is relatively prime to $n$ iff $k$ has an inverse modulo $n$.
Hint: Recall that $k_1 k_2 \equiv k_1 \cdot_n k_2 \pmod{n}$.

(b) . . . using the fact that $k$ is relatively prime to $n$ iff $k$ is cancellable modulo $n$.

(c) . . . using the Unique Factorization Theorem and the basic GCD properties such as Lemma 9.2.1.

Problem 9.68.
Circle true or false for the statements below, and provide counterexamples for those that are false. Variables, $a, b, c, m, n$ range over the integers and $m, n > 1$.

(a) $\gcd(1 + a, 1 + b) = 1 + \gcd(a, b)$. true false

(b) If $a \equiv b \pmod{n}$, then $p(a) \equiv p(b) \pmod{n}$ for any polynomial $p(x)$ with integer coefficients. true false

(c) If $a \mid bc$ and $\gcd(a, b) = 1$, then $a \mid c$. true false

(d) $\gcd(a^n, b^n) = (\gcd(a, b))^n$ true false

(e) If $\gcd(a, b) \ne 1$ and $\gcd(b, c) \ne 1$, then $\gcd(a, c) \ne 1$. true false

(f) If an integer linear combination of $a$ and $b$ equals 1, then so does some integer linear combination of $a^2$ and $b^2$. true false

(g) If no integer linear combination of $a$ and $b$ equals 2, then neither does any integer linear combination of $a^2$ and $b^2$. true false

(h) If $ac \equiv bc \pmod{n}$ and $n$ does not divide $c$, then $a \equiv b \pmod{n}$. true false

(i) Assuming $a, b$ have inverses modulo $n$, if $a^{-1} \equiv b^{-1} \pmod{n}$, then $a \equiv b \pmod{n}$. true false

(j) If $ac \equiv bc \pmod{n}$ and $n$ does not divide $c$, then $a \equiv b \pmod{n}$. true false

(k) If $a \equiv b \pmod{\phi(n)}$ for $a, b > 0$, then $c^a \equiv c^b \pmod{n}$. true false

(l) If $a \equiv b \pmod{nm}$, then $a \equiv b \pmod{n}$. true false

(m) If $\gcd(m, n) = 1$, then $[a \equiv b \pmod{m} \text{ AND } a \equiv b \pmod{n}]$ iff $[a \equiv b \pmod{mn}]$ true false

(n) If $\gcd(a, n) = 1$, then $a^{n-1} \equiv 1 \pmod{n}$ true false

(o) If $a, b > 1$, then [$a$ has an inverse mod $b$ iff $b$ has an inverse mod $a$]. true false

Problem 9.69.
Find an integer $k > 1$ such that $n$ and $n^k$ agree in their last three digits whenever $n$ is divisible by neither 2 nor 5. Hint: Euler's theorem.

Problem 9.70.
(a) Explain why $(-12)^{482}$ has a multiplicative inverse modulo 175.

(b) What is the value of $\phi(175)$, where $\phi$ is Euler's function?
(c) Call a number from 0 to 174 powerful iff some positive power of the number is congruent to 1 modulo 175. What is the probability that a random number from 0 to 174 is powerful?

(d) What is the remainder of $(-12)^{482}$ divided by 175?

Problem 9.71.
(a) Calculate the remainder of $35^{86}$ divided by 29.

(b) Part (a) implies that the remainder of $35^{86}$ divided by 29 is not equal to 1. So there must be a mistake in the following proof, where all the congruences are taken with modulus 29:
\begin{align*}
1 &\not\equiv 35^{86} && \text{(by part (a))} \tag{9.35} \\
&\equiv 6^{86} && \text{(since } 35 \equiv 6 \pmod{29}\text{)} \tag{9.36} \\
&\equiv 6^{28} && \text{(since } 86 \equiv 28 \pmod{29}\text{)} \tag{9.37} \\
&\equiv 1 && \text{(by Fermat's Little Theorem)} \tag{9.38}
\end{align*}
Identify the exact line containing the mistake and explain the logical error.

Problem 9.72.
Indicate whether the following statements are true or false. For each of the false statements, give counterexamples. All variables range over the integers, $\mathbb{Z}$.

(a) For all $a$ and $b$, there are $x$ and $y$ such that: $ax + by = 1$.

(b) $\gcd(mb + r, b) = \gcd(r, b)$ for all $m, r$ and $b$.

(c) $k^{p-1} \equiv 1 \pmod{p}$ for every prime $p$ and every $k$.

(d) For primes $p \ne q$, $\phi(pq) = (p - 1)(q - 1)$, where $\phi$ is Euler's totient function.

(e) If $a$ and $b$ are relatively prime to $d$, then
\[ [ac \equiv bc \bmod d] \quad\text{IMPLIES}\quad [a \equiv b \bmod d]. \]

Problem 9.73.
(a) Show that if $p \mid n$ for some prime $p$ and integer $n > 0$, then $(p - 1) \mid \phi(n)$.

(b) Conclude that $\phi(n)$ is even for all $n > 2$.

Problem 9.74.
(a) Calculate the value of $\phi(6042)$.
Hint: 53 is a factor of 6042.

(b) Consider an integer $k > 0$ that is relatively prime to 6042. Explain why $k^{9361} \equiv k \pmod{6042}$.
Hint: Use your solution to part (a).

Problem 9.75.
Let
\[ S_k = 1^k + 2^k + \cdots + p^k, \]
where $p$ is an odd prime and $k$ is a positive multiple of $p - 1$. Find $a \in [0..p)$ and $b \in (-p..0]$ such that
\[ S_k \equiv a \equiv b \pmod{p}. \]

Problems for Section 9.11

Practice Problems

Problem 9.76.
Suppose a cracker knew how to factor the RSA modulus $n$ into the product of distinct primes $p$ and $q$. Explain how the cracker could use the public key-pair $(e, n)$ to find a private key-pair $(d, n)$ that would allow him to read any message encrypted with the public key.

Problem 9.77.
Suppose the RSA modulus $n = pq$ is the product of distinct 200 digit primes $p$ and $q$. A message $m \in [0..n)$ is called dangerous if $\gcd(m, n) = p$, because such an $m$ can be used to factor $n$ and so crack RSA. Circle the best estimate of the fraction of messages in $[0..n)$ that are dangerous.
\[ \frac{1}{200} \qquad \frac{1}{400} \qquad \frac{1}{200^{10}} \qquad \frac{1}{10^{200}} \qquad \frac{1}{400^{10}} \qquad \frac{1}{10^{400}} \]

Problem 9.78.
Ben Bitdiddle decided to encrypt all his data using RSA. Unfortunately, he lost his private key. He has been looking for it all night, and suddenly a genie emerges from his lamp. He offers Ben a quantum computer that can perform exactly one procedure on large numbers $e, d, n$. Which of the following procedures should Ben choose to recover his data?

• Find $\gcd(e, d)$.
• Find the prime factorization of $n$.
• Determine whether $n$ is prime.
• Find $\operatorname{rem}(e^d, n)$.
• Find the inverse of $e$ modulo $n$ (the inverse of $e$ in $\mathbb{Z}_n$).
• Find the inverse of $e$ modulo $\phi(n)$.

Class Problems

Problem 9.79.
Let's try out RSA!

(a) Go through the beforehand steps.

• Choose primes $p$ and $q$ to be relatively small, say in the range 10–40. In practice, $p$ and $q$ might contain hundreds of digits, but small numbers are easier to handle with pencil and paper.
• Try $e = 3, 5, 7, \dots$ until you find something that works. Use Euclid's algorithm to compute the gcd.
• Find $d$ (using the Pulverizer).

When you're done, put your public key on the board prominently labelled “Public Key.” This lets another team send you a message.

(b) Now send an encrypted message to another team using their public key. Select your message $m$ from the codebook below:

2 = Greetings and salutations!
3 = Yo, wassup?
4 = You guys are slow!
5 = All your base are belong to us.
6 = Someone on our team thinks someone on your team is kinda cute.
7 = You are the weakest link. Goodbye.

(c) Decrypt the message sent to you and verify that you received what the other team sent!

Problem 9.80.
(a) Just as RSA would be trivial to crack knowing the factorization into two primes of $n$ in the public key, explain why RSA would also be trivial to crack knowing $\phi(n)$.

(b) Show that if you knew $n$, $\phi(n)$, and that $n$ was the product of two primes, then you could easily factor $n$.

Problem 9.81.
A critical fact about RSA is, of course, that decrypting an encrypted message always gives back the original message $m$. Namely, if $n = pq$ where $p$ and $q$ are distinct primes, $m \in [0..pq)$, and

$$d \cdot e \equiv 1 \pmod{(p-1)(q-1)},$$

then

$$\widehat{m}^{\,d} ::= (m^e)^d = m \quad (\mathbb{Z}_n). \tag{9.39}$$

We'll now prove this.

(a) Explain why (9.39) follows very simply from Euler's theorem when $m$ is relatively prime to $n$.

All the rest of this problem is about removing the restriction that $m$ be relatively prime to $n$. That is, we aim to prove that equation (9.39) holds for all $m \in [0..n)$.

It is important to realize that there is no practical reason to worry about this relative primality condition, or to bother to check for it, before sending a message $m$ using RSA. That's because the whole RSA enterprise is predicated on the difficulty of factoring. If an $m$ ever came up that wasn't relatively prime to $n$, then we could factor $n$ by computing $\gcd(m, n)$. So believing in the security of RSA implies believing that the likelihood of a message $m$ turning up that was not relatively prime to $n$ is negligible.

But let's be pure, impractical mathematicians and get rid of this technically unnecessary relative primality side condition, even if it is harmless. One gain for doing this is that statements about RSA will be simpler without the side condition.
More important, the proof below illustrates a useful general method of proving things about a number $n$ by proving them separately for the prime factors of $n$.

(b) Prove that if $p$ is prime and $a \equiv 1 \pmod{p-1}$, then

$$m^a = m \quad (\mathbb{Z}_p). \tag{9.40}$$

(c) Give an elementary proof$^{20}$ that if $a \equiv b \pmod{p_i}$ for distinct primes $p_i$, then $a \equiv b$ modulo the product of these primes.

(d) Note that (9.39) is a special case of

Claim. If $n$ is a product of distinct primes and $a \equiv 1 \pmod{\phi(n)}$, then

$$m^a = m \quad (\mathbb{Z}_n).$$

Use the previous parts to prove the Claim.

$^{20}$ There is no need to appeal to the Chinese Remainder Theorem.

Homework Problems

Problem 9.82.
Although RSA has successfully withstood cryptographic attacks for more than a quarter century, it is not known that breaking RSA would imply that factoring is easy.

In this problem we will examine the Rabin cryptosystem that does have such a security certification. Namely, if someone has the ability to break the Rabin cryptosystem efficiently, then they also have the ability to factor numbers that are products of two primes.

Why should that convince us that it is hard to break the cryptosystem efficiently? Well, mathematicians have been trying to factor efficiently for centuries, and they still haven't figured out how to do it.

What is the Rabin cryptosystem? The public key will be a number $N$ that is a product of two very large primes $p, q$ such that $p \equiv q \equiv 3 \pmod 4$. To send the message $m$, send $\text{rem}(m^2, N)$.$^{21}$

The private key is the factorization of $N$, namely, the primes $p, q$. We need to show that if the person being sent the message knows $p, q$, then they can decode the message. On the other hand, if an eavesdropper who doesn't know $p, q$ listens in, then we must show that they are very unlikely to figure out this message.

Say that $s$ is a square modulo $N$ if there is an $m \in [0, N)$ such that $s \equiv m^2 \pmod N$. Such an $m$ is a square root of $s$ modulo $N$.

(a) What are the squares modulo 5?
For each square in the interval $[0, 5)$, how many square roots does it have?

(b) For each integer in $[1, 15)$ that is relatively prime to 15, how many square roots (modulo 15) does it have? Note that all the square roots are also relatively prime to 15. We won't go through why this is so here, but keep in mind that this is a general phenomenon!

(c) Suppose that $p$ is a prime such that $p \equiv 3 \pmod 4$. It turns out that squares modulo $p$ have exactly 2 square roots. First show that $(p+1)/4$ is an integer. Next figure out the two square roots of 1 modulo $p$. Then show that you can find a "square root mod a prime $p$" of a number by raising the number to the $(p+1)/4$th power. That is, given $s$, to find $m$ such that $s \equiv m^2 \pmod p$, you can compute $\text{rem}(s^{(p+1)/4}, p)$.

$^{21}$ We will see soon that there are other numbers that would be encrypted by $\text{rem}(m^2, N)$, so we'll have to disallow those other numbers as possible messages in order to make it possible to decode this cryptosystem, but let's ignore that for now.

(d) The Chinese Remainder Theorem (Problem 9.58) implies that if $p, q$ are distinct primes, then $s$ is a square modulo $pq$ if and only if $s$ is a square modulo $p$ and $s$ is a square modulo $q$. In particular, if $s \equiv x^2 \equiv (x')^2 \pmod p$ where $x \ne x'$, and likewise $s \equiv y^2 \equiv (y')^2 \pmod q$, then $s$ has exactly four square roots modulo $N$, namely,

$$s \equiv (xy)^2 \equiv (x'y)^2 \equiv (xy')^2 \equiv (x'y')^2 \pmod{pq}.$$

So, if you know $p, q$, then using the solution to part (c), you can efficiently find the square roots of $s$! Thus, given the private key, decoding is easy.

But what if you don't know $p, q$? Let's assume that the evil message interceptor claims to have a program that can find all four square roots of any number modulo $N$. Show that he can actually use this program to efficiently find the factorization of $N$.
Thus, unless this evil message interceptor is extremely smart and has figured out something that the rest of the scientific community has been working on for years, it is very unlikely that this efficient square root program exists!

Hint: Pick $r$ arbitrarily from $[1, N)$. If $\gcd(N, r) > 1$, then you are done (why?) so you can halt. Otherwise, use the program to find all four square roots of $r^2$, call them $r, -r, r', -r'$. Note that $r^2 \equiv r'^2 \pmod N$. How can you use these roots to factor $N$?

(e) If the evil message interceptor knows that the message is the encoding of one of two possible candidate messages (that is, either "meet at dome at dusk" or "meet at dome at dawn") and is just trying to figure out which of the two, then can he break this cryptosystem?

Problem 9.83.
You've seen how the RSA encryption scheme works, but why is it hard to break? In this problem, you will see that finding private keys is as hard as finding the prime factorizations of integers. Since there is a general consensus in the crypto community (enough to persuade many large financial institutions, for example) that factoring numbers with a few hundred digits requires astronomical computing resources, we can therefore be sure it will take the same kind of overwhelming effort to find RSA private keys of a few hundred digits. This means we can be confident the private RSA keys are not somehow revealed by the public keys.$^{22}$

For this problem, assume that $n = p \cdot q$ where $p, q$ are both odd primes and that $e$ is the public key and $d$ the private key of the RSA protocol. Let $c ::= e \cdot d - 1$.

(a) Show that $\phi(n)$ divides $c$.

(b) Conclude that 4 divides $c$.

(c) Show that if $\gcd(r, n) = 1$, then $r^c \equiv 1 \pmod n$.

A square root of $m$ modulo $n$ is an integer $s \in [0..n)$ such that $s^2 \equiv m \pmod n$. Here is a nice fact to know: when $n$ is a product of two odd primes, then every number $m$ such that $\gcd(m, n) = 1$ has 4 square roots modulo $n$.
In particular, the number 1 has four square roots modulo $n$. The two trivial ones are 1 and $n - 1$ (which is $\equiv -1 \pmod n$). The other two are called the nontrivial square roots of 1.

(d) Since you know $c$, then for any integer $r$ you can also compute the remainder $y$ of $r^{c/2}$ divided by $n$. So $y^2 \equiv r^c \pmod n$. Now if $r$ is relatively prime to $n$, then $y$ will be a square root of 1 modulo $n$ by part (c).

Show that if $y$ turns out to be a nontrivial root of 1 modulo $n$, then you can factor $n$. Hint: From the fact that $y^2 - 1 = (y + 1)(y - 1)$, show that $y + 1$ must be divisible by exactly one of $q$ and $p$.

(e) It turns out that at least half the positive integers $r < n$ that are relatively prime to $n$ will yield $y$'s in part (d) that are nontrivial roots of 1. Conclude that if, in addition to $n$ and the public key $e$, you also knew the private key $d$, then you can be sure of being able to factor $n$.

$^{22}$ This is a very weak kind of "security" property, because it doesn't even rule out the possibility of deciphering RSA encoded messages by some method that did not require knowing the private key. Nevertheless, over twenty years experience supports the security of RSA in practice.

Exam Problems

Problem 9.84.
Suppose Alice and Bob are using the RSA cryptosystem to send secure messages. Each of them has a public key visible to everyone and a private key known only to themselves, and using RSA in the usual way, they are able to send secret messages to each other over public channels.

But a concern for Bob is how he knows that a message he gets is actually from Alice, as opposed to some imposter claiming to be Alice. This concern can be met by using RSA to add unforgeable "signatures" to messages. To send a message $m$ to Bob with an unforgeable signature, Alice uses RSA encryption on her message $m$, but instead of using Bob's public key to encrypt $m$, she uses her own private key to obtain a message $m_1$.
She then sends $m_1$ as her "signed" message to Bob.

(a) Explain how Bob can read the original message $m$ from Alice's signed message $m_1$. (Let $(n_A, e_A)$ be Alice's public key and $d_A$ her private key. Assume $m \in [0..n_A)$.)

(b) Briefly explain why Bob can be confident, assuming RSA is secure, that $m_1$ came from Alice rather than some imposter.

(c) Notice that not only Bob, but anyone can use Alice's public key to reconstruct her message $m$ from its signed version $m_1$. So how can Alice send a secret signed message to Bob over public channels?

10 Directed graphs & Partial Orders

Directed graphs, called digraphs for short, provide a handy way to represent how things are connected together and how to get from one thing to another by following those connections. They are usually pictured as a bunch of dots or circles with arrows between some of the dots, as in Figure 10.1. The dots are called nodes or vertices and the lines are called directed edges or arrows; the digraph in Figure 10.1 has 4 nodes and 6 directed edges.

Digraphs appear everywhere in computer science. For example, the digraph in Figure 10.2 represents a communication net, a topic we'll explore in depth in Chapter 11. Figure 10.2 has three "in" nodes (pictured as little squares) representing locations where packets may arrive at the net, three "out" nodes representing destination locations for packets, and six remaining nodes (pictured with little circles) representing switches. The 16 edges indicate paths that packets can take through the router.

Another place digraphs emerge in computer science is in the hyperlink structure of the World Wide Web. Letting the vertices $x_1, \ldots, x_n$ correspond to web pages, and using arrows to indicate when one page has a hyperlink to another, results in a digraph like the one in Figure 10.3, although the graph of the real World Wide Web would have $n$ be a number in the billions and probably even the trillions.
At first glance, this graph wouldn't seem to be very interesting. But in 1995, two students at Stanford, Larry Page and Sergey Brin, realized how useful the structure of this graph could be in building a search engine, a realization that ultimately made them multibillionaires. So pay attention to graph theory, and who knows what might happen!

Figure 10.1 A 4-node directed graph with 6 edges (vertices $a$, $b$, $c$, $d$).

Figure 10.2 A 6-switch packet routing digraph (inputs in0, in1, in2; outputs out0, out1, out2).

Figure 10.3 Links among Web Pages (vertices $x_1, \ldots, x_7$).

Figure 10.4 A directed edge $e = \langle u \to v \rangle$. The edge $e$ starts at the tail vertex $u$ and ends at the head vertex $v$.

Definition 10.0.1. A directed graph $G$ consists of a nonempty set $V(G)$, called the vertices of $G$, and a set $E(G)$, called the edges of $G$. An element of $V(G)$ is called a vertex. A vertex is also called a node; the words "vertex" and "node" are used interchangeably. An element of $E(G)$ is called a directed edge. A directed edge is also called an "arrow" or simply an "edge." A directed edge starts at some vertex $u$ called the tail of the edge, and ends at some vertex $v$ called the head of the edge, as in Figure 10.4. Such an edge can be represented by the ordered pair $(u, v)$. The notation $\langle u \to v \rangle$ denotes this edge.

There is nothing new in Definition 10.0.1 except for a lot of vocabulary. Formally, a digraph $G$ is the same as a binary relation on the set $V = V(G)$; that is, a digraph is just a binary relation whose domain and codomain are the same set $V$. In fact, we've already referred to the arrows in a relation $G$ as the "graph" of $G$. For example, the divisibility relation on the integers in the interval $[1..12]$ could be pictured by the digraph in Figure 10.5.
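To make the "digraph as binary relation" view concrete, here is a minimal Python sketch of the divisibility digraph on $[1..12]$. The names `V`, `E`, `tail`, and `head` are chosen here for illustration; the text does not prescribe a representation beyond ordered pairs $(u, v)$.

```python
# Vertices: the integers in the interval [1..12].
V = set(range(1, 13))

# Edges: an ordered pair (u, v) stands for the directed edge <u -> v>,
# present exactly when u divides v (the divisibility relation).
E = {(u, v) for u in V for v in V if v % u == 0}


def tail(e):
    """The vertex where edge e starts."""
    return e[0]


def head(e):
    """The vertex where edge e ends."""
    return e[1]


# Every number divides itself, so every vertex has an edge to itself.
assert all((v, v) in E for v in V)

# Edges are directed: 4 divides 12, but 12 does not divide 4.
assert (4, 12) in E and (12, 4) not in E
```

Representing the edge set as a set of pairs is exactly the "graph" of the relation, so membership tests such as `(4, 12) in E` read the same way as the statement "4 divides 12."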
10.1 Vertex Degrees

The in-degree of a vertex in a digraph is the number of arrows coming into it, and similarly its out-degree is the number of arrows out of it. More precisely,

Definition 10.1.1. If $G$ is a digraph and $v \in V(G)$, then

$$\text{indeg}(v) ::= |\{e \in E(G) \mid \text{head}(e) = v\}|$$
$$\text{outdeg}(v) ::= |\{e \in E(G) \mid \text{tail}(e) = v\}|$$

An immediate consequence of this definition is

Lemma 10.1.2.

$$\sum_{v \in V(G)} \text{indeg}(v) = \sum_{v \in V(G)} \text{outdeg}(v).$$

Proof. Both sums are equal to $|E(G)|$.

Figure 10.5 The Digraph for Divisibility on $\{1, 2, \ldots, 12\}$.

10.2 Walks and Paths

Picturing digraphs with points and arrows makes it natural to talk about following successive edges through the graph. For example, in the digraph of Figure 10.5, you might start at vertex 1, successively follow the edges from vertex 1 to vertex 2, from 2 to 4, from 4 to 12, and then from 12 to 12 twice (or as many times as you like). The sequence of edges followed in this way is called a walk through the graph. A path is a walk which never visits a vertex more than once. So following edges from 1 to 2 to 4 to 12 is a path, but it stops being a path if you go to 12 again.

The natural way to represent a walk is with the sequence of successive vertices it went through, in this case:

$$1 \;\; 2 \;\; 4 \;\; 12 \;\; 12 \;\; 12.$$

However, it is conventional to represent a walk by an alternating sequence of successive vertices and edges, so this walk would formally be

$$1 \; \langle 1 \to 2 \rangle \; 2 \; \langle 2 \to 4 \rangle \; 4 \; \langle 4 \to 12 \rangle \; 12 \; \langle 12 \to 12 \rangle \; 12 \; \langle 12 \to 12 \rangle \; 12. \tag{10.1}$$

The redundancy of this definition is enough to make any computer scientist cringe, but it does make it easy to talk about how many times vertices and edges occur on the walk. Here is a formal definition:

Definition 10.2.1. A walk in a digraph is an alternating sequence of vertices and edges that begins with a vertex, ends with a vertex, and such that for every edge $\langle u \to v \rangle$ in the walk, vertex $u$ is the element just before the edge, and vertex $v$ is the next element after the edge.

So a walk $\mathbf{v}$ is a sequence of the form

$$\mathbf{v} ::= v_0 \; \langle v_0 \to v_1 \rangle \; v_1 \; \langle v_1 \to v_2 \rangle \; v_2 \; \ldots \; \langle v_{k-1} \to v_k \rangle \; v_k$$

where $\langle v_i \to v_{i+1} \rangle \in E(G)$ for $i \in [0..k)$. The walk is said to start at $v_0$, to end at $v_k$, and the length $|\mathbf{v}|$ of the walk is defined to be $k$.

The walk is a path iff all the $v_i$'s are different, that is, if $i \ne j$, then $v_i \ne v_j$.

A closed walk is a walk that begins and ends at the same vertex. A cycle is a positive length closed walk whose vertices are distinct except for the beginning and end vertices.

Note that a single vertex counts as a length zero path that begins and ends at itself. It also is a closed walk, but does not count as a cycle, since cycles by definition must have positive length. Length one cycles are possible when a node has an arrow leading back to itself. The graph in Figure 10.1 has none, but every vertex in the divisibility relation digraph of Figure 10.5 is in a length one cycle. Length one cycles are sometimes called self-loops.

Although a walk is officially an alternating sequence of vertices and edges, it is completely determined just by the sequence of successive vertices on it, or by the sequence of edges on it. We will describe walks in these ways whenever it's convenient. For example, for the graph in Figure 10.1,

• $(a, b, d)$, or simply $abd$, is a (vertex-sequence description of a) length two path,
• $(\langle a \to b \rangle, \langle b \to d \rangle)$, or simply $\langle a \to b \rangle \langle b \to d \rangle$, is (an edge-sequence description of) the same length two path,
• $abcbd$ is a length four walk,
• $dcbcbd$ is a length five closed walk,
• $bdcb$ is a length three cycle,
• $\langle b \to c \rangle \langle c \to b \rangle$ is a length two cycle, and
• $\langle c \to b \rangle \langle b \leftarrow a \rangle \langle a \to d \rangle$ is not a walk. A walk is not allowed to follow edges in the wrong direction.
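The definitions above can be checked mechanically on the graph of Figure 10.1. The following is a small Python sketch, not part of the text's development: walks are given by their vertex sequences (as strings such as `"abd"`), the edge list is the one recorded by the adjacency matrix $A_H$ in Section 10.3, and the function names are made up for illustration.

```python
# Edges of the 4-node digraph of Figure 10.1 as (tail, head) pairs,
# taken from the adjacency matrix A_H given in Section 10.3.
VERTICES = {"a", "b", "c", "d"}
EDGES = {("a", "b"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "b"), ("d", "c")}


def indeg(v):
    """Number of edges whose head is v."""
    return sum(1 for (_, h) in EDGES if h == v)


def outdeg(v):
    """Number of edges whose tail is v."""
    return sum(1 for (t, _) in EDGES if t == v)


def is_walk(vs):
    """vs is a nonempty vertex sequence; consecutive vertices must be joined by an edge."""
    return (len(vs) >= 1
            and all(v in VERTICES for v in vs)
            and all((u, v) in EDGES for u, v in zip(vs, vs[1:])))


def length(vs):
    """The length of a walk is its number of edges."""
    return len(vs) - 1


def is_path(vs):
    """A path is a walk that never repeats a vertex."""
    return is_walk(vs) and len(set(vs)) == len(vs)


def is_closed_walk(vs):
    return is_walk(vs) and vs[0] == vs[-1]


def is_cycle(vs):
    """Positive length closed walk, vertices distinct except for the two ends."""
    return (is_closed_walk(vs) and length(vs) >= 1
            and len(set(vs[:-1])) == len(vs) - 1)


# Lemma 10.1.2: both degree sums equal the number of edges.
assert sum(map(indeg, VERTICES)) == sum(map(outdeg, VERTICES)) == len(EDGES)

# The examples listed in the text.
assert is_path("abd") and length("abd") == 2
assert is_walk("abcbd") and not is_path("abcbd")
assert is_closed_walk("dcbcbd") and not is_cycle("dcbcbd")
assert is_cycle("bdcb") and is_cycle("bcb")
assert not is_walk("cbad")  # would need to follow <a -> b> backwards
```

Working with vertex sequences alone is justified by the remark above that a walk is completely determined by its successive vertices.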
If you walk for a while, stop for a rest at some vertex, and then continue walking, you have broken a walk into two parts. For example, stopping to rest after following two edges in the walk (10.1) through the divisibility graph breaks the walk into the first part of the walk

$$1 \; \langle 1 \to 2 \rangle \; 2 \; \langle 2 \to 4 \rangle \; 4 \tag{10.2}$$

from 1 to 4, and the rest of the walk

$$4 \; \langle 4 \to 12 \rangle \; 12 \; \langle 12 \to 12 \rangle \; 12 \; \langle 12 \to 12 \rangle \; 12 \tag{10.3}$$

from 4 to 12, and we'll say the whole walk (10.1) is the merge of walks (10.2) and (10.3). In general, if a walk $\mathbf{f}$ ends with a vertex $v$ and a walk $\mathbf{r}$ starts with the same vertex $v$, we'll say that their merge $\mathbf{f} \;\widehat{\ }\; \mathbf{r}$ is the walk that starts with $\mathbf{f}$ and continues with $\mathbf{r}$.$^{1}$ Two walks can only be merged if the first walk ends at the same vertex $v$ with which the second walk starts. Sometimes it's useful to name the node $v$ where the walks merge; we'll use the notation $\mathbf{f} \;\widehat{v}\; \mathbf{r}$ to describe the merge of a walk $\mathbf{f}$ that ends at $v$ with a walk $\mathbf{r}$ that begins at $v$.

A consequence of this definition is that

Lemma 10.2.2.

$$|\mathbf{f} \;\widehat{\ }\; \mathbf{r}| = |\mathbf{f}| + |\mathbf{r}|.$$

In the next section we'll get mileage out of walking this way.

10.2.1 Finding a Path

If you were trying to walk somewhere quickly, you'd know you were in trouble if you came to the same place twice. This is actually a basic theorem of graph theory.

Theorem 10.2.3. The shortest walk from one vertex to another is a path.

Proof. If there is a walk from vertex $u$ to another vertex $v \ne u$, then by the Well Ordering Principle, there must be a minimum length walk $\mathbf{w}$ from $u$ to $v$. We claim $\mathbf{w}$ is a path.

To prove the claim, suppose to the contrary that $\mathbf{w}$ is not a path, meaning that some vertex $x$ occurs twice on this walk. That is,

$$\mathbf{w} = \mathbf{e} \;\widehat{x}\; \mathbf{f} \;\widehat{x}\; \mathbf{g}$$

for some walks $\mathbf{e}, \mathbf{f}, \mathbf{g}$ where the length of $\mathbf{f}$ is positive. But then "deleting" $\mathbf{f}$ yields a strictly shorter walk

$$\mathbf{e} \;\widehat{x}\; \mathbf{g}$$

from $u$ to $v$, contradicting the minimality of $\mathbf{w}$.

Definition 10.2.4.
The distance, $\text{dist}(u, v)$, in a graph from vertex $u$ to vertex $v$ is the length of a shortest path from $u$ to $v$.

$^{1}$ It's tempting to say the merge is the concatenation of the two walks, but that wouldn't quite be right because if the walks were concatenated, the vertex $v$ would appear twice in a row where the walks meet.

As would be expected, this definition of distance satisfies:

Lemma 10.2.5. [The Triangle Inequality]

$$\text{dist}(u, v) \le \text{dist}(u, x) + \text{dist}(x, v)$$

for all vertices $u, v, x$ with equality holding iff $x$ is on a shortest path from $u$ to $v$.

Of course, you might expect this property to be true, but distance has a technical definition and its properties can't be taken for granted. For example, unlike ordinary distance in space, the distance from $u$ to $v$ is typically different from the distance from $v$ to $u$. So, let's prove the Triangle Inequality.

Proof. To prove the inequality, suppose $\mathbf{f}$ is a shortest path from $u$ to $x$ and $\mathbf{r}$ is a shortest path from $x$ to $v$. Then by Lemma 10.2.2, $\mathbf{f} \;\widehat{x}\; \mathbf{r}$ is a walk of length $\text{dist}(u, x) + \text{dist}(x, v)$ from $u$ to $v$, so this sum is an upper bound on the length of the shortest path from $u$ to $v$ by Theorem 10.2.3.

Proof of the "iff" is in Problem 10.3.

Finally, the relationship between walks and paths extends to closed walks and cycles:

Lemma 10.2.6. The shortest positive length closed walk through a vertex is a cycle through that vertex.

The proof of Lemma 10.2.6 is essentially the same as for Theorem 10.2.3; see Problem 10.4.

10.3 Adjacency Matrices

If a graph $G$ has $n$ vertices $v_0, v_1, \ldots, v_{n-1}$, a useful way to represent it is with an $n \times n$ matrix of zeroes and ones called its adjacency matrix $A_G$. The $ij$th entry of the adjacency matrix, $(A_G)_{ij}$, is 1 if there is an edge from vertex $v_i$ to vertex $v_j$ and 0 otherwise. That is,

$$(A_G)_{ij} ::= \begin{cases} 1 & \text{if } \langle v_i \to v_j \rangle \in E(G), \\ 0 & \text{otherwise.} \end{cases}$$

For example, let $H$ be the 4-node graph shown in Figure 10.1. Its adjacency matrix $A_H$ is the $4 \times 4$ matrix:

$$A_H = \begin{array}{c|cccc} & a & b & c & d \\ \hline a & 0 & 1 & 0 & 1 \\ b & 0 & 0 & 1 & 1 \\ c & 0 & 1 & 0 & 0 \\ d & 0 & 0 & 1 & 0 \end{array}$$

A payoff of this representation is that we can use matrix powers to count numbers of walks between vertices. For example, there are two length two walks from vertex $a$ to vertex $c$ in the graph $H$:

$$a \; \langle a \to b \rangle \; b \; \langle b \to c \rangle \; c$$
$$a \; \langle a \to d \rangle \; d \; \langle d \to c \rangle \; c$$

and these are the only length two walks from $a$ to $c$. Also, there is exactly one length two walk from $a$ to $d$, from $b$ to $b$, from $b$ to $c$, from $c$ to $c$, from $c$ to $d$, and from $d$ to $b$, and these are the only length two walks in $H$. It turns out we could have read these counts from the entries in the matrix $(A_H)^2$:

$$(A_H)^2 = \begin{array}{c|cccc} & a & b & c & d \\ \hline a & 0 & 0 & 2 & 1 \\ b & 0 & 1 & 1 & 0 \\ c & 0 & 0 & 1 & 1 \\ d & 0 & 1 & 0 & 0 \end{array}$$

More generally, the matrix $(A_G)^k$ provides a count of the number of length $k$ walks between vertices in any digraph $G$, as we'll now explain.

Definition 10.3.1. The length-$k$ walk counting matrix for an $n$-vertex graph $G$ is the $n \times n$ matrix $C$ such that

$$C_{uv} ::= \text{the number of length-}k\text{ walks from } u \text{ to } v. \tag{10.4}$$

Notice that the adjacency matrix $A_G$ is the length-1 walk counting matrix for $G$, and that $(A_G)^0$, which by convention is the identity matrix, is the length-0 walk counting matrix.

Theorem 10.3.2. If $C$ is the length-$k$ walk counting matrix for a graph $G$, and $D$ is the length-$m$ walk counting matrix, then $CD$ is the length $k + m$ walk counting matrix for $G$.

According to this theorem, the square $(A_G)^2$ of the adjacency matrix is the length two walk counting matrix for $G$. Applying the theorem again to $(A_G)^2 A_G$ shows that the length-3 walk counting matrix is $(A_G)^3$. More generally, it follows by induction that

Corollary 10.3.3. The length-$k$ walk counting matrix of a digraph $G$ is $(A_G)^k$, for all $k \in \mathbb{N}$.
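Corollary 10.3.3 is easy to try out on the graph $H$ of Figure 10.1. The sketch below uses plain Python lists for matrices; `mat_mul`, `mat_pow`, and `count_walks` are illustrative helper names, and `count_walks` is a brute-force recursive count used only as an independent check on the matrix powers.

```python
# Adjacency matrix of H (Figure 10.1); rows and columns are ordered a, b, c, d.
A_H = [
    [0, 1, 0, 1],
    [0, 0, 1, 1],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
]


def mat_mul(C, D):
    """Ordinary matrix product: (CD)[u][v] = sum over w of C[u][w] * D[w][v]."""
    n = len(C)
    return [[sum(C[u][w] * D[w][v] for w in range(n)) for v in range(n)]
            for u in range(n)]


def mat_pow(A, k):
    """A raised to the k-th power, with A^0 the identity matrix."""
    n = len(A)
    result = [[1 if u == v else 0 for v in range(n)] for u in range(n)]
    for _ in range(k):
        result = mat_mul(result, A)
    return result


def count_walks(A, k, u, v):
    """Brute-force count of length-k walks from u to v, following edges one at a time."""
    if k == 0:
        return 1 if u == v else 0
    return sum(count_walks(A, k - 1, w, v) for w in range(len(A)) if A[u][w])


# The square agrees with the matrix (A_H)^2 displayed in the text.
assert mat_pow(A_H, 2) == [[0, 0, 2, 1], [0, 1, 1, 0], [0, 0, 1, 1], [0, 1, 0, 0]]

# Matrix powers agree with the brute-force walk counts for several lengths.
for k in range(6):
    P = mat_pow(A_H, k)
    assert all(P[u][v] == count_walks(A_H, k, u, v)
               for u in range(4) for v in range(4))
```

The inner sum in `mat_mul` is the same sum over intermediate vertices $w$ that appears in the proof of Theorem 10.3.2.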
In other words, you can determine the number of length $k$ walks between any pair of vertices simply by computing the $k$th power of the adjacency matrix! That may seem amazing, but the proof uncovers this simple relationship between matrix multiplication and numbers of walks.

Proof of Theorem 10.3.2. Any length $(k + m)$ walk between vertices $u$ and $v$ begins with a length $k$ walk starting at $u$ and ending at some vertex $w$, followed by a length $m$ walk starting at $w$ and ending at $v$. So the number of length $(k + m)$ walks from $u$ to $v$ that go through $w$ at the $k$th step equals the number $C_{uw}$ of length $k$ walks from $u$ to $w$, times the number $D_{wv}$ of length $m$ walks from $w$ to $v$. We can get the total number of length $(k + m)$ walks from $u$ to $v$ by summing, over all possible vertices $w$, the number of such walks that go through $w$ at the $k$th step. In other words,

$$\#\text{length } (k + m) \text{ walks from } u \text{ to } v = \sum_{w \in V(G)} C_{uw} \cdot D_{wv} \tag{10.5}$$

But the right-hand side of (10.5) is precisely the definition of $(CD)_{uv}$. Thus, $CD$ is indeed the length-$(k + m)$ walk counting matrix.

10.3.1 Shortest Paths

The relation between powers of the adjacency matrix and numbers of walks is cool (to us math nerds at least), but a much more important problem is finding shortest paths between pairs of nodes. For example, when you drive home for vacation, you generally want to take the shortest-time route.

One simple way to find the lengths of all the shortest paths in an $n$-vertex graph $G$ is to compute the successive powers of $A_G$ one by one up to the $(n-1)$st, watching for the first power at which each entry becomes positive. That's because Theorem 10.3.2 implies that the length of the shortest path, if any, between $u$ and $v$, that is, the distance from $u$ to $v$, will be the smallest value $k$ for which $(A_G)^k_{uv}$ is nonzero, and if there is a shortest path, its length will be at most $n - 1$. Refinements of this idea lead to methods that find shortest paths in reasonably efficient ways.
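As a rough illustration of the successive-powers idea (and not one of the efficient refinements), the following Python sketch records, for each pair of vertices, the first power $k \le n - 1$ at which the $uv$ entry becomes positive. The helper names and the use of `None` for "no path" are choices made here for the example.

```python
def mat_mul(C, D):
    """Ordinary matrix product on lists of lists."""
    n = len(C)
    return [[sum(C[u][w] * D[w][v] for w in range(n)) for v in range(n)]
            for u in range(n)]


def distances(A):
    """dist[u][v] = smallest k in [0..n-1] with (A^k)[u][v] > 0, or None if no path exists."""
    n = len(A)
    dist = [[None] * n for _ in range(n)]
    power = [[1 if u == v else 0 for v in range(n)] for u in range(n)]  # A^0
    for k in range(n):  # examine A^0, A^1, ..., A^(n-1)
        for u in range(n):
            for v in range(n):
                if dist[u][v] is None and power[u][v] > 0:
                    dist[u][v] = k
        power = mat_mul(power, A)
    return dist


# The graph H of Figure 10.1, vertices ordered a, b, c, d.
A_H = [
    [0, 1, 0, 1],
    [0, 0, 1, 1],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
]

dist = distances(A_H)
a, b, c, d = range(4)
assert dist[a][c] == 2      # e.g. a -> b -> c
assert dist[c][d] == 2      # c -> b -> d
assert dist[b][a] is None   # no edge enters a, so a is unreachable from b
assert dist[c][d] != dist[d][c]  # distance in a digraph need not be symmetric
```

This performs on the order of $n$ matrix multiplications, which is why the text describes it as simple rather than efficient.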
The methods apply as well to weighted graphs, where edges are labelled with weights or costs and the objective is to find least weight, cheapest paths. These refinements are typically covered in introductory algorithm courses, and we won't go into them any further.

10.4 Walk Relations

A basic question about a digraph is whether there is a way to get from one particular vertex to another. So for any digraph $G$ we are interested in a binary relation $G^*$, called the walk relation on $V(G)$, where

$$u \; G^* \; v ::= \text{there is a walk in } G \text{ from } u \text{ to } v. \tag{10.6}$$

Similarly, there is a positive walk relation

$$u \; G^+ \; v ::= \text{there is a positive length walk in } G \text{ from } u \text{ to } v. \tag{10.7}$$

Definition 10.4.1. When there is a walk from vertex $v$ to vertex $w$, we say that $w$ is reachable from $v$, or equivalently, that $v$ is connected to $w$.

10.4.1 Composition of Relations

There is a simple way to extend composition of functions to composition of relations, and this gives another way to talk about walks and paths in digraphs.

Definition 10.4.2. Let $R : B \to C$ and $S : A \to B$ be binary relations. Then the composition of $R$ with $S$ is the binary relation $(R \circ S) : A \to C$ defined by the rule

$$a \; (R \circ S) \; c ::= \exists b \in B.\ (a \; S \; b) \text{ AND } (b \; R \; c). \tag{10.8}$$

This agrees with the Definition 4.3.1 of composition in the special case when $R$ and $S$ are functions.$^{2}$

Remembering that a digraph is a binary relation on its vertices, it makes sense to compose a digraph $G$ with itself. Then if we let $G^n$ denote the composition of $G$ with itself $n$ times, it's easy to check (see Problem 10.11) that $G^n$ is the length-$n$ walk relation:

$$a \; G^n \; b \quad \text{iff} \quad \text{there is a length } n \text{ walk in } G \text{ from } a \text{ to } b.$$

$^{2}$ The reversal of the order of $R$ and $S$ in (10.8) is not a typo. This is so that relational composition generalizes function composition. The value of function $f$ composed with function $g$ at an argument $x$ is $f(g(x))$. So in the composition $f \circ g$, the function $g$ is applied first.
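Rule (10.8) and the claim that $G^n$ is the length-$n$ walk relation can be sketched in a few lines of Python, with a relation represented as a set of ordered pairs. The names `compose` and `relation_power` are illustrative, and the edge set used for $G$ is the Figure 10.1 graph as recorded by its adjacency matrix in Section 10.3.

```python
def compose(R, S):
    """R o S: a (R o S) c iff there is some b with a S b and b R c.

    Note the order: S is applied first, then R, as in function composition.
    """
    return {(a, c) for (a, b) in S for (b2, c) in R if b == b2}


def relation_power(G, vertices, n):
    """G composed with itself n times; G^0 is the identity relation on the vertices."""
    result = {(v, v) for v in vertices}
    for _ in range(n):
        result = compose(G, result)
    return result


# Function composition as a special case: g(x) = x + 1 is applied first, then f(y) = 2y.
g = {(x, x + 1) for x in range(5)}
f = {(y, 2 * y) for y in range(10)}
assert compose(f, g) == {(x, 2 * (x + 1)) for x in range(5)}

# The digraph of Figure 10.1 viewed as a binary relation on its vertices.
V = {"a", "b", "c", "d"}
G = {("a", "b"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "b"), ("d", "c")}

# G^2 relates exactly the pairs joined by a length two walk.
assert relation_power(G, V, 2) == {
    ("a", "c"), ("a", "d"), ("b", "b"), ("b", "c"),
    ("c", "c"), ("c", "d"), ("d", "b"),
}
```

Unlike the walk counting matrix, the relation $G^n$ only records whether a length-$n$ walk exists, not how many there are.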
This even works for $n = 0$, with the usual convention that $G^0$ is the identity relation $\text{Id}_{V(G)}$ on the set of vertices.$^{3}$

Since there is a walk iff there is a path, and every path is of length at most $|V(G)| - 1$, we now have$^{4}$

$$G^* = G^0 \cup G^1 \cup G^2 \cup \ldots \cup G^{|V(G)|-1} = (G \cup G^0)^{|V(G)|-1}. \tag{10.9}$$

The final equality points to the use of repeated squaring as a way to compute $G^*$ with $\log n$ rather than $n - 1$ compositions of relations, where $n = |V(G)|$.

10.5 Directed Acyclic Graphs & Scheduling

Some of the prerequisites of MIT computer science subjects are shown in Figure 10.6. An edge going from subject $s$ to subject $t$ indicates that $s$ is listed in the catalogue as a direct prerequisite of $t$. Of course, before you can take subject $t$, you have to take not only subject $s$, but also all the prerequisites of $s$, and any prerequisites of those prerequisites, and so on. We can state this precisely in terms of the positive walk relation: if $D$ is the direct prerequisite relation on subjects, then subject $u$ has to be completed before taking subject $v$ iff $u \; D^+ \; v$.

Of course it would take forever to graduate if this direct prerequisite graph had a positive length closed walk. We need to forbid such closed walks, which by Lemma 10.2.6 is the same as forbidding cycles. So, the direct prerequisite graph among subjects had better be acyclic:

Definition 10.5.1. A directed acyclic graph (DAG) is a directed graph with no cycles.

DAGs have particular importance in computer science. They capture key concepts used in analyzing task scheduling and concurrency control. When distributing a program across multiple processors, we're in trouble if one part of the program needs an output that another part hasn't generated yet! So let's examine DAGs and their connection to scheduling in more depth.

$^{3}$ The identity relation $\text{Id}_A$ on a set $A$ is the equality relation: $a \; \text{Id}_A \; b$ iff $a = b$, for $a, b \in A$.
$^{4}$ Equation (10.9) involves a harmless abuse of notation: we should have written

$$\text{graph}(G^*) = \text{graph}(G^0) \cup \text{graph}(G^1) \cup \ldots.$$

Figure 10.6 Subject prerequisites for MIT Computer Science (6-3) Majors.

Figure 10.7 DAG describing which clothing items have to be put on before others (vertices: left sock, right sock, underwear, shirt, pants, tie, left shoe, right shoe, belt, jacket).

10.5.1 Scheduling

In a scheduling problem, there is a set of tasks, along with a set of constraints specifying that starting certain tasks depends on other tasks being completed beforehand.
We can map these sets to a digraph, with the tasks as the nodes and the direct prerequisite constraints as the edges.

For example, the DAG in Figure 10.7 describes how a man might get dressed for a formal occasion. As we describe above, vertices correspond to garments and the edges specify which garments have to be put on before which others.

When faced with a set of prerequisites like this one, the most basic task is finding an order in which to perform all the tasks, one at a time, while respecting the dependency constraints. Ordering tasks in this way is known as topological sorting.

Definition 10.5.2. A topological sort of a finite DAG is a list of all the vertices such that each vertex $v$ appears earlier in the list than every other vertex reachable from $v$.

There are many ways to get dressed one item at a time while obeying the constraints of Figure 10.7. We have listed two such topological sorts in Figure 10.8. In fact, we can prove that every finite DAG has a topological sort. You can think of this as a mathematical proof that you can indeed get dressed in the morning.

    (a)            (b)
    underwear      left sock
    shirt          shirt
    pants          tie
    belt           underwear
    tie            right sock
    jacket         pants
    left sock      right shoe
    right sock     belt
    left shoe      jacket
    right shoe     left shoe

Figure 10.8 Two possible topological sorts of the prerequisites described in Figure 10.7.

Topological sorts for finite DAGs are easy to construct by starting from minimal elements:

Definition 10.5.3. A vertex $v$ of a DAG $D$ is minimum iff every other vertex is reachable from $v$. A vertex $v$ is minimal iff $v$ is not reachable from any other vertex.

It can seem peculiar to use the words "minimum" and "minimal" to talk about vertices that start paths. These words come from the perspective that a vertex is "smaller" than any other vertex it connects to.
We’ll explore this way of thinking about DAGs in the next section, but for now we’ll use these terms because they are conventional.

One peculiarity of this terminology is that a DAG may have no minimum element but lots of minimal elements. In particular, the clothing example has four minimal elements: left sock, right sock, underwear, and shirt.

To build an order for getting dressed, we pick one of these minimal elements, say, shirt. Now there is a new set of minimal elements: the three elements we didn’t choose at step 1 are still minimal, and once we have removed shirt, tie becomes minimal as well. We pick another minimal element, continuing in this way until all elements have been picked. The sequence of elements in the order they were picked will be a topological sort. This is how the topological sorts above were constructed.

So our construction shows:

Theorem 10.5.4. Every finite DAG has a topological sort.

There are many other ways of constructing topological sorts. For example, instead of starting from the minimal elements at the beginning of paths, we could build a topological sort starting from maximal elements at the end of paths. In fact, we could build a topological sort by picking vertices arbitrarily from a finite DAG and simply inserting them into the list wherever they will fit.⁵

10.5.2 Parallel Task Scheduling

For task dependencies, topological sorting provides a way to execute tasks one after another while respecting those dependencies. But what if we have the ability to execute more than one task at the same time? For example, say tasks are programs, the DAG indicates data dependence, and we have a parallel machine with lots of processors instead of a sequential machine with only one. How should we schedule the tasks? Our goal should be to minimize the total time to complete all the tasks.
For simplicity, let’s say all the tasks take the same amount of time and all the processors are identical.

So given a finite set of tasks, how long does it take to do them all in an optimal parallel schedule? We can use walk relations on acyclic graphs to analyze this problem.

In the first unit of time, we should do all minimal items, so we would put on our left sock, our right sock, our underwear, and our shirt.⁶ In the second unit of time, we should put on our pants and our tie. Note that we cannot put on our left or right shoe yet, since we have not yet put on our pants. In the third unit of time, we should put on our left shoe, our right shoe, and our belt. Finally, in the last unit of time, we can put on our jacket. This schedule is illustrated in Figure 10.9.

The total time to do these tasks is 4 units. We cannot do better than 4 units of time because there is a sequence of 4 tasks that must each be done before the next. We have to put on a shirt before pants, pants before a belt, and a belt before a jacket. Such a sequence of items is known as a chain.

Definition 10.5.5. Two vertices in a DAG are comparable when one of them is reachable from the other. A chain in a DAG is a set of vertices such that any two of them are comparable. A vertex in a chain that is reachable from all other vertices in the chain is called a maximum element of the chain. A finite chain is said to end at its maximum element.

5 In fact, the DAG doesn’t even need to be finite, but you’ll be relieved to know that we have no need to go into this.

6 Yes, we know that you can’t actually put on both socks at once, but imagine you are being dressed by a bunch of robot processors and you are in a big hurry. Still not working for you? OK, forget about the clothes and imagine they are programs with the precedence constraints shown in Figure 10.7.
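The step-by-step schedule described above (do every currently minimal task, remove those tasks, repeat) can be sketched in Python. This is only an illustration under stated assumptions: the edge list `EDGES` is reconstructed from the constraints the text mentions, so the actual Figure 10.7 may contain further edges, and the names `greedy_layers`, `LAYERS`, and `TOPO_ORDER` are hypothetical. Listing the layers one after another also yields one topological sort in the sense of Definition 10.5.2.

```python
from collections import defaultdict

# Assumption: only the prerequisite edges mentioned in the text for Figure 10.7.
EDGES = [
    ("left sock", "left shoe"),
    ("right sock", "right shoe"),
    ("underwear", "pants"),
    ("shirt", "pants"),
    ("shirt", "tie"),
    ("pants", "left shoe"),
    ("pants", "right shoe"),
    ("pants", "belt"),
    ("tie", "jacket"),
    ("belt", "jacket"),
]


def greedy_layers(edges):
    """Repeatedly strip off all currently minimal vertices of a finite DAG.

    Vertices are taken from the edge list, so isolated vertices are ignored.
    """
    vertices = {v for edge in edges for v in edge}
    indegree = {v: 0 for v in vertices}
    successors = defaultdict(list)
    for u, v in edges:
        successors[u].append(v)
        indegree[v] += 1

    # In a DAG, the minimal vertices are exactly those with in-degree 0.
    layer = sorted(v for v in vertices if indegree[v] == 0)
    layers = []
    while layer:
        layers.append(layer)
        next_layer = []
        for u in layer:
            for v in successors[u]:
                indegree[v] -= 1          # "remove" u from the graph
                if indegree[v] == 0:      # v has just become minimal
                    next_layer.append(v)
        layer = sorted(next_layer)

    if sum(len(block) for block in layers) != len(vertices):
        raise ValueError("the digraph contains a cycle, so it is not a DAG")
    return layers


LAYERS = greedy_layers(EDGES)
# Concatenating the layers gives one topological sort.
TOPO_ORDER = [v for block in LAYERS for v in block]

for step, block in enumerate(LAYERS, start=1):
    print(step, block)
```

With the assumed edges, the four layers agree with the four time units described in the text; any ordering inside a layer is equally valid.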
A1: left sock, right sock, underwear, shirt
A2: pants, tie
A3: left shoe, right shoe, belt
A4: jacket
Figure 10.9 A parallel schedule for the getting-dressed digraph in Figure 10.7. The tasks in $A_i$ can be performed in step $i$ for $1 \le i \le 4$. A chain of 4 tasks (the critical path in this example) is shown with bold edges.

The time it takes to schedule tasks, even with an unlimited number of processors, is at least as large as the number of vertices in any chain. That’s because if we used less time than the size of some chain, then two items from the chain would have to be done at the same step, contradicting the precedence constraints. For this reason, a largest chain is also known as a critical path. For example, Figure 10.9 shows the critical path for the getting-dressed digraph.

In this example, we were able to schedule all the tasks with $t$ steps, where $t$ is the size of the largest chain. A nice feature of DAGs is that this is always possible! In other words, for any DAG, there is a legal parallel schedule that runs in $t$ total steps.

In general, a schedule for performing tasks specifies which tasks to do at successive steps. Every task $a$ has to be scheduled at some step, and all the tasks that have to be completed before task $a$ must be scheduled for an earlier step. Here’s a rigorous definition of schedule.

Definition 10.5.6. A partition of a set $A$ is a set of nonempty subsets of $A$ called the blocks⁷ of the partition, such that every element of $A$ is in exactly one block.

For example, one possible partition of the set $\{a, b, c, d, e\}$ into three blocks is

$\{a, c\} \quad \{b, e\} \quad \{d\}.$

Definition 10.5.7. A parallel schedule for a DAG $D$ is a partition of $V(D)$ into blocks $A_0, A_1, \ldots,$ such that when $j < k$, no vertex in $A_j$ is reachable from any vertex in $A_k$.
The block $A_k$ is called the set of elements scheduled at step $k$, and the time of the schedule is the number of blocks. The maximum number of elements scheduled at any step is called the number of processors required by the schedule.

A largest chain ending at an element $a$ is called a critical path to $a$, and the number of elements less than $a$ in the chain is called the depth of $a$. So in any possible parallel schedule, there must be at least $\operatorname{depth}(a)$ steps before task $a$ can be started. In particular, the minimal elements are precisely the elements with depth 0.

There is a very simple schedule that completes every task in its minimum number of steps: just use a “greedy” strategy of performing tasks as soon as possible. Schedule all the elements of depth $k$ at step $k$. That’s how we found the above schedule for getting dressed.

7 We think it would be nicer to call them the parts of the partition, but “blocks” is the standard terminology.

Theorem 10.5.8. A minimum time schedule for a finite DAG $D$ consists of the sets $A_0, A_1, \ldots,$ where

$A_k ::= \{a \in V(D) \mid \operatorname{depth}(a) = k\}.$

We’ll leave to Problem 10.24 the proof that the sets $A_k$ are a parallel schedule according to Definition 10.5.7. We can summarize the story above in this way: with an unlimited number of processors, the parallel time to complete all tasks is simply the size of a critical path:

Corollary 10.5.9. Parallel time = size of critical path.

Things get more complex when the number of processors is bounded; see Problem 10.25 for an example.

10.5.3 Dilworth’s Lemma

Definition 10.5.10. An antichain in a DAG is a set of vertices such that no two elements in the set are comparable; that is, no walk exists between any two different vertices in the set.

Our conclusions about scheduling also tell us something about antichains.

Corollary 10.5.11.
In a DAG $D$, if the size of the largest chain is $t$, then $V(D)$ can be partitioned into $t$ antichains.

Proof. Let the antichains be the sets $A_k ::= \{a \in V(D) \mid \operatorname{depth}(a) = k\}$. It is an easy exercise to verify that each $A_k$ is an antichain (Problem 10.24).

Corollary 10.5.11 implies⁸ a famous result about acyclic digraphs:

Lemma 10.5.12 (Dilworth). For all $t > 0$, every DAG with $n$ vertices must have either a chain of size greater than $t$ or an antichain of size at least $n/t$.

Proof. Assume that there is no chain of size greater than $t$. Let $\ell$ be the size of the largest antichain. If we make a parallel schedule according to the proof of Corollary 10.5.11, we create a number of antichains equal to the size of the largest chain, which is less than or equal to $t$. Each element belongs to exactly one antichain, none of which is larger than $\ell$. So the total number of elements is at most $\ell$ times $t$; that is, $\ell t \ge n$. Simple division implies that $\ell \ge n/t$.

8 Lemma 10.5.12 also follows from a more general result known as Dilworth’s Theorem, which we will not discuss.

Corollary 10.5.13. Every DAG with $n$ vertices has a chain of size greater than $\sqrt{n}$ or an antichain of size at least $\sqrt{n}$.

Proof. Set $t = \sqrt{n}$ in Lemma 10.5.12.

Example 10.5.14. When the man in our example is getting dressed, $n = 10$. Try $t = 3$. There is a chain of size 4. Try $t = 4$. There is no chain of size 5, but there is an antichain of size $4 \ge 10/4$.

10.6 Partial Orders

After mapping the “direct prerequisite” relation onto a digraph, we were then able to use the tools for understanding computer scientists’ graphs to make deductions about something as mundane as getting dressed. This may or may not have impressed you, but we can do better. In the introduction to this chapter, we mentioned a useful fact that bears repeating: any digraph is formally the same as a binary relation whose domain and codomain are its vertices.
This means that any binary relation whose domain is the same as its codomain can be translated into a digraph! Talking about the edges of a binary relation or the image of a set under a digraph may seem odd at first, but doing so will allow us to draw important connections between different types of relations. For instance, we can apply Dilworth’s lemma to the “direct prerequisite” relation for getting dressed, because the graph of that relation was a DAG.

But how can we tell if a binary relation is a DAG? And once we know that a relation is a DAG, what exactly can we conclude? In this section, we will abstract some of the properties that a binary relation might have, and use those properties to define classes of relations. In particular, we’ll explain this section’s title, partial orders.

10.6.1 The Properties of the Walk Relation in DAGs

To begin, let’s talk about some features common to all digraphs. Since merging a walk from $u$ to $v$ with a walk from $v$ to $w$ gives a walk from $u$ to $w$, both the walk and positive walk relations have a relational property called transitivity:

Definition 10.6.1. A binary relation $R$ on a set $A$ is transitive iff

$(a \mathrel{R} b \text{ AND } b \mathrel{R} c) \text{ IMPLIES } a \mathrel{R} c$

for every $a, b, c \in A$.

So we have

Lemma 10.6.2. For any digraph $G$, the walk relations $G^+$ and $G^*$ are transitive.

Since there is a length zero walk from any vertex to itself, the walk relation has another relational property called reflexivity:

Definition 10.6.3. A binary relation $R$ on a set $A$ is reflexive iff $a \mathrel{R} a$ for all $a \in A$.

Now we have

Lemma 10.6.4. For any digraph $G$, the walk relation $G^*$ is reflexive.

We know that a digraph is a DAG iff it has no positive length closed walks. Since any vertex on a closed walk can serve as the beginning and end of the walk, saying a graph is a DAG is the same as saying that there is no positive length path from any vertex back to itself.
This means that the positive walk relation $D^+$ of a DAG $D$ has a relational property called irreflexivity.

Definition 10.6.5. A binary relation $R$ on a set $A$ is irreflexive iff $\text{NOT}(a \mathrel{R} a)$ for all $a \in A$.

So we have

Lemma 10.6.6. $R$ is a DAG iff $R^+$ is irreflexive.

10.6.2 Strict Partial Orders

Here is where we begin to define interesting classes of relations:

Definition 10.6.7. A relation that is transitive and irreflexive is called a strict partial order.

A simple connection between strict partial orders and DAGs now follows from Lemma 10.6.6:

Theorem 10.6.8. A relation $R$ is a strict partial order iff $R$ is the positive walk relation of a DAG.

Strict partial orders come up in many situations which on the face of it have nothing to do with digraphs. For example, the less-than order $<$ on numbers is a strict partial order:

if $x < y$ and $y < z$ then $x < z$, so less-than is transitive, and
$\text{NOT}(x < x)$, so less-than is irreflexive.

The proper containment relation $\subset$ is also a partial order:

if $A \subset B$ and $B \subset C$ then $A \subset C$, so containment is transitive, and
$\text{NOT}(A \subset A)$, so proper containment is irreflexive.

If there are two vertices that are reachable from each other, then there is a positive length closed walk that starts at one vertex, goes to the other, and then comes back. So DAGs are digraphs in which no two vertices are mutually reachable. This corresponds to a relational property called asymmetry.

Definition 10.6.9. A binary relation $R$ on a set $A$ is asymmetric iff

$a \mathrel{R} b \text{ IMPLIES NOT}(b \mathrel{R} a)$

for all $a, b \in A$.

So we can also characterize DAGs in terms of asymmetry:

Corollary 10.6.10. A digraph $D$ is a DAG iff $D^+$ is asymmetric.

Corollary 10.6.10 and Theorem 10.6.8 combine to give

Corollary 10.6.11. A binary relation $R$ on a set $A$ is a strict partial order iff it is transitive and asymmetric.⁹

A strict partial order may be the positive walk relation of different DAGs.
This raises the question of finding a DAG with the smallest number of edges that determines a given strict partial order. For finite strict partial orders, the smallest such DAG turns out to be unique and easy to find (see Problem 10.30).

10.6.3 Weak Partial Orders

The less-than-or-equal relation $\le$ is at least as familiar as the less-than strict partial order, and the ordinary containment relation $\subseteq$ is even more common than the proper containment relation $\subset$. These are examples of weak partial orders, which are just strict partial orders with the additional condition that every element is related to itself. To state this precisely, we have to relax the asymmetry property so it does not apply when a vertex is compared to itself; this relaxed property is called antisymmetry:

9 Some texts use this corollary to define strict partial orders.

Definition 10.6.12. A binary relation $R$ on a set $A$ is antisymmetric iff, for all $a \ne b \in A$,

$a \mathrel{R} b \text{ IMPLIES NOT}(b \mathrel{R} a).$

Now we can give an axiomatic definition of weak partial orders that parallels the definition of strict partial orders.

Definition 10.6.13. A binary relation on a set is a weak partial order iff it is transitive, reflexive, and antisymmetric.

The following lemma gives another characterization of weak partial orders that follows directly from this definition.

Lemma 10.6.14. A relation $R$ on a set $A$ is a weak partial order iff there is a strict partial order $S$ on $A$ such that

$a \mathrel{R} b \text{ iff } (a \mathrel{S} b \text{ OR } a = b),$

for all $a, b \in A$.

Since a length zero walk goes from a vertex to itself, this lemma combined with Theorem 10.6.8 yields:

Corollary 10.6.15. A relation is a weak partial order iff it is the walk relation of a DAG.

For weak partial orders in general, we often write an ordering-style symbol like $\preceq$ or $\sqsubseteq$ instead of a letter symbol like $R$.¹⁰ Likewise, we generally use $\prec$ or $\sqsubset$ to indicate a strict partial order.
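The definitions in Sections 10.6.1 to 10.6.3 can be checked mechanically on a small finite relation. The following Python sketch is illustrative only: the helper names and the choice of example (proper divisibility on a six-element set) are assumptions, not part of the text. A relation is represented as a set of ordered pairs.

```python
def is_reflexive(A, R):
    return all((a, a) in R for a in A)


def is_irreflexive(A, R):
    return all((a, a) not in R for a in A)


def is_transitive(R):
    # (a R b AND b R d) IMPLIES a R d
    return all((a, d) in R for (a, b) in R for (c, d) in R if b == c)


def is_asymmetric(R):
    return all((b, a) not in R for (a, b) in R)


def is_antisymmetric(R):
    # Like asymmetry, but pairs (a, a) are exempt.
    return all(a == b or (b, a) not in R for (a, b) in R)


def is_strict_partial_order(A, R):
    return is_transitive(R) and is_irreflexive(A, R)


def is_weak_partial_order(A, R):
    return is_transitive(R) and is_reflexive(A, R) and is_antisymmetric(R)


# Hypothetical example: proper divisibility on a small set.
A = {1, 2, 3, 4, 6, 12}
S = {(a, b) for a in A for b in A if a != b and b % a == 0}

# Lemma 10.6.14: adding "OR a = b" to a strict partial order gives a weak one.
W = S | {(a, a) for a in A}

print(is_strict_partial_order(A, S), is_weak_partial_order(A, W))
```

On this example, `S` is also asymmetric, in line with Corollary 10.6.11, while `W` is antisymmetric but not asymmetric because of its pairs `(a, a)`.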
Two more examples of partial orders are worth mentioning:

Example 10.6.16. Let $A$ be some family of sets and define $a \mathrel{R} b$ iff $a \supset b$. Then $R$ is a strict partial order.

Example 10.6.17. The divisibility relation is a weak partial order on the nonnegative integers.

For practice with the definitions, you can check that two more examples are vacuously partial orders on a set $D$: the identity relation $\mathrm{Id}_D$ is a weak partial order, and the empty relation (the relation with no arrows) is a strict partial order.

Note that some authors define “partial orders” to be what we call weak partial orders. However, we’ll use the phrase “partial order” to mean a relation that may be either a weak or strict partial order.

10 General relations are usually denoted by a letter like $R$ instead of a cryptic squiggly symbol, so $\preceq$ is kind of like the musical performer/composer Prince, who redefined the spelling of his name to be his own squiggly symbol. A few years ago he gave up and went back to the spelling “Prince.”

10.7 Representing Partial Orders by Set Containment

Axioms can be a great way to abstract and reason about important properties of objects, but it helps to have a clear picture of the things that satisfy the axioms. DAGs provide one way to picture partial orders, but it also can help to picture them in terms of other familiar mathematical objects. In this section, we’ll show that every partial order can be pictured as a collection of sets related by containment. That is, every partial order has the “same shape” as such a collection. The technical word for “same shape” is “isomorphic.”

Definition 10.7.1. A binary relation $R$ on a set $A$ is isomorphic to a relation $S$ on a set $B$ iff there is a relation-preserving bijection from $A$ to $B$; that is, there is a bijection $f : A \to B$ such that for all $a, a' \in A$,

$a \mathrel{R} a' \text{ iff } f(a) \mathrel{S} f(a').$

To picture a partial order $\preceq$ on a set $A$ as a collection of sets, we simply represent each element of $A$ by the set of elements that are $\preceq$ to that element, that is,

$a \longleftrightarrow \{b \in A \mid b \preceq a\}.$

For example, if $\preceq$ is the divisibility relation on the set of integers $A = \{1, 3, 4, 6, 8, 12\}$, then we represent each of these integers by the set of integers in $A$ that divide it. So

$1 \longleftrightarrow \{1\}$
$3 \longleftrightarrow \{1, 3\}$
$4 \longleftrightarrow \{1, 4\}$
$6 \longleftrightarrow \{1, 3, 6\}$
$8 \longleftrightarrow \{1, 4, 8\}$
$12 \longleftrightarrow \{1, 3, 4, 6, 12\}$

So, the fact that $3 \mid 12$ corresponds to the fact that $\{1, 3\} \subseteq \{1, 3, 4, 6, 12\}$.

In this way we have completely captured the weak partial order $\preceq$ by the subset relation on the corresponding sets. Formally, we have

Lemma 10.7.2. Let $\preceq$ be a weak partial order on a set $A$. Then $\preceq$ is isomorphic to the subset relation $\subseteq$ on the collection of inverse images under the $\preceq$ relation of elements $a \in A$.

We leave the proof to Problem 10.36. Essentially the same construction shows that strict partial orders can be represented by sets under the proper subset relation $\subset$ (Problem 10.37). To summarize:

Theorem 10.7.3. Every weak partial order $\preceq$ is isomorphic to the subset relation $\subseteq$ on a collection of sets. Every strict partial order $\prec$ is isomorphic to the proper subset relation $\subset$ on a collection of sets.

10.8 Linear Orders

The familiar order relations on numbers have an important additional property: given two different numbers, one will be bigger than the other. Partial orders with this property are said to be linear orders. You can think of a linear order as one where all the elements are lined up so that everyone knows exactly who is ahead and who is behind them in the line.¹¹

Definition 10.8.1. Let $R$ be a binary relation on a set $A$ and let $a, b$ be elements of $A$. Then $a$ and $b$ are comparable with respect to $R$ iff $[a \mathrel{R} b \text{ OR } b \mathrel{R} a]$. A partial order for which every two different elements are comparable is called a linear order.
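Both the set-containment representation of Section 10.7 and the comparability test of Definition 10.8.1 can be tried out on the divisibility example with $A = \{1, 3, 4, 6, 8, 12\}$. The Python sketch below is a minimal illustration; the names `REP`, `is_isomorphic_to_subset_order`, `comparable`, and `INCOMPARABLE` are hypothetical.

```python
A = [1, 3, 4, 6, 8, 12]


def divides(b, a):
    """True iff b divides a (b is 'less than or equal to' a in this order)."""
    return a % b == 0


# Represent each element a by the set {b in A | b divides a}.
REP = {a: frozenset(b for b in A if divides(b, a)) for a in A}


def is_isomorphic_to_subset_order():
    """Check that a -> REP[a] is a relation-preserving bijection onto its image."""
    injective = len(set(REP.values())) == len(A)
    preserving = all(
        divides(a, b) == (REP[a] <= REP[b])  # <= on frozensets is "subset of"
        for a in A
        for b in A
    )
    return injective and preserving


def comparable(a, b):
    return divides(a, b) or divides(b, a)


# Pairs of different elements that are not comparable under divisibility.
INCOMPARABLE = [(a, b) for a in A for b in A if a < b and not comparable(a, b)]

print(is_isomorphic_to_subset_order())
print(INCOMPARABLE)
```

Because some pairs, such as 3 and 4, are incomparable, divisibility on this set is a weak partial order that is not linear.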
So $<$ and $\le$ are linear orders on $\mathbb{R}$. On the other hand, the subset relation is not linear, since, for example, any two different finite sets of the same size will be incomparable under $\subseteq$. The prerequisite relation on Course 6 required subjects is also not linear because, for example, neither 8.01 nor 6.042 is a prerequisite of the other.

11 Linear orders are often called “total” orders, but this terminology conflicts with the definition of “total relation,” and it regularly confuses students. Being a linear order is a much stronger condition than being a partial order that is a total relation. For example, any weak partial order is a total relation but generally won’t be linear.

10.9 Product Orders

Taking the product of two relations is a useful way to construct new relations from old ones.

Definition 10.9.1. The product $R_1 \times R_2$ of relations $R_1$ and $R_2$ is defined to be the relation with

$\operatorname{domain}(R_1 \times R_2) ::= \operatorname{domain}(R_1) \times \operatorname{domain}(R_2),$
$\operatorname{codomain}(R_1 \times R_2) ::= \operatorname{codomain}(R_1) \times \operatorname{codomain}(R_2),$
$(a_1, a_2) \mathrel{(R_1 \times R_2)} (b_1, b_2) \text{ iff } [a_1 \mathrel{R_1} b_1 \text{ and } a_2 \mathrel{R_2} b_2].$

It follows directly from the definitions that products preserve the properties of transitivity, reflexivity, irreflexivity, and antisymmetry (see Problem 10.50). If $R_1$ and $R_2$ both have one of these properties, then so does $R_1 \times R_2$. This implies that if $R_1$ and $R_2$ are both partial orders, then so is $R_1 \times R_2$.

Example 10.9.2. Define a relation $Y$ on age-height pairs of being younger and shorter. This is the relation on the set of pairs $(y, h)$ where $y$ is a nonnegative integer $\le 2400$ that we interpret as an age in months, and $h$ is a nonnegative integer $\le 120$ describing height in inches. We define $Y$ by the rule

$(y_1, h_1) \mathrel{Y} (y_2, h_2) \text{ iff } y_1 \le y_2 \text{ AND } h_1 \le h_2.$

That is, $Y$ is the product of the $\le$-relation on ages and the $\le$-relation on heights.

Since both ages and heights are ordered numerically, the age-height relation $Y$ is a partial order. Now suppose we have a class of 101 students.
Then we can apply Dilworth’s Lemma 10.5.12 with $t = 10$ to conclude that there is a chain of 11 students (that is, 11 students who get taller as they get older) or an antichain of 11 students (that is, 11 students who get taller as they get younger), which makes for an amusing in-class demo.

On the other hand, the property of being a linear order is not preserved. For example, the age-height relation $Y$ is the product of two linear orders, but it is not linear: the age 240 months, height 68 inches pair, (240, 68), and the pair (228, 72) are incomparable under $Y$.

10.10 Equivalence Relations

Definition 10.10.1. A relation is an equivalence relation if it is reflexive, symmetric, and transitive.

Congruence modulo $n$ is an important example of an equivalence relation:

It is reflexive because $x \equiv x \pmod{n}$.
It is symmetric because $x \equiv y \pmod{n}$ implies $y \equiv x \pmod{n}$.
It is transitive because $x \equiv y \pmod{n}$ and $y \equiv z \pmod{n}$ imply that $x \equiv z \pmod{n}$.

There is an even more well-known example of an equivalence relation: equality itself.

Any total function defines an equivalence relation on its domain:

Definition 10.10.2. If $f : A \to B$ is a total function, define a relation $\equiv_f$ by the rule:

$a \equiv_f a' \text{ IFF } f(a) = f(a').$

From its definition, $\equiv_f$ is reflexive, symmetric and transitive because these are properties of equality. That is, $\equiv_f$ is an equivalence relation. This observation gives another way to see that congruence modulo $n$ is an equivalence relation: the Remainder Lemma 9.6.1 implies that congruence modulo $n$ is the same as $\equiv_r$ where $r(a)$ is the remainder of $a$ divided by $n$.

In fact, a relation is an equivalence relation iff it equals $\equiv_f$ for some total function $f$ (see Problem 10.56). So equivalence relations could have been defined using Definition 10.10.2.
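Definition 10.10.2 translates directly into code: two elements are equivalent exactly when a total function gives them the same value, and grouping elements by that value collects equivalent elements together. The sketch below is a hedged illustration; the helper names `blocks_of` and `equivalent`, the modulus 5, and the finite window of integers are all assumptions chosen for the example.

```python
from collections import defaultdict


def blocks_of(A, f):
    """Group the elements of A by their value under the total function f."""
    groups = defaultdict(set)
    for a in A:
        groups[f(a)].add(a)
    return dict(groups)


def equivalent(f, a, b):
    """a and b are related by the equivalence relation defined by f."""
    return f(a) == f(b)


N = 5


def r(a):
    # For a positive modulus, Python's % returns the remainder in range(N),
    # also for negative a.
    return a % N


# A finite window of the integers stands in for Z in this illustration.
A = range(-5, 25)
BLOCKS = blocks_of(A, r)

for remainder in sorted(BLOCKS):
    print(remainder, sorted(BLOCKS[remainder]))
```

Every element of `A` lands in exactly one group, so the groups are pairwise disjoint and together cover `A`.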
10.10.1 Equivalence Classes

Equivalence relations are closely related to partitions because the images of elements under an equivalence relation are the blocks of a partition.

Definition 10.10.3. Given an equivalence relation $R : A \to A$, the equivalence class $[a]_R$ of an element $a \in A$ is the set of all elements of $A$ related to $a$ by $R$. Namely,

$[a]_R ::= \{x \in A \mid a \mathrel{R} x\}.$

In other words, $[a]_R$ is the image $R(a)$.

For example, suppose that $A = \mathbb{Z}$ and $a \mathrel{R} b$ means that $a \equiv b \pmod{5}$. Then

$[7]_R = \{\ldots, -3, 2, 7, 12, 17, 22, \ldots\}.$

Notice that 7, 12, 17, etc., all have the same equivalence class; that is, $[7]_R = [12]_R = [17]_R = \cdots$.

There is an exact correspondence between equivalence relations on $A$ and partitions of $A$. Namely, given any partition of a set, being in the same block is obviously an equivalence relation. On the other hand we have:

Theorem 10.10.4. The equivalence classes of an equivalence relation on a set $A$ are the blocks of a partition of $A$.

We’ll leave the proof of Theorem 10.10.4 as a basic exercise in axiomatic reasoning (see Problem 10.55), but let’s look at an example. The congruent-mod-5 relation partitions the integers into five equivalence classes:

$\{\ldots, -5, 0, 5, 10, 15, 20, \ldots\}$
$\{\ldots, -4, 1, 6, 11, 16, 21, \ldots\}$
$\{\ldots, -3, 2, 7, 12, 17, 22, \ldots\}$
$\{\ldots, -2, 3, 8, 13, 18, 23, \ldots\}$
$\{\ldots, -1, 4, 9, 14, 19, 24, \ldots\}$

In these terms, $x \equiv y \pmod{5}$ is equivalent to the assertion that $x$ and $y$ are both in the same block of this partition. For example, $6 \equiv 16 \pmod{5}$, because they’re both in the second block, but $2 \not\equiv 9 \pmod{5}$ because 2 is in the third block while 9 is in the last block.

In social terms, if “likes” were an equivalence relation, then everyone would be partitioned into cliques of friends who all like each other and no one else.

10.11 Summary of Relational Properties

A relation $R : A \to A$ is the same as a digraph with vertices $A$.
Reflexivity. $R$ is reflexive when
$\forall x \in A.\ x \mathrel{R} x.$
Every vertex in $R$ has a self-loop.

Irreflexivity. $R$ is irreflexive when
$\text{NOT}[\exists x \in A.\ x \mathrel{R} x].$
There are no self-loops in $R$.

Symmetry. $R$ is symmetric when
$\forall x, y \in A.\ x \mathrel{R} y \text{ IMPLIES } y \mathrel{R} x.$
If there is an edge from $x$ to $y$ in $R$, then there is an edge back from $y$ to $x$ as well.

Asymmetry. $R$ is asymmetric when
$\forall x, y \in A.\ x \mathrel{R} y \text{ IMPLIES NOT}(y \mathrel{R} x).$
There is at most one directed edge between any two vertices in $R$, and there are no self-loops.

Antisymmetry. $R$ is antisymmetric when
$\forall x \ne y \in A.\ x \mathrel{R} y \text{ IMPLIES NOT}(y \mathrel{R} x).$
Equivalently,
$\forall x, y \in A.\ (x \mathrel{R} y \text{ AND } y \mathrel{R} x) \text{ IMPLIES } x = y.$
There is at most one directed edge between any two distinct vertices, but there may be self-loops.

Transitivity. $R$ is transitive when
$\forall x, y, z \in A.\ (x \mathrel{R} y \text{ AND } y \mathrel{R} z) \text{ IMPLIES } x \mathrel{R} z.$
If there is a positive length path from $u$ to $v$, then there is an edge from $u$ to $v$.

Linear. $R$ is linear when
$\forall x \ne y \in A.\ (x \mathrel{R} y \text{ OR } y \mathrel{R} x).$
Given any two vertices in $R$, there is an edge in one direction or the other between them.

Strict Partial Order. $R$ is a strict partial order iff $R$ is transitive and irreflexive iff $R$ is transitive and asymmetric iff it is the positive length walk relation of a DAG.

Weak Partial Order. $R$ is a weak partial order iff $R$ is transitive and antisymmetric and reflexive iff $R$ is the walk relation of a DAG.

Equivalence Relation. $R$ is an equivalence relation iff $R$ is reflexive, symmetric and transitive iff $R$ equals the in-the-same-block relation for some partition of $\operatorname{domain}(R)$.

Problems for Section 10.1

Practice Problems

Problem 10.1.
Let $S$ be a nonempty set of size $n \in \mathbb{Z}^+$, and let $f : S \to S$ be a total function. Let $D_f$ be the digraph with vertices $S$ whose edges are $\{\langle s \to f(s) \rangle \mid s \in S\}$.

(a) What are the possible values of the out-degrees of vertices of $D_f$?
(b) What are the possible values of the in-degrees of the vertices?

(c) Suppose $f$ is a surjection. Now what are the possible values of the in-degrees of the vertices?

Exam Problems

Problem 10.2.
The proof of the Handshaking Lemma 10.1.2 invoked the “obvious” fact that in any finite digraph, the sum of the in-degrees of the vertices equals the number of arrows in the graph. That is,

Claim. For any finite digraph $G$,

$\sum_{v \in V(G)} \operatorname{indeg}(v) = |\operatorname{graph}(G)|. \qquad (10.10)$

But this Claim might not be obvious to everyone. So prove it by induction on the number $|\operatorname{graph}(G)|$ of arrows.

Problems for Section 10.2

Practice Problems

Problem 10.3.
Lemma 10.2.5 states that $\operatorname{dist}(u, v) \le \operatorname{dist}(u, x) + \operatorname{dist}(x, v)$. It also states that equality holds iff $x$ is on a shortest path from $u$ to $v$.

(a) Prove the “iff” statement from left to right.

(b) Prove the “iff” from right to left.

Class Problems

Problem 10.4.
(a) Give an example of a digraph that has a closed walk including two vertices but has no cycle including those vertices.

(b) Prove Lemma 10.2.6:
Lemma. The shortest positive length closed walk through a vertex is a cycle.

Problem 10.5.
A 3-bit string is a string made up of 3 characters, each a 0 or a 1. Suppose you’d like to write out, in one string, all eight of the 3-bit strings in any convenient order. For example, if you wrote out the 3-bit strings in the usual order starting with 000 001 010. . . , you could concatenate them together to get a length $3 \cdot 8 = 24$ string that started 000001010. . . .

But you can get a shorter string containing all eight 3-bit strings by starting with 00010. . . . Now 000 is present as bits 1 through 3, and 001 is present as bits 2 through 4, and 010 is present as bits 3 through 5, . . . .

(a) Say a string is 3-good if it contains every 3-bit string as 3 consecutive bits somewhere in it.
Find a 3-good string of length 10, and explain why this is the minimum length for any string that is 3-good.

(b) Explain how any walk that includes every edge in the graph shown in Figure 10.10 determines a string that is 3-good. Find the walk in this graph that determines your 3-good string from part (a).

(c) Explain why a walk in the graph of Figure 10.10 that includes every edge exactly once provides a minimum-length 3-good string.¹²

(d) Generalize the 2-bit graph to a $k$-bit digraph $B_k$ for $k \ge 2$, where $V(B_k) ::= \{0, 1\}^k$, and any walk through $B_k$ that contains every edge exactly once determines a minimum length $(k + 1)$-good bit-string.¹³

What is this minimum length?

Define the transitions of $B_k$. Verify that the in-degree of each vertex is the same as its out-degree and that there is a positive path from any vertex to any other vertex (including itself) of length at most $k$.

12 The 3-good strings explained here generalize to $n$-good strings for $n \ge 3$. They were studied by the great Dutch mathematician/logician Nicolaas de Bruijn, and are known as de Bruijn sequences. de Bruijn died in February, 2012 at the age of 94.

13 Problem 10.7 explains why such “Eulerian” paths exist.

[Figure: digraph on the vertices 00, 01, 10, 11 with edges labeled +0 and +1.]
Figure 10.10 The 2-bit graph.

Homework Problems

Problem 10.6.
(a) Give an example of a digraph in which a vertex $v$ is on a positive even-length closed walk, but no vertex is on an even-length cycle.

(b) Give an example of a digraph in which a vertex $v$ is on an odd-length closed walk but not on an odd-length cycle.

(c) Prove that every odd-length closed walk contains a vertex that is on an odd-length cycle.

Problem 10.7.
An Euler tour¹⁴ of a graph is a closed walk that includes every edge exactly once. Such walks are named after the famous 18th century mathematician Leonhard Euler.
(Same Euler as for the constant $e \approx 2.718$ and the totient function $\phi$; he did a lot of stuff.)

So how do you tell in general whether a graph has an Euler tour? At first glance this may seem like a daunting problem (the similar sounding problem of finding a cycle that touches every vertex exactly once is one of those million dollar NP-complete problems known as the Hamiltonian Cycle Problem), but it turns out to be easy.

14 In some other texts, this is called an Euler circuit.

(a) Show that if a graph has an Euler tour, then the in-degree of each vertex equals its out-degree.

A digraph is weakly connected if there is a “path” between any two vertices that may follow edges backwards or forwards.¹⁵ In the remaining parts, we’ll work out the converse. Suppose a graph is weakly connected, and the in-degree of every vertex equals its out-degree. We will show that the graph has an Euler tour.

A trail is a walk in which each edge occurs at most once.

(b) Suppose that a trail in a weakly connected graph does not include every edge. Explain why there must be an edge not on the trail that starts or ends at a vertex on the trail.

In the remaining parts, assume the graph is weakly connected, and the in-degree of every vertex equals its out-degree. Let $w$ be the longest trail in the graph.

(c) Show that if $w$ is closed, then it must be an Euler tour.
Hint: part (b)

(d) Explain why all the edges starting at the end of $w$ must be on $w$.

(e) Show that if $w$ was not closed, then the in-degree of the end would be bigger than its out-degree.
Hint: part (d)

(f) Conclude that if the in-degree of every vertex equals its out-degree in a finite, weakly connected digraph, then the digraph has an Euler tour.

Problems for Section 10.3

Homework Problems

Problem 10.8.
The weight of a walk in a weighted graph is the sum of the weights of the successive edges in the walk. The minimum weight matrix for length $k$ walks in an $n$-vertex graph $G$ is the $n \times n$ matrix $W$ such that for $u, v \in V(G)$,

$$W_{uv} ::= \begin{cases} w & \text{if } w \text{ is the minimum weight among length } k \text{ walks from } u \text{ to } v, \\ \infty & \text{if there is no length } k \text{ walk from } u \text{ to } v. \end{cases}$$

The min+ product of two $n \times n$ matrices $W$ and $V$ with entries in $\mathbb{R} \cup \{\infty\}$ is the $n \times n$ matrix $W \cdot_{\min+} V$ whose $ij$ entry is

$$(W \cdot_{\min+} V)_{ij} ::= \min\{W_{ik} + V_{kj} \mid 1 \le k \le n\}.$$

Prove the following theorem.

Theorem. If $W$ is the minimum weight matrix for length $k$ walks in a weighted graph $G$, and $V$ is the minimum weight matrix for length $m$ walks, then $W \cdot_{\min+} V$ is the minimum weight matrix for length $k + m$ walks.

[15] More precisely, a graph $G$ is weakly connected iff there is a path from any vertex to any other vertex in the graph $H$ with $V(H) = V(G)$ and $E(H) = E(G) \cup \{\langle v \to u \rangle \mid \langle u \to v \rangle \in E(G)\}$. In other words, $H = G \cup G^{-1}$.

Problems for Section 10.4

Practice Problems

Problem 10.9.
Let
$A ::= \{1, 2, 3\}$
$B ::= \{4, 5, 6\}$
$R ::= \{(1,4), (1,5), (2,5), (3,6)\}$
$S ::= \{(4,5), (4,6), (5,4)\}$.
Note that $R$ is a relation from $A$ to $B$ and $S$ is a relation from $B$ to $B$. List the pairs in each of the relations below.

(a) $S \circ R$.
(b) $S \circ S$.
(c) $S^{-1} \circ R$.

Problem 10.10.
In a round-robin tournament, every two distinct players play against each other just once. For a round-robin tournament with no tied games, a record of who beat whom can be described with a tournament digraph, where the vertices correspond to players and there is an edge $\langle x \to y \rangle$ iff $x$ beat $y$ in their game.

A ranking is a path that includes all the players.
So in a ranking, each player won the game against the next lowest ranked player, but may very well have lost their games against much lower ranked players; whoever does the ranking may have a lot of room to play favorites.

(a) Give an example of a tournament digraph with more than one ranking.

(b) Prove that if a tournament digraph is a DAG, then it has at most one ranking.

(c) Prove that every finite tournament digraph has a ranking.

Optional

(d) Prove that the greater-than relation $>$ on the rational numbers $\mathbb{Q}$ is a DAG and a tournament graph that has no ranking.

Homework Problems

Problem 10.11.
Let $R$ be a binary relation on a set $A$. Regarding $R$ as a digraph, let $W^{(n)}$ denote the length-$n$ walk relation in the digraph $R$, that is,

$a \; W^{(n)} \; b$ ::= there is a length $n$ walk from $a$ to $b$ in $R$.

(a) Prove that
$$W^{(n)} \circ W^{(m)} = W^{(m+n)} \qquad (10.11)$$
for all $m, n \in \mathbb{N}$, where $\circ$ denotes relational composition.

(b) Let $R^n$ be the composition of $R$ with itself $n$ times for $n \ge 0$. So $R^0 ::= \text{Id}_A$, and $R^{n+1} ::= R \circ R^n$. Conclude that
$$R^n = W^{(n)} \qquad (10.12)$$
for all $n \in \mathbb{N}$.

(c) Conclude that
$$R^+ = \bigcup_{i=1}^{|A|} R^i$$
where $R^+$ is the positive length walk relation determined by $R$ on the set $A$.

Problem 10.12.
We can represent a relation $S$ between two sets $A = \{a_1, \dots, a_n\}$ and $B = \{b_1, \dots, b_m\}$ as an $n \times m$ matrix $M_S$ of zeroes and ones, with the elements of $M_S$ defined by the rule

$M_S(i, j) = 1$ IFF $a_i \; S \; b_j$.

If we represent relations as matrices this way, then we can compute the composition of two relations $R$ and $S$ by a "boolean" matrix multiplication $\otimes$ of their matrices. Boolean matrix multiplication is the same as matrix multiplication except that addition is replaced by OR, multiplication is replaced by AND, and 0 and 1 are used as the Boolean values False and True. Namely, suppose $R : B \to C$ is a binary relation with $C = \{c_1, \dots, c_p\}$. So $M_R$ is an $m \times p$ matrix.
Then $M_S \otimes M_R$ is an $n \times p$ matrix defined by the rule:
$$[M_S \otimes M_R](i, j) ::= \text{OR}_{k=1}^{m} \, [M_S(i, k) \text{ AND } M_R(k, j)]. \qquad (10.13)$$
Prove that the matrix representation $M_{R \circ S}$ of $R \circ S$ equals $M_S \otimes M_R$ (note the reversal of $R$ and $S$).

Problem 10.13.
Suppose that there are $n$ chickens in a farmyard. Chickens are rather aggressive birds that tend to establish dominance in relationships by pecking; hence the term "pecking order." In particular, for each pair of distinct chickens, either the first pecks the second or the second pecks the first, but not both. We say that chicken $u$ virtually pecks chicken $v$ if either:

• Chicken $u$ directly pecks chicken $v$, or
• Chicken $u$ pecks some other chicken $w$ who in turn pecks chicken $v$.

A chicken that virtually pecks every other chicken is called a king chicken.

We can model this situation with a chicken digraph whose vertices are chickens with an edge from chicken $u$ to chicken $v$ precisely when $u$ pecks $v$. In the graph in Figure 10.11, three of the four chickens are kings. Chicken $c$ is not a king in this example since it does not peck chicken $b$ and it does not peck any chicken that pecks chicken $b$. Chicken $a$ is a king since it pecks chicken $d$, who in turn pecks chickens $b$ and $c$.

[Figure 10.11: A 4-chicken tournament in which chickens a, b and d are kings; chicken c is not a king.]

In general, a tournament digraph is a digraph with exactly one edge between each pair of distinct vertices.

(a) Define a 10-chicken tournament graph with a king chicken that has outdegree 1.

(b) Describe a 5-chicken tournament graph in which every player is a king.

(c) Prove

Theorem (King Chicken Theorem). Any chicken with maximum outdegree in a tournament is a king.

The King Chicken Theorem means that if the player with the most victories is defeated by another player $x$, then at least he/she defeats some third player that defeats $x$.
In this sense, the player with the most victories has some sort of bragging rights over every other player. Unfortunately, as Figure 10.11 illustrates, there can be many other players with such bragging rights, even some with fewer victories.

Problems for Section 10.5

Practice Problems

Problem 10.14.
What is the size of the longest chain that is guaranteed to exist in any partially ordered set of $n$ elements? What about the largest antichain?

Problem 10.15.
Let $\{A, \dots, H\}$ be a set of tasks that we must complete. The following DAG describes which tasks must be done before others, where there is an arrow from $S$ to $T$ iff $S$ must be done before $T$.

(a) Write the longest chain.

(b) Write the longest antichain.

(c) If we allow parallel scheduling, and each task takes 1 minute to complete, what is the minimum amount of time needed to complete all tasks?

Problem 10.16.
Describe a sequence consisting of the integers from 1 to 10,000 in some order so that there is no increasing or decreasing subsequence of size 101.

Problem 10.17.
What is the smallest number of partially ordered tasks for which there can be more than one minimum time schedule, if there is an unlimited number of processors? Explain your answer.

Problem 10.18.
The following DAG describes the prerequisites among tasks $\{1, \dots, 9\}$.

[Figure: prerequisite DAG on tasks 1 through 9.]

(a) If each task takes unit time to complete, what is the minimum parallel time to complete all the tasks? Briefly explain.

(b) What is the minimum parallel time if no more than two tasks can be completed in parallel? Briefly explain.

Problem 10.19.
The following DAG describes the prerequisites among tasks $\{1, \dots, 9\}$.

[Figure: prerequisite DAG on tasks 1 through 9.]
(a) If each task takes unit time to complete, what is the minimum parallel time to complete all the tasks? Briefly explain.

(b) What is the minimum parallel time if no more than two tasks can be completed in parallel? Briefly explain.

Class Problems

Problem 10.20.
The table below lists some prerequisite information for some subjects in the MIT Computer Science program (in 2006). This defines an indirect prerequisite relation that is a DAG with these subjects as vertices.

18.01 → 6.042
18.01 → 18.02
18.01 → 18.03
6.046 → 6.840
8.01 → 8.02
6.001 → 6.034
6.042 → 6.046
18.03, 8.02 → 6.002
6.001, 6.002 → 6.003
6.001, 6.002 → 6.004
6.004 → 6.033
6.033 → 6.857

(a) Explain why exactly six terms are required to finish all these subjects, if you can take as many subjects as you want per term. Using a greedy subject selection strategy, you should take as many subjects as possible each term. Exhibit your complete class schedule each term using a greedy strategy.

(b) In the second term of the greedy schedule, you took five subjects including 18.03. Identify a set of five subjects not including 18.03 such that it would be possible to take them in any one term (using some nongreedy schedule). Can you figure out how many such sets there are?

(c) Exhibit a schedule for taking all the courses, but only one per term.

(d) Suppose that you want to take all of the subjects, but can handle only two per term. Exactly how many terms are required to graduate? Explain why.

(e) What if you could take three subjects per term?

Problem 10.21.
A pair of Math for Computer Science Teaching Assistants, Lisa and Annie, have decided to devote some of their spare time this term to establishing dominion over the entire galaxy. Recognizing this as an ambitious project, they worked out the following table of tasks on the back of Annie's copy of the lecture notes.

1.
Devise a logo and cool imperial theme music - 8 days.
2. Build a fleet of Hyperwarp Stardestroyers out of eating paraphernalia swiped from Lobdell - 18 days.
3. Seize control of the United Nations - 9 days, after task #1.
4. Get shots for Lisa's cat, Tailspin - 11 days, after task #1.
5. Open a Starbucks chain for the army to get their caffeine - 10 days, after task #3.
6. Train an army of elite interstellar warriors by dragging people to see The Phantom Menace dozens of times - 4 days, after tasks #3, #4, and #5.
7. Launch the fleet of Stardestroyers, crush all sentient alien species, and establish a Galactic Empire - 6 days, after tasks #2 and #6.
8. Defeat Microsoft - 8 days, after tasks #2 and #6.

We picture this information in Figure 10.12 below by drawing a point for each task, and labelling it with the name and weight of the task. An edge between two points indicates that the task for the higher point must be completed before beginning the task for the lower one.

(a) Give some valid order in which the tasks might be completed.

Lisa and Annie want to complete all these tasks in the shortest possible time. However, they have agreed on some constraining work rules.

• Only one person can be assigned to a particular task; they cannot work together on a single task.
• Once a person is assigned to a task, that person must work exclusively on the assignment until it is completed. So, for example, Lisa cannot work on building a fleet for a few days, run to get shots for Tailspin, and then return to building the fleet.

[Figure 10.12: Graph representing the task precedence constraints. Vertices and weights: devise logo (8), build fleet (18), seize control (9), get shots (11), open chain (10), train army (4), launch fleet (6), defeat Microsoft (8).]
(b) Lisa and Annie want to know how long conquering the galaxy will take. Annie suggests dividing the total number of days of work by the number of workers, which is two. What lower bound on the time to conquer the galaxy does this give, and why might the actual time required be greater?

(c) Lisa proposes a different method for determining the duration of their project. She suggests looking at the duration of the critical path, the most time-consuming sequence of tasks such that each depends on the one before. What lower bound does this give, and why might it also be too low?

(d) What is the minimum number of days that Lisa and Annie need to conquer the galaxy? No proof is required.

Problem 10.22.
Answer the following questions about the powerset $\text{pow}(\{1, 2, 3, 4\})$ partially ordered by the strict subset relation $\subset$.

(a) Give an example of a maximum length chain.

(b) Give an example of an antichain of size 6.

(c) Describe an example of a topological sort of $\text{pow}(\{1, 2, 3, 4\})$.

(d) Suppose the partial order describes scheduling constraints on 16 tasks. That is, if
$$A \subset B \subseteq \{1, 2, 3, 4\},$$
then $A$ has to be completed before $B$ starts.[16] What is the minimum number of processors needed to complete all the tasks in minimum parallel time? Prove it.

(e) What is the length of a minimum time 3-processor schedule? Prove it.

Homework Problems

Problem 10.23.
The following operations can be applied to any digraph, $G$:

1. Delete an edge that is in a cycle.
2. Delete edge $\langle u \to v \rangle$ if there is a path from vertex $u$ to vertex $v$ that does not include $\langle u \to v \rangle$.
3. Add edge $\langle u \to v \rangle$ if there is no path in either direction between vertex $u$ and vertex $v$.

The procedure of repeating these operations until none of them are applicable can be modeled as a state machine. The start state is $G$, and the states are all possible digraphs with the same vertices as $G$.
(a) Let $G$ be the graph with vertices $\{1, 2, 3, 4\}$ and edges
$$\{\langle 1 \to 2 \rangle, \langle 2 \to 3 \rangle, \langle 3 \to 4 \rangle, \langle 3 \to 2 \rangle, \langle 1 \to 4 \rangle\}.$$
What are the possible final states reachable from $G$?

A line graph is a graph whose edges are all on one path. All the final graphs in part (a) are line graphs.

(b) Prove that if the procedure terminates with a digraph $H$ then $H$ is a line graph with the same vertices as $G$.
Hint: Show that if $H$ is not a line graph, then some operation must be applicable.

(c) Prove that being a DAG is a preserved invariant of the procedure.

(d) Prove that if $G$ is a DAG and the procedure terminates, then the walk relation of the final line graph is a topological sort of $G$.
Hint: Verify that the predicate

$P(u, v)$ ::= there is a directed path from $u$ to $v$

is a preserved invariant of the procedure, for any two vertices $u, v$ of a DAG.

(e) Prove that if $G$ is finite, then the procedure terminates.
Hint: Let $s$ be the number of cycles, $e$ be the number of edges, and $p$ be the number of pairs of vertices with a directed path (in either direction) between them. Note that $p \le n^2$ where $n$ is the number of vertices of $G$. Find coefficients $a, b, c$ such that $as + bp + e + c$ is nonnegative integer valued and decreases at each transition.

[16] As usual, we assume each task requires one time unit to complete.

Problem 10.24.
Let $\prec$ be a strict partial order on a set $A$ and let
$$A_k ::= \{a \mid \text{depth}(a) = k\}$$
where $k \in \mathbb{N}$.

(a) Prove that $A_0, A_1, \dots$ is a parallel schedule for $\prec$ according to Definition 10.5.7.

(b) Prove that $A_k$ is an antichain.

Problem 10.25.
We want to schedule $n$ tasks with prerequisite constraints among the tasks defined by a DAG.

(a) Explain why any schedule that requires only $p$ processors must take time at least $\lceil n/p \rceil$.
(b) Let $D_{n,t}$ be the DAG with $n$ elements that consists of a chain of $t - 1$ elements, with the bottom element in the chain being a prerequisite of all the remaining elements as in the following figure:

[Figure: the DAG $D_{n,t}$, a chain of $t - 1$ elements whose bottom element precedes the $n - (t - 1)$ remaining elements.]

What is the minimum time schedule for $D_{n,t}$? Explain why it is unique. How many processors does it require?

(c) Write a simple formula $M(n, t, p)$ for the minimum time of a $p$-processor schedule to complete $D_{n,t}$.

(d) Show that every partial order with $n$ vertices and maximum chain size $t$ has a $p$-processor schedule that runs in time $M(n, t, p)$.
Hint: Use induction on $t$.

Problems for Section 10.6

Practice Problems

Problem 10.26.
In this DAG (Figure 10.13) for the divisibility relation on $\{1, \dots, 12\}$, there is an upward path from $a$ to $b$ iff $a \mid b$. If 24 was added as a vertex, what is the minimum number of edges that must be added to the DAG to represent divisibility on $\{1, \dots, 12, 24\}$? What are those edges?

[Figure 10.13: DAG for the divisibility relation on the vertices 1 through 12.]

Problem 10.27.
(a) Prove that every strict partial order is a DAG.

(b) Give an example of a DAG that is not a strict partial order.

(c) Prove that the positive walk relation of a DAG is a strict partial order.

Class Problems

Problem 10.28.
(a) What are the maximal and minimal elements, if any, of the power set $\text{pow}(\{1, \dots, n\})$, where $n$ is a positive integer, under the empty relation?

(b) What are the maximal and minimal elements, if any, of the set $\mathbb{N}$ of all nonnegative integers under divisibility? Is there a minimum or maximum element?

(c) What are the minimal and maximal elements, if any, of the set of integers greater than 1 under divisibility?

(d) Describe a partially ordered set that has no minimal or maximal elements.

(e) Describe a partially ordered set that has a unique minimal element, but no minimum element.
Hint: It will have to be infinite.

Problem 10.29.
The proper subset relation $\subset$ defines a strict partial order on the subsets of $[1..6]$, that is, on $\text{pow}([1..6])$.

(a) What is the size of a maximal chain in this partial order? Describe one.

(b) Describe the largest antichain you can find in this partial order.

(c) What are the maximal and minimal elements? Are they maximum and minimum?

(d) Answer the previous part for the $\subset$ partial order on the set $\text{pow}([1..6]) - \{\emptyset\}$.

Problem 10.30.
If $a$ and $b$ are distinct nodes of a digraph, then $a$ is said to cover $b$ if there is an edge from $a$ to $b$ and every path from $a$ to $b$ includes this edge. If $a$ covers $b$, the edge from $a$ to $b$ is called a covering edge.

(a) What are the covering edges in the DAG in Figure 10.14?

(b) Let $\text{covering}(D)$ be the subgraph of $D$ consisting of only the covering edges. Suppose $D$ is a finite DAG. Explain why $\text{covering}(D)$ has the same positive walk relation as $D$.
Hint: Consider longest paths between a pair of vertices.

(c) Show that if two DAGs have the same positive walk relation, then they have the same set of covering edges.

(d) Conclude that $\text{covering}(D)$ is the unique DAG with the smallest number of edges among all digraphs with the same positive walk relation as $D$.

The following examples show that the above results don't work in general for digraphs with cycles.

(e) Describe two graphs with vertices $\{1, 2\}$ which have the same set of covering edges, but not the same positive walk relation. (Hint: Self-loops.)

(f)
(i) The complete digraph without self-loops on vertices 1, 2, 3 has directed edges in each direction between every two distinct vertices. What are its covering edges?
(ii) What are the covering edges of the graph with vertices 1, 2, 3 and edges $\langle 1 \to 2 \rangle, \langle 2 \to 3 \rangle, \langle 3 \to 1 \rangle$?
(iii) What about their positive walk relations?

Problems for Section 10.6

Exam Problems

Problem 10.31.
Prove that for any nonempty set $D$, there is a unique binary relation on $D$ that is both asymmetric and symmetric.

Problem 10.32.
Let $D$ be a set of size $n > 0$. Show that there are exactly $2^n$ binary relations on $D$ that are both symmetric and antisymmetric.

[Figure 10.14: DAG with edges not needed in paths, on vertices 1 through 6.]

Homework Problems

Problem 10.33.
Prove that if $R$ is a transitive binary relation on a set $A$ then $R = R^+$.

Class Problems

Problem 10.34.
Let $R$ be a binary relation on a set $D$. Each of the following equalities and containments expresses the fact that $R$ has one of the basic relational properties: reflexive, irreflexive, symmetric, asymmetric, antisymmetric, transitive. Identify which property is expressed by each of these formulas and explain your reasoning.

(a) $R \cap \text{Id}_D = \emptyset$
(b) $R \subseteq R^{-1}$
(c) $R = R^{-1}$
(d) $\text{Id}_D \subseteq R$
(e) $R \circ R \subseteq R$
(f) $R \cap R^{-1} = \emptyset$
(g) $R \cap R^{-1} \subseteq \text{Id}_D$

Problems for Section 10.7

Class Problems

Problem 10.35.

Direct Prerequisites            Subject
18.01                           6.042
18.01                           18.02
18.01                           18.03
8.01                            8.02
8.01                            6.01
6.042                           6.046
18.02, 18.03, 8.02, 6.01        6.02
6.01, 6.042                     6.006
6.01                            6.034
6.02                            6.004

(a) For the above table of MIT subject prerequisites, draw a diagram showing the subject numbers with a line going down to every subject from each of its (direct) prerequisites.

(b) Give an example of a collection of sets partially ordered by the proper subset relation that is isomorphic to ("same shape as") the prerequisite relation among MIT subjects from part (a).

(c) Explain why the empty relation is a strict partial order and describe a collection of sets partially ordered by the proper subset relation that is isomorphic to the empty relation on five elements, that is, the relation under which none of the five elements is related to anything.
(d) Describe a simple collection of sets partially ordered by the proper subset relation that is isomorphic to the "properly contains" relation $\supset$ on $\text{pow}\,\{1, 2, 3, 4\}$.

Problem 10.36.
This problem asks for a proof of Lemma 10.7.2 showing that every weak partial order can be represented by (is isomorphic to) a collection of sets partially ordered under set inclusion ($\subseteq$). Namely,

Lemma. Let $\preceq$ be a weak partial order on a set $A$. For any element $a \in A$, let
$$L(a) ::= \{b \in A \mid b \preceq a\},$$
$$\mathcal{L} ::= \{L(a) \mid a \in A\}.$$
Then the function $L : A \to \mathcal{L}$ is an isomorphism from the $\preceq$ relation on $A$ to the subset relation on $\mathcal{L}$.

(a) Prove that the function $L : A \to \mathcal{L}$ is a bijection.

(b) Complete the proof by showing that
$$a \preceq b \text{ iff } L(a) \subseteq L(b) \qquad (10.14)$$
for all $a, b \in A$.

Homework Problems

Problem 10.37.
Every partial order is isomorphic to a collection of sets under the subset relation (see Section 10.7). In particular, if $R$ is a strict partial order on a set $A$ and $a \in A$, define
$$L(a) ::= \{a\} \cup \{x \in A \mid x \; R \; a\}. \qquad (10.15)$$
Then
$$a \; R \; b \text{ iff } L(a) \subset L(b) \qquad (10.16)$$
holds for all $a, b \in A$.

(a) Carefully prove statement (10.16), starting from the definitions of strict partial order and the strict subset relation $\subset$.

(b) Prove that if $L(a) = L(b)$ then $a = b$.

(c) Give an example showing that the conclusion of part (b) would not hold if the definition of $L(a)$ in equation (10.15) had omitted the expression "$\{a\} \cup$."

Problems for Section 10.8

Practice Problems

Problem 10.38.
For each of the binary relations below, state whether it is a strict partial order, a weak partial order, or neither. If it is not a partial order, indicate which of the axioms for partial order it violates.

(a) The superset relation $\supseteq$ on the power set $\text{pow}\,\{1, 2, 3, 4, 5\}$.

(b) The relation between any two nonnegative integers $a$, $b$ given by $a \equiv b \pmod 8$.
(c) The relation between propositional formulas $G$, $H$ given by $G$ IMPLIES $H$ is valid.

(d) The relation 'beats' on Rock, Paper and Scissors (for those who don't know the game "Rock, Paper, Scissors": Rock beats Scissors, Scissors beats Paper and Paper beats Rock).

(e) The empty relation on the set of real numbers.

(f) The identity relation on the set of integers.

Problem 10.39.
(a) Verify that the divisibility relation on the set of nonnegative integers is a weak partial order.

(b) What about the divisibility relation on the set of integers?

Problem 10.40.
Prove directly from the definitions (without appealing to DAG properties) that if a binary relation $R$ on a set $A$ is transitive and irreflexive, then it is asymmetric.

Class Problems

Problem 10.41.
Show that the set of nonnegative integers partially ordered under the divides relation...

(a) ... has a minimum element.
(b) ... has a maximum element.
(c) ... has an infinite chain.
(d) ... has an infinite antichain.
(e) What are the minimal elements of divisibility on the integers greater than 1? What are the maximal elements?

Problem 10.42.
How many binary relations are there on the set $\{0, 1\}$?
How many are there that are transitive? ... asymmetric? ... reflexive? ... irreflexive? ... strict partial orders? ... weak partial orders?
Hint: There are easier ways to find these numbers than listing all the relations and checking which properties each one has.

Problem 10.43.
Prove that if $R$ is a partial order, then so is $R^{-1}$.

Problem 10.44.
(a) Indicate which of the following relations below are equivalence relations (Eq), strict partial orders (SPO), weak partial orders (WPO). For the partial orders, also indicate whether it is linear (Lin). If a relation is none of the above, indicate whether it is transitive (Tr), symmetric (Sym), or asymmetric (Asym).
(i) The relation $a = b + 1$ between integers $a$, $b$.
(ii) The superset relation $\supseteq$ on the power set of the integers.
(iii) The empty relation on the set of rationals.
(iv) The divides relation on the nonnegative integers $\mathbb{N}$.
(v) The divides relation on all the integers $\mathbb{Z}$.
(vi) The divides relation on the positive powers of 4.
(vii) The relatively prime relation on the nonnegative integers.
(viii) The relation "has the same prime factors" on the integers.

(b) A set of functions $f, g : D \to \mathbb{R}$ can be partially ordered by the $\le$ relation, where
$$[f \le g] ::= \forall d \in D.\; f(d) \le g(d).$$
Let $L$ be the set of functions $f : \mathbb{R} \to \mathbb{R}$ of the form
$$f(x) = ax + b$$
for constants $a, b \in \mathbb{R}$.
Describe an infinite chain and an infinite anti-chain in $L$.

Problem 10.45.
In an $n$-player round-robin tournament, every pair of distinct players compete in a single game. Assume that every game has a winner; there are no ties. The results of such a tournament can then be represented with a tournament digraph where the vertices correspond to players and there is an edge $\langle x \to y \rangle$ iff $x$ beat $y$ in their game.

(a) Explain why a tournament digraph cannot have cycles of length one or two.

(b) Is the "beats" relation for a tournament graph always/sometimes/never:
• asymmetric?
• reflexive?
• irreflexive?
• transitive?
Explain.

(c) Show that a tournament graph is a linear order iff there are no cycles of length three.

Homework Problems

Problem 10.46.
Let $R$ and $S$ be transitive binary relations on the same set $A$. Which of the following new relations must also be transitive? For each part, justify your answer with a brief argument if the new relation is transitive and a counterexample if it is not.

(a) $R^{-1}$
(b) $R \cap S$
(c) $R \circ R$
(d) $R \circ S$

Exam Problems

Problem 10.47.
Suppose the precedence constraints on a set of 32 unit time tasks were isomorphic to the powerset $\text{pow}(\{1, 2, 3, 4, 5\})$ under the strict subset relation $\subset$.
For example, the task corresponding to the set $\{2, 4\}$ must be completed before the task corresponding to the set $\{1, 2, 4\}$ because $\{2, 4\} \subset \{1, 2, 4\}$; the task corresponding to the empty set must be scheduled first because $\emptyset \subset S$ for every nonempty set $S \subseteq \{1, 2, 3, 4, 5\}$.

(a) What is the minimum parallel time to complete these tasks?

(b) Describe a maximum size antichain in this partial order.

(c) Briefly explain why the minimum number of processors required to complete these tasks in minimum parallel time is equal to the size of the maximum antichain.

Problem 10.48.
Let $R$ be a weak partial order on a set $A$. Suppose $C$ is a finite chain.[17]

(a) Prove that $C$ has a maximum element. Hint: Induction on the size of $C$.

(b) Conclude that there is a unique sequence of all the elements of $C$ that is strictly increasing.
Hint: Induction on the size of $C$, using part (a).

[17] A set $C$ is a chain when it is nonempty, and all elements $c, d \in C$ are comparable. Elements $c$ and $d$ are comparable iff [$c \; R \; d$ OR $d \; R \; c$].

Problems for Section 10.9

Practice Problems

Problem 10.49.
Verify that if either of $R_1$ or $R_2$ is irreflexive, then so is $R_1 \times R_2$.

Class Problems

Problem 10.50.
Let $R_1$, $R_2$ be binary relations on the same set $A$. A relational property is preserved under product if $R_1 \times R_2$ has the property whenever both $R_1$ and $R_2$ have the property.

(a) Verify that each of the following properties is preserved under product.
1. reflexivity,
2. antisymmetry,
3. transitivity.

(b) Verify that if $R_1$ and $R_2$ are partial orders and at least one of them is strict, then $R_1 \times R_2$ is a strict partial order.

Problem 10.51.
A partial order on a set $A$ is well founded when every non-empty subset of $A$ has a minimal element.
For example, the less-than relation on a well ordered set of real numbers (see Section 2.4) is a linear order that is well founded.

Prove that if $R$ and $S$ are well founded partial orders, then so is their product $R \times S$.

Homework Problems

Problem 10.52.
Let $S$ be a sequence of $n$ different numbers. A subsequence of $S$ is a sequence that can be obtained by deleting elements of $S$.

For example, if $S$ is
$$(6, 4, 7, 9, 1, 2, 5, 3, 8),$$
then 647 and 7253 are both subsequences of $S$ (for readability, we have dropped the parentheses and commas in sequences, so 647 abbreviates $(6, 4, 7)$, for example).

An increasing subsequence of $S$ is a subsequence of $S$ whose successive elements get larger. For example, 1238 is an increasing subsequence of $S$. Decreasing subsequences are defined similarly; 641 is a decreasing subsequence of $S$.

(a) List all the maximum-length increasing subsequences of $S$, and all the maximum-length decreasing subsequences.

Now let $A$ be the set of numbers in $S$. (So $A$ is the integers $[1..9]$ for the example above.) There are two straightforward linear orders for $A$. The first is numerical order where $A$ is ordered by the $<$ relation. The second is to order the elements by which comes first in $S$; call this order $<_S$. So for the example above, we would have
$$6 <_S 4 <_S 7 <_S 9 <_S 1 <_S 2 <_S 5 <_S 3 <_S 8.$$
Let $\prec$ be the product relation of the linear orders $<_S$ and $<$. That is, $\prec$ is defined by the rule
$$a \prec a' ::= a < a' \text{ AND } a <_S a'.$$
So $\prec$ is a partial order on $A$ (Section 10.9).

(b) Draw a diagram of the partial order $\prec$ on $A$. What are the maximal and minimal elements?

(c) Explain the connection between increasing and decreasing subsequences of $S$, and chains and anti-chains under $\prec$.

(d) Prove that every sequence $S$ of length $n$ has an increasing subsequence of length greater than $\sqrt{n}$ or a decreasing subsequence of length at least $\sqrt{n}$.

Problems for Section 10.10

Practice Problems

Problem 10.53.
For each of the following relations, decide whether it is reflexive, whether it is symmetric, whether it is transitive, and whether it is an equivalence relation.

(a) $\{(a, b) \mid a \text{ and } b \text{ are the same age}\}$
(b) $\{(a, b) \mid a \text{ and } b \text{ have the same parents}\}$
(c) $\{(a, b) \mid a \text{ and } b \text{ speak a common language}\}$

Problem 10.54.
For each of the binary relations below, state whether it is a strict partial order, a weak partial order, an equivalence relation, or none of these. If it is a partial order, state whether it is a linear order. If it is none, indicate which of the axioms for partial-order and equivalence relations it violates.

(a) The superset relation $\supseteq$ on the power set $\text{pow}\,\{1, 2, 3, 4, 5\}$.

(b) The relation between any two nonnegative integers $a$ and $b$ such that $a \equiv b \pmod 8$.

(c) The relation between propositional formulas $G$ and $H$ such that [$G$ IMPLIES $H$] is valid.

(d) The relation between propositional formulas $G$ and $H$ such that [$G$ IFF $H$] is valid.

(e) The relation 'beats' on Rock, Paper, and Scissors (for those who don't know the game Rock, Paper, Scissors: Rock beats Scissors, Scissors beats Paper, and Paper beats Rock).

(f) The empty relation on the set of real numbers.

(g) The identity relation on the set of integers.

(h) The divisibility relation on the integers $\mathbb{Z}$.

Class Problems

Problem 10.55.
Prove Theorem 10.10.4: The equivalence classes of an equivalence relation form a partition of the domain.

Namely, let $R$ be an equivalence relation on a set $A$ and define the equivalence class of an element $a \in A$ to be
$$[a]_R ::= \{b \in A \mid a \; R \; b\}.$$
That is, $[a]_R = R(a)$.

(a) Prove that every block is nonempty and every element of $A$ is in some block.

(b) Prove that if $[a]_R \cap [b]_R \ne \emptyset$, then $a \; R \; b$. Conclude that the sets $[a]_R$ for $a \in A$ are a partition of $A$.

(c) Prove that $a \; R \; b$ iff $[a]_R = [b]_R$.

Problem 10.56.
For any total function $f : A \to$
B define a relation f by the rule: a f a0 iff f .a/ D f .a0 /: (10.17) (a) Sketch a proof that f is an equivalence relation on A. (b) Prove that every equivalence relation R on a set A is equal to f for the function f W A ! pow.A/ defined as f .a/ WWD fa0 2 A j a R a0 g: That is, f .a/ D R.a/. Problem 10.57. Let R be a binary relation on a set D. Each of the following formulas expresses the fact that R has a familiar relational property such as reflexivity, asymmetry, tran- sitivity. Predicate formulas have roman numerals i.,ii.,. . . , and relational formulas (equalities and containments) are labelled with letters (a),(b),. . . . “mcs” — 2017/3/10 — 22:22 — page 431 — #439 10.11. Summary of Relational Properties 431 Next to each of the relational formulas, write the roman numerals of all the pred- icate formulas equivalent to it. It is not necessary to name the property expressed, but you can get partial credit if you do. For example, part (a) gets the label “i.” It expresses irreflexivity. i. 8d: NOT.d R d/ ii. 8d: dRd iii. 8c; d: c R d IFF d R c iv. 8c; d: c R d IMPLIES d R c v. 8c; d: c R d IMPLIES NOT.d R c/ vi. 8c ¤ d: c R d IMPLIES NOT.d R c/ vii. 8c ¤ d: c R d IFF NOT.d R c/ viii. 8b; c; d: .b R c AND c R d / IMPLIES b R d ix. 8b; d: Œ9c: .b R c AND c R d / IMPLIES b R d x. 8b; d: b R d IMPLIES Œ9c: .b R c AND c R d / (a) R \ IdD D ; i. (b) R R 1 (c) R D R 1 (d) IdD R (e) R ı R R (f) R R ı R (g) R \ R 1 IdD (h) R R 1 (i) R \ IdR D R 1 \ IdR (j) R \ R 1 D; “mcs” — 2017/3/10 — 22:22 — page 432 — #440 432 Chapter 10 Directed graphs & Partial Orders Homework Problems Problem 10.58. Let R1 and R2 be two equivalence relations on a set A. Prove or give a counterex- ample to the claims that the following are also equivalence relations: (a) R1 \ R2 . (b) R1 [ R2 . Problem 10.59. Prove that for any nonempty set D, there is a unique binary relation on D that is both a weak partial order and also an equivalence relation. Exam Problems Problem 10.60. 
Let $A$ be a nonempty set.

(a) Describe a single relation on $A$ that is both an equivalence relation and a weak partial order on $A$.

(b) Prove that the relation of part (a) is the only relation on $A$ with these properties.

11 Communication Networks

Modeling communication networks is an important application of digraphs in computer science. In such models, vertices represent computers, processors, and switches; edges represent wires, fiber, or other transmission lines through which data flows. For some communication networks, like the internet, the corresponding graph is enormous and largely chaotic. Highly structured networks, by contrast, find application in telephone switching systems and the communication hardware inside parallel computers. In this chapter, we’ll look at some of the nicest and most commonly used structured networks.

11.1 Routing

The kinds of communication networks we consider aim to transmit packets of data between computers, processors, telephones, or other devices. The term packet refers to some roughly fixed-size quantity of data, such as 256 bytes or 4096 bytes.

11.1.1 Complete Binary Tree

Let’s start with a complete binary tree. Figure 11.1 is an example with 4 inputs and 4 outputs.

In this diagram, and many that follow, the squares represent terminals, which are sources and destinations for packets of data. The circles represent switches, which direct packets through the network. A switch receives packets on incoming edges and relays them forward along the outgoing edges. Thus, you can imagine a data packet hopping through the network from an input terminal, through a sequence of switches joined by directed edges, to an output terminal.

In a tree there is a unique path between every pair of vertices, so there is only one way to route a packet of data from an input terminal to an output. For example, the route of a packet traveling from input 1 to output 3 is shown in bold.
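The unique-route property can be made concrete with a small sketch. The Python below is illustrative only. It assumes the switches are numbered heap-style (the root is 1 and the children of switch $k$ are $2k$ and $2k+1$), and that input $i$ and output $i$ both attach to leaf switch $N + i$; this numbering and the names `tree_route` and `route_length` are assumptions introduced here, not part of the text.

```python
def tree_route(i, j, n_terminals):
    """Switches visited, in order, on the unique route from input i to output j."""
    a, b = n_terminals + i, n_terminals + j   # leaf switches of the two terminals
    up, down = [a], [b]
    while a != b:                             # climb from both leaves until the paths meet
        a, b = a // 2, b // 2
        up.append(a)
        down.append(b)
    return up + down[-2::-1]                  # drop the duplicated meeting switch


def route_length(i, j, n_terminals):
    """Number of edges crossed, counting the two edges that touch the terminals."""
    return len(tree_route(i, j, n_terminals)) + 1


print(tree_route(1, 3, 4))    # [5, 2, 1, 3, 7]
print(route_length(1, 3, 4))  # 6
```

Under this numbering the packet from input 1 to output 3 climbs from leaf switch 5 to the root and descends to leaf switch 7, crossing six edges in all.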
11.1.2 Routing Problems

Communication networks are supposed to get packets from inputs to outputs, with each packet entering the network at its own input switch and arriving at its own output switch. We’re going to consider several different communication network designs, where each network has $N$ inputs and $N$ outputs; for convenience, we’ll assume $N$ is a power of two.

[Figure 11.1: Binary tree net with 4 inputs and outputs.]

Which input is supposed to go where is specified by a permutation of $[0..N-1]$. So a permutation $\pi$ defines a routing problem: get a packet that starts at input $i$ to output $\pi(i)$. A routing that solves a routing problem $\pi$ is a set $P$ of paths from each input to its specified output. That is, $P$ is a set of paths $P_i$ where $P_i$ goes from input $i$ to output $\pi(i)$ for $i \in [0..N-1]$.

11.2 Routing Measures

11.2.1 Network Diameter

The delay between the time that a packet arrives at an input and the time it arrives at its designated output is a critical issue in communication networks. Generally, this delay is proportional to the length of the path a packet follows. Assuming it takes one time unit to travel across a wire, the delay of a packet will be the number of wires it crosses going from input to output.

Packets are usually routed from input to output by the shortest path possible. With a shortest-path routing, the worst-case delay is the distance between the input and output that are farthest apart. This is called the diameter of the network. In other words, the diameter of a network¹ is the maximum length of any shortest path between an input and an output. For example, in the complete binary tree above, the distance from input 1 to output 3 is six. No input and output are farther apart than this, so the diameter of this tree is also six.

¹ The usual definition of diameter for a general graph (simple or directed) is the largest distance between any two vertices, but in the context of a communication network we’re only interested in the distance between inputs and outputs, not between arbitrary pairs of vertices.

More broadly, the diameter of a complete binary tree with $N$ inputs and outputs is $2 \log N + 2$. This is quite good, because the logarithm function grows very slowly. We could connect up $2^{10} = 1024$ inputs and outputs using a complete binary tree and the worst input-output delay for any packet would be $2 \log(2^{10}) + 2 = 22$.

Switch Size

One way to reduce the diameter of a network is to use larger switches. For example, in the complete binary tree, most of the switches have three incoming edges and three outgoing edges, which makes them $3 \times 3$ switches. If we had $4 \times 4$ switches, then we could construct a complete ternary tree with an even smaller diameter. In principle, we could even connect up all the inputs and outputs via a single monster $N \times N$ switch.

Of course this isn’t very productive. Using an $N \times N$ switch would just conceal the original network design problem inside this abstract switch. Eventually, we’ll have to design the internals of the monster switch using simpler components, and then we’re right back where we started. So, the challenge in designing a communication network is figuring out how to get the functionality of an $N \times N$ switch using fixed size, elementary devices, like $3 \times 3$ switches.

11.2.2 Switch Count

Another goal in designing a communication network is to use as few switches as possible. In a complete binary tree, there is one “root” switch at the top, and the number of switches doubles at successive rows, so the number of switches in an $N$-input complete binary tree is $1 + 2 + 4 + 8 + \cdots + N$. So the total number of switches is $2N - 1$ by the formula for geometric sums (Problem 5.4). This is nearly the best possible with $3 \times 3$ switches.

11.2.3 Network Latency

We’ll sometimes be choosing routings through a network that optimize some quantity besides delay. For example, in the next section we’ll be trying to minimize packet congestion. When we’re not minimizing delay, shortest routings are not always the best, and in general, the delay of a packet will depend on how it is routed. For any routing, the most delayed packet will be the one that follows the longest path in the routing. The length of the longest path in a routing is called its latency.

[Figure 11.2: Two routings in the binary tree net.]

The latency of a network depends on what’s being optimized. It is measured by assuming that optimal routings are always chosen in getting inputs to their specified outputs. That is, for each routing problem $\pi$, we choose an optimal routing that solves $\pi$. Then network latency is defined to be the largest routing latency among these optimal routings. Network latency will equal network diameter if routings are always chosen to optimize delay, but it may be significantly larger if routings are chosen to optimize something else.

For the networks we consider below, paths from input to output are uniquely determined (in the case of the tree) or all paths are the same length, so network latency will always equal network diameter.

11.2.4 Congestion

The complete binary tree has a fatal drawback: the root switch is a bottleneck. At best, this switch must handle every packet traveling from the left side of the network to the right and vice-versa. Passing all these packets through a single switch could take a long time. At worst, if this switch fails, the network is broken into two equal-sized pieces.

It’s true that if the routing problem is given by the identity permutation, $\text{Id}(i) ::= i$, then there is an easy routing $P$ that solves the problem: let $P_i$ be the path from input $i$ up through one switch and back down to output $i$.
On the other hand, if the problem was given by $\pi(i) ::= (N-1) - i$, then in any solution $Q$ for $\pi$, each path $Q_i$ beginning at input $i$ must eventually loop all the way up through the root switch and then travel back down to output $(N-1) - i$. These two situations are illustrated in Figure 11.2.

We can distinguish between a “good” set of paths and a “bad” set based on congestion. The congestion of a routing $P$ is equal to the largest number of paths in $P$ that pass through a single switch. For example, the congestion of the routing on the left is 1, since at most 1 path passes through each switch. However, the congestion of the routing on the right is 4, since 4 paths pass through the root switch (and the two switches directly below the root). Generally, lower congestion is better since packets can be delayed at an overloaded switch.

[Figure 11.3: Two-dimensional array with $N = 4$.]

By extending the notion of congestion to networks, we can also distinguish between “good” and “bad” networks with respect to bottleneck problems. For each routing problem $\pi$ for the network, we assume a routing is chosen that optimizes congestion, that is, that has the minimum congestion among all routings that solve $\pi$. Then the largest congestion that will ever be suffered by a switch will be the maximum congestion among these optimal routings. This “maximin” congestion is called the congestion of the network.

So for the complete binary tree, the worst permutation would be $\pi(i) ::= (N-1) - i$. Then in every possible solution for $\pi$, every packet would have to follow a path passing through the root switch. Thus, the max congestion of the complete binary tree is $N$, which is horrible!
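The congestion figures for the tree can be checked by brute force when $N$ is small. The following is a minimal sketch, not a definitive implementation: it assumes a heap-style numbering of switches (root 1, leaf switch $N + i$ shared by input $i$ and output $i$), and the helper names `tree_route`, `routing_congestion`, and `network_congestion` are hypothetical. Because routes in a tree are unique, the network congestion is simply the worst routing congestion over all permutations.

```python
from collections import Counter
from itertools import permutations


def tree_route(i, j, n):
    """Switches on the unique route from input i to output j (heap numbering assumed)."""
    a, b = n + i, n + j
    up, down = [a], [b]
    while a != b:
        a, b = a // 2, b // 2
        up.append(a)
        down.append(b)
    return up + down[-2::-1]


def routing_congestion(perm):
    """Largest number of routes passing through any single switch."""
    load = Counter()
    for i, j in enumerate(perm):
        load.update(tree_route(i, j, len(perm)))
    return max(load.values())


def network_congestion(n):
    """Worst case over all routing problems; feasible only for small n."""
    return max(routing_congestion(p) for p in permutations(range(n)))


print(routing_congestion([0, 1, 2, 3]))  # identity permutation: 1
print(routing_congestion([3, 2, 1, 0]))  # pi(i) = (N - 1) - i: 4
print(network_congestion(4))             # 4, which equals N
```

The exhaustive search over permutations grows as $N!$, so this check is only meant for tiny examples such as the 4-input tree of Figure 11.2.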
Let’s tally the results of our analysis so far:

network                 diameter          switch size     # switches    congestion
complete binary tree    $2 \log N + 2$    $3 \times 3$    $2N - 1$      $N$

11.3 Network Designs

11.3.1 2-D Array

Communication networks can also be designed as 2-dimensional arrays or grids. A 2-D array with four inputs and outputs is shown in Figure 11.3.

The diameter in this example is 8, which is the number of edges between input 0 and output 3. More generally, the diameter of an array with $N$ inputs and outputs is $2N$, which is much worse than the diameter of $2 \log N + 2$ in the complete binary tree. But we get something in exchange: replacing a complete binary tree with an array almost eliminates congestion.

Theorem 11.3.1. The congestion of an $N$-input array is 2.

Proof. First, we show that the congestion is at most 2. Let $\pi$ be any permutation. Define a solution $P$ for $\pi$ to be the set of paths, $P_i$, where $P_i$ goes to the right from input $i$ to column $\pi(i)$ and then goes down to output $\pi(i)$. Thus, the switch in row $i$ and column $j$ transmits at most two packets: the packet originating at input $i$ and the packet destined for output $j$.

Next, we show that the congestion is at least 2. This follows because in any routing problem $\pi$, where $\pi(0) = 0$ and $\pi(N-1) = N-1$, two packets must pass through the lower left switch.

As with the tree, the network latency when minimizing congestion is the same as the diameter. That’s because all the paths between a given input and output are the same length.

Now we can record the characteristics of the 2-D array.

network                 diameter          switch size     # switches    congestion
complete binary tree    $2 \log N + 2$    $3 \times 3$    $2N - 1$      $N$
2-D array               $2N$              $2 \times 2$    $N^2$         $2$

The crucial entry here is the number of switches, which is $N^2$. This is a major defect of the 2-D array; a network of size $N = 1000$ would require a million $2 \times 2$ switches!
Still, for applications where $N$ is small, the simplicity and low congestion of the array make it an attractive choice.

11.3.2 Butterfly

The Holy Grail of switching networks would combine the best properties of the complete binary tree (low diameter, few switches) and of the array (low congestion). The butterfly is a widely-used compromise between the two.

A good way to understand butterfly networks is as a recursive data type. The recursive definition works better if we define just the switches and their connections, omitting the terminals. So we recursively define $F_n$ to be the switches and connections of the butterfly net with $N ::= 2^n$ input and output switches.

The base case is $F_1$ with 2 input switches and 2 output switches connected as in Figure 11.4.

[Figure 11.4: $F_1$, the butterfly net with $N = 2^1$.]

[Figure 11.5: Butterfly net $F_{n+1}$ with $2^{n+1}$ inputs from two $F_n$’s.]

In the constructor step, we construct $F_{n+1}$ out of two $F_n$ nets connected to a new set of $2^{n+1}$ input switches, as shown in Figure 11.5. That is, the $i$th and $(2^n + i)$th new input switches are each connected to the same two switches, the $i$th input switches of each of two $F_n$ components for $i = 1, \ldots, 2^n$. The output switches of $F_{n+1}$ are simply the output switches of each of the $F_n$ copies.

So $F_{n+1}$ is laid out in columns of height $2^{n+1}$ by adding one more column of switches to the columns in $F_n$. Since the construction starts with two columns when $n = 1$, the $F_n$ switches are arrayed in $n + 1$ columns, each of height $2^n$. The total number of switches is the height of the columns times the number of columns, namely $2^n(n + 1)$. Remembering that $n = \log N$, we conclude that the Butterfly Net with $N$ inputs has $N(\log N + 1)$ switches.
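The recursive definition translates almost directly into code. The sketch below is a hedged illustration rather than a reference construction: it assumes that in $F_1$ each input switch is joined to both output switches (Figure 11.4 is not reproduced here), it indexes switches from 0 rather than 1, and the function name `butterfly` and the integer switch labels are choices made for this example.

```python
from itertools import count


def butterfly(n, ids=None):
    """Return (inputs, outputs, edges) for F_n, following the recursive definition."""
    ids = count() if ids is None else ids       # shared source of fresh switch labels
    if n == 1:
        ins = [next(ids), next(ids)]
        outs = [next(ids), next(ids)]
        # Assumed base case: every input switch feeds both output switches.
        return ins, outs, [(u, v) for u in ins for v in outs]
    top_in, top_out, top_edges = butterfly(n - 1, ids)
    bot_in, bot_out, bot_edges = butterfly(n - 1, ids)
    half = 2 ** (n - 1)
    new_in = [next(ids) for _ in range(2 * half)]
    edges = top_edges + bot_edges
    for i in range(half):
        # New inputs i and half + i share the i-th input of each smaller copy.
        for u in (new_in[i], new_in[half + i]):
            edges.append((u, top_in[i]))
            edges.append((u, bot_in[i]))
    return new_in, top_out + bot_out, edges


for n in (1, 2, 3):
    ins, outs, edges = butterfly(n)
    switches = {v for edge in edges for v in edge}
    print(n, len(switches), 2 ** n * (n + 1))   # the two counts agree: 4, 12, 32
```

Counting the distinct switch labels that appear in the edge list reproduces the $N(\log N + 1)$ total for the first few values of $n$.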
Since every path in $F_{n+1}$ from an input switch to an output is of length $n + 1$, the diameter of the Butterfly net with $2^{n+1}$ inputs is this length plus two because of the two edges connecting to the terminals (square boxes): one edge from input terminal to input switch (circle) and one from output switch to output terminal.

There is an easy recursive procedure to route a packet through the Butterfly Net. In the base case, there is only one way to route a packet from one of the two inputs to one of the two outputs. Now suppose we want to route a packet from an input switch to an output switch in $F_{n+1}$. If the output switch is in the “top” copy of $F_n$, then the first step in the route must be from the input switch to the unique switch it is connected to in the top copy; the rest of the route is determined by recursively routing the rest of the way in the top copy of $F_n$. Likewise, if the output switch is in the “bottom” copy of $F_n$, then the first step in the route must be to the switch in the bottom copy, and the rest of the route is determined by recursively routing in the bottom copy of $F_n$. In fact, this argument shows that the routing is unique: there is exactly one path in the Butterfly Net from each input to each output, which implies that the network latency when minimizing congestion is the same as the diameter.

The congestion of the butterfly network is about $\sqrt{N}$. More precisely, the congestion is $\sqrt{N}$ if $N$ is an even power of 2 and $\sqrt{N/2}$ if $N$ is an odd power of 2. A simple proof of this appears in Problem 11.8.

Let’s add the butterfly data to our comparison table:

network                 diameter          switch size     # switches           congestion
complete binary tree    $2 \log N + 2$    $3 \times 3$    $2N - 1$             $N$
2-D array               $2N$              $2 \times 2$    $N^2$                $2$
butterfly               $\log N + 2$      $2 \times 2$    $N(\log N + 1)$      $\sqrt{N}$ or $\sqrt{N/2}$

The butterfly has lower congestion than the complete binary tree. It also uses fewer switches and has lower diameter than the array.
However, the butterfly does not capture the best qualities of each network, but rather is a compromise somewhere between the two. Our quest for the Holy Grail of routing networks goes on.

11.3.3 Beneš Network

In the 1960s, a researcher at Bell Labs named Václav E. Beneš had a remarkable idea. He obtained a marvelous communication network with congestion 1 by placing two butterflies back-to-back. This amounts to recursively growing Beneš nets by adding both inputs and outputs at each stage. Now we recursively define $B_n$ to be the switches and connections (without the terminals) of the Beneš net with $N ::= 2^n$ input and output switches.

The base case $B_1$ with 2 input switches and 2 output switches is exactly the same as $F_1$ in Figure 11.4.

[Figure 11.6: Beneš net $B_{n+1}$ with $2^{n+1}$ inputs from two $B_n$’s.]

In the constructor step, we construct $B_{n+1}$ out of two $B_n$ nets connected to a new set of $2^{n+1}$ input switches and also a new set of $2^{n+1}$ output switches. This is illustrated in Figure 11.6.

The $i$th and $(2^n + i)$th new input switches are each connected to the same two switches: the $i$th input switches of each of two $B_n$ components for $i = 1, \ldots, 2^n$, exactly as in the Butterfly net. In addition, the $i$th and $(2^n + i)$th new output switches are connected to the same two switches, namely, to the $i$th output switches of each of two $B_n$ components.

Now, $B_{n+1}$ is laid out in columns of height $2^{n+1}$ by adding two more columns of switches to the columns in $B_n$. So, the $B_{n+1}$ switches are arrayed in $2(n + 1)$ columns. The total number of switches is the number of columns times the height of the columns, namely $2(n + 1)2^{n+1}$.

All paths in $B_{n+1}$ from an input switch to an output are of length $2(n + 1) - 1$, and the diameter of the Beneš net with $2^{n+1}$ inputs is this length plus two because of the two edges connecting to the terminals.
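The switch count and path length just derived can be checked on small cases with a sketch of the recursive construction. As before, this is illustrative only: the base case assumes each input switch of $B_1$ is joined to both output switches, indices start at 0, and the names `benes` and `path_lengths` are hypothetical helpers introduced for this example.

```python
from itertools import count


def benes(n, ids=None):
    """Return (inputs, outputs, edges) for B_n, following the recursive definition."""
    ids = count() if ids is None else ids       # shared source of fresh switch labels
    if n == 1:
        ins = [next(ids), next(ids)]
        outs = [next(ids), next(ids)]
        # Assumed base case, same as F_1: every input feeds both outputs.
        return ins, outs, [(u, v) for u in ins for v in outs]
    up_in, up_out, up_edges = benes(n - 1, ids)
    lo_in, lo_out, lo_edges = benes(n - 1, ids)
    half = 2 ** (n - 1)
    new_in = [next(ids) for _ in range(2 * half)]
    new_out = [next(ids) for _ in range(2 * half)]
    edges = up_edges + lo_edges
    for i in range(half):
        for k in (i, half + i):
            # New input k feeds the i-th input of both subnetworks ...
            edges += [(new_in[k], up_in[i]), (new_in[k], lo_in[i])]
            # ... and new output k is fed by the i-th output of both subnetworks.
            edges += [(up_out[i], new_out[k]), (lo_out[i], new_out[k])]
    return new_in, new_out, edges


def path_lengths(edges, sources):
    """Set of lengths of all directed paths from the sources to switches with no successor."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, []).append(v)
    memo = {}

    def lengths(u):
        if u not in memo:
            nxt = succ.get(u, [])
            memo[u] = {0} if not nxt else {1 + d for v in nxt for d in lengths(v)}
        return memo[u]

    return set().union(*(lengths(s) for s in sources))


for n in (1, 2, 3):
    ins, outs, edges = benes(n)
    switches = {v for edge in edges for v in edge}
    print(n, len(switches), path_lengths(edges, ins))   # 4 {1}, then 16 {3}, then 48 {5}
```

For $B_n$ the sketch reports $2n \cdot 2^n$ switches and a single path length of $2n - 1$, which agrees with the formulas above after replacing $n + 1$ by $n$.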
So Beneš has doubled the number of switches and the diameter, but by doing so he has completely eliminated congestion problems! The proof of this fact relies on a clever induction argument that we’ll come to in a moment. Let’s first see how the Beneš network stacks up:

network                 diameter          switch size     # switches           congestion
complete binary tree    $2 \log N + 2$    $3 \times 3$    $2N - 1$             $N$
2-D array               $2N$              $2 \times 2$    $N^2$                $2$
butterfly               $\log N + 2$      $2 \times 2$    $N(\log N + 1)$      $\sqrt{N}$ or $\sqrt{N/2}$
Beneš                   $2 \log N + 1$    $2 \times 2$    $2N \log N$          $1$

The Beneš network has small size and diameter, and it completely eliminates congestion. The Holy Grail of routing networks is in hand!

Theorem 11.3.2. The congestion of the $N$-input Beneš network is 1.

Proof. By induction on $n$ where $N = 2^n$. So the induction hypothesis is
$P(n) ::= \text{the congestion of } B_n \text{ is } 1$.

Base case ($n = 1$): $B_1 = F_1$ is shown in Figure 11.4. The unique routings in $F_1$ have congestion 1.

Inductive step: We assume that the congestion of an $N = 2^n$-input Beneš network is 1 and prove that the congestion of a $2N$-input Beneš network is also 1.

Digression. Time out! Let’s work through an example, develop some intuition, and then complete the proof. In the Beneš network shown in Figure 11.7 with $N = 8$ inputs and outputs, the two 4-input/output subnetworks are in dashed boxes.

By the inductive assumption, the subnetworks can each route an arbitrary permutation with congestion 1. So if we can guide packets safely through just the first and last levels, then we can rely on induction for the rest! Let’s see how this works in an example. Consider the following permutation routing problem:

$\pi(0) = 1$    $\pi(4) = 3$
$\pi(1) = 5$    $\pi(5) = 6$
$\pi(2) = 4$    $\pi(6) = 0$
$\pi(3) = 7$    $\pi(7) = 2$

We can route each packet to its destination through either the upper subnetwork or the lower subnetwork. However, the choice for one packet may constrain the choice for another.
For example, we cannot route both packet 0 and packet 4 through the same network, since that would cause two packets to collide at a single switch, resulting in congestion. Rather, one packet must go through the upper network and the other through the lower network. Similarly, packets 1 and 5, 2 and 6, and 3 and 7 must be routed through different networks. Let’s record these constraints in a graph. The vertices are the 8 packets. If two packets must pass through different networks, then there is an edge between them. Thus, our constraint graph looks like this:

[Figure 11.7: Beneš net $B_3$.]

[Diagram: constraint graph on the packets 0 through 7, with an edge between each pair of packets that share an input-side switch.]

Notice that at most one edge is incident to each vertex.

The output side of the network imposes some further constraints. For example, the packet destined for output 0 (which is packet 6) and the packet destined for output 4 (which is packet 2) cannot both pass through the same network; that would require both packets to arrive from the same switch. Similarly, the packets destined for outputs 1 and 5, 2 and 6, and 3 and 7 must also pass through different switches. We can record these additional constraints in our graph with gray edges:

[Diagram: the constraint graph with the additional gray edges for the output-side constraints.]

Notice that at most one new edge is incident to each vertex. The two lines drawn between vertices 2 and 6 reflect the two different reasons why these packets must be routed through different networks. However, we intend this to be a simple graph; the two lines still signify a single edge.

Now here’s the key insight: suppose that we could color each vertex either red or blue so that adjacent vertices are colored differently. Then all constraints are satisfied if we send the red packets through the upper network and the blue packets through the lower network.
Such a 2-coloring of the graph corresponds to a solution to the routing problem. The only remaining question is whether the constraint graph is 2-colorable, which is easy to verify:

Lemma 11.3.3. If the edges of a graph can be grouped into two sets such that every vertex has at most 1 edge from each set incident to it, then the graph is 2-colorable.

Proof. It is not hard to show that a graph is 2-colorable iff every cycle in it has even length (see Theorem 12.8.3). We’ll take this for granted here.

So all we have to do is show that every cycle has even length. Since the two sets of edges may overlap, let’s call an edge that is in both sets a doubled edge.

There are two cases:

Case 1: [The cycle contains a doubled edge.] No other edge can be incident to either of the endpoints of a doubled edge, since that endpoint would then be incident to two edges from the same set. So a cycle traversing a doubled edge has nowhere to go but back and forth along the edge an even number of times.

Case 2: [No edge on the cycle is doubled.] Since each vertex is incident to at most one edge from each set, any path with no doubled edges must traverse successive edges that alternate from one set to the other. In particular, a cycle must traverse a path of alternating edges that begins and ends with edges from different sets. This means the cycle has to be of even length.

For example, here is a 2-coloring of the constraint graph:

[Diagram: the constraint graph with each vertex labeled red or blue so that adjacent vertices have different colors.]

The solution to this graph-coloring problem provides a start on the packet routing problem: we can complete the routing in the two smaller Beneš networks by induction! Back to the proof. End of Digression.

Let $\pi$ be an arbitrary permutation of $[0..N-1]$.
Let $G$ be the graph whose vertices are packet numbers $0, 1, \ldots, N-1$ and whose edges come from the union of these two sets:

$E_1 ::= \{\langle u \text{---} v \rangle \mid |u - v| = N/2\}$, and
$E_2 ::= \{\langle u \text{---} w \rangle \mid |\pi(u) - \pi(w)| = N/2\}$.

Now any vertex $u$ is incident to at most two edges: a unique edge $\langle u \text{---} v \rangle \in E_1$ and a unique edge $\langle u \text{---} w \rangle \in E_2$. So according to Lemma 11.3.3, there is a 2-coloring for the vertices of $G$. Now route packets of one color through the upper subnetwork and packets of the other color through the lower subnetwork. Since for each edge in $E_1$, one vertex goes to the upper subnetwork and the other to the lower subnetwork, there will not be any conflicts in the first level. Since for each edge in $E_2$, one vertex comes from the upper subnetwork and the other from the lower subnetwork, there will not be any conflicts in the last level. We can complete the routing within each subnetwork by the induction hypothesis $P(n)$.

Problems for Section 11.2

Exam Problems

Problem 11.1.
Consider the following communication network:

[Figure: a network with inputs in0, in1, in2 and outputs out0, out1, out2.]

(a) What is the max congestion?

(b) Give an input/output permutation $\pi_0$ that forces maximum congestion.

(c) Give an input/output permutation $\pi_1$ that allows minimum congestion.

(d) What is the latency for the permutation $\pi_1$? (If you could not find $\pi_1$, just choose a permutation and find its latency.)

Problems for Section 11.3

Class Problems

Problem 11.2.
The Beneš network has a max congestion of one: every permutation can be routed in such a way that a single packet passes through each switch. Let’s work through an example. A diagram of the Beneš network $B_3$ of size $N = 8$ appears in Figure 11.7. The two subnetworks of size $N = 4$ are marked. We’ll refer to these as the upper and lower subnetworks.
(a) Now consider the following permutation routing problem:

$\pi(0) = 3$    $\pi(4) = 2$
$\pi(1) = 1$    $\pi(5) = 0$
$\pi(2) = 6$    $\pi(6) = 7$
$\pi(3) = 5$    $\pi(7) = 4$

Each packet must be routed through either the upper subnetwork or the lower subnetwork. Construct a graph with vertices numbered by integers 0 to 7 and draw a dashed edge between each pair of packets that cannot go through the same subnetwork because a collision would occur in the second column of switches.

(b) Add a solid edge in your graph between each pair of packets that cannot go through the same subnetwork because a collision would occur in the next-to-last column of switches.

(c) Assign colors red and blue to the vertices of your graph so that vertices that are adjacent by either a dashed or a solid edge get different colors. Why must this be possible, regardless of the permutation $\pi$?

(d) Suppose that red vertices correspond to packets routed through the upper subnetwork and blue vertices correspond to packets routed through the lower subnetwork. Referring to the Beneš network shown in Figure 11.6, indicate the first and last edge traversed by each packet.

(e) All that remains is to route packets through the upper and lower subnetworks. One way to do this is by applying the procedure described above recursively on each subnetwork. However, since the remaining problems are small, see if you can complete all the paths on your own.

Problem 11.3.
A multiple binary-tree network has $N$ inputs and $N$ outputs, where $N$ is a power of 2. Each input is connected to the root of a binary tree with $N/2$ leaves and with edges pointing away from the root. Likewise, each output is connected to the root of a binary tree with $N/2$ leaves and with edges pointing toward the root.

Two edges point from each leaf of an input tree, and each of these edges points to a leaf of an output tree.
The matching of leaf edges is arranged so that for every input and output tree, there is an edge from a leaf of the input tree to a leaf of the output tree, and every output tree leaf has exactly two edges pointing to it.

(a) Draw such a multiple binary-tree net for $N = 4$.

(b) Fill in the table, and explain your entries.

# switches    switch size    diameter    max congestion

Problem 11.4.
The $n$-input 2-D array network was shown to have congestion 2. An $n$-input 2-layer array consists of two $n$-input 2-D arrays connected as pictured below for $n = 4$.

[Figure: a 2-layer array for $n = 4$, with inputs in0 through in3 and outputs out0 through out3.]

In general, an $n$-input 2-layer array has two layers of switches, with each layer connected like an $n$-input 2-D array. There is also an edge from each switch in the first layer to the corresponding switch in the second layer. The inputs of the 2-layer array enter the left side of the first layer, and the $n$ outputs leave from the bottom row of either layer.

(a) For any given input-output permutation, there is a way to route packets that achieves congestion 1. Describe how to route the packets in this way.

(b) What is the latency of a routing designed to minimize latency?

(c) Explain why the congestion of any minimum latency (CML) routing of packets through this network is greater than the network’s congestion.

Problem 11.5.
A 5-path communication network is shown below. From this, it’s easy to see what an $n$-path network would be. Fill in the table of properties below, and be prepared to justify your answers.

[Figure 11.8: 5-path network, with inputs in0 through in4 and outputs out0 through out4.]

network    # switches    switch size    diameter    max congestion
5-path
$n$-path

Problem 11.6.
Tired of being a TA, Megumi has decided to become famous by coming up with a new, better communication network design.
Her network has the following specifications: every input node will be sent to a butterfly network, a Beneš network and a 2-d array network. At the end, the outputs of all three networks will converge on the new output.

In the Megumi-net a minimum latency routing does not have minimum congestion. The latency for min-congestion (LMC) of a net is the best bound on latency achievable using routings that minimize congestion. Likewise, the congestion for min-latency (CML) is the best bound on congestion achievable using routings that minimize latency.

[Figure: Megumi’s net. Inputs in1 through inN each feed a butterfly, a Beneš, and a 2-d array network, whose outputs converge on out1 through outN.]

Fill in the following chart for Megumi’s new net and explain your answers.

network         diameter    # switches    congestion    LMC    CML
Megumi’s net

Homework Problems

Problem 11.7.
Louis Reasoner figures that, wonderful as the Beneš network may be, the butterfly network has a few advantages, namely: fewer switches, smaller diameter, and an easy way to route packets through it. So Louis designs an $N$-input/output network he modestly calls a Reasoner-net with the aim of combining the best features of both the butterfly and Beneš nets:

The $i$th input switch in a Reasoner-net connects to two switches, $a_i$ and $b_i$, and likewise, the $j$th output switch has two switches, $y_j$ and $z_j$, connected to it. Then the Reasoner-net has an $N$-input Beneš network connected using the $a_i$ switches as input switches and the $y_j$ switches as its output switches. The Reasoner-net also has an $N$-input butterfly net connected using the $b_i$ switches as inputs and the $z_j$ switches as outputs.

In the Reasoner-net a minimum latency routing does not have minimum congestion. The latency for min-congestion (LMC) of a net is the best bound on latency achievable using routings that minimize congestion.
Likewise, the congestion for min-latency (CML) is the best bound on congestion achievable using routings that minimize latency.

Fill in the following chart for the Reasoner-net and briefly explain your answers.

diameter | switch size(s) | # switches | congestion | LMC | CML
         |                |            |            |     |

Problem 11.8. Show that the congestion of the butterfly net, $F_n$, is exactly $\sqrt{N}$ when $n$ is even.

Hint: There is a unique path from each input to each output, so the congestion is the maximum number of messages passing through a vertex for any routing problem. If $v$ is a vertex in column $i$ of the butterfly network, there is a path from exactly $2^i$ input vertices to $v$ and a path from $v$ to exactly $2^{n-i}$ output vertices. At which column of the butterfly network must the congestion be worst? What is the congestion of the topmost switch in that column of the network?

12 Simple Graphs

Simple graphs model relationships that are symmetric, meaning that the relationship is mutual. Examples of such mutual relationships are being married, speaking the same language, not speaking the same language, occurring during overlapping time intervals, or being connected by a conducting wire. They come up in all sorts of applications, including scheduling, constraint satisfaction, computer graphics, and communications, but we’ll start with an application designed to get your attention: we are going to make a professional inquiry into sexual behavior. Specifically, we’ll look at some data about who, on average, has more opposite-gender partners: men or women.

Sexual demographics have been the subject of many studies. In one of the largest, researchers from the University of Chicago interviewed a random sample of 2500 people over several years to try to get an answer to this question.
Their study, published in 1994 and entitled The Social Organization of Sexuality, found that men have on average 74% more opposite-gender partners than women.

Other studies have found that the disparity is even larger. In particular, ABC News claimed that the average man has 20 partners over his lifetime, and the average woman has 6, for a percentage disparity of 233%. The ABC News study, aired on Primetime Live in 2004, purported to be one of the most scientific ever done, with only a 2.5% margin of error. It was called “American Sex Survey: A peek between the sheets,” raising some questions about the seriousness of their reporting.

Yet again in August, 2007, the New York Times reported on a study by the National Center for Health Statistics of the U.S. government showing that men had seven partners while women had four. So, whose numbers do you think are more accurate: the University of Chicago, ABC News, or the National Center?

Don’t answer: this is a trick question designed to trip you up. Using a little graph theory, we’ll explain why none of these findings can be anywhere near the truth.

12.1 Vertex Adjacency and Degrees

Simple graphs are defined in almost the same way as digraphs, except that edges are undirected: they connect two vertices without pointing in either direction between the vertices. So instead of a directed edge $\langle v \to w \rangle$ which starts at vertex $v$ and ends at vertex $w$, a simple graph only has an undirected edge $\langle v\text{--}w \rangle$ that connects $v$ and $w$.

Definition 12.1.1. A simple graph $G$ consists of a nonempty set, $V(G)$, called the vertices of $G$, and a set $E(G)$ called the edges of $G$. An element of $V(G)$ is called a vertex. A vertex is also called a node; the words “vertex” and “node” are used interchangeably. An element of $E(G)$ is an undirected edge or simply an “edge.” An undirected edge has two vertices $u \neq v$ called its endpoints.
Such an edge can be represented by the two element set $\{u, v\}$. The notation $\langle u\text{--}v \rangle$ denotes this edge. Both $\langle u\text{--}v \rangle$ and $\langle v\text{--}u \rangle$ define the same undirected edge, whose endpoints are $u$ and $v$.

Figure 12.1 An example of a graph with 9 nodes and 8 edges.

For example, let $H$ be the graph pictured in Figure 12.1. The vertices of $H$ correspond to the nine dots in Figure 12.1, that is,

$$V(H) = \{a, b, c, d, e, f, g, h, i\}.$$

The edges correspond to the eight lines, that is,

$$E(H) = \{\, \langle a\text{--}b \rangle, \langle a\text{--}c \rangle, \langle b\text{--}d \rangle, \langle c\text{--}d \rangle, \langle c\text{--}e \rangle, \langle e\text{--}f \rangle, \langle e\text{--}g \rangle, \langle h\text{--}i \rangle \,\}.$$

Mathematically, that’s all there is to the graph $H$.

Definition 12.1.2. Two vertices in a simple graph are said to be adjacent iff they are the endpoints of the same edge, and an edge is said to be incident to each of its endpoints. The number of edges incident to a vertex $v$ is called the degree of the vertex and is denoted by $\deg(v)$. Equivalently, the degree of a vertex is the number of vertices adjacent to it.

For example, for the graph $H$ of Figure 12.1, vertex $a$ is adjacent to vertex $b$, and $b$ is adjacent to $d$. The edge $\langle a\text{--}c \rangle$ is incident to its endpoints $a$ and $c$. Vertex $h$ has degree 1, $d$ has degree 2, and $\deg(e) = 3$. It is possible for a vertex to have degree 0, in which case it is not adjacent to any other vertices. A simple graph $G$ does not need to have any edges at all. $|E(G)|$ could be zero, implying that the degree of every vertex would also be zero. But a simple graph must have at least one vertex: $|V(G)|$ is required to be at least one.

An edge whose endpoints are the same is called a self-loop. Self-loops aren’t allowed in simple graphs.¹ In a more general class of graphs called multigraphs, there can be more than one edge with the same two endpoints, but this doesn’t happen in simple graphs, because every edge is uniquely determined by its two endpoints.
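As a concrete illustration of Definitions 12.1.1 and 12.1.2, here is a minimal Python sketch that stores the graph $H$ of Figure 12.1 as a set of two-element sets and computes degrees from it. The names `V`, `E`, `edge`, `degree`, and `adjacent` are illustrative choices for this example, not notation from the text.

```python
# Graph H from Figure 12.1. Each undirected edge is a two-element
# frozenset, so <u--v> and <v--u> are literally the same object.
V = set("abcdefghi")

def edge(u, v):
    """Build the undirected edge with endpoints u and v (hypothetical helper)."""
    if u == v:
        raise ValueError("self-loops are not allowed in simple graphs")
    return frozenset((u, v))

E = {edge(u, v) for u, v in
     ["ab", "ac", "bd", "cd", "ce", "ef", "eg", "hi"]}

def degree(v):
    """Number of edges incident to vertex v."""
    return sum(1 for e in E if v in e)

def adjacent(u, v):
    """True iff u and v are the endpoints of the same edge."""
    return u != v and frozenset((u, v)) in E

print(len(V), len(E))                         # 9 8
print(degree("h"), degree("d"), degree("e"))  # 1 2 3
```

Storing edges as sets rather than ordered pairs is what makes the representation undirected; with tuples, `("a", "b")` and `("b", "a")` would count as different edges.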
Sometimes graphs with no vertices, with self-loops, or with more than one edge between the same two vertices are convenient to have, but we don’t need them, and sticking with simple graphs is simpler.

For the rest of this chapter we’ll use “graphs” as an abbreviation for “simple graphs.”

A synonym for “vertices” is “nodes,” and we’ll use these words interchangeably. Simple graphs are sometimes called networks, and edges are sometimes called arcs. We mention this as a “heads up” in case you look at other graph theory literature; we won’t use these words.

¹ You might try to represent a self-loop going between a vertex $v$ and itself as $\{v, v\}$, but this equals $\{v\}$. It wouldn’t be an edge, which is defined to be a set of two vertices.

12.2 Sexual Demographics in America

Let’s model the question of heterosexual partners in graph theoretic terms. To do this, we’ll let $G$ be the graph whose vertices $V$ are all the people in America. Then we split $V$ into two separate subsets: $M$ which contains all the males, and $F$ which contains all the females.² We’ll put an edge between a male and a female iff they have been sexual partners. This graph is pictured in Figure 12.2 with males on the left and females on the right.

Actually, this is a pretty hard graph to figure out, let alone draw. The graph is enormous: the US population is about 300 million, so $|V| \approx 300$M. Of these, approximately 50.8% are female and 49.2% are male, so $|M| \approx 147.6$M, and $|F| \approx 152.4$M. And we don’t even have trustworthy estimates of how many edges there are, let alone exactly which couples are adjacent. But it turns out that we don’t need to know any of this; we just need to figure out the relationship between the average number of partners per male and partners per female. To do this, we note that every edge has exactly one endpoint at an $M$ vertex (remember, we’re only considering male-female relationships); so the sum of the degrees of the $M$
vertices equals the number of edges. For the same reason, the sum of the degrees of the $F$ vertices equals the number of edges. So these sums are equal:

$$\sum_{x \in M} \deg(x) = \sum_{y \in F} \deg(y).$$

² For simplicity, we’ll ignore the possibility of someone being both a man and a woman, or neither.

Figure 12.2 The sex partners graph, with the $M$ vertices on the left and the $F$ vertices on the right.

Now suppose we divide both sides of this equation by the product of the sizes of the two sets, $|M| \cdot |F|$:

$$\left( \frac{\sum_{x \in M} \deg(x)}{|M|} \right) \cdot \frac{1}{|F|} = \left( \frac{\sum_{y \in F} \deg(y)}{|F|} \right) \cdot \frac{1}{|M|}$$

The terms above in parentheses are the average degree of an $M$ vertex and the average degree of an $F$ vertex. So we know:

$$\text{Avg. deg in } M = \frac{|F|}{|M|} \cdot \text{Avg. deg in } F \tag{12.1}$$

In other words, we’ve proved that the average number of female partners of males in the population compared to the average number of males per female is determined solely by the relative number of males and females in the population.

Now the Census Bureau reports that there are slightly more females than males in America; in particular $|F|/|M|$ is about 1.035. So we know that males have on average 3.5% more opposite-gender partners than females, and that this tells us nothing about any sex’s promiscuity or selectivity. Rather, it just has to do with the relative number of males and females. Collectively, males and females have the same number of opposite gender partners, since it takes one of each set for every partnership, but there are fewer males, so they have a higher ratio. This means that the University of Chicago, ABC, and the Federal government studies are way off. After a huge effort, they gave a totally wrong answer.

There’s no definite explanation for why such surveys are consistently wrong. One hypothesis is that males exaggerate their number of partners, or maybe females downplay theirs, but these explanations are speculative.
Interestingly, the principal author of the National Center for Health Statistics study reported that she knew the results had to be wrong, but that was the data collected, and her job was to report it.

The same underlying issue has led to serious misinterpretations of other survey data. For example, a couple of years ago, the Boston Globe ran a story on a survey of the study habits of students on Boston area campuses. Their survey showed that on average, minority students tended to study with non-minority students more than the other way around. They went on at great length to explain why this “remarkable phenomenon” might be true. But it’s not remarkable at all. Using our graph theory formulation, we can see that all it says is that there are fewer students in a minority than students not in that minority, which is, of course, what “minority” means.

12.2.1 Handshaking Lemma

The previous argument hinged on the connection between a sum of degrees and the number of edges. There is a simple connection between these in any graph:

Lemma 12.2.1. The sum of the degrees of the vertices in a graph equals twice the number of edges.

Proof. Every edge contributes two to the sum of the degrees, one for each of its endpoints.

We refer to Lemma 12.2.1 as the Handshaking Lemma: if we total up the number of people each person at a party shakes hands with, the total will be twice the number of handshakes that occurred.

12.3 Some Common Graphs

Some graphs come up so frequently that they have names. A complete graph $K_n$ has $n$ vertices and an edge between every two vertices, for a total of $n(n-1)/2$ edges. For example, $K_5$ is shown in Figure 12.3.

Figure 12.3 $K_5$: the complete graph on 5 nodes.

The empty graph has no edges at all. For example, the empty graph with 5 nodes is shown in Figure 12.4.

Figure 12.4 An empty graph with 5 nodes.

An $n$-node graph containing $n-1$ edges in sequence is known as a line graph $L_n$.
More formally, $L_n$ has

$$V(L_n) = \{v_1, v_2, \ldots, v_n\}$$

and

$$E(L_n) = \{\, \langle v_1\text{--}v_2 \rangle, \langle v_2\text{--}v_3 \rangle, \ldots, \langle v_{n-1}\text{--}v_n \rangle \,\}$$

For example, $L_5$ is pictured in Figure 12.5. There is also a one-way infinite line graph $L_\infty$ which can be defined by letting the nonnegative integers $\mathbb{N}$ be the vertices with edges $\langle k\text{--}(k+1) \rangle$ for all $k \in \mathbb{N}$.

Figure 12.5 $L_5$: a 5-node line graph.

If we add the edge $\langle v_n\text{--}v_1 \rangle$ to the line graph $L_n$, we get a graph called a length-$n$ cycle $C_n$. Figure 12.6 shows a length-5 cycle.

Figure 12.6 $C_5$: a 5-node cycle graph.

12.4 Isomorphism

Two graphs that look different might actually be the same in a formal sense. For example, the two graphs in Figure 12.7 are both 4-vertex, 5-edge graphs and you get graph (b) by a $90^\circ$ clockwise rotation of graph (a).

Figure 12.7 Two isomorphic graphs: (a) with vertices a, b, c, d and (b) with vertices 1, 2, 3, 4.

Strictly speaking, these graphs are different mathematical objects, but this difference doesn’t reflect the fact that the two graphs can be described by the same picture, except for the labels on the vertices. This idea of having the same picture “up to relabeling” can be captured neatly by adapting Definition 10.7.1 of isomorphism of digraphs to handle simple graphs. An isomorphism between two graphs is an edge-preserving bijection between their sets of vertices:

Figure 12.8 Isomorphic $C_5$ graphs.

Definition 12.4.1. An isomorphism between graphs $G$ and $H$ is a bijection $f : V(G) \to V(H)$ such that

$$\langle u\text{--}v \rangle \in E(G) \quad\text{iff}\quad \langle f(u)\text{--}f(v) \rangle \in E(H)$$

for all $u, v \in V(G)$. Two graphs are isomorphic when there is an isomorphism between them.
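Definition 12.4.1 can be checked mechanically for small graphs. The sketch below is illustrative only: the helper name `is_isomorphism` and the two example graphs (a 4-cycle written with two different labelings) are assumptions made for this example and are not the graphs of Figure 12.7.

```python
from itertools import combinations

def is_isomorphism(f, VG, EG, VH, EH):
    """Check Definition 12.4.1 for a candidate map f given as a dict.
    Edges are frozensets of two vertices."""
    # f must be defined exactly on V(G) ...
    if set(f) != set(VG):
        return False
    # ... and be a bijection onto V(H).
    images = set(f.values())
    if images != set(VH) or len(images) != len(f):
        return False
    # <u--v> in E(G) iff <f(u)--f(v)> in E(H), for every pair u != v.
    return all(
        (frozenset((u, v)) in EG) == (frozenset((f[u], f[v])) in EH)
        for u, v in combinations(VG, 2)
    )

# Hypothetical example: the cycle a-b-c-d-a and the cycle 1-3-2-4-1.
VG = {"a", "b", "c", "d"}
EG = {frozenset(p) for p in ["ab", "bc", "cd", "da"]}
VH = {1, 2, 3, 4}
EH = {frozenset(p) for p in [(1, 3), (3, 2), (2, 4), (4, 1)]}

f = {"a": 1, "b": 3, "c": 2, "d": 4}   # preserves edges and non-edges
g = {"a": 1, "b": 2, "c": 3, "d": 4}   # a bijection, but <a--b> maps to a non-edge

print(is_isomorphism(f, VG, EG, VH, EH))  # True
print(is_isomorphism(g, VG, EG, VH, EH))  # False
```

Note that the check compares non-edges as well as edges: the "iff" in the definition means a bijection that maps some non-adjacent pair onto an edge is rejected too.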
Here is an isomorphism $f$ between the two graphs in Figure 12.7:

$$f(a) ::= 2, \qquad f(b) ::= 3, \qquad f(c) ::= 4, \qquad f(d) ::= 1.$$

You can check that there is an edge between two vertices in the graph on the left if and only if there is an edge between the two corresponding vertices in the graph on the right.

Two isomorphic graphs may be drawn very differently. For example, Figure 12.8 shows two different ways of drawing $C_5$.

Notice that if $f$ is an isomorphism between $G$ and $H$, then $f^{-1}$ is an isomorphism between $H$ and $G$. Isomorphism is also transitive because the composition of isomorphisms is an isomorphism. In fact, isomorphism is an equivalence relation.

Isomorphism preserves the connection properties of a graph, abstracting out what the vertices are called, what they are made out of, or where they appear in a drawing of the graph. More precisely, a property of a graph is said to be preserved under isomorphism if whenever $G$ has that property, every graph isomorphic to $G$ also has that property. For example, since an isomorphism is a bijection between sets of vertices, isomorphic graphs must have the same number of vertices. What’s more, if $f$ is a graph isomorphism that maps a vertex $v$ of one graph to the vertex $f(v)$ of an isomorphic graph, then by definition of isomorphism, every vertex adjacent to $v$ in the first graph will be mapped by $f$ to a vertex adjacent to $f(v)$ in the isomorphic graph. Thus, $v$ and $f(v)$ will have the same degree. If one graph has a vertex of degree 4 and another does not, then they can’t be isomorphic. In fact, they can’t be isomorphic if the number of degree 4 vertices in each of the graphs is not the same.

Looking for preserved properties can make it easy to determine that two graphs are not isomorphic, or to guide the search for an isomorphism when there is one. It’s generally easy in practice to decide whether two graphs are isomorphic.
However, no one has yet found a procedure for determining whether two graphs are isomorphic that is guaranteed to run in polynomial time on all pairs of graphs.³

Having such a procedure would be useful. For example, it would make it easy to search for a particular molecule in a database given the molecular bonds. On the other hand, knowing there is no such efficient procedure would also be valuable: secure protocols for encryption and remote authentication can be built on the hypothesis that graph isomorphism is computationally exhausting.

The definitions of bijection and isomorphism apply to infinite graphs as well as finite graphs, as do most of the results in the rest of this chapter. But graph theory focuses mostly on finite graphs, and we will too. In the rest of this chapter we’ll assume graphs are finite.

We’ve actually been taking isomorphism for granted ever since we wrote “$K_n$ has $n$ vertices. . . ” at the beginning of Section 12.3.

Graph theory is all about properties preserved by isomorphism.

12.5 Bipartite Graphs & Matchings

There were two kinds of vertices in the “Sex in America” graph, males and females, and edges only went between the two kinds. Graphs like this come up so frequently that they have earned a special name: bipartite graphs.

Definition 12.5.1. A bipartite graph is a graph whose vertices can be divided into two sets, $L(G)$ and $R(G)$, such that every edge has one endpoint in $L(G)$ and the other endpoint in $R(G)$.

So every bipartite graph looks something like the graph in Figure 12.2.

12.5.1 The Bipartite Matching Problem

The bipartite matching problem is related to the sex-in-America problem that we just studied; only now, the goal is to get everyone happily married. As you might imagine, this is not possible for a variety of reasons, not the least of which is the fact that there are more women in America than men. So, it is simply not possible to marry every woman to a man so that every man is married at most once.
But what about getting a mate for every man so that every woman is married at most once? Is it possible to do this so that each man is paired with a woman that he likes? The answer, of course, depends on the bipartite graph that represents who likes who, but the good news is that it is possible to find natural properties of the who-likes-who graph that completely determine the answer to this question.

In general, suppose that we have a set of men and an equal-sized or larger set of women, and there is a graph with an edge between a man and a woman if the man likes the woman. In this scenario, the “likes” relationship need not be symmetric, since for the time being, we will only worry about finding a mate for each man that he likes.⁴ Later, we will consider the “likes” relationship from the female perspective as well. For example, we might obtain the graph in Figure 12.9.

Figure 12.9 A graph where an edge between a man and woman denotes that the man likes the woman.

A matching is defined to be an assignment of a woman to each man so that different men are assigned to different women, and a man is always assigned a woman that he likes. For example, one possible matching for the men is shown in Figure 12.10.

Figure 12.10 One possible matching for the men is shown with bold edges. For example, John is matched with Mergatroid.

³ A procedure runs in polynomial time when it needs an amount of time of at most $p(n)$, where $n$ is the total number of vertices and $p()$ is a fixed polynomial.

⁴ By the way, we do not mean to imply that marriage should or should not be heterosexual. Nor do we mean to imply that men should get their choice instead of women. It’s just that there are fewer men than women in America, making it impossible to match up all the women with different men.

12.5.2 The Matching Condition

A famous result known as Hall’s Matching Theorem gives necessary and sufficient conditions for the existence of a matching in a bipartite graph. It turns out to be a remarkably useful mathematical tool.

We’ll state and prove Hall’s Theorem using man-likes-woman terminology. Define the set of women liked by a given set of men to consist of all women liked by at least one of those men. For example, the set of women liked by Tom and John in Figure 12.9 consists of Martha, Sara, and Mergatroid. For us to have any chance at all of matching up the men, the following matching condition must hold:

The Matching Condition: every subset of men likes at least as large a set of women.

For example, we cannot find a matching if some set of 4 men like only 3 women. Hall’s Theorem says that this necessary condition is actually sufficient; if the matching condition holds, then a matching exists.

Theorem 12.5.2. A matching for a set $M$ of men with a set $W$ of women can be found if and only if the matching condition holds.

Proof. First, let’s suppose that a matching exists and show that the matching condition holds. For any subset of men, each man likes at least the woman he is matched with and a woman is matched with at most one man. Therefore, every subset of men likes at least as large a set of women. Thus, the matching condition holds.

Next, let’s suppose that the matching condition holds and show that a matching exists. We use strong induction on $|M|$, the number of men, on the predicate:

$P(m) ::=$ if the matching condition holds for a set, $M$, of $m$ men, then there is a matching for $M$.

Base case ($|M| = 1$): If $|M| = 1$, then the matching condition implies that the lone man likes at least one woman, and so a matching exists.

Inductive Step: Suppose that $|M| = m + 1 \geq 2$.
To find a matching for $M$, there are two cases.

Case 1: Every nonempty subset of at most $m$ men likes a strictly larger set of women. In this case, we have some latitude: we pair an arbitrary man with a woman he likes and send them both away. This leaves $m$ men and one fewer woman, and the matching condition will still hold, since removing one woman shrinks the set of women liked by any subset of the remaining men by at most one. So the induction hypothesis $P(m)$ implies we can match the remaining $m$ men.

Case 2: Some nonempty subset $X$ of at most $m$ men likes an equal-size set $Y$ of women. The matching condition must hold within $X$, so the strong induction hypothesis implies we can match the men in $X$ with the women in $Y$. This leaves the problem of matching the set $M - X$ of men to the set $W - Y$ of women.

But the problem of matching $M - X$ against $W - Y$ also satisfies the Matching condition, because any subset of men in $M - X$ who liked fewer women in $W - Y$ would imply there was a set of men who liked fewer women in the whole set $W$. Namely, if a subset $M_0 \subseteq M - X$ liked only a strictly smaller subset of women $W_0 \subseteq W - Y$, then the set $M_0 \cup X$ of men would like only women in the strictly smaller set $W_0 \cup Y$. So again the strong induction hypothesis implies we can match the men in $M - X$ with the women in $W - Y$, which completes a matching for $M$.

So in both cases, there is a matching for the men, which completes the proof of the Inductive step. The theorem follows by induction.

The proof of Theorem 12.5.2 gives an algorithm for finding a matching in a bipartite graph, albeit not a very efficient one. However, efficient algorithms for finding a matching in a bipartite graph do exist. Thus, if a problem can be reduced to finding a matching, instances of the problem can be solved in a reasonably efficient way.

A Formal Statement

Let’s restate Theorem 12.5.2 in abstract terms so that you’ll not always be condemned to saying, “Now this group of men likes at least as many women. . . ”

Definition 12.5.3.
A matching in a graph $G$ is a set $M$ of edges of $G$ such that no vertex is an endpoint of more than one edge in $M$. A matching is said to cover a set $S$ of vertices iff each vertex in $S$ is an endpoint of an edge of the matching. A matching is said to be perfect if it covers $V(G)$. In any graph $G$ the set $N(S)$ of neighbors of some set $S$ of vertices is the image of $S$ under the edge-relation, that is,

$$N(S) ::= \{\, r \mid \langle s\text{--}r \rangle \in E(G) \text{ for some } s \in S \,\}.$$

$S$ is called a bottleneck if

$$|S| > |N(S)|.$$

Theorem 12.5.4 (Hall’s Theorem). Let $G$ be a bipartite graph. There is a matching in $G$ that covers $L(G)$ iff no subset of $L(G)$ is a bottleneck.

An Easy Matching Condition

The bipartite matching condition requires that every subset of men has a certain property. In general, verifying that every subset has some property, even if it’s easy to check any particular subset for the property, quickly becomes overwhelming because the number of subsets of even relatively small sets is enormous: over a billion subsets for a set of size 30. However, there is a simple property of vertex degrees in a bipartite graph that guarantees the existence of a matching. Call a bipartite graph degree-constrained if vertex degrees on the left are at least as large as those on the right. More precisely,

Definition 12.5.5. A bipartite graph $G$ is degree-constrained when $\deg(l) \geq \deg(r)$ for every $l \in L(G)$ and $r \in R(G)$.

For example, the graph in Figure 12.9 is degree-constrained since every node on the left is adjacent to at least two nodes on the right while every node on the right is adjacent to at most two nodes on the left.

Theorem 12.5.6. If $G$ is a degree-constrained bipartite graph, then there is a matching that covers $L(G)$.

Proof.
We will show that $G$ satisfies Hall’s condition, namely, if $S$ is an arbitrary subset of $L(G)$, then

$$|N(S)| \geq |S|. \tag{12.2}$$

Since $G$ is degree-constrained, there is a $d > 0$ such that $\deg(l) \geq d \geq \deg(r)$ for every $l \in L$ and $r \in R$. Since every edge with an endpoint in $S$ has its other endpoint in $N(S)$ by definition, and every node in $N(S)$ is incident to at most $d$ edges, we know that

$$d\,|N(S)| \geq \#\text{edges with an endpoint in } S.$$

Also, since every node in $S$ is the endpoint of at least $d$ edges,

$$\#\text{edges incident to a vertex in } S \geq d\,|S|.$$

It follows that $d\,|N(S)| \geq d\,|S|$. Cancelling $d$ completes the derivation of equation (12.2).

Regular graphs are a large class of degree-constrained graphs that often arise in practice. Hence, we can use Theorem 12.5.6 to prove that every regular bipartite graph has a perfect matching. This turns out to be a surprisingly useful result in computer science.

Definition 12.5.7. A graph is said to be regular if every node has the same degree.

Theorem 12.5.8. Every regular bipartite graph has a perfect matching.

Proof. Let $G$ be a regular bipartite graph. Since regular graphs are degree-constrained, we know by Theorem 12.5.6 that there must be a matching in $G$ that covers $L(G)$. Such a matching is only possible when $|L(G)| \leq |R(G)|$. But $G$ is also degree-constrained if the roles of $L(G)$ and $R(G)$ are switched, which implies that $|R(G)| \leq |L(G)|$ also. That is, $L(G)$ and $R(G)$ are the same size, and any matching covering $L(G)$ will also cover $R(G)$. So every node in $G$ is an endpoint of an edge in the matching, and thus $G$ has a perfect matching.

12.6 Coloring

In Section 12.2, we used edges to indicate an affinity between a pair of nodes. But there are lots of situations in which edges will correspond to conflicts between nodes. Exam scheduling is a typical example.

12.6.1 An Exam Scheduling Problem

Each term, the MIT Schedules Office must assign a time slot for each final exam.
This is not easy, because some students are taking several classes with finals, and (even at MIT) a student can take only one test during a particular time slot. The Schedules Office wants to avoid all conflicts. Of course, you can make such a schedule by having every exam in a different slot, but then you would need hundreds of slots for the hundreds of courses, and the exam period would run all year! So, the Schedules Office would also like to keep exam period short.

The Schedules Office’s problem is easy to describe as a graph. There will be a vertex for each course with a final exam, and two vertices will be adjacent exactly when some student is taking both courses. For example, suppose we need to schedule exams for 6.041, 6.042, 6.002, 6.003 and 6.170. The scheduling graph might appear as in Figure 12.11.

Figure 12.11 A scheduling graph for five exams. Exams connected by an edge cannot be given at the same time.

Figure 12.12 A 3-coloring of the exam graph from Figure 12.11.

6.002 and 6.042 cannot have an exam at the same time since there are students in both courses, so there is an edge between their nodes. On the other hand, 6.042 and 6.170 can have an exam at the same time if they’re taught at the same time (which they sometimes are), since no student can be enrolled in both (that is, no student should be enrolled in both when they have a timing conflict).

We next identify each time slot with a color. For example, Monday morning is red, Monday afternoon is blue, Tuesday morning is green, etc. Assigning an exam to a time slot is then equivalent to coloring the corresponding vertex. The main constraint is that adjacent vertices must get different colors; otherwise, some student has two exams at the same time.
Furthermore, in order to keep the exam period short, we should try to color all the vertices using as few different colors as possible. As shown in Figure 12.12, three colors suffice for our example.

The coloring in Figure 12.12 corresponds to giving one final on Monday morning (red), two Monday afternoon (blue), and two Tuesday morning (green). Can we use fewer than three colors? No! We can’t use only two colors since there is a triangle in the graph, and three vertices in a triangle must all have different colors.

This is an example of a graph coloring problem: given a graph $G$, assign colors to each node such that adjacent nodes have different colors. A color assignment with this property is called a valid coloring of the graph, or a “coloring” for short. A graph $G$ is $k$-colorable if it has a coloring that uses at most $k$ colors.

Definition 12.6.1. The minimum value of $k$ for which a graph $G$ has a valid coloring is called its chromatic number, $\chi(G)$.

So $G$ is $k$-colorable iff $\chi(G) \leq k$.

In general, trying to figure out if you can color a graph with a fixed number of colors can take a long time. It’s a classic example of a problem for which no fast algorithms are known. In fact, it is easy to check if a coloring works, but it seems really hard to find it. (If you figure out how, then you can get a one million dollar Clay prize.)

12.6.2 Some Coloring Bounds

There are some simple properties of graphs that give useful bounds on colorability. The simplest property is being a cycle: an even-length closed cycle is 2-colorable. Cycles in simple graphs by convention have positive length and so are not 1-colorable. So

$$\chi(C_{\text{even}}) = 2.$$

On the other hand, an odd-length cycle requires 3 colors, that is,

$$\chi(C_{\text{odd}}) = 3. \tag{12.3}$$

You should take a moment to think about why this equality holds.

Another simple example is a complete graph $K_n$:

$$\chi(K_n) = n$$

since no two vertices can have the same color.
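For very small graphs, the three values above can be confirmed by exhaustive search. The following sketch is a brute-force check, not an efficient algorithm: it tries every assignment of $k$ colors to $n$ vertices, so its running time grows exponentially with $n$. The helper names `is_valid_coloring`, `chromatic_number`, `cycle`, and `complete` are hypothetical.

```python
from itertools import combinations, product

def is_valid_coloring(edges, color):
    """Adjacent vertices must receive different colors."""
    return all(color[u] != color[v] for u, v in edges)

def chromatic_number(vertices, edges):
    """Smallest k for which some assignment of k colors is valid.
    Tries up to k**n assignments for each k: tiny graphs only."""
    vertices = list(vertices)
    for k in range(1, len(vertices) + 1):
        for assignment in product(range(k), repeat=len(vertices)):
            if is_valid_coloring(edges, dict(zip(vertices, assignment))):
                return k

def cycle(n):
    """C_n on vertices 0..n-1 (assumes n >= 3)."""
    return range(n), [(i, (i + 1) % n) for i in range(n)]

def complete(n):
    """K_n on vertices 0..n-1."""
    return range(n), list(combinations(range(n), 2))

print(chromatic_number(*cycle(6)))     # 2 (even cycle)
print(chromatic_number(*cycle(7)))     # 3 (odd cycle)
print(chromatic_number(*complete(5)))  # 5
```

The search always terminates by $k = n$, because giving every vertex its own color is a valid coloring of any simple graph.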
Being bipartite is another property closely related to colorability. If a graph is bipartite, then you can color it with 2 colors using one color for the nodes on the “left” and a second color for the nodes on the “right.” Conversely, graphs with chromatic number 2 are all bipartite with all the vertices of one color on the “left” and those with the other color on the right. Since only graphs with no edges (the empty graphs) have chromatic number 1, we have:

Lemma 12.6.2. A graph $G$ with at least one edge is bipartite iff $\chi(G) = 2$.

The chromatic number of a graph can also be shown to be small if the vertex degrees of the graph are small. In particular, if we have an upper bound on the degrees of all the vertices in a graph, then we can easily find a coloring with only one more color than the degree bound.

Theorem 12.6.3. A graph with maximum degree at most $k$ is $(k+1)$-colorable.

Since $k$ is the only nonnegative integer valued variable mentioned in the theorem, you might be tempted to try to prove this theorem using induction on $k$. Unfortunately, this approach leads to disaster: we don’t know of any reasonable way to do this and expect it would ruin your week if you tried it on a problem set. When you encounter such a disaster using induction on graphs, it is usually best to change what you are inducting on. In graphs, typical good choices for the induction parameter are $n$, the number of nodes, or $e$, the number of edges.

Proof of Theorem 12.6.3. We use induction on the number of vertices in the graph, which we denote by $n$. Let $P(n)$ be the proposition that an $n$-vertex graph with maximum degree at most $k$ is $(k+1)$-colorable.

Base case ($n = 1$): A 1-vertex graph has maximum degree 0 and is 1-colorable, so $P(1)$ is true.

Inductive step: Now assume that $P(n)$ is true, and let $G$ be an $(n+1)$-vertex graph with maximum degree at most $k$.
Remove a vertex $v$ (and all edges incident to it), leaving an $n$-vertex subgraph $H$. The maximum degree of $H$ is at most $k$, and so $H$ is $(k+1)$-colorable by our assumption $P(n)$. Now add back vertex $v$. We can assign $v$ a color (from the set of $k+1$ colors) that is different from all its adjacent vertices, since there are at most $k$ vertices adjacent to $v$ and so at least one of the $k+1$ colors is still available. Therefore, $G$ is $(k+1)$-colorable. This completes the inductive step, and the theorem follows by induction.

Sometimes $k+1$ colors is the best you can do. For example, $\chi(K_n) = n$ and every node in $K_n$ has degree $k = n - 1$, and so this is an example where Theorem 12.6.3 gives the best possible bound. By a similar argument, we can show that Theorem 12.6.3 gives the best possible bound for any graph with degree bounded by $k$ that has $K_{k+1}$ as a subgraph.

But sometimes $k+1$ colors is far from the best that you can do. For example, the $n$-node star graph shown in Figure 12.13 has maximum degree $n - 1$ but can be colored using just 2 colors.

Figure 12.13 A 7-node star graph.

12.6.3 Why coloring?

One reason coloring problems frequently arise in practice is that scheduling conflicts are so common. For example, at the internet company Akamai, cofounded by Tom Leighton, a new version of software is deployed over each of its servers (200,000 servers in 2016) every few days. It would take more than twenty years to update all these servers one at a time, so the deployment must be carried out for many servers simultaneously. On the other hand, certain pairs of servers with common critical functions cannot be updated simultaneously, since a server needs to be taken offline while being updated. This problem gets solved by making a 200,000-node conflict graph and coloring it with a dozen or so colors, so only a dozen or so waves of installs are needed!
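A conflict graph like this one can be colored greedily, one vertex at a time, which is essentially the inductive step of Theorem 12.6.3 turned into a procedure. The following is a minimal sketch under stated assumptions: the function name `greedy_coloring` and the adjacency-dictionary representation are illustrative choices, not the method used at Akamai, and the result is only guaranteed to use at most $k+1$ colors when the maximum degree is $k$, not the minimum number of colors.

```python
def greedy_coloring(adj):
    """Color vertices one at a time, as in the proof of Theorem 12.6.3.

    adj maps each vertex to the set of its neighbors (assumed symmetric).
    Colors are the integers 0, 1, 2, ...; a vertex with at most k neighbors
    sees at most k colors in use, so its color is always at most k.
    """
    color = {}
    for v in adj:
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:          # smallest color not used by a colored neighbor
            c += 1
        color[v] = c
    return color

# A 7-node star: vertex 0 is the center, vertices 1..6 are the points.
star = {0: {1, 2, 3, 4, 5, 6}}
star.update({i: {0} for i in range(1, 7)})
coloring = greedy_coloring(star)
print(len(set(coloring.values())))   # number of colors used on the star
```

On the star the procedure uses 2 colors even though the degree bound alone only promises 7, which matches the remark that $k+1$ can be far from the best possible.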
Another example comes from the need to assign frequencies to radio stations. If two stations have an overlap in their broadcast area, they can’t be given the same frequency. Frequencies are precious and expensive, so it is important to minimize the number handed out. This amounts to finding the minimum coloring for a graph whose vertices are the stations and whose edges connect stations with overlapping areas.

Coloring also comes up in allocating registers for program variables. While a variable is in use, its value needs to be saved in a register. Registers can be reused for different variables, but two variables need different registers if they are referenced during overlapping intervals of program execution. So register allocation is the coloring problem for a graph whose vertices are the variables: vertices are adjacent if their intervals overlap, and the colors are registers. Once again, the goal is to minimize the number of colors needed to color the graph.

Finally, there’s the famous map coloring problem stated in Proposition 1.1.4. The question is how many colors are needed to color a map so that adjacent territories get different colors. This is the same as the number of colors needed to color a graph that can be drawn in the plane without edges crossing. A proof that four colors are enough for planar graphs was acclaimed when it was discovered about forty years ago. Implicit in that proof was a 4-coloring procedure that takes time proportional to the number of vertices in the graph (countries in the map).

Surprisingly, it’s another of those million dollar prize questions to find an efficient procedure to tell if any particular planar graph really needs four colors, or if three will actually do the job. A proof that testing 3-colorability of graphs is as hard as the million dollar SAT problem is given in Problem 12.29; this turns out to be true even for planar graphs.
(It is easy to tell if a graph is 2-colorable, as explained in Section 12.8.2.) In Chapter 13, we’ll develop enough planar graph theory to present an easy proof that all planar graphs are 5-colorable.

12.7 Simple Walks

12.7.1 Walks, Paths, Cycles in Simple Graphs

Walks and paths in simple graphs are essentially the same as in digraphs. We just modify the digraph definitions using undirected edges instead of directed ones. For example, the formal definition of a walk in a simple graph is virtually the same as Definition 10.2.1 of a walk in a digraph:

Definition 12.7.1. A walk in a simple graph $G$ is an alternating sequence of vertices and edges that begins with a vertex, ends with a vertex, and such that for every edge $\langle u\text{--}v \rangle$ in the walk, one of the endpoints $u$, $v$ is the element just before the edge, and the other endpoint is the next element after the edge. The length of a walk is the total number of occurrences of edges in it.

So a walk $\mathbf{v}$ is a sequence of the form

\[ \mathbf{v} ::= v_0\ \langle v_0\text{--}v_1 \rangle\ v_1\ \langle v_1\text{--}v_2 \rangle\ v_2\ \dots\ \langle v_{k-1}\text{--}v_k \rangle\ v_k \]

where $\langle v_i\text{--}v_{i+1} \rangle \in E(G)$ for $i \in [0..k)$. The walk is said to start at $v_0$, to end at $v_k$, and the length, $|\mathbf{v}|$, of the walk is $k$. The walk is a path iff all the $v_i$’s are different, that is, if $i \ne j$, then $v_i \ne v_j$.

A walk that begins and ends at the same vertex is a closed walk. A single vertex counts as a length zero closed walk as well as a length zero path.

A cycle can be represented by a closed walk of length three or more whose vertices are distinct except for the beginning and end vertices.

Figure 12.14 A graph with 3 cycles: bhecb, cdec, bcdehb.

Note that in contrast to digraphs, we don’t count length two closed walks as cycles in simple graphs. That’s because a walk going back and forth on the same edge is always possible in a simple graph, and it has no importance.
Also, there are no closed walks of length one, since simple graphs don’t have self loops.

As in digraphs, the length of a walk is one less than the number of occurrences of vertices in it. For example, the graph in Figure 12.14 has a length 6 path through the seven successive vertices abcdefg. This is the longest path in the graph. The graph in Figure 12.14 also has three cycles through successive vertices bhecb, cdec and bcdehb.

12.7.2 Cycles as Subgraphs

We don’t want to think of a cycle as having a beginning or an end, so any of the paths that go around it can represent the cycle. For example, in the graph in Figure 12.14, the cycle starting at b and going through vertices bcdehb can also be described as starting at d and going through dehbcd. Furthermore, cycles in simple graphs don’t have a direction: dcbhed describes the same cycle as though it started and ended at d but went in the opposite direction.

A precise way to explain which closed walks represent the same cycle is to define a cycle as a subgraph. Specifically, we could define a cycle in $G$ to be a subgraph of $G$ that looks like a length-$n$ cycle for $n \ge 3$.

Definition 12.7.2. A graph $G$ is said to be a subgraph of a graph $H$ if $V(G) \subseteq V(H)$ and $E(G) \subseteq E(H)$.

For example, the one-edge graph $G$ where

\[ V(G) = \{g, h, i\} \quad\text{and}\quad E(G) = \{\langle h\text{--}i \rangle\} \]

is a subgraph of the graph $H$ in Figure 12.1. On the other hand, any graph containing an edge $\langle g\text{--}h \rangle$ will not be a subgraph of $H$ because this edge is not in $E(H)$. Another example is an empty graph on $n$ nodes, which will be a subgraph of an $L_n$ with the same set of nodes; similarly, $L_n$ is a subgraph of $C_n$, and $C_n$ is a subgraph of $K_n$.

Definition 12.7.3. For $n \ge 3$, let $C_n$ be the graph with vertices $1, \dots, n$ and edges

\[ \langle 1\text{--}2 \rangle,\ \langle 2\text{--}3 \rangle,\ \dots,\ \langle (n-1)\text{--}n \rangle,\ \langle n\text{--}1 \rangle. \]

A cycle of a graph $G$ is a subgraph of $G$ that is isomorphic to $C_n$ for some $n \ge 3$.
This definition formally captures the idea that cycles don’t have direction or beginnings or ends.

12.8 Connectivity

Definition 12.8.1. Two vertices are connected in a graph when there is a path that begins at one and ends at the other. By convention, every vertex is connected to itself by a path of length zero. A graph is connected when every pair of vertices are connected.

12.8.1 Connected Components

Being connected is usually a good property for a graph to have. For example, it could mean that it is possible to get from any node to any other node, or that it is possible to communicate between any pair of nodes, depending on the application.

But not all graphs are connected. For example, the graph where nodes represent cities and edges represent highways might be connected for North American cities, but would surely not be connected if you also included cities in Australia. The same is true for communication networks like the internet: in order to be protected from viruses that spread on the internet, some government networks are completely isolated from the internet.

Figure 12.15 One graph with 3 connected components.

Another example is shown in Figure 12.15, which looks like a picture of three graphs, but is intended to be a picture of one graph. This graph consists of three pieces. Each piece is a subgraph that by itself is connected, but there are no paths between vertices in different pieces. These connected pieces of a graph are called its connected components.

Definition 12.8.2. A connected component of a graph is a subgraph consisting of some vertex and every node and edge that is connected to that vertex.

So, a graph is connected iff it has exactly one connected component. At the other extreme, the empty graph on $n$ vertices has $n$ connected components, each consisting of a single vertex.
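Definition 12.8.2 suggests a direct way to compute components: start at a vertex and collect everything reachable from it. The following is a minimal sketch using breadth-first search; the function name `connected_components` and the vertex-list and edge-list input format are assumptions made for illustration, and each component is returned as its set of vertices rather than as a subgraph.

```python
from collections import deque

def connected_components(vertices, edges):
    """Return the vertex sets of the connected components of a simple graph."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    seen = set()
    components = []
    for start in vertices:
        if start in seen:
            continue                      # already placed in an earlier component
        seen.add(start)
        component = {start}
        queue = deque([start])
        while queue:                      # breadth-first search from start
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    component.add(w)
                    queue.append(w)
        components.append(component)
    return components

# A hypothetical 6-vertex graph made of three pieces.
pieces = connected_components(range(6), [(0, 1), (1, 2), (3, 4)])
print(len(pieces))                        # number of connected components
```

A graph is then connected exactly when this list has length one, and an empty graph on $n$ vertices yields $n$ singleton sets.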
12.8.2 Odd Cycles and 2-Colorability

We have already seen that determining the chromatic number of a graph is a challenging problem. There is one special case where this problem is very easy, namely, when the graph is 2-colorable.

Theorem 12.8.3. The following graph properties are equivalent:

1. The graph contains an odd length cycle.
2. The graph is not 2-colorable.
3. The graph contains an odd length closed walk.

In other words, if a graph has any one of the three properties above, then it has all of the properties.

We will show the following implications among these properties:

1 IMPLIES 2 IMPLIES 3 IMPLIES 1.

So each of these properties implies the other two, which means they all are equivalent.

1 IMPLIES 2

Proof. This follows from equation 12.3.

2 IMPLIES 3

If we prove this implication for connected graphs, then it will hold for an arbitrary graph because it will hold for each connected component. So we can assume that $G$ is connected.

Proof. Pick an arbitrary vertex $r$ of $G$. Since $G$ is connected, for every node $u \in V(G)$, there will be a walk $\mathbf{w}_u$ starting at $u$ and ending at $r$. Assign colors to vertices of $G$ as follows:

\[ \text{color}(u) = \begin{cases} \text{black}, & \text{if } |\mathbf{w}_u| \text{ is even},\\ \text{white}, & \text{otherwise}. \end{cases} \]

Now since $G$ is not 2-colorable, this can’t be a valid coloring. So there must be an edge between two nodes $u$ and $v$ with the same color. But in that case

\[ \mathbf{w}_u \ \widehat{\ }\ \text{reverse}(\mathbf{w}_v) \ \widehat{\ }\ \langle v\text{--}u \rangle \]

is a closed walk starting and ending at $u$, and its length is $|\mathbf{w}_u| + |\mathbf{w}_v| + 1$, which is odd because $|\mathbf{w}_u|$ and $|\mathbf{w}_v|$ have the same parity.

3 IMPLIES 1

Proof. Since there is an odd length closed walk, the WOP implies there is an odd length closed walk $\mathbf{w}$ of minimum length. We claim $\mathbf{w}$ must be a cycle. To show this, assume to the contrary that $\mathbf{w}$ is not a cycle, so there is a repeat vertex occurrence besides the start and end. There are then two cases to consider depending on whether the additional repeat is the same as, or different from, the start vertex.
In the first case, the start vertex has an extra occurrence. That is,

\[ \mathbf{w} = \mathbf{f}\ \widehat{x}\ \mathbf{r} \]

for some positive length walks $\mathbf{f}$ and $\mathbf{r}$ that begin and end at $x$. Since

\[ |\mathbf{w}| = |\mathbf{f}| + |\mathbf{r}| \]

is odd, exactly one of $\mathbf{f}$ and $\mathbf{r}$ must have odd length, and that one will be an odd length closed walk shorter than $\mathbf{w}$, a contradiction.

In the second case,

\[ \mathbf{w} = \mathbf{f}\ \widehat{y}\ \mathbf{g}\ \widehat{y}\ \mathbf{r} \]

where $\mathbf{f}$ is a walk from $x$ to $y$ for some $y \ne x$, and $\mathbf{r}$ is a walk from $y$ to $x$, and $|\mathbf{g}| > 0$. Now $\mathbf{g}$ cannot have odd length or it would be an odd-length closed walk shorter than $\mathbf{w}$. So $\mathbf{g}$ has even length. That implies that $\mathbf{f}\ \widehat{y}\ \mathbf{r}$ must be an odd-length closed walk shorter than $\mathbf{w}$, again a contradiction.

This completes the proof of Theorem 12.8.3.

Theorem 12.8.3 turns out to be useful, since bipartite graphs come up fairly often in practice.⁵

12.8.3 k-connected Graphs

If we think of a graph as modeling cables in a telephone network, or oil pipelines, or electrical power lines, then we not only want connectivity, but we want connectivity that survives component failure. So more generally, we want to define how strongly two vertices are connected. One measure of connection strength is how many links must fail before connectedness fails. In particular, two vertices are $k$-edge connected when it takes at least $k$ “edge-failures” to disconnect them. More precisely:

Definition 12.8.4. Two vertices in a graph are $k$-edge connected when they remain connected in every subgraph obtained by deleting up to $k - 1$ edges. A graph is $k$-edge connected when it has more than one vertex, and every pair of distinct vertices in the graph are $k$-edge connected.

From now on we’ll drop the “edge” modifier and just say “$k$-connected.”⁶

Notice that according to Definition 12.8.4, if a graph is $k$-connected, it is also $j$-connected for $j \le k$. This convenient convention implies that two vertices are connected according to Definition 12.8.1 iff they are 1-connected according to Definition 12.8.4.
For example, in the graph in Figure 12.14, vertices c and e are 3-connected, b and e are 2-connected, g and e are 1-connected, and no vertices are 4-connected. The graph as a whole is only 1-connected. A complete graph $K_n$ is $(n-1)$-connected. Every cycle is 2-connected.

The idea of a cut edge is a useful way to explain 2-connectivity.

Definition 12.8.5. If two vertices are connected in a graph $G$, but not connected when an edge $e$ is removed, then $e$ is called a cut edge of $G$.

So a graph with more than one vertex is 2-connected iff it is connected and has no cut edges. The following lemma is another immediate consequence of the definition:

Lemma 12.8.6. An edge is a cut edge iff it is not on a cycle.

⁵ One example concerning routing networks already came up in Lemma 11.3.3. Corollary 13.5.4 reveals the importance of another example in planar graph theory.

⁶ There is a corresponding definition of $k$-vertex connectedness based on deleting vertices rather than edges. Graph theory texts usually use “$k$-connected” as shorthand for “$k$-vertex connected.” But edge-connectedness will be enough for us.

More generally, if two vertices are connected by $k$ edge-disjoint paths (that is, no edge occurs in two paths), then they must be $k$-connected, since at least one edge will have to be removed from each of the paths before they could disconnect. A fundamental fact, whose ingenious proof we omit, is Menger’s theorem, which confirms that the converse is also true: if two vertices are $k$-connected, then there are $k$ edge-disjoint paths connecting them. It takes some ingenuity to prove this just for the case $k = 2$.

12.8.4 The Minimum Number of Edges in a Connected Graph

The following theorem says that a graph with few edges must have many connected components.

Theorem 12.8.7. Every graph $G$ has at least $|V(G)| - |E(G)|$ connected components.

Of course for Theorem 12.8.7 to be of any use, there must be fewer edges than vertices.
Proof. We use induction on the number $k$ of edges. Let $P(k)$ be the proposition that

every graph $G$ with $k$ edges has at least $|V(G)| - k$ connected components.

Base case ($k = 0$): In a graph with 0 edges, each vertex is itself a connected component, and so there are exactly $|V(G)| = |V(G)| - 0$ connected components. So $P(0)$ holds.

Inductive step: Assume $P(k)$, and let $G$ be any graph with $k + 1$ edges. Let $G_e$ be the graph that results from removing an edge, $e \in E(G)$. So $G_e$ has $k$ edges and the same vertices as $G$, and by the induction hypothesis $P(k)$, we may assume that $G_e$ has at least $|V(G)| - k$ connected components. Now add back the edge $e$ to obtain the original graph $G$. If the endpoints of $e$ were in the same connected component of $G_e$, then $G$ has the same sets of connected vertices as $G_e$, so $G$ has at least $|V(G)| - k > |V(G)| - (k+1)$ components. Alternatively, if the endpoints of $e$ were in different connected components of $G_e$, then these two components are merged into one component in $G$, while all other components remain unchanged, so that $G$ has one fewer connected component than $G_e$. That is, $G$ has at least $(|V(G)| - k) - 1 = |V(G)| - (k+1)$ connected components. So in either case, $G$ has at least $|V(G)| - (k+1)$ components, as claimed.

This completes the inductive step and hence the entire proof by induction.

Figure 12.16 A 6-node forest consisting of 2 component trees.

Corollary 12.8.8. Every connected graph with $n$ vertices has at least $n - 1$ edges.

A couple of points about the proof of Theorem 12.8.7 are worth noticing. First, we used induction on the number of edges in the graph. This is very common in proofs involving graphs, as is induction on the number of vertices. When you’re presented with a graph problem, these two approaches should be among the first you consider.

The second point is more subtle.
Notice that in the inductive step, we took an arbitrary $(k+1)$-edge graph, threw out an edge so that we could apply the induction assumption, and then put the edge back. You’ll see this shrink-down, grow-back process very often in the inductive steps of proofs related to graphs. This might seem like needless effort: why not start with a $k$-edge graph and add one more edge to get a $(k+1)$-edge graph? That would work fine in this case, but opens the door to a nasty logical error called buildup error, illustrated in Problem 12.40.

12.9 Forests & Trees

We’ve already made good use of digraphs without cycles, but simple graphs without cycles are arguably the most important graphs in computer science.

12.9.1 Leaves, Parents & Children

Definition 12.9.1. An acyclic graph is called a forest. A connected acyclic graph is called a tree.

The graph shown in Figure 12.16 is a forest. Each of its connected components is by definition a tree.

One of the first things you will notice about trees is that they tend to have a lot of nodes with degree one. Such nodes are called leaves.

Definition 12.9.2. A degree 1 node in a forest is called a leaf.

The forest in Figure 12.16 has 4 leaves. The tree in Figure 12.17 has 5 leaves.

Figure 12.17 A 9-node tree with 5 leaves.

Figure 12.18 The tree from Figure 12.17 redrawn with node e as the root and the other nodes arranged in levels.

Trees are a fundamental data structure in computer science. For example, information is often stored in tree-like data structures, and the execution of many recursive programs can be modeled as the traversal of a tree. In such cases, it is often useful to arrange the nodes in levels, where the node at the top level is identified as the root and where every edge joins a parent to a child one level below. Figure 12.18 shows the tree of Figure 12.17 redrawn in this way.
Node d is a child of node e and the parent of nodes b and c.

12.9.2 Properties

Trees have many unique properties. We have listed some of them in the following theorem.

Theorem 12.9.3. Every tree has the following properties:

1. Every connected subgraph is a tree.
2. There is a unique path between every pair of vertices.
3. Adding an edge between nonadjacent nodes in a tree creates a graph with a cycle.
4. Removing any edge disconnects the graph. That is, every edge is a cut edge.
5. If the tree has at least two vertices, then it has at least two leaves.
6. The number of vertices in a tree is one larger than the number of edges.

Proof.

1. A cycle in a subgraph is also a cycle in the whole graph, so any subgraph of an acyclic graph must also be acyclic. If the subgraph is also connected, then by definition, it is a tree.

2. Since a tree is connected, there is at least one path between every pair of vertices. Suppose for the purposes of contradiction that there are two different paths between some pair of vertices. Then there are two distinct paths $\mathbf{p} \ne \mathbf{q}$ between the same two vertices with minimum total length $|\mathbf{p}| + |\mathbf{q}|$. If these paths shared a vertex $w$ other than at the start and end of the paths, then the parts of $\mathbf{p}$ and $\mathbf{q}$ from start to $w$, or the parts of $\mathbf{p}$ and $\mathbf{q}$ from $w$ to the end, must be distinct paths between the same vertices with total length less than $|\mathbf{p}| + |\mathbf{q}|$, contradicting the minimality of this sum. Therefore, $\mathbf{p}$ and $\mathbf{q}$ have no vertices in common besides their endpoints, and so $\mathbf{p}\ \widehat{\ }\ \text{reverse}(\mathbf{q})$ is a cycle.

3. An additional edge $\langle u\text{--}v \rangle$ together with the unique path between $u$ and $v$ forms a cycle.

4. Suppose that we remove edge $\langle u\text{--}v \rangle$. Since the tree contained a unique path between $u$ and $v$, that path must have been $\langle u\text{--}v \rangle$. Therefore, when that edge is removed, no path remains, and so the graph is not connected.

5.
Since the tree has at least two vertices, the longest path in the tree will have different endpoints $u$ and $v$. We claim $u$ is a leaf. This follows because, by definition of endpoint, $u$ is incident to at most one edge on the path. Also, if $u$ were incident to an edge not on the path, then the path could be lengthened by adding that edge, contradicting the fact that the path was as long as possible. It follows that $u$ is incident only to a single edge, that is, $u$ is a leaf. The same holds for $v$.

6. We use induction on the proposition

\[ P(n) ::= \text{there are } n - 1 \text{ edges in any } n\text{-vertex tree}. \]

Base case ($n = 1$): $P(1)$ is true since a tree with 1 node has 0 edges and $1 - 1 = 0$.

Inductive step: Now suppose that $P(n)$ is true and consider an $(n+1)$-vertex tree $T$. Let $v$ be a leaf of the tree. You can verify that deleting a vertex of degree 1 (and its incident edge) from any connected graph leaves a connected subgraph. So by Theorem 12.9.3.1, deleting $v$ and its incident edge gives a smaller tree, and this smaller tree has $n - 1$ edges by induction. If we reattach the vertex $v$ and its incident edge, we find that $T$ has $n = (n+1) - 1$ edges. Hence, $P(n+1)$ is true, and the induction proof is complete.

Various subsets of properties in Theorem 12.9.3 provide alternative characterizations of trees. For example,

Lemma 12.9.4. A graph $G$ is a tree iff $G$ is a forest and $|V(G)| = |E(G)| + 1$.

The proof is an easy consequence of Theorem 12.9.3.6 (Problem 12.47).

12.9.3 Spanning Trees

Trees are everywhere. In fact, every connected graph contains a subgraph that is a tree with the same vertices as the graph. This is called a spanning tree for the graph. For example, Figure 12.19 is a connected graph with a spanning tree highlighted.

Figure 12.19 A graph where the edges of a spanning tree have been thickened.

Definition 12.9.5. Define a spanning subgraph of a graph $G$ to be a subgraph containing all the vertices of $G$.

Theorem 12.9.6.
Every connected graph contains a spanning tree.

Proof. Suppose $G$ is a connected graph, so the graph $G$ itself is a connected, spanning subgraph. So by WOP, $G$ must have a minimum-edge connected, spanning subgraph $T$. We claim $T$ is a spanning tree. Since $T$ is a connected, spanning subgraph by definition, all we have to show is that $T$ is acyclic.

But suppose to the contrary that $T$ contained a cycle $C$. By Lemma 12.8.6, an edge $e$ of $C$ will not be a cut edge, so removing it would leave a connected, spanning subgraph that was smaller than $T$, contradicting the minimality of $T$.

12.9.4 Minimum Weight Spanning Trees

Spanning trees are interesting because they connect all the nodes of a graph using the smallest possible number of edges. For example, the spanning tree for the 6-node graph shown in Figure 12.19 has 5 edges.

In many applications, there are numerical costs or weights associated with the edges of the graph. For example, suppose the nodes of a graph represent buildings and edges represent connections between them. The cost of a connection may vary a lot from one pair of buildings or towns to another. Another example is where the nodes represent cities and the weight of an edge is the distance between them: the weight of the Los Angeles/New York City edge is much higher than the weight of the NYC/Boston edge. The weight of a graph is simply defined to be the sum of the weights of its edges. For example, the weight of the spanning tree shown in Figure 12.20 is 19.

Figure 12.20 A spanning tree (a) with weight 19 for a graph (b).

Definition 12.9.7. A minimum weight spanning tree (MST) of an edge-weighted graph $G$ is a spanning tree of $G$ with the smallest possible sum of edge weights.

Is the spanning tree shown in Figure 12.20(a) an MST of the weighted graph shown in Figure 12.20(b)?
It actually isn’t, since the tree shown in Figure 12.21 is also a spanning tree of the graph shown in Figure 12.20(b), and this spanning tree has weight 17.

Figure 12.21 An MST with weight 17 for the graph in Figure 12.20(b).

What about the tree shown in Figure 12.21? It seems to be an MST, but how do we prove it? In general, how do we find an MST for a connected graph $G$? We could try enumerating all subtrees of $G$, but that approach would be hopeless for large graphs.

There actually are many good ways to find MST’s based on a property of some subgraphs of $G$ called pre-MST’s.

Definition 12.9.8. A pre-MST for a graph $G$ is a spanning subgraph of $G$ that is also a subgraph of some MST of $G$.

So a pre-MST will necessarily be a forest.

For example, the empty graph with the same vertices as $G$ is guaranteed to be a pre-MST of $G$, and so is any actual MST of $G$.

If $e$ is an edge of $G$ and $S$ is a spanning subgraph, we’ll write $S + e$ for the spanning subgraph with edges $E(S) \cup \{e\}$.

Definition 12.9.9. If $F$ is a pre-MST and $e$ is a new edge, that is $e \in E(G) - E(F)$, then $e$ extends $F$ when $F + e$ is also a pre-MST.

So being a pre-MST is contrived to be an invariant under addition of extending edges, by the definition of extension.

The standard methods for finding MST’s all start with the empty spanning forest and build up to an MST by adding one extending edge after another. Since the empty spanning forest is a pre-MST, and being a pre-MST is, by definition, invariant under extensions, every forest built in this way will be a pre-MST. But no spanning tree can be a subgraph of a different spanning tree. So when the pre-MST finally grows enough to become a tree, it will be an MST. By Lemma 12.9.4, this happens after exactly $|V(G)| - 1$ edge extensions.

So the problem of finding MST’s reduces to the question of how to tell if an edge is an extending edge.
Here’s how:

Definition 12.9.10. Let $F$ be a pre-MST, and color the vertices in each connected component of $F$ either all black or all white. At least one component of each color is required. Call this a solid coloring of $F$. A gray edge of a solid coloring is an edge of $G$ with different colored endpoints.

Any path in $G$ from a white vertex to a black vertex obviously must include a gray edge, so for any solid coloring, there is guaranteed to be at least one gray edge. In fact, there will have to be at least as many gray edges as there are components with the same color. Here’s the punchline:

Lemma 12.9.11. An edge extends a pre-MST $F$ if it is a minimum weight gray edge in some solid coloring of $F$.

So to extend a pre-MST, choose any solid coloring, find the gray edges, and among them choose one with minimum weight. Each of these steps is easy to do, so it is easy to keep extending and arrive at an MST. For example, here are three known algorithms that are explained by Lemma 12.9.11:

Algorithm 1. [Prim] Grow a tree one edge at a time by adding a minimum weight edge among the edges that have exactly one endpoint in the tree.

This is the algorithm that comes from coloring the growing tree white and all the vertices not in the tree black. Then the gray edges are the ones with exactly one endpoint in the tree.

Algorithm 2. [Kruskal] Grow a forest one edge at a time by adding a minimum weight edge among the edges with endpoints in different connected components.

An edge does not create a cycle iff it connects different components. The edge chosen by Kruskal’s algorithm will be the minimum weight gray edge when the components it connects are assigned different colors.

For example, in the weighted graph we have been considering, we might run Algorithm 1 as follows. Start by choosing one of the weight 1 edges, since this is the smallest weight in the graph.
Suppose we chose the weight 1 edge on the bottom of the triangle of weight 1 edges in our graph. This edge is incident to the same vertex as two weight 1 edges, a weight 4 edge, a weight 7 edge, and a weight 3 edge. We would then choose the incident edge of minimum weight, which in this case is one of the two weight 1 edges. At this point, we cannot choose the third weight 1 edge: it won’t be gray because its endpoints are both in the tree, and so are both colored white. But we can continue by choosing a weight 2 edge. We might end up with the spanning tree shown in Figure 12.22, which has weight 17, the smallest we’ve seen so far.

Figure 12.22 A spanning tree found by Algorithm 1.

Now suppose we instead ran Algorithm 2 on our graph. We might again choose the weight 1 edge on the bottom of the triangle of weight 1 edges in our graph. Now, instead of choosing one of the weight 1 edges it touches, we might choose the weight 1 edge on the top of the graph. This edge still has minimum weight, and will be gray if we simply color its endpoints differently, so Algorithm 2 can choose it. We would then choose one of the remaining weight 1 edges. Note that neither causes us to form a cycle. Continuing the algorithm, we could end up with the same spanning tree in Figure 12.22, though this will depend on the tie breaking rules used to choose among gray edges with the same minimum weight. For example, if the weight of every edge in $G$ is one, then all spanning trees are MST’s with weight $|V(G)| - 1$, and both of these algorithms can arrive at each of these spanning trees by suitable tie-breaking.

The coloring that explains Algorithm 1 also justifies a more flexible algorithm which has Algorithm 1 as a special case:

Algorithm 3. Grow a forest one edge at a time by picking any component and adding a minimum weight edge among the edges leaving that component.
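Algorithm 3 can be sketched in a few lines. This is a minimal, unoptimized sketch under stated assumptions: the function name `grow_forest_mst`, the `(weight, u, v)` edge format, the `pick` parameter for choosing a component, and the small example graph are all hypothetical, and the example is not the graph of Figure 12.20. Coloring the picked component white and every other component black is a solid coloring whose gray edges are exactly the edges leaving the picked component.

```python
def grow_forest_mst(vertices, weighted_edges, pick=min):
    """Grow a spanning forest by repeatedly adding a minimum weight edge
    leaving a picked component (Algorithm 3).

    weighted_edges is a list of (weight, u, v) triples.
    pick chooses one label from the set of current component labels.
    """
    comp = {v: v for v in vertices}        # vertex -> label of its component
    tree = []
    while len(set(comp.values())) > 1:
        chosen = pick(set(comp.values()))
        # Gray edges: exactly one endpoint lies in the chosen component.
        leaving = [(w, u, v) for (w, u, v) in weighted_edges
                   if (comp[u] == chosen) != (comp[v] == chosen)]
        if not leaving:
            raise ValueError("graph is not connected")
        w, u, v = min(leaving, key=lambda edge: edge[0])
        tree.append((w, u, v))
        old, new = comp[u], comp[v]        # merge the two components
        for x in comp:
            if comp[x] == old:
                comp[x] = new
    return tree

# Hypothetical example: a 4-cycle a-b-c-d with one diagonal a-c.
edges = [(1, "a", "b"), (2, "b", "c"), (3, "c", "d"), (4, "d", "a"), (5, "a", "c")]
tree = grow_forest_mst("abcd", edges)
print(sum(w for w, _, _ in tree))          # total weight of the tree found
```

Always picking the component that contains one fixed vertex turns this into Algorithm 1. Different `pick` rules may add edges in a different order, but by Lemma 12.9.11 each added edge is an extending edge, so the result is an MST in every case.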
This algorithm allows components that are not too close to grow in parallel and independently, which is great for “distributed” computation where separate processors share the work with limited communication between processors.⁷

⁷ The idea of growing trees seems first to have been developed by Borůvka (1926), ref TBA. Efficient MST algorithms running in parallel time $O(\log |V|)$ are described in Karger, Klein, and Tarjan (1995), ref TBA.

These are examples of greedy approaches to optimization. Sometimes greediness works and sometimes it doesn’t. The good news is that it does work to find the MST. Therefore, we can be sure that the MST for our example graph has weight 17, since it was produced by Algorithm 2. Furthermore, we have a fast algorithm for finding a minimum weight spanning tree for any graph.

OK, to wrap up this story, all that’s left is the proof that minimum weight gray edges are extending edges. This might sound like a chore, but it just uses the same reasoning we used to be sure there would be a gray edge when you need it.

Proof. (of Lemma 12.9.11) Let $F$ be a pre-MST that is a subgraph of some MST $M$ of $G$, and suppose $e$ is a minimum weight gray edge under some solid coloring of $F$. We want to show that $F + e$ is also a pre-MST.

If $e$ happens to be an edge of $M$, then $F + e$ remains a subgraph of $M$, and so is a pre-MST.

The other case is when $e$ is not an edge of $M$. In that case, $M + e$ will be a connected, spanning subgraph. Also $M$ has a path $\mathbf{p}$ between the different colored endpoints of $e$, so $M + e$ has a cycle consisting of $e$ together with $\mathbf{p}$. Now $\mathbf{p}$ has both a black endpoint and a white one, so it must contain some gray edge $g \ne e$. The trick is to remove $g$ from $M + e$ to obtain a subgraph $M + e - g$. Since gray edges by definition are not edges of $F$, the graph $M + e - g$ contains $F + e$. We claim that $M + e - g$ is an MST, which proves the claim that $e$ extends $F$.
To prove this claim, note that $M + e$ is a connected, spanning subgraph, and g is on a cycle of $M + e$, so by Lemma 12.8.6, removing g won’t disconnect anything. Therefore, $M + e - g$ is still a connected, spanning subgraph. Moreover, $M + e - g$ has the same number of edges as M, so Lemma 12.9.4 implies that it must be a spanning tree. Finally, since e is minimum weight among gray edges,

$$w(M + e - g) = w(M) + w(e) - w(g) \leq w(M).$$

This means that $M + e - g$ is a spanning tree whose weight is at most that of an MST, which implies that $M + e - g$ is also an MST.

Another interesting fact falls out of the proof of Lemma 12.9.11:

Corollary 12.9.12. If all edges in a weighted graph have distinct weights, then the graph has a unique MST.

The proof of Corollary 12.9.12 is left to Problem 12.63.

12.10 References

[8], [13], [22], [25], [27]

Problems for Section 12.2

Practice Problems

Problem 12.1.
The average degree of the vertices in an n-vertex graph is twice the average number of edges per vertex. Explain why.

Problem 12.2.
Among connected simple graphs whose sum of vertex degrees is 20:
(a) what is the largest possible number of vertices?
(b) what is the smallest possible number of vertices?

Class Problems

Problem 12.3.
(a) Prove that in every simple graph, there are an even number of vertices of odd degree.
(b) Conclude that at a party where some people shake hands, the number of people who shake hands an odd number of times is an even number.
(c) Call a sequence of people at the party a handshake sequence if each person in the sequence has shaken hands with the next person, if any, in the sequence. Suppose George was at the party and has shaken hands with an odd number of people. Explain why, starting with George, there must be a handshake sequence ending with a different person who has shaken an odd number of hands.

Exam Problems

Problem 12.4.
A researcher analyzing data on heterosexual sexual behavior in a group of m males and f females found that within the group, the male average number of female partners was 10% larger than the female average number of male partners.
(a) Comment on the following claim. “Since we’re assuming that each encounter involves one man and one woman, the average numbers should be the same, so the males must be exaggerating.”
(b) For what constant c is $m = c \cdot f$?
(c) The data shows that approximately 20% of the females were virgins, while only 5% of the males were. The researcher wonders how excluding virgins from the population would change the averages. If he knew graph theory, the researcher would realize that the nonvirgin male average number of partners will be $x(f/m)$ times the nonvirgin female average number of partners. What is x?
(d) For purposes of further research, it would be helpful to pair each female in the group with a unique male in the group. Explain why this is not possible.

Problems for Section 12.4

Practice Problems

Problem 12.5.
Which of the items below are simple-graph properties preserved under isomorphism?
(a) There is a cycle that includes all the vertices.
(b) The vertices are numbered 1 through 7.
(c) The vertices can be numbered 1 through 7.
(d) There are two degree 8 vertices.
(e) Two edges are of equal length.
(f) No matter which edge is removed, there is a path between any two vertices.
(g) There are two cycles that do not share any vertices.
(h) The vertices are sets.
(i) The graph can be drawn in a way that all the edges have the same length.
(j) No two edges cross.
(k) The OR of two properties that are preserved under isomorphism.
(l) The negation of a property that is preserved under isomorphism.

Class Problems

Problem 12.6.
For each of the following pairs of simple graphs, either define an isomorphism between them, or prove that there is none. (We write $ab$ as shorthand for $\langle a \text{---} b \rangle$.)
(a) $G_1$ with $V_1 = \{1, 2, 3, 4, 5, 6\}$, $E_1 = \{12, 23, 34, 14, 15, 35, 45\}$
$G_2$ with $V_2 = \{1, 2, 3, 4, 5, 6\}$, $E_2 = \{12, 23, 34, 45, 51, 24, 25\}$
(b) $G_3$ with $V_3 = \{1, 2, 3, 4, 5, 6\}$, $E_3 = \{12, 23, 34, 14, 45, 56, 26\}$
$G_4$ with $V_4 = \{a, b, c, d, e, f\}$, $E_4 = \{ab, bc, cd, de, ae, ef, cf\}$

Problem 12.7.
List all the isomorphisms between the two graphs given in Figure 12.23. Explain why there are no others.

[Figure 12.23: Graphs with several isomorphisms.]

Homework Problems

Problem 12.8.
Determine which among the four graphs pictured in Figure 12.24 are isomorphic. For each pair of isomorphic graphs, describe an isomorphism between them. For each pair of graphs that are not isomorphic, give a property that is preserved under isomorphism such that one graph has the property, but the other does not. For at least one of the properties you choose, prove that it is indeed preserved under isomorphism (you only need prove one of them).

[Figure 12.24: Which graphs are isomorphic? Panels: (a) $G_1$, (b) $G_2$, (c) $G_3$, (d) $G_4$.]

Problem 12.9.
(a) For any vertex v in a graph, let $N(v)$ be the set of neighbors of v, namely, the vertices adjacent to v:

$$N(v) ::= \{u \mid \langle u \text{---} v \rangle \text{ is an edge of the graph}\}.$$

Suppose f is an isomorphism from graph G to graph H. Prove that $f(N(v)) = N(f(v))$.
Your proof should follow by simple reasoning using the definitions of isomorphism and neighbors, with no pictures or handwaving.
Hint: Prove by a chain of iff’s that $h \in N(f(v))$ iff $h \in f(N(v))$ for every $h \in V_H$. Use the fact that $h = f(u)$ for some $u \in V_G$.
(b) Conclude that if G and H are isomorphic graphs, then for each $k \in \mathbb{N}$, they have the same number of degree k vertices.

Problem 12.10.
Let’s say that a graph has “two ends” if it has exactly two vertices of degree 1 and all its other vertices have degree 2. For example, here is one such graph:

(a) A line graph is a graph whose vertices can be listed in a sequence with edges between consecutive vertices only. So the two-ended graph above is also a line graph of length 4.
Prove that the following theorem is false by drawing a counterexample.

False Theorem. Every two-ended graph is a line graph.

(b) Point out the first erroneous statement in the following bogus proof of the false theorem and describe the error.

Bogus proof. We use induction. The induction hypothesis is that every two-ended graph with n edges is a line graph.

Base case ($n = 1$): The only two-ended graph with a single edge consists of two vertices joined by an edge:

Sure enough, this is a line graph.

Inductive case: We assume that the induction hypothesis holds for some $n \geq 1$ and prove that it holds for $n + 1$. Let $G_n$ be any two-ended graph with n edges. By the induction assumption, $G_n$ is a line graph. Now suppose that we create a two-ended graph $G_{n+1}$ by adding one more edge to $G_n$. This can be done in only one way: the new edge must join one of the two endpoints of $G_n$ to a new vertex; otherwise, $G_{n+1}$ would not be two-ended.

[Figure: $G_n$ with the new edge attached at one endpoint.]

Clearly, $G_{n+1}$ is also a line graph. Therefore, the induction hypothesis holds for all graphs with $n + 1$ edges, which completes the proof by induction.

Problems for Section 12.5

Practice Problems

Problem 12.11.
Let B be a bipartite graph with vertex sets $L(B)$, $R(B)$. Explain why the sum of the degrees of the vertices in $L(B)$ equals the sum of the degrees of the vertices in $R(B)$.

Class Problems

Problem 12.12.
A certain Institute of Technology has a lot of student clubs; these are loosely overseen by the Student Association. Each eligible club would like to delegate one of its members to appeal to the Dean for funding, but the Dean will not allow a student to be the delegate of more than one club. Fortunately, the Association VP took Math for Computer Science and recognizes a matching problem when she sees one.
(a) Explain how to model the delegate selection problem as a bipartite matching problem. (This is a modeling problem; we aren’t looking for a description of an algorithm to solve the problem.)
(b) The VP’s records show that no student is a member of more than 9 clubs. The VP also knows that to be eligible for support from the Dean’s office, a club must have at least 13 members. That’s enough for her to guarantee there is a proper delegate selection. Explain. (If only the VP had taken an Algorithms class, she could even have found a delegate selection without much effort.)

Problem 12.13.
A simple graph is called regular when every vertex has the same degree. Call a graph balanced when it is regular and is also a bipartite graph with the same number of left and right vertices.
Prove that if G is a balanced graph, then the edges of G can be partitioned into blocks such that each block is a perfect matching.
For example, if G is a balanced graph with 2k vertices each of degree j, then the edges of G can be partitioned into j blocks of k edges each, where each block is a perfect matching.

Exam Problems

Problem 12.14.
Overworked and over-caffeinated, the Teaching Assistants (TA’s) decide to oust the lecturer and teach their own recitations. They will run a recitation session at 4 different times in the same room. There are exactly 20 chairs to which a student can be assigned in each recitation.
Each student has provided the TA’s with a list of the recitation sessions her schedule allows and each student’s schedule conflicts with at most two sessions. The TA’s must assign each student to a chair during recitation at a time she can attend, if such an assignment is possible.
(a) Describe how to model this situation as a matching problem. Be sure to specify what the vertices/edges should be and briefly describe how a matching would determine seat assignments for each student in a recitation that does not conflict with her schedule. (This is a modeling problem; we aren’t looking for a description of an algorithm to solve the problem.)
(b) Suppose there are 41 students. Given the information provided above, is a matching guaranteed? Briefly explain.

Problem 12.15.
Because of the incredible popularity of his class Math for Computer Science, TA Mike decides to give up on regular office hours. Instead, he arranges for each student to join some study groups. Each group must choose a representative to talk to the staff, but there is a staff rule that a student can only represent one group. The problem is to find a representative from each group while obeying the staff rule.
(a) Explain how to model the delegate selection problem as a bipartite matching problem. (This is a modeling problem; we aren’t looking for a description of an algorithm to solve the problem.)
(b) The staff’s records show that each student is a member of at most 4 groups, and all the groups have 4 or more members. Is that enough to guarantee there is a proper delegate selection? Explain.

Problem 12.16.
Let $\widehat{R}$ be the “implies” binary relation on propositional formulas defined by the rule that

$$F \mathrel{\widehat{R}} G \quad \text{iff} \quad [(F \text{ IMPLIES } G) \text{ is a valid formula}]. \tag{12.4}$$

For example, $(P \text{ AND } Q) \mathrel{\widehat{R}} P$, because the formula $(P \text{ AND } Q) \text{ IMPLIES } P$ is valid. Also, it is not true that $(P \text{ OR } Q) \mathrel{\widehat{R}} P$ since $(P \text{ OR } Q) \text{ IMPLIES } P$ is not valid.
(a) Let A and B be the sets of formulas listed below. Explain why $\widehat{R}$ is not a weak partial order on the set $A \cup B$.
(b) Fill in the $\widehat{R}$ arrows from A to B.

A    arrows    B
Q
P XOR Q
P OR Q
P AND Q
P OR Q OR (P AND Q)
NOT(P AND Q)
P

(c) The diagram in part (b) defines a bipartite graph G with $L(G) = A$, $R(G) = B$ and an edge between F and G iff $F \mathrel{\widehat{R}} G$. Exhibit a subset S of A such that both S and $A - S$ are nonempty, and the set $N(S)$ of neighbors of S is the same size as S, that is, $|N(S)| = |S|$.
(d) Let G be an arbitrary, finite, bipartite graph. For any subset $S \subseteq L(G)$, let $\overline{S} ::= L(G) - S$, and likewise for any $M \subseteq R(G)$, let $\overline{M} ::= R(G) - M$. Suppose S is a subset of $L(G)$ such that $|N(S)| = |S|$, and both S and $\overline{S}$ are nonempty. Circle the formula that correctly completes the following statement:
There is a matching from $L(G)$ to $R(G)$ if and only if there is both a matching from S to its neighbors, $N(S)$, and also a matching from $\overline{S}$ to

[Answer choices garbled in the source; they are set expressions built from $N$, $N^{-1}$, $S$ and $N(S)$.]

Hint: The proof of Hall’s Bottleneck Theorem.

Problem 12.17.
(a) Show that there is no matching for the bipartite graph G in Figure 12.25 that covers $L(G)$.
(b) The bipartite graph H in Figure 12.26 has an easily verified property that implies it has a matching that covers $L(H)$. What is the property?

[Figure 12.25: Bipartite graph G.]

Homework Problems

Problem 12.18.
A Latin square is an $n \times n$ array whose entries are the numbers $1, \ldots, n$. These entries satisfy two constraints: every row contains all n integers in some order, and also every column contains all n integers in some order. Latin squares come up frequently in the design of scientific experiments for reasons illustrated by a little story in a footnote.⁸

⁸ At Guinness brewery in the early 1900’s, W. S. Gosset (a chemist) and E. S.
Beavan (a “maltster”) were trying to improve the barley used to make the brew. The brewery used different varieties of barley according to price and availability, and their agricultural consultants suggested a different fertilizer mix and best planting month for each variety.
Somewhat sceptical about paying high prices for customized fertilizer, Gosset and Beavan planned a season-long test of the influence of fertilizer and planting month on barley yields. For as many months as there were varieties of barley, they would plant one sample of each variety using a different one of the fertilizers. So every month, they would have all the barley varieties planted and all the fertilizers used, which would give them a way to judge the overall quality of that planting month. But they also wanted to judge the fertilizers, so they wanted each fertilizer to be used on each variety during the course of the season. Now they had a little mathematical problem, which we can abstract as follows.
Suppose there are n barley varieties and an equal number of recommended fertilizers. Form an $n \times n$ array with a column for each fertilizer and a row for each planting month. We want to fill in the entries of this array with the integers $1, \ldots, n$ numbering the barley varieties, so that every row contains all n integers in some order (so every month each variety is planted and each fertilizer is used), and also every column contains all n integers (so each fertilizer is used on all the varieties over the course of the growing season).

[Figure 12.26: Bipartite graph H.]

For example, here is a $4 \times 4$ Latin square:

1 2 3 4
3 4 2 1
2 1 4 3
4 3 1 2

(a) Here are three rows of what could be part of a $5 \times 5$ Latin square:

2 4 5 3 1
4 1 3 2 5
3 2 1 5 4

Fill in the last two rows to extend this “Latin rectangle” to a complete Latin square.
(b) Show that filling in the next row of an $n \times n$ Latin rectangle is equivalent to finding a matching in some 2n-vertex bipartite graph.
(c) Prove that a matching must exist in this bipartite graph and, consequently, a Latin rectangle can always be extended to a Latin square.

Problem 12.19.
Take a regular deck of 52 cards. Each card has a suit and a value. The suit is one of four possibilities: heart, diamond, club, spade. The value is one of 13 possibilities, A, 2, 3, ..., 10, J, Q, K. There is exactly one card for each of the $4 \times 13$ possible combinations of suit and value.
Ask your friend to lay the cards out into a grid with 4 rows and 13 columns. They can fill the cards in any way they’d like. In this problem you will show that you can always pick out 13 cards, one from each column of the grid, so that you wind up with cards of all 13 possible values.
(a) Explain how to model this trick as a bipartite matching problem between the 13 column vertices and the 13 value vertices. Is the graph necessarily degree-constrained?
(b) Show that any n columns must contain at least n different values and prove that a matching must exist.

Problem 12.20.
Scholars through the ages have identified twenty fundamental human virtues: honesty, generosity, loyalty, prudence, completing the weekly course reading-response, etc. At the beginning of the term, every student in Math for Computer Science possessed exactly eight of these virtues. Furthermore, every student was unique; that is, no two students possessed exactly the same set of virtues. The Math for Computer Science course staff must select one additional virtue to impart to each student by the end of the term. Prove that there is a way to select an additional virtue for each student so that every student is unique at the end of the term as well.
Suggestion: Use Hall’s theorem.
Try various interpretations for the vertices on the left and right sides of your bipartite graph.

Problems for Section 12.6

Class Problems

Problem 12.21.
Let G be the graph below.⁹ Carefully explain why $\chi(G) = 4$.

⁹ From [30], Exercise 13.3.1

Problem 12.22.
A portion of a computer program consists of a sequence of calculations where the results are stored in variables, like this:

Inputs: a, b
Step 1: c = a + b
     2: d = a * c
     3: e = c + 3
     4: f = c - e
     5: g = a + f
     6: h = f + 1
Outputs: d, g, h

A computer can perform such calculations most quickly if the value of each variable is stored in a register, a chunk of very fast memory inside the microprocessor. Programming language compilers face the problem of assigning each variable in a program to a register. Computers usually have few registers, however, so they must be used wisely and reused often. This is called the register allocation problem.
In the example above, variables a and b must be assigned different registers, because they hold distinct input values. Furthermore, c and d must be assigned different registers; if they used the same one, then the value of c would be overwritten in the second step and we’d get the wrong answer in the third step. On the other hand, variables b and d may use the same register; after the first step, we no longer need b and can overwrite the register that holds its value. Also, f and h may use the same register; once $f + 1$ is evaluated in the last step, the register holding the value of f can be overwritten.
(a) Recast the register allocation problem as a question about graph coloring. What do the vertices correspond to? Under what conditions should there be an edge between two vertices? Construct the graph corresponding to the example above.
(b) Color your graph using as few colors as you can. Call the computer’s registers R1, R2 etc. Describe the assignment of variables to registers implied by your coloring.
How many registers do you need?
(c) Suppose that a variable is assigned a value more than once, as in the code snippet below:

...
t = r + s
u = t * 3