Computing Science 120 • Study Guide Introduction to Computing Science and Programming I Fall 2010 Edition by Greg Baker Faculty of Applied Sciences Simon Fraser University c Greg Baker, 2004–2010 2 Copyright c 2004–2010 Greg Baker. This work is licenced under the Creative Commons Attribution-ShareAlike 2.5 Canada License. To view a copy of this licence, visit http://creativecommons. org/licenses/by-sa/2.5/ca/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California 94105, USA. Contents Course Introduction 11 Learning Resources . . . . . . . . . . . . . . . . . . . . . . 12 Requirements . . . . . . . . . . . . . . . . . . . . . . . . . 13 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 About the Author . . . . . . . . . . . . . . . . . . . . . . . 15 I Computer Science and Programming 17 1 Computing Science Basics 19 1.1 What is an Algorithm? . . . . . . . . . . . . . . . . . . . . 19 1.2 What is Computing Science? . . . . . . . . . . . . . . . . . 22 1.3 What is Programming? . . . . . . . . . . . . . . . . . . . . 25 1.4 Pseudocode . . . . . . . . . . . . . . . . . . . . . . . . . . 26 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 2 Programming Basics 29 2.1 Starting with Python . . . . . . . . . . . . . . . . . . . . . 29 2.2 Doing Calculations . . . . . . . . . . . . . . . . . . . . . . 32 2.3 Storing Information . . . . . . . . . . . . . . . . . . . . . . 35 2.4 Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 2.5 User Input . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 2.6 How Computers Represent Information . . . . . . . . . . . 41 2.7 Example Problem Solving: Feet and Inches . . . . . . . . . 48 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 53 3 4 CONTENTS 3 Control Structures 55 3.1 Making Decisions . . . . . . . . . . . . . . . . . . . . . . . 55 3.2 Definite Iteration: for loops . . . . . . . . . . . . . . . . . 59 3.3 Indefinite Iteration: while loops . . . . . . . . . . . . . . . 62 3.4 Choosing Control Structures . . . . . . . . . . . . . . . . . 63 3.5 Example Problem Solving: Guessing Game . . . . . . . . . 65 3.6 Running Time . . . . . . . . . . . . . . . . . . . . . . . . . 70 3.7 Debugging . . . . . . . . . . . . . . . . . . . . . . . . . . . 77 3.8 Coding Style . . . . . . . . . . . . . . . . . . . . . . . . . . 80 3.9 More About Algorithms . . . . . . . . . . . . . . . . . . . 84 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 86 4 Functions and Decomposition 89 4.1 Defining Functions . . . . . . . . . . . . . . . . . . . . . . 89 4.2 Variable Scope . . . . . . . . . . . . . . . . . . . . . . . . . 94 4.3 Python Modules . . . . . . . . . . . . . . . . . . . . . . . . 97 4.4 Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98 4.5 Handling Errors . . . . . . . . . . . . . . . . . . . . . . . . 102 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 103 II Problem Solving 105 5 Data Structures 107 5.1 Lists . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107 5.2 Lists and for loops . . . . . . . . . . . . . . . . . . . . . . 111 5.3 Slicing and Dicing . . . . . . . . . . . . . . . . . . . . . . . 112 5.4 Strings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114 5.5 Mutability . . . . . . . . . . . . . . . . . . . . . . . . . . . 115 5.6 References . . . . . . . . . . . . . . . . . . . . . . . . . . . 117 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 122 CONTENTS 5 6 Algorithms 123 6.1 Searching . . . . . . . . . . . . . . . . . . . . . . . . . . . 123 6.2 Sorting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126 6.3 Recursion . . . . . . . . . . . . . . . . . . . . . . . . . . . 131 6.4 Designing with Recursion . . . . . . . . . . . . . . . . . . . 135 6.5 What isn’t computable? . . . . . . . . . . . . . . . . . . . 142 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 144 7 Working with Files 147 7.1 File Output . . . . . . . . . . . . . . . . . . . . . . . . . . 147 7.2 File Input . . . . . . . . . . . . . . . . . . . . . . . . . . . 150 7.3 The Operating System . . . . . . . . . . . . . . . . . . . . 154 7.4 Disks and Files . . . . . . . . . . . . . . . . . . . . . . . . 155 7.5 Example Problem Solving: File Statistics . . . . . . . . . . 158 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 164 III Appendices 165 A Technical Instructions 167 A.1 Installing Python . . . . . . . . . . . . . . . . . . . . . . . 167 A.2 Using Python . . . . . . . . . . . . . . . . . . . . . . . . . 168 A.3 Computing Accounts . . . . . . . . . . . . . . . . . . . . . 171 A.4 Working in the Lab . . . . . . . . . . . . . . . . . . . . . . 172 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 173 6 CONTENTS List of Figures 1.1 The “creaming method”: an everyday algorithm. . . . . . . . 20 1.2 An algorithm that guesses a secret number between 1 and 100. 24 1.3 Figure 1.2 written in pseudocode. . . . . . . . . . . . . . . . 26 1.4 Another pseudocode example: a digital clock . . . . . . . . . 27 2.1 Prefixes for storage units. . . . . . . . . . . . . . . . . . . . . 42 2.2 The four-bit unsigned integer values. . . . . . . . . . . . . . . 44 2.3 Some examples of binary addition . . . . . . . . . . . . . . . 44 2.4 The four-bit two’s complement values . . . . . . . . . . . . . 45 2.5 Conversion of the string “Hi” to binary. . . . . . . . . . . . . 47 2.6 Meters to feet-inches conversion pseudocode. . . . . . . . . . 49 2.7 Converting to feet and inches: number of feet. . . . . . . . . 49 2.8 Converting to feet and inches: feet and inches. . . . . . . . . 50 2.9 Converting to feet and inches: rounding inches. . . . . . . . . 51 2.10 Converting to feet and inches: printing quotes . . . . . . . . 52 2.11 Converting to feet and inches: fixed rounding error . . . . . . 53 3.1 Simplified guessing game . . . . . . . . . . . . . . . . . . . . 56 3.2 Implementation of Figure 3.1 . . . . . . . . . . . . . . . . . . 57 3.3 Pseudocode to calculate factorials . . . . . . . . . . . . . . . 60 3.4 Using a for loop in Python . . . . . . . . . . . . . . . . . . . 60 3.5 Calculating factorials with Python . . . . . . . . . . . . . . . 61 3.6 Using a while loop in Python . . . . . . . . . . . . . . . . . 62 3.7 Guessing game: first guess . . . . . . . . . . . . . . . . . . . 65 3.8 Guessing game: get an answer . . . . . . . . . . . . . . . . . 66 3.9 Guessing game: trying a loop . . . . . . . . . . . . . . . . . . 67 3.10 Guessing game: a working loop . . . . . . . . . . . . . . . . . 68 3.11 Guessing game: final version . . . . . . . . . . . . . . . . . . 69 7 8 LIST OF FIGURES 3.12 A much worse version of the guessing game . . . . . . . . . . 70 3.13 Algorithm to check for repeated letters in a word . . . . . . . 73 3.14 Repeated letters Python implementation . . . . . . . . . . . 73 3.15 Algorithm to solve the subset-sum problem. . . . . . . . . . . 75 3.16 Running time of an exponential algorithm . . . . . . . . . . . 75 3.17 Graph of the functions log2 n, n, and n2 . . . . . . . . . . . . 77 3.18 A program with poor style . . . . . . . . . . . . . . . . . . . 82 3.19 Figure 3.18 with the style improved . . . . . . . . . . . . . . 83 3.20 The number 13 represented with 8 bits . . . . . . . . . . . . 85 3.21 Pseudocode to determine the 8-bit binary representation . . . 86 4.1 A program with a function defined . . . . . . . . . . . . . . . 90 4.2 A sample function . . . . . . . . . . . . . . . . . . . . . . . . 92 4.3 A program that takes advantage of local variables . . . . . . 95 4.4 Program that prints today’s date. . . . . . . . . . . . . . . . 97 4.5 Date manipulation with the datetime module’s objects . . . 100 4.6 Catching an exception . . . . . . . . . . . . . . . . . . . . . . 102 4.7 Asking until we get correct input . . . . . . . . . . . . . . . . 103 5.1 Building a list with the append method . . . . . . . . . . . . 111 5.2 Iterating over a list . . . . . . . . . . . . . . . . . . . . . . . 112 5.3 Manipulating strings and lists . . . . . . . . . . . . . . . . . 116 5.4 Variables referencing their contents . . . . . . . . . . . . . . 118 5.5 Reference copied during assignment: aliasing . . . . . . . . . 119 5.6 Changing either alias changes both . . . . . . . . . . . . . . . 119 5.7 An expression creates a new reference and breaks the alias . . 120 5.8 A calculation creates a new instance containing the results . 121 5.9 Slicing a list forces copying of its elements . . . . . . . . . . . 121 6.1 Python implementation of linear search . . . . . . . . . . . . 124 6.2 Python implementation of binary search . . . . . . . . . . . . 127 6.3 Checking for repeated letters with sorting . . . . . . . . . . . 128 6.4 Graph of the functions n2 and n log2 n . . . . . . . . . . . . . 129 6.5 Algorithm for selection sort . . . . . . . . . . . . . . . . . . . 131 6.6 Python implementation of selection sort . . . . . . . . . . . . 132 6.7 Recursively calculate factorials . . . . . . . . . . . . . . . . . 132 6.8 Functions calls made to calculate factorial(3) . . . . . . . 133 6.9 Pseudocode for a recursive algorithm . . . . . . . . . . . . . 138 LIST OF FIGURES 9 6.10 Recursively reversing a string . . . . . . . . . . . . . . . . . . 139 6.11 Recursively putting spaces between characters in a string . . 141 6.12 Fooling a function that claims to solve the halting problem. . 143 7.1 Writing text to a file . . . . . . . . . . . . . . . . . . . . . . . 148 7.2 File created by Figure 7.1 . . . . . . . . . . . . . . . . . . . . 148 7.3 Writing text to a file . . . . . . . . . . . . . . . . . . . . . . . 149 7.4 File created by Figure 7.3 . . . . . . . . . . . . . . . . . . . . 150 7.5 Reading a text file and counting line lengths . . . . . . . . . 151 7.6 Program to read a series of times in a file . . . . . . . . . . . 153 7.7 Input file (times.txt) for Figure 7.6 . . . . . . . . . . . . . . . 153 7.8 Output of Figure 7.6 . . . . . . . . . . . . . . . . . . . . . . 153 7.9 Communication between the user, applications, OS, and hard- ware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155 7.10 A disk divided into blocks . . . . . . . . . . . . . . . . . . . . 157 7.11 The blocks of a file defragmented . . . . . . . . . . . . . . . . 158 7.12 Word count: open file and count lines . . . . . . . . . . . . . 159 7.13 Word count test file . . . . . . . . . . . . . . . . . . . . . . . 159 7.14 Word count: counting characters 1 . . . . . . . . . . . . . . . 160 7.15 Word count: counting characters 2 . . . . . . . . . . . . . . . 161 7.16 Word count: final program . . . . . . . . . . . . . . . . . . . 163 A.1 IDLE, after running a “Hello World” statement . . . . . . . . 169 A.2 IDLE, displaying a syntax error . . . . . . . . . . . . . . . . 170 A.3 IDLE, displaying a run-time error . . . . . . . . . . . . . . . 171 10 LIST OF FIGURES Course Introduction [Most of the material in this introduction will apply to all offerings of CMPT 120, but some details will vary depending on the semester and instructor. Please check your course web site for details.] Welcome to CMPT 120, “Introduction to Computing Science and Pro- gramming I”. This course is an introduction to the core ideas of computing science and the basics of programming. This course is intended for students who do not have programming experience. If you have done a significant amount of programming, you should take CMPT 126 instead. This course isn’t intended to teach you how to use your computer. You are expected to know how to use your computer: you should be comfortable using it for simple tasks such as running programs, finding and opening files, and so forth. You should also be able to use the Internet. Here are some of the goals set out for students in this course. By the end of the course, you should be able to: • Explain some of the basic ideas of computing science. • Explain what computer programming is. • Create algorithms to solve simple problems. • Implement computer programs in the Python programming language. • Apply the core features of a programming language to solve problems. • Design programs that are easy to understand and maintain. Keep these goals in mind as you progress through the course. 11 12 COURSE INTRODUCTION Learning Resources Study Guide The Study Guide is intended to guide you through this course. It will help you learn the content and determine what information to focus on in the texts. Some suggested readings for each section are listed at the beginning of the units. The instructor for your section of the course may modify these readings. You should also look at the key terms listed at the end of each unit. In some places, there are references to other sections. A reference to “Topic 3.4,” for example, means Topic 4 in Unit 3. In the Study Guide, side-notes are indicated with this symbol. They are meant to replace the asides that usually happen in lec- tures when students ask questions. They aren’t strictly part of the “course.” Required Texts The only required text for this course (besides this Study Guide) is How to Think Like a Computer Scientist: Learning With Python. It is available from the SFU bookstore, or you can download it from the course web site. This book is a nice introduction to Python programming, which we will use in the second half of the course. The title of the book is a little misleading. The book does not discuss computer science; it only covers computer programming. In this Guide, we will try to cover more actual “computing science.” Recommended Texts There are currently no recommended texts for this course. If additional resources are required during the semester, the instructor may recommend other books and place them on reserve at the library. COURSE INTRODUCTION 13 Web Materials The web materials for this course can be found in Computing Science’s “Course Central,” http://www.cs.sfu.ca/CC/ . Online References There are several online references that are as important as the texts. You can find links to them on the course web site. These resources are very important to your success in this course. They aren’t meant to be read from beginning to end like the readings in the text- book. You should use them to get an overall picture of the topic and as references as you do the assignments. Email Communications The TAs and course instructor will use email to send announcements and tips during the semester. You should read your SFU email account regularly, at least a few times each week. You can also contact the TAs and course instructor by email if you need help. See the “Getting Help” section at the end of the Introduction for details. Requirements Computer Requirements You need to have access to a computer for this course. The labs on campus or your own computer can be used. Instructions on using the labs can be found on the course web site. You will also need a connection to the Internet through a dial-up connec- tion (one is provided with your SFU registration) or through another type of connection like a cable modem or a DSL line. A high-speed connection will make your life easier, but it isn’t required. You should activate your SFU computing account as soon as possible if you haven’t done so already. Instructions can be found in your registration material. 14 COURSE INTRODUCTION Software Requirements All of the software that you need for this course can be downloaded for free. Links to obtain the course software are on the course web site. Appendix A contains instructions for installing and working with the Python software. There may be some other pieces of software that you’ll need to use when submitting your work. See the course web site for instructions on installing and using these. Evaluation Marking The marking scheme for the course will be announced in class or by email in the first week. Labs and Assignments You will have weekly labs in this course. Details will be announced in class. The labs will consist of relatively short exercises that you should be able to complete quickly. They are intended to make sure you stay up to date with the material in the course as it progresses. The assignments are more involved than the labs. You will have to figure out more on your own and explore the concepts on the course. Exams There will be two examinations in this course. They are closed-book, i.e. you aren’t allowed any notes, calculators, or other aids. The exams have a mixture of short and longer answer questions. The 50 minute midterm exam will be held in week 7 or 8. The instructor will announce what material will be covered in the midterm a few weeks before. The final exam will be three hours long and will cover all of the course material. COURSE INTRODUCTION 15 Academic Dishonesty We take academic dishonesty very seriously in the School of Computing Sci- ence. Academic dishonesty includes (but is not limited to) the following: copying another student’s assignment; allowing others to complete work for you; allowing others to use your work; copying part of an assignment from an outside source; cheating in any way on a test or examination. If you are unclear on what academic dishonesty is, please read Policy 10.02. It can be found in the “Policies & Procedures” section in the SFU web site. Cheating on a lab or assignment will result in a mark of 0 on the piece of work. At the instructor’s option, further penalties may be requested. Any academic dishonesty will also be recorded in your file, as is required by University policy. Any academic dishonesty on the midterm or final will result in a recom- mendation that an F be given for the course. About the Author The author hates writing about himself, so he isn’t going to. So there. I’d like to thank Arsalan Butt, Yaroslav Litus and Mashaal Mamemon who worked on the creation of this course with me in the summer of 2004. Thanks also to Toby Donaldson, Anne Lavergne, David Mitchell, Brad Bart, John Edgar, and all of the other faculty members around the School that were willing to listen to an idea and, more often than not, reply with a better one. If you have any comments on this guide, please feel free to contact me at ggbaker@cs.sfu.ca. 16 COURSE INTRODUCTION Part I Computer Science and Programming 17 Unit 1 Computing Science Basics Learning Outcomes • Define the basic terms of computing science. • Explain simple algorithms using pseudocode. Learning Activities • Read this unit and do the “Check-Up Questions.” • Browse through the links for this unit on the course web site. • Read Chapter 1 in How to Think Like a Computer Scientist. Before we start programming, we need to know a little about what com- puting science really is. Topic 1.1 What is an Algorithm? The concept of an “algorithm” is fundamental to all of computing science and programming. Stated simply, an algorithm is a set of instructions that can be used to solve a problem. Figure 1.1 contains one simple algorithm that you might use in everyday life. This algorithm is used in baking and it is written in a way that most people can understand and follow. It is used to make cookies, cakes, muffins, and many other baked goods. Of course, we aren’t going to spend this whole course talking about cook- ing. (It might be more fun, but the University would get cranky if we started 19 20 UNIT 1. COMPUTING SCIENCE BASICS 1. Combine the room-temperature butter and the sugar. Mix until light and fluffy. 2. Add the eggs to the creamed butter and mix to combine. 3. In another bowl, combine the liquid ingredients and mix to combine. 4. Sift together the flour and other dry ingredients. 5. Alternately add the dry and liquid ingredients to the butter-egg mixture. Mix just enough to combine. Figure 1.1: The “creaming method”: an everyday algorithm. giving cooking lessons in CMPT courses.) Still, the algorithm in Figure 1.1 has a lot in common with the algorithms we will be looking at during this course. We are more interested in the kinds of algorithms that can be completed by computers. We will spend a lot of time in this course designing algorithms and having the computer complete them for us. Here’s a definition of “algorithm” that most computer scientists can live with: [Anany Levitin, Introduction to The Design & Analysis of Algorithms, p. 3] An algorithm is a sequence of unambiguous instructions for solv- ing a problem, i.e., for obtaining a required output for any legit- imate input in a finite amount of time. There are a few words you should notice about the definition: • unambiguous: When you read an algorithm, there should be no ques- tion about what should be done. Is this the case in Figure 1.1? If you understand cooking terms like “light and fluffy” and “sift to- gether”, then you can probably follow most of this recipe. You might have some problem with the last step: you’re supposed to “alternately” add the dry and wet ingredients. Does that mean you should do dry- wet-dry? Dry-wet-dry-wet-dry-wet? How many additions should you make? Recipes in cookbooks are often written with small ambiguities like this either because it doesn’t matter what you do or the author is assuming 1.1. WHAT IS AN ALGORITHM? 21 that the reader will know what to do. For the record, the right thing in this case is probably dry-wet-dry-wet-dry. • problem: An algorithm should always present a solution to a partic- ular problem. Each algorithm is designed with a particular group of problems in mind. In Figure 1.1, the problem must have been something like “Using these ingredients, make muffins.” • legitimate input: An algorithm might need some kind of input to do its job. In the example problem, the inputs are the ingredients; you have to have the correct ingredients before you can start the algorithm. In addition to having the inputs, they have to be “legitimate”. Suppose we start the instructions in Figure 1.1 with these ingredients: 1 can of baby corn, 1 cup orange juice; 1 telephone. We aren’t going to get very far. In this example, “legitimate” ingredients include sugar, eggs, flour and butter. If you put the wrong inputs into the algorithm, it might not be able to deal with them. • finite amount of time: This means that if we start the algorithm, we had better finish it eventually. A recipe that leaves us in the kitchen until the end of time isn’t much good. Suppose we added this step to Figure 1.1: 6. Stir with a fork until the mixture turns into Beef Wellington. No amount of stirring is going to make that happen. If you followed the recipe literally, you’d be standing there stirring forever. Not good. Many later computing science courses cover algorithms for various problems. For example, CMPT 354 (Databases) discusses algo- rithms for efficiently storing database information. Data Structures When discussing algorithms, it also becomes necessary to talk about data structures. A data structure describes how a program stores the data it’s working with. 22 UNIT 1. COMPUTING SCIENCE BASICS To carry on with the cooking example, suppose you’re trying to find a recipe for muffins. Most people have their recipes in cookbooks on a shelf. To find the recipe, you’d probably select a likely looking book or two and check the index of each one for the recipe you want—that’s an algorithm for finding a recipe. On the other hand, if you have recipes on index cards in a box (because you’ve just copied the good recipes out of all of your books), you might have to shuffle through the whole pile to find the one you want. If you keep the pile in some kind of order, e.g. alphabetical by the name of the dish it makes, you might be able to find the recipe much faster. The point? The way you choose to store information can have a big effect on the algorithm you need to work with it. There are many data structures that represent different ways of storing information. We will explore a variety of data structures later in the course. Courses that discuss algorithms for particular problems generally the corresponding data structures too. Topic 1.2 What is Computing Science? Why all this talk of algorithms? This is supposed to be a computing science course: we should be talking about computers. Consider this quote: [Anany Levitin, Computing Research News, January 1993, p. 7] Computer science is no more about computers than astronomy is about telescopes, biology is about microscopes or chemistry is about beakers and test tubes. Science is not about tools, it is about how we use them and what we find out when we do. Computing science (also known as computer science) isn’t all about com- puters. Still, there are certainly a lot of computers around. You will be using computers in this course when you program; most computing science courses involve using computers in one way or another. Computing science is often defined as: [G. Michael Schneider and Judith L. Gersting, An Invitation to Computer Science] The study of algorithms, including 1. Their formal and mathematical properties. 1.2. WHAT IS COMPUTING SCIENCE? 23 2. Their hardware realizations. 3. Their linguistic realizations. 4. Their applications. So, computing science is really about algorithms. We will spend a lot of time in this course talking about algorithms. We will look at how to create them, how to implement them, and how to use them to solve problems. Here is a little more on those four aspects: 1. Their formal and mathematical properties: This includes asking ques- tions like “what problems can be solved with algorithms,” “for what problems can we find solutions in a reasonable amount of time,” and “is it possible to build computers with different properties that would be able to solve more problems?” 2. Their hardware realizations: One of the goals when building computers is to make them fast. That is, they should be able to execute algo- rithms specified by the programmer quickly. They should also make good use of their memory and be able to access other systems (disks, networks, printers, and so on). There are many choices that are made when designing a computer; all of the choices have some effect on the capabilities of the final product. 3. Their linguistic realizations: There are many ways to express algo- rithms so a computer can understand them. These descriptions must be written by a person and then followed by a computer. This requires some “language” that can be understood by both people and comput- ers. Again, there are many choices here that affect how easily both the person and computer can work with the description. 4. Their applications: Finally, there are questions of what actual useful things can be done algorithmically. Is it possible for a computer to un- derstand a conversation? Can it drive a car? Can the small computers in cell phones be made more useful? If the answer to any of these is “yes,” then how? Most of our algorithms won’t look much like Figure 1.1. We will focus on algorithms that computers can follow. See Figure 1.2 for an algorithm that is more relevant to a computing science course. 24 UNIT 1. COMPUTING SCIENCE BASICS 1. Tell the user to pick a secret number between 1 and 100. 2. The smallest possible number is 1; the largest possible is 100. 3. Make a guess that is halfway between the smallest and largest (round down if necessary). 4. Ask the user if your guess is too large, too small or correct. 5. If they say you’re correct, the game is over. 6. If they say your guess is too small, the smallest possible number is now the guess plus one. 7. If they say your guess is too large, the largest possible number is now the guess minus one. 8. Unless you guessed correctly, go back to step 3. Figure 1.2: An algorithm that guesses a secret number between 1 and 100. The algorithm in Figure 1.2 is designed to solve the problem “guess a secret number between 1 and 100.” It meets all of the criteria of the definition of “algorithm” from Topic 1.1. You may have to spend a few minutes to convince yourself that this algorithm will always eventually guess the correct number, thus finishing in a “finite amount of time”. It does. Try a few examples. This algorithm works by keeping track of the smallest and largest pos- sibilities for the user’s secret number. At the start of the algorithm, the number could be anywhere from 1 to 100. If you guess 50 and are told that it’s too large, you can now limit yourself to the numbers from 1 to 49—if 50 if too large then the numbers from 51 to 100 must also be too large. This process continues until you guess the right number. By the end of this course, you should be able to create algorithms like this (and more complicated ones too). You will also be able to implement them so they can be completed by a computer. 1.3. WHAT IS PROGRAMMING? 25 Check-Up Questions ◮ Can you think of a number where the algorithm in Figure 1.2 will make 7 guesses? 8? ◮ What is “legitimate input” for the algorithm in Figure 1.2? What happens if the user enters something else? Topic 1.3 What is Programming? Much of this course will focus on computer programming. What is program- ming? A computer program is an algorithm that a computer can understand. This program is often referred to as an implementation. Not all algorithms can be implemented with a computer: Figure 1.1 can’t. We’re interested in the ones that can. A programming language is a particular way of expressing algorithms to a computer. There are many programming languages and they all have different methods of specifying the parts of an algorithm. What you type in a particular programming language to specify an algorithm is often referred to as code. Each programming language is designed for different reasons and they all have strengths and weaknesses, but they share many of the same concepts. Because of this, once you have learned one or two programming languages, learning others becomes much easier. Why Python? In this course, we will be using the Python programming language. Python is an excellent programming language for people who are learning to program. You may be wondering why this course doesn’t teach programming in C++ or Java. These are the languages that you probably hear about most often. In this course, we want you to focus on the basic concepts of programming. This is much harder to do in Java and C++: there are too many other things to worry about when programming in those languages. Students often get overwhelmed by the details of the language and can’t concentrate on the concepts behind the programs they are writing. 26 UNIT 1. COMPUTING SCIENCE BASICS write “Think of a number between 1 and 100.” set smallest to 1 set largest to 100 until the user answers “equal”, do this: set guess to ⌊(smallest + largest)/2⌋ write “Is your number more, less or equal to guess?” read answer if answer is “more”, then set smallest to guess + 1 if answer is “less”, then set largest to guess − 1 Figure 1.3: Figure 1.2 written in pseudocode. C++ and Java are very useful for creating desktop applications and other big projects. You aren’t doing that in this course. Languages like Python are a lot easier to work with and are well suited for smaller projects and for learning to program. Topic 1.4 Pseudocode Before you start writing programs, you need a way to describe the algorithms that you are going to implement. This is often done with pseudocode. The prefix “pseudo-” means “almost” or “nearly”. Pseudocode is almost code. It’s close enough to being a real program in a programming language that it’s easy to translate, but not so close that you have to worry about the technical details. The natural language (English) algorithm descriptions in Figures 1.1 and 1.2 might be accurate, but they aren’t generally written in a way that’s easy to transform to a program. Figure 1.3 is an example of the way we’ll write pseudocode in this course. It is a translation of Figure 1.2. Figure 1.3 uses the notation ⌊x⌋, which you might not have seen before. It means “round down”, so ⌊3.8⌋ = 3 and ⌊−3.8⌋ = −4. It is usually pronounced “floor ”. 1.4. PSEUDOCODE 27 set hour to 0 set minute to 0 set second to 0 repeat forever: set second to second + 1 if second is more than 59, then set second to 0 set minute to minute + 1 if minute is more than 59, then set minute to 0 set hour to hour + 1 if hour is more than 23, then set hour to 0 write “hour :minute:second ” wait for 1 second Figure 1.4: Another pseudocode example: a digital clock It is usually helpful to express an algorithm in pseudocode before you start programming. Especially as you’re starting to program, just expressing yourself in a programming language is challenging. You don’t want to be worrying about what you’re trying to say and how to say it at the same time. Writing good pseudocode will get you to the point that you at least know what you’re trying to do with your program. Then you can worry about how to say it in Python. There are several computing science courses where no programming lan- guage is used and you don’t write any code at all. If you know how to program, it is assumed that you know how to convert pseudocode to a pro- gram. So, the courses concentrate on pseudocode and algorithms. The rest is easy (once you learn how to program). There is another example of pseudocode in Figure 1.4. This algorithm could be used to manage the display of a digital clock. It keeps track of the current hour, minute, and second (starting at exactly midnight). 28 UNIT 1. COMPUTING SCIENCE BASICS Summary This unit introduced you to some of the fundamental ideas in computing science. The ideas here are key to all of computing science. If you’re a little fuzzy on what exactly a data structure or pseudocode is, you don’t need to panic (yet). After you’ve written a few programs in later units, come back to these terms and see if they make a little more sense then. Key Terms • algorithm • programming language • data structure • Python • computing science • pseudocode • computer programming Unit 2 Programming Basics Learning Outcomes • Use the Python software to get programs running. • Create programs that perform simple calculations. • Use variables to store information in a program. • Create programs that take input from the user. • Explain how computers store information in binary. • Take a simple problem and create an algorithm that solves it. • Implement that algorithm in Python. Learning Activities • Read this unit and do the “Check-Up Questions.” • Browse through the links for this unit on the course web site. • Read Chapter 2 in How to Think Like a Computer Scientist. Topic 2.1 Starting with Python In this course, you will be using the Python programming language. You can download Python for free or use it in the lab. See Appendix A for more instructions on how to use the software. 29 30 UNIT 2. PROGRAMMING BASICS One nice feature of Python is its interactive interpreter . You can start up Python and start typing in Python code. It will be executed immediately, and you will see the results. You can also type Python code into a file and save it. Then, you can run it all at once. The interactive interpreter is generally used for exploring the language or testing ideas. Python code in a file can be run as an application and even double-clicked to run your program. You will start by working with the Python interpreter. See Appendix A for instructions on getting Python running. When you start the Python interpreter, you’ll see something like this: Python 2.6.5 (r265:79063, Apr 16 2010, 13:57:41) Type "help", "copyright", "credits" or "license" for more information. >>> The >>> is the prompt. Whenever you see it in the interpreter, you can type Python commands. When you press return, the command will be executed and you will be shown the result. Whenever you see the >>> prompt in examples, it’s an example of what you’d see in the interpreter if you typed the code after the >>>. For some reason, when people are taught to program, the first program they see is one that prints the words “Hello world” on the screen. Not wanting to rock the boat, you will do that too. Here’s what it looks like in the Python interpreter: >>> print "Hello world" Hello world The stuff after the prompt is the first line of Python code you have seen. You could have also typed it into a text editor, named the file hello.py and run it. The print command in Python is used to put text on the screen. What- ever comes after it will be printed on the screen. Any text in quotes, like "Hello world" in the example, is called a string. Strings are just a bunch of characters. Characters are letters, numbers, spaces, and punctuation. Strings have to be placed in quotes to be distin- guished from Python commands. If we had left out the quotes, Python would have complained that it didn’t know what “Hello” meant, since there is no built-in command called Hello. 2.1. STARTING WITH PYTHON 31 The Interpreter vs. the Editor When you use Python’s IDLE (Integrated DeveLopment Environment), the first window that you see is the interactive interpreter. That’s the window with the >>> prompt where you can type Python code, press return, and see its result. You can use this to test small sections of Python code to make sure they do what you expect. If you create a “New Window”, the window you create is a file editing window. It doesn’t have a prompt or anything else. You can type Python code here and save it as a .py file. You can run a Python .py file by double clicking it. You can also press F5 while editing it in IDLE to run it. You should use an editor window to write whole programs. Check-Up Questions ◮ Type print "Hello world!" into an editor window in IDLE and save it as hello.py file. Run it with Python. If you’re using Windows and you run the program by double-clicking the file, the output window might disappear before you can see the results. You can stop this from happening by running the program in IDLE or by waiting for the user to press return before ending the program. We’ll talk about how to do that in the next topic. ◮ Add a few more print statements to your hello.py program (one per line). Run it and see what happens. Statements If you did the “Check-Up Questions” above, you would have created a file containing one line: print "Hello world!" This line is a Python statement. Statements are the basic building blocks of Python programs. Each state- ment expresses a part of the overall algorithm that you’re implementing. The print statement is the first one you have seen so far, but there are many others. Each one lets you express a different idea in such a way that the computer can complete it. 32 UNIT 2. PROGRAMMING BASICS When you run a Python program (i.e., code you typed in a .py file and saved), the statements are executed in the order they appear in the file. So, the Python program print "Hello world!" print "I’m a Python program that prints stuff." . . . will produce this output: Hello world! I’m a Python program that prints stuff. Topic 2.2 Doing Calculations In order to implement the algorithm in Figure 1.3, you will need to be able to calculate guess + 1 and ⌊(smallest + largest)/2⌋. Python can do calculations like this. An expression is any kind of calcu- lation that produces a result. Here are some examples of using expressions in print statements in the Python interpreter: >>> print 10 - 2 8 >>> print 15/3 5 >>> print 25+19*5 120 >>> print 10.2 / 2 / 2 2.55 The Python operators +, -, *, and / perform addition, subtraction, mul- tiplication, and division, as you might expect. Note that they do order-of- operations they way you’d expect, too: >>> print 25+19*5 120 >>> print 25+(19*5) 120 >>> print (25+19)*5 220 2.2. DOING CALCULATIONS 33 Parentheses do the same thing they do when you’re writing math: they wrap up part of a calculation so it’s done first. Note that a number by itself is an expression too. >>> print 18 18 Now, in Figure 1.3, suppose that the current value of smallest is 76 and largest is 100. Then, we can at least do the right calculation: >>> print (76+100)/2 88 Python can do calculations on strings too. >>> print "An" + "Expression" AnExpression >>> print "An " + ’Expression’ An Expression >>> print ’ABC’ * 4 ABCABCABCABC Note that when you enter a string, it has to be wrapped up in quotes. This is the only way Python can distinguish between characters that are part of a string or part of the expression itself. In Python, single quotes (’) and double quotes (") can be used interchangeably. If you forget the quotes around a string, you’ll probably get an error message: >>> print An + ’Expression’ NameError: name ’An’ is not defined Here, Python is telling us that it doesn’t know the word “An”. It does know words like print and a few others. If you want to talk about a bunch of characters as part of a string, they have to be surrounded by quotes. So, even when a number, or anything else is in quotes, it is treated like a string (which makes sense, since strings go in quotes): >>> print 120 * 3 360 >>> print "120" * 3 120120120 >>> print "120 * 3" 120 * 3 34 UNIT 2. PROGRAMMING BASICS Functions Python can also use functions as part of expressions. These work like func- tions in mathematics: you give the function some arguments, and something is done to calculate the result. The result that the function gives back is called its return value. For example, in Python, there is a built-in function round that is used to round off a number to the nearest integer: >>> print round(13.89) 14.0 >>> print round(-4.3) -4.0 >>> print round(1000.5) 1001.0 Functions can take more than one argument. The round function can take a second argument (an optional argument) that indicates the number of decimal places it should round to. For example, >>> print round(12.3456, 1) 12.3 >>> print round(12.3456, 2) 12.35 >>> print round(12.3456, 5) 12.3456 In these examples, the value is rounded to the nearest 0.1, 0.01, and 0.00001. For this function, if you don’t indicate the optional argument, its default is 0. The default value for optional arguments depends on the function. Functions can take any type of information as their argument and can return any type. For example, Python’s len function will return the length of a string, i.e. how many characters it has: >>> print len("hello") 5 >>> print len("-<()>-") 6 >>> print len("") 0 2.3. STORING INFORMATION 35 There are many other ways to do calculations on numbers and strings than we have seen here. You will see more as you learn more about programming. You will see some more functions as you need them. Check-Up Questions ◮ Try printing the results of some other expressions. Check the calculations by hand and make sure the result is what you expect. ◮ Try some of the above string expressions, swapping the single quotes for double quotes and vice-versa. Convince yourself that they really do the same thing. ◮ Some of the examples above “multiply” a string by a number (like "cow"*3). The result is repetition of the string. What happens if you multiply a num- ber by a string (3*"cow")? What about a string by a string ("abc"*"def")? Topic 2.3 Storing Information You aren’t going to want to always print out the result of a calculation like we did in Topic 2.2. Sometimes, you need to perform a calculation to be used later, without needing to display the results right away. You might also want to ask the user a question and remember their answer until you need it. For example, in the algorithm in Figure 1.2, you want to calculate values for smallest, largest, and guess and store those results. You also need to ask the user for their answer and store the result. You need to keep all of those in the computer’s memory. Whenever we need the computer to temporarily remember some infor- mation in a program, we will use a variable. A variable is a way for you to reserve a little bit of the computer’s memory to store the information you need. You will give variables names that you will use to refer to them later. For example, if you ask the user for their age and want to store their input, you might use a variable named “age”. The name of the variable should describe and somehow indicate what it represents. 36 UNIT 2. PROGRAMMING BASICS To put a value in a variable, a variable assignment statement is used. For example, to put the result of the calculation 14/2 into a variable named quotient, quotient = 14/2 In a variable assignment statement, put the name of the variable you want to change on the left, an equals sign, and the new value on the right. You can use any expression to calculate the value that will be stored in the variable. Variables can store any kind of information that Python can manipulate. So far we have seen numbers and strings. Be careful: Only the result of the calculation is stored, not the whole calculation. To use the value that’s stored in a variable, you just have to use its name. If a variable name is used in an expression, it is replaced with the stored value. >>> num = 7 >>> word = "yes" >>> print num - 3 4 >>> print word + word yesyes >>> num = 4 >>> print num - 3 1 Note that you can change the value in a variable. In the above example, num was first set to 7 and then changed to 4. Notice that the variable num was holding a number and word was holding a string. You can change the kind of information a variable holds by doing a variable assignment as well. Topic 2.4 Types As noted above and in Topic 2.2, Python treats numbers (like 2, -10, and 3.14) differently than strings (like "abc", "-10", and ""). For example, you can divide two numbers, but it doesn’t make sense to divide strings. 2.4. TYPES 37 >>> print 10/2 5 >>> print "abc" / 2 TypeError: unsupported operand type(s) for /: ’str’ and ’int’ Numbers and strings are two different types of information that Python can manipulate. String variables are used to hold text or collections of characters. In Python, a TypeError indicates that you’ve used values whose types can’t be used with the given operation. The type of the values given to an operator can change the way it works. In Topic 2.2, you saw that the + operator does different things on numbers (addition) and strings (joining). In fact, the numeric values that Python stores aren’t as simple as just “numbers”. Have a look at this example from the Python interpreter: >>> print 10/2 5 >>> print 10/3 3 >>> print 10.0/3 3.33333333333 Why does Python give a different answer for 10/3 than it does for 10.0/3? The division operation does different things with integers than with floating point values. Integers are numbers without any fraction part. So, 10, 0, and -100 are all integers. Numbers with fractional parts, like 3.14, -0.201, and 10.0, are stored as floating point values. These two types are represented differently in the computer’s memory, as we will discuss in Topic 2.6. That’s why Python comes up with different answers for 10/3 and 10.0/3: there are different types of values given. In the case of integer division (10/3), the rule is that the result must be an integer. The floating point result has its fractional part rounded down to give the integer 3. For floating point division, the result can have a fractional part, so the result is what you’d probably expect. There is a built-in function called type that will tell you the type of an object in Python. Try type(10/3) and type(10.0/3). When implementing the pseudocode in Figure 1.3, you can actually use this to make sure the calculation guess rounds down to the next integer. 38 UNIT 2. PROGRAMMING BASICS Note that you can trick Python into treating a whole number like a float- ing point number by giving it a fractional part with you type it. So 10 is an integer (or “int” for short), but 10.0 is a floating point value (or “float”). Type Conversion Sometimes, you’ll find you have information of one type, but you need to convert it to another. For example, suppose you want to calculate the average of several integers. You would do the same thing you would do by hand: add up the numbers and divide by the number of numbers. Suppose you had found the sum of 10 numbers to be 46, leaving the values 46 in sum and 10 in num. If you try to divide these numbers in Python, you’ll get the result 4, since you’re dividing two integers. Really, you want the result 4.6, which you would get if at least one of the values being divided was a float. There are Python functions that can be used to change a value from one type to another. You can use these in an expression to get the type you want. The function int() converts to an integer, float() converts to a floating point value, and str() converts to a string. For example, >>> float(10) 10.0 >>> str(10) ’10’ >>> int(’10’) 10 >>> int(83.7) 83 >>> str(123.321) ’123.321’ >>> int("uhoh") ValueError: invalid literal for int(): uhoh As you can see, these functions will do their best to convert whatever you give them to the appropriate type. Sometimes, that’s just not possible: there’s no way to turn "uhoh" into an integer, so it causes an error. In the example of calculating the average, we can do a type conversion to get the real average: 2.4. TYPES 39 >>> total = 46 >>> num = 10 >>> print total/num 4 >>> print float(total)/num 4.6 >>> print float(total/num) 4.0 Have a closer look at the last example. Since the conversion is wrapped around the whole calculation, only the result is converted. So, Python divides the integers 46 and 10 to get 4. This is converted to the floating point value 4.0. In order for the floating point division to work, at least one of the numbers going into the division must be a floating point value. Converting numbers to strings is often handy when printing. Again, suppose you have 46 in the variable total and you want to print out a line like The sum was 46. You can print out multiple values with the comma, but they are separated by spaces: >>> print "The sum was", total, "." The sum was 46 . Note that there’s a space between the 46 and the period. You can remove this by combining strings to get the result we want: >>> print "The sum was " + str(total) + "." The sum was 46. When Python joins strings, it doesn’t add any extra spaces. You have to convert total to a string here since Python doesn’t know how to add a string and a number: >>> print "The sum was " + total + "." TypeError: cannot concatenate ’str’ and ’int’ objects The word concatenate means “join together”. When you use the + on strings, it’s not really adding them, it’s joining them. That’s called concatenation. 40 UNIT 2. PROGRAMMING BASICS Topic 2.5 User Input Something else you will need to do to implement the algorithm from Fig- ure 1.3 is to get input from the user. You need to ask them if the number they’re thinking of is larger, smaller or equal. To do this in Python, use the raw_input function. This function will give the user whatever message you tell it to, wait for them to type a response and press enter, and return their response to your expression. For example, this program will ask the user for their name and then say hello: name = raw_input("What is your name? ") print "Hello, " + name + "." If you run this program, it will display “What is your name? ” on the screen and wait for the user to respond. Their response will be stored in the variable name. For example, What is your name? Julius Hello, Julius. If the user enters something else, that’s what will go in the name variable, What is your name? Joey Jo-Jo Hello, Joey Jo-Jo. In this guide, any input that the user types will be set in bold, like this. Whenever you use the raw_input function, it will return a string. That’s because as far as the interpreter is concerned, the user just typed a bunch of characters and that’s exactly what a string is. If you want to treat the user’s input as an integer or floating point number, you have to use one of the type conversion functions described above. For example, if you ask the user for their height, you really want a floating point value, but we get a string. So, it must be converted: m = float(raw_input("Enter your height (in metres): ")) inches = 39.37 * m print "You are " + str(inches) + " inches tall." When you run this program, 2.6. HOW COMPUTERS REPRESENT INFORMATION 41 Enter your height (in metres): 1.8 You are 70.866 inches tall. In this example, the user enters the string "1.8", which is returned by the raw_input function. That is converted to the floating point number 1.8 by the float function. This is stored in the variable m (for “metres”). Once there is a floating point value in m, your program can do numeric calculations with it. The number of inches is calculated and the corresponding floating point number is stored in inches. To print this out, it is converted back to a string with the str function. Sometimes print will do the conversion for you, but it was done explicitly in this program. Topic 2.6 How Computers Represent Information You may be wondering why you have to care about all of the different types of values that Python can handle. Why should 25 be different from 25.0? For that matter, how is the number 25 different from the string "25"? The real difference here is in the way the computer stores these different kinds of information. To understand that, you need to know a little about how computers store information. Binary All information that is stored and manipulated with a computer is repre- sented in binary, i.e. with zeros and ones. So, no matter what kind of infor- mation you work with, it has to be turned into a string of zeros and ones if you want to manipulate it with a computer. Why just zeros and ones? A computer’s memory is basically a whole bunch of tiny rechargeable batteries (capacitors). These can either be discharged (0) or charged (1). It’s fairly easy for the computer to look at one of these capacitors and decide if it’s charged or not. 42 UNIT 2. PROGRAMMING BASICS Prefix Symbol Factor (no prefix) 20 = 1 kilo- k 2 = 1024 ≈ 103 10 mega- M 220 = 1048576 ≈ 106 giga- G 230 = 1073741824 ≈ 109 tera- T 2 = 1099511627776 ≈ 1012 40 Figure 2.1: Prefixes for storage units. It’s possible to use the same technology to represent digits from 0 to 9, but it’s very difficult to distinguish ten different levels of charge in a capacitor. It’s also very hard to make sure a capacitor doesn’t discharge a little to drop from a 7 to a 6 without noticing. So, modern computers don’t do this. They just use a simpler system with two levels of charge and end up with zeros and ones. Hard disks and other storage devices also use binary for similar reasons. Computer networks do as well. A single piece of storage that can store a zero or one is called a bit. Since a bit is a very small piece of information to worry about, bits are often grouped. It’s common to divide a computer’s memory into eight-bit groups called bytes. So, 00100111 and 11110110 are examples of bytes. When measuring storage capacity, the number of bits or bytes quickly becomes large. Figure 2.1 show the prefixes that are used for storage units and what they mean. For example, “12 megabytes” is 12 × 220 bytes = 12582912 bytes = 12582912 × 8 bits = 100663296 bits . Note that the values in Figure 2.1 are slightly different than the usual meaning of the metric prefixes. One kilometre is exactly 1000 metres, not 1024 metres. When measuring storage capacities in computers, the 1024 version of the metric prefixes is usually used. That statement isn’t entirely true. Hard drive makers, for instance, generally use units of 1000 because people would generally prefer a “60 gigabyte” drive to a “55.88 gigabyte” drive (60 × 1012 = 55.88 × 230 ). 2.6. HOW COMPUTERS REPRESENT INFORMATION 43 Unsigned Integers Once you have a bunch of bits, you can use them to represent numbers. First, think about the way you count with regular numbers: 1, 2, 3, 4, 5. . . . Consider the number 157. What does each of the digits in that number mean? The “1” is one hundred, “5” is five tens, and “7” is seven ones: 157 = (1 × 102 ) + (5 × 10) + (7 × 1). As you go left from one place to the next, the value it represents is multiplied by 10. Each digit represents the number of 1s, 10s, 100s, 1000s. . . . The reason the values increase by a factor of 10 is that there are ten possible digits in each place: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. This is called decimal or base 10 arithmetic. (The “dec-” prefix in latin means 10.) Applying the same logic, there is a counting system with bits, binary or base 2 arithmetic (“bi-” means 2). The rightmost bit will be the number of 1s, the next will be the number of 2s, then 4s, 8s, 16s, and so on. Binary values are often written with a little 2 (a subscript), to indicate that they are base 2 values: 1012 . If there’s any possibility for confusion, base 10 values are written with a subscript 10: 3410 . To convert binary values to decimal, do the same thing you did above, substituting 2s for the 10s: 10012 = (1 × 23 ) + (0 × 22 ) + (0 × 21 ) + (1 × 20 ) =8+1 = 910 . The base 2 value 10012 is equal to 910 . Another example with a larger number: 100111012 = (1 × 27 ) + (0 × 26 ) + (0 × 25 ) + (1 × 24 ) + (1 × 23 ) + (1 × 22 ) + (0 × 21 ) + (1 × 20 ) = 128 + 16 + 8 + 4 + 1 = 15710 . So, 10011101 is the base 2 representation of the number 157. Any positive whole number can be represented this way, given enough bits. All of the values that can be represented with four bits are listed in Figure 2.2. You should be able to convince yourself that for any group of n bits, there are 2n different possible values that can be stored in those bits. So, n bits 44 UNIT 2. PROGRAMMING BASICS binary decimal binary decimal 1111 15 0111 7 1110 14 0110 6 1101 13 0101 5 1100 12 0100 4 1011 11 0011 3 1010 10 0010 2 1001 9 0001 1 1000 8 0000 0 Figure 2.2: The four-bit unsigned integer values. 1 1 1 1010 1011 1101 + 0100 + 0010 + 0101 1110 1101 10010 Figure 2.3: Some examples of binary addition can represent any number from 0 to 2n − 1. Other common groupings are of 16 bits (which can represent numbers 0 to 216 − 1 = 65535) and 32 bits (which can represent numbers 0 to 232 − 1 = 4294967295). The computer can do operations like addition and subtraction on binary integers the same way you do with decimal numbers. You just have to keep in mind that 1 + 1 = 210 = 102 , so if you add two 1’s together, there is a carry. There are a few examples of binary addition in Figure 2.3. These corre- spond to the decimal operations 10 + 4 = 14, 11 + 2 = 13, and 13 + 5 = 18. You can use the familiar algorithms you know for subtraction, multiplication, and division as well. Positive and Negative Integers The method described above will let us represent any positive integer in the computer’s memory. What about negative numbers? The bits that make up the computer’s memory must be used to represent both positive and negative numbers. The typical method is called two’s 2.6. HOW COMPUTERS REPRESENT INFORMATION 45 binary decimal binary decimal 1111 −1 0111 7 1110 −2 0110 6 1101 −3 0101 5 1100 −4 0100 4 1011 −5 0011 3 1010 −6 0010 2 1001 −7 0001 1 1000 −8 0000 0 Figure 2.4: The four-bit two’s complement values complement notation. (The previous method, which can’t represent negative values, is generally called unsigned integer representation.) To convert a positive value to a negative value in two’s complement, you first flip all of the bits (convert 0s to 1s and 1s to 0s) and then add one. So, the four-bit two’s complement representation for −5 is: start with the positive version: 0101 flip all of the bits: 1010 add one: 1011 . All of the four-bit two’s complement values are shown in Figure 2.4. If we use four bits, we can represent values from −8 to 7. Here are a few other reasons computers use two’s complement notation: • It’s easy to tell if the value is negative: if the first bit is 1, it’s negative. • For positive numbers (values with the first bit 0), the unsigned and two’s complement representations are identical. The values 0–7 have the same representations in Figures 2.2 and 2.4. • Addition and subtraction work the same way as for unsigned numbers. Look back at Figure 2.3. If you instead interpret at the numbers as two’s complement values, the corresponding decimal calculations are −6 + 4 = −2, −5 + 2 = −3, and −3 + 5 = 2. (You have to ignore the last 1 that was carried in the last example—the computer will.) They are still correct. That means that the parts of the computer that do calculations don’t have to know whether they have unsigned or two’s complement values to work with. 46 UNIT 2. PROGRAMMING BASICS • No number has more than one two’s complement representation. If instead the first bit was used for the sign (0 = positive, 1 = negative), then there would be two versions of zero: 0000 and 1000. This is a waste of one representation, which wastes storage space, not to mention that the computer has to deal with the special case that 0000 and 1000 are actually the same value. That makes it difficult to compare two values. Most modern computers and programming languages use 32 bits to store integers. With this many bits, it is possible to store integers from −231 to 231 − 1 or −2147483648 to 2147483647. So, in many programming languages, you will get an error if you try to add one to 2147483647. In other languages, you will get −2147483648. The analogous calculation with four bits is 7 + 1: 1 1 1 0111 + 0001 1000 If these were unsigned values, this is the right answer. But, if you look in Figure 2.4, you’ll see that 1000 represents −8. If this overflow isn’t caught when doing two’s complement, there’s a “wraparound” that means you can suddenly go from a large positive number to a large negative one, or vice- versa. In Python, you don’t generally see any of this. Python will automatically adjust how it represents the numbers internally and can represent any integer. But, if you go on to other languages, you will eventually run into an integer overflow. Another type of numbers is the floating point value. They have to be stored differently because there’s no way to store fractional parts with two’s complement. Floating point representation is more complicated; it is beyond the scope of this course. Characters and Strings The other types of information that you have seen in your Python experience are characters and strings. A character is a single letter, digit or punctuation symbol. A string is a collection of several characters. So, some characters are T, $, and 4. Some strings are "Jasper", "742", and "bhay-gn-flay-vn". 2.6. HOW COMPUTERS REPRESENT INFORMATION 47 H i (ASCII chart lookup) 72 105 (conversion to binary) 01001000 01101001 Figure 2.5: Conversion of the string “Hi” to binary. Storing characters is as easy as storing unsigned integers. For a byte (8 bits) in the computer’s memory, there are 28 = 256 different unsigned num- bers: 0–255. So, just assign each possible character a number and translate the numbers to characters. For example, the character T is represented by the number 84, the charac- ter $ by 36, and 4 by 52. This set of translations from numbers to characters and back again is called a character set. The particular character set that is used by almost all modern computers, when dealing with English and other western languages, is called ASCII. The course web site contains links to a full list of ASCII characters, if you’d like to see it. So, in order to store the character T in the computer’s memory, first look up its number in the character set and get 84. Then, use the method described for unsigned integers to convert the number 84 to an 8-bit value: 01010100. This can then be stored in the computer’s memory. With only one byte per character, we can only store 256 different char- acters in our strings. This is enough to represent English text, but it starts to get pretty hard to represent languages with accents (like á or ü). It’s just not enough characters to represent languages like Chinese or Japanese. The Unicode character set was created to overcome this limitation. Uni- code can represent up to 232 characters. This is enough to represent all of the written languages that people use. Because of the number of possible characters, Unicode requires more than one byte to store each character. In ASCII, storing strings with several characters, can be done by using a sequence of several bytes and storing one character in each one. For example, in Figure 2.5, the string “Hi” is converted to binary. In Figure 2.5, the binary string 0100100001101001 represents “Hi” in ASCII. But, if you look at this chunk of binary as representing an integer, 48 UNIT 2. PROGRAMMING BASICS it’s the same as 18537. How does the computer know whether these two bytes in memory are representing the string “Hi” or the number 18537? There actually isn’t any difference as far as the computer itself is con- cerned. Its only job is to store the bits its given and do whatever calculations it’s asked to do. The programming language must keep track of what kind of information the different parts of the memory are holding. This is why the concept of types is so important in Python. If Python didn’t keep track of the type of each variable, there would be no way to tell. In some programming languages, C in particular, you can work around the type information that the programming language is storing. For example, you could store the string “Hi” and then later convince the computer that you wanted to treat that piece of memory like a number and get 18537. This is almost always a bad idea. How computers represent various types of information is some- times quite important when programming. It is also discussed in CMPT 150 (Computer Design) and courses that cover how pro- gramming languages work like CMPT 379 (Compiler Design). Topic 2.7 Example Problem Solving: Feet and Inches Back in Topic 2.5, there was a program that converted someone’s height in metres to inches: Enter your height (in metres): 1.6 You are 62.992 inches tall. But, people don’t usually think of their height in terms of the number of inches. It’s much more common to think of feet and inches. It would be better if the program worked like this: Enter your height (in metres): 1.6 You are 5’ 3" tall. 2.7. EXAMPLE PROBLEM SOLVING: FEET AND INCHES 49 write “Enter your height (in metres):” read metres set totalinches to 39.37 × metres set feet to ⌊totalinches/12⌋ set inches to totalinches − feet × 12 round inches to the nearest integer write “You are feet ′ inches ′′ tall.” Figure 2.6: Meters to feet-inches conversion pseudocode. metres = float(raw_input( \ "Enter your height (in metres): ")) total_inches = 39.37 * metres feet = total_inches/12 print "You are " + str(feet) + " feet tall." Figure 2.7: Converting to feet and inches: number of feet. The notation 5′ 3′′ is used to indicate “5 feet and 3 inches”, which is 5 × 12 + 3 = 63 inches. To do this conversion, convert the number of metres to inches, as done in Topic 2.5, by multiplying by 39.37. Then, determine how many feet and inches there are in the total number of inches. The pseudocode is shown in Figure 2.6. When you’re converting an idea for an algorithm to code, you shouldn’t try to do it all at once, especially when you’re first learning to program. Implement part of the algorithm first, then test the program to make sure it does what you expect before you move on. Trying to find problems in a large chunk of code is very hard: start small. Start writing a Python program to implement the pseudocode in Fig- ure 2.6. You can grab the first few lines from the program in Topic 2.5. Then, try to calculate the number of feet. This has been done in Figure 2.7. Note that when you run this program, it calculates the number of feet as a floating point number: Enter your height (in metres): 1.6 You are 5.24933333333 feet tall. 50 UNIT 2. PROGRAMMING BASICS metres = float(raw_input( \ "Enter your height (in metres): ")) total_inches = 39.37 * metres feet = int(total_inches/12) inches = total_inches - feet*12 print "You are " + str(feet) + " feet and " \ + str(inches) + " inches tall." Figure 2.8: Converting to feet and inches: feet and inches. This makes sense, given what we know about types: when Python divides a floating point value (metres), it returns a floating point value. But in the algorithm, you need an integer and it needs to be rounded down to the next integer. This is what the int function does when it converts floating point numbers to integers, so you can use that to get the correct value in feet. If you have a statement in Python that you want to split across multiple lines, so it’s easier to read, you can end the line with a backslash, “\”. This was done in Figures 2.7 and 2.8, so the code would fit on the page. Once you have the correct number of feet as an integer, you can calculate the number of inches too. This is done in Figure 2.8. This program does the right calculation, but leaves the number of inches as a floating point number: Enter your height (in metres): 1.6 You are 5 feet and 2.992 inches tall. To convert the number of inches to an integer, you can’t use the int function, which would always round down. You shouldn’t get 5′ 2′′ in the above example; you should round to the nearest integer and get 5′ 3′′ . You can use the round function for this. Note that round does the round- ing, but leaves the result as a floating point value. You will have to use the int function to change the type, but the value will already be correct. See Figure 2.9 for the details. When you run this program, the output is almost correct: Enter your height (in metres): 1.6 You are 5 feet and 3 inches tall. 2.7. EXAMPLE PROBLEM SOLVING: FEET AND INCHES 51 metres = float(raw_input( \ "Enter your height (in metres): ")) total_inches = 39.37 * metres feet = int(total_inches/12) inches = int(round(total_inches - feet*12)) print "You are " + str(feet) + " feet and " \ + str(inches) + " inches tall." Figure 2.9: Converting to feet and inches: rounding inches. The last thing you have to do to get the program working exactly as specified at the start of the topic is to print out the feet and inches in the proper format: 5′ 3′′ . This presents one last problem. You can’t just print double quotes, since they are used to indicate where the string literal begins and ends. Code like this will generate an error: print str(feet) + "’ " + str(inches) + "" tall" The interpreter will see the "" and think it’s an empty string (a string with no characters in it). Then, it will be very confused by the word “tall”. The solution is to somehow indicate that the quote is something that it should print, not something that’s ending the string. There are several ways to do this in Python: • Put a backslash before the quote. This is called escaping a character. It’s used in a lot of languages to indicate that you mean the character itself, not its special use. print str(feet) + "’ " + str(inches) + "\" tall" • Use a single quote to wrap up the string. In Python, you can use either single quotes (’) or double quotes (") to indicate a string literal. There’s no confusion if you have a double quote inside a single-quoted string. print str(feet) + "’ " + str(inches) + ’" tall’ Of course, you have to use double quotes for the string that contains a single quote. 52 UNIT 2. PROGRAMMING BASICS metres = float(raw_input( \ "Enter your height (in metres): ")) total_inches = 39.37 * metres feet = int(total_inches/12) inches = int(round(total_inches - feet*12)) print "You are " + str(feet) + "’ " \ + str(inches) + ’" tall.’ Figure 2.10: Converting to feet and inches: printing quotes • A final trick that can be used is Python’s triple-quoted string. If you wrap a string in three sets of double quotes, you can put anything inside (even line breaks). This can be a handy trick if you have a lot of stuff to print and don’t want to have to worry about escaping characters. print str(feet) + """’ """ + str(inches) \ + """" tall""" This can be very cumbersome and hard to read for short strings like this. (As you can see, it made the whole thing long enough it wouldn’t fit on one line.) It’s more useful for long strings. So, finally, the quotes can be printed to produce the desired output. See Figure 2.10. When the program runs, it produces output like this: Enter your height (in metres): 1.6 You are 5’ 3" tall. But, there is still one problem with this program that is a little hard to notice. What happens when somebody comes along who is 182 cm tall? Enter your height (in metres): 1.82 You are 5’ 12" tall. That’s not right: five feet and twelve inches should be displayed as six feet and zero inches. The problem is with the rounding-off in the calculation. For this input, total_inches becomes 71.6534, which is just under six feet (72 inches). Then the division to calculate feet gives a result of 5, which we should think of as an error. The problem isn’t hard to fix: we are just doing the rounding-off too late. If instead of total_inches being the floating-point value 71.6534, we could 2.7. EXAMPLE PROBLEM SOLVING: FEET AND INCHES 53 metres = float(raw_input( \ "Enter your height (in metres): ")) total_inches = int(round(39.37 * metres)) feet = total_inches/12 inches = total_inches - feet*12 print "You are " + str(feet) + "’ " \ + str(inches) + ’" tall.’ Figure 2.11: Converting to feet and inches: fixed rounding error round it off immediately to 72. That would correct this problem and it has been done in Figure 2.11. Now we get the right output: Enter your height (in metres): 1.82 You are 6’ 0" tall. This is a good lesson for you to see at this point: it’s important to test your program carefully, since bugs can hide in unexpected places. Check-Up Questions ◮ Download the code from this topic from the course web site and test it with some other inputs. Do the conversion by hand and make sure the program is working correctly. ◮ Try some “bad” inputs and see what the program does. For example, what if the user types in a negative height? What if they type something that isn’t a number? Summary There’s a lot in this unit. You should be writing your first programs and figuring out how computers work. The example developed in Topic 2.7 is intended to give you some idea of how the process of creating a program might look. When you’re learning to program, you should be writing programs. Read- ing this Guide over and over won’t help. You should actually spend some 54 UNIT 2. PROGRAMMING BASICS time at a computer, experimenting with the ideas presented here, learning how to decipher error messages, and dealing with all of the other problems that come with writing your first programs. Key Terms • interactive interpreter • unsigned integer • statement • string • expression • ASCII • operator • floating point • function • binary • argument • bit • variable • byte • variable assignment • two’s complement • type • character set • conversion • escaping a character • integer Unit 3 Control Structures Learning Outcomes • Design algorithms that use decision making and implement them in Python. • Design algorithms that use iteration and implement them in Python. • Given an algorithm, approximate its running time. • Find and fix bugs in small programs. • Create algorithms for more complex problems. Learning Activities • Read this unit and do the “Check-Up Questions.” • Browse through the links for this unit on the course web site. • Read Sections 4.2–4.7, 6.2–6.4, 1.3, Appendix A in How to Think Like a Computer Scientist. Topic 3.1 Making Decisions All of the code we have written so far has been pretty simple. It all runs from top to bottom, and every line executes once as it goes by. The process soon becomes boring. It’s also not very useful. 55 56 UNIT 3. CONTROL STRUCTURES write “Think of a number between 1 and 10.” set guess to 6. write “Is your number equal to guess?” read answer if answer is “yes”, then write “I got it right!” if answer isn’t “yes”, then write “Nuts.” write “That’s the end of the game.” Figure 3.1: Simplified guessing game In the guessing game example from Figure 1.3, we need to decide if the user has guessed correctly or not and then take the appropriate action. Al- most every program you write to solve a real problem is going to need to do this kind of thing. There are a few ways to make decisions in Python. We’ll only explore one of them here. The if statement The most common way to make decisions in Python is by using the if statement. The if statement lets you ask if some condition is true. If it is, the body of the if will be executed. For example, let’s simplify the guessing game example from Figure 1.3. In the simplified version, the user things of a number from 1 to 10 and the computer only takes one guess. Pseudocode for this game is shown in Figure 3.1. Some Python code that implements Figure 3.1 can be found in Figure 3.2. Here are two example executions of the program: Think of a number between 1 and 10. Is your number equal to 6? no Nuts. That’s the end of the game. Think of a number between 1 and 10. Is your number equal to 6? yes 3.1. MAKING DECISIONS 57 print "Think of a number between 1 and 10." guess = 6 answer = raw_input("Is your number equal to " \ + str(guess) + "? ") if answer == "yes": print "I got it right!" if answer != "yes": print "Nuts." print "That’s the end of the game." Figure 3.2: Implementation of Figure 3.1 I got it right! That’s the end of the game. As you can see from these examples, only one of the print statements inside of the if is executed. The if statement is used to make a decision about whether or not some code should be executed. The condition is used to decide what to do. The two conditions in Fig- ure 3.2 are answer == "yes" and answer != "yes". These mean “answer is equal to yes” and “answer is not equal to yes,” respectively. We will look more at how to construct conditions later. The indented print statements are not executed when the if condition is false. These statements make up the body of each if statement. The last print is executed no matter what: it isn’t part of the if. In Python (unlike many programming languages), the amount of space you use is important. The only way you can indicate what statements are part of the if body is by indenting, which means you’ll have to be careful about spacing in your program. All block statements in Python (we’ll be seeing more later) are indented the same way. You start the block and then everything that’s indented after it is the body of the block. When you stop indenting, the block is over. How much you indent is up to you, but you have to be consistent. Most Python programmers indent 4 spaces and all of the example code for this course is written that way. 58 UNIT 3. CONTROL STRUCTURES Boolean Expressions The expressions that are used for if conditions must be either true or false. In Figure 3.2, the first condition is answer == "yes" , and it evaluates to true when the value stored in answer is "yes". These conditions are called boolean expressions. The two boolean values are True and False. A boolean expression is any expression that evaluates to true or false. To check to see if two values are equal, the == operator is used and != is the not equal operator. >>> if 4-1==3: print "Yes" Yes You have to press an extra Enter in this example after the print statement. In the Python interpreter, the extra blank line is used to tell it you’re done the block. The less than sign (<) and greater than sign (>) do just what you’d expect. For <, if the left operand is less than the right operand, it returns true. There are also boolean operators less than or equal (<=), and greater than or equal (>=). Note the difference between = and ==. The = is used for variable assign- ment; you’re telling Python to put a value into a variable. The == is used for comparison—you’re asking Python a question about the two operands. Python won’t let you accidentally use a = as part of a boolean expression, for this reason. Functions and methods can also return True or False. For example, strings have a method islower that returns True if all of the characters in the string are lower case (or not letters): >>> s="Hans" >>> s.islower() False >>> s="hans" >>> s.islower() True 3.2. DEFINITE ITERATION: FOR LOOPS 59 The else clause In Figure 3.2, we wanted to take one action if the user answered "yes" and another if they answered anything else. It could also have been written in the following way: if answer == "yes": print "I got it right!" else: print "Nuts." The else clause is executed if the if condition is not true. In the if statement, you can specify an else clause. The purpose of the else is to give an “if not” block of code. The else code is executed if the condition in the if is false. It is also possible to allow more possibilities with elif blocks. The elif is used as an “else if” option. In Figure 3.2, we could have done something like this: if answer == "yes": print "I got it right!" elif answer == "no": print "Nuts." else: print "You must answer ’yes’ or ’no’." Here, the logic of the program changes a little. They have to answer “yes” or “no”. If they answer anything else, they get an error message. Any number of elifs can be inserted to allow for many possibilities. Whenever an if. . . elif. . . elif. . . else structure is used, only one of the code bodies will be executed. The else will only execute if none of the conditions are true. Topic 3.2 Definite Iteration: for loops We are still missing one major concept in computer programming. We need to be able to execute the same code several times (iterate). There are several ways to iterate in most programming languages. We will discuss two ways you can use in Python: for and while loops. 60 UNIT 3. CONTROL STRUCTURES write “Enter a nonnegative integer:” read n set factorial to 1 do this for i equal to each number from 1 to n: set factorial to factorial × i write factorial Figure 3.3: Pseudocode to calculate factorials num = int( raw_input("How high should I count? ") ) for i in range(num): print i, Figure 3.4: Using a for loop in Python The for loop In some problems, you know ahead of time how many times you want to execute some code. For example, suppose we want to calculate factorials. If you haven’t run across factorials before, “n factorial” is written “n!” and is the product of all of the number from 1 to n: n! = 1 × 2 × 3 × · · · × (n − 1) × n . We can write a program to calculate factorials and we’ll know that we have to do n multiplications. Figure 3.3 contains pseudocode to calculate factorials. In Python, if you have a problem like this where you know before you start iterating how many times you’ll have to loop, you can use a for loop. Loops where you know how many times you’re going to loop when you start are called definite loops. Well, they are in textbooks. Ev- erybody else just calls them “for loops”. Isn’t Computing Science fun? Before we try to implement a factorial program from Figure 3.3, let’s explore the for loop a little. The program in Figure 3.4 uses a for loop to count as high as the user asks. 3.2. DEFINITE ITERATION: FOR LOOPS 61 n = int( raw_input("Enter a nonnegative integer: ") ) factorial = 1 for i in range(n): factorial = factorial * (i+1) print factorial Figure 3.5: Calculating factorials with Python The easiest way to construct a for loop is with the range function. When a for loop is given range(x), the loop body will execute x times. Figure 3.4 will look like this when it’s executed: How high should I count? 12 0 1 2 3 4 5 6 7 8 9 10 11 Notice that the range starts from zero and counts up to num − 1. If we wanted to count from one, we could have written the loop like this: for i in range(num): print i+1, We will have to do this when implementing the factorial algorithm since we need the values from 1 to n, not from 0 to n − 1. Here are the details of what happens when that loop runs: The given range, range(num), will cause the loop to repeat num times; the range rep- resents the integers 0, 1, 2, . . . , num-1. The loop variable (i) will be set to the first value in the range (0), and the loop body (the print statement) is executed. The body is then repeated for each of the other values in the range (1, 2, . . . , num-1). Figure 3.5 contains a program program that calculates n! . For some strange historical reason, i is a common choice for the for loop variable if nothing else is appropriate. You should choose a descriptive variable name where possible, but for quick, short loops, i is a good default. 62 UNIT 3. CONTROL STRUCTURES name = raw_input("What is your name? ") while name=="": name = raw_input("Please enter your name: ") print "Hello, " + name Figure 3.6: Using a while loop in Python Topic 3.3 Indefinite Iteration: while loops If you don’t know how many times you want the loop body to execute, the for loop is hard to work with. For example, in the guessing game in Figure 1.3, we just have to keep guessing until we get it right. That could be anywhere from 1 to 7 guesses. We will finally implement the guessing game in Topic 3.5. In Python, you can do this with a while loop. To construct a while loop, you use a condition as you did in a if statement. The body of the loop will execute as many times as necessary until the condition becomes false. For example, the program in Figure 3.6 will ask the user to enter his or her name. If a user just presses enter, the program will keep asking until the user provides a response. It looks like this when it’s executed: What is your name? Please enter your name: Please enter your name: Please enter your name: Sherri Hello, Sherri When the while loop runs, these steps are repeated: 1. Check the value of the loop condition. If it’s False, the loop exits, and the program moves on to the following code. 2. Run the loop body. Basically, the while loop uses its condition to ask “should I keep going?” If so, it runs the loop once more and asks again. 3.4. CHOOSING CONTROL STRUCTURES 63 When you use an indefinite loop, you have to make sure that the loop condition eventually becomes false. If not, your program will just sit there looping forever. This is called an infinite loop. You’ll write an infinite loop sooner or later. Press control-C to stop the Python interpreter when you get tired of waiting for infinity. Topic 3.4 Choosing Control Structures When you’re starting to program, you may find it difficult to decide which control structure(s) to use to get a particular result. This is something that takes practice and experience programming. Once you figure out how to really work with these control structures, it become easier. There aren’t any rigid rules here. There are often many ways to do the same thing in a program, especially as things get more complicated. Below are a few guidelines. Before you can choose a control structure, you need to have a pretty good idea what you want the computer to do: you need to have an algorithm in mind. Once you do, you can then think about how to get a program to do what you want. • Just do it. Remember that most statements in Python (and most other programming languages) are executed in the order that they appear. So, a variable assignment, or print statement will execute right after the statement before (unless a control structure changes how things run). These statements tell the computer what you want to do. The control structures let you express when it will happen. (Sort of. Don’t take that too literally.) • Maybe do it. If you have some code that you want to execute only in a particular situation, then a conditional (if statement) is appropriate. An if will run its body zero times or one time. If you need to do similar things more than once, you should be looking at a loop. If the logic you need is “either this or that,” then you should use if with an else clause. If you need to do “one of these things,” then use the if. . . elif. . . elif. . . else form. 64 UNIT 3. CONTROL STRUCTURES In any of these cases, you need to come up with a boolean expression that describes when it’s appropriate to do each case. This again takes practice. The goal is to use the variables you have and the boolean operators to come up with something that is true exactly when you want the code to run. • Do it several times. When you need to do something several times, you need a loop. Remember that the loops can execute zero, one, or more times, depending on exactly how you’ve expressed things (and the values of variables, and what the use types, and so on). Note that the task done in the body of the loop doesn’t have to be exactly the same every time through. You can use the loop variable (in a for loop) and any other variables in your program to keep track of what should be done this time through the loop. For example, you might want to examine the loop variable to see if it has a particular property and print it to the screen if it does. To do this, you would use a if statement in the loop, and write a condition that expresses the property you’re looking for. • Do it this many times. If you know when you start the loop how many times it will run (for example, count up to a certain number, or run once for every item in a collection), you can use a for loop. For now, all of our for loops will loop over a range of numbers. In Topic 5.2, you will see that for can be used to loop over other items. Note that you don’t need to know how many times you’ll loop when you’re writing the program. The size of the range can be an expression that’s calculated from user input, or anything else. You just need to be able to figure this out when the program gets to the for loop. If you do a calculation and end up looping over range(0), the body of the for loop will run zero times. • Do it until it’s done. There are often situations when you can’t tell how many times to loop until you notice that you’re done. These are usually of the form “keep doing this until you’re done; you’re done when this happens.” In these cases, you probably need a while loop. A while loop is similar to an if in that you need to write a boolean expression that describes 3.5. EXAMPLE PROBLEM SOLVING: GUESSING GAME 65 print "Think of a number from 1 to 100." smallest = 1 largest = 100 guess = (smallest + largest) / 2 print "My first guess is " + str(guess) + "." Figure 3.7: Guessing game: first guess when to “go”. Unlike a if, a while loop will execute the body repeat- edly, until its condition is false. When writing the condition for a while loop, you need to figure out how the computer will determine that you still have more work to do before the loop is finished. Often, this is “I need to take another step if. . . ”. The body of the while is then the code necessary to do a “step”. Once again, finding the control structure to express what you want to do does take some practice, and there aren’t really any rules. But, hopefully the above will give you something to start with. Topic 3.5 Example Problem Solving: Guessing Game In Unit 1, we introduced the guessing game algorithm. It guessed a number from 1 to 100 that the user was thinking. Working from the pseudocode in Figure 1.3, we now have all of the tools we need to write a program implementing this algorithm. As we saw in the first problem solving example in Topic 2.7, you shouldn’t try to write the whole program at once and just hope it will work. You should test as you write. The program in Figure 3.7 starts the game and makes the first guess. This will let us test the expression to calculate guess first. We can change the initial values for smallest and largest and make sure guess is always halfway between. Now that we can make one guess, we can combine this with an if state- ment to ask the user whether or not we’re right. This is similar to Figure 3.2. This is done in Figure 3.8. 66 UNIT 3. CONTROL STRUCTURES print "Think of a number from 1 to 100." smallest = 1 largest = 100 guess = (smallest + largest) / 2 answer = raw_input( "Is your number ’more’, ’less’," \ " or ’equal’ to " + str(guess) + "? " ) if answer == "more": smallest = guess + 1 elif answer == "less": largest = guess - 1 print smallest, largest Figure 3.8: Guessing game: get an answer We can check this program and make sure our logic is right. Have we accidentally interchanged what should be done in the two cases? Suppose we’re thinking of 80: Think of a number from 1 to 100. Is your number ’more’, ’less’, or ’equal’ to 50? more 51 100 Now, the program will be guessing numbers from 51 to 100, which is the right range. On the other side, if we were thinking of 43, Think of a number from 1 to 100. Is your number ’more’, ’less’, or ’equal’ to 50? less 1 49 Since 43 is in the range 1–49, we are still on the right track. Now, we can put this into a loop and keep guessing until we get it right. By directly translating out pseudocode, we would get something like Fig- ure 3.9. But, if you try to run this program, you’ll see an error like this: Think of a number from 1 to 100. Traceback (most recent call last): File "C:/Python23/guess3.py", line 4, in -toplevel- while answer != "equal": NameError: name ’answer’ is not defined 3.5. EXAMPLE PROBLEM SOLVING: GUESSING GAME 67 print "Think of a number from 1 to 100." smallest = 1 largest = 100 while answer != "equal": guess = (smallest + largest) / 2 answer = raw_input( "Is your number ’more’, ’less’," \ " or ’equal’ to " + str(guess) + "? " ) if answer == "more": smallest = guess + 1 elif answer == "less": largest = guess - 1 print smallest, largest print "I got it!" Figure 3.9: Guessing game: trying a loop This happens because we have tried to use the value in the variable answer before ever putting anything into it. In Python, variables don’t exist until you put a value in them with a variables assignment, using =. So, when we try to use the value in answer in the while loop’s condition, the variable doesn’t exist. Python doesn’t have anything to use with the name answer so it generates a NameError. This is a good example of problems that can come up when translating pseudocode into a programming language. There isn’t always a nice, neat translation; you have to work with the language you’re writing your program in. This is also part of the reason it’s a good idea to write pseudocode in the first place: it’s easier to work out the algorithm without fighting with the programming language. In order to get around this, we have to get something in the variable answer before we try to use its value. We could copy the two statements that assign values to guess and answer outside of the loop, but that could be a little hard to work with later: if we have to fix the code, we have to fix two things instead of one. The easiest thing to do is just put a dummy value in answer so the variable exists, but we’re still sure the condition is false. This has been done 68 UNIT 3. CONTROL STRUCTURES print "Think of a number from 1 to 100." smallest = 1 largest = 100 answer = "" while answer != "equal": guess = (smallest + largest) / 2 answer = raw_input( "Is your number ’more’, ’less’," \ " or ’equal’ to " + str(guess) + "? " ) if answer == "more": smallest = guess + 1 elif answer == "less": largest = guess - 1 print smallest, largest print "I got it!" Figure 3.10: Guessing game: a working loop in Figure 3.10. Let’s try this program. Suppose we’re thinking of the number 43 and play the game: Think of a number from 1 to 100. Is your number ’more’, ’less’, or ’equal’ to 50? less 1 49 Is your number ’more’, ’less’, or ’equal’ to 25? more 26 49 Is your number ’more’, ’less’, or ’equal’ to 37? more 38 49 Is your number ’more’, ’less’, or ’equal’ to 43? equal 38 49 I got it! There is still some extra output that we can use while testing the program. After each guess, it prints out the current range of numbers it’s considering (the lines like 26 49). Don’t be afraid to print out extra stuff like this to help you figure out exactly what your program’s doing while testing. Notice that in Figure 3.10, we have a if block inside of a while loop. 3.5. EXAMPLE PROBLEM SOLVING: GUESSING GAME 69 print "Think of a number from 1 to 100." # start with the range 1-100 smallest = 1 largest = 100 # initialize answer to prevent NameError answer = "" while answer != "equal": # make a guess guess = (smallest + largest) / 2 answer = raw_input( "Is your number ’more’, ’less’," \ " or ’equal’ to " + str(guess) + "? " ) # update the range of possible numbers if answer == "more": smallest = guess + 1 elif answer == "less": largest = guess - 1 print "I got it!" Figure 3.11: Guessing game: final version If you want to do that, or include a loop in a loop, or an if in an if, just increase the amount of indenting so the inside block is indented more. Finally, we have a working guessing game program. A final polished version has been created in Figure 3.11. You’ll notice in Figure 3.11 that we have added some comments to the code. In Python, comments are on lines that start with a #. Comments let you include information that is meant only for the pro- grammer. The Python interpreter ignores comments. They are used only to make code easier to understand. You should be in the habit of writing comments on sections of code that briefly describe what the code does. 70 UNIT 3. CONTROL STRUCTURES write “Think of a number between 1 and 100.” set guess to 1 until the user answers “equal”, do this: write “Is your number equal to or not equal to guess?” read answer set guess to guess + 1 Figure 3.12: A much worse version of the guessing game Topic 3.6 Running Time In this section, we will explore how long it takes for algorithms to run. The running time of an algorithm will be part of what determines how fast a program runs. Faster algorithms mean faster programs, often much faster, as we will see. The running time of algorithms is a very important aspect of computing science. We will approach it here by working through some examples and determining their running time. The Guessing Game When you were experimenting with the guessing game algorithm and pro- gram in Topics 1.4 and 3.5, you probably tried the game with a few different numbers. Hopefully, you noticed that this algorithm focuses in on the num- ber you’re guessing quite quickly. It takes at most seven guesses to get your number, no matter which one you’re thinking of. Suppose we had written the program from the pseudocode in Figure 3.12. This algorithm starts at zero and guesses 1, 2, 3, . . . , until the user finally enters “equal”. It does solve the “guess the number between 1 and 100” problem and it meets the other criteria in the definition of an algorithm. What’s different about this algorithm is the amount work it has to do to finish. Instead of at most seven guesses, this algorithm requires up to 100. Obviously, if we’re trying to write a fast program, an algorithm that requires seven steps to solve a problem is much faster than one that takes 100. Suppose we were writing a guessing game that guessed a number from 1 to n. The value of n could be determined when we write the program or by 3.6. RUNNING TIME 71 asking the user for the “size” of the game before we start. How many steps would each algorithm take if we modified it to guess a number from 1 to n? The algorithm in Figure 3.12 would take up to n steps (if the user was thinking of n). The number of guesses needed by the original algorithm from Figure 1.3 is a little harder to figure out. Each time the algorithm makes a guess, it chops the range from smallest to largest in half. The number of times we can cut the range 1–n in half before getting down to one possibility is ⌈log2 n⌉. The notation ⌈x⌉ means “round up”. It’s the opposite of the floor notation, ⌊x⌋ and is usually pronounced “ceiling”. The mathematical expression log2 n is the “base-2 logarithm of n”. It’s the power you have to raise 2 to to get n. So, if we let x = log2 n, then it’s always true that 2x = n. Having a running time around log2 n steps is good since it grows so slowly when n increases: log2 1 = 0 , log2 16 = 4 , log2 1024 = 10 , log2 1048576 = 20 . We could give this program inputs with n = 1000000 and it would still only take about 20 steps. Why does this algorithm take about log2 n steps? Consider the number of possible values that could still be the value the user is thinking of. Remember that this algorithm cuts the number of possibilities in half with each step. Step Possible values 0 n = n/20 1 n/2 = n/21 2 n/4 = n/22 3 n/8 = n/23 k n/2k 72 UNIT 3. CONTROL STRUCTURES In the worst case, the game will end when there is only one possibility left. That is, it will end after k steps, where 1 = n/2k 2k = n log2 2k = log2 n k = log2 n . So, it will take log2 n steps. Basically, any time you can create an algorithm like this that cuts the problem in half with every iteration, it’s going to be fast. Remember: any time you have a loop that cuts the problem in half with each iteration, it will loop log n times. If you understand the above derivation, good. If you’d just like to take it on faith, that’s fine too. Repeated Letters Now consider the algorithm in Figure 3.13. It will check a word (or any string, really) to see if any character is repeated anywhere. For example, the word “jelly” has a repeated “l”; the word “donuts” has no repeated letters. This algorithm works by taking leach letter in the word, one at a time, and checking each letter to the right to see if its the same. For example, with the word “Lenny”, it will make these comparisons: L is compared with e, n, n, y (none are equal) e is compared with n, n, y (none are equal) n is compared with n, y (equals the “n”) n is compared with y (none are equal) y is compared with nothing In the third line, it will compare the first and second “n” and notice that they are equal. So, this word has repeated letters. You can find a Python implementation of this program in Figure 3.14. The only new thing in this program: you can get character n out of a string str by using the expression str[n]. This is called string subscripting. Note that the first character in the string is str[0], not str[1]. This program makes n(n − 1)/2 = n2 /2 − n/2 comparisons if you enter a string with n characters. When measuring running time of a program, 3.6. RUNNING TIME 73 write “Enter the word:” read word set counter to 0 for all letters letter a in the word , do this: for all letters letter b to the right of letter a, do this: if letter a is equal to letter b then set counter to counter +1 if counter > 0 then write “There are repetitions” else write “No repetitions” Figure 3.13: Algorithm to check for repeated letters in a word word = raw_input("Enter the word: ") counter = 0 length = len(word) for i in range(length): # for each letter in the word... for j in range(i+1, length): # for each letter after that one... if word[i]==word[j]: counter = counter + 1 if counter>0: print "There are repeated letters" else: print "There are no repeated letters" Figure 3.14: Repeated letters Python implementation 74 UNIT 3. CONTROL STRUCTURES we won’t generally be concerned with the smaller terms because they don’t change things too much as n gets larger (we’ll ignore n/2 in the example). We generally aren’t concerned with the constant in front of the term (the 1 2 on the n2 ). So, we will say that the algorithm in Figure 3.13 has a running time of n2 . You’d be right if you think that throwing away the 12 is losing a lot of information. The difference between an algorithm that finishes in 100 or 200 seconds is significant. The problem is that it’s too hard to come up with a factor like this that actually means something. The algorithm could easily run twice as fast or half as fast if it was implemented in a different programming language, using different commands in the language, on a different computer, and so on. Bottom line: is the 12 a big deal? Yes, but we don’t worry about it when estimating running times. If you are taking or have taken MACM 101, you might recognize all of this as the same thing that’s done with big-O notation. We would say that the repeated letters algorithm has a running time of O(n2 ). If you haven’t taken MACM 101, watch for the big-O stuff if you do. Subset Sum Let’s consider one final example where the best known solution is very slow. Suppose we get a list of integers from the user, and are asked if some of them (a subset) add up to a particular target value. This problem is known as “subset sum”, since we are asked to find a subset that sums to the given value. For example, we might be asked to find a subset of 6, 14, 127, 7, 2, 8 that add up to 16. In this case, we can. We can take the 6, 2, and 8: 6+2+8 = 14. In this example, we should answer “yes”. If we used the same list of numbers, but had a target of 12, there is no subset that adds to 12. We should answer “no” in that case. Some rough pseudocode for the subset-sum problem can be found in Fig- ure 3.15. This algorithm will solve the problem. It simply checks every possible subset of the original list. If one sums to the target, we can answer “yes”; if none do, the answer is “no”. 3.6. RUNNING TIME 75 for every subset in the list: set sum to to the sum of this subset if sum is equal to target: answer “yes” and quit answer “no” Figure 3.15: Algorithm to solve the subset-sum problem. n 2n ≈ Approx. time 4 16 16 milliseconds 10 103 1 second 20 106 17.7 minutes 30 109 11.6 days 40 1012 31.7 years Figure 3.16: Running time of an exponential algorithm What is the running time of this algorithm? It depends on the number of times the loop runs. If we have a list of n items, there are 2n subsets, so the running time will be exponential: 2n . More accurately, the running time is n2n , since calculating the sum of the subset takes up to n steps. An exponential running time is very slow. It’s so slow that exponen- tial running times are often not even considered “real” solutions. Consider Figure 3.16. It gives running times of a 2n algorithm, assuming an imple- mentation and computer that can do 1000 iterations of the loop per second. As you can see from Figure 3.16, this algorithm can only solve subset-sum for the smallest of cases. Even if we find a computer that is much faster, we can only increase the solvable values of n by a few values. Can we do better than the algorithm in Figure 3.15? Maybe. There are no known sub-exponential algorithms for this problem (or others in a large class of equally-difficult problems), but there is also no proof that they don’t exist. 76 UNIT 3. CONTROL STRUCTURES Number of “Steps” We have avoided giving the exact definition of a “step” when calculating running times. In general, pick a statement in the innermost loop, and count the number of times it runs. Usually, you can multiply together the number of iterations of the nested loops. For example, consider an algorithm like this one: statement 1 for i from 1 to log n: statement 2 for j from 1 to n/2: statement 3 Here, “statement 3” is in the innermost loop, so we will count the number of times it executes. The first for loop runs log n times, and the second runs n/2 times. So, the running time is (log n) · (n/2). We discard the constant factor and get a running time of n log n. Remember that when determining running time, we will throw away lower-order terms and leading constants. That means we don’t have to count anything in “shallower” loops, since they will contribute lower-order terms. Similarly, we don’t have to worry about how many statements are in the loops; that will only create a leading constants, which will be discarded any- way. Summary We have now seem algorithms that have running time log n, n, n2 and 2n . See Figure 3.17 for a comparison of how quickly these times grow as we increase n. In the graph, 2n has been excluded, because it grows too fast to see without making a much larger graph. As you can see, the log2 n function is growing very slowly and n2 is growing quite fast. Coming up with algorithms with good running times for problems can be very hard. We will see a few more examples in this course. A lot of computing scientists spend a lot of time working on efficient algorithms for particular problems. For a particular algorithm, you can come up with programs that run faster or slower because of the way they are written. It’s often possible to decrease the number of iterations slightly or make the calculations more efficient. 3.7. DEBUGGING 77 100 n2 80 n 60 steps 40 20 log2 n 0 20 40 60 80 100 n Figure 3.17: Graph of the functions log2 n, n, and n2 But no matter how fast the program is you’ve written, a better algo- rithm will always win for large inputs. This is why most computing science courses spend so much time focusing on algorithms instead of the details of programming. The programming part is easy (one you learn how, anyway). It’s coming up with correct, fast algorithms that’s hard. Running time is fundamental when it comes to studying algo- rithms. It is covered in CMPT 225 (Data Structures and Pro- gramming), CMPT 307 (Data Structures and Algorithms), and many other courses. The mathematical details of the analysis are covered in MACM 101 (Discrete Math I). Topic 3.7 Debugging Unfortunately, when you write programs, they usually won’t work the first time. They will have errors or bugs. This is perfectly normal, and you shouldn’t get discouraged when your programs don’t work the first time. Debugging is as much a part of programming as writing code. 78 UNIT 3. CONTROL STRUCTURES Section 1.3 and Appendix A in How to Think Like a Computer Scientist cover the topic of bugs and debugging very well, so we won’t repeat too much here. You should read those before you start to write programs on your own. Beginning programmers often make the mistake of concentrating too much on trying to fix errors in their programs without understanding what causes them. If you start to make random changes to your code in the hopes of getting it to work, you’re probably going to introduce more errors and make everything worse. When you realize there’s a problem with your program, you should do things in this order: 1. Figure out where the problem is. 2. Figure out what’s wrong. 3. Fix it. Getting it right the first time The easiest way to get through the first two steps here quickly is to write your programs so you know what parts are working and what parts might not be. Write small pieces of code and test them as you go. As you write your first few programs, it’s perfectly reasonable to test your program with every new line or two of code. It’s almost impossible to debug a complete program if you haven’t tested any of it. If you get yourself into this situation, it’s often easier to remove most of the code and add it back slowly, testing as you do. Obviously, it is much easier to test as you write. Don’t write your whole program without testing and then ask the TAs to fix it. Basically, they would have to rewrite your whole program to fix it, and they aren’t going to do that. As you add code and test, you should temporarily insert some print statements. These will let you test the values that are stored in variables so you can confirm that they are holding the correct values. If not, you have a bug somewhere in the code you’ve written and should fix it before you move on. In the two example “Problem Solving” topics, 2.7 and 3.5, the program was written in small pieces to illustrate this approach. 3.7. DEBUGGING 79 Finding bugs Unfortunately, you won’t always catch every problem in your code as you write it, no matter how careful you are. Sooner or later, you’ll realize there is a bug somewhere in your program that is causing problems. Again, you should resist the urge to try to fix the problem before you know what’s wrong. Appendix A of How to Think Like a Computer Scientist talks about different kinds of errors and what to do about them. When you realize you have a bug in your program, you’re going to have to figure out where it is. When you are narrowing the source of a bug, the print statement can be your best friend. Usually, you’ll first notice either that a variable doesn’t contain the value you think it should or that the flow of control isn’t the way you think it should be because the wrong part of an if is executed. You need to work backwards from the symptom of the bug to its cause. For example, suppose you had an if statement like this: if length*width < possible area: If the condition doesn’t seem to be working properly, you need to figure out why. You can add in some print statements to help you figure out what’s really going on. For example, print "l*w:", length*width print "possible:", possible area if length*width < possible area: print "I’m here" When you check this way, be sure to copy and paste the exact expressions you’re testing. If you accidentally mistype them here, it could take a long time to figure out what has happened. You’ll probably find that at least one of the print statements isn’t doing what it should. In the example, suppose the value of length*width wasn’t what we expected. Then, we could look at both variables separately: print "l, w:", length, width If length was wrong, you would have to backtrack further and look at what- ever code sets length. Remove these print statements and add in some more around the length=. . . statement. 80 UNIT 3. CONTROL STRUCTURES Topic 3.8 Coding Style Writing code that is correct and solves the problem isn’t always enough. It’s also important to write code that someone can actually read and understand. It is often necessary for you or others to return to some code and add features or fix problems. In fact, in commercial software, most of the expense is in maintenance, not in the initial writing of the code. It can be quite difficult to look at someone else’s code (or even your own code after a few months) and figure out what’s going on. In order to fix bugs or add features, you need to understand the logic and details of the code, otherwise you’ll probably break more than you fix. To help others (and yourself) understand the code you’ve written, it’s important to try to make everything as clear as possible. There are no absolute rules in this, but there are some guidelines you can follow. Comments Comments can be used in your code to describe what’s happening. In Python, the number sign (#, also called the hash or pound sign) is used to start a comment. Everything on a line after the # is ignored: it is there for the programmer only, and does not affect the way the program runs. In your programs, you should use comments to explain difficult parts of the code. The comment should explain what is happening and/or why it needs to be done. This can include a description of the algorithm and purpose, if it’s not immediately clear. Often, when beginning programmers are told “comments are good,” the results are something like this: # add one to x x = x + 1 Don’t do that. Anyone reading your code should understand Python, and doesn’t need the language explained to them. Comments that actually ex- plain how or why are much more useful: # the last entry was garbage data, so ignore it count = count - 1 3.8. CODING STYLE 81 That comment will help somebody reading your code understand why it was necessary to decrease count. It is often useful to put a comment at the start of each control structure, explaining what it does, or what the condition checks for. Here are some examples: # if the user entered good data, add it in if value >= 0: total = total + value count = count + 1 # search for a value that divides num while num%factor != 0: factor = factor + 1 You can also put a comment at the top of a “section” of code. Look for chunks of code that do a specific task, and put a comment at the top that describes what that task is, and how it’s done. For example, # Get user input # Ask for integers, adding them up, until the user # enters "0". The Code Itself The way the code itself is written can make a huge difference in readability and maintainability. Probably the easiest thing to do is to use good variable names. Variable names should describe what the variable holds and what it’s for. Consider the first example above with poorly chosen variable names: if x >= 0: y = y + x z = z + 1 It would take a lot of careful reading to figure out what this code actually does. With descriptive variable names, it’s much easier, even without the comment: if value >= 0: total = total + value count = count + 1 82 UNIT 3. CONTROL STRUCTURES a = int(raw_input("Enter an integer: ")) b = 0 for i in range(a+1): if i>0: if a % i==0: b = b + 1 print "There are " + str(b) + " factors." Figure 3.18: A program with poor style There is a bit of a trade-off between the length of the variable name and how readable it is. On one hand, short variables names like a, n1, and x aren’t very descriptive. But, variable names that are too long can be difficult to type and clutter code. The name total_number_of_values_entered_by_user may be very descriptive, but code using it would be unreadable. Perhaps values_entered would be better. The spacing in your code is also important. In Python, you are required to indent the blocks of code inside a con- trol statement. In many other programming languages, this is optional, but still considered good practice. In Python, you should be consistent in your spacing: the standard style is to indent each block by four spaces. Spacing within a statement can help readability as well. Consider these two (functionally identical) statements: y = 100 / x+1 y = 100/x + 1 The spacing in the first one suggests that the x+1 calculation is done first (as in 100/(x+1)), but this is not the case. The order of operations in Python dictate that the expression is equivalent to (100/x)+1. So, the second spacing gives a more accurate first impression. You can use space within lines, and blank lines in the code, to separate sections logically. This will make it easier to scan the code later and pick out the units. Summary Again, there are no absolute rules for coding style. It is overall a matter of opinion, but the guidelines above can be followed to point you in the right 3.8. CODING STYLE 83 # get user input and initialize num = int(raw_input("Enter an integer: ")) count = 0 # check every possible factor (1...num) for factor in range(1, num+1): # if factor divides num, it really is a factor if num%factor == 0: count = count + 1 # output results print "There are " + str(count) + " factors." Figure 3.19: Figure 3.18 with the style improved direction. When you are writing code, you should always keep in mind how easy it is to read and follow the code. Add comments and restructure code where necessary. Consider the program in Figure 3.18. What does it do? How is it being done? Does the text displayed to the user in the last line help? Now look at Figure 3.19. This program does the exact same thing, but has better style. Better variable names and a few comments make a big difference in how easy it is to understand. Even if you don’t know what the % operator does in the if condition, you can probably figure out what the program does. a%b computes the remainder of a divided by b. When it evaluates to zero, there is no remainder: a is evenly divisible by b. There is another change in Figure 3.19 that is worth mentioning. The logic of the program was changed slightly to simplify it. The if in Fig- ure 3.18 is only necessary to eliminate the case where the loop variable is zero: checking this case causes a division by zero error. We can change the range so the loop avoids this case in the first place. This illustrates a final important aspect of coding style: the actual logic of the program. The first way you think of to accomplish something might not be the simplest. Always be on the lookout for more straightforward ways 84 UNIT 3. CONTROL STRUCTURES to do what needs to be done. You can often eliminate some logic in your program in favour of something that does the same thing in an easier way. This, along with other aspects of coding style, takes experience. You will learn new methods and tricks to get things done in a program as you write more code, read more code, and take more courses. Be patient and keep your eyes open for new techniques. ◮ Go back to the code you wrote for the first assignment in this course. Can you easily understand how it works? Keep in mind that it’s only been a few weeks: imagine coming back to it next year. ◮ Now, try to swap code with someone else in the course. Can you understand each other’s code? Topic 3.9 More About Algorithms Now that you know about variables, conditionals, and loops, you have all of the building blocks you need to start implementing algorithms. (You’ll still need to know some more about working with data structures before you can implement any algorithm. We will talk more about data structures in Unit 5.) We have also said that coming up with an algorithm is generally much harder than implementing the algorithm with a programming language. Cre- ating algorithms is something you’ll practice over the next few years if you continue in computing science. Binary Conversion To get you started thinking about creating algorithms, we’ll do one example here. In Topic 2.6, we talked about how to convert a binary value to decimal: each bit is multiplied by the next power of two and the results added. But, how can we do the opposite conversion? If I give you the number 13, how can you determine its (unsigned) binary representation? (For the record, it’s 1101.) First, let’s assume we’re limited to 8-bit binary numbers. The 8-bit rep- resentation of the number 13 is shown in Figure 3.20, with the numeric value 3.9. MORE ABOUT ALGORITHMS 85 00001101 128 64 32 16 8 4 2 1 Figure 3.20: The number 13 represented with 8 bits of each bit. We can check to see that this is the correct representation: 8 + 4 + 1 = 13. Now, back to the question of how we could come up with this. Let’s try to get the eight-bit representation for the number 25. First an easy question: do we need a 0 or 1 in the first position (the 128’s position)? Obviously not, 128 is much bigger than 25, so adding in a 128 will make the whole thing too big. In fact, we can fill in the highest three bits this way. They are all larger than 25, so we’ll definitely want 0’s there: 000????? Well, at least we’re getting somewhere: we have the first few bits taken care of. Now, do we want a 0 or 1 in the next position (16)? According to the above reasoning, we could put a 1, but is that necessarily right? Suppose we don’t: put a 0 in the 16’s position. Then all of the bits we have left to fill are 8, 4, 2, and 1. These add up to 15, so there’s no way we could get 25 out of that. We have to put a 1 in the 16’s position: 0001???? For the rest of the positions, we can continue in a similar way. We have taken care of 16 of the number 25 we’re trying to represent, so we have 9 left. Keep going down the line: for the 8’s position, 8 is less than 9, so put a 1 in that position: 00011??? We now have 1 left to represent. We can’t take a 4 or a 2, but will set the last bit to 1: 00011001 This has described a fairly simple algorithm to determine the binary representation of a number: if the bit we’re looking at will fit in the number we have left, put a 1; if not, put a 0. Pseudocode for this algorithm is in Figure 3.21. 86 UNIT 3. CONTROL STRUCTURES read num for positions p from 7, 6, . . . 0, do this: if num < 2p , then set binary to binary + “0” otherwise, set binary to binary + “1” set num to num − 2p write binary Figure 3.21: Pseudocode to determine the 8-bit binary representation This algorithm only works for numbers up to 255. If you give the algo- rithm a number any bigger than that, it will produce all 1’s. It’s possible to fix the algorithm by starting with bit ⌊log2 n⌋, instead of bit 7. So? You should probably know how to convert a number to its binary represen- tation, but that’s not the point of this topic. As you go on into computing science, development of algorithms is impor- tant. The point here is to give you an idea of how you can go about coming up with an algorithm. Start by trying to work out the problem by hand and try to recognize the common steps and decisions needed. Try to work this into pseudocode expressing a general method and test it on different values. Once you have the pseudocode, you can then start working on a program that implements the algorithm you’ve developed. Summary At this point, we have seen all of the major building blocks of computer programs. Once you have loops and conditionals, you can combine them to tackle just about any problem. Again with this material, you have to practice these ideas by writing programs before you’ll really understand them. 3.9. MORE ABOUT ALGORITHMS 87 Key Terms • if statement • while loop • condition • running time • boolean expression • debugging • for loop 88 UNIT 3. CONTROL STRUCTURES Unit 4 Functions and Decomposition Learning Outcomes • Design and implement functions to carry out a particular task. • Begin to evaluate when it is necessary to split some work into functions. • Locate the parts of a program where particular variables are available. • Import Python modules and use their contents. • Read the Python module reference for information on a module’s con- tents. • Use objects provided by modules in your programs. • Catch errors in programs and handle them gracefully. Learning Activities • Read this unit and do the “Check-Up Questions.” • Browse through the links for this unit on the course web site. • Read Sections 3.6–3.12, 4.8, 5.1–5.4, 5.6 in How to Think Like a Com- puter Scientist. Topic 4.1 Defining Functions We have already seen how several functions work in Python. In particular, we have used raw_input, range, int, and str. Each of these is built into Python and can be used in any Python program 89 90 UNIT 4. FUNCTIONS AND DECOMPOSITION def read_integer(prompt): """Read an integer from the user and return it.""" input = raw_input(prompt) return int(input) num = read_integer("Type a number: ") print "One more is", num+1 num = read_integer("Type another: ") print "One less is", num-1 Figure 4.1: A program with a function defined A function must be given arguments. These are the values in parentheses that come after the name of the function. For example, in int("321"), the string "321" is the argument. Functions can have no arguments, or they can take several. Functions that return values can be used as part of an expression. We saw how the int function works, which returns an integer. It can be used in an expression like this: x = 3*int("10") + 2 After this statement, the variable x will contain the number 32. In this expression the int function returns the integer 10, which is then used in the calculation. Python functions can return any type of value including strings and float- ing point values. Defining your own functions You can define your own functions as well. They are defined with a def block, as shown in Figure 4.1. The code inside the def isn’t executed right away. The function is defined and then run whenever it is called. In Figure 4.1, the function is named “read_integer” and takes one ar- gument that we’ll call prompt. Inside the function definition, prompt works like a variable. Its value is filled in with whatever argument is given when the function is called. 4.1. DEFINING FUNCTIONS 91 The next line is a triple-quoted string that describes the function. This is called a documentation string or docstring. The docstring is a special form of a comment in Python: it has no effect on the behaviour of the function. It works like a comment and will help somebody reading your code figure out what it does. Every function you write in this course must have a meaningful docstring. It will help us understand your code more easily when we mark it. It is also a good habit to get into. When you have to come back to some of your own code after a few weeks, you’ll be glad you included it. The statements in the body of the function are what will be executed when the function is called. The return statement indicates the value that the function returns. The main part of the program in Figure 4.1 makes two calls to the read_integer function. Here’s what the program looks like when it’s run: Type a number: 15 One more is 16 Type another: 192 One less is 191 You should define functions to do tasks that you’ll have to do several times. That way you’ll only have to type and debug the code once and be able to use it many times. As a general rule, you should never copy-and-paste code. If you need to reuse code, put it in a function and call it as many times as necessary. Defining functions is also useful when you are creating larger program. Even if you’re only going to call a function once, it helps you break your program into smaller pieces. Writing and debugging many smaller pieces of code is much easier than working on one large one. Calling Functions Consider the example function in Figure 4.2. This does a calculation that is common in many algorithms. The docstring should be enough for you to figure out what it does. Suppose we then run this statement: half_mid = middle_value(4,2,6) / 2 92 UNIT 4. FUNCTIONS AND DECOMPOSITION def middle_value(a, b, c): """ Return the median of the three arguments. That is, return the value that would be second if they were sorted. """ if a <= b <= c or a >= b >= c: return b elif b <= a <= c or b >= a >= c: return a else: return c Figure 4.2: A sample function What exactly happens when the computer “runs” this code? 1. The expression on the right of the variable assignment must be eval- uated before the variable can be assigned, so that is done first. It evaluates the expression middle_value(4,2,6) / 2. 2. The sub-expressions on either sode of the division operator must be evaluated. The first is a call to our function. It then evaluates the expression middle_value(4,2,6). 3. This requires calling the function in Figure 4.2. Now, this statement is put on hold while the function does its thing. 4. The function middle_value is called. (a) The arguments that are given in the calling code (4,2,6) are as- signed to the local variables given in the argument list (a,b,c). So, the behaviour is as if we had code in the function that made these assignments: a=4, b=2, c=6. (b) The code in the function body then starts to execute. This code executes until it gets to the end of the function or a return state- ment. It this case, the if condition is false, so the return b statement is skipped. The condition of the elif is true (since b <= a <= c), so the return a statement executes. 4.1. DEFINING FUNCTIONS 93 (c) The function returns the integer 4 and exits. Any code after the return doesn’t execute, since the function has already stopped. 5. The calling code gets the return value, 4. This is used in place of the function call. The expressions is now 4/2. 6. The division is done. The original expression has evaluated to the integer 2. 7. The integer 2 is assigned to the variable half_mid. This outline of a function call is reasonably representative of any function call. When the call occurs, that code pauses until the function is finished and returns a value. Functions that don’t return values are similar. The only difference is that they are not part of a larger expression. They just execute and the calling code continues when they are done. Why Use Functions? Functions can be used to break your program into logical sections. You can take a specific task or calculation, and define a function that accomplishes that task or calculation. Breaking the logic of a program up into sections can make it much easier to build. You can create functions to handle parts of your algorithm, and assemble them in a much simpler main program. Using functions well can make your program much easier to read. Func- tions should have descriptive names, like variables. The function should be named after what it does or what it returns. For example, read_data_file, initial_guess, or run_menu. The function definitions themselves can be relatively small (and under- standable) stretches of code. Someone trying to read the program can figure out one function at a time (aided by a good function name and the doc- string). Then, they can move on to the main program that assembles these parts. This is generally much easier than reading (and writing and debug- ging) one long section of code in the main program. Functions are also quite useful to prevent duplication of similar code. If you need to do similar tasks in different parts of the program, you could copy- and-paste code, and many beginning programmers do. But, what happens when you want to change or update that task? You have to hunt for that 94 UNIT 4. FUNCTIONS AND DECOMPOSITION code everywhere it occurs and fix it in every location. This is tedious and error-prone. If the repeated task is separated into a function, then maintaining it is much easier. It only occurs in one place, so it’s easy to fix. You should never copy-and-paste code within a program—it creates a maintainance nightmare. Functions are one tool that can be used to unify tasks. Check-Up Questions ◮ Write a function square that takes one floating point value as its argu- ment. It should return the square of its argument. ◮ Have a look at programs you’ve written for this course. Are there places where some work has been duplicated and could be put into a function? Topic 4.2 Variable Scope In Figure 4.1, the argument prompt is only available in the read_integer function. If we tried to use prompt outside of the function, Python would give the error NameError: name ’prompt’ is not defined It does so because prompt is a local variable in the read_integer func- tion. You could also say that the variable’s scope with the read_integer function. Any variables that are created within a function are local to that function. That means that they can’t be used outside of the function. This is actually a very good thing. It means that when you write a function, you can use a variable like num without worrying that some other part of the program is already using it. The function gets an entirely separate thing named num, and anything named num in the rest of the program is undisturbed. Have a look at the program in Figure 4.3. When it’s run, it produces output like this: How many lines should I print? 4 * ** *** **** 4.2. VARIABLE SCOPE 95 def stars(num): """ Return a string containing num stars. This could also be done with "*" * num, but that doesn’t demonstrate local variables. >>> print stars(5) ***** >>> print stars(15) *************** """ starline = "" for i in range(num): starline = starline + "*" return starline num = int(raw_input("How many lines should I print? ")) for i in range(num): print stars(i+1) Figure 4.3: A program that takes advantage of local variables 96 UNIT 4. FUNCTIONS AND DECOMPOSITION There is no confusion between the variable num in the function and the one in the main program. When the function uses the variable num, it is totally unrelated to the one in the main program. The same thing happens with i. It is used as the loop variable for both for loops. Since the function has its own version of i, there’s no conflict and both loops do what they look like they should So, to use the function stars, you don’t have to worry about how it was implemented—what variables names were used and for what. All you have to know it what it does. If a programming language doesn’t have this property that variables are usually local to a particular function or other part of the program, it becomes very hard to write large programs. Imagine trying to write some code and having to check 20 different functions every time you introduce a new variable to make sure you’re not using the same name over again. Also notice that the docstring in Figure 4.3 is much longer. It includes two examples of what the function should do. Giving examples is a good idea because it gives you something to check when you test the function. The actual behaviour should match what you’ve advertised in the docstring. There is actually a Python module called doctest that looks through your docstrings for things that look like examples of the function’s use. It then checks them to make sure the examples match what actually happens. Why Use Functions? Local variables introduce another reason to use functions. Since variables in functions are separate from the variables elsewhere in the program, the code has very limited interaction with the rest of the program. This makes it much easier to debug programs that are separated with functions. Functions take in values as arguments, and can return a value when they are done. They have no other way to interact with variables in other func- tions. That means that they can be debugged separately: if a function does its job given the correct arguments, then it works. If there is a problem in other parts fo the program, we don’t have to worry about other functions changing variable values because they can’t. Each function can be checked for correctness on its own. 4.3. PYTHON MODULES 97 import time print "Today is " + time.strftime("%B %d, %Y") + "." Figure 4.4: Program that prints today’s date. Since it’s much easier to work with many small chunks of code than one large one, the whole writing and debugging process becomes much easier. As a programmer, you have to create and test individual functions. Once you’re reasonably sure the function is corret, you can forget about it and move on to other tasks. Topic 4.3 Python Modules In most programming languages, you aren’t expected to do everything from scratch. Some prepackaged functions come with the language, and you can use them whenever you need to. These are generally called libraries. In Python, each part of the whole built-in library is called a module. There are a lot of modules that come with Python—it’s one of the things that experienced programmers tend to like about Python. For example, the module time provides functions that help you work with times and dates. The full documentation for the time module can be found online. It’s important that a programmer can find and interpret documentation like this. It might seem daunting at first—the documentation is written for people who know how to program—it should get easier with practice. In the documentation, you’ll find that the time module has a function strftime that can be used to output the current date and time in a particular format. The program in Figure 4.4 uses the time module to output the current date. When the program runs, it produces output like this: Today is December 25, 2010. The first line in Figure 4.4 imports the time module. Modules in Python must be imported before they can be used. There are so many modules that if they were all imported automatically, programs would take a long time to 98 UNIT 4. FUNCTIONS AND DECOMPOSITION start up. This way, you can just import the modules you need at the start of the program. In the next line, the function strftime is referred to as time.strftime. When modules are imported like this, you get to their contents by calling them modulename.function . This is done in case several modules have functions or variables with the same names. You can also import modules so that you don’t have to do this: You could just call the function as strftime. To do that, the module’s contents are imported like this: from time import * This is handy if you want to use the contents of a particular module a lot. How did we know that "%B %d, %Y" would make it output the date in this format? We read the documentation online. The %B gets replaced with the full name of the current month, %d with the day of the month, and %Y with the four-digit year. There are Python modules to do all kinds of things, far too many to mention here. There is a reference to the Python libraries linked from the course web site. We will mention a few more modules as we cover other topics in the course. You can always go to the reference and get a full list and description of their contents. Check-Up Questions ◮ Have a look at the module reference for time and see what else is there. ◮ Look at the other modules available in Python. You probably won’t under- stand what many of them do, but have a look anyway for stuff that you do recognize. Topic 4.4 Objects As you start writing programs, you will often have to represent data that is more complicated that a “number” or “string”. There are some other types that are built into Python. There are also more complicated types that can hold collections of other information. These are called objects. 4.4. OBJECTS 99 Most modern programming languages have the concept of objects. You can think of an “object” in a programming language like a real-world object like a DVD player. A DVD player has some buttons you can press that will make it do various things (play this DVD, go to menu, display information on-screen) and it displays various information for you (playing, stopped, current time). Each of the buttons correspond to various actions the player can take. Each item that is displayed reflects some information about the current state of the player. Objects in a programming language are similar. Objects are collections of properties and methods. A property works like a variable. It holds some information about the object. In the DVD player example, the current position in the movie might be a property. Part way through the movie, the position might be 1:10:41. You could use the remote to change this property to 1:00:00 if you want to re-watch the last ten minutes. In Python, you can set the value of a property directly, just like a variable. A method works like a function. It performs some operation on the object. For the DVD player, a method might be something like “play this DVD”. A method might change some of the method’s properties (like set the counter to 0:00:00) and perform some other actions (start the disc spinning, put the video on the screen). A particular kind of object is called a class. So in the example, there is a class called “DVD Player”. When you create an object in the class, it’s called an instance. So, your DVD player is an instance of the class “DVD Player”. An instance behaves a lot like any other variable, except it contains meth- ods and properties. So, objects are really variables that contain variables and functions of their own. Objects in Python Classes in Python can be created by the programmer or can come from modules. We won’t be creating our own classes in this course, just using classes provided by modules. To instantiate an object, its constructor is used. This is a function that builds the object and returns it. For example, in Python, the module datetime provides a different set of date and time manipulation functions 100 UNIT 4. FUNCTIONS AND DECOMPOSITION import datetime newyr = datetime.date(2005, 01, 01) print newyr.year # the year property print newyr.strftime("%B %d, %Y") # the strftime method print newyr Figure 4.5: Date manipulation with the datetime module’s objects than the time module we saw in Topic 4.3. The datetime module provides everything in classes which contain all of the functions that can work on particular kinds of date information. The datetime module provides the class called “date” which can hold information about a day. Figure 4.5 shows an example of its use. After the datetime module is imported, a date object is created. The constructor for a date object is datetime.date()—this function from the datetime module returns a date object. This object is stored in the newyr variable. Now that we have an object to work with, we can start poking around at its properties (variables inside the object) and methods (functions inside the object). In the first print statement, you can see that a date object has a property called year. You get to a property in an object the same way you get to a function inside a module: use the name of the object, a dot, and the name of the property. The year behaves like a variable, except it’s living inside the date object named newyr. The second print statement shows the use of a method. Date objects contain a method called strftime that works a lot like the function from the time module. The strftime method takes whatever date is stored in its date object and formats that the way you ask it to. Finally, we see that a date object knows how to convert itself to a string if we just ask that it be printed. By default, it just uses a year-month-day format. So, the program in Figure 4.5 produces this output: 2005 January 01, 2005 2005-01-01 4.4. OBJECTS 101 The ways you can use an object depend on how the class has been defined. For example, some classes know how they can be “added” together with the + sign, but date doesn’t: >>> import datetime >>> first = datetime.date(1989, 12, 17) >>> print first 1989-12-17 >>> print first+7 TypeError: unsupported operand type(s) for +: ’datetime.date’ and ’int’ So, Python doesn’t know how to add the integer 7 to a date. But, it does know how to subtract dates: >>> import datetime >>> first = datetime.date(1989, 12, 17) >>> second = datetime.date(1990, 1, 14) >>> print second-first 28 days, 0:00:00 >>> type(second-first) <type ’datetime.timedelta’> So, something in the definition of the date class says that if you subtract two dates, you get a timedelta object. The timedelta class is also defined by the datetime module and its job is to hold on to lengths of time (the time between event A and event B). This is where the power of objects begins to show itself: a programmer can create objects that represent any kind of information and “know” how to do many useful operations for that type of information. Particularly when writing larger programs, classes and objects become very useful when it comes to organizing the information your program needs to work with. Object oriented programming is important in modern program- ming. It will be introduced in more detail in CMPT 125 and 225. Check-Up Question ◮ Have a look at the module reference for datetime. What can you do with a timedelta object? What other classes are provided? 102 UNIT 4. FUNCTIONS AND DECOMPOSITION m_str = raw_input("Enter your height (in metres): ") try: metres = float(m_str) feet = 39.37 * metres / 12 print "You are " + str(feet) + " feet tall." except: print "That wasn’t a number." Figure 4.6: Catching an exception Topic 4.5 Handling Errors So far, whenever we did something like ask for user input, we have assumed that it will work correctly. Consider the program in Figure 2.7, where we got the user to type their height and converted it to feet. If the user enters something that can’t be converted to a float, the results are not very pretty: Enter your height (in metres): tall Traceback (most recent call last): File "inches0.py", line 1, in ? metres = float(raw_input( \ ValueError: invalid literal for float(): tall This isn’t very helpful for the user. It would be much better if we could give them another chance to answer or at least a useful error message. Python lets you catch any kind of error, as Figure 4.6 shows. Here are two sample runs of that program: Enter your height (in metres): tall That wasn’t a number. Enter your height (in metres): 1.8 You are 5.9055 feet tall. Errors that happen while the program is running are called exceptions. The try/except block lets the program handle exceptions when they happen. If any exceptions happen while the try part is running, the except code is executed. It is ignored otherwise. 4.5. HANDLING ERRORS 103 got_height = False while not got_height: m_str = raw_input("Enter your height (in metres): ") try: metres = float(m_str) got_height = True # if we’re here, it was converted. except: print "Please enter a number." feet = 39.37 * metres / 12 print "You are " + str(feet) + " feet tall." Figure 4.7: Asking until we get correct input Figure 4.7 shows another example. In this program, the while loop will continue until there is no exception. The variable got height is used to keep track of whether or not we have the input we need. Check-Up Question ◮ Take a program you have written previously in this course that takes nu- meric input and modify it so it gives a nice error message. Summary This unit covers a lot of bits and pieces that don’t necessarily let your pro- grams do any more, but help you write programs that are better organized and are thus easier to maintain. The modules in Python are very useful. In this course, we will try to point out relevant modules when you need them; we don’t expect you to comb through the entire list of built-in modules every time you need something. Key Terms • functions • arguments 104 UNIT 4. FUNCTIONS AND DECOMPOSITION • return value • object • docstring • class • variable scope • property • module • method • import • exception Part II Problem Solving 105 Unit 5 Data Structures Learning Outcomes • Manipulate string data. • Use lists for storing and manipulating data. • Describe the difference between mutable and immutable data struc- tures. • Identify mutable and immutable data structures, and describe the dif- ference between them. • Describe the use of references in assignment and argument passing. Learning Activities • Read this unit and do the “Check-Up Questions.” • Browse through the links for this unit on the course web site. • Read Chapters 7 and 8 in How to Think Like a Computer Scientist. Topic 5.1 Lists So far, all of the variables that we have used have held a single item: one integer, floating point value, or string. These types are good at storing the information they are designed for, but you will often find that you want to store a collection of values in your programs. 107 108 UNIT 5. DATA STRUCTURES For example, you may want to store a list of values that have been entered by the user or a collection of values that are needed to draw a graph. In Python, lists can be used to store a collection of other values. Lists in Python can hold values of any type; they are written as a comma-separated list enclosed in square brackets: numlist = [23, 10, -100, 2] words = [’zero’, ’one’, ’two’] junk = [0, 1, ’two’, [1,1,1], 4.0] Here, numlist is a list holding four integers; words holds three strings; and junk has five values of different types (one of them is itself a list). Lists are like strings To get a particular value out of a list, it can be subscripted. This is done just like subscripting a string to extract a single character: >>> testlist = [0, 10, 20, 30, 40, 50] >>> print testlist[2] 20 >>> print testlist[0] 0 >>> print testlist[10] IndexError: list index out of range Like strings, the first element in a list is element 0. You can determine the length of a list with the len function: >>> print len(testlist) 6 So, we can walk through each element of a list the same way we iterated over the characters in a string: >>> for i in range(len(testlist)): ... print testlist[i], ... 0 10 20 30 40 50 Lists can be joined (concatenated ) with the + operator: 5.1. LISTS 109 >>> testlist + [60, 70, 80] [0, 10, 20, 30, 40, 50, 60, 70, 80] >>> [’one’, ’two’, ’three’] + [1, 2, 3] [’one’, ’two’, ’three’, 1, 2, 3] Lists can also be returned by functions: >>> s = ’abc-def-ghi’ >>> s.split(’-’) [’abc’, ’def’, ’ghi’] >>> range(10) [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] In all of the above examples, lists are similar to strings. As far as we’ve seen so far, you could just think of strings as just lists of characters. Lists (so far) work just like strings, except you can put anything in each element, not just a character. If you ever program in C, you’ll find that there really isn’t any “string” type in C. Strings are just implemented as lists (called arrays in C) of characters. In Python, strings get a separate data type. Lists are different from strings Of course, the biggest difference between lists and strings is what they can hold. A string holds only characters, but a list can hold any type of Python data. More than that, there are many operations that you can do on lists that aren’t possible on string. It’s not possible to change individual parts of a string without totally rebuilding it. When you want to change a string, you need to write an expression that created a new string, and over-wrote the old variable. With a list, you can assign a new value to a part of the list, without having to rebuild the entire list. This is called element assignment. >>> colours = [’red’, ’yellow’, ’blue’] >>> print colours [’red’, ’yellow’, ’blue’] >>> colours[1] = ’green’ >>> print colours [’red’, ’green’, ’blue’] 110 UNIT 5. DATA STRUCTURES The third statement here changes a single element of the lists, without having the change the entire list (by doing a colours=... assignment). It is also possible to delete an element from a list, using the del statement. >>> colours = [’red’, ’yellow’, ’blue’] >>> del colours[1] >>> print colours [’red’, ’blue’] >>> del colours[1] >>> print colours [’red’] You can also add a new element to the end of a list with the append method. >>> colours = [’red’, ’yellow’, ’blue’] >>> colours.append(’orange’) >>> colours.append(’green’) >>> print colours [’red’, ’yellow’, ’blue’, ’orange’, ’green’] In order to do something similar with a string, a new string must be built with the + operator: >>> letters = ’abc’ >>> letters = letters + ’d’ >>> letters = letters + ’e’ >>> print letters abcde You can do the same thing with lists (rebuild them with + or other operators), but it’s slower. As another example, see Figure 5.1 All of these operations can change part of a list. Changing one element (or a few elements) is more efficient than creating an entirely new list. These operations can make working with lists quite efficient. If you try change part of a string, you will get an error. We will discuss this difference further in Topic 5.5. There are many other list operations as well: lists are a very flexible data structure. See the online Python reference for more details. 5.2. LISTS AND FOR LOOPS 111 print "Enter some numbers, 0 to stop:" numbers = [] x=1 while x!=0: x = int(raw_input()) if x!=0: numbers.append(x) print "The numbers you entered are:" print numbers Figure 5.1: Building a list with the append method Topic 5.2 Lists and for loops In an earlier example, you might have noticed how the range function was used to create a list: >>> range(10) [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] But all along, we have been using the range function with the for loop: for i in range(10): # do something with i ... So, what’s the relationship? It turns out that the for loop in Python can iterate over any list, not just those produced by the range function. The range function is a convenient way to produce a list of integers, but not the only way. We have seen other ways to build a list, and those can be used with the for loop as well. Have a look at the code in Figure 5.2. There, the for loop iterates over each element in the list words. The for loop does the same thing it does with a range: it runs the loop body once for each element. The output of Figure 5.2 is: 112 UNIT 5. DATA STRUCTURES words = ["up", "down", "green", "cabbage"] for word in words: print "Here’s a word: " + word Figure 5.2: Iterating over a list Here’s a word: up Here’s a word: down Here’s a word: green Here’s a word: cabbage Iterating through the elements of a list can be quite convenient. It’s common to have to do the same operation on each element of a list, and this gives an easy way to do it. It also makes code very readable. You can interpret the loop in Figure 5.2 as “for every word in the list words, do this. . . .” So, the meaning of the code is very close to the way you’d read it. This is always a benefit when trying to read and maintain code. Topic 5.3 Slicing and Dicing Hopefully you are comfortable with indexing by now. You can access a single element from a string or list with indexing: >>> colours = [’red’, ’yellow’, ’blue’] >>> colours[1] = ’green’ # set an element with indexing >>> print colours[2] # index to retrieve an element ’blue’ It’s also possible to access several elements of a list by slicing. Slicing looks like indexing, but you can specify an entire range of elements: >>> colours = [’red’, ’yellow’, ’green’, ’blue’] >>> print colours[1:3] [’yellow’, ’green’] As you can see, the slice [1:3] refers to elements 1–2 from the list. In general, the slice [a:b] extracts elements a to b-1. 5.3. SLICING AND DICING 113 You might be thinking that the slice operator should extract all of the elements from a to b (including b), but it stops one before that. Maybe it’s not intuitive, but it does match the behaviour of the range function: range(a,b) gives you all of the integers from a to b-1, and the slice [a:b] gives you the elements from a to b-1. Special Slice Positions In addition to selecting “elements a to b-1,” there are special values that can be used in a slice. Negative values count from the end of a list. So, -1 refers to the last item in the list, -2 to the second-last, and so on. You can use this to (for example) extract everything except the last element: >>> colours = [’red’, ’yellow’, ’green’, ’blue’] >>> print colours[0:-1] [’red’, ’yellow’, ’green’] If you leave out one of the values in the slice, it will default to the start or end of the list. For example, the slice [:num] refers to elements 0 to num-1. The slice [2:] gives elements from 2 to the end of the list. Here are some more examples: >>> colours = [’red’, ’yellow’, ’green’, ’blue’, ’orange’] >>> print colours[2:] [’green’, ’blue’, ’orange’] >>> print colours[:3] [’red’, ’yellow’, ’green’] >>> print colours[:-1] [’red’, ’yellow’, ’green’, ’blue’] The slice [:-1] will always give you everything except the last element; [1:] will give everything but the first element. These cases in particu- lar come up fairly often when programming. It’s common to use the first element of a list (work with thelist[0]) and then continue with the tail (thelist[1:]). Similarly, you can work with the last element((thelist[-1]), and then use the head of the list (thelist[:-1]). 114 UNIT 5. DATA STRUCTURES Manipulating Slices You can actually do almost anything with list slices that you can do with simple indexing. For example, you can assign to a slice: >>> colours = [’red’, ’yellow’, ’green’, ’blue’] >>> colours[1:3] = [’yellowish’, ’greenish’] >>> print colours [’red’, ’yellowish’, ’greenish’, ’blue’] >>> colours[1:3] = [’pink’, ’purple’, ’ecru’] >>> print colours [’red’, ’pink’, ’purple’, ’ecru’, ’blue’] Notice that in the second assignment above, we assigned a list of three el- ements to a slice of length two. The list expands to make room or the new elements: the slice colours[1:3] ([’yellowish’, ’greenish’]) is re- placed with the list [’pink’, ’purple’, ’ecru’]. If the list assigned had been shorter than the slice, the list would have shrunk. You can also remove any slice from a list: >>> colours = [’red’, ’yellow’, ’green’, ’blue’] >>> del colours[1:3] >>> print colours [’red’, ’blue’] Slices give you another way to manipulate lists. With both slices and the operators and methods mentioned in Topic 5.1, you can do many things with lists. Topic 5.4 Strings Much of what we have seen in the previous sections about lists also applies to strings. Both are considered sequence types in Python. Strings are a sequence of characters; lists are a sequence of any combination of types. In Topic 5.2, we saw that the for loop can iterate through any list. Lists aren’t the only type that can be used in the for loop. Any type that represents a collection of values can be used as the “list” in a for loop. Since a string represents a sequence of characters, it can be used. For example, this program iterates through the characters in a string: 5.5. MUTABILITY 115 for char in "abc": print "A character:", char When this is executed, it produces this output: A character: a A character: b A character: c List looping over a list, this can make your code very readable, and is often a very useful way to process a string. Slicing Strings In Topic 5.3, we used slices to manipulate parts of lists. You can slice strings using the same syntax as lists: >>> sentence = "Look, I’m a string!" >>> print sentence[:5] Look >>> print sentence[6:11] I’m a >>> print sentence[-7:] string! But, you can’t modify a string slice, just like you can’t assign to a single character of a string. >>> sentence = "Look, I’m a string!" >>> sentence[:5] = "Wow" TypeError: object doesn’t support slice assignment >>> del sentence[6:10] TypeError: object doesn’t support slice assignment Just like lists, you can use the slice [:-1] to indicate “everything but the last character” and [1:] for “everything but the first character”. Topic 5.5 Mutability You may be wondering why assigning to a slice (or single element) works for a list, but not a string. For example: 116 UNIT 5. DATA STRUCTURES dots = dots + "." # statement #1 values = values + [n] # statement #2 values.append(n) # statement #3 Figure 5.3: Manipulating strings and lists >>> colours = [’red’, ’yellow’, ’green’, ’blue’] >>> colours[1:3] = [’yellowish’, ’greenish’] >>> print colours [’red’, ’yellowish’, ’greenish’, ’blue’] >>> sentence = "Look, I’m a string!" >>> sentence[:5] = "Wow" TypeError: object doesn’t support slice assignment Why is it possible to do more with lists than strings? In fact, lists are the only data structure we have seen that can be changed in-place. There are several ways to modify an existing list without totally replacing it: assigning to a slice, using del to remove an element or slice, extending with the append method, and so on. Notice that none of these require creating a new list. On the other hand, any string manipulation requires you to build a new string which can then be stored in a variable. Consider the statements in Figure 5.3. Statement #1 first builds a new string object (by evaluating the right side of the assignment, dots + "."), and then stores that new string in dots. The old value in dots is discarded because it’s no longer in use. In statement #2, the same thing happens with a list. A new list is built (by evaluating values + [n]), and the variable values is set to refer to it instead of its old value. The old value is discarded since it’s no longer used. Each of these statements requires a lot of work if the initial string/list is large: it must be copied, along with the new item, and the old data is dropped from memory. Statement #3 has the same effect as #2, but it happens in a very different way. In this case, the append method uses the existing list, and just adds another element to the end. This requires much less work, since the list doesn’t have to be copied as part of evaluating an expression. At the end of statement #2 or #3, the values variable holds the same list. In the case of #3, the list has been modified but not entirely rebuilt. This should be clear since it is not an assignment statement (i.e. a var= 5.6. REFERENCES 117 statement). Any statement that assigns to a variable must be building a new value in the expression on the right of the =. If there is no assignment, the variable is still holding the same object, but the object may have changed. Data structures that can be changed in-place like this are called mutable. Lists are the only mutable data structure we have seen in detail. The strings and numbers are not mutable: they are immutable. Objects in Python are mutable if they contain methods that can change them without a new assignment. The date objects that were used in Topic 4.4 are immutable since there are no methods that can modify an existing date object. There are other object types in Python modules that have methods that modify them in-place: these are mutable objects. In Python 2.4, two set types were added that can be created with the set() and frozenset() functions. These hold values like lists, but they aren’t in any order. They are just a collection of values. The only difference between a set and frozenset is that sets are mutable (contain methods like add to insert a new element), and frozen sets are immutable (those methods aren’t included). There are instances where either is useful, so they are both available. Topic 5.6 References There are several cases where the contents of one variable are copied to another. In particular, here are two operations that require duplicating the variable x: # x copied to a parameter variable in some_function: print some_function(x) # x copied into y: y = x You probably don’t think of copying the contents of a variable as a difficult operation, but consider the case where x is a list with thousands of elements. Then, making an full copy of x would be a lot of work for the computer, and probably unnecessary since all of its contents are already in memory. In fact, Python avoids making copies where possible. To understand how this happens, it’s important to understand references. 118 UNIT 5. DATA STRUCTURES Statements: my_string = "one" + "two" + "three" my_list = [0, 10, 20] Result: "onetwothree" my_string my_list [0, 10, 20] Figure 5.4: Variables referencing their contents Every variable in Python is actually a reference to the place in memory where its contents are stored. Conceptually, you should think of a variable referencing its contents like an arrow pointing to the contents in memory. In Figure 5.4, you can see a representation of two variables referencing their contents. When you use a variable in an expression, Python follows the reference to find its contents. When you assign to a variable, you are changing it so the variable now references different contents. (The old contents are thrown away since they are no longer being referenced.) Usually, the expression on the right side of an assignment creates a new object in memory. The calculation is performed, and the result is stored in memory. The variable is then set to refer to this result. For example, total = a+b calculates a+b, stores this in memory, and sets total to reference that value. The exception to this is when the right side of an assignment is simply a variable reference (like total=a). In this case, the result is already in memory and the variable can just reference the existing contents. For example, in Figure 5.5, my_list is created and refers to a list. When my_list is assigned to list_copy, the reference is copied, so the list is only stored once. This saves memory, and is faster since the contents don’t have to be copied to another location in memory. 5.6. REFERENCES 119 Statements: my_list = [0, 10, 20] list_copy = my_list Result: my_list [0, 10, 20] list_copy Figure 5.5: Reference copied during assignment: aliasing Statements: my_list = [0, 10, 20] list_copy = my_list list_copy.append(30) my_list.append(40) Result: my_list [0, 10, 20, 30, 40] list_copy Figure 5.6: Changing either alias changes both Aliases When two variables refer to the same contents, they are aliases of each other. This has always happened when we assigned one variable to another, and it’s generally good since it doesn’t require copying the contents to another location in memory. But, now that we have mutable data structures (lists and some objects), aliases complicate things. Since mutable data structures can be changed without totally rebuilding them, we can change the contents without moving the reference to a new object in memory. That means that it’s possible to change a variable, and the changes will 120 UNIT 5. DATA STRUCTURES Statements: my_string = "one" + "two" + "three" string_copy = my_string string_copy = string_copy + "four" Result: "onetwothree" my_string "onetwothreefour" string_copy Figure 5.7: An expression creates a new reference and breaks the alias affect any other variables that reference the same contents. For example, in Figure 5.6, my_list and list_copy are aliases of the same contents. When either one is changed, both are affected. If you were to print out my_list and list_copy after these statement, you would find that they are both [0, 10, 20, 30, 40]. The same things would happen if we used any methods that change the list (or any object that has such methods). So for any mutable data structure, aliasing is an issue. Immutable data structures can also be aliased, but since they can’t be changed, it never causes problems. For example, in Figure 5.7, a string is aliased when it is copied. But the only way to change it is to construct a new string during an assignment and the alias is removed. Remember that any expression (that’s more complicated than a variable reference) will result in a new reference being created. If this is assigned to a variable, then there is no aliasing. This is what happened in Figure 5.7. It also occurs with lists, as you can see in Figure 5.8. Really Copying If you want to make a copy of a variable that isn’t a reference, it’s necessary to force Python to actually copy its contents to a new place in memory. This is called cloning. Cloning is more expensive than aliasing, but it’s necessary when you do want to make a copy that can be separately modified. 5.6. REFERENCES 121 Statements: my_list = [0, 10, 20] bigger_list = my_list + [30] Result: [0, 10, 20] my_list [0, 10, 20, 30] bigger_list Figure 5.8: A calculation creates a new instance containing the results Statements: my_list = [0, 10, 20] list_copy = my_list[:] Result: my_list [0, 10, 20] list_copy [0, 10, 20] Figure 5.9: Slicing a list forces copying of its elements For lists, the slice operator can be used to create a clone. Since cloning requires creating a new list, Python will copy whatever contents are needed by the slice. In Topic 5.3, we saw that in a slice like [a:b], leaving out the a starts the slice at the start of the original list, and leaving out the b goes to the end of the list. If we combine these, we have a slice that refers to the entire list: my_list[:]. For example, in Figure 5.9, you can see the slice operator being used to copy the contents of a list. This creates a new list and reference. Now, the lists can be changed independently. You could also make a copy of a list with the list function that creates a new list (out of the old one). So, list(my_list) would give the same result as my_list[:]. 122 UNIT 5. DATA STRUCTURES For other data types, the copy module contains a copy function. This function will take any Python object and clone its contents. If obj is a Python object, this code will produce a clone in new_obj: import copy new_obj = copy.copy(obj) This should work with any mutable Python object, including lists. It will also clone immutable objects, but it’s not clear why you would want to do that. Summary In this unit, you have learned the basics of lists in Python. You should also have picked up more tools that can be used to manipulate strings. The concept of references might seem odd at first, but it’s fundamental to many programming languages. It’s one of those ideas that comes up often, and you’ll have to be able to deal with it when it does. Key Terms • list • mutable • append • reference • slice • alias • sequence • cloning Unit 6 Algorithms Learning Outcomes • Use and compare two algorithms for searching in lists. • Use sorting to solve problems. • Design and implement functions that use recursion. • Understand and implement a simple sorting algorithm. • Describe some problems that aren’t computable. • Use recursion to solve problems and implement algorithms. Learning Activities • Read this unit and do the “Check-Up Questions.” • Browse through the links for this unit on the course web site. • Read Sections 4.9–4.11 in How to Think Like a Computer Scientist. Topic 6.1 Searching Searching is an important program in computing. “Searching” is the problem of looking up a particular value in a list or other data structure. You generally want to find the value (if it’s there) and determine its position. We will only worry about searching in lists here. There are many other data structures that can be used to store data; each one has its own searching algorithms that can be applied. 123 124 UNIT 6. ALGORITHMS def search(lst, val): """ Find the first occurrence of val in lst. Return its index or -1 if not there. >>> search([0, 10, 20, 30, 40], 30) 3 >>> search([0, 10, 20, 30, 40], 25) -1 """ for i in range(len(lst)): if lst[i]==val: # we only care about the first match, # so if we’ve found one, return it. return i # if we get this far, there is no val in lst. return -1 Figure 6.1: Python implementation of linear search Linear Search For lists in general, you have to look through the whole list to determine if the value is present or not. The algorithm for this is simple: just search through the list from element 0 to the end: if you find the value you’re looking for, stop; if you never do, return −1. (We will always use the “position” −1 to indicate “not found”.) This search algorithm is called linear search and a Python implementation can be found in Figure 6.1. What’s the running time of a linear search for a list with n items? At worst, it will have to scan through each of the n elements, checking each one. So, the running time is n. This isn’t too bad, but if you have to do a lot of lookups in a list, it will take a lot of time. We can do better if the list is arranged properly. 6.1. SEARCHING 125 The list method lst.index(val) does a linear search to find the position of val in the list lst. If the value isn’t there, it causes a ValueError instead of returning −1 like Figure 6.1. The “in” operator (if val in lst:) also uses linear search. Binary Search Think about how you look up numbers in a phone book. If you are trying to find out who has a particular number, you have to look through the whole book like the linear search does. Starting with the first person in the phone book and scan all of the phone numbers until you find the phone number you’re looking for, or get to the end of the book. So, it’s possible to use the phone book to translate a phone number to the person’s name that owns it, but it’s very impractical. On the other hand, what you usually do with a phone book is translate a persons name to a phone number. This is a lot easier because phone books are sorted by name, so you don’t have to scan every entry; you can quickly find the person you’re looking for. So, if we have a Python list that’s in order, we should be able to take advantage of this to search faster. We can write an algorithm that formalizes what you do with a phone book: use the fact that it’s sorted to find the right general part of the book, then the right page, the right column, and the right name. The algorithm that is used to search in sorted lists has a lot in common with the guessing game that was designed in Figure 1.3 and implemented in Figure 3.11. You can think of this game as searching for the value the user was thinking of in the list [1, 2, 3, ..., 100]. The search algorithm will use the same strategy: keep track of the first and last possible position that the value you’re looking for can be. This will start at the first and last items in the list, but can be narrowed quickly. Then, look at the list item that’s halfway between the first and last pos- sible values. If it’s too large, then you know that none of the values after it in the list are possibilities: they are all too large. Now you only have to look at the first half of the list, so the problem has immediately been chopped in half. Similarly, if the value we check in the middle is too small, then we only have to look at values after it in the list. In the guessing game, we did the 126 UNIT 6. ALGORITHMS exact same thing, except we had to ask the user to enter “less”, “more”, or “equal”. Here, you can just check the value in the list. This algorithm is called binary search. A Python implementation of bi- nary search can be found in Figure 6.2. Just like the original guessing game, this algorithm cuts the list we’re searching in half with each iteration. So, like that algorithm, it has running time log n. Have a look back at Figure 3.17 for a comparison of n (linear search) and log n (binary search). As you can see, the binary search will be much faster for large lists. But, to use binary search, you have to keep the list in sorted order. That means that you can’t just use list.append() to insert something into the list anymore. New values have to be inserted into their proper location, so inserting takes a lot more work Inserting into a sorted list takes up to n steps because Python has to shuffle the existing items in the list down to make room. You don’t see this since you can just use list.insert(), but it does happen behind-the-scenes. This means that keeping a sorted list and doing binary search is only worthwhile if you need to search a lot more than you insert. If you have some data that doesn’t change very often, but need to find values regularly, it’s more efficient to keep the list sorted because it makes searches so much faster. Searching is covered in more detail in CMPT 225 and 307. These courses discuss other data structures and how they can be used to hold sets of data so searching, inserting, and deleting are all efficient. There are data structures that can do insert, search, and delete all with running time log n, but they are complicated and aren’t covered until CMPT 307. Topic 6.2 Sorting Sorting is another important problem in computing science. It’s something that comes up very often, in a variety of situations. There are many problems that can be solved quite quickly by first sorting the values you need to work with: once the values are in order, many problems become a lot easier. 6.2. SORTING 127 def binary_search(lst, val): """ Find val in lst. Return its index or -1 if not there. The list MUST be sorted for this to work. >>> binary_search([2, 4, 5, 6, 24, 100, 1001], 100) 5 >>> binary_search([2, 4, 5, 6, 24, 100, 1001], 10) -1 >>> binary_search([2, 4, 5, 6, 24, 100, 1001], 2000) -1 """ # keep track of the first and last possible positions. first = 0 last = len(lst)-1 while first <= last: mid = (first+last)/2 if lst[mid] == val: # found it return mid elif lst[mid] < val: # too small, only look at the right half first = mid+1 else: # lst[mid] > val # too large, only look at the left half last = mid-1 # if we get this far, there is no val in lst. return -1 Figure 6.2: Python implementation of binary search 128 UNIT 6. ALGORITHMS word = raw_input("Enter the word: ") counter = 0 letters = list(word) # converts to a list of characters. # ’gene’ becomes [’g’,’e’,’n’,’e’] letters.sort() # now identical letters are adjacent # above becomes [’e’,’e’,’g’,’n’] for i in range(len(word)-1): if letters[i]==letters[i+1]: counter = counter + 1 if counter>0: print "There are repeated letters" else: print "There are no repeated letters" Figure 6.3: Checking for repeated letters with sorting In the previous topic, you saw that searching is much faster if the list is sorted first. Sorting takes longer than even a linear search, but if there are going to be many searches, it is worth the work. A list in Python can be put in order by calling its sort method: >>> mylist = [100, -23, 12, 8, 0] >>> mylist.sort() >>> print mylist [-23, 0, 8, 12, 100] Example: Repeated letters with sorting As an example of a problem where sorting can greatly speed up a solution, recall the problem of finding repeated letters in a string. Figure 3.13 gives an algorithm with running time n2 and Figure 3.14 is a Python implementation. Now consider the program in Figure 6.3. It does the same thing as the program in Figure 3.14, but it will be significantly faster than the previous solution for long strings. 6.2. SORTING 129 1000 n2 800 600 steps n log2 n 400 200 0 20 40 60 80 100 n Figure 6.4: Graph of the functions n2 and n log2 n The idea is that the program first sorts the letters of the string. Then, any identical letters will be beside each other. So, to check for repeated letters, you only have to skim through the characters once, looking at characters i and i+1 to see if they are the same. If none are, then there are no repeated letters. The sort method has running time n log n on a list with n elements. The rest of the program just scans once through the list, so it takes n steps. The total running time for Figure 6.3 will be n log n + n. Removing the lower-order terms, we get n log n. See Figure 6.4 for a comparison of n2 and n log n. As you can see, the new program that takes advantage of a fast sorting algorithm will be much faster as n grows. How to sort Usually, you can use the sort method that comes with a list in Python when you need to get items in order. But, sorting is important enough that you should have some idea of how it’s done. 130 UNIT 6. ALGORITHMS As noted above, the sort method of a list has running time n log n. In general, it’s not possible to sort n items in less than n log n running time. There are also some special cases when it’s possible to sort with running time n. We won’t cover these algorithms here, though. Algorithms that sort in n log n steps are fairly complicated and will have to wait for another course. So will a proof of why that’s the best you can do. If you’re really interested in n log n sorting, look for mergesort or quicksort algorithms. You probably have some idea of how to sort. Suppose you’re given a pile of a dozen exam papers and are asked to put them in order by the students’ names. Many people would do something like this: 1. Flip through the pile and find the paper that should go first. 2. Put that paper face-down on the table. 3. Repeat from step 1 with the remaining pile, until there are no papers left. This method is roughly equivalent to the selection sort algorithm that can be used on a list. The idea behind selection sort is to scan through a list for the smallest element and swap it with the first item. So, if we look at the list 6, 2, 8, 4, 5, 3, a selection sort will do the following operations to get it in order. At each step, the parts of the list that we know are sorted are bold. Iteration Initial List Operation 1. 6, 2, 8, 4, 5, 3 swap 2 and 6 2. 2, 4, 8, 6, 5, 3 swap 3 and 4 3. 2, 3, 8, 6, 5, 4 swap 4 and 8 4. 2, 3, 4, 6, 5, 8 swap 6 and 5 5. 2, 3, 4, 5, 6, 8 swap 6 and 6 (do nothing) 6. 2, 3, 4, 5, 6, 8 swap 8 and 8 (do nothing) 2, 3, 4, 5, 6, 8 Pseudocode of the selection sort algorithm can be found in Figure 6.5. A Python implementation can be found in Figure 6.6. This algorithm has running time n2 . If you count the number of times the inner loop runs for each element, you find that it takes n steps for the first element, then n − 1 for the second, 6.3. RECURSION 131 for every element e from the list, for every element f from e to the end of the list, if f < smallest, set smallest to f swap smallest and e Figure 6.5: Algorithm for selection sort then n − 2, . . . , 2, 1. So, the total number of steps is n X n(n − 1) i= = n2 /2 − n/2 . i=1 2 Removing lower-order terms and constants, you can see that the running time is n2 . Selection sort will be quite slow for large lists; one of the n log n algorithms should be used instead. Sorting is covered in more detail in CMPT 125, 225, and 307. As you progress, the focus is less on sorting itself and more on how it is used to solve other problems. Topic 6.3 Recursion As we have seen many times, it’s possible for a function to call another function. For example, in Figure 6.1, the search function uses both range and len to create the appropriate loop. But, it’s also possible for a function to call itself. This is useful when a problem can be solved by breaking it into parts and solving the parts. This technique is called recursion, and is very important since many algorithms are most easily described recursively. For example, consider calculating the factorial of a number. The factorial of n is usually written n! and is the product of the numbers from 1 to n: 1 × 2 × 3 × · · · × (n − 1) × n. The factorial function is often defined in terms of itself: 1 for n = 0 n! = n × (n − 1)! for n > 0 132 UNIT 6. ALGORITHMS def selection_sort(lst): """ Sort lst in-place using selection sort """ for pos in range(len(lst)): # get the next smallest in lst[pos] # find the next smallest small = lst[pos] # smallest value seen so far smallpos = pos # position of small in lst for i in range(pos+1, len(lst)): # check each value, searching for one # that’s smaller than the current smallest. if lst[i] < small: small = lst[i] smallpos = i # swap it into lst[pos] lst[pos], lst[smallpos] = lst[smallpos], lst[pos] Figure 6.6: Python implementation of selection sort def factorial(n): """ Calculate n! recursively. >>> factorial(10) 3628800 >>> factorial(0) 1 """ if n==0: return 1 else: return n * factorial(n-1) Figure 6.7: Recursively calculate factorials 6.3. RECURSION 133 original call returns 6 factorial(3) calculates 3 * factorial(2) calls returns 2 factorial(2) calculates 2 * factorial(1) calls returns 1 factorial(1) calculates 1 * factorial(0) calls returns 1 factorial(0) Figure 6.8: Functions calls made to calculate factorial(3) We can use this same definition to create a Python function that calculates the factorial of a (positive) integer. The code in Figure 6.7 defines a function that correctly calculates n! . It does this by calling itself to calculate the factorial of n − 1. How It Works Whenever a function calls itself, you should think of a new copy of the func- tion being made. For example, if we call factorial(3), while running, it will call factorial(2). This will be a separate function call, with separate arguments and local variables. It will run totally independently of the other instance of the function. Figure 6.8 contains a diagram representing all of the calls to the factorial function required to calculate factorial(3). As you can see, factorial(3) 134 UNIT 6. ALGORITHMS calls factorial(2), which itself calls factorial(1), which calls factorial(0). Because of the way the factorial function is written, factorial(0) returns one. Then, factorial(1) can complete its calculations and return; then factorial(2) can finish. Finally, factorial(3) completes and returns the result originally requested. Once everything is put together, the function actually computes the cor- rect value. This is made possible by the design of the function: as long as the recursive calls return the right value, it will calculate the correct result for n. The idea that the function calls itself, or that many copies of the function are running at one time will probably seem strange at first. What you really need to remember is simple: it works. Python can keep track of several instances of the same function that are all operating simultaneously, and what each one is doing. Understanding Recursion When looking at a recursive algorithm, many people find it too complicated to think of every function call, like in Figure 6.7. Keeping track of every step of the recursion isn’t really necessary to believe that the function works, or even to design one yourself. When reading a recursive function, you should just assume that the recur- sive calls return the correct value. In the example, if you take on faith that factorial(n-1) returns (n − 1)! correctly, then it’s clear that the function does the right thing in each case and actually calculates n! . You can let the programming language take care of the details needed to run it. If you look at Figure 6.7 and assume that the recursive call works, then it’s clear that the whole function does as well. The key to the logic here is that the recursive calls are to smaller instances of the same problem. As long as the recursive calls keep getting smaller, they will eventually hit the n = 0 case and the recursion will stop. When writing recursive functions, you have to make sure that the function and its recursive calls are structured so that analogous logic holds. Your function must make recursive calls that head towards the base case that ends the recursion. We will discuss creating recursive functions further in the next topic. 6.4. DESIGNING WITH RECURSION 135 Students who have taken MACM 101 may recognize this as being very similar to proofs by induction. The ideas are very similar: we need somewhere to start (base case), and some way to take a step to a smaller case (recursive/inductive case). Once we have those, everything works out. Topic 6.4 Designing with Recursion As an example of a recursive function, let’s look at the problem of reversing a string. We will construct a recursive function that does this, so calling reverse("looter") should return "retool". In order to write a recursive function to do this, we need to view the problem in the right way and create a recursive function that implements our recursive algorithm. Step 1: Find a Smaller Subproblem The whole point of making a recursive call is to solve a similar, but smaller problem. If we have decomposed the problem properly, we can use the re- cursive solution to build a solution to the problem we are trying to solve. In the factorial example, this relies on noticing that n! = n × (n − 1)! (in almost all cases). Then, if we can somehow calculate (n − 1)! it’s easy to use that to calculate n! . To reverse the string, we have to ask: if we reverse part of the string, can we use that to finish reversing the whole string? If we reverse the tail of the string (string[1:], everything except the first character), we can use it. For example, if we are trying to reverse the string "looter", the tail (string[1:]) is or "ooter". If we make a recursive call to reverse this (reverse(string[1:])), it should return "retoo". This is a big part of the final solution, so the recursive call will have done useful work. We can later use this to build the reverse of the whole string. Step 2: Use the Solution to the Subproblem Once you have taken the problem you’re given and found a subproblem that you can work with, you can get the result with a recursive call to the function. 136 UNIT 6. ALGORITHMS In the factorial example, we know that once we have calculated (n − 1)! , we can simply multiply by n to get n! . In the Python implementation, this becomes n * factorial(n-1). If we put our faith in the correctness of the calculation factorial(n-1), then this is definitely the correct value to return. When reversing a string, if we reverse all but the first character of the string (reverse(string[1:])), we are very close to the final solution. All that remains is to put the first character (string[0]) on the end. Again using "looter" as an example, reverse(string[1:]) returns "retoo" and string[0] is "l". The whole string reversed is: reverse(string[1:]) + string[0] This evaluates to "retool", the correct answer. In general, this expression gives the correct reversal of string. Again, in both examples the method is the same: make a recursive call on the subproblem, and use its result to construct the solution you have been asked to calculate. Step 3: Find a Base Case There will be a few cases where the above method won’t work. Typically these will be the smallest cases where it’s not possible to subdivide the prob- lem further. These cases can simply be handled with an if statement. If the arguments point to a base case, return the appropriate result. Otherwise, proceed with the recursive solution as designed above. In the factorial example, the identity n! = n × (n − 1)! isn’t true for n = 0. In this case, we know the correct answer from the definition of factorial: 0! = 1. In the implementation, we check for this case with an if statement and return the correct result. In all other cases, the recursive solution is correct, so we use it. For reversing a string, there is also one case where the method outlined above for the recursive case can’t be followed. Again, we look at the smallest cases of the problem, which cannot be subdivided further. What should be the result when we call reverse("")? Presumably, the reverse of the empty string is "", but the above method won’t give this result. In fact, if we try to extract element 0 from this string, it will cause an IndexError since it doesn’t have a zero-th character. Since this case 6.4. DESIGNING WITH RECURSION 137 doesn’t match the recursive algorithm, it will be our base case and handled separately. We should also check other small cases: what is the reverse of a single- character string? The function call reverse("X") should return "X". We can check our recursive method for this case, when string is "X": reverse(string[1:]) + string[0] == reverse("") + "X" Since we just decided that reverse("") will return "", this becomes "" + "X", which is "X". This is the correct result, so we don’t have to worry about single-character strings as a base case: the recursive case already handles them correctly. It’s very important every recursive call will eventually get to a base case. The recursive call that is made in the function must be at least one step closer to the base case: the part we decide to solve recursively is smaller than the original problem. Remember that the base case is where the recursion will end. Once it does, the base case will return the correct result (since we made sure it did). From there, the recursive calls will begin to “unwind”, with each one returning the correct result for the arguments it was given. If this isn’t the case, the function will keep making more and more re- cursive calls without ever stopping. This is called infinite recursion. Look back at Figure 6.8 and imagine what it would look like if we didn’t have the special case for num==0. It would keep making more and more calls until the program was halted. This is analogous to the infinite loops you can create with while. Python will stop when the recursion passes a certain “depth”. It will give the error “maximum recursion depth exceeded”. If you see this, you have probably created infinite recursion. Step 4: Combine the Base and Recursive Cases Once we have identified the base case(s) and what recursive calculation to do in other cases, we can write the recursive function. Assembling the parts is relatively easy. In the function, first check to see if the argument(s) point to a base case. If so, just return the solution for this case. These should be very easy since the base cases are the smallest possible instances of the problem. 138 UNIT 6. ALGORITHMS if we have a base case, return the base case solution otherwise, set rec result to the result of a recursive call on the subproblem return the solution, built from rec result Figure 6.9: Pseudocode for a recursive algorithm If we don’t have a base case, then the recursive case applies. We find our subproblem and call the same function recursively to solve it. Once that’s done, it should be possible to transform that into the solution we need. Once we know it, it will be returned. Pseudocode for a recursive function can be found in Figure 6.9. Compare this with the implementation of factorial in Figure 6.7. Figure 6.7 doesn’t put the recursive result in a variable because the calculation is so short, but it’s otherwise the same. The example of reversing a string has been implemented in Figure 6.10. It uses the parts constructed above and the outline in Figure 6.9. It does correctly return the reverse of any string. Debugging Recursion Debugging a recursive function can be trickier than non-recursive code. In particular, when designing the recursive function, we made an assumption: that the recursive call to the same function returned the correct result. Since the function is relying on itself working, tracing problems can be difficult. The key to finding errors in recursive code is to first side-step this problem. There are cases in the recursive function that don’t make recursive calls: the base cases. This gives us somewhere to start testing and debugging. The first thing that should be done when testing or trying to find errors is to examine the base case(s). For example, in the above examples, the first things to test would be: >>> factorial(0) 1 >>> reverse("") ’’ 6.4. DESIGNING WITH RECURSION 139 def reverse(string): """ Return the reverse of a string >>> reverse("bad gib") ’big dab’ >>> reverse("") ’’ """ if len(string)==0: # base case return "" else: # recursive case rev_tail = reverse(string[1:]) return rev_tail + string[0] Figure 6.10: Recursively reversing a string If the base cases work correctly, then at least we have that to work with. If not, we know what code to fix. Once we know the base cases are working, we can then easily test the cases that call the base case. That is, we now want to look at the arguments to the function that are one step away from the base case. For example, >>> factorial(1) 1 >>> reverse("X") ’X’ >>> reverse("(") ’(’ In each case here, the recursive code runs once, and it calls the base case. We can manually check the calculations in these cases and confirm that our expectations match the results. If these aren’t correct, then there is something wrong with the way the recursive code uses the base case (which we have already tested) to compute the final solution. There are a couple of possibilities here: the wrong recursive 140 UNIT 6. ALGORITHMS call might be made, or the calculation done with the recursive result could be incorrect. Both of these should be checked and corrected if necessary. If these cases work, but the overall function still isn’t correct, you can proceed to cases that are another step removed (two steps from the base case). For example, factorial(2) and reverse("Ab"). You should quickly find some case that is incorrect and be able to diagnose the problem. Another Example As a final example of recursion, we will create a function that inserts spaces between characters in a string and returns the result. The function should produce results like this: >>> spacedout("Hello!") ’H e l l o !’ >>> spacedout("g00DbyE") ’g 0 0 D b y E’ We can work through the steps outlined above to create a recursive solu- tion. 1. [Find a Smaller Subproblem.] Again, we look for cases that are one “step” smaller and can contribute to an overall solution. Here, we will take all but the last character of the string, so for "marsh", we will take the substring "mars". The recursive call will be spacedout(string[:-1]). 2. [Use the Solution to the Subproblem.] If the recursive is working cor- rectly, we should get a spaced-out version of the substring. In the example, this will be "m a r s". We can use this to finish by adding a space and the last character to the end. So, the final result will be rec_result + " " + string[-1]. 3. [Find a Base Case.] The above method clearly won’t work for the empty string since there is no last character to remove. In this case, we can just return the empty string. But, the above also won’t work for strings with a single character. The recursive method applied to "X" will return " X", with an extra space at the beginning. We will have to handle these as a base case as well: the correct result is to return the same string unchanged. 6.4. DESIGNING WITH RECURSION 141 def spacedout(string): """ Return a copy of the string with a space between each character. >>> spacedout("ab cd") ’a b c d’ >>> spacedout("") ’’ >>> spacedout("Q") ’Q’ """ if len(string) <= 1: return string else: head_space = spacedout(string[:-1]) return head_space + " " + string[-1] Figure 6.11: Recursively putting spaces between characters in a string In fact, these two cases can easily be combined. If the string has 0 or 1 characters, it can be returned unchanged. 4. [Combine the Base and Recursive Cases.] The above work has been combined in the implementation in Figure 6.11. Check-Up Questions ◮ Repeat the spaced-out string example using everything but the first charac- ter as the subproblem. That is, for the string "hearth", the subproblem should be "earth", instead of "heart". You should be able to get a different recursive function that produces the same results. ◮ Write a recursive function list_sum(lst) that adds up the numbers in the list given as its argument. ◮ Write a recursive function power(x,y) that calculates xy , where y is a positive integer. To create a subproblem, you will have to decrease one of x or y. Which one gets you a useful result? 142 UNIT 6. ALGORITHMS Topic 6.5 What isn’t computable? Throughout this course, we have solved many problems by creating algo- rithms and implementing the algorithms in Python. But, there is a funda- mental question here: are there problems that you can’t write a computer program to solve? The answer is yes. It’s possible to prove that, for some problems, it’s impossible to write a program to solve the problem. That means that the lack of a solution isn’t a failing of the programming language you’re using, your programming abilities, or the amount of time or money spent on a solution. There is no solution to these problems because it’s fundamentally impossible to create one with the tools we have available to do computations. The Halting Problem For example consider the following problem: Consider a particular program. Given the input given to the program, determine whether or not the program will ever finish. This problem is called the halting problem. The basic idea is to decide whether or not a program “halts” on particular input. Does it eventually finish or does it go into an infinite loop? This is something that would be very useful to be able to compute. The Python environment could have this built in and warn you if your program was never going to finish. It would be very helpful when it comes to debug- ging. Unfortunately, it’s impossible to write a program that looks at any pro- gram to determine whether or not it halts. Suppose someone comes along who claims to have created a function halts(prog, input) that solves the halting problem: it returns True if the program in the file prog halts on the given input and False if not. You write the program in Figure 6.12 and ask if you can test their function. This program is based on asking the halts function what happens when a program is given itself as input. So, we imagine what would happen if you ran a program and then typed its filename and pressed enter. Maybe that isn’t sensible input for the program but that doesn’t matter: we only care if the program halts or not, not if it does something useful. 6.5. WHAT ISN’T COMPUTABLE? 143 prog = raw_input("Program file name: ") if halts(prog, prog): while True: print "looping" else: print "done" Figure 6.12: Fooling a function that claims to solve the halting problem. Does the program in Figure 6.12 halt? Sometimes. If its input is a program that halts when given itself as input, Figure 6.12 enters an infinite loop. If the input program doesn’t halt, then the program in Figure 6.12 stops immediately. Suppose we run the program in Figure 6.12 and give it itself as input. So, we enter the file name of the program in Figure 6.12 at the prompt. What answer does the program give? Without knowing anything about the halts function, we know that it will always give the wrong answer for these inputs. These are the two possibilities: 1. halts("fig6.12.py", "fig6.12.py") returns True. Then the pro- gram in Figure 6.12 would have entered an infinite loop: it should have returned False. 2. halts("fig6.12.py", "fig6.12.py") returns False. Then the pro- gram in Figure 6.12 would have printed one line and stopped: it should have returned True. So, no matter what claims are made, a program that claims to compute the halting problem will always make some mistakes. The idea of feeding a program itself as input might seem strange at first, but it’s a perfectly legitimate thing to do. It should be enough to convince you that there are some inputs where a halting function fails. There will be many more for any particular attempt. There’s nothing that says you can’t write a program that answers the halting problem correctly sometimes. The point is that the problem in gen- eral is impossible to solve with any computer. 144 UNIT 6. ALGORITHMS Virus Checking One problem that you may have had to deal with in the real-world is keeping viruses away from your computer. A computer virus is a program that is designed to spread itself from one location to another—between different programs and computers. There are many programs that scan your computer for viruses and report any problems. All of these programs require an up to date list of virus definitions. Every week or so, they download a new list of viruses over the Internet. These “definitions” contain information like “files that contain data like this. . . are infected with virus X.” You may have wondered why this is necessary: why can’t virus checkers just look for programs that behave like viruses. By definition, a virus has to be a program that reproduces; just look for programs that put a copy of themselves into another program. But again, we run into a problem of computability. Writing a program to check for programs that reproduce themselves isn’t possible. So, it’s im- possible to write a perfect anti-virus program. The most effective solution seems to be the creation of lists of viruses and how to detect them. A fairly simple program can scan a hard drive looking for known virus signatures. The downside is that the program will only detect known viruses. When new viruses are created, they must be added to the list. As a result, it’s necessary to update virus definition files regularly. Again, remember that these problems (halting and virus checking) aren’t impossible to compute because of some limitation in a programming lan- guage or the computer you’re working with. They are impossible with every computational device anybody has ever thought of. Summary This unit covers several important aspects of algorithms and computability. Sorting, searching, and recursion are important techniques for designing effi- cient algorithms; you will see much more of them if you go on in Computing Science. The uncomputability topic is important to the way problems are solved with computer. The same problems can be solved with any computer and there are some problems that can’t be solved with any computer. 6.5. WHAT ISN’T COMPUTABLE? 145 Key Terms • searching • recursion • linear search • base case • binary search • uncomputable • sorting • halting problem • selection sort • virus checking 146 UNIT 6. ALGORITHMS Unit 7 Working with Files Learning Outcomes • Create a program that outputs information to a text file. • Create a program that reads a text file and does some simple processing of its contents. • Describe the role on the operating system. • Explain in general terms how a disk stores information. Learning Activities • Read this unit and do the “Check-Up Questions.” • Browse through the links for this unit on the course web site. • Read Chapter 11 in How to Think Like a Computer Scientist. Topic 7.1 File Output Up to this point, we have had many ways to manipulate information in the computer’s memory, but haven’t had any way to write information to a file on a disk. Doing so will allow us to save information permanently, and create files that can be loaded into other programs. We will only discuss reading and writing text files in this course. These are files that consist only of ASCII characters. These files can be opened and edited with a text editor like the IDLE editor, Notepad, or Simpletext. 147 148 UNIT 7. WORKING WITH FILES dataout = file("sample.txt", "w") dataout.write("Some text\n") dataout.write("...that is written to a file\n") dataout.close() Figure 7.1: Writing text to a file Some text ...that is written to a file Figure 7.2: File created by Figure 7.1 Writing data to a text file is quite easy in Python. Writing a text file is a lot like printing text to the screen with print. Before we can send information to a file, it must be opened . The file function opens a file on the disk for our use, and returns a file object that represents the file. For example, dataout = file("sample.txt", "w") This opens file file sample.txt in the current directory for write (the "w" indicates that we will write to the file). A file object is returned and stored in dataout. Note that when a file is opened for write, any existing contents are discarded. When you’re done with a file object, is should be closed with the close method. This writes all of the data to the file and frees it up so other programs can use it. For example, dataout.close() When we have a file object opened for write, we can send text to it. This text will be stored in the file on disk. The standard way to write text to a file object is to use its write method. The write method takes a string as its argument, and the characters in the string are written to the file. See Figure 7.1 for a complete program that creates a text file. This program produces a file sample.txt; its contents can be seen in Figure 7.2. In the two calls to the write method, you can see that a newline character is produced to indicate a line break should be written to the file. In a Python string, \n is used to indicate a newline character. 7.1. FILE OUTPUT 149 import math csvfile = file("sample.csv", "w") for count in range(10): csvfile.write( str(count) + "," ) csvfile.write( str(count*count) + "," ) csvfile.write( str(math.sqrt(count)) + "\n" ) csvfile.close() Figure 7.3: Writing text to a file When using print, a newline character is automatically generated after each statement. When using the write method, this must be done manually. A line break is produced for every \n in the argument. For example, there would be a blank line between the two lines of text if this statement had been used: dataout.write("Some text\n\n") There is actually a form of the print statement that can be used on files. The statement print >>dataout, "Hello world" will “print” to the file object dataout. Text files in specific formats can be used by many programs. As long as you know how information must be arranged in the file, it’s possible to write a Python program to create a file than can then be imported/loaded in another program. This can be very useful in many applications. Spreadsheet programs can import comma-separated value (or CSV ) files. The format for these files is simple. Each line in the file represents a row in the spreadsheet; the cells on each line are separated by commas. It is quite easy to produce this format in a program. For example, the program in Figure 7.3 produces a CSV file with three columns: the first column is a number, the second is the square of the number, and the third is the square root. A row is produced for every number in range(10). The output of this program can be seen in Figure 7.4. This file can be loaded into any spreadsheet program, or any other program that can import CSV files. This way, a data produced in a Python program can be passed to a spreadsheet or other program for further manipulation or analysis. 150 UNIT 7. WORKING WITH FILES 0,0,0.0 1,1,1.0 2,4,1.41421356237 3,9,1.73205080757 4,16,2.0 5,25,2.2360679775 6,36,2.44948974278 7,49,2.64575131106 8,64,2.82842712475 9,81,3.0 Figure 7.4: File created by Figure 7.3 The csv module contains objects that allow even more convenient reading and writing of CSV files. It correctly handles cells that contain commas and other special characters, which is tricky to do by-hand. Check-Up Questions ◮ Run the program in Figure 7.3; it will produce a file sample.csv. Load this into a spreadsheet program. Topic 7.2 File Input Reading data from a file is somewhat like getting input from the user with raw_input. The same problems arise: bad input and extracting the informa- tion we want from the string of characters. But, unlike user input, we can’t just ask the question again if we get bad input. We either have to process the data in the file or fail outright. A file can be opened for reading the same way as for writing, except we use "r" to indicate that we want to read the file. datain = file("sample.txt", "r") Again, a file object is returned and we store it in the variable datain. File input objects should also be closed when you’re done using them. 7.2. FILE INPUT 151 filename = raw_input("Enter the file name: ") datain = file(filename, "r") for line in datain: print len(line) datain.close() Figure 7.5: Reading a text file and counting line lengths The usual way to read a text file in Python is to process it one line at a time. Files can be very large, so we probably don’t want to read the whole file into memory at once. Handling one line at a time doesn’t use too much memory, and is often the most useful way to look at the file anyway. In Topics 5.2 and 5.4, we saw that the for loop in Python can be used to iterate over every item in any list or string. It can also be used to iterate through file objects. The body of the for is executed once for every line in the file. For example, the code in Figure 7.5 reads a text file and prints the number of characters on each line. If we give it the file from Figure 7.2, the output would be: Enter the file name: sample.txt 10 29 Again, the for loop reads just like what it actually does: “for (every) line in (the) datain (file object). . . ”. Processing File Input Have a more careful look at the text in Figure 7.2, and the sample output of Figure 7.5 above. The first line in Figure 7.2 is “Some text” (9 characters), but the program reports the length of the line as 10. Where did that extra character come from? Look back at the code in Figure 7.1 that produced the file. The first call to write actually produces 10 characters: 9 “visible” characters and a newline. It we modified Figure 7.5 to just output the contents of line 152 UNIT 7. WORKING WITH FILES instead of the length, we would see an extra line break caused by the newline character in the string. There are a couple of ways to deal with the newline character if you don’t want it in the string as you process it. The easiest is to just discard the last character of each line: for line in datain: line = line[:-1] ... Assuming there’s a newline on the last line of the file, this will work. If you want to get rid of the newline and any other spaces or tabs at the end of each line, the rstrip string method can do that: for line in datain: line = line.rstrip() ... Remember that this removes any trailing whitespace, which may not be appropriate in all situations. Once the newline character has been removed (if necessary), you can process the line as you would with any other string. Of course, there are many different ways you might need to handle the lines of a file. As an example, we will read a file that contains a time on each line: each time will be in hh :mm :ss format. We will calculate and report the total number of seconds represented by each time. For example, 1:00:00 is 60 × 60 = 3600 seconds. In order to do this, we will read each line of the file (as in Figure 7.5). Any trailing whitespace will be removed from the line with rstrip. Finally, the line can be split into hour, minute, second sections by the split string method. The split method is used to divide a string into “words” that are sepa- rated by a given delimiter. In this case, we want to divide the string around any colon characters, so the method call will be string.split(":"). This will return substrings for the hour, minute, and second, in order. Once we have strings for each of the hour, minute, and second, these can be converted to integers and the number of seconds easily calculated. This has been done in Figure 7.6. Figure 7.7 contains a sample input file, and Figure 7.8 contains the output when the program is run. 7.2. FILE INPUT 153 timefile = file("times.txt", "r") total_secs = 0 for line in timefile: # break line up into hours, minutes, seconds h,m,s = line.rstrip().split(":") # calculate total seconds on this line secs = 3600*int(h) + 60*int(m) + int(s) total_secs += secs print secs timefile.close() print "Total:", total_secs, "seconds" Figure 7.6: Program to read a series of times in a file 2:34:27 0:58:10 01:09:56 0:23:01 10:12:00 Figure 7.7: Input file (times.txt) for Figure 7.6 9267 3490 4196 1381 36720 Total: 55054 seconds Figure 7.8: Output of Figure 7.6 154 UNIT 7. WORKING WITH FILES Topic 7.3 The Operating System In Topic 7.2, you didn’t have to know what part of the disk contained the file, you only had to know its file name to get at its contents. Similarly, in the Topic 7.1, when you wanted to write data to a file, you didn’t have to actually know how information was arranged on the disk or what part of the disk your file was being written to. When your data is written to the disk, something has to make sure you’re given a part of the disk that isn’t being used by another file; if you’re using an existing file name, the old version has to be overwritten; and so on. All of this is taken care of by the operating system. The operating system is a piece of software that takes care of all communication with the computer’s hardware. The operating system handles all communications with the hard disk, printer, monitor, and other pieces of hardware. Thus, it can make sure that no two application programs are using the same resource at the same time. When we write files, we are relying on the operating system to give us parts of the hard disk, and put our data there so we can retrieve it later. Since the OS takes care of all of this, we don’t have to worry about it when we’re programming. The operating system is also responsible for allocating parts of the computer’s memory to particular programs; this is necessary when we use variables in a program. Modern operating systems come bundled with many applications: file managers, Wordpad, Media Player, iPhoto, and so on. Microsoft has even claimed that some of these (notably, Internet Explorer) are inseparable from the operating system. Still, they are application programs, not really part of the operating system, according to its definition. Whether or not they can be separated from the OS is another problem that has more to do with marketing than computing science. So, what really separates the operating system from other software is that the OS does all of the communication with hardware. It also mediates conflicts (if two applications want to access the hard disk at the same time) and allocates resources (giving out memory and processor time as needed). Figure 7.9 summarizes the communication between the various pieces of the system. In Figure 7.9, there is one user (Langdon) who has four applications open. 7.4. DISKS AND FILES 155 Langdon User(s) IDLE Firefox Solitaire MS Word Applications Windows 7 Operating System ... hard disk display printer ... Hardware Figure 7.9: Communication between the user, applications, OS, and hard- ware Whenever any of these applications needs to interact with the computer’s hardware (open a file, draw something on the screen, get keyboard input, and so on), it makes a request through the operating system. The operating system knows how to talk to the hardware (through device drivers) and fills these requests. Having the OS in the middle means that the applications don’t have to worry about the details of the hardware. If an application wants to print, it just asks the OS to do the dirty work. If not for the OS, every application would have to have its own printer drivers and it would be very hard to avoid problems if several tried to print at once. The role of the operating system and how it does its job are ex- plored further in CMPT 300 (Operating Systems). The interaction between hardware and the OS is discussed briefly in CMPT 250 (Computer Architecture). Topic 7.4 Disks and Files When we read and wrote files in Topics 7.1 and 7.2, we took for granted that the operating system could put the information on the disk and get it back later. This is no small job for the OS to do—we should look a little more at what happens. 156 UNIT 7. WORKING WITH FILES Note that whenever we’re talking about storing information, a disk can refer to any device you can use in a computer to store information. The storage must keep the information when the computer is turned off, so the computer’s memory doesn’t count. These are referred to as nonvolatile stor- age. They include: • hard drive: fast high capacity storage that is used to store the operating system, applications, and most of your data on a computer. • floppy disk : slow low capacity disks that can be easily transported. • flash media cards: small storage devices with no moving parts. These are often used with digital cameras, MP3 players, and other portable devices • USB “disks”: small keychain-sized devices that can be connected di- rectly to a USB port on a computer and then easily transported to another. • MP3 player or digital camera: These can often be connected directly to your computer and transfer information to and from built-in storage (or any inserted media cards). • compact disc: used to distribute programs and other information since they are high capacity and cheap to produce. You can’t write to com- pact discs (at least, not quite the same way you can to the other devices listed here). All of these are treated as “disks” by the operating system. As far as the user and programmer are concerned, they all work the same way. The operating system makes sure they all behave the same way as far as the user or programmer is concerned, regardless of how they work. Once the operating system knows how to physically store information on all of these “disks”, it still has to arrange the information so that it can find it later. When your program asks for a particular file, the computer has to know what information on the disk corresponds to that file; if you change the file, it has to change the information on the disk, adding or removing the space reserved for the file as necessary. The operating system arranges information on the disk into a file system. A file system is a specification of how information is stored on a disk, how to store directories or folders, information about the files (last modified date 7.4. DISKS AND FILES 157 Figure 7.10: A disk divided into blocks and access permissions, for example), and any other information that the computer has to store to keep everything working. The space on the disk is divided up into disk blocks. The block size can vary, but is most commonly 4 kB. The blocks are then allocated to the various files that are stored on the disk. Figure 7.10 shows that the blocks on a disk might look like. Figure 7.10 could be either a hard disk or floppy disk. The insides look the same, but information can be stored much more densely on a hard disk because it’s sealed in its enclosure. A read head sits above the disk surface and can read information from the disk as it spins below it. The time it takes to get information from the disk depends on two factors: the amount of time it takes to position the head so it’s the right distance from the centre of the disk, and the speed the disk is rotating. If the filesystem uses 4 kB blocks, a 10 kB file would need three blocks. That means that the file is actually using 12 kB of disk space—the last 2 kB in the last block are wasted. This is refereed to as internal fragmentation. Internal fragmentation occurs whenever some of the disk block is left over after the file is written—every file on the disk (unless it exactly fills the block) will cause a little internal fragmentation. When the computer tries to read this file, it will have to read the infor- mation from three widely separated blocks on the disk. This will be slow since you have to wait for the read head to move and disk to spin to the right position three times. If you defragment your hard drive, it repairs external fragmentation and moves scattered blocks like this together. In the example, we might end up 158 UNIT 7. WORKING WITH FILES Figure 7.11: The blocks of a file defragmented with the three-block file stored as in Figure 7.11. Now, reading them will be much faster: you only have to move the read head once and the three block will spin under the read head quickly since they are adjacent. Topic 7.5 Example Problem Solving: File Statistics As an example of using file input, we will create a program that opens up a text file specified by the user and outputs some statistics about it. We want the program to count the number of lines, words, and characters in the file. The first step will be to get the file name and open the file. This has been done in Figure 7.12. This program also loops through the lines in the file, counting as it goes. We can now test this program with a sample file, as seen in Figure 7.13. When we run the program in Figure 7.12 and give it this file, the output is: Filename: wc-test.txt Total lines: 6 There are six lines in the input file, so we’re on the right track. Next, we can try to count the number of characters. Since each line in the file is a string, we can just use len to get the number of characters in the string and add it to another counter. This has been done in Figure 7.14. 7.5. EXAMPLE PROBLEM SOLVING: FILE STATISTICS 159 # get the filename and open it filename = raw_input("Filename: ") file = open(filename, "r") # initialize the counter total_lines = 0 for line in file: # do the counting total_lines += 1 # summary output print "Total lines:", total_lines Figure 7.12: Word count: open file and count lines "One trick is to tell them stories that don’t go anywhere, like the time I caught the ferry over to Shelbyville. I needed a new heel for my shoe, so I decided to go to Morganville, which is what they called Shelbyville in those days. So I tied an onion to my belt, which was the style at the time..." - Abe Figure 7.13: Word count test file 160 UNIT 7. WORKING WITH FILES # get the filename and open it filename = raw_input("Filename: ") file = open(filename, "r") # initialize the counters total_lines = 0 total_chars = 0 for line in file: # do the counting total_lines += 1 total_chars += len(line) print total_chars # summary output print "Total lines:", total_lines print "Total characters:", total_chars Figure 7.14: Word count: counting characters 1 Notice that a print statement has been added to the loop to help with debugging. When this program is run on our sample data file, we get this output: Filename: wc-test.txt 64 129 191 255 305 311 Total lines: 6 Total characters: 311 It’s not easy to check if this is correct or not. The program probably should be tested on a smaller file, where we can actually count the number of characters by hand to verify. But, the debugging output has given us something to work with. Look at the last line of Figure 7.13. It contains five characters: dash, space, A, b, e. Why did our program count 311 − 305 = 6 characters on that 7.5. EXAMPLE PROBLEM SOLVING: FILE STATISTICS 161 # get the filename and open it filename = raw_input("Filename: ") file = open(filename, "r") # initialize the counters total_lines = 0 total_chars = 0 for line in file: # clean any trailing whitespace off the string line = line.rstrip() # do the counting total_lines += 1 total_chars += len(line) print total_chars # summary output print "Total lines:", total_lines print "Total characters:", total_chars Figure 7.15: Word count: counting characters 2 line? Like we saw in Topic 7.2, reading the lines from the file includes the newline characters. But, we don’t want to count the “invisible” characters in the file. We can use rstrip to get rid of these, along with any trailing spaces, before we do the counting. The program in Figure 7.15 does this. Now when we run the program on the sample file, we get this output: Filename: wc-test.txt 63 127 188 251 298 303 Total lines: 6 Total characters: 303 162 UNIT 7. WORKING WITH FILES Now we get the right number of characters on the last line. Testing with some other sample files confirms that we are now counting the characters on each line properly. We can now turn to the problem of counting the number of words on each line. Your first thought might be to just count the number of spaces on the line, but that won’t work. If there are several spaces together (the test file has two spaces after each period), then that will count as two “words”. In order to count the number of words in the line, we will to check for the beginning of each word. This takes some more careful examination of the string. Since it’s more complicated than the other counting we’ve done, it has been split into a separate function, words. The function words characterizes the ”start of a word” as a non-space character after by either the start of the string or a space. See Figure 7.16 for the implementation. Since it’s in a function, we can test it separately: >>> words("abc-def ghi") 2 >>> words("abcde f ghijkl") 3 >>> words("... ,,, :::") 3 >>> words("") 0 The final program is assembled in Figure 7.16. If we run it on our test program, we get this output: Filename: wc-test.txt Total lines: 6 Total words: 62 Total characters: 303 Check-Up Questions ◮ Try Figure 7.15 on a text file with a few shorter lines in it. Is it counting the number of characters correctly? ◮ Make a copy of Figure 7.16. Print out the number of words after each iteration of the for line in file loop. Does it match the number of words you count on each line? 7.5. EXAMPLE PROBLEM SOLVING: FILE STATISTICS 163 def words(line): """ Count the number of words in the string line. """ words = 0 for pos in range(len(line)): # line[pos] is the start of a word if it is a non-space # and it is either the start of string or comes after # a space if line[pos]!=" " and (pos==0 or line[pos-1]==" "): words += 1 return words # get the filename and open it filename = raw_input("Filename: ") file = open(filename, "r") # initialize the counters total_lines = 0 total_chars = 0 total_words = 0 for line in file: # clean any trailing whitespace off the string line = line.rstrip() # do the counting total_lines += 1 total_chars += len(line) total_words += words(line) # summary output print "Total lines:", total_lines print "Total words:", total_words print "Total characters:", total_chars Figure 7.16: Word count: final program 164 UNIT 7. WORKING WITH FILES ◮ In Figure 7.16, the if condition in the words function checks pos+1 against len(line)-1. What is the purpose of the plus one and minus one? Summary Working with files gives you a way to “save” things in your program. In- formation you put in a file can be read back in the next time the program runs. Hopefully you have an idea of how to read and write simple text files with a Python program. You should also have some idea of what actually happens when a program, one you write or any other, stores information on a disk. Key Terms • text file • file system • file object • disk block • newline character • internal fragmentation • operating system • external fragmentation • disk Part III Appendices 165 Appendix A Technical Instructions Learning Outcomes This material will help you learn how to use the software you need to do your work in this course. You won’t be tested on it. Learning Activities • Install the Python software, if you’re working with your own computer. • Follow along with the Python instructions yourself and make sure you can work with the tools. • Explore the software more on your own. Topic A.1 Installing Python We are going to use Python to write and run Python programs in this course. The following tutorial will help you get familiar with some of the functionality of the Python software. This installation tutorial assumes that you’re using Windows. Python is available for the MacOS and for Linux as well. You can use any operating system for your work in this course. You can also use Python in a computer lab on-campus. If you do, Python is already installed and you can skip to the next topic. You can download the most recent version of Python from the Python web site, http://www.python.org/download/. Click on the link that says: “Python 167 168 APPENDIX A. TECHNICAL INSTRUCTIONS 2.x.x Windows installer” (where 2.x.x is the most recent release of version 2). Save this file on your desktop. Do not download Python 3: it contains some incompatibilities that this Guide (and most other Python tutorials) do not take into account. For the a Macintosh, download the 32-bit version of Python, not the 64-bit version. Once the file has downloaded, double-click the installation file. You can safely accept all of the defaults for the installation. Make sure you install at least the “Python interpreter and libraries” and “Tck/Tk”. You will need those for this course. You will probably want the help files as well. You can delete the installation file you downloaded once the installation is complete. Topic A.2 Using Python There are two different interfaces where you can write Python code: IDLE (Integrated DeveLopment Environment) or the Command Line. We will use IDLE in this course since it provides a graphical interface for you to work with. You can start IDLE by selecting “IDLE (Python GUI)” from the Start menu if you’re using Windows. Note that in this sections, the screen shots are from Windows, but the instructions apply to any operating system. The usual first program that’s written in every programming language is one that prints “Hello World” on the screen. Let’s see how we can keep up the old tradition in Python. Open up the IDLE if you haven’t already. Your IDLE window will have some text similar to this: Python 2.3.4 (#53, May 25 2004, 21:17:02) [MSC v.1200 32 bit (Intel)] on win32 Type "copyright", "credits" or "license()" for more information. IDLE 1.0.3 >>> The >>> at the bottom is the prompt for writing the statements. When you open up IDLE, your cursor should by default be in front of this prompt. Type in the following statement in IDLE: print "Hello World!" A.2. USING PYTHON 169 Figure A.1: IDLE, after running a “Hello World” statement You have just executed your first Python statement and your IDLE window should look like Figure A.1. Your first Python program Typing statements in the IDLE environment isn’t the only way to execute Python statements. You can make programs, store them and run them later. These programs are also called script files and are usually saved with a .py extension. Let’s make a simple script and run it. Select “New Window” from the File menu in IDLE. An editor window will appear on your screen. Type the same statement as you did in the IDLE earlier: print "Hello World!" Select the “Save As. . . ” option from the File menu and save the program as HelloWorld.py . Don’t forget the .py extension. Now run this script file. The easiest way to run and debug your script file is to press F5 while your script file’s editor window. The script file should run in the main IDLE window, displaying its output below the other output that previous commands have created. If you change your script file and try to run it, you will be asked if you want to save your file—you must save the program before it can be run. Change the “Hello World” program that you just made: add the following print statement after the first one: print "This is going to be fun!" 170 APPENDIX A. TECHNICAL INSTRUCTIONS Figure A.2: IDLE, displaying a syntax error Press F5 now. The following window will appear asking you to save the source first. Click on OK and the IDLE will display the out put of your script file. If there are any syntax errors in your script file, those are identified auto- matically before the file is run. For example, in Figure A.2 an error message popped up when F5 was pressed. The error here is the incorrect indenting of the second print statement—the cursor is moved the the interpreter’s best guess at the location of the error. Indentation plays a vital role in Python programming. You will learn more about this as you proceed in the course. Once any syntax errors are fixed, the script will run. There might be more errors that the interpreter can’t catch until its running the program. For example, in Figure A.3, the interpreter has caught an error. When it got to the word squidport, it didn’t know what to do with it. It can’t catch this any earlier since you might have created a variable or function called squidport; the only way to tell for the computer to tell for sure was to run it and see. The error message indicates that it noticed the error on line 3 of the current program (“-toplevel-”). The IDLE editor tells you the line number that the cursor’s on in the bottom left corner. In Figure A.3, “Ln: 4” indicates that the cursor is on line 4, just below the error. The “Ln: 21” in the interpreter window isn’t what we’re interested in: the error message A.3. COMPUTING ACCOUNTS 171 Figure A.3: IDLE, displaying a run-time error always gives lines in the source file. Remember: if you want to save the program so you can run it later (or hand it in), it has to go in an editor window, not at the IDLE >>> prompt. Topic A.3 Computing Accounts There are several username/password combinations you need for this course. Hopefully, this will help you keep them straight. All of the accounts have the same user names, but different passwords. • SFU Computing Account: This account is the one that all SFU students (and faculty and staff) get. It is used to retrieve your @sfu. ca email. All email sent to the course email list will go to this address (unless you have forwarded it elsewhere). This account is also used for Caucus, WebCT, and many other computing resources on-campus. You activate this account, go to my.sfu.ca and click the “Apply for ID” link. You need to enter your student number and some other personal information. You should contact the ACS Help Desk for problems with this account. 172 APPENDIX A. TECHNICAL INSTRUCTIONS • Gradebook Account: This account is used to access Gradebook (http://gradebook.cs.sfu.ca). Gradebook will be used to check your marks on assignments and exams. Your Gradebook password is also needed to access some parts of the course web site. Gradebook is gen- erally activated in the second week of the semester. This account is also used for the submission server, which is used to submit your work on assignments. Your initial password in Gradebook will be your student number (with no spaces or punctuation). If you have used Gradebook before, you password will be the same. If you have problems with your password, the instructor and TAs can reset it to your student number. • ACS Lab Account: This account is used to access the computers in any of the ACS labs on campus. You can do your work for this course in these labs if you wish. Your password on this account is the same as your SFU Computing Account. If you have a new account, you should be able to log in with no problem. If your account is older, you will have to synchronize your Active Directory password. This is a one time process and can be done on the web. You can contact the ACS Help Desk for problems with this account. More information on using the labs is available from the course web site. • CSIL Lab Account: This account is used to access the computers in the CSIL labs, which are run by the School of Computing Science. You can do your work for this course in these labs if you wish. Your password on this account is the same as your SFU Computing Account. You will need an access card to get in the door; you can get the access card by the second week of classes at the Security office. Topic A.4 Working in the Lab All of the software you need in this course is installed in both the ACS labs and the CSIL lab. You can access both and can work in either (or on your own computer is you prefer). A.4. WORKING IN THE LAB 173 See the course web site for more information about using the labs. Summary This material will help you learn how to use the software you need to do your work in this course. You won’t be tested on it. If there are any updates to this material, they will be posted on the course web site or sent by email. Index ==, 58 Boolean Expressions, 58 >>>, 30 (subtopic) in docstrings, 96 boolean expressions, 58 #, 69 boolean values, 58 %, 83 bugs, 77 \, 50 byte, 42 C++, 25 algorithm, 20 called, 90 Algorithms, 123 (unit) Calling Functions, 91 (subtopic) Aliases, 119 (subtopic) capacitors, 41 Another Example, 140 (subtopic) ceiling, 71 append method, 110 character, 46 argument, 34 character set, 47 optional, 34 characters, 30 arguments, 90 Characters and Strings, 46 ASCII, 47 (subtopic) assignment Choosing Control Structures, 63 to an element, 109 (topic) class, 99 base 10 arithmetic, 43 clause base 2 arithmetic, 43 elif, 59 base case, 134 else, 59 Binary, 41 (subtopic) cloning, 120 binary, 41, 43 code, 25 Binary Conversion, 84 (subtopic) Coding Style, 80 (topic) Binary Search, 125 (subtopic) Combine the Base and Recursive binary search, 126 Cases, 137 (subtopic) bit, 42 comma-separated value, 149 body, 56, 57 Comments, 80 (subtopic) 174 INDEX 175 comments, 69 device drivers, 155 compact disc, 156 digital camera, 156 computer program, 25 disk, 156 computer programming, 25 disk blocks, 157 computer science, 22 Disks and Files, 155 (topic) Computing Accounts, 171 (topic) docstring, 91, 96 computing science, 19, 22 doctest module, 96 Computing Science Basics, 19 documentation string, 91, 96 (unit) Doing Calculations, 32 (topic) concatenate, 39, 108 condition, 57 element assignment, 109 constructor, 99 elif clause, 59 Control Structures, 55 (unit) else clause, 59 CSV, 149 else clause, 59 (subtopic) empty string, 51 Data Structures, 21 (subtopic), escaping a character, 51 107 (unit) Example Problem Solving: Feet data structures, 21 and Inches, 48 (topic) Debugging, 77 (topic) Example Problem Solving: File Debugging Recursion, 138 Statistics, 158 (topic) (subtopic) Example Problem Solving: decimal, 43 Guessing Game, 65 (topic) default value, 34 Example: Repeated letters with Defining Functions, 89 (topic) sorting, 128 (subtopic) Defining your own functions, 90 exceptions, 102 (subtopic) expression, 32 Definite Iteration: for loops, 59 external fragmentation, 157 (topic) definite loops, 60 factorial, 131 defragment, 157 factorials, 60 del statement, 110 File Input, 150 (topic) delete file object, 148 list element, 110 File Output, 147 (topic) delimiter, 152 file system, 156 depth Find a Base Case, 136 (subtopic) of recursion, 137 Find a Smaller Subproblem, 135 Designing with Recursion, 135 (subtopic) (topic) Finding bugs, 79 (subtopic) 176 INDEX flash media cards, 156 Installing Python, 167 (topic) float, 38 instance, 99 floating point, 37 int, 38 floor, 26 integers, 37 floppy disk, 156 interactive interpreter, 30 for, 60 internal fragmentation, 157 for loop, 60 (subtopic) interpreter, 30 fragmentation external, 157 Java, 25 internal, 157 len, 34 Functions, 34 (subtopic) libraries, 97 functions Linear Search, 124 (subtopic) arguments, 90 linear search, 124 return values, 90 Lists, 107 (topic) Functions and Decomposition, 89 Lists and for loops, 111 (topic) (unit) Lists are different from strings, 109 (subtopic) Getting it right the first time, 78 Lists are like strings, 108 (subtopic) (subtopic) halting problem, 142 local variable, 94 Handling Errors, 102 (topic) Making Decisions, 55 (topic) hard drive, 156 Manipulating Slices, 114 How Computers Represent (subtopic) Information, 41 (topic) mergesort, 130 How It Works, 133 (subtopic) method, 99 How to sort, 129 (subtopic) module, 97 IDLE, 31, 168 More About Algorithms, 84 if condition, 57 (topic) if statement, 56 (subtopic) MP3 player, 156 immutable, 117 Mutability, 115 (topic) implementation, 25 mutable, 117 imports, 97 NameError, 67, 94 in-place, 116 newline character, 148 Indefinite Iteration: while loops, nonvolatile storage, 156 62 (topic) Number of “Steps”, 76 (subtopic) infinite loop, 63 infinite recursion, 137 Objects, 98 (topic) INDEX 177 Objects in Python, 99 (subtopic) return value, 34 opening a file, 148 round, 34 operating system, 154 rstrip string method, 152 operators, 32 Running Time, 70 (topic) optional argument, 34 overflow, 46 Searching, 123 (topic) selection sort, 130 Positive and Negative Integers, 44 sequence types, 114 (subtopic) slicing, 112 print statement, 30 manipulating slices, 114 Processing File Input, 151 strings, 115 (subtopic) Slicing and Dicing, 112 (topic) Programming Basics, 29 (unit) Slicing Strings, 115 (subtopic) programming language, 25 So?, 86 (subtopic) prompt, 30 Sorting, 126 (topic) property, 99 Special Slice Positions, 113 Pseudocode, 26 (topic) (subtopic) pseudocode, 26 split string method, 152 Python, 25, 29 Starting with Python, 29 (topic) interpreter, 30 statement Python errors print, 30 Name Error, 94 variable assignment, 36 Python Modules, 97 (topic) Statements, 31 (subtopic) quicksort, 130 Storing Information, 35 (topic) quotes, 33 string, 30, 37, 46 printing, 51–52 empty, 51 triple-quoted, 52 read head, 157 string subscripting, 72 Really Copying, 120 (subtopic) Strings, 114 (topic) Recommended Texts, 12 subscript, 108 (subtopic) Subset Sum, 74 (subtopic) Recursion, 131 (topic) Summary, 76 (subtopic), 82 recursion (subtopic) depth, 137 References, 117 (topic) Technical Instructions, 167 (unit) Repeated Letters, 72 (subtopic) text editor, 147 return, 90 text files, 147 return statement, 91 The Code Itself, 81 (subtopic) 178 INDEX The Guessing Game, 70 (subtopic) Why Use Functions?, 93 The Halting Problem, 142 (subtopic), 96 (subtopic) (subtopic) Working in the Lab, 172 (topic) The Interpreter vs. the Editor, 31 Working with Files, 147 (unit) (subtopic) The Operating System, 154 Your first Python program, 169 (topic) (subtopic) triple-quoted string, 52 two’s complement notation, 44 Type Conversion, 38 (subtopic) TypeError, 37 Types, 36 (topic) Understanding Recursion, 134 (subtopic) Unicode, 47 unsigned integer, 45 Unsigned Integers, 43 (subtopic) USB “disks”, 156 Use the Recursive Solution to the Subproblem, 135 (subtopic) User Input, 40 (topic) Using Python, 168 (topic) variable, 35 variable assignment statement, 36 Variable Scope, 94 (topic) virus, 144 Virus Checking, 144 (subtopic) What is an Algorithm?, 19 (topic) What is Computing Science?, 22 (topic) What is Programming?, 25 (topic) What isn’t computable?, 142 (topic) Why Python?, 25 (subtopic)
Authors Greg Baker
License CC-BY-SA-2.5