License: CC-BY-SA-4.0
Discrete Structures for Computer Science:
  Counting, Recursion, and Probability

                Michiel Smid
            School of Computer Science
               Carleton University
                 Ottawa, Ontario
                     Canada
            michiel@scs.carleton.ca


                 July 22, 2019
Contents

Preface

1 Introduction
  1.1 Ramsey Theory
  1.2 Sperner's Theorem
  1.3 The Quick-Sort Algorithm

2 Mathematical Preliminaries
  2.1 Basic Concepts
  2.2 Proof Techniques
      2.2.1 Direct proofs
      2.2.2 Constructive proofs
      2.2.3 Nonconstructive proofs
      2.2.4 Proofs by contradiction
      2.2.5 Proofs by induction
      2.2.6 More examples of proofs
  2.3 Asymptotic Notation
  2.4 Logarithms
  2.5 Exercises

3 Counting
  3.1 The Product Rule
      3.1.1 Counting Bitstrings of Length n
      3.1.2 Counting Functions
      3.1.3 Placing Books on Shelves
  3.2 The Bijection Rule
  3.3 The Complement Rule
  3.4 The Sum Rule
  3.5 The Principle of Inclusion and Exclusion
  3.6 Permutations and Binomial Coefficients
      3.6.1 Some Examples
      3.6.2 Newton's Binomial Theorem
  3.7 Combinatorial Proofs
  3.8 Pascal's Triangle
  3.9 More Counting Problems
      3.9.1 Reordering the Letters of a Word
      3.9.2 Counting Solutions of Linear Equations
  3.10 The Pigeonhole Principle
      3.10.1 India Pale Ale
      3.10.2 Sequences Containing Divisible Numbers
      3.10.3 Long Monotone Subsequences
      3.10.4 There are Infinitely Many Primes
  3.11 Exercises

4 Recursion
  4.1 Recursive Functions
  4.2 Fibonacci Numbers
      4.2.1 Counting 00-Free Bitstrings
  4.3 A Recursively Defined Set
  4.4 A Gossip Problem
  4.5 Euclid's Algorithm
      4.5.1 The Modulo Operation
      4.5.2 The Algorithm
      4.5.3 The Running Time
  4.6 The Merge-Sort Algorithm
      4.6.1 Correctness of Algorithm MergeSort
      4.6.2 Running Time of Algorithm MergeSort
  4.7 Computing the Closest Pair
      4.7.1 The Basic Approach
      4.7.2 The Recursive Algorithm
  4.8 Counting Regions when Cutting a Circle
      4.8.1 A Polynomial Upper Bound on Rn
      4.8.2 A Recurrence Relation for Rn
      4.8.3 Simplifying the Recurrence Relation
      4.8.4 Solving the Recurrence Relation
  4.9 Exercises

5 Discrete Probability
  5.1 Anonymous Broadcasting
  5.2 Probability Spaces
      5.2.1 Examples
  5.3 Basic Rules of Probability
  5.4 Uniform Probability Spaces
      5.4.1 The Probability of Getting a Full House
  5.5 The Birthday Paradox
      5.5.1 Throwing Balls into Boxes
  5.6 The Big Box Problem
      5.6.1 The Probability of Finding the Big Box
  5.7 The Monty Hall Problem
  5.8 Conditional Probability
      5.8.1 Anil's Children
      5.8.2 Rolling a Die
      5.8.3 Flip and Flip or Roll
  5.9 The Law of Total Probability
      5.9.1 Flipping a Coin and Rolling Dice
  5.10 Please Take a Seat
      5.10.1 Determining pn,k Using a Recurrence Relation
      5.10.2 Determining pn,k by Modifying the Algorithm
  5.11 Independent Events
      5.11.1 Rolling Two Dice
      5.11.2 A Basic Property of Independent Events
      5.11.3 Pairwise and Mutually Independent Events
  5.12 Describing Events by Logical Propositions
      5.12.1 Flipping a Coin and Rolling a Die
      5.12.2 Flipping Coins
      5.12.3 The Probability of a Circuit Failing
  5.13 Choosing a Random Element in a Linked List
  5.14 Long Runs in Random Bitstrings
  5.15 Infinite Probability Spaces
      5.15.1 Infinite Series
      5.15.2 Who Flips the First Heads
      5.15.3 Who Flips the Second Heads
  5.16 Exercises

6 Random Variables and Expectation
  6.1 Random Variables
      6.1.1 Flipping Three Coins
      6.1.2 Random Variables and Events
  6.2 Independent Random Variables
  6.3 Distribution Functions
  6.4 Expected Values
      6.4.1 Some Examples
      6.4.2 Comparing the Expected Values of Comparable Random Variables
      6.4.3 An Alternative Expression for the Expected Value
  6.5 Linearity of Expectation
  6.6 The Geometric Distribution
      6.6.1 Determining the Expected Value
  6.7 The Binomial Distribution
      6.7.1 Determining the Expected Value
      6.7.2 Using the Linearity of Expectation
  6.8 Indicator Random Variables
      6.8.1 Runs in Random Bitstrings
      6.8.2 Largest Elements in Prefixes of Random Permutations
      6.8.3 Estimating the Harmonic Number
  6.9 The Insertion-Sort Algorithm
  6.10 The Quick-Sort Algorithm
  6.11 Skip Lists
      6.11.1 Algorithm Search
      6.11.2 Algorithms Insert and Delete
      6.11.3 Analysis of Skip Lists
  6.12 Exercises

7 The Probabilistic Method
  7.1 Large Bipartite Subgraphs
  7.2 Ramsey Theory
  7.3 Sperner's Theorem
  7.4 The Jaccard Distance between Finite Sets
      7.4.1 The First Proof
      7.4.2 The Second Proof
  7.5 Planar Graphs and the Crossing Lemma
      7.5.1 Planar Graphs
      7.5.2 The Crossing Number of a Graph
  7.6 Exercises

Preface

This is a free textbook for an undergraduate course on Discrete Structures
for Computer Science students, which I have been teaching at Carleton Uni-
versity since the fall term of 2013. The material is offered as the second-year
course COMP 2804 (Discrete Structures II). Students are assumed to have
taken COMP 1805 (Discrete Structures I), which covers mathematical rea-
soning, basic proof techniques, sets, functions, relations, basic graph theory,
asymptotic notation, and countability.
    During a 12-week term with three hours of classes per week, I cover most
of the material in this book, except for Chapter 2, which has been included
so that students can review the material from COMP 1805.
    Chapter 2 is largely taken from the free textbook Introduction to Theory
of Computation by Anil Maheshwari and Michiel Smid, which is available at
http://cg.scs.carleton.ca/~michiel/TheoryOfComputation/
    Please let me know if you find errors, typos, or omissions, or if you have
comments, simpler proofs, or think that some parts of the book “need
improvement”.

Chapter 1

Introduction

In this chapter, we introduce some problems that will be solved later in this
book. Along the way, we recall some notions from discrete mathematics that
you are assumed to be familiar with. These notions are reviewed in more
detail in Chapter 2.


1.1      Ramsey Theory
Ramsey Theory studies problems of the following form: How many elements
of a given type must there be so that we can guarantee that some property
holds? In this section, we consider the case when the elements are people
and the property is “there is a large group of friends or there is a large group
of strangers”.

Theorem 1.1.1 In any group of six people, there are
   • three friends, i.e., three people who know each other,
   • or three strangers, i.e., three people, none of whom knows the other two.

   In order to prove this theorem, we denote the six people by P1 , P2 , . . . , P6 .
Any two people Pi and Pj are either friends or strangers. We define the
complete graph G = (V, E) with vertex set
                              V = {Pi : 1 ≤ i ≤ 6}
and edge set
                          E = {Pi Pj : 1 ≤ i < j ≤ 6}.

Observe that |V | = 6 and |E| = 15. We draw each edge Pi Pj as a straight-line
segment according to the following rule:

    • If Pi and Pj are friends, then the edge Pi Pj is drawn as a solid segment.

    • If Pi and Pj are strangers, then the edge Pi Pj is drawn as a dashed
      segment.

In the example below, P3 and P5 are friends, whereas P1 and P3 are strangers.

   [Figure: a complete graph on the six vertices P1, ..., P6, with each edge
   drawn either solid or dashed. The edge P3P5 is solid, the edge P1P3 is
   dashed, and the triangle P2P4P5 is dashed.]

    Observe that a group of three friends corresponds to a solid triangle in
the graph G, whereas a group of three strangers corresponds to a dashed
triangle. In the example above, there is no solid triangle and, thus, there
is no group of three friends. Since the triangle P2 P4 P5 is dashed, there is a
group of three strangers.
    Using this terminology, Theorem 1.1.1 is equivalent to the following:

Theorem 1.1.2 Consider a complete graph on six vertices, in which each
edge is either solid or dashed. Then there is a solid triangle or a dashed
triangle.

Proof. As before, we denote the vertices by P1 , . . . , P6 . Consider the five
edges that are incident on P1 . Using a proof by contradiction, it can easily
be shown that one of the following two claims must hold:

    • At least three of these five edges are solid.

    • At least three of these five edges are dashed.

We may assume, without loss of generality, that the first claim holds. (Do
you see why?) Consider three edges incident on P1 that are solid and denote
them by P1 A, P1 B, and P1 C.
    If at least one of the edges AB, AC, and BC is solid, then there is a solid
triangle. In the left figure below, AB is solid and we obtain the solid triangle
P1 AB.

   [Figure: two copies of the solid edges P1A, P1B, and P1C. Left: the edge
   AB is solid, giving the solid triangle P1AB. Right: the edges AB, AC, and
   BC are dashed, giving the dashed triangle ABC.]

   Otherwise, all edges AB, AC, and BC are dashed, in which case we
obtain the dashed triangle ABC; see the right figure above.
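The six-vertex case can also be verified exhaustively by computer. The following Python sketch (our own illustration; the names and the brute-force approach are not from the book) checks all 2^15 = 32768 solid/dashed colorings of the 15 edges of the complete graph on six vertices:

```python
from itertools import combinations, product

# The 15 edges and 20 triangles of the complete graph on vertices 0..5.
edges = list(combinations(range(6), 2))
triangles = list(combinations(range(6), 3))

def has_monochromatic_triangle(coloring):
    """coloring maps each edge (a, b) with a < b to 'solid' or 'dashed'."""
    for a, b, c in triangles:
        if coloring[(a, b)] == coloring[(a, c)] == coloring[(b, c)]:
            return True
    return False

# Every one of the 2^15 colorings contains a solid or a dashed triangle.
assert all(has_monochromatic_triangle(dict(zip(edges, choice)))
           for choice in product(["solid", "dashed"], repeat=len(edges)))
print("Theorem 1.1.2 verified for six vertices")
```

Such an exhaustive check is feasible only for very small graphs; the proof above works for every complete graph with at least six vertices.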

    You should convince yourself that Theorem 1.1.2 also holds for complete
graphs with more than six vertices. The figure below shows a complete graph
with five vertices that contains neither a solid triangle nor a dashed triangle.
Thus, Theorem 1.1.2 does not hold for complete graphs with five vertices.
Equivalently, Theorem 1.1.1 does not hold for groups of five people.

   [Figure: a complete graph on the five vertices P1, ..., P5, with each edge
   drawn either solid or dashed, containing no solid triangle and no dashed
   triangle.]

   What about larger groups of friends/strangers? Let k ≥ 3 be an integer.
The following theorem states that even if we take ⌊2^{k/2}⌋ people, we are not
guaranteed that there is a group of k friends or a group of k strangers.
   A k-clique in a graph is a set of k vertices, any two of which are connected
by an edge. For example, a 3-clique is a triangle.

Theorem 1.1.3 Let k ≥ 3 and n ≥ 3 be integers with n ≤ ⌊2^{k/2}⌋. There
exists a complete graph with n vertices, in which each edge is either solid or
dashed, such that this graph does not contain a solid k-clique and does not
contain a dashed k-clique.

    We will prove this theorem in Section 7.2, using elementary counting
techniques and probability theory. This probably sounds surprising to you,
because Theorem 1.1.3 does not have anything to do with probability. In
fact, in Section 7.2, we will prove the following claim: Take k = 20 and
n = 1024. If you go to the ByWard Market in downtown Ottawa and take a
random group of n people, then with very high probability, this group does
not contain a subgroup of k friends and does not contain a subgroup of k
strangers. In other words, with very high probability, every subgroup of k
people contains two friends and two strangers.


1.2      Sperner’s Theorem
Consider a set S with five elements, say, S = {1, 2, 3, 4, 5}. Let S1 , S2 , . . . , Sm
be a sequence of m subsets of S, such that for all i and j with i ≠ j,

                               Si ⊈ Sj and Sj ⊈ Si,

i.e., Si is not a subset of Sj and Sj is not a subset of Si . How large can m
be? The following example shows that m can be as large as 10:

     S1 = {1, 2}, S2 = {1, 3}, S3 = {1, 4}, S4 = {1, 5}, S5 = {2, 3},
     S6 = {2, 4}, S7 = {2, 5}, S8 = {3, 4}, S9 = {3, 5}, S10 = {4, 5}.

Observe that these are all subsets of S having size two. Can there be such
a sequence of more than 10 subsets? The following theorem states that the
answer is “no”.

Theorem 1.2.1 (Sperner) Let n ≥ 1 be an integer and let S be a set with
n elements. Let S1 , S2 , . . . , Sm be a sequence of m subsets of S, such that for
all i and j with i ≠ j,
                                  Si ⊈ Sj and Sj ⊈ Si.
Then
                                  m ≤ (n choose ⌊n/2⌋).

The right-hand side of the last line is a binomial coefficient, which we will
define in Section 3.6. Its value is equal to the number of subsets of S having
size ⌊n/2⌋. Observe that these subsets satisfy the property in Theorem 1.2.1.
    We will prove Theorem 1.2.1 in Section 7.3, using elementary counting
techniques and probability theory. Again, this probably sounds surprising to
you, because Theorem 1.2.1 does not have anything to do with probability.
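For n = 5, the example above is easy to check by machine. The sketch below (our own code, not the book's) builds all subsets of size ⌊n/2⌋ = 2, verifies that none of them contains another, and counts them:

```python
from itertools import combinations
from math import comb

S = {1, 2, 3, 4, 5}
n = len(S)

# All subsets of S of size floor(n/2) = 2.
family = [set(c) for c in combinations(S, n // 2)]

# No member of the family is a subset of another member.
assert all(not (A <= B) for A in family for B in family if A != B)

print(len(family), comb(n, n // 2))  # 10 10
```

The count 10 is exactly the binomial coefficient (5 choose 2), matching the bound in Theorem 1.2.1.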


1.3      The Quick-Sort Algorithm
You are probably familiar with the QuickSort algorithm. This algorithm
sorts any sequence S of n ≥ 0 pairwise distinct numbers in the following way:
   • If n = 0 or n = 1, then there is nothing to do.

   • If n ≥ 2, then the algorithm picks one of the numbers in S, let us
     call it p (which stands for pivot), scans the sequence S, and splits it
     into three subsequences: One subsequence S1 contains all elements in
     S that are less than p, one subsequence only consists of the element p,
     and the third subsequence S2 contains all elements in S that are larger
     than p; see the figure below.

                           <p      p            >p
                           S1                   S2

       The algorithm then recursively runs QuickSort on the subsequence S1 .
       After this recursive call has terminated, the algorithm runs, again re-
       cursively, QuickSort on the subsequence S2 .
Running QuickSort recursively on the subsequence S1 means that we first
check if S1 has size at most one; if this is the case, nothing needs to be done,
because S1 is sorted already. If S1 has size at least two, then we choose a
pivot p1 in S1 , use p1 to split S1 into three subsequences, recursively run
QuickSort on the subsequence of S1 consisting of all elements that are
less than p1 , and, finally, recursively run QuickSort on the subsequence of
S1 consisting of all elements that are larger than p1 . (We will see recursive
algorithms in more detail in Chapter 4.)
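In Python, the algorithm just described can be sketched as follows (our own illustration, not the book's formal pseudocode; here the pivot is simply the first element, so this version does not yet address the pivot-choice issue discussed next):

```python
def quick_sort(S):
    """Sort a sequence of pairwise distinct numbers by the three-way
    split described above."""
    if len(S) <= 1:                    # n = 0 or n = 1: nothing to do
        return list(S)
    p = S[0]                           # the pivot
    S1 = [x for x in S if x < p]       # elements less than p
    S2 = [x for x in S if x > p]       # elements larger than p
    return quick_sort(S1) + [p] + quick_sort(S2)

print(quick_sort([5, 3, 8, 1, 9, 2]))  # [1, 2, 3, 5, 8, 9]
```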
    Algorithm QuickSort correctly sorts any sequence of numbers, no mat-
ter how we choose the pivot element. It turns out, however, that the running
time of the algorithm heavily depends on the pivots that are chosen in the
recursive calls.
    For example, assume that in each (recursive) call to the algorithm, the
pivot happens to be the largest element in the sequence. Then, in each call,
the subsequence of elements that are larger than the pivot is empty. Let us
see what happens in this case:

    • We start with a sequence S of size n. The first pivot p is the largest
      element in S. Thus, using the notation given above, the subsequence
      S1 contains n − 1 elements (all elements of S except for p), whereas the
      subsequence S2 is empty. Computing these subsequences can be done
      in n “steps”, after which we are in the following situation:

                                       <p                    p
                                 n − 1 elements

    • We now run QuickSort on a sequence of n − 1 elements. Again, the
      pivot p1 is the largest element. In n−1 “steps”, we obtain a subsequence
      of n − 2 elements that are less than p1 , and an empty subsequence of
      elements that are larger than p1 ; see the figure below.

                                      < p1                p1
                                  n − 2 elements

    • Next we run QuickSort on a sequence of n − 2 elements. As before,
      the pivot p2 is the largest element. In n − 2 “steps”, we obtain a
      subsequence of n − 3 elements that are less than p2 , and an empty
      subsequence of elements that are larger than p2 ; see the figure below.

                                      < p2              p2
                                 n − 3 elements

You probably see the pattern. The total running time of the algorithm, i.e.,
the total number of “steps”, is proportional to

              n + (n − 1) + (n − 2) + (n − 3) + · · · + 3 + 2 + 1,
which, by Theorem 2.2.10, is equal to
                          n(n + 1)/2 = n²/2 + n/2,

which, using the Big-Theta notation (see Section 2.3), is Θ(n²), i.e., quadratic
in n. It can be shown that this is, in fact, the worst-case running time of the
QuickSort algorithm.
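This worst case is easy to reproduce by experiment. The sketch below (our own code, not the book's) counts len(S) "steps" for each call, including one step for a one-element sequence, so that with the largest element as pivot the total matches the sum n + (n − 1) + · · · + 1 exactly:

```python
def quick_sort_steps(S, pick_pivot):
    """Count the 'steps' of QuickSort for a given pivot-selection rule."""
    if len(S) <= 1:
        return len(S)   # one step for a single element, none for an empty one
    p = pick_pivot(S)
    S1 = [x for x in S if x < p]
    S2 = [x for x in S if x > p]
    return len(S) + quick_sort_steps(S1, pick_pivot) \
                  + quick_sort_steps(S2, pick_pivot)

n = 100
steps = quick_sort_steps(list(range(n)), pick_pivot=max)  # pivot = largest
print(steps, n * (n + 1) // 2)  # 5050 5050
```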
    What would be a good choice for the pivot elements? Intuitively, a pivot
is good if the sequences S1 and S2 have (roughly) the same size. Thus, after
the first call, we are in the following situation:

                           <p          p        >p
                        (n − 1)/2            (n − 1)/2

     In Section 4.6, we will prove that, if this happens in each recursive call,
the running time of the QuickSort algorithm is only O(n log n). Obviously,
it is not clear at all how we can guarantee that we always choose a good pivot.
It turns out that there is a simple strategy: In each call, choose the pivot
randomly! That is, among all elements involved in the recursive call, pick one
uniformly at random; thus, each element has the same probability of being
chosen. In Section 6.10, we will prove that this leads to an expected running
time of O(n log n).
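The effect of random pivots can already be observed empirically. In the sketch below (our own experiment; the trial count is an arbitrary choice), the measured step counts for n = 1000 are typically on the order of 2n ln n, far below n(n + 1)/2 = 500500:

```python
import random

def random_pivot_steps(S):
    """Count the 'steps' of QuickSort when each pivot is chosen
    uniformly at random from the current subsequence."""
    if len(S) <= 1:
        return len(S)
    p = random.choice(S)
    return (len(S)
            + random_pivot_steps([x for x in S if x < p])
            + random_pivot_steps([x for x in S if x > p]))

n = 1000
trials = [random_pivot_steps(list(range(n))) for _ in range(20)]
print(min(trials), max(trials))  # typically on the order of 2n ln n ≈ 13800
```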
Chapter 2

Mathematical Preliminaries

2.1     Basic Concepts
Throughout this book, we will assume that you know the following mathe-
matical concepts:

  1. A set is a collection of well-defined objects. Examples are (i) the set of
     all Dutch Olympic Gold Medallists, (ii) the set of all pubs in Ottawa,
     and (iii) the set of all even natural numbers.

  2. The set of natural numbers is N = {0, 1, 2, 3, . . .}.

  3. The set of integers is Z = {. . . , −3, −2, −1, 0, 1, 2, 3, . . .}.

  4. The set of rational numbers is Q = {m/n : m ∈ Z, n ∈ Z, n ≠ 0}.

  5. The set of real numbers is denoted by R.

  6. The empty set is the set that does not contain any element. This set
     is denoted by ∅.

  7. If A is a finite set, then the size (or cardinality) of A, denoted by |A|,
     is the number of elements in A. Observe that |∅| = 0.

  8. If A and B are sets, then A is a subset of B, written as A ⊆ B, if every
     element of A is also an element of B. For example, the set of even
     natural numbers is a subset of the set of all natural numbers. Every
     set A is a subset of itself, i.e., A ⊆ A. The empty set is a subset of
       every set A, i.e., ∅ ⊆ A. We say that A is a proper subset of B, written
        as A ⊂ B, if A ⊆ B and A ≠ B.

     9. If B is a set, then the power set P(B) of B is defined to be the set of
        all subsets of B:
                                 P(B) = {A : A ⊆ B}.
       Observe that ∅ ∈ P(B) and B ∈ P(B).

 10. If A and B are two sets, then

        (a) their union is defined as

                               A ∪ B = {x : x ∈ A or x ∈ B},

        (b) their intersection is defined as

                              A ∩ B = {x : x ∈ A and x ∈ B},

         (c) their difference is defined as

                              A \ B = {x : x ∈ A and x ∉ B},

        (d) the Cartesian product of A and B is defined as

                            A × B = {(x, y) : x ∈ A and y ∈ B},

         (e) the complement of A is defined as

                                      Ā = {x : x ∉ A}.

 11. A binary relation on two sets A and B is a subset of A × B.

 12. A function f from A to B, denoted by f : A → B, is a binary relation R,
     having the property that for each element a in A, there is exactly one
     ordered pair in R, whose first component is a. If this unique pair is
     (a, b), then we will say that f (a) = b, or f maps a to b, or the image
     of a under f is b. The set A is called the domain of f , and the set

                      {b ∈ B : there is an a ∈ A with f (a) = b}

       is called the range of f .

 13. A function f : A → B is one-to-one (or injective), if for any two distinct
     elements a and a′ in A, we have f (a) ≠ f (a′). The function f is onto
     (or surjective), if for each element b in B, there exists an element a
     in A, such that f (a) = b; in other words, the range of f is equal to the
     set B. A function f is a bijection, if f is both injective and surjective.

 14. A set A is countable, if A is finite or there is a bijection f : N → A.
     The sets N, Z, and Q are countable, whereas R is not.

 15. A graph G = (V, E) is a pair consisting of a set V , whose elements
     are called vertices, and a set E, where each element of E is a pair of
     distinct vertices. The elements of E are called edges.

 16. The Boolean values are 1 and 0, which represent true and false, respec-
     tively. The basic Boolean operations include

       (a) negation (or NOT ), represented by ¬,
       (b) conjunction (or AND), represented by ∧,
        (c) disjunction (or OR), represented by ∨,
       (d) exclusive-or (or XOR), represented by ⊕,
        (e) equivalence, represented by ↔ or ⇔,
        (f) implication, represented by → or ⇒.

       The following table explains the meanings of these operations.
        NOT       AND          OR         XOR        equivalence   implication
        ¬0 = 1   0∧0=0       0∨0=0       0⊕0=0        0↔0=1         0→0=1
        ¬1 = 0   0∧1=0       0∨1=1       0⊕1=1        0↔1=0         0→1=1
                 1∧0=0       1∨0=1       1⊕0=1        1↔0=0         1→0=0
                 1∧1=1       1∨1=1       1⊕1=0        1↔1=1         1→1=1
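Many of these concepts map directly onto Python's built-in set and integer operations; the following sketch (our own illustration, not part of the book) checks a few of the definitions above:

```python
from itertools import combinations, product

A = {0, 2, 4, 6}   # some even natural numbers
B = {0, 1, 2, 3}

# Union, intersection, difference, and Cartesian product (item 10).
print(A | B)                    # union of A and B
print(A & B)                    # intersection: {0, 2}
print(A - B)                    # difference A \ B: {4, 6}
print(len(set(product(A, B))))  # |A x B| = 4 * 4 = 16

# The power set of B (item 9) has 2^|B| = 16 elements.
power_set = [set(c) for r in range(len(B) + 1)
             for c in combinations(sorted(B), r)]
print(len(power_set))           # 16

# The truth tables for XOR and implication (item 16).
for x, y in product([0, 1], repeat=2):
    print(x, y, x ^ y, int((not x) or y))  # x XOR y, and x -> y
```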


2.2      Proof Techniques
              A proof is a proof. What kind of a proof? It’s a proof. A proof
              is a proof. And when you have a good proof, it’s because it’s
              proven.
                  — Jean Chrétien, Prime Minister of Canada (1993–2003)

    In mathematics, a theorem is a statement that is true. A proof is a se-
quence of mathematical statements that form an argument to show that a
theorem is true. The statements in the proof of a theorem include axioms
(assumptions about the underlying mathematical structures), hypotheses of
the theorem to be proved, and previously proved theorems. The main ques-
tion is “How do we go about proving theorems?” This question is similar
to the question of how to solve a given problem. Of course, the answer is
that finding proofs, or solving problems, is not easy; otherwise life would be
dull! There is no specified way of coming up with a proof, but there are some
generic strategies that could be of help. In this section, we review some of
these strategies. The best way to get a feeling of how to come up with a
proof is by solving a large number of problems. Here are some useful tips.
(You may take a look at the book How to Solve It, by George Pólya).
     1. Read and completely understand the statement of the theorem to be
        proved. Most often this is the hardest part.
     2. Sometimes, theorems contain theorems inside them. For example,
        “Property A if and only if property B”, requires showing two state-
        ments:
        (a) If property A is true, then property B is true (A ⇒ B).
        (b) If property B is true, then property A is true (B ⇒ A).
       Another example is the theorem “Set A equals set B.” To prove this,
       we need to prove that A ⊆ B and B ⊆ A. That is, we need to show
       that each element of set A is in set B, and each element of set B is in
       set A.
     3. Try to work out a few simple cases of the theorem just to get a grip on
        it (i.e., crack a few simple cases first).
     4. Try to write down the proof once you think you have it. This is to
        ensure the correctness of your proof. Often, mistakes are found at the
        time of writing.
     5. Finding proofs takes time; we do not come prewired to produce proofs.
        Be patient, think, express and write clearly, and try to be as precise
        as possible.
     In the next sections, we will go through some of the proof strategies.


2.2.1      Direct proofs
As the name suggests, in a direct proof of a theorem, we just approach the
theorem directly.

Theorem 2.2.1 If n is an odd positive integer, then n² is odd as well.

Proof. An odd positive integer n can be written as n = 2k + 1, for some
integer k ≥ 0. Then

             n² = (2k + 1)² = 4k² + 4k + 1 = 2(2k² + 2k) + 1.

Since 2(2k² + 2k) is even, and “even plus one is odd”, we can conclude that
n² is odd.

    For a graph G = (V, E), the degree of a vertex v, denoted by deg(v), is
defined to be the number of edges that are incident on v.

Theorem 2.2.2 Let G = (V, E) be a graph. Then the sum of the degrees of
all vertices is an even integer, i.e.,
                              ∑_{v∈V} deg(v)

is even.

Proof. If you do not see the meaning of this statement, then first try it out
for a few graphs. The reason why the statement holds is very simple: Each
edge contributes 2 to the summation (because an edge is incident on exactly
two distinct vertices).

   Actually, the proof above proves the following theorem.

Theorem 2.2.3 Let G = (V, E) be a graph. Then the sum of the degrees of
all vertices is equal to twice the number of edges, i.e.,
                         ∑_{v∈V} deg(v) = 2|E|.
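The handshaking argument can also be checked by exhaustive computation. Here is a Python sketch (the function name is ours) that verifies the identity for every graph on four vertices.

```python
import itertools

def degree_sum_equals_twice_edges(vertices, edges):
    """Check: the sum of all vertex degrees equals 2|E|."""
    deg = {v: 0 for v in vertices}
    for u, w in edges:       # each edge contributes 1 to two degrees
        deg[u] += 1
        deg[w] += 1
    return sum(deg.values()) == 2 * len(edges)

# Verify the identity on every graph with vertex set {0, 1, 2, 3}.
V = [0, 1, 2, 3]
possible_edges = list(itertools.combinations(V, 2))
for k in range(len(possible_edges) + 1):
    for E in itertools.combinations(possible_edges, k):
        assert degree_sum_equals_twice_edges(V, E)
```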


2.2.2    Constructive proofs
This technique not only shows the existence of a certain object, it actually
gives a method of creating it:

Theorem 2.2.4 There exists an object with property P.

Proof. Here is the object: [. . .]
  And here is the proof that the object satisfies property P: [. . .]

    A graph is called 3-regular if each vertex has degree three. We prove the
following theorem using a constructive proof.

Theorem 2.2.5 For every even integer n ≥ 4, there exists a 3-regular graph
with n vertices.

Proof. Let
                          V = {0, 1, 2, . . . , n − 1},
and

E = {{i, i+1} : 0 ≤ i ≤ n−2}∪{{n−1, 0}}∪{{i, i+n/2} : 0 ≤ i ≤ n/2−1}.

Then the graph G = (V, E) is 3-regular.
   Convince yourself that this graph is indeed 3-regular. It may help to draw
the graph for, say, n = 8.
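The construction in this proof is easy to carry out in code. The following Python sketch (the function names are ours) builds the cycle-plus-diagonals graph from the proof and checks 3-regularity for several even values of n.

```python
def three_regular_graph(n):
    """Build the graph from the proof of Theorem 2.2.5: a cycle on
    vertices 0..n-1 plus the 'diagonal' edges {i, i + n/2}."""
    assert n >= 4 and n % 2 == 0
    V = range(n)
    cycle = {frozenset({i, i + 1}) for i in range(n - 1)} | {frozenset({n - 1, 0})}
    diagonals = {frozenset({i, i + n // 2}) for i in range(n // 2)}
    return V, cycle | diagonals

def degrees(V, E):
    """Degree of each vertex: the number of edges containing it."""
    return {v: sum(v in e for e in E) for v in V}

for n in range(4, 21, 2):
    V, E = three_regular_graph(n)
    assert all(d == 3 for d in degrees(V, E).values())
```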


2.2.3    Nonconstructive proofs
In a nonconstructive proof, we show that a certain object exists, without
actually creating it. Here is an example of such a proof:

Theorem 2.2.6 There exist irrational numbers x and y such that xy is ra-
tional.

Proof. There are two possible cases.

Case 1: √2^√2 ∈ Q.
    In this case, we take x = y = √2. In Theorem 2.2.9 below, we will prove
that √2 is irrational.

Case 2: √2^√2 ∉ Q.
    In this case, we take x = √2^√2 and y = √2. Since

             x^y = (√2^√2)^√2 = √2^(√2·√2) = √2^2 = 2,

the claim in the theorem follows.

   Observe that this proof indeed proves the theorem, but it does not give
an example of a pair of irrational numbers x and y such that xy is rational.

2.2.4     Proofs by contradiction
This is what a proof by contradiction looks like:

Theorem 2.2.7 Statement S is true.

Proof. Assume that statement S is false. Then, derive a contradiction (such
as 1 + 1 = 3).
    In other words, we show that the statement “¬S ⇒ false” is true. This
is sufficient, because the contrapositive of the statement “¬S ⇒ false” is the
statement “true ⇒ S”. The latter logical formula is equivalent to S, and
that is what we wanted to show.

   Below, we give two examples of proofs by contradiction.

Theorem 2.2.8 Let n be a positive integer. If n² is even, then n is even.

Proof. We will prove the theorem by contradiction. Thus, we assume that
n² is even, but n is odd. Since n is odd, we know from Theorem 2.2.1 that
n² is odd. This is a contradiction, because we assumed that n² is even.

Theorem 2.2.9 √2 is irrational, i.e., √2 cannot be written as a fraction of
two integers.

Proof. We will prove the theorem by contradiction. Thus, we assume that
√2 is rational. Then √2 can be written as a fraction of two integers m ≥ 1
and n ≥ 1, i.e., √2 = m/n. We may assume that m and n do not share
any common factors, i.e., the greatest common divisor of m and n is equal
to one; if this is not the case, then we can get rid of the common factors. By
squaring √2 = m/n, we get 2n² = m². This implies that m² is even. Then,
by Theorem 2.2.8, m is even, which means that we can write m as m = 2k,
for some positive integer k. It follows that 2n² = m² = 4k², which implies
that n² = 2k². Hence, n² is even. Again by Theorem 2.2.8, it follows that n
is even.
    We have shown that m and n are both even. But since m and n do not
share any common factors, they are not both even. Hence, we have a contra-
diction. Our assumption that √2 is rational is wrong. Thus, we can conclude
that √2 is irrational.

   There is a nice discussion of this proof in the book My Brain is Open:
The Mathematical Journeys of Paul Erdős by Bruce Schechter.


2.2.5     Proofs by induction
This is a very powerful and important technique for proving theorems.
    For each positive integer n, let P (n) be a mathematical statement that
depends on n. Assume we wish to prove that P (n) is true for all positive
integers n. A proof by induction of such a statement is carried out as follows:

Base Case: Prove that P (1) is true.

Induction Step: Prove that for all n ≥ 1, the following holds: If P (n) is
    true, then P (n + 1) is also true.

    In the induction step, we choose an arbitrary integer n ≥ 1 and assume
that P (n) is true; this is called the induction hypothesis. We then prove that
P (n + 1) is also true.

Theorem 2.2.10 For all positive integers n, we have

                   1 + 2 + 3 + · · · + n = n(n + 1)/2.

Proof. We start with the base case of the induction. If n = 1, then both the
left-hand side and the right-hand side are equal to 1. Therefore, the theorem
is true for n = 1.


    For the induction step, let n ≥ 1 and assume that the theorem is true
for n, i.e., assume that

                   1 + 2 + 3 + · · · + n = n(n + 1)/2.

We have to prove that the theorem is true for n + 1, i.e., we have to prove
that
               1 + 2 + 3 + · · · + (n + 1) = (n + 1)(n + 2)/2.
Here is the proof:

    1 + 2 + 3 + · · · + (n + 1) = (1 + 2 + 3 + · · · + n) + (n + 1)
                                = n(n + 1)/2 + (n + 1)
                                = (n + 1)(n + 2)/2.


   By the way, here is an alternative proof of the theorem above: Let S =
1 + 2 + 3 + · · · + n. Then,
       S     =      1      +      2      +      3      +   ···       +    (n − 2)   +   (n − 1)   +      n
       S     =      n      +   (n − 1)   +   (n − 2)   +   ···       +       3      +      2      +      1
       2S    =   (n + 1)   +   (n + 1)   +   (n + 1)   +   ···       +    (n + 1)   +   (n + 1)   +   (n + 1)


Since there are n terms on the right-hand side, we have 2S = n(n + 1). This
implies that S = n(n + 1)/2.
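Both proofs yield the same closed form, which a quick loop can sanity-check numerically:

```python
# Check 1 + 2 + ... + n = n(n+1)/2 for the first two thousand n.
# Integer division // is exact here, since n(n+1) is always even.
for n in range(1, 2001):
    assert sum(range(1, n + 1)) == n * (n + 1) // 2
```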

Theorem 2.2.11 For every positive integer n, a − b is a factor of aⁿ − bⁿ.

Proof. A direct proof can be given by providing a factorization of aⁿ − bⁿ:

    aⁿ − bⁿ = (a − b)(aⁿ⁻¹ + aⁿ⁻²b + aⁿ⁻³b² + · · · + abⁿ⁻² + bⁿ⁻¹).

We now prove the theorem by induction. For the base case, let n = 1. The
claim in the theorem is “a − b is a factor of a − b”, which is obviously true.
    Let n ≥ 1 and assume that a − b is a factor of aⁿ − bⁿ. We have to prove
that a − b is a factor of aⁿ⁺¹ − bⁿ⁺¹. We have

    aⁿ⁺¹ − bⁿ⁺¹ = aⁿ⁺¹ − aⁿb + aⁿb − bⁿ⁺¹ = aⁿ(a − b) + (aⁿ − bⁿ)b.


The first term on the right-hand side is divisible by a − b. By the induction
hypothesis, the second term on the right-hand side is divisible by a − b as
well. Therefore, the entire right-hand side is divisible by a − b. Since the
right-hand side is equal to aⁿ⁺¹ − bⁿ⁺¹, it follows that a − b is a factor of
aⁿ⁺¹ − bⁿ⁺¹.
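The factorization argument can be spot-checked numerically. The loop below (the ranges are chosen arbitrarily) confirms divisibility for small integer values of a, b, and n, skipping a = b to avoid dividing by zero:

```python
# Spot-check Theorem 2.2.11: a - b divides a**n - b**n for integers a != b.
for a in range(-5, 6):
    for b in range(-5, 6):
        if a == b:
            continue  # a - b would be 0
        for n in range(1, 8):
            assert (a**n - b**n) % (a - b) == 0
```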

     We now give an alternative proof of Theorem 2.2.3:

Theorem 2.2.12 Let G = (V, E) be a graph with m edges. Then the sum
of the degrees of all vertices is equal to twice the number of edges, i.e.,

                         ∑_{v∈V} deg(v) = 2m.

Proof. The proof is by induction on the number m of edges. For the base
case of the induction, assume that m = 0. Then the graph G does not
contain any edges and, therefore, ∑_{v∈V} deg(v) = 0. Thus, the theorem is
true if m = 0.
    Let m ≥ 0 and assume that the theorem is true for every graph with m
edges. Let G be an arbitrary graph with m + 1 edges. We have to prove that
∑_{v∈V} deg(v) = 2(m + 1).
    Let {a, b} be an arbitrary edge in G, and let G′ be the graph obtained
from G by removing the edge {a, b}. Since G′ has m edges, we know from
the induction hypothesis that the sum of the degrees of all vertices in G′ is
equal to 2m. Using this, we obtain

        ∑_{v∈G} deg(v) = ∑_{v∈G′} deg(v) + 2 = 2m + 2 = 2(m + 1).




2.2.6      More examples of proofs
Recall Theorem 2.2.5, which states that for every even integer n ≥ 4, there
exists a 3-regular graph with n vertices. The following theorem explains why
we stated this theorem for even values of n.

Theorem 2.2.13 Let n ≥ 5 be an odd integer. There is no 3-regular graph
with n vertices.


Proof. The proof is by contradiction. Thus, we assume that there exists a
graph G = (V, E) with n vertices that is 3-regular. Let m be the number of
edges in G. Since deg(v) = 3 for every vertex v, we have
                         ∑_{v∈V} deg(v) = 3n.

On the other hand, by Theorem 2.2.3, we have
                         ∑_{v∈V} deg(v) = 2m.

It follows that
                                     3n = 2m.
Since n is an odd integer, the left-hand side in this equation is an odd integer
as well. The right-hand side, however, is an even integer. This is a contra-
diction.


    Let Kn be the complete graph on n vertices. This graph has a vertex set
of size n, and every pair of distinct vertices is joined by an edge.
    If G = (V, E) is a graph with n vertices, then the complement Ḡ of G is
the graph with vertex set V that consists of those edges of Kn that are not
present in G.

Theorem 2.2.14 Let n ≥ 2 and let G be a graph on n vertices. Then G is
connected or Ḡ is connected.

Proof. We prove the theorem by induction on the number n of vertices. For
the base case, assume that n = 2. There are two possibilities for the graph
G:
  1. G contains one edge. In this case, G is connected.
  2. G does not contain an edge. In this case, the complement Ḡ contains
     one edge and, therefore, Ḡ is connected.
Thus, for n = 2, the theorem is true.
    Let n ≥ 2 and assume that the theorem is true for every graph with n
vertices. Let G be a graph with n + 1 vertices. We have to prove that G is
connected or Ḡ is connected. We consider three cases.

Case 1: There is a vertex v whose degree in G is equal to n.
    Since G has n + 1 vertices, v is connected by an edge to every other vertex
of G. Therefore, G is connected.
Case 2: There is a vertex v whose degree in G is equal to 0.
    In this case, the degree of v in the complement Ḡ is equal to n. Since Ḡ
has n + 1 vertices, v is connected by an edge in Ḡ to every other vertex.
Therefore, Ḡ is connected.
Case 3: For every vertex v, the degree of v in G is in {1, 2, . . . , n − 1}.
    Let v be an arbitrary vertex of G. Let G′ be the graph obtained by
deleting from G the vertex v, together with all edges that are incident on v.
Since G′ has n vertices, we know from the induction hypothesis that G′ is
connected or its complement Ḡ′ is connected.
    Let us first assume that G′ is connected. Then the graph G is connected
as well, because there is at least one edge in G between v and some vertex
of G′.
    If G′ is not connected, then Ḡ′ must be connected. Since we are in Case 3,
we know that the degree of v in G is in the set {1, 2, . . . , n − 1}. It follows
that the degree of v in the complement Ḡ is in this set as well. Hence, there
is at least one edge in Ḡ between v and some vertex of Ḡ′. This implies
that Ḡ is connected.
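Because the theorem quantifies over all graphs, small cases can be verified exhaustively. Here is a Python sketch (the function names are ours) that checks every graph on up to five vertices, using a simple graph search for connectivity:

```python
import itertools

def connected(n, edges):
    """A graph on vertices 0..n-1 is connected if every vertex is
    reachable from vertex 0 (assumes n >= 1)."""
    adj = {v: set() for v in range(n)}
    for u, w in edges:
        adj[u].add(w)
        adj[w].add(u)
    seen, stack = {0}, [0]
    while stack:
        v = stack.pop()
        for w in adj[v] - seen:
            seen.add(w)
            stack.append(w)
    return len(seen) == n

def complement(n, edges):
    """Edges of K_n that are not present in the graph."""
    return {e for e in itertools.combinations(range(n), 2) if e not in edges}

# Exhaustively verify Theorem 2.2.14 for all graphs on 2..5 vertices.
for n in range(2, 6):
    pairs = list(itertools.combinations(range(n), 2))
    for k in range(len(pairs) + 1):
        for E in itertools.combinations(pairs, k):
            E = set(E)
            assert connected(n, E) or connected(n, complement(n, E))
```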

     The previous theorem can be rephrased as follows:

Theorem 2.2.15 Let n ≥ 2 and consider the complete graph Kn on n ver-
tices. Color each edge of this graph as either red or blue. Let R be the graph
consisting of all the red edges, and let B be the graph consisting of all the
blue edges. Then R is connected or B is connected.

   If you like surprising proofs of various mathematical results, you should
read the book Proofs from THE BOOK by Martin Aigner and Günter Ziegler.


2.3       Asymptotic Notation
Let f : N → R and g : N → R be functions such that f (n) > 0 and g(n) > 0
for all n ∈ N.


   • We say that f (n) = O(g(n)) if the following is true: There exist con-
     stants c > 0 and k > 0 such that for all n ≥ k,

                                  f (n) ≤ c · g(n).

   • We say that f (n) = Ω(g(n)) if the following is true: There exist con-
     stants c > 0 and k > 0 such that for all n ≥ k,

                                  f (n) ≥ c · g(n).

   • We say that f (n) = Θ(g(n)) if f (n) = O(g(n)) and f (n) = Ω(g(n)).
     Thus, there exist constants c > 0, c′ > 0, and k > 0 such that for all
     n ≥ k,
                           c · g(n) ≤ f (n) ≤ c′ · g(n).

   For example, for all n ≥ 1, we have

                  13 + 7n − 5n² + 8n³ ≤ 13 + 7n + 8n³
                                      ≤ 13n³ + 7n³ + 8n³
                                      = 28n³.

Thus, by taking c = 28 and k = 1, it follows that

                        13 + 7n − 5n² + 8n³ = O(n³).                 (2.1)

We also have
                     13 + 7n − 5n² + 8n³ ≥ −5n² + 8n³.
Since n³ ≥ 5n² for all n ≥ 5, it follows that, again for all n ≥ 5,

                    13 + 7n − 5n² + 8n³ ≥ −5n² + 8n³
                                        ≥ −n³ + 8n³
                                        = 7n³.

Hence, by taking c = 7 and k = 5, we have shown that

                        13 + 7n − 5n² + 8n³ = Ω(n³).                 (2.2)

It follows from (2.1) and (2.2) that

                        13 + 7n − 5n² + 8n³ = Θ(n³).
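The constants derived above (c = 28, k = 1 for the upper bound; c = 7, k = 5 for the lower bound) can be confirmed numerically over a large range of n:

```python
def f(n):
    """The example polynomial from the text."""
    return 13 + 7*n - 5*n**2 + 8*n**3

# Upper bound holds from n = 1, lower bound from n = 5.
for n in range(1, 10001):
    assert f(n) <= 28 * n**3
    if n >= 5:
        assert 7 * n**3 <= f(n)
```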


2.4       Logarithms
If b and x are real numbers with b > 1 and x > 0, then log_b x denotes the
logarithm of x with base b. Note that

                   log_b x = y if and only if b^y = x.

If b = 2, then we write log x instead of log_2 x. We write ln x to refer to the
natural logarithm of x with base e.

Lemma 2.4.1 If b > 1 and x > 0, then

                             b^(log_b x) = x.

Proof. We have seen above that y = log_b x if and only if b^y = x. Thus, if
we write y = log_b x, then b^(log_b x) = b^y = x.

    For example, if x > 0, then
                              2^(log x) = x.

Lemma 2.4.2 If b > 1, x > 0, and a is a real number, then

                        log_b (x^a) = a · log_b x.

Proof. Let y = log_b x. Then
                            a · log_b x = ay.
Since y = log_b x, we have b^y = x and, thus,
                          x^a = (b^y)^a = b^(ay).
Taking logarithms (with base b) on both sides gives

               log_b (x^a) = log_b (b^(ay)) = ay = a · log_b x.

    For example, for x > 1, we get

                       2 log log x = log(log² x)

and
               2^(2 log log x) = 2^(log(log² x)) = log² x.


Lemma 2.4.3 If b > 1, c > 1, and x > 0, then

                       log_b x = log_c x / log_c b.

Proof. Let y = log_b x. Then b^y = x, and we get

             log_c x = log_c (b^y) = y · log_c b = log_b x · log_c b.

Dividing both sides by log_c b gives the claim.

   For example, if x > 0, then
                           log x = ln x / ln 2.
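The three lemmas can be sanity-checked with floating-point arithmetic; since the identities hold only approximately in floating point, we compare with `math.isclose`:

```python
import math

# Numeric checks of Lemmas 2.4.1-2.4.3 for a few bases and arguments.
for x in [0.5, 1.0, 2.0, 10.0, 123.456]:
    for b in [2.0, math.e, 10.0]:
        y = math.log(x, b)                                 # y = log_b x
        assert math.isclose(b**y, x)                       # Lemma 2.4.1
        assert math.isclose(math.log(x**3, b), 3 * y)      # Lemma 2.4.2
        assert math.isclose(y, math.log(x) / math.log(b))  # Lemma 2.4.3
```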

2.5        Exercises

       Proofs that use a big hammer:

       Theorem: For any integer n ≥ 3, 2^(1/n) is irrational.

       Proof: Assume 2^(1/n) is rational. Then there exist positive integers
       a and b such that 2^(1/n) = a/b. Thus, we have 2 = (a/b)^n, which is
       equivalent to 2 · b^n = a^n, which is equivalent to

                                   b^n + b^n = a^n.

       This contradicts Fermat’s Last Theorem.

2.1 Prove that √p is irrational for every prime number p.

2.2 Let n be a positive integer that is not a perfect square. Prove that √n
is irrational.

2.3 Use induction to prove that every integer n ≥ 2 can be written as a
product of prime numbers.

2.4 Prove by induction that n⁴ − 4n² is divisible by 3, for all integers n ≥ 1.


2.5 Prove that
                       ∑_{i=1}^{n} 1/i² < 2 − 1/n,
for all integers n ≥ 2.

2.6 Prove that 9 divides n³ + (n + 1)³ + (n + 2)³, for all integers n ≥ 0.
                                                                n
2.7 The Fermat numbers F0 , F1 , F2 , . . . are defined by Fn = 22 +1 for n ≥ 0.

     • Prove by induction that

                                 F0 F1 F2 · · · Fn−1 = Fn − 2

       for all integers n ≥ 1.

     • Prove that for any two distinct integers n ≥ 0 and m ≥ 0, the greatest
       common divisor of Fn and Fm is equal to 1.

     • Conclude that there are infinitely many prime numbers.

2.8 Prove by induction that n! > 2^(1+n/2) for all integers n ≥ 3.
Chapter 3

Counting

             There are three types of people, those who can count and those
             who cannot count.


  Given a set of 23 elements, how many subsets of size 17 are there? How
many solutions are there to the equation

                          x1 + x2 + · · · + x12 = 873,

where x1 ≥ 0, x2 ≥ 0, . . . , x12 ≥ 0 are integers? In this chapter, we will
introduce some general techniques that can be used to answer questions of
these types.


3.1      The Product Rule
How many strings of two characters are there, if the first character must be
an uppercase letter and the second character must be a digit? Examples
of such strings are A0, K7, and Z9. The answer is obviously 26 · 10 = 260,
because there are 26 choices for the first character and, no matter which letter
we choose for being the first character, there are 10 choices for the second
character. We can look at this in the following way: Consider the “procedure”
of writing a string of two characters, the first one being an uppercase letter,
and the second one being a digit. Then our original question becomes “how
many ways are there to perform this procedure?” Observe that the procedure
consists of two “tasks”, the first one being writing the first character, and the


second one being writing the second character. Obviously, there are 26 ways
to do the first task. Next, observe that, regardless of how we do the first
task, there are 10 ways to do the second task. The Product Rule states that
the total number of ways to perform the entire procedure is 26 · 10 = 260.
     Product Rule: Assume a procedure consists of performing a se-
     quence of m tasks in order. Furthermore, assume that for each
     i = 1, 2, . . . , m, there are Ni ways to do the i-th task, regardless
     of how the first i − 1 tasks were done. Then, there are N1 N2 · · · Nm
     ways to do the entire procedure.

In the example above, we have m = 2, N1 = 26, and N2 = 10.

3.1.1       Counting Bitstrings of Length n
Let n ≥ 1 be an integer. A bitstring of length n is a sequence of 0’s and 1’s.
How many bitstrings of length n are there? To apply the Product Rule, we
have to specify the “procedure” and the “tasks”:
     • The procedure is “write a bitstring of length n”.

     • For i = 1, 2, . . . , n, the i-th task is “write one bit”.
There are two ways to do the i-th task, regardless of how we did the first
i − 1 tasks. Therefore, we can apply the Product Rule with Ni = 2 for
i = 1, 2, . . . , n, and conclude that there are N1 N2 · · · Nn = 2ⁿ ways to do the
entire procedure. As a result, the number of bitstrings of length n is equal
to 2ⁿ.

Theorem 3.1.1 For any integer n ≥ 1, the number of bitstrings of length n
is equal to 2ⁿ.
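This theorem is easy to confirm by brute force: `itertools.product` enumerates exactly the bitstrings described by the Product Rule argument.

```python
import itertools

# Enumerate all bitstrings of length n and check that there are 2**n.
for n in range(1, 11):
    bitstrings = list(itertools.product([0, 1], repeat=n))
    assert len(bitstrings) == 2**n
    assert len(set(bitstrings)) == 2**n  # all of them are distinct
```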

3.1.2       Counting Functions
Let m ≥ 1 and n ≥ 1 be integers, let A be a set of size m, and let B be a set
of size n. How many functions f : A → B are there?
    Write the set A as A = {a1 , a2 , . . . , am }. Any function f : A → B is
completely specified by the values f (a1 ), f (a2 ), . . . , f (am ), where each such
value can be any element of B. Again, we are going to apply the Product
Rule. Thus, we have to specify the “procedure” and the “tasks”:


   • The procedure is “specify the values f (a1 ), f (a2 ), . . . , f (am )”.

   • For i = 1, 2, . . . , m, the i-th task is “specify the value f (ai )”.




    [Figure: the function f maps each element ai of A to an element f (ai ) of B.]

    For each i, f (ai ) can be any of the n elements of B. As a result, there are
Ni = n ways to do the i-th task, regardless of how we did the first i − 1 tasks.
By the Product Rule, there are N1 N2 · · · Nm = nᵐ ways to do the entire
procedure and, hence, this many functions f : A → B. We have proved the
following result:

Theorem 3.1.2 Let m ≥ 1 and n ≥ 1 be integers, let A be a set of size m,
and let B be a set of size n. The number of functions f : A → B is equal
to nᵐ.
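The count nᵐ can be confirmed by enumerating functions as tuples of values (f(a1), . . . , f(am)), exactly as in the argument above; the helper name below is ours.

```python
import itertools

def count_functions(A, B):
    """Count all functions f: A -> B by enumerating the value tuples
    (f(a1), ..., f(am)), where each value ranges over B."""
    return sum(1 for _ in itertools.product(B, repeat=len(A)))

m, n = 3, 4
assert count_functions(range(m), range(n)) == n**m
```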

    Recall that a function f : A → B is one-to-one if for any i and j with
i ≠ j, we have f (ai ) ≠ f (aj ). How many one-to-one functions f : A → B
are there?
    If m > n, then there is no such function. (Do you see why?) Assume
that m ≤ n. To determine the number of one-to-one functions, we use the
same procedure and tasks as before.

   • Since f (a1 ) can be any of the n elements of B, there are N1 = n ways
     to do the first task.

   • In the second task, we have to specify the value f (a2 ). Since the func-
     tion f is one-to-one and since we have already specified f (a1 ), we can
     choose f (a2 ) to be any of the n − 1 elements in the set B \ {f (a1 )}. As
     a result, there are N2 = n − 1 ways to do the second task. Note that
     this is true, regardless of how we did the first task.


     • In general, in the i-th task, we have to specify the value f (ai ). Since
       we have already specified f (a1 ), f (a2 ), . . . , f (ai−1 ), we can choose f (ai )
       to be any of the n − i + 1 elements in the set
                               B \ {f (a1 ), f (a2 ), . . . , f (ai−1 )}.
       As a result, there are Ni = n − i + 1 ways to do the i-th task. Note
       that this is true, regardless of how we did the first i − 1 tasks.
By the Product Rule, there are
                  N1 N2 · · · Nm = n(n − 1)(n − 2) · · · (n − m + 1)
ways to do the entire procedure, which is also the number of one-to-one
functions f : A → B.
   Recall the factorial function:

           k! = 1 if k = 0,  and  k! = 1 · 2 · 3 · · · k if k ≥ 1.
We can simplify the product

                   n(n − 1)(n − 2) · · · (n − m + 1)

by observing that it is “almost” a factorial:

    n(n − 1)(n − 2) · · · (n − m + 1)
      = [n(n − 1)(n − 2) · · · (n − m + 1) · (n − m)(n − m − 1) · · · 1] / [(n − m)(n − m − 1) · · · 1]
      = [n(n − 1)(n − 2) · · · 1] / [(n − m)(n − m − 1) · · · 1]
      = n! / (n − m)!.
We have proved the following result:
Theorem 3.1.3 Let m ≥ 1 and n ≥ 1 be integers, let A be a set of size m,
and let B be a set of size n.
     1. If m > n, then there is no one-to-one function f : A → B.
     2. If m ≤ n, then the number of one-to-one functions f : A → B is equal
        to
                                 n! / (n − m)!.
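A one-to-one function corresponds to an ordered selection of m distinct elements of B, so `itertools.permutations` enumerates them directly; the helper name below is ours.

```python
import itertools
from math import factorial

def count_one_to_one(m, n):
    """Count injections f: A -> B with |A| = m and |B| = n by
    enumerating m-permutations of B's elements."""
    return sum(1 for _ in itertools.permutations(range(n), m))

for m in range(1, 6):
    for n in range(1, 6):
        expected = 0 if m > n else factorial(n) // factorial(n - m)
        assert count_one_to_one(m, n) == expected
```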


3.1.3     Placing Books on Shelves
Let m ≥ 1 and n ≥ 1 be integers, and consider m books B1 , B2 , . . . , Bm and
n bookshelves S1 , S2 , . . . , Sn . How many ways are there to place the books
on the shelves? Placing the books on the shelves means that

   • we specify for each book the shelf at which this book is placed, and

   • we specify for each shelf the left-to-right order of the books that are
     placed on that shelf.

Some bookshelves may be empty. We assume that each shelf is large enough
to fit all books. In the figure below, you see two different placements.

    [Figure: two different placements of the books B1 , . . . , B5 on the shelves
    S1 , S2 , S3 .]


   We are again going to use the Product Rule to determine the number of
placements.

   • The procedure is “place the m books on the n shelves”.

   • For i = 1, 2, . . . , m, the i-th task is “place book Bi on the shelves”.
     When placing book Bi , we can place it on the far left or far right
     of any shelf or between any two of the books B1 , . . . , Bi−1 that have
     already been placed.

Let us see how many ways there are to do each task.

   • Just before we place book B1 , all shelves are empty. Therefore, there
     are N1 = n ways to do the first task.

   • In the second task, we have to place book B2 . Since B1 has already
     been placed, we have the following possibilities for placing B2 :

        – We place B2 on the far left of any of the n shelves.


          – We place B2 immediately to the right of B1 .

       As a result, there are N2 = n + 1 ways to do the second task. Note
       that this is true, regardless of how we did the first task.

     • In general, in the i-th task, we have to place book Bi . Since the books
       B1 , B2 , . . . , Bi−1 have already been placed, we have the following pos-
       sibilities for placing Bi :

          – We place Bi on the far left of any of the n shelves.
          – We place Bi immediately to the right of one of B1 , B2 , . . . , Bi−1 .

       As a result, there are Ni = n + i − 1 ways to do the i-th task. Note
       that this is true, regardless of how we did the first i − 1 tasks.

Thus, by the Product Rule, there are

                 N1 N2 · · · Nm = n(n + 1)(n + 2) · · · (n + m − 1)

ways to do the entire procedure, which is also the number of placements of
the m books on the n shelves. As before, we use factorials to simplify this
product:

    n(n + 1)(n + 2) · · · (n + m − 1)
      = [1 · 2 · 3 · · · (n − 1) · n(n + 1)(n + 2) · · · (n + m − 1)] / [1 · 2 · 3 · · · (n − 1)]
      = (n + m − 1)! / (n − 1)!.

We have proved the following result:

Theorem 3.1.4 Let m ≥ 1 and n ≥ 1 be integers. The number of ways to
place m books on n bookshelves is equal to

                         (n + m − 1)! / (n − 1)!.
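The insertion procedure from the proof can be simulated directly. The sketch below (the names are ours) builds every placement by inserting books one at a time into every available slot, collects the distinct outcomes in a set, and compares the count with the formula.

```python
from math import factorial

def placements(m, n):
    """Build all placements of books 1..m on shelves 1..n, following the
    proof of Theorem 3.1.4: each book goes on the far left of a shelf or
    immediately to the right of an already-placed book."""
    states = {tuple(() for _ in range(n))}  # start with n empty shelves
    for book in range(1, m + 1):
        new_states = set()
        for shelves in states:
            for s, shelf in enumerate(shelves):
                for pos in range(len(shelf) + 1):  # every insertion slot
                    new_shelf = shelf[:pos] + (book,) + shelf[pos:]
                    new_states.add(shelves[:s] + (new_shelf,) + shelves[s + 1:])
        states = new_states
    return states

for m in range(1, 5):
    for n in range(1, 4):
        assert len(placements(m, n)) == factorial(n + m - 1) // factorial(n - 1)
```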


3.2        The Bijection Rule
Let n ≥ 0 be an integer and let S be a set with n elements. How many
subsets does S have? If n = 0, then S = ∅ and there is only one subset of S,
namely S itself. Assume from now on that n ≥ 1. As we will see below,
asking for the number of subsets of S is exactly the same as asking for the
number of bitstrings of length n.
   Let A and B be finite sets. Recall that a function f : A → B is a bijection
if

   • f is one-to-one, i.e., if a ≠ a′ then f (a) ≠ f (a′ ), and

   • f is onto, i.e., for each b in B, there is an a in A such that f (a) = b.

This means that

   • each element of A corresponds to a unique element of B and

   • each element of B corresponds to a unique element of A.

It should be clear that this means that A and B contain the same number
of elements.
       Bijection Rule: Let A and B be finite sets. If there exists a bijection
       f : A → B, then |A| = |B|, i.e., A and B have the same size.

   Let us see how we can apply this rule to the subset problem. We define
the following two sets A and B:

   • A = P(S), i.e., the power set of S, which is the set of all subsets of S:

                                  P(S) = {T : T ⊆ S}.

   • B is the set of all bitstrings of length n.

We have seen in Theorem 3.1.1 that the set B has size 2^n . Therefore, if we
can show that there exists a bijection f : A → B, then, according to the
Bijection Rule, we have |A| = |B| and, thus, the number of subsets of S is
equal to 2^n .
    Write the set S as S = {s1 , s2 , . . . , sn }. We define the function f : A → B
in the following way:


   • For any T ∈ A (i.e., T ⊆ S), f (T ) is the bitstring b1 b2 . . . bn , where

                      bi = 1 if si ∈ T , and bi = 0 if si ∉ T .


For example, assume that n = 5.


     • If T = {s1 , s3 , s4 }, then f (T ) = 10110.


     • If T = S = {s1 , s2 , s3 , s4 , s5 }, then f (T ) = 11111.


     • If T = ∅, then f (T ) = 00000.


    It is not difficult to see that each subset of S corresponds to a unique
bitstring of length n, and each bitstring of length n corresponds to a unique
subset of S. Therefore, this function f is a bijection between A and B.
    Thus, we have shown that there exists a bijection f : A → B. This,
together with Theorem 3.1.1 and the Bijection Rule, implies the following
result:



Theorem 3.2.1 For any integer n ≥ 0, the number of subsets of a set of
size n is equal to 2^n .



    You will probably have noticed that we could have proved this result
directly using the Product Rule: The procedure “specify a subset of S =
{s1 , s2 , . . . , sn }” can be carried out by specifying, for i = 1, 2, . . . , n, whether
or not si is contained in the subset. For each i, there are two choices. As a
result, there are 2^n ways to do the procedure.
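To make the correspondence concrete, here is a short Python sketch (our own illustration; the function name f and the encoding of s1 , . . . , s5 as the integers 1, . . . , 5 are our choices, not from the text). It builds f for n = 5 and checks that the 32 subsets map to 32 distinct bitstrings:

```python
from itertools import chain, combinations

n = 5
S = list(range(1, n + 1))  # stand-ins for s1, ..., s5

def f(T):
    # Map a subset T of S to the bitstring b1 b2 ... bn with bi = 1 iff si is in T.
    return "".join("1" if i in T else "0" for i in S)

# All subsets of S, generated as combinations of every possible size.
subsets = [set(c) for c in
           chain.from_iterable(combinations(S, r) for r in range(n + 1))]
images = {f(T) for T in subsets}

print(f({1, 3, 4}))   # 10110, as in the example above
print(len(subsets))   # 32
print(len(images))    # 32 distinct bitstrings, so f is one-to-one and onto
```

Since the number of distinct images equals the number of subsets, no two subsets map to the same bitstring, and all 2^5 bitstrings are hit.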
    To conclude this section, we remark that we have already been using the
Bijection Rule in Section 3.1!

               The Product Rule and the Bijection Rule: In order to
               apply the Product Rule to a counting problem, we need the
               following:

                  1. Phrase the counting problem in terms of doing a proce-
                     dure, consisting of a number of tasks.

                  2. There must be a one-to-one correspondence between the
                     different ways to do the procedure and the objects we
                     are counting. In other words:

                      (a) Each way to do the procedure must correspond to
                          a unique object we are counting.
                      (b) Conversely, each object we are counting must cor-
                          respond to a unique way to do the procedure.

                  3. Once we have this one-to-one correspondence, the Bi-
                     jection Rule implies that the number of objects is equal
                     to the number of ways to do the procedure.



3.3        The Complement Rule
Consider strings consisting of 8 characters, each character being a lowercase
letter or a digit. Such a string is called a valid password, if it contains at
least one digit. How many valid passwords are there? One way to answer
this question is to first count the valid passwords with exactly one digit,
then count the valid passwords with exactly two digits, then count the valid
passwords with exactly three digits, etc. As we will see below, it is easier to
first count the strings that do not contain any digit.
    Recall that the difference U \ A of the two sets U and A is defined as

                        U \ A = {x : x ∈ U and x 6∈ A}.


       Complement Rule: Let U be a finite set and let A be a subset of U .
       Then
                          |A| = |U | − |U \ A|.


    This rule follows easily from the fact that |U | = |A| + |U \ A|, which holds
because each element in U is either in A or in U \ A.
    To apply the Complement Rule to the password problem, let U be the
set of all strings consisting of 8 characters, each character being a lowercase
letter or a digit, and let A be the set of all valid passwords, i.e., all strings
in U that contain at least one digit. Note that U \ A is the set of all strings
of 8 characters, each character being a lowercase letter or a digit, that do
not contain any digit. In other words, U \ A is the set of all strings of 8
characters, where each character is a lowercase letter.
    By the Product Rule, the set U has size 36^8 , because each string in U
has 8 characters, and there are 26 + 10 = 36 choices for each character.
Similarly, the set U \ A has size 26^8 , because there are 26 choices for each
of the 8 characters. Then, by the Complement Rule, the number of valid
passwords is equal to

             |A| = |U | − |U \ A| = 36^8 − 26^8 = 2,612,282,842,880.
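The arithmetic can be checked with a few lines of Python (the variable names are our own):

```python
# Complement Rule applied to the password count from the text.
total = 36 ** 8      # |U|: 8 characters, 26 letters + 10 digits each
no_digit = 26 ** 8   # |U \ A|: all 8 characters are lowercase letters
valid = total - no_digit

print(valid)  # 2612282842880
```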


3.4      The Sum Rule
If A and B are two finite sets that are disjoint, i.e., A ∩ B = ∅, then it is
obvious that the size of A ∪ B is equal to the sum of the sizes of A and B.
     Sum Rule: Let A1 , A2 , . . . , Am be a sequence of finite and pairwise
     disjoint sets. Then

               |A1 ∪ A2 ∪ · · · ∪ Am | = |A1 | + |A2 | + · · · + |Am |.

   Note that we already used this rule in Section 3.3 when we argued why
the Complement Rule is correct!
    To give an example, consider strings consisting of 6, 7, or 8 characters,
each character being a lowercase letter or a digit. Such a string is called a
valid password, if it contains at least one digit. Let A be the set of all valid
passwords. What is the size of A?
    For i = 6, 7, 8, let Ai be the set of all valid passwords of length i. It is
obvious that A = A6 ∪ A7 ∪ A8 . Since the three sets A6 , A7 , and A8 are
pairwise disjoint, we have, by the Sum Rule,

                            |A| = |A6 | + |A7 | + |A8 |.


We have seen in Section 3.3 that |A8 | = 36^8 − 26^8 . By the same arguments,
we have |A6 | = 36^6 − 26^6 and |A7 | = 36^7 − 26^7 . Thus, the number of valid
passwords is equal to

   |A| = (36^6 − 26^6) + (36^7 − 26^7) + (36^8 − 26^8) = 2,684,483,063,360.
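Again, the computation is easy to verify in Python (our own sketch):

```python
# Sum Rule: valid passwords of length 6, 7, or 8, counted per length
# and then added, since the three sets are pairwise disjoint.
valid = sum(36 ** i - 26 ** i for i in (6, 7, 8))

print(valid)  # 2684483063360
```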
                                               



3.5        The Principle of Inclusion and Exclusion
The Sum Rule holds only for sets that are pairwise disjoint. Consider two
finite sets A and B that are not necessarily disjoint. How can we determine
the size of the union A ∪ B? We can start with the sum |A| + |B|, i.e., we
include both A and B. In the Venn diagram below,
   • x is in A but not in B; it is counted exactly once in |A| + |B|,

   • z is in B but not in A; it is counted exactly once in |A| + |B|,

   • y is in A and in B; it is counted exactly twice in |A| + |B|.
Based on this, if we subtract the size of the intersection A∩B, i.e., we exclude
A ∩ B, then we have counted every element of A ∪ B exactly once.




   [Venn diagram: two overlapping sets A and B; x lies in A only, z lies in
   B only, and y lies in the intersection A ∩ B.]



       Inclusion-Exclusion: Let A and B be finite sets. Then

                        |A ∪ B| = |A| + |B| − |A ∩ B|.

   To give an example, let us count the bitstrings of length 17 that start
with 010 or end with 11. Let S be the set of all such bitstrings. Define A to
be the set of all bitstrings of length 17 that start with 010, and define B to
be the set of all bitstrings of length 17 that end with 11. Then S = A ∪ B
and, thus, we have to determine the size of A ∪ B.


   • The size of A is equal to the number of bitstrings of length 14, because
     the first three bits of every string in A are fixed. Therefore, by the
     Product Rule, we have |A| = 2^14 .

   • The size of B is equal to the number of bitstrings of length 15, because
     the last two bits of every string in B are fixed. Therefore, by the
     Product Rule, we have |B| = 2^15 .

   • Each string in A ∩ B starts with 010 and ends with 11. Thus, five bits
     are fixed for every string in A ∩ B. It follows that the size of A ∩ B
     is equal to the number of bitstrings of length 12. Therefore, by the
     Product Rule, we have |A ∩ B| = 2^12 .

By applying the Inclusion-Exclusion formula, it follows that

       |S| = |A ∪ B| = |A| + |B| − |A ∩ B| = 2^14 + 2^15 − 2^12 = 45,056.
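Since 2^17 = 131,072 is small, the count can also be verified by brute force in Python (our own check, not part of the text):

```python
from itertools import product

# Count bitstrings of length 17 that start with 010 or end with 11.
count = 0
for bits in product("01", repeat=17):
    s = "".join(bits)
    if s.startswith("010") or s.endswith("11"):
        count += 1

print(count)                        # 45056
print(2 ** 14 + 2 ** 15 - 2 ** 12)  # 45056
```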

    The Inclusion-Exclusion formula can be generalized to more than two
sets. You are encouraged to verify, using the Venn diagram below, that the
following formula is the correct one for three sets.
         Inclusion-Exclusion: Let A, B, and C be finite sets. Then

         |A∪B ∪C| = |A|+|B|+|C|−|A∩B|−|A∩C|−|B ∩C|+|A∩B ∩C|.




          [Venn diagram: three pairwise overlapping sets A, B, and C.]


   To give an example, how many bitstrings of length 17 are there that start
with 010, or end with 11, or have 10 at positions 7 and 8? (The positions
are numbered 1, 2, . . . , 17.) Let S be the set


of all such bitstrings. Define A to be the set of all bitstrings of length 17 that
start with 010, define B to be the set of all bitstrings of length 17 that end
with 11, and define C to be the set of all bitstrings of length 17 that have 10
at positions 7 and 8. Then S = A ∪ B ∪ C and, thus, we have to determine
the size of A ∪ B ∪ C.

   • We have seen before that |A| = 2^14 , |B| = 2^15 , and |A ∩ B| = 2^12 .

   • We have |C| = 2^15 , because the bits at positions 7 and 8 are fixed for
     every string in C.

   • We have |A ∩ C| = 2^12 , because 5 bits are fixed for every string in
     A ∩ C.

   • We have |B ∩ C| = 2^13 , because 4 bits are fixed for every string in
     B ∩ C.

   • We have |A ∩ B ∩ C| = 2^10 , because 7 bits are fixed for every string
     in A ∩ B ∩ C.

By applying the Inclusion-Exclusion formula, it follows that

   |S| = |A ∪ B ∪ C|
       = |A| + |B| + |C| − |A ∩ B| − |A ∩ C| − |B ∩ C| + |A ∩ B ∩ C|
       = 2^14 + 2^15 + 2^15 − 2^12 − 2^12 − 2^13 + 2^10
       = 66,560.
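As with the two-set example, a brute-force check over all 2^17 bitstrings is feasible (our own sketch):

```python
from itertools import product

# Brute-force check of the three-set Inclusion-Exclusion example.
count = 0
for bits in product("01", repeat=17):
    s = "".join(bits)
    # positions 7 and 8 of the text are s[6] and s[7] in 0-based indexing
    if s.startswith("010") or s.endswith("11") or s[6:8] == "10":
        count += 1

print(count)  # 66560
```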


3.6      Permutations and Binomial Coefficients
A permutation of a finite set S is an ordered sequence of the elements of S,
in which each element occurs exactly once. For example, the set S = {a, b, c}
has six permutations:

                            abc, acb, bac, bca, cab, cba

Theorem 3.6.1 Let n ≥ 0 be an integer and let S be a set with n elements.
There are exactly n! many permutations of S.

Proof. If n = 0, then S = ∅ and the only permutation of S is the empty
sequence. Since 0! = 1, the claim holds for n = 0. Assume that n ≥ 1 and


denote the elements of S by s1 , s2 , . . . , sn . Consider the procedure “write a
permutation of S” and, for i = 1, 2, . . . , n, the task “write the i-th element in
the permutation”. When we do the i-th task, we have already written i − 1
elements of the permutation; we cannot take any of these elements for the
i-th task. Therefore, there are n − (i − 1) = n − i + 1 ways to do the i-th
task. By the Product Rule, there are
                        n · (n − 1) · (n − 2) · · · 2 · 1 = n!
ways to do the procedure. This number is equal to the number of permuta-
tions of S.


   Note that we could also have used Theorem 3.1.3 to prove Theorem 3.6.1:
A permutation of S can be regarded to be a one-to-one function f : S → S.
Therefore, by applying Theorem 3.1.3 with A = S, B = S and, thus, m = n,
we obtain Theorem 3.6.1.
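For small sets, Theorem 3.6.1 is easy to confirm with Python's itertools (an illustration of ours, not part of the text):

```python
from itertools import permutations
from math import factorial

S = ["a", "b", "c", "d"]
perms = list(permutations(S))   # all orderings of the 4 elements

print(len(perms))         # 24
print(factorial(len(S)))  # 24, so there are n! permutations
```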
    Consider the set S = {a, b, c, d, e}. How many 3-element subsets does S
have? Recall that in a set, the order of the elements does not matter. Here
is a list of all 10 subsets of S having size 3:
                 {a, b, c}, {a, b, d}, {a, b, e}, {a, c, d}, {a, c, e},
                  {a, d, e}, {b, c, d}, {b, c, e}, {b, d, e}, {c, d, e}
Definition 3.6.2 Let n ≥ 0 and k ≥ 0 be integers. The binomial coefficient
(n choose k) denotes the number of k-element subsets of an n-element set.
The symbol (n choose k) is pronounced as “n choose k”.

    The example above shows that (5 choose 3) = 10. Since the empty set has
exactly one subset of size zero (the empty set itself), we have (0 choose 0) = 1.
Note that (n choose k) = 0 if k > n. Below, we derive a formula for the value
of (n choose k) if 0 ≤ k ≤ n.
     Let S be a set with n elements and let A be the set of all ordered sequences
consisting of exactly k pairwise distinct elements of S. We are going to count
the elements of A in two different ways.
     The first way is by using the Product Rule. This gives
          |A| = n(n − 1)(n − 2) · · · (n − k + 1) = n!/(n − k)!.          (3.1)
Observe that (3.1) also follows from Theorem 3.1.3. (Do you see why?)
  In the second way, we do the following:

   • Write all (n choose k) subsets of S having size k.

   • For each of these subsets, write a list of all k! permutations of this
     subset.

If we put all these lists together, then we obtain a big list in which each
ordered sequence of k pairwise distinct elements of S appears exactly once.
In other words, the big list contains each element of A exactly once. Since
the big list has size (n choose k) · k!, it follows that

                        |A| = (n choose k) · k!.                          (3.2)

Since the right-hand sides of (3.1) and (3.2) are equal (because they are both
equal to |A|), we obtain the following result:

Theorem 3.6.3 Let n and k be integers with 0 ≤ k ≤ n. Then

                    (n choose k) = n!/(k! (n − k)!).

For example,

  (5 choose 3) = 5!/(3!(5 − 3)!) = 5!/(3! 2!) = (1 · 2 · 3 · 4 · 5)/((1 · 2 · 3) · (1 · 2)) = 10

and

                 (0 choose 0) = 0!/(0! 0!) = 1/(1 · 1) = 1;
recall that we defined 0! to be equal to 1.
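Theorem 3.6.3 translates directly into code. In the sketch below, the helper binom is our own name, and math.comb is Python's built-in binomial coefficient:

```python
from math import comb, factorial

def binom(n, k):
    # Theorem 3.6.3: (n choose k) = n! / (k! (n - k)!)
    return factorial(n) // (factorial(k) * factorial(n - k))

print(binom(5, 3))  # 10
print(binom(0, 0))  # 1
print(all(binom(n, k) == comb(n, k)
          for n in range(10) for k in range(n + 1)))  # True
```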


3.6.1     Some Examples
First Example: Consider a standard deck of 52 cards. How many hands
of 5 cards are there? Any such hand is a 5-element subset of the set of 52
cards and, therefore, the number of hands of 5 cards is equal to

  (52 choose 5) = 52!/(5! 47!) = (52 · 51 · 50 · 49 · 48)/(5 · 4 · 3 · 2 · 1) = 2,598,960.


Second Example: Let n and k be integers with 0 ≤ k ≤ n. How many
bitstrings of length n have exactly k many 1s? We can answer this question
using the Product Rule:
     • The procedure is “write a bitstring of length n having exactly k many
       1s”.
     • Task 1: Consider the set {1, 2, . . . , n} of positions for the bits of the
       string. Choose a k-element subset of this set.
     • Task 2: Write a 1 in each of the k positions of the chosen subset.
    • Task 3: Write a 0 in each of the n − k remaining positions.
There are (n choose k) ways to do the first task, there is one way to do the
second task, and there is one way to do the third task. Thus, by the Product
Rule, the number of ways to do the procedure and, therefore, the number of
bitstrings of length n having exactly k many 1s, is equal to

                    (n choose k) · 1 · 1 = (n choose k).
We can also use the Bijection Rule, by observing, in the same way as we did
in Section 3.2, that there is a bijection between
     • the set of all bitstrings of length n having exactly k many 1s, and
   • the set of all k-element subsets of an n-element set.
Since the latter set has size (n choose k), the former set has size (n choose k)
as well.

Theorem 3.6.4 Let n and k be integers with 0 ≤ k ≤ n. The number of
bitstrings of length n having exactly k many 1s is equal to (n choose k).
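Theorem 3.6.4 can be checked by brute force for a small n (our own sketch; n = 8 and k = 3 are arbitrary choices):

```python
from itertools import product
from math import comb

n, k = 8, 3
# Count length-n bitstrings with exactly k ones by enumeration.
count = sum("".join(bits).count("1") == k
            for bits in product("01", repeat=n))

print(count)       # 56
print(comb(n, k))  # 56
```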

3.6.2      Newton’s Binomial Theorem
You have learned in high school that

                           (x + y)^2 = x^2 + 2xy + y^2 .

You have probably also seen that

                      (x + y)^3 = x^3 + 3x^2 y + 3x y^2 + y^3 .


What is the expansion of (x + y)^5 ? Observe that

               (x + y)^5 = (x + y)(x + y)(x + y)(x + y)(x + y).

If we expand the expression on the right-hand side, we get terms

                   x^5 , x^4 y, x^3 y^2 , x^2 y^3 , x y^4 , y^5 ,

each with some coefficient. What is the coefficient of x^2 y^3 ? We obtain a
term x^2 y^3 , by

   • choosing 3 of the 5 terms x + y,

   • taking y in each of the 3 chosen terms x + y, and

   • taking x in each of the other 2 terms x + y.

Since there are (5 choose 3) ways to do this, the coefficient of x^2 y^3 is
equal to (5 choose 3) = 10.

Theorem 3.6.5 (Newton’s Binomial Theorem) For any integer n ≥ 0,
we have
                 (x + y)^n = Σ_{k=0}^{n} (n choose k) x^(n−k) y^k .

Proof. The expression (x + y)^n is the product of n terms x + y. By expanding
this product, we get a term x^(n−k) y^k for each k = 0, 1, . . . , n, each with
some coefficient. We get a term x^(n−k) y^k by

   • choosing k of the n terms x + y,

   • taking y in each of the k chosen terms x + y, and

   • taking x in each of the other n − k terms x + y.

Since there are (n choose k) ways to do this, the coefficient of x^(n−k) y^k is
equal to (n choose k).


   For example, we have

  (x + y)^3 = (3 choose 0) x^3 + (3 choose 1) x^2 y + (3 choose 2) x y^2 + (3 choose 3) y^3
            = x^3 + 3x^2 y + 3x y^2 + y^3 .


   To determine the coefficient of x^12 y^13 in (x + y)^25 , we take n = 25 and
k = 13 in Newton’s Binomial Theorem, and get (25 choose 13).
   What is the coefficient of x^12 y^13 in (2x − 5y)^25 ? Observe that

                     (2x − 5y)^25 = ((2x) + (−5y))^25 .

By replacing x by 2x, and y by −5y in Newton’s Binomial Theorem, we get

           (2x − 5y)^25 = Σ_{k=0}^{25} (25 choose k) (2x)^(25−k) (−5y)^k .

By taking k = 13, we obtain the coefficient of x^12 y^13 :

        (25 choose 13) · 2^(25−13) · (−5)^13 = −(25 choose 13) · 2^12 · 5^13 .
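This coefficient can be verified by expanding (2x − 5y)^25 numerically. The sketch below is our own; after d factors have been multiplied in, coeffs[k] holds the coefficient of x^(d−k) y^k:

```python
from math import comb

# Multiply out the 25 factors (2x - 5y) one at a time.
coeffs = [1]
for _ in range(25):
    new = [0] * (len(coeffs) + 1)
    for k, c in enumerate(coeffs):
        new[k] += 2 * c       # take 2x from the current factor
        new[k + 1] -= 5 * c   # take -5y from the current factor
    coeffs = new

# coeffs[13] is the coefficient of x^12 y^13.
print(coeffs[13] == -comb(25, 13) * 2 ** 12 * 5 ** 13)  # True
```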

   Newton’s Binomial Theorem leads to identities for summations involving
binomial coefficients:

Theorem 3.6.6 For any integer n ≥ 0, we have

                        Σ_{k=0}^{n} (n choose k) = 2^n .

Proof. Take x = y = 1 in Newton’s Binomial Theorem.

  In Section 3.7, we will see a proof of Theorem 3.6.6 that does not use
Newton’s Binomial Theorem.

Theorem 3.6.7 For any integer n ≥ 1, we have

                     Σ_{k=0}^{n} (−1)^k (n choose k) = 0.

Proof. Take x = 1 and y = −1 in Newton’s Binomial Theorem.
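Both identities are easy to spot-check in Python (our own illustration; n = 10 is an arbitrary choice):

```python
from math import comb

n = 10
print(sum(comb(n, k) for k in range(n + 1)))              # 1024
print(2 ** n)                                             # 1024
print(sum((-1) ** k * comb(n, k) for k in range(n + 1)))  # 0
```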


3.7     Combinatorial Proofs
In a combinatorial proof, we show the validity of an identity, such as the one
in Theorem 3.6.6, by interpreting it as the answer to a counting problem.
The identity is proved by solving this counting problem in two different ways.
This gives two answers to the same counting problem. Obviously, these two
answers must be equal. Observe that we have already used this approach
in Section 3.6: When we determined the formula for (n choose k), we counted, in
two different ways, the number of ordered sequences of k pairwise distinct
elements from an n-element set. In this section, we will give several other
examples of combinatorial proofs.

Theorem 3.7.1 For any integers n and k with 0 ≤ k ≤ n, we have

                    (n choose k) = (n choose n − k).

Proof. The claim can be proved using Theorem 3.6.3. To obtain a combi-
natorial proof, let S be a set with n elements. Recall that

   • (n choose k) is the number of ways to choose k elements from the set S,

which is the same as

   • the number of ways to not choose n − k elements from the set S.

The latter number is equal to (n choose n − k).
   We can also prove the claim using Theorem 3.6.4:

   • The number of bitstrings of length n with exactly k many 1s is equal
     to (n choose k).

   • The number of bitstrings of length n with exactly n − k many 0s is
     equal to (n choose n − k).

Since these two quantities are equal, the theorem follows.


Theorem 3.7.2 (Pascal’s Identity) For any integers n and k with 1 ≤
k ≤ n, we have

             (n + 1 choose k) = (n choose k) + (n choose k − 1).


Proof. As in the previous theorem, the claim can be proved using Theo-
rem 3.6.3. To obtain a combinatorial proof, let S be a set with n+1 elements.
We are going to count the k-element subsets of S in two different ways.
   First, by definition, the number of k-element subsets of S is equal to

                            (n + 1 choose k).                            (3.3)

   For the second way, we choose an element x in S and consider the set
T = S \ {x}, i.e., the set obtained by removing x from S. Any k-element
subset of S is of exactly one of the following two types:

     • The k-element subset of S does not contain x.

          – Any such subset is a k-element subset of T . Since T has size n,
            there are (n choose k) many k-element subsets of S that do not
            contain x.

     • The k-element subset of S contains x.

         – If A is any such subset, then B = A \ {x} is a (k − 1)-element
           subset of T .
         – Conversely, for any (k − 1)-element subset B of T , the set A =
           B ∪ {x} is a k-element subset of S that contains x.
          – It follows that the number of k-element subsets of S containing x
            is equal to the number of (k − 1)-element subsets of T . The latter
            number is equal to (n choose k − 1).

Thus, the second way of counting shows that the number of k-element subsets
of S is equal to

                   (n choose k) + (n choose k − 1).                      (3.4)
Since the expressions in (3.3) and (3.4) count the same objects, they must
be equal. Therefore, the proof is complete.
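Pascal's Identity can also be spot-checked over a range of values (our own sketch):

```python
from math import comb

# Check (n+1 choose k) = (n choose k) + (n choose k-1) for 1 <= k <= n.
ok = all(comb(n + 1, k) == comb(n, k) + comb(n, k - 1)
         for n in range(1, 25) for k in range(1, n + 1))
print(ok)  # True
```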


Theorem 3.7.3 For any integer n ≥ 0, we have

                        Σ_{k=0}^{n} (n choose k) = 2^n .


Proof. We have seen in Theorem 3.6.6 that this identity follows from New-
ton’s Binomial Theorem. Below, we give a combinatorial proof.
    Consider a set S with n elements. According to Theorem 3.2.1, this set
has 2^n many subsets. A different way to count the subsets of S is by dividing
them into (pairwise disjoint) groups according to their sizes. For each k with
0 ≤ k ≤ n, consider all k-element subsets of S. The number of such subsets
is equal to (n choose k). If we take the sum of all these binomial coefficients,
then we have counted each subset of S exactly once. Thus,

                           Σ_{k=0}^{n} (n choose k)

is equal to the total number of subsets of S.


Theorem 3.7.4 (Vandermonde’s Identity) For any integers m ≥ 0, n ≥
0, and r ≥ 0 with r ≤ m and r ≤ n, we have

             Σ_{k=0}^{r} (m choose k)(n choose r − k) = (m + n choose r).

Proof. Consider a set S with m + n elements. We are going to count the
r-element subsets of S in two different ways.
    First, by using the definition of binomial coefficients, the number of
r-element subsets of S is equal to (m + n choose r).
    For the second way, we partition the set S into two subsets A and B,
where A has size m and B has size n. Observe that any r-element subset of
S contains
   • some elements of A, say k many, and
   • r − k elements of B.
The value of k can be any integer in the set {0, 1, 2, . . . , r}.
    Let k be any integer with 0 ≤ k ≤ r, and let Nk be the number of r-
element subsets of S that contain exactly k elements of A (and, thus, r − k
elements of B). Then, Σ_{k=0}^{r} Nk is equal to the total number of r-element
subsets of S and, thus,

                        Σ_{k=0}^{r} Nk = (m + n choose r).


To determine Nk , we use the Product Rule: We obtain any subset that is
counted in Nk , by

   • choosing k elements in A (there are (m choose k) ways to do this) and

   • choosing r − k elements in B (there are (n choose r − k) ways to do this).

It follows that

                    Nk = (m choose k)(n choose r − k).
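Vandermonde's Identity is likewise easy to spot-check (our own sketch; m = 7, n = 5, r = 4 are arbitrary values satisfying r ≤ m and r ≤ n):

```python
from math import comb

m, n, r = 7, 5, 4
lhs = sum(comb(m, k) * comb(n, r - k) for k in range(r + 1))

print(lhs)             # 495
print(comb(m + n, r))  # 495
```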



Corollary 3.7.5 For any integer n ≥ 0, we have

                   Σ_{k=0}^{n} (n choose k)^2 = (2n choose n).

Proof. By taking m = n = r in Vandermonde’s Identity, we get

               Σ_{k=0}^{n} (n choose k)(n choose n − k) = (2n choose n).

Using Theorem 3.7.1, we get

    (n choose k)(n choose n − k) = (n choose k)(n choose k) = (n choose k)^2 .




3.8      Pascal’s Triangle
             The computational method at the heart of Pascal’s work was
             actually discovered by a Chinese mathematician named Jia Xian
             around 1050, published by another Chinese mathematician, Zhu
             Shijie, in 1303, discussed in a work by Cardano in 1570, and
             plugged into the greater whole of probability theory by Pascal,
             who ended up getting most of the credit.
                        — Leonard Mlodinow, The Drunkard’s Walk, 2008


   We have seen that

   • (n choose 0) = 1 for all integers n ≥ 0,

   • (n choose n) = 1 for all integers n ≥ 0,

   • (n choose k) = (n − 1 choose k − 1) + (n − 1 choose k) for all integers
     n ≥ 2 and k with 1 ≤ k ≤ n − 1; see Theorem 3.7.2.


These relations lead to an algorithm for generating binomial coefficients:

       Algorithm GenerateBinomCoeff:

              BCoeff (0, 0) = 1;
              for n = 1, 2, 3, . . .
              do BCoeff (n, 0) = 1;
                 for k = 1 to n − 1
                 do BCoeff (n, k) = BCoeff (n − 1, k − 1) + BCoeff (n − 1, k)
                 endfor;
                 BCoeff (n, n) = 1
              endfor
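Since algorithm GenerateBinomCoeff never terminates, a practical implementation stops after a chosen number of rows. Here is a Python sketch (the function name pascal_rows is our own):

```python
from math import comb

def pascal_rows(max_n):
    # Rows 0 .. max_n of Pascal's Triangle, filled in exactly as in
    # algorithm GenerateBinomCoeff.
    rows = [[1]]                               # BCoeff(0, 0) = 1
    for n in range(1, max_n + 1):
        prev = rows[-1]
        row = [1]                              # BCoeff(n, 0) = 1
        for k in range(1, n):
            row.append(prev[k - 1] + prev[k])  # Pascal's Identity
        row.append(1)                          # BCoeff(n, n) = 1
        rows.append(row)
    return rows

rows = pascal_rows(6)
print(rows[6])  # [1, 6, 15, 20, 15, 6, 1]
```

Comparing each entry against math.comb confirms that BCoeff(n, k) = (n choose k).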



    The values BCoeff (n, k) that are computed by this (non-terminating)
algorithm satisfy

               BCoeff (n, k) = (n choose k) for 0 ≤ k ≤ n.


The triangle obtained by arranging these binomial coefficients, with the n-th
row containing all values $\binom{n}{0}, \binom{n}{1}, \ldots, \binom{n}{n}$, is called Pascal’s Triangle. The
figure below shows rows 0, 1, . . . , 6:
48                                                                             Chapter 3.   Counting

$$
\begin{array}{ccccccccccccc}
 & & & & & & \binom{0}{0} \\
 & & & & & \binom{1}{0} & & \binom{1}{1} \\
 & & & & \binom{2}{0} & & \binom{2}{1} & & \binom{2}{2} \\
 & & & \binom{3}{0} & & \binom{3}{1} & & \binom{3}{2} & & \binom{3}{3} \\
 & & \binom{4}{0} & & \binom{4}{1} & & \binom{4}{2} & & \binom{4}{3} & & \binom{4}{4} \\
 & \binom{5}{0} & & \binom{5}{1} & & \binom{5}{2} & & \binom{5}{3} & & \binom{5}{4} & & \binom{5}{5} \\
\binom{6}{0} & & \binom{6}{1} & & \binom{6}{2} & & \binom{6}{3} & & \binom{6}{4} & & \binom{6}{5} & & \binom{6}{6}
\end{array}
$$



    We obtain the values for the binomial coefficients by using the following
rules:

     • Each value along the boundary is equal to 1.

     • Each value in the interior is equal to the sum of the two values above
       it.

In Figure 3.1, you see rows 0, 1, . . . , 12.
   Below, we state some of our earlier results using Pascal’s Triangle.

     • The values in the n-th row are equal to the coefficients in Newton’s
       Binomial Theorem (i.e., Theorem 3.6.5). For example, the coefficients
       in the expansion of (x + y)5 are given in the 5-th row:

                (x + y)5 = x5 + 5x4 y + 10x3 y 2 + 10x2 y 3 + 5xy 4 + y 5 .

     • Theorem 3.6.6 states that the sum of all values in the n-th row is equal
       to 2n .

     • Theorem 3.7.1 states that reading the n-th row from left to right gives
       the same sequence as reading this row from right to left.

     • Corollary 3.7.5 states that the sum of the squares of all values in the
       n-th row is equal to the middle element in the 2n-th row.
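The last three of these facts are easy to verify mechanically for small n; a small Python check (using math.comb for the binomial coefficients) might look like this:

```python
from math import comb

for n in range(10):
    row = [comb(n, k) for k in range(n + 1)]
    assert sum(row) == 2 ** n            # Theorem 3.6.6: row sums to 2^n
    assert row == row[::-1]              # Theorem 3.7.1: row is a palindrome
    # Corollary 3.7.5: squares in row n sum to the middle entry of row 2n
    assert sum(x * x for x in row) == comb(2 * n, n)
print("all checks passed")
```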




                                                                 1

                                                            1         1

                                                       1         2         1

                                                  1         3         3         1

                                             1         4         6         4         1

                                        1         5         10        10        5         1

                                   1         6         15        20        15        6         1

                              1         7         21        35        35        21        7         1

                         1         8         28        56        70        56        28        8         1

                    1         9         36        84       126 126              84        36        9         1

               1         10        45       120 210 252 210 120                                45        10        1

           1        11        55       165 330 462 462 330 165                                      55        11        1

       1       12        66       220 495 792 924 792 495 220                                            66        12       1

                     Figure 3.1: Rows 0, 1, . . . , 12 of Pascal’s Triangle.


3.9       More Counting Problems
3.9.1      Reordering the Letters of a Word
How many different strings can be made by reordering the letters of the
7-letter word
                              SUCCESS?
It should be clear that the answer is not 7!: If we swap, for example, the two
occurrences of C, then we obtain the same string.
    The correct answer can be obtained by applying the Product Rule. We
start by counting the frequencies of each letter:
     • The letter S occurs 3 times.
     • The letter C occurs 2 times.
     • The letter U occurs 1 time.
     • The letter E occurs 1 time.
To apply the Product Rule, we have to specify the procedure and the tasks:
     • The procedure is “write the letters occurring in the word SUCCESS”.
     • The first task is “choose 3 positions out of 7, and write the letter S in
       each chosen position”.
     • The second task is “choose 2 positions out of the remaining 4, and write
       the letter C in each chosen position”.
     • The third task is “choose 1 position out of the remaining 2, and write
       the letter U in the chosen position”.
    • The fourth task is “choose 1 position out of the remaining 1, and write
      the letter E in the chosen position”.
Since there are $\binom{7}{3}$ ways to do the first task, $\binom{4}{2}$ ways to do the second task,
$\binom{2}{1}$ ways to do the third task, and $\binom{1}{1}$ way to do the fourth task, it follows
that the total number of different strings that can be made by reordering the
letters of the word SUCCESS is equal to

$$\binom{7}{3}\binom{4}{2}\binom{2}{1}\binom{1}{1} = 420.$$
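One way to gain confidence in this count is to generate all distinct reorderings by brute force; the following Python sketch does exactly that:

```python
from itertools import permutations
from math import comb

# Generate every reordering of the 7 letters and keep the distinct ones.
distinct = set(permutations("SUCCESS"))
assert len(distinct) == comb(7, 3) * comb(4, 2) * comb(2, 1) * comb(1, 1)
print(len(distinct))  # 420
```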


   In the four tasks above, we first chose the positions for the letter S, then
the positions for the letter C, then the position for the letter U, and finally
the position for the letter E. If we change the order, then we obtain the same
answer. For example, if we choose the positions for the letters in the order
C, E, U, S, then we obtain

$$\binom{7}{2}\binom{5}{1}\binom{4}{1}\binom{3}{3},$$

which is indeed equal to 420.

3.9.2     Counting Solutions of Linear Equations
Consider the equation
                                x1 + x2 + x3 = 11.
We are interested in the number of solutions (x1 , x2 , x3 ), where x1 ≥ 0,
x2 ≥ 0, x3 ≥ 0 are integers. Examples of solutions are

                      (2, 3, 6), (3, 2, 6), (0, 11, 0), (2, 0, 9).

Observe that we consider (2, 3, 6) and (3, 2, 6) to be different solutions.
    We are going to use the Bijection Rule to determine the number of solu-
tions. For this, we define A to be the set of all solutions, i.e.,

A = {(x1 , x2 , x3 ) : x1 ≥ 0, x2 ≥ 0, x3 ≥ 0 are integers, x1 + x2 + x3 = 11}.

To apply the Bijection Rule, we need a set B and a bijection f : A → B,
such that it is easy to determine the size of B. This set B should be chosen
such that its elements “encode” the elements of A in a unique way. Consider
the following set B:

   • B is the set of all bitstrings of length 13 that contain exactly 2 many
     1s (and, thus, exactly 11 many 0s).

The function f : A → B is defined as follows: If (x1 , x2 , x3 ) is an element of
the set A, then f (x1 , x2 , x3 ) is the bitstring

   • that starts with x1 many 0s,

   • is followed by one 1,


     • is followed by x2 many 0s,

     • is followed by one 1,

     • and ends with x3 many 0s.

For example, we have

                           f (2, 3, 6) = 0010001000000,

                           f (3, 2, 6) = 0001001000000,
                          f (0, 11, 0) = 1000000000001,
and
                           f (2, 0, 9) = 0011000000000.
   To show that this function f maps elements of A to elements of B, we
have to verify that the string f (x1 , x2 , x3 ) belongs to the set B. This follows
from the following observations:

     • The string f (x1 , x2 , x3 ) contains exactly 2 many 1s.

     • The number of 0s in the string f (x1 , x2 , x3 ) is equal to x1 + x2 + x3 ,
       which is equal to 11, because (x1 , x2 , x3 ) belongs to the set A.

     • Thus, f (x1 , x2 , x3 ) is a bitstring of length 13 that contains exactly 2
       many 1s.

    It should be clear that this function f is one-to-one: If we take two
different elements (x1 , x2 , x3 ) in A, then f gives us two different bitstrings
f (x1 , x2 , x3 ).
    To prove that f is onto, we have to show that for every bitstring b in the
set B, there is an element (x1 , x2 , x3 ) in A such that f (x1 , x2 , x3 ) = b. This
element of A is obtained by taking

     • x1 to be the number of 0s to the left of the first 1,

     • x2 to be the number of 0s between the two 1s, and

     • x3 to be the number of 0s to the right of the second 1.


For example, if b = 0000110000000, then x1 = 4, x2 = 0, and x3 = 7.
Note that, since b has length 13 and contains exactly 2 many 1s, we have
x1 + x2 + x3 = 11 and, therefore, (x1 , x2 , x3 ) ∈ A.
    Thus, we have shown that f : A → B is indeed a bijection. We know from
Theorem 3.6.4 that B has size $\binom{13}{2}$. Therefore, it follows from the Bijection
Rule that

$$|A| = |B| = \binom{13}{2} = 78.$$
    The following theorem states this result for general linear equations. You
are encouraged to come up with the proof.

Theorem 3.9.1 Let k ≥ 1 and n ≥ 0 be integers. The number of solutions
to the equation
                      x1 + x2 + · · · + xk = n,
where x1 ≥ 0, x2 ≥ 0, . . . , xk ≥ 0 are integers, is equal to

$$\binom{n+k-1}{k-1}.$$
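For small values of k and n, Theorem 3.9.1 can be checked by brute-force enumeration; a Python sketch (the helper count_solutions is my own):

```python
from math import comb

def count_solutions(k, n):
    """Count, by brute force, the nonnegative integer solutions of
    x1 + x2 + ... + xk = n."""
    if k == 1:
        return 1  # the only solution is x1 = n
    return sum(count_solutions(k - 1, n - x1) for x1 in range(n + 1))

# The worked example: x1 + x2 + x3 = 11 has C(13, 2) = 78 solutions.
assert count_solutions(3, 11) == comb(13, 2) == 78
# Theorem 3.9.1 for a few small cases:
assert all(count_solutions(k, n) == comb(n + k - 1, k - 1)
           for k in range(1, 5) for n in range(8))
```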

    Let us now consider inequalities instead of equations. For example, con-
sider the inequality
                            x1 + x2 + x3 ≤ 11.
Again, we are interested in the number of solutions (x1 , x2 , x3 ), where x1 ≥ 0,
x2 ≥ 0, x3 ≥ 0 are integers. This inequality contains the same solutions as
before, but it has additional solutions such as

                       (2, 3, 5), (3, 2, 5), (0, 1, 0), (0, 0, 0).

As before, we are going to apply the Bijection Rule. We define

 A = {(x1 , x2 , x3 ) : x1 ≥ 0, x2 ≥ 0, x3 ≥ 0 are integers, x1 + x2 + x3 ≤ 11}

and B to be the set of all bitstrings of length 14 that contain exactly 3 many
1s (and, thus, exactly 11 many 0s).
    The function f : A → B is defined as follows: If (x1 , x2 , x3 ) is an element
of A, then f (x1 , x2 , x3 ) is the bitstring

   • that starts with x1 many 0s,


     • is followed by one 1,

     • is followed by x2 many 0s,

     • is followed by one 1,

     • is followed by x3 many 0s,

     • is followed by one 1,

     • and ends with 14 − (x1 + x2 + x3 + 3) many 0s.

For example, we have

                         f (2, 3, 6) = 00100010000001,

                         f (2, 3, 5) = 00100010000010,

                         f (0, 1, 0) = 10110000000000,
and
                         f (0, 0, 0) = 11100000000000.
As before, it can be verified that the string f (x1 , x2 , x3 ) belongs to the set
B and the function f is a bijection. It then follows from the Bijection Rule
that

$$|A| = |B| = \binom{14}{3} = 364.$$
The next theorem gives the answer for the general case. As before, you are
encouraged to give a proof.

Theorem 3.9.2 Let k ≥ 1 and n ≥ 0 be integers. The number of solutions
to the inequality
                      x1 + x2 + · · · + xk ≤ n,
where x1 ≥ 0, x2 ≥ 0, . . . , xk ≥ 0 are integers, is equal to
                                          
$$\binom{n+k}{k}.$$
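Theorem 3.9.2 follows from Theorem 3.9.1 by summing over every possible total t = x1 + · · · + xk; the following short Python check (the helper count_le is my own) makes that reduction explicit:

```python
from math import comb

def count_le(k, n):
    """Count nonnegative integer solutions of x1 + ... + xk <= n by
    summing, over each total t, the count C(t + k - 1, k - 1) from
    Theorem 3.9.1."""
    return sum(comb(t + k - 1, k - 1) for t in range(n + 1))

# The worked example: x1 + x2 + x3 <= 11 has C(14, 3) = 364 solutions.
assert count_le(3, 11) == comb(14, 3) == 364
```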


3.10         The Pigeonhole Principle
In any group of 366 people, there must be two people having the same birth-
day: Since there are 365 days in a year (ignoring leap years), it is not possible
that the birthdays of 366 people are all distinct.
     Pigeonhole Principle: Let k ≥ 1 be an integer. If k + 1 or more ob-
     jects are placed into k boxes, then there is at least one box containing
     two or more objects.
     Equivalently, if A and B are two finite sets with |A| > |B|, then there
     is no one-to-one function from A to B.


3.10.1        India Pale Ale
                        President of the
                                                                     Favorite Drink
              Carleton Computer Science Society
                   Simon Pratt (2013–2014)                           India Pale Ale
                 Lindsay Bangs (2014–2015)                            Wheat Beer
                  Connor Hillen (2015–2016)                            Black IPA
                   Elisa Kazan (2016–2019)                               Cider
                   William So (2019–2020)                             Amber Lager

    Simon Pratt loves to drink India Pale Ale (IPA). During each day of the
month of April (which has 30 days), Simon drinks at least one bottle of IPA.
During this entire month, he drinks exactly 45 bottles of IPA. The claim is
that there must be a sequence of consecutive days in April, during which
Simon drinks exactly 14 bottles of IPA.
    To prove this, let bi be the number of bottles that Simon drinks on April i,
for i = 1, 2, . . . , 30. We are given that each bi is a positive integer (i.e., bi ≥ 1)
and
                                b1 + b2 + · · · + b30 = 45.
Define, for i = 1, 2, . . . , 30,
                                    ai = b 1 + b 2 + · · · + b i ,
i.e., ai is the total number of bottles of IPA that Simon drinks during the
first i days of April. Consider the sequence of 60 numbers
                    a1 , a2 , . . . , a30 , a1 + 14, a2 + 14, . . . , a30 + 14.


Each number in this sequence is an integer that belongs to the set

                                  {1, 2, . . . , 59}.

Therefore, by the Pigeonhole Principle, these 60 numbers cannot all be dis-
tinct. Observe that there are no duplicates in the sequence a1 , a2 , . . . , a30 ,
because all bi are at least one. Similarly, there are no duplicates in the se-
quence a1 + 14, a2 + 14, . . . , a30 + 14. It follows that there are two indices i
and j such that
                                     ai = aj + 14.
Observe that j must be less than i and

                      14 = ai − aj = bj+1 + bj+2 + · · · + bi .

Thus, in the period from April j + 1 until April i, Simon drinks exactly 14
bottles of IPA.
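The argument is constructive: scanning the prefix sums a_1, a_2, . . . and remembering the values seen so far finds the two indices i and j. A Python sketch of this idea (the schedule b below is an invented example satisfying the hypotheses, and find_run is my own name):

```python
def find_run(b, target=14):
    """Given daily counts b (each >= 1), return (start, end) such that
    b[start-1] + ... + b[end-1] == target, using the prefix sums a_i."""
    a = [0]
    for x in b:
        a.append(a[-1] + x)          # a[i] = b_1 + b_2 + ... + b_i
    seen = {}                        # prefix sum -> index where it occurred
    for i, s in enumerate(a):
        if s - target in seen:       # a_i = a_j + target, with j < i
            return seen[s - target] + 1, i
        seen[s] = i
    return None

# An invented 30-day schedule with each b_i >= 1 and total 45:
b = [2, 1, 1, 3, 1, 1, 1, 2, 1, 1] + [1] * 15 + [4, 3, 3, 3, 3]
assert len(b) == 30 and sum(b) == 45
print(find_run(b))  # (1, 10): days 1 through 10 sum to exactly 14
```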

3.10.2       Sequences Containing Divisible Numbers
Let A = {1, 2, . . . , 2n} and consider the sequence n + 1, n + 2, . . . , 2n of ele-
ments in A. This sequence has the property that none of its elements divides
any other element in the sequence. Note that the sequence has length n. The
following theorem states that such a sequence of length n + 1 does not exist.

Theorem 3.10.1 Let n ≥ 1 and consider a sequence a1 , a2 , . . . , an+1 of n+1
elements from the set {1, 2, . . . , 2n}. Then there are two distinct indices i and
j such that ai divides aj or aj divides ai .

Proof. For each i with 1 ≤ i ≤ n + 1, write

$$a_i = 2^{k_i} \cdot q_i,$$

where ki ≥ 0 is an integer and qi is an odd integer. For example,

   • if ai = 48, then ki = 4 and qi = 3, because $48 = 2^4 \cdot 3$,

   • if ai = 1, then ki = 0 and qi = 1, because $1 = 2^0 \cdot 1$,

   • if ai = 7, then ki = 0 and qi = 7, because $7 = 2^0 \cdot 7$.


Consider the sequence q1 , q2 , . . . , qn+1 of n+1 integers. Each of these numbers
is an odd integer that belongs to the set
                              {1, 3, 5, . . . , 2n − 1}.
Since this set has size n, the Pigeonhole Principle implies that there must be
two numbers in the sequence q1 , q2 , . . . , qn+1 that are equal. In other words,
there are two distinct indices i and j such that qi = qj . It follows that
$$\frac{a_i}{a_j} = \frac{2^{k_i} \cdot q_i}{2^{k_j} \cdot q_j} = 2^{k_i - k_j}.$$
Thus, if ki ≥ kj , then aj divides ai . Otherwise, ki < kj , and ai divides aj .
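For small n, the theorem (and the tightness of n + 1, witnessed by the sequence n + 1, . . . , 2n) can be confirmed exhaustively; a Python sketch:

```python
from itertools import combinations

def has_divisible_pair(seq):
    """True if some element of seq divides another element of seq."""
    return any(a % b == 0 or b % a == 0 for a, b in combinations(seq, 2))

for n in range(1, 8):
    universe = range(1, 2 * n + 1)
    # Theorem 3.10.1: every (n+1)-element subset of {1, ..., 2n} works.
    assert all(has_divisible_pair(s) for s in combinations(universe, n + 1))
    # The sequence n+1, ..., 2n shows that n + 1 cannot be lowered.
    assert not has_divisible_pair(range(n + 1, 2 * n + 1))
```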




3.10.3      Long Monotone Subsequences
Let n = 3, and consider the sequence 20, 10, 9, 7, 11, 2, 21, 1, 20, 31 of 10 =
n2 + 1 numbers. This sequence contains an increasing subsequence of length
4 = n + 1, namely 10, 11, 21, 31. The following theorem states this result for
arbitrary values of n.

Theorem 3.10.2 Let n ≥ 1 be an integer. Every sequence of n2 + 1 distinct
real numbers contains a subsequence of length n + 1 that is either increasing
or decreasing.

Proof. Let a1 , a2 , . . . , an2 +1 be an arbitrary sequence of n2 + 1 distinct real
numbers. For each i with 1 ≤ i ≤ n2 + 1, let inc i denote the length of
the longest increasing subsequence that starts at ai , and let dec i denote the
length of the longest decreasing subsequence that starts at ai .
   Using this notation, the claim in the theorem can be formulated as follows:
There is an index i such that inc i ≥ n + 1 or dec i ≥ n + 1.
   We will prove the claim by contradiction. Thus, we assume that inc i ≤ n
and dec i ≤ n for all i with 1 ≤ i ≤ n2 + 1.
   Consider the set
                      B = {(b, c) : 1 ≤ b ≤ n, 1 ≤ c ≤ n},
and think of the elements of B as being boxes. For each i with 1 ≤ i ≤ n2 +1,
the pair (inc i , dec i ) is an element of B. Thus, we have n2 + 1 elements


(inc i , dec i ), which are placed in the n2 boxes of B. By the Pigeonhole Prin-
ciple, there must be a box that contains two (or more) elements. In other
words, there exist two integers i and j such that i < j and

                           (inc i , dec i ) = (inc j , dec j ).

Recall that the elements in the sequence are distinct. Hence, ai ≠ aj. We
consider two cases.
    First assume that ai < aj . Then the length of the longest increasing
subsequence starting at ai must be at least 1+inc j , because we can append ai
to the longest increasing subsequence starting at aj. Therefore, inc i ≠ inc j,
which is a contradiction.
    The second case is when ai > aj . Then the length of the longest decreasing
subsequence starting at ai must be at least 1+dec j , because we can append ai
to the longest decreasing subsequence starting at aj. Therefore, dec i ≠ dec j,
which is again a contradiction.
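The quantities inc i and dec i used in the proof can be computed by a simple right-to-left dynamic program; the following Python sketch (the function longest_monotone is my own) evaluates them for the example sequence above:

```python
def longest_monotone(a):
    """Return (max_i inc_i, max_i dec_i): the lengths of the longest
    increasing and decreasing subsequences of a."""
    n = len(a)
    inc = [1] * n   # inc[i]: longest increasing subsequence starting at a[i]
    dec = [1] * n   # dec[i]: longest decreasing subsequence starting at a[i]
    for i in range(n - 1, -1, -1):
        for j in range(i + 1, n):
            if a[j] > a[i]:
                inc[i] = max(inc[i], 1 + inc[j])
            elif a[j] < a[i]:
                dec[i] = max(dec[i], 1 + dec[j])
    return max(inc), max(dec)

seq = [20, 10, 9, 7, 11, 2, 21, 1, 20, 31]    # n = 3, so n^2 + 1 = 10 numbers
print(longest_monotone(seq))  # (4, 6): e.g. 10, 11, 21, 31 and 20, 10, 9, 7, 2, 1
assert max(longest_monotone(seq)) >= 4        # n + 1 = 4, as the theorem promises
```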


3.10.4      There are Infinitely Many Primes
As a final application of the Pigeonhole Principle, we prove the following
result:

Theorem 3.10.3 There are infinitely many prime numbers.

Proof. The proof is by contradiction. Thus, we assume that there are, say,
k prime numbers, and denote them by

                            2 = p1 < p2 < · · · < pk .

Note that k is a fixed integer. Since
$$\lim_{n \to \infty} \frac{2^n}{(n+1)^k} = \infty,$$

we can choose an integer n such that

$$2^n > (n+1)^k.$$

Define the function
$$f : \{1, 2, \ldots, 2^n\} \to \mathbb{N}^k$$


as follows: For any integer x with $1 \le x \le 2^n$, consider its prime factorization
$$x = p_1^{m_1} \cdot p_2^{m_2} \cdots p_k^{m_k}.$$




We define
                            f (x) = (m1 , m2 , . . . , mk ).
Since

$$
\begin{aligned}
m_i &\le m_1 + m_2 + \cdots + m_k \\
    &\le m_1 \log p_1 + m_2 \log p_2 + \cdots + m_k \log p_k \\
    &= \log\left(p_1^{m_1} \cdot p_2^{m_2} \cdots p_k^{m_k}\right) \\
    &= \log x \\
    &\le n,
\end{aligned}
$$

it follows that
$$f(x) \in \{0, 1, 2, \ldots, n\}^k.$$
Thus, f is a function
$$f : \{1, 2, \ldots, 2^n\} \to \{0, 1, 2, \ldots, n\}^k.$$

It is easy to see that this function is one-to-one. The set on the left-hand
side has size $2^n$, whereas the set on the right-hand side has size $(n+1)^k$. It
then follows from the Pigeonhole Principle that
$$(n+1)^k \ge 2^n,$$
which contradicts our choice for n.



3.11        Exercises
3.1 A licence plate number consists of a sequence of four uppercase letters
followed by three digits. How many licence plate numbers are there?

3.2 A multiple-choice exam consists of 100 questions. Each question has
four possible answers a, b, c, and d. How many ways are there to answer the
100 questions (assuming that each question is answered)?


3.3 For each of the following seven cases, determine how many strings of
eight uppercase letters there are.

     • Letters can be repeated.

     • No letter can be repeated.

     • The strings start with PQ (in this order) and letters can be repeated.

     • The strings start with PQ (in this order) and no letter can be repeated.

     • The strings start and end with PQ (in this order) and letters can be
       repeated.

     • The strings start with XYZ (in this order), end with QP (in this order),
       and letters can be repeated.

     • The strings start with XYZ (in this order) or end with QP (in this
       order), and letters can be repeated.

3.4 If n and d are positive integers, then d is a divisor of n if n/d is an
integer.
    Determine the number of divisors of the integer
$$1{,}170{,}725{,}783{,}076{,}864 = 2^{17} \cdot 3^{12} \cdot 7^5.$$

3.5 Let k ≥ 1 and n ≥ 1 be integers. Consider k distinct beer bottles and
n distinct students. How many ways are there to hand out the beer bottles
to the students, if there is no restriction on how many bottles a student may
get?

3.6 The Carleton Computer Science Society has a Board of Directors con-
sisting of one president, one vice-president, one secretary, one treasurer, and
a three-person party committee (whose main responsibility is to buy beer
for the other four board members). The entire board consists of seven dis-
tinct students. If there are n ≥ 7 students in Carleton’s Computer Science
program, how many ways are there to choose a Board of Directors?

3.7 The Carleton Computer Science Society has an Academic Events Com-
mittee (AEC) consisting of five students and a Beer Committee (BC) con-
sisting of six students (whose responsibility is to buy beer for the AEC).


   • Assume there are n ≥ 6 students in Carleton’s Computer Science pro-
     gram. Also, assume that a student can be both on the AEC and on the
     BC. What is the total number of ways in which these two committees
     can be chosen?
   • Assume there are n ≥ 11 students in Carleton’s Computer Science
     program. Also, assume that a student cannot be both on the AEC
     and on the BC. What is the total number of ways in which these two
     committees can be chosen?
3.8 Let f ≥ 2, m ≥ 2, and k ≥ 2 be integers such that k ≤ f and k ≤ m.
The Carleton Computer Science program has f female students and m male
students. The Carleton Computer Science Society has a Board of Directors
consisting of k students. At least one of the board members is female and at
least one of the board members is male. Determine the number of ways in
which a Board of Directors can be chosen.
3.9 Let f ≥ 4 and m ≥ 4 be integers. The Carleton Computer Science
program has f female students and m male students that are eligible to be
a TA for COMP 2804. Determine the number of ways to choose eight TAs
out of these f + m students, such that the number of female TAs is equal to
the number of male TAs.
3.10 Let m and n be integers with 0 ≤ m ≤ n. There are n + 1 students
in Carleton’s Computer Science program. The Carleton Computer Science
Society has a Board of Directors, consisting of one president and m vice-
presidents. The president cannot be vice-president. Prove that

$$(n + 1)\binom{n}{m} = (n + 1 - m)\binom{n+1}{m},$$
by determining, in two different ways, the number of ways to choose a Board
of Directors.
3.11 In Tic-Tac-Toe, we are given a 3 × 3 grid, consisting of unmarked cells.
Two players, Xavier and Olivia, take turns marking the cells of this grid.
When it is Xavier’s turn, he chooses an unmarked cell and marks it with the
letter X. Similarly, when it is Olivia’s turn, she chooses an unmarked cell
and marks it with the letter O. The first turn is by Xavier. The players
continue making turns until all cells have been marked. Below, you see an
example of a completely marked grid.


                                     O   O X
                                     X X    O
                                     X X    O


     • What is the number of completely marked grids?
     • What is the number of different ways (i.e., ordered sequences) in which
       the grid can be completely marked, when following the rules given
       above?

3.12 In how many ways can you paint 200 chairs, if 33 of them must be
painted red, 66 of them must be painted blue, and 101 of them must be
painted green?

3.13 Let A be the set of all integers x > 6543 such that the decimal repre-
sentation of x has distinct digits, none of which is equal to 7, 8, or 9. (The
decimal representation does not have leading zeros.) Determine the size of
the set A.

3.14 Let A be the set of all integers x ∈ {1, 2, . . . , 100} such that the decimal
representation of x does not contain the digit 4. (The decimal representation
does not have leading zeros.)
     • Determine the size of the set A without using the Complement Rule.
     • Use the Complement Rule to determine the size of the set A.

3.15 Let A be a set of size m, let B be a set of size n, and assume that
n ≥ m ≥ 1. How many functions f : A → B are there that are not one-to-
one?

3.16 Consider permutations of the set {a, b, c, d, e, f, g} that do not contain
bge (in this order) and do not contain eaf (in this order). Prove that the
number of such permutations is equal to 4806.

3.17 How many bitstrings of length 8 are there that contain at least 4 con-
secutive 0s or at least 4 consecutive 1s?

3.18 How many bitstrings of length 77 are there that start with 010 (i.e.,
have 010 at positions 1, 2, and 3), or have 101 at positions 2, 3, and 4, or
have 010 at positions 3, 4, and 5?


3.19 Let n ≥ 12 be an integer and let {B1 , B2 , . . . , Bn } be a set of n beer
bottles. Consider permutations of these bottles such that there are exactly
10 bottles between B1 and Bn . (B1 can be to the left or right of Bn .) Prove
that the number of such permutations is equal to
$$\binom{n-2}{10} \cdot 10! \cdot 2 \cdot (n - 11)!.$$
3.20 Let n ≥ 3 be an integer. The Gn (or Group of n) is an international
forum where the n leaders of the world meet to drink beer together. Two
of these leaders are Donald Trump and Justin Trudeau. At the end of their
meeting, the n leaders stand on a line and a group photo is taken.
   • Determine the number of ways in which the n leaders can be arranged
     on a line, if Donald Trump and Justin Trudeau are standing next to
     each other.
   • Determine the number of ways in which the n leaders can be arranged
     on a line, if Donald Trump and Justin Trudeau are not standing next
     to each other.
   • Determine the number of ways in which the n leaders can be arranged
     on a line, if Donald Trump is to the left of Justin Trudeau. (Donald
     does not necessarily stand immediately to the left of Justin.)
3.21 A string of letters is called a palindrome, if reading the string from left
to right gives the same result as reading the string from right to left. For
example, madam and racecar are palindromes. Recall that there are five
vowels in the English alphabet: a, e, i, o, and u.
    In this exercise, we consider strings consisting of 28 characters, with each
character being a lowercase letter. Determine the number of such strings that
start and end with the same letter, or are palindromes, or contain vowels only.
3.22 A flip in a bitstring is a pair of adjacent bits that are not equal. For
example, the bitstring 010011 has three flips: The first two bits form a flip,
the second and third bits form a flip, and the fourth and fifth bits form a
flip.
   • Determine the number of bitstrings of length 7 that have exactly 3 flips
     at the following positions: The second and third bits form a flip, the
     third and fourth bits form a flip, and the fifth and sixth bits form a
     flip.


     • Let n ≥ 2 and k be integers with 0 ≤ k ≤ n − 1. Determine the number
       of bitstrings of length n that have exactly k flips.

3.23 Let m and n be integers with m ≥ n ≥ 1. How many ways are there
to place m books on n shelves, if there must be at least one book on each
shelf? As in Section 3.1.3, the order on each shelf matters.

3.24 You are given m distinct books B1 , B2 , . . . , Bm and n identical blocks
of wood. How many ways are there to arrange these books and blocks in a
straight line?
    For example, if m = 5 and n = 3, then three possible arrangements are
(W stands for a block of wood)

                             W B3 B1 W B5 B4 W B2 ,

                             W B1 B3 W B5 B4 W B2 ,
and
                             B5 W B3 B1 W W B2 B4 .

3.25 Let n ≥ 1 be an integer and consider n boys and n girls. For each
of the following three cases, determine how many ways there are to arrange
these 2n people on a straight line (the order on the line matters):

     • All boys stand next to each other and all girls stand next to each other.

     • All girls stand next to each other.

     • Boys and girls alternate.

3.26 Elisa Kazan has a set {C1 , C2 , . . . , C50 } consisting of 50 cider bottles.
She divides these bottles among 5 friends, so that each friend receives a
subset consisting of 10 bottles. Determine the number of ways in which Elisa
can divide the bottles.

3.27 Let n ≥ 1 be an integer. Consider a tennis tournament with 2n par-
ticipants. In the first round of this tournament, n games will be played and,
thus, the 2n people have to be divided into n pairs. What is the number of
ways in which this can be done?


3.28 The Ottawa Senators and the Toronto Maple Leafs play a best-of-7
series: These two hockey teams play games against each other, and the first
team to win 4 games wins the series. Each game has a winner (thus, no game
ends in a tie).
    A sequence of games can be described by a string consisting of the char-
acters S (indicating that the Senators win the game) and L (indicating that
the Leafs win the game). Two possible ways for the Senators to win the
series are (L, S, S, S, S) and (S, L, S, L, S, S).
    Determine the number of ways in which the Senators can win the series.
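A small enumeration can sanity-check an answer here. The sketch below (an editorial brute force, not the book's method) generates every series that is decided exactly at its last game and confirms that, by symmetry, both teams have the same number of winning scenarios:

```python
from itertools import product

def winning_sequences(team):
    # Enumerate all series of 4 to 7 games won by the given team;
    # the series must be decided exactly at the last game.
    result = []
    for games in range(4, 8):
        for seq in product("SL", repeat=games):
            wins = seq.count(team)
            if wins == 4 and games - wins <= 3 and seq[-1] == team:
                result.append(seq)
    return result

senators = winning_sequences("S")
leafs = winning_sequences("L")
print(len(senators), len(leafs))  # the two counts are equal by symmetry
```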

3.29 The Beer Committee of the Carleton Computer Science Society has
bought large quantities of 10 different types of beer. In order to test which
beer students prefer, the committee does the following experiment:

    • Out of the n ≥ 10 students in Carleton’s Computer Science program,
      10 students are chosen.

    • Each of the 10 students chosen drinks one of the 10 beers; no two
      students drink the same beer.

What is the number of ways in which this experiment can be done?

3.30 Let m, n, k, and ℓ be integers such that m ≥ 1, n ≥ 1, 1 ≤ ℓ ≤ k ≤ ℓ+m
and ℓ ≤ n.
    After a week of hard work, Elisa Kazan goes to her neighborhood pub.
This pub has m different types of beer and n different types of cider on tap.
Elisa decides to order k pints: At most one pint of each type, and exactly ℓ
pints of cider. Determine the number of ways in which Elisa can order these
k pints. The order in which Elisa orders matters.

3.31 Let m ≥ 2 and n ≥ 2 be even integers. You are given m beer bottles
B1 , B2 , . . . , Bm and n cider bottles C1 , C2 , . . . , Cn . Assume you arrange these
m + n bottles on a horizontal line such that

    • the leftmost m/2 bottles are all beer bottles, and

    • the rightmost n/2 bottles are all cider bottles.

How many such arrangements are there?


3.32 Consider 10 male students M1 , M2 , . . . , M10 and 7 female students F1 ,
F2 , . . . , F7 . Assume these 17 students are arranged on a horizontal line such
that no two female students are standing next to each other. We are inter-
ested in the number of such arrangements, where the order of the students
matters.

     • Explain what is wrong with the following argument:

       We are going to use the Product Rule:

          – Task 1: Arrange the 7 females on a line. There are 7! ways
            to do this.

          – Task 2: Choose 6 males. There are $\binom{10}{6}$ ways to do
            this.

          – Task 3: Place the 6 males chosen in Task 2 in the 6 “gaps”
            between the females. There are 6! ways to do this.

          – Task 4: At this moment, we have arranged 13 students on
            a line. We are left with 4 males that have to be placed.

               ∗ Task 4.1: Place one male. There are 14 ways to do
                 this.
               ∗ Task 4.2: Place one male. There are 15 ways to do
                 this.
               ∗ Task 4.3: Place one male. There are 16 ways to do
                 this.
               ∗ Task 4.4: Place one male. There are 17 ways to do
                 this.

       By the Product Rule, the total number of ways to arrange the
       17 students is equal to

           $7! \cdot \binom{10}{6} \cdot 6! \cdot 14 \cdot 15 \cdot 16 \cdot 17 = 43{,}528{,}181{,}760{,}000.$


     • Determine the number of ways to arrange the 17 students.
       Hint: Use the Product Rule. What is easier to count: Placing the


        female students first and then the male students, or placing the male
        students first and then the female students?

3.33 Let n ≥ 1 be an integer. A function f : {1, 2, . . . , n} → {1, 2, . . . , n}
is called awesome, if there is at least one integer i in {1, 2, . . . , n} for which
f (i) = i. Determine the number of awesome functions.
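Small cases of this exercise can be checked by brute force. The sketch below (an editorial aid in Python, not part of the exercise) encodes a function as the tuple of its values and counts the awesome ones directly:

```python
from itertools import product

def count_awesome(n):
    # Encode f: {1,...,n} -> {1,...,n} as the tuple (f(1), ..., f(n));
    # brute force over all n^n functions, feasible only for small n.
    domain = range(1, n + 1)
    return sum(1 for f in product(domain, repeat=n)
               if any(f[i - 1] == i for i in domain))

print([count_awesome(n) for n in range(1, 5)])  # → [1, 3, 19, 175]
```

A formula obtained via the Complement Rule should reproduce these values.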

3.34 Let n ≥ 2 be an integer. Consider strings consisting of n digits.
   • Determine the number of such strings, in which no two consecutive
     digits are equal.
   • Determine the number of such strings, in which there is at least one
     pair of consecutive digits that are equal.

3.35 Consider strings consisting of 12 characters, each character being a, b,
or c. Such a string is called valid, if at least one of the characters is missing.
For example, abababababab is a valid string, whereas abababacabab is not a
valid string. How many valid strings are there?

3.36 Consider strings consisting of 40 characters, where each character is an
element of {a, b, c}. Such a string is called cool, if it contains exactly 8 many
a’s or exactly 7 many b’s. Determine the number of cool strings.

3.37 A password consists of 100 characters, each character being a digit or
a lowercase letter. A password must contain at least two digits. How many
passwords are there?

3.38 A password is a string of ten characters, where each character is a
lowercase letter, a digit, or one of the eight special characters !, @, #, $, %,
&, (, and ).
   A password is called awesome, if it contains at least one digit or at least
one special character. Determine the number of awesome passwords.

3.39 A password consists of 100 characters, each character being a digit, a
lowercase letter, or an uppercase letter. A password must contain at least
one digit, at least one lowercase letter, and at least one uppercase letter.
How many passwords are there?
Hint: Recall De Morgan’s Law

                            $\overline{A \cap B \cap C} = \overline{A} \cup \overline{B} \cup \overline{C}.$


3.40 A password is a string of 100 characters, where each character is a digit
or a lowercase letter. A password is called valid, if

     • it does not start with abc, and

     • it does not end with xyz, and

     • it does not start with 3456.

Determine the number of valid passwords.

3.41 A password is a string of 8 characters, where each character is a low-
ercase letter or a digit. A password is called valid, if it contains at least one
digit. In Section 3.3, we have seen that the number of valid passwords is
equal to
                        $36^8 - 26^8 = 2{,}612{,}282{,}842{,}880.$
Explain what is wrong with the following method to count the valid pass-
words.
     We are going to use the Product Rule.

         • The procedure is “write a valid password”.

         • Since a valid password contains at least one digit, we choose,
           in the first task, a position for the digit.

         • The second task is to write a digit at the chosen position.

         • The third task is to write a character (lowercase letter or digit)
           at each of the remaining 7 positions.

     There are 8 ways to do the first task, 10 ways to do the second task,
     and $36^7$ ways to do the third task. Therefore, by the Product Rule,
     the number of valid passwords is equal to

                        $8 \cdot 10 \cdot 36^7 = 6{,}269{,}133{,}127{,}680.$
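The overcounting is easiest to see on a scaled-down instance. The sketch below (a toy version invented for illustration, not part of the exercise) uses length-2 passwords over the 4-character alphabet {a, b, 0, 1}: the flawed procedure produces every password once per digit position it could have "chosen".

```python
from itertools import product

alphabet = "ab01"
digits = "01"

# "Valid" = contains at least one digit; brute force over all 4^2 passwords.
valid = [p for p in product(alphabet, repeat=2)
         if any(c in digits for c in p)]
correct = len(valid)                      # complement rule: 4^2 - 2^2 = 12
flawed = 2 * len(digits) * len(alphabet)  # positions x digits x free char = 16

# The gap is exactly the number of two-digit passwords, each of which
# the flawed procedure generates once per choice of "the" digit position.
print(correct, flawed)  # → 12 16
```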


3.42 Consider permutations of the 26 lowercase letters a, b, c, . . . , z.

     • How many such permutations contain the string wine?


   • How many such permutations do not contain any of the strings wine,
     vodka, or coke?

3.43 Determine the number of integers in the set {1, 2, . . . , 1000} that are
not divisible by any of 5, 7, and 11.

3.44 Let n ≥ 4 be an integer. Determine the number of permutations of
{1, 2, . . . , n}, in which

   • 1 and 2 are next to each other, with 1 to the left of 2, or

   • 4 and 3 are next to each other, with 4 to the left of 3.

3.45 Determine the number of functions

                         f : {1, 2, 3, 4} → {a, b, c, . . . , z},

such that f (1) = f (2), or f (3) = f (4), or f (1) ≠ f (3).

3.46 Let n ≥ 3 be an integer. Determine the number of permutations of
{1, 2, . . . , n}, in which

   • 1 and 2 are next to each other, with 1 to the left of 2, or

   • 2 and 3 are next to each other, with 2 to the left of 3.

Compare your answer with the answer to Exercise 3.44.

3.47 Let n and k be integers with 2 ≤ k ≤ n, and consider the set S =
{1, 2, 3, . . . , 2n}. An ordered sequence of k elements of S is called valid, if

   • this sequence is strictly increasing, or

   • this sequence is strictly decreasing, or

   • this sequence contains only even numbers (and duplicate elements are
     allowed).

Determine the number of valid sequences.

3.48 Let n ≥ 2 be an integer.


     • Determine the number of strings consisting of n characters, where each
       character is an element of the set {a, b, 0}.

     • Let S be a set consisting of n elements. Determine the number of
       ordered pairs (A, B), where A ⊆ S, B ⊆ S, and A ∩ B = ∅.

     • Let S be a set consisting of n elements. Consider ordered pairs (A, B),
       where A ⊆ S, B ⊆ S, and |A ∩ B| = 1. Prove that the number of such
       pairs is equal to $n \cdot 3^{n-1}$.

3.49 In a group of 20 people,

     • 6 are blond,

     • 7 have green eyes,

     • 11 are not blond and do not have green eyes.

How many people in this group are blond and have green eyes?

3.50 Let n ≥ 1 be an integer.

     • Assume that n is odd. Determine the number of bitstrings of length n
       that contain more 0’s than 1’s. Justify your answer in plain English.

     • Assume that n is even.

         – Determine the number of bitstrings of length n in which the num-
           ber of 0’s is equal to the number of 1’s.
         – Determine the number of bitstrings of length n that contain strictly
           more 0’s than 1’s.
          – Argue that the binomial coefficient $\binom{n}{n/2}$ is an even
            integer.

3.51 Use Pascal’s Identity (Theorem 3.7.2) to prove Newton’s Binomial The-
orem (i.e., Theorem 3.6.5) by induction.


3.52 Determine the coefficient of $x^{111} y^{444}$ in the expansion of
                                  $(-17x + 71y)^{555}$.
3.53 Nick is not only your friendly TA², he also has a part-time job in a
grocery store. This store sells n different types of India Pale Ale (IPA) and
n different types of wheat beer, where n ≥ 2 is an integer. Prove that
                       $\binom{2n}{2} = 2\binom{n}{2} + n^2,$
by counting, in two different ways, the number of ways to choose two different
types of beer.
3.54 You have won the first prize in the Louis van Gaal Impersonation Con-
test³. When you arrive at Louis’ home to collect your prize, you see n beer
bottles B1 , B2 , . . . , Bn , n cider bottles C1 , C2 , . . . , Cn , and n wine bottles
W1 , W2 , . . . , Wn . Here, n is an integer with n ≥ 2. Louis tells you that
your prize consists of one beer bottle of your choice, one cider bottle of your
choice, and one wine bottle of your choice.
    Prove that
                    $n^3 = (n-1)^3 + 3(n-1)^2 + 3(n-1) + 1,$
by counting, in two different ways, the number of ways in which you can
choose your prize.
3.55 Let n ≥ 4 be an integer and consider the set S = {1, 2, . . . , n}. Let k
be an integer with 2 ≤ k ≤ n − 2. In this exercise, we consider subsets A
of S for which |A| = k and {1, 2} ⊈ A. Let N denote the number of such
subsets.
    • Use the Sum Rule to determine N .
    • Use the Complement Rule to determine N .
    • Use the above two results to prove that
             $\binom{n}{k} = \binom{n-2}{k} + 2\binom{n-2}{k-1} + \binom{n-2}{k-2}.$
   ² Winter term 2017.
   ³ Louis van Gaal has been coach of AZ, Ajax, Barcelona, Bayern München, Manchester
United, and the Netherlands.


3.56 Let k ≥ 1 be an integer and consider a sequence n1 , n2 , . . . , nk of posi-
tive integers. Use a combinatorial proof to show that
       $\binom{n_1}{2} + \binom{n_2}{2} + \cdots + \binom{n_k}{2} \le \binom{n_1 + n_2 + \cdots + n_k}{2}.$
Hint: For each i with 1 ≤ i ≤ k, consider the complete graph on ni vertices.
How many edges does this graph have?
3.57 Let n ≥ 1 be an integer. Prove that
                    $\sum_{k=1}^{n} \binom{n}{k} \binom{n}{k-1} = \binom{2n}{n+1},$

by determining, in two different ways, the number of ways to choose n + 1
people from a group consisting of n men and n women.
3.58 Let n ≥ 1 be an integer. Use Newton’s Binomial Theorem (i.e., Theo-
rem 3.6.5) to prove that
                    $\sum_{k=1}^{n} \binom{n}{k} 10^k \cdot 26^{n-k} = 36^n - 26^n.$              (3.5)

In the rest of this exercise, you will give a combinatorial proof of this identity.
    Consider passwords consisting of n characters, each character being a
digit or a lowercase letter. A password must contain at least one digit.
     • Use the Complement Rule of Section 3.3 to show that the number of
       passwords is equal to $36^n - 26^n$.
     • Let k be an integer with 1 ≤ k ≤ n. Prove that the number of
       passwords with exactly k digits is equal to $\binom{n}{k} 10^k \cdot 26^{n-k}$.
     • Explain why the above two parts imply the identity in (3.5).
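Identity (3.5) can also be checked numerically before proving it. A quick sketch, using Python's math.comb as an editorial convenience:

```python
from math import comb

# Check identity (3.5) for small n.
for n in range(1, 9):
    lhs = sum(comb(n, k) * 10**k * 26**(n - k) for k in range(1, n + 1))
    assert lhs == 36**n - 26**n
print("identity (3.5) holds for n = 1, ..., 8")
```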
3.59 Use Newton’s Binomial Theorem (i.e., Theorem 3.6.5) to prove that
for every integer n ≥ 1,
                           $\sum_{k=0}^{n} \binom{n}{k} 2^k = 3^n.$                        (3.6)
In the rest of this exercise, you will give a combinatorial proof of this identity.
    Let A = {1, 2, 3, . . . , n} and B = {a, b, c}. According to Theorem 3.1.2,
the number of functions f : A → B is equal to $3^n$.


   • Consider a fixed integer k with 0 ≤ k ≤ n and a fixed subset S of A
     having size k. Determine the number of functions f : A → B having
     the property that

         – for all x ∈ S, f (x) ∈ {a, b}, and
         – for all x ∈ A \ S, f (x) = c.

   • Explain why this implies the identity in (3.6).

3.60 Use Newton’s Binomial Theorem (i.e., Theorem 3.6.5) to prove that
for every integer n ≥ 2,
                          $\sum_{k=0}^{n} \binom{n}{k} (n-1)^{n-k} = n^n.$                 (3.7)

In the rest of this exercise, you will give a combinatorial proof of this identity.
    Consider the set A = {1, 2, . . . , n}. According to Theorem 3.1.2, the
number of functions f : A → A is equal to $n^n$.

   • Consider a fixed integer k with 0 ≤ k ≤ n and a fixed subset S of A
     having size k. Determine the number of functions f : A → A having
     the property that

         – for all x ∈ S, f (x) = x, and
         – for all x ∈ A \ S, f (x) 6= x.

   • Explain why this implies the identity in (3.7).

3.61 Let n ≥ 66 be an integer and consider the set S = {1, 2, . . . , n}.
   • Let k be an integer with 66 ≤ k ≤ n. How many 66-element subsets of
     S are there whose largest element is equal to k?

   • Use the result in the first part to prove that
                               $\sum_{k=66}^{n} \binom{k-1}{65} = \binom{n}{66}.$

3.62 Let a ≥ 0, b ≥ 0, and n ≥ 0 be integers, and consider the set S =
{1, 2, 3, . . . , a + b + n + 1}.


     • How many subsets of size a + b + 1 does S have?
     • Let k be an integer with 0 ≤ k ≤ n. Consider subsets T of S such that
       |T | = a + b + 1 and the (a + 1)-st smallest element in T is equal to
       a + k + 1. How many such subsets T are there?
     • Use the above results to prove that
                  $\sum_{k=0}^{n} \binom{a+k}{k} \binom{b+n-k}{n-k} = \binom{a+b+n+1}{n}.$

3.63 Let n ≥ 0 and k ≥ 0 be integers.
     • How many bitstrings of length n + 1 have exactly k + 1 many 1s?
     • Let i be an integer with k ≤ i ≤ n. What is the number of bitstrings of
       length n+1 that have exactly k +1 many 1s and in which the rightmost
       1 is at position i + 1?
     • Use the above two results to prove that
                               $\sum_{i=k}^{n} \binom{i}{k} = \binom{n+1}{k+1}.$
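Before constructing the combinatorial proof, the identity of Exercise 3.63 can be confirmed on small cases; a sketch (again using math.comb as a convenience):

```python
from math import comb

# Check the identity of Exercise 3.63 for all small parameter pairs.
for n in range(12):
    for k in range(n + 1):
        assert sum(comb(i, k) for i in range(k, n + 1)) == comb(n + 1, k + 1)
print("identity verified for all 0 <= k <= n <= 11")
```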

3.64 Let k, m, and n be integers with 0 ≤ k ≤ m ≤ n, and let S be a set of
size n. Prove that
                        $\binom{n}{k} \binom{n-k}{m-k} = \binom{n}{m} \binom{m}{k},$
by determining, in two different ways, the number of ordered pairs (A, B)
with A ⊆ S, B ⊆ S, A ⊆ B, |A| = k, and |B| = m.

3.65 Let m and n be integers with 0 ≤ m ≤ n, and let S be a set of size n.
Prove that
                       $\sum_{k=m}^{n} \binom{n}{k} \binom{k}{m} = 2^{n-m} \binom{n}{m},$
by determining, in two different ways, the number of ordered pairs (A, B)
with A ⊆ S, |A| = m, B ⊆ S, and A ∩ B = ∅.
Hint: The size of B can be any of the values n − m, n − (m + 1), n − (m +
2), . . . , n−n. What is the number of pairs (A, B) having the properties above
and for which |B| = n − k?


3.66 Let m and n be integers with 0 ≤ m ≤ n.

   • How many bitstrings of length n + 1 have exactly m many 1s?

   • Let k be an integer with 0 ≤ k ≤ m. What is the number of bitstrings
     of length n + 1 that have exactly m many 1s and that start with
     $\underbrace{1 \cdots 1}_{k}\,0$?

   • Use the above two results to prove that
                           $\sum_{k=0}^{m} \binom{n-k}{m-k} = \binom{n+1}{m}.$

3.67 Let m and n be integers with 0 ≤ m ≤ n. Use Exercises 3.10, 3.64,
and 3.66 to prove that
                         $\sum_{k=0}^{m} \frac{\binom{m}{k}}{\binom{n}{k}} = \frac{n+1}{n+1-m}.$

3.68 Let n ≥ 1 be an integer. Prove that
                        $\sum_{k=1}^{n} k \binom{n}{k}^2 = n \binom{2n-1}{n-1},$

by determining, in two different ways, the number of ways a committee can
be chosen from a group of n men and n women. Such a committee has a
woman as the chair and n − 1 other members.

3.69 Let n ≥ 2 be an integer and consider the set S = {1, 2, . . . , n}. An
ordered triple (A, x, y) is called awesome, if (i) A ⊆ S, (ii) x ∈ A, and (iii)
y ∈ A.

   • Let k be an integer with 1 ≤ k ≤ n. Determine the number of awesome
     triples (A, x, y) with |A| = k.

   • Prove that the number of awesome triples (A, x, y) with x = y is equal
     to
                                   $n \cdot 2^{n-1}$.

   • Determine the number of awesome triples (A, x, y) with x 6= y.


     • Use the above results to prove that
                $\sum_{k=1}^{n} k^2 \binom{n}{k} = n(n-1) \cdot 2^{n-2} + n \cdot 2^{n-1}.$

3.70 Let n ≥ 1 be an integer, and let X and Y be two disjoint sets, each
consisting of n elements. An ordered triple (A, B, C) of sets is called cool, if

                  A ⊆ X, B ⊆ Y, C ⊆ B, and |A| + |B| = n.

     • Let k be an integer with 0 ≤ k ≤ n. Determine the number of cool
       triples (A, B, C) for which |A| = k.

     • Let k be an integer with 0 ≤ k ≤ n. Determine the number of cool
       triples (A, B, C) for which |C| = k.

     • Use the above two results to prove that
            $\sum_{k=0}^{n} \binom{n}{k}^2 \cdot 2^{n-k} = \sum_{k=0}^{n} \binom{n}{k} \binom{2n-k}{n}.$

3.71 Let m ≥ 1 and n ≥ 1 be integers. Consider a rectangle whose horizontal
side has length m and whose vertical side has length n. A path from the
bottom-left corner to the top-right corner is called valid, if in each step, it
either goes one unit to the right or one unit upwards. In the example below,
you see a valid path for the case when m = 5 and n = 3.

   [Figure: a rectangle with horizontal side m = 5 and vertical side n = 3,
   showing one valid path from the bottom-left to the top-right corner.]

How many valid paths are there?
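A short recursion counts the valid paths without revealing a closed form, so it can be used to check an answer. A sketch (memoized with functools.lru_cache, an implementation convenience):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def paths(m, n):
    # A valid path enters the top-right corner either from the left
    # (last step rightwards) or from below (last step upwards).
    if m == 0 or n == 0:
        return 1  # only one monotone path along the boundary
    return paths(m - 1, n) + paths(m, n - 1)

print(paths(5, 3))  # the m = 5, n = 3 instance from the figure
```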


3.72 Let n ≥ 1 be an integer. Prove that
                           $\sum_{k=1}^{n} k \binom{n}{k} = n \cdot 2^{n-1}.$

Hint: Take the derivative of $(1 + x)^n$.
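The identity is easy to confirm numerically before working out the derivative argument; a sketch using math.comb:

```python
from math import comb

# Check the identity of Exercise 3.72 for small n.
for n in range(1, 13):
    assert sum(k * comb(n, k) for k in range(1, n + 1)) == n * 2**(n - 1)
print("identity verified for n = 1, ..., 12")
```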

3.73 A string consisting of characters is called cool, if exactly one character
in the string is equal to the letter x and each other character is a digit. Let
n ≥ 1 be an integer.
   • Determine the number of cool strings of length n.
   • Let k be an integer with 1 ≤ k ≤ n. Determine the number of cool
     strings of length n that contain exactly n − k many 0’s.
   • Use the above two results to prove that
                        $\sum_{k=1}^{n} k \binom{n}{k} 9^{k-1} = n \cdot 10^{n-1}.$

3.74 Let n ≥ 1 be an integer. We consider binary 2 × n matrices, i.e.,
matrices with 2 rows and n columns, in which each entry is 0 or 1. Any
column in such a matrix is of one of four types, based on the bits that occur in
this column. We will refer to these types as 00-columns, 01-columns, 10-columns,
and 11-columns. For example, in the 2 × 7 matrix below, the first, second, and
fifth columns are 01-columns, the third and seventh columns are 11-columns,
the fourth column is a 00-column, and the sixth column is a 10-column.

                            0   0   1   0   0   1   1
                            1   1   1   0   1   0   1

   For the rest of this exercise, let k be an integer with 0 ≤ k ≤ 2n. A
binary 2 × n matrix is called awesome, if it contains exactly k many 0’s.

   • How many 1’s are there in an awesome 2 × n matrix?
   • How many awesome 2 × n matrices are there?
   • Let i be an integer and consider an arbitrary awesome 2 × n matrix M
     with exactly n − i many 11-columns.


          – Prove that ⌈k/2⌉ ≤ i ≤ k.
         – Determine the number of 01-columns plus the number of 10-columns
           in M .

     • Let i be an integer. Prove that the number of awesome 2 × n matrices
       with exactly n − i many 11-columns is equal to
                               $2^{2i-k} \binom{n}{n-i} \binom{i}{2i-k}.$

     • Use the above results to prove that
                $\sum_{i=\lceil k/2 \rceil}^{k} 2^{2i} \binom{n}{i} \binom{i}{k-i} = 2^k \binom{2n}{k}.$


3.75 How many different strings can be obtained by reordering the letters of
the word MississippiMills? (This is a town close to Ottawa. James Naismith,
the inventor of basketball, was born there.)

3.76 In this exercise, we consider strings that can be obtained by reordering
the letters of the word ENGINE.

     • Determine the number of strings that can be obtained.

     • Determine the number of strings in which the two letters E are next to
       each other.

     • Determine the number of strings in which the two letters E are not next
       to each other and the two letters N are not next to each other.

3.77 Determine the number of elements x in the set {1, 2, 3, . . . , 99999} for
which the sum of the digits in the decimal representation of x is equal to 8.
An example of such an element x is 3041.
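A direct enumeration over the full range is fast enough to check an answer to this exercise; a sketch (editorial, not part of the text):

```python
def digit_sum(x):
    # Sum of the decimal digits of x.
    return sum(int(d) for d in str(x))

# Brute-force count over the whole set {1, 2, ..., 99999}.
count = sum(1 for x in range(1, 100000) if digit_sum(x) == 8)
print(digit_sum(3041), count)
```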

3.78 In Theorems 3.9.1 and 3.9.2, we have seen how many solutions (in
non-negative integers) there are for equations of the type

                            x1 + x2 + · · · + xk = n


and inequalities of the type

                             x1 + x2 + · · · + xk ≤ n.

Use this to prove the following identity:
                        $\sum_{i=0}^{n} \binom{i+k-1}{k-1} = \binom{n+k}{k}.$

3.79 Let n and k be integers with n ≥ k ≥ 1. How many solutions are there
to the equation
                         x1 + x2 + · · · + xk = n,
where x1 ≥ 1, x2 ≥ 1, . . . , xk ≥ 1 are integers?
Hint: In Theorem 3.9.1, we have seen the answer if x1 ≥ 0, x2 ≥ 0, . . . ,
xk ≥ 0.

3.80 In this exercise, we consider sequences consisting of five digits.
   • Determine the number of 5-digit sequences d1 d2 d3 d4 d5 , whose digits are
     decreasing, i.e., d1 > d2 > d3 > d4 > d5 .
   • Determine the number of 5-digit sequences d1 d2 d3 d4 d5 , whose digits are
     non-increasing, i.e., d1 ≥ d2 ≥ d3 ≥ d4 ≥ d5 .
        Hint: Consider the numbers x1 = d1 −d2 , x2 = d2 −d3 , x3 = d3 −d4 , x4 =
        d4 − d5 , x5 = d5 . What do you know about x1 + x2 + x3 + x4 + x5 ?

3.81 The square in the left figure below is divided into nine cells. In each
cell, we write one of the numbers −1, 0, and 1.

   [Figure: an empty 3 × 3 square on the left; on the right, an example in
   which the rows are (0, 1, 0), (1, 1, −1), and (−1, 0, −1).]

    Use the Pigeonhole Principle to prove that, among the rows, columns,
and main diagonals, there exist two that have the same sum. For example,
in the right figure above, both main diagonals have sum 0. (Also, the two
topmost rows both have sum 1, whereas the bottom row and the right column
both have sum −2.)


3.82 Let S be a set consisting of 19 two-digit integers. Thus, each element
of S belongs to the set {10, 11, . . . , 99}.
    Use the Pigeonhole Principle to prove that this set S contains two distinct
elements x and y, such that the sum of the two digits of x is equal to the
sum of the two digits of y.
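The pigeonholes here are the possible digit sums; a one-line computation (a sketch to guide the proof, not the proof itself) confirms how many there are:

```python
# The possible digit sums of the two-digit integers 10, 11, ..., 99.
sums = {x // 10 + x % 10 for x in range(10, 100)}
print(len(sums), min(sums), max(sums))  # → 18 1 18
```

Since |S| = 19 exceeds the number of possible sums, two elements of S must share a digit sum.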

3.83 Let S be a set consisting of 9 people. Every person x in S has an age
age(x), which is an integer with 1 ≤ age(x) ≤ 60.

   • Assume that there are two people in S having the same age. Prove that
     there exist two distinct subsets A and B of S such that (i) both A and
     B are non-empty, (ii) A ∩ B = ∅, and
     (iii) $\sum_{x \in A} age(x) = \sum_{x \in B} age(x)$.

   • Assume that all people in S have different ages. Use the Pigeonhole
     Principle to prove that there exist two distinct subsets A and B of S
     such that (i) both A and B are non-empty, and
     (ii) $\sum_{x \in A} age(x) = \sum_{x \in B} age(x)$.

   • Assume that all people in S have different ages. Prove that there
     exist two distinct subsets A and B of S such that (i) both A and B
     are non-empty, (ii) A ∩ B = ∅, and
     (iii) $\sum_{x \in A} age(x) = \sum_{x \in B} age(x)$.

3.84 Let n ≥ 1 be an integer. Use the Pigeonhole Principle to prove that in
any set of n + 1 integers from {1, 2, . . . , 2n}, there are two elements that are
consecutive (i.e., differ by one).

3.85 Let n ≥ 1 be an integer. Use the Pigeonhole Principle to prove that in
any set of n + 1 integers from {1, 2, . . . , 2n}, there are two elements whose
sum is equal to 2n + 1.

3.86 Let S1 , S2 , . . . , S50 be a sequence consisting of 50 subsets of the set
{1, 2, . . . , 55}. Assume that each of these 50 subsets consists of at least seven
elements.
    Use the Pigeonhole Principle to prove that there exist two distinct indices
i and j, such that the largest element in Si is equal to the largest element
in Sj .

3.87 Consider five points in a square with sides of length one. Use the Pi-
geonhole Principle to prove that there are two of these points having distance
at most $1/\sqrt{2}$.


3.88 Let S1 , S2 , . . . , S26 be a sequence consisting of 26 subsets of the set
{1, 2, . . . , 9}. Assume that each of these 26 subsets consists of at most three
elements. Use the Pigeonhole Principle to prove that there exist two distinct
indices i and j, such that
                          $\sum_{x \in S_i} x = \sum_{x \in S_j} x,$

i.e., the sum of the elements in Si is equal to the sum of the elements in Sj .
Hint: What are the possible values for $\sum_{x \in S_i} x$?

3.89 Let S be a set of 90 positive integers, each one having at most 25 digits
in decimal notation. Use the Pigeonhole Principle to prove that there are
two distinct subsets A and B of S that have the same sum, i.e.,
                                  Σ_{x∈A} x = Σ_{x∈B} x.


3.90 Let n ≥ 2 be an integer.

   • Let S be a set of n + 1 integers. Prove that S contains two elements
     whose difference is divisible by n.
        Hint: Use the Pigeonhole Principle.

   • Prove that there is an integer that is divisible by n and whose decimal
     representation only contains the digits 0 and 5.
        Hint: Consider the integers 5, 55, 555, 5555, . . .

3.91 In this exercise, we consider the sequence

                            3^0 , 3^1 , 3^2 , . . . , 3^1000

of integers.

   • Prove that this sequence contains two distinct elements whose difference
     is divisible by 1000. That is, prove that there exist two integers m and
     n with 0 ≤ m < n ≤ 1000, such that 3^n − 3^m is divisible by 1000.
        Hint: Consider each element in the sequence modulo 1000 and use the
        Pigeonhole Principle.

     • Use the first part to prove that the sequence

                              3^1 , 3^2 , . . . , 3^1000

       contains an element whose decimal representation ends with 001. In
       other words, the last three digits in the decimal representation are 001.

3.92 Let n ≥ 2 be an integer and let G = (V, E) be a graph whose vertex set
V has size n and whose edge set E is non-empty. The degree of any vertex u
is defined to be the number of edges in E that contain u as a vertex. Prove
that there exist at least two vertices in G that have the same degree.
Hint: Consider the cases when G is connected and G is not connected sepa-
rately. In each case, apply the Pigeonhole Principle. Alternatively, consider
a vertex of maximum degree together with its adjacent vertices and, again,
apply the Pigeonhole Principle.

3.93 Let d ≥ 1 be an integer. A point p in R^d is represented by its d
real coordinates as p = (p1 , p2 , . . . , pd ). The midpoint of two points p =
(p1 , p2 , . . . , pd ) and q = (q1 , q2 , . . . , qd ) is the point

                   ((p1 + q1 )/2, (p2 + q2 )/2, . . . , (pd + qd )/2).

Let P be a set of 2^d + 1 points in R^d , all of which have integer coordinates.
    Use the Pigeonhole Principle to prove that this set P contains two distinct
elements whose midpoint has integer coordinates.
Hint: The sum of two even integers is even, and the sum of two odd integers
is even.
Chapter 4

Recursion

             In order to understand recursion, you must first understand
             recursion.

   Recursion is the concept where an object (such as a function, a set, or an
algorithm) is defined in the following way:
   • There are one or more base cases.
   • There are one or more rules that define an object in terms of “smaller”
     objects that have already been defined.
In this chapter, we will see several examples of such recursive definitions and
how to use them to solve counting problems.


4.1     Recursive Functions
Recall that N = {0, 1, 2, . . .} denotes the set of natural numbers. Consider
the following recursive definition of a function f : N → N:
                    f (0) = 3,
                    f (n) = 2 · f (n − 1) + 3, if n ≥ 1.
These two rules indeed define a function, because f (0) is uniquely defined
and for any integer n ≥ 1, if f (n − 1) is uniquely defined, then f (n) is also
uniquely defined, because it is equal to 2·f (n−1)+3. Therefore, by induction,
for any natural number n, the function value f (n) is uniquely defined. We
can obtain the values f (n) in the following way:

     • We are given that f (0) = 3.

     • If we apply the recursive rule with n = 1, then we get

                         f (1) = 2 · f (0) + 3 = 2 · 3 + 3 = 9.

     • If we apply the recursive rule with n = 2, then we get

                        f (2) = 2 · f (1) + 3 = 2 · 9 + 3 = 21.

     • If we apply the recursive rule with n = 3, then we get

                        f (3) = 2 · f (2) + 3 = 2 · 21 + 3 = 45.

     • If we apply the recursive rule with n = 4, then we get

                        f (4) = 2 · f (3) + 3 = 2 · 45 + 3 = 93.

Can we “solve” this recurrence relation? That is, can we express f (n) in
terms of n only? By looking at these values, you may see a pattern, i.e., you
may guess that for each n ≥ 0,

                              f (n) = 3 · 2^(n+1) − 3.                      (4.1)

We prove by induction that this is correct: If n = 0, then f (n) = f (0) = 3
and 3 · 2^(n+1) − 3 = 3 · 2^(0+1) − 3 = 3. Thus, (4.1) is true for n = 0. Let n ≥ 1
and assume that (4.1) is true for n − 1, i.e., assume that

                             f (n − 1) = 3 · 2^n − 3.

Then

                          f (n) = 2 · f (n − 1) + 3
                                = 2 (3 · 2^n − 3) + 3
                                = 3 · 2^(n+1) − 3.

Thus, we have proved by induction that (4.1) holds for all integers n ≥ 0.
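
The recurrence and the closed form (4.1) are easy to compare with a short program. The following Python sketch (an illustration, not part of the text) computes f (n) directly from the recursive definition and checks it against 3 · 2^(n+1) − 3:

```python
def f(n):
    # Recursive definition: f(0) = 3 and f(n) = 2 * f(n-1) + 3 for n >= 1.
    if n == 0:
        return 3
    return 2 * f(n - 1) + 3

# Check the closed form (4.1) for small values of n.
for n in range(20):
    assert f(n) == 3 * 2 ** (n + 1) - 3
```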

A recursive definition of factorials: Consider the following recursive
definition of a function g : N → N:
                       g(0) = 1,
                       g(n) = n · g(n − 1), if n ≥ 1.
As in the previous example, a simple induction proof shows that these rules
uniquely define the value g(n) for each n ≥ 0. We leave it to the reader to
verify that g is the factorial function, i.e., g(n) = n! for each n ≥ 0.
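
One way to carry out this verification empirically is to compare g with the standard library's factorial function. A Python sketch (illustration only):

```python
import math

def g(n):
    # g(0) = 1 and g(n) = n * g(n-1) for n >= 1.
    if n == 0:
        return 1
    return n * g(n - 1)

# g agrees with the factorial function on small inputs.
for n in range(15):
    assert g(n) == math.factorial(n)
```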

A recursive definition of binomial coefficients: Consider the following
recursive definition of a function B : N × N → N with two variables:
  B(n, 0) = 1, if n ≥ 0,
  B(n, n) = 1, if n ≥ 0,
  B(n, k) = B(n − 1, k − 1) + B(n − 1, k), if n ≥ 2 and 1 ≤ k ≤ n − 1.
The recursive rule has the same form as Pascal's Identity in Theorem 3.7.2.
The first base case shows that B(n, 0) = 1 = (n choose 0), whereas the second
base case shows that B(n, n) = 1 = (n choose n). From this, it can be shown
by induction that B(n, k) = (n choose k) for all n and k with 0 ≤ k ≤ n.
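
This recursive definition translates directly into code. The Python sketch below (illustration only) memoizes B with functools.lru_cache, because the plain recursion recomputes the same values many times, and checks the result against math.comb:

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def B(n, k):
    # Base cases B(n, 0) = B(n, n) = 1; otherwise Pascal's rule.
    if k == 0 or k == n:
        return 1
    return B(n - 1, k - 1) + B(n - 1, k)

# B(n, k) equals the binomial coefficient "n choose k".
for n in range(12):
    for k in range(n + 1):
        assert B(n, k) == math.comb(n, k)
```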


4.2     Fibonacci Numbers
             I’ll have an order of the Fibonachos.

   The Fibonacci numbers are defined using the following rules:
                         f0 = 0,
                         f1 = 1,
                         fn = fn−1 + fn−2 , if n ≥ 2.

In words, there are two base cases (i.e., 0 and 1) and each next element in the
sequence is the sum of the previous two elements. This gives the sequence

                   0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, . . .

The following theorem states that we can “solve” this recurrence relation.
That is, we can express the n-th Fibonacci number fn in a non-recursive
way, i.e., without using any other Fibonacci numbers.

Theorem 4.2.1 Let ϕ = (1 + √5)/2 and ψ = (1 − √5)/2 be the two solutions
of the quadratic equation x^2 = x + 1. Then, for all n ≥ 0, we have

                             fn = (ϕ^n − ψ^n)/√5 .

Proof. We prove the claim by induction on n. There are two base cases1 :

   • Both f0 and (ϕ^0 − ψ^0)/√5 are equal to 0.

   • Both f1 and (ϕ^1 − ψ^1)/√5 are equal to 1.

Let n ≥ 2 and assume that the claim is true for n − 2 and n − 1. In other
words, assume that
                        fn−2 = (ϕ^(n−2) − ψ^(n−2))/√5
and
                        fn−1 = (ϕ^(n−1) − ψ^(n−1))/√5 .
We have to prove that the claim is true for n as well. Using the definition
of fn , the two assumptions, and the identities ϕ^2 = ϕ + 1 and ψ^2 = ψ + 1,
we get

               fn = fn−1 + fn−2
                  = (ϕ^(n−1) − ψ^(n−1))/√5 + (ϕ^(n−2) − ψ^(n−2))/√5
                  = (ϕ^(n−2) (ϕ + 1) − ψ^(n−2) (ψ + 1))/√5
                  = (ϕ^(n−2) · ϕ^2 − ψ^(n−2) · ψ^2)/√5
                  = (ϕ^n − ψ^n)/√5 .



     1
         Do you see why there are two base cases?
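
Theorem 4.2.1 can also be checked numerically. In the Python sketch below (our illustration), the recurrence is evaluated iteratively and compared with a floating-point evaluation of (ϕ^n − ψ^n)/√5, rounded to the nearest integer:

```python
import math

def fib(n):
    # f_0 = 0, f_1 = 1, f_n = f_{n-1} + f_{n-2}.
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

phi = (1 + math.sqrt(5)) / 2
psi = (1 - math.sqrt(5)) / 2
for n in range(30):
    # Rounding absorbs the small floating-point error in the formula.
    assert fib(n) == round((phi ** n - psi ** n) / math.sqrt(5))
```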

4.2.1     Counting 00-Free Bitstrings
A bitstring is called 00-free, if it does not contain two 0’s next to each other.
Examples of 00-free bitstrings are 10, 010, 0101010101, and 1111111. On the
other hand, neither of the two bitstrings 101001 and 0100011 is 00-free.
   For any integer n ≥ 1, what is the number of 00-free bitstrings having
length n? Since we do not know the answer yet, we introduce a variable Bn ,
one for each n ≥ 1, for the number of such strings. Thus,

   • Bn denotes the number of 00-free bitstrings of length n.

Let us start by determining Bn for some small values of n. There are two
bitstrings of length 1:
                                  0, 1.
Since neither of them contains 00, we have B1 = 2. There are four bitstrings
of length 2:
                                00, 10, 01, 11.
Since three of them do not contain 00, we have B2 = 3. Similarly, there are
eight bitstrings of length 3:

                     000, 001, 010, 100, 011, 101, 110, 111.

Since five of them do not contain 00, we have B3 = 5.
    Let n ≥ 3. We are going to express Bn in terms of the previous two values
Bn−1 and Bn−2 . This, together with the two base cases B1 = 2 and B2 = 3,
will give a recurrence relation for the entire sequence.
    Consider a matrix that contains all 00-free bitstrings of length n, one
string per row. Since the number of such strings is equal to Bn , the matrix
has Bn rows. Also, the matrix has n columns, because the strings have
length n.
    We rearrange the rows of the matrix such that all strings in the top part
start with 1 and all strings in the bottom part start with 0.

   • How many rows are there in the top part? Any string in the top part
     starts with 1 and is followed by a bitstring of length n − 1 that does not
     contain 00. Thus, if we take the rows in the top part and delete the first
     bit from each row, then we obtain all 00-free bitstrings of length n − 1.
     Since the number of 00-free bitstrings of length n − 1 is equal to Bn−1 ,
     it follows that the top part of the matrix consists of Bn−1 rows.

     • How many rows are there in the bottom part? Any string in the bottom
       part starts with 0. Since the string does not contain 00, the second bit
       must be 1. After these first two bits, we have a bitstring of length n − 2
       that does not contain 00. Thus, if we take the rows in the bottom part
       and delete the first two bits from each row, then we obtain all 00-free
       bitstrings of length n − 2. Since the number of 00-free bitstrings of
       length n − 2 is equal to Bn−2 , it follows that the bottom part of the
       matrix consists of Bn−2 rows.

Thus, on the one hand, the matrix has Bn rows. On the other hand, this
matrix has Bn−1 + Bn−2 rows. Therefore, we have Bn = Bn−1 + Bn−2 .
    To summarize, we have proved that the values Bn , for n ≥ 1, satisfy the
following recurrence relation:

                        B1 = 2,
                        B2 = 3,
                        Bn = Bn−1 + Bn−2 , if n ≥ 3.

This recurrence relation is the same as the one for the Fibonacci numbers,
except that the two base cases are different. The sequence Bn , n ≥ 1, consists
of the integers
                      2, 3, 5, 8, 13, 21, 34, 55, 89, 144, . . .
We obtain this sequence by removing the first three elements (i.e., f0 , f1 , and
f2 ) from the Fibonacci sequence. We leave it to the reader to verify (using
induction) that for all n ≥ 1,

                                   Bn = fn+2 .
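
This identity can be verified by brute force for small n: the Python sketch below (illustration only) counts the 00-free bitstrings of each length directly and compares the count with fn+2 :

```python
from itertools import product

def count_00_free(n):
    # Count the length-n bitstrings that contain no two adjacent 0's.
    return sum(1 for s in product("01", repeat=n) if "00" not in "".join(s))

def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

# B_n = f_{n+2} for small n.
for n in range(1, 15):
    assert count_00_free(n) == fib(n + 2)
```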


4.3       A Recursively Defined Set
Consider the set S that is defined by the following two rules:

     • 5 is an element of the set S.

     • If x and y are elements of the set S, then x − y is also an element of
       the set S.

Thus, if we already know that x and y belong to the set S, then the second
rule gives us a new element, i.e., x − y, that also belongs to S.

    Can we give a simple description of the set S? We are going to use the
rules to obtain some elements of S. From these examples, we then hope to
see a pattern from which we guess the simple description of S. The final step
consists of proving that our guess is correct.

   • We are given that 5 is an element of S.

   • Applying the rule with x = 5 and y = 5 implies that x − y = 0 is also
     an element of S.

   • Applying the rule with x = 0 and y = 5 implies that x − y = −5 is also
     an element of S.

   • Applying the rule with x = 5 and y = −5 implies that x − y = 10 is
     also an element of S.

   • Applying the rule with x = 0 and y = 10 implies that x − y = −10 is
     also an element of S.

   • Applying the rule with x = 5 and y = −10 implies that x − y = 15 is
     also an element of S.

   • Applying the rule with x = 0 and y = 15 implies that x − y = −15 is
     also an element of S.

Thus, we have obtained the following elements of S:

                           −15, −10, −5, 0, 5, 10, 15

Since there is clearly a pattern, it is natural to guess that

                              S = {5n : n ∈ Z},                          (4.2)

where Z is the set of all (positive and negative) integers, including 0. To
prove that this is correct, we will first prove that the set on the left-hand
side is a subset of the set on the right-hand side. Then we prove that the set
on the right-hand side is a subset of the set on the left-hand side.
    We start by proving that

                              S ⊆ {5n : n ∈ Z},

which is equivalent to proving that

                     every element of S is a multiple of 5.              (4.3)

How do we prove this? The set S is defined using a base case and a recursive
rule. The only way to obtain an element of S is by starting with the base
case and then applying the recursive rule a finite number of times. Therefore,
the following will prove that (4.3) holds:
     • The element in the base case, i.e., 5, is a multiple of 5.
     • Let x and y be two elements of S and assume that they are both
       multiples of 5. Then x − y (which is the “next” element of S) is also a
       multiple of 5.
     Next we prove that
                                {5n : n ∈ Z} ⊆ S.
We will do this by proving that for all n ≥ 0,

                             5n ∈ S and − 5n ∈ S.                        (4.4)

The proof is by induction on n. For the base case, i.e., when n = 0, we
observe that, from the definition of S, x = 5 and y = 5 are in S and,
therefore, x − y = 0 is also in S. Therefore, (4.4) is true for n = 0.
   Let n ≥ 0 and assume that (4.4) is true for n, i.e., assume that

                             5n ∈ S and − 5n ∈ S.

We have to show that (4.4) is also true for n + 1, i.e.,

                       5(n + 1) ∈ S and − 5(n + 1) ∈ S.

     • It follows from the definition of S and our assumption that both x = 5
       and y = −5n are in S. Therefore, x − y = 5(n + 1) is also in S.
     • It follows from the definition of S and our assumption that both x =
       −5n and y = 5 are in S. Therefore, x − y = −5(n + 1) is also in S.
Thus, we have shown by induction that (4.4) holds for all n ≥ 0.
    Since we have shown that both (4.3) and (4.4) hold, we conclude that (4.2)
holds as well. In other words, we have indeed obtained a simple description
of the set S: It is the set of all multiples of 5.
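
The rule-based construction of S can be simulated, as long as we bound the range of values so that the process terminates. In the Python sketch below (illustration only), we repeatedly apply the rule x − y within the interval [−100, 100] and observe that exactly the multiples of 5 in that interval appear:

```python
def closure_within(bound):
    # Start from the base case {5} and keep applying the rule x - y,
    # discarding values outside [-bound, bound] so the process terminates.
    S = {5}
    changed = True
    while changed:
        changed = False
        for x in list(S):
            for y in list(S):
                z = x - y
                if -bound <= z <= bound and z not in S:
                    S.add(z)
                    changed = True
    return S

# Exactly the multiples of 5 in [-100, 100] are generated.
S = closure_within(100)
assert S == {5 * n for n in range(-20, 21)}
```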

4.4     A Gossip Problem
Let n ≥ 4 be an integer and consider a group P1 , P2 , . . . , Pn of n people.
Assume that each person Pi knows some scandal Si that nobody else knows.
For any i and j, if person Pi makes a phone call with person Pj , they exchange
the scandals they know at that moment, i.e., Pi tells all scandals she knows
to Pj , and Pj tells all scandals he knows to Pi . How many phone calls are
needed until each of the n people knows all n scandals?
    An obvious solution is that each pair of people in the group makes one
phone call. At the end, each person knows all scandals. The number of phone
calls is
                         (n choose 2) = n(n − 1)/2 ,
which is quadratic in the number n of people. We will see below that only a
linear number of phone calls are needed.
    Let us first consider the case when n = 4. At the start, each person Pi
only knows the scandal Si , which we visualize in the following table:

                              P1       P2      P3     P4
                              S1       S2      S3     S4

Consider the following sequence of phone calls:
  1. P1 calls P2 . After this phone call, the table looks as follows:

                                P1           P2       P3    P4
                               S1 S2        S1 S2     S3    S4

  2. P3 calls P4 . After this phone call, the table looks as follows:

                             P1         P2           P3      P4
                            S1 S2      S1 S2        S3 S4   S3 S4

  3. P1 calls P3 . After this phone call, the table looks as follows:

                           P1           P2              P3           P4
                       S1 S2 S3 S4     S1 S2        S1 S2 S3 S4     S3 S4

  4. P2 calls P4 . After this phone call, the table looks as follows:
                         P1             P2            P3            P4
                     S1 S2 S3 S4    S1 S2 S3 S4   S1 S2 S3 S4   S1 S2 S3 S4


We see that after four phone calls, each person knows all four scandals.
Observe that the number of phone calls would have been (4 choose 2) = 6 if
we had used the obvious solution mentioned above.
    We now have an algorithm that schedules the phone calls for groups of
four people. Below, we will extend this “base case” to a recursive algorithm
that schedules the phone calls for any group of n ≥ 4 people. The approach
is as follows:

     • We assume that we know how to schedule the phone calls for groups of
       n − 1 people.

     • We use this assumption to schedule the phone calls for groups of n
       people.

Let us see how this is done.

     • At the start, P1 knows S1 , P2 knows S2 , . . . , Pn knows Sn .

     • Pn−1 calls Pn . After this phone call, P1 knows S1 , P2 knows S2 , . . . ,
       Pn−2 knows Sn−2 , and both Pn−1 and Pn know Sn−1 and Sn .

   • Consider Sn−1 and Sn to be one scandal S′n−1 .

   • Schedule the phone calls for the group P1 , P2 , . . . , Pn−1 of n − 1 people,
     using the scandals S1 , S2 , . . . , Sn−2 , S′n−1 . (We have assumed that we
     know how to do this!) At the end, each of P1 , P2 , . . . , Pn−1 knows all
     scandals S1 , S2 , . . . , Sn .

     • At this moment, Pn only knows Sn−1 and Sn . Therefore, Pn−1 again
       calls Pn and tells her all scandals S1 , S2 , . . . , Sn ; the first n − 2 of these
       are new to Pn .

Below, you see this recursive algorithm in pseudocode.

       Algorithm gossip(n):

           // n ≥ 4, this algorithm schedules phone calls for P1 , P2 , . . . , Pn
           if n = 4
           then P1 calls P2 ;
                 P3 calls P4 ;
                 P1 calls P3 ;
                 P2 calls P4
           else Pn−1 calls Pn ;
                gossip(n − 1);
                Pn−1 calls Pn
           endif

    We are now going to determine the number of phone calls made when
running algorithm gossip(n). Since we do not know the answer yet, we
introduce a variable C(n) to denote this number. It follows from the pseu-
docode that
                                C(4) = 4.
Let n ≥ 5. Algorithm gossip(n) starts and ends with the same phone call:
Pn−1 calls Pn . In between, it runs algorithm gossip(n − 1), during which,
by definition, C(n − 1) phone calls are made. It follows that
                        C(n) = 2 + C(n − 1) for n ≥ 5.
Thus, we have obtained a recurrence relation for the numbers C(n). The
first few numbers in the sequence are
                    C(4)    =   4,
                    C(5)    =   2 + C(4) = 2 + 4 = 6,
                    C(6)    =   2 + C(5) = 2 + 6 = 8,
                    C(7)    =   2 + C(6) = 2 + 8 = 10.
From this, we guess that
                            C(n) = 2n − 4 for n ≥ 4.
We can easily prove by induction that our guess is correct. Indeed, since
both C(4) and 2 · 4 − 4 are equal to 4, the claim is true for n = 4. If n ≥ 5
and C(n − 1) = 2(n − 1) − 4, then
             C(n) = 2 + C(n − 1) = 2 + (2(n − 1) − 4) = 2n − 4.

This shows that C(n) = 2n − 4 for all n ≥ 4.
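
Both the correctness of the schedule and the count C(n) = 2n − 4 are easy to confirm by simulating the algorithm. The Python sketch below (our illustration) tracks the set of scandals each person knows:

```python
def gossip(n):
    # knows[i] = set of scandals that person i currently knows (1-based).
    knows = {i: {i} for i in range(1, n + 1)}
    calls = []

    def call(i, j):
        # A phone call between i and j exchanges all scandals they know.
        shared = knows[i] | knows[j]
        knows[i] = set(shared)
        knows[j] = set(shared)
        calls.append((i, j))

    def schedule(k):
        # Algorithm gossip(k) from the text, for k >= 4.
        if k == 4:
            call(1, 2); call(3, 4); call(1, 3); call(2, 4)
        else:
            call(k - 1, k)
            schedule(k - 1)
            call(k - 1, k)

    schedule(n)
    return knows, calls

for n in range(4, 12):
    knows, calls = gossip(n)
    everyone = set(range(1, n + 1))
    assert all(knows[i] == everyone for i in everyone)  # all n scandals known
    assert len(calls) == 2 * n - 4                      # C(n) = 2n - 4
```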
   It can be shown that algorithm gossip is optimal: Any algorithm that
schedules phone calls for n ≥ 4 people must make at least 2n − 4 phone calls.
   You may wonder why the base case for algorithm gossip(n) is when
n = 4. You will find the reason in Exercise 4.52.


4.5     Euclid’s Algorithm
             We might call Euclid’s method the granddaddy of all algorithms,
             because it is the oldest nontrivial algorithm that has survived to
             the present day.
                — Donald E. Knuth, The Art of Computer Programming,
             Vol. 2, 1997


    The greatest common divisor of two integers a ≥ 1 and b ≥ 1 is the
largest integer that divides both a and b. We denote this largest integer by
gcd (a, b). For example, the common divisors of 75 and 45 are 1, 3, 5, and 15.
Since 15 is the largest among them, gcd (75, 45) = 15. Observe that for any
integer a ≥ 1, gcd (a, a) = a.
    Assume we are given two large integers a and b, say a = 371, 435, 805 and
b = 137, 916, 675. How can we compute their greatest common divisor? One
approach is to determine the prime factorizations of a and b:

                     a = 371, 435, 805 = 3^2 · 5 · 13^4 · 17^2

and
                     b = 137, 916, 675 = 3^4 · 5^2 · 13^3 · 31.
From this, we see that

                      gcd (a, b) = 3^2 · 5 · 13^3 = 98, 865.

Unfortunately, it is not known how to obtain, by an efficient algorithm, the
prime factorization of a very large integer. As a result, this approach to
compute the greatest common divisor of two large integers is not good.
    Around 300 BC, Euclid published an algorithm that is both very simple
and efficient. This algorithm is based on the modulo operation, which we
introduce first.

4.5.1      The Modulo Operation
Let a ≥ 1 and b ≥ 1 be integers. If we divide a by b, then we obtain a
quotient q and a remainder r, which are the unique integers that satisfy

                    a = qb + r, q ≥ 0, and 0 ≤ r ≤ b − 1.

The modulo operation, denoted by a mod b, is the function that maps the
pair (a, b) to the remainder r. Thus, we will write

                                 a mod b = r.

For example,

   • 17 mod 5 = 2, because 17 = 3 · 5 + 2,

   • 17 mod 17 = 0, because 17 = 1 · 17 + 0,

   • 17 mod 1 = 0, because 17 = 17 · 1 + 0,

   • 17 mod 19 = 17, because 17 = 0 · 19 + 17.
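
For positive operands, this operation is exactly Python's % operator (languages differ for negative operands, but the text only uses positive integers). A quick check of the four examples above:

```python
# Verify the four examples of the modulo operation for positive a and b.
for a, b, r in [(17, 5, 2), (17, 17, 0), (17, 1, 0), (17, 19, 17)]:
    q = a // b                      # the quotient
    assert a == q * b + r and 0 <= r <= b - 1
    assert a % b == r               # Python's % gives the remainder
```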

4.5.2      The Algorithm
Euclid’s algorithm takes as input two positive integers a and b, where a ≥ b,
and returns gcd (a, b).
   The algorithm starts by computing a mod b and stores the result in a
variable r. If r = 0, then the algorithm returns the value b. Otherwise, we
have r ≥ 1, in which case the algorithm recursively computes the greatest
common divisor of b and r. The algorithm is presented in pseudocode below.
       Algorithm Euclid(a, b):

           // a and b are integers with a ≥ b ≥ 1
           r = a mod b;
           if r = 0
           then return b
           else return Euclid(b, r)
                // observe that b > r ≥ 1
           endif
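
The pseudocode carries over to Python almost verbatim. In the sketch below (our illustration), the result is compared with the standard library's math.gcd, including the large example from the start of this section:

```python
import math

def euclid(a, b):
    # a >= b >= 1; Euclid's algorithm via the modulo operation.
    r = a % b
    if r == 0:
        return b
    return euclid(b, r)   # note that b > r >= 1

assert euclid(75, 45) == 15
assert euclid(371435805, 137916675) == 98865
for a in range(1, 60):
    for b in range(1, a + 1):
        assert euclid(a, b) == math.gcd(a, b)
```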

    Let us look at an example. If we run Euclid(75, 45), then the algo-
rithm computes 75 mod 45, which is 30. Then, it runs Euclid(45, 30), dur-
ing which the algorithm computes 45 mod 30, which is 15. Next, it runs
Euclid(30, 15), during which the algorithm computes 30 mod 15, which is 0.
At this moment, the algorithm returns 15, which is indeed the greatest com-
mon divisor of the input values 75 and 45.
    The following lemma is the basis for a proof that algorithm Euclid(a, b)
correctly returns gcd (a, b) for any input values a ≥ b ≥ 1.

Lemma 4.5.1 Let a and b be integers with a ≥ b ≥ 1, and let r = a mod b.
     1. If r = 0, then gcd (a, b) = b.

     2. If r ≥ 1, then gcd (a, b) = gcd (b, r).

Proof. Let q and r be the integers that satisfy a = qb + r, q ≥ 1, and
0 ≤ r ≤ b − 1. (Observe that q cannot be equal to 0, because a ≥ b.)
   If r = 0, then a = qb. In this case, it is clear that gcd (a, b) = b. Assume
that r ≥ 1. We claim that the common divisors of a and b are the same as
the common divisors of b and r:
     • Let d ≥ 1 be an integer that divides both a and b. Since r = a − qb, it
       follows that d divides r. Thus, d divides both b and r.

     • Let d ≥ 1 be an integer that divides both b and r. Since a = qb + r, it
       follows that d divides a. Thus, d divides both a and b.
Since the two pairs a, b and b, r have the same common divisors, their greatest
common divisors are equal as well.


Theorem 4.5.2 For any two integers a and b with a ≥ b ≥ 1, algorithm
Euclid(a, b) returns gcd (a, b).

Proof. If algorithm Euclid(a, b) generates the recursive call Euclid(b, r),
then r < b. Thus, in each recursive call to Euclid, the second argument
decreases. Since this second argument is a positive integer, the algorithm
terminates.
   We leave it to the reader to use Lemma 4.5.1 to prove that the output of
algorithm Euclid(a, b) is gcd (a, b).

4.5.3     The Running Time
In the beginning of Section 4.5, we mentioned that Euclid’s algorithm is
efficient. In this section, we will formalize this.
    We are going to bound the total number of modulo operations that are
performed when running algorithm Euclid(a, b). This number will be de-
noted by M (a, b).
    For example, when running Euclid(75, 45), the modulo operation is per-
formed three times: The algorithm computes 75 mod 45, 45 mod 30, and
30 mod 15. Therefore, M (75, 45) = 3.
    Our goal is to prove an upper bound on M (a, b) in terms of a and b. In
fact, as we will see, we will obtain an upper bound in terms of b only, i.e., the
upper bound only depends on the smaller of the two input values a and b.
    As a first upper bound, we have seen in the proof of Theorem 4.5.2 that
in each recursive call to algorithm Euclid, the second argument decreases.
Since in the initial call Euclid(a, b), this second argument is equal to b, the
number of modulo operations cannot be larger than b. It follows that, for all
integers a and b with a ≥ b ≥ 1,

                                  M (a, b) ≤ b.

This gives an upper bound that is linear in b. Below, we will prove a much
better upper bound: The value of M (a, b) is at most logarithmic in b. We
will use the Fibonacci numbers of Section 4.2 to obtain this result. Recall
that these numbers are defined by
                        f0 = 0,
                        f1 = 1,
                        fn = fn−1 + fn−2 , if n ≥ 2.

    As mentioned above, we are going to prove an upper bound on M (a, b)
in terms of the logarithm of b. Usually, when analyzing the running time of
an algorithm, we consider a given input and derive an upper bound on the
running time in terms of the input. For algorithm Euclid(a, b), we use the
opposite approach: We fix a value m for the running time M (a, b), and then
prove a lower bound on both a and b in terms of m. The following lemma
makes this precise.

Lemma 4.5.3 Let a and b be integers with a > b ≥ 1, and let m = M (a, b).
Then a ≥ fm+2 and b ≥ fm+1 .

Proof. The proof is by induction on m. The base case is when m = 1. Since
a ≥ b + 1 ≥ 2 = f3 and b ≥ 1 = f2 , the claim in the lemma holds.
   For the induction step, assume that m ≥ 2. Consider the integers q
and r that satisfy a = qb + r, q ≥ 1, and 0 ≤ r ≤ b − 1. Observe that
algorithm Euclid(a, b) computes the value a mod b, which is equal to r.
Since m ≥ 2, we have r ≥ 1 and the total number of modulo operations
performed during the recursive call Euclid(b, r) is equal to m − 1. In other
words, M (b, r) = m − 1. Thus, by induction, we have b ≥ fm+1 and r ≥ fm .
We observe that

                  a = qb + r ≥ b + r ≥ fm+1 + fm = fm+2 .

This completes the induction step.

    In Theorem 4.2.1, we have seen that the Fibonacci numbers can be
expressed in terms of the numbers ϕ = (1 + √5)/2 and ψ = (1 − √5)/2. You
are encouraged to prove, by induction and using the fact that ϕ^2 = ϕ + 1,
that for any integer n ≥ 2,
                                 fn ≥ ϕ^(n−2) .                          (4.5)

Theorem 4.5.4 Let a and b be integers with a ≥ b ≥ 1. Then

                            M (a, b) ≤ 1 + logϕ b,

i.e., the total number of modulo operations performed by algorithm Euclid(a, b)
is O(log b).

Proof. If a = b, then M (a, b) = 1 and the claim obviously holds. Assume
that a > b. Let m = M (a, b). By Lemma 4.5.3 and (4.5), we have

                              b ≥ fm+1 ≥ ϕ^(m−1) .

By taking logarithms with base ϕ, we conclude that

                              m − 1 ≤ logϕ b,

i.e.,
                   M (a, b) = m ≤ 1 + logϕ b = O(log b).
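
The bound of Theorem 4.5.4 can be tested empirically. The Python sketch below (our illustration) counts the modulo operations performed by Euclid's algorithm and checks that the count never exceeds 1 + log_ϕ b:

```python
import math

def mod_count(a, b):
    # M(a, b): the number of modulo operations performed by Euclid(a, b).
    r = a % b
    if r == 0:
        return 1
    return 1 + mod_count(b, r)

phi = (1 + math.sqrt(5)) / 2
assert mod_count(75, 45) == 3
# The logarithmic upper bound of Theorem 4.5.4 holds for all small inputs.
for a in range(1, 200):
    for b in range(1, a + 1):
        assert mod_count(a, b) <= 1 + math.log(b, phi)
```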

4.6       The Merge-Sort Algorithm
MergeSort is a recursive sorting algorithm that works as follows. To sort
the sequence a1 , a2 , . . . , an of numbers,

   • it recursively sorts the sequence a1 , a2 , . . . , am , where m = bn/2c, and
     stores the sorted sequence in a list L1 ,

   • it recursively sorts the sequence am+1 , am+2 , . . . , an and stores the sorted
     sequence in a list L2 ,

   • it merges the two sorted lists L1 and L2 into one sorted list.

Below, you see this recursive algorithm in pseudocode.
       Algorithm MergeSort(L, n):

           // L is a list of n ≥ 0 numbers
           if n ≥ 2
           then m = ⌊n/2⌋;
                 L1 = list consisting of the first m elements of L;
                 L2 = list consisting of the last n − m elements of L;
                 L1 = MergeSort(L1 , m);
                 L2 = MergeSort(L2 , n − m);
                 L = Merge(L1 , L2 )
           endif;
           return L

    We still have to specify algorithm Merge(L1 , L2 ). Of course, this algo-
rithm uses the fact that both L1 and L2 are sorted lists. The task is to merge
them into one sorted list. This is done in the following way. Initialize an
empty list L. (At the end, this list will contain the final sorted sequence.)

   • Let x be the first element of L1 and let y be the first element of L2 .

   • If x ≤ y, then remove x from L1 and append it to L (i.e., add x at the
     end of L).

   • Otherwise (i.e., if x > y), remove y from L2 and append it to L.


Repeat these steps until one of L1 and L2 is empty. If L1 is empty, then
append L2 to L. Otherwise, append L1 to L. Here is the algorithm in
pseudocode:
      Algorithm Merge(L1 , L2 ):

           // L1 and L2 are sorted lists
           L = empty list;
           while L1 is not empty and L2 is not empty
           do x = first element of L1 ;
               y = first element of L2 ;
               if x ≤ y
               then remove x from L1 ;
                      append x to L
               else remove y from L2 ;
                     append y to L
               endif
           endwhile;
           if L1 is empty
           then append L2 to L
           else append L1 to L
           endif;
           return L
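
The pseudocode above translates almost line by line into Python. The follow-
ing sketch mirrors algorithms MergeSort and Merge, using Python lists
in place of the lists L, L1 , and L2 :

```python
# A Python transcription of the MergeSort/Merge pseudocode above.

def merge(l1, l2):
    # l1 and l2 are sorted lists; merge them into one sorted list
    result = []
    i = j = 0
    while i < len(l1) and j < len(l2):
        if l1[i] <= l2[j]:          # the comparison "if x <= y"
            result.append(l1[i]); i += 1
        else:
            result.append(l2[j]); j += 1
    result.extend(l1[i:])           # append the non-empty remainder
    result.extend(l2[j:])
    return result

def merge_sort(lst):
    n = len(lst)
    if n < 2:
        return lst                  # a list of 0 or 1 numbers is sorted
    m = n // 2                      # m = floor(n/2)
    return merge(merge_sort(lst[:m]), merge_sort(lst[m:]))

print(merge_sort([5, 2, 8, 1, 9, 3]))   # [1, 2, 3, 5, 8, 9]
```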



4.6.1     Correctness of Algorithm MergeSort
I hope you are convinced that the output L of algorithm Merge(L1 , L2 ) is
a sorted list that contains all elements of L1 and L2 (and no other elements).
How do we prove that algorithm MergeSort(L, n) is correct, i.e., correctly
sorts the elements in any list L of n numbers? Since the algorithm is recursive,
we prove this by induction.
    The two base cases are when n = 0 or n = 1. It follows from the
pseudocode for MergeSort(L, n) that it simply returns the input list L,
which is obviously sorted.
    Let n ≥ 2 and assume that for any integer k with 0 ≤ k < n and for any
list L′ of k numbers, algorithm MergeSort(L′ , k) returns a list containing
the elements of L′ in sorted order. Let L be a list of n numbers. By going


through the pseudocode for MergeSort(L, n), we observe the following:

   • The recursive call MergeSort(L1 , m) is on a list with less than n
     numbers. Therefore, by the induction hypothesis, its output, which is
     the list L1 , is sorted.

   • The recursive call MergeSort(L2 , n − m) is on a list with less than n
     numbers. Again by the induction hypothesis, its output, which is the
     list L2 , is sorted.

   • Algorithm Merge(L1 , L2 ) gets as input the two sorted lists L1 and
     L2 , and returns a list L. Since algorithm Merge is correct, it then
     follows that L is a sorted list.

It follows that the final list L, which is returned by algorithm MergeSort,
is sorted.
    This proves the correctness of algorithm MergeSort(L, n) for any inte-
ger n ≥ 0 and any list L of n numbers.


4.6.2     Running Time of Algorithm MergeSort
We now analyze the running time of algorithm MergeSort. It follows from
the pseudocode that, when running this algorithm together with its recursive
calls, several calls are made to algorithm Merge. We are going to count the
total number of comparisons that are made. That is, we will determine
the total number of times that the line “if x ≤ y” in algorithm Merge is
executed when running algorithm MergeSort(L, n).
    We first observe that the number of comparisons made by algorithm
Merge(L1 , L2 ) is at most |L1 | + |L2 |.
    Let n be an integer and assume for simplicity that n is a power of two, i.e.,
n = 2^k for some integer k ≥ 0. We define T (n) to be the maximum number of
comparisons made when running algorithm MergeSort(L, n) on any input
list L of n numbers. Note that we include in T (n) all comparisons that are
made during all calls to Merge that are part of all recursive calls that are
generated when running MergeSort(L, n).
    Consider a list L of n numbers, where n is a power of two. For n = 1, it
follows from the pseudocode for MergeSort(L, n) that

                                   T (1) = 0.


Assume that n ≥ 2 and consider again the pseudocode for MergeSort(L, n).
Which parts of the algorithm make comparisons between input elements?
   • The call MergeSort(L1 , m) is a recursive call on a list of m = n/2
     numbers. By definition, the total number of comparisons made in this
     call (together with all its recursive subcalls) is at most T (n/2).

   • The call MergeSort(L2 , n − m) is a recursive call on a list of n − m =
     n/2 numbers. By definition, the total number of comparisons made in
     this call (together with all its recursive subcalls) is at most T (n/2).

   • Finally, algorithm MergeSort(L, n) calls the non-recursive algorithm
     Merge(L1 , L2 ). We have seen above that the number of comparisons
     made in this call is at most |L1 | + |L2 | = n.
By adding the number of comparisons, we get

              T (n) ≤ T (n/2) + T (n/2) + n = 2 · T (n/2) + n.

Thus, we obtain the following recurrence relation:

         T (1) = 0,
         T (n) ≤ 2 · T (n/2) + n, if n ≥ 2 and n is a power of 2.        (4.6)

Our goal was to determine T (n), but at this moment, we only have a recur-
rence relation for this function. We will solve this recurrence relation using
a technique called unfolding:
    Recall that we assume that n = 2^k for some integer k ≥ 0. We furthermore
assume that n is a large integer. We know from (4.6) that

                           T (n) ≤ 2 · T (n/2) + n.

If we replace n by n/2 in (4.6), which is a valid thing to do, we get

                         T (n/2) ≤ 2 · T (n/2²) + n/2.

By combining these two inequalities, we get

                     T (n) ≤ 2 · T (n/2) + n
                           ≤ 2(2 · T (n/2²) + n/2) + n
                           = 2² · T (n/2²) + 2n.

Let us repeat this: Replacing n by n/2² in (4.6) gives

                        T (n/2²) ≤ 2 · T (n/2³) + n/2².

By substituting this into the inequality for T (n), we get

                     T (n) ≤ 2² · T (n/2²) + 2n
                           ≤ 2²(2 · T (n/2³) + n/2²) + 2n
                           = 2³ · T (n/2³) + 3n.

In the next step, we replace n by n/2³ in (4.6), which gives

                        T (n/2³) ≤ 2 · T (n/2⁴) + n/2³.

By substituting this into the inequality for T (n), we get

                     T (n) ≤ 2³ · T (n/2³) + 3n
                           ≤ 2³(2 · T (n/2⁴) + n/2³) + 3n
                           = 2⁴ · T (n/2⁴) + 4n.

At this moment, you will see the pattern and, at the end, we get the inequality

                          T (n) ≤ 2^k · T (n/2^k ) + kn.

Since n = 2^k , we have T (n/2^k ) = T (1), which is 0 from the base case of the
recurrence relation. Also, n = 2^k implies that k = log n. We conclude that

                      T (n) ≤ n · T (1) + n log n = n log n.

We thus have solved the recurrence relation. In case you have doubts about
the validity of the unfolding method, we verify by induction that indeed

           T (n) ≤ n log n, for any integer n that is a power of 2.

The base case is when n = 1. In this case, we have T (1) = 0 and 1 log 1 =
1 · 0 = 0. Let n ≥ 2 be a power of 2 and assume that

                          T (n/2) ≤ (n/2) log(n/2).

From the recurrence relation, we get

                           T (n) ≤ 2 · T (n/2) + n.


By substituting the induction hypothesis into this inequality, we get
                       T (n) ≤     2 · (n/2) log(n/2) + n
                             =     n log(n/2) + n
                             =     n (log n − log 2) + n
                             =     n (log n − 1) + n
                             =     n log n.
Thus, by induction, T (n) ≤ n log n for any integer n that is a power of 2.
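
In case you want to check this numerically as well, the following Python
sketch evaluates the recurrence (4.6) with equality (the worst case) and
confirms that it matches n log n exactly for powers of two:

```python
# Evaluate the recurrence T(1) = 0, T(n) = 2*T(n/2) + n (equality taken
# as the worst case of (4.6)) and compare against n*log2(n).

def T(n):
    if n == 1:
        return 0
    return 2 * T(n // 2) + n

for k in range(0, 15):
    n = 2 ** k
    assert T(n) == n * k        # n log n, since log2(n) = k for n = 2^k
print(T(1024))                  # 1024 * 10 = 10240
```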
    Until now, we have only counted the number of comparisons made by
algorithm MergeSort. It follows from the pseudocode that the total run-
ning time, i.e., the total number of “elementary” steps, is within a constant
factor of the total number of comparisons. Therefore, if n is a power of 2,
the running time of algorithm MergeSort(L, n) is O(n log n).
    For general values of n, the recurrence relation for the number of com-
parisons becomes the following:
                T (n) = 0, if n = 0 or n = 1,
                T (n) ≤ T (⌊n/2⌋) + T (⌈n/2⌉) + n, if n ≥ 2.
It can be shown by induction that this recurrence relation solves to T (n) =
O(n log n). We have proved the following result:
Theorem 4.6.1 For any list L of n numbers, the running time of algorithm
MergeSort(L, n) is O(n log n).


4.7      Computing the Closest Pair
              For a long time researchers felt that there might be a quadratic
              lower bound on the complexity of the closest-pair problem.
                                                        — Jon Louis Bentley,
                  Communications of the ACM, volume 23, page 226, 1980

   If p = (p1 , p2 ) and q = (q1 , q2 ) are two points in R2 , then their distance
d(p, q) is given by

                      d(p, q) = √((p1 − q1 )² + (p2 − q2 )²).
This follows by applying Pythagoras’ Theorem to the right triangle in the
following figure.

   [Figure: a right triangle with horizontal leg |p1 − q1 |, vertical leg
|p2 − q2 |, and hypotenuse d(p, q) connecting the points p and q.]

    Let S be a set of n points in R2 , where n ≥ 2 is an integer. The closest-
pair distance in S, denoted by δ(S), is the minimum distance between any
two distinct points of S, i.e.,

                   δ(S) = min{d(p, q) : p ∈ S, q ∈ S, p ≠ q}.




   [Figure: a set S of points; δ(S) is the distance between its closest pair.]




   In this section, we consider the problem of designing an efficient algorithm
that, when given an arbitrary set S of n points in R2 , with n ≥ 2, returns
the closest-pair distance δ(S).
   A trivial algorithm considers all 2-element subsets of S. For each such
subset {p, q}, the algorithm computes the distance d(p, q). After all these
subsets have been considered, the algorithm returns the smallest distance
found. Obviously, the running time of this algorithm is proportional to the
number of 2-element subsets of S, which is

                       (n choose 2) = n(n − 1)/2 = Θ(n²).
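
A Python sketch of this trivial algorithm, using itertools.combinations to
enumerate the 2-element subsets of S:

```python
# The trivial Theta(n^2) algorithm: compute d(p, q) for all
# 2-element subsets {p, q} of S and return the smallest distance.
import math
from itertools import combinations

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def closest_pair_brute_force(S):
    # S is a list of n >= 2 points (x, y); returns delta(S)
    return min(dist(p, q) for p, q in combinations(S, 2))

print(closest_pair_brute_force([(0, 0), (3, 4), (1, 1)]))  # 1.4142135623730951
```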
   In this section, we will show that the closest pair problem can be solved,
by a recursive algorithm, in O(n log n) time. In Section 4.7.1, we start by
presenting a high-level overview of the basic approach. Then, in Section 4.7.2,
we present the details of the recursive algorithm.

4.7.1     The Basic Approach
We are given a set S of n points in R2 , where n ≥ 2. We assume that


   • n is a power of two,
   • no two points of S have the same x-coordinate,
   • no two points of S have the same y-coordinate.
We remark that none of these assumptions is necessary. We only make
them to simplify the presentation.
    As mentioned above, our algorithm will be recursive. The base case is
when n = 2, i.e., the set S consists of exactly two points, say p and q. In
this case, the algorithm simply returns the distance d(p, q).
    From now on, we assume that n ≥ 4. The algorithm performs the follow-
ing four steps:
Step 1: Let ℓ be a vertical line that splits the set S into two subsets of equal
size. The algorithm computes the set S1 consisting of all points of S that are
to the left of ℓ, and the set S2 consisting of all points of S that are to the
right of ℓ. Observe that |S1 | = |S2 | = n/2.
Step 2: The algorithm recursively computes the closest-pair distance δ1 in
the set S1 .
Step 3: The algorithm recursively computes the closest-pair distance δ2 in
the set S2 .




   [Figure: the vertical line ℓ splits S into S1 (left) and S2 (right), with
closest-pair distances δ1 and δ2 .]

Step 4: Let δ = min(δ1 , δ2 ). Consider the set
                  A = {{p, q} : p ∈ S1 , q ∈ S2 , d(p, q) < δ}.


   • If A = ∅, then the algorithm returns the value of δ.
   • Assume that A ≠ ∅. The algorithm considers all pairs {p, q} ∈ A and
     computes the distances d(p, q). Let δ1,2 be the smallest distance found
     after all these pairs have been considered. Then, the algorithm returns
     the value of δ1,2 .

    It should be clear that this algorithm correctly returns the closest-pair
distance δ(S) in the point set S. What is not clear, however, is how to
efficiently perform the last step that involves the set A. For this, we have to
answer two questions: First, how do we efficiently obtain all pairs {p, q} that
belong to the set A? Second, is there a “small” upper bound on the size of
this set A?
    Let ℓ1 be the vertical line that is at distance δ to the left of ℓ, and let S′1
be the set of all points in S1 that are between ℓ1 and ℓ. Similarly, let ℓ2 be
the vertical line that is at distance δ to the right of ℓ, and let S′2 be the set
of all points in S2 that are between ℓ and ℓ2 . Refer to the figure below for
an illustration.
   [Figure: the strips S′1 and S′2 of width δ on either side of the line ℓ,
bounded by the lines ℓ1 and ℓ2 .]

    Any point that is on or to the left of ℓ1 has distance at least δ to any
point that is on or to the right of ℓ. Similarly, any point that is on or to the
left of ℓ has distance at least δ to any point that is on or to the right of ℓ2 .
This implies that the set A in Step 4 of the algorithm satisfies

                   A = {{p, q} : p ∈ S′1 , q ∈ S′2 , d(p, q) < δ}.           (4.7)


    Unfortunately, even using this alternative characterization of the set A, it
is not clear how to obtain all elements of this set in an efficient way. Below,
we will define a superset of A, i.e., a set C of ordered pairs (r, s), with
r ∈ S′1 ∪ S′2 and s ∈ S′1 ∪ S′2 , that contains² all elements of A. As we will
see, the size of this new set C is O(n) and its elements can be obtained in
an efficient way. As a result, the algorithm will use this new set C in Step 4,
instead of A. If C ≠ ∅, let δ′1,2 be the smallest distance of any pair (r, s) in
C. The algorithm will return the value

                                  min(δ, δ′1,2 ).
Note that, since A ⊆ C, the algorithm, with the revised Step 4, still correctly
returns the closest-pair distance in the set S.
   Before we define the new set C, we introduce a preliminary set B that is
a superset of A, i.e., A ⊆ B. We will use this set B to define the set C that
we are looking for. This set C will satisfy B ⊆ C and, thus, A ⊆ C.
   We introduce the following notation, which is illustrated in the figure
below. Let r be any point that is between the two lines ℓ1 and ℓ2 . We denote
by Rr the rectangle that has r on its bottom side, whose left side is on ℓ1 ,
whose right side is on ℓ2 , and whose height is equal to δ. Observe that the
width of Rr is equal to 2δ.

   [Figure: two cases of the rectangle construction: the rectangle Rp with p
on its bottom side and q ∈ Rp , and the rectangle Rq with q on its bottom
side and p ∈ Rq ; both rectangles have width 2δ and height δ.]

    Consider the set
                                  B = B1 ∪ B2 ,
where
                      B1 = {(p, q) : p ∈ S′1 , q ∈ S′2 , q ∈ Rp }
and
                      B2 = {(q, p) : p ∈ S′1 , q ∈ S′2 , p ∈ Rq }.

   ² Even though A consists of unordered pairs and C consists of ordered pairs, we will
cheat a bit and say that C contains A.

Lemma 4.7.1 The set B is a superset of the set A, i.e., A ⊆ B.

Proof. We have to show that every element of the set A belongs (as an
ordered pair) to the set B. To prove this, consider an arbitrary element
{p, q} of A. We will show that one of the ordered pairs (p, q) and (q, p) is an
element of the set B.
    It follows from (4.7) that p ∈ S′1 and q ∈ S′2 . Thus, to prove that one of
(p, q) and (q, p) is an element of B, it remains to be shown that

                                 q ∈ Rp or p ∈ Rq .                              (4.8)

    Since {p, q} ∈ A, we have d(p, q) < δ. This implies that the vertical
distance between p and q is less than δ. That is, if p = (p1 , p2 ) and q = (q1 , q2 ),
then |p2 − q2 | < δ.
    If p2 < q2 , then the point q is contained in the rectangle Rp and, therefore,
(4.8) holds. Otherwise, p2 > q2 , in which case the point p is contained in the
rectangle Rq and, thus, (4.8) also holds.

    Is there a non-trivial upper bound on the size of the set B? Since each
of the two sets S′1 and S′2 can have n/2 elements, it is clear that |B| ≤
n/2 · n/2 = n²/4. In words, the size of B is at most quadratic in n. The
following lemma states that the size of B is, in fact, at most linear in n:

Lemma 4.7.2 The size of the set B is at most 4n.

Proof. Let p be an arbitrary point in S′1 . We claim that there are at most
four points q such that (p, q) ∈ B1 . We will prove this claim by contradiction.
Thus, assume that there are at least five such points q. Observe that for any
such point q, we have q ∈ S′2 and q ∈ Rp . Therefore, all these points q are
contained in the part of Rp that is to the right of the line ℓ. This part is a
square with sides of length δ. By Exercise 3.87, there are two of these points
that have distance at most

                                   δ/√2 < δ.


Thus, the set S′2 contains two points having distance less than δ. That is,
the closest-pair distance in the set S2 is less than δ.
    On the other hand, recall that δ = min(δ1 , δ2 ) and δ2 is the closest-pair
distance of the set S2 . It follows that all distances in the set S2 are at least
equal to δ. This is a contradiction.
    Thus, we have shown that, for this fixed point p in S′1 , there are at most
four points q such that (p, q) ∈ B1 . Therefore,

                      |B1 | ≤ 4|S′1 | ≤ 4|S1 | = 4 · n/2 = 2n.

   By a symmetric argument, for any fixed point q in S′2 , there are at most
four points p such that (q, p) ∈ B2 . This implies that the set B2 contains at
most 2n elements. We conclude that

                      |B| = |B1 | + |B2 | ≤ 2n + 2n = 4n.



   We are now ready to define the set C that we are looking for. Let

                                 S′1,2 = S′1 ∪ S′2 .

Imagine the points of this set S′1,2 listed in increasing order of their
y-coordinates. Consider an arbitrary point r of S′1,2 . The seven y-successors
of r are the seven points of S′1,2 that immediately follow r in this increasing
order. In the figure below, these are the points a, b, . . . , g.

   [Figure: a point r in the strip between ℓ1 and ℓ2 , followed in y-order by
its seven y-successors a, b, . . . , g.]


   Observe that the number of points that follow r may be less than seven.
In this case, we abuse our terminology a bit and still talk about the seven
y-successors of r, even though there are fewer of them.
   Our final set C is defined as follows:

  C = {(r, s) : r, s ∈ S′1 ∪ S′2 , s is one of the seven y-successors of r}. (4.9)

Lemma 4.7.3 The set C is a superset of the set A, i.e., A ⊆ C.

Proof. We will prove that B ⊆ C. It will then follow from Lemma 4.7.1
that A ⊆ C.
    Let (p, q) be an arbitrary element in the set B1 . It follows from the
definition of B1 that p ∈ S′1 , q ∈ S′2 , and q ∈ Rp . To prove that (p, q) is an
element of the set C, we have to argue that q is one of the seven y-successors
of p.
    As in the proof of Lemma 4.7.2, (i) the part of Rp that is to the left of
the line ℓ contains at most four points of S′1 and (ii) the part of Rp that is
to the right of ℓ contains at most four points of S′2 . Thus, the rectangle Rp
contains at most eight points of S′1 ∪ S′2 . Since p is one of them and p is on
the bottom side of Rp , the point q must be one of the seven y-successors of p.
    Thus, we have shown that B1 ⊆ C. By a symmetric argument, B2 ⊆ C.


    Consider the elements (r, s) of the set C. There are at most n choices
for the point r. For each choice of r, there are at most seven choices for the
point s. This proves the following lemma:

Lemma 4.7.4 The size of the set C is at most 7n.


4.7.2     The Recursive Algorithm
Consider a set of n points in R2 . We make the same assumptions as in
Section 4.7.1. Thus, n ≥ 2, n is a power of two, no two points have the same
x-coordinate, and no two points have the same y-coordinate. Our goal is to
compute the closest-pair distance in this point set. The base case, i.e., when
n = 2, is easy. Assume that n ≥ 4. In Section 4.7.1, we have seen that the
algorithm performs the following steps:


Step 1: Determine a vertical line ℓ that splits the point set into two subsets,
each having size n/2. This step is easy to perform if we have the points in
sorted order of their x-coordinates.
Steps 2 and 3: Run the algorithm recursively, once for all points to the left
of ℓ, and once for all points to the right of ℓ.
Step 4: Compute and traverse the set C that is defined in (4.9). This step is
easy to perform if we have the points in sorted order of their y-coordinates.
   We assume that the set of input points is stored in a list L. The entire
algorithm, which we denote by ClosestPair(L, n), is given in Figure 4.1. In
the pseudocode, Merge(·, ·, y) refers to the merge algorithm of Section 4.6
that merges two lists, based on the y-coordinates of the points.

   • The input to the call ClosestPair(L, n) is a list L that stores n points
     in R2 , where n ≥ 2 and n is a power of two. This list stores the points
     in increasing order of their x-coordinates.

   • The call ClosestPair(L, n) returns the closest-pair distance between
     any two distinct points that are stored in L.

   • At termination, the list L stores the same points, but in sorted order
     of their y-coordinates.

   The algorithm starts by checking if it is in the base case. Clearly, this
base case is easy to handle. Assume that the algorithm is not in the base
case, i.e., n ≥ 4.

   • Since L stores the input points in sorted order of their x-coordinates,
     the algorithm obtains the lists L1 and L2 by a simple traversal of L.
     Observe that, at this moment, the points in both lists L1 and L2 are
      sorted by their x-coordinates. The value z that is chosen by the
      algorithm is the x-coordinate of the vertical line ℓ.

   • In the first recursive call ClosestPair(L1 , n/2), the algorithm recur-
     sively computes the closest-pair distance δ1 in L1 , whereas in the second
     recursive call ClosestPair(L2 , n/2), it computes the closest-pair dis-
     tance δ2 in L2 . After these two recursive calls have terminated, the
     points in both lists L1 and L2 are sorted by their y-coordinates.




  Algorithm ClosestPair(L, n):

        if n = 2
        then δ = the distance between the two points in L;
              sort the points in L by their y-coordinates;
              return δ
        else L1 = list consisting of the first n/2 points in L;
             L2 = list consisting of the last n/2 points in L;
             z = any value between the x-coordinates of the last point
                   of L1 and the first point of L2 ;
             // both L1 and L2 are sorted by x-coordinate
             δ1 = ClosestPair(L1 , n/2);
             δ2 = ClosestPair(L2 , n/2);
             // both L1 and L2 are sorted by y-coordinate
             δ = min(δ1 , δ2 );
              L′1 = list consisting of all points p of L1 with p1 > z − δ;
              L′2 = list consisting of all points q of L2 with q1 < z + δ;
              // both L′1 and L′2 are sorted by y-coordinate
              L′1,2 = Merge (L′1 , L′2 , y);
              L = Merge (L1 , L2 , y);
              // both L′1,2 and L are sorted by y-coordinate
              if L′1,2 is empty
              then return δ
              else δ′1,2 = min{d(r, s) : r, s ∈ L′1,2 , s is one of the seven
                                         y-successors of r};
                   return min(δ, δ′1,2 )
             endif
        endif;



             Figure 4.1: The recursive closest pair algorithm.


   • By simple traversals of the lists L1 and L2 , the algorithm computes
     the lists L′1 and L′2 . Observe that L′1 stores all points of L1 that are
     to the right of the vertical line ℓ1 that is at distance δ to the left of ℓ.
     Similarly, L′2 stores all points of L2 that are to the left of the vertical
     line ℓ2 that is at distance δ to the right of ℓ.
      Since both lists L′1 and L′2 are in sorted y-order, the algorithm can
      use algorithm Merge (L′1 , L′2 , y) to merge these two lists into one list
      L′1,2 that is also in sorted y-order. Similarly, the algorithm can run
      Merge (L1 , L2 , y) to merge the two lists L1 and L2 into one list L that
      is in sorted y-order.
      In the final step, if the list L′1,2 is non-empty, the algorithm computes
      the value of δ′1,2 using a nested for-loop: The outer loop iterates over
      all points r of L′1,2 . For each such point r, the inner loop iterates over
      the seven y-successors of r in the list L′1,2 .


    Note that the input list L must contain the points in sorted order of
their x-coordinates. Therefore, before the first call to ClosestPair, we run
algorithm MergeSort(L, n) of Section 4.6 to sort the input points by their
x-coordinates. By Theorem 4.6.1, this takes O(n log n) time.
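    The algorithm of Figure 4.1 can be sketched in Python as follows. This
is an illustration, not a literal transcription of the pseudocode: like the
pseudocode, it takes a list sorted by x-coordinate and returns both δ and
the points sorted by y-coordinate, but it uses a brute-force base case for
n ≤ 3, so that it also handles values of n that are not powers of two.

```python
# A sketch of the recursive closest-pair algorithm. Assumes distinct
# x- and y-coordinates; handles any n >= 2 via a brute-force base case.
import math

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def merge_by_y(a, b):
    # merge two lists sorted by y-coordinate into one list sorted by y
    result, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i][1] <= b[j][1]:
            result.append(a[i]); i += 1
        else:
            result.append(b[j]); j += 1
    return result + a[i:] + b[j:]

def closest_pair(L):
    # L is sorted by x-coordinate; returns (delta, points sorted by y)
    n = len(L)
    if n <= 3:
        # base case by brute force (the text uses n = 2; allowing n <= 3
        # lets this sketch handle inputs that are not powers of two)
        d = min(dist(p, q) for i, p in enumerate(L) for q in L[i + 1:])
        return d, sorted(L, key=lambda p: p[1])
    m = n // 2
    z = (L[m - 1][0] + L[m][0]) / 2       # x-coordinate of the line l
    d1, Y1 = closest_pair(L[:m])
    d2, Y2 = closest_pair(L[m:])
    d = min(d1, d2)
    Y = merge_by_y(Y1, Y2)                # all points, in sorted y-order
    strip = [p for p in Y if z - d < p[0] < z + d]   # the list L'_{1,2}
    for i, r in enumerate(strip):
        for s in strip[i + 1 : i + 8]:    # the seven y-successors of r
            d = min(d, dist(r, s))
    return d, Y

points = sorted([(2, 3), (13, 30), (40, 50), (5, 1), (12, 10), (3, 4)])
print(closest_pair(points)[0])            # 1.4142135623730951
```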
    We now analyze the running time of algorithm ClosestPair(L, n). Let
T (n) denote the worst-case running time of this algorithm, when given as
input a list of size n whose points are in sorted x-order. If n = 2, then
the running time is bounded by some constant, say c. If n ≥ 4, then the
algorithm spends 2 · T (n/2) time for the two recursive calls, whereas the rest
of the algorithm takes at most c′ n time, where c′ is some constant. Thus,
the function T (n) satisfies the following recurrence:

         T (2) ≤ c,
         T (n) ≤ 2 · T (n/2) + c′ n, if n ≥ 4 and n is a power of 2.

As in Section 4.6.2, this recurrence solves to T (n) = O(n log n). Thus, we
have proved the following result:


Theorem 4.7.5 For any list L of n points in R2 , algorithm ClosestPair(L, n)
computes their closest-pair distance in O(n log n) time.


4.8      Counting Regions when Cutting a Circle
Take a circle, place n points on it, and connect each pair of points by a
straight-line segment. The points must be placed in such a way that no
three segments pass through one point. These segments divide the circle into
regions. Define Rn to be the number of such regions. Can we determine Rn ?




   By looking at the figure above, we see that

                  R1 = 1, R2 = 2, R3 = 4, R4 = 8, R5 = 16.

There seems to be a clear pattern and it is natural to guess that Rn is equal
to 2n−1 for all n ≥ 1. To prove this, we have to argue that the number of
regions doubles if we increase n by 1. If you try to do this, however, then
you will fail! The reason is that Rn is not equal to 2n−1 for all n ≥ 1; our
guess was correct only for 1 ≤ n ≤ 5.
    We will prove below that Rn grows only polynomially in n. This will
imply that Rn cannot be equal to 2n−1 for all n, because the latter function
grows exponentially.


4.8.1     A Polynomial Upper Bound on Rn
Let n be a (large) integer, consider a placement of n points on a circle, and
connect each of the (n choose 2) pairs of points by a straight-line segment. Recall
that we assume that no three segments pass through one point. We define
the following graph:
the following graph:

   • Each of the n points on the circle is a vertex.

   • Each intersection point between two segments is a vertex.


   • These vertices divide the segments into subsegments and the circle into
     arcs in a natural way. Each such subsegment and arc is an edge of the
     graph.
    The figure below illustrates this for the case when n = 5. The graph on
the right has 10 = 5 + 5 vertices: Each of the 5 points on the circle leads to
one vertex and each of the 5 intersection points leads to one vertex. These
10 vertices divide the (5 choose 2) = 10 segments into 20 straight-line edges and the
circle into 5 circular edges. Therefore, the graph has 20 + 5 = 25 edges.




    Note that, strictly speaking, this process does not define a proper graph,
because any two consecutive vertices on the circle are connected by two edges
(one straight-line edge and one circular edge), whereas in a proper graph,
there can be only one edge between any pair of vertices. For simplicity,
however, we will refer to the resulting structure as a graph.
    Let Vn and En be the number of vertices and edges of the graph, respec-
tively. We claim that

                        Vn ≤ n + \binom{\binom{n}{2}}{2}.                        (4.10)
This claim follows from the following observations:
   • There are exactly n vertices on the circle.

   • The n points on the circle are connected by \binom{n}{2} segments, and any
     two such segments intersect at most once. Therefore, the number of
     vertices inside the circle is at most the number of pairs of segments.
     The latter quantity is equal to \binom{\binom{n}{2}}{2}.

We next claim that

                        En ≤ n + \binom{Vn}{2}.                        (4.11)
This claim follows from the following observations:


   • There are exactly n edges on the circle.

   • Any straight-line edge joins two vertices. Therefore, the number of
     straight-line edges is at most the number of pairs of vertices, which is
     \binom{Vn}{2}.
The final claim is that
                                  Rn ≤ En .                            (4.12)
To prove this claim, we do the following. For each region r, choose a point
pr inside r, such that the y-coordinate of pr is not equal to the y-coordinate
of any vertex. Let f (r) be the first edge that is reached when walking from
pr horizontally to the right.


   [Figure: a point pr inside a region r; walking horizontally to the right
   from pr , the first edge reached is f (r).]


This defines a one-to-one function f from the set of regions to the set of
edges. Therefore, the number of regions, which is Rn , is at most the number
of edges, which is En .
    By combining (4.10), (4.11), and (4.12), we get

                        Rn ≤ En
                           ≤ n + \binom{Vn}{2}
                           ≤ n + \binom{n + \binom{\binom{n}{2}}{2}}{2}.
In order to estimate the last quantity, we are going to use asymptotic nota-
tion; see Section 2.3. First observe that

                        \binom{n}{2} = n(n − 1)/2 = O(n^2).


This implies that

                        \binom{\binom{n}{2}}{2} = \binom{O(n^2)}{2} = O(n^4),
which implies that

                        n + \binom{\binom{n}{2}}{2} = n + O(n^4) = O(n^4),
which implies that

                        \binom{n + \binom{\binom{n}{2}}{2}}{2} = \binom{O(n^4)}{2} = O(n^8),
which implies that

               Rn ≤ n + \binom{n + \binom{\binom{n}{2}}{2}}{2} = n + O(n^8) = O(n^8).
Thus, we have proved our claim that Rn grows polynomially in n and, there-
fore, for large values of n, Rn is not equal to 2^{n−1}. (Using results on pla-
nar graphs that we will see in Section 7.5.1, it can be shown that, in fact,
Rn = O(n^4).)
    We remark that there is a shorter way to prove that Rn is not equal to
2^{n−1} for all n ≥ 1: You can verify by hand that R6 = 31. Still, this single
example does not rule out the possibility that Rn grows exponentially. The
analysis that we gave above does rule this out.
    We have proved above that Rn = O(n^8). We also mentioned that this
upper bound can be improved to O(n^4). In the following subsections, we
will prove that the latter upper bound cannot be improved. That is, we will
prove that Rn = Θ(n^4). In fact, we will determine an exact formula, in terms
of n, for the value of Rn .

4.8.2    A Recurrence Relation for Rn
Let n ≥ 2 be an integer and consider a placement of n points on a circle. We
denote these points by p1 , p2 , . . . , pn and assume that they are numbered in
counterclockwise order. As before, we connect each of the \binom{n}{2} pairs of points
by a straight-line segment. We assume that no three segments pass through
one point. We are going to derive a recurrence relation for the number Rn
of regions in the following way:


   • Remove all segments that have pn as an endpoint. At this moment, the
     number of regions is, by definition, equal to Rn−1 .

   • Add the n − 1 line segments p1 pn , p2 pn , . . . , pn−1 pn one by one. For
     each segment pk pn added, determine the increase Ik in the number of
     regions.

   • Take the sum of Rn−1 and all increases Ik , i.e.,

                         Rn−1 + \sum_{k=1}^{n−1} Ik .

       This sum is equal to Rn , because in the entire process, we have counted
       each of the regions for n points exactly once.

   • Thus, together with the base case R1 = 1, we obtain a recurrence
     relation for the values Rn .

    We start by illustrating this process for the case when n = 6. The figure
below shows the situation after we have removed all segments that have p6
as an endpoint. The number of regions is equal to R5 = 16.
   [Figure: the points p1 , . . . , p6 on the circle, with all segments among
   p1 , . . . , p5 drawn; p6 has no incident segments.]

    We are going to add, one by one, the five segments that have p6 as an
endpoint. When we add p1 p6 , one region gets cut into two. Thus, the number
of regions increases by one. Using the notation introduced above, we have
I1 = 1.
   [Figure: the segment p1 p6 has been added.]


    When we add p2 p6 , four regions get cut into two. Thus, the number of
regions increases by four, and we have I2 = 4.

   [Figure: the segment p2 p6 has been added.]


    When we add p3 p6 , five regions get cut into two. Thus, the number of
regions increases by five, and we have I3 = 5.

   [Figure: the segment p3 p6 has been added.]



    When we add p4 p6 , four regions get cut into two. Thus, the number of
regions increases by four, and we have I4 = 4.

   [Figure: the segment p4 p6 has been added.]



    Finally, when we add p5 p6 , one region gets cut into two. Thus, the number
of regions increases by one, and we have I5 = 1.

   [Figure: the segment p5 p6 has been added.]


    After having added the five segments with endpoint p6 , we have accounted
for all regions determined by the six points. In other words, the number of
regions we have at the end is equal to R6 . Since the number of regions at
the end is also equal to the sum of (i) the number of regions we started with,
which is R5 , and (ii) the total increase, we have

                    R6 = R5 + I1 + I2 + I3 + I4 + I5 = 31.


    Let us look at this more carefully. We have seen that I3 = 5. That is,
when adding the segment p3 p6 , the number of regions increases by 5. Where
does this number 5 come from? The segment p3 p6 intersects 4 segments,
namely p1 p4 , p1 p5 , p2 p4 , and p2 p5 . The increase in the number of regions is
one more than the number of intersections. Thus, when adding a segment,
if we determine the number X of intersections between this new segment
and existing segments, then the increase in the number of regions is equal to
1 + X.
    When we add p3 p6 , we have X = 4. Where does this number 4 come
from? We make the following observations:


   • Any segment that intersects p3 p6 has one endpoint above p3 p6 and one
     endpoint below p3 p6 .


   • Any pair (a, b) of points on the circle, with a above p3 p6 and b below
     p3 p6 , defines a segment ab that intersects p3 p6 .


   • Thus, the value of X is equal to the number of pairs (a, b) of points in
     {p1 , p2 , p4 , p5 }, where a is above p3 p6 and b is below p3 p6 . Since there
     are 2 choices for a (viz., p1 and p2 ) and 2 choices for b (viz., p4 and p5 ),
     it follows from the Product Rule that X = 2 · 2 = 4.
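The Product-Rule count above is easy to sanity-check by brute force (a short Python sketch, not part of the argument; the points are represented only by their indices):

```python
# A segment p_i p_j (with i < j < n) crosses the segment p_k p_n exactly
# when i < k < j, i.e., one endpoint lies on each side of p_k p_n.
# Count these pairs for n = 6 and k = 3.
n, k = 6, 3
X = sum(1 for i in range(1, n) for j in range(i + 1, n) if i < k < j)

# Product Rule: k - 1 choices for i times n - k - 1 choices for j.
assert X == (k - 1) * (n - k - 1)
```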


   Now that we have seen the basic approach, we are going to derive the
recurrence relation for Rn for an arbitrary integer n ≥ 2. After having
removed all segments that have pn as an endpoint, we have Rn−1 regions.
For each integer k with 1 ≤ k ≤ n − 1, we add the segment pk pn . What is
the number of existing segments that are intersected by this new segment?

   [Figure: the segment pk pn , with a point pi on one side and a point pj on
   the other; the points p1 , . . . , pk−1 lie on one side of pk pn and the points
   pk+1 , . . . , pn−1 on the other.]


   We observe that for i < j,
   • pi pj intersects pk pn if and only if 1 ≤ i ≤ k − 1 and k + 1 ≤ j ≤ n − 1.
Since there are k − 1 choices for i and n − k − 1 choices for j, the Product
Rule implies that the number of intersections due to pk pn is equal to (k −
1)(n − k − 1). Thus, the segment pk pn goes through 1 + (k − 1)(n − k − 1)
regions, and each of them is cut into two. It follows that, when adding pk pn ,
the increase Ik in the number of regions is equal to
                         Ik = 1 + (k − 1)(n − k − 1).
We conclude that

                Rn = Rn−1 + \sum_{k=1}^{n−1} Ik
                   = Rn−1 + \sum_{k=1}^{n−1} (1 + (k − 1)(n − k − 1)) .

In the summation on the right-hand side
   • the term 1 occurs exactly n − 1 times, and
   • the term (k − 1)(n − k − 1) is non-zero only if 2 ≤ k ≤ n − 2.
It follows that, for n ≥ 2,

               Rn = Rn−1 + (n − 1) + \sum_{k=2}^{n−2} (k − 1)(n − k − 1).       (4.13)

Thus, together with the base case
                                       R1 = 1,                                  (4.14)
we have determined the recurrence relation we were looking for.
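The recurrence can be checked against the values found by hand earlier in this section (a small Python sketch; the function name `R` is ours):

```python
def R(n):
    # recurrence (4.13) with base case (4.14)
    if n == 1:
        return 1
    # range(2, n - 1) runs over k = 2, ..., n - 2
    return R(n - 1) + (n - 1) + sum((k - 1) * (n - k - 1) for k in range(2, n - 1))

# the values R_1, ..., R_6 determined in this section
assert [R(n) for n in range(1, 7)] == [1, 2, 4, 8, 16, 31]
```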


4.8.3     Simplifying the Recurrence Relation
In this subsection, we will use a combinatorial proof (see Section 3.7) to show
that the summation on the right-hand side of (4.13) satisfies
                  \sum_{k=2}^{n−2} (k − 1)(n − k − 1) = \binom{n−1}{3} ,                    (4.15)

for any integer n ≥ 2. (In fact, (4.15) is a special case of the result in
Exercise 3.62.)
    If n ∈ {2, 3}, then both sides of (4.15) are equal to zero. Assume that
n ≥ 4 and consider the set S = {1, 2, . . . , n−1}. We know that the number of
3-element subsets of S is equal to \binom{n−1}{3}. As we will see below, the summation
on the left-hand side of (4.15) counts exactly the same subsets.
    We divide the 3-element subsets of S into groups based on their mid-
dle element. Observe that the middle element can be any of the values
2, 3, . . . , n − 2. Thus, for any k with 2 ≤ k ≤ n − 2, the k-th group Gk
consists of all 3-element subsets of S whose middle element is equal to k.
Since the groups are pairwise disjoint, we have

                        \binom{n−1}{3} = \sum_{k=2}^{n−2} |Gk |.

What is the size of the k-th group Gk ? Any 3-element subset in Gk consists
of
   • one element from {1, 2, . . . , k − 1},

   • the element k, and

   • one element from {k + 1, k + 2, . . . , n − 1}.
It then follows from the Product Rule that

            |Gk | = (k − 1) · 1 · (n − k − 1) = (k − 1)(n − k − 1).

Thus, we have proved the identity in (4.15), and the recurrence relation in
(4.13) and (4.14) becomes

                  R1 = 1,
                  Rn = Rn−1 + (n − 1) + \binom{n−1}{3} , if n ≥ 2.              (4.16)
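The identity (4.15) itself can be verified numerically for small n (a Python sketch; `math.comb` computes binomial coefficients):

```python
from math import comb

# verify (4.15): sum_{k=2}^{n-2} (k-1)(n-k-1) = C(n-1, 3) for small n
for n in range(2, 60):
    lhs = sum((k - 1) * (n - k - 1) for k in range(2, n - 1))
    assert lhs == comb(n - 1, 3)
```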


4.8.4    Solving the Recurrence Relation
Now that we have a recurrence relation that looks reasonable, we are going
to apply the unfolding technique of Section 4.6 to solve it. Let n ≥ 2 be an
integer. By repeatedly applying the recurrence relation in (4.16), we get

   Rn = (n − 1) + \binom{n−1}{3} + Rn−1
      = (n − 1) + (n − 2) + \binom{n−1}{3} + \binom{n−2}{3} + Rn−2
      = (n − 1) + (n − 2) + (n − 3) + \binom{n−1}{3} + \binom{n−2}{3} + \binom{n−3}{3}
        + Rn−3 .
By continuing, we get

   Rn = (n − 1) + (n − 2) + (n − 3) + · · · + 3 + 2 + 1
        + \binom{n−1}{3} + \binom{n−2}{3} + \binom{n−3}{3} + · · · + \binom{3}{3} + \binom{2}{3} + \binom{1}{3}
        + R1 .
Since \binom{2}{3} = \binom{1}{3} = 0 and R1 = 1, we get

          Rn = (n − 1) + (n − 2) + (n − 3) + · · · + 3 + 2 + 1
               + \binom{n−1}{3} + \binom{n−2}{3} + \binom{n−3}{3} + · · · + \binom{3}{3}
               + 1.
Since, by Theorem 2.2.10, the first summation is equal to

              1 + 2 + 3 + · · · + (n − 1) = n(n − 1)/2 = \binom{n}{2} ,
we get

                        Rn = 1 + \binom{n}{2} + \sum_{k=3}^{n−1} \binom{k}{3} .
    The final step is to simplify the summation on the right-hand side. We
will use a combinatorial proof to show that
                        \sum_{k=3}^{n−1} \binom{k}{3} = \binom{n}{4} ,                      (4.17)


for any integer n ≥ 2. (As was the case for (4.15), the identity in (4.17) is a
special case of the result in Exercise 3.62.)
    If n ∈ {2, 3}, then both sides of (4.17) are equal to zero. Assume that
n ≥ 4 and consider all 4-element subsets of the set S = {1, 2, . . . , n}. We
know that there are \binom{n}{4} such subsets. We divide these subsets into
groups based on their largest element. For any k with 3 ≤ k ≤ n − 1, the
k-th group Gk consists of all 4-element subsets of S whose largest element is
equal to k + 1. It should be clear that

                        \binom{n}{4} = \sum_{k=3}^{n−1} |Gk |.

To determine the size of the group Gk , we observe that any 4-element subset
in Gk consists of
   • three elements from {1, 2, . . . , k} and
   • the element k + 1.
It then follows from the Product Rule that

                        |Gk | = \binom{k}{3} · 1 = \binom{k}{3} ,
completing the proof of (4.17).
    After (finally!) having solved and simplified our recurrence relation, we
conclude that for any integer n ≥ 1,

                        Rn = 1 + \binom{n}{2} + \binom{n}{4} .
   In Exercise 4.76, you will see a shorter way to determine the exact value
of Rn . We went for the long derivation, because it allowed us to illustrate,
along the way, several techniques from previous sections.
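The closed form can also be checked numerically against the recurrence (4.16) (a short Python sketch; the function names are ours):

```python
from math import comb

def R_rec(n):
    # recurrence (4.16)
    return 1 if n == 1 else R_rec(n - 1) + (n - 1) + comb(n - 1, 3)

def R_closed(n):
    # the closed form derived above: 1 + C(n, 2) + C(n, 4)
    return 1 + comb(n, 2) + comb(n, 4)

assert all(R_rec(n) == R_closed(n) for n in range(1, 100))
assert [R_closed(n) for n in range(1, 7)] == [1, 2, 4, 8, 16, 31]
```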


4.9     Exercises
4.1 The function f : N → Z is recursively defined as follows:
                    f (0) = 7,
                    f (n) = f (n − 1) + 6n − 3 if n ≥ 1.
Prove that f (n) = 3n^2 + 7 for all integers n ≥ 0.
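Before writing an induction proof, a claim like this can be sanity-checked numerically (a Python sketch; the same pattern applies to the following exercises):

```python
def f(n):
    # the recurrence from this exercise: f(0) = 7, f(n) = f(n-1) + 6n - 3
    return 7 if n == 0 else f(n - 1) + 6 * n - 3

# compare against the claimed closed form 3n^2 + 7
assert all(f(n) == 3 * n * n + 7 for n in range(200))
```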


4.2 The function f : N → Z is recursively defined as follows:
               f (0) = −18,
               f (n) = 9(n − 2)(n − 3) + f (n − 1) if n ≥ 1.
Prove that
                          f (n) = 3(n − 1)(n − 2)(n − 3)
for all integers n ≥ 0.

4.3 The function f : N → Z is recursively defined as follows:
                f (0) = 3,
                f (n) = 2 · f (n − 1) − (f (n − 1))2 if n ≥ 1.
Prove that f (n) = 1 − 2^{2^n} for all integers n ≥ 1. (Note that 2^{2^n} denotes 2 to
the power of 2^n .)

4.4 The function f : N → N is defined by
                     f (0) = 1,
                     f (n) = (1/2) · 4^n · f (n − 1) if n ≥ 1.
Prove that for every integer n ≥ 0,

                                    f (n) = 2^{n^2} ;

this reads as 2 to the power n^2 .

4.5 The function f : N → N is defined by
                     f (0) = 0,
                     f (1) = 0,
                     f (n) = f (n − 2) + 2^{n−1} if n ≥ 2.
   • Prove that for every even integer n ≥ 0,

                                    f (n) = (2^{n+1} − 2)/3 .

   • Prove that for every odd integer n ≥ 1,

                                    f (n) = (2^{n+1} − 4)/3 .


4.6 The function f : {1, 2, 3, . . .} → R is defined by

                  f (1) = 2,
                  f (n) = (1/2) (f (n − 1) + 1/f (n − 1)) if n ≥ 2.

   • Prove that for every integer n ≥ 1,

                                    f (n) = (3^{2^{n−1}} + 1)/(3^{2^{n−1}} − 1) .

     Note that 3^{2^{n−1}} denotes 3 to the power of 2^{n−1} .

4.7 You are asked to come up with an exam question about recursive func-
tions. You write down some recurrence, which you then solve. Afterwards,
you give the recurrence to the students, together with the solution. The
students must then prove that the given solution is indeed correct.
    This is a painful process, because you must solve the recurrence yourself.
Since you are lazy, you start with the following:
       Exam Question:

           The function f : N → N is defined by

                       f (0) = XXX,
                       f (n) = f (n − 1) + Y Y Y         if n ≥ 1.

           Prove that for every integer n ≥ 0,

                                 f (n) = 7n2 − 2n + 9.


   • Complete the question, i.e., fill in XXX and Y Y Y , so that you obtain
     a complete recurrence that has the given solution.

4.8 The function f : N → Z is defined by

                                  f (n) = 2n(n − 6)

for each integer n ≥ 0. Derive a recursive form of this function.


4.9 The function f : N2 → N is defined by
            f (0, n)    =   2n                               if   n ≥ 0,
            f (m, 0)    =   0                                if   m ≥ 1,
            f (m, 1)    =   2                                if   m ≥ 1,
            f (m, n)    =   f (m − 1, f (m, n − 1))          if   m ≥ 1 and n ≥ 2.
   • Determine f (2, 2).
   • Determine f (1, n) for n ≥ 1.
   • Determine f (3, 3).

4.10 The function f : N3 → N is defined as follows:
      f (k, n, 0)   =   k+n                             if    k   ≥ 0 and n ≥ 0,
      f (k, 0, 1)   =   0                               if    k   ≥ 0,
      f (k, 0, 2)   =   1                               if    k   ≥ 0,
      f (k, 0, i)   =   k                               if    k   ≥ 0 and i ≥ 3,
      f (k, n, i)   =   f (k, f (k, n − 1, i), i − 1)   if    k   ≥ 0, i ≥ 1, and n ≥ 1.
Determine f (2, 3, 2).

4.11 The functions f : N → N and g : N2 → N are recursively defined as
follows:
               f (0)        =   1,
               f (n)        =   g(n, f (n − 1)) if n ≥ 1,
               g(m, 0)      =   0               if m ≥ 0,
               g(m, n)      =   m + g(m, n − 1) if m ≥ 0 and n ≥ 1.
Solve these recurrence relations for f , i.e., express f (n) in terms of n.

4.12 The functions f : N → N and g : N2 → N are recursively defined as
follows:
            f (0)       =   1,
            f (1)       =   2,
            f (n)       =   g(f (n − 2), f (n − 1)) if n ≥ 2,
            g(m, 0)     =   2m                      if m ≥ 0,
            g(m, n)     =   g(m, n − 1) + 1         if m ≥ 0 and n ≥ 1.
Solve these recurrence relations for f , i.e., express f (n) in terms of n.


4.13 The functions f : N → N and g : N2 → N are recursively defined as
follows:
             f (0)      =   1,
             f (n)      =   g(f (n − 1), 2n) if n ≥ 1,
             g(0, n)    =   0                if n ≥ 0,
             g(m, n)    =   g(m − 1, n) + n if m ≥ 1 and n ≥ 0.

Solve these recurrence relations for f , i.e., express f (n) in terms of n.

4.14 The functions f : N → N, g : N2 → N, and h : N → N are recursively
defined as follows:
             f (n)      =   g(n, h(n))          if n ≥ 0,
             g(m, 0)    =   0                   if m ≥ 0,
             g(m, n)    =   g(m, n − 1) + m     if m ≥ 0 and n ≥ 1,
             h(0)       =   1,
             h(n)       =   2 · h(n − 1)        if n ≥ 1.

Solve these recurrences for f , i.e., express f (n) in terms of n.

4.15 The sequence an of numbers, for n ≥ 0, is recursively defined as follows:

                       a0 = 5,
                       a1 = 3,
                       an = 6 · an−1 − 9 · an−2 if n ≥ 2.

   • Determine an for n = 0, 1, 2, 3, 4, 5.

   • Prove that for every integer n ≥ 0,

                                  an = (5 − 4n) · 3^n .
4.16 Let ϕ = (1 + √5)/2 and ψ = (1 − √5)/2, and let n ≥ 0 be an integer.
We have seen in Theorem 4.2.1 that

                                  (ϕ^n − ψ^n)/√5                                (4.18)
is equal to the n-th Fibonacci number fn . Since the Fibonacci numbers are
obviously integers, the number in (4.18) is an integer as well.
    Prove that the number in (4.18) is a rational number using only Newton’s
Binomial Theorem (i.e., Theorem 3.6.5).


4.17 In Section 4.2, we have defined the Fibonacci numbers f0 , f1 , f2 , . . . In
this exercise, you will prove that there exists a Fibonacci number whose 2018
rightmost digits (when written in decimal notation) are all zero.
    In the rest of this exercise, N denotes the number 10^{4036}. For any integer
n ≥ 0, define
                               gn = fn mod 10^{2018} .

   • Consider the ordered pairs (gn , gn+1 ), for n = 0, 1, 2, . . . , N . Use the
     Pigeonhole Principle to prove that these ordered pairs cannot all be
     distinct. That is, prove that there exist integers m ≥ 0, p ≥ 1, such
     that m + p ≤ N and

                              (gm , gm+1 ) = (gm+p , gm+p+1 ).

   • Prove that (gm−1 , gm ) = (gm+p−1 , gm+p ).

   • Prove that (g0 , g1 ) = (gp , gp+1 ).

   • Consider the decimal representation of fp . Prove that the 2018 right-
     most digits of fp are all zero.

   • Let b ≥ 2 and k ≥ 1 be integers. Prove that there exists a Fibonacci
     number whose k rightmost digits (when written in base-b notation) are
     all zero.

4.18 In Section 4.2, we have defined the Fibonacci numbers f0 , f1 , f2 , . . .
Prove that for each integer n ≥ 1,

                              \sum_{i=1}^{n} f2i = f2n+1 − 1

and
                       f1^2 + f2^2 + f3^2 + · · · + fn^2 = fn fn+1 .
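Both identities can be sanity-checked numerically before attempting a proof (a Python sketch; it assumes the convention f0 = 0, f1 = 1 from Section 4.2):

```python
# Fibonacci numbers f_0, f_1, f_2, ... (assuming f_0 = 0, f_1 = 1)
fib = [0, 1]
while len(fib) < 70:
    fib.append(fib[-1] + fib[-2])

for n in range(1, 30):
    # sum_{i=1}^{n} f_{2i} = f_{2n+1} - 1
    assert sum(fib[2 * i] for i in range(1, n + 1)) == fib[2 * n + 1] - 1
    # f_1^2 + ... + f_n^2 = f_n * f_{n+1}
    assert sum(fib[i] ** 2 for i in range(1, n + 1)) == fib[n] * fib[n + 1]
```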

4.19 In Section 4.2, we have defined the Fibonacci numbers f0 , f1 , f2 , . . .
Prove that for each integer n ≥ 0,

   • f3n is even,

   • f3n+1 is odd,


   • f3n+2 is odd,

   • f4n is a multiple of 3.

4.20 In Section 4.2, we have defined the Fibonacci numbers f0 , f1 , f2 , . . . In
Section 4.2.1, we have seen that for any integer m ≥ 1, the number of 00-free
bitstrings of length m is equal to fm+2 .
    Let n ≥ 2 be an integer.

   • How many 00-free bitstrings of length n do not contain any 0?

   • How many 00-free bitstrings of length n have the following property:
     The rightmost 0 is at position 1.

   • How many 00-free bitstrings of length n have the following property:
     The rightmost 0 is at position 2.

   • Let k be an integer with 3 ≤ k ≤ n. How many 00-free bitstrings of
     length n have the following property: The rightmost 0 is at position k.

   • Use the previous results to prove that
                                  fn+2 = 1 + \sum_{k=1}^{n} fk .


4.21 In Section 4.2, we have defined the Fibonacci numbers f0 , f1 , f2 , . . . In
Section 4.2.1, we have seen that for any integer m ≥ 1, the number of 00-free
bitstrings of length m is equal to fm+2 .

   • Let n ≥ 2 be an integer. What is the number of 00-free bitstrings of
     length 2n − 1 for which the bit in the middle position is equal to 1?

   • Let n ≥ 3 be an integer. What is the number of 00-free bitstrings of
     length 2n − 1 for which the bit in the middle position is equal to 0?

   • Use the previous results to prove that for any integer n ≥ 3,

                                  f2n+1 = (fn)^2 + (fn+1)^2 .
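Again, a quick numerical check is possible (a Python sketch, assuming the convention f0 = 0, f1 = 1 from Section 4.2):

```python
fib = [0, 1]
while len(fib) < 80:
    fib.append(fib[-1] + fib[-2])

# verify f_{2n+1} = f_n^2 + f_{n+1}^2 for small n
for n in range(3, 39):
    assert fib[2 * n + 1] == fib[n] ** 2 + fib[n + 1] ** 2
```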


4.22 In Section 4.2, we have defined the Fibonacci numbers f0 , f1 , f2 , . . . In
Section 4.2.1, we have seen that for any integer m ≥ 1, the number of 00-free
bitstrings of length m is equal to fm+2 .
    Let n ≥ 1 be an integer.
   • How many 00-free bitstrings of length n + 2 do not contain any 0?
   • How many 00-free bitstrings of length n + 2 contain exactly one 0?
   • How many 00-free bitstrings of length n+2 have the following property:
     The bitstring contains at least two 0’s, and the second rightmost 0 is
     at position 1.
   • How many 00-free bitstrings of length n+2 have the following property:
     The bitstring contains at least two 0’s, and the second rightmost 0 is
     at position 2.
   • Let k be an integer with 3 ≤ k ≤ n. How many 00-free bitstrings of
     length n+2 have the following property: The bitstring contains at least
     two 0’s, and the second rightmost 0 is at position k.
   • Let k be an element of {n + 1, n + 2}. How many 00-free bitstrings
     of length n + 2 have the following property: The bitstring contains at
     least two 0’s, and the second rightmost 0 is at position k.
   • Use the previous results to prove that
                        \sum_{k=1}^{n} (n − k + 1) · fk = fn+4 − n − 3,

      i.e.,

      n · f1 + (n − 1) · f2 + (n − 2) · f3 + · · · + 2 · fn−1 + 1 · fn = fn+4 − n − 3.

4.23 Use basic algebra to prove that

                (x^2 + y^2 + (x + y)^2)^2 = 2 (x^4 + y^4 + (x + y)^4) .

In Section 4.2, we have defined the Fibonacci numbers f0 , f1 , f2 , . . . Prove
that for each integer n ≥ 0,

                ((fn)^2 + (fn+1)^2 + (fn+2)^2)^2 = 2 ((fn)^4 + (fn+1)^4 + (fn+2)^4) .
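The algebraic identity in the first part can be checked on many integer pairs before proving it in general (a Python sketch, not a proof):

```python
# check the identity (x^2 + y^2 + (x+y)^2)^2 = 2(x^4 + y^4 + (x+y)^4)
# on a grid of integer pairs
for x in range(-20, 21):
    for y in range(-20, 21):
        lhs = (x ** 2 + y ** 2 + (x + y) ** 2) ** 2
        rhs = 2 * (x ** 4 + y ** 4 + (x + y) ** 4)
        assert lhs == rhs
```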


4.24 Let n ≥ 1 be an integer and consider a 2 × n board Bn consisting of 2n
square cells. The top part of the figure below shows B13 .




    A brick is a horizontal or vertical board consisting of 2 square cells; see
the bottom part of the figure above. A tiling of the board Bn is a placement
of bricks on the board such that
   • the bricks exactly cover Bn and
   • no two bricks overlap.
The figure below shows a tiling of B13 .




   For n ≥ 1, let Tn be the number of different tilings of the board Bn .
Determine the value of Tn , i.e., express Tn in terms of numbers that we have
seen in this chapter.
4.25 Let n be a positive integer and consider a 5 × n board Bn consisting
of 5n cells, each one having sides of length one. The top part of the figure
below shows B12 .
134                                                Chapter 4.      Recursion


    A brick is a horizontal or vertical board consisting of 2 × 3 = 6 cells; see
the bottom part of the figure above. A tiling of the board Bn is a placement
of bricks on the board such that

   • the bricks exactly cover Bn and

   • no two bricks overlap.

The figure below shows a tiling of B12 .




   Let Tn be the number of different tilings of the board Bn .

   • Let n ≥ 6 be a multiple of 6. Determine the value of Tn .

   • Let n be a positive integer that is not a multiple of 6. Prove that
     Tn = 0.

4.26 Let n be a positive integer and consider a 1 × n board Bn consisting of
n cells, each one having sides of length one. The top part of the figure below
shows B9 .



                        R     B    W        Y       G


    We have an unlimited supply of bricks, which are of the following types
(see the bottom part of the figure above):

   • There are red (R) and blue (B) bricks, both of which are 1 × 1 cells.

   • There are white (W ), yellow (Y ), and green (G) bricks, all of which
     are 1 × 2 cells.

   A tiling of the board Bn is a placement of bricks on the board such that
   • the bricks exactly cover Bn and

   • no two bricks overlap.

In a tiling, a color can be used more than once and some colors may not be
used at all. The figure below shows a tiling of B9 , in which each color is used
and the color red is used twice.

                           B     W     R      G   R    Y


   Let Tn be the number of different tilings of the board Bn .

   • Determine T1 and T2 .

   • Let n ≥ 3 be an integer. Prove that

                                 Tn = 2 · Tn−1 + 3 · Tn−2 .

   • Prove that for any integer n,

                          2(−1)^(n−1) + 3(−1)^(n−2) = (−1)^n.

   • Prove that for any integer n ≥ 1,

                            Tn = (3^(n+1) + (−1)^n) / 4.
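The closed form can be sanity-checked by brute force before proving it. The sketch below (Python; an illustrative helper, not part of the exercise) enumerates every tiling of Bn as its left-to-right sequence of brick letters, so Tn is simply the number of generated strings:

```python
def tilings(n):
    # all tilings of a 1 x n board, written as the left-to-right
    # sequence of brick letters: R, B are 1x1 bricks; W, Y, G are 1x2 bricks
    if n == 0:
        return ['']
    result = [t + c for t in tilings(n - 1) for c in 'RB']
    if n >= 2:
        result += [t + c for t in tilings(n - 2) for c in 'WYG']
    return result

# compare the brute-force count with the closed form for small n
for n in range(1, 9):
    assert len(tilings(n)) == (3**(n + 1) + (-1)**n) // 4
```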

4.27 The sequence of numbers an , for n ≥ 0, is recursively defined as follows:

                        a0 = 0,
                        a1 = 1,
                        an = 2 · an−1 + an−2 if n ≥ 2.

   • Determine an for n = 0, 1, 2, 3, 4, 5.

   • Prove that

            an = ((1 + √2)^n − (1 − √2)^n) / (2√2)                      (4.19)

     for all integers n ≥ 0.
     Hint: What are the solutions of the equation x² = 2x + 1?
   • Since the numbers an , for n ≥ 0, are obviously integers, the fraction
     on the right-hand side of (4.19) is an integer as well. Prove that this
     fraction is an integer using only Newton’s Binomial Theorem (i.e.,
     Theorem 3.6.5).
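The recurrence and the closed form (4.19) are easy to compare numerically; floating-point arithmetic is accurate enough for small n. A quick Python sketch (helper name chosen for illustration):

```python
from math import sqrt

def a(n):
    # a0 = 0, a1 = 1, a_n = 2*a_{n-1} + a_{n-2}
    x, y = 0, 1
    for _ in range(n):
        x, y = y, 2 * y + x
    return x

r, s = 1 + sqrt(2), 1 - sqrt(2)
for n in range(12):
    # closed form (4.19), rounded to the nearest integer
    assert a(n) == round((r**n - s**n) / (2 * sqrt(2)))
```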

4.28 Let n be a positive integer and consider a 1 × n board Bn consisting of
n cells, each one having sides of length one. The top part of the figure below
shows B9 .




                              R           B           G


    You have an unlimited supply of bricks, which are of the following types
(see the bottom part of the figure above):

   • There are red (R) and blue (B) bricks, both of which are 1 × 1 cells.
     We refer to these bricks as squares.

   • There are green (G) bricks, which are 1 × 2 cells. We refer to these as
     dominoes.

   A tiling of the board Bn is a placement of bricks on the board such that

   • the bricks exactly cover Bn and

   • no two bricks overlap.

In a tiling, a color can be used more than once and some colors may not be
used at all. The figure below shows an example of a tiling of B9 .

                         G        B   B       R   B       G     R


   Let Tn be the number of different tilings of the board Bn .

   • Determine T1 , T2 , and T3 .

   • For any integer n ≥ 1, express Tn in terms of the numbers that appear
      in Exercise 4.27.
4.29 In this exercise, we use the notation of Exercise 4.28. Let n ≥ 1 be an
integer and consider the 1 × n board Bn .

   • Consider strings consisting of characters, where each character is S or
     D. Let k be an integer with 0 ≤ k ≤ ⌊n/2⌋. Determine the number of
     such strings of length n − k that contain exactly k D’s.

   • Let k be an integer with 0 ≤ k ≤ ⌊n/2⌋. Determine the number of
     tilings of the board Bn that use exactly k dominoes.
       Hint: How many bricks are used for such a tiling? In the first part,
       imagine that S stands for “square” and D stands for “domino”.

   • Use the results of the previous part to prove that

                     Tn = Σ_{k=0}^{⌊n/2⌋} (n−k choose k) · 2^(n−2k).
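The identity can be tested before it is proved. The sketch below (Python; the helper is illustrative) counts the tilings of Exercise 4.28 by brute force, enumerating each tiling as its sequence of brick letters, and compares the count with the binomial sum:

```python
from math import comb

def tilings(n):
    # all tilings of the 1 x n board: squares R, B and dominoes G
    if n == 0:
        return ['']
    result = [t + c for t in tilings(n - 1) for c in 'RB']
    if n >= 2:
        result += [t + 'G' for t in tilings(n - 2)]
    return result

for n in range(1, 12):
    total = sum(comb(n - k, k) * 2**(n - 2 * k) for k in range(n // 2 + 1))
    assert len(tilings(n)) == total
```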


4.30 In this exercise, we consider strings of characters, where each character
is an element of {a, b, c}. For any integer n ≥ 1, let En be the number of
such strings of length n that have an even number of c’s, and let On be the
number of such strings of length n that have an odd number of c’s. (Recall
that 0 is even.)

   • Determine E1 , O1 , E2 , and O2 .

   • Explain, in plain English, why

                                   En + On = 3^n.

   • Prove that for every integer n ≥ 2,

                              En = 2 · En−1 + On−1 .

   • Prove that for every integer n ≥ 1,

                                 En = (1 + 3^n) / 2.
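All parts of this exercise can be checked against a brute-force count. A small Python sketch (the helper name is an assumption for illustration) that enumerates every string over {a, b, c}:

```python
from itertools import product

def even_c(n):
    # number of strings over {a, b, c} of length n with an even number of c's
    return sum(1 for s in product('abc', repeat=n) if s.count('c') % 2 == 0)

for n in range(1, 9):
    E, O = even_c(n), 3**n - even_c(n)
    assert E + O == 3**n            # every string has an even or odd count
    assert E == (1 + 3**n) // 2     # the claimed closed form
```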
4.31 Consider strings of n characters, where each character is an element of
{a, b, c, d}, that contain an even number of a’s. (Recall that 0 is even.) Let
En be the number of such strings. Prove that for any integer n ≥ 1,

                              En+1 = 2 · En + 4^n.

4.32 Let An be the number of bitstrings of length n that contain 000. Prove
that for n ≥ 4,
                    An = An−1 + An−2 + An−3 + 2^(n−3).
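A brute-force check of the recurrence for small n (Python sketch; the counting helper is an assumption for illustration, not part of the exercise):

```python
from itertools import product

def A(n):
    # number of bitstrings of length n that contain 000 as a substring
    return sum(1 for s in product('01', repeat=n) if '000' in ''.join(s))

for n in range(4, 13):
    assert A(n) == A(n - 1) + A(n - 2) + A(n - 3) + 2**(n - 3)
```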

4.33 Let n ≥ 1 be an integer and define An to be the number of bitstrings
of length n that do not contain 101.

   • Determine A1 , A2 , A3 , and A4 .

   • Prove that for each integer n ≥ 4,

               An = 3 + A1 + A2 + A3 + · · · + An−4 + An−3 + An−1
                  = 3 + Σ_{k=1}^{n−3} Ak + An−1 .


Hint: Divide the strings into groups depending on the number of leading 1s.

4.34 Let n ≥ 1 be an integer and consider n people P1 , P2 , . . . , Pn . Let An
be the number of ways these n people can be divided into groups, such that
each group consists of either one or two people.

   • Determine A1 , A2 , A3 , and A4 .

   • Prove that for each integer n ≥ 3,

                             An = An−1 + (n − 1) · An−2 .

4.35 In this exercise, we consider strings of characters, where each character
is an element of {a, b, c}. Such a string is called aa-free, if it does not contain
two consecutive a’s. For any integer n ≥ 1, let Fn be the number of aa-free
strings of length n.

   • Determine F1 , F2 , and F3 .
   • Let n ≥ 3 be an integer. Express Fn in terms of Fn−1 and Fn−2 .
   • Prove that for every integer n ≥ 1,
             Fn = (1/2 + 1/√3)(1 + √3)^n + (1/2 − 1/√3)(1 − √3)^n.

       Hint: What are the solutions of the equation x² = 2x + 2? Using these
       solutions will simplify the proof.
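Both the recurrence and the closed form can be compared against a brute-force count; rounding the floating-point value of the closed form is safe for small n. A Python sketch (helper name chosen for illustration):

```python
from itertools import product
from math import sqrt

def aa_free(n):
    # number of strings over {a, b, c} of length n without two consecutive a's
    return sum(1 for s in product('abc', repeat=n) if 'aa' not in ''.join(s))

# the recurrence suggested by the hint
for n in range(3, 10):
    assert aa_free(n) == 2 * aa_free(n - 1) + 2 * aa_free(n - 2)

# the closed form, evaluated in floating point and rounded
for n in range(1, 10):
    closed = (0.5 + 1 / sqrt(3)) * (1 + sqrt(3))**n \
           + (0.5 - 1 / sqrt(3)) * (1 - sqrt(3))**n
    assert aa_free(n) == round(closed)
```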

4.36 In this exercise, we consider strings of characters, where each character
is an element of {a, b, c}. Such a string is called awesome, if it does not
contain the substring ab and does not contain the substring ba. For any
integer n ≥ 1, let
  1. Sn denote the number of awesome strings of length n,
  2. An denote the number of awesome strings of length n that start with a,
  3. Bn denote the number of awesome strings of length n that start with b,
  4. Cn denote the number of awesome strings of length n that start with c.

   • Determine S1 and S2 .
   • Let n ≥ 1 be an integer. Express Sn in terms of An , Bn , and Cn .
   • Let n ≥ 2 be an integer. Express Cn in terms of Sn−1 .
   • Let n ≥ 2 be an integer. Prove that
                    Sn = (Sn−1 − Bn−1 ) + (Sn−1 − An−1 ) + Sn−1 .

   • Let n ≥ 3 be an integer. Prove that
                               Sn = 2 · Sn−1 + Sn−2 .

   • Prove that for every integer n ≥ 1,
                   Sn = (1/2)(1 + √2)^(n+1) + (1/2)(1 − √2)^(n+1).
       Hint: What are the solutions of the equation x² = 2x + 1? Using these
       solutions will simplify the proof.
4.37 A block in a bitstring is a maximal consecutive substring of 1’s. For
example, the bitstring 1100011110100111 has four blocks: 11, 1111, 1, and
111. These blocks have lengths 2, 4, 1, and 3, respectively.
    Let n ≥ 1 be an integer and let Bn be the number of bitstrings of length
n that do not contain any block of odd length; in other words, every block
in these bitstrings has an even length.
   • Determine B1 , B2 , B3 , and B4 .
   • Determine the value of Bn , i.e., express Bn in terms of numbers that
     we have seen in this chapter.

4.38 Let n ≥ 1 be an integer and let Sn be the number of ways in which n
can be written as a sum of 1s and 2s; the order in which the 1s and 2s occur
in the sum matters. For example, S3 = 3, because

                        3 = 1 + 1 + 1 = 1 + 2 = 2 + 1.

   • Determine S1 , S2 , and S4 .
   • Determine the value of Sn , i.e., express Sn in terms of numbers that we
     have seen in this chapter.

4.39 Ever since he was a child, Nick has been dreaming of being like Spiderman.
As you all know, Spiderman can climb up the outside of a building; if he is
at a particular floor, then, in one step, he can move up several floors. Nick
is not that advanced yet. In one step, Nick can move up either one floor or
two floors.
    Let n ≥ 1 be an integer and consider a building with n floors, numbered
1, 2, . . . , n. (The first floor has number 1; this is not the ground floor.) Nick
is standing in front of this building, at the ground level. There are different
ways in which Nick can climb to the n-th floor. For example, here are three
different ways for the case when n = 5:
   1. move up 2 floors, move up 1 floor, move up 2 floors.
   2. move up 1 floor, move up 2 floors, move up 2 floors.
   3. move up 1 floor, move up 2 floors, move up 1 floor, move up 1 floor.
   Let Sn be the number of different ways, in which Nick can climb to the
n-th floor.
    • Determine S1 , S2 , S3 , and S4 .

    • Determine the value of Sn , i.e., express Sn in terms of numbers that we
      have seen in this chapter.

4.40 Let n ≥ 1 be an integer and consider the set Sn = {1, 2, . . . , n}. A non-
neighbor subset of Sn is any subset T of Sn having the following property: If
k is any element of T , then k + 1 is not an element of T . (Observe that the
empty set is a non-neighbor subset of Sn .)
    For example, if n = 3, then {1, 3} is a non-neighbor subset, whereas {2, 3}
is not a non-neighbor subset.
    Let Nn denote the number of non-neighbor subsets of the set Sn .

    • Determine N1 , N2 , and N3 .

    • Determine the value of Nn , i.e., express Nn in terms of numbers that
      we have seen in this chapter.

4.41 Let n ≥ 1 be an integer and consider the set S = {1, 2, . . . , n}.

    • Assume we arrange the elements of S in sorted order on a horizontal
      line. Let Bn be the number of subsets of S that do not contain any
      two elements that are neighbors on this line. For example, if n = 4,
      then both subsets {1, 3} and {1, 4} are counted in B4 , but neither of
      the subsets {2, 3} and {2, 3, 4} is counted.
       For each integer n ≥ 1, express Bn in terms of numbers that we have
       seen in this chapter.

    • Assume we arrange the elements of S in sorted order along a circle. Let
      Cn be the number of subsets of S that do not contain any two elements
      that are neighbors on this circle. For example, if n = 4, then the subset
      {1, 3} is counted in C4 , but neither of the subsets {2, 3} and {1, 4} is
      counted.
       For each integer n ≥ 4, express Cn in terms of numbers that we have
       seen in this chapter.

4.42 For any integer n ≥ 1, a permutation a1 , a2 , . . . , an of the set {1, 2, . . . , n}
is called awesome, if the following condition holds:
   • For every i with 1 ≤ i ≤ n, the element ai in the permutation belongs
     to the set {i − 1, i, i + 1}.
For example, for n = 5, the permutation 2, 1, 3, 5, 4 is awesome, whereas
2, 1, 5, 3, 4 is not an awesome permutation.
    Let Pn denote the number of awesome permutations of the set {1, 2, . . . , n}.

   • Determine P1 , P2 , and P3 .

   • Determine the value of Pn , i.e., express Pn in terms of numbers that
     we have seen in this chapter.
      Hint: Derive a recurrence relation. What are the possible values for
      the last element an in an awesome permutation?

4.43 A block in a bitstring is a maximal consecutive substring of 1’s. For
example, the bitstring 1100011110100111 has four blocks: 11, 1111, 1, and
111.
    For a given integer n ≥ 1, consider all 2^n bitstrings of length n. Let Bn
be the total number of blocks in all these bitstrings.
   For example, the left part of the table below contains all 8 bitstrings of
length 3. Each entry in the rightmost column shows the number of blocks in
the corresponding bitstring. Thus,

                   B3 = 0 + 1 + 1 + 1 + 1 + 2 + 1 + 1 = 8.

                                  000    0
                                  001    1
                                  010    1
                                  100    1
                                  011    1
                                  101    2
                                  110    1
                                  111    1

   • Determine B1 and B2 .

   • Let n ≥ 3 be an integer.

        – Consider all bitstrings of length n that start with 0. What is the
          total number of blocks in these bitstrings?
         – Determine the number of blocks in the bitstring 1 · · · 1 consisting
           of n ones.


         – Determine the number of blocks in the bitstring 1 · · · 1 0 consisting
           of n − 1 ones followed by a single 0.


         – Let k be an integer with 2 ≤ k ≤ n − 1. Consider all bitstrings of
           length n that start with 1 · · · 1 0, i.e., k − 1 ones followed by a 0.
           Prove that the total number of blocks in these bitstrings is equal
           to
                                2^(n−k) + Bn−k .
         – Prove that

                      Bn = 2 + Bn−1 + Σ_{k=2}^{n−1} (2^(n−k) + Bn−k).


         – Use 1 + 2 + 2^2 + 2^3 + · · · + 2^(n−2) = 2^(n−1) − 1 to prove that

                        Bn = 2^(n−1) + B1 + B2 + · · · + Bn−1 .           (4.20)

   • Prove that (4.20) also holds for n = 2.

   • Let n ≥ 3. Prove that

                               Bn = 2^(n−2) + 2 · Bn−1 .                  (4.21)

       Hint: Write (4.20) on one line. Below this line, write (4.20) with n
       replaced by n − 1.

   • Prove that for every n ≥ 1,
                              Bn = ((n + 1)/4) · 2^n .
   • The derivation of the recurrence in (4.21) was quite involved. Prove
     this recurrence in a direct way.
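The final formula can be verified by brute force over all 2^n bitstrings. A Python sketch (the regular expression '1+' matches exactly one block; the helper name is illustrative):

```python
import re
from itertools import product

def total_blocks(n):
    # total number of blocks over all bitstrings of length n
    return sum(len(re.findall('1+', ''.join(s)))
               for s in product('01', repeat=n))

for n in range(1, 11):
    # B_n = (n + 1) * 2^n / 4, written without division
    assert 4 * total_blocks(n) == (n + 1) * 2**n
```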

4.44 Let n ≥ 1 be an integer and consider a set S consisting of n elements.
A function f : S → S is called cool, if for all elements x of S,

                               f (f (f (x))) = x.

Let An be the number of cool functions f : S → S.

   • Let f : S → S be a cool function, and let x be an element of S. Prove
     that the set
                               {x, f (x), f (f (x))}
      has size 1 or 3.

   • Let f : S → S be a cool function, and let x and y be two distinct
      elements of S. Assume that f (y) = y. Prove that f (x) ≠ y.

   • Prove that for any integer n ≥ 4,

                         An = An−1 + (n − 1)(n − 2) · An−3 .

      Hint: Let y be a fixed element in S. Some cool functions f have the
      property that f (y) = y, whereas some other cool functions f have the
      property that f (y) 6= y.

4.45 Let S be the set of ordered pairs of integers that is recursively defined
in the following way:

   • (0, 0) ∈ S.

   • If (a, b) ∈ S then (a + 2, b + 3) ∈ S.

   • If (a, b) ∈ S then (a + 3, b + 2) ∈ S.

Prove that for every element (a, b) in S, a + b is divisible by 5.

4.46 Let S be the set of integers that is recursively defined in the following
way:

   • 4 is an element of S.
   • If x and y are elements of S, then x + y² is an element of S.

Prove that every element of S is divisible by 4.

4.47 Let S be the set of ordered triples of integers that is recursively defined
in the following way:

   • (66, 55, 1331) ∈ S.

   • If (a, b, c) ∈ S then (a + 7, b + 5, 14a − 10b + c + 24) ∈ S.

Prove that for every element (a, b, c) in S,

                                  a² − b² = c.

4.48 Let S be the set of integers that is recursively defined in the following
way:

   • 1 is an element of S.
   • If x is an element of S, then x + 2√x + 1 is also an element of S.

Give a simple description of the set S and prove that your answer is correct.

4.49 The set S of bitstrings is recursively defined in the following way:

   • The string 00 is an element of the set S.

   • The string 01 is an element of the set S.

   • The string 10 is an element of the set S.

   • If the string s is an element of the set S, then the string 0s (i.e., the
     string obtained by adding the bit 0 at the front of s) is also an element
     of the set S.

   • If the string s is an element of the set S, then the string 10s (i.e.,
     the string obtained by adding the bits 10 at the front of s) is also an
     element of the set S.

Let s be an arbitrary string in the set S. Prove that s does not contain the
substring 11.
4.50 A binary tree is
   • either one single node
   • or a node whose left subtree is a binary tree and whose right subtree is
     a binary tree.

   (Figure: either a single node, or a node whose left and right subtrees
   are binary trees.)

   Prove that any binary tree with n leaves has exactly 2n − 1 nodes.
4.51 In this exercise, we will denote Boolean variables by lowercase letters,
such as p and q. A proposition is any Boolean formula that can be obtained
by applying the following recursive rules:
  1. For every Boolean variable p, p is a proposition.
  2. If f is a proposition, then ¬f is also a proposition.
  3. If f and g are propositions, then (f ∨ g) is also a proposition.
  4. If f and g are propositions, then (f ∧ g) is also a proposition.
   • Let p and q be Boolean variables. Prove that
                                ¬ ((p ∧ ¬q) ∨ (¬p ∨ q))
      is a proposition.
   • Let ↑ denote the not-and operator. In other words, if f and g are
     Boolean formulas, then (f ↑ g) is the Boolean formula that has the
     following truth table (0 stands for false, and 1 stands for true):

                                    f   g    (f ↑ g)
                                    0   0       1
                                    0   1       1
                                    1   0       1
                                    1   1       0
         – Let p be a Boolean variable. Use a truth table to prove that the
           Boolean formulas (p ↑ p) and ¬p are equivalent.
         – Let p and q be Boolean variables. Use a truth table to prove that
           the Boolean formulas ((p ↑ p) ↑ (q ↑ q)) and p ∨ q are equivalent.
         – Let p and q be Boolean variables. Express the Boolean formula
           (p ∧ q) as an equivalent Boolean formula that only uses the ↑-
           operator. Use a truth table to justify your answer.

   • Prove that any proposition can be written as an equivalent Boolean
     formula that only uses the ↑-operator.

4.52 In Section 4.4, we have seen the recursive algorithm gossip(n), which
computes a sequence of phone calls for the persons P1 , P2 , . . . , Pn . The base
case for this algorithm was when n = 4. Assume we change the base case to
n = 2: In this new base case, there are only two people P1 and P2 , and only
one phone call is needed. The rest of the algorithm remains unchanged.
   Prove that the modified algorithm gossip(n) results in a sequence of
2n − 3 phone calls for any integer n ≥ 2. (Thus, for n ≥ 4, it makes one
more phone call than the algorithm in Section 4.4.)

4.53 In Section 4.4, we have seen the recursive algorithm gossip(n), which
computes a sequence of phone calls for the persons P1 , P2 , . . . , Pn , for any
integer n ≥ 4.
    Give an iterative (i.e., non-recursive) version of this algorithm in pseu-
docode. Your algorithm must produce exactly the same sequence of phone
calls as algorithm gossip(n).

4.54 In Section 4.5, we have seen algorithm Euclid(a, b), which takes as
input two integers a and b with a ≥ b ≥ 1, and returns their greatest common
divisor.
   Assume we run algorithm Euclid(a, b) with two input integers a and b
that satisfy b > a ≥ 1. What is the output of this algorithm?

4.55 The following recursive algorithm Fib takes as input an integer n ≥ 0
and returns the n-th Fibonacci number fn :
      Algorithm Fib(n):

          if n = 0 or n = 1
          then f = n
          else f = Fib(n − 1) + Fib(n − 2)
          endif;
          return f

    Let an be the number of additions made by algorithm Fib(n), i.e., the
total number of times the +-function in the else-case is called. Prove that
for all n ≥ 0,
                              an = fn+1 − 1.
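Before proving the claim by induction, one can instrument the algorithm and count the additions directly. A Python sketch (the instrumented helper returns both the Fibonacci number and the number of additions performed):

```python
def fib_count(n):
    # returns (f_n, number of additions made by Fib(n))
    if n <= 1:
        return n, 0
    f1, a1 = fib_count(n - 1)
    f2, a2 = fib_count(n - 2)
    return f1 + f2, a1 + a2 + 1      # one addition in the else-case

def fib(n):
    # iterative Fibonacci, used only to state the claimed formula
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

for n in range(18):
    assert fib_count(n)[1] == fib(n + 1) - 1
```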


4.56 Consider the following recursive algorithm Beer(n), which takes as
input an integer n ≥ 1:

      Algorithm Beer(n):

          if n = 1
          then eat some peanuts
          else choose an arbitrary integer m with 1 ≤ m ≤ n − 1;
               Beer(m);
               drink one pint of beer;
               Beer(n − m)
          endif




   • Explain why, for any integer n ≥ 1, algorithm Beer(n) terminates.


   • Let B(n) be the number of pints of beer you drink when running algo-
     rithm Beer(n). Determine the value of B(n).
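A simulation can help with forming a conjecture. The sketch below (Python) mimics the algorithm and returns the number of pints drunk; running it several times for the same n suggests that the result does not depend on the arbitrary choices of m:

```python
import random

def beer(n):
    # pints of beer drunk by one run of Beer(n)
    if n == 1:
        return 0                      # base case: only peanuts are eaten
    m = random.randint(1, n - 1)      # an arbitrary choice of m
    return beer(m) + 1 + beer(n - m)

# the count is the same on every run, whichever m is chosen each time
for n in range(1, 15):
    runs = {beer(n) for _ in range(10)}
    assert len(runs) == 1
```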


4.57 Consider the following recursive algorithm Silly, which takes as input
an integer n ≥ 1 which is a power of 2:
       Algorithm Silly(n):

           if n = 1
           then drink one pint of beer
           else if n = 2
                then fart once
                else fart once;
                     Silly(n/2);
                     fart once
                endif
           endif

   For n a power of 2, let F (n) be the number of times you fart when running
algorithm Silly(n). Determine the value of F (n).

4.58 In the fall term of 2015, Nick took the course COMP 2804 at Carleton
University. Nick was always sitting in the back of the classroom and spent
most of his time eating bananas. Nick uses the following scheme to buy
bananas:

   • At the start of week 0, there are 2 bananas in Nick’s fridge.

   • For any integer n ≥ 0, Nick does the following during week n:

          – At the start of week n, Nick determines the number of bananas in
            his fridge and stores this number in a variable x.
          – Nick goes to Jim’s Banana Empire, buys x bananas, and puts
            them in his fridge.
          – Nick takes n + 1 bananas out of his fridge and eats them during
            week n.

For any integer n ≥ 0, let B(n) be the number of bananas in Nick’s fridge at
the start of week n. Determine the value of B(n).

4.59 Jennifer loves to drink India Pale Ale (IPA). After a week of hard work,
Jennifer goes to the pub and runs the following recursive algorithm, which
takes as input an integer n ≥ 1, which is a power of 4:
      Algorithm JenniferDrinksIPA(n):

          if n = 1
          then place one order of chicken wings
          else for k = 1 to 4
               do JenniferDrinksIPA(n/4);
                   drink n pints of IPA
               endfor
          endif

   For n a power of 4, let

   • P (n) be the number of pints of IPA that Jennifer drinks when running
     algorithm JenniferDrinksIPA(n),

   • C(n) be the number of orders of chicken wings that Jennifer places
     when running algorithm JenniferDrinksIPA(n).

   Determine the values of P (n) and C(n).

4.60 Elisa Kazan loves to drink cider. During the weekend, Elisa goes to
the pub and runs the following recursive algorithm, which takes as input an
integer n ≥ 0:
      Algorithm ElisaDrinksCider(n):

          if n = 0
          then order Fibonachos
          else if n is even
               then ElisaDrinksCider(n/2);
                       drink n²/2 pints of cider;
                      ElisaDrinksCider(n/2)
               else for i = 1 to 4
                     do ElisaDrinksCider((n − 1)/2);
                         drink (n − 1)/2 pints of cider
                     endfor;
                     drink 1 pint of cider
               endif
          endif
   For n ≥ 0, let C(n) be the number of pints of cider that Elisa drinks when
running algorithm ElisaDrinksCider(n). Determine the value of C(n).




4.61 Elisa Kazan loves to drink cider. After a week of bossing the Vice-
Presidents around, Elisa goes to the pub and runs the following recursive
algorithm, which takes as input an integer n ≥ 0:

       Algorithm ElisaGoesToThePub(n):

           if n = 0
           then drink one bottle of cider
           else for k = 0 to n − 1
                do ElisaGoesToThePub(k);
                    drink one bottle of cider
                endfor
           endif


  For n ≥ 0, let C(n) be the number of bottles of cider that Elisa drinks
when running algorithm ElisaGoesToThePub(n).
   Prove that for every integer n ≥ 1,


                              C(n) = 3 · 2^(n−1) − 1.



Hint: 1 + 2 + 2^2 + 2^3 + · · · + 2^(n−2) = 2^(n−1) − 1.




4.62 Elisa Kazan loves to drink cider. On Saturday night, Elisa goes to her
neighborhood pub and runs the following recursive algorithm, which takes
as input an integer n ≥ 1:
      Algorithm ElisaDrinksCider(n):

            if n = 1
            then drink one pint of cider
            else if n is even
                 then ElisaDrinksCider(n/2);
                        drink one pint of cider;
                        ElisaDrinksCider(n/2)
                 else drink one pint of cider;
                       ElisaDrinksCider(n − 1);
                       drink one pint of cider
                 endif
            endif

    For any integer n ≥ 1, let P (n) be the number of pints of cider that
Elisa drinks when running algorithm ElisaDrinksCider(n). Determine
the value of P (n).

4.63 Let n ≥ 2 be an integer and consider a sequence s1 , s2 , . . . , sn of n
pairwise distinct numbers. The following algorithm computes the smallest
and largest elements in this sequence:
      Algorithm MinMax(s1 , s2 , . . . , sn ):

            min = s1 ;
            max = s1 ;
            for i = 2 to n
            do if si < min          (1)
                then min = si
                endif;
                if si > max          (2)
                then max = si
                endif
            endfor;
            return (min, max )

   This algorithm makes comparisons between input elements in lines (1)
and (2). Determine the total number of comparisons as a function of n.
4.64 Let n ≥ 2 be a power of 2 and consider a sequence S of n pairwise
distinct numbers. The following algorithm computes the smallest and largest
elements in this sequence:
       Algorithm FastMinMax(S, n):

           if n = 2
           then let x and y be the two elements in S;
                 if x < y           (1)
                 then min = x;
                         max = y
                 else min = y;
                       max = x
                 endif
           else divide S into two subsequences S1 and S2 , both of size n/2;
                (min 1 , max 1 ) = FastMinMax(S1 , n/2);
                (min 2 , max 2 ) = FastMinMax(S2 , n/2);
                if min 1 < min 2        (2)
                then min = min 1
                else min = min 2
                endif;
                if max 1 < max 2         (3)
                then max = max 2
                else max = max 1
                endif
           endif;
           return (min, max )


    This algorithm makes comparisons between input elements in lines (1),
(2), and (3). Let C(n) be the total number of comparisons made by algorithm
FastMinMax on an input sequence of length n.
   • Derive a recurrence relation for C(n).
   • Use this recurrence relation to prove that C(n) = (3/2)n − 2 for each n ≥ 2
     that is a power of 2.
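A direct implementation with a comparison counter confirms the formula. The Python sketch below mirrors FastMinMax and counts only the comparisons made in lines (1), (2), and (3):

```python
def fast_min_max(S):
    # returns (minimum, maximum, number of element comparisons)
    n = len(S)
    if n == 2:
        x, y = S
        return (x, y, 1) if x < y else (y, x, 1)    # line (1)
    half = n // 2
    min1, max1, c1 = fast_min_max(S[:half])
    min2, max2, c2 = fast_min_max(S[half:])
    # lines (2) and (3): one comparison each
    return min(min1, min2), max(max1, max2), c1 + c2 + 2

for k in range(1, 8):
    n = 2**k
    lo, hi, c = fast_min_max(list(range(n)))
    assert (lo, hi) == (0, n - 1)
    assert 2 * c == 3 * n - 4        # i.e., C(n) = (3/2)n - 2
```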
4.65 Consider the following recursive algorithm, which takes as input a se-
quence (a1 , a2 , . . . , an ) of length n, where n ≥ 1:
      Algorithm Mystery(a1 , a2 , . . . , an ):

            if n = 1
            then return the sequence (a1 )
            else (b1 , b2 , . . . , bn−1 ) = Mystery(a1 , a2 , . . . , an−1 );
                 return the sequence (an , b1 , b2 , . . . , bn−1 )
            endif


   • Express the output of algorithm Mystery(a1 , a2 , . . . , an ) in terms of
     the input sequence (a1 , a2 , . . . , an ).
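The pseudocode translates directly into Python (only the name Mystery comes from the exercise), which makes it easy to run the algorithm on a few inputs before answering:

```python
def mystery(a):
    """Direct transcription of algorithm Mystery from exercise 4.65."""
    if len(a) == 1:
        return [a[0]]
    b = mystery(a[:-1])          # recurse on (a1, ..., a_{n-1})
    return [a[-1]] + b           # prepend a_n to the recursive result

print(mystery([1, 2, 3, 4]))
```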

4.66 Consider the following recursive algorithm, which takes as input a se-
quence (a1 , a2 , . . . , an ) of n numbers, where n is a power of two, i.e., n = 2^k
for some integer k ≥ 0:
      Algorithm Mystery(a1 , a2 , . . . , an ):

            if n = 1
            then return a1
            else for i = 1 to n/2
                 do bi = min(a2i−1 , a2i )               (∗)
                 endfor;
                 Mystery(b1 , b2 , . . . , bn/2 )
            endif


   • Express the output of algorithm Mystery(a1 , a2 , . . . , an ) in terms of
     the input sequence (a1 , a2 , . . . , an ).
   • For any integer n ≥ 1 that is a power of two, let T (n) be the to-
     tal number of times that line (∗) is executed when running algorithm
     Mystery(a1 , a2 , . . . , an ). Derive a recurrence for T (n) and use it to
     prove that for any integer n ≥ 1 that is a power of two,

                                          T (n) = n − 1.
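Again, a direct transcription can help. Note that the pseudocode does not explicitly return the result of the recursive call; the Python sketch below (names other than Mystery are not from the text) assumes that it is returned, and counts executions of line (∗):

```python
def mystery(a, counter):
    """Transcription of the second Mystery algorithm; counter[0] counts line (*)."""
    n = len(a)
    if n == 1:
        return a[0]
    b = []
    for i in range(n // 2):
        b.append(min(a[2 * i], a[2 * i + 1]))   # line (*), with 0-based indices
        counter[0] += 1
    return mystery(b, counter)

counter = [0]
result = mystery([5, 3, 8, 1, 9, 2, 7, 6], counter)
print(result, counter[0])   # counter[0] equals 4 + 2 + 1 = 7 for n = 8
```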

4.67 Let k be a positive integer and let n = 2^k . You are given an n × n board
Bn , all of whose (square) cells are white, except for one, which is black. (The
left part of the figure below gives an example where k = 3 and n = 8.)


    A tromino is an L-shaped object consisting of three 1 × 1 cells. Each
tromino can appear in four different orientations; see the right part of the
figure below.

[Figure: an 8 × 8 board with one black cell (left) and the four orientations
of a tromino (right).]

   A tiling of the board Bn is a placement of trominoes on the board such
that

   • the trominoes cover exactly all white cells (thus, the black cell is not
     covered by any tromino) and

   • no two trominoes overlap.

   Here is a tiling of the board given above:

[Figure: a tromino tiling of the board shown above.]

   Describe a recursive algorithm that

   • takes as input a board Bn having exactly one black cell (which can be
     anywhere on the board) and

   • returns a tiling of this board.

Hint: Look at the following figure:

[Figure: hint for the recursive tromino tiling.]

4.68 Let n ≥ 1 be an integer and consider a set S consisting of n points
in R2 . Each point p of S is given by its x- and y-coordinates px and py ,
respectively. We assume that no two points of S have the same x-coordinate
and no two points of S have the same y-coordinate.
   A point p of S is called maximal in S if there is no point in S that is to
the north-east of p, i.e.,

                     {q ∈ S : qx > px and qy > py } = ∅.

The figure below shows an example, in which the •-points are maximal and
the ×-points are not maximal. Observe that, in general, there is more than
one maximal element in S.
[Figure: a point set in which the •-points are maximal and the ×-points are
not.]

   Describe a recursive algorithm MaxElem that has the same structure as
algorithm MergeSort in Section 4.6 and has the following specification:
      Algorithm MaxElem(S, n):
      Input: A set S of n points in R2 , in sorted order of their x-
      coordinates.
      Output: All maximal elements of S, in sorted order of their x-
      coordinates.

   The running time of your algorithm must be O(n log n).
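A useful companion when developing MaxElem is a brute-force reference that follows the definition literally; this O(n²) Python sketch (names are not from the text) can serve as a correctness check for a faster implementation:

```python
def maximal_brute_force(points):
    """Return the maximal points of a list of (x, y) pairs, in x-sorted order.

    Assumes all x-coordinates are distinct and all y-coordinates are distinct.
    """
    return [p for p in points
            if not any(q[0] > p[0] and q[1] > p[1] for q in points)]

pts = [(1, 7), (2, 3), (3, 6), (4, 2), (5, 5), (6, 1), (7, 4)]
print(maximal_brute_force(pts))
```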

4.69 The Hadamard matrices H0 , H1 , H2 , . . . are recursively defined as fol-
lows:
                             H0 = (1)
and for k ≥ 1,

                         Hk = [ Hk−1    Hk−1 ]
                              [ Hk−1   −Hk−1 ] .


Thus, H0 is a 1 × 1 matrix whose only entry is 1,

                         H1 = [ 1   1 ]
                              [ 1  −1 ] ,
and
                         H2 = [ 1   1   1   1 ]
                              [ 1  −1   1  −1 ]
                              [ 1   1  −1  −1 ]
                              [ 1  −1  −1   1 ] .
Observe that Hk has 2^k rows and 2^k columns.
    If x is a column vector of length 2^k , then Hk x is the column vector of
length 2^k obtained by multiplying the matrix Hk with the vector x.
   Describe a recursive algorithm Mult that has the following specification:
       Algorithm Mult(k, x):
       Input: An integer k ≥ 0 and a column vector x of length n = 2^k .
       Output: The column vector Hk x (having length n).

    The running time T (n) of your algorithm must be O(n log n).
Hint: The input only consists of k and x. The matrix Hk , which has n² en-
tries, is not given as part of the input. Since you are aiming for an O(n log n)–
time algorithm, you cannot compute all entries of the matrix Hk .
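When testing a candidate Mult, it helps to have the matrix itself for small k. The sketch below (names are not from the text) builds Hk explicitly from the recursive definition and multiplies naively; at Θ(n²) time it does not meet the required bound, but it produces correct reference output:

```python
def hadamard(k):
    """Build H_k explicitly from the recursive definition (quadratic size)."""
    if k == 0:
        return [[1]]
    h = hadamard(k - 1)
    return ([row + row for row in h] +               # [ H_{k-1}   H_{k-1} ]
            [row + [-x for x in row] for row in h])  # [ H_{k-1}  -H_{k-1} ]

def mult_naive(k, x):
    """Reference H_k x by explicit matrix-vector multiplication."""
    return [sum(hij * xj for hij, xj in zip(row, x)) for row in hadamard(k)]

assert hadamard(1) == [[1, 1], [1, -1]]
assert mult_naive(2, [1, 0, 0, 0]) == [1, 1, 1, 1]   # first column of H_2
```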
4.70 Let m ≥ 1 and n ≥ 1 be integers and consider an m × n matrix A. The
rows of this matrix are numbered 1, 2, . . . , m, and its columns are numbered
1, 2, . . . , n. Each entry of A stores one number and, for each row, all numbers
in this row are pairwise distinct. For each i = 1, 2, . . . , m, define
g(i) = the position (i.e., column number) of the smallest number in row i.
We say that the matrix A is awesome, if
                       g(1) ≤ g(2) ≤ g(3) ≤ . . . ≤ g(m).
In the matrix below, the smallest number in each row is in boldface. For this
example, we have m = 4, n = 10, g(1) = 3, g(2) = 3, g(3) = 5, and g(4) = 8.
Thus, this matrix is awesome.

              [ 13 12  5  8  6  9 15 20 19  7 ]
          A = [  3  4  1 17  6 13  7 10  2  5 ]
              [ 19  5 12  7  2  4 11 13  6  3 ]
              [  7  4 17 10  5 14 12  3 20  6 ] .


From now on, we assume that the m × n matrix A is awesome.

   • Let i be an integer with 1 ≤ i ≤ m. Describe an algorithm that
     computes g(i) in O(n) time.

   • Describe an algorithm that computes all values g(1), g(2), . . . , g(m) in
     O(mn) total time.
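The first two bullets admit a straightforward scan; the following Python sketch (only the name g comes from the text) computes g for one row in O(n) time, hence all of g(1), . . . , g(m) in O(mn) time, and checks the example matrix:

```python
def g(row):
    """1-based column of the smallest number in a row (numbers are distinct)."""
    return min(range(len(row)), key=lambda j: row[j]) + 1

A = [
    [13, 12,  5,  8,  6,  9, 15, 20, 19,  7],
    [ 3,  4,  1, 17,  6, 13,  7, 10,  2,  5],
    [19,  5, 12,  7,  2,  4, 11, 13,  6,  3],
    [ 7,  4, 17, 10,  5, 14, 12,  3, 20,  6],
]
# One linear scan per row: O(mn) total time over all m rows.
assert [g(row) for row in A] == [3, 3, 5, 8]   # matches the example above
```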

In the rest of this exercise, you will show that all values g(1), g(2), . . . , g(m)
can be computed in O(m + n log m) total time.

   • Assume that m is even and assume that you are given the values

                            g(2), g(4), g(6), g(8), . . . , g(m).

      Describe an algorithm that computes the values

                          g(1), g(3), g(5), g(7), . . . , g(m − 1)

      in O(m + n) total time.

   • Assume that m = 2^k , i.e., m is a power of two. Describe a recursive
     algorithm FindRowMinima that has the following specification:

       Algorithm FindRowMinima(A, i):
       Input: An m × n awesome matrix A and an integer i with
       0 ≤ i ≤ k.
        Output: The values g(j · m/2^i ) for j = 1, 2, 3, . . . , 2^i .


      For each i with 0 ≤ i ≤ k, let T (i) denote the running time of algorithm
      FindRowMinima(A, i). The running time of your algorithm must
      satisfy the recurrence

                   T (0) = O(n),
                   T (i) = T (i − 1) + O(2^i + n), if 1 ≤ i ≤ k.


   • Assume again that m = 2^k . Prove that all values g(1), g(2), . . . , g(m)
     can be computed in O(m + n log m) total time.
       Hint: 1 + 2 + 2² + 2³ + · · · + 2^k ≤ 2m.


4.71 Prove, for example by induction, that for n ≥ 1,

                       1 + 2 + 3 + · · · + n = n(n + 1)/2,

and

                  1² + 2² + 3² + · · · + n² = n(n + 1)(2n + 1)/6.
4.72 Assume you remember that

                             1² + 2² + 3² + · · · + n²

is equal to a polynomial of degree three, i.e.,

               1² + 2² + 3² + · · · + n² = An³ + Bn² + Cn + D,

but you have forgotten the values of A, B, C, and D. How can you determine
these four values?
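One possible answer, sketched below with exact rational arithmetic (the helper solve is not from the text): substituting n = 1, 2, 3, 4 yields four linear equations in the unknowns A, B, C, D, which Gaussian elimination solves.

```python
from fractions import Fraction

def solve(M, v):
    """Solve the square system M z = v by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] + [Fraction(y)] for row, y in zip(M, v)]
    size = len(m)
    for i in range(size):
        p = next(r for r in range(i, size) if m[r][i] != 0)   # find a pivot row
        m[i], m[p] = m[p], m[i]
        m[i] = [x / m[i][i] for x in m[i]]                    # scale pivot to 1
        for r in range(size):
            if r != i:
                m[r] = [a - m[r][i] * b for a, b in zip(m[r], m[i])]
    return [row[-1] for row in m]

# Equation for each n in 1..4: A*n^3 + B*n^2 + C*n + D = 1^2 + ... + n^2.
M = [[n ** 3, n ** 2, n, 1] for n in range(1, 5)]
v = [sum(j * j for j in range(1, n + 1)) for n in range(1, 5)]
A, B, C, D = solve(M, v)
print(A, B, C, D)
```

Expanding n(n + 1)(2n + 1)/6 from Exercise 4.71 confirms the computed coefficients.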

4.73 In Section 4.8.3, we have shown that
                 Σ_{k=2}^{n−2} (k − 1)(n − k − 1) = C(n − 1, 3).

Use Exercise 4.71 to give an alternative proof.

4.74 In Section 4.8.4, we have used the fact that
                         Σ_{k=1}^{n−1} k = C(n, 2),

which follows from Theorem 2.2.10. Give an alternative proof that uses the
approach that we used to prove the identity in (4.17).

4.75 In Section 4.8.4, we have shown that
                       Σ_{k=3}^{n−1} C(k, 3) = C(n, 4).

Use induction and Pascal’s Identity (see Theorem 3.7.2) to give an alternative
proof.


4.76 Consider the numbers Rn that we defined in Section 4.8. The n points
on the circle define C(n, 2) line segments, one segment for each pair of points.
Let X be the total number of intersections among these C(n, 2) line segments.

   • Prove that
                            Rn = 1 + C(n, 2) + X.

      Hint: Start with only the circle and the n points. Then add the C(n, 2)
      line segments one by one.

   • Prove that
                                X = C(n, 4).

4.77 For an integer n ≥ 1, draw n straight lines, such that no two of them
are parallel and no three of them intersect in one single point. These lines
divide the plane into regions (some of which are bounded and some of which
are unbounded). Denote the number of these regions by Cn .
    Derive a recurrence relation for the numbers Cn and use it to prove that
for n ≥ 1,
                            Cn = 1 + n(n + 1)/2.

4.78 Let n ≥ 1 be an integer. Consider 2n straight lines ℓ1 , ℓ′1 , . . . , ℓn , ℓ′n
such that

   • for each i with 1 ≤ i ≤ n, ℓi and ℓ′i are parallel,

   • no two of the lines ℓ1 , . . . , ℓn are parallel,

   • no two of the lines ℓ′1 , . . . , ℓ′n are parallel,

   • no three of the 2n lines intersect in one single point.

These lines divide the plane into regions (some of which are bounded and
some of which are unbounded). Denote the number of these regions by Rn .
From the figure below, you can see that R1 = 3, R2 = 9, and R3 = 19.



[Figure: the line arrangements for n = 1, 2, 3, showing R1 = 3, R2 = 9, and
R3 = 19.]

   • Derive a recurrence relation for the numbers Rn and use it to prove
     that Rn = 2n2 + 1 for n ≥ 1.

4.79 Let m ≥ 1 and n ≥ 1 be integers. Consider m horizontal lines and n
non-horizontal lines such that

   • no two of the non-horizontal lines are parallel,

   • no three of the m + n lines intersect in one single point.

These lines divide the plane into regions (some of which are bounded and
some of which are unbounded). Denote the number of these regions by Rm,n .
From the figure below, you can see that R4,3 = 23.


   • Derive a recurrence relation for the numbers Rm,n and use it to prove
     that
                     Rm,n = 1 + m(n + 1) + C(n + 1, 2).

4.80 A line is called slanted if it is neither horizontal nor vertical. Let k ≥ 1,
m ≥ 1, and n ≥ 0 be integers. Consider k horizontal lines, m vertical lines,
and n slanted lines, such that

   • no two of the slanted lines are parallel,

   • no three of the k + m + n lines intersect in one single point.

These lines divide the plane into regions (some of which are bounded and
some of which are unbounded). Denote the number of these regions by Rk,m,n .
From the figure below, you can see that R4,2,2 = 30.




   • Prove that
                               Rk,m,0 = (k + 1)(m + 1).

   • Derive a recurrence relation for the numbers Rk,m,n and use it to prove
     that
              Rk,m,n = (k + 1)(m + 1) + (k + m)n + C(n + 1, 2).

4.81 The sequence SF 0 , SF 1 , SF 2 , . . . of snowflakes is recursively defined in
the following way:

   • The snowflake SF 0 is an equilateral triangle with edges of length 1.

   • For any integer n ≥ 1, the snowflake SF n is obtained by taking the
     snowflake SF n−1 and doing the following for each of its edges:


        – Divide this edge into three edges of equal length.
        – Draw an equilateral triangle that has the middle edge from the
          previous step as its base, and that is outside of SF n−1 .
        – Remove the edge that is the base of the equilateral triangle from
          the previous step.

In the figure below, you see the snowflakes SF 0 up to SF 5 .




   • For any integer n ≥ 0, let Nn be the total number of edges of SF n .
     Determine the value of Nn , by deriving a recurrence relation and solving
     it.

   • For any integer n ≥ 0, let `n be the length of one single edge of SF n .
     Determine the value of `n , by deriving a recurrence relation and solving
     it.

   • For any integer n ≥ 0, let Ln be the total length of all edges of SF n .
     Prove that
                              Ln = 3 · (4/3)^n .
   • Let a0 be the area of the triangle SF 0 . For any integer n ≥ 1, let
     an be the area of one single triangle that is added when constructing


      SF n from SF n−1 . Determine the value of an , by deriving a recurrence
      relation and solving it.

   • For any integer n ≥ 1, let An be the total area of all triangles that are
     added when constructing SF n from SF n−1 . Prove that
                            An = (3/4) · (4/9)^n · a0 .

   • Let n ≥ 0 be an integer. Prove that the total area of SF n is equal to

                          (a0 /5) · (8 − 3 · (4/9)^n ) .

      Hint: For any real number x ≠ 1,

                      Σ_{k=1}^{n} x^k = x · (1 − x^n )/(1 − x).
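The recurrences asked for in the first three bullets can be sanity-checked numerically: each construction step replaces every edge by four edges of one third the length. The following sketch (variable names are not from the text) iterates this with exact fractions and confirms the claimed formula for Ln:

```python
from fractions import Fraction

def snowflake(n):
    """Return (N_n, l_n, L_n): edge count, edge length, total edge length of SF_n."""
    N, l = 3, Fraction(1)        # SF_0 is an equilateral triangle with edges of length 1
    for _ in range(n):
        N, l = 4 * N, l / 3      # each edge becomes 4 edges of 1/3 the length
    return N, l, N * l

for n in range(8):
    assert snowflake(n)[2] == 3 * Fraction(4, 3) ** n   # L_n = 3 * (4/3)^n
```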
Chapter 5

Discrete Probability

We all have some intuitive understanding of the notions of “chance” and
“probability”. When buying a lottery ticket, we know that there is a chance
of winning the jackpot, but we also know that this chance is very small.
Before leaving home in the morning, we check the weather forecast and see
that, with probability 80%, we get 15 centimetres of snow in Ottawa. In
this chapter, we will give a formal definition of this notion of “probability”.
We start by presenting a surprising application of probability and random
numbers.


5.1       Anonymous Broadcasting
Consider a group of n people P1 , P2 , . . . , Pn , for some integer n ≥ 3. One
person in this group, say Pk , would like to broadcast, anonymously, a message
to all other people in the group. That is, Pk wants to broadcast a message
such that

   • everybody in the group receives the message,

   • nobody knows that the message was broadcast by Pk .

In other words, when Pi (with i ≠ k) receives the message, she only knows
that it was broadcast by one of P1 , . . . , Pi−1 , Pi+1 , . . . , Pn ; she cannot deter-
mine who broadcast the message.
   At first sight, it seems to be impossible to do this. In 1988, however,
David Chaum published, in the Journal of Cryptology, a surprisingly simple


protocol that does achieve this. Chaum referred to the problem as the Dining
Cryptographers Problem.
   We will present and analyze the protocol for the case when n = 3. Thus,
there are three people P1 , P2 , and P3 . We assume that exactly one of them
broadcasts a message and refer to this person as the broadcaster. We also
assume that the message is a bitstring. The broadcaster will announce the
message one bit at a time.
   The three people P1 , P2 , and P3 sit at a table, in clockwise order of their
indices. Let b be the current bit that the broadcaster wants to announce.
The protocol for broadcasting this bit is as follows:
Step 1: Each person Pi generates a random bit bi , for example, by flipping
a coin. Thus, with 50% probability, bi = 0 and with 50% probability, bi = 1.
Step 2: Each person Pi shows the bit bi to her clockwise neighbor.
[Figure: P1 , P2 , and P3 seated at a table in clockwise order; each bit bi is
shown by Pi to her clockwise neighbor.]

At the end of this second step,
   • P1 knows b1 and b3 , but not b2 ,

   • P2 knows b1 and b2 , but not b3 ,

   • P3 knows b2 and b3 , but not b1 .
Step 3: Each person Pi computes the sum si (modulo 2) of the bits that she
knows. Thus,
   • P1 computes s1 = (b1 + b3 ) mod 2,

   • P2 computes s2 = (b1 + b2 ) mod 2,

   • P3 computes s3 = (b2 + b3 ) mod 2.
Step 4: Each person Pi does the following:


   • If Pi is not the broadcaster, she sets ti = si .

   • If Pi is the broadcaster, she sets ti = (si + b) mod 2. Recall that b is the
     current bit that the broadcaster wants to announce. (Thus, if b = 1,
     then Pi “secretly” flips the bit si and stores the result in ti .)

Step 5: Each person Pi shows her bit ti to the other two people.
Step 6: Each person Pi computes the sum (modulo 2) of the three bits t1 ,
t2 , and t3 , i.e., the value (t1 + t2 + t3 ) mod 2.
   This concludes the description of the protocol for broadcasting one bit b.
Observe that for any bit x, we have (x + x) mod 2 = 0. Therefore, the bit
computed in the last step is equal to

             t1 + t2 + t3 =   s1 + s2 + s3 + b
                          =   (b1 + b3 ) + (b1 + b2 ) + (b2 + b3 ) + b
                          =   (b1 + b1 ) + (b2 + b2 ) + (b3 + b3 ) + b
                          =   b,

where all arithmetic is done modulo 2. In other words, the bit computed in
the last step is equal to the bit that the broadcaster wants to announce. This
shows that each person in the group receives this bit.
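The six steps can also be simulated directly. The sketch below (a Python rendering of the protocol; variable names follow the text) confirms that, whoever the broadcaster is and whatever the random bits are, the value computed in Step 6 equals b:

```python
import random

def run_protocol(broadcaster, b):
    """One round of the protocol; broadcaster is 0, 1, or 2, and b is the bit."""
    b1, b2, b3 = (random.randint(0, 1) for _ in range(3))   # Step 1: random bits
    # Step 3: each P_i adds, modulo 2, the two bits she knows.
    s = [(b1 + b3) % 2, (b1 + b2) % 2, (b2 + b3) % 2]
    # Step 4: only the broadcaster adds the message bit b.
    t = [(s[i] + b) % 2 if i == broadcaster else s[i] for i in range(3)]
    # Step 6: everyone sums the announced t-bits modulo 2.
    return (t[0] + t[1] + t[2]) % 2

for broadcaster in range(3):
    for b in (0, 1):
        assert all(run_protocol(broadcaster, b) == b for _ in range(100))
```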
   It remains to show that a non-broadcaster cannot determine who broad-
cast the bit. In the analysis below, we assume that

   • b = 1, i.e., the broadcaster announces the bit 1,

   • P2 is not the broadcaster.

We have to show that P2 cannot determine whether P1 or P3 is the broad-
caster. Note that P2 knows the values

                                b1 , b2 , t1 , t2 , t3 ,

but does not know the bit b3 . We consider the cases when b1 = b2 and b1 ≠ b2
separately.
Case 1: b1 = b2 . This case has two subcases depending on the value of b3 .
Case 1.1: b3 = b1 ; thus, all three b-bits are equal.


   • If P1 is the broadcaster, then (all arithmetic is done modulo 2)

                            t1 = s1 + 1 = b1 + b3 + 1 = 1

      and
                                t3 = s3 = b2 + b3 = 0.

   • If P3 is the broadcaster, then

                                t1 = s1 = b1 + b3 = 0

      and
                            t3 = s3 + 1 = b2 + b3 + 1 = 1.

Thus, the broadcaster is the person whose t-bit is equal to 1.
Case 1.2: b3 ≠ b1 and, thus, b3 ≠ b2 .

   • If P1 is the broadcaster, then

                            t1 = s1 + 1 = b1 + b3 + 1 = 0

      and
                                t3 = s3 = b2 + b3 = 1.

   • If P3 is the broadcaster, then

                                t1 = s1 = b1 + b3 = 1

      and
                            t3 = s3 + 1 = b2 + b3 + 1 = 0.

Thus, the broadcaster is the person whose t-bit is equal to 0.
   Since P2 knows b1 and b2 , she knows when Case 1 occurs. Since P2 does
not know b3 , however, she cannot determine whether Case 1.1 or 1.2 occurs.
As a result, P2 cannot determine whether P1 or P3 is the broadcaster.
Case 2: b1 ≠ b2 . This case has two subcases depending on the value of b3 .
Case 2.1: b3 = b1 and, thus, b3 ≠ b2 .


   • If P1 is the broadcaster, then

                           t1 = s1 + 1 = b1 + b3 + 1 = 1

       and
                               t3 = s3 = b2 + b3 = 1.

   • If P3 is the broadcaster, then

                               t1 = s1 = b1 + b3 = 0

       and
                           t3 = s3 + 1 = b2 + b3 + 1 = 0.

Thus, t1 is always equal to t3 , no matter whether P1 or P3 is the broadcaster.
Case 2.2: b3 ≠ b1 and, thus, b3 = b2 .
   • If P1 is the broadcaster, then

                           t1 = s1 + 1 = b1 + b3 + 1 = 0

       and
                               t3 = s3 = b2 + b3 = 0.

   • If P3 is the broadcaster, then

                               t1 = s1 = b1 + b3 = 1

       and
                           t3 = s3 + 1 = b2 + b3 + 1 = 1.

Thus, t1 is always equal to t3 , no matter whether P1 or P3 is the broadcaster.
   Since P2 knows b1 and b2 , she knows when Case 2 occurs. Since P2 does
not know b3 , however, she cannot determine whether Case 2.1 or 2.2 occurs.
As in Case 1, P2 cannot determine whether P1 or P3 is the broadcaster.
    We conclude from Cases 1 and 2 that the broadcasting of the bit b = 1 is
indeed anonymous. Now consider the case when the bit b to be announced
is equal to 0. It follows from the protocol that in this case, there is no
“secret bit flipping” done in Step 4 and all three people use the same rules
to compute the s-values and the t-values. In this case, t1 = s1 , t2 = s2 , and


t3 = s3 , and P2 can determine the bit b3 . She cannot, however, determine
whether P1 or P3 is the broadcaster.
    To conclude this section, we remark that for each bit to be announced,
the entire protocol must be followed. That is, in each round of the protocol,
one bit is broadcast and each person Pi must flip a coin to determine the bit
bi that is used in this round. We also remark that the protocol works only if
exactly one person is the broadcaster.


5.2     Probability Spaces
In this section, we give a formal definition of the notion of “probability” in
terms of sets and functions.

Definition 5.2.1 A sample space S is a non-empty countable set. Each
element of S is called an outcome and each subset of S is called an event.

    In daily life, we express probabilities in terms of percentages. For exam-
ple, the weather forecast may tell us that, with 80% probability, we will be
getting a snowstorm today. In probability theory, probabilities are expressed
in terms of numbers in the interval [0, 1]. A probability of 80% becomes a
probability of 0.8.

Definition 5.2.2 Let S be a sample space. A probability function on S is a
function Pr : S → R such that

   • for all ω ∈ S, 0 ≤ Pr(ω) ≤ 1, and

   • Σ_{ω∈S} Pr(ω) = 1.


   For any outcome ω in the sample space S, we will refer to Pr(ω) as the
probability that the outcome is equal to ω.

Definition 5.2.3 A probability space is a pair (S, Pr), where S is a sample
space and Pr : S → R is a probability function on S.

   A probability function Pr : S → R maps each element of the sample
space S (i.e., each outcome) to a real number in the interval [0, 1]. It turns


out to be useful to extend this function so that it maps any event to a real
number in [0, 1]. If A is an event (i.e., A ⊆ S), then we define
                        Pr(A) = Σ_{ω∈A} Pr(ω).                          (5.1)

We will refer to Pr(A) as the probability that the event A occurs.
  Note that since S ⊆ S, the entire sample space S is an event and
                       Pr(S) = Σ_{ω∈S} Pr(ω) = 1,

where the last equality follows from the second condition in Definition 5.2.2.

5.2.1     Examples
Flipping a coin: Assume we flip a coin. Since there are two possible
outcomes (the coin comes up either heads (H) or tails (T )), the sample
space is the set S = {H, T }. If the coin is fair, i.e., the probabilities of H
and T are equal, then the probability function Pr : S → R is given by

                                 Pr(H) = 1/2,
                                 Pr(T ) = 1/2.

Observe that this function Pr satisfies the two conditions in Definition 5.2.2.
Since this sample space has two elements, there are four events, one event
for each subset. These events are

                            ∅, {H}, {T }, {H, T },

and it follows from (5.1) that

              Pr(∅)      =       0,
            Pr({H})      =    Pr(H)       =   1/2,
            Pr({T })     =     Pr(T )     =   1/2,
           Pr({H, T })   = Pr(H) + Pr(T ) = 1/2 + 1/2 = 1.

Flipping a coin twice: If we flip a fair coin twice, then there are four
possible outcomes, and the sample space becomes S = {HH, HT, T H, T T }.
For example, HT indicates that the first flip resulted in heads, whereas the


second flip resulted in tails. In this case, the probability function Pr : S → R
is given by

               Pr(HH) = Pr(HT ) = Pr(T H) = Pr(T T ) = 1/4.

Observe again that this function Pr satisfies the two conditions in Defini-
tion 5.2.2. Since the sample space consists of 4 elements, the number of
events is equal to 24 = 16. For example, A = {HT, T H} is an event and it
follows from (5.1) that

               Pr(A) = Pr(HT ) + Pr(T H) = 1/4 + 1/4 = 1/2.

In words, when flipping a fair coin twice, the probability that we see one
heads and one tails (without specifying the order) is equal to 1/2.
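Probability spaces of this kind are easy to encode directly; the sketch below (an illustration only; the names are not from the text) represents the two-flip sample space as a dictionary and evaluates events with formula (5.1):

```python
from fractions import Fraction

# The sample space S = {HH, HT, TH, TT} with the fair-coin probability function.
S = {"HH": Fraction(1, 4), "HT": Fraction(1, 4),
     "TH": Fraction(1, 4), "TT": Fraction(1, 4)}

def pr(event):
    """Probability of an event (any subset of the sample space), via (5.1)."""
    return sum(S[w] for w in event)

assert sum(S.values()) == 1                  # second condition of Definition 5.2.2
assert pr({"HT", "TH"}) == Fraction(1, 2)    # one heads and one tails, either order
```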

Rolling a die twice: If we roll a fair die, then there are six possible
outcomes (1, 2, 3, 4, 5, and 6), each one occurring with probability 1/6. If
we roll this die twice, we obtain the sample space

                       S = {(i, j) : 1 ≤ i ≤ 6, 1 ≤ j ≤ 6},

where i is the result of the first roll and j is the result of the second roll. Note
that |S| = 6 × 6 = 36. Since the die is fair, each outcome has the same prob-
ability. Therefore, in order to satisfy the two conditions in Definition 5.2.2,
we must have
                                   Pr(i, j) = 1/36
for each outcome (i, j) in S.
    If we are interested in the sum of the results of the two rolls, then we
define the event

        Ak = “the sum of the results of the two rolls is equal to k”,

which, using the notation of sets, is the same as

                          Ak = {(i, j) ∈ S : i + j = k}.

Consider, for example, the case when k = 4. There are three possible out-
comes of two rolls that result in a sum of 4. These outcomes are (1, 3), (2, 2),
and (3, 1). Thus, the event A4 is equal to

                           A4 = {(1, 3), (2, 2), (3, 1)}.                     (5.2)


    In the matrix below, the leftmost column indicates the result of the first
roll, the top row indicates the result of the second roll, and each entry is the
sum of the results of the two corresponding rolls.

                                1      2   3    4     5        6
                           1    2      3   4    5     6        7
                           2    3      4   5    6     7        8
                           3    4      5   6    7     8        9
                           4    5      6   7    8     9       10
                           5    6      7   8    9     10      11
                           6    7      8   9   10     11      12

    As can be seen from this matrix, the event Ak is non-empty only if k ∈
{2, 3, . . . , 12}. For any other k, the event Ak is empty, which means that it
can never occur.
    It follows from (5.1) that
           Pr (Ak ) = Σ_{(i,j)∈Ak} Pr(i, j) = Σ_{(i,j)∈Ak} 1/36 = |Ak |/36.


For example, the number 4 occurs three times in the matrix and, therefore,
the event A4 has size three. Observe that we have already seen this in (5.2).
It follows that
                     Pr (A4 ) = |A4 |/36 = 3/36 = 1/12.
In a similar way, we see that

                          Pr (A2 )     =   1/36,
                          Pr (A3 )     =   2/36       = 1/18,
                          Pr (A4 )     =   3/36       = 1/12,
                          Pr (A5 )     =   4/36       = 1/9,
                          Pr (A6 )     =   5/36,
                          Pr (A7 )     =   6/36       = 1/6,
                          Pr (A8 )     =   5/36,
                          Pr (A9 )     =   4/36       = 1/9,
                          Pr (A10 )    =   3/36       = 1/12,
                          Pr (A11 )    =   2/36       = 1/18,
                          Pr (A12 )    =   1/36.
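The whole table can be recomputed in a few lines by counting, for each k, how many of the 36 equally likely outcomes (i, j) sum to k (an illustration only; the names are not from the text):

```python
from fractions import Fraction
from collections import Counter

counts = Counter(i + j for i in range(1, 7) for j in range(1, 7))  # |A_k| per sum k
pr_sum = {k: Fraction(counts[k], 36) for k in counts}              # Pr(A_k) = |A_k|/36

assert pr_sum[2] == Fraction(1, 36)
assert pr_sum[7] == Fraction(1, 6)
assert sum(pr_sum.values()) == 1
```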


   A sample space is not necessarily uniquely defined. In the last example,
where we were interested in the sum of the results of two rolls of a die, we
could also have taken the sample space to be the set

                       S ′ = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}.

The probability function Pr′ corresponding to this sample space S ′ is given
by
                             Pr′ (k) = Pr (Ak ) ,
because Pr′ (k) is the probability that we get the outcome k in the sample
space S ′ , which is the same as the probability that event Ak occurs in the
sample space S. You should verify that this function Pr′ satisfies the two
conditions in Definition 5.2.2 and, thus, is a valid probability function on S ′ .


5.3      Basic Rules of Probability
In this section, we prove some basic properties of probability functions. As
we will see, all these properties follow from Definition 5.2.2. Throughout this
section, (S, Pr) is a probability space.
    Recall that an event is a subset of the sample space S. In particular, the
empty set ∅ is an event. Intuitively, Pr(∅) must be zero, because it is the
probability that there is no outcome, which can never occur. The following
lemma states that this is indeed the case.

Lemma 5.3.1 Pr(∅) = 0.

Proof. By (5.1), we have
                               Pr(∅) = Σ_{ω∈∅} Pr(ω).


Since there are zero terms in this summation, its value is equal to zero.

    We say that two events A and B are disjoint, if A ∩ B = ∅. A sequence
A1 , A2 , . . . , An of events is pairwise disjoint, if any pair in this sequence is
disjoint. The following lemma is similar to the Sum Rule of Section 3.4.

Lemma 5.3.2 If A1 , A2 , . . . , An is a sequence of pairwise disjoint events,
then
                 Pr(A1 ∪ A2 ∪ · · · ∪ An) = Σ_{i=1}^{n} Pr(Ai).

Proof. Let A = A1 ∪ A2 ∪ · · · ∪ An . Using (5.1), we have
                        Pr(A) = Σ_{ω∈A} Pr(ω)
                              = Σ_{i=1}^{n} Σ_{ω∈Ai} Pr(ω)
                              = Σ_{i=1}^{n} Pr(Ai).




    To give an example, assume we roll a fair die twice. What is the proba-
bility that the sum of the two results is even? If you look at the matrix in
Section 5.2, then you see that there are 18 entries, out of 36, that are even.
Therefore, the probability of having an even sum is equal to 18/36 = 1/2.
Below we will give a different way to determine this probability.
    The sample space is the set

                       S = {(i, j) : 1 ≤ i ≤ 6, 1 ≤ j ≤ 6},

where i is the result of the first roll and j is the result of the second roll. Each
element of S has the same probability 1/36 of being an outcome of rolling
the die twice.
   The event we are interested in is

                        A = {(i, j) ∈ S : i + j is even}.

Observe that i + j is even if and only if both i and j are even or both i and
j are odd. Therefore, we split the event A into two disjoint events

                   A1 = {(i, j) ∈ S : both i and j are even}

and
                   A2 = {(i, j) ∈ S : both i and j are odd}.

By Lemma 5.3.2, we have

                        Pr(A) = Pr (A1 ) + Pr (A2 ) .

The set A1 has 3 · 3 = 9 elements, because there are 3 choices for i and 3
choices for j. Similarly, the set A2 has 9 elements. It follows that

             Pr(A) = Pr (A1 ) + Pr (A2 ) = 9/36 + 9/36 = 1/2.
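This counting argument can be double-checked by listing the outcomes explicitly; a small Python sketch (variable names ours):

```python
from fractions import Fraction

S = [(i, j) for i in range(1, 7) for j in range(1, 7)]      # sample space, 36 outcomes
A1 = [(i, j) for (i, j) in S if i % 2 == 0 and j % 2 == 0]  # both results even
A2 = [(i, j) for (i, j) in S if i % 2 == 1 and j % 2 == 1]  # both results odd
# A1 and A2 are disjoint, so Lemma 5.3.2 applies
pr_even = Fraction(len(A1) + len(A2), len(S))
```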

    In the next lemma, we relate the probability that an event occurs to the
probability that the event does not occur. If A is an event, then Ā denotes
its complement, i.e., Ā = S \ A. Intuitively, the sum of Pr(A) and Pr(Ā)
must be equal to one, because the event A either occurs or does not occur.
The following lemma states that this is indeed the case. Observe that this is
similar to the Complement Rule of Section 3.3.

Lemma 5.3.3 For any event A,

                            Pr(A) = 1 − Pr(Ā).

Proof. Since A and Ā are disjoint and S = A ∪ Ā, it follows from Lemma 5.3.2
that
                   Pr(S) = Pr(A ∪ Ā) = Pr(A) + Pr(Ā).
We have seen in Section 5.2 that Pr(S) = 1.

    Consider again the sample space that we saw after Lemma 5.3.2. We
showed that, when rolling a fair die twice, we get an even sum with probabil-
ity 1/2. It follows from Lemma 5.3.3 that we get an odd sum with probability
1 − 1/2 = 1/2.
   The next lemma is similar to the Principle of Inclusion and Exclusion
that we have seen in Section 3.5.

Lemma 5.3.4 If A and B are events, then

                 Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B).

Proof. Since B \ A and A ∩ B are disjoint and B = (B \ A) ∪ (A ∩ B), it
follows from Lemma 5.3.2 that

                     Pr(B) = Pr(B \ A) + Pr(A ∩ B).

Next observe that A and B \ A are disjoint. Since A ∪ B = A ∪ (B \ A), we
again apply Lemma 5.3.2 and obtain

                      Pr(A ∪ B) = Pr(A) + Pr(B \ A).

By combining these two equations, we obtain the claim in the lemma.

    To give an example, assume we choose a number x in the sample space S =
{1, 2, . . . , 1000}, such that each element has the same probability 1/1000 of
being chosen. What is the probability that x is divisible by 2 or 3? Consider
the events
                           A = {i ∈ S : i is divisible by 2}
and
                      B = {i ∈ S : i is divisible by 3}.
Then we want to determine Pr(A ∪ B), which, by Lemma 5.3.4, is equal to

                 Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B).

Since there are ⌊1000/2⌋ = 500 even numbers in S, we have

                              Pr(A) = 500/1000.

Since there are ⌊1000/3⌋ = 333 elements in S that are divisible by 3, we have

                              Pr(B) = 333/1000.

Observe that i belongs to A ∩ B if and only if i is divisible by 6, i.e.,

                    A ∩ B = {i ∈ S : i is divisible by 6}.

Since there are ⌊1000/6⌋ = 166 elements in S that are divisible by 6, we have

                           Pr(A ∩ B) = 166/1000.

We conclude that

              Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B)
                        = 500/1000 + 333/1000 − 166/1000
                        = 667/1000.
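The same numbers fall out of a direct computation; the sketch below (names ours) compares the inclusion-exclusion value with a direct count of the union:

```python
from fractions import Fraction

S = range(1, 1001)
A = {i for i in S if i % 2 == 0}   # divisible by 2
B = {i for i in S if i % 3 == 0}   # divisible by 3

# Lemma 5.3.4 on the left, a direct count of A ∪ B on the right
incl_excl = Fraction(len(A) + len(B) - len(A & B), 1000)
direct = Fraction(len(A | B), 1000)
```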

Lemma 5.3.5 (Union Bound) For any integer n ≥ 1, if A1 , A2 , . . . , An is
a sequence of events, then
                 Pr(A1 ∪ A2 ∪ · · · ∪ An) ≤ Σ_{i=1}^{n} Pr(Ai).

Proof. The proof is by induction on n. If n = 1, we have equality and, thus,
the claim obviously holds. Let n ≥ 2 and assume the claim is true for n − 1,
i.e., assume that
                 Pr(A1 ∪ A2 ∪ · · · ∪ An−1) ≤ Σ_{i=1}^{n−1} Pr(Ai).

Let B = A1 ∪ A2 ∪ · · · ∪ An−1 . Then it follows from Lemma 5.3.4 that

      Pr (B ∪ An ) = Pr(B) + Pr (An ) − Pr (B ∩ An ) ≤ Pr(B) + Pr (An ) ,

because Pr (B ∩ An ) ≥ 0 (this follows from the first condition in Defini-
tion 5.2.2). Since we assumed that
                       Pr(B) ≤ Σ_{i=1}^{n−1} Pr(Ai),

it follows that

              Pr(A1 ∪ A2 ∪ · · · ∪ An) = Pr(B ∪ An)
                                       ≤ Pr(B) + Pr(An)
                                       ≤ Σ_{i=1}^{n−1} Pr(Ai) + Pr(An)
                                       = Σ_{i=1}^{n} Pr(Ai).




Lemma 5.3.6 If A and B are events with A ⊆ B, then

                                Pr(A) ≤ Pr(B).

Proof. Using (5.1) and the fact that Pr(ω) ≥ 0 for each ω in S, we have
                       Pr(A) = Σ_{ω∈A} Pr(ω)
                             ≤ Σ_{ω∈B} Pr(ω)
                             = Pr(B).




5.4      Uniform Probability Spaces
In this section, we consider finite sample spaces S in which each outcome has
the same probability. Since, by Definition 5.2.2, all probabilities add up to
one, the probability of each outcome must be equal to 1/|S|.
Definition 5.4.1 A uniform probability space is a pair (S, Pr), where S is a
finite sample space and the probability function Pr : S → R satisfies
                                Pr(ω) = 1/|S|,
for each outcome ω in S.
    The probability spaces that we have seen in Section 5.2.1 are all uniform,
except the space (S′, Pr′) that we saw at the end of that section.
    To give another example, when playing Lotto 6/49, you choose a 6-
element subset of the set A = {1, 2, . . . , 49}. Twice a week, the Ontario Lot-
tery and Gaming Corporation (OLG) draws the six “winning numbers” uni-
formly at random from A. If your numbers are equal to those drawn by OLG,
then you can withdraw from this course and spend the rest of your life on the
beach. Most people find it silly to choose the subset {1, 2, 3, 4, 5, 6}. They
think that it is better to choose, for example, the subset {2, 5, 16, 36, 41, 43}.
Is this true?
    For this example, the sample space is the set S consisting of all 6-element
subsets of A. Since S has size C(49, 6) and the subset drawn by OLG is
uniform, each outcome (i.e., each 6-element subset of A) has a probability of

                  1/C(49, 6) = 1/13,983,816 ≈ 0.000000072.
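This value can be recomputed in one line; the sketch below uses Python's math.comb for the binomial coefficient:

```python
from math import comb

n_subsets = comb(49, 6)   # number of 6-element subsets of {1, ..., 49}
p_win = 1 / n_subsets     # probability that any fixed ticket wins
```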

In particular, both {1, 2, 3, 4, 5, 6} and {2, 5, 16, 36, 41, 43} have the same
probability of being the winning numbers. (Still, the latter subset was drawn
by OLG on February 8, 2014.)
   The lemma below states that in a uniform probability space (S, Pr), the
probability of an event A is the ratio of the size of A and the size of S.

Lemma 5.4.2 If (S, Pr) is a uniform probability space and A is an event,
then
                               Pr(A) = |A|/|S|.

Proof. By using (5.1) and Definition 5.4.1, we get
       Pr(A) = Σ_{ω∈A} Pr(ω) = Σ_{ω∈A} 1/|S| = (1/|S|) Σ_{ω∈A} 1 = |A|/|S|.




5.4.1     The Probability of Getting a Full House
In a standard deck of 52 cards, each card has a suit and a rank. There are
four suits (spades ♠, hearts ♥, clubs ♣, and diamonds ♦), and 13 ranks
(Ace, 2, 3, 4, 5, 6, 7, 8, 9, 10, Jack, Queen, and King).
    A hand of five cards is called a full house, if three of the cards are of
the same rank and the other two cards are also of the same (but necessarily
different) rank. For example, the hand

                              7♠, 7♥, 7♦, Q♠, Q♣

is a full house, because it consists of three sevens and two Queens.
    Assume we get a uniformly random hand of five cards. What is the
probability that this hand is a full house? To answer this question, first
observe that a hand of five cards is a subset of the set of all 52 cards. Thus,
the sample space is the set S consisting of all 5-element subsets of the set of
52 cards and, therefore,

                        |S| = C(52, 5) = 2,598,960.

Each hand of five cards has a probability of 1/|S| of being chosen.
   Since we are interested in the probability of a random hand being a full
house, we define the event A to be the set of all elements in S that are full
houses. By Lemma 5.4.2, we have

                               Pr(A) = |A|/|S|.

Thus, to determine Pr(A), it remains to determine the size of the set A, i.e.,
the total number of full houses. For this, we will use the Product Rule of
Section 3.1:

   • The procedure is “choose a full house”.

   • First task: Choose the rank of the three cards in the full house. There
     are 13 ways to do this.

   • Second task: Choose the suits of these three cards. There are C(4, 3)
     ways to do this.

   • Third task: Choose the rank of the other two cards in the full house.
     There are 12 ways to do this.

   • Fourth task: Choose the suits of these two cards. There are C(4, 2)
     ways to do this.

Thus, the number of full houses is equal to

                   |A| = 13 · C(4, 3) · 12 · C(4, 2) = 3,744.

We conclude that the probability of getting a full house is equal to

                  Pr(A) = |A|/|S| = 3,744/2,598,960 ≈ 0.00144.
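The four tasks of the Product Rule translate directly into code; a sketch (names ours):

```python
from fractions import Fraction
from math import comb

# Product Rule: rank of the triple, its suits, rank of the pair, its suits
full_houses = 13 * comb(4, 3) * 12 * comb(4, 2)
hands = comb(52, 5)                  # all 5-card hands
p_full_house = Fraction(full_houses, hands)
```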

5.5     The Birthday Paradox
Let n ≥ 2 be an integer and consider a group of n people. In this section,
we will determine the probability pn that at least two of them have the same
birthday. We will ignore leap years, so that there are 365 days in one year.

    Below, we will show that p2 = 1/365. If n ≥ 366, then it follows from the
Pigeonhole Principle (see Section 3.10) that there must be at least two people
with the same birthday and, therefore, pn = 1. Intuitively, if n increases from
2 to 365, the value of pn increases as well. What is the value of n such that
pn is larger than 1/2 for the first time? That is, what is the value of n for
which pn−1 ≤ 1/2 < pn ? In this section, we will see that this question can be
answered using simple counting techniques that we have seen in Chapter 3.
    We denote the people by P1 , P2 , . . . , Pn , we denote the number of days in
one year by d, and we number the days in one year as 1, 2, . . . , d. The sample
space is the set

        Sn = {(b1 , b2 , . . . , bn ) : bi ∈ {1, 2, . . . , d} for each 1 ≤ i ≤ n},

where bi denotes the birthday of Pi . Note that

                                  |Sn| = d^n.

We consider the uniform probability space: For each element (b1 , b2 , . . . , bn )
in Sn , we have
                     Pr(b1 , b2 , . . . , bn) = 1/|Sn| = 1/d^n.
The event we are interested in is

       An = “at least two of the numbers in b1 , b2 , . . . , bn are equal”.

Using the notation of sets, this is the same as

       An = {(b1 , b2 , . . . , bn ) ∈ Sn : b1 , b2 , . . . , bn contains duplicates}.

The probability pn that we introduced above is equal to

                                     pn = Pr (An ) .

As mentioned above, the Pigeonhole Principle implies that pn = 1 for n > d.
Therefore, we assume from now on that n ≤ d.
   Let us start by determining p2 . Since we consider the uniform probability
space, we have, by Lemma 5.4.2,

                           p2 = Pr(A2) = |A2|/|S2|.

We know already that |S2| = d^2. The event A2 is equal to

                              A2 = {(1, 1), (2, 2), . . . , (d, d)}.

Thus, |A2 | = d and we obtain

                         p2 = |A2|/|S2| = d/d^2 = 1/d.

   To determine pn for larger values of n, it is easier to determine the
probability of the complement Ān . The latter probability, together with
Lemma 5.3.3, will give us the value of pn . Note that

        Ān = {(b1 , b2 , . . . , bn ) ∈ Sn : b1 , b2 , . . . , bn are pairwise distinct}.

In words, Ān is the set of all ordered sequences consisting of n pairwise
distinct elements of {1, 2, . . . , d}. In Section 3.6, see (3.1), we have seen that

                            |Ān| = d!/(d − n)!.

We conclude that, for any n with 2 ≤ n ≤ d,

                          pn = Pr(An)
                             = 1 − Pr(Ān)
                             = 1 − |Ān|/|Sn|
                             = 1 − d!/((d − n)! d^n).

By taking d = 365, we get p22 = 0.476 and p23 = 0.507. Thus, in a random
group of 23 people (two soccer teams plus the referee), the probability that
at least two of them have the same birthday is more than 50%. Most people
are very surprised when they see this for the first time, because our intuition
says that a much larger group is needed to have a probability of more than
50%. The values pn approach 1 pretty fast. For example, p40 = 0.891 and
p100 = 0.9999997.
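These values, as well as the earlier p2 = 1/365, can be computed directly from the formula; the following sketch evaluates the product (d/d)·((d−1)/d)···((d−n+1)/d) instead of the huge factorials (function name ours):

```python
def p_same_birthday(n, d=365):
    """p_n = 1 - d!/((d - n)! * d^n), evaluated as a running product."""
    if n > d:
        return 1.0          # Pigeonhole Principle
    q = 1.0
    for i in range(n):      # q = Pr(all n birthdays are pairwise distinct)
        q *= (d - i) / d
    return 1.0 - q
```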

5.5.1     Throwing Balls into Boxes
When we derived the expression for pn , we did not use the fact that the
value of d is equal to 365. In other words, the expression is valid for any
value of d. For general values of d, we can interpret the birthday problem
in the following way: Consider d boxes B1 , B2 , . . . , Bd , where d is a large
integer. Assume that we throw n balls into these boxes so that each ball
lands in a uniformly random box. Then pn is the probability that there is at
least one box that contains more than one ball. Since it is not easy to see
how the expression
                         pn = 1 − d!/((d − n)! d^n)
depends on n, we will approximate it by a simpler expression. For this, we
will use the inequality
                              1 − x ≤ e^{−x},                              (5.3)
which is valid for any real number x. If x is close to zero, then the inequality
is very accurate. The easiest way to prove this inequality is by showing that
the minimum of the function f(x) = x + e^{−x} is equal to f(0) = 1, using
techniques from calculus.
    If we define qn = 1 − pn , then we have
                           qn = d!/((d − n)! d^n).
Using (5.3), we get
        qn = (d/d) · ((d − 1)/d) · ((d − 2)/d) · · · ((d − (n − 1))/d)
           = (1 − 1/d) · (1 − 2/d) · (1 − 3/d) · · · (1 − (n − 1)/d)
           ≤ e^{−1/d} · e^{−2/d} · e^{−3/d} · · · e^{−(n−1)/d}
           = e^{−(1+2+3+···+(n−1))/d}.
Using the equality
                     1 + 2 + 3 + · · · + (n − 1) = n(n − 1)/2,
see Theorem 2.2.10, we thus get
                           qn ≤ e^{−n(n−1)/(2d)},

and therefore,
                     pn = 1 − qn ≥ 1 − e^{−n(n−1)/(2d)}.
If n is large, then n(n − 1)/(2d) is very close to n²/(2d) and, thus,

                           pn ≳ 1 − e^{−n²/(2d)}.
If we take n = √(2d), then we get

                          pn ≳ 1 − e^{−1} ≈ 0.632.

Thus, for large values of d, if we throw √(2d) balls into d boxes, then with
probability (approximately) at least 1 − 1/e, there is a box that contains
more than one ball.
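The inequality qn ≤ e^{−n(n−1)/(2d)} and the 1 − 1/e claim can be sanity-checked numerically; a sketch (names ours) with the illustrative choice d = 1000:

```python
from math import exp

def q_exact(n, d):
    """Exact Pr(all n balls land in distinct boxes)."""
    q = 1.0
    for i in range(n):
        q *= (d - i) / d
    return q

def q_bound(n, d):
    """The upper bound e^(-n(n-1)/(2d)) obtained from 1 - x <= e^(-x)."""
    return exp(-n * (n - 1) / (2 * d))

# the bound dominates the exact value for every n we try
ok = all(q_exact(n, 1000) <= q_bound(n, 1000) for n in range(1, 201))
```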


5.6      The Big Box Problem
Keith chooses two distinct elements x and y, with x < y, from the set
A = {0, 1, 2, . . . , 100}; he does not show these two numbers to us. He takes
two identical boxes, and puts x dollars in one box and y dollars in the other
box. Then Keith closes the two boxes, shuffles them, and puts them on a
table. At this moment, we can see the two boxes; they look identical to us,
and the only information we have is that they contain different amounts of
money, where each amount is an element of the set A.



          [Figure: the two boxes, containing $x and $y in an unknown order]


    We will refer to the box containing x dollars as the small box and to the
box containing y dollars as the big box. Our goal is to find the big box. We
are allowed to do the following:

  1. We can choose one of the two boxes, open it, and determine how much
     money is inside it.

  2. Now we have to make our final decision: Either we keep the box we
     just opened or we take the other box.

For example, assume that the box we pick in the first step contains $33.
Then we know that the other box contains either less than $33 or more
than $33. It seems that the only reasonable thing to do is to flip a fair coin
when making our final decision. If we do that, then we find the big box with
probability 0.5.
    In the rest of this section, we will show the surprising result that we can
find the big box with probability at least 0.505.
   The idea is as follows. Assume that we know a number z such that
x < z < y. (Keep in mind that we do not know x and we do not know y.
Thus, we assume that we know a number z that is between the two unknown
numbers x and y.)


   • If the box we choose in the first step contains more than z dollars, then
     we know that this is the big box and, therefore, we keep it.


   • If the box we choose in the first step contains less than z dollars, then
     we know that this is the small box and, therefore, we take the other
     box.


Thus, if we know this number z with x < z < y, then we are guaranteed to
find the big box.
    Of course, it is not realistic to assume that we know this magic number z.
The trick is to choose a random z and hope that it is between x and y. If z
is between x and y, then we find the big box with probability 1; otherwise,
we find the big box with probability 1/2. As we will see later, the overall
probability of finding the big box will be at least 0.505.
   In order to avoid the case when z = x or z = y, we will choose z from
the set

                     B = {1/2, 3/2, 5/2, . . . , 100 − 1/2}.

Note that |B| = 100. Our algorithm that attempts to find the big box does
the following:

       Algorithm FindBigBox:
       Step 1: Choose one of the two boxes uniformly at random, open it,
       and determine the amount of money inside it; let this amount be a.
       Step 2: Choose z uniformly at random from the set B.
       Step 3: Do the following:

          • If a > z, then keep the box chosen in Step 1.

          • Otherwise (i.e., if a < z), take the other box.



5.6.1       The Probability of Finding the Big Box
We are now going to determine the probability that this algorithm finds the
big box. First, we have to ask ourselves what the sample space is. There are
two places in the algorithm where a random element is chosen:
   • In Step 1, we choose the element a, which is a random element from
     the set {x, y}. We know that this value a is equal to one of x and y.
     However, at the end of Step 1, we do not know whether a = x or a = y.
   • In Step 2, we choose a random element from the set B.
Based on this, the sample space S is the Cartesian product
                  S = {x, y} × B = {(a, z) : a ∈ {x, y}, z ∈ B}
and Steps 1 and 2 can be replaced by
   • choose a uniformly random element (a, z) in S.
Note that |S| = 200.
   We say that algorithm FindBigBox is successful if it finds the big box.
Thus, we want to determine Pr(W ), where W is the event
                 W = “algorithm FindBigBox is successful”.
We are going to write this event as a subset of the sample space S. For this, we
have to determine all elements (a, z) in S for which algorithm FindBigBox
is successful.
    First consider the case when a = x. In this case, the box we choose in
Step 1 is the small box. There are two possibilities for z:

   • If x = a > z, then the algorithm keeps the small box and, thus, is not
     successful.
   • If x = a < z, then the algorithm takes the other box (which is the big
     box) and, thus, is successful.
Thus, the event W contains the set
          Wx = {(x, z) : z ∈ {x + 1/2, x + 3/2, . . . , 100 − 1/2}}.
You can verify that
                               |Wx | = 100 − x.
   The second case to consider is when a = y. In this case, the box we
choose in Step 1 is the big box. Again, there are two possibilities for z:
   • If y = a > z, then the algorithm keeps the big box and, thus, is
     successful.
   • If y = a < z, then the algorithm takes the other box (which is the small
     box) and, thus, is not successful.
Thus, the event W contains the set
                Wy = {(y, z) : z ∈ {1/2, 3/2, . . . , y − 1/2}}.
You can verify that
                                  |Wy | = y.
   Since W = Wx ∪ Wy and the events Wx and Wy are disjoint, we have, by
Lemma 5.3.2,
                      Pr(W ) = Pr (Wx ∪ Wy )
                             = Pr (Wx ) + Pr (Wy ) .
Since the element (a, z) is chosen uniformly at random from the sample
space S, we can use Lemma 5.4.2 to determine the probability that algorithm
FindBigBox is successful:
                       Pr(W) = Pr(Wx) + Pr(Wy)
                             = |Wx|/|S| + |Wy|/|S|
                             = (100 − x)/200 + y/200
                             = 1/2 + (y − x)/200.

Since x and y are distinct integers and x < y, we have y − x ≥ 1, and we
conclude that
                         Pr(W) ≥ 1/2 + 1/200 = 0.505.
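This bound, and more precisely the value Pr(W) = 1/2 + (y − x)/200, can be checked by simulation. The following Python sketch implements FindBigBox and estimates its success probability for x = 30 and y = 40 (the seed and trial count are arbitrary choices of ours):

```python
import random

def find_big_box(x, y, rng):
    """One run of algorithm FindBigBox; True if we end up with the big box."""
    boxes = [x, y]
    rng.shuffle(boxes)                        # Step 1: open a uniformly random box
    a = boxes[0]
    z = rng.randrange(100) + 0.5              # Step 2: z uniform in {1/2, 3/2, ..., 100 - 1/2}
    chosen = boxes[0] if a > z else boxes[1]  # Step 3: keep it iff a > z
    return chosen == y                        # y is the big box, since x < y

rng = random.Random(17)
trials = 200_000
wins = sum(find_big_box(30, 40, rng) for _ in range(trials))
estimate = wins / trials   # theory: 1/2 + (40 - 30)/200 = 0.55
```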

5.7        The Monty Hall Problem
The Monty Hall Problem is a well-known puzzle in probability theory. It
is named after the host, Monty Hall, of the American television game show
Let’s Make a Deal. The problem became famous in 1990, when (part of) a
reader’s letter was published in Marilyn vos Savant’s column Ask Marilyn in
the magazine Parade:
       Suppose you’re on a game show, and you’re given the choice of three
       doors: Behind one door is a car; behind the others, goats. You pick
       a door, say No. 1, and the host, who knows what’s behind the doors,
       opens another door, say No. 3, which has a goat. He then says to you,
       “Do you want to pick door No. 2?” Is it to your advantage to switch
       your choice?

    Note that the host can always open a door that has a goat behind it.
After the host has opened No. 3, we know that the car is either behind No. 1
or No. 2, and it seems that both these doors have the same probability (i.e.,
50%) of having the car behind them. We will prove below, however, that this
is not true: It is indeed to our advantage to switch our choice.
    We assume that the car is equally likely to be behind any of the three
doors. Moreover, the host knows what is behind each door.

   • We initially choose one of the three doors uniformly at random; this
     door remains closed.

   • The host opens one of the other two doors that has a goat behind it.

   • Our final choice is to switch to the other door that is still closed.

Let A be the event that we win the car and let B be the event that the initial
door has a goat behind it. Then it is not difficult to see that event A occurs
if and only if event B occurs. Therefore, the probability that we win the car
is equal to
                           Pr(A) = Pr(B) = 2/3.
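The 2/3 answer is easy to confirm by simulation; a sketch (names, seed, and trial count ours):

```python
import random

def play(switch, rng):
    """One round of the game; returns True if we win the car."""
    car = rng.randrange(3)    # car behind a uniformly random door
    pick = rng.randrange(3)   # our uniformly random initial choice
    # the host opens some door that hides a goat and is not our pick
    host = next(d for d in range(3) if d != car and d != pick)
    if switch:                # switch to the remaining closed door
        pick = next(d for d in range(3) if d != pick and d != host)
    return pick == car

rng = random.Random(5)
trials = 100_000
p_switch = sum(play(True, rng) for _ in range(trials)) / trials
p_stay = sum(play(False, rng) for _ in range(trials)) / trials
```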

5.8      Conditional Probability
Anil Maheshwari has two children. We are told that one of them is a boy.
What is the probability that the other child is also a boy? Most people will
say that this probability is 1/2. We will show below that this is not the
correct answer.
   Since Anil has two children, the sample space is

                        S = {(b, b), (b, g), (g, b), (g, g)},

where, for example, (b, g) indicates that the youngest child is a boy and the
oldest child is a girl. We assume a uniform probability function, so that each
outcome has a probability of 1/4.
   We are given the additional information that one of the two children is
a boy, or, to be more precise, that at least one of the two children is a boy.
This means that the actual sample space is not S, but

                              {(b, b), (b, g), (g, b)}.

When asking for the probability that the other child is also a boy, we are
really asking for the probability that both children are boys. Since there is
only one possibility (out of three) for both children to be boys, it follows that
this probability is equal to 1/3.
    This is an example of a conditional probability: We are asking for the
probability of an event (both children are boys), given that another event (at
least one of the two children is a boy) occurs.

Definition 5.8.1 Let (S, Pr) be a probability space and let A and B be two
events with Pr(B) > 0. The conditional probability Pr(A | B), pronounced
as “the probability of A given B”, is defined as

                       Pr(A | B) = Pr(A ∩ B)/Pr(B).

   Let us try to understand where this definition comes from. Initially, the
sample space is equal to S. When we are given the additional information
that event B occurs, the sample space “shrinks” to B, and event A occurs if
and only if event A ∩ B occurs.

   [Figure: a Venn diagram showing the sample space S, with event A
   partially overlapping event B.]

   You may think that Pr(A | B) should therefore be defined to be Pr(A∩B).
However, since the sum of all probabilities must be equal to 1, we have to
normalize, i.e., divide by Pr(B). As a sanity check, if A = B, we get Pr(A | A),
which is the probability that event A occurs, given that event A occurs. This
probability should be equal to 1. Indeed, using the definition, we do get
                                   Pr(A ∩ A)   Pr(A)
                    Pr(A | A) =              =       = 1.
                                     Pr(A)     Pr(A)
In Exercise 5.24, you are asked to give a formal proof that our definition
gives a valid probability function on the sample space S.
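On a finite sample space with a uniform probability function, Definition 5.8.1 reduces to counting, since Pr(A | B) = (|A ∩ B|/|S|) / (|B|/|S|) = |A ∩ B| / |B|. A small helper along these lines (a sketch of ours, not part of the text):

```python
from fractions import Fraction

def cond_prob(A, B):
    """Pr(A | B) under a uniform distribution: |A ∩ B| / |B|.
    Requires Pr(B) > 0, i.e., B must be nonempty."""
    A, B = set(A), set(B)
    if not B:
        raise ValueError("Pr(B) must be positive")
    return Fraction(len(A & B), len(B))

# Sanity check from the text: Pr(A | A) = 1 for any nonempty event A.
A = {("b", "b")}
print(cond_prob(A, A))  # 1
```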

5.8.1     Anil’s Children
Returning to Anil’s two children, we saw that the sample space is
                        S = {(b, b), (b, g), (g, b), (g, g)}
and we assumed a uniform probability function. The events we considered
are
                     A = “both children are boys”
and
              B = “at least one of the two children is a boy”,
and we wanted to know Pr(A | B). Writing A and B as subsets of the sample
space S, we get
                              A = {(b, b)}
and
                            B = {(b, b), (b, g), (g, b)}.
Using Definition 5.8.1, it follows that
                       Pr(A ∩ B)   Pr(A)   |A|/|S|   1/4
         Pr(A | B) =             =       =         =     = 1/3,
                         Pr(B)     Pr(B)   |B|/|S|   3/4
which is the same answer as we got before.

5.8.2     Rolling a Die
Assume we roll a fair die, i.e., we choose an element uniformly at random
from the sample space
                            S = {1, 2, 3, 4, 5, 6}.
Consider the events
                             A = “the result is 3”
and
                      B = “the result is an odd integer”.
    What is the conditional probability Pr(A | B)? To determine this proba-
bility, we assume that event B occurs, i.e., the roll of the die resulted in one
of 1, 3, and 5. Given that event B occurs, event A occurs in one out of these
three possibilities. Thus, Pr(A | B) should be equal to 1/3. We are going
to verify that this is indeed the answer we get when using Definition 5.8.1:
Since
                                   A = {3}
and
                                  B = {1, 3, 5},
we have
                        Pr(A ∩ B)   Pr(A)   |A|/|S|   1/6
          Pr(A | B) =             =       =         =     = 1/3.
                          Pr(B)     Pr(B)   |B|/|S|   3/6

    Let us now consider the conditional probability Pr(B | A). Thus, we are
given that event A occurs, i.e., the roll of the die resulted in 3. Since 3 is an
odd integer, event B is guaranteed to occur. Therefore, Pr(B | A) should be
equal to 1. Again, we are going to verify that this is indeed the answer we
get when using Definition 5.8.1:

                                  Pr(B ∩ A)   Pr(A)
                    Pr(B | A) =             =       = 1.
                                    Pr(A)     Pr(A)

This shows that, in general, Pr(A | B) is not equal to Pr(B | A). Observe
that this is not surprising. (Do you see why?)
   Consider the event

                      C = “the result is a prime number”,
which, when written as a subset of the sample space, is

                                 C = {2, 3, 5}.

Then Pr(C | B) should be equal to 2/3 and Pr(C | A) should be equal to 1.
Indeed, we have

                           Pr(C ∩ B)   |C ∩ B|/|S|   2/6
             Pr(C | B) =             =             =     = 2/3
                             Pr(B)       |B|/|S|     3/6

and
                                 Pr(C ∩ A)   Pr(A)
                   Pr(C | A) =             =       = 1.
                                   Pr(A)     Pr(A)
   Recall that B̄ denotes the complement of the event B. Thus, this is the
event
                    B̄ = “the result is an even integer”,
which, when written as a subset of the sample space, is

                                 B̄ = {2, 4, 6}.

Then Pr(C | B̄) should be equal to 1/3. Indeed, we have

                     Pr(C ∩ B̄)   |C ∩ B̄|/|S|   1/6
         Pr(C | B̄) =           =             =     = 1/3.
                       Pr(B̄)       |B̄|/|S|     3/6

Observe that

                  Pr(C | B) + Pr(C | B̄) = 2/3 + 1/3 = 1.

You may think that this is true for any two events B and C. This is, however,
not the case: Since
                              Ā = {1, 2, 4, 5, 6},
we have

                     Pr(C ∩ Ā)   |C ∩ Ā|/|S|   2/6
         Pr(C | Ā) =           =             =     = 2/5
                       Pr(Ā)       |Ā|/|S|     5/6

and, thus,
                  Pr(C | A) + Pr(C | Ā) = 1 + 2/5 ≠ 1.

   It should be an easy exercise to verify that

                        Pr(A | C) + Pr(Ā | C) = 1.

Intuitively, this should be true for any two events A and C: When we are
given that event C occurs, then either A occurs or A does not occur (in
which case Ā occurs). The following lemma states that this intuition is
indeed correct.

Lemma 5.8.2 Let (S, Pr) be a probability space and let A and B be two
events with Pr(B) > 0. Then

                        Pr(A | B) + Pr(Ā | B) = 1.

Proof. By definition, we have

                               Pr(A ∩ B)   Pr(Ā ∩ B)
     Pr(A | B) + Pr(Ā | B) =             +
                                 Pr(B)       Pr(B)

                               Pr(A ∩ B) + Pr(Ā ∩ B)
                             =                        .
                                       Pr(B)

Since the events A ∩ B and Ā ∩ B are disjoint, we have, by Lemma 5.3.2,

             Pr(A ∩ B) + Pr(Ā ∩ B) = Pr((A ∩ B) ∪ (Ā ∩ B)).

By drawing a Venn diagram, you will see that

                         (A ∩ B) ∪ (Ā ∩ B) = B,

implying that
                     Pr(A ∩ B) + Pr(Ā ∩ B) = Pr(B).
We conclude that
                                               Pr(B)
                   Pr(A | B) + Pr(Ā | B) =           = 1.
                                               Pr(B)

5.8.3      Flip and Flip or Roll
We are given a fair red coin, a fair blue coin, and a fair die. First, we flip
the red coin. If the result of this flip is heads, then we flip the blue coin and
return the result of this second flip. Otherwise, the red coin came up tails,
in which case we roll the die and return the result of this roll.
    What is the probability that the value 5 is returned? Our intuition says
that this probability is equal to 1/12: The value 5 is returned if and only
if the red coin comes up tails (which happens with probability 1/2) and the
result of rolling the die is 5 (which happens with probability 1/6). We will
prove that this is indeed the correct answer.
    We start by modifying the above random process so that it better reflects
the random choices that are being made:
       Algorithm FlipAndFlipOrRoll:

           fr = the result of flipping the red coin;
           if fr = H
           then fb = the result of flipping the blue coin;
                  return the ordered pair (fr , fb )
           else d = the result of rolling the die;
                 return the ordered pair (fr , d)
           endif

    The possible executions of this algorithm are visualized in the following
tree diagram:
   [Tree diagram: the root “flip red coin” branches on H to “flip blue coin”,
   with leaves HH and HT, and on T to “roll die”, with leaves T1, T2, T3,
   T4, T5, T6.]

   The sample space is the set S of all possible values that can be returned
by algorithm FlipAndFlipOrRoll. Thus, we have

         S = {(H, H), (H, T ), (T, 1), (T, 2), (T, 3), (T, 4), (T, 5), (T, 6)}.

We are interested in the probability that the algorithm returns the value
(T, 5), i.e., the probability of the event

                                 A = {(T, 5)}.

Since the event A obviously depends on the result of flipping the red coin,
we consider the event

             R = “the result of flipping the red coin is tails”.

If we write this event as a subset of the sample space, we get

               R = {(T, 1), (T, 2), (T, 3), (T, 4), (T, 5), (T, 6)}.

Observe that Pr(R) = 1/2 and A ∩ R = A, which implies that

                             Pr(A) = Pr(A ∩ R).

If we rewrite the expression for the conditional probability Pr(A | R) in
Definition 5.8.1, we get

                  Pr(A) = Pr(A ∩ R) = Pr(R) · Pr(A | R).

We have seen already that Pr(R) = 1/2. To determine Pr(A | R), we assume
that event R occurs. Under this assumption, event A occurs if and only if
the result of rolling the die is 5, which happens with probability 1/6. Thus,

                                Pr(A | R) = 1/6

and we conclude that

                          Pr(A) = 1/2 · 1/6 = 1/12,

which is the answer we were expecting to see.
     You may object to this method of determining Pr(A): When we deter-
mined Pr(A | R), we did not use the definition of conditional probability,
i.e.,
                                      Pr(A ∩ R)
                          Pr(A | R) =           .
                                        Pr(R)
Instead, we used the “informal definition”, by determining the probability
that event A occurs, assuming that event R occurs. Thus, we do not yet
have a formal justification as to why Pr(A) is equal to 1/12. In the rest of
this section, we do present a formal justification.
    For each integer i with 1 ≤ i ≤ 6, we consider the event

                                Ai = {(T, i)}

and its probability
                                pi = Pr (Ai ) .

First observe that
                        p1 = p2 = p3 = p4 = p5 = p6 ,

because the die is fair. Let p denote the common value of the pi ’s. Next
observe that
                      R = A1 ∪ A2 ∪ A3 ∪ A4 ∪ A5 ∪ A6 ,

where the six events on the right-hand side are pairwise disjoint. We have
seen already that Pr(R) = 1/2. It follows that

                    1/2 = Pr(R)
                        = Pr(A1 ∪ A2 ∪ · · · ∪ A6 )
                        = Pr(A1 ) + Pr(A2 ) + · · · + Pr(A6 )
                        = p + p + · · · + p
                        = 6p,

implying that p = 1/12. Since the event A we are interested in is equal to
the event A5 , we conclude that

                      Pr(A) = Pr (A5 ) = p5 = p = 1/12.

Thus, we have obtained a formal proof of the fact that the probability of the
event A is equal 1/12.

   Using the definition of conditional probability, we can now also formally
determine Pr(A | R):

                   Pr(A | R) = Pr(A ∩ R)/Pr(R)
                             = Pr(A)/Pr(R)
                             = (1/12)/(1/2)
                             = 1/6.
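The formal argument can be mirrored in code by building the probability function on S outcome by outcome: each outcome’s probability is the product of the probabilities along its branch of the tree (heads then a coin flip, or tails then a die roll). A sketch of ours, with names of our own choosing:

```python
from fractions import Fraction

# Probability of each outcome of FlipAndFlipOrRoll.
pr = {}
for fb in ("H", "T"):                  # red coin heads, then flip the blue coin
    pr[("H", fb)] = Fraction(1, 2) * Fraction(1, 2)
for d in range(1, 7):                  # red coin tails, then roll the die
    pr[("T", d)] = Fraction(1, 2) * Fraction(1, 6)

assert sum(pr.values()) == 1           # a valid probability function

pA = pr[("T", 5)]                                    # Pr(A) for A = {(T, 5)}
pR = sum(p for o, p in pr.items() if o[0] == "T")    # Pr(R)
print(pA, pA / pR)  # 1/12 1/6
```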


5.9     The Law of Total Probability
Both Mick and Keith have a random birthday. What is the probability
that they have the same birthday? We have seen in Section 5.5 that this
probability is equal to 1/365. A common way to determine this probability
is as follows: Consider Mick’s birthday, which can be any of the 365 days
of the year. By symmetry, it does not really matter what Mick’s birthday
is, so we just assume that it is July 26. Then Mick and Keith have the
same birthday if and only if Keith’s birthday is also on July 26. Therefore,
since Keith has a random birthday, the probability that Mick and Keith
have the same birthday is equal to 1/365. The following theorem explains
this reasoning.

Theorem 5.9.1 (Law of Total Probability) Let (S, Pr) be a probability
space and let A be an event. Assume that B1 , B2 , . . . , Bn is a sequence of
events such that

  1. Pr (Bi ) > 0 for all i with 1 ≤ i ≤ n,

  2. the events B1 , B2 , . . . , Bn are pairwise disjoint, and

  3. B1 ∪ B2 ∪ · · · ∪ Bn = S.

Then
              Pr(A) = Σ_{i=1}^{n} Pr(A | Bi ) · Pr(Bi ).

Proof. The assumptions imply that

                  A = A ∩ S
                    = A ∩ (B1 ∪ B2 ∪ · · · ∪ Bn )
                    = (A ∩ B1 ) ∪ (A ∩ B2 ) ∪ · · · ∪ (A ∩ Bn ).

Since the events A ∩ B1 , A ∩ B2 , . . . , A ∩ Bn are pairwise disjoint, it follows
from Lemma 5.3.2 that
              Pr(A) = Pr((A ∩ B1 ) ∪ (A ∩ B2 ) ∪ · · · ∪ (A ∩ Bn ))
                    = Σ_{i=1}^{n} Pr(A ∩ Bi ).

The theorem follows by observing that, from Definition 5.8.1,

                      Pr (A ∩ Bi ) = Pr (A | Bi ) · Pr (Bi ) .



    Let us consider the three conditions in this theorem. The first condition
is that Pr (Bi ) > 0, i.e., there is a positive probability that event Bi occurs.
The second and third conditions, i.e.,
   • the events B1 , B2 , . . . , Bn are pairwise disjoint, and
   • B1 ∪ B2 ∪ · · · ∪ Bn = S,

are equivalent to
   • exactly one of the events B1 , B2 , . . . , Bn is guaranteed to occur.
  In the example in the beginning of this section, we wanted to know Pr(A),
where A is the event

               A = “Mick and Keith have the same birthday”.

In order to apply Theorem 5.9.1, we define a sequence B1 , B2 , . . . of events
that satisfy the conditions in this theorem and for which Pr (A | Bi ) is easy
to determine. For this example, we define the event Bi , for each i with
1 ≤ i ≤ 365, to be

           Bi = “Mick’s birthday is on the i-th day of the year”.

It is clear that (i) Pr (Bi ) = 1/365 > 0 and (ii) exactly one of the events
B1 , B2 , . . . , B365 is guaranteed to occur. It follows that

              Pr(A) = Σ_{i=1}^{365} Pr(A | Bi ) · Pr(Bi ).


To determine Pr (A | Bi ), we assume that the event Bi occurs, i.e., we fix
Mick’s birthday to be the i-th day of the year. Given this event Bi , event A
occurs if and only if Keith’s birthday is also on the i-th day. Thus, we have
Pr (A | Bi ) = 1/365 and it follows that

              Pr(A) = Σ_{i=1}^{365} (1/365) · Pr(Bi )
                    = (1/365) · Σ_{i=1}^{365} Pr(Bi )
                    = (1/365) · 1
                    = 1/365,

which is the same answer as we got in the beginning of this section.
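The computation above is a direct instance of Theorem 5.9.1 and takes one line to replay with exact rational arithmetic (a sketch of ours):

```python
from fractions import Fraction

# Pr(A) = sum over Mick's 365 possible birthdays of Pr(A | Bi) * Pr(Bi),
# with Pr(Bi) = 1/365 and Pr(A | Bi) = 1/365 for every day i.
pA = sum(Fraction(1, 365) * Fraction(1, 365) for _ in range(365))
print(pA)  # 1/365
```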


5.9.1    Flipping a Coin and Rolling Dice
Consider the following experiment:

   • We flip a fair coin.

        – If the coin comes up heads, then we roll a fair die. Let R denote
          the result of this die.
        – If the coin comes up tails, then we roll two fair dice. Let R denote
          the sum of the results of these dice.

What is the probability that the value of R is equal to 2? That is, if we
define the event A to be

                    A = “the value of R is equal to 2”,

then we want to know Pr(A). Since the value of R depends on whether the
coin comes up heads or tails, we define the event

                      B = “the coin comes up heads”.

Since (i) both B and its complement B̄ occur with a positive probability
and (ii) exactly one of B and B̄ is guaranteed to occur, we can apply Theo-
rem 5.9.1 and get

               Pr(A) = Pr(A | B) · Pr(B) + Pr(A | B̄) · Pr(B̄).

We determine the four terms on the right-hand side:
   • It should be clear that

                           Pr(B) = Pr(B̄) = 1/2.

   • To determine Pr(A | B), we assume that the event B occurs, i.e., the
     coin comes up heads. Because of this assumption, we roll one die, and
     the event A occurs if and only if the result of this roll is 2. It follows
     that
                                Pr(A | B) = 1/6.
                             
   • To determine Pr(A | B̄), we assume that the event B̄ occurs, i.e., the
     coin comes up tails. Because of this assumption, we roll two dice, and
     the event A occurs if and only if both rolls result in 1. Since there are
     36 possible outcomes when rolling two dice, it follows that

                               Pr(A | B̄) = 1/36.

We conclude that

             Pr(A) = Pr(A | B) · Pr(B) + Pr(A | B̄) · Pr(B̄)
                   = 1/6 · 1/2 + 1/36 · 1/2
                   = 7/72.
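The same computation, done exactly with rational arithmetic (a sketch of ours; the event counts come from the two bullet points above):

```python
from fractions import Fraction

# Heads: roll one die; R = 2 with probability 1/6.
heads_term = Fraction(1, 2) * Fraction(1, 6)

# Tails: roll two dice; R = 2 only for the outcome (1, 1), one of 36.
ways = sum(1 for a in range(1, 7) for b in range(1, 7) if a + b == 2)
tails_term = Fraction(1, 2) * Fraction(ways, 36)

print(heads_term + tails_term)  # 7/72
```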

5.10       Please Take a Seat
Let n ≥ 2 and k ≥ 0 be integers. There are n + k chairs C1 , C2 , . . . , Cn+k
inside a room. Outside this room, there are n people P1 , P2 , . . . , Pn . These
people are told to enter the room one by one, in increasing order of their
indices, and each person must sit down in the chair having her index: For
i = 1, 2, . . . , n, person Pi enters the room and sits down in chair Ci .
    The first person P1 does not listen to the instructions and, instead of taking
chair C1 , chooses one of the n + k chairs uniformly at random and sits down
in the chosen chair. (Note that chair C1 may be the chosen chair.) From
then on, for i = 2, 3, . . . , n, person Pi checks if chair Ci is available. If this
is the case, then Pi sits down in chair Ci . Otherwise, Pi chooses one of the
available chairs uniformly at random and sits down in the chosen chair.
    We want to determine the probability pn,k that, at the end, the last person
Pn sits in chair Cn . Before we analyze this probability, we present this process
in pseudocode:
      Algorithm TakeASeat(n, k):

          // n ≥ 2 and k ≥ 0;
          // the input consists of n people P1 , P2 , . . . , Pn and
          // n + k chairs C1 , C2 , . . . , Cn+k
          j = uniformly random element in {1, 2, . . . , n + k};
          person P1 sits down in chair Cj ;
          for i = 2 to n
          do if chair Ci is available
              then person Pi sits down in chair Ci
              else j = index of a uniformly random available chair;
                   person Pi sits down in chair Cj
              endif
          endfor

   We consider the event
        An,k = “after algorithm TakeASeat(n, k) has terminated,
               person Pn sits in chair Cn ”.
The probability that was mentioned above is given by

                                pn,k = Pr (An,k ) .

    In the for-loop in algorithm TakeASeat(n, k), the variable i runs from
2 to n. We will label the iterations of this loop by the value of the variable i.
Thus, iteration 3 will refer to the iteration in which i = 3; observe that this
is actually the second time that the algorithm goes through the for-loop.
    At this moment, you should convince yourself (for example, by induction
on i) that the following holds for each i = 2, 3, . . . , n:

   • At the start of iteration i,

         – all chairs C2 , C3 , . . . , Ci−1 have been taken, and
         – exactly one of the chairs C1 , Ci , Ci+1 , . . . , Cn+k has been taken.

If we take i = n, then we see that at the start of iteration n

   • all chairs C2 , C3 , . . . , Cn−1 have been taken, and

   • exactly one of the k + 2 chairs C1 , Cn , Cn+1 , . . . , Cn+k has been taken.

Event An,k occurs if and only if chair Cn is available (i.e., has not been taken)
at the start of iteration n.
    Is it true that the chair among C1 , Cn , Cn+1 , . . . , Cn+k that has been taken
at the start of iteration n is a uniformly random chair from these k +2 chairs?
If this is the case, then chair Cn has been taken with probability 1/(k + 2)
and, thus, Cn is available with probability 1 − 1/(k + 2) = (k + 1)/(k + 2).
In other words, if the question above has a positive answer, then

                                                  k+1
                            pn,k = Pr (An,k ) =       .
                                                  k+2
In the rest of this section, we will present two ways to prove that this is
indeed the correct value of pn,k . In both proofs, we will use the Law of Total
Probability of Section 5.9.
   Note that pn,k does not depend on n. In particular, if k = 0, then the
probability that person Pn sits in chair Cn is equal to 1/2.
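Before proving that pn,k = (k + 1)/(k + 2), it is worth checking the claim empirically. The following Monte Carlo sketch is our own translation of algorithm TakeASeat into Python (chair Ci is represented by the integer i):

```python
import random

def take_a_seat(n, k):
    """One run of TakeASeat(n, k); returns True iff P_n sits in chair C_n,
    i.e., iff chair C_n is still available when P_n enters the room."""
    taken = {random.randint(1, n + k)}       # P1 picks a uniform random chair
    for i in range(2, n):                    # persons P_2, ..., P_{n-1}
        if i not in taken:
            taken.add(i)                     # P_i sits in its own chair C_i
        else:
            # C_i is taken: P_i picks a uniformly random available chair.
            taken.add(random.choice(
                [c for c in range(1, n + k + 1) if c not in taken]))
    return n not in taken                    # P_n gets C_n iff it is free

random.seed(2019)
trials = 20000
estimate = sum(take_a_seat(5, 2) for _ in range(trials)) / trials
print(round(estimate, 2))  # close to (k+1)/(k+2) = 3/4 for k = 2
```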


5.10.1      Determining pn,k Using a Recurrence Relation
Let us start with the case when n = 2. Thus, there are two people P1 and P2 ,
and 2 + k chairs C1 , C2 , . . . , C2+k . Event A2,k occurs if and only if P1 chooses
one of the 1 + k chairs C1 , C3 , C4 , . . . , C2+k . Since P1 chooses a uniformly
random chair out of 2 + k chairs, it follows that

                                                  k+1
                            p2,k = Pr (A2,k ) =       .
                                                  k+2

    Assume from now on that n ≥ 3. We are going to derive a recurrence
relation that expresses pn,k in terms of p2,k , p3,k , . . . , pn−1,k .
    Consider the (random) index j of the chair that P1 chooses in the first
line of algorithm TakeASeat(n, k). We consider three cases, depending on
the value of j.

   • Assume that j ∈ {1, n + 1, n + 2, . . . , n + k}. Then for each i =
     2, 3, . . . , n, chair Ci is available at the start of iteration i and person Pi
     sits down in chair Ci . In particular, during iteration n, Pn sits down in
     chair Cn and event An,k occurs.

   • Assume that j = n. Then chair Cn has been taken at the start of
     iteration n and event An,k does not occur.

   • Assume that j ∈ {2, 3, . . . , n − 1}. Then for each i = 2, 3, . . . , j − 1,
     chair Ci is available at the start of iteration i and person Pi sits down in
     chair Ci . At the start of iteration j, the chairs C1 , Cj+1 , Cj+2 , . . . , Cn+k
     are available and person Pj chooses one of these chairs uniformly at ran-
     dom. Thus, iterations j, j +1, . . . , n can be viewed as running algorithm
     TakeASeat(n−j+1, k), where the n−j+1 people are Pj , Pj+1 , . . . , Pn
     and the n − j + 1 + k chairs are C1 , Cj+1 , Cj+2 , . . . , Cn+k . In this case,
     event An,k occurs if and only if, after algorithm TakeASeat(n − j +
     1, k) has terminated, person Pn sits in chair Cn , i.e., event An−j+1,k
     occurs.

Thus, we can determine the probability that event An,k occurs, if we are given
the value of j; note that this is a conditional probability. Since j is a random
element in the set {1, 2, . . . , n + k}, we are going to use the Law of Total
Probability (Theorem 5.9.1): For each j ∈ {1, 2, . . . , n + k}, we consider the
event

       Bn,k,j = “in the second line of algorithm TakeASeat(n, k),
                person P1 sits down in chair Cj ”.

Since exactly one of these events is guaranteed to occur, we can apply The-
orem 5.9.1 and obtain
          Pr(An,k ) = Σ_{j=1}^{n+k} Pr(An,k | Bn,k,j ) · Pr(Bn,k,j ).

It follows from the first line in algorithm TakeASeat(n, k) that, for each j
with 1 ≤ j ≤ n + k,
                                               1
                               Pr (Bn,k,j ) =     .
                                              n+k
   • Assume that j ∈ {1, n + 1, n + 2, . . . , n + k}. We have seen above that
     event An,k occurs. Thus,

                                 Pr (An,k | Bn,k,j ) = 1.

   • Assume that j = n. We have seen above that event An,k does not
     occur. Thus,
                         Pr (An,k | Bn,k,n ) = 0.

   • Assume that j ∈ {2, 3, . . . , n − 1}. We have seen above that event An,k
     occurs if and only if event An−j+1,k occurs. Thus,

                   Pr (An,k | Bn,k,j ) = Pr (An−j+1,k ) = pn−j+1,k .

We conclude that

             pn,k = Pr(An,k )
                  = Σ_{j=1}^{n+k} Pr(An,k | Bn,k,j ) · Pr(Bn,k,j )
                  = Σ_{j=1}^{n+k} Pr(An,k | Bn,k,j ) · (1/(n + k))
                  = (1/(n + k)) · Σ_{j=1}^{n+k} Pr(An,k | Bn,k,j )
                  = (1/(n + k)) · ((k + 1) + Σ_{j=2}^{n-1} pn−j+1,k ).

If we write out the terms in this summation, then we get, for each n ≥ 3,

                       k+1   1
              pn,k =       +   (p2,k + p3,k + · · · + pn−1,k ) .
                       n+k n+k

As we have seen above, the base case is given by

                                          k+1
                                 p2,k =       .
                                          k+2

It remains to solve the recurrence relation. If you use the recurrence to
determine pn,k for some small values of n, then you will notice that they are
all equal to (k + 1)/(k + 2). This suggests that

                                          k+1
                                 pn,k =
                                          k+2

for all integers n ≥ 2. (Recall that we already suspected this.) Using induc-
tion on n, it can easily be proved that this is indeed the case.
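Unrolling the recurrence with exact rational arithmetic is how one would “notice” this pattern for small values of n. A sketch of ours:

```python
from fractions import Fraction

def p(n, k):
    """p_{n,k} computed from the recurrence, with base case
    p_{2,k} = (k+1)/(k+2)."""
    vals = {2: Fraction(k + 1, k + 2)}       # vals[m] = p_{m,k}
    for m in range(3, n + 1):
        vals[m] = (Fraction(k + 1)
                   + sum(vals[j] for j in range(2, m))) / (m + k)
    return vals[n]

# Every computed value collapses to (k+1)/(k+2), independent of n.
assert all(p(n, k) == Fraction(k + 1, k + 2)
           for n in range(2, 12) for k in range(0, 5))
print(p(10, 3))  # 4/5
```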



5.10.2     Determining pn,k by Modifying the Algorithm

Our second solution is obtained by modifying algorithm TakeASeat(n, k):
Person P1 again does not listen to the instructions and, instead of taking
chair C1 , chooses one of the n + k chairs uniformly at random and sits down
in the chosen chair. From then on, for i = 2, 3, . . . , n − 1, person Pi checks
if chair Ci is available. If this is the case, then Pi sits down in chair Ci .
Otherwise, P1 is sitting in Ci , in which case (i) Pi kicks P1 out of chair Ci ,
(ii) Pi sits down in chair Ci , and (iii) P1 chooses one of the available chairs
uniformly at random and sits down in the chosen chair. In the last step,
person Pn checks if chair Cn is available. If this is the case, then Pn sits down
in chair Cn . Otherwise, Pn chooses one of the available chairs uniformly at
random and sits down in the chosen chair. In pseudocode, this modified
algorithm looks as follows:

     Algorithm TakeASeat′ (n, k):

          // n ≥ 2 and k ≥ 0;
          // the input consists of n people P1 , P2 , . . . , Pn and
          // n + k chairs C1 , C2 , . . . , Cn+k
          j = uniformly random element in {1, 2, . . . , n + k};
          person P1 sits down in chair Cj ;
          for i = 2 to n − 1
          do // P2 sits in C2 , P3 sits in C3 , . . . , Pi−1 sits in Ci−1
              if chair Ci has been taken
              then // P1 sits in Ci
                    j = uniformly random element
                         in {1, i + 1, i + 2, . . . , n + k};
                    person P1 sits down in chair Cj
              endif;
              person Pi sits down in chair Ci
          endfor;
          // P2 sits in C2 , P3 sits in C3 , . . . , Pn−1 sits in Cn−1
          if chair Cn is available
          then person Pn sits down in chair Cn
          else j = uniformly random element
                    in {1, n + 1, n + 2, . . . , n + k};
                person Pn sits down in chair Cj
          endif

   As in the previous subsection, we label the iterations of the for-loop by
the value of the variable i. Moreover, we consider the first two lines of the
algorithm to be iteration 1. Thus, up to the end of the for-loop, algorithm
TakeASeat′ (n, k) makes iterations that are labeled 1, 2, . . . , n − 1.
   It follows from algorithm TakeASeat′ (n, k) that, after the for-loop has
terminated,

   • for each i = 2, 3, . . . , n − 1, person Pi sits in chair Ci , and

   • person P1 sits in one of the chairs C1 , Cn , Cn+1 , . . . , Cn+k .

    Recall that An,k is the event that person Pn sits in chair Cn , after the
original algorithm TakeASeat(n, k) has terminated. It follows from the
modified algorithm TakeASeat′ (n, k) that event An,k occurs if and only if,
after the for-loop of algorithm TakeASeat′ (n, k) has terminated, person P1
sits in one of the chairs C1 , Cn+1 , Cn+2 , . . . , Cn+k . In other words, pn,k =
Pr (An,k ) is equal to the probability that, after the for-loop has terminated,
P1 sits in one of C1 , Cn+1 , Cn+2 , . . . , Cn+k .
    We have seen that, after the for-loop has terminated, P1 sits in one of the
chairs C1 , Cn , Cn+1 , . . . , Cn+k . Thus, there is a value of i with 1 ≤ i ≤ n − 1,
such that P1 sits down in one of these chairs during iteration i. As soon as P1
sits down in one of these chairs, P1 stays there until the end of the algorithm.
This implies that there is exactly one integer i having this property. Based
on this, and since this integer i is random, we are again going to use the
Law of Total Probability (Theorem 5.9.1): For each i ∈ {1, 2, . . . , n − 1}, we
consider the event
            Bn,k,i = “during iteration i, person P1 chooses one of
                           the chairs C1 , Cn , Cn+1 , . . . , Cn+k .”
Since exactly one of these events is guaranteed to occur, Theorem 5.9.1 im-
plies that
$$\Pr(A_{n,k}) = \sum_{i=1}^{n-1} \Pr(A_{n,k} \mid B_{n,k,i}) \cdot \Pr(B_{n,k,i}).$$
Consider the event
           Bn,k = “a uniformly random element from the set
                   {1, n, n + 1, . . . , n + k} is not equal to n.”
Then for each i with 1 ≤ i ≤ n − 1, we have
$$\Pr(A_{n,k} \mid B_{n,k,i}) = \Pr(B_{n,k}) = \frac{k+1}{k+2}.$$
It follows that
$$p_{n,k} = \Pr(A_{n,k}) = \sum_{i=1}^{n-1} \frac{k+1}{k+2} \cdot \Pr(B_{n,k,i}) = \frac{k+1}{k+2} \sum_{i=1}^{n-1} \Pr(B_{n,k,i}) = \frac{k+1}{k+2},$$
because the last summation is equal to 1.
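The closed form pn,k = (k + 1)/(k + 2) is easy to sanity-check. The following is a small Monte Carlo sketch in Python (the function name and parameters are ours, not part of the text): it estimates Pr (Bn,k ), the probability that a uniformly random element of the (k + 2)-element set {1, n, n + 1, . . . , n + k} is not equal to n.

```python
import random

def estimate_B(n, k, trials=200_000, seed=42):
    """Estimate Pr(B_{n,k}): a uniformly random element of the
    (k+2)-element set {1, n, n+1, ..., n+k} is not equal to n."""
    rng = random.Random(seed)
    chairs = [1] + list(range(n, n + k + 1))   # the k+2 candidate chairs
    hits = sum(1 for _ in range(trials) if rng.choice(chairs) != n)
    return hits / trials

# The analysis predicts (k+1)/(k+2); for k = 3 that is 4/5 = 0.8.
print(estimate_B(n=10, k=3))
```

With a few hundred thousand trials the estimate agrees with (k + 1)/(k + 2) to two decimal places.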

5.11      Independent Events
Consider two events A and B in a sample space S. In this section, we will
define the notion of these two events being “independent”. Intuitively, this
should express that (i) the probability that event A occurs does not depend
on whether or not event B occurs, and (ii) the probability that event B
occurs does not depend on whether or not event A occurs. Thus, if we
assume that Pr(A) > 0 and Pr(B) > 0, then (i) Pr(A) should be equal to
the conditional probability Pr(A | B), and (ii) Pr(B) should be equal to
the conditional probability Pr(B | A). As we will show below, the following
definition exactly captures this.

Definition 5.11.1 Let (S, Pr) be a probability space and let A and B be
two events. We say that A and B are independent if

                        Pr(A ∩ B) = Pr(A) · Pr(B).

   In this definition, it is not assumed that Pr(A) > 0 and Pr(B) > 0. If
Pr(B) > 0, then
$$\Pr(A \mid B) = \frac{\Pr(A \cap B)}{\Pr(B)},$$
and A and B are independent if and only if

                            Pr(A | B) = Pr(A).

Similarly, if Pr(A) > 0, then A and B are independent if and only if

                            Pr(B | A) = Pr(B).


5.11.1     Rolling Two Dice
Assume we roll a red die and a blue die; thus, the sample space is

                    S = {(i, j) : 1 ≤ i ≤ 6, 1 ≤ j ≤ 6},

where i is the result of the red die and j is the result of the blue die. We
assume a uniform probability function. Thus, each outcome has a probability
of 1/36.
   Let D1 denote the result of the red die and let D2 denote the result of
the blue die. Consider the events

                            A = “D1 + D2 = 7”

and
                               B = “D1 = 4”.
Are these events independent?

   • Since
                  A = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)},
      we have Pr(A) = 6/36 = 1/6.

   • Since
                  B = {(4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6)},
      we have Pr(B) = 6/36 = 1/6.

   • Since
                                  A ∩ B = {(4, 3)},
      we have Pr(A ∩ B) = 1/36.

   • It follows that Pr(A ∩ B) = Pr(A) · Pr(B) and we conclude that A and
     B are independent.
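This case analysis can be verified mechanically. The following Python sketch (the helper pr is our own) enumerates the 36 equally likely outcomes and checks Definition 5.11.1 with exact rational arithmetic:

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (i, j) of rolling a red and a blue die.
S = list(product(range(1, 7), repeat=2))

def pr(event):
    """Exact probability of an event, given as a predicate on outcomes."""
    return Fraction(sum(1 for o in S if event(o)), len(S))

A = lambda o: o[0] + o[1] == 7      # "D1 + D2 = 7"
B = lambda o: o[0] == 4             # "D1 = 4"

p_ab = pr(lambda o: A(o) and B(o))
print(pr(A), pr(B), p_ab)           # 1/6 1/6 1/36
print(p_ab == pr(A) * pr(B))        # True, so A and B are independent
```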

   As an exercise, you should verify that the events

                           A′ = “D1 + D2 = 11”

and
                                B′ = “D1 = 5”
are not independent.
    Now consider the two events

                            A″ = “D1 + D2 = 4”

and
                               B″ = “D1 = 4”.
Since A″ ∩ B″ = ∅, we have
$$\Pr(A'' \cap B'') = \Pr(\emptyset) = 0.$$
On the other hand, Pr (A″) = 1/12 and Pr (B″) = 1/6. Thus,
$$\Pr(A'' \cap B'') \neq \Pr(A'') \cdot \Pr(B'')$$
and the events A″ and B″ are not independent. This is not surprising: If we
know that B″ occurs, then A″ does not occur, i.e., Pr (A″ | B″) = 0. Thus,
the event B″ has an effect on the probability that the event A″ occurs.

5.11.2     A Basic Property of Independent Events
Consider two events A and B in a sample space S. If these events are
independent, then the probability that A occurs does not depend on whether
or not B occurs. Since the event B occurring is the same as its complement
$\overline{B}$ not occurring, it should not be a surprise that the events A and $\overline{B}$ are
independent as well. The following lemma states that
this is indeed the case.
Lemma 5.11.2 Let (S, Pr) be a probability space and let A and B be two
events. If A and B are independent, then A and $\overline{B}$ are also independent.
Proof. To prove that A and $\overline{B}$ are independent, we have to show that
$$\Pr(A \cap \overline{B}) = \Pr(A) \cdot \Pr(\overline{B}).$$
Using Lemma 5.3.3, this is equivalent to showing that
$$\Pr(A \cap \overline{B}) = \Pr(A) \cdot (1 - \Pr(B)). \qquad (5.4)$$
Since the events $A \cap B$ and $A \cap \overline{B}$ are disjoint and
$$A = (A \cap B) \cup (A \cap \overline{B}),$$
it follows from Lemma 5.3.2 that
$$\Pr(A) = \Pr(A \cap B) + \Pr(A \cap \overline{B}).$$
Since A and B are independent, we have
$$\Pr(A \cap B) = \Pr(A) \cdot \Pr(B).$$
It follows that
$$\Pr(A) = \Pr(A) \cdot \Pr(B) + \Pr(A \cap \overline{B}),$$
which is equivalent to (5.4).

5.11.3      Pairwise and Mutually Independent Events
We have defined the notion of two events being independent. The following
definition generalizes this in two ways to sequences of events:

Definition 5.11.3 Let (S, Pr) be a probability space, let n ≥ 2, and let
A1 , A2 , . . . , An be a sequence of events.
   1. We say that this sequence is pairwise independent if for any two distinct
      indices i and j, the events Ai and Aj are independent, i.e.,
                            Pr (Ai ∩ Aj ) = Pr (Ai ) · Pr (Aj ) .

   2. We say that this sequence is mutually independent if for all k with
      2 ≤ k ≤ n and all indices i1 < i2 < . . . < ik ,
             Pr (Ai1 ∩ Ai2 ∩ · · · ∩ Aik ) = Pr (Ai1 ) · Pr (Ai2 ) · · · Pr (Aik ) .

    Thus, in order to show that the sequence A1 , A2 , . . . , An is pairwise inde-
pendent, we have to verify $\binom{n}{2}$ equalities. On the other hand, to show that
this sequence is mutually independent, we have to verify $\sum_{k=2}^{n} \binom{n}{k} = 2^n - 1 - n$
equalities.
    For example, if we want to prove that the sequence A, B, C of three events
is mutually independent, then we have to show that
                           Pr(A ∩ B) = Pr(A) · Pr(B),
                           Pr(A ∩ C) = Pr(A) · Pr(C),
                           Pr(B ∩ C) = Pr(B) · Pr(C),
and
                    Pr(A ∩ B ∩ C) = Pr(A) · Pr(B) · Pr(C).
   To give an example, consider flipping a coin three times and assume that
the result is a uniformly random element from the sample space
         S = {HHH, HHT, HT H, T HH, HT T, T HT, T T H, T T T },
where, e.g., HHT indicates that the first two flips result in heads and the
third flip results in tails. For i = 1, 2, 3, let fi denote the result of the i-th
flip, and consider the events
                                   A = “f1 = f2 ”,
                                B = “f2 = f3 ”,
and
                                C = “f1 = f3 ”.
If we write these events as subsets of the sample space, then we get

                        A = {HHH, HHT, T T H, T T T },

                        B = {HHH, T HH, HT T, T T T },
and
                        C = {HHH, HT H, T HT, T T T }.
It follows that
                    Pr(A)     =   |A|/|S|     =   4/8   =   1/2,
                    Pr(B)     =   |B|/|S|     =   4/8   =   1/2,
                    Pr(C)     =   |C|/|S|     =   4/8   =   1/2,
                  Pr(A ∩ B)   = |A ∩ B|/|S|   =   2/8   =   1/4,
                  Pr(A ∩ C)   = |A ∩ C|/|S|   =   2/8   =   1/4,
                  Pr(B ∩ C)   = |B ∩ C|/|S|   =   2/8   =   1/4.

Thus, the sequence A, B, C is pairwise independent. Since

                          A ∩ B ∩ C = {HHH, T T T },

we have
                  Pr(A ∩ B ∩ C) = |A ∩ B ∩ C|/|S| = 2/8 = 1/4.
Thus,
$$\Pr(A \cap B \cap C) \neq \Pr(A) \cdot \Pr(B) \cdot \Pr(C)$$
and, therefore, the sequence A, B, C is not mutually independent. Of course,
this is not surprising: If both events A and B occur, then event C also occurs.
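The same kind of exhaustive check, in a short Python sketch over the eight outcomes (the helper pr is our own), confirms that the sequence A, B, C is pairwise but not mutually independent:

```python
from fractions import Fraction
from itertools import product

S = list(product("HT", repeat=3))   # the 8 equally likely flip sequences

def pr(event):
    """Exact probability of an event, given as a predicate on outcomes."""
    return Fraction(sum(1 for o in S if event(o)), len(S))

A = lambda o: o[0] == o[1]          # "f1 = f2"
B = lambda o: o[1] == o[2]          # "f2 = f3"
C = lambda o: o[0] == o[2]          # "f1 = f3"

# Every pair satisfies the product rule, so the sequence is pairwise independent.
pairwise = all(pr(lambda o, x=X, y=Y: x(o) and y(o)) == pr(X) * pr(Y)
               for X, Y in [(A, B), (A, C), (B, C)])
print(pairwise)                                     # True

# The triple intersection violates the product rule, so mutual independence fails.
p_abc = pr(lambda o: A(o) and B(o) and C(o))
print(p_abc, pr(A) * pr(B) * pr(C))                 # 1/4 1/8
```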


5.12       Describing Events by Logical Proposi-
           tions
We have defined an event to be a subset of a sample space. In several
examples, however, we have described events in plain English or as logical
propositions.
   • Since the intersection (∩) of sets corresponds to the conjunction (∧) of
     propositions, we often write A ∧ B for the event “both A and B occur”.

   • Similarly, since the union (∪) of sets corresponds to the disjunction (∨)
     of propositions, we often write A ∨ B for the event “A or B occurs”.

5.12.1       Flipping a Coin and Rolling a Die
If we flip a coin and roll a die, the sample space is

           S = {H1, H2, H3, H4, H5, H6, T1, T2, T3, T4, T5, T6}.

The events
                       A = “the coin comes up heads”
and
                     B = “the result of the die is even”
correspond to the subsets

                       A = {H1, H2, H3, H4, H5, H6}

and
                        B = {H2, H4, H6, T2, T4, T6}
of the sample space S, respectively. The event that both A and B occur is
written as A ∧ B and corresponds to the subset

                            A ∩ B = {H2, H4, H6}

of S. The event that A or B occurs is written as A ∨ B and corresponds to
the subset

               A ∪ B = {H1, H2, H3, H4, H5, H6, T2, T4, T6}

of S.
    Assume that both the coin and the die are fair, and the results of rolling
the die and flipping the coin are independent. The probability that both A
and B occur, i.e., Pr(A ∧ B), is equal to |A ∩ B|/|S| = 3/12 = 1/4. We can
also use independence to determine this probability:

               Pr(A ∧ B) = Pr(A) · Pr(B) = 1/2 · 3/6 = 1/4.
Observe that when we determine Pr(A), we do not consider the entire sample
space S. Instead, we consider the coin’s sample space, which is {H, T }.
Similarly, when we determine Pr(B), we consider the die’s sample space,
which is {1, 2, 3, 4, 5, 6}.
   The probability that A or B occurs, i.e., Pr(A ∨ B), is equal to

                   Pr(A ∨ B) = |A ∪ B|/|S| = 9/12 = 3/4.

5.12.2     Flipping Coins
Let n ≥ 2 be an integer and assume we flip n fair coins. For each i with
1 ≤ i ≤ n, consider the event

                    Ai = “the i-th coin comes up heads”.

We assume that the coin flips are independent of each other, by which we
mean that the sequence A1 , A2 , . . . , An of events is mutually independent.
Consider the event
                        A = A1 ∧ A2 ∧ · · · ∧ An .
What is Pr(A), i.e., the probability that all n coins come up heads? Since
there are 2^n possible outcomes for n coin flips and only one of them
satisfies event A, this probability is equal to 1/2^n . Alternatively, we can use
independence to determine Pr(A):

                   Pr(A) = Pr (A1 ∧ A2 ∧ · · · ∧ An )
                         = Pr (A1 ) · Pr (A2 ) · · · Pr (An ) .

Since each coin is fair, we have Pr (Ai ) = 1/2 and, thus, we get
$$\Pr(A) = \underbrace{(1/2) \cdot (1/2) \cdots (1/2)}_{n\ \text{times}} = (1/2)^n = 1/2^n.$$

5.12.3     The Probability of a Circuit Failing
Consider a circuit C that consists of n components C1 , C2 , . . . , Cn . Let p
be a real number with 0 < p < 1 and assume that any component fails
with probability p, independently of the other components. For each i with
1 ≤ i ≤ n, consider the event

                         Ai = “component Ci fails”.
Let A be the event
                       A = “the entire circuit fails”.
   • Assume that the entire circuit fails when at least one component fails.
     What is Pr(A), i.e., the probability that the circuit fails? By our as-
     sumption, we have
                            A = A1 ∨ A2 ∨ · · · ∨ An
       and, thus, using De Morgan’s Law,
$$\overline{A} = \overline{A_1} \wedge \overline{A_2} \wedge \cdots \wedge \overline{A_n}.$$
       Using independence and Lemmas 5.3.3 and 5.11.2, we get
$$\begin{aligned}
\Pr(A) &= 1 - \Pr(\overline{A}) \\
&= 1 - \Pr(\overline{A_1} \wedge \overline{A_2} \wedge \cdots \wedge \overline{A_n}) \\
&= 1 - \Pr(\overline{A_1}) \cdot \Pr(\overline{A_2}) \cdots \Pr(\overline{A_n}) \\
&= 1 - \underbrace{(1-p)(1-p)\cdots(1-p)}_{n\ \text{times}} \\
&= 1 - (1-p)^n.
\end{aligned}$$
       Since 0 < p < 1, we have $\lim_{n\to\infty} \Pr(A) = 1$. We conclude that for
       large values of n, it is very likely that the circuit fails.
   • Now assume that the entire circuit fails when all components fail.
     Again, we want to know the probability Pr(A) that the circuit fails.
     In this case, we have
                             A = A1 ∧ A2 ∧ · · · ∧ An ,
       and we get
$$\Pr(A) = \Pr(A_1 \wedge A_2 \wedge \cdots \wedge A_n) = \Pr(A_1) \cdot \Pr(A_2) \cdots \Pr(A_n) = \underbrace{p \cdot p \cdots p}_{n\ \text{times}} = p^n.$$
       Since 0 < p < 1, we have $\lim_{n\to\infty} \Pr(A) = 0$. Thus, for large values
      of n, it is very likely that the circuit does not fail.
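Both failure models are easy to tabulate. The following Python sketch (the function names and the component failure probabilities are illustrative choices of ours, not from the text) shows the two limits numerically:

```python
def pr_fails_if_any(p, n):
    """Failure probability when the circuit fails as soon as at least
    one of the n components fails: 1 - (1 - p)^n."""
    return 1 - (1 - p) ** n

def pr_fails_if_all(p, n):
    """Failure probability when the circuit fails only if all
    n components fail: p^n."""
    return p ** n

# As n grows, the first probability tends to 1 and the second to 0.
for n in (10, 100, 1000):
    print(n, pr_fails_if_any(0.05, n), pr_fails_if_all(0.95, n))
```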

5.13       Choosing a Random Element in a Linked
           List
Consider a linked list L. Each node u in L stores a pointer to its successor
node succ(u). If u is the last node in L, then u does not have a successor and
succ(u) = nil . We are also given a pointer to the first node head (L) of L.
    Our task is to choose, uniformly at random, a node in L. Thus, if this list
has n nodes, then each node must have a probability of 1/n of being chosen.
    We assume that we are given a function Random: For any integer
i ≥ 1, a call to Random(i) returns a uniformly random element from the
set {1, 2, . . . , i}; the value returned is independent of all other calls to this
function.
    To make the problem interesting, we assume that we do not know the
value of n, i.e., at the start, we do not know the number of nodes in the
list L. Also, we are allowed to only make one pass over this list. We will
prove below that the following algorithm solves the problem:
    Algorithm ChooseRandomNode(L):

          u = head (L);
          i = 1;
          while u ≠ nil
          do r = Random(i);
              if r = 1
              then x = u
              endif;
              u = succ(u);
              i=i+1
          endwhile;
          return x

    In one iteration of the while-loop, the call to Random(i) returns a uni-
formly random element r from the set {1, 2, . . . , i}. If r = 1, which happens
with probability 1/i, the value of x is set to the currently visited node. If
r 6= 1, which happens with probability 1−1/i, the value of x does not change
during this iteration of the while-loop. Thus,
   • in the first iteration, x is set to the first node of L with probability 1,

   • in the second iteration, x is set to the second node of L with probability
     1/2, whereas the value of x does not change with probability 1/2,

   • in the third iteration, x is set to the third node of L with probability
     1/3, whereas the value of x does not change with probability 2/3,

   • in the last iteration, x is set to the last node of L with probability 1/|L|,
     whereas the value of x does not change with probability (|L| − 1)/|L|.

    We now prove that the output x of algorithm ChooseRandomNode(L)
is a uniformly random node of the list L. Let n denote the number of nodes
in L and let v be an arbitrary node in L. We will prove that, after the
algorithm has terminated, x = v with probability 1/n.
    Let k be the integer such that v is the k-th node in L; thus, 1 ≤ k ≤ n.
We observe that, after the algorithm has terminated, x = v if and only if

   • during the k-th iteration, the value of x is set to v, and

   • for all i = k + 1, k + 2, . . . , n, during the i-th iteration, the value of x
     does not change.

Consider the event

             A = “after the algorithm has terminated, x = v”.

For each i with 1 ≤ i ≤ n, consider the event

          Ai = “the value of x changes during the i-th iteration”.

Then
$$A = A_k \wedge \overline{A_{k+1}} \wedge \overline{A_{k+2}} \wedge \overline{A_{k+3}} \wedge \cdots \wedge \overline{A_n}.$$

Recall that we assume that the output of the function Random is inde-
pendent of all other calls to this function. This implies that the events
A1 , A2 , . . . , An are mutually independent. It follows that
$$\begin{aligned}
\Pr(A) &= \Pr\left(A_k \wedge \overline{A_{k+1}} \wedge \overline{A_{k+2}} \wedge \overline{A_{k+3}} \wedge \cdots \wedge \overline{A_n}\right) \\
&= \Pr(A_k) \cdot \Pr(\overline{A_{k+1}}) \cdot \Pr(\overline{A_{k+2}}) \cdot \Pr(\overline{A_{k+3}}) \cdots \Pr(\overline{A_n}) \\
&= \frac{1}{k} \cdot \left(1 - \frac{1}{k+1}\right) \cdot \left(1 - \frac{1}{k+2}\right) \cdot \left(1 - \frac{1}{k+3}\right) \cdots \left(1 - \frac{1}{n}\right) \\
&= \frac{1}{k} \cdot \frac{k}{k+1} \cdot \frac{k+1}{k+2} \cdot \frac{k+2}{k+3} \cdots \frac{n-1}{n} \\
&= \frac{1}{n}.
\end{aligned}$$
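Algorithm ChooseRandomNode is an instance of what is commonly called reservoir sampling, with a reservoir of size one. A runnable Python sketch, using an arbitrary iterable in place of the linked list for brevity (an assumption of ours), together with an empirical check of uniformity:

```python
import random

def choose_random_node(iterable, rng=random):
    """One-pass uniform choice from a sequence of unknown length:
    the i-th element replaces the current choice with probability 1/i."""
    x = None
    for i, u in enumerate(iterable, start=1):
        if rng.randrange(i) == 0:      # happens with probability 1/i
            x = u
    return x

# Empirical check: each of 5 items should be chosen about 1/5 of the time.
rng = random.Random(7)
counts = {v: 0 for v in "abcde"}
for _ in range(100_000):
    counts[choose_random_node("abcde", rng)] += 1
print(counts)
```

The single pass and the lack of any length information mirror the constraints stated in the text.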

5.14       Long Runs in Random Bitstrings
Let n be a large integer and assume we flip a fair coin n times, where all flips
are mutually independent. If we write 0 for heads and 1 for tails, then we
obtain a random bitstring

                                R = r1 r2 . . . rn .

A run of length k is a substring of R, all of whose bits are the same. For
example, the bitstring

                           00111100101000011000

contains, among others, the substring 1111 starting at position 3, the sub-
string 00 starting at position 1, and the substring 1 at position 9, which are
runs of lengths 4, 2, and 1, respectively.
   Would you be surprised to see a “long” run in the random bitstring R,
say a run of length about log n? Most people will answer this question with
“yes”. We will prove below, however, that the correct answer is “no”: The
probability that this happens is about 1 − 1/n^2 ; thus, it converges to 1 when n
goes to infinity. In other words, you should be surprised if a random bitstring
does not contain a run of length about log n.
   We choose a positive integer k and consider the event

                A = “R contains a run of length at least k”.
We are going to prove a lower bound on Pr(A) in terms of n and k. At the
end, we will show that by taking k to be slightly less than log n, we have
Pr(A) ≥ 1 − 1/n^2 .
   For each i with 1 ≤ i ≤ n − k + 1, we consider the event

        Ai = “the substring of length k starting at position i is a run”.

Since a run of length at least k can start at any of the positions 1, 2, . . . , n −
k + 1, we have
                         A = A1 ∨ A2 ∨ · · · ∨ An−k+1 ,
implying that
                     Pr(A) = Pr (A1 ∨ A2 ∨ · · · ∨ An−k+1 ) .
Observe that the events A1 , A2 , . . . , An−k+1 are not pairwise disjoint. As a
result, the probability on the right-hand side is difficult to analyze; it requires
the Principle of Inclusion and Exclusion (see Section 3.5). Because of this,
we consider the complement of the event A, i.e., the event

                  A = “each run in R has length less than k”.

Using De Morgan’s Law, we get
$$\overline{A} = \overline{A_1} \wedge \overline{A_2} \wedge \cdots \wedge \overline{A_{n-k+1}},$$
where $\overline{A_i}$ is the complement of Ai , i.e., the event

      $\overline{A_i}$ = “the substring of length k starting at position i is not a run”.

It follows that
$$\Pr(\overline{A}) = \Pr\left(\overline{A_1} \wedge \overline{A_2} \wedge \cdots \wedge \overline{A_{n-k+1}}\right). \qquad (5.5)$$
    We determine $\Pr(\overline{A_i})$ by first determining Pr (Ai ). The event Ai occurs
if and only if
                      ri = ri+1 = · · · = ri+k−1 = 0
or
                          ri = ri+1 = · · · = ri+k−1 = 1.
Since the coin flips are mutually independent, it follows that
$$\Pr(A_i) = 1/2^k + 1/2^k = 1/2^{k-1}$$
and, therefore,
$$\Pr(\overline{A_i}) = 1 - \Pr(A_i) = 1 - 1/2^{k-1}.$$

    Is the probability on the right-hand side of (5.5) equal to the product of
the individual probabilities? If the events $\overline{A_1}, \overline{A_2}, \ldots, \overline{A_{n-k+1}}$ are mutually
independent, then the answer is “yes”. However, it should be clear that, for
example, the events $\overline{A_1}$ and $\overline{A_2}$ are not independent: If we are told that event
A1 occurs, then the first k bits in the bitstring R are equal; let us say they
are all equal to 0. In this case, the probability that event A2 occurs is equal
to the probability that the (k + 1)-st bit in R is 0, which is equal to 1/2 and
not 1/2^{k-1} (assuming that k ≥ 3). It seems that we are stuck. Fortunately,
there is a way out:
    Let us assume that the integer k is chosen such that n/k is an integer.
We divide the bitstring R = r1 r2 . . . rn into n/k blocks, each having length k.
Thus,
   • the first block is the substring r1 r2 . . . rk ,
   • the second block is the substring rk+1 rk+2 . . . r2k ,
   • the third block is the substring r2k+1 r2k+2 . . . r3k ,
   • the (n/k)-th block is the substring rn−k+1 rn−k+2 . . . rn .
For each i with 1 ≤ i ≤ n/k, we consider the event

                         Bi = “the i-th block is a run”.

Thus, the complement of Bi is the event

                 $\overline{B_i}$ = “the i-th block is not a run”.

Since $B_i = A_{(i-1)k+1}$ and $\overline{B_i} = \overline{A_{(i-1)k+1}}$, we have
$$\Pr(\overline{B_i}) = 1 - 1/2^{k-1}.$$

Observe that
   • the events $\overline{B_1}, \overline{B_2}, \ldots, \overline{B_{n/k}}$ are mutually independent, because the
     blocks do not overlap, and
   • if the event $\overline{A}$ occurs, then the event $\overline{B_1} \wedge \overline{B_2} \wedge \cdots \wedge \overline{B_{n/k}}$ also occurs
     (but, in general, the converse is not true!).
Using Lemma 5.3.6, it follows that
$$\begin{aligned}
\Pr(\overline{A}) &\le \Pr\left(\overline{B_1} \wedge \overline{B_2} \wedge \cdots \wedge \overline{B_{n/k}}\right) \\
&= \Pr(\overline{B_1}) \cdot \Pr(\overline{B_2}) \cdots \Pr(\overline{B_{n/k}}) \\
&= \left(1 - 1/2^{k-1}\right)\left(1 - 1/2^{k-1}\right) \cdots \left(1 - 1/2^{k-1}\right) \\
&= \left(1 - 1/2^{k-1}\right)^{n/k}.
\end{aligned}$$
Using the inequality $1 - x \le e^{-x}$, see (5.3), we get
$$1 - 1/2^{k-1} \le e^{-1/2^{k-1}} = e^{-2/2^k}$$
and, thus,
$$\Pr(\overline{A}) \le \left(e^{-2/2^k}\right)^{n/k} = e^{-2n/(k 2^k)}. \qquad (5.6)$$
   Note that until now, k was arbitrary. We choose k to be
                            k = log n − 2 log log n.
Using basic properties of logarithms, see Section 2.4, we will show below that,
for this choice of k, the right-hand side in (5.6) is a “nice” function of n.
    In Section 2.4, we have seen that
$$2^{\log n} = n$$
and
$$2^{2\log\log n} = \log^2 n.$$
It follows that
$$2^k = 2^{\log n - 2\log\log n} = \frac{2^{\log n}}{2^{2\log\log n}} = \frac{n}{\log^2 n}.$$
Thus,
$$\begin{aligned}
\frac{2n}{k 2^k} &= \frac{2\log^2 n}{k} \\
&= \frac{2\log^2 n}{\log n - 2\log\log n} \\
&\ge \frac{2\log^2 n}{\log n} \\
&= 2\log n \\
&= 2\,\frac{\ln n}{\ln 2} \\
&\ge 2\ln n,
\end{aligned}$$
implying that
$$\Pr(\overline{A}) \le e^{-2n/(k 2^k)} \le e^{-2\ln n} = 1/n^2.$$

We conclude that, for the value of k chosen above,
$$\Pr(A) = 1 - \Pr(\overline{A}) \ge 1 - 1/n^2.$$


Thus, with probability at least 1 − 1/n^2 , a random bitstring of length n
contains a run of length at least log n − 2 log log n.
    We remark that we have been cheating, because we assumed that both
k and n/k are integers. Assume that n is of the form $2^{2^m}$, for some positive
integer m. Then both log n and log log n are integers and, thus, k is an
integer as well. In a correct derivation, we divide the bitstring R into ⌊n/k⌋
blocks of size k and, if n/k is not an integer, one block of length less than k.
We then get
$$\Pr(\overline{A}) \le \left(1 - 1/2^{k-1}\right)^{\lfloor n/k\rfloor} \le \left(e^{-2/2^k}\right)^{\lfloor n/k\rfloor} = e^{-2\lfloor n/k\rfloor/2^k}.$$

As we have seen before, for k = log n − 2 log log n, we have 2^k = n/ log^2 n.
Since
$$\lfloor n/k\rfloor > n/k - 1,$$
we get
$$\begin{aligned}
\frac{2\lfloor n/k\rfloor}{2^k} &> \frac{2(n/k - 1)}{2^k} \\
&= \frac{(2\log^2 n)(n/k - 1)}{n} \\
&= \frac{2\log^2 n}{k} - \frac{2\log^2 n}{n} \\
&\ge 2\ln n - \frac{2\log^2 n}{n}
\end{aligned}$$
and, thus,
$$\begin{aligned}
\Pr(\overline{A}) &\le e^{-2\lfloor n/k\rfloor/2^k} \\
&\le e^{-2\ln n + (2\log^2 n)/n} \\
&= e^{-2\ln n} \cdot e^{(2\log^2 n)/n} \\
&= (1/n^2) \cdot \left(1 + O\left((\log^2 n)/n\right)\right) \\
&= 1/n^2 + O\left((\log^2 n)/n^3\right).
\end{aligned}$$

This upper bound is larger than the upper bound we had before by only a
small additive term of O((log^2 n)/n^3 ).
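The conclusion of this section can be observed empirically. The following Python sketch (the length n and the number of trials are illustrative choices of ours) estimates the probability that a random bitstring of length n contains a run of length at least log n − 2 log log n:

```python
import math
import random

def longest_run(bits):
    """Length of the longest block of equal consecutive bits."""
    best = cur = 1
    for a, b in zip(bits, bits[1:]):
        cur = cur + 1 if a == b else 1
        best = max(best, cur)
    return best

n = 4096                                             # illustrative length
k = int(math.log2(n) - 2 * math.log2(math.log2(n)))  # k = 4 for n = 4096
rng = random.Random(1)
trials = 500
hits = sum(1 for _ in range(trials)
           if longest_run([rng.randrange(2) for _ in range(n)]) >= k)
print(k, hits / trials)   # the empirical probability is essentially 1
```

As the analysis predicts, virtually every random bitstring of this length contains such a run.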


5.15         Infinite Probability Spaces
In Section 5.2, we defined a sample space to be any non-empty countable set.
All sample spaces that we have seen so far are finite. In some cases, infinite
(but countable) sample spaces arise in a natural way. To give an example,
assume we flip a fair coin repeatedly and independently until it comes up
heads for the first time. The sample space S is the set of all sequences of
coin flips that can occur. If we denote by T^n H the sequence consisting of n
tails followed by one heads, then
$$S = \{H, TH, TTH, TTTH, TTTTH, \ldots\} = \{T^n H : n \ge 0\},$$

which is indeed an infinite set.
   Since the coin is fair and the coin flips are mutually independent, the
outcome T^n H has a probability of (1/2)^{n+1} , i.e.,
$$\Pr(T^n H) = (1/2)^{n+1}.$$
Recall that according to Definition 5.2.2, in order for this to be a valid
probability function, the sum of all probabilities must be equal to 1, i.e.,
the infinite series
$$\sum_{n=0}^{\infty} \Pr(T^n H) = \sum_{n=0}^{\infty} (1/2)^{n+1}$$

must be equal to 1. Since you may have forgotten about infinite series, we
recall the definition in the following subsection.

5.15.1      Infinite Series
              The divergent series are the invention of the devil, and it is a
              shame to base on them any demonstration whatsoever.
                                                 — Niels Henrik Abel, 1828


Definition 5.15.1 Let a0 , a1 , a2 , . . . be an infinite sequence of real numbers.
If
$$\lim_{N\to\infty} \sum_{n=0}^{N} a_n = \lim_{N\to\infty} (a_0 + a_1 + a_2 + \cdots + a_N)$$
exists, then we say that the infinite series $\sum_{n=0}^{\infty} a_n$ converges. In this case,
the value of this infinite series is equal to
$$\sum_{n=0}^{\infty} a_n = \lim_{N\to\infty} \sum_{n=0}^{N} a_n.$$


   For example, let x be a real number with x ≠ 1, and define a_n = x^n for
n ≥ 0. We claim that
$$\sum_{n=0}^{N} a_n = \sum_{n=0}^{N} x^n = 1 + x + x^2 + \cdots + x^N = \frac{1 - x^{N+1}}{1 - x},$$
which can be proved either by induction on N or by verifying that
$$(1 - x)\left(1 + x + x^2 + \cdots + x^N\right) = 1 - x^{N+1}.$$
If −1 < x < 1, then $\lim_{N\to\infty} x^{N+1} = 0$ and it follows that
$$\sum_{n=0}^{\infty} x^n = \lim_{N\to\infty} \sum_{n=0}^{N} x^n = \lim_{N\to\infty} \frac{1 - x^{N+1}}{1 - x} = \frac{1}{1 - x}.$$
We have proved the following result:

Lemma 5.15.2 If $x$ is a real number with $-1 < x < 1$, then
\[
\sum_{n=0}^{\infty} x^n = \frac{1}{1 - x} .
\]
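Lemma 5.15.2 is easy to check numerically. The following Python sketch (the helper name and the chosen values of $x$ and $N$ are ours, purely for illustration) compares partial sums of the geometric series with the closed form $1/(1-x)$:

```python
# Numerical check of Lemma 5.15.2: for -1 < x < 1, the partial sums
# sum_{n=0}^{N} x^n approach 1 / (1 - x) as N grows.

def geometric_partial_sum(x, N):
    """Return sum_{n=0}^{N} x^n, computed term by term."""
    return sum(x**n for n in range(N + 1))

for x in (0.5, -0.3, 0.9):
    approx = geometric_partial_sum(x, 200)
    exact = 1 / (1 - x)
    print(f"x = {x:5.2f}: partial sum = {approx:.10f}, 1/(1-x) = {exact:.10f}")
```

Since the error after $N$ terms is $x^{N+1}/(1-x)$, already $N = 200$ makes the two columns agree to many decimal places.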

   Now we can return to the coin flipping example that we saw in the beginning
of Section 5.15. If we take $x = 1/2$ in Lemma 5.15.2, then we get
\[
\sum_{n=0}^{\infty} \Pr(T^n H) = \sum_{n=0}^{\infty} (1/2)^{n+1}
                               = (1/2) \sum_{n=0}^{\infty} (1/2)^{n}
                               = (1/2) \cdot \frac{1}{1 - 1/2}
                               = 1 .
\]
Thus, we indeed have a valid probability function on the infinite sample space
$S = \{ T^n H : n \geq 0 \}$.
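One can also check this empirically. The Python sketch below (the seed and the number of trials are our own choices) simulates flipping a fair coin until the first heads appears, and compares the observed frequency of each outcome $T^n H$ with $(1/2)^{n+1}$:

```python
import random

def flips_until_heads(rng):
    """Flip a fair coin until the first heads; return the number n of
    tails seen before that heads (i.e., the outcome was T^n H)."""
    n = 0
    while rng.random() < 0.5:  # tails with probability 1/2
        n += 1
    return n

rng = random.Random(17)
trials = 100_000
counts = {}
for _ in range(trials):
    n = flips_until_heads(rng)
    counts[n] = counts.get(n, 0) + 1

# The empirical frequency of T^n H should be close to (1/2)^(n+1),
# and the frequencies over all observed outcomes sum to exactly 1.
for n in range(5):
    print(n, counts.get(n, 0) / trials, (1 / 2) ** (n + 1))
print(sum(counts.values()) / trials)
```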

              The limit does not exist.
                               — Cady Heron (played by Lindsay Lohan),
                                                   — Mean Girls, 2004

    We have seen in Lemma 5.15.2 that the infinite series $\sum_{n=0}^{\infty} x^n$ converges
if $-1 < x < 1$. It is not difficult to see that for all other values of $x$, the limit
\[
\lim_{N \to \infty} \sum_{n=0}^{N} x^n
\]
does not exist. As a result, if $x \geq 1$ or $x \leq -1$, the infinite series $\sum_{n=0}^{\infty} x^n$
does not converge. Another example of an infinite series that does not converge is
\[
\sum_{n=1}^{\infty} 1/n = 1 + 1/2 + 1/3 + 1/4 + \cdots .
\]
In Section 6.8.3, we will prove that
\[
\sum_{n=1}^{N} 1/n = 1 + 1/2 + 1/3 + 1/4 + \cdots + 1/N
\]

is about $\ln N$. It follows that
\[
\lim_{N \to \infty} \sum_{n=1}^{N} 1/n
\]
is about
\[
\lim_{N \to \infty} \ln N ,
\]
which clearly does not exist.
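The growth of the harmonic sum can be observed numerically. The Python sketch below compares $\sum_{n=1}^{N} 1/n$ with $\ln N$ for increasing $N$; the difference in fact settles down to the Euler-Mascheroni constant, roughly 0.5772 (the helper name and the values of $N$ are ours):

```python
import math

def harmonic(N):
    """Return the N-th harmonic number H_N = sum_{n=1}^{N} 1/n."""
    return sum(1 / n for n in range(1, N + 1))

# H_N grows without bound, but only logarithmically: the difference
# H_N - ln N approaches a constant (about 0.5772) as N increases.
for N in (10, 1_000, 100_000):
    H = harmonic(N)
    print(N, H, math.log(N), H - math.log(N))
```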

5.15.2       Who Flips the First Heads
Consider a game in which two players P1 and P2 take turns flipping, inde-
pendently, a fair coin. Thus, first P1 flips the coin, then P2 flips the coin,
then P1 flips the coin, then P2 flips the coin, etc. The player who flips heads
first is the winner of the game.
    Who is more likely to win this game? Our intuition says that P1 has
an advantage, because he is the player who starts: If the first flip is heads,
then the game is over and P1 wins. We will prove below that this intuition
is correct: P1 has a probability of 2/3 of winning the game and, thus, the
winning probability of P2 is only 1/3.
    The sample space $S$ is the set of all sequences of coin flips that can occur.
Since the game is over as soon as a heads is flipped, we have
\[
S = \{ T^n H : n \geq 0 \} .
\]
Since P1 starts, the event

                             A = “P1 wins the game”

corresponds to the subset
\[
A = \{ T^n H : n \geq 0 \text{ and } n \text{ is even} \} ,
\]
which we rewrite as
\[
A = \{ T^{2m} H : m \geq 0 \} .
\]
The probability that P1 wins the game is equal to Pr(A). How do we determine
this probability? According to (5.1) in Section 5.2,
\[
\Pr(A) = \sum_{\omega \in A} \Pr(\omega) .
\]

Since each outcome $\omega$ in $A$ is of the form $T^{2m} H$, we have
\[
\Pr(A) = \sum_{m=0}^{\infty} \Pr\left( T^{2m} H \right) .
\]
Thus, we have
\[
\Pr(A) = \sum_{m=0}^{\infty} \Pr\left( T^{2m} H \right)
       = \sum_{m=0}^{\infty} (1/2)^{2m+1}
       = (1/2) \sum_{m=0}^{\infty} (1/2)^{2m}
       = (1/2) \sum_{m=0}^{\infty} (1/4)^{m} .
\]


By taking $x = 1/4$ in Lemma 5.15.2, we get
\[
\Pr(A) = (1/2) \cdot \frac{1}{1 - 1/4} = 2/3 .
\]

   Let B be the event

                          B = “P2 wins the game”.

Since either P1 or P2 wins the game, we have

                    Pr(B) = 1 − Pr(A) = 1 − 2/3 = 1/3.

Let us verify, using an infinite series, that Pr(B) is indeed equal to 1/3. The
event B corresponds to the subset
\[
B = \{ T^n H : n \geq 0 \text{ and } n \text{ is odd} \} ,
\]
which we rewrite as
\[
B = \{ T^{2m+1} H : m \geq 0 \} .
\]

The probability that P2 wins the game is thus equal to
\[
\Pr(B) = \sum_{m=0}^{\infty} \Pr\left( T^{2m+1} H \right)
       = \sum_{m=0}^{\infty} (1/2)^{2m+2}
       = (1/4) \sum_{m=0}^{\infty} (1/2)^{2m}
       = (1/4) \sum_{m=0}^{\infty} (1/4)^{m}
       = (1/4) \cdot \frac{1}{1 - 1/4}
       = 1/3 .
\]

5.15.3      Who Flips the Second Heads
Let us change the game from the previous subsection: Again, the two players
P1 and P2 take turns flipping, independently, a fair coin, where P1 starts. The
game ends as soon as a second heads comes up. The player who flips the
second heads wins the game.
    Before you continue reading: Who do you think has a higher probability
of winning this game?
    In this game, a sequence of coin flips can occur if and only if (i) the
sequence contains exactly two heads and (ii) the last element in the sequence
is heads. Thus, the sample space $S$ is given by
\[
S = \{ T^m H T^n H : m \geq 0, n \geq 0 \} .
\]
The event

                            A = “P1 wins the game”

corresponds to the subset
\[
A = \{ T^m H T^n H : m \geq 0,\ n \geq 0,\ m + n \text{ is odd} \} .
\]
Below, we will determine Pr(A), i.e., the probability that P1 wins the game.
   We split the event A into two events

            A1 = “P1 flips both the first and the second heads”

and

       A2 = “P2 flips the first heads and P1 flips the second heads”.

If we write these two events as subsets of the sample space $S$, we get
\[
A_1 = \{ T^m H T^n H : m \geq 0,\ n \geq 0,\ m \text{ is even and } n \text{ is odd} \}
    = \{ T^{2k} H T^{2\ell+1} H : k \geq 0,\ \ell \geq 0 \}
\]
and
\[
A_2 = \{ T^m H T^n H : m \geq 0,\ n \geq 0,\ m \text{ is odd and } n \text{ is even} \}
    = \{ T^{2k+1} H T^{2\ell} H : k \geq 0,\ \ell \geq 0 \} .
\]

Observe that A1 ∩ A2 = ∅ and A = A1 ∪ A2 , implying that

                        Pr(A) = Pr (A1 ) + Pr (A2 ) .

We determine the two probabilities on the right-hand side.
  We have
\[
\Pr(A_1) = \sum_{k=0}^{\infty} \sum_{\ell=0}^{\infty} \Pr\left( T^{2k} H T^{2\ell+1} H \right)
         = \sum_{k=0}^{\infty} \sum_{\ell=0}^{\infty} (1/2)^{2k+2\ell+3}
         = (1/2)^3 \sum_{k=0}^{\infty} (1/2)^{2k} \sum_{\ell=0}^{\infty} (1/2)^{2\ell}
         = (1/8) \sum_{k=0}^{\infty} (1/4)^{k} \sum_{\ell=0}^{\infty} (1/4)^{\ell}
         = (1/8) \sum_{k=0}^{\infty} (1/4)^{k} \cdot \frac{1}{1 - 1/4}
         = (1/6) \sum_{k=0}^{\infty} (1/4)^{k}
         = (1/6) \cdot \frac{1}{1 - 1/4}
         = 2/9
\]

and
\[
\Pr(A_2) = \sum_{k=0}^{\infty} \sum_{\ell=0}^{\infty} \Pr\left( T^{2k+1} H T^{2\ell} H \right)
         = \sum_{k=0}^{\infty} \sum_{\ell=0}^{\infty} (1/2)^{2k+2\ell+3}
         = 2/9 .
\]

Thus, the probability that P1 wins the game is equal to
\[
\Pr(A) = \Pr(A_1) + \Pr(A_2) = 2/9 + 2/9 = 4/9 .
\]
The probability that P2 wins the game is equal to
\[
1 - \Pr(A) = 5/9 .
\]

Thus, P2 has a slightly larger probability of winning the game.
    You will agree that this was a painful way of determining Pr(A). In Exercise 5.91,
you will see an easier way to determine this probability: the game of this
subsection can be seen as two rounds of the game in Section 5.15.2. This
observation, together with the Law of Total Probability (Theorem 5.9.1),
leads to a much shorter proof that Pr(A) = 4/9.
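One can again gain confidence in the value 4/9 by simulation. The Python sketch below (seed and trial count are our own choices) plays the second-heads game repeatedly and estimates P1's winning probability:

```python
import random

def second_heads_winner(rng):
    """Players 1 and 2 alternate flipping a fair coin, player 1 first;
    return the number of the player who flips the second heads."""
    heads_seen = 0
    player = 2  # toggled to player 1 before the first flip
    while True:
        player = 2 if player == 1 else 1
        if rng.random() < 0.5:  # heads
            heads_seen += 1
            if heads_seen == 2:
                return player

rng = random.Random(7)
trials = 200_000
p1_wins = sum(second_heads_winner(rng) == 1 for _ in range(trials))
print(p1_wins / trials)  # should be close to 4/9, i.e. about 0.444
```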


5.16      Exercises
5.1 Consider a coin that has 0 on one side and 1 on the other side. We flip
this coin once and roll a die twice, and are interested in the product of the
three numbers.

   • What is the sample space?

   • How many possible events are there?

   • If both the coin and the die are fair, how would you define the proba-
     bility function Pr for this sample space?

5.2 Consider the sample space S = {a, b, c, d} and a probability function
Pr : S → R on S. Consider the events A = {a}, B = {a, b}, C = {a, b, c},
and D = {b, d}. You are given that Pr(A) = 1/10, Pr(B) = 1/2, and
Pr(C) = 7/10. Determine Pr(D).

5.3 Let n be a positive integer. We flip a fair coin 2n times and consider the
possible outcomes, which are strings of length 2n with each character being
H (= heads) or T (= tails). Thus, we take the sample space S to be the set
of all such strings. Since our coin is fair, each string of S should have the
same probability. Thus, we define Pr(s) = 1/|S| for each string s in S. In
other words, we have a uniform probability space.
    You are asked to determine the probability that in the sequence of 2n
flips, the coin comes up heads exactly n times:
   • What is the event A that describes this?

   • Determine Pr(A).

5.4 A cup contains two pennies (P), one nickel (N), and one dime (D). You
choose one coin uniformly at random, and then you choose a second coin
from the remaining coins, again uniformly at random.
   • Let S be the sample space consisting of all ordered pairs of letters P,
     N, and D that represent the possible outcomes. Write out all elements
     of S.

   • Determine the probability for each element in this sample space.

5.5 You are given a box that contains the 8 lowercase letters a, b, c, d, e, f, g, h
and the 5 uppercase letters V, W, X, Y, Z.
   In this exercise, we will consider two ways to choose 4 random letters
from the box. In the first way, we do uniform sampling without replacement,
whereas in the second way, we do uniform sampling with replacement. For
each case, you are asked to determine the probability that the 4-th letter
chosen is an uppercase letter. Before starting this exercise, spend a few
minutes and guess for which case this probability is smaller.

   • You choose 4 letters from the box: These letters are chosen in 4 steps,
     and in each step, you choose a uniformly random letter from the box;
     this letter is removed from the box.

          – What is the sample space?
          – Consider the event
                  A = “the 4-th letter chosen is an uppercase letter ”.
            Determine Pr(A).
   • You choose 4 letters from the box: These letters are chosen in 4 steps,
     and in each step, you choose a uniformly random letter from the box;
     this letter is not removed from the box.
          – What is the sample space?
          – Consider the event
                   B = “the 4-th letter chosen is an uppercase letter”.
            Determine Pr(B).
5.6 You flip a fair coin, independently, six times.
   • What is the sample space?
   • Consider the events
           A = “the coin comes up heads at least four times”,
           B = “the number of heads is equal to the number of tails”,
           C = “there are at least four consecutive heads”.
        Determine Pr(A), Pr(B), Pr(C), Pr(A | B), and Pr(C | A).
5.7 Let k ≥ 2 be an integer and consider the sample space S consisting
of all sequences of k characters, where each character is one of the digits
0, 1, 2, . . . , 9.
    If we choose a sequence s uniformly at random from the sample space S,
what is the probability that none of the digits in s is equal to 5?
5.8 You are given a red coin and a blue coin. Both coins have the number
1 on one side and the number 2 on the other side. You flip both coins once
(independently of each other) and take the sum of the two results. Consider
the events
                 A = “the sum of the results equals 2”,
                 B = “the sum of the results equals 3”,
                 C = “the sum of the results equals 4”.

    • Assume both coins are fair. Determine Pr(A), Pr(B), and Pr(C).

    • Let p and q be real numbers with 0 < p < 1 and 0 < q < 1. Assume
      the red coin comes up “1” with probability p and the blue coin comes
      up “1” with probability q. Is it possible to choose p and q such that

                                   Pr(A) = Pr(B) = Pr(C)?

5.9 Let p1 , p2 , . . . , p6 , q1 , q2 , . . . , q6 be real numbers such that each pi is strictly
positive, each qi is strictly positive, and p1 +p2 +· · ·+p6 = q1 +q2 +· · ·+q6 = 1.
    You are given a red die and a blue die. For any i with 1 ≤ i ≤ 6, if you
roll the red die, then the result is i with probability pi , and if you roll the
blue die, then the result is i with probability qi .
    You roll each die once (independently of each other) and take the sum of
the two results. For any s ∈ {2, 3, . . . , 12}, consider the event

                       As = “the sum of the results equals s”.

    • Let x > 0 and y > 0 be real numbers. Prove that
\[
\frac{x}{y} + \frac{y}{x} \geq 2 .
\]

       Hint: Rewrite this inequality until you get an equivalent inequality
       which obviously holds.

    • Assume that Pr (A2 ) = Pr (A12 ) and denote this common value by a.
      Prove that
                                 Pr (A7 ) ≥ 2a.

    • Is it possible to choose p1 , p2 , . . . , p6 , q1 , q2 , . . . , q6 such that for any s ∈
      {2, 3, . . . , 12}, Pr (As ) = 1/11?

5.10 The Fibonacci numbers are defined as follows: f0 = 0, f1 = 1, and
fn = fn−1 + fn−2 for n ≥ 2.
   Let n be a large integer. A Fibonacci die is a die that has fn faces. Such
a die is fair: If we roll it, each face is on top with the same probability 1/fn .
There are three different types of Fibonacci dice:
    • D1 : fn−2 of its faces show the number 1 and the other fn−1 faces show
      the number 4.

   • D2 : Each face shows the number 3.

   • D3 : fn−2 of its faces show the number 5 and the other fn−1 faces show
     the number 2.

    Assume we roll each of D1 , D2 , and D3 once, independently of each
other. Let R1 , R2 , and R3 be the numbers on the top faces of D1 , D2 , and
D3 , respectively. Determine

                                 Pr(R1 > R2 )

and
                                 Pr(R2 > R3 ),
and show that
\[
\Pr(R_3 > R_1) = \frac{f_{n-2}\, f_{n+1}}{f_n^2} .
\]

5.11 You are given a fair die. If you roll this die repeatedly, then the results
of the rolls are independent of each other.

   • You roll the die 6 times. Consider the event

              A = “there is at least one 6 in this sequence of 6 rolls”.

        Determine Pr(A).

   • You roll the die 12 times. Consider the event

            B = “there are at least two 6’s in this sequence of 12 rolls”.

        Determine Pr(B).

   • You roll the die 18 times. Consider the event

           C = “there are at least three 6’s in this sequence of 18 rolls”.

        Determine Pr(C).

Before starting this exercise, spend a few minutes and guess which of these
three probabilities is the smallest.

5.12 When Tri is a big boy, he wants to have four children. Assuming that
the genders of these children are uniformly random, which of the following
three events has the highest probability?
   1. All four kids are of the same gender.
   2. Exactly three kids are of the same gender.
   3. Two kids are boys and two kids are girls.

5.13 A group of ten people sits down, uniformly at random, around a table.
Lindsay and Simon are part of this group. Determine the probability that
Lindsay and Simon sit next to each other.

5.14 Consider five people, each of whom has a uniformly random and independent
birthday. (We ignore leap years.) Consider the event
            A = “at least three people have the same birthday”.
Determine Pr(A).

5.15 Donald Trump wants to hire two secretaries. There are n applicants
a1 , a2 , . . . , an , where n ≥ 2 is an integer. Each of these applicants has a
uniformly random birthday, and all birthdays are mutually independent. (We
ignore leap years.)
     Since Donald is too busy making America great again, he does not have
time to interview the applicants. Instead, he uses the following strategy: If
there is an index i such that ai and ai+1 have the same birthday, then he
chooses the smallest such index i and hires ai and ai+1 . In this case, the
hiring process is a tremendous success. If such an index i does not exist,
then nobody is hired and the hiring process is a total disaster.
     Determine the probability that the hiring process is a tremendous success.

5.16 Let d and n be integers such that d ≥ 1, n ≥ d, and n + d is even.
You live on Somerset Street and want to go to your local pub, which is also
located on Somerset Street, at distance d to the east from your home.
[Figure: your home and your local pub on Somerset Street, with the pub at distance d to the east of your home.]

   You use the following strategy:
   • Initially, you are at your home.
   • For each i = 1, 2, . . . , n, you do the following:
        – You flip a fair and independent coin.
        – If the coin comes up heads, you walk a distance 1 to the east.
        – If the coin comes up tails, you walk a distance 1 to the west.
Consider the event
             A = “after these n steps, you are at your local pub”.
Prove that
\[
\Pr(A) = \binom{n}{(n+d)/2} \Big/ 2^n .
\]

5.17 In Section 5.4.1, we have seen the different cards that are part of a
standard deck of cards.
   • You choose 2 cards uniformly at random from the 13 spades in a deck
     of 52 cards. Determine the probability that you choose an Ace and a
     King.
   • You choose 2 cards uniformly at random from a deck of 52 cards. De-
     termine the probability that you choose an Ace and a King.
   • You choose 2 cards uniformly at random from a deck of 52 cards. De-
     termine the probability that you choose an Ace and a King of the same
     suit.

5.18 In Section 5.4.1, we have seen the different cards that are part of a
standard deck of cards.
    A hand of cards is a subset consisting of five cards. A hand of cards is
called a straight, if the ranks of these five cards are consecutive and the cards
are not all of the same suit.
    An Ace and a 2 are considered to be consecutive, whereas a King and an
Ace are also considered to be consecutive. For example, each of the three
hands below is a straight:
                             8♠, 9♥, 10♦, J♠, Q♣

                            A♦, 2♥, 3♠, 4♠, 5♣
                           10♦, J♥, Q♠, K♠, A♣
   • Assume you get a uniformly random hand of cards. Determine the
     probability that this hand is a straight.
5.19 Three people P1 , P2 , and P3 are in a dark room. Each person has a bag
containing one red hat and one blue hat. Each person chooses a uniformly
random hat from her bag and puts it on her head. Afterwards, the lights are
turned on.
   Each person does not know the color of her hat, but can see the colors of
the other two hats. Each person Pi can do one of the following:
   • Person Pi announces “my hat is red”.
   • Person Pi announces “my hat is blue”.
   • Person Pi says “I pass”.
The game is a success if at least one person announces the correct color of
her hat and no person announces the wrong color of her hat. (If a person
passes, then she does not announce any color.)
   • Assume person P1 announces “my hat is red” and both P2 and P3 pass.
     Consider the event
                           A = “the game is a success”.
      Determine Pr(A).
   • Assume each person Pi does the following:
        – If the two hats that Pi sees have different colors, then Pi passes.
        – If the two hats that Pi sees are both red, then Pi announces “my
          hat is blue”.
        – If the two hats that Pi sees are both blue, then Pi announces “my
          hat is red”.
      Consider the event
                           B = “the game is a success”.
      Determine Pr(B).

5.20 Let A be an event in some probability space (S, Pr). You are given
that the events A and A are independent². Determine Pr(A).

5.21 You are given three events A, B, and C in some probability space
(S, Pr). Is the following true or false?
\[
\Pr\left( A \cap \overline{B} \cap \overline{C} \right) = \Pr(A \cup B \cup C) - \Pr(B) - \Pr(C) + \Pr(B \cap C) .
\]

5.22 Let S be a set consisting of 6 positive integers and 8 negative integers.
Choose a 4-element subset of S uniformly at random, and multiply the ele-
ments in this subset. Denote the product by x. Determine the probability
that x > 0.

5.23 Prove the inequality in (5.3), i.e., prove that

                                 1 − x ≤ e−x

for all real numbers x.

5.24 Let (S, Pr) be a probability space and let B be an event with Pr(B) > 0.
Consider the function $\Pr' : S \to \mathbb{R}$ defined by
\[
\Pr{}'(\omega) =
\begin{cases}
\Pr(\omega)/\Pr(B) & \text{if } \omega \in B, \\
0                  & \text{if } \omega \notin B.
\end{cases}
\]

   • Prove that $\Pr'$ is a probability function on S according to Definition 5.2.2.

   • Prove that for any event A,
\[
\Pr{}'(A) = \frac{\Pr(A \cap B)}{\Pr(B)} .
\]

5.25 Consider two events A and B in some probability space (S, Pr).

   • Assume that $\Pr(A) = 1/2$ and $\Pr(\overline{B} \mid \overline{A}) = 3/5$. Determine $\Pr(A \cup B)$.

   • Assume that $\Pr(A \cup B) = 5/6$ and $\Pr(\overline{A} \mid \overline{B}) = 1/3$. Determine $\Pr(B)$.
   ² This is not a typo.

5.26 Give an example of a sample space S and six events A, B, C, D, E,
and F such that

   • Pr(A | B) = Pr(A),

   • Pr(C | D) < Pr(C),

   • Pr(E | F ) > Pr(E).

Hint: The sequence of six events may contain duplicates. Try to make the
sample space S as small as you can.

5.27 You roll a fair die twice. Consider the events

                   A = “the sum of the two rolls is 7”,
                   B = “the result of the first roll is 4”.

Determine the conditional probabilities Pr(A | B) and Pr(B | A).

5.28 You flip a fair coin three times. Consider the four events (recall that
zero is even)

        A   =   “the   coin   comes   up   heads an odd number of times”,
        B   =   “the   coin   comes   up   heads an even number of times”,
        C   =   “the   coin   comes   up   tails an odd number of times”,
        D   =   “the   coin   comes   up   tails an even number of times”.

   • Determine Pr(A), Pr(B), Pr(C), Pr(D), Pr(A | C), and Pr(A | D).

   • Are there any two events in the sequence A, B, C, and D that are
     independent?

5.29 Consider a box that contains four beer bottles b1 , b2 , b3 , b4 and two cider
bottles c1 , c2 . You choose a uniformly random bottle from the box (and do
not put it back), after which you again choose a uniformly random bottle
from the box.
   Consider the events

             A = “the first bottle chosen is a beer bottle”,
             B = “the second bottle chosen is a beer bottle”.

   • What is the sample space?
   • For each element ω in your sample space, determine Pr(ω).
   • Determine Pr(A).
   • Determine Pr(B).
   • Are the events A and B independent?
5.30 A standard deck of 52 cards contains 13 spades (♠), 13 hearts (♥), 13
clubs (♣), and 13 diamonds (♦). You choose a uniformly random card from
this deck. Consider the events
         A = “the chosen card is a clubs or a diamonds card”,
         B = “the chosen card is a clubs or a hearts card”,
         C = “the chosen card is a clubs or a spades card”.
   • Are the events A, B, and C pairwise independent?
   • Are the events A, B, and C mutually independent?
5.31 You roll a fair die twice. Consider the events
             A = “the sum of the results is at least 9”,
             B = “at least one of the two rolls results in 2”,
             C = “at least one of the two rolls results in 5”.
   • Determine Pr(A), Pr(B), and Pr(C).
   • Determine Pr(B | C).
   • Are the events A and B independent?
   • Are the events A and C independent?
5.32 A hand of 13 cards is chosen uniformly at random from a standard deck
of 52 cards. Consider the events
                 A = “the hand has at least one Ace”,
                 B = “the hand has at least two Aces”,
                 C = “the hand has the Ace of spades”.
Determine the conditional probabilities Pr(A | B), Pr(B | A), and Pr(B | C).

5.33 We take a uniformly random permutation of a standard deck of 52
cards, so that each permutation has a probability of 1/52!. Consider the
events
             A = “the top card is an Ace”,
             B = “the bottom card is the Ace of spades”,
             C = “the bottom card is the Queen of spades”.
Determine the conditional probabilities Pr(A | B) and Pr(A | C).
5.34 Consider two dice, each one having one face showing the letter a, two
faces showing the letter b, and the remaining three faces showing the letter c.
You roll each die once, independently of the other die.
   • What is the sample space?
   • Consider the events
      A = “at least one of the two dice shows the letter b on its top face”,
      B = “both dice show the same letter on their top faces”.
      Determine Pr(A), Pr(B), and Pr(A | B).
5.35 You flip a fair coin, independently, three times. Consider the events
              A = “the first flip results in heads”,
              B = “the coin comes up heads exactly once”.
Determine the conditional probabilities Pr(A | B) and Pr(B | A).
5.36 You roll a fair die twice. Consider the events
               A = “the sum of the results is even”,
               B = “the sum of the results is at least 10”.
Determine the conditional probability Pr(A | B).
5.37 You flip a fair coin seven times, independently of each other. Consider
the events
               A   =   “the   number   of   heads is at least six”,
               B   =   “the   number   of   heads is at least five”,
               C   =   “the   number   of   tails is at least two”,
               D   =   “the   number   of   heads is at least four”.


Determine the conditional probabilities Pr(A | B) and Pr(C | D).

5.38 Consider the set Y = {1, 2, 3, . . . , 10}. We choose, uniformly at ran-
dom, a 6-element subset X of Y . Consider the events

          A = “5 is an element of X”,
          B = “6 is an element of X”,
          C = “6 is an element of X or 7 is an element of X”.

   • Determine Pr(A), Pr(B), and Pr(C).

   • Determine Pr(A | B), Pr(A | C), and Pr(B | C).

5.39 Let A and B be two events in some probability space (S, Pr) such that
Pr(A) = 2/5 and Pr(\overline{A ∪ B}) = 3/10, where \overline{A ∪ B} denotes
the complement of the event A ∪ B.

   • Assume that A and B are disjoint. Determine Pr(B).

   • Assume that A and B are independent. Determine Pr(B).

5.40 In this exercise, we assume that, when a child is born, its gender is
uniformly random, its day of birth is uniformly random over the seven days of
the week, and its gender and day of birth are independent of each other and
of those of other children.
    Anil Maheshwari has two children. You are given that at least one of
Anil’s kids is a boy who was born on a Sunday. Determine the probability
that Anil has two boys.
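The answer to this classic puzzle can be verified by enumerating the 14 × 14 equally likely (gender, day) pairs. The sketch below is our own; encoding Sunday as day 0 is an arbitrary choice.

```python
from fractions import Fraction
from itertools import product

# A child is a (gender, day) pair: 2 * 7 = 14 equally likely possibilities.
children = list(product(["boy", "girl"], range(7)))  # day 0 = Sunday (our choice)

# Two independent children: 14 * 14 = 196 equally likely ordered pairs.
pairs = list(product(children, repeat=2))

# Condition: at least one child is a boy born on a Sunday.
cond = [p for p in pairs if ("boy", 0) in p]
both_boys = [p for p in cond if p[0][0] == "boy" and p[1][0] == "boy"]

answer = Fraction(len(both_boys), len(cond))
print(answer)  # 13/27
```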

5.41 Elisa and Nick go to Tan Tran’s Darts Bar. When Elisa throws a dart,
she hits the dartboard with probability p. When Nick throws a dart, he
hits the dartboard with probability q. Here, p and q are real numbers with
0 < p < 1 and 0 < q < 1. Elisa and Nick throw one dart each, independently
of each other. Consider the events

                 E = “Elisa’s dart hits the dartboard”,
                 N = “Nick’s dart hits the dartboard”.

Determine Pr(E | E ∪ N ) and Pr(E ∩ N | E ∪ N ).

5.42 As everyone knows, Elisa Kazan loves to drink cider. You may not be
aware that Elisa is not a big fan of beer.
244                                      Chapter 5.       Discrete Probability


    Consider a round table that has six seats numbered 1, 2, 3, 4, 5, 6. Elisa
is sitting in seat 1. On top of the table, there is a rotating tray³. On this
tray, there are five bottles of beer (B) and one bottle of cider (C), as in the
figure below. After the tray has been spun, there is always a bottle exactly
in front of Elisa. (In other words, you can only spin the tray by a multiple of
60 degrees.) Moreover, Elisa can only see the bottle that is in front of her.
   [Figure: the tray seen from above. The seats are numbered 1 through 6
   around the table, with Elisa in seat 1. Five bottles of beer (B) sit in
   front of seats 2–6, and the single bottle of cider (C) sits in front of
   seat 1.]




   Elisa spins the tray uniformly at random in clockwise order. After the
tray has come to a rest, there is a bottle of beer in front of her. Since Elisa is
obviously not happy, she gets a second chance, i.e., Elisa can choose between
one of the following two options:
   1. Spin the tray again uniformly at random and independently of the first
      spin. After the tray has come to a rest, Elisa must drink the bottle
      that is in front of her.

   2. Rotate the tray one position (i.e., 60 degrees) in clockwise order, after
      which Elisa must drink the bottle that is in front of her.

   • Elisa decides to go for the first option. Determine the probability that
     she drinks the bottle of cider.

   • Elisa decides to go for the second option. Determine the probability
     that she drinks the bottle of cider.

5.43 You are given three dice D1 , D2 , and D3 :
   ³ According to Wikipedia, such a tray is called a Lazy Susan or Lazy Suzy. You may
have seen them in Chinese restaurants.


   • Die D1 has 0 on two of its faces and 1 on the other four faces.

   • Die D2 has 0 on all six faces.

   • Die D3 has 1 on all six faces.
You throw these three dice in a box so that they end up at uniformly random
orientations. You pick a uniformly random die in the box and observe that it
has 0 on its top face. Determine the probability that the die that you picked
is D1 .
Hint: You want to determine Pr(A | B), where A is the event that you pick
D1 and B is the event that you see a 0 on the top face of the die that you
picked. There are different ways to define the sample space S. One way is
to take
                    S = {(D1 , 0), (D1 , 1), (D2 , 0), (D3 , 1)},
where, for example, (D1 , 1) is the outcome in which you observe 1 on top of
die D1 . Note that this is not a uniform probability space.

5.44 According to Statistics Canada, a random person in Canada has
   • a probability of 4/5 to live to at least 70 years old and

   • a probability of 1/2 to live to at least 80 years old.
John (a random person in Canada) has just celebrated his 70-th birthday.
What is the probability that John will celebrate his 80-th birthday?

5.45 Nick is taking the course SPID 2804 (The Effect of Spiderman on the
Banana Industry). The final exam for this course consists of one true/false
question. To answer this question, Nick uses the following approach:

        1. If Nick knows that the answer to the question is “true”, he
           answers “true”.

        2. If Nick knows that the answer is “false”, he answers “false”.

        3. If Nick does not know the answer, he flips a fair coin.

           (a) If the coin comes up heads, he answers “true”.
           (b) If the coin comes up tails, he answers “false”.


    You are given that Nick knows the answer to the question with probabil-
ity 0.8. Consider the event

           A = “Nick gives the correct answer to the question”.

Determine Pr(A).

5.46 Let A and B be events in some probability space (S, Pr), such that
Pr(A) ≠ 0 and Pr(B) ≠ 0. Use the definition of conditional probability to
prove Bayes’ Theorem:
                                    Pr(B | A) · Pr(A)
                      Pr(A | B) =                     .
                                         Pr(B)

5.47 Medical doctors have developed a test for detecting disease X.
   • The test is 98% effective on people who have X: If a person has X,
     then with probability 0.98, the test says that the person indeed has X.

   • The test gives a false reading for 3% of the population without the
     disease: If a person does not have X, then with probability 0.03, the
     test says that the person does have X.

   • It is known that 0.1% of the population has X.
Assume we choose a person uniformly at random from the population and
test this person for disease X.
   • Determine the probability that the test says that the person has X.

   • Assume the test says that the person has X. Use Exercise 5.46 to
     determine the probability that the person indeed has X.
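Both bullets can be computed exactly with the Law of Total Probability and Bayes’ Theorem. A sketch of the computation (our own, using exact rational arithmetic):

```python
from fractions import Fraction

p_disease = Fraction(1, 1000)    # 0.1% of the population has X
sensitivity = Fraction(98, 100)  # Pr(test says X | person has X)
false_pos = Fraction(3, 100)     # Pr(test says X | person does not have X)

# Law of Total Probability: Pr(test says X).
p_positive = p_disease * sensitivity + (1 - p_disease) * false_pos

# Bayes' Theorem (Exercise 5.46): Pr(person has X | test says X).
p_disease_given_positive = p_disease * sensitivity / p_positive

print(p_positive)                # 619/20000
print(p_disease_given_positive)  # 98/3095
```

Despite the 98% detection rate, the second probability is only 98/3095 ≈ 3.2%, because the disease is rare in the population.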

5.48 In this exercise, we consider a standard deck of 52 cards.
   • We choose, uniformly at random, one card from the deck. Consider the
     events

                A = “the rank of the chosen card is Ace”,
                B = “the suit of the chosen card is diamonds”.

      Are the events A and B independent?


   • Assume we remove the Queen of hearts from the deck. We choose,
     uniformly at random, one card from the remaining 51 cards. Consider
     the events

                 C = “the rank of the chosen card is Ace”,
                 D = “the suit of the chosen card is diamonds”.

        Are the events C and D independent?

5.49 Let n ≥ 2 and m ≥ 1 be integers and consider two sets A and B, where
A has size n and B has size m. We choose a uniformly random function
f : A → B. For any two integers i and k with 1 ≤ i ≤ n and 1 ≤ k ≤ m,
consider the event
                          Aik = “f (i) = k”.

   • For two integers i and k, determine Pr (Aik ).

   • For two distinct integers i and j, and for an integer k, are the two
     events Aik and Ajk independent?

5.50 Consider three events A, B, and C in some probability space (S, Pr),
and assume that Pr(B ∩ C) ≠ 0 and Pr(C) ≠ 0. Prove that

             Pr(A ∩ B ∩ C) = Pr(A | B ∩ C) · Pr(B | C) · Pr(C).

5.51 You have a fair die and do the following experiment:

   • Roll the die once; let x be the outcome.

   • Roll the die x times (independently); let y be the smallest outcome of
     these x rolls.

   • Roll the die y times (independently); let z be the largest outcome of
     these y rolls.

Use Exercise 5.50 to determine

                       Pr(x = 1 and y = 2 and z = 3).
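The chain rule from Exercise 5.50 reduces this probability to three simple factors, each of which can be found by enumerating one stage of the experiment. A sketch (the function names are ours):

```python
from fractions import Fraction
from itertools import product

def pr_min_eq(num_rolls, value):
    """Pr(smallest of num_rolls fair-die rolls equals value), by enumeration."""
    rolls = list(product(range(1, 7), repeat=num_rolls))
    return Fraction(sum(1 for r in rolls if min(r) == value), len(rolls))

def pr_max_eq(num_rolls, value):
    """Pr(largest of num_rolls fair-die rolls equals value), by enumeration."""
    rolls = list(product(range(1, 7), repeat=num_rolls))
    return Fraction(sum(1 for r in rolls if max(r) == value), len(rolls))

# Chain rule (Exercise 5.50):
# Pr(x=1, y=2, z=3) = Pr(z=3 | y=2, x=1) * Pr(y=2 | x=1) * Pr(x=1).
p = pr_max_eq(2, 3) * pr_min_eq(1, 2) * Fraction(1, 6)
print(p)  # 5/1296
```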

5.52 A standard deck of 52 cards has four Aces.


   • You get a uniformly random hand of three cards. Consider the event

                       A = “the hand consists of three Aces”.

      Determine Pr(A).

   • You get three cards, which are chosen one after another. Each of these
     three cards is chosen uniformly at random from the current deck of
     cards. (When a card has been chosen, it is removed from the current
     deck.) Consider the events

                              B = “all three cards are Aces”

      and, for i = 1, 2, 3,

                              Bi = “the i-th card is an Ace.”

      Express the event B in terms of B1 , B2 , and B3 , and use this expression,
      together with Exercise 5.50, to determine Pr(B).

5.53 Let p be a real number with 0 < p < 1. You are given two coins C1
and C2 . The coin C1 is fair, i.e., if you flip this coin, it comes up heads with
probability 1/2 and tails with probability 1/2. If you flip the coin C2 , it
comes up heads with probability p and tails with probability 1 − p. You pick
one of these two coins uniformly at random, and flip it twice. These two coin
flips are independent of each other. Consider the events

                A = “the first coin flip results in heads”,
                B = “the second coin flip results in heads”.

   • Determine Pr(A).

   • Assume that p = 1/4. Are the events A and B independent?

   • Determine all values of p for which the events A and B are independent.

5.54 Let n ≥ 2 be an integer. Assume we have n balls and 10 boxes. We
throw the balls independently and uniformly at random in the boxes. Thus,
for each k and i with 1 ≤ k ≤ n and 1 ≤ i ≤ 10,

               Pr( the k-th ball falls in the i-th box ) = 1/10.


Consider the event

           An = “there is a box that contains at least two balls”

and let pn = Pr (An ).

   • Determine the smallest value of n for which pn ≥ 1/2.

   • Determine the smallest value of n for which pn ≥ 2/3.
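Both thresholds can be found by computing pn exactly via the complement event “all n balls land in distinct boxes”. A sketch (ours):

```python
from fractions import Fraction

def p_collision(n, boxes=10):
    """Pr(some box receives at least two of the n balls), via the complement."""
    p_all_distinct = Fraction(1)
    for i in range(n):
        p_all_distinct *= Fraction(boxes - i, boxes)
    return 1 - p_all_distinct

n_half = next(n for n in range(2, 12) if p_collision(n) >= Fraction(1, 2))
n_two_thirds = next(n for n in range(2, 12) if p_collision(n) >= Fraction(2, 3))
print(n_half, n_two_thirds)  # 5 5
```

Interestingly, the same value of n answers both bullets: p4 = 0.496 just misses 1/2, while p5 ≈ 0.698 already exceeds 2/3.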

5.55 Donald Trump wants to hire a secretary and receives n applications for
this job, where n ≥ 1 is an integer. Since he is too busy making important
announcements on Twitter, he appoints a three-person hiring committee.
After having interviewed the n applicants, each committee member ranks
the applicants from 1 to n. An applicant is hired for the job if he/she is
ranked first by at least two committee members.
    Since the committee members do not have the ability to rank the appli-
cants, each member chooses a uniformly random ranking (i.e., permutation)
of the applicants, independently of each other.
    John is one of the applicants. Determine the probability that John is
hired.

5.56 Edward, Francois-Xavier, Omar, and Yaser are sitting at a round table,
as in the figure below.
   [Figure: the four boys at a round table, in clockwise order Edward (E),
   Francois-Xavier (FX), Omar (O), and Yaser (Y); thus E sits opposite O,
   and FX sits opposite Y.]

    At 11:59am, they all lower their heads. At noon, each of the boys chooses
a uniformly random element from the set {CW , CCW , O}; these choices are
independent of each other. If a boy chooses CW , then he looks at his clock-
wise neighbor, if he chooses CCW , then he looks at his counter-clockwise
neighbor, and if he chooses O, then he looks at the boy at the other side of
the table. When two boys make eye contact, they both shout Vive le Québec
libre.


   • Consider the event
         A = “both Edward and Francois-Xavier shout Vive le Québec
             libre, whereas neither Omar nor Yaser does”.

      Determine Pr(A).

   • Consider the event
          B = “both Francois-Xavier and Yaser shout Vive le Québec
              libre, whereas neither Edward nor Omar does”.

      Determine Pr(B).

   • For any integer i with 0 ≤ i ≤ 4, consider the event

                  Ci = “exactly i boys shout Vive le Québec libre”.

       Determine
                           \sum_{i=0}^{4} \Pr(C_i).

      Justify your answer in plain English.

   • Determine each of the five probabilities Pr (C0 ), Pr (C1 ), . . . , Pr (C4 ).

5.57 You are given a fair die. For any integer n ≥ 1, you roll this die n times
(the rolls are independent). Consider the events

            An = “the sum of the results of the n rolls is even”

and

 Bn = “the last roll in the sequence of n rolls results in an even number”,

and their probabilities
                                  pn = Pr (An )
and
                                 qn = Pr (Bn ) .

   • Determine p1 .


   • For any integer n ≥ 1, determine qn .

   • For any integer n ≥ 2, express the event An in terms of the events An−1
     and Bn .

   • Use the previous parts to determine pn for any integer n ≥ 2.

5.58 You are asked to design a random bit generator. You find a coin in
your pocket, but, unfortunately, you are not sure if it is a fair coin. After
some thought, you come up with the following algorithm GenerateBit(n),
which takes as input an integer n ≥ 1:
      Algorithm GenerateBit(n):

          // all coin flips made are mutually independent
          flip the coin n times;
          k = the number of heads in the sequence of n coin flips;
          if k is odd
          then return 0
          else return 1
          endif

   In this exercise, you will show that, when n → ∞, the output of algorithm
GenerateBit(n) is a uniformly random bit.
   Let p be the real number with 0 < p < 1, such that, if the coin is flipped
once, it comes up heads with probability p and tails with probability 1 − p.
(Note that algorithm GenerateBit does not need to know the value of p.)
For any integer n ≥ 1, consider the two events

              An = “algorithm GenerateBit(n) returns 0”

and

      Bn = “the n-th coin flip made by algorithm GenerateBit(n)
           results in heads”,

and define
                               Pn = Pr (An )
and
                              Qn = Pn − 1/2.


   • Determine P1 and Q1 .

   • For any integer n ≥ 2, prove that

                                  Pn = p + (1 − 2p) · Pn−1 .

      Hint: Express the event An in terms of the events An−1 and Bn .

   • For any integer n ≥ 2, prove that

                                    Qn = (1 − 2p) · Qn−1 .

   • For any integer n ≥ 1, prove that

                               Qn = (1 − 2p)^{n−1} · (p − 1/2).

   • Prove that
                                          lim Qn = 0
                                          n→∞

      and
                                         lim Pn = 1/2.
                                        n→∞
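The closed form Pn = 1/2 + (1 − 2p)^{n−1}(p − 1/2), which follows from the parts above, can be spot-checked by brute-force enumeration for a fixed bias. The sketch below is ours; the choice p = 1/3 is arbitrary.

```python
from fractions import Fraction
from itertools import product

def p_return_zero(n, p):
    """Pr(GenerateBit(n) returns 0), i.e. Pr(the number of heads is odd),
    by enumerating all 2^n head/tail sequences of a p-biased coin."""
    total = Fraction(0)
    for seq in product([True, False], repeat=n):  # True = heads
        pr = Fraction(1)
        for flip in seq:
            pr *= p if flip else 1 - p
        if sum(seq) % 2 == 1:
            total += pr
    return total

p = Fraction(1, 3)
for n in range(1, 7):
    closed_form = Fraction(1, 2) + (1 - 2 * p) ** (n - 1) * (p - Fraction(1, 2))
    assert p_return_zero(n, p) == closed_form
print("closed form verified for n = 1, ..., 6")
```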


5.59 In this exercise, we will use the product notation. In case you are not
familiar with this notation:

   • For k ≤ m, \prod_{i=k}^{m} x_i denotes the product

                                    xk · xk+1 · xk+2 · · · xm .

   • If k > m, then \prod_{i=k}^{m} x_i is an “empty” product, which we define
     to be equal to 1.

   Let n ≥ 1 be an integer, and for each i = 1, 2, . . . , n, let pi be a real
number such that 0 < pi < 1. In this exercise, you will prove that
          \sum_{i=1}^{n} p_i \prod_{j=i+1}^{n} (1 − p_j) = 1 − \prod_{i=1}^{n} (1 − p_i).        (5.7)


For example,


   • for n = 1, (5.7) becomes

                                   p1 = 1 − (1 − p1 ),

   • for n = 2, (5.7) becomes

                        p1 (1 − p2 ) + p2 = 1 − (1 − p1 )(1 − p2 ),

   • for n = 3, (5.7) becomes

        p1 (1 − p2 )(1 − p3 ) + p2 (1 − p3 ) + p3 = 1 − (1 − p1 )(1 − p2 )(1 − p3 ).
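Before proving identity (5.7), it can be spot-checked numerically. The sketch below is ours; the sample inputs are arbitrary.

```python
from fractions import Fraction
from math import prod

def lhs(ps):
    """Left-hand side of (5.7) for p_1, ..., p_n given as a list."""
    n = len(ps)
    return sum(ps[i] * prod(1 - ps[j] for j in range(i + 1, n)) for i in range(n))

def rhs(ps):
    """Right-hand side of (5.7)."""
    return 1 - prod(1 - p for p in ps)

# Spot-check the identity for a few choices of p_1, ..., p_n.
for ps in ([Fraction(1, 2)], [Fraction(1, 3), Fraction(1, 4)],
           [Fraction(2, 5), Fraction(1, 2), Fraction(3, 7)]):
    assert lhs(ps) == rhs(ps)
print("identity (5.7) verified on the sample inputs")
```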

   Assume we do an experiment consisting of n tasks T1 , T2 , . . . , Tn . Each
task is either a success or a failure, independently of the other tasks. For
each i = 1, 2, . . . , n, let pi be the probability that Ti is a success. Consider
the event
                         A = “at least one task is a success”.

   • Prove (5.7) by determining Pr(A) in two different ways.

5.60 Let n ≥ 0 be an integer. In this exercise, you will prove that
                     \sum_{k=0}^{n} \frac{1}{2^k} \binom{n+k}{k} = 2^n.                    (5.8)

    The Ottawa Senators and the Toronto Maple Leafs play a best-of-(2n+1)
series: These two hockey teams play games against each other, and the first
team to win n + 1 games wins the series. Assume that
   • each game has a winner (thus, no game ends in a tie),
   • in any game, the Sens have a probability of 1/2 of defeating the Leafs,
     and
   • the results of the games are mutually independent.
Consider the events

                         A = “the Sens win the series”

and
                        B = “the Leafs win the series”.


   • Explain in plain English why Pr(A) = Pr(B) = 1/2.

   • For each k with 0 ≤ k ≤ n, consider the event

      Ak = “the Sens win the series after winning the (n + k + 1)-st game”.

      Express the event A in terms of the events A0 , A1 , . . . , An .

   • Consider a fixed value of k with 0 ≤ k ≤ n. Prove that

                 \Pr(A_k) = \frac{1}{2^{n+k+1}} \binom{n+k}{k}.

      Hint: Assume event Ak occurs. Which team wins the (n + k + 1)-st
      game? In the first n + k games, how many games are won by the Leafs?

   • Prove that (5.8) holds by combining the results of the previous parts.
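Identity (5.8) is easy to spot-check for small n with exact arithmetic before proving it. A sketch (ours):

```python
from fractions import Fraction
from math import comb

def lhs(n):
    """Left-hand side of (5.8): sum over k of binom(n+k, k) / 2^k."""
    return sum(Fraction(comb(n + k, k), 2 ** k) for k in range(n + 1))

for n in range(0, 10):
    assert lhs(n) == 2 ** n
print("identity (5.8) verified for n = 0, ..., 9")
```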

5.61 Let n ≥ 0 be an integer. In this exercise, you will prove that

          \sum_{k=0}^{n} \frac{1}{k+1} \binom{n}{k} = \frac{2^{n+1} − 1}{n+1}.        (5.9)

   There are n + 1 students in Carleton’s Computer Science program. We
denote these students by P1 , P2 , . . . , Pn+1 . We play the following game:

  1. We choose a uniformly random subset X of {P1 , P2 , . . . , Pn+1 }.

  2. (a) If X ≠ ∅, then we choose a uniformly random student in X. The
         chosen student wins a six-pack of cider.
      (b) If X = ∅, then nobody wins the six-pack.

The random choices made are independent of each other.

   • Consider the event

                          A0 = “nobody wins the six-pack”.

      Determine Pr (A0 ).


   • For each i = 1, 2, . . . , n + 1, consider the event
                        Ai = “student Pi wins the six-pack”.
        Explain in plain English why
                       Pr (A1 ) = Pr (A2 ) = . . . = Pr (An+1 ) .

   • Prove that
                        \Pr(A_1) = \frac{1 − 1/2^{n+1}}{n+1}.
   • For each k with 0 ≤ k ≤ n, consider the event
                 Bk = “X has size k + 1 and P1 wins the six-pack”.
         Prove that

                  \Pr(B_k) = \frac{\binom{n}{k}}{2^{n+1}} \cdot \frac{1}{k+1}.
   • Express the event A1 in terms of the events B0 , B1 , . . . , Bn .
   • Prove that (5.9) holds by combining the results of the previous parts.
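As with the previous exercise, identity (5.9) can be spot-checked for small n with exact arithmetic. A sketch (ours):

```python
from fractions import Fraction
from math import comb

def lhs(n):
    """Left-hand side of (5.9): sum over k of binom(n, k) / (k + 1)."""
    return sum(Fraction(comb(n, k), k + 1) for k in range(n + 1))

def rhs(n):
    """Right-hand side of (5.9)."""
    return Fraction(2 ** (n + 1) - 1, n + 1)

for n in range(0, 10):
    assert lhs(n) == rhs(n)
print("identity (5.9) verified for n = 0, ..., 9")
```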
5.62 Let n and k be integers with 1 ≤ n ≤ k ≤ 2n. In this exercise, you will
prove that

          \sum_{i=k−n}^{n} \binom{k}{i} \binom{2n−k}{n−i} = \binom{2n}{n}.        (5.10)
    Jim is working on his assignment for the course COMP 4999 (Computa-
tional Aspects of Growing Cannabis). There are 2n questions on this assign-
ment and each of them is worth 1 mark. Two minutes before the deadline,
Jim has completed the first k questions. Jim is very smart and all answers
to these k questions are correct. Jim knows that the instructor, Professor
Mary Juana, does not accept late submissions. Because of this, Jim leaves
the last 2n − k questions blank and hands in his assignment.
    Tri is a teaching assistant for this course. Since Tri is lazy, he does not
want to mark all questions. Instead, he chooses a uniformly random subset of
n questions out of the 2n questions, and only marks the n chosen questions.
For each correct answer, Tri gives 2 marks, whereas he gives 0 marks for each
wrong (or blank) answer.
    For each integer i ≥ 0, consider the event
           Ai = “Jim receives exactly 2i marks for his assignment”.
   • Determine the value of the summation \sum_{i} \Pr(A_i). Explain your
     answer in plain English.
   • Determine all values of i for which the event Ai is non-empty. For each
     such value i, determine Pr (Ai ).
   • Prove that (5.10) holds by combining the results of the previous parts.
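Identity (5.10) can be spot-checked over all valid pairs (n, k) for small n. A sketch (ours):

```python
from math import comb

def lhs(n, k):
    """Left-hand side of (5.10): sum over i of binom(k, i) * binom(2n-k, n-i)."""
    return sum(comb(k, i) * comb(2 * n - k, n - i) for i in range(k - n, n + 1))

for n in range(1, 8):
    for k in range(n, 2 * n + 1):
        assert lhs(n, k) == comb(2 * n, n)
print("identity (5.10) verified for all 1 <= n <= k <= 2n with n < 8")
```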

5.63 Let a and z be integers with a > z ≥ 1, and let p be a real number
with 0 < p < 1. Alexa and Zoltan play a game consisting of several rounds.
In one round,
  1. Alexa receives a points with probability p and 0 points with probability
     1 − p,
  2. Zoltan receives z points (with probability 1).
We assume that the results of different rounds are independent.

   • Consider the event

          A = “in one round, Alexa receives more points than Zoltan”.

      We say that Alexa is a better player than Zoltan, if Pr(A) > 1/2.
      For which values of p is Alexa a better player than Zoltan?
   • Assume that a = 3, z = 2, and p is chosen such that p > 1/2 and
     p² < 1/2. (For example, p = (√5 − 1)/2.)
        – Is Alexa a better player than Zoltan?
        – Alexa and Zoltan play a game consisting of two rounds. We con-
          sider the total number of points that each player wins during these
          two rounds. Consider the event

            B = “in two rounds, Alexa receives more points than Zoltan”.

           Prove that Pr(B) < 1/2. (This seems to suggest that Zoltan is a
           better player than Alexa.)
   • Let n be a large integer, and assume that a = n + 1, z = n, and p
     is chosen very close to (but less than) 1. (For example, n = 500 and
     p = 0.99.)


          – Is Alexa a better player than Zoltan?
          – Alexa and Zoltan play a game consisting of n rounds. We consider
            the total number of points that each player wins during these n
            rounds. Consider the event

               C = “in n rounds, Alexa receives more points than Zoltan”.

              Prove that Pr(C) = p^n. (If n = 500 and p = 0.99, then p^n ≈
              0.0066. This seems to suggest that Zoltan is a much better player
              than Alexa.)

5.64 Let k ≥ 1 be an integer. Assume we live on a planet on which one year
has d = 4k² days. Consider √d = 2k people P1 , P2 , . . . , P2k living on our
planet. Each person has a uniformly random birthday, and the birthdays of
these 2k people are mutually independent. Consider the event

         A = “at least two of P1 , P2 , . . . , P2k have the same birthday”.

This exercise will lead you through a proof of the claim that

                           0.221 < Pr(A) < 0.5.

Thus, if one year has d days, then √d people are enough to have a good
chance that not all birthdays are distinct. (This result is similar to the one
we obtained in Section 5.5.1.)

   • For each i with 1 ≤ i ≤ 2k, consider the event

         Bi = “Pi has the same birthday as at least one of P1 , P2 , . . . , Pi−1 ”.

        Prove that
                                                  i−1
                                     Pr (Bi ) ≤       .
                                                   d
   • Express the event A in terms of the events B1 , B2 , . . . , B2k .

   • Use the Union Bound (Lemma 5.3.5) to prove that

                                       Pr(A) < 1/2.


   • Consider the event

         B = “at least two of Pk+1 , Pk+2 , . . . , P2k have the same birthday”

      and for each i with 1 ≤ i ≤ k, the event

      Ci = “Pi has the same birthday as at least one of Pk+1 , Pk+2 , . . . , P2k ”.

       Prove that
                          \Pr(C_i \mid \overline{B}) = \frac{1}{4k}.

   • Prove that if the event \overline{A} occurs, then the event

        (\overline{C_1} ∩ \overline{B}) ∩ (\overline{C_2} ∩ \overline{B}) ∩ · · · ∩ (\overline{C_k} ∩ \overline{B})

       also occurs.

   • Prove that
                  \Pr(\overline{A}) ≤ \left(1 − \frac{1}{4k}\right)^{k}.

       You may use the fact that the events \overline{C_1} ∩ \overline{B},
       \overline{C_2} ∩ \overline{B}, . . . , \overline{C_k} ∩ \overline{B} are
       mutually independent.

   • Use the inequality 1 − x ≤ e^{−x} to prove that

                              Pr(A) ≥ 1 − e^{−1/4} > 0.221.
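The claimed bounds on Pr(A) can be confirmed exactly for small k by computing Pr(A) via the complement event “all 2k birthdays are distinct”. A sketch (ours):

```python
from fractions import Fraction

def p_shared_birthday(k):
    """Exact Pr(A) for 2k people on a planet with d = 4k^2 days."""
    d = 4 * k * k
    p_distinct = Fraction(1)
    for i in range(2 * k):
        p_distinct *= Fraction(d - i, d)
    return 1 - p_distinct

# Check 0.221 < Pr(A) < 0.5 for the first few values of k.
for k in range(1, 8):
    p = p_shared_birthday(k)
    assert Fraction(221, 1000) < p < Fraction(1, 2)
print("0.221 < Pr(A) < 0.5 holds for k = 1, ..., 7")
```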

5.65 Let n be a large power of two (thus, log n is an integer). Consider a
binary string s = s1 s2 . . . sn , where each bit si is 0 with probability 1/2, and
1 with probability 1/2, independently of the other bits.
   A run of length k is a substring of length k, all of whose bits are equal. In
Section 5.14, we have seen that it is very likely that the bitstring s contains
a run of length at least log n − 2 log log n. In this exercise, you will prove
that it is very unlikely that s contains a run of length more than 2 log n.

   • Let k be an integer with 1 ≤ k ≤ n. Consider the event

             A = “the bitstring s contains a run of length at least k”.


        For each i with 1 ≤ i ≤ n − k + 1, consider the event

                    Ai = “the substring si si+1 . . . si+k−1 is a run”.

        Use the Union Bound (Lemma 5.3.5) to prove that

                              \Pr(A) ≤ \frac{n−k+1}{2^{k−1}}.


   • Let k = 2 log n. Prove that

                                      Pr(A) ≤ 2/n.


5.66 A hand of 5 cards is chosen uniformly at random from a standard deck
of 52 cards. Consider the event

                     A = “the hand has at least one Ace”.



   • Explain what is wrong with the following argument:

         We are going to determine Pr(A). Event A states that the hand
         has at least one Ace. By symmetry, we may assume that A is
         the event that the hand has the Ace of spades. Since there are
         \binom{52}{5} hands of five cards and exactly \binom{51}{4} of them
         contain the Ace of spades, it follows that

                  \Pr(A) = \binom{51}{4} / \binom{52}{5} = \frac{5}{52}.



   • Explain what is wrong with the following argument:


      We are going to determine Pr(A) using the Law of Total Proba-
      bility (Theorem 5.9.1). For each x ∈ {♠, ♥, ♣, ♦}, consider the
      event
                  Bx = “the hand has the Ace of suit x”.
       We observe that

                 \Pr(B_x) = \binom{51}{4} / \binom{52}{5} = \frac{5}{52}.
      We next observe that

                                 Pr (A | Bx ) = 1,

      because if event Bx occurs, then event A also occurs. Thus,
      using the Law of Total Probability, we get
              \Pr(A) = \sum_{x} \Pr(A \mid B_x) \cdot \Pr(B_x)
                     = \sum_{x} 1 \cdot \Pr(B_x)
                     = \sum_{x} \frac{5}{52}
                     = 4 \cdot \frac{5}{52}
                     = \frac{5}{13}.


   • Determine the value of Pr(A).
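The correct value of Pr(A) follows from the complement event “the hand contains no Ace”, i.e., all five cards come from the 48 non-Aces. A sketch (ours):

```python
from fractions import Fraction
from math import comb

# Complement: the hand has no Ace iff all 5 cards come from the 48 non-Aces.
p_at_least_one_ace = 1 - Fraction(comb(48, 5), comb(52, 5))
print(p_at_least_one_ace, float(p_at_least_one_ace))  # ≈ 0.3412
```

Note that the result, about 0.3412, lies strictly between the two flawed answers 5/52 and 5/13 obtained above.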

5.67 You are doing two projects P and Q. The probability that project P
is successful is equal to 2/3 and the probability that project Q is successful
is equal to 4/5. The two projects succeed or fail independently of each other.
What is the probability that both P and Q are not successful?

5.68 Consider two independent events A and B in some probability space
(S, Pr). Assume that A and B are disjoint, i.e., A ∩ B = ∅. What can you
say about Pr(A) and Pr(B)?


5.69 You flip three fair coins independently of each other. Let A be the event
“at least two flips in a row are heads” and let B be the event “the number
of heads is even”. (Note that zero is even.) Are A and B independent?

5.70 You flip three fair coins independently of each other. Consider the
events
                     A = “there is at most one tails”
and
                       B = “not all flips are identical”.
Are A and B independent?

5.71 Let n ≥ 2 be an integer and consider two fixed integers a and b with
1 ≤ a < b ≤ n.
   • Use the Product Rule to determine the number of permutations of
     {1, 2, . . . , n} in which a is to the left of b.

   • Consider a uniformly random permutation of the set {1, 2, . . . , n}, and
     define the event

                   A = “in this permutation, a is to the left of b”.

        Use your answer to the first part of this exercise to determine Pr(A).

5.72 Let n ≥ 4 be an integer and consider a uniformly random permutation
of the set {1, 2, . . . , n}. Consider the event

 A = “in this permutation, both 3 and 4 are to the left of both 1 and 2”.

Determine Pr(A).

5.73 Let n ≥ 3 be an integer, consider a uniformly random permutation of
the set {1, 2, . . . , n}, and define the events

                A = “in this permutation, 2 is to the left of 3”

and

  B = “in this permutation, 1 is to the left of 2 and 1 is to the left of 3”.

Are these two events independent?
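
For any fixed small n, the question can be settled by enumerating all permutations. A Python sketch for n = 5 (an arbitrary test size; the answer should not depend on n):

```python
from fractions import Fraction
from itertools import permutations

n = 5   # arbitrary test case
perms = list(permutations(range(1, n + 1)))
pr = Fraction(1, len(perms))

def A(p):   # 2 is to the left of 3
    return p.index(2) < p.index(3)

def B(p):   # 1 is to the left of 2 and 1 is to the left of 3
    return p.index(1) < p.index(2) and p.index(1) < p.index(3)

pr_A = sum(pr for p in perms if A(p))             # 1/2
pr_B = sum(pr for p in perms if B(p))             # 1/3
pr_AB = sum(pr for p in perms if A(p) and B(p))   # 1/6
independent = pr_AB == pr_A * pr_B                # True
```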


5.74 Let n ≥ 4 be an integer. Consider a uniformly random permutation of
{1, 2, . . . , n} and define the events

      A = “1 and 2 are next to each other, with 1 to the left of 2, or
          4 and 3 are next to each other, with 4 to the left of 3”

and
      B = “1 and 2 are next to each other, with 1 to the left of 2, or
          2 and 3 are next to each other, with 2 to the left of 3”.

Determine Pr(A) and Pr(B). (Before you determine these probabilities,
spend a few minutes and guess which probability is larger.)

5.75 You flip two fair coins independently of each other. Consider the events

                A =      “the number of heads is odd”,
                B =      “the first coin comes up heads”,
                C =      “the second coin comes up heads”.

   • Are the events A and B independent?

   • Are the events A and C independent?

   • Are the events B and C independent?

   • Are the events A, B, and C pairwise independent?

   • Are the events A, B, and C mutually independent?

5.76 You roll a fair die once. Consider the events

              A = “the result is an element of {1, 3, 4}”,
              B = “the result is an element of {3, 4, 5, 6}”.

Are these two events independent?
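
A single die roll gives a six-element sample space, so a direct check is easy; a Python sketch using exact fractions:

```python
from fractions import Fraction

pr = Fraction(1, 6)          # fair die
A = {1, 3, 4}
B = {3, 4, 5, 6}

pr_A = len(A) * pr           # 1/2
pr_B = len(B) * pr           # 2/3
pr_AB = len(A & B) * pr      # A ∩ B = {3, 4}, so 1/3
independent = pr_AB == pr_A * pr_B   # True
```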

5.77 You roll a fair die once. Consider the events
                      A = “the result is even”,
                      B = “the result is odd”,
                      C = “the result is at most 4”.


   • Are the events A and B independent?

   • Are the events A and C independent?

   • Are the events B and C independent?

5.78 You are given a tetrahedron, which is a die with four faces. Each of
these faces has one of the bitstrings 110, 101, 011, and 000 written on it.
Different faces have different bitstrings.
    We roll the tetrahedron so that each face is at the bottom with equal
probability 1/4. For k = 1, 2, 3, consider the event

   Ak = “the bitstring written on the bottom face has 0 at position k”.

For example, if the bitstring at the bottom face is 101, then A1 is false, A2
is true, and A3 is false.

   • Are the events A1 and A2 independent?

   • Are the events A1 and A3 independent?

   • Are the events A2 and A3 independent?

   • Are the events A1 , A2 , A3 pairwise independent?

   • Are the events A1 , A2 , A3 mutually independent?
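
All five questions reduce to comparing probabilities of intersections with products of probabilities over the four equally likely faces; a Python sketch:

```python
from fractions import Fraction

faces = ["110", "101", "011", "000"]   # one bitstring per face
pr = Fraction(1, 4)

def A(k):   # faces whose bitstring has 0 at position k (positions are 1-based)
    return {f for f in faces if f[k - 1] == "0"}

A1, A2, A3 = A(1), A(2), A(3)
pairwise = all(
    len(X & Y) * pr == len(X) * pr * (len(Y) * pr)
    for X, Y in [(A1, A2), (A1, A3), (A2, A3)]
)
triple = len(A1 & A2 & A3) * pr == len(A1) * pr * (len(A2) * pr) * (len(A3) * pr)
mutual = pairwise and triple   # False: Pr(A1 ∩ A2 ∩ A3) = 1/4, the product is 1/8
```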

5.79 In a group of 100 children, 34 are boys and 66 are girls. You are given
the following information about the girls:

   • Each girl has green eyes or is blond or is left-handed.

   • 20 of the girls have green eyes.

   • 40 of the girls are blond.

   • 50 of the girls are left-handed.

   • 10 of the girls have green eyes and are blond.

   • 14 of the girls have green eyes and are left-handed.

   • 4 of the girls have green eyes, are blond, and are left-handed.


We choose one of these 100 children uniformly at random. Consider the
events

            G =       “the child chosen is a girl with green eyes”,
            B =       “the child chosen is a blond girl”,
            L =       “the child chosen is a left-handed girl”.

   • Are the events G and B independent?

   • Are the events G and L independent?

   • Are the events B and L independent?

   • Verify whether or not the following equation holds:

                      Pr(G ∧ B ∧ L) = Pr(G) · Pr(B) · Pr(L).

5.80 Let S be a sample space consisting of 100 elements. Consider three
events A, B, and C, as indicated in the figure below. For example, the event
A consists of 50 elements, 20 of which are only in A, 20 of which are only in
A ∩ B, 5 of which are only in A ∩ C, and 5 of which are in A ∩ B ∩ C.

[Figure: a Venn diagram of the events A, B, and C inside S. The region
only in A contains 20 elements, only in B contains 20, only in A ∩ B
contains 20, only in A ∩ C contains 5, only in B ∩ C contains 5, the region
A ∩ B ∩ C contains 5, only in C contains 10, and the remaining 15 elements
of S lie outside A ∪ B ∪ C.]


   Consider the uniform probability function on this sample space.

   • Are the events A and B independent?


   • Determine whether or not
                      Pr(A ∩ B | C) = Pr(A | C) · Pr(B | C).

5.81 Annie, Boris, and Charlie write an exam that consists of only one
question: What is 26 times 26? Calculators are not allowed during the
exam. Both Annie and Boris are pretty clever and each of them gives the
correct answer with probability 9/10. Charlie has trouble with two-digit
numbers and gives the correct answer with probability 6/10.
   • Assume that the three students do not cheat, i.e., each student answers
     the question independently of the other two students. Determine the
     probability that at least two of them give the correct answer.
   • Assume that Annie and Boris do not cheat, but Charlie copies Annie’s
     answer. Determine the probability that at least two of them give the
     correct answer.
Hint: The answer to the second part is smaller than the answer to the first
part.
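
Both parts can be verified exactly. The sketch below enumerates the eight correct/incorrect patterns for the independent case; for the second case it uses the observation that, when Charlie copies Annie, “at least two correct” happens exactly when Annie is correct:

```python
from fractions import Fraction
from itertools import product

pa = pb = Fraction(9, 10)   # Annie, Boris
pc = Fraction(6, 10)        # Charlie

# independent case: sum over all correct/incorrect patterns with >= 2 correct
indep = sum((pa if a else 1 - pa) * (pb if b else 1 - pb) * (pc if c else 1 - pc)
            for a, b, c in product([True, False], repeat=3)
            if a + b + c >= 2)

# Charlie copies Annie, so he is correct exactly when she is; then
# "at least two correct" holds if and only if Annie is correct
cheat = pa
```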

5.82 Alexa and Zoltan play the following game:
    AZ-game:
    Step 1: Alexa chooses a uniformly random element from the set
    {1, 2, 3}. Let a denote the element that Alexa chooses.
    Step 2: Zoltan chooses a uniformly random element from the set
    {1, 2, 3}. Let z denote the element that Zoltan chooses.
    Step 3: Using one of the three strategies mentioned below, Alexa
    chooses an element from the set {1, 2, 3} \ {a}. Let a′ denote the
    element that Alexa chooses.
    Step 4: Using one of the three strategies mentioned below, Zoltan
    chooses an element from the set {1, 2, 3} \ {z}. Let z′ denote the
    element that Zoltan chooses.
    The AZ-game is a success if a′ ≠ z′.


   • MinMin Strategy: In Step 3, Alexa chooses the smallest element in the
     set {1, 2, 3} \ {a}, and Zoltan chooses the smallest element in the set
     {1, 2, 3} \ {z}.


         – Describe the sample space for this strategy.
         – For this strategy, determine the probability that the AZ-game is
           a success.

   • MinMax Strategy: In Step 3, Alexa chooses the smallest element in the
     set {1, 2, 3} \ {a}, and Zoltan chooses the largest element in the set
     {1, 2, 3} \ {z}.

         – Describe the sample space for this strategy.
         – For this strategy, determine the probability that the AZ-game is
           a success.

   • Random Strategy: In Step 3, Alexa chooses a uniformly random element
     in the set {1, 2, 3}\{a}, and Zoltan chooses a uniformly random element
     in the set {1, 2, 3} \ {z}.

         – Describe the sample space for this strategy.
         – For this strategy, determine the probability that the AZ-game is
           a success.

5.83 You are given a box that contains one red ball and one blue ball.
Consider the following algorithm RandomRedBlue(n) that takes as input an
integer n ≥ 3:
      Algorithm RandomRedBlue(n):

          // n ≥ 3
          // initially, the box contains one red ball and one blue ball
          // all random choices are mutually independent
          for k = 1 to n − 2
          do choose a uniformly random ball in the box;
              if the chosen ball is red
              then put the chosen ball back in the box;
                     add one red ball to the box
              else put the chosen ball back in the box;
                   add one blue ball to the box
              endif
          endfor


   For any integers n ≥ 3 and i with 1 ≤ i ≤ n − 1, consider the event

           A^n_i = “at the end of algorithm RandomRedBlue(n),
                   the number of red balls in the box is equal to i”.

   In this exercise, you will prove that for any integers n ≥ 3 and i with
1 ≤ i ≤ n − 1,
                        Pr(A^n_i) = 1/(n − 1).                      (5.11)
   • Let n ≥ 3 and k be integers with 1 ≤ k ≤ n − 2. When running
     algorithm RandomRedBlue(n),

          – how many balls does the box contain at the start of the k-th
            iteration,
          – how many balls does the box contain at the end of the k-th iter-
            ation?

   • Let n ≥ 3 be an integer. After algorithm RandomRedBlue(n) has
     terminated, how many balls does the box contain?

   • For any integer n ≥ 3, prove that
                            Pr(A^n_1) = 1/(n − 1).

   • For any integer n ≥ 3, prove that
                           Pr(A^n_{n−1}) = 1/(n − 1).

   • Let n = 3. Prove that (5.11) holds for all values of i in the indicated
     range.

   • Let n ≥ 4. Consider the event

        A = “in the (n − 2)-th iteration of algorithm RandomRedBlue(n),
            a red ball is chosen”.

        For any integer i with 2 ≤ i ≤ n − 2, express the event A^n_i in terms
        of the events A^{n−1}_{i−1}, A^{n−1}_i, and A.


   • Let n ≥ 4. For any integer i with 2 ≤ i ≤ n − 2, prove that
      Pr(A^n_i) = Pr(A | A^{n−1}_{i−1}) · Pr(A^{n−1}_{i−1}) + Pr(Ā | A^{n−1}_i) · Pr(A^{n−1}_i).

   • Let n ≥ 4. Prove that (5.11) holds for all values of i in the indicated
     range.

5.84 Prove that for any real number x ≠ 1 and any integer N ≥ 0,

                     ∑_{n=0}^{N} x^n = (1 − x^{N+1}) / (1 − x).
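
Before proving the identity, it can be spot-checked with exact rational arithmetic; a Python sketch:

```python
from fractions import Fraction

# spot-check the identity for a few rational x != 1 and small N, exactly
checked = all(
    sum(x**n for n in range(N + 1)) == (1 - x**(N + 1)) / (1 - x)
    for x in [Fraction(1, 2), Fraction(-3), Fraction(7, 5)]
    for N in range(6)
)
```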

5.85 Use the following argument to convince yourself that

                             ∑_{n=0}^{∞} 1/2^n = 2.


Take the interval I = [0, 2) of length 2 on the real line and, for each n ≥ 0, an
interval In of length 1/2n . It is possible to place all intervals In with n ≥ 0
in I such that

   • no two intervals In and Im , with m ≠ n, overlap and

   • all intervals In with n ≥ 0 completely cover the interval I.

5.86 Alexa, Tri, and Zoltan play the OddPlayer game: In one round, each
player flips a fair coin.

  1. Assume that not all flips are equal. Then the coin flips of exactly two
     players are equal. The player whose coin flip is different is called the
     odd player. In this case, the odd player wins the game. For example, if
     Alexa flips tails, Tri flips heads, and Zoltan flips tails, then Tri is the
     odd player and wins the game.

  2. If all three coin flips are equal, then the game is repeated.

Below, this game is presented in pseudocode:

    Algorithm OddPlayer:

            // all coin flips are mutually independent
            each player flips a fair coin;
            if not all coin flips are equal
            then the game terminates and the odd player wins
            else OddPlayer
            endif


   • What is the sample space?

   • Consider the event

                            A = “Alexa wins the game”.

        Express this event as a subset of the sample space.

   • Use your expression from the previous part to determine Pr(A).

   • Use symmetry to determine Pr(A). Explain your answer in plain En-
     glish and a few sentences.
        Hint: What is the probability that Tri wins the game? What is the
        probability that Zoltan wins the game?

5.87 Two players P1 and P2 take turns rolling two fair and independent dice,
where P1 starts the game. The first player who gets a sum of seven wins the
game. Determine the probability that player P1 wins the game.
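
If the answer is obtained by summing a geometric series over P1’s turns (turns 1, 3, 5, . . .), the computation can be done exactly; a Python sketch under that assumption:

```python
from fractions import Fraction

p = Fraction(6, 36)   # Pr(two fair dice sum to seven): 6 of the 36 outcomes

# P1 wins on turns 1, 3, 5, ...; both players must miss for the game to
# return to P1, so Pr(P1 wins) = p + (1-p)^2 p + (1-p)^4 p + ...
pr_p1 = p / (1 - (1 - p)**2)   # sum of the geometric series
```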

5.88 By flipping a fair coin repeatedly and independently, we obtain a se-
quence of H’s and T ’s. We stop flipping the coin as soon as the sequence
contains either HH or TH.
   Two players P1 and P2 play a game, in which P1 wins if the last two
symbols in the sequence are HH. Otherwise, the last two symbols in the
sequence are TH, in which case P2 wins. Determine the probability that
player P1 wins the game.
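
One way to check an answer here is brute force: fix a generous number of flips, scan each equally likely sequence for the first pair that stops the game, and add up the probability mass. A Python sketch (the name first_stop is mine):

```python
from fractions import Fraction
from itertools import product

def first_stop(seq):
    # the game stops at the first pair that ends in H: that pair is HH or TH
    for i in range(len(seq) - 1):
        if seq[i + 1] == "H":
            return seq[i] + seq[i + 1]
    return None   # the game has not stopped yet

N = 12   # flip budget; only TT...T and HTT...T are still running after N flips
pr_p1 = sum(Fraction(1, 2**N)
            for seq in product("HT", repeat=N)
            if first_stop(seq) == "HH")
```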

5.89 Two players P1 and P2 play a game in which they take turns flipping,
independently, a fair coin: First P1 flips the coin, then P2 flips the coin, then
P1 flips the coin, then P2 flips the coin, etc. The game ends as soon as the


sequence of coin flips contains either HH or TT. The player who flips the
coin for the last time is the winner of the game. For example, if the sequence
of coin flips is HTHTHH, then P2 wins the game.
    Determine the probability that player P1 wins the game.
5.90 We flip a fair coin repeatedly and independently, and stop as soon as
we see one of the two sequences HTT and HHT. Let A be the event that
the process stops because HTT is seen.
   • Prove that the event A is given by the set

                        {T^m (HT)^n HTT : m ≥ 0, n ≥ 0}.

      In other words, event A holds if and only if the sequence of coin flips
      is equal to T^m (HT)^n HTT for some m ≥ 0 and n ≥ 0.
   • Prove that Pr(A) = 1/3.
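
Assuming the set description from the first part, Pr(A) is a double geometric series: the string T^m (HT)^n HTT has probability 2^{−(m+2n+3)}. The partial sums can be computed exactly and approach 1/3 from below:

```python
from fractions import Fraction

# sum Pr(T^m (HT)^n HTT) = 2^-(m + 2n + 3) over 0 <= m, n < M,
# a partial sum of a double geometric series whose total is 1/3
M = 30
partial = sum(Fraction(1, 2**(m + 2 * n + 3))
              for m in range(M) for n in range(M))
gap = Fraction(1, 3) - partial   # positive, but tiny for M = 30
```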
5.91 For i ∈ {1, 2}, consider the game Gi , in which two players P1 and P2
take turns flipping, independently, a fair coin, where Pi starts. The game
ends as soon as heads comes up. The player who flips heads first is the
winner of the game Gi . For j ∈ {1, 2}, consider the event
                       Bij = “Pj wins the game Gi ”.
In Section 5.15.2, we have seen that
                         Pr (B11 ) = Pr (B22 ) = 2/3                   (5.12)
and
                         Pr (B12 ) = Pr (B21 ) = 1/3.                  (5.13)
   Consider the game G, in which P1 and P2 take turns flipping, indepen-
dently, a fair coin, where P1 starts. The game ends as soon as a second heads
comes up. The player who flips the second heads wins the game. Consider
the event
                          A = “P1 wins the game G”.
In Section 5.15.3, we used an infinite series to show that
                                Pr(A) = 4/9.                           (5.14)
Use the Law of Total Probability (Theorem 5.9.1) to give an alternative proof
of (5.14). You are allowed to use (5.12) and (5.13).


5.92 Consider two players P1 and P2 :

   • P1 has one fair coin.

   • P2 has two coins. One of them is fair, whereas the other one is 2-headed
     (Her Majesty is on both sides of this coin).

The two players P1 and P2 play a game in which they alternate making turns:
P1 starts, after which it is P2 ’s turn, after which it is P1 ’s turn, after which
it is P2 ’s turn, etc.

   • When it is P1 ’s turn, she flips her coin once.

   • When it is P2 ’s turn, he does the following:

        – P2 chooses one of his two coins uniformly at random. Then he
          flips the chosen coin once.
         – If the first flip did not result in heads, then P2 repeats this process
          one more time: P2 again chooses one of his two coins uniformly
          at random and flips the chosen coin once.

The player who flips heads first is the winner of the game.

   • Determine the probability that P2 wins this game, assuming that all
     random choices and coin flips made are mutually independent.

5.93 Jennifer loves to drink India Pale Ale (IPA), whereas Connor Hillen
prefers Black IPA. Jennifer and Connor decide to go to their favorite pub
Chez Lindsay et Simon. The beer menu shows that this pub has ten beers
on tap:

   • Phillips Cabin Fever Imperial Black IPA,

   • Big Rig Black IPA,

   • Leo’s Early Breakfast IPA,

   • Goose Island IPA,

   • Caboose IPA,

   • and five other beers, none of which is an IPA.


Each of the first five beers is an IPA, whereas each of the first two beers is a
Black IPA.
   Jennifer and Connor play a game, in which they alternate ordering beer:
Connor starts, after which it is Jennifer’s turn, after which it is Connor’s
turn, after which it is Jennifer’s turn, etc.
   • When it is Connor’s turn, he orders two beers; each of these is chosen
     uniformly at random from the ten beers (thus, these two beers may be
     equal).
   • When it is Jennifer’s turn, she orders one of the ten beers, uniformly
     at random.
The game ends as soon as (i) Connor has ordered at least one Black IPA, in
which case he pays the bill, or (ii) Jennifer has ordered at least one IPA, in
which case she pays the bill.
   • Determine the probability that Connor pays the bill, assuming that all
     random choices made are mutually independent.
5.94 You would like to generate a uniformly random bit, i.e., with proba-
bility 1/2, this bit is 0, and with probability 1/2, it is 1. You find a coin in
your pocket, but you are not sure if it is a fair coin: It comes up heads (H)
with probability p and tails (T ) with probability 1 − p, for some real number
p that is unknown to you. In particular, you do not know if p = 1/2. In this
exercise, you will show that this coin can be used to generate a uniformly
random bit.
    Consider the following recursive algorithm GetRandomBit, which does
not take any input:
      Algorithm GetRandomBit:

          // all coin flips made are mutually independent
          flip the coin twice;
          if the result is HT
          then return 0
          else if the result is T H
                then return 1
                else GetRandomBit
                endif
          endif


   • The sample space S is the set of all sequences of coin flips that can oc-
     cur when running algorithm GetRandomBit. Determine this sample
     space S.

   • Prove that algorithm GetRandomBit returns a uniformly random
     bit.
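
This construction is known as von Neumann’s trick. One way to organize the second part is the recursion sketched below: writing x for the probability that the algorithm returns 0, a single round returns 0 with probability p(1 − p), returns 1 with probability (1 − p)p, and recurses with probability p^2 + (1 − p)^2; solving for x gives 1/2 for every p with 0 < p < 1:

```python
from fractions import Fraction

def pr_zero(p):
    # x = Pr(GetRandomBit returns 0) satisfies
    #   x = p(1-p) + (p^2 + (1-p)^2) * x,
    # since a round returns 0 on HT, 1 on TH, and recurses otherwise
    return p * (1 - p) / (1 - p**2 - (1 - p)**2)

# the output is unbiased no matter how biased the coin is
ok = all(pr_zero(p) == Fraction(1, 2)
         for p in [Fraction(1, 10), Fraction(1, 3), Fraction(9, 10)])
```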

5.95 You would like to generate a biased random bit: With probability 2/3,
this bit is 0, and with probability 1/3, it is 1. You find a fair coin in your
pocket: This coin comes up heads (H) with probability 1/2 and tails (T )
with probability 1/2. In this exercise, you will show that this coin can be
used to generate a biased random bit.
    Consider the following recursive algorithm GetBiasedBit, which does
not take any input:
    Algorithm GetBiasedBit:

          // all coin flips made are mutually independent
          flip the coin;
          if the result is H
          then return 0
          else b = GetBiasedBit;
                return 1 − b
          endif


   • The sample space S is the set of all sequences of coin flips that can occur
     when running algorithm GetBiasedBit. Determine this sample space
     S.

   • Prove that algorithm GetBiasedBit returns 0 with probability 2/3.
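
As in the previous exercise, one can write x for the probability that GetBiasedBit returns 0 and set up a recursion; the sketch below iterates that recursion to watch it converge to 2/3:

```python
from fractions import Fraction

# x = Pr(GetBiasedBit returns 0). The flip is H with probability 1/2
# (return 0); otherwise the algorithm returns 1 - b, which is 0 exactly
# when the recursive call returns 1. Hence x = 1/2 + (1/2)(1 - x),
# whose fixed point is x = 2/3. Iterating the recursion converges there:
x = Fraction(0)
for _ in range(60):
    x = Fraction(1, 2) + Fraction(1, 2) * (1 - x)
```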

5.96 Both Alexa and Shelly have an infinite bitstring. Alexa’s bitstring is
denoted by a1 a2 a3 . . ., whereas Shelly’s bitstring is denoted by s1 s2 s3 . . ..
Alexa can see her bitstring, but she cannot see Shelly’s bitstring. Similarly,
Shelly can see her bitstring, but she cannot see Alexa’s bitstring. The bits
in both bitstrings are uniformly random and independent.
    The ladies play the following game: Alexa chooses a positive integer k
and Shelly chooses a positive integer ℓ. The game is a success if sk = 1 and
aℓ = 1. In words, the game is a success if Alexa chooses a position in Shelly’s


bitstring that contains a 1, and Shelly chooses a position in Alexa’s bitstring
that contains a 1.
    • Assume Alexa chooses k = 4 and Shelly chooses ` = 7. Determine the
      probability that the game is a success.

    • Assume Alexa chooses the position, say k, of the leftmost 1 in her
      bitstring, and Shelly chooses the position, say ℓ, of the leftmost 1 in
      her bitstring.

         – If k ≠ ℓ, is the game a success?
         – Determine the probability that the game is a success.

5.97 Alexa and Shelly take turns flipping, independently, a coin, where Alexa
starts. The game ends as soon as heads comes up. The lady who flips heads
first is the winner of the game.
    Alexa proposes that they both use a fair coin. Of course, Shelly does
not agree, because she knows from Section 5.15.2 that this gives Alexa a
probability of 2/3 of winning the game.
    The ladies agree on the following: Let p and q be real numbers with
0 < p < 1 and 0 ≤ q ≤ 1. Alexa uses a coin that comes up heads with
probability p, and Shelly uses a coin that comes up heads with probability q.

    • Assume that p = 1/2. Determine the value of q for which Alexa and
      Shelly have the same probability of winning the game.

    • From now on, assume that 0 < p < 1 and 0 < q < 1.

         – Determine the probability that Alexa wins the game.
         – Assume that p > 1/2. Prove that for any q with 0 < q < 1, the
           probability that Alexa wins the game is strictly larger than 1/2.
         – Assume that p < 1/2. Determine the value of q for which Alexa
           and Shelly have the same probability of winning the game.

5.98 Let n ≥ 2 be an integer and consider a uniformly random permutation
(a1 , a2 , . . . , an ) of the set {1, 2, . . . , n}. For each k with 1 ≤ k ≤ n, consider
the event
         Ak = “ak is the largest element among the first k elements
              in the permutation”.


   • Let k and ℓ be two integers with 1 ≤ k < ℓ ≤ n. Prove that the events
     Ak and Aℓ are independent.

        Hint: Use the Product Rule to determine the number of permutations
        that define Ak , Aℓ , and Ak ∩ Aℓ , respectively.



   • Prove that the sequence A1 , A2 , . . . , An of events is mutually indepen-
     dent.




5.99 Let n ≥ 2 be an integer. We generate a random bitstring R =
r1 r2 · · · rn , by setting, for each i = 1, 2, . . . , n, ri = 1 with probability 1/i
and, thus, ri = 0 with probability 1 − 1/i. All random choices made when
setting these bits are mutually independent.
   For each i with 1 ≤ i ≤ n, consider the events


                                   Bi = “ri = 1”


and

           Ri = “the rightmost 1 in the bitstring R is at position i”.



   • Determine Pr (Ri ).
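
By the independence of the bits, Pr(Ri) is a product that telescopes. The Python sketch below (n = 10 is an arbitrary test size) computes it directly and suggests that every position is equally likely to hold the rightmost 1:

```python
from fractions import Fraction

def pr_R(i, n):
    # Pr(R_i) = Pr(r_i = 1) * product over j > i of Pr(r_j = 0)
    #         = (1/i) * prod_{j=i+1}^{n} (1 - 1/j)
    p = Fraction(1, i)
    for j in range(i + 1, n + 1):
        p *= 1 - Fraction(1, j)
    return p

n = 10   # arbitrary test size
values = [pr_R(i, n) for i in range(1, n + 1)]
```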



    The following algorithm TryToFindRightmostOne(R, n, m) takes as
input the bitstring R = r1 r2 · · · rn of length n and an integer m with 1 ≤
m ≤ n. As the name suggests, this algorithm tries to find the position of the
rightmost 1 in the string R.


      Algorithm TryToFindRightmostOne(R, n, m):

          for i = 1 to m
          do if ri = 1
              then k = i
              endif
          endfor;
          // k is the position of the rightmost 1 in the substring
          // r1 r2 · · · rm .
          // the next while-loop finds the position of the leftmost 1
          // in the substring rm+1 rm+2 · · · rn , if this position exists.
          ℓ = m + 1;
          while ℓ ≤ n and rℓ = 0
          do ℓ = ℓ + 1
          endwhile;
          // if ℓ ≤ n, then ℓ is the position of the leftmost 1 in the
          // substring rm+1 rm+2 · · · rn .
          if ℓ ≤ n
          then return ℓ
          else return k
          endif

   Consider the event

       Em = “there is exactly one 1 in the substring rm+1 rm+2 · · · rn ”.

   • Prove that

          Pr(Em ) = (m/n) · (1/m + 1/(m + 1) + · · · + 1/(n − 1)).

Consider the event
 A = “TryToFindRightmostOne(R, n, m) returns the position of the
     rightmost 1 in the string R”.

   • Prove that

          Pr(A) = (m/n) · (1 + 1/m + 1/(m + 1) + · · · + 1/(n − 1)).


5.100 You realize that it is time to buy a pair of shoes. You look up all n
shoe stores in Ottawa and visit them in random order. While shopping, you
create a bitstring r1 r2 · · · rn of length n: For each i with 1 ≤ i ≤ n, you set
ri to 1 if and only if the i-th store has the best pair of shoes, among the first
i stores that you have visited.

   • Use Exercise 5.98 to prove that this bitstring satisfies the condition in
     Exercise 5.99.

   After you have visited the first m shoe stores, you are bored of shopping.
You keep on visiting shoe stores, but as soon as you visit a store that has
a pair of shoes that you like more than the best pair you have found so far,
you buy that new pair of shoes.

   • Use Exercise 5.99 to determine the probability that you buy the best
     pair of shoes that is available in Ottawa.
Chapter 6

Random Variables and
Expectation

             A natural question: What is the definition of random variable?
             Classically, and in many of today’s textbooks, you see definitions
             such as, a random variable is the observed value of a random
             quantity. What on earth does that mean? How can any sort of
             theory be built on such vagueness?
              — Persi Diaconis and Brian Skyrms, Ten Great Ideas About
             Chance, 2018



6.1     Random Variables
We have already seen random variables in Chapter 5, even though we did not
use that term there. For example, in Section 5.2.1, we rolled a die twice and
were interested in the sum of the results of these two rolls. In other words,
we did an “experiment” (rolling a die twice) and asked for a function of the
outcome (the sum of the results of the two rolls).

Definition 6.1.1 Let S be a sample space. A random variable on the sample
space S is a function X : S → R.

   In the example given above, the sample space is

                     S = {(i, j) : 1 ≤ i ≤ 6, 1 ≤ j ≤ 6}


and the random variable is the function X : S → R defined by

                              X(i, j) = i + j

for all (i, j) in S.
    Note that the term “random variable” is misleading: A random variable
is not random, but a function that assigns, to every outcome ω in the sample
space S, a real number X(ω). Also, a random variable is not a variable, but
a function.

            A random variable is neither random nor variable.


6.1.1    Flipping Three Coins
Assume we flip three coins. The sample space is

        S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT},

where, e.g., TTH indicates that the first two coins come up tails and the
third coin comes up heads.
    Let X : S → R be the random variable that maps any outcome (i.e., any
element of S) to the number of heads in the outcome. Thus,

                             X(HHH) = 3,
                             X(HHT) = 2,
                             X(HTH) = 2,
                             X(HTT) = 1,
                             X(THH) = 2,
                             X(THT) = 1,
                             X(TTH) = 1,
                             X(TTT) = 0.

If we define the random variable Y to be the function Y : S → R that

   • maps an outcome to 1 if all three coins come up heads or all three coins
     come up tails, and

   • maps an outcome to 0 in all other cases,


then we have
                                Y (HHH) = 1,
                                Y (HHT) = 0,
                                Y (HTH) = 0,
                                Y (HTT) = 0,
                                Y (THH) = 0,
                                Y (THT) = 0,
                                Y (TTH) = 0,
                                Y (TTT) = 1.
    Since a random variable is a function X : S → R, it maps any outcome
ω to a real number X(ω). Usually, we just write X instead of X(ω). Thus,
for any outcome in the sample space S, we denote the value of the random
variable, for this outcome, by X. In the example above, we flip three coins
and write
                         X = the number of heads
and
    Y = 1 if all three coins come up heads or all three coins come up tails,
    Y = 0 otherwise.

6.1.2       Random Variables and Events
Random variables give rise to events in a natural way. In the three-coin
example, “X = 0” corresponds to the event {TTT}, whereas “X = 2”
corresponds to the event {HHT, HTH, THH}. The table below gives some
values of the random variables X and Y , together with the corresponding
events.

                value    event
                X = 0    {TTT}
                X = 1    {HTT, THT, TTH}
                X = 2    {HHT, HTH, THH}
                X = 3    {HHH}
                X = 4    ∅
                Y = 0    {HHT, HTH, HTT, THH, THT, TTH}
                Y = 1    {HHH, TTT}
                Y = 2    ∅


  Thus, the event “X = x” corresponds to the set of all outcomes that are
mapped, by the function X, to the value x:

Definition 6.1.2 Let S be a sample space and let X : S → R be a random
variable. For any real number x, we define “X = x” to be the event

                          {ω ∈ S : X(ω) = x}.

    Let us return to the example in which we flip three coins. Assume that
the coins are fair and the three flips are mutually independent. Consider
again the corresponding random variables X and Y . It should be clear how
we determine, for example, the probability that X is equal to 0, which we
will write as Pr(X = 0). Using our interpretation of “X = 0” as being the
event {TTT}, we get

                        Pr(X = 0) = Pr(TTT)
                                  = 1/8.

Similarly, we get

       Pr(X = 1) =     Pr({HTT, THT, TTH})
                 =     3/8,
       Pr(X = 2) =     Pr({HHT, HTH, THH})
                 =     3/8,
       Pr(X = 3) =     Pr({HHH})
                 =     1/8,
       Pr(X = 4) =     Pr(∅)
                 =     0,
       Pr(Y = 0) =     Pr({HHT, HTH, HTT, THH, THT, TTH})
                 =     6/8
                 =     3/4,
       Pr(Y = 1) =     Pr({HHH, TTT})
                 =     2/8
                 =     1/4,
       Pr(Y = 2) =     Pr(∅)
                 =     0.
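
All of these values can be reproduced mechanically by enumerating the sample space; a Python sketch:

```python
from fractions import Fraction
from itertools import product

S = ["".join(o) for o in product("HT", repeat=3)]   # the eight outcomes
pr = Fraction(1, 8)   # fair coins, mutually independent flips

def X(o):   # number of heads
    return o.count("H")

def Y(o):   # 1 if all three flips are equal, 0 otherwise
    return 1 if o in ("HHH", "TTT") else 0

pr_X = {x: sum(pr for o in S if X(o) == x) for x in range(5)}
pr_Y = {y: sum(pr for o in S if Y(o) == y) for y in range(3)}
```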


   Consider an arbitrary probability space (S, Pr) and let X : S → R be
a random variable. Using (5.1) and Definition 6.1.2, the probability of the
event “X = x”, i.e., the probability that X is equal to x, is equal to

                 Pr(X = x) = Pr({ω ∈ S : X(ω) = x})
                           = Σ_{ω : X(ω) = x} Pr(ω).


   We have interpreted “X = x” as being an event. We extend this to more
general statements involving X. For example, “X ≥ x” denotes the event

                           {ω ∈ S : X(ω) ≥ x}.

For our three-coin example, the random variable X can take each of the
values 0, 1, 2, and 3 with a positive probability. As a result, “X ≥ 2”
denotes the event “X = 2 or X = 3”, and we have

                 Pr(X ≥ 2) =      Pr(X = 2 ∨ X = 3)
                           =      Pr(X = 2) + Pr(X = 3)
                           =      3/8 + 1/8
                           =      1/2.
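As an aside, these probabilities can be checked mechanically. The following Python sketch (not part of the original text; the helper name `pr` is ours) enumerates the eight equally likely outcomes of three fair coin flips and recovers Pr(X ≥ 2):

```python
from fractions import Fraction
from itertools import product

# All 8 outcomes of three fair, independent coin flips; each has probability 1/8.
outcomes = ["".join(t) for t in product("HT", repeat=3)]

def pr(event):
    # Probability of an event, given as a set of outcomes (uniform space).
    return Fraction(len(event), len(outcomes))

# X = number of heads.  "X >= 2" is the event "X = 2 or X = 3".
X_ge_2 = {w for w in outcomes if w.count("H") >= 2}
print(pr(X_ge_2))  # 1/2
```

Using `Fraction` keeps every probability exact, so the result is 1/2 rather than a rounded floating-point value.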


6.2     Independent Random Variables
In Section 5.11, we have defined the notion of two events being independent.
The following definition extends this to random variables.

Definition 6.2.1 Let (S, Pr) be a probability space and let X and Y be two
random variables on S. We say that X and Y are independent if for all real
numbers x and y, the events “X = x” and “Y = y” are independent, i.e.,

               Pr(X = x ∧ Y = y) = Pr(X = x) · Pr(Y = y).

   Assume we flip three fair coins independently and, as in Section 6.1.1,
consider the random variables

                        X = the number of heads

and

        Y = 1 if all three coins come up heads or all three coins come up tails,
            0 otherwise.
Are these two random variables independent? Observe the following: If
Y = 1, then X = 0 or X = 3. In other words, if we are given some
information about the random variable Y (in this case, Y = 1), then the
random variable X cannot take, for example, the value 2. Based on this, we
take x = 2 and y = 1 in Definition 6.2.1. Since the event “X = 2 ∧ Y = 1”
is equal to ∅, we have
                        Pr(X = 2 ∧ Y = 1) = Pr(∅) = 0.
On the other hand, we have seen in Section 6.1.2 that Pr(X = 2) = 3/8 and
Pr(Y = 1) = 1/4. It follows that
                  Pr(X = 2 ∧ Y = 1) ≠ Pr(X = 2) · Pr(Y = 1)
and, therefore, the random variables X and Y are not independent.
   Now consider the random variable

        Z = 1 if the first coin comes up heads,
            0 if the first coin comes up tails.
We claim that the random variables Y and Z are independent. To verify
this, we have to show that for all real numbers y and z,
                  Pr(Y = y ∧ Z = z) = Pr(Y = y) · Pr(Z = z).              (6.1)
   Recall from Section 6.1.2 that Pr(Y = 1) = 1/4 and Pr(Y = 0) = 3/4.
Since the coin flips are independent, we have Pr(Z = 1) = 1/2 and Pr(Z =
0) = 1/2. Furthermore,
                Pr(Y = 1 ∧ Z = 1) =      Pr(HHH)
                                  =      1/8,
                Pr(Y = 1 ∧ Z = 0) =      Pr(T T T )
                                  =      1/8,
                Pr(Y = 0 ∧ Z = 1) =      Pr(HHT, HT H, HT T )
                                  =      3/8,
                Pr(Y = 0 ∧ Z = 0) =      Pr(T HH, T HT, T T H)
                                  =      3/8.

It follows that
                  Pr(Y = 1 ∧ Z = 1) = Pr(Y = 1) · Pr(Z = 1),
                  Pr(Y = 1 ∧ Z = 0) = Pr(Y = 1) · Pr(Z = 0),
                  Pr(Y = 0 ∧ Z = 1) = Pr(Y = 0) · Pr(Z = 1),
and
                  Pr(Y = 0 ∧ Z = 0) = Pr(Y = 0) · Pr(Z = 0).
Thus, (6.1) holds if (y, z) ∈ {(1, 1), (1, 0), (0, 1), (0, 0)}. For any other pair
(y, z), such as (y, z) = (3, 5) or (y, z) = (1, 2), at least one of the events
“Y = y” and “Z = z” is the empty set, i.e., cannot occur. Therefore, for
such pairs, we have
               Pr(Y = y ∧ Z = z) = 0 = Pr(Y = y) · Pr(Z = z).
Thus, we have indeed verified that (6.1) holds for all real numbers y and z. As
a result, we have shown that the random variables Y and Z are independent.
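The case analysis above can also be carried out by brute force. The following Python sketch (our own illustration, not part of the text) encodes Y and Z as functions on the sample space and checks Definition 6.2.1 for the values they can actually take; for any other pair (y, z) both sides are 0, so these four pairs suffice:

```python
from fractions import Fraction
from itertools import product

outcomes = ["".join(t) for t in product("HT", repeat=3)]  # three fair coins, each outcome 1/8

def pr(pred):
    # Probability of the event {w in S : pred(w)} in this uniform space.
    return Fraction(sum(1 for w in outcomes if pred(w)), len(outcomes))

def Y(w):  # 1 if all three coins come up heads or all come up tails
    return 1 if w in ("HHH", "TTT") else 0

def Z(w):  # 1 if the first coin comes up heads
    return 1 if w[0] == "H" else 0

# Definition 6.2.1: Pr(Y = y and Z = z) = Pr(Y = y) * Pr(Z = z).
independent = all(
    pr(lambda w: Y(w) == y and Z(w) == z) == pr(lambda w: Y(w) == y) * pr(lambda w: Z(w) == z)
    for y in (0, 1)
    for z in (0, 1)
)
print(independent)  # True
```

Replacing Y by X (the number of heads) in the same check would report False, matching the argument for X and Z given next.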
   Are the random variables X and Z independent? If X = 0, then all three
coins come up tails and, therefore, Z = 0. Thus,
                        Pr(X = 0 ∧ Z = 1) = Pr(∅) = 0,
whereas
                     Pr(X = 0) · Pr(Z = 1) = 1/8 · 1/2 ≠ 0.
As a result, the random variables X and Z are not independent.
   We have defined the notion of two random variables being independent.
As in Definition 5.11.3, there are two ways to generalize this to sequences of
random variables:

Definition 6.2.2 Let (S, Pr) be a probability space, let n ≥ 2, and let
X1 , X2 , . . . , Xn be a sequence of random variables on S.
   1. We say that this sequence is pairwise independent if for all real numbers
      x1 , x2 , . . . , xn , the sequence “X1 = x1 ”, “X2 = x2 ”, . . . , “Xn = xn ” of
      events is pairwise independent.
   2. We say that this sequence is mutually independent if for all real numbers
      x1 , x2 , . . . , xn , the sequence “X1 = x1 ”, “X2 = x2 ”, . . . , “Xn = xn ” of
      events is mutually independent.

6.3      Distribution Functions
Consider a random variable X on a sample space S. In Section 6.1.2, we
have defined Pr(X = x), i.e., the probability of the event “X = x”, to be

                   Pr(X = x) = Pr({ω ∈ S : X(ω) = x}).

This defines a function that maps any real number x to the real number
Pr(X = x). This function is called the distribution function of the random
variable X:

Definition 6.3.1 Let (S, Pr) be a probability space and let X : S → R be a
random variable. The distribution function of X is the function D : R → R
defined by
                            D(x) = Pr(X = x)

for all x ∈ R.

    For example, consider a fair red die and a fair blue die, and assume we
roll them independently. The sample space is

                     S = {(i, j) : 1 ≤ i ≤ 6, 1 ≤ j ≤ 6},

where i is the result of the red die and j is the result of the blue die. Each
outcome (i, j) in S has the same probability of 1/36.
    Let X be the random variable whose value is equal to the sum of the
results of the two dice. The matrix below gives all possible values of X. The
leftmost column gives the result of the red die, the top row gives the result
of the blue die, and each other entry is the corresponding value of X.

                             1   2   3    4   5     6
                         1   2   3   4    5   6     7
                         2   3   4   5    6   7     8
                         3   4   5   6    7   8     9
                         4   5   6   7    8   9    10
                         5   6   7   8    9   10   11
                         6   7   8   9   10   11   12

   As can be seen from this matrix, the random variable X can take any
value in {2, 3, 4, . . . , 12}. The distribution function D of X is given by

                         D(2)      = Pr(X = 2)      =   1/36,
                         D(3)      = Pr(X = 3)      =   2/36,
                         D(4)      = Pr(X = 4)      =   3/36,
                         D(5)      = Pr(X = 5)      =   4/36,
                         D(6)      = Pr(X = 6)      =   5/36,
                         D(7)      = Pr(X = 7)      =   6/36,
                         D(8)      = Pr(X = 8)      =   5/36,
                         D(9)      = Pr(X = 9)      =   4/36,
                         D(10)     = Pr(X = 10)     =   3/36,
                         D(11)     = Pr(X = 11)     =   2/36,
                         D(12)     = Pr(X = 12)     =   1/36,

whereas for all x ∉ {2, 3, 4, . . . , 12},

                               D(x) = Pr(X = x) = 0.

    In Sections 6.6 and 6.7, we will see other examples of distribution func-
tions.
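The table of values D(2), . . . , D(12) above can be computed directly from the matrix. Here is a short Python sketch (our illustration; the dictionary name `D` mirrors the notation of Definition 6.3.1) that builds the distribution function of the sum of two fair dice:

```python
from collections import Counter
from fractions import Fraction

# Sample space S = {(i, j) : 1 <= i, j <= 6}; each outcome has probability 1/36.
S = [(i, j) for i in range(1, 7) for j in range(1, 7)]

# Distribution function of X = sum of the two results:
# D(x) = Pr(X = x) = (number of outcomes with i + j = x) / 36.
counts = Counter(i + j for (i, j) in S)
D = {x: Fraction(c, 36) for x, c in counts.items()}

print(D[7])             # 1/6
print(sum(D.values()))  # 1
```

The final line confirms that the values of a distribution function sum to 1 over the range of X.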


6.4       Expected Values
Consider the probability space (S, Pr) with sample space S = {1, 2, 3} and
probability function Pr defined by Pr(1) = 4/5, Pr(2) = 1/10, and Pr(3) =
1/10. Assume we choose an element in S according to this probability func-
tion. Let X be the random variable whose value is equal to the element in S
that is chosen. Thus, as a function X : S → R, we have X(1) = 1, X(2) = 2,
and X(3) = 3.
    The “expected value” of X is the value of X that we observe “on average”.
How should we define this? Since X has a much higher probability to take
the value 1 than the other two values 2 and 3, the value 1 should get a larger
“weight” in the expected value of X. Based on this, it is natural to define
the expected value of X to be

                                                   4      1      1  13
         1 · Pr(1) + 2 · Pr(2) + 3 · Pr(3) = 1 ·     +2·    +3·    = .
                                                   5     10     10  10

Definition 6.4.1 Let (S, Pr) be a probability space and let X : S → R be
a random variable. The expected value of X is defined to be

                        E(X) = Σ_{ω∈S} X(ω) · Pr(ω),

provided this summation converges absolutely.¹

6.4.1      Some Examples
Flipping a coin: Assume we flip a fair coin, in which case the sample
space is S = {H, T } and Pr(H) = Pr(T ) = 1/2. Define the random variable
X to have value

        X = 1 if the coin comes up heads,
            0 if the coin comes up tails.

Thus, as a function X : S → R, we have X(H) = 1 and X(T) = 0. The
expected value E(X) of X is equal to

                    E(X) = X(H) · Pr(H) + X(T) · Pr(T)
                         = 1 · 1/2 + 0 · 1/2
                         = 1/2.
This example shows that the term “expected value” is a bit misleading: E(X)
is not the value that we expect to observe, because the value of X is never
equal to its expected value.

Rolling a die: Assume we roll a fair die. Define the random variable X to
be the value of the result. Then, X takes each of the values in {1, 2, 3, 4, 5, 6}
with equal probability 1/6, and we get

       E(X) = 1 · 1/6 + 2 · 1/6 + 3 · 1/6 + 4 · 1/6 + 5 · 1/6 + 6 · 1/6
            = 7/2.
   ¹The series Σ_{n=0}^∞ a_n converges absolutely if the series Σ_{n=0}^∞ |a_n| converges. If a
series converges absolutely, then we can change the order of summation without changing
the value of the series.

Now define the random variable Y to be equal to one divided by the result
of the die. In other words, Y = 1/X. This random variable takes each of the
values in {1, 1/2, 1/3, 1/4, 1/5, 1/6} with equal probability 1/6, and we get

    E(Y) = 1 · 1/6 + (1/2) · 1/6 + (1/3) · 1/6 + (1/4) · 1/6 + (1/5) · 1/6 + (1/6) · 1/6
         = 49/120.

Note that E(Y) ≠ 1/E(X). Thus, this example shows that, in general,
E(1/X) ≠ 1/E(X).
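Both expectations in this example are small exact computations, so they are easy to verify. A Python sketch (our illustration) of the fair-die example:

```python
from fractions import Fraction

# A fair die: each face 1..6 has probability 1/6.
p = Fraction(1, 6)

E_X = sum(k * p for k in range(1, 7))               # E(X)   = 7/2
E_Y = sum(Fraction(1, k) * p for k in range(1, 7))  # E(1/X) = 49/120

print(E_X, E_Y)        # 7/2 49/120
print(E_Y == 1 / E_X)  # False: E(1/X) differs from 1/E(X) = 2/7
```

The last line makes the caution of this example concrete: 49/120 is not 2/7.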



Rolling two dice: Consider a fair red die and a fair blue die, and assume
we roll them independently. The sample space is


                      S = {(i, j) : 1 ≤ i ≤ 6, 1 ≤ j ≤ 6},


where i is the result of the red die and j is the result of the blue die. Each
outcome (i, j) in S has the same probability of 1/36.
   Let X be the random variable whose value is equal to the sum of the
results of the two rolls. As a function X : S → R, we have X(i, j) = i+j. The
matrix below gives all possible values of X. The leftmost column indicates
the result of the red die, the top row indicates the result of the blue die, and
each other entry is the corresponding value of X.



                              1   2   3    4   5     6
                          1   2   3   4    5   6     7
                          2   3   4   5    6   7     8
                          3   4   5   6    7   8     9
                          4   5   6   7    8   9    10
                          5   6   7   8    9   10   11
                          6   7   8   9   10   11   12

   The expected value E(X) of X is equal to

           E(X) = Σ_{(i,j)∈S} X(i, j) · Pr(i, j)
                = Σ_{(i,j)∈S} (i + j) · 1/36
                = (1/36) · Σ_{(i,j)∈S} (i + j)
                = (1/36) · (the sum of all 36 entries in the matrix)
                = (1/36) · 252
                = 7.

6.4.2    Comparing the Expected Values of Comparable
         Random Variables
Consider a probability space (S, Pr), and let X and Y be two random vari-
ables on S. Recall that X and Y are functions that map elements of S to
real numbers. We will write X ≤ Y , if for each element ω ∈ S, we have
X(ω) ≤ Y (ω). In other words, the value of X is at most the value of Y ,
no matter which outcome ω is chosen. The following lemma should not be
surprising:

Lemma 6.4.2 Let (S, Pr) be a probability space and let X and Y be two
random variables on S. If X ≤ Y , then E(X) ≤ E(Y ).

Proof. Using Definition 6.4.1 and the assumption that X ≤ Y , we obtain

                       E(X) = Σ_{ω∈S} X(ω) · Pr(ω)
                            ≤ Σ_{ω∈S} Y(ω) · Pr(ω)
                            = E(Y).

6.4.3     An Alternative Expression for the Expected Value
In the last example of Section 6.4.1, we used Definition 6.4.1 to compute the
expected value E(X) of the random variable X that was defined to be the
sum of the results when rolling two fair and independent dice. This was a
painful way to compute E(X), because we added all 36 entries in the matrix.
There is a slightly easier way to determine E(X): By looking at the matrix,
we see that the value 4 occurs three times. Thus, the event “X = 4” has
size 3, i.e., if we consider the subset of the sample space S that corresponds
to this event, then this subset has size 3. Similarly, the event “X = 7” has
size 6, because the value 7 occurs 6 times in the matrix. The table below
lists the sizes of all non-empty events, together with their probabilities.
                     event    size of event   probability
                    X=2             1            1/36
                    X=3             2            2/36
                    X=4             3            3/36
                    X=5             4            4/36
                    X=6             5            5/36
                    X=7             6            6/36
                    X=8             5            5/36
                    X=9             4            4/36
                    X = 10          3            3/36
                    X = 11          2            2/36
                    X = 12          1            1/36
Based on this, we get

        E(X) = 2 · 1/36 + 3 · 2/36 + 4 · 3/36 + 5 · 4/36 + 6 · 5/36 + 7 · 6/36
             + 8 · 5/36 + 9 · 4/36 + 10 · 3/36 + 11 · 2/36 + 12 · 1/36
             = 7.
Even though this is still quite painful, less computation is needed. What we
have done is the following: In the definition of E(X), i.e.,

                        E(X) = Σ_{(i,j)∈S} X(i, j) · Pr(i, j),

we rearranged the terms in the summation. That is, instead of taking the
sum over all elements (i, j) in S,

   • we grouped together all outcomes (i, j) for which X(i, j) = i + j has
     the same value, say, k,
   • we multiplied this common value k by the probability that X is equal
     to k,
   • and we took the sum of the resulting products over all possible values
     of k.
This resulted in

                        E(X) = Σ_{k=2}^{12} k · Pr(X = k).
The following lemma states that we can do this for any random variable.
Lemma 6.4.3 Let (S, Pr) be a probability space and let X : S → R be a
random variable. The expected value of X is equal to

                        E(X) = Σ_x x · Pr(X = x).

Proof. Recall that the event “X = x” corresponds to the subset

                        A_x = {ω ∈ S : X(ω) = x}

of the sample space S. We have

                E(X) = Σ_{ω∈S} X(ω) · Pr(ω)
                     = Σ_x Σ_{ω : X(ω)=x} X(ω) · Pr(ω)
                     = Σ_x Σ_{ω : X(ω)=x} x · Pr(ω)
                     = Σ_x Σ_{ω∈A_x} x · Pr(ω)
                     = Σ_x x · Σ_{ω∈A_x} Pr(ω)
                     = Σ_x x · Pr(A_x)
                     = Σ_x x · Pr(X = x).



    When determining the expected value of a random variable X, it is usually
easier to use Lemma 6.4.3 than Definition 6.4.1. To use Lemma 6.4.3, you
have to do the following:

   • Determine all values x that X can take, i.e., determine the range of
     the function X.

   • For each such value x, determine Pr(X = x).

   • Compute the sum of all products x · Pr(X = x).

              Expected value of a random variable X : S → R:

                 • Definition 6.4.1: E(X) = Σ_{ω∈S} X(ω) · Pr(ω). This is
                   a sum over all elements of the domain of X.

                 • Lemma 6.4.3: E(X) = Σ_x x · Pr(X = x). This is a sum
                   over all elements of the range of X.
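The two formulas necessarily agree, and for a finite sample space this is easy to check by machine. A Python sketch (our illustration) computes E(X) for the two-dice sum both ways:

```python
from collections import Counter
from fractions import Fraction

S = [(i, j) for i in range(1, 7) for j in range(1, 7)]  # two fair dice, Pr = 1/36 each

def X(w):
    return w[0] + w[1]  # sum of the two results

# Definition 6.4.1: sum over the domain (all 36 outcomes).
E_domain = sum(Fraction(X(w), 36) for w in S)

# Lemma 6.4.3: sum over the range (the values 2, ..., 12).
dist = Counter(X(w) for w in S)
E_range = sum(x * Fraction(c, 36) for x, c in dist.items())

print(E_domain, E_range)  # 7 7
```

The range-based sum has only 11 terms instead of 36, which is exactly the saving the text describes.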



6.5     Linearity of Expectation
In this section, we will present one of the most useful tools for determining
expected values. Consider a probability space (S, Pr), and let X and Y be
two random variables on S. Recall that X and Y are functions that map
elements of S to real numbers. Let a and b be two real numbers, and let
Z : S → R be the random variable defined by

                        Z(ω) = a · X(ω) + b · Y (ω)

for all elements ω in S. Thus, we combine the random variables X and Y ,
together with the real numbers a and b, into a new random variable Z on
the same sample space S. Usually, we just write this new random variable
as Z = aX + bY .
    The Linearity of Expectation tells us how to obtain the expected value of
Z from the expected values of X and Y :

Theorem 6.5.1 Let (S, Pr) be a probability space. For any two random
variables X and Y on S, and for any two real numbers a and b,

                    E(aX + bY ) = a · E(X) + b · E(Y ).

Proof. We write Z = aX + bY . Using Definition 6.4.1, we get

            E(Z) = Σ_{ω∈S} Z(ω) · Pr(ω)
                 = Σ_{ω∈S} (a · X(ω) + b · Y(ω)) · Pr(ω)
                 = a · Σ_{ω∈S} X(ω) · Pr(ω) + b · Σ_{ω∈S} Y(ω) · Pr(ω)
                 = a · E(X) + b · E(Y).



    Let us return to the example in which we roll two fair and independent
dice, one being red and the other being blue. Define the random variable
X to be the sum of the results of the two rolls. We have seen two ways to
compute the expected value E(X) of X. We now present a third way, which
is the easiest one: We define two random variables

                        Y = the result of the red die

and
                        Z = the result of the blue die.
In Section 6.4.1, we have seen that

       E(Y) = 1 · 1/6 + 2 · 1/6 + 3 · 1/6 + 4 · 1/6 + 5 · 1/6 + 6 · 1/6 = 7/2.
By the same computation, we have

                                 E(Z) = 7/2.
Observe that
                                   X = Y + Z.

Then, by the Linearity of Expectation (i.e., Theorem 6.5.1), we have

                              E(X) = E(Y + Z)
                                   = E(Y) + E(Z)
                                   = 7/2 + 7/2
                                   = 7.
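Linearity is easy to confirm on this small space. A Python sketch (our illustration; the helper `E` implements Definition 6.4.1 for the uniform two-dice space) checks E(Y + Z) = E(Y) + E(Z):

```python
from fractions import Fraction

S = [(i, j) for i in range(1, 7) for j in range(1, 7)]  # two fair, independent dice

def E(f):
    # Definition 6.4.1 with Pr(w) = 1/36 for every outcome w.
    return sum(Fraction(f(w), 36) for w in S)

Y = lambda w: w[0]  # result of the red die
Z = lambda w: w[1]  # result of the blue die

# Theorem 6.5.1 with a = b = 1: E(Y + Z) = E(Y) + E(Z).
print(E(lambda w: Y(w) + Z(w)))  # 7
print(E(Y), E(Z))                # 7/2 7/2
```

Note that the theorem needs no independence assumption: the same identity would hold for any two random variables on S.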
    We have stated the Linearity of Expectation for two random variables.
The proof of Theorem 6.5.1 can easily be generalized to any finite sequence
of random variables:
Theorem 6.5.2 Let (S, Pr) be a probability space, let n ≥ 2 be an integer, let
X1 , X2 , . . . , Xn be a sequence of random variables on S, and let a1 , a2 , . . . , an
be a sequence of real numbers. Then,

                     E(Σ_{i=1}^n a_i X_i) = Σ_{i=1}^n a_i · E(X_i).

    The following theorem states that the Linearity of Expectation also holds
for infinite sequences of random variables:
Theorem 6.5.3 Let (S, Pr) be a probability space and let X1 , X2 , . . . be an
infinite sequence of random variables on S such that the infinite series

                                 Σ_{i=1}^∞ E(|X_i|)

converges. Then,

                        E(Σ_{i=1}^∞ X_i) = Σ_{i=1}^∞ E(X_i).

Proof. Define the random variable X to be

                                 X = Σ_{i=1}^∞ X_i.

That is, as a function X : S → R, we have

                              X(ω) = Σ_{i=1}^∞ X_i(ω)

for all elements ω in S.
    The derivation below uses Definition 6.4.1 and the assumption that the
infinite series Σ_{i=1}^∞ E(|X_i|) converges, which allows us to change the order
of summation without changing the value of the series:

            Σ_{i=1}^∞ E(X_i) = Σ_{i=1}^∞ Σ_{ω∈S} X_i(ω) · Pr(ω)
                             = Σ_{ω∈S} Σ_{i=1}^∞ X_i(ω) · Pr(ω)
                             = Σ_{ω∈S} Pr(ω) · Σ_{i=1}^∞ X_i(ω)
                             = Σ_{ω∈S} Pr(ω) · X(ω)
                             = E(X)
                             = E(Σ_{i=1}^∞ X_i).




6.6      The Geometric Distribution
Let p be a real number with 0 < p < 1 and consider an experiment that is
successful with probability p and fails with probability 1 − p. We repeat this
experiment independently until it is successful for the first time. What is the
expected number of times that we perform the experiment?
    We model this problem in the following way: Assume we have a coin that
comes up heads with probability p and, thus, comes up tails with probability
1 − p. We flip this coin repeatedly and independently until it comes up heads
for the first time. (We have seen this process in Section 5.15 for the case
when p = 1/2.) Define the random variable X to be the number of times
that we flip the coin; this includes the last coin flip, which resulted in heads.
We want to determine the expected value E(X) of X.
    The sample space is given by

                              S = {T^{k−1}H : k ≥ 1},

where T^{k−1}H denotes the sequence consisting of k − 1 tails followed by one
heads. Since the coin flips are independent, the outcome T^{k−1}H has a prob-
ability of (1 − p)^{k−1} p = p(1 − p)^{k−1} , i.e.,

                        Pr(T^{k−1}H) = p(1 − p)^{k−1} .

Let us first verify that all probabilities add up to 1: Using Lemma 5.15.2, we
have

              Σ_{k=1}^∞ Pr(T^{k−1}H) = Σ_{k=1}^∞ p(1 − p)^{k−1}
                                     = p · Σ_{k=1}^∞ (1 − p)^{k−1}
                                     = p · Σ_{ℓ=0}^∞ (1 − p)^ℓ
                                     = p · 1/(1 − (1 − p))
                                     = 1.
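One way to make this limit tangible is to sum a finite prefix of the series exactly. The Python sketch below (our illustration; the value of p is an arbitrary choice) uses the closed form 1 − (1 − p)^n for the first n terms, which tends to 1 as n grows:

```python
from fractions import Fraction

# A hypothetical success probability; any 0 < p < 1 behaves the same way.
p = Fraction(1, 3)

# Pr(T^{k-1} H) = p(1 - p)^{k-1}.  The first n terms of the series sum to
# 1 - (1 - p)^n, which approaches 1 as n grows.
n = 50
prefix = sum(p * (1 - p) ** (k - 1) for k in range(1, n + 1))
print(prefix == 1 - (1 - p) ** n)  # True
```

Because `Fraction` arithmetic is exact, the identity is checked with no rounding error at all.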

6.6.1     Determining the Expected Value
We are going to use Lemma 6.4.3 to determine the expected value E(X).
We first observe that X can take any value in {1, 2, 3, . . .}. For any integer
k ≥ 1, X = k if and only if the coin flips give the sequence T^{k−1}H. It follows
that
                 Pr(X = k) = Pr(T^{k−1}H) = p(1 − p)^{k−1} .            (6.2)

By Lemma 6.4.3, we have

                   E(X) = Σ_{k=1}^∞ k · Pr(X = k)
                        = Σ_{k=1}^∞ k p(1 − p)^{k−1}
                        = p · Σ_{k=1}^∞ k(1 − p)^{k−1} .

How do we determine the infinite series on the right-hand side?

   According to Lemma 5.15.2, we have

                           Σ_{k=0}^∞ x^k = 1/(1 − x),

for any real number x with −1 < x < 1. Both sides of this equation are func-
tions of x and these two functions are equal to each other. If we differentiate
both sides, we get two derivatives that are also equal to each other:

                        Σ_{k=0}^∞ k x^{k−1} = 1/(1 − x)^2 .

Since for k = 0, the term k x^{k−1} is equal to 0, we have

                        Σ_{k=1}^∞ k x^{k−1} = 1/(1 − x)^2 .

If we take x = 1 − p, we get

                       E(X) = p · Σ_{k=1}^∞ k(1 − p)^{k−1}
                            = p · 1/(1 − (1 − p))^2
                            = p/p^2
                            = 1/p.

   In Section 6.3, we have defined the distribution function of a random
variable. The distribution function of the coin-flipping random variable X is
given by (6.2). This function is called a geometric distribution:

Definition 6.6.1 Let p be a real number with 0 < p < 1. A random variable
X has a geometric distribution with parameter p, if its distribution function
satisfies
                         Pr(X = k) = p(1 − p)^{k−1}
for any integer k ≥ 1.

   Our calculation that led to the value of E(X) proves the following theo-
rem:

Theorem 6.6.2 Let p be a real number with 0 < p < 1 and let X be a
random variable that has a geometric distribution with parameter p. Then

                                 E(X) = 1/p.

   For example, if we flip a fair coin (in which case p = 1/2) repeatedly and
independently until it comes up heads for the first time, then the expected
number of coin flips is equal to 2.
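Theorem 6.6.2 can be checked numerically by truncating the series for E(X). The Python sketch below (our illustration) sums the first 200 terms for a fair coin; the neglected tail is astronomically small, so the partial sum already agrees with 1/p = 2 to full floating-point precision:

```python
from fractions import Fraction

p = Fraction(1, 2)  # fair coin, so Theorem 6.6.2 predicts E(X) = 1/p = 2

# Partial sums of E(X) = sum_{k >= 1} k * p * (1 - p)^{k-1}; the tail
# beyond k = 200 is smaller than 10**-50 for p = 1/2.
partial = sum(k * p * (1 - p) ** (k - 1) for k in range(1, 201))
print(float(partial))  # 2.0 (up to the negligible tail)
```

Trying other values of p in the same sketch gives partial sums close to 1/p, as the theorem predicts.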


6.7      The Binomial Distribution
As in Section 6.6, we choose a real number p with 0 < p < 1, and consider
an experiment that is successful with probability p and fails with probability
1−p. For an integer n ≥ 1, we repeat the experiment, independently, n times.
What is the expected number of times that the experiment is successful?
    We again model this problem using a coin that comes up heads with
probability p and, thus, comes up tails with probability 1−p. We flip the coin,
independently, n times and define the random variable X to be the number
of times the coin comes up heads. We want to determine the expected value
E(X) of X.
    Since our coin comes up heads with probability p, it is reasonable to guess
that E(X) is equal to pn. For example, if p = 1/2, then, on average, n/2
of the coin flips should come up heads. We will prove below that E(X) is
indeed equal to pn.


6.7.1     Determining the Expected Value
Since the random variable X can take any value in {0, 1, 2, . . . , n}, we have,
by Lemma 6.4.3,
                       E(X) = Σ_{k=0}^{n} k · Pr(X = k).

Thus, we have to determine Pr(X = k), i.e., the probability that in a sequence
of n independent coin flips, the coin comes up heads exactly k times.
300                  Chapter 6.       Random Variables and Expectation


    To give an example, assume that n = 4 and k = 2. The table below gives
all (4 choose 2) = 6 sequences of 4 coin flips that contain exactly 2 H’s, together
with their probabilities:


                sequence                   probability
                 HHTT        p · p · (1 − p) · (1 − p) = p^2 (1 − p)^2
                 HTHT        p · (1 − p) · p · (1 − p) = p^2 (1 − p)^2
                 HTTH        p · (1 − p) · (1 − p) · p = p^2 (1 − p)^2
                 THHT        (1 − p) · p · p · (1 − p) = p^2 (1 − p)^2
                 THTH        (1 − p) · p · (1 − p) · p = p^2 (1 − p)^2
                 TTHH        (1 − p) · (1 − p) · p · p = p^2 (1 − p)^2

As can be seen from this table, each of the (4 choose 2) sequences has the same
probability p^2 (1 − p)^2 . It follows that, if n = 4,

                          Pr(X = 2) = (4 choose 2) p^2 (1 − p)^2 .

   We now consider the general case. Let n ≥ 1 and k be integers with
0 ≤ k ≤ n. Then, X = k if and only if there are exactly k H’s in the sequence
of n coin flips. The number of such sequences is equal to (n choose k), and each
one of them has probability p^k (1 − p)^{n−k} . Therefore, we have

                    Pr(X = k) = (n choose k) p^k (1 − p)^{n−k} .             (6.3)

As a sanity check, let us use Newton’s Binomial Theorem (i.e., Theorem 3.6.5)
to verify that all probabilities add up to 1:
            Σ_{k=0}^{n} Pr(X = k) = Σ_{k=0}^{n} (n choose k) p^k (1 − p)^{n−k}
                                  = ((1 − p) + p)^n
                                  = 1.

   We are now ready to compute the expected value of the random variable X:
                 E(X) = Σ_{k=0}^{n} k · Pr(X = k)
                      = Σ_{k=0}^{n} k (n choose k) p^k (1 − p)^{n−k}
                      = Σ_{k=1}^{n} k (n choose k) p^k (1 − p)^{n−k} .

Since
                         
                   k (n choose k) = k · n!/(k! (n − k)!)
                                  = n · (n − 1)!/((k − 1)! (n − k)!)
                                  = n (n−1 choose k−1),
we get
                E(X) = Σ_{k=1}^{n} n (n−1 choose k−1) p^k (1 − p)^{n−k} .
By changing the summation variable from k to ℓ + 1, we get

              E(X) = Σ_{ℓ=0}^{n−1} n (n−1 choose ℓ) p^{ℓ+1} (1 − p)^{n−1−ℓ}
                   = pn Σ_{ℓ=0}^{n−1} (n−1 choose ℓ) p^ℓ (1 − p)^{n−1−ℓ} .

By Newton’s Binomial Theorem (i.e., Theorem 3.6.5), the summation is equal
to
                            ((1 − p) + p)^{n−1} = 1.
Therefore, we get

                             E(X) = pn · 1
                                  = pn.
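The closed form can be checked numerically against the defining sum (a sketch of my own; `binomial_expectation` is a hypothetical helper, not from the text):

```python
from math import comb

def binomial_expectation(n, p):
    # E(X) = sum over k of k * Pr(X = k), with Pr(X = k) taken from (6.3).
    return sum(k * comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n + 1))

# The sum agrees with the closed form E(X) = pn.
for n, p in [(4, 0.5), (10, 0.3), (25, 0.9)]:
    assert abs(binomial_expectation(n, p) - p * n) < 1e-9
```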


   We have done the following: Our intuition told us that E(X) = pn. Then,
we went through a painful calculation to show that our intuition was correct.
There must be an easier way to show that E(X) = pn. We will show below
that there is indeed a much easier way.


6.7.2     Using the Linearity of Expectation
We define a sequence X1 , X2 , . . . , Xn of random variables as follows: For each
i with 1 ≤ i ≤ n,
                   
                Xi = 1 if the i-th coin flip results in heads,
                     0 if the i-th coin flip results in tails.

Observe that
                           X = X1 + X2 + · · · + Xn ,
because

   • X counts the number of heads in the sequence of n coin flips, and

   • the summation on the right-hand side is equal to the number of 1’s
     in the sequence X1 , X2 , . . . , Xn , which, by definition, is equal to the
     number of heads in the sequence of n coin flips.

Using the Linearity of Expectation (see Theorem 6.5.2), we get
                          E(X) = E( Σ_{i=1}^{n} Xi )
                               = Σ_{i=1}^{n} E(Xi ) .

Thus, we have to determine the expected value of Xi . Since Xi is either 1
or 0, we have, using Lemma 6.4.3,

               E (Xi ) =   1 · Pr (Xi = 1) + 0 · Pr (Xi = 0)
                       =   Pr (Xi = 1)
                       =   Pr(the i-th coin flip results in heads)
                       =   p.


We conclude that
                             E(X) = Σ_{i=1}^{n} E(Xi )
                                  = Σ_{i=1}^{n} p
                                  = pn.

I hope you agree that this is much easier than what we did before.
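As an independent sanity check (a simulation sketch of my own, not part of the text), flipping a biased coin n times and averaging the number of heads over many trials reproduces E(X) = pn:

```python
import random

rng = random.Random(5)
n, p = 40, 0.25
trials = 50_000
# Each trial flips the coin n times and counts the heads.
avg = sum(sum(1 for _ in range(n) if rng.random() < p)
          for _ in range(trials)) / trials
assert abs(avg - p * n) < 0.2   # E(X) = pn = 10 here
```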
    The distribution function of the random variable X is given by (6.3).
This function is called a binomial distribution:

Definition 6.7.1 Let n ≥ 1 be an integer and let p be a real number with
0 < p < 1. A random variable X has a binomial distribution with parameters
n and p, if its distribution function satisfies
                                       
                        Pr(X = k) = (n choose k) p^k (1 − p)^{n−k}

for any integer k with 0 ≤ k ≤ n.

   Our calculation that led to the value of E(X) proves the following theo-
rem:

Theorem 6.7.2 Let n ≥ 1 be an integer, let p be a real number with 0 <
p < 1, and let X be a random variable that has a binomial distribution with
parameters n and p. Then
                              E(X) = pn.


6.8      Indicator Random Variables
In Section 6.7, we considered the random variable X whose value is equal
to the number of heads in a sequence of n independent coin flips. In Sec-
tion 6.7.2, we defined a sequence X1 , X2 , . . . , Xn of random variables, where
Xi = 1 if the i-th coin flip results in heads and Xi = 0 otherwise. This
random variable Xi indicates whether or not the i-th flip in the sequence is
heads. Because of this, we call Xi an indicator random variable.


Definition 6.8.1 A random variable X is an indicator random variable, if
it can only take values in {0, 1}.

   As we have already seen in Section 6.7.2, the expected value of an indi-
cator random variable is easy to determine:

Lemma 6.8.2 If X is an indicator random variable, then

                                E(X) = Pr(X = 1).

Proof. Since X is either 0 or 1, we have, using Lemma 6.4.3,

                    E(X) = 0 · Pr(X = 0) + 1 · Pr(X = 1)
                         = Pr(X = 1).



   In the following subsections, we will see some examples of how indicator
random variables can be used to compute the expected value of non-trivial
random variables.
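As a tiny warm-up illustration of Lemma 6.8.2 (my own example, not from the text): let X be the indicator of the event that a fair six-sided die shows a six, so Pr(X = 1) = 1/6:

```python
import random

rng = random.Random(1)
trials = 100_000
# X is the indicator of the event "a fair six-sided die shows a six".
samples = [1 if rng.randrange(6) == 0 else 0 for _ in range(trials)]
avg = sum(samples) / trials
# By Lemma 6.8.2, E(X) = Pr(X = 1) = 1/6.
assert abs(avg - 1 / 6) < 0.01
```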

6.8.1       Runs in Random Bitstrings
Let n be a large integer. We generate a random bitstring

                                    R = r1 r2 . . . rn

by flipping a fair coin, independently, n times. Let k ≥ 1 be an integer.
Recall from Section 5.14 that a run of length k is a consecutive subsequence
of R, all of whose bits are equal. Define the random variable X to be the
number of runs of length k.
   For example, if R is the bitstring

        0   0   1   1   1   1   1    0   0    0    1      1   0     0    0   0
        1   2   3   4   5   6   7    8   9    10   11    12   13   14   15   16

and k = 3, then X = 6, because R contains 6 runs of length 3, starting at
positions 3, 4, 5, 8, 13, and 14.
   We want to determine the expected value E(X) of X.


    A run of length k can start at any of the positions 1, 2, . . . , n − k + 1. Our
approach will be to define an indicator random variable that tells us whether
or not the subsequence of length k that starts at any such position is a run.
Thus, for any i with 1 ≤ i ≤ n − k + 1, we define the indicator random
variable
                
          Xi = 1 if the subsequence ri ri+1 . . . ri+k−1 is a run,
               0 otherwise.

Using Lemma 6.8.2, we get

                             E (Xi ) = Pr (Xi = 1) .

Since Xi = 1 if and only if all bits in the subsequence ri ri+1 . . . ri+k−1 are 0
or all bits in this subsequence are 1, we have

                          E (Xi ) = Pr (Xi = 1)
                                  = (1/2)^k + (1/2)^k
                                  = 1/2^{k−1} .

Since
                                X = Σ_{i=1}^{n−k+1} Xi ,

the Linearity of Expectation (see Theorem 6.5.2) implies that
                         E(X) = E( Σ_{i=1}^{n−k+1} Xi )
                              = Σ_{i=1}^{n−k+1} E(Xi )
                              = Σ_{i=1}^{n−k+1} 1/2^{k−1}
                              = (n − k + 1)/2^{k−1} .
   Observe that the random variables X1 , X2 , . . . , Xn−k+1 are not mutually
independent. (Do you see why?) Nevertheless, our derivation is correct,
because the Linearity of Expectation is valid for any sequence of random
variables; independence is not needed.
   For example, if we take k = 1 + log n, then 2^{k−1} = 2^{log n} = n, so that

                     E(X) = (n − log n)/n = 1 − (log n)/n.
Thus, for large values of n, the expected number of runs of length 1 + log n
is very close to 1. This is in line with Section 5.14, because we proved there
that it is very likely that the sequence contains a run of length about log n.
    If we take k = 1 + (1/2) log n, then

                    2^{k−1} = 2^{(log n)/2} = 2^{log √n} = √n

and

                E(X) = (n − (1/2) log n)/√n = √n − (log n)/(2√n).

Thus, for large values of n, the expected number of runs of length 1+(1/2) log n
is very close to √n.
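The formula E(X) = (n − k + 1)/2^{k−1} can also be checked by simulation (a sketch of my own; the names, seed, and parameters are arbitrary choices):

```python
import random

def count_runs(bits, k):
    # Number of positions i at which bits[i : i + k] consists of equal
    # bits, i.e., the number of runs of length k (runs may overlap).
    return sum(1 for i in range(len(bits) - k + 1)
               if all(b == bits[i] for b in bits[i:i + k]))

# The example from the text: 6 runs of length 3.
example = [0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0]
assert count_runs(example, 3) == 6

rng = random.Random(42)
n, k = 256, 5
trials = 5_000
avg = sum(count_runs([rng.randrange(2) for _ in range(n)], k)
          for _ in range(trials)) / trials
expected = (n - k + 1) / 2 ** (k - 1)   # = 252/16 = 15.75
assert abs(avg - expected) < 0.5
```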

6.8.2      Largest Elements in Prefixes of Random Permutations
Let n ≥ 1 be an integer and consider a sequence s1 , s2 , . . . , sn of n numbers.
The following algorithm computes the largest element in this sequence:
      Algorithm FindMax(s1 , s2 , . . . , sn ):

            max = −∞;
            for i = 1 to n
            do if si > max
                then max = si          (*)
                endif
            endfor;
            return max

     We would like to know the number of times that line (*) is executed,
i.e., the number of times that the value of the variable max changes. For
example, if the input sequence is
                                   3, 2, 5, 4, 6, 1,
then the value of max changes 3 times, namely when we encounter 3, 5,
and 6. On the other hand, for the sequence

                                  6, 5, 4, 3, 2, 1,

the value of max changes only once, whereas for

                                  1, 2, 3, 4, 5, 6,

it changes 6 times.
    Assume that the input sequence s1 , s2 , . . . , sn is a uniformly random per-
mutation of the set {1, 2, . . . , n}. Thus, each permutation has probability
1/n! of being the input. We define a random variable X whose value is equal
to the number of times that line (*) is executed when running algorithm
FindMax(s1 , s2 , . . . , sn ). We are interested in the expected value E(X) of
this random variable.
    The algorithm makes n iterations. In each iteration, line (*) is either
executed or not executed. We define, for each iteration, an indicator random
variable that tells us whether or not line (*) is executed during that iteration.
That is, for any i with 1 ≤ i ≤ n, we define
                  
           Xi = 1 if line (*) is executed in the i-th iteration,
                0 otherwise.

Since
                                  X = Σ_{i=1}^{n} Xi ,

it follows from the Linearity of Expectation (see Theorem 6.5.2) that
                           E(X) = E( Σ_{i=1}^{n} Xi )
                                = Σ_{i=1}^{n} E(Xi )
                                = Σ_{i=1}^{n} Pr (Xi = 1) .

    How do we determine Pr (Xi = 1)? Observe that Xi = 1 if and only
if the maximum of the subsequence s1 , s2 , . . . , si is at the last position in
this subsequence. Since the entire sequence s1 , s2 , . . . , sn is a uniformly ran-
dom permutation of the set {1, 2, . . . , n}, the elements in the subsequence
s1 , s2 , . . . , si are in a uniformly random order as well. The largest element in
this subsequence is in any of the i positions with equal probability 1/i. In
particular, the probability that the largest element is at the last position in
this subsequence is equal to 1/i. It follows that

                                 Pr (Xi = 1) = 1/i.

This can be proved in a more formal way as follows: By the Product Rule,
the number of permutations s1 , s2 , . . . , sn of {1, 2, . . . , n} for which si is the
largest element among s1 , s2 , . . . , si is equal to
                        
                      (n choose i) (i − 1)! (n − i)! = n!/i.

(Do you see why?) Therefore,

                                              n!/i
                             Pr (Xi = 1) =         = 1/i.
                                               n!
Thus,
                          E(X) = Σ_{i=1}^{n} Pr (Xi = 1)
                               = Σ_{i=1}^{n} 1/i
                               = 1 + 1/2 + 1/3 + · · · + 1/n.
The number on the right-hand side is called the harmonic number and de-
noted by Hn . In the following subsection, we will show that Hn is approx-
imately equal to ln n. Thus, the expected number of times that line (*) of
algorithm FindMax is executed, when given as input a uniformly random
permutation of {1, 2, . . . , n}, is about ln n.
   As a final remark, the indicator random variables X1 , X2 , . . . , Xn that we
have introduced above are mutually independent; see Exercise 5.98. Keep in
mind, however, that we do not need this, because the Linearity of Expectation
does not require these random variables to be mutually independent.
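The claim E(X) = Hn can be checked against both the worked examples and a simulation (a sketch of my own; the helper name and seed are arbitrary):

```python
import random

def count_max_updates(seq):
    # Number of times line (*) of algorithm FindMax is executed,
    # i.e., the number of times the running maximum changes.
    best = float('-inf')
    updates = 0
    for s in seq:
        if s > best:
            best = s
            updates += 1
    return updates

# The three examples from the text:
assert count_max_updates([3, 2, 5, 4, 6, 1]) == 3
assert count_max_updates([6, 5, 4, 3, 2, 1]) == 1
assert count_max_updates([1, 2, 3, 4, 5, 6]) == 6

rng = random.Random(7)
n = 200
h_n = sum(1 / i for i in range(1, n + 1))
perm = list(range(1, n + 1))
trials = 20_000
total = 0
for _ in range(trials):
    rng.shuffle(perm)
    total += count_max_updates(perm)
assert abs(total / trials - h_n) < 0.1   # E(X) = H_n
```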


6.8.3     Estimating the Harmonic Number
Consider a positive real-valued decreasing function f : [1, ∞) → R. Thus, if
1 ≤ x < x′ , then f (x) ≥ f (x′ ) > 0. For any integer n ≥ 1, we would like to
estimate the summation
                                 Σ_{i=1}^{n} f (i).

For example, if we take f (x) = 1/x, then the summation is the harmonic
number Hn of the previous subsection.
   For each i with 2 ≤ i ≤ n, draw the rectangle with bottom-left corner at
the point (i − 1, 0) and top-right corner at the point (i, f (i)), as in the figure
below.




   [Figure: the graph of y = f (x), with the rectangles with corners (i − 1, 0)
   and (i, f (i)) drawn below the curve for i = 2, . . . , n.]
   The area of the i-th rectangle is equal to f (i) and, thus,
                                 Σ_{i=1}^{n} f (i)

is equal to the sum of
   • f (1) and
   • the total area of the n − 1 rectangles.
Since f is decreasing, the rectangles are below the graph y = f (x). It follows
that the total area of the n − 1 rectangles is less than or equal to the area
between f and the x-axis, between x = 1 and x = n. We conclude that
                    Σ_{i=1}^{n} f (i) ≤ f (1) + ∫_1^n f (x) dx.            (6.4)


    To obtain a lower bound on the summation, we modify the figure as
indicated below: For each i with 1 ≤ i ≤ n, draw the rectangle with bottom-
left corner at the point (i, 0) and top-right corner at the point (i + 1, f (i));
see the figure below.




   [Figure: the graph of y = f (x), with the rectangles with corners (i, 0)
   and (i + 1, f (i)) drawn for i = 1, . . . , n; the curve stays below their
   top sides.]

   In this case, the graph y = f (x) is below the top sides of the rectangles
and, therefore,
                    Σ_{i=1}^{n} f (i) ≥ ∫_1^{n+1} f (x) dx.                (6.5)

   If we apply (6.4) and (6.5) to the function f (x) = 1/x, then we get
                          Hn = Σ_{i=1}^{n} 1/i
                             ≤ 1 + ∫_1^n dx/x
                             = 1 + ln n

and

                          Hn = Σ_{i=1}^{n} 1/i
                             ≥ ∫_1^{n+1} dx/x
                             = ln(n + 1)
                             ≥ ln n.
We have proved the following result:

Lemma 6.8.3 For any integer n ≥ 1, the harmonic number Hn = Σ_{i=1}^{n} 1/i
satisfies
                       ln n ≤ Hn ≤ 1 + ln n.
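Lemma 6.8.3 is easy to check numerically (a quick sketch of my own):

```python
from math import log

for n in [1, 2, 10, 1_000, 10**6]:
    h_n = sum(1 / i for i in range(1, n + 1))
    # ln n <= H_n <= 1 + ln n; at n = 1 the upper bound is an equality.
    assert log(n) <= h_n <= 1 + log(n)
```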


6.9        The Insertion-Sort Algorithm
InsertionSort is a simple sorting algorithm that takes as input an array
A[1 . . . n] of numbers. The algorithm uses a for-loop in which a variable i
runs from 2 to n. At the start of the i-th iteration,

   • the subarray A[1 . . . i − 1] is sorted, whereas

   • the algorithm has not yet seen any of the elements in the subarray
     A[i . . . n].

In the i-th iteration, the algorithm takes the element A[i] and repeatedly
swaps it with its left neighbor until the subarray A[1 . . . i] is sorted. The
pseudocode of this algorithm is given below.
       Algorithm InsertionSort(A[1 . . . n]):

            for i = 2 to n
            do j = i;
               while j > 1 and A[j] < A[j − 1]
               do swap A[j] and A[j − 1];
                   j =j−1
               endwhile
            endfor

   We are interested in the total number of swaps that are made by this
algorithm. The worst case happens when the input array is sorted in reverse
order, in which case the total number of swaps is equal to

                    1 + 2 + 3 + · · · + (n − 1) = (n choose 2).

Thus, in the worst case, each of the (n choose 2) pairs of input elements is
swapped.

   Assume that the input array A[1 . . . n] contains a uniformly random per-
mutation of the set {1, 2, . . . , n}. Thus, each permutation has probability
1/n! of being the input. We define the random variable X to be the total
number of swaps made when running algorithm InsertionSort(A[1 . . . n]).
We will determine the expected value E(X) of X.
    Since we want to count the number of pairs of input elements that are
swapped, we will use, for each pair of input elements, an indicator random
variable that indicates whether or not this pair gets swapped by the algo-
rithm. That is, for each a and b with 1 ≤ a < b ≤ n, we define
                  
           Xab = 1 if a and b get swapped by the algorithm,
                 0 otherwise.
We observe that, since a < b, these two elements get swapped if and only
if in the input array, b is to the left of a. Since the input array contains a
uniformly random permutation, the events “b is to the left of a” and “a is to
the left of b” are symmetric. Therefore, we have
                         E (Xab ) = Pr (Xab = 1) = 1/2.
A formal proof of this is obtained by showing that there are n!/2 permutations
of {1, 2, . . . , n} in which b appears to the left of a and, thus, n!/2 permutations
in which a appears to the left of b. (See also Exercise 5.71.)
    Since each pair of input elements is swapped at most once, we have
                           X = Σ_{a=1}^{n−1} Σ_{b=a+1}^{n} Xab .

It follows from the Linearity of Expectation (see Theorem 6.5.2) that
                    E(X) = E( Σ_{a=1}^{n−1} Σ_{b=a+1}^{n} Xab )
                         = Σ_{a=1}^{n−1} Σ_{b=a+1}^{n} E (Xab )
                         = Σ_{a=1}^{n−1} Σ_{b=a+1}^{n} 1/2
                         = (1/2) (n choose 2).
Thus, the expected number of swaps on a uniformly random input array is
one half times the worst-case number of swaps.
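Both the worst-case count and the expected count can be checked with a direct Python transcription of the pseudocode (a sketch of my own; the seed and parameters are arbitrary):

```python
import random

def insertion_sort_swaps(a):
    # Run InsertionSort on a copy of the input and count the swaps.
    a = list(a)
    swaps = 0
    for i in range(1, len(a)):
        j = i
        while j > 0 and a[j] < a[j - 1]:
            a[j], a[j - 1] = a[j - 1], a[j]
            swaps += 1
            j -= 1
    return swaps

# Worst case: a reverse-sorted array needs C(n, 2) swaps (15 for n = 6).
assert insertion_sort_swaps([6, 5, 4, 3, 2, 1]) == 15

rng = random.Random(3)
n = 20
perm = list(range(1, n + 1))
trials = 50_000
total = 0
for _ in range(trials):
    rng.shuffle(perm)
    total += insertion_sort_swaps(perm)
expected = (n * (n - 1) / 2) / 2   # half the worst case: 95 for n = 20
assert abs(total / trials - expected) < 1.0
```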


6.10      The Quick-Sort Algorithm
We have already seen algorithm QuickSort in Section 1.3. This algorithm
takes as input an array A[1 . . . n] of numbers, which we assume for simplicity
to be pairwise distinct. A generic call QuickSort(A, i, j) takes two indices
i and j and sorts the subarray A[i . . . j]. Thus, the call QuickSort(A, 1, n)
sorts the entire array.
    Algorithm QuickSort(A, i, j):

          if i < j
          then p = uniformly random element in A[i . . . j];
                 compare p with all other elements in A[i . . . j];
                 rearrange A[i . . . j] such that it has the following
                 form (this rearranging defines the value of k):

                                <p         p      >p
                         i                 k             j

                  QuickSort(A, i, k − 1);
                  QuickSort(A, k + 1, j)
          endif

    The element p is called the pivot. We have seen in Section 1.3 that the
worst-case running time of algorithm QuickSort(A, 1, n) is Θ(n^2 ). In this
section, we will prove that the expected running time is only O(n log n).
    We assume for simplicity that the input array is a permutation of the set
{1, 2, . . . , n}. We do not make any other assumption about the input. In
particular, we do not assume that the input is a random permutation. The
only place where randomization is used is when the pivot is chosen: It is
chosen uniformly at random in the subarray on which QuickSort is called.
    The quantity that we will analyze is the total number of comparisons
(between pairs of input elements) that are made during the entire execution
of algorithm QuickSort(A, 1, n). In such a comparison, the algorithm takes
two distinct input elements, say a and b, and decides whether a < b or a > b.
Observe from the pseudocode that the only comparisons being made are
between the pivot and all other elements in the subarray that is the input
to the current call to QuickSort. Since the operation “compare a to b” is
the same as the operation “compare b to a” (even though the outcomes are
opposite), we will assume below that in such a comparison, a < b.
   We define the random variable X to be the total number of comparisons
that are made by algorithm QuickSort(A, 1, n). We will prove that the
expected value of X satisfies E(X) = O(n log n).
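Before the formal analysis, a simulation sketch (my own; it partitions into new lists rather than rearranging in place, which does not change the number of comparisons made) suggests that the comparison count indeed grows like n log n:

```python
import random

def quicksort_comparisons(a, rng):
    # Comparisons made by randomized QuickSort on distinct numbers:
    # each call compares the pivot with all other elements, then recurses.
    if len(a) <= 1:
        return 0
    p = rng.choice(a)
    smaller = [x for x in a if x < p]
    larger = [x for x in a if x > p]
    return (len(a) - 1) + quicksort_comparisons(smaller, rng) \
                        + quicksort_comparisons(larger, rng)

rng = random.Random(11)
n = 100
trials = 2_000
avg = sum(quicksort_comparisons(list(range(1, n + 1)), rng)
          for _ in range(trials)) / trials
h_n = sum(1 / i for i in range(1, n + 1))
# The average comparison count lies well inside an O(n log n) band.
assert n * h_n / 2 < avg < 2 * n * h_n
```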
   For each a and b with 1 ≤ a < b ≤ n, we consider the indicator random
variable
               
           Xab = 1 if a and b are compared to each other when
                   running QuickSort(A, 1, n),
                 0 otherwise.

Since each pair of input elements is compared at most once, we have
                           X = Σ_{a=1}^{n−1} Σ_{b=a+1}^{n} Xab .


It follows from the Linearity of Expectation (see Theorem 6.5.2) that
                    E(X) = E( Σ_{a=1}^{n−1} Σ_{b=a+1}^{n} Xab )
                         = Σ_{a=1}^{n−1} Σ_{b=a+1}^{n} E (Xab )
                         = Σ_{a=1}^{n−1} Σ_{b=a+1}^{n} Pr (Xab = 1) .


    We consider two input elements a and b with 1 ≤ a < b ≤ n. We are
going to determine Pr (Xab = 1), i.e., the probability that the elements a and
b are compared to each other when running algorithm QuickSort(A, 1, n).
Consider the set
                          Sab = {a, a + 1, . . . , b}.
   At the start of algorithm QuickSort(A, 1, n), all elements of the set Sab
are part of the input. Consider the first pivot p that is chosen. We observe
the following:

   • Assume that p ∉ Sab .
        – If p < a, then after the algorithm has rearranged the input ar-
          ray, all elements of the set Sab are to the right of p and, thus,
          all these elements are part of the input for the recursive call
          QuickSort(A, k+1, n). During the rearranging, a and b have not
          been compared to each other. However, they may be compared
          to each other during later recursive calls.
        – If p > b, then after the algorithm has rearranged the input ar-
          ray, all elements of the set Sab are to the left of p and, thus,
          all these elements are part of the input for the recursive call
          QuickSort(A, 1, k−1). During the rearranging, a and b have not
          been compared to each other. However, they may be compared
          to each other during later recursive calls.

   • Assume that p ∈ Sab .

         – If p ≠ a and p ≠ b, then after the algorithm has rearranged the
          input array, a is to the left of p and b is to the right of p. During the
          rearranging, a and b have not been compared to each other. Also,
          since a and b have been “separated”, they will not be compared
          to each other during later recursive calls. Thus, we have Xab = 0.
        – If p = a or p = b, then during the rearranging, a and b have been
          compared to each other. Thus, we have Xab = 1. (Note that in
          later recursive calls, a and b will not be compared to each other
          again.)

    We conclude that whether or not a and b are compared to each other is
completely determined by the element of the set Sab that is the first element
in this set to be chosen as a pivot. If this element is equal to a or b, then
Xab = 1. On the other hand, if this element belongs to Sab \ {a, b}, then
Xab = 0. Since

   • in any recursive call, the pivot is chosen uniformly at random from the
     subarray that is the input for this call, and

   • at the start of the first recursive call in which the pivot belongs to the
     set Sab , all elements of this set are part of the input for this call,

each of the b − a + 1 elements of Sab has the same probability of being the
first element of Sab that is chosen as a pivot. It follows that
\[
  \Pr(X_{ab} = 1) = \frac{2}{b-a+1} .
\]
   We conclude that
\begin{align*}
  E(X) &= \sum_{a=1}^{n-1} \sum_{b=a+1}^{n} \Pr(X_{ab} = 1) \\
       &= \sum_{a=1}^{n-1} \sum_{b=a+1}^{n} \frac{2}{b-a+1} \\
       &= 2 \sum_{a=1}^{n-1} \left( \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n-a+1} \right) \\
       &\le 2 \sum_{a=1}^{n-1} \left( \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n} \right) \\
       &= 2 \sum_{a=1}^{n-1} (H_n - 1) \\
       &= 2(n-1)(H_n - 1) \\
       &\le 2n (H_n - 1),
\end{align*}

where Hn is the harmonic number that we have seen in Sections 6.8.2 and 6.8.3.
Using Lemma 6.8.3, it follows that

                                E(X) ≤ 2n ln n.
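The bound can be checked numerically. The following sketch (in Python; the function name is ours, not the book's) computes the exact sum of 2/(b − a + 1) over all pairs a < b and compares it with 2n ln n:

```python
import math

def expected_comparisons(n):
    """Exact expected number of comparisons E(X) of randomized quick-sort:
    the sum, over all pairs 1 <= a < b <= n, of Pr(X_ab = 1) = 2/(b - a + 1)."""
    return sum(2.0 / (b - a + 1)
               for a in range(1, n)
               for b in range(a + 1, n + 1))

for n in (16, 64, 256):
    exact = expected_comparisons(n)
    bound = 2 * n * math.log(n)
    # the derivation above guarantees exact <= bound
    assert exact <= bound
    print(n, round(exact, 1), round(bound, 1))
```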


6.11       Skip Lists
Consider a set S of n numbers. We would like to store these numbers in a
data structure that supports the following operations:
   • Search(x): This operation returns the largest element in the set S
     that is less than or equal to x.

   • Insert(x): This operation inserts the number x into the set S.

   • Delete(x): This operation deletes the number x from the set S.


    A standard data structure for this problem is a balanced binary search
tree (such as a red-black tree or an AVL-tree), which allows each of these three
operations to be performed in O(log n) time. Searching in a binary search
tree is straightforward, but keeping the tree balanced after an insertion or
deletion is cumbersome.
    In this section, we introduce skip lists as an alternative data structure.
A skip list is constructed using the outcomes of coin flips, which result in a
structure that is balanced in the expected sense. Because of this, the insertion
and deletion algorithms become straightforward: we, as programmers, do
not have to take care of rebalancing operations, because the coin flips take
care of this.
    To define a skip list for the set S of n numbers, we first construct a
sequence S0 , S1 , S2 , . . . of subsets of S:
   • Let S0 = S.
   • For i = 0, 1, 2, . . ., assume that the set Si has already been constructed.
     If Si is non-empty, we do the following:
         – Initialize an empty set Si+1 .
         – For each element y in the set Si , flip a fair and independent coin.
           If the coin comes up heads, element y is added to the set Si+1 .
The process terminates as soon as the next set Si+1 is empty.
   Let h be the number of non-empty sets that are constructed by this
process, and consider the sequence S0 , S1 , . . . , Sh of sets. Observe that h is a
random variable and each of the sets S1 , S2 , . . . , Sh is a random subset of S.
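The random construction of the sets S0 , S1 , S2 , . . . can be sketched in a few lines of Python; the function name and the representation of each set as a sorted list are ours:

```python
import random

def build_levels(S, rng):
    """Build the sets S_0, S_1, ..., S_h by repeated fair coin flips:
    each element of S_i is put into S_{i+1} with probability 1/2,
    and the process stops as soon as the next set is empty."""
    levels = [sorted(S)]                      # S_0 = S
    while True:
        nxt = [y for y in levels[-1] if rng.random() < 0.5]
        if not nxt:
            break                             # S_{i+1} is empty: stop
        levels.append(nxt)
    return levels                             # levels[i] represents S_i

rng = random.Random(17)
levels = build_levels({1, 2, 3, 4, 6, 7, 9}, rng)
# S_0 is always the full set, and every S_{i+1} is a subset of S_i
assert levels[0] == [1, 2, 3, 4, 6, 7, 9]
assert all(set(levels[i + 1]) <= set(levels[i]) for i in range(len(levels) - 1))
```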
   The skip list for S consists of the following:
   • For each i with 0 ≤ i ≤ h, we store the sorted sequence of elements of
     the set Si in a linked list Li .
         – Each node u of Li stores one element of Si , which is denoted by
           key(u).
         – Each node u of Li stores a pointer to its successor node in Li ,
           which is denoted by right(u). If u is the rightmost node in Li ,
           then right(u) = nil .
         – We add a dummy node at the beginning of Li . The key of this
           node is nil and its successor is the node of Li whose key is the
           smallest element in Si .


   • For each i with 1 ≤ i ≤ h and each node u of Li , u stores a pointer to
     the node u′ in Li−1 for which key(u′ ) = key(u). The node u′ is denoted
     by down(u).



   • There is a pointer to the dummy node in the list Lh . We will refer to
     this node as the root of the skip list.



The value of h is called the height of the skip list. An example of a skip list
of height h = 3 for the set S = {1, 2, 3, 4, 6, 7, 9} is shown in the figure below.




           L3                                                 7

           L2                            3                    7

           L1               1            3      4             7      9

           L0               1      2     3      4      6      7      9




6.11.1      Algorithm Search

The algorithm that searches for a number x keeps track of the current node u
and the index i of the list Li that contains u. Initially, u is the root of the
skip list and i = h. At any moment, if i ≥ 1, the algorithm tests if the key
of right(u) is less than x. If this is the case, then u moves one node to the
right in the list Li ; otherwise, u moves to the node down(u) in the list Li−1 .
Once i = 0, node u moves to the right in the list L0 and stops at the last
node whose key is at most equal to x. The pseudocode of this algorithm
Search(x) is given below.


    Algorithm Search(x):

          // returns the rightmost node u in L0 such that key(u) ≤ x
          u = root of the skip list;
          i = h;
          while i ≥ 1
          do if right(u) ≠ nil and key(right(u)) < x
              then u = right(u)
              else u = down(u);
                   i = i − 1
              endif
          endwhile;
          while right(u) ≠ nil and key(right(u)) ≤ x
          do u = right(u)
          endwhile;
          return u

    The dashed arrows in the figure below show the path that is followed when
running algorithm Search(7). Note that if we replace “key(right(u)) < x”
in the first while-loop by “key(right(u)) ≤ x”, we obtain a different path that
ends in the same node: This path moves from the root to the node in L3
whose key is 7, and then it moves down to the list L0 . As we will see later,
using the condition “key(right(u)) < x” simplifies the algorithm for deleting
an element from the skip list.

           L3                                              7

           L2                          3                   7

           L1             1            3      4            7     9

           L0             1      2     3      4     6      7     9
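For illustration, algorithm Search can be mimicked on a toy representation in which each list Li is an ordinary sorted Python list. This sketch scans every level in full, so it does not have the running time of a real pointer-based skip list; the function name and representation are ours:

```python
def search(levels, x):
    """Mimics Search(x): returns the largest key <= x in L_0, or None if
    every key exceeds x (None plays the role of the dummy node).
    levels[i] is the sorted list of keys stored in L_i."""
    cur = None                                   # key of the current node u
    for depth, level in enumerate(reversed(levels)):
        bottom = (depth == len(levels) - 1)      # are we scanning L_0?
        for key in level:
            if cur is not None and key <= cur:
                continue                         # node lies left of u: already passed
            # upper lists use the strict test key < x, exactly as in the
            # first while-loop; only the bottom list L_0 allows key == x
            if key < x or (bottom and key == x):
                cur = key                        # move u one node to the right
    return cur

# the example skip list from the text
levels = [[1, 2, 3, 4, 6, 7, 9], [1, 3, 4, 7, 9], [3, 7], [7]]
assert search(levels, 7) == 7
assert search(levels, 5) == 4
assert search(levels, 0) is None
```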


6.11.2     Algorithms Insert and Delete
Algorithm Insert(x) takes as input a number x and inserts it into the skip
list. This algorithm works as follows:


   • Run algorithm Search(x) and consider the node u that is returned.
      We assume that key(u) ≠ x and, thus, x is not in the skip list yet. Observe
      that the new number x belongs between the nodes u and right(u).

   • Flip a fair and independent coin repeatedly until it comes up tails for
     the first time. Let k be the number of flips.

   • Add the new number x to the lists L0 , L1 , . . . , Lk−1 . Note that if k ≥
     h + 2, we have to add new lists Lh+1 , . . . , Lk−1 to the skip list (each one
     containing a dummy node and a node storing x), set h = k − 1, and
     update the pointer to the root of the new skip list.

   • When adding x to a list Li , we have to know its predecessor in this list.

        – To find these predecessors, we modify algorithm Search(x) as
          follows: Each time the current node u moves down, we push u
          onto an initially empty stack. In this way, the predecessors that
          we need are stored, in the correct order, on the stack.
        – An easier way that avoids using a stack is to flip the coin and,
          thus, determine k, before running algorithm Search(x). We then
          modify algorithm Search(x): If i < k and the current node u
          moves down, we add the new number x to Li between the nodes
          u and right(u).

The figure below shows the skip list that results when inserting the number 5
into our example skip list. In this case, k = 3 and the new number is added
to the lists L0 , L1 , and L2 . The dashed arrows indicate the pointers that are
changed during this insertion.

         L3                                                     7

         L2                          3             5            7

         L1             1            3      4      5            7      9

         L0             1      2     3      4      5     6      7      9
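On the same toy list-of-sorted-lists representation, the insertion procedure (flip a fair coin until the first tails, then add x to L0 , . . . , Lk−1 ) can be sketched as follows; the names and representation are ours:

```python
import bisect
import random

def insert(levels, x, rng):
    """Sketch of Insert(x): flip a fair coin until the first tails;
    if k coins were flipped in total, add x to L_0, ..., L_{k-1},
    creating new lists on top of the skip list when k - 1 exceeds h."""
    k = 1
    while rng.random() < 0.5:        # count flips up to and including the first tails
        k += 1
    for i in range(k):
        if i == len(levels):
            levels.append([])        # new list L_i above the old top list
        bisect.insort(levels[i], x)  # keep each list sorted

rng = random.Random(5)
levels = [[1, 2, 3, 4, 6, 7, 9], [1, 3, 4, 7, 9], [3, 7], [7]]
insert(levels, 5, rng)
assert levels[0] == [1, 2, 3, 4, 5, 6, 7, 9]   # 5 always ends up in L_0
assert all(set(levels[i + 1]) <= set(levels[i]) for i in range(len(levels) - 1))
```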

    Algorithm Delete(x) takes as input a number x and deletes it from the
skip list. This algorithm does the following:


   • Run a modified version of algorithm Search(x): Each time the current
     node u moves down, test if key(right(u)) = x. If this is the case, delete
     the node right(u) by setting right(u) = right(right(u)). Finally, delete
     the node in L0 whose key is equal to x.

   • At this moment, it may happen that some of the lists Lh , Lh−1 , . . . only
     consist of dummy nodes. If this is the case, delete these lists, and
     update the height h and the root of the new skip list.

   Implementation details of skip lists and algorithms Search, Insert, and
Delete can be found in Pat Morin’s free textbook Open Data Structures,
which is available at http://opendatastructures.org/


6.11.3      Analysis of Skip Lists
In this subsection, we will prove that the expected size of a skip list is O(n)
and the expected running time of algorithm Search is O(log n). This will
imply that the expected running times of algorithms Insert and Delete
are O(log n) as well. Throughout this subsection, we assume for simplicity
that n is a power of 2, so that log n is an integer.
    Consider again the lists L0 , L1 , . . . , Lh in the skip list. For the purpose of
analysis, we define, for each integer i > h, Li to be an empty list.
    For each number x that is stored in the list L0 , we define the random
variable h(x) to be the largest value of i such that x is contained in the list Li .
Thus, x occurs in the lists L0 , L1 , . . . , Lh(x) , but not in the list Lh(x)+1 .

Lemma 6.11.1 For any number x that is stored in the list L0 ,

                                   E(h(x)) = 1.

Proof. The value of h(x) is determined by the following process: flip a
fair coin repeatedly and independently until it comes up tails for the first
time. The value of h(x) is then equal to the number of flips minus one. For
example, if we flip the coin three times (i.e., obtain the sequence HHT ),
then x is contained in the lists L0 , L1 , and L2 , but not in L3 ; thus, h(x) = 2.
By Theorem 6.6.2, the expected number of coin flips is equal to two. As a
result, the expected value of h(x) is equal to one.
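The series behind this proof can also be evaluated directly: h(x) = i exactly when i heads are followed by the first tails, which happens with probability (1/2)^{i+1}. A quick numeric check in Python:

```python
# E(h(x)) = sum over i >= 0 of i * (1/2)**(i + 1): the element x reaches
# level i exactly when i heads are followed by the first tails.
expectation = sum(i * 0.5 ** (i + 1) for i in range(200))
assert abs(expectation - 1.0) < 1e-12   # the series sums to 1, as the lemma states
```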


Lemma 6.11.2 For any number x that is stored in the list L0 and for any
i ≥ 0,
                       Pr (x ∈ Li ) = 1/2^i .

Proof. The claim follows from the fact that x is contained in the list Li if
and only if the first i coin flips for x all result in heads.


Lemma 6.11.3 Let i ≥ 0 and let |Li | denote the number of nodes in the
list Li , ignoring the dummy node. Then,

                                 E (|Li |) = n/2^i .

Proof. We know from Lemma 6.11.2 that each number x in L0 is contained
in Li with probability 1/2^i , independently of the other numbers in L0 .
Therefore, |Li | is a random variable that has a binomial distribution with
parameters n and p = 1/2^i . The claim then follows from Theorem 6.7.2.


Lemma 6.11.4 Let X be the random variable whose value is equal to the
total number of nodes in all lists L0 , L1 , L2 , . . ., ignoring the dummy nodes.
Then,
                                 E(X) = 2n.

Proof. We will give two proofs. In the first proof, we observe that
\[
  X = \sum_{i=0}^{h} |L_i|
\]

and, thus,

\[
  E(X) = E\!\left( \sum_{i=0}^{h} |L_i| \right) .
\]

Observe that the number of terms in the summation on the right-hand side
is equal to h + 1, which is a random variable. In general, the Linearity of
Expectation does not apply to summations consisting of a random number
of terms; see Exercise 6.64 for an example. Therefore, we proceed as follows.
Recall that, for the purpose of analysis, we have defined, for each integer
i > h, Li to be an empty list. It follows that
\[
  X = \sum_{i=0}^{\infty} |L_i| .
\]

Using the Linearity of Expectation (i.e., Theorem 6.5.3) and Lemmas 6.11.3
and 5.15.2, we get
\begin{align*}
  E(X) &= E\!\left( \sum_{i=0}^{\infty} |L_i| \right) \\
       &= \sum_{i=0}^{\infty} E(|L_i|) \\
       &= \sum_{i=0}^{\infty} n/2^i \\
       &= n \sum_{i=0}^{\infty} (1/2)^i \\
       &= 2n .
\end{align*}
   In the second proof, we use the fact that each number x in L0 occurs in
exactly 1 + h(x) lists, namely L0 , L1 , . . . , Lh(x) . Thus, we have
\[
  X = \sum_{x} (1 + h(x)) .
\]

Using the Linearity of Expectation (i.e., Theorem 6.5.2) and Lemma 6.11.1,
we get
\begin{align*}
  E(X) &= E\!\left( \sum_{x} (1 + h(x)) \right) \\
       &= \sum_{x} E(1 + h(x)) \\
       &= \sum_{x} (1 + E(h(x))) \\
       &= \sum_{x} 2 \\
       &= 2n .
\end{align*}
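Both proofs rest on the same fact: each element contributes 1 + h(x) nodes, i.e., 2 nodes on average. A seeded simulation (Python; the function name is ours) confirms that the average total size is close to 2n:

```python
import random

def skiplist_size(n, rng):
    """Total number of non-dummy nodes over all lists L_0, L_1, ...:
    each of the n elements contributes 1 + h(x) nodes, which equals the
    number of coin flips up to and including the first tails."""
    total = 0
    for _ in range(n):
        flips = 1
        while rng.random() < 0.5:
            flips += 1
        total += flips
    return total

rng = random.Random(42)
n, trials = 64, 4000
avg = sum(skiplist_size(n, rng) for _ in range(trials)) / trials
# E(X) = 2n = 128; the empirical average should be close to that
assert 0.9 * 2 * n <= avg <= 1.1 * 2 * n
```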




Lemma 6.11.5 Recall that h is the random variable whose value is equal to
the height of the skip list. We have

                               E(h) ≤ log n + 1.

Proof. Since
\[
  h = \max_{x} h(x),
\]
we have
\[
  E(h) = E\!\left( \max_{x} h(x) \right) .
\]

It is tempting, but wrong, to think that this is equal to

\[
  \max_{x} E(h(x)) ,
\]

which is equal to 1 by Lemma 6.11.1. (In Exercise 6.63, you will find a simple
example showing that, in general, the expected value of a maximum is not
equal to the maximum of the expected values.)
   To prove a correct upper bound on E(h), we introduce, for each integer
i ≥ 1, an indicator random variable
\[
  X_i =
  \begin{cases}
    1 & \text{if the list $L_i$ stores at least one number,} \\
    0 & \text{otherwise.}
  \end{cases}
\]

We observe that
\[
  h = \sum_{i=1}^{\infty} X_i .
\]

Since Xi is either 0 or 1, it is obvious that

                                  E (Xi ) ≤ 1.                           (6.6)

We next claim that
                                   Xi ≤ |Li |.                           (6.7)
To justify this, if the list Li does not store any number, then (6.7) becomes
0 ≤ 0, which is a true statement. On the other hand, if the list Li stores
at least one number, then (6.7) becomes 1 ≤ |Li |, which is again a true
statement. Combining (6.7) with Lemmas 6.4.2 and 6.11.3, we obtain

                         E (Xi ) ≤ E (|Li |) = n/2^i .                          (6.8)

Using the Linearity of Expectation (i.e., Theorem 6.5.3), we get
\begin{align*}
  E(h) &= E\!\left( \sum_{i=1}^{\infty} X_i \right) \\
       &= \sum_{i=1}^{\infty} E(X_i) \\
       &= \sum_{i=1}^{\log n} E(X_i) + \sum_{i=\log n + 1}^{\infty} E(X_i) .
\end{align*}


If we apply (6.6) to the first summation and (6.8) to the second summation,
we get
\begin{align*}
  E(h) &\le \sum_{i=1}^{\log n} 1 + \sum_{i=\log n + 1}^{\infty} \frac{n}{2^i} \\
       &= \log n + \sum_{j=0}^{\infty} \frac{n}{2^{\log n + 1 + j}} \\
       &= \log n + \sum_{j=0}^{\infty} \frac{n}{n \cdot 2^{1+j}} \\
       &= \log n + \sum_{j=0}^{\infty} \frac{1}{2^{1+j}} \\
       &= \log n + \frac{1}{2} \sum_{j=0}^{\infty} \frac{1}{2^j} \\
       &= \log n + \frac{1}{2} \cdot 2 \\
       &= \log n + 1 .
\end{align*}
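Since h ≥ i exactly when the list Li is non-empty, we also have E(h) = Σ_{i≥1} Pr(h ≥ i) = Σ_{i≥1} (1 − (1 − 2^{−i})^n), which can be evaluated numerically and compared with the bound log n + 1. A Python sketch (the function name is ours):

```python
import math

def expected_height(n, cutoff=200):
    """E(h) = sum over i >= 1 of Pr(h >= i); the list L_i is non-empty
    exactly when some element survives i promotions, so
    Pr(h >= i) = 1 - (1 - 2**-i)**n.  Terms beyond `cutoff` are negligible."""
    return sum(1.0 - (1.0 - 2.0 ** -i) ** n for i in range(1, cutoff))

for n in (16, 256, 1024):           # powers of 2, as assumed in the analysis
    assert expected_height(n) <= math.log2(n) + 1
```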


Lemma 6.11.6 Let Y be the random variable whose value is equal to the
total number of nodes in all lists L0 , L1 , L2 , . . ., including the dummy nodes.
Then
                          E(Y ) ≤ 2n + log n + 2.

Proof. The total number of dummy nodes is equal to h + 1. Using the
notation of Lemma 6.11.4, we have
                                Y = X + h + 1.
Thus, using the Linearity of Expectation (i.e., Theorem 6.5.2) and Lem-
mas 6.11.4 and 6.11.5, we get
                        E(Y ) =        E(X + h + 1)
                              =        E(X) + E(h) + 1
                              ≤        2n + (log n + 1) + 1
                              =        2n + log n + 2.



    Consider any number x. As we have seen in Section 6.11.1, algorithm
Search(x) starts at the root of the skip list and follows a path to the right-
most node u in the bottom list L0 for which key(u) ≤ x. We will refer to
this path as the search path of the algorithm. In the figure below, you see
the same skip list as we have seen before. The dashed arrows indicate the
search path of algorithm Search(7).
           L3                                                 7

           L2                              3                  7

           L1               1              3      4           7      9

           L0               1      2       3      4     6     7      9

Lemma 6.11.7 For any number x, let N be the random variable whose value
is equal to the number of nodes on the search path of algorithm Search(x).
Then,
                            E(N ) ≤ 2 log n + 5.


Proof. Consider the node u that is returned by algorithm Search(x), let v
be the second last node on the search path, and let P be the part of this
search path from the root to v. In the example above, u is the node in L0
whose key is 7, v is the node in L0 whose key is 6, and P is the part of the
dashed path from the root to v.
   Let M be the random variable whose value is equal to the number of
nodes on P . Then, N = M + 1 and

                       E(N ) = E(M + 1) = E(M ) + 1.

Thus, it suffices to prove that

                             E(M ) ≤ 2 log n + 4.

    Consider the following path P′ in the skip list:
   • P′ starts at node v.
   • At any node on P′ , the path P′ moves up one level if this is possible,
     and moves one node to the left otherwise.
You should convince yourself that this path P′ is the reverse of P and, there-
fore, M is equal to the number of nodes on P′ . You should also convince
yourself that this may not be true if we take for P the path from the root
to u.
    For each i ≥ 0, let Mi be the random variable whose value is equal to the
number of nodes in the list Li at which the path P′ moves one node to the
left. Then, M is the sum of
   • h: these are the nodes on P′ at which P′ moves up one level,
   • 1: this accounts for the last node on P′ , which is the root, and
   • $\sum_{i=0}^{h} M_i$.

Thus,
\begin{align*}
  E(M) &= E\!\left( h + 1 + \sum_{i=0}^{h} M_i \right) \\
       &= E(h) + 1 + E\!\left( \sum_{i=0}^{h} M_i \right) .
\end{align*}
As in the proof of Lemma 6.11.4, the number of terms in the latter summation
is equal to h + 1, which is a random variable. Therefore, we cannot apply
the Linearity of Expectation to this sum. As in the proof of Lemma 6.11.4,
we proceed as follows. We first observe that
\[
  M = h + 1 + \sum_{i=0}^{\infty} M_i .
\]

As the figure below indicates, the random variable Mi can be interpreted as
being the number of tails obtained when flipping a fair coin until it comes
up heads for the first time. Since (i) the list Li may be empty (in which case
Mi = 0) or (ii) the portion of the path P′ in Li may terminate because it
reaches the dummy node, Mi is in fact less than or equal to the number of
tails.
                 Li+1 :

                 Li :       H      T           T         T      T         T
                 Li−1 :


Therefore, by Lemma 6.4.2 and Theorem 6.6.2,

                                   E (Mi ) ≤ 1.                                       (6.9)

Also, since Mi is less than or equal to the size |Li | of the list Li (ignoring the
dummy node), we have, using Lemmas 6.4.2 and 6.11.3,

                           E (Mi ) ≤ E (|Li |) = n/2^i .                        (6.10)

   Using the Linearity of Expectation (i.e., Theorem 6.5.3), we get
\begin{align*}
  E(M) &= E\!\left( h + 1 + \sum_{i=0}^{\infty} M_i \right) \\
       &= E(h) + 1 + \sum_{i=0}^{\infty} E(M_i) \\
       &= E(h) + 1 + \sum_{i=0}^{\log n} E(M_i) + \sum_{i=\log n + 1}^{\infty} E(M_i) .
\end{align*}
We know from Lemma 6.11.5 that E(h) ≤ log n + 1. If we apply (6.9) to the
first summation and (6.10) to the second summation, we get

\begin{align*}
  E(M) &\le (\log n + 1) + 1 + \sum_{i=0}^{\log n} 1 + \sum_{i=\log n + 1}^{\infty} \frac{n}{2^i} \\
       &= 2 \log n + 3 + \sum_{i=\log n + 1}^{\infty} \frac{n}{2^i} .
\end{align*}


We have seen the infinite series in the proof of Lemma 6.11.5, where we showed
that it is equal to 1. Thus, we conclude that

                            E(M ) ≤ 2 log n + 4.




6.12      Exercises
6.1 Consider a fair coin that has 0 on one side and 1 on the other side. We
flip this coin once and roll a fair die twice. Consider the following random
variables:

                      X = the result of the coin,
                      Y = the sum of the two dice,
                      Z = X · Y.

   • Determine the distribution functions of X, Y , and Z.

   • Are X and Y independent random variables?

   • Are X and Z independent random variables?

   • Are Y and Z independent random variables?

   • Are X, Y and Z mutually independent random variables?


6.2 Consider the set S = {2, 3, 5, 30}. We choose a uniformly random element
x from this set. Consider the random variables
\begin{align*}
  X &= \begin{cases} 1 & \text{if $x$ is divisible by 2,} \\ 0 & \text{otherwise,} \end{cases} \\
  Y &= \begin{cases} 1 & \text{if $x$ is divisible by 3,} \\ 0 & \text{otherwise,} \end{cases} \\
  Z &= \begin{cases} 1 & \text{if $x$ is divisible by 5,} \\ 0 & \text{otherwise.} \end{cases}
\end{align*}

   • Is the sequence X, Y , Z of random variables pairwise independent?

   • Is the sequence X, Y , Z of random variables mutually independent?

6.3 Let a and b be real numbers. You flip a fair and independent coin three
times. For i = 1, 2, 3, let
\[
  f_i =
  \begin{cases}
    a & \text{if the $i$-th coin flip results in heads,} \\
    b & \text{if the $i$-th coin flip results in tails.}
  \end{cases}
\]

Consider the random variables

                                X = f1 · f2 ,
                                Y = f2 · f3 .

   • Assume that a = b. Are the random variables X and Y independent?

   • Assume that a = 0 and b 6= a. Are the random variables X and Y
     independent?

   • Assume that a 6= 0 and b = −a. Are the random variables X and Y
     independent?

   • Assume that a 6= 0, b 6= 0, a 6= b, and b 6= −a. Are the random
     variables X and Y independent?

6.4 Lindsay and Simon want to play a game in which the expected amount
of money that each of them wins is equal to zero. After having chosen a num-
ber x, the game is played as follows: Lindsay rolls a fair die, independently,
three times.


   • If none of the three rolls results in 6, then Lindsay pays one dollar to
     Simon.

   • If exactly one of the rolls results in 6, then Simon pays one dollar to
     Lindsay.

   • If exactly two rolls result in 6, then Simon pays two dollars to Lindsay.

   • If all three rolls result in 6, then Simon pays x dollars to Lindsay.

Determine the value of x.

6.5 You are given a fair coin.

   • You flip this coin twice; the two flips are independent. For each heads,
     you win 3 dollars, whereas for each tails, you lose 2 dollars. Consider
     the random variable

                      X = the amount of money that you win.

          – Use the definition of expected value to determine E(X).
          – Use the linearity of expectation to determine E(X).

   • You flip this coin 99 times; these flips are mutually independent. For
     each heads, you win 3 dollars, whereas for each tails, you lose 2 dollars.
     Consider the random variable

                      Y = the amount of money that you win.

        Determine the expected value E(Y ) of Y .
6.6 Let r and b be positive integers and define α = r/(r + b). A bowl contains r
red balls and b blue balls; thus, α is the fraction of the balls that are red.
Consider the following experiment:

   • Choose one ball uniformly at random.

          – If the chosen ball is red, then put it back, together with an addi-
            tional red ball.
          – If the chosen ball is blue, then put it back, together with an ad-
            ditional blue ball.


Define the random variable X to be the fraction of the balls that are red,
after this experiment. Prove that E(X) = α.

6.7 The Ontario Lottery and Gaming Corporation (OLG) offers the follow-
ing lottery game:
   • OLG chooses a winning number w in the set S = {0, 1, 2, . . . , 999}.

   • If John wants to play, he pays $1 and chooses a number x in S.

        – If x = w, then John receives $700 from OLG. In this case, John
          wins $699.
        – Otherwise, x 6= w and John does not receive anything. In this
          case, John loses $1.

Assume that
   • John plays this game once per day for one year (i.e., for 365 days),

   • each day, OLG chooses a new winning number,

   • each day, John chooses x uniformly at random from the set S, inde-
     pendently from previous choices.
Define the random variable X to be the total amount of dollars that John
wins during one year. Determine the expected value E(X).
Hint: Use the Linearity of Expectation.
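One way to check an answer to this exercise is to compute the one-day expectation exactly and let linearity do the rest; a minimal Python sketch (variable names are ours):

```python
from fractions import Fraction

# One day's winnings: with probability 1/1000 John wins a net $699,
# otherwise he loses his $1 stake.
daily = Fraction(1, 1000) * 699 + Fraction(999, 1000) * (-1)

# By linearity of expectation, the 365 independent days simply add up.
yearly = 365 * daily

assert daily == Fraction(-3, 10)
assert yearly == Fraction(-219, 2)    # i.e., an expected loss of $109.50
```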

6.8 Assume we flip a fair coin twice, independently of each other. Consider
the following random variables:

          X =     the number of heads,
          Y =     the number of tails,
          Z =     the number of heads times the number of tails.

   • Determine the expected values of these three random variables.

   • Are X and Y independent random variables?

   • Are X and Z independent random variables?

   • Are Y and Z independent random variables?
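Because the sample space here has only four outcomes, all parts of this exercise can be checked by exhaustive enumeration. The following Python sketch (our own helper names) computes expectations and tests independence directly from the definitions:

```python
from fractions import Fraction
from itertools import product

# The four equally likely outcomes of two independent fair coin flips.
outcomes = list(product("HT", repeat=2))

def E(f):
    """Expected value of the random variable f over the uniform sample space."""
    return sum(Fraction(f(o), 1) for o in outcomes) / len(outcomes)

def independent(f, g):
    """Check Pr(f = a and g = b) = Pr(f = a) * Pr(g = b) for all a, b."""
    n = len(outcomes)
    for a in {f(o) for o in outcomes}:
        for b in {g(o) for o in outcomes}:
            joint = Fraction(sum(1 for o in outcomes
                                 if f(o) == a and g(o) == b), n)
            prod = (Fraction(sum(1 for o in outcomes if f(o) == a), n)
                    * Fraction(sum(1 for o in outcomes if g(o) == b), n))
            if joint != prod:
                return False
    return True

X = lambda o: o.count("H")
Y = lambda o: o.count("T")
Z = lambda o: o.count("H") * o.count("T")

assert (E(X), E(Y), E(Z)) == (1, 1, Fraction(1, 2))
assert not independent(X, Y)    # Y = 2 - X, so X determines Y
```

The same `independent` helper works for any pair of random variables on a small finite sample space.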
6.12.     Exercises                                                      333


6.9 As of this writing (November 2016), Ma Long is the number 1 ranked ping
pong player in the world. Simon Bose (Jit’s son) also plays ping pong, but he
is not at Ma’s level yet.
If you play a game of ping pong against Ma, then you win with probability p.
If you play a game against Simon, you win with probability q. Here, p and
q are real numbers such that 0 < p < q < 1. (Of course, p is much smaller
than q.) If you play several games against Ma and Simon, then the results
are mutually independent.
    You have the choice between the following two series of games:

  1. MSM : First, play against Ma, then against Simon, then against Ma.

  2. SMS : First, play against Simon, then against Ma, then against Simon.

For each s ∈ {MSM , SMS }, consider the event

           As = “you play series s and beat Ma at least once and
                beat Simon at least once”

and the random variable

           Xs = the number of games you win when playing series s.

   • Determine Pr (AMSM ) and Pr (ASMS ). Which of these two probabilities
     is larger?

   • Determine E (XMSM ) and E (XSMS ). Which of these two expected val-
     ues is larger?
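A series of three games has only eight equally likely win/loss patterns, so both questions can be checked by enumeration. In the sketch below the sample values of p and q are ours (any values with 0 < p < q < 1 work), and the closed forms in the final assertions are the ones that direct computation gives:

```python
from fractions import Fraction
from itertools import product

def analyze(series, p, q):
    """Enumerate all win/loss outcomes of a three-game series.
    series is a string over {'M', 'S'}; you beat Ma with probability p
    and Simon with probability q. Returns (Pr(A_series), E(X_series))."""
    win_prob = {'M': p, 'S': q}
    pr_A = Fraction(0)
    exp_wins = Fraction(0)
    for wins in product([True, False], repeat=len(series)):
        pr = Fraction(1)
        for game, w in zip(series, wins):
            pr *= win_prob[game] if w else 1 - win_prob[game]
        beat_ma = any(w for g, w in zip(series, wins) if g == 'M')
        beat_simon = any(w for g, w in zip(series, wins) if g == 'S')
        if beat_ma and beat_simon:
            pr_A += pr
        exp_wins += pr * sum(wins)
    return pr_A, exp_wins

p, q = Fraction(1, 10), Fraction(1, 2)     # sample values with 0 < p < q < 1
prA_msm, e_msm = analyze("MSM", p, q)
prA_sms, e_sms = analyze("SMS", p, q)

# Enumeration agrees with the closed forms:
assert prA_msm == (1 - (1 - p) ** 2) * q
assert prA_sms == p * (1 - (1 - q) ** 2)
assert (e_msm, e_sms) == (2 * p + q, p + 2 * q)
```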

6.10 In order to attract more customers, the Hyacintho Cactus Bar and Grill
in downtown Ottawa organizes a game night, hosted by their star employee
Tan Tran.
    After paying $26, a player gets two questions P and Q. If the player
gives the correct answer to question P , this player wins $30; if the player
gives the correct answer to question Q, this player wins $60. A player can
choose between the following two options:

  1. Start with question P . In this case, the player is allowed to answer
     question Q only if the answer to question P is correct.


  2. Start with question Q. In this case, the player is allowed to answer
     question P only if the answer to question Q is correct.

   Elisa decides to play this game. The probability that Elisa correctly
answers question P is equal to 1/2, whereas she correctly answers question
Q with probability 1/3. The events of correctly answering are independent.

   • Assume Elisa chooses the first option. Define the random variable X
     to be the amount of money that Elisa wins (this includes the $26 that
     she has to pay in order to play the game). Determine the expected
     value E(X).

   • Assume Elisa chooses the second option. Define the random variable
     Y to be the amount of money that Elisa wins (this includes the $26
     that she has to pay in order to play the game). Determine the expected
     value E(Y ).
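The two options can be checked with a short exact computation; in this Python sketch (variable names are ours), the expectation of each option is built directly from the definition of the game:

```python
from fractions import Fraction

pP, pQ = Fraction(1, 2), Fraction(1, 3)   # success probabilities from the exercise

# Option 1: answer P first; Q may be attempted only if P was answered correctly.
E_X = -26 + 30 * pP + 60 * pP * pQ
# Option 2: answer Q first; P may be attempted only if Q was answered correctly.
E_Y = -26 + 60 * pQ + 30 * pQ * pP

# Both options have the same expected value.
assert E_X == E_Y == -1
```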

6.11 Assume we roll two fair and independent dice, where one die is red and
the other die is blue. Let (i, j) be the outcome, where i is the result of the
red die and j is the result of the blue die. Consider the random variables

                                 X =i+j

and
                                 Y = i − j.
Are X and Y independent random variables?

6.12 Assume we roll two fair and independent dice, where one die is red and
the other die is blue. Let (i, j) be the outcome, where i is the result of the
red die and j is the result of the blue die. Consider the random variables

                                 X = |i − j|

and
                               Y = max(i, j).
Are X and Y independent random variables?
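Both this exercise and the previous one involve only the 36 equally likely outcomes of two fair dice, so the independence question can be settled by brute force. The helper below (our own name) tests the defining product rule exhaustively:

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))   # 36 equally likely outcomes

def independent(f, g):
    """Exhaustively test Pr(f=a, g=b) = Pr(f=a) * Pr(g=b) on two fair dice."""
    n = len(rolls)
    for a in {f(i, j) for i, j in rolls}:
        for b in {g(i, j) for i, j in rolls}:
            joint = Fraction(sum(1 for i, j in rolls
                                 if f(i, j) == a and g(i, j) == b), n)
            prod = (Fraction(sum(1 for i, j in rolls if f(i, j) == a), n)
                    * Fraction(sum(1 for i, j in rolls if g(i, j) == b), n))
            if joint != prod:
                return False
    return True

# Exercise 6.11: X = i + j and Y = i - j.
r611 = independent(lambda i, j: i + j, lambda i, j: i - j)
# Exercise 6.12: X = |i - j| and Y = max(i, j).
r612 = independent(lambda i, j: abs(i - j), lambda i, j: max(i, j))
```

Running the checker tells you the yes/no answer; the exercise, of course, still asks for a proof.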


6.13 Consider the sample space S = {1, 2, 3, . . . , 10}. We choose a uniformly
random element x in S. Consider the following random variables:

                      X = 0 if x ∈ {1, 2},
                          1 if x ∈ {3, 4, 5, 6},
                          2 if x ∈ {7, 8, 9, 10},

and

                      Y = 0 if x is even,
                          1 if x is odd.
Are X and Y independent random variables?
6.14 Consider the 8-element set A = {a, b, c, d, e, f, g, h}. We choose a uni-
formly random 5-element subset B of A. Consider the following random
variables:
                          X = |B ∩ {a, b, c, d}|,
                          Y = |B ∩ {e, f, g, h}|.
   • Determine the expected value E(X) of the random variable X.
   • Are X and Y independent random variables?
6.15 You roll a fair die repeatedly and independently until the result is an
even number. Consider the random variables
                 X = the number of times you roll the die
and
                       Y = the result of the last roll.
For example, if the results of the rolls are 5, 1, 3, 3, 5, 2, then X = 6 and
Y = 2.
   Prove that the random variables X and Y are independent.
6.16 Consider two random variables X and Y . If X and Y are independent,
then it can be shown that
                           E(XY ) = E(X) · E(Y ).
In this exercise, you will show that the converse of this statement is, in
general, not true.
   Let X be the random variable that takes each of the values −1, 0, and 1
with probability 1/3. Let Y be the random variable with value Y = X 2 .


   • Prove that X and Y are not independent.
   • Prove that E(XY ) = E(X) · E(Y ).
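Since X takes only three values, both parts of this exercise can be verified by direct computation before writing the proof; a minimal Python sketch (names ours):

```python
from fractions import Fraction

# X takes each value in {-1, 0, 1} with probability 1/3; Y = X^2.
dist = {-1: Fraction(1, 3), 0: Fraction(1, 3), 1: Fraction(1, 3)}

E_X = sum(p * x for x, p in dist.items())             # 0
E_Y = sum(p * x * x for x, p in dist.items())         # 2/3
E_XY = sum(p * x * (x * x) for x, p in dist.items())  # E(X^3) = 0

assert E_XY == E_X * E_Y == 0

# Not independent: Pr(X = 0 and Y = 0) = 1/3, because X = 0 forces Y = 0,
# but Pr(X = 0) * Pr(Y = 0) = (1/3) * (1/3) = 1/9.
assert dist[0] != dist[0] * dist[0]
```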
6.17 You are given two independent random variables X and Y , where
       Pr(X = 1) = Pr(X = −1) = Pr(Y = 1) = Pr(Y = −1) = 1/2.
Consider the random variable
                                Z = X · Y.
Are X and Z independent random variables?
6.18 Jennifer loves to drink India Pale Ale (IPA), whereas Lindsay Bangs
prefers wheat beer. Jennifer and Lindsay decide to go to their favorite pub
Chez Connor et Simon. The beer menu shows that this pub has ten beers
on tap:
   • Five of these beers are of the IPA style.
   • Three of these beers are of the wheat beer style.
   • Two of these beers are of the pilsner style.
Jennifer and Lindsay order a uniformly random subset of seven beers (thus,
there are no duplicates). Consider the following random variables:
              J = the number of IPAs in this order,
              L = the number of wheat beers in this order.
   • Determine the expected value E(L) of the random variable L.
   • Are J and L independent random variables?
6.19 You roll a fair die five times, where all rolls are independent of each
other. Consider the random variable
                 X = the largest value in these five rolls.
Prove that the expected value E(X) of the random variable X is equal to

                           E(X) = 14077/2592.

Hint: What are the possible values for X? What is Pr(X = k)?
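The hint's approach can be checked numerically: Pr(X ≤ k) = (k/6)^5, so Pr(X = k) = (k/6)^5 − ((k − 1)/6)^5. A Python sketch with exact arithmetic:

```python
from fractions import Fraction

n = 5   # number of independent rolls
# Pr(X <= k) = (k/6)^n, so Pr(X = k) = (k/6)^n - ((k-1)/6)^n.
E = sum(k * (Fraction(k, 6) ** n - Fraction(k - 1, 6) ** n)
        for k in range(1, 7))

assert E == Fraction(14077, 2592)
```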


6.20 Consider the following algorithm, which takes as input a large integer
n and returns a random subset A of the set {1, 2, . . . , n}:
    Algorithm RandomSubset(n):

            // all coin flips are mutually independent
            A = ∅;
            for i = 1 to n
            do flip a fair coin;
                if the result of the coin flip is heads
                then A = A ∪ {i}
                endif
            endfor;
            return A

   Define

              max(A) = the largest element in A if A ≠ ∅, and 0 if A = ∅,

              min(A) = the smallest element in A if A ≠ ∅, and 0 if A = ∅,
and the random variable

                                X = max(A) − min(A).

   • Prove that the expected value E(X) of the random variable X satisfies

                                    E(X) = n − 3 + f (n),

        where f (n) is some function that converges to 0 when n → ∞.
        Hint: Introduce random variables Y = min(A) and Z = max(A) and
      compute their expected values. You may use

          Σ_{k=1}^{n} k · x^k = x · (n · x^{n+1} − (n + 1) · x^n + 1)/(x − 1)^2 .

   • Give an intuitive explanation why E(X) is approximately equal to n−3.
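The pseudocode translates directly into Python, and for small n the expectation can be computed exactly by enumerating all 2^n coin-flip sequences. The sketch below (helper names ours) does both; note how quickly E(X) approaches n − 3:

```python
import random
from fractions import Fraction
from itertools import product

def random_subset(n):
    """Direct Python translation of algorithm RandomSubset(n)."""
    A = set()
    for i in range(1, n + 1):
        if random.random() < 0.5:     # fair coin flip: heads?
            A.add(i)
    return A

def spread(A):
    """X = max(A) - min(A), where max and min of the empty set are 0."""
    return max(A) - min(A) if A else 0

def exact_E(n):
    """E(X) computed exactly by enumerating all 2^n coin-flip sequences."""
    total = Fraction(0)
    for bits in product([0, 1], repeat=n):
        A = [i + 1 for i, b in enumerate(bits) if b]
        total += Fraction(spread(A), 2 ** n)
    return total

# E(X) is already within 1/2 of n - 3 for modest n.
for n in range(4, 11):
    assert abs(exact_E(n) - (n - 3)) < Fraction(1, 2)
```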


6.21 Let n ≥ 1 be an integer and let A[1 . . . n] be an array that stores a
permutation of the set {1, 2, . . . , n}. If the array A is sorted, then A[k] = k
for k = 1, 2, . . . , n and, thus,

                       Σ_{k=1}^{n} |A[k] − k| = 0.                        (6.11)


If the array A is not sorted and A[k] = i, where i ≠ k, then |A[k] − k| is equal
to the “distance” between the position of the value i in A and the position of
i in case the array were sorted. Thus, the summation in (6.11) is a measure
for the “sortedness” of the array A: If the summation is small, then A is
“close” to being sorted. On the other hand, if the summation is large, then
A is “far away” from being sorted. In this exercise, you will determine the
expected value of the summation in (6.11).
    Assume that the array stores a uniformly random permutation of the set
{1, 2, . . . , n}. For each k = 1, 2, . . . , n, consider the random variable

                               Xk = |A[k] − k|,

and let
                              X = Σ_{k=1}^{n} Xk .


   • Assume that n = 1. Determine the expected value E(X).

   • Assume that n ≥ 2. Is the sequence X1 , X2 , . . . , Xn of random variables
     pairwise independent?

   • Assume that n ≥ 1. Let k be an integer with 1 ≤ k ≤ n. Prove that

                     E(Xk) = (n + 1)/2 + (k^2 − k − kn)/n .

      Hint: Assume A[k] = i. If 1 ≤ i ≤ k, then |A[k] − k| = k − i. If
      k + 1 ≤ i ≤ n, then |A[k] − k| = i − k. For any integer m ≥ 1,

                                                      m(m + 1)
                         1 + 2 + 3 + ··· + m =                 .
                                                         2
6.12.     Exercises                                                                339


    • Assume that n ≥ 1. Prove that

                           E(X) = (n^2 − 1)/3 .
        Hint: 1^2 + 2^2 + 3^2 + · · · + n^2 = n(n + 1)(2n + 1)/6.

6.22 Let n ≥ 2 be an integer. You are given n cider bottles C1 , C2 , . . . , Cn
and two beer bottles B1 and B2 . Consider a uniformly random permutation
of these n + 2 bottles. The positions in this permutation are numbered
1, 2, . . . , n + 2. Consider the following two random variables:

            X =       the position of the first cider bottle,
            Y =       the position of the first bottle having index 1.

For example, if n = 5 and the permutation is

                             B2 , C5 , C2 , C4 , B1 , C3 , C1 ,

then X = 2 and Y = 5.

    • Determine the expected value E(X) of the random variable X.

    • Determine the expected value E(Y ) of the random variable Y .
      Hint: Σ_{k=1}^{n+1} k = (n + 1)(n + 2)/2 and
      Σ_{k=1}^{n+1} k^2 = (n + 1)(n + 2)(2n + 3)/6.


    • Are X and Y independent random variables?

6.23 Let m ≥ 1 and n ≥ 1 be integers. You are given m cider bottles
C1 , C2 , . . . , Cm and n beer bottles B1 , B2 , . . . , Bn . Consider a uniformly ran-
dom permutation of these m + n bottles. The positions in this permutation
are numbered 1, 2, . . . , m + n. Consider the random variable

                 X = the position of the leftmost cider bottle.

    • Determine the possible values for X.


    • For any value k that X can take, prove that

                  Pr(X = k) = (m/k) · C(n, k − 1) / C(m + n, k).

       Hint: Use the Product Rule to determine the number of permutations
       for which X = k. Rewrite your answer using basic properties of bino-
       mial coefficients.
    • For each i = 1, 2, . . . , n, consider the indicator random variable

                 Xi = 1 if Bi is to the left of all cider bottles,
                      0 otherwise.
       Prove that
                           E(Xi) = 1/(m + 1).
    • Express X in terms of X1 , X2 , . . . , Xn .
    • Use the expression from the previous part to determine E(X).
    • Prove that

           Σ_{k=1}^{n+1} C(n, k − 1) / C(m + n, k) = (m + n + 1) / (m(m + 1)).

6.24 Let b ≥ 1, c ≥ 1, and w ≥ 1 be integers, and let n = b + c + w. You are
given b beer bottles B1 , B2 , . . . , Bb , c cider bottles C1 , C2 , . . . , Cc , and w wine
bottles W1 , W2 , . . . , Ww . Let m ≥ 1 be an integer with m ≤ b and m ≤ n − b.
   All n bottles are in a box. From this box, you choose a uniformly random
subset consisting of m bottles. Consider the random variables
           X =       the number of beer bottles in the chosen subset,
           Y =       the number of cider bottles in the chosen subset,
           Z =       the number of wine bottles in the chosen subset.

    • Determine the expected value E(X + Y + Z).
    • Let k be an integer with 0 ≤ k ≤ m. Prove that

                  Pr(X = k) = C(b, k) · C(n − b, m − k) / C(n, m).


    • For each i = 1, 2, . . . , b and j = 1, 2, . . . , c, consider the indicator
      random variables

                 Xi = 1 if Bi is in the chosen subset,
                      0 otherwise,

      and

                 Yj = 1 if Cj is in the chosen subset,
                      0 otherwise.
       Prove that
                           E(Xi) = E(Yj) = m/n.
    • Prove that

           Σ_{k=0}^{m} k · C(b, k) · C(n − b, m − k) = (bm/n) · C(n, m).

   • Let i and j be integers with 1 ≤ i ≤ b and 1 ≤ j ≤ c. Are the random
     variables Xi and Yj independent?
   • Let i and j be integers with 1 ≤ i ≤ b and 1 ≤ j ≤ c. Determine
     E (Xi · Yj ).
   • Let i and j be integers with 1 ≤ i ≤ b and 1 ≤ j ≤ c. Is the following
     true or false?
                          E (Xi · Yj ) = E (Xi ) · E (Yj ) .
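The summation identity in this exercise is easy to sanity-check numerically before proving it; the Python sketch below multiplies through by n so that everything stays in integers (parameter ranges are ours):

```python
from math import comb

# Check  sum_{k=0}^{m} k * C(b,k) * C(n-b, m-k) = (b*m/n) * C(n,m)
# for a range of parameters. math.comb(a, k) is 0 when k > a, which
# matches the convention for binomial coefficients.
for n in range(2, 12):
    for b in range(1, n):
        for m in range(1, n + 1):
            lhs = sum(k * comb(b, k) * comb(n - b, m - k)
                      for k in range(0, m + 1))
            assert n * lhs == b * m * comb(n, m)
```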

6.25 Let m ≥ 1, n ≥ 1, and k ≥ 1 be integers with k ≤ m + n. Consider
a set P consisting of m men and n women. We choose a uniformly random
k-element subset Q of P . Consider the random variables

              X = the number of men in the chosen subset Q,
              Y = the number of women in the chosen subset Q,
              Z = X − Y.

   • Prove that
                                  E(Z) = 2 · E(X) − k.

   • Determine the expected value E(X).
        Hint: Denote the men as M1 , M2 , . . . , Mm . Use indicator random
        variables.


    • Prove that
                          E(Z) = k · (m − n)/(m + n).
6.26 You are given four fair and independent dice, each one having six faces:
  1. One die is red and has the numbers 7, 7, 7, 7, 1, 1 on its faces.
  2. One die is blue and has the numbers 5, 5, 5, 5, 5, 5 on its faces.
  3. One die is green and has the numbers 9, 9, 3, 3, 3, 3 on its faces.
  4. One die is yellow and has the numbers 8, 8, 8, 2, 2, 2 on its faces.
    Let c be a color in the set {red, blue, green, yellow}. You roll the die of
color c. Define the random variable Xc to be the result of this roll.
   • For each c ∈ {red, blue, green, yellow}, determine the expected value
     E (Xc ) of the random variable Xc .
    • Let c and c′ be two distinct colors in the set {red, blue, green, yellow}.
      Determine
                        Pr(Xc < Xc′) + Pr(Xc > Xc′).

    • Let c and c′ be two distinct colors in the set {red, blue, green, yellow}.
      We say that the die of color c is better than the die of color c′, if

                               Pr(Xc > Xc′) > 1/2.
        – Is the red die better than the blue die?
        – Is the blue die better than the green die?
        – Is the green die better than the yellow die?
        – Is the yellow die better than the red die?
        – Explain why these dice are called non-transitive dice.
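Each comparison involves only 36 equally likely face pairs, so the "better than" relation can be computed exhaustively. The sketch below (helper names ours) confirms that the four dice form a cycle, each beating the next with probability 2/3:

```python
from fractions import Fraction
from itertools import product

dice = {
    "red":    [7, 7, 7, 7, 1, 1],
    "blue":   [5, 5, 5, 5, 5, 5],
    "green":  [9, 9, 3, 3, 3, 3],
    "yellow": [8, 8, 8, 2, 2, 2],
}

def beats(c, c2):
    """Pr(X_c > X_{c'}) by enumerating all 36 equally likely face pairs."""
    wins = sum(1 for a, b in product(dice[c], dice[c2]) if a > b)
    return Fraction(wins, 36)

# All four dice have the same expected value ...
assert all(sum(faces) / len(faces) == 5 for faces in dice.values())

# ... yet each die beats the next one in the cycle with probability 2/3,
# so no single die is "best": the relation is non-transitive.
cycle = ["red", "blue", "green", "yellow"]
for c, c2 in zip(cycle, cycle[1:] + cycle[:1]):
    assert beats(c, c2) == Fraction(2, 3)
```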
6.27 In this exercise, you are given a fair and independent coin. Let n ≥ 1
be an integer. Farah flips the coin n times, whereas May flips the coin n + 1
times. Consider the following two random variables:
       X =      the number of heads in Farah’s sequence of coin flips,
       Y =      the number of heads in May’s sequence of coin flips.
Let A be the event
                               A = “X < Y ”.


    • Prove that

         Pr(A) = (1/2^{2n+1}) · Σ_{k=0}^{n} Σ_{ℓ=k+1}^{n+1} C(n, k) · C(n + 1, ℓ).

   • Consider the following two random variables:

           X′ =      the number of tails in Farah’s sequence of coin flips,
           Y′ =      the number of tails in May’s sequence of coin flips.

         – What is X + X′?
         – What is Y + Y′?
        – Let B be the event

                                     B = “X′ < Y′”.

           Explain in plain English why

                                       Pr(A) = Pr(B).

        – Express the event B in terms of the event A.
        – Use the results of the previous parts to determine Pr(A).
    • Prove that

           Σ_{k=0}^{n} Σ_{ℓ=k+1}^{n+1} C(n, k) · C(n + 1, ℓ) = 2^{2n}.
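Before proving the double-sum identity, it can be verified numerically for several values of n; a short Python check (loop bounds are ours):

```python
from math import comb

# Verify  sum_{k=0}^{n} sum_{l=k+1}^{n+1} C(n,k) * C(n+1,l) = 2^(2n).
for n in range(1, 12):
    s = sum(comb(n, k) * comb(n + 1, l)
            for k in range(0, n + 1)
            for l in range(k + 1, n + 2))
    assert s == 4 ** n
```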

6.28 Elisa Kazan’s neighborhood pub serves three types of drinks: cider,
wine, and beer. Elisa likes cider and wine, but does not like beer.
    After a week of hard work, Elisa goes to this pub and repeatedly orders
a random drink (the results of the orders are mutually independent). If she
gets a glass of cider or a glass of wine, then she drinks it and places another
order. As soon as she gets a pint of beer, she drinks it and takes a taxi home.
    When Elisa orders one drink, she gets a glass of cider with probability 2/5,
a glass of wine with probability 2/5, and a pint of beer with probability 1/5.
    Consider the random variables

           X = the number of drinks that Elisa orders,
           Y = the number of different types that Elisa drinks.


If we denote cider by C, wine by W , and beer by B, then a possible sequence
of drinks is CCW CB; for this case X = 5 and Y = 3. For the sequence
W W W B, we have X = 4 and Y = 2.

   • Determine the expected value E(X).

   • Describe the sample space in terms of strings consisting of characters
     C, W , and B.

   • Describe the event “Y = 1” in terms of a subset of the sample space.

   • Use the result of the previous part to determine Pr(Y = 1).

   • Describe the event “Y = 2” in terms of a subset of the sample space.

   • Use the result of the previous part to determine Pr(Y = 2).

   • Determine Pr(Y = 3).

   • Use the results of the previous five parts to determine the expected
     value E(Y ).

   • Consider the random variable

              Yc = 1 if Elisa drinks at least one glass of cider,
                   0 otherwise.

      Determine the expected value E (Yc ).

   • Consider the random variable

              Yw = 1 if Elisa drinks at least one glass of wine,
                   0 otherwise.

      Determine the expected value E (Yw ).

   • Express Y in terms of Yc and Yw .

   • Use the results of the previous three parts to determine the expected
     value E(Y ).


6.29 You repeatedly flip a fair coin and stop as soon as you get tails followed
by heads. (All coin flips are mutually independent.) Consider the random
variable
                   X = the total number of coin flips.
For example, if the sequence of coin flips is HHHT T T T H, then X = 8.

   • Determine the expected value E(X) of X.
        Hint: Use the linearity of expectation.

6.30 In Section 6.6, we have shown that for −1 < x < 1,

                       Σ_{k=1}^{∞} k · x^k = x/(1 − x)^2 .

In this exercise, you will prove this identity in a different way.
    Consider the following infinite matrix:

                         x     0     0     0     0     0    ...
                         x^2   x^2   0     0     0     0    ...
                         x^3   x^3   x^3   0     0     0    ...
                         x^4   x^4   x^4   x^4   0     0    ...
                         x^5   x^5   x^5   x^5   x^5   0    ...
                         ...   ...   ...   ...   ...   ...

We are going to add all elements in this matrix in two different ways. A
row-sum is the sum of all elements in one row, whereas a column-sum is the
sum of all elements in one column.
   Note that the sum of all row-sums is equal to

            x + 2x^2 + 3x^3 + 4x^4 + 5x^5 + · · · = Σ_{k=1}^{∞} k · x^k .


   • Using only the identity Σ_{k=0}^{∞} x^k = 1/(1 − x) and algebraic
     manipulation, prove that the sum of all column-sums is equal to

                                x/(1 − x)^2 .


6.31 Let X be a random variable that takes values in {0, 1, 2, 3, . . .}. By
Lemma 6.4.3, we have

                      E(X) = Σ_{k=1}^{∞} k · Pr(X = k).

As in Exercise 6.30, define an infinite matrix and use it to prove that

                      E(X) = Σ_{k=1}^{∞} Pr(X ≥ k).
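The tail-sum formula is easy to sanity-check on a concrete distribution; the sketch below uses a binomial(6, 1/2) distribution (our choice) and compares both ways of computing E(X):

```python
from fractions import Fraction
from math import comb

# A concrete test distribution on {0, 1, ..., n}: binomial(n, 1/2).
n = 6
pr = {k: Fraction(comb(n, k), 2 ** n) for k in range(n + 1)}

# E(X) directly from the definition ...
E_direct = sum(k * p for k, p in pr.items())
# ... and via the tail-sum formula E(X) = sum_{k>=1} Pr(X >= k).
E_tails = sum(sum(p for j, p in pr.items() if j >= k)
              for k in range(1, n + 1))

assert E_direct == E_tails == 3    # E(X) = n/2 for this distribution
```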

6.32 Let 0 < p < 1 and consider a coin that comes up heads with probability
p and tails with probability 1 − p. We flip the coin independently until it
comes up heads for the first time. Define the random variable X to be the
number of times that we flip the coin. In Section 6.6, we have shown that
E(X) = 1/p. Below, you will prove this in a different way.

   • Let k ≥ 1 be an integer. Determine Pr(X ≥ k).

   • Using only the identity Σ_{k=0}^{∞} x^k = 1/(1 − x), the expression
     for E(X) from Exercise 6.31, and your answer for Pr(X ≥ k), prove
     that E(X) = 1/p.

6.33 By flipping a fair coin repeatedly and independently, we obtain a se-
quence of H’s and T ’s. We stop flipping the coin as soon as the sequence
contains either HH or T T . Define the random variable X to be the number
of times that we flip the coin. For example, if the sequence of coin flips is
HT HT T , then X = 5.

   • Let k ≥ 2 be an integer. Determine Pr(X = k).

   • Determine the expected value E(X) of X using the expression

                        E(X) = Σ_k k · Pr(X = k).

     Hint: Recall that, for −1 < x < 1, Σ_{k=1}^{∞} k · x^k = x/(1 − x)^2 .

   • Determine Pr(X ≥ 1).

   • Let k ≥ 2 be an integer. Determine Pr(X ≥ k).


   • According to Exercise 6.31, we have
                                        ∞
                                        X
                               E(X) =         Pr(X ≥ k).
                                        k=1

        Use this expression to determine the expected value E(X) of X.

6.34 Consider an experiment that is successful with probability 0.8. We
repeat this experiment (independently) until it is successful for the first time.
The first 5 times we do the experiment, we have to pay $10 per experiment.
After this, we have to pay $5 per experiment. Define the random variable X
to be the total amount of money that we have to pay during all experiments.
Determine the expected value E(X).
Hint: Recall that Σ_{k=1}^{∞} k · x^{k−1} = 1/(1 − x)^2 .

6.35 When Lindsay and Simon have a child, this child is a boy with prob-
ability 1/2 and a girl with probability 1/2, independently of the gender of
previous children. Lindsay and Simon stop having children as soon as they
have a girl. Consider the random variables

            B = the number of boys that Lindsay and Simon have

and
            G = the number of girls that Lindsay and Simon have.
Determine the expected values E(B) and E(G).

6.36 Let p be a real number with 0 < p < 1. When Lindsay and Simon
have a child, this child is a boy with probability p and a girl with probability
1 − p, independently of the gender of previous children. Lindsay and Simon
stop having children as soon as they have a child that has the same gender as
their first child. Define the random variable X to be the number of children
that Lindsay and Simon have. Determine the expected value E(X).
Hint: Recall that Σ_{k=1}^{∞} k · x^{k−1} = 1/(1 − x)^2 .

6.37 Let X1 , X2 , . . . , Xn be a sequence of mutually independent random vari-
ables. For each i with 1 ≤ i ≤ n, assume that

   • the variable Xi is either equal to 0 or equal to n + 1, and


   • E(Xi ) = 1.

Determine
                        Pr(X1 + X2 + · · · + Xn ≤ n).

6.38 The Ottawa Senators and the Toronto Maple Leafs play a best-of-seven
series: These two hockey teams play games against each other, and the first
team to win four games wins the series. Assume that

   • each game has a winner (thus, no game ends in a tie),

   • in any game, the Sens have a probability of 3/4 of defeating the Leafs,

   • the results of the games are mutually independent.

Determine the probability that seven games are played in this series.
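One way to check an answer here: seven games are played if and only if the first six games are split 3–3, which is a binomial probability. A Python sketch with exact arithmetic:

```python
from fractions import Fraction
from math import comb

p = Fraction(3, 4)   # probability the Sens win any single game

# Seven games are played iff the first six games are split 3-3.
pr_seven = comb(6, 3) * p ** 3 * (1 - p) ** 3

assert pr_seven == Fraction(135, 1024)
```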

6.39 Let n ≥ 1 be an integer, let p be a real number with 0 < p < 1, and
let X be a random variable that has a binomial distribution with parameters
n and p. In Section 6.7.1, we have seen that the expected value E(X) of X
satisfies

              E(X) = Σ_{k=1}^{n} k · C(n, k) · p^k · (1 − p)^{n−k} .      (6.12)

Recall Newton’s Binomial Theorem (i.e., Theorem 3.6.5):

                  (x + y)^n = Σ_{k=0}^{n} C(n, k) · x^{n−k} · y^k .

   • Use (6.12) to prove that E(X) = pn, by taking the derivative, with
     respect to y, in Newton’s Binomial Theorem.

6.40 A block in a bitstring is a maximal consecutive substring of 1’s. For
example, the bitstring 1000111110100111 has four blocks: 1, 11111, 1, and
111.
    Let n ≥ 1 be an integer and consider a random bitstring of length n that
is obtained by flipping a fair coin, independently, n times. Define the random
variable X to be the number of blocks in this bitstring.

   • Use Exercise 4.43 to determine the expected value E(X) of X.


    • Use indicator random variables to determine the expected value E(X)
      of X.
6.41 Let n ≥ 1 be an integer and consider a uniformly random permutation
a1 , a2 , . . . , an of the set {1, 2, . . . , n}. Define the random variable X to be the
number of indices i for which 1 ≤ i < n and ai < ai+1 .
     Determine the expected value E(X) of X.
Hint: Use indicator random variables.
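The hint's indicator-variable answer can be confirmed by enumerating all permutations for small n; a Python sketch (function name ours):

```python
from fractions import Fraction
from itertools import permutations

def ascents_E(n):
    """Average number of indices i with a_i < a_{i+1}, over all n! permutations."""
    perms = list(permutations(range(1, n + 1)))
    total = sum(sum(1 for i in range(n - 1) if p[i] < p[i + 1])
                for p in perms)
    return Fraction(total, len(perms))

# Each of the n - 1 adjacent pairs is increasing with probability 1/2,
# so linearity of expectation gives E(X) = (n - 1)/2.
for n in range(2, 7):
    assert ascents_E(n) == Fraction(n - 1, 2)
```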
6.42 Let n ≥ 2 be an integer and let a1 , a2 , . . . , an be a permutation of the
set {1, 2, . . . , n}. Define a0 = 0 and an+1 = 0, and consider the sequence
                             a0 , a1 , a2 , a3 , . . . , an , an+1 .
A position i with 1 ≤ i ≤ n is called awesome, if ai > ai−1 and ai > ai+1 .
In words, i is awesome if the value at position i is larger than both its
neighboring values.
   For example, if n = 6 and the permutation is 2, 5, 4, 3, 1, 6, we get the
sequence
                       value        0     2    5     4    3     1      6   0
                      position      0     1    2     3    4     5      6   7
In this case, the positions 2 and 6 are awesome, whereas the positions 1, 3,
4, and 5 are not awesome.
    Consider a uniformly random permutation of the set {1, 2, . . . , n} and
define the random variable X to be the number of awesome positions. De-
termine the expected value E(X) of the random variable X.
Hint: Use indicator random variables.
6.43 Let n ≥ 1 be an integer and consider a permutation a1 , a2 , . . . , an of the
set {1, 2, . . . , n}. We partition this permutation into increasing subsequences.
For example, for n = 10, the permutation
                               3, 5, 8, 1, 2, 4, 10, 7, 6, 9
is partitioned into four increasing subsequences: (i) 3, 5, 8, (ii) 1, 2, 4, 10, (iii)
7, and (iv) 6, 9.
    Let a1 , a2 , . . . , an be a uniformly random permutation of {1, 2, . . . , n}.
Define the random variable X to be the number of increasing subsequences
in the partition of this permutation. For the example above, we have X = 4.
In this exercise, you will determine the expected value E(X) of X in two
different ways.
350                Chapter 6.      Random Variables and Expectation


   • For each i with 1 ≤ i ≤ n, let Xi = 1 if an increasing subsequence
     starts at position i, and Xi = 0 otherwise.

      For the example above, we have X1 = 1, X2 = 0, X3 = 0, and X8 = 1.

        – Determine E (X1 ).
        – Let i be an integer with 2 ≤ i ≤ n. Use the Product Rule to
          determine the number of permutations of {1, 2, . . . , n} for which
          Xi = 1.
        – Use these indicator random variables to determine E(X).

   • For each i with 1 ≤ i ≤ n, let Yi = 1 if the value i is the leftmost
     element of an increasing subsequence, and Yi = 0 otherwise.

      For the example above, we have Y1 = 1, Y3 = 1, Y5 = 0, and Y7 = 1.

        – Determine E (Y1 ).
        – Let i be an integer with 2 ≤ i ≤ n. Use the Product Rule to
          determine the number of permutations of {1, 2, . . . , n} for which
          Yi = 1.
        – Use these indicator random variables to determine E(X).

6.44 Lindsay Bangs and Simon Pratt visit their favorite pub that has 10
different beers on tap. Both Lindsay and Simon order, independently of each
other, a uniformly random subset of 5 beers.

   • One of the beers available is Leo’s Early Breakfast IPA. Determine the
     probability that this is one of the beers that Lindsay orders.

   • Let X be the random variable whose value is the number of beers that
     are ordered by both Lindsay and Simon. Determine the expected value
     E(X) of X.
      Hint: Use indicator random variables.


6.45 Lindsay and Simon have discovered a new pub that has n different
beers B1 , B2 , . . . , Bn on tap, where n ≥ 1 is an integer. They want to try
all different beers in this pub and agree on the following approach: During
a period of n days, they visit the pub every day. On each day, they drink
one of the beers. Lindsay drinks the beers in order, i.e., on the i-th day, she
drinks beer Bi . Simon takes a uniformly random permutation a1 , a2 , . . . , an
of the set {1, 2, . . . , n} and drinks beer Bai on the i-th day.
    Let X be the random variable whose value is the number of days during
which Lindsay and Simon drink the same beer. Determine the expected value
E(X) of X.
Hint: Use indicator random variables.

6.46 Consider the following recursive algorithm TwoTails, which takes as
input a positive integer n:
    Algorithm TwoTails(n):

          // all coin flips are mutually independent
          flip a fair coin twice;
          if the coin came up tails exactly twice
          then return 2^n
          else return TwoTails(n + 1)
          endif




   • You run algorithm TwoTails(1), i.e., with n = 1. Define the random
     variable X to be the value of the output of this algorithm. Let k ≥ 1
     be an integer. Determine Pr(X = 2^k).

   • Is the expected value E(X) of the random variable X finite or infinite?
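A simulation can help you form a conjecture for Pr(X = 2^k) before proving it. The sketch below (ours, not part of the text) replaces the recursion of TwoTails by a loop; `rng.random() < 0.5` models one fair coin flip:

```python
import random

def two_tails(n, rng):
    # iterative version of the recursive algorithm TwoTails(n):
    # flip a fair coin twice; on double tails return 2^n, else retry with n + 1
    while True:
        first_tails = rng.random() < 0.5
        second_tails = rng.random() < 0.5
        if first_tails and second_tails:
            return 2 ** n
        n += 1
```

Running two_tails(1, rng) many times and tabulating how often each value 2^k occurs suggests the distribution of X.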

6.47 Let A[1 . . . n] be an array of n numbers. Consider the following two al-
gorithms, which take as input the array A and a number x. If x is not present
in A, then these algorithms return the message “not present”. Otherwise,
they return an index i such that A[i] = x. The first algorithm runs linear
search from left to right, whereas the second algorithm runs linear search
from right to left.


      Algorithm LinearSearchLeftToRight(A, x):

          i := 1;
          while i ≤ n and A[i] ≠ x do i := i + 1 endwhile;
          if i = n + 1 then return “not present” else return i endif


      Algorithm LinearSearchRightToLeft(A, x):

          i := n;
          while i ≥ 1 and A[i] ≠ x do i := i − 1 endwhile;
          if i = 0 then return “not present” else return i endif

   Consider the following algorithm, which again takes as input the array A
and a number x. If x is not present in A, then it returns the message “not
present”. Otherwise, it returns an index i such that A[i] = x.
      Algorithm RandomLinearSearch(A, x):

          flip a fair coin;
          if the coin comes up heads
          then return LinearSearchLeftToRight(A, x)
          else return LinearSearchRightToLeft(A, x)
          endif

    Assume that the number x occurs exactly once in the array A and let k
be the index such that A[k] = x. Let X be the random variable whose
value is the number of times the test “A[i] ≠ x” is made in algorithm
RandomLinearSearch(A, x). (In words, X is the number of compar-
isons made by algorithm RandomLinearSearch(A, x).) Determine the
expected value E(X) of X.
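Before computing E(X) exactly, you may want to simulate the three algorithms. In the sketch below (ours, not part of the text; the function names are invented), each counter equals the number of times the test “A[i] ≠ x” is made, assuming x occurs exactly once in A:

```python
import random

def count_left(A, x):
    # comparisons made by LinearSearchLeftToRight when x occurs once in A
    count = 0
    for v in A:
        count += 1
        if v == x:
            break
    return count

def count_right(A, x):
    # comparisons made by LinearSearchRightToLeft when x occurs once in A
    count = 0
    for v in reversed(A):
        count += 1
        if v == x:
            break
    return count

def random_linear_search_comparisons(A, x, rng):
    # a fair coin chooses the scan direction, as in RandomLinearSearch
    return count_left(A, x) if rng.random() < 0.5 else count_right(A, x)
```

Averaging random_linear_search_comparisons over many runs, for a fixed array and target, approximates E(X).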

6.48 Let n ≥ 3 be an integer and let p be a real number with 0 < p < 1.
Consider the set V = {1, 2, . . . , n}. We construct a graph G = (V, E) with
vertex set V , whose edge set E is determined by the following random process:
Each unordered pair {i, j} of vertices, where i 6= j, occurs as an edge in E
with probability p, independently of the other unordered pairs.
   A triangle in G is an unordered triple {i, j, k} of distinct vertices, such
that {i, j}, {j, k}, and {k, i} are edges in G.


   Define the random variable X to be the total number of triangles in the
graph G. Determine the expected value E(X).
Hint: Use indicator random variables.
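A brute-force simulation of this random graph model can be used to check your formula for E(X). The sketch below (ours, not part of the text) samples the edge set and counts triangles directly:

```python
import random
from itertools import combinations

def random_graph_triangles(n, p, rng):
    # include each unordered pair {i, j} as an edge with probability p,
    # then count unordered triples whose three pairs are all edges
    edges = {frozenset(e) for e in combinations(range(1, n + 1), 2)
             if rng.random() < p}
    return sum(1 for t in combinations(range(1, n + 1), 3)
               if all(frozenset(pair) in edges for pair in combinations(t, 2)))
```

With p = 1 every edge is present and the count is the total number of triples, which is a useful sanity check.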

6.49 In Section 6.9, we have seen the following algorithm InsertionSort,
which sorts any input array A[1 . . . n]:
    Algorithm InsertionSort(A[1 . . . n]):

         for i = 2 to n
         do j = i;
            while j > 1 and A[j] < A[j − 1]
            do swap A[j] and A[j − 1];
                j =j−1
            endwhile
         endfor

Consider an input array A[1 . . . n], where each element A[i] is chosen inde-
pendently and uniformly at random from the set {1, 2, . . . , m}.

   • Let i and j be two indices with 1 ≤ i < j ≤ n, and consider the values
     A[i] and A[j] (just before the algorithm starts). Prove that

                          Pr(A[i] > A[j]) = 1/2 − 1/(2m).

   • Let X be the random variable that is equal to the number of times the
     swap-operation is performed when running InsertionSort(A[1 . . . n]).
     Determine the expected value E(X) of X.
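The following Python sketch (ours, not part of the text) reproduces the swap count of the pseudocode above and estimates E(X) for random arrays drawn from {1, 2, . . . , m}:

```python
import random

def insertion_sort_swaps(A):
    # run InsertionSort on a copy of A and count the swap operations
    A = list(A)
    swaps = 0
    for i in range(1, len(A)):
        j = i
        while j > 0 and A[j] < A[j - 1]:
            A[j], A[j - 1] = A[j - 1], A[j]
            swaps += 1
            j -= 1
    return swaps

def estimate_expected_swaps(n, m, trials=5000, seed=1):
    # Monte Carlo estimate of E(X) for random entries from {1, ..., m}
    rng = random.Random(seed)
    return sum(insertion_sort_swaps([rng.randint(1, m) for _ in range(n)])
               for _ in range(trials)) / trials
```

Note that equal neighbors are not swapped, which matches the strict inequality in Pr(A[i] > A[j]).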

6.50 Let n ≥ 2 be an integer. Consider the following random process that
divides the integers 1, 2, . . . , n into two sorted lists L1 and L2 :

  1. Initialize both L1 and L2 to be empty.

  2. For each i = 1, 2, . . . , n, flip a fair coin. If the coin comes up heads,
     then add i at the end of list L1 . Otherwise, add i at the end of the list
     L2 . (All coin flips during this process are mutually independent.)


We now run algorithm Merge(L1 , L2 ) of Section 4.6. Define the random
variable X to be the total number of comparisons made when running this
algorithm: As in Section 4.6, X counts the number of times the line “if
x ≤ y” in algorithm Merge(L1 , L2 ) is executed. In this exercise, you will
determine the expected value E(X) of the random variable X.

   • Prove that E(X) = 1/2 for the case when n = 2.

   • Prove that E(X) = 5/4 for the case when n = 3.

   • Assume that n ≥ 2. For each i and j with 1 ≤ i < j ≤ n, consider the
     indicator random variable Xij , defined by Xij = 1 if i and j are
     compared, and Xij = 0 otherwise.

      Prove that E(Xij) = (1/2)^{j−i}.
      Hint: Assume that i and j are compared. Can i and j be in the same
      list? What about the elements i, i + 1, . . . , j − 1 and the element j?

   • Determine E(X).
      Hint: 1 + x + x^2 + x^3 + · · · + x^k = (1 − x^{k+1})/(1 − x).

6.51 Assume we have n balls and m boxes. We throw the balls independently
and uniformly at random in the boxes. Thus, for each k and i with 1 ≤ k ≤ n
and 1 ≤ i ≤ m,

               Pr( the k-th ball falls in the i-th box ) = 1/m.

Consider the following three random variables:

         X = the number of boxes that do not contain any ball,
         Y = the number of boxes that contain at least one ball,
         Z = the number of boxes that contain exactly one ball.

   • Determine the expected values E(X), E(Y ), and E(Z).

   • Assuming that m = n, determine the limits

         1. lim_{n→∞} E(X)/n,
         2. lim_{n→∞} E(Y)/n,
         3. lim_{n→∞} E(Z)/n.

      Hint: lim_{n→∞} (1 − 1/n)^n = 1/e.
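The following sketch (ours, not part of the text) simulates one throw of the n balls and reports the three counts; averaging over many trials checks your formulas for E(X), E(Y), and E(Z):

```python
import random

def box_counts(n, m, rng):
    # throw n balls uniformly into m boxes; return
    # (# empty boxes, # non-empty boxes, # boxes with exactly one ball)
    counts = [0] * m
    for _ in range(n):
        counts[rng.randrange(m)] += 1
    empty = counts.count(0)
    return empty, m - empty, counts.count(1)
```

With a single ball the outcome is deterministic: one box holds the ball and the other m − 1 are empty.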

6.52 Let 0 < p < 1 and consider a coin that comes up heads with probability
p and tails with probability 1 − p. For each integer n, let bn be the outcome
when flipping this coin; thus, bn ∈ {H, T }. The values bn partition the set of
integers into intervals, where each interval is a maximal consecutive sequence
of zero or more T ’s followed by one H:

        ... H T T         T    H   T   H    T    T H      H    T   H   ...
        . . . −2 −1 0     1    2   3   4     5   6    7    8   9    10 . . .


   • Consider the interval that contains the integer 0, and let X be its
     length. (In the example above, X = 4.) Determine the expected value
     E(X) of X.
        Hint: Use the Linearity of Expectation. The answer is not 1/p, which
        is the expected number of coin flips until the first H.

6.53 Your friend Mick takes a permutation of 1, 2, . . . , n, stores it in boxes
B1 , B2 , . . . , Bn (so that each box stores exactly one number), and then closes
all boxes. You have no idea what the permutation is.
    Mick opens the boxes B1 , B2 , . . . , Bn , one after another. For each i with
1 ≤ i ≤ n, just before opening box Bi , you have to guess which number is
stored in it.

   • Assume that, when you guess the number in box Bi , you do not remem-
     ber the numbers stored in B1 , B2 , . . . , Bi−1 . Then, the only reasonable
     thing you can do is to take a random element in {1, 2, . . . , n} and guess
     that this random element is stored in Bi .
        Assume that you do this for each i with 1 ≤ i ≤ n. Let X be the
        random variable whose value is equal to the number of times that your
        guess is correct. Compute the expected value E(X) of X.

   • Now assume that your memory is perfect, so that, when you guess the
     number in box Bi , you know the numbers stored in B1 , B2 , . . . , Bi−1 .


      How would you make the n guesses such that the following is true: If
      Y is the random variable whose value is equal to the number of times
      that your guess is correct, then the expected value E(Y ) of Y satisfies
      E(Y ) = Ω(log n).

6.54 Let n ≥ 1 be an integer.
   • Consider a fixed integer i with 1 ≤ i ≤ n. How many permutations
     a1 , a2 , . . . , an of the set {1, 2, . . . , n} have the property that ai = i?
   • We choose a permutation a1 , a2 , . . . , an of the set {1, 2, . . . , n} uniformly
     at random. Consider the random variable

                           X = |{i : 1 ≤ i ≤ n and ai = i}|.

      Determine the expected value E(X).
      Hint: Use indicator random variables.
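Fixed points of a random permutation are easy to simulate. The following sketch (ours, not part of the text) counts them and estimates E(X):

```python
import random

def fixed_points(perm):
    # number of indices i (1-based) with a_i = i
    return sum(1 for i, a in enumerate(perm, start=1) if a == i)

def estimate_fixed_points(n, trials=10000, seed=1):
    # Monte Carlo estimate of E(X) for a uniformly random permutation
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        perm = list(range(1, n + 1))
        rng.shuffle(perm)
        total += fixed_points(perm)
    return total / trials
```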

6.55 Let n ≥ 2 be an integer. Consider a uniformly random permutation
a1 , a2 , . . . , an of the set {1, 2, . . . , n}. Define the random variable X to be
the number of ordered pairs (i, j) with 1 ≤ i < j ≤ n for which ai = j and
aj = i. Determine the expected value E(X) of X.
Hint: Use indicator random variables.

6.56 Let n ≥ 2 be an integer and consider n people P1 , P2 , . . . , Pn . Each of
these people has a uniformly random birthday, and all birthdays are mutually
independent. (We ignore leap years.) Consider the random variable

X = the number of indices i such that Pi and Pi+1 have the same birthday.

Determine the expected value E(X).
Hint: Use indicator random variables.

6.57 Let d ≥ 1 be the number of days in one year, let n ≥ 2 be an integer,
and consider a group P1 , P2 , . . . , Pn of n people. Assume that each person has
a uniformly random and independent birthday. Define the random variable
X to be the number of pairs {Pi , Pj } of people that have the same birthday.
Prove that
                            E(X) = \frac{1}{d} \binom{n}{2}.
Hint: Use indicator random variables.


6.58 Nick wants to know how many students cheat on the assignments. One
approach is to ask every student “Did you cheat?”. This obviously does not
work, because every student will answer “I did not cheat”. Instead, Nick
uses the following ingenious scheme, which gives a reasonable estimate of the
number of cheaters, without identifying them.
   We denote the students by S1 , S2 , . . . , Sn . Let k denote the number of
cheaters. Nick knows the value of n, but he does not know the value of k.
   For each i with 1 ≤ i ≤ n, Nick does the following:
  1. Nick meets student Si and asks “Did you cheat?”.
  2. Student Si flips a fair coin twice, independently of each other; Si does
     not show the results of the coin flips to Nick.
        (a) If the coin flips are HH or HT , then Si is honest in answering the
            question: If Si is a cheater, then he answers “I cheated”; otherwise,
            he answers “I did not cheat”.
        (b) If the coin flips are T H, then Si answers “I cheated”.
         (c) If the coin flips are T T , then Si answers “I did not cheat”.

   • Define the random variable X to be the number of students who answer
     “I cheated”. Determine the expected value E(X) of X.
        Hint: For each i, use an indicator random variable Xi which indicates
        whether or not Si answers “I cheated”. If Si is a cheater, what is
        E (Xi )? If Si is not a cheater, what is E (Xi )?
   • Consider the random variable
                                       Y = 2X − n/2.
        Prove that E(Y ) = k. In words, the expected value of Y is equal to
        the number of cheaters.
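You can check that Nick’s scheme is unbiased by simulation. In the sketch below (ours, not part of the text), the true number of cheaters k is known to the simulation but, as in the exercise, never revealed by any individual answer:

```python
import random

def survey_estimate(n, k, rng):
    # k of the n students are cheaters; simulate all answers and
    # return the estimate Y = 2X - n/2
    cheated_answers = 0
    for i in range(n):
        is_cheater = i < k
        first_heads = rng.random() < 0.5
        second_heads = rng.random() < 0.5
        if first_heads:                    # HH or HT: honest answer
            if is_cheater:
                cheated_answers += 1
        elif second_heads:                 # TH: always "I cheated"
            cheated_answers += 1
        # TT: always "I did not cheat"
    return 2 * cheated_answers - n / 2
```

Averaging survey_estimate over many trials, for fixed n and k, should approach the number of cheaters.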

6.59 You roll a fair die repeatedly, and independently, until you have seen
all of the numbers 1, 2, 3, 4, 5, 6 at least once. Consider the random variable
                   X = the number of times you roll the die.
For example, if you roll the sequence
                           5, 5, 3, 5, 1, 3, 4, 2, 5, 2, 1, 3, 6,


then X = 13.
   Determine the expected value E(X) of the random variable X.
Hint: Use the Linearity of Expectation. If you have seen exactly i different
elements from the set {1, 2, 3, 4, 5, 6}, how many times do you expect to roll
the die until you see a new element from this set?
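This is an instance of the classical coupon-collector problem, and a simulation is a good sanity check for your answer. A minimal sketch (ours, not part of the text):

```python
import random

def rolls_until_all_faces(rng):
    # roll a fair die until every face 1..6 has appeared; return the roll count
    seen = set()
    rolls = 0
    while len(seen) < 6:
        rolls += 1
        seen.add(rng.randint(1, 6))
    return rolls
```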


6.60 Michiel’s Craft Beer Company (MCBC) sells n different brands of India
Pale Ale (IPA). When you place an order, MCBC sends you one bottle of
IPA, chosen uniformly at random from the n different brands, independently
of previous orders.
    Simon Pratt wants to try all different brands of IPA. He repeatedly places
orders at MCBC (one bottle per order) until he has received at least one
bottle of each brand.
    Define the random variable X to be the total number of orders that Simon
places. Determine the expected value E(X) of the random variable X.
Hint: Use the Linearity of Expectation. If Simon has received exactly i
different brands of IPA, how many orders does he expect to place until he
receives a new brand?


6.61 MCBC still sells n different brands of IPA. As in Exercise 6.60, when
you place an order, MCBC sends you one bottle of IPA, chosen uniformly at
random from the n different brands, independently of previous orders.
   Simon Pratt places m orders at MCBC. Define the random variable X to
be the total number of distinct brands that Simon receives. Determine the
expected value E(X) of X.
Hint: Use indicator random variables.


6.62 You are given an array A[0 . . . n−1] of n numbers. Let D be the number
of distinct numbers that occur in this array. For each i with 0 ≤ i ≤ n − 1,
let Ni be the number of elements in the array that are equal to A[i].

   • Show that D = \sum_{i=0}^{n−1} 1/Ni .


Consider the following algorithm:


   Algorithm EstimateD(A[0 . . . n − 1]):

   Step 1: Choose an integer k in {0, 1, 2, . . . , n − 1} uniformly at
   random, and let a = A[k].
   Step 2: Traverse the array and compute the number Nk of times
   that a occurs.
   Step 3: Return the value X = n/Nk .


   • Determine the expected value E(X) of the random variable X.
        Hint: Use the definition of expected value, i.e., Definition 6.4.1.
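Algorithm EstimateD translates almost directly into Python. The sketch below (ours, not part of the text) performs one run; averaging many independent runs should approach the value E(X) you compute:

```python
import random

def estimate_D(A, rng):
    # one run of EstimateD: pick a uniformly random index k, let a = A[k],
    # and return n divided by the number of occurrences of a in A
    k = rng.randrange(len(A))
    return len(A) / A.count(A[k])
```

When all entries of A are distinct, every run returns exactly n/1 = n = D.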

6.63 One of Jennifer and Thomas is chosen uniformly at random. The person
who is chosen wins $100. Consider the random variables

                     J =     the amount that Jennifer wins,
                     T =     the amount that Thomas wins.

Prove that
                      E(max(J, T)) ≠ max(E(J), E(T)).

6.64 Consider the sample space

        S = {(123), (132), (213), (231), (312), (321), (111), (222), (333)}.

We choose an element u from S uniformly at random. For each i = 1, 2, 3,
let Xi be the random variable whose value is the i-th number in u. (For
example, if u = (312), then X1 = 3, X2 = 1, and X3 = 2.) Let N be the
random variable whose value is equal to that of X2 .
   • Verify that Pr(Xi = k) = 1/3 for any i and k with 1 ≤ i ≤ 3 and
     1 ≤ k ≤ 3.

   • Verify that X1 , X2 and X3 are pairwise independent.

   • Verify that X1 , X2 and X3 are not mutually independent.
   • Verify that \sum_{i=1}^{E(N)} E(Xi) = 4.

   • Verify that E\left( \sum_{i=1}^{N} Xi \right) ≠ \sum_{i=1}^{E(N)} E(Xi).


6.65 Let k ≥ 0 be an integer and let T be a full binary tree, whose levels
are numbered 0, 1, 2, . . . , k. (The root is at level 0, whereas the leaves are
at level k.) Assume that each edge of T is removed with probability 1/2,
independently of other edges. Denote the resulting graph by T′.
    Define the random variable X to be the number of nodes that are
connected to the root by a path in T′; the root itself is included in X.
    In the left figure below, the tree T is shown for the case when k = 3. The
right figure shows the tree T′: The dotted edges are those that have been
removed from T, the black nodes are connected to the root by a path in T′,
whereas the white nodes are not connected to the root by a path in T′. For
this case, X = 6.




            [Figure: the tree T (left) and the graph T′ (right)]




   • Let n be the number of nodes in the tree T . Express n in terms of k.

   • Prove that the expected value E(X) of the random variable X is equal
     to
                                   E(X) = log(n + 1).


      Hint: For any ℓ with 0 ≤ ℓ ≤ k, how many nodes of T are at level ℓ?
      Use indicator random variables to determine the expected number of
      level-ℓ nodes of T that are connected to the root by a path in T′.


6.66 Let n ≥ 2 be a power of two and consider a full binary tree with n leaves.
Let a1 , a2 , . . . , an be a random permutation of the numbers 1, 2, . . . , n. Store
this permutation at the leaves of the tree, in the order a1 , a2 , . . . , an , from
left to right. For example, if n = 8 and the permutation is 2, 8, 1, 4, 6, 3, 5, 7,
then we obtain the following tree:




    [Figure: a full binary tree whose leaves store, from left to right,
     2, 8, 1, 4, 6, 3, 5, 7]

   Perform the following process on the tree:
   • Visit the levels of the tree from bottom to top.

   • At each level, take all pairs of consecutive nodes that have the same
     parent. For each such pair, compare the numbers stored at the two
     nodes, and store the smaller of these two numbers at the common
     parent.
   For our example tree, we obtain the following tree:

                                            1

                            1                               3

                    2               1               3               5

                2       8       1       4       6       3       5       7

   It is clear that at the end of this process, the root stores the number 1.
Define the random variable X to be the number that is not equal to 1 and
that is stored at a child of the root; think of X being the “loser of the final
game”. For our example tree, X = 3.
   In this exercise, you will determine the expected value E(X) of the random
variable X.

   • Prove that 2 ≤ X ≤ 1 + n/2.

   • Prove that the following is true for each k with 1 ≤ k ≤ n/2: X ≥ k +1
     if and only if


        – all numbers 1, 2, . . . , k are stored in the left subtree of the root
        – or all numbers 1, 2, . . . , k are stored in the right subtree of the
          root.
   • Prove that for each k with 1 ≤ k ≤ n/2,
          Pr(X ≥ k + 1) = 2 · \binom{n/2}{k} \frac{k!(n − k)!}{n!}
                        = 2 · \binom{n/2}{k} / \binom{n}{k}.

   • According to Exercise 6.31, we have
                        E(X) = \sum_{k=1}^{∞} Pr(X ≥ k).

      Prove that
               E(X) = Pr(X ≥ 1) + \sum_{k=1}^{n/2} Pr(X ≥ k + 1).

   • Use Exercise 3.67 to prove that
                           E(X) = 3 − \frac{4}{n + 2}.
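Since each child of the root ends up storing the minimum of its half of the leaves, X is the minimum of the half that does not contain 1. The sketch below (ours, not part of the text) uses this observation to estimate E(X) by simulation; here n must be a power of two:

```python
import random

def final_loser(n, rng):
    # shuffle 1..n onto the leaves; the child of the root not storing 1
    # stores the minimum of the half that does not contain 1
    perm = list(range(1, n + 1))
    rng.shuffle(perm)
    left, right = perm[:n // 2], perm[n // 2:]
    return min(right) if 1 in left else min(left)
```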

6.67 If X is a random variable that can take any value in {1, 2, 3, . . .}, and
A is an event, then the conditional expected value E(X | A) is given by
                  E(X | A) = \sum_{k=1}^{∞} k · Pr(X = k | A).

In words, E(X | A) is the expected value of X, when you are given that the
event A occurs.
    You roll a fair die repeatedly, and independently, until you see the num-
ber 6. Define the random variable X to be the number of times you roll the
die (this includes the last roll, in which you see the number 6). It follows
from Theorem 6.6.2 that E(X) = 6. Let A be the event
              A = “the results of all rolls are even numbers”.
Determine the conditional expected value E(X | A).
Hint: E(X | A) ≠ 3. Recall that \sum_{k=1}^{∞} k · x^{k−1} = 1/(1 − x)^2.


6.68 For any integer n ≥ 0 and any real number x with 0 < x < 1, define
the function
                     Fn(x) = \sum_{k=n}^{∞} \binom{k}{n} x^k.

(Using the ratio test from calculus, it can be shown that this infinite series
converges for any fixed integer n.)

   • Determine a closed form expression for F0 (x).

   • Let n ≥ 1 be an integer and let x be a real number with 0 < x < 1.
     Prove that
               Fn(x) = \frac{x}{n} · F_{n−1}(x) + \frac{x^2}{n} · F′_{n−1}(x),

      where F′_{n−1} denotes the derivative of F_{n−1}.
      Hint: If k ≥ n ≥ 1, then \binom{k}{n} = \frac{k}{n} \binom{k−1}{n−1}.

   • Prove that for any integer n ≥ 0 and any real number x with 0 < x < 1,
                         Fn(x) = \frac{x^n}{(1 − x)^{n+1}},

         and

                         F′n(x) = \frac{x^n + n · x^{n−1}}{(1 − x)^{n+2}}.

   • Let n ≥ 0 and m be integers with m ≥ n + 1. Prove that
          \sum_{ℓ=0}^{\min(n+1, m−n)} (−1)^ℓ \binom{n+1}{ℓ} \binom{m−ℓ}{n} = 0.

         Hint: You have shown above that

         (1 − x)^{n+1} \sum_{k=n}^{∞} \binom{k}{n} x^k = (1 − x)^{n+1} · Fn(x) = x^n.   (6.13)

        Use Newton’s Binomial Theorem to expand (1 − x)n+1 . Then consider
        the expansion of the left-hand side in (6.13). What is the coefficient of
        xm in this expansion?


6.69 Consider a fair red coin and a fair blue coin. We repeatedly flip both
coins, and keep track of the number of times that the red coin comes up
heads. As soon as the blue coin comes up tails, the process terminates.
    A formal description of this process is given in the pseudocode below.
The value of the variable i is equal to the number of iterations performed
so far, the value of the variable h is equal to the number of times that the
red coin came up heads so far, whereas the Boolean variable stop is used to
decide when the while-loop terminates.
      Algorithm RandomCoinFlips:

          // both the red coin and the blue coin are fair
          // all coin flips are mutually independent
          i = 0;
          h = 0;
          stop = false;
          while stop = false
          do i = i + 1;
              flip the red coin;
              if the result of the red coin is heads
              then h = h + 1
              endif;
              flip the blue coin;
              if the result of the blue coin is tails
              then stop = true
              endif
          endwhile;
          return i and h

   Consider the random variables

 X = the value of i that is returned by algorithm RandomCoinFlips,
 Y = the value of h that is returned by algorithm RandomCoinFlips.

   Assume that the value of the random variable Y is equal to some integer
n ≥ 0. In this exercise, you will determine the expected value of the random
variable X.
   Thus, we are interested in the conditional expected value E(X | Y = n),
which is the expected value of X (i.e., the number of iterations of the while-


loop), when you are given that the event “Y = n” (i.e., during the while-loop,
the red coin comes up heads n times) occurs. Formally, we have
                 E(X | Y = n) = \sum_{k} k · Pr(X = k | Y = n),

where the summation ranges over all values of k that X can take.
   The functions Fn and F′n that are used below are the same as those in
Exercise 6.68.

   • Let n ≥ 1 be an integer. Prove that
              Pr(Y = n) = \sum_{k=n}^{∞} Pr(Y = n | X = k) · Pr(X = k).


   • Prove that
              Pr(Y = 0) = \sum_{k=1}^{∞} Pr(Y = 0 | X = k) · Pr(X = k).


   • Let n ≥ 1 be an integer. Prove that

                                Pr(Y = n) = Fn (1/4).

   • Prove that
                                Pr(Y = 0) = 1/3.
   • Let n ≥ 1 be an integer. Prove that
                     E(X | Y = n) = \frac{F′n(1/4)}{4 · Fn(1/4)}.

   • Let n ≥ 1 be an integer. Prove that
                         E(X | Y = n) = \frac{4n + 1}{3}.

   • Prove that
                            E(X | Y = 0) = 4/3.
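Algorithm RandomCoinFlips translates directly into Python; the sketch below (ours, not part of the text) can be used to estimate quantities such as Pr(Y = n) before you derive them:

```python
import random

def random_coin_flips(rng):
    # one run of RandomCoinFlips: returns (i, h), where i is the number of
    # iterations until the blue coin comes up tails, and h is the number of
    # heads seen on the red coin during those iterations
    i = h = 0
    while True:
        i += 1
        if rng.random() < 0.5:   # red coin comes up heads
            h += 1
        if rng.random() < 0.5:   # blue coin comes up tails
            return i, h
```

Tabulating the returned pairs over many runs gives empirical estimates of Pr(Y = n) and of E(X | Y = n).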


6.70 Let (S, Pr) be a probability space, and let X and Y be two identical
non-negative random variables on S. Thus, for all ω in S, X(ω) = Y (ω) ≥ 0.
   Consider the new probability space (S 2 , Pr), where S 2 is the Cartesian
product S × S and

                         Pr (ω1 , ω2 ) = Pr (ω1 ) · Pr (ω2 )

for all elements (ω1 , ω2 ) in S 2 . (In words, we choose two elements ω1 and ω2
in S, independently of each other.)
    Consider the random variable Z on S 2 defined by

                 Z(ω1, ω2) = min( (X(ω1))^2 , (Y(ω2))^2 )


for all (ω1 , ω2 ) in S 2 . Observe that the expected value of Z is equal to
            E(Z) = \sum_{ω1 ∈ S} \sum_{ω2 ∈ S} Z(ω1, ω2) · Pr(ω1, ω2).


   • Let a and b be two non-negative real numbers. Prove that

                              min(a^2, b^2) ≤ ab.


   • Prove that
                               E(Z) ≤ (E(X))^2.

6.71 Carleton University has implemented a new policy for students who
cheat on assignments:

   1. When a student is caught cheating, the student meets with the Dean.

   2. The Dean has a box that contains n coins. One of these coins has the
      number n written on it, whereas each of the other n − 1 coins has the
      number 1 written on it. Here, n is a very large integer.

   3. The student chooses a uniformly random coin from the box.

   4. If x is the number written on the chosen coin, then the student gives
      x² bottles of cider to Elisa Kazan.


   Consider the random variables

             X = the number written on the chosen coin,
             Z = the number of bottles of cider that Elisa gets.

(Note that Z = X².)

   • Prove that
                                E(X) = 2 − 1/n ≤ 2.

   • Prove that
                              E(Z) = n + 1 − 1/n ≥ n.

   • Prove that
                              E(X²) ≠ O((E(X))²).

   • By the arguments above, Elisa gets, on average, a very large amount
     of cider. Since she cannot drink all these bottles, Carleton University
      changes its policy:

          1. The student chooses a uniformly random coin from the box (and
             puts it back in the box).
          2. Again, the student chooses a uniformly random coin from the box
             (and puts it back in the box).
          3. If x is the number written on the first chosen coin, and y is the
             number written on the second chosen coin, then the student gives
              min(x², y²) bottles of cider to Elisa.

        Consider the random variables

               U = the number written on the first chosen coin,
               V = the number written on the second chosen coin,
               W = the number of bottles of cider that Elisa gets.

Use Exercise 6.70 to prove that

                                  E(W ) ≤ 4.
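All three expectations can be verified by direct computation. The following sketch does this for one concrete value of n (the choice n = 100 is ours; the exercise only says that n is very large):

```python
from fractions import Fraction
from itertools import product

n = 100   # our choice; the exercise only requires n to be "very large"

# One coin shows n, the other n - 1 coins show 1; each coin is chosen
# with probability 1/n.
values = [n] + [1] * (n - 1)
p = Fraction(1, n)

EX = sum(p * x for x in values)
EZ = sum(p * x * x for x in values)

# E(W) for two independent draws with replacement.
EW = sum(p * p * min(x * x, y * y) for x, y in product(values, repeat=2))

assert EX == 2 - Fraction(1, n)
assert EZ == n + 1 - Fraction(1, n)
assert EW <= 4                      # the bound from Exercise 6.70
```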
Chapter 7

The Probabilistic Method

The Probabilistic Method is a very powerful and surprising tool that uses
probability theory to prove results in discrete mathematics. In this chapter,
we will illustrate this method using several examples.


7.1     Large Bipartite Subgraphs
Recall that a graph is a pair G = (V, E), where V is a finite set whose
elements are called vertices and E is a set whose elements are unordered
pairs of distinct vertices. The elements of E are called edges. Assume we
partition the vertex set V of G into two subsets A and B (thus, A ∩ B = ∅
and A ∪ B = V ). We say that an edge of E is between A and B, if one vertex
of this edge is in A and the other vertex is in B.
          [Figure: a graph with vertices a, b, c, d, e and eight edges.]

   For example, in the graph above, let A = {a, d} and B = {b, c, e}. Then
four of the eight edges are between A and B, namely {a, b}, {a, e}, {d, c},
and {d, e}. Thus, the vertex set of this graph can be partitioned into two
subsets A and B, such that at least half of G’s edges are between A and B.
The following theorem states that this is true for any graph.


Theorem 7.1.1 Let G = (V, E) be a graph with m edges. The vertex set V
of G can be partitioned into two subsets A and B such that the number of
edges between A and B is at least m/2.

Proof. Consider the following random process: Initialize A = ∅ and B = ∅.
For each vertex u of G, flip a fair and independent coin. If the coin comes
up heads, add u to A; otherwise, add u to B.
   Define the random variable X to be the number of edges of G that are
between A and B. We will determine the expected value E(X) of X.
   Number the edges of G arbitrarily as e1 , e2 , . . . , em . For each i with 1 ≤
i ≤ m, consider the indicator random variable

           Xi = 1 if ei is an edge between A and B, and Xi = 0 otherwise.

Then
                              X = Σ_{i=1}^{m} Xi

and

        E(X) = E(Σ_{i=1}^{m} Xi ) = Σ_{i=1}^{m} E(Xi ) = Σ_{i=1}^{m} Pr(Xi = 1).



To determine Pr (Xi = 1), let ei have vertices a and b. The following ta-
ble shows the four possibilities for a and b; each one of them occurs with
probability 1/4.

                             a ∈ A, b ∈ A         Xi   =0
                             a ∈ A, b ∈ B         Xi   =1
                             a ∈ B, b ∈ A         Xi   =1
                             a ∈ B, b ∈ B         Xi   =0


Since Xi = 1 in two out of the four cases, we have

                           Pr (Xi = 1) = 2/4 = 1/2,

and it follows that
                        E(X) = Σ_{i=1}^{m} 1/2 = m/2.

   Assume the claim in the theorem does not hold. Then, no matter how
we partition the vertex set V into A and B, the number of edges between
A and B will be less than m/2. In particular, the random variable X will
always be less than m/2. But then, E(X) < m/2 as well, contradicting that
E(X) = m/2.
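The averaging argument in this proof can be checked exhaustively on a small graph. The edge list below is a hypothetical example, not necessarily the graph from the figure:

```python
from itertools import combinations

V = ["a", "b", "c", "d", "e"]
E = [("a", "b"), ("a", "d"), ("a", "e"), ("b", "c"),
     ("b", "e"), ("c", "d"), ("c", "e"), ("d", "e")]
m = len(E)

def cut_size(A):
    """Number of edges with exactly one endpoint in A."""
    return sum((u in A) != (v in A) for u, v in E)

# All 2^v partitions (A, V \ A):
cuts = [cut_size(set(S))
        for r in range(len(V) + 1)
        for S in combinations(V, r)]

# The average cut size over a uniformly random partition equals m/2 ...
assert sum(cuts) / len(cuts) == m / 2
# ... so some partition has at least m/2 edges between A and B.
assert max(cuts) >= m / 2
```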



7.2      Ramsey Theory
We return to a problem that we have seen in Section 1.1. Consider a complete
graph with n vertices, in which each vertex represents a person. Any pair of
distinct vertices is connected by an edge. Such an edge is solid if the two
persons representing the vertices of this edge are friends. If these persons are
strangers, the edge is dashed. Consider a subset S of k vertices. We say that
S is a solid k-clique, if any two distinct vertices in S are connected by a solid
edge. Thus, a solid k-clique represents a group of k mutual friends. If any
two distinct vertices of S are connected by a dashed edge, then we say that
S is a dashed k-clique; this represents a group of k mutual strangers.
    In Section 1.1, we stated, without proof, Theorem 1.1.3. We repeat the
statement of this theorem and use the Probabilistic Method to prove it.

Theorem 7.2.1 Let k ≥ 3 and n ≥ 3 be integers with n ≤ ⌊2^{k/2}⌋. There
exists a complete graph with n vertices, in which each edge is either solid or
dashed, such that this graph does not contain a solid k-clique and does not
contain a dashed k-clique.

Proof. We denote the complete graph with n vertices by Kn . Consider the
following random process: For each edge e of Kn , flip a fair and independent
coin. If the coin comes up heads, make e a solid edge; otherwise, make e a
dashed edge.


   Consider the event

        A = “there is a solid k-clique or there is a dashed k-clique”.

We will prove below that Pr(A) < 1. This will imply that Pr(Ā) > 0, i.e.,
the event

        Ā = “there is no solid k-clique and there is no dashed k-clique”

has a positive probability. This, in turn, will imply that the statement in the
theorem holds: if the statement did not hold, then Pr(Ā) would be zero.
    Thus, it remains to prove that Pr(A) < 1. The vertex set of Kn has
exactly C(n, k) subsets of size k. We denote these subsets by Vi , for
i = 1, 2, . . . , C(n, k). For each i with 1 ≤ i ≤ C(n, k), consider the event

              Ai = “Vi is a solid k-clique or a dashed k-clique”.

Since the event Ai occurs if and only if the edges joining the C(k, 2) pairs of
vertices of Vi are either all solid or all dashed, we have

                            Pr(Ai ) = 2 / 2^{C(k,2)} ;

note that the denominator is equal to 2 to the power C(k, 2).

    Since A occurs if and only if A1 ∨ A2 ∨ · · · ∨ A_{C(n,k)} occurs, the Union
Bound (i.e., Lemma 5.3.5) implies that

            Pr(A) = Pr(A1 ∨ A2 ∨ · · · ∨ A_{C(n,k)})
                  ≤ Σ_{i=1}^{C(n,k)} Pr(Ai )
                  = Σ_{i=1}^{C(n,k)} 2 / 2^{C(k,2)}
                  = 2 · C(n, k) / 2^{C(k,2)} .


If we can show that the quantity in the last line is less than one, then the
proof is complete. We have

  2 · C(n, k) / 2^{C(k,2)} = [n(n − 1)(n − 2) · · · (n − k + 1) / k!] · [2 / 2^{(k²−k)/2}]
                           ≤ (n^k / k!) · (2^{1+k/2} / 2^{k²/2}).

Since n ≤ ⌊2^{k/2}⌋ ≤ 2^{k/2}, we get

  2 · C(n, k) / 2^{C(k,2)} ≤ ((2^{k/2})^k / k!) · (2^{1+k/2} / 2^{k²/2})
                           = 2^{1+k/2} / k!.

By Exercise 2.8, we have k! > 2^{1+k/2} for k ≥ 3. Thus, we conclude that

                       2 · C(n, k) / 2^{C(k,2)} < 1.




    Take, for example, k = 20 and n = 1024. Theorem 7.2.1 states that there
exists a group of 1024 people that does not contain a subgroup of 20 mutual
friends and does not contain a subgroup of 20 mutual strangers. In fact, the
proof shows more: Consider a group of 1024 people such that any two are
friends with probability 1/2, and strangers with probability 1/2. The above
proof shows that Pr(A), i.e., the probability that there is a subgroup of 20
mutual friends or there is a subgroup of 20 mutual strangers, satisfies

                       Pr(A) ≤ 2^{1+k/2}/k! = 2^{11}/20!.

Therefore, with probability at least

                    1 − 2^{11}/20! = 0.999999999999999158,
(there are 15 nines) this group does not contain a subgroup of 20 mutual
friends and does not contain a subgroup of 20 mutual strangers.
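The inequalities in this calculation are easy to confirm numerically; the sketch below simply re-evaluates them for the values k = 20 and n = 1024 used above:

```python
from math import comb, factorial

k, n = 20, 1024
assert n <= 2 ** (k // 2)                      # n <= floor(2^(k/2))

# Union bound from the proof: Pr(A) <= 2 * C(n,k) / 2^C(k,2) < 1.
bound = 2 * comb(n, k) / 2 ** comb(k, 2)
assert bound < 1

# The weaker estimate 2^(1+k/2) / k! used at the end of the section:
weaker = 2 ** (1 + k // 2) / factorial(k)
assert bound <= weaker < 1
assert 1 - weaker > 0.999999999999999          # 15 nines
```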


7.3      Sperner’s Theorem
In Section 1.2, we considered the following problem. Let S be a set of size n
and consider a sequence S1 , S2 , . . . , Sm of m subsets of S, such that for all i
and j with i ≠ j,
                            Si ⊈ Sj and Sj ⊈ Si .                          (7.1)
What is the largest possible value of m for which such a sequence exists?
    The sequence consisting of all subsets of S having size ⌊n/2⌋ satisfies
(7.1). This sequence has length m = C(n, ⌊n/2⌋). In Section 1.2, we stated,
without proof, that this is the largest possible value of m; see Theorem 1.2.1.
After stating this theorem again, we will prove it using the Probabilistic
Method.

Theorem 7.3.1 (Sperner) Let n ≥ 1 be an integer and let S be a set with
n elements. Let S1 , S2 , . . . , Sm be a sequence of m subsets of S, such that for
all i and j with i ≠ j,
                           Si ⊈ Sj and Sj ⊈ Si .
Then
                            m ≤ C(n, ⌊n/2⌋).

Proof. We assume that none of the subsets in the sequence S1 , S2 , . . . , Sm is
empty, because otherwise, m must be equal to 1, in which case the theorem
clearly holds.
    We assume that S = {1, 2, . . . , n}. We choose a uniformly random per-
mutation a1 , a2 , . . . , an of the elements of S; thus, each permutation has
probability 1/n! of being chosen. Consider the following sequence of subsets
A1 , A2 , . . . , An of S: For j = 1, 2, . . . , n,

                             Aj = {a1 , a2 , . . . , aj }.

For example, if n = 4 and the permutation is 3, 1, 4, 2, then

                               A1   =    {3},
                               A2   =    {1, 3},
                               A3   =    {1, 3, 4},
                               A4   =    {1, 2, 3, 4}.


Observe that the subsets A1 , A2 , . . . , An are random subsets of S, because
the permutation was randomly chosen.
     Consider a subset Si in the statement of the theorem. We say that Si
occurs in the sequence A1 , A2 , . . . , An if there is an index j such that Si = Aj .
     Define the random variable X to be the number of subsets in the sequence
S1 , S2 , . . . , Sm that occur in A1 , A2 , . . . , An . Since the subsets A1 , A2 , . . . , An
are properly nested, i.e.,
                                   A1 ⊂ A2 ⊂ · · · ⊂ An ,
the assumption in the theorem implies that X is either 0 or 1. It follows that
the expected value of X satisfies
                                           E(X) ≤ 1.
We now derive an exact expression for the value of E(X). For each i with
1 ≤ i ≤ m, consider the indicator random variable

          Xi = 1 if Si occurs in the sequence A1 , A2 , . . . , An , and
          Xi = 0 otherwise.
Let k denote the size of the subset Si , i.e., k = |Si |. Then
                             Xi = 1 if and only if Si = Ak .
Since Ak = {a1 , a2 , . . . , ak }, Xi = 1 if and only if the first k values in the
permutation form a permutation of the subset Si :
                  a1 , a2 , . . . , ak ,  ak+1 , ak+2 , . . . , an
                  (the first k values form a permutation of Si )
The Product Rule of Section 3.1 shows that there are k!(n − k)! many per-
mutations of S that have this property. Therefore, since we chose a random
permutation of S, we have
        E(Xi ) = Pr(Xi = 1) = k!(n − k)!/n! = 1/C(n, k) = 1/C(n, |Si |).


   Thus, since
                              X = Σ_{i=1}^{m} Xi ,
we get

     E(X) = E(Σ_{i=1}^{m} Xi ) = Σ_{i=1}^{m} E(Xi ) = Σ_{i=1}^{m} 1/C(n, |Si |).

If we combine this with our upper bound E(X) ≤ 1, we get

                        Σ_{i=1}^{m} 1/C(n, |Si |) ≤ 1.

For a fixed value of n, the binomial coefficient C(n, k) is maximized when
k = ⌊n/2⌋; i.e., the largest value in the n-th row of Pascal’s Triangle (see
Section 3.8) is in the middle. Thus,

                       C(n, |Si |) ≤ C(n, ⌊n/2⌋),

implying that

   1 ≥ Σ_{i=1}^{m} 1/C(n, |Si |) ≥ Σ_{i=1}^{m} 1/C(n, ⌊n/2⌋) = m/C(n, ⌊n/2⌋).

We conclude that
                            m ≤ C(n, ⌊n/2⌋).
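For small n, both the theorem and the intermediate inequality Σ_{i=1}^{m} 1/C(n, |Si |) ≤ 1 can be verified by brute force. A sketch for n = 4 (our choice):

```python
from fractions import Fraction
from itertools import combinations
from math import comb

n = 4
subsets = [frozenset(c) for r in range(n + 1)
           for c in combinations(range(n), r)]

def is_antichain(family):
    """True if no member of the family contains another member."""
    return all(not (A <= B or B <= A) for A, B in combinations(family, 2))

# Brute force over all 2^16 families of subsets of {0, 1, 2, 3}.
best = 0
for mask in range(1 << len(subsets)):
    family = [S for i, S in enumerate(subsets) if mask >> i & 1]
    if is_antichain(family):
        best = max(best, len(family))

assert best == comb(n, n // 2)      # Sperner: the maximum is C(4, 2) = 6

# The middle layer satisfies the inequality from the proof with equality.
middle = [S for S in subsets if len(S) == n // 2]
assert is_antichain(middle)
assert sum(Fraction(1, comb(n, len(S))) for S in middle) == 1
```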


7.4       The Jaccard Distance between Finite Sets
Let X and Y be two finite and non-empty sets. We want to define a measure
that indicates how “close together” these two sets are. This measure should
be equal to 0 if the two sets are the same (i.e., X = Y ), it should be equal
to 1 if the two sets are disjoint (i.e., X ∩ Y = ∅), and it should be in the
open interval (0, 1) in all other cases.
The symmetric difference X △ Y is defined to be the “union minus the
intersection”, i.e.,
                         X △ Y = (X ∪ Y ) \ (X ∩ Y ).
From the Venn diagram below, it should be clear that

                        X △ Y = (X \ Y ) ∪ (Y \ X),

i.e., the set consisting of all elements in X that are not in Y and all elements
in Y that are not in X.
     [Venn diagram of two overlapping sets X and Y ; the regions X \ Y
     and Y \ X together form X △ Y .]




    If the symmetric difference X △ Y is “small” compared to the union X ∪Y ,
then the two sets X and Y are “pretty much the same”. On the other hand,
if X △ Y is “large” compared to X ∪ Y , then the sets X and Y are “very
different”.
    Based on this, the Jaccard distance dJ (X, Y ) between the two finite and
non-empty sets X and Y is defined as
                    dJ (X, Y ) = |X △ Y | / |X ∪ Y |.                      (7.2)
Since
                       |X △ Y | = |X ∪ Y | − |X ∩ Y |,
we have
                    dJ (X, Y ) = 1 − |X ∩ Y | / |X ∪ Y |.                  (7.3)
   The following claims are easy to verify:


   • 0 ≤ dJ (X, Y ) ≤ 1.

   • dJ (X, Y ) = dJ (Y, X).

   • dJ (X, X) = 0.

   • If X ∩ Y = ∅, then dJ (X, Y ) = 1.

   • If X ≠ Y and X ∩ Y ≠ ∅, then 0 < dJ (X, Y ) < 1.
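These claims translate directly into code. A minimal sketch using Python's set operations (the example sets are our own):

```python
def jaccard_distance(X, Y):
    """d_J(X, Y) = |X symmetric-difference Y| / |X union Y|."""
    return len(X ^ Y) / len(X | Y)

X, Y = {1, 2, 3}, {2, 3, 4, 5}          # hypothetical example sets

d = jaccard_distance(X, Y)
assert 0 <= d <= 1
assert jaccard_distance(X, Y) == jaccard_distance(Y, X)
assert jaccard_distance(X, X) == 0
assert jaccard_distance({1, 2}, {3, 4}) == 1    # disjoint sets
assert 0 < d < 1                # X != Y and X, Y not disjoint

# The equivalent form (7.3): d_J(X, Y) = 1 - |X ∩ Y| / |X ∪ Y|.
assert abs(d - (1 - len(X & Y) / len(X | Y))) < 1e-12
```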

   In the rest of this section, we will prove that the Jaccard distance satisfies
the triangle inequality:


Theorem 7.4.1 Let X, Y , and Z be finite and non-empty sets. Then

                       dJ (X, Z) ≤ dJ (X, Y ) + dJ (Y, Z).


    We will present two proofs of this result. The first proof uses “brute
force”: We consider the Venn diagram for the sets X, Y , and Z. Based on
this diagram, we transform the inequality in Theorem 7.4.1 into an equiv-
alent algebraic inequality. We then argue that the algebraic inequality is
valid. In the second proof, we show that the inequality in Theorem 7.4.1
can be rephrased as an inequality involving probabilities. The result then
follows by straightforward applications of Lemma 5.3.6 and the Union Bound
(Lemma 5.3.5).


7.4.1     The First Proof
In the figure below, you see the Venn diagram for the three sets X, Y , and Z.
The variables a, b, . . . , g denote the number of elements in the different parts
of this diagram. For example, d denotes the number of elements that are
in X and in Y , but not in Z, whereas g denotes the number of elements
that are in all three sets. Note that some of these variables may be equal
to 0. However, since none of the three sets X, Y , and Z is empty, we have
a + d + f + g > 0, b + d + e + g > 0, and c + e + f + g > 0.

     [Venn diagram of X, Y , and Z: a = elements in X only, b = in Y only,
     c = in Z only, d = in X ∩ Y but not in Z, e = in Y ∩ Z but not in X,
     f = in X ∩ Z but not in Y , and g = in X ∩ Y ∩ Z.]

   Using the definition of Jaccard distance in (7.2), the inequality in Theo-
rem 7.4.1 is equivalent to

  (a+c+d+e)/(a+c+d+e+f+g) ≤ (a+b+e+f)/(a+b+d+e+f+g) + (b+c+d+f)/(b+c+d+e+f+g),

which we rewrite as

  (a+b+e+f)/(a+b+d+e+f+g) + (b+c+d+f)/(b+c+d+e+f+g) − (a+c+d+e)/(a+c+d+e+f+g) ≥ 0.
After combining the three fractions into one fraction, and expanding the
three products in the numerator of the resulting fraction, we get¹

                                  N/D ≥ 0,
where
 N = a²b + ab² + a²c + 2abc + b²c + ac² + bc² + a²d + 2abd + b²d +
     2acd + 2bcd + ad² + bd² + 2abe + b²e + 2ace + 2bce + c²e +
     ade + 2bde + cde + be² + ce² + a²f + 4abf + 2b²f + 4acf +
     4bcf + c²f + 4adf + 5bdf + 3cdf + 2d²f + 3aef + 5bef + 4cef +
     4def + 2e²f + 3af² + 4bf² + 3cf² + 4df² + 4ef² + 2f³ + 2abg +
     2b²g + 2acg + 2bcg + adg + 3bdg + 3beg + ceg + 3afg + 6bfg +
     3cfg + 4dfg + 4efg + 4f²g + 2bg² + 2fg²
and
D = (a + b + d + e + f + g)(b + c + d + e + f + g)(a + c + d + e + f + g).
Observe that D > 0. Moreover, all terms in the equation for N are non-
negative and they are connected by plus signs. It follows that N ≥ 0 and,
therefore, N/D ≥ 0. Thus, we have proved Theorem 7.4.1.
   ¹ With some help from Wolfram Alpha.


7.4.2     The Second Proof
Consider the set X ∪ Y ∪ Z; being a set, it contains no duplicate elements.
Let n = |X ∪ Y ∪ Z| and consider a uniformly random permutation

                                x1 , x2 , x3 , . . . , xn

of the elements of X ∪ Y ∪ Z. Consider the random variables

                            i = min{` : x` ∈ X},
                            j = min{` : x` ∈ Y },
                            k = min{` : x` ∈ Z}.

In words, i is determined by walking along the sequence x1 , x2 , x3 , . . . , xn ,
from left to right. The value of i is the index of the first element that
belongs to the set X.
   Consider the event
                               AXY = “i ≠ j”.
We are going to determine the probability Pr (AXY ) that this event occurs.
Observe that
                        Pr(AXY ) = 1 − Pr(ĀXY ),
where ĀXY is the event
                             ĀXY = “i = j”.

To determine Pr(ĀXY ), we do the following. Remove from the sequence
x1 , x2 , . . . , xn all elements that do not belong to X and do not belong to Y .
Then we are left with a uniformly random permutation of the set X ∪Y . The
event ĀXY occurs if and only if the first element of this new sequence belongs
to both X and Y . Since each element of X ∪ Y has the same probability of
being the first element in this new sequence, it follows that
                     Pr(ĀXY ) = |X ∩ Y | / |X ∪ Y |

and, thus, using (7.3),

                Pr(AXY ) = 1 − |X ∩ Y |/|X ∪ Y | = dJ (X, Y ).


   If we consider the events

                               AXZ = “i ≠ k”

and
                               AY Z = “j ≠ k”,
then we have, by the same arguments,

                           Pr (AXZ ) = dJ (X, Z)

and
                           Pr (AY Z ) = dJ (Y, Z).
Thus, the inequality in Theorem 7.4.1 is equivalent to

                     Pr (AXZ ) ≤ Pr (AXY ) + Pr (AY Z ) .

Since
                           i ≠ k ⇒ i ≠ j ∨ j ≠ k,
Lemma 5.3.6 implies that

                       Pr (AXZ ) ≤ Pr (AXY ∨ AY Z ) .

By applying the Union Bound (Lemma 5.3.5), we conclude that

                     Pr (AXZ ) ≤ Pr (AXY ) + Pr (AY Z ) .

Thus, we have completed our second proof of Theorem 7.4.1.
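The identity Pr(AXY ) = dJ (X, Y ) at the heart of this proof can be illustrated by a small Monte Carlo experiment; the sets and the number of trials below are our own choices:

```python
import random

X, Y, Z = {1, 2, 3, 4}, {3, 4, 5, 6}, {5, 6, 7}   # hypothetical sets
universe = list(X | Y | Z)

def d_J(A, B):
    return len(A ^ B) / len(A | B)

# Triangle inequality (Theorem 7.4.1) on this example:
assert d_J(X, Z) <= d_J(X, Y) + d_J(Y, Z)

# Estimate Pr(A_XY) = Pr(i != j) over uniformly random permutations:
random.seed(17)
trials = 100_000
hits = 0
for _ in range(trials):
    perm = random.sample(universe, len(universe))
    i = next(t for t, x in enumerate(perm) if x in X)
    j = next(t for t, x in enumerate(perm) if x in Y)
    hits += (i != j)

# The estimate should be close to d_J(X, Y) = 2/3.
assert abs(hits / trials - d_J(X, Y)) < 0.01
```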


7.5      Planar Graphs and the Crossing Lemma
Consider a graph G = (V, E). Any one-to-one function f : V → R² gives an
embedding of G:
  1. Each vertex a of V is drawn as the point f (a) in the plane.

  2. Each edge {a, b} of E is drawn as the straight-line segment f (a)f (b)
     between the points f (a) and f (b).
Besides the function f being one-to-one, we assume that it satisfies the fol-
lowing three properties:


  1. For any two edges {a, b} and {a′, b′} of E, the intersection of the line
     segments f (a)f (b) and f (a′)f (b′) is empty or consists of exactly one
     point.
  2. For any edge {a, b} in E and any vertex c in V , the point f (c) is not
     in the interior of the line segment f (a)f (b).
  3. For any three edges {a, b}, {a′, b′}, and {a″, b″} of E, the line segments
     f (a)f (b), f (a′)f (b′), and f (a″)f (b″) do not have a point in common
     that is in the interior of any of these line segments.
    For simplicity, we do not distinguish any more between a graph and its
embedding. That is, a vertex a refers to both an element of V and the point
in the plane that represents a. Similarly, an edge refers to both an element
of E and the line segment that represents it.

7.5.1    Planar Graphs
Definition 7.5.1 An embedding of a graph G = (V, E) is called plane, if no
two edges of E intersect, except possibly at their endpoints. A graph G is
called planar if there is a plane embedding of G.

    Consider a plane embedding of a planar graph G. Again for simplicity, we
denote this embedding by G. This embedding consists of vertices, edges, and
faces (one of them being the unbounded face). For example, in the following
plane embedding, there are 11 vertices, 18 edges, and 9 faces.

     [Figure: a plane embedding with 11 vertices, 18 edges, and 9 faces.]

   In the rest of this section, we will use the following notation:
   • G denotes a plane embedding of a planar graph.
   • v denotes the number of vertices of G.


   • e denotes the number of edges of G.

   • f denotes the number of faces in the embedding of G.

   How many edges can G have? Since G has v vertices, we obviously have
e ≤ C(v, 2) = Θ(v²), an upper bound which holds for any graph with v vertices.
Since our graph G is planar, we expect a much smaller upper bound on e: If
G has Θ(v²) edges, then it seems to be impossible to draw G without edge
crossings. Below, we will prove that e is, in fact, at most linear in v. The
proof will use Euler’s Theorem for planar graphs:

Theorem 7.5.2 (Euler) Consider any plane embedding of a planar graph
G. Let v, e, and f be the number of vertices, edges, and faces of this embed-
ding, respectively. Moreover, let c be the number of connected components of
G. Then
                              v − e + f = c + 1.                         (7.4)

Proof. The idea of the proof is as follows. We start by removing all edges
from G (but keep all vertices), and show that (7.4) holds. Then we add back
the edges of G, one by one, and show that (7.4) remains valid throughout
this process.
    After having removed all edges, we have e = 0 and the embedding consists
of a collection of v points. Since f = 1 and c = v, the relation v −e+f = c+1
holds.
    Assume the relation v − e + f = c + 1 holds and consider what happens
when we add an edge ab. There are two possible cases.
Case 1: Before adding the edge ab, the vertices a and b belong to the same
connected component.



          [Figure: one connected component containing both a and b.]

   When adding the edge ab,

   • the number v of vertices does not change,


   • the number e of edges increases by one,

   • the number f of faces increases by one (because the edge ab splits one
     face into two),

   • the number c of connected components does not change.

It follows that the relation v − e + f = c + 1 still holds after ab has been
added.
Case 2: Before adding the edge ab, the vertices a and b belong to different
connected components.




     [Figure: two connected components, one containing a, the other containing b.]

   When adding the edge ab,

   • the number v of vertices does not change,

   • the number e of edges increases by one,

   • the number f of faces does not change,

   • the number c of connected components decreases by one.

It again follows that the relation v − e + f = c + 1 still holds after ab has
been added.
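The invariant maintained by this proof can be traced in code: add the edges one by one, track the number c of components with a union-find structure, and update f according to the two cases above. The edge list is a hypothetical example; for a plane embedding, every edge that closes a cycle splits a face (Case 1):

```python
class UnionFind:
    """Tracks the connected components of the growing graph."""
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False            # a and b already connected (Case 1)
        self.parent[rx] = ry
        return True                 # two components merged (Case 2)

v = 6
edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 2), (4, 5)]

uf = UnionFind(v)
e, f, c = 0, 1, v          # no edges yet: one face, v components
for a, b in edges:
    e += 1
    if uf.union(a, b):
        c -= 1             # Case 2: f unchanged, c decreases by one
    else:
        f += 1             # Case 1: the new edge splits a face
    assert v - e + f == c + 1      # Euler's relation (7.4) holds
```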

   Usually, Euler’s Theorem is stated for connected planar graphs, i.e., pla-
nar graphs for which c = 1:

Theorem 7.5.3 (Euler) Consider any plane embedding of a connected pla-
nar graph G. If v, e, and f denote the number of vertices, edges, and faces
of this embedding, respectively, then

                               v − e + f = 2.
7.5.   Planar Graphs and the Crossing Lemma                             385


   We now show how to use Euler’s Theorem to prove an upper bound on
the number of edges and faces of any connected planar graph:
Theorem 7.5.4 Let G be any plane embedding of a connected planar graph
with v ≥ 3 vertices. Then
  1. G has at most 3v − 6 edges and
  2. this embedding has at most 2v − 4 faces.
Proof. As before, let e and f denote the number of edges and faces of G,
respectively. If v = 3, then e ≤ 3 and f ≤ 2. Hence, in this case, we have
e ≤ 3v − 6 and f ≤ 2v − 4.
    Assume that v ≥ 4. We number the faces of G arbitrarily from 1 to f .
For each i with 1 ≤ i ≤ f , let mi denote the number of edges on the i-th
face of G. Since each edge lies on the boundary of at most two faces, the
summation m1 + m2 + · · · + mf counts each edge at most twice. Thus,

                         m1 + m2 + · · · + mf ≤ 2e.

On the other hand, since G is connected and v ≥ 4, each face has at least
three edges on its boundary, i.e., mi ≥ 3. It follows that

                         m1 + m2 + · · · + mf ≥ 3f.

Combining these two inequalities implies that 3f ≤ 2e, which we rewrite as
                                  f ≤ 2e/3.
Using Euler’s formula (with c = 1, because G is connected), we obtain
                         e = v + f − 2 ≤ v + 2e/3 − 2,
which is equivalent to
                                  e ≤ 3v − 6.
We also obtain
                     f ≤ 2e/3 ≤ 2(3v − 6)/3 = 2v − 4.
This completes the proof.
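The relations just proved are easy to check numerically. The following Python sketch (illustration only; the (v, e, f) counts are the standard values for these well-known plane embeddings) verifies Euler's formula and both bounds of Theorem 7.5.4:

```python
# Check Euler's formula v - e + f = 2 and the bounds e <= 3v - 6 and
# f <= 2v - 4 for some standard plane embeddings of connected planar graphs.
examples = {
    "K4 (tetrahedron)": (4, 6, 4),
    "cube": (8, 12, 6),
    "octahedron": (6, 12, 8),
}

for name, (v, e, f) in examples.items():
    assert v - e + f == 2      # Euler's formula (Theorem 7.5.3)
    assert e <= 3 * v - 6      # Theorem 7.5.4, part 1
    assert f <= 2 * v - 4      # Theorem 7.5.4, part 2
    print(name, "OK")
```

Note that K4 and the octahedron attain both bounds with equality, so neither bound can be improved.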


7.5.2    The Crossing Number of a Graph
Consider an embedding of a graph G = (V, E). We say that two distinct
edges of E cross if their interiors have a point in common. In this case, we
call this common point a crossing.
    The example below shows an embedding of the complete graph K6 on six
vertices, which are denoted by black dots. In this embedding, there are three
crossings, which are numbered 1, 2, and 3.




   [Figure: an embedding of K6 with three crossings, numbered 1, 2, and 3.]




Definition 7.5.5 The crossing number cr (G) of a graph G is defined to be
the minimum number of crossings in any embedding of G.

   Thus, a graph G is planar if and only if cr (G) = 0. The example above
shows that cr (K6 ) ≤ 3.
   In the rest of this section, we consider the following problem: Given a
graph G with v vertices and e edges, can we prove good bounds, in terms of
v and e, on the crossing number cr (G) of G?

A simple lower bound on the crossing number
Let G be any graph with v ≥ 3 vertices and e edges. Consider an embedding
of G having cr (G) crossings; hence, this embedding is the “best” one.
    We “make” G planar, by defining all crossings to be vertices. That is, let
H be the graph whose vertex set is the union of the vertex set of G and the
set of all crossings in the embedding. Edges of G are cut by the crossings
into smaller edges, which are edges in the graph H.
    The figure below shows the planar version of the embedding of K6 that
we saw before. This new graph has 9 vertices and 21 edges.

   [Figure: the embedding of K6 with its three crossings turned into vertices.]




   We make the following observations:

   • The graph H is planar, because it is embedded without any crossings.

   • The graph H has v + cr (G) vertices.

   • How many edges does H have? Any crossing in G is the intersection
     of exactly two edges of G; these two edges contribute four edges to H.
     Hence, for any crossing in G, the number of edges in H increases by
     two. It follows that H has e + 2 · cr (G) edges.

Since H is planar, we know from Theorem 7.5.4 that the number of its edges
is bounded from above by three times the number of its vertices minus six,
i.e.,
                     e + 2 · cr (G) ≤ 3(v + cr (G)) − 6.
By rewriting this inequality, we obtain the following result:

Theorem 7.5.6 For any graph G with v ≥ 3 vertices and e edges, we have

                             cr (G) ≥ e − 3v + 6.

   For example, consider the complete graph Kn on n vertices, where n ≥ 3.
Since this graph has n(n − 1)/2 edges, we obtain

          cr (Kn ) ≥ n(n − 1)/2 − 3n + 6 = n²/2 − 7n/2 + 6.           (7.5)

For n = 6, we get cr (K6 ) ≥ 3. Since we have seen before that cr (K6 ) ≤ 3, it
follows that cr (K6 ) = 3.
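The bound of Theorem 7.5.6 is trivial to evaluate. A small Python sketch (illustration only) reproduces the computation for K6:

```python
from math import comb

def crossing_lower_bound(v, e):
    """The lower bound cr(G) >= e - 3v + 6 of Theorem 7.5.6 (for v >= 3)."""
    return e - 3 * v + 6

# For K_n we have v = n and e = C(n, 2).  For n = 6 the bound is
# 15 - 18 + 6 = 3, which matches the embedding above with three
# crossings, so cr(K_6) = 3 exactly.
print(crossing_lower_bound(6, comb(6, 2)))  # → 3
```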


   Since Kn has e = n(n − 1)/2 edges and any two of them cross at most once,
we have the following obvious upper bound on the crossing number of Kn :

                     cr (Kn ) ≤ e(e − 1)/2 = O(n⁴).                   (7.6)

(Of course (7.6) holds for any graph with n vertices.)
   To conclude this subsection, (7.5) gives an n²-lower bound, whereas (7.6)
gives an n⁴-upper bound on the crossing number of Kn . In the next subsection,
we will determine the true asymptotic behavior of cr (Kn ).

A better lower bound on the crossing number
As before, let G be any graph with v ≥ 3 vertices and e edges. Again
we consider an embedding of G having cr (G) crossings. In the rest of this
subsection, we will use the Probabilistic Method to prove a lower bound on
the crossing number of G.
   We choose a real number p such that 0 < p ≤ 1. Consider a coin that
comes up heads with probability p and comes up tails with probability 1 − p.
Let Gp be the random subgraph of G that is obtained as follows.
   • For each vertex a of G, flip the coin (independently of the other coin
     flips) and add a as a vertex to Gp if and only if the coin comes up
     heads.
   • Each edge ab of G appears as an edge in Gp if and only if both a and
     b are vertices of Gp .
Recall that we fixed the embedding of G. As a result, this random process
gives us an embedding of Gp (which may not be the best one in terms of the
number of crossings).
   We denote the number of vertices, edges, and crossings in the embedding
of Gp by vp , ep , and xp , respectively. Observe that these three quantities are
random variables.
   It follows from Theorem 7.5.6 that

                            cr (Gp ) − ep + 3vp ≥ 6,

provided that vp ≥ 3. If vp ≤ 2, then ep ≤ 1 and cr (Gp ) = 0, so the inequality
below holds as well. Hence,

                            cr (Gp ) − ep + 3vp ≥ 0,


for any value of vp that results from our random choices.
    Since cr (Gp ) ≤ xp , it follows that
                                  xp − ep + 3vp ≥ 0.
The left-hand side is a random variable, which is always non-negative, no
matter what graph Gp results from our random choices. Therefore, its ex-
pected value is also non-negative, i.e.,
                             E (xp − ep + 3vp ) ≥ 0.
Using the Linearity of Expectation (i.e., Theorem 6.5.2), we get
                         E(xp ) − E(ep ) + 3 · E(vp ) ≥ 0.                   (7.7)
We are now going to compute these three expected values separately.
   The random variable vp is equal to the number of successes in v inde-
pendent trials, each one having success probability p. In other words, vp has
a binomial distribution with parameters v and p, and, therefore, by Theo-
rem 6.7.2,
                                 E(vp ) = pv.
To compute E(ep ), we number the edges of G arbitrarily from 1 to e. For
each i with 1 ≤ i ≤ e, define Xi to be the indicator random variable that is
equal to 1 if the i-th edge is an edge in Gp , and 0 otherwise.
Since an edge of G is in Gp if and only if both its vertices are in Gp , it follows
that
                          E(Xi ) = Pr(Xi = 1) = p2 .
Then, since ep = X1 + X2 + · · · + Xe , the Linearity of Expectation gives

         E(ep ) = E(X1 ) + E(X2 ) + · · · + E(Xe ) = e · p² = p² e.

Finally, we compute the expected value of the random variable xp . Number
the crossings in the embedding of G arbitrarily from 1 to cr (G). For each i
with 1 ≤ i ≤ cr (G), define Yi to be the indicator random variable that is
equal to 1 if the i-th crossing is a crossing in Gp , and 0 otherwise.


Let ab and cd be the edges of G that cross in the i-th crossing². This crossing
appears as a crossing in Gp if and only if both ab and cd are edges in Gp .
Since the points a, b, c, and d are pairwise distinct, it follows that the i-th
crossing of G appears as a crossing in Gp with probability p4 . Thus,

                               E(Yi ) = Pr(Yi = 1) = p4 .
             Pcr (G)
Since xp =     i=1 Yi , it follows that
                                
                        cr (G)       cr (G)          cr (G)
                         X            X               X
          E(xp ) = E          Yi =
                                           E(Yi ) =        p4 = p4 · cr (G).
                         i=1            i=1                i=1


Substituting the three expected values into (7.7), we get

                         p⁴ · cr (G) − p² e + 3pv ≥ 0,

which we rewrite as

                         cr (G) ≥ (p² e − 3pv)/p⁴ .                     (7.8)
Observe that this inequality holds for any real number p with 0 < p ≤ 1.
   If we assume that e ≥ 4v and take p = 4v/e (so that 0 < p ≤ 1), then the
right-hand side of (7.8) becomes e/p² − 3v/p³ = e³/(16v²) − 3e³/(64v²) =
e³/(64v²), and we obtain a new lower bound on the crossing number:

Theorem 7.5.7 (Crossing Lemma) Let G be any graph with v vertices
and e edges. If e ≥ 4v, then
                             cr (G) ≥ e³/(64v²).
    Applying this lower bound to the complete graph Kn gives cr (Kn ) =
Ω(n⁴). This lower bound is much better than the quadratic lower bound in
(7.5) and it matches the upper bound in (7.6). Hence, we have shown that
cr (Kn ) = Θ(n⁴).
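The Θ(n⁴) behavior can be seen numerically. In the sketch below (illustration only), both the Crossing Lemma lower bound and the trivial upper bound (7.6), divided by n⁴, approach constants:

```python
from math import comb

def crossing_lemma_bound(n):
    """Crossing Lemma lower bound e^3/(64 v^2) for K_n (needs e >= 4v)."""
    v, e = n, comb(n, 2)
    assert e >= 4 * v                  # the hypothesis e >= 4v holds for n >= 9
    return e ** 3 / (64 * v ** 2)

for n in (10, 100, 1000):
    lower = crossing_lemma_bound(n)
    upper = comb(comb(n, 2), 2)        # the trivial upper bound (7.6)
    print(n, lower / n ** 4, upper / n ** 4)
# Both ratios tend to constants (1/512 and 1/8, respectively),
# consistent with cr(K_n) = Theta(n^4).
```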

Remark 7.5.8 Let n be a very large integer and consider the complete
graph Kn with v = n vertices and e = n(n − 1)/2 edges. Let us see what
happens if we repeat the proof for this graph. We choose a random subgraph
Gp of
   ² By our definition of embedding, see Section 7.5.1, there are exactly two edges that
determine the i-th crossing.


Kn , where p = 4v/e = 8/(n − 1). The expected number of vertices in Gp
is equal to pn, which is approximately equal to 8. Thus, the random graph
Gp is, in expectation, extremely small. Then we apply the weak lower bound
of Theorem 7.5.6 to this (again, in expectation) extremely small graph. The
result is a proof that in any embedding of the huge graph Kn , there are Ω(n⁴)
crossings!


7.6      Exercises
7.1 Prove that, for any graph G with m edges, the sequence X1 , X2 , . . . , Xm
of random variables in the proof of Theorem 7.1.1 is pairwise independent.
    Give an example of a graph for which this sequence is not mutually inde-
pendent.
7.2 Prove that Theorem 7.5.4 also holds if G is not connected.
7.3 Let K5 be the complete graph on 5 vertices. In this graph, each pair of
vertices is connected by an edge. Prove that K5 is not planar.
7.4 Let G be any embedding of a connected planar graph with v ≥ 4 vertices.
Assume that this embedding has no triangles, i.e., there are no three vertices
a, b, and c, such that ab, bc, and ac are edges of G.
   • Prove that G has at most 2v − 4 edges.
   • Let K3,3 be the complete bipartite graph on 6 vertices. The vertex set
     of this graph consists of two sets A and B, both of size three, and each
     vertex of A is connected by an edge to each vertex of B. Prove that
     K3,3 is not planar.
7.5 Consider the numbers Rn that were defined in Section 4.8. In Sec-
tion 4.8.1, we proved that Rn = O(n8 ). Prove that Rn = O(n4 ).
7.6 Let n be a sufficiently large positive integer and consider the complete
graph Kn . This graph has vertex set V = {1, 2, . . . , n}, and each pair of
distinct vertices is connected by an undirected edge. (Thus, Kn has n(n − 1)/2
edges.)
    Let K⃗n be the directed graph obtained by making each edge {i, j} of Kn
a directed edge; thus, in K⃗n , this edge either occurs as the directed edge (i, j)
from i to j or as the directed edge (j, i) from j to i.
    We say that three pairwise distinct vertices i, j, and k define a directed
triangle in K⃗n , if

   • (i, j), (j, k), and (k, i) are edges in K⃗n , or

   • (i, k), (k, j), and (j, i) are edges in K⃗n .

   Prove that there exists a way to direct the edges of Kn , such that the
number of directed triangles in K⃗n is at least n(n − 1)(n − 2)/24, i.e., one
quarter of the n(n − 1)(n − 2)/6 triples of vertices.

7.7 Let G = (V, E) be a graph with vertex set V and edge set E. A subset I
of V is called an independent set if for any two distinct vertices u and v in I,
(u, v) is not an edge in E. For example, in the following graph, I = {a, e, i}
is an independent set.
   [Figure: a graph on the vertices a, b, c, d, e, f , g, h, and i, in which
I = {a, e, i} is an independent set.]

    Let n and m denote the number of vertices and edges in G, respectively,
and assume that m ≥ n/2. This exercise will lead you through a proof of the
fact that G contains an independent set of size at least n²/(4m).
    Consider the following algorithm, in which all random choices made are
mutually independent:
      Algorithm IndepSet(G):

          Step 1: Set H = G.

          Step 2: Let d = 2m/n. For each vertex v of H, with
          probability 1 − 1/d, delete the vertex v, together with its
          incident edges, from H.

          Step 3: As long as the graph H contains edges, do the
          following: Pick an arbitrary edge (u, v) in H, and remove
          the vertex u, together with its incident edges, from H.

          Step 4: Let I be the vertex set of the graph H. Return I.
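The algorithm can be written down directly. The Python sketch below is an illustration only (the exercise asks for the analysis, not code); it assumes m ≥ n/2, so that 1/d ≤ 1 is a valid probability:

```python
import random

def indep_set(vertices, edges):
    """Algorithm IndepSet: returns an independent set of the graph.

    Assumes len(edges) >= len(vertices) / 2, so that 1/d <= 1.
    """
    # Step 1: H = G.
    n, m = len(vertices), len(edges)
    d = 2 * m / n
    # Step 2: keep each vertex independently with probability 1/d
    # (i.e., delete it with probability 1 - 1/d).
    H_vertices = {v for v in vertices if random.random() < 1 / d}
    H_edges = [(u, w) for (u, w) in edges
               if u in H_vertices and w in H_vertices]
    # Step 3: while H contains an edge, remove one of its endpoints.
    while H_edges:
        u = H_edges[0][0]
        H_vertices.discard(u)
        H_edges = [(a, b) for (a, b) in H_edges if u not in (a, b)]
    # Step 4: the surviving vertices form an independent set.
    return H_vertices
```

For example, on a cycle of length 8 (n = 8, m = 8), every set returned by `indep_set` contains no two adjacent vertices.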


   • Argue that the set I that is returned by this algorithm is an independent
     set in G.
   • Let X and Y be the random variables whose values are the number of
     vertices and edges in the graph H after Step 2, respectively. Prove that

                                 E(X) = n²/(2m)

     and
                                 E(Y ) = n²/(4m).

   • Let Z be the random variable whose value is the size of the independent
     set I that is returned by the algorithm. Argue that

                                   Z ≥ X − Y.

   • Prove that
                                 E(Z) ≥ n²/(4m).

   • Argue that this implies that the graph G contains an independent set
     of size at least n²/(4m).
7.8 Elisa Kazan is having a party at her home. Elisa has a round table
that has 52 seats numbered 0, 1, 2, . . . , 51 in clockwise order. Elisa invites 51
friends, so that the total number of people at the party is 52. Of these 52
people, 15 drink cider, whereas the other 37 drink beer.
    In this exercise, you will prove the following claim: No matter how the 52
people sit at the table, there is always a consecutive group of 7 people such
that at least 3 of them drink cider.
    From now on, we consider an arbitrary (which is not random) arrange-
ment of the 52 people sitting at the table.
   • Let k be a uniformly random element of the set {0, 1, 2, . . . , 51}. Con-
     sider the consecutive group of 7 people that sit in seats k, k + 1, k +
     2, . . . , k + 6; these seat numbers are to be read modulo 52. Define the
     random variable X to be the number of people in this group that drink
     cider. Prove that E(X) > 2.
       Hint: Number the 15 cider drinkers arbitrarily as P1 , P2 , . . . , P15 . For
       each i with 1 ≤ i ≤ 15, consider the indicator random variable Xi that
       is equal to 1 if Pi sits in one of the seats k, k + 1, k + 2, . . . , k + 6,
       and 0 otherwise.


  • For the given arrangement of the 52 people sitting at the table, prove
    that there is a consecutive group of 7 people such that at least 3 of
    them drink cider.
      Hint: Assume the claim is false. What is an upper bound on E(X)?
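Following the hint of the first part, the expected value works out exactly; the quick check below (illustration only, not a substitute for the requested proof) uses the fact that each cider drinker Pi sits in the chosen window of 7 seats with probability 7/52:

```python
from fractions import Fraction

# Each of the 15 cider drinkers occupies one of the 7 chosen seats with
# probability 7/52, since k is uniform over the 52 seats.  By the
# Linearity of Expectation, E(X) is the sum of these 15 probabilities.
E_X = 15 * Fraction(7, 52)
print(E_X)       # → 105/52
print(E_X > 2)   # → True, i.e., E(X) > 2
```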