                                    Trees
          Trees, Binary Search Trees, Heaps & Applications


                         Dr. Chris Bourke
           Department of Computer Science & Engineering
                  University of Nebraska—Lincoln
                     Lincoln, NE 68588, USA
                       cbourke@cse.unl.edu
                  http://cse.unl.edu/~cbourke

                            2015/01/31 21:05:31


                                    Abstract

    These are lecture notes used in CSCE 156 (Computer Science II), CSCE 235 (Dis-
crete Structures) and CSCE 310 (Data Structures & Algorithms) at the University of
Nebraska—Lincoln.




                  This work is licensed under a Creative Commons
                  Attribution-ShareAlike 4.0 International License


                                        1
Contents

I   Trees                                                                                         4

1 Introduction                                                                                     4

2 Definitions & Terminology                                                                        5

3 Tree Traversal                                                                                   7
    3.1   Preorder Traversal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .       7
    3.2   Inorder Traversal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      7
    3.3   Postorder Traversal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      7
    3.4   Breadth-First Search Traversal . . . . . . . . . . . . . . . . . . . . . . . . . .       8
    3.5   Implementations & Data Structures . . . . . . . . . . . . . . . . . . . . . . .          8
          3.5.1   Preorder Implementations . . . . . . . . . . . . . . . . . . . . . . . .         8
          3.5.2   Inorder Implementation . . . . . . . . . . . . . . . . . . . . . . . . .         9
          3.5.3   Postorder Implementation . . . . . . . . . . . . . . . . . . . . . . . .        10
          3.5.4   BFS Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . .        12
          3.5.5   Tree Walk Implementations . . . . . . . . . . . . . . . . . . . . . . .         12
    3.6   Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    12

4 Binary Search Trees                                                                             14
    4.1   Basic Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      15

5 Balanced Binary Search Trees                                                                    17
    5.1   2-3 Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   17
    5.2   AVL Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     17
    5.3   Red-Black Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     19

6 Optimal Binary Search Trees                                                                     19

7 Heaps                                                                                           19
    7.1   Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    20
    7.2   Implementations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .       21
          7.2.1   Finding the first available open spot in a Tree-based Heap . . . . . .          21


                                                  2
  7.3    Heap Sort . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     25

8 Java Collections Framework                                                                     26

9 Applications                                                                                   26
  9.1    Huffman Coding      . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   26
         9.1.1   Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     28

A Stack-based Traversal Simulations                                                              29
  A.1 Preorder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .       29
  A.2 Inorder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      30
  A.3 Postorder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .        31


List of Algorithms
 1      Recursive Preorder Tree Traversal . . . . . . . . . . . . . . . . . . . . . . . .         8
 2      Stack-based Preorder Tree Traversal . . . . . . . . . . . . . . . . . . . . . . .         9
 3      Stack-based Inorder Tree Traversal . . . . . . . . . . . . . . . . . . . . . . . .       10
 4      Stack-based Postorder Tree Traversal . . . . . . . . . . . . . . . . . . . . . . .       11
 5      Queue-based BFS Tree Traversal . . . . . . . . . . . . . . . . . . . . . . . . .         12
 6      Tree Walk based Tree Traversal . . . . . . . . . . . . . . . . . . . . . . . . . .       13
 7      Search algorithm for a binary search tree . . . . . . . . . . . . . . . . . . . . .      16
 8      Find Next Open Spot - Numerical Technique . . . . . . . . . . . . . . . . . .            23
 9      Find Next Open Spot - Walk Technique . . . . . . . . . . . . . . . . . . . . .           24
 10     Heap Sort . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    25
 11     Huffman Coding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .       28


Code Samples

List of Figures
  1      A Binary Tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      6
  2      A Binary Search Tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      15
  3      Binary Search Tree Operations       . . . . . . . . . . . . . . . . . . . . . . . . .   18

                                                3
    4    A min-heap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   19
    5    Huffman Tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   29


Part I
Trees
Lecture Outlines
CSCE 156

    • Basic definitions

    • Tree Traversals

CSCE 235

    • Review of basic definitions

    • Huffman Coding

CSCE 310

    • Heaps, Heap Sort

    • Balanced BSTs: 2-3 Trees, AVL, Red-Black


1       Introduction
Motivation: we want a data structure to store elements that offers efficient, arbitrary retrieval
(search), insertion, and deletion.

    • Array-based Lists

         – O(n) insertion and deletion
         – Fast index-based retrieval
         – Efficient binary search if sorted

    • Linked Lists

         – Efficient, O(1) insert/delete for head/tail

                                               4
         – Inefficient, O(n) arbitrary search/insert/delete
         – Efficient binary search not possible without random access

    • Stacks and queues are efficient, but are restricted access data structures

    • Possible alternative: Trees

    • Trees have the potential to provide O(log n) efficiency for all operations


2     Definitions & Terminology
    • A tree is a connected, acyclic graph

    • For our purposes: a tree is a collection of nodes (that can hold keys, data, etc.) that
      are connected by edges

    • Trees are also oriented : each node has a parent and children

    • A node with no parent is the root of the tree; all child nodes are oriented downward

    • Nodes not immediately connected can have an ancestor, descendant or cousin relation-
      ship

    • A node with no children is a leaf

    • A tree such that all nodes have at most two children is called a binary tree

    • A binary tree is also oriented horizontally: each node may have a left and/or a right
      child

    • Example: see Figure 1

    • A path in a tree is a sequence of nodes connected by edges

    • The length of a path in a tree is the number of edges in the path (which equals the
      number of nodes in the path minus one)

    • A path is simple if it does not traverse nodes more than once (this is the default type
      of path)

    • The depth of a node u is the length of the (unique) path from the root to u

    • The depth of the root is 0

    • The depth of a tree is the maximal depth of any node in the tree (sometimes the term
      height is used)


                                              5
   • All nodes of the same depth are considered to be at the same level

   • A binary tree is complete (also called full or perfect) if all nodes are present at all levels
     0 up to its depth d

   • A sub-tree rooted at a node u is the tree consisting of all descendants with u oriented
     as the root


                                                             a


                                           b                         c


                                  d              e                               f

                              g        h                 i               j           k

                          l       m        n         o           p           q


                              r


                                      Figure 1: A Binary Tree

Properties:

   • In a tree, all nodes are connected by exactly one unique path

   • The maximum number of nodes at any level k is 2^k

   • Thus, the maximum number of nodes n for any binary tree of depth d is:

                       n = 2^0 + 2^1 + 2^2 + · · · + 2^(d−1) + 2^d = Σ_{k=0}^{d} 2^k = 2^(d+1) − 1


   • A full binary tree with n nodes has depth:

                                d = log2 (n + 1) − 1

   • That is, d = O(log n)

Motivation: if we can create a tree-based data structure with operations proportional to its
depth, then we could potentially have a data structure that allows retrieval/search, insertion,
and deletion in O(log n)-time.
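To make the arithmetic concrete, here is a small Java check of the two formulas above (the class and method names are illustrative, not part of the notes):

```java
// Sanity check of the size/depth formulas: maxNodes(d) = 2^(d+1) - 1,
// and for a full binary tree, d = log2(n + 1) - 1.
public class TreeFormulas {
    // Maximum number of nodes in a binary tree of depth d
    static int maxNodes(int d) {
        return (1 << (d + 1)) - 1;
    }

    // Depth of a full binary tree with n nodes (n + 1 must be a power of two);
    // computed with integer bit operations to avoid floating-point log
    static int depthOfFull(int n) {
        return Integer.numberOfTrailingZeros(n + 1) - 1;
    }

    public static void main(String[] args) {
        // A full tree of depth 3 has 2^4 - 1 = 15 nodes
        System.out.println(maxNodes(3));      // 15
        System.out.println(depthOfFull(15));  // 3
    }
}
```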



                                                         6
3     Tree Traversal
    • Given a tree, we need a way to enumerate elements in a tree

    • Many algorithms exist to iterate over the elements in a tree

    • We’ll look at several variations on a depth-first-search


3.1    Preorder Traversal
    • A preorder traversal strategy visits nodes in the following order: root; left-sub-tree;
      right-sub-tree

    • An example traversal on the tree in Figure 1:

                               a, b, d, g, l, m, r, h, n, e, i, o, c, f, j, p, q, k

    • Applications:

         – Building a tree, traversing a tree, copying a tree, etc.
         – Efficient stack-based implementation
         – Used in prefix notation (polish notation); used in languages such as Lisp/Scheme


3.2    Inorder Traversal
    • An inorder traversal strategy visits nodes in the following order: left-sub-tree; root;
      right-sub-tree

    • An example traversal on the tree in Figure 1:

                               l, g, r, m, d, h, n, b, e, o, i, a, c, p, j, q, f, k

    • Applications:

         – Enumerating elements in order in a binary search tree
         – Expression trees


3.3    Postorder Traversal
    • A postorder traversal strategy visits nodes in the following order: left-sub-tree; right-
      sub-tree; root


                                                    7
   • An example traversal on the tree in Figure 1:
                              l, r, m, g, n, h, d, o, i, e, b, p, q, j, k, f, c, a

   • Applications:
         – Topological sorting
         – Destroying a tree when manual memory management is necessary (roots are the
           last thing that get cleaned up)
         – Reverse polish notation (operand-operand-operator, unambiguous, used in old HP
           calculators)
         – PostScript (Page Description Language)


3.4     Breadth-First Search Traversal
   • Breadth-First Search (BFS) traversal is a general graph traversal strategy that explores
     local or close nodes first before traversing “deeper” into the graph
   • When applied to an oriented binary tree, BFS explores the tree level-by-level (top-to-
     bottom, left-to-right)


3.5     Implementations & Data Structures
   • Reference based implementation: TreeNode<T>
         – Owns (through composition) references to: leftChild, rightChild, parent
         – Can use either sentinel nodes or null to indicate missing children and parent
   • BinaryTree<T> owns a root
   • SVN examples: unl.cse.bst
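A minimal sketch of such a reference-based implementation (field and class names here are illustrative; the SVN examples may differ):

```java
// Minimal reference-based binary tree; null indicates a missing child/parent.
public class BinaryTree<T> {
    static class TreeNode<T> {
        T key;
        TreeNode<T> parent, leftChild, rightChild;

        TreeNode(T key) { this.key = key; }
    }

    // The tree owns (a reference to) its root
    TreeNode<T> root;

    public static void main(String[] args) {
        BinaryTree<String> t = new BinaryTree<>();
        t.root = new TreeNode<>("a");
        t.root.leftChild = new TreeNode<>("b");
        t.root.leftChild.parent = t.root;
        System.out.println(t.root.leftChild.key); // b
    }
}
```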

3.5.1    Preorder Implementations

      Input : A binary tree node u
      Output: A preorder traversal of the nodes in the subtree rooted at u
  1   if u = null then
  2      return
  3   end
  4   print u
  5   preOrderTraversal(u → leftChild)
  6   preOrderTraversal(u → rightChild)

                  Algorithm 1: Recursive Preorder Tree Traversal
Stack-based implementation:

                                                   8
   • Initially, we push the tree’s root into the stack

   • Within a loop, we pop the top of the stack and process it

   • We need to push the node’s children for future processing

   • Since a stack is LIFO, we push the right child first.

      Input : A binary tree, T
      Output: A preorder traversal of the nodes in T
  1   S ← empty stack
  2   push T ’s root onto S
  3   while S is not empty do
  4      node ← S.pop
   5      push node’s right-child (if it exists) onto S
   6      push node’s left-child (if it exists) onto S
  7      process node
  8   end

                 Algorithm 2: Stack-based Preorder Tree Traversal
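A Java sketch of Algorithm 2 (class and method names are my own, not from the SVN examples):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Stack-based preorder traversal: pop a node, process it, then push the
// right child before the left so the left subtree is visited first (LIFO).
public class PreorderTraversal {
    static class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    static List<Integer> preorder(Node root) {
        List<Integer> result = new ArrayList<>();
        if (root == null) return result;
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node node = stack.pop();
            result.add(node.key);                            // process node
            if (node.right != null) stack.push(node.right);  // right first
            if (node.left != null) stack.push(node.left);
        }
        return result;
    }

    public static void main(String[] args) {
        //      1
        //     / \
        //    2   3
        //   / \
        //  4   5
        Node root = new Node(1);
        root.left = new Node(2);
        root.right = new Node(3);
        root.left.left = new Node(4);
        root.left.right = new Node(5);
        System.out.println(preorder(root)); // [1, 2, 4, 5, 3]
    }
}
```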

3.5.2    Inorder Implementation

Stack-based implementation:

   • The same basic idea: push nodes onto the stack as you visit them

   • However, we want to delay processing the node until we’ve explored the left-sub-tree

   • We need a way to tell if we are visiting the node for the first time or returning from
     the left-tree exploration

   • To achieve this, we allow the node to be null

   • If null, then we are returning from a left-tree exploration, pop the top of the stack and
     process (then push the right child)

   • If not null, then we push it to the stack for later processing, explore the left child




                                              9
      Input : A binary tree, T
      Output: An inorder traversal of the nodes in T
  1   S ← empty stack
  2   u ← root
  3   while S is not empty Or u ≠ null do
  4      if u ≠ null then
  5          push u onto S
  6          u ← u.leftChild
  7      else
  8         u ← S.pop
  9         process u
 10         u ← u.rightChild
 11      end
 12   end

                 Algorithm 3: Stack-based Inorder Tree Traversal
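The same idea in Java (illustrative names, not the notes' own code):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Stack-based inorder traversal: push nodes while walking left; when the
// current node is null, pop, process, and continue with the right child.
public class InorderTraversal {
    static class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    static List<Integer> inorder(Node root) {
        List<Integer> result = new ArrayList<>();
        Deque<Node> stack = new ArrayDeque<>();
        Node u = root;
        while (!stack.isEmpty() || u != null) {
            if (u != null) {
                stack.push(u);     // delay processing until the left subtree is done
                u = u.left;
            } else {
                u = stack.pop();
                result.add(u.key); // process
                u = u.right;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Node root = new Node(1);
        root.left = new Node(2);
        root.right = new Node(3);
        root.left.left = new Node(4);
        root.left.right = new Node(5);
        System.out.println(inorder(root)); // [4, 2, 5, 1, 3]
    }
}
```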

3.5.3    Postorder Implementation

Stack-based implementation:

   • Same basic ideas, except that we need to distinguish if we’re visiting the node for the
     first time, second time or last (so that we can process it)

   • To achieve this, we keep track of where we came from: a parent, left, or right node

   • We keep track of a previous and a current node




                                            10
     Input : A binary tree, T
     Output: A postorder traversal of the nodes in T
 1   S ← empty stack
 2   prev ← null
 3   push root onto S
 4   while S is not empty do
 5      curr ← S.peek
 6      if prev = null Or prev.leftChild = curr Or prev.rightChild = curr then
  7          if curr.leftChild ≠ null then
  8              push curr.leftChild onto S
  9          else if curr.rightChild ≠ null then
 10              push curr.rightChild onto S
 11          end
 12      else if curr.leftChild = prev then
 13         if curr.rightChild ≠ null then
14             push curr.rightChild onto S
15         end
16      else
17         process curr
18         S.pop
19      end
20      prev ← curr
21   end

               Algorithm 4: Stack-based Postorder Tree Traversal
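A Java sketch of this prev/curr bookkeeping (names are illustrative):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Stack-based postorder traversal: peek at the top of the stack and compare
// it with the previously visited node to decide whether we are descending,
// returning from the left subtree, or returning from the right subtree.
public class PostorderTraversal {
    static class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    static List<Integer> postorder(Node root) {
        List<Integer> result = new ArrayList<>();
        if (root == null) return result;
        Deque<Node> stack = new ArrayDeque<>();
        Node prev = null;
        stack.push(root);
        while (!stack.isEmpty()) {
            Node curr = stack.peek();
            if (prev == null || prev.left == curr || prev.right == curr) {
                // first visit (descending): push one child if any exist
                if (curr.left != null) stack.push(curr.left);
                else if (curr.right != null) stack.push(curr.right);
            } else if (curr.left == prev) {
                // returning from the left subtree: explore the right one
                if (curr.right != null) stack.push(curr.right);
            } else {
                // last visit: process and pop
                result.add(curr.key);
                stack.pop();
            }
            prev = curr;
        }
        return result;
    }

    public static void main(String[] args) {
        Node root = new Node(1);
        root.left = new Node(2);
        root.right = new Node(3);
        root.left.left = new Node(4);
        root.left.right = new Node(5);
        System.out.println(postorder(root)); // [4, 5, 2, 3, 1]
    }
}
```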




                                         11
3.5.4    BFS Implementation

      Input : A binary tree, T
      Output: A BFS traversal of the nodes in T
  1   Q ← empty queue
  2   enqueue T ’s root into Q
  3   while Q is not empty do
  4      node ← Q.dequeue
   5      enqueue node’s left-child (if it exists) into Q
   6      enqueue node’s right-child (if it exists) into Q
  7      print node
  8   end

                    Algorithm 5: Queue-based BFS Tree Traversal
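In Java (illustrative sketch; `ArrayDeque` rejects null elements, so missing children are skipped explicitly):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Queue-based BFS traversal: dequeue a node, process it, and enqueue its
// children; the FIFO queue yields a level-by-level, left-to-right order.
public class BfsTraversal {
    static class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    static List<Integer> bfs(Node root) {
        List<Integer> result = new ArrayList<>();
        if (root == null) return result;
        Queue<Node> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            Node node = queue.remove();
            result.add(node.key); // process
            if (node.left != null) queue.add(node.left);
            if (node.right != null) queue.add(node.right);
        }
        return result;
    }

    public static void main(String[] args) {
        Node root = new Node(1);
        root.left = new Node(2);
        root.right = new Node(3);
        root.left.left = new Node(4);
        root.left.right = new Node(5);
        System.out.println(bfs(root)); // [1, 2, 3, 4, 5]
    }
}
```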

3.5.5    Tree Walk Implementations

   • Simple rules based on local information: where you are and where you came from

   • No additional data structures required

   • Traversal is a “walk” around the perimeter of the tree

   • Can use similar rules to determine when the current node should be processed to
     achieve pre, in, and postorder traversals

   • Need to take care with corner cases (when current node is the root or children are
     missing)

   • Pseudocode presented Algorithm 6



3.6     Operations
Basic Operations:

   • Search for a particular element/key

   • Adding an element

         – Add at the shallowest available spot
         – Add at a random leaf

                                              12
     Input : A binary tree, T
     Output: A Tree Walk around T
  1   curr ← root
  2   prevType ← parent
  3   while curr ≠ null do
  4      if prevType = parent then
             //preorder: process curr here
  5          if curr.leftChild exists then
                 //Go to the left child:
  6              curr ← curr.leftChild
  7              prevType ← parent
  8         else
  9            curr ← curr
 10            prevType ← left
 11         end
 12      else if prevType = left then
            //inorder: process curr here
 13         if curr.rightChild exists then
                //Go to the right child:
 14             curr ← curr.rightChild
 15             prevType ← parent
 16         else
 17            curr ← curr
 18            prevType ← right
 19         end
 20      else if prevType = right then
            //postorder: process curr here
 21         if curr.parent = null then
                //root has no parent, we’re done traversing
 22             curr ← curr.parent
            //are we at the parent’s left or right child?
 23         else if curr = curr.parent.leftChild then
 24            curr ← curr.parent
 25            prevType ← left
 26         else
 27            curr ← curr.parent
 28            prevType ← right
 29         end
 30      end
 31   end

                 Algorithm 6: Tree Walk based Tree Traversal
        – Add internally, shift nodes down by some criteria

    • Removing elements

        – Removing leaves
        – Removing elements with one child
        – Removing elements with two children

Other Operations:

    • Compute the total number of nodes in a tree

    • Compute the total number of leaves in a tree

    • Given an item or node, compute its depth

    • Compute the depth of a tree
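Each of these "other operations" is a short recursion over subtrees; a sketch (class and method names are illustrative):

```java
// Recursive computations over a binary tree: total nodes, total leaves,
// and depth (a single-node tree has depth 0, an empty tree depth -1).
public class TreeStats {
    static class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // Total number of nodes in the subtree rooted at u
    static int countNodes(Node u) {
        return u == null ? 0 : 1 + countNodes(u.left) + countNodes(u.right);
    }

    // Total number of leaves in the subtree rooted at u
    static int countLeaves(Node u) {
        if (u == null) return 0;
        if (u.left == null && u.right == null) return 1;
        return countLeaves(u.left) + countLeaves(u.right);
    }

    // Depth of the subtree rooted at u
    static int depth(Node u) {
        return u == null ? -1 : 1 + Math.max(depth(u.left), depth(u.right));
    }

    public static void main(String[] args) {
        Node root = new Node(1);
        root.left = new Node(2);
        root.right = new Node(3);
        root.left.left = new Node(4);
        root.left.right = new Node(5);
        // 5 nodes, 3 leaves (4, 5, 3), depth 2
        System.out.println(countNodes(root) + " " + countLeaves(root) + " " + depth(root));
    }
}
```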


4     Binary Search Trees
Regular binary search trees have little structure to their elements; search, insert, delete
operations are still linear with respect to the number of tree nodes, O(n). We want a data
structure with operations proportional to its depth, O(d). To this end, we add structure and
order to tree nodes.

    • Each node has an associated key

    • Binary Search Tree Property: For every node u with key uk in T

        1. Every node in the left-sub-tree of u has keys less than uk
        2. Every node in the right-sub-tree of u has keys greater than uk

    • Duplicate keys can be handled, but you must be consistent about which sub-tree they
      are placed in; duplicates are not guaranteed to be contiguous

    • Alternatively: do not allow duplicate keys or define a key scheme that ensures a total
      order

    • Inductive property: all sub-trees are also binary search trees

    • A full example can be found in Figure 2




                                              14
                                                               50

                                              40                     60

                                    20              45                          90

                              10         24               49              80         95

                          2        15          28    48             75         85

                              12


                                   Figure 2: A Binary Search Tree

4.1    Basic Operations
Observation: a binary search tree has more structure: the key in each node provides infor-
mation on where a node is not located. We will exploit this structure to achieve O(log n)
operations.
Search/retrieve

   • Goal: find a node (and its data) that matches a given key k

   • Start at the root node

   • At each node u, compare k to u’s key, uk :

        – If equal, element found, stop and return
        – If k < uk , traverse to u’s left-child
        – If k > uk , traverse to u’s right-child

   • Traverse until the sub-tree is empty (element not found)

   • Analysis: number of comparisons is bounded by the depth of the tree, O(d)




                                                          15
      Input : A binary search tree, T , a key k
      Output: The tree node u ∈ T whose key, uk matches k
  1   u ← T ’s root
   2   while u ≠ φ do
  3      if uk = k then
  4          output u
  5      end
  6      else if uk > k then
  7          u ← u’s left-child
  8      else if uk < k then
   9         u ← u’s right-child
 10   end
 11   output φ

                 Algorithm 7: Search algorithm for a binary search tree
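The search loop translates directly to Java (illustrative names; the sample keys below are a fragment of Figure 2):

```java
// BST search: at each node, the key tells us which subtree the target
// cannot be in; the number of comparisons is bounded by the depth, O(d).
public class BstSearch {
    static class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    static Node search(Node root, int k) {
        Node u = root;
        while (u != null) {
            if (k == u.key) return u;           // found
            u = (k < u.key) ? u.left : u.right; // descend toward k
        }
        return null; // sub-tree became empty: not found
    }

    public static void main(String[] args) {
        // A few nodes from Figure 2
        Node root = new Node(50);
        root.left = new Node(40);
        root.right = new Node(60);
        root.left.left = new Node(20);
        root.left.right = new Node(45);
        System.out.println(search(root, 45).key); // 45
        System.out.println(search(root, 99));     // null
    }
}
```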
Insert

   • Insert new nodes as leaves

   • To determine where it should be inserted: traverse the tree as above

   • Insert at the first available spot (first missing child node)

   • Analysis: finding the available location is O(d), inserting is just reference juggling,
     O(1)
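A Java sketch of leaf insertion (illustrative names; duplicates are sent right here for consistency, one of the conventions mentioned earlier):

```java
// BST insert: traverse as in search and attach the new key as a leaf at
// the first missing child. Finding the spot is O(d); attaching is O(1).
public class BstInsert {
    static class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    static Node insert(Node root, int k) {
        Node leaf = new Node(k);
        if (root == null) return leaf; // empty tree: new node is the root
        Node u = root;
        while (true) {
            if (k < u.key) {
                if (u.left == null) { u.left = leaf; break; }
                u = u.left;
            } else { // duplicates consistently go right
                if (u.right == null) { u.right = leaf; break; }
                u = u.right;
            }
        }
        return root;
    }

    public static void main(String[] args) {
        Node root = null;
        for (int k : new int[]{50, 40, 60, 45}) root = insert(root, k);
        System.out.println(root.left.right.key); // 45
    }
}
```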

Delete

   • Need to first find the node u to delete, traverse as above, O(d)

   • If u is a leaf (no children): it’s safe to simply delete it

   • If u has one child, then we can “promote” it to u’s spot (u’s parent will now point to
     u’s child)

   • If u has two children, we need to find a way to preserve the BST property

         – Want to minimally change the tree’s structure
         – Need the operation to be efficient
         – Find the minimum element of the greater nodes (right sub-tree) or the maximal
           element of the lesser nodes (left sub-tree)
         – Such an element will have at most one child (which we know how to delete)

                                                16
        – Delete it and store off the key/data
        – Replace u’s key/data with the contents of the minimum/maximum element
    • Analysis:
        – Search/Find: O(d)
        – Finding the min/max: O(d)
        – Swapping: O(1)
        – In total: O(d)
    • Examples illustrated in Figure 3



5     Balanced Binary Search Trees
Motivation:

    • Ideally, a full binary tree has depth d = O(log n)
    • General BSTs can degenerate such that d = O(n)
    • Need a way to preserve the BST property while at the same time limiting the depth
      to O(log n)
    • Solution: Balanced Binary Search Trees
    • BSTs that efficiently reorganize themselves such that the BST Property is preserved
      and that the depth is restricted to O(log n)
    • Some types of Balanced BSTs:
        – B-trees (and 2-3 trees)
        – AVL Trees
        – Red-Black Trees
        – Splay trees


5.1    2-3 Trees
See 310 Note Set.


5.2    AVL Trees
See 310 Note Set.

                                             17
                                      15                                                         15

                          7                          30                              7                             30

              4                   9         20             40                4               9            20             40

      1               5                17        23                  1           5                   17        23

                                                                                                 16
                      (a) A Binary Search Tree                   (b) Insertion of a new node (16) into a Binary
                                                                 Search Tree

                                      15                                                             9

                          7                          30                              7                             30

              4                   9         20             40                4                            20             40

      1               5                17        23                  1           5                   17        23
    (c) Deletion of a node with two children (15). (d) Node 15 is replaced with the extremal node,
    First step: find the maximum node in the left- preserving the BST property
    sub-tree (lesser elements).


                                       9                                                         9

                              7                       30                         4                             30

                  4                             20          40           1               5               20             40

          1               5                17        23                                          17           23
        (e) Deletion a node with only one child (7).                 (f) Removal is achieved by simply promoting
                                                                     the single child/subtree.

Figure 3: Binary Search Tree Operations. Figure 3(b) depicts the insertion of (16) into
the tree in Figure 3(a). Figures 3(c) and 3(d) depict the deletion of a node (15) with two
children. Figures 3(e) and 3(f) depict the deletion of a node with only one child (7).


                                                                18
5.3     Red-Black Trees
See 310 Note Set.


6      Optimal Binary Search Trees
See 310 Note Set.


7      Heaps
Definition 1. A heap is a binary tree that satisfies the following properties.

    1. It is a full or complete binary tree: all nodes are present except possibly the last row

    2. If the last row is not full, all of its nodes are as far to the left as possible

    3. It satisfies the Heap Property: every node has a key that is greater than the keys of
       both of its children (max-heap)

    • As a consequence of the Heap Property, the maximal element is always at the root

    • Alternatively, we can define a min-heap

    • Variations: 2-3 heaps, Fibonacci heaps, etc.

    • A min-heap example can be found in Figure 4


                                                               1

                                            5                                     30

                                  50                 55                 40                  80

                            60         53       72        70       65        45        90        95

                       92        63




                                            Figure 4: A min-heap

Applications


                                                          19
   • Heaps are an optimal implementation of a priority queue
   • Used extensively in other algorithms (Heap Sort, Prim’s, Dijkstra’s, Huffman Coding,
     etc.) to ensure efficient operation


7.1      Operations
Insert

   • Want to preserve the full-ness property and the Heap Property
   • Preserve full-ness: add new element at the end of the heap (last row, first free spot on
     the left)
   • Insertion at the end may violate the Heap Property
   • Heapify/fix: bubble up inserted element until Heap Property is satisfied
   • Analysis: insert is O(1); heapify: O(d)

Remove from top

   • Want to preserve the full-ness property and the Heap Property
   • Preserve full-ness: swap root element with the last element in the heap (lowest row,
     right-most element)
   • Heap property may be violated
   • Heapify/fix: bubble new root element down until Heap Property is satisfied
   • Analysis: Swap is O(1); heapify: O(d)

Others

   • Arbitrary remove
   • Find
   • Possible, but not ideal: Heaps are restricted-access data structures

Analysis

   • All operations are O(d)
   • Since Heaps are full, d = O(log n)
   • Thus, all operations are O(log n)

                                               20
7.2     Implementations
Array-Based

   • Root is located at index 1

   • If a node u is at index i, u’s left-child is at index 2i and its right-child is at index 2i + 1

   • If node u is at index j, its parent is at index ⌊j/2⌋

   • Alternatively, with a 0-indexed array: the left/right children of the node at index j
     are at indices 2j + 1 and 2j + 2, and its parent is at index ⌊(j − 1)/2⌋

   • Advantage: easy implementation, all items are contiguous in the array (in fact, a BFS
     ordering!)

   • Disadvantage: Insert operation may force a reallocation, but this can be done in
     amortized-constant time (though may still have wasted space)
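A minimal Java sketch of the array-based max-heap described above (1-indexed, class and method names are my own):

```java
import java.util.Arrays;

// Array-based max-heap: root at index 1, children of i at 2i and 2i + 1,
// parent of j at j / 2. insert bubbles the new key up; removeTop moves the
// last element to the root and bubbles it down.
public class ArrayHeap {
    private int[] a = new int[8];
    private int n = 0; // number of elements; a[1..n] hold the heap

    void insert(int key) {
        if (n + 1 == a.length) a = Arrays.copyOf(a, a.length * 2); // amortized O(1)
        a[++n] = key;
        // bubble up until the Heap Property holds
        for (int i = n; i > 1 && a[i] > a[i / 2]; i /= 2) swap(i, i / 2);
    }

    int removeTop() {
        int top = a[1];
        a[1] = a[n--]; // move the last element into the root
        // bubble down: swap with the larger child while the property is violated
        int i = 1;
        while (2 * i <= n) {
            int child = 2 * i;
            if (child + 1 <= n && a[child + 1] > a[child]) child++;
            if (a[i] >= a[child]) break;
            swap(i, child);
            i = child;
        }
        return top;
    }

    private void swap(int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }

    public static void main(String[] args) {
        ArrayHeap h = new ArrayHeap();
        for (int k : new int[]{5, 1, 30, 80, 40}) h.insert(k);
        // removing repeatedly yields keys in descending order
        System.out.println(h.removeTop() + " " + h.removeTop() + " " + h.removeTop()); // 80 40 30
    }
}
```

Note that the array is, as observed above, exactly a BFS ordering of the heap's nodes.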

Tree-Based

   • Reference-based tree (nodes which contain references to children/parent)

   • Parent reference is now required for efficiency

   • For efficiency, we need to keep track of the last element in the tree

   • For deletes/inserts: we need a way to find the last element and first “open spot”

   • We’ll focus on finding the first available open spot as the same technique can be used
     to find the last element with minor modifications

7.2.1   Finding the first available open spot in a Tree-based Heap

Technique A: numerical technique

   • WLOG: assume we keep track of the number of nodes in the heap, n, and thus the
     depth d = ⌊log n⌋

   • If n = 2^(d+1) − 1 then the tree is full, the last element is all the way to the right, the
     first available spot is all the way to the left

   • Otherwise n < 2^(d+1) − 1 and the heap is not full (the first available spot is located at
     level d, root is at level 0)

   • Starting at the root, we want to know if the last element is in the left-subtree or the
     right subtree


                                                21
• Let m = n − (2^d − 1), the number of nodes present in level d

• If m ≥ 2^d/2 then the left-sub tree is full at the last level and so the next open spot would
  be in the right-sub tree

• Otherwise if m < 2^d/2 then the left-sub tree is not full at the last level and so the next
  open spot is in the left-sub tree

• Traverse down to the left or right respectively and repeat: the resulting sub-tree will
  have depth d − 1 with m = m (if traversing left) or m = m − 2^d/2 (if traversing right)

• Repeat until we’ve found the first available spot

• Analysis: in any case, it’s O(d) = O(log n) to traverse from the root to the first open
  spot




                                           22
      Input : A tree-based heap H with n nodes
      Output: The node whose child is the next available open spot in the heap
   1   curr ← H’s root
   2   d ← ⌊log n⌋
   3   m←n
   4   while curr has both children do
   5      if m = 2^(d+1) − 1 then
             //remaining tree is full, traverse all the way left
   6         while curr has both children do
   7             curr ← curr.leftChild
   8         end
   9      else
             //remaining tree is not full, determine if the next open spot is
               in the left or right sub-tree
  10         if m ≥ 2^d/2 then
                //left sub-tree is full
  11            m ← (m − 2^d/2)
  12            d ← (d − 1)
  13            curr ← curr.rightChild
  14         else
                //left sub-tree is not full
  15            d ← (d − 1)
  16            m←m
  17            curr ← curr.leftChild
  18         end
  19      end
  20   end
  21   output curr

             Algorithm 8: Find Next Open Spot - Numerical Technique
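The numerical technique can be sketched in Java as follows. The Node class here is a hypothetical minimal definition (a real heap node would also hold a key), and bit shifts are used for exact powers of two; as in the bullets above, m tracks the number of nodes in the last level of the current subtree.

```java
// Minimal node definition (hypothetical; a real heap node would also hold a key).
class Node {
    Node left, right;
}

public class NextOpenSpot {
    /**
     * Numerical technique (Algorithm 8): returns the node whose missing
     * child is the next available open spot in a complete tree of n nodes.
     */
    static Node findOpenSpotParent(Node root, int n) {
        Node curr = root;
        int d = 31 - Integer.numberOfLeadingZeros(n); // floor(log2 n), exact
        int m = n - ((1 << d) - 1);                   // nodes present in level d
        while (curr.left != null && curr.right != null) {
            if (m == (1 << d)) {
                // remaining tree is full: traverse all the way left
                while (curr.left != null && curr.right != null) {
                    curr = curr.left;
                }
            } else if (m >= (1 << d) / 2) {
                // left subtree is full at the last level: go right
                m -= (1 << d) / 2;
                d--;
                curr = curr.right;
            } else {
                // left subtree is not full at the last level: go left
                d--;
                curr = curr.left;
            }
        }
        return curr;
    }
}
```

For example, in a heap of n = 10 nodes (numbered level by level, 1 through 10), the next open spot is position 11, so the method returns node 5.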
Technique B: Walk Technique

   • Alternatively, we can adapt the idea behind the tree walk algorithm to find the next
     available open spot

   • We’ll assume that we’ve kept track of the last node

   • If the tree is full, we simply traverse all the way to the left and insert, O(d)


                                              23
 • If the last node is a left-child then its parent’s right child is the next available spot,
   finding it is O(1)

 • Otherwise, we’ll need to traverse around the perimeter of the tree until we reach the
   next open slot

      Input : A tree-based heap H with n nodes
      Output: The node whose (missing) child is the next available open spot in the heap
  1   d ← ⌊log n⌋
  2   if n = 2^(d+1) − 1 then
          //The tree is full, traverse all the way to the left
  3       curr ← root
  4       while curr.leftChild ≠ null do
  5          curr ← curr.leftChild
  6       end
  7   else if last is a left-child then
         //parent’s right child is open
  8      curr ← last.parent
  9   else
         //The open spot lies in a subtree to the right of the last node
         //Walk the tree until we reach it
 10      curr ← last.parent
 11      while curr is a right-child do
 12         curr ← curr.parent
 13      end
         //"turn" right
 14      curr ← curr.parent
 15      curr ← curr.rightChild
         //traverse all the way left
 16      while curr.leftChild ≠ null do
 17         curr ← curr.leftChild
 18      end
 19   end
      //current node’s missing child is the open spot
 20   output curr

              Algorithm 9: Find Next Open Spot - Walk Technique
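The walk technique can be sketched in Java as well; the PNode class with a parent pointer and the tracked last node are assumptions of this sketch.

```java
// Node with a parent pointer (hypothetical minimal definition for this sketch).
class PNode {
    PNode parent, left, right;
}

public class WalkOpenSpot {
    /**
     * Walk technique (Algorithm 9): given the root, the last node, and the
     * size n, returns the node whose missing child is the next open spot.
     */
    static PNode findOpenSpotParent(PNode root, PNode last, int n) {
        int d = 31 - Integer.numberOfLeadingZeros(n); // floor(log2 n)
        PNode curr;
        if (n == (1 << (d + 1)) - 1) {
            // the tree is full: traverse all the way to the left
            curr = root;
            while (curr.left != null) {
                curr = curr.left;
            }
        } else if (last.parent.left == last) {
            // last is a left child: its parent's right child is open
            curr = last.parent;
        } else {
            // walk up the perimeter while curr is a right child
            curr = last.parent;
            while (curr.parent != null && curr.parent.right == curr) {
                curr = curr.parent;
            }
            // "turn" right, then traverse all the way left
            curr = curr.parent;
            curr = curr.right;
            while (curr.left != null) {
                curr = curr.left;
            }
        }
        return curr;
    }
}
```

For instance, with n = 11 the last node is node 11 (a right child), and the walk up, turn, and descent ends at node 6, the parent of open spot 12.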



7.3     Heap Sort
   • Since the min/max element is always at the top, simply insert all elements, then remove
     them all!
   • A perfect illustration of “Smart data structures and dumb code are a lot better than the
     other way around”

      Input : A collection of elements A = {a1 , . . . , an }
      Output: A collection A′ of the elements of A, in sorted order
  1   H ← empty heap
  2   A′ ← empty collection
  3   foreach x ∈ A do
  4      insert x into H
  5   end
  6   while H is not empty do
  7      y ← remove top from H
  8      add y to the end of A′
  9   end
 10   output A′

                               Algorithm 10: Heap Sort
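Since java.util.PriorityQueue (covered in Section 8) is a binary min-heap, Algorithm 10 can be sketched in a few lines of Java:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class HeapSortDemo {
    /** Heap sort (Algorithm 10): insert every element, then remove them all. */
    static List<Integer> heapSort(List<Integer> a) {
        PriorityQueue<Integer> heap = new PriorityQueue<>(); // min-heap
        for (int x : a) {
            heap.offer(x);               // insert phase: O(n log n) total
        }
        List<Integer> sorted = new ArrayList<>();
        while (!heap.isEmpty()) {
            sorted.add(heap.poll());     // remove phase: each poll is O(log n)
        }
        return sorted;
    }
}
```

For example, `heapSort(Arrays.asList(5, 1, 4, 2, 3))` yields `[1, 2, 3, 4, 5]`.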
Analysis

   • Amortized analysis: the insert/remove operations are not constant cost throughout the
     algorithm
   • On the first iteration the heap is empty, so insert is O(1); on the i-th iteration the
     heap has depth O(log i); only on the last iteration is insertion O(log n)
   • In total, the insert phase is:
                                         n
                                         Σ log i = O(n log n)
                                        i=1

   • A similar lower bound can be shown: half of the terms are at least log(n/2), so the sum
     is at least (n/2) log(n/2) = Ω(n log n)
   • The same analysis applies to the remove phase:
                                         n
                                         Σ log i = O(n log n)
                                        i=1

   • In total, O(n log n)

8     Java Collections Framework
Java provides several data structures that are backed by underlying tree structures.

    • java.util.PriorityQueue<E> is a binary-heap based priority queue
         – Priority (keys) based on either natural ordering or a provided Comparator
         – Guaranteed O(log n) time for insert (offer) and get top (poll)
         – Supports O(n) arbitrary remove(Object) and search (contains) methods
    • java.util.TreeSet<E>
         – Implements the SortedSet interface; makes use of a Comparator
         – Backed by TreeMap, a red-black tree (balanced binary search tree) implementation
         – Guaranteed O(log n) time for add, remove, contains operations
         – Default iterator is an in-order traversal
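A brief usage demonstration of both structures:

```java
import java.util.Comparator;
import java.util.PriorityQueue;
import java.util.TreeSet;

public class CollectionsDemo {
    public static void main(String[] args) {
        // PriorityQueue is a min-heap under natural ordering
        PriorityQueue<Integer> pq = new PriorityQueue<>();
        pq.offer(30);
        pq.offer(10);
        pq.offer(20);
        System.out.println(pq.poll()); // 10: the minimum is at the top

        // A Comparator turns it into a max-heap
        PriorityQueue<Integer> maxPq = new PriorityQueue<>(Comparator.reverseOrder());
        maxPq.offer(30);
        maxPq.offer(10);
        maxPq.offer(20);
        System.out.println(maxPq.poll()); // 30

        // TreeSet: sorted, O(log n) operations, in-order iteration
        TreeSet<String> set = new TreeSet<>();
        set.add("banana");
        set.add("apple");
        set.add("cherry");
        System.out.println(set); // [apple, banana, cherry]
    }
}
```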


9     Applications
9.1      Huffman Coding
Overview

    • Coding Theory is the study of codes: schemes for transmitting data
    • Coding theory involves efficiently padding out data with redundant information to
      increase reliability (to detect or even correct errors) over a noisy channel
    • Coding theory also involves compressing data to save space
         – MP3 (uses a form of Huffman coding, but is lossy)
         – JPEG, MPEG, even DVDs
         – pack (straight Huffman coding)
         – zip, gzip (use a combination of Lempel-Ziv and Huffman compression)

Basics

    • Let Σ be a fixed alphabet of size n
    • A coding is a mapping of this alphabet to a collection of binary codewords,

                                            Σ → {0, 1}∗

   • A block encoding is a fixed-length encoding scheme where all codewords have the same
     length (example: ASCII); requires codewords of length ⌈log₂ n⌉

   • Not all symbols occur with the same frequency; alternative: variable-length encodings

   • Intuitively: assign shorter codewords to more frequent symbols, longer to less frequent
     symbols

   • Reduction in the overall average codeword length

   • Variable length encodings must be unambiguous

   • Solution: use a prefix-free code, one in which no whole codeword is the prefix of
     another (other than itself, of course)

   • Examples:

        – {0, 01, 101, 010} is not a prefix free code.
        – {10, 010, 110, 0110} is a prefix free code.

   • A simple way of building a prefix free code is to associate codewords with the leaves
     of a binary tree (not necessarily full).

   • Each edge corresponds to a bit: 0 if it goes to a left child and 1 if it goes to a right child.

   • Since no simple path from the root to a leaf can continue on to another leaf, we are
     guaranteed a prefix-free coding.

   • Using this idea along with a greedy encoding forms the basis of Huffman Coding
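The prefix-free condition is straightforward to check directly; a short sketch (the method name is ours):

```java
import java.util.List;

public class PrefixFreeCheck {
    /** Returns true if no codeword is a proper prefix of another codeword. */
    static boolean isPrefixFree(List<String> code) {
        for (int i = 0; i < code.size(); i++) {
            for (int j = 0; j < code.size(); j++) {
                if (i != j && code.get(j).startsWith(code.get(i))) {
                    return false; // code[i] is a prefix of code[j]
                }
            }
        }
        return true;
    }
}
```

On the two example sets above: `{0, 01, 101, 010}` fails (0 is a prefix of 01 and of 010), while `{10, 010, 110, 0110}` passes.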

Steps

   • Consider a precomputed relative frequency function:

                                           freq : Σ → [0, 1]

   • Build a collection of weighted trees Tx for each symbol x ∈ Σ with wt(Tx) = freq(x)

   • Combine the two least weighted trees via a new node; associate a new weight (the sum
     of the weights of the two subtrees)

   • Keep combining until only one tree remains

   • The tree constructed in Huffman’s algorithm is known as a Huffman Tree and it defines
     a Huffman Code



      Input : An alphabet of symbols, Σ, with relative frequencies, freq(x)
      Output: A Huffman Tree
  1   H ← new min-heap
  2   foreach x ∈ Σ do
  3      Tx ← single node tree
  4      wt(Tx) ← freq(x)
  5      insert Tx into H
  6   end
  7   while size of H > 1 do
  8      Tr ← new tree root node
  9      Ta ← H.getMin
 10      Tb ← H.getMin
 11      Tr.leftChild ← Ta
 12      Tr.rightChild ← Tb
 13      wt(Tr) ← wt(Ta) + wt(Tb)
 14      insert Tr into H
 15   end
 16   output H.getMin

                           Algorithm 11: Huffman Coding
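Algorithm 11 can be sketched in Java using java.util.PriorityQueue as the min-heap; the Tree class here is a hypothetical minimal definition.

```java
import java.util.Map;
import java.util.PriorityQueue;

public class Huffman {
    // Hypothetical tree node: leaves hold a symbol, internal nodes only a weight.
    static class Tree {
        double wt;
        Character symbol; // null for internal nodes
        Tree left, right;

        Tree(double wt, Character symbol) {
            this.wt = wt;
            this.symbol = symbol;
        }
    }

    /** Builds a Huffman tree (Algorithm 11) from relative frequencies. */
    static Tree build(Map<Character, Double> freq) {
        // min-heap ordered by tree weight
        PriorityQueue<Tree> heap =
                new PriorityQueue<>((s, t) -> Double.compare(s.wt, t.wt));
        for (Map.Entry<Character, Double> e : freq.entrySet()) {
            heap.offer(new Tree(e.getValue(), e.getKey()));
        }
        while (heap.size() > 1) {
            Tree a = heap.poll(); // two least-weighted trees
            Tree b = heap.poll();
            Tree r = new Tree(a.wt + b.wt, null);
            r.left = a;
            r.right = b;
            heap.offer(r);
        }
        return heap.poll();
    }
}
```

The resulting root always has weight 1.0 (the frequencies sum to 1), and each symbol's codeword length is its leaf's depth in the tree.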

9.1.1    Example

Construct the Huffman Tree and Huffman Code for a file with the following content.

                      character     A      B      C      D      E      F      G
                      frequency   0.10   0.15   0.34   0.05   0.12   0.21   0.03


   • Average codeword length:

                 .10 · 3 + .15 · 3 + .34 · 2 + .05 · 4 + .12 · 3 + .21 · 2 + .03 · 4 = 2.53

   • Compression ratio:
                                     (3 − 2.53)/3 ≈ 15.67%
   • In general, for text files, pack (Huffman Coding) claims an average compression ratio
     of 25-40%.

   • Degenerative cases:

             [Figure 5: Huffman Tree. Leaves G (.03) and D (.05) combine into a subtree
             of weight .08; joined with A (.10) into .18; joined with F (.21) into .39.
             Leaves E (.12) and B (.15) combine into .27; joined with C (.34) into .61.
             The root joins the .39 and .61 subtrees.]

        – When the probability distribution is uniform: p(x) = p(y) for all x, y ∈ Σ
        – When the probability distribution follows a Fibonacci sequence (the sum of the
          two smallest probabilities is less than or equal to the next highest probability,
          for all probabilities)


A      Stack-based Traversal Simulations
A.1     Preorder
Simulation of a stack-based preorder traversal of the binary tree in Figure 1.

push a
(enter loop) pop, node = a; push c; push b; print a
(enter loop) pop, node = b; push e; push d; print b
(enter loop) pop, node = d; push h; push g; print d
(enter loop) pop, node = g; push m; push l; print g
(enter loop) pop, node = l; (no push); (no push); print l
(enter loop) pop, node = m; (no push); push r; print m
(enter loop) pop, node = r; (no push); (no push); print r
(enter loop) pop, node = h; push n; (no push); print h
(enter loop) pop, node = n; (no push); (no push); print n
(enter loop) pop, node = e; push i; (no push); print e
(enter loop) pop, node = i; (no push); push o; print i
(enter loop) pop, node = o; (no push); (no push); print o
(enter loop) pop, node = c; push f; (no push); print c
(enter loop) pop, node = f; push k; push j; print f
(enter loop) pop, node = j; push q; push p; print j
(enter loop) pop, node = p; (no push); (no push); print p
(enter loop) pop, node = q; (no push); (no push); print q
(enter loop) pop, node = k; (no push); (no push); print k


A.2      Inorder
Simulation of a stack-based inorder traversal of the binary tree in Figure 1.

(enter loop, u = a) push a; update u = b
(enter loop, u = b) push b; update u = d
(enter loop, u = d) push d; update u = g
(enter loop, u = g) push g; update u = l
(enter loop, u = l) push l; update u = null
(enter loop, u = null) pop l, update u = l; process l; update u = null
(enter loop, u = null) pop g, update u = g; process g; update u = m
(enter loop, u = m) push m; update u = r
(enter loop, u = r) push r; update u = null
(enter loop, u = null) pop r, update u = r; process r; update u = null
(enter loop, u = null) pop m, update u = m; process m; update u = null
(enter loop, u = null) pop d, update u = d; process d; update u = h
(enter loop, u = h) push h; update u = null
(enter loop, u = null) pop h, update u = h; process h; update u = n
(enter loop, u = n) push n; update u = null
(enter loop, u = null) pop n, update u = n; process n; update u = null
(enter loop, u = null) pop b, update u = b; process b; update u = e
(enter loop, u = e) push e; update u = null
(enter loop, u = null) pop e, update u = e; process e; update u = i
(enter loop, u = i) push i; update u = o
(enter loop, u = o) push o; update u = null
(enter loop, u = null) pop o, update u = o; process o; update u = null
(enter loop, u = null) pop i, update u = i; process i; update u = null
(enter loop, u = null) pop a, update u = a; process a; update u = c
(enter loop, u = c) push c; update u = null
(enter loop, u = null) pop c, update u = c; process c; update u = f
(enter loop, u = f) push f; update u = j
(enter loop, u = j) push j; update u = p
(enter loop, u = p) push p; update u = null
(enter loop, u = null) pop p, update u = p; process p; update u = null
(enter loop, u = null) pop j, update u = j; process j; update u = q
(enter loop, u = q) push q; update u = null
(enter loop, u = null) pop q, update u = q; process q; update u = null
(enter loop, u = null) pop f, update u = f; process f; update u = k
(enter loop, u = k) push k; update u = null
(enter loop, u = null) pop k, update u = k; process k; update u = null
(done)



A.3      Postorder
Simulation of a stack-based postorder traversal of the binary tree in Figure 1:


prev = null; push a
(enter loop) update curr = (a); check: prev = null; push (b); update prev = a
(enter loop) update curr = (b); check: prev.leftChild = curr ((a).leftChild = (b)); push (d); update prev = b
(enter loop) update curr = (d); check: prev.leftChild = curr ((b).leftChild = (d)); push (g); update prev = d
(enter loop) update curr = (g); check: prev.leftChild = curr ((d).leftChild = (g)); push (l); update prev = g
(enter loop) update curr = (l); check: prev.leftChild = curr ((g).leftChild = (l)); (noop); update prev = l
(enter loop) update curr = (l); check: no condition holds; process l; update prev = l
(enter loop) update curr = (g); check: curr.leftChild = prev ((g).leftChild = (l)); push (m); update prev = g
(enter loop) update curr = (m); check: prev.rightChild = curr ((g).rightChild = (m)); push (r); update prev = m
(enter loop) update curr = (r); check: prev.leftChild = curr ((m).leftChild = (r)); (noop); update prev = r
(enter loop) update curr = (r); check: no condition holds; process r; update prev = r
(enter loop) update curr = (m); check: curr.leftChild = prev ((m).leftChild = (r)); no right child, (noop); update prev = m
(enter loop) update curr = (m); check: no condition holds; process m; update prev = m
(enter loop) update curr = (g); check: no condition holds; process g; update prev = g
(enter loop) update curr = (d); check: curr.leftChild = prev ((d).leftChild = (g)); push (h); update prev = d
(enter loop) update curr = (h); check: prev.rightChild = curr ((d).rightChild = (h)); push (n); update prev = h
(enter loop) update curr = (n); check: prev.rightChild = curr ((h).rightChild = (n)); (noop); update prev = n
(enter loop) update curr = (n); check: no condition holds; process n; update prev = n
(enter loop) update curr = (h); check: no condition holds; process h; update prev = h
(enter loop) update curr = (d); check: no condition holds; process d; update prev = d
(enter loop) update curr = (b); check: curr.leftChild = prev ((b).leftChild = (d)); push (e); update prev = b
(enter loop) update curr = (e); check: prev.rightChild = curr ((b).rightChild = (e)); push (i); update prev = e
(enter loop) update curr = (i); check: prev.rightChild = curr ((e).rightChild = (i)); push (o); update prev = i
(enter loop) update curr = (o); check: prev.leftChild = curr ((i).leftChild = (o)); (noop); update prev = o
(enter loop) update curr = (o); check: no condition holds; process o; update prev = o
(enter loop) update curr = (i); check: curr.leftChild = prev ((i).leftChild = (o)); no right child, (noop); update prev = i
(enter loop) update curr = (i); check: no condition holds; process i; update prev = i
(enter loop) update curr = (e); check: no condition holds; process e; update prev = e
(enter loop) update curr = (b); check: no condition holds; process b; update prev = b
(enter loop) update curr = (a); check: curr.leftChild = prev ((a).leftChild = (b)); push (c); update prev = a
(enter loop) update curr = (c); check: prev.rightChild = curr ((a).rightChild = (c)); push (f); update prev = c
(enter loop) update curr = (f); check: prev.rightChild = curr ((c).rightChild = (f)); push (j); update prev = f
(enter loop) update curr = (j); check: prev.leftChild = curr ((f).leftChild = (j)); push (p); update prev = j
(enter loop) update curr = (p); check: prev.leftChild = curr ((j).leftChild = (p)); (noop); update prev = p
(enter loop) update curr = (p); check: no condition holds; process p; update prev = p
(enter loop) update curr = (j); check: curr.leftChild = prev ((j).leftChild = (p)); push (q); update prev = j
(enter loop) update curr = (q); check: prev.rightChild = curr ((j).rightChild = (q)); (noop); update prev = q
(enter loop) update curr = (q); check: no condition holds; process q; update prev = q
(enter loop) update curr = (j); check: no condition holds; process j; update prev = j
(enter loop) update curr = (f); check: curr.leftChild = prev ((f).leftChild = (j)); push (k); update prev = f
(enter loop) update curr = (k); check: prev.rightChild = curr ((f).rightChild = (k)); (noop); update prev = k
(enter loop) update curr = (k); check: no condition holds; process k; update prev = k
(enter loop) update curr = (f); check: no condition holds; process f; update prev = f
(enter loop) update curr = (c); check: no condition holds; process c; update prev = c
(enter loop) update curr = (a); check: no condition holds; process a; update prev = a
(done)