Trees, Binary Search Trees, Heaps & Applications
Dr. Chris Bourke
Department of Computer Science & Engineering
University of Nebraska—Lincoln
Lincoln, NE 68588, USA
cbourke@cse.unl.edu
http://cse.unl.edu/~cbourke
2015/01/31 21:05:31
Abstract
These are lecture notes used in CSCE 156 (Computer Science II), CSCE 235 (Dis-
crete Structures) and CSCE 310 (Data Structures & Algorithms) at the University of
Nebraska—Lincoln.
This work is licensed under a Creative Commons
Attribution-ShareAlike 4.0 International License
Contents

I Trees
1 Introduction
2 Definitions & Terminology
3 Tree Traversal
  3.1 Preorder Traversal
  3.2 Inorder Traversal
  3.3 Postorder Traversal
  3.4 Breadth-First Search Traversal
  3.5 Implementations & Data Structures
    3.5.1 Preorder Implementations
    3.5.2 Inorder Implementation
    3.5.3 Postorder Implementation
    3.5.4 BFS Implementation
    3.5.5 Tree Walk Implementations
  3.6 Operations
4 Binary Search Trees
  4.1 Basic Operations
5 Balanced Binary Search Trees
  5.1 2-3 Trees
  5.2 AVL Trees
  5.3 Red-Black Trees
6 Optimal Binary Search Trees
7 Heaps
  7.1 Operations
  7.2 Implementations
    7.2.1 Finding the first available open spot in a Tree-based Heap
  7.3 Heap Sort
8 Java Collections Framework
9 Applications
  9.1 Huffman Coding
    9.1.1 Example
A Stack-based Traversal Simulations
  A.1 Preorder
  A.2 Inorder
  A.3 Postorder
List of Algorithms

1 Recursive Preorder Tree Traversal
2 Stack-based Preorder Tree Traversal
3 Stack-based Inorder Tree Traversal
4 Stack-based Postorder Tree Traversal
5 Queue-based BFS Tree Traversal
6 Tree Walk based Tree Traversal
7 Search algorithm for a binary search tree
8 Find Next Open Spot - Numerical Technique
9 Find Next Open Spot - Walk Technique
10 Heap Sort
11 Huffman Coding
List of Figures

1 A Binary Tree
2 A Binary Search Tree
3 Binary Search Tree Operations
4 A min-heap
5 Huffman Tree
Part I
Trees
Lecture Outlines
CSCE 156
• Basic definitions
• Tree Traversals
CSCE 235
• Review of basic definitions
• Huffman Coding
CSCE 310
• Heaps, Heap Sort
• Balanced BSTs: 2-3 Trees, AVL, Red-Black
1 Introduction
Motivation: we want a data structure to store elements that offers efficient, arbitrary retrieval
(search), insertion, and deletion.
• Array-based Lists
– O(n) insertion and deletion
– Fast index-based retrieval
– Efficient binary search if sorted
• Linked Lists
– Efficient, O(1) insert/delete for head/tail
– Inefficient, O(n) arbitrary search/insert/delete
– Efficient binary search not possible without random access
• Stacks and queues are efficient, but are restricted access data structures
• Possible alternative: Trees
• Trees have the potential to provide O(log n) efficiency for all operations
2 Definitions & Terminology
• A tree is a connected, acyclic graph
• For our purposes: a tree is a collection of nodes (that can hold keys, data, etc.) that
are connected by edges
• Trees are also oriented: each node has a parent and children
• A node with no parent is the root of the tree; all child nodes are oriented downward
• Nodes not immediately connected can have an ancestor, descendant or cousin relation-
ship
• A node with no children is a leaf
• A tree such that all nodes have at most two children is called a binary tree
• A binary tree is also oriented horizontally: each node may have a left and/or a right
child
• Example: see Figure 1
• A path in a tree is a sequence of nodes connected by edges
• The length of a path in a tree is the number of edges in the path (which equals the
number of nodes in the path minus one)
• A path is simple if it does not traverse nodes more than once (this is the default type
of path)
• The depth of a node u is the length of the (unique) path from the root to u
• The depth of the root is 0
• The depth of a tree is the maximal depth of any node in the tree (sometimes the term
height is used)
• All nodes of the same depth are considered to be at the same level
• A binary tree is complete (also called full or perfect) if all nodes are present at all levels
0 up to its depth d
• A sub-tree rooted at a node u is the tree consisting of all descendants with u oriented
as the root
Figure 1: A Binary Tree (nodes a through r)
Properties:
• In a tree, all nodes are connected by exactly one unique path
• The maximum number of nodes at any level k is 2^k
• Thus, the maximum number of nodes n for any binary tree of depth d is:
  n = 2^0 + 2^1 + 2^2 + · · · + 2^(d−1) + 2^d = Σ_{k=0}^{d} 2^k = 2^(d+1) − 1
• Consequently, a full binary tree with n nodes has depth:
  d = log₂(n + 1) − 1
• That is, d = O(log n)
Motivation: if we can create a tree-based data structure with operations proportional to its
depth, then we could potentially have a data structure that allows retrieval/search, insertion,
and deletion in O(log n)-time.
3 Tree Traversal
• Given a tree, we need a way to enumerate its elements
• Many algorithms exist to iterate over the elements in a tree
• We’ll look at several variations on depth-first search
3.1 Preorder Traversal
• A preorder traversal strategy visits nodes in the following order: root; left-sub-tree;
right-sub-tree
• An example traversal on the tree in Figure 1:
a, b, d, g, l, m, r, h, n, e, i, o, c, f, j, p, q, k
• Applications:
– Building a tree, traversing a tree, copying a tree, etc.
– Efficient stack-based implementation
– Used in prefix notation (polish notation); used in languages such as Lisp/Scheme
3.2 Inorder Traversal
• An inorder traversal strategy visits nodes in the following order: left-sub-tree; root;
right-sub-tree
• An example traversal on the tree in Figure 1:
l, g, r, m, d, h, n, b, e, o, i, a, c, p, j, q, f, k
• Applications:
– Enumerating elements in order in a binary search tree
– Expression trees
3.3 Postorder Traversal
• A postorder traversal strategy visits nodes in the following order: left-sub-tree; right-
sub-tree; root
• An example traversal on the tree in Figure 1:
l, r, m, g, n, h, d, o, i, e, b, p, q, j, k, f, c, a
• Applications:
– Topological sorting
– Destroying a tree when manual memory management is necessary (roots are the
last thing that get cleaned up)
– Reverse polish notation (operand-operand-operator, unambiguous, used in old HP
calculators)
– PostScript (Page Description Language)
3.4 Breadth-First Search Traversal
• Breadth-First Search (BFS) traversal is a general graph traversal strategy that explores
local or close nodes first before traversing “deeper” into the graph
• When applied to an oriented binary tree, BFS explores the tree level-by-level (top-to-
bottom, left-to-right)
3.5 Implementations & Data Structures
• Reference based implementation: TreeNode<T>
– Owns (through composition) references to: leftChild, rightChild, parent
– Can use either sentinel nodes or null to indicate missing children and parent
• BinaryTree<T> owns a root
• SVN examples: unl.cse.bst
3.5.1 Preorder Implementations
Input : A binary tree node u
Output: A preorder traversal of the nodes in the subtree rooted at u
1 print u
2 preOrderTraversal(u → leftChild)
3 preOrderTraversal(u → rightChild)
Algorithm 1: Recursive Preorder Tree Traversal
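As a concrete sketch in Java (the Node class and its field names are illustrative placeholders, not the course's TreeNode<T> from the SVN examples), the recursion might look as follows:

```java
import java.util.*;

// A sketch of Algorithm 1 in Java; Node and its fields are illustrative.
public class PreorderRecursive {
    static class Node {
        String key; Node left, right;
        Node(String key, Node left, Node right) { this.key = key; this.left = left; this.right = right; }
    }

    // Visit the root, then the left subtree, then the right subtree.
    static void preorder(Node u, List<String> out) {
        if (u == null) return;   // missing child: nothing to do
        out.add(u.key);          // "print u"
        preorder(u.left, out);
        preorder(u.right, out);
    }

    public static void main(String[] args) {
        // A small tree: a with children b and c; b has children d and e
        Node root = new Node("a",
                new Node("b", new Node("d", null, null), new Node("e", null, null)),
                new Node("c", null, null));
        List<String> out = new ArrayList<>();
        preorder(root, out);
        System.out.println(out); // [a, b, d, e, c]
    }
}
```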
Stack-based implementation:
• Initially, we push the tree’s root into the stack
• Within a loop, we pop the top of the stack and process it
• We need to push the node’s children for future processing
• Since a stack is LIFO, we push the right child first.
Input : A binary tree, T
Output: A preorder traversal of the nodes in T
1 S ← empty stack
2 push T’s root onto S
3 while S is not empty do
4   node ← S.pop
5   push node’s right-child onto S
6   push node’s left-child onto S
7   process node
8 end
Algorithm 2: Stack-based Preorder Tree Traversal
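A runnable sketch of this explicit-stack version (again with an illustrative Node type) could be:

```java
import java.util.*;

// A sketch of Algorithm 2: preorder traversal with an explicit stack.
public class PreorderStack {
    static class Node {
        String key; Node left, right;
        Node(String key, Node left, Node right) { this.key = key; this.left = left; this.right = right; }
    }

    static List<String> preorder(Node root) {
        List<String> out = new ArrayList<>();
        Deque<Node> stack = new ArrayDeque<>();
        if (root != null) stack.push(root);
        while (!stack.isEmpty()) {
            Node node = stack.pop();
            // LIFO: push the right child first so the left child is processed first
            if (node.right != null) stack.push(node.right);
            if (node.left != null) stack.push(node.left);
            out.add(node.key); // process node
        }
        return out;
    }

    public static void main(String[] args) {
        Node root = new Node("a",
                new Node("b", new Node("d", null, null), null),
                new Node("c", null, null));
        System.out.println(preorder(root)); // [a, b, d, c]
    }
}
```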
3.5.2 Inorder Implementation
Stack-based implementation:
• The same basic idea: push nodes onto the stack as you visit them
• However, we want to delay processing the node until we’ve explored the left-sub-tree
• We need a way to tell whether we are visiting a node for the first time or returning
from the left-subtree exploration
• To achieve this, we allow the current node reference to become null
• If it is null, we are returning from a left-subtree exploration: pop the top of the stack,
process it, then visit its right child
• If it is not null, we push it onto the stack for later processing and explore its left child
Input : A binary tree, T
Output: An inorder traversal of the nodes in T
1 S ← empty stack
2 u ← root
3 while S is not empty or u ≠ null do
4   if u ≠ null then
5     push u onto S
6     u ← u.leftChild
7   else
8     u ← S.pop
9     process u
10    u ← u.rightChild
11  end
12 end
Algorithm 3: Stack-based Inorder Tree Traversal
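A sketch of Algorithm 3 in Java (Node and its names are again placeholders); note how u becoming null signals the return from a left-subtree exploration:

```java
import java.util.*;

// A sketch of Algorithm 3: stack-based inorder traversal.
public class InorderStack {
    static class Node {
        String key; Node left, right;
        Node(String key, Node left, Node right) { this.key = key; this.left = left; this.right = right; }
    }

    static List<String> inorder(Node root) {
        List<String> out = new ArrayList<>();
        Deque<Node> stack = new ArrayDeque<>();
        Node u = root;
        while (!stack.isEmpty() || u != null) {
            if (u != null) {     // first visit: save for later, explore left
                stack.push(u);
                u = u.left;
            } else {             // returning from the left subtree: process, go right
                u = stack.pop();
                out.add(u.key);
                u = u.right;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Node root = new Node("a",
                new Node("b", new Node("d", null, null), new Node("e", null, null)),
                new Node("c", null, null));
        System.out.println(inorder(root)); // [d, b, e, a, c]
    }
}
```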
3.5.3 Postorder Implementation
Stack-based implementation:
• Same basic ideas, except that we need to distinguish if we’re visiting the node for the
first time, second time or last (so that we can process it)
• To achieve this, we keep track of where we came from: a parent, left, or right node
• We keep track of a previous and a current node
Input : A binary tree, T
Output: A postorder traversal of the nodes in T
1 S ← empty stack
2 prev ← null
3 push root onto S
4 while S is not empty do
5   curr ← S.peek
6   if prev = null or prev.leftChild = curr or prev.rightChild = curr then
7     if curr.leftChild ≠ null then
8       push curr.leftChild onto S
9     else if curr.rightChild ≠ null then
10      push curr.rightChild onto S
11    end
12  else if curr.leftChild = prev then
13    if curr.rightChild ≠ null then
14      push curr.rightChild onto S
15    end
16  else
17    process curr
18    S.pop
19  end
20  prev ← curr
21 end
Algorithm 4: Stack-based Postorder Tree Traversal
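A sketch of Algorithm 4 in Java (illustrative Node type); the three branches correspond to descending into the tree, returning from the left subtree, and returning from the right subtree (or a leaf):

```java
import java.util.*;

// A sketch of Algorithm 4: stack-based postorder traversal.
public class PostorderStack {
    static class Node {
        String key; Node left, right;
        Node(String key, Node left, Node right) { this.key = key; this.left = left; this.right = right; }
    }

    static List<String> postorder(Node root) {
        List<String> out = new ArrayList<>();
        Deque<Node> stack = new ArrayDeque<>();
        Node prev = null;
        if (root != null) stack.push(root);
        while (!stack.isEmpty()) {
            Node curr = stack.peek();
            if (prev == null || prev.left == curr || prev.right == curr) {
                // moving down: descend left if possible, else right
                if (curr.left != null) stack.push(curr.left);
                else if (curr.right != null) stack.push(curr.right);
            } else if (curr.left == prev) {
                // returning from the left subtree: explore the right
                if (curr.right != null) stack.push(curr.right);
            } else {
                // returning from the right subtree (or revisiting a leaf): process
                out.add(curr.key);
                stack.pop();
            }
            prev = curr;
        }
        return out;
    }

    public static void main(String[] args) {
        Node root = new Node("a",
                new Node("b", new Node("d", null, null), new Node("e", null, null)),
                new Node("c", null, null));
        System.out.println(postorder(root)); // [d, e, b, c, a]
    }
}
```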
3.5.4 BFS Implementation
Input : A binary tree, T
Output: A BFS traversal of the nodes in T
1 Q ← empty queue
2 enqueue T’s root into Q
3 while Q is not empty do
4   node ← Q.dequeue
5   enqueue node’s left-child into Q
6   enqueue node’s right-child into Q
7   process node
8 end
Algorithm 5: Queue-based BFS Tree Traversal
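A sketch of Algorithm 5 in Java (illustrative Node type); the FIFO queue is what yields the level-by-level, left-to-right order:

```java
import java.util.*;

// A sketch of Algorithm 5: BFS traversal with a FIFO queue.
public class BfsTraversal {
    static class Node {
        String key; Node left, right;
        Node(String key, Node left, Node right) { this.key = key; this.left = left; this.right = right; }
    }

    static List<String> bfs(Node root) {
        List<String> out = new ArrayList<>();
        Queue<Node> queue = new ArrayDeque<>();
        if (root != null) queue.add(root);
        while (!queue.isEmpty()) {
            Node node = queue.remove();
            // enqueue children for the next level (skipping missing children)
            if (node.left != null) queue.add(node.left);
            if (node.right != null) queue.add(node.right);
            out.add(node.key); // process node
        }
        return out;
    }

    public static void main(String[] args) {
        Node root = new Node("a",
                new Node("b", new Node("d", null, null), null),
                new Node("c", null, null));
        System.out.println(bfs(root)); // [a, b, c, d]
    }
}
```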
3.5.5 Tree Walk Implementations
• Simple rules based on local information: where you are and where you came from
• No additional data structures required
• Traversal is a “walk” around the perimeter of the tree
• Can use similar rules to determine when the current node should be processed to
achieve pre, in, and postorder traversals
• Need to take care with corner cases (when current node is the root or children are
missing)
• Pseudocode is presented in Algorithm 6
3.6 Operations
Basic Operations:
• Search for a particular element/key
• Adding an element
– Add at the shallowest available spot
– Add at a random leaf
Input : A binary tree, T
Output: A Tree Walk around T
1 curr ← root
2 prevType ← parent
3 while curr ≠ null do
4   if prevType = parent then
      //preorder: process curr here
5     if curr.leftChild exists then
        //Go to the left child:
6       curr ← curr.leftChild
7       prevType ← parent
8     else
9       curr ← curr
10      prevType ← left
11    end
12  else if prevType = left then
      //inorder: process curr here
13    if curr.rightChild exists then
        //Go to the right child:
14      curr ← curr.rightChild
15      prevType ← parent
16    else
17      curr ← curr
18      prevType ← right
19    end
20  else if prevType = right then
      //postorder: process curr here
21    if curr.parent = null then
        //root has no parent, we're done traversing
22      curr ← curr.parent
      //are we at the parent's left or right child?
23    else if curr = curr.parent.leftChild then
24      curr ← curr.parent
25      prevType ← left
26    else
27      curr ← curr.parent
28      prevType ← right
29    end
30  end
31 end
Algorithm 6: Tree Walk based Tree Traversal
– Add internally, shift nodes down by some criteria
• Removing elements
– Removing leaves
– Removing elements with one child
– Removing elements with two children
Other Operations:
• Compute the total number of nodes in a tree
• Compute the total number of leaves in a tree
• Given an item or node, compute its depth
• Compute the depth of a tree
4 Binary Search Trees
Regular binary trees have little structure to their elements; search, insert, and delete
operations are still linear in the number of tree nodes, O(n). We want a data structure
with operations proportional to its depth, O(d). To this end, we add structure and order
to tree nodes.
• Each node has an associated key
• Binary Search Tree Property: For every node u with key u_k in T
  1. Every node in the left-sub-tree of u has keys less than u_k
  2. Every node in the right-sub-tree of u has keys greater than u_k
• Duplicate keys can be handled, but you must be consistent (always placing them on the
same side), and duplicates are not guaranteed to be contiguous in the tree
• Alternatively: do not allow duplicate keys or define a key scheme that ensures a total
order
• Inductive property: all sub-trees are also binary search trees
• A full example can be found in Figure 2
Figure 2: A Binary Search Tree (rooted at 50)
4.1 Basic Operations
Observation: a binary search tree has more structure: the key in each node provides infor-
mation on where a node is not located. We will exploit this structure to achieve O(log n)
operations.
Search/retrieve
• Goal: find a node (and its data) that matches a given key k
• Start at the root
• At each node u, compare k to u’s key, u_k:
  – If equal, element found; stop and return
  – If k < u_k, traverse to u’s left-child
  – If k > u_k, traverse to u’s right-child
• Traverse until the sub-tree is empty (element not found)
• Analysis: number of comparisons is bounded by the depth of the tree, O(d)
Input : A binary search tree, T, a key k
Output: The tree node u ∈ T whose key, u_k, matches k
1 u ← T’s root
2 while u ≠ φ do
3   if u_k = k then
4     output u
5   else if u_k > k then
6     u ← u’s left-child
7   else if u_k < k then
8     u ← u’s right-child
9   end
10 end
11 output φ
Algorithm 7: Search algorithm for a binary search tree
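As a sketch in Java (with illustrative integer keys and Node type), the search loop might look like this:

```java
// A sketch of Algorithm 7: keys and the Node type are illustrative.
public class BstSearch {
    static class Node {
        int key; Node left, right;
        Node(int key) { this.key = key; }
    }

    static Node search(Node root, int k) {
        Node u = root;
        while (u != null) {
            if (k == u.key) return u;           // found: stop and return
            u = (k < u.key) ? u.left : u.right; // smaller keys left, larger keys right
        }
        return null; // subtree exhausted: element not found
    }

    public static void main(String[] args) {
        // The top of a small BST: 50 with children 40 and 60
        Node root = new Node(50);
        root.left = new Node(40);
        root.right = new Node(60);
        System.out.println(search(root, 60).key); // 60
        System.out.println(search(root, 55));     // null
    }
}
```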
Insert
• Insert new nodes as leaves
• To determine where it should be inserted: traverse the tree as above
• Insert at the first available spot (first missing child node)
• Analysis: finding the available location is O(d), inserting is just reference juggling,
O(1)
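The steps above can be sketched in Java as follows (illustrative Node type; duplicates are consistently sent to the right here, which is one possible convention):

```java
// A sketch of BST insertion: traverse as in search, then attach the new
// node at the first missing child.
public class BstInsert {
    static class Node {
        int key; Node left, right;
        Node(int key) { this.key = key; }
    }

    static Node insert(Node root, int k) {
        Node leaf = new Node(k);
        if (root == null) return leaf;  // empty tree: new node becomes the root
        Node u = root;
        while (true) {
            if (k < u.key) {
                if (u.left == null) { u.left = leaf; break; }
                u = u.left;
            } else {
                if (u.right == null) { u.right = leaf; break; }
                u = u.right;
            }
        }
        return root; // O(d) to find the spot, O(1) reference juggling
    }

    public static void main(String[] args) {
        Node root = null;
        for (int k : new int[]{50, 40, 60, 45}) root = insert(root, k);
        System.out.println(root.left.right.key); // 45: inserted as a leaf under 40
    }
}
```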
Delete
• Need to first find the node u to delete, traverse as above, O(d)
• If u is a leaf (no children): it’s safe to simply delete it
• If u has one child, then we can “promote” it to u’s spot (u’s parent will now point to
u’s child)
• If u has two children, we need to find a way to preserve the BST property
– Want to minimally change the tree’s structure
– Need the operation to be efficient
– Find the minimum element of the greater nodes (right sub-tree) or the maximal
element of the lesser nodes (left sub-tree)
– Such an element will have at most one child (which we know how to delete)
– Delete it and store off the key/data
– Replace u’s key/data with the contents of the minimum/maximum element
• Analysis:
– Search/Find: O(d)
– Finding the min/max: O(d)
– Swapping: O(1)
– In total: O(d)
• Examples illustrated in Figure 3
5 Balanced Binary Search Trees
Motivation:
• Ideally, a full binary tree has depth d = O(log n)
• General BSTs can degenerate such that d = O(n)
• Need a way to preserve the BST property while at the same time limiting the depth
to O(log n)
• Solution: Balanced Binary Search Trees
• BSTs that efficiently reorganize themselves such that the BST Property is preserved
and that the depth is restricted to O(log n)
• Some types of Balanced BSTs:
– B-trees (and 2-3 trees)
– AVL Trees
– Red-Black Trees
– Splay trees
5.1 2-3 Trees
See 310 Note Set.
5.2 AVL Trees
See 310 Note Set.
Figure 3: Binary Search Tree Operations. Figure 3(b) depicts the insertion of a new node
(16) into the tree in Figure 3(a). Figures 3(c) and 3(d) depict the deletion of a node (15)
with two children: first, find the maximum node in the left-sub-tree (lesser elements); then
node 15 is replaced with that extremal node, preserving the BST property. Figures 3(e)
and 3(f) depict the deletion of a node with only one child (7); removal is achieved by simply
promoting the single child/subtree.
5.3 Red-Black Trees
See 310 Note Set.
6 Optimal Binary Search Trees
See 310 Note Set.
7 Heaps
Definition 1. A heap is a binary tree that satisfies the following properties.
1. It is a full or complete binary tree: all nodes are present, except possibly in the last row
2. If the last row is not full, its nodes are all as far to the left as possible
3. It satisfies the Heap Property: every node has a key that is greater than both of its
children (max-heap)
• As a consequence of the Heap Property, the maximal element is always at the root
• Alternatively, we can define a min-heap
• Variations: 2-3 heaps, fibonacci heaps, etc.
• A min-heap example can be found in Figure 4
Figure 4: A min-heap (rooted at 1; every node’s key is smaller than its children’s keys)
Applications
• Heaps are an optimal implementation of a priority queue
• Used extensively in other algorithms (Heap Sort, Prim’s, Dijkstra’s, Huffman Coding,
etc.) to ensure efficient operation
7.1 Operations
Insert
• Want to preserve the full-ness property and the Heap Property
• Preserve full-ness: add new element at the end of the heap (last row, first free spot on
the left)
• Insertion at the end may violate the Heap Property
• Heapify/fix: bubble up inserted element until Heap Property is satisfied
• Analysis: insert is O(1); heapify: O(d)
Remove from top
• Want to preserve the full-ness property and the Heap Property
• Preserve full-ness: swap root element with the last element in the heap (lowest row,
right-most element)
• Heap property may be violated
• Heapify/fix: bubble new root element down until Heap Property is satisfied
• Analysis: Swap is O(1); heapify: O(d)
Others
• Arbitrary remove
• Find
• Possible, but not ideal: Heaps are restricted-access data structures
Analysis
• All operations are O(d)
• Since Heaps are full, d = O(log n)
• Thus, all operations are O(log n)
7.2 Implementations
Array-Based
• Root is located at index 1
• If a node u is at index i, u’s left-child is at index 2i and its right-child is at index 2i + 1
• If a node u is at index j, its parent is at index ⌊j/2⌋
• Alternatively, with a 0-indexed array: the children of the node at index i are at indices
2i + 1 and 2i + 2, and its parent is at index ⌊(i − 1)/2⌋
• Advantage: easy implementation, all items are contiguous in the array (in fact, a BFS
ordering!)
• Disadvantage: an insert operation may force a reallocation, but this can be done in
amortized-constant time (though there may still be wasted space)
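A sketch of a 0-indexed, array-based max-heap using exactly this index arithmetic (class and method names are illustrative); insert bubbles the new element up, removeTop bubbles the promoted last element down:

```java
import java.util.*;

// A sketch of an array-based max-heap: children of index i are at
// 2i+1 and 2i+2; the parent of index i is at (i-1)/2.
public class ArrayHeap {
    private final List<Integer> a = new ArrayList<>();

    public void insert(int x) {
        a.add(x);                       // add at the end (preserves fullness)
        int i = a.size() - 1;
        while (i > 0) {                 // bubble up until the Heap Property holds
            int parent = (i - 1) / 2;
            if (a.get(parent) >= a.get(i)) break;
            Collections.swap(a, parent, i);
            i = parent;
        }
    }

    public int removeTop() {
        int top = a.get(0);
        int last = a.remove(a.size() - 1);  // remove the last element
        if (!a.isEmpty()) {
            a.set(0, last);                 // promote it to the root
            int i = 0;
            while (true) {                  // bubble down
                int left = 2 * i + 1, right = 2 * i + 2, largest = i;
                if (left < a.size() && a.get(left) > a.get(largest)) largest = left;
                if (right < a.size() && a.get(right) > a.get(largest)) largest = right;
                if (largest == i) break;
                Collections.swap(a, i, largest);
                i = largest;
            }
        }
        return top;
    }

    public static void main(String[] args) {
        ArrayHeap h = new ArrayHeap();
        for (int x : new int[]{5, 1, 9, 3}) h.insert(x);
        System.out.println(h.removeTop()); // 9: the maximum is always at the root
    }
}
```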
Tree-Based
• Reference-based tree (nodes which contain references to children/parent)
• Parent reference is now required for efficiency
• For efficiency, we need to keep track of the last element in the tree
• For deletes/inserts: we need a way to find the last element and first “open spot”
• We’ll focus on finding the first available open spot as the same technique can be used
to find the last element with minor modifications
7.2.1 Finding the first available open spot in a Tree-based Heap
Technique A: numerical technique
• WLOG: assume we keep track of the number of nodes in the heap, n, and thus the
depth d = ⌊log n⌋
• If n = 2^(d+1) − 1 then the tree is full, the last element is all the way to the right, and
the first available spot is all the way to the left
• Otherwise n < 2^(d+1) − 1 and the heap is not full (the first available spot is located at
level d; the root is at level 0)
• Starting at the root, we want to know if the last element is in the left subtree or the
right subtree
• Let m = n − (2^d − 1), the number of nodes present in level d
• If m ≥ 2^d / 2 then the left subtree is full at the last level, and so the next open spot
must be in the right subtree
• Otherwise, if m < 2^d / 2, the left subtree is not full at the last level, and so the next
open spot is in the left subtree
• Traverse down to the left or right respectively and repeat: the resulting subtree will
have depth d − 1 with m ← m (if traversing left) or m ← m − 2^d / 2 (if traversing right)
• Repeat until we’ve found the first available spot
• Analysis: in any case, it is O(d) = O(log n) to traverse from the root to the first open
spot
Input : A tree-based heap H with n nodes
Output: The node whose child is the next available open spot in the heap
1 curr ← H’s root
2 d ← ⌊log n⌋
3 m ← n
4 while curr has both children do
5   if m = 2^(d+1) − 1 then
      //remaining tree is full, traverse all the way left
6     while curr has both children do
7       curr ← curr.leftChild
8     end
9   else
      //remaining tree is not full; determine if the next open spot is
      in the left or right sub-tree
10    if m ≥ 2^d / 2 then
        //left sub-tree is full
11      d ← (d − 1)
12      m ← (m − 2^d / 2)
13      curr ← curr.rightChild
14    else
        //left sub-tree is not full
15      d ← (d − 1)
16      m ← m
17      curr ← curr.leftChild
18    end
19  end
20 end
21 output curr
Algorithm 8: Find Next Open Spot - Numerical Technique
Technique B: Walk Technique
• Alternatively, we can adapt the idea behind the tree walk algorithm to find the next
available open spot
• We’ll assume that we’ve kept track of the last node
• If the tree is full, we simply traverse all the way to the left and insert, O(d)
• If the last node is a left-child then its parent’s right child is the next available spot,
finding it is O(1)
• Otherwise, we’ll need to traverse around the perimeter of the tree until we reach the
next open slot
Input : A tree-based heap H with n nodes
Output: The node whose (missing) child is the next available open spot in the heap
1 d ← ⌊log n⌋
2 if n = 2^(d+1) − 1 then
    //The tree is full, traverse all the way to the left
3   curr ← root
4   while curr.leftChild ≠ null do
5     curr ← curr.leftChild
6   end
7 else if last is a left-child then
    //parent's right child is open
8   curr ← last.parent
9 else
    //The open spot lies in a subtree to the right of the last node
    //Walk the tree until we reach it
10  curr ← last.parent
11  while curr is a right-child do
12    curr ← curr.parent
13  end
    //"turn" right
14  curr ← curr.parent
15  curr ← curr.rightChild
    //traverse all the way left
16  while curr.leftChild ≠ null do
17    curr ← curr.leftChild
18  end
19 end
  //current node's missing child is the open spot
20 output curr
Algorithm 9: Find Next Open Spot - Walk Technique
7.3 Heap Sort
• Since the min/max element is always at the top: simply insert all elements, then remove
them all!
• Perfect illustration of “Smart data structures and dumb code are a lot better than the
other way around”
Input : A collection of elements A = {a_1, . . . , a_n}
Output: A collection, A′, of the elements of A, sorted
1 H ← empty heap
2 A′ ← empty collection
3 foreach x ∈ A do
4   insert x into H
5 end
6 while H is not empty do
7   y ← remove top from H
8   add y to the end of A′
9 end
10 output A′
Algorithm 10: Heap Sort
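Algorithm 10 can be sketched directly in Java using java.util.PriorityQueue (a binary min-heap) as the heap H:

```java
import java.util.*;

// A sketch of Algorithm 10: insert everything into a min-heap,
// then repeatedly remove the top.
public class HeapSort {
    static List<Integer> heapSort(List<Integer> a) {
        Queue<Integer> h = new PriorityQueue<>();  // H <- empty heap
        for (int x : a) h.add(x);                  // insert phase
        List<Integer> sorted = new ArrayList<>();  // A'
        while (!h.isEmpty()) sorted.add(h.poll()); // remove phase: minimum first
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(heapSort(Arrays.asList(5, 1, 4, 2, 3))); // [1, 2, 3, 4, 5]
    }
}
```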
Analysis
• Amortized analysis: insert/remove operations are not constant throughout the algo-
rithm
• On the first iteration, insert is O(1); on the i-th iteration, d = O(log i); only on the
last iteration is insertion O(log n)
• In total, the insert phase is:
  log 1 + log 2 + · · · + log n = Σ_{i=1}^{n} log i = O(n log n)
• A similar lower bound can be shown
• The same analysis applies to the remove phase:
  log n + log (n − 1) + · · · + log 1 = O(n log n)
• In total, O(n log n)
8 Java Collections Framework
Java provides several data structures backed by underlying tree structures.
• java.util.PriorityQueue<E> is a binary-heap based priority queue
– Priority (keys) based on either natural ordering or a provided Comparator
– Guaranteed O(log n) time for insert (offer) and get top (poll)
– Supports O(n) arbitrary remove(Object) and search (contains) methods
• java.util.TreeSet<E>
– Implements the SortedSet interface; makes use of a Comparator
– Backed by TreeMap, a red-black tree (balanced binary search tree) implementation
– Guaranteed O(log n) time for add, remove, contains operations
– Default iterator is an in-order traversal
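A short demonstration of the two tree-backed classes described above:

```java
import java.util.*;

// PriorityQueue is a binary min-heap; TreeSet is a red-black tree
// whose iteration order is an inorder (sorted) traversal.
public class JcfDemo {
    public static void main(String[] args) {
        // PriorityQueue: priorities from natural ordering
        Queue<Integer> pq = new PriorityQueue<>();
        pq.offer(30); pq.offer(10); pq.offer(20);   // O(log n) inserts
        System.out.println(pq.poll());              // 10: poll removes the minimum

        // TreeSet: O(log n) add/remove/contains
        SortedSet<String> set = new TreeSet<>();
        set.add("banana"); set.add("apple"); set.add("cherry");
        System.out.println(set.first());            // apple: smallest element
        System.out.println(set.contains("durian")); // false
    }
}
```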
9 Applications
9.1 Huffman Coding
Overview
• Coding Theory is the study and theory of codes—schemes for transmitting data
• Coding theory involves efficiently padding out data with redundant information to
increase reliability (detect or even correct errors) over a noisy channel
• Coding theory also involves compressing data to save space
– MP3s (uses a form of Huffman coding, but is information lossy)
– jpegs, mpegs, even DVDs
– pack (straight Huffman coding)
– zip, gzip (uses a Ziv-Lempel and Huffman compression algorithm)
Basics
• Let Σ be a fixed alphabet of size n
• A coding is a mapping of this alphabet to a collection of binary codewords,
Σ → {0, 1}∗
• A block encoding is a fixed-length encoding scheme in which all codewords have the same
length (example: ASCII); it requires codewords of length ⌈log₂ n⌉
• Not all symbols have the same frequency, alternative: variable length encoding
• Intuitively: assign shorter codewords to more frequent symbols, longer to less frequent
symbols
• Reduction in the overall average codeword length
• Variable length encodings must be unambiguous
• Solution: prefix free codes: a code in which no whole codeword is the prefix of another
(other than itself of course).
• Examples:
– {0, 01, 101, 010} is not a prefix free code.
– {10, 010, 110, 0110} is a prefix free code.
• A simple way of building a prefix free code is to associate codewords with the leaves
of a binary tree (not necessarily full).
• Each edge corresponds to a bit: 0 for an edge to a left child and 1 for an edge to a right child.
• Since no simple path from the root to a leaf can continue on to another leaf, we are
guaranteed a prefix free coding.
• Using this idea along with a greedy encoding forms the basis of Huffman Coding
Steps
• Consider a precomputed relative frequency function:
freq : Σ → [0, 1]
• Build a collection of weighted trees T_x for each symbol x ∈ Σ with wt(T_x) = freq(x)
• Combine the two least weighted trees via a new node; associate a new weight (the sum
of the weights of the two subtrees)
• Keep combining until only one tree remains
• The tree constructed in Huffman’s algorithm is known as a Huffman Tree and it defines
a Huffman Code
Input : An alphabet of symbols, Σ, with relative frequencies, freq(x)
Output: A Huffman Tree
1 H ← new min-heap
2 foreach x ∈ Σ do
3   T_x ← single node tree
4   wt(T_x) ← freq(x)
5   insert T_x into H
6 end
7 while size of H > 1 do
8   T_r ← new tree root node
9   T_a ← H.getMin
10  T_b ← H.getMin
11  T_r.leftChild ← T_a
12  T_r.rightChild ← T_b
13  wt(T_r) ← wt(T_a) + wt(T_b)
14  insert T_r into H
15 end
16 output H.getMin
Algorithm 11: Huffman Coding
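A sketch of Algorithm 11 in Java, using a PriorityQueue keyed on tree weight as the min-heap (the Tree class and its names are illustrative); a helper then reads off codewords with 0 for left edges and 1 for right edges:

```java
import java.util.*;

// A sketch of Algorithm 11: repeatedly combine the two least-weighted trees.
public class Huffman {
    static class Tree {
        final double wt; final Character symbol; final Tree left, right;
        Tree(double wt, Character symbol, Tree left, Tree right) {
            this.wt = wt; this.symbol = symbol; this.left = left; this.right = right;
        }
    }

    static Tree build(Map<Character, Double> freq) {
        Queue<Tree> h = new PriorityQueue<>(Comparator.comparingDouble((Tree t) -> t.wt));
        for (Map.Entry<Character, Double> e : freq.entrySet())
            h.add(new Tree(e.getValue(), e.getKey(), null, null)); // single-node trees
        while (h.size() > 1) {
            Tree a = h.poll(), b = h.poll();          // two least-weighted trees
            h.add(new Tree(a.wt + b.wt, null, a, b)); // combine under a new root
        }
        return h.poll();
    }

    // Collect codewords: 0 for a left edge, 1 for a right edge.
    static void codes(Tree t, String prefix, Map<Character, String> out) {
        if (t == null) return;
        if (t.symbol != null) { out.put(t.symbol, prefix); return; } // leaf
        codes(t.left, prefix + "0", out);
        codes(t.right, prefix + "1", out);
    }

    public static void main(String[] args) {
        Map<Character, Double> freq = new HashMap<>();
        freq.put('A', 0.4); freq.put('B', 0.35); freq.put('C', 0.25);
        Map<Character, String> code = new HashMap<>();
        codes(build(freq), "", code);
        System.out.println(code.get('A').length()); // 1: most frequent, shortest code
    }
}
```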
9.1.1 Example
Construct the Huffman Tree and Huffman Code for a file with the following content.
character:  A     B     C     D     E     F     G
frequency:  0.10  0.15  0.34  0.05  0.12  0.21  0.03
• Average codeword length:
  0.10 · 3 + 0.15 · 3 + 0.34 · 2 + 0.05 · 4 + 0.12 · 3 + 0.21 · 2 + 0.03 · 4 = 2.53
• Compression ratio:
  (3 − 2.53)/3 = 15.67%
• In general, for text files, pack (straight Huffman coding) claims an average compression
ratio of 25-40%.
• Degenerative cases:
Figure 5: Huffman Tree. The root has subtrees of weight .39 (bit 0) and .61 (bit 1); the
.39 subtree splits into .18 (00) and F: .21 (01); the .18 subtree splits into .08 (000) and
A: .10 (001); the .08 subtree splits into G: .03 (0000) and D: .05 (0001); the .61 subtree
splits into .27 (10) and C: .34 (11); the .27 subtree splits into E: .12 (100) and B: .15 (101).
– When the probability distribution is uniform: p(x) = p(y) for all x, y ∈ Σ
– When the probability distribution follows a Fibonacci sequence (the sum of each
of the two smallest probabilities is less than or equal to the next highest probability,
for all probabilities)
A Stack-based Traversal Simulations
A.1 Preorder
Simulation of a stack-based preorder traversal of the binary tree in Figure 1.
Each iteration of Algorithm 2 pops the top node, pushes its right child and then its left
child (if present), and prints the popped node:

push a
pop a; push c, push b; print a
pop b; push e, push d; print b
pop d; push h, push g; print d
pop g; push m, push l; print g
pop l; (no pushes); print l
pop m; (no right child), push r; print m
pop r; (no pushes); print r
pop h; push n, (no left child); print h
pop n; (no pushes); print n
pop e; push i, (no left child); print e
pop i; (no right child), push o; print i
pop o; (no pushes); print o
pop c; push f, (no left child); print c
pop f; push k, push j; print f
pop j; push q, push p; print j
pop p; (no pushes); print p
pop q; (no pushes); print q
pop k; (no pushes); print k
(stack empty: done)
A.2 Inorder
Simulation of a stack-based inorder traversal of the binary tree in Figure 1.
(enter loop, u = a) push a; update u = b
(enter loop, u = b) push b; update u = d
(enter loop, u = d) push d; update u = g
(enter loop, u = g) push g; update u = l
(enter loop, u = l) push l; update u = null
(enter loop, u = null) pop l, update u = l; process l; update u = null
(enter loop, u = null) pop g, update u = g; process g; update u = m
(enter loop, u = m) push m; update u = r
(enter loop, u = r) push r; update u = null
(enter loop, u = null) pop r, update u = r; process r; update u = null
(enter loop, u = null) pop m, update u = m; process m; update u = null
(enter loop, u = null) pop d, update u = d; process d; update u = h
(enter loop, u = h) push h; update u = null
(enter loop, u = null) pop h, update u = h; process h; update u = n
(enter loop, u = n) push n; update u = null
(enter loop, u = null) pop n, update u = n; process n; update u = null
(enter loop, u = null) pop b, update u = b; process b; update u = e
(enter loop, u = e) push e; update u = null
(enter loop, u = null) pop e, update u = e; process e; update u = i
(enter loop, u = i) push i; update u = o
(enter loop, u = o) push o; update u = null
(enter loop, u = null) pop o, update u = o; process o; update u = null
(enter loop, u = null) pop i, update u = i; process i; update u = p
(enter loop, u = p) push p; update u = null
(enter loop, u = null) pop p, update u = p; process p; update u = null
(enter loop, u = null) pop a, update u = a; process a; update u = c
(enter loop, u = c) push c; update u = null
(enter loop, u = null) pop c, update u = c; process c; update u = f
(enter loop, u = f) push f; update u = j
(enter loop, u = j) push j; update u = null
(enter loop, u = null) pop j, update u = j; process j; update u = q
(enter loop, u = q) push q; update u = null
(enter loop, u = null) pop q, update u = q; process q; update u = null
(enter loop, u = null) pop f, update u = f; process f; update u = k
(enter loop, u = k) push k; update u = null
(enter loop, u = null) pop k, update u = k; process k; update u = null
(done)
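The inorder simulation can likewise be rendered as a short Python sketch. The Node class and the EDGES table (reconstructed from the trace, intended to match Figure 1) are illustrative assumptions.

```python
# Hypothetical reconstruction of the example tree from the trace.

class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

EDGES = {'a': ('b', 'c'), 'b': ('d', 'e'), 'd': ('g', 'h'), 'g': ('l', 'm'),
         'm': ('r', None), 'h': (None, 'n'), 'e': (None, 'i'), 'i': ('o', 'p'),
         'c': (None, 'f'), 'f': ('j', 'k'), 'j': (None, 'q')}

def build(key):
    if key is None:
        return None
    left, right = EDGES.get(key, (None, None))
    return Node(key, build(left), build(right))

def inorder(root):
    """Run down the left spine pushing nodes while u is non-null; when u
    becomes null, pop a node, process it, and continue with its right child."""
    out, stack, u = [], [], root
    while stack or u is not None:
        if u is not None:
            stack.append(u)        # (enter loop, u = x): push, go left
            u = u.left
        else:
            u = stack.pop()        # (enter loop, u = null): pop, process
            out.append(u.key)
            u = u.right
    return out

print(''.join(inorder(build('a'))))  # lgrmdhnbeoipacjqfk
```

The processing order matches the simulation: l, g, r, m, d, h, n, b, e, o, i, p, a, c, j, q, f, k.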
A.3 Postorder
Simulation of a stack-based postorder traversal of the binary tree in Figure 1:
prev = null; push a
(enter loop) update curr = (a); check: prev = null; push (b); update prev = a
(enter loop) update curr = (b); check: prev.leftChild = curr ((a).leftChild = (b)); push (d); update prev = b
(enter loop) update curr = (d); check: prev.leftChild = curr ((b).leftChild = (d)); push (g); update prev = d
(enter loop) update curr = (g); check: prev.leftChild = curr ((d).leftChild = (g)); push (l); update prev = g
(enter loop) update curr = (l); check: prev.leftChild = curr ((g).leftChild = (l)); (noop, leaf); update prev = l
(enter loop) update curr = (l); check: no condition holds; process l; update prev = l
(enter loop) update curr = (g); check: curr.leftChild = prev ((g).leftChild = (l)); push (m); update prev = g
(enter loop) update curr = (m); check: prev.rightChild = curr ((g).rightChild = (m)); push (r); update prev = m
(enter loop) update curr = (r); check: prev.leftChild = curr ((m).leftChild = (r)); (noop, leaf); update prev = r
(enter loop) update curr = (r); check: no condition holds; process r; update prev = r
(enter loop) update curr = (m); check: curr.leftChild = prev ((m).leftChild = (r)); (noop, no right child); update prev = m
(enter loop) update curr = (m); check: no condition holds; process m; update prev = m
(enter loop) update curr = (g); check: no condition holds; process g; update prev = g
(enter loop) update curr = (d); check: curr.leftChild = prev ((d).leftChild = (g)); push (h); update prev = d
(enter loop) update curr = (h); check: prev.rightChild = curr ((d).rightChild = (h)); push (n) (no left child); update prev = h
(enter loop) update curr = (n); check: prev.rightChild = curr ((h).rightChild = (n)); (noop, leaf); update prev = n
(enter loop) update curr = (n); check: no condition holds; process n; update prev = n
(enter loop) update curr = (h); check: no condition holds; process h; update prev = h
(enter loop) update curr = (d); check: no condition holds; process d; update prev = d
(enter loop) update curr = (b); check: curr.leftChild = prev ((b).leftChild = (d)); push (e); update prev = b
(enter loop) update curr = (e); check: prev.rightChild = curr ((b).rightChild = (e)); push (i) (no left child); update prev = e
(enter loop) update curr = (i); check: prev.rightChild = curr ((e).rightChild = (i)); push (o); update prev = i
(enter loop) update curr = (o); check: prev.leftChild = curr ((i).leftChild = (o)); (noop, leaf); update prev = o
(enter loop) update curr = (o); check: no condition holds; process o; update prev = o
(enter loop) update curr = (i); check: curr.leftChild = prev ((i).leftChild = (o)); push (p); update prev = i
(enter loop) update curr = (p); check: prev.rightChild = curr ((i).rightChild = (p)); (noop, leaf); update prev = p
(enter loop) update curr = (p); check: no condition holds; process p; update prev = p
(enter loop) update curr = (i); check: no condition holds; process i; update prev = i
(enter loop) update curr = (e); check: no condition holds; process e; update prev = e
(enter loop) update curr = (b); check: no condition holds; process b; update prev = b
(enter loop) update curr = (a); check: curr.leftChild = prev ((a).leftChild = (b)); push (c); update prev = a
(enter loop) update curr = (c); check: prev.rightChild = curr ((a).rightChild = (c)); push (f) (no left child); update prev = c
(enter loop) update curr = (f); check: prev.rightChild = curr ((c).rightChild = (f)); push (j); update prev = f
(enter loop) update curr = (j); check: prev.leftChild = curr ((f).leftChild = (j)); push (q) (no left child); update prev = j
(enter loop) update curr = (q); check: prev.rightChild = curr ((j).rightChild = (q)); (noop, leaf); update prev = q
(enter loop) update curr = (q); check: no condition holds; process q; update prev = q
(enter loop) update curr = (j); check: no condition holds; process j; update prev = j
(enter loop) update curr = (f); check: curr.leftChild = prev ((f).leftChild = (j)); push (k); update prev = f
(enter loop) update curr = (k); check: prev.rightChild = curr ((f).rightChild = (k)); (noop, leaf); update prev = k
(enter loop) update curr = (k); check: no condition holds; process k; update prev = k
(enter loop) update curr = (f); check: no condition holds; process f; update prev = f
(enter loop) update curr = (c); check: no condition holds; process c; update prev = c
(enter loop) update curr = (a); check: no condition holds; process a; update prev = a
(stack empty; done)
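The postorder simulation's prev/curr bookkeeping can be sketched in Python as follows. This is an illustrative reconstruction of the algorithm the trace appears to follow (peek at the top of the stack and decide whether we are descending, returning from the left child, or returning from the right child); the Node class and the EDGES table are assumptions intended to match Figure 1.

```python
# Hypothetical reconstruction of the example tree from the trace.

class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

EDGES = {'a': ('b', 'c'), 'b': ('d', 'e'), 'd': ('g', 'h'), 'g': ('l', 'm'),
         'm': ('r', None), 'h': (None, 'n'), 'e': (None, 'i'), 'i': ('o', 'p'),
         'c': (None, 'f'), 'f': ('j', 'k'), 'j': (None, 'q')}

def build(key):
    if key is None:
        return None
    left, right = EDGES.get(key, (None, None))
    return Node(key, build(left), build(right))

def postorder(root):
    """Stack-based postorder with a prev pointer, as in the trace above."""
    out, stack, prev = [], [root], None
    while stack:
        curr = stack[-1]                 # peek; "update curr = (x)"
        if prev is None or prev.left is curr or prev.right is curr:
            # descending: push the left child if present, else the right
            if curr.left is not None:
                stack.append(curr.left)
            elif curr.right is not None:
                stack.append(curr.right)
        elif curr.left is prev:
            # returning from the left subtree: descend right, if any
            if curr.right is not None:
                stack.append(curr.right)
        else:
            # returning from the right subtree (or a finished leaf)
            out.append(stack.pop().key)  # "process" in the trace
        prev = curr                      # "update prev = x"
    return out

print(''.join(postorder(build('a'))))  # lrmgnhdopiebqjkfca
```

The processing order matches the simulation: l, r, m, g, n, h, d, o, p, i, e, b, q, j, k, f, c, a.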