Trees Trees, Binary Search Trees, Heaps & Applications Dr. Chris Bourke Department of Computer Science & Engineering University of Nebraska—Lincoln Lincoln, NE 68588, USA cbourke@cse.unl.edu http://cse.unl.edu/~cbourke 2015/01/31 21:05:31 Abstract These are lecture notes used in CSCE 156 (Computer Science II), CSCE 235 (Dis- crete Structures) and CSCE 310 (Data Structures & Algorithms) at the University of Nebraska—Lincoln. This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License 1 Contents I Trees 4 1 Introduction 4 2 Definitions & Terminology 5 3 Tree Traversal 7 3.1 Preorder Traversal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 3.2 Inorder Traversal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 3.3 Postorder Traversal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 3.4 Breadth-First Search Traversal . . . . . . . . . . . . . . . . . . . . . . . . . . 8 3.5 Implementations & Data Structures . . . . . . . . . . . . . . . . . . . . . . . 8 3.5.1 Preorder Implementations . . . . . . . . . . . . . . . . . . . . . . . . 8 3.5.2 Inorder Implementation . . . . . . . . . . . . . . . . . . . . . . . . . 9 3.5.3 Postorder Implementation . . . . . . . . . . . . . . . . . . . . . . . . 10 3.5.4 BFS Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 3.5.5 Tree Walk Implementations . . . . . . . . . . . . . . . . . . . . . . . 12 3.6 Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 4 Binary Search Trees 14 4.1 Basic Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 5 Balanced Binary Search Trees 17 5.1 2-3 Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 5.2 AVL Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 5.3 Red-Black Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 6 Optimal Binary Search Trees 19 7 Heaps 19 7.1 Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 7.2 Implementations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 7.2.1 Finding the first available open spot in a Tree-based Heap . . . . . . 21 2 7.3 Heap Sort . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 8 Java Collections Framework 26 9 Applications 26 9.1 Huffman Coding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 9.1.1 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 A Stack-based Traversal Simulations 29 A.1 Preorder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 A.2 Inorder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 A.3 Postorder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 List of Algorithms 1 Recursive Preorder Tree Traversal . . . . . . . . . . . . . . . . . . . . . . . . 8 2 Stack-based Preorder Tree Traversal . . . . . . . . . . . . . . . . . . . . . . . 9 3 Stack-based Inorder Tree Traversal . . . . . . . . . . . . . . . . . . . . . . . . 10 4 Stack-based Postorder Tree Traversal . . . . . . . . . . . . . . . . . . . . . . . 11 5 Queue-based BFS Tree Traversal . . . . . . . . . . . . . . . . . . . . . . . . . 12 6 Tree Walk based Tree Traversal . . . . . . . . . . . . . . . . . . . . . . . . . . 13 7 Search algorithm for a binary search tree . . . . . . . . . . . . . . . . . . . . . 16 8 Find Next Open Spot - Numerical Technique . . . . . . . . . . . . . . . . . . 23 9 Find Next Open Spot - Walk Technique . . . . . . . . . . . . . . . . . . . . . 24 10 Heap Sort . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 11 Huffman Coding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 Code Samples List of Figures 1 A Binary Tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 2 A Binary Search Tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 3 Binary Search Tree Operations . . . . . . . . . . . . . . . . . . . . . . . . . 18 3 4 A min-heap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 5 Huffman Tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 Part I Trees Lecture Outlines CSCE 156 • Basic definitions • Tree Traversals CSCE 235 • Review of basic definitions • Huffman Coding CSCE 310 • Heaps, Heap Sort • Balanced BSTs: 2-3 Trees, AVL, Red-Black 1 Introduction Motivation: we want a data structure to store elements that offers efficient, arbitrary retrieval (search), insertion, and deletion. • Array-based Lists – O(n) insertion and deletion – Fast index-based retrieval – Efficient binary search if sorted • Linked Lists – Efficient, O(1) insert/delete for head/tail 4 – Inefficient, O(n) arbitrary search/insert/delete – Efficient binary search not possible without random access • Stacks and queues are efficient, but are restricted access data structures • Possible alternative: Trees • Trees have the potential to provide O(log n) efficiency for all operations 2 Definitions & Terminology • A tree is an acyclic graph • For our purposes: a tree is a collection of nodes (that can hold keys, data, etc.) that are connected by edges • Trees are also oriented : each node has a parent and children • A node with no parents is the root of the tree, all child nodes are oriented downward • Nodes not immediately connected can have an ancestor, descendant or cousin relation- ship • A node with no children is a leaf • A tree such that all nodes have at most two children is called a binary tree • A binary tree is also oriented horizontally: each node may have a left and/or a right child • Example: see Figure 1 • A path in a tree is a sequence nodes connected by edges • The length of a path in a tree is the number of edges in the path (which equals the number of nodes in the path minus one) • A path is simple if it does not traverse nodes more than once (this is the default type of path) • The depth of a node u is the length of the (unique) path from the root to u • The depth of the root is 0 • The depth of a tree is the maximal depth of any node in the tree (sometimes the term height is used) 5 • All nodes of the same depth are considered to be at the same level • A binary tree is complete (also called full or perfect) if all nodes are present at all levels 0 up to its depth d • A sub-tree rooted at a node u is the tree consisting of all descendants with u oriented as the root a b c d e f g h i j k l m n o p q r Figure 1: A Binary Tree Properties: • In a tree, all nodes are connected by exactly one unique path • The maximum number of nodes at any level k is 2k • Thus, the maximum number of nodes n for any binary tree of depth d is: d X n = 20 + 21 + 22 + · · · + 2d−1 + 2d = 2k = 2d+1 − 1 k=0 • Given a full binary tree with n nodes in it has depth: d = log (n + 1) − 1 • That is, d = O(log n) Motivation: if we can create a tree-based data structure with operations proportional to its depth, then we could potentially have a data structure that allows retrieval/search, insertion, and deletion in O(log n)-time. 6 3 Tree Traversal • Given a tree, we need a way to enumerate elements in a tree • Many algorithms exist to iterate over the elements in a tree • We’ll look at several variations on a depth-first-search 3.1 Preorder Traversal • A preorder traversal strategy visits nodes in the following order: root; left-sub-tree; right-sub-tree • An example traversal on the tree in Figure 1: a, b, d, g, l, m, r, h, n, e, i, o, c, f, j, p, q, k • Applications: – Building a tree, traversing a tree, copying a tree, etc. – Efficient stack-based implementation – Used in prefix notation (polish notation); used in languages such as Lisp/Scheme 3.2 Inorder Traversal • An inorder traversal strategy visits nodes in the following order: left-sub-tree; root; right-sub-tree • An example traversal on the tree in Figure 1: l, g, r, m, d, h, n, b, e, o, i, a, c, p, j, q, f, k • Applications: – Enumerating elements in order in a binary search tree – Expression trees 3.3 Postorder Traversal • A postorder traversal strategy visits nodes in the following order: left-sub-tree; right- sub-tree; root 7 • An example traversal on the tree in Figure 1: l, r, m, g, n, h, d, o, i, e, b, p, q, j, k, f, c, a • Applications: – Topological sorting – Destroying a tree when manual memory management is necessary (roots are the last thing that get cleaned up) – Reverse polish notation (operand-operand-operator, unambiguous, used in old HP calculators) – PostScript (Page Description Language) 3.4 Breadth-First Search Traversal • Breadth-First Search (BFS) traversal is a general graph traversal strategy that explores local or close nodes first before traversing “deeper” into the graph • When applied to an oriented binary tree, BFS explores the tree level-by-level (top-to- bottom, left-to-right) 3.5 Implementations & Data Structures • Reference based implementation: TreeNode<T> – Owns (through composition) references to: leftChild, rightChild, parent – Can use either sentinel nodes or null to indicate missing children and parent • BinaryTree<T> owns a root • SVN examples: unl.cse.bst 3.5.1 Preorder Implementations Input : A binary tree node u Output: A preorder traversal of the nodes in the subtree rooted at u 1 print u 2 preOrderTraversal(u → lef tChild) 3 preOrderTraversal(u → rightChild) Algorithm 1: Recursive Preorder Tree Traversal Stack-based implementation: 8 • Initially, we push the tree’s root into the stack • Within a loop, we pop the top of the stack and process it • We need to push the node’s children for future processing • Since a stack is LIFO, we push the right child first. Input : A binary tree, T Output: A preorder traversal of the nodes in T 1 S ← empty stack 2 push T ’s root onto S 3 while S is not empty do 4 node ← S.pop 5 push node’s right-child onto S 6 push node’s left-child onto S 7 process node 8 end Algorithm 2: Stack-based Preorder Tree Traversal 3.5.2 Inorder Implementation Stack-based implementation: • The same basic idea: push nodes onto the stack as you visit them • However, we want to delay processing the node until we’ve explored the left-sub-tree • We need a way to tell if we are visiting the node for the first time or returning from the left-tree exploration • To achieve this, we allow the node to be null • If null, then we are returning from a left-tree exploration, pop the top of the stack and process (then push the right child) • If not null, then we push it to the stack for later processing, explore the left child 9 Input : A binary tree, T Output: An inorder traversal of the nodes in T 1 S ← empty stack 2 u ← root 3 while S is not empty Or u 6= null do 4 if u 6= null then 5 push u onto S 6 u ← u.leftChild 7 else 8 u ← S.pop 9 process u 10 u ← u.rightChild 11 end 12 end Algorithm 3: Stack-based Inorder Tree Traversal 3.5.3 Postorder Implementation Stack-based implementation: • Same basic ideas, except that we need to distinguish if we’re visiting the node for the first time, second time or last (so that we can process it) • To achieve this, we keep track of where we came from: a parent, left, or right node • We keep track of a previous and a current node 10 Input : A binary tree, T Output: A postorder traversal of the nodes in T 1 S ← empty stack 2 prev ← null 3 push root onto S 4 while S is not empty do 5 curr ← S.peek 6 if prev = null Or prev.leftChild = curr Or prev.rightChild = curr then 7 if curr.leftChild 6= null then 8 push curr.leftChild onto S 9 else if curr.rightChild 6= null then 10 push curr.rightChild onto S 11 end 12 else if curr.leftChild = prev then 13 if curr.rightChild 6= null then 14 push curr.rightChild onto S 15 end 16 else 17 process curr 18 S.pop 19 end 20 prev ← curr 21 end Algorithm 4: Stack-based Postorder Tree Traversal 11 3.5.4 BFS Implementation Input : A binary tree, T Output: A BFS traversal of the nodes in T 1 Q ← empty queue 2 enqueue T ’s root into Q 3 while Q is not empty do 4 node ← Q.dequeue 5 enqueue node’s left-child onto Q 6 enqueue node’s right-child onto Q 7 print node 8 end Algorithm 5: Queue-based BFS Tree Traversal 3.5.5 Tree Walk Implementations • Simple rules based on local information: where you are and where you came from • No additional data structures required • Traversal is a “walk” around the perimeter of the tree • Can use similar rules to determine when the current node should be processed to achieve pre, in, and postorder traversals • Need to take care with corner cases (when current node is the root or children are missing) • Pseudocode presented Algorithm 6 3.6 Operations Basic Operations: • Search for a particular element/key • Adding an element – Add at most shallow available spot – Add at a random leaf 12 Input : A binary tree, T Output: A Tree Walk around T 1 curr ← root 2 prevT ype ← parent 3 while curr 6=null do 4 if prevT ype = parent then //preorder: process curr here 5 if curr.lef tChild exists then //Go to the left child: 6 curr ← curr.lef tChild 7 prevT ype ← parent 8 else 9 curr ← curr 10 prevT ype ← lef t 11 end 12 else if prevT ype = lef t then //inorder: process curr here 13 if curr.rightChild exists then //Go to the right child: 14 curr ← curr.rightChild 15 prevT ype ← parent 16 else 17 curr ← curr 18 prevT ype ← right 19 end 20 else if prevT ype = right then //postorder: process curr here 21 if curr.parent = null then //root has no parent, we’re done traversing 22 curr ← curr.parent //are we at the parent’s left or right child? 23 else if curr = curr.parent.lef tChild then 24 curr ← curr.parent 25 prevT ype ← lef t 26 else 27 curr ← curr.parent 28 prevT ype ← right 29 end 30 end 13 31 end Algorithm 6: Tree Walk based Tree Traversal – Add internally, shift nodes down by some criteria • Removing elements – Removing leaves – Removing elements with one child – Removing elements with two children Other Operations: • Compute the total number of nodes in a tree • Compute the total number of leaves in a tree • Given an item or node, compute its depth • Compute the depth of a tree 4 Binary Search Trees Regular binary search trees have little structure to their elements; search, insert, delete operations are still linear with respect to the number of tree nodes, O(n). We want a data structure with operations proportional to its depth, O(d). To this end, we add structure and order to tree nodes. • Each node has an associated key • Binary Search Tree Property: For every node u with key uk in T 1. Every node in the left-sub-tree of u has keys less than uk 2. Every node in the right-sub-tree of u has keys greater than uk • Duplicate keys can be handled, but you must be consistent and not guaranteed to be contiguous • Alternatively: do not allow duplicate keys or define a key scheme that ensures a total order • Inductive property: all sub-trees are also binary search trees • A full example can be found in Figure 2 14 50 40 60 20 45 90 10 24 49 80 95 2 15 28 48 75 85 12 Figure 2: A Binary Search Tree 4.1 Basic Operations Observation: a binary search tree has more structure: the key in each node provides infor- mation on where a node is not located. We will exploit this structure to achieve O(log n) operations. Search/retrieve • Goal: find a node (and its data) that matches a given key k • Start at the node • At each node u, compare k to u’s key, uk : – If equal, element found, stop and return – If k < uk , traverse to u’s left-child – If k > uk , traverse to u’s right-child • Traverse until the sub-tree is empty (element not found) • Analysis: number of comparisons is bounded by the depth of the tree, O(d) 15 Input : A binary search tree, T , a key k Output: The tree node u ∈ T whose key, uk matches k 1 u ← T ’s root 2 while u 6= φ do 3 if uk = k then 4 output u 5 end 6 else if uk > k then 7 u ← u’s left-child 8 else if uk < k then 9 u ← u’s left-child 10 end 11 output φ Algorithm 7: Search algorithm for a binary search tree Insert • Insert new nodes as leaves • To determine where it should be inserted: traverse the tree as above • Insert at the first available spot (first missing child node) • Analysis: finding the available location is O(d), inserting is just reference juggling, O(1) Delete • Need to first find the node u to delete, traverse as above, O(d) • If u is a leaf (no children): its safe to simply delete it • If u has one child, then we can “promote” it to u’s spot (u’s parent will now point to u’s child) • If u has two children, we need to find a way to preserve the BST property – Want to minimally change the tree’s structure – Need the operation to be efficient – Find the minimum element of the greater nodes (right sub-tree) or the maximal element of the lesser nodes (left sub-tree) – Such an element will have at most one child (which we know how to delete) 16 – Delete it and store off the key/data – Replace us key/data with the contents of the minimum/maximum element • Analysis: – Search/Find: O(d) – Finding the min/max: O(d) – Swapping: O(1) – In total: O(d) • Examples illustrated in Figure 3 5 Balanced Binary Search Trees Motivation: • Ideally, a full binary tree has depth d = O(log n) • General BSTs can degenerate such that d = O(n) • Need a way to preserve the BST property while at the same time limiting the depth to O(log n) • Solution: Balanced Binary Search Trees • BSTs that efficiently reorganize themselves such that the BST Property is preserved and that the depth is restricted to O(log n) • Some types of Balanced BSTs: – B-trees (and 2-3 trees) – AVL Trees – Red-Black Trees – Splay trees 5.1 2-3 Trees See 310 Note Set. 5.2 AVL Trees See 310 Note Set. 17 15 15 7 30 7 30 4 9 20 40 4 9 20 40 1 5 17 23 1 5 17 23 16 (a) A Binary Search Tree (b) Insertion of a new node (16) into a Binary Search Tree 15 9 7 30 7 30 4 9 20 40 4 20 40 1 5 17 23 1 5 17 23 (c) Deletion of a node with two children (15). (d) Node 15 is replaced with the extremal node, First step: find the maximum node in the left- preserving the BST property sub-tree (lesser elements). 9 9 7 30 4 30 4 20 40 1 5 20 40 1 5 17 23 17 23 (e) Deletion a node with only one child (7). (f) Removal is achieved by simply promoting the single child/subtree. Figure 3: Binary Search Tree Operations. Figure 3(b) depicts the insertion of (16) into the tree in Figure 3(a). Figures 3(c) and 3(d) depict the deletion of a node (15) with two children. Figures 3(e) and 3(f) depict the deletion of a node with only one child (7). 18 5.3 Red-Black Trees See 310 Note Set. 6 Optimal Binary Search Trees See 310 Note Set. 7 Heaps Definition 1. A heap is a binary tree that satisfies the following properties. 1. It is a full or complete binary tree: all nodes are present except possibly the last row 2. If the last row is not full, all nodes are full-to-the-left 3. It satisfies the Heap Property: every node has a key that is greater than both of its children (max-heap) • As a consequence of the Heap Property, the maximal element is always at the root • Alternatively, we can define a min-heap • Variations: 2-3 heaps, fibonacci heaps, etc. • A min-heap example can be found in Figure 4 1 5 30 50 55 40 80 60 53 72 70 65 45 90 95 92 63 Figure 4: A min-heap Applications 19 • Heaps are an optimal implementation of a priority queue • Used extensively in other algorithms (Heap Sort, Prim’s, Dijkstra’s, Huffman Coding, etc.) to ensure efficient operation 7.1 Operations Insert • Want to preserve the full-ness property and the Heap Property • Preserve full-ness: add new element at the end of the heap (last row, first free spot on the left) • Insertion at the end may violate the Heap Property • Heapify/fix: bubble up inserted element until Heap Property is satisfied • Analysis: insert is O(1); heapify: O(d) Remove from top • Want to preserve the full-ness property and the Heap Property • Preserve full-ness: swap root element with the last element in the heap (lowest row, right-most element) • Heap property may be violated • Heapify/fix: bubble new root element down until Heap Property is satisfied • Analysis: Swap is O(1); heapify: O(d) Others • Arbitrary remove • Find • Possible, but not ideal: Heaps are restricted-access data structures Analysis • All operations are O(d) • Since Heaps are full, d = O(log n) • Thus, all operations are O(log n) 20 7.2 Implementations Array-Based • Root is located at index 1 • If a node u is at index i, u’s left-child is at 2i, its right-child is at 2i + 1 • If node u is at index j, its parent is at index b 2j c • Alternatively: 0-index array left/right children/parent are at 2n + 1, 2n + 2, and b j−1 2 c • Advantage: easy implementation, all items are contiguous in the array (in fact, a BFS ordering!) • Disadvantage: Insert operation may force a reallocation, but this can be done in amortized-constant time (though may still have wasted space) Tree-Based • Reference-based tree (nodes which contain references to children/parent) • Parent reference is now required for efficiency • For efficiency, we need to keep track of the last element in the tree • For deletes/inserts: we need a way to find the last element and first “open spot” • We’ll focus on finding the first available open spot as the same technique can be used to find the last element with minor modifications 7.2.1 Finding the first available open spot in a Tree-based Heap Technique A: numerical technique • WLOG: assume we keep track of the number of nodes in the heap, n and thus the depth d = blog nc • If n = 2d+1 − 1 then the tree is full, the last element is all the way to the right, the first available spot is all the way to the left • Otherwise n < 2d+1 − 1 and the heap is not full (the first available spot is located at level d, root is at level 0) • Starting at the root, we want to know if the last element is in the left-subtree or the right subtree 21 • Let m = n − (2d − 1), the number of nodes present in level d d • If m ≥ 22 then the left-sub tree is full at the last level and so the next open spot would be in the right-sub tree d • Otherwise if m < 22 then the left-sub tree is not full at the last level and so the next open spot is in the left-sub tree • Traverse down to the left or right respectively and repeat: the resulting sub-tree will d have depth d − 1 with m = m (if traversing left) or m = m − 22 (if traversing right) • Repeat until we’ve found the first available spot • Analysis: in any case, its O(d) = O(log n) to traverse from the root to the first open spot 22 Input : A tree-based heap H with n nodes Output: The node whose child is the next available open spot in the heap 1 curr ← T.head 2 d ← blog nc 3 m←n 4 while curr has both children do 5 if m = 2d+1 − 1 then //remaining tree is full, traverse all the way left 6 while curr has both children do 7 curr ← curr.lef tChild 8 end 9 else //remaining tree is not full, determine if the next open spot is in the left or right sub-tree d 10 if m ≥ 22 then //left sub-tree is full 11 d ← (d − 1) d 12 m ← (m − 22 ) 13 curr ← curr.rightChild 14 else //left sub-tree is not full 15 d ← (d − 1) 16 m←m 17 curr ← curr.lef tChild 18 end 19 end 20 end 21 output curr Algorithm 8: Find Next Open Spot - Numerical Technique Technique B: Walk Technique • Alternatively, we can adapt the idea behind the tree walk algorithm to find the next available open spot • We’ll assume that we’ve kept track of the last node • If the tree is full, we simply traverse all the way to the left and insert, O(d) 23 • If the last node is a left-child then its parent’s right child is the next available spot, finding it is O(1) • Otherwise, we’ll need to traverse around the perimeter of the tree until we reach the next open slot Input : A tree-based heap H with n nodes Output: The node whose (missing) child is the next available open spot in the heap 1 d ← blog nc 2 if n = 2d+1 − 1 then //The tree is full, traverse all the way to the left 3 curr ← root 4 while curr.lef tChild 6= null do 5 curr ← curr.lef tChild 6 end 7 else if last is a left-child then //parent’s right child is open 8 curr ← last.parent 9 else //The open spot lies in a subtree to the right of the last node //Walk the tree until we reach it 10 curr ← last.parent 11 while curr is a right-child do 12 curr ← curr.parent 13 end //"turn" right 14 curr ← curr.parent 15 curr ← curr.rightChild //traverse all the way left 16 while curr.lef tChild 6= null do 17 curr ← curr.lef tChild 18 end 19 end //current node’s missing child is the open spot 20 output curr Algorithm 9: Find Next Open Spot - Walk Technique 24 7.3 Heap Sort • If min/max element is always at the top; simply insert all elements, then remove them all! • Perfect illustration of “Smart data structures and dumb code are a lot better than the other way around” Input : A collection of elements A = {a1 , . . . , an } Output: A collection, A0 of elements in A, sorted 1 H ← empty heap 2 A0 ← empty collection 3 foreach x ∈ A do 4 insert x into H 5 end 6 while H is not empty do 7 y ← remove top from H 8 Add y to the end of A0 9 end 10 output A0 Algorithm 10: Heap Sort Analysis • Amortized analysis: insert/remove operations are not constant throughout the algo- rithm • On first iteration: insert is d = O(1); on the i-th iteration, d = O(log i); only on the last iteration is insertion O(log n) • In total, the insert phase is: n X log i = O(n log n) i=1 • A similar lower bound can be shown • Same analysis applies to the remove phase: 1 X log i i=n • In total, O(n log n) 25 8 Java Collections Framework Java has support for several data structures supported by underlying tree structures. • java.util.PriorityQueue<E> is a binary-heap based priority queue – Priority (keys) based on either natural ordering or a provided Comparator – Guaranteed O(log n) time for insert (offer) and get top (poll) – Supports O(n) arbitrary remove(Object) and search (contains) methods • java.util.TreeSet<E> – Implements the SortedSet interface; makes use of a Comparator – Backed by TreeMap, a red-black tree balanced binary tree implementation – Guaranteed O(log n) time for add, remove, contains operations – Default iterator is an in-order traversal 9 Applications 9.1 Huffman Coding Overview • Coding Theory is the study and theory of codes—schemes for transmitting data • Coding theory involves efficiently padding out data with redundant information to increase reliability (detect or even correct errors) over a noisy channel • Coding theory also involves compressing data to save space – MP3s (uses a form of Huffman coding, but is information lossy) – jpegs, mpegs, even DVDs – pack (straight Huffman coding) – zip, gzip (uses a Ziv-Lempel and Huffman compression algorithm) Basics • Let Σ be a fixed alphabet of size n • A coding is a mapping of this alphabet to a collection of binary codewords, Σ → {0, 1}∗ 26 • A block encoding is a fixed length encoding scheme where all codewords have the same length (example: ASCII); requires dlog2 ne length codes • Not all symbols have the same frequency, alternative: variable length encoding • Intuitively: assign shorter codewords to more frequent symbols, longer to less frequent symbols • Reduction in the overall average codeword length • Variable length encodings must be unambiguous • Solution: prefix free codes: a code in which no whole codeword is the prefix of another (other than itself of course). • Examples: – {0, 01, 101, 010} is not a prefix free code. – {10, 010, 110, 0110} is a prefix free code. • A simple way of building a prefix free code is to associate codewords with the leaves of a binary tree (not necessarily full). • Each edge corresponds to a bit, 0 if it is to the left sub-child and 1 to the right sub-child. • Since no simple path from the root to any leaf can continue to another leaf, then we are guaranteed a prefix free coding. • Using this idea along with a greedy encoding forms the basis of Huffman Coding Steps • Consider a precomputed relative frequency function: freq : Σ → [0, 1] • Build a collection of weighted trees Tx for each symbol x ∈ Sigma with wt(Tx ) = freq(x) • Combine the two least weighted trees via a new node; associate a new weight (the sum of the weights of the two subtrees) • Keep combining until only one tree remains • The tree constructed in Huffman’s algorithm is known as a Huffman Tree and it defines a Huffman Code 27 Input : An alphabet of symbols, Σ with relative frequencies, freq(x) Output: A Huffman Tree 1 H ← new min-heap 2 foreach x ∈ Σ do 3 Tx ← single node tree 4 wt(Tx ) ← freq(x) 5 insert Tx into H 6 end 7 while size of H > 1 do 8 Tr ← new tree root node 9 Ta ← H.getM in 10 Tb ← H.getM in 11 Tr .lef tChild ← Ta 12 Tr .rightChild ← Tb 13 wt(r) ← wt(Ta ) + wt(Tb ) 14 insert Tr into H 15 end 16 output H.getM in Algorithm 11: Huffman Coding 9.1.1 Example Construct the Huffman Tree and Huffman Code for a file with the following content. character A B C D E F G frequency 0.10 0.15 0.34 .05 .12 .21 .03 • Average codeword length: .10 · 3 + .15 · 3 + .34 · 2 + .05 · 4 + .12 · 3 + .21 · 2 + .03 · 4 = 2.53 • Compression ratio: (3 − 2.53) = 15.67% 3 • In general, for text files, pack (Huffman Coding), claims an average compression ratio of 25-40%. • Degenerative cases: 28 .08 0 1 .39 .61 0 1 0 1 .18 F : .21 .27 C : .34 0 1 0 1 .08 A : .10 E : .12 B : .15 0 1 G : .03 D : .05 Figure 5: Huffman Tree – When the probability distribution is uniform: p(x) = p(y) for all x, y ∈ Σ – When the probability distribution follows a fibonacci sequence (the sum of each of the two smallest probabilities is less than equal to the next highest probability for all probabilities) A Stack-based Traversal Simulations A.1 Preorder Simulation of a stack-based preorder traversal of the binary tree in Figure 1. push a (enter loop) pop, node = e push f (enter loop) pop, node = r push i (no push) (enter loop) pop, node = g (no push) (no push) print c pop, node = a push m (no push) print e push c push l print r (enter loop) push b print g (enter loop) pop, node = f print a (enter loop) pop, node = i push k (enter loop) pop, node = h (no push) push j (enter loop) pop, node = l push n push o print f pop, node = b (no push) (no push) print i push e (no push) print h (enter loop) push d print l (enter loop) pop, node = j print b (enter loop) pop, node = o push q (enter loop) pop, node = n (no push) push p (enter loop) pop, node = m (no push) (no push) print j pop, node = d (no push) (no push) print o push h push r print n (enter loop) push g print m (enter loop) pop, node = p print d (enter loop) pop, node = c (no push) 29 (no push) (enter loop) (no push) (no push) print p pop, node = q (no push) (enter loop) (no push) print q pop, node = k print k A.2 Inorder Simulation of a stack-based inorder traversal of the binary tree in Figure 1. (enter loop, u = a) (enter loop, update u = null u=p push a u = null) (enter loop, process p update u = b pop r, update (enter loop, u = null) update u = null u=r u = null) pop i, update (enter loop, u = b) process r pop b, update u=i (enter loop, push b update u = null u=b process i u = null) update u = d process b update u = null pop j, update (enter loop, update u = e u=j (enter loop, u = d) u = null) (enter loop, process j push d pop m, update (enter loop, u = e) u = null) update u = q update u = g u=m push e pop a, update process m update u = null u=a (enter loop, u = q) (enter loop, u = g) update u = null process a push q push g (enter loop, update u = c update u = null update u = l (enter loop, u = null) u = null) pop e, update (enter loop, u = c) (enter loop, (enter loop, u = l) pop d, update u=e push c u = null) push l u=d process e update u = null pop q, update update u = null process d update u = i u=q update u = h (enter loop, process q (enter loop, (enter loop, u = i) u = null) update u = null u = null) (enter loop, u = h) push i pop c, update pop l, update u = l push h update u = o u=c (enter loop, process l update u = null process c u = null) update u = null (enter loop, u = o) update u = f pop f , update (enter loop, push o u=f (enter loop, u = null) update u = null (enter loop, u = f ) process f u = null) pop h, update push f update u = k pop g, update u=h (enter loop, update u = j u=g process h u = null) (enter loop, u = k) process g update u = n pop o, update (enter loop, u = j) push k update u = m u=o push j update u = null (enter loop, u = n) process o update u = p (enter loop, u = m) push n update u = null (enter loop, push m update u = null (enter loop, u = p) u = null) update u = r (enter loop, push p pop k, update (enter loop, u = null) update u = null u=k (enter loop, u = r) u = null) pop i, update process k push r pop n, update u=i (enter loop, update u = null update u = null u=n process i u = null) process n update u = null pop p, update (done) 30 A.3 Postorder Simulation of a stack-based postorder traversal of the binary tree in Figure 1: prev = null process l update prev = m update curr = (n) push a update prev = l check: (enter loop) prev.rightChild = curr (enter loop) (enter loop) update curr = (m) (null.rightChild = (n)) update curr = (a) update curr = (g) check: process n check: prev = null check: prev.rightChild = curr update prev = n push (b) prev.rightChild = curr (null.rightChild = (m)) update prev = a (null.rightChild = (g)) process m (enter loop) check: update prev = m update curr = (h) (enter loop) curr.lef tChild = prev check: update curr = (b) ((l).lef tChild = (l)) (enter loop) prev.rightChild = curr check: push (m) update curr = (g) (null.rightChild = (h)) prev.lef tChild = curr update prev = g check: process h ((b).lef tChild = (b)) prev.rightChild = curr update prev = h push (d) (enter loop) (null.rightChild = (g)) update prev = b update curr = (m) process g (enter loop) check: update prev = g update curr = (d) (enter loop) prev.rightChild = curr check: update curr = (d) ((m).rightChild = (m)) (enter loop) prev.rightChild = curr check: push (r) update curr = (d) ((n).rightChild = (d)) prev.lef tChild = curr update prev = m check: process d ((d).lef tChild = (d)) prev.rightChild = curr update prev = d push (g) (enter loop) ((m).rightChild = (d)) update prev = d update curr = (r) check: (enter loop) check: curr.lef tChild = prev update curr = (b) (enter loop) prev.lef tChild = curr ((g).lef tChild = (g)) check: update curr = (g) ((r).lef tChild = (r)) push (h) prev.rightChild = curr check: (noop) update prev = d ((h).rightChild = (b)) prev.lef tChild = curr update prev = r check: ((g).lef tChild = (g)) (enter loop) curr.lef tChild = prev push (l) (enter loop) update curr = (h) ((d).lef tChild = (d)) update prev = g update curr = (r) check: push (e) check: prev.rightChild = curr update prev = b (enter loop) prev.rightChild = curr ((h).rightChild = (h)) update curr = (l) (null.rightChild = (r)) push (n) (enter loop) check: process r update prev = h update curr = (e) prev.lef tChild = curr update prev = r check: ((l).lef tChild = (l)) (enter loop) prev.rightChild = curr (noop) (enter loop) update curr = (n) ((e).rightChild = (e)) update prev = l update curr = (m) check: push (i) check: prev.rightChild = curr update prev = e (enter loop) prev.rightChild = curr ((n).rightChild = (n)) update curr = (l) (null.rightChild = (m)) (noop) (enter loop) check: check: update prev = n update curr = (i) prev.rightChild = curr curr.lef tChild = prev check: (null.rightChild = (l)) ((r).lef tChild = (r)) (enter loop) prev.rightChild = curr 31 ((i).rightChild = (i)) check: ((q).rightChild = (f )) push (o) prev.rightChild = curr (enter loop) check: update prev = i ((i).rightChild = (b)) update curr = (p) curr.lef tChild = prev process b check: ((j).lef tChild = (j)) (enter loop) update prev = b prev.rightChild = curr push (k) update curr = (o) (null.rightChild = (p)) update prev = f check: (enter loop) process p prev.lef tChild = curr update curr = (a) update prev = p (enter loop) ((o).lef tChild = (o)) check: update curr = (k) (noop) prev.rightChild = curr (enter loop) check: update prev = o ((e).rightChild = (a)) update curr = (j) prev.rightChild = curr check: check: ((k).rightChild = (k)) (enter loop) curr.lef tChild = prev prev.rightChild = curr (noop) update curr = (o) ((b).lef tChild = (b)) (null.rightChild = (j)) update prev = k check: push (c) check: prev.rightChild = curr update prev = a curr.lef tChild = prev (enter loop) (null.rightChild = (o)) ((p).lef tChild = (p)) update curr = (k) process o (enter loop) push (q) check: update prev = o update curr = (c) update prev = j prev.rightChild = curr check: (null.rightChild = (k)) (enter loop) prev.rightChild = curr (enter loop) process k update curr = (i) ((c).rightChild = (c)) update curr = (q) update prev = k check: push (f ) check: prev.rightChild = curr update prev = c prev.rightChild = curr (enter loop) (null.rightChild = (i)) ((q).rightChild = (q)) update curr = (f ) check: (enter loop) (noop) check: curr.lef tChild = prev update curr = (f ) update prev = q prev.rightChild = curr ((o).lef tChild = (o)) check: (null.rightChild = (f )) update prev = i prev.rightChild = curr (enter loop) process f ((f ).rightChild = (f )) update curr = (q) update prev = f (enter loop) push (j) check: update curr = (i) update prev = f prev.rightChild = curr (enter loop) check: (null.rightChild = (q)) update curr = (c) prev.rightChild = curr (enter loop) process q check: (null.rightChild = (i)) update curr = (j) update prev = q prev.rightChild = curr process i check: ((k).rightChild = (c)) update prev = i prev.lef tChild = curr (enter loop) process c ((j).lef tChild = (j)) update curr = (j) update prev = c (enter loop) push (p) check: update curr = (e) update prev = j prev.rightChild = curr (enter loop) check: (null.rightChild = (j)) update curr = (a) prev.rightChild = curr (enter loop) process j check: (null.rightChild = (e)) update curr = (p) update prev = j prev.rightChild = curr process e check: ((f ).rightChild = (a)) update prev = e prev.lef tChild = curr (enter loop) process a ((p).lef tChild = (p)) update curr = (f ) update prev = a (enter loop) (noop) check: update curr = (b) update prev = p prev.rightChild = curr 32