CSC 301 Hebrew University Java Data Structures Computer Programming Task

Description

This is for a Data Structures class.

Here is the .java file: https://www.dropbox.com/s/8olmtlxds2zu9ct/MyDigraph.java?dl=0

Here is the folder with all the files: https://www.dropbox.com/s/mszjm6o6ed43xnr/algs4.zi...

https://www.dropbox.com/s/8olmtlxds2zu9ct/MyDigrap...Try this.

Unformatted Attachment Preview

CSC 301/403 - Data Structures II
Lecture 8
Dr. David Zaretsky, david.zaretsky@depaul.edu
CSC 301 – DePaul University, 1/3/21

Today’s Topics
• Minimum Spanning Trees
• Edge-weighted graph API
• Greedy algorithm
• Kruskal's algorithm
• Prim's algorithm

Minimum Spanning Trees
• edge-weighted graph API
• greedy algorithm
• Kruskal's algorithm
• Prim's algorithm
• advanced topics

Minimum spanning tree
• Given. Undirected graph G with positive edge weights (connected).
• Def. A spanning tree of G is a subgraph T that is connected and acyclic.
• Goal. Find a min weight spanning tree.
• Brute force. Try all spanning trees?
[Figures: graph G; a subgraph that is not connected; a subgraph that is not acyclic; a spanning tree T with cost = 50 = 4 + 6 + 8 + 5 + 11 + 9 + 7]

Applications
MST is a fundamental problem with diverse applications.
• Dithering.
• Cluster analysis.
• Max bottleneck paths.
• Real-time face verification.
• LDPC codes for error correction.
• Image registration with Renyi entropy.
• Find road networks in satellite and aerial imagery.
• Reducing data storage in sequencing amino acids in a protein.
• Model locality of particle interactions in turbulent fluid flows.
• Autoconfig protocol for Ethernet bridging to avoid cycles in a network.
• Approximation algorithms for NP-hard problems (e.g., TSP, Steiner tree).
• Network design (communication, electrical, hydraulic, cable, computer, road).
http://www.ics.uci.edu/~eppstein/gina/mst.html

Weighted edge API
• Edge abstraction needed for weighted edges.
• Idiom for processing an edge e: int v = e.either(), w = e.other(v);

public class Edge implements Comparable<Edge>
    Edge(int v, int w, double weight)    create a weighted edge v-w
    int either()                         either endpoint
    int other(int v)                     the endpoint that's not v
    int compareTo(Edge that)             compare this edge to that edge
    double weight()                      the weight
    String toString()                    string representation

Weighted edge: Java implementation

    public class Edge implements Comparable<Edge> {
        private final int v, w;
        private final double weight;

        // constructor
        public Edge(int v, int w, double weight) {
            this.v = v;
            this.w = w;
            this.weight = weight;
        }

        // either endpoint
        public int either() { return v; }

        // other endpoint
        public int other(int vertex) {
            if (vertex == v) return w;
            else return v;
        }

        // compare edges by weight
        public int compareTo(Edge that) {
            if      (this.weight < that.weight) return -1;
            else if (this.weight > that.weight) return +1;
            else                                return  0;
        }
    }

Edge-weighted graph API

public class EdgeWeightedGraph
    EdgeWeightedGraph(int V)     create an empty graph with V vertices
    EdgeWeightedGraph(In in)     create a graph from input stream
    void addEdge(Edge e)         add weighted edge e to this graph
    Iterable<Edge> adj(int v)    edges incident to v
    Iterable<Edge> edges()       all edges in this graph
    int V()                      number of vertices
    int E()                      number of edges
    String toString()            string representation

• Conventions. Allow self-loops and parallel edges.
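As a rough illustration of the edge idiom and of why Edge implements Comparable, here is a minimal self-contained sketch. The Edge class below is a trimmed stand-in for the one in the slides (it adds weight() and toString() from the API listing), and EdgeDemo and sortedByWeight are hypothetical names introduced only for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class EdgeDemo {
    // Hypothetical helper: returns a copy of the edges in ascending weight order.
    static List<Edge> sortedByWeight(List<Edge> edges) {
        List<Edge> copy = new ArrayList<>(edges);
        Collections.sort(copy);              // relies on Edge.compareTo
        return copy;
    }

    public static void main(String[] args) {
        Edge e = new Edge(4, 5, 0.35);
        int v = e.either(), w = e.other(v);  // the idiom from the slides
        System.out.println("endpoints: " + v + " and " + w);

        List<Edge> edges = new ArrayList<>();
        edges.add(e);
        edges.add(new Edge(0, 7, 0.16));
        edges.add(new Edge(5, 7, 0.28));
        for (Edge x : sortedByWeight(edges))
            System.out.println(x);
    }
}

// Stand-in for the slides' Edge class (assumption: weight() and toString()
// behave as described in the API listing).
class Edge implements Comparable<Edge> {
    private final int v, w;
    private final double weight;

    Edge(int v, int w, double weight) {
        this.v = v;
        this.w = w;
        this.weight = weight;
    }

    int either() { return v; }

    int other(int vertex) { return vertex == v ? w : v; }

    double weight() { return weight; }

    @Override
    public int compareTo(Edge that) {
        return Double.compare(this.weight, that.weight);
    }

    @Override
    public String toString() { return v + "-" + w + " " + weight; }
}
```

Because an edge does not record which endpoint is "first", client code asks for either() and then other(v); ordering by weight is what later lets a priority queue or a sort hand back the lightest edge.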
Edge-weighted graph: adjacency-lists representation
• Maintain vertex-indexed array of Edge lists.

tinyEWG.txt (V = 8, E = 16):

    8
    16
    4 5 0.35
    4 7 0.37
    5 7 0.28
    0 7 0.16
    1 5 0.32
    0 4 0.38
    2 3 0.17
    1 7 0.19
    0 2 0.26
    1 2 0.36
    1 3 0.29
    2 7 0.34
    6 2 0.40
    3 6 0.52
    6 0 0.58
    6 4 0.93

[Figure: edge-weighted graph representation. adj[] is a vertex-indexed array of Bag objects; each edge appears in the bags of both of its endpoints, and the two entries are references to the same Edge object.]

Edge-weighted graph: adjacency-lists implementation
• Identical to Graph.java but use Edge adjacency sets instead of int.
• Parallel edges and self-loops allowed.

    public class EdgeWeightedGraph {
        private final int V;
        private final Bag<Edge>[] adj;   // same as Graph, but adjacency lists of Edges instead of integers

        // constructor
        public EdgeWeightedGraph(int V) {
            this.V = V;
            adj = (Bag<Edge>[]) new Bag[V];
            for (int v = 0; v < V; v++)
                adj[v] = new Bag<Edge>();
        }

        // add edge to both adjacency lists
        public void addEdge(Edge e) {
            int v = e.either(), w = e.other(v);
            adj[v].add(e);
            adj[w].add(e);
        }

        public Iterable<Edge> adj(int v) { return adj[v]; }
    }

Minimum spanning tree API
• Q. How to represent the MST?

public class MST
    MST(EdgeWeightedGraph G)    constructor
    Iterable<Edge> edges()      edges in MST
    double weight()             weight of MST

    % java MST tinyEWG.txt
    0-7 0.16
    1-7 0.19
    0-2 0.26
    2-3 0.17
    5-7 0.28
    4-5 0.35
    6-2 0.40
    1.81
Test client for the MST API:

    public static void main(String[] args) {
        In in = new In(args[0]);
        EdgeWeightedGraph G = new EdgeWeightedGraph(in);
        MST mst = new MST(G);
        for (Edge e : mst.edges())
            StdOut.println(e);
        StdOut.printf("%.2f\n", mst.weight());
    }

Cut property
• Simplifying assumptions. Edge weights are distinct; graph is connected.
• Def. A cut in a graph is a partition of its vertices into two (nonempty) sets. A crossing edge connects a vertex in one set with a vertex in the other.
• Cut property. Given any cut, the crossing edge of min weight is in the MST.
• Q. Given a cut, why must the MST contain at least one crossing edge?
• A. Otherwise, it would not be connected.
[Figure: cut property. Crossing edges separating gray from white vertices are drawn in red; the minimum-weight crossing edge e must be in the MST.]

Cut property: correctness proof
Proof. Let e be the min-weight crossing edge in cut.
• Suppose e is not in the MST.
• Adding e to the MST creates a cycle.
• Some other edge f in cycle must be a crossing edge.
• Removing f and adding e is also a spanning tree.
• Since weight of e is less than the weight of f, that spanning tree is lower weight.
• Contradiction.
[Figure: the MST does not contain e; adding e to the MST creates a cycle through f.]

Greedy MST algorithm demo
Greedy algorithm.
• Start with all edges colored gray.
• Find a cut with no black crossing edges, and color its min-weight edge black.
• Continue until V - 1 edges are colored black.

Greedy MST algorithm: correctness proof
• Proposition. The greedy algorithm computes the MST.
• Proof.
  • Any edge colored black is in the MST (via cut property).
  • If fewer than V - 1 black edges, there exists a cut with no black crossing edges (consider cut whose vertices are one connected component).
[Figure: fewer than V - 1 edges colored black; a cut with no black crossing edges.]

Greedy MST algorithm: efficient implementations
• Proposition. The following algorithm computes the MST:
  • Start with all edges colored gray.
  • Find a cut with no black crossing edges, and color its min-weight edge black.
  • Continue until V - 1 edges are colored black.
• Efficient implementations. How to choose cut? How to find min-weight edge?
  • Ex 1. Kruskal's algorithm. [stay tuned]
  • Ex 2. Prim's algorithm. [stay tuned]
  • Ex 3. Borůvka's algorithm.

Removing two simplifying assumptions
• Q. What if edge weights are not all distinct?
• A. Greedy MST algorithm still correct if equal weights are present! (our correctness proof fails, but that can be fixed)
• Q. What if graph is not connected?
• A. Compute minimum spanning forest = MST of each component.
[Figures: no MST if graph is not connected; can independently compute MSTs of components.]

Kruskal's algorithm demo
Kruskal's algorithm.
• Consider edges in ascending order of weight.
• Add the next edge to the tree T unless doing so would create a cycle.

Kruskal's algorithm: visualization
[Figure: Kruskal's algorithm visualization]

Kruskal's algorithm: correctness proof
• Proposition. Kruskal's algorithm computes the MST.
• Proof.
Kruskal's algorithm is a special case of the greedy MST algorithm.
• Suppose Kruskal's algorithm colors the edge e = v–w black.
• Cut = set of vertices connected to v in tree T.
• No crossing edge is black.
• No crossing edge has lower weight. Why?
[Figure: add edge to tree]

Kruskal's algorithm: implementation challenge
• Challenge. Would adding edge v–w to tree T create a cycle? If not, add it.
• How difficult?
  • E + V
  • V         (run DFS from v, check if w is reachable; T has at most V – 1 edges)
  • log V
  • log* V    (use the union-find data structure!)
  • 1
• Efficient solution. Use the union-find data structure.
  • Maintain a set for each connected component in T.
  • If v and w are in same set, then adding v–w would create a cycle.
  • To add v–w to T, merge sets containing v and w.
[Figure: Case 1: adding v–w creates a cycle. Case 2: add v–w to T and merge sets containing v and w.]

Kruskal's algorithm: Java implementation

    public class KruskalMST {
        private Queue<Edge> mst = new Queue<Edge>();

        public KruskalMST(EdgeWeightedGraph G) {
            // build priority queue
            MinPQ<Edge> pq = new MinPQ<Edge>();
            for (Edge e : G.edges())
                pq.insert(e);

            // greedily add edges to MST
            UF uf = new UF(G.V());
            while (!pq.isEmpty() && mst.size() < G.V()-1) {
                Edge e = pq.delMin();
                int v = e.either(), w = e.other(v);
                if (!uf.connected(v, w)) {    // edge v–w does not create cycle
                    uf.union(v, w);           // merge sets
                    mst.enqueue(e);           // add edge to MST
                }
            }
        }

        public Iterable<Edge> edges() { return mst; }
    }

Kruskal's algorithm: running time
• Proposition. Kruskal's algorithm computes MST in time proportional to E log E (in the worst case).
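One way to see where the E log E term comes from is a self-contained sketch that sorts the edges up front and then performs one union-find operation per edge. This is only an illustration using java.util rather than the algs4 classes (MinPQ, UF, Queue) from the slides; KruskalSketch, WEdge, UnionFind, and tinyEWG() are hypothetical names, and sorting replaces the priority queue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class KruskalSketch {
    // Hypothetical stand-in for the slides' Edge: endpoints v, w and a weight.
    static final class WEdge {
        final int v, w;
        final double weight;
        WEdge(int v, int w, double weight) { this.v = v; this.w = w; this.weight = weight; }
    }

    // Weighted quick-union with path compression (by halving).
    static final class UnionFind {
        private final int[] parent, size;

        UnionFind(int n) {
            parent = new int[n];
            size = new int[n];
            for (int i = 0; i < n; i++) { parent[i] = i; size[i] = 1; }
        }

        int find(int p) {
            while (p != parent[p]) {
                parent[p] = parent[parent[p]];   // point p at its grandparent
                p = parent[p];
            }
            return p;
        }

        // Merges the sets containing p and q; returns false if they were already merged.
        boolean union(int p, int q) {
            int rp = find(p), rq = find(q);
            if (rp == rq) return false;
            if (size[rp] < size[rq]) { int t = rp; rp = rq; rq = t; }
            parent[rq] = rp;                     // smaller tree hangs under larger
            size[rp] += size[rq];
            return true;
        }
    }

    // Returns the MST edges in the order Kruskal's algorithm adds them.
    static List<WEdge> mst(int V, WEdge[] edges) {
        WEdge[] sorted = edges.clone();
        Arrays.sort(sorted, (a, b) -> Double.compare(a.weight, b.weight));  // the E log E step
        UnionFind uf = new UnionFind(V);
        List<WEdge> tree = new ArrayList<>();
        for (WEdge e : sorted) {
            if (tree.size() == V - 1) break;          // tree is complete
            if (uf.union(e.v, e.w)) tree.add(e);      // skip edges that would close a cycle
        }
        return tree;
    }

    static double totalWeight(List<WEdge> tree) {
        double sum = 0.0;
        for (WEdge e : tree) sum += e.weight;
        return sum;
    }

    // The 16 edges of tinyEWG.txt from the slides (8 vertices).
    static WEdge[] tinyEWG() {
        return new WEdge[] {
            new WEdge(4, 5, 0.35), new WEdge(4, 7, 0.37), new WEdge(5, 7, 0.28), new WEdge(0, 7, 0.16),
            new WEdge(1, 5, 0.32), new WEdge(0, 4, 0.38), new WEdge(2, 3, 0.17), new WEdge(1, 7, 0.19),
            new WEdge(0, 2, 0.26), new WEdge(1, 2, 0.36), new WEdge(1, 3, 0.29), new WEdge(2, 7, 0.34),
            new WEdge(6, 2, 0.40), new WEdge(3, 6, 0.52), new WEdge(6, 0, 0.58), new WEdge(6, 4, 0.93)
        };
    }

    public static void main(String[] args) {
        List<WEdge> tree = mst(8, tinyEWG());
        for (WEdge e : tree)
            System.out.printf("%d-%d %.2f%n", e.v, e.w, e.weight);
        System.out.printf("%.2f%n", totalWeight(tree));
    }
}
```

On the tinyEWG data this selects seven edges with total weight 1.81, the same total as the sample MST output in the slides, although the edges come out in ascending weight order rather than in the order shown there.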
• Proof.

    operation     frequency    time per op
    build pq      1            E
    delete-min    E            log E
    union         V            log* V †
    connected     E            log* V †

    † amortized bound using weighted quick union with path compression

• Remark. If edges are already sorted, order of growth is E log* V.

Prim's algorithm demo
Prim's algorithm.
• Start with vertex 0 and greedily grow tree T.
• At each step, add to T the min weight edge with exactly one endpoint in T.

Prim’s algorithm: visualization
[Figure: Prim's algorithm visualization]

Prim's algorithm: proof of correctness
• Proposition. Prim's algorithm computes the MST.
• Pf. Prim's algorithm is a special case of the greedy MST algorithm.
  • Suppose edge e = min weight edge connecting a vertex on the tree to a vertex not on the tree.
  • Cut = set of vertices connected on tree.
  • No crossing edge is black.
  • No crossing edge has lower weight.
[Figure: edge e = 7-5 added to tree]

Prim's algorithm: implementation
• Challenge. Find the min weight edge with exactly one endpoint in T.
• How difficult?
  • E         (try all edges)
  • V
  • log E     (use a priority queue!)
  • log* E
  • 1

Prim's algorithm: lazy implementation
• Lazy solution. Maintain a PQ of edges with (at least) one endpoint in T.
  • Key = edge; priority = weight of edge.
  • Delete-min to determine next edge e = v–w to add to T.
  • Disregard if both endpoints v and w are in T.
  • Otherwise, let v be the vertex not in T:
    • add to PQ any edge incident to v (assuming other endpoint not in T)
    • add v to T
[Figure: 1-7 is the min weight edge with exactly one endpoint in T. Priority queue of crossing edges: 1-7 0.19, 0-2 0.26, 5-7 0.28, 2-7 0.34, 4-7 0.37, 0-4 0.38, 6-0 0.58.]

Prim's algorithm: lazy implementation

    public class LazyPrimMST {
        private boolean[] marked;    // MST vertices
        private Queue<Edge> mst;     // MST edges
        private MinPQ<Edge> pq;      // PQ of edges

        public LazyPrimMST(EdgeWeightedGraph G) {
            pq = new MinPQ<Edge>();
            mst = new Queue<Edge>();
            marked = new boolean[G.V()];
            visit(G, 0);                                // assume G is connected

            while (!pq.isEmpty()) {
                Edge e = pq.delMin();                   // repeatedly delete the min weight edge e = v–w from PQ
                int v = e.either(), w = e.other(v);
                if (marked[v] && marked[w]) continue;   // ignore if both endpoints in T
                mst.enqueue(e);                         // add edge e to tree
                if (!marked[v]) visit(G, v);            // add v or w to tree
                if (!marked[w]) visit(G, w);
            }
        }

        private void visit(EdgeWeightedGraph G, int v) {
            marked[v] = true;                           // add v to T
            for (Edge e : G.adj(v))                     // for each edge e = v–w,
                if (!marked[e.other(v)])                // add to PQ if w not already in T
                    pq.insert(e);
        }

        public Iterable<Edge> mst() { return mst; }
    }

Lazy Prim's algorithm: running time
• Proposition. Lazy Prim's algorithm computes the MST in time proportional to E log E and extra space proportional to E (in the worst case).
• Pf.

    operation     frequency    binary heap
    delete min    E            log E
    insert        E            log E

Prim's algorithm: eager implementation
• Challenge. Find min weight edge with exactly one endpoint in T.
• Eager solution. Maintain a PQ of vertices connected by an edge to T, where priority of vertex v = weight of shortest edge connecting v to T.
  • Delete min vertex v and add its associated edge e = v–w to T.
  • Update PQ by considering all edges e = v–x incident to v:
    • ignore if x is already in T
    • add x to PQ if not already on it
    • decrease priority of x if v–x becomes shortest edge connecting x to T
[Figure: for each vertex, the edge connecting it to T and that edge's weight (red: on PQ; black: on MST).]

Prim's algorithm: eager demo
• Use IndexMinPQ: key = edge weight, index = vertex. (eager version has at most one PQ entry per vertex)

Indexed priority queue
Associate an index between 0 and N - 1 with each key in a priority queue.
• Client can insert and delete-the-minimum.
• Client can change the key by specifying the index.

public class IndexMinPQ<Key extends Comparable<Key>>
    IndexMinPQ(int N)                   create indexed priority queue with indices 0, 1, …, N-1
    void insert(int k, Key key)         associate key with index k
    void decreaseKey(int k, Key key)    decrease the key associated with index k
    boolean contains(int k)             is k an index on the priority queue?
    int delMin()                        remove a minimal key and return its associated index
    boolean isEmpty()                   is the priority queue empty?
    int size()                          number of entries in the priority queue

Indexed priority queue implementation
Implementation.
• Start with same code as MinPQ.
• Maintain parallel arrays keys[], pq[], and qp[] so that:
  • keys[i] is the priority of i
  • pq[i] is the index of the key in heap position i
  • qp[i] is the heap position of the key with index i
• Use swim(qp[k]) to implement decreaseKey(k, key).

    i          0  1  2  3  4  5  6  7  8
    keys[i]    A  S  O  R  T  I  N  G  -
    pq[i]      -  0  6  7  2  1  5  4  3
    qp[i]      1  5  4  8  7  6  2  3  -

[Figure: the corresponding heap, positions 1 to 8: A, N, G, O, S, I, T, R.]

Prim's algorithm: running time
• Depends on PQ implementation: V insert, V delete-min, E decrease-key.
    PQ implementation                       insert       delete-min    decrease-key    total
    array                                   1            V             1               V^2
    binary heap                             log V        log V         log V           E log V
    d-way heap (Johnson 1975)               d log_d V    d log_d V     log_d V         E log_{E/V} V
    Fibonacci heap (Fredman-Tarjan 1984)    1 †          log V †       1 †             E + V log V

    † amortized

Bottom line.
• Array implementation optimal for dense graphs.
• Binary heap much faster for sparse graphs.
• 4-way heap worth the trouble in performance-critical situations.
• Fibonacci heap best in theory, but not worth implementing.

Does a linear-time MST algorithm exist?
Deterministic compare-based MST algorithms:

    year    worst case               discovered by
    1975    E log log V              Yao
    1976    E log log V              Cheriton-Tarjan
    1984    E log* V, E + V log V    Fredman-Tarjan
    1986    E log (log* V)           Gabow-Galil-Spencer-Tarjan
    1997    E α(V) log α(V)          Chazelle
    2000    E α(V)                   Chazelle
    2002    optimal                  Pettie-Ramachandran
    20xx    E                        ???

• Remark. Linear-time randomized MST algorithm (Karger-Klein-Tarjan 1995).

Euclidean MST
• Given N points in the plane, find MST connecting them, where the distances between point pairs are their Euclidean distances.
• Brute force. Compute ~ N^2 / 2 distances and run Prim's algorithm.
• Ingenuity. Exploit geometry and do it in ~ c N log N.

Scientific application: clustering
• k-clustering. Divide a set of objects into k coherent groups.
• Distance function. Numeric value specifying "closeness" of two objects.
• Goal. Divide into clusters so that objects in different clusters are far apart.
• Applications.
  • Routing in mobile ad hoc networks.
  • Document categorization for web search.
  • Similarity searching in medical image databases.
  • Skycat: cluster 10^9 sky objects into stars, quasars, galaxies.
[Figure: outbreak of cholera deaths in London in 1850s (Nina Mishra)]

Single-link clustering
• k-clustering. Divide a set of objects into k coherent groups.
• Distance function.
  Numeric value specifying "closeness" of two objects.
• Single link. Distance between two clusters equals the distance between the two closest objects (one in each cluster).
• Single-link clustering. Given an integer k, find a k-clustering that maximizes the distance between two closest clusters.
[Figure: a 4-clustering, showing the distance between two clusters and the distance between the two closest clusters.]

Single-link clustering algorithm
“Well-known” algorithm for single-link clustering:
• Form V clusters of one object each.
• Find the closest pair of objects such that each object is in a different cluster, and merge the two clusters.
• Repeat until there are exactly k clusters.
Observation. This is Kruskal's algorithm (stop when k connected components).
Alternate solution. Run Prim's algorithm and delete k-1 max weight edges.

Next…
• Read Algorithms, Chapters 4.2 & 4.3
• Look on D2L for homework #4
• Quiz 4
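The observation that single-link clustering is Kruskal's algorithm stopped at k connected components can be sketched as follows. This is a minimal illustration under stated assumptions, not code from the lecture: SingleLinkClustering, cluster, and find are hypothetical names, objects are 2-D points with Euclidean distance, and all ~N^2/2 pairs are generated by brute force.

```java
import java.util.ArrayList;
import java.util.List;

class SingleLinkClustering {
    // Euclidean distance between points i and j.
    static double dist(double[][] pts, int i, int j) {
        double dx = pts[i][0] - pts[j][0], dy = pts[i][1] - pts[j][1];
        return Math.sqrt(dx * dx + dy * dy);
    }

    // Union-find root lookup with path halving.
    static int find(int[] parent, int p) {
        while (p != parent[p]) {
            parent[p] = parent[parent[p]];
            p = parent[p];
        }
        return p;
    }

    // Returns one label per point; two points share a label iff they are in the same cluster.
    static int[] cluster(double[][] pts, int k) {
        int n = pts.length;

        // Every pair of points is a candidate "edge" {i, j}.
        List<int[]> pairs = new ArrayList<>();
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++)
                pairs.add(new int[] { i, j });

        // Kruskal's order: closest pairs first.
        pairs.sort((a, b) -> Double.compare(dist(pts, a[0], a[1]), dist(pts, b[0], b[1])));

        // Start with n singleton clusters and merge until only k remain.
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;
        int components = n;
        for (int[] p : pairs) {
            if (components <= k) break;
            int ri = find(parent, p[0]), rj = find(parent, p[1]);
            if (ri != rj) {               // closest pair lying in different clusters
                parent[ri] = rj;
                components--;
            }
        }

        int[] label = new int[n];
        for (int i = 0; i < n; i++) label[i] = find(parent, i);
        return label;
    }

    public static void main(String[] args) {
        double[][] pts = { {0, 0}, {0, 1}, {1, 0}, {10, 10}, {10, 11}, {20, 0} };
        int[] label = cluster(pts, 3);
        for (int i = 0; i < pts.length; i++)
            System.out.println("point " + i + " -> cluster " + label[i]);
    }
}
```

The labels are union-find roots, so their numeric values are arbitrary; only equality between labels is meaningful. For large inputs the brute-force pair list would be the bottleneck, which is where the geometric ideas mentioned under Euclidean MST would come in.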
24 4 23 6 5 16 9 18 11 8 10 14 7 21 not connected 5 CSC 301 – DePaul University 1/3/21 Minimum spanning tree } } } Given. Undirected graph G with positive edge weights (connected). Def. A spanning tree of G is a subgraph T that is connected and acyclic. Goal. Find a min weight spanning tree. 24 4 23 6 5 16 9 18 11 8 10 14 7 21 not acyclic 6 CSC 301 – DePaul University 1/3/21 Minimum spanning tree } } } } Given. Undirected graph G with positive edge weights (connected). Def. A spanning tree of G is a subgraph T that is connected and acyclic. Goal. Find a min weight spanning tree. Brute force. Try all spanning trees? 24 4 23 6 18 5 16 9 11 8 10 14 7 21 spanning tree T: cost = 50 = 4 + 6 + 8 + 5 + 11 + 9 + 7 7 CSC 301 – DePaul University 1/3/21 Applications } MST is fundamental problem with diverse applications. } } } } } } } } } } } } 8 Dithering. Cluster analysis. Max bottleneck paths. Real-time face verification. LDPC codes for error correction. Image registration with Renyi entropy. Find road networks in satellite and aerial imagery. Reducing data storage in sequencing amino acids in a protein. Model locality of particle interactions in turbulent fluid flows. Autoconfig protocol for Ethernet bridging to avoid cycles in a network. Approximation algorithms for NP-hard problems (e.g., TSP, Steiner tree). Network design (communication, electrical, hydraulic, cable, computer, road). http://www.ics.uci.edu/~eppstein/gina/mst.html CSC 301 – DePaul University 1/3/21 Weighted edge API } } Edge abstraction needed for weighted edges. 
Idiom for processing an edge e: int v = e.either(), w = e.other(v); public class Edge implements Comparable Edge(int v, int w, double create a weighted edge v-w weight) int either() either endpoint int other(int v) the endpoint that's not v int compareTo(Edge that) compare this edge to that edge double weight() the weight String toString() string representation v 9 weight w CSC 301 – DePaul University 1/3/21 Weighted edge: Java implementation public class Edge implements Comparable { private final int v, w; private final double weight; public Edge(int v, int w, double weight) { this.v = v; this.w = w; this.weight = weight; } constructor public int either() { return v; } either endpoint public int other(int vertex) { if (vertex == v) return w; else return v; } } other endpoint public int compareTo(Edge that) { if (this.weight < that.weight) return -1; else if (this.weight > that.weight) return +1; else return 0; } 10 compare edges by weight CSC 301 – DePaul University 1/3/21 Edge-weighted graph API public class EdgeWeightedGraph EdgeWeightedGraph(int V) create an empty graph with V vertices EdgeWeightedGraph(In in) create a graph from input stream addEdge(Edge e) add weighted edge e to this graph Iterable adj(int v) edges incident to v Iterable edges() all edges in this graph int V() number of vertices int E() number of edges toString() string representation void String • Conventions. Allow self-loops and parallel edges. 11 CSC 301 – DePaul University 1/3/21 Edge-weighted graph: adjacency-lists representation } Maintain vertex-indexed array of Edge lists. 
tinyEWG.txt V 8 16 4 5 4 7 5 7 0 7 1 5 0 4 2 3 1 7 0 2 1 2 1 3 2 7 6 2 3 6 6 0 6 4 E 0.35 0.37 0.28 0.16 0.32 0.38 0.17 0.19 0.26 0.36 0.29 0.34 0.40 0.52 0.58 0.93 adj[] 0 1 2 3 4 5 6 7 6 0 .58 0 2 .26 0 4 .38 0 7 .16 1 3 .29 1 2 .36 1 7 .19 1 5 .32 6 2 .40 2 7 .34 1 2 .36 0 2 .26 3 6 .52 1 3 .29 2 3 .17 6 4 .93 0 4 .38 4 7 .37 1 5 .32 5 7 .28 4 5 .35 6 4 .93 6 0 .58 3 6 .52 6 2 .40 2 7 .34 1 7 .19 0 7 .16 5 7 .28 Bag objects 2 3 .17 4 5 .35 references to the same Edge object 5 7 .28 Edge-weighted graph representation 12 CSC 301 – DePaul University 1/3/21 Edge-weighted graph: adjacency-lists implementation } } Identical to Graph.java but use Edge adjacency sets instead of int. Parallel edges and selfloops allowed public class EdgeWeightedGraph { private final int V; private final Bag[] adj; public EdgeWeightedGraph(int V) { this.V = V; adj = (Bag[]) new Bag[V]; for (int v = 0; v < V; v++) adj[v] = new Bag(); } public void addEdge(Edge e) { int v = e.either(), w = e.other(v); adj[v].add(e); adj[w].add(e); } same as Graph, but adjacency lists of Edges instead of integers constructor add edge to both adjacency lists public Iterable adj(int v) { return adj[v]; } } 13 CSC 301 – DePaul University 1/3/21 Minimum spanning tree API } Q. How to represent the MST? public class MST Iterable double MST(EdgeWeightedGraph G) constructor edges() edges in MST weight() weight of MST % java MST tinyEWG.txt 0-7 0.16 1-7 0.19 0-2 0.26 2-3 0.17 5-7 0.28 4-5 0.35 6-2 0.40 1.81 14 CSC 301 – DePaul University 1/3/21 Minimum spanning tree API } Q. How to represent the MST? 
public class MST Iterable double MST(EdgeWeightedGraph G) constructor edges() edges in MST weight() weight of MST public static void main(String[] args) { In in = new In(args[0]); EdgeWeightedGraph G = new EdgeWeightedGraph(in); MST mst = new MST(G); for (Edge e : mst.edges()) StdOut.println(e); StdOut.printf("%.2f\n", mst.weight()); } 15 CSC 301 – DePaul University % java MST tinyEWG.txt 0-7 0.16 1-7 0.19 0-2 0.26 2-3 0.17 5-7 0.28 4-5 0.35 6-2 0.40 1.81 1/3/21 Cut property } } } } } Simplifying assumptions. Edge weights are distinct; graph is connected. Def. A cut in a graph is a partition of its vertices into two (nonempty) sets. A crossing edge connects a vertex in one set with a vertex in the other. Cut property. Given any cut, the crossing crossing edges separating edge of min weight is in the MST. gray from white vertices are drawn in red Q. Given a cut, why must the MST contain at least one crossing edge? A. Otherwise, it would not be connected. e minimum-weight crossing edge must be in the MST 16 CSC 301 – DePaul University Cut property 1/3/21 Cut property: correctness proof } } } } Simplifying assumptions. Edge weights are distinct; graph is connected. Def. A cut in a graph is a partition of its vertices into two (nonempty) sets. A crossing edge connects a vertex in one set with a vertex in the other. Cut property. Given any cut, the crossing edge of min weight is in the MST. Proof. Let e be the min-weight crossing edge in cut. } } } } } } Suppose e is not in the MST. Adding e to the MST creates a cycle. Some other edge f in cycle must be a crossing edge. Removing f and adding e is also a spanning tree. Since weight of e is less than the weight of f, that spanning tree is lower weight. Contradiction. the MST does not contain e f e adding e to MST creates a cycle 17 CSC 301 – DePaul University Cut property 1/3/21 Greedy MST algorithm demo } Greedy algorithm. } } } 18 Start with all edges colored gray. 
Find a cut with no black crossing edges, and color its min-weight edge black. Continue until V - 1 edges are colored black. CSC 301 – DePaul University 1/3/21 Greedy MST algorithm: correctness proof } } Proposition. The greedy algorithm computes the MST. Proof } } Any edge colored black is in the MST (via cut property). If fewer than V - 1 black edges, there exists a cut with no black crossing edges. (consider cut whose vertices are one connected component) fewer than V-1 edges colored black 19 a cut with no black crossing edges CSC 301 – DePaul University 1/3/21 Greedy MST algorithm: efficient implementations } Proposition. The following algorithm computes the MST: } } } } Start with all edges colored gray. Find a cut with no black crossing edges, and color its min-weight edge black. Continue until V - 1 edges are colored black. Efficient implementations. How to choose cut? How to find min-weight edge? } } } 20 Ex 1. Kruskal's algorithm. [stay tuned] Ex 2. Prim's algorithm. [stay tuned] Ex 3. Borüvka's algorithm. CSC 301 – DePaul University 1/3/21 Removing two simplifying assumptions } } Q. What if edge weights are not all distinct? A. Greedy MST algorithm still correct if equal weights are present! (our correctness proof fails, but that can be fixed) no MST if graph is not connected } } Q. What if graph is not connected? A. Compute minimum spanning forest = MST of each component. can independently compute MSTs of components 21 CSC 301 – DePaul University weights need not be 4 4 5 1 2 0 1 0 5 6 6 5 3 3 6 2 0.61 0.62 0.88 0.11 0.35 0.6 0.10 0.22 1/3/21 Kruskal's algorithm demo } Kruskal's algorithm. } } 22 Consider edges in ascending order of weight. Add the next edge to the tree T unless doing so would create a cycle. CSC 301 – DePaul University 1/3/21 Kruskal's algorithm: visualization 23 CSC 301 – DePaul University 1/3/21 Kruskal's algorithm: correctness proof } } Proposition. Kruskal's algorithm computes the MST. Proof. 
Kruskal's algorithm is a special case of the greedy MST algorithm. } } } } 24 Suppose Kruskal's algorithm colors the edge e = v–w black. Cut = set of vertices connected to v in tree T. No crossing edge is black. add edge to tree No crossing edge has lower weight. Why? CSC 301 – DePaul University 1/3/21 Kruskal's algorithm: implementation challenge } Challenge. Would adding edge v–w to tree T create a cycle? If not, add it. } How difficult? } } } } } 25 E +V V log V log* V 1 run DFS from v, check if w is reachable (T has at most V – 1 edges) use the union-find data structure ! add edge to tree CSC 301 – DePaul University adding edge to tree would create a cycle 1/3/21 Kruskal's algorithm: implementation challenge } } Challenge. Would adding edge v–w to tree T create a cycle? If not, add it. Efficient solution. Use the union-find data structure. } } } Maintain a set for each connected component in T. If v and w are in same set, then adding v–w would create a cycle. To add v–w to T, merge sets containing v and w. w v w v Case 1: adding v–w creates a cycle 26 Case 2: add v–w to T and merge sets containing v and w CSC 301 – DePaul University 1/3/21 Kruskal's algorithm: Java implementation public class KruskalMST { private Queue mst = new Queue(); build priority queue public KruskalMST(EdgeWeightedGraph G) { MinPQ pq = new MinPQ(); for (Edge e : G.edges()) pq.insert(e); UF uf = new UF(G.V()); while (!pq.isEmpty() && mst.size() < G.V()-1) { Edge e = pq.delMin(); int v = e.either(), w = e.other(v); if (!uf.connected(v, w)) { uf.union(v, w); mst.enqueue(e); } } greedily add edges to MST edge v–w does not create cycle merge sets add edge to MST } public Iterable edges() { return mst; } } 27 CSC 301 – DePaul University 1/3/21 Kruskal's algorithm: running time } } Proposition. Kruskal's algorithm computes MST in time proportional to E log E (in the worst case). 
Proof: operation frequency time per op build pq 1 E delete-min E log E union V log* V † connected E log* V † † amortized bound using weighted quick union with path compression } Remark. If edges are already sorted, order of growth is E log* V. 28 CSC 301 – DePaul University 1/3/21 Prim's algorithm demo } Prim's algorithm. } } 29 Start with vertex 0 and greedily grow tree T. At each step, add to T the min weight edge with exactly one endpoint in T. CSC 301 – DePaul University 1/3/21 Prim’s algorithm: visualization 30 CSC 301 – DePaul University 1/3/21 Prim's algorithm: proof of correctness } } Proposition. Prim's algorithm computes the MST. Pf. Prim's algorithm is a special case of the greedy MST algorithm. } Suppose edge e = min weight edge connecting a vertex on the tree to a vertex not on the tree. Cut = set of vertices connected on tree. edge e = 7-5 added to tree No crossing edge is black. No crossing edge has lower weight. 31 CSC 301 – DePaul University } } } 1/3/21 Prim's algorithm: implementation } } Challenge. Find the min weight edge with exactly one endpoint in T. How difficult? } } } } } 32 E V log E log* E l try all edges use a priority queue ! CSC 301 – DePaul University 1/3/21 Prim's algorithm: lazy implementation } } Challenge. Find the min weight edge with exactly one endpoint in T. Lazy solution. Maintain a PQ of edges with (at least) one endpoint in T. } } } } Key = edge; priority = weight of edge. Delete-min to determine next edge e = v–w to add to T. Disregard if both endpoints v and w are in T. 
- Otherwise, let v be the vertex not in T:
  - add to PQ any edge incident to v (assuming other endpoint not in T)
  - add v to T

Example. 1-7 is the min weight edge with exactly one endpoint in T. Priority queue of crossing edges:

edge   weight
1-7    0.19
0-2    0.26
5-7    0.28
2-7    0.34
4-7    0.37
0-4    0.38
6-0    0.58

Prim's algorithm: lazy implementation

    public class LazyPrimMST
    {
       private boolean[] marked;   // MST vertices
       private Queue<Edge> mst;    // MST edges
       private MinPQ<Edge> pq;     // PQ of edges

       public LazyPrimMST(EdgeWeightedGraph G)
       {
          pq = new MinPQ<Edge>();
          mst = new Queue<Edge>();
          marked = new boolean[G.V()];
          visit(G, 0);                              // assume G is connected

          while (!pq.isEmpty())
          {
             Edge e = pq.delMin();                  // repeatedly delete the min weight edge e = v–w from PQ
             int v = e.either(), w = e.other(v);
             if (marked[v] && marked[w]) continue;  // ignore if both endpoints in T
             mst.enqueue(e);                        // add edge e to tree
             if (!marked[v]) visit(G, v);           // add v or w to tree
             if (!marked[w]) visit(G, w);
          }
       }

       private void visit(EdgeWeightedGraph G, int v)
       {
          marked[v] = true;                         // add v to T
          for (Edge e : G.adj(v))
             if (!marked[e.other(v)])
                pq.insert(e);                       // for each edge e = v–w, add to PQ if w not already in T
       }

       public Iterable<Edge> mst()
       {  return mst;  }
    }

Lazy Prim's algorithm: running time

Proposition. Lazy Prim's algorithm computes the MST in time proportional to E log E and extra space proportional to E (in the worst case).
Pf.

operation     frequency   binary heap
delete min    E           log E
insert        E           log E

Prim's algorithm: eager implementation

Challenge. Find min weight edge with exactly one endpoint in T.
Eager solution. Maintain a PQ of vertices connected by an edge to T, where priority of vertex v = weight of shortest edge connecting v to T.
- Delete min vertex v and add its associated edge e = v–w to T.
- Update PQ by considering all edges e = v–x incident to v:
  - ignore if x is already in T
  - add x to PQ if not already on it
  - decrease priority of x if v–x becomes shortest edge connecting x to T

vertex   edge   weight
0        -      -
1        1-7    0.19
2        0-2    0.26
3        1-3    0.29
4        0-4    0.38
5        5-7    0.28
6        6-0    0.58
7        0-7    0.16

(In the original figure, red entries are on the PQ and black entries are on the MST.)

Prim's algorithm: eager demo

Use IndexMinPQ: key = edge weight, index = vertex. (The eager version has at most one PQ entry per vertex.)

Indexed priority queue

Associate an index between 0 and N - 1 with each key in a priority queue.
- Client can insert and delete-the-minimum.
- Client can change the key by specifying the index.

public class IndexMinPQ<Key extends Comparable<Key>>

            IndexMinPQ(int N)              create indexed priority queue with indices 0, 1, …, N-1
       void insert(int k, Key key)         associate key with index k
       void decreaseKey(int k, Key key)    decrease the key associated with index k
    boolean contains(int k)                is k an index on the priority queue?
        int delMin()                       remove a minimal key and return its associated index
    boolean isEmpty()                      is the priority queue empty?
        int size()                         number of entries in the priority queue

Indexed priority queue implementation

Implementation.
- Start with same code as MinPQ.
- Maintain parallel arrays keys[], pq[], and qp[] so that:
  - keys[i] is the priority of i
  - pq[i] is the index of the key in heap position i
  - qp[i] is the heap position of the key with index i
- Use swim(qp[k]) to implement decreaseKey(k, key).

i   keys[i]   pq[i]   qp[i]
0   A         -       1
1   S         0       5
2   O         6       4
3   R         7       8
4   T         2       7
5   I         1       6
6   N         5       2
7   G         4       3
8   -         3       -

Prim's algorithm: running time

Depends on PQ implementation: V insert, V delete-min, E decrease-key.
PQ implementation                      insert       delete-min   decrease-key   total
array                                  1            V            1              V^2
binary heap                            log V        log V        log V          E log V
d-way heap (Johnson 1975)              d log_d V    d log_d V    log_d V        E log_{E/V} V
Fibonacci heap (Fredman-Tarjan 1984)   1 †          log V †      1 †            E + V log V

† amortized

Bottom line.
- Array implementation optimal for dense graphs.
- Binary heap much faster for sparse graphs.
- 4-way heap worth the trouble in performance-critical situations.
- Fibonacci heap best in theory, but not worth implementing.

Does a linear-time MST algorithm exist?

Deterministic compare-based MST algorithms:

year   worst case               discovered by
1975   E log log V              Yao
1976   E log log V              Cheriton-Tarjan
1984   E log* V, E + V log V    Fredman-Tarjan
1986   E log (log* V)           Gabow-Galil-Spencer-Tarjan
1997   E α(V) log α(V)          Chazelle
2000   E α(V)                   Chazelle
2002   optimal                  Pettie-Ramachandran
20xx   E                        ???

Remark. Linear-time randomized MST algorithm (Karger-Klein-Tarjan 1995).

Euclidean MST

Given N points in the plane, find MST connecting them, where the distances between point pairs are their Euclidean distances.
- Brute force. Compute ~ N^2 / 2 distances and run Prim's algorithm.
- Ingenuity. Exploit geometry and do it in ~ c N log N.

Scientific application: clustering

- k-clustering. Divide a set of objects into k coherent groups.
- Distance function. Numeric value specifying "closeness" of two objects.
- Goal. Divide into clusters so that objects in different clusters are far apart.

(Figure: outbreak of cholera deaths in London in 1850s. Credit: Nina Mishra.)

Applications.
- Routing in mobile ad hoc networks.
- Document categorization for web search.
- Similarity searching in medical image databases.
- Skycat: cluster 10^9 sky objects into stars, quasars, galaxies.

Single-link clustering

- k-clustering. Divide a set of objects into k coherent groups.
- Distance function.
Numeric value specifying "closeness" of two objects.
- Single link. Distance between two clusters equals the distance between the two closest objects (one in each cluster).
- Single-link clustering. Given an integer k, find a k-clustering that maximizes the distance between two closest clusters.

(Figure: a 4-clustering, marking the distance between two clusters and the distance between the two closest clusters.)

Single-link clustering algorithm

"Well-known" algorithm for single-link clustering:
- Form V clusters of one object each.
- Find the closest pair of objects such that each object is in a different cluster, and merge the two clusters.
- Repeat until there are exactly k clusters.

Observation. This is Kruskal's algorithm (stop when k connected components).
Alternate solution. Run Prim's algorithm and delete k-1 max weight edges.

Next…
- Read Algorithms, Chapters 4.2 & 4.3
- Look on D2L for homework #4
- Quiz 4

Algorithms
FOURTH EDITION
Robert Sedgewick and Kevin Wayne
Princeton University

Upper Saddle River, NJ • Boston • Indianapolis • San Francisco • New York • Toronto • Montreal • London • Munich • Paris • Madrid • Capetown • Sydney • Tokyo • Singapore • Mexico City

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trademark claim, the designations have been printed with initial capital letters or in all capitals.

The authors and publisher have taken care in the preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions.
No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein.

The publisher offers excellent discounts on this book when ordered in quantity for bulk purchases or special sales, which may include electronic versions and/or custom covers and content particular to your business, training goals, marketing focus, and branding interests. For more information, please contact:
U.S. Corporate and Government Sales, (800) 382-3419, corpsales@pearsontechgroup.com
For sales outside the United States, please contact:
International Sales, international@pearson.com
Visit us on the Web: informit.com/aw

Cataloging-in-Publication Data is on file with the Library of Congress.

Copyright © 2011 Pearson Education, Inc. All rights reserved. Printed in the United States of America. This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. For information regarding permissions, write to:
Pearson Education, Inc., Rights and Contracts Department, 501 Boylston Street, Suite 900, Boston, MA 02116. Fax: (617) 671-3447

ISBN-13: 978-0-321-57351-3
ISBN-10: 0-321-57351-X
Text printed in the United States on recycled paper at Courier in Westford, Massachusetts.
First printing, March 2011

To Adam, Andrew, Brett, Robbie and especially Linda
To Jackie and Alex

CONTENTS

Preface
1 Fundamentals
  1.1 Basic Programming Model
  1.2 Data Abstraction
  1.3 Bags, Queues, and Stacks
  1.4 Analysis of Algorithms
  1.5 Case Study: Union-Find
2 Sorting
  2.1 Elementary Sorts
  2.2 Mergesort
  2.3 Quicksort
  2.4 Priority Queues
  2.5 Applications
3 Searching
  3.1 Symbol Tables
  3.2 Binary Search Trees
  3.3 Balanced Search Trees
  3.4 Hash Tables
  3.5 Applications
4 Graphs
  4.1 Undirected Graphs
  4.2 Directed Graphs
  4.3 Minimum Spanning Trees
  4.4 Shortest Paths
5 Strings
  5.1 String Sorts
  5.2 Tries
  5.3 Substring Search
  5.4 Regular Expressions
  5.5 Data Compression
6 Context
Index
Algorithms
Clients

PREFACE

This book is intended to survey the most important computer algorithms in use today, and to teach fundamental techniques to the growing number of people in need of knowing them. It is intended for use as a textbook for a second course in computer science, after students have acquired basic programming skills and familiarity with computer systems. The book also may be useful for self-study or as a reference for people engaged in the development of computer systems or applications programs, since it contains implementations of useful algorithms and detailed information on performance characteristics and clients. The broad perspective taken makes the book an appropriate introduction to the field.

The study of algorithms and data structures is fundamental to any computer-science curriculum, but it is not just for programmers and computer-science students. Everyone who uses a computer wants it to run faster or to solve larger problems. The algorithms in this book represent a body of knowledge developed over the last 50 years that has become indispensable.
From N-body simulation problems in physics to genetic-sequencing problems in molecular biology, the basic methods described here have become essential in scientific research; from architectural modeling systems to aircraft simulation, they have become essential tools in engineering; and from database systems to internet search engines, they have become essential parts of modern software systems. And these are but a few examples: as the scope of computer applications continues to grow, so grows the impact of the basic methods covered here.

Before developing our fundamental approach to studying algorithms, we develop data types for stacks, queues, and other low-level abstractions that we use throughout the book. Then we survey fundamental algorithms for sorting, searching, graphs, and strings. The last chapter is an overview placing the rest of the material in the book in a larger context.

Distinctive features

The orientation of the book is to study algorithms likely to be of practical use. The book teaches a broad variety of algorithms and data structures and provides sufficient information about them that readers can confidently implement, debug, and put them to work in any computational environment. The approach involves:

Algorithms. Our descriptions of algorithms are based on complete implementations and on a discussion of the operations of these programs on a consistent set of examples. Instead of presenting pseudo-code, we work with real code, so that the programs can quickly be put to practical use. Our programs are written in Java, but in a style such that most of our code can be reused to develop implementations in other modern programming languages.

Data types. We use a modern programming style based on data abstraction, so that algorithms and their data structures are encapsulated together.

Applications. Each chapter has a detailed description of applications where the algorithms described play a critical role.
These range from applications in physics and molecular biology, to engineering computers and systems, to familiar tasks such as data compression and searching on the web.

A scientific approach. We emphasize developing mathematical models for describing the performance of algorithms, using the models to develop hypotheses about performance, and then testing the hypotheses by running the algorithms in realistic contexts.

Breadth of coverage. We cover basic abstract data types, sorting algorithms, searching algorithms, graph processing, and string processing. We keep the material in algorithmic context, describing data structures, algorithm design paradigms, reduction, and problem-solving models. We cover classic methods that have been taught since the 1960s and new methods that have been invented in recent years.

Our primary goal is to introduce the most important algorithms in use today to as wide an audience as possible. These algorithms are generally ingenious creations that, remarkably, can each be expressed in just a dozen or two lines of code. As a group, they represent problem-solving power of amazing scope. They have enabled the construction of computational artifacts, the solution of scientific problems, and the development of commercial applications that would not have been feasible without them.

Booksite

An important feature of the book is its relationship to the booksite algs4.cs.princeton.edu. This site is freely available and contains an extensive amount of material about algorithms and data structures, for teachers, students, and practitioners, including:

An online synopsis. The text is summarized in the booksite to give it the same overall structure as the book, but linked so as to provide easy navigation through the material.

Full implementations. All code in the book is available on the booksite, in a form suitable for program development.
Many other implementations are also available, including advanced implementations and improvements described in the book, answers to selected exercises, and client code for various applications. The emphasis is on testing algorithms in the context of meaningful applications.

Exercises and answers. The booksite expands on the exercises in the book by adding drill exercises (with answers available with a click), a wide variety of examples illustrating the reach of the material, programming exercises with code solutions, and challenging problems.

Dynamic visualizations. Dynamic simulations are impossible in a printed book, but the website is replete with implementations that use a graphics class to present compelling visual demonstrations of algorithm applications.

Course materials. A complete set of lecture slides is tied directly to the material in the book and on the booksite. A full selection of programming assignments, with check lists, test data, and preparatory material, is also included.

Links to related material. Hundreds of links lead students to background information about applications and to resources for studying algorithms.

Our goal in creating this material was to provide a complementary approach to the ideas. Generally, you should read the book when learning specific algorithms for the first time or when trying to get a global picture, and you should use the booksite as a reference when programming or as a starting point when searching for more detail while online.

Use in the curriculum

The book is intended as a textbook in a second course in computer science. It provides full coverage of core material and is an excellent vehicle for students to gain experience and maturity in programming, quantitative reasoning, and problem-solving. Typically, one course in computer science will suffice as a prerequisite: the book is intended for anyone conversant with a modern programming language and with the basic features of modern computer systems.
The algorithms and data structures are expressed in Java, but in a style accessible to people fluent in other modern languages. We embrace modern Java abstractions (including generics) but resist dependence upon esoteric features of the language. Most of the mathematical material supporting the analytic results is self-contained (or is labeled as beyond the scope of this book), so little specific preparation in mathematics is required for the bulk of the book, although mathematical maturity is definitely helpful. Applications are drawn from introductory material in the sciences, again self-contained. The material covered is a fundamental background for any student intending to major in computer science, electrical engineering, or operations research, and is valuable for any student with interests in science, mathematics, or engineering. Context The book is intended to follow our introductory text, An Introduction to Programming in Java: An Interdisciplinary Approach, which is a broad introduction to the field. Together, these two books can support a two- or three-semester introduction to computer science that will give any student the requisite background to successfully address computation in any chosen field of study in science, engineering, or the social sciences. The starting point for much of the material in the book was the Sedgewick series of Algorithms books. In spirit, this book is closest to the first and second editions of that book, but this text benefits from decades of experience teaching and learning that material. Sedgewick’s current Algorithms in C/C++/Java, Third Edition is more appropriate as a reference or a text for an advanced course; this book is specifically designed to be a textbook for a one-semester course for first- or second-year college students and as a modern introduction to the basics and a reference for use by working programmers. 
Acknowledgments

This book has been nearly 40 years in the making, so full recognition of all the people who have made it possible is simply not feasible. Earlier editions of this book list dozens of names, including (in alphabetical order) Andrew Appel, Trina Avery, Marc Brown, Lyn Dupré, Philippe Flajolet, Tom Freeman, Dave Hanson, Janet Incerpi, Mike Schidlowsky, Steve Summit, and Chris Van Wyk. All of these people deserve acknowledgement, even though some of their contributions may have happened decades ago. For this fourth edition, we are grateful to the hundreds of students at Princeton and several other institutions who have suffered through preliminary versions of the work, and to readers around the world for sending in comments and corrections through the booksite.

We are grateful for the support of Princeton University in its unwavering commitment to excellence in teaching and learning, which has provided the basis for the development of this work.

Peter Gordon has provided wise counsel throughout the evolution of this work almost from the beginning, including a gentle introduction of the "back to the basics" idea that is the foundation of this edition. For this fourth edition, we are grateful to Barbara Wood for her careful and professional copyediting, to Julie Nahil for managing the production, and to many others at Pearson for their roles in producing and marketing the book. All were extremely responsive to the demands of a rather tight schedule without the slightest sacrifice to the quality of the result.

Robert Sedgewick
Kevin Wayne
Princeton, NJ
January, 2011

ONE
Fundamentals

1.1 Basic Programming Model
1.2 Data Abstraction
1.3 Bags, Queues, and Stacks
1.4 Analysis of Algorithms
1.5 Case Study: Union-Find
The objective of this book is to study a broad variety of important and useful algorithms, that is, methods for solving problems that are suited for computer implementation. Algorithms go hand in hand with data structures, which are schemes for organizing data that leave them amenable to efficient processing by an algorithm. This chapter introduces the basic tools that we need to study algorithms and data structures.

First, we introduce our basic programming model. All of our programs are implemented using a small subset of the Java programming language plus a few of our own libraries for input/output and for statistical calculations. Section 1.1 is a summary of language constructs, features, and libraries that we use in this book.

Next, we emphasize data abstraction, where we define abstract data types (ADTs) in the service of modular programming. In Section 1.2 we introduce the process of implementing an ADT in Java, by specifying an applications programming interface (API) and then using the Java class mechanism to develop an implementation for use in client code.

As important and useful examples, we next consider three fundamental ADTs: the bag, the queue, and the stack. Section 1.3 describes APIs and implementations of bags, queues, and stacks using arrays, resizing arrays, and linked lists that serve as models and starting points for algorithm implementations throughout the book.

Performance is a central consideration in the study of algorithms. Section 1.4 describes our approach to analyzing algorithm performance. The basis of our approach is the scientific method: we develop hypotheses about performance, create mathematical models, and run experiments to test them, repeating the process as necessary.

We conclude with a case study where we consider solutions to a connectivity problem that uses algorithms and data structures that implement the classic union-find ADT.
Algorithms

When we write a computer program, we are generally implementing a method that has been devised previously to solve some problem. This method is often independent of the particular programming language being used; it is likely to be equally appropriate for many computers and many programming languages. It is the method, rather than the computer program itself, that specifies the steps that we can take to solve the problem. The term algorithm is used in computer science to describe a finite, deterministic, and effective problem-solving method suitable for implementation as a computer program. Algorithms are the stuff of computer science: they are central objects of study in the field.

We can define an algorithm by describing a procedure for solving a problem in a natural language, or by writing a computer program that implements the procedure, as shown below for Euclid's algorithm for finding the greatest common divisor of two numbers, a variant of which was devised over 2,300 years ago.

Euclid's algorithm

English-language description
Compute the greatest common divisor of two nonnegative integers p and q as follows: If q is 0, the answer is p. If not, divide p by q and take the remainder r. The answer is the greatest common divisor of q and r.

Java-language description

    public static int gcd(int p, int q)
    {
       if (q == 0) return p;
       int r = p % q;
       return gcd(q, r);
    }

If you are not familiar with Euclid's algorithm, you are encouraged to work Exercise 1.1.24 and Exercise 1.1.25, perhaps after reading Section 1.1. In this book, we use computer programs to describe algorithms. One important reason for doing so is that it makes easier the task of checking whether they are finite, deterministic, and effective, as required. But it is also important to recognize that a program in a particular language is just one way to express an algorithm.
The fact that many of the algorithms in this book have been expressed in multiple programming languages over the past several decades reinforces the idea that each algorithm is a method suitable for implementation on any computer in any programming language.

Most algorithms of interest involve organizing the data involved in the computation. Such organization leads to data structures, which also are central objects of study in computer science. Algorithms and data structures go hand in hand. In this book we take the view that data structures exist as the byproducts or end products of algorithms and that we must therefore study them in order to understand the algorithms. Simple algorithms can give rise to complicated data structures and, conversely, complicated algorithms can use simple data structures. We shall study the properties of many data structures in this book; indeed, we might well have titled the book Algorithms and Data Structures.

When we use a computer to help us solve a problem, we typically are faced with a number of possible approaches. For small problems, it hardly matters which approach we use, as long as we have one that correctly solves the problem. For huge problems (or applications where we need to solve huge numbers of small problems), however, we quickly become motivated to devise methods that use time and space efficiently.

The primary reason to learn about algorithms is that this discipline gives us the potential to reap huge savings, even to the point of enabling us to do tasks that would otherwise be impossible. In an application where we are processing millions of objects, it is not unusual to be able to make a program millions of times faster by using a well-designed algorithm. We shall see such examples on numerous occasions throughout the book.
By contrast, investing additional money or time to buy and install a new computer holds the potential for speeding up a program by perhaps a factor of only 10 or 100. Careful algorithm design is an extremely effective part of the process of solving a huge problem, whatever the applications area. When developing a huge or complex computer program, a great deal of effort must go into understanding and defining the problem to be solved, managing its complexity, and decomposing it into smaller subtasks that can be implemented easily. Often, many of the algorithms required after the decomposition are trivial to implement. In most cases, however, there are a few algorithms whose choice is critical because most of the system resources will be spent running those algorithms. These are the types of algorithms on which we concentrate in this book. We study fundamental algorithms that are useful for solving challenging problems in a broad variety of applications areas. The sharing of programs in computer systems is becoming more widespread, so although we might expect to be using a large fraction of the algorithms in this book, we also might expect to have to implement only a small fraction of them. For example, the Java libraries contain implementations of a host of fundamental algorithms. However, implementing simple versions of basic algorithms helps us to understand them better and thus to more effectively use and tune advanced versions from a library. More important, the opportunity to reimplement basic algorithms arises frequently. The primary reason to do so is that we are faced, all too often, with completely new computing environments (hardware and software) with new features that old implementations may not use to best advantage. In this book, we concentrate on the simplest reasonable implementations of the best algorithms. 
We do pay careful attention to coding the critical parts of the algorithms, and take pains to note where low-level optimization effort could be most beneficial.

The choice of the best algorithm for a particular task can be a complicated process, perhaps involving sophisticated mathematical analysis. The branch of computer science that comprises the study of such questions is called analysis of algorithms. Many of the algorithms that we study have been shown through analysis to have excellent theoretical performance; others are simply known to work well through experience. Our primary goal is to learn reasonable algorithms for important tasks, yet we shall also pay careful attention to comparative performance of the methods. We should not use an algorithm without having an idea of what resources it might consume, so we strive to be aware of how our algorithms might be expected to perform.

Summary of topics

As an overview, we describe the major parts of the book, giving specific topics covered and an indication of our general orientation toward the material. This set of topics is intended to touch on as many fundamental algorithms as possible. Some of the areas covered are core computer-science areas that we study in depth to learn basic algorithms of wide applicability. Other algorithms that we discuss are from advanced fields of study within computer science and related fields. The algorithms that we consider are the products of decades of research and development and continue to play an essential role in the ever-expanding applications of computation.

Fundamentals (Chapter 1) in the context of this book are the basic principles and methodology that we use to implement, analyze, and compare algorithms. We consider our Java programming model, data abstraction, basic data structures, abstract data types for collections, methods of analyzing algorithm performance, and a case study.
Sorting algorithms (Chapter 2) for rearranging arrays in order are of fundamental importance. We consider a variety of algorithms in considerable depth, including insertion sort, selection sort, shellsort, quicksort, mergesort, and heapsort. We also encounter algorithms for several related problems, including priority queues, selection, and merging. Many of these algorithms will find application as the basis for other algorithms later in the book.

Searching algorithms (Chapter 3) for finding specific items among large collections of items are also of fundamental importance. We discuss basic and advanced methods for searching, including binary search trees, balanced search trees, and hashing. We note relationships among these methods and compare performance.

Graphs (Chapter 4) are sets of objects and connections, possibly with weights and orientation. Graphs are useful models for a vast number of difficult and important problems, and the design of algorithms for processing graphs is a major field of study. We consider depth-first search, breadth-first search, connectivity problems, and several algorithms and applications, including Kruskal's and Prim's algorithms for finding minimum spanning trees and Dijkstra's and the Bellman-Ford algorithms for solving shortest-paths problems.

Strings (Chapter 5) are an essential data type in modern computing applications. We consider a range of methods for processing sequences of characters. We begin with faster algorithms for sorting and searching when keys are strings. Then we consider substring search, regular expression pattern matching, and data-compression algorithms. Again, an introduction to advanced topics is given through treatment of some elementary problems that are important in their own right.

Context (Chapter 6) helps us relate the material in the book to several other advanced fields of study, including scientific computing, operations research, and the theory of computing.
We survey event-based simulation, B-trees, suffix arrays, maximum flow, and other advanced topics from an introductory viewpoint to develop appreciation for the interesting advanced fields of study where algorithms play a critical role. Finally, we describe search problems, reduction, and NP-completeness to introduce the theoretical underpinnings of the study of algorithms and relationships to material in this book.

The study of algorithms is interesting and exciting because it is a new field (almost all the algorithms that we study are less than 50 years old, and some were just recently discovered) with a rich tradition (a few algorithms have been known for hundreds of years). New discoveries are constantly being made, but few algorithms are completely understood. In this book we shall consider intricate, complicated, and difficult algorithms as well as elegant, simple, and easy ones. Our challenge is to understand the former and to appreciate the latter in the context of scientific and commercial applications. In doing so, we shall explore a variety of useful tools and develop a style of algorithmic thinking that will serve us well in computational challenges to come.

1.1 BASIC PROGRAMMING MODEL

Our study of algorithms is based upon implementing them as programs written in the Java programming language. We do so for several reasons:
■ Our programs are concise, elegant, and complete descriptions of algorithms.
■ You can run the programs to study properties of the algorithms.
■ You can put the algorithms immediately to good use in applications.
These are important and significant advantages over the alternatives of working with English-language descriptions of algorithms. A potential downside to this approach is that we have to work with a specific programming language, possibly making it difficult to separate the idea of the algorithm from the details of its implementation.
Our implementations are designed to mitigate this difficulty, by using programming constructs that are both found in many modern languages and needed to adequately describe the algorithms. We use only a small subset of Java. While we stop short of formally defining the subset that we use, you will see that we make use of relatively few Java constructs, and that we emphasize those that are found in many modern programming languages. The code that we present is complete, and our expectation is that you will download it and execute it, on our test data or test data of your own choosing. We refer to the programming constructs, software libraries, and operating system features that we use to implement and describe algorithms as our programming model. In this section and Section 1.2, we fully describe this programming model. The treatment is self-contained and primarily intended for documentation and for your reference in understanding any code in the book. The model we describe is the same model introduced in our book An Introduction to Programming in Java: An Interdisciplinary Approach, which provides a slower-paced introduction to the material. For reference, the figure on the facing page depicts a complete Java program that illustrates many of the basic features of our programming model. We use this code for examples when discussing language features, but defer considering it in detail to page 46 (it implements a classic algorithm known as binary search and tests it for an application known as whitelist filtering). We assume that you have experience programming in some modern language, so that you are likely to recognize many of these features in this code. Page references are included in the annotations to help you find answers to any questions that you might have. 
Since our code is somewhat stylized and we strive to make consistent use of various Java idioms and constructs, it is worthwhile even for experienced Java programmers to read the information in this section.

    import java.util.Arrays;                      // import a Java library (see page 27)

    public class BinarySearch                     // code must be in file BinarySearch.java (see page 26)
    {
        // static method (see page 22); annotated: return type, parameter type, parameter variables
        public static int rank(int key, int[] a)
        {
            int lo = 0;                           // initializing declaration statement (see page 16)
            int hi = a.length - 1;
            while (lo <= hi)                      // loop (see page 15)
            {
                int mid = lo + (hi - lo) / 2;
                if      (key < a[mid]) hi = mid - 1;
                else if (key > a[mid]) lo = mid + 1;
                else                   return mid;
            }
            return -1;                            // return statement
        }

        // unit test client (see page 26); system calls main()
        // no return value; just side effects (see page 24)
        // system passes argument value "whitelist.txt" to main()
        public static void main(String[] args)
        {
            int[] whitelist = In.readInts(args[0]);
            Arrays.sort(whitelist);               // call a method in a Java library (see page 27)
            while (!StdIn.isEmpty())
            {
                int key = StdIn.readInt();        // call a method in our standard library; need to download code (see page 27)
                if (rank(key, whitelist) == -1)   // conditional statement (see page 15); call a local method (see page 27)
                    StdOut.println(key);
            }
        }
    }

    % java BinarySearch largeW.txt < largeT.txt   command line (see page 36); file name (args[0]); file redirected from StdIn (see page 40)
    499569                                        StdOut (see page 37)
    984875
    ...

Anatomy of a Java program and its invocation from the command line

Basic structure of a Java program

A Java program (class) is either a library of static methods (functions) or a data type definition. To create libraries of static methods and data-type definitions, we use the following seven components, the basis of programming in Java and many other modern languages:
■ Primitive data types precisely define the meaning of terms like integer, real number, and boolean value within a computer program. Their definition includes the set of possible values and operations on those values, which can be combined into expressions like mathematical expressions that define values.
■ Statements allow us to define a computation by creating and assigning values to variables, controlling execution flow, or causing side effects. We use six types of statements: declarations, assignments, conditionals, loops, calls, and returns.
■ Arrays allow us to work with multiple values of the same type.
■ Static methods allow us to encapsulate and reuse code and to develop programs as a set of independent modules.
■ Strings are sequences of characters. Some operations on them are built in to Java.
■ Input/output sets up communication between programs and the outside world.
■ Data abstraction extends encapsulation and reuse to allow us to define nonprimitive data types, thus supporting object-oriented programming.
In this section, we will consider the first six of these in turn. Data abstraction is the topic of the next section.

Running a Java program involves interacting with an operating system or a program development environment. For clarity and economy, we describe such actions in terms of a virtual terminal, where we interact with programs by typing commands to the system. See the booksite for details on using a virtual terminal on your system, or for information on using one of the many more advanced program development environments that are available on modern systems.

For example, BinarySearch is two static methods, rank() and main(). The first static method, rank(), is four statements: two declarations, a loop (which is itself an assignment and two conditionals), and a return. The second, main(), is three statements: a declaration, a call, and a loop (which is itself an assignment and a conditional).

To invoke a Java program, we first compile it using the javac command, then run it using the java command.
For example, to run BinarySearch, we first type the command javac BinarySearch.java (which creates a file BinarySearch.class that contains a lower-level version of the program in Java bytecode). Then we type java BinarySearch (followed by a whitelist file name) to transfer control to the bytecode version of the program. To develop a basis for understanding the effect of these actions, we next consider in detail primitive data types and expressions, the various kinds of Java statements, arrays, static methods, strings, and input/output.

Primitive data types and expressions

A data type is a set of values and a set of operations on those values. We begin by considering the following four primitive data types that are the basis of the Java language:
■ Integers, with arithmetic operations (int)
■ Real numbers, again with arithmetic operations (double)
■ Booleans, the set of values { true, false } with logical operations (boolean)
■ Characters, the alphanumeric characters and symbols that you type (char)
Next we consider mechanisms for specifying values and operations for these types.

A Java program manipulates variables that are named with identifiers. Each variable is associated with a data type and stores one of the permissible data-type values. In Java code, we use expressions like familiar mathematical expressions to apply the operations associated with each type. For primitive types, we use identifiers to refer to variables, operator symbols such as + - * / to specify operations, literals such as 1 or 3.14 to specify values, and expressions such as (x + 2.236)/2 to specify operations on values. The purpose of an expression is to define one of the data-type values.
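As a small illustration of the terms in the paragraph above, the following sketch uses identifiers, literals, and operators to build expressions of several primitive types. It is our own example rather than code from the text: the class name ExpressionDemo and its two helper methods are hypothetical, chosen only to make the expressions easy to try out.

```java
// Illustrative sketch only; ExpressionDemo, midpoint, and average are hypothetical names.
final class ExpressionDemo {

    // An int expression: integer division discards the fractional part.
    static int midpoint(int lo, int hi) {
        return lo + (hi - lo) / 2;
    }

    // A double expression built from a variable, a literal, and two operators.
    static double average(double x) {
        return (x + 2.236) / 2;
    }

    public static void main(String[] args) {
        int lo = 0;                  // identifier lo, int literal 0
        int hi = 9;                  // identifier hi, int literal 9
        double x = 3.14;             // double literal
        boolean inOrder = lo <= hi;  // a comparison defines a boolean value
        char plus = '+';             // char literal

        System.out.println(midpoint(lo, hi));
        System.out.println(average(x));
        System.out.println(inOrder);
        System.out.println(plus);
    }
}
```

Because lo and hi are int values, (hi - lo) / 2 uses integer division, so midpoint(0, 9) evaluates to 4 rather than 4.5.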
term | examples | definition
primitive data type | int double boolean char | a set of values and a set of operations on those values (built in to the Java language)
identifier | a abc Ab$ a_b ab123 lo hi | a sequence of letters, digits, _, and $, the first of which is not a digit
variable | [any identifier] | names a data-type value
operator | + - * / | names a data-type operation
literal | int: 1 0 -42; double: 1.0e-15 3.14; boolean: true false; char: 'a' '+' '9' '\n' | source-code representation of a value
expression | int: lo + (hi - lo)/2; double: 1.0e-15 * t; boolean: lo <= hi | [...]

[...] We put within angle brackets (< >) a construct that we have already defined, to indicate that we can use any instance of that construct where specified. In this case, <boolean expression> represents an expression that has a boolean value, such as one involving a comparison operation, and <block statements> represents a sequence of Java statements. It is possible to make formal definitions of <boolean expression> and <block statements>, but we refrain from going into that level of detail. The meaning of an if statement is self-explanatory: the statement(s) in the block are to be executed if and only if the boolean expression is true. The if-else statement:
    if (<boolean expression>) { <block statements> }
    else { <block statements> }
allows for choosing between two alternative blocks of statements.

Loops. Many computations are inherently repetitive. The basic Java construct for handling such computations has the following format:
    while (<boolean expression>) { <block statements> }
The while statement has the same form as the if statement (the only difference being the use of the keyword while instead of if), but the meaning is quite different. It is an instruction to the computer to behave as follows: if the boolean expression is false, do nothing; if the boolean expression is true, execute the sequence of statements in the block (just as with if) but then check the boolean expression again, execute the sequence of statements in the block again if the boolean expression is true, and continue as long as the boolean expression is true. We refer to the statements in the block in a loop as the body of the loop.

Break and continue.
Some situations call for slightly more complicated control flow than provided by the basic if and while statements. Accordingly, Java supports two additional statements for use within while loops:
■ The break statement, which immediately exits the loop
■ The continue statement, which immediately begins the next iteration of the loop
We rarely use these statements in the code in this book (and many programmers never use them), but they do considerably simplify code in certain instances.

Shortcut notations

There are several ways to express a given computation; we seek clear, elegant, and efficient code. Such code often takes advantage of the following widely used shortcuts (that are found in many languages, not just Java).

Initializing declarations. We can combine a declaration with an assignment to initialize a variable at the same time that it is declared (created). For example, the code int i = 1; creates an int variable named i and assigns it the initial value 1. A best practice is to use this mechanism close to first use of the variable (to limit scope).

Implicit assignments. The following shortcuts are available when our purpose is to modify a variable’s value relative to its current value:
■ Increment/decrement operators: i++ is the same as i = i + 1 and has the value i in an expression. Similarly, i-- is the same as i = i - 1. The code ++i and --i are the same except that the expression value is taken after the increment/decrement, not before.
■ Other compound operations: Prepending a binary operator to the = in an assignment is equivalent to using the variable on the left as the first operand. For example, the code i/=2; is equivalent to the code i = i/2; Note that i += 1; has the same effect as i = i+1; (and i++).

Single-statement blocks. If a block of statements in a conditional or a loop has only a single statement, the curly braces may be omitted.

For notation.
Many loops follow this scheme: initialize an index variable to some value and then use a while loop to test a loop continuation condition involving the index variable, where the last statement in the while loop increments the index variable. You can express such loops compactly with Java’s for notation:
    for (<initialize>; <boolean expression>; <increment>) { <block statements> }
This code is, with only a few exceptions, equivalent to
    <initialize>; while (<boolean expression>) { <block statements> <increment>; }
We use for loops to support this initialize-and-increment programming idiom.

statement | examples | definition
declaration | int i; double c; | create a variable of a specified type, named with a given identifier
assignment | a = b + 3; discriminant = b*b - 4.0*c; | assign a data-type value to a variable
initializing declaration | int i = 1; double c = 3.141592625; | declaration that also assigns an initial value
implicit assignment | i++; i += 1; | i = i + 1;
conditional (if) | if (x < 0) x = -x; | execute a statement, depending on boolean expression
conditional (if-else) | if (x > y) max = x; else max = y; | execute one or the other statement, depending on boolean expression
loop (while) | int v = 0; while (v [...] 1e-15*t) t = (c/t + t) / 2.0; | [...]
loop (for) | for (int i = 1; i [...] | [...]
call | [...] | [...]
return | [...] | [...]

[...] Examples of static methods are shown in the table on the facing page.

[Figure: Anatomy of a static method. Code fragment: [...] err * t) t = (c/t + t) / 2.0; return t; } Labels: body, return statement, call on another method]

Invoking a static method. A call on a static method is its name followed by expressions that specify argument values in parentheses, separated by commas. When the method call is part of an expression, the method computes a value and that value is used in place of the call in the expression. For example, the call on rank() in BinarySearch returns an int value. A method call followed by a semicolon is a statement that generally causes side effects. For example, the call Arrays.sort() in main() in BinarySearch is...
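To make the for notation and the static-method discussion above concrete, here is a hedged sketch. The code fragments that appear in this section (a test against err * t, the update t = (c/t + t) / 2.0, and return t) are consistent with a Newton's-method square root, so the sketch assumes that is the method pictured; the guard for negative arguments and the value of err are our assumptions. The class name StaticMethodSketch and the two harmonic-sum methods are hypothetical, added only to show a for loop beside its equivalent while loop.

```java
// Illustrative sketch; all names here are assumptions, not taken from the text.
final class StaticMethodSketch {

    // Square root of c by Newton's method (assumed reconstruction of the pictured method).
    static double sqrt(double c) {
        if (c < 0) return Double.NaN;          // assumed guard: no real square root
        double err = 1e-15;                    // assumed relative tolerance
        double t = c;                          // initial estimate
        while (Math.abs(t - c / t) > err * t)  // a call on another method, Math.abs()
            t = (c / t + t) / 2.0;             // replace t by the average of t and c/t
        return t;                              // return statement
    }

    // Sum 1 + 1/2 + ... + 1/n written with for notation.
    static double harmonicFor(int n) {
        double sum = 0.0;
        for (int i = 1; i <= n; i++)
            sum += 1.0 / i;
        return sum;
    }

    // The same computation as an initialize / test / increment while loop.
    static double harmonicWhile(int n) {
        double sum = 0.0;
        int i = 1;
        while (i <= n) {
            sum += 1.0 / i;
            i++;
        }
        return sum;
    }

    public static void main(String[] args) {
        // A call used inside an expression: its return value replaces the call.
        double root = sqrt(2.0);
        System.out.println(root);
        // Both loops perform the same additions in the same order.
        System.out.println(harmonicFor(10) == harmonicWhile(10));
    }
}
```

The two harmonic methods differ only in where the initialization and the increment are written, which is the equivalence the for template describes.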