To traverse means to visit the
vertices in some systematic order. You should be familiar with various
traversal methods for trees:
preorder: visit each node before its
postorder: visit each node after its
inorder (for binary trees only):
visit left subtree, node, right subtree.
We also saw another kind of
traversal, topological ordering, when I talked about shortest paths.
Today, we'll see two other
traversals: breadth first search (BFS) and depth first search (DFS). Both of
these construct spanning trees with certain properties useful in other graph
algorithms. We'll start by describing them in undirected graphs, but they are
both also very useful for directed graphs.
This can be throught of as being
like Dijkstra's algorithm for shortest paths, but with every edge having the
same length. However it is a lot simpler and doesn't need any data structures.
We just keep a tree (the breadth first search tree), a list of nodes to be
added to the tree, and markings (Boolean variables) on the vertices to tell whether
they are in the tree or list.
breadth first search:
starting vertex x
list L = x
tree T = x
while L nonempty
choose some vertex
v from front of list
for each unmarked
add it to end
add edge vw to
It's very important that you remove
vertices from the other end of the list than the one you add them to, so that
the list acts as a queue (fifo storage) rather than a stack (lifo). The
"visit v" step would be filled out later depending on what you are
using BFS for, just like the tree traversals usually involve doing something at
each vertex that is not specified as part of the basic algorithm. If a vertex
has several unmarked neighbors, it would be equally correct to visit them in
any order. Probably the easiest method to implement would be simply to visit
them in the order the adjacency list for v is stored in.
Let's prove some basic facts about
this algorithm. First, each vertex is clearly marked at most once, added to the
list at most once (since that happens only when it's marked), and therefore
removed from the list at most once. Since the time to process a vertex is
proportional to the length of its adjacency list, the total time for the whole
algorithm is O(m).
let's look at the tree T constructed by the algorithm. Why is it a tree? If you
think of each edge vw as pointing "upward" from w to v, then each
edge points from a vertex visited later to one visited earlier. Following
successive edges upwards can only get stopped at x (which has no edge going
upward from it) so every vertex in T has a path to x. This means that T is at
least a connected subgraph of G. Now let's prove that it's a tree. A tree is
just a connected and acyclic graph, so we need only to show that T has no
cycles. In any cycle, no matter how you orient the edges so that one direction
is "upward" and the other "downward", there is always a
"bottom" vertex having two upward edges out of it. But in T, each
vertex has at most one upward edge, so T can have no cycles. Therefore T really
is a tree. It is known as a breadth first search tree.
We also want to know that T is a spanning
tree, i.e. that if the graph is connected (every vertex has some path to
the root x) then every vertex will occur somewhere in T. We can prove this by
induction on the length of the shortest path to x. If v has a path of length k,
starting v-w-...-x, then w has a path of length k-1, and by induction would be included
in T. But then when we visited w we would have seen edge vw, and if v were not
already in the tree it would have been added.
Breadth first traversal of G
corresponds to some kind of tree traversal on T. But it isn't preorder,
postorder, or even inorder traversal. Instead, the traversal goes a level
at a time, left to right within a level (where a level is defined simply in
terms of distance from the root of the tree). For instance, the following tree
is drawn with vertices numbered in an order that might be followed by breadth
/ | \
2 3 4
| / | \
8 9 10 11
The proof that vertices are in this
order by breadth first search goes by induction on the level number. By the
induction hypothesis, BFS lists all vertices at level k-1 before those at level
k. Therefore it will place into L all vertices at level k before all those of
level k+1, and therefore so list those of level k before those of level k+1.
(This really is a proof even though it sounds like circular reasoning.)
Breadth first search trees have a
nice property: Every edge of G can be classified into one of three groups. Some
edges are in T themselves. Some connect two vertices at the same level of T.
And the remaining ones connect two vertices on two adjacent levels. It is not
possible for an edge to skip a level.
Therefore, the breadth first search
tree really is a shortest path tree starting from its root. Every vertex has a
path to the root, with path length equal to its level (just follow the tree
itself), and no path can skip a level so this really is a shortest path.
first search has several uses in other graph algorithms, but most are too
complicated to explain in detail here. One is as part of an algorithm for matching,
which is a problem in which you want to pair up the n vertices of a graph by
n/2 edges. If you have a partial matching, pairing up only some of the
vertices, you can extend it by finding an alternating path connecting
two unmatched vertices; this is a path in which every other edge is part of the
partial matching. If you remove those edges in the path from the matching, and
add the other path edges back into the matching, you get a matching with one
more edge. Alternating paths can be found using a version of breadth first
A second use of breadth first search
arises in certain pattern matching problems. For instance, if you're looking
for a small subgraph such as a triangle as part of a larger graph, you know
that every vertex in the triangle has to be connected by an edge to every other
vertex. Since no edge can skip levels in the BFS tree, you can divide the
problem into subproblems, in which you look for the triangle in pairs of
adjacent levels of the tree. This sort of problem, in which you look for a
small graph as part of a larger one, is known as subgraph isomorphism.
In a recent paper, I used this idea
to solve many similar pattern-matching problems in linear time.
Depth first search is another way of
traversing graphs, which is closely related to preorder traversal of a tree.
Recall that preorder traversal simply visits each node before its children. It
is most easy to program as a recursive routine:
for each child w
To turn this into a graph traversal
algorithm, we basically replace "child" by "neighbor". But
to prevent infinite loops, we only want to visit each vertex once. Just like in
BFS we can use marks to keep track of the vertices that have already been
visited, and not visit them again. Also, just like in BFS, we can use this
search to build a spanning tree with certain useful properties.
for each neighbor
w of v
if w is
add edge vw to
The overall depth first search
algorithm then simply initializes a set of markers so we can tell which
vertices are visited, chooses a starting vertex x, initializes tree T to x, and
calls dfs(x). Just like in breadth first search, if a vertex has several
neighbors it would be equally correct to go through them in any order. I didn't
simply say "for each unvisited neighbor of v" because it is very
important to delay the test for whether a vertex is visited until the recursive
calls for previous neighbors are finished.
that this produces a spanning tree (the depth first search tree) is
essentially the same as that for BFS, so I won't repeat it. However while the
BFS tree is typically "short and bushy", the DFS tree is typically
"long and stringy".
Just like we did for BFS, we can use
DFS to classify the edges of G into types. Either an edge vw is in the DFS tree
itself, v is an ancestor of w, or w is an ancestor of v. (These last two cases
should be thought of as a single type, since they only differ by what order we
look at the vertices in.) What this means is that if v and w are in different
subtrees of v, we can't have an edge from v to w. This is because if such an
edge existed and (say) v were visited first, then the only way we would avoid
adding vw to the DFS tree would be if w were visited during one of the
recursive calls from v, but then v would be an ancestor of w.
As an example of why this property
might be useful, let's prove the following fact: in any graph G, either G has
some path of length at least k. or G has O(kn) edges.
Proof: look at the longest path in
the DFS tree. If it has length at least k, we're done. Otherwise, since each
edge connects an ancestor and a descendant, we can bound the number of edges by
counting the total number of ancestors of each descendant, but if the longest
path is shorter than k, each descendant has at most k-1 ancestors. So there can
be at most (k-1)n edges.
This fact can be used as part of an
algorithm for finding long paths in G, another subgraph isomorphism problem
closely related to the traveling salesman problem. If k is a small constant
(like say 5) you can find paths of length k in linear time (measured as a
function of n). But measured as a function of k, the time is exponential, which
isn't surprising because this problem is closely related to the traveling
salesman problem. For more on this particular problem, see Michael R. Fellows
and Michael A. Langston, "On search, decision and the efficiency of
polynomial-time algorithms", 21st ACM Symp. Theory of Computing, 1989, pp.
between BFS and DFS
It may not be clear from the
pseudo-code above, but BFS and DFS are very closely related to each other. (In
fact in class I tried to describe a search in which I modified the "add to
end of list" line in the BFS pseudocode to "add to start of
list" but the resulting traversal algorithm was not the same as DFS.)
With a little care it is possible to
make BFS and DFS look almost the same as each other (as similar as, say, Prim's
and Dijkstra's algorithms are to each other):
list L = empty
tree T = empty
choose a starting
(v,w) from start of L
if w not yet
add (v,w) to T
list L = empty
tree T = empty
choose a starting
(v,w) from end of L
if w not yet
add (v,w) to T
for each edge
add edge (v,w)
to end of L
Both of these search algorithms now keep a list of edges to
explore; the only difference between the two is that, while both algorithms
adds items to the end of L, BFS removes them from the beginning, which results
in maintaining the list as a queue, while DFS removes them from the end,
maintaining the list as a stack