Connectivity in Undirected Graphs

In this lesson, we will look at connectivity in undirected graphs. Recall that the edges of an undirected graph are bidirectional, meaning that if there's a path from $a$ to $b$, then there's also a path from $b$ to $a$.

A connected component is a maximal set of nodes in which every two nodes are connected.

Here are the connected components in an example graph, where each connected component is colored by a single color:

More formally, if we let $a \sim b$ mean $a$ is connected to $b$, then:

  1. $a \sim a$ for every node $a$ (reflexivity);
  2. if $a \sim b$, then $b \sim a$ (symmetry);
  3. if $a \sim b$ and $b \sim c$, then $a \sim c$ (transitivity).

Hence, $\sim$ is what you would call an equivalence relation, and the equivalence classes under $\sim$ are the connected components of the graph.

Finding connected components can be done with BFS or DFS, which you have probably already seen. Both algorithms work fine (though DFS usually goes deeper into recursion than BFS).
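
As a concrete sketch (not part of the lesson proper; the function name `connected_components` and the adjacency-list representation `adj` are my own choices for this illustration), here is one way to label components with BFS in Python:

```python
from collections import deque

def connected_components(n, adj):
    """Label each node with a component id using BFS; returns (count, comp)."""
    comp = [-1] * n  # comp[i] = component id of node i, or -1 if not yet visited
    count = 0
    for s in range(n):
        if comp[s] != -1:
            continue  # s already belongs to some component
        comp[s] = count
        queue = deque([s])
        while queue:
            i = queue.popleft()
            for j in adj[i]:
                if comp[j] == -1:
                    comp[j] = count
                    queue.append(j)
        count += 1
    return count, comp

# Example: 5 nodes, edges 0-1, 1-2, 3-4 → two components
adj = [[1], [0, 2], [1], [4], [3]]
print(connected_components(5, adj))  # → (2, [0, 0, 0, 1, 1])
```

Swapping the BFS for a recursive DFS gives the same components, just discovered in a different order.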

A cycle is a nontrivial path from $a$ to itself. We say a graph is acyclic if it has no cycles. An undirected acyclic graph is called a forest. A connected forest is a tree.

The natural question is how to detect whether there's a cycle in an undirected graph. For this, you can use BFS or DFS again to detect if a cycle exists (how?) and find such a cycle (how?) in $O(n + e)$ time. A fancier (and perhaps easier-to-implement) approach is union-find, although it runs in $O(n + e\cdot \alpha(n))$, where $\alpha$ is the slow-growing inverse Ackermann function. Due to this factor, union-find is noticeably slower when $n \ge 2^{2^{2^{65536}}}$, so this is unacceptable. Just kidding! $\alpha(2^{2^{2^{65536}}}) \le 5$, so in practice this is fine. (Though if the time limits are particularly tight, that factor of $5$ sometimes bites, so it could be better to use BFS/DFS instead. You have to judge when to use union-find. Note also that union-find only detects that a cycle exists; actually finding the cycle is harder with it.)
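
To illustrate the union-find idea, here is a minimal Python sketch (the names are mine, and for simplicity it uses only path halving rather than full union by rank, so it doesn't quite achieve the $\alpha(n)$ bound; the core idea is unchanged): an edge whose endpoints already lie in the same set closes a cycle.

```python
def has_cycle(n, edges):
    """Union-find: an edge joining two nodes already in the same set closes a cycle."""
    parent = list(range(n))  # each node starts as its own representative

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra == rb:
            return True  # a and b were already connected, so (a, b) closes a cycle
        parent[ra] = rb  # merge the two sets

    return False

print(has_cycle(4, [(0, 1), (1, 2), (2, 0)]))  # → True
print(has_cycle(4, [(0, 1), (1, 2), (2, 3)]))  # → False
```

As noted above, this only tells you that a cycle exists; it doesn't recover the cycle itself.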

Depth-first search revisited (undirected version)

Now it's time to take a closer look at one of the simplest graph traversal methods: DFS. The DFS will be central in the algorithms we'll discuss later on, so now is a good time to revisit DFS with more detail.

DFS, in its simplest form, is just a particular order of traversal of the graph determined by the following recursive procedure: (in pseudocode. I expect you can easily convert pseudocode into real code by now!)

    function DFS(i):
        // perform a DFS starting at node i
        visited[i] = true // mark it as visited
        for j in adj[i]:
            if not visited[j]:
                DFS(j)

    function DFS_all():
        // perform a DFS on the whole graph
        for i = 0..n-1:
            visited[i] = false
        for i = 0..n-1:
            if not visited[i]:
                DFS(i)

On its own, it's not very interesting, since all this does is visit all nodes (and mark “visited[i]” as true). But we can actually extract more information from this DFS procedure. The first (useful) thing we can do is generalize:

    function DFS(i, p):
        visited[i] = true
        parent[i] = p
        visit_start(i) // do something once a new node is visited
        for j in adj[i]:
            if not visited[j]:
                DFS(j, i)
        visit_end(i) // do something once a node has finished expanding

    function DFS_all():
        for i = 0..n-1:
            visited[i] = false
            parent[i] = -1
        for i = 0..n-1:
            if not visited[i]:
                DFS(i, -1)

Here, the functions “visit_start” and “visit_end” are whatever you want to do with the nodes. Each will be called once per node, in order of the nodes' starting and ending times, respectively. This generalized version is quite useful in many cases.

Notice also that we're computing the “parent” array, denoting the parent of the nodes in the traversal. This could be useful in some cases.
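
As a sketch of how these hooks might be used (names are mine), here is a Python rendition of the generalized DFS in which `visit_start` and `visit_end` simply record the nodes in order of their starting and ending times:

```python
def dfs_orders(n, adj):
    """Generalized DFS: collect nodes in order of visit_start and visit_end."""
    visited = [False] * n
    parent = [-1] * n
    start_order, end_order = [], []

    def dfs(i, p):
        visited[i] = True
        parent[i] = p
        start_order.append(i)   # plays the role of visit_start(i)
        for j in adj[i]:
            if not visited[j]:
                dfs(j, i)
        end_order.append(i)     # plays the role of visit_end(i)

    for i in range(n):
        if not visited[i]:
            dfs(i, -1)
    return start_order, end_order, parent

# A path 0-1-2: nodes start in order 0, 1, 2 and finish in order 2, 1, 0.
adj = [[1], [0, 2], [1]]
print(dfs_orders(3, adj))  # → ([0, 1, 2], [2, 1, 0], [-1, 0, 1])
```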

But actually, DFS is more interesting and useful than that; there are still other bits of information we can extract from this traversal that can help us solve some problems.

To find this hidden information, let's consider a particular DFS traversal, and then draw the nodes on paper so that each node appears “beneath” its parent, and the “children” of a node are ordered from left to right according to the order of visitation. For instance, it might look like this:

The black edges represent the edges that are traversed by the DFS. We call such edges tree edges. The remaining edges, colored in blue, are the ones ignored by the DFS, and are called back edges. Thus, we can clearly see that DFS classifies the edges into one of two types, depending on how they were handled by the DFS traversal!

We'll see later on that this classification will be useful for us.

Note that, due to the depth-first nature of DFS, other types of edges in the DFS forest can't appear, such as nontree edges that don't point to an ancestor. Please convince yourself of this.

If we consider only the black edges, then we get a forest. For this reason, this is sometimes called the DFS forest of the graph.

To be more specific, the classification of the edges is done by the following procedure:

    function DFS(i, p):
        start_time[i] = time++
        parent[i] = p
        for j in adj[i]:
            if start_time[j] == -1:
                mark (i, j) as a tree edge
                DFS(j, i)
            else if j != p:
                mark (i, j) as a back edge
        finish_time[i] = time++

    function DFS_all():
        time = 0
        for i = 0..n-1:
            start_time[i] = -1
            finish_time[i] = -1
            parent[i] = -1
        for i = 0..n-1:
            if start_time[i] == -1:
                DFS(i, -1)

An important thing to note here is that the condition j != p is checked before marking an edge as a back edge; this is very important, since otherwise we would mark all edges as back edges! (Why?) (A further complication in this part of the code arises when there are parallel edges, that is, multiple edges connecting the same pair of nodes. In this discussion, we'll assume there are no parallel edges. But if you want to learn how to deal with them, you can try to figure it out yourself, or just ask :))

Notice that we've also replaced the visited array with two new arrays, start_time and finish_time, which contain the starting and finishing times of each node's visitation. For now, there aren't many uses for them, but they will become more useful later on.

The running time is still $O(n + e)$, but along the way, we've gained more information about the DFS traversal, namely the edge classifications, the starting and ending times of each visitation, and the parents of the nodes! These pieces of information will prove valuable in the upcoming algorithms.
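
Putting the pieces together, a runnable Python sketch of the classification might look like this (names are mine; one small deviation from the pseudocode: it records each back edge only once, from the deeper endpoint, by comparing starting times, whereas the pseudocode marks it from both endpoints):

```python
def classify_edges(n, adj):
    """Classify each undirected edge as a tree edge or a back edge via DFS."""
    start_time = [-1] * n
    finish_time = [-1] * n
    parent = [-1] * n  # kept to mirror the pseudocode
    tree_edges, back_edges = [], []
    time = 0

    def dfs(i, p):
        nonlocal time
        start_time[i] = time
        time += 1
        parent[i] = p
        for j in adj[i]:
            if start_time[j] == -1:
                tree_edges.append((i, j))
                dfs(j, i)
            elif j != p and start_time[j] < start_time[i]:
                # j is a proper ancestor of i: record the back edge once
                back_edges.append((i, j))
        finish_time[i] = time
        time += 1

    for i in range(n):
        if start_time[i] == -1:
            dfs(i, -1)
    return tree_edges, back_edges

# Triangle 0-1-2: two tree edges and one back edge
tree, back = classify_edges(3, [[1, 2], [0, 2], [0, 1]])
print(tree, back)  # → [(0, 1), (1, 2)] [(2, 0)]
```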

By the way, note that the implementation above is written recursively. In some large trees, stack overflow might be a concern, especially if the stack size is limited. In those cases, you might want to implement an iterative version. (A generic way to do that conversion is to simulate the call stack with an actual stack, and the stack entries describe the whole state of the function at that point. In this case, you only need to push “i” and the index of “j” in “adj[i]”.)
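
Following that conversion recipe, an iterative Python sketch (names are mine) could push pairs of “i” and the current index into “adj[i]” onto an explicit stack:

```python
def dfs_iterative(n, adj):
    """Iterative DFS: each stack entry is (node, index of next neighbor to try)."""
    visited = [False] * n
    parent = [-1] * n
    order = []  # nodes in order of first visit
    for s in range(n):
        if visited[s]:
            continue
        visited[s] = True
        order.append(s)
        stack = [(s, 0)]
        while stack:
            i, k = stack.pop()
            if k < len(adj[i]):
                stack.append((i, k + 1))  # resume i at the next neighbor later
                j = adj[i][k]
                if not visited[j]:
                    visited[j] = True
                    parent[j] = i
                    order.append(j)
                    stack.append((j, 0))  # descend into j
    return order, parent

# A path 0-1-2 is visited in the same order as the recursive version.
adj = [[1], [0, 2], [1]]
print(dfs_iterative(3, adj))  # → ([0, 1, 2], [-1, 0, 1])
```

The stack entries capture exactly the state the recursive version keeps on the call stack, so the visitation order matches the recursive DFS.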

Bridges and articulation points

A bridge is an edge whose removal increases the number of connected components. An articulation point is a node whose removal increases the number of connected components. Note that removing a node also removes the edges incident to it.

For simplicity, let's assume here that the graph is connected; if not, we can consider each connected component separately. Thus, we will use the following specialized definitions: In a connected graph, a bridge is an edge whose removal disconnects the graph, and an articulation point is a node whose removal disconnects the graph.

In the following picture, the blue edges are the bridges, and the red nodes are the articulation points:

It's easy to see why one would identify and study these edges/nodes. Roughly speaking, these edges and nodes are the “weakest links” of your network; if you're modelling a computer network, then a bridge could represent a critical connection, and an articulation point could represent a critical computer.

Bridges and articulation points are also sometimes called “cut edges” and “cut vertices”, respectively, for obvious reasons.

Finding bridges and articulation points

Given a (connected) undirected graph, how do you find all its bridges and articulation points? If the graph is small enough, you can check each edge/node individually: remove it and check whether the resulting graph is disconnected. Since it takes $O(n + e)$ time to traverse a graph, finding the bridges and articulation points this way takes $O(e(n + e))$ and $O(n(n + e))$ time, respectively.
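
As a Python sketch of this brute-force method (names are mine), we can count components with a BFS that optionally ignores one edge or one node:

```python
from collections import deque

def count_components(n, adj, skip_edge=None, skip_node=None):
    """Count connected components with BFS, optionally ignoring one edge or one node."""
    seen = [False] * n
    if skip_node is not None:
        seen[skip_node] = True  # pretend the node was removed
    count = 0
    for s in range(n):
        if seen[s]:
            continue
        count += 1
        seen[s] = True
        queue = deque([s])
        while queue:
            i = queue.popleft()
            for j in adj[i]:
                if (i, j) == skip_edge or (j, i) == skip_edge:
                    continue  # pretend the edge was removed
                if not seen[j]:
                    seen[j] = True
                    queue.append(j)
    return count

def brute_force_bridges(n, adj, edges):
    base = count_components(n, adj)
    return [e for e in edges if count_components(n, adj, skip_edge=e) > base]

def brute_force_articulation_points(n, adj):
    base = count_components(n, adj)
    return [v for v in range(n) if count_components(n, adj, skip_node=v) > base]

# Path 0-1-2: both edges are bridges, and the middle node is an articulation point.
adj = [[1], [0, 2], [1]]
print(brute_force_bridges(3, adj, [(0, 1), (1, 2)]))   # → [(0, 1), (1, 2)]
print(brute_force_articulation_points(3, adj))         # → [1]
```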

But it turns out that we can compute both in $O(n + e)$ time using DFS! To see how, let's say we performed DFS on our graph. The following is a picture of the resulting “DFS forest” (which is really just a “DFS tree” since the graph is connected):

Stare at this picture for a while and think about exactly when an edge is a bridge, or when a node is an articulation point.

After a bit of pondering, we can state the conditions precisely. Note that the terms “ancestor” and “descendant” refer to the nodes' relationships in the DFS tree, and a node is considered an ancestor and a descendant of itself.

  1. A tree edge $(i, j)$, where $j$ is a child of $i$, is a bridge if and only if no back edge connects a descendant of $j$ to an ancestor of $i$.
  2. A non-root node $i$ is an articulation point if and only if it has a child $j$ such that no back edge connects a descendant of $j$ to a proper ancestor of $i$.
  3. The root is an articulation point if and only if it has at least two children in the DFS tree.

Take note of the second case regarding articulation points; it's easy to miss, but important.

With these observations, we can now compute the bridges and articulation points by augmenting DFS with additional data. For each node $i$, define:

  1. ${\mathrm{disc}}[i]$: the discovery time of $i$, i.e., the time at which the DFS first visits $i$;
  2. ${\mathrm{low}}[i]$: the smallest discovery time among $i$ itself and all nodes reachable from a descendant of $i$ via a single back edge.

Using ${\mathrm{disc}}[i]$ and ${\mathrm{low}}[i]$, we can now state precisely when something is a bridge or an articulation point:

  1. A tree edge $(i, j)$, where $j$ is a child of $i$, is a bridge if and only if ${\mathrm{low}}[j] > {\mathrm{disc}}[i]$.
  2. A non-root node $i$ is an articulation point if and only if it has a child $j$ with ${\mathrm{low}}[j] \ge {\mathrm{disc}}[i]$.
  3. The root is an articulation point if and only if it has at least two children.

Thus, the only remaining task is to compute ${\mathrm{disc}}[i]$ and ${\mathrm{low}}[i]$ for each $i$. But we can compute these values during the DFS, like so:

    function DFS(i, p):
        disc[i] = low[i] = time++
        children = 0
        has_low_child = false
        for j in adj[i]:
            if disc[j] == -1:
                // this means (i, j) is a tree edge
                DFS(j, i)
                // update low[i] and other data
                low[i] = min(low[i], low[j])
                children++
                if low[j] > disc[i]:
                    mark edge (i, j) as a bridge
                if low[j] >= disc[i]:
                    has_low_child = true
            else if j != p:
                // this means (i, j) is a back edge
                low[i] = min(low[i], disc[j])
        if (p == -1 and children >= 2) or (p != -1 and has_low_child):
            mark i as an articulation point

    function bridges_and_articulation_points():
        time = 0
        for i = 0..n-1:
            disc[i] = -1
        for i = 0..n-1:
            if disc[i] == -1:
                DFS(i, -1)

This procedure now correctly identifies all bridges and articulation points in the graph!
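
For reference, here is a runnable Python sketch of the whole procedure (names are mine; it assumes no parallel edges, as discussed earlier):

```python
import sys

def bridges_and_articulation_points(n, adj):
    """DFS-based computation of bridges and articulation points (no parallel edges)."""
    sys.setrecursionlimit(max(10000, 2 * n + 10))  # the DFS is recursive
    disc = [-1] * n
    low = [0] * n
    bridges, artic_points = [], []
    time = 0

    def dfs(i, p):
        nonlocal time
        disc[i] = low[i] = time
        time += 1
        children = 0
        has_low_child = False
        for j in adj[i]:
            if disc[j] == -1:
                # (i, j) is a tree edge
                dfs(j, i)
                low[i] = min(low[i], low[j])
                children += 1
                if low[j] > disc[i]:
                    bridges.append((i, j))
                if low[j] >= disc[i]:
                    has_low_child = True
            elif j != p:
                # (i, j) is a back edge
                low[i] = min(low[i], disc[j])
        if (p == -1 and children >= 2) or (p != -1 and has_low_child):
            artic_points.append(i)

    for i in range(n):
        if disc[i] == -1:
            dfs(i, -1)
    return bridges, artic_points

# Triangle 0-1-2 with a pendant edge 2-3: (2, 3) is a bridge, 2 is an articulation point.
adj = [[1, 2], [0, 2], [0, 1, 3], [2]]
print(bridges_and_articulation_points(4, adj))  # → ([(2, 3)], [2])
```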

An important thing to understand here is how ${\mathrm{low}}[i]$ is being computed. Please ensure that you understand it.