Efficient Graph Search

It is a spring Monday in the Twin Cities of Minneapolis and St. Paul. The weather forecast calls for humid and overcast day with the possibility of thunderstorms this afternoon. In general, I try to shutdown my computers at home as soon as I hear thunder. So far all is good. Hopefully the storms will start later in the day after my workday is over!

The weekend was rather nice. It was humid and warm. It seems that the weather has changed more than usual this year. In Lima Peru, there are no storms or any considerable rain. Apparently earlier today some lightning and associate thunder passed by the coastal city. As far as I know this is a first.

Last week I ran into the article: Efficient Graph Search by Terence Kelly.

The article covers basic DFS and how a simple modification is able to provide in practical graphs considerable performance increase. Lately I have been interested in graphs so when I saw the article in question I decided to read it and experiment with some code.

The article provides a couple examples in what appears to be C++ code. I decided to use the Java programming language and the VSCode IDE on a Windows 10 computer. Of course you can use the programming language and IDE of your choice.

Since the article provides just two functions / methods I had to write a test scaffold in order to experiment with the code. Please note that the author provides source code for the original article. I believe he also provides code to generate more complicated graphs. If interested in the subject, you can download the original code. The article contains a link to download a tar archive.

9
13
1,2
1, 3
1,4
2,9
 2,8
4,3
4, 5
3,6
3,7
 5, 6
6,7
7,8
8, 9
main <<< V: 9
main <<< m: 13
main <<< edges: [[1, 2], [1, 3], [1, 4], [2, 9], [2, 8], [4, 3], [4, 5], [3, 6], [3, 7], [5, 6], [6, 7], [7, 8], [8, 9]]
v (0)
v (1) -> 2 -> 3 -> 4
v (2) -> 1 -> 9 -> 8
v (3) -> 1 -> 4 -> 6 -> 7
v (4) -> 1 -> 3 -> 5
v (5) -> 4 -> 6
v (6) -> 3 -> 5 -> 7
v (7) -> 3 -> 6 -> 8
v (8) -> 2 -> 7 -> 9
v (9) -> 2 -> 8
main <<< s: (val: 1 par: 0 dist: 0)
classicBFS <<< f: 9
main <<< g:
V: 9 E: 13
(val: 1 par: 1 dist: 0)
(val: 2 par: 1 dist: 1)
(val: 3 par: 1 dist: 1)
(val: 4 par: 1 dist: 1)
(val: 5 par: 4 dist: 2)
(val: 6 par: 3 dist: 2)
(val: 7 par: 3 dist: 2)
(val: 8 par: 2 dist: 2)
v (0)
v (1) -> 2 -> 3 -> 4
v (2) -> 1 -> 9 -> 8
v (3) -> 1 -> 4 -> 6 -> 7
v (4) -> 1 -> 3 -> 5
v (5) -> 4 -> 6
v (6) -> 3 -> 5 -> 7
v (7) -> 3 -> 6 -> 8
v (8) -> 2 -> 7 -> 9
v (9) -> 2 -> 8
efficientBFS <<< f: 9 q.size: 5
main <<< g:
V: 9 E: 13
(val: 1 par: 2 dist: 2)
(val: 2 par: 1 dist: 1)
(val: 3 par: 1 dist: 1)
(val: 4 par: 1 dist: 1)
(val: 5 par: 4 dist: 2)
(val: 6 par: 3 dist: 2)
(val: 7 par: 3 dist: 2)
(val: 8 par: 2 dist: 2)

I decided to start by providing data so our test code could generate a graph with it. At some point I was just going to provide the first two lines which map to the number of node and vertices respectively. The test code would automatically generate the edges. I decided to directly experiment with slightly different graphs by altering the values for the first two lines and editing the list of edges that follow.

After the input data is processed, the program displays the number of vertices followed by the number of edges and then the list of actual edges.

There are two main ways to define a graph. One is to use a linked list and the other a matrix for the adjacent nodes. As we will see, the adjacent list was implemented. I left the adjacency matrix as an exercise. Please note that in general one uses an adjacency list for sparse graphs and an adjacency matrix for graphs with larger number of nodes and edges.

The adjacency list is then displayed. In our case we have nine nodes (node 0 is not used) and the values for the nodes start with one and end with V.

Once the graph is created we could have prompted for the source node for the BFS operation. I decided to just change the value and not prompt for it. In our example we use node one as the root for our BFS traversals.

I will not go into detail on what the variable ‘f’ is used for yet.

After the DFS traversal the contents of the graph nodes is displayed. The nodes contain a value (or name), the value of the parent vertex, and the distance of the vertex to the root used to start the traversal.

Note that similar data is displayed a second time starting with the adjacency list. The reason is that we first call DFS and then we call the more efficient version of DFS.

    /**
     * Test scaffold
     * 
     * @throws IOException
     */
    public static void main(String[] args) throws IOException {
        
        // **** initialization ****
        List<List<Integer>> edges = new ArrayList<>();

        // **** open buffered reader ****
        BufferedReader br = new BufferedReader(new InputStreamReader(System.in));

        // **** read number of vertices in graph ****
        int V = Integer.parseInt(br.readLine().trim());

        // ???? ????
        System.out.println("main <<< V: " + V);

        // **** read number of edges in graph ****
        int m = Integer.parseInt(br.readLine().trim());

        // ???? ????
        System.out.println("main <<< m: " + m);

        // **** read graph edges ****
        IntStream.range(0, m).forEach(i -> {
            try {
                edges.add(
                    Arrays.stream(br.readLine().split(","))
                        .map(s -> s.replace(" ", ""))
                        .map(Integer::parseInt)
                        .collect(toList())
                );
            } catch (IOException ex) {
                throw new RuntimeException(ex);
            }
        });

        // **** close the buffered reader ****
        br.close();

        // ???? ????
        System.out.println("main <<< edges: " + edges.toString());

        // **** create and populate the graph ****
        MyGraph g = new MyGraph(V, edges, AdjacencyType.LIST);

        // ???? ????
        g.printGraph();


        // **** perform DFS from this vertex ****
        GraphNode s = g.G.get(1);

        // ???? ????
        System.out.println("main <<< s: " + s.toString());

        // **** 'classic' BFS from specified vertex ****
        g.classicBFS(s);

        // ???? ????
        System.out.println("main <<< g:");
        System.out.print(g.toString());

        // **** create and populate the graph ****
        g = new MyGraph(V, edges, AdjacencyType.LIST);

        // ???? ????
        g.printGraph();

        // **** 'efficient' BFS from specified vertex ****
        g.efficientBFS(s);
        
        // ???? ????
        System.out.println("main <<< g:");
        System.out.print(g.toString());
    }

This is the test scaffold we use to accept input, generate the graph and make the two calls to the classicBFS() and then to the efficientBFS() methods.

I believe the code has enough comments that make it easy to follow. Of course if you have questions please leave me a comment in the bottom section of the post.

/**
 * GraphNode class
 */
public class GraphNode {
    
    /**
     * 
     */
    public int distance;
    public int parent;
    public int val;

    /**
     * Constructor
     */
    public GraphNode(int value) {
        this.val = value;
    }

    /**
     * Return string with node values. 
     */
    @Override
    public String toString() {
        return "(val: " + val + " par: " + parent + " dist: " + distance + ")" ;
    }
}

The GraphNode class is used to represent nodes in our graph. We have three members. The ‘val’ field represents the value or the name of the node. Nodes start with value 1 and are in monotonically ascending order. Note that internally we leave a place for node 0 which is not used in this example.

// **** adjacency type ****
enum AdjacencyType {
    LIST, MATRIX, UNDEFINED
}

The enum is used to create graphs using an adjacency list or a matrix. As I mentioned I just implemented the adjacency list. If requested I can implement the adjacency matrix.

    /**
     * Class members.
     */
    public ArrayList<ArrayList<Integer>> adjList    = null;
    public int[][] adjMatrix                        = null;
    public AdjacencyType adjType                    = AdjacencyType.UNDEFINED;
    public int V                                    = 0;
    public int E                                    = 0;
    public HashMap<Integer, GraphNode> G            = null;

By now we should have a reasonable idea on how the first five members of the Graph class are used. The sixth member is used to manage the nodes. We index the vertices by the value of the node. This allows access to a vertex in O(1) time.

    /**
     * Constructor(s)
     */
    public MyGraph(int V, List<List<Integer>> edges, AdjacencyType adjType) {

        // **** sanity checks ****
        if (V <= 0)
            return;

        if (edges == null || edges.size() == 0)
            return;

        if (!(adjType == AdjacencyType.LIST || adjType == AdjacencyType.MATRIX))
            return;

        // **** set the number of vertices in the graph ****
        this.V = V;

        // **** create adjacency list or matrix ****
        this.adjType = adjType;
        if (adjType == AdjacencyType.LIST) {

            // **** create adjacency list ****
            this.adjList = new ArrayList<ArrayList<Integer>>();

            // **** create empty entries in the adjacency list ****
            for (int i = 0; i < (V + 1); i++)
                adjList.add(new ArrayList<Integer>());

        } else if (this.adjType == AdjacencyType.MATRIX) {


        } else {
            System.out.println("MyGraph <<< unexpected adjType: " + this.adjType.toString());
            return;
        }

        // **** create the graph ****
        this.G = new HashMap<>();

        // **** traverse the edges populating the adjacency list and the graph - O(E) ****
        for (int i = 0; i < edges.size(); i++) {

            // **** get this edge ****
            List<Integer> edge = edges.get(i);

            // ???? ????
            // System.out.println("MyGraph <<< edge: " + edge.toString());

            // **** for ease of use ****
            int v = edge.get(0);
            int u = edge.get(1);

            // ???? ????
            // System.out.println("MyGraph <<< u: " + u + " v: " + v);

            // **** create the u node in the graph (if needed) ****
            if (!G.containsKey(u)) {
                G.put(u, new GraphNode(u));
            }

            // **** create the v node in the graph (if needed) ****
            if (!G.containsKey(v)) {
                G.put(v, new GraphNode(v));
            }

            // **** add this edge to the graph ****
            addEdge(u, v);
        }
    }

This is the constructor for a graph. It expects the number of vertices, a list of edges and the adjacency type. In our current implementation we only support linked lists.

We start by performing some sanity checks.

We then set the number of vertices in the graph.

The adjacency list is created and initialized. The hash map to reference the vertices in the graph is created.

We then traverse the list of edges. For ease of use we extract the value for the v and u vertices. With that information we create (if needed) the node for the graph with the proper value.

    /**
     * Utility function to add an edge in an undirected graph.
     */
    public void addEdge(int u, int v) {

        // **** ****
        if (this.adjType == AdjacencyType.LIST) {

            // **** add bidirectional edge ****
            this.adjList.get(u).add(v);
            this.adjList.get(v).add(u);

            // **** count the edge ****
            this.E++;
        } else if (this.adjType == AdjacencyType.MATRIX) {


        } else {
            System.out.println("addEdge <<< unexpected adjType: " + this.adjType.toString());
            return;
        }
    }

Finally we add the edges between the v and u vertices since we are dealing with a bidirectional graph.

    /**
     * Utility to print the contents of graph.
     */
    public void printGraph() {

        // **** print the graph based on the adjacency data structure - O(V)****
        if (this.adjType == AdjacencyType.LIST) {

            // **** ****
            for (int i = 0; i < adjList.size(); i++) {

                // **** start the list ****
                System.out.print("v (" + i + ")");

                // **** ****
                for (int j = 0; j < adjList.get(i).size(); j++)
                    System.out.print(" -> " + adjList.get(i).get(j));

                // **** done with the list ****
                System.out.println();
            }
        } else if (this.adjType == AdjacencyType.MATRIX) {

        } else {
            System.out.println("printGraph <<< unexpected adjType: " + this.adjType.toString());
            return;
        }
    }

This method is used to display the adjacency list (or matrix) for the graph.

 

After displaying a header we enter a loop in which the contents of the vertex adjacent to the current node are displayed.

    /**
     * toString
     */
    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder();
        sb.append("V: " + this.V + " E: " + this.E + "\n");
        for (int i = 1; i < this.G.size(); i++) {
            sb.append(this.G.get(i).toString() + "\n");
        }
        return sb.toString();
    }

This method is used to display the members of a node / vertex.

    // **** counts vertices with finalized output fields ****
    public int f = 1;

We declare a global variable that is used to count the number of vertices in a map that have been found. This variable does not need to be global. As we look further into the code it will become apparent why I decided to use a global variable. In production I would have changed existing code, but since this code is to experiment and learn, I just ran with the flow.

    /**
     * Classic BFS.
     * 
     * BFS computes for each vertex v two related outputs:
     * 
     * o distance from the source (i.e., the number of edges on a shortest path between s and v), 
     * o and a parent vertex chosen from v's neighbors.
     * 
     * BFS computes for each vertex v two related outputs: 
     * o distance from the source (i.e., the number of edges on a shortest path between s and v), 
     * o and a parent vertex chosen from v's neighbors. 
     * 
     * Moving to v's parent, grandparent, great-grandparent, etc. traces a shortest path to s
     * backtracking yields a shortest path from s to v.
     * 
     * Runtime: O(V + E)
     */
    public void classicBFS(GraphNode s) {

        // **** sanity check(s) ****
        if (s == null)
            return;

        // **** initialization ****
        s.distance  = 0;
        s.parent    = s.val;        // source is defined to be its own parent
        f           = 1;            // # of vertices found & finalized

        LinkedList<GraphNode> q = new LinkedList<>();

        // **** prime the queue ****
        q.add(s);

        // **** loop while the queue is not empty - O(V) ****
        while (!q.isEmpty()) {

            // **** remove head node ****
            GraphNode u = q.removeFirst();

            // **** get the adjacent list for node u ****
            List<Integer> adj = this.adjList.get(u.val);

            // **** for each vertex v adjacent to u - O(2 * E) ****
            adj.forEach( (node) -> {

                // **** get vertex from the graph ****
                GraphNode v = G.get(node);

                // **** process this vertex (if needed) ****
                if (v.parent == 0) {

                    // **** ****
                    v.distance = u.distance + 1;
                    v.parent = u.val;

                    // **** update counter ****
                    ++f;

                    // **** insert vertex into queue ****
                    q.add(v);
                }
            });
        }

        // ???? ????
        System.out.println("classicBFS <<< f: " + f);
    }

As we discussed earlier we have two BFS methods. One is labeled as traditional while the other is referred as efficient. As we will see, one method differs from the other in the use of the variable ‘f’ which we were introduced earlier.

We start by performing some sanity checks.

A initialization step is then performed. If you look at code that I have developed many years ago, you will see this pattern.

Note that the variable ‘f’ is initialized here. The queue used in standard BFS implementations is initialized and primed.

We now enter the main loop. We then remove the first and only entry in the queue.

We select the adjacency list associated with the vertex we just pulled from the queue.

We then loop through all the adjacent vertices. I used the forEach method which requires a lambda expression. Due to limitations of the lambda expression, we remove the forEach and replace it with a for loop, which could have been easy to do, or just declare the variable global.

As you can see the structure and implementation of the classic BFS method is well known and understood. The only difference is that the value of the ‘f’ variable in incremented and displayed before we exist the method. At this point we only know how the ‘f’ variable is used, and if we go back to the screen capture in which the code is executed, we know the final value.

   /**
     * Efficient BFS.
     * 
     * BFS computes for each vertex v two related outputs:
     * 
     * o distance from the source (i.e., the number of edges on a shortest path between s and v), 
     * o and a parent vertex chosen from v's neighbors.
     * 
     * BFS computes for each vertex v two related outputs: 
     * o distance from the source (i.e., the number of edges on a shortest path between s and v), 
     * o and a parent vertex chosen from v's neighbors. 
     * 
     * Moving to v's parent, grandparent, great-grandparent, etc. traces a shortest path to s
     * backtracking yields a shortest path from s to v.
     */
    public void efficientBFS(GraphNode s) {

        // **** sanity check(s) ****
        if (s == null)
            return;

        // **** initialization ****
        s.distance  = 0;
        s.parent    = s.val;        // source is defined to be its own parent
        f           = 1;            // # of vertices found & finalized
        LinkedList<GraphNode> q = new LinkedList<>();

        // **** prime the queue ****
        q.add(s);

        // **** loop while the queue is not empty ****
        while (!q.isEmpty()) {

            // **** remove head node ****
            GraphNode u = q.removeFirst();

            // **** get the adjacent list for node u ****
            List<Integer> adj = this.adjList.get(u.val);

            // **** for each vertex v adjacent to u ****
            adj.forEach( (node) -> {

                // **** get vertex from the graph ****
                GraphNode v = G.get(node);

                // **** process this vertex (if needed) ****
                if (v.parent == 0) {

                    // **** ****
                    v.distance = u.distance + 1;
                    v.parent = u.val;

                    // **** check if done ****
                    if (++f == this.V) {

                        // ???? ????
                        System.out.println("efficientBFS <<< f: " + f + " q.size: " + q.size());

                        // **** we are done ****
                        return;
                    }

                    // **** insert vertex into queue ****
                    q.add(v);
                }
            });
        }
    }

This is the efficient BFS implementation. The ‘f’ variable is used in order to exit the method and avoid additional processing which is irrelevant.

The sanity check and initialization steps are very similar to the previous DFS method. As a matter of fact most steps are the same.  What is different is the code added in the forEach loop. We are counting in the ‘f’ variable the number of vertices / nodes that we have processed so far. When we reach the value of V which represents the total number of nodes, we exit the method. Note that we display the value of ‘f’ and the size of the queue.

Take a minute and look at the screen capture of the execution of this program. In both implementations we display the value of 9 for ‘f’. In the second implementation we also display the number of vertices left in the queue that we need to process. The difference is that in the efficientBFS method we exit and do not spend additional time processing vertices that will not change the result.

Experiment as needed and realize that in most BFS implementations the number of vertices is typically not tracked and no action is taken. Depending on the graph values (number of vertices and number of edges) the savings could be huge.

If you go back to the original post, the savings are very well explained. I do not know about you, but unless there is a compelling reason, from now on I will add the ‘f’ check.

Hope you enjoyed solving this problem as much as I did. The entire code for this project can be found in my GitHub repository.

If you have comments or questions regarding this, or any other post in this blog, please do not hesitate and leave me a note below. I will reply as soon as possible.

Keep on reading and experimenting. It is one of the best ways to learn, become proficient, refresh your knowledge and enhance your developer toolset.

Thanks for reading this post, feel free to connect with me John Canessa at LinkedIn.

Regards;

John

Leave a Reply

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.