The motivation for this post is the Coursera class “Graph Analytics for Big Data” by the University of California San Diego I am currently taking. One of the algorithms that we briefly touched was shortest path between two nodes by Edsger Dijkstra.

The algorithm comes in different flavors. One can compute the shortest path between two nodes, the shortest paths between all nodes, among others. In this case I just went with the first approach.

For simplicity I used an adjacency matrix to represent the graphs. An adjacency matrix is one that represents distances between nodes. I left in the code two matrices for testing. The smaller follows:

// **** graph **** int[][] graph = { // A B C D E {0, 6, 0, 1, 0}, // A {6, 0, 5, 2, 2}, // B {0, 5, 0, 0, 5}, // C {1, 2, 0, 0, 1}, // D {0, 2, 5, 1, 0} // E };

Each row represents the distances from a vertex to another vertex. In our example you may note upper case letters across the columns and down the rows. If you wish to know which vertices are adjacent to node ‘D’, you select row 3 and move across. Each column with a non-zero entry represents the distance from node ‘D’ to its adjacent nodes. In this case ‘D’ is connected to ‘A’, ‘B’ and ‘E’. Following is a diagram of the graph generated using Visio:

I did spent time experimenting with different data structures to support the operations required to implement by the algorithm. The algorithm is relative expensive to implement, especially if you are dealing with hundreds of thousands vertices and possibly millions of edges. Graphs are currently used by many applications which we use every day (e.g., Facebook, Twitter, etc.)

The algorithm requires the developer to implement the following objects:

Vertex list |

Shortest distance from Root vertex |

Previous vertex |

Visited vertex |

Please note that one may used and possibly combine different classes of objects to speed up or simplify operations. Like I mentioned earlier, I tried a few things. I strongly recommend experiment and end up with a custom implementation that will better suit the requirements of the project at hand.

I added a charMap[] array to be able to display the graph vertices using letters instead of numbers. Seems like the concepts are easier to understand and follow.

The code was implemented in Java using the Eclipse IDE. The code follows:

// **** graph **** package canessa.dijkstra.shortest.path; import java.util.HashSet; import java.util.Stack; /* * */ public class Soulution { static char[] charMap = {'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K'}; // 0 1 2 3 4 5 6 7 8 9 0 /* * return the unvisited vertex with the smallest know distance from the start vertex (greedy algorithm) */ static int selectUnvisitedSmallestDistance(int[][] graph, HashSet<Integer> unVisited, int[] shortestDist) { int v = Integer.MAX_VALUE; int distance = Integer.MAX_VALUE; // **** select unvisited vertex with smallest know distance from the start vertex **** for (int i = 0; i < graph.length; i++) { if (unVisited.contains(i) && shortestDist[i] < distance) { v = i; distance = shortestDist[i]; } } // **** **** return v; } /* * Display all shortest paths */ static void displayShortestPaths(int root, int[] shortestDist, int[] prevVertex) { for (int i = 1; i < shortestDist.length; i++) { // System.out.println("from: A to " + charMap[i] ); Stack<Character> stack = new Stack<Character>(); stack.push(charMap[i]); int j = i; do { stack.push(charMap[prevVertex[j]]); j = prevVertex[j]; } while (j != root); System.out.printf("%2d : ", shortestDist[i]); while (!stack.isEmpty()) { System.out.print(stack.pop()); if (!stack.empty()) System.out.print(" -> "); } System.out.println(); } } /* * */ static void shortestPath(int[][] graph, int root) { // System.out.println("graph.length: " + graph.length); HashSet<Integer> unVisited = new HashSet<Integer>(); int[] shortestDist = new int[graph.length]; int[] prevVertex = new int[graph.length]; int v; // **** initialization **** for (int i = 0; i < graph.length; i++) { shortestDist[i] = Integer.MAX_VALUE; prevVertex[i] = Integer.MAX_VALUE; unVisited.add(i); } shortestDist[root] = 0; // **** loop visiting all nodes in the graph **** do { // **** visit unvisited vertex with the smallest know distance from the start vertex **** v = selectUnvisitedSmallestDistance(graph, unVisited, shortestDist); // System.out.println("v: " + charMap[v]); // **** for each unvisited neighbor of the current vertex **** for (int i = 0; i < graph.length; i++) { // **** skip unvisited vertices **** if (!unVisited.contains(i)) continue; // **** check if this vertex is not adjacent **** if (graph[v][i] == 0) continue; // System.out.println(charMap[v] + " -> " + charMap[i]); // **** calculate distance from the start to this vertex **** int distance = shortestDist[v] + graph[v][i]; // System.out.println("distance: " + distance); // **** if the calculated distance of this vertex is less than the known distance **** if (distance < shortestDist[i]) { // **** update shortest distance to this vertex **** shortestDist[i] = distance; // **** update the previous vertex with the current vertex **** prevVertex[i] = v; } } // **** remove the vertex from the unvisited set **** unVisited.remove(v); } while (unVisited.size() != 0); // // **** display the shortest distances **** // for (int i = 1; i < graph.length; i++) { // System.out.println(charMap[root] + " -> " + charMap[i] + " : " + shortestDist[i]); // } // **** display shortest paths **** displayShortestPaths(root, shortestDist, prevVertex); } /* * */ public static void main(String[] args) { // **** graph **** int[][] graph = { // A B C D E {0, 6, 0, 1, 0}, // A {6, 0, 5, 2, 2}, // B {0, 5, 0, 0, 5}, // C {1, 2, 0, 0, 1}, // D {0, 2, 5, 1, 0} // E }; // int graph[][] = new int[][] { // // TO: 0 1 2 3 4 5 6 7 9 FROM: // {0, 4, 0, 0, 0, 0, 0, 8, 0}, // 0 // {4, 0, 8, 0, 0, 0, 0, 11, 0}, // 1 // {0, 8, 0, 7, 0, 4, 0, 0, 2}, // 2 // {0, 0, 7, 0, 9, 14, 0, 0, 0}, // 3 // {0, 0, 0, 9, 0, 10, 0, 0, 0}, // 4 // {0, 0, 4, 14, 10, 0, 2, 0, 0}, // 5 // {0, 0, 0, 0, 0, 2, 0, 1, 6}, // 6 // {8, 11, 0, 0, 0, 0, 1, 0, 7}, // 7 // {0, 0, 2, 0, 0, 0, 6, 7, 0} // 8 // }; // **** find shortest paths **** shortestPath(graph, 0); } }

The following screen capture shows the output of the program executing the first graph.

3 : A -> D -> B 7 : A -> D -> E -> C 1 : A -> D 2 : A -> D -> E

The algorithm as implemented is **greedy**. When processing a vertex it chooses the next one based on the shortest distance. The hope is that in most cases the shortest path will be easier/faster to find than if the next vertex would be chosen in increasing or random order.

I hope to have documented the steps well enough that the reader would be able to follow the general steps. If this is not the case please let me know.

The following screen capture shows the output of the program processing the second graph.

4 : A -> B 12 : A -> B -> C 19 : A -> B -> C -> D 21 : A -> H -> G -> F -> E 11 : A -> H -> G -> F 9 : A -> H -> G 8 : A -> H 14 : A -> B -> C -> I

Would like to note that my general approach to learn or polish on a subject is to give it a try on my own. Once that is done whether successful or not, I look for articles and videos (typically YouTube) online. In this case I liked the most a Youtube video by Kevin Drumm. I tend to read and watch two or three sources and incorporate / reject ideas into my code. It is not a good practice, which I have seen many times at different companies, to just copy and paste code from the internet without fully understanding and in most cases adapting it to your project requirements.

Hope you enjoyed the post. If you have comments or questions, please leave me a comment below. I will respond as soon as possible.

Happy software development;

John

Follow me on Twitter: **@john_canessa**