Alien Dictionary

It is Thursday morning and it has been a long week at work. Apparently we have been experiencing lots of issues with our cloud platform. It seems that the storage server works fine when running on bare metal, but it experiences network related issues when running on virtual machines. We are currently looking at what might be causing the issues with the stack in VMs.

Before we start dealing with the problem at hand, I want to mention that I have a few books on algorithms. Like most people, I like to avoid spending money when I can get the same or similar results by doing a web search and reading the resulting material. I am a member of the ACM. I have access some of O’Reilly materials on-line. I believe both the books and videos have helped me locate quality information and are planning on continuing to use it.

In preparation for solving the problem in this post, I decided to watch a couple pertinent chapters in “Algorithms: 24-part Lecture Series” by Robert Sedgewick and Kevin Wayne. In time I have purchased a couple books authored by them. When I purchase a book, I tend to write my name and the year of purchase. I checked in my version of “Algorithms Fourth Edition” () by Robert Sedgewick and Kevin Wayne and it seems I purchased that book in 2016. I was planning on getting “

Algorithms, Fourth Edition: Book and 24-Part Lecture Series” also by Robert Sedgewick and Kevin Wayne but using an on-line reference was good enough. That changed earlier this week. I have ordered the book on Amazon and should be arriving around the second week of February 2021. I do like to annotate my books so in general I prefer paper copies. Will let you know my thoughts on the book after I read and experiment with the material from the first few chapters.

Not sure if it was this week or over the weekend, but I decided to tackle LeetCode problem 269 Alien Dictionary. I enjoyed researching new, experimenting and reviewing topics I have learned but have not come up for work or on previous problems.

There is a new alien language that uses the English alphabet.
However, the order among letters is unknown to you.

You are given a list of strings words from the dictionary, 
where words are "sorted lexicographically" by the rules of this new language.

Derive the order of letters in this language, and return it.
If the given input is invalid, return "". 
If there are multiple valid solutions, return any of them.

Constraints:

o 1 <= words.length <= 100
o 1 <= words[i].length <= 100
o words[i] consists of only lowercase English letters.

The words from the alien dictionary are in order. We are required to return in lexicographical order the characters used. It may sound simple, but there is a lot of detail.

We need to have a handle on What is lexicographical order?

Lexicographical order is alphabetical order. 
The other type is numerical ordering.

Consider the following values:

1,10,2

Those values are in lexicographical order.
10 comes after 2 in numerical order, but 10 comes before 2 in "alphabetical" order.

o So in Lexicographical order, only the first digit will be considered of the value? 
Am I understanding right?

No.
If the first digits match, then the second digits will be compared; 
but it compares like a String; 
that is 10 comes before 2 but 111 comes after 10 (but also after 1000). 
Because 0 is less than 1. 
A lexical sort compares the characters in each string as characters, not integral values.

I work on a storage server. It stores data on folders we call “disk caches”, each of which has a set of subfolders labeled “00”, “01”, … “ff”. To get a good example, which I ran into several times a day, on a Windows machine open a console and get the directory contents of a disk cache.

01/22/2021  11:59 AM    <DIR>          00
01/22/2021  11:59 AM    <DIR>          01
01/22/2021  11:59 AM    <DIR>          02
01/22/2021  11:59 AM    <DIR>          03
01/22/2021  11:59 AM    <DIR>          04
01/22/2021  11:59 AM    <DIR>          05
01/22/2021  11:59 AM    <DIR>          06
01/22/2021  11:59 AM    <DIR>          07
01/22/2021  11:59 AM    <DIR>          08
01/22/2021  11:59 AM    <DIR>          09
01/22/2021  11:59 AM    <DIR>          0a
01/22/2021  11:59 AM    <DIR>          0b
01/22/2021  11:59 AM    <DIR>          0c
01/22/2021  11:59 AM    <DIR>          0d
01/22/2021  11:59 AM    <DIR>          0e
01/22/2021  11:59 AM    <DIR>          0f
01/22/2021  11:59 AM    <DIR>          10
01/22/2021  11:59 AM    <DIR>          11
01/22/2021  11:59 AM    <DIR>          12
01/22/2021  11:59 AM    <DIR>          13

The listing is in ascending order (in ASCII ascending not lexicographical order).

Repeat the task this time using File Explorer.

The folders are displayed in lexicographical order.

It seems that we could solve this problem by defining a Directed acyclic graph and then performing a Topological Sorting of the characters.

A topological ordering is possible if and only if the graph has no directed cycles, 
that is, if it is a directed acyclic graph (DAG). 

Any DAG has at least one topological ordering, 
and algorithms are known for constructing a topological ordering of any DAG in linear time. 

For the topological sorting algorithm you can get some ideas from Topological Sorting.

We will attempt to solve this problem using the Java programming language on a Windows 10 computer using the VSCode IDE. After we have a candidate for the solution, we will have to copy and paste the code in the LeetCode IDE. A simpler approach could be to solve it directly using the LeetCode IDE.

    public String alienOrder(String[] words) {
        
    }

The signature of the method we need to implement expects a list of words in the alien language.

I forgot to provide additional detail regarding Lecture 13: Directed Graphs – Topological sort I watched on ACM. The lecture covers the subject using numbers (not lowercase characters). The algorithm returns an Iterable which is a stack. The results are returned in reverse order. If interested in the code I wrote while watching the video, you can find it in the GitHub repository associated with this post.

DAG = Directed Acyclic Graph
DFS == Depth First Search

Topological sort demo
Run DFS
Return vertices in reverse postorder

postorder: 4 1 2 5 0 6 3

As I mentioned, you can get access to the video or the book I mentioned earlier.

Since we are developing the code on a personal computer, we will have to implement a simple test scaffolding to read in the alien words, pass them to the method we need to implement, and for our benefit display the results.

"wrt","wrf","er","ett","rftt"
main <<<  words: [wrt, wrf, er, ett, rftt]
main <<< result: wertf
main <<< result: wertf


"z","x"
main <<<  words: [z, x]
main <<< result: zx
main <<< result: zx


"z","x","z"
main <<<  words: [z, x, z]
main <<< result:
main <<< result:


"abc","ab"
main <<<  words: [abc, ab]
main <<< result:
main <<< result:


"gtal"
main <<<  words: [gtal]
main <<< result: atgl
main <<< result: atgl

The LeetCode site provides some test cases. The others came from a failed attempt or just experimenting. Note that I kept the list relatively short.

Note that it seems we have two different implementations. IN this post we will cover one. If interested in the second, you can find the code in my GitHub (https://github.com/JohnCanessa/AlienDictionary) repository.

    /**
     * Test scaffolding.
     * 
     * @throws IOException
     */
    public static void main(String[] args) throws IOException {
        
        // **** open buffered reader ****
        BufferedReader br = new BufferedReader(new InputStreamReader(System.in));

        // **** read string[] ****
        String[] words = br.readLine().trim().split(",");

        // **** close buffered reader ****
        br.close();

        // **** remove "s from all words ****
        for (int i = 0; i < words.length; i++)
            words[i] = words[i].substring(1, words[i].length() - 1);

        // ???? ????
        System.out.println("main <<<  words: " + Arrays.toString(words));

        // **** call method and display result ****
        System.out.println("main <<< result: " + alienOrder(words));

        // **** call method and display result ****
        System.out.println("main <<< result: " + alienOrder1(words));
    }

We use a buffered reader to read the input line with the list of alien words. We declare and populate a String[] and remove the double quotes from the words. The words are displayed to make sure all is well so far. Then we call two versions of the method in question. Both seem to return the same answer.

    /**
     * Derive the order of letters in this language, and return it.
     * 
     * 1) build a graph
     * 2) topological sort of the graph
     * 
     * Runtime: 4 ms, faster than 37.06% of Java online submissions.
     * Memory Usage: 38.4 MB, less than 63.50% of Java online submissions.
     */
    static String alienOrder(String[] words) {

        // **** sanity check ****
        if (words.length == 0)
            return "";

        // **** reverse order of single string ****
        // if (words.length == 1) {
        //     StringBuilder sb = new StringBuilder(words[0]);
        //     return sb.reverse().toString();
        // }
        
        // **** return this string (if needed) ****
        if (words.length == 1)
            return words[0];

        // **** initialization ****
        int[] inDegree                      = new int[26];
        Map<Character, Set<Character>> g    = new HashMap<>();

        // **** 1) build graph ****
        buildGraph(g, words, inDegree);


        // ???? ????
        // System.out.println("alienOrder <<<        g: " + g.toString());
        // System.out.println("alienOrder <<< g.size(): " + g.size());

        // ???? ????
        // DepthFirstSearch dfs = new DepthFirstSearch(g);
        // Iterator<Character> it = dfs.reversePost().iterator();
        // StringBuilder sb = new StringBuilder();
        // while (it.hasNext()) {
        //     sb.append(it.next());
        // }
        // System.out.println("main <<< sb: " + sb.reverse().toString());

        // ???? ????
        // boolean cyclic = dfs.isCyclic(g);
        // System.out.println("main <<< cyclic: " + cyclic);


        // **** 2) graph topological sort ****
        return bfs(g, inDegree);
    }

We start by performing some sanity checks. If there are no words, we return a blank string. If we have a single word the word is returned. Note that we tried returning the reversed string. That was also accepted.

We use an int[] to keep track on the number of edges coming into a node holding each letter. The graph is implemented with a hash map. The key is the letter at the node, and the value is a set holding the letters that are directed to the node.

The first step after the initialization builds the graph. The second step performs a topological sort of the graph. This will be used to get the order of the letters.

The code that has been commented out, was used to test and experiment with the graph.

    /**
     * Build graph.
     */
    private static void buildGraph( Map<Character, Set<Character>> g, 
                                    String[] words, 
                                    int[] inDegree) {

        // **** initialization ****
        for (String word : words) {
            for (char c : word.toCharArray()) {
                g.putIfAbsent(c, new HashSet<>());
            }
        }

        // **** traverse the words[] two words at a time ****
        for (int i = 1; i < words.length; i++) {

            // **** for ease of use ****
            String first    = words[i - 1];
            String second   = words[i];
            int len         = Math.min(first.length(), second.length());

            // ???? ????
            // System.out.println("buildGraph <<< len: " + len 
            //                     + " first: " + first + " second: " + second);

            // **** look for different characters in these words ****
            for (int j = 0; j < len; j++) {

                // ???? ????
                // System.out.println("buildGraph <<< j: " + j);
                // System.out.println("buildGraph <<< firstCh: " + first.charAt(j) 
                //                     + " secondCh: " + second.charAt(j));

                // **** check if characters differ ****
                if (first.charAt(j) != second.charAt(j)) {

                    // **** for ease of use ****
                    char out    = first.charAt(j);
                    char in     = second.charAt(j);

                    // ???? ????
                    // System.out.println("buildGraph <<< out: " + out + " in: " + in);

                    // **** add incomming edge ****
                    if (!g.get(out).contains(in)) {
                        g.get(out).add(in);
                        inDegree[in - 'a']++;
                    }

                    // **** ****
                    break;
                } else {

                    // **** ****
                    if (j + 1 == len && first.length() > second.length()) {
                        g.clear();
                        break;
                    }
                }
            }
        }

        // ???? ????
        // System.out.println("buildGraph <<<        g: " + g.toString());
        // System.out.println("buildGraph <<< inDegree: " + Arrays.toString(inDegree));
    }

We start by initializing the graph. Note that at this time we end up with a node for each letter found in any of the input words.

We then traverse the array of words processing two adjacent words at a time.

We get the length of the shorter word because we need to compare letters in the position in both words.

Since we are comparing words in lexicographically order, we check that each word contains the same characters. When we find a word that does not, we add an edge in the map and increment the count of edges in the inDegree array. If the characters match, then we check if the length of the first word is longer than the second. If so, we clear the graph and return. The order of characters is not defined.

    /**
     * Topological graph sort using BFS approach.
     */
    private static String bfs(  Map<Character, Set<Character>> g,
                                int[] inDegree) {

        // **** initialization ****
        Queue<Character> q  = new LinkedList<>();
        StringBuilder sb    = new StringBuilder();
        int totalChars      = g.size();

        // ???? ????
        // System.out.println("bfs <<< totalChars: " + totalChars);

        // **** prime the queue ****
        for (char c : g.keySet()) {
            if (inDegree == 0) {
                sb.append(c);
                q.offer(c);
            }
        }

        // **** loop inserting and removing characters until the queue is empty ****
        while (!q.isEmpty()) {

            // ???? ????
            // System.out.println("bfs <<<   q: " + q.toString());

            // **** retrieve and remove current character from the queue ****
            char cur = q.poll();

            // ???? ????
            // System.out.println("bfs <<< cur: " + cur);

            // **** current node might not have neighbors ****
            if ((g.get(cur) == null) || (g.get(cur).size() == 0))
                continue;

            // **** ****
            for (char neighbor : g.get(cur)) {
                inDegree[neighbor - 'a']--;
                if (inDegree[neighbor - 'a'] == 0) {
                    q.offer(neighbor);
                    sb.append(neighbor);
                }
            }  
        }

        // **** return the string ****
        return (sb.length() == totalChars) ? sb.toString() : "";
    }

At this point we have a graph. We now need to generate a string with the characters is lexicographical sorted order. We will use a breadth first approach to populate the string.

We prime the queue and then proceed with the BFS algorithm.

Once we have processed all the letters we end up in the string builder with a string holding the characters in lexicographically sorted order.

If we were able to process all characters (no cycles), we return the string; otherwise we return a blank string.

As you can see in the comments section of the alienOrder() method, this implementation was accepted by LeetCode.

Keeping in mind what we did, the following implementation is shorter but is implemented using a similar approach.


    /**
     * Derive the order of letters in this language, and return it.
     */
    static String alienOrder1(String[] words) {

        // **** initialization ****
        Map<Character, List<Character>> hm  = new HashMap<>();
        Map<Character, Integer> inDegree    = new HashMap<>();
        
        for(String word: words) {
            for(char c: word.toCharArray()) {
                hm.put(c, new ArrayList<>());
                inDegree.put(c, 0);
            }
        }

        // ???? ????
        System.out.println("<<<       hm: " + hm.toString());
        System.out.println("<<< inDegree: " + inDegree.toString());

        // **** compare two consecutive words at a time ****
        for (int i = 0; i < words.length - 1; i++) {

            // **** for ease of use ****
            String s1 = words[i];
            String s2 = words[i + 1];

            // **** same characters; different lengths ****
            if (s1.length() > s2.length() && s1.startsWith(s2)) 
                return "";

            // **** use length of shorter string to compare characters ****
            int len = Math.min(s1.length(), s2.length());

            // ???? ????
            System.out.println("<<< len: " + len);

            // **** ****
            for(int j = 0; j < len; j++) {

                // **** for ease of use ****
                char c1 = s1.charAt(j);
                char c2 = s2.charAt(j);

                // **** ****
                if(c1 != c2) {
                    hm.get(c1).add(c2);
                    inDegree.put(c2, inDegree.get(c2) + 1);
                    break;
                }
            }
        }

        // **** ****
        Deque<Character> q = new ArrayDeque<>();

        // ???? ????
        System.out.println("<<< inDegree: " + inDegree.toString());

        // **** ****
        for (Character c: inDegree.keySet()) {
            if(inDegree.get(c) == 0) {
                q.add(c);
            }
        }

        // ???? ????
        System.out.println("<<<        q: " + q.toString());
        System.out.println("<<<       hm: " + hm.toString());

        // **** holds string to return ****
        StringBuilder sb = new StringBuilder();

        // **** traverse the queue ****
        while (!q.isEmpty()) {

            // ???? ????
            System.out.println("<<<        q: " + q.toString());
            System.out.println("<<<       sb: " + sb.toString());

            // **** get the nexh character from the queue ****
            char cur = q.poll();

            // **** append it to the string ****
            sb.append(cur);

            // **** ****
            if (!hm.containsKey(cur))
                continue;

            // **** ****
            for (char next: hm.get(cur)) {
                inDegree.put(next, inDegree.get(next) - 1);
                if (inDegree.get(next) == 0) {
                    q.add(next);
                }
            }
        }

        // ???? ????
        System.out.println("<<< inDegree: " + inDegree.toString());

        // **** ****
        return (sb.length() == inDegree.size()) ? sb.toString() : "";
    }

In this approach we replace the inDegree variable that was an integer array with a hash map.

We populate the graph which is implemented with the hm variable which is also a hash map.

As you recalled from the previous implementation, we need to compare two adjacent words per pass in the loop.

We check if the length of the strings and characters and decide that the alphabet is not valid.

The next loop is used to process each character at the same position in the two words. If the characters differ, we add the character from the second word as an edge and increment the inDegree.

We are now ready to perform the BSF part of the algorithm. Note the similarities and differences.

When all is set and done we return the string with the alien characters, or an empty string indicating that the alphabet could not be defined by the provided set of words.

Note that I left the debugging lines not commented out.

Hope you enjoyed solving this problem as much as I did. The entire code for this project can be found in my GitHub repository.

If you have comments or questions regarding this, or any other post in this blog, or if you would like for me to help out with any phase in the SDLC (Software Development Life Cycle) of a project associated with a product or service, please do not hesitate and leave me a note below. If you prefer, send me a private e-mail message. I will reply as soon as possible.

Keep on reading and experimenting. It is one of the best ways to learn, become proficient, refresh your knowledge and enhance your developer toolset.

One last thing, many thanks to all 6,238 subscribers to this blog!!!

Keep safe during the COVID-19 pandemic and help restart the world economy. I believe we can all see the light at the end of the tunnel.

Regards;

John

john.canessa@gmail.com

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.