LeetCode 208. Implement Trie (Prefix Tree) in Java

In this post I will solve the LeetCode 208 Implement Trie (Prefix Tree) problem using the Java programming language and the VSCode IDE on a Windows computer. After the code was accepted by LeetCode I decided to add a couple features. The first one is to provide a method to access one of the main uses for a Trie (https://en.wikipedia.org/wiki/Trie). The second, which is based on the first one, is used to display, in order, the words inserted into the Trie.

A trie (pronounced as "try") or prefix tree is a tree data structure 
used to efficiently store and retrieve keys in a dataset of strings. 
There are various applications of this data structure, 
such as autocomplete and spellchecker.

Implement the Trie class:

* Trie()
  Initializes the trie object.
* void insert(String word)
  Inserts the string word into the trie.
* boolean search(String word) 
  Returns true if the string word is in the trie (i.e., was inserted before), and false otherwise.
* boolean startsWith(String prefix) 
  Returns true if there is a previously inserted string word that has the prefix prefix, 
  and false otherwise.
 
Constraints:

o 1 <= word.length, prefix.length <= 2000
o word and prefix consist only of lowercase English letters.
o At most 3 * 10^4 calls in total will be made to insert, search, and startsWith.
 
Related Topics:

o Hash Table
o String
o Design
o Trie

The requirements are very well described. We need to implement the class Trie which must include a specific set of methods. In addition we need to specify the class members.

class Trie {

    /** Initialize your data structure here. */
    public Trie() {
        
    }
    
    /** Inserts a word into the trie. */
    public void insert(String word) {
        
    }
    
    /** Returns if the word is in the trie. */
    public boolean search(String word) {
        
    }
    
    /** Returns if there is any word in the trie that starts with the given prefix. */
    public boolean startsWith(String prefix) {
        
    }
}

/**
 * Your Trie object will be instantiated and called as such:
 * Trie obj = new Trie();
 * obj.insert(word);
 * boolean param_2 = obj.search(word);
 * boolean param_3 = obj.startsWith(prefix);
 */

The signature for the class lists the four methods we need to implement.

In addition LeetCode provides an example of how our Trie class implementation will be tested.

Since I will be developing the code on my Windows machine, I will not be using the online IDE provided by LeetCode. The reason for this is due to my preference of keeping the test code, test cases, and the solution on my computer. That way I can easily experiment with the solution even if I am offline. If you do not have a special requirement like I do, my suggestion is to use the online IDE provided by LeetCode.

Trie, insert, search, search, startsWith, insert, search, search
null, apple, apple, app, app, app, app, apple
main <<< cmdArr: [Trie, insert, search, search, startsWith, insert, search, search]
main <<< argArr: [null, apple, apple, app, app, app, app, apple]
main <<< trie: (ch: * end: false)
main <<< insert(apple): apple
main <<< search(apple): true
main <<< search(app): false
main <<< startsWith(app): true
main <<< insert(app): app
main <<< search(app): true
main <<< search(apple): true


Trie, insert, insert, insert, insert, insert, insert, startsWith, startsWith, startsWith     
null, bags, baggage, bat, banner, box, cloths, ba, bag, bags
main <<< cmdArr: [Trie, insert, insert, insert, insert, insert, insert, startsWith, startsWith, startsWith]
main <<< argArr: [null, bags, baggage, bat, banner, box, cloths, ba, bag, bags]
main <<< trie: (ch: * end: false)
main <<< insert(bags): bags
main <<< insert(baggage): baggage
main <<< insert(bat): bat
main <<< insert(banner): banner
main <<< insert(box): box
main <<< insert(cloths): cloths
main <<< startsWith(ba): true
main <<< startsWith(bag): true
main <<< startsWith(bags): true


Trie, insert, insert, insert, insert, insert, insert, anyWords, anyWords, anyWords, anyWords
null, bags, baggage, bat, banner, box, cloths, ba, bag, bags, *
main <<< cmdArr: [Trie, insert, insert, insert, insert, insert, insert, anyWords, anyWords, anyWords, anyWords]
main <<< argArr: [null, bags, baggage, bat, banner, box, cloths, ba, bag, bags, *]
main <<< trie: (ch: * end: false string: null)
main <<< insert(bags): bags
main <<< insert(baggage): baggage
main <<< insert(bat): bat
main <<< insert(banner): banner
main <<< insert(box): box
main <<< insert(cloths): cloths
main <<< anyWords(ba): [baggage, bags, banner, bat]
main <<< anyWords(bag): [baggage, bags]
main <<< anyWords(bags): [bags]
main <<< anyWords(*): [baggage, bags, banner, bat, box, cloths]


Trie, insert, insert, insert, insert, insert, insert, insert, insert, insert, insert, anyWords
null, coffee, doughnuts, milk, chocolate, cake, juice, cup, glass, cupcake, pancake, *
main <<< cmdArr: [Trie, insert, insert, insert, insert, insert, insert, insert, insert, insert, insert, anyWords]
main <<< argArr: [null, coffee, doughnuts, milk, chocolate, cake, juice, cup, glass, cupcake, pancake, *]
main <<< trie: (ch: * end: false string: null)
main <<< insert(coffee): coffee
main <<< insert(doughnuts): doughnuts
main <<< insert(milk): milk
main <<< insert(chocolate): chocolate
main <<< insert(cake): cake
main <<< insert(juice): juice
main <<< insert(cup): cup
main <<< insert(glass): glass
main <<< insert(cupcake): cupcake
main <<< insert(pancake): pancake
main <<< anyWords(*): [cake, chocolate, coffee, cupcake, cup, doughnuts, glass, juice, milk, pancake]

Each test case is separated from others by two blank lines.

The first two lines are inputs which our test scaffold must read and use to instantiate the Trie class and then make the necessary method calls to test the implementation.

Our test scaffold seems to read the two input lines, and create two String[] arrays named `cmdArr` and `argArr` which respectively hold the commands and associated argument per method invocation.

The partial contents of the root node in the trie is displayed. I used the `toString` method to be able to display a Trie node. Such implementation IS NOT PART OF THE SOLUTION! I used it to help me implement and debug the code.

The test code seems to process in tandem the two arrays making the necessary method calls. The results are displayed after each call is performed.

Please take a couple minutes to make sure the methods return the proper results.

Note that the second test case includes the `startsWith` method which adheres to the requirements. The problem is that if you would be using the Trie, knowing that there are words in the Trie with the specific characters is not too useful. What we would probably like to do is to display valid words that start with a specific prefix. This is how an autocomplete feature or spelling checker works.

If you look at the third test case, we use the not required `anyWords` method. It seems we can provide a prefix string and the method returns a list of valid words. In addition, if we pass on `*` to the method, it returns all the valid words in the Trie. In practice that would not be too useful due to the number of valid words that would be in the Trie. Since we are using a limited number of words, this method allows us to list them in alphabetical order. That way we can verify that all the words we insert are in the Trie (or not).

    /**
     * Test scaffold
     * @throws IOException
     * !!! NOT PART OF THE SOLUTION !!!
     */
    public static void main(String[] args) throws IOException {

        // **** initialization ****
        Trie trie = null;

        // **** open buffered reader ****
        BufferedReader br = new BufferedReader(new InputStreamReader(System.in));

        // **** read command sequence ****
        String[] cmdArr = Arrays.stream(br.readLine().trim().split(", "))
                            .toArray(size -> new String[size]);

        // **** read argument sequence ****
        String[] argArr = Arrays.stream(br.readLine().trim().split(", "))
                            .toArray(size -> new String[size]);

        // **** close buffered reader ****
        br.close();

        // **** check array lengths ****
        if (cmdArr.length != argArr.length) {
            System.err.println("main <<< cmdArr.length: " + cmdArr.length + 
                                " != argArr.length: " + argArr.length);
            System.exit(-1);
        }
        
        // ???? ????
        System.out.println("main <<< cmdArr: " + Arrays.toString(cmdArr));
        System.out.println("main <<< argArr: " + Arrays.toString(argArr));

        // **** loop once per command ****
        for (int i = 0; i < cmdArr.length; i++) {

            // **** proceed based on command ****
            switch (cmdArr[i]) {
                case "Trie":

                    // **** instantiate Trie ****
                    trie = new Trie();

                    // ???? ????
                    System.out.println("main <<< trie: " + trie.toString());
                break;

                case "insert":
                    System.out.println("main <<< insert(" + argArr[i] + "): " + trie.insert(argArr[i]));
                break;

                case "search":
                    System.out.println("main <<< search(" + argArr[i] + "): " + trie.search(argArr[i]));
                break;

                case "startsWith":
                    System.out.println("main <<< startsWith(" + argArr[i] + "): " + trie.startsWith(argArr[i]));
                break;

                case "anyWords":
                    System.out.println("main <<< anyWords(" + argArr[i] + "): " + trie.anyWords(argArr[i]));
                break;

                default:
                    System.out.println("main <<< unexpected cmdArr[" + i + "]: " + cmdArr[i]);
                break;
            }
        }
    }

We start our test scaffold by declaring the `trie` variable.

We then read the two input lines, populate the two String[] arrays, and display their content. We do this to make sure all is well before starting to develop the code of interest.

The loop is used to traverse through the contents of the `cmdArr` String[] array. As we call the associated methods we may use the associated argument found at the same index in the `argArr` String[] array.

The approach seems to be compatible with the one described by LeetCode. The only exception is the implementation of additional methods which do not seem to interfere with the solution.

Please note that the test scaffold IS NOT PART OF THE SOLUTION. If you develop a solution using the online IDE provided by LeetCode you will not be required to implement a test scaffold.

        // **** class members ****
        public char     ch;
        public String   string;
        public boolean  end;
        public Trie[]   children;

I decided to use four class members for the Trie. I added the `string` variable to optimize the operation of the `anyWords` method which is NOT REQUIRED TO BE PART OF THE SOLUTION. You can ignore it.

        /**
         * Constructor.
         */
        public Trie() {
            this.ch         = '*';
            this.children   = new Trie[26];
            this.end        = false;
        }


        /**
         * Constructor.
         */
        public Trie(char ch) {
            this.ch         = ch;
            this.children   = new Trie[26];
            this.end        = false;
        }


        /**
         * Constructor.
         */
        public Trie(char ch, String string) {
            this.ch         = ch;
            this.children   = new Trie[26];
            this.end        = false;
            this.string     = string;
        }

The first constructor is a good fit to declare the Trie. The second one is useful when inserting characters into the Trie on behalf of a word. The last one is used to insert nodes into the Trie when we need to populate the `string` variable. The variable is used to keep track of the substring of the word at the time of the insertion. The `string` variable will help us offer completion words when a prefix is presented and suggestions are needed to present to the caller. Will see how it is used shortly.

        /**
         * toString.
         */
        @Override
        public String toString() {
            return "(ch: " + this.ch +  " end: " + this.end + " string: " + this.string + ")";
        }

This is the `toString` implementation for a Trie node. I used it to track operation of the code and to debug issues. This method is not required. LeetCode ignores it.

        /**
         * Insert the string word into the trie.
         */
        public String insert(String word) {

            // **** sanity checks ****
            if (word.length() == 0) return word;

            // **** initialization ****
            Trie p  = this;
            int i   = 0;
            
            // **** traverse word checking and inserting nodes ****
            for (char ch : word.toCharArray()) {

                // **** traverse the trie adding / finding nodes with this ch ****
                if (p.children[ch - 'a'] != null) {
                    p = p.children[ch - 'a'];
                } else {

                    // **** create node for this ch ****
                    Trie node = new Trie(ch, word.substring(0, i + 1));

                    // **** add node to Trie ****
                    p.children[ch - 'a'] = node;

                    // **** update p ****
                    p = node;
                }

                // **** increment index ****
                i++;
            }

            // **** flag this ch as the end for word ****
            p.end = true;

            // **** return word ****
            return word;
        }

The `insert` method is used to insert a word into the Trie.

After the sanity checks and initialization steps, we traverse the characters from the `word` argument.

The if statement determines if the current character is already part of the word that we are inserting. The else clause allows us to create a new node which will be inserted into the trie. Note that the constructor uses the substring of the word being inserted. This will be used to offer completed words when specifying a prefix.

We then increment the index and continue in the loop. Note that I could have changed the loop to include the index `i` but I did not make such a change.

After the loop is done `p` will be pointing at the node with the last character in the word. We need to flag it as the end of a word.

The requirements call for the `insert` method to be `void`. I changed it to return the input `word` so the inserted value could be easily displayed.

        /**
         * Returns true if the string word is in the trie
         * (i.e., was inserted before), and false otherwise.
         */
        public boolean search(String word) {

            // **** sanity checks ****
            if (word.length() == 0 || word.equals(" ")) return false;

            // **** initialization ****
            Trie p = this;

            // **** traverse the trie looking for word ****
            for (char ch : word.toCharArray()) {

                // **** look for ch in children ****
                if (p.children[ch - 'a'] == null) return false;

                // **** keep on traversing the trie ****
                p = p.children[ch - 'a'];
            }

            // **** return if word was found  ****
            return p.end;
        }

The `search` method is used to search for a full `word` in the trie. 

This is done in the loop. The characters in the `word` are used to traverse the nodes in the trie. If at some point the current character `ch` is not found, the method returns `false` to indicate that the `word` is not in the trie.

If the loop completes we return `true` or p.end which is also set to the same value.

        /**
         * Returns true if there is a previously inserted string 
         * word that has the prefix prefix, and false otherwise.
         */
        public boolean startsWith(String prefix) {

            // **** sanity checks ****
            if (prefix.length() == 0) return false;

            // **** initialization ****
            Trie p = this;

            // **** traverse the trie looking for the prefix ****
            for (char ch : prefix.toCharArray()) {

                // **** look for ch in p's children ****
                if (p.children[ch - 'a'] == null) return false;

                // **** keep on traversing the trie ****
                p = p.children[ch - 'a'];
            }

            // **** found prefix in trie ****
            return true;
        }

The `startsWith` method is used to determine if the specified `prefix` is part of one or more words in the trie. It is very similar to the implementation of the `search` method. In this case when we exit the loop we return `true` to indicate that the `prefix` was found in the trie.

        /**
         * Returns a list of words if there are one or more 
         * previously inserted string words that have the prefix prefix.
         * If no words are found returns an empty list.
         */
        public List<String> anyWords(String prefix) {

            // **** initialization ****
            List<String> words = new ArrayList<>();

            // **** sanity checks ****
            if (prefix.length() == 0 || prefix.equals(" ")) return words;

            // **** to traverse trie ****
            Trie p = this;

            // **** get to the end of the prefix ****
            for (int i = 0; prefix.charAt(0) != '*' && i < prefix.length(); i++) {

                // **** get current character ****
                char ch = prefix.charAt(i);
    
                // **** ****
                if (p.children[ch - 'a'] == null) return words;

                // **** continue search ****
                p = p.children[ch - 'a'];
            }

            // **** search for full words starting here ****
            findWords(p, words);

            // **** return list of possible words ****
            return words;
        }

The `anyWords` method is not part of the requirements of this problem. IT IS NOT PART OF THE SOLUTION. I figured that if you are new to the Trie concept, it is nice to see the trie in action.

The idea is that the user has entered the first few letters of a word and the system wants to provide help by displaying (probably in a popup) options to autocomplete the word.

I implemented the function by first positioning at the end of the `prefix` in the trie. Note that if the prefix is not in the trie, the method returns an empty list.

Once we are at the end of the `prefix` we make a call to the recursive function `findWords` which will populate the specified `words` list with all the complete words that it can find with the specified `prefix`.

Please take a look at the third test case. Of interest are the last few calls to the method in question. We first call it with the `prefix` “ba” and the method returns “baggage”, “bags”, “banner” and “bat”. The next call represents the user inserting character `g` to form the `prefix` “bag”. The method now returns the words “baggage” and “bags”. The user the completes the word “bags” which the method indicates that it is a valid word. The process is completed.

        /**
         * Find full words in the trie starting at p
         * and add them to the list.
         * Recursive call.
         */
        public void findWords(Trie p, List<String>words) {

            // **** end condition ****
            if (p == null) return;

            // **** traverse trie ****
            for (Trie child : p.children)
                findWords(child, words);

            // ???? ????
            // System.out.println("findWords <<< p: " + p.toString());

            // **** check if full word ****
            if (p.end == true) {

                // ???? ????
                // System.out.println("<<< found full word ==>" + p.string + "<==");

                // **** add string to list ****
                words.add(p.string);
            }
        }

The `findWords` recursive function is used to find all the words in the trie that start with the initial `prefix` which at this time is represented by the `p` pointing to the node of the last character of the `prefix`.

As complete words are found they are inserted into the `words` list.

One more thing; if the user calls `findWords` method with a “*”, the function returns the entire list of full words in the trie. I did that in order to make sure that as words were being added to the trie the trie was properly storing them.

Hope you enjoyed solving this problem as much as I did. The entire code for this project can be found in my GitHub repository named ImplementTriePrefixTree.

Please note that the code here presented might not be the best possible solution. After solving a problem, you should take a look at the best accepted solutions provided by the different websites (i.e., HackerRank, LeetCode to name a few). The idea is to learn concepts, not to memorize solutions.

If you have comments or questions regarding this, or any other post in this blog, please do not hesitate and leave me a note below. I will reply as soon as possible.

Keep on reading and experimenting. It is one of the best ways to learn, become proficient, refresh your knowledge and enhance your developer / engineering toolset.

Thanks for reading, feel free to connect with me John Canessa at LinkedIn.

Enjoy;

John

Leave a Reply

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.