Implement Trie (Prefix Tree)

It is Wednesday morning and the cleaning crew is at our place making lots of noise. Typically they show up on Fridays but due to personal issues they requested to move the appointment. They do an excellent job. If you live in the Twin Cities of Minneapolis and St. Paul and are in need of a good cleaning service, please let me know. I will pass your information so you can get in touch with them or vice versa. I value privacy.

While looking at a matrix of algorithms and data structures, I saw Tries. I have not used them in a while so I decided to search for a problem in LeetCode. I found LeetCode 208 Implement Trie (Prefix Tree) so I went for it.

Implement a trie with insert, search, and startsWith methods.

The requirements are quite simple. Just do it.

Trie trie = new Trie();

trie.insert("apple");
trie.search("apple");   // returns true
trie.search("app");     // returns false
trie.startsWith("app"); // returns true
trie.insert("app");   
trie.search("app");     // returns true

Note:

o You may assume that all inputs are consist of lowercase letters a-z.
o All inputs are guaranteed to be non-empty strings.

The way LeetCode will judge your solution is very well explained. A Trie object is instantiated. A set of operations with different words will be performed. Makes sense and we have seen this approach in many different problems.

Note that the input will be words using lowercase letters. We also will not be presented with null or blank strings.

I will go with Java to solve this problem. In reality it is not a problem it is a class we need to implement and LeetCode will be kind enough to verify it works by running a set of tests.

I will implement the code in a Windows 10 machine in my personal office at home. I will use the VSCode IDE. Due to this fact, I will have to generate a simple test scaffolding to test the Trie class implementation in my computer. Such code is NOT PART OF THE SOLUTION. It is much simpler to implement the code on the LeetCode site using the provided IDE.

class Trie {

    /** Initialize your data structure here. */
    public Trie() {
        
    }
    
    /** Inserts a word into the trie. */
    public void insert(String word) {
        
    }
    
    /** Returns if the word is in the trie. */
    public boolean search(String word) {
        
    }
    
    /** Returns if there is any word in the trie that starts with the given prefix. */
    public boolean startsWith(String prefix) {
        
    }
}

/**
 * Your Trie object will be instantiated and called as such:
 * Trie obj = new Trie();
 * obj.insert(word);
 * boolean param_2 = obj.search(word);
 * boolean param_3 = obj.startsWith(prefix);
 */

The signature for the Trie class is provided. We just need to define some class members and populate the specified methods.

main <<< search("apple"): false
main <<< trie: ch:   end: false children: [null, null, null, null, 
null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null]   
main <<< insert("apple")
main <<< search("apple"): true
main <<< search("app"): false
main <<< startWith("app")true
main <<< insert("app")
main <<< search("app"): true
main <<< search("orange"): false

This is the output of the test scaffolding we had to generate. We perform a set of operations that kind of match the sequence suggested by LeetCode. Note that the second output line displays the root node of the Trie. That is NOT PART OF THE SOLUTION. This will become clear when we look at the code. Note that LeetCode provided the signature for the Trie class and the toString() method was not included. It is also NOT USED IN THE SOLUTION.

    /**
     * Test scaffolding.
     * 
     * !!!! NOT PART OF SOLUTION !!!!
     */
    public static void main(String[] args) {
        
        // **** create and initialize the trie ****
        Trie trie = new Trie();

        // **** search for this word ****
        System.out.println("main <<< search(\"apple\"): " + trie.search("apple")); 

        // ???? ????
        System.out.println("main <<< trie: " + trie.toString());

        // **** insert this word ****
        trie.insert("apple");
        System.out.println("main <<< insert(\"apple\")");

        // **** search for this word ****
        System.out.println("main <<< search(\"apple\"): " + trie.search("apple")); 

        // **** search for this word ****
        System.out.println("main <<< search(\"app\"): " + trie.search("app"));

        // **** start with this word ****
        System.out.println("main <<< startWith(\"app\")" + trie.startsWith("app"));

        // **** insert this word ****
        trie.insert("app");
        System.out.println("main <<< insert(\"app\")");

        // **** search for this word ****
        System.out.println("main <<< search(\"app\"): " + trie.search("app"));

        // **** search for this word ****
        System.out.println("main <<< search(\"orange\"): " + trie.search("orange"));   
    }

We create a Trie by calling one of the constructors. Note that the different required methods are called at least once. To keep track of what is going on I added displaying what is happening on each step.

I believe there is not much more to say about the test scaffolding.

/**
 * LeetCode 208. Implement Trie (Prefix Tree)
 * https://leetcode.com/problems/implement-trie-prefix-tree/
 * 
 * Runtime: 40 ms, faster than 34.67% of Java online submissions.
 * Memory Usage: 47 MB, less than 98.85% of Java online submissions.
 * 
 * Runtime: 29 ms, faster than 95.08% of Java online submissions.
 * Memory Usage: 48.4 MB, less than 73.26% of Java online submissions.
 */
public class Trie {

    // **** class members ****
    public char ch;

    // public TreeMap<Character, Trie> children;
    public Trie[] children;

    public boolean end;

	// ???? class methods follow ????
}

As you can see I separated two runs. The first one was somewhat slower. The second was somewhat faster.

We have three class members. The ch is used to hold a character from a word and the end flag indicates the end of a complete word. Note that some words may be in the same path i.e., “app”, “apple”, “application”, and “apparatus”. The trie needs to flag the end of each word.

The third class variable named children has been implemented first as a TreeMap and second as a Trie[]. The commented out code with the TreeMap was associated with the first submission, and the second submission is associated with the Trie[] implementation. This is due to the fact that all words are expressed in 26 lowercase characters.

    /** 
     * Initialize your data structure here.
     */
    public Trie() {
        this.ch         = ' ';

        // this.children   = new TreeMap<Character, Trie>();
        this.children   = new Trie[26];

        this.end        = false;
    }

    /** 
     * Initialize your data structure here.
     */
    public Trie(char ch) {
        this.ch         = ch;

        // this.children   = new TreeMap<Character, Trie>();
        this.children   = new Trie[26];

        this.end        = false;
    }

I decided to have two constructors. The first is used for the root node in the Trie. The second is used for nodes associated with characters from a word. We only needed one constructor and then we could have set the ch field by using trie.ch = some_character but I did not.

    /** 
     * Inserts a word into the trie.
     */
    public void insert(String word) {

        // **** initialization ****
        Trie p = this;

        // **** traverse the word checking and inserting nodes ****
        for (int i = 0; i < word.length(); i++) {

            // **** ****
            char ch = word.charAt(i);

            // **** check we have a child with this character ****

            // if (p.children.containsKey(ch)) {
            //     p = p.children.get(ch);
            // } else {
            //     Trie node = new Trie(ch);
            //     p.children.put(ch, node);
            //     p = node;
            // }

            if (p.children[ch - 'a'] != null)
                p = p.children[ch - 'a'];
            else {
                Trie node = new Trie(ch);
                p.children[ch - 'a'] = node;
                p = node;
            }
        }

        // **** flag as a end of word ****
        p.end = true;
    }

This is the method that is used to insert a word into our Trie. We start by referencing the root node which serves as a start for all words. The first character of each word is a child of this root node.

We then enter a loop traversing the word character by character. On each pass we check if the current character is already in the Trie. If so we proceed to the next node. If not, we create a new node with the current character, add the node to the children and proceed with the current node.

Note that the commented out code implements the use of a TreeMap while the active code uses a Trie[]. The logic used is the same. The space and performance vary.

Finally, we need to indicate that the last node represents the end of a word.

    /** 
     * Returns true if the word is in the trie.
     */
    public boolean search(String word) {
        
        // **** initialization ****
        Trie p = this;

        // **** traverse the word and the trie ****
        for (int i = 0; i < word.length(); i++) {

            // **** ****
            char ch = word.charAt(i);

            // **** check if this character is in the trie ****

            // if (p.children.containsKey(ch)) {
            //     p = p.children.get(ch);
            // } else {
            //     return false;
            // }

            if (p.children[ch - 'a'] == null)
                return false;
            
            // **** move to next node ****
            p = p.children[ch - 'a'];
        }

        // **** word in trie ****
        return p.end;
    }

This method has the TreeMap code commented out. As before, the logic we cover holds for a TreeMap or a Trie[].

We start at the root node. We traverse the characters in the word. On each loop we check if the current character is or is not in the trie. If no, we return false.

If we complete the loop we have found the word. Note that we do not return true. We return the value of the last node in the trie. The reason is that the specified word may or may not be a complete word.

    /** 
     * Returns true if there is any word in the trie 
     * that starts with the given prefix.
     */
    public boolean startsWith(String prefix) {

        // **** initialization ****
        Trie p = this;

        // **** traverse the word and the trie ****
        for (int i = 0; i < prefix.length(); i++) {

            // **** ****
            char ch = prefix.charAt(i);

            // **** check if this character is in the trie (or not) ****

            // if (p.children.containsKey(ch)) {
            //     p = p.children.get(ch);
            // } else {
            //     return false;
            // }

            if (p.children[ch - 'a'] == null)
                return false;

            // **** move to the next node ****
            p = p.children[ch - 'a'];

        }

        // **** found prefix ****
        return true;
    }

This method is used to look for a prefix. In other words, we just care for the sequence of characters and we do not care it they are terminated with the end of a word or not.

We have seen this code in this post. Note that this method returns true if the loop completes.

    /**
     * To string method.
     * 
     * !!!! NOT PART OF SOLUTION !!!!
     */
    @Override
    public String toString() {
        // return "ch: " + this.ch + " end: " + this.end + 
        //         " children: " + this.children.toString();

        return "ch: " + this.ch + " end: " + this.end + 
                    " children: " + Arrays.toString(this.children);
    }

I just wanted to display a node in the Trie. We used it once from the test scaffolding. This code IS NOT PART OF THE SOLUTION.

Hope you enjoyed solving this problem as much as I did. The entire code for this project can be found in my GitHub repository.

If you have comments or questions regarding this, or any other post in this blog, or if you would like for me to help out with any phase in the SDLC (Software Development Life Cycle) of a project associated with a product or service, please do not hesitate and leave me a note below. If you prefer, send me a private e-mail message. I will reply as soon as possible.

Keep on reading and experimenting. It is one of the best ways to learn, become proficient, refresh your knowledge and enhance your developer toolset.

One last thing, many thanks to all 6,907 subscribers to this blog!!!

Keep safe during the COVID-19 pandemic and help restart the world economy. I believe we can all see the light at the end of the tunnel.

Regards;

John

john.canessa@gmail.com

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.