Wildcard Matching

It has been a few days since my last post. Busy with work and family items. The weather in the Twin Cities of Minneapolis and St. Paul has been quite pleasant. The temperature has been at or slightly above average. On the other hand precipitation seems to be below average. Hopefully this pattern will not disrupt farmer’s production.

I had selected a couple articles for brief discussion in this post. I did not bookmark one of them and I am having a hard time finding it. In a nutshell the articled had to do with Pfizer, their COVID-19 vaccine and India being able to mass-produce it. In the USA we have the concept of a patent. It is a great idea that helps the rightful owner of an idea to freely market and sell their invention for a period of time. If not mistaken the period is seven years. I fully agree with such concept. It is fair and invention needs to be rewarded to motivate progress.

Like any other subject, there are always two sides to the same coin. Biden has proposed to ban all COVID-19 patents to help the world win the battle against it. Good for public relations, but would such approach work? My opinion is no. In the USA alone there are about 30% of the people that have and will refuse a COVID-19 vaccine. That said, we will not achieve herd immunity. I have relatives and friends in different countries. Some are very interested in getting a vaccine. Some have traveled to the USA at a high expense, spent a month here, and received the two doses of the Pfizer vaccine. Some have no desire in getting vaccinated at all. The main reasons are the lack of effectiveness (most vaccines provide 100% protection) and side effects (you can possibly get ill after receiving the vaccine). Not sure how effective is the Pfizer vaccine, but it is not 100%. Regarding the side effects, the COVID-19 vaccines administered in the USA seem to not cause side effects. You cannot say the same for the two Chinese vaccines with are about 70% effective and have shown a large number of people getting very ill with side effects. Hopefully the numbers are rapidly improving with time.

I received a link to the post “18 Reasons I Won’t Be Getting a COVID Vaccine” by Christian Elliot. Some of the points are hard to ignore and our government should act on them. On the other side big Pharma lawyers and lobbyist will make change hard to achieve.

BTW, my wife and I have received the two Pfizer doses. The same holds true for the family of one of our sons. The other is still thinking about it.

The other news that called my interest comes from Yale University. The article “DODD: Sit down, be humble” by Ethan Dodd. Perhaps the article is too political, but politics seem to be something we are all familiar with.

What called my attention was the acceptance that no one has all the answers. That is especially true at work. People have different backgrounds and experiences. That is what diversity means to me. Color of skin, religious believes, and gender as examples, is not what makes people diverse. We are diverse because of our unique experiences. Of course, as the article states, pretending you have all the answers is not beneficial in any company, relationship or society. Keep that in mind the next time you are working on a project.

Enough chit-chat and let’s get to the main subject of this post. I went after LeetCode problem 44 Wildcard Matching. This is rated as a hard problem. I agree with the rating.

On a side note, if you check the related topics, which I did, Dynamic Programming is included in the set. For some reason it used to be that Facebook did not consider interview problems that required dynamic programming. Not sure why or if such policy has changed with time.

Given an input string (s) and a pattern (p), 
implement wildcard pattern matching 
with support for '?' and '*' where:

o '?' Matches any single character.
o '*' Matches any sequence of characters (including the empty sequence).

The matching should cover the entire input string (not partial).

Constraints:

o 0 <= s.length, p.length <= 2000
o s contains only lowercase English letters.
o p contains only lowercase English letters, '?' or '*'

The requirements are simple to understand. I believe most (never generalize) software engineers are familiar with regular expressions. We have used them in several posts in this blog. We are given a string and a pattern. Our task is to determine if the pattern matches the contents of the string.

We will use the Java programming language and the VSCode IDE to solve the problem on a Windows 10 computer. Because of this choice we will also need to include a simple test scaffold which will read the two input strings, call our implementation of the required method / functions and display the result. If you use the IDE provided by LeetCode you will not require building test code.

public boolean isMatch(String s, String p) {
	
}

The signature for the method / function is provided. We need a string and a pattern.

aa
a
main <<< s ==>aa<==
main <<< p ==>a<==
main <<<   output: false
main <<< duration: 9 ms
main <<<   output: false
main <<< duration: 0 ms
main <<<   output: false
main <<< duration: 1 ms
main <<<   output: false
main <<< duration: 0 ms
main <<<   output: false
main <<< duration: 0 ms


aa
*
main <<< s ==>aa<==
main <<< p ==>*<==
main <<<   output: true
main <<< duration: 13 ms
main <<<   output: true
main <<< duration: 0 ms
main <<<   output: true
main <<< duration: 0 ms
main <<<   output: true
main <<< duration: 0 ms
main <<<   output: true
main <<< duration: 0 ms


cb
?a
main <<< s ==>cb<==
main <<< p ==>?a<==
main <<<   output: false
main <<< duration: 11 ms
main <<<   output: false
main <<< duration: 1 ms
main <<<   output: false
main <<< duration: 1 ms
main <<<   output: false
main <<< duration: 0 ms
main <<<   output: false
main <<< duration: 1 ms


adceb
*a*b
main <<< s ==>adceb<==
main <<< p ==>*a*b<==
main <<<   output: true
main <<< duration: 9 ms
main <<<   output: true
main <<< duration: 0 ms
main <<<   output: true
main <<< duration: 1 ms
main <<<   output: true
main <<< duration: 0 ms
main <<<   output: true
main <<< duration: 0 ms


acdcb
a*c?b
main <<< s ==>acdcb<==
main <<< p ==>a*c?b<==
main <<<   output: false
main <<< duration: 10 ms
main <<<   output: false
main <<< duration: 0 ms
main <<<   output: false
main <<< duration: 1 ms
main <<<   output: false
main <<< duration: 0 ms
main <<<   output: false
main <<< duration: 0 ms


babbbbaabababaabbababaababaabbaabababbaaababbababaaaaaabbabaaaabababbabbababbbaaaababbbabbbbbbbbbbaabbb
b**bb**a**bba*b**a*bbb**aba***babbb*aa****aabb*bbb***a
main <<< s ==>babbbbaabababaabbababaababaabbaabababbaaababbababaaaaaabbabaaaabababbabbababbbaaaababbbabbbbbbbbbbaabbb<==
main <<< p ==>b**bb**a**bba*b**a*bbb**aba***babbb*aa****aabb*bbb***a<==
main <<<   output: false
main <<< duration: 11575 ms
main <<<   output: false
main <<< duration: 1 ms
main <<<   output: false
main <<< duration: 1499 ms
main <<<   output: false
main <<< duration: 1 ms
main <<<   output: false
main <<< duration: 0 ms

The LeetCode site provides five test cases. The last test case part of the set of almost 2,000 test cases that you need to pass to get your solution accepted.

We are provided with two input lines. The first represents the string and the second the pattern.

Our test code reads the input strings and displays them to verify that all is well so far.

We then have a set of lines in groups of two. The first line indicates the result of an implementation of the function of interest. The second line displays the time the function took to generate the associated result. It seems we have five implementations.

It seems that all implementations return the same results. The execution times vary between them but the differences are in a range of 1:10.

We now take a look at an actual test conducted by LeetCode. The execution times vary base on the implementation. Obviously some are considerably faster than others. Also note that the last test case is quite elaborate and could easily be simplified by replacing multiple consecutive ‘*’s by a single one.

   /**
     * Test scaffold.
     * @throws IOException
     */
    public static void main(String[] args) throws IOException {

        // **** initialization ****
        long start  = 0;
        long end    = 0;

        // **** open buffered reader ****
        BufferedReader br = new BufferedReader(new InputStreamReader(System.in));

        // **** read string ****
        String s = br.readLine().trim();

        // **** read pattern ****
        String p = br.readLine().trim();

        // **** close buffered reader ****
        br.close();

        // ???? ????
        System.out.println("main <<< s ==>" + s + "<==");
        System.out.println("main <<< p ==>" + p + "<==");

        // **** call method of interest and display result ****
        start = System.currentTimeMillis();
        System.out.println("main <<<   output: " + isMatch0(s, p));
        end = System.currentTimeMillis();
        System.out.println("main <<< duration: " + (end - start) + " ms");

        // **** call method of interest and display result ****
        start = System.currentTimeMillis();
        System.out.println("main <<<   output: " + isMatch1(s, p));
        end = System.currentTimeMillis();
        System.out.println("main <<< duration: " + (end - start) + " ms");

        // **** call method of interest and display result ****
        start = System.currentTimeMillis();
        System.out.println("main <<<   output: " + isMatch2(s, p));
        end = System.currentTimeMillis();
        System.out.println("main <<< duration: " + (end - start) + " ms");

        // ???? ????
        // System.out.println("main <<<      hm: " + hm.toString());
        // System.out.println("main <<< hm.size: " + hm.size() + " callCounter: " + callCounter);

        // **** call method of interest and display result ****
        start = System.currentTimeMillis();
        System.out.println("main <<<   output: " + isMatch3(s, p));
        end = System.currentTimeMillis();
        System.out.println("main <<< duration: " + (end - start) + " ms");

        // **** call method of interest and display result ****
        start = System.currentTimeMillis();
        System.out.println("main <<<   output: " + isMatch(s, p));
        end = System.currentTimeMillis();
        System.out.println("main <<< duration: " + (end - start) + " ms");
    }

Our test code starts by initializing two values that will be used to time our different implementations.

We then use a buffered reader to read the string and pattern required by the method in question. Both strings are then displayed.

We then call five implementations of the function / method in question. On each instance we compute and display the elapsed time in milliseconds. We also display the result. In all six examples the different implementations return the same value.

Note that we have a couple lines that have been commented out. As we will see later in this post, they are used to count the number of times the method in question is called, and compare it to the number of different values generated by calling the method.

Note that timing the functions and counting the number of times a function is called is NOT PART OF THE SOLUTION.

    /**
     * Given an input string (s) and a pattern (p), 
     * implement wildcard pattern matching with support for '?' and '*'.
     * 
     * first arg == p
     * second arg == s
     */
    static boolean isMatch0(String s, String p) {

        // **** base condition ****
        if (p.length() == 0 && s.length() == 0) {
            return true;
        }

        // // ???? ????
        // System.out.println("<<< p ==>" + p + "<==");

        // **** replace multiple contiguous '*'s with a single '*' ****
        p = p.replaceAll("\\*{2,}", "*");

        // // ???? ????
        // System.out.println("<<< p ==>" + p + "<== p.length: " + p.length());
        // System.out.println("<<< s ==>" + s + "<== s.length: " + s.length());

        // **** '?' in pattern and empty string ****
        if (p.length() > 0 && p.charAt(0) == '?' && s.length() == 0) {
            return false;
        }

        // **** '*' in pattern and empty string ****
        if (p.length() > 0 && p.charAt(0) == '*' && s.length() == 0) {
            return isMatch0(s, p.substring(1));
        }

        // **** '?' in pattern and character in string ****
        if (p.length() > 0 && p.charAt(0) == '?' && s.length() > 0)
            return isMatch0(s.substring(1), p.substring(1));

        // **** make sure the characters after '*' are present in second string ****
        if (p.length() > 1 && p.charAt(0) == '*' && s.length() == 0)
            return false;

        // **** '?' in pattern ****
        if ((p.length() > 1 && p.charAt(0) == '?') || 
            (p.length() != 0 && s.length() != 0 && p.charAt(0) == s.charAt(0)))
            return isMatch0(s.substring(1), p.substring(1));

        // **** a) consider current character of second string
        //      b) ignore current character of second string ****
        if (p.length() > 0 && p.charAt(0) == '*')
            return isMatch0(s, p.substring(1)) || isMatch0(s.substring(1), p);
        
        // **** ****
        return false;
    }

This method represents my first approach. Note that there is no runtime or memory usage information. The reason is that it passed all the first five examples but timed out while executing the sixth test which is part of the acceptance process from LeetCode.

    /**
     * Given an input string (s) and a pattern (p), 
     * implement wildcard pattern matching with support for '?' and '*'. 
     * 
     * Runtime: 26 ms, faster than 43.80% of Java online submissions.
     * Memory Usage: 39.4 MB, less than 47.89% of Java online submissions.
     */
    static boolean isMatch1(String s, String p) {

        // **** initialization ****
        boolean[][] match = new boolean[s.length() + 1][p.length() + 1];
        match[s.length()][p.length()] = true;

        // // ???? ????
        // System.out.println("<<< match: ");
        // for (int i = 0; i < match.length; i++)
        //     System.out.println(Arrays.toString(match[i]));

        // **** initialize match ****
        for (int i = p.length() - 1; i >= 0; i--) {
            if (p.charAt(i) != '*')
                break;
            else 
                match[s.length()][i] = true;
        }

        // // ???? ????
        // System.out.println("<<< match: ");
        // for (int i = 0; i < match.length; i++)
        //     System.out.println(Arrays.toString(match[i]));

        // **** process string and pattern backwards ****
        for (int i = s.length() - 1; i >= 0; i--) {
            for (int j = p.length() - 1; j >= 0; j--) {
                if (s.charAt(i) == p.charAt(j) || p.charAt(j) == '?')
                    match[i][j] = match[i + 1][j + 1];
                else if (p.charAt(j) == '*') 
                    match[i][j] = match[i + 1][j] || match[i][j + 1];
                else
                    match[i][j] = false;
            }
        }

        // // ???? ????
        // System.out.println("<<< match: ");
        // for (int i = 0; i < match.length; i++)
        //     System.out.println(Arrays.toString(match[i]));

        // **** return result ****
        return match[0][0];
    }

This method uses a two-dimensional array which is used to determine the result by traversing while populating the match array.

This version was accepted by LeetCode. Not the fastest, so I decided to continue reading and experimenting.

    // // ???? to keep track of repeated calls ????
    // static HashMap<String, Integer> hm = null;
    // static int callCounter = 0;


    // **** to cache repeated values (memoization) ****
    static HashMap<String, Boolean> cache = null;

The commented code is used to track the intermediate values and the number of times the method in question is called.

The cache is used to cache values so we do not repeat processing.

When you look at the simple examples, there is no need to cache because the values do not repeat and the number of calls to the method in question is quite small. When you take a look at the last example, things change. The number of call is quite large and a large number of values repeat. This is when memoization plays an important role.

    /**
     * Given an input string (s) and a pattern (p), 
     * implement wildcard pattern matching with support for '?' and '*'.
     * 
     * Entry point for recursion.
     * Bottom down approach.
     */
    static boolean isMatch2(String s, String p) {

        // // ???? to keep track of repeated calls ????
        // hm = new HashMap<String, Integer>();

        // **** initialize cache ****
        cache = new HashMap<String, Boolean>();

        // **** replace multiple contiguous '*'s with a single '*' ****
        p = p.replaceAll("\\*{2,}", "*");

        // **** recursive call ****
        return isMatch2(s, p, 0, 0);
    }

In this implementation we use the cache for memoization. We also remove duplicate ‘*’s from the pattern. Note that the first five test cases do not illustrate the need for this technique, but the sixth test case does.

We are now ready to call the associated recursive call.

    /**
     * Given an input string (s) and a pattern (p), 
     * implement wildcard pattern matching with support for '?' and '*'.
     * 
     * Recursive call.
     * Bottom-down approach.
     */
    static private boolean isMatch2(String s, String p, int startS, int startP) {

        // **** initialization ****
        boolean result = false;

        // **** build pair string ****
        String pair = "" + startS + "," + startP;


        // // ???? update pair count ????
        // if (!hm.containsKey(pair))
        //     hm.put(pair, 1);
        // else {
        //     int value = hm.get(pair);
        //     hm.put(pair, ++value);
        // }

        // // ???? count this call ????
        // callCounter++;


        // **** reached the end of both s and p ****
        if (startS == s.length() && startP == p.length())
            result = true;

        // **** there are still characters in S => there is no match ****
        else if (startP == p.length())
            result = false;

        // **** hopefully the remaining characters in P are all stars ****
        else if (startS == s.length())
            result = p.charAt(startP) == '*' && isMatch2(s, p, startS, startP + 1);

        // **** '*' either matches 0 or >= 1 character ****
        else if (p.charAt(startP) == '*')
            result = isMatch2(s, p, startS + 1, startP) || isMatch2(s, p, startS, startP + 1);

        // **** move both pointers ****
        else if (p.charAt(startP) == '?' || p.charAt(startP) == s.charAt(startS))
            result = isMatch2(s, p, startS + 1, startP + 1);

        // **** # the current char from P is a lowercase char different from s[startS] ****
        else 
            result =  false;
            
        // **** save result in cache (if needed) ****
        if (!cache.containsKey(pair))
            cache.put(pair, result);

        // **** return result ****
        return result;
    }

Note that this implementation is similar to the previous one. The difference is that we perform a set of tests and make recursive calls in a couple instances.

To be honest with you, I do not recall at this time if I submitted this implementation or not. I might but it might have timed out. Feel free to edit it and give it a try. If you do so, please let me know by leaving a message in the comment section.

    /**
     * Given an input string (s) and a pattern (p), 
     * implement wildcard pattern matching with support for '?' and '*'.
     * 
     * Bottom-up approach.
     * 
     * Runtime: 29 ms, faster than 27.41% of Java online submissions.
     * Memory Usage: 39.3 MB, less than 57.77% of Java online submissions.
     */
    static private boolean isMatch3(String s, String p) {

        // // **** replace multiple contiguous '*'s with a single '*' ****
        // p = p.replaceAll("\\*{2,}", "*");

        // **** initialization ****
        boolean dp[][] = new boolean[s.length() + 1][p.length() + 1];

        // **** ****
        for (int i = 0; i < s.length() + 1; i++) {
            for (int j = 0; j < p.length() + 1; j++) {

                // **** ****
                int startS = i - 1;
                int startP = j - 1;

                // **** conditions ****
                if (i == 0 && j == 0)
                    dp[i][j] = true;

                else if (i == 0) 
                    dp[i][j] = p.charAt(startP) == '*' && dp[i][j - 1];

                else if (j == 0) 
                    dp[i][j] = false;

                else if (p.charAt(startP) == '*') 
                    dp[i][j] = dp[i][j - 1] || dp[i - 1][j];

                else if (s.charAt(startS) == p.charAt(startP) || p.charAt(startP) == '?')
                    dp[i][j] = dp[i - 1][j - 1]; 
            }
        }

        // **** return result ****
        return dp[s.length()][p.length()];
    }

We are back testing the approach of using a matrix to keep the intermediate results and then traversing it to obtain the result.

It seems that at some point I added and then commented out removing the duplicate ‘*’s. Note that LeetCode has about 2,000 test cases applied to the code. Perhaps most of them do not have duplicate ‘*’s so the time spent attempting to remove them became a liability. Note the execution time. We already have a letter implementation.

    /**
     * Given an input string (s) and a pattern (p), 
     * implement wildcard pattern matching with support for '?' and '*'.
     * 
     * Runtime: 2 ms, faster than 100.00% of Java online submissions.
     * Memory Usage: 38.6 MB, less than 99.58% of Java online submissions.
     */
    static boolean isMatch(String s, String p) {

        // **** initialization ****
        int sLen = s.length(); 
        int pLen = p.length();

        int sIdx = 0;
        int pIdx = 0;

        int starIdx = -1;
        int sTmpIdx = -1;
        
        // **** ****
        while (sIdx < sLen) {
            if (pIdx < pLen && (s.charAt(sIdx) == p.charAt(pIdx) || p.charAt(pIdx) == '?')){
                sIdx++;
                pIdx++;
            } else if (pIdx < pLen && p.charAt(pIdx) == '*'){
                starIdx = pIdx;
                sTmpIdx = sIdx;
                pIdx++;
            } else if (starIdx == -1){
                return false;
            } else {
                sIdx    = sTmpIdx + 1;
                pIdx    = starIdx + 1;
                sTmpIdx = sIdx;
            }
            
        }

        // **** return false result ****
        for (int i = pIdx; i < pLen; i++){
            if (p.charAt(i) != '*')
                return false;
        }

        // **** return true result ****
        return true;
    }

This is an edited copy of the fastest implementation submitted to LeetCode. It is hands down the fastest!

I would like to point out that as soon as you have a solution accepted, you can visit a section in LeetCode with different approaches and associated explanations. Make sure you always check such section to learn more about the different approaches. In this case you can find the explanations here.

Hope you enjoyed solving this problem as much as I did. The entire code for this project can be found in my GitHub repository.

If you have comments or questions regarding this, or any other post in this blog, please do not hesitate and leave me a note below. I will reply as soon as possible.

Keep on reading and experimenting. It is one of the best ways to learn, become proficient, refresh your knowledge and enhance your developer toolset.

Thanks for reading this post, feel free to connect with me John Canessa at LinkedIn.

Regards;

John

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.