Maximum Length of a Concatenated String with Unique Characters

Hello and hope you are doing well. Today the temperature went up to 51F in the Twin Cities of Minneapolis and St. Paul. I did not get a chance to go out. Hopefully we will experience a mild winter this year.

Sorry but it is the middle of the week (hump day) and I do not have much to tell you. Time appears to be passing quite fast. Many cities in the country are upping COVID-19 restrictions. It is going to be an interesting Thanksgiving Day coming up in about a week. My wife and I will be preparing a meal. Will take some to our family and on our way back will get on-line (Jitsi or Skype) and will enjoy the meal.

A couple days ago I selected LeetCode problem 1239 Maximum Length of a Concatenated String with Unique Characters. I thought it was one of the best so far. The reason for this is that you can experiment and attempt to solve it in different ways. I tried at least four. They all appeared to have been accepted. I will comment more on this later on.

Given an array of strings arr. 
String s is a concatenation of a sub-sequence of arr which have unique characters.

Return the maximum possible length of s.

Constraints:

1 <= arr.length <= 16
1 <= arr[i].length <= 26
arr[i] contains only lower case English letters.

The problem is simply defined. Based on this description, is it reasonable to assume that the provided strings are unique? The answer is no. You may encounter duplicate characters in the set of strings provided as inputs.

If you are interested in this problem, please check it out at the LeetCode site. They provide some samples and constraints.

class Solution {
    public int maxLength(List<String> arr) {
        
    }
}

We need to populate the maxLength() function with our solution.

After reading and thinking about the problem, it seems we need to generate all possible unique combinations of strings. For each we need to verify that the concatenated strings do not hold duplicate lower case characters. If you wish to experiment with the counts of combinations the Combinations without Repetition could be interesting to review.

I decided to solve the problem in Java using the VSCode IDE on a Windows 10 machine. Because I like to solve the problems on a machine with the tools I use for work, I had to generate a test scaffolding.

"un","iq","ue"
main <<< 0 s ==>un<==
main <<< 1 s ==>iq<==
main <<< 2 s ==>ue<==
main <<< maxLength0: 4
main <<< maxLength1: 4
main <<< maxLength2: 4
main <<<  maxLength: 4


"cha","r","act","ers"
main <<< 0 s ==>cha<==
main <<< 1 s ==>r<==
main <<< 2 s ==>act<==
main <<< 3 s ==>ers<==
main <<< maxLength0: 6
main <<< maxLength1: 6
main <<< maxLength2: 6
main <<<  maxLength: 6


"a","ab","abc"
main <<< 0 s ==>a<==
main <<< 1 s ==>ab<==
main <<< 2 s ==>abc<==
main <<< maxLength0: 3
main <<< maxLength1: 3
main <<< maxLength2: 3
main <<<  maxLength: 3


"a","b","c","d"
main <<< 0 s ==>a<==
main <<< 1 s ==>b<==
main <<< 2 s ==>c<==
main <<< 3 s ==>d<==
main <<< maxLength0: 4
main <<< maxLength1: 4
main <<< maxLength2: 4
main <<<  maxLength: 4


"ab","ba","cd","dc","ef","fe","gh","hg","ij","ji","kl","lk","mn","nm","op","po"
main <<< 0 s ==>ab<==
main <<< 1 s ==>ba<==
main <<< 2 s ==>cd<==
main <<< 3 s ==>dc<==
main <<< 4 s ==>ef<==
main <<< 5 s ==>fe<==
main <<< 6 s ==>gh<==
main <<< 7 s ==>hg<==
main <<< 8 s ==>ij<==
main <<< 9 s ==>ji<==
main <<< 10 s ==>kl<==
main <<< 11 s ==>lk<==
main <<< 12 s ==>mn<==
main <<< 13 s ==>nm<==
main <<< 14 s ==>op<==
main <<< 15 s ==>po<==
main <<< maxLength0: 16
main <<< maxLength1: 16
main <<< maxLength2: 16
main <<<  maxLength: 16


"abcdefghijklmnopqrstuvwxyz"
main <<< 0 s ==>abcdefghijklmnopqrstuvwxyz<==
main <<< maxLength0: 26
main <<< maxLength1: 26
main <<< maxLength2: 26
main <<<  maxLength: 26


"a","abc","d","de","def"
main <<< 0 s ==>a<==
main <<< 1 s ==>abc<==
main <<< 2 s ==>d<==
main <<< 3 s ==>de<==
main <<< 4 s ==>def<==
main <<< maxLength0: 6
main <<< maxLength1: 6
main <<< maxLength2: 6
main <<<  maxLength: 6


"s","sa","m","e","mu","ei"
main <<< 0 s ==>s<==
main <<< 1 s ==>sa<==
main <<< 2 s ==>m<==
main <<< 3 s ==>e<==
main <<< 4 s ==>mu<==
main <<< 5 s ==>ei<==
main <<< maxLength0: 6
main <<< maxLength1: 6
main <<< maxLength2: 6
main <<<  maxLength: 6


"yy","bkhwmpbiisbldzknpm"
main <<< 0 s ==>yy<==
main <<< 1 s ==>bkhwmpbiisbldzknpm<==
main <<< maxLength0: 0
main <<< maxLength1: 0
main <<< maxLength2: 0
main <<<  maxLength: 0

The first line represents the list of strings which should not exceed 16. It seems that the program reads the input line and displays the strings. It seems that we currently have four different attempts. They all seem to return the same result.

I tried different approaches some of which were consumed by others. I was going to get a couple more (e.g., dfs) implemented, but I decided to move on.

The large number of test cases came from LeetCode. I tend to clear the previous solution and then paste the next which I develop in my computer. Since I am not using the LeetCode scaffolding while developing, I miss things in the copy procedure or incorporate a new bug which I did not have in a previous implementation. That said; some test cases are my own doing.

For simplicity will discuss the last approach. If interested on the entire code, you can get It from my GitHub repository.

    /**
     * Test scaffolding
     */
    public static void main(String[] args) {
        
        // **** open scanner ****
        Scanner sc = new Scanner(System.in);

        // **** read strings into array ****
        String[] strs = sc.nextLine().trim().split(",");

        // **** close scanner ****
        sc.close();

        // **** create and populate list ****
        List<String> arr = new ArrayList<String>();
        for (int i = 0; i < strs.length; i++) {
            arr.add(strs[i].substring(1, strs[i].length() - 1));
        }
        
        // ???? display list ????
        for (int i = 0; i < arr.size(); i++)
            System.out.println("main <<< " + i + " s ==>" + arr.get(i) + "<==");

        // // **** process and display results ****
        // System.out.println("main <<< maxLength0: " + maxLength0(arr));

        // // **** process and display results ****
        // System.out.println("main <<< maxLength1: " + maxLength1(arr));

        // // **** process and display results ****
        // System.out.println("main <<< maxLength2: " + maxLength2(arr));

        // **** process and display results ****
        System.out.println("main <<<  maxLength: " + maxLength(arr));
    }

The first few lines are used to collect the set of strings. At that point we are able to call the entry point function and display the returned status. Let’s dive into the last approach.

    /**
     * Seems like the initial set of strings may contain 
     * dupplicate characters.
     * 
     * Runtime: 106 ms, faster than 16.80% of Java online submissions
     * Memory Usage: 47.7 MB, less than 5.76% of Java online submissions
     */
    static int maxLength(List<String> arr) {
        
        // **** number of strings ****
        int n = arr.size();

        // **** one mask per string ****
        int[] masks = new int[n];

        // **** loop once per string ****
        for (int i = 0; i < n; i++) {

            // **** get current string ****
            String str = arr.get(i);

            // **** loop once per character in string ****
            for (int j = 0; j < str.length(); j++) {

                // **** ****
                int curr = 1 << (str.charAt(j) - 'a');

                // **** check for duplicate characters in a string ****
                if ((masks[i] & curr) != 0) {

                    // **** indicate an error ****
                    masks[i] = -1;
                    break;
                }

                // **** ****
                masks[i] |= curr;
            }
        }

        // **** loop generating all possible combinations ****
        List<int[]> combs = new ArrayList<int[]>();
        for (int r = 1; r <= n; r++) {
            combs.addAll(generateIntCombinations(n, r));
        }

        // **** return max length of unique string ****
        return uniqueStrMaxLen(arr, combs, masks);
    }

We get the number of strings in the array list. We create an array of integers which by the name, will be used as masks to determine which characters are in each string. This should help verify that there are no duplicate characters each time we add a string to the string of unique characters we are about to concatenate to.

The following loop is used to build the masks. We select a string and then loop through each character setting the position in the mash to 1 if the character is in the string. This works because the English alphabet contains 26 lowercase characters which easily fit in 32 or 64 bits. In our case we are using integers that fit in a 32-bit word.

In this case we will generate all combinations for the strings. We will have combinations of n strings taken 1, 2, 3, … n – 1 at a time.

The final step is to check each set of strings and determine if there are only unique characters. If so, we count the length of such strings for our result. We can keep a running max value and update it as we go.

    /**
     * Generate combinations of integers using iterative algorithm.
     */
    static List<int[]> generateIntCombinations(int n, int r) {

        /// **** initialization ****
        List<int[]> combinations    = new ArrayList<>();
        int[] combination           = new int[r];
     
        // **** initialize with lowest lexicographic combination ****
        for (int i = 0; i < r; i++) {
            combination[i] = i;
        }
     
        // **** ****
        while (combination[r - 1] < n) {
            combinations.add(combination.clone());
     
             // **** generate next combination in lexicographic order ****
            int t = r - 1;
            while (t != 0 && combination[t] == n - r + t) {
                t--;
            }
            combination[t]++;
            for (int i = t + 1; i < r; i++) {
                combination[i] = combination[i - 1] + 1;
            }
        }
     
        // **** return list of combinations ****
        return combinations;
    }

In this function we just generate using an iterative approach all possible combinations. Note that the combinations do not repeat. For example combination “ab” is included, but combination “ba” is not.

    /**
     * 
     */
    static int uniqueStrMaxLen(List<String> arr, List<int[]> combs, int[] masks) {

        // **** initialization ****
        int maxLen  = 0;

        // **** loop once per combination ****
        for (int c = 0; c < combs.size(); c++) {

            // **** get current combination ****
            int[] comb = combs.get(c);

            // **** determine if this combination is unique ****
            int mask        = 0;
            boolean unique  = true;
            for (int i = 0; unique && i < comb.length; i++) {

                // **** check if not unique ****
                if ((mask & masks[comb[i]]) != 0) {
                    unique = false;
                    continue;
                }

                // **** update mask ****
                mask = mask | masks[comb[i]];
            }

            // **** duplicate characters in this string :o) ****
            if (mask == -1)
                continue;

            // **** get length of unique string & update max length ****
            if (unique) {

             // **** generate the length of the string ****
                int len = 0;
                while (mask != 0) {
                    if ((mask & 1) == 1) {
                        len++;
                    }
                    mask >>= 1;
                }

                // **** update max length of unique string ****
                maxLen = Math.max(maxLen, len);
            }
        }

        // **** ****
        return maxLen;
    }

This function initializes a maximum length. It then loops once per combination starting with 1 element and ending with n. Of course the actual loop is between 0 and n – 1.

We first determine using the masks if the combination contains unique characters. If it does we compute the length by shifting the mask and counting the number of 1s which represent the lower case characters in the string.

With the length in hand we can then update the maximum length taking into account the current string.

When done, we just return the maximum length.

At some point this problem was quite popular with Microsoft. I believe it deserves additional time. I will generate a post regarding combinations which will give us a chance to further improve on the execution time of the code.

Hope you enjoyed solving this problem as much as I did. The entire code for this project can be found in my GitHub repository.

If you have comments or questions regarding this, or any other post in this blog, or if you would like for me to help out with any phase in the SDLC (Software Development Life Cycle) of a project associated with a product or service, please do not hesitate and leave me a note below. If you prefer, send me a private e-mail message. I will reply as soon as possible.

Keep on reading and experimenting. It is the best way to learn, become proficient, refresh your knowledge and enhance your developer toolset!

One last thing, many thanks to all 4,285 subscribers to this blog!!!

Keep safe during the COVID-19 pandemic and help restart the world economy.

John

john.canessa@gmail.com

Leave a Reply

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.