Contains Duplicate

Good day! It is Thursday morning and it is a typical gloomy and cold winter day in the Twin Cities of Minneapolis and St. Paul. The good thing is that is Thursday. One more day to go to and we can start enjoying the weekend. For most of us the weekend will be 2-days long. It is different from the past two weekends in which most people enjoyed 3 or 4 days off work.

Due to COVID-19 my wife and I just leave home for grocery shopping and healthcare appoints when needed. The good thing is that vaccination has started in the USA and hopefully in a few more months most of us will be vaccinated and can start getting back to normal. We all will see what happens.

Yesterday I attended a webinar and two coding problems were discussed. It is always interesting to watch someone else implementing code. I believe we can always learn something new with most (if not all) experiences no matter how simple or complicated they might be.

The first problem will be addressed in this post. The second will be for the following post (this evening or tomorrow morning). Since I like to have exhaustive testing performed on my code, I started by looking for the problem on LeetCode. I found it. The problem LeetCode 217 Contains Duplicate (https://leetcode.com/problems/contains-duplicate/) seems to be very similar to the requirements discussed during the webinar.

Given an array of integers, find if the array contains any duplicates.

Your function should return true if any value appears at least twice in the array, 
and it should return false if every element is distinct.

The idea is to check in an array of integers for one or more duplicates. We should check if the numbers are in any specific order, the limits (range) for the integers and find out if there is a single duplicate or multiple ones. The requirements tend to imply that there could be multiple duplicates. The name of the function, which we will see in a few, implies that there might be a single duplicate value. Some of the questions on the requirements you may have might be answered by the examples. Of course, if you are dealing with an actual work situation, you need to get such information before you implement and test a solution.

    public boolean containsDuplicate(int[] nums) {
        
    }

The signature of the function / method we should implement just provides an array of integers. We return true if we find a duplicate and false if we all values are unique.

1,2,3,1
main <<< nums: [1, 2, 3, 1]
main <<< duplicates(s): true
main <<< duplicates(s): true
main <<< duplicates(s): true
main <<< duplicates(s): true
main <<< duplicates(s): true


1,2,3,4
main <<< nums: [1, 2, 3, 4]
main <<< duplicates(s): false
main <<< duplicates(s): false
main <<< duplicates(s): false
main <<< duplicates(s): false
main <<< duplicates(s): false


1,2,3,4,5
main <<< nums: [1, 2, 3, 4, 5]
main <<< duplicates(s): false
main <<< duplicates(s): false
main <<< duplicates(s): false
main <<< duplicates(s): false
main <<< duplicates(s): false


1,2,3,4,5,6
main <<< nums: [1, 2, 3, 4, 5, 6]
main <<< duplicates(s): false
main <<< duplicates(s): false
main <<< duplicates(s): false
main <<< duplicates(s): false
main <<< duplicates(s): false


1,1,1,3,3,4,3,2,4,2
main <<< nums: [1, 1, 1, 3, 3, 4, 3, 2, 4, 2]
main <<< duplicates(s): true
main <<< duplicates(s): true
main <<< duplicates(s): true
main <<< duplicates(s): true
main <<< duplicates(s): true


1,2,3,2,5
main <<< nums: [1, 2, 3, 2, 5]
main <<< duplicates(s): true
main <<< duplicates(s): true
main <<< duplicates(s): true
main <<< duplicates(s): true
main <<< duplicates(s): true


1,2,3,2,7
main <<< nums: [1, 2, 3, 2, 7]
main <<< duplicates(s): true
main <<< duplicates(s): true
main <<< duplicates(s): false
main <<< duplicates(s): true
main <<< duplicates(s): true

The problem provides three samples. I have created some additional ones. All the test cases seem to start with the value 1. While testing I was sometimes selecting a subset of the values. In retrospect I could have limited myself to two test cases and select from them different subsets. Sorry about the large number of similar test cases.

I would like to call your attention to the last test case. It seems that the third implementation which passed all other cases, failed. I was tinkering with something and instead of ignoring the approach, I decided to leave it. I did not submit such implementation for testing at LeetCode. It would have passed the initial tests, but should have miserably failed the extensive tests.

As usual I implemented the solution using Java in my own computer using the VSCode IDE. For work I use many different programming languages and different IDEs. Due to the fact that I will implement the solution on my computer, I will need a test scaffolding. The test code WILL NOT BE PART OF THE SOLUTION. When done I will need to copy and paste the contents of the function / method specified by LeetCode.

    /**
     * Test scaffolding.
     * !!! NOT PART OF THE SOLUTION !!!
     * 
     * @throws IOException
     */
    public static void main(String[] args) throws IOException {
        
        // **** open buffered reader ****
        BufferedReader br = new BufferedReader(new InputStreamReader(System.in));

        // **** read the line and populate array of integers ****
        int[] nums = Arrays.stream(br.readLine().split(","))
                        .mapToInt(Integer::parseInt)
                        .toArray();
 
        // **** close buffere reader ****
        br.close();

        // ???? ????
        System.out.println("main <<< nums: " + Arrays.toString(nums));

        // **** call function to check for duplicates and display result ****
        System.out.println("main <<< duplicates(s): " + containsDuplicate(nums));

        // **** call function to check for duplicates and display result ****
        System.out.println("main <<< duplicates(s): " + containsDuplicate1(nums));

        // **** call function to check for duplicates and display result ****
        System.out.println("main <<< duplicates(s): " + containsDuplicate2(nums));

        // **** call function to check for duplicates and display result ****
        System.out.println("main <<< duplicates(s): " + containsDuplicate3(nums));

        // **** call function to check for duplicates and display result ****
        System.out.println("main <<< duplicates(s): " + containsDuplicate4(nums));
    }

Our test code opens a buffered reader, reads the line holding the integers, and creates an array. The buffered reader is closed. The array is displayed to allow us to check if all is well so far. The test code then makes calls to different implementations of the required function / method and displays the results. They should all match. As I mentioned before, one implementation fails one test.

    /**
     * Given an array of integers, find if the array contains any duplicates.
     * Sort array and traverse looking for duplicate.
     * If the nums array is modified it may produce side effects.
     * Need a duplicate array O(n) space.
     * 
     * Runtime: 3 ms, faster than 99.61% of Java online submissions.
     * Memory Usage: 41.9 MB, less than 96.66% of Java online submissions.
     */
    static boolean containsDuplicate(int[] nums) {

        // **** sanity checks ****
        if (nums.length == 0 || nums.length == 1)
            return false;

        // **** create local copy O(n) ****
        int[] localNums = Arrays.copyOf(nums, nums.length);
        
        // **** sort local copy  O(n log(n)) ****
        Arrays.sort(localNums);

        // **** traverse array looking for duplicates O(n) ****
        for (int i = 0; i < localNums.length - 1; i++) {
            if (localNums[i] == localNums[i + 1])
                return true;
        }

        // **** no duplicates ****
        return false;
    }

As you can see on the header, this implementation was submitted to LeetCode.

We perform some sanity checks. I like to check argument(s) in order to avoid performing unnecessary operations.  That said; I did not check for a NULL array. In production that would be required.

Since we are going to sort the array, we will first make a copy of it and then sort the copy. This does not affect our function / method but could introduce problems in the calling chain because we altered the contents of the array.

We then traverse the sorted array checking if two consecutive values match. If so, we return with a true value since we have encountered a duplicate.

If we get to the last line of the function, we return false because we did not find a duplicate.

The general structure of this function is used in the rest of the implementations.

    /**
     * Given an array of integers, find if the array contains any duplicates.
     * Needs a set to check for duplicates O(n) space.
     * Uses HashSet.
     * 
     * Runtime: 11 ms, faster than 5.48% of Java online submissions
     * Memory Usage: 53.8 MB, less than 5.03% of Java online submissions
     */
    static boolean containsDuplicate1(int[] nums) {

        // **** sanity checks ****
        if (nums.length == 0 || nums.length == 1)
            return false;

        // **** initialization ****
        HashSet<Integer> hs = new HashSet<>();

        // **** traverse the array checking for duplicate entries O(n) ****
        for (int i = 0; i < nums.length; i++) {

            // **** check if in set ****
            if (hs.contains(nums[i]))
                return true;
            
            // **** add to set ****
            hs.add(nums[i]);
        }
        
        // **** no duplicates ****
        return false;
    }

This implementation was also submitted to LeetCode. We decided not to alter the input array. To determine if we have duplicates we will use a hash set. We initialize it and then enter a loop. We traverse the input array. For each entry we check if we have seen this value before. If so we have a duplicate so we exit and return true. If the value is not in the hash set we add it and proceed.

    /**
     * Given an array of integers, find if the array contains any duplicates.
     * Computes and compares two sums O(2) space.
     * 
     * !!!! Works some times depending on values !!!!
     */
    static boolean containsDuplicate2(int[] nums) {

        // **** sanity checks ****
        if (nums.length == 0 || nums.length == 1)
            return false;

        // **** initialization O(2) ****
        int arrSum  = 0;
        int sum     = nums.length * (nums.length + 1) / 2;
        
        // **** compute the sum of elements in the array O(n) ****
        for (int i = 0; i < nums.length; i++) {
            arrSum += nums[i];
        }

        // **** check is sums mismatch ****
        if (sum != arrSum)
            return true;

        // **** no duplicates ****
        return false;
    }

In this implementation we will attempt use a formula to see if we can figure out if there is a duplicate in the array. The idea is to sum the values in the array and compare them to the results of the formula Sum of the First n Natural Numbers. This was not submitted because due to the observed / deduced requirements the approach would not work. As you can see I did not submit this code. This is the function that failed in the last test case.

    /**
     * Given an array of integers, find if the array contains any duplicates.
     * Sort array and traverse looking for duplicate.
     *
     * Runtime: 3 ms, faster than 99.61% of Java online submissions.
     * Memory Usage: 42.5 MB, less than 80.96% of Java online submissions.
     */
    static boolean containsDuplicate3(int[] nums) {

        // **** sanity checks ****
        if (nums.length == 0 || nums.length == 1)
            return false;
        
        // **** sort local copy O(n log(n)) ****
        Arrays.sort(nums);

        // **** traverse array looking for duplicates O(n) ****
        for (int i = 0; i < nums.length - 1; i++) {
            if (nums[i] == nums[i + 1]) return true;
        }

        // **** no duplicates ****
        return false;
    }

This is similar to the first implementation. The only difference is that we did not make a copy of the array, so we might introduce side effects within the calling stack. All is fine for this call, but in production this approach should not be used.

Note that this is a fastest implementation of the solution.

    /**
     * Given an array of integers, find if the array contains any duplicates.
     * Needs a set to check for duplicates O(n) space.
     * Uses HashMap.
     * 
     * Runtime: 5 ms, faster than 70.54% of Java online submissions.
     * Memory Usage: 45.1 MB, less than 44.32% of Java online submissions.
     */
    static boolean containsDuplicate4(int[] nums) {

        // **** sanity checks ****
        if (nums.length == 0 || nums.length == 1)
            return false;

        // **** initialization ****
        HashMap<Integer, Integer> hm = new HashMap<>();

        // **** traverse the array checking for duplicate entries O(n) ****
        for (int i = 0; i < nums.length; i++) {

            // **** ****
            if (hm.containsKey(nums[i]))
                return true;

            // **** ****
            hm.put(nums[i], 0);
        }
        
        // **** no duplicates ****
        return false;
    }

I had to do this. We have already seen an implementation that uses a hash set. I just wanted to see how it would compare to an implementation that uses a hash map. Note that we do not have a use for the key : value pair. We are only interested in the key side of things. Take a look at the execution time.

I would expect that a HashMap would be slower than a HashSet implementation. The statistics collected by LeetCode vary depending on the workload at the site. You can take the values as a guideline but that is it. In practice you should always use a local machine or set of machines, run multiple times (at least five) and drop the fastest and slowest readings and average the remaining ones. At work I spend time optimizing updates and new features that I have developed. I always follow the 5/3 rule and run the set of tests attempting to isolate the results from other code to keep them as faithful as possible.

Hope you enjoyed solving this problem as much as I did. The entire code for this project can be found in my GitHub repository.

If you have comments or questions regarding this, or any other post in this blog, or if you would like for me to help out with any phase in the SDLC (Software Development Life Cycle) of a project associated with a product or service, please do not hesitate and leave me a note below. If you prefer, send me a private e-mail message. I will reply as soon as possible.

Keep on reading and experimenting. It is the best way to learn, become proficient, refresh your knowledge and enhance your developer toolset.

One last thing, many thanks to all 5,505 subscribers to this blog!!!

Keep safe during the COVID-19 pandemic and help restart the world economy. I believe we can all see the light at the end of the tunnel.

Regards;

John

john.canessa@gmail.com

Leave a Reply

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.