Encode and Decode TinyURL in Java

It is Sunday morning again. It seems that time is flying by faster than ever. Yesterday I worked all morning (about 6 hours). My daughter in law was going to visit her family and my son had to stay home. They were expecting a delivery and installation of a piece of exercising equipment. The delivery was scheduled between 03:00 PM and 06:00 PM.

Earlier that day I had made some pizza dough. I used to knit the dough by hand. In the past couple years I have switched to using our Kitchenaid mixer. I can take all the ingredients, mix them and put the dough to rise in about 25 minutes. I do it while chatting with my wife during breakfast. She just watches me switch tasks while having yogurt with homemade granola and warm milk with a triple espresso. We both like to drink coffee.

My son showed up at our place around 01:00 PM. We made and consume four pizzas. They were all different. My son took two slices home to share with his family. Around 03:00 PM we all headed to his place to chat while waiting for the delivery. About 03:30 PM the delivery was made and two technicians put the unit together. It took them about 1 ½ hours.

Today I will finish this post, prepare and have breakfast with my wife. After getting ready, I will work for a few hours. We need to get groceries. We are running low in milk and yogurt.

In this post we will deal with LeetCode 535 “Encode and Decode TinyURL”. This seems a topic that we could better relate to.

Note: This is a companion problem to the System Design problem: 
Design TinyURL.
https://leetcode.com/discuss/interview-question/124658/Design-a-URL-Shortener-(-TinyURL-)-System/

TinyURL is a URL shortening service where you enter a URL such as:
https://leetcode.com/problems/design-tinyurl 
and it returns a short URL such as:
http://tinyurl.com/4e9iAk.

Design the encode and decode methods for the TinyURL service.

There is no restriction on how your encode/decode algorithm should work.

You just need to ensure that a URL can be encoded to a tiny URL 
and the tiny URL can be decoded to the original URL.

Apparently there is a companion problem. I will take a look at it later this week.

For example, if you watch YouTube videos (and who does not), chances are that you have noticed that the URLs are somewhat different to most URLs. They have been replaced in order to reduce space and to hide information about where the video clips are stored.

We need to design the encode() and decode() methods that will take respectively a long URL and convert it to a short URL and vice versa. Note that we can do whatever we want in the required methods.

I will develop the code using the Java programming language on a Windows 10 machine using the VSCode IDE. The simplest approach is to directly solve it in the LeetCode web site using the provided IDE.

Since I will use my computer, in addition to solving the problem, we will be generating a test scaffolding.

public class Codec {

    // Encodes a URL to a shortened URL.
    public String encode(String longUrl) {
        
    }

    // Decodes a shortened URL to its original URL.
    public String decode(String shortUrl) {
        
    }
}

// Your Codec object will be instantiated and called as such:
// Codec codec = new Codec();
// codec.decode(codec.encode(url));

The name of the class and the two required methods are shown. In addition, note how LeetCode will be testing our implementation. That is interesting.

Something we should consider is the concept of Bijection. To solve this problem we need to take a string and convert it to another. Later we take the resulting string and we need to return the original one.

https://leetcode.com/problems/design-tinyurl
main <<< longUrl ==>https://leetcode.com/problems/design-tinyurl<==
main <<< shortUrl ==>jib<==
main <<< longUrl ==>https://leetcode.com/problems/design-tinyurl<==        
main <<< longUrl ==>https://leetcode.com/problems/design-tinyurl<==  

It seems that we will just use the example provided by LeetCode. Our test code needs the long URL which is provided as input. We then display the log URL to make sure all is well so far.

We will talk more about what we display as we look into the source code.

We seem to display the short URL. Our program must be calling the encode() method.

We then seem to call the decode() method and display the results. The results look fine because we returned the associated long URL.

The last line seems to replicate the mechanism used by LeetCode to test our solution. We input the long URL to the encode() method, we call the decode() method and display the result. As expected we get back the long URL.

    /**
     * Test scafolding
     * 
     * @throws IOException
     */
    public static void main(String[] args) throws IOException {
        
        // **** open buffered reader ****
        BufferedReader br = new BufferedReader(new InputStreamReader(System.in));

        // **** read URL ****
        String longUrl = br.readLine().trim();

        // **** close buffered reader ****
        br.close();

        // ???? display URL ????
        System.out.println("main <<< longUrl ==>" + longUrl + "<==");

        // **** instantiate the codec object ****
        Codec codec = new Codec();

        // **** convert longUrl -> shortUrl ****
       String shortUrl = codec.encode(longUrl);

       // ???? ????
       System.out.println("main <<< shortUrl ==>" + shortUrl + "<==");

        // **** convert shortUrl -> longUrl ****
        longUrl = codec.decode(shortUrl);
        
       // ???? ????
       System.out.println("main <<< longUrl ==>" + longUrl + "<==");

        // **** call methods in the way LeetCode will do ****
        System.out.println( "main <<< longUrl ==>" 
                            + codec.decode(codec.encode(longUrl)) + "<==");
    }

The code seems to work as previously described. We read the long URL provided as input and display it. We then invoke the encode(), followed by the decode() and terminating by making the combine call as suggested by LeetCode.

Let’s look at the code and then I will make some comments.

    // **** Codec class member(s) ****
    HashMap<String, String> hm = null;
    char map[] = null;

We will be using a hash map to keep associated the long and the short URL. We will also use a char[] to map 62 characters and digits so we can encode the short URL in a custom base 62.

    /**
     * Constructor.
     */
    public Codec() {
        this.hm     = new HashMap<>();
        this.map    = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789".toCharArray();
    }

Our constructor initializes the hash map and the char[] that we will use to map our long to a short URL.

Before we move forward, I would assume you have thought that we could store the long URL in a string and when we call the decode() method we return the contents of the string. The encoded short URL could be a constant i.e., “1”. Obviously this is not what LeetCode was asking for. I have not checked if what I suggested and you probably thought works. If you do, let me know.

    /**
     * Encodes a URL to a shortened URL.
     * Uses a base 62 encoding.
     * O(n) 
     */
    public String encode(String longUrl) {
        
        // **** sanity check(s) ****
        if (longUrl.equals(""))
            return "";

        // **** initialization ****
        StringBuilder sb    = new StringBuilder();
        int sum             = 0;

        // **** sum characters in string - O(n) ****
        for (char c : longUrl.toCharArray())
            sum += c;

        // **** convert the sum to a base 62 number ****
        while (sum > 0) {
            sb.append(map[sum % 62]); 
            sum /= 62;  
        }

        // **** string builder to string ****
        // String shortUrl = sb.reverse().toString();
        String shortUrl = sb.toString();

        // **** put URLs into hash map ****
        this.hm.put(shortUrl, longUrl);

        // **** return short URL ****
        return shortUrl;
    }

This is the implementation of the encode() method. We will use a string builder and the sum of the characters in the long URL. We will talk more about this shortly.

We traverse the characters and generate their sum.

We now will map the sum (convert it to our base 62). Note that as we are looping, we add the mapped character to our string builder.

We then convert the reverse of the contents of our string builder to a string. The reason for this is because the characters were being processed from left to right. If you have questions about this conversion try it using base 10 to 16 (hexadecimal).

Note that in the current code we are not reversing the characters. The reason is that they do not matter. What we need is to get to a representation of the long URL that is somewhat unique. We do not want to generate a short URL that will match multiple long URLs.

Our code then without further checks, saves (or perhaps replaces) the association of the short URL with a different long URL. In production code, we would have to check if there is a collision and address it.

The simplest way would be to generate a random number of three or four hexadecimal characters, and prepend or append them to the short URL. Note that if the newly generated short URL has a collision we would have to generate a new value. For simplicity we could quickly try incrementing the random value by one and checking.

Finally we return the short URL which in this case is not a URL. In practice we should have used a substring of the long URL or a different path, and used it as a base for the short URL. In the test case, we could have generated “http://tinyurl.com/” + “jib” == http://tinyurl.com/jib”. If in addition we would have prepended a random hexadecimal we would end with something like “http://tinyurl.com/4e9jib”.

    /**
     * Decodes a shortened URL to its original URL.
     */
    public String decode(String shortUrl) {

        // **** sanity check(s) ****
        if (shortUrl.equals(""))
            return "";

        // **** return longUrl ****
        return this.hm.get(shortUrl);
    }

The decode() method is quite simple. We perform some sanity checks and then we return the map of the short URL.

In practice we should have removed at least the http://tinyurl.com/ and only use “4e9jib”. Note that we did not use the additional random number.

The code as we have seen in this post was accepted by LeetCode. If interested in the execution time please take a look at the code in the associated GitHub repository.

I am sure you have also thought that it would not make much sense to use any type of data structure that is kept in memory. That would imply that we have to load the mapping, which could be in the order of gigabytes, each time we start our program.

A database is a much better approach. That said; then we could implement a caching mechanism to avoid the round trip to the database on each request. For that a hash map would work fine.

Hope you enjoyed solving this problem as much as I did. The entire code for this project can be found in my GitHub repository.

If you have comments or questions regarding this, or any other post in this blog, or if you would like for me to help out with any phase in the SDLC (Software Development Life Cycle) of a project associated with a product or service, please do not hesitate and leave me a note below. If you prefer, send me a private e-mail message. I will reply as soon as possible.

Keep on reading and experimenting. It is one of the best ways to learn, become proficient, refresh your knowledge and enhance your developer toolset.

One last thing, many thanks to all 6,260 subscribers to this blog!!!

Keep safe during the COVID-19 pandemic and help restart the world economy. I believe we can all see the light at the end of the tunnel.

Regards;

John

john.canessa@gmail.com

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.