Masking Personal Information

Earlier this morning, while having breakfast with my wife around 06:30 AM, she noted that the day was quite dark. I replied that this is normal for this time of the year, especially since Daylight Saving Time in the United States was extended the last time the rules were updated.

Around 07:00 AM, after showering and getting dressed, I went down to my home office. Around 10:00 AM I prepared coffee for my first break of the working day. It was sunny and windy with a temperature in the high 40’s. Hopefully the weather will remain like that for the rest of the day.

While browsing the LeetCode web site, problem 831 Masking Personal Information caught my attention. I develop software and systems that deal with medical records and images which need to be de-identified. This problem covers a very small subset of the data that typically needs to be anonymized.

I decided to use the Java programming language and the VSCode IDE to solve the problem on a Windows 10 machine.

We are given a personal information string S, which may represent either an email address or a phone number.

Notes:

S.length <= 40.
Emails have length at least 8.
Phone numbers have length at least 10.

The requirements call for anonymizing phone numbers and e-mail addresses. If you are interested in this problem, please read the full requirements on the LeetCode web site.

The web site contains four examples: two on phone numbers and two on e-mail addresses. I decided to add a few more to better test my solution on my computer before submitting it.

LeetCode@LeetCode.com
main <<< S ==>LeetCode@LeetCode.com<==
main <<< maskPPI ==>l*****e@leetcode.com<==


AB@qq.com
main <<< S ==>AB@qq.com<==
main <<< maskPPI ==>a*****b@qq.com<==


1(234)567-890
main <<< S ==>1(234)567-890<==
main <<< maskPPI ==>***-***-7890<==


86-(10)12345678
main <<< S ==>86-(10)12345678<==
main <<< maskPPI ==>+**-***-***-5678<==


a@b.c
main <<< S ==>a@b.c<==
main <<< maskPPI ==><==


ab@cd.ef
main <<< S ==>ab@cd.ef<==
main <<< maskPPI ==>a*****b@cd.ef<==


@abcd.ef
main <<< S ==>@abcd.ef<==
main <<< maskPPI ==><==


ab@de.gh
main <<< S ==>ab@de.gh<==
main <<< maskPPI ==>a*****b@de.gh<==


a@de.ghx
main <<< S ==>a@de.ghx<==
main <<< maskPPI ==><==


ab@d.ghy
main <<< S ==>ab@d.ghy<==
main <<< maskPPI ==><==


ab@dez.h
main <<< S ==>ab@dez.h<==
main <<< maskPPI ==><==

I wrote my own test scaffolding so I can develop the code on my computer using a Test Driven Development approach. I follow the essence of TDD, which for me means less formal testing (i.e., unit tests) and more software engineering.

class Solution {
    public String maskPII(String S) {
        
    }
}

The program reads the string to be masked and displays it. We then call the maskPII() function to generate the masked string and display the result.

    /**
     * Test scaffolding
     */
    public static void main(String[] args) {
        
        // **** open scanner ****
        Scanner sc = new Scanner(System.in);

        // **** read string ****
        String S = sc.nextLine().trim();

        // ???? display the input string ????
        System.out.println("main <<< S ==>" + S + "<==");

        // **** close scanner ****
        sc.close();

        // **** process and display the result ****
        System.out.println("main <<< maskPPI ==>" + maskPII(S) + "<==");
    }

The test scaffolding is quite simple. Open a scanner, read the input string, display it, close the scanner, and display the masked string returned by the function of interest. Note that the test scaffolding is not part of the solution.
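The scaffolding above checks one input per run. As a minimal sketch, the sample runs listed earlier could also be checked in a single pass with a small table-driven harness. The class name `MaskTestHarness`, the method `countFailures()` and the `CASES` table are hypothetical names introduced here for illustration; they are not part of the original project.

```java
import java.util.function.Function;

class MaskTestHarness {

    // {input, expected} pairs copied from the sample runs listed earlier
    static final String[][] CASES = {
        {"LeetCode@LeetCode.com", "l*****e@leetcode.com"},
        {"AB@qq.com", "a*****b@qq.com"},
        {"1(234)567-890", "***-***-7890"},
        {"86-(10)12345678", "+**-***-***-5678"},
        {"a@b.c", ""},
        {"ab@cd.ef", "a*****b@cd.ef"},
        {"@abcd.ef", ""},
        {"ab@de.gh", "a*****b@de.gh"},
        {"a@de.ghx", ""},
        {"ab@d.ghy", ""},
        {"ab@dez.h", ""}
    };

    // Runs every input through fn, reports mismatches and returns how many there were.
    static int countFailures(Function<String, String> fn, String[][] cases) {
        int failures = 0;
        for (String[] c : cases) {
            String actual = fn.apply(c[0]);
            if (!actual.equals(c[1])) {
                System.out.println("FAIL input ==>" + c[0] + "<== expected ==>" + c[1]
                        + "<== actual ==>" + actual + "<==");
                failures++;
            }
        }
        return failures;
    }
}
```

Assuming the masking method is reachable as a static method of some class (say `Solution`, also an assumption), a call such as `MaskTestHarness.countFailures(Solution::maskPII, MaskTestHarness.CASES)` would return 0 when every sample produces the expected mask.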

    /**
     * Please see requirements on the LeetCode web page.
     * Runtime: 8 ms, faster than 54.44% of Java online submissions.
     * Memory Usage: 37.9 MB, less than 63.91% of Java online submissions.
     */
    static String maskPII(String S) {
        if (S.contains("@"))
            return maskEMail(S);
        else
            return maskPhone(S);
    }

Since we have two string types, I decided to detect the type of string and split the work into two separate functions. In production this would make the software easier to debug now and easier to enhance and support in the future.

    /**
     * Mask PII in an email address.
     */
    static String maskEMail(String S) {

        // **** sanity checks ****
        if ((S.length() < 8) || !S.contains("@") || !S.contains("."))
            return "";

        // **** convert to lower case ****
        S = S.toLowerCase();

        // **** '@' split ****
        String[] atSplit = S.split("@");

        // **** the name needs at least two characters and a domain must follow the '@' ****
        if ((atSplit.length < 2) || (atSplit[0].length() < 2))
            return "";

        // **** '.' split ****
        String[] dotSplit = atSplit[1].split("\\.");

        // **** the domain needs two parts of at least two characters each ****
        if ((dotSplit.length < 2) || (dotSplit[0].length() < 2) || (dotSplit[1].length() < 2))
            return "";

        // **** build anonymized string ****
        S = atSplit[0].charAt(0) + "*****" + atSplit[0].charAt(atSplit[0].length() - 1);
        S += "@";
        S += dotSplit[0];
        S += ".";
        S += dotSplit[1];

        // **** return anonymized string ****
        return S;
    }

We start by performing some sanity checks. The requirements call for lower case masks, so we comply. We then split the email address into three fields to check and manipulate. Once that is done, we are ready to build the returned string.
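One detail worth keeping in mind when the checks rely on `String.split()`: Java drops trailing empty strings from the result, so a malformed address can yield fewer pieces than expected. The small sketch below only illustrates that behavior; the class name `SplitEdgeCases` and the helper `pieces()` are hypothetical and exist only for this example.

```java
class SplitEdgeCases {

    // Number of pieces String.split() returns for the given input and regex.
    static int pieces(String s, String regex) {
        return s.split(regex).length;
    }

    public static void main(String[] args) {
        System.out.println(pieces("ab@cd.ef", "@"));   // well-formed address
        System.out.println(pieces("ab.cdef@", "@"));   // nothing after '@'
        System.out.println(pieces("@abcd.ef", "@"));   // nothing before '@'
        System.out.println(pieces("cd.", "\\."));      // nothing after '.'
    }
}
```

A well-formed address splits into two pieces. With nothing after the '@' or after the '.', only one piece comes back, so indexing the second element would throw an ArrayIndexOutOfBoundsException. An empty name before the '@' is kept as an empty first element. LeetCode only submits valid input, so this matters mostly when the code is reused elsewhere.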

I did try different approaches to improve execution time while keeping maintainability in mind. After a few attempts this is what I ended up with. Perhaps I could have left the version that used the StringBuilder class, but I did not.
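For readers curious about what a StringBuilder-based variant might look like, here is a minimal sketch. It is not the version mentioned above, which is not shown in this post; the class name `EmailMaskSketch` is hypothetical. It locates the '@' with indexOf() instead of splitting, and it assumes the domain is otherwise well formed, so it skips the length checks on the domain parts and keeps the whole domain as is.

```java
class EmailMaskSketch {

    // Masks the name part of an e-mail address; returns "" when the
    // name has fewer than two characters or nothing follows the '@'.
    static String maskEmail(String s) {

        // lower case first, as the requirements ask for lower case output
        String lower = s.toLowerCase();

        // position of the '@' separating name and domain
        int at = lower.indexOf('@');
        if (at < 2 || at == lower.length() - 1)
            return "";

        // first letter, five asterisks, last letter, then '@' and the domain
        StringBuilder sb = new StringBuilder();
        sb.append(lower.charAt(0))
          .append("*****")
          .append(lower.charAt(at - 1))
          .append(lower, at, lower.length());

        return sb.toString();
    }
}
```

The trade-off in this sketch is fewer temporary strings and arrays in exchange for less validation of the domain. Whether that makes a measurable difference on inputs of at most 40 characters is something I would measure rather than assume.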

    /**
     * Mask PII in a phone number.
     * A phone number is represented as: a + b + c + d
     */
    static String maskPhone(String S) {

        // **** extract digits only ****
        S = S.replaceAll("\\D+", "");

        // **** check length ****
        if ((S.length() < 10) || (S.length() > 13))
            return "";

        // **** extract d component ****
        String d = S.substring(S.length() - 4, S.length());

        // **** build the masked string based on the number of digits ****
        switch (S.length()) {
            case 10:
                S = "***-***-" + d;
            break;

            case 11:
                S = "+*-" + "***-***-" + d;
            break;

            case 12:
                S = "+**-" + "***-***-" + d;
            break;

            case 13:
                S = "+***-" + "***-***-" + d;
            break;

            default:
                return "";
        }

        // **** return anonymized string ****
        return S;
    }

The first thing is to get rid of all non-digit characters. We know the target format, so we can remove them with multiple string manipulations or with a single regular expression. I decided to go with the regular expression.
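To make the two options concrete, the sketch below puts the regular expression next to a hand-written loop. The class name `DigitsSketch` and both method names are hypothetical, introduced only for this comparison. By default the `\D` class in Java matches anything other than the ASCII digits 0 to 9, which is what the loop checks as well.

```java
class DigitsSketch {

    // Same call used in maskPhone(): remove every run of non-digit characters.
    static String digitsByRegex(String s) {
        return s.replaceAll("\\D+", "");
    }

    // Equivalent manual version: keep only the characters '0' through '9'.
    static String digitsByLoop(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= '0' && c <= '9')
                sb.append(c);
        }
        return sb.toString();
    }
}
```

For the sample input 86-(10)12345678 both methods return 861012345678, which has 12 digits.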

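At this point we need to make sure the number of digits matches a phone number, which must have at least 10 digits. The code also rejects anything longer than 13 digits, which leaves room for a country code of up to three digits.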

We then proceed based on the number of digits we have. Any digits beyond the 10 local ones belong to the country code. Note that we return an empty string if the length is out of range. You may think this is not needed because we already checked the length of the string. I always start with a default case when I use a switch, and then I fill in the different cases. If the code flow changes while the code is being maintained, such a default line may prove useful.
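The four cases differ only in the number of asterisks used for the country code, so the same result could be computed from the digit count. The following is a sketch of that alternative, not a replacement for the switch above; the class name `PhoneMaskSketch` is hypothetical.

```java
class PhoneMaskSketch {

    // Masks a phone number with 10 local digits and 0 to 3 country code digits.
    static String maskPhone(String s) {

        // keep digits only
        String digits = s.replaceAll("\\D+", "");

        // same range check as the original: 10 to 13 digits
        int n = digits.length();
        if (n < 10 || n > 13)
            return "";

        StringBuilder sb = new StringBuilder();

        // digits beyond the 10 local ones form the country code
        int countryDigits = n - 10;
        if (countryDigits > 0) {
            sb.append('+');
            for (int i = 0; i < countryDigits; i++)
                sb.append('*');
            sb.append('-');
        }

        // masked local number followed by the last four digits
        sb.append("***-***-").append(digits, n - 4, n);

        return sb.toString();
    }
}
```

The switch is arguably easier to read at a glance, while the computed version avoids repeating the format string. With only four cases either choice seems reasonable.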

Hope you enjoyed solving this problem as much as I did. The entire code for this project can be found in my GitHub repository.

If you have comments or questions regarding this, or any other post in this blog, or if you would like me to be of assistance with any phase in the SDLC (Software Development Life Cycle) of a project associated with a product or service, please do not hesitate to leave me a note below. If you prefer, send me a private message using the following address:  john.canessa@gmail.com. I will reply as soon as possible.

Keep on reading and experimenting. It is the best way to learn, become proficient, refresh your knowledge and enhance your developer toolset!

One last thing, many thanks to all 2813 subscribers to this blog!!!

Keep safe during the COVID-19 pandemic and help restart the world economy.

John

Twitter:  @john_canessa
