Sub Strings

The end of year 2018 holiday season has come to a conclusion. This past year it seemed to be quite long. Probably because of the day in which Christmas and New Year landed. One way or the other, on Monday January 07, 2019 most people will be at work or at school.

As usual I spend some time every week taking a coding challenge. In my opinion they do little to determine if a person is capable of working as a software developer, group lead or system architect. Being able to come up with a good approach to solve a problem in an hour or so using tools you are not familiar with does not seem to be a logical approach to determine fitness for a company. The reason these challenges have become popular is due to the fact that they are encapsulated. You get a challenge, you complete it in the allotted time, and you should be able to solve any other problem. We humans are quite complex. Having excelled in school, sports, worked for a few companies and having owned my own, and studying every day for a couple hours, gives me a good insight on how and what needs to be done to achieve success in the software life cycle.

Last week I tried a challenge that if not exactly the same, I have solved at least a couple times before. In this case the platform would not suggest the libraries you needed. I used Java to solve the problem. At work I use different languages (e.g., C/C++, C#, Java, and Python) at different times depending on what I am doing. I also use different IDEs. When I work with Java I mostly use Eclipse and IntelliJ Idea.

Enough of background and here is my (perhaps I missed something) interpretation on the requirements.

“Given a string of lowercase characters select all unique substrings with a specified number of characters”.

I highlighted some words. The fact that lowercase characters are used is probably to eliminate the mix of upper and lower case characters. This would avoid the question if “ABC” equals “ABc” or “abc”. That makes sense.

A unique substring could imply that only sub strings of the specified input string may be used. If the input sting is “abcd” and n (the number of characters per unique substring is set to 3, then the only answer is “abc” because we have already used “abc” the “bcd” is not a sub string. On the other hand, given the same input string, we could produce “abc” and “bcd”. That was not made clear in the requirements. That said, when we develop the solution using a TDD (Test Driven Development) approach, we may address the two possible solutions.

The number of characters is just a number that specified consecutive numbers. For example, if n = 3 then the output sub strings will hold 3 characters each.

Let’s start with a blank / empty project and describe our approach. We also will need to generate the random samples because we do not have the hidden scaffolding for the challenge. Our starting code follows:

package substrings.canessa;

import java.util.ArrayList;

public class Solution {

	/*
	 * Return the unique sub strings.
	 */
	static ArrayList<String> uniqueSubStrs(String inStr, int n) {
		
		ArrayList<String> subStrs = new ArrayList<String>();
		
		// **** ****
		return subStrs;
	}

	/*
	 * Scaffolding to test the solution.
	 */
	public static void main(String[] args) {
	
		// **** generate a string of random characters or numbers ****
		
		// **** generate the number of characters per sub string ****
		
		// **** call a function that returns the sub strings ****
		
		// **** display the count and actual sub strings ****
		
	}

}

The main program generates the input string (need to specify length) and the number of characters per substring. Once that is done we can call the function uniqueSubStrs() to generate them and put them in a list (in this case and ArrayList) and then display them and their count. I believe I did not include the count in the requirements but it seems to be a good idea in case an issue is encountered. We could easily compare the actual returned count versus the expected.

I should note that as soon as I typed ArrayList the test was flagged. When I completed the line I hovered on it and Eclipse suggested the possible libraries I could include. If your IDE does not offer such feature think about replacing it for one that does. We live in the 21st century. We are not writing software using assembly language for a custom CPU.

Let’s implement the test program before addressing the actual function. This is how I always design and implement software. This approach is consistent with TDD. It may seem to take longer, but in actuality it greatly reduces testing and the resulting code is in most cases free of issues / bugs. What good is to develop code quickly riddle with issues some of which will be encountered by testing and others by actual customers. The longer it takes to find and address issues the most costly it is for the company.

package substrings.canessa;

import java.util.ArrayList;
import java.util.Random;

public class Solution {

	/*
	 * Return the unique sub strings in the specified string.
	 */
	static ArrayList<String> uniqueSubStrings(String inStr, int n) {
		
		ArrayList<String> subStrs = new ArrayList<String>();
		
		// **** ****
		return subStrs;
	}
	
	/*
	 * Generate a random string of the specified size holding
	 * lower case characters.
	 */
	static String lowerCaseStr(Random rand, int len) {
		StringBuilder strb = new StringBuilder();

		// **** ****
		return strb.toString();
	}
	
	/*
	 * Scaffolding to test the solution.
	 */
	public static void main(String[] args) {
	
		// **** initialize the pseudo random number generator ****
		final int RANDOM_SEED = 123;
		Random rand = new Random(RANDOM_SEED);
		
		// **** generate the length for the input string ****
		final int MAX_LEN = 32;
		int len = rand.nextInt(MAX_LEN);
		System.out.println("main <<< len: " + len);
		
		// **** generate a string of random characters ****
		String inStr = lowerCaseStr(rand, len);
		System.out.println("main <<< inStr ==>" + inStr + "<==");
		
		// **** generate the length of each unique string ****
		final int MAX_N = 3;
		int n = rand.nextInt(MAX_N) + 1;
		System.out.println("main <<< n: " + n);
				
		// **** call a function that returns the sub strings ****
		ArrayList<String> subStrs = uniqueSubStrings(inStr, n);
		
		// **** display the count and actual sub strings ****
		System.out.println("main <<< size: " + subStrs.size());
		for (String str : subStrs)
			System.out.println("main <<< str ==>" + str + "<==");
	}

}

As you can tell we added a function and went as far as making the code compile and run without returning much information. Of interest is the addition of a function that generates the input random string.

The output from the console follows:

main <<< len: 23
main <<< inStr ==><==
main <<< n: 3
main <<< size: 0

We also seeded the pseudo random number generator. I tried a few values and decided to go with the seed of 123. The reason to seed the generator is to be able to repeat the code in case we run into an issue during development. After that we will remove the seed to generate more random cases. Do not stop after all is working with a random seed. There might be cases with issues that we may / will miss.

Let’s continue by generating the input string.

	static String lowerCaseStr(Random rand, int len) {
		StringBuilder strb = new StringBuilder();
		
//		final String chars = "abcdefghijklmnopqrstuvwxyz";
		final String chars = "0123456789";
		System.out.println("lowerCaseStr <<< chars.length: " + chars.length());
		
		// **** loop randomly appending characters to the string ****
		for (int i = 0; i < len; i++)
			strb.append(chars.charAt(rand.nextInt(chars.length())));
		
		// **** ****
		return strb.toString();
	}

As you can see I decided to generate a string of digits given that there are only 10 digits as opposed to 26 characters to pick from. This way we should get more unique substrings. After testing with digits we will test with lower case characters.

The output for the console follows:

main <<< len: 46
lowerCaseStr <<< chars.length: 10
main <<< inStr ==>2634105618217546153026768787718764650385795752<==
main <<< n: 3
main <<< size: 0

To be honest I have not checked the number of unique strings in the input string. Will do when we have some code of the uniqueSubStrings() function.

The following code will be the solution for this challenge.

package substrings.canessa;

import java.util.HashSet;
import java.util.Random;

public class Solution {

	/*
	 * Return the unique sub strings in the specified string.
	 */
	static String[] uniqueSubStrings(String inStr, int n) {
		
		String[] subStrs;
		HashSet<String> set	= new HashSet<String>();
		
		// **** traverse the input string looking for unique sub strings ****
		for (int i = 0; i < inStr.length() - n + 1; i++) {
			
			// **** extract the next sub string ****
			String subStr = inStr.substring(i, i + n);
//			System.out.println("uniqueSubStrings <<< subStr ==>" + subStr + "<==");
			
			// **** insert the substring into the set (if unique) ****
			boolean dup = set.add(subStr);
			if (!dup)
				System.out.println("uniqueSubStrings <<< subStr ===>" + subStr + "<== is a duplicate");
		}
		
		// **** convert the set to an array ****
		subStrs = set.toArray(new String[0]);

		// **** return the array ****
		return subStrs;
	}
	
	/*
	 * Generate a random string of the specified size holding
	 * lower case characters.
	 */
	static String lowerCaseStr(Random rand, int len) {
		StringBuilder strb = new StringBuilder();
		
		final String chars = "abcdefghijklmnopqrstuvwxyz";
//		final String chars = "0123456789";
		System.out.println("lowerCaseStr <<< chars.length: " + chars.length());
		
		// **** loop randomly appending characters to the string ****
		for (int i = 0; i < len; i++)
			strb.append(chars.charAt(rand.nextInt(chars.length())));
		
		// **** ****
		return strb.toString();
	}
	
	/*
	 * Scaffolding to test the solution.
	 */
	public static void main(String[] args) {
	
		// **** initialize the pseudo random number generator ****
//		final int RANDOM_SEED = 12;
//		Random rand = new Random(RANDOM_SEED);
		
		// **** instantiate a pseudo random number generator ****
		Random rand = new Random();
		
		// **** generate the length for the input string ****
		final int MAX_LEN = 256;
		int len = rand.nextInt(MAX_LEN);
		System.out.println("main <<< len: " + len);

		// **** generate a string of random characters ****
		String inStr = lowerCaseStr(rand, len);
		System.out.println("main <<< inStr ==>" + inStr + "<==");

		// **** generate the length of each unique string ****
		final int MAX_N = 3;
		int n = rand.nextInt(MAX_N) + 1;
		System.out.println("main <<< n: " + n);
				
		// **** call a function that returns the sub strings ****
		String[] subStrs = uniqueSubStrings(inStr, n);
		
		// **** display the count and actual sub strings ****
		System.out.println("main <<< size: " + subStrs.length);
		for (String str : subStrs)
			System.out.println("main <<< str ==>" + str + "<==");
	}
}

The associated console capture follows:

main <<< len: 151
lowerCaseStr <<< chars.length: 26
main <<< inStr ==>jnmtaddhvlppgshbxxqmerdjttstwaocrhbrzgnbeefnufscwdrntlqzdxgussyeqrafesomduagnaeqpazrmdjsydkzkbxrbspcphsmtrctsgptbygfwrhestkgolabhbkmycajzmffwtkzxqssvmg<==
main <<< n: 2
uniqueSubStrings <<< subStr ===>hb<== is a duplicate
uniqueSubStrings <<< subStr ===>gn<== is a duplicate
uniqueSubStrings <<< subStr ===>eq<== is a duplicate
uniqueSubStrings <<< subStr ===>md<== is a duplicate
uniqueSubStrings <<< subStr ===>dj<== is a duplicate
uniqueSubStrings <<< subStr ===>sy<== is a duplicate
uniqueSubStrings <<< subStr ===>bx<== is a duplicate
uniqueSubStrings <<< subStr ===>mt<== is a duplicate
uniqueSubStrings <<< subStr ===>ts<== is a duplicate
uniqueSubStrings <<< subStr ===>rh<== is a duplicate
uniqueSubStrings <<< subStr ===>es<== is a duplicate
uniqueSubStrings <<< subStr ===>st<== is a duplicate
uniqueSubStrings <<< subStr ===>hb<== is a duplicate
uniqueSubStrings <<< subStr ===>fw<== is a duplicate
uniqueSubStrings <<< subStr ===>tk<== is a duplicate
uniqueSubStrings <<< subStr ===>kz<== is a duplicate
uniqueSubStrings <<< subStr ===>xq<== is a duplicate
uniqueSubStrings <<< subStr ===>ss<== is a duplicate
main <<< size: 132
main <<< str ==>pp<==
main <<< str ==>xx<==
main <<< str ==>pt<==
main <<< str ==>yc<==
main <<< str ==>yd<==
main <<< str ==>hs<==
main <<< str ==>ye<==
main <<< str ==>hv<==
main <<< str ==>yg<==
main <<< str ==>qm<==
main <<< str ==>qp<==
main <<< str ==>qr<==
main <<< str ==>ab<==
main <<< str ==>qs<==
main <<< str ==>ad<==
main <<< str ==>ae<==
main <<< str ==>af<==
main <<< str ==>ag<==
main <<< str ==>qz<==
main <<< str ==>aj<==
main <<< str ==>zd<==
main <<< str ==>zg<==
main <<< str ==>ao<==
main <<< str ==>ra<==
main <<< str ==>rb<==
main <<< str ==>zk<==
main <<< str ==>rc<==
main <<< str ==>rd<==
main <<< str ==>zm<==
main <<< str ==>rh<==
main <<< str ==>zr<==
main <<< str ==>az<==
main <<< str ==>rm<==
main <<< str ==>rn<==
main <<< str ==>zx<==
main <<< str ==>be<==
main <<< str ==>jn<==
main <<< str ==>bh<==
main <<< str ==>rz<==
main <<< str ==>js<==
main <<< str ==>bk<==
main <<< str ==>jt<==
main <<< str ==>br<==
main <<< str ==>sc<==
main <<< str ==>jz<==
main <<< str ==>bs<==
main <<< str ==>sg<==
main <<< str ==>sh<==
main <<< str ==>bx<==
main <<< str ==>kb<==
main <<< str ==>by<==
main <<< str ==>sm<==
main <<< str ==>so<==
main <<< str ==>kg<==
main <<< str ==>sp<==
main <<< str ==>ca<==
main <<< str ==>ss<==
main <<< str ==>st<==
main <<< str ==>km<==
main <<< str ==>sv<==
main <<< str ==>sy<==
main <<< str ==>ta<==
main <<< str ==>cp<==
main <<< str ==>tb<==
main <<< str ==>cr<==
main <<< str ==>kz<==
main <<< str ==>ct<==
main <<< str ==>cw<==
main <<< str ==>la<==
main <<< str ==>tk<==
main <<< str ==>tl<==
main <<< str ==>tr<==
main <<< str ==>ts<==
main <<< str ==>dd<==
main <<< str ==>tt<==
main <<< str ==>tw<==
main <<< str ==>dh<==
main <<< str ==>lp<==
main <<< str ==>lq<==
main <<< str ==>dj<==
main <<< str ==>dk<==
main <<< str ==>ua<==
main <<< str ==>dr<==
main <<< str ==>uf<==
main <<< str ==>du<==
main <<< str ==>dx<==
main <<< str ==>md<==
main <<< str ==>me<==
main <<< str ==>mf<==
main <<< str ==>mg<==
main <<< str ==>us<==
main <<< str ==>ee<==
main <<< str ==>ef<==
main <<< str ==>mt<==
main <<< str ==>eq<==
main <<< str ==>my<==
main <<< str ==>er<==
main <<< str ==>es<==
main <<< str ==>na<==
main <<< str ==>nb<==
main <<< str ==>vl<==
main <<< str ==>vm<==
main <<< str ==>nm<==
main <<< str ==>fe<==
main <<< str ==>ff<==
main <<< str ==>nt<==
main <<< str ==>nu<==
main <<< str ==>fn<==
main <<< str ==>wa<==
main <<< str ==>fs<==
main <<< str ==>wd<==
main <<< str ==>fw<==
main <<< str ==>oc<==
main <<< str ==>wr<==
main <<< str ==>ol<==
main <<< str ==>wt<==
main <<< str ==>om<==
main <<< str ==>gf<==
main <<< str ==>gn<==
main <<< str ==>go<==
main <<< str ==>gp<==
main <<< str ==>gs<==
main <<< str ==>gu<==
main <<< str ==>xg<==
main <<< str ==>pa<==
main <<< str ==>pc<==
main <<< str ==>pg<==
main <<< str ==>ph<==
main <<< str ==>xq<==
main <<< str ==>hb<==
main <<< str ==>xr<==
main <<< str ==>he<==

As you can tell I changed the signature of the uniqueSubStrings() function. From returning an ArrayString I simplified it to returning an array of Strings. I was going to illustrate a copy using an Iterator but decided to go the simplest path.

I also switched from using a list of integers to a list of lower case characters like the requirements called for. In addition I removed the pseudo random generator code.

Hope you enjoyed this post. If you have comments or questions regarding this or any other entry please leave me a note bellow.

Keep on learning and developing better code;

John

Please follow me on Twitter:  @john_canessa

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.