IsUnique

My wife and her girlfriend are out and about shopping. When they get back in about four hours it will be my wife and I going shopping for jeans for me. Later two of her brothers and wife’s will stop by for lunch. We try to get together about once a month. We rotate homes. This month is our turn.

I am also waiting on an Azure Kinect DK camera from Microsoft. The order says PENDING. I will start experimenting on a daily basis as soon as it arrives.

As I mentioned in a previous post I purchased a copy of “Cracking the Coding Interview” and a whiteboard. I am going to go over most of the interview questions first using a whiteboard and then using an IDE. I want to understand if there is an advantage in speed, quality, syntax, correctness when you code on a whiteboard versus on an IDE. Will let you know my first thoughts after I go through a couple dozen problems.

I am going to tackle question 1.1 on page 90:  Implement an algorithm to determine if a string has all unique characters. What if you cannot use additional data structures?  

The question appears to have two possible solutions. Let’s go after the first one.

I have a string with six characters. It is not unique. Character ‘c’ is repeated. The brute force approach would be to create to loops. The outer loop would start with the first character and the inner loop will start with the second character. This would work but would not be too efficient O(n^2).

The suggested approach would be to sort the string and then traverse the sorted string looking for the first consecutive duplicate character. In our example it would be character ‘c’.

I have developed millions of lines of code (LOCs) in different programming languages and like to write functions / methods that perform tests, allocate resources if any, execute the core task and release resources if any. This takes some time but produces readable and maintainable code. It seems that it clutters the code on the whiteboard making it harder to follow.

The idea is to check for empty and single character strings and return true.

We take the string and convert it into a character array. We sort the character array, and at the time, I decided to convert the sorted character array to a sorted string.

Declared the lastChar variable to hold the value of the last character. Now we loop from the second character to the last character in the string. On each pass, we check if the last character matches the current character. If they do match, we return false (found a duplicate character). If there is no match we update the lastChar variable with the current character.

If we complete the loop and do not find a duplicate we return true.

Before we see the code in an IDE we will look at the second part of the question. It states we should not use a second data structure. We used two additional ones. The first was a character array and the second a string. Strings are not mutable in Java so the two instances we used represent two separate data structure.

We will traverse the string looking for the same character in the smaller substring. Technically this approach might be considered to use an additional data structure but I asked the interviewer and she said it was OK to proceed. Note that we could have changed from Java to C and all would be fine.

We start by performing the check on the string length.

We then loop using the currChar variable which holds the current character. We search for that character in the substring that starts with the following character. If we find the same character in the substring, we return false because we found a duplicate character.

If we successfully complete all loops then we return true because all characters in the string are unique.

	/*
	 * Test scaffolding
	 */
	public static void main(String[] args) {

		// **** open a scanner ****
		Scanner sc = new Scanner(System.in);
		
		// **** prompt for string ****
		System.out.print(">>> str: ");
		
		// **** read the string ****
		String str = sc.nextLine();

		// **** check if the string is unique ****
		System.out.println("main <<< str ==>" + str + "<== unique: " + isUnique(str));
		
		// **** check if the string is unique ****
		System.out.println("main <<< str ==>" + str + "<== unique: " + isUnique2(str));
		
		// **** close the scanner ****
		sc.close();
	}

I switched to the Eclipse IDE and wrote a scaffolding to test our algorithms. We prompt for a string and run it through both methods. The first uses multiple data structures, and the second does not. The answers should always match.

>>> str: caebfd
main <<< str ==>caebfd<== unique: true
main <<< str ==>caebfd<== unique: true >>> str: caebcd
main <<< str ==>caebcd<== unique: false
main <<< str ==>caebcd<== unique: false >>> str: caebbd
main <<< str ==>caebbd<== unique: false
main <<< str ==>caebbd<== unique: false

The first string is unique because no characters are repeated. The last two strings contain a duplicate character so the methods return false.

!!! UPDATE !!!

A simpler and more elegant approach is to instantiate an array of Boolean and initialize its elements to false (default in Java). We then traverse the string in O(n) and check if we have previously seen the current character. If so we return false. If we have not then we flag it as true.

	/*
	 * Determine if string contains unique characters.
	 * We may use additional data structures.
	 */
	static boolean isUnique(String str) {
		
		// **** array of flags (one per ASCII character) ****
		boolean[] found = new boolean[128];
		
		// **** check for short strings (cannot have duplicates)****
		if (str.length() <= 1) return true; // **** check for long strings (must have duplicates) **** if (str.length() >= 128)
			return false;

		
//		// **** instantiate a character array from the string ****
//		char[] charArray = str.toCharArray();
//		
//		// **** sort the character array ****
//		Arrays.sort(charArray);
//		
//		// **** build a sorted string from the sorted character array ****
//		str = new String(charArray);
//			
//		// **** traverse string looking for duplicate characters ****
//		char lastChar = str.charAt(0);
//		for (int i = 1; i < str.length(); i++ ) {
//			
//			// **** check if this character matches the last one ****
//			if (str.charAt(i) == lastChar)
//				return false;
//
//			// **** this is now the last character ****
//			lastChar = str.charAt(i);
//		}
//
//		// **** traverse the sorted character array looking for duplicates ****
//		for (int i = 1; i < charArray.length; i++) {
//			
//			// **** check if this character matches the previous one ****
//			if (charArray[i - 1] == charArray[i])
//				return false;
//		}
		
		
		// **** traverse the string looking for duplicates ****
		for (int i = 0; i < str.length(); i++) {
			
			// **** character at current position ****
			char ch = str.charAt(i);
			
			// **** check if we have already seen this character; otherwise flag we are seeing it ****
			if (found[ch] == true)
				return false;
			else
				found[ch] = true;
		}
		
		// **** the string has unique characters ****
		return true;
	}

The method isUnique performs a check. Next we instantiate a character array using the data from the specified string. We sort the character array. At this point there is no need to use a string. We can just complete the algorithm using the character array.

I left commented out the original code that matches the algorithm in the whiteboard. As you can see we are better off using the character array and skipping converting the character array to a string and using the additional lastChar variable.

We loop through the character array looking for consecutive indices with the same character. If we find consecutive duplicate characters we return false.

If the loop completes then the string has unique characters and we return true.

	/*
	 * This time without additional data structure(s).
	 */
	static boolean isUnique2(String str) {
		
		// **** check for short strings ****
		if (str.length() <= 1)
			return true;
		
		// **** traverse the string looking for duplicate characters ****
		for (int i = 0; i < (str.length() - 1); i++) {
			
			// **** ****
			char currentChar = str.charAt(i);
	
			// **** look for this character in the remaining string ****
			int index = str.substring(i + 1).indexOf(currentChar);
						
			// **** check if we found the caracter in the substring ****
			if (index != -1)
				return false;
		}
		
		// **** the string has unique characters ****
		return true;
	}

This is the code which does not use additional data structures.

We start by performing a check.

The loop starts with the first character and looks for it in the substring started with the following character. Note we use the currentChar variable to make it easier to follow. We could have replaced it in the check with the expression. Unless the code needs to be optimized, I prefer readability.

If the character is not found (index = -1) in the substring we continue looping. If the character is found we return false because the string does not hold unique characters.

If the loop completes we return true given that the characters in the string are unique.

I have not looked at the solution yet. Will check the solution after this entry is posted in the blog. I will make a reference of my findings on my next post.

The entire code for this project can be found in my GitHub repository.

If you have comments or questions regarding this or any other post in this blog, or if you would like me to help with any phase in the SDLC (Software Development Life Cycle) of a product or service, please do not hesitate and leave me a note below. Requests for help will remain private.

Keep on reading and experimenting. It is the best way to learn and refresh your knowledge!

John

Follow me on Twitter:  @john_canessa

Leave a Reply

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.