Split Words

splitstringI have had a stressful week. Glad the weekend arrived. Today I will provide my solution, including the thinking process, to a common and quite popular programming challenge. I have encountered several variations of it. The base problem follows:

“Split a text string into words. Words only contain ASCII letters which for simplicity may be treated as lower (a – z) or upper case (A – Z) characters (developer’s choice)”.

When I am presented with a challenge / problem, no matter how simple or complex, I always try to understand it before developing a production solution. Depending on the complexity, I usually check for approaches that I can find online using my favorite search engine. This is a relatively simple problem but may have hidden requirements. In most specifications, requirements might be hidden and one tends to find them when you run some unit tests and they fail. This is the normal development process contrary what some people may believe. I will attempt to provide you with a glimpse of my thinking process.

I always like to mention, that developers always work with their favorite IDE and have access to the WWW. Writing code on a white board with a marker, not having well specified requirements, and having a few minutes to come up with a solution is not how most (not to say all) software developers work.stop-think-act

Following is console output from the Eclipse IDE:

main >>> s: hello world

main <<< s ==>hello world<==

main <<< words ==>hello world <==

Following is the starting point code written in Java:

package john.canessa.split;

import java.util.Scanner;

public class Solution {

/*

* test code

*/

public static void main(String[] args) {

// **** prompt and read the string ****

Scanner sc    = new Scanner(System.in);

System.out.print(“main >>> s: “);

String s      = sc.nextLine();

sc.close();

System.out.println(“main <<< s ==>” + s + “<==”);

// **** split the string into words ****

String[] words = s.split(” “);

// **** display the words ****

System.out.print(“main <<< words ==>”);

for (String w : words) {

System.out.print(w + ” “);

}

System.out.println(“<==”);

}

}

For a first pass it looks OK. Note that when I print strings I like to bracket them with arrows. That gives me a good indication is all is well or I might need some tweaking. As you can see in the previous console output, the code printed an additional trailing space.

The following code snippet seems to address the issue:

// **** display the words ****

System.out.print(“main <<< words ==>”);

for (int i = 0; i < words.length; i++) {

System.out.print(words[i]);

if (i < (words.length – 1)) {

System.out.print(” “);

}

}

System.out.println(“<==”);

Console output to verify the change follows:

main >>> s: hello world

main <<< s ==>hello world<==

main <<< words ==>hello world<==

As I mentioned before, we are dealing with the base problem. Typically the initial requirements might add some other constraints. Given that the possibilities are almost endless (and these challenges are just brain teasers allegedly used to convey others your typical problem solving skills without an IDE, under unrealistic time constraints and without you able to reference the Internet or your book library). That said; they are always fun to solve and will help you come up with alternate approaches.

I have found versions of this challenge that do not allow the user to use the String.split() method. Some add a twist by requiring something out of the scope of the initial problem (e.g., when done print the words in reverse order).

So let’s update the requirements as follows:

“Split a text string into words. Words contain only ASCII lower-case characters. Do not use String.split(). Separators may include multiple characters which need to be preserved. The output must reverse the orders in the words (e.g., hello ==> olleh”.

If you still have time, let’s modify our code to handle the updated requirement specification. We could try to use a regular expression in String.split() but that is not allowed and it requires additional knowledge of Regular Expressions.

The final solution contains the following test code:

public static void main(String[] args) {

// **** prompt and read the string ****

Scanner sc = new Scanner(System.in);

System.out.print(“main >>> s: “);

String s = sc.nextLine();

sc.close();

s = s.toLowerCase();

System.out.println(“main <<< s ==>” + s + “<==”);

// **** split the words ****

ArrayList<String> words = split(s);

// **** split separators ****

ArrayList<String> seps = separators(s);

// **** reverse the letters in the words ****

words = reverse(words);

// **** display the result ****

System.out.println(“main <<< ==>” + combine(s, words, seps) + “<==”);

}

We first read the input line. The input is converted to lower-case. The words are extracted from the original string. Next the spacing is extracted from the string. The order of the letters in the words is reversed. Finally the result string is put together.

I ran a few tests using the test code. That said; I did not have enough time to generate unit tests for each method because I have already spent about three hours (not 20 or 30 minutes) on this blog and code. Following is a screen capture for a test run:

main >>> s:    hello,       world bye

main <<< s ==>       hello,       world bye   <==

main <<< ==>  olleh,       dlrow eyb   <==

The first output line shows the string I entered. The second one shows the result. Before each word and after the word “bye” I entered <SPACE><TAB>. I also placed a “,” after the word “hello”.

The rest of the code follows:

/*

* extract words

*/

static ArrayList<String> split(String s) {

ArrayList<String> words = new ArrayList<String>();

// **** check for null or empty sting ****

if ((s == null) || (s.length() == 0)) {

return words;

}

// **** traverse the string ****

Int begin     = -1;

int end              = -1;

for (int i = 0; i < s.length(); i++) {

Character c = s.charAt(i);

if (Character.isLowerCase(c)) {

if (begin < 0) {

begin = i;

}

} else {

if (begin >= 0) {

end           = i;

String w      = s.substring(begin, end);

words.add(w);

begin = -1;

end    = -1;

}

}

}

// **** check if a word was left behind ****

if (begin >= 0) {

end           = s.length();

String w      = s.substring(begin, end);

words.add(w);

}

// **** return the array of words ****

return words;

}

/*

* extract separators

*/

static ArrayList<String> separators(String s) {

ArrayList<String> seps = new ArrayList<String>();

// **** check for null or empty sting ****

if ((s == null) || (s.length() == 0)) {

return seps;

}

// **** traverse the string ****

Int begin     = -1;

int end              = -1;

for (int i = 0; i < s.length(); i++) {

Character c = s.charAt(i);

if (!Character.isLowerCase(c)) {

if (begin < 0) {

begin = i;

}

} else {

if (begin >= 0) {

end           = i;

String w      = s.substring(begin, end);

seps.add(w);

begin = -1;

end    = -1;

}

}

}

// **** check if a separator was left behind ****

if (begin >= 0) {

end           = s.length();

String w      = s.substring(begin, end);

seps.add(w);

}

// **** return the array of separators ****

return seps;

}

/*

* combine the words and separators taking into account leading and trailing separators

*/

static String combine(String s, ArrayList<String> words, ArrayList<String> seps) {

StringBuilder sb     = new StringBuilder();

int j                      = 0;

// **** check for null or empty sting ****

if ((s == null) || (s.length() == 0)) {

return sb.toString();

}

// **** ****

Character c = s.charAt(0);

if (!Character.isLowerCase(c)) {

sb.append(seps.get(j++));

}

// **** ****

for (int i = 0; i < words.size(); i++) {

sb.append(words.get(i));

if (j < seps.size()) {

sb.append(seps.get(j++));

}

}

// **** ****

return sb.toString();

}

/*

* reverse the order of the characters in each string

*/

static ArrayList<String> reverse(ArrayList<String> words) {

// **** ****

for (int i = 0; i < words.size(); i++) {

String w             = words.get(i);

StringBuffer sb = new StringBuffer(w);

sb = sb.reverse();

words.set(i, sb.toString());

}

// **** ****

return words;

}

In conclusion; I like to first experiment with ideas to properly capture the actual requirements. As code is being developed, input from users (in this case myself) helps make sure (sometimes changing) requirements are understood and satisfied. There are many developers out there that do not generate documentation and code by the seat of their pants. Hopefully the development process here presented illustrates how code should properly be developed (in stages, documented and tested).

If you have comments or questions on this blog entry, please send me a message.

Regards;

John

john.canessa@gmail.com

 

Leave a Reply

Your email address will not be published. Required fields are marked *