Last week I was talking with a software engineer. While chatting the description of a problem / challenge came up. The description, as far as I can recall, follows:
“Given a text file with all (most would be more precise) English language words in lower-case, find all anagrams for a specified word”.
An example of an anagram was discussed. At the time I was thinking on palindromes and was not able to concentrate on the definition of anagrams. I thought that an anagram consists of a set of letters in any order and letters may repeat. For example, “ana” and “anna” would fit my incorrect definition. I mentioned that I would spend some time on my own and come up with a solution.
Over the weekend I did not have time to work on the challenge. I did schedule time Monday (today) to address it. Last evening, while sleeping, I woke up with what might be a solution for it. This type of event happens to me quite often. I crawled out bed in order not to wake up my wife and headed to the bathroom with my cell phone. I had to confirm what came up while sleeping. Went to OneLook.com and entered “anagram”. The first definition follows:
“A word or phrase made by transposing the letters of another word or phrase”.
An anagram is defined as a permutation of letters. At the time, it would have been great if the word “permutation” would have been used instead of “anagram”. This meant that given a set of letters (in a word) find all other words in the dictionary that match all possible permutations.
The following solution outline (in pseudo code) immediately came up to mind:
readDictionary();
word = promptAndRead();
String[] permutations = generatePermutations(word);
for (String p : permutations) {
String anagram = lookUp(p);
If (anagram != null) {
Print(anagram);
}
}
I decided to get back in bed. I would try the approach first thing in the morning as soon as I start work (07:00 AM CDT).
I always like to do things in steps and test as I go. When all is said and done I may or may not (in this case) add unit testing. This approach is called test driven. My initial code written in Java using the Eclipse IDE follows:
public static void main(String[] args) throws IOException {
final String dictName = “c:\\temp\\dictionary.txt”;
TreeSet<String> dict = null;
// **** read English dictionary ****
dict = readDict(dictName);
System.out.println(“main <<< dict.size: ” + dict.size());
// **** prompt and get a word to look for associates anagrams (permutations) ****
// **** generate permutations ****
// **** loop once per permutation ****
// **** look up this permutation in the dictionary ****
// **** print the permutation (if needed) ****
}
Following is a screen capture of the console in the IDE:
main <<< dict.size: 354935
At this point in time, I have a good feeling that the words in the English dictionary have been loaded. The next step is to put in the code that reads the word that specifies the permutations that will be looked up in the dictionary.
The following code in the test program was added:
// **** prompt and get a word to look for associates anagrams (permutations) ****
String word = promptAndRead();
System.out.println(“main <<< word ==>” + word + “<==”);
After entering the word “cake” the console displays:
main <<< dict.size: 354935
promptAndRead >>> word: cake
main <<< word ==>cake<==
The next step is to produce the permutations for the specified word keeping in mind that some may repeat. For example, the word “ana” would only produce the following unique permutations:
main <<< dict.size: 354935
promptAndRead >>> word: ana
main <<< word ==>ana<==
main <<< p ==>aan<==
main <<< p ==>ana<==
main <<< p ==>naa<==
After cleaning the code I have produced the following complete solution:
package john.canessa.anagram;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.Scanner;
import java.util.Set;
import java.util.TreeSet;
public class Solution {
static Set<String> permutations = null;
/*
* generate all permutations for the specified word
*/
public static Set<String> genPerms(String s){
permutations = new TreeSet<String>();
return perm(“”, s);
}
/*
*
*/
private static Set<String>perm(String prefix, String s) {
int n = s.length();
if (n == 0) {
permutations.add(prefix);
}
else {
for (int i = 0; i < n; i++) {
perm(prefix + s.charAt(i), s.substring(0, i) + s.substring(i + 1, n));
}
}
return permutations;
}
/*
* prompt for and read the word to look for the anagrams
*/
static String promptAndRead() {
Scanner in = new Scanner(System.in);
System.out.print(“promptAndRead >>> word: “);
String word = in.next();
in.close();
return word;
}
/*
* read the specified English dictionary converting all
* words to lower case
*/
static TreeSet<String> readDict(String dictName) throws IOException {
// **** check if the dictionary is not there ****
File file = new File(dictName);
if (!file.exists() || file.isDirectory()) {
throw new FileNotFoundException(“readDict <<< dictName ==>” + dictName + “<==”);
}
// **** create the dictionary data structure ****
TreeSet<String> dict = new TreeSet<String>();
// **** open and read the dictionary into the specified data structure ****
FileReader in = new FileReader(file);
BufferedReader br = new BufferedReader(in);
while (br.ready()) {
String word = br.readLine().toLowerCase();
dict.add(word);
}
br.close();
in.close();
// **** return the dictionary ****
return dict;
}
/*
* test code
*/
public static void main(String[] args) throws IOException {
final String dictName = “c:\\temp\\dictionary.txt”;
TreeSet<String> dict = null;
// **** read English dictionary ****
dict = readDict(dictName);
System.out.println(“main <<< dict.size: ” + dict.size());
// **** prompt and get a word to look for associates anagrams (permutations) ****
String word = promptAndRead();
// **** generate a set of unique permutations ****
Set<String> perms = genPerms(word);
// **** loop once per permutation ****
for (String p : perms) {
if (dict.contains(p)) {
System.out.println(“main <<< found ==>” + p + “<==”);
}
}
}
}
Following is the output from a final run:
main <<< dict.size: 354935
promptAndRead >>> word: taser
main <<< found ==>arest<==
main <<< found ==>aster<==
main <<< found ==>astre<==
main <<< found ==>rates<==
main <<< found ==>reast<==
main <<< found ==>resat<==
main <<< found ==>serta<==
main <<< found ==>stare<==
main <<< found ==>strae<==
main <<< found ==>tares<==
main <<< found ==>tarse<==
main <<< found ==>tears<==
main <<< found ==>teras<==
main <<< found ==>treas<==
To be honest with you, I did not have an idea that some of the previous words were valid. With the help of an editor, I looked them up in the c:\temp\dictionary.txt file I used. I also looked each word on OneLook.com. Seems that all words are all valid :o)
The entire documentation, development and testing took me about three hours. Of course the initial solution with no details took me a few seconds to conceive after I understood the proper definition of an anagram :o(
I have not checked by solution with others. At this point I will call this challenge solved unless I get some feedback to the contrary.
John
john.canessa@gmail.com