Sort on Frequency

UPDATE

Noticed that some of the code blocks had additional characters e.g., &s. I removed and inserted the  original code blocks. The code looks good. Sorry about that.

Today is Saturday March 07, 2020 and the high for today in Apple Valley, MN is forecasted to be in the low to mid 50’s. Tomorrow we should be hitting low to mid 60’s. We can call it global warming or whatever you would like, but the global climate is changing. We can debate the reasons but Occam’s Razor problem-solving principle would attribute it to us humans.

One way or the other, my wife and I believe that we are doing our part. Starting this month, the neighborhood has hired a new company for garbage removal. They collect both bins every week. Possible smell during summer would prevent us from keeping our garbage for more than a week, but we generate such a small amount that the service could pick it up once a month.

Yesterday we had an appointment to take our SUV for service at the dealer. Since we purchased it on July 01, 2018 (20 months ago) we have driven it 10,000 miles. We do have a second car, a convertible, and that car has about 40,000 miles. We purchased it about 7 years ago. Not bad.

One more thing and we will get to the subject of this post. This past week I watched “Elon Musk as Inspiration for Science Fiction (Alex Garland) | AI Podcast Clips” by Lex Fridman. He had Alex Garland as a guest. During the video both movies “2001: Space Odyssey” and “Ex Machina” were referenced. Some years ago I watched 2001: Space Odyssey. I had not watched Ex Machina so yesterday my wife and I watched it in Netflix. From the AI and ML aspects, the movie was good. From the point of view of robotics and AI technologies, in my humble opinion, it was too futuristic. It reminded me about Tesla’s AV. It is work in progress, but for it to match the driving of a 16 year old, we might have to wait for quite a longer time. In addition, the inventor spent a lot of time drinking and lifting weights. It is hard to buy that a person with the traits displayed in the movie would be capable to achieve what was portrayed.

OK, let’s get to the core of this post. I started to watch the YouTube video “Technical Interview with a Software Engineer – Algorithms & Data Structures” by Nicholas White and stopped it at 01:33. I will watch the rest of the video after I finish the first version of this post. At that time will make additional comments if needed based on the solution presented in the video.

The requirements are:  Given a string, sort it in decreasing order based on the frequency of the characters.

There are three test cases that one can use to clarify the requirements and test against.

I will implement a solution in Java 8 using the Microsoft Visual Studio Code. Since I started using VSCode I have developed a taste for it. In my machine I have different versions of Visual Studio. The latest one is Visual Studio 2019 Enterprise Edition. That version is quite bloated. That said; I use it mostly for work stuff. Gradually I am finding that some (if not most) work related tasks can also be implemented using VSCode. Go figure.

I started in GitHub by creating a repository and editing the README.md file.

# **** clone GitHub repository ****
C:\Users\John\workspace8>git clone https://github.com/JohnCanessa/SortOnFrequency.git
Cloning into 'SortOnFrequency'...
remote: Enumerating objects: 3, done.
remote: Counting objects: 100% (3/3), done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 3 (delta 0), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (3/3), done.

# **** verify that a folder for the project was created ****
C:\Users\John\workspace8>dir
03/06/2020  01:08 PM    <DIR>          .
03/06/2020  01:08 PM    <DIR>          ..
03/06/2020  01:08 PM    <DIR>          SortOnFrequency

# **** get to the new folder ****
C:\Users\John\workspace8>cd SortOnFrequency

# **** check what we have cloned ****
C:\Users\John\workspace8\SortOnFrequency>dir /A
03/06/2020  01:08 PM    <DIR>          .
03/06/2020  01:08 PM    <DIR>          ..
03/06/2020  01:08 PM    <DIR>          .git
03/06/2020  01:08 PM               244 README.md

# **** display the README.md file that was created on GitHub ****
C:\Users\John\workspace8\SortOnFrequency>type README.md
# SortOnFrequency
Given a string, sort the characters in decreasing order based on the frequency of characters.

For aditional information on the requirements and actual Java 8 code please refer to the following
post in my blog:

T.B.D.

I then proceeded to write the initial code.

7
tree
cccaaa
Aabb
Canessa
cacaca
Mississippi
1223334444


eert
aaaccc
bbAa
aassCen
aaaccc
ssssiiiippM
4444333221

The input is a set of strings. So let’s have the number of test cases first followed by the actual strings. After reading a string we can process and display results.

  /**
   * Test scaffolding.
   */
  public static void main(String[] args) {
    
    // **** open scanner ****
    Scanner sc = new Scanner(System.in);

    // **** read the number of strings to process ****
    int n = Integer.parseInt(sc.nextLine());

    // **** process each string ****
    for (int i = 0; i &lt; n; i++) {

      // **** read the string ****
      String str = sc.nextLine();

      // **** sort the characters in the string and display result ****
      System.out.println(sortString(str));
    }

    // **** close scanner ****
    sc.close();
  }

The first three test cases were given in the video. The expected answers seem to match my solution. I added the few more strings. My last name is “Canessa” so the solution could have generated “aassCen”, “ssaaCen”, and other variations of the string “Cen”. The order is dictated by the number of characters not their lexicographical order in case of a tie (i.e., 2 a’s and 2 s’).

  /**
   * Given a string, sort it in increasing order based on the frequency of its characters.
   * This method runs in -> O(n log(n)) time.
   */
  static String sortString(String str) {

    // **** hash map used to determine character frequency ****
    HashMap<Character, Integer> hm = new HashMap<Character, Integer>();

    // **** populate the hash map -> O(n) ****
    for (int i = 0; i < str.length(); i++) { // **** for ease of use **** char ch = str.charAt(i); // **** key in hash map **** if (hm.containsKey(ch)) { int count = hm.get(ch); hm.put(ch, ++count); } // **** key not in hashmap **** else { hm.put(ch, 1); } } // **** sort the hash map in descending order by value -> O(n log(n)) ****
    List<Entry<Character, Integer>> al = sortHMByValue(hm);

    // **** string builder to generate string ****
    StringBuilder sb = new StringBuilder();

    // **** populate the string builder -> O(n) ****
    for (int i = 0; i < al.size(); i++) {

      // **** for ease of use ****
      Entry<Character, Integer> e = al.get(i);

      // **** generate char[] with the specified character ****
      char[] chs = new char[e.getValue()];
      Arrays.fill(chs, e.getKey());

      // **** append the char[] to the string builder ****
      sb.append(chs);
    }

    // **** return string ****
    return sb.toString();
  }

I decided to use a hash map to keep track of the characters and associate counts.

The method starts by traversing the string and populating the hash map. When done we have the frequencies per character.

We create an array list ordered by the key value. We can now traverse the array list and have access to the characters and associated frequencies.

We loop through the array list creating a character array of the specified frequency and populating it with the current character. The character array is used to append the associated string to the string builder.

We could have used a string but this is a typical case in which for performance a string builder would do better than an immutable string.

When all is wet and done we return the string associated with our string builder.

  /**
   * Sort hash map by value in descending order.
   * This method runs in -> O(n log(n)) time.
   */
  static List<Entry<Character, Integer>> sortHMByValue(HashMap<Character, Integer> hm) {

    // **** array list with hash map entries (needs to be sorted) ****
    List<Entry<Character, Integer>> sortedEntries = new ArrayList<Entry<Character, Integer>>(hm.entrySet());

    // **** sort the array list -> O(n log(n)) ****
    Collections.sort( sortedEntries,
                      new Comparator<Entry<Character, Integer>>() {
                        @Override
                        public int compare(Entry<Character, Integer> e1, Entry<Character, Integer> e2) {
                          return e2.getValue().compareTo(e1.getValue());
                        }
                      });

    // **** array list sorted by value in ascending order ****
    return sortedEntries;
  }

This method takes our hash map and creates an array list. Note that the array list is in the same order as the hash map. We then sort the array list using a custom comparator. When done we return the sorted array list.

It seems that all is well with the code at this time. I will now commit this version to our GitHub repository. When done will watch the rest of the video. If needed, I will create a branch and edit my solution.

# **** contents of the folder ****
C:\Users\John\workspace8\SortOnFrequency&gt;dir
03/06/2020  01:16 PM    &lt;DIR&gt;          .
03/06/2020  01:16 PM    &lt;DIR&gt;          ..
03/06/2020  01:08 PM               244 README.md
03/07/2020  11:01 AM             3,263 Solution.java

# **** ****
C:\Users\John\workspace8\SortOnFrequency&gt;git status
On branch master
Your branch is up to date with 'origin/master'.

Untracked files:
  (use "git add &lt;file&gt;..." to include in what will be committed)

        Solution.java

nothing added to commit but untracked files present (use "git add" to track)

# **** add the file to the next commit ****
C:\Users\John\workspace8\SortOnFrequency&gt;git add Solution.java

# **** all is good at this time ****
C:\Users\John\workspace8\SortOnFrequency&gt;git status
On branch master
Your branch is up to date with 'origin/master'.

Changes to be committed:
  (use "git reset HEAD &lt;file&gt;..." to unstage)

        new file:   Solution.java

# **** commit the first pass of the solution ****
C:\Users\John\workspace8\SortOnFrequency&gt;git commit -m "solution before watching the rest of the video"
[master e909c1a] solution before watching the rest of the video
 1 file changed, 114 insertions(+)
 create mode 100644 Solution.java

# ****  push the changes to GitHub ****
C:\Users\John\workspace8\SortOnFrequency&gt;git push origin master
Enumerating objects: 4, done.
Counting objects: 100% (4/4), done.
Delta compression using up to 8 threads
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 1.29 KiB | 1.29 MiB/s, done.
Total 3 (delta 0), reused 0 (delta 0)
To https://github.com/JohnCanessa/SortOnFrequency.git
   caab5d6..e909c1a  master -&gt; master

# **** check the log ****
C:\Users\John\workspace8\SortOnFrequency&gt;git log
commit e909c1a0b994a5898c8f6dedbc498af3b9b7f808 (HEAD -&gt; master, origin/master, origin/HEAD)
Author: JohnCanessa &lt;john.canessa@gmail.com&gt;
Date:   Sat Mar 7 11:02:46 2020 -0600

    solution before watching the rest of the video

commit caab5d6be4f0ee0c901e97bc05eb47330737c924
Author: John Canessa &lt;8019504+JohnCanessa@users.noreply.github.com&gt;
Date:   Fri Mar 6 13:07:48 2020 -0600

    Create README.md

OK, went to my GitHub repository and made sure the initial solution was pushed.

Now let’s watch the rest of the video…

OK, it is Sunday morning. Due to daylight saving time, last night (actually early morning today) we lost an hour (spring forward). There was a beautiful full moon when I got up.

This post would look like I am repeating things, but after finishing watching the YouTube video I decided to add a second approach using a priority queue. Initially I did not want to use a priority queue because in my recent post “Priority Queue” I covered it’s with an example using student grades. The video uses a priority queue so I decided to add a method which would implement a second solution. I left both solutions in the final code.

# **** ****
C:\Users\John\workspace8\SortOnFrequency&gt;dir /a
03/08/2020  09:44 AM    &lt;DIR&gt;          .
03/08/2020  09:44 AM    &lt;DIR&gt;          ..
03/08/2020  09:44 AM    &lt;DIR&gt;          .git
03/08/2020  09:44 AM               244 README.md
03/08/2020  09:44 AM             3,263 Solution.java

# **** create branch add_method from master ****
C:\Users\John\workspace8\SortOnFrequency&gt;git checkout -b add_method master
Switched to a new branch 'add_method'

# **** before editing Solutions.java ****
C:\Users\John\workspace8\SortOnFrequency&gt;git status
On branch add_method
nothing to commit, working tree clean

Before we edit our code, I decided to create a branch and name it add_method. Once done, I moved into updating the code.

7
tree
cccaaa
Aabb
Canessa
cacaca
Mississippi
1223334444

eert
eert
aaaccc
aaaccc
bbAa
bbAa
aassCen
aassenC
aaaccc
aaaccc
ssssiiiippM
ssssiiiippM
4444333221
4444333221

As you can see I duplicated the output and in some cases it is different. The task is to output the characters ordered by frequency. In characters with the same frequency (e.g., 1) the order in which they appear was not specified.

 /**
   * Test scaffolding.
   */
  public static void main(String[] args) {
    
    // **** open scanner ****
    Scanner sc = new Scanner(System.in);

    // **** read the number of strings to process ****
    int n = Integer.parseInt(sc.nextLine());

    // **** process each string ****
    for (int i = 0; i &lt; n; i++) {

      // **** read the string ****
      String str = sc.nextLine();

      // **** sort the characters in the string and display result ****
      System.out.println(sortString(str));
      System.out.println(frequencyString(str));
    }

    // **** close scanner ****
    sc.close();
  }

It looks like the only change to the test scaffolding was to add a second method which returns a string.

  /**
   * Given a string, sort it in increasing order based on the frequency of its characters.
   * This method runs in -&gt; O(n)) time.
   * The code is shorter than in the sortString method.
   */
  static String frequencyString(String str) {

    // **** check the string ****
    if ((str == null) || (str.length() &lt;= 0)) { return ""; } // **** used to build the string to return **** StringBuilder sb = new StringBuilder(); // **** populate the hashmap used to get the frequency of the characters -&gt; O(n) ****
    HashMap&lt;Character, Integer&gt; hm = new HashMap&lt;Character, Integer&gt;();
    for (char c : str.toCharArray()) {
      hm.put(c, hm.getOrDefault(c, 0) + 1);
    }
    
    // **** declare a priority queue with Comparator ****
    PriorityQueue&lt;Character&gt; pq = new PriorityQueue&lt;Character&gt;(new Comparator&lt;Character&gt;() {
      @Override
      public int compare(Character a, Character b) {
        return hm.get(b) - hm.get(a);
      }
    });

    // **** populate the priority queue -&gt; O(n) ****
    for (char c : hm.keySet()) {
      pq.add(c);      // O(log(n))
    }

    // **** populate the string builder -&gt; O(n) **** 
    while (!pq.isEmpty()) {

      // **** get the next character from the priority queue ****
      char c = pq.remove();

      // **** get the count from the hash map ****
      int count = hm.get(c);

      // **** append this character to string builder the specified count ****
      for (int i = 0; i &lt; count; i++) {
        sb.append(c);
      }
    }

    // **** return the string ****
    return sb.toString();
  }

We start by adding a test when the string is not specified or when the string is empty. I did not specified one in the original approach.

We will use a string builder as we did before.

We also use a hash map. The way it is populates is somewhat different (shorter) but the results are the same. When done each character is assigned the same frequency as in the original method.

The priority queue implements the mechanism that will generate the order for the characters in the final string. The priority queue will be ordered by the frequency of the characters in the original string.

Once that is done, we can populate the string builder. Note that the way we do it is quite similar. The only difference is that in this case we loop appending the characters to the string builder.

That is it. We now need to get our changes into GitHub.

# **** after editing Solutions.java ****
C:\Users\John\workspace8\SortOnFrequency&gt;git status
On branch add_method
Changes not staged for commit:
  (use "git add &lt;file&gt;..." to update what will be committed)
  (use "git checkout -- &lt;file&gt;..." to discard changes in working directory)

        modified:   Solution.java

no changes added to commit (use "git add" and/or "git commit -a")

# **** add the updated code in preparation for commit ****
C:\Users\John\workspace8\SortOnFrequency&gt;git add Solution.java

# **** ready to commit ****
C:\Users\John\workspace8\SortOnFrequency&gt;git status
On branch add_method
Changes to be committed:
  (use "git reset HEAD &lt;file&gt;..." to unstage)

        modified:   Solution.java

# **** commit changes ****
C:\Users\John\workspace8\SortOnFrequency&gt;git commit -m "added new method"
[add_method af2251f] added new method
 1 file changed, 63 insertions(+), 1 deletion(-)

# **** changes commited ****
C:\Users\John\workspace8\SortOnFrequency&gt;git status
On branch add_method
nothing to commit, working tree clean

# **** push changes to GitHub ****
C:\Users\John\workspace8\SortOnFrequency&gt;git push origin add_method
Enumerating objects: 5, done.
Counting objects: 100% (5/5), done.
Delta compression using up to 8 threads
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 978 bytes | 489.00 KiB/s, done.
Total 3 (delta 1), reused 0 (delta 0)
remote: Resolving deltas: 100% (1/1), completed with 1 local object.
remote:
remote: Create a pull request for 'add_method' on GitHub by visiting:
remote:      https://github.com/JohnCanessa/SortOnFrequency/pull/new/add_method
remote:
To https://github.com/JohnCanessa/SortOnFrequency.git
 * [new branch]      add_method -&gt; add_method

# **** in branch add_method ****
C:\Users\John\workspace8\SortOnFrequency&gt;git status
On branch add_method
nothing to commit, working tree clean

We added the changes and committed the code. The branch was pushed to GitHub.

# **** switch to the master branch ****
C:\Users\John\workspace8\SortOnFrequency&gt;git checkout master
Already on 'master'
Your branch is ahead of 'origin/master' by 1 commit.
  (use "git push" to publish your local commits)

# **** check out master to get the latest and greatest (not needed in this case) ****
C:\Users\John\workspace8\SortOnFrequency&gt;git pull origin master
From https://github.com/JohnCanessa/SortOnFrequency
 * branch            master     -&gt; FETCH_HEAD
Already up to date.

# **** merge add_method to the master branch ****
C:\Users\John\workspace8\SortOnFrequency&gt;git merge add_method
Already up to date.

# **** push the updated code to the GitHub repo ****
C:\Users\John\workspace8\SortOnFrequency&gt;git push origin master
Total 0 (delta 0), reused 0 (delta 0)
To https://github.com/JohnCanessa/SortOnFrequency.git
   e909c1a..af2251f  master -&gt; master
   
# **** check the log file ****
C:\Users\John\workspace8\SortOnFrequency&gt;git log
commit af2251f8b544af75acc31d7d0712a34485cbbc1f (HEAD -&gt; master, origin/master, origin/add_method, origin/HEAD, add_method)
Author: JohnCanessa &lt;john.canessa@gmail.com&gt;
Date:   Sun Mar 8 09:51:34 2020 -0500

    added new method

commit e909c1a0b994a5898c8f6dedbc498af3b9b7f808
Author: JohnCanessa &lt;john.canessa@gmail.com&gt;
Date:   Sat Mar 7 11:02:46 2020 -0600

    solution before watching the rest of the video

commit caab5d6be4f0ee0c901e97bc05eb47330737c924
Author: John Canessa &lt;8019504+JohnCanessa@users.noreply.github.com&gt;
Date:   Fri Mar 6 13:07:48 2020 -0600

    Create README.md

Now that the branch is in GitHub we went ahead and merged it to master. We then pushed the updated code to the GitHub repository. Finally we take a look at the log to make sure things are in an acceptable state. All appears to be well.

After posting this article I will get the URL and update the README.md file directly in GitHub.

The entire code for this project can be found in my GitHub repository.

If you have comments or questions regarding this, or any other post in this blog, or if you would like for me to serve of assistance with any phase in the SDLC (Software Development Life Cycle) of a project associated with a product or service, please do not hesitate and leave me a note below. If you prefer, send me a private message using the following address:  john.canessa@gmail.com. I will reply as soon as possible.

Keep on reading and experimenting. It is the best way to learn, refresh your knowledge and enhance your developer toolset!

One last thing, thanks to all 429 subscribers to my blog!!!

John

Twitter:  @john_canessa

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.