The Full Counting Sort

306 Subscribers

Today I decided to solve a HackerRank problem. Randomly I selected The Full Counting Sort. If interested read the requirements. I read the requirements and decided to give it a try.

Based on my experience with this problem you might want to follow my advice. Work on the algorithm and make sure it passes the two sample test cases. Once you are done, submit your solution. If you have a valid approach then chances are that your solution will fail test #5, it will time out. I generated up to three different versions of the countSort() function. I could not get past test #5 because it would time out. I spent time reading the discussions and they did not make much sense. I even bought test #5 for some hackos. By the way, the test includes 1,000,000 strings which I could not download no matter how many times I tried. With this problem do not purchase test #5. You will not be able to run it.

I will just go over the steps I took to get to my solution which was accepted.

I went to GitHub, created a repository and named it JohnCanessa/FullCountingSort. Once created, I wrote a one liner README.md file which I have to update with the link to this post after it has been published.

# **** initial contents of my workspace ****
C:\Users\John\workspace7>dir
02/19/2020  08:02 AM    <DIR>          .
02/19/2020  08:02 AM    <DIR>          ..
01/30/2020  01:32 PM    <DIR>          .metadata
01/31/2020  09:02 AM    <DIR>          .vscode
01/05/2020  07:15 AM    <DIR>          3DSurfaceArea
01/06/2020  11:52 AM    <DIR>          AlmostSorted
02/06/2020  07:16 AM    <DIR>          AndProduct
01/24/2020  12:49 PM    <DIR>          AStarSearch
12/05/2019  01:23 PM    <DIR>          BiggerIsGreater
02/14/2020  03:00 PM    <DIR>          Cipher
12/22/2019  08:35 AM    <DIR>          DeterminingDNAHealth
01/08/2020  07:07 AM    <DIR>          EmasSupercomputer
02/12/2020  02:46 PM    <DIR>          FirstDuplicate
02/03/2020  09:07 AM    <DIR>          FlippingBits
02/16/2020  09:09 AM    <DIR>          GitHubTest
01/31/2020  02:57 PM    <DIR>          guava-mini
01/31/2020  10:44 AM    <DIR>          HelloMaven
02/17/2020  10:43 AM         2,339,925 ij.jar
02/17/2020  10:52 AM    <DIR>          ImageJ
02/04/2020  02:04 PM    <DIR>          JumpingOnTheClouds
01/23/2020  08:05 AM    <DIR>          LarrysArray
12/03/2019  04:26 PM    <DIR>          MatrixLayerRotation
02/03/2020  10:42 AM    <DIR>          MiniMaxSum
01/25/2020  08:07 AM    <DIR>          PriorityQ
01/30/2020  01:42 PM    <DIR>          rtree2
02/03/2020  01:44 PM    <DIR>          SansaXOR
01/19/2020  09:16 AM    <DIR>          StacksReverseOrder
02/19/2020  08:08 AM    <DIR>          SumOfTwo
01/04/2020  07:51 AM    <DIR>          TheGridSearch
12/11/2019  10:09 AM    <DIR>          ToTrieOrNotToTrie

# **** after creating the repo in GitHub, clone it to my machine ****
C:\Users\John\workspace7>git clone https://github.com/JohnCanessa/FullCountingSort.git
Cloning into 'FullCountingSort'...
remote: Enumerating objects: 3, done.
remote: Counting objects: 100% (3/3), done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 3 (delta 0), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (3/3), done.

# **** workspace folder after cloning ****
C:\Users\John\workspace7>dir
02/20/2020  08:08 AM    <DIR>          .
02/20/2020  08:08 AM    <DIR>          ..
01/30/2020  01:32 PM    <DIR>          .metadata
01/31/2020  09:02 AM    <DIR>          .vscode
01/05/2020  07:15 AM    <DIR>          3DSurfaceArea
01/06/2020  11:52 AM    <DIR>          AlmostSorted
02/06/2020  07:16 AM    <DIR>          AndProduct
01/24/2020  12:49 PM    <DIR>          AStarSearch
12/05/2019  01:23 PM    <DIR>          BiggerIsGreater
02/14/2020  03:00 PM    <DIR>          Cipher
12/22/2019  08:35 AM    <DIR>          DeterminingDNAHealth
01/08/2020  07:07 AM    <DIR>          EmasSupercomputer
02/12/2020  02:46 PM    <DIR>          FirstDuplicate
02/03/2020  09:07 AM    <DIR>          FlippingBits
02/20/2020  08:08 AM    <DIR>          FullCountingSort     <===
02/16/2020  09:09 AM    <DIR>          GitHubTest
01/31/2020  02:57 PM    <DIR>          guava-mini
01/31/2020  10:44 AM    <DIR>          HelloMaven
02/17/2020  10:43 AM         2,339,925 ij.jar
02/17/2020  10:52 AM    <DIR>          ImageJ
02/04/2020  02:04 PM    <DIR>          JumpingOnTheClouds
01/23/2020  08:05 AM    <DIR>          LarrysArray
12/03/2019  04:26 PM    <DIR>          MatrixLayerRotation
02/03/2020  10:42 AM    <DIR>          MiniMaxSum
01/25/2020  08:07 AM    <DIR>          PriorityQ
01/30/2020  01:42 PM    <DIR>          rtree2
02/03/2020  01:44 PM    <DIR>          SansaXOR
01/19/2020  09:16 AM    <DIR>          StacksReverseOrder
02/19/2020  08:08 AM    <DIR>          SumOfTwo
01/04/2020  07:51 AM    <DIR>          TheGridSearch
12/11/2019  10:09 AM    <DIR>          ToTrieOrNotToTrie

# **** move to the new directory ****
C:\Users\John\workspace7>cd FullCountingSort

# **** dontents of the new folder ****
C:\Users\John\workspace7\FullCountingSort>dir /A
02/20/2020  08:08 AM    <DIR>          .
02/20/2020  08:08 AM    <DIR>          ..
02/20/2020  08:08 AM    <DIR>          .git
02/20/2020  08:08 AM               147 README.md            <=== # **** check on our remotes **** C:\Users\John\workspace7\FullCountingSort>git remote -v
origin  https://github.com/JohnCanessa/FullCountingSort.git (fetch)
origin  https://github.com/JohnCanessa/FullCountingSort.git (push)

I opened a command prompt in my Windows 10 machine and got to my workspace7 directory. I will put the code in a sub directory which will be created when I clone my new repository in GitHub.

I cloned my repository and looked at the contents of the workspace7 folder. We now have a new FullCountingSort folder. The new folder contains the README.md file that I created. As you can see we can fetch and push to the remote GitHub repo.

4
0 a
1 b
0 c
1 d

- c - d


20
0 ab
6 cd
0 ef
6 gh
4 ij
0 ab
6 cd
0 ef
6 gh
0 ij
4 that
3 be
0 to
1 be
5 question
1 or
2 not
4 is
2 to
4 the

- - - - - to be or not to be - that is the question - - - -


10
1 e
2 a
1 b
3 a
4 f
1 f
2 a
1 e
1 b
1 c

- - f e b c - a - -


20
88 pe
36 uw
57 ng
58 lt
29 gv
44 be
22 bz
67 yd
20 xm
86 gy
99 dd
35 eg
19 qc
65 jm
6 po
76 xv
72 qm
24 vn
98 mr
11 ab

po ab qc - - vn - eg - - - - jm - qm xv - - mr dd

The sample test cases contain a number of entries followed by the actual entries. Each entry consists of a number in the range 0 <= n < 100 followed by a string of length 1<= s <= 10. The caveat is that the first half of the strings we input need to be replaced by the string “-“.

0 -> ab ef ab ef ij to 
1 -> be or
2 -> not to
3 -> be
4 -> ij that is the
5 -> question
6 -> cd gh cd gh

- - - - - to be or not to be - that is the question - - - -
0 0 0 0 0 0  1  1  2   2  3  4 4    4  4   5        6 6 6 6

The second example with 20 strings is well explained in the requirements page at HackerRank. I believe the representation is a good match of the output of the expected solution.

I decided to implement the solution using the Java 8 programming language with the Visual Studio Code IDE.

The following recommendations came from the discussions which at the time I read them did not make much sense to me.

  1. StringBuilder, don’t underestimate how important it is when working with Strings.
  2. I/O is slow, minimize where possible.
  3. Really think about the aspects that contribute the most to overhead/performance and not try small things which take a lot of time and result in small / negligible gains.
  4. Scanner is slower than BufferedReader.

I could understand the comments about StringBuilder but all the input was done by the provided test scaffolding. Our task is to implement the countSort() function which does not have to perform input. It does have to output the results.

 /**
   * Original test scaffolding.
   */
  public static void main(String[] args) throws IOException {

    // **** ****
    BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(System.in));

    // **** ****
    int n = Integer.parseInt(bufferedReader.readLine().trim());

    // **** ****
    List<List<String>> arr = new ArrayList<>();

    // **** ****
    IntStream.range(0, n).forEach(i -> {
      try {
        arr.add(Stream.of(bufferedReader.readLine().replaceAll("\\s+$", "").split(" ")).collect(toList()));
      } catch (IOException ex) {
        throw new RuntimeException(ex);
      }
    });

    // **** ****
    countSort(arr);

    // **** ****
    bufferedReader.close();
  }

As usual I copy the test scaffolding provided by HackerRank and edit it adding comments in order to understand how the input is collected and presented to the function / method in which we should provide our code.

In this case the countSort() function, which is the place to implement our solution, receives an array of List<List<String>>. Each string contains a number in the range [ 0 : 99 ] followed a string with a length in the range [ 1 : 10 ]. When our function is called we receive the input data already packaged, at least that is the idea.

The problem is that no matter what I did, test #5 would time out. As I mentioned before, I read the comments in the discussion and it seemed that the only way to get past that test case was to ignore the test scaffolding and implement the solution there. In my humble opinion such approach is unacceptable. The implication is that you can no longer trust a test scaffolding in future problems.

  /**
   * Complete the countSort function below.
   */
  static void countSort(List<List<String>> arr) {

    // **** instantiate array of string builders ****
    StringBuilder[] sb = new StringBuilder[100];
    for (int i = 0; i < 100; i++) {
      sb[i] = new StringBuilder();
    }

    // **** to speed up String to Integer conversion ****
    HashMap<String, Integer> hm = new HashMap<String, Integer>();
    for (int i = 0; i <= 99; i++) {
      String s = Integer.toString(i);
      hm.put(s, i);
    }

    // **** ****
    int index = 0;

    // **** set first half of the strings to "- " ****
    for (int i = 0; i < arr.size() / 2; i++) {
      index = hm.get(arr.get(i).get(0));
      sb[index].append("- ");
    }

    // **** append strings with same index to respective string builder ****
    for (int i = arr.size() / 2; i < arr.size(); i++) {
      index = hm.get(arr.get(i).get(0));
      sb[index].append(arr.get(i).get(1) + " ");
    }

    // **** build and output the result string ****
    StringBuilder result = new StringBuilder();
    for (int i = 0; i < 100; i++) {
      result.append(sb[i].toString());
    }
    System.out.println(result.toString());
  }

The implementation of the countSort function passes all the initial test cases. Once you submit it only test #5 times out.

What I did is to edit the test scaffolding incorporating the code that took me hours to implement.

  /**
   * Test scaffolding. UNACCEPTABLE having to replace the test scaffolding.
   */
  public static void main(String[] args) throws IOException {

    // **** open buffered reader ****
    BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(System.in));

    // **** get the number of elements ****
    int n = Integer.parseInt(bufferedReader.readLine().trim());

    // **** instantiate array of string builders ****
    StringBuilder[] sb = new StringBuilder[100];
    for (int i = 0; i < 100; i++) {
      sb[i] = new StringBuilder();
    }

    // **** set first half of the strings to "- " ****
    for (int i = 0; i < n / 2; i++) {

      // **** read the next line ****
      String s = bufferedReader.readLine();

      // **** ****
      String[] split = s.split(" ");

      // **** ****
      int index = Integer.parseInt(split[0]);

      // **** ****
      sb[index].append("- ");
    }

    // **** append strings with same index to respective string builder ****
    for (int i = n / 2; i < n; i++) {

      // **** read the next line ****
      String s = bufferedReader.readLine();

      // **** ****
      String[] split = s.split(" ");

      // **** ****
      int index = Integer.parseInt(split[0]);
      sb[index].append(split[1] + " ");
    }

    // **** build and output the result string ****
    StringBuilder result = new StringBuilder();
    for (int i = 0; i < 100; i++) {
      result.append(sb[i].toString());
    }
    System.out.println(result.toString());

    // **** close buffered reader ****
    bufferedReader.close();
  }

First we open a buffered reader to read the count of strings and the actual strings. We then instantiate an array of string builders. The array will be used to build the output strings associated with each number.

The next loop takes care of replacing the first half of the input strings with the minus (-) character. Given that we need to display the strings separated by a space, we use the “- “ string instead.

The next loop which is similar to the first takes into account the actual string. Note that in both loops we immediately append the string to the proper string builder.

At this point we are ready to print the results. We could have used a loop to print the string builders as strings, but for I/O reasons we concatenated the individual string builders into a larger one and then printed it in a single operation.

Guess what, this implementation passed all tests on the first submission!!!

# **** check status ****
C:\Users\John\workspace7\FullCountingSort>git status
On branch master
Your branch is up to date with 'origin/master'.

Untracked files:
  (use "git add <file>..." to include in what will be committed)

        Solution.java   <=== nothing added to commit but untracked files present (use "git add" to track) # **** add to next commit **** C:\Users\John\workspace7\FullCountingSort>git add Solution.java

# **** check status ****
C:\Users\John\workspace7\FullCountingSort>git status
On branch master
Your branch is up to date with 'origin/master'.

Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

        new file:   Solution.java

# **** commit current software ****
C:\Users\John\workspace7\FullCountingSort>git commit -m "first version of a completed solution"
[master 8797ee4] first version of a completed solution
 1 file changed, 145 insertions(+)
 create mode 100644 Solution.java

# **** push origin to master ****
C:\Users\John\workspace7\FullCountingSort>git push origin master
Enumerating objects: 4, done.
Counting objects: 100% (4/4), done.
Delta compression using up to 8 threads
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 1.25 KiB | 1.25 MiB/s, done.
Total 3 (delta 0), reused 0 (delta 0)
To https://github.com/JohnCanessa/FullCountingSort.git
   8dd3d15..8797ee4  master -> master

# **** check the master (no branches at this time) ****
C:\Users\John\workspace7\FullCountingSort>git log -a
commit 8797ee4102d8f813c29b2ca891b91c2436ae7fd5 (HEAD -> master, origin/master, origin/HEAD)
Author: JohnCanessa <john.canessa@gmail.com>
Date:   Fri Feb 21 14:05:01 2020 -0600

    first version of a completed solution

commit 8dd3d1544360eb500b7c790c687831aaf60d6312
Author: John Canessa <8019504+JohnCanessa@users.noreply.github.com>
Date:   Thu Feb 20 08:05:06 2020 -0600

    Create README.md

We are done with the code for the solution. We need to update the GitHub repository.

As usual, we get status. We need to add Solution.java to the list of files to commit. We add Solution.java and check status. Looks like we are ready to commit.

We then commit with a message briefly describing what we committed. We then push origin to master. At this point our version and the one in our GitHub repo are in sync.

Finally we check the log file on master. We have two entries. The first when we created the README.md file and the second when we pushed the solution to the HackerRank problem.

I went to my GitHub repo and verified that the Solution.java file was in the repo and that it matched what we had in our computer.

OK, we are now done with this problem. I will make an entry in the discussions to mention that solution for problems should not be dependent with the test scaffolding. All solutions should be limited to functions and methods created outside of main().

The entire code for this project can be found in my GitHub repository.

If you have comments or questions regarding this, or any other post in this blog, or if you would like for me to serve of assistance with any phase in the SDLC (Software Development Life Cycle) of a project associated with a product or service, please do not hesitate and leave me a note below. If you prefer, send me a private message using the following address:  john.canessa@gmail.com. I will reply as soon as possible.

Keep on reading and experimenting. It is the best way to learn, refresh your knowledge and enhance your developer toolset!

John

Twitter:  @john_canessa

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.