Hash Table

Why would one implement a hash table? When needed, software developers use existing classes which exhibit the properties of a hash table. For example, one might use a List or Dictionary class. That said; the user should fully understand the need and concept of a raw hash table.

A hash table is typically considered when the application requires random access speeds, the domain of values is large and an array would be sparse (most entices unused). The concept is to use some type of function that would allow quickly computing an index in a smaller array to store and looking up the element. Ideally the function would be uniform. That reduces collisions (multiple objects go into the same entry). A way to deal with that is to implement an overflow mechanism allowing multiple buckets

If interested in the subject take a look at algorithms books or search the web for articles and sample code.

On several occasions I have implemented hash tables. The last opportunity was for tracking client IP addresses. The code was implemented using C / C++. I am not at liberty to use the actual code, but I can provide a simpler implementation using Java.

The idea (or call it requirements) is to be able to track a few client IPs in a hash table. When the client connects the IP is stored in the hash table. When the client disconnects the IP is removed from the hash table. When a timeout occurs the information is logged into a system log. For our example we will use the hash of the client IP. In practice this would probably not be a good choice.

The buckets would be implemented with doubly linked lists and should not hold too many entries. An enhancement to the code might be to insert an entry in the log if a particular bucket exceeds a limit. The nice thing about lists is that they can easily grow.

Let’s start with some test code before implementing the HashTable class. The following Java code was created using the Eclipse IDE:

package john.hash.table;

import java.util.Random;

public class Solution {

// **** ****

static final int MAX_IP = 16;

// **** ****

static Random rand = new Random(System.currentTimeMillis());

/**

* generate random IP in specified range.

*/

       static String GenIP() {

String ip = “192.168.1.”;

ip += String.valueOf(1 + rand.nextInt(MAX_IP));

return ip;

}

/**

* test code

*/

public static void main(String[] args) {

// **** ****

final int TEST_LOOPS       = 8;

final int ENTRIES    = 4;

// **** ****

String ip = “”;

// **** instantiate a hash table ****

HashTable ht = new HashTable(ENTRIES);

// **** use the hash table ****

for (int t = 0; t < TEST_LOOPS; t++) {

// **** generate an IP ****

ip = GenIP();

System.out.println(“main <<< ip ==>” + ip + “<==”);

// **** look up the IP in hash table ****

boolean found = ht.LookUp(ip);

System.out.println(“main <<< LookUp(” + ip + “): ” + found);

// **** put the IP in the hash table ****

ht.Put(ip);

// **** get the IP from the hash table ****

String str = ht.Get(ip);

System.out.println(“main <<< Get() str ==>” + str + “<==”);

// **** ****

System.out.println();

}

// **** clear the hash table ****

ht.Clear();

// **** look up the last IP in hash table ****

boolean found = ht.LookUp(ip);

System.out.println(“main <<< LookUp(” + ip + “): ” + found);

}

}

Following is a screen capture of the console:

main <<< ip ==>192.168.1.2<==

main <<< LookUp(192.168.1.2): false

main <<< Get() str ==>192.168.1.2<==

main <<< ip ==>192.168.1.9<==

main <<< LookUp(192.168.1.9): false

main <<< Get() str ==>192.168.1.9<==

main <<< ip ==>192.168.1.6<==

main <<< LookUp(192.168.1.6): false

main <<< Get() str ==>192.168.1.6<==

main <<< ip ==>192.168.1.12<==

main <<< LookUp(192.168.1.12): false

main <<< Get() str ==>192.168.1.12<==

main <<< ip ==>192.168.1.1<==

main <<< LookUp(192.168.1.1): false

main <<< Get() str ==>192.168.1.1<==

main <<< ip ==>192.168.1.7<==

main <<< LookUp(192.168.1.7): false

main <<< Get() str ==>192.168.1.7<==

main <<< ip ==>192.168.1.1<==

main <<< LookUp(192.168.1.1): true

main <<< Get() str ==>192.168.1.1<==

main <<< ip ==>192.168.1.5<==

main <<< LookUp(192.168.1.5): false

main <<< Get() str ==>192.168.1.5<==

main <<< LookUp(192.168.1.5): false

The test code is quite simple and reduced. If this would be production code we should test and each method in different ways to verify they all work.

We start by first instantiating a hash table. We add some strings. Clear the hash table and then do a final call to check if the last IP was found in the table. That should always return false.

The code for the class follows:

package john.hash.table;

import java.util.ArrayList;

import java.util.LinkedList;

import java.util.List;

public class HashTable {

// **** members ****

private int   n;

private List<LinkedList<String>> table;

// **** constructor ****

public HashTable(int n) {

this.n = n;

this.table = new ArrayList<LinkedList<String>>();

for (int i = 0; i < n; i++)

table.add(new LinkedList<String>());

}

// **** methods ****

private int Hash(String ip) {

String nets[] = ip.split(“\\.”);

int sum = 0;

for (int i = 0; i < nets.length; i++)

sum += Integer.parseInt(nets[i]);

return sum % n;

}

public void Put(String ip) {

int slot = Hash(ip);

List<String> bucket = table.get(slot);

bucket.add(ip);

}

public String Get(String ip) {

int slot = Hash(ip);

List<String> bucket = table.get(slot);

if (bucket.contains(ip))

return ip;

else return null;

}

public boolean LookUp(String ip) {

int slot = Hash(ip);

List<String> bucket = table.get(slot);

return bucket.contains(ip);

}

public void Remove(String ip) {

int slot = Hash(ip);

List<String> bucket = table.get(slot);

if (bucket.contains(ip))

bucket.remove(ip);

}

public void Clear() {

for (int i = 0; i < table.size(); i++) {

List<String> bucket = table.get(i);

bucket.clear();

}

}

}

This particular implementation uses an array list. Each entry contains a list of strings. In practice we would probably like to put additional data (e.g., a timestamp to determine when to delete the entries when a timeout occurs).

The buckets are implemented by lists. In this example there is no limit to the number of entries in a bucket. In practice a limitation should be set so we do not end up searching in sequential order the list. Of course some lists provide better performance but that is not the point.

As usual always try to use existing classes in libraries before implementing your own class (do not reinvent the wheel). It takes time to test and become confident that new code operates as intended and without side effects.

Also you should always keep in mind that you need to address the issue of thread safe. The previous code is not thread safe. I am not sure if it is wise under most real life conditions to write code that is not thread safe.

If you have comments or questions regarding this or any other entry in this blog please send me a message. Will replay ASAP and will not use your name unless you explicitly allow me to do so.

John

john.canessa@gmail.com

Follow me on Twitter: @john_canessa

 

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.