A complete guide to hashing and collision resolution strategy

Vivek Kumar | July 22, 2017 | Data Structure | No Comments

Hashing is an approach which provides searching operation in constant running time.

There are many applications that require predominantly dictionary operations(Insert, Delete, Search).
One of the approaches could be used is Balanced binary search tree, taking O(logn) time. But, what if better would be achieved?
The answer is Hashing, that possesses on an average O(1) time for searching an element. Though the worst condition leads to O(n), but it performs extremely well in practice. Interestingly, constant running time can also be achieved for the insertion and deletion operation.

Say want to store elements 0, 1, 2, 3, 4, 5, 6, 7, 8 and 9. The optimal way to do is an array of 10 blocks with index zero slots contains 0, index one contains element 1 and so on. See the following

Benefits of using this, want to access element(or key) 4 then, just write arr[3] (base index 0) i.e. it takes O(1) time for searching and even for insertion and deletion. Array with stored keys at the respective indexes is shown is referred as Direct-access table or Direct-address table. There are various problems associated with it.
Note:- Elements are the data needed to store and access when required and keys are used to decide where to store. Keys are generated from the element. In our example, keys are the elements itself.

Table of Contents

Cons of Direct-address table:

1. If key is very large

Direct addressing method will no longer be used if the key value is large. A key 25000 means an array with the index 24999 under direct addressing method, is not a good idea. Therefore, what can be done is to use some relative size array(decided as per requirement) and provide some mapping function. In other words, there has to be some function that takes the key and returns an index. This mapping function is referred as Hash Function. The generalized array we’re using in direct-addressing is now referred as Hash Table.
For example, key 25000 will become 25000mod10 = 0 where mod10 is the hash function, 10 is the size of the array and the result 0 is the slot 0 where 25000 will be stored.

2. Collision: relatively higher number of keys to store than available slots

Here comes another problem, the most challenging one. The number of keys required to store are higher than the available slots. Say hashing fun mod10 and the keys are 14, 24, 34, 94 etc. This will lead to the collision as all strike to same slot 4.
Thus, collision handling technique are used. Based on it hashing is classified as two types

Chaining
Open addressing

Also, the above discussion on hashing considering only numeric based keys, but, it could be a string as well. See below, where the hash function is generating numeric value for the string type key. A possible collision is also shown with two keys mapping to the same slot.

No collision — Hash Table with no collision

With Collisions — Hash Table with collision

Hashing has following components:

Hash Table, The data structure where elements are stored.
Hash-function, generates the index(also called hashes) from the element. It uses the element’s value to generate the hash and
Collision resolution technique:
1. Chaining: It says use linked list on collision i.e., each slot of the hash table points to a linked list of keys having same value(hashes). Whenever there is a collision then the new element is added at the end of the linked list. To improve the efficiency and take less processing time this linked list is sometimes replaced by a tree. This reduces the time complexity from O(n) to O(log n).
2. Open Addressing or Closed Hashing: All the keys have to be stored in the hash table itself(no linked list). In case of collision, probing is done for finding available free slots.

Knowledge is most useful when liberated and shared. Share this to motivate us to keep writing such online tutorials for free and do comment if anything is missing or wrong or you need any kind of help.

Keep Learning… Happy Learning.. 🙂

0 Comments

Inline Feedbacks

View all comments

Codingeek

A complete guide to hashing and collision resolution strategy

Cons of Direct-address table:

1. If key is very large

2. Collision: relatively higher number of keys to store than available slots

Hashing has following components:

Recommended -