Internal Working of HashMap in Java

A HashMap is a part of Java's Collections Framework, and it stores data in key-value pairs. Internally, it uses a hash table to store entries, making it efficient for fast lookups, insertions, and deletions. The underlying structure of a HashMap is based on a dynamic array of buckets (or "bins") and the hashing technique for quick access.

Here’s an overview of how HashMap works internally:

1. Structure of HashMap

A HashMap consists of the following key components:

Buckets: A bucket is essentially an index in an array where hash collisions are stored. A HashMap is backed by an array, and each entry (a key-value pair) is stored in one of the array's indices (buckets).
Entries (Key-Value Pairs): Each entry is a key-value pair, stored in the buckets.
Hashing: A hash function is used to determine the index (bucket) where a key-value pair will be stored.

2. Hashing Process

When a key-value pair is added to a HashMap, the key undergoes a hashing process to determine where the key-value pair should be placed:

Hash Code Calculation:
The hashCode() method is called on the key. The hashCode() is an integer that uniquely identifies the key's identity (although different keys can have the same hash code, leading to a hash collision).
Bucket Index Calculation:
The hash code is then processed to determine the index in the array where the key-value pair should be stored. Typically, a formula like this is used to calculate the bucket index:
```
int index = hashCode & (array.length - 1);
```
Here, array.length - 1 ensures that the index is within the bounds of the array.
Handling Hash Collisions:
If multiple keys have the same hash code (a collision), the HashMap stores them in a linked list or a red-black tree (from Java 8 onward) at the same bucket. This helps ensure that even with collisions, the HashMap remains efficient.

3. Add Method (`put()`)

When a new key-value pair is added to the HashMap using the put() method, the following steps occur:

Calculate Hash Code and Index:
The hash code of the key is calculated, and the bucket index is determined.
Check for Existing Key:
If there is already an entry in the bucket at that index:
- If the key already exists, the existing value is replaced with the new value.
- If the key doesn’t exist, a new entry (node) is added to the bucket, either at the beginning or at the end of the chain (for linked lists) or as a child node in the red-black tree (for high collision cases).
Rehashing:
If the number of elements in the HashMap exceeds a certain threshold (determined by the load factor, typically 0.75), the HashMap automatically resizes and rehashes all the entries into a larger array.

4. Get Method (`get()`)

When retrieving a value from a HashMap using the get() method:

Calculate Hash Code and Index:
The hash code of the key is calculated and used to find the corresponding bucket.
Search the Bucket:
If there are multiple entries in the bucket (due to hash collisions), the HashMap traverses the chain of entries (either in the linked list or the red-black tree) and compares the keys using the equals() method to find the matching key.
Return the Value:
Once the matching key is found, the associated value is returned.

5. Remove Method (`remove()`)

To remove an entry:

Calculate Hash Code and Index:
The hash code of the key is calculated, and the index of the corresponding bucket is determined.
Search the Bucket:
The bucket is searched, and if there are multiple entries (due to collisions), the equals() method is used to identify the matching key.
Remove the Entry:
Once the entry is found, it is removed from the linked list or red-black tree. If the bucket becomes empty, it is marked as empty.

6. Resizing and Rehashing

When the HashMap reaches a certain load factor (usually 0.75), it automatically resizes:

New Array Creation:
A new, larger array is created (typically double the size of the original array).
Rehashing Entries:
All existing entries are rehashed and redistributed into the new array. This ensures that the HashMap remains efficient even as the number of entries grows.

7. Bucket Size and Load Factor

Initial Capacity: The number of buckets in the internal array when the HashMap is first created. The default value is 16.
Load Factor: Determines when the HashMap should resize. The default load factor is 0.75, which means that the HashMap will resize when it is 75% full.

Example:


// Creating a HashMap with initial capacity and load factor
HashMap<String, Integer> map = new HashMap<>(16, 0.75f);

8. Example of `HashMap` in Action


import java.util.HashMap;

public class HashMapExample {
    public static void main(String[] args) {
        HashMap<String, String> map = new HashMap<>();

        // Adding key-value pairs to the HashMap
        map.put("Apple", "Fruit");
        map.put("Carrot", "Vegetable");
        map.put("Banana", "Fruit");

        // Retrieving values
        System.out.println("Apple is a " + map.get("Apple")); // Output: Fruit
        System.out.println("Carrot is a " + map.get("Carrot")); // Output: Vegetable

        // Removing a key-value pair
        map.remove("Banana");

        // Checking if a key exists
        System.out.println("Contains 'Banana'? " + map.containsKey("Banana")); // Output: false
    }
}

Key Points to Remember

Hashing: The HashMap uses the hashCode() and equals() methods to determine where to store and retrieve entries.
Buckets: Entries are stored in an array of buckets, and collisions are handled using a linked list or red-black tree.
Resizing: The HashMap automatically resizes when it exceeds the load factor threshold.
Performance:
- Average Time Complexity for put(), get(), and remove() is O(1), but it can degrade to O(n) in the worst case (for hash collisions).

AtoZ about Java

Search This Blog