How Hash Tables Work

Deep-dive into hash tables — hashing functions, collision resolution (chaining & open addressing), load factor, rehashing, Big-O analysis, real-world implementations, and an interactive playground.

By Visual Explainer · 45 min read · Intermediate · Interactive Demo

The Big Picture — What Problem Do Hash Tables Solve?

Imagine you have a million user records and need to find one by username. With an array, you'd scan every record — O(n). With a hash table, you compute an index directly from the username and jump straight to it — O(1). Hash tables are among the most important data structures in computer science, powering everything from Python dicts to database indexes to Redis caches.

The Library Analogy

Finding a book in an unsorted pile vs a perfectly indexed library.

📚Unsorted Pile — Array Scan

📖 Check book #1... nope

📖 Check book #2... nope

📖 Check book #3... nope

📖 ... keep going until you find it

O(n) — check every item
🏛️Indexed Library — Hash Table

🔑 Look up "Algorithms" in the index card

📍 Index says: Shelf 7, Position 3

📖 Walk directly there — found it!

 

O(1) — jump directly to it

Array vs Hash Table — Live Lookup

Toggle mode and search for a name. Watch the step counter.

Steps: 0

[0] alice (Engineer)
[1] bob (Designer)
[2] carol (Manager)
[3] dave (Analyst)
[4] eve (DevOps)
[5] frank (QA Lead)
[6] grace (Director)
[7] heidi (Intern)
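The demo can be reproduced in a few lines of Python. This is a toy illustration using the sample records from the table above; the step counter mirrors the demo's counter:

```python
# Sample records from the demo table above
records = [("alice", "Engineer"), ("bob", "Designer"), ("carol", "Manager"),
           ("dave", "Analyst"), ("eve", "DevOps"), ("frank", "QA Lead"),
           ("grace", "Director"), ("heidi", "Intern")]

def array_lookup(name):
    """Linear scan: O(n). Returns (role, steps taken)."""
    steps = 0
    for key, role in records:
        steps += 1
        if key == name:
            return role, steps
    return None, steps

# Hash table: Python's dict computes the bucket index from the key
table = dict(records)

print(array_lookup("heidi"))   # scans all 8 records before finding her
print(table["heidi"])          # one hashed lookup, no scan
```

Searching for the last record costs 8 steps with the array but a single hashed lookup with the dict.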

Hash Tables Are Everywhere

🐍 Python: dict / set
☕ Java: HashMap / HashSet
🟨 JavaScript: Object / Map / Set
🐹 Go: map[K]V
🔴 Redis: Key-Value Store
🗄️ Database: Hash Index

🔑 Key Takeaway: Hash tables solve the fundamental lookup problem — turning O(n) linear search into O(1) constant-time access by computing exactly where each item lives, rather than scanning everything.

How Hashing Functions Work

A hash function takes any input — a string, number, or object — and produces a fixed-size integer that serves as a bucket index. The quality of this function determines everything: a good hash distributes keys uniformly across buckets, while a bad one clusters them together, destroying O(1) performance.

Properties of a Good Hash Function

🎯Deterministic

Same input ALWAYS produces the same output. hash("alice") = 5 every time.

📊Uniform Distribution

Outputs are spread evenly across all buckets. No bucket should be much fuller than others.

⚡Fast Computation

O(k) where k = key length. Must be faster than the lookup it enables.

🌊Avalanche Effect

A tiny change in input causes a massive change in output. "abc" → 7, "abd" → 2.

Hash Computation — Step by Step

Type any string and watch the hash pipeline: string → ASCII codes → sum → modulo → bucket index.

Step 1: String → ASCII Codes

'n' → 110
'a' → 97
'm' → 109
'e' → 101

Step 2: Sum All ASCII Codes

110 + 97 + 109 + 101 = 417

Step 3: Modulo (% 8 buckets)

417 % 8 = 1

[0]
[1]"name" ✓
[2]
[3]
[4]
[5]
[6]
[7]

djb2 (production hash):

djb2("name") = 2090536006 → bucket 6
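The pipeline above can be run directly. Here is the toy ASCII-sum hash next to djb2 (a sketch; the 32-bit mask mirrors C's unsigned-integer overflow):

```python
def ascii_sum_hash(key, buckets=8):
    """Toy hash from the demo: sum of ASCII codes, modulo bucket count."""
    return sum(ord(c) for c in key) % buckets

def djb2(key):
    """djb2: start at 5381, then hash = hash * 33 + byte, kept to 32 bits."""
    h = 5381
    for c in key:
        h = (h * 33 + ord(c)) & 0xFFFFFFFF  # emulate C unsigned overflow
    return h

print(ascii_sum_hash("name"))  # (110 + 97 + 109 + 101) % 8 = 417 % 8 = 1
print(djb2("name"))            # 2090536006
print(djb2("name") % 8)        # bucket 6
```

Note how the two functions send "name" to different buckets (1 vs 6): the bucket index depends entirely on the hash function chosen.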

Distribution Quality — Bucket Fill Visualization

Add keys and watch how evenly (or unevenly) they distribute across 8 buckets.

Keys: alice, bob, carol, dave, eve, frank, grace, heidi

[0] 2 keys (clustering!)
[1] empty
[2] empty
[3] 1 key
[4] empty
[5] 2 keys (clustering!)
[6] 1 key
[7] 2 keys (clustering!)

8 keys across 8 buckets — ideal: 1.0 per bucket

Famous Hash Functions

Name        | Speed     | Quality   | Use Case
djb2        | Very Fast | Good      | General purpose, string hashing. Default in many C hash tables.
MurmurHash3 | Fast      | Excellent | Used in Redis, Cassandra, Elasticsearch. Great distribution.
xxHash      | Fastest   | Excellent | Fastest non-crypto hash. Used in Linux kernel, LZ4 compression.
SHA-256     | Slow      | Perfect   | Cryptographic — overkill for hash tables. Used in Bitcoin, TLS, git.

🔑 Key Takeaway: A hash function converts any key into a fixed-size bucket index. A good one distributes keys uniformly across all buckets — minimizing collisions and keeping lookups at O(1).

Collision Resolution — Chaining

When two keys hash to the same bucket — a collision — the hash table needs a strategy. Separate chaining stores all colliding entries in a linked list at that bucket. It's simple, robust, and the default in Java's HashMap. But when chains grow long, lookups degrade from O(1) to O(n).

Separate Chaining — Animated Insertion

Click "Insert Next" to add keys one by one and watch collisions form chains.

0/8 inserted · 0 collisions
Keys: alice:25, bob:30, carol:28, dave:35, eve:22, frank:40, grace:33, heidi:27

[Interactive demo: buckets [0] through [7] start empty and fill into chains as keys are inserted]

⚠️ Worst Case: All Keys in One Bucket

If every key hashes to bucket 0, the chain becomes a linked list of n elements. Lookup degrades from O(1) to O(n). This is why hash function quality matters.

🌳 Java 8+ Improvement: Treeification

When a chain grows beyond 8 nodes, Java HashMap converts it from a LinkedList to a Red-Black Tree. Worst-case lookup improves from O(n) to O(log n). When it shrinks below 6, it converts back.

LinkedList (≤8) → Red-Black Tree (>8)

Lookup in a Chain

Insert some keys above, then search for one to see the chain traversal.
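The traversal can be sketched in Python (a toy mirror of the demo, reusing the ASCII-sum hash from earlier; not Java's actual HashMap):

```python
def ascii_sum_hash(key, buckets=8):
    return sum(ord(c) for c in key) % buckets

def chain_put(buckets, key, value):
    index = ascii_sum_hash(key, len(buckets))
    for i, (k, _) in enumerate(buckets[index]):
        if k == key:
            buckets[index][i] = (key, value)   # existing key: update in place
            return
    buckets[index].append((key, value))        # collision: grow the chain

def chain_get(buckets, key):
    index = ascii_sum_hash(key, len(buckets))
    for k, v in buckets[index]:                # walk the chain at that bucket
        if k == key:
            return v
    return None                                # not in the chain

buckets = [[] for _ in range(8)]
chain_put(buckets, "ab", 25)
chain_put(buckets, "ba", 30)     # same ASCII sum as "ab": same bucket
print(chain_get(buckets, "ba"))  # found after traversing the chain
```

"ab" and "ba" have the same ASCII sum, so both land in one bucket and the second lookup must step past the first entry in the chain.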

Chaining Implementation

// Simplified chaining HashMap
import java.util.LinkedList;

class HashMap<K, V> {
    private static class Entry<K, V> {
        final K key;
        V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private int capacity = 16;
    @SuppressWarnings("unchecked")
    private LinkedList<Entry<K, V>>[] buckets = new LinkedList[16];

    public void put(K key, V value) {
        // Mask the sign bit: Math.abs(Integer.MIN_VALUE) is still negative
        int index = (key.hashCode() & 0x7fffffff) % capacity;
        if (buckets[index] == null)
            buckets[index] = new LinkedList<>();
        for (Entry<K, V> e : buckets[index])
            if (e.key.equals(key)) { e.value = value; return; }  // update existing
        buckets[index].add(new Entry<>(key, value));  // append to the chain
    }
}

🔑 Key Takeaway: Chaining handles collisions by storing multiple entries in each bucket as a linked list. Average lookup is O(1+α) where α is the load factor, but degrades to O(n) if all keys cluster in one bucket.

Collision Resolution — Open Addressing

Open addressing takes a different approach: instead of chaining, it finds another empty slot within the same array. Linear probing checks the next slot, quadratic probing jumps by i², and double hashing uses a second hash function. It's more cache-friendly than chaining but introduces clustering problems and makes deletion tricky.

Open Addressing — Probing Strategies

Instead of chaining, open addressing finds another empty slot in the same array.

index = (hash + i) % size

Check the next slot sequentially. Simple but causes primary clustering — long runs of occupied slots form and slow lookups.
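Linear probing is one of three common schemes. Their probe sequences can be compared side by side (a sketch; h1 stands for the key's primary hash, and for double hashing h2 is a second hash value that must be nonzero):

```python
def probe_sequence(h1, size, strategy, h2=None, attempts=5):
    """First few slots examined for a key whose primary hash is h1."""
    seq = []
    for i in range(attempts):
        if strategy == "linear":
            seq.append((h1 + i) % size)        # step by 1
        elif strategy == "quadratic":
            seq.append((h1 + i * i) % size)    # step by i squared
        elif strategy == "double":
            seq.append((h1 + i * h2) % size)   # step by a second hash
    return seq

print(probe_sequence(3, 12, "linear"))         # [3, 4, 5, 6, 7]
print(probe_sequence(3, 12, "quadratic"))      # [3, 4, 7, 0, 7]
print(probe_sequence(3, 12, "double", h2=5))   # [3, 8, 1, 6, 11]
```

Linear probing walks adjacent slots (hence primary clustering), quadratic spreads out faster, and double hashing gives each key its own stride.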

0/7 inserted
Keys: alice, bob, carol, dave, eve, frank, grace

[Interactive demo: slots [0] through [11] start empty and fill as keys probe for a position]

Primary Clustering Problem (Linear Probing)

As the table fills, occupied slots cluster together into long runs. New insertions that hash ANYWHERE in the cluster must probe to the end — making the cluster even longer. It's a positive feedback loop that degrades performance.

🪦 The Deletion Problem: Tombstones

You can't simply clear a slot — it would break the probe chain for keys inserted after it. Instead, mark deleted slots as "tombstones" (DELETED). Lookups skip tombstones but don't stop. Insertions can reuse them. Too many tombstones degrade performance → periodic rehash to clean up.
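A minimal sketch of tombstone handling, assuming the toy ASCII-sum hash and a table that never fills completely (DELETED is this sketch's sentinel, not a standard name):

```python
DELETED = object()  # tombstone: slot was occupied, then deleted

class OpenTable:
    def __init__(self, size=8):
        self.size = size
        self.slots = [None] * size   # None = never used, DELETED = tombstone

    def _home(self, key):
        return sum(ord(c) for c in key) % self.size  # toy hash

    def put(self, key, value):
        i = self._home(key)
        first_free = None
        while self.slots[i] is not None:
            if self.slots[i] is DELETED:
                if first_free is None:
                    first_free = i            # remember a reusable tombstone
            elif self.slots[i][0] == key:
                self.slots[i] = (key, value)  # existing key: update in place
                return
            i = (i + 1) % self.size
        self.slots[first_free if first_free is not None else i] = (key, value)

    def get(self, key):
        i = self._home(key)
        while self.slots[i] is not None:      # a truly empty slot ends the probe
            if self.slots[i] is not DELETED and self.slots[i][0] == key:
                return self.slots[i][1]
            i = (i + 1) % self.size           # tombstones are skipped, not stops
        return None

    def delete(self, key):
        i = self._home(key)
        while self.slots[i] is not None:
            if self.slots[i] is not DELETED and self.slots[i][0] == key:
                self.slots[i] = DELETED       # mark, never clear: keeps probe chains intact
                return
            i = (i + 1) % self.size
```

Deleting "ab" and then looking up "ba" (which collided with it) still works: the lookup probes past the tombstone instead of stopping at it, and a later insert can reuse the marked slot.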

Chaining vs Open Addressing

Feature           | Chaining                                       | Open Addressing
Memory            | Extra pointers per node (linked list overhead) | No extra pointers — all data in the array
Cache Performance | Poor — linked list nodes scattered in memory   | Excellent — data is contiguous in memory
Load Factor       | Can exceed 1.0 (chains grow freely)            | Must stay below 1.0 (table must have empty slots)
Deletion          | Simple — remove node from linked list          | Complex — requires tombstone markers
Clustering        | No clustering problem                          | Primary/secondary clustering with linear/quadratic
Worst Case        | O(n) — long chain                              | O(n) — long probe sequence

Open Addressing — Linear Probing

class HashTable:
    def __init__(self, size=16):
        self.size = size
        self.keys = [None] * size
        self.values = [None] * size

    def put(self, key, value):
        # NOTE: assumes the table never fills completely; a real table
        # rehashes before that, or this loop would never terminate
        index = hash(key) % self.size
        while self.keys[index] is not None:
            if self.keys[index] == key:
                self.values[index] = value   # existing key: update
                return
            index = (index + 1) % self.size  # linear probe
        self.keys[index] = key
        self.values[index] = value

🔑 Key Takeaway: Open addressing stores everything in the array itself — no linked lists. It has better cache performance but suffers from clustering and requires tombstones for deletion. Double hashing gives the best distribution.

Load Factor & Rehashing

The load factor (entries ÷ capacity) is the critical health metric of a hash table. As it approaches 1.0, collisions skyrocket and lookups slow down dramatically (for open addressing, the expected number of probes grows roughly as 1/(1 − α)). When it exceeds a threshold (typically 0.75), the table rehashes — doubling its capacity and reinserting every entry. This sounds expensive, but the amortized cost is still O(1) per insertion.


Load factor = entries / capacity. When it exceeds the threshold, the table doubles in size and all entries are rehashed.

[Interactive demo: choose a rehash threshold of 0.50 (aggressive), 0.75 (Java default), or 1.00 (no rehash), then insert entries into an 8-slot table and watch the entry count, capacity, load factor, and collision count update. When the load factor crosses the threshold, the capacity doubles.]

Why Load Factor Matters — Collision Curve

Average probes needed vs load factor (open addressing). Above 0.75, performance degrades rapidly.

[Chart: average probes vs load factor α from 0.1 to 1.0; the curve rises sharply past the 0.75 threshold]
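The curve is just the uniform-hashing approximation for an unsuccessful search, expected probes ≈ 1/(1 − α). A few values show why 0.75 is the conventional threshold:

```python
# Expected probes for an unsuccessful search under open addressing,
# using the uniform-hashing approximation 1 / (1 - alpha)
for alpha in (0.10, 0.50, 0.75, 0.90, 0.99):
    print(f"alpha = {alpha:.2f} -> ~{1 / (1 - alpha):.1f} probes")
# 0.50 costs ~2 probes, 0.75 costs ~4, and 0.99 costs ~100:
# beyond the threshold the curve turns nearly vertical
```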

📊 Amortized O(1)

Rehashing costs O(n) when it happens, but it only happens after n/2 insertions (doubling). Spreading the O(n) cost across n/2 cheap O(1) insertions gives an amortized cost of O(1) per insertion — like a staircase where each big step is paid for by many small ones.
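The staircase can be counted directly. This sketch doubles only when the table is completely full (real tables resize earlier, at the 0.75 threshold), but the amortized argument is the same:

```python
def total_copies(n, initial=8):
    """Count element moves across n inserts into a doubling table."""
    capacity, size, copies = initial, 0, 0
    for _ in range(n):
        if size == capacity:   # table full: double capacity, copy every entry
            copies += size
            capacity *= 2
        size += 1              # the insert itself is O(1)
    return copies

n = 100_000
print(total_copies(n), "element copies for", n, "inserts")
print(total_copies(n) / n)   # average copies per insert stays below 2
```

Each doubling pays to copy s elements, but only after s fresh inserts have happened since the last doubling, so the per-insert average never exceeds a small constant.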

Rehashing Implementation

// Double the capacity and reinsert every entry under the new modulus.
func (h *HashTable) resize() {
    oldBuckets := h.buckets
    h.capacity *= 2
    h.buckets = make([][]Entry, h.capacity)
    h.count = 0 // Put re-counts each entry as it is reinserted
    for _, bucket := range oldBuckets {
        for _, entry := range bucket {
            h.Put(entry.Key, entry.Value)
        }
    }
}

🔑 Key Takeaway: Rehashing doubles the table size when the load factor exceeds a threshold (typically 0.75). Every entry is reinserted with the new modulus. This keeps collisions low and maintains O(1) average performance — at an amortized cost.

Why HashMap is O(1) Average but O(n) Worst Case

Hash tables are O(1) on average under the assumption of uniform hashing. But an adversary who knows your hash function can craft keys that all collide in one bucket, degrading each operation to O(n) and a batch of n insertions to O(n²). Real-world hash flooding attacks have exploited this in PHP, Python, Ruby, and Java — leading to the adoption of randomized hashing (SipHash) as a defense.


Operation      | Chaining | Open Addressing | Java 8+ (Treeified)
Insert (avg)   | O(1)     | O(1/(1−α))      | O(1)
Insert (worst) | O(n)     | O(n)            | O(log n) tree
Lookup (avg)   | O(1+α)   | O(1/(1−α))      | O(1)
Lookup (worst) | O(n)     | O(n)            | O(log n) tree
Delete (avg)   | O(1+α)   | O(1/(1−α))      | O(1)
Delete (worst) | O(n)     | O(n)            | O(log n) tree

☠️ Worst Case: Adversarial Hash Collision

// All keys hash to bucket 0 — degenerate case
bucket[0]: key1 → key2 → key3 → ... → key_n
bucket[1]: (empty)
bucket[2]: (empty)
...
bucket[m]: (empty)

// Lookup for key_n requires traversing entire chain
// Time: O(n) — no better than a linked list!

An attacker who knows the hash function can craft keys that ALL hash to the same bucket, turning your O(1) hash table into an O(n) linked list.
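Crafting collisions against a predictable hash is trivial. Against the toy ASCII-sum function from earlier, every permutation of the same letters collides (a toy demonstration; Python's real dict uses seeded SipHash, so this does not work against it):

```python
from itertools import permutations

def ascii_sum_hash(key, buckets=64):
    return sum(ord(c) for c in key) % buckets

# Permutations of the same letters have identical ASCII sums,
# so every one of these distinct keys lands in the same bucket.
keys = {"".join(p) for p in permutations("abcde")}
buckets_hit = {ascii_sum_hash(k) for k in keys}
print(len(keys), "distinct keys ->", len(buckets_hit), "bucket")
```

120 distinct keys, one bucket: a table fed these keys behaves exactly like the degenerate linked list above.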

Hash Flooding Attack — Real CVEs

Attacker sends crafted HTTP headers/POST data that all collide, turning O(1) into O(n²).

👤 Attacker → 🖥️ Server: POST /api with 100,000 params that all hash to bucket[0]
→ each insert: O(n) → total: O(n²) → server frozen
2003 · CVE-2003-0364 (Linux Kernel)

Hash table DoS in IP fragment reassembly. Crafted packets caused all fragments to collide in one bucket, enabling kernel-level DoS.

2011 · CVE-2011-4885 (PHP)

Crafted POST parameters all hashed to bucket 0. Processing 100,000 colliding keys took 30+ seconds, freezing web servers.

2012 · CVE-2012-5371 (Ruby)

Predictable hash function allowed crafted HTTP headers to trigger O(n²) hash table behavior, enabling DoS attacks.

2011 · Hashdos (Perl, Python 2, Java, ASP.NET)

Universal attack across languages. A single HTTP request with ~100KB of crafted form data could consume minutes of CPU time.

Defenses Against Hash Flooding

🎲 Randomized Hash Seed

Each process uses a random seed for hash computation. Same key produces different bucket indices across restarts. Attacker cannot predict collisions.

Python: PYTHONHASHSEED (random since 3.3)
Ruby: Hash seed randomized since 1.9
Perl: Hash seed randomized since 5.18
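The seed defense is observable from Python itself: the same key hashes differently in child interpreters started with different PYTHONHASHSEED values (a sketch; it shells out to the current interpreter):

```python
import os
import subprocess
import sys

def hash_in_child(seed):
    """Hash the same string in a fresh interpreter with a fixed seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.run([sys.executable, "-c", "print(hash('alice'))"],
                         env=env, capture_output=True, text=True)
    return int(out.stdout)

same = hash_in_child(1) == hash_in_child(1)   # deterministic for a fixed seed
diff = hash_in_child(1) != hash_in_child(2)   # different seed, different buckets
print(same, diff)
```

An attacker who cannot observe the seed cannot precompute which keys collide, which is exactly what makes hash flooding infeasible.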

🔑 Key Takeaway: Hash tables are O(1) average under uniform hashing, but adversarial inputs can force O(n) worst case. Real-world hash flooding attacks exploited this in PHP, Python, Ruby, and Java. Modern defenses use randomized seeds (SipHash) to make collision crafting computationally infeasible.

Real-World Implementations Compared

Every language implements hash tables differently. Java uses chaining with tree fallback. Python uses open addressing with a compact layout that preserves insertion order. Go uses 8-entry buckets with overflow chains. Redis maintains two tables simultaneously for non-blocking incremental rehashing. Understanding these differences helps you choose the right tool and avoid performance pitfalls.


Java HashMap

Initial Capacity: 16
Load Factor Threshold: 0.75
Collision Strategy: Chaining (LinkedList → Red-Black Tree at 8 nodes)
Rehash Trigger: size > capacity × 0.75 → double capacity
Ordering: No ordering guarantee (use LinkedHashMap for insertion order)
Null Keys: Allows one null key (hashes to bucket 0)
Thread Safety: Not thread-safe (use ConcurrentHashMap)

Implementation Notes

Java 8 introduced treeification: when a chain exceeds 8 nodes AND capacity ≥ 64, the chain converts to a red-black tree (O(log n) worst case). Untreeifies at 6 nodes. The hash is also perturbed: h ^ (h >>> 16) to spread bits.

Side-by-Side Comparison

Feature            | ☕ Java            | 🐍 Python       | 🐹 Go             | 🔴 Redis
Collision Strategy | Chain → Tree      | Open addressing | Bucket + overflow | Chaining
Load Factor        | 0.75              | 0.667           | 6.5               | 1.0
Rehash             | ×2 (stop-world)   | ×2/×4           | Incremental       | Incremental (2 tables)
Ordering           | None              | Insertion order | Random            | None
Hash Function      | Object.hashCode() | SipHash-1-3     | AES-based         | SipHash-2-4

🔑 Key Takeaway: Every language implements hash tables differently — Java uses chaining with tree fallback, Python uses open addressing with compact layout, Go uses 8-entry buckets, and Redis uses dual-table incremental rehashing. The best choice depends on your workload: memory-constrained? Use open addressing. Latency-sensitive? Use incremental rehashing.

Interactive Hash Table Playground

Build your own hash table from scratch. Insert key-value pairs, watch them get hashed into buckets, toggle between chaining and open addressing, monitor the load factor, and search for keys to see the probe sequence in action.


[Interactive playground: insert key-value pairs into buckets [0] through [7] and watch the entry count, capacity (8), load factor, and max chain length update; the load factor bar marks the 0.75 threshold. Then look up a key to trace its probe sequence.]

🔑 Key Takeaway: Hash tables provide O(1) average-case lookups by computing a bucket index from the key. The choice between chaining and open addressing affects memory layout, cache behavior, and how collisions are resolved. Keeping the load factor below 0.75 ensures collisions stay rare.