How Hash Tables Work
Deep-dive into hash tables — hashing functions, collision resolution (chaining & open addressing), load factor, rehashing, Big-O analysis, real-world implementations, and an interactive playground.
The Big Picture — What Problem Do Hash Tables Solve?
Imagine you have a million user records and need to find one by username. With an array, you'd scan every record — O(n). With a hash table, you compute an index directly from the username and jump straight to it — O(1) on average. Hash tables are arguably the most widely used data structure in computer science, powering everything from Python dicts to database indexes to Redis caches.
The Library Analogy
Finding a book in an unsorted pile vs a perfectly indexed library.
📖 Check book #1... nope
📖 Check book #2... nope
📖 Check book #3... nope
📖 ... keep going until you find it
🔑 Look up "Algorithms" in the index card
📍 Index says: Shelf 7, Position 3
📖 Walk directly there — found it!
Array vs Hash Table — Live Lookup
Toggle mode and search for a name. Watch the step counter.
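The step counter above can be approximated in plain Python — a minimal sketch with made-up usernames, counting comparisons for a linear scan versus a single dict lookup:

```python
# A rough stand-in for the demo: count "steps" for a linear scan of a
# list versus a single dict lookup. (Usernames are made up.)
def linear_search_steps(records, target):
    for steps, name in enumerate(records, start=1):
        if name == target:        # one comparison per record scanned
            return steps
    return len(records)

records = [f"user{i}" for i in range(1_000_000)]
index = {name: i for i, name in enumerate(records)}  # the hash table

print(linear_search_steps(records, "user999999"))  # 1,000,000 comparisons
print(index["user999999"])                          # one hash, one jump
```

The dict lookup does the same amount of work whether the table holds ten entries or ten million.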
Hash Tables Are Everywhere
Python
dict / set
Java
HashMap / HashSet
JavaScript
Object / Map / Set
Go
map[K]V
Redis
Key-Value Store
Database
Hash Index
🔑 Key Takeaway: Hash tables solve the fundamental lookup problem — turning O(n) linear search into O(1) constant-time access by computing exactly where each item lives, rather than scanning everything.
How Hashing Functions Work
A hash function takes any input — a string, number, or object — and produces a fixed-size integer that serves as a bucket index. The quality of this function determines everything: a good hash distributes keys uniformly across buckets, while a bad one clusters them together, destroying O(1) performance.
Properties of a Good Hash Function
Deterministic: Same input ALWAYS produces the same output. hash("alice") = 5 every time.
Uniform distribution: Outputs are spread evenly across all buckets. No bucket should be much fuller than others.
Fast: O(k) where k = key length. Must be faster than the lookup it enables.
Avalanche effect: A tiny change in input causes a massive change in output. "abc" → 7, "abd" → 2.
Hash Computation — Step by Step
Type any string and watch the hash pipeline: string → ASCII codes → sum → modulo → bucket index.
Step 1: String → ASCII Codes
Step 2: Sum All ASCII Codes
110 + 97 + 109 + 101 = 417
Step 3: Modulo (% 8 buckets)
417 % 8 = 1
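The three steps can be sketched in a few lines of Python (`toy_hash` is a name invented for this illustration, not a production function):

```python
# Toy hash from the pipeline above: sum the ASCII codes, then take modulo.
def toy_hash(key: str, buckets: int = 8) -> int:
    total = sum(ord(c) for c in key)  # Steps 1-2: ASCII codes, summed
    return total % buckets            # Step 3: modulo → bucket index

assert sum(ord(c) for c in "name") == 417   # 110 + 97 + 109 + 101
assert toy_hash("name") == 1                # 417 % 8 = 1
```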
djb2 (production hash):
djb2("name") = 2090536006 → bucket 6
Distribution Quality — Bucket Fill Visualization
Add keys and watch how evenly (or unevenly) they distribute across 8 buckets.
8 keys across 8 buckets — ideal: 1.0 per bucket
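The same bucket-fill experiment can be run in a few lines, here using the toy sum-modulo hash from earlier and a handful of illustrative names:

```python
from collections import Counter

# Toy sum-modulo hash, as in the pipeline above.
def toy_hash(key: str, buckets: int = 8) -> int:
    return sum(ord(c) for c in key) % buckets

# Eight illustrative keys into eight buckets (ideal load: 1.0 per bucket).
keys = ["alice", "bob", "carol", "dave", "erin", "frank", "grace", "heidi"]
fill = Counter(toy_hash(k) for k in keys)

print(sorted(fill.items()))  # bucket → key count; some buckets double up
```

Even with as many buckets as keys, the toy hash leaves some buckets empty and puts two keys in others — real hashes like MurmurHash3 distribute far more evenly.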
Famous Hash Functions
| Name | Speed | Quality | Use Case |
|---|---|---|---|
| djb2 | Very Fast | Good | General purpose, string hashing. Default in many C hash tables. |
| MurmurHash3 | Fast | Excellent | Used in Redis, Cassandra, Elasticsearch. Great distribution. |
| xxHash | Fastest | Excellent | Fastest non-crypto hash. Used in Linux kernel, LZ4 compression. |
| SHA-256 | Slow | Perfect | Cryptographic — overkill for hash tables. Used in Bitcoin, TLS, git. |
🔑 Key Takeaway: A hash function converts any key into a fixed-size bucket index. A good one distributes keys uniformly across all buckets — minimizing collisions and keeping lookups at O(1).
Collision Resolution — Chaining
When two keys hash to the same bucket — a collision — the hash table needs a strategy. Separate chaining stores all colliding entries in a linked list at that bucket. It's simple, robust, and the default in Java's HashMap. But when chains grow long, lookups degrade from O(1) to O(n).
Separate Chaining — Animated Insertion
Click "Insert Next" to add keys one by one and watch collisions form chains.
[0]
empty
[1]
empty
[2]
empty
[3]
empty
[4]
empty
[5]
empty
[6]
empty
[7]
empty
⚠️ Worst Case: All Keys in One Bucket
If every key hashes to bucket 0, the chain becomes a linked list of n elements. Lookup degrades from O(1) to O(n). This is why hash function quality matters.
🌳 Java 8+ Improvement: Treeification
When a chain grows beyond 8 nodes, Java HashMap converts it from a LinkedList to a Red-Black Tree. Worst-case lookup improves from O(n) to O(log n). When it shrinks below 6, it converts back.
Lookup in a Chain
Insert some keys above, then search for one to see the chain traversal.
Chaining Implementation
// Simplified chaining HashMap (resizing and null handling omitted)
class HashMap<K, V> {
    private LinkedList<Entry<K, V>>[] buckets = new LinkedList[16];
    private int capacity = 16;

    public void put(K key, V value) {
        int index = Math.abs(key.hashCode()) % capacity;
        if (buckets[index] == null)
            buckets[index] = new LinkedList<>();
        for (Entry<K, V> e : buckets[index])
            if (e.key.equals(key)) { e.value = value; return; }
        buckets[index].add(new Entry<>(key, value));
    }
}

🔑 Key Takeaway: Chaining handles collisions by storing multiple entries in each bucket as a linked list. Average lookup is O(1+α) where α is the load factor, but degrades to O(n) if all keys cluster in one bucket.
Collision Resolution — Open Addressing
Open addressing takes a different approach: instead of chaining, it finds another empty slot within the same array. Linear probing checks the next slot, quadratic probing jumps by i², and double hashing uses a second hash function. It's more cache-friendly than chaining but introduces clustering problems and makes deletion tricky.
Open Addressing — Probing Strategies
Instead of chaining, open addressing finds another empty slot in the same array.
index = (hash + i) % size
Check the next slot sequentially. Simple but causes primary clustering — long runs of occupied slots form and slow lookups.
Primary Clustering Problem (Linear Probing)
As the table fills, occupied slots cluster together into long runs. New insertions that hash ANYWHERE in the cluster must probe to the end — making the cluster even longer. It's a positive feedback loop that degrades performance.
🪦 The Deletion Problem: Tombstones
You can't simply clear a slot — it would break the probe chain for keys inserted after it. Instead, mark deleted slots as "tombstones" (DELETED). Lookups skip tombstones but don't stop. Insertions can reuse them. Too many tombstones degrade performance → periodic rehash to clean up.
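A minimal sketch of tombstone handling in Python (the class and sentinel names are invented for this illustration; resizing and full-table edge cases are omitted):

```python
# Open-addressing table with tombstones.
DELETED = object()  # tombstone sentinel

class ProbingTable:
    def __init__(self, size=8):
        self.size = size
        self.slots = [None] * size  # None = never used, DELETED = tombstone

    def put(self, key, value):
        i = hash(key) % self.size
        reusable = None
        for _ in range(self.size):
            slot = self.slots[i]
            if slot is None:                      # end of probe chain
                self.slots[reusable if reusable is not None else i] = (key, value)
                return
            if slot is DELETED:
                if reusable is None:
                    reusable = i                  # remember, but keep probing
            elif slot[0] == key:
                self.slots[i] = (key, value)      # update in place
                return
            i = (i + 1) % self.size
        if reusable is not None:
            self.slots[reusable] = (key, value)

    def get(self, key):
        i = hash(key) % self.size
        for _ in range(self.size):
            slot = self.slots[i]
            if slot is None:
                return None                       # empty slot stops the search
            if slot is not DELETED and slot[0] == key:
                return slot[1]                    # tombstones are skipped over
            i = (i + 1) % self.size
        return None

    def delete(self, key):
        i = hash(key) % self.size
        for _ in range(self.size):
            slot = self.slots[i]
            if slot is None:
                return
            if slot is not DELETED and slot[0] == key:
                self.slots[i] = DELETED           # tombstone, never plain None
                return
            i = (i + 1) % self.size

t = ProbingTable(8)
t.put(0, "even")        # hash(0) % 8 == 0 → slot 0
t.put(8, "collides")    # hash(8) % 8 == 0 → probes to slot 1
t.delete(0)             # leaves a tombstone in slot 0
print(t.get(8))         # prints "collides" — lookup walks past the tombstone
```

Clearing slot 0 to None instead would have stopped the probe at slot 0 and made key 8 unreachable — exactly the broken-chain problem tombstones prevent.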
Chaining vs Open Addressing
| Feature | Chaining | Open Addressing |
|---|---|---|
| Memory | Extra pointers per node (linked list overhead) | No extra pointers — all data in the array |
| Cache Performance | Poor — linked list nodes scattered in memory | Excellent — data is contiguous in memory |
| Load Factor | Can exceed 1.0 (chains grow freely) | Must stay below 1.0 (table must have empty slots) |
| Deletion | Simple — remove node from linked list | Complex — requires tombstone markers |
| Clustering | No clustering problem | Primary/secondary clustering with linear/quadratic |
| Worst Case | O(n) — long chain | O(n) — long probe sequence |
Open Addressing — Linear Probing
class HashTable:
    def __init__(self, size=16):
        self.size = size
        self.keys = [None] * size
        self.values = [None] * size

    def put(self, key, value):
        index = hash(key) % self.size
        # Assumes the table is never full — a real implementation
        # resizes before the load factor reaches 1.0.
        while self.keys[index] is not None:
            if self.keys[index] == key:
                self.values[index] = value
                return
            index = (index + 1) % self.size  # linear probe
        self.keys[index] = key
        self.values[index] = value

🔑 Key Takeaway: Open addressing stores everything in the array itself — no linked lists. It has better cache performance but suffers from clustering and requires tombstones for deletion. Double hashing gives the best distribution.
Load Factor & Rehashing
The load factor (entries ÷ capacity) is the critical health metric of a hash table. As it approaches 1.0, collisions skyrocket and lookups slow down exponentially. When it exceeds a threshold (typically 0.75), the table rehashes — doubling its capacity and reinserting every entry. This sounds expensive, but the amortized cost is still O(1) per insertion.
Load Factor & Rehashing
Load factor = entries / capacity. When it exceeds the threshold, the table doubles in size and all entries are rehashed.
Rehash Threshold
0.75
Entries
0
Capacity
8
Load Factor
0.00
Collisions
0
[0]
—
[1]
—
[2]
—
[3]
—
[4]
—
[5]
—
[6]
—
[7]
—
Why Load Factor Matters — Collision Curve
Average probes needed vs load factor (open addressing). Above 0.75, performance degrades rapidly.
Load Factor (α) → Avg Probes
📊 Amortized O(1)
Rehashing costs O(n) when it happens, but it only happens after n/2 insertions (doubling). Spreading the O(n) cost across n/2 cheap O(1) insertions gives an amortized cost of O(1) per insertion — like a staircase where each big step is paid for by many small ones.
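The staircase argument can be checked numerically — this sketch counts one move per insertion plus one move per entry copied during each rehash, using the 0.75 threshold and ×2 growth described above:

```python
# Count total "moves" while inserting n keys into a doubling table:
# one move per insertion, plus one move per entry copied at each rehash.
def total_moves(n, capacity=8, threshold=0.75):
    entries = moves = 0
    while entries < n:
        entries += 1
        moves += 1                    # the insertion itself
        if entries / capacity > threshold:
            moves += entries          # rehash copies every entry
            capacity *= 2
    return moves

n = 100_000
print(total_moves(n) / n)  # amortized moves per insertion stays below 3
```

The ratio stays a small constant no matter how large n grows — each O(n) rehash is "paid for" by the cheap insertions that preceded it.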
Rehashing Implementation
func (h *HashTable) resize() {
    oldBuckets := h.buckets
    h.capacity *= 2
    h.buckets = make([][]Entry, h.capacity)
    h.count = 0
    for _, bucket := range oldBuckets {
        for _, entry := range bucket {
            h.Put(entry.Key, entry.Value)
        }
    }
}

🔑 Key Takeaway: Rehashing doubles the table size when the load factor exceeds a threshold (typically 0.75). Every entry is reinserted with the new modulus. This keeps collisions low and maintains O(1) average performance — at an amortized cost.
Why HashMap is O(1) Average but O(n) Worst Case
Hash tables are O(1) on average under the assumption of uniform hashing. But an adversary who knows your hash function can craft keys that all collide in one bucket, degrading each operation to O(n) — and a single request that inserts n such keys to O(n²) total work. Real-world hash flooding attacks have exploited this in PHP, Python, Ruby, and Java — leading to the adoption of randomized hashing (SipHash) as a defense.
Why O(1) Average but O(n) Worst Case?
Hash tables are O(1) on average with uniform hashing — but adversarial inputs can force O(n).
| Operation | Chaining | Open Addressing | Java 8+ (Treeified) |
|---|---|---|---|
| Insert (avg) | O(1) | O(1/(1-α)) | O(1) |
| Insert (worst) | O(n) | O(n) | O(log n) tree |
| Lookup (avg) | O(1+α) | O(1/(1-α)) | O(1) |
| Lookup (worst) | O(n) | O(n) | O(log n) tree |
| Delete (avg) | O(1+α) | O(1/(1-α)) | O(1) |
| Delete (worst) | O(n) | O(n) | O(log n) tree |
☠️ Worst Case: Adversarial Hash Collision
// All keys hash to bucket 0 — degenerate case
bucket[0]: key1 → key2 → key3 → ... → key_n
bucket[1]: (empty)
bucket[2]: (empty)
...
bucket[m]: (empty)
// Lookup for key_n requires traversing entire chain
// Time: O(n) — no better than a linked list!
An attacker who knows the hash function can craft keys that ALL hash to the same bucket, turning your O(1) hash table into an O(n) linked list.
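Against the toy sum-modulo hash from earlier, crafting such keys is trivial — 'h' has ASCII code 104, a multiple of 8, so every string of 'h' characters lands in bucket 0:

```python
# Craft 100 distinct keys that all collide under the toy sum-modulo hash.
def toy_hash(key: str, buckets: int = 8) -> int:
    return sum(ord(c) for c in key) % buckets

evil_keys = ["h" * i for i in range(1, 101)]  # sums are all multiples of 8

table = [[] for _ in range(8)]                # chaining table
for k in evil_keys:
    table[toy_hash(k)].append(k)

print([len(chain) for chain in table])        # one chain of 100, seven empty
```

Production hashes are harder to invert than this toy, but without a secret seed an attacker can still precompute colliding keys offline.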
Hash Flooding Attack — Real CVEs
Attacker sends crafted HTTP headers/POST data that all collide, turning O(1) per-key processing into O(n²) total work.
CVE-2003-0364 — Linux Kernel
Hash table DoS in IP fragment reassembly. Crafted packets caused all fragments to collide in one bucket, enabling kernel-level DoS.
CVE-2011-4885 — PHP
Crafted POST parameters all hashed to bucket 0. Processing 100,000 colliding keys took 30+ seconds, freezing web servers.
CVE-2012-5371 — Ruby
Predictable hash function allowed crafted HTTP headers to trigger O(n²) hash table behavior, enabling DoS attacks.
Hashdos (2011) — Perl, Python 2, Java, ASP.NET
Universal attack across languages. A single HTTP request with ~100KB of crafted form data could consume minutes of CPU time.
Defenses Against Hash Flooding
Randomized Hash Seed
Each process uses a random seed for hash computation. Same key produces different bucket indices across restarts. Attacker cannot predict collisions.
Python: PYTHONHASHSEED (random since 3.3)
Ruby: Hash seed randomized since 1.9
Perl: Hash seed randomized since 5.18
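You can observe this from Python itself by launching fresh interpreters with a fixed PYTHONHASHSEED (the helper name below is invented for the demo):

```python
import os
import subprocess
import sys

# Launch a fresh interpreter with a fixed hash seed and hash the same key.
def hash_with_seed(seed: str) -> int:
    result = subprocess.run(
        [sys.executable, "-c", "print(hash('alice'))"],
        env={**os.environ, "PYTHONHASHSEED": seed},
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout)

print(hash_with_seed("0") == hash_with_seed("0"))  # same seed → same hash
print(hash_with_seed("0") == hash_with_seed("1"))  # new seed → new buckets
```

Without setting PYTHONHASHSEED, each process picks a random seed at startup, so an attacker cannot precompute colliding keys.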
🔑 Key Takeaway: Hash tables are O(1) average under uniform hashing, but adversarial inputs can force O(n) worst case. Real-world hash flooding attacks exploited this in PHP, Python, Ruby, and Java. Modern defenses use randomized seeds (SipHash) to make collision crafting computationally infeasible.
Real-World Implementations Compared
Every language implements hash tables differently. Java uses chaining with tree fallback. Python uses open addressing with a compact layout that preserves insertion order. Go uses 8-entry buckets with overflow chains. Redis maintains two tables simultaneously for non-blocking incremental rehashing. Understanding these differences helps you choose the right tool and avoid performance pitfalls.
Real-World Hash Table Implementations
How Java, Python, Go, and Redis implement hash tables differently — and why.
Java HashMap
Implementation Notes
Java 8 introduced treeification: when a chain exceeds 8 nodes AND capacity ≥ 64, the chain converts to a red-black tree (O(log n) worst case). Untreeifies at 6 nodes. The hash is also perturbed: h ^ (h >>> 16) to spread bits.
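The bit-spreading step can be mimicked in Python to see why it helps — two keys that differ only in their high 16 bits collide under naive masking but separate after the XOR-shift:

```python
# Java's spreader: index = (h ^ (h >>> 16)) & (capacity - 1).
# Mimicked here with 32-bit masking to show why it matters.
def spread(h: int) -> int:
    h &= 0xFFFFFFFF              # Java's int is 32-bit
    return (h ^ (h >> 16)) & 0xFFFFFFFF

capacity = 16
a, b = 0x10000, 0x20000          # differ only in their high bits
assert a & (capacity - 1) == b & (capacity - 1) == 0  # naive masking: collide
assert spread(a) & (capacity - 1) != spread(b) & (capacity - 1)  # separated
```

Because HashMap's capacity is always a power of two, the index mask only sees the low bits; XOR-ing the high half down makes those bits participate.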
Side-by-Side Comparison
| Feature | ☕ Java | 🐍 Python | 🐹 Go | 🔴 Redis |
|---|---|---|---|---|
| Collision Strategy | Chain → Tree | Open addressing | Bucket + overflow | Chaining |
| Load Factor | 0.75 | 0.667 | 6.5 (avg entries per 8-slot bucket) | 1.0 |
| Rehash | ×2 (stop-world) | ×2/×4 | Incremental | Incremental (2 tables) |
| Ordering | None | Insertion order | Random | None |
| Hash Function | Object.hashCode() | SipHash-1-3 | AES-based | SipHash-2-4 |
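Redis's dual-table scheme can be sketched in Python — `IncrementalDict` is an invented name and real Redis migrates buckets in batches with many more details, but the core idea (writes go to the new table, reads check both, each operation migrates one old bucket) looks like this:

```python
# Sketch of Redis-style incremental rehashing: two tables coexist until
# every old bucket has been migrated, one bucket per operation.
class IncrementalDict:
    def __init__(self, capacity=4):
        self.table = [[] for _ in range(capacity)]  # current (new) table
        self.old = None                             # non-None mid-rehash
        self.migrated = 0
        self.count = 0

    def _bucket(self, table, key):
        return table[hash(key) % len(table)]

    def _step(self):
        """Migrate one old bucket per operation instead of all at once."""
        if self.old is None:
            return
        for kv in self.old[self.migrated]:
            self._bucket(self.table, kv[0]).append(kv)
        self.old[self.migrated] = []
        self.migrated += 1
        if self.migrated == len(self.old):
            self.old = None                         # rehash complete

    def put(self, key, value):                      # updates/deletes omitted
        self._step()
        if self.old is None and self.count >= len(self.table):  # load >= 1.0
            self.old = self.table
            self.table = [[] for _ in range(2 * len(self.old))]
            self.migrated = 0
        self._bucket(self.table, key).append((key, value))
        self.count += 1

    def get(self, key):
        self._step()
        tables = (self.table,) if self.old is None else (self.table, self.old)
        for t in tables:
            for k, v in self._bucket(t, key):
                if k == key:
                    return v
        return None

d = IncrementalDict(4)
for i in range(20):
    d.put(i, i * 10)
print(d.get(7))  # prints 70 — found whether its bucket has migrated or not
```

No single operation ever pays the full O(n) rehash cost, which is why Redis can resize a dictionary of millions of keys without a latency spike.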
🔑 Key Takeaway: Every language implements hash tables differently — Java uses chaining with tree fallback, Python uses open addressing with compact layout, Go uses 8-entry buckets, and Redis uses dual-table incremental rehashing. The best choice depends on your workload: memory-constrained? Use open addressing. Latency-sensitive? Use incremental rehashing.
Interactive Hash Table Playground
Build your own hash table from scratch. Insert key-value pairs, watch them get hashed into buckets, toggle between chaining and open addressing, monitor the load factor, and search for keys to see the probe sequence in action.
Interactive Hash Table Playground
Insert key-value pairs, watch them hash into buckets, and search for keys.
Entries
0
Capacity
8
Load Factor
0.00
Max Chain
1
Insert Key-Value Pair
[0]
empty
[1]
empty
[2]
empty
[3]
empty
[4]
empty
[5]
empty
[6]
empty
[7]
empty
🔍 Lookup Key
🔑 Key Takeaway: Hash tables provide O(1) average-case lookups by computing a bucket index from the key. The choice between chaining and open addressing affects memory layout, cache behavior, and how collisions are resolved. Keeping the load factor below 0.75 ensures collisions stay rare.