Caching: How It Works, Strategies, and Tradeoffs

Caching stores frequently accessed data in fast memory (RAM) to avoid slow recomputation or database hits, often cutting latency from ~10 ms to under a millisecond. Common patterns are cache-aside, write-through, and write-back. Key tradeoffs involve consistency, staleness, eviction policy (LRU/LFU), and invalidation, which is famously one of the hardest problems in computing.

What Caching Is and Why It Matters

A cache is a high-speed storage layer that holds a subset of data so future requests are served faster than fetching from the primary store. A RAM read takes roughly 100 nanoseconds versus ~10 milliseconds for a disk-backed database query and 100+ ms for a cross-region network call, so caching can improve latency by orders of magnitude.

Caches appear at every layer: CPU L1/L2/L3 caches, OS page cache, browser cache, CDN edge cache, application in-memory caches (Caffeine, Guava), and distributed caches like Redis and Memcached. The core principle is locality: temporal (recently used data is reused) and spatial (nearby data is used together).

Caching Strategies (Read and Write Patterns)

Cache-aside (lazy loading) is the most common: the application checks the cache first; on a miss it reads the database, populates the cache, and returns the value. It's resilient (if the cache fails, requests can still be served from the DB), but the first request for each key always misses, and stale data can persist until the TTL expires or the entry is explicitly invalidated.
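The read path described above can be sketched in a few lines. This is a minimal in-process illustration, not a production client: the class name CacheAside, the load_from_db callback, and the injectable clock are all hypothetical names chosen for the example, and a plain dict stands in for a real cache such as Redis.

```python
import time
from typing import Any, Callable, Dict, Tuple


class CacheAside:
    """Cache-aside wrapper around a slow loader, with a per-entry TTL."""

    def __init__(
        self,
        load_from_db: Callable[[str], Any],  # hypothetical loader for the primary store
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Any:
        entry = self._store.get(key)
        if entry is not None and entry[0] > self._clock():
            self.hits += 1
            return entry[1]
        # Miss or expired entry: read the source of truth, then populate the cache.
        self.misses += 1
        value = self._load(key)
        self._store[key] = (self._clock() + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        """Drop an entry, for example right after the underlying row is updated."""
        self._store.pop(key, None)
```

Usage would look like cache = CacheAside(lambda key: db[key], ttl_seconds=30): the first cache.get("user:1") calls the loader, and repeat calls within 30 seconds are served from memory.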

Read-through: the cache library itself loads from the DB on a miss, keeping app code simple but coupling the cache to the data source.
Write-through: writes go to cache and DB synchronously, keeping them consistent but adding write latency.
Write-back (write-behind): writes hit cache immediately and flush to DB asynchronously; very fast writes but risks data loss if the cache dies before flush.
Write-around: writes go straight to the DB, bypassing cache, good for write-heavy data that's rarely re-read.
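The difference between write-through and write-back is easiest to see side by side. The sketch below uses plain dicts for both the cache and the database; the class names and the flush method are hypothetical, and a real write-back cache would flush on a timer or on eviction rather than on an explicit call.

```python
from typing import Any, Dict, Set


class WriteThroughCache:
    """Every write goes to the backing store and the cache before returning."""

    def __init__(self, db: Dict[str, Any]) -> None:
        self.db = db
        self.cache: Dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        self.db[key] = value      # synchronous write: this is the added latency
        self.cache[key] = value   # cache and store now agree


class WriteBackCache:
    """Writes land in the cache only; dirty keys are flushed to the store later."""

    def __init__(self, db: Dict[str, Any]) -> None:
        self.db = db
        self.cache: Dict[str, Any] = {}
        self.dirty: Set[str] = set()

    def put(self, key: str, value: Any) -> None:
        self.cache[key] = value
        self.dirty.add(key)       # not yet durable: lost if the cache dies now

    def flush(self) -> int:
        """Persist all dirty keys and return how many were written."""
        flushed = 0
        for key in list(self.dirty):
            self.db[key] = self.cache[key]
            flushed += 1
        self.dirty.clear()
        return flushed
```

Note that repeated writes to the same key between flushes collapse into a single database write, which is where write-back gets its throughput, and also why anything still in the dirty set is at risk.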

Eviction Policies and Invalidation

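When a cache fills, an eviction policy decides what to drop. LRU (Least Recently Used) is the most common choice; LFU (Least Frequently Used) suits skewed access patterns where a small set of keys stays hot; FIFO and random are simpler but usually less effective. Redis defaults to noeviction (writes fail once maxmemory is reached) and must be configured with a policy such as allkeys-lru, volatile-lru, allkeys-lfu, or the TTL-based volatile-ttl; its LRU and LFU are sampled approximations rather than exact orderings.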
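As an illustration of LRU eviction, here is a small exact LRU built on the standard library's OrderedDict. The class and method names are hypothetical, and this is a sketch of the policy itself, not of how Redis or any particular system implements it.

```python
from collections import OrderedDict
from typing import Any, Hashable, List


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used key."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._items: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable, default: Any = None) -> Any:
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: Hashable, value: Any) -> None:
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # drop the least recently used entry

    def keys(self) -> List[Hashable]:
        """Keys ordered from least to most recently used."""
        return list(self._items)
```

With a capacity of 2, putting "a" and "b", reading "a", and then putting "c" evicts "b", because the read moved "a" to the recently used end.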

Invalidation keeps caches from serving stale data. TTL (time-to-live) expiry is simplest but allows bounded staleness. Explicit invalidation on writes is precise but error-prone. Watch for the cache stampede (thundering herd): when a hot key expires, or many keys expire at once, concurrent requests all miss and hit the database together. Mitigations include jittered TTLs, request coalescing, and probabilistic early expiration.
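Jittered TTLs are the simplest of these mitigations to show. The helper below is a sketch under assumed parameters: the function name jittered_ttl and the default 10% spread are illustrative choices, not values from any particular library.

```python
import random
from typing import Optional


def jittered_ttl(
    base_seconds: float,
    jitter_fraction: float = 0.1,
    rng: Optional[random.Random] = None,
) -> float:
    """Return base_seconds shifted by a random amount within +/- jitter_fraction.

    Keys written at the same moment then expire at slightly different times,
    so their reloads are spread out instead of arriving together.
    """
    if base_seconds <= 0:
        raise ValueError("base_seconds must be positive")
    if not 0 <= jitter_fraction < 1:
        raise ValueError("jitter_fraction must be in [0, 1)")
    source = rng if rng is not None else random
    spread = base_seconds * jitter_fraction
    return base_seconds + source.uniform(-spread, spread)
```

With a 300-second base and 10% jitter, each key gets a TTL somewhere between roughly 270 and 330 seconds, which bounds the extra staleness while breaking up synchronized expiry.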

Strategy	Consistency	Write latency	Best for
Cache-aside	Eventual	Low	Read-heavy, general purpose
Write-through	Strong	Higher	Data needing fresh cache
Write-back	Weak (risk loss)	Lowest	Write-heavy, tolerant of loss
Write-around	Eventual	Low	Write-once, read-rarely

Real Systems and Tools

Redis (single-threaded command execution, rich data structures, optional persistence via RDB snapshots and AOF logs) is used by Twitter, GitHub, and Stack Overflow. Memcached is a simpler multi-threaded key-value cache, favored at Facebook for huge horizontally scaled pools; that deployment famously serves billions of requests per second and uses leases to protect against stampedes. CDNs like Cloudflare and Akamai cache static assets at the edge.

Frequently asked questions

What is cache invalidation and why is it hard?

Cache invalidation is removing or updating stale cache entries when the source data changes. It's hard because there's no universal signal for when data became stale, distributed caches can have many copies, and over-aggressive invalidation kills hit rates while under-invalidation serves wrong data.

Redis vs Memcached: which should I use?

Use Redis when you need data structures (lists, sorted sets, hashes), persistence, pub/sub, or replication. Use Memcached for a simple, multi-threaded, pure key-value cache that scales horizontally with minimal overhead. Redis is the more common default today.

What is a cache stampede?

A cache stampede (thundering herd) happens when a popular key expires and many concurrent requests all miss and hit the database simultaneously. Mitigations include jittered TTLs, request coalescing/locking, and probabilistic early recomputation before expiry.
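Request coalescing can be sketched as a "single-flight" helper: the first caller for a key does the load, and concurrent callers for the same key wait for that result instead of issuing their own query. The names SingleFlight, _Call, and do are hypothetical, and this version only coalesces within one process; across many servers the same idea needs a distributed lock or lease.

```python
import threading
from typing import Any, Callable, Dict, Optional


class _Call:
    """State shared between the leader and the callers waiting on it."""

    def __init__(self) -> None:
        self.done = threading.Event()
        self.value: Any = None
        self.error: Optional[BaseException] = None


class SingleFlight:
    """Coalesces concurrent loads for the same key into one call."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._in_flight: Dict[str, _Call] = {}

    def do(self, key: str, load: Callable[[], Any]) -> Any:
        with self._lock:
            call = self._in_flight.get(key)
            leader = call is None
            if leader:
                call = _Call()
                self._in_flight[key] = call
        if leader:
            try:
                call.value = load()  # only the leader hits the backend
            except BaseException as exc:
                call.error = exc     # waiters see the same failure
            finally:
                with self._lock:
                    del self._in_flight[key]
                call.done.set()
        else:
            call.done.wait()         # followers block until the leader finishes
        if call.error is not None:
            raise call.error
        return call.value
```

In a cache-aside read path, the miss branch would call something like flights.do(key, lambda: load_from_db(key)), so a burst of misses on one hot key turns into a single database query.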

What's a good cache hit ratio?

It depends on access patterns, but well-tuned caches often achieve 80-95%+ hit ratios. Even a 90% hit ratio means only 1 in 10 requests reaches the slower backend, dramatically reducing load and tail latency.
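The effect of hit ratio on average latency can be worked out with a simple weighted average. The function below is a back-of-the-envelope model with a hypothetical name; it assumes a miss costs backend_ms in total and ignores queueing effects, so treat the numbers as rough estimates.

```python
def effective_latency_ms(hit_ratio: float, cache_ms: float, backend_ms: float) -> float:
    """Average latency when a fraction hit_ratio of requests is served from cache."""
    if not 0 <= hit_ratio <= 1:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * cache_ms + (1 - hit_ratio) * backend_ms
```

For example, with an assumed 0.5 ms cache read and a 10 ms backend query, a 90% hit ratio gives an average of about 1.45 ms. Backend load scales with the miss rate, so moving from 90% to 95% hits halves the traffic that reaches the database.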
