Database Sharding: Horizontal Partitioning Explained
Sharding horizontally partitions a database, splitting rows across multiple servers (shards) so each holds a subset of data. It scales writes and storage beyond one machine. Strategies include range-based, hash-based, and directory-based sharding. The main tradeoffs are hot shards, painful cross-shard joins/transactions, and complex resharding.
What Sharding Is
Sharding (horizontal partitioning) splits a large table across multiple database servers, where each shard holds a disjoint subset of rows but the same schema. Unlike replication (which copies all data to every node), sharding divides the data, so it scales writes and total storage, not just reads.
You shard when a single node can no longer handle the write throughput, dataset size, or memory working set, typically when vertical scaling (a bigger box) becomes too expensive or hits hardware limits. The shard key (e.g., user_id) determines which shard a row lives on.
Sharding Strategies
The sharding strategy determines how rows map to shards, and each makes a different tradeoff between even distribution, query flexibility, and the pain of adding nodes later.
- Range-based: partition by key ranges (users A-M on shard 1, N-Z on shard 2). Simple and supports range scans, but prone to hot shards if data is unevenly distributed.
- Hash-based: apply a hash to the shard key and modulo the shard count. Distributes evenly but breaks range queries and requires resharding when adding nodes (mitigated by consistent hashing).
- Directory-based: a lookup table maps keys to shards. Flexible (rebalance by updating the map) but the directory becomes a bottleneck and single point of failure.
- Geo/entity-based: shard by region or tenant for data locality and compliance (e.g., EU data in EU shards).
The Hard Problems
Hot shards (celebrity problem): a skewed key like a viral user concentrates load on one shard. Mitigate by choosing a high-cardinality shard key, composite keys, or splitting hot entities.
Cross-shard operations: joins and transactions spanning shards are slow or impossible with single-node ACID. Teams denormalize, use application-level joins, or adopt distributed transaction protocols (2PC, sagas). Cross-shard queries fan out and gather, increasing tail latency.
Resharding: changing shard count requires migrating data while serving traffic. Consistent hashing minimizes the keys that move (only ~1/N), and tools like Vitess automate resharding for MySQL.
| Strategy | Even distribution | Range queries | Resharding pain |
|---|---|---|---|
| Range | Poor (hot spots) | Excellent | Moderate |
| Hash (modulo) | Good | Poor | High |
| Consistent hash | Good | Poor | Low |
| Directory | Tunable | Tunable | Low (update map) |
Real Systems
Instagram shards Postgres by user ID using logical shards mapped to physical machines, letting them rebalance without changing IDs. YouTube/PlanetScale use Vitess to shard MySQL transparently. DynamoDB and Cassandra auto-partition by hash of the partition key. MongoDB supports range and hashed sharding natively. Discord migrated trillions of messages to Cassandra/ScyllaDB partitioned by channel and time bucket.
ResuMax tailors your resume to each role, scores it like a recruiter, and preps you for interviews.
Practice with the interview coachFrequently asked questions
What's the difference between sharding and partitioning?
Partitioning is the general concept of splitting data; sharding specifically means horizontal partitioning across separate database servers/nodes. Vertical partitioning (splitting columns) and within-node partitioning are partitioning but not sharding.
How do I pick a good shard key?
Pick a high-cardinality key with even access distribution that most queries filter on, so requests hit a single shard. user_id is common. Avoid low-cardinality or monotonically increasing keys (like timestamps) that create hot shards.
Why are cross-shard joins and transactions hard?
Data on different shards lives on different servers, so a single-node ACID transaction or join can't span them. You must fan out queries and merge in the app, denormalize to keep related data together, or use distributed transactions (2PC/sagas) which add latency and complexity.
What is a hot shard and how do I avoid it?
A hot shard receives disproportionate load because of a skewed shard key (e.g., a celebrity user). Avoid it with a high-cardinality key, composite/compound shard keys, consistent hashing, or by splitting/replicating hot entities.