How to shard databases for horizontal scale
Category: System Design
Short answer
Sharding splits a database into smaller, independent partitions called shards. Each shard stores and serves only a subset of the data, typically on its own server, so that storage and load are spread across machines.
Steps
- Select a shard key that distributes writes evenly and supports common queries.
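- Use hash sharding for uniform distribution, or range sharding when queries scan contiguous key ranges. Range sharding on sequential keys (such as timestamps) tends to concentrate new writes on one shard.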
- Implement a directory service or smart client to route requests to the correct shard.
- Rebalance shards as data grows by splitting or migrating ranges.
- Handle cross-shard transactions carefully or avoid them when possible.
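The two routing strategies from the steps above can be sketched as follows. This is a minimal illustration, not a production router: the shard count, the range boundaries, and the function names hash_shard and range_shard are all assumptions made for the example.

```python
import bisect
import hashlib

NUM_SHARDS = 4  # assumed shard count, for illustration only


def hash_shard(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a key to a shard using a stable hash.

    Python's built-in hash() is salted per process for strings,
    so a cryptographic digest is used to get repeatable routing.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Range sharding: each entry is the exclusive upper bound of one shard.
# Keys below 1000 go to shard 0, 1000..1999 to shard 1, 2000..2999 to
# shard 2, and everything from 3000 upward to shard 3.
RANGE_BOUNDS = [1000, 2000, 3000]  # hypothetical boundaries


def range_shard(key: int, bounds=RANGE_BOUNDS) -> int:
    """Map a numeric key to the shard whose range contains it."""
    return bisect.bisect_right(bounds, key)


print(hash_shard("user:12345"))  # some shard in 0..3, stable across runs
print(range_shard(1500))         # 1
```

Hash routing spreads neighbouring keys across shards, so a range scan must visit every shard. Range routing keeps neighbouring keys together, so a scan over 1000..1999 touches only shard 1.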
Tips
- Choose a high-cardinality shard key to avoid hot spots.
- Keep related data on the same shard to minimize cross-shard joins.
- Where a managed service offers automatic sharding, use it to reduce manual operational work.
- Plan for resharding events that require temporary downtime or dual writes.
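To see why resharding needs planning, the sketch below counts how many keys change shards when the shard count changes under simple modulo routing. It assumes integer keys stand in for hash values and that routing is key % shard_count; moved_fraction is a hypothetical helper name.

```python
def moved_fraction(keys, old_shards: int, new_shards: int) -> float:
    """Fraction of keys whose shard changes under modulo routing."""
    moved = sum(1 for k in keys if k % old_shards != k % new_shards)
    return moved / len(keys)


keys = range(100_000)  # integers standing in for hashed shard keys

print(moved_fraction(keys, 4, 5))  # 0.8
print(moved_fraction(keys, 4, 8))  # 0.5
```

Under these assumptions, going from 4 to 5 shards relocates 80% of keys, while doubling from 4 to 8 relocates half, and each old shard only splits into two. The data that moves is what must be covered by downtime or dual writes during the migration.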
Common issues
- Hot spots when the shard key has skewed distribution.
- Complex cross-shard queries and transactions.
- Uneven shard sizes causing some nodes to become bottlenecks.
- Application complexity from managing shard routing logic.
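Hot spots and uneven shard sizes can often be spotted before deployment by replaying a sample of real keys through the routing function. The sketch below is one way to do that, assuming hash routing over four shards; shard_of, shard_load, and skew_ratio are hypothetical helper names, and the sample workload is invented for illustration.

```python
import hashlib
from collections import Counter


def shard_of(key: str, num_shards: int) -> int:
    """Stable hash routing (same idea as any hash-sharded setup)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def shard_load(keys, num_shards: int) -> list:
    """Count how many sampled operations land on each shard."""
    counts = Counter(shard_of(k, num_shards) for k in keys)
    return [counts.get(s, 0) for s in range(num_shards)]


def skew_ratio(loads) -> float:
    """Busiest shard's load divided by the mean load (1.0 = even)."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean if mean else 0.0


# Invented sample: one tenant issues 900 of 1000 writes.
sample = ["tenant-a"] * 900 + [f"tenant-{i}" for i in range(100)]
loads = shard_load(sample, 4)
print(loads, skew_ratio(loads))
```

A ratio well above 1.0 suggests the key is too skewed for this shard count. In this sample, hashing cannot help because all 900 writes share one key value, which is the situation a higher-cardinality or composite shard key is meant to avoid.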
Example
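-- The application hashes user_id and maps the result to a shard before querying.
-- Here user_id 12345 is assumed to map to shard 2.
SELECT * FROM users_shard_2
WHERE user_id = 12345;
This query targets a single shard, which the application or routing layer selects beforehand by hashing the user identifier. The SQL itself performs no routing; it simply runs against the table (or database) that holds the selected shard.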
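The application-side half of this example might look like the sketch below. It is an illustration under stated assumptions: four shards, SHA-256 based routing, and a single in-memory SQLite database whose four tables stand in for what would normally be separate shard servers. The helper names shard_for_user, users_table, insert_user, and get_user are hypothetical.

```python
import hashlib
import sqlite3

NUM_SHARDS = 4  # assumed shard count


def shard_for_user(user_id: int) -> int:
    """Hash the user id to a shard number in 0..NUM_SHARDS-1."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def users_table(user_id: int) -> str:
    """Name of the shard table for this user, e.g. users_shard_2."""
    # The name is built from a computed integer, never from user input.
    return f"users_shard_{shard_for_user(user_id)}"


# One in-memory database with a table per shard (stand-in for real shards).
conn = sqlite3.connect(":memory:")
for s in range(NUM_SHARDS):
    conn.execute(
        f"CREATE TABLE users_shard_{s} (user_id INTEGER PRIMARY KEY, name TEXT)"
    )


def insert_user(user_id: int, name: str) -> None:
    conn.execute(
        f"INSERT INTO {users_table(user_id)} (user_id, name) VALUES (?, ?)",
        (user_id, name),
    )


def get_user(user_id: int):
    cur = conn.execute(
        f"SELECT user_id, name FROM {users_table(user_id)} WHERE user_id = ?",
        (user_id,),
    )
    return cur.fetchone()  # None if the user is not on its shard


insert_user(12345, "Ada")
print(users_table(12345), get_user(12345))
```

Reads and writes for the same user_id always resolve to the same table because both go through users_table. Which shard number 12345 maps to depends on the hash function and shard count chosen, so it will not necessarily be shard 2.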