How to shard databases for horizontal scale

· Category: System Design

Short answer

Sharding splits a database into smaller pieces called shards. Each shard holds a subset of the data, usually on its own server, so that load and storage are spread across machines.

Steps

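  1. Select a shard key that distributes writes evenly and appears in the most common queries, so that most requests can be served by a single shard.
  2. Use hash sharding when uniform distribution matters most, or range sharding when queries scan contiguous key ranges.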
  3. Implement a directory service or smart client to route requests to the correct shard.
  4. Rebalance shards as data grows by splitting or migrating ranges.
  5. Avoid cross-shard transactions where possible; where they are unavoidable, coordinate them explicitly, for example with two-phase commit.
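
As a rough illustration of steps 2 and 3, the sketch below shows one way a smart client might map a key to a shard under each scheme. The function names, the `UPPER_BOUNDS` layout, and the choice of MD5 as a stable hash are assumptions made for this example, not part of any particular database.

```python
import bisect
import hashlib


def hash_shard(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash.

    MD5 is used here only to spread keys evenly, not for security.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Hypothetical range layout: shard i holds keys below UPPER_BOUNDS[i]
# (and at or above the previous bound); the last shard holds every key
# at or above the final bound, giving four shards in total.
UPPER_BOUNDS = [1_000, 2_000, 3_000]


def range_shard(key: int) -> int:
    """Map a numeric key to the shard whose range contains it."""
    return bisect.bisect_right(UPPER_BOUNDS, key)


print(hash_shard("user:12345", 4))  # some index in 0..3, stable across runs
print(range_shard(1_500))           # falls in the second range
```

Hash routing scatters neighbouring keys across shards, while range routing keeps them together, which is why range scans are cheap under the second scheme and expensive under the first.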

Tips

  • Choose a high-cardinality shard key to avoid hot spots.
  • Keep related data on the same shard to minimize cross-shard joins.
  • Use auto-sharding in managed services to reduce manual overhead.
  • Plan for resharding ahead of time: moving data between shards typically requires either temporary downtime or a dual-write period.
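
To see why resharding deserves planning, the sketch below estimates how many keys change shard when a plain modulo hash scheme grows from 4 to 5 shards. The key format and shard counts are made up for the example; with a uniform hash, about four in five keys are expected to move in this case.

```python
import hashlib


def hash_shard(key: str, num_shards: int) -> int:
    """Stable hash-mod routing (MD5 is used only to spread keys)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def fraction_moved(keys, old_count: int, new_count: int) -> float:
    """Share of keys whose shard index changes when the shard count changes."""
    moved = sum(
        1 for k in keys if hash_shard(k, old_count) != hash_shard(k, new_count)
    )
    return moved / len(keys)


# Hypothetical key set for the experiment.
keys = [f"user:{i}" for i in range(10_000)]
moved = fraction_moved(keys, old_count=4, new_count=5)
print(f"{moved:.0%} of keys change shard when going from 4 to 5 shards")
```

Splitting ranges, or using a directory that records where each key lives, moves far less data than rehashing everything, at the cost of extra routing state.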

Common issues

  • Hot spots when the shard key has skewed distribution.
  • Complex cross-shard queries and transactions.
  • Uneven shard sizes causing some nodes to become bottlenecks.
  • Application complexity from managing shard routing logic.
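
One simple way to spot the first and third issues is to compare the busiest shard's load with the average. The sketch below assumes integer keys, modulo routing, and an invented traffic pattern in which a single tenant dominates.

```python
from collections import Counter


def shard_load(keys, num_shards: int) -> Counter:
    """Count how many requests land on each shard under modulo routing."""
    return Counter(k % num_shards for k in keys)


def skew_ratio(load: Counter, num_shards: int) -> float:
    """Busiest shard's load divided by the mean load; 1.0 is perfectly even."""
    mean = sum(load.values()) / num_shards
    return max(load.values()) / mean


# Hypothetical traffic: tenant 7 issues 900 requests, and tenants 0..99
# issue one request each.
requests = [7] * 900 + list(range(100))
load = shard_load(requests, num_shards=4)
ratio = skew_ratio(load, num_shards=4)
print(dict(load), round(ratio, 2))
```

A ratio well above 1.0 suggests that the shard key is concentrating traffic on a single node, even though the number of distinct keys per shard is identical here.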

Example

-- The application has already hashed user_id 12345 to shard 2,
-- so the query goes straight to that shard's table.
SELECT * FROM users_shard_2
WHERE user_id = 12345;

The shard is chosen by hashing the user identifier before the query is issued, so the lookup touches only the one partition that holds the row.
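
The routing step itself happens in application code before any SQL runs. The sketch below is one possible end-to-end version using an in-memory SQLite database, with each shard modelled as a separate table. `NUM_SHARDS`, `shard_table`, `insert_user`, and `get_user` are hypothetical names, and plain modulo stands in for the hash. Under these assumptions user 12345 lands on users_shard_1; a different hash function or shard count would give a different index, such as the users_shard_2 in the query above.

```python
import sqlite3

NUM_SHARDS = 4  # assumed shard count for this sketch


def shard_table(user_id: int) -> str:
    """Return the shard table for a user; modulo stands in for the hash."""
    return f"users_shard_{user_id % NUM_SHARDS}"


# One in-memory database with a table per shard, for illustration only.
# A real deployment would hold one connection per shard server.
conn = sqlite3.connect(":memory:")
for i in range(NUM_SHARDS):
    conn.execute(
        f"CREATE TABLE users_shard_{i} (user_id INTEGER PRIMARY KEY, name TEXT)"
    )


def insert_user(user_id: int, name: str) -> None:
    # The table name is built from an integer, never from user input;
    # row values are still passed as bound parameters.
    conn.execute(
        f"INSERT INTO {shard_table(user_id)} (user_id, name) VALUES (?, ?)",
        (user_id, name),
    )


def get_user(user_id: int):
    cur = conn.execute(
        f"SELECT user_id, name FROM {shard_table(user_id)} WHERE user_id = ?",
        (user_id,),
    )
    return cur.fetchone()


insert_user(12345, "Ada")
row = get_user(12345)
print(shard_table(12345), row)
```

Because reads and writes both go through the same `shard_table` function, a row is always looked up on the shard it was written to.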