How to design a database schema for scale
· Category: System Design
Short answer
A scalable schema balances normalization for consistency with denormalization for read performance, supported by indexes and partitioning.
Steps
- Normalize data to third normal form to reduce redundancy and avoid insert, update, and delete anomalies.
- Identify query patterns and add indexes on columns used in WHERE, JOIN, and ORDER BY clauses.
- Partition large tables by range, list, or hash to improve manageability and let queries skip partitions they do not need.
- Denormalize selectively for read-heavy paths using materialized views or summary tables.
- Monitor query plans and adjust indexes as access patterns evolve.
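The steps above can be sketched end to end with Python's built-in sqlite3 module. This is a minimal illustration, not a production design: the customers/orders tables, the index name, and the customer_order_totals summary table are hypothetical, and SQLite is used only because it ships with Python. SQLite has no materialized views or native table partitioning, so a plain summary table stands in for the denormalized read path.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Normalized base tables (hypothetical schema for illustration)
conn.executescript("""
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        created_at  TEXT NOT NULL,
        total_cents INTEGER NOT NULL
    );
    -- Index chosen for the query pattern: filter by customer, sort by date
    CREATE INDEX idx_orders_customer_created
        ON orders (customer_id, created_at);
""")

conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Ada')")
conn.executemany(
    "INSERT INTO orders (customer_id, created_at, total_cents) VALUES (?, ?, ?)",
    [(1, "2024-01-05", 1200), (1, "2024-02-10", 800)],
)

# Ask the planner how it would run the query; the last column is the detail text
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT id, total_cents FROM orders "
    "WHERE customer_id = ? ORDER BY created_at",
    (1,),
).fetchall()
plan_text = " ".join(row[3] for row in plan)
print(plan_text)

# Denormalized summary table for a read-heavy path (must be refreshed on writes)
conn.execute("""
    CREATE TABLE customer_order_totals AS
    SELECT customer_id,
           COUNT(*)         AS order_count,
           SUM(total_cents) AS total_cents
    FROM orders
    GROUP BY customer_id
""")
summary = conn.execute(
    "SELECT order_count, total_cents FROM customer_order_totals "
    "WHERE customer_id = 1"
).fetchone()
print(summary)
```

The plan text should mention idx_orders_customer_created if the index matches the query. Other engines expose the same idea through their own EXPLAIN output, with different syntax and wording.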
Tips
- Use covering indexes to satisfy queries without accessing the heap.
- Keep hot data in smaller partitions for faster scans.
- Avoid excessive indexes that slow down writes and increase storage.
- Use surrogate keys instead of natural keys where the natural value can change, so that references stay stable.
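A small sketch of the covering-index and surrogate-key tips, again assuming SQLite through Python's sqlite3 module. The users table, its columns, and the index name are hypothetical. SQLite stores rows in a table B-tree and not in a heap, but its query plan still shows whether an index alone can answer a query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        id     INTEGER PRIMARY KEY,   -- surrogate key: never changes
        email  TEXT NOT NULL UNIQUE,  -- natural identifier: may change
        status TEXT NOT NULL,
        bio    TEXT
    );
    -- Holds every column the first query below needs
    CREATE INDEX idx_users_status_email ON users (status, email);
""")


def plan_for(sql):
    """Return the planner's detail text for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[3] for row in rows)


# Both filter and selected column live in the index, so no table lookup is needed
covered = plan_for("SELECT email FROM users WHERE status = 'active'")

# bio is not in the index, so the table rows must be read as well
not_covered = plan_for("SELECT bio FROM users WHERE status = 'active'")

print(covered)
print(not_covered)
```

Only the first plan should report a covering index. Adding bio to the index would cover the second query too, at the cost of extra storage and slower writes, which is the trade-off the tips warn about.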
Common issues
- Over-normalization causing excessive joins and latency.
- Missing indexes leading to full table scans.
- Hotspot partitions when the partition key is poorly chosen.
- Schema changes locking large tables for extended periods.
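The hotspot issue can be illustrated with a short simulation. This is a sketch under assumed conditions: a hypothetical workload of 1000 events that all occur on the same day, four partitions, and SHA-256 used only to spread keys. It compares partitioning on the event date with partitioning on the user id.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 4  # assumed partition count for illustration


def hash_partition(key, n=NUM_PARTITIONS):
    """Map a key to a partition number by hashing it."""
    digest = hashlib.sha256(key.encode()).hexdigest()
    return int(digest, 16) % n


# Hypothetical workload: every event happens today, each from a different user
events = [("2024-06-01", f"user-{i}") for i in range(1000)]

# Poor key: all rows share one date, so they all land in one partition
by_date = Counter(hash_partition(day) for day, _ in events)

# Better key: user ids are high-cardinality, so rows spread across partitions
by_user = Counter(hash_partition(user) for _, user in events)

print("by date:", dict(by_date))
print("by user:", dict(by_user))
```

Partitioning by date is still useful for pruning and archiving old data. The point is that the key that receives writes should have enough distinct values to spread the load.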
Example
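# Modulo hashing to assign a key to a node (shard)
import hashlib

def get_node(key, nodes):
    """Return the node responsible for key using hash-mod-N placement."""
    # MD5 is used only to spread keys evenly, not for security.
    hash_val = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return nodes[hash_val % len(nodes)]

node = get_node('user-123', ['node-a', 'node-b', 'node-c'])
This snippet uses simple modulo hashing, the same idea behind hash partitioning, to distribute keys across nodes. It is not consistent hashing: changing the number of nodes remaps most keys, whereas a hash ring moves only a fraction of them.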
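One way to limit key movement when the node list changes is a hash ring with virtual nodes. The sketch below is a minimal illustration and not a production implementation: the HashRing class, the replicas parameter, and the "node#i" naming of virtual nodes are assumptions made for the example.

```python
import bisect
import hashlib


def _hash(value):
    """Hash a string to a large integer position on the ring."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


class HashRing:
    """Hypothetical hash ring with virtual nodes, for illustration only."""

    def __init__(self, nodes, replicas=100):
        # Each node is placed at `replicas` positions to even out the load
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    def get_node(self, key):
        # First ring position clockwise from the key's hash, wrapping at the end
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


keys = [f"user-{i}" for i in range(1000)]
before = HashRing(["node-a", "node-b", "node-c"])
after = HashRing(["node-a", "node-b", "node-c", "node-d"])

# Count the keys whose owner changed after node-d joined
moved = sum(before.get_node(k) != after.get_node(k) for k in keys)
print(f"{moved} of {len(keys)} keys moved")
```

With a ring, every key that moves goes to the newly added node and the rest stay where they were. With hash-mod-N placement, going from three to four nodes would reassign most keys.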