How to design a database schema for scale

· Category: System Design

Short answer

A scalable schema balances normalization for consistency with denormalization for read performance, supported by indexes and partitioning.

Steps

  1. Normalize data to third normal form to reduce redundancy and prevent insert, update, and delete anomalies.
  2. Identify query patterns and add indexes on columns used in WHERE, JOIN, and ORDER BY clauses.
  3. Partition large tables by range, list, or hash so queries can skip irrelevant partitions and old data can be archived or dropped cheaply.
  4. Denormalize selectively for read-heavy paths using materialized views or summary tables.
  5. Monitor query plans and adjust indexes as access patterns evolve.
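
Steps 1, 2, and 5 can be sketched with Python's built-in sqlite3 module. The table and index names (customers, orders, idx_orders_customer) are hypothetical, and the plan text differs between SQLite versions and between database engines, so treat this as an illustration under those assumptions rather than a recipe.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Normalized layout: customer attributes live in one place,
# and orders reference them by key instead of copying them.
conn.executescript("""
    CREATE TABLE customers (
        id    INTEGER PRIMARY KEY,
        email TEXT NOT NULL UNIQUE
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        created_at  TEXT NOT NULL,
        total_cents INTEGER NOT NULL
    );
""")

# Hypothetical hot query: all orders for one customer.
QUERY = "SELECT id, created_at, total_cents FROM orders WHERE customer_id = ?"

def plan(sql, params=()):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)

plan_before = plan(QUERY, (1,))  # no index on customer_id yet: a table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = plan(QUERY, (1,))   # the planner can now search via the index

print("before:", plan_before)
print("after: ", plan_after)
```

Re-running the same plan check after each index change is a cheap way to confirm that an index is actually used before paying its write cost.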

Tips

  • Use covering indexes to satisfy queries without accessing the heap.
  • Keep hot data in smaller partitions for faster scans.
  • Avoid excessive indexes that slow down writes and increase storage.
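  • Prefer surrogate keys over natural keys as primary keys, since natural values such as emails or usernames can change; enforce uniqueness of the natural key with a separate constraint.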
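
The covering-index tip can be illustrated with sqlite3 as well. This is a minimal sketch: the orders table and the index name idx_orders_customer_total are hypothetical, and other engines report index-only access in their own wording.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,  -- surrogate key
        customer_id INTEGER NOT NULL,
        status      TEXT NOT NULL,
        total_cents INTEGER NOT NULL
    )
""")
# The index holds both the filter column and the selected column.
conn.execute(
    "CREATE INDEX idx_orders_customer_total ON orders (customer_id, total_cents)"
)

def plan(sql, params=()):
    """Return the 'detail' column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)

# Every column the query needs is in the index, so the table is not read.
covered_plan = plan("SELECT total_cents FROM orders WHERE customer_id = ?", (42,))

# 'status' is not in the index, so each match needs a lookup in the table.
uncovered_plan = plan("SELECT status FROM orders WHERE customer_id = ?", (42,))

print(covered_plan)
print(uncovered_plan)
```

The trade-off from the third tip still applies: each extra column in an index makes it larger and adds work to every write.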

Common issues

  • Over-normalization causing excessive joins and latency.
  • Missing indexes leading to full table scans.
  • Hotspot partitions when the partition key is poorly chosen.
  • Schema changes locking large tables for extended periods.
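
The hotspot issue can be made concrete with a small simulation. The workload below (10,000 writes from 1,000 users on a single day) and the partition count are invented for illustration; the point is only to compare how two candidate partition keys spread the same writes.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 4  # assumed partition count for the sketch

def hash_partition(key: str, n: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition number using a stable hash."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % n

# Hypothetical workload: (user_id, write_date) pairs, all on the same day.
writes = [(f"user-{i % 1000}", "2024-01-15") for i in range(10_000)]

# Candidate 1: partition by date. Every current write lands in one partition.
by_date = Counter(date for _, date in writes)

# Candidate 2: hash-partition by user id. Writes spread across partitions.
by_user = Counter(hash_partition(user) for user, _ in writes)

hottest_date_share = max(by_date.values()) / len(writes)
hottest_user_share = max(by_user.values()) / len(writes)

print(f"date key: hottest partition takes {hottest_date_share:.0%} of writes")
print(f"user key: hottest partition takes {hottest_user_share:.0%} of writes")
```

Date-based partitions are still useful for pruning and retention, so the choice depends on whether write contention or time-bounded reads dominate the workload.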

Example

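# Modulo hashing to route a key to a node or shard (not consistent hashing)
import hashlib

def get_node(key, nodes):
    # MD5 is used here only to spread keys evenly, not for security.
    hash_val = int(hashlib.md5(key.encode()).hexdigest(), 16)
    # A key keeps mapping to the same node only while the node list is unchanged.
    return nodes[hash_val % len(nodes)]

node = get_node('user-123', ['node-a', 'node-b', 'node-c'])

This snippet uses simple modulo hashing to distribute keys across nodes, the same idea that underlies hash partitioning in step 3. It is not consistent hashing: changing the number of nodes remaps most keys, so adding or removing a node means moving most of the data.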
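
For comparison with the modulo lookup in get_node above (hash_val % len(nodes)), here is a minimal sketch of a consistent-hash ring. The HashRing class, the vnodes parameter, and the "node#i" labeling scheme are assumptions made for this illustration, not part of any particular library.

```python
import bisect
import hashlib

def _hash(value: str) -> int:
    # MD5 is used only for even distribution, not for security.
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class HashRing:
    """Consistent-hash ring with virtual nodes (illustrative sketch)."""

    def __init__(self, nodes, vnodes=100):
        # Each node is placed on the ring at `vnodes` pseudo-random points.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def get_node(self, key: str) -> str:
        # Walk clockwise to the first point after the key's hash,
        # wrapping around to the start of the ring if needed.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._ring[idx][1]

keys = [f"user-{i}" for i in range(1000)]
before = HashRing(["node-a", "node-b", "node-c"])
after = HashRing(["node-a", "node-b", "node-c", "node-d"])

# Count the keys whose owner changes when node-d joins.
moved = sum(before.get_node(k) != after.get_node(k) for k in keys)
print(f"{moved} of {len(keys)} keys moved after adding node-d")
```

With the ring, the only keys that move are those taken over by the new node, roughly a quarter of them when going from three nodes to four. With modulo hashing, the same change remaps about three quarters of the keys.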