How to design a database schema for scale

Question

QA Hub Editorial · Accepted Answer

Short answer

A scalable schema balances normalization for consistency with denormalization for read performance, supported by indexes and partitioning.

Steps

Normalize data to third normal form to eliminate redundancy and anomalies.
Identify query patterns and add indexes on columns used in WHERE, JOIN, and ORDER BY clauses.
Partition large tables by range, list, or hash to improve manageability.
Denormalize selectively for read-heavy paths using materialized views or summary tables.
Monitor query plans and adjust indexes as access patterns evolve.

Tips

Use covering indexes to satisfy queries without accessing the heap.
Keep hot data in smaller partitions for faster scans.
Avoid excessive indexes that slow down writes and increase storage.
Use surrogate keys instead of natural keys for stability.

Common issues

Over-normalization causing excessive joins and latency.
Missing indexes leading to full table scans.
Hotspot partitions when the partition key is poorly chosen.
Schema changes locking large tables for extended periods.

Example

# Consistent hashing for service discovery
import hashlib

def get_node(key, nodes):
    hash_val = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return nodes[hash_val % len(nodes)]

node = get_node('user-123', ['node-a', 'node-b', 'node-c'])

This snippet implements consistent hashing to distribute keys across nodes, a foundational technique in scalable distributed systems.

Short answer

Steps

Tips

Common issues

Example

Related Questions

How to shard databases for horizontal scale

How to design a search engine architecture

SQL vs NoSQL for system design

What is eventual consistency in distributed systems

How to implement distributed caching

What are microservices and when to use them