How the saga pattern manages distributed transactions

· Category: System Design

Short answer

The saga pattern breaks long-running transactions into a sequence of local transactions, each with a compensating action for rollback.

Steps

  1. Decompose the business process into steps owned by different services.
  2. Execute each step and publish an event upon completion.
  3. If a step fails, trigger compensating transactions for prior completed steps.
  4. Use orchestration with a central coordinator or choreography with event-driven collaboration.
  5. Ensure all compensations are idempotent and observable.

Tips

  • Prefer choreography for loose coupling and orchestration for complex flows.
  • Log every step and compensation for auditability.
  • Design compensations to be semantically correct even if delayed.
  • Use timeouts and alerts to detect stalled sagas.

Common issues

  • Compensation failures leaving the system in a partially inconsistent state.
  • Ordering bugs causing compensations to run before the original action.
  • Complexity from tracking saga state across many services.
  • Difficulty testing all failure combinations in long sagas.

Example

# Consistent hashing for service discovery
import hashlib

def get_node(key, nodes):
    hash_val = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return nodes[hash_val % len(nodes)]

node = get_node('user-123', ['node-a', 'node-b', 'node-c'])

This snippet implements consistent hashing to distribute keys across nodes, a foundational technique in scalable distributed systems.