How the saga pattern manages distributed transactions
· Category: System Design
Short answer
The saga pattern breaks long-running transactions into a sequence of local transactions, each with a compensating action for rollback.
Steps
- Decompose the business process into steps owned by different services.
- Execute each step and publish an event upon completion.
- If a step fails, trigger compensating transactions for prior completed steps.
- Use orchestration with a central coordinator or choreography with event-driven collaboration.
- Ensure all compensations are idempotent and observable.
Tips
- Prefer choreography for loose coupling and orchestration for complex flows.
- Log every step and compensation for auditability.
- Design compensations to be semantically correct even if delayed.
- Use timeouts and alerts to detect stalled sagas.
Common issues
- Compensation failures leaving the system in a partially inconsistent state.
- Ordering bugs causing compensations to run before the original action.
- Complexity from tracking saga state across many services.
- Difficulty testing all failure combinations in long sagas.
Example
# Consistent hashing for service discovery
import hashlib
def get_node(key, nodes):
hash_val = int(hashlib.md5(key.encode()).hexdigest(), 16)
return nodes[hash_val % len(nodes)]
node = get_node('user-123', ['node-a', 'node-b', 'node-c'])
This snippet implements consistent hashing to distribute keys across nodes, a foundational technique in scalable distributed systems.