How to design a rate limiter

· Category: System Design

Short answer

A rate limiter controls the number of requests a client can make in a time window to protect resources and ensure fair usage.

Steps

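  1. Choose an algorithm: token bucket allows short bursts up to the bucket capacity while capping the sustained rate, sliding window avoids the boundary spikes of a fixed window at the cost of more state, and fixed window is the simplest but can admit up to twice the limit across a window boundary.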
  2. Store counters or token state in a fast in-memory store like Redis.
  3. On each request, check and update the counter or token count in a single atomic operation.
  4. Reject or delay requests that exceed the limit.
  5. Return rate limit headers, such as Retry-After on an HTTP 429 (Too Many Requests) response, so clients can back off and adapt.
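
The steps above can be sketched with an in-process token bucket. This is only an illustration under stated assumptions: the TokenBucket class, the check helper, their parameters, and the injectable clock are hypothetical names made up for this example, and the state lives in a local object instead of a shared store such as Redis.

```python
import math
import time


class TokenBucket:
    """In-process token bucket (hypothetical helper, not a library API)."""

    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self.capacity = capacity        # maximum burst size, in tokens
        self.refill_rate = refill_rate  # tokens added per second
        self.clock = clock              # injectable so tests can control time
        self.tokens = float(capacity)   # start full so an initial burst is allowed
        self.updated = clock()

    def _refill(self):
        # Add the tokens earned since the last update, capped at capacity.
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.updated = now

    def allow(self, cost=1):
        """Spend `cost` tokens and return True if they are available."""
        self._refill()
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

    def retry_after(self, cost=1):
        """Seconds until `cost` tokens are available (0 if available now)."""
        self._refill()
        missing = cost - self.tokens
        return max(0.0, missing / self.refill_rate)


def check(bucket):
    """Return (status_code, headers) for one request against `bucket`."""
    if bucket.allow():
        return 200, {}
    # Retry-After is expressed in whole seconds, rounded up.
    return 429, {"Retry-After": str(math.ceil(bucket.retry_after()))}
```

A caller would typically keep one bucket per client key, for example TokenBucket(capacity=10, refill_rate=5.0) for bursts of 10 and a sustained 5 requests per second. Running several API instances would require moving the token state into a shared store.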

Tips

  • Use distributed rate limiters when running multiple API instances.
  • Differentiate limits by user tier, endpoint cost, and HTTP method.
  • Allow short bursts for interactive users while limiting sustained load.
  • Combine global and per-user limits for layered protection.
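
As a rough sketch of the tiered and layered limits in the tips above, the following fixed-window limiter checks a global limit and a per-user, per-tier limit before counting a request. The tier names, limit values, and the LayeredLimiter class are assumptions chosen for illustration only.

```python
import time
from collections import defaultdict

# Hypothetical limits (requests per window); the values are illustrative.
TIER_LIMITS = {"free": 2, "pro": 5}
GLOBAL_LIMIT = 6
WINDOW_SECONDS = 60


class LayeredLimiter:
    """Fixed-window limiter with a global layer and a per-user layer."""

    def __init__(self, clock=time.time):
        self.clock = clock
        self.counts = defaultdict(int)  # (scope..., window index) -> count

    def allow(self, user_id, tier):
        window = int(self.clock() // WINDOW_SECONDS)
        user_key = ("user", user_id, window)
        global_key = ("global", window)
        # Check both layers before counting, so a rejected request
        # does not consume quota in either layer.
        if self.counts[global_key] >= GLOBAL_LIMIT:
            return False
        if self.counts[user_key] >= TIER_LIMITS[tier]:
            return False
        self.counts[user_key] += 1
        self.counts[global_key] += 1
        return True
```

This sketch never evicts counters from old windows, so a real deployment would need expiry (for example, key TTLs in Redis) to keep per-client state bounded.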

Common issues

  • Race conditions causing limit overshoot in distributed counters.
  • Clock skew affecting sliding window calculations.
  • Memory pressure from maintaining per-client state.
  • False positives from shared NAT IP addresses.
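
One common way to avoid the race condition listed above is to do the increment and the expiry in a single atomic step on the Redis side. The sketch below assumes a redis-py style client that exposes eval(script, numkeys, *keys_and_args); the allow_request function and the key naming are hypothetical.

```python
# The script runs atomically inside Redis, so INCR and EXPIRE cannot
# interleave with a request handled by another API instance.
FIXED_WINDOW_LUA = """
local count = redis.call('INCR', KEYS[1])
if count == 1 then
    redis.call('EXPIRE', KEYS[1], ARGV[1])
end
return count
"""


def allow_request(client, key, limit, window_seconds):
    """Return True if this request is within `limit` for the current window.

    `client` is assumed to be a redis-py style client; `key` identifies
    the caller, e.g. "rl:<api key>".
    """
    count = client.eval(FIXED_WINDOW_LUA, 1, key, window_seconds)
    return int(count) <= limit
```

Because the window is tracked by the key's expiry inside Redis, the API instances do not compare their own clocks. Keying on an API key or user ID instead of the source IP, where one is available, reduces false positives for clients behind a shared NAT.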

Example

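from redis_rate_limit import RateLimit, TooManyRequests

try:
    # At most 100 requests per 60-second window for this client.
    with RateLimit(resource='api', client='1.2.3.4',
                   max_requests=100, expire=60):
        process_request()  # placeholder for the application's handler
except TooManyRequests:
    reject_request()  # placeholder: e.g. respond with HTTP 429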

This example uses Redis to enforce a per-client rate limit of 100 requests per minute, protecting backend resources from abuse.

Additional context

Applying the same rate limiting approach consistently across services keeps client-facing behavior predictable and makes the limits easier to reason about. Review the configured limits regularly against observed traffic and adjust them as usage changes.