How rate limiting protects APIs

· Category: API & REST

Short answer

Rate limiting caps the number of requests a client can make within a given time window. This protects the API from overload and keeps it available to other clients.

Steps

  1. Choose an algorithm: fixed window, sliding window log, sliding window counter, or token bucket.
  2. Set limits based on user tier, endpoint cost, and infrastructure capacity.
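  3. Return rate limit headers such as X-RateLimit-Limit and X-RateLimit-Remaining on every response so clients can pace themselves. These names are a widespread convention rather than a formal standard.
  4. Respond with 429 Too Many Requests when limits are exceeded, and include a Retry-After header telling the client how long to wait.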
  5. Monitor rate limit violations to detect abuse or misconfigured clients.
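
The steps above can be sketched with a token bucket, one of the algorithms listed in step 1. This is a minimal, single-process illustration and not a production implementation: the class name TokenBucket, the handle_request function, and the injectable clock parameter are assumptions made for the example, and the headers follow the conventional names from step 3.

```python
import math
import time
from typing import Callable, Dict, Tuple


class TokenBucket:
    """In-memory token bucket: allows bursts up to `capacity`,
    refilled continuously at `refill_rate` tokens per second."""

    def __init__(self, capacity: int, refill_rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.clock = clock                # injectable so tests can control time
        self.tokens = float(capacity)     # start with a full bucket
        self.updated = clock()

    def _refill(self) -> None:
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.updated = now

    def try_acquire(self) -> bool:
        """Consume one token if available; return False when the bucket is empty."""
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

    def retry_after(self) -> int:
        """Whole seconds until at least one token is available again."""
        missing = max(0.0, 1 - self.tokens)
        return math.ceil(missing / self.refill_rate)


def handle_request(bucket: TokenBucket) -> Tuple[int, Dict[str, str]]:
    """Hypothetical handler: returns (status code, rate limit headers)."""
    allowed = bucket.try_acquire()
    headers = {
        "X-RateLimit-Limit": str(bucket.capacity),
        "X-RateLimit-Remaining": str(int(bucket.tokens)),
    }
    if allowed:
        return 200, headers
    # Limit exceeded: 429 plus a hint for when to retry (step 4).
    headers["Retry-After"] = str(bucket.retry_after())
    return 429, headers
```

In a real service there would be one bucket per client key (API key, user ID, or IP address), and each 429 would also be counted or logged to support the monitoring in step 5.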

Tips

  • Keep rate limit counters in a shared store such as Redis when running multiple API instances, so the limit applies across all instances rather than per instance.
  • Differentiate between burst and sustained rate limits.
  • Provide higher limits for authenticated users than anonymous traffic.
  • Degrade gracefully where possible, for example by slowing responses or serving cached data, rather than hard-blocking legitimate users.
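
As a rough illustration of the burst/sustained and per-tier tips, the sketch below uses a sliding window log that checks two windows per client. The tier names, the limit values, and the SlidingWindowLog class are hypothetical, and the in-memory dictionary stands in for the shared store that a multi-instance deployment would need.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Tuple

# Hypothetical tiers: (burst limit per 1 s, sustained limit per 60 s).
TIER_LIMITS: Dict[str, Tuple[int, int]] = {
    "anonymous": (2, 20),
    "authenticated": (10, 300),
}
BURST_WINDOW = 1.0       # seconds
SUSTAINED_WINDOW = 60.0  # seconds


class SlidingWindowLog:
    """Keeps a timestamp log per client and enforces two windows at once."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self.clock = clock
        # In-memory only; a shared store would replace this across instances.
        self.log: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_id: str, tier: str) -> bool:
        burst_limit, sustained_limit = TIER_LIMITS[tier]
        now = self.clock()
        stamps = self.log[client_id]
        # Drop timestamps that have aged out of the longest window.
        while stamps and now - stamps[0] >= SUSTAINED_WINDOW:
            stamps.popleft()
        # Count requests inside the short burst window.
        recent = sum(1 for t in stamps if now - t < BURST_WINDOW)
        if recent >= burst_limit or len(stamps) >= sustained_limit:
            return False
        stamps.append(now)   # only accepted requests are recorded
        return True
```

With these example values, an anonymous client can send two requests in any one-second span but no more than twenty per minute, while an authenticated client gets the larger allowance.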

Common issues

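  • Race conditions in distributed rate counters, where non-atomic read-then-write updates let concurrent requests overshoot the limit.
  • Clients ignoring 429 responses and retrying immediately without backoff.
  • Limits set too high, or missing on expensive endpoints, so traffic spikes overload downstream services and cause cascading failures.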
  • Difficulty debugging rate limit hits when headers are missing or unclear.
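
On the client side, the immediate-retry problem is usually addressed by honoring Retry-After and otherwise backing off exponentially with jitter. The helper below is a sketch under stated assumptions: the name retry_delay, the base and cap defaults, and the plain-dictionary header lookup are illustrative, and only the delay-in-seconds form of Retry-After is handled (the HTTP-date form falls through to backoff).

```python
import random
from typing import Mapping, Optional


def retry_delay(attempt: int, headers: Mapping[str, str],
                base: float = 0.5, cap: float = 30.0,
                rng: Optional[random.Random] = None) -> float:
    """Seconds to wait before retrying after a 429 response.

    attempt is 0 for the first retry, 1 for the second, and so on.
    """
    retry_after = headers.get("Retry-After", "").strip()
    if retry_after.isdigit():
        # The server said how long to wait; trust it over our own guess.
        return float(retry_after)
    if rng is None:
        rng = random.Random()
    # Exponential backoff with full jitter: a random delay between 0 and
    # min(cap, base * 2^attempt), so clients do not retry in lockstep.
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0, ceiling)
```

A caller would sleep for the returned number of seconds before resending, and give up after a fixed number of attempts.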

Example

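curl -i -X GET https://api.example.com/users \
  -H "Accept: application/json" \
  -H "Authorization: Bearer $TOKEN"

This curl command sends an authenticated GET request. The -i flag prints the response headers, so if the API returns rate limit headers such as X-RateLimit-Limit and X-RateLimit-Remaining, they appear alongside the body. Once the limit is exceeded, the response should be 429 Too Many Requests with a Retry-After header.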