Rate Limiter

Problem

Design a rate limiter that restricts the number of requests a user/service can make in a given time window.


Functional Requirements

  • Allow N requests per user per time window (e.g., 100 req/min)
  • Return HTTP 429 (Too Many Requests) when the limit is exceeded
  • Support different limits per API endpoint / user tier

Non-Functional Requirements

  • Low latency (rate check must not slow down request path)
  • Highly available (limiter failure ≠ service failure)
  • Distributed — works across multiple servers

Algorithms

Token Bucket

  • Bucket holds max N tokens. Tokens refill at fixed rate.
  • Each request consumes 1 token. If empty → reject.
  • Pros: Handles bursts. Used by: AWS, Stripe.
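
The refill-and-consume logic above can be sketched in a few lines (class and parameter names are mine, not from any particular library; a pluggable clock makes it testable):

```python
import time

class TokenBucket:
    """Bucket holds at most `capacity` tokens, refilled at `refill_rate` tokens/sec."""

    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = float(capacity)  # start full: allows an initial burst
        self.clock = clock
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1  # each request consumes one token
            return True
        return False
```

For 100 req/min with bursts up to 100, that would be `TokenBucket(capacity=100, refill_rate=100 / 60)`.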

Sliding Window Log

  • Track a timestamp per request in the window [now - 1 min, now]
  • Count timestamps in the window → if at the limit, reject
  • Pros: Accurate. Cons: Memory per user (one timestamp per request).
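
The timestamp-log approach above, sketched with a deque (names are mine; eviction happens lazily on each check):

```python
import time
from collections import deque

class SlidingWindowLog:
    """Keep one timestamp per request; count those inside the last `window` seconds."""

    def __init__(self, limit, window, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.log = deque()

    def allow(self):
        now = self.clock()
        # Evict timestamps that have fallen out of the window.
        while self.log and self.log[0] <= now - self.window:
            self.log.popleft()
        if len(self.log) < self.limit:
            self.log.append(now)
            return True
        return False
```

The memory cost is visible here: a user at 100 req/min holds ~100 timestamps at all times.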

Fixed Window Counter

  • Simple counter reset every window
  • Con: Boundary burst problem (2× limit possible at window edge)
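
A sketch of the counter, showing the boundary problem concretely: with a limit of 2 per 60 s, two requests at t=59 and two more at t=61 all succeed, so 4 requests (2× the limit) land within ~2 seconds:

```python
import time

class FixedWindowCounter:
    """One counter per window; the counter resets when the window index changes."""

    def __init__(self, limit, window, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.current_window = None
        self.count = 0

    def allow(self):
        w = int(self.clock() // self.window)  # which window are we in?
        if w != self.current_window:
            self.current_window = w
            self.count = 0  # new window: counter resets
        if self.count < self.limit:
            self.count += 1
            return True
        return False
```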

High-Level Design

Client → API Gateway → Rate Limiter Middleware → Service
                              ↓
                         Redis (counters)
  • Redis with atomic INCR + EXPIRE for counters
  • Rate limiter as middleware in API Gateway (e.g., Kong, Nginx)
  • Rules stored in config / DB, cached locally
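
The INCR + EXPIRE pattern can be sketched as follows. Note that INCR is atomic but the INCR-then-EXPIRE pair is not (two round trips), which is why production setups often wrap it in a Lua script. `FakeRedis` below is a tiny in-memory stand-in I wrote for illustration, not a real client; the key scheme `rl:<user>:<window>` is also an assumption:

```python
import time

class FakeRedis:
    """In-memory stand-in for the two Redis commands this pattern needs."""

    def __init__(self):
        self.store = {}  # key -> [count, expires_at or None]

    def incr(self, key):
        entry = self.store.setdefault(key, [0, None])
        if entry[1] is not None and time.time() >= entry[1]:
            entry[0], entry[1] = 0, None  # key expired: start over
        entry[0] += 1
        return entry[0]

    def expire(self, key, seconds):
        self.store.setdefault(key, [0, None])[1] = time.time() + seconds

def allowed(r, user_id, limit=100, window=60):
    # One counter per (user, window) pair; old keys expire on their own.
    key = f"rl:{user_id}:{int(time.time() // window)}"
    count = r.incr(key)
    if count == 1:
        r.expire(key, window)  # first hit in this window sets the TTL
    return count <= limit
```

With a real client the same two calls (`r.incr(key)`, `r.expire(key, window)`) apply; only the atomicity caveat changes the production version.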

Key Tradeoffs

Decision       Option A                        Option B
Algorithm      Token Bucket (burst-friendly)   Sliding Window (accurate)
Storage        Redis (distributed)             Local memory (fast, no sync)
Failure mode   Fail open (allow)               Fail closed (reject)

Interview Talking Points

  • Token bucket vs sliding window: Stripe uses token bucket
  • Race condition on counter → Redis Lua scripts for atomicity
  • Distributed rate limiting → each node syncs to Redis, or gossip protocol
  • Headers: X-RateLimit-Limit, X-RateLimit-Remaining, Retry-After
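
A small sketch of building those headers (the helper and its parameters are mine; the header names are the de facto convention listed above):

```python
import time

def rate_limit_headers(limit, remaining, reset_epoch):
    """Advisory headers returned alongside both 200s and 429s."""
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
    }
    if remaining <= 0:
        # Tell the client how many seconds to back off before retrying.
        headers["Retry-After"] = str(max(0, int(reset_epoch - time.time())))
    return headers
```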

Notes

<!-- Add as you study Gaurav Sen's session -->