# Rate Limiter

## Problem
Design a rate limiter that restricts the number of requests a user/service can make in a given time window.
## Functional Requirements
- Allow N requests per user per time window (e.g., 100 req/min)
- Return HTTP 429 (Too Many Requests) when limit exceeded
- Support different limits per API endpoint / user tier
## Non-Functional Requirements
- Low latency (rate check must not slow down request path)
- Highly available (limiter failure ≠ service failure)
- Distributed — works across multiple servers
## Algorithms

### Token Bucket
- Bucket holds max N tokens. Tokens refill at fixed rate.
- Each request consumes 1 token. If empty → reject.
- Pros: Handles bursts. Used by: AWS, Stripe.
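The refill-and-consume logic above can be sketched as a small in-process class (a minimal illustration; class and parameter names are my own, not from any particular library):

```python
import time

class TokenBucket:
    """Bucket holds up to `capacity` tokens, refilled at `refill_rate` tokens/sec."""

    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity
        self.refill_rate = refill_rate       # tokens added per second
        self.tokens = float(capacity)        # start full
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1   # each request consumes one token
            return True
        return False           # bucket empty → reject
```

Because the bucket starts full, a client can burst up to `capacity` requests at once, then is held to the steady refill rate.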
### Sliding Window Log
- Track a timestamp per request in the window [now − 1 min, now]
- Count timestamps in the window → if at the limit, reject
- Pros: Exact. Cons: Memory per user grows with request rate (one timestamp each).
- Variant: a sliding window *counter* approximates this with two fixed-window counts, trading a little accuracy for O(1) memory.
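A minimal sketch of the log variant, using a deque of timestamps (names are illustrative; `now` is injectable to make the behavior easy to test):

```python
import time
from collections import deque

class SlidingWindowLog:
    """Keep one timestamp per accepted request; count those inside the window."""

    def __init__(self, limit: int, window_sec: float):
        self.limit = limit
        self.window = window_sec
        self.log = deque()   # timestamps of accepted requests, oldest first

    def allow(self, now=None) -> bool:
        now = time.monotonic() if now is None else now
        # Evict timestamps that fell out of [now - window, now]
        while self.log and self.log[0] <= now - self.window:
            self.log.popleft()
        if len(self.log) < self.limit:
            self.log.append(now)
            return True
        return False
```

The eviction loop is where the per-user memory cost shows up: every accepted request lives in the deque until it ages out of the window.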
### Fixed Window Counter
- Simple counter reset every window
- Con: Boundary burst problem (2× limit possible at window edge)
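A sketch that makes the boundary burst concrete (illustrative names; `now` is passed in explicitly so the edge case is easy to demonstrate):

```python
import math

class FixedWindowCounter:
    """One counter per window; the counter resets when a new window starts."""

    def __init__(self, limit: int, window_sec: float):
        self.limit = limit
        self.window = window_sec
        self.current_window = None
        self.count = 0

    def allow(self, now: float) -> bool:
        window = math.floor(now / self.window)
        if window != self.current_window:
            self.current_window, self.count = window, 0   # reset at the boundary
        if self.count < self.limit:
            self.count += 1
            return True
        return False
```

With a limit of N per minute, N requests just before t = 60 s and N more just after all pass, so up to 2N requests can land in a span of a few seconds across the boundary.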
## High-Level Design

Client → API Gateway → Rate Limiter Middleware → Service
                                  ↓
                          Redis (counters)

- Redis with atomic `INCR` + `EXPIRE` for counters
- Rate limiter runs as middleware in the API Gateway (e.g., Kong, Nginx)
- Rules stored in config / DB, cached locally on each node
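The `INCR` + `EXPIRE` pattern can be sketched as follows. This is a hedged illustration: `MemoryStore` is a tiny in-memory stand-in for the two Redis commands used (a real deployment would pass a Redis client exposing `incr`/`expire`), and the key scheme is one of several reasonable choices:

```python
import time

class MemoryStore:
    """Minimal in-memory stand-in for the two Redis commands used below."""
    def __init__(self):
        self.data = {}
    def incr(self, key):
        self.data[key] = self.data.get(key, 0) + 1
        return self.data[key]
    def expire(self, key, seconds):
        pass   # TTL omitted here; Redis would evict the key after `seconds`

def check_rate_limit(store, user_id: str, limit: int, window_sec: int, now=None) -> bool:
    """Fixed-window check: one counter per (user, window), INCR then EXPIRE."""
    now = time.time() if now is None else now
    key = f"rate:{user_id}:{int(now) // window_sec}"
    count = store.incr(key)            # atomic in Redis; creates the key at 1
    if count == 1:
        store.expire(key, window_sec)  # old windows expire on their own
    return count <= limit
```

One caveat worth raising in an interview: if the process dies between `INCR` and `EXPIRE`, the key is left with no TTL. Wrapping both commands in a single Lua script closes that gap, which is the atomicity point in the talking points below.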
## Key Tradeoffs
| Decision | Option A | Option B |
|---|---|---|
| Algorithm | Token Bucket (burst-friendly) | Sliding Window (accurate) |
| Storage | Redis (distributed) | Local memory (fast, no sync) |
| Failure mode | Fail open (allow) | Fail closed (reject) |
## Interview Talking Points
- Token bucket vs sliding window log: token bucket tolerates bursts (Stripe uses it); the log is exact but costs memory
- Race condition on counter → Redis Lua scripts for atomicity
- Distributed rate limiting → each node syncs to Redis, or gossip protocol
- Headers: `X-RateLimit-Limit`, `X-RateLimit-Remaining`, `Retry-After`
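The response headers above can be assembled with a small helper (a sketch; the function name and `retry_after_sec` parameter are my own, and `Retry-After` is only meaningful on a 429 response):

```python
def rate_limit_headers(limit: int, remaining: int, retry_after_sec=None) -> dict:
    """Build standard rate-limit response headers; Retry-After only when throttled."""
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
    }
    if retry_after_sec is not None:
        headers["Retry-After"] = str(retry_after_sec)   # seconds until retry is allowed
    return headers
```

Clients that respect `Retry-After` back off instead of hammering the limiter, which smooths load during throttling.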