Rate Limiter

Problem

Design a rate limiter that restricts the number of requests a user/service can make in a given time window.


Functional Requirements

  • Allow N requests per user per time window (e.g., 100 req/min)
  • Return HTTP 429 (Too Many Requests) when the limit is exceeded
  • Support different limits per API endpoint / user tier

Non-Functional Requirements

  • Low latency (rate check must not slow down request path)
  • Highly available (limiter failure ≠ service failure)
  • Distributed — works across multiple servers

Algorithms

Token Bucket

  • Bucket holds max N tokens. Tokens refill at fixed rate.
  • Each request consumes 1 token. If empty → reject.
  • Pros: Handles bursts. Used by: AWS, Stripe.
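
The refill-and-consume logic above can be sketched in a few lines (class and parameter names are mine, not from any particular library; a pluggable clock makes it testable):

```python
import time

class TokenBucket:
    """Bucket holds at most `capacity` tokens, refilled at `refill_rate` tokens/sec."""

    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = float(capacity)  # start full: allows an initial burst
        self.clock = clock
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1  # each request consumes one token
            return True
        return False
```

For 100 req/min with bursts up to 100, that would be `TokenBucket(capacity=100, refill_rate=100 / 60)`.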

Sliding Window Log

  • Track a timestamp per request in the window [now - 1 min, now]
  • Count timestamps in the window → if at the limit, reject
  • Pros: Accurate. Cons: Memory per user (one timestamp per request).
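
The timestamp-log approach above, sketched with a deque (names are mine; eviction happens lazily on each check):

```python
import time
from collections import deque

class SlidingWindowLog:
    """Keep one timestamp per request; count those inside the last `window` seconds."""

    def __init__(self, limit, window, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.log = deque()

    def allow(self):
        now = self.clock()
        # Evict timestamps that have fallen out of the window.
        while self.log and self.log[0] <= now - self.window:
            self.log.popleft()
        if len(self.log) < self.limit:
            self.log.append(now)
            return True
        return False
```

The memory cost is visible here: a user at 100 req/min holds ~100 timestamps at all times.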

Fixed Window Counter

  • Simple counter reset every window
  • Con: Boundary burst problem (2× limit possible at window edge)
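
A sketch of the counter, showing the boundary problem concretely: with a limit of 2 per 60 s, two requests at t=59 and two more at t=61 all succeed, so 4 requests (2× the limit) land within ~2 seconds:

```python
import time

class FixedWindowCounter:
    """One counter per window; the counter resets when the window index changes."""

    def __init__(self, limit, window, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.current_window = None
        self.count = 0

    def allow(self):
        w = int(self.clock() // self.window)  # which window are we in?
        if w != self.current_window:
            self.current_window = w
            self.count = 0  # new window: counter resets
        if self.count < self.limit:
            self.count += 1
            return True
        return False
```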

High-Level Design

Client → API Gateway → Rate Limiter Middleware → Service
                              ↓
                         Redis (counters)
  • Redis with atomic INCR + EXPIRE for counters
  • Rate limiter as middleware in API Gateway (e.g., Kong, Nginx)
  • Rules stored in config / DB, cached locally
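
The INCR + EXPIRE pattern can be sketched as follows. Note that INCR is atomic but the INCR-then-EXPIRE pair is not (two round trips), which is why production setups often wrap it in a Lua script. `FakeRedis` below is a tiny in-memory stand-in I wrote for illustration, not a real client; the key scheme `rl:<user>:<window>` is also an assumption:

```python
import time

class FakeRedis:
    """In-memory stand-in for the two Redis commands this pattern needs."""

    def __init__(self):
        self.store = {}  # key -> [count, expires_at or None]

    def incr(self, key):
        entry = self.store.setdefault(key, [0, None])
        if entry[1] is not None and time.time() >= entry[1]:
            entry[0], entry[1] = 0, None  # key expired: start over
        entry[0] += 1
        return entry[0]

    def expire(self, key, seconds):
        self.store.setdefault(key, [0, None])[1] = time.time() + seconds

def allowed(r, user_id, limit=100, window=60):
    # One counter per (user, window) pair; old keys expire on their own.
    key = f"rl:{user_id}:{int(time.time() // window)}"
    count = r.incr(key)
    if count == 1:
        r.expire(key, window)  # first hit in this window sets the TTL
    return count <= limit
```

With a real client the same two calls (`r.incr(key)`, `r.expire(key, window)`) apply; only the atomicity caveat changes the production version.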

Key Tradeoffs

Decision       Option A                        Option B
Algorithm      Token Bucket (burst-friendly)   Sliding Window (accurate)
Storage        Redis (distributed)             Local memory (fast, no sync)
Failure mode   Fail open (allow)               Fail closed (reject)

Interview Talking Points

  • Token bucket vs sliding window: Stripe uses token bucket
  • Race condition on counter → Redis Lua scripts for atomicity
  • Distributed rate limiting → each node syncs to Redis, or gossip protocol
  • Headers: X-RateLimit-Limit, X-RateLimit-Remaining, Retry-After
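
A small sketch of building those headers (the helper and its parameters are mine; the header names are the de facto convention listed above):

```python
import time

def rate_limit_headers(limit, remaining, reset_epoch):
    """Advisory headers returned alongside both 200s and 429s."""
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
    }
    if remaining <= 0:
        # Tell the client how many seconds to back off before retrying.
        headers["Retry-After"] = str(max(0, int(reset_epoch - time.time())))
    return headers
```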

Notes

<!-- Add as you study Gaurav Sen's session -->