Rate Limiting Your API: Best Practices Every Developer Should Know
An unprotected API is an open invitation: for abuse, for scraping, for accidental self-inflicted outages. Rate limiting is one of the highest-value things you can add to any API, and most developers implement it wrong the first time.
Last updated: July 6, 2026 · 23 min read · Python & Node.js examples
What rate limiting protects you from
Cost explosion
One runaway client hammers your API and your database bill triples overnight.
Brute force attacks
Login endpoints, password resets, OTP fields: all need per-IP limiting.
Scraping and abuse
Competitors or bots harvesting your data faster than real users ever could.
Unintentional DoS
A buggy client in a retry loop takes down the service for everyone else.
Every API that's ever been exposed to the internet has been abused in some way, usually faster than the developer expected. The question isn't whether you'll need rate limiting, it's whether you'll add it before or after your first incident.
Rate limiting sits at the intersection of security, reliability, and fairness. It keeps your infrastructure stable under load, prevents individual clients from monopolizing resources, and gives you a lever to pull when something goes wrong. Done well, it's largely invisible to legitimate users and a strong wall against everyone else.
This guide walks through everything from choosing the right algorithm to the exact response headers you should return, with working code you can adapt for your stack.
The four rate limiting algorithms compared
Before writing a line of code, you need to pick an algorithm. Each handles traffic bursts differently, and the right choice depends on what you're protecting.
Fixed window
How it works: Count requests in a fixed time bucket (e.g. 0-60s, 60-120s). Reset the counter when the window ends.
Pro: Simplest to implement. Easy to reason about.
Con: Boundary spike problem. A user can send 2× the limit by hitting the end of one window and the start of the next.
Best for: Internal admin APIs, low-risk endpoints
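The boundary spike is easiest to see in code. The following is a minimal in-memory sketch, not a production limiter: the class name `FixedWindowLimiter`, the `now` parameter (passed in so the example is deterministic), and the limit of 5 per 60 seconds are all assumptions made for illustration.

```python
from collections import defaultdict


class FixedWindowLimiter:
    """In-memory fixed window counter (single process, illustration only)."""

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window_seconds = window_seconds
        # (client key, window index) -> request count.
        # Old windows are never evicted here; a real store would expire them.
        self.counts = defaultdict(int)

    def allow(self, key: str, now: float) -> bool:
        window_index = int(now // self.window_seconds)
        bucket = (key, window_index)
        if self.counts[bucket] >= self.limit:
            return False
        self.counts[bucket] += 1
        return True


limiter = FixedWindowLimiter(limit=5, window_seconds=60)

# Five requests in the last second of window 0, then five more in the
# first second of window 1: all ten are accepted within about 1.5 seconds.
late = [limiter.allow("client-a", now=59.0) for _ in range(5)]
early = [limiter.allow("client-a", now=60.5) for _ in range(5)]
accepted = sum(late) + sum(early)
print(accepted)  # 10, twice the nominal limit of 5 per minute
```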
Sliding window log
How it works: Store a timestamp for every request. Count only requests within the last N seconds from now. Drop old timestamps.
Pro: Precise, with no boundary spikes. Accurate per-second enforcement.
Con: Memory-heavy at scale. Storing every timestamp for millions of users adds up.
Best for: High-security endpoints, authentication, payment flows
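As a rough sketch of the log approach, assuming a single process and an in-memory `deque` per client (the class name `SlidingWindowLog` and the injected `now` argument are illustrative, not from any library):

```python
from collections import defaultdict, deque


class SlidingWindowLog:
    """In-memory sliding window log: one timestamp stored per accepted request."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window_seconds = window_seconds
        self.log = defaultdict(deque)  # client key -> timestamps, oldest first

    def allow(self, key: str, now: float) -> bool:
        timestamps = self.log[key]
        cutoff = now - self.window_seconds
        # Drop timestamps that have fallen out of the window.
        while timestamps and timestamps[0] <= cutoff:
            timestamps.popleft()
        if len(timestamps) >= self.limit:
            return False
        timestamps.append(now)
        return True


log_limiter = SlidingWindowLog(limit=5, window_seconds=60)
first_burst = [log_limiter.allow("client-a", now=59.0) for _ in range(5)]
# Unlike a fixed window, crossing the 60s mark does not reset anything.
just_after_boundary = log_limiter.allow("client-a", now=60.5)
# Only once the old timestamps age out (after 119.0) is there room again.
one_window_later = log_limiter.allow("client-a", now=119.5)
```

The memory cost mentioned above is visible here: each client holds up to `limit` timestamps, where the counter-based approaches hold one or two integers.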
Sliding window counter
How it works: A hybrid. Track the current and previous window counts, and weight the previous count by how much of that window still falls inside the sliding window.
Pro: Good approximation of a sliding window. Memory-efficient: just two counters per client.
Con: Slightly approximate, not perfectly precise at boundaries.
Best for: Most public APIs. Best balance of accuracy and efficiency.
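The weighting is simple arithmetic. Here is a small worked example; the function name and the sample counts are made up for illustration.

```python
def sliding_window_estimate(
    current_count: int,
    previous_count: int,
    elapsed_seconds: float,
    window_seconds: float,
) -> float:
    """Estimate requests in the last `window_seconds` from two fixed-window counters."""
    # Fraction of the previous window that still overlaps the sliding window.
    previous_weight = 1 - (elapsed_seconds / window_seconds)
    return current_count + previous_count * previous_weight


# 15 seconds into the current 60-second window, 75% of the previous
# window still counts: 20 + 80 * 0.75 = 80.
estimate = sliding_window_estimate(
    current_count=20, previous_count=80, elapsed_seconds=15, window_seconds=60
)
print(estimate)  # 80.0
```

The approximation comes from assuming the previous window's requests were spread evenly across it. If they all arrived in its final seconds, the estimate undercounts.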
Token bucket
How it works: Each client has a bucket that fills at a constant rate (e.g. 10 tokens/second, max 100). Each request spends one or more tokens. Requests are rejected when the bucket is empty.
Pro: Allows controlled bursts. Natural and flexible. Used by large APIs such as AWS API Gateway and Stripe.
Con: Slightly harder to explain to end users. Burst size needs careful tuning.
Best for: Production APIs where controlled bursts are acceptable
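To make the burst behavior concrete, here is a minimal single-process sketch using the 10 tokens/second, max 100 figures from the description. The `TokenBucket` dataclass and its field names are assumptions for this example only.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    capacity: float           # maximum burst size
    refill_per_second: float  # sustained rate
    tokens: float             # tokens currently available
    last_refill: float        # timestamp of the last refill, in seconds

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Refill based on elapsed time, capped at capacity.
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last_refill = now
        if self.tokens < cost:
            return False
        self.tokens -= cost
        return True


bucket = TokenBucket(capacity=100, refill_per_second=10, tokens=100, last_refill=0.0)

# A full bucket absorbs a burst of 100 requests at once...
burst = sum(bucket.allow(now=0.0) for _ in range(120))
# ...after which the client is held to the refill rate: 10 more one second later.
after_one_second = sum(bucket.allow(now=1.0) for _ in range(120))
print(burst, after_one_second)  # 100 10
```

The two numbers to tune are `capacity` (how large a burst you tolerate) and `refill_per_second` (the sustained rate you can afford).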
Which to pick? For most APIs, start with the sliding window counter. It's accurate enough for almost all use cases, memory-efficient, and straightforward to implement with Redis. Use token bucket if your users have legitimate reasons to burst (batch processing, bulk uploads). Use sliding window log only if you need exact precision for security-critical endpoints.
What to rate limit (and what not to)
Not every endpoint needs the same treatment. Applying a blanket limit everywhere is lazy and often counterproductive: it blocks legitimate use cases while not specifically targeting high-risk ones. Think in tiers:
🔴 Strict limits: security-critical
- Login / password check endpoints
- Password reset request + OTP verification
- Account creation
- Email verification resend
- Any endpoint that triggers an email, SMS, or notification
Suggested limit: 3-10 requests / minute per IP, 5-20 / hour per user
These are the primary targets for brute force. Low limits are non-negotiable.
🟡 Moderate limits: data endpoints
- Search endpoints
- List / browse endpoints (products, users, posts)
- Export / download endpoints
- AI-powered or expensive compute endpoints
Suggested limit: 60-300 requests / minute per user or API key
Protects database and compute resources from hammering without blocking normal use.
🟢 Light limits or none: static / cheap
- Health check endpoints (/health, /ping)
- Static asset endpoints
- Publicly cached read endpoints
- Webhook endpoints (rate limit by payload, not by request count)
Suggested limit: High limit or none, but still consider IP-level global limits
Over-limiting here breaks monitoring, CDNs, and uptime checks.
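One way to encode these tiers is a small prefix table consulted before the limiter runs. This is a sketch under assumptions: the path prefixes are hypothetical, and the numbers are simply picked from within the suggested ranges above.

```python
from typing import NamedTuple, Optional


class Limit(NamedTuple):
    requests: int
    window_seconds: int


# Values chosen from the suggested ranges; tune them for your own traffic.
STRICT = Limit(requests=5, window_seconds=60)
MODERATE = Limit(requests=120, window_seconds=60)

# Hypothetical route prefixes, checked in order. None means "no per-endpoint limit".
ENDPOINT_TIERS = [
    ("/health", None),
    ("/ping", None),
    ("/auth/", STRICT),
    ("/api/search", MODERATE),
    ("/api/export", MODERATE),
]
DEFAULT_LIMIT = MODERATE


def limit_for(path: str) -> Optional[Limit]:
    """Return the limit for a (normalized) request path, or None if exempt."""
    for prefix, limit in ENDPOINT_TIERS:
        if path.startswith(prefix):
            return limit
    return DEFAULT_LIMIT
```

For example, `limit_for("/auth/login")` returns the strict tier, while `limit_for("/health")` returns `None` so uptime checks are never throttled by this layer.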
Choosing your rate limit key
The rate limit key is what you're counting against. Getting this wrong means your limits either don't stop abuse (the key is trivial for an attacker to vary) or block legitimate users (many real users end up sharing one key).
IP address
Use for: Unauthenticated endpoints, login forms, registration
Watch out: Corporate users behind NAT. An entire office shares one IP, so blocking it locks out hundreds of legitimate users.
key = f"rate:ip:{client_ip}:{endpoint}"
User ID / API key
Use for: Authenticated endpoints. Most fair: limits are per paying customer.
Watch out: Doesn't protect pre-auth endpoints. Attackers can create many accounts.
key = f"rate:user:{user_id}:{endpoint}"
IP + endpoint
Use for: Focused protection per route. Lets users call one endpoint freely while limiting another.
Watch out: More keys to manage. Needs consistent endpoint normalization.
key = f"rate:ip:{client_ip}:POST:/auth/login"
User ID + endpoint
Use for: Authenticated APIs with different limits per feature.
Watch out: Requires auth to be resolved before rate limiting, which adds middleware complexity.
key = f"rate:user:{user_id}:GET:/api/search"
API key tier
Use for: SaaS products with free/pro/enterprise plans. Limits tied to subscription.
Watch out: Requires knowing the plan at middleware time. Needs caching to avoid DB hits per request.
key = f"rate:apikey:{api_key}"
For most production APIs, combine strategies: use IP-based limits for unauthenticated endpoints and user ID-based limits for authenticated ones. This covers both attack vectors without penalizing corporate users.
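A combined strategy can be as small as one helper that picks the subject of the key. The function below is an illustrative sketch: `rate_limit_key` and its parameters are hypothetical names, and it assumes the route has already been normalized.

```python
from typing import Optional


def rate_limit_key(client_ip: str, user_id: Optional[str], method: str, route: str) -> str:
    """Key on the user once authenticated, otherwise fall back to the client IP."""
    subject = f"user:{user_id}" if user_id else f"ip:{client_ip}"
    return f"rate:{subject}:{method.upper()}:{route}"


anonymous_key = rate_limit_key("203.0.113.7", None, "post", "/auth/login")
# -> "rate:ip:203.0.113.7:POST:/auth/login"
authenticated_key = rate_limit_key("203.0.113.7", "42", "get", "/api/search")
# -> "rate:user:42:GET:/api/search"
```

Because authenticated traffic is keyed on the user, an office full of signed-in users behind one NAT address no longer shares a single counter.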
Implementation: Redis + Python / Node.js
Redis is the standard backing store for rate limiting because it's fast, supports atomic operations, and has native TTL (expiry) support. Here are working implementations for the two most common stacks.
Sliding window counter in Python (FastAPI / Flask)
import redis
import time
from fastapi import Request, HTTPException

r = redis.Redis(host="localhost", port=6379, decode_responses=True)


def is_rate_limited(
    key: str,
    limit: int,
    window_seconds: int,
) -> tuple[bool, int, int]:
    """
    Sliding window counter using two Redis keys.
    Returns: (is_limited, current_count, remaining)
    """
    now = int(time.time())
    current_window = now // window_seconds
    previous_window = current_window - 1
    current_key = f"{key}:{current_window}"
    previous_key = f"{key}:{previous_window}"

    # Read both counters in one round trip
    pipe = r.pipeline()
    pipe.get(current_key)
    pipe.get(previous_key)
    current_count_raw, previous_count_raw = pipe.execute()
    current_count = int(current_count_raw or 0)
    previous_count = int(previous_count_raw or 0)

    # Weight previous window by how much of it is still "in" the window
    elapsed_in_window = now % window_seconds
    previous_weight = 1 - (elapsed_in_window / window_seconds)
    estimated_count = current_count + (previous_count * previous_weight)

    if estimated_count >= limit:
        return True, int(estimated_count), 0

    # Increment current window counter.
    # Note: the check above and this increment are separate round trips, so
    # concurrent requests can slightly overshoot the limit. Move the logic
    # into a Lua script if you need strict atomicity.
    pipe = r.pipeline()
    pipe.incr(current_key)
    pipe.expire(current_key, window_seconds * 2)
    pipe.execute()

    remaining = max(0, limit - int(estimated_count) - 1)
    return False, int(estimated_count) + 1, remaining
# FastAPI middleware
from fastapi import FastAPI
from fastapi.responses import JSONResponse

app = FastAPI()


@app.middleware("http")
async def rate_limit_middleware(request: Request, call_next):
    # request.client can be None, so fall back to a shared placeholder key
    client_ip = request.client.host if request.client else "unknown"
    path = request.url.path

    # Strict limit on auth endpoints, looser global limit everywhere else
    if path.startswith("/auth/"):
        scope, limit = "auth", 10
    else:
        scope, limit = "global", 300
    key = f"rate:ip:{client_ip}:{scope}"
    # Note: this Redis call is synchronous and briefly blocks the event loop
    limited, count, remaining = is_rate_limited(key, limit=limit, window_seconds=60)

    if limited:
        return JSONResponse(
            status_code=429,
            content={"error": "Too many requests. Please slow down."},
            headers={
                "Retry-After": "60",
                "X-RateLimit-Limit": str(limit),
                "X-RateLimit-Remaining": "0",
            },
        )

    response = await call_next(request)
    response.headers["X-RateLimit-Limit"] = str(limit)
    response.headers["X-RateLimit-Remaining"] = str(remaining)
    return response
Token bucket in Node.js (Express)
const Redis = require("ioredis");
const redis = new Redis();

// Token bucket rate limiter.
// Note: the read and the write are separate round trips, so two concurrent
// requests can spend the same token. Use a Lua script for strict atomicity.
async function tokenBucket(key, { capacity, refillRate, refillPeriodMs }) {
  const now = Date.now();
  const bucketKey = `bucket:${key}`;

  // hgetall resolves to an empty object when the key does not exist
  const data = await redis.hgetall(bucketKey);
  let tokens = data?.tokens ? parseFloat(data.tokens) : capacity;
  let lastRefill = data?.lastRefill ? parseInt(data.lastRefill, 10) : now;

  // Refill tokens based on elapsed time
  const elapsed = now - lastRefill;
  const tokensToAdd = (elapsed / refillPeriodMs) * refillRate;
  tokens = Math.min(capacity, tokens + tokensToAdd);
  lastRefill = now;

  if (tokens < 1) {
    // Calculate when the next token will be available
    const msUntilToken = ((1 - tokens) / refillRate) * refillPeriodMs;
    return { allowed: false, remaining: 0, retryAfterMs: Math.ceil(msUntilToken) };
  }

  tokens -= 1;

  // Keep the key for as long as a full refill takes. A shorter TTL would let
  // an emptied bucket expire and come back full before it has really refilled.
  const ttlMs = Math.ceil((capacity / refillRate) * refillPeriodMs);
  await redis
    .multi()
    .hset(bucketKey, "tokens", tokens.toString(), "lastRefill", lastRefill.toString())
    .pexpire(bucketKey, ttlMs)
    .exec();

  return { allowed: true, remaining: Math.floor(tokens), retryAfterMs: 0 };
}
// Express middleware
function rateLimitMiddleware(config) {
  return async (req, res, next) => {
    const key = req.user?.id
      ? `user:${req.user.id}`
      : `ip:${req.ip}`;

    let result;
    try {
      result = await tokenBucket(key, config);
    } catch (err) {
      // Pass Redis errors to the Express error handler instead of hanging the request
      return next(err);
    }

    res.setHeader("X-RateLimit-Limit", config.capacity);
    res.setHeader("X-RateLimit-Remaining", result.remaining);

    if (!result.allowed) {
      res.setHeader("Retry-After", Math.ceil(result.retryAfterMs / 1000));
      return res.status(429).json({ error: "Rate limit exceeded." });
    }
    next();
  };
}

// Apply to Express app
const express = require("express");
const app = express();

// Strict limit for login (loginHandler is your own route handler, defined elsewhere)
app.post(
  "/auth/login",
  rateLimitMiddleware({ capacity: 5, refillRate: 1, refillPeriodMs: 60_000 }),
  loginHandler
);

// General API limit
app.use(
  "/api/",
  rateLimitMiddleware({ capacity: 100, refillRate: 10, refillPeriodMs: 1_000 }),
);
Response headers: tell clients what's happening
Good rate limiting is transparent. Clients shouldn't have to guess how many requests they have left or when they can retry. The standard headers make this information machine-readable, which means good API clients can automatically back off without crashing or spamming retries.
| Header | When to send | Example | Description |
|---|---|---|---|
| X-RateLimit-Limit | Always | 100 | The maximum number of requests allowed in the current window. |
| X-RateLimit-Remaining | Always | 47 | How many requests the client has left before hitting the limit. |
| X-RateLimit-Reset | Always | 1751234567 (Unix timestamp) | When the current window resets and the counter returns to the limit. Send as UTC epoch seconds. |
| Retry-After | Always on 429 responses | 30 (seconds) | Only sent on 429 responses. How long the client should wait before retrying. RFC 6585 makes it optional (a 429 response MAY include it), but clients rely on it, so always send it. |
| X-RateLimit-Policy | Optional | 100;w=60;burst=20;comment="standard" | Describes the full policy in a structured format. Based on an IETF draft, which names the field RateLimit-Policy without the X- prefix. Not widely required yet but good practice. |
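A small helper keeps these headers consistent across every response. This is a sketch, assuming you already know the limit, the remaining count, and the window reset time; the function name and parameters are illustrative.

```python
import math


def rate_limit_headers(
    limit: int,
    remaining: int,
    reset_epoch: float,
    now: float,
    limited: bool,
) -> dict:
    """Build the rate limit headers for one response."""
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
        "X-RateLimit-Reset": str(int(reset_epoch)),
    }
    if limited:
        # Retry-After only belongs on the 429; round up so clients never retry early.
        headers["Retry-After"] = str(max(1, math.ceil(reset_epoch - now)))
    return headers


ok_headers = rate_limit_headers(100, 47, 1751234567, now=1751234540.0, limited=False)
blocked_headers = rate_limit_headers(100, 0, 1751234597, now=1751234567.0, limited=True)
```

Here `blocked_headers["Retry-After"]` is `"30"`, while `ok_headers` carries no `Retry-After` key at all.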
# Example response headers on a normal request
HTTP/1.1 200 OK
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 47
X-RateLimit-Reset: 1751234567
# Example response on a 429
HTTP/1.1 429 Too Many Requests
Content-Type: application/json
Retry-After: 30
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1751234597

{
  "error": "rate_limit_exceeded",
  "message": "You've exceeded 100 requests per minute. Retry after 30 seconds.",
  "retry_after": 30
}
Rate limit tiers and per-user quotas
A single global limit is a blunt instrument. Production APIs almost always need differentiated limits: different rules for different user types, plan levels, or use cases. Here's a pattern for implementing tiered rate limits cleanly.
| Tier | Requests / min | Requests / day | Burst | Who |
|---|---|---|---|---|
| Unauthenticated | 10 | 500 | 15 | Anonymous / public |
| Free tier | 60 | 5,000 | 100 | Signed-up, no subscription |
| Pro | 300 | 50,000 | 500 | Paying individual |
| Business | 1,000 | 200,000 | 2,000 | Team plan |
| Enterprise | Custom | Unlimited | Custom | Contract customers |
| Internal services | None | None | None | Service-to-service (trusted) |
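# Python: look up tier limits from cache/DB
TIER_LIMITS = {
    "anonymous": {"per_minute": 10, "per_day": 500},
    "free": {"per_minute": 60, "per_day": 5_000},
    "pro": {"per_minute": 300, "per_day": 50_000},
    "business": {"per_minute": 1_000, "per_day": 200_000},
    "internal": None,  # no limit
    # Enterprise limits are custom per contract, so load them per customer.
    # Any tier missing from this dict falls back to the "free" limits below.
}


def get_user_limits(user) -> dict | None:
    if user is None:
        return TIER_LIMITS["anonymous"]
    if user.is_internal_service:
        return None  # no limits for trusted services
    return TIER_LIMITS.get(user.subscription_tier, TIER_LIMITS["free"])


# In middleware (fragment: assumes `key` and `request.user` are already resolved)
limits = get_user_limits(request.user)
if limits:
    limited_minute, _, remaining = is_rate_limited(
        key=f"rate:{key}:minute",
        limit=limits["per_minute"],
        window_seconds=60,
    )
    # Only touch the daily counter when the per-minute check passed, so a
    # request rejected by the minute limit is not also counted against the day.
    limited_day = False
    if not limited_minute:
        limited_day, _, _ = is_rate_limited(
            key=f"rate:{key}:day",
            limit=limits["per_day"],
            window_seconds=86_400,
        )
    if limited_minute or limited_day:
        # Inside an @app.middleware("http") function, return a JSONResponse
        # instead: FastAPI's exception handlers do not catch exceptions raised there.
        raise HTTPException(status_code=429, detail="Rate limit exceeded")
Common bypass techniques to defend against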
Naive rate limiting is easier to bypass than most developers realize. Here are the most common tricks attackers use, and how to close each gap.
IP rotation
Attacker routes requests through a proxy pool, using a different IP for each one. IP-only rate limiting is completely defeated.
Defense: Add rate limits on user ID / API key post-authentication. Use fingerprinting signals (User-Agent, Accept-Language, TLS fingerprint) to detect rotation patterns. Services like Cloudflare Bot Management do this automatically.
Account farming
Attacker creates thousands of free accounts so that each gets its own rate limit bucket, then distributes a brute force attack across them.
Defense: Require email verification before full access. Apply stricter limits to unverified accounts. Use device fingerprinting to link accounts from the same device. Require CAPTCHA after N failed attempts.
Slow drip attacks
Instead of hammering the endpoint, attacker sends requests just under the threshold, indefinitely. This never triggers the rate limit but still causes damage over time.
Defense: Add both per-minute AND per-day limits. Monitor for sustained low-rate patterns. A real user rarely sends exactly 9 requests/minute for 72 hours straight.
Header spoofing
If you trust X-Forwarded-For or X-Real-IP headers without validation, attackers can spoof their IP address and bypass IP-based limits.
Defense: Only trust X-Forwarded-For from your own load balancer or proxy. Set TRUSTED_PROXIES explicitly. Walk the X-Forwarded-For list from the right and take the first address that is not one of your trusted proxies; everything to its left is client-supplied and can be forged.
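A minimal sketch of that extraction using the standard library's `ipaddress` module. The `10.0.0.0/8` range is a placeholder assumption; substitute the networks your own load balancers and proxies actually use.

```python
import ipaddress
from typing import Optional

# Assumed trusted ranges: replace with your real proxy / load balancer networks.
TRUSTED_PROXIES = [ipaddress.ip_network("10.0.0.0/8")]


def is_trusted(address: str) -> bool:
    try:
        ip = ipaddress.ip_address(address)
    except ValueError:
        return False  # not a valid IP, so certainly not one of our proxies
    return any(ip in network for network in TRUSTED_PROXIES)


def client_ip(peer_ip: str, forwarded_for: Optional[str]) -> str:
    """Resolve the client IP from the socket peer and the X-Forwarded-For header."""
    # Ignore the header entirely unless the direct peer is one of our proxies.
    if not forwarded_for or not is_trusted(peer_ip):
        return peer_ip
    hops = [hop.strip() for hop in forwarded_for.split(",") if hop.strip()]
    # Walk from the right: the first hop we do not control is the real client.
    for hop in reversed(hops):
        if not is_trusted(hop):
            return hop
    return peer_ip


direct = client_ip("198.51.100.9", "1.2.3.4")
proxied = client_ip("10.0.0.5", "1.2.3.4, 203.0.113.7, 10.0.0.8")
```

In the second call the attacker-supplied `1.2.3.4` is ignored and `203.0.113.7` is used, because that is the address your own proxy observed and appended.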
Endpoint variation
If you rate limit /api/search but not /api/search/, /api/Search, or /API/search, attackers try variations to find unprotected paths.
Defense: Normalize all paths before using them in rate limit keys. Apply limits at the middleware level before routing, not at the route handler.
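A possible normalization step is sketched below. It assumes your router treats paths case-insensitively and ignores trailing slashes; if yours does not, drop the corresponding step so the limiter and the router agree on what counts as the same endpoint.

```python
import posixpath
from urllib.parse import unquote


def normalize_path(raw_path: str) -> str:
    """Collapse equivalent spellings of a path into one rate limit key component."""
    path = unquote(raw_path.split("?", 1)[0])           # drop query string, decode %xx
    path = posixpath.normpath("/" + path.lstrip("/"))   # collapse //, ./, ../ and trailing /
    return path.lower()                                 # assumption: case-insensitive routes


variants = ["/api/search", "/api/search/", "/api/Search", "/API/search", "//api//search?q=1"]
normalized = {normalize_path(p) for p in variants}
print(normalized)  # {'/api/search'}
```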
Production readiness checklist
Before shipping your rate limiting to production, work through this checklist. Each item represents a real gap that's caused an incident somewhere.
Rate limits applied before auth resolves (for auth endpoints) (Critical)
Both per-minute and per-day limits configured for key endpoints (Critical)
All 429 responses include Retry-After header (Critical)
X-RateLimit-Remaining header sent on every response
Client IP extracted correctly from X-Forwarded-For (trusted proxy only) (Critical)
Rate limit key includes endpoint path, not just IP
Redis connection failure handled gracefully (decided: fail open or closed?) (Critical)
Rate limit errors logged with key, client ID, and endpoint for monitoring
Alert set up for sudden spike in 429 responses (potential attack) (Critical)
Internal service accounts whitelisted from rate limits
Rate limit configuration externalized (env/config file, not hardcoded)
Tested that rotating IPs doesn't bypass per-user limits (Critical)
Path normalization applied before building rate limit keys
429 error message is informative but doesn't reveal limit details to attackers
⚠️ What to do when Redis goes down
If your rate limiter depends on Redis and Redis goes down, you have two choices: fail open (allow all requests) or fail closed (block all requests). Failing open is usually right for availability: a brief window of unprotected traffic is better than a full outage. Implement a circuit breaker pattern around Redis calls so that a Redis failure degrades gracefully rather than crashing your middleware.
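Here is one way a fail-open circuit breaker could look. It is a sketch, not a drop-in component: `FailOpenLimiter`, the 5-second cooldown, and the injected `check` and `clock` callables are all assumptions chosen to keep the example self-contained and testable without Redis.

```python
import time
from typing import Callable


class FailOpenLimiter:
    """Wraps a limiter check; on backend errors it allows traffic and pauses checks."""

    def __init__(
        self,
        check: Callable[[str], bool],  # returns True when the request is rate limited
        cooldown_seconds: float = 5.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self.check = check
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock
        self.open_until = 0.0  # while clock() < open_until, the backend is skipped

    def is_limited(self, key: str) -> bool:
        if self.clock() < self.open_until:
            return False  # circuit open: fail open without touching the backend
        try:
            return self.check(key)
        except Exception:
            # Assumption: any backend error (e.g. a Redis connection error)
            # trips the breaker. Log and alert here in real code.
            self.open_until = self.clock() + self.cooldown_seconds
            return False
```

In the FastAPI example you might wrap the existing check as `FailOpenLimiter(lambda key: is_rate_limited(key, 300, 60)[0])`. For fail-closed endpoints such as login, return `True` from the two fallback branches instead.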
FAQ
Should I implement rate limiting myself or use a gateway?
What HTTP status code should I return for rate limiting?
How do I rate limit without Redis?
How do I handle legitimate bursts without punishing users?
Should I rate limit by IP or by user?
How do I communicate rate limits to my API users?
Quick recap
1. Pick your algorithm: sliding window counter for most APIs, token bucket if clients need burst support.
2. Not every endpoint needs the same limit: auth and notification endpoints need the strictest treatment.
3. Rate limit key = IP for pre-auth, user ID for post-auth, combine both for critical endpoints.
4. Redis is the standard backend: atomic increments plus TTL make it ideal for this.
5. Always return X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset, plus Retry-After on 429.
6. Implement tiered limits per plan: anonymous, free, pro, and enterprise should have different quotas.
7. Defend against bypass: IP rotation, account farming, slow drip, header spoofing, path variation.
8. Handle Redis failure gracefully: decide on fail-open vs. fail-closed before it happens in production.