Security

Rate Limiting Your API: Best Practices Every Developer Should Know

An unprotected API is an open invitation: for abuse, for scraping, for accidental self-inflicted outages. Rate limiting is one of the highest-value things you can add to any API, and most developers implement it wrong the first time.

Last updated: July 6, 2026 · 23 min read · Python & Node.js examples

What rate limiting protects you from

Cost explosion

One runaway client hammers your API and your database bill triples overnight.

Brute force attacks

Login endpoints, password resets, OTP fields: all need per-IP limiting.

Scraping and abuse

Competitors or bots harvesting your data faster than real users ever could.

Unintentional DoS

A buggy client in a retry loop takes down the service for everyone else.

Every API that's ever been exposed to the internet has been abused in some way, usually faster than the developer expected. The question isn't whether you'll need rate limiting, it's whether you'll add it before or after your first incident.

Rate limiting sits at the intersection of security, reliability, and fairness. It keeps your infrastructure stable under load, prevents individual clients from monopolizing resources, and gives you a lever to pull when something goes wrong. Done well, it's largely invisible to legitimate users and a strong wall against everyone else.

This guide walks through everything from choosing the right algorithm to the exact response headers you should return, with working code you can adapt for your stack.

The four rate limiting algorithms compared

Before writing a line of code, you need to pick an algorithm. Each handles traffic bursts differently, and the right choice depends on what you're protecting.

Fixed Window Counter (⭐ Simple)

How it works: Count requests in a fixed time bucket (e.g. 0–60s, 60–120s). Reset the counter when the window ends.

✓ Simplest to implement. Easy to reason about.

✗ Boundary spike problem: a user can send 2× the limit by hitting the end of one window and the start of the next.

Best for: Internal admin APIs, low-risk endpoints
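
To make the boundary spike concrete, here is a minimal in-memory sketch of a fixed window counter. The class name `FixedWindowLimiter` and the `now` parameter (used to simulate time) are assumptions for illustration; it keeps state in one process and never cleans up old windows, so treat it as a demonstration rather than production code.

```python
import time
from collections import defaultdict
from typing import Optional


class FixedWindowLimiter:
    """In-memory fixed window counter (single process, illustration only)."""

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window_seconds = window_seconds
        # (key, window index) -> request count; old windows are never evicted here
        self.counts = defaultdict(int)

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        window = int(now // self.window_seconds)  # index of the current bucket
        if self.counts[(key, window)] >= self.limit:
            return False
        self.counts[(key, window)] += 1
        return True


limiter = FixedWindowLimiter(limit=5, window_seconds=60)
# 5 requests at t=59s and 5 more at t=61s fall into two different buckets
late = [limiter.allow("client", now=59.0) for _ in range(5)]
early = [limiter.allow("client", now=61.0) for _ in range(5)]
print(sum(late) + sum(early))  # 10 requests accepted within two seconds
```

With a limit of 5 per minute, the client still gets 10 requests through in a two-second span, which is exactly the 2× case described above.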

Sliding Window Log (⭐⭐ Moderate)

How it works: Store a timestamp for every request. Count only requests within the last N seconds from now. Drop old timestamps.

✓ Precise, with no boundary spikes. Accurate per-second enforcement.

✗ Memory-heavy at scale. Storing every timestamp for millions of users adds up.

Best for: High-security endpoints, authentication, payment flows
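
A rough in-memory sketch of the sliding window log, assuming a single process and a hypothetical `SlidingWindowLog` class. It stores one timestamp per accepted request, which is where the memory cost comes from.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SlidingWindowLog:
    """In-memory sliding window log (single process, illustration only)."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window_seconds = window_seconds
        self.logs = defaultdict(deque)  # key -> timestamps of accepted requests

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        log = self.logs[key]
        cutoff = now - self.window_seconds
        # Drop timestamps that have fallen out of the window
        while log and log[0] <= cutoff:
            log.popleft()
        if len(log) >= self.limit:
            return False
        log.append(now)
        return True
```

Each accepted request adds an entry, so memory per client grows up to `limit` timestamps. That is fine for a login endpoint with a limit of 10, and much less attractive for a limit of 10,000.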

Sliding Window Counter (⭐⭐ Moderate)

How it works: Hybrid: track current and previous window counts. Weight the previous count by how much of it still falls in the sliding window.

✓ Good approximation of sliding window. Memory-efficient, with just two counters per client.

✗ Slightly approximate, not perfectly precise at boundaries.

Best for: Most public APIs, with the best balance of accuracy and efficiency
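
The weighting step is easier to see with numbers. This small sketch (the function name `sliding_window_estimate` is made up for illustration) computes the estimated count for a client that made 80 requests in the previous window and 30 so far in the current one, 15 seconds into a 60-second window.

```python
def sliding_window_estimate(
    current_count: int,
    previous_count: int,
    elapsed_in_window: float,
    window_seconds: float,
) -> float:
    """Estimate requests in the trailing window from two fixed-window counters."""
    # Fraction of the previous window that still overlaps the sliding window
    previous_weight = 1 - (elapsed_in_window / window_seconds)
    return current_count + previous_count * previous_weight


estimate = sliding_window_estimate(
    current_count=30, previous_count=80, elapsed_in_window=15, window_seconds=60
)
print(estimate)  # 90.0 = 30 + 80 * 0.75
```

The estimate assumes requests in the previous window were spread evenly, which is the source of the small inaccuracy at boundaries.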

Token Bucket (⭐⭐⭐ More involved)

How it works: Each client has a bucket that fills at a constant rate (e.g. 10 tokens/second, max 100). Each request costs tokens. Requests are rejected when the bucket is empty.

✓ Allows controlled bursts. Natural and flexible. Used by large APIs such as AWS and Stripe.

✗ Slightly harder to explain to end users. Burst size needs careful tuning.

Best for: Production APIs where controlled bursts are acceptable
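
Here is a minimal single-process sketch of a token bucket, using the 10 tokens/second, max 100 figures from the description. The `TokenBucket` class and its `now` parameter (for simulating time) are assumptions for illustration, not a reference implementation.

```python
import time
from typing import Optional


class TokenBucket:
    """In-memory token bucket for one client (illustration only)."""

    def __init__(self, capacity: float, refill_per_second: float, now: Optional[float] = None):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = capacity  # a new bucket starts full
        self.last_refill = time.monotonic() if now is None else now

    def allow(self, cost: float = 1.0, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Add the tokens earned since the last call, capped at capacity
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last_refill = now
        if self.tokens < cost:
            return False
        self.tokens -= cost
        return True


bucket = TokenBucket(capacity=100, refill_per_second=10, now=0.0)
burst = sum(bucket.allow(now=0.0) for _ in range(120))
print(burst)  # 100: the burst up to capacity passes, the remaining 20 are rejected
```

One second later the bucket has earned 10 more tokens, so the sustained rate settles at the refill rate while short bursts up to the capacity are still allowed.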

Which to pick? For most APIs, start with the sliding window counter. It's accurate enough for almost all use cases, memory-efficient, and straightforward to implement with Redis. Use token bucket if your users have legitimate reasons to burst (batch processing, bulk uploads). Use sliding window log only if you need exact precision for security-critical endpoints.

What to rate limit (and what not to)

Not every endpoint needs the same treatment. Applying a blanket limit everywhere is lazy and often counterproductive: it blocks legitimate use cases while not specifically targeting high-risk ones. Think in tiers:

🔴 Strict limits: security-critical

  • Login / password check endpoints
  • Password reset request + OTP verification
  • Account creation
  • Email verification resend
  • Any endpoint that triggers an email, SMS, or notification

Suggested limit: 3–10 requests / minute per IP, 5–20 / hour per user

These are the primary targets for brute force. Low limits are non-negotiable.

🟡 Moderate limits: data endpoints

  • Search endpoints
  • List / browse endpoints (products, users, posts)
  • Export / download endpoints
  • AI-powered or expensive compute endpoints

Suggested limit: 60–300 requests / minute per user or API key

Protects database and compute resources from hammering without blocking normal use.

🟢 Light limits or none: static / cheap

  • Health check endpoints (/health, /ping)
  • Static asset endpoints
  • Publicly cached read endpoints
  • Webhook endpoints (rate limit by payload, not requests)

Suggested limit: High limit or none, but still consider IP-level global limits

Over-limiting here breaks monitoring, CDNs, and uptime checks.
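
One way to express these tiers in code is a small prefix table consulted before the limiter runs. The sketch below is an assumption about how you might lay it out: the route prefixes, the numbers, and the `limit_for_path` helper are all illustrative, chosen to fall inside the suggested ranges above.

```python
from typing import Optional

# (path prefix, requests per minute). None means "no per-route limit".
# First match wins, so list more specific prefixes first.
ROUTE_LIMITS = [
    ("/auth/", 10),         # strict: security-critical
    ("/api/export", 60),    # moderate: expensive endpoints
    ("/api/search", 120),   # moderate: data endpoints
    ("/health", None),      # light: never block uptime checks
]
DEFAULT_LIMIT = 300  # fallback for everything else


def limit_for_path(path: str) -> Optional[int]:
    """Return the per-minute limit for a request path, or None for unlimited."""
    for prefix, limit in ROUTE_LIMITS:
        if path.startswith(prefix):
            return limit
    return DEFAULT_LIMIT


print(limit_for_path("/auth/login"))  # 10
print(limit_for_path("/health"))      # None
```

Keeping the table in one place also makes it easy to move into a config file later.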

Choosing your rate limit key

The rate limit key is what you're counting against. Getting this wrong means your limits either don't work (key too narrow, so an attacker can spread requests across many keys) or block legitimate users (key too broad, so many users share one counter).

IP address

✓ Use for: Unauthenticated endpoints, login forms, registration

✗ Watch out: Corporate users behind NAT, where an entire office shares one IP. Blocking it locks out hundreds of legitimate users.

key = f"rate:ip:{client_ip}"

User ID / API key

✓ Use for: Authenticated endpoints. The fairest option: limits are per paying customer.

✗ Watch out: Doesn't protect pre-auth endpoints. Attackers can create many accounts.

key = f"rate:user:{user_id}"

IP + endpoint

✓ Use for: Focused protection per route. Lets users call one endpoint freely while limiting another.

✗ Watch out: More keys to manage. Needs consistent endpoint normalization.

key = f"rate:ip:{client_ip}:POST:/auth/login"

User ID + endpoint

✓ Use for: Authenticated APIs with different limits per feature.

✗ Watch out: Requires auth to be resolved before rate limiting, which adds middleware complexity.

key = f"rate:user:{user_id}:GET:/api/search"

API key tier

✓ Use for: SaaS products with free/pro/enterprise plans. Limits tied to subscription.

✗ Watch out: Requires knowing the plan at middleware time. Needs caching to avoid DB hits per request.

key = f"rate:apikey:{api_key}"

For most production APIs, combine strategies: use IP-based limits for unauthenticated endpoints and user ID-based limits for authenticated ones. This covers both attack vectors without penalizing corporate users.
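
A combined strategy can be as simple as one helper that picks the key type based on whether the request is authenticated. This is a sketch under assumptions: `build_rate_limit_key` is a hypothetical helper, and it follows the `rate:ip:...` / `rate:user:...` key shapes shown above.

```python
from typing import Optional


def build_rate_limit_key(
    client_ip: str,
    user_id: Optional[str],
    method: str,
    route: str,
) -> str:
    """Use the user ID when the request is authenticated, the IP otherwise."""
    scope = f"{method.upper()}:{route}"
    if user_id is not None:
        return f"rate:user:{user_id}:{scope}"
    return f"rate:ip:{client_ip}:{scope}"


print(build_rate_limit_key("203.0.113.7", None, "post", "/auth/login"))
# rate:ip:203.0.113.7:POST:/auth/login
print(build_rate_limit_key("203.0.113.7", "42", "get", "/api/search"))
# rate:user:42:GET:/api/search
```

Pass the route template (for example `/api/users/{id}`) rather than the raw URL, otherwise every distinct ID produces a separate key.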

Implementation: Redis + Python / Node.js

Redis is the standard backing store for rate limiting because it's fast, supports atomic operations, and has native TTL (expiry) support. Here are working implementations for the two most common stacks.

Sliding window counter in Python (FastAPI / Flask)

import redis
import time
from fastapi import Request, HTTPException

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def is_rate_limited(
    key: str,
    limit: int,
    window_seconds: int,
) -> tuple[bool, int, int]:
    """
    Sliding window counter using two Redis keys.
    Returns: (is_limited, current_count, remaining)
    """
    now = int(time.time())
    current_window = now // window_seconds
    previous_window = current_window - 1

    current_key = f"{key}:{current_window}"
    previous_key = f"{key}:{previous_window}"

    pipe = r.pipeline()
    pipe.get(current_key)
    pipe.get(previous_key)
    current_count_raw, previous_count_raw = pipe.execute()

    current_count = int(current_count_raw or 0)
    previous_count = int(previous_count_raw or 0)

    # Weight previous window by how much of it is still "in" the window
    elapsed_in_window = now % window_seconds
    previous_weight = 1 - (elapsed_in_window / window_seconds)
    estimated_count = current_count + (previous_count * previous_weight)

    if estimated_count >= limit:
        return True, int(estimated_count), 0

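    # Increment current window counter.
    # Note: the read above and this increment are separate round trips, so
    # concurrent requests can slightly overshoot the limit. Use a Lua script
    # if you need the check-and-increment to be atomic.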
    pipe = r.pipeline()
    pipe.incr(current_key)
    pipe.expire(current_key, window_seconds * 2)
    pipe.execute()

    remaining = max(0, limit - int(estimated_count) - 1)
    return False, int(estimated_count) + 1, remaining


# FastAPI middleware
from fastapi import FastAPI
from fastapi.responses import JSONResponse

app = FastAPI()

@app.middleware("http")
async def rate_limit_middleware(request: Request, call_next):
    # request.client can be None (e.g. under some test clients)
    client_ip = request.client.host if request.client else "unknown"
    path = request.url.path

    # Strict limit on auth endpoints, looser global limit everywhere else
    if path.startswith("/auth/"):
        key, limit = f"rate:ip:{client_ip}:auth", 10
    else:
        key, limit = f"rate:ip:{client_ip}:global", 300

    # Note: this is a blocking Redis call inside an async handler; under real
    # load, consider redis.asyncio or running the check in a thread pool.
    limited, count, remaining = is_rate_limited(key, limit=limit, window_seconds=60)

    if limited:
        return JSONResponse(
            status_code=429,
            content={"error": "Too many requests. Please slow down."},
            headers={
                "Retry-After": "60",  # conservative: one full window
                "X-RateLimit-Limit": str(limit),
                "X-RateLimit-Remaining": "0",
            },
        )

    response = await call_next(request)
    response.headers["X-RateLimit-Limit"] = str(limit)
    response.headers["X-RateLimit-Remaining"] = str(remaining)
    return response

Token bucket in Node.js (Express)

const Redis = require("ioredis");
const redis = new Redis();

// Token bucket rate limiter.
// Note: the read and the write below are separate round trips, so concurrent
// requests for the same key can race. Move the logic into a Lua script if you
// need strict atomicity.
async function tokenBucket(key, { capacity, refillRate, refillPeriodMs }) {
  const now = Date.now();
  const bucketKey = `bucket:${key}`;

  // hgetall returns an empty object when the key does not exist
  const data = await redis.hgetall(bucketKey);
  let tokens = data.tokens !== undefined ? parseFloat(data.tokens) : capacity;
  const lastRefill = data.lastRefill !== undefined ? parseInt(data.lastRefill, 10) : now;

  // Refill tokens based on elapsed time, capped at capacity
  const elapsed = now - lastRefill;
  const tokensToAdd = (elapsed / refillPeriodMs) * refillRate;
  tokens = Math.min(capacity, tokens + tokensToAdd);

  if (tokens < 1) {
    // Calculate when the next token will be available
    const msUntilToken = ((1 - tokens) / refillRate) * refillPeriodMs;
    return { allowed: false, remaining: 0, retryAfterMs: Math.ceil(msUntilToken) };
  }

  tokens -= 1;

  // Keep the bucket until it would be full again anyway. A shorter TTL lets a
  // drained bucket expire and come back at full capacity too early.
  const ttlMs = Math.ceil((capacity / refillRate) * refillPeriodMs);

  await redis
    .multi()
    .hset(bucketKey, "tokens", tokens.toString(), "lastRefill", now.toString())
    .pexpire(bucketKey, ttlMs)
    .exec();

  return { allowed: true, remaining: Math.floor(tokens), retryAfterMs: 0 };
}

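// Express middleware
// Note: behind a proxy or load balancer, req.ip is only the real client
// address if Express's "trust proxy" setting is configured.
function rateLimitMiddleware(config) {
  return async (req, res, next) => {
    try {
      const key = req.user?.id
        ? `user:${req.user.id}`
        : `ip:${req.ip}`;

      const result = await tokenBucket(key, config);

      res.setHeader("X-RateLimit-Limit", config.capacity);
      res.setHeader("X-RateLimit-Remaining", result.remaining);

      if (!result.allowed) {
        res.setHeader("Retry-After", Math.ceil(result.retryAfterMs / 1000));
        return res.status(429).json({ error: "Rate limit exceeded." });
      }

      next();
    } catch (err) {
      // In Express 4, a rejected promise here is not forwarded to the error
      // handler automatically, so pass Redis errors along explicitly.
      next(err);
    }
  };
}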

// Apply to Express app
const express = require("express");
const app = express();

// Strict limit for login (loginHandler is your own route handler, defined elsewhere)
app.post(
  "/auth/login",
  rateLimitMiddleware({ capacity: 5, refillRate: 1, refillPeriodMs: 60_000 }),
  loginHandler
);

// General API limit
app.use(
  "/api/",
  rateLimitMiddleware({ capacity: 100, refillRate: 10, refillPeriodMs: 1_000 }),
);

Response headers: tell clients what's happening

Good rate limiting is transparent. Clients shouldn't have to guess how many requests they have left or when they can retry. The standard headers make this information machine-readable, which means good API clients can automatically back off without crashing or spamming retries.

X-RateLimit-Limit (Send always)
Example: 100

The maximum number of requests allowed in the current window.

X-RateLimit-Remaining (Send always)
Example: 47

How many requests the client has left before hitting the limit.

X-RateLimit-Reset (Send always)
Example: 1751234567 (Unix timestamp)

When the current window resets and the counter returns to the limit. Send as UTC epoch seconds.

Retry-After (Send on 429)
Example: 30 (seconds)

Only sent on 429 responses. How long the client should wait before retrying. RFC 6585 makes it optional on a 429, but well-behaved clients rely on it, so always include it.

RateLimit-Policy (Optional)
Example: 100;w=60;burst=20;comment="standard"

Part of an IETF draft that describes the full policy in a structured format. The draft names the field RateLimit-Policy, without the X- prefix. Not widely required yet but good practice.

# Example response headers on a normal request
HTTP/1.1 200 OK
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 47
X-RateLimit-Reset: 1751234567

# Example response on a 429
HTTP/1.1 429 Too Many Requests
Content-Type: application/json
Retry-After: 30
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1751234597

{
  "error": "rate_limit_exceeded",
  "message": "You've exceeded 100 requests per minute. Retry after 30 seconds.",
  "retry_after": 30
}
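
To keep these headers consistent across endpoints, it can help to build them in one place. The helper below is a sketch, not part of any framework: `rate_limit_headers` and its parameters are hypothetical names, and it derives Retry-After from the reset time so the two values never disagree.

```python
import math
import time
from typing import Dict, Optional


def rate_limit_headers(
    limit: int,
    remaining: int,
    reset_epoch: int,
    limited: bool = False,
    now: Optional[float] = None,
) -> Dict[str, str]:
    """Build the rate limit headers for a response; adds Retry-After on 429s."""
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
        "X-RateLimit-Reset": str(reset_epoch),
    }
    if limited:
        now = time.time() if now is None else now
        # Whole seconds until the window resets, never less than 1
        headers["Retry-After"] = str(max(1, math.ceil(reset_epoch - now)))
    return headers


headers = rate_limit_headers(
    limit=100, remaining=0, reset_epoch=1751234597, limited=True, now=1751234567
)
print(headers["Retry-After"])  # 30
```

The numbers match the 429 example above: a reset 30 seconds in the future yields `Retry-After: 30`.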

Rate limit tiers and per-user quotas

A single global limit is a blunt instrument. Production APIs almost always need differentiated limits: different rules for different user types, plan levels, or use cases. Here's a pattern for implementing tiered rate limits cleanly.

Tier               Requests / min  Requests / day  Burst   Who
Unauthenticated    10              500             15      Anonymous / public
Free tier          60              5,000           100     Signed-up, no subscription
Pro                300             50,000          500     Paying individual
Business           1,000           200,000         2,000   Team plan
Enterprise         Custom          Unlimited       Custom  Contract customers
Internal services  None            None            None    Service-to-service (trusted)

# Python: look up tier limits from cache/DB
TIER_LIMITS = {
    "anonymous":   {"per_minute": 10,    "per_day": 500},
    "free":        {"per_minute": 60,    "per_day": 5_000},
    "pro":         {"per_minute": 300,   "per_day": 50_000},
    "business":    {"per_minute": 1_000, "per_day": 200_000},
    "internal":    None,  # no limit
}

def get_user_limits(user) -> dict | None:
    if user is None:
        return TIER_LIMITS["anonymous"]
    if user.is_internal_service:
        return None  # no limits for trusted services
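    # Tiers missing from TIER_LIMITS (e.g. "enterprise", which has custom limits)
    # fall back to the free tier; look up custom limits before reaching this line.
    return TIER_LIMITS.get(user.subscription_tier, TIER_LIMITS["free"])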

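# In middleware. `key` identifies the client (e.g. "user:42" or "ip:203.0.113.7"),
# and request.user is assumed to be populated by your auth layer.
limits = get_user_limits(request.user)
if limits:
    limited_minute, _, remaining = is_rate_limited(
        key=f"rate:{key}:minute",
        limit=limits["per_minute"],
        window_seconds=60,
    )
    limited_day, _, _ = is_rate_limited(
        key=f"rate:{key}:day",
        limit=limits["per_day"],
        window_seconds=86_400,
    )
    if limited_minute or limited_day:
        # Return the 429 directly: an HTTPException raised inside an
        # @app.middleware function is not routed through FastAPI's exception handlers.
        return JSONResponse(status_code=429, content={"detail": "Rate limit exceeded"})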

Common bypass techniques to defend against

Naive rate limiting is easier to bypass than most developers realize. Here are the most common tricks attackers use, and how to close each gap.

โš”๏ธ IP rotation

Attacker routes requests through a proxy pool, using a different IP for each one. IP-only rate limiting is completely defeated.

Defense: Add rate limits on user ID / API key post-authentication. Use fingerprinting signals (User-Agent, Accept-Language, TLS fingerprint) to detect rotation patterns. Services like Cloudflare Bot Management do this automatically.

โš”๏ธ Account farming

Attacker creates thousands of free accounts to each get their own rate limit bucket. Distributed brute force through many accounts.

Defense: Require email verification before full access. Apply stricter limits to unverified accounts. Use device fingerprinting to link accounts from the same device. Require CAPTCHA after N failed attempts.

โš”๏ธ Slow drip attacks

Instead of hammering the endpoint, attacker sends requests just under the threshold โ€” forever. Doesn't trigger the rate limit but still causes damage over time.

Defense: Add both per-minute AND per-day limits. Monitor for sustained low-rate patterns. A real user rarely sends exactly 9 requests/minute for 72 hours straight.

โš”๏ธ Header spoofing

If you trust X-Forwarded-For or X-Real-IP headers without validation, attackers can spoof their IP address and bypass IP-based limits.

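Defense: Only trust X-Forwarded-For when the request arrives from your own load balancer or proxy. Set TRUSTED_PROXIES explicitly. To find the real client IP, walk the X-Forwarded-For list from right to left and take the first address that is not one of your trusted proxies.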
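
A sketch of that right-to-left walk, using only the standard library. The `TRUSTED_PROXIES` range and the `client_ip` helper are assumptions for illustration; substitute the addresses of your own load balancers.

```python
import ipaddress

# Hypothetical: the private range your own proxies / load balancers live in
TRUSTED_PROXIES = [ipaddress.ip_network("10.0.0.0/8")]


def is_trusted(ip: str) -> bool:
    """True if ip parses as an address inside one of the trusted proxy networks."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        return False
    return any(addr in net for net in TRUSTED_PROXIES)


def client_ip(peer_ip: str, x_forwarded_for: str = "") -> str:
    """Resolve the client IP from the socket peer and the X-Forwarded-For header."""
    # A direct connection from an untrusted peer: ignore the header entirely
    if not is_trusted(peer_ip):
        return peer_ip
    hops = [h.strip() for h in x_forwarded_for.split(",") if h.strip()]
    # Walk right to left; entries to the left of the first untrusted hop
    # were supplied by the client and may be forged
    for hop in reversed(hops):
        if not is_trusted(hop):
            return hop
    return peer_ip


print(client_ip("10.0.0.5", "1.2.3.4, 203.0.113.7"))  # 203.0.113.7
```

In the example, `1.2.3.4` is whatever the client chose to send; only `203.0.113.7`, appended by the trusted proxy at `10.0.0.5`, is used for the rate limit key.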

โš”๏ธ Endpoint variation

If you rate limit /api/search but not /api/search/, /api/Search, or /API/search, attackers try variations to find unprotected paths.

Defense: Normalize all paths before using them in rate limit keys. Apply limits at the middleware level before routing, not at the route handler.
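
A minimal normalization sketch that handles the variations listed above. `normalize_path` is a hypothetical helper; it assumes your routes are case-insensitive (drop the lowercasing if they are not) and it does not decode percent-encoded characters.

```python
import posixpath
import re


def normalize_path(path: str) -> str:
    """Canonicalize a request path before using it in a rate limit key."""
    path = path.split("?", 1)[0]      # drop the query string
    if not path:
        return "/"
    path = path.lower()               # assumption: routes are case-insensitive
    path = re.sub(r"/+", "/", path)   # collapse repeated slashes
    path = posixpath.normpath(path)   # resolve "." / ".." and strip trailing slash
    return path if path.startswith("/") else "/" + path


for raw in ["/api/search", "/api/search/", "/api/Search", "/API//search?q=1"]:
    print(normalize_path(raw))  # /api/search each time
```

All four variants collapse to the same key component, so they share one counter.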

Production readiness checklist

Before shipping your rate limiting to production, work through this checklist. Each item represents a real gap that's caused an incident somewhere.

  • 🔴 Critical: Rate limits applied before auth resolves (for auth endpoints)
  • 🔴 Critical: Both per-minute and per-day limits configured for key endpoints
  • 🔴 Critical: All 429 responses include Retry-After header
  • ⚪ X-RateLimit-Remaining header sent on every response
  • 🔴 Critical: Client IP extracted correctly from X-Forwarded-For (trusted proxy only)
  • ⚪ Rate limit key includes endpoint path, not just IP
  • 🔴 Critical: Redis connection failure handled gracefully (fail open or closed?)
  • ⚪ Rate limit errors logged with key, client ID, and endpoint for monitoring
  • 🔴 Critical: Alert set up for sudden spike in 429 responses (potential attack)
  • ⚪ Internal service accounts whitelisted from rate limits
  • ⚪ Rate limit configuration externalized (env/config file, not hardcoded)
  • 🔴 Critical: Tested that rotating IPs doesn't bypass per-user limits
  • ⚪ Path normalization applied before building rate limit keys
  • ⚪ 429 error message is informative but doesn't reveal limit details to attackers

โš ๏ธ What to do when Redis goes down

If your rate limiter depends on Redis and Redis goes down, you have two choices: fail open (allow all requests) or fail closed (block all requests). Failing open is usually right for availability โ€” a brief window of unprotected traffic is better than a full outage. Implement a circuit breaker pattern around Redis calls so that a Redis failure degrades gracefully rather than crashing your middleware.
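
A rough sketch of a fail-open wrapper with a simple circuit breaker. `FailOpenLimiter`, its thresholds, and the injected `check` callable are assumptions for illustration; in a real service you would likely catch your Redis client's connection and timeout errors rather than every `Exception`.

```python
import time
from typing import Callable


class FailOpenLimiter:
    """Wraps a rate limit check; fails open and backs off when the store is down."""

    def __init__(
        self,
        check: Callable[[str], bool],   # returns True if the request should be blocked
        failure_threshold: int = 3,
        cooldown_seconds: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self.check = check
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock
        self.failures = 0
        self.open_until = 0.0

    def is_limited(self, key: str) -> bool:
        if self.clock() < self.open_until:
            return False  # circuit open: allow without touching the store
        try:
            limited = self.check(key)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                # Stop calling the store for a while so requests don't queue on timeouts
                self.open_until = self.clock() + self.cooldown_seconds
                self.failures = 0
            return False  # fail open
        self.failures = 0
        return limited
```

For security-critical endpoints such as login you may prefer the opposite default and return `True` (fail closed) in the two fail-open branches. Either way, log and alert whenever the circuit opens.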

FAQ

Should I implement rate limiting myself or use a gateway?
For simple cases, implement it yourself: it gives you full control and adds no extra moving parts. For complex cases (multiple services, multiple teams, high traffic), use an API gateway like Kong, AWS API Gateway, or Cloudflare. They handle rate limiting, auth, logging, and routing in one layer. The two aren't mutually exclusive: a gateway for global limits + application-level limits for fine-grained control is a common production pattern.

What HTTP status code should I return for rate limiting?
429 Too Many Requests is the correct code, standardized in RFC 6585. Some older APIs used 503 (Service Unavailable) or 403 (Forbidden); both are wrong. 429 tells clients exactly what happened and that retrying later will work. Always pair it with a Retry-After header.

How do I rate limit without Redis?
For a single-server setup, an in-memory store (a dict in Python, a Map in Node.js) works but won't survive restarts and doesn't scale across multiple processes. For multiple servers, you need shared state: Redis is the standard, but Memcached or a database with atomic operations (PostgreSQL advisory locks, for example) also work. For serverless functions, you'll need an external store since function instances don't share memory.

How do I handle legitimate bursts without punishing users?
Use the token bucket algorithm with a burst parameter. If your normal limit is 60 req/min, allow a burst of 100 tokens. This lets a user make 100 requests immediately after a long pause without hitting the limit, while still enforcing the average rate. Several large APIs, Stripe among them, take this approach.

Should I rate limit by IP or by user?
Both, applied at different layers. IP limits protect pre-authentication endpoints (login, registration, password reset) where you don't have a user ID yet. User-based limits protect authenticated endpoints and are fairer: they don't penalize shared offices or universities where many people share an IP. Apply both for critical endpoints: IP limit as a first line, user limit after auth.

How do I communicate rate limits to my API users?
Document them clearly in your API reference: what the limits are, what keys they apply to, and what happens when they are exceeded. Return the X-RateLimit-* headers on every response so clients can build backoff logic. Send a Retry-After header on 429s. Consider a dashboard showing usage vs. limits for API key holders. Surprises are the worst; known limits that clients can program against are fine.

Quick recap

  1. Pick your algorithm: sliding window counter for most APIs, token bucket if clients need burst support.
  2. Not every endpoint needs the same limit: auth and notification endpoints need the strictest treatment.
  3. Rate limit key = IP for pre-auth, user ID for post-auth, combine both for critical endpoints.
  4. Redis is the standard backend: atomic increments + TTL make it ideal for this.
  5. Always return X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, and Retry-After on 429.
  6. Implement tiered limits per plan: anonymous, free, pro, enterprise should have different quotas.
  7. Defend against bypass: IP rotation, account farming, slow drip, header spoofing, path variation.
  8. Handle Redis failure gracefully: decide on fail-open vs. fail-closed before it happens in production.
