
429 Too Many Requests: Rate Limiting for Scrapers

How rate limiting works, why your scraper gets 429 errors, and engineering patterns for handling request throttling at scale.

By ProxyOps Team

429 Too Many Requests: The Rate Limiting Deep Dive

HTTP 429 is the most honest error code in scraping. The server isn’t confused about who you are — it knows exactly what you’re doing — it’s simply saying: “Slow down.”

Unlike a 403 (which means “go away”), a 429 is an invitation to retry. But doing it wrong — hammering the server with immediate retries — will escalate your 429 into a permanent IP ban.

This guide covers how rate limiting actually works on the server side, and the engineering patterns that let you maximize throughput without triggering blocks.


How Rate Limiting Works (Server Side)

Understanding the server’s perspective helps you work within its constraints:

Token Bucket Algorithm

The most common rate limiting implementation. The server maintains a “bucket” for each IP (or API key):

Token Bucket for IP 203.0.113.42:
├─ Capacity: 60 tokens
├─ Refill rate: 1 token/second
├─ Current tokens: 0 ← you've used them all
└─ Next refill: 1 second

Key insight: Token buckets allow bursts. If you haven’t made requests in a while, your bucket fills up. This means you can fire 60 rapid requests, but once the bucket is empty you’re throttled down to the refill rate (1 request/second here), and a full 60-request burst only becomes available again after 60 idle seconds.
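The mechanism is easy to model client-side. A sketch, assuming the capacity and refill rate from the diagram above:

```python
import time

class TokenBucket:
    """Client-side model of a server's token bucket.
    Capacity and refill rate are assumptions taken from the example above."""

    def __init__(self, capacity=60, refill_rate=1.0):
        self.capacity = capacity
        self.refill_rate = refill_rate        # tokens added per second
        self.tokens = float(capacity)         # bucket starts full
        self.last_refill = time.monotonic()

    def allow(self):
        """Refill based on elapsed time, then try to spend one token."""
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(capacity=60, refill_rate=1.0)
# A fresh bucket permits a 60-request burst; the 61st immediate request is denied.
results = [bucket.allow() for _ in range(61)]
```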

Sliding Window

More sophisticated sites use a sliding window counter:

Rate limit: 100 requests per 60-second window

Your request history:
├─ 11:00:00 → 11:00:42: 95 requests ✅
├─ 11:00:43: request #96 ✅
├─ 11:00:44: request #97 ✅
├─ 11:00:45: request #98 ✅
├─ 11:00:46: request #99 ✅
├─ 11:00:47: request #100 ✅
└─ 11:00:48: request #101 → 429 ❌

No burst allowance. Every request counts equally within the window.
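The window check can be modeled with a deque of request timestamps. A sketch, using the 100-requests-per-60-seconds limit from the example:

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Sliding-window log: allow a request only if fewer than `limit`
    requests fall inside the trailing `window` seconds."""

    def __init__(self, limit=100, window=60.0):
        self.limit = limit
        self.window = window
        self.timestamps = deque()

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Evict timestamps that have slid out of the window.
        while self.timestamps and now - self.timestamps[0] >= self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False

limiter = SlidingWindowLimiter(limit=100, window=60.0)
# 101 requests spread over ~40 seconds: the first 100 pass, #101 is rejected.
inside = [limiter.allow(now=t * 0.4) for t in range(101)]
```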

Adaptive Rate Limiting

Enterprise bot-protection adjusts limits dynamically:

  • New IP → generous limit (100 req/min)
  • After 500 requests → reduced limit (30 req/min)
  • Bot-like pattern detected → aggressive limit (5 req/min)
  • Known bot signature → instant block (0 req/min)

This is why a scraper “works for a while then suddenly stops” — the server is tightening the limits as it gains confidence you’re automated.


The Retry-After Header

When a well-configured server sends a 429, it includes a Retry-After header:

HTTP/1.1 429 Too Many Requests
Retry-After: 30
Content-Type: application/json

{"error": "Rate limit exceeded. Try again in 30 seconds."}

The value can be either:

  • Seconds: Retry-After: 30 → wait 30 seconds
  • Date: Retry-After: Wed, 26 Feb 2026 11:45:00 GMT → wait until that time
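A small helper can normalize both forms to a number of seconds. A sketch; `parsedate_to_datetime` from the standard library handles the HTTP-date format, and the 30-second default is an assumption:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def parse_retry_after(value, default=30.0):
    """Return seconds to wait from a Retry-After header value,
    which may be an integer count of seconds or an HTTP-date."""
    if value is None:
        return default
    # Seconds form: "30"
    try:
        return max(0.0, float(value))
    except ValueError:
        pass
    # Date form: "Wed, 26 Feb 2026 11:45:00 GMT"
    try:
        target = parsedate_to_datetime(value)
        return max(0.0, (target - datetime.now(timezone.utc)).total_seconds())
    except (TypeError, ValueError):
        return default
```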

Always respect this header. Ignoring it and retrying immediately will:

  1. Waste your bandwidth
  2. Potentially trigger an escalation to IP ban
  3. Never succeed (the server won’t serve you until the cooldown expires)

A minimal retry loop that honors the header (HEADERS is a stand-in for your real request headers):

import time
import requests

HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; example-scraper)"}

def fetch_with_retry(url, max_retries=5):
    for attempt in range(max_retries):
        response = requests.get(url, headers=HEADERS, timeout=30)

        if response.status_code == 200:
            return response

        if response.status_code == 429:
            # Retry-After may also be an HTTP-date; this handles the seconds form.
            retry_after = int(response.headers.get("Retry-After", 30))
            print(f"Rate limited. Waiting {retry_after}s (attempt {attempt + 1})")
            time.sleep(retry_after)
            continue

        response.raise_for_status()

    raise Exception(f"Failed after {max_retries} retries")

Exponential Backoff with Jitter

When there’s no Retry-After header, exponential backoff is the standard pattern:

import random
import time

def backoff_delay(attempt, base=1, max_delay=60):
    """Calculate delay with exponential backoff + jitter."""
    delay = min(base * (2 ** attempt), max_delay)
    jitter = random.uniform(0, delay * 0.5)
    return delay + jitter

# Attempt 0: ~1.0 - 1.5 seconds
# Attempt 1: ~2.0 - 3.0 seconds
# Attempt 2: ~4.0 - 6.0 seconds
# Attempt 3: ~8.0 - 12.0 seconds
# Attempt 4: ~16.0 - 24.0 seconds

Why jitter matters: Without jitter, if 100 scraper workers all get 429’d at the same time, they all retry at exactly the same time — creating a synchronized burst that triggers more 429s. Jitter spreads retries randomly across the window.
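The effect is easy to demonstrate: simulate 100 workers that were all 429’d at the same instant and compute their retry times (repeating `backoff_delay` from above so the snippet is self-contained):

```python
import random

def backoff_delay(attempt, base=1, max_delay=60):
    """Exponential backoff with up to 50% additive jitter."""
    delay = min(base * (2 ** attempt), max_delay)
    return delay + random.uniform(0, delay * 0.5)

# 100 workers, all rate-limited at t=0, all retrying on attempt 2.
# Without jitter they would all retry at exactly t=4s; with jitter the
# retries are spread across the 4-6 second band.
retry_times = sorted(backoff_delay(2) for _ in range(100))
spread = retry_times[-1] - retry_times[0]
```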


Distributed Rate Limit Management

For high-volume scraping, you need to manage rate limits across multiple IP addresses:

The Math

Target: 1,000,000 pages/day
Site's rate limit: 60 requests/min per IP
Seconds in a day: 86,400

Requests per IP per day: 60 × 60 × 24 = 86,400
IPs needed: 1,000,000 / 86,400 ≈ 12 IPs

But at 60 req/min, adaptive rate limiting will kick in.
Realistic sustainable rate: ~20 req/min per IP
Adjusted IPs needed: 1,000,000 / 28,800 ≈ 35 IPs

This is where proxy rotation becomes essential — not to hide your identity, but to distribute load across enough IPs to stay within each IP’s rate limit.
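The sizing arithmetic generalizes to a small helper worth keeping in a planning script (the numbers below reproduce the worked example):

```python
import math

def ips_needed(pages_per_day, req_per_min_per_ip):
    """IPs required to hit a daily page target at a per-IP rate limit."""
    capacity_per_ip = req_per_min_per_ip * 60 * 24   # requests per IP per day
    return math.ceil(pages_per_day / capacity_per_ip)

naive = ips_needed(1_000_000, 60)      # theoretical limit
realistic = ips_needed(1_000_000, 20)  # adjusted for adaptive rate limiting
```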

Cost Comparison: Proxy Approaches

| Approach | IPs Available | Cost for 35 IPs | Rate Limit Management |
|---|---|---|---|
| Datacenter proxies | Dedicated | ~$35/mo ($1/IP) | Manual — likely blocked fast |
| Residential rotating | Shared pool | ~$75-150/mo (by GB) | Auto-rotation, but shared pool risks |
| ISP proxies | Dedicated residential | ~$105-175/mo ($3-5/IP) | Best of both worlds |
| Managed scraping API | Handled for you | ~$99-299/mo | Fully managed, includes retries |

Common Rate Limiting Patterns by Site Type

| Site Type | Typical Limit | Enforcement | Detection Signal |
|---|---|---|---|
| Public APIs | Documented (e.g., 100/min) | Retry-After header | API key / IP |
| E-commerce | Undocumented, ~30-60/min | 429 → 403 → ban | IP + session cookie |
| News sites | Generous, ~120/min | Usually just 429 | IP |
| Social platforms | Aggressive, ~10-20/min | 429 → CAPTCHA → ban | Account + IP + fingerprint |
| Government/data portals | Very generous, ~300/min | Polite 429 | IP |

Engineering Patterns

Pattern 1: Request Queue with Per-Domain Throttling

# A rate-aware scraping queue: enforce a minimum delay between requests per domain
import asyncio
import time

class DomainThrottler:
    def __init__(self, requests_per_second=0.5):
        self.min_delay = 1.0 / requests_per_second   # 0.5 req/s → 2s between requests
        self.last_request_time = {}

    async def throttle(self, domain):
        """Sleep just long enough to keep this domain at or below the target rate."""
        now = time.time()
        last = self.last_request_time.get(domain, 0)
        wait = max(0, self.min_delay - (now - last))

        if wait > 0:
            await asyncio.sleep(wait)

        self.last_request_time[domain] = time.time()

Pattern 2: Adaptive Rate Discovery

Start fast, slow down when you hit 429s, speed up when successful:

Initial rate: 2 req/s
├─ 10 successes → increase to 3 req/s
├─ 10 more successes → increase to 4 req/s
├─ Got 429 → decrease to 2 req/s
├─ Wait for Retry-After
├─ 10 successes → increase to 3 req/s
└─ ... converges to the site's actual limit
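A minimal controller for this pattern (the thresholds and step sizes mirror the sketch above and are tunable assumptions):

```python
class AdaptiveRate:
    """Increase the rate after a streak of successes; cut it on a 429.
    Thresholds are assumptions matching the sketch above."""

    def __init__(self, initial=2.0, step=1.0, ceiling=10.0, floor=0.5):
        self.rate = initial          # requests per second
        self.step = step
        self.ceiling = ceiling
        self.floor = floor
        self.successes = 0

    def on_success(self):
        self.successes += 1
        if self.successes >= 10:     # 10 consecutive successes → speed up
            self.rate = min(self.ceiling, self.rate + self.step)
            self.successes = 0

    def on_429(self):
        self.successes = 0           # a 429 resets the streak and backs off hard
        self.rate = max(self.floor, self.rate - 2 * self.step)

limiter = AdaptiveRate(initial=2.0)
for _ in range(20):
    limiter.on_success()             # 20 successes: 2 → 3 → 4 req/s
limiter.on_429()                     # 429: back down to 2 req/s
```

In production you would call `on_success`/`on_429` from your response handler and feed `rate` into a throttler like the one in Pattern 1.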

Pattern 3: Polite Scraping Defaults

POLITE_DEFAULTS = {
    "delay_range": (2, 8),           # seconds between requests
    "concurrent_per_domain": 1,       # one request at a time per domain
    "respect_robots_txt": True,
    "retry_on_429": True,
    "max_retries": 3,
    "backoff_factor": 2,
    "daily_limit_per_domain": 5000,   # self-imposed
    "user_agent_rotation": True,
}

Key Takeaways

  1. 429 is recoverable — unlike 403, the server explicitly invites you to retry later.
  2. Always check Retry-After before implementing custom backoff.
  3. Exponential backoff + jitter prevents thundering herd problems in distributed scrapers.
  4. Adaptive rate limiting means your sustainable throughput decreases over time. Plan for 30-50% of the theoretical maximum.
  5. Proxy rotation solves rate limits through parallelism, not bypass. You’re distributing load, not hiding.
  6. The break-even point between self-managed proxies and a managed API is typically around 500K-1M requests/month.

ProxyOps Team

Independent infrastructure reviews from engineers who've deployed at scale. No vendor bias, just data.