
429 Too Many Requests: Rate Limiting for Scrapers

How rate limiting works, why your scraper gets 429 errors, and engineering patterns for handling request throttling at scale.

By ProxyOps Team

429 Too Many Requests: The Rate Limiting Deep Dive

HTTP 429 is the most honest error code in scraping. The server isn’t confused about who you are — it knows exactly what you’re doing — it’s simply saying: “Slow down.”

Unlike a 403 (which means “go away”), a 429 is an invitation to retry. But doing it wrong — hammering the server with immediate retries — will escalate your 429 into a permanent IP ban.

This guide covers how rate limiting actually works on the server side, and the engineering patterns that let you maximize throughput without triggering blocks.


How Rate Limiting Works (Server Side)

Understanding the server’s perspective helps you work within its constraints:

Token Bucket Algorithm

The most common rate limiting implementation. The server maintains a “bucket” for each IP (or API key):

Token Bucket for IP 203.0.113.42:
├─ Capacity: 60 tokens
├─ Refill rate: 1 token/second
├─ Current tokens: 0 ← you've used them all
└─ Next refill: 1 second

Key insight: Token buckets allow bursts. If you haven’t made requests in a while, your bucket fills up. This means you can fire 60 rapid requests, but once the bucket is empty you’re throttled down to the refill rate (1 request/second here), and a full 60-request burst only becomes available again after 60 idle seconds.
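The mechanism is easy to model client-side. A sketch, assuming the capacity and refill rate from the diagram above:

```python
import time

class TokenBucket:
    """Client-side model of a server's token bucket.
    Capacity and refill rate are assumptions taken from the example above."""

    def __init__(self, capacity=60, refill_rate=1.0):
        self.capacity = capacity
        self.refill_rate = refill_rate        # tokens added per second
        self.tokens = float(capacity)         # bucket starts full
        self.last_refill = time.monotonic()

    def allow(self):
        """Refill based on elapsed time, then try to spend one token."""
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(capacity=60, refill_rate=1.0)
# A fresh bucket permits a 60-request burst; the 61st immediate request is denied.
results = [bucket.allow() for _ in range(61)]
```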

Sliding Window

More sophisticated sites use a sliding window counter:

Rate limit: 100 requests per 60-second window

Your request history:
├─ 11:00:00 → 11:00:42: 95 requests ✅
├─ 11:00:43: request #96 ✅
├─ 11:00:44: request #97 ✅
├─ 11:00:45: request #98 ✅
├─ 11:00:46: request #99 ✅
├─ 11:00:47: request #100 ✅
└─ 11:00:48: request #101 → 429 ❌

No burst allowance. Every request counts equally within the window.
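The window check can be modeled with a deque of request timestamps. A sketch, using the 100-requests-per-60-seconds limit from the example:

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Sliding-window log: allow a request only if fewer than `limit`
    requests fall inside the trailing `window` seconds."""

    def __init__(self, limit=100, window=60.0):
        self.limit = limit
        self.window = window
        self.timestamps = deque()

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Evict timestamps that have slid out of the window.
        while self.timestamps and now - self.timestamps[0] >= self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False

limiter = SlidingWindowLimiter(limit=100, window=60.0)
# 101 requests spread over ~40 seconds: the first 100 pass, #101 is rejected.
inside = [limiter.allow(now=t * 0.4) for t in range(101)]
```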

Adaptive Rate Limiting

Enterprise bot-protection adjusts limits dynamically:

  • New IP → generous limit (100 req/min)
  • After 500 requests → reduced limit (30 req/min)
  • Bot-like pattern detected → aggressive limit (5 req/min)
  • Known bot signature → instant block (0 req/min)

This is why a scraper “works for a while then suddenly stops” — the server is tightening the limits as it gains confidence you’re automated.


The Retry-After Header

When a well-configured server sends a 429, it includes a Retry-After header:

HTTP/1.1 429 Too Many Requests
Retry-After: 30
Content-Type: application/json

{"error": "Rate limit exceeded. Try again in 30 seconds."}

The value can be either:

  • Seconds: Retry-After: 30 → wait 30 seconds
  • Date: Retry-After: Wed, 26 Feb 2026 11:45:00 GMT → wait until that time
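A small helper can normalize both forms to a number of seconds. A sketch; `parsedate_to_datetime` from the standard library handles the HTTP-date format, and the 30-second default is an assumption:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def parse_retry_after(value, default=30.0):
    """Return seconds to wait from a Retry-After header value,
    which may be an integer count of seconds or an HTTP-date."""
    if value is None:
        return default
    # Seconds form: "30"
    try:
        return max(0.0, float(value))
    except ValueError:
        pass
    # Date form: "Wed, 26 Feb 2026 11:45:00 GMT"
    try:
        target = parsedate_to_datetime(value)
        return max(0.0, (target - datetime.now(timezone.utc)).total_seconds())
    except (TypeError, ValueError):
        return default
```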

Always respect this header. Ignoring it and retrying immediately will:

  1. Waste your bandwidth
  2. Potentially trigger an escalation to IP ban
  3. Never succeed (the server won’t serve you until the cooldown expires)

A minimal retry loop that honors the header (HEADERS is a stand-in for your real request headers):

import time
import requests

HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; example-scraper)"}

def fetch_with_retry(url, max_retries=5):
    for attempt in range(max_retries):
        response = requests.get(url, headers=HEADERS, timeout=30)

        if response.status_code == 200:
            return response

        if response.status_code == 429:
            # Retry-After may also be an HTTP-date; this handles the seconds form.
            retry_after = int(response.headers.get("Retry-After", 30))
            print(f"Rate limited. Waiting {retry_after}s (attempt {attempt + 1})")
            time.sleep(retry_after)
            continue

        response.raise_for_status()

    raise Exception(f"Failed after {max_retries} retries")

Exponential Backoff with Jitter

When there’s no Retry-After header, exponential backoff is the standard pattern:

import random
import time

def backoff_delay(attempt, base=1, max_delay=60):
    """Calculate delay with exponential backoff + jitter."""
    delay = min(base * (2 ** attempt), max_delay)
    jitter = random.uniform(0, delay * 0.5)
    return delay + jitter

# Attempt 0: ~1.0 - 1.5 seconds
# Attempt 1: ~2.0 - 3.0 seconds
# Attempt 2: ~4.0 - 6.0 seconds
# Attempt 3: ~8.0 - 12.0 seconds
# Attempt 4: ~16.0 - 24.0 seconds

Why jitter matters: Without jitter, if 100 scraper workers all get 429’d at the same time, they all retry at exactly the same time — creating a synchronized burst that triggers more 429s. Jitter spreads retries randomly across the window.
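The effect is easy to demonstrate: simulate 100 workers that were all 429’d at the same instant and compute their retry times (repeating `backoff_delay` from above so the snippet is self-contained):

```python
import random

def backoff_delay(attempt, base=1, max_delay=60):
    """Exponential backoff with up to 50% additive jitter."""
    delay = min(base * (2 ** attempt), max_delay)
    return delay + random.uniform(0, delay * 0.5)

# 100 workers, all rate-limited at t=0, all retrying on attempt 2.
# Without jitter they would all retry at exactly t=4s; with jitter the
# retries are spread across the 4-6 second band.
retry_times = sorted(backoff_delay(2) for _ in range(100))
spread = retry_times[-1] - retry_times[0]
```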


Distributed Rate Limit Management

For high-volume scraping, you need to manage rate limits across multiple IP addresses:

The Math

Target: 1,000,000 pages/day
Site's rate limit: 60 requests/min per IP
Seconds in a day: 86,400

Requests per IP per day: 60 × 60 × 24 = 86,400
IPs needed: 1,000,000 / 86,400 ≈ 12 IPs

But at 60 req/min, adaptive rate limiting will kick in.
Realistic sustainable rate: ~20 req/min per IP
Adjusted IPs needed: 1,000,000 / 28,800 ≈ 35 IPs

This is where proxy rotation becomes essential — not to hide your identity, but to distribute load across enough IPs to stay within each IP’s rate limit.
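The sizing arithmetic generalizes to a small helper worth keeping in a planning script (the numbers below reproduce the worked example):

```python
import math

def ips_needed(pages_per_day, req_per_min_per_ip):
    """IPs required to hit a daily page target at a per-IP rate limit."""
    capacity_per_ip = req_per_min_per_ip * 60 * 24   # requests per IP per day
    return math.ceil(pages_per_day / capacity_per_ip)

naive = ips_needed(1_000_000, 60)      # theoretical limit
realistic = ips_needed(1_000_000, 20)  # adjusted for adaptive rate limiting
```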

Cost Comparison: Proxy Approaches

| Approach | IPs Available | Cost for 35 IPs | Rate Limit Management |
|---|---|---|---|
| Datacenter proxies | Dedicated | ~$35/mo ($1/IP) | Manual — likely blocked fast |
| Residential rotating | Shared pool | ~$75-150/mo (by GB) | Auto-rotation, but shared pool risks |
| ISP proxies | Dedicated residential | ~$105-175/mo ($3-5/IP) | Best of both worlds |
| Managed scraping API | Handled for you | ~$99-299/mo | Fully managed, includes retries |

Common Rate Limiting Patterns by Site Type

| Site Type | Typical Limit | Enforcement | Detection Signal |
|---|---|---|---|
| Public APIs | Documented (e.g., 100/min) | Retry-After header | API key / IP |
| E-commerce | Undocumented, ~30-60/min | 429 → 403 → ban | IP + session cookie |
| News sites | Generous, ~120/min | Usually just 429 | IP |
| Social platforms | Aggressive, ~10-20/min | 429 → CAPTCHA → ban | Account + IP + fingerprint |
| Government/data portals | Very generous, ~300/min | Polite 429 | IP |

Engineering Patterns

Pattern 1: Request Queue with Per-Domain Throttling

# A rate-aware scraping queue: enforce a minimum delay between requests per domain
import asyncio
import time

class DomainThrottler:
    def __init__(self, requests_per_second=0.5):
        self.min_delay = 1.0 / requests_per_second   # 0.5 req/s → 2s between requests
        self.last_request_time = {}

    async def throttle(self, domain):
        """Sleep just long enough to keep this domain at or below the target rate."""
        now = time.time()
        last = self.last_request_time.get(domain, 0)
        wait = max(0, self.min_delay - (now - last))

        if wait > 0:
            await asyncio.sleep(wait)

        self.last_request_time[domain] = time.time()

Pattern 2: Adaptive Rate Discovery

Start fast, slow down when you hit 429s, speed up when successful:

Initial rate: 2 req/s
├─ 10 successes → increase to 3 req/s
├─ 10 more successes → increase to 4 req/s
├─ Got 429 → decrease to 2 req/s
├─ Wait for Retry-After
├─ 10 successes → increase to 3 req/s
└─ ... converges to the site's actual limit
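A minimal controller for this pattern (the thresholds and step sizes mirror the sketch above and are tunable assumptions):

```python
class AdaptiveRate:
    """Increase the rate after a streak of successes; cut it on a 429.
    Thresholds are assumptions matching the sketch above."""

    def __init__(self, initial=2.0, step=1.0, ceiling=10.0, floor=0.5):
        self.rate = initial          # requests per second
        self.step = step
        self.ceiling = ceiling
        self.floor = floor
        self.successes = 0

    def on_success(self):
        self.successes += 1
        if self.successes >= 10:     # 10 consecutive successes → speed up
            self.rate = min(self.ceiling, self.rate + self.step)
            self.successes = 0

    def on_429(self):
        self.successes = 0           # a 429 resets the streak and backs off hard
        self.rate = max(self.floor, self.rate - 2 * self.step)

limiter = AdaptiveRate(initial=2.0)
for _ in range(20):
    limiter.on_success()             # 20 successes: 2 → 3 → 4 req/s
limiter.on_429()                     # 429: back down to 2 req/s
```

In production you would call `on_success`/`on_429` from your response handler and feed `rate` into a throttler like the one in Pattern 1.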

Pattern 3: Polite Scraping Defaults

POLITE_DEFAULTS = {
    "delay_range": (2, 8),           # seconds between requests
    "concurrent_per_domain": 1,       # one request at a time per domain
    "respect_robots_txt": True,
    "retry_on_429": True,
    "max_retries": 3,
    "backoff_factor": 2,
    "daily_limit_per_domain": 5000,   # self-imposed
    "user_agent_rotation": True,
}

Key Takeaways

  1. 429 is recoverable — unlike 403, the server explicitly invites you to retry later.
  2. Always check Retry-After before implementing custom backoff.
  3. Exponential backoff + jitter prevents thundering herd problems in distributed scrapers.
  4. Adaptive rate limiting means your sustainable throughput decreases over time. Plan for 30-50% of the theoretical maximum.
  5. Proxy rotation solves rate limits through parallelism, not bypass. You’re distributing load, not hiding.
  6. The break-even point between self-managed proxies and a managed API is typically around 500K-1M requests/month.

ProxyOps Team

Independent infrastructure reviews from engineers who've deployed at scale. No vendor bias, just data.