429 Too Many Requests: Rate Limiting for Scrapers
How rate limiting works, why your scraper gets 429 errors, and engineering patterns for handling request throttling at scale.
HTTP 429 is the most honest error code in scraping. The server isn’t confused about who you are — it knows exactly what you’re doing — it’s simply saying: “Slow down.”
Unlike a 403 (which means “go away”), a 429 is an invitation to retry. But doing it wrong — hammering the server with immediate retries — will escalate your 429 into a permanent IP ban.
This guide covers how rate limiting actually works on the server side, and the engineering patterns that let you maximize throughput without triggering blocks.
How Rate Limiting Works (Server Side)
Understanding the server’s perspective helps you work within its constraints:
Token Bucket Algorithm
The most common rate limiting implementation. The server maintains a “bucket” for each IP (or API key):
```
Token Bucket for IP 203.0.113.42:
├─ Capacity: 60 tokens
├─ Refill rate: 1 token/second
├─ Current tokens: 0 ← you've used them all
└─ Next refill: 1 second
```
Key insight: Token buckets allow bursts. If you haven’t made requests in a while, your bucket fills up. This means you can fire off 60 rapid requests, but after that you’re capped at the refill rate — here, one request per second — until the bucket recovers (a full refill takes 60 seconds).
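One way to stay under a token-bucket limit is to mirror it client-side and never send a request without a token. A minimal sketch, using the illustrative capacity and refill rate from the diagram above (tune both to the target site):

```python
import time

class TokenBucket:
    """Client-side token bucket mirroring the server's limiter.

    Capacity and refill_rate are assumptions for illustration;
    match them to the site you're scraping.
    """
    def __init__(self, capacity=60, refill_rate=1.0):
        self.capacity = capacity
        self.refill_rate = refill_rate  # tokens per second
        self.tokens = capacity
        self.last_refill = time.monotonic()

    def acquire(self):
        """Block until a token is available, then consume it."""
        while True:
            now = time.monotonic()
            # Refill proportionally to elapsed time, capped at capacity
            self.tokens = min(
                self.capacity,
                self.tokens + (now - self.last_refill) * self.refill_rate,
            )
            self.last_refill = now
            if self.tokens >= 1:
                self.tokens -= 1
                return
            # Sleep just long enough for the next token to appear
            time.sleep((1 - self.tokens) / self.refill_rate)
```

Call `bucket.acquire()` before every request; bursts drain the bucket, then the loop naturally throttles you to the refill rate.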
Sliding Window
More sophisticated sites use a sliding window counter:
```
Rate limit: 100 requests per 60-second window

Your request history:
├─ 11:00:00 → 11:00:42: 95 requests ✅
├─ 11:00:43: request #96 ✅
├─ 11:00:44: request #97 ✅
├─ 11:00:45: request #98 ✅
├─ 11:00:46: request #99 ✅
├─ 11:00:47: request #100 ✅
└─ 11:00:48: request #101 → 429 ❌
```
No burst allowance. Every request counts equally within the window.
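A sliding-window counter is easy to replicate client-side too: keep the timestamps of recent requests and refuse to send once the window is full. A minimal sketch using the 100-per-60-seconds limit from the example above:

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Client-side mirror of a sliding-window counter.

    Defaults match the example limit above (100 req / 60 s);
    adjust to the target site's actual window.
    """
    def __init__(self, max_requests=100, window_seconds=60):
        self.max_requests = max_requests
        self.window = window_seconds
        self.timestamps = deque()

    def allow(self):
        """Return True if a request may be sent now, recording it."""
        now = time.monotonic()
        # Evict timestamps that have aged out of the window
        while self.timestamps and now - self.timestamps[0] > self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.max_requests:
            self.timestamps.append(now)
            return True
        return False
```

Unlike the token bucket, this gives no burst credit: the 101st request inside any 60-second span is refused, exactly as in the trace above.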
Adaptive Rate Limiting
Enterprise bot-protection adjusts limits dynamically:
- New IP → generous limit (100 req/min)
- After 500 requests → reduced limit (30 req/min)
- Bot-like pattern detected → aggressive limit (5 req/min)
- Known bot signature → instant block (0 req/min)
This is why a scraper “works for a while then suddenly stops” — the server is tightening the limits as it gains confidence you’re automated.
The Retry-After Header
When a well-configured server sends a 429, it includes a Retry-After header:
```http
HTTP/1.1 429 Too Many Requests
Retry-After: 30
Content-Type: application/json

{"error": "Rate limit exceeded. Try again in 30 seconds."}
```
The value can be either:
- Seconds: `Retry-After: 30` → wait 30 seconds
- Date: `Retry-After: Wed, 26 Feb 2026 11:45:00 GMT` → wait until that time
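Since both forms appear in the wild, it's worth parsing them defensively. A small helper, assuming a 30-second fallback when the header is missing or malformed:

```python
import time
from email.utils import parsedate_to_datetime

def retry_after_seconds(header_value, default=30):
    """Parse a Retry-After header into seconds to wait.

    Handles both forms: integer seconds and an HTTP-date.
    The `default` fallback is an assumption, not a standard.
    """
    if header_value is None:
        return default
    # Form 1: delay in seconds, e.g. "30"
    try:
        return max(0, int(header_value))
    except ValueError:
        pass
    # Form 2: HTTP-date, e.g. "Wed, 26 Feb 2026 11:45:00 GMT"
    try:
        target = parsedate_to_datetime(header_value)
        return max(0.0, target.timestamp() - time.time())
    except (TypeError, ValueError):
        return default
```

A date in the past (clock skew, stale caches) clamps to zero rather than producing a negative sleep.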
Always respect this header. Ignoring it and retrying immediately will:
- Waste your bandwidth
- Potentially trigger an escalation to IP ban
- Never succeed (the server won’t serve you until the cooldown expires)
```python
import time
import requests

HEADERS = {"User-Agent": "Mozilla/5.0"}  # your usual request headers

def fetch_with_retry(url, max_retries=5):
    for attempt in range(max_retries):
        response = requests.get(url, headers=HEADERS)
        if response.status_code == 200:
            return response
        if response.status_code == 429:
            retry_after = int(response.headers.get("Retry-After", 30))
            print(f"Rate limited. Waiting {retry_after}s (attempt {attempt + 1})")
            time.sleep(retry_after)
            continue
        response.raise_for_status()
    raise Exception(f"Failed after {max_retries} retries")
```
Exponential Backoff with Jitter
When there’s no Retry-After header, exponential backoff is the standard pattern:
```python
import random
import time

def backoff_delay(attempt, base=1, max_delay=60):
    """Calculate delay with exponential backoff + jitter."""
    delay = min(base * (2 ** attempt), max_delay)
    jitter = random.uniform(0, delay * 0.5)
    return delay + jitter

# Attempt 0: ~1.0 - 1.5 seconds
# Attempt 1: ~2.0 - 3.0 seconds
# Attempt 2: ~4.0 - 6.0 seconds
# Attempt 3: ~8.0 - 12.0 seconds
# Attempt 4: ~16.0 - 24.0 seconds
```
Why jitter matters: Without jitter, if 100 scraper workers all get 429’d at the same time, they all retry at exactly the same time — creating a synchronized burst that triggers more 429s. Jitter spreads retries randomly across the window.
Distributed Rate Limit Management
For high-volume scraping, you need to manage rate limits across multiple IP addresses:
The Math
```
Target: 1,000,000 pages/day
Site's rate limit: 60 requests/min per IP

Requests per IP per day: 60 × 60 × 24 = 86,400
IPs needed: 1,000,000 / 86,400 ≈ 12 IPs

But at a sustained 60 req/min, adaptive rate limiting will kick in.
Realistic sustainable rate: ~20 req/min per IP
Requests per IP per day: 20 × 60 × 24 = 28,800
Adjusted IPs needed: 1,000,000 / 28,800 ≈ 35 IPs
```
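The capacity math above generalizes to a one-liner you can rerun as your measured sustainable rate changes:

```python
import math

def ips_needed(pages_per_day, sustainable_req_per_min):
    """Minimum IP count to hit a daily page target within per-IP limits."""
    req_per_ip_per_day = sustainable_req_per_min * 60 * 24
    return math.ceil(pages_per_day / req_per_ip_per_day)

# The figures from the worked example above:
ips_needed(1_000_000, 60)  # → 12 IPs at the nominal limit
ips_needed(1_000_000, 20)  # → 35 IPs at the realistic rate
```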
This is where proxy rotation becomes essential — not to hide your identity, but to distribute load across enough IPs to stay within each IP’s rate limit.
Cost Comparison: Proxy Approaches
| Approach | IPs Available | Cost for 35 IPs | Rate Limit Management |
|---|---|---|---|
| Datacenter proxies | Dedicated | ~$35/mo ($1/IP) | Manual — likely blocked fast |
| Residential rotating | Shared pool | ~$75-150/mo (by GB) | Auto-rotation, but shared pool risks |
| ISP proxies | Dedicated residential | ~$105-175/mo ($3-5/IP) | Best of both worlds |
| Managed scraping API | Handled for you | ~$99-299/mo | Fully managed, includes retries |
Common Rate Limiting Patterns by Site Type
| Site Type | Typical Limit | Enforcement | Detection Signal |
|---|---|---|---|
| Public APIs | Documented (e.g., 100/min) | Retry-After header | API key / IP |
| E-commerce | Undocumented, ~30-60/min | 429 → 403 → ban | IP + session cookie |
| News sites | Generous, ~120/min | Usually just 429 | IP |
| Social platforms | Aggressive, ~10-20/min | 429 → CAPTCHA → ban | Account + IP + fingerprint |
| Government/data portals | Very generous, ~300/min | Polite 429 | IP |
Engineering Patterns
Pattern 1: Request Queue with Per-Domain Throttling
```python
# A rate-aware scraping queue's per-domain throttle
import asyncio
import time

class DomainThrottler:
    """Enforce a minimum delay between requests to the same domain."""
    def __init__(self, requests_per_second=0.5):
        self.min_delay = 1.0 / requests_per_second
        self.last_request_time = {}

    async def throttle(self, domain):
        now = time.time()
        last = self.last_request_time.get(domain, 0)
        wait = max(0, self.min_delay - (now - last))
        if wait > 0:
            await asyncio.sleep(wait)
        self.last_request_time[domain] = time.time()
```
Pattern 2: Adaptive Rate Discovery
Start fast, slow down when you hit 429s, speed up when successful:
```
Initial rate: 2 req/s
├─ 10 successes → increase to 3 req/s
├─ 10 more successes → increase to 4 req/s
├─ Got 429 → decrease to 2 req/s
├─ Wait for Retry-After
├─ 10 successes → increase to 3 req/s
└─ ... converges to the site's actual limit
```
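The trace above is additive-increase / multiplicative-decrease (AIMD), the same idea TCP uses for congestion control. A minimal sketch with illustrative thresholds (the step size, floor, and ceiling are assumptions to tune):

```python
class AdaptiveRate:
    """AIMD rate discovery: add on success, halve on 429.

    All parameters are illustrative defaults, not measured values.
    """
    def __init__(self, initial=2.0, step=1.0, floor=0.5, ceiling=10.0):
        self.rate = initial       # current requests per second
        self.step = step          # additive increase
        self.floor = floor
        self.ceiling = ceiling
        self.successes = 0

    def on_success(self):
        self.successes += 1
        if self.successes >= 10:  # 10 clean responses → speed up
            self.rate = min(self.ceiling, self.rate + self.step)
            self.successes = 0

    def on_429(self):
        # Back off hard, then rebuild confidence from zero
        self.rate = max(self.floor, self.rate / 2)
        self.successes = 0
```

Feed `rate` into a throttler like `DomainThrottler` above and the scraper converges on the site's real limit without hardcoding it.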
Pattern 3: Polite Scraping Defaults
```python
POLITE_DEFAULTS = {
    "delay_range": (2, 8),            # seconds between requests
    "concurrent_per_domain": 1,       # one request at a time per domain
    "respect_robots_txt": True,
    "retry_on_429": True,
    "max_retries": 3,
    "backoff_factor": 2,
    "daily_limit_per_domain": 5000,   # self-imposed
    "user_agent_rotation": True,
}
```
Key Takeaways
- 429 is recoverable — unlike 403, the server explicitly invites you to retry later.
- Always check `Retry-After` before implementing custom backoff.
- Exponential backoff + jitter prevents thundering herd problems in distributed scrapers.
- Adaptive rate limiting means your sustainable throughput decreases over time. Plan for 30-50% of the theoretical maximum.
- Proxy rotation solves rate limits through parallelism, not bypass. You’re distributing load, not hiding.
- The break-even point between self-managed proxies and a managed API is typically around 500K-1M requests/month.
ProxyOps Team
Independent infrastructure reviews from engineers who've deployed at scale. No vendor bias, just data.