How Datadome Bot Detection Works (Technical Deep Dive)
Datadome protects over 10,000 websites — primarily e-commerce, ticketing, and classified-ad platforms. If you’ve ever hit a page that suddenly showed a CAPTCHA with the Datadome logo, or received an x-datadome header in a response, you’ve encountered one of the most sophisticated bot-detection systems in production.
This article explains how Datadome detects automated traffic, not how to circumvent it. Understanding the detection mechanisms helps data teams make informed decisions about their collection infrastructure.
Datadome’s Detection Stack
Datadome operates as a reverse proxy, analyzing traffic before it reaches the origin server. Every request is evaluated across five detection layers simultaneously, with each layer contributing to a composite trust score.
Layer 1: Server-Side Signal Analysis
Before any code runs in the browser, Datadome analyzes the raw network request:
TLS Fingerprinting
Like Cloudflare, Datadome generates a JA3/JA4 hash from your TLS Client Hello. The fingerprint identifies whether the connection comes from a real browser, an HTTP library, or a headless browser.
Detection logic:
1. Extract TLS Client Hello parameters
2. Generate JA3 hash
3. Compare against known browser fingerprints
4. Flag mismatches (Python requests ≠ Chrome)
HTTP Header Analysis
Datadome checks header completeness, ordering, and internal consistency. It looks for:
- Missing sec-ch-ua-* Client Hints (Chrome always sends these)
- Incorrect header ordering (each browser has a unique order)
- Inconsistencies between User-Agent and other headers
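An ordering check of this kind can be sketched as below. The Chrome-like reference order is an assumption for illustration, not Datadome's actual profile; real checks would maintain per-browser, per-version profiles.

```javascript
// Sketch: check whether observed header order matches a browser profile.
// The Chrome-like order below is an illustrative assumption.
const CHROME_ORDER = ["host", "connection", "sec-ch-ua", "sec-ch-ua-mobile",
  "user-agent", "accept", "accept-encoding", "accept-language"];

function orderMatches(observed, expected) {
  // Keep only headers present in the profile, then check relative order.
  const filtered = observed.map(h => h.toLowerCase())
                           .filter(h => expected.includes(h));
  let last = -1;
  for (const h of filtered) {
    const idx = expected.indexOf(h);
    if (idx < last) return false; // out of relative order → suspicious
    last = idx;
  }
  return true;
}

console.log(orderMatches(["Host", "Connection", "User-Agent", "Accept"], CHROME_ORDER)); // true
console.log(orderMatches(["Accept", "Host", "User-Agent"], CHROME_ORDER));               // false
```

Many HTTP libraries emit headers alphabetically or in insertion order, which is exactly the mismatch this kind of check catches.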
IP Reputation
Every IP is classified:
- Datacenter vs. residential vs. mobile
- Association with known proxy/VPN providers
- Historical bot activity from that IP or subnet
- Geographic plausibility (Swedish IP accessing Japanese site?)
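Conceptually, these signals fold into a single reputation score. The weights and penalties in this sketch are illustrative guesses, not Datadome's values:

```javascript
// Sketch of an IP reputation score combining the signals listed above.
// All weights are illustrative assumptions.
function ipReputation({ type, knownProxy, priorBotHits, geoPlausible }) {
  let score = 100;                          // start fully trusted
  if (type === "datacenter") score -= 40;   // datacenter IPs start penalized
  if (knownProxy) score -= 30;              // listed proxy/VPN exit node
  score -= Math.min(priorBotHits * 5, 30);  // bot history from this IP/subnet
  if (!geoPlausible) score -= 10;           // odd geography is a weaker signal
  return Math.max(score, 0);
}

const residential = ipReputation({ type: "residential", knownProxy: false, priorBotHits: 0, geoPlausible: true });
const dcProxy = ipReputation({ type: "datacenter", knownProxy: true, priorBotHits: 10, geoPlausible: true });
console.log(residential, dcProxy); // 100 and 0 — opposite ends of the scale
```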
Layer 2: JavaScript Fingerprinting (Client-Side Tag)
This is where Datadome becomes significantly more sophisticated than basic WAF rules. A JavaScript tag injected into every page collects:
Browser Environment
```javascript
// Datadome's JS tag collects signals like:
navigator.userAgent                                // Browser identification
navigator.platform                                 // OS platform
navigator.hardwareConcurrency                      // CPU cores
navigator.deviceMemory                             // RAM (Chrome only)
screen.width / screen.height                       // Screen resolution
window.devicePixelRatio                            // Display density
Intl.DateTimeFormat().resolvedOptions().timeZone   // Timezone
```
Canvas Fingerprinting
Datadome renders invisible images using the HTML Canvas API and WebGL. The rendering output varies by:
- GPU manufacturer and model
- Graphics driver version
- Browser rendering engine
- Operating system
This creates a device fingerprint that’s consistent for real devices but inconsistent for emulated environments. Headless browsers in VMs produce canvas fingerprints that don’t match any known real device configuration.
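On the server side, the consistency check reduces to a lookup: has this canvas hash ever been observed from a real device with the claimed platform? A minimal sketch, with fabricated placeholder hashes:

```javascript
// Sketch of a server-side canvas-consistency lookup.
// All hashes below are made-up placeholders, not real fingerprints.
const KNOWN_HASHES = {
  "Chrome/macOS":   new Set(["3f6c", "9be1"]),
  "Chrome/Windows": new Set(["77aa"]),
};

function canvasConsistent(claimedPlatform, canvasHash) {
  const known = KNOWN_HASHES[claimedPlatform];
  return Boolean(known && known.has(canvasHash));
}

console.log(canvasConsistent("Chrome/macOS", "3f6c")); // true: matches a real device
console.log(canvasConsistent("Chrome/macOS", "77aa")); // false: that hash belongs to Windows
```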
JavaScript Engine Properties
```javascript
// Headless browser detection signals:
navigator.webdriver        // true in automation tools
window.chrome              // missing in some headless configs
navigator.plugins.length   // 0 in headless browsers
navigator.languages        // often incomplete in automation

// Advanced detection:
Function.prototype.toString.call(HTMLElement.prototype.click)
// Returns different results in patched vs. real browsers
```
Layer 3: The “Picasso” Challenge
This is Datadome’s most innovative detection — and the one that catches sophisticated scrapers who pass all other checks.
How Picasso works:
- Datadome sends a set of graphical rendering instructions to the client
- The browser must execute these instructions using Canvas/WebGL
- The rendering output is sent back to Datadome
- Datadome verifies the output matches what the claimed browser/OS combination should produce
Why it’s effective:
- A real Chrome on macOS produces a specific pixel-perfect rendering
- Chrome on Windows produces a slightly different rendering (different font rendering engine)
- A headless Chrome in Docker produces yet another rendering (no GPU, software rendering)
- If the Picasso output doesn’t match the claimed User-Agent + platform, the request is flagged
This means you can’t just say you’re “Chrome on macOS” — you must actually render like Chrome on macOS. Spoofing User-Agent and headers is insufficient; the visual output must be consistent.
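A server-side sketch of that verification step, with fabricated challenge IDs and expected outputs (the real system presumably rotates instruction sets and stores far richer expected-output data):

```javascript
// Sketch of Picasso-style verification: the server knows what each real
// browser/OS pair should render for a given instruction set, and flags
// clients whose output disagrees with their claimed identity.
// All values below are fabricated placeholders.
const EXPECTED = {
  "challenge-17": {
    "Chrome/macOS":          "e0a4",
    "Chrome/Windows":        "b3d9", // differs: different font rasterizer
    "HeadlessChrome/Linux":  "41c7", // software rendering, no GPU
  },
};

function verifyPicasso(challengeId, claimedIdentity, renderedHash) {
  const expected = EXPECTED[challengeId]?.[claimedIdentity];
  return renderedHash === expected;
}

// A headless browser claiming "Chrome/macOS" but rendering like software
// Chrome fails even though its headers and JS properties look right:
console.log(verifyPicasso("challenge-17", "Chrome/macOS", "41c7")); // false
console.log(verifyPicasso("challenge-17", "Chrome/macOS", "e0a4")); // true
```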
Layer 4: Behavioral Analysis (ML)
Datadome’s machine learning models analyze:
Mouse and Touch Behavior
```
Human patterns:
├─ Curved mouse movements (Bézier-like paths)
├─ Variable movement speed (accelerate/decelerate)
├─ Natural click positions (not pixel-perfect center)
├─ Occasional scroll events between actions
└─ Idle periods (reading content)

Bot patterns:
├─ Linear or absent mouse movements
├─ Instant teleportation between coordinates
├─ Perfectly centered clicks
├─ No scroll events
└─ Immediate action upon page load
```
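One concrete feature that separates the two columns is path straightness. The sketch below is an illustrative feature extractor, not Datadome's model:

```javascript
// Sketch of one behavioral feature: how straight a mouse path is.
// A ratio near 1.0 means a nearly straight (bot-like) path; humans
// typically trace longer, curved paths. Thresholds are illustrative.
function pathStraightness(points) {
  let pathLen = 0;
  for (let i = 1; i < points.length; i++) {
    pathLen += Math.hypot(points[i].x - points[i - 1].x,
                          points[i].y - points[i - 1].y);
  }
  const end = points[points.length - 1];
  const direct = Math.hypot(end.x - points[0].x, end.y - points[0].y);
  return pathLen === 0 ? 0 : direct / pathLen; // 1.0 = perfectly straight
}

const botPath   = [{x:0,y:0}, {x:50,y:0}, {x:100,y:0}];            // straight line
const humanPath = [{x:0,y:0}, {x:40,y:25}, {x:80,y:10}, {x:100,y:0}];
console.log(pathStraightness(botPath));   // 1
console.log(pathStraightness(humanPath)); // < 1
```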
Timing Patterns
- Time from page load to first interaction
- Consistency of delay between actions
- Whether timings follow human distributions (typically log-normal)
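The distribution check above can be approximated by looking at the variance of log-delays: scripted fixed sleeps collapse to zero variance, while human delays are skewed and spread out. A sketch with illustrative data and an assumed threshold:

```javascript
// Sketch of a timing-distribution check. Human inter-action delays tend
// to be log-normal (varied, skewed); bots often fire at fixed intervals,
// so a near-zero variance of log-delays is a strong automation signal.
function logDelayVariance(delaysMs) {
  const logs = delaysMs.map(Math.log);
  const mean = logs.reduce((a, b) => a + b, 0) / logs.length;
  return logs.reduce((a, b) => a + (b - mean) ** 2, 0) / logs.length;
}

const botDelays   = [500, 500, 500, 500];     // scripted, fixed sleep
const humanDelays = [230, 1800, 540, 4100];   // reading, hesitating, clicking
console.log(logDelayVariance(botDelays));        // 0 — suspiciously uniform
console.log(logDelayVariance(humanDelays) > 0.1); // true — human-like spread
```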
Navigation Patterns
- Do you visit pages in a logical order?
- Do you load resources (CSS, images, fonts) like a real browser?
- Do you follow the expected referrer chain?
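The resource-loading check, for instance, reduces to counting subresource fetches per document fetch; many scrapers request only the HTML. The ratio threshold below is an assumption for illustration:

```javascript
// Sketch of a resource-loading check: a real browser that fetched the
// HTML also fetches its CSS, images, and fonts. Threshold is illustrative.
function loadsSubresources(requests) {
  const docs = requests.filter(r => r.type === "document").length;
  const assets = requests.filter(r =>
    ["stylesheet", "image", "font", "script"].includes(r.type)).length;
  return docs > 0 && assets / docs >= 3; // real pages pull many assets per page
}

const browserSession = [
  { type: "document" }, { type: "stylesheet" }, { type: "script" },
  { type: "image" }, { type: "font" },
];
const scraperSession = [{ type: "document" }, { type: "document" }];
console.log(loadsSubresources(browserSession)); // true
console.log(loadsSubresources(scraperSession)); // false
```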
Layer 5: Device Check and CAPTCHA
When Datadome’s trust score drops below a threshold but isn’t conclusive enough for an outright block, it serves a Device Check — a full-page interstitial that:
- Runs additional JavaScript fingerprinting
- Presents a visual challenge (slider, image selection)
- Collects behavioral data during the challenge (mouse movement analysis)
- Generates a clearance cookie if passed
The newer WASM (WebAssembly) challenges add another layer: the browser must execute a compiled state machine that produces a specific output. This is computationally expensive to solve without actually executing the WASM binary in a real browser environment.
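Conceptually, such a challenge behaves like the state machine below, written in plain JavaScript as a stand-in for the compiled WASM binary. The mixing steps are arbitrary (xorshift-style) illustrations; the point is that the final token is only obtainable by actually executing every transition.

```javascript
// Conceptual model of a compiled state-machine challenge: the client must
// run every transition to reach the final token; predicting the output
// without execution is impractical. Mixing steps chosen for illustration.
function runChallenge(seed, steps) {
  let state = seed >>> 0;
  for (let i = 0; i < steps; i++) {
    state ^= state << 13; state >>>= 0;
    state ^= state >>> 7;
    state ^= state << 17; state >>>= 0;
  }
  return state.toString(16); // token submitted back to the server
}

const token = runChallenge(0x5eed, 1000);
console.log(token); // deterministic for a given seed; server verifies it
```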
Detection Timeline
What happens during a typical Datadome-protected page load:
```
Time 0ms:   TLS handshake → JA3 fingerprint extracted
Time 1ms:   HTTP request received → headers analyzed
Time 2ms:   IP reputation checked against database
Time 5ms:   Initial trust score calculated
Time 10ms:  HTML response sent (includes Datadome JS tag)
Time 50ms:  JS tag begins collecting browser fingerprint
Time 100ms: Canvas/WebGL rendering executed
Time 150ms: Picasso challenge completed
Time 200ms: Behavioral monitoring begins
Time 300ms: All signals sent to Datadome's ML engine
Time 350ms: Final decision: allow / challenge / block
```
The entire detection pipeline runs in under 350 milliseconds. This is why Datadome claims minimal performance impact on legitimate users — the detection is faster than the page render.
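The final step, folding per-layer scores into one verdict, can be sketched like this. Weights and thresholds are illustrative assumptions, not Datadome's parameters:

```javascript
// Sketch of the composite decision: per-layer scores (0..1) are combined
// with weights into one trust score, then mapped to a verdict.
function decide(layers) {
  const weights = { tls: 0.2, headers: 0.15, ip: 0.15, fingerprint: 0.25, behavior: 0.25 };
  const trust = Object.entries(weights)
    .reduce((sum, [k, w]) => sum + w * (layers[k] ?? 0), 0);
  if (trust >= 0.7) return "allow";
  if (trust >= 0.4) return "challenge"; // Device Check / CAPTCHA
  return "block";
}

console.log(decide({ tls: 1, headers: 1, ip: 0.9, fingerprint: 1, behavior: 0.8 }));   // "allow"
console.log(decide({ tls: 1, headers: 1, ip: 0.2, fingerprint: 0.5, behavior: 0.3 })); // "challenge"
console.log(decide({ tls: 0, headers: 0.2, ip: 0.1, fingerprint: 0, behavior: 0 }));   // "block"
```

Note how the middle case illustrates the article's point: passing TLS and header checks alone is not enough to clear the threshold.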
What Makes Datadome Different from Other Systems
| Feature | Cloudflare | Datadome | Akamai | PerimeterX |
|---|---|---|---|---|
| TLS fingerprinting | ✅ | ✅ | ✅ | ✅ |
| JS fingerprinting | ✅ | ✅ (deeper) | ✅ | ✅ |
| Canvas fingerprinting | ⚠️ Limited | ✅ Full | ✅ | ✅ |
| Picasso validation | ❌ | ✅ Unique | ❌ | ❌ |
| WASM challenges | ❌ | ✅ | ❌ | ⚠️ |
| Behavioral ML | ✅ | ✅ (advanced) | ✅ | ✅ |
| Mobile SDK | ❌ | ✅ | ✅ | ✅ |
| Detection latency | ~20ms | ~50ms | ~30ms | ~40ms |
Datadome’s Picasso challenge is its primary differentiator. It’s the only major bot-protection system that validates visual rendering consistency against device claims.
Identifying Datadome-Protected Sites
You can detect Datadome presence through:
Response Headers
```
x-datadome: protected
x-dd-b: value
x-dd-type: value
Set-Cookie: datadome=xxx
```
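A quick programmatic check for these markers, using only the header names listed above:

```javascript
// Sketch: flag a response as Datadome-protected from its headers/cookies.
function looksLikeDatadome(headers) {
  const keys = Object.keys(headers).map(k => k.toLowerCase());
  if (keys.some(k => k === "x-datadome" || k.startsWith("x-dd-"))) return true;
  const setCookie = headers["set-cookie"] ?? headers["Set-Cookie"] ?? "";
  return String(setCookie).includes("datadome=");
}

console.log(looksLikeDatadome({ "x-datadome": "protected" }));            // true
console.log(looksLikeDatadome({ "Set-Cookie": "datadome=abc; Path=/" })); // true
console.log(looksLikeDatadome({ "server": "nginx" }));                    // false
```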
Page Source
```html
<script src="https://js.datadome.co/tags.js"></script>
```
Challenge Page
The Datadome CAPTCHA/Device Check has a distinctive visual style, with the Datadome logo and a specific slider or image challenge format.
Infrastructure Implications
For data teams that need to collect information from Datadome-protected sites:
| Approach | Effectiveness | Monthly Cost | Engineering Effort |
|---|---|---|---|
| Standard HTTP library | ❌ Blocked instantly | $0 | None |
| Headless browser (basic) | ❌ Canvas fingerprint fails | ~$30/mo VPS | Low |
| Headless + stealth plugins | ⚠️ May pass JS checks, Picasso often fails | ~$50/mo VPS | Medium |
| Managed unblocking API | ✅ Provider handles detection | $99-499/mo | None |
| Premium proxy + real browser | ✅ If browser is properly configured | $150-500/mo | High |
The Picasso challenge specifically makes Datadome harder to handle than other protection systems. A properly patched headless browser might pass TLS, header, and JS checks but still fail the visual rendering validation.
Key Takeaways
- Datadome uses five simultaneous detection layers — passing one is not enough.
- Picasso challenges validate rendering output, not just browser properties. This catches headless browsers that otherwise look real.
- Behavioral ML runs on every interaction, not just the first request. Maintaining human-like patterns throughout a session is essential.
- WASM challenges add computational requirements that can’t be simulated without actually executing the binary.
- Detection happens in under 350ms — it doesn’t impact legitimate user experience.
- For B2B data collection, managed scraping services or premium proxy solutions with built-in Datadome handling are typically more cost-effective than building and maintaining a custom solution.
ProxyOps Team
Independent infrastructure reviews from engineers who've deployed at scale. No vendor bias, just data.