Fabien Vauchelles
Cracking the Code: Decoding Anti-Bot Systems!
#1 (about 5 minutes)
The fundamental challenge of web scraping as a Turing test
Web scraping is fundamentally a Turing test where automated scripts must mimic natural human behavior to avoid detection by anti-bot systems.
#2 (about 10 minutes)
How anti-bot systems analyze the browser stack for signals
Anti-bot systems analyze signals from the entire browser stack, including the IP address, network-level fingerprints (TCP, TLS, and HTTP/2), JavaScript execution, and user navigation patterns.
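The TLS layer is one place where a scripted client can look different from a real browser even when its User-Agent header matches. As a minimal sketch, the Python below computes a JA3-style fingerprint, a publicly documented format for hashing ClientHello fields. It is not necessarily what any particular anti-bot vendor computes, and the ClientHello values are made up rather than captured from a real browser.

```python
import hashlib


def is_grease(value: int) -> bool:
    """GREASE values (RFC 8701) look like 0x0a0a, 0x1a1a, ..., 0xfafa; JA3 ignores them."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def ja3_string(version, ciphers, extensions, curves, point_formats) -> str:
    """Join the five ClientHello fields with commas and each list's values with dashes."""
    def join(values):
        return "-".join(str(v) for v in values if not is_grease(v))
    return ",".join([str(version), join(ciphers), join(extensions), join(curves), join(point_formats)])


def ja3_hash(*fields) -> str:
    """JA3 fingerprint: the MD5 hex digest of the JA3 string."""
    return hashlib.md5(ja3_string(*fields).encode("ascii")).hexdigest()


# Made-up ClientHello values for illustration (2570 == 0x0a0a is a GREASE cipher).
hello = (771, [2570, 4865, 4866, 4867], [0, 23, 65281, 10, 11], [29, 23, 24], [0])
print(ja3_string(*hello))  # 771,4865-4866-4867,0-23-65281-10-11,29-23-24,0
print(ja3_hash(*hello))
```

In practice the fields come from a packet capture or a proxy that sees the TLS handshake. Two clients offering different cipher or extension lists produce different hashes, which is how an HTTP library claiming to be Chrome can typically be told apart from Chrome itself.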
#3 (about 2 minutes)
Exploiting the business need to minimize false positives
The necessity for websites to avoid blocking real customers (false positives) forces anti-bot systems to focus on a limited set of the most effective signals.
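The false-positive constraint can be made concrete with a toy risk score. The sketch below assumes a detector that blocks any session scoring at or above a threshold, and picks the lowest threshold whose share of blocked humans stays within a budget. The scores are invented and `pick_threshold` is a hypothetical helper.

```python
def pick_threshold(human_scores, bot_scores, max_fpr):
    """Return (threshold, false_positive_rate, bot_recall) for the lowest threshold
    whose false-positive rate stays within max_fpr. Scores >= threshold are blocked."""
    candidates = sorted(set(human_scores) | set(bot_scores)) + [float("inf")]
    for t in candidates:  # low to high: the first acceptable threshold catches the most bots
        fpr = sum(s >= t for s in human_scores) / len(human_scores)
        if fpr <= max_fpr:
            recall = sum(s >= t for s in bot_scores) / len(bot_scores)
            return t, fpr, recall
    raise AssertionError("unreachable: an infinite threshold blocks no one")


# Toy risk scores in [0, 1] (made-up data): real customers and known bots.
humans = [0.1, 0.2, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.95]
bots = [0.6, 0.7, 0.9, 0.95, 0.99]

print(pick_threshold(humans, bots, max_fpr=0.0))  # (0.99, 0.0, 0.2)
print(pick_threshold(humans, bots, max_fpr=0.1))  # (0.9, 0.1, 0.6)
```

With a zero budget, this toy detector catches only one bot in five; tolerating one blocked human in ten raises that to three in five. Under such a budget, a few signals that separate humans from bots cleanly are worth more than many weak ones.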
#4 (about 5 minutes)
Tools and techniques to identify anti-bot systems
Use tools like Wappalyzer, browser dev tools, and proxy interceptors to identify the specific anti-bot protection and analyze its architecture and encrypted payloads.
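Alongside Wappalyzer and dev tools, a quick first check is to look for cookies and headers that protections are commonly reported to set. The table below is an assumption based on commonly observed markers, not an authoritative or complete list, and vendors can rename these at any time. `detect_vendors` is a hypothetical helper that works on headers and cookie names you have already captured.

```python
# Illustrative markers commonly seen in the wild; vendors change them, so treat
# this table as a starting point to confirm in dev tools, not as ground truth.
VENDOR_MARKERS = {
    "Cloudflare": {"headers": {"cf-ray"}, "cookies": {"__cf_bm", "cf_clearance"}},
    "Akamai Bot Manager": {"headers": set(), "cookies": {"_abck", "bm_sz"}},
    "DataDome": {"headers": set(), "cookies": {"datadome"}},
    "HUMAN (PerimeterX)": {"headers": set(), "cookies": {"_px3", "_pxhd"}},
}


def detect_vendors(headers: dict[str, str], cookie_names: list[str]) -> list[str]:
    """Return vendors whose marker headers or cookies appear in a captured response."""
    header_keys = {name.lower() for name in headers}  # header names are case-insensitive
    cookies = set(cookie_names)
    return [
        vendor
        for vendor, markers in VENDOR_MARKERS.items()
        if header_keys & markers["headers"] or cookies & markers["cookies"]
    ]


# Example: headers and cookie names copied from dev tools or a proxy interceptor.
captured_headers = {"Server": "cloudflare", "CF-RAY": "8a1b2c3d4e5f6789-CDG"}
captured_cookies = ["__cf_bm", "session_id"]
print(detect_vendors(captured_headers, captured_cookies))  # ['Cloudflare']
```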
#5 (about 7 minutes)
A step-by-step methodology for building robust scrapers
Follow an incremental approach to bypass protections, starting with basic scraper tuning and progressively adding proxies, headless browsers, and unblocker APIs.
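The incremental approach can be sketched as an escalation ladder: try the cheapest strategy, and move to the next only when the current one is blocked. The strategies here are hypothetical stubs standing in for a tuned HTTP client, the same client behind proxies, a headless browser, and an unblocker API. "Blocked" is simplified to a `None` result, and `EscalatingFetcher` is an illustrative name, not a library class.

```python
from typing import Callable, Optional
from urllib.parse import urlsplit

# A strategy takes a URL and returns the page body, or None if it was blocked.
Strategy = Callable[[str], Optional[str]]


class EscalatingFetcher:
    """Try strategies from cheapest to most expensive, remembering per domain
    the first one that worked so later requests skip known-blocked steps."""

    def __init__(self, steps: list[tuple[str, Strategy]]):
        self.steps = steps
        self.start_index: dict[str, int] = {}

    def fetch(self, url: str) -> tuple[str, str]:
        domain = urlsplit(url).netloc
        for i in range(self.start_index.get(domain, 0), len(self.steps)):
            name, strategy = self.steps[i]
            body = strategy(url)
            if body is not None:
                self.start_index[domain] = i
                return name, body
        raise RuntimeError(f"every strategy was blocked for {url}")


# Hypothetical stand-ins for real implementations, recording which ones ran.
calls: list[str] = []


def stub(name: str, blocked: bool) -> Strategy:
    def strategy(url: str) -> Optional[str]:
        calls.append(name)
        return None if blocked else f"<html>{url}</html>"
    return strategy


fetcher = EscalatingFetcher([
    ("tuned HTTP client", stub("tuned HTTP client", blocked=True)),
    ("HTTP client + proxies", stub("HTTP client + proxies", blocked=True)),
    ("headless browser", stub("headless browser", blocked=False)),
    ("unblocker API", stub("unblocker API", blocked=False)),
])
print(fetcher.fetch("https://shop.example/item/1")[0])  # headless browser
print(fetcher.fetch("https://shop.example/item/2")[0])  # headless browser, steps 1-2 skipped
```

Remembering the last successful step per domain avoids paying for known failures on every request. The trade-off is that this sketch never steps back down to a cheaper strategy; since protections change over time, a real scraper might retry lower steps periodically.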
#6 (about 4 minutes)
Designing a scalable architecture for data collection
Build a scalable scraping infrastructure using a central data store, an orchestrator, a proxy management layer, and a farm of diverse browsers.
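The four components can be wired together in a minimal in-memory sketch. Everything here is an assumption for illustration: the class names, the round-robin proxy rotation, the proxy URLs, and the browser profiles are hypothetical, and `BrowserWorker.fetch` is a placeholder that returns a string instead of driving a real browser.

```python
import itertools
from collections import deque
from dataclasses import dataclass, field


@dataclass
class DataStore:
    """Central store for the work queue and collected results (in memory here;
    a real deployment would use a database or message queue)."""
    pending: deque = field(default_factory=deque)
    results: dict = field(default_factory=dict)


class ProxyManager:
    """Proxy management layer: plain round-robin rotation in this sketch."""

    def __init__(self, proxies: list[str]):
        self._cycle = itertools.cycle(proxies)

    def next_proxy(self) -> str:
        return next(self._cycle)


@dataclass
class BrowserWorker:
    """One member of the browser farm; `profile` names a distinct browser/OS setup."""
    profile: str

    def fetch(self, url: str, proxy: str) -> str:
        # Placeholder: a real worker would drive a browser through `proxy`.
        return f"{url} via {proxy} on {self.profile}"


class Orchestrator:
    """Pulls URLs from the store and assigns each to a browser and a proxy."""

    def __init__(self, store: DataStore, proxies: ProxyManager, farm: list[BrowserWorker]):
        self.store = store
        self.proxies = proxies
        self.farm = itertools.cycle(farm)

    def run(self) -> None:
        while self.store.pending:
            url = self.store.pending.popleft()
            worker = next(self.farm)
            self.store.results[url] = worker.fetch(url, self.proxies.next_proxy())


store = DataStore(deque(["https://example.com/1", "https://example.com/2", "https://example.com/3"]))
Orchestrator(
    store,
    ProxyManager(["http://proxy-a:8080", "http://proxy-b:8080"]),
    [BrowserWorker("chrome-linux"), BrowserWorker("firefox-windows")],
).run()
for page in store.results.values():
    print(page)
```

Keeping the orchestrator unaware of how pages are fetched lets the proxy layer and the browser farm be scaled or swapped independently, with the data store as the single record of what remains to be collected.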
#7 (about 7 minutes)
Decoding common JavaScript obfuscation techniques
Anti-bot systems use JavaScript obfuscation techniques like string concealing, code flow confusion, and control flow flattening to make their code unreadable.
#8 (about 3 minutes)
Identifying the five key signal types after deobfuscation
After deobfuscating the code, identify the five main types of signals collected: configuration details, automation flags, rendering fingerprints, reverse engineering checks, and integrity controls.
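Once the code is readable, sorting the browser APIs it touches into the five signal types gives a quick map of what the script checks. The mapping below is an illustrative assumption with a few well-known examples per type (for instance `navigator.webdriver`, which WebDriver-controlled browsers set to `true`, and `Function.prototype.toString`, which can reveal patched native functions). It is not the talk's list or any vendor's.

```python
# Illustrative mapping from JavaScript APIs seen in deobfuscated code to the
# five signal types; the entries are examples, not a complete or vendor list.
SIGNAL_TYPES = {
    "configuration": ("navigator.language", "navigator.hardwareConcurrency", "screen.", "Intl.DateTimeFormat"),
    "automation": ("navigator.webdriver", "window.callPhantom"),
    "rendering": ("HTMLCanvasElement.prototype.toDataURL", "WebGLRenderingContext.prototype.getParameter"),
    "reverse_engineering": ("debugger",),
    "integrity": ("Function.prototype.toString",),
}


def classify(accesses: list[str]) -> dict[str, list[str]]:
    """Bucket each API access under the first signal type whose prefix matches it."""
    buckets: dict[str, list[str]] = {kind: [] for kind in SIGNAL_TYPES}
    buckets["unclassified"] = []
    for access in accesses:
        kind = next(
            (k for k, prefixes in SIGNAL_TYPES.items() if access.startswith(prefixes)),
            "unclassified",
        )
        buckets[kind].append(access)
    return buckets


found = classify([
    "navigator.webdriver",
    "screen.colorDepth",
    "HTMLCanvasElement.prototype.toDataURL",
    "debugger",
    "Function.prototype.toString",
    "document.title",
])
for kind, apis in found.items():
    print(kind, apis)
```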
#9 (about 1 minute)
The next frontier in anti-bot is JavaScript virtual machines
The next evolution in anti-bot technology involves JavaScript virtual machines that execute proprietary, undocumented bytecode, making reverse engineering significantly more difficult.
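To see why a virtual machine raises the bar, consider a toy stack-based interpreter. The opcodes, their numbering, and the hand-compiled program below are invented for this sketch; real anti-bot VMs define their own, and the talk does not describe their instruction sets.

```python
# Opcodes invented for this sketch; a real VM defines its own, undocumented set.
PUSH, LOAD, ADD, JUMP_IF_FALSE, HALT = 0x01, 0x02, 0x03, 0x04, 0x05


def run(bytecode: list, env: dict):
    """Interpret a toy stack-machine program against an environment of browser facts."""
    stack, pc = [], 0
    while pc < len(bytecode):
        op = bytecode[pc]
        if op == PUSH:  # PUSH <constant>
            stack.append(bytecode[pc + 1])
            pc += 2
        elif op == LOAD:  # LOAD <name>: read a value from the environment
            stack.append(env.get(bytecode[pc + 1]))
            pc += 2
        elif op == ADD:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
            pc += 1
        elif op == JUMP_IF_FALSE:  # JUMP_IF_FALSE <target>
            condition = stack.pop()
            pc = bytecode[pc + 1] if not condition else pc + 2
        elif op == HALT:
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {op!r} at position {pc}")
    raise ValueError("program ended without HALT")


# "score = 0; if webdriver: score += 50; return score", compiled by hand.
PROGRAM = [0x01, 0, 0x02, "webdriver", 0x04, 9, 0x01, 50, 0x03, 0x05]
print(run(PROGRAM, {"webdriver": True}))   # 50
print(run(PROGRAM, {"webdriver": False}))  # 0
```

The program itself is just a list of numbers and constants, so reading it reveals little. An analyst first has to reverse the interpreter to learn what each opcode does, then write a disassembler for the bytecode, which is considerably more work than deobfuscating plain JavaScript.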
#10 (about 14 minutes)
Answering questions on scraping legality, VPNs, and rate limits
The Q&A session addresses common questions about the legality of web scraping, the effectiveness of VPNs, managing rate limits, and the cat-and-mouse game with anti-bot providers.
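On rate limits, one standard way to pace requests to a single site is a token bucket. This is a common technique chosen for illustration, not necessarily what the Q&A recommended; the rate and burst size are arbitrary, and the injectable clock exists only to make the example deterministic.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.clock = clock
        self.last = clock()

    def try_acquire(self) -> bool:
        """Spend one token if available; return False when the caller should wait."""
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class FakeClock:
    """Manually advanced clock so the example is deterministic."""

    def __init__(self):
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


clock = FakeClock()
bucket = TokenBucket(rate=0.5, capacity=2, clock=clock)  # one request per 2 s, bursts of 2
burst = [bucket.try_acquire() for _ in range(3)]
clock.now = 2.0
after_wait = bucket.try_acquire()
print(burst, after_wait)  # [True, True, False] True
```

A scraper would typically keep one bucket per target domain, waiting and retrying whenever `try_acquire` returns `False`.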