📄️ Architecture overview
An overview of the core components of the Crawlee library and its architecture.
📄️ Avoid getting blocked
How to avoid getting blocked when scraping.
📄️ Logging in with a crawler
How to log in to websites with Crawlee.
📄️ Creating a web archive
How to create a Web ARChive (WARC) with Crawlee.
📄️ Error handling
How to handle errors that occur during web crawling.
📄️ HTTP clients
Learn about Crawlee's HTTP client architecture, how to switch between different implementations, and create custom HTTP clients for specialized web scraping needs.
📄️ HTTP crawlers
Learn about Crawlee's HTTP crawlers including BeautifulSoup, Parsel, and raw HTTP crawlers for efficient server-rendered content extraction without JavaScript execution.
📄️ Playwright crawler
Learn how to use PlaywrightCrawler for browser-based web scraping.
📄️ Adaptive Playwright crawler
Learn how to use the Adaptive Playwright crawler to automatically switch between browser-based and HTTP-only crawling.
📄️ Playwright with Stagehand
How to integrate Stagehand AI-powered automation with PlaywrightCrawler.
📄️ Proxy management
Using proxies to avoid IP blocking.
📄️ Request loaders
How to manage the requests your crawler will process.
📄️ Request router
Learn how to use the Router class to organize request handlers, error handlers, and pre-navigation hooks in Crawlee.
📄️ Running in a web server
How to run your crawler inside a web server.
📄️ Scaling crawlers
Learn how to scale your crawlers by controlling concurrency and limiting requests per minute.
📄️ Service locator
Crawlee's service locator is a central registry for global services, managing and providing access to them throughout the whole framework.
📄️ Session management
How to manage cookies, proxy IP rotation, and more.
📄️ Storage clients
How to work with storage clients in Crawlee, including the built-in clients and how to create your own.
📄️ Storages
How to work with storages in Crawlee, how to manage requests and how to store and retrieve scraping results.
📄️ Trace and monitor crawlers
Learn how to instrument your crawlers with OpenTelemetry to trace request handling, identify bottlenecks, monitor performance, and visualize telemetry data using Jaeger for performance optimization.