📄️ Architecture overview
An overview of the core components of the Crawlee library and its architecture.
📄️ Avoid getting blocked
How to avoid getting blocked when scraping.
📄️ Logging in with a crawler
How to log in to websites with Crawlee.
📄️ Creating a web archive
How to create a Web ARChive (WARC) with Crawlee.
📄️ Error handling
How to handle errors that occur during web crawling.
📄️ HTTP clients
Learn about Crawlee's HTTP client architecture, how to switch between different implementations, and create custom HTTP clients for specialized web scraping needs.
📄️ HTTP crawlers
Learn about Crawlee's HTTP crawlers including BeautifulSoup, Parsel, and raw HTTP crawlers for efficient server-rendered content extraction without JavaScript execution.
📄️ Playwright crawler
Learn how to use PlaywrightCrawler for browser-based web scraping.
📄️ Adaptive Playwright crawler
Learn how to use the Adaptive Playwright crawler to automatically switch between browser-based and HTTP-only crawling.
📄️ Playwright with Stagehand
How to integrate Stagehand AI-powered automation with PlaywrightCrawler.
📄️ Proxy management
Using proxies to avoid IP blocking.
📄️ Request loaders
How to manage the requests your crawler will process.
📄️ Request router
Learn how to use the Router class to organize request handlers, error handlers, and pre-navigation hooks in Crawlee.
📄️ Running in a web server
How to run your crawler inside a web server.
📄️ Scaling crawlers
Learn how to scale your crawlers by controlling concurrency and limiting requests per minute.
📄️ Service locator
Crawlee's service locator is a central registry for global services, managing and providing access to them throughout the whole framework.
📄️ Session management
How to manage cookies, proxy IP rotation, and more.
📄️ Storage clients
How to work with storage clients in Crawlee, including the built-in clients and how to create your own.
📄️ Storages
How to work with storages in Crawlee, how to manage requests and how to store and retrieve scraping results.
📄️ Trace and monitor crawlers
Learn how to instrument your crawlers with OpenTelemetry to trace request handling, identify bottlenecks, monitor performance, and visualize telemetry data using Jaeger for performance optimization.