Introduction
Crawlee covers your crawling and scraping end-to-end and helps you build reliable scrapers. Fast.
Your crawlers will appear human-like and fly under the radar of modern bot protections even with the default configuration. Crawlee gives you the tools to crawl the web for links, scrape data and persistently store it in machine-readable formats, without having to worry about the technical details. And thanks to rich configuration options, you can tweak almost any aspect of Crawlee to suit your project's needs if the default settings don't cut it.
What you will learn
The goal of the introduction is to provide a step-by-step guide to the most important features of Crawlee. It will walk you through creating the simplest of crawlers that only prints text to the console, all the way up to a full-featured scraper that collects links from a website and extracts data.
🛠 Features
Why is Crawlee the preferred choice for web scraping and crawling?
Why use Crawlee instead of just a random HTTP library with an HTML parser?
- Unified interface for HTTP & headless browser crawling.
- Automatic parallel crawling based on available system resources.
- Written in Python with type hints - enhances DX (IDE autocompletion) and reduces bugs (static type checking).
- Automatic retries on errors or when you are getting blocked.
- Integrated proxy rotation and session management.
- Configurable request routing - direct URLs to the appropriate handlers.
- Persistent queue for URLs to crawl.
- Pluggable storage of both tabular data and files.
- Robust error handling.
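To make the comparison concrete, here is a minimal sketch of what the "random HTTP library with an HTML parser" approach tends to look like once you add a URL queue, de-duplication and retries by hand. It uses only the Python standard library and is not Crawlee's API; the names `LinkParser`, `crawl`, `fetch` and `max_retries` are assumptions made for this illustration.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url: str, fetch: Callable[[str], str], max_retries: int = 2) -> list[str]:
    """Breadth-first crawl; returns the URLs that were fetched successfully, in order.

    `fetch` is a hypothetical callable that returns a page's HTML
    and raises OSError on a network failure.
    """
    queue = deque([start_url])  # hand-rolled, in-memory URL queue
    seen = {start_url}          # de-duplication of discovered URLs
    visited: list[str] = []

    while queue:
        url = queue.popleft()
        html = None
        # Hand-rolled retry loop: one initial attempt plus `max_retries` retries.
        for _attempt in range(max_retries + 1):
            try:
                html = fetch(url)
                break
            except OSError:
                continue
        if html is None:
            continue  # every attempt failed, give up on this URL

        visited.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)

    return visited
```

Even this small sketch leaves out most of the list above: the queue is lost when the process exits, requests run one at a time, links to other domains are followed without any filtering, and there is no proxy rotation, session management or data storage.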
Why use Crawlee rather than Scrapy?
- Crawlee has out-of-the-box support for headless browser crawling (Playwright).
- Crawlee has a minimalistic & elegant interface - set up your scraper with fewer than 10 lines of code.
- Complete type hint coverage.
- Based on standard asyncio.
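As a rough illustration of what building on standard asyncio makes possible, the sketch below runs a handler over several URLs with a fixed concurrency limit. It is plain standard-library code, not Crawlee's API and not its scaling mechanism: the list above says Crawlee adjusts parallelism automatically, whereas the limit here is a hard-coded assumption. The names `process_all`, `handler` and `max_concurrency` are made up for this example.

```python
import asyncio
from typing import Awaitable, Callable


async def process_all(
    urls: list[str],
    handler: Callable[[str], Awaitable[str]],
    max_concurrency: int = 3,
) -> list[str]:
    """Run `handler` for every URL, with at most `max_concurrency` running at once."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(url: str) -> str:
        # The semaphore blocks here once `max_concurrency` handlers are active.
        async with semaphore:
            return await handler(url)

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(run_one(url) for url in urls))
```

Because this is ordinary asyncio code, it can be started with `asyncio.run(process_all(urls, handler))` and combined with any other asyncio-based library.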
Next steps
Next, you will install Crawlee and learn how to bootstrap projects with the prepared Crawlee templates.