Setting up
This guide will help you get started with Crawlee by setting it up on your computer. Follow the steps below to ensure a smooth installation process.
Prerequisites
Before installing Crawlee itself, make sure that your system meets the following requirements:
- Python 3.9 or higher: Crawlee requires Python 3.9 or a newer version. You can download Python from the official website.
- Python package manager: This guide uses pip, the most common package manager, but any other Python package manager works as well. You can download pip from the official website.
Verifying prerequisites
To check if Python and pip are installed, run the following commands:
python --version
python -m pip --version
If these commands return the respective versions, you're ready to continue.
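The same check can also be done from inside Python, which confirms that the interpreter you will actually run Crawlee with meets the requirement. This is a minimal sketch using only the standard library; the helper names `python_is_supported` and `pip_is_available` are made up for illustration and are not part of Crawlee.

```python
import importlib.util
import sys

# Minimum version taken from the prerequisites listed above.
MIN_PYTHON = (3, 9)


def python_is_supported(version_info=sys.version_info):
    """Return True if the given version tuple is at least MIN_PYTHON."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def pip_is_available():
    """Return True if the current interpreter can locate the pip module."""
    return importlib.util.find_spec("pip") is not None


if __name__ == "__main__":
    print("Python version supported:", python_is_supported())
    print("pip available:", pip_is_available())
```

Because the check runs in the current interpreter, it also catches the case where `python` on your PATH and the interpreter used by your editor or virtual environment are different installations.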
Installing Crawlee
Crawlee is available as the crawlee package on PyPI. This package includes the core functionality, while additional features are available as optional extras to keep dependencies and package size minimal.
Basic installation
To install the core package, run:
python -m pip install crawlee
After installation, verify that Crawlee is installed correctly by checking its version:
python -c 'import crawlee; print(crawlee.__version__)'
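If you prefer not to import the package just to read its version, the installed distribution metadata can be queried instead. The sketch below uses the standard library's `importlib.metadata`; the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("crawlee")
    print(f"crawlee {found}" if found else "crawlee is not installed in this environment")
```

Returning `None` rather than raising makes the helper convenient in setup scripts that need to decide whether to run the install step.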
Full installation
If you do not mind the package size, you can run the following command to install Crawlee with all optional features:
python -m pip install 'crawlee[all]'
Installing specific extras
Depending on your use case, you may want to install specific extras to enable additional functionality:
To use the BeautifulSoupCrawler, install the beautifulsoup extra:
python -m pip install 'crawlee[beautifulsoup]'
To use the ParselCrawler, install the parsel extra:
python -m pip install 'crawlee[parsel]'
To use the CurlImpersonateHttpClient, install the curl-impersonate extra:
python -m pip install 'crawlee[curl-impersonate]'
If you plan to use a (headless) browser with PlaywrightCrawler, install Crawlee with the playwright extra:
python -m pip install 'crawlee[playwright]'
After installing the playwright extra, install the browser binaries that Playwright needs:
playwright install
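To see which extras are usable in the current environment, one option is to test whether the third-party module behind each extra can be found. The sketch below is illustrative only: the mapping from extra name to module name (`bs4`, `parsel`, `curl_cffi`, `playwright`) is an assumption about what each extra pulls in, not something stated in this guide, and the helper `missing_extras` is hypothetical.

```python
import importlib.util

# ASSUMPTION: top-level module expected to be importable once each extra is installed.
EXTRA_MODULES = {
    "beautifulsoup": "bs4",
    "parsel": "parsel",
    "curl-impersonate": "curl_cffi",
    "playwright": "playwright",
}


def missing_extras(wanted, modules=EXTRA_MODULES):
    """Return the extras from `wanted` whose backing module cannot be found."""
    return [name for name in wanted if importlib.util.find_spec(modules[name]) is None]


if __name__ == "__main__":
    missing = missing_extras(EXTRA_MODULES)
    print("Missing extras:", ", ".join(missing) if missing else "none")
```

Note that finding the `playwright` module only shows that the Python package is present; it does not show that the browser binaries have been downloaded.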
Installing multiple extras
You can install multiple extras at once by using a comma as a separator:
python -m pip install 'crawlee[beautifulsoup,curl-impersonate]'
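When the installation is scripted, for example in a bootstrap script, the same comma-separated requirement can be assembled in Python. This is a small sketch with a hypothetical helper, `pip_install_command`; it only builds the command and does not run it.

```python
import sys


def pip_install_command(extras=()):
    """Return the argument list for installing crawlee with the given extras."""
    requirement = f"crawlee[{','.join(extras)}]" if extras else "crawlee"
    # sys.executable ensures pip installs into the interpreter running this script.
    return [sys.executable, "-m", "pip", "install", requirement]


if __name__ == "__main__":
    # Printed only; pass the list to subprocess.run(..., check=True) to execute it.
    print(pip_install_command(["beautifulsoup", "curl-impersonate"]))
```

Passing the command as a list avoids shell quoting, so the square brackets need no quotes here, unlike in the shell commands above.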
Start a new project
The quickest way to get started with Crawlee is by using the Crawlee CLI and selecting one of the prepared templates. The CLI helps you set up a new project in seconds.
Using Crawlee CLI with Pipx
First, ensure you have Pipx installed. You can check if Pipx is installed by running:
pipx --version
If Pipx is not installed, follow the official installation guide.
Then, run the Crawlee CLI using Pipx and choose from the available templates:
pipx run crawlee create my_crawler
Using Crawlee CLI directly
If you already have crawlee installed, you can run the CLI directly:
crawlee create my_crawler
Follow the interactive prompts in the CLI to choose a crawler type and set up your new project.
Running your project
To run your newly created project, navigate to the project directory, activate the virtual environment, and execute the Python interpreter with the project module:
Linux:
cd my_crawler/
source .venv/bin/activate
python -m my_crawler
Windows:
cd my_crawler/
.venv\Scripts\activate
python -m my_crawler
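If the project fails to start with an import error, a common cause is that the virtual environment was not actually activated. A quick way to check, sketched here with the standard library only, is to compare `sys.prefix` with `sys.base_prefix`; the helper name `in_virtualenv` is hypothetical.

```python
import sys


def in_virtualenv(prefix=sys.prefix, base_prefix=sys.base_prefix):
    """Return True if the interpreter prefix differs from the base installation prefix."""
    return prefix != base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print("Virtual environment active:", sys.prefix)
    else:
        print("No virtual environment active; using", sys.base_prefix)
```

Inside an environment created by the standard `venv` module the two prefixes differ, while in a system-wide interpreter they are equal.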
Congratulations! You have successfully set up and executed your first Crawlee project.
Next steps
Next, you will learn how to create a very simple crawler and get to know the Crawlee components while building it.