Setting up

This guide will help you get started with Crawlee by setting it up on your computer. Follow the steps below to ensure a smooth installation process.

Prerequisites

Before installing Crawlee itself, make sure that your system meets the following requirements:

  • Python 3.9 or higher: Crawlee requires Python 3.9 or a newer version. You can download Python from the official website.
  • Python package manager: This guide uses pip, the most common package manager, but any Python package manager will work. You can download pip from the official website.

Verifying prerequisites

To check if Python and pip are installed, run the following commands:

python --version
python -m pip --version

If these commands return the respective versions, you're ready to continue.
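The same check can be scripted, which is handy in CI or a setup script. The following is a minimal sketch, not part of Crawlee: the helper name check_prerequisites is hypothetical, and it only tests that the running interpreter meets the 3.9 minimum stated above and that pip is importable from it.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 9)  # minimum version stated in the prerequisites above


def check_prerequisites(version_info=sys.version_info):
    """Return a list of human-readable problems; an empty list means all good.

    Hypothetical helper for illustration, not part of Crawlee.
    """
    problems = []
    # Compare only (major, minor) against the required minimum.
    if tuple(version_info[:2]) < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ is required, "
            f"found {version_info[0]}.{version_info[1]}"
        )
    # find_spec returns None when the module cannot be located.
    if importlib.util.find_spec("pip") is None:
        problems.append("pip is not importable by this interpreter")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    for issue in issues:
        print("Problem:", issue)
    if not issues:
        print("Prerequisites look fine.")
```

Because the script inspects the interpreter that runs it, run it with the same python executable you intend to use for Crawlee.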

Installing Crawlee

Crawlee is available as the crawlee package on PyPI. The package includes the core functionality; additional features are available as optional extras, which keeps dependencies and package size minimal.

Basic installation

To install the core package, run:

python -m pip install crawlee

After installation, verify that Crawlee is installed correctly by checking its version:

python -c 'import crawlee; print(crawlee.__version__)'
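If you would rather not import the package just to read its version, the standard library's importlib.metadata can query the installed distribution directly. This is a small sketch; the function name installed_version is an illustrative choice, not a Crawlee API.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution="crawlee"):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        # Raised when no distribution with this name is installed.
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("crawlee is not installed in this environment")
    else:
        print(f"crawlee {found} is installed")
```

This reads package metadata only, so it also works when importing the package would fail for an unrelated reason.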

Full installation

If you do not mind the package size, you can run the following command to install Crawlee with all optional features:

python -m pip install 'crawlee[all]'

Installing specific extras

Depending on your use case, you may want to install specific extras to enable additional functionality:

For using the BeautifulSoupCrawler, install the beautifulsoup extra:

python -m pip install 'crawlee[beautifulsoup]'

For using the ParselCrawler, install the parsel extra:

python -m pip install 'crawlee[parsel]'

For using the CurlImpersonateHttpClient, install the curl-impersonate extra:

python -m pip install 'crawlee[curl-impersonate]'

If you plan to use a (headless) browser with PlaywrightCrawler, install Crawlee with the playwright extra:

python -m pip install 'crawlee[playwright]'

After installing the playwright extra, install the browser binaries that Playwright needs:

playwright install
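To see which extras appear to be present in the current environment, you can probe for the third-party modules they bring in. The sketch below makes an assumption that is not stated in this guide: the extra-to-module mapping (bs4, parsel, curl_cffi, playwright) reflects the import names those extras are expected to provide, and may differ between Crawlee versions.

```python
import importlib.util

# ASSUMPTION: import names expected to be provided by each extra.
# This mapping is illustrative and may not match every Crawlee release.
EXTRA_MODULES = {
    "beautifulsoup": "bs4",
    "parsel": "parsel",
    "curl-impersonate": "curl_cffi",
    "playwright": "playwright",
}


def available_extras():
    """Map each extra name to True if its module can be located."""
    return {
        extra: importlib.util.find_spec(module) is not None
        for extra, module in EXTRA_MODULES.items()
    }


if __name__ == "__main__":
    for extra, present in available_extras().items():
        status = "found" if present else "missing"
        print(f"{extra}: {status}")
```

A "found" result only means the module is importable; for playwright, the browser binaries still have to be installed separately with playwright install.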

Installing multiple extras

You can install multiple extras at once by using a comma as a separator:

python -m pip install 'crawlee[beautifulsoup,curl-impersonate]'
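When the list of extras comes from a script or configuration, the requirement string can be assembled programmatically. The following sketch uses hypothetical helper names (requirement_with_extras, pip_install_command) and only builds the command; it does not run it.

```python
import sys


def requirement_with_extras(package, extras=()):
    """Build a requirement such as 'crawlee[beautifulsoup,parsel]'."""
    # Strip whitespace, drop empty names, and de-duplicate while keeping order.
    names = list(dict.fromkeys(e.strip() for e in extras if e.strip()))
    if not names:
        return package
    return f"{package}[{','.join(names)}]"


def pip_install_command(package, extras=()):
    """Return an argument list suitable for subprocess.run (no shell involved)."""
    return [
        sys.executable, "-m", "pip", "install",
        requirement_with_extras(package, extras),
    ]


if __name__ == "__main__":
    print(requirement_with_extras("crawlee", ["beautifulsoup", "curl-impersonate"]))
    print(pip_install_command("crawlee", ["parsel"]))
```

Passing the arguments as a list avoids shell quoting. The quotes around 'crawlee[...]' in the commands above exist only so that shells which treat square brackets as glob patterns pass the text through unchanged.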

Start a new project

The quickest way to get started with Crawlee is by using the Crawlee CLI and selecting one of the prepared templates. The CLI helps you set up a new project in seconds.

Using Crawlee CLI with Pipx

First, ensure you have Pipx installed. You can check if Pipx is installed by running:

pipx --version

If Pipx is not installed, follow the official installation guide.

Then, run the Crawlee CLI using Pipx and choose from the available templates:

pipx run crawlee create my_crawler

Using Crawlee CLI directly

If you already have crawlee installed, you can run the CLI directly:

crawlee create my_crawler

Follow the interactive prompts in the CLI to choose a crawler type and set up your new project.

Running your project

To run your newly created project, navigate to the project directory, activate the virtual environment, and execute the Python interpreter with the project module:

cd my_crawler/
source .venv/bin/activate
python -m my_crawler

Congratulations! You have successfully set up and executed your first Crawlee project.
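The activation command above is for POSIX shells. As an alternative to activating the environment, you can call the virtual environment's interpreter directly. The sketch below locates it using the standard venv layout (bin/python on POSIX, Scripts\python.exe on Windows); the helper name venv_python is hypothetical, and the .venv directory name is taken from the commands above.

```python
import os
from pathlib import Path


def venv_python(project_dir, venv_name=".venv", os_name=os.name):
    """Return the path to the interpreter inside a project's virtual environment."""
    venv = Path(project_dir) / venv_name
    if os_name == "nt":
        # Windows virtual environments place executables under Scripts.
        return venv / "Scripts" / "python.exe"
    # POSIX virtual environments place executables under bin.
    return venv / "bin" / "python"


if __name__ == "__main__":
    interpreter = venv_python("my_crawler")
    print(f"Run the project with: {interpreter} -m my_crawler")
```

Run the printed command from inside the project directory so that the my_crawler module can be found.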

Next steps

Next, you will learn how to create a very simple crawler and get to know Crawlee's components while building it.