Running your crawler in the Cloud

Apify platform

Crawlee is developed by Apify, the web scraping and automation platform. You could say it is the home of Crawlee projects. In this section, you'll see how to deploy your crawler there. You can deploy a Crawlee project wherever you want, but using the Apify platform will give you the best experience.

With a few simple steps, you can convert your Crawlee project into a so-called Actor. Actors are serverless micro-apps that are easy to develop, run, share, and integrate. The infra, proxies, and storages are ready to go. Learn more about Actors.

Dependencies

Before we get started, you'll need to install two new dependencies:

Apify SDK, a toolkit for working with the Apify platform. This will allow us to wire the storages (e.g. RequestQueue and Dataset) to the Apify cloud products. The Apify SDK, like Crawlee itself, is available as a PyPI package and can be installed with any Python package manager. To install it using pip, run:
```
pip install apify
```
Apify CLI, a command-line tool that will help us with authentication and deployment. It is a Node.js package, and can be installed using any Node.js package manager. In this guide, we will use npm. We will install it globally, so you can use it across all your Crawlee and Apify projects. To install it using npm, run:
```
npm install -g apify-cli
```
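If you want to confirm that both dependencies are visible from your environment before continuing, a small check like the one below may help. It is only a sketch: the helper name `check_prerequisites` is made up for illustration, and it assumes the SDK is importable as `apify` and the CLI executable is called `apify`, as the commands in this guide suggest.

```python
import importlib.util
import shutil


def check_prerequisites() -> dict[str, bool]:
    """Report whether the two dependencies from this section appear to be installed.

    Hypothetical helper for illustration; it is not part of Crawlee or the Apify SDK.
    """
    return {
        # The Apify SDK is imported as `apify` (see `from apify import Actor`).
        'apify_sdk': importlib.util.find_spec('apify') is not None,
        # The Apify CLI is invoked as `apify` (for example `apify login`).
        'apify_cli': shutil.which('apify') is not None,
    }


if __name__ == '__main__':
    for name, found in check_prerequisites().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

If either entry is reported as missing, re-run the corresponding install command in the same virtual environment or shell.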

Logging in to the Apify platform

The next step will be creating your Apify account. Don't worry, we have a free tier, so you can try things out before you buy in! Once you have that, it's time to log in with the just-installed Apify CLI. You will need your personal access token, which you can find at https://console.apify.com/account#/integrations.

apify login

Adjusting the code

Now that you have your account set up, you will need to adjust the code a tiny bit. We will use the Apify SDK, which will help us to wire the Crawlee storages (like the RequestQueue) to their Apify platform counterparts - otherwise Crawlee would keep things only in memory.

Open your src/main.py file, and wrap everything in your main function with the Actor context manager. Your code should look like this:

src/main.py
import asyncio

from apify import Actor

from crawlee.crawlers import PlaywrightCrawler

from .routes import router


async def main() -> None:
    async with Actor:
        crawler = PlaywrightCrawler(
            # Let's limit our crawls to make our tests shorter and safer.
            max_requests_per_crawl=10,
            # Provide our router instance to the crawler.
            request_handler=router,
        )

        await crawler.run(['https://warehouse-theme-metal.myshopify.com/collections'])


if __name__ == '__main__':
    asyncio.run(main())

The context manager will configure Crawlee to use the Apify API instead of its default memory storage interface. It also sets up a few other things, like listening to platform events via websockets. After the body finishes, it handles graceful shutdown.
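To make the "set up, run the body, shut down" sequence concrete, here is a toy stand-in built only on the standard library. It is not the Apify SDK and does not reflect its internals; `toy_actor` and `demo` are hypothetical names, and the sketch only shows the general pattern an async context manager follows.

```python
import asyncio
from collections.abc import AsyncIterator
from contextlib import asynccontextmanager


@asynccontextmanager
async def toy_actor(log: list[str]) -> AsyncIterator[None]:
    """Toy stand-in for an Actor-like context manager (illustration only)."""
    log.append('init')  # Setup runs before the body, e.g. configuring storages.
    try:
        yield
    finally:
        # Cleanup runs after the body, even if the body raised an exception.
        log.append('exit')


async def demo() -> list[str]:
    log: list[str] = []
    async with toy_actor(log):
        log.append('crawl')  # This is where the crawler would run.
    return log


if __name__ == '__main__':
    print(asyncio.run(demo()))  # ['init', 'crawl', 'exit']
```

The `try`/`finally` is the important part of the pattern: the shutdown step runs whether the crawl finishes normally or fails partway through.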

Understanding async with Actor behavior with environment variables

The Actor context manager behaves conditionally based on environment variables, namely the APIFY_IS_AT_HOME env var, which is set to true on the Apify platform. This means that your project will keep working the same way locally, but will use the Apify API when deployed to the Apify platform.
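The idea of switching behavior on an environment variable can be sketched as follows. This is an illustration, not the SDK's actual implementation: the helper names are hypothetical, and treating both `1` and `true` as truthy is an assumption about how the value might be encoded.

```python
import os
from collections.abc import Mapping


def running_on_apify(env: Mapping[str, str] = os.environ) -> bool:
    """Return True when APIFY_IS_AT_HOME looks truthy (hypothetical helper).

    Assumption: the platform sets the variable to something like 'true' or '1'.
    """
    return env.get('APIFY_IS_AT_HOME', '').strip().lower() in ('1', 'true')


def describe_storage(env: Mapping[str, str] = os.environ) -> str:
    """Describe which storage backend this sketch would pick."""
    return 'Apify API' if running_on_apify(env) else 'local storage'


if __name__ == '__main__':
    print(f'Storage backend: {describe_storage()}')
```

Passing the environment in as a mapping keeps the check easy to try out locally, for example `describe_storage({'APIFY_IS_AT_HOME': 'true'})`.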

Initializing the project

You will also need to initialize the project for Apify. To do that, use the Apify CLI again:

apify init

The CLI will check the project structure and guide you through the setup process. If prompted, follow the instructions and answer the questions to configure the project correctly. For more information, see the Apify CLI documentation.

This will create a folder called .actor, and an actor.json file inside it. This file contains the configuration relevant to the Apify platform, namely the Actor name, version, build tag, and a few other things. Check out the relevant documentation to see all the different things you can set up there.
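To check what the CLI generated, you could read the file back with a few lines of Python. This is a minimal sketch: `summarize_actor_config` is a hypothetical helper, and the key names `name`, `version`, and `buildTag` are assumed to match the name, version, and build tag mentioned above, so verify them against your own actor.json.

```python
import json
from pathlib import Path


def summarize_actor_config(project_dir: str = '.') -> dict[str, object]:
    """Return the name, version, and build tag from .actor/actor.json.

    Hypothetical helper; the key names below are assumptions about the file layout.
    Raises FileNotFoundError if the project has not been initialized yet.
    """
    config_path = Path(project_dir) / '.actor' / 'actor.json'
    config = json.loads(config_path.read_text(encoding='utf-8'))
    # Missing keys are reported as None instead of raising KeyError.
    return {key: config.get(key) for key in ('name', 'version', 'buildTag')}
```

Calling `summarize_actor_config()` from the project root after initializing the project should return a small dictionary with those three entries.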

Ship it!

And that's all. Your project is now ready to be published on the Apify platform. You can use the Apify CLI once more to do that:

apify push

This command will create an archive from your project, upload it to the Apify platform and initiate a Docker build. Once finished, you will get a link to your new Actor on the platform.

Learning more about web scraping

Explore Apify Academy Resources

If you want to learn more about web scraping and browser automation, check out the Apify Academy. It's full of courses and tutorials on the topic, from beginner to advanced. And the best thing: it's free and open source ❤️

Thank you! 🎉

That's it! Thanks for reading the whole introduction and if there's anything wrong, please 🙏 let us know on GitHub or in our Discord community. Happy scraping! 👋
