To run Crawlee on your own computer, you need to meet the following prerequisites first:
- Have Node.js version 16.0 or higher installed.
- Have NPM installed, or use another package manager of your choice. NPM comes bundled with Node.js, so you should already have it. If not, reinstall Node.js.
If you are not certain, confirm the prerequisites by running:
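A minimal check, assuming both tools are on your PATH, is to print each version with the standard `-v` flag; Node.js should report 16.0 or higher:

```shell
# Print the installed Node.js version (needs to be 16.0 or higher)
node -v

# Print the installed NPM version (any version bundled with that Node.js release is fine)
npm -v
```

If either command is not found, the corresponding tool is not installed or is not on your PATH.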
Creating a new project
The fastest and best way to create new projects with Crawlee is to use the Crawlee CLI. You can use the npx utility to download and run the CLI, which is also embedded in the crawlee package:
npx crawlee create my-new-project
Let's choose the first template, called Crawlee playwright template. The command creates a new directory called my-new-project in your current working directory, adds a package.json to it, and installs all the necessary dependencies. It also adds example source code that you can run immediately.
Let's try that!
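A minimal sketch of running the example, assuming you are still in the directory where you ran the create command and that the generated package.json defines a start script:

```shell
# Move into the project directory created by the CLI
cd my-new-project

# Run the project's start script, which launches the example crawler
npm start
```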
You will see log messages in the terminal as Crawlee boots up, and after a second a Chromium browser window will open. In the window, you'll see pages changing quickly, and back in the terminal you will see the printed titles (the contents of the <title> HTML tags) of the pages.
We picked the Playwright template, which uses Chromium to open pages. If you pick the Cheerio template instead, there won't be any browser window, as the requests to the target site will be done via a specialized HTTP client, got-scraping, instead of a browser.
You can always terminate the crawl by pressing Ctrl+C in the terminal.
The next lesson will teach you how to create a very simple crawler and explain Crawlee components while building it.