To run Crawlee on your own computer, you need to meet the following prerequisites first:
- Have Node.js version 16.0 or higher installed.
- Have NPM installed, or use another package manager of your choice.
  - NPM comes bundled with Node.js, so you should already have it. If not, reinstall Node.js.
If you are not certain, confirm the prerequisites by running:
node -v
npm -v
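The Node.js requirement can be checked from a script as well, for instance before an automated setup step. The following is a minimal sketch, not something Crawlee provides: `meetsMinimumNode` is a hypothetical helper name, and the default of 16 is taken from the prerequisite list above.

```typescript
// Hypothetical helper, not part of Crawlee: checks a Node.js version string
// against the minimum major version stated in the prerequisites (16).
function meetsMinimumNode(version: string, minMajor: number = 16): boolean {
    // Accept both "v18.17.0" (as printed by `node -v`) and "18.17.0".
    const match = /^v?(\d+)\./.exec(version.trim());
    if (match === null) {
        // Unparseable input is treated as not meeting the requirement.
        return false;
    }
    return Number.parseInt(match[1], 10) >= minMajor;
}

console.log(meetsMinimumNode("v18.17.0")); // true
console.log(meetsMinimumNode("14.21.3"));  // false
```

Inside a Node.js script, the helper could be called as `meetsMinimumNode(process.versions.node)`.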
Creating a new project
The fastest and best way to create new projects with Crawlee is to use the Crawlee CLI. You can use the
npx utility, which is bundled with NPM, to download and run the CLI:
npx crawlee create my-crawler
Let's choose the first template called Getting started example. The command will create a new directory in your current working directory, called my-crawler, add a package.json to this folder and install all the necessary dependencies. It will also add example source code that you can immediately run.
Let's try that!
cd my-crawler && npm start
You will see log messages in the terminal as Crawlee boots up and starts scraping the Crawlee website.
INFO PlaywrightCrawler: Starting the crawl
INFO PlaywrightCrawler: Title of https://crawlee.dev/ is 'Crawlee · Build reliable crawlers. Fast. | Crawlee'
INFO PlaywrightCrawler: Title of https://crawlee.dev/docs/examples is 'Examples | Crawlee'
INFO PlaywrightCrawler: Title of https://crawlee.dev/api/core is '@crawlee/core | API | Crawlee'
INFO PlaywrightCrawler: Title of https://crawlee.dev/api/core/changelog is 'Changelog | API | Crawlee'
INFO PlaywrightCrawler: Title of https://crawlee.dev/docs/quick-start is 'Quick Start | Crawlee'
You can always terminate the crawl by pressing CTRL+C in the terminal.
Running headful browsers
Browsers controlled by Playwright run headless (without a visible window). You can switch to headful by uncommenting the
headless: false option in the crawler's constructor. This is useful in the development phase when you want to see what's going on in the browser.
// Uncomment this option to see the browser window.
// headless: false
When you run the example again, after a second a Chromium browser window will open. In the window, you'll see quickly changing pages as the crawler does its job.
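If you switch between headless and headful runs often, editing the source each time can get tedious. One possible approach, sketched below under stated assumptions, is to derive the `headless` option from an environment variable. The `browserVisibility` helper and the `HEADFUL` variable name are both hypothetical and are not part of Crawlee or its templates.

```typescript
// Hypothetical helper, not part of Crawlee: derives the `headless` option
// from an environment-like object so the source file need not be edited.
// The HEADFUL variable name is an assumption made for this sketch.
function browserVisibility(env: Record<string, string | undefined>): { headless: boolean } {
    // Run headful only when HEADFUL is explicitly set to "1"; headless otherwise.
    return { headless: env["HEADFUL"] !== "1" };
}

console.log(browserVisibility({}));               // { headless: true }
console.log(browserVisibility({ HEADFUL: "1" })); // { headless: false }
```

In the generated project, the result could be spread into the crawler's constructor options, for example `new PlaywrightCrawler({ ...browserVisibility(process.env), requestHandler })`, and a headful run started from a POSIX shell with `HEADFUL=1 npm start`.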
The next lesson will teach you how to create a very simple crawler and explain Crawlee components while building it.