📄️ Accept user input
This example accepts and logs user input.
📄️ Add data to dataset
This example saves data to the default dataset. If the dataset doesn't exist, it will be created.
📄️ Basic crawler
This is the most bare-bones example of using Crawlee, demonstrating some of its building blocks, such as the BasicCrawler. You probably don't need to go this deep, though; it's better to start with one of the full-featured crawlers.
📄️ Cheerio crawler
This example demonstrates how to use CheerioCrawler to crawl a list of URLs from an external file, load each URL using a plain HTTP request, parse the HTML using the Cheerio library, and extract some data from it: the page title and all h1 tags.
📄️ Crawl all links on a website
This example uses the enqueueLinks() method to add new links to the RequestQueue.
📄️ Crawl multiple URLs
This example crawls the specified list of URLs.
📄️ Crawl a website with relative links
When crawling a website, you may encounter different types of links that you may want to crawl.
📄️ Crawl a single URL
This example uses the got-scraping npm package to download the HTML of a single URL with a plain HTTP request.
📄️ Crawl a sitemap
A sitemap tells search engines which pages and files are important on a website and provides valuable information about them. This example builds a sitemap crawler that downloads and crawls the URLs from a sitemap, using the Sitemap utility class provided by the @crawlee/utils module.
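In Crawlee, the Sitemap class handles fetching and parsing for you. The core step it performs, extracting URLs from sitemap XML, can be sketched in plain Node.js (the sample XML and the extractSitemapUrls helper below are hypothetical, for illustration only):

```javascript
// Minimal sketch: pull <loc> URLs out of a sitemap XML string.
// Crawlee's Sitemap class (from @crawlee/utils) does this robustly;
// this regex-based version only illustrates the idea.
const sitemapXml = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>`;

function extractSitemapUrls(xml) {
    return [...xml.matchAll(/<loc>\s*([^<]+?)\s*<\/loc>/g)].map((m) => m[1]);
}

console.log(extractSitemapUrls(sitemapXml));
// ['https://example.com/', 'https://example.com/about']
```

The extracted URLs would then be fed to a crawler as its start requests.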
📄️ Crawl some links on a website
This CheerioCrawler example uses the globs property in the enqueueLinks() method to only add links to the RequestQueue if they match the specified pattern.
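Under the hood, glob filtering boils down to matching URLs against a pattern. A rough sketch of the idea in plain Node.js (the tiny globToRegExp converter below is a hypothetical, simplified stand-in for the full glob matching Crawlee performs):

```javascript
// Simplified illustration of glob matching: '**' matches anything,
// '*' matches anything except '/'. Crawlee uses a full glob library;
// this converter only covers these two wildcards.
function globToRegExp(glob) {
    const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
    const pattern = escaped
        .replace(/\*\*/g, '§§') // protect '**' before handling single '*'
        .replace(/\*/g, '[^/]*')
        .replace(/§§/g, '.*');
    return new RegExp(`^${pattern}$`);
}

const re = globToRegExp('https://example.com/docs/**');
console.log(re.test('https://example.com/docs/examples/basic')); // true
console.log(re.test('https://example.com/blog/post'));           // false
```

Only links whose URLs match the pattern would be enqueued; everything else is skipped.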
📄️ Using Puppeteer Stealth Plugin (puppeteer-extra) and playwright-extra
puppeteer-extra and playwright-extra are community-built libraries that add plugin support to Puppeteer and Playwright, enabling plugins such as the stealth plugin used in this example.
📄️ Export entire dataset to one file
This Dataset example uses the exportToValue function to export the entire default dataset as a single CSV file in the default key-value store.
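The shape of such an export can be sketched without Crawlee: flattening an array of result objects into one CSV string (the toCsv helper and the sample items below are hypothetical; exportToValue handles this, plus writing the result to storage, for you):

```javascript
// Hypothetical helper: flatten dataset-like items into a single CSV string.
// Crawlee's exportToValue does the real work, including saving the
// result to the default key-value store.
function toCsv(items) {
    const headers = Object.keys(items[0]);
    // Quote every field and double any embedded quotes, per CSV convention.
    const escape = (v) => `"${String(v).replace(/"/g, '""')}"`;
    const lines = [
        headers.map(escape).join(','),
        ...items.map((item) => headers.map((h) => escape(item[h])).join(',')),
    ];
    return lines.join('\n');
}

const items = [
    { url: 'https://example.com/', title: 'Home' },
    { url: 'https://example.com/about', title: 'About "us"' },
];
console.log(toCsv(items));
```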
📄️ Download a file
When web crawling, you sometimes need to download files such as images, PDFs, or other binary files. This example demonstrates how to download files using Crawlee and save them to the default key-value store.
📄️ Download a file with Node.js streams
For larger files, it is more efficient to use Node.js streams to download and transfer the files. This example demonstrates how to download files using streams.
📄️ Fill and Submit a Form using Puppeteer
This example demonstrates how to use PuppeteerCrawler to fill in and submit a web form.
📄️ HTTP crawler
This example demonstrates how to use HttpCrawler to build an HTML crawler that crawls a list of URLs from an external file, loads each URL using a plain HTTP request, and saves the HTML.
📄️ JSDOM crawler
This example demonstrates how to use JSDOMCrawler to interact with a website using the jsdom DOM implementation.
📄️ Dataset Map and Reduce methods
This example shows a simple use case of the Dataset map and reduce methods.
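The pattern mirrors Array.prototype.map and reduce, just applied to stored dataset items. A minimal sketch with plain arrays (the sample items are hypothetical; Crawlee's Dataset map and reduce iterate over the stored items in the same spirit):

```javascript
// Sample "dataset items" as a crawler might have saved them.
const items = [
    { url: 'https://example.com/', headingCount: 11 },
    { url: 'https://example.com/about', headingCount: 5 },
    { url: 'https://example.com/blog', headingCount: 8 },
];

// map: derive one value per item.
const counts = items.map((item) => item.headingCount);

// reduce: fold all items into a single value.
const maxHeadings = items.reduce(
    (max, item) => (item.headingCount > max ? item.headingCount : max),
    0,
);

console.log(counts);      // [11, 5, 8]
console.log(maxHeadings); // 11
```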
📄️ Playwright crawler
This example demonstrates how to use PlaywrightCrawler in combination with RequestQueue to recursively scrape the Hacker News website using headless Chrome / Playwright.
📄️ Using Firefox browser with Playwright crawler
This example demonstrates how to use PlaywrightCrawler with headless Firefox browser.
📄️ Capture a screenshot using Puppeteer
This example demonstrates how to capture a screenshot of a page using Puppeteer directly.
📄️ Puppeteer crawler
This example demonstrates how to use PuppeteerCrawler in combination with RequestQueue to recursively scrape a website using headless Chrome / Puppeteer.
📄️ Puppeteer recursive crawl
Run the following example to perform a recursive crawl of a website using PuppeteerCrawler.
📄️ Skipping navigations for certain requests
While crawling a website, you may encounter certain resources you'd like to save, but don't need the full power of a crawler to do so (like images delivered through a CDN).