Skip to main content
Version: Next

Crawl a sitemap

This example downloads and crawls the URLs from a sitemap, by using the Sitemap utility class provided by the @crawlee/utils module.

Run on
import { CheerioCrawler, Sitemap } from 'crawlee';

const crawler = new CheerioCrawler({
// Function called for each URL
async requestHandler({ request, log }) {
log.info(request.url);
},
maxRequestsPerCrawl: 10, // Limitation for only 10 requests (do not use if you want to crawl a sitemap)
});

const { urls } = await Sitemap.load('https://crawlee.dev/sitemap.xml');

await crawler.addRequests(urls);

// Run the crawler
await crawler.run();