Version: 3.7

Crawl a website with relative links

When crawling a website, you may encounter several types of links that you want to follow. To make crawling such links easy, we provide the enqueueLinks() method on the crawler context, which automatically finds links and adds them to the crawler's RequestQueue.

We provide 3 different strategies for crawling relative links. The example on this page uses the same-domain strategy.

note

For these examples, we are using the CheerioCrawler. The same method is available for both the PuppeteerCrawler and PlaywrightCrawler, and you use it in exactly the same way.

Example domains

For a URL of https://subdomain.example.com, enqueueLinks() will match relative URLs or URLs that point to the same domain name, regardless of their subdomain.

For instance, hyperlinks like https://subdomain.example.com/some/path, /absolute/example or ./relative/example will all be matched by this strategy. Links to other subdomains or to the naked domain, like https://other-subdomain.example.com or https://example.com, will be matched too.
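To make the matching rule above concrete, here is a minimal sketch of how relative links resolve against the page URL and how a same-domain comparison could look. It uses only the standard URL class. The helper names (resolveLink, naiveRegistrableDomain, isSameDomain) are hypothetical and are not part of Crawlee, and the "last two hostname labels" rule is a simplifying assumption: it is wrong for suffixes such as co.uk, which a real implementation would handle with the Public Suffix List.

```javascript
// Page URL taken from the example above.
const baseUrl = 'https://subdomain.example.com';

// Resolve an href against the page URL, the same way a browser would.
function resolveLink(href, base = baseUrl) {
    return new URL(href, base).href;
}

// Hypothetical helper: approximates the domain name by keeping the last
// two hostname labels. Assumption only, not how Crawlee itself does it.
function naiveRegistrableDomain(hostname) {
    return hostname.split('.').slice(-2).join('.');
}

// True when the (possibly relative) href points at the same domain name
// as the base URL, regardless of subdomain.
function isSameDomain(href, base = baseUrl) {
    const target = new URL(href, base);
    const origin = new URL(base);
    return naiveRegistrableDomain(target.hostname) === naiveRegistrableDomain(origin.hostname);
}

const candidates = [
    '/absolute/example',
    './relative/example',
    'https://other-subdomain.example.com',
    'https://example.com',
    'https://example.org/page',
];

for (const href of candidates) {
    console.log(resolveLink(href), isSameDomain(href));
}
```

In this sketch every candidate except the example.org link is reported as being on the same domain, which mirrors the behaviour described for the same-domain strategy.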

import { CheerioCrawler, EnqueueStrategy } from 'crawlee';

const crawler = new CheerioCrawler({
    // Limit the crawl to 10 requests (remove this option to crawl all links)
    maxRequestsPerCrawl: 10,
    async requestHandler({ request, enqueueLinks, log }) {
        log.info(request.url);
        await enqueueLinks({
            // Setting the strategy to 'same-domain' will enqueue all links found that are on the
            // same domain name as request.loadedUrl or request.url, regardless of subdomain
            strategy: EnqueueStrategy.SameDomain,
            // Alternatively, you can pass in the string 'same-domain'
            // strategy: 'same-domain',
        });
    },
});

// Run the crawler with initial request
await crawler.run(['https://crawlee.dev']);