Skip to main content
Version: Next

Request Locking

Release announcement

As of May 2024 (crawlee version 3.10.0), this experiment is now enabled by default! With that said, if you encounter issues you can:

  • set requestLocking to false in the experiments object of your crawler options
  • update all imports of RequestQueue to RequestQueueV1
  • open an issue on our GitHub repository

The content below is kept for documentation purposes. If you're interested in the changes, you can read the blog post about the new Request Queue storage system on the Apify blog.


caution

This is an experimental feature. While we welcome testers, keep in mind that it is currently not recommended to use this in production.

The API is subject to change, and we might introduce breaking changes in the future.

Should you be using this, feel free to open issues on our GitHub repository, and we'll take a look.

Starting with crawlee version 3.5.5, we have introduced a new crawler option that lets you enable using a new request locking API. With this API, you will be able to pass a RequestQueue to multiple crawlers to parallelize the crawling process.

Keep in mind

The request queue that supports request locking is currently exported via the RequestQueueV2 class. Once the experiment is over, this class will replace the current RequestQueue class

How to enable the experiment

In crawlers

note

This example shows how to enable the experiment in the CheerioCrawler, but you can apply this to any crawler type.

import { CheerioCrawler } from 'crawlee';

const crawler = new CheerioCrawler({
experiments: {
requestLocking: true,
},
async requestHandler({ $, request }) {
const title = $('title').text();
console.log(`The title of "${request.url}" is: ${title}.`);
},
});

await crawler.run(['https://crawlee.dev']);

Outside crawlers (to setup your own request queue that supports locking)

Previously, you would import RequestQueue from crawlee. To switch to the queue that supports locking, you need to import RequestQueueV2 instead.

import { RequestQueueV2 } from 'crawlee';

const queue = await RequestQueueV2.open('my-locking-queue');
await queue.addRequests([
{ url: 'https://crawlee.dev' },
{ url: 'https://crawlee.dev/docs' },
{ url: 'https://crawlee.dev/api' },
]);

Using the new request queue in crawlers

If you make your own request queue that supports locking, you will also need to enable the experiment in your crawlers.

danger

If you do not enable the experiment, you will receive a runtime error and the crawler will not start.

import { CheerioCrawler, RequestQueueV2 } from 'crawlee';

const queue = await RequestQueueV2.open('my-locking-queue');

const crawler = new CheerioCrawler({
experiments: {
requestLocking: true,
},
requestQueue: queue,
async requestHandler({ $, request }) {
const title = $('title').text();
console.log(`The title of "${request.url}" is: ${title}.`);
},
});

await crawler.run();

Other changes

info

This section is only useful if you're a tinkerer and want to see what's going on under the hood.

In order to facilitate the new request locking API, as well as keep both the current request queue logic and the new, locking based request queue logic, we have implemented a common starting point called RequestProvider.

This class implements almost all functions by default, but expects you, the developer, to implement the following methods: fetchNextRequest and ensureHeadIsNotEmpty. These methods are responsible for loading and returning requests to process, and tell crawlers if there are more requests to process.

You can use this base class to implement your own request providers if you need to fetch requests from a different source.

tip

We recommend you use TypeScript when implementing your own request provider, as it comes with suggestions for the abstract methods, as well as giving you the exact types you need to return.