FileSystemRequestQueueClient

A file system implementation of the request queue client.

This client persists requests to the file system as individual JSON files, making it suitable for scenarios where data needs to survive process restarts. Each request is stored as a separate file in a directory structure following the pattern:

{STORAGE_DIR}/request_queues/{QUEUE_ID}/{REQUEST_ID}.json
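
As a rough illustration of the pattern above, the path for a single request file can be assembled with pathlib. The helper name request_file_path and the example values are hypothetical and not part of the client's API; this is a minimal sketch of the layout only.

```python
from pathlib import Path


def request_file_path(storage_dir: Path, queue_id: str, request_id: str) -> Path:
    """Build the per-request JSON path (hypothetical helper, not part of the client)."""
    return storage_dir / "request_queues" / queue_id / f"{request_id}.json"


# Example values are made up for illustration.
example_path = request_file_path(Path("storage"), "default", "abc123")
print(example_path.as_posix())
```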

The implementation uses RecoverableState to maintain ordering information, in-progress status, and request handling status. This allows the state to be recovered across process restarts without embedding metadata in individual request files. File system storage provides durability at the cost of slower I/O than memory-based storage.
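
The idea of keeping queue state apart from the request files can be sketched with plain JSON. This is not how RecoverableState is implemented; the file name queue_state.json, the functions save_state and load_state, and the field names are all assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path


def save_state(state_file: Path, state: dict) -> None:
    """Write the queue state to its own file (hypothetical layout)."""
    state_file.write_text(json.dumps(state), encoding="utf-8")


def load_state(state_file: Path) -> dict:
    """Load the saved state, or start empty if nothing was saved yet."""
    if not state_file.exists():
        return {"order": [], "in_progress": [], "handled": []}
    return json.loads(state_file.read_text(encoding="utf-8"))


with tempfile.TemporaryDirectory() as tmp:
    state_file = Path(tmp) / "queue_state.json"
    # Simulate a process that saved its state before stopping.
    save_state(state_file, {"order": ["a", "b"], "in_progress": ["a"], "handled": []})
    # Simulate a fresh process picking the state back up.
    recovered = load_state(state_file)
```

Because ordering and status live in one place, the request files themselves only need to hold the request data.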

This implementation is ideal for long-running crawlers where persistence is important and for situations where you need to resume crawling after process termination.

Hierarchy

RequestQueueClient
- FileSystemRequestQueueClient

Methods

__init__

__init__(*, metadata, path_to_rq, lock, recoverable_state): None

Initialize a new instance.

Preferably use the FileSystemRequestQueueClient.open class method to create a new instance.
Parameters
- keyword-only metadata: RequestQueueMetadata
- keyword-only path_to_rq: Path
- keyword-only lock: asyncio.Lock
- keyword-only recoverable_state: RecoverableState[RequestQueueState]
Returns None

add_batch_of_requests

async add_batch_of_requests(requests, *, forefront): AddRequestsResponse

Overrides RequestQueueClient.add_batch_of_requests
Add a batch of requests to the queue.

Each request is checked for uniqueness (determined by its unique_key). Duplicates are identified but not re-added to the queue.
Parameters
- requests: Sequence[Request]
  The collection of requests to add to the queue.
- optional keyword-only forefront: bool = False
  Whether to put the added requests at the beginning (True) or the end (False) of the queue. When True, the requests will be processed sooner than previously added requests.
Returns AddRequestsResponse
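
The deduplication and forefront behavior described above can be modeled with a small in-memory toy. ToyQueue and add_batch are hypothetical names, plain strings stand in for Request.unique_key, and keeping the batch's own order when adding to the front is a choice of this sketch, not a documented guarantee of the client.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ToyQueue:
    """In-memory stand-in for a request queue (illustration only)."""

    pending: deque = field(default_factory=deque)
    seen: set = field(default_factory=set)

    def add_batch(self, unique_keys, *, forefront=False):
        added, duplicates = [], []
        for key in unique_keys:
            if key in self.seen:
                # Already known: report it, but do not enqueue it again.
                duplicates.append(key)
            else:
                self.seen.add(key)
                added.append(key)
        if forefront:
            # extendleft reverses its input, so reverse first to keep batch order.
            self.pending.extendleft(reversed(added))
        else:
            self.pending.extend(added)
        return added, duplicates


toy = ToyQueue()
first_result = toy.add_batch(["a", "b"])
second_result = toy.add_batch(["b", "c"], forefront=True)
```

After the second call, "c" sits at the head of the queue and "b" is reported as a duplicate.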

drop

async drop(): None

Overrides RequestQueueClient.drop
Drop the whole request queue and remove all its values.

The backend method for the RequestQueue.drop call.
Returns None

fetch_next_request

async fetch_next_request(): Request | None

Overrides RequestQueueClient.fetch_next_request
Return the next request in the queue to be processed.

Once you have successfully processed the request, call RequestQueue.mark_request_as_handled to mark it as handled in the queue. If processing failed, call RequestQueue.reclaim_request instead, so that the queue hands the request to a consumer again in a later call to fetch_next_request.

Note that a None return value does not mean that queue processing has finished; it only means there are currently no pending requests. To check whether all requests in the queue have been handled, use RequestQueue.is_finished instead.
Returns Request | None
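
The difference between "nothing pending" and "finished" can be shown with a toy lifecycle model. ToyLifecycleQueue and its method names are hypothetical and only mirror the calls named above; this is a sketch under those assumptions, not the client's implementation.

```python
from collections import deque


class ToyLifecycleQueue:
    """Tracks pending, in-progress, and handled keys (illustration only)."""

    def __init__(self, keys):
        self.pending = deque(keys)
        self.in_progress = set()
        self.handled = set()

    def fetch_next(self):
        if not self.pending:
            return None  # Nothing pending right now; work may still be in progress.
        key = self.pending.popleft()
        self.in_progress.add(key)
        return key

    def mark_handled(self, key):
        self.in_progress.discard(key)
        self.handled.add(key)

    def reclaim(self, key, *, forefront=False):
        # Return a failed request to the queue so it can be fetched again.
        self.in_progress.discard(key)
        if forefront:
            self.pending.appendleft(key)
        else:
            self.pending.append(key)

    def is_finished(self):
        return not self.pending and not self.in_progress


queue = ToyLifecycleQueue(["a", "b"])
first = queue.fetch_next()
second = queue.fetch_next()
nothing_pending = queue.fetch_next()  # None while "a" and "b" are in progress
finished_early = queue.is_finished()  # False: requests are still being processed
queue.mark_handled(first)
queue.reclaim(second)  # Processing "b" failed, so it goes back to the queue
retried = queue.fetch_next()
queue.mark_handled(retried)
finished_now = queue.is_finished()
```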

get_metadata

async get_metadata(): RequestQueueMetadata

Overrides RequestQueueClient.get_metadata
Get the metadata of the request queue.
Returns RequestQueueMetadata

get_request

async get_request(unique_key): Request | None

Overrides RequestQueueClient.get_request
Retrieve a request from the queue.
Parameters
- unique_key: str
  Unique key of the request to retrieve.
Returns Request | None

is_empty

async is_empty(): bool

Overrides RequestQueueClient.is_empty
Check if the request queue is empty.
Returns bool

mark_request_as_handled

async mark_request_as_handled(request): ProcessedRequest | None

Overrides RequestQueueClient.mark_request_as_handled
Mark a request as handled after successful processing.

Handled requests will never again be returned by the RequestQueue.fetch_next_request method.
Parameters
- request: Request
  The request to mark as handled.
Returns ProcessedRequest | None

open

async open(*, id, name, alias, configuration): Self

Open or create a file system request queue client.

This method attempts to open an existing request queue from the file system. If a queue with the specified ID or name exists, it loads the metadata and state from the stored files. If no existing queue is found, a new one is created.
Parameters
- keyword-only id: str | None
  The ID of the request queue to open. If provided, an existing queue is looked up by ID.
- keyword-only name: str | None
  The name of the request queue for named (global scope) storages.
- keyword-only alias: str | None
  The alias of the request queue for unnamed (run scope) storages.
- keyword-only configuration: Configuration
  The configuration object containing storage directory settings.
Returns Self

purge

async purge(): None

Overrides RequestQueueClient.purge
Purge all items from the request queue.

The backend method for the RequestQueue.purge call.
Returns None

reclaim_request

async reclaim_request(request, *, forefront): ProcessedRequest | None

Overrides RequestQueueClient.reclaim_request
Reclaim a failed request back to the queue.

The request will be returned for processing again by a later call to RequestQueue.fetch_next_request.
Parameters
- request: Request
  The request to return to the queue.
- optional keyword-only forefront: bool = False
  Whether to add the request to the head or the end of the queue.
Returns ProcessedRequest | None

Properties

path_to_metadata

path_to_metadata: Path

The full path to the request queue metadata file.

path_to_rq

path_to_rq: Path

The full path to the request queue directory.
