
FileSystemRequestQueueClient

A file system implementation of the request queue client.

This client persists requests to the file system as individual JSON files, making it suitable for scenarios where data needs to survive process restarts. Each request is stored as a separate file in a directory structure following the pattern:

{STORAGE_DIR}/request_queues/{QUEUE_ID}/{REQUEST_ID}.json
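
As an illustration of this layout, the standard-library sketch below writes each request to its own JSON file and reads it back. The helpers save_request and load_request, and the plain-dict request shape, are hypothetical. They only mimic the documented path pattern and are not part of Crawlee, which serializes its own Request model.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path


def request_path(storage_dir: Path, queue_id: str, request_id: str) -> Path:
    # {STORAGE_DIR}/request_queues/{QUEUE_ID}/{REQUEST_ID}.json
    return storage_dir / "request_queues" / queue_id / f"{request_id}.json"


def save_request(storage_dir: Path, queue_id: str, request: dict) -> Path:
    # Hypothetical helper: one JSON file per request, directories created on demand.
    path = request_path(storage_dir, queue_id, request["id"])
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(request), encoding="utf-8")
    return path


def load_request(storage_dir: Path, queue_id: str, request_id: str) -> dict | None:
    # Hypothetical helper: returns None when no file exists for the given request ID.
    path = request_path(storage_dir, queue_id, request_id)
    if not path.exists():
        return None
    return json.loads(path.read_text(encoding="utf-8"))


with tempfile.TemporaryDirectory() as tmp:
    storage = Path(tmp)
    saved = save_request(storage, "default", {"id": "abc123", "url": "https://example.com"})
    print(saved.relative_to(storage).as_posix())  # request_queues/default/abc123.json
    print(load_request(storage, "default", "abc123"))  # {'id': 'abc123', 'url': 'https://example.com'}
```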

The implementation uses RecoverableState to maintain ordering information, in-progress status, and request handling status. This allows the queue state to be recovered correctly across process restarts without embedding metadata in individual request files. File system storage provides durability at the cost of slower I/O than memory-only storage.
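
To show why keeping this state outside the request files helps with recovery, here is a minimal sketch that persists a hypothetical QueueState (a sequence counter, per-request order, and in-progress and handled sets) to a single JSON file and rebuilds it as a fresh process would. It stands in for RecoverableState only conceptually; the field names and file format are assumptions, not Crawlee's actual schema.

```python
from __future__ import annotations

import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class QueueState:
    # Hypothetical state shape, stored separately from the request files.
    sequence_counter: int = 0
    order: dict[str, int] = field(default_factory=dict)  # request ID -> position
    in_progress: set[str] = field(default_factory=set)
    handled: set[str] = field(default_factory=set)


def persist_state(state: QueueState, path: Path) -> None:
    data = asdict(state)
    # JSON has no set type, so the sets are stored as sorted lists.
    data["in_progress"] = sorted(state.in_progress)
    data["handled"] = sorted(state.handled)
    path.write_text(json.dumps(data), encoding="utf-8")


def recover_state(path: Path) -> QueueState:
    # A missing file means there is nothing to recover yet.
    if not path.exists():
        return QueueState()
    data = json.loads(path.read_text(encoding="utf-8"))
    data["in_progress"] = set(data["in_progress"])
    data["handled"] = set(data["handled"])
    return QueueState(**data)


with tempfile.TemporaryDirectory() as tmp:
    state_file = Path(tmp) / "state.json"
    state = QueueState(sequence_counter=2, order={"a": 0, "b": 1}, in_progress={"b"}, handled={"a"})
    persist_state(state, state_file)
    # A restarted process would begin here and rebuild the same state from disk.
    print(recover_state(state_file) == state)  # True
```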

This implementation is ideal for long-running crawlers where persistence is important and for situations where you need to resume crawling after process termination.

Methods

__init__

  • __init__(*, metadata, storage_dir, lock): None
  • Initialize a new instance.

    Preferably use the FileSystemRequestQueueClient.open class method to create a new instance.


    Parameters

    • keyword-only metadata: RequestQueueMetadata
    • keyword-only storage_dir: Path
    • keyword-only lock: asyncio.Lock

    Returns None

add_batch_of_requests

  • Add a batch of requests to the queue.

    This method adds a batch of requests to the queue. Each request is checked for uniqueness, as determined by its unique_key; duplicates are identified but not re-added to the queue.


    Parameters

    • requests: Sequence[Request]

      The collection of requests to add to the queue.

    • optional keyword-only forefront: bool = False

      Whether to put the added requests at the beginning (True) or the end (False) of the queue. When True, the requests will be processed sooner than previously added requests.

    Returns AddRequestsResponse
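
The deduplication and forefront behavior described above can be approximated with a small in-memory DedupQueue. This is a hypothetical sketch, not the client's implementation: it tracks seen unique_key values in a set and, as an assumption of the sketch, keeps a forefront batch in its original order at the head of the queue.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass
class ToyRequest:
    unique_key: str
    url: str


class DedupQueue:
    """Hypothetical in-memory queue illustrating unique_key deduplication and forefront."""

    def __init__(self) -> None:
        self._pending: deque[ToyRequest] = deque()
        self._known_keys: set[str] = set()

    def add_batch(
        self, requests: list[ToyRequest], *, forefront: bool = False
    ) -> tuple[list[str], list[str]]:
        new: list[ToyRequest] = []
        duplicates: list[str] = []
        for request in requests:
            if request.unique_key in self._known_keys:
                duplicates.append(request.unique_key)  # identified, but not re-added
                continue
            self._known_keys.add(request.unique_key)
            new.append(request)
        if forefront:
            # extendleft() reverses its input, so reverse first to keep batch order at the head.
            self._pending.extendleft(reversed(new))
        else:
            self._pending.extend(new)
        return [r.unique_key for r in new], duplicates

    def pending_keys(self) -> list[str]:
        return [r.unique_key for r in self._pending]


queue = DedupQueue()
queue.add_batch([ToyRequest("a", "https://example.com/a"), ToyRequest("b", "https://example.com/b")])
result = queue.add_batch(
    [ToyRequest("c", "https://example.com/c"), ToyRequest("a", "https://example.com/a")],
    forefront=True,
)
print(result)  # (['c'], ['a'])
print(queue.pending_keys())  # ['c', 'a', 'b']
```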

drop

  • async drop(): None
  • Drop the whole request queue and remove all its values.

    The backend method for the RequestQueue.drop call.


    Returns None
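
On the file system, dropping a queue roughly amounts to removing its directory. The sketch below assumes, for illustration, that everything belonging to the queue (request files, metadata, state) lives under a single directory such as the one exposed by path_to_rq; drop_queue is a hypothetical helper, not Crawlee code.

```python
import shutil
import tempfile
from pathlib import Path


def drop_queue(path_to_rq: Path) -> None:
    # Remove the queue directory with every file inside it.
    # Calling it on an already-dropped queue is a no-op.
    if path_to_rq.exists():
        shutil.rmtree(path_to_rq)


with tempfile.TemporaryDirectory() as tmp:
    rq_dir = Path(tmp) / "request_queues" / "default"
    rq_dir.mkdir(parents=True)
    (rq_dir / "abc123.json").write_text("{}", encoding="utf-8")
    drop_queue(rq_dir)
    print(rq_dir.exists())  # False
    drop_queue(rq_dir)  # harmless second call
```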

fetch_next_request

  • async fetch_next_request(): Request | None
  • Return the next request in the queue to be processed.

    Once you have successfully finished processing the request, call RequestQueue.mark_request_as_handled to mark it as handled in the queue. If an error occurred while processing the request, call RequestQueue.reclaim_request instead, so that the queue will give the request to some other consumer in another call to the fetch_next_request method.

    Note that a return value of None does not mean that queue processing has finished; it means there are currently no pending requests. To check whether all requests in the queue have been handled, use RequestQueue.is_finished instead.


    Returns Request | None
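
The fetch, mark-as-handled, and reclaim contract described above can be sketched with a hypothetical in-memory ToyQueue whose method names mirror the ones in the text. It is not Crawlee's RequestQueue; it only shows the control flow: mark a request as handled on success, reclaim it on failure, and treat None as "nothing pending right now" while relying on is_finished to decide when to stop.

```python
from __future__ import annotations

import asyncio
from collections import deque


class ToyQueue:
    """Hypothetical in-memory queue with the fetch / mark-handled / reclaim contract."""

    def __init__(self, urls: list[str]) -> None:
        self._pending: deque[str] = deque(urls)
        self._in_progress: set[str] = set()
        self.handled: list[str] = []

    async def fetch_next_request(self) -> str | None:
        if not self._pending:
            return None  # nothing pending right now; not necessarily finished
        url = self._pending.popleft()
        self._in_progress.add(url)
        return url

    async def mark_request_as_handled(self, url: str) -> None:
        self._in_progress.discard(url)
        self.handled.append(url)

    async def reclaim_request(self, url: str) -> None:
        self._in_progress.discard(url)
        self._pending.append(url)

    async def is_finished(self) -> bool:
        return not self._pending and not self._in_progress


async def consume(queue: ToyQueue, fail_once: set[str]) -> list[str]:
    while not await queue.is_finished():
        url = await queue.fetch_next_request()
        if url is None:
            # Other consumers may still hold in-progress requests; wait and retry.
            await asyncio.sleep(0.01)
            continue
        try:
            if url in fail_once:
                fail_once.discard(url)
                raise RuntimeError(f"transient failure for {url}")
            # ... process the request here ...
        except RuntimeError:
            await queue.reclaim_request(url)  # hand it back for another attempt
        else:
            await queue.mark_request_as_handled(url)
    return queue.handled


print(asyncio.run(consume(ToyQueue(["a", "b", "c"]), fail_once={"b"})))  # ['a', 'c', 'b']
```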

get_metadata

get_request

  • async get_request(request_id): Request | None

is_empty

  • async is_empty(): bool

mark_request_as_handled

open

  • Open or create a file system request queue client.

    This method attempts to open an existing request queue from the file system. If a queue with the specified ID or name exists, it loads the metadata and state from the stored files. If no existing queue is found, a new one is created.


    Parameters

    • keyword-only id: str | None

      The ID of the request queue to open. If provided, searches for an existing queue by ID.

    • keyword-only name: str | None

      The name of the request queue to open. If not provided, uses the default queue.

    • keyword-only configuration: Configuration

      The configuration object containing storage directory settings.

    Returns FileSystemRequestQueueClient
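
A rough sketch of the open-or-create logic, using only the standard library. It assumes each queue directory is named after the queue ID (following the path pattern at the top of this page) and contains a metadata file; the file name __metadata__.json, the helper open_or_create, and the "default" fallback name are assumptions for illustration, not Crawlee's actual code.

```python
from __future__ import annotations

import json
import tempfile
import uuid
from pathlib import Path

METADATA_FILENAME = "__metadata__.json"  # assumed file name, for illustration only


def open_or_create(storage_dir: Path, *, id: str | None = None, name: str | None = None) -> dict:
    root = storage_dir / "request_queues"
    wanted_name = name or "default"  # hypothetical stand-in for "the default queue"
    if root.is_dir():
        for meta_file in root.glob(f"*/{METADATA_FILENAME}"):
            metadata = json.loads(meta_file.read_text(encoding="utf-8"))
            # Match by ID when one is given, otherwise by name.
            matches_id = id is not None and metadata["id"] == id
            matches_name = id is None and metadata["name"] == wanted_name
            if matches_id or matches_name:
                return metadata
    # Nothing matched: create a new queue directory named after its ID.
    metadata = {"id": id or uuid.uuid4().hex, "name": wanted_name}
    queue_dir = root / metadata["id"]
    queue_dir.mkdir(parents=True, exist_ok=True)
    (queue_dir / METADATA_FILENAME).write_text(json.dumps(metadata), encoding="utf-8")
    return metadata


with tempfile.TemporaryDirectory() as tmp:
    storage = Path(tmp)
    first = open_or_create(storage, name="products")
    again = open_or_create(storage, name="products")
    print(first == again)  # True: the second call found the stored metadata
    print(open_or_create(storage, id=first["id"])["name"])  # products
```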

purge

  • async purge(): None
  • Purge all items from the request queue.

    The backend method for the RequestQueue.purge call.


    Returns None
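
Unlike drop, which removes the whole queue, purge is described as removing its items. A minimal sketch of that difference, assuming (hypothetically) that request files are the *.json files in the queue directory and that a metadata file, here called __metadata__.json, should survive the purge; purge_queue is not a Crawlee function.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

KEEP = frozenset({"__metadata__.json"})  # assumed metadata file name


def purge_queue(path_to_rq: Path, keep: frozenset[str] = KEEP) -> int:
    # Delete request files but leave the queue directory and the kept files in place.
    removed = 0
    for file in list(path_to_rq.glob("*.json")):  # materialize before deleting
        if file.name not in keep:
            file.unlink()
            removed += 1
    return removed


with tempfile.TemporaryDirectory() as tmp:
    rq_dir = Path(tmp)
    for name in ("a.json", "b.json", "__metadata__.json"):
        (rq_dir / name).write_text("{}", encoding="utf-8")
    print(purge_queue(rq_dir))  # 2
    print(sorted(p.name for p in rq_dir.iterdir()))  # ['__metadata__.json']
```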

reclaim_request

  • Reclaim a failed request back to the queue.

    The request will be returned for processing again by a later call to RequestQueue.fetch_next_request.


    Parameters

    • request: Request

      The request to return to the queue.

    • optional keyword-only forefront: bool = False

      Whether to add the request to the head or the end of the queue.

    Returns ProcessedRequest | None
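
How forefront affects a reclaimed request can be illustrated with a hypothetical ReclaimQueue that tracks in-progress IDs. Returning None for a request that is not in progress is an assumption made for this sketch, loosely mirroring the ProcessedRequest | None return type.

```python
from __future__ import annotations

from collections import deque


class ReclaimQueue:
    """Hypothetical queue illustrating reclaim with and without forefront."""

    def __init__(self, request_ids: list[str]) -> None:
        self._pending: deque[str] = deque(request_ids)
        self._in_progress: set[str] = set()

    def fetch(self) -> str | None:
        if not self._pending:
            return None
        request_id = self._pending.popleft()
        self._in_progress.add(request_id)
        return request_id

    def reclaim(self, request_id: str, *, forefront: bool = False) -> str | None:
        if request_id not in self._in_progress:
            return None  # assumption: nothing to reclaim for a request that is not in progress
        self._in_progress.remove(request_id)
        if forefront:
            self._pending.appendleft(request_id)  # retried before the rest of the backlog
        else:
            self._pending.append(request_id)  # retried after the current backlog
        return request_id

    def pending(self) -> list[str]:
        return list(self._pending)


queue = ReclaimQueue(["a", "b", "c"])
queue.reclaim(queue.fetch())  # "a" goes to the end
print(queue.pending())  # ['b', 'c', 'a']
queue.reclaim(queue.fetch(), forefront=True)  # "b" goes back to the head
print(queue.pending())  # ['b', 'c', 'a']
```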

Properties

path_to_metadata

path_to_metadata: Path

The full path to the request queue metadata file.

path_to_rq

path_to_rq: Path

The full path to the request queue directory.
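
Both properties can be modeled as paths derived from the storage directory. In this sketch the directory layout follows the pattern documented at the top of this page, while QueuePaths and the metadata file name __metadata__.json are assumptions for illustration only.

```python
from dataclasses import dataclass
from pathlib import Path

METADATA_FILENAME = "__metadata__.json"  # assumed name, for illustration only


@dataclass(frozen=True)
class QueuePaths:
    storage_dir: Path
    queue_id: str

    @property
    def path_to_rq(self) -> Path:
        # {STORAGE_DIR}/request_queues/{QUEUE_ID}
        return self.storage_dir / "request_queues" / self.queue_id

    @property
    def path_to_metadata(self) -> Path:
        # Metadata file stored inside the queue directory (assumed location).
        return self.path_to_rq / METADATA_FILENAME


paths = QueuePaths(Path("storage"), "default")
print(paths.path_to_rq.as_posix())  # storage/request_queues/default
print(paths.path_to_metadata.as_posix())  # storage/request_queues/default/__metadata__.json
```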