FileSystemRequestQueueClient
Hierarchy
- RequestQueueClient
- FileSystemRequestQueueClient
Methods
__init__
Initialize a new instance.
Preferably use the FileSystemRequestQueueClient.open class method to create a new instance.
Parameters
keyword-only metadata: RequestQueueMetadata
keyword-only storage_dir: Path
keyword-only lock: asyncio.Lock
Returns None
add_batch_of_requests
Add batch of requests to the queue.
This method adds a batch of requests to the queue. Each request is processed based on its uniqueness (determined by unique_key). Duplicates will be identified but not re-added to the queue.
Parameters
requests: Sequence[Request]
The collection of requests to add to the queue.
optional keyword-only forefront: bool = False
Whether to put the added requests at the beginning (True) or the end (False) of the queue. When True, the requests will be processed sooner than previously added requests.
Returns AddRequestsResponse
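The deduplication and forefront rules described above can be illustrated with a small in-memory model. This is a sketch only: ToyRequest and ToyQueue are hypothetical names, not part of the library, and the choice to keep batch order when forefront is True is an assumption of the model rather than documented behavior.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyRequest:
    """Hypothetical stand-in for a request; only the fields needed here."""

    url: str
    unique_key: str


class ToyQueue:
    """Hypothetical in-memory model of the dedup and forefront rules."""

    def __init__(self) -> None:
        self.pending = deque()
        self.seen_keys = set()

    def add_batch(self, requests, *, forefront=False):
        """Add unseen requests; return the unique keys that were duplicates."""
        duplicates = []
        fresh = []
        for request in requests:
            if request.unique_key in self.seen_keys:
                # Duplicate: identified, but not re-added.
                duplicates.append(request.unique_key)
                continue
            self.seen_keys.add(request.unique_key)
            fresh.append(request)
        if forefront:
            # extendleft reverses its input, so reverse first to keep
            # the batch order (an assumption of this model).
            self.pending.extendleft(reversed(fresh))
        else:
            self.pending.extend(fresh)
        return duplicates


queue = ToyQueue()
queue.add_batch([
    ToyRequest("https://example.com/a", "a"),
    ToyRequest("https://example.com/b", "b"),
])
dupes = queue.add_batch(
    [
        ToyRequest("https://example.com/a", "a"),  # already seen
        ToyRequest("https://example.com/c", "c"),  # new, goes to the front
    ],
    forefront=True,
)
order = [request.unique_key for request in queue.pending]
```

In this model the second batch reports "a" as a duplicate, and "c" is placed ahead of the previously added requests.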
drop
Drop the whole request queue and remove all its values.
The backend method for the RequestQueue.drop call.
Returns None
fetch_next_request
Return the next request in the queue to be processed.
Once you successfully finish processing the request, call RequestQueue.mark_request_as_handled to mark it as handled in the queue. If processing the request fails, call RequestQueue.reclaim_request instead, so that the queue hands the request to another consumer in a later call to the fetch_next_request method.
Note that a None return value does not mean that queue processing has finished; it means there are currently no pending requests. To check whether all requests in the queue have been handled, use RequestQueue.is_finished instead.
Returns Request | None
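The difference between "nothing pending right now" and "finished" is easier to see in a toy model of the request lifecycle. The ToyQueue class below is hypothetical and synchronous, and only mimics the states named above; it is not the library's implementation.

```python
from collections import deque


class ToyQueue:
    """Hypothetical model of pending, in-progress and handled requests."""

    def __init__(self, keys):
        self.pending = deque(keys)
        self.in_progress = set()
        self.handled = set()

    def fetch_next(self):
        """Return the next pending key, or None if nothing is pending now."""
        if not self.pending:
            return None
        key = self.pending.popleft()
        self.in_progress.add(key)
        return key

    def mark_handled(self, key):
        """Record success; the key is never handed out again."""
        self.in_progress.discard(key)
        self.handled.add(key)

    def reclaim(self, key, *, forefront=False):
        """Put a failed request back so a later fetch returns it again."""
        self.in_progress.discard(key)
        if forefront:
            self.pending.appendleft(key)
        else:
            self.pending.append(key)

    def is_finished(self):
        """Finished means nothing is pending and nothing is in progress."""
        return not self.pending and not self.in_progress


queue = ToyQueue(["a", "b"])
first = queue.fetch_next()   # "a" is now in progress
second = queue.fetch_next()  # "b" is now in progress
third = queue.fetch_next()   # None: nothing pending, but work remains
finished_early = queue.is_finished()

queue.mark_handled(first)    # "a" succeeded
queue.reclaim(second)        # "b" failed and goes back to the queue
retry = queue.fetch_next()   # "b" is handed out again
queue.mark_handled(retry)
finished_late = queue.is_finished()
```

Here the third fetch returns None while two requests are still in progress, so a consumer that stopped at that point would quit too early.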
get_metadata
Get the metadata of the request queue.
Returns RequestQueueMetadata
get_request
Retrieve a request from the queue.
Parameters
request_id: str
ID of the request to retrieve.
Returns Request | None
is_empty
Check if the request queue is empty.
Returns bool
mark_request_as_handled
Mark a request as handled after successful processing.
Handled requests will never again be returned by the RequestQueue.fetch_next_request method.
Parameters
request: Request
The request to mark as handled.
Returns ProcessedRequest | None
open
Open or create a file system request queue client.
This method attempts to open an existing request queue from the file system. If a queue with the specified ID or name exists, it loads the metadata and state from the stored files. If no existing queue is found, a new one is created.
Parameters
keyword-only id: str | None
The ID of the request queue to open. If provided, searches for an existing queue by ID.
keyword-only name: str | None
The name of the request queue to open. If not provided, uses the default queue.
keyword-only configuration: Configuration
The configuration object containing storage directory settings.
Returns FileSystemRequestQueueClient
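The open-or-create behavior described above follows a common pattern: look for stored metadata, load it if it exists, otherwise create it. The sketch below shows that pattern with the standard library only. The open_or_create function, the directory layout, the metadata.json file name and the metadata fields are all assumptions made for illustration, not the library's actual layout or schema.

```python
import json
import tempfile
from pathlib import Path


def open_or_create(storage_dir: Path, name: str) -> tuple[dict, bool]:
    """Load queue metadata if present, else create it.

    Returns (metadata, created). Everything about the on-disk layout
    here is hypothetical.
    """
    queue_dir = storage_dir / name
    metadata_path = queue_dir / "metadata.json"
    if metadata_path.exists():
        # Existing queue: reuse what was stored earlier.
        metadata = json.loads(metadata_path.read_text(encoding="utf-8"))
        return metadata, False
    # No queue found: create the directory and a fresh metadata file.
    queue_dir.mkdir(parents=True, exist_ok=True)
    metadata = {"name": name, "total_request_count": 0}
    metadata_path.write_text(json.dumps(metadata), encoding="utf-8")
    return metadata, True


with tempfile.TemporaryDirectory() as tmp:
    first_meta, first_created = open_or_create(Path(tmp), "my-queue")
    second_meta, second_created = open_or_create(Path(tmp), "my-queue")
```

The first call creates the queue and the second call finds and loads it, so both calls end up with the same metadata.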
purge
Purge all items from the request queue.
The backend method for the RequestQueue.purge call.
Returns None
reclaim_request
Reclaim a failed request back to the queue.
The request will be returned for processing again by a later call to RequestQueue.fetch_next_request.
Parameters
request: Request
The request to return to the queue.
optional keyword-only forefront: bool = False
Whether to add the request to the head or the end of the queue.
Returns ProcessedRequest | None
Properties
path_to_metadata
The full path to the request queue metadata file.
path_to_rq
The full path to the request queue directory.
A file system implementation of the request queue client.
This client persists requests to the file system as individual JSON files, making it suitable for scenarios where data needs to survive process restarts. Each request is stored as a separate file in a directory structure following the pattern:
The implementation uses RecoverableState to maintain ordering information, in-progress status, and request handling status. This allows for proper state recovery across process restarts without embedding metadata in individual request files. File system storage provides durability at the cost of slower I/O operations compared to memory-only storage.
This implementation is ideal for long-running crawlers where persistence is important and for situations where you need to resume crawling after process termination.
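To make the persistence idea concrete, the sketch below stores each request in its own JSON file and keeps ordering and handled status outside those files, then rebuilds the pending list from disk as a restarted process would. It is a standard-library illustration under stated assumptions: the helper functions, the file naming, and the _state.json file are invented for this example, and they do not reproduce the library's directory pattern or how RecoverableState stores its data.

```python
import json
import tempfile
from pathlib import Path

STATE_FILE = "_state.json"  # hypothetical name for the out-of-band state


def save_request(queue_dir: Path, request: dict) -> None:
    """Write one request to its own JSON file (naming scheme is made up)."""
    path = queue_dir / f"{request['unique_key']}.json"
    path.write_text(json.dumps(request), encoding="utf-8")


def save_state(queue_dir: Path, order: list, handled: list) -> None:
    """Persist ordering and handled status outside the request files."""
    state = {"order": order, "handled": handled}
    (queue_dir / STATE_FILE).write_text(json.dumps(state), encoding="utf-8")


def load_pending(queue_dir: Path) -> list:
    """Rebuild the list of unhandled requests from the files on disk."""
    state = json.loads((queue_dir / STATE_FILE).read_text(encoding="utf-8"))
    handled = set(state["handled"])
    pending = []
    for key in state["order"]:
        if key not in handled:
            text = (queue_dir / f"{key}.json").read_text(encoding="utf-8")
            pending.append(json.loads(text))
    return pending


with tempfile.TemporaryDirectory() as tmp:
    queue_dir = Path(tmp)
    save_request(queue_dir, {"unique_key": "a", "url": "https://example.com/a"})
    save_request(queue_dir, {"unique_key": "b", "url": "https://example.com/b"})
    save_state(queue_dir, order=["a", "b"], handled=["a"])

    # Nothing is kept in memory between these calls; only the files carry over.
    pending_urls = [request["url"] for request in load_pending(queue_dir)]
```

Because request "a" was recorded as handled before the simulated restart, only request "b" is still pending when the state is reloaded.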