
RequestProvider

Abstract base class defining the interface and common behaviour for request providers.

Request providers are used to manage and provide access to a storage of crawling requests.

Key responsibilities:

- Fetching the next request to be processed.
- Reclaiming requests that failed during processing, allowing retries.
- Marking requests as successfully handled after processing.
- Adding new requests to the provider, both individually and in batches.
- Managing state information such as the total and handled request counts.
- Deleting or dropping the provider from the underlying storage.

Subclasses of RequestProvider should provide specific implementations for each of the abstract methods.
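
As a rough illustration of what such a subclass might look like, the sketch below implements a few of the methods listed on this page on top of an in-memory deque. `ProviderInterface` and `InMemoryProvider` are hypothetical stand-ins written for this example (they do not import or extend the real RequestProvider), and requests are plain URL strings rather than Request objects.

```python
from __future__ import annotations

import asyncio
from abc import ABC, abstractmethod
from collections import deque


class ProviderInterface(ABC):
    """Hypothetical stand-in for RequestProvider; only the method names come from this page."""

    @abstractmethod
    async def add_request(self, *, request: str, forefront: bool = False) -> None: ...

    @abstractmethod
    async def fetch_next_request(self) -> str | None: ...

    @abstractmethod
    async def mark_request_as_handled(self, *, request: str) -> None: ...

    @abstractmethod
    async def is_finished(self) -> bool: ...


class InMemoryProvider(ProviderInterface):
    """Assumed in-memory implementation, for illustration only."""

    def __init__(self) -> None:
        self._pending: deque[str] = deque()
        self._in_progress: set[str] = set()
        self._handled: set[str] = set()

    async def add_request(self, *, request: str, forefront: bool = False) -> None:
        # forefront=True puts the request at the beginning, otherwise at the end.
        if forefront:
            self._pending.appendleft(request)
        else:
            self._pending.append(request)

    async def fetch_next_request(self) -> str | None:
        if not self._pending:
            return None
        request = self._pending.popleft()
        self._in_progress.add(request)
        return request

    async def mark_request_as_handled(self, *, request: str) -> None:
        self._in_progress.discard(request)
        self._handled.add(request)

    async def is_finished(self) -> bool:
        return not self._pending and not self._in_progress


async def demo() -> list[str]:
    provider = InMemoryProvider()
    await provider.add_request(request="https://example.com/a")
    await provider.add_request(request="https://example.com/b")
    seen = []
    while (request := await provider.fetch_next_request()) is not None:
        seen.append(request)
        await provider.mark_request_as_handled(request=request)
    assert await provider.is_finished()
    return seen
```

Calling `asyncio.run(demo())` drains the provider in insertion order. A real subclass would also need to cover the remaining methods and persist its state in the underlying storage.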

Hierarchy

RequestProvider
- RequestList
- RequestQueue

Index

Methods

add_request
add_requests_batched
drop
fetch_next_request
get_handled_count
get_total_count
is_empty
is_finished
mark_request_as_handled
reclaim_request

Properties

name

Methods

add_request

async add_request(*, request, forefront): ProcessedRequest

Adds a single request to the provider and stores it in the underlying resource client.
Parameters
- request: str | Request (optional, keyword-only)
  The request object (or its string representation) to be added to the provider.
- forefront: bool = False (optional, keyword-only)
  Determines whether the request should be added to the beginning (if True) or the end (if False) of the provider.
Returns ProcessedRequest
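
The effect of `forefront` on ordering can be pictured with a plain deque. This is only a sketch of the semantics described above; the `enqueue` helper and the URLs are made up for the example and are not part of the library.

```python
from collections import deque


def enqueue(pending: deque, request: str, *, forefront: bool = False) -> None:
    """Hypothetical helper: add to the beginning if forefront is True, else to the end."""
    if forefront:
        pending.appendleft(request)
    else:
        pending.append(request)


pending = deque()
enqueue(pending, "https://example.com/1")
enqueue(pending, "https://example.com/2")
enqueue(pending, "https://example.com/urgent", forefront=True)

# The forefront request is now ahead of the two added earlier.
order = list(pending)
```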

add_requests_batched

async add_requests_batched(*, requests, batch_size, wait_time_between_batches, wait_for_all_requests_to_be_added, wait_for_all_requests_to_be_added_timeout): None

Add requests to the underlying resource client in batches.
Parameters
- requests: Sequence[str | Request] (optional, keyword-only)
  Requests to add to the queue.
- batch_size: int = 1000 (optional, keyword-only)
  The number of requests to add in one batch.
- wait_time_between_batches: timedelta = timedelta(seconds=1) (optional, keyword-only)
  Time to wait between adding batches.
- wait_for_all_requests_to_be_added: bool = False (optional, keyword-only)
  If True, wait for all requests to be added before returning.
- wait_for_all_requests_to_be_added_timeout: timedelta | None = None (optional, keyword-only)
  Timeout for waiting for all requests to be added.
Returns None
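
To make the interplay of `batch_size` and `wait_time_between_batches` concrete, here is a minimal sketch of batched insertion. It is not the library's implementation: `add_in_batches` and the `add_batch` callback are hypothetical names, and the sketch always waits for every batch, so it does not model the `wait_for_all_requests_to_be_added` options.

```python
from __future__ import annotations

import asyncio
from datetime import timedelta
from typing import Awaitable, Callable, Sequence


async def add_in_batches(
    requests: Sequence[str],
    add_batch: Callable[[Sequence[str]], Awaitable[None]],
    *,
    batch_size: int = 1000,
    wait_time_between_batches: timedelta = timedelta(seconds=1),
) -> None:
    """Hypothetical helper: send requests to add_batch in slices of batch_size."""
    for start in range(0, len(requests), batch_size):
        if start > 0:
            # Pause between batches, but not before the first one.
            await asyncio.sleep(wait_time_between_batches.total_seconds())
        await add_batch(requests[start : start + batch_size])


async def demo() -> list[int]:
    sizes: list[int] = []

    async def add_batch(batch: Sequence[str]) -> None:
        # Stand-in for the storage call; it only records the batch size.
        sizes.append(len(batch))

    urls = [f"https://example.com/{i}" for i in range(5)]
    await add_in_batches(
        urls,
        add_batch,
        batch_size=2,
        wait_time_between_batches=timedelta(milliseconds=10),
    )
    return sizes
```

With five URLs and a batch size of two, `asyncio.run(demo())` records three batches, the last one holding the single leftover request.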

drop

async drop(): None

Removes the queue either from the Apify Cloud storage or from the local database.
Returns None

fetch_next_request

async fetch_next_request(): Request | None

Returns the next request in the queue to be processed, or None if there are no more pending requests.
Returns Request | None

get_handled_count

async get_handled_count(): int

Returns the number of handled requests.
Returns int

get_total_count

async get_total_count(): int

Returns an offline approximation of the total number of requests in the queue (i.e. pending + handled).
Returns int

is_empty

async is_empty(): bool

Returns True if there are no more requests in the queue (there might still be unfinished requests).
Returns bool

is_finished

async is_finished(): bool

Returns True if all requests have been handled.
Returns bool
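
The difference between `is_empty` and `is_finished` shows up while a request has been fetched but not yet marked as handled. The sketch below models that with a hypothetical `ToyProvider` class that tracks pending and in-progress requests in memory. It assumes this bookkeeping purely for illustration and says nothing about how the real providers store their state.

```python
import asyncio
from collections import deque


class ToyProvider:
    """Hypothetical in-memory stand-in, used only to contrast the two checks."""

    def __init__(self, urls):
        self._pending = deque(urls)
        self._in_progress = set()

    async def fetch_next_request(self):
        if not self._pending:
            return None
        request = self._pending.popleft()
        self._in_progress.add(request)
        return request

    async def mark_request_as_handled(self, *, request):
        self._in_progress.discard(request)

    async def is_empty(self):
        # Nothing left to fetch, although some requests may still be in progress.
        return not self._pending

    async def is_finished(self):
        # Nothing left to fetch and nothing still being processed.
        return not self._pending and not self._in_progress


async def demo():
    provider = ToyProvider(["https://example.com/only"])
    request = await provider.fetch_next_request()
    while_processing = (await provider.is_empty(), await provider.is_finished())
    await provider.mark_request_as_handled(request=request)
    after_handling = (await provider.is_empty(), await provider.is_finished())
    return while_processing, after_handling
```

In this sketch the provider is already empty while its only request is being processed, and it becomes finished only after that request is marked as handled.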

mark_request_as_handled

async mark_request_as_handled(*, request): ProcessedRequest | None

Marks a request as handled after successful processing (or after giving up on retrying).
Parameters
- request: Request (optional, keyword-only)
Returns ProcessedRequest | None

reclaim_request

async reclaim_request(*, request, forefront): ProcessedRequest | None

Reclaims a failed request back to the queue, so that it can be returned for processing again later.

It is possible to modify the request data by supplying an updated request as a parameter.
Parameters
- request: Request (optional, keyword-only)
- forefront: bool = False (optional, keyword-only)
Returns ProcessedRequest | None
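
A typical processing loop combines `fetch_next_request`, `mark_request_as_handled` and `reclaim_request`. The following is a minimal sketch of that loop, assuming a hypothetical in-memory `TinyQueue` and a made-up `max_retries` limit. The retry bookkeeping is an assumption of this example, not something the provider does for you.

```python
import asyncio
from collections import deque


class TinyQueue:
    """Hypothetical in-memory stand-in; not the library's implementation."""

    def __init__(self, urls):
        self.pending = deque(urls)
        self.handled = []

    async def fetch_next_request(self):
        return self.pending.popleft() if self.pending else None

    async def mark_request_as_handled(self, *, request):
        self.handled.append(request)

    async def reclaim_request(self, *, request, forefront=False):
        if forefront:
            self.pending.appendleft(request)
        else:
            self.pending.append(request)


async def crawl(queue, handler, max_retries=2):
    attempts = {}
    while (request := await queue.fetch_next_request()) is not None:
        attempts[request] = attempts.get(request, 0) + 1
        try:
            await handler(request)
        except Exception:
            if attempts[request] <= max_retries:
                # Put the failed request back so it is fetched again later.
                await queue.reclaim_request(request=request)
                continue
        # Reached on success, or after giving up on further retries.
        await queue.mark_request_as_handled(request=request)
    return attempts


async def demo():
    remaining_failures = {"https://example.com/flaky": 1}

    async def handler(url):
        # Fails once for the flaky URL, then succeeds.
        if remaining_failures.get(url, 0) > 0:
            remaining_failures[url] -= 1
            raise RuntimeError("temporary failure")

    queue = TinyQueue(["https://example.com/flaky", "https://example.com/ok"])
    attempts = await crawl(queue, handler)
    return queue.handled, attempts
```

Because the reclaimed request goes to the end of the queue here (`forefront=False`), the flaky URL is retried after the other request and ends up handled last.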

Properties

name

name: str | None

ID or name of the request queue.
