
RequestProvider

Abstract base class defining the interface and common behaviour for request providers.

Request providers are used to manage and provide access to a storage of crawling requests.

Key responsibilities:

  • Fetching the next request to be processed.
  • Reclaiming requests that failed during processing, allowing retries.
  • Marking requests as successfully handled after processing.
  • Adding new requests to the provider, both individually and in batches.
  • Managing state information such as the total and handled request counts.
  • Deleting or dropping the provider from the underlying storage.

Subclasses of RequestProvider should provide specific implementations for each of the abstract methods.
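
The responsibilities listed above can be illustrated with a small in-memory sketch. It does not subclass the real RequestProvider and does not import the library; `ToyRequest` and `InMemoryProvider` are hypothetical names, and the methods only mirror the method names documented on this page under the assumption that everything is kept in process memory.

```python
from __future__ import annotations

import asyncio
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyRequest:
    """Hypothetical stand-in for a crawling request (not the library's Request)."""

    url: str


class InMemoryProvider:
    """Illustrative provider that keeps all state in process memory."""

    def __init__(self) -> None:
        self._pending: deque[ToyRequest] = deque()
        self._in_progress: set[ToyRequest] = set()
        self._handled: set[ToyRequest] = set()

    async def add_request(self, request: ToyRequest, *, forefront: bool = False) -> None:
        # forefront=True puts the request at the head of the queue.
        if forefront:
            self._pending.appendleft(request)
        else:
            self._pending.append(request)

    async def fetch_next_request(self) -> ToyRequest | None:
        if not self._pending:
            return None
        request = self._pending.popleft()
        self._in_progress.add(request)
        return request

    async def mark_request_as_handled(self, request: ToyRequest) -> None:
        self._in_progress.discard(request)
        self._handled.add(request)

    async def reclaim_request(self, request: ToyRequest, *, forefront: bool = False) -> None:
        # A failed request goes back to the pending queue for a later retry.
        self._in_progress.discard(request)
        await self.add_request(request, forefront=forefront)

    async def get_handled_count(self) -> int:
        return len(self._handled)

    async def get_total_count(self) -> int:
        return len(self._pending) + len(self._in_progress) + len(self._handled)

    async def is_finished(self) -> bool:
        return not self._pending and not self._in_progress


async def demo() -> tuple[int, int, bool]:
    provider = InMemoryProvider()
    await provider.add_request(ToyRequest("https://example.com/a"))
    await provider.add_request(ToyRequest("https://example.com/b"))

    first = await provider.fetch_next_request()
    assert first is not None
    await provider.mark_request_as_handled(first)  # processed successfully

    second = await provider.fetch_next_request()
    assert second is not None
    await provider.reclaim_request(second)  # simulate a failure, retry later

    return (
        await provider.get_handled_count(),
        await provider.get_total_count(),
        await provider.is_finished(),
    )


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

A real subclass would delegate these operations to the underlying resource client rather than to in-memory containers.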

Methods

add_request

  • Adds a single request to the provider and stores it in the underlying resource client.


    Parameters

    • request: str | Request (optional, keyword-only)

      The request object (or its string representation) to be added to the provider.

    • forefront: bool = False (optional, keyword-only)

      Determines whether the request is added to the beginning (if True) or the end (if False) of the provider.

    Returns ProcessedRequest
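
The effect of forefront on processing order can be sketched with a plain deque. The `enqueue` helper below is hypothetical and only models the ordering described above; it is not the library's add_request.

```python
from collections import deque


def enqueue(queue: deque[str], url: str, *, forefront: bool = False) -> None:
    """Hypothetical helper modelling the documented forefront flag."""
    if forefront:
        queue.appendleft(url)  # fetched before everything already queued
    else:
        queue.append(url)  # fetched after everything already queued


queue: deque[str] = deque()
enqueue(queue, "https://example.com/1")
enqueue(queue, "https://example.com/2")
enqueue(queue, "https://example.com/urgent", forefront=True)

# Order in which the requests would be fetched.
order = list(queue)
print(order)
```

In this model the request added with forefront=True is fetched first, ahead of the two requests queued earlier.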

add_requests_batched

  • async add_requests_batched(*, requests, batch_size, wait_time_between_batches, wait_for_all_requests_to_be_added, wait_for_all_requests_to_be_added_timeout): None
  • Add requests to the underlying resource client in batches.


    Parameters

    • requests: Sequence[str | Request] (optional, keyword-only)

      Requests to add to the queue.

    • batch_size: int = 1000 (optional, keyword-only)

      The number of requests to add in one batch.

    • wait_time_between_batches: timedelta = timedelta(seconds=1) (optional, keyword-only)

      Time to wait between adding batches.

    • wait_for_all_requests_to_be_added: bool = False (optional, keyword-only)

      If True, wait for all requests to be added before returning.

    • wait_for_all_requests_to_be_added_timeout: timedelta | None = None (optional, keyword-only)

      Timeout for waiting for all requests to be added.

    Returns None
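
How batch_size and wait_time_between_batches interact can be sketched as a simple chunk-and-pause loop. This is an assumption about the general technique, not the library's implementation: `split_into_batches`, `add_in_batches`, and the `sink` list (standing in for the storage client) are hypothetical, and the wait_for_all_requests_to_be_added behaviour is not modelled.

```python
import asyncio
from collections.abc import Sequence
from datetime import timedelta


def split_into_batches(requests: Sequence[str], batch_size: int) -> list[list[str]]:
    """Split requests into consecutive chunks of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [list(requests[i : i + batch_size]) for i in range(0, len(requests), batch_size)]


async def add_in_batches(
    requests: Sequence[str],
    sink: list[str],
    *,
    batch_size: int = 1000,
    wait_time_between_batches: timedelta = timedelta(seconds=1),
) -> int:
    """Add requests chunk by chunk, pausing between chunks; returns the batch count."""
    batches = split_into_batches(requests, batch_size)
    for index, batch in enumerate(batches):
        sink.extend(batch)  # stand-in for one call to the storage client
        if index < len(batches) - 1:
            # Pause only between batches, not after the last one.
            await asyncio.sleep(wait_time_between_batches.total_seconds())
    return len(batches)


stored: list[str] = []
urls = [f"https://example.com/page/{i}" for i in range(5)]
batch_count = asyncio.run(
    add_in_batches(urls, stored, batch_size=2, wait_time_between_batches=timedelta(0))
)
print(batch_count, len(stored))
```

With five requests and a batch size of two, the sketch produces three batches, the last one holding a single request.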

drop

  • async drop(): None
  • Removes the queue either from the Apify Cloud storage or from the local database.


    Returns None

fetch_next_request

  • async fetch_next_request(): Request | None
  • Returns the next request in the queue to be processed, or None if there are no more pending requests.


    Returns Request | None

get_handled_count

  • async get_handled_count(): int
  • Returns the number of handled requests.


    Returns int

get_total_count

  • async get_total_count(): int
  • Returns an offline approximation of the total number of requests in the queue (i.e. pending + handled).


    Returns int

is_empty

  • async is_empty(): bool
  • Returns True if there are no more requests in the queue (there might still be unfinished requests).


    Returns bool

is_finished

  • async is_finished(): bool
  • Returns True if all requests have been handled.


    Returns bool
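
The difference between is_empty and is_finished can be made concrete with a small state model. `QueueState` is a hypothetical, synchronous stand-in (the documented methods are async), assuming that a fetched request counts as unfinished until it is marked as handled.

```python
from dataclasses import dataclass, field


@dataclass
class QueueState:
    """Hypothetical model of pending and in-progress requests."""

    pending: list[str] = field(default_factory=list)
    in_progress: set[str] = field(default_factory=set)

    def is_empty(self) -> bool:
        # Nothing left to fetch, but fetched requests may still be running.
        return not self.pending

    def is_finished(self) -> bool:
        # Nothing left to fetch and nothing still being processed.
        return not self.pending and not self.in_progress


state = QueueState(pending=["https://example.com/a"])

# Fetch the only request: the queue is now empty but not finished.
url = state.pending.pop(0)
state.in_progress.add(url)
while_processing = (state.is_empty(), state.is_finished())

# Mark it as handled: the queue is now both empty and finished.
state.in_progress.discard(url)
after_handling = (state.is_empty(), state.is_finished())

print(while_processing, after_handling)
```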

mark_request_as_handled

  • Marks a request as handled after successful processing (or after giving up on retrying).


    Parameters

    • request: Request (optional, keyword-only)

    Returns ProcessedRequest | None

reclaim_request

  • Returns a failed request to the queue so that it can be fetched and processed again later.

    It is possible to modify the request data by supplying an updated request as a parameter.


    Parameters

    • request: Request (optional, keyword-only)
    • forefront: bool = False (optional, keyword-only)

    Returns ProcessedRequest | None
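
A typical consumer combines fetching, marking as handled, and reclaiming in a retry loop. The sketch below models that loop with a deque instead of a real provider; `process_all`, `flaky_handler`, and the `MAX_RETRIES` limit are hypothetical, and the comments indicate which documented method each step corresponds to.

```python
import asyncio
from collections import deque
from collections.abc import Awaitable, Callable, Iterable

MAX_RETRIES = 2  # hypothetical retry limit, not a library default


async def process_all(
    urls: Iterable[str],
    handler: Callable[[str], Awaitable[None]],
) -> tuple[list[str], list[str]]:
    """Process every URL, retrying failures; returns (handled, given_up)."""
    pending: deque[tuple[str, int]] = deque((url, 0) for url in urls)
    handled: list[str] = []
    given_up: list[str] = []

    while pending:
        url, retries = pending.popleft()  # corresponds to fetch_next_request
        try:
            await handler(url)
        except Exception:
            if retries < MAX_RETRIES:
                # Corresponds to reclaim_request with forefront=False.
                pending.append((url, retries + 1))
            else:
                # Giving up also ends the request's life cycle.
                given_up.append(url)
        else:
            handled.append(url)  # corresponds to mark_request_as_handled

    return handled, given_up


attempts: dict[str, int] = {}


async def flaky_handler(url: str) -> None:
    """Fails on the first attempt for URLs ending in /flaky."""
    attempts[url] = attempts.get(url, 0) + 1
    if url.endswith("/flaky") and attempts[url] == 1:
        raise RuntimeError("transient failure")


handled, given_up = asyncio.run(
    process_all(["https://example.com/flaky", "https://example.com/ok"], flaky_handler)
)
print(handled, given_up)
```

Because the failed request is reclaimed to the end of the queue, the other request is processed before the retry; reclaiming with forefront=True would retry it immediately instead.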

Properties

name

name: str | None

ID or name of the request queue.