RequestProvider

crawlee.storages._request_provider.RequestProvider

Abstract base class defining the interface and common behaviour for request providers.

Request providers are used to manage and provide access to a storage of crawling requests.

Key responsibilities:

  • Fetching the next request to be processed.
  • Reclaiming requests that failed during processing, allowing retries.
  • Marking requests as successfully handled after processing.
  • Adding new requests to the provider, both individually and in batches.
  • Managing state information such as the total and handled request counts.
  • Deleting or dropping the provider from the underlying storage.

Subclasses of RequestProvider should provide specific implementations for each of the abstract methods.
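The subclassing contract can be illustrated with Python's `abc` module. The sketch below is a minimal stand-in, not crawlee's actual source: `MiniProvider`, `ListProvider`, and `IncompleteProvider` are hypothetical names, requests are modeled as plain strings, and only three of the methods listed on this page are reproduced.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Optional


class MiniProvider(ABC):
    """Hypothetical, trimmed-down analogue of an abstract request provider."""

    @abstractmethod
    async def fetch_next_request(self) -> Optional[str]:
        """Return the next pending request, or None if nothing is pending."""

    @abstractmethod
    async def get_total_count(self) -> int:
        """Return how many requests have been added so far."""

    @abstractmethod
    async def is_empty(self) -> bool:
        """Return True when no requests are waiting to be fetched."""


class ListProvider(MiniProvider):
    """Concrete subclass: implements every abstract method."""

    def __init__(self, urls: list) -> None:
        self._pending = list(urls)
        self._total = len(urls)

    async def fetch_next_request(self) -> Optional[str]:
        return self._pending.pop(0) if self._pending else None

    async def get_total_count(self) -> int:
        return self._total

    async def is_empty(self) -> bool:
        return not self._pending


class IncompleteProvider(MiniProvider):
    """Omits get_total_count and is_empty, so it remains abstract."""

    async def fetch_next_request(self) -> Optional[str]:
        return None


async def demo() -> tuple:
    provider = ListProvider(["https://example.com"])
    first = await provider.fetch_next_request()
    return first, await provider.get_total_count(), await provider.is_empty()
```

Instantiating `IncompleteProvider()` raises `TypeError`, because Python refuses to instantiate a class that still has unimplemented abstract methods. This is the mechanism that forces a subclass to cover the whole interface.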

Methods

add_request

  • async add_request(request, *, forefront): ProcessedRequest
  • Add a single request to the provider and store it in the underlying resource client.


    Parameters

    • request: str | Request
    • forefront: bool = False (keyword-only)

    Returns ProcessedRequest
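The following sketch shows what `forefront` is assumed to control: placing the request at the head of the queue instead of the tail. This page does not spell out that behaviour, so treat it as an assumption. `ToyQueue` is a hypothetical in-memory stand-in, requests are plain URL strings, and the `ProcessedRequest` return value is not modeled.

```python
import asyncio
from collections import deque
from typing import Optional


class ToyQueue:
    """Hypothetical in-memory stand-in; not crawlee's implementation."""

    def __init__(self) -> None:
        self._pending: deque = deque()

    async def add_request(self, request: str, *, forefront: bool = False) -> None:
        # Assumption: forefront=True inserts at the head, otherwise at the tail.
        if forefront:
            self._pending.appendleft(request)
        else:
            self._pending.append(request)

    async def fetch_next_request(self) -> Optional[str]:
        return self._pending.popleft() if self._pending else None


async def demo() -> list:
    queue = ToyQueue()
    await queue.add_request("https://example.com/a")
    await queue.add_request("https://example.com/b")
    # forefront is keyword-only, so it must be passed by name.
    await queue.add_request("https://example.com/urgent", forefront=True)
    order = []
    while (url := await queue.fetch_next_request()) is not None:
        order.append(url)
    return order
```

Under these assumptions, `asyncio.run(demo())` yields the urgent URL first, followed by `a` and `b` in insertion order.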

add_requests_batched

  • async add_requests_batched(requests, *, batch_size, wait_time_between_batches, wait_for_all_requests_to_be_added, wait_for_all_requests_to_be_added_timeout): None
  • Add requests to the underlying resource client in batches.


    Parameters

    • requests: Sequence[str | Request]
    • batch_size: int = 1000 (keyword-only)
    • wait_time_between_batches: timedelta = timedelta(seconds=1) (keyword-only)
    • wait_for_all_requests_to_be_added: bool = False (keyword-only)
    • wait_for_all_requests_to_be_added_timeout: timedelta | None = None (keyword-only)

    Returns None
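One plausible reading of the batching parameters is that the input is split into chunks of at most `batch_size`, with a pause of `wait_time_between_batches` between consecutive chunks. The sketch below illustrates that reading only. `add_in_batches` and `send_batch` are hypothetical names, and the `wait_for_all_requests_to_be_added` options are not modeled.

```python
import asyncio
from datetime import timedelta
from typing import Awaitable, Callable, Sequence


async def add_in_batches(
    requests: Sequence[str],
    send_batch: Callable[[Sequence[str]], Awaitable[None]],
    *,
    batch_size: int = 1000,
    wait_time_between_batches: timedelta = timedelta(seconds=1),
) -> int:
    """Send requests in chunks of at most batch_size; return the batch count."""
    batches = 0
    for start in range(0, len(requests), batch_size):
        if batches:
            # Pause between consecutive batches, not before the first one.
            await asyncio.sleep(wait_time_between_batches.total_seconds())
        await send_batch(requests[start:start + batch_size])
        batches += 1
    return batches


async def demo() -> list:
    sizes = []

    async def record(batch: Sequence[str]) -> None:
        # Stand-in for the call that would write a batch to storage.
        sizes.append(len(batch))

    urls = [f"https://example.com/{i}" for i in range(2500)]
    await add_in_batches(urls, record, wait_time_between_batches=timedelta(0))
    return sizes
```

With the default `batch_size` of 1000, the 2500 URLs in `demo` are delivered as batches of 1000, 1000, and 500.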

drop

  • async drop(): None
  • Removes the queue either from the Apify Cloud storage or from the local database.


    Returns None

fetch_next_request

  • async fetch_next_request(): Request | None
  • Returns Request | None

get_handled_count

  • async get_handled_count(): int
  • Returns int

get_total_count

  • async get_total_count(): int
  • Returns int

is_empty

  • async is_empty(): bool
  • Returns bool

is_finished

  • async is_finished(): bool
  • Returns bool

mark_request_as_handled

  • async mark_request_as_handled(request): ProcessedRequest | None
  • Marks a request as handled after successful processing (or after giving up on retrying it).


    Parameters

    • request: Request

    Returns ProcessedRequest | None

reclaim_request

  • async reclaim_request(request, *, forefront): ProcessedRequest | None
  • Reclaims a failed request back to the queue so that it can be returned for processing again later.

    It is possible to modify the request data by supplying an updated request as a parameter.


    Parameters

    • request: Request
    • forefront: bool = False (keyword-only)

    Returns ProcessedRequest | None
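The fetch, handle, and reclaim methods are typically combined in a processing loop. The sketch below shows one way that loop might look. `ToyRequest`, `ToyQueue`, `process`, `MAX_RETRIES`, and the `retry_count` bookkeeping are all hypothetical stand-ins chosen for illustration; they are not crawlee's implementation.

```python
import asyncio
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyRequest:
    """Hypothetical stand-in for Request; retry_count is for illustration."""
    url: str
    retry_count: int = 0


class ToyQueue:
    """Hypothetical in-memory queue exposing three methods from this page."""

    def __init__(self, urls) -> None:
        self._pending = deque(ToyRequest(u) for u in urls)
        self.handled = []

    async def fetch_next_request(self) -> Optional[ToyRequest]:
        return self._pending.popleft() if self._pending else None

    async def mark_request_as_handled(self, request: ToyRequest) -> None:
        self.handled.append(request.url)

    async def reclaim_request(self, request: ToyRequest, *, forefront: bool = False) -> None:
        # Put the (possibly modified) request back so it is fetched again later.
        if forefront:
            self._pending.appendleft(request)
        else:
            self._pending.append(request)


MAX_RETRIES = 2  # hypothetical retry limit


async def process(request: ToyRequest) -> None:
    # Simulated handler: "/flaky" fails on its first attempt only.
    if request.url.endswith("/flaky") and request.retry_count == 0:
        raise RuntimeError("temporary failure")
    if request.url.endswith("/broken"):
        raise RuntimeError("permanent failure")


async def crawl(queue: ToyQueue) -> list:
    gave_up = []
    while (request := await queue.fetch_next_request()) is not None:
        try:
            await process(request)
        except RuntimeError:
            if request.retry_count < MAX_RETRIES:
                request.retry_count += 1  # the updated request is passed back
                await queue.reclaim_request(request)
                continue
            gave_up.append(request.url)
        # Reached on success or after giving up on retrying.
        await queue.mark_request_as_handled(request)
    return gave_up
```

In this sketch a request that keeps failing is reclaimed until it reaches `MAX_RETRIES`, then marked as handled so the loop can finish. That matches the "after giving up" case mentioned for `mark_request_as_handled`.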

Properties

name

name: str | None

ID or name of the request queue.