RequestLoader

Abstract base class defining the interface for classes that provide access to a read-only stream of requests.

Request loaders are used to manage and provide access to a storage of crawling requests.

Key responsibilities:

  • Fetching the next request to be processed.
  • Marking requests as successfully handled after processing.
  • Managing state information such as the total and handled request counts.
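The responsibilities above can be illustrated with a minimal sketch. `InMemoryLoader` and `drain` are hypothetical names introduced only for this example: the class mirrors the method names documented on this page but does not subclass the real base class, and it uses plain URL strings where a real loader would return `Request` objects.

```python
from __future__ import annotations

import asyncio
from collections import deque


class InMemoryLoader:
    """Hypothetical stand-in that mirrors the RequestLoader method names.

    Assumption: plain URL strings replace Request objects, and nothing
    here comes from the real library.
    """

    def __init__(self, urls: list[str]) -> None:
        self._pending = deque(urls)  # not yet fetched
        self._in_progress: set[str] = set()  # fetched but not yet handled
        self._handled_count = 0

    async def fetch_next_request(self) -> str | None:
        # Return None once there is nothing left to hand out.
        if not self._pending:
            return None
        url = self._pending.popleft()
        self._in_progress.add(url)
        return url

    async def mark_request_as_handled(self, request: str) -> None:
        self._in_progress.discard(request)
        self._handled_count += 1

    async def get_handled_count(self) -> int:
        return self._handled_count

    async def get_total_count(self) -> int:
        # Everything not yet handled plus everything handled.
        return len(self._pending) + len(self._in_progress) + self._handled_count


async def drain(loader: InMemoryLoader) -> list[str]:
    """Fetch until the loader returns None, marking each request as handled."""
    seen = []
    while (request := await loader.fetch_next_request()) is not None:
        seen.append(request)  # real processing would happen here
        await loader.mark_request_as_handled(request)
    return seen


async def main() -> tuple[list[str], int, int]:
    loader = InMemoryLoader(["https://example.com/a", "https://example.com/b"])
    seen = await drain(loader)
    return seen, await loader.get_handled_count(), await loader.get_total_count()


seen, handled, total = asyncio.run(main())
print(seen, handled, total)
```

The consumer only fetches and marks; it never adds requests, which matches the read-only nature of the interface.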

Methods

fetch_next_request

  • async fetch_next_request(): Request | None
  • Returns the next request to be processed, or None if there are no more pending requests.


    Returns Request | None

get_handled_count

  • async get_handled_count(): int
  • Returns the number of handled requests.


    Returns int

get_total_count

  • async get_total_count(): int
  • Returns an offline approximation of the total number of requests in the source (i.e. pending + handled).


    Returns int

is_empty

  • async is_empty(): bool
  • Returns True if there are no more requests left to fetch from the source (requests that were already fetched might still be unfinished).


    Returns bool

is_finished

  • async is_finished(): bool
  • Returns True if all requests have been handled.


    Returns bool
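The difference between `is_empty` and `is_finished` is easiest to see while a request is in flight. The following is a sketch under assumptions: `TwoStateLoader` is a hypothetical stand-in (not the real base class) that tracks requests as plain strings, and its bookkeeping is one plausible reading of the descriptions above.

```python
from __future__ import annotations

import asyncio


class TwoStateLoader:
    """Hypothetical stand-in; it only mirrors the documented method names."""

    def __init__(self, urls: list[str]) -> None:
        self._pending = list(urls)  # not yet fetched
        self._in_progress: set[str] = set()  # fetched but not yet handled

    async def fetch_next_request(self) -> str | None:
        if not self._pending:
            return None
        request = self._pending.pop(0)
        self._in_progress.add(request)
        return request

    async def mark_request_as_handled(self, request: str) -> None:
        self._in_progress.discard(request)

    async def is_empty(self) -> bool:
        # Nothing left to fetch; fetched requests may still be in progress.
        return not self._pending

    async def is_finished(self) -> bool:
        # Nothing left to fetch and nothing still in progress.
        return not self._pending and not self._in_progress


async def main() -> list[tuple[bool, bool]]:
    loader = TwoStateLoader(["https://example.com/only"])
    states = []

    # Before fetching: one request is still pending.
    states.append((await loader.is_empty(), await loader.is_finished()))

    # After fetching but before handling: empty, yet not finished.
    request = await loader.fetch_next_request()
    states.append((await loader.is_empty(), await loader.is_finished()))

    # After handling: both empty and finished.
    await loader.mark_request_as_handled(request)
    states.append((await loader.is_empty(), await loader.is_finished()))
    return states


states = asyncio.run(main())
print(states)
```

Each tuple is `(is_empty, is_finished)`; the middle state is the one a consumer should watch for before deciding that all work is done.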

mark_request_as_handled

  • async mark_request_as_handled(request): ProcessedRequest | None
  • Marks a request as handled after successful processing (or after giving up on retries).


    Parameters

    • request: Request (optional, keyword-only)

    Returns ProcessedRequest | None

to_tandem

  • async to_tandem(request_manager): RequestManagerTandem
  • Combines the loader with a request manager to support adding and reclaiming requests.


    Parameters

    • request_manager: RequestManager | None = None (optional, keyword-only)

      Request manager to combine the loader with. If None is given, the default request queue is used.

    Returns RequestManagerTandem