RequestList

Represents a (potentially very large) list of URLs to crawl.

Hierarchy

RequestLoader
- RequestList

Methods

__init__

__init__(requests, name, persist_state_key, persist_requests_key): None

Initialize a new instance.
Parameters
- optional requests: (Iterable[str | Request] | AsyncIterable[str | Request]) | None = None
  The request objects (or their string representations) to add to the request list.
- optional name: str | None = None
  The name of the request list.
- optional persist_state_key: str | None = None
  A key for persisting the progress information of the RequestList. If you pass a name but no key, a key is derived from the name. If you pass neither, state is not persisted.
- optional persist_requests_key: str | None = None
  A key for persisting the request data loaded from the requests iterator. If specified, the request data is stored in the KeyValueStore so that it does not change over time. This is useful if the requests iterator pulls the data dynamically.
Returns None
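
The fallback order described for persist_state_key can be sketched as a small pure function. This is only an illustration of the rule stated above, not the library's code: the function name resolve_state_key is hypothetical, and the format of the derived key is not given on this page, so the format used here is an invented placeholder.

```python
from __future__ import annotations


def resolve_state_key(name: str | None, persist_state_key: str | None) -> str | None:
    """Illustrate the fallback order described for persist_state_key."""
    if persist_state_key is not None:
        # An explicit key always wins.
        return persist_state_key
    if name is not None:
        # ASSUMPTION: the derived format below is invented for illustration only.
        return f"request-list-state-{name}"
    # Neither a key nor a name was given: progress is not persisted.
    return None
```

In this model, passing only a name is enough to opt in to state persistence, while an unnamed list without an explicit key keeps its progress in memory only.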

fetch_next_request

async fetch_next_request(): Request | None

Overrides RequestLoader.fetch_next_request
Return the next request to be processed, or None if there are no more pending requests.
Returns Request | None

get_handled_count

async get_handled_count(): int

Overrides RequestLoader.get_handled_count
Get the number of requests in the loader that have been handled.
Returns int

get_total_count

async get_total_count(): int

Overrides RequestLoader.get_total_count
Get an offline approximation of the total number of requests in the loader (i.e. pending + handled).
Returns int

is_empty

async is_empty(): bool

Overrides RequestLoader.is_empty
Return True if there are no more requests to fetch from the loader (some fetched requests might still be unfinished).
Returns bool

is_finished

async is_finished(): bool

Overrides RequestLoader.is_finished
Return True if all requests have been handled.
Returns bool

mark_request_as_handled

async mark_request_as_handled(request): ProcessedRequest | None

Overrides RequestLoader.mark_request_as_handled
Mark a request as handled after successful processing (or after giving up on retries).
Parameters
- request: Request
Returns ProcessedRequest | None
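
The methods above form a fetch, process, mark cycle, and the difference between is_empty and is_finished only shows up while a fetched request has not yet been marked as handled. The sketch below models that contract with a hypothetical in-memory stand-in (InMemoryLoader and drain are illustrative names, plain URL strings stand in for Request objects, and none of this is the library's implementation).

```python
from __future__ import annotations

import asyncio


class InMemoryLoader:
    """Hypothetical stand-in mirroring the method names documented above."""

    def __init__(self, urls: list[str]) -> None:
        self._pending = list(urls)            # not fetched yet
        self._in_progress: set[str] = set()   # fetched, not yet marked handled
        self._handled = 0

    async def fetch_next_request(self) -> str | None:
        if not self._pending:
            return None
        url = self._pending.pop(0)
        self._in_progress.add(url)
        return url

    async def mark_request_as_handled(self, request: str) -> None:
        self._in_progress.discard(request)
        self._handled += 1

    async def is_empty(self) -> bool:
        # Nothing left to fetch, but fetched requests may still be unfinished.
        return not self._pending

    async def is_finished(self) -> bool:
        # Nothing left to fetch and nothing still being processed.
        return not self._pending and not self._in_progress

    async def get_handled_count(self) -> int:
        return self._handled

    async def get_total_count(self) -> int:
        return len(self._pending) + len(self._in_progress) + self._handled


async def drain(loader: InMemoryLoader) -> list[str]:
    """Fetch every request, 'process' it, and mark it as handled."""
    processed: list[str] = []
    while (request := await loader.fetch_next_request()) is not None:
        processed.append(request)  # real processing would happen here
        await loader.mark_request_as_handled(request)
    return processed


if __name__ == "__main__":
    print(asyncio.run(drain(InMemoryLoader(["https://example.com/a", "https://example.com/b"]))))
```

In this model, fetching the last request makes is_empty return True immediately, while is_finished stays False until mark_request_as_handled has been called for every fetched request.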

to_tandem

async to_tandem(request_manager): RequestManagerTandem

Inherited from RequestLoader.to_tandem
Combine the loader with a request manager to support adding and reclaiming requests.
Parameters
- optional request_manager: RequestManager | None = None
  Request manager to combine the loader with. If None is given, the default request queue is used.
Returns RequestManagerTandem
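
To illustrate why a loader is paired with a request manager, the following conceptual model combines a read-only source with a read-write queue. It is a sketch under assumptions, not how RequestManagerTandem is implemented: the classes ReadOnlyLoader, SimpleManager, and Tandem are hypothetical, the method names add_request and reclaim_request are not documented on this page and are assumed here, and plain strings stand in for Request objects.

```python
from __future__ import annotations

import asyncio
from collections import deque


class ReadOnlyLoader:
    """Hypothetical loader: requests can be fetched, not added or reclaimed."""

    def __init__(self, urls: list[str]) -> None:
        self._pending = deque(urls)

    async def fetch_next_request(self) -> str | None:
        return self._pending.popleft() if self._pending else None


class SimpleManager:
    """Hypothetical read-write manager supporting add and reclaim."""

    def __init__(self) -> None:
        self._queue: deque[str] = deque()

    async def add_request(self, request: str) -> None:
        self._queue.append(request)

    async def reclaim_request(self, request: str) -> None:
        # Put a failed request back at the front so it is retried next.
        self._queue.appendleft(request)

    async def fetch_next_request(self) -> str | None:
        return self._queue.popleft() if self._queue else None


class Tandem:
    """Conceptual model: read from the loader first, write only to the manager."""

    def __init__(self, loader: ReadOnlyLoader, manager: SimpleManager) -> None:
        self._loader = loader
        self._manager = manager

    async def fetch_next_request(self) -> str | None:
        request = await self._loader.fetch_next_request()
        if request is not None:
            return request
        return await self._manager.fetch_next_request()

    async def add_request(self, request: str) -> None:
        await self._manager.add_request(request)

    async def reclaim_request(self, request: str) -> None:
        await self._manager.reclaim_request(request)


async def demo() -> list[str | None]:
    tandem = Tandem(ReadOnlyLoader(["a"]), SimpleManager())
    await tandem.add_request("b")          # new requests go to the manager
    first = await tandem.fetch_next_request()
    await tandem.reclaim_request(first)    # a failed request is handed to the manager
    rest = [await tandem.fetch_next_request() for _ in range(3)]
    return [first, *rest]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The point of the design in this model is that the loader itself stays read-only, while everything that requires writing (newly discovered requests, retries) is delegated to the manager.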

Properties

name

name: str | None
