RequestList
Hierarchy
- RequestLoader
- RequestList
Index
Methods
__init__
Initialize a new instance.
Parameters
optional requests: (Iterable[str | Request] | AsyncIterable[str | Request]) | None = None
The request objects (or their string representations) to be added to the provider.
optional name: str | None = None
A name of the request list.
optional persist_state_key: str | None = None
A key for persisting the progress information of the RequestList. If you do not pass a key but pass a name, a key will be derived using the name. Otherwise, state will not be persisted.
optional persist_requests_key: str | None = None
A key for persisting the request data loaded from the requests iterator. If specified, the request data will be stored in the KeyValueStore to make sure that they don't change over time. This is useful if the requests iterator pulls the data dynamically.
Returns None
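The precedence between name and persist_state_key described above can be sketched as a small standalone function. This is only an illustration of the three cases, not the library's code: resolve_persist_state_key is a hypothetical helper, and the f'{name}-state' pattern is a placeholder, since the format of the derived key is not stated on this page.

```python
from typing import Optional


def resolve_persist_state_key(
    name: Optional[str],
    persist_state_key: Optional[str],
) -> Optional[str]:
    """Hypothetical helper mirroring the documented precedence rules."""
    # Case 1: an explicit key is used as given.
    if persist_state_key is not None:
        return persist_state_key
    # Case 2: no key, but a name is available, so a key is derived from it.
    # ASSUMPTION: the real derived format is not documented here; this
    # suffix is a placeholder for illustration only.
    if name is not None:
        return f'{name}-state'
    # Case 3: neither is given, so state is not persisted.
    return None
```

Under these assumptions, passing neither argument yields None, which corresponds to the "state will not be persisted" case.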
fetch_next_request
Return the next request to be processed, or None if there are no more pending requests.
Returns Request | None
get_handled_count
Get the number of requests in the loader that have been handled.
Returns int
get_total_count
Get an offline approximation of the total number of requests in the loader (i.e. pending + handled).
Returns int
is_empty
Return True if there are no more requests in the loader (there might still be unfinished requests).
Returns bool
is_finished
Return True if all requests have been handled.
Returns bool
mark_request_as_handled
Mark a request as handled after it has been processed successfully (or after giving up on retrying it).
Parameters
request: Request
Returns ProcessedRequest | None
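The fetch, process, mark-as-handled cycle, and the difference between is_empty and is_finished, can be sketched with a toy in-memory stand-in. ToyRequestList below is hypothetical and is not the library class: it only reuses the method names documented above, assumes they are coroutines, and uses plain URL strings in place of Request objects.

```python
import asyncio
from collections import deque


class ToyRequestList:
    """Hypothetical in-memory stand-in reusing the documented method names."""

    def __init__(self, requests=None):
        self._pending = deque(requests or [])  # not yet fetched
        self._in_progress = set()              # fetched, not yet handled
        self._handled = 0

    async def fetch_next_request(self):
        # None signals that there are no more pending requests.
        if not self._pending:
            return None
        request = self._pending.popleft()
        self._in_progress.add(request)
        return request

    async def mark_request_as_handled(self, request):
        # Only requests that were actually fetched count as handled.
        if request in self._in_progress:
            self._in_progress.remove(request)
            self._handled += 1

    async def is_empty(self):
        # Nothing left to fetch; fetched requests may still be unfinished.
        return not self._pending

    async def is_finished(self):
        # Nothing left to fetch and nothing still being processed.
        return not self._pending and not self._in_progress

    async def get_handled_count(self):
        return self._handled

    async def get_total_count(self):
        return len(self._pending) + len(self._in_progress) + self._handled


async def drain(loader):
    """Fetch until None is returned, marking each request as handled."""
    seen = []
    while (request := await loader.fetch_next_request()) is not None:
        seen.append(request)  # a real crawler would process the request here
        await loader.mark_request_as_handled(request)
    return seen


async def demo():
    loader = ToyRequestList(['https://example.com/a', 'https://example.com/b'])
    first = await loader.fetch_next_request()
    second = await loader.fetch_next_request()
    # Both requests are fetched but neither has been marked as handled yet.
    state_mid = (await loader.is_empty(), await loader.is_finished())
    await loader.mark_request_as_handled(first)
    await loader.mark_request_as_handled(second)
    state_end = (await loader.is_empty(), await loader.is_finished())
    return state_mid, state_end
```

In this sketch, the loader reports is_empty as soon as every request has been fetched, but is_finished only once each fetched request has also been marked as handled.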
to_tandem
Combine the loader with a request manager to support adding and reclaiming requests.
Parameters
optional request_manager: RequestManager | None = None
Request manager to combine the loader with. If None is given, the default request queue is used.
Returns RequestManagerTandem
Represents a (potentially very large) list of URLs to crawl.