RequestList

Represents a (potentially very large) list of URLs to crawl.

Methods

__init__

  • __init__(requests, name, persist_state_key, persist_requests_key): None
  • Initialize a new instance.


    Parameters

    • optional requests: (Iterable[str | Request] | AsyncIterable[str | Request]) | None = None

      The request objects (or their string representations) to be added to the request list.

    • optional name: str | None = None

      The name of the request list.

    • optional persist_state_key: str | None = None

      A key for persisting the progress information of the RequestList. If you pass a name but no key, the key is derived from the name. If you pass neither, state is not persisted.

    • optional persist_requests_key: str | None = None

      A key for persisting the request data loaded from the requests iterator. If specified, the request data is stored in the KeyValueStore so that it does not change over time. This is useful if the requests iterator pulls the data dynamically.

    Returns None
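
The fallback rule for persist_state_key can be read as a small decision function. The sketch below is a stand-alone illustration, not the library's code: `resolve_state_key` is a hypothetical helper, and the `request-list-state-` prefix is an assumed format, since this reference does not say how the key is derived from the name.

```python
from __future__ import annotations


def resolve_state_key(name: str | None, persist_state_key: str | None) -> str | None:
    """Hypothetical helper mirroring the documented fallback for the state key."""
    if persist_state_key is not None:
        # An explicit key always wins.
        return persist_state_key
    if name is not None:
        # Assumed derivation format; the real derived key may look different.
        return f"request-list-state-{name}"
    # Neither a key nor a name was given: progress is not persisted.
    return None
```

Under these assumptions, passing only a name is enough to get persisted progress, while an unnamed list without a key keeps its state in memory only.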

fetch_next_request

  • async fetch_next_request(): Request | None

get_handled_count

  • async get_handled_count(): int

get_total_count

  • async get_total_count(): int
  • Get an offline approximation of the total number of requests in the loader (i.e. pending + handled).


    Returns int

is_empty

  • async is_empty(): bool
  • Return True if there are no more requests in the loader (there might still be unfinished requests).


    Returns bool
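
The difference between "no more requests" and "unfinished requests" is easier to see with a toy model. The sketch below uses a simplified in-memory stand-in, `ToyLoader`, which is hypothetical and not the real class; it assumes that a request counts as unfinished between being fetched and being marked as handled, and that is_finished reports True only once nothing is pending or in flight.

```python
import asyncio
from collections import deque


class ToyLoader:
    """Simplified in-memory stand-in for a request loader (not the real class)."""

    def __init__(self, urls):
        self._pending = deque(urls)
        self._in_progress = set()
        self._handled = 0

    async def fetch_next_request(self):
        # Return None once there is nothing left to hand out.
        if not self._pending:
            return None
        url = self._pending.popleft()
        self._in_progress.add(url)
        return url

    async def mark_request_as_handled(self, url):
        self._in_progress.discard(url)
        self._handled += 1

    async def get_handled_count(self):
        return self._handled

    async def is_empty(self):
        # True as soon as every request has been handed out.
        return not self._pending

    async def is_finished(self):
        # Assumption: finished means nothing pending and nothing in flight.
        return not self._pending and not self._in_progress


async def demo():
    loader = ToyLoader(["https://example.com/a", "https://example.com/b"])
    first = await loader.fetch_next_request()
    second = await loader.fetch_next_request()
    # Both requests were handed out, but neither is handled yet.
    snapshot = (await loader.is_empty(), await loader.is_finished())
    await loader.mark_request_as_handled(first)
    await loader.mark_request_as_handled(second)
    return snapshot, await loader.is_finished(), await loader.get_handled_count()


result = asyncio.run(demo())
```

In this model a consumer that stops as soon as is_empty returns True could exit while requests are still being processed, which is why the two checks are kept separate.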

is_finished

  • async is_finished(): bool

mark_request_as_handled

to_tandem

  • Combine the loader with a request manager to support adding and reclaiming requests.


    Parameters

    • optional request_manager: RequestManager | None = None

      Request manager to combine the loader with. If None is given, the default request queue is used.

    Returns RequestManagerTandem
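
Conceptually, a tandem pairs a read-only source of requests with a writable queue. The sketch below is a toy model of that idea, not the behavior of RequestManagerTandem: `ToyTandem` is hypothetical, and both the fetch order (loader first, then manager) and the retry placement are assumptions made for illustration.

```python
import asyncio
from collections import deque


class ToyTandem:
    """Conceptual stand-in: a read-only source plus a writable queue."""

    def __init__(self, loader_urls, manager_urls=()):
        self._loader = deque(loader_urls)    # read-only source (the loader)
        self._manager = deque(manager_urls)  # writable queue (the manager)

    async def add_request(self, url):
        # New requests can only go to the manager side.
        self._manager.append(url)

    async def reclaim_request(self, url):
        # Assumed: a failed request goes back to the front of the manager queue.
        self._manager.appendleft(url)

    async def fetch_next_request(self):
        # Assumed ordering: drain the loader first, then the manager.
        if self._loader:
            return self._loader.popleft()
        if self._manager:
            return self._manager.popleft()
        return None


async def demo():
    tandem = ToyTandem(["https://example.com/1"])
    await tandem.add_request("https://example.com/2")
    order = []
    while (url := await tandem.fetch_next_request()) is not None:
        order.append(url)
    return order


order = asyncio.run(demo())
```

The point of the pairing is that the list itself stays immutable, while requests discovered during the crawl and requests that need a retry have somewhere to go.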

Properties

name

name: str | None