RequestHandlerRunResult

Record of calls to storage-related context helpers.

Index

Methods

__init__

  • __init__(*, key_value_store_getter): None

add_requests

  • async add_requests(requests, *, limit, base_url, strategy, include, exclude): None
  • Track a call to the add_requests context helper.


    Parameters

    • requests: Sequence[str | Request]
    • limit: int (keyword-only, optional)

      Maximum number of requests to be enqueued.

    • base_url: str (keyword-only, optional)

      Base URL to be used for relative URLs.

    • strategy: Literal['all', 'same-domain', 'same-hostname', 'same-origin'] (keyword-only, optional)

      Enqueue strategy used to determine which links to extract and enqueue.

      Options:

      • all: Enqueue every link encountered, regardless of the target domain. Use this option to follow all links, including those leading to external websites.
      • same-domain: Enqueue links that share the same domain name as the current page, including any subdomains. This strategy suits crawling within one domain while still allowing subdomain exploration.
      • same-hostname: Enqueue links only if they match the exact hostname of the current page. This is the default behavior; it restricts the crawl to the current hostname and excludes subdomains.
      • same-origin: Enqueue links that share the same origin as the current page. The origin is the combination of protocol, domain, and port, which gives the crawl a strict scope.

    • include: list[re.Pattern | Glob] (keyword-only, optional)

      List of regular expressions or globs that URLs must match to be enqueued.

    • exclude: list[re.Pattern | Glob] (keyword-only, optional)

      List of regular expressions or globs that URLs must not match to be enqueued.

    Returns None
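
    The strategy, include, exclude, and limit parameters described above can be illustrated with a small standalone sketch. This is not the library's implementation: `matches_strategy`, `select_urls`, and the `current_url` argument are hypothetical names, patterns are applied with `re.search` as an assumption, globs are left out, and the "same-domain" check naively treats the last two hostname labels as the domain (a real implementation would likely consult the Public Suffix List).

    ```python
    from __future__ import annotations

    import re
    from urllib.parse import urljoin, urlsplit

    _DEFAULT_PORTS = {"http": 80, "https": 443}


    def _origin(url: str) -> tuple[str, str, int | None]:
        """Return (scheme, hostname, port), filling in the scheme's default port."""
        parts = urlsplit(url)
        port = parts.port or _DEFAULT_PORTS.get(parts.scheme)
        return (parts.scheme, parts.hostname or "", port)


    def _naive_domain(hostname: str) -> str:
        # ASSUMPTION: the last two labels form the domain. This is wrong for
        # suffixes such as "co.uk" and is only good enough for illustration.
        return ".".join(hostname.split(".")[-2:])


    def matches_strategy(target: str, current_url: str, strategy: str) -> bool:
        """Hypothetical helper: does `target` fall within `strategy` for `current_url`?"""
        if strategy == "all":
            return True
        target_host = urlsplit(target).hostname or ""
        current_host = urlsplit(current_url).hostname or ""
        if strategy == "same-hostname":
            return target_host == current_host
        if strategy == "same-domain":
            return _naive_domain(target_host) == _naive_domain(current_host)
        if strategy == "same-origin":
            return _origin(target) == _origin(current_url)
        raise ValueError(f"Unknown strategy: {strategy!r}")


    def select_urls(
        urls: list[str],
        *,
        current_url: str,
        strategy: str = "same-hostname",
        include: list[re.Pattern] | None = None,
        exclude: list[re.Pattern] | None = None,
        limit: int | None = None,
    ) -> list[str]:
        """Hypothetical helper: resolve, filter, and cap a list of candidate URLs."""
        selected: list[str] = []
        for raw in urls:
            url = urljoin(current_url, raw)  # resolve relative URLs
            if not matches_strategy(url, current_url, strategy):
                continue
            if include and not any(p.search(url) for p in include):
                continue
            if exclude and any(p.search(url) for p in exclude):
                continue
            selected.append(url)
            if limit is not None and len(selected) >= limit:
                break
        return selected
    ```

    Under these assumptions, calling `select_urls(["/a", "https://blog.example.com/b"], current_url="https://example.com/docs/", strategy="same-domain")` keeps both links, while `strategy="same-hostname"` keeps only the first.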

get_key_value_store

  • Parameters

    • id: str | None = None (keyword-only, optional)
    • name: str | None = None (keyword-only, optional)

    Returns KeyValueStoreInterface

push_data

  • async push_data(data, dataset_id, dataset_name): None
  • Track a call to the push_data context helper.


    Parameters

    • data: JsonSerializable
    • dataset_id: str | None = None (optional)
    • dataset_name: str | None = None (optional)

    Returns None
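
    One way to read "track a call" is that the helper stores its arguments for later use instead of writing to a dataset immediately. The sketch below illustrates that idea with a standalone recorder. `CallRecorder`, `PushDataCall`, and `handler` are hypothetical names invented for this example and do not reflect how RequestHandlerRunResult is actually implemented.

    ```python
    from __future__ import annotations

    import asyncio
    from dataclasses import dataclass, field
    from typing import Any


    @dataclass
    class PushDataCall:
        """Arguments of one recorded push_data call (hypothetical structure)."""

        data: Any
        dataset_id: str | None = None
        dataset_name: str | None = None


    @dataclass
    class CallRecorder:
        """Hypothetical stand-in that records push_data calls without storing data."""

        push_data_calls: list[PushDataCall] = field(default_factory=list)

        async def push_data(
            self,
            data: Any,
            dataset_id: str | None = None,
            dataset_name: str | None = None,
        ) -> None:
            # Only remember the arguments; nothing is written anywhere.
            self.push_data_calls.append(PushDataCall(data, dataset_id, dataset_name))


    async def handler(push_data) -> None:
        """Example request handler body that uses the helper it is given."""
        await push_data({"url": "https://example.com", "title": "Example"})
        await push_data([{"n": 1}, {"n": 2}], dataset_name="numbers")


    recorder = CallRecorder()
    asyncio.run(handler(recorder.push_data))

    # The recorded calls can now be inspected, replayed, or discarded.
    for call in recorder.push_data_calls:
        print(call.dataset_name, call.data)
    ```

    Recording calls this way would let a caller decide after the handler finishes whether to commit the collected operations, for example only when the handler completed without raising.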