BaseRequestData

Data needed to create a new crawling request.

Methods

from_url

  • from_url(*, url, method, headers, payload, label, unique_key, keep_url_fragment, use_extended_unique_key, kwargs): Self
  • Create a new BaseRequestData instance from a URL. See Request.from_url for more details.


    Parameters

    • url: str (optional, keyword-only)
    • method: HttpMethod = 'GET' (optional, keyword-only)
    • headers: (HttpHeaders | dict[str, str]) | None = None (optional, keyword-only)
    • payload: (HttpPayload | str) | None = None (optional, keyword-only)
    • label: str | None = None (optional, keyword-only)
    • unique_key: str | None = None (optional, keyword-only)
    • keep_url_fragment: bool = False (optional, keyword-only)
    • use_extended_unique_key: bool = False (optional, keyword-only)
    • kwargs: Any (optional, keyword-only)

    Returns Self

get_query_param_from_url

  • get_query_param_from_url(*, param, default): str | None
  • Get the value of a specific query parameter from the URL.


    Parameters

    • param: str (optional, keyword-only)
    • default: str | None = None (optional, keyword-only)

    Returns str | None
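
    The lookup described above can be approximated with the standard library alone. The following is a minimal sketch, not the library's implementation: the helper name get_query_param is hypothetical, and returning the first value when a parameter repeats is an assumption.

```python
from __future__ import annotations

from urllib.parse import parse_qs, urlsplit


def get_query_param(url: str, *, param: str, default: str | None = None) -> str | None:
    """Return the first value of `param` in the URL's query string, or `default`.

    Hypothetical stand-in that mirrors the documented keyword-only signature.
    """
    # parse_qs maps each parameter name to a list of its values.
    query = parse_qs(urlsplit(url).query)
    values = query.get(param)
    # Assumption: when the parameter repeats, the first value wins.
    return values[0] if values else default


page = get_query_param('https://example.com/search?q=crawlee&page=2', param='page')
missing = get_query_param('https://example.com/search?q=crawlee', param='page', default='1')
```

    Because both arguments are keyword-only, a call such as get_query_param(url, 'page') raises a TypeError in this sketch.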

Properties

handled_at

handled_at: datetime | None

Timestamp when the request was handled.

headers

headers: HttpHeaders

HTTP request headers.

loaded_url

loaded_url: str | None

URL of the web page that was loaded. This can differ from the original URL in case of redirects.

method

method: HttpMethod

HTTP request method.

model_config

model_config: Undefined

no_retry

no_retry: bool

If set to True, the request will not be retried in case of failure.

payload

payload: HttpPayload | None

HTTP request payload.

retry_count

retry_count: int

Number of times the request has been retried.

unique_key

unique_key: str

A unique key identifying the request. Two requests with the same unique_key are considered to point to the same URL.

If unique_key is not provided, then it is automatically generated by normalizing the URL. For example, the URL of HTTP://www.EXAMPLE.com/something/ will produce the unique_key of http://www.example.com/something.

Pass an arbitrary non-empty text value to the unique_key property to override the default behavior and specify which URLs shall be considered equal.
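
The automatic key generation described above can be illustrated with a small standard-library sketch. It reproduces only the documented example (lowercased scheme and host, trailing slash removed). The function name normalize_url_sketch and its keep_fragment flag are hypothetical, and the library's actual normalization rules may cover more cases.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url_sketch(url: str, keep_fragment: bool = False) -> str:
    """Illustrative URL normalization; not the library's actual algorithm."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    # Assumption: the whole network location is lowercased (any userinfo included).
    netloc = parts.netloc.lower()
    # Drop trailing slashes so '/something/' and '/something' compare equal.
    path = parts.path.rstrip('/')
    # Assumption: the fragment is dropped unless explicitly kept.
    fragment = parts.fragment if keep_fragment else ''
    return urlunsplit((scheme, netloc, path, parts.query, fragment))


key = normalize_url_sketch('HTTP://www.EXAMPLE.com/something/')
```

Under these assumptions, two URLs that differ only in letter case of the host or in a trailing slash yield the same key, which is why they would be treated as the same request.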

url

url: str

The URL of the web page to crawl. Must be a valid HTTP or HTTPS URL, and may include query parameters and fragments.

user_data

user_data: dict[str, JsonSerializable]

Custom user data assigned to the request. Use this to store any request-related data in the request's scope, keeping it accessible on retries, failures, etc.
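
Since the values are typed as JsonSerializable, it can help to check a dictionary before attaching it to a request. The sketch below assumes that JsonSerializable means "encodable by the standard json module"; the helper is_json_serializable and the sample keys are hypothetical.

```python
import json
from datetime import datetime, timezone


def is_json_serializable(value: object) -> bool:
    """Return True if `value` can be encoded by the standard json module."""
    try:
        json.dumps(value)
    except (TypeError, ValueError):
        return False
    return True


# Plain strings, numbers, lists, and nested dicts encode without trouble.
good_user_data = {'category': 'books', 'page': 3, 'tags': ['fiction', 'new']}

# A datetime object is not JSON-encodable; store an ISO string instead.
bad_user_data = {'seen_at': datetime.now(timezone.utc)}
fixed_user_data = {'seen_at': bad_user_data['seen_at'].isoformat()}
```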