BaseRequestData

Data needed to create a new crawling request.

Hierarchy

BaseRequestData
- Request

Index

Methods

from_url
get_query_param_from_url

Properties

handled_at
headers
loaded_url
method
model_config
no_retry
payload
retry_count
unique_key
url
user_data

Methods

from_url

from_url(*, url, method, headers, payload, label, unique_key, keep_url_fragment, use_extended_unique_key, kwargs): Self

Create a new BaseRequestData instance from a URL. See Request.from_url for more details.
Parameters
- url: str (keyword-only)
- method: HttpMethod = 'GET' (optional, keyword-only)
- headers: (HttpHeaders | dict[str, str]) | None = None (optional, keyword-only)
- payload: (HttpPayload | str) | None = None (optional, keyword-only)
- label: str | None = None (optional, keyword-only)
- unique_key: str | None = None (optional, keyword-only)
- keep_url_fragment: bool = False (optional, keyword-only)
- use_extended_unique_key: bool = False (optional, keyword-only)
- kwargs: Any (optional, keyword-only)
Returns Self

get_query_param_from_url

get_query_param_from_url(*, param, default): str | None

Get the value of a specific query parameter from the URL.
Parameters
- param: str (keyword-only)
- default: str | None = None (optional, keyword-only)
Returns str | None
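
The lookup described above can be approximated with the standard library alone. The following is a minimal sketch, not the library's implementation: `query_param_from_url` is a hypothetical stand-alone helper that takes the URL as an explicit argument, and returning the first value of a repeated parameter is an assumption.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def query_param_from_url(url: str, param: str, default: Optional[str] = None) -> Optional[str]:
    """Return the first value of `param` in the query string of `url`, else `default`.

    Hypothetical helper for illustration; it only mirrors the documented
    signature (param, default) and return type (str | None).
    """
    # parse_qs maps each parameter name to the list of its values.
    values = parse_qs(urlparse(url).query).get(param)
    return values[0] if values else default


page = query_param_from_url("https://example.com/search?q=crawler&page=2", "page")
missing = query_param_from_url("https://example.com/search?q=crawler", "page", default="1")
```

Here `page` is `"2"`, and `missing` falls back to the supplied default `"1"` because the URL carries no `page` parameter.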

Properties

handled_at

handled_at: datetime | None

Timestamp when the request was handled.

headers

headers: HttpHeaders

HTTP request headers.

loaded_url

loaded_url: str | None

URL of the web page that was loaded. This can differ from the original URL in case of redirects.

method

method: HttpMethod

HTTP request method.

model_config

model_config: Undefined

no_retry

no_retry: bool

If set to True, the request will not be retried in case of failure.

payload

payload: HttpPayload | None

HTTP request payload.

retry_count

retry_count: int

Number of times the request has been retried.

unique_key

unique_key: str

A unique key identifying the request. Two requests with the same unique_key are considered to point to the same URL.

If unique_key is not provided, then it is automatically generated by normalizing the URL. For example, the URL of HTTP://www.EXAMPLE.com/something/ will produce the unique_key of http://www.example.com/something.

Pass an arbitrary non-empty text value to the unique_key property to override the default behavior and specify which URLs shall be considered equal.
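
To make the normalization example concrete, here is a rough standard-library sketch that reproduces the documented result for `HTTP://www.EXAMPLE.com/something/`. It is an assumption about the general idea only: `sketch_unique_key` is a hypothetical name, and the library's actual normalization may apply further rules (for instance to query parameters) that are not shown here.

```python
from urllib.parse import urlsplit, urlunsplit


def sketch_unique_key(url: str, keep_url_fragment: bool = False) -> str:
    """Illustrative URL normalization; NOT the library's actual algorithm."""
    parts = urlsplit(url.strip())
    # Scheme and host are case-insensitive, so lowercase both.
    scheme = parts.scheme.lower()
    netloc = parts.netloc.lower()
    # Assumption: a trailing slash does not distinguish two pages.
    path = parts.path.rstrip("/")
    # Assumption: the fragment is dropped unless explicitly kept.
    fragment = parts.fragment if keep_url_fragment else ""
    return urlunsplit((scheme, netloc, path, parts.query, fragment))


key = sketch_unique_key("HTTP://www.EXAMPLE.com/something/")
same = sketch_unique_key("http://www.example.com/something")
```

Both `key` and `same` come out as `http://www.example.com/something`, which is why two such requests would be treated as duplicates unless an explicit unique_key is passed.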

url

url: str

The URL of the web page to crawl. Must be a valid HTTP or HTTPS URL, and may include query parameters and fragments.

user_data

user_data: dict[str, JsonSerializable]

Custom user data assigned to the request. Use this to store any request-related data in the request's scope, so that it remains accessible on retries, failures, etc.
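
Because the values are typed as JsonSerializable, a quick way to check a candidate dictionary is to round-trip it through the standard `json` module. This is only a sketch: the keys `label`, `depth`, `tags`, and `parent` are made-up examples, and `is_json_serializable` is a hypothetical helper, not part of the library.

```python
import json
from typing import Any


def is_json_serializable(value: Any) -> bool:
    """Return True if `value` can be encoded by the standard json module."""
    try:
        json.dumps(value)
    except (TypeError, ValueError):
        return False
    return True


# Plain dicts, lists, strings, numbers, booleans and None survive a round trip.
user_data = {"label": "DETAIL", "depth": 2, "tags": ["new", "sale"], "parent": None}
restored = json.loads(json.dumps(user_data))

# A set has no JSON representation, so it would not qualify.
bad_user_data = {"seen": {1, 2, 3}}
```

In this sketch `restored` equals `user_data`, while `is_json_serializable(bad_user_data)` is `False`.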
