
BaseRequestData

crawlee._request.BaseRequestData

Data needed to create a new crawling request.

Index

Methods

from_url

  • from_url(url, *, method, payload, label, unique_key, id, keep_url_fragment, use_extended_unique_key, kwargs): Self
  • Create a new BaseRequestData instance from a URL. See Request.from_url for more details.


    Parameters

    • url: str
    • method: HttpMethod = 'GET' (keyword-only)
    • payload: HttpPayload | None = None (keyword-only)
    • label: str | None = None (keyword-only)
    • unique_key: str | None = None (keyword-only)
    • id: str | None = None (keyword-only)
    • keep_url_fragment: bool = False (keyword-only)
    • use_extended_unique_key: bool = False (keyword-only)
    • kwargs: Any

    Returns Self

get_query_param_from_url

  • get_query_param_from_url(param, *, default): str | None
  • Get the value of a specific query parameter from the URL.


    Parameters

    • param: str
    • default: str | None = None (keyword-only)

    Returns str | None
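    The lookup described above can be approximated with the standard library alone. The following is a minimal sketch, not the library's implementation: the standalone function name get_query_param and the choice to return the first value when a parameter repeats are assumptions made for illustration.

```python
from __future__ import annotations

from urllib.parse import parse_qs, urlparse


def get_query_param(url: str, param: str, *, default: str | None = None) -> str | None:
    """Return the first value of `param` in the URL's query string, else `default`.

    Hypothetical stand-in for BaseRequestData.get_query_param_from_url;
    the real method may treat repeated or blank parameters differently.
    """
    # parse_qs maps each parameter name to a list of its values.
    query = parse_qs(urlparse(url).query)
    values = query.get(param)
    return values[0] if values else default


url = "https://example.com/search?q=crawlee&page=2"
page = get_query_param(url, "page")
missing = get_query_param(url, "sort", default="asc")
```

    As in the signature above, default is keyword-only, so a call site always states explicitly which value is the fallback.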

Properties

data

data: Annotated[dict[str, Any], Field(default_factory=dict)]

handled_at

handled_at: Annotated[datetime | None, Field(alias='handledAt')]

headers

headers: Annotated[HttpHeaders, Field(default_factory=HttpHeaders)]

HTTP request headers.

loaded_url

loaded_url: Annotated[str | None, BeforeValidator(validate_http_url), Field(alias='loadedUrl')]

method

method: HttpMethod

HTTP request method.

model_config

model_config:

no_retry

no_retry: Annotated[bool, Field(alias='noRetry')]

payload

payload: HttpPayload | None

query_params

query_params: Annotated[HttpQueryParams, Field(alias='queryParams', default_factory=dict)]

URL query parameters.

retry_count

retry_count: Annotated[int, Field(alias='retryCount')]

unique_key

unique_key: Annotated[str, Field(alias='uniqueKey')]

A unique key identifying the request. Two requests with the same unique_key are treated as pointing to the same URL.

If unique_key is not provided, then it is automatically generated by normalizing the URL. For example, the URL of HTTP://www.EXAMPLE.com/something/ will produce the unique_key of http://www.example.com/something.

Pass an arbitrary non-empty text value to the unique_key property to override the default behavior and specify which URLs shall be considered equal.
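The normalization in the example above can be illustrated with a rough standard-library sketch. It is not the algorithm Crawlee uses: the function name sketch_unique_key is hypothetical, and the only steps assumed here are lowercasing the scheme and host, stripping trailing slashes from the path, and dropping the fragment. The library may apply further rules.

```python
from urllib.parse import urlsplit, urlunsplit


def sketch_unique_key(url: str) -> str:
    """Approximate a normalized key for `url` (illustrative assumption only)."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    netloc = parts.netloc.lower()
    # Remove trailing slashes so ".../something/" and ".../something" match.
    path = parts.path.rstrip("/")
    # Keep the query string as-is and drop the fragment.
    return urlunsplit((scheme, netloc, path, parts.query, ""))


key_a = sketch_unique_key("HTTP://www.EXAMPLE.com/something/")
key_b = sketch_unique_key("http://www.example.com/something")
same_request = key_a == key_b
```

Under these assumptions both URLs map to http://www.example.com/something, so they would count as the same request. Supplying an explicit unique_key bypasses any such normalization.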

url

url: Annotated[str, BeforeValidator(validate_http_url), Field()]

URL of the web page to crawl.

user_data

user_data: Annotated[
    dict[str, JsonValue],  # Internally, the model contains `UserData`, this is just for convenience
    Field(alias='userData', default_factory=lambda: UserData()),
    PlainValidator(user_data_adapter.validate_python),
    PlainSerializer(
        lambda instance: user_data_adapter.dump_python(
            instance,
            by_alias=True,
            exclude_none=True,
            exclude_unset=True,
            exclude_defaults=True,
        )
    ),
]

Custom user data assigned to the request. Use this to store any request-related data in the request's scope, so that it stays accessible across retries, failures, and so on.