BaseRequestData

crawlee.models.BaseRequestData

Data needed to create a new crawling request.

Index

Methods

from_url

  • from_url(url, *, label=None, unique_key=None, **kwargs): Self
  • Create a new BaseRequestData instance from a URL.


    Parameters

    • url: str
    • label: str | None = None (keyword-only)
    • unique_key: str | None = None (keyword-only)
    • **kwargs: Any

    Returns Self
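
    For illustration, a minimal sketch of creating a request (the example URL is the one used in the uniqueKey notes below):

    from crawlee.models import BaseRequestData

    # Build a request; per the uniqueKey notes below, the unique key
    # is derived by normalizing the URL when not supplied explicitly.
    request = BaseRequestData.from_url('HTTP://www.EXAMPLE.com/something/')

    print(request.url)         # 'HTTP://www.EXAMPLE.com/something/'
    print(request.unique_key)  # 'http://www.example.com/something'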

Properties

handled_at

handled_at: Annotated[datetime | None, Field(alias='handledAt')]

headers

headers: Annotated[dict[str, str] | None, Field(default_factory=dict)]

loaded_url

loaded_url: Annotated[str | None, Field(alias='loadedUrl')]

method

method: str

model_config

model_config: ConfigDict

no_retry

no_retry: Annotated[bool, Field(alias='noRetry')]

payload

payload: str | None

retry_count

retry_count: Annotated[int, Field(alias='retryCount')]

unique_key

unique_key: Annotated[str, Field(alias='uniqueKey')]

A unique key identifying the request. Two requests with the same uniqueKey are considered to point to the same URL.

If uniqueKey is not provided, it is automatically generated by normalizing the URL. For example, the URL HTTP://www.EXAMPLE.com/something/ produces the uniqueKey http://www.example.com/something.

Pass an arbitrary non-empty string as uniqueKey to override this default behavior and control which URLs are considered equal.
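
As a sketch, assuming the normalization described above (the 'submit-1'/'submit-2' key values are arbitrary examples):

    from crawlee.models import BaseRequestData

    # Same normalized URL -> same unique_key -> deduplicated as one request.
    a = BaseRequestData.from_url('HTTP://www.EXAMPLE.com/something/')
    b = BaseRequestData.from_url('http://www.example.com/something')
    assert a.unique_key == b.unique_key

    # Explicit unique keys make the same URL count as two distinct requests.
    first = BaseRequestData.from_url('http://www.example.com/form', unique_key='submit-1')
    second = BaseRequestData.from_url('http://www.example.com/form', unique_key='submit-2')
    assert first.unique_key != second.unique_key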

url

url: Annotated[str, Field(min_length=1)]

The URL of the web page to crawl.

user_data

user_data: Annotated[dict[str, Any], Field(alias='userData', default_factory=dict)]

Custom user data assigned to the request. Use this to save any request-related data within the request's scope, keeping it accessible across retries, failures, etc.
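
A short sketch of attaching data to a request (the key names and values are made up for illustration):

    from crawlee.models import BaseRequestData

    request = BaseRequestData.from_url('http://www.example.com/product/42')

    # Custom data travels with the request and stays available if the
    # request is retried after a failure.
    request.user_data['category'] = 'books'
    request.user_data['depth'] = 2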