Request

crawlee._request.Request

Represents a request in the Crawlee framework, containing the necessary information for crawling operations.

The Request class is one of the core components in Crawlee, utilized by various components such as request providers, HTTP clients, crawlers, and more. It encapsulates the essential data for executing web requests, including the URL, HTTP method, headers, payload, and user data. The user data allows custom information to be stored and persisted throughout the request lifecycle, including its retries.
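For example, custom data can be attached to a request when it is created and read back later while it is being processed. A minimal sketch, assuming the user_data dictionary can be modified in place (the 'category' key is purely illustrative):

from crawlee import Request

# Attach arbitrary, JSON-serializable data to the request; it is persisted
# together with the request, so it survives retries.
request = Request.from_url('https://crawlee.dev')
request.user_data['category'] = 'docs'

# Later, e.g. in a request handler, the same data can be read back.
print(request.user_data['category'])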

Key functionality includes managing the request's identifier (id) and its unique key (unique_key), which is used for request deduplication, as well as controlling retries, tracking the handling state, and configuring session rotation and proxy handling.

The recommended way to create a new instance is by using the Request.from_url constructor, which automatically generates a unique key and identifier based on the URL and request parameters.

from crawlee import Request
request = Request.from_url('https://crawlee.dev')

Methods

enqueue_strategy

  • enqueue_strategy(new_enqueue_strategy): None
  • Parameters

    • new_enqueue_strategy: EnqueueStrategy

    Returns None

forefront

  • forefront(new_value): None
  • Parameters

    • new_value: bool

    Returns None

from_base_request_data

  • from_base_request_data(base_request_data, *, id): Self
  • Create a complete Request object based on a BaseRequestData instance (see the sketch after the parameter list).


    Parameters

    • base_request_data: BaseRequestData
    • id: str | None = None (keyword-only)

    Returns Self
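    A minimal sketch of promoting a BaseRequestData instance to a full Request. It assumes that BaseRequestData is importable from the same crawlee._request module and offers a from_url constructor analogous to Request.from_url:

    from crawlee._request import BaseRequestData, Request

    # Build the lightweight request data first, then promote it to a full
    # Request; the id is generated automatically when it is not supplied.
    base_request_data = BaseRequestData.from_url('https://crawlee.dev')
    request = Request.from_base_request_data(base_request_data)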

from_url

  • from_url(url, *, method, payload, label, unique_key, id, keep_url_fragment, use_extended_unique_key, kwargs): Self
  • Create a new Request instance from a URL.

    This is the recommended constructor for creating new Request instances. It generates a Request object from a given URL, with additional options to customize the HTTP method, payload, unique key, and other request properties. If no unique_key or id is provided, they are computed automatically from the URL, method, and payload, taking the keep_url_fragment and use_extended_unique_key flags into account (see the sketch after the parameter list).


    Parameters

    • url: str
    • method: HttpMethod = 'GET' (keyword-only)
    • payload: HttpPayload | None = None (keyword-only)
    • label: str | None = None (keyword-only)
    • unique_key: str | None = None (keyword-only)
    • id: str | None = None (keyword-only)
    • keep_url_fragment: bool = False (keyword-only)
    • use_extended_unique_key: bool = False (keyword-only)
    • kwargs: Any

    Returns Self
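    A short sketch of how the deduplication-related flags influence the generated unique_key, assuming the default key computation described above (the concrete key values are implementation details and are not shown):

    from crawlee import Request

    # By default, two requests for the same URL share a unique_key, so the
    # second one would be deduplicated by a request queue.
    first = Request.from_url('https://crawlee.dev/docs')
    second = Request.from_url('https://crawlee.dev/docs')
    assert first.unique_key == second.unique_key

    # keep_url_fragment makes the '#fragment' part of the key, so these two
    # requests are treated as distinct.
    intro = Request.from_url('https://crawlee.dev/docs#intro', keep_url_fragment=True)
    usage = Request.from_url('https://crawlee.dev/docs#usage', keep_url_fragment=True)
    assert intro.unique_key != usage.unique_key

    # use_extended_unique_key folds the HTTP method and payload into the key,
    # so a GET and a POST to the same URL no longer collide.
    get_request = Request.from_url('https://crawlee.dev/docs', use_extended_unique_key=True)
    post_request = Request.from_url(
        'https://crawlee.dev/docs',
        method='POST',
        payload=b'q=1',
        use_extended_unique_key=True,
    )
    assert get_request.unique_key != post_request.unique_key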

last_proxy_tier

  • last_proxy_tier(new_value): None
  • Parameters

    • new_value: int

    Returns None

max_retries

  • max_retries(new_max_retries): None
  • Parameters

    • new_max_retries: int

    Returns None

session_rotation_count

  • session_rotation_count(new_session_rotation_count): None
  • Parameters

    • new_session_rotation_count: int

    Returns None

state

  • state(new_state): None
  • Parameters

    • new_state: RequestState

    Returns None

Properties

crawlee_data

crawlee_data: CrawleeRequestData

Crawlee-specific configuration stored in the user_data.

enqueue_strategy

enqueue_strategy: EnqueueStrategy

The strategy used when enqueueing the request.

forefront

forefront: bool

Whether the request should be enqueued at the front of the queue.

id

id: str

A unique identifier of the request.

json_

json_: str | None

TODO: get rid of this

label

label: str | None

A string used to differentiate between arbitrary request types.
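For instance, a label assigned when the request is created can later be used to route it to different handling logic. A minimal illustration; the label values are arbitrary:

from crawlee import Request

# The label travels with the request, so it can be used to distinguish
# listing pages from detail pages when the request is later processed.
listing = Request.from_url('https://crawlee.dev/docs', label='LISTING')
detail = Request.from_url('https://crawlee.dev/docs/quick-start', label='DETAIL')

if detail.label == 'DETAIL':
    ...  # handle a detail page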

last_proxy_tier

last_proxy_tier: int | None

The last proxy tier used to process the request.

max_retries

max_retries: int | None

Crawlee-specific limit on the number of retries of the request.

order_no

order_no: Decimal | None

TODO: get rid of this

session_rotation_count

session_rotation_count: int | None

Crawlee-specific number of finished session rotations for the request.

state

state: RequestState | None

Crawlee-specific request handling state.