RequestWithLock

A crawling request with information about locks.

Methods

__eq__

  • __eq__(*, other): bool
  • Compare all relevant fields of the Request class, excluding deprecated fields json_ and order_no.

    TODO: Remove this method once the issue is resolved. https://github.com/apify/crawlee-python/issues/94


    Parameters

    • optional keyword-only other: object

    Returns bool
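As a rough illustration of equality that ignores selected fields, the sketch below uses a standard-library dataclass instead of Crawlee's actual model. `RequestSketch` and its reduced field set are hypothetical; only the field names json_ and order_no come from the description above.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import Optional


@dataclass
class RequestSketch:
    """Hypothetical stand-in for a request model (not Crawlee's class)."""

    url: str
    unique_key: str
    method: str = "GET"
    # compare=False removes these fields from the generated __eq__.
    json_: Optional[str] = field(default=None, compare=False)
    order_no: Optional[Decimal] = field(default=None, compare=False)


a = RequestSketch("https://example.com", "https://example.com", order_no=Decimal(1))
b = RequestSketch("https://example.com", "https://example.com", json_="{}")
c = RequestSketch("https://example.com", "https://example.com", method="POST")

print(a == b)  # a and b differ only in the excluded fields
print(a == c)  # a and c differ in a compared field
```

Here `a` and `b` compare equal because they differ only in the excluded fields, while `a` and `c` do not because `method` takes part in the comparison.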

from_base_request_data

  • from_base_request_data(*, base_request_data, id): Self
  • Create a complete Request object based on a BaseRequestData instance.


    Parameters

    • optional keyword-only base_request_data: BaseRequestData
    • optional keyword-only id: str | None = None

    Returns Self

from_url

  • from_url(*, url, method, headers, payload, label, unique_key, id, keep_url_fragment, use_extended_unique_key, always_enqueue, kwargs): Self
  • Create a new Request instance from a URL.

    This is the recommended constructor for creating new Request instances. It generates a Request object from a given URL, with additional options to customize the HTTP method, payload, unique key, and other request properties. If no unique_key or id is provided, they are computed automatically from the URL, method, and payload. How they are computed depends on the keep_url_fragment and use_extended_unique_key flags.


    Parameters

    • optional keyword-only url: str

      The URL of the request.

    • optional keyword-only method: HttpMethod = 'GET'

      The HTTP method of the request.

    • optional keyword-only headers: (HttpHeaders | dict[str, str]) | None = None

      The HTTP headers of the request.

    • optional keyword-only payload: (HttpPayload | str) | None = None

      The data to be sent as the request body. Typically used with 'POST' or 'PUT' requests.

    • optional keyword-only label: str | None = None

      A custom label to differentiate between request types. It is stored in user_data and used for request routing (different requests go to different handlers).

    • optional keyword-only unique_key: str | None = None

      A unique key identifying the request. If not provided, it is automatically computed based on the URL and other parameters. Requests with the same unique_key are treated as identical.

    • optional keyword-only id: str | None = None

      A unique identifier for the request. If not provided, it is automatically generated from the unique_key.

    • optional keyword-only keep_url_fragment: bool = False

      Determines whether the URL fragment (e.g., `#section`) should be included in the unique_key computation. This is only relevant when unique_key is not provided.

    • optional keyword-only use_extended_unique_key: bool = False

      Determines whether to include the HTTP method and payload in the unique_key computation. This is only relevant when unique_key is not provided.

    • optional keyword-only always_enqueue: bool = False

      If set to True, the request will be enqueued even if it is already present in the queue. This option cannot be combined with a custom unique_key; doing so raises a ValueError.

    • optional keyword-only kwargs: Any

    Returns Self
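To make the effect of keep_url_fragment and use_extended_unique_key more concrete, here is a simplified sketch of how a unique key could be derived from a URL. It is not Crawlee's implementation: the function name `compute_unique_key_sketch`, the normalization steps, and the `METHOD|hash|url` layout of the extended key are all assumptions made for illustration.

```python
import hashlib
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def compute_unique_key_sketch(
    url: str,
    *,
    method: str = "GET",
    payload: Optional[bytes] = None,
    keep_url_fragment: bool = False,
    use_extended_unique_key: bool = False,
) -> str:
    """Hypothetical unique-key derivation (illustrative only)."""
    parts = urlsplit(url.strip())
    # Sort query parameters so that their order does not affect the key.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    # Drop the fragment unless the caller asks to keep it.
    fragment = parts.fragment if keep_url_fragment else ""
    normalized = urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path.rstrip("/"), query, fragment)
    )
    if not use_extended_unique_key:
        return normalized
    # Assumed layout: include the method and a short payload hash in the key.
    payload_hash = hashlib.sha256(payload or b"").hexdigest()[:8]
    return f"{method.upper()}|{payload_hash}|{normalized}"


print(compute_unique_key_sketch("https://example.com/page#section"))
print(compute_unique_key_sketch("https://example.com/page#section", keep_url_fragment=True))
print(compute_unique_key_sketch("https://example.com/api", method="POST", payload=b"a=1",
                                use_extended_unique_key=True))
```

With the default flags, two URLs that differ only in their fragment collapse to the same key. With the extended key, a GET and a POST to the same URL stay distinct.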

get_query_param_from_url

  • get_query_param_from_url(*, param, default): str | None
  • Get the value of a specific query parameter from the URL.


    Parameters

    • optional keyword-only param: str
    • optional keyword-only default: str | None = None

    Returns str | None
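The lookup can be approximated with the standard library alone. The standalone function below is a sketch, not the method itself: the name `get_query_param` and the choice to return the first value when a parameter repeats are assumptions.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def get_query_param(url: str, param: str, default: Optional[str] = None) -> Optional[str]:
    """Return the first value of `param` in the URL's query string, or `default`."""
    values = parse_qs(urlsplit(url).query).get(param)
    return values[0] if values else default


url = "https://example.com/search?q=crawlee&page=2"
print(get_query_param(url, "page"))
print(get_query_param(url, "sort", default="asc"))
```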

Properties

crawl_depth

crawl_depth: int

The depth of the request in the crawl tree.

crawlee_data

crawlee_data: CrawleeRequestData

Crawlee-specific configuration stored in the user_data.

enqueue_strategy

enqueue_strategy: EnqueueStrategy

The strategy used when enqueueing the request.

forefront

forefront: bool

Indicates whether the request should be enqueued at the front of the queue.

handled_at

handled_at: datetime | None

Timestamp when the request was handled.

headers

headers: HttpHeaders

HTTP request headers.

id

id: str

A unique identifier for the request. Note that this is not used for deduplication, and should not be confused with unique_key.

json_

json_: str | None

Deprecated internal field; do not use it.

Should be removed as part of https://github.com/apify/crawlee-python/issues/94.

label

label: str | None

A string used to differentiate between arbitrary request types.
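One way to picture label-based routing is a plain dictionary that maps labels to handler functions. This is a minimal sketch and not Crawlee's router: the handler names, the `HANDLERS` table, and the `dispatch` helper are hypothetical.

```python
from typing import Callable, Dict, Optional

Handler = Callable[[str], str]


def handle_default(url: str) -> str:
    # Fallback for requests without a label or with an unknown label.
    return f"default:{url}"


def handle_detail(url: str) -> str:
    return f"detail:{url}"


# Hypothetical routing table keyed by request label.
HANDLERS: Dict[Optional[str], Handler] = {
    None: handle_default,
    "DETAIL": handle_detail,
}


def dispatch(url: str, label: Optional[str]) -> str:
    """Pick a handler by label, falling back to the default handler."""
    handler = HANDLERS.get(label, handle_default)
    return handler(url)


print(dispatch("https://example.com/item/1", "DETAIL"))
print(dispatch("https://example.com/", None))
```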

last_proxy_tier

last_proxy_tier: int | None

The last proxy tier used to process the request.

loaded_url

loaded_url: str | None

URL of the web page that was loaded. This can differ from the original URL in case of redirects.

lock_expires_at

lock_expires_at: datetime

The timestamp when the lock expires.
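A consumer of this field would typically compare it with the current time. The helper below is a sketch under the assumption that lock_expires_at is a timezone-aware datetime; `is_lock_expired` is a hypothetical name, not part of the API.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_lock_expired(lock_expires_at: datetime, now: Optional[datetime] = None) -> bool:
    """Return True once the current time has reached the lock's expiry."""
    # Allow injecting `now` so the check is easy to test.
    current = now or datetime.now(timezone.utc)
    return current >= lock_expires_at


locked_until = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(is_lock_expired(locked_until, now=locked_until - timedelta(seconds=30)))
print(is_lock_expired(locked_until, now=locked_until + timedelta(seconds=30)))
```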

max_retries

max_retries: int | None

Crawlee-specific limit on the number of retries of the request.

method

method: HttpMethod

HTTP request method.

model_config

model_config: Undefined

no_retry

no_retry: bool

If set to True, the request will not be retried in case of failure.

order_no

order_no: Decimal | None

Deprecated internal field; do not use it.

Should be removed as part of https://github.com/apify/crawlee-python/issues/94.

payload

payload: HttpPayload | None

HTTP request payload.

TODO: Re-check the need for Validator and Serializer once the issue is resolved. https://github.com/apify/crawlee-python/issues/94

retry_count

retry_count: int

Number of times the request has been retried.
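The interplay of retry_count, max_retries, and no_retry can be sketched as a small decision function. The logic below is an assumption about how such fields might be combined, not Crawlee's retry policy: `should_retry` is a hypothetical helper, and both the fallback to a default limit when max_retries is None and the default value of 3 are illustrative.

```python
from typing import Optional


def should_retry(
    retry_count: int,
    max_retries: Optional[int],
    no_retry: bool,
    default_max_retries: int = 3,  # illustrative default, not taken from Crawlee
) -> bool:
    """Decide whether a failed request gets another attempt (hypothetical policy)."""
    if no_retry:
        return False
    # Assumption: a missing per-request limit falls back to a global default.
    limit = max_retries if max_retries is not None else default_max_retries
    return retry_count < limit


print(should_retry(retry_count=1, max_retries=None, no_retry=False))
print(should_retry(retry_count=1, max_retries=1, no_retry=False))
print(should_retry(retry_count=0, max_retries=5, no_retry=True))
```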

session_rotation_count

session_rotation_count: int | None

Crawlee-specific number of finished session rotations for the request.

state

state: RequestState | None

Crawlee-specific request handling state.

unique_key

unique_key: str

A unique key identifying the request. Two requests with the same unique_key are considered as pointing to the same URL.

If unique_key is not provided, it is generated automatically by normalizing the URL. For example, the URL HTTP://www.EXAMPLE.com/something/ produces the unique_key http://www.example.com/something.

Pass an arbitrary non-empty text value to the unique_key property to override the default behavior and specify which URLs shall be considered equal.
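The deduplication role of unique_key can be illustrated with a set of already-seen keys. This is a sketch only: requests are modeled as hypothetical `(unique_key, url)` pairs, and `deduplicate` is not a Crawlee function.

```python
from typing import List, Set, Tuple

RequestPair = Tuple[str, str]  # hypothetical (unique_key, url) pair


def deduplicate(requests: List[RequestPair]) -> List[RequestPair]:
    """Keep the first request seen for each unique_key, preserving order."""
    seen: Set[str] = set()
    kept: List[RequestPair] = []
    for unique_key, url in requests:
        if unique_key in seen:
            continue  # same unique_key: treated as the same request
        seen.add(unique_key)
        kept.append((unique_key, url))
    return kept


queue = [
    ("http://www.example.com/something", "HTTP://www.EXAMPLE.com/something/"),
    ("http://www.example.com/something", "http://www.example.com/something"),
    ("custom-key", "http://www.example.com/other"),
]
print(deduplicate(queue))
```

The first two entries share a key, so only the first survives. The third entry uses a custom key and is kept regardless of its URL.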

url

url: str

The URL of the web page to crawl. Must be a valid HTTP or HTTPS URL, and may include query parameters and fragments.

user_data

user_data: dict[str, JsonSerializable]

Custom user data assigned to the request. Use this to store any request-related data in the request's scope, keeping it accessible on retries, failures, etc.
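Because the values are typed as JsonSerializable, a quick way to check a candidate user_data dictionary is to round-trip it through the standard json module. The helper `is_json_serializable` and the sample data are hypothetical and only illustrate the constraint.

```python
import json
from typing import Any


def is_json_serializable(value: Any) -> bool:
    """Return True if `value` can be encoded with the standard json module."""
    try:
        json.dumps(value)
    except (TypeError, ValueError):
        return False
    return True


user_data = {"category": "books", "page": 2, "tags": ["new", "sale"]}
restored = json.loads(json.dumps(user_data))

print(restored == user_data)
print(is_json_serializable({"seen": {1, 2, 3}}))  # sets are not JSON-encodable
```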