RequestWithLock

A crawling request with information about locks.

Hierarchy

Request
- RequestWithLock

Methods

from_url

from_url(url, *, method, headers, payload, label, session_id, unique_key, id, keep_url_fragment, use_extended_unique_key, always_enqueue, kwargs): Self

Inherited from Request.from_url
Create a new Request instance from a URL.

This is the recommended constructor for creating new Request instances. It generates a Request object from a given URL, with additional options to customize the HTTP method, payload, unique key, and other request properties. If no unique_key or id is provided, they are computed automatically from the URL, method, and payload. Which of these inputs are actually used depends on the keep_url_fragment and use_extended_unique_key flags.
Parameters
- url: str
  The URL of the request.
- method (optional, keyword-only): HttpMethod = 'GET'
  The HTTP method of the request.
- headers (optional, keyword-only): HttpHeaders | dict[str, str] | None = None
  The HTTP headers of the request.
- payload (optional, keyword-only): HttpPayload | str | None = None
  The data to be sent as the request body. Typically used with 'POST' or 'PUT' requests.
- label (optional, keyword-only): str | None = None
  A custom label to differentiate between request types. It is stored in user_data and used for request routing (different requests go to different handlers).
- session_id (optional, keyword-only): str | None = None
  ID of a specific Session to which the request will be strictly bound. If the session becomes unavailable when the request is processed, a RequestCollisionError will be raised.
- unique_key (optional, keyword-only): str | None = None
  A unique key identifying the request. If not provided, it is computed automatically from the URL and other parameters. Requests with the same unique_key are treated as identical.
- id (optional, keyword-only): str | None = None
  A unique identifier for the request. If not provided, it is generated automatically from the unique_key.
- keep_url_fragment (optional, keyword-only): bool = False
  Determines whether the URL fragment (e.g., `#section`) is included in the unique_key computation. Only relevant when unique_key is not provided.
- use_extended_unique_key (optional, keyword-only): bool = False
  Determines whether the HTTP method, session ID, and payload are included in the unique_key computation. Only relevant when unique_key is not provided.
- always_enqueue (optional, keyword-only): bool = False
  If set to True, the request is enqueued even if it is already present in the queue. Combining this with a custom unique_key is not allowed and raises a ValueError.
- kwargs: Any
Returns Self

get_query_param_from_url

get_query_param_from_url(param, *, default): str | None

Inherited from Request.get_query_param_from_url
Get the value of a specific query parameter from the URL.
Parameters
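- param: str
  The name of the query parameter.
- default (optional, keyword-only): str | None = None
  The value returned when the parameter is not present in the URL.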
Returns str | None

Properties

crawl_depth

crawl_depth: int

The depth of the request in the crawl tree.

crawlee_data

crawlee_data: CrawleeRequestData

Crawlee-specific configuration stored in the user_data.

enqueue_strategy

enqueue_strategy: EnqueueStrategy

The strategy that was used for enqueuing the request.

forefront

forefront: bool

Indicates whether the request should be enqueued at the front of the queue.

handled_at

handled_at: Annotated[datetime | None, Field(alias='handledAt')]

Timestamp when the request was handled.

headers

headers: Annotated[HttpHeaders, Field(default_factory=HttpHeaders)]

HTTP request headers.

id

id: str

A unique identifier for the request. Note that this is not used for deduplication, and should not be confused with unique_key.
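Since from_url generates the id from the unique_key when none is given, the id can be thought of as a deterministic digest of that key. The derivation below (SHA-256, Base64, stripping `+`, `/` and `=`, truncating to 15 characters) is an assumption for illustration; `unique_key_to_id` is a hypothetical helper, and the exact algorithm Crawlee uses is not specified in this reference.

```python
import base64
import hashlib
import re


def unique_key_to_id(unique_key: str, length: int = 15) -> str:
    """Hypothetical deterministic id derived from a unique_key."""
    digest = hashlib.sha256(unique_key.encode("utf-8")).digest()
    encoded = base64.b64encode(digest).decode("ascii")
    # Drop characters that are awkward in identifiers, then truncate.
    return re.sub(r"[+/=]", "", encoded)[:length]


# The same unique_key always maps to the same id.
print(unique_key_to_id("https://example.com/page") == unique_key_to_id("https://example.com/page"))  # True
```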

label

label: str | None

A string used to differentiate between arbitrary request types.

last_proxy_tier

last_proxy_tier: int | None

The last proxy tier used to process the request.

loaded_url

loaded_url: Annotated[str | None, BeforeValidator(validate_http_url), Field(alias='loadedUrl')]

URL of the web page that was loaded. This can differ from the original URL in case of redirects.

lock_expires_at

lock_expires_at: datetime

The timestamp when the lock expires.

max_retries

max_retries: int | None

Crawlee-specific limit on the number of retries of the request.

method

method: HttpMethod

HTTP request method.

model_config

model_config: Undefined

no_retry

no_retry: Annotated[bool, Field(alias='noRetry')]

If set to True, the request will not be retried in case of failure.

payload

payload: Annotated[HttpPayload | None, BeforeValidator(lambda v: v.encode() if isinstance(v, str) else v), PlainSerializer(lambda v: v.decode() if isinstance(v, bytes) else v)]

HTTP request payload.

retry_count

retry_count: Annotated[int, Field(alias='retryCount')]

Number of times the request has been retried.

session_id

session_id: str | None

The ID of the bound session, if there is any.

session_rotation_count

session_rotation_count: int | None

Crawlee-specific number of finished session rotations for the request.

state

state: RequestState | None

Crawlee-specific request handling state.

unique_key

unique_key: Annotated[str, Field(alias='uniqueKey')]

A unique key identifying the request. Two requests with the same unique_key are considered to point to the same URL.

If unique_key is not provided, it is generated automatically by normalizing the URL. For example, the URL HTTP://www.EXAMPLE.com/something/ produces the unique_key http://www.example.com/something.

Pass an arbitrary non-empty text value to the unique_key property to override the default behavior and specify which URLs should be considered equal.
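The normalization example can be reproduced with a simplified stdlib sketch. `normalize_url` is hypothetical: it lowercases the scheme and host, strips a trailing slash from the path, drops the fragment unless asked to keep it, and sorts query parameters. Sorting the query is an assumption of this sketch, and Crawlee's actual normalization may apply other rules.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize_url(url: str, keep_url_fragment: bool = False) -> str:
    """Simplified, hypothetical URL normalization for deduplication keys."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    netloc = parts.netloc.lower()  # host names are case-insensitive
    path = parts.path.rstrip("/")  # treat /something/ and /something as equal
    # Assumption: parameter order should not matter, so sort the pairs.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    fragment = parts.fragment if keep_url_fragment else ""
    return urlunsplit((scheme, netloc, path, query, fragment))


print(normalize_url("HTTP://www.EXAMPLE.com/something/"))  # http://www.example.com/something
```

Under this sketch, `https://Example.com/list/?b=2&a=1` and `https://example.com/list?a=1&b=2` map to the same key, which is exactly the case where a custom unique_key would be needed to keep them apart.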

url

url: Annotated[str, BeforeValidator(validate_http_url), Field()]

The URL of the web page to crawl. Must be a valid HTTP or HTTPS URL, and may include query parameters and fragments.

user_data

user_data: Annotated[dict[str, JsonSerializable], Field(alias='userData', default_factory=lambda: UserData()), PlainValidator(user_data_adapter.validate_python), PlainSerializer(lambda instance: user_data_adapter.dump_python(instance, by_alias=True, exclude_none=True, exclude_unset=True, exclude_defaults=True))]

Custom user data assigned to the request. Use it to store any request-related data in the request's scope, so that it stays accessible across retries, failures, etc.

was_already_handled

was_already_handled: bool

Indicates whether the request was handled.
