Request
Methods
__eq__
Compare all relevant fields of the Request class, excluding the deprecated fields json_ and order_no.
TODO: Remove this method once the issue is resolved. https://github.com/apify/crawlee-python/issues/94
Parameters
other: object (optional, keyword-only)
Returns bool
crawl_depth
Parameters
new_value: int (optional, keyword-only)
Returns None
enqueue_strategy
Parameters
new_enqueue_strategy: EnqueueStrategy (optional, keyword-only)
Returns None
forefront
Parameters
new_value: bool (optional, keyword-only)
Returns None
from_base_request_data
Create a complete Request object based on a BaseRequestData instance.
Parameters
base_request_data: BaseRequestData (optional, keyword-only)
id: str | None = None (optional, keyword-only)
Returns Self
from_url
Create a new Request instance from a URL.
This is the recommended constructor for creating new Request instances. It generates a Request object from a given URL, with additional options to customize the HTTP method, payload, unique key, and other request properties. If no unique_key or id is provided, they are computed automatically based on the URL, method, and payload; the computation also depends on the keep_url_fragment and use_extended_unique_key flags.
Parameters
url: str (optional, keyword-only)
The URL of the request.
method: HttpMethod = 'GET' (optional, keyword-only)
The HTTP method of the request.
headers: HttpHeaders | dict[str, str] | None = None (optional, keyword-only)
The HTTP headers of the request.
payload: HttpPayload | str | None = None (optional, keyword-only)
The data to be sent as the request body. Typically used with 'POST' or 'PUT' requests.
label: str | None = None (optional, keyword-only)
A custom label to differentiate between request types. This is stored in user_data, and it is used for request routing (different requests go to different handlers).
unique_key: str | None = None (optional, keyword-only)
A unique key identifying the request. If not provided, it is automatically computed based on the URL and other parameters. Requests with the same unique_key are treated as identical.
id: str | None = None (optional, keyword-only)
A unique identifier for the request. If not provided, it is automatically generated from the unique_key.
keep_url_fragment: bool = False (optional, keyword-only)
Determines whether the URL fragment (e.g., `#section`) should be included in the unique_key computation. This is only relevant when unique_key is not provided.
use_extended_unique_key: bool = False (optional, keyword-only)
Determines whether to include the HTTP method and payload in the unique_key computation. This is only relevant when unique_key is not provided.
always_enqueue: bool = False (optional, keyword-only)
If set to True, the request will be enqueued even if it is already present in the queue. Using this is not allowed when a custom unique_key is also provided and will result in a ValueError.
kwargs: Any (optional, keyword-only)
Returns Self
get_query_param_from_url
Get the value of a specific query parameter from the URL.
Parameters
param: str (optional, keyword-only)
default: str | None = None (optional, keyword-only)
Returns str | None
last_proxy_tier
Parameters
new_value: int (optional, keyword-only)
Returns None
max_retries
Parameters
new_max_retries: int (optional, keyword-only)
Returns None
session_rotation_count
Parameters
new_session_rotation_count: int (optional, keyword-only)
Returns None
state
Parameters
new_state: RequestState (optional, keyword-only)
Returns None
Properties
crawl_depth
The depth of the request in the crawl tree.
crawlee_data
Crawlee-specific configuration stored in the user_data.
enqueue_strategy
The strategy used when enqueueing the request.
forefront
Indicates whether the request should be enqueued at the front of the queue.
handled_at
Timestamp when the request was handled.
headers
HTTP request headers.
id
A unique identifier for the request. Note that this is not used for deduplication, and should not be confused with unique_key.
json_
Deprecated internal field, do not use it.
Should be removed as part of https://github.com/apify/crawlee-python/issues/94.
label
A string used to differentiate between arbitrary request types.
last_proxy_tier
The last proxy tier used to process the request.
loaded_url
URL of the web page that was loaded. This can differ from the original URL in case of redirects.
max_retries
Crawlee-specific limit on the number of retries of the request.
method
HTTP request method.
model_config
no_retry
If set to True, the request will not be retried in case of failure.
order_no
Deprecated internal field, do not use it.
Should be removed as part of https://github.com/apify/crawlee-python/issues/94.
payload
HTTP request payload.
TODO: Re-check the need for Validator and Serializer once the issue is resolved. https://github.com/apify/crawlee-python/issues/94
retry_count
Number of times the request has been retried.
session_rotation_count
Crawlee-specific number of finished session rotations for the request.
state
Crawlee-specific request handling state.
unique_key
A unique key identifying the request. Two requests with the same unique_key are considered as pointing to the same URL.
If unique_key is not provided, it is automatically generated by normalizing the URL. For example, the URL HTTP://www.EXAMPLE.com/something/ will produce the unique_key http://www.example.com/something.
Pass an arbitrary non-empty text value to the unique_key property to override the default behavior and specify which URLs shall be considered equal.
url
The URL of the web page to crawl. Must be a valid HTTP or HTTPS URL, and may include query parameters and fragments.
user_data
Custom user data assigned to the request. Use this to save any request related data to the request's scope, keeping them accessible on retries, failures etc.
Represents a request in the Crawlee framework, containing the necessary information for crawling operations.
The Request class is one of the core components in Crawlee, utilized by various components such as request providers, HTTP clients, crawlers, and more. It encapsulates the essential data for executing web requests, including the URL, HTTP method, headers, payload, and user data. The user data allows custom information to be stored and persisted throughout the request lifecycle, including its retries.
Key functionalities include managing the request's identifier (id), the unique key (unique_key) used for request deduplication, controlling retries, handling state management, and enabling configuration for session rotation and proxy handling.
The recommended way to create a new instance is by using the Request.from_url constructor, which automatically generates a unique key and identifier based on the URL and request parameters.
Usage