RequestWithLock
A crawling request with information about locks.
Hierarchy
- Request
- RequestWithLock
Index
Methods
Properties
Methods
__eq__
Compare all relevant fields of the Request class, excluding the deprecated fields json_ and order_no.
TODO: Remove this method once the issue is resolved. https://github.com/apify/crawlee-python/issues/94
Parameters
other: object
Returns bool
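A minimal sketch of how this comparison behaves, assuming Request is imported from the top-level crawlee package; because unique_key and id are derived deterministically from the URL, two requests built from the same URL compare as equal (the exact set of compared fields may vary by version):

```python
from crawlee import Request

# Both requests get the same computed unique_key and id, so the
# field-wise comparison treats them as equal.
first = Request.from_url('https://example.com/products?page=1')
second = Request.from_url('https://example.com/products?page=1')
assert first == second

# A different URL produces a different unique_key, so the requests differ.
other = Request.from_url('https://example.com/products?page=2')
assert first != other
```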
from_base_request_data
Create a complete Request object based on a BaseRequestData instance.
Parameters
base_request_data: BaseRequestData
id: str | None = None
Returns Self
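A hedged sketch of promoting a BaseRequestData instance to a full Request. The import path of BaseRequestData and the availability of its own from_url constructor are assumptions and may differ between crawlee versions:

```python
from crawlee import Request
from crawlee import BaseRequestData  # assumed import path; may differ by version

# Build the lightweight request data first, then promote it to a full Request;
# an id is generated from the unique_key when not supplied explicitly.
base = BaseRequestData.from_url('https://example.com/detail/1')
request = Request.from_base_request_data(base)
print(request.id, request.url)
```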
from_url
Create a new Request instance from a URL.
This is the recommended constructor for creating new Request instances. It generates a Request object from a given URL, with additional options to customize the HTTP method, payload, unique key, and other request properties. If no unique_key or id is provided, they are computed automatically based on the URL, method, and payload; the computation depends on the keep_url_fragment and use_extended_unique_key flags.
Parameters
url: str
The URL of the request.
method: HttpMethod = 'GET'
The HTTP method of the request.
headers: (HttpHeaders | dict[str, str]) | None = None
The HTTP headers of the request.
payload: (HttpPayload | str) | None = None
The data to be sent as the request body. Typically used with 'POST' or 'PUT' requests.
label: str | None = None
A custom label to differentiate between request types. This is stored in user_data, and it is used for request routing (different requests go to different handlers).
unique_key: str | None = None
A unique key identifying the request. If not provided, it is automatically computed based on the URL and other parameters. Requests with the same unique_key are treated as identical.
id: str | None = None
A unique identifier for the request. If not provided, it is automatically generated from the unique_key.
keep_url_fragment: bool = False
Determines whether the URL fragment (e.g., `#section`) should be included in the unique_key computation. This is only relevant when unique_key is not provided.
use_extended_unique_key: bool = False
Determines whether to include the HTTP method and payload in the unique_key computation. This is only relevant when unique_key is not provided.
always_enqueue: bool = False
If set to True, the request will be enqueued even if it is already present in the queue. Using this is not allowed when a custom unique_key is also provided and will result in a ValueError.
kwargs: Any
Returns Self
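A minimal usage sketch, assuming Request is imported from the top-level crawlee package; the URLs, label, and payload below are illustrative:

```python
from crawlee import Request

# A plain GET request; unique_key and id are computed from the URL.
listing = Request.from_url('https://example.com/products', label='LIST')

# A POST request; extending the unique key with the method and payload keeps
# it distinct from a plain GET to the same URL.
search = Request.from_url(
    'https://example.com/search',
    method='POST',
    payload='{"query": "shoes"}',
    use_extended_unique_key=True,
)

print(listing.unique_key)
print(search.unique_key)
```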
get_query_param_from_url
Get the value of a specific query parameter from the URL.
Parameters
param: str
default: str | None = None
Returns str | None
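A short sketch of reading query parameters back from a request URL; the parameter names and fallback value are illustrative:

```python
from crawlee import Request

request = Request.from_url('https://example.com/search?q=shoes&page=2')

print(request.get_query_param_from_url('q'))                    # 'shoes'
print(request.get_query_param_from_url('page'))                 # '2'
print(request.get_query_param_from_url('sort', default='asc'))  # 'asc' (fallback)
```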
Properties
crawl_depth
The depth of the request in the crawl tree.
crawlee_data
Crawlee-specific configuration stored in the user_data.
enqueue_strategy
The strategy used when enqueueing the request.
forefront
Indicate whether the request should be enqueued at the front of the queue.
handled_at
Timestamp when the request was handled.
headers
HTTP request headers.
id
A unique identifier for the request. Note that this is not used for deduplication and should not be confused with unique_key.
json_
Deprecated internal field, do not use it.
Should be removed as part of https://github.com/apify/crawlee-python/issues/94.
label
A string used to differentiate between arbitrary request types.
last_proxy_tier
The last proxy tier used to process the request.
loaded_url
URL of the web page that was loaded. This can differ from the original URL in case of redirects.
lock_expires_at
The timestamp when the lock expires.
max_retries
Crawlee-specific limit on the number of retries of the request.
method
HTTP request method.
model_config
no_retry
If set to True, the request will not be retried in case of failure.
order_no
Deprecated internal field, do not use it.
Should be removed as part of https://github.com/apify/crawlee-python/issues/94.
payload
HTTP request payload.
TODO: Re-check the need for Validator and Serializer once the issue is resolved. https://github.com/apify/crawlee-python/issues/94
retry_count
Number of times the request has been retried.
session_rotation_count
Crawlee-specific number of finished session rotations for the request.
state
Crawlee-specific request handling state.
unique_key
A unique key identifying the request. Two requests with the same unique_key are considered as pointing to the same URL.
If unique_key is not provided, it is automatically generated by normalizing the URL. For example, the URL HTTP://www.EXAMPLE.com/something/ will produce the unique_key http://www.example.com/something.
Pass an arbitrary non-empty text value to the unique_key property to override the default behavior and specify which URLs shall be considered equal (see the example at the end of this reference).
url
The URL of the web page to crawl. Must be a valid HTTP or HTTPS URL, and may include query parameters and fragments.
user_data
Custom user data assigned to the request. Use this to save any request related data to the request's scope, keeping them accessible on retries, failures etc.
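A sketch of the URL normalization described under unique_key, assuming Request is imported from the top-level crawlee package; the custom key value is illustrative:

```python
from crawlee import Request

# The scheme and host are lower-cased and the trailing slash is dropped,
# so both requests end up with the same unique_key.
a = Request.from_url('HTTP://www.EXAMPLE.com/something/')
b = Request.from_url('http://www.example.com/something')
assert a.unique_key == b.unique_key

# An explicit unique_key overrides the normalization entirely.
c = Request.from_url('http://www.example.com/something', unique_key='my-custom-key')
assert c.unique_key == 'my-custom-key'
```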