RequestList
Hierarchy
- RequestProvider
- RequestList
Methods
__init__
A default constructor.
Parameters
requests: Sequence[str | Request] | None = None (optional, keyword-only)
The request objects (or their string representations) to be added to the provider.
name: str | None = None (optional, keyword-only)
The name of the request list.
Returns None
add_request
Add a single request to the provider and store it in the underlying resource client.
Parameters
request: str | Request (optional, keyword-only)
The request object (or its string representation) to be added to the provider.
forefront: bool = False (optional, keyword-only)
Determines whether the request should be added to the beginning (if True) or the end (if False) of the provider.
Returns ProcessedRequest
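The effect of forefront can be pictured with a double-ended queue. The sketch below is a hypothetical stand-in built on the standard library only; it is not the RequestList implementation, and the module-level add_request function and the example.com URLs are assumptions made for illustration.

```python
from collections import deque

# Hypothetical stand-in for the provider's pending requests.
# This is NOT the crawlee implementation, only a model of the
# documented forefront behaviour.
pending: deque = deque()


def add_request(request: str, *, forefront: bool = False) -> None:
    """Add to the beginning if forefront is True, otherwise to the end."""
    if forefront:
        pending.appendleft(request)
    else:
        pending.append(request)


add_request("https://example.com/a")
add_request("https://example.com/b")
add_request("https://example.com/urgent", forefront=True)

# The forefront request is now the first one that would be fetched.
order = list(pending)
print(order)
```

A request added with forefront=True would be returned before requests that were added earlier, which can be useful for retries or high-priority pages.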
add_requests_batched
Add requests to the underlying resource client in batches.
Parameters
requests: Sequence[str | Request] (optional, keyword-only)
Requests to add to the queue.
batch_size: int = 1000 (optional, keyword-only)
The number of requests to add in one batch.
wait_time_between_batches: timedelta = timedelta(seconds=1) (optional, keyword-only)
Time to wait between adding batches.
wait_for_all_requests_to_be_added: bool = False (optional, keyword-only)
If True, wait for all requests to be added before returning.
wait_for_all_requests_to_be_added_timeout: timedelta | None = None (optional, keyword-only)
Timeout for waiting for all requests to be added.
Returns None
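As a rough illustration of how batch_size and wait_time_between_batches interact, the sketch below splits a sequence into batches and pauses between them. The helpers split_into_batches and add_in_batches, the sink list, and the URLs are hypothetical. The sketch assumes no wait after the final batch and does not model the wait_for_all_requests_to_be_added options; the actual method may behave differently.

```python
import asyncio
from datetime import timedelta
from typing import List, Sequence


def split_into_batches(requests: Sequence[str], batch_size: int) -> List[List[str]]:
    """Split requests into consecutive batches of at most batch_size items."""
    return [
        list(requests[start:start + batch_size])
        for start in range(0, len(requests), batch_size)
    ]


async def add_in_batches(
    requests: Sequence[str],
    sink: List[str],
    *,
    batch_size: int = 1000,
    wait_time_between_batches: timedelta = timedelta(seconds=1),
) -> None:
    """Append requests to sink batch by batch, sleeping between batches."""
    batches = split_into_batches(requests, batch_size)
    for index, batch in enumerate(batches):
        sink.extend(batch)
        # Assumption: there is no need to wait after the last batch.
        if index < len(batches) - 1:
            await asyncio.sleep(wait_time_between_batches.total_seconds())


urls = [f"https://example.com/page/{n}" for n in range(5)]
stored: List[str] = []
asyncio.run(
    add_in_batches(
        urls,
        stored,
        batch_size=2,
        wait_time_between_batches=timedelta(milliseconds=10),
    )
)
print(len(stored))
```

With five requests and a batch size of two, this sketch produces three batches (2, 2, and 1 requests) and waits twice.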
drop
Removes the queue either from the Apify Cloud storage or from the local database.
Returns None
fetch_next_request
Returns the next request in the queue to be processed, or None if there are no more pending requests.
Returns Request | None
get_handled_count
Returns the number of handled requests.
Returns int
get_total_count
Returns an offline approximation of the total number of requests in the queue (i.e. pending + handled).
Returns int
is_empty
Returns True if there are no more requests in the queue (there might still be unfinished requests).
Returns bool
is_finished
Returns True if all requests have been handled.
Returns bool
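The difference between is_empty and is_finished is easiest to see by separating requests into three groups: pending, in progress, and handled. The ListState class below is a hypothetical model of that bookkeeping, not the class's actual internals. Counting in-progress requests towards the total is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ListState:
    """Hypothetical bookkeeping model; not the crawlee internals."""

    pending: int      # waiting to be fetched
    in_progress: int  # fetched, but not yet marked as handled
    handled: int      # marked as handled

    def is_empty(self) -> bool:
        # Nothing left to fetch, although some requests may be unfinished.
        return self.pending == 0

    def is_finished(self) -> bool:
        # Nothing left to fetch and nothing still being processed.
        return self.pending == 0 and self.in_progress == 0

    def get_handled_count(self) -> int:
        return self.handled

    def get_total_count(self) -> int:
        # Assumption: in-progress requests count as not yet handled.
        return self.pending + self.in_progress + self.handled


mid_crawl = ListState(pending=0, in_progress=2, handled=8)
done = ListState(pending=0, in_progress=0, handled=10)

print(mid_crawl.is_empty(), mid_crawl.is_finished())
print(done.is_empty(), done.is_finished())
```

In this model, a crawler that stopped as soon as is_empty returned True could exit while two requests are still being processed. Checking is_finished avoids that.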
mark_request_as_handled
Marks a request as handled after successful processing (or after giving up on retrying).
Parameters
request: Request (optional, keyword-only)
Returns ProcessedRequest | None
reclaim_request
Reclaims a failed request back to the queue, so that it can be returned for processing again later.
It is possible to modify the request data by supplying an updated request as a parameter.
Parameters
request: Request (optional, keyword-only)
forefront: bool = False (optional, keyword-only)
Returns ProcessedRequest | None
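Taken together, fetch_next_request, mark_request_as_handled, and reclaim_request suggest a simple processing loop: fetch a request, try to process it, then either mark it as handled or reclaim it for a later retry. The sketch below uses a hypothetical synchronous stand-in (TinyRequestList) with plain URL strings and a made-up process handler. The real methods may be asynchronous and operate on Request objects, so treat this as a model of the control flow only.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Set


class TinyRequestList:
    """Hypothetical in-memory stand-in; NOT the crawlee RequestList."""

    def __init__(self, requests: Iterable[str]) -> None:
        self._pending = deque(requests)
        self._in_progress: Set[str] = set()
        self.handled: List[str] = []

    def fetch_next_request(self) -> Optional[str]:
        """Return the next pending request, or None if there is none."""
        if not self._pending:
            return None
        request = self._pending.popleft()
        self._in_progress.add(request)
        return request

    def mark_request_as_handled(self, request: str) -> None:
        self._in_progress.discard(request)
        self.handled.append(request)

    def reclaim_request(self, request: str, *, forefront: bool = False) -> None:
        """Put a failed request back so it can be fetched again."""
        self._in_progress.discard(request)
        if forefront:
            self._pending.appendleft(request)
        else:
            self._pending.append(request)


def process(url: str, attempt: int) -> None:
    """Hypothetical handler: the '/flaky' URL fails on its first attempt."""
    if url.endswith("/flaky") and attempt == 1:
        raise RuntimeError("temporary failure")


request_list = TinyRequestList(
    ["https://example.com/flaky", "https://example.com/ok"]
)
attempts: Dict[str, int] = {}

while (request := request_list.fetch_next_request()) is not None:
    attempts[request] = attempts.get(request, 0) + 1
    try:
        process(request, attempts[request])
    except RuntimeError:
        # Failed: return the request to the end of the list for a retry.
        request_list.reclaim_request(request)
    else:
        request_list.mark_request_as_handled(request)

print(request_list.handled)
```

Because the failed request is reclaimed with the default forefront=False, it is retried only after the other pending request has been processed. Passing forefront=True would retry it immediately. A real crawler would also cap the number of retries and mark the request as handled once it gives up.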
Properties
name
ID or name of the request queue.
Represents a (potentially very large) list of URLs to crawl.
Disclaimer: The RequestList class is in its early version and is not fully implemented. It is currently intended mainly for testing purposes and small-scale projects. The current implementation is in-memory storage only and is very limited. It will be (re)implemented in the future. For more details, see the GitHub issue: https://github.com/apify/crawlee-python/issues/99. For production usage, we recommend using the RequestQueue.