RequestList
Hierarchy
- RequestLoader
- RequestList
Index
Methods
__init__
fetch_next_request
Return the next request to be processed, or None if there are no more pending requests.
Returns Request | None
get_handled_count
Return the number of handled requests.
Returns int
get_total_count
Return an offline approximation of the total number of requests in the source (i.e. pending + handled).
Returns int
is_empty
Return True if there are no more requests in the source (there might still be unfinished requests).
Returns bool
is_finished
Return True if all requests have been handled.
Returns bool
mark_request_as_handled
Mark a request as handled after successful processing (or after giving up on retries).
Parameters
request: Request (optional, keyword-only)
Returns ProcessedRequest | None
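The way fetch_next_request, mark_request_as_handled, is_empty, and is_finished relate to each other can be illustrated with a small stand-in. The sketch below does not use the library: ToyRequestList is a hypothetical in-memory class that only mimics the semantics described above, requests are represented as plain URL strings, and the methods are synchronous for brevity, which may differ from the real class.

```python
from collections import deque


class ToyRequestList:
    """Hypothetical in-memory stand-in; not the crawlee implementation."""

    def __init__(self, urls):
        self._pending = deque(urls)   # not yet fetched
        self._in_progress = set()     # fetched but not yet marked as handled
        self._handled_count = 0

    def fetch_next_request(self):
        # Return the next pending request, or None when nothing is pending.
        if not self._pending:
            return None
        request = self._pending.popleft()
        self._in_progress.add(request)
        return request

    def mark_request_as_handled(self, request):
        # Only requests that were fetched earlier can be marked as handled.
        if request in self._in_progress:
            self._in_progress.remove(request)
            self._handled_count += 1

    def get_handled_count(self):
        return self._handled_count

    def get_total_count(self):
        return len(self._pending) + len(self._in_progress) + self._handled_count

    def is_empty(self):
        # True once nothing is left to fetch, even if some work is unfinished.
        return not self._pending

    def is_finished(self):
        # True only when nothing is pending and nothing is in progress.
        return not self._pending and not self._in_progress


toy = ToyRequestList(["https://example.com/a", "https://example.com/b"])
first = toy.fetch_next_request()
second = toy.fetch_next_request()
states_after_fetch = (toy.is_empty(), toy.is_finished())  # empty, but not finished
toy.mark_request_as_handled(first)
toy.mark_request_as_handled(second)
states_after_handling = (toy.is_empty(), toy.is_finished())
```

In this sketch the source counts as empty as soon as both requests have been fetched, but it only counts as finished once both have been marked as handled.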
to_tandem
Combine the loader with a request manager to support adding and reclaiming requests.
Parameters
request_manager: RequestManager | None = None (optional, keyword-only)
Request manager to combine the loader with. If None is given, the default request queue is used.
Returns RequestManagerTandem
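Conceptually, a tandem pairs a read-only loader with a request manager that accepts writes (adding and reclaiming requests). The sketch below shows one possible arrangement using hypothetical stand-ins (ToyLoader, ToyManager, ToyTandem). The "drain the loader first, then fall back to the manager" order is an assumption made for illustration, not a description of the actual RequestManagerTandem implementation.

```python
from collections import deque


class ToyLoader:
    """Hypothetical read-only source of requests (plain URL strings)."""

    def __init__(self, urls):
        self._pending = deque(urls)

    def fetch_next_request(self):
        return self._pending.popleft() if self._pending else None


class ToyManager:
    """Hypothetical read-write queue that supports adding and reclaiming."""

    def __init__(self):
        self._queue = deque()

    def add_request(self, request):
        self._queue.append(request)

    def reclaim_request(self, request):
        # Put a failed request back so it can be retried later.
        self._queue.append(request)

    def fetch_next_request(self):
        return self._queue.popleft() if self._queue else None


class ToyTandem:
    """Assumed behavior: drain the loader first, then fall back to the manager."""

    def __init__(self, loader, manager):
        self._loader = loader
        self._manager = manager

    def add_request(self, request):
        # Writes always go to the manager; the loader stays read-only.
        self._manager.add_request(request)

    def reclaim_request(self, request):
        self._manager.reclaim_request(request)

    def fetch_next_request(self):
        request = self._loader.fetch_next_request()
        return request if request is not None else self._manager.fetch_next_request()


tandem = ToyTandem(ToyLoader(["https://example.com/start"]), ToyManager())
tandem.add_request("https://example.com/discovered")
order = [tandem.fetch_next_request() for _ in range(3)]
tandem.reclaim_request(order[0])
retried = tandem.fetch_next_request()
```

The point of the design is that the loader never needs write support: anything discovered or retried during the crawl goes into the manager.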
Represents a (potentially very large) list of URLs to crawl.
Disclaimer: The RequestList class is in an early version and is not fully implemented. It is currently intended mainly for testing purposes and small-scale projects. The current implementation uses in-memory storage only and is very limited. It will be (re)implemented in the future. For more details, see the GitHub issue: https://github.com/apify/crawlee-python/issues/99. For production usage, we recommend using the RequestQueue.