RequestList
crawlee.storages._request_list.RequestList
Index
Constructors
__init__
Initialize the RequestList.
Parameters
requests: Sequence[str | Request] | None = None
name: str | None = None
Returns None
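The constructor accepts a mixed sequence of plain URL strings and Request objects. A minimal stdlib sketch of that normalization, assuming strings are coerced into request objects keyed by URL (the `Request` dataclass and `normalize` helper below are illustrative stand-ins, not crawlee's actual implementation):

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Illustrative stand-in for crawlee's Request model."""
    url: str


def normalize(requests):
    """Coerce a mixed sequence of URLs and Request objects into Requests."""
    return [r if isinstance(r, Request) else Request(url=r) for r in requests]


seeds = normalize(["https://example.com", Request(url="https://example.org")])
print([r.url for r in seeds])  # ['https://example.com', 'https://example.org']
```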
Methods
add_request
Parameters
request: str | Request
forefront: bool = False (keyword-only)
Returns ProcessedRequest
add_requests_batched
Parameters
requests: Sequence[str | Request]
batch_size: int = 1000 (keyword-only)
wait_time_between_batches: timedelta = timedelta(seconds=1) (keyword-only)
wait_for_all_requests_to_be_added: bool = False (keyword-only)
wait_for_all_requests_to_be_added_timeout: timedelta | None = None (keyword-only)
Returns None
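The batching parameters suggest requests are appended in fixed-size chunks with a pause between chunks. A hedged stdlib sketch of that pattern, using a plain list as the backing store (this models the documented parameters, not crawlee's internals):

```python
import asyncio


async def add_requests_batched(store, requests, *, batch_size=1000,
                               wait_time_between_batches=1.0):
    """Append requests to `store` in chunks, pausing between chunks."""
    for i in range(0, len(requests), batch_size):
        store.extend(requests[i:i + batch_size])
        # Sleep only if another batch is still pending.
        if i + batch_size < len(requests):
            await asyncio.sleep(wait_time_between_batches)


store = []
urls = [f"https://example.com/{n}" for n in range(5)]
asyncio.run(add_requests_batched(store, urls, batch_size=2,
                                 wait_time_between_batches=0))
print(len(store))  # 5
```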
drop
Returns None
fetch_next_request
Returns Request | None
get_handled_count
Returns int
get_total_count
Returns int
is_empty
Returns bool
is_finished
Returns bool
mark_request_as_handled
Parameters
request: Request
Returns None
reclaim_request
Parameters
request: Request
forefront: bool = False (keyword-only)
Returns None
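Together, fetch_next_request, mark_request_as_handled, and reclaim_request form the consumption loop: fetch a request, then either mark it handled or reclaim it so it is retried later (with forefront controlling whether it returns to the front of the queue). A simplified in-memory sketch of that cycle, using a deque of URLs (the `MiniRequestList` class is illustrative only, not crawlee's implementation):

```python
import asyncio
from collections import deque


class MiniRequestList:
    """Illustrative in-memory model of the fetch/handle/reclaim cycle."""

    def __init__(self, urls):
        self._pending = deque(urls)
        self._handled = 0
        self._total = len(urls)

    async def fetch_next_request(self):
        return self._pending.popleft() if self._pending else None

    async def mark_request_as_handled(self, url):
        self._handled += 1

    async def reclaim_request(self, url, *, forefront=False):
        # A failed request goes back to the queue for another attempt.
        if forefront:
            self._pending.appendleft(url)
        else:
            self._pending.append(url)

    async def is_finished(self):
        return not self._pending and self._handled >= self._total


async def main():
    rl = MiniRequestList(["https://example.com/a", "https://example.com/b"])
    while (url := await rl.fetch_next_request()) is not None:
        await rl.mark_request_as_handled(url)
    print(await rl.is_finished())  # True


asyncio.run(main())
```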
Represents a (potentially very large) list of URLs to crawl.
Disclaimer: The RequestList class is in an early version and is not fully implemented. It is currently intended mainly for testing purposes and small-scale projects. The current implementation is in-memory only and very limited; it will be (re)implemented in the future. For more details, see the GitHub issue: https://github.com/apify/crawlee-python/issues/99. For production usage, we recommend the RequestQueue.