RequestList

crawlee.storages._request_list.RequestList

Represents a (potentially very large) list of URLs to crawl.

Disclaimer: The RequestList class is at an early stage of development and is not fully implemented. It is currently intended mainly for testing and small-scale projects. The current implementation keeps requests in memory only and is very limited; it will be reimplemented in the future. For more details, see the GitHub issue: https://github.com/apify/crawlee-python/issues/99. For production use, we recommend the RequestQueue.

Constructors

__init__

  • __init__(requests, name): None
  • Initialize the RequestList.


    Parameters

    • requests: Sequence[str | Request] | None = None
    • name: str | None = None

    Returns None
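The constructor accepts a mix of plain URL strings and `Request` objects. The following is a minimal sketch of how such mixed input can be normalized into a single in-memory queue, assuming strings are wrapped into request objects and existing request objects are kept unchanged. `ToyRequest` and `ToyRequestList` are hypothetical stand-ins, not Crawlee classes, and the `name` parameter is not modeled.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ToyRequest:
    """Hypothetical stand-in for crawlee's Request; only the URL is modeled."""

    url: str


class ToyRequestList:
    """Hypothetical in-memory list mirroring the `requests` constructor argument."""

    def __init__(self, requests: Sequence[str | ToyRequest] | None = None) -> None:
        # Plain strings are wrapped; request objects are kept as-is.
        self._queue: deque[ToyRequest] = deque(
            r if isinstance(r, ToyRequest) else ToyRequest(url=r) for r in (requests or [])
        )

    def urls(self) -> list[str]:
        return [r.url for r in self._queue]


mixed = ToyRequestList(["https://example.com/a", ToyRequest("https://example.com/b")])
print(mixed.urls())  # ['https://example.com/a', 'https://example.com/b']
print(ToyRequestList().urls())  # []
```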

Methods

add_request

  • async add_request(request, *, forefront): ProcessedRequest
  • Parameters

    • request: str | Request
    • forefront: bool = False (keyword-only)

    Returns ProcessedRequest
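The `forefront` flag selects which end of the queue a new request joins. This sketch assumes a double-ended queue in which `forefront=True` prepends the request (so it is fetched next) and the default appends it. `ToyList` and its `fetch_next_url` method are hypothetical; unlike the real `add_request`, which returns a `ProcessedRequest`, the toy method returns `None`.

```python
from __future__ import annotations

import asyncio
from collections import deque


class ToyList:
    """Hypothetical deque-backed queue illustrating the forefront flag."""

    def __init__(self) -> None:
        self._queue: deque[str] = deque()

    async def add_request(self, url: str, *, forefront: bool = False) -> None:
        # forefront=True puts the URL at the head, so it is fetched next.
        if forefront:
            self._queue.appendleft(url)
        else:
            self._queue.append(url)

    async def fetch_next_url(self) -> str | None:
        return self._queue.popleft() if self._queue else None


async def demo() -> list[str]:
    lst = ToyList()
    await lst.add_request("https://example.com/1")
    await lst.add_request("https://example.com/2")
    await lst.add_request("https://example.com/urgent", forefront=True)
    order = []
    while (url := await lst.fetch_next_url()) is not None:
        order.append(url)
    return order


print(asyncio.run(demo()))
# ['https://example.com/urgent', 'https://example.com/1', 'https://example.com/2']
```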

add_requests_batched

  • async add_requests_batched(requests, *, batch_size, wait_time_between_batches, wait_for_all_requests_to_be_added, wait_for_all_requests_to_be_added_timeout): None
  • Parameters

    • requests: Sequence[str | Request]
    • batch_size: int = 1000 (keyword-only)
    • wait_time_between_batches: timedelta = timedelta(seconds=1) (keyword-only)
    • wait_for_all_requests_to_be_added: bool = False (keyword-only)
    • wait_for_all_requests_to_be_added_timeout: timedelta | None = None (keyword-only)

    Returns None
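The parameter names describe a batched insertion: requests are added in chunks of `batch_size`, with a pause of `wait_time_between_batches` between consecutive chunks. The sketch below shows that general pattern with plain `asyncio`. The helper `add_in_batches` and its `add_one` callback are hypothetical, the sketch does not model `wait_for_all_requests_to_be_added` or its timeout, and this page does not state whether the in-memory RequestList honors each parameter.

```python
from __future__ import annotations

import asyncio
from datetime import timedelta
from typing import Awaitable, Callable, Sequence


async def add_in_batches(
    requests: Sequence[str],
    add_one: Callable[[str], Awaitable[None]],
    *,
    batch_size: int = 1000,
    wait_time_between_batches: timedelta = timedelta(seconds=1),
) -> int:
    """Hypothetical helper: add requests chunk by chunk, pausing between chunks.

    Returns the number of batches processed.
    """
    batches = 0
    for start in range(0, len(requests), batch_size):
        if start > 0:
            # Pause only between batches, not before the first one.
            await asyncio.sleep(wait_time_between_batches.total_seconds())
        chunk = requests[start : start + batch_size]
        # Add every request in the current chunk concurrently.
        await asyncio.gather(*(add_one(url) for url in chunk))
        batches += 1
    return batches


added: list[str] = []


async def collect(url: str) -> None:
    added.append(url)


urls = [f"https://example.com/{i}" for i in range(5)]
n = asyncio.run(add_in_batches(urls, collect, batch_size=2, wait_time_between_batches=timedelta(0)))
print(n, len(added))  # 3 5
```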

drop

  • async drop(): None
  • Returns None

fetch_next_request

  • async fetch_next_request(): Request | None
  • Returns Request | None
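`fetch_next_request` returns `None` rather than raising when nothing is left to hand out. The sketch below assumes that a fetched request is removed from the pending queue and tracked as in progress until the caller reports back on it. `ToyList` and its `_in_progress` set are hypothetical and only illustrate that bookkeeping.

```python
from __future__ import annotations

import asyncio
from collections import deque


class ToyList:
    """Hypothetical model: pending deque plus a set of in-progress URLs."""

    def __init__(self, urls: list[str]) -> None:
        self._pending: deque[str] = deque(urls)
        self._in_progress: set[str] = set()

    async def fetch_next_request(self) -> str | None:
        if not self._pending:
            return None  # nothing left to hand out
        url = self._pending.popleft()
        self._in_progress.add(url)  # leased until handled or reclaimed
        return url


async def demo() -> tuple[list[str | None], int]:
    lst = ToyList(["https://example.com/a"])
    fetched = [await lst.fetch_next_request(), await lst.fetch_next_request()]
    return fetched, len(lst._in_progress)


print(asyncio.run(demo()))  # (['https://example.com/a', None], 1)
```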

get_handled_count

  • async get_handled_count(): int
  • Returns int

get_total_count

  • async get_total_count(): int
  • Returns int

is_empty

  • async is_empty(): bool
  • Returns bool

is_finished

  • async is_finished(): bool
  • Returns bool
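The two checks likely answer different questions. `is_empty` would report whether anything is left to fetch, while `is_finished` would also require that no fetched request is still being processed. The sketch below models that distinction under this assumption; `ToyList` is hypothetical, and its state is printed before fetching, after fetching, and after marking the request as handled.

```python
from __future__ import annotations

import asyncio
from collections import deque


class ToyList:
    """Hypothetical model distinguishing 'empty' from 'finished'."""

    def __init__(self, urls: list[str]) -> None:
        self._pending: deque[str] = deque(urls)
        self._in_progress: set[str] = set()

    async def fetch_next_request(self) -> str | None:
        if not self._pending:
            return None
        url = self._pending.popleft()
        self._in_progress.add(url)
        return url

    async def mark_request_as_handled(self, url: str) -> None:
        self._in_progress.discard(url)

    async def is_empty(self) -> bool:
        # Nothing left to fetch.
        return not self._pending

    async def is_finished(self) -> bool:
        # Nothing left to fetch AND nothing still being processed.
        return not self._pending and not self._in_progress


async def demo() -> list[tuple[bool, bool]]:
    lst = ToyList(["https://example.com/a"])
    states = [(await lst.is_empty(), await lst.is_finished())]
    url = await lst.fetch_next_request()
    states.append((await lst.is_empty(), await lst.is_finished()))
    await lst.mark_request_as_handled(url)
    states.append((await lst.is_empty(), await lst.is_finished()))
    return states


print(asyncio.run(demo()))  # [(False, False), (True, False), (True, True)]
```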

mark_request_as_handled

  • async mark_request_as_handled(request): None
  • Parameters

    • request: Request

    Returns None
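This sketch assumes that marking a request as handled releases it from the set of in-progress requests and increments the counter that `get_handled_count` reports. `ToyList` and its `lease` method, a stand-in for `fetch_next_request`, are hypothetical.

```python
from __future__ import annotations

import asyncio


class ToyList:
    """Hypothetical model of handled-request bookkeeping."""

    def __init__(self) -> None:
        self._in_progress: set[str] = set()
        self._handled_count = 0

    async def lease(self, url: str) -> str:
        # Stand-in for fetch_next_request: record the URL as in progress.
        self._in_progress.add(url)
        return url

    async def mark_request_as_handled(self, url: str) -> None:
        self._in_progress.remove(url)  # raises KeyError if it was never leased
        self._handled_count += 1

    async def get_handled_count(self) -> int:
        return self._handled_count


async def demo() -> int:
    lst = ToyList()
    for url in ("https://example.com/a", "https://example.com/b"):
        await lst.mark_request_as_handled(await lst.lease(url))
    return await lst.get_handled_count()


print(asyncio.run(demo()))  # 2
```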

reclaim_request

  • async reclaim_request(request, *, forefront): None
  • Parameters

    • request: Request
    • forefront: bool = False (keyword-only)

    Returns None
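`reclaim_request` returns a fetched request to the list, for example so that a failed request can be retried. The sketch below assumes the reclaimed request leaves the in-progress set and is re-queued at the tail by default, or at the head when `forefront=True`. `ToyList` is hypothetical.

```python
from __future__ import annotations

import asyncio
from collections import deque


class ToyList:
    """Hypothetical model of returning a leased request for another attempt."""

    def __init__(self, urls: list[str]) -> None:
        self._pending: deque[str] = deque(urls)
        self._in_progress: set[str] = set()

    async def fetch_next_request(self) -> str | None:
        if not self._pending:
            return None
        url = self._pending.popleft()
        self._in_progress.add(url)
        return url

    async def reclaim_request(self, url: str, *, forefront: bool = False) -> None:
        self._in_progress.remove(url)
        # Retry next (head) or after everything already queued (tail).
        if forefront:
            self._pending.appendleft(url)
        else:
            self._pending.append(url)


async def demo(forefront: bool) -> list[str]:
    lst = ToyList(["https://example.com/a", "https://example.com/b", "https://example.com/c"])
    failed = await lst.fetch_next_request()  # pretend processing of "a" failed
    await lst.reclaim_request(failed, forefront=forefront)
    return list(lst._pending)


print(asyncio.run(demo(forefront=False)))
# ['https://example.com/b', 'https://example.com/c', 'https://example.com/a']
print(asyncio.run(demo(forefront=True)))
# ['https://example.com/a', 'https://example.com/b', 'https://example.com/c']
```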

Properties

name

name: str