AutoscaledPool

crawlee.autoscaling.autoscaled_pool.AutoscaledPool

Manages a pool of asynchronous resource-intensive tasks that are executed in parallel.

The pool starts new tasks only when enough free CPU and memory are available. If any task raises an exception, the pool stops and the exception is propagated to the caller.

Index

Constructors

__init__

  • __init__(*, system_status, concurrency_settings, run_task_function, is_task_ready_function, is_finished_function, task_timeout, autoscale_interval, logging_interval, desired_concurrency_ratio, scale_up_step_ratio, scale_down_step_ratio): None
  • Initialize the AutoscaledPool.


    Parameters

    • system_status: SystemStatus (keyword-only)
    • concurrency_settings: ConcurrencySettings | None = None (keyword-only)
    • run_task_function: Callable[[], Awaitable] (keyword-only)
    • is_task_ready_function: Callable[[], Awaitable[bool]] (keyword-only)
    • is_finished_function: Callable[[], Awaitable[bool]] (keyword-only)
    • task_timeout: timedelta | None = None (keyword-only)
    • autoscale_interval: timedelta = timedelta(seconds=10) (keyword-only)
    • logging_interval: timedelta = timedelta(minutes=1) (keyword-only)
    • desired_concurrency_ratio: float = 0.9 (keyword-only)
    • scale_up_step_ratio: float = 0.05 (keyword-only)
    • scale_down_step_ratio: float = 0.05 (keyword-only)

    Returns None
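
The three callables in the parameter list are zero-argument async functions: one runs a unit of work, one reports whether work is ready, and one reports whether everything is done. A minimal sketch of how they might be written around an in-memory queue is shown here. The names `make_callbacks`, `drive_sequentially`, and `demo` are hypothetical, and `drive_sequentially` is only a stand-in that calls the callbacks one at a time so the sketch runs without Crawlee installed. It does not reflect how AutoscaledPool schedules or scales tasks.

```python
import asyncio
from collections import deque


def make_callbacks(items):
    """Build three zero-argument async callables matching the constructor's types."""
    pending = deque(items)  # hypothetical in-memory work queue
    results = []

    async def run_task_function() -> None:
        # Process one unit of work; here the "work" is doubling a number.
        item = pending.popleft()
        await asyncio.sleep(0)  # placeholder for real I/O
        results.append(item * 2)

    async def is_task_ready_function() -> bool:
        # A task can start whenever something is waiting in the queue.
        return bool(pending)

    async def is_finished_function() -> bool:
        # No more work will ever arrive once the queue is empty.
        return not pending

    return run_task_function, is_task_ready_function, is_finished_function, results


async def drive_sequentially(run_task, is_ready, is_finished) -> None:
    # Stand-in driver (NOT the AutoscaledPool): runs one task at a time.
    while not await is_finished():
        if await is_ready():
            await run_task()
        else:
            await asyncio.sleep(0)


def demo(items):
    run_task, is_ready, is_finished, results = make_callbacks(items)
    asyncio.run(drive_sequentially(run_task, is_ready, is_finished))
    return results
```

With a real pool, the same three callables would be passed as `run_task_function`, `is_task_ready_function`, and `is_finished_function`, alongside a `system_status` object.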

Methods

abort

  • async abort(): None
  • Interrupt the autoscaled pool and all the tasks in progress.


    Returns None

pause

  • pause(): None
  • Pause the autoscaled pool so that it does not start new tasks.


    Returns None

resume

  • resume(): None
  • Resume a paused autoscaled pool so that it continues starting new tasks.


    Returns None
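
Because `pause()` and `resume()` are synchronous and come in pairs, one possible way to keep them balanced is a small async context manager. The sketch below is an assumption about usage, not part of the library: `paused`, `maintenance`, and `StubPool` are hypothetical names, and the stub only records calls so the example runs on its own.

```python
import asyncio
from contextlib import asynccontextmanager


@asynccontextmanager
async def paused(pool):
    """Pause the pool for the duration of the block, then always resume it."""
    pool.pause()
    try:
        yield pool
    finally:
        # Runs even if the body raises, so the pool is not left paused.
        pool.resume()


async def maintenance(pool) -> None:
    async with paused(pool):
        await asyncio.sleep(0)  # placeholder for work done while no new tasks start


class StubPool:
    """Hypothetical stand-in exposing only pause() and resume()."""

    def __init__(self) -> None:
        self.calls = []

    def pause(self) -> None:
        self.calls.append("pause")

    def resume(self) -> None:
        self.calls.append("resume")
```

Note that, per the descriptions above, pausing only stops new tasks from starting; tasks already in progress are not interrupted.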

run

  • async run(): None
  • Start the autoscaled pool and return when all tasks are completed and is_finished_function returns True.

    If there is an exception in one of the tasks, it will be re-raised.


    Returns None
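
Since `run()` re-raises a task's exception, callers typically need a `try`/`except` around the await. The following is a minimal sketch under that assumption; `run_and_report` and `StubPool` are hypothetical names, and the stub merely imitates a pool whose `run()` either completes or raises.

```python
import asyncio


async def run_and_report(pool) -> str:
    """Await pool.run() and describe how it ended."""
    try:
        await pool.run()
    except Exception as exc:
        # An exception raised inside a task surfaces here.
        return f"failed: {exc!r}"
    return "finished"


class StubPool:
    """Hypothetical stand-in exposing only an async run()."""

    def __init__(self, error=None) -> None:
        self._error = error

    async def run(self) -> None:
        if self._error is not None:
            raise self._error
```

For example, `asyncio.run(run_and_report(StubPool(ValueError("boom"))))` returns the string `failed: ValueError('boom')`.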

Properties

current_concurrency

current_concurrency: int

The number of concurrent tasks in progress.

desired_concurrency

desired_concurrency: int

The current desired concurrency, possibly updated by the pool according to system load.