
BrowserPool

crawlee.browsers.browser_pool.BrowserPool

Manages a pool of browsers and their pages, handling lifecycle events and resource allocation.

This class is responsible for opening and closing browsers, managing pages within those browsers, and handling the overall lifecycle of these resources. It is configured through constructor options, including hooks for inserting custom behavior at different stages of the browser and page lifecycles.

The browsers in the pool can be in one of three states: active, inactive, or closed.
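
The three states can be pictured with a small stand-alone model. This is only a sketch and does not reproduce Crawlee's implementation: `BrowserState`, `BrowserInfo`, `classify`, and `can_close` are hypothetical names, and the rules they apply (a browser idle for at least a threshold counts as inactive, and an inactive browser with no open pages may be closed) are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import timedelta
from enum import Enum


class BrowserState(Enum):
    ACTIVE = 'active'
    INACTIVE = 'inactive'
    CLOSED = 'closed'


@dataclass
class BrowserInfo:
    """Hypothetical snapshot of one browser; not a Crawlee type."""

    idle_time: timedelta
    open_pages: int
    is_closed: bool = False


def classify(browser: BrowserInfo, inactive_threshold: timedelta) -> BrowserState:
    """Map a snapshot to one of the three states (assumed rule, see lead-in)."""
    if browser.is_closed:
        return BrowserState.CLOSED
    if browser.idle_time >= inactive_threshold:
        return BrowserState.INACTIVE
    return BrowserState.ACTIVE


def can_close(browser: BrowserInfo, inactive_threshold: timedelta) -> bool:
    """Assumed rule: only an inactive browser with no open pages may be closed."""
    state = classify(browser, inactive_threshold)
    return state is BrowserState.INACTIVE and browser.open_pages == 0
```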

Index

Constructors

__init__

  • __init__(plugins, *, operation_timeout, browser_inactive_threshold, identify_inactive_browsers_interval, close_inactive_browsers_interval): None
  • Create a new instance.


    Parameters

    • plugins: Sequence[BaseBrowserPlugin] | None = None
    • operation_timeout: timedelta = timedelta(seconds=15) (keyword-only)
    • browser_inactive_threshold: timedelta = timedelta(seconds=10) (keyword-only)
    • identify_inactive_browsers_interval: timedelta = timedelta(seconds=20) (keyword-only)
    • close_inactive_browsers_interval: timedelta = timedelta(seconds=30) (keyword-only)

    Returns None
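
As a minimal sketch of the constructor, the timing options can be collected in a dictionary and passed as keyword arguments. `POOL_OPTIONS` and `build_pool` are hypothetical names, the values are arbitrary examples rather than recommendations, and the import path `crawlee.browsers` is an assumption about where the class is re-exported.

```python
from datetime import timedelta

# Keyword-only timing options, named as in the parameter list above.
# The values are arbitrary examples, not recommended settings.
POOL_OPTIONS = {
    'operation_timeout': timedelta(seconds=30),
    'browser_inactive_threshold': timedelta(seconds=20),
    'identify_inactive_browsers_interval': timedelta(seconds=40),
    'close_inactive_browsers_interval': timedelta(seconds=60),
}


def build_pool():
    """Create a BrowserPool with custom timing options (hypothetical helper)."""
    # Imported lazily so this module still loads where Crawlee is not installed.
    # Assumption: BrowserPool is importable from `crawlee.browsers`.
    from crawlee.browsers import BrowserPool

    # `plugins` is left at its default of None; the rest are keyword-only.
    return BrowserPool(**POOL_OPTIONS)
```

Calling `build_pool()` only constructs the object; per the method descriptions, the plugins are initialized when the pool is entered as an async context manager.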

Methods

__aenter__

  • async __aenter__(): BrowserPool
  • Enter the context manager and initialize all browser plugins.


    Returns BrowserPool

__aexit__

  • async __aexit__(exc_type, exc_value, exc_traceback): None
  • Exit the context manager and close all browser plugins.


    Parameters

    • exc_type: type[BaseException] | None
    • exc_value: BaseException | None
    • exc_traceback: TracebackType | None

    Returns None

new_page

  • async new_page(*, page_id, browser_plugin): CrawleePage
  • Open a new page in a browser, using the specified browser plugin or, if none is given, a randomly chosen one.


    Parameters

    • page_id: str | None = None (keyword-only)
    • browser_plugin: BaseBrowserPlugin | None = None (keyword-only)

    Returns CrawleePage
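
A typical flow is to enter the pool as an async context manager and then request a page. The sketch below assumes that `CrawleePage` exposes the underlying Playwright page as `.page` and that `BrowserPool` is importable from `crawlee.browsers`; `fetch_title`, `main`, and the URL are hypothetical and only illustrate the call order.

```python
async def fetch_title(pool, url: str, page_id: str) -> str:
    """Open one page from an already-entered pool and return its title."""
    # `page_id` is keyword-only, as in the signature above.
    crawlee_page = await pool.new_page(page_id=page_id)
    # Assumption: the Playwright page is available as `.page`.
    page = crawlee_page.page
    try:
        await page.goto(url)
        return await page.title()
    finally:
        # Close the page even if navigation fails.
        await page.close()


async def main() -> None:
    # Imported lazily so this module still loads where Crawlee is not installed.
    from crawlee.browsers import BrowserPool

    # __aenter__ initializes the plugins; __aexit__ closes them.
    async with BrowserPool.with_default_plugin(headless=True) as pool:
        print(await fetch_title(pool, 'https://example.com', page_id='example'))
```

To try it, run `asyncio.run(main())` in an environment where Crawlee and its browser binaries are installed.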

new_page_with_each_plugin

  • async new_page_with_each_plugin(): Sequence[CrawleePage]
  • Create a new page with each browser plugin in the pool.

    This method is useful for running scripts in multiple environments simultaneously, typically for testing or website analysis. Each page is created using a different browser plugin, allowing you to interact with various browser types concurrently.


    Returns Sequence[CrawleePage]
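
The following sketch shows one way to use this for a cross-browser check: it loads the same URL once per plugin and collects the page titles. It assumes that `CrawleePage` has `.page` (the Playwright page) and `.browser_type` attributes, and that `PlaywrightBrowserPlugin` accepts a `browser_type` keyword; `titles_per_browser` and `main` are hypothetical names.

```python
import asyncio


async def titles_per_browser(pool, url: str) -> dict:
    """Load `url` once per plugin and map browser type to page title."""
    crawlee_pages = await pool.new_page_with_each_plugin()

    async def load(crawlee_page):
        # Assumption: `.page` is the Playwright page, `.browser_type` its browser name.
        await crawlee_page.page.goto(url)
        return crawlee_page.browser_type, await crawlee_page.page.title()

    # Run the navigations concurrently, one coroutine per returned page.
    results = await asyncio.gather(*(load(p) for p in crawlee_pages))
    return dict(results)


async def main() -> None:
    # Imported lazily so this module still loads where Crawlee is not installed.
    from crawlee.browsers import BrowserPool, PlaywrightBrowserPlugin

    plugins = [
        PlaywrightBrowserPlugin(browser_type='chromium'),
        PlaywrightBrowserPlugin(browser_type='firefox'),
    ]
    async with BrowserPool(plugins) as pool:
        print(await titles_per_browser(pool, 'https://example.com'))
```

Note that the result is keyed by browser type, so two plugins of the same type would overwrite each other in this sketch.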

with_default_plugin

  • with_default_plugin(*, headless, browser_type, kwargs): BrowserPool
  • Create a new instance with a single BaseBrowserPlugin configured with the provided options.


    Parameters

    • headless: bool | None = None (keyword-only)
    • browser_type: Literal['chromium', 'firefox', 'webkit'] | None = None (keyword-only)
    • kwargs: Any

    Returns BrowserPool
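
Because `browser_type` accepts only three literal values, it can help to validate a user-supplied name before building the pool. The sketch below assumes `with_default_plugin` is called on the class as a factory; `SUPPORTED_BROWSER_TYPES`, `resolve_browser_type`, and `default_pool` are hypothetical helpers, not part of Crawlee.

```python
# The three values listed for `browser_type` above.
SUPPORTED_BROWSER_TYPES = ('chromium', 'firefox', 'webkit')


def resolve_browser_type(name: str) -> str:
    """Normalize a browser name and reject unsupported values."""
    normalized = name.strip().lower()
    if normalized not in SUPPORTED_BROWSER_TYPES:
        raise ValueError(
            f'Unsupported browser type {name!r}; expected one of {SUPPORTED_BROWSER_TYPES}'
        )
    return normalized


def default_pool(browser_name: str = 'chromium'):
    """Build a single-plugin pool for the given browser (hypothetical helper)."""
    # Imported lazily so this module still loads where Crawlee is not installed.
    from crawlee.browsers import BrowserPool

    return BrowserPool.with_default_plugin(
        headless=True,
        browser_type=resolve_browser_type(browser_name),
    )
```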

Properties

active_browsers

active_browsers: Sequence[BaseBrowserController]

Return the active browsers in the pool.

inactive_browsers

inactive_browsers: Sequence[BaseBrowserController]

Return the inactive browsers in the pool.

pages

pages: Mapping[str, CrawleePage]

Return the pages in the pool.

plugins

plugins: Sequence[BaseBrowserPlugin]

Return the browser plugins.

total_pages_count

total_pages_count: int

Return the total number of pages opened since the browser pool was launched.
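
The properties can be combined into a quick snapshot for logging or debugging. `pool_stats` is a hypothetical helper that only reads the properties listed above; it accepts any object with those attributes.

```python
def pool_stats(pool) -> dict:
    """Summarize a pool using only its documented read-only properties."""
    return {
        'active_browsers': len(pool.active_browsers),
        'inactive_browsers': len(pool.inactive_browsers),
        'pages': len(pool.pages),
        'plugins': len(pool.plugins),
        'total_pages_count': pool.total_pages_count,
    }
```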