BrowserPool

Manage a pool of browsers and pages, handling their lifecycle and resource allocation.

The BrowserPool is responsible for opening and closing browsers, managing pages within those browsers, and handling the overall lifecycle of these resources. It provides flexible configuration via constructor options, which include various hooks that allow for the insertion of custom behavior at different stages of the browser and page lifecycles.

The browsers in the pool can be in one of three states: active, inactive, or closed.

Methods

__aenter__

  • async __aenter__(): BrowserPool
  • Enter the context manager and initialize all browser plugins.


    Returns BrowserPool

__aexit__

  • async __aexit__(*, exc_type, exc_value, exc_traceback): None
  • Exit the context manager and close all browser plugins.


    Parameters

    • optional keyword-only exc_type: type[BaseException] | None
    • optional keyword-only exc_value: BaseException | None
    • optional keyword-only exc_traceback: TracebackType | None

    Returns None
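The enter and exit behavior described above can be illustrated with a small stand-alone sketch. ToyPlugin and ToyPool are hypothetical stand-ins written for this example, not Crawlee classes, and the use of AsyncExitStack is an assumption about one reasonable way to enter and close a group of plugins together.

```python
import asyncio
from contextlib import AsyncExitStack


class ToyPlugin:
    """Hypothetical stand-in for a browser plugin; not a Crawlee class."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.active = False

    async def __aenter__(self) -> "ToyPlugin":
        self.active = True
        return self

    async def __aexit__(self, exc_type, exc_value, exc_traceback) -> None:
        self.active = False


class ToyPool:
    """Hypothetical pool that enters and exits its plugins as a group."""

    def __init__(self, plugins: list[ToyPlugin]) -> None:
        self.plugins = plugins
        self.active = False
        self._stack = AsyncExitStack()

    async def __aenter__(self) -> "ToyPool":
        # Initialize every plugin; the stack remembers them for cleanup.
        for plugin in self.plugins:
            await self._stack.enter_async_context(plugin)
        self.active = True
        return self

    async def __aexit__(self, exc_type, exc_value, exc_traceback) -> None:
        # Close every plugin in reverse order, even if the body raised.
        await self._stack.aclose()
        self.active = False


async def demo() -> tuple[bool, bool]:
    plugins = [ToyPlugin("chromium"), ToyPlugin("firefox")]
    async with ToyPool(plugins) as pool:
        inside = pool.active and all(p.active for p in plugins)
    outside = pool.active or any(p.active for p in plugins)
    return inside, outside


inside, outside = asyncio.run(demo())
```

Inside the `async with` block the toy pool and all of its plugins report as active; once the block exits, none of them do.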

__init__

  • __init__(*, plugins, operation_timeout, browser_inactive_threshold, identify_inactive_browsers_interval, close_inactive_browsers_interval): None
  • A default constructor.


    Parameters

    • optional keyword-only plugins: Sequence[BaseBrowserPlugin] | None = None

      Browser plugins serve as wrappers around various browser automation libraries, providing a consistent interface across different libraries.

    • optional keyword-only operation_timeout: timedelta = timedelta(seconds=15)

      Operations of the underlying automation libraries, such as launching a browser or opening a new page, can sometimes get stuck. To prevent BrowserPool from becoming unresponsive, we add a timeout to these operations.

    • optional keyword-only browser_inactive_threshold: timedelta = timedelta(seconds=10)

      The period of inactivity after which a browser is considered inactive.

    • optional keyword-only identify_inactive_browsers_interval: timedelta = timedelta(seconds=20)

      The interval at which the pool checks for browsers that have been idle longer than browser_inactive_threshold and marks them as inactive.

    • optional keyword-only close_inactive_browsers_interval: timedelta = timedelta(seconds=30)

      The interval at which the pool checks for inactive browsers and closes them. A browser is considered inactive if it has no active pages and has been idle for the specified period.

    Returns None
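As a rough illustration of the inactivity rule stated in the parameter descriptions (no active pages and idle for the configured period), the sketch below classifies a few browser records. ToyBrowser and is_inactive are hypothetical names introduced for this example; this is not the pool's actual implementation, only the 10-second default comes from the parameter list.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Default value taken from the parameter list above.
BROWSER_INACTIVE_THRESHOLD = timedelta(seconds=10)


@dataclass
class ToyBrowser:
    """Hypothetical record of a browser's state; not a Crawlee class."""

    open_pages: int
    last_activity: datetime


def is_inactive(
    browser: ToyBrowser,
    now: datetime,
    threshold: timedelta = BROWSER_INACTIVE_THRESHOLD,
) -> bool:
    """Return True when the browser has no pages and has idled for at least `threshold`."""
    idle_for = now - browser.last_activity
    return browser.open_pages == 0 and idle_for >= threshold


# A fixed reference time keeps the example deterministic.
now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
busy = ToyBrowser(open_pages=2, last_activity=now - timedelta(seconds=60))
fresh = ToyBrowser(open_pages=0, last_activity=now - timedelta(seconds=3))
idle = ToyBrowser(open_pages=0, last_activity=now - timedelta(seconds=12))

results = [is_inactive(browser, now) for browser in (busy, fresh, idle)]
```

Only the third record qualifies: the first still has open pages, and the second has not been idle long enough.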

new_page

  • async new_page(*, page_id, browser_plugin, proxy_info): CrawleePage
  • Open a new page in a browser using the specified browser plugin or, if none is given, the next plugin in the rotation.


    Parameters

    • optional keyword-only page_id: str | None = None

      The ID to assign to the new page. If not provided, a random ID is generated.

    • optional keyword-only browser_plugin: BaseBrowserPlugin | None = None

      The browser plugin to use for creating the new page. If not provided, the next plugin in the rotation is used.

    • optional keyword-only proxy_info: ProxyInfo | None = None

      The proxy configuration to use for the new page.

    Returns CrawleePage
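The defaults described for page_id and browser_plugin can be sketched with plain Python. The plugin names, the resolve_page_args helper, and the use of uuid4 for the random ID are all assumptions made for illustration; the source does not say how the pool generates IDs or implements its rotation.

```python
import itertools
import uuid
from typing import Optional

# Hypothetical plugin names standing in for BaseBrowserPlugin instances.
plugins = ["chromium-plugin", "firefox-plugin", "webkit-plugin"]
rotation = itertools.cycle(plugins)


def resolve_page_args(
    page_id: Optional[str] = None,
    browser_plugin: Optional[str] = None,
) -> tuple[str, str]:
    """Fill in a random ID and the next plugin in rotation when they are omitted."""
    chosen_id = page_id if page_id is not None else uuid.uuid4().hex
    # The rotation only advances when no plugin is given explicitly.
    chosen_plugin = browser_plugin if browser_plugin is not None else next(rotation)
    return chosen_id, chosen_plugin


first = resolve_page_args()
second = resolve_page_args(page_id="checkout")
third = resolve_page_args(browser_plugin="webkit-plugin")
fourth = resolve_page_args()
```

In this sketch an explicit plugin does not consume a slot in the rotation, so the fourth call receives the third plugin in the list.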

new_page_with_each_plugin

  • async new_page_with_each_plugin(): Sequence[CrawleePage]
  • Create a new page with each browser plugin in the pool.

    This method is useful for running scripts in multiple environments simultaneously, typically for testing or website analysis. Each page is created using a different browser plugin, allowing you to interact with various browser types concurrently.


    Returns Sequence[CrawleePage]
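One way to picture "a page per plugin, concurrently" is asyncio.gather over one coroutine per plugin. The sketch below uses hypothetical string stand-ins instead of real plugins and pages, and it assumes concurrency via gather; it is not taken from the pool's source.

```python
import asyncio


async def open_page_with(plugin_name: str) -> str:
    """Hypothetical stand-in for opening a page with one plugin."""
    await asyncio.sleep(0)  # yield control, as a real browser launch would
    return f"page-from-{plugin_name}"


async def open_page_with_each(plugin_names: list[str]) -> list[str]:
    # gather runs the coroutines concurrently and preserves input order.
    return await asyncio.gather(*(open_page_with(name) for name in plugin_names))


pages = asyncio.run(open_page_with_each(["chromium", "firefox", "webkit"]))
```

Because gather preserves input order, the resulting pages line up with the order of the plugins.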

with_default_plugin

  • with_default_plugin(*, browser_type, browser_launch_options, browser_new_context_options, headless, kwargs): BrowserPool
  • Create a new instance with a single PlaywrightBrowserPlugin configured with the provided options.


    Parameters

    • optional keyword-only browser_type: BrowserType | None = None

      The type of browser to launch ('chromium', 'firefox', or 'webkit').

    • optional keyword-only browser_launch_options: Mapping[str, Any] | None = None

      Keyword arguments to pass to the browser launch method. These options are provided directly to Playwright's browser_type.launch method. For more details, refer to the Playwright documentation: https://playwright.dev/python/docs/api/class-browsertype#browser-type-launch.

    • optional keyword-only browser_new_context_options: Mapping[str, Any] | None = None

      Keyword arguments to pass to the browser new context method. These options are provided directly to Playwright's browser.new_context method. For more details, refer to the Playwright documentation: https://playwright.dev/python/docs/api/class-browser#browser-new-context.

    • optional keyword-only headless: bool | None = None

      Whether to run the browser in headless mode.

    • optional keyword-only kwargs: Any

      Additional keyword arguments passed to the default constructor.

    Returns BrowserPool
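A minimal usage sketch that combines with_default_plugin, the context manager methods, and new_page might look like the following. It assumes that BrowserPool can be imported from crawlee.browsers and that with_default_plugin is callable on the class itself; neither is stated on this page. The function name open_one_page and the page ID are made up for the example.

```python
from datetime import timedelta


async def open_one_page() -> int:
    # Assumption: BrowserPool is importable from crawlee.browsers.
    # Imported inside the function so this sketch loads without Crawlee installed.
    from crawlee.browsers import BrowserPool

    pool = BrowserPool.with_default_plugin(
        browser_type="firefox",
        headless=True,
        # Forwarded to the default constructor through kwargs.
        operation_timeout=timedelta(seconds=30),
    )
    async with pool:
        # Open a single page with an explicit, made-up ID.
        await pool.new_page(page_id="example-page")
        return pool.total_pages_count
```

To try it, call `asyncio.run(open_one_page())`; this requires Crawlee, Playwright, and the Firefox browser binaries to be installed.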

Properties

active

active: bool

Indicate whether the context is active.

active_browsers

active_browsers: Sequence[BaseBrowserController]

Return the active browsers in the pool.

inactive_browsers

inactive_browsers: Sequence[BaseBrowserController]

Return the inactive browsers in the pool.

pages

pages: Mapping[str, CrawleePage]

Return the pages in the pool.

plugins

plugins: Sequence[BaseBrowserPlugin]

Return the browser plugins.

total_pages_count

total_pages_count: int

Return the total number of pages opened since the browser pool was launched.