PlaywrightCrawler

crawlee.playwright_crawler._playwright_crawler.PlaywrightCrawler

A crawler that leverages the Playwright browser automation library.

PlaywrightCrawler is a subclass of BasicCrawler and inherits all of its features, such as autoscaling of requests, request routing, and use of a RequestProvider. In addition, it offers Playwright-specific methods and properties, such as the page property for extracting data and the enqueue_links method for crawling other pages.

This crawler is well suited to websites that require JavaScript execution, because it uses headless browsers to load web pages and extract data. For websites that do not require JavaScript, consider BeautifulSoupCrawler instead: it uses raw HTTP requests and is much faster.

PlaywrightCrawler opens a new browser page (i.e., tab) for each Request object and invokes the user-provided request handler function via the Router. Users can interact with the page and extract the data using the Playwright API.

Note that the pool of browser instances used by PlaywrightCrawler, and the pages they open, are managed internally by the BrowserPool.
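The flow described above (one page per request, a handler registered on the router, data read through the page property, further links queued with enqueue_links) can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the import path is taken from the module path shown at the top of this page and may differ between Crawlee versions, while the start URL, the request limit, and the build_record helper are illustrative assumptions.

```python
def build_record(url: str, title: str) -> dict[str, str]:
    """Hypothetical helper: the shape of the record stored for each page."""
    return {'url': url, 'title': title.strip()}


async def main() -> None:
    # Imported inside the function so that build_record stays usable even
    # where Crawlee and Playwright are not installed. The import path is an
    # assumption based on the module path shown on this page.
    from crawlee.playwright_crawler import PlaywrightCrawler, PlaywrightCrawlingContext

    crawler = PlaywrightCrawler(
        headless=True,
        max_requests_per_crawl=10,  # assumed limit, forwarded to BasicCrawler
    )

    # The router invokes this handler once per request, with a fresh page.
    @crawler.router.default_handler
    async def request_handler(context: PlaywrightCrawlingContext) -> None:
        context.log.info(f'Processing {context.request.url}')

        # context.page is a regular Playwright page object.
        title = await context.page.title()
        await context.push_data(build_record(context.request.url, title))

        # Queue the links found on the current page for crawling.
        await context.enqueue_links()

    # Assumed start URL, used only for illustration.
    await crawler.run(['https://crawlee.dev'])
```

To try it, import asyncio and call `asyncio.run(main())` in an environment where Crawlee and the Playwright browsers are installed.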

Constructors

__init__

  • __init__(browser_pool, browser_type, headless, kwargs): None
  • Create a new instance.


    Parameters

    • browser_pool: BrowserPool | None = None
    • browser_type: BrowserType | None = None
    • headless: bool | None = None
    • kwargs: Unpack[BasicCrawlerOptions[PlaywrightCrawlingContext]]

    Returns None
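As a hedged illustration of the parameters listed above, the sketch below gathers the Playwright-specific arguments and one BasicCrawler option into a single dictionary and unpacks it into the constructor. The crawler_options and make_crawler helpers are hypothetical names, 'firefox' is assumed to be one of the accepted BrowserType values, and max_requests_per_crawl is assumed to be among the options accepted through kwargs. browser_pool is left unset here, on the assumption that browser_type and headless are meant for the case where the crawler builds its own pool.

```python
def crawler_options(debug: bool = False) -> dict:
    """Hypothetical helper that builds keyword arguments for PlaywrightCrawler."""
    return {
        'browser_type': 'firefox',     # assumed valid BrowserType value
        'headless': not debug,         # show the browser window while debugging
        'max_requests_per_crawl': 20,  # assumed BasicCrawler option, passed via kwargs
    }


def make_crawler(debug: bool = False):
    # Imported lazily; the path is an assumption based on the module path
    # shown on this page and may differ in other Crawlee versions.
    from crawlee.playwright_crawler import PlaywrightCrawler

    return PlaywrightCrawler(**crawler_options(debug))
```

Keeping the options in a plain dictionary makes it easy to switch between a visible browser for debugging and a headless one for regular runs without touching the handler code.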