PlaywrightCrawler
crawlee.playwright_crawler._playwright_crawler.PlaywrightCrawler
Constructors
__init__
Create a new instance.
Parameters
browser_pool: BrowserPool | None = None
browser_type: BrowserType | None = None
headless: bool | None = None
kwargs: Unpack[BasicCrawlerOptions[PlaywrightCrawlingContext]]
Returns None
A crawler that leverages the Playwright browser automation library.

PlaywrightCrawler is a subclass of BasicCrawler, inheriting all its features, such as autoscaling of requests, request routing, and utilization of RequestProvider. Additionally, it offers Playwright-specific methods and properties, like the page property for extracting data from the page, and the enqueue_links method for crawling other pages.

This crawler is ideal for crawling websites that require JavaScript execution, as it uses headless browsers to download web pages and extract data. For websites that do not require JavaScript, consider using BeautifulSoupCrawler, which uses raw HTTP requests and is much faster.

PlaywrightCrawler opens a new browser page (i.e., tab) for each Request object and invokes the user-provided request handler function via the Router. Users can interact with the page and extract the data using the Playwright API.

Note that the browser instances used by PlaywrightCrawler, and the pages they open, are managed internally by the BrowserPool.