Version: Next

StagehandCrawler

StagehandCrawler provides AI-powered web crawling using Browserbase's Stagehand library.

It extends BrowserCrawler and adds natural language interaction capabilities:

  • page.act() - Perform actions using natural language
  • page.extract() - Extract structured data with AI
  • page.observe() - Get AI-suggested actions
  • page.agent() - Create autonomous agents for complex workflows

The crawler automatically applies anti-blocking features including browser fingerprinting, making it suitable for crawling sites with bot protection like Cloudflare.

Example:

import { StagehandCrawler } from '@crawlee/stagehand';
import { z } from 'zod';

const crawler = new StagehandCrawler({
    stagehandOptions: {
        env: 'LOCAL',
        model: 'openai/gpt-4.1-mini',
        verbose: 1,
    },
    maxConcurrency: 3,
    async requestHandler({ page, request, log }) {
        log.info(`Crawling ${request.url}`);

        // Use AI to interact with the page
        await page.act('Click the Products link');
        await page.act('Scroll to load more items');

        // Extract structured data
        const products = await page.extract(
            'Get all product names and prices',
            z.object({
                items: z.array(z.object({
                    name: z.string(),
                    price: z.number(),
                })),
            }),
        );

        log.info(`Found ${products.items.length} products`);
    },
});

await crawler.run(['https://example.com']);

Constructors

constructor

  • Creates a new instance of StagehandCrawler.


    Parameters

    Returns StagehandCrawler

Properties

optional inherited autoscaledPool

autoscaledPool?: AutoscaledPool

A reference to the underlying AutoscaledPool class that manages the concurrency of the crawler.

NOTE: This property is only initialized after calling the crawler.run() function. We can use it to change the concurrency settings on the fly, to pause the crawler by calling autoscaledPool.pause() or to abort it by calling autoscaledPool.abort().
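
Because the pool only exists once crawler.run() has been called, code that adjusts concurrency on the fly needs to guard against it being undefined. The sketch below shows one way to do that. The `CrawlerWithPool` and `PoolLike` interfaces and the `limitConcurrency` helper are hypothetical stand-ins so the snippet compiles without Crawlee installed; they only assume that the pool exposes a writable `maxConcurrency`.

```typescript
// Structural stand-ins (assumptions), not Crawlee's own type definitions.
interface PoolLike {
    maxConcurrency: number;
}

interface CrawlerWithPool {
    autoscaledPool?: PoolLike;
}

/**
 * Caps concurrency on a running crawler (hypothetical helper).
 * Returns false when the pool does not exist yet, i.e. before crawler.run().
 */
function limitConcurrency(crawler: CrawlerWithPool, max: number): boolean {
    const pool = crawler.autoscaledPool;
    if (!pool) return false; // run() has not been called yet
    pool.maxConcurrency = max;
    return true;
}
```

Inside a request handler the crawler is already running, so a call such as `limitConcurrency(crawler, 1)` would be expected to take effect there; before `crawler.run()` it simply returns `false`.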

inherited browserPool

browserPool: BrowserPool<{ browserPlugins: [StagehandPlugin] }, never, never, never, never, never>

A reference to the underlying BrowserPool class that manages the crawler's browsers.

readonly inherited config

config: Configuration = ...

inherited hasFinishedBefore

hasFinishedBefore: boolean = false

inherited launchContext

launchContext: BrowserLaunchContext<LaunchOptions, unknown>

readonly inherited log

log: Log

optional inherited proxyConfiguration

proxyConfiguration?: ProxyConfiguration

A reference to the underlying ProxyConfiguration class that manages the crawler's proxies. Only available if used by the crawler.

optional inherited requestList

requestList?: IRequestList

A reference to the underlying RequestList class that manages the crawler's requests. Only available if used by the crawler.

optional inherited requestQueue

requestQueue?: RequestProvider

A reference to the underlying RequestQueue class that manages the crawler's requests: a dynamic queue of URLs to be processed, which is useful for recursive crawling of websites. Only available if used by the crawler.

readonly inherited router

router: RouterHandler<{ request: LoadedRequest<Request<Dictionary>> } & Omit<StagehandCrawlingContext<Dictionary>, request>> = ...

Default Router instance that will be used if we don't specify any requestHandler. See router.addHandler() and router.addDefaultHandler().
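
As a rough illustration of how the default router might be populated when no requestHandler is passed, the sketch below registers one labelled handler and one fallback. `RouterLike`, `ContextLike` and `registerRoutes` are hypothetical stand-ins that mirror only the two methods named above, and the `'DETAIL'` label is an assumption chosen for the example.

```typescript
// Structural stand-ins (assumptions) for the handler context and the router.
interface ContextLike {
    request: { url: string; label?: string };
    log: { info(message: string): void };
}

type Handler = (context: ContextLike) => Promise<void>;

interface RouterLike {
    addHandler(label: string, handler: Handler): void;
    addDefaultHandler(handler: Handler): void;
}

/** Registers one labelled route and a fallback (the label is hypothetical). */
function registerRoutes(router: RouterLike): void {
    // Intended for requests that were enqueued with the label 'DETAIL'.
    router.addHandler('DETAIL', async ({ request, log }) => {
        log.info(`Detail page: ${request.url}`);
    });

    // Intended for requests whose label matches no other handler.
    router.addDefaultHandler(async ({ request, log }) => {
        log.info(`Listing page: ${request.url}`);
    });
}
```

In real code the same two calls would normally be made directly on `crawler.router` of a crawler created without a requestHandler.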

inherited running

running: boolean = false

optional inherited sessionPool

sessionPool?: SessionPool

A reference to the underlying SessionPool class that manages the crawler's sessions. Only available if used by the crawler.

readonly inherited stats

stats: Statistics

A reference to the underlying Statistics class that collects and logs run statistics for requests.

Methods

inherited addRequests

  • Adds requests to the queue in batches. By default, it will resolve after the initial batch is added, and continue adding the rest in background. You can configure the batch size via batchSize option and the sleep time in between the batches via waitBetweenBatchesMillis. If you want to wait for all batches to be added to the queue, you can use the waitForAllRequestsToBeAdded promise you get in the response object.

    This is an alias for calling addRequestsBatched() on the implicit RequestQueue for this crawler instance.


    Parameters

    Returns Promise<CrawlerAddRequestsResult>
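
The batching behaviour described for addRequests can be sketched as follows. The interfaces are hypothetical stand-ins that model only the `batchSize` and `waitBetweenBatchesMillis` options and the `waitForAllRequestsToBeAdded` promise mentioned above; the numeric values and the `enqueueAll` helper are assumptions for illustration.

```typescript
// Structural stand-ins (assumptions) mirroring the options and result described above.
interface AddRequestsOptionsLike {
    batchSize?: number;
    waitBetweenBatchesMillis?: number;
}

interface AddRequestsResultLike {
    waitForAllRequestsToBeAdded: Promise<unknown>;
}

interface CrawlerWithQueue {
    addRequests(urls: string[], options?: AddRequestsOptionsLike): Promise<AddRequestsResultLike>;
}

/** Enqueues URLs in batches and resolves only once every batch has been added. */
async function enqueueAll(crawler: CrawlerWithQueue, urls: string[]): Promise<number> {
    // Resolves after the initial batch has been added.
    const result = await crawler.addRequests(urls, {
        batchSize: 500, // hypothetical value
        waitBetweenBatchesMillis: 200, // hypothetical value
    });

    // Without this await, the remaining batches keep being added in the background.
    await result.waitForAllRequestsToBeAdded;
    return urls.length;
}
```

Awaiting the second promise is only needed when later code depends on the whole list being in the queue; otherwise the first await is enough.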

inherited exportData

  • exportData<Data>(path, format, options): Promise<Data[]>
  • Retrieves all the data from the default crawler Dataset and exports them to the specified format. Supported formats are currently 'json' and 'csv', and will be inferred from the path automatically.


    Parameters

    Returns Promise<Data[]>
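
Since the format is inferred from the path, a path with any other extension is worth catching before the export runs. The function below is a hypothetical pre-check that illustrates extension-based inference for the two supported formats; it is not Crawlee's own inference logic, and the name `inferExportFormat` is an assumption.

```typescript
type ExportFormat = 'json' | 'csv';

/**
 * Hypothetical pre-check: maps a file path to one of the two supported formats
 * and throws for anything else.
 */
function inferExportFormat(path: string): ExportFormat {
    const dot = path.lastIndexOf('.');
    // A path without a dot has no extension to infer from.
    const extension = dot === -1 ? '' : path.slice(dot + 1).toLowerCase();
    if (extension === 'json' || extension === 'csv') return extension;
    throw new Error(`Unsupported export format for path: ${path}`);
}
```

A caller could run this check on a path such as `./storage/products.csv` and then pass the same path to `crawler.exportData()`.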

inherited getData

inherited getDataset

  • getDataset(idOrName): Promise<Dataset<Dictionary>>
  • Retrieves the specified Dataset, or the default crawler Dataset.


    Parameters

    • optional idOrName: string

    Returns Promise<Dataset<Dictionary>>

inherited getRequestQueue

  • Returns Promise<RequestProvider>

inherited pushData

  • pushData(data, datasetIdOrName): Promise<void>
  • Pushes data to the specified Dataset, or the default crawler Dataset by calling Dataset.pushData.


    Parameters

    • data: Dictionary | Dictionary[]
    • optional datasetIdOrName: string

    Returns Promise<void>

inherited run

  • Runs the crawler. Returns a promise that resolves once all the requests are processed and autoscaledPool.isFinished returns true.

    We can use the requests parameter to enqueue the initial requests — it is a shortcut for running crawler.addRequests() before crawler.run().


    Parameters

    Returns Promise<FinalStatistics>

inherited setStatusMessage

  • setStatusMessage(message, options): Promise<void>
  • This method is periodically called by the crawler, every statusMessageLoggingInterval seconds.


    Parameters

    Returns Promise<void>

inherited stop

  • stop(reason): void
  • Gracefully stops the current run of the crawler.

    All the tasks active at the time of calling this method will be allowed to finish.

    To stop the crawler immediately, use crawler.teardown() instead.


    Parameters

    • reason: string = 'The crawler has been gracefully stopped.'

    Returns void
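
One plausible use of a graceful stop is ending a crawl after a fixed number of items. The sketch below assumes only the `stop(reason)` signature shown above; `StoppableCrawler` and `createItemBudget` are hypothetical names introduced for the example.

```typescript
// Structural stand-in (assumption) exposing only the stop() method.
interface StoppableCrawler {
    stop(reason?: string): void;
}

/**
 * Returns a callback that counts processed items and requests a graceful
 * stop once `limit` is reached (hypothetical helper).
 */
function createItemBudget(crawler: StoppableCrawler, limit: number): () => number {
    let processed = 0;
    return () => {
        processed += 1;
        // Request the stop exactly once, when the budget is first exhausted.
        if (processed === limit) {
            crawler.stop(`Item budget of ${limit} reached.`);
        }
        return processed;
    };
}
```

The returned callback would typically be called at the end of the request handler. Because tasks that are already active are allowed to finish, the final count can end up slightly above the limit.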

inherited useState

  • useState<State>(defaultValue): Promise<State>
  • Parameters

    • defaultValue: State = ...

    Returns Promise<State>