StagehandCrawler
Hierarchy
- BrowserCrawler<{ browserPlugins: [StagehandPlugin] }, LaunchOptions, StagehandCrawlingContext>
- StagehandCrawler
Constructors
constructor
Creates a new instance of StagehandCrawler.
Parameters
options: StagehandCrawlerOptions = {}
Crawler configuration options
config: Configuration = ...
Returns StagehandCrawler
Properties
autoscaledPool (optional, inherited)
A reference to the underlying AutoscaledPool class that manages the concurrency of the crawler.
NOTE: This property is only initialized after calling the crawler.run() function. We can use it to change the concurrency settings on the fly, to pause the crawler by calling autoscaledPool.pause(), or to abort it by calling autoscaledPool.abort().
browserPool (inherited)
A reference to the underlying BrowserPool class that manages the crawler's browsers.
config (readonly, inherited)
hasFinishedBefore (inherited)
launchContext (inherited)
log (readonly, inherited)
proxyConfiguration (optional, inherited)
A reference to the underlying ProxyConfiguration class that manages the crawler's proxies. Only available if used by the crawler.
requestList (optional, inherited)
A reference to the underlying RequestList class that manages the crawler's requests. Only available if used by the crawler.
requestQueue (optional, inherited)
Dynamic queue of URLs to be processed. This is useful for recursive crawling of websites. A reference to the underlying RequestQueue class that manages the crawler's requests. Only available if used by the crawler.
router (readonly, inherited)
Default Router instance that will be used if we don't specify any requestHandler.
See router.addHandler() and router.addDefaultHandler().
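The following self-contained sketch models the dispatch rule a router applies. A handler registered for a request's label wins, and the default handler catches everything else. It is not Crawlee's implementation. `MiniRouter`, `MiniRequest`, `MiniContext` and `demoRouting` are hypothetical names. Keying on a `label` field of the request is an assumption made for illustration.

```typescript
// Hypothetical minimal shapes; Crawlee's real Request and context types are richer.
interface MiniRequest {
  url: string;
  label?: string;
}

interface MiniContext {
  request: MiniRequest;
}

type Handler = (ctx: MiniContext) => void | Promise<void>;

// Simplified model of label-based routing: look up a handler by request.label,
// otherwise fall back to the default handler.
class MiniRouter {
  private readonly handlers = new Map<string, Handler>();
  private defaultHandler: Handler | undefined;

  addHandler(label: string, handler: Handler): void {
    this.handlers.set(label, handler);
  }

  addDefaultHandler(handler: Handler): void {
    this.defaultHandler = handler;
  }

  async route(ctx: MiniContext): Promise<void> {
    const { label } = ctx.request;
    const handler =
      (label !== undefined ? this.handlers.get(label) : undefined) ?? this.defaultHandler;
    if (handler === undefined) {
      throw new Error(`No handler for label "${label ?? "(none)"}" and no default handler`);
    }
    await handler(ctx);
  }
}

// Usage: record which handler processed each URL.
async function demoRouting(): Promise<string[]> {
  const visited: string[] = [];
  const router = new MiniRouter();
  router.addHandler("DETAIL", ({ request }) => {
    visited.push(`detail:${request.url}`);
  });
  router.addDefaultHandler(({ request }) => {
    visited.push(`default:${request.url}`);
  });
  await router.route({ request: { url: "https://example.com/item/1", label: "DETAIL" } });
  await router.route({ request: { url: "https://example.com/" } });
  return visited;
}
```

Keeping handlers in a map keyed by label makes lookup constant-time. It also makes the default handler an explicit, single fallback rather than a chain of conditionals.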
running (inherited)
sessionPool (optional, inherited)
A reference to the underlying SessionPool class that manages the crawler's sessions. Only available if used by the crawler.
stats (readonly, inherited)
A reference to the underlying Statistics class that collects and logs run statistics for requests.
Methods
addRequests (inherited)
Adds requests to the queue in batches. By default, it resolves after the initial batch is added and continues adding the rest in the background. You can configure the batch size via the batchSize option and the sleep time between batches via waitBetweenBatchesMillis. If you want to wait for all batches to be added to the queue, you can use the waitForAllRequestsToBeAdded promise you get in the response object.
This is an alias for calling addRequestsBatched() on the implicit RequestQueue for this crawler instance.
Parameters
requests: RequestsLike
The requests to add
options: CrawlerAddRequestsOptions = {}
Options for the request queue
Returns Promise<CrawlerAddRequestsResult>
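The batching behavior described for addRequests can be modeled without Crawlee. In that model, the first batch is enqueued before the call resolves, and the rest are added in the background. A `waitForAllRequestsToBeAdded` promise serves callers who need everything enqueued.

In this sketch, the in-memory `queue` array, `addInBatches`, `BatchOptions`, `BatchResult` and `addedCount` are all hypothetical. Only the names `batchSize`, `waitBetweenBatchesMillis` and `waitForAllRequestsToBeAdded` come from the documentation above. The default values are arbitrary and are not Crawlee's defaults.

```typescript
// Hypothetical option and result shapes modeled on the documented names.
interface BatchOptions {
  batchSize?: number;
  waitBetweenBatchesMillis?: number;
}

interface BatchResult {
  addedCount: number; // requests enqueued before the returned promise resolved
  waitForAllRequestsToBeAdded: Promise<number>; // resolves with the total count
}

const sleep = (ms: number): Promise<void> => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Simplified model: the first batch is enqueued before resolving, and the
// remaining batches are added in the background with a pause between them.
// The defaults below are arbitrary, not Crawlee's defaults.
async function addInBatches(
  queue: string[],
  urls: string[],
  { batchSize = 2, waitBetweenBatchesMillis = 10 }: BatchOptions = {},
): Promise<BatchResult> {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const firstBatch = urls.slice(0, batchSize);
  queue.push(...firstBatch);

  const rest = urls.slice(batchSize);
  // Started but not awaited here, so the caller gets control back immediately.
  const waitForAllRequestsToBeAdded = (async () => {
    for (let i = 0; i < rest.length; i += batchSize) {
      await sleep(waitBetweenBatchesMillis);
      queue.push(...rest.slice(i, i + batchSize));
    }
    return urls.length;
  })();

  return { addedCount: firstBatch.length, waitForAllRequestsToBeAdded };
}
```

Awaiting the function alone guarantees only the first batch. Awaiting `waitForAllRequestsToBeAdded` as well guarantees the whole list.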
exportData (inherited)
Retrieves all the data from the default crawler Dataset and exports them to the specified format. Supported formats are currently 'json' and 'csv', and the format will be inferred from the path automatically.
Parameters
path: string
format (optional): json | csv
options (optional): DatasetExportOptions
Returns Promise<Data[]>
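The sketch below models two things: inferring the export format from the path, and a naive CSV serialization of flat records. `inferFormat`, `escapeCsv`, `toCsv` and `serialize` are hypothetical helpers. The CSV quoting (RFC 4180 style) and the union-of-keys header are simplifying assumptions, not necessarily what Crawlee writes.

```typescript
type ExportFormat = "json" | "csv";
type Cell = string | number | boolean | null;
type Row = Record<string, Cell>;

// Hypothetical helper: use the explicit format if given, else the file extension.
function inferFormat(path: string, format?: ExportFormat): ExportFormat {
  if (format !== undefined) return format;
  const dot = path.lastIndexOf(".");
  const ext = dot === -1 ? "" : path.slice(dot + 1).toLowerCase();
  if (ext === "json") return "json";
  if (ext === "csv") return "csv";
  throw new Error(`Cannot infer export format from "${path}"`);
}

// Quote a CSV field if it contains a comma, quote, or line break (RFC 4180 style).
function escapeCsv(value: Cell): string {
  const s = value === null ? "" : String(value);
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

// Naive CSV for flat records: the header is the union of keys in first-seen order,
// and missing keys become empty fields.
function toCsv(rows: Row[]): string {
  const headers: string[] = [];
  for (const row of rows) {
    for (const key of Object.keys(row)) {
      if (!headers.includes(key)) headers.push(key);
    }
  }
  const lines = [headers.map(escapeCsv).join(",")];
  for (const row of rows) {
    lines.push(headers.map((h) => escapeCsv(row[h] ?? null)).join(","));
  }
  return lines.join("\n");
}

function serialize(rows: Row[], path: string, format?: ExportFormat): string {
  return inferFormat(path, format) === "json" ? JSON.stringify(rows, null, 2) : toCsv(rows);
}
```

An explicit format always takes precedence over the extension, which matches the optional `format` parameter listed above.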
getData (inherited)
Retrieves data from the default crawler Dataset by calling Dataset.getData.
Parameters
...args (rest): [options: DatasetDataOptions]
Returns Promise<DatasetContent<Dictionary>>
getDataset (inherited)
getRequestQueue (inherited)
Returns Promise<RequestProvider>
pushData (inherited)
Pushes data to the specified Dataset, or the default crawler Dataset by calling Dataset.pushData.
Parameters
data: Dictionary | Dictionary[]
datasetIdOrName (optional): string
Returns Promise<void>
run (inherited)
Runs the crawler. Returns a promise that resolves once all the requests are processed and autoscaledPool.isFinished returns true.
We can use the requests parameter to enqueue the initial requests; it is a shortcut for running crawler.addRequests() before crawler.run().
Parameters
requests (optional): RequestsLike
The requests to add.
options (optional): CrawlerRunOptions
Options for the request queue.
Returns Promise<FinalStatistics>
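The toy crawler below shows that passing requests to run() is equivalent to calling addRequests() first. It is hypothetical and far simpler than Crawlee's: it processes requests sequentially, with no retries and no autoscaling. `MiniCrawler`, `MiniStats` and the `requestsFinished` field are illustrative assumptions rather than the real FinalStatistics shape.

```typescript
// Hypothetical toy crawler: sequential, no retries, no autoscaling.
interface MiniStats {
  requestsFinished: number;
}

type UrlHandler = (url: string) => Promise<void>;

class MiniCrawler {
  private readonly queue: string[] = [];
  private readonly handle: UrlHandler;

  constructor(handle: UrlHandler) {
    this.handle = handle;
  }

  async addRequests(urls: string[]): Promise<void> {
    this.queue.push(...urls);
  }

  // Passing `requests` is only a shortcut for addRequests() followed by run().
  async run(requests?: string[]): Promise<MiniStats> {
    if (requests !== undefined) await this.addRequests(requests);
    let requestsFinished = 0;
    // Resolve only once the queue has been drained.
    for (let url = this.queue.shift(); url !== undefined; url = this.queue.shift()) {
      await this.handle(url);
      requestsFinished += 1;
    }
    return { requestsFinished };
  }
}
```

Both call styles produce the same processing order and the same count, which is the equivalence the text above describes.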
setStatusMessage (inherited)
This method is periodically called by the crawler, every statusMessageLoggingInterval seconds.
Parameters
message: string
options: SetStatusMessageOptions = {}
Returns Promise<void>
stop (inherited)
Gracefully stops the current run of the crawler.
All the tasks active at the time of calling this method will be allowed to finish.
To stop the crawler immediately, use crawler.teardown() instead.
Parameters
reason: string = 'The crawler has been gracefully stopped.'
Returns void
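The following sketch models the graceful-stop semantics described above: once stop() is called, no new tasks start, but tasks already in flight run to completion. `GracefulRunner`, its `started`/`finished` arrays and the fixed-duration tasks are hypothetical. Crawlee's actual stop() and teardown() implementations are not modeled here and may work differently.

```typescript
const sleep = (ms: number): Promise<void> => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical runner illustrating graceful-stop semantics only.
class GracefulRunner {
  private stopping = false;
  readonly started: number[] = [];
  readonly finished: number[] = [];

  // Prevents new tasks from starting; tasks already in flight are not interrupted.
  stop(reason = "The runner has been gracefully stopped."): void {
    this.stopping = true;
    console.log(reason);
  }

  async run(taskIds: number[], concurrency: number, taskMillis: number): Promise<void> {
    const pending = [...taskIds];
    const worker = async (): Promise<void> => {
      // Each worker checks the stop flag before picking up the next task.
      while (!this.stopping && pending.length > 0) {
        const id = pending.shift() as number;
        this.started.push(id);
        await sleep(taskMillis); // stand-in for real work
        this.finished.push(id);
      }
    };
    await Promise.all(Array.from({ length: concurrency }, () => worker()));
  }
}
```

An immediate stop, by contrast, would also cancel or abandon the in-flight tasks. That is why the text points to a separate method for it.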
useState (inherited)
Parameters
defaultValue: State = ...
Returns Promise<State>
StagehandCrawler provides AI-powered web crawling using Browserbase's Stagehand library.
It extends BrowserCrawler and adds natural language interaction capabilities:
- page.act(): perform actions using natural language
- page.extract(): extract structured data with AI
- page.observe(): get AI-suggested actions
- page.agent(): create autonomous agents for complex workflows
The crawler automatically applies anti-blocking features, including browser fingerprinting, which makes it suitable for crawling sites with bot protection such as Cloudflare.