@crawlee/stagehand
Provides AI-powered web crawling using Stagehand for natural language browser automation. The enhanced page object offers page.act() to perform actions with plain English, page.extract() to get structured data with Zod schemas, and page.observe() to discover available actions.
Since StagehandCrawler uses AI models for page interaction, it is useful for crawling websites with complex or frequently changing layouts where traditional CSS selectors are difficult to maintain. If the target website has a stable structure, consider using PlaywrightCrawler, which is faster and doesn't require AI API keys.
The crawler extends BrowserCrawler and supports all standard Crawlee features including request queues, proxy rotation, autoscaling, and browser fingerprinting.
API Key Configuration
The apiKey option is interpreted based on the env setting:
- env: 'LOCAL' (default): apiKey is the LLM provider key (OpenAI, Anthropic, or Google)
- env: 'BROWSERBASE': apiKey is the Browserbase API key
const crawler = new StagehandCrawler({
    stagehandOptions: {
        model: 'openai/gpt-4.1-mini',
        apiKey: 'sk-...', // LLM API key for LOCAL env
    },
    // ...
});
Alternatively, you can use environment variables (used as fallback when apiKey is not provided):
- OpenAI: OPENAI_API_KEY
- Anthropic: ANTHROPIC_API_KEY
- Google: GOOGLE_API_KEY
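The fallback order described above can be sketched as a small helper. This is an illustration only, not the library's actual resolution logic: `resolveLlmApiKey` and `PROVIDER_ENV_VARS` are hypothetical names, and the sketch assumes the provider is the part of the model string before the first slash.

```typescript
// Hypothetical helper for illustration; not part of @crawlee/stagehand.
type EnvVars = Record<string, string | undefined>;

// Environment variable names from the list above, keyed by provider prefix.
const PROVIDER_ENV_VARS: Record<string, string> = {
    openai: 'OPENAI_API_KEY',
    anthropic: 'ANTHROPIC_API_KEY',
    google: 'GOOGLE_API_KEY',
};

function resolveLlmApiKey(
    model: string,
    apiKey: string | undefined,
    env: EnvVars,
): string | undefined {
    // An explicitly provided apiKey takes precedence over environment variables.
    if (apiKey) return apiKey;

    // Assumption: model strings look like "provider/model".
    const [provider = ''] = model.split('/');
    const envVarName = PROVIDER_ENV_VARS[provider];

    // Fall back to the provider's environment variable, if one is known.
    return envVarName ? env[envVarName] : undefined;
}
```

In a Node.js process, the third argument would typically be `process.env`.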
Example usage
import { StagehandCrawler } from '@crawlee/stagehand';
import { z } from 'zod';

const crawler = new StagehandCrawler({
    stagehandOptions: {
        model: 'openai/gpt-4.1-mini',
    },
    async requestHandler({ page, request, log }) {
        log.info(`Processing ${request.url}`);

        // Use natural language to interact with the page
        await page.act('Click the "Load More" button');

        // Extract structured data with AI
        const data = await page.extract(
            'Get all product names and prices',
            z.object({
                products: z.array(z.object({
                    name: z.string(),
                    price: z.number(),
                })),
            }),
        );

        log.info(`Found ${data.products.length} products`);
    },
});

await crawler.run(['https://example.com']);
Index
References
- AddRequestsBatchedOptions
- AddRequestsBatchedResult
- AutoscaledPool
- AutoscaledPoolOptions
- BaseHttpClient
- BaseHttpResponseData
- BASIC_CRAWLER_TIMEOUT_BUFFER_SECS
- BasicCrawler
- BasicCrawlerOptions
- BasicCrawlingContext
- BLOCKED_STATUS_CODES
- BrowserCrawler
- BrowserCrawlerOptions
- BrowserCrawlingContext
- BrowserErrorHandler
- BrowserHook
- BrowserLaunchContext
- BrowserRequestHandler
- checkStorageAccess
- Cheerio
- CheerioAPI
- CheerioRoot
- ClientInfo
- Configuration
- ConfigurationOptions
- Cookie
- CrawlerAddRequestsOptions
- CrawlerAddRequestsResult
- CrawlerExperiments
- CrawlerRunOptions
- CrawlingContext
- createBasicRouter
- CreateContextOptions
- CreateSession
- CriticalError
- Dataset
- DatasetConsumer
- DatasetContent
- DatasetDataOptions
- DatasetExportOptions
- DatasetExportToOptions
- DatasetIteratorOptions
- DatasetMapper
- DatasetOptions
- DatasetReducer
- Element
- enqueueLinks
- EnqueueLinksOptions
- EnqueueStrategy
- ErrnoException
- ErrorHandler
- ErrorSnapshotter
- ErrorTracker
- ErrorTrackerOptions
- EventManager
- EventType
- EventTypeName
- filterRequestsByPatterns
- FinalStatistics
- GetUserDataFromRequest
- GlobInput
- GlobObject
- GotScrapingHttpClient
- HttpRequest
- HttpRequestOptions
- HttpResponse
- IRequestList
- IRequestManager
- IStorage
- KeyConsumer
- KeyValueStore
- KeyValueStoreIteratorOptions
- KeyValueStoreOptions
- LoadedRequest
- LocalEventManager
- log
- Log
- Logger
- LoggerJson
- LoggerOptions
- LoggerText
- LogLevel
- MAX_POOL_SIZE
- NonRetryableError
- PERSIST_STATE_KEY
- PersistenceOptions
- processHttpRequestOptions
- ProxyConfiguration
- ProxyConfigurationFunction
- ProxyConfigurationOptions
- ProxyInfo
- PseudoUrl
- PseudoUrlInput
- PseudoUrlObject
- purgeDefaultStorages
- PushErrorMessageOptions
- QueueOperationInfo
- RecordOptions
- RecoverableState
- RecoverableStateOptions
- RecoverableStatePersistenceOptions
- RedirectHandler
- RegExpInput
- RegExpObject
- Request
- RequestHandler
- RequestHandlerResult
- RequestList
- RequestListOptions
- RequestListSourcesFunction
- RequestListState
- RequestManagerTandem
- RequestOptions
- RequestProvider
- RequestProviderOptions
- RequestQueue
- RequestQueueOperationOptions
- RequestQueueOptions
- RequestQueueV1
- RequestQueueV2
- RequestsLike
- RequestState
- RequestTransform
- ResponseLike
- ResponseTypes
- RestrictedCrawlingContext
- RetryRequestError
- Router
- RouterHandler
- RouterRoutes
- Session
- SessionError
- SessionOptions
- SessionPool
- SessionPoolOptions
- SessionState
- SitemapRequestList
- SitemapRequestListOptions
- SkippedRequestCallback
- SkippedRequestReason
- SnapshotResult
- Snapshotter
- SnapshotterOptions
- Source
- StatisticPersistedState
- Statistics
- StatisticsOptions
- StatisticState
- StatusMessageCallback
- StatusMessageCallbackParams
- StorageClient
- StorageManagerOptions
- StreamingHttpResponse
- SystemInfo
- SystemStatus
- SystemStatusOptions
- TieredProxy
- tryAbsoluteURL
- UrlPatternObject
- useState
- UseStateOptions
- withCheckedStorageAccess
Type Aliases
AgentConfig (external)
Type declaration
cua?: boolean (external, optional)
executionModel?: string | AgentModelConfig<string> (external, optional)
The model to use for tool execution (observe/act calls within agent tools). If not specified, inherits from the main model configuration. Format: "provider/model" (e.g., "openai/gpt-4o-mini", "google/gemini-2.0-flash-exp")
integrations?: (Client | string)[] (external, optional)
MCP integrations - Array of Client objects
mode?: AgentToolMode (external, optional)
Tool mode for the agent. Determines which set of tools are available.
- 'dom' (default): Uses DOM-based tools (act, fillForm) for structured interactions
- 'hybrid': Uses coordinate-based tools (click, type, dragAndDrop, clickAndHold, fillFormVision) for visual/screenshot-based interactions
- 'cua': Uses Computer Use Agent (CUA) providers for screenshot-based automation
model?: string | AgentModelConfig<string> (external, optional)
The model to use for agent functionality
stream?: boolean (external, optional)
Enable streaming mode for the agent. When true, execute() returns AgentStreamResult with textStream for incremental output. When false (default), execute() returns AgentResult after completion.
systemPrompt?: string (external, optional)
Custom system prompt to provide to the agent. Overrides the default system prompt.
tools?: ToolSet (external, optional)
Tools passed to the agent client
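To make the documented defaults concrete, here is a minimal sketch that uses local stand-in types rather than the real Stagehand types. `AgentConfigSketch`, `AgentToolModeSketch`, and `withDocumentedDefaults` are hypothetical names introduced for illustration; only the fields with plain string or boolean values are mirrored, and the library may apply its defaults differently.

```typescript
// Local stand-ins for illustration only; the real types come from Stagehand.
type AgentToolModeSketch = 'dom' | 'hybrid' | 'cua';

interface AgentConfigSketch {
    model?: string;          // "provider/model" format
    executionModel?: string; // "provider/model" format
    mode?: AgentToolModeSketch;
    stream?: boolean;
    systemPrompt?: string;
}

// Fills in the two defaults the descriptions above state explicitly:
// mode defaults to 'dom' and stream defaults to false.
function withDocumentedDefaults(
    config: AgentConfigSketch,
): AgentConfigSketch & { mode: AgentToolModeSketch; stream: boolean } {
    return {
        ...config,
        mode: config.mode ?? 'dom',
        stream: config.stream ?? false,
    };
}

// Example: a config that only names the models keeps the documented defaults.
const exampleConfig = withDocumentedDefaults({
    model: 'openai/gpt-4o-mini',
    executionModel: 'google/gemini-2.0-flash-exp',
});
```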
ModelConfiguration (external)
StagehandGotoOptions
Goto options for StagehandCrawler navigation.
AI-powered web crawling with Stagehand integration for Crawlee.
This package provides StagehandCrawler, which extends BrowserCrawler with natural language browser automation capabilities powered by Browserbase's Stagehand library.
Key Features
- page.act() to perform actions with plain English instructions
- page.extract() with Zod schemas for type-safe data extraction
- page.observe() to get AI-suggested actions
- page.agent() for complex multi-step workflows