Version: Next

@crawlee/stagehand

Provides AI-powered web crawling using Stagehand for natural language browser automation. The enhanced page object offers page.act() to perform actions with plain English, page.extract() to get structured data with Zod schemas, and page.observe() to discover available actions.

Since StagehandCrawler uses AI models for page interaction, it is useful for crawling websites with complex or frequently changing layouts where traditional CSS selectors are difficult to maintain. If the target website has a stable structure, consider using PlaywrightCrawler, which is faster and doesn't require AI API keys.

The crawler extends BrowserCrawler and supports all standard Crawlee features including request queues, proxy rotation, autoscaling, and browser fingerprinting.

API Key Configuration

The apiKey option is interpreted based on the env setting:

  • env: 'LOCAL' (default): apiKey is the LLM provider key (OpenAI, Anthropic, or Google)
  • env: 'BROWSERBASE': apiKey is the Browserbase API key

const crawler = new StagehandCrawler({
    stagehandOptions: {
        model: 'openai/gpt-4.1-mini',
        apiKey: 'sk-...', // LLM API key for LOCAL env
    },
    // ...
});

Alternatively, you can use environment variables (used as fallback when apiKey is not provided):

  • OpenAI: OPENAI_API_KEY
  • Anthropic: ANTHROPIC_API_KEY
  • Google: GOOGLE_API_KEY
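
The fallback order described above can be sketched as a small pure function. This is only an illustration of the documented behaviour, not the package's implementation: `resolveLlmApiKey` and `PROVIDER_ENV_VARS` are hypothetical names, and the `anthropic` prefix is assumed by analogy with the `openai/...` and `google/...` model strings shown elsewhere on this page.

```typescript
// Hypothetical helper, not part of @crawlee/stagehand: mirrors the documented
// fallback from an explicit apiKey to a provider-specific environment variable.
type EnvVars = Record<string, string | undefined>;

// Variable names come from the list above; the provider prefixes are an
// assumption based on the "provider/model" format of the model string.
const PROVIDER_ENV_VARS: Record<string, string> = {
    openai: 'OPENAI_API_KEY',
    anthropic: 'ANTHROPIC_API_KEY',
    google: 'GOOGLE_API_KEY',
};

function resolveLlmApiKey(
    model: string,
    apiKey: string | undefined,
    env: EnvVars,
): string | undefined {
    // An explicitly configured key always wins.
    if (apiKey) return apiKey;

    // Otherwise derive the provider from the "provider/model" prefix
    // and look up the matching environment variable.
    const [provider = ''] = model.split('/');
    const envVarName = PROVIDER_ENV_VARS[provider];
    return envVarName ? env[envVarName] : undefined;
}
```

In a Node.js script, passing `process.env` as the third argument would reproduce the lookup against the real environment.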

Example usage

import { StagehandCrawler } from '@crawlee/stagehand';
import { z } from 'zod';

const crawler = new StagehandCrawler({
    stagehandOptions: {
        model: 'openai/gpt-4.1-mini',
    },
    async requestHandler({ page, request, log }) {
        log.info(`Processing ${request.url}`);

        // Use natural language to interact with the page
        await page.act('Click the "Load More" button');

        // Extract structured data with AI
        const data = await page.extract(
            'Get all product names and prices',
            z.object({
                products: z.array(z.object({
                    name: z.string(),
                    price: z.number(),
                })),
            }),
        );

        log.info(`Found ${data.products.length} products`);
    },
});

await crawler.run(['https://example.com']);

AI-powered web crawling with Stagehand integration for Crawlee.

This package provides StagehandCrawler, which extends BrowserCrawler with natural language browser automation capabilities powered by Browserbase's Stagehand library.

Key Features

  • Natural Language Actions: Use page.act() to perform actions with plain English instructions
  • Structured Data Extraction: Use page.extract() with Zod schemas for type-safe data extraction
  • Action Discovery: Use page.observe() to get AI-suggested actions
  • Autonomous Agents: Use page.agent() for complex multi-step workflows
  • Anti-Blocking: Automatic browser fingerprinting and Cloudflare bypass
  • Browserbase Integration: Optional cloud browser support

Example

// Dataset is re-exported by this package (see References below)
import { StagehandCrawler, Dataset } from '@crawlee/stagehand';
import { z } from 'zod';

const crawler = new StagehandCrawler({
    stagehandOptions: {
        env: 'LOCAL',
        model: 'openai/gpt-4.1-mini',
    },
    async requestHandler({ page, request, log }) {
        log.info(`Processing ${request.url}`);

        // Use natural language to interact
        await page.act('Click the Products link');

        // Extract structured data
        const products = await page.extract(
            'Get all products',
            z.object({
                items: z.array(z.object({
                    name: z.string(),
                    price: z.number(),
                })),
            }),
        );

        // Store the extracted items in the default dataset
        await Dataset.pushData(products);
    },
});

await crawler.run(['https://example.com']);

References

AddRequestsBatchedOptions

AddRequestsBatchedResult

AutoscaledPool

Re-exports AutoscaledPool

AutoscaledPoolOptions

BaseHttpClient

Re-exports BaseHttpClient

BaseHttpResponseData

BASIC_CRAWLER_TIMEOUT_BUFFER_SECS

BasicCrawler

Re-exports BasicCrawler

BasicCrawlerOptions

BasicCrawlingContext

BLOCKED_STATUS_CODES

BrowserCrawler

Re-exports BrowserCrawler

BrowserCrawlerOptions

BrowserCrawlingContext

BrowserErrorHandler

BrowserHook

Re-exports BrowserHook

BrowserLaunchContext

BrowserRequestHandler

checkStorageAccess

Cheerio

Re-exports Cheerio

CheerioAPI

Re-exports CheerioAPI

CheerioRoot

Re-exports CheerioRoot

ClientInfo

Re-exports ClientInfo

Configuration

Re-exports Configuration

ConfigurationOptions

Cookie

Re-exports Cookie

CrawlerAddRequestsOptions

CrawlerAddRequestsResult

CrawlerExperiments

CrawlerRunOptions

CrawlingContext

Re-exports CrawlingContext

createBasicRouter

CreateContextOptions

CreateSession

Re-exports CreateSession

CriticalError

Re-exports CriticalError

Dataset

Re-exports Dataset

DatasetConsumer

Re-exports DatasetConsumer

DatasetContent

Re-exports DatasetContent

DatasetDataOptions

DatasetExportOptions

DatasetExportToOptions

DatasetIteratorOptions

DatasetMapper

Re-exports DatasetMapper

DatasetOptions

Re-exports DatasetOptions

DatasetReducer

Re-exports DatasetReducer

Element

Re-exports Element

enqueueLinks

Re-exports enqueueLinks

EnqueueLinksOptions

EnqueueStrategy

Re-exports EnqueueStrategy

ErrnoException

Re-exports ErrnoException

ErrorHandler

Re-exports ErrorHandler

ErrorSnapshotter

Re-exports ErrorSnapshotter

ErrorTracker

Re-exports ErrorTracker

ErrorTrackerOptions

EventManager

Re-exports EventManager

EventType

Re-exports EventType

EventTypeName

Re-exports EventTypeName

filterRequestsByPatterns

FinalStatistics

Re-exports FinalStatistics

GetUserDataFromRequest

GlobInput

Re-exports GlobInput

GlobObject

Re-exports GlobObject

GotScrapingHttpClient

HttpRequest

Re-exports HttpRequest

HttpRequestOptions

HttpResponse

Re-exports HttpResponse

IRequestList

Re-exports IRequestList

IRequestManager

Re-exports IRequestManager

IStorage

Re-exports IStorage

KeyConsumer

Re-exports KeyConsumer

KeyValueStore

Re-exports KeyValueStore

KeyValueStoreIteratorOptions

KeyValueStoreOptions

LoadedRequest

Re-exports LoadedRequest

LocalEventManager

log

Re-exports log

Log

Re-exports Log

Logger

Re-exports Logger

LoggerJson

Re-exports LoggerJson

LoggerOptions

Re-exports LoggerOptions

LoggerText

Re-exports LoggerText

LogLevel

Re-exports LogLevel

MAX_POOL_SIZE

Re-exports MAX_POOL_SIZE

NonRetryableError

PERSIST_STATE_KEY

PersistenceOptions

processHttpRequestOptions

ProxyConfiguration

ProxyConfigurationFunction

ProxyConfigurationOptions

ProxyInfo

Re-exports ProxyInfo

PseudoUrl

Re-exports PseudoUrl

PseudoUrlInput

Re-exports PseudoUrlInput

PseudoUrlObject

Re-exports PseudoUrlObject

purgeDefaultStorages

PushErrorMessageOptions

QueueOperationInfo

RecordOptions

Re-exports RecordOptions

RecoverableState

Re-exports RecoverableState

RecoverableStateOptions

RecoverableStatePersistenceOptions

RedirectHandler

Re-exports RedirectHandler

RegExpInput

Re-exports RegExpInput

RegExpObject

Re-exports RegExpObject

Request

Re-exports Request

RequestHandler

Re-exports RequestHandler

RequestHandlerResult

RequestList

Re-exports RequestList

RequestListOptions

RequestListSourcesFunction

RequestListState

Re-exports RequestListState

RequestManagerTandem

RequestOptions

Re-exports RequestOptions

RequestProvider

Re-exports RequestProvider

RequestProviderOptions

RequestQueue

Re-exports RequestQueue

RequestQueueOperationOptions

RequestQueueOptions

RequestQueueV1

Re-exports RequestQueueV1

RequestQueueV2

Re-exports RequestQueueV2

RequestsLike

Re-exports RequestsLike

RequestState

Re-exports RequestState

RequestTransform

Re-exports RequestTransform

ResponseLike

Re-exports ResponseLike

ResponseTypes

Re-exports ResponseTypes

RestrictedCrawlingContext

RetryRequestError

Router

Re-exports Router

RouterHandler

Re-exports RouterHandler

RouterRoutes

Re-exports RouterRoutes

Session

Re-exports Session

SessionError

Re-exports SessionError

SessionOptions

Re-exports SessionOptions

SessionPool

Re-exports SessionPool

SessionPoolOptions

SessionState

Re-exports SessionState

SitemapRequestList

SitemapRequestListOptions

SkippedRequestCallback

SkippedRequestReason

SnapshotResult

Re-exports SnapshotResult

Snapshotter

Re-exports Snapshotter

SnapshotterOptions

Source

Re-exports Source

StatisticPersistedState

Statistics

Re-exports Statistics

StatisticsOptions

StatisticState

Re-exports StatisticState

StatusMessageCallback

StatusMessageCallbackParams

StorageClient

Re-exports StorageClient

StorageManagerOptions

StreamingHttpResponse

SystemInfo

Re-exports SystemInfo

SystemStatus

Re-exports SystemStatus

SystemStatusOptions

TieredProxy

Re-exports TieredProxy

tryAbsoluteURL

Re-exports tryAbsoluteURL

UrlPatternObject

Re-exports UrlPatternObject

useState

Re-exports useState

UseStateOptions

Re-exports UseStateOptions

withCheckedStorageAccess

Type Aliases

AgentConfig (external)

AgentConfig: { cua?: boolean; executionModel?: string | AgentModelConfig<string>; integrations?: (Client | string)[]; mode?: AgentToolMode; model?: string | AgentModelConfig<string>; stream?: boolean; systemPrompt?: string; tools?: ToolSet }

Type declaration

  • cua?: boolean (external, optional, deprecated)

    Enables Computer Use Agent (CUA) mode. Deprecated: use mode: "cua" instead. This option will be removed in a future version.

  • executionModel?: string | AgentModelConfig<string> (external, optional)

    The model to use for tool execution (observe/act calls within agent tools). If not specified, it inherits from the main model configuration. Format: "provider/model" (e.g., "openai/gpt-4o-mini", "google/gemini-2.0-flash-exp").

  • integrations?: (Client | string)[] (external, optional)

    MCP integrations: an array of Client objects or strings.

  • mode?: AgentToolMode (external, optional)

    Tool mode for the agent. Determines which set of tools is available.

    • 'dom' (default): Uses DOM-based tools (act, fillForm) for structured interactions
    • 'hybrid': Uses coordinate-based tools (click, type, dragAndDrop, clickAndHold, fillFormVision) for visual/screenshot-based interactions
    • 'cua': Uses Computer Use Agent (CUA) providers for screenshot-based automation

  • model?: string | AgentModelConfig<string> (external, optional)

    The model to use for agent functionality.

  • stream?: boolean (external, optional)

    Enables streaming mode for the agent. When true, execute() returns AgentStreamResult with textStream for incremental output. When false (default), execute() returns AgentResult after completion.

  • systemPrompt?: string (external, optional)

    Custom system prompt to provide to the agent. Overrides the default system prompt.

  • tools?: ToolSet (external, optional)

    Tools passed to the agent client.
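
The relationship between the deprecated cua flag and the mode option can be sketched with a reduced local copy of the type. Everything below is illustrative: `AgentConfigSketch`, `resolveAgentMode`, and `migrateAgentConfig` are hypothetical names, only a few fields are modelled, and giving an explicit mode precedence over cua is an assumption rather than documented behaviour.

```typescript
// The three tool modes listed in the type declaration above.
type AgentToolMode = 'dom' | 'hybrid' | 'cua';

// Simplified stand-in for AgentConfig (assumption: only these fields are modelled).
interface AgentConfigSketch {
    cua?: boolean; // deprecated in favour of mode: 'cua'
    mode?: AgentToolMode;
    model?: string;
    systemPrompt?: string;
}

// Hypothetical helper: works out which tool mode a config asks for.
function resolveAgentMode(config: AgentConfigSketch): AgentToolMode {
    if (config.mode) return config.mode; // assumed to take precedence over cua
    if (config.cua) return 'cua'; // legacy flag maps to the 'cua' mode
    return 'dom'; // documented default
}

// Hypothetical helper: rewrites a legacy config to use mode only.
function migrateAgentConfig(config: AgentConfigSketch): AgentConfigSketch {
    const migrated: AgentConfigSketch = { ...config, mode: resolveAgentMode(config) };
    delete migrated.cua;
    return migrated;
}
```

A config written as `{ cua: true }` would come out of `migrateAgentConfig` as `{ mode: 'cua' }`, which is the form the deprecation notice recommends.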

ModelConfiguration (external)

ModelConfiguration: AvailableModel | (ClientOptions & { modelName: AvailableModel })
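
The union above accepts either a bare model name or an options object that carries the name in modelName. A minimal sketch of handling both forms follows. The local `AvailableModel` and `ClientOptions` types are simplified stand-ins (the real ones are broader and come from Stagehand and the provider SDKs), and `getModelName` is a hypothetical helper.

```typescript
// Simplified stand-ins (assumptions): the real types are more specific.
type AvailableModel = string;

interface ClientOptions {
    apiKey?: string; // illustrative field only
    baseURL?: string; // illustrative field only
}

// Same shape as the documented alias, built from the stand-ins above.
type ModelConfiguration = AvailableModel | (ClientOptions & { modelName: AvailableModel });

// Hypothetical helper: reads the model name from either form of the union.
function getModelName(config: ModelConfiguration): AvailableModel {
    return typeof config === 'string' ? config : config.modelName;
}

const shortForm: ModelConfiguration = 'openai/gpt-4.1-mini';
const longForm: ModelConfiguration = { modelName: 'openai/gpt-4.1-mini', apiKey: 'sk-...' };
```

Both `shortForm` and `longForm` resolve to the same model name, so code that only needs the name can treat them uniformly.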

StagehandGotoOptions

StagehandGotoOptions: Dictionary & Parameters<Page['goto']>[1]

Goto options for StagehandCrawler navigation.
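
How this alias is assembled may be easier to see with local stand-ins. In the sketch below, `PageSketch` replaces the real Page type and its option names are illustrative only. `Dictionary` is assumed to be an object type with arbitrary keys, and `GotoOptionsSketch` is a hypothetical name.

```typescript
// Stand-in for the browser Page type (assumption: reduced to a single method).
interface PageSketch {
    goto(
        url: string,
        options?: { timeout?: number; waitUntil?: 'load' | 'domcontentloaded' },
    ): Promise<void>;
}

// Assumed shape of Crawlee's Dictionary: an object with arbitrary keys.
type Dictionary = Record<string, unknown>;

// Parameters<...> yields the tuple of goto()'s argument types; index 1 is the
// options argument. Intersecting with Dictionary also allows extra keys.
type GotoOptionsSketch = Dictionary & Parameters<PageSketch['goto']>[1];

const gotoOptions: GotoOptionsSketch = {
    timeout: 30000,
    waitUntil: 'domcontentloaded',
    customFlag: true, // accepted because of the Dictionary part
};
```

The known option names stay type-checked (a misspelt waitUntil value would be rejected), while additional keys pass through the Dictionary half of the intersection.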

Page Options