Version: Next

@crawlee/core

Core set of classes required for Crawlee.

The crawlee package consists of several smaller packages, released separately under the @crawlee namespace.

Installing Crawlee

Most Crawlee packages extend and re-export each other, so it's enough to install just the one you plan to use. For example, @crawlee/playwright already contains everything from the @crawlee/browser package, which includes everything from @crawlee/basic, which in turn includes everything from @crawlee/core.

If we don't care much about additional code being pulled in, we can just use the crawlee meta-package, which contains (re-exports) most of the @crawlee/* packages, and therefore contains all the crawler classes.

npm install crawlee

Or if all we need is cheerio support, we can install only @crawlee/cheerio.

npm install @crawlee/cheerio

When using playwright or puppeteer, we still need to install those dependencies explicitly, which lets users control which version is used.

npm install crawlee playwright
# or npm install @crawlee/playwright playwright


To use utility methods from @crawlee/utils, install that package as well. It contains some utilities that were previously available under Apify.utils. Browser-related utilities can also be found in the crawler packages (e.g. @crawlee/playwright).

Index

Crawlers

Result Stores

Scaling

Sources

Other

Other

RequestQueueV2

Renames and re-exports RequestQueue

EventTypeName

EventTypeName: EventType | 'systemInfo' | 'persistState' | 'migrating' | 'aborting' | 'exit'

GetUserDataFromRequest

GetUserDataFromRequest<T>: T extends Request<infer Y> ? Y : never

Type parameters

  • T
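As a rough illustration of how this conditional type behaves, the sketch below applies the same `infer` pattern to a local stand-in class. `SketchRequest`, `UserDataOf`, `ProductData`, and `describeProduct` are hypothetical names invented for this example and are not part of @crawlee/core.

```typescript
// Hypothetical stand-in for Crawlee's Request class, reduced to the generic userData slot.
class SketchRequest<UserData extends Record<string, unknown> = Record<string, unknown>> {
    url: string;
    userData: UserData;

    constructor(url: string, userData: UserData) {
        this.url = url;
        this.userData = userData;
    }
}

// Same conditional type as documented above, applied to the stand-in class:
// it extracts the UserData type argument, or yields `never` for anything else.
type UserDataOf<T> = T extends SketchRequest<infer Y> ? Y : never;

type ProductData = { sku: string; page: number };

const productRequest = new SketchRequest<ProductData>('https://example.com/p/1', { sku: 'A-1', page: 1 });

// The parameter type resolves to ProductData, so `sku` and `page` are type-checked.
function describeProduct(data: UserDataOf<typeof productRequest>): string {
    return `${data.sku} (page ${data.page})`;
}

const productSummary = describeProduct(productRequest.userData);
```

The practical benefit is that a helper can be typed from a request value alone, without repeating the user-data type by hand.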

GlobInput

GlobInput: string | GlobObject

GlobObject

GlobObject: { glob: string } & Pick<RequestOptions, 'method' | 'payload' | 'label' | 'userData' | 'headers'>
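Because GlobInput accepts either a plain string or an object, consuming code often normalizes both forms into one shape first. The following is a minimal sketch under stated assumptions: `SketchRequestOptions` is a hypothetical subset of RequestOptions, and `toGlobObject` is a helper made up for this example rather than a Crawlee function.

```typescript
// Hypothetical subset of RequestOptions; the real interface has more fields.
type SketchRequestOptions = {
    method?: string;
    payload?: string;
    label?: string;
    userData?: Record<string, unknown>;
    headers?: Record<string, string>;
};

// Local mirrors of the documented aliases, built on the subset above.
type SketchGlobObject = { glob: string } & Pick<
    SketchRequestOptions,
    'method' | 'payload' | 'label' | 'userData' | 'headers'
>;
type SketchGlobInput = string | SketchGlobObject;

// Normalize either form into the object form so later code handles a single shape.
function toGlobObject(input: SketchGlobInput): SketchGlobObject {
    return typeof input === 'string' ? { glob: input } : input;
}

const globInputs: SketchGlobInput[] = [
    'https://example.com/blog/**',
    { glob: 'https://example.com/products/*', label: 'PRODUCT', userData: { depth: 1 } },
];

const normalizedGlobs = globInputs.map(toGlobObject);
```

The object form is what allows per-pattern request options, such as a label, to travel together with the pattern.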

LoadedRequest

LoadedRequest<R>: WithRequired<R, 'id' | 'loadedUrl'>

Type parameters

  • R
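To show what marking `id` and `loadedUrl` as required means in practice, here is a small sketch. The definition of `SketchWithRequired` is an assumption about how such a helper is commonly written, not a copy of Crawlee's source, and `SketchRequestShape`, `isLoaded`, and `finalUrl` are hypothetical names.

```typescript
// Assumed shape of a WithRequired-style helper: keep T, but make the keys K mandatory.
type SketchWithRequired<T, K extends keyof T> = T & Required<Pick<T, K>>;

// Hypothetical, heavily reduced request shape where both fields start out optional.
type SketchRequestShape = { url: string; id?: string; loadedUrl?: string };
type SketchLoadedRequest = SketchWithRequired<SketchRequestShape, 'id' | 'loadedUrl'>;

// Type guard that narrows a request to the loaded form once both fields are set.
function isLoaded(req: SketchRequestShape): req is SketchLoadedRequest {
    return req.id !== undefined && req.loadedUrl !== undefined;
}

function finalUrl(req: SketchRequestShape): string {
    // After narrowing, loadedUrl is a plain string and needs no undefined check.
    return isLoaded(req) ? req.loadedUrl : req.url;
}

const pendingUrl = finalUrl({ url: 'https://example.com/a' });
const loadedUrl = finalUrl({ url: 'https://example.com/a', id: 'r1', loadedUrl: 'https://example.com/b' });
```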

PseudoUrlInput

PseudoUrlInput: string | PseudoUrlObject

PseudoUrlObject

PseudoUrlObject: { purl: string } & Pick<RequestOptions, 'method' | 'payload' | 'label' | 'userData' | 'headers'>

RedirectHandler

RedirectHandler: (redirectResponse: BaseHttpResponseData, updatedRequest: { headers: SimpleHeaders; url?: string | URL }) => void

Type of a function called when an HTTP redirect takes place. It is allowed to mutate the updatedRequest argument.


Type declaration

    • (redirectResponse: BaseHttpResponseData, updatedRequest: { headers: SimpleHeaders; url?: string | URL }): void
    • Parameters

      • redirectResponse: BaseHttpResponseData
      • updatedRequest: { headers: SimpleHeaders; url?: string | URL }
        • headers: SimpleHeaders
        • optional url: string | URL

      Returns void
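One possible use of such a handler is to drop credentials when a redirect leaves the original origin. The sketch below uses local stand-in types: the fields of `SketchResponseData` and the definition of `SketchHeaders` are assumptions, since this page does not list the members of BaseHttpResponseData or SimpleHeaders, and `dropAuthOnCrossOrigin` is a hypothetical handler name.

```typescript
// Assumed header map shape; the real SimpleHeaders type may differ.
type SketchHeaders = Record<string, string | string[] | undefined>;

// Hypothetical subset of the redirect response data used by this example.
type SketchResponseData = { url: string; statusCode: number; headers: SketchHeaders };
type SketchUpdatedRequest = { headers: SketchHeaders; url?: string | URL };

// Mirrors the documented signature using the stand-in types.
type SketchRedirectHandler = (
    redirectResponse: SketchResponseData,
    updatedRequest: SketchUpdatedRequest,
) => void;

// Mutates updatedRequest in place, which the documented type explicitly allows.
const dropAuthOnCrossOrigin: SketchRedirectHandler = (redirectResponse, updatedRequest) => {
    if (updatedRequest.url === undefined) return;
    const fromOrigin = new URL(redirectResponse.url).origin;
    const toOrigin = new URL(String(updatedRequest.url), redirectResponse.url).origin;
    if (fromOrigin !== toOrigin) {
        // Only the lowercase key is handled here; other casings would need extra care.
        delete updatedRequest.headers['authorization'];
    }
};

const crossOriginRequest: SketchUpdatedRequest = {
    url: 'https://cdn.example.net/file',
    headers: { authorization: 'Bearer token', accept: 'text/html' },
};
dropAuthOnCrossOrigin({ url: 'https://example.com/start', statusCode: 302, headers: {} }, crossOriginRequest);
```

The handler returns nothing, so any effect has to come from mutating `updatedRequest` directly.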

RegExpInput

RegExpInput: RegExp | RegExpObject

RegExpObject

RegExpObject: { regexp: RegExp } & Pick<RequestOptions, 'method' | 'payload' | 'label' | 'userData' | 'headers'>

RequestListSourcesFunction

RequestListSourcesFunction: () => Promise<RequestListSource[]>

Type declaration

    • (): Promise<RequestListSource[]>
    • Returns Promise<RequestListSource[]>
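A function of this shape takes no arguments and resolves to an array of sources. The sketch below is illustrative only: `SketchSource` is a hypothetical, reduced stand-in for RequestListSource, and `loadSeedUrls` and the example URLs are invented.

```typescript
// Hypothetical, reduced stand-in for RequestListSource.
type SketchSource = { url: string; label?: string };

// Mirrors the documented signature: no arguments, resolves to an array of sources.
type SketchSourcesFunction = () => Promise<SketchSource[]>;

// Builds the sources on demand; a real function might read them from a file or an API instead.
const loadSeedUrls: SketchSourcesFunction = async () => {
    const slugs = ['alpha', 'beta'];
    return slugs.map((slug) => ({ url: `https://example.com/items/${slug}`, label: 'ITEM' }));
};

// The caller decides when to invoke the function and awaits the result.
const seedUrlsPromise = loadSeedUrls();
```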

RouterRoutes

RouterRoutes<Context, UserData>: { [ label in string | symbol ]: (ctx: Omit<Context, 'request'> & { request: Request<UserData> }) => Awaitable<void> }

Type parameters

  • Context
  • UserData: Dictionary
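The mapped type above describes an object whose keys are route labels and whose values are handlers that receive the context with a more precisely typed `request`. Below is a minimal sketch with local stand-ins: `SketchAwaitable`, `SketchReq`, `SketchContext`, `sketchRoutes`, and `dispatchRoute` are all hypothetical and only imitate the documented shape.

```typescript
type SketchAwaitable<T> = T | PromiseLike<T>;

// Hypothetical request and context shapes, reduced to what the example needs.
type SketchReq<UserData> = { url: string; userData: UserData };
type SketchContext = { request: SketchReq<Record<string, unknown>>; visited: string[] };

// Mirrors the documented mapped type: each label maps to a handler whose
// context carries a request typed with the given UserData.
type SketchRouterRoutes<Context, UserData extends Record<string, unknown>> = {
    [label in string | symbol]: (
        ctx: Omit<Context, 'request'> & { request: SketchReq<UserData> },
    ) => SketchAwaitable<void>;
};

type CategoryData = { category: string };
type CategoryContext = Omit<SketchContext, 'request'> & { request: SketchReq<CategoryData> };

const sketchRoutes: SketchRouterRoutes<SketchContext, CategoryData> = {
    LIST: (ctx) => {
        // userData.category is known to be a string here.
        ctx.visited.push(`list:${ctx.request.userData.category}`);
    },
    DETAIL: async (ctx) => {
        ctx.visited.push(`detail:${ctx.request.url}`);
    },
};

// Look up the handler for a label and call it, failing loudly on unknown labels.
function dispatchRoute(label: string, ctx: CategoryContext): SketchAwaitable<void> {
    const handler: ((ctx: CategoryContext) => SketchAwaitable<void>) | undefined = sketchRoutes[label];
    if (handler === undefined) throw new Error(`No route for label ${label}`);
    return handler(ctx);
}

const visitedLog: string[] = [];
dispatchRoute('LIST', {
    visited: visitedLog,
    request: { url: 'https://example.com/c/books', userData: { category: 'books' } },
});
```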

Source

Source: (Partial<RequestOptions> & { regex?: RegExp; requestsFromUrl?: string }) | Request

UrlPatternObject

UrlPatternObject: { glob?: string; regexp?: RegExp } & Pick<RequestOptions, 'method' | 'payload' | 'label' | 'userData' | 'headers'>

const BLOCKED_STATUS_CODES

BLOCKED_STATUS_CODES: number[] = ...

external const log

log: Log

const MAX_POOL_SIZE

MAX_POOL_SIZE: 1000 = 1000

const PERSIST_STATE_KEY

PERSIST_STATE_KEY: 'SDK_SESSION_POOL_STATE' = 'SDK_SESSION_POOL_STATE'