Version: 3.1

Configuration

Configuration is a value object that holds Crawlee configuration. By default, a global singleton instance of this class is available via Configuration.getGlobalConfig(). Components with configurable behaviour accept a Configuration instance and use the global instance as the default value.

Using global configuration:

import { BasicCrawler, Configuration } from 'crawlee';

// Get the global configuration
const config = Configuration.getGlobalConfig();
// Set the 'persistStateIntervalMillis' option
// of global configuration to 10 seconds
config.set('persistStateIntervalMillis', 10_000);

// No need to pass the configuration to the crawler,
// as it's using the global configuration by default
const crawler = new BasicCrawler();

Using custom configuration:

import { BasicCrawler, Configuration } from 'crawlee';

// Create a new configuration
const config = new Configuration({ persistStateIntervalMillis: 30_000 });
// Pass the configuration to the crawler
const crawler = new BasicCrawler({ /* crawler options */ }, config);

Configuration provided via environment variables always takes precedence. We can also define a crawlee.json file in the project root directory to serve as a baseline; options passed to the constructor override it. In other words, the precedence is:

crawlee.json < constructor options < environment variables
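
The precedence chain can be modelled in a few lines. The following is a minimal sketch, not Crawlee's actual implementation: `toEnvName` and `resolveOption` are hypothetical helpers introduced here for illustration, the environment is passed in as a plain object, and values from the environment are left as raw strings rather than converted to numbers or booleans.

```typescript
// Hypothetical model of the documented precedence; not Crawlee's own code.
type Options = Record<string, unknown>;
type Env = Record<string, string | undefined>;

// Derives the environment variable name from an option key, following the
// naming pattern visible in the options tables, e.g.
// 'persistStateIntervalMillis' -> 'CRAWLEE_PERSIST_STATE_INTERVAL_MILLIS'.
function toEnvName(key: string): string {
    return `CRAWLEE_${key.replace(/([A-Z])/g, '_$1').toUpperCase()}`;
}

// Resolves one option: environment first, then constructor options,
// then the crawlee.json baseline.
function resolveOption(key: string, fileOptions: Options, constructorOptions: Options, env: Env): unknown {
    const fromEnv = env[toEnvName(key)];
    if (fromEnv !== undefined) return fromEnv; // raw string, no type conversion in this sketch
    if (constructorOptions[key] !== undefined) return constructorOptions[key];
    return fileOptions[key];
}

// Sample inputs standing in for crawlee.json, constructor options, and the environment.
const fileOptions: Options = { persistStateIntervalMillis: 10_000, headless: false, logLevel: 'INFO' };
const constructorOptions: Options = { persistStateIntervalMillis: 30_000, headless: true };
const env: Env = { CRAWLEE_PERSIST_STATE_INTERVAL_MILLIS: '5000' };

// Set in all three sources: the environment wins.
const interval = resolveOption('persistStateIntervalMillis', fileOptions, constructorOptions, env);
// Set in the file and the constructor: the constructor wins.
const headless = resolveOption('headless', fileOptions, constructorOptions, env);
// Set only in the file: the baseline is used.
const logLevel = resolveOption('logLevel', fileOptions, constructorOptions, env);
```

Here `interval` comes from the environment, `headless` from the constructor options, and `logLevel` from the crawlee.json baseline.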

Supported Configuration Options

Key                         Environment Variable                    Default Value
memoryMbytes                CRAWLEE_MEMORY_MBYTES                   -
logLevel                    CRAWLEE_LOG_LEVEL                       -
headless                    CRAWLEE_HEADLESS                        true
defaultDatasetId            CRAWLEE_DEFAULT_DATASET_ID              'default'
defaultKeyValueStoreId      CRAWLEE_DEFAULT_KEY_VALUE_STORE_ID      'default'
defaultRequestQueueId       CRAWLEE_DEFAULT_REQUEST_QUEUE_ID        'default'
persistStateIntervalMillis  CRAWLEE_PERSIST_STATE_INTERVAL_MILLIS   60_000
purgeOnStart                CRAWLEE_PURGE_ON_START                  true
persistStorage              CRAWLEE_PERSIST_STORAGE                 true

Advanced Configuration Options

Key                         Environment Variable                    Default Value
inputKey                    CRAWLEE_INPUT_KEY                       'INPUT'
xvfb                        CRAWLEE_XVFB                            -
chromeExecutablePath        CRAWLEE_CHROME_EXECUTABLE_PATH          -
defaultBrowserPath          CRAWLEE_DEFAULT_BROWSER_PATH            -
disableBrowserSandbox       CRAWLEE_DISABLE_BROWSER_SANDBOX         -
availableMemoryRatio        CRAWLEE_AVAILABLE_MEMORY_RATIO          0.25

Constructors

constructor

  • new Configuration(options?: ConfigurationOptions): Configuration
  • Creates a new Configuration instance with the provided options. Environment variables take precedence over these options.


    Parameters

    • options: ConfigurationOptions = {}

    Returns Configuration

Methods

get

  • get<T, U>(key: T, defaultValue?: U): U
  • Returns the configured value. It first checks the environment variables, then the provided configuration, then falls back to the defaultValue argument if provided; otherwise it uses the default value listed in the tables above.


    Type parameters

    • T: keyof ConfigurationOptions
    • U: undefined | string | number | boolean | Dictionary<any> | StorageClient | (radix?: number) => string | () => number | (fractionDigits?: number) => string | (fractionDigits?: number) => string | (precision?: number) => string | ({ (locales?: string | string[], options?: NumberFormatOptions): string; (locales?: LocalesArgument, options?: NumberFormatOptions): string }) | EventManager

    Parameters

    • key: T
    • optional defaultValue: U

    Returns U

getEventManager

  • getEventManager(): EventManager
  • Returns EventManager

set

  • set(key: keyof ConfigurationOptions, value?: any): void
  • Sets the value for the given option. This only affects this Configuration instance; the value is not propagated to the environment variable. To reset a value, omit the value argument or pass undefined.


    Parameters

    • key: keyof ConfigurationOptions
    • optional value: any

    Returns void
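
The instance-local and reset semantics of set can be illustrated with a small stand-in class. This is a sketch under stated assumptions: `MiniConfig` is a hypothetical class written for this example, not Crawlee's Configuration, and it leaves out the environment variable lookup entirely so that only the behaviour of set and the defaultValue fallback are shown.

```typescript
// Hypothetical stand-in for illustration; not Crawlee's Configuration class.
// Environment variable lookup is deliberately omitted.
class MiniConfig {
    // Built-in defaults, copied from the options tables above.
    private static readonly defaults: Record<string, unknown> = {
        persistStateIntervalMillis: 60_000,
        headless: true,
    };

    // Values stored here belong to this instance only.
    private readonly options = new Map<string, unknown>();

    // Instance value first, then the defaultValue argument, then the built-in default.
    get(key: string, defaultValue?: unknown): unknown {
        if (this.options.has(key)) return this.options.get(key);
        if (defaultValue !== undefined) return defaultValue;
        return MiniConfig.defaults[key];
    }

    // Omitting the value (or passing undefined) removes the override.
    set(key: string, value?: unknown): void {
        if (value === undefined) this.options.delete(key);
        else this.options.set(key, value);
    }
}

const first = new MiniConfig();
const second = new MiniConfig();

// The override is visible on `first` only.
first.set('headless', false);
const overridden = first.get('headless');
const untouched = second.get('headless');

// Calling set without a value restores the built-in default.
first.set('headless');
const afterReset = first.get('headless');

// An option with no built-in default falls back to the defaultValue argument.
const withFallback = first.get('memoryMbytes', 4096);
```

In this sketch `overridden` is false while `untouched` stays true, `afterReset` is true again, and `withFallback` is 4096.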

useEventManager

  • useEventManager(events: EventManager): void
  • Parameters

    • events: EventManager

    Returns void

useStorageClient

  • useStorageClient(client: StorageClient): void
  • Parameters

    • client: StorageClient

    Returns void

static getEventManager

  • getEventManager(): EventManager

static getGlobalConfig

  • getGlobalConfig(): Configuration
  • Returns the global configuration instance. It will respect the environment variables.


    Returns Configuration

static getStorageClient

  • getStorageClient(): StorageClient

static resetGlobalState

  • resetGlobalState(): void
  • Resets the global configuration instance. The default instance holds configuration based on environment variables; to change them, we first need to reset the global state. Used mainly for testing purposes.


    Returns void
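
Why a reset helps in tests can be seen in a generic lazily created singleton. The sketch below is an assumption-laden model, not Crawlee's code: `GlobalConfigModel` is a hypothetical class that mirrors only the method names getGlobalConfig and resetGlobalState, and it does not read environment variables at all.

```typescript
// Hypothetical singleton model for illustration; not Crawlee's Configuration class.
class GlobalConfigModel {
    private static globalInstance: GlobalConfigModel | undefined;
    private readonly options = new Map<string, unknown>();

    // Creates the shared instance on first use and returns it afterwards.
    static getGlobalConfig(): GlobalConfigModel {
        if (GlobalConfigModel.globalInstance === undefined) {
            GlobalConfigModel.globalInstance = new GlobalConfigModel();
        }
        return GlobalConfigModel.globalInstance;
    }

    // Drops the shared instance so that the next call builds a fresh one.
    static resetGlobalState(): void {
        GlobalConfigModel.globalInstance = undefined;
    }

    set(key: string, value: unknown): void {
        this.options.set(key, value);
    }

    get(key: string): unknown {
        return this.options.get(key);
    }
}

// A first "test" mutates the shared instance.
const before = GlobalConfigModel.getGlobalConfig();
before.set('purgeOnStart', false);
const sameInstance = GlobalConfigModel.getGlobalConfig() === before;

// Resetting gives the next "test" a clean instance.
GlobalConfigModel.resetGlobalState();
const after = GlobalConfigModel.getGlobalConfig();
const isFresh = after !== before;
const leakedValue = after.get('purgeOnStart');
```

After the reset, `after` is a different object from `before` and no longer carries the value set earlier, which is the isolation a test suite typically wants between cases.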