
Configuration

Configuration settings for the Crawlee project.

This class stores common configurable parameters for Crawlee. Default values are provided for all settings, so typically no adjustments are necessary. However, you can modify settings for specific use cases, such as changing the default storage directory, the default storage IDs, or the timeout for internal operations.

Settings can also be configured via environment variables, prefixed with CRAWLEE_.
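To illustrate how environment variables and explicit values can combine, the sketch below resolves each setting from an explicit keyword argument first, then from a `CRAWLEE_`-prefixed environment variable, then from a built-in default. This ordering mirrors the default behavior of Pydantic settings classes, and it is an assumption that Crawlee resolves its settings the same way. The names `ConfigSketch` and `load_config`, the boolean parsing, and the default values are all hypothetical; Crawlee's real `Configuration` performs its own parsing and validation.

```python
from __future__ import annotations

import os
from dataclasses import dataclass, fields
from typing import Any, Mapping

ENV_PREFIX = "CRAWLEE_"


@dataclass
class ConfigSketch:
    """Hypothetical stand-in for three Configuration fields (not Crawlee's class)."""

    storage_dir: str = "./storage"  # assumed default
    persist_storage: bool = True  # assumed default
    purge_on_start: bool = True  # assumed default


def _parse(raw: str, target_type: type) -> Any:
    # Environment variables are always strings; convert booleans explicitly.
    if target_type is bool:
        return raw.strip().lower() in {"1", "true", "yes", "on"}
    return raw


def load_config(env: Mapping[str, str] | None = None, **overrides: Any) -> ConfigSketch:
    """Resolve each field: keyword override > CRAWLEE_<FIELD> variable > default."""
    source = os.environ if env is None else env
    values: dict[str, Any] = {}
    for field in fields(ConfigSketch):
        env_key = ENV_PREFIX + field.name.upper()
        if field.name in overrides:
            values[field.name] = overrides[field.name]
        elif env_key in source:
            values[field.name] = _parse(source[env_key], type(field.default))
        # Otherwise the dataclass default applies.
    return ConfigSketch(**values)
```

With the real class, the equivalent is exporting a variable such as `CRAWLEE_STORAGE_DIR` before starting the crawler.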


Methods

get_global_configuration

get_global_configuration(): Self

Retrieve the global instance of the configuration.

Returns Self
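Because `get_global_configuration` returns a shared instance rather than a fresh one, code that calls it from different places sees the same settings. The sketch below shows one common way to implement such an accessor (create lazily on first call, then cache on the class). `GlobalConfigSketch` is a hypothetical name, and this is not necessarily how Crawlee implements the method; in application code you would simply call `Configuration.get_global_configuration()`.

```python
from __future__ import annotations

from typing import ClassVar


class GlobalConfigSketch:
    """Hypothetical class illustrating a lazily created, shared configuration."""

    _global_instance: ClassVar[GlobalConfigSketch | None] = None

    def __init__(self, storage_dir: str = "./storage") -> None:
        self.storage_dir = storage_dir

    @classmethod
    def get_global_configuration(cls) -> GlobalConfigSketch:
        # Build the instance on first access and cache it on the class,
        # so every later caller receives the very same object.
        if cls._global_instance is None:
            cls._global_instance = cls()
        return cls._global_instance
```

Since the instance is shared, mutating it affects every component that reads it. That is convenient for application-wide settings but worth keeping in mind when writing tests.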

Properties

available_memory_ratio

available_memory_ratio: float

The ratio of system memory to use when memory_mbytes is not specified. The Snapshotter.available_memory_ratio is set to this value.
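As a worked example of how `available_memory_ratio` and `memory_mbytes` interact: with 16,384 MB of system memory, a ratio of 0.25, and no `memory_mbytes`, the memory limit works out to 16,384 × 0.25 = 4,096 MB. The hypothetical helper below encodes that rule. It assumes "system memory" means total physical memory and truncates the result to whole megabytes; neither detail is taken from Crawlee's source.

```python
from __future__ import annotations


def effective_memory_mbytes(
    memory_mbytes: int | None,
    available_memory_ratio: float,
    total_system_memory_mbytes: int,
) -> int:
    """Hypothetical helper: the memory budget a snapshotter-like component would use."""
    if memory_mbytes is not None:
        # An explicit limit takes precedence over the ratio.
        return memory_mbytes
    # Otherwise use a fraction of the machine's total memory.
    return int(total_system_memory_mbytes * available_memory_ratio)
```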

chrome_executable_path

chrome_executable_path: str | None

This setting is currently unused. For more details, see https://github.com/apify/crawlee-python/issues/670.

default_browser_path

default_browser_path: str | None

This setting is currently unused. For more details, see https://github.com/apify/crawlee-python/issues/670.

default_dataset_id

default_dataset_id: str

The default dataset ID.

default_key_value_store_id

default_key_value_store_id: str

The default key-value store ID.

default_request_queue_id

default_request_queue_id: str

The default request queue ID.

disable_browser_sandbox

disable_browser_sandbox: bool

This setting is currently unused. For more details, see https://github.com/apify/crawlee-python/issues/670.

headless

headless: bool

This setting is currently unused. For more details, see https://github.com/apify/crawlee-python/issues/670.

internal_timeout

internal_timeout: timedelta | None

Timeout for the internal asynchronous operations.

log_level

log_level: Literal['DEBUG', 'INFO', 'WARNING', 'ERROR', 'CRITICAL']

The logging level.
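The five accepted values are the names of Python's standard `logging` levels. The sketch below validates a name against that `Literal` and applies it to a logger. The helper name `apply_log_level` and the assumption that Crawlee's loggers live under the `crawlee` namespace are illustrative, not taken from Crawlee's source.

```python
import logging
from typing import Literal, get_args

LogLevel = Literal["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"]


def apply_log_level(level: str, logger_name: str = "crawlee") -> logging.Logger:
    """Hypothetical helper: validate a level name and apply it to a logger."""
    if level not in get_args(LogLevel):
        raise ValueError(f"unsupported log level: {level!r}")
    logger = logging.getLogger(logger_name)
    # The accepted names match the stdlib constants (logging.DEBUG, logging.INFO, ...).
    logger.setLevel(getattr(logging, level))
    return logger
```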

max_used_cpu_ratio

max_used_cpu_ratio: float

This setting is currently unused. For more details, see https://github.com/apify/crawlee-python/issues/670.

memory_mbytes

memory_mbytes: int | None

The maximum memory in megabytes. The Snapshotter.max_memory_size is set to this value.

model_config

model_config: Undefined

persist_state_interval

persist_state_interval: timedelta_ms

This setting is currently unused. For more details, see https://github.com/apify/crawlee-python/issues/670.

persist_storage

persist_storage: bool

Whether to persist the storage.

purge_on_start

purge_on_start: bool

Whether to purge the storage on start.

storage_dir

storage_dir: str

The path to the storage directory.
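The three `default_*_id` settings name the storages used when no explicit ID is given, and `storage_dir` is their common root. The sketch below assumes an on-disk layout of one sub-directory per storage type (`datasets`, `key_value_stores`, `request_queues`) with the ID as the folder name, and assumes each ID defaults to `default`. Both are assumptions for illustration, and `default_storage_paths` is a hypothetical helper; check the files Crawlee actually writes in your version.

```python
from __future__ import annotations

from pathlib import Path


def default_storage_paths(
    storage_dir: str,
    default_dataset_id: str = "default",
    default_key_value_store_id: str = "default",
    default_request_queue_id: str = "default",
) -> dict[str, Path]:
    """Hypothetical helper: where the default storages would live under storage_dir."""
    root = Path(storage_dir)
    return {
        "dataset": root / "datasets" / default_dataset_id,
        "key_value_store": root / "key_value_stores" / default_key_value_store_id,
        "request_queue": root / "request_queues" / default_request_queue_id,
    }
```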

system_info_interval

system_info_interval: timedelta_ms

This setting is currently unused. For more details, see https://github.com/apify/crawlee-python/issues/670.

verbose_log

verbose_log: bool

Whether to enable verbose logging.

write_metadata

write_metadata: bool

Whether to write the storage metadata.

xvfb

xvfb: bool

This setting is currently unused. For more details, see https://github.com/apify/crawlee-python/issues/670.