Configuration
Methods
get_global_configuration
Retrieve the global instance of the configuration.
Provided mainly for backwards compatibility. It is recommended to use service_locator.get_configuration() instead.
Returns Self
Properties
available_memory_ratio
The maximum proportion of system memory to use. If memory_mbytes is not provided, this ratio is used to
calculate the maximum memory. This option is utilized by the Snapshotter.
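The fallback between memory_mbytes and available_memory_ratio can be sketched as follows. This is a minimal illustration, not Crawlee's implementation: the helper name resolve_max_memory_bytes and the total_system_bytes parameter are hypothetical, the caller is assumed to supply the total system memory, and one megabyte is assumed to mean 1024 * 1024 bytes.

```python
from typing import Optional


def resolve_max_memory_bytes(
    total_system_bytes: int,
    available_memory_ratio: float,
    memory_mbytes: Optional[int] = None,
) -> int:
    """Return the memory ceiling in bytes (hypothetical helper)."""
    if memory_mbytes is not None:
        # An explicit limit takes precedence over the ratio.
        # Assumption: 1 megabyte is treated as 1024 * 1024 bytes.
        return memory_mbytes * 1024 * 1024
    # Otherwise fall back to the configured share of total system memory.
    return int(total_system_bytes * available_memory_ratio)


eight_gib = 8 * 1024**3
limit_from_ratio = resolve_max_memory_bytes(eight_gib, 0.25)
limit_explicit = resolve_max_memory_bytes(eight_gib, 0.25, memory_mbytes=512)
```

With these illustrative inputs, the ratio-based limit is a quarter of the 8 GiB total, while the explicit value overrides the ratio entirely.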
default_browser_path
Specifies the path to the browser executable. Currently used primarily by Playwright-based features. This option
is passed directly to Playwright's browser_type.launch method as the executable_path argument. For more details,
refer to the Playwright documentation:
https://playwright.dev/docs/api/class-browsertype#browser-type-launch.
disable_browser_sandbox
Disables the sandbox for the browser. Currently primarily for Playwright-based features. This option
is passed directly to Playwright's browser_type.launch method as chromium_sandbox. For more details,
refer to the Playwright documentation:
https://playwright.dev/docs/api/class-browsertype#browser-type-launch.
headless
Whether to run the browser in headless mode. Currently primarily for Playwright-based features. This option
is passed directly to Playwright's browser_type.launch method as headless. For more details,
refer to the Playwright documentation:
https://playwright.dev/docs/api/class-browsertype#browser-type-launch.
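The three browser options above (default_browser_path, disable_browser_sandbox, headless) correspond to keyword arguments of Playwright's browser_type.launch. A rough sketch of that mapping follows. The helper build_launch_options is hypothetical, and the inversion of the sandbox flag is an assumption: Playwright's chromium_sandbox enables the sandbox, so a "disable" setting presumably has to be negated.

```python
from typing import Any, Dict, Optional


def build_launch_options(
    headless: bool,
    disable_browser_sandbox: bool,
    default_browser_path: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for browser_type.launch (hypothetical helper)."""
    options: Dict[str, Any] = {
        "headless": headless,
        # Assumption: chromium_sandbox=True enables the sandbox in Playwright,
        # so the "disable" flag is inverted here.
        "chromium_sandbox": not disable_browser_sandbox,
    }
    if default_browser_path is not None:
        # Only override the executable when a path is configured.
        options["executable_path"] = default_browser_path
    return options


launch_options = build_launch_options(headless=True, disable_browser_sandbox=True)
```

The resulting dictionary could then be unpacked into a launch call, for example browser_type.launch(**launch_options), in code that already has a Playwright browser type at hand.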
internal_timeout
Timeout for internal asynchronous operations.
log_level
The logging level.
max_client_errors
The maximum number of client errors (HTTP 429) allowed before the system is considered overloaded.
This option is used by the Snapshotter.
max_event_loop_delay
The maximum event loop delay. If the event loop delay exceeds this value, the event loop is considered overloaded.
This option is used by the Snapshotter.
max_used_cpu_ratio
The maximum CPU usage ratio. If the CPU usage exceeds this value, the system is considered overloaded.
This option is used by the Snapshotter.
max_used_memory_ratio
The maximum memory usage ratio. If the memory usage exceeds this ratio, the system is considered overloaded.
This option is used by the Snapshotter.
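The ratio thresholds max_used_cpu_ratio and max_used_memory_ratio both describe the same kind of check: a measured usage is compared against a configured ceiling. A minimal sketch of that comparison is shown below. The function exceeds_ratio and all numeric values are illustrative assumptions and do not reflect the Snapshotter's actual code or Crawlee's defaults.

```python
def exceeds_ratio(used: float, total: float, max_ratio: float) -> bool:
    """Return True when used/total is strictly above max_ratio (hypothetical helper)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return used / total > max_ratio


# Illustrative values only; these are not Crawlee's defaults.
cpu_overloaded = exceeds_ratio(used=0.97, total=1.0, max_ratio=0.95)
memory_overloaded = exceeds_ratio(used=3 * 1024**3, total=4 * 1024**3, max_ratio=0.9)
```

Here the CPU sample is above its ceiling while the memory sample (75% of the total) stays below the 90% ceiling.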
memory_mbytes
The maximum used memory in megabytes. This option is utilized by the Snapshotter.
model_config
persist_state_interval
Interval at which PersistState events are emitted. The event ensures that state is persisted during
the crawler run. This option is utilized by the EventManager.
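Interval-based emission of this kind can be pictured with a small asyncio loop. The sketch below is only an illustration of the idea and makes no claim about how the EventManager is implemented: emit_periodically, the callback, and the bounded times parameter (used so the example terminates) are all hypothetical.

```python
import asyncio
from datetime import timedelta
from typing import Callable, List


async def emit_periodically(
    interval: timedelta,
    emit: Callable[[], None],
    times: int,
) -> None:
    """Call emit() once per interval, a fixed number of times (hypothetical helper)."""
    for _ in range(times):
        # Wait one full interval before each emission.
        await asyncio.sleep(interval.total_seconds())
        emit()


emitted: List[str] = []
asyncio.run(
    emit_periodically(
        interval=timedelta(milliseconds=10),
        emit=lambda: emitted.append("PersistState"),
        times=3,
    )
)
```

A real event manager would run such a loop for the lifetime of the crawler rather than a fixed number of iterations.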
purge_on_start
Whether to purge the storage on start. This option is utilized by the storage clients.
storage_dir
The path to the storage directory. This option is utilized by the storage clients.
system_info_interval
Interval at which SystemInfo events are emitted. The event represents the current status of the system.
This option is utilized by the LocalEventManager.
Configuration settings for the Crawlee project.
This class stores common configurable parameters for Crawlee. Default values are provided for all settings, so typically, no adjustments are necessary. However, you may modify settings for specific use cases, such as changing the default storage directory, the default storage IDs, the timeout for internal operations, and more.
Settings can also be configured via environment variables prefixed with CRAWLEE_.
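To illustrate the prefix convention, the sketch below collects prefixed variables from a mapping and strips the prefix. It is an assumption-laden illustration, not Crawlee's loading logic: the helper collect_prefixed_settings is hypothetical, the variable names in the example are illustrative, and the exact variable name for each setting should be checked against the Crawlee documentation.

```python
from typing import Dict, Mapping

PREFIX = "CRAWLEE_"


def collect_prefixed_settings(
    environ: Mapping[str, str],
    prefix: str = PREFIX,
) -> Dict[str, str]:
    """Return prefixed variables keyed by their lowercased suffix (hypothetical helper)."""
    return {
        name[len(prefix):].lower(): value
        for name, value in environ.items()
        # Skip unrelated variables and a bare prefix with no suffix.
        if name.startswith(prefix) and len(name) > len(prefix)
    }


# Illustrative environment; variable names are assumptions.
example_env = {
    "CRAWLEE_HEADLESS": "false",
    "CRAWLEE_STORAGE_DIR": "./my_storage",
    "PATH": "/usr/bin",
}
settings = collect_prefixed_settings(example_env)
```

Passing os.environ instead of example_env would apply the same filter to the real process environment. Values remain strings here; converting them to booleans, numbers, or intervals is left out of the sketch.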