SqlStorageClient

SQL implementation of the storage client.

This storage client provides access to datasets, key-value stores, and request queues that persist data to a SQL database using SQLAlchemy 2+. Each storage type uses two tables: one for metadata and one for records.

The client accepts either a database connection string or a pre-configured AsyncEngine. If neither is provided, it creates a default SQLite database named 'crawlee.db' in the storage directory.

The database schema is created automatically during initialization. SQLite databases receive performance optimizations, including WAL mode and an increased cache size.
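
To make the SQLite optimizations concrete, here is a minimal sketch using only the standard-library sqlite3 module. It is not the client's own code: the helper name open_tuned_sqlite and the cache size value are assumptions chosen for illustration, and the client may apply different settings.

```python
import sqlite3
import tempfile
from pathlib import Path


def open_tuned_sqlite(path: Path) -> sqlite3.Connection:
    """Open a SQLite file and apply WAL mode plus a larger page cache."""
    conn = sqlite3.connect(path)
    # Write-ahead logging lets readers proceed while a writer is active.
    conn.execute("PRAGMA journal_mode=WAL")
    # A negative value is a size in KiB; -64000 is an assumed figure.
    conn.execute("PRAGMA cache_size=-64000")
    return conn


with tempfile.TemporaryDirectory() as tmp:
    conn = open_tuned_sqlite(Path(tmp) / "crawlee.db")
    # Read the settings back to confirm they took effect.
    journal_mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
    cache_size = conn.execute("PRAGMA cache_size").fetchone()[0]
    conn.close()
```

WAL mode requires a file-backed database, which is why the sketch writes to a temporary directory rather than to an in-memory database.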

Methods

__aenter__

__aexit__

  • async __aexit__(exc_type, exc_value, exc_traceback): None
  • Async context manager exit.


    Parameters

    • exc_type: type[BaseException] | None
    • exc_value: BaseException | None
    • exc_traceback: TracebackType | None

    Returns None

__init__

  • __init__(*, connection_string, engine): None
  • Initialize the SQL storage client.


    Parameters

    • optional, keyword-only connection_string: str | None = None

      Database connection string (e.g., "sqlite+aiosqlite:///crawlee.db"). If not provided, defaults to a SQLite database in the storage directory.

    • optional, keyword-only engine: AsyncEngine | None = None

      Pre-configured AsyncEngine instance. If provided, connection_string is ignored.

    Returns None
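
    A minimal usage sketch follows, assuming the client is importable from crawlee.storage_clients and that Configuration lives in crawlee.configuration with a get_global_configuration() class method. Neither import path is stated on this page, and the SQLite URL additionally assumes the aiosqlite driver is installed.

```python
import asyncio


async def main() -> None:
    # Imported here so the module loads even where crawlee is not installed.
    # Both import paths are assumptions; they are not given on this page.
    from crawlee.configuration import Configuration
    from crawlee.storage_clients import SqlStorageClient

    # Using the client as an async context manager scopes its lifetime.
    async with SqlStorageClient(
        connection_string="sqlite+aiosqlite:///crawlee.db",
    ) as storage_client:
        # Create the tables before the client is used.
        await storage_client.initialize(Configuration.get_global_configuration())
        print(storage_client.get_dialect_name())


# Run with: asyncio.run(main())
```

    Passing engine instead of connection_string would follow the same shape; per the parameter description above, connection_string is ignored when an engine is supplied.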

close

  • async close(): None
  • Close the database connection pool.


    Returns None

create_dataset_client

  • async create_dataset_client(*, id, name, alias, configuration): DatasetClient

create_kvs_client

create_rq_client

create_session

  • create_session(): AsyncSession
  • Create a new database session.


    Returns AsyncSession

get_accessed_modified_update_interval

  • get_accessed_modified_update_interval(): timedelta
  • Get the interval for accessed and modified updates.


    Returns timedelta
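
    One plausible use of such an interval is to throttle how often accessed and modified timestamps are written back to the database. The sketch below illustrates that idea only; the function should_update_timestamp is hypothetical and this page does not describe how the client actually applies the interval.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def should_update_timestamp(
    last_update: datetime | None,
    now: datetime,
    interval: timedelta,
) -> bool:
    """Return True when enough time has passed to justify another write."""
    if last_update is None:
        # Nothing recorded yet, so the first access always writes.
        return True
    return now - last_update >= interval


interval = timedelta(seconds=1)  # assumed value for the example
start = datetime(2024, 1, 1, tzinfo=timezone.utc)

first = should_update_timestamp(None, start, interval)
too_soon = should_update_timestamp(start, start + timedelta(milliseconds=200), interval)
due = should_update_timestamp(start, start + timedelta(seconds=2), interval)
```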

get_dialect_name

  • get_dialect_name(): str | None
  • Get the database dialect name.


    Returns str | None

get_rate_limit_errors

  • get_rate_limit_errors(): dict[int, int]

get_storage_client_cache_key

  • get_storage_client_cache_key(configuration): Hashable
  • Return a cache key that can differentiate between different storages of this and other clients.

    The key can be based on the configuration or on the client itself. By default, it returns the module and name of the client class.


    Parameters

    Returns Hashable
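
    The default behaviour described above can be sketched as follows. Both ExampleStorageClient and default_cache_key are hypothetical names used for illustration, and the exact shape of the key the library returns is an assumption.

```python
from collections.abc import Hashable


class ExampleStorageClient:
    """Hypothetical stand-in for a storage client class."""


def default_cache_key(client: object) -> Hashable:
    """Build a key from the module and name of the client's class."""
    cls = type(client)
    return f"{cls.__module__}.{cls.__name__}"


key = default_cache_key(ExampleStorageClient())
same_key = default_cache_key(ExampleStorageClient())

# Two instances of the same class map to the same cache entry.
cache = {key: "storages opened through this client type"}
```

    A key built this way separates one client class from another, but it does not separate two instances of the same class that point at different databases. A configuration-based key would be needed for that.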

initialize

  • async initialize(configuration): None
  • Initialize the database schema.

    This method creates all necessary tables if they do not already exist. It should be called before the storage client is used.


    Parameters

    Returns None
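
    The "create if missing" behaviour means initialization can be repeated safely. The sketch below shows the same idea with the standard-library sqlite3 module; the table names and columns are hypothetical and do not reflect the client's real schema.

```python
import sqlite3

# Hypothetical pair of tables: one for metadata, one for records.
SCHEMA = (
    "CREATE TABLE IF NOT EXISTS example_metadata (id TEXT PRIMARY KEY, name TEXT)",
    "CREATE TABLE IF NOT EXISTS example_records ("
    "id INTEGER PRIMARY KEY, metadata_id TEXT, data TEXT)",
)


def initialize_schema(conn: sqlite3.Connection) -> None:
    """Create the tables only if they are not already present."""
    with conn:  # commits on success, rolls back on error
        for statement in SCHEMA:
            conn.execute(statement)


conn = sqlite3.connect(":memory:")
initialize_schema(conn)
initialize_schema(conn)  # the second call changes nothing

tables = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
conn.close()
```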

Properties

engine

engine: AsyncEngine

Get the SQLAlchemy AsyncEngine instance.